Post-processors

The anemoi-graphs package provides an API to implement post-processors, which are optional methods applied after a graph is constructed. These post-processors allow users to modify or refine the graph to suit specific use cases. They can be configured in the recipe file to enable flexible and automated post-processing workflows.

RemoveUnconnectedNodes

The RemoveUnconnectedNodes post-processor is designed to prune unconnected nodes from a graph. This is particularly useful in scenarios where disconnected nodes do not contribute to the analysis or where the focus is limited to a specific subset of the graph.

One notable application of RemoveUnconnectedNodes is in Limited Area Modeling (LAM), where a global dataset is often specified as a forcing boundary, but the analysis is only concerned with nodes near the limited area boundary. By pruning unconnected nodes, this post-processor ensures the resulting graph is focused on the region of interest, making it more efficient during training.

The RemoveUnconnectedNodes post-processor also provides functionality to store the indices of the pruned nodes (mask). This feature is particularly valuable for workflows involving training or inference, as it enables users to repeat the same masking operation consistently across different stages of analysis. To enable this feature, the user can specify the save_indices_mask_attr parameter. This parameter takes a string that represents the name of the new node attribute where the masking indices will be stored.

nodes: ...
edges: ...

post_processors:
- _target_: anemoi.graphs.processors.RemoveUnconnectedNodes
   nodes_name: data
   save_mask_indices_to_attr: indices_connected_nodes # optional

The RemoveUnconnectedNodes post-processor also supports an ignore argument, which is optional but highly convenient in certain use cases. This argument corresponds to the name of a node attribute used as a mask to prevent certain nodes from being dropped, even if they are unconnected. For example, in LAM workflows, it may be necessary to retain data nodes from the regional dataset that remain unconnected.

By specifying the ignore argument, users can ensure that such nodes are preserved. For example:

nodes: ...
edges: ...

post_processors:
- _target_: anemoi.graphs.processors.RemoveUnconnectedNodes
   nodes_name: data
   ignore: important_nodes
   save_mask_indices_to_attr: indices_connected_nodes # optional

In this configuration, any node with the attribute important_nodes set will not be pruned, regardless of its connectivity status.

RestrictEdgeLength

The RestrictEdgeLength post-processor will remove edges longer than a certain treshold (set in km). This can be useful when one or multiple edge builders create edges of various lenghts, some of which are undesirable. For example when using KNNEdges applied to all of the hidden mesh but only a subset of the data nodes (e.g. those in a LAM region) also connections are made to hidden mesh nodes very far away from the restricted set of data nodes. With this post-processor one can remove such edges, effectively providing a KNNedges algorithm applied only that part of the data mesh within a certain distance to the restricted set of data nodes.

After the long edges are removed the edge attributes are recomputed, since the removal of a large number of edges can change their distribution.

nodes: ...
edges: ...

post_processors:
- _target_: anemoi.graphs.processors.RestrictEdgeLength
  source_name: data                #source nodes of the edges to be processed
  target_name: hidden              #target nodes of the edges to be processed
  max_length_km: 20                #edges longer than the threshold of 20 km will be removed

The RestrictEdgeLength post-processor also supports the source_mask_attr_name and target_mask_attr_name arguments. These are optional but allow to refer to a Boolean attribute of the source/target nodes and only those edges whose source/target is True under this Boolean mask will be postprocessed. This can be useful if one wants to exclude a subset of edges that are allowed to be longer than the threshold. An example usage:

nodes: ...
   attributes:
      cutout:
         _target_: anemoi.graphs.nodes.attributes.CutOutMask
edges: ...
postprocessors:
- _target_: anemoi.graphs.processors.RestrictEdgeLength
  source_name: data                #source nodes of the edges to be processed
  target_name: hidden              #target nodes of the edges to be processed
  max_length_km: 20                    #edges longer than this threshold (in km) will be removed
  source_mask_attr_name: cutout    #optional

With this configuration only edges whose source is in the cutout region will be post-processed, i.e. those edges with source node outside the cutout region will be preserved regardless of their length.