Background

This page introduces the graph notation, terminology, and background information used in the rest of the documentation.

Terminology

A graph \(G = (V, E)\) is a collection of nodes/vertices \(V\) and edges \(E\) that connect the nodes.

nodes

A node represents a 2D location on the Earth’s surface and may carry additional attributes.

edges

An edge represents a connection between two nodes. The edges can be used to define the flow of information between the nodes. Edges may also contain attributes related to their length, direction or other properties.
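As an illustration of the definitions above, the following minimal sketch stores a tiny graph in plain Python. It does not use the anemoi-graphs API: the node names, the coordinates and the `great_circle` helper are hypothetical, and treating the edge length as a great-circle distance on the unit sphere is an assumption made only for this example.

```python
import math

# Hypothetical nodes: name -> (latitude, longitude) in degrees.
nodes = {
    "a": (51.5, 0.0),
    "b": (48.9, 2.3),
    "c": (40.4, 356.3),
}

# Directed edges as (source, target) pairs; information flows source -> target.
edges = [("a", "b"), ("b", "a"), ("c", "b")]


def great_circle(p, q):
    """Great-circle distance on the unit sphere (haversine formula), in radians."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * math.asin(math.sqrt(h))


# One example edge attribute: the length of each edge.
edge_length = {(s, t): great_circle(nodes[s], nodes[t]) for s, t in edges}
```

Here the nodes carry a position, the edges carry a direction (source to target), and `edge_length` is an example of an edge attribute.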

Encoder-processor-decoder graph

In weather models, the nodes \(V\) can be classified into two categories:

data nodes

A set of nodes representing one or more datasets. The data nodes may correspond to the input and/or output of the data-driven model. They can be defined from Anemoi datasets, in which case all anemoi-datasets operations, such as cutout or thinning, are supported.

hidden nodes

The hidden nodes capture intermediate representations of the model, which are used to learn the dynamics of the system considered (atmosphere, ocean, etc.). These nodes can be generated from existing locations (Anemoi datasets or NPZ files) or algorithmically, from iterative refinements of polygons over the globe.

Another important term, which can apply to both data and hidden nodes, is:

isolated nodes

A set of nodes that are not connected to any other nodes in the graph. These nodes can be used to store additional information that is not directly used in the training process.
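A minimal sketch of how isolated nodes could be identified, assuming a single set of nodes indexed from 0 and edges given as two parallel lists (source indices and target indices). The function name `isolated_nodes` is hypothetical and is not part of anemoi-graphs.

```python
def isolated_nodes(num_nodes, edge_index):
    """Return the indices of nodes that appear in no edge.

    edge_index is a pair of equal-length lists: (sources, targets).
    """
    sources, targets = edge_index
    connected = set(sources) | set(targets)
    return [i for i in range(num_nodes) if i not in connected]


# Five nodes, two edges (0 -> 1 and 1 -> 2): nodes 3 and 4 touch no edge.
example = isolated_nodes(5, [[0, 1], [1, 2]])
```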

Similarly, the edges \(E\) can be classified into three categories:

  • Encoder edges: connect the data nodes to the hidden nodes, encoding the input data into the latent space.

  • Processor edges: connect hidden nodes to other hidden nodes, processing the latent representation.

  • Decoder edges: connect the hidden nodes to the data nodes, decoding the latent space into the output data.

The commands and syntax for building the graphs at each layer are the same in anemoi-graphs. However, it is important to keep this distinction in mind when designing a weather graph to be used in a data-driven model with anemoi-training.
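The three edge categories can be told apart purely from the node sets an edge connects. The sketch below assumes edges are identified by a (source, relation, target) triple and that the node sets are named "data" and "hidden"; the `edge_role` helper is hypothetical and only illustrates the classification.

```python
def edge_role(edge_type, data="data", hidden="hidden"):
    """Classify an edge set by the node sets it connects."""
    src, _, dst = edge_type
    if src == data and dst == hidden:
        return "encoder"
    if src == hidden and dst == hidden:
        return "processor"
    if src == hidden and dst == data:
        return "decoder"
    return "other"


roles = [
    edge_role(("data", "to", "hidden")),
    edge_role(("hidden", "to", "hidden")),
    edge_role(("hidden", "to", "data")),
]
```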

Graph configurations

anemoi-graphs offers exceptional flexibility, allowing users to define a wide variety of graph configurations. The image below highlights some particularly useful examples.

Figure: Graph configurations

Design principles

When designing a graph for a weather model, we suggest the following guidelines:

  • Use a coarser resolution for the hidden nodes. This will reduce the computational cost of training and inference.

  • All input nodes should be connected to the hidden nodes. This will ensure that all available information can be used.

  • In the encoder edges, minimise the number of connections to the hidden nodes. This will reduce the computational cost.

  • All output nodes should have incoming connections from a few surrounding hidden nodes.

  • The number of incoming connections in each set of nodes should be similar, to make the training more stable.

  • Consider whether your use case requires long-range connections between the hidden nodes.
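Some of these guidelines can be checked numerically once the edges are known. The following sketch assumes encoder edges are given as two parallel lists of source (data) and target (hidden) indices; the function name `encoder_checks` is hypothetical. It reports the input nodes that have no outgoing connection and the number of incoming connections of every hidden node.

```python
from collections import Counter


def encoder_checks(num_data, num_hidden, edge_index):
    """Return (unused input nodes, in-degree of each hidden node)."""
    sources, targets = edge_index
    # Input nodes that never appear as an edge source are not connected.
    unused_inputs = sorted(set(range(num_data)) - set(sources))
    # Count incoming edges per hidden node, including nodes with none.
    in_degree = Counter(targets)
    degrees = [in_degree.get(j, 0) for j in range(num_hidden)]
    return unused_inputs, degrees


# Four data nodes, two hidden nodes, three encoder edges.
unused, degrees = encoder_checks(4, 2, [[0, 1, 2], [0, 0, 1]])
```

In this example, data node 3 is never used as a source, and the two hidden nodes receive 2 and 1 incoming connections respectively. A large spread in `degrees` would suggest revisiting the edge definition.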

Data structure

The graphs generated by anemoi-graphs are represented as a torch_geometric.data.HeteroData object. They include all the attributes specified in the recipe file, together with the node/edge type. The node/edge type is the name of the node/edge builder used to create the set of nodes/edges.

HeteroData(
  data={
    x=[40320, 2],  # coordinates in radians (lat in [-pi/2, pi/2], lon in [0, 2pi])
    node_type='ZarrDatasetNodes',
    area_weight=[40320, 1],
  },
  hidden={
    x=[10242, 2],  # coordinates in radians (lat in [-pi/2, pi/2], lon in [0, 2pi])
    node_type='TriNodes',
    area_weight=[10242, 1],
  },
  (data, to, hidden)={
    edge_index=[2, 62980],
    edge_type='CutOffEdges',
    edge_length=[62980, 1],
    edge_dirs=[62980, 2],
  },
  (hidden, to, hidden)={
    edge_index=[2, 81900],
    edge_type='MultiScaleEdges',
    edge_length=[81900, 1],
    edge_dirs=[81900, 2],
  },
  (hidden, to, data)={
    edge_index=[2, 120960],
    edge_type='KNNEdges',
    edge_length=[120960, 1],
    edge_dirs=[120960, 2],
  }
)
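The comments on the x attribute above state that coordinates are stored in radians, with latitude in [-pi/2, pi/2] and longitude in [0, 2pi]. The sketch below shows one way to bring coordinates given in degrees into that convention. The `to_radians` helper is hypothetical, and returning the pair in (latitude, longitude) order is an assumption based on the order used in those comments.

```python
import math


def to_radians(lat_deg, lon_deg):
    """Convert degrees to radians, wrapping longitude into [0, 2*pi)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg % 360.0)  # e.g. -10 degrees becomes 350 degrees
    return lat, lon


# South Pole at longitude -180 degrees.
lat, lon = to_radians(-90.0, -180.0)
```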

The HeteroData object provides some useful attributes, such as node_types and edge_types, which list the sets of nodes and edges defined in the graph.

>>> graph.node_types
['data', 'hidden']

>>> graph.edge_types
[('data', 'to', 'hidden'), ('hidden', 'to', 'hidden'), ('hidden', 'to', 'data')]

In addition, you can inspect the attributes of the nodes and edges using the node_attrs and edge_attrs methods.

>>> graph["data"].node_attrs()
['x', 'area_weight']

>>> graph[("data", "to", "hidden")].edge_attrs()
['edge_index', 'edge_length', 'edge_dirs']