Overview

A graph \(G = (V, E)\) is a collection of nodes (vertices) \(V\) and edges \(E\) that connect them. The nodes can represent locations on the globe.

In weather models, the nodes \(V\) can generally be classified into two categories:

  • Data nodes: These represent the inputs and outputs of the data-driven model, so they are linked to existing datasets.

  • Hidden nodes: These represent the latent space, where the internal dynamics are learned.

Similarly, the edges \(E\) can be classified into three categories:

  • Encoder edges: These connect the data nodes to the hidden nodes to encode the input data into the latent space.

  • Processor edges: These connect hidden nodes to other hidden nodes to process the latent space.

  • Decoder edges: These connect the hidden nodes to the data nodes to decode the latent space into the output data.

When building a graph, anemoi-graphs does not distinguish between these categories. However, it is important to keep the distinction in mind when designing a weather graph for a data-driven model trained with anemoi-training.
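The classification above depends only on which kind of node set an edge set starts from and ends at. The following is a minimal sketch of that rule in plain Python. The helper `edge_role` and the node set names `data` and `hidden` are assumptions made for illustration; they are not part of anemoi-graphs.

```python
# Hypothetical helper, not part of anemoi-graphs: label an edge set by the
# roles of its source and target node sets.
DATA_NODES = {"data"}      # node sets linked to datasets (assumed names)
HIDDEN_NODES = {"hidden"}  # node sets forming the latent space (assumed names)


def edge_role(source: str, target: str) -> str:
    """Return 'encoder', 'processor' or 'decoder' for a (source, target) pair."""
    if source in DATA_NODES and target in HIDDEN_NODES:
        return "encoder"
    if source in HIDDEN_NODES and target in HIDDEN_NODES:
        return "processor"
    if source in HIDDEN_NODES and target in DATA_NODES:
        return "decoder"
    raise ValueError(f"Unsupported edge set: {source} -> {target}")


edge_sets = [
    ("data", "to", "hidden"),
    ("hidden", "to", "hidden"),
    ("hidden", "to", "data"),
]
roles = {edge: edge_role(edge[0], edge[2]) for edge in edge_sets}
for edge, role in roles.items():
    print(edge, "->", role)
```

A direct data-to-data edge set raises an error here because it falls outside the three categories described above.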

Design principles

When designing a graph for a weather model, follow these guidelines:

  • Use a coarser resolution for the hidden nodes. This will reduce the computational cost of training and inference.

  • All input nodes should be connected to the hidden nodes. This will ensure that all available information can be used.

  • In the encoder edges, minimise the number of connections to the hidden nodes. This will reduce the computational cost.

  • All output nodes should have incoming connections from a few surrounding hidden nodes.

  • The number of incoming connections should be similar across the nodes in each set, to make the training more stable.

  • Consider whether your use case requires long-range connections between the hidden nodes.
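Several of these guidelines can be checked numerically once the edges exist. The sketch below assumes that an edge set is stored as two equal-length lists of source and target indices; both helper functions are hypothetical and written for illustration only. It reports the spread of incoming connections per target node and lists any source nodes left without an outgoing edge.

```python
from collections import Counter


def in_degree_summary(edge_index, num_target_nodes):
    """Summarise the number of incoming connections per target node.

    edge_index is a pair [sources, targets] of equal-length index lists.
    """
    _, targets = edge_index
    counts = Counter(targets)
    degrees = [counts.get(node, 0) for node in range(num_target_nodes)]
    return {
        "min": min(degrees),
        "max": max(degrees),
        "mean": sum(degrees) / num_target_nodes,
        "unconnected": [n for n, d in enumerate(degrees) if d == 0],
    }


def unconnected_sources(edge_index, num_source_nodes):
    """Return the source nodes that have no outgoing edge."""
    sources, _ = edge_index
    used = set(sources)
    return [n for n in range(num_source_nodes) if n not in used]


# Toy encoder: 4 data nodes feeding 2 hidden nodes; data node 3 is left out.
toy_encoder = [[0, 1, 2, 2], [0, 0, 1, 1]]
summary = in_degree_summary(toy_encoder, num_target_nodes=2)
missing = unconnected_sources(toy_encoder, num_source_nodes=4)
print(summary)
print("data nodes without an encoder edge:", missing)
```

A large gap between the minimum and maximum in-degree, or a non-empty list of unconnected nodes, suggests revisiting the edge configuration. If the indices are held in tensors rather than lists, they would need converting first, for example with `.tolist()`.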

Data structure

The graphs generated by anemoi-graphs are represented as a torch_geometric.data.HeteroData object. They include all the attributes specified in the recipe file, together with the node/edge type. The node/edge type is the name of the node/edge builder used to create the set of nodes/edges.

HeteroData(
  data={
    x=[40320, 2],  # coordinates in radians (lat in [-pi/2, pi/2], lon in [0, 2pi])
    node_type='ZarrDatasetNodes',
    area_weight=[40320, 1],
  },
  hidden={
    x=[10242, 2],  # coordinates in radians (lat in [-pi/2, pi/2], lon in [0, 2pi])
    node_type='TriNodes',
    area_weight=[10242, 1],
  },
  (data, to, hidden)={
    edge_index=[2, 62980],
    edge_type='CutOffEdges',
    edge_length=[62980, 1],
    edge_dirs=[62980, 2],
  },
  (hidden, to, hidden)={
    edge_index=[2, 81900],
    edge_type='MultiScaleEdges',
    edge_length=[81900, 1],
    edge_dirs=[81900, 2],
  },
  (hidden, to, data)={
    edge_index=[2, 120960],
    edge_type='KNNEdges',
    edge_length=[120960, 1],
    edge_dirs=[120960, 2],
  }
)
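To make the relationship between the node coordinates, edge_index and a per-edge attribute concrete, the sketch below computes a great-circle length for each edge from (lat, lon) coordinates in radians. This is an illustration only: the haversine formula, the (lat, lon) column order and the function names are assumptions, and anemoi-graphs may compute or normalise edge_length differently.

```python
import math


def great_circle_distance(a, b):
    """Angular distance in radians between two (lat, lon) points on the unit sphere."""
    lat1, lon1 = a
    lat2, lon2 = b
    # Haversine formula; clamp before asin to guard against rounding error.
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * math.asin(min(1.0, math.sqrt(h)))


def edge_lengths(src_coords, dst_coords, edge_index):
    """Return one length per edge; edge_index is a pair [sources, targets]."""
    sources, targets = edge_index
    return [
        great_circle_distance(src_coords[s], dst_coords[t])
        for s, t in zip(sources, targets)
    ]


# Two data nodes on the equator, one hidden node at the north pole.
data_coords = [(0.0, 0.0), (0.0, math.pi / 2)]
hidden_coords = [(math.pi / 2, 0.0)]
toy_edges = [[0, 1], [0, 0]]  # both data nodes connect to hidden node 0
lengths = edge_lengths(data_coords, hidden_coords, toy_edges)
print(lengths)
```

Both toy edges span a quarter of a great circle, so each length should be close to pi/2.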

The HeteroData object provides some useful attributes, such as node_types and edge_types, which list the node sets and edge sets defined in the graph.

>>> graph.node_types
['data', 'hidden']

>>> graph.edge_types
[('data', 'to', 'hidden'), ('hidden', 'to', 'hidden'), ('hidden', 'to', 'data')]

In addition, you can inspect the attributes of the nodes and edges using the node_attrs and edge_attrs methods.

>>> graph["data"].node_attrs()
['x', 'area_weight']

>>> graph[("data", "to", "hidden")].edge_attrs()
['edge_index', 'edge_length', 'edge_dirs']
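The shapes shown in the example graph above follow a simple convention: edge_index has shape [2, num_edges], and every other edge attribute has num_edges rows. The sketch below checks that convention and derives the mean number of incoming connections per target node. Plain dictionaries stand in for the HeteroData object, and the helper names are hypothetical.

```python
# Shapes copied from the example graph above; plain dicts stand in for HeteroData.
node_shapes = {
    "data": {"x": (40320, 2), "area_weight": (40320, 1)},
    "hidden": {"x": (10242, 2), "area_weight": (10242, 1)},
}
edge_shapes = {
    ("data", "to", "hidden"): {
        "edge_index": (2, 62980), "edge_length": (62980, 1), "edge_dirs": (62980, 2),
    },
    ("hidden", "to", "hidden"): {
        "edge_index": (2, 81900), "edge_length": (81900, 1), "edge_dirs": (81900, 2),
    },
    ("hidden", "to", "data"): {
        "edge_index": (2, 120960), "edge_length": (120960, 1), "edge_dirs": (120960, 2),
    },
}


def shape_problems(node_shapes, edge_shapes):
    """Return a list of messages describing inconsistent shapes."""
    problems = []
    for edge, attrs in edge_shapes.items():
        src, _, dst = edge
        if src not in node_shapes or dst not in node_shapes:
            problems.append(f"{edge}: unknown node set")
        rows, num_edges = attrs["edge_index"]
        if rows != 2:
            problems.append(f"{edge}: edge_index must have 2 rows")
        for name, shape in attrs.items():
            if name != "edge_index" and shape[0] != num_edges:
                problems.append(f"{edge}: {name} has {shape[0]} rows, expected {num_edges}")
    return problems


def mean_in_degree(node_shapes, edge_shapes):
    """Mean number of incoming edges per target node, for each edge set."""
    return {
        edge: attrs["edge_index"][1] / node_shapes[edge[2]]["x"][0]
        for edge, attrs in edge_shapes.items()
    }


problems = shape_problems(node_shapes, edge_shapes)
degrees = mean_in_degree(node_shapes, edge_shapes)
print(problems)
for edge, value in degrees.items():
    print(edge, round(value, 2))
```

With a real graph, the same shapes could be read from the tensors themselves, for example with `tuple(graph[edge][name].shape)`.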

Installing

To install the package, you can use the following command:

pip install anemoi-graphs[...options...]

The options are:

  • dev: install the development dependencies

  • docs: install the dependencies for the documentation

  • test: install the dependencies for testing

  • all: install all the dependencies

Contributing

git clone ...
cd anemoi-graphs
pip install ".[dev]"
pip install -r docs/requirements.txt

On macOS, you may also have to install pandoc:

brew install pandoc