Overview
A graph \(G = (V, E)\) is a collection of nodes (vertices) \(V\) and edges \(E\) that connect them. The nodes can represent locations on the globe.
In weather models, the nodes \(V\) can generally be classified into 2 categories:
Data nodes: The data nodes represent the input/output of the data-driven model, so they are linked to existing datasets.
Hidden nodes: These hidden nodes represent the latent space, where the internal dynamics are learned.
Similarly, the edges \(E\) can be classified into three categories:
Encoder edges: These encoder edges connect the data nodes with the hidden nodes to encode the input data into the latent space.
Processor edges: These processor edges connect the hidden nodes with the hidden nodes to process the latent space.
Decoder edges: These decoder edges connect the hidden nodes with the data nodes to decode the latent space into the output data.
When building the graph with anemoi-graphs, there is no difference between these categories. However, it is important to keep this distinction in mind when designing a weather graph to be used in a data-driven model with anemoi-training.
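Because anemoi-graphs treats all edge sets alike, the encoder, processor and decoder roles are implied only by which node sets an edge set connects. As a minimal sketch, assuming edge sets are keyed by (source, relation, target) triplets as in the data structure below and that the node sets are named "data" and "hidden", the role of each edge set can be inferred from its endpoints. The helper classify_edge_sets is hypothetical and not part of anemoi-graphs.

```python
from typing import Dict, Iterable, Tuple

EdgeKey = Tuple[str, str, str]  # (source node set, relation, target node set)


def classify_edge_sets(
    edge_types: Iterable[EdgeKey],
    data_nodes: str = "data",
    hidden_nodes: str = "hidden",
) -> Dict[EdgeKey, str]:
    """Label each edge set as encoder, processor or decoder from its endpoints.

    Hypothetical helper: the node set names are assumptions, not fixed by anemoi-graphs.
    """
    roles = {
        (data_nodes, hidden_nodes): "encoder",      # data -> latent space
        (hidden_nodes, hidden_nodes): "processor",  # latent space -> latent space
        (hidden_nodes, data_nodes): "decoder",      # latent space -> data
    }
    labels = {}
    for src, rel, dst in edge_types:
        # Any other pairing (e.g. data -> data) has no role in this scheme.
        labels[(src, rel, dst)] = roles.get((src, dst), "unclassified")
    return labels


edge_types = [("data", "to", "hidden"), ("hidden", "to", "hidden"), ("hidden", "to", "data")]
for key, role in classify_edge_sets(edge_types).items():
    print(key, "->", role)
```

The names passed as data_nodes and hidden_nodes are free choices in a recipe, so they are parameters here rather than hard-coded.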
Design principles
When designing a graph for a weather model, the following guidelines should be followed:
Use a coarser resolution for the hidden nodes. This will reduce the computational cost of training and inference.
All input nodes should be connected to the hidden nodes. This will ensure that all available information can be used.
In the encoder edges, minimise the number of connections to the hidden nodes. This will reduce the computational cost.
All output nodes should have incoming connections from a few surrounding hidden nodes.
The number of incoming connections should be similar across the nodes of each set, which makes training more stable.
Consider whether your use case requires long-range connections between the hidden nodes.
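Several of these guidelines can be checked once an edge set exists. The sketch below assumes an edge set is given as two parallel lists of source and target indices, i.e. the two rows of an edge_index, following the PyTorch Geometric convention that row 0 holds sources and row 1 holds targets. It reports source nodes with no outgoing edges, target nodes with no incoming edges, and the range of incoming connections per target node. The function degree_report and its output fields are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def degree_report(
    sources: Sequence[int],
    targets: Sequence[int],
    num_source_nodes: int,
    num_target_nodes: int,
) -> Dict[str, object]:
    """Summarise the connectivity of one edge set (hypothetical helper)."""
    out_deg = Counter(sources)  # number of edges leaving each source node
    in_deg = Counter(targets)   # number of edges arriving at each target node

    # Guideline: every input node should be connected to the hidden nodes.
    unused_sources = [i for i in range(num_source_nodes) if out_deg[i] == 0]
    # Guideline: every output node should receive incoming connections.
    unreached_targets = [j for j in range(num_target_nodes) if in_deg[j] == 0]

    in_counts = [in_deg[j] for j in range(num_target_nodes)]
    return {
        "unused_sources": unused_sources,
        "unreached_targets": unreached_targets,
        # A narrow min/max range suggests a balanced number of incoming connections.
        "min_in_degree": min(in_counts),
        "max_in_degree": max(in_counts),
    }


# Toy encoder: 5 data nodes -> 2 hidden nodes; data node 4 is never used.
report = degree_report(
    sources=[0, 1, 2, 3],
    targets=[0, 0, 1, 1],
    num_source_nodes=5,
    num_target_nodes=2,
)
print(report)
```

Applied to an encoder edge set, a non-empty unused_sources list flags input nodes whose information never reaches the latent space; applied to a decoder edge set, unreached_targets flags output nodes that receive nothing.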
Data structure
The graphs generated by anemoi-graphs are represented as a torch_geometric.data.HeteroData object (PyTorch Geometric). Each graph includes all the attributes specified in the recipe file, plus a node_type or edge_type attribute that records which node or edge builder was used to create each set of nodes or edges.
HeteroData(
data={
x=[40320, 2], # coordinates in radians (lat in [-pi/2, pi/2], lon in [0, 2pi])
node_type='ZarrDatasetNodes',
area_weight=[40320, 1],
},
hidden={
x=[10242, 2], # coordinates in radians (lat in [-pi/2, pi/2], lon in [0, 2pi])
node_type='TriNodes',
area_weight=[10242, 1],
},
(data, to, hidden)={
edge_index=[2, 62980],
edge_type='CutOffEdges',
edge_length=[62980, 1],
edge_dirs=[62980, 2],
},
(hidden, to, hidden)={
edge_index=[2, 81900],
edge_type='MultiScaleEdges',
edge_length=[81900, 1],
edge_dirs=[81900, 2],
},
(hidden, to, data)={
edge_index=[2, 120960],
edge_type='KNNEdges',
edge_length=[120960, 1],
edge_dirs=[120960, 2],
}
)
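The comments above give the coordinate convention for x: latitude in \([-\pi/2, \pi/2]\) and longitude in \([0, 2\pi]\), both in radians. Datasets often store latitudes and longitudes in degrees, sometimes with longitudes in \([-180, 180]\), so comparing them with a graph's x needs a conversion. The sketch below is an illustrative helper, not a function provided by anemoi-graphs; wrapping longitude into \([0, 2\pi)\) is an assumption consistent with the stated range.

```python
import math
from typing import List, Tuple


def latlon_deg_to_rad(latlon_deg: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Convert (lat, lon) pairs in degrees to radians, matching the layout of x.

    Hypothetical helper: latitude ends up in [-pi/2, pi/2] and longitude is
    wrapped into [0, 2*pi).
    """
    coords = []
    for lat, lon in latlon_deg:
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"latitude out of range: {lat}")
        # Python's % returns a non-negative result for a positive modulus,
        # so -90 degrees becomes 270 degrees.
        lon_wrapped = lon % 360.0
        # Tiny negative inputs can round up to exactly 360.0; map that to 0.
        if lon_wrapped == 360.0:
            lon_wrapped = 0.0
        coords.append((math.radians(lat), math.radians(lon_wrapped)))
    return coords


print(latlon_deg_to_rad([(0.0, -90.0), (45.0, 180.0)]))
```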
The HeteroData object also exposes useful attributes such as node_types and edge_types, which list the node and edge sets defined in the graph.
>>> graph.node_types
['data', 'hidden']
>>> graph.edge_types
[('data', 'to', 'hidden'), ('hidden', 'to', 'hidden'), ('hidden', 'to', 'data')]
In addition, you can inspect the attributes of the nodes and edges using the node_attrs and edge_attrs methods.
>>> graph["data"].node_attrs()
['x', 'area_weight']
>>> graph[("data", "to", "hidden")].edge_attrs()
['edge_index', 'edge_length', 'edge_dirs']
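The node_types and edge_types attributes make it possible to loop over a graph without hard-coding the names of its node and edge sets. The sketch below assumes graph is a HeteroData object already in memory (how it was loaded is left out) and relies on PyTorch Geometric's num_nodes and num_edges storage attributes together with the node_attrs method shown above; the helper names are hypothetical.

```python
from typing import Dict, Tuple


def summarise_graph(graph) -> Tuple[Dict[str, int], Dict[Tuple[str, str, str], int]]:
    """Count nodes per node set and edges per edge set of a HeteroData graph."""
    # node_types lists the node set names, e.g. ['data', 'hidden'].
    node_counts = {nt: graph[nt].num_nodes for nt in graph.node_types}
    # edge_types lists (source, relation, target) triplets.
    edge_counts = {et: graph[et].num_edges for et in graph.edge_types}
    return node_counts, edge_counts


def print_summary(graph) -> None:
    """Print one line per node set and per edge set."""
    node_counts, edge_counts = summarise_graph(graph)
    for name, n in node_counts.items():
        print(f"{name}: {n} nodes, attributes {graph[name].node_attrs()}")
    for (src, _, dst), n in edge_counts.items():
        print(f"{src} -> {dst}: {n} edges")
```

For the graph printed above, the node counts would be expected to match the first dimension of each x (40320 data nodes and 10242 hidden nodes).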
Installing
To install the package, you can use the following command:
pip install anemoi-graphs[...options...]
The options are:
dev: install the development dependencies
docs: install the dependencies for the documentation
test: install the dependencies for testing
all: install all the dependencies
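Contributing
To set up a development environment, clone the repository and install the package with the development and documentation dependencies: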
git clone ...
cd anemoi-graphs
pip install .[dev]
pip install -r docs/requirements.txt
You may also need to install pandoc on macOS:
brew install pandoc