.. _overview:

##########
 Overview
##########

A graph :math:`G = (V, E)` is a collection of nodes/vertices :math:`V`
and edges :math:`E` that connect the nodes. The nodes can represent
locations on the globe.

In weather models, the nodes :math:`V` can generally be classified into
two categories:

- **Data nodes**: The `data nodes` represent the input/output of the
  data-driven model, so they are linked to existing datasets.
- **Hidden nodes**: These `hidden nodes` represent the latent space,
  where the internal dynamics are learned.

Similarly, the edges :math:`E` can be classified into three categories:

- **Encoder edges**: These `encoder edges` connect the `data` nodes with
  the `hidden` nodes to encode the input data into the latent space.
- **Processor edges**: These `processor edges` connect `hidden` nodes
  with other `hidden` nodes to process the latent space.
- **Decoder edges**: These `decoder edges` connect the `hidden` nodes
  with the `data` nodes to decode the latent space into the output data.

When building the graph with `anemoi-graphs`, there is no technical
difference between these categories. However, it is important to keep
this distinction in mind when designing a weather graph to be used in a
data-driven model with :ref:`anemoi-training `.

*******************
 Design principles
*******************

When designing a graph for a weather model, the following guidelines
should be followed:

- Use a coarser resolution for the `hidden nodes` than for the `data
  nodes`. This will reduce the computational cost of training and
  inference.
- All input nodes should be connected to the `hidden nodes`. This will
  ensure that all available information can be used.
- In the encoder edges, minimise the number of connections to the
  `hidden nodes`. This will reduce the computational cost.
- All output nodes should have incoming connections from a few
  surrounding `hidden nodes`.
- Within each set of nodes, the number of incoming connections per node
  should be similar, which makes training more stable.
- Consider whether your use case requires long-range connections
  between the `hidden nodes`.

****************
 Data structure
****************

The graphs generated by :ref:`anemoi-utils ` are represented as a
`pytorch_geometric.data.HeteroData `_ object. They include all the
attributes specified in the recipe file, as well as the node/edge type.
The node/edge type represents the node/edge builder used to create the
set of nodes/edges.

.. literalinclude:: _static/hetero_data_graph.txt
   :language: console

The `HeteroData` object contains some useful attributes such as
`node_types` and `edge_types`, which list the node and edge types
defined in the graph.

.. code:: console

   >>> graph.node_types
   ['data', 'hidden']
   >>> graph.edge_types
   [('data', 'to', 'hidden'), ('hidden', 'to', 'hidden'), ('hidden', 'to', 'data')]

In addition, you can inspect the attributes of the nodes and edges using
the `node_attrs` and `edge_attrs` methods.

.. code:: console

   >>> graph["data"].node_attrs()
   ['x', 'area_weight']
   >>> graph[("data", "to", "hidden")].edge_attrs()
   ['edge_index', 'edge_length', 'edge_dirs']

************
 Installing
************

To install the package, use the following command:

.. code:: bash

   pip install anemoi-graphs[...options...]

The options are:

- ``dev``: install the development dependencies
- ``docs``: install the dependencies for the documentation
- ``test``: install the dependencies for testing
- ``all``: install all the dependencies

**************
 Contributing
**************

.. code:: bash

   git clone ...
   cd anemoi-graphs
   pip install .[dev]
   pip install -r docs/requirements.txt

You may also have to install pandoc on macOS:

.. code:: bash

   brew install pandoc
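
When working on graph builders, it can help to sanity-check a generated
graph against the design principles above. The following is a minimal
sketch, not part of the `anemoi-graphs` API. The helper name
`summarise_graph` and the toy graph are hypothetical. It assumes only
NumPy and the edge layout used by PyTorch Geometric: an integer
`edge_index` array of shape `(2, num_edges)`, with source node indices
in the first row and target node indices in the second.

.. code:: python

   import numpy as np


   def summarise_graph(num_data, num_hidden, encoder, processor, decoder):
       """Hypothetical helper: compare a graph with the design principles.

       Each edge argument is an edge_index array of shape (2, num_edges):
       row 0 holds source node indices, row 1 holds target node indices.
       """
       # Outgoing encoder edges per data node (input side).
       enc_out = np.bincount(encoder[0], minlength=num_data)
       # Incoming decoder edges per data node (output side).
       dec_in = np.bincount(decoder[1], minlength=num_data)
       # Incoming processor edges per hidden node.
       proc_in = np.bincount(processor[1], minlength=num_hidden)
       return {
           # Every input node should be connected to the hidden nodes.
           "all_inputs_connected": bool(np.all(enc_out > 0)),
           # Every output node should receive edges from some hidden nodes.
           "all_outputs_connected": bool(np.all(dec_in > 0)),
           # A narrow (min, max) range means similar incoming connections.
           "decoder_in_degree_range": (int(dec_in.min()), int(dec_in.max())),
           "processor_in_degree_range": (int(proc_in.min()), int(proc_in.max())),
       }


   # Toy graph: 4 data nodes and 2 (coarser) hidden nodes.
   encoder = np.array([[0, 1, 2, 3], [0, 0, 1, 1]])
   processor = np.array([[0, 1], [1, 0]])
   decoder = np.array([[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2, 3]])
   print(summarise_graph(4, 2, encoder, processor, decoder))

With a real graph, the arrays could be taken from the `HeteroData`
object, for example `graph[("hidden", "to", "data")].edge_index.numpy()`
for a CPU tensor, and the node counts from `graph["data"].num_nodes`.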