.. _overview:

##########
 Overview
##########

A graph :math:`G = (V, E)` is a collection of nodes/vertices :math:`V`
and edges :math:`E` that connect the nodes. The nodes can represent
locations on the globe.

In weather models, the nodes :math:`V` can generally be classified into
two categories:

-  **Data nodes**: The `data nodes` represent the input/output of the
   data-driven model, so they are linked to existing datasets.
-  **Hidden nodes**: These `hidden nodes` represent the latent space,
   where the internal dynamics are learned.

Similarly, the edges :math:`E` can be classified into three categories:

-  **Encoder edges**: These `encoder edges` connect the `data` nodes
   with the `hidden` nodes to encode the input data into the latent
   space.

-  **Processor edges**: These `processor edges` connect the `hidden`
   nodes with other `hidden` nodes to process the latent space.

-  **Decoder edges**: These `decoder edges` connect the `hidden` nodes
   with the `data` nodes to decode the latent space into the output
   data.

When building the graph with `anemoi-graphs`, there is no difference
between these categories. However, it is important to keep this
distinction in mind when designing a weather graph to be used in a
data-driven model with :ref:`anemoi-training
<anemoi-training:index-page>`.
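
As a minimal sketch of this distinction, the category of a set of edges
can be derived from the types of its source and target nodes. The
function ``edge_category`` and the node names ``data`` and ``hidden``
are illustrative assumptions, not part of the `anemoi-graphs` API.

.. code:: python

   # Illustrative node-set names (assumed, not mandated by anemoi-graphs).
   DATA, HIDDEN = "data", "hidden"


   def edge_category(edge_type):
       """Classify a (source, relation, target) triple by its endpoints."""
       source, _, target = edge_type
       if source == DATA and target == HIDDEN:
           return "encoder"
       if source == HIDDEN and target == HIDDEN:
           return "processor"
       if source == HIDDEN and target == DATA:
           return "decoder"
       raise ValueError(f"Unexpected edge type: {edge_type}")


   edge_types = [
       ("data", "to", "hidden"),
       ("hidden", "to", "hidden"),
       ("hidden", "to", "data"),
   ]
   categories = {edge_type: edge_category(edge_type) for edge_type in edge_types}
   for edge_type, category in categories.items():
       print(edge_type, "->", category)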

*******************
 Design principles
*******************

When designing a graph for a weather model, the following guidelines
should be followed:

-  Use a coarser resolution for the `hidden nodes`. This will reduce the
   computational cost of training and inference.
-  All input nodes should be connected to the `hidden nodes`. This will
   ensure that all available information can be used.
-  In the encoder edges, minimise the number of connections to the
   `hidden nodes`. This will reduce the computational cost.
-  All output nodes should have incoming connections from a few
   surrounding `hidden nodes`.
-  The number of incoming connections in each set of nodes should be
   similar to make the training more stable.
-  Consider whether your use case requires long-range connections
   between the `hidden nodes`.
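
Some of these guidelines can be checked on a candidate edge list before
training. The sketch below is a toy example in plain Python, assuming
that edges are stored as ``(source, target)`` index pairs. The helpers
``in_degrees`` and ``unconnected_sources`` are hypothetical names and do
not belong to `anemoi-graphs`.

.. code:: python

   from collections import Counter


   def in_degrees(edges, num_targets):
       """Count the incoming edges of each target node."""
       counts = Counter(target for _, target in edges)
       return [counts.get(i, 0) for i in range(num_targets)]


   def unconnected_sources(edges, num_sources):
       """Return the source nodes that have no outgoing edge."""
       connected = {source for source, _ in edges}
       return [i for i in range(num_sources) if i not in connected]


   # Toy encoder: 6 data nodes, 2 hidden nodes, one connection per data node.
   encoder_edges = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]

   hidden_in = in_degrees(encoder_edges, num_targets=2)
   missing = unconnected_sources(encoder_edges, num_sources=6)

   print(hidden_in)  # [3, 3]: both hidden nodes receive the same number of edges
   print(missing)    # []: every data node is connected to a hidden node

An empty ``missing`` list means that every input node contributes to the
latent space, and similar values in ``hidden_in`` indicate a balanced
number of incoming connections.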

****************
 Data structure
****************

The graphs generated by `anemoi-graphs` are represented as a
`torch_geometric.data.HeteroData
<https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.data.HeteroData.html>`_
object. They include all the attributes specified in the recipe file and
the node/edge type. The node/edge type identifies the node/edge builder
used to create the set of nodes/edges.

.. literalinclude:: _static/hetero_data_graph.txt
   :language: console

The `HeteroData` object exposes useful attributes such as `node_types`
and `edge_types`, which list the sets of nodes and edges defined in the
graph.

.. code:: console

   >>> graph.node_types
   ['data', 'hidden']

   >>> graph.edge_types
   [('data', 'to', 'hidden'), ('hidden', 'to', 'hidden'), ('hidden', 'to', 'data')]

In addition, you can inspect the attributes of the nodes and edges using
the `node_attrs` and `edge_attrs` methods.

.. code:: console

   >>> graph["data"].node_attrs()
   ['x', 'area_weight']

   >>> graph[("data", "to", "hidden")].edge_attrs()
   ['edge_index', 'edge_length', 'edge_dirs']

************
 Installing
************

To install the package, you can use the following command:

.. code:: bash

   pip install anemoi-graphs[...options...]

The options are:

-  ``dev``: install the development dependencies
-  ``docs``: install the dependencies for the documentation
-  ``test``: install the dependencies for testing
-  ``all``: install all the dependencies

**************
 Contributing
**************

.. code:: bash

   git clone ...
   cd anemoi-graphs
   pip install .[dev]
   pip install -r docs/requirements.txt

On macOS, you may also need to install pandoc:

.. code:: bash

   brew install pandoc