Create your first model

This section describes how to instantiate an existing model from the anemoi-models package.

In this example we show how to create an instance of the Encoder-Processor-Decoder that uses a Graph Transformer for the encoder and decoder and a sliding window transformer [1] for the processor.

Our implemented models are instantiated with OmegaConf [2] and Hydra [3]. Commonly used model configurations can be found in configs/models (see Configuration Basics).

Model Configuration

First, let’s take the model configuration transformer.yaml:

model:
  _target_: anemoi.models.models.encoder_processor_decoder.AnemoiModelEncProcDec

num_channels: 1024

processor:
  _target_: anemoi.models.layers.processor.TransformerProcessor
  num_layers: 16
  num_chunks: 2

encoder:
  _target_: anemoi.models.layers.mapper.GraphTransformerForwardMapper
  trainable_size: 8
  sub_graph_edge_attributes: ${model.attributes.edges}
  num_chunks: 1
  mlp_hidden_ratio: 4
  num_heads: 16

decoder:
  _target_: anemoi.models.layers.mapper.GraphTransformerBackwardMapper
  trainable_size: 8
  sub_graph_edge_attributes: ${model.attributes.edges}
  num_chunks: 1
  mlp_hidden_ratio: 4
  num_heads: 16

residual:
  _target_: anemoi.models.layers.residual.SkipConnection

attributes:
  edges:
  - edge_length
  - edge_dirs
  nodes: []

Typically the model is instantiated in Anemoi Training or Anemoi Inference. For this example we will load the model configuration by itself to understand the different components needed to create a model.

from omegaconf import OmegaConf

model_config = OmegaConf.load("transformer.yaml")

Define statistics, data indices and supporting arrays

As described in Overview, we want to create a model interface that can be used for training and inference. For that we need the statistics, data indices and supporting arrays, which are required for pre- and postprocessing. These attributes are provided by anemoi-datasets (see the anemoi-datasets documentation).

Statistics

The statistics are simply stored in a dictionary with the mean, stdev, maximum and minimum of the variables. They are usually loaded from the dataset, i.e. ds.statistics:

statistics = {
    "mean": [0.5, 1.1, 0.0],
    "stdev": [0.1, 0.1, 0.1],
    "maximum": [1.0, 1.0, 1.0],
    "minimum": [0.0, 0.0, 0.0],
}
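
To make the role of these statistics concrete, the following is a minimal sketch of mean/standard-deviation normalization and its inverse in plain Python. The helper names normalize and denormalize and the sample vector x are illustrative assumptions; the actual preprocessors in anemoi-models are configurable and may apply other schemes per variable.

```python
# Illustrative only: per-variable mean/stdev normalization using the
# statistics dictionary shown above. Not the anemoi-models preprocessor.
statistics = {
    "mean": [0.5, 1.1, 0.0],
    "stdev": [0.1, 0.1, 0.1],
    "maximum": [1.0, 1.0, 1.0],
    "minimum": [0.0, 0.0, 0.0],
}


def normalize(values, stats):
    """Map physical values to zero-mean, unit-variance values."""
    return [(v - m) / s for v, m, s in zip(values, stats["mean"], stats["stdev"])]


def denormalize(values, stats):
    """Invert normalize(): map normalized values back to physical units."""
    return [v * s + m for v, m, s in zip(values, stats["mean"], stats["stdev"])]


x = [0.6, 1.0, 0.05]                      # hypothetical sample, one value per variable
x_norm = normalize(x, statistics)         # approximately [1.0, -1.0, 0.5]
x_back = denormalize(x_norm, statistics)  # recovers x up to rounding
```

The round trip through normalize and denormalize is what the pre- and postprocessing steps conceptually perform around the model.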

Data Indices

The data indices describe which variables are forcing and which are diagnostic. They are usually built from the dataset's variable-to-index mapping, i.e. ds.name_to_index:

from anemoi.models.data_indices.collection import IndexCollection

name_to_index = {"10u": 0, "10v": 1, "2d": 2, "2t": 3}

# This part is usually defined in the config/data/zarr.yaml file.
data_config = dict(
    data={
        "forcing": ["cos_latitude"],
        "diagnostics": ["tp", "cp"],
        "remapper": [],
    }
)
data_indices = IndexCollection(data_config, name_to_index)
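
As a rough illustration of what such an index collection encodes, the sketch below partitions variable indices by role in plain Python, assuming the usual convention that forcing variables are model inputs only, diagnostic variables are model outputs only, and all remaining (prognostic) variables are both. The name_to_index mapping here is hypothetical and extended so that the forcing and diagnostic variables actually appear in it; this is not the IndexCollection implementation.

```python
# Illustrative only: split variable indices into input/output sets by role.
name_to_index = {"10u": 0, "10v": 1, "2t": 2, "cos_latitude": 3, "tp": 4}  # hypothetical
forcing = ["cos_latitude"]  # input only
diagnostic = ["tp"]         # output only

# Inputs: everything except diagnostic variables.
input_indices = [i for name, i in name_to_index.items() if name not in diagnostic]

# Outputs: everything except forcing variables.
output_indices = [i for name, i in name_to_index.items() if name not in forcing]

# Prognostic variables are both read and predicted by the model.
prognostic_indices = [
    i for name, i in name_to_index.items()
    if name not in forcing and name not in diagnostic
]
```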

Supporting Arrays

The supporting arrays are a dictionary holding the latitudes and longitudes of the grid. They come directly from the dataset, i.e. ds.supporting_arrays:

supporting_arrays = {
    "latitudes": [90.0, 89.0, 88.0],
    "longitudes": [0.0, 1.0, 2.0]
}

Creating the Graph

All our currently implemented models are based on a graph encoder and decoder. The graph is created by the GraphCreator class which is part of Anemoi Graphs.

from anemoi.graphs.create import GraphCreator

graph_config = OmegaConf.load("graph.yaml")
graph_data = GraphCreator(config=graph_config).create()

Initializing the Model

Now that we have all the pieces needed to create the model, we can call the AnemoiModelInterface class.

from anemoi.models.interface import AnemoiModelInterface

model_interface = AnemoiModelInterface(
    statistics=statistics,
    data_indices=data_indices,
    supporting_arrays=supporting_arrays,
    graph_data=graph_data,
    config=model_config,
)

The model interface includes the preprocessor, postprocessor and the actual model (see Overview).

model_interface.preprocessor
model_interface.postprocessor
model_interface.model

Note

During training, the forward pass is performed by the model_interface.forward method, while inference uses model_interface.predict_step. The difference is that forward assumes an already normalized input state and predicts a normalized state, whereas predict_step performs the pre- and postprocessing in addition to the forward pass.

y_norm = model_interface.forward(x_norm), where x_norm and y_norm are normalized.
y = model_interface.predict_step(x), where x and y are in physical (unnormalized) units.
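
The relationship between the two methods can be sketched with a toy stand-in. ToyInterface below is a hypothetical class written for this illustration, not AnemoiModelInterface; its "model" simply halves its input so that the composition of preprocessing, forward pass and postprocessing is easy to follow.

```python
# Illustrative only: predict_step wraps forward with pre- and postprocessing.
class ToyInterface:
    def __init__(self, mean, stdev):
        self.mean = mean
        self.stdev = stdev

    def preprocess(self, x):
        # Physical values -> normalized values.
        return [(v - self.mean) / self.stdev for v in x]

    def postprocess(self, y_norm):
        # Normalized values -> physical values.
        return [v * self.stdev + self.mean for v in y_norm]

    def forward(self, x_norm):
        # Stand-in for the neural network; operates on normalized values.
        return [0.5 * v for v in x_norm]

    def predict_step(self, x):
        # Inference path: normalize, run the model, denormalize.
        return self.postprocess(self.forward(self.preprocess(x)))


toy = ToyInterface(mean=2.0, stdev=4.0)
y_norm = toy.forward([1.0])   # normalized in, normalized out
y = toy.predict_step([6.0])   # physical in, physical out
```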

The PyTorch Model

The model architecture lives in model_interface.model, which is a torch.nn.Module. The model therefore has a forward() function and inherits all the important features for training.

In this example, model_interface.model is the following:

AnemoiModelEncProcDec(
  (encoder_graph_provider): StaticGraphProvider(
    (trainable): TrainableTensor()
  )
  (encoder): GraphTransformerForwardMapper(
    (proc): GraphTransformerMapperBlock(
      (lin_key): Linear(in_features=1024, out_features=1024, bias=True)
      ...
    )
  )
  (processor_graph_provider): StaticGraphProvider(
    (trainable): TrainableTensor()
  )
  (processor): TransformerProcessor(
    ...
  )
  (decoder_graph_provider): StaticGraphProvider(
    (trainable): TrainableTensor()
  )
  (decoder): GraphTransformerBackwardMapper(
    (proc): GraphTransformerMapperBlock(
      (lin_key): Linear(in_features=1024, out_features=1024, bias=True)
      ...
    )
  )
)

Note that each encoder, processor, and decoder has a corresponding *_graph_provider that manages the graph edges and trainable edge parameters. The graph providers supply edge attributes and indices to their corresponding mappers/processors during the forward pass.

Layer Kernels - Switching out Layers

The model interface allows switching out layers in the model. For example, to use a different activation function, change the activation function in the model configuration; the model is then built and trained with the new activation.

This functionality is optional and is meant for testing different layers and architectures. For example, to use the SiLU activation function instead of the default GELU, set the Activation layer kernel on a model component, such as the processor below:

processor:
  _target_: anemoi.models.layers.processor.TransformerProcessor
  num_layers: 16
  num_chunks: 2
  layer_kernels:
    Activation:
      _target_: torch.nn.SiLU

Available Layer Kernels

Overriding layer kernels is entirely optional; every layer has a sensible default. Currently, you can switch out the following layers (with a given key):

Activation function (Activation): Default torch.nn.GELU
Linear layers (Linear): Default torch.nn.Linear
Layer Normalisation (LayerNorm): Default torch.nn.LayerNorm
Query Normalisation (QueryNorm): Default anemoi.models.layers.normalization.AutocastLayerNorm
Key Normalisation (KeyNorm): Default anemoi.models.layers.normalization.AutocastLayerNorm
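
One way to picture how overrides and defaults combine is a simple dictionary merge, sketched below. The DEFAULT_KERNELS table copies the defaults listed above; the function name resolve_layer_kernels is hypothetical, and how anemoi-models performs this merge internally is not shown in this section.

```python
# Illustrative only: user-provided layer kernels override the defaults key by key.
DEFAULT_KERNELS = {
    "Activation": "torch.nn.GELU",
    "Linear": "torch.nn.Linear",
    "LayerNorm": "torch.nn.LayerNorm",
    "QueryNorm": "anemoi.models.layers.normalization.AutocastLayerNorm",
    "KeyNorm": "anemoi.models.layers.normalization.AutocastLayerNorm",
}


def resolve_layer_kernels(overrides=None):
    """Return the target path for each kernel key, preferring user overrides."""
    unknown = set(overrides or {}) - set(DEFAULT_KERNELS)
    if unknown:
        raise KeyError(f"Unknown layer kernel keys: {sorted(unknown)}")
    return {**DEFAULT_KERNELS, **(overrides or {})}


kernels = resolve_layer_kernels({"Activation": "torch.nn.SiLU"})
```

Only the Activation entry changes here; every other key keeps its default.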

These layers can technically accept any type of PyTorch nn.Module that implements a forward pass. The default layers are chosen to be compatible with the model architecture and the training process.

Suitable Alternatives

Examples for suitable alternatives within Anemoi are:

Normalisation Layers (see modules/normalization):

anemoi.models.layers.normalization.AutocastLayerNorm
anemoi.models.layers.normalization.ConditionalLayerNorm

Activation functions (see modules/activations):

anemoi.models.layers.activations.Sine
torch.nn.SiLU, torch.nn.ReLU, or any torch.nn activation

For gated variants (GLU, SwiGLU, GEGLU, ReGLU), use mlp_implementation on the processor/encoder/decoder instead of layer_kernels.Activation.

The _target_ can be any local or installed class (see Hydra documentation [4]).
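
The idea behind a _target_ entry can be mimicked with the standard library alone: split the dotted path into module and attribute, import the module, and call the class with the remaining keys as keyword arguments. The helpers resolve_target and instantiate below are a simplified sketch written for this illustration; Hydra's own instantiate utility does considerably more (for example recursive instantiation), and fractions.Fraction is used only as a convenient stand-in class.

```python
# Illustrative only: a stripped-down version of dotted-path instantiation.
import importlib


def resolve_target(path):
    """Import the module part of 'pkg.module.Class' and return the class."""
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def instantiate(config):
    """Build an object from a dict with a '_target_' key and keyword arguments."""
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    return resolve_target(config["_target_"])(**kwargs)


frac = instantiate(
    {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
)
```

The same lookup works for any importable class, which is why a layer kernel can point at a local module as well as an installed package.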

When to Use Layer Kernels

Layer kernels are particularly useful when:

You need to use specialized implementations for efficiency
You want to experiment with different normalization techniques
You need to customize the behaviour of specific layers in different parts of the model

MLP Implementations

Transformer and GraphTransformer components support selecting the feed-forward implementation via mlp_implementation:

processor:
  mlp_hidden_ratio: 4
  mlp_implementation: mlp  # options: mlp, glu, swiglu, geglu, reglu

encoder:
  mlp_hidden_ratio: 4
  mlp_implementation: mlp

decoder:
  mlp_hidden_ratio: 4
  mlp_implementation: mlp

Recommended mlp_hidden_ratio:

mlp: 4
gated variants (glu, swiglu, geglu, reglu): ~2.67 (for comparable parameter count/compute to mlp:4)
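
The value of about 2.67 follows from a parameter count. The sketch below assumes a standard MLP with two weight matrices (up and down projection) and a gated MLP with three (gate, up and down projection), ignores biases, and assumes the hidden width is the channel count times mlp_hidden_ratio rounded down; these are assumptions about the general technique rather than statements about the exact anemoi-models layers.

```python
# Illustrative only: compare weight counts of a plain and a gated feed-forward block.
def mlp_params(num_channels, hidden_ratio):
    hidden = int(num_channels * hidden_ratio)
    return 2 * num_channels * hidden  # up-projection + down-projection


def gated_mlp_params(num_channels, hidden_ratio):
    hidden = int(num_channels * hidden_ratio)
    return 3 * num_channels * hidden  # gate + up-projection + down-projection


d = 1024
plain = mlp_params(d, 4)                   # 2 * 1024 * 4096
gated_same_ratio = gated_mlp_params(d, 4)  # 1.5x the plain MLP
gated_matched = gated_mlp_params(d, 8 / 3) # ratio of about 2.67

ratio_same = gated_same_ratio / plain
ratio_matched = gated_matched / plain
```

With the ratio kept at 4, the gated block carries 1.5 times the weights of the plain MLP; scaling the ratio by 2/3 (4 × 2/3 ≈ 2.67) brings the count back to within a fraction of a percent.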

For GNN components, the same mlp_implementation and mlp_hidden_ratio options are available. GNNs additionally support mlp_extra_layers to control MLP depth (number of hidden layers).

Footnotes