Create your first model
This section describes how to create an existing model from the
anemoi-models package.
In this example we show how to create an instance of the Encoder-Processor-Decoder that uses a Graph Transformer for the encoder and decoder and a sliding window transformer [1] for the processor.
Our implemented models are instantiated by omegaconf [2] and hydra
[3]. Commonly used model configurations can be found in
configs/models (see Configuration Basics).
Model Configuration
First, let’s take the model configuration transformer.yaml:
model:
_target_: anemoi.models.models.encoder_processor_decoder.AnemoiModelEncProcDec
num_channels: 1024
processor:
_target_: anemoi.models.layers.processor.TransformerProcessor
num_layers: 16
num_chunks: 2
encoder:
_target_: anemoi.models.layers.mapper.GraphTransformerForwardMapper
trainable_size: 8
sub_graph_edge_attributes: ${model.attributes.edges}
num_chunks: 1
mlp_hidden_ratio: 4
num_heads: 16
decoder:
_target_: anemoi.models.layers.mapper.GraphTransformerBackwardMapper
trainable_size: 8
sub_graph_edge_attributes: ${model.attributes.edges}
num_chunks: 1
mlp_hidden_ratio: 4
num_heads: 16
residual:
_target_: anemoi.models.layers.residual.SkipConnection
attributes:
edges:
- edge_length
- edge_dirs
nodes: []
Typically the model is instantiated in Anemoi Training or Anemoi Inference. For this example we will load the model configuration by itself to understand the different components needed to create a model.
from omegaconf import OmegaConf
model_config = OmegaConf.load("transformer.yaml")
Define statistics, data indices and supporting arrays
As described in Overview, we want to create a model interface that can be used for training and inference. For that we need to create the statistics, data indices and supporting arrays which is required for the pre- and postprocessing. These attributes are provided by the Welcome to anemoi-datasets documentation!.
Statistics
The statistics are simply stored in a dictionary with the mean,
stdev, maximum and minimum of the variables. They are usually loaded
from the dataset, i.e. ds.statistics:
statistics = {
"mean": [0.5, 1.1, 0.0],
"stdev": [0.1, 0.1, 0.1],
"maximum": [1.0, 1.0, 1.0],
"minimum": [0.0, 0.0, 0.0],
}
Data Indices
Data indices is a dictionary with the forcing and diagnostic
variables. They are usually created from the dataset, i.e.
ds.name_to_index:
from anemoi.models.data_indices.collection import IndexCollection
name_to_index = {"10u": 0, "10v": 1, "2d": 2, "2t": 3}
# This part is usually defined in the config/data/zarr.yaml file.
data_config = dict(
data={
"forcing": ["cos_latitude"],
"diagnostics": ["tp", "cp"],
"remapper": [],
}
)
data_indices = IndexCollection(data_config, name_to_index)
Supporting Arrays
Supporting arrays is a dictionary with the latitudes and longitudes
of the grid and naturally comes from the dataset, i.e.
ds.supporting_arrays.
supporting_arrays = {
"latitudes": [90.0, 89.0, 88.0],
"longitudes": [0.0, 1.0, 2.0]
}
Creating the Graph
All our currently implemented models are based on a graph encoder and
decoder. The graph is created by the GraphCreator class which is
part of Anemoi Graphs.
from anemoi.models.graphs.create import GraphCreator
graph_config = OmegaConf.load("graph.yaml")
graph_data = GraphCreator(config=graph_config).create()
Initializing the Model
Now that we have all the pieces needed to create the model, we can call
the AnemoiModelInterface class.
from anemoi.models.interface import AnemoiModelInterface
model_interface = AnemoiModelInterface(
statistics=statistics,
data_indices=data_indices,
supporting_arrays=supporting_arrays,
graph_data=graph_data,
config=model_config,
)
The model interface includes the preprocessor, postprocessor and the actual model (see Overview).
model_interface.preprocessor
model_interface.postprocessor
model_interface.model
Note
During training the forward pass is done by the
model_interface.forward method while during inference the
model_interface.predict_step. Their difference is that the
forward function assumes an already normalized state and predicts the
normalized state while the predict_step performs the pre- and
post-processing in addition to the forward step.
y_norm = model_interface.forward(x_norm)withx_inandy_predare normalized.y = model_interface.predict_step(x)withxandyare absolute values.
The PyTorch Model
The model architecture is in model_interface.model which is a
pytorch.nn.Module. The model therefore has a forward() function
and inherits all the important features for training.
In this example, model_interface.model is the following:
AnemoiModelEncProcDec(
(encoder_graph_provider): StaticGraphProvider(
(trainable): TrainableTensor()
)
(encoder): GraphTransformerForwardMapper(
(proc): GraphTransformerMapperBlock(
(lin_key): Linear(in_features=1024, out_features=1024, bias=True)
...
)
)
(processor_graph_provider): StaticGraphProvider(
(trainable): TrainableTensor()
)
(processor): TransformerProcessor(
...
)
(decoder_graph_provider): StaticGraphProvider(
(trainable): TrainableTensor()
)
(decoder): GraphTransformerBackwardMapper(
(proc): GraphTransformerMapperBlock(
(lin_key): Linear(in_features=1024, out_features=1024, bias=True)
...
)
)
Note that each encoder, processor, and decoder has a corresponding
*_graph_provider that manages the graph edges and trainable edge
parameters. The graph providers supply edge attributes and indices to
their corresponding mappers/processors during the forward pass.
Layer Kernels - Switching out Layers
The model interface allows switching out layers in the model. For example, if you want to use a different activation function, you can simply change the activation function in the model configuration. Anemoi will automatically train the model with the new activation function.
This functionality is optional and can be used to test different layers
and architectures. The model interface will automatically create the new
model with the new layer. For example, if you want to use the Sine
activation function instead of the GELU activation function, you can
simply change the activation function in a model component, like in the
processor below:
processor:
_target_: anemoi.models.layers.processor.TransformerProcessor
num_layers: 16
num_chunks: 2
layer_kernels:
Activation:
_target_: torch.nn.SiLU
Available Layer Kernels
This is entirely optional and uses sensible defaults for each layer. Currently, you can switch out the following layers (with a given key):
Activation function (
Activation): Defaulttorch.nn.GELULinear layers (
Linear): Defaulttorch.nn.LinearLayer Normalisation (
LayerNorm): Defaulttorch.nn.LayerNormQuery Normalisation (
QueryNorm): Defaultanemoi.models.layers.normalization.AutocastLayerNormKey Normalisation (
KeyNorm): Defaultanemoi.models.layers.normalization.AutocastLayerNorm
These layers can technically accept any type of PyTorch nn.Module
that implements a forward pass. The default layers are chosen to be
compatible with the model architecture and the training process.
Suitable Alternatives
Examples for suitable alternatives within Anemoi are:
Normalisation Layers (see modules/normalization):
anemoi.models.layers.normalization.AutocastLayerNormanemoi.models.layers.normalization.ConditionalLayerNorm
Activation functions (see modules/activations):
anemoi.models.layers.activations.Sinetorch.nn.SiLU,torch.nn.ReLU, or anytorch.nnactivation
For gated variants (GLU, SwiGLU, GEGLU, ReGLU), use mlp_implementation
on the processor/encoder/decoder instead of layer_kernels.Activation.
The _target_ can be any local or installed class (see Hydra
documentation [4]).
When to Use Layer Kernels
Layer kernels are particularly useful when:
You need to use specialized implementations for efficiency
You want to experiment with different normalization techniques
You need to customize the behaviour of specific layers in different parts of the model
MLP Implementations
Transformer and GraphTransformer components support selecting the
feed-forward implementation via mlp_implementation:
processor:
mlp_hidden_ratio: 4
mlp_implementation: mlp # options: mlp, glu, swiglu, geglu, reglu
encoder:
mlp_hidden_ratio: 4
mlp_implementation: mlp
decoder:
mlp_hidden_ratio: 4
mlp_implementation: mlp
Recommended mlp_hidden_ratio:
mlp:4gated variants (
glu,swiglu,geglu,reglu):~2.67(for comparable parameter count/compute tomlp:4)
For GNN components, the same mlp_implementation and
mlp_hidden_ratio options are available. GNNs additionally support
mlp_extra_layers to control MLP depth (number of hidden layers).
Footnotes