Models

The models module provides several neural network architectures that work with graph input data and follow an encoder-processor-decoder structure.

Encoder-Processor-Decoder Model

This model defines a network architecture with configurable encoder, processor, and decoder components (Lang et al., 2024a).

class anemoi.models.models.encoder_processor_decoder.AnemoiModelEncProcDec(*, model_config: DictConfig, data_indices: dict, statistics: dict, n_step_input: int, n_step_output: int, graph_data: HeteroData)

Bases: BaseGraphModel

Message passing graph neural network.

forward(x: dict[str, Tensor], *, model_comm_group: ProcessGroup | None = None, grid_shard_sizes: dict[str, list[int] | None] | None = None, **kwargs) → dict[str, Tensor]

Forward pass of the model.

Parameters:

x (dict[str, Tensor]) – Input data
model_comm_group (Optional[ProcessGroup], optional) – Model communication group, by default None
grid_shard_sizes (DatasetShardSizes, optional) – Per-dataset shard sizes for the grid dimension. None means the corresponding dataset is replicated, not sharded.

Returns:

Output of the model, with the same shape as the input (sharded if input is sharded)

Return type:

dict[str, Tensor]
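
The grid_shard_sizes argument maps each dataset name to a list of shard sizes, or to None when that dataset is replicated. The sketch below only illustrates the shape of such a mapping. The helper even_shard_sizes, the dataset names, and the even-split rule are assumptions made for illustration; how the library actually assigns shards is not stated here.

```python
def even_shard_sizes(n_grid, world_size):
    """Split n_grid points into world_size contiguous shards, as evenly as possible.

    Hypothetical helper, not part of anemoi.models.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    base, extra = divmod(n_grid, world_size)
    # The first `extra` ranks take one additional grid point each.
    return [base + 1 if rank < extra else base for rank in range(world_size)]


# Hypothetical dataset names; None marks a dataset that is replicated, not sharded.
grid_shard_sizes = {
    "dataset_a": even_shard_sizes(40320, 4),
    "dataset_b": None,
}
```

Whatever rule produces them, the sizes for a sharded dataset should sum to the full grid dimension of that dataset.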

fill_metadata(md_dict) → None

To be implemented in subclasses to fill model-specific metadata.

Residual connections (including graph-based truncation) are configured in the model config; see Residual connections for details.

Ensemble Encoder-Processor-Decoder Model

The ensemble model architecture implements the AIFS-CRPS approach (Lang et al., 2024b).

Key features:

Based on the base encoder-processor-decoder architecture
Injects noise in the processor for each ensemble member using anemoi.models.layers.normalization.ConditionalLayerNorm
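
The idea of injecting noise through a conditional layer norm can be sketched in plain Python. This is a generic illustration of the technique, not the implementation of anemoi.models.layers.normalization.ConditionalLayerNorm. The function name, the weight matrices, and the choice of a linear map from noise to scale and shift are all assumptions.

```python
import math
import random


def conditional_layer_norm(x, cond, w_scale, w_shift, eps=1e-5):
    """Normalise x, then scale and shift it with values derived from cond.

    Generic sketch; all names here are hypothetical.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    x_hat = [(v - mean) / math.sqrt(var + eps) for v in x]
    # One scale and one shift per feature, each a linear function of cond.
    scale = [1.0 + sum(w * c for w, c in zip(row, cond)) for row in w_scale]
    shift = [sum(w * c for w, c in zip(row, cond)) for row in w_shift]
    return [s * h + b for s, h, b in zip(scale, x_hat, shift)]


rng = random.Random(0)
features = [0.5, -1.0, 2.0, 0.25]
w_scale = [[0.1, -0.2], [0.0, 0.3], [0.2, 0.1], [-0.1, 0.0]]
w_shift = [[0.05, 0.0], [0.0, 0.05], [0.1, -0.1], [0.0, 0.2]]

# Each ensemble member draws its own noise vector, so the same input
# produces a different output for each member.
members = [
    conditional_layer_norm(
        features, [rng.gauss(0.0, 1.0) for _ in range(2)], w_scale, w_shift
    )
    for _ in range(3)
]
```

With a zero conditioning vector the sketch reduces to an ordinary layer norm without learned parameters, which makes the effect of the noise easy to isolate.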

class anemoi.models.models.ens_encoder_processor_decoder.AnemoiEnsModelEncProcDec(*, model_config: DictConfig, data_indices: dict, statistics: dict, graph_data: HeteroData, n_step_input: int, n_step_output: int)

Bases: AnemoiModelEncProcDec

Message passing graph neural network with ensemble functionality.

forward(x: dict[str, Tensor], *, fcstep: int, model_comm_group: ProcessGroup | None = None, grid_shard_sizes: dict[str, list[int] | None] | None = None, **kwargs) → dict[str, Tensor]

Forward operator.

Parameters:

x (dict[str, torch.Tensor]) – Input tensor, shape (bs, m, e, n, f)
fcstep (int) – Forecast step
model_comm_group (ProcessGroup, optional) – Model communication group
grid_shard_sizes (DatasetShardSizes, optional) – Per-dataset shard sizes for the grid dimension. None means the corresponding dataset is replicated, not sharded.
**kwargs – Additional keyword arguments

Returns:

Output tensor per dataset

Return type:

dict[str, Tensor]

For the training-side CRPS setup, including loss, truncation, and ensemble-specific configuration changes, see Ensemble CRPS-based training.

Hierarchical Encoder-Processor-Decoder Model

This model extends the standard encoder-processor-decoder architecture by introducing a hierarchical processor.

Key features:

Requires a predefined list of hidden nodes, [hidden_1, …, hidden_n]
Nodes must be sorted to match the expected flow of information: data -> hidden_1 -> … -> hidden_n -> … -> hidden_1 -> data
Supports hierarchical level processing through the enable_hierarchical_level_processing configuration. This argument determines whether a processor is added at each hierarchy level or only at the final level.
Channel scaling: 2^n * config.num_channels where n is the hierarchy level
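
The node ordering and the effect of enable_hierarchical_level_processing described above can be sketched as follows. Both helper functions are hypothetical and exist only to make the ordering concrete. Reading "final level" as the deepest hidden level is an assumption.

```python
def information_flow(hidden_nodes, data="data"):
    """Return the node sets visited going down the hierarchy and back up.

    Hypothetical helper, not part of anemoi.models.
    """
    if not hidden_nodes:
        raise ValueError("at least one hidden node set is required")
    down = [data, *hidden_nodes]
    # The deepest level is visited once; the way back mirrors the way down.
    up = list(reversed(hidden_nodes[:-1])) + [data]
    return down + up


def levels_with_processor(hidden_nodes, enable_hierarchical_level_processing):
    """List the levels that get a processor (assumed: deepest level only when disabled)."""
    if enable_hierarchical_level_processing:
        return list(hidden_nodes)
    return list(hidden_nodes[-1:])


flow = information_flow(["hidden_1", "hidden_2", "hidden_3"])
```

For three hidden levels, flow lists data, hidden_1, hidden_2, hidden_3, hidden_2, hidden_1, data, matching the order given in the list above.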

By default, the number of channels for the mappers is 2^n * config.num_channels, where n is the hierarchy level. The channel count therefore doubles with each level, so processing capacity grows with the depth of the hierarchy.
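
As a worked example of the 2^n * config.num_channels rule, the sketch below computes the channel count per level. The function name is hypothetical, and starting the level index n at 0 for the first hidden level is an assumption.

```python
def channels_per_level(num_channels, n_levels):
    """Channel count at each hierarchy level, following 2^n * num_channels.

    Hypothetical helper; assumes n starts at 0 for the first hidden level.
    """
    return [num_channels * 2**n for n in range(n_levels)]


widths = channels_per_level(128, 3)
```

With num_channels set to 128 and three levels, this gives 128, 256, and 512 channels.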

class anemoi.models.models.hierarchical.AnemoiModelEncProcDecHierarchical(*, model_config: DictConfig, data_indices: dict, statistics: dict, n_step_input: int, n_step_output: int, graph_data: HeteroData)

Bases: AnemoiModelEncProcDec

Message passing hierarchical graph neural network.

forward(x: dict[str, Tensor], model_comm_group: ProcessGroup | None = None, grid_shard_sizes: dict[str, list[int] | None] | None = None, **kwargs) → dict[str, Tensor]

Forward pass of the model.

Parameters:

x (dict[str, Tensor]) – Input data
model_comm_group (Optional[ProcessGroup], optional) – Model communication group, by default None
grid_shard_sizes (DatasetShardSizes, optional) – Per-dataset shard sizes for the grid dimension. None means the corresponding dataset is replicated, not sharded.

Returns:

Output of the model, with the same shape as the input (sharded if input is sharded)

Return type:

dict[str, Tensor]