Layers

Environment Variables

ANEMOI_INFERENCE_NUM_CHUNKS

This environment variable controls the number of chunks used in the Mapper during inference. Setting it allows the model to split large computations into a given number of smaller chunks, reducing memory overhead. If unset, it defaults to 1, i.e. no chunking. See pull request #46.
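
For example, the variable can be set from Python before the model's mapper layers run (a minimal sketch):

    import os

    # Split large mapper computations into 4 chunks during inference,
    # trading a small amount of speed for lower peak memory.
    os.environ["ANEMOI_INFERENCE_NUM_CHUNKS"] = "4"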

Mappers

class anemoi.models.layers.mapper.BaseMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, out_channels_dst: int | None = None, cpu_offload: bool = False, activation: str = 'SiLU', **kwargs)

Bases: Module, ABC

Base Mapper from source dimension to destination dimension.

pre_process(x, shard_shapes, model_comm_group=None) tuple[Tensor, Tensor, tuple[int], tuple[int]]

Pre-processing for the Mappers.

Splits the tuples into src and dst nodes and shapes as the base operation.

Parameters:
  • x (Tuple[Tensor]) – Data containing source and destination nodes and edges.

  • shard_shapes (Tuple[Tuple[int], Tuple[int]]) – Shapes of the sharded source and destination nodes.

  • model_comm_group (ProcessGroup) – Process group of GPUs that work together on one model instance

Returns:

Source nodes, destination nodes, sharded source node shapes, sharded destination node shapes

Return type:

Tuple[Tensor, Tensor, Tuple[int], Tuple[int]]

post_process(x_dst, shapes_dst, model_comm_group=None)

Post-processing for the mapper.

class anemoi.models.layers.mapper.BackwardMapperPostProcessMixin

Bases: object

Post-processing for Backward Mapper from hidden -> data.

class anemoi.models.layers.mapper.ForwardMapperPreProcessMixin

Bases: object

Pre-processing for Forward Mapper from data -> hidden.

class anemoi.models.layers.mapper.GraphTransformerBaseMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, trainable_size: int = 8, out_channels_dst: int | None = None, num_chunks: int = 1, cpu_offload: bool = False, activation: str = 'GELU', num_heads: int = 16, mlp_hidden_ratio: int = 4, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, layer_kernels: DotDict = None)

Bases: GraphEdgeMixin, BaseMapper

Graph Transformer Base Mapper from hidden -> data or data -> hidden.

forward(x: Tuple[Tensor, Tensor], batch_size: int, shard_shapes: tuple[tuple[int], tuple[int]], model_comm_group: ProcessGroup | None = None) Tuple[Tensor, Tensor]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.mapper.GraphTransformerForwardMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, trainable_size: int = 8, out_channels_dst: int | None = None, num_chunks: int = 1, cpu_offload: bool = False, activation: str = 'GELU', num_heads: int = 16, mlp_hidden_ratio: int = 4, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, layer_kernels: DotDict = None)

Bases: ForwardMapperPreProcessMixin, GraphTransformerBaseMapper

Graph Transformer Mapper from data -> hidden.

forward(x: Tuple[Tensor, Tensor], batch_size: int, shard_shapes: tuple[tuple[int], tuple[int]], model_comm_group: ProcessGroup | None = None) Tuple[Tensor, Tensor]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
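
A construction sketch, assuming a toy PyTorch Geometric subgraph with a single hypothetical edge attribute (edge_length) and the default layer kernels; in practice the subgraph comes from anemoi-graphs:

    import torch
    from torch_geometric.data import HeteroData

    from anemoi.models.layers.mapper import GraphTransformerForwardMapper
    from anemoi.models.layers.utils import load_layer_kernels

    # Hypothetical toy graph: 6 "data" nodes connected to 3 "hidden" nodes.
    graph = HeteroData()
    graph["data", "to", "hidden"].edge_index = torch.tensor(
        [[0, 1, 2, 3, 4, 5], [0, 0, 1, 1, 2, 2]]
    )
    graph["data", "to", "hidden"].edge_length = torch.randn(6, 1)  # illustrative attribute

    mapper = GraphTransformerForwardMapper(
        in_channels_src=4,
        in_channels_dst=4,
        hidden_dim=64,  # must be divisible by num_heads
        num_heads=4,
        sub_graph=graph["data", "to", "hidden"],
        sub_graph_edge_attributes=["edge_length"],
        layer_kernels=load_layer_kernels(),
    )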

class anemoi.models.layers.mapper.GraphTransformerBackwardMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, trainable_size: int = 8, out_channels_dst: int | None = None, num_chunks: int = 1, cpu_offload: bool = False, activation: str = 'GELU', num_heads: int = 16, mlp_hidden_ratio: int = 4, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, layer_kernels: DotDict = None)

Bases: BackwardMapperPostProcessMixin, GraphTransformerBaseMapper

Graph Transformer Mapper from hidden -> data.

pre_process(x, shard_shapes, model_comm_group=None)

Pre-processing for the Mappers.

Splits the tuples into src and dst nodes and shapes as the base operation.

Parameters:
  • x (Tuple[Tensor]) – Data containing source and destination nodes and edges.

  • shard_shapes (Tuple[Tuple[int], Tuple[int]]) – Shapes of the sharded source and destination nodes.

  • model_comm_group (ProcessGroup) – Process group of GPUs that work together on one model instance

Returns:

Source nodes, destination nodes, sharded source node shapes, sharded destination node shapes

Return type:

Tuple[Tensor, Tensor, Tuple[int], Tuple[int]]

class anemoi.models.layers.mapper.GNNBaseMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, trainable_size: int = 8, out_channels_dst: int | None = None, num_chunks: int = 1, cpu_offload: bool = False, activation: str = 'SiLU', mlp_extra_layers: int = 0, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, layer_kernels: DotDict = None)

Bases: GraphEdgeMixin, BaseMapper

Base for Graph Neural Network Mapper from hidden -> data or data -> hidden.

forward(x: Tuple[Tensor, Tensor], batch_size: int, shard_shapes: tuple[tuple[int], tuple[int]], model_comm_group: ProcessGroup | None = None) Tuple[Tensor, Tensor]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.mapper.GNNForwardMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, trainable_size: int = 8, out_channels_dst: int | None = None, num_chunks: int = 1, cpu_offload: bool = False, activation: str = 'SiLU', mlp_extra_layers: int = 0, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, layer_kernels: DotDict = None)

Bases: ForwardMapperPreProcessMixin, GNNBaseMapper

Graph Neural Network Mapper data -> hidden.

class anemoi.models.layers.mapper.GNNBackwardMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, trainable_size: int = 8, out_channels_dst: int | None = None, num_chunks: int = 1, cpu_offload: bool = False, activation: str = 'SiLU', mlp_extra_layers: int = 0, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, layer_kernels: DotDict = None)

Bases: BackwardMapperPostProcessMixin, GNNBaseMapper

Graph Neural Network Mapper from hidden -> data.

pre_process(x, shard_shapes, model_comm_group=None)

Pre-processing for the Mappers.

Splits the tuples into src and dst nodes and shapes as the base operation.

Parameters:
  • x (Tuple[Tensor]) – Data containing source and destination nodes and edges.

  • shard_shapes (Tuple[Tuple[int], Tuple[int]]) – Shapes of the sharded source and destination nodes.

  • model_comm_group (ProcessGroup) – Process group of GPUs that work together on one model instance

Returns:

Source nodes, destination nodes, sharded source node shapes, sharded destination node shapes

Return type:

Tuple[Tensor, Tensor, Tuple[int], Tuple[int]]

forward(x: Tuple[Tensor, Tensor], batch_size: int, shard_shapes: tuple[tuple[int], tuple[int]], model_comm_group: ProcessGroup | None = None) Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Processors

class anemoi.models.layers.processor.BaseProcessor(num_layers: int, *args, num_channels: int = 128, num_chunks: int = 2, activation: str = 'GELU', cpu_offload: bool = False, **kwargs)

Bases: Module, ABC

Base Processor.

build_layers(processor_chunk_class, *args, **kwargs) None

Build Layers.

run_layers(data: tuple, *args, **kwargs) Tensor

Run Layers with checkpoint.

forward(x: Tensor, *args, **kwargs) Tensor

Example forward pass.

class anemoi.models.layers.processor.TransformerProcessor(num_layers: int, layer_kernels: DotDict, *args, window_size: int | None = None, num_channels: int = 128, num_chunks: int = 2, activation: str = 'GELU', cpu_offload: bool = False, num_heads: int = 16, mlp_hidden_ratio: int = 4, dropout_p: float = 0.1, attention_implementation: str = 'flash_attention', softcap: float = 0.0, use_alibi_slopes: bool = False, **kwargs)

Bases: BaseProcessor

Transformer Processor.

forward(x: Tensor, batch_size: int, shard_shapes: tuple[tuple[int], ...], model_comm_group: ProcessGroup | None = None, *args, **kwargs) Tensor

Example forward pass.
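
A construction sketch with the default layer kernels; scaled dot product attention is chosen so the example does not require the flash-attn package, and all sizes are illustrative:

    from anemoi.models.layers.processor import TransformerProcessor
    from anemoi.models.layers.utils import load_layer_kernels

    processor = TransformerProcessor(
        num_layers=2,
        layer_kernels=load_layer_kernels(),
        window_size=512,
        num_channels=128,
        num_chunks=1,
        num_heads=8,
        attention_implementation="scaled_dot_product_attention",
    )
    # forward expects (x, batch_size, shard_shapes, model_comm_group), where
    # shard_shapes describes how the sequence is split across model-parallel GPUs.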

class anemoi.models.layers.processor.GNNProcessor(num_layers: int, layer_kernels: DotDict, *args, trainable_size: int = 8, num_channels: int = 128, num_chunks: int = 2, mlp_extra_layers: int = 0, activation: str = 'SiLU', cpu_offload: bool = False, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, **kwargs)

Bases: GraphEdgeMixin, BaseProcessor

GNN Processor.

forward(x: Tensor, batch_size: int, shard_shapes: tuple[tuple[int], tuple[int]], model_comm_group: ProcessGroup | None = None) Tensor

Example forward pass.

class anemoi.models.layers.processor.GraphTransformerProcessor(num_layers: int, layer_kernels: DotDict, trainable_size: int = 8, num_channels: int = 128, num_chunks: int = 2, num_heads: int = 16, mlp_hidden_ratio: int = 4, activation: str = 'GELU', cpu_offload: bool = False, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, **kwargs)

Bases: GraphEdgeMixin, BaseProcessor

Graph Transformer Processor.

forward(x: Tensor, batch_size: int, shard_shapes: tuple[tuple[int], tuple[int]], model_comm_group: ProcessGroup | None = None, *args, **kwargs) Tensor

Example forward pass.

Chunks

class anemoi.models.layers.chunk.BaseProcessorChunk(num_channels: int, num_layers: int, *args, activation: str = 'GELU', **kwargs)

Bases: Module, ABC

Base Processor Chunk.

build_blocks(block: Module, *args, **kwargs) None

Build Layers.

abstract forward(x: Tensor, shapes: list, batch_size: int, model_comm_group: ProcessGroup | None = None) Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.chunk.TransformerProcessorChunk(num_channels: int, num_layers: int, layer_kernels: DotDict, window_size: int, num_heads: int = 16, mlp_hidden_ratio: int = 4, activation: str = 'GELU', dropout_p: float = 0.0, attention_implementation: str = 'flash_attention', softcap: float = None, use_alibi_slopes: bool = None)

Bases: BaseProcessorChunk

Wraps transformer blocks for checkpointing in Processor.

forward(x: Tensor, shapes: list, batch_size: int, model_comm_group: ProcessGroup | None = None) Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.chunk.GNNProcessorChunk(num_channels: int, num_layers: int, layer_kernels: DotDict, mlp_extra_layers: int = 0, activation: str = 'SiLU', edge_dim: int | None = None)

Bases: BaseProcessorChunk

Wraps edge embedding message passing blocks for checkpointing in Processor.

forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None) Tuple[Tensor, Tensor | None]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.chunk.GraphTransformerProcessorChunk(num_channels: int, num_layers: int, layer_kernels: DotDict, num_heads: int = 16, mlp_hidden_ratio: int = 4, activation: str = 'GELU', edge_dim: int | None = None)

Bases: BaseProcessorChunk

Wraps graph transformer blocks for checkpointing in Processor.

forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, batch_size: int, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None) Tuple[Tensor, Tensor | None]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Blocks

class anemoi.models.layers.block.BaseBlock(**kwargs)

Bases: Module, ABC

Base class for network blocks.

abstract forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, batch_size: int, size: Tuple[int, int] | None = None, model_comm_group: ProcessGroup | None = None) tuple[Tensor, Tensor]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.block.TransformerProcessorBlock(num_channels: int, hidden_dim: int, num_heads: int, activation: str, window_size: int, layer_kernels: DotDict, dropout_p: float = 0.0, attention_implementation: str = 'flash_attention', softcap: float = None, use_alibi_slopes: bool = None)

Bases: BaseBlock

Transformer block with MultiHeadSelfAttention and MLPs.

forward(x: Tensor, shapes: list, batch_size: int, model_comm_group: ProcessGroup | None = None) Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.block.GraphConvBaseBlock(in_channels: int, out_channels: int, layer_kernels: DotDict, mlp_extra_layers: int = 0, activation: str = 'SiLU', update_src_nodes: bool = True, num_chunks: int = 1, **kwargs)

Bases: BaseBlock

Message passing block with MLPs for node embeddings.

abstract forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None) tuple[Tensor, Tensor]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.block.GraphConvProcessorBlock(in_channels: int, out_channels: int, layer_kernels: DotDict, mlp_extra_layers: int = 0, activation: str = 'SiLU', update_src_nodes: bool = True, num_chunks: int = 1, **kwargs)

Bases: GraphConvBaseBlock

forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None) tuple[Tensor, Tensor]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.block.GraphConvMapperBlock(in_channels: int, out_channels: int, layer_kernels: DotDict, mlp_extra_layers: int = 0, activation: str = 'SiLU', update_src_nodes: bool = True, num_chunks: int = 1, **kwargs)

Bases: GraphConvBaseBlock

forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None) tuple[Tensor, Tensor]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.block.GraphTransformerBaseBlock(in_channels: int, hidden_dim: int, out_channels: int, edge_dim: int, layer_kernels: DotDict, num_heads: int = 16, bias: bool = True, activation: str = 'GELU', num_chunks: int = 1, update_src_nodes: bool = False, **kwargs)

Bases: BaseBlock, ABC

Message passing block with MLPs for node embeddings.

shard_qkve_heads(query: Tensor, key: Tensor, value: Tensor, edges: Tensor, shapes: tuple, batch_size: int, model_comm_group: ProcessGroup | None = None) tuple[Tensor, Tensor, Tensor, Tensor]

Shards the query, key, value and edge tensors along the head dimension.

shard_output_seq(out: Tensor, shapes: tuple, batch_size: int, model_comm_group: ProcessGroup | None = None) Tensor

Shards the tensor along the sequence dimension.

abstract forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, batch_size: int, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.block.GraphTransformerMapperBlock(in_channels: int, hidden_dim: int, out_channels: int, edge_dim: int, layer_kernels: DotDict, num_heads: int = 16, bias: bool = True, activation: str = 'GELU', num_chunks: int = 1, update_src_nodes: bool = False, **kwargs)

Bases: GraphTransformerBaseBlock

Graph Transformer Block for node embeddings.

forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, batch_size: int, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.block.GraphTransformerProcessorBlock(in_channels: int, hidden_dim: int, out_channels: int, edge_dim: int, layer_kernels: DotDict, num_heads: int = 16, bias: bool = True, activation: str = 'GELU', num_chunks: int = 1, update_src_nodes: bool = False, **kwargs)

Bases: GraphTransformerBaseBlock

Graph Transformer Block for node embeddings.

forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, batch_size: int, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Graph

class anemoi.models.layers.graph.TrainableTensor(tensor_size: int, trainable_size: int)

Bases: Module

Trainable Tensor Module.

forward(x: Tensor, batch_size: int) Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
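
A minimal sketch: the module holds a learnable (tensor_size, trainable_size) parameter which the forward pass appends to the input after repeating it across the batch (the output shape below is an assumption based on that behaviour):

    import torch

    from anemoi.models.layers.graph import TrainableTensor

    tt = TrainableTensor(tensor_size=100, trainable_size=8)

    x = torch.randn(100, 2)    # fixed per-node attributes, e.g. coordinates
    out = tt(x, batch_size=4)  # expected shape: (4 * 100, 2 + 8)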

class anemoi.models.layers.graph.NamedNodesAttributes(num_trainable_params: int, graph_data: HeteroData)

Bases: Module

Named Nodes Attributes information.

num_nodes

Number of nodes for each group of nodes.

Type:

dict[str, int]

attr_ndims

Total dimension of node attributes (non-trainable + trainable) for each group of nodes.

Type:

dict[str, int]

trainable_tensors

Dictionary of trainable tensors for each group of nodes.

Type:

nn.ModuleDict

define_fixed_attributes(graph_data: HeteroData, num_trainable_params: int) None

Define fixed attributes.

register_coordinates(name: str, node_coords: Tensor) None

Register coordinates.

get_coordinates(name: str) Tensor

Return original coordinates.

register_tensor(name: str, num_trainable_params: int) None

Register a trainable tensor.

forward(name: str, batch_size: int) Tensor

Returns the node attributes to be passed through the graph neural network.

It includes both the coordinates and the trainable parameters.
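
A sketch assuming, as in graphs built with anemoi-graphs, that each node group stores its coordinates in the x field of a HeteroData object:

    import torch
    from torch_geometric.data import HeteroData

    from anemoi.models.layers.graph import NamedNodesAttributes

    graph = HeteroData()
    graph["data"].x = torch.rand(100, 2)   # per-node coordinates
    graph["hidden"].x = torch.rand(20, 2)

    node_attrs = NamedNodesAttributes(num_trainable_params=8, graph_data=graph)
    feats = node_attrs("hidden", batch_size=2)  # coordinates plus trainable features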

Conv

class anemoi.models.layers.conv.GraphConv(in_channels: int, out_channels: int, layer_kernels: DotDict, mlp_extra_layers: int = 0, activation: str = 'SiLU', **kwargs)

Bases: MessagePassing

Message passing module for convolutional node and edge interactions.

forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, size: Tuple[int, int] | None = None)

Runs the forward pass of the module.

message(x_i: Tensor, x_j: Tensor, edge_attr: Tensor, dim_size: int | None = None) Tensor

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g. x_i and x_j.

aggregate(edges_new: Tensor, edge_index: Tensor | SparseTensor, dim_size: int | None = None) tuple[Tensor, Tensor]

Aggregates messages from neighbors as \(\bigoplus_{j \in \mathcal{N}(i)}\).

Takes in the output of message computation as first argument and any argument which was initially passed to propagate().

By default, this function will delegate its call to the underlying Aggregation module to reduce messages as specified in __init__() by the aggr argument.
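
A minimal single-process sketch; sizes are illustrative, and, per the aggregate signature above, the call is assumed to return both updated node features and the new edge embeddings:

    import torch

    from anemoi.models.layers.conv import GraphConv
    from anemoi.models.layers.utils import load_layer_kernels

    conv = GraphConv(in_channels=16, out_channels=16, layer_kernels=load_layer_kernels())

    x = torch.randn(10, 16)                            # node features
    edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])  # three directed edges
    edge_attr = torch.randn(3, 16)                     # one embedding per edge

    out, edges_new = conv((x, x), edge_attr, edge_index, size=(10, 10))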

class anemoi.models.layers.conv.GraphTransformerConv(out_channels: int, dropout: float = 0.0, **kwargs)

Bases: MessagePassing

Message passing part of graph transformer operator.

Adapted from ‘Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification’ (https://arxiv.org/abs/2009.03509)

forward(query: Tensor, key: Tensor, value: Tensor, edge_attr: Tensor | None, edge_index: Tensor | SparseTensor, size: Tuple[int, int] | None = None)

Runs the forward pass of the module.

message(heads: int, query_i: Tensor, key_j: Tensor, value_j: Tensor, edge_attr: Tensor | None, index: Tensor, ptr: Tensor | None, size_i: int | None) Tensor

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g. x_i and x_j.

Attention

class anemoi.models.layers.attention.MultiHeadSelfAttention(num_heads: int, embed_dim: int, layer_kernels: DotDict, bias: bool = False, is_causal: bool = False, window_size: int | None = None, dropout_p: float = 0.0, attention_implementation: str = 'flash_attention', softcap: float | None = None, use_alibi_slopes: bool = False)

Bases: Module

Multi-head self-attention PyTorch layer.

Allows for the following attention implementations:

  • scaled dot product attention, see https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html

  • flash attention, see https://github.com/Dao-AILab/flash-attention

forward(x: Tensor, shapes: list, batch_size: int, model_comm_group: ProcessGroup | None = None) Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
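
A construction sketch; embed_dim must be divisible by num_heads, and the scaled dot product backend avoids the flash-attn dependency:

    from anemoi.models.layers.attention import MultiHeadSelfAttention
    from anemoi.models.layers.utils import load_layer_kernels

    attn = MultiHeadSelfAttention(
        num_heads=4,
        embed_dim=64,
        layer_kernels=load_layer_kernels(),
        attention_implementation="scaled_dot_product_attention",
    )
    # forward expects (x, shapes, batch_size, model_comm_group), where shapes
    # describes the sequence sharding across model-parallel GPUs.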

class anemoi.models.layers.attention.SDPAAttentionWrapper

Bases: Module

Wrapper for PyTorch scaled dot product attention.

forward(query, key, value, batch_size: int, causal=False, window_size=None, dropout_p=0.0, softcap=None, alibi_slopes=None)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class anemoi.models.layers.attention.FlashAttentionWrapper

Bases: Module

Wrapper for Flash attention.

forward(query, key, value, batch_size: int, causal: bool = False, window_size: int = None, dropout_p: float = 0.0, softcap: float | None = None, alibi_slopes: Tensor = None)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

anemoi.models.layers.attention.get_alibi_slopes(num_heads: int) Tensor

Calculates the per-head slopes for ALiBi's linearly decreasing attention biases.

Parameters:

num_heads (int) – number of attention heads

Returns:

ALiBi slopes

Return type:

Tensor
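
For a power-of-two head count, ALiBi uses the geometric slope sequence 2^(-8/n), 2^(-16/n), ... (Press et al., 2021):

    from anemoi.models.layers.attention import get_alibi_slopes

    slopes = get_alibi_slopes(8)  # tensor of shape (8,), one slope per head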

Multi-Layer Perceptron

class anemoi.models.layers.mlp.MLP(in_features: int, hidden_dim: int, out_features: int, layer_kernels: DotDict, n_extra_layers: int = 0, activation: str = 'SiLU', final_activation: bool = False, layer_norm: bool = True, checkpoints: bool = False)

Bases: Module

Multi-layer perceptron with optional checkpoint.

forward(x: Tensor) Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
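
A minimal usage sketch with the default layer kernels:

    import torch

    from anemoi.models.layers.mlp import MLP
    from anemoi.models.layers.utils import load_layer_kernels

    mlp = MLP(
        in_features=64,
        hidden_dim=128,
        out_features=32,
        layer_kernels=load_layer_kernels(),
    )

    x = torch.randn(10, 64)
    y = mlp(x)  # shape: (10, 32)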

Utils

class anemoi.models.layers.utils.CheckpointWrapper(module: Module)

Bases: Module

Wrapper for checkpointing a module.

forward(*args, **kwargs)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
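
A minimal sketch: wrapping a module makes its activations be recomputed during the backward pass instead of stored, trading compute for memory:

    import torch
    from torch import nn

    from anemoi.models.layers.utils import CheckpointWrapper

    block = nn.Sequential(nn.Linear(64, 64), nn.SiLU(), nn.Linear(64, 64))
    checkpointed = CheckpointWrapper(block)

    x = torch.randn(8, 64, requires_grad=True)
    y = checkpointed(x)  # same result as block(x), without stored activations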

anemoi.models.layers.utils.load_layer_kernels(kernel_config: DotDict | None = {}) DotDict

Load layer kernels from the config.

Parameters:

kernel_config (DotDict, optional) – Kernel configuration

Returns:

Hydra partial instantiation of the layer kernels

Return type:

DotDict
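
For example (the "Linear" key below is an assumption for illustration; the available kernel names are defined by the anemoi-models configuration):

    from anemoi.models.layers.utils import load_layer_kernels

    # With no configuration, the default kernels are returned as Hydra partials.
    kernels = load_layer_kernels()

    # Each entry is a partial that can be called like the underlying class.
    linear = kernels["Linear"](in_features=64, out_features=64)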