Layers
Environment Variables
ANEMOI_INFERENCE_NUM_CHUNKS
This environment variable controls the number of chunks used in the Mapper during inference. Setting it allows the model to split large computations into a specified number of smaller chunks, reducing memory overhead. If not set, it falls back to the default value of 1, i.e. no chunking. See pull request #46.
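As a hedged sketch of how this setting is typically consumed at runtime: the variable name is the one documented above, but the reading logic below is illustrative, not the library's actual code.

    import os

    # Illustrative sketch: fall back to 1 (no chunking) when the variable is unset.
    num_chunks = int(os.environ.get("ANEMOI_INFERENCE_NUM_CHUNKS", "1"))

At the shell level, this corresponds to setting e.g. export ANEMOI_INFERENCE_NUM_CHUNKS=4 before running inference.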
Mappers
- class anemoi.models.layers.mapper.BaseMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, out_channels_dst: int | None = None, cpu_offload: bool = False, activation: str = 'SiLU', **kwargs)
Bases: Module, ABC
Base Mapper from source dimension to destination dimension.
- pre_process(x, shard_shapes, model_comm_group=None) → tuple[Tensor, Tensor, tuple[int], tuple[int]]
Pre-processing for the Mappers.
Splits the tuples into src and dst nodes and shapes as the base operation.
- Parameters: x, shard_shapes, model_comm_group (optional)
- Returns: Source nodes, destination nodes, sharded source node shapes, sharded destination node shapes
- Return type: tuple[Tensor, Tensor, tuple[int], tuple[int]]
- post_process(x_dst, shapes_dst, model_comm_group=None)
Post-processing for the mapper.
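To make the documented contract concrete, here is a minimal sketch of the base splitting operation, assuming x is a (source, destination) tuple of tensors and shard_shapes pairs the sharded shapes the same way (illustrative pseudologic, not the library's exact code):

    from torch import Tensor

    def pre_process_sketch(x: tuple[Tensor, Tensor], shard_shapes: tuple) -> tuple:
        # Split the inputs into source and destination parts, mirroring the
        # base operation documented for BaseMapper.pre_process.
        x_src, x_dst = x
        shapes_src, shapes_dst = shard_shapes
        return x_src, x_dst, shapes_src, shapes_dst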
- class anemoi.models.layers.mapper.BackwardMapperPostProcessMixin
Bases: object
Post-processing for Backward Mapper from hidden -> data.
- class anemoi.models.layers.mapper.ForwardMapperPreProcessMixin
Bases: object
Pre-processing for Forward Mapper from data -> hidden.
- class anemoi.models.layers.mapper.GraphTransformerBaseMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, trainable_size: int = 8, out_channels_dst: int | None = None, num_chunks: int = 1, cpu_offload: bool = False, activation: str = 'GELU', num_heads: int = 16, mlp_hidden_ratio: int = 4, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, layer_kernels: DotDict = None)
Bases: GraphEdgeMixin, BaseMapper
Graph Transformer Base Mapper from hidden -> data or data -> hidden.
- forward(x: Tuple[Tensor, Tensor], batch_size: int, shard_shapes: tuple[tuple[int], tuple[int]], model_comm_group: ProcessGroup | None = None) → Tuple[Tensor, Tensor]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.mapper.GraphTransformerForwardMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, trainable_size: int = 8, out_channels_dst: int | None = None, num_chunks: int = 1, cpu_offload: bool = False, activation: str = 'GELU', num_heads: int = 16, mlp_hidden_ratio: int = 4, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, layer_kernels: DotDict = None)
Bases: ForwardMapperPreProcessMixin, GraphTransformerBaseMapper
Graph Transformer Mapper from data -> hidden.
- forward(x: Tuple[Tensor, Tensor], batch_size: int, shard_shapes: tuple[tuple[int], tuple[int]], model_comm_group: ProcessGroup | None = None) → Tuple[Tensor, Tensor]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.mapper.GraphTransformerBackwardMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, trainable_size: int = 8, out_channels_dst: int | None = None, num_chunks: int = 1, cpu_offload: bool = False, activation: str = 'GELU', num_heads: int = 16, mlp_hidden_ratio: int = 4, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, layer_kernels: DotDict = None)
Bases: BackwardMapperPostProcessMixin, GraphTransformerBaseMapper
Graph Transformer Mapper from hidden -> data.
- pre_process(x, shard_shapes, model_comm_group=None)
Pre-processing for the Mappers.
Splits the tuples into src and dst nodes and shapes as the base operation.
- Parameters: x, shard_shapes, model_comm_group (optional)
- Returns: Source nodes, destination nodes, sharded source node shapes, sharded destination node shapes
- class anemoi.models.layers.mapper.GNNBaseMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, trainable_size: int = 8, out_channels_dst: int | None = None, num_chunks: int = 1, cpu_offload: bool = False, activation: str = 'SiLU', mlp_extra_layers: int = 0, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, layer_kernels: DotDict = None)
Bases: GraphEdgeMixin, BaseMapper
Base for Graph Neural Network Mapper from hidden -> data or data -> hidden.
- forward(x: Tuple[Tensor, Tensor], batch_size: int, shard_shapes: tuple[tuple[int], tuple[int]], model_comm_group: ProcessGroup | None = None) → Tuple[Tensor, Tensor]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.mapper.GNNForwardMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, trainable_size: int = 8, out_channels_dst: int | None = None, num_chunks: int = 1, cpu_offload: bool = False, activation: str = 'SiLU', mlp_extra_layers: int = 0, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, layer_kernels: DotDict = None)
Bases: ForwardMapperPreProcessMixin, GNNBaseMapper
Graph Neural Network Mapper data -> hidden.
- class anemoi.models.layers.mapper.GNNBackwardMapper(in_channels_src: int = 0, in_channels_dst: int = 0, hidden_dim: int = 128, trainable_size: int = 8, out_channels_dst: int | None = None, num_chunks: int = 1, cpu_offload: bool = False, activation: str = 'SiLU', mlp_extra_layers: int = 0, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, layer_kernels: DotDict = None)
Bases: BackwardMapperPostProcessMixin, GNNBaseMapper
Graph Neural Network Mapper from hidden -> data.
- pre_process(x, shard_shapes, model_comm_group=None)
Pre-processing for the Mappers.
Splits the tuples into src and dst nodes and shapes as the base operation.
- Parameters: x, shard_shapes, model_comm_group (optional)
- Returns: Source nodes, destination nodes, sharded source node shapes, sharded destination node shapes
- forward(x: Tuple[Tensor, Tensor], batch_size: int, shard_shapes: tuple[tuple[int], tuple[int]], model_comm_group: ProcessGroup | None = None) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
Processors
- class anemoi.models.layers.processor.BaseProcessor(num_layers: int, *args, num_channels: int = 128, num_chunks: int = 2, activation: str = 'GELU', cpu_offload: bool = False, **kwargs)
Bases: Module, ABC
Base Processor.
- forward(x: Tensor, *args, **kwargs) → Tensor
Example forward pass.
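To illustrate the shape of this interface, a hypothetical BaseProcessor-style subclass is sketched below: a plain stack of num_layers linear layers over num_channels features. The class name and internals are assumptions for illustration, not the library's implementation.

    import torch
    from torch import nn, Tensor

    class ToyProcessor(nn.Module):
        """Hypothetical processor: a sequential stack of per-layer transforms."""

        def __init__(self, num_layers: int, num_channels: int = 128) -> None:
            super().__init__()
            self.layers = nn.ModuleList(
                nn.Linear(num_channels, num_channels) for _ in range(num_layers)
            )

        def forward(self, x: Tensor, *args, **kwargs) -> Tensor:
            # Apply each layer in turn, as a processor's forward pass would.
            for layer in self.layers:
                x = torch.nn.functional.gelu(layer(x))
            return x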
- class anemoi.models.layers.processor.TransformerProcessor(num_layers: int, layer_kernels: DotDict, *args, window_size: int | None = None, num_channels: int = 128, num_chunks: int = 2, activation: str = 'GELU', cpu_offload: bool = False, num_heads: int = 16, mlp_hidden_ratio: int = 4, dropout_p: float = 0.1, attention_implementation: str = 'flash_attention', softcap: float = 0.0, use_alibi_slopes: bool = False, **kwargs)
Bases: BaseProcessor
Transformer Processor.
- class anemoi.models.layers.processor.GNNProcessor(num_layers: int, layer_kernels: DotDict, *args, trainable_size: int = 8, num_channels: int = 128, num_chunks: int = 2, mlp_extra_layers: int = 0, activation: str = 'SiLU', cpu_offload: bool = False, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, **kwargs)
Bases: GraphEdgeMixin, BaseProcessor
GNN Processor.
- class anemoi.models.layers.processor.GraphTransformerProcessor(num_layers: int, layer_kernels: DotDict, trainable_size: int = 8, num_channels: int = 128, num_chunks: int = 2, num_heads: int = 16, mlp_hidden_ratio: int = 4, activation: str = 'GELU', cpu_offload: bool = False, sub_graph: HeteroData | None = None, sub_graph_edge_attributes: list[str] | None = None, src_grid_size: int = 0, dst_grid_size: int = 0, **kwargs)
Bases: GraphEdgeMixin, BaseProcessor
Graph Transformer Processor.
Chunks
- class anemoi.models.layers.chunk.BaseProcessorChunk(num_channels: int, num_layers: int, *args, activation: str = 'GELU', **kwargs)
Bases: Module, ABC
Base Processor Chunk.
- abstract forward(x: Tensor, shapes: list, batch_size: int, model_comm_group: ProcessGroup | None = None) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
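The chunking idea behind these classes can be sketched as follows: partition a stack of layers into num_chunks groups and gradient-checkpoint each group, so that activations are only retained at chunk boundaries. This is an illustrative sketch of the technique, not the library's chunk classes themselves.

    import torch
    from torch import nn, Tensor
    from torch.utils.checkpoint import checkpoint

    def run_in_chunks(layers: nn.ModuleList, x: Tensor, num_chunks: int) -> Tensor:
        # Split the layer stack into roughly equal groups and checkpoint each one,
        # trading recomputation in the backward pass for lower peak memory.
        chunk_size = max(1, len(layers) // num_chunks)
        for start in range(0, len(layers), chunk_size):
            group = list(layers[start : start + chunk_size])

            def chunk_fn(inp: Tensor, group=group) -> Tensor:
                for layer in group:
                    inp = layer(inp)
                return inp

            x = checkpoint(chunk_fn, x, use_reentrant=False)
        return x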
- class anemoi.models.layers.chunk.TransformerProcessorChunk(num_channels: int, num_layers: int, layer_kernels: DotDict, window_size: int, num_heads: int = 16, mlp_hidden_ratio: int = 4, activation: str = 'GELU', dropout_p: float = 0.0, attention_implementation: str = 'flash_attention', softcap: float = None, use_alibi_slopes: bool = None)
Bases: BaseProcessorChunk
Wraps transformer blocks for checkpointing in Processor.
- forward(x: Tensor, shapes: list, batch_size: int, model_comm_group: ProcessGroup | None = None) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.chunk.GNNProcessorChunk(num_channels: int, num_layers: int, layer_kernels: DotDict, mlp_extra_layers: int = 0, activation: str = 'SiLU', edge_dim: int | None = None)
Bases: BaseProcessorChunk
Wraps edge embedding message passing blocks for checkpointing in Processor.
- forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None) → Tuple[Tensor, Tensor | None]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.chunk.GraphTransformerProcessorChunk(num_channels: int, num_layers: int, layer_kernels: DotDict, num_heads: int = 16, mlp_hidden_ratio: int = 4, activation: str = 'GELU', edge_dim: int | None = None)
Bases: BaseProcessorChunk
Wraps graph transformer blocks for checkpointing in Processor.
- forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, batch_size: int, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None) → Tuple[Tensor, Tensor | None]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
Blocks
- class anemoi.models.layers.block.BaseBlock(**kwargs)
Bases: Module, ABC
Base class for network blocks.
- abstract forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, batch_size: int, size: Tuple[int, int] | None = None, model_comm_group: ProcessGroup | None = None) → tuple[Tensor, Tensor]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.block.TransformerProcessorBlock(num_channels: int, hidden_dim: int, num_heads: int, activation: str, window_size: int, layer_kernels: DotDict, dropout_p: float = 0.0, attention_implementation: str = 'flash_attention', softcap: float = None, use_alibi_slopes: bool = None)
Bases: BaseBlock
Transformer block with MultiHeadSelfAttention and MLPs.
- forward(x: Tensor, shapes: list, batch_size: int, model_comm_group: ProcessGroup | None = None) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.block.GraphConvBaseBlock(in_channels: int, out_channels: int, layer_kernels: DotDict, mlp_extra_layers: int = 0, activation: str = 'SiLU', update_src_nodes: bool = True, num_chunks: int = 1, **kwargs)
Bases: BaseBlock
Message passing block with MLPs for node embeddings.
- abstract forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None) → tuple[Tensor, Tensor]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.block.GraphConvProcessorBlock(in_channels: int, out_channels: int, layer_kernels: DotDict, mlp_extra_layers: int = 0, activation: str = 'SiLU', update_src_nodes: bool = True, num_chunks: int = 1, **kwargs)
Bases: GraphConvBaseBlock
- forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None) → tuple[Tensor, Tensor]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.block.GraphConvMapperBlock(in_channels: int, out_channels: int, layer_kernels: DotDict, mlp_extra_layers: int = 0, activation: str = 'SiLU', update_src_nodes: bool = True, num_chunks: int = 1, **kwargs)
Bases: GraphConvBaseBlock
- forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None) → tuple[Tensor, Tensor]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.block.GraphTransformerBaseBlock(in_channels: int, hidden_dim: int, out_channels: int, edge_dim: int, layer_kernels: DotDict, num_heads: int = 16, bias: bool = True, activation: str = 'GELU', num_chunks: int = 1, update_src_nodes: bool = False, **kwargs)
Bases: BaseBlock
Message passing block with MLPs for node embeddings.
- shard_qkve_heads(query: Tensor, key: Tensor, value: Tensor, edges: Tensor, shapes: tuple, batch_size: int, model_comm_group: ProcessGroup | None = None) → tuple[Tensor, Tensor, Tensor, Tensor]
Shards qkv and edges along head dimension.
- shard_output_seq(out: Tensor, shapes: tuple, batch_size: int, model_comm_group: ProcessGroup | None = None) → Tensor
Shards Tensor sequence dimension.
- abstract forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, batch_size: int, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.block.GraphTransformerMapperBlock(in_channels: int, hidden_dim: int, out_channels: int, edge_dim: int, layer_kernels: DotDict, num_heads: int = 16, bias: bool = True, activation: str = 'GELU', num_chunks: int = 1, update_src_nodes: bool = False, **kwargs)
Bases: GraphTransformerBaseBlock
Graph Transformer Block for node embeddings.
- forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, batch_size: int, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.block.GraphTransformerProcessorBlock(in_channels: int, hidden_dim: int, out_channels: int, edge_dim: int, layer_kernels: DotDict, num_heads: int = 16, bias: bool = True, activation: str = 'GELU', num_chunks: int = 1, update_src_nodes: bool = False, **kwargs)
Bases: GraphTransformerBaseBlock
Graph Transformer Block for node embeddings.
- forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, shapes: tuple, batch_size: int, model_comm_group: ProcessGroup | None = None, size: Tuple[int, int] | None = None)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
Graph
- class anemoi.models.layers.graph.TrainableTensor(tensor_size: int, trainable_size: int)
Bases: Module
Trainable Tensor Module.
- forward(x: Tensor, batch_size: int) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
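Given the (tensor_size, trainable_size) constructor and the (x, batch_size) forward signature, a plausible reading of TrainableTensor is a learned per-node embedding concatenated onto the input features. The sketch below rests on that assumption and is not the verified implementation.

    import torch
    from torch import nn, Tensor

    class TrainableTensorSketch(nn.Module):
        def __init__(self, tensor_size: int, trainable_size: int) -> None:
            super().__init__()
            # One learnable vector of width trainable_size per node position.
            self.trainable = nn.Parameter(torch.zeros(tensor_size, trainable_size))

        def forward(self, x: Tensor, batch_size: int) -> Tensor:
            # Tile the learned attributes across the batch and append to x.
            learned = self.trainable.repeat(batch_size, 1)
            return torch.cat([x, learned], dim=-1)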
- class anemoi.models.layers.graph.NamedNodesAttributes(num_trainable_params: int, graph_data: HeteroData)
Bases: Module
Named Nodes Attributes information.
- attr_ndims
Total dimension of node attributes (non-trainable + trainable) for each group of nodes.
- trainable_tensors
Dictionary of trainable tensors for each group of nodes.
- Type: nn.ModuleDict
- forward(name: str, batch_size: int) → Tensor
Get the node attributes to be passed through the graph neural network.
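Usage-wise, the forward signature suggests fetching the attribute tensor for a node group by name; the group name "data" below is a hypothetical example:

    # named_attrs = NamedNodesAttributes(num_trainable_params=8, graph_data=graph)
    # x_data = named_attrs(name="data", batch_size=2)  # Tensor of node attributes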
Conv
- class anemoi.models.layers.conv.GraphConv(in_channels: int, out_channels: int, layer_kernels: DotDict, mlp_extra_layers: int = 0, activation: str = 'SiLU', **kwargs)
Bases: MessagePassing
Message passing module for convolutional node and edge interactions.
- forward(x: Tuple[Tensor, Tensor | None], edge_attr: Tensor, edge_index: Tensor | SparseTensor, size: Tuple[int, int] | None = None)
Runs the forward pass of the module.
- message(x_i: Tensor, x_j: Tensor, edge_attr: Tensor, dim_size: int | None = None) → Tensor
Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g. x_i and x_j.
- aggregate(edges_new: Tensor, edge_index: Tensor | SparseTensor, dim_size: int | None = None) → tuple[Tensor, Tensor]
Aggregates messages from neighbors as \(\bigoplus_{j \in \mathcal{N}(i)}\). Takes in the output of message computation as the first argument and any argument which was initially passed to propagate(). By default, this function will delegate its call to the underlying Aggregation module to reduce messages as specified in __init__() by the aggr argument.
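To illustrate the message/aggregate split in PyTorch Geometric terms, here is a minimal MessagePassing module in the same spirit; it is a generic PyG sketch, not GraphConv's actual computation:

    import torch
    from torch import Tensor
    from torch_geometric.nn import MessagePassing

    class EdgeMLPConv(MessagePassing):
        """Toy conv: messages built from (x_i, x_j, edge_attr), summed per node."""

        def __init__(self, channels: int, edge_dim: int) -> None:
            super().__init__(aggr="sum")
            self.mlp = torch.nn.Linear(2 * channels + edge_dim, channels)

        def forward(self, x: Tensor, edge_index: Tensor, edge_attr: Tensor) -> Tensor:
            return self.propagate(edge_index, x=x, edge_attr=edge_attr)

        def message(self, x_i: Tensor, x_j: Tensor, edge_attr: Tensor) -> Tensor:
            # The _i / _j suffixes map tensors passed to propagate() onto the
            # destination and source endpoints of each edge, respectively.
            return self.mlp(torch.cat([x_i, x_j, edge_attr], dim=-1))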
- class anemoi.models.layers.conv.GraphTransformerConv(out_channels: int, dropout: float = 0.0, **kwargs)
Bases: MessagePassing
Message passing part of graph transformer operator.
Adapted from ‘Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification’ (https://arxiv.org/abs/2009.03509)
- forward(query: Tensor, key: Tensor, value: Tensor, edge_attr: Tensor | None, edge_index: Tensor | SparseTensor, size: Tuple[int, int] | None = None)
Runs the forward pass of the module.
- message(heads: int, query_i: Tensor, key_j: Tensor, value_j: Tensor, edge_attr: Tensor | None, index: Tensor, ptr: Tensor | None, size_i: int | None) → Tensor
Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g. x_i and x_j.
Attention
- class anemoi.models.layers.attention.MultiHeadSelfAttention(num_heads: int, embed_dim: int, layer_kernels: DotDict, bias: bool = False, is_causal: bool = False, window_size: int | None = None, dropout_p: float = 0.0, attention_implementation: str = 'flash_attention', softcap: float | None = None, use_alibi_slopes: bool = False)
Bases: Module
Multi-head self-attention PyTorch layer.
Supports multiple attention implementations:
- scaled dot product attention, see https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
- flash attention, see https://github.com/Dao-AILab/flash-attention
- forward(x: Tensor, shapes: list, batch_size: int, model_comm_group: ProcessGroup | None = None) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class anemoi.models.layers.attention.SDPAAttentionWrapper
Bases: Module
Wrapper for PyTorch scaled dot product attention.
- forward(query, key, value, batch_size: int, causal=False, window_size=None, dropout_p=0.0, softcap=None, alibi_slopes=None)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
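For reference, the PyTorch primitive this wrapper builds on can be called directly; the bare invocation below ignores the wrapper's batching, windowing, and model-parallel handling:

    import torch
    import torch.nn.functional as F

    # query/key/value laid out as (batch, heads, sequence, head_dim)
    q = torch.randn(1, 16, 128, 64)
    k = torch.randn(1, 16, 128, 64)
    v = torch.randn(1, 16, 128, 64)
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)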
- class anemoi.models.layers.attention.FlashAttentionWrapper
Bases: Module
Wrapper for Flash attention.
- forward(query, key, value, batch_size: int, causal: bool = False, window_size: int = None, dropout_p: float = 0.0, softcap: float | None = None, alibi_slopes: Tensor = None)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
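The flash-attention path typically relies on flash_attn_func from the flash-attn package; a bare sketch follows (flash-attn expects (batch, sequence, heads, head_dim) layout and half-precision CUDA tensors, and this call omits the wrapper's window and alibi handling):

    import torch
    from flash_attn import flash_attn_func

    q = torch.randn(1, 128, 16, 64, device="cuda", dtype=torch.float16)
    k = torch.randn(1, 128, 16, 64, device="cuda", dtype=torch.float16)
    v = torch.randn(1, 128, 16, 64, device="cuda", dtype=torch.float16)
    out = flash_attn_func(q, k, v, dropout_p=0.0, causal=False)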
Multi-Layer Perceptron
- class anemoi.models.layers.mlp.MLP(in_features: int, hidden_dim: int, out_features: int, layer_kernels: DotDict, n_extra_layers: int = 0, activation: str = 'SiLU', final_activation: bool = False, layer_norm: bool = True, checkpoints: bool = False)
Bases: Module
Multi-layer perceptron with optional checkpoint.
- forward(x: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
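Because layer_kernels supplies the concrete Linear/LayerNorm implementations and its structure is not shown here, a generic equivalent of the documented options (n_extra_layers, final_activation, layer_norm) can be sketched with plain torch.nn modules; the placement of the normalization layer is an assumption:

    from torch import nn

    def make_mlp(in_features: int, hidden_dim: int, out_features: int,
                 n_extra_layers: int = 0, final_activation: bool = False,
                 layer_norm: bool = True) -> nn.Sequential:
        # Hidden stack: input projection plus n_extra_layers hidden layers.
        layers: list[nn.Module] = [nn.Linear(in_features, hidden_dim), nn.SiLU()]
        for _ in range(n_extra_layers):
            layers += [nn.Linear(hidden_dim, hidden_dim), nn.SiLU()]
        layers.append(nn.Linear(hidden_dim, out_features))
        if final_activation:
            layers.append(nn.SiLU())
        if layer_norm:
            layers.append(nn.LayerNorm(out_features))
        return nn.Sequential(*layers)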
Utils
- class anemoi.models.layers.utils.CheckpointWrapper(module: Module)
Bases: Module
Wrapper for checkpointing a module.
- forward(*args, **kwargs)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
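A wrapper of this shape is commonly a thin layer over torch.utils.checkpoint; the sketch below is consistent with the signature above but is an assumption, not the verified source:

    from torch import nn
    from torch.utils.checkpoint import checkpoint

    class CheckpointWrapperSketch(nn.Module):
        def __init__(self, module: nn.Module) -> None:
            super().__init__()
            self.module = module

        def forward(self, *args, **kwargs):
            # Recompute the wrapped module's activations during backward
            # instead of storing them, trading compute for peak memory.
            return checkpoint(self.module, *args, use_reentrant=False, **kwargs)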