.. _residual-connections:

######################
Residual connections
######################

Residual connections are a key architectural feature in Anemoi's
encoder-processor-decoder models, enabling more effective information
flow and gradient propagation across network layers. They help mitigate
issues such as vanishing gradients and support the training of deeper,
more expressive models.

The configurable residual connections link input data to output data.
The type of residual connection used in a model is specified under the
``residual`` key in the model configuration YAML. This modular approach
allows users to select and customize the residual strategy best suited
to their forecasting task, whether that is a standard skip connection, a
truncated connection, or a learnable (Ornstein) connection.

*****************************
Standard residual (default)
*****************************

The standard residual formulation used in most models is:

.. math::

   x(t+1) = x(t) + f_\theta(x(t))

where :math:`f_\theta` is the learned model increment. This preserves
the full input state and adds a correction.

*****************
Skip Connection
*****************

Returns the most recent timestep unchanged:

.. math::

   \text{residual}(x) = x_t

This is the default residual and corresponds to the standard formulation
above (the model output is added externally by the architecture).

.. autoclass:: anemoi.models.layers.residual.SkipConnection
   :members:
   :no-undoc-members:
   :show-inheritance:

**********************
Truncated Connection
**********************

Projects the input to a coarser grid and back, removing high-frequency
content from the skip connection via sparse spatial projections:

.. math::

   \text{residual}(x) = P_{\text{up}} \, P_{\text{down}} \, x_t

where :math:`P_{\text{down}}` maps to the coarse grid and
:math:`P_{\text{up}}` maps back to the original resolution.

.. autoclass:: anemoi.models.layers.residual.TruncatedConnection
   :members:
   :no-undoc-members:
   :show-inheritance:

****************
Configuration
****************

Both connection types are configured under the ``residual`` key in the
model config. ``TruncatedConnection`` transparently accepts kwargs that
belong to sibling classes, such as ``step``, so switching between
connection types requires only changing ``_target_``.

``TruncatedConnection`` supports two modes, both via the
``truncation_config`` key:

-  **On-the-fly**: the truncation subgraph is built at runtime from the
   main graph using a coarser ``grid`` specification.
-  **File-based**: precomputed ``.npz`` projection matrices are loaded
   from disk.

Choose one mode per config; do not mix the two within the same
``truncation_config`` block.

On-the-fly example:

.. code:: yaml

   model:
     residual:
       _target_: anemoi.models.layers.residual.TruncatedConnection
       truncation_config:
         grid: o32
         num_nearest_neighbours: 3
         sigma: 1.0

File-based example:

.. code:: yaml

   model:
     residual:
       _target_: anemoi.models.layers.residual.TruncatedConnection
       truncation_config:
         truncation_down_file_path: /path/to/O96-O32-grid-box-average.mat.npz
         truncation_up_file_path: /path/to/O32-O96-grid-box-average.mat.npz
         row_normalize: false

.. note::

   The top-level ``truncation_up_file_path`` and
   ``truncation_down_file_path`` kwargs are still accepted for backward
   compatibility, but the recommended approach is to move them inside
   ``truncation_config``.

*****************************
Learnable residual (Ornstein)
*****************************

Learnable residual connections introduce a trainable scaling parameter
:math:`\alpha` on the residual connection, giving a formulation
equivalent to a discretized Ornstein--Uhlenbeck process:

.. math::

   x(t+1) = \alpha \cdot x(t) + f_\theta(x(t))

With :math:`\alpha` trainable and :math:`\alpha < 1`, errors in the
state are contracted at each step rather than perfectly preserved. This
bounds error growth during autoregressive integration.
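
The contraction argument above can be illustrated with a minimal sketch.
It assumes, purely for illustration, that the learned increment
:math:`f_\theta` is identical for the perturbed and unperturbed states,
so a state error is carried forward only by the :math:`\alpha \cdot x(t)`
term. ``rollout_error`` is a hypothetical helper and not part of Anemoi.

```python
def rollout_error(alpha: float, initial_error: float, steps: int) -> list[float]:
    """Track a state error through x(t+1) = alpha * x(t) + f(x(t)).

    Illustrative assumption: f contributes the same increment to the
    perturbed and unperturbed states, so it cancels in the difference
    and the error is simply multiplied by alpha at every step.
    """
    errors = [initial_error]
    for _ in range(steps):
        errors.append(alpha * errors[-1])
    return errors


# alpha = 1 corresponds to the standard residual: the error is preserved.
standard = rollout_error(alpha=1.0, initial_error=1.0, steps=10)

# alpha < 1 corresponds to the Ornstein-style residual: the error decays.
damped = rollout_error(alpha=0.9, initial_error=1.0, steps=10)

print(f"standard residual, error after 10 steps: {standard[-1]:.3f}")
print(f"damped residual,   error after 10 steps: {damped[-1]:.3f}")
```

Under these assumptions, ``alpha=1.0`` leaves the initial error unchanged
after any number of steps, while ``alpha=0.9`` shrinks it to roughly 35%
of its initial size after ten steps. A real model increment also depends
on the state, so this is only the idealized picture.
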
Two variants are available, offering increasing degrees of spatial
structure in the learnable parameters.

*****************************
Scalar Ornstein Connection
*****************************

A single learnable scalar :math:`\alpha_v` per prognostic variable
:math:`v`:

.. math::

   \text{residual}(x)_v = (1 - \alpha_v) \cdot x_{t,v}

where :math:`\alpha_v \in (\alpha_{\text{buff}}, 1)` is parameterized
via a sigmoid. Note that :math:`\alpha_v` here acts as a damping rate:
the retained fraction :math:`1 - \alpha_v` plays the role of the scaling
factor :math:`\alpha` in the general formulation above. This is the
simplest Ornstein variant: it has no spatial structure, just a
per-variable damping factor.

.. autoclass:: anemoi.models.layers.residual.ScalarOrnsteinConnection
   :members:
   :no-undoc-members:
   :show-inheritance:

*******************************
Spectral Ornstein Connection
*******************************

Spatially varying :math:`\alpha` and bias :math:`\mu`, defined as smooth
functions on the sphere via spherical harmonic (SH) coefficients:

.. math::

   \text{residual}(x)_v = \bigl(1 - \alpha_v(s)\bigr) \cdot x_{t,v} + \mu_v(s) + \sum_i \beta_{i,v}(s) \cdot f_i

where :math:`s` denotes the spatial location, :math:`\alpha_v(s)`,
:math:`\mu_v(s)`, and :math:`\beta_{i,v}(s)` are reconstructed from
low-order SH coefficients (controlled by ``lmax``), and :math:`f_i` are
optional forcing regressors.

When ``truncate=True``, a learnable spectral low-pass filter is applied
to the input fields *before* computing the residual. This removes
high-frequency content from the skip connection, forcing the model to
reconstruct fine-scale detail from scratch. An optional anti-aliasing
blend (``anti_aliasing=True``) smoothly mixes the filtered and
unfiltered fields.

.. autoclass:: anemoi.models.layers.residual.SpectralOrnsteinConnection
   :members:
   :no-undoc-members:
   :show-inheritance:
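
The spectral residual formula can be read as a pointwise operation once
the spatial fields are available on the grid. The sketch below evaluates
it for a single variable. It is not the Anemoi implementation: it
assumes that :math:`\alpha(s)`, :math:`\mu(s)`, and :math:`\beta_i(s)`
have already been reconstructed from their SH coefficients, that each
forcing regressor :math:`f_i` is given as one value per grid point, and
it omits the optional low-pass filter. ``spectral_ornstein_residual`` is
a hypothetical name.

```python
from typing import Sequence


def spectral_ornstein_residual(
    x_t: Sequence[float],
    alpha: Sequence[float],
    mu: Sequence[float],
    betas: Sequence[Sequence[float]] = (),
    forcings: Sequence[Sequence[float]] = (),
) -> list[float]:
    """Evaluate (1 - alpha(s)) * x_t(s) + mu(s) + sum_i beta_i(s) * f_i(s).

    All inputs are per-grid-point values for one variable. The fields
    alpha, mu and betas are assumed to be already reconstructed on the
    grid; the spherical harmonic synthesis itself is not shown.
    """
    residual = []
    for s, value in enumerate(x_t):
        # Optional forcing contribution at grid point s (zero if none given).
        forced = sum(beta[s] * f[s] for beta, f in zip(betas, forcings))
        residual.append((1.0 - alpha[s]) * value + mu[s] + forced)
    return residual


# Two grid points, no forcing regressors.
plain = spectral_ornstein_residual(
    x_t=[2.0, 4.0], alpha=[0.5, 0.25], mu=[0.1, 0.0]
)

# Same fields with one forcing regressor.
with_forcing = spectral_ornstein_residual(
    x_t=[2.0, 4.0],
    alpha=[0.5, 0.25],
    mu=[0.1, 0.0],
    betas=[[1.0, 2.0]],
    forcings=[[0.5, 0.5]],
)

# alpha = 0 and mu = 0 reduce to the plain skip connection.
identity = spectral_ornstein_residual(
    x_t=[2.0, 4.0], alpha=[0.0, 0.0], mu=[0.0, 0.0]
)
```

With :math:`\alpha(s) = 0`, :math:`\mu(s) = 0`, and no forcing terms, the
expression reduces to the skip connection, which makes the spectral
variant a strict generalization of the default residual in this sketch.
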