Losses

This module defines the loss functions used to train the model.

Anemoi-training provides several loss functions by default, all of which subclass BaseWeightedLoss. This base class enables scalar multiplication and graph node weighting.

class anemoi.training.losses.weightedloss.BaseWeightedLoss(node_weights: Tensor, ignore_nans: bool = False)

Bases: Module, ABC

Node-weighted general loss.

add_scalar(dimension: int | tuple[int], scalar: Tensor, *, name: str | None = None) → ScaleTensor

Add new scalar to be applied along dimension.

Dimension can be a single int even for a multi-dimensional scalar; in this case the dimensions are assigned as a range starting from the given int. Negative indexes are also valid, and will be resolved against the tensor’s ndim.

Parameters:

dimension (int | tuple[int]) – Dimension/s to apply the scalar to
scalar (torch.Tensor) – Scalar tensor to apply
name (str | None, optional) – Name of the scalar, by default None

Returns:

ScaleTensor with the scalar added

Return type:

ScaleTensor
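The dimension handling described above can be sketched in plain Python. `expand_dimension` is a hypothetical helper written for illustration, not part of anemoi-training; it assumes a single int is expanded into consecutive dimensions, one per axis of the scalar.

```python
def expand_dimension(dimension, scalar_ndim):
    """Hypothetical helper: expand a start dimension into a range of dimensions.

    dimension: a single int, or a tuple of ints that is returned unchanged.
    scalar_ndim: number of axes of the scalar tensor being added.
    """
    if isinstance(dimension, int):
        # One target dimension per axis of the scalar, starting at `dimension`.
        return tuple(dimension + i for i in range(scalar_ndim))
    return tuple(dimension)


print(expand_dimension(1, 2))       # (1, 2)
print(expand_dimension(-2, 2))      # (-2, -1)
print(expand_dimension((0, 3), 2))  # (0, 3)
```

Negative results stay relative here; they would only be turned into absolute indexes once the ndim of the tensor to scale is known.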

update_scalar(name: str, scalar: Tensor, *, override: bool = False) → None

Update an existing scalar maintaining original dimensions.

If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of validity against original scalar.

Parameters:

name (str) – Name of the scalar to update
scalar (torch.Tensor) – New scalar tensor
override (bool, optional) – Whether to override the scalar ignoring dimension compatibility, by default False

scale(x: Tensor, subset_indices: tuple[int, ...] | None = None, *, without_scalars: list[str] | list[int] | None = None) → Tensor

Scale a tensor by the variable_scaling.

Parameters:

x (torch.Tensor) – Tensor to be scaled, shape (bs, ensemble, lat*lon, n_outputs)
subset_indices (tuple[int,...], optional) – Indices to subset the calculated scalar and x tensor with, by default None.
without_scalars (list[str] | list[int] | None, optional) – list of scalars to exclude from scaling. Can be list of names or dimensions to exclude. By default None

Returns:

Scaled error tensor

Return type:

torch.Tensor

scale_by_node_weights(x: Tensor, squash: bool = True) → Tensor

Scale a tensor by the node_weights.

Equivalent to reducing and averaging accordingly across all dimensions of the tensor.

Parameters:

x (torch.Tensor) – Tensor to be scaled, shape (bs, ensemble, lat*lon, n_outputs)
squash (bool, optional) – Average last dimension, by default True. If False, the returned loss has shape (n_outputs)

Returns:

Scaled error tensor

Return type:

torch.Tensor
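As a rough illustration of node weighting, the sketch below computes a weighted mean over grid nodes with plain Python lists standing in for tensors. It assumes the reduction is normalised by the sum of the node weights; `weighted_node_mean` is a hypothetical name and the library's actual reduction may differ in detail.

```python
def weighted_node_mean(errors, node_weights, squash=True):
    """Weighted mean over nodes (illustrative stand-in, not the library code).

    errors: list over nodes, each entry a list of per-output errors.
    node_weights: one weight per node.
    """
    total_weight = sum(node_weights)
    n_outputs = len(errors[0])
    # Weighted average over the node axis, kept separate per output variable.
    per_output = [
        sum(w * row[j] for w, row in zip(node_weights, errors)) / total_weight
        for j in range(n_outputs)
    ]
    if squash:
        # Also average over the last (output) dimension.
        return sum(per_output) / n_outputs
    return per_output


errors = [[1.0, 3.0], [2.0, 4.0]]  # 2 nodes, 2 outputs
weights = [1.0, 3.0]
print(weighted_node_mean(errors, weights, squash=False))  # [1.75, 3.75]
print(weighted_node_mean(errors, weights))                # 2.75
```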

abstract forward(pred: Tensor, target: Tensor, squash: bool = True, *, scalar_indices: tuple[int, ...] | None = None, without_scalars: list[str] | list[int] | None = None) → Tensor

Calculates the lat-weighted scaled loss.

Parameters:

pred (torch.Tensor) – Prediction tensor, shape (bs, ensemble, lat*lon, n_outputs)
target (torch.Tensor) – Target tensor, shape (bs, ensemble, lat*lon, n_outputs)
squash (bool, optional) – Average last dimension, by default True
scalar_indices (tuple[int,...], optional) – Indices to subset the calculated scalar with, by default None
without_scalars (list[str] | list[int] | None, optional) – list of scalars to exclude from scaling. Can be list of names or dimensions to exclude. By default None

Returns:

Weighted loss

Return type:

torch.Tensor

property name: str: Used for logging identification purposes.

class anemoi.training.losses.weightedloss.FunctionalWeightedLoss(node_weights: Tensor, ignore_nans: bool = False)

Bases: BaseWeightedLoss

WeightedLoss which a user can subclass and provide calculate_difference.

calculate_difference should calculate the difference between the prediction and target. All scaling and weighting is handled by the parent class.

Example:

```python
class MyLoss(FunctionalWeightedLoss):

    def calculate_difference(self, pred, target):
        return pred - target
```

abstract calculate_difference(pred: Tensor, target: Tensor) → Tensor: Calculate Difference between prediction and target.

forward(pred: Tensor, target: Tensor, squash: bool = True, *, scalar_indices: tuple[int, ...] | None = None, without_scalars: list[str] | list[int] | None = None) → Tensor

Calculates the lat-weighted scaled loss.

Parameters:

pred (torch.Tensor) – Prediction tensor, shape (bs, ensemble, lat*lon, n_outputs)
target (torch.Tensor) – Target tensor, shape (bs, ensemble, lat*lon, n_outputs)
squash (bool, optional) – Average last dimension, by default True
scalar_indices (tuple[int,...], optional) – Indices to subset the calculated scalar with, by default None
without_scalars (list[str] | list[int] | None, optional) – list of scalars to exclude from scaling. Can be list of names or dimensions to exclude. By default None

Returns:

Weighted loss

Return type:

torch.Tensor

Default Loss Functions

By default, anemoi-training trains the model using a latitude-weighted mean-squared-error loss, which is defined in the WeightedMSELoss class in anemoi/training/losses/mse.py. The loss function can be configured in the config file at config.training.training_loss and config.training.validation_metrics.

The following loss functions are available by default:

WeightedMSELoss: Latitude-weighted mean-squared-error.
WeightedMAELoss: Latitude-weighted mean-absolute-error.
WeightedHuberLoss: Latitude-weighted Huber loss.
WeightedLogCoshLoss: Latitude-weighted log-cosh loss.
WeightedRMSELoss: Latitude-weighted root-mean-squared-error.
CombinedLoss: Combined component weighted loss.

These are available in the anemoi.training.losses module, at anemoi.training.losses.{short_name}.{class_name}.

So for example, to use the WeightedMSELoss class, you would reference it in the config as follows:

# loss function for the model
training_loss:
   # loss class to initialise
   _target_: anemoi.training.losses.mse.WeightedMSELoss
   # loss function kwargs here
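For orientation, the per-element quantities behind the default losses listed above can be sketched with their textbook definitions. These are not the library's implementations: the function names are hypothetical, the Huber threshold `delta` is an assumed parameter, and node weighting and scalars are left out.

```python
import math


def squared_error(pred, target):
    # Basis of MSE (and of RMSE after averaging and taking the square root).
    return (pred - target) ** 2


def absolute_error(pred, target):
    # Basis of MAE.
    return abs(pred - target)


def huber(pred, target, delta=1.0):
    # Quadratic for small errors, linear for large ones; delta is assumed.
    diff = abs(pred - target)
    if diff <= delta:
        return 0.5 * diff ** 2
    return delta * (diff - 0.5 * delta)


def log_cosh(pred, target):
    # Smooth approximation to the absolute error.
    return math.log(math.cosh(pred - target))


print(squared_error(3.0, 1.0))   # 4.0
print(absolute_error(1.0, 3.0))  # 2.0
print(huber(1.5, 1.0))           # 0.125
print(huber(4.0, 1.0))           # 2.5
```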

Scalars

In addition to node scaling, the loss function can also be scaled by a scalar. These are provided by the Forecaster class, and a user can define whether to include them in the loss function by setting scalars in the loss config dictionary.

# loss function for the model
training_loss:
   # loss class to initialise
   _target_: anemoi.training.losses.mse.WeightedMSELoss
   scalars: ['scalar1', 'scalar2']

Currently, the following scalars are available for use:

variable: Scale by the feature/variable weights as defined in the config config.training.variable_loss_scaling.
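A minimal sketch of what the variable scalar does conceptually, assuming one weight per output variable applied along the last dimension. Plain lists stand in for tensors and `scale_by_variable` is a hypothetical name.

```python
def scale_by_variable(errors, variable_weights):
    """Multiply each per-variable error by that variable's weight.

    errors: list over nodes, each entry a list of per-variable errors.
    variable_weights: one weight per variable (last dimension).
    """
    return [[e * w for e, w in zip(row, variable_weights)] for row in errors]


errors = [[1.0, 2.0], [3.0, 4.0]]
print(scale_by_variable(errors, [0.5, 2.0]))  # [[0.5, 4.0], [1.5, 8.0]]
```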

Validation Metrics

Validation metrics, as defined in the config file at config.training.validation_metrics, follow the same initialisation behaviour as the loss function, but can be a list. In this case, all losses are calculated and logged as a dictionary with the corresponding names.

Scaling Validation Losses

By default, validation metrics cannot be scaled by scalars across the variable dimension, but they can be scaled by all other scalars. To scale a validation metric by the variable weights, add it to config.training.scale_validation_metrics.

These metrics are then kept in the normalised, preprocessed space, and thus the indexing of scalars aligns with the indexing of the tensors.

By default, only the metric named all is kept in the normalised space and scaled.

# List of validation metrics to keep in normalised space, and scalars to be applied
# Use '*' to reference all metrics, or a list of metric names.
# Unlike above, variable scaling is possible due to these metrics being
# calculated in the same way as the training loss, within the internal model space.
scale_validation_metrics:
   scalars_to_apply: ['variable']
   metrics:
      - 'all'
      # - "*"
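The selection rule stated in the config comments can be sketched as follows. `is_scaled` is a hypothetical helper, and the matching logic is an assumption based only on the comment that '*' references all metrics.

```python
def is_scaled(metric_name, metrics):
    """Return True if a metric is selected by the `metrics` list.

    '*' selects every metric; otherwise the name must be listed explicitly.
    """
    return "*" in metrics or metric_name in metrics


print(is_scaled("all", ["all"]))  # True
print(is_scaled("mse", ["all"]))  # False
print(is_scaled("mse", ["*"]))    # True
```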

Custom Loss Functions

Additionally, you can define your own loss function by subclassing BaseWeightedLoss and implementing the forward method, or by subclassing FunctionalWeightedLoss and implementing the calculate_difference function. The latter handles the scaling and node weighting for you, so you only need to specify the difference calculation.

from anemoi.training.losses.weightedloss import FunctionalWeightedLoss

class MyLossFunction(FunctionalWeightedLoss):
   def calculate_difference(self, pred, target):
      return (pred - target) ** 2

Then, in the config, set _target_ to the full import path of the class, and pass any additional kwargs for the loss function alongside it.
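The `_target_` key appears to follow the Hydra-style convention of a dotted import path plus keyword arguments. The sketch below shows the general idea using only the standard library; it is not Hydra's or anemoi-training's implementation, `resolve_target` and `instantiate` are hypothetical helpers, and `fractions.Fraction` is used purely as a stand-in class.

```python
import importlib


def resolve_target(target):
    """Import the module part of a dotted path and return the named attribute."""
    module_path, _, attr_name = target.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


def instantiate(config):
    """Build an object from a dict with a `_target_` key and kwargs."""
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    return resolve_target(config["_target_"])(**kwargs)


# Stand-in for a loss config: the class path plus its kwargs.
half = instantiate({"_target_": "fractions.Fraction", "numerator": 1, "denominator": 2})
print(half)  # 1/2
```

For a custom loss, the class therefore needs to live in an importable module so that its dotted path can be resolved.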

Combined Losses

Building on the simple single loss functions, a user can define a combined loss, one that weights and combines multiple loss functions.

This can be done by referencing the CombinedLoss class in the config file, and setting the losses key to a list of loss functions to combine. Each of those losses is then initialised just like the other losses above.

training_loss:
   _target_: anemoi.training.losses.combined.CombinedLoss
   losses:
      - _target_: anemoi.training.losses.mse.WeightedMSELoss
      - _target_: anemoi.training.losses.mae.WeightedMAELoss
   scalars: ['variable']
   loss_weights: [1.0, 0.5]

All kwargs passed to CombinedLoss are passed to each of the loss functions, and the loss weights are used to scale the individual losses before combining them.
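A minimal sketch of the combination step, assuming the weighted losses are summed. The `combined_loss`, `mse` and `mae` functions below are simplified stand-ins operating on plain lists, not the library classes.

```python
def combined_loss(losses, loss_weights):
    """Return a callable computing a weighted sum of the given loss functions."""
    if len(losses) != len(loss_weights):
        raise ValueError("Need exactly one weight per loss function")

    def forward(pred, target, **kwargs):
        # The same kwargs are forwarded to every component loss.
        return sum(w * loss(pred, target, **kwargs) for loss, w in zip(losses, loss_weights))

    return forward


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


loss = combined_loss([mse, mae], [1.0, 0.5])
value = loss([1.0, 3.0], [0.0, 1.0])
print(value)  # 3.25
```

Here the MSE term contributes 2.5 and the MAE term contributes 0.5 * 1.5 = 0.75.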

class anemoi.training.losses.combined.CombinedLoss(*extra_losses: dict[str, Any] | Callable, losses: tuple[dict[str, Any] | Callable] | None = None, loss_weights: tuple[int, ...], **kwargs)

Bases: Module

Combined Loss function.

forward(pred: Tensor, target: Tensor, **kwargs) → Tensor

Calculates the combined loss.

Parameters:

pred (torch.Tensor) – Prediction tensor, shape (bs, ensemble, lat*lon, n_outputs)
target (torch.Tensor) – Target tensor, shape (bs, ensemble, lat*lon, n_outputs)
kwargs (Any) – Additional arguments to pass to the loss functions. Will be passed to all loss functions

Returns:

Combined loss

Return type:

torch.Tensor

Utility Functions

There are also generic functions that are useful for losses in anemoi/training/losses/utils.py.

grad_scaler is used to automatically scale the loss gradients in the loss function using the formula in https://arxiv.org/pdf/2306.06079.pdf, section 4.3.2. This can be switched on in the config by setting the option config.training.loss_gradient_scaling=True.

ScaleTensor is a class that can record and apply arbitrary scaling factors to tensors. It supports relative indexing, combining multiple scalars over the same dimensions, and is only constructed at broadcasting time, so the shape can be resolved to match the tensor exactly.

anemoi.training.losses.utils.grad_scaler(module: Module, grad_in: tuple[Tensor, ...], grad_out: tuple[Tensor, ...]) → tuple[Tensor, ...] | None

Scales the loss gradients.

Uses the formula in https://arxiv.org/pdf/2306.06079.pdf, section 4.3.2

Use <module>.register_full_backward_hook(grad_scaler, prepend=False) to register this hook.

Parameters:

module (nn.Module) – Loss object (not used)
grad_in (tuple[torch.Tensor, ...]) – Loss gradients
grad_out (tuple[torch.Tensor, ...]) – Output gradients (not used)

Returns:

Re-scaled input gradients

Return type:

tuple[torch.Tensor, …]

class anemoi.training.losses.utils.Shape(func: Callable[[int], int])

Bases: object

Shape resolving object.

class anemoi.training.losses.utils.ScaleTensor

Bases: object

Dynamically resolved tensor scaling class.

Allows a user to specify a scalar and the dimensions it should be applied to. The class will then enforce that additional scalars are compatible with the specified dimensions.

When get_scalar or scale is called, the class will return the product of all scalars, resolved to the dimensional size of the input tensor.

Additionally, the class can be subsetted to only return a subset of the scalars, but only from those given names.

Examples

>>> tensor = torch.randn(3, 4, 5)
>>> scalars = ScaleTensor((0, torch.randn(3)), (1, torch.randn(4)))
>>> scaled_tensor = scalars.scale(tensor)
>>> scalars.get_scalar(tensor.ndim).shape
torch.Size([3, 4, 1])
>>> scalars.add_scalar(-1, torch.randn(5))
>>> scalars.get_scalar(tensor.ndim).shape
torch.Size([3, 4, 5])
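The shapes in the example above follow from a simple rule: dimensions with a scalar take that scalar's size, and all others become singleton dimensions that broadcast. A plain-Python sketch of this rule, using a hypothetical `resolved_shape` helper that tracks only sizes and assumes one-dimensional scalars:

```python
def resolved_shape(dim_sizes, ndim):
    """Compute the broadcastable shape for scalars attached to given dimensions.

    dim_sizes: mapping of dimension index (may be negative) to scalar length.
    ndim: number of dimensions of the tensor to be scaled.
    """
    shape = [1] * ndim  # dimensions without a scalar stay singleton
    for dim, size in dim_sizes.items():
        index = dim if dim >= 0 else ndim + dim  # resolve relative indexes
        shape[index] = size
    return tuple(shape)


dims = {0: 3, 1: 4}
print(resolved_shape(dims, 3))  # (3, 4, 1)
dims[-1] = 5
print(resolved_shape(dims, 3))  # (3, 4, 5)
```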

property shape: Shape

Get the shape of the scale tensor.

Returns a Shape object to be indexed. Only those dimensions specified in the tensors will be resolved.

validate_scalar(dimension: int | tuple[int], scalar: Tensor) → None

Check if the scalar is compatible with the given dimension.

Parameters:

dimension (int | tuple[int]) – Dimensions to check scalar against
scalar (torch.Tensor) – Scalar tensor to check

Raises:

ValueError – If the scalar is not compatible with the given dimension

add_scalar(dimension: int | tuple[int], scalar: Tensor, *, name: str | None = None) → ScaleTensor

Add new scalar to be applied along dimension.

Dimension can be a single int even for a multi-dimensional scalar; in this case the dimensions are assigned as a range starting from the given int. Negative indexes are also valid, and will be resolved against the tensor’s ndim.

Parameters:

dimension (int | tuple[int]) – Dimension/s to apply the scalar to
scalar (torch.Tensor) – Scalar tensor to apply
name (str | None, optional) – Name of the scalar, by default None

Returns:

ScaleTensor with the scalar added

Return type:

ScaleTensor

remove_scalar(scalar_to_remove: str | int) → ScaleTensor

Remove scalar from ScaleTensor.

Parameters: scalar_to_remove (str | int) – Name or index of tensor to remove
Raises: ValueError – If the scalar is not in the scalars
Returns: ScaleTensor with the scalar removed
Return type: ScaleTensor

freeze_state() → FrozenStateRecord

Freeze the state of the Scalar with a context manager.

Any changes made will be reverted on exit.

Returns: Context manager to freeze the state of this ScaleTensor
Return type: FrozenStateRecord
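The revert-on-exit behaviour can be illustrated with a small context manager. `ScalarStore` is a hypothetical stand-in that keeps scalars in a dict; it assumes a snapshot-and-restore approach and does not reproduce FrozenStateRecord.

```python
import copy
from contextlib import contextmanager


class ScalarStore:
    """Hypothetical stand-in for an object holding named scalars."""

    def __init__(self):
        self.scalars = {}

    @contextmanager
    def freeze_state(self):
        # Snapshot the current state so it can be restored afterwards.
        snapshot = copy.deepcopy(self.scalars)
        try:
            yield self
        finally:
            # Any additions, removals or updates made inside are discarded.
            self.scalars = snapshot


store = ScalarStore()
store.scalars["variable"] = [1.0, 2.0]
with store.freeze_state():
    store.scalars["temporary"] = [0.5]
    inside = sorted(store.scalars)
after = sorted(store.scalars)

print(inside)  # ['temporary', 'variable']
print(after)   # ['variable']
```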

update_scalar(name: str, scalar: Tensor, *, override: bool = False) → None

Update an existing scalar maintaining original dimensions.

If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of validity against original scalar.

Parameters:

name (str) – Name of the scalar to update
scalar (torch.Tensor) – New scalar tensor
override (bool, optional) – Whether to override the scalar ignoring dimension compatibility, by default False

add(new_scalars: dict[str, tuple[int | tuple[int], Tensor]] | list[tuple[int | tuple[int], Tensor]] | None = None, **kwargs) → None

Add multiple scalars to the existing scalars.

Parameters:

new_scalars (dict[str, TENSOR_SPEC] | list[TENSOR_SPEC] | None, optional) – Scalars to add, see add_scalar for more info, by default None
**kwargs – Kwargs form of {name: (dimension, tensor)} to add to the scalars

update(updated_scalars: dict[str, Tensor] | None = None, override: bool = False, **kwargs) → None

Update multiple scalars in the existing scalars.

If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of shape.

Parameters:

updated_scalars (dict[str, torch.Tensor] | None, optional) – Scalars to update, referenced by name, by default None
override (bool, optional) – Whether to override the scalar ignoring dimension compatibility, by default False
**kwargs – Kwargs form of {name: tensor} to update in the scalars

subset(scalar_identifier: str | Sequence[str] | int | Sequence[int]) → ScaleTensor

Get subset of the scalars, filtering by name or dimension.

Parameters: scalar_identifier (str | Sequence[str] | int | Sequence[int]) – Name/s or dimension/s of the scalars to get
Returns: Subset of self
Return type: ScaleTensor

subset_by_str(scalars: str | Sequence[str]) → ScaleTensor

Get subset of the scalars, filtering by name.

See .subset_by_dim for subsetting by affected dimensions.

Parameters: scalars (str | Sequence[str]) – Name/s of the scalars to get
Returns: Subset of self
Return type: ScaleTensor

subset_by_dim(dimensions: int | Sequence[int]) → ScaleTensor

Get subset of the scalars, filtering by dimension.

See .subset_by_str for subsetting by name.

Parameters: dimensions (int | Sequence[int]) – Dimensions to get scalars of
Returns: Subset of self
Return type: ScaleTensor

without(scalar_identifier: str | Sequence[str] | int | Sequence[int]) → ScaleTensor

Get subset of the scalars, filtering out by name or dimension.

Parameters: scalar_identifier (str | Sequence[str] | int | Sequence[int]) – Name/s or dimension/s of the scalars to exclude
Returns: Subset of self
Return type: ScaleTensor
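The filtering behaviour of subset and without can be sketched with a dict of name to (dimensions, values). This is an illustrative model only: it assumes string identifiers match scalar names and int identifiers match any scalar touching that dimension, and the function names are hypothetical.

```python
def _matches(name, dims, identifiers):
    # A str identifier selects by name, an int identifier selects by dimension.
    return any(
        (ident == name) if isinstance(ident, str) else (ident in dims)
        for ident in identifiers
    )


def _as_list(identifier):
    return [identifier] if isinstance(identifier, (str, int)) else list(identifier)


def subset(scalars, identifier):
    """Keep only the scalars selected by name or dimension."""
    idents = _as_list(identifier)
    return {n: v for n, v in scalars.items() if _matches(n, v[0], idents)}


def without(scalars, identifier):
    """Drop the scalars selected by name or dimension."""
    idents = _as_list(identifier)
    return {n: v for n, v in scalars.items() if not _matches(n, v[0], idents)}


scalars = {
    "variable": ((3,), [1.0, 0.5]),  # applied along dimension 3
    "node": ((2,), [0.2, 0.8]),      # applied along dimension 2
}
print(list(subset(scalars, "variable")))  # ['variable']
print(list(subset(scalars, 2)))           # ['node']
print(list(without(scalars, 3)))          # ['node']
```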

without_by_str(scalars: str | Sequence[str]) → ScaleTensor

Get subset of the scalars, filtering out by name.

Parameters: scalars (str | Sequence[str]) – Name/s of the scalars to exclude
Returns: Subset of self
Return type: ScaleTensor

without_by_dim(dimensions: int | Sequence[int]) → ScaleTensor

Get subset of the scalars, filtering out by dimension.

Parameters: dimensions (int | Sequence[int]) – Dimensions to exclude scalars of
Returns: Subset of self
Return type: ScaleTensor

resolve(ndim: int) → ScaleTensor

Resolve relative indexes in scalars by associating against ndim.

For example, if a scalar was given as affecting dimension -1, and ndim was provided as 4, the scalar will be fixed to affect dimension 3.

Parameters: ndim (int) – Number of dimensions to resolve relative indexing against
Returns: ScaleTensor with all relative indexes resolved
Return type: ScaleTensor
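Resolving a relative index works the same way as negative indexing in Python. A minimal sketch, where `resolve_dim` is a hypothetical helper and the out-of-range check is an assumption about when resolution should fail:

```python
def resolve_dim(dim, ndim):
    """Convert a possibly negative dimension index into an absolute one."""
    if not -ndim <= dim < ndim:
        raise ValueError(f"Dimension {dim} is out of range for {ndim} dimensions")
    return dim % ndim


print(resolve_dim(-1, 4))  # 3
print(resolve_dim(2, 4))   # 2
```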

scale(tensor: Tensor) → Tensor

Scale a given tensor by the scalars.

Parameters: tensor (torch.Tensor) – Input tensor to scale
Returns: Scaled tensor
Return type: torch.Tensor

get_scalar(ndim: int, device: str | None = None) → Tensor

Get completely resolved scalar tensor.

Parameters:

ndim (int) – Number of dimensions of the tensor to resolve the scalars to. Used to resolve relative indices, and add singleton dimensions
device (str | None, optional) – Device to move the scalar to, by default None

Returns:

Scalar tensor

Return type:

torch.Tensor

Raises:

ValueError – If resolving relative indices is invalid

to(*args, **kwargs) → None: Move scalars inplace.