Losses

This module defines the loss functions used to train the model.

Anemoi-training exposes several loss functions by default, all of which are subclassed from BaseWeightedLoss. This base class enables scalar multiplication and graph node weighting.

class anemoi.training.losses.weightedloss.BaseWeightedLoss(node_weights: Tensor, ignore_nans: bool = False)

Bases: Module, ABC

Node-weighted general loss.

add_scalar(dimension: int | tuple[int], scalar: Tensor, *, name: str | None = None) ScaleTensor

Add new scalar to be applied along dimension.

Dimension can be a single int even for a multi-dimensional scalar, in this case the dimensions are assigned as a range starting from the given int. Negative indexes are also valid, and will be resolved against the tensor’s ndim.

Parameters:
  • dimension (int | tuple[int]) – Dimension/s to apply the scalar to

  • scalar (torch.Tensor) – Scalar tensor to apply

  • name (str | None, optional) – Name of the scalar, by default None

Returns:

ScaleTensor with the scalar added

Return type:

ScaleTensor
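
For illustration, a minimal sketch of attaching a named scalar to a loss instance; the node-weight and scalar shapes here are toy values, not taken from a real configuration:

```python
import torch

from anemoi.training.losses.mse import WeightedMSELoss

# Toy setup: 100 graph nodes, 5 output variables
loss = WeightedMSELoss(node_weights=torch.ones(100))

# Apply a per-variable scalar along the last (variable) dimension.
# Negative indexes are resolved against the scaled tensor's ndim.
loss.add_scalar(-1, torch.rand(5), name="variable")
```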

update_scalar(name: str, scalar: Tensor, *, override: bool = False) None

Update an existing scalar maintaining original dimensions.

If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of validity against the original scalar.

Parameters:
  • name (str) – Name of the scalar to update

  • scalar (torch.Tensor) – New scalar tensor

  • override (bool, optional) – Whether to override the scalar ignoring dimension compatibility, by default False
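
A hedged sketch of updating a previously added scalar (names and shapes are illustrative):

```python
import torch

from anemoi.training.losses.mse import WeightedMSELoss

loss = WeightedMSELoss(node_weights=torch.ones(100))
loss.add_scalar(-1, torch.rand(5), name="variable")

# Replace the "variable" scalar; the original dimensions are kept,
# so the new tensor must be valid against them...
loss.update_scalar("variable", torch.rand(5))

# ...unless override=True, which skips the compatibility check.
loss.update_scalar("variable", torch.rand(5), override=True)
```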

scale(x: Tensor, subset_indices: tuple[int, ...] | None = None, *, without_scalars: list[str] | list[int] | None = None) Tensor

Scale a tensor by the variable_scaling.

Parameters:
  • x (torch.Tensor) – Tensor to be scaled, shape (bs, ensemble, lat*lon, n_outputs)

  • subset_indices (tuple[int,...], optional) – Indices to subset the calculated scalar and x tensor with, by default None.

  • without_scalars (list[str] | list[int] | None, optional) – list of scalars to exclude from scaling. Can be list of names or dimensions to exclude. By default None

Returns:

Scaled error tensor

Return type:

torch.Tensor
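
As a sketch, scaling an error tensor directly while excluding a named scalar (toy shapes):

```python
import torch

from anemoi.training.losses.mse import WeightedMSELoss

loss = WeightedMSELoss(node_weights=torch.ones(100))
loss.add_scalar(-1, torch.rand(5), name="variable")

# x has shape (bs, ensemble, lat*lon, n_outputs)
x = torch.randn(2, 1, 100, 5)
scaled = loss.scale(x, without_scalars=["variable"])
```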

scale_by_node_weights(x: Tensor, squash: bool = True) Tensor

Scale a tensor by the node_weights.

Equivalent to reducing and averaging accordingly across all dimensions of the tensor.

Parameters:
  • x (torch.Tensor) – Tensor to be scaled, shape (bs, ensemble, lat*lon, n_outputs)

  • squash (bool, optional) – Average the last dimension, by default True. If False, the returned loss has shape (n_outputs,)

Returns:

Scaled error tensor

Return type:

torch.Tensor
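
A sketch of the squash behaviour of the node weighting (toy shapes again):

```python
import torch

from anemoi.training.losses.mse import WeightedMSELoss

loss = WeightedMSELoss(node_weights=torch.ones(100))

err = torch.randn(2, 1, 100, 5)
total = loss.scale_by_node_weights(err, squash=True)        # scalar
per_output = loss.scale_by_node_weights(err, squash=False)  # shape (5,)
```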

abstract forward(pred: Tensor, target: Tensor, squash: bool = True, *, scalar_indices: tuple[int, ...] | None = None, without_scalars: list[str] | list[int] | None = None) Tensor

Calculates the lat-weighted scaled loss.

Parameters:
  • pred (torch.Tensor) – Prediction tensor, shape (bs, ensemble, lat*lon, n_outputs)

  • target (torch.Tensor) – Target tensor, shape (bs, ensemble, lat*lon, n_outputs)

  • squash (bool, optional) – Average last dimension, by default True

  • scalar_indices (tuple[int,...], optional) – Indices to subset the calculated scalar with, by default None

  • without_scalars (list[str] | list[int] | None, optional) – list of scalars to exclude from scaling. Can be list of names or dimensions to exclude. By default None

Returns:

Weighted loss

Return type:

torch.Tensor
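
Putting it together, a minimal end-to-end call with toy shapes (in training, these tensors are provided by the Forecaster):

```python
import torch

from anemoi.training.losses.mse import WeightedMSELoss

loss = WeightedMSELoss(node_weights=torch.ones(100))

# (bs, ensemble, lat*lon, n_outputs)
pred = torch.randn(2, 1, 100, 5)
target = torch.randn(2, 1, 100, 5)

value = loss(pred, target)                       # scalar, squash=True
per_variable = loss(pred, target, squash=False)  # shape (5,)
```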

property name: str

Used for logging identification purposes.

class anemoi.training.losses.weightedloss.FunctionalWeightedLoss(node_weights: Tensor, ignore_nans: bool = False)

Bases: BaseWeightedLoss

A weighted loss that a user can subclass by providing calculate_difference.

calculate_difference should calculate the difference between the prediction and target. All scaling and weighting is handled by the parent class.

Example:

```python
class MyLoss(FunctionalWeightedLoss):
    def calculate_difference(self, pred, target):
        return pred - target
```

abstract calculate_difference(pred: Tensor, target: Tensor) Tensor

Calculate the difference between prediction and target.

forward(pred: Tensor, target: Tensor, squash: bool = True, *, scalar_indices: tuple[int, ...] | None = None, without_scalars: list[str] | list[int] | None = None) Tensor

Calculates the lat-weighted scaled loss.

Parameters:
  • pred (torch.Tensor) – Prediction tensor, shape (bs, ensemble, lat*lon, n_outputs)

  • target (torch.Tensor) – Target tensor, shape (bs, ensemble, lat*lon, n_outputs)

  • squash (bool, optional) – Average last dimension, by default True

  • scalar_indices (tuple[int,...], optional) – Indices to subset the calculated scalar with, by default None

  • without_scalars (list[str] | list[int] | None, optional) – list of scalars to exclude from scaling. Can be list of names or dimensions to exclude. By default None

Returns:

Weighted loss

Return type:

torch.Tensor

Deterministic Loss Functions

By default, anemoi-training trains the model using a latitude-weighted mean-squared-error, which is defined in the WeightedMSELoss class in anemoi/training/losses/mse.py. The loss function can be configured in the config file at config.training.training_loss and config.training.validation_metrics.

The following loss functions are available by default:

  • WeightedMSELoss: Latitude-weighted mean-squared-error.

  • WeightedMAELoss: Latitude-weighted mean-absolute-error.

  • WeightedHuberLoss: Latitude-weighted Huber loss.

  • WeightedLogCoshLoss: Latitude-weighted log-cosh loss.

  • WeightedRMSELoss: Latitude-weighted root-mean-squared-error.

  • CombinedLoss: Combined component weighted loss.

These are available in the anemoi.training.losses module, at anemoi.training.losses.{short_name}.{class_name}.

So for example, to use the WeightedMSELoss class, you would reference it in the config as follows:

# loss function for the model
training_loss:
   # loss class to initialise
   _target_: anemoi.training.losses.mse.WeightedMSELoss
   # loss function kwargs here

Probabilistic Loss Functions

The following probabilistic loss functions are available by default:

  • KernelCRPSLoss: Kernel CRPS loss.

  • AlmostFairKernelCRPSLoss: Almost fair Kernel CRPS loss see Lang et al. (2024).

The config for these loss functions is the same as for the deterministic losses:

# loss function for the model
training_loss:
   # loss class to initialise
   _target_: anemoi.training.losses.kcrps.KernelCRPSLoss
   # loss function kwargs here

Scalars

In addition to node scaling, the loss function can also be scaled by a scalar. These are provided by the Forecaster class, and a user can define whether to include them in the loss function by setting scalars in the loss config dictionary.

# loss function for the model
training_loss:
   # loss class to initialise
   _target_: anemoi.training.losses.mse.WeightedMSELoss
   scalars: ['scalar1', 'scalar2']

Currently, the following scalars are available for use:

  • variable: Scale by the feature/variable weights as defined in the config config.training.variable_loss_scaling.

Validation Metrics

Validation metrics, as defined in the config file at config.training.validation_metrics, follow the same initialisation behaviour as the loss function, but can be given as a list. In that case, all losses are calculated and logged as a dictionary with their corresponding names.
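
For example, a list of metrics might be configured as follows (assuming the rmse and mae module paths follow the {short_name} pattern described above):

# validation metrics, each initialised like the training loss
validation_metrics:
   - _target_: anemoi.training.losses.rmse.WeightedRMSELoss
   - _target_: anemoi.training.losses.mae.WeightedMAELoss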

Scaling Validation Losses

By default, validation metrics cannot be scaled by scalars across the variable dimension, but they can be scaled by all other scalars. If you want to scale a validation metric by the variable weights, it must be added to config.training.scale_validation_metrics.

These metrics are then kept in the normalised, preprocessed space, and thus the indexing of scalars aligns with the indexing of the tensors.

By default, only 'all' is kept in the normalised space and scaled.

# List of validation metrics to keep in normalised space, and scalars to be applied.
# Use '*' to reference all metrics, or provide a list of metric names.
# Unlike above, variable scaling is possible because these metrics are
# calculated in the same way as the training loss, within the internal model space.
scale_validation_metrics:
   scalars_to_apply: ['variable']
   metrics:
      - 'all'
      # - "*"

Custom Loss Functions

Additionally, you can define your own loss function by subclassing BaseWeightedLoss and implementing the forward method, or by subclassing FunctionalWeightedLoss and implementing the calculate_difference function. The latter abstracts away the scaling and node weighting, letting you specify only the difference calculation.

from anemoi.training.losses.weightedloss import FunctionalWeightedLoss

class MyLossFunction(FunctionalWeightedLoss):
   def calculate_difference(self, pred, target):
      return (pred - target) ** 2

Then, in the config, set _target_ to the full path of your class, and add any additional kwargs for the loss function.
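
For instance (the module path below is hypothetical and depends on where your class lives):

training_loss:
   # hypothetical import path to your custom loss
   _target_: my_package.losses.MyLossFunction
   # any additional kwargs here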

Combined Losses

Building on the simple single loss functions, a user can define a combined loss, one that weights and combines multiple loss functions.

This can be done by referencing the CombinedLoss class in the config file and setting the losses key to a list of loss functions to combine. Each of those losses is then initialised just like the other losses above.

training_loss:
   _target_: anemoi.training.losses.combined.CombinedLoss
   losses:
      - _target_: anemoi.training.losses.mse.WeightedMSELoss
      - _target_: anemoi.training.losses.mae.WeightedMAELoss
   loss_weights: [1.0, 0.5]
   scalars: ['variable']

All extra kwargs passed to CombinedLoss are passed to each of the loss functions, and the loss weights are used to scale the individual losses before combining them.

If scalars is not given in the underlying loss functions, all the scalars given to the CombinedLoss are used.

If different scalars are required for each loss, the root level scalars of the CombinedLoss should contain all the scalars required by the individual losses. Then the scalars for each loss can be set in the individual loss config.

training_loss:
   _target_: anemoi.training.losses.combined.CombinedLoss
   losses:
         - _target_: anemoi.training.losses.mse.WeightedMSELoss
           scalars: ['variable']
         - _target_: anemoi.training.losses.mae.WeightedMAELoss
           scalars: ['loss_weights_mask']
   loss_weights: [1.0, 1.0]
   scalars: ['*']

class anemoi.training.losses.combined.CombinedLoss(*extra_losses: dict[str, Any] | Callable | BaseWeightedLoss, loss_weights: tuple[int, ...] | None = None, losses: tuple[dict[str, Any] | Callable | BaseWeightedLoss] | None = None, **kwargs)

Bases: BaseWeightedLoss

Combined Loss function.

forward(pred: torch.Tensor, target: torch.Tensor, **kwargs) torch.Tensor

Calculates the combined loss.

Parameters:
  • pred (torch.Tensor) – Prediction tensor, shape (bs, ensemble, lat*lon, n_outputs)

  • target (torch.Tensor) – Target tensor, shape (bs, ensemble, lat*lon, n_outputs)

  • kwargs (Any) – Additional arguments to pass to the loss functions. Will be passed to all loss functions.

Returns:

Combined loss

Return type:

torch.Tensor

property name: str

Used for logging identification purposes.

property scalar: ScaleTensor

Get union of underlying scalars.

add_scalar(dimension: int | tuple[int], scalar: Tensor, *, name: str | None = None) ScaleTensor

Add new scalar to be applied along dimension.

Dimension can be a single int even for a multi-dimensional scalar, in this case the dimensions are assigned as a range starting from the given int. Negative indexes are also valid, and will be resolved against the tensor’s ndim.

Parameters:
  • dimension (int | tuple[int]) – Dimension/s to apply the scalar to

  • scalar (torch.Tensor) – Scalar tensor to apply

  • name (str | None, optional) – Name of the scalar, by default None

Returns:

ScaleTensor with the scalar added

Return type:

ScaleTensor

update_scalar(name: str, scalar: Tensor, *, override: bool = False) None

Update an existing scalar maintaining original dimensions.

If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of validity against the original scalar.

Parameters:
  • name (str) – Name of the scalar to update

  • scalar (torch.Tensor) – New scalar tensor

  • override (bool, optional) – Whether to override the scalar ignoring dimension compatibility, by default False

Utility Functions

There are also generic functions useful for losses in anemoi/training/losses/utils.py.

grad_scaler is used to automatically scale the loss gradients in the loss function using the formula in https://arxiv.org/pdf/2306.06079.pdf, section 4.3.2. This can be switched on in the config by setting the option config.training.loss_gradient_scaling=True.

ScaleTensor is a class that records and applies arbitrary scaling factors to tensors. It supports relative indexing and combining multiple scalars over the same dimensions, and the full scaling tensor is only constructed at broadcast time, so its shape can be resolved to match the input tensor exactly.

anemoi.training.losses.utils.grad_scaler(module: Module, grad_in: tuple[Tensor, ...], grad_out: tuple[Tensor, ...]) tuple[Tensor, ...] | None

Scales the loss gradients.

Uses the formula in https://arxiv.org/pdf/2306.06079.pdf, section 4.3.2

Use <module>.register_full_backward_hook(grad_scaler, prepend=False) to register this hook.

Parameters:
  • module (nn.Module) – Loss object (not used)

  • grad_in (tuple[torch.Tensor, ...]) – Loss gradients

  • grad_out (tuple[torch.Tensor, ...]) – Output gradients (not used)

Returns:

Re-scaled input gradients

Return type:

tuple[torch.Tensor, …]
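
A minimal sketch of registering the hook as described above (toy node weights):

```python
import torch

from anemoi.training.losses.mse import WeightedMSELoss
from anemoi.training.losses.utils import grad_scaler

loss = WeightedMSELoss(node_weights=torch.ones(100))

# Re-scale the loss gradients on the backward pass
loss.register_full_backward_hook(grad_scaler, prepend=False)
```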

class anemoi.training.losses.utils.Shape(func: Callable[[int], int])

Bases: object

Shape resolving object.

class anemoi.training.losses.utils.ScaleTensor(scalars: dict[str, tuple[int | tuple[int], Tensor]] | tuple[int | tuple[int], Tensor] | None = None, *tensors: tuple[int | tuple[int], Tensor], **named_tensors: dict[str, tuple[int | tuple[int], Tensor]])

Bases: object

Dynamically resolved tensor scaling class.

Allows a user to specify a scalar and the dimensions it should be applied to. The class will then enforce that additional scalars are compatible with the specified dimensions.

When get_scalar or scale is called, the class will return the product of all scalars, resolved to the dimensional size of the input tensor.

Additionally, the class can be subsetted to return only a subset of the scalars, though only those scalars that were given names.

Examples

>>> tensor = torch.randn(3, 4, 5)
>>> scalars = ScaleTensor((0, torch.randn(3)), (1, torch.randn(4)))
>>> scaled_tensor = scalars.scale(tensor)
>>> scalars.get_scalar(tensor.ndim).shape
torch.Size([3, 4, 1])
>>> scalars.add_scalar(-1, torch.randn(5))
>>> scalars.get_scalar(tensor.ndim).shape
torch.Size([3, 4, 5])

property shape: Shape

Get the shape of the scale tensor.

Returns a Shape object to be indexed. It will only resolve those dimensions specified in the tensors.

validate_scalar(dimension: int | tuple[int], scalar: Tensor) None

Check if the scalar is compatible with the given dimension.

Parameters:
  • dimension (int | tuple[int]) – Dimensions to check scalar against

  • scalar (torch.Tensor) – Scalar tensor to check

Raises:

ValueError – If the scalar is not compatible with the given dimension

add_scalar(dimension: int | tuple[int], scalar: Tensor, *, name: str | None = None) ScaleTensor

Add new scalar to be applied along dimension.

Dimension can be a single int even for a multi-dimensional scalar, in this case the dimensions are assigned as a range starting from the given int. Negative indexes are also valid, and will be resolved against the tensor’s ndim.

Parameters:
  • dimension (int | tuple[int]) – Dimension/s to apply the scalar to

  • scalar (torch.Tensor) – Scalar tensor to apply

  • name (str | None, optional) – Name of the scalar, by default None

Returns:

ScaleTensor with the scalar added

Return type:

ScaleTensor

remove_scalar(scalar_to_remove: str | int) ScaleTensor

Remove scalar from ScaleTensor.

Parameters:

scalar_to_remove (str | int) – Name or index of tensor to remove

Raises:

ValueError – If the scalar is not in the scalars

Returns:

ScaleTensor with the scalar removed

Return type:

ScaleTensor

freeze_state() FrozenStateRecord

Freeze the state of the ScaleTensor with a context manager.

Any changes made will be reverted on exit.

Returns:

Context manager to freeze the state of this ScaleTensor

Return type:

FrozenStateRecord
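
A sketch of the context-manager behaviour (toy scalars):

```python
import torch

from anemoi.training.losses.utils import ScaleTensor

scalars = ScaleTensor((0, torch.ones(3)))

with scalars.freeze_state():
    # temporary scalar, reverted when the context exits
    scalars.add_scalar(1, torch.rand(4), name="temporary")
```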

update_scalar(name: str, scalar: Tensor, *, override: bool = False) None

Update an existing scalar maintaining original dimensions.

If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of validity against the original scalar.

Parameters:
  • name (str) – Name of the scalar to update

  • scalar (torch.Tensor) – New scalar tensor

  • override (bool, optional) – Whether to override the scalar ignoring dimension compatibility, by default False

add(new_scalars: dict[str, tuple[int | tuple[int], Tensor]] | list[tuple[int | tuple[int], Tensor]] | None = None, **kwargs) None

Add multiple scalars to the existing scalars.

Parameters:
  • new_scalars (dict[str, TENSOR_SPEC] | list[TENSOR_SPEC] | None, optional) – Scalars to add, see add_scalar for more info, by default None

  • **kwargs – Kwargs form of {name: (dimension, tensor)} to add to the scalars

update(updated_scalars: dict[str, Tensor] | None = None, override: bool = False, **kwargs) None

Update multiple scalars in the existing scalars.

If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of shape.

Parameters:
  • updated_scalars (dict[str, torch.Tensor] | None, optional) – Scalars to update, referenced by name, by default None

  • override (bool, optional) – Whether to override the scalar ignoring dimension compatibility, by default False

  • **kwargs – Kwargs form of {name: tensor} to update in the scalars

subset(scalar_identifier: str | Sequence[str] | int | Sequence[int]) ScaleTensor

Get subset of the scalars, filtering by name or dimension.

Parameters:

scalar_identifier (str | Sequence[str] | int | Sequence[int]) – Name/s or dimension/s of the scalars to get

Returns:

Subset of self

Return type:

ScaleTensor

subset_by_str(scalars: str | Sequence[str]) ScaleTensor

Get subset of the scalars, filtering by name.

See .subset_by_dim for subsetting by affected dimensions.

Parameters:

scalars (str | Sequence[str]) – Name/s of the scalars to get

Returns:

Subset of self

Return type:

ScaleTensor

subset_by_dim(dimensions: int | Sequence[int]) ScaleTensor

Get subset of the scalars, filtering by dimension.

See .subset for subsetting by name.

Parameters:

dimensions (int | Sequence[int]) – Dimensions to get scalars of

Returns:

Subset of self

Return type:

ScaleTensor

without(scalar_identifier: str | Sequence[str] | int | Sequence[int]) ScaleTensor

Get subset of the scalars, filtering out by name or dimension.

Parameters:

scalar_identifier (str | Sequence[str] | int | Sequence[int]) – Name/s or dimension/s of the scalars to exclude

Returns:

Subset of self

Return type:

ScaleTensor

without_by_str(scalars: str | Sequence[str]) ScaleTensor

Get subset of the scalars, filtering out by name.

Parameters:

scalars (str | Sequence[str]) – Name/s of the scalars to exclude

Returns:

Subset of self

Return type:

ScaleTensor

without_by_dim(dimensions: int | Sequence[int]) ScaleTensor

Get subset of the scalars, filtering out by dimension.

Parameters:

dimensions (int | Sequence[int]) – Dimensions to exclude scalars of

Returns:

Subset of self

Return type:

ScaleTensor
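
To illustrate the filtering family above, a short sketch (names and dimensions are illustrative):

```python
import torch

from anemoi.training.losses.utils import ScaleTensor

scalars = ScaleTensor(variable=(-1, torch.rand(5)), node=(2, torch.rand(7)))

only_variable = scalars.subset("variable")  # keep by name
no_dim_two = scalars.without_by_dim(2)      # drop by dimension
```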

resolve(ndim: int) ScaleTensor

Resolve relative indexes in scalars by associating against ndim.

i.e. if a scalar was given as affecting dimension -1, and ndim was provided as 4, the scalar will be fixed to affect dimension 3.

Parameters:

ndim (int) – Number of dimensions to resolve relative indexing against

Returns:

ScaleTensor with all relative indexes resolved

Return type:

ScaleTensor
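
For example, a relative index can be pinned as follows (sketch):

```python
import torch

from anemoi.training.losses.utils import ScaleTensor

scalars = ScaleTensor((-1, torch.rand(5)))

# With ndim=4, the scalar on dimension -1 is fixed to dimension 3
resolved = scalars.resolve(4)
```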

scale(tensor: Tensor) Tensor

Scale a given tensor by the scalars.

Parameters:

tensor (torch.Tensor) – Input tensor to scale

Returns:

Scaled tensor

Return type:

torch.Tensor

get_scalar(ndim: int, device: str | None = None) Tensor

Get completely resolved scalar tensor.

Parameters:
  • ndim (int) – Number of dimensions of the tensor to resolve the scalars to. Used to resolve relative indices and add singleton dimensions

  • device (str | None, optional) – Device to move the scalar to, by default None

Returns:

Scalar tensor

Return type:

torch.Tensor

Raises:

ValueError – If resolving relative indices is invalid

to(*args, **kwargs) None

Move scalars in place.