Losses
This module defines the loss functions used to train the model.
Anemoi-training exposes a number of loss functions by default, all of
which are subclassed from BaseWeightedLoss. This class enables scalar
multiplication and graph node weighting.
- class anemoi.training.losses.weightedloss.BaseWeightedLoss(node_weights: Tensor, ignore_nans: bool = False)
Bases: Module, ABC
Node-weighted general loss.
- add_scalar(dimension: int | tuple[int], scalar: Tensor, *, name: str | None = None) ScaleTensor
Add new scalar to be applied along dimension.
Dimension can be a single int even for a multi-dimensional scalar, in this case the dimensions are assigned as a range starting from the given int. Negative indexes are also valid, and will be resolved against the tensor’s ndim.
- Parameters:
  dimension (int | tuple[int]) – Dimension(s) to apply the scalar along
  scalar (torch.Tensor) – Scalar tensor to apply
  name (str | None, optional) – Name of the scalar, by default None
- Returns:
  ScaleTensor with the scalar added
- Return type:
  ScaleTensor
- update_scalar(name: str, scalar: Tensor, *, override: bool = False) None
Update an existing scalar maintaining original dimensions.
If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of its validity against the original scalar.
- scale(x: Tensor, subset_indices: tuple[int, ...] | None = None, *, without_scalars: list[str] | list[int] | None = None) Tensor
Scale a tensor by the variable_scaling.
- Parameters:
x (torch.Tensor) – Tensor to be scaled, shape (bs, ensemble, lat*lon, n_outputs)
subset_indices (tuple[int,...], optional) – Indices to subset the calculated scalar and x tensor with, by default None.
without_scalars (list[str] | list[int] | None, optional) – list of scalars to exclude from scaling. Can be list of names or dimensions to exclude. By default None
- Returns:
Scaled error tensor
- Return type:
torch.Tensor
- scale_by_node_weights(x: Tensor, squash: bool = True) Tensor
Scale a tensor by the node_weights.
Equivalent to reducing and averaging accordingly across all dimensions of the tensor.
- Parameters:
x (torch.Tensor) – Tensor to be scaled, shape (bs, ensemble, lat*lon, n_outputs)
squash (bool, optional) – Average the last dimension, by default True. If False, the loss is returned with shape (n_outputs)
- Returns:
Scaled error tensor
- Return type:
torch.Tensor
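To make the node weighting concrete, here is a minimal sketch of the equivalent computation, using placeholder shapes and weights (illustrative only, not the library implementation):

```python
import torch

bs, ensemble, latlon, n_outputs = 2, 1, 16, 3
x = torch.randn(bs, ensemble, latlon, n_outputs)  # e.g. per-point errors
node_weights = torch.rand(latlon)                 # e.g. per-node area weights

# Weighted average over batch, ensemble and spatial dimensions
weighted = x * node_weights[None, None, :, None]
per_output = weighted.sum(dim=(0, 1, 2)) / (node_weights.sum() * bs * ensemble)

loss = per_output.mean()  # squash=True: also average the last dimension
# with squash=False, per_output of shape (n_outputs) is returned instead
```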
- abstract forward(pred: Tensor, target: Tensor, squash: bool = True, *, scalar_indices: tuple[int, ...] | None = None, without_scalars: list[str] | list[int] | None = None) Tensor
Calculates the lat-weighted scaled loss.
- Parameters:
pred (torch.Tensor) – Prediction tensor, shape (bs, ensemble, lat*lon, n_outputs)
target (torch.Tensor) – Target tensor, shape (bs, ensemble, lat*lon, n_outputs)
squash (bool, optional) – Average last dimension, by default True
scalar_indices (tuple[int,...], optional) – Indices to subset the calculated scalar with, by default None
without_scalars (list[str] | list[int] | None, optional) – list of scalars to exclude from scaling. Can be list of names or dimensions to exclude. By default None
- Returns:
Weighted loss
- Return type:
torch.Tensor
- class anemoi.training.losses.weightedloss.FunctionalWeightedLoss(node_weights: Tensor, ignore_nans: bool = False)
Bases: BaseWeightedLoss
A WeightedLoss that a user can subclass, providing calculate_difference.
calculate_difference should compute the difference between the prediction and the target. All scaling and weighting is handled by the parent class.
Example:

```python
class MyLoss(FunctionalWeightedLoss):
    def calculate_difference(self, pred, target):
        return pred - target
```
- abstract calculate_difference(pred: Tensor, target: Tensor) Tensor
Calculate the difference between prediction and target.
- forward(pred: Tensor, target: Tensor, squash: bool = True, *, scalar_indices: tuple[int, ...] | None = None, without_scalars: list[str] | list[int] | None = None) Tensor
Calculates the lat-weighted scaled loss.
- Parameters:
pred (torch.Tensor) – Prediction tensor, shape (bs, ensemble, lat*lon, n_outputs)
target (torch.Tensor) – Target tensor, shape (bs, ensemble, lat*lon, n_outputs)
squash (bool, optional) – Average last dimension, by default True
scalar_indices (tuple[int,...], optional) – Indices to subset the calculated scalar with, by default None
without_scalars (list[str] | list[int] | None, optional) – list of scalars to exclude from scaling. Can be list of names or dimensions to exclude. By default None
- Returns:
Weighted loss
- Return type:
torch.Tensor
Deterministic Loss Functions
By default anemoi-training trains the model using a latitude-weighted mean-squared-error, which is defined in the WeightedMSELoss class in anemoi/training/losses/mse.py. The loss function can be configured in the config file at config.training.training_loss and config.training.validation_metrics.
The following loss functions are available by default:
- WeightedMSELoss: Latitude-weighted mean-squared-error.
- WeightedMAELoss: Latitude-weighted mean-absolute-error.
- WeightedHuberLoss: Latitude-weighted Huber loss.
- WeightedLogCoshLoss: Latitude-weighted log-cosh loss.
- WeightedRMSELoss: Latitude-weighted root-mean-squared-error.
- CombinedLoss: Combined component weighted loss.
These are available in the anemoi.training.losses module, at anemoi.training.losses.{short_name}.{class_name}.
For example, to use the WeightedMSELoss class, you would reference it in the config as follows:
```yaml
# loss function for the model
training_loss:
  # loss class to initialise
  _target_: anemoi.training.losses.mse.WeightedMSELoss
  # loss function kwargs here
```
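The _target_ key follows Hydra's instantiation convention, so a configured loss can also be built programmatically. A minimal sketch, assuming Hydra is installed and using placeholder node weights (normally supplied by the training framework):

```python
import torch
from hydra.utils import instantiate

config = {
    "_target_": "anemoi.training.losses.mse.WeightedMSELoss",
    "node_weights": torch.ones(16),  # placeholder per-node weights
}
loss_fn = instantiate(config)

pred = torch.randn(2, 1, 16, 3)    # (bs, ensemble, lat*lon, n_outputs)
target = torch.randn(2, 1, 16, 3)
loss = loss_fn(pred, target)
```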
Probabilistic Loss Functions
The following probabilistic loss functions are available by default:
- KernelCRPSLoss: Kernel CRPS loss.
- AlmostFairKernelCRPSLoss: Almost fair kernel CRPS loss, see Lang et al. (2024).
The config for these loss functions is the same as for the deterministic losses:
```yaml
# loss function for the model
training_loss:
  # loss class to initialise
  _target_: anemoi.training.losses.kcrps.KernelCRPSLoss
  # loss function kwargs here
```
Scalars
In addition to node scaling, the loss function can also be scaled by a scalar. These are provided by the Forecaster class, and a user can define whether to include them in the loss function by setting scalars in the loss config dictionary.
```yaml
# loss function for the model
training_loss:
  # loss class to initialise
  _target_: anemoi.training.losses.mse.WeightedMSELoss
  scalars: ['scalar1', 'scalar2']
```
Currently, the following scalars are available for use:
- variable: Scale by the feature/variable weights as defined in the config at config.training.variable_loss_scaling.
Validation Metrics
Validation metrics as defined in the config file at config.training.validation_metrics follow the same initialisation behaviour as the loss function, but can be a list. In this case all losses are calculated and logged as a dictionary with the corresponding name.
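For example, a list of metrics might be configured as follows (a sketch using the same _target_ convention as the training loss):

```yaml
validation_metrics:
  - _target_: anemoi.training.losses.mse.WeightedMSELoss
  - _target_: anemoi.training.losses.mae.WeightedMAELoss
```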
Scaling Validation Losses
Validation metrics cannot, by default, be scaled by scalars across the variable dimension, but can be scaled by all other scalars. If you want to scale a validation metric by the variable weights, it must be added to config.training.scale_validation_metrics.
These metrics are then kept in the normalised, preprocessed space, and thus the indexing of scalars aligns with the indexing of the tensors.
By default, only the all metric is kept in the normalised space and scaled.
```yaml
# List of validation metrics to keep in normalised space, and scalars to be applied
# Use '*' to reference all metrics, or a list of metric names.
# Unlike above, variable scaling is possible due to these metrics being
# calculated in the same way as the training loss, within the internal model space.
scale_validation_metrics:
  scalars_to_apply: ['variable']
  metrics:
    - 'all'
    # - "*"
```
Custom Loss Functions
Additionally, you can define your own loss function by subclassing BaseWeightedLoss and implementing the forward method, or by subclassing FunctionalWeightedLoss and implementing the calculate_difference function. The latter abstracts away the scaling and node weighting, and allows you to specify just the difference calculation.
```python
from anemoi.training.losses.weightedloss import FunctionalWeightedLoss

class MyLossFunction(FunctionalWeightedLoss):
    def calculate_difference(self, pred, target):
        return (pred - target) ** 2
```
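A minimal usage sketch (the node weights and tensor shapes are illustrative placeholders):

```python
import torch

loss_fn = MyLossFunction(node_weights=torch.ones(16))
pred = torch.randn(2, 1, 16, 3)    # (bs, ensemble, lat*lon, n_outputs)
target = torch.randn(2, 1, 16, 3)
loss = loss_fn(pred, target)       # scaling and node weighting handled by the parent class
```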
Then, in the config, set _target_ to the class's import path, and pass any additional kwargs to the loss function.
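For example, assuming the class above lives in a hypothetical module mypackage.losses:

```yaml
# loss function for the model
training_loss:
  # hypothetical import path to the custom loss class
  _target_: mypackage.losses.MyLossFunction
  # loss function kwargs here
```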
Combined Losses
Building on the simple single loss functions, a user can define a combined loss, one that weights and combines multiple loss functions. This can be done by referencing the CombinedLoss class in the config file, and setting the losses key to a list of loss functions to combine. Each of those losses is then initialised just like the other losses above.
```yaml
training_loss:
  _target_: anemoi.training.losses.combined.CombinedLoss
  losses:
    - _target_: anemoi.training.losses.mse.WeightedMSELoss
    - _target_: anemoi.training.losses.mae.WeightedMAELoss
  loss_weights: [1.0, 0.5]
  scalars: ['variable']
```
All extra kwargs passed to CombinedLoss are passed to each of the loss functions, and the loss weights are used to scale the individual losses before combining them.
If scalars is not given in the underlying loss functions, all the scalars given to the CombinedLoss are used.
If different scalars are required for each loss, the root-level scalars of the CombinedLoss should contain all the scalars required by the individual losses. The scalars for each loss can then be set in the individual loss config.
```yaml
training_loss:
  _target_: anemoi.training.losses.combined.CombinedLoss
  losses:
    - _target_: anemoi.training.losses.mse.WeightedMSELoss
      scalars: ['variable']
    - _target_: anemoi.training.losses.mae.WeightedMAELoss
      scalars: ['loss_weights_mask']
  loss_weights: [1.0, 1.0]
  scalars: ['*']
```
- class anemoi.training.losses.combined.CombinedLoss(*extra_losses: dict[str, Any] | Callable | BaseWeightedLoss, loss_weights: tuple[int, ...] | None = None, losses: tuple[dict[str, Any] | Callable | BaseWeightedLoss] | None = None, **kwargs)
Bases: BaseWeightedLoss
Combined Loss function.
- forward(pred: torch.Tensor, target: torch.Tensor, **kwargs) torch.Tensor
Calculates the combined loss.
- Parameters:
pred (torch.Tensor) – Prediction tensor, shape (bs, ensemble, lat*lon, n_outputs)
target (torch.Tensor) – Target tensor, shape (bs, ensemble, lat*lon, n_outputs)
kwargs (Any) – Additional arguments to pass to the loss functions; these will be passed to all loss functions
- Returns:
Combined loss
- Return type:
torch.Tensor
- property scalar: ScaleTensor
Get union of underlying scalars.
- add_scalar(dimension: int | tuple[int], scalar: Tensor, *, name: str | None = None) ScaleTensor
Add new scalar to be applied along dimension.
Dimension can be a single int even for a multi-dimensional scalar, in this case the dimensions are assigned as a range starting from the given int. Negative indexes are also valid, and will be resolved against the tensor’s ndim.
- Parameters:
  dimension (int | tuple[int]) – Dimension(s) to apply the scalar along
  scalar (torch.Tensor) – Scalar tensor to apply
  name (str | None, optional) – Name of the scalar, by default None
- Returns:
  ScaleTensor with the scalar added
- Return type:
  ScaleTensor
- update_scalar(name: str, scalar: Tensor, *, override: bool = False) None
Update an existing scalar maintaining original dimensions.
If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of its validity against the original scalar.
Utility Functions
There are also generic functions useful for losses in anemoi/training/losses/utils.py.
grad_scaler is used to automatically scale the loss gradients in the loss function using the formula in https://arxiv.org/pdf/2306.06079.pdf, section 4.3.2. This can be switched on in the config by setting the option config.training.loss_gradient_scaling=True.
ScaleTensor is a class that can record and apply arbitrary scaling factors to tensors. It supports relative indexing and combining multiple scalars over the same dimensions, and the scaling tensor is only constructed at broadcasting time, so its shape resolves to match the input tensor exactly.
- anemoi.training.losses.utils.grad_scaler(module: Module, grad_in: tuple[Tensor, ...], grad_out: tuple[Tensor, ...]) tuple[Tensor, ...] | None
Scales the loss gradients.
Uses the formula in https://arxiv.org/pdf/2306.06079.pdf, section 4.3.2
Use <module>.register_full_backward_hook(grad_scaler, prepend=False) to register this hook.
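A minimal registration sketch following the usage described above (model here is a placeholder for your torch.nn.Module):

```python
import torch
from anemoi.training.losses.utils import grad_scaler

model = torch.nn.Linear(4, 4)  # placeholder module
model.register_full_backward_hook(grad_scaler, prepend=False)
```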
- class anemoi.training.losses.utils.Shape(func: Callable[[int], int])
Bases: object
Shape resolving object.
- class anemoi.training.losses.utils.ScaleTensor(scalars: dict[str, tuple[int | tuple[int], Tensor]] | tuple[int | tuple[int], Tensor] | None = None, *tensors: tuple[int | tuple[int], Tensor], **named_tensors: dict[str, tuple[int | tuple[int], Tensor]])
Bases: object
Dynamically resolved tensor scaling class.
Allows a user to specify a scalar and the dimensions it should be applied to. The class will then enforce that additional scalars are compatible with the specified dimensions.
When get_scalar or scale is called, the class will return the product of all scalars, resolved to the dimensional size of the input tensor.
Additionally, the class can be subsetted to return only some of the scalars, but only those that were given names.
Examples
```python
>>> tensor = torch.randn(3, 4, 5)
>>> scalars = ScaleTensor((0, torch.randn(3)), (1, torch.randn(4)))
>>> scaled_tensor = scalars.scale(tensor)
>>> scalars.get_scalar(tensor.ndim).shape
torch.Size([3, 4, 1])
>>> scalars.add_scalar(-1, torch.randn(5))
>>> scalars.get_scalar(tensor.ndim).shape
torch.Size([3, 4, 5])
```
- property shape: Shape
Get the shape of the scale tensor.
Returns a Shape object to be indexed; it will only resolve those dimensions specified in the tensors.
- validate_scalar(dimension: int | tuple[int], scalar: Tensor) None
Check if the scalar is compatible with the given dimension.
- Parameters:
  dimension (int | tuple[int]) – Dimension(s) the scalar applies to
  scalar (torch.Tensor) – Scalar tensor to validate
- Raises:
ValueError – If the scalar is not compatible with the given dimension
- add_scalar(dimension: int | tuple[int], scalar: Tensor, *, name: str | None = None) ScaleTensor
Add new scalar to be applied along dimension.
Dimension can be a single int even for a multi-dimensional scalar, in this case the dimensions are assigned as a range starting from the given int. Negative indexes are also valid, and will be resolved against the tensor’s ndim.
- Parameters:
  dimension (int | tuple[int]) – Dimension(s) to apply the scalar along
  scalar (torch.Tensor) – Scalar tensor to apply
  name (str | None, optional) – Name of the scalar, by default None
- Returns:
  ScaleTensor with the scalar added
- Return type:
  ScaleTensor
- remove_scalar(scalar_to_remove: str | int) ScaleTensor
Remove scalar from ScaleTensor.
- Parameters:
scalar_to_remove (str | int) – Name or index of tensor to remove
- Raises:
ValueError – If the scalar is not in the scalars
- Returns:
ScaleTensor with the scalar removed
- Return type:
  ScaleTensor
- freeze_state() FrozenStateRecord
Freeze the state of the Scalar with a context manager.
Any changes made will be reverted on exit.
- Returns:
Context manager to freeze the state of this ScaleTensor
- Return type:
FrozenStateRecord
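A minimal usage sketch of the freezing behaviour (the scalars object is illustrative):

```python
import torch
from anemoi.training.losses.utils import ScaleTensor

scalars = ScaleTensor((0, torch.ones(3)))
with scalars.freeze_state():
    scalars.add_scalar(1, torch.ones(4))  # temporary addition
# on exit, the ScaleTensor no longer contains the dimension-1 scalar
```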
- update_scalar(name: str, scalar: Tensor, *, override: bool = False) None
Update an existing scalar maintaining original dimensions.
If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of its validity against the original scalar.
- add(new_scalars: dict[str, tuple[int | tuple[int], Tensor]] | list[tuple[int | tuple[int], Tensor]] | None = None, **kwargs) None
Add multiple scalars to the existing scalars.
- update(updated_scalars: dict[str, Tensor] | None = None, override: bool = False, **kwargs) None
Update multiple scalars in the existing scalars.
If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of shape.
- Parameters:
  updated_scalars (dict[str, torch.Tensor], optional) – Dictionary of scalars to update, by default None
  override (bool, optional) – Whether to update the scalar regardless of shape, by default False
- subset(scalar_identifier: str | Sequence[str] | int | Sequence[int]) ScaleTensor
Get subset of the scalars, filtering by name or dimension.
- subset_by_str(scalars: str | Sequence[str]) ScaleTensor
Get subset of the scalars, filtering by name.
See .subset_by_dim for subsetting by affected dimensions.
- subset_by_dim(dimensions: int | Sequence[int]) ScaleTensor
Get subset of the scalars, filtering by dimension.
See .subset_by_str for subsetting by name.
- without(scalar_identifier: str | Sequence[str] | int | Sequence[int]) ScaleTensor
Get subset of the scalars, filtering out by name or dimension.
- without_by_str(scalars: str | Sequence[str]) ScaleTensor
Get subset of the scalars, filtering out by name.
- without_by_dim(dimensions: int | Sequence[int]) ScaleTensor
Get subset of the scalars, filtering out by dimension.
- resolve(ndim: int) ScaleTensor
Resolve relative indexes in scalars by associating against ndim.
i.e. if a scalar was given as affecting dimension -1, and ndim was provided as 4, the scalar will be fixed to affect dimension 3.
- Parameters:
ndim (int) – Number of dimensions to resolve relative indexing against
- Returns:
ScaleTensor with all relative indexes resolved
- Return type:
  ScaleTensor
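For instance (an illustrative sketch of the behaviour described above):

```python
import torch
from anemoi.training.losses.utils import ScaleTensor

scalars = ScaleTensor((-1, torch.ones(5)))
resolved = scalars.resolve(4)  # the relative index -1 is fixed to dimension 3
```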
- scale(tensor: Tensor) Tensor
Scale a given tensor by the scalars.
- Parameters:
tensor (torch.Tensor) – Input tensor to scale
- Returns:
Scaled tensor
- Return type:
torch.Tensor
- get_scalar(ndim: int, device: str | None = None) Tensor
Get completely resolved scalar tensor.
- Parameters:
  ndim (int) – Number of dimensions to resolve the scalar to
  device (str, optional) – Device to place the resolved scalar on, by default None
- Returns:
Scalar tensor
- Return type:
torch.Tensor
- Raises:
ValueError – If resolving relative indices is invalid