Losses
This module defines the loss functions used to train the model.
Anemoi-training provides several loss functions by default, all of
which subclass BaseWeightedLoss. This class enables scalar
multiplication and graph node weighting.
- class anemoi.training.losses.weightedloss.BaseWeightedLoss(node_weights: Tensor, ignore_nans: bool = False)
Bases:
Module, ABC
Node-weighted general loss.
- add_scalar(dimension: int | tuple[int], scalar: Tensor, *, name: str | None = None) ScaleTensor
Add new scalar to be applied along dimension.
Dimension can be a single int even for a multi-dimensional scalar, in this case the dimensions are assigned as a range starting from the given int. Negative indexes are also valid, and will be resolved against the tensor’s ndim.
- Parameters:
- Returns:
ScaleTensor with the scalar added
- Return type:
ScaleTensor
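The dimension rule described for add_scalar can be illustrated with a small standalone helper. This is only a sketch, assuming a single int is expanded into consecutive dimensions, one per axis of the scalar. The name expand_dimension is hypothetical and not part of anemoi-training.

```python
def expand_dimension(dimension, scalar_ndim):
    """Return one dimension index per axis of the scalar.

    Hypothetical helper for illustration only, not the library's code.
    """
    if isinstance(dimension, int):
        # A range starting from the given int, one entry per scalar axis.
        return tuple(dimension + i for i in range(scalar_ndim))
    dimension = tuple(dimension)
    if len(dimension) != scalar_ndim:
        raise ValueError("need exactly one dimension per axis of the scalar")
    return dimension


print(expand_dimension(2, 1))       # (2,)
print(expand_dimension(1, 2))       # (1, 2)
print(expand_dimension((0, 3), 2))  # (0, 3)
```

Negative values pass through unchanged here; they would only be turned into absolute indexes once the tensor's ndim is known.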
- update_scalar(name: str, scalar: Tensor, *, override: bool = False) None
Update an existing scalar maintaining original dimensions.
If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of validity against original scalar.
- scale(x: Tensor, subset_indices: tuple[int, ...] | None = None, *, without_scalars: list[str] | list[int] | None = None) Tensor
Scale a tensor by the variable_scaling.
- Parameters:
x (torch.Tensor) – Tensor to be scaled, shape (bs, ensemble, lat*lon, n_outputs)
subset_indices (tuple[int,...], optional) – Indices to subset the calculated scalar and x tensor with, by default None.
without_scalars (list[str] | list[int] | None, optional) – list of scalars to exclude from scaling. Can be list of names or dimensions to exclude. By default None
- Returns:
Scaled error tensor
- Return type:
torch.Tensor
- scale_by_node_weights(x: Tensor, squash: bool = True) Tensor
Scale a tensor by the node_weights.
Equivalent to reducing and averaging accordingly across all dimensions of the tensor.
- Parameters:
x (torch.Tensor) – Tensor to be scaled, shape (bs, ensemble, lat*lon, n_outputs)
squash (bool, optional) – Average last dimension, by default True. If False, the returned loss has shape (n_outputs).
- Returns:
Scaled error tensor
- Return type:
torch.Tensor
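As a rough illustration of node weighting, the sketch below computes a weighted average over nodes using plain Python lists in place of tensors. It assumes the weighting is a normalised weighted mean and drops the batch and ensemble dimensions for brevity. node_weighted_mean is a hypothetical name, and the numbers are made up.

```python
def node_weighted_mean(x, node_weights, squash=True):
    """Node-weighted average of errors with shape (nodes, n_outputs).

    Simplified stand-in: batch and ensemble dimensions are omitted.
    """
    total_weight = sum(node_weights)
    n_outputs = len(x[0])
    # Weighted average over nodes, kept separate for each output variable.
    per_output = [
        sum(w * row[j] for w, row in zip(node_weights, x)) / total_weight
        for j in range(n_outputs)
    ]
    if squash:
        # Also average over the last dimension, giving a single number.
        return sum(per_output) / n_outputs
    return per_output


errors = [[1.0, 4.0], [3.0, 8.0]]  # 2 nodes, 2 output variables
weights = [1.0, 3.0]               # illustrative per-node weights
print(node_weighted_mean(errors, weights, squash=False))  # [2.5, 7.0]
print(node_weighted_mean(errors, weights))                # 4.75
```

With squash=False the result keeps one value per output variable, which matches the (n_outputs) shape mentioned for that option.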
- abstract forward(pred: Tensor, target: Tensor, squash: bool = True, *, scalar_indices: tuple[int, ...] | None = None, without_scalars: list[str] | list[int] | None = None) Tensor
Calculates the lat-weighted scaled loss.
- Parameters:
pred (torch.Tensor) – Prediction tensor, shape (bs, ensemble, lat*lon, n_outputs)
target (torch.Tensor) – Target tensor, shape (bs, ensemble, lat*lon, n_outputs)
squash (bool, optional) – Average last dimension, by default True
scalar_indices (tuple[int,...], optional) – Indices to subset the calculated scalar with, by default None
without_scalars (list[str] | list[int] | None, optional) – list of scalars to exclude from scaling. Can be list of names or dimensions to exclude. By default None
- Returns:
Weighted loss
- Return type:
torch.Tensor
- class anemoi.training.losses.weightedloss.FunctionalWeightedLoss(node_weights: Tensor, ignore_nans: bool = False)
Bases:
BaseWeightedLoss
WeightedLoss which a user can subclass and provide calculate_difference.
calculate_difference should calculate the difference between the prediction and target. All scaling and weighting is handled by the parent class.
Example:
```python
class MyLoss(FunctionalWeightedLoss):
    def calculate_difference(self, pred, target):
        return pred - target
```
- abstract calculate_difference(pred: Tensor, target: Tensor) Tensor
Calculate Difference between prediction and target.
- forward(pred: Tensor, target: Tensor, squash: bool = True, *, scalar_indices: tuple[int, ...] | None = None, without_scalars: list[str] | list[int] | None = None) Tensor
Calculates the lat-weighted scaled loss.
- Parameters:
pred (torch.Tensor) – Prediction tensor, shape (bs, ensemble, lat*lon, n_outputs)
target (torch.Tensor) – Target tensor, shape (bs, ensemble, lat*lon, n_outputs)
squash (bool, optional) – Average last dimension, by default True
scalar_indices (tuple[int,...], optional) – Indices to subset the calculated scalar with, by default None
without_scalars (list[str] | list[int] | None, optional) – list of scalars to exclude from scaling. Can be list of names or dimensions to exclude. By default None
- Returns:
Weighted loss
- Return type:
torch.Tensor
Default Loss Functions
By default anemoi-training trains the model using a latitude-weighted
mean-squared-error, which is defined in the WeightedMSELoss class in
anemoi/training/losses/mse.py. The loss function can be configured
in the config file at config.training.training_loss, and
config.training.validation_metrics.
The following loss functions are available by default:
- WeightedMSELoss: Latitude-weighted mean-squared-error.
- WeightedMAELoss: Latitude-weighted mean-absolute-error.
- WeightedHuberLoss: Latitude-weighted Huber loss.
- WeightedLogCoshLoss: Latitude-weighted log-cosh loss.
- WeightedRMSELoss: Latitude-weighted root-mean-squared-error.
- CombinedLoss: Combined component weighted loss.
These are available in the anemoi.training.losses module, at
anemoi.training.losses.{short_name}.{class_name}.
So for example, to use the WeightedMSELoss class, you would
reference it in the config as follows:
# loss function for the model
training_loss:
  # loss class to initialise
  _target_: anemoi.training.losses.mse.WeightedMSELoss
  # loss function kwargs here
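To make the role of the _target_ key more concrete, the sketch below shows how a dotted path and its sibling keys could be turned into an object. It assumes Hydra-style instantiation, where _target_ names the class and the remaining keys become keyword arguments. It does not use Hydra itself, instantiate_target is a hypothetical helper, and a standard-library class stands in for a loss class so the snippet runs anywhere.

```python
import importlib


def instantiate_target(config):
    """Build an object from a dict with a Hydra-style `_target_` key."""
    kwargs = dict(config)
    module_path, _, class_name = kwargs.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    # Every remaining key is passed to the constructor as a keyword argument.
    return cls(**kwargs)


# fractions.Fraction is only a stand-in for a loss class here.
obj = instantiate_target(
    {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
)
print(obj)  # 3/4
```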
Scalars
In addition to node scaling, the loss function can also be scaled by a
scalar. These are provided by the Forecaster class, and a user can
define whether to include them in the loss function by setting
scalars in the loss config dictionary.
# loss function for the model
training_loss:
  # loss class to initialise
  _target_: anemoi.training.losses.mse.WeightedMSELoss
  scalars: ['scalar1', 'scalar2']
Currently, the following scalars are available for use:
- variable: Scale by the feature/variable weights as defined in the config at config.training.variable_loss_scaling.
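The effect of listing a scalar name in the loss config can be sketched as follows. This is an illustration only: it assumes each requested scalar multiplies the error along the variable (last) dimension, uses nested lists instead of tensors, and the helper apply_scalars, the "unused" entry and all weights are invented for the example.

```python
def apply_scalars(errors, available, requested):
    """Multiply errors of shape (nodes, variables) by each requested scalar."""
    scaled = [list(row) for row in errors]
    for name in requested:
        weights = available[name]  # one weight per variable
        scaled = [[w * e for w, e in zip(weights, row)] for row in scaled]
    return scaled


available = {
    "variable": [1.0, 0.5, 2.0],   # illustrative per-variable weights
    "unused": [10.0, 10.0, 10.0],  # provided but not requested below
}
errors = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(apply_scalars(errors, available, ["variable"]))
# [[1.0, 1.0, 6.0], [4.0, 2.5, 12.0]]
```

Scalars that are available but not listed leave the errors untouched, which is the point of selecting them by name.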
Validation Metrics
Validation metrics as defined in the config file at
config.training.validation_metrics follow the same initialisation
behaviour as the loss function, but can be a list. In this case all
losses are calculated and logged as a dictionary keyed by the
corresponding names.
Scaling Validation Losses
By default, validation metrics cannot be scaled by scalars across the variable dimension, but they can be scaled by all other scalars. If you want to scale a validation metric by the variable weights, it must be added to config.training.scale_validation_metrics.
These metrics are then kept in the normalised, preprocessed space, and thus the indexing of scalars aligns with the indexing of the tensors.
By default, only the metric named 'all' is kept in the normalised space and scaled.
# List of validation metrics to keep in normalised space, and scalars to be applied
# Use '*' to reference all metrics, or a list of metric names.
# Unlike above, variable scaling is possible due to these metrics being
# calculated in the same way as the training loss, within the internal model space.
scale_validation_metrics:
  scalars_to_apply: ['variable']
  metrics:
    - 'all'
    # - "*"
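The selection logic implied by the metrics list can be sketched in a few lines. This assumes '*' acts as a wildcard and that other entries are matched against metric names exactly. The helper metrics_to_scale and the metric name "other" are hypothetical.

```python
def metrics_to_scale(config, metric_names):
    """Return the metric names selected by the `metrics` list."""
    selected = config["metrics"]
    if "*" in selected:
        # The wildcard selects every metric.
        return list(metric_names)
    return [name for name in metric_names if name in selected]


default_config = {"scalars_to_apply": ["variable"], "metrics": ["all"]}
wildcard_config = {"scalars_to_apply": ["variable"], "metrics": ["*"]}

print(metrics_to_scale(default_config, ["all", "other"]))   # ['all']
print(metrics_to_scale(wildcard_config, ["all", "other"]))  # ['all', 'other']
```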
Custom Loss Functions
Additionally, you can define your own loss function by subclassing
BaseWeightedLoss and implementing the forward method, or by
subclassing FunctionalWeightedLoss and implementing the
calculate_difference function. The latter abstracts the scaling, and
node weighting, and allows you to just specify the difference
calculation.
from anemoi.training.losses.weightedloss import FunctionalWeightedLoss

class MyLossFunction(FunctionalWeightedLoss):
    def calculate_difference(self, pred, target):
        return (pred - target) ** 2
Then in the config, set _target_ to the full import path of the class,
and pass any additional kwargs for the loss function alongside it.
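The division of labour between the parent class and calculate_difference can be shown with a torch-free stand-in. This is a sketch of the pattern only: SketchFunctionalLoss is a hypothetical class, not FunctionalWeightedLoss, and it assumes the parent reduces the difference with a normalised node-weighted mean.

```python
from abc import ABC, abstractmethod


class SketchFunctionalLoss(ABC):
    """Hypothetical stand-in for FunctionalWeightedLoss, without torch."""

    def __init__(self, node_weights):
        self.node_weights = node_weights

    @abstractmethod
    def calculate_difference(self, pred, target):
        """Return the per-node difference between prediction and target."""

    def forward(self, pred, target):
        # The parent handles the weighting; subclasses only define the difference.
        diff = self.calculate_difference(pred, target)
        weighted = sum(w * d for w, d in zip(self.node_weights, diff))
        return weighted / sum(self.node_weights)


class SquaredError(SketchFunctionalLoss):
    def calculate_difference(self, pred, target):
        return [(p - t) ** 2 for p, t in zip(pred, target)]


loss = SquaredError(node_weights=[1.0, 3.0])
print(loss.forward([1.0, 2.0], [0.0, 0.0]))  # 3.25
```

Swapping in a different loss then only means writing a new calculate_difference, while the weighting code stays in one place.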
Combined Losses
Building on the simple single loss functions, a user can define a combined loss, one that weights and combines multiple loss functions.
This can be done by referencing the CombinedLoss class in the config
file, and setting the losses key to a list of loss functions to
combine. Each of those losses is then initialised just like the other
losses above.
training_loss:
  _target_: anemoi.training.losses.combined.CombinedLoss
  losses:
    - _target_: anemoi.training.losses.mse.WeightedMSELoss
    - _target_: anemoi.training.losses.mae.WeightedMAELoss
  scalars: ['variable']
  loss_weights: [1.0, 0.5]
All kwargs passed to CombinedLoss are passed to each of the loss
functions, and the loss weights are used to scale the individual losses
before combining them.
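The arithmetic behind a combined loss can be sketched with two plain-Python losses. The sketch assumes that "combining" means summing the individually weighted losses. The functions mse, mae and combined_loss are illustrative stand-ins, not the library classes, and the numbers are made up.

```python
def mse(pred, target):
    """Unweighted mean-squared-error over a flat list."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Unweighted mean-absolute-error over a flat list."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def combined_loss(pred, target, losses, loss_weights):
    """Scale each loss by its weight, then sum the results."""
    if len(losses) != len(loss_weights):
        raise ValueError("need exactly one weight per loss")
    return sum(w * loss(pred, target) for w, loss in zip(loss_weights, losses))


pred, target = [1.0, 3.0], [0.0, 1.0]
# mse = 2.5 and mae = 1.5, so the result is 1.0 * 2.5 + 0.5 * 1.5
print(combined_loss(pred, target, [mse, mae], [1.0, 0.5]))  # 3.25
```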
- class anemoi.training.losses.combined.CombinedLoss(*extra_losses: dict[str, Any] | Callable, losses: tuple[dict[str, Any] | Callable] | None = None, loss_weights: tuple[int, ...], **kwargs)
Bases:
Module
Combined Loss function.
- forward(pred: Tensor, target: Tensor, **kwargs) Tensor
Calculates the combined loss.
- Parameters:
pred (torch.Tensor) – Prediction tensor, shape (bs, ensemble, lat*lon, n_outputs)
target (torch.Tensor) – Target tensor, shape (bs, ensemble, lat*lon, n_outputs)
kwargs (Any) – Additional arguments to pass to the loss functions. These are passed to all loss functions.
- Returns:
Combined loss
- Return type:
torch.Tensor
Utility Functions
There are also generic functions that are useful for losses in
anemoi/training/losses/utils.py.
grad_scaler is used to automatically scale the loss gradients in the
loss function using the formula in https://arxiv.org/pdf/2306.06079.pdf,
section 4.3.2. This can be switched on in the config by setting the
option config.training.loss_gradient_scaling=True.
ScaleTensor is a class that can record and apply arbitrary scaling
factors to tensors. It supports relative indexing, combining multiple
scalars over the same dimensions, and is only constructed at
broadcasting time, so the shape can be resolved to match the tensor
exactly.
- anemoi.training.losses.utils.grad_scaler(module: Module, grad_in: tuple[Tensor, ...], grad_out: tuple[Tensor, ...]) tuple[Tensor, ...] | None
Scales the loss gradients.
Uses the formula in https://arxiv.org/pdf/2306.06079.pdf, section 4.3.2
Use <module>.register_full_backward_hook(grad_scaler, prepend=False) to register this hook.
- class anemoi.training.losses.utils.Shape(func: Callable[[int], int])
Bases:
object
Shape resolving object.
- class anemoi.training.losses.utils.ScaleTensor(scalars: dict[str, tuple[int | tuple[int], Tensor]] | tuple[int | tuple[int], Tensor] | None = None, *tensors: tuple[int | tuple[int], Tensor], **named_tensors: dict[str, tuple[int | tuple[int], Tensor]])
Bases:
object
Dynamically resolved tensor scaling class.
Allows a user to specify a scalar and the dimensions it should be applied to. The class will then enforce that additional scalars are compatible with the specified dimensions.
When get_scalar or scale is called, the class will return the product of all scalars, resolved to the dimensional size of the input tensor.
Additionally, the class can be subsetted to only return a subset of the scalars, but only from those given names.
Examples
>>> tensor = torch.randn(3, 4, 5)
>>> scalars = ScaleTensor((0, torch.randn(3)), (1, torch.randn(4)))
>>> scaled_tensor = scalars.scale(tensor)
>>> scalars.get_scalar(tensor.ndim).shape
torch.Size([3, 4, 1])
>>> scalars.add_scalar(-1, torch.randn(5))
>>> scalars.get_scalar(tensor.ndim).shape
torch.Size([3, 4, 5])
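The shapes in the example can be reproduced without torch by tracking only sizes. The sketch below assumes the combined scalar has the scalar's size on every dimension it covers and size 1 elsewhere, so that it broadcasts against the input tensor. resolved_shape is a hypothetical helper that handles one-dimensional scalars only.

```python
def resolved_shape(scalars, ndim):
    """Shape of the combined scalar for a tensor with `ndim` dimensions.

    `scalars` is a list of (dimension, size) pairs for 1-D scalars.
    """
    shape = [1] * ndim
    for dimension, size in scalars:
        if not -ndim <= dimension < ndim:
            raise ValueError(f"dimension {dimension} is invalid for ndim={ndim}")
        index = dimension % ndim  # maps negative indexes, e.g. -1 -> ndim - 1
        if shape[index] not in (1, size):
            raise ValueError(f"conflicting sizes on dimension {index}")
        shape[index] = size
    return tuple(shape)


scalars = [(0, 3), (1, 4)]
print(resolved_shape(scalars, 3))  # (3, 4, 1)
scalars.append((-1, 5))
print(resolved_shape(scalars, 3))  # (3, 4, 5)
```

Because the negative index is only resolved once ndim is supplied, the same set of scalars could be applied to tensors with a different number of leading dimensions.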
- property shape: Shape
Get the shape of the scale tensor.
Returns a Shape object to be indexed. Only those dimensions specified in the tensors will be resolved.
- validate_scalar(dimension: int | tuple[int], scalar: Tensor) None
Check if the scalar is compatible with the given dimension.
- Parameters:
- Raises:
ValueError – If the scalar is not compatible with the given dimension
- add_scalar(dimension: int | tuple[int], scalar: Tensor, *, name: str | None = None) ScaleTensor
Add new scalar to be applied along dimension.
Dimension can be a single int even for a multi-dimensional scalar, in this case the dimensions are assigned as a range starting from the given int. Negative indexes are also valid, and will be resolved against the tensor’s ndim.
- Parameters:
- Returns:
ScaleTensor with the scalar added
- Return type:
ScaleTensor
- remove_scalar(scalar_to_remove: str | int) ScaleTensor
Remove scalar from ScaleTensor.
- Parameters:
scalar_to_remove (str | int) – Name or index of tensor to remove
- Raises:
ValueError – If the scalar is not in the scalars
- Returns:
ScaleTensor with the scalar removed
- Return type:
ScaleTensor
- freeze_state() FrozenStateRecord
Freeze the state of the Scalar with a context manager.
Any changes made will be reverted on exit.
- Returns:
Context manager to freeze the state of this ScaleTensor
- Return type:
FrozenStateRecord
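The "changes are reverted on exit" behaviour can be sketched with a snapshot-and-restore context manager. This is not the FrozenStateRecord implementation: ScalarStore is a hypothetical container, and using contextlib with a shallow copy is an assumption made for the illustration.

```python
from contextlib import contextmanager


class ScalarStore:
    """Hypothetical container of named scalars, for illustration only."""

    def __init__(self):
        self.scalars = {}

    @contextmanager
    def freeze_state(self):
        snapshot = dict(self.scalars)  # shallow copy of the current state
        try:
            yield self
        finally:
            # Restore the snapshot, discarding changes made inside the block.
            self.scalars = snapshot


store = ScalarStore()
store.scalars["variable"] = [1.0, 0.5]
with store.freeze_state():
    store.scalars["temporary"] = [2.0, 2.0]
    print(sorted(store.scalars))  # ['temporary', 'variable']
print(sorted(store.scalars))      # ['variable']
```

A pattern like this is convenient when a scalar should apply to a single calculation without permanently altering the loss.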
- update_scalar(name: str, scalar: Tensor, *, override: bool = False) None
Update an existing scalar maintaining original dimensions.
If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of validity against original scalar.
- add(new_scalars: dict[str, tuple[int | tuple[int], Tensor]] | list[tuple[int | tuple[int], Tensor]] | None = None, **kwargs) None
Add multiple scalars to the existing scalars.
- update(updated_scalars: dict[str, Tensor] | None = None, override: bool = False, **kwargs) None
Update multiple scalars in the existing scalars.
If override is False, the scalar must be valid against the original dimensions. If override is True, the scalar will be updated regardless of shape.
- Parameters:
- subset(scalar_identifier: str | Sequence[str] | int | Sequence[int]) ScaleTensor
Get subset of the scalars, filtering by name or dimension.
- subset_by_str(scalars: str | Sequence[str]) ScaleTensor
Get subset of the scalars, filtering by name.
See .subset_by_dim for subsetting by affected dimensions.
- subset_by_dim(dimensions: int | Sequence[int]) ScaleTensor
Get subset of the scalars, filtering by dimension.
See .subset_by_str for subsetting by name.
- without(scalar_identifier: str | Sequence[str] | int | Sequence[int]) ScaleTensor
Get subset of the scalars, filtering out by name or dimension.
- without_by_str(scalars: str | Sequence[str]) ScaleTensor
Get subset of the scalars, filtering out by name.
- without_by_dim(dimensions: int | Sequence[int]) ScaleTensor
Get subset of the scalars, filtering out by dimension.
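The subset and without families are mirror images: one keeps matching scalars, the other drops them. The sketch below shows one function of each kind over a plain dict. The helpers keep_by_name and drop_by_dim, the string placeholders for tensors, and the rule that a scalar matches when any of its stored dimensions is listed are all assumptions for illustration.

```python
# name -> (dimensions, tensor placeholder); the strings stand in for tensors
scalars = {
    "variable": ((-1,), "per-variable weights"),
    "nodes": ((2,), "per-node weights"),
}


def keep_by_name(scalars, names):
    """Keep only the scalars whose name is listed."""
    names = [names] if isinstance(names, str) else list(names)
    return {k: v for k, v in scalars.items() if k in names}


def drop_by_dim(scalars, dimensions):
    """Drop every scalar that touches one of the listed dimensions."""
    dimensions = [dimensions] if isinstance(dimensions, int) else list(dimensions)
    return {
        k: v
        for k, v in scalars.items()
        if not any(d in dimensions for d in v[0])
    }


print(list(keep_by_name(scalars, "variable")))  # ['variable']
print(list(drop_by_dim(scalars, 2)))            # ['variable']
```

Note that this sketch compares dimension indexes exactly as stored, so a relative index such as -1 is not matched against its absolute equivalent.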
- resolve(ndim: int) ScaleTensor
Resolve relative indexes in scalars by associating against ndim.
For example, if a scalar was given as affecting dimension -1, and ndim was provided as 4, the scalar will be fixed to affect dimension 3.
- Parameters:
ndim (int) – Number of dimensions to resolve relative indexing against
- Returns:
ScaleTensor with all relative indexes resolved
- Return type:
ScaleTensor
- scale(tensor: Tensor) Tensor
Scale a given tensor by the scalars.
- Parameters:
tensor (torch.Tensor) – Input tensor to scale
- Returns:
Scaled tensor
- Return type:
torch.Tensor
- get_scalar(ndim: int, device: str | None = None) Tensor
Get completely resolved scalar tensor.
- Parameters:
- Returns:
Scalar tensor
- Return type:
torch.Tensor
- Raises:
ValueError – If resolving relative indices is invalid