Diagnostics

The diagnostics module in anemoi-training is used to monitor progress during training. It is split into two parts:

  1. tracking of training progress with a standard machine-learning tracking tool, which monitors the training and validation losses and uploads the plots created by the callbacks;

  2. a series of callbacks, evaluated on the validation dataset, including plots of example forecasts and power spectra.

Trackers

By default, anemoi-training uses the MLflow tracker, but it also includes functionality to use Weights & Biases and TensorBoard.
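Which tracker is used is controlled from the config. As a minimal sketch, assuming the tracker toggles sit under config.diagnostics.log with enabled flags (check the default config shipped with your version for the exact keys):

log:
   mlflow:
      enabled: True # default tracker
   wandb:
      enabled: False
   tensorboard:
      enabled: False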

Callbacks

The callbacks can also be used to evaluate forecasts over longer rollouts beyond the forecast time that the model is trained on. The number of rollout steps for verification (or forecast iteration steps) is set using config.dataloader.validation_rollout = *num_of_rollout_steps*.
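For example, to verify over twelve rollout steps (the value is illustrative):

dataloader:
   validation_rollout: 12 # number of rollout steps used for validation forecasts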

Callbacks are configured in the config file under the config.diagnostics key.

Regular callbacks can be provided as a list of dictionaries under the config.diagnostics.callbacks key. Each dictionary must have a _target_ key, which Hydra uses to instantiate the callback; any other keyword argument is passed to the callback's constructor.

callbacks:
   - _target_: anemoi.training.diagnostics.callbacks.evaluation.RolloutEval
     rollout: ${dataloader.validation_rollout}
     every_n_batches: 20

Plotting callbacks are configured in a similar way, but they are specified under the config.diagnostics.plot.callbacks key. This is done to ensure separation and ease of configuration between experiments.

config.diagnostics.plot is a broader config section specifying the parameters to plot, as well as the plotting frequency and whether plotting runs asynchronously.

Setting config.diagnostics.plot.asynchronous means that model training doesn't stop whilst the callbacks are being evaluated. This is useful for large models, where plotting can take a long time. The plotting module uses asynchronous callbacks via asyncio and concurrent.futures.ThreadPoolExecutor to handle plotting tasks without blocking the main application. A dedicated event loop runs in a separate background thread, allowing plotting tasks to be offloaded to worker threads. This setup keeps the main thread responsive, handling plot-related tasks asynchronously and efficiently in the background.

There is an additional flag in the plotting callbacks to control the rendering method for geospatial plots, offering a trade-off between performance and detail. When datashader is set to True, Datashader is used for rendering, which accelerates plotting through efficient hexbinning and is particularly useful for large datasets. This approach can produce smoother-looking plots due to the aggregation of data points. If datashader is set to False, matplotlib.scatter is used, which provides sharper and more detailed visuals but may be slower for large datasets.

Note: this asynchronous behaviour is only available for the plotting callbacks.

plot:
   asynchronous: True # Whether to plot asynchronously
   datashader: True # Whether to use datashader for plotting (faster)
   frequency: # Frequency of the plotting
      batch: 750
      epoch: 5

   # Parameters to plot
   parameters:
      - z_500
      - t_850
      - u_850

   # Sample index
   sample_idx: 0

   # Precipitation and related fields
   precip_and_related_fields: [tp, cp]

   callbacks:
      - _target_: anemoi.training.diagnostics.callbacks.plot.PlotLoss
        # group parameters by categories when visualizing contributions to the loss
        # one-parameter groups are possible to highlight individual parameters
        parameter_groups:
           moisture: [tp, cp, tcw]
           sfc_wind: [10u, 10v]
      - _target_: anemoi.training.diagnostics.callbacks.plot.PlotSample
        sample_idx: ${diagnostics.plot.sample_idx}
        per_sample: 6
        parameters: ${diagnostics.plot.parameters}

Below is the documentation for the default callbacks provided, but it is also possible for users to add their own callbacks using the same structure.
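For example, a user-defined callback is referenced by its import path, exactly like the built-in ones; my_project.callbacks.MyCustomCallback and its threshold argument below are hypothetical placeholders, not part of anemoi-training:

callbacks:
   - _target_: my_project.callbacks.MyCustomCallback # hypothetical user-defined callback
     threshold: 0.5 # hypothetical kwarg forwarded to the constructor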

class anemoi.training.diagnostics.callbacks.checkpoint.AnemoiCheckpoint(config: OmegaConf, **kwargs: dict)

Bases: ModelCheckpoint

A checkpoint callback that saves the model after every validation epoch.

on_train_end(trainer: pl.Trainer, pl_module: pl.LightningModule) → None

Save the last checkpoint at the end of training.

If the candidates aren't better than the last checkpoint, then no checkpoints are saved. Note: if this method is triggered when using max_epochs, it won't save any checkpoints, since the monitor candidates won't show any changes with regard to the on_train_epoch_end hook.

class anemoi.training.diagnostics.callbacks.evaluation.RolloutEval(config: OmegaConf, rollout: int, every_n_batches: int)

Bases: Callback

Evaluates the model performance over a (longer) rollout window.

on_validation_batch_end(trainer: pl.Trainer, pl_module: pl.LightningModule, outputs: list, batch: torch.Tensor, batch_idx: int) → None

Called when the validation batch ends.

class anemoi.training.diagnostics.callbacks.optimiser.LearningRateMonitor(config: DictConfig, logging_interval: str = 'step', log_momentum: bool = False)

Bases: LearningRateMonitor

Provide LearningRateMonitor from pytorch_lightning as a callback.

class anemoi.training.diagnostics.callbacks.optimiser.StochasticWeightAveraging(config: DictConfig, swa_lrs: int | None = None, swa_epoch_start: int | None = None, annealing_epochs: int | None = None, annealing_strategy: str | None = None, device: str | None = None, **kwargs)

Bases: StochasticWeightAveraging

Provide StochasticWeightAveraging from pytorch_lightning as a callback.
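Based on the signature above, the callback could be registered through the generic config.diagnostics.callbacks list described earlier; the numeric values below are purely illustrative, and your config version may instead wire SWA through a dedicated training option:

callbacks:
   - _target_: anemoi.training.diagnostics.callbacks.optimiser.StochasticWeightAveraging
     swa_lrs: 1.0e-4 # illustrative learning rate used during averaging
     swa_epoch_start: 100 # illustrative epoch at which averaging starts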

class anemoi.training.diagnostics.callbacks.plot.BasePlotCallback(config: BaseSchema)

Bases: Callback, ABC

Factory for creating a callback that plots data to Experiment Logging.

start_event_loop() → None

Start the event loop in a separate thread.

teardown(trainer: pl.Trainer, pl_module: pl.LightningModule, stage: str) → None

Teardown the callback.

async submit_plot(trainer: pl.Trainer, *args: Any, **kwargs: Any) → None

Async function or coroutine to schedule the plot function.

class anemoi.training.diagnostics.callbacks.plot.BasePerBatchPlotCallback(config: OmegaConf, every_n_batches: int | None = None)

Bases: BasePlotCallback

Base Callback for plotting at the end of each batch.

on_validation_batch_end(trainer: pl.Trainer, pl_module: pl.LightningModule, output: list[torch.Tensor], batch: torch.Tensor, batch_idx: int, **kwargs) → None

Called when the validation batch ends.

class anemoi.training.diagnostics.callbacks.plot.BasePerEpochPlotCallback(config: OmegaConf, every_n_epochs: int | None = None)

Bases: BasePlotCallback

Base Callback for plotting at the end of each epoch.

on_validation_epoch_end(trainer: pl.Trainer, pl_module: pl.LightningModule, **kwargs) → None

Called when the validation epoch ends.

class anemoi.training.diagnostics.callbacks.plot.LongRolloutPlots(config: OmegaConf, rollout: list[int], sample_idx: int, parameters: list[str], video_rollout: int = 0, accumulation_levels_plot: list[float] | None = None, colormaps: dict[str, Colormap] | None = None, per_sample: int = 6, every_n_epochs: int = 1, animation_interval: int = 400)

Bases: BasePlotCallback

Evaluates the model performance over a (longer) rollout window.

This callback allows evaluating the performance of the model over an extended number of rollout steps to observe long-term behaviour. Add the callback to the configuration file as follows:

Example:

- _target_: anemoi.training.diagnostics.callbacks.plot.LongRolloutPlots
  rollout:
     - ${dataloader.validation_rollout}
  video_rollout: ${dataloader.validation_rollout}
  every_n_epochs: 1
  sample_idx: ${diagnostics.plot.sample_idx}
  parameters: ${diagnostics.plot.parameters}

The rollout steps selected for plots and video must be less than or equal to dataloader.validation_rollout. Increasing dataloader.validation_rollout has no effect on the rollout steps used during training; it only ensures that enough time steps are available in the validation batches for the plots and video.

Creating one animation of one variable over 56 rollout steps takes about one minute. Recommended use for video generation: fork the run using fork_run_id for one additional epoch with videos enabled.

on_validation_batch_end(trainer: pl.Trainer, pl_module: pl.LightningModule, output: list[torch.Tensor], batch: torch.Tensor, batch_idx: int) → None

Called when the validation batch ends.

class anemoi.training.diagnostics.callbacks.plot.GraphTrainableFeaturesPlot(config: OmegaConf, every_n_epochs: int | None = None)

Bases: BasePerEpochPlotCallback

Visualize the trainable features defined on the graph's nodes and edges.
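Given the signature above, a minimal entry under config.diagnostics.plot.callbacks might look like this (plotting every 5 epochs is an arbitrary illustrative choice):

callbacks:
   - _target_: anemoi.training.diagnostics.callbacks.plot.GraphTrainableFeaturesPlot
     every_n_epochs: 5 # illustrative plotting interval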

class anemoi.training.diagnostics.callbacks.plot.PlotLoss(config: OmegaConf, parameter_groups: dict[str, list[str]], every_n_batches: int | None = None)

Bases: BasePerBatchPlotCallback

Plots the unsqueezed loss over rollouts.

property sort_and_color_by_parameter_group: tuple[ndarray, ndarray, dict, list]

Sort parameters by group and prepare colors.

class anemoi.training.diagnostics.callbacks.plot.PlotSample(config: OmegaConf, sample_idx: int, parameters: list[str], accumulation_levels_plot: list[float], precip_and_related_fields: list[str] | None = None, colormaps: dict[str, Colormap] | None = None, per_sample: int = 6, every_n_batches: int | None = None, **kwargs: Any)

Bases: BasePerBatchPlotCallback

Plots a post-processed sample: input, target and prediction.

class anemoi.training.diagnostics.callbacks.plot.BasePlotAdditionalMetrics(config: OmegaConf, every_n_batches: int | None = None)

Bases: BasePerBatchPlotCallback

Base processing class for additional metrics.

class anemoi.training.diagnostics.callbacks.plot.PlotSpectrum(config: OmegaConf, sample_idx: int, parameters: list[str], min_delta: float | None = None, every_n_batches: int | None = None)

Bases: BasePlotAdditionalMetrics

Plots the power spectrum, comparing target and prediction.

The actual increment (output - input) is plotted for prognostic variables, while the output is plotted for diagnostic ones.
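Following the same pattern as PlotSample in the example above, this callback can be added under config.diagnostics.plot.callbacks; a sketch based on the signature, with the optional min_delta omitted:

callbacks:
   - _target_: anemoi.training.diagnostics.callbacks.plot.PlotSpectrum
     sample_idx: ${diagnostics.plot.sample_idx}
     parameters: ${diagnostics.plot.parameters}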

class anemoi.training.diagnostics.callbacks.plot.PlotHistogram(config: OmegaConf, sample_idx: int, parameters: list[str], precip_and_related_fields: list[str] | None = None, every_n_batches: int | None = None)

Bases: BasePlotAdditionalMetrics

Plots histograms comparing target and prediction.

The actual increment (output - input) is plotted for prognostic variables, while the output is plotted for diagnostic ones.
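A corresponding entry for the histogram plots, again a sketch based on the signature above:

callbacks:
   - _target_: anemoi.training.diagnostics.callbacks.plot.PlotHistogram
     sample_idx: ${diagnostics.plot.sample_idx}
     parameters: ${diagnostics.plot.parameters}
     precip_and_related_fields: ${diagnostics.plot.precip_and_related_fields}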

class anemoi.training.diagnostics.callbacks.provenance.ParentUUIDCallback(config: OmegaConf)

Bases: Callback

A callback that retrieves the parent UUID for a model, if it is a child model.

on_load_checkpoint(trainer: pl.Trainer, pl_module: pl.LightningModule, checkpoint: torch.nn.Module) → None

Called when loading a model checkpoint; use this hook to reload state.

Parameters:
  • trainer – the current Trainer instance.

  • pl_module – the current LightningModule instance.

  • checkpoint – the full checkpoint dictionary that got loaded by the Trainer.