Schemas

This module defines the pydantic schemas used to validate the configuration before a training run is started. The top-level config YAML matches the BaseSchema.
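
A minimal sketch of how the top-level schema might be used to validate a composed config (the file name and the use of OmegaConf here are illustrative assumptions, not part of the schema module):

```python
from omegaconf import OmegaConf

from anemoi.training.schemas.base_schema import BaseSchema

# Hypothetical config file; in practice the config is composed from the training config files.
cfg = OmegaConf.load("train_config.yaml")
cfg_dict = OmegaConf.to_container(cfg, resolve=True)

schema = BaseSchema(**cfg_dict)          # raises pydantic.ValidationError if the config is invalid
print(schema.model_dump(by_alias=True))  # plain dict with aliases such as _target_ restored
```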

class anemoi.training.schemas.base_schema.BaseSchema(*, data: DataSchema, dataloader: DataLoaderSchema, datamodule: DataModuleSchema, diagnostics: DiagnosticsSchema, hardware: HardwareSchema, graph: BaseGraphSchema, model: BaseModelSchema | EnsModelSchema, training: ForecasterSchema | ForecasterEnsSchema | InterpolationSchema, config_validation: bool = True)

Bases: BaseModel

Top-level schema for the training configuration.

data: DataSchema

Data configuration.

dataloader: DataLoaderSchema

Dataloader configuration.

datamodule: DataModuleSchema

Datamodule configuration.

diagnostics: DiagnosticsSchema

Diagnostics configuration such as logging, plots and metrics.

hardware: HardwareSchema

Hardware configuration.

graph: BaseGraphSchema

Graph configuration.

model: ModelSchema

Model configuration.

training: TrainingSchema

Training configuration.

config_validation: bool

Flag to disable validation of the configuration.

model_dump(by_alias: bool = False) dict

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:
  • mode – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.

  • include – A set of fields to include in the output.

  • exclude – A set of fields to exclude from the output.

  • context – Additional context to pass to the serializer.

  • by_alias – Whether to use the field’s alias in the dictionary key if defined.

  • exclude_unset – Whether to exclude fields that have not been explicitly set.

  • exclude_defaults – Whether to exclude fields that are set to their default value.

  • exclude_none – Whether to exclude fields that have a value of None.

  • round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].

  • warnings – How to handle serialization errors. False/”none” ignores them, True/”warn” logs errors, “error” raises a [PydanticSerializationError][pydantic_core.PydanticSerializationError].

  • fallback – A function to call when an unknown value is encountered. If not provided, a [PydanticSerializationError][pydantic_core.PydanticSerializationError] error is raised.

  • serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.base_schema.UnvalidatedBaseSchema(*, data: Any, dataloader: Any, datamodule: Any, diagnostics: Any, hardware: Any, graph: Any, model: Any, training: Any, config_validation: bool = False)

Bases: BaseModel

data: Any

Data configuration.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

dataloader: Any

Dataloader configuration.

datamodule: Any

Datamodule configuration.

diagnostics: Any

Diagnostics configuration such as logging, plots and metrics.

hardware: Any

Hardware configuration.

graph: Any

Graph configuration.

model: Any

Model configuration.

training: Any

Training configuration.

config_validation: bool

Flag to disable validation of the configuration.

model_dump(by_alias: bool = False) dict

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:
  • mode – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.

  • include – A set of fields to include in the output.

  • exclude – A set of fields to exclude from the output.

  • context – Additional context to pass to the serializer.

  • by_alias – Whether to use the field’s alias in the dictionary key if defined.

  • exclude_unset – Whether to exclude fields that have not been explicitly set.

  • exclude_defaults – Whether to exclude fields that are set to their default value.

  • exclude_none – Whether to exclude fields that have a value of None.

  • round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].

  • warnings – How to handle serialization errors. False/”none” ignores them, True/”warn” logs errors, “error” raises a [PydanticSerializationError][pydantic_core.PydanticSerializationError].

  • fallback – A function to call when an unknown value is encountered. If not provided, a [PydanticSerializationError][pydantic_core.PydanticSerializationError] error is raised.

  • serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

The schemas below are organised identically to the training config files.

Data

class anemoi.training.schemas.data.NormalizerSchema(*, default: str | None, remap: dict[str, str] | None = <factory>, std: list[str] | None = <factory>, mean_std: list[str] | None = <factory>, min_max: list[str] | None = <factory>, max: list[str] | None = <factory>, none: list[str] | None = <factory>)

Bases: BaseModel

default: str | None

Normalizer default method to apply.

remap: dict[str, str] | None

Dictionary for remapping variables.

std: list[str] | None

Variables to normalise with std.

mean_std: list[str] | None

Variables to normalize with mean-std.

min_max: list[str] | None

Variables to normalize with min-max.

max: list[str] | None

Variables to normalize with max.

none: list[str] | None

Variables not to be normalized.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
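
For illustration, a normalizer entry can be validated on its own; this is a hedged sketch with placeholder variable names:

```python
from anemoi.training.schemas.data import NormalizerSchema

norm = NormalizerSchema(
    default="mean-std",    # assumed default normalisation method
    mean_std=["2t"],       # placeholder variable names
    min_max=["sp"],
    none=["lsm"],
)
```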

class anemoi.training.schemas.data.ImputerSchema(*, default: str, maximum: list[str] | None, minimum: list[str] | None, none: list[str] | None = <factory>)

Bases: BaseModel

default: str

Imputer default method to apply.

none: list[str] | None

Variables not to be imputed.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.data.RemapperSchema(*, default: str, none: list[str] | None = <factory>)

Bases: BaseModel

default: str

Remapper default method to apply.

none: list[str] | None

Variables not to be remapped.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.data.PreprocessorTarget(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

class anemoi.training.schemas.data.PreprocessorSchema(*, _target_: PreprocessorTarget, config: NormalizerSchema | ImputerSchema | RemapperSchema)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

target_: PreprocessorTarget

Processor object from anemoi.models.preprocessing.[normalizer|imputer|remapper].

config: NormalizerSchema | ImputerSchema | RemapperSchema

Target schema containing processor methods.

class anemoi.training.schemas.data.DataSchema(*, format: str, frequency: str, timestep: str, processors: dict[str, PreprocessorSchema], forcing: list[str], diagnostic: list[str], remapped: dict | None, num_features: int | None)

Bases: BaseModel

A class used to represent the overall configuration of the dataset.

  • format (str) – The format of the data.

  • resolution (str) – The resolution of the data.

  • frequency (str) – The frequency of the data.

  • timestep (str) – The timestep of the data.

  • forcing (List[str]) – The list of features used as forcing to generate the forecast state.

  • diagnostic (List[str]) – The list of features that are only part of the forecast state.

  • processors (Dict[str, Processor]) – The Processors configuration.

  • num_features (Optional[int]) – The number of features in the forecast state. To be set in the code.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

format: str

Format of the data.

frequency: str

Time frequency requested from the dataset.

timestep: str

Time step of the model (must be a multiple of the frequency).

processors: dict[str, PreprocessorSchema]

Preprocessors applied to the data. Processors, including imputers and normalizers, are applied in order of definition.

forcing: list[str]

Features that are not part of the forecast state but are used as forcing to generate the forecast state.

diagnostic: list[str]

Features that are only part of the forecast state and are not used as an input to the model.

remapped: dict | None

Dictionary of remapped names for variables.

num_features: int | None

Number of features in the forecast state. To be set in the code.
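
A hedged sketch of a data section validated against DataSchema (format, frequency and variable names are placeholders; the processors entries are omitted for brevity):

```python
from anemoi.training.schemas.data import DataSchema

data_cfg = {
    "format": "zarr",        # placeholder format
    "frequency": "6h",
    "timestep": "6h",
    "processors": {},        # preprocessor entries omitted here
    "forcing": ["cos_latitude", "sin_latitude"],  # placeholder forcing variables
    "diagnostic": ["tp"],                         # placeholder diagnostic variable
    "remapped": None,
    "num_features": None,    # set in the code
}
data = DataSchema(**data_cfg)
```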

Dataloader

class anemoi.training.schemas.dataloader.Frequency(root: RootModelRootType = PydanticUndefined)

Bases: RootModel

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.dataloader.DatasetSchema(*, dataset: str | dict | Path | list[dict] | None = None, start: str | int | None = None, end: str | int | None = None, frequency: Frequency, drop: list | None = None)

Bases: BaseModel

Dataset configuration schema.

dataset: str | dict | Path | list[dict] | None

Dataset, see anemoi-datasets.

start: str | int | None

Starting datetime for sample of the dataset.

end: str | int | None

Ending datetime [inclusive] for sample of the dataset.

frequency: Frequency

Temporal resolution; the frequency must be greater than or equal to the dataset frequency.

drop: list | None

List of variables to drop from the dataset.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.dataloader.LoaderSet(*, training: Annotated[int, Gt(gt=0)] | None, validation: Annotated[int, Gt(gt=0)] | None, test: Annotated[int, Gt(gt=0)] | None)

Bases: BaseModel

training: PositiveInt | None

Value for the training dataset.

validation: PositiveInt | None

Value for the validation dataset.

test: PositiveInt | None

Value for the test dataset.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
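
LoaderSet is used for per-split settings such as num_workers, batch_size and limit_batches. A small sketch with illustrative numbers:

```python
from anemoi.training.schemas.dataloader import LoaderSet

workers = LoaderSet(training=8, validation=4, test=4)  # one value per dataset split
```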

class anemoi.training.schemas.dataloader.FullGridIndicesSchema(*, _target_: Literal['anemoi.training.data.grid_indices.FullGrid'] = 'anemoi.training.data.grid_indices.FullGrid', nodes_name: str)

Bases: BaseModel

target_: Literal['anemoi.training.data.grid_indices.FullGrid']

Grid indices for full grid class implementation from anemoi.training.data.grid_indices.

nodes_name: str

Name of the grid nodes.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.dataloader.MaskedGridIndicesSchema(*, _target_: Literal['anemoi.training.data.grid_indices.MaskedGrid'] = 'anemoi.training.data.grid_indices.MaskedGrid', nodes_name: str, node_attribute_name: str)

Bases: BaseModel

target_: Literal['anemoi.training.data.grid_indices.MaskedGrid']

Grid indices for masked grid class implementation from anemoi.training.data.grid_indices.

nodes_name: str

Name of the grid nodes.

node_attribute_name: str

Name of the nodes graph attribute used for masking.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
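
As a sketch, a grid_indices entry for the full grid could be validated like this (the nodes name is a placeholder; _target_ falls back to its documented default):

```python
from anemoi.training.schemas.dataloader import FullGridIndicesSchema

grid_indices = FullGridIndicesSchema(nodes_name="data")  # placeholder nodes name
```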

class anemoi.training.schemas.dataloader.DataLoaderSchema(*, prefetch_factor: Annotated[int, Ge(ge=0)], pin_memory: bool, num_workers: LoaderSet, batch_size: LoaderSet, limit_batches: LoaderSet, training: DatasetSchema | DictConfig, validation: DatasetSchema | DictConfig, test: DatasetSchema | DictConfig, validation_rollout: Annotated[int, Gt(gt=0)], read_group_size: Annotated[int, Gt(gt=0)], grid_indices: FullGridIndicesSchema | MaskedGridIndicesSchema, **extra_data: Any)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

prefetch_factor: int

Number of batches loaded in advance by each worker.

pin_memory: bool

If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them.

num_workers: LoaderSet

Number of processes per GPU for batch distribution.

batch_size: LoaderSet

Per-GPU batch size.

limit_batches: LoaderSet

Limit on the number of batches to run. The default value null runs on all batches.

training: DatasetSchema | DictConfig

Training DatasetSchema.

validation: DatasetSchema | DictConfig

Validation DatasetSchema.

test: DatasetSchema | DictConfig

Test DatasetSchema.

validation_rollout: PositiveInt

Number of rollouts to use for validation; must be equal to or greater than the rollout expected by callbacks.

read_group_size: PositiveInt

Number of GPUs per reader group. Defaults to number of GPUs (see BaseSchema validators).

grid_indices: FullGridIndicesSchema | MaskedGridIndicesSchema

Grid indices schema.

Diagnostics

class anemoi.training.schemas.diagnostics.LongRolloutPlotsSchema

Bases: BaseModel

target_: Literal['anemoi.training.diagnostics.callbacks.plot.LongRolloutPlots']

LongRolloutPlots object from anemoi training diagnostics callbacks.

rollout: list[int]

Rollout steps to plot at.

sample_idx: int

Index of sample to plot, must be inside batch size.

parameters: list[str]

List of parameters to plot.

video_rollout: int

Number of rollout steps for video, by default 0 (no video).

accumulation_levels_plot: list[float] | None

Accumulation levels to plot, by default None.

cmap_accumulation: list[str] | None

Colors of the accumulation levels. Defaults to None. Kept for backward compatibility.

per_sample: int | None

Number of plots per sample, by default 6.

every_n_epochs: int

Epoch frequency to plot at, by default 1.

animation_interval: int | None

Delay between frames in the animation in milliseconds, by default 400.

colormaps: dict[str, ColormapSchema] | None

List of colormaps to use, by default None.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.diagnostics.GraphTrainableFeaturesPlotSchema(*, _target_: Literal['anemoi.training.diagnostics.callbacks.plot.GraphTrainableFeaturesPlot'], every_n_epochs: int | None)

Bases: BaseModel

target_: Literal['anemoi.training.diagnostics.callbacks.plot.GraphTrainableFeaturesPlot']

GraphTrainableFeaturesPlot object from anemoi training diagnostics callbacks.

every_n_epochs: int | None

Epoch frequency to plot at.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.diagnostics.PlotLossSchema(*, _target_: Literal['anemoi.training.diagnostics.callbacks.plot.PlotLoss'], parameter_groups: dict[str, list[str]], every_n_batches: int | None = None)

Bases: BaseModel

target_: Literal['anemoi.training.diagnostics.callbacks.plot.PlotLoss']

PlotLoss object from anemoi training diagnostics callbacks.

parameter_groups: dict[str, list[str]]

Dictionary with parameter groups with parameter names as key.

every_n_batches: int | None

Batch frequency to plot at.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.diagnostics.MatplotlibColormapSchema(*, _target_: Literal['anemoi.training.utils.custom_colormaps.MatplotlibColormap'], name: str, variables: list[str] | None = None)

Bases: BaseModel

target_: Literal['anemoi.training.utils.custom_colormaps.MatplotlibColormap']

CustomColormap object from anemoi training utils.

name: str

Name of the Matplotlib colormap.

variables: list[str] | None

A list of strings representing the variables for which the colormap is used, by default None.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.diagnostics.MatplotlibColormapClevelsSchema(*, _target_: Literal['anemoi.training.utils.custom_colormaps.MatplotlibColormapClevels'], clevels: list, variables: list[str] | None = None)

Bases: BaseModel

target_: Literal['anemoi.training.utils.custom_colormaps.MatplotlibColormapClevels']

CustomColormap object from anemoi training utils.

clevels: list

The custom color levels for the colormap.

variables: list[str] | None

A list of strings representing the variables for which the colormap is used, by default None.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.diagnostics.DistinctipyColormapSchema(*, _target_: Literal['anemoi.training.utils.custom_colormaps.DistinctipyColormap'], n_colors: int, variables: list[str] | None = None, colorblind_type: str | None = None)

Bases: BaseModel

target_: Literal['anemoi.training.utils.custom_colormaps.DistinctipyColormap']

CustomColormap object from anemoi training utils.

n_colors: int

The number of colors in the colormap.

variables: list[str] | None

A list of strings representing the variables for which the colormap is used, by default None.

colorblind_type: str | None

The type of colorblindness to simulate. If None, the default colorblindness from distinctipy is applied.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.diagnostics.PlotSampleSchema(*, _target_: Literal['anemoi.training.diagnostics.callbacks.plot.PlotSample'], sample_idx: int, parameters: list[str], accumulation_levels_plot: list[float], cmap_accumulation: list[str] | None = None, precip_and_related_fields: list[str] | None = None, per_sample: int, every_n_batches: int | None = None, colormaps: dict[str, Annotated[MatplotlibColormapSchema | MatplotlibColormapClevelsSchema | DistinctipyColormapSchema, FieldInfo(annotation=NoneType, required=True, discriminator='target_')]] | None = None)

Bases: BaseModel

target_: Literal['anemoi.training.diagnostics.callbacks.plot.PlotSample']

PlotSample object from anemoi training diagnostics callbacks.

sample_idx: int

Index of sample to plot, must be inside batch size.

parameters: list[str]

List of parameters to plot.

accumulation_levels_plot: list[float]

Accumulation levels to plot.

cmap_accumulation: list[str] | None

Colors of the accumulation levels. Defaults to None. Kept for backward compatibility.

precip_and_related_fields: list[str] | None

List of precipitation related fields, by default None.

per_sample: int

Number of plots per sample, by default 6.

every_n_batches: int | None

Batch frequency to plot at, by default None.

colormaps: dict[str, ColormapSchema] | None

List of colormaps to use, by default None.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.diagnostics.PlotSpectrumSchema(*, _target_: Literal['anemoi.training.diagnostics.callbacks.plot.PlotSpectrum'], sample_idx: int, parameters: list[str], every_n_batches: int | None = None)

Bases: BaseModel

target_: Literal['anemoi.training.diagnostics.callbacks.plot.PlotSpectrum']

PlotSpectrum object from anemoi training diagnostics callbacks.

sample_idx: int

Index of sample to plot, must be inside batch size.

parameters: list[str]

List of parameters to plot.

every_n_batches: int | None

Batch frequency to plot at, by default None.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.diagnostics.PlotHistogramSchema(*, _target_: Literal['anemoi.training.diagnostics.callbacks.plot.PlotHistogram'], sample_idx: int, parameters: list[str], precip_and_related_fields: list[str] | None = None, every_n_batches: int | None = None)

Bases: BaseModel

target_: Literal['anemoi.training.diagnostics.callbacks.plot.PlotHistogram']

PlotHistogram object from anemoi training diagnostics callbacks.

sample_idx: int

Index of sample to plot, must be inside batch size.

parameters: list[str]

List of parameters to plot.

precip_and_related_fields: list[str] | None

List of precipitation related fields, by default None.

every_n_batches: int | None

Batch frequency to plot at, by default None.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.diagnostics.PlotSchema

Bases: BaseModel

asynchronous: bool

Handle plotting tasks without blocking the model training.

datashader: bool

Use Datashader to plot.

frequency: PlottingFrequency

Frequency of the plotting.

sample_idx: int

Index of sample to plot, must be inside batch size.

parameters: list[str]

List of parameters to plot.

precip_and_related_fields: list[str] | None

List of precipitation related fields from the parameters list.

colormaps: dict[str, ColormapSchema]

List of colormaps to use.

callbacks: list[PlotCallbacks]

List of plotting functions to call.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.diagnostics.PlottingFrequency(*, batch: Annotated[int, Gt(gt=0)], epoch: Annotated[int, Gt(gt=0)])

Bases: BaseModel

batch: PositiveInt

Frequency of the plotting in number of batches.

epoch: PositiveInt

Frequency of the plotting in number of epochs.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
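
A small sketch with illustrative frequencies:

```python
from anemoi.training.schemas.diagnostics import PlottingFrequency

freq = PlottingFrequency(batch=750, epoch=10)  # plot every 750 batches and every 10 epochs
```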

class anemoi.training.schemas.diagnostics.TimeLimitSchema(*, _target_: Literal['anemoi.training.diagnostics.callbacks.stopping.TimeLimit'], limit: int | str, record_file: str | None = None)

Bases: BaseModel

target_: Literal['anemoi.training.diagnostics.callbacks.stopping.TimeLimit']

TimeLimit object from anemoi training diagnostics callbacks.

limit: int | str

Time limit, if int, assumed to be hours, otherwise must be a string with units (e.g. ‘1h’, ‘30m’).

record_file: str | None

File to record the last checkpoint to on exit, if set.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
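
A hedged sketch of a TimeLimit callback entry as it might appear in the config, validated against the schema:

```python
from anemoi.training.schemas.diagnostics import TimeLimitSchema

TimeLimitSchema.model_validate({
    "_target_": "anemoi.training.diagnostics.callbacks.stopping.TimeLimit",
    "limit": "12h",        # an int would be interpreted as hours
    "record_file": None,
})
```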

class anemoi.training.schemas.diagnostics.EarlyStoppingSchema(*, _target_: Literal['anemoi.training.diagnostics.callbacks.stopping.EarlyStopping'], monitor: str, min_delta: float = 0.0, patience: int = 3, verbose: bool = False, mode: Literal['min', 'max'] = 'min', strict: bool = True, check_finite: bool = True, stopping_threshold: float | None = None, divergence_threshold: float | None = None, check_on_train_epoch_end: bool | None = None)

Bases: BaseModel

monitor: str

Metric to monitor.

min_delta: float

Minimum change in the monitored quantity to qualify as an improvement.

patience: int

Number of epochs with no improvement after which training will be stopped.

verbose: bool

If True, prints a message for each improvement.

mode: Literal['min', 'max']

One of {'min', 'max'}; determines whether minimisation or maximisation of the metric counts as an improvement.

strict: bool

Whether to crash the training if the monitored quantity is not found.

check_finite: bool

Whether to check for NaNs and Infs in the monitored quantity.

stopping_threshold: float | None

Stop training immediately once the monitored quantity reaches this threshold.

divergence_threshold: float | None

Stop training as soon as the monitored quantity becomes worse than this threshold.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

check_on_train_epoch_end: bool | None

Whether to check the stopping criteria at the end of each training epoch.

class anemoi.training.schemas.diagnostics.Debug(*, anomaly_detection: bool)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

anomaly_detection: bool

Activate anomaly detection. This will detect and trace back NaNs/Infs, but slow down training.

class anemoi.training.schemas.diagnostics.CheckpointSchema(*, save_frequency: int | None, num_models_saved: int)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

save_frequency: int | None

Frequency at which to save the checkpoints.

num_models_saved: int

Number of model checkpoints to save. Only the last num_models_saved checkpoints will be kept. If set to -1, all checkpoints are kept.

class anemoi.training.schemas.diagnostics.WandbSchema(*, enabled: bool, offline: bool, log_model: bool | Literal['all'], project: str, gradients: bool, parameters: bool, entity: str | None = None)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

enabled: bool

Use Weights & Biases logger.

offline: bool

Run W&B offline.

log_model: bool | Literal['all']

Log checkpoints created by ModelCheckpoint as W&B artifacts. If True, checkpoints are logged at the end of training. If ‘all’, checkpoints are logged during training.

project: str

The name of the project to which this run will belong.

gradients: bool

Whether to log the gradients.

parameters: bool

Whether to log the hyper parameters.

entity: str | None

Username or team name to which runs will be sent. This entity must exist before you can send runs there.

class anemoi.training.schemas.diagnostics.MlflowSchema(*, enabled: bool, offline: bool, authentication: bool, log_model: bool | ~typing.Literal['all'], tracking_uri: str | None, experiment_name: str, project_name: str, system: bool, terminal: bool, run_name: str | None, on_resume_create_child: bool, expand_hyperparams: list[str] = <factory>, http_max_retries: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)])

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

enabled: bool

Use MLflow logger.

offline: bool

Run MLflow offline. Necessary if no internet access is available.

authentication: bool

Whether to authenticate with the server or not.

log_model: bool | Literal['all']

Log checkpoints created by ModelCheckpoint as MLFlow artifacts. If True, checkpoints are logged at the end of training. If ‘all’, checkpoints are logged during training.

tracking_uri: str | None

Address of local or remote tracking server.

experiment_name: str

Name of experiment.

project_name: str

Name of project.

system: bool

Activate system metrics.

terminal: bool

Log terminal logs to MLflow.

run_name: str | None

Name of run.

on_resume_create_child: bool

Whether to create a child run when resuming a run.

expand_hyperparams: list[str]

Keys to expand within params. Any key being expanded will have lists converted according to expand_iterables.

http_max_retries: PositiveInt

Specifies the maximum number of retries for MLflow HTTP requests, default 35.

class anemoi.training.schemas.diagnostics.TensorboardSchema(*, enabled: bool)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

enabled: bool

Use TensorBoard logger.

class anemoi.training.schemas.diagnostics.LoggingSchema(*, wandb: WandbSchema, tensorboard: TensorboardSchema, mlflow: MlflowSchema, interval: Annotated[int, Gt(gt=0)])

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

wandb: WandbSchema

W&B logging schema.

tensorboard: TensorboardSchema

TensorBoard logging schema.

mlflow: MlflowSchema

MLflow logging schema.

interval: PositiveInt

Logging frequency in batches.

class anemoi.training.schemas.diagnostics.MemorySchema(*, enabled: bool, steps: Annotated[int, Gt(gt=0)], warmup: Annotated[int, Ge(ge=0)], extra_plots: bool, trace_rank0_only: bool)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

enabled: bool

Enable memory report. Defaults to false.

steps: PositiveInt

Frequency of memory profiling. Defaults to 5.

warmup: NonNegativeInt

Number of steps to discard before the profiler starts to record traces. Defaults to 2.

extra_plots: bool

Save plots produced with torch.cuda._memory_viz.profile_plot if available. Defaults to false.

trace_rank0_only: bool

Trace only rank 0 from SLURM_PROC_ID. Defaults to false.

class anemoi.training.schemas.diagnostics.Snapshot(*, enabled: bool, steps: Annotated[int, Gt(gt=0)], warmup: Annotated[int, Ge(ge=0)])

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

enabled: bool

Enable memory snapshot recording. Defaults to false.

steps: PositiveInt

Frequency of snapshot. Defaults to 4.

warmup: NonNegativeInt

Number of steps to discard before the profiler starts to record traces. Defaults to 0.

class anemoi.training.schemas.diagnostics.Profiling(*, enabled: bool, verbose: bool | None = None)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

enabled: bool

Enable component profiler. Defaults to false.

verbose: bool | None

Set to true to include the full list of profiled actions, or false to keep it concise.

class anemoi.training.schemas.diagnostics.BenchmarkProfilerSchema(*, memory: ~anemoi.training.schemas.diagnostics.MemorySchema = <factory>, time: ~anemoi.training.schemas.diagnostics.Profiling = <factory>, speed: ~anemoi.training.schemas.diagnostics.Profiling = <factory>, system: ~anemoi.training.schemas.diagnostics.Profiling = <factory>, model_summary: ~anemoi.training.schemas.diagnostics.Profiling = <factory>, snapshot: ~anemoi.training.schemas.diagnostics.Snapshot = <factory>)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

memory: MemorySchema

Schema for memory report containing metrics associated with CPU and GPU memory allocation.

time: Profiling

Report with metrics of execution time for certain steps across the code.

speed: Profiling

Report with metrics of execution speed at training and validation time.

system: Profiling

Report with metrics of GPU/CPU usage, memory and disk usage and total execution time.

model_summary: Profiling

Table summary of layers and parameters of the model.

snapshot: Snapshot

Memory snapshot if torch.cuda._record_memory_history is available.

class anemoi.training.schemas.diagnostics.DiagnosticsSchema(*, plot: ~anemoi.training.schemas.diagnostics.PlotSchema | None = None, callbacks: list = <factory>, benchmark_profiler: ~anemoi.training.schemas.diagnostics.BenchmarkProfilerSchema, debug: ~anemoi.training.schemas.diagnostics.Debug, profiler: bool, log: ~anemoi.training.schemas.diagnostics.LoggingSchema, enable_progress_bar: bool, print_memory_summary: bool, enable_checkpointing: bool, checkpoint: dict[str, ~anemoi.training.schemas.diagnostics.CheckpointSchema] = <factory>)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

plot: PlotSchema | None

Plot schema.

callbacks: list

Callbacks schema.

benchmark_profiler: BenchmarkProfilerSchema

Benchmark profiler schema for profile command.

debug: Debug

Debug schema.

profiler: bool

Activate the pytorch profiler and tensorboard logger.

log: LoggingSchema

Log schema.

enable_progress_bar: bool

Activate progress bar.

print_memory_summary: bool

Print the memory summary.

enable_checkpointing: bool

Allow model to save checkpoints.

checkpoint: dict[str, CheckpointSchema]

Checkpoint schema for defined frequency (every_n_minutes, every_n_epochs, …).

Hardware

class anemoi.training.schemas.hardware.Checkpoint(*, every_n_epochs: str = 'anemoi-by_epoch-epoch_{epoch:03d}-step_{step:06d}', every_n_train_steps: str = 'anemoi-by_step-epoch_{epoch:03d}-step_{step:06d}', every_n_minutes: str = 'anemoi-by_time-epoch_{epoch:03d}-step_{step:06d}')

Bases: BaseModel

every_n_epochs: str

File name pattern for checkpoint files saved by epoch frequency.

every_n_train_steps: str

File name pattern for checkpoint files saved by step frequency.

every_n_minutes: str

File name pattern for checkpoint files saved by time frequency (minutes).

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
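
The file name patterns use standard Python format syntax; for example (illustrative epoch and step numbers):

```python
pattern = "anemoi-by_epoch-epoch_{epoch:03d}-step_{step:06d}"
print(pattern.format(epoch=12, step=34567))  # anemoi-by_epoch-epoch_012-step_034567
```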

class anemoi.training.schemas.hardware.FilesSchema(*, dataset: Path | dict[str, Path] | None = None, graph: Path | None = None, truncation: Path | None = None, truncation_inv: Path | None = None, checkpoint: dict[str, str], warm_start: str | None = None)

Bases: BaseModel

dataset: Path | dict[str, Path] | None

Path to the dataset file.

graph: Path | None

Path to the graph file.

truncation: Path | None

Path to the truncation matrix file.

truncation_inv: Path | None

Path to the inverse truncation matrix file.

checkpoint: dict[str, str]

Each dictionary key is a checkpoint name, and the value is the path to the checkpoint file.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.hardware.Logs(*, wandb: Path | None = None, mlflow: Path | None = None, tensorboard: Path | None = None)

Bases: BaseModel

wandb: Path | None

Path to output wandb logs.

mlflow: Path | None

Path to output mlflow logs.

tensorboard: Path | None

Path to output tensorboard logs.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.hardware.PathsSchema(*, data: Path | dict[str, Path] | None = None, graph: Path | None = None, truncation: Path | None = None, output: Path | None = None, logs: Logs | None = None, checkpoints: Path, plots: Path | None = None, profiler: Path | None)

Bases: BaseModel

data: Path | dict[str, Path] | None

Path to the data directory.

graph: Path | None

Path to the graph directory.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

truncation: Path | None

Path to the truncation matrix directory.

output: Path | None

Path to the output directory.

logs: Logs | None

Logging directories.

checkpoints: Path

Path to the checkpoints directory.

plots: Path | None

Path to the plots directory.

profiler: Path | None

Path to the profiler directory.

class anemoi.training.schemas.hardware.HardwareSchema(*, accelerator: ~typing.Annotated[str, ~pydantic.functional_validators.AfterValidator(func=functools.partial(<function allowed_values at 0x7e11d65c4b80>, values=['cpu', 'gpu', 'auto', 'cuda', 'tpu']))] = 'auto', num_gpus_per_node: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] = 1, num_nodes: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] = 1, num_gpus_per_model: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] = 1, num_gpus_per_ensemble: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] = 1, files: ~anemoi.training.schemas.hardware.FilesSchema, paths: ~anemoi.training.schemas.hardware.PathsSchema)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

accelerator: Annotated[str, AfterValidator(partial(allowed_values, values=['cpu', 'gpu', 'auto', 'cuda', 'tpu']))]

Accelerator to use for training.

num_gpus_per_node: NonNegativeInt

Number of GPUs per node.

num_nodes: NonNegativeInt

Number of nodes.

num_gpus_per_model: NonNegativeInt

Number of GPUs per model.

num_gpus_per_ensemble: NonNegativeInt

Number of GPUs per ensemble.

files: FilesSchema

Files schema.

paths: PathsSchema

Paths schema.

Graph

class anemoi.graphs.schemas.base_graph.NodeSchema(*, node_builder: ZarrNodeSchema | NPZnodeSchema | TextNodeSchema | ICONNodeSchema | ICONMeshNodeSchema | LimitedAreaNPZFileNodesSchema | ReducedGaussianGridNodeSchema | IcosahedralandHealPixNodeSchema | LimitedAreaIcosahedralandHealPixNodeSchema | StretchedIcosahdralNodeSchema, attributes: dict[str, PlanarAreaWeightSchema | SphericalAreaWeightSchema | CutOutMaskSchema | NonmissingAnemoiDatasetVariableSchema | BooleanOperationSchema] | None = None)

Bases: BaseModel

node_builder: NodeBuilderSchemas

Node builder schema.

attributes: dict[str, NodeAttributeSchemas] | None

Dictionary of attributes with names as keys and anemoi.graphs.nodes.attributes objects as values.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.graphs.schemas.base_graph.EdgeSchema(*, source_name: str, target_name: str, edge_builders: list[Annotated[KNNEdgeSchema | CutoffEdgeSchema | MultiScaleEdgeSchema | ICONTopologicalEdgeSchema, FieldInfo(annotation=NoneType, required=True, discriminator='target_')]], attributes: dict[str, BaseEdgeAttributeSchema | EdgeAttributeFromNodeSchema])

Bases: BaseModel

source_name: str

Source of the edges.

target_name: str

Target of the edges.

edge_builders: list[EdgeBuilderSchemas]

Edge builder schema.

attributes: dict[str, EdgeAttributeSchema]

Dictionary of attributes with names as keys and anemoi.graphs.edges.attributes objects as values.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.graphs.schemas.base_graph.BaseGraphSchema(*, nodes: dict[str, ~anemoi.graphs.schemas.base_graph.NodeSchema] | None = None, edges: list[~anemoi.graphs.schemas.base_graph.EdgeSchema] | None = None, overwrite: bool, post_processors: list[~typing.Annotated[~anemoi.graphs.schemas.post_processors.RemoveUnconnectedNodesSchema | ~anemoi.graphs.schemas.post_processors.RestrictEdgeLengthSchema, FieldInfo(annotation=NoneType, required=True, discriminator='target_')]] = <factory>, data: str, hidden: str | list[str])

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

nodes: dict[str, NodeSchema] | None

Nodes schema for all types of nodes (ex. data, hidden).

edges: list[EdgeSchema] | None

List of edges schema.

overwrite: bool

Whether to overwrite an existing graph file. Defaults to True.

data: str

Key name for the data nodes. Defaults to 'data'.

hidden: str | list[str]

Key name for the hidden nodes. Defaults to 'hidden'.

Model

class anemoi.models.schemas.models.DefinedModels(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

class anemoi.models.schemas.models.Model(*, _target_: DefinedModels, _convert_: str = 'all')

Bases: BaseModel

target_: DefinedModels

Model object defined in anemoi.models.model.

convert_: str

The target's parameters to convert to primitive containers. Other parameters will use OmegaConf. Defaults to 'all'.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.models.schemas.models.TrainableParameters(*, data: Annotated[int, Ge(ge=0)], hidden: Annotated[int, Ge(ge=0)])

Bases: BaseModel

data: NonNegativeInt

Size of the learnable data node tensor. Defaults to 8.

hidden: NonNegativeInt

Size of the learnable hidden node tensor. Defaults to 8.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.models.schemas.models.ReluBoundingSchema(*, _target_: Literal['anemoi.models.layers.bounding.ReluBounding'], variables: list[str])

Bases: BaseModel

target_: Literal['anemoi.models.layers.bounding.ReluBounding']

Relu bounding object defined in anemoi.models.layers.bounding.

variables: list[str]

List of variables to bound using the Relu method.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.models.schemas.models.LeakyReluBoundingSchema(*, _target_: Literal['anemoi.models.layers.bounding.LeakyReluBounding'], variables: list[str])

Bases: ReluBoundingSchema

target_: Literal['anemoi.models.layers.bounding.LeakyReluBounding']

Leaky Relu bounding object defined in anemoi.models.layers.bounding.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.models.schemas.models.FractionBoundingSchema(*, _target_: Literal['anemoi.models.layers.bounding.FractionBounding'], variables: list[str], min_val: float, max_val: float, total_var: str)

Bases: BaseModel

target_: Literal['anemoi.models.layers.bounding.FractionBounding']

Fraction bounding object defined in anemoi.models.layers.bounding.

variables: list[str]

List of variables to bound using the hard tanh fraction method.

min_val: float

The minimum value for the HardTanh activation. Corresponds to the minimum fraction of the total_var.

max_val: float

The maximum value for the HardTanh activation. Corresponds to the maximum fraction of the total_var.

total_var: str

Variable from which the secondary variables are derived. For example, convective precipitation should be a fraction of total precipitation.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
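
A hedged sketch of a FractionBounding entry, following the convective/total precipitation example above (variable names are placeholders):

```python
from anemoi.models.schemas.models import FractionBoundingSchema

FractionBoundingSchema.model_validate({
    "_target_": "anemoi.models.layers.bounding.FractionBounding",
    "variables": ["cp"],   # placeholder: convective precipitation
    "min_val": 0.0,
    "max_val": 1.0,
    "total_var": "tp",     # placeholder: total precipitation
})
```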

class anemoi.models.schemas.models.LeakyFractionBoundingSchema(*, _target_: Literal['anemoi.models.layers.bounding.LeakyFractionBounding'], variables: list[str], min_val: float, max_val: float, total_var: str)

Bases: FractionBoundingSchema

target_: Literal['anemoi.models.layers.bounding.LeakyFractionBounding']

Leaky fraction bounding object defined in anemoi.models.layers.bounding.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.models.schemas.models.HardtanhBoundingSchema(*, _target_: Literal['anemoi.models.layers.bounding.HardtanhBounding'], variables: list[str], min_val: float, max_val: float)

Bases: BaseModel

target_: Literal['anemoi.models.layers.bounding.HardtanhBounding']

Hard tanh bounding method function from anemoi.models.layers.bounding.

variables: list[str]

List of variables to bound using the hard tanh method.

min_val: float

The minimum value for the HardTanh activation.

max_val: float

The maximum value for the HardTanh activation.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.models.schemas.models.LeakyHardtanhBoundingSchema(*, _target_: Literal['anemoi.models.layers.bounding.LeakyHardtanhBounding'], variables: list[str], min_val: float, max_val: float)

Bases: HardtanhBoundingSchema

target_: Literal['anemoi.models.layers.bounding.LeakyHardtanhBounding']

Leaky hard tanh bounding method function from anemoi.models.layers.bounding.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.models.schemas.models.NormalizedReluBoundingSchema(*, _target_: Literal['anemoi.models.layers.bounding.NormalizedReluBounding'], variables: list[str], min_val: list[float], normalizer: list[str])

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.models.schemas.models.LeakyNormalizedReluBoundingSchema(*, _target_: Literal['anemoi.models.layers.bounding.LeakyNormalizedReluBounding'], variables: list[str], min_val: list[float], normalizer: list[str])

Bases: NormalizedReluBoundingSchema

target_: Literal['anemoi.models.layers.bounding.LeakyNormalizedReluBounding']

Leaky normalized Relu bounding object defined in anemoi.models.layers.bounding.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.models.schemas.models.BaseModelSchema(*, num_channels: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)], model: ~anemoi.models.schemas.models.Model = <factory>, layer_kernels: dict[str, dict] | None = <factory>, trainable_parameters: ~anemoi.models.schemas.models.TrainableParameters = <factory>, bounding: list[~typing.Annotated[~anemoi.models.schemas.models.ReluBoundingSchema | ~anemoi.models.schemas.models.LeakyReluBoundingSchema | ~anemoi.models.schemas.models.FractionBoundingSchema | ~anemoi.models.schemas.models.LeakyFractionBoundingSchema | ~anemoi.models.schemas.models.HardtanhBoundingSchema | ~anemoi.models.schemas.models.LeakyHardtanhBoundingSchema | ~anemoi.models.schemas.models.NormalizedReluBoundingSchema | ~anemoi.models.schemas.models.LeakyNormalizedReluBoundingSchema, FieldInfo(annotation=NoneType, required=True, discriminator='target_')]], output_mask: str | None, latent_skip: bool = True, grid_skip: int | None = 0, processor: ~anemoi.models.schemas.processor.GNNProcessorSchema | ~anemoi.models.schemas.processor.GraphTransformerProcessorSchema | ~anemoi.models.schemas.processor.TransformerProcessorSchema, encoder: ~anemoi.models.schemas.encoder.GNNEncoderSchema | ~anemoi.models.schemas.encoder.GraphTransformerEncoderSchema, decoder: ~anemoi.models.schemas.decoder.GNNDecoderSchema | ~anemoi.models.schemas.decoder.GraphTransformerDecoderSchema)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

num_channels: NonNegativeInt

Feature tensor size in the hidden space.

model: Model

Model schema.

layer_kernels: dict[str, dict] | None

Settings related to custom kernels for the encoder, processor and decoder blocks.

trainable_parameters: TrainableParameters

Learnable node and edge parameters.

bounding: list[Bounding]

List of bounding configuration applied in order to the specified variables.

output_mask: str | None

Output mask; it must be a node attribute of the output nodes.

latent_skip: bool

Add a skip connection in latent space before/after the processor. Currently only used in the interpolator.

grid_skip: int | None

Index of the grid residual connection, or None to disable it. Currently only used in the interpolator.

processor: GNNProcessorSchema | GraphTransformerProcessorSchema | TransformerProcessorSchema

GNN processor schema.

encoder: GNNEncoderSchema | GraphTransformerEncoderSchema

GNN encoder schema.

decoder: GNNDecoderSchema | GraphTransformerDecoderSchema

GNN decoder schema.

class anemoi.models.schemas.models.NoiseInjectorSchema(*, _target_: Literal['anemoi.models.layers.ensemble.NoiseConditioning'], noise_std: Annotated[int, Ge(ge=0)], noise_channels_dim: Annotated[int, Ge(ge=0)], noise_mlp_hidden_dim: Annotated[int, Ge(ge=0)], inject_noise: bool = True)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

target_: Literal['anemoi.models.layers.ensemble.NoiseConditioning']

Noise injection layer class

noise_std: NonNegativeInt

Standard deviation of the noise to be injected.

noise_channels_dim: NonNegativeInt

Number of channels in the noise tensor.

noise_mlp_hidden_dim: NonNegativeInt

Hidden dimension of the MLP used to process the noise.

inject_noise: bool

Whether to inject noise or not.

class anemoi.models.schemas.models.EnsModelSchema(*, num_channels: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)], model: ~anemoi.models.schemas.models.Model = <factory>, layer_kernels: dict[str, dict] | None = <factory>, trainable_parameters: ~anemoi.models.schemas.models.TrainableParameters = <factory>, bounding: list[~typing.Annotated[~anemoi.models.schemas.models.ReluBoundingSchema | ~anemoi.models.schemas.models.LeakyReluBoundingSchema | ~anemoi.models.schemas.models.FractionBoundingSchema | ~anemoi.models.schemas.models.LeakyFractionBoundingSchema | ~anemoi.models.schemas.models.HardtanhBoundingSchema | ~anemoi.models.schemas.models.LeakyHardtanhBoundingSchema | ~anemoi.models.schemas.models.NormalizedReluBoundingSchema | ~anemoi.models.schemas.models.LeakyNormalizedReluBoundingSchema, FieldInfo(annotation=NoneType, required=True, discriminator='target_')]], output_mask: str | None, latent_skip: bool = True, grid_skip: int | None = 0, processor: ~anemoi.models.schemas.processor.GNNProcessorSchema | ~anemoi.models.schemas.processor.GraphTransformerProcessorSchema | ~anemoi.models.schemas.processor.TransformerProcessorSchema, encoder: ~anemoi.models.schemas.encoder.GNNEncoderSchema | ~anemoi.models.schemas.encoder.GraphTransformerEncoderSchema, decoder: ~anemoi.models.schemas.decoder.GNNDecoderSchema | ~anemoi.models.schemas.decoder.GraphTransformerDecoderSchema, noise_injector: ~anemoi.models.schemas.models.NoiseInjectorSchema = <factory>)

Bases: BaseModelSchema

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

noise_injector: NoiseInjectorSchema

Configuration of the noise injection layer used by the ensemble model.

Training

class anemoi.training.schemas.training.GradientClip(*, val: float = 32.0, algorithm: ~typing.Annotated[str, ~pydantic.functional_validators.AfterValidator(func=functools.partial(<function allowed_values at 0x7e11d65c4b80>, values=['value', 'norm']))])

Bases: BaseModel

Gradient clipping configuration.

val: float

Gradient clipping value.

algorithm: Annotated[str, AfterValidator(partial(allowed_values, values=['value', 'norm']))]

The gradient clipping algorithm to use.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
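
A minimal sketch of how this schema constrains its fields: `algorithm` only accepts the values 'value' or 'norm', so anything else should be rejected at validation time.

```python
# Minimal sketch: `algorithm` only accepts 'value' or 'norm'.
from pydantic import ValidationError

from anemoi.training.schemas.training import GradientClip

clip = GradientClip(val=32.0, algorithm="norm")
print(clip.model_dump())  # {'val': 32.0, 'algorithm': 'norm'}

try:
    GradientClip(val=32.0, algorithm="clip_by_global_norm")
except ValidationError as err:
    # Rejected by the allowed_values validator.
    print(err)
```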

class anemoi.training.schemas.training.SWA(*, enabled: bool, lr: Annotated[float, Ge(ge=0)])

Bases: BaseModel

Stochastic weight averaging configuration.

See https://pytorch.org/blog/stochastic-weight-averaging-in-pytorch/

enabled: bool

Enable stochastic weight averaging.

lr: NonNegativeFloat

Learning rate for SWA.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.training.Rollout(*, start: Annotated[int, Gt(gt=0)], epoch_increment: Annotated[int, Ge(ge=0)], max: Annotated[int, Gt(gt=0)])

Bases: BaseModel

Rollout configuration.

start: PositiveInt

Number of rollouts to start with.

epoch_increment: NonNegativeInt

Number of epochs to increment the rollout.

max: PositiveInt

Maximum number of rollouts.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
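
A minimal sketch of a valid rollout block; the numbers are illustrative assumptions. Note that `start` and `max` must be positive integers, while `epoch_increment` may be 0.

```python
# Minimal sketch: a rollout schedule block (illustrative numbers).
from anemoi.training.schemas.training import Rollout

rollout = Rollout(start=1, epoch_increment=1, max=12)
print(rollout.model_dump())  # {'start': 1, 'epoch_increment': 1, 'max': 12}
```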

class anemoi.training.schemas.training.LR(*, rate: Annotated[float, Ge(ge=0)], iterations: Annotated[int, Ge(ge=0)], min: Annotated[float, Ge(ge=0)], warmup: Annotated[int, Ge(ge=0)])

Bases: BaseModel

Learning rate configuration.

Changes in the per-GPU batch_size should come with a rescaling of the local_lr in order to keep the global_lr constant: global_lr = local_lr * num_gpus_per_node * num_nodes / gpus_per_model.

rate: NonNegativeFloat

Initial learning rate. It is adjusted according to the hardware configuration.

iterations: NonNegativeInt

Number of iterations.

min: NonNegativeFloat

Minimum learning rate.

warmup: NonNegativeInt

Number of warm-up iterations. Defaults to 1000.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
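
A small worked example of the rescaling rule above; the hardware numbers are illustrative assumptions, not recommendations.

```python
# Worked example of: global_lr = local_lr * num_gpus_per_node * num_nodes / gpus_per_model
# The hardware numbers below are illustrative assumptions.
local_lr = 0.625e-4
num_gpus_per_node = 4
num_nodes = 2
gpus_per_model = 1

global_lr = local_lr * num_gpus_per_node * num_nodes / gpus_per_model
print(global_lr)  # 5e-04
```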

class anemoi.training.schemas.training.OptimizerSchema(*, zero: bool, kwargs: dict[str, ~typing.Any] = <factory>)

Bases: BaseModel

Optimizer configuration.

zero: bool

Use the ZeRO optimiser.

kwargs: dict[str, Any]

Additional arguments to pass to the optimizer.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
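
A minimal sketch of an optimizer block; the extra arguments in `kwargs` are passed through to the optimizer, and the `betas` value here is an illustrative assumption.

```python
# Minimal sketch: extra optimizer arguments go through `kwargs`
# (the betas value is an illustrative assumption).
from anemoi.training.schemas.training import OptimizerSchema

optimizer = OptimizerSchema(zero=False, kwargs={"betas": (0.9, 0.95)})
print(optimizer.model_dump())
```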

class anemoi.training.schemas.training.ExplicitTimes(*, input: list[Annotated[int, Ge(ge=0)]], target: list[Annotated[int, Ge(ge=0)]])

Bases: BaseModel

Time indices for input and output.

Starts at index 0. Input and output can overlap.

input: list[NonNegativeInt]

Input time indices.

target: list[NonNegativeInt]

Target time indices.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
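
A minimal sketch of an explicit-times block for an interpolation setup: two boundary input times and the intermediate target times. The indices are illustrative assumptions.

```python
# Minimal sketch: boundary input times and intermediate target times
# (indices are illustrative assumptions).
from anemoi.training.schemas.training import ExplicitTimes

times = ExplicitTimes(input=[0, 6], target=[1, 2, 3, 4, 5])
print(times.model_dump())  # {'input': [0, 6], 'target': [1, 2, 3, 4, 5]}
```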

class anemoi.training.schemas.training.TargetForcing(*, data: list[str], time_fraction: bool)

Bases: BaseModel

Forcing parameters for target output times.

Extra forcing parameters to use as input to distinguish between different target times.

data: list[str]

List of forcing parameters to use as input to the model at the interpolated step.

time_fraction: bool

Use the target time, expressed as a fraction between the input boundary times, as an additional input.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.training.LossScalingSchema(*, default: int = 1, pl: dict[str, Annotated[float, Ge(ge=0)]], sfc: dict[str, Annotated[float, Ge(ge=0)]])

Bases: BaseModel

default: int

Default scaling value applied to each variable's loss. Defaults to 1.

pl: dict[str, NonNegativeFloat]

Scaling value associated with each pressure-level variable loss.

sfc: dict[str, NonNegativeFloat]

Scaling value associated with each surface variable loss.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
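
A minimal sketch of a variable loss-scaling block; the variable names and scaling values are illustrative assumptions, not recommended settings.

```python
# Minimal sketch: per-variable loss scalings
# (variable names and values are illustrative assumptions).
from anemoi.training.schemas.training import LossScalingSchema

scaling = LossScalingSchema(
    default=1,
    pl={"t": 6.0, "q": 0.6},      # pressure-level variables
    sfc={"2t": 1.0, "10u": 0.1},  # surface variables
)
print(scaling.model_dump())
```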

class anemoi.training.schemas.training.PressureLevelScalerTargets(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

class anemoi.training.schemas.training.PressureLevelScalerSchema(*, _target_: PressureLevelScalerTargets, minimum: float, slope: float = 0.001)

Bases: BaseModel

minimum: float

Minimum value of the scaling function.

slope: float

Slope of the scaling function.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.training.ImplementedLossesUsingBaseLossSchema(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

class anemoi.training.schemas.training.BaseLossSchema(*, _target_: ~anemoi.training.schemas.training.ImplementedLossesUsingBaseLossSchema, scalars: list[~typing.Annotated[str, ~pydantic.functional_validators.AfterValidator(func=functools.partial(<function allowed_values at 0x7e11d65c4b80>, values=['limited_area_mask', 'variable', 'loss_weights_mask', '*']))]], ignore_nans: bool = False)

Bases: BaseModel

target_: ImplementedLossesUsingBaseLossSchema

Loss function object from anemoi.training.losses.

scalars: list[PossibleScalars]

Scalars to include in the loss calculation.

ignore_nans: bool

Allow NaNs in the loss and apply NaN-ignoring methods when measuring the loss.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.training.KernelCRPSSchema(*, _target_: ~anemoi.training.schemas.training.ImplementedLossesUsingBaseLossSchema, scalars: list[~typing.Annotated[str, ~pydantic.functional_validators.AfterValidator(func=functools.partial(<function allowed_values at 0x7e11d65c4b80>, values=['limited_area_mask', 'variable', 'loss_weights_mask', '*']))]], ignore_nans: bool = False, fair: bool = True)

Bases: BaseLossSchema

fair: bool

Calculate a ‘fair’ (unbiased) score: the ensemble variance component is weighted by (ens_size - 1)^-1.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.training.AlmostFairKernelCRPSSchema(*, _target_: ~anemoi.training.schemas.training.ImplementedLossesUsingBaseLossSchema, scalars: list[~typing.Annotated[str, ~pydantic.functional_validators.AfterValidator(func=functools.partial(<function allowed_values at 0x7e11d65c4b80>, values=['limited_area_mask', 'variable', 'loss_weights_mask', '*']))]], ignore_nans: bool = False, alpha: float = 1.0, no_autocast: bool = True)

Bases: BaseLossSchema

alpha: float

Factor for the linear combination of fair CRPS (unbiased; ensemble variance component weighted by (ens_size - 1)^-1) and standard CRPS (1.0 = fully fair, 0.0 = fully unfair).

no_autocast: bool

Deactivate autocast for the kernel CRPS calculation.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.training.HuberLossSchema(*, _target_: ~anemoi.training.schemas.training.ImplementedLossesUsingBaseLossSchema, scalars: list[~typing.Annotated[str, ~pydantic.functional_validators.AfterValidator(func=functools.partial(<function allowed_values at 0x7e11d65c4b80>, values=['limited_area_mask', 'variable', 'loss_weights_mask', '*']))]], ignore_nans: bool = False, delta: float = 1.0)

Bases: BaseLossSchema

delta: float

Threshold for Huber loss.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.training.WeightedMSELossLimitedAreaSchema(*, _target_: ~anemoi.training.schemas.training.ImplementedLossesUsingBaseLossSchema, scalars: list[~typing.Annotated[str, ~pydantic.functional_validators.AfterValidator(func=functools.partial(<function allowed_values at 0x7e11d65c4b80>, values=['limited_area_mask', 'variable', 'loss_weights_mask', '*']))]], ignore_nans: bool = False, inside_lam: bool = True, wmse_contribution: bool = False)

Bases: BaseLossSchema

inside_lam: bool

Whether to compute the MSE inside or outside the limited area.

wmse_contribution: bool

Whether to compute the contribution to the MSE or not.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.training.CombinedLossSchema(*, _target_: ~typing.Literal['anemoi.training.losses.combined.CombinedLoss'], scalars: list[~typing.Annotated[str, ~pydantic.functional_validators.AfterValidator(func=functools.partial(<function allowed_values at 0x7e11d65c4b80>, values=['limited_area_mask', 'variable', 'loss_weights_mask', '*']))]], ignore_nans: bool = False, losses: ~typing.Annotated[list[~anemoi.training.schemas.training.BaseLossSchema], ~annotated_types.MinLen(min_length=1)], loss_weights: list[int | float] | None = None)

Bases: BaseLossSchema

target_: Literal['anemoi.training.losses.combined.CombinedLoss']

Loss function object from anemoi.training.losses.

losses: list[BaseLossSchema]

Losses to combine, can be any of the normal losses.

loss_weights: list[int | float] | None

Weightings of losses, if not set, all losses are weighted equally.

classmethod add_empty_scalars(losses: Any) Any

Add empty scalars to loss functions, as scalars can be set at top level.

check_length_of_weights_and_losses() CombinedLossSchema

Check that the number of losses and weights match, or if not set, skip.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.training.ImplementedStrategiesUsingBaseDDPStrategySchema(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

class anemoi.training.schemas.training.BaseDDPStrategySchema(*, _target_: ImplementedStrategiesUsingBaseDDPStrategySchema, num_gpus_per_model: Annotated[int, Gt(gt=0)], read_group_size: Annotated[int, Gt(gt=0)])

Bases: BaseModel

Strategy configuration.

num_gpus_per_model: PositiveInt

Number of GPUs per model.

read_group_size: PositiveInt

Number of GPUs per reader group. Defaults to the number of GPUs.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.training.DDPEnsGroupStrategyStrategySchema(*, _target_: ImplementedStrategiesUsingBaseDDPStrategySchema, num_gpus_per_model: Annotated[int, Gt(gt=0)], read_group_size: Annotated[int, Gt(gt=0)], num_gpus_per_ensemble: Annotated[int, Gt(gt=0)])

Bases: BaseDDPStrategySchema

Strategy object from anemoi.training.strategy.

num_gpus_per_ensemble: PositiveInt

Number of GPUs per ensemble.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class anemoi.training.schemas.training.GraphNodeAttributeSchema(*, _target_: Literal['anemoi.training.losses.nodeweights.GraphNodeAttribute'], target_nodes: str, node_attribute: str)

Bases: BaseModel

target_: Literal['anemoi.training.losses.nodeweights.GraphNodeAttribute']

Node loss weights object from anemoi.training.losses.

target_nodes: str

Name of the target nodes; key in the HeteroData graph object.

node_attribute: str

Name of the node weight attribute; key in the HeteroData graph object.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
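
A minimal sketch of a node-loss-weights block using this schema; the target path is the documented literal, while the node and attribute names ("data", "area_weight") are illustrative assumptions.

```python
# Minimal sketch: node loss weights read from a graph attribute
# (node and attribute names are illustrative assumptions).
from anemoi.training.schemas.training import GraphNodeAttributeSchema

weights = GraphNodeAttributeSchema.model_validate(
    {
        "_target_": "anemoi.training.losses.nodeweights.GraphNodeAttribute",
        "target_nodes": "data",
        "node_attribute": "area_weight",
    }
)
print(weights.model_dump(by_alias=True))
```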

class anemoi.training.schemas.training.ReweightedGraphNodeAttributeSchema(*, _target_: Literal['anemoi.training.losses.nodeweights.ReweightedGraphNodeAttribute'], target_nodes: str, node_attribute: str, scaled_attribute: str, weight_frac_of_total: Annotated[float, Ge(ge=0), Le(le=1)])

Bases: BaseModel

target_: Literal['anemoi.training.losses.nodeweights.ReweightedGraphNodeAttribute']

Node loss weights object from anemoi.training.losses.

target_nodes: str

Name of the target nodes; key in the HeteroData graph object.

node_attribute: str

Name of the node weight attribute; key in the nodes object.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

scaled_attribute: str

Name of the node attribute defining the subset of nodes to be scaled; key in the nodes object.

weight_frac_of_total: float

Sum of the weights of the subset nodes as a fraction of the sum of the weights of all nodes after rescaling.

class anemoi.training.schemas.training.ScaleValidationMetrics(*, scalars_to_apply: list[str], metrics: list[str])

Bases: BaseModel

Configuration for scaling validation metrics.

Here variable scaling is possible due to the metrics being calculated in the same way as the training loss, within internal model space.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

scalars_to_apply: list[str]

List of scalars to be applied.

metrics: list[str]

List of metrics to keep in normalised space.

class anemoi.training.schemas.training.BaseTrainingSchema(*, run_id: str | None, fork_run_id: str | None, load_weights_only: bool, transfer_learning: bool, submodules_to_freeze: list[str], deterministic: bool = False, precision: str = '16-mixed', multistep_input: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)], accum_grad_batches: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 1, num_sanity_val_steps: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)], gradient_clip: ~anemoi.training.schemas.training.GradientClip, strategy: ~anemoi.training.schemas.training.BaseDDPStrategySchema | ~anemoi.training.schemas.training.DDPEnsGroupStrategyStrategySchema, swa: ~anemoi.training.schemas.training.SWA = <factory>, training_loss: ~anemoi.training.schemas.training.BaseLossSchema | ~anemoi.training.schemas.training.HuberLossSchema | ~anemoi.training.schemas.training.WeightedMSELossLimitedAreaSchema | ~anemoi.training.schemas.training.CombinedLossSchema | ~anemoi.training.schemas.training.KernelCRPSSchema | ~anemoi.training.schemas.training.AlmostFairKernelCRPSSchema, loss_gradient_scaling: bool = False, validation_metrics: list[~anemoi.training.schemas.training.BaseLossSchema | ~anemoi.training.schemas.training.HuberLossSchema | ~anemoi.training.schemas.training.WeightedMSELossLimitedAreaSchema | ~anemoi.training.schemas.training.CombinedLossSchema | ~anemoi.training.schemas.training.KernelCRPSSchema | ~anemoi.training.schemas.training.AlmostFairKernelCRPSSchema], scale_validation_metrics: ~anemoi.training.schemas.training.ScaleValidationMetrics, rollout: ~anemoi.training.schemas.training.Rollout = <factory>, max_epochs: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] | None = None, max_steps: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 150000, lr: ~anemoi.training.schemas.training.LR = <factory>, optimizer: ~anemoi.training.schemas.training.OptimizerSchema = <factory>, variable_loss_scaling: ~anemoi.training.schemas.training.LossScalingSchema, pressure_level_scaler: ~anemoi.training.schemas.training.PressureLevelScalerSchema, metrics: list[str], node_loss_weights: ~anemoi.training.schemas.training.GraphNodeAttributeSchema | ~anemoi.training.schemas.training.ReweightedGraphNodeAttributeSchema)

Bases: BaseModel

Training configuration.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

run_id: str | None

Run ID used to resume a run from a checkpoint, either last.ckpt or the checkpoint specified in hardware.files.warm_start.

fork_run_id: str | None

Run ID to fork from, either last.ckpt or specified in hardware.files.warm_start.

load_weights_only: bool

Load only the weights from the checkpoint, not the optimiser state.

transfer_learning: bool

Flag to activate transfer learning mode when loading a checkpoint.

submodules_to_freeze: list[str]

List of submodules to freeze during transfer learning.

deterministic: bool

This flag sets the torch.backends.cudnn.deterministic flag. Might be slower, but ensures reproducibility.

precision: str

Numerical precision used for training, e.g. '16-mixed'.

multistep_input: PositiveInt

Number of input steps for the model. E.g. 1 = single-step scheme, X(t-1) is used to predict X(t); k > 1 = multistep scheme, [X(t-k), X(t-k+1), …, X(t-1)] are used to predict X(t).

accum_grad_batches: PositiveInt

Accumulate gradients over k batches before stepping the optimizer, with k >= 1 (k == 1 means no accumulation). The effective batch size becomes num_devices * k.

num_sanity_val_steps: NonNegativeInt

Sanity check runs n validation batches before starting the training routine.

gradient_clip: GradientClip

Config for gradient clipping.

strategy: StrategySchemas

Strategy to use.

swa: SWA

Config for stochastic weight averaging.

training_loss: LossSchemas

Training loss configuration.

loss_gradient_scaling: bool

Dynamic rescaling of the loss gradient. Not yet tested.

validation_metrics: list[LossSchemas]

List of validation metric configurations.

scale_validation_metrics: ScaleValidationMetrics

Configuration for scaling validation metrics.

rollout: Rollout

Rollout configuration.

max_epochs: PositiveInt | None

Maximum number of epochs; training stops earlier if max_steps is reached first.

max_steps: PositiveInt

Maximum number of steps; training stops earlier if max_epochs is reached first.

lr: LR

Learning rate configuration.

optimizer: OptimizerSchema

Optimizer configuration.

variable_loss_scaling: LossScalingSchema

Configuration of the variable scaling used in the loss computation.

pressure_level_scaler: PressureLevelScalerSchema

Configuration of the pressure level scaler applied in the loss computation.

metrics: list[str]

List of metrics.

node_loss_weights: NodeLossWeightsSchema

Node loss weights configuration.

class anemoi.training.schemas.training.ForecasterSchema(*, run_id: str | None, fork_run_id: str | None, load_weights_only: bool, transfer_learning: bool, submodules_to_freeze: list[str], deterministic: bool = False, precision: str = '16-mixed', multistep_input: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)], accum_grad_batches: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 1, num_sanity_val_steps: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)], gradient_clip: ~anemoi.training.schemas.training.GradientClip, strategy: ~anemoi.training.schemas.training.BaseDDPStrategySchema | ~anemoi.training.schemas.training.DDPEnsGroupStrategyStrategySchema, swa: ~anemoi.training.schemas.training.SWA = <factory>, training_loss: ~anemoi.training.schemas.training.BaseLossSchema | ~anemoi.training.schemas.training.HuberLossSchema | ~anemoi.training.schemas.training.WeightedMSELossLimitedAreaSchema | ~anemoi.training.schemas.training.CombinedLossSchema | ~anemoi.training.schemas.training.KernelCRPSSchema | ~anemoi.training.schemas.training.AlmostFairKernelCRPSSchema, loss_gradient_scaling: bool = False, validation_metrics: list[~anemoi.training.schemas.training.BaseLossSchema | ~anemoi.training.schemas.training.HuberLossSchema | ~anemoi.training.schemas.training.WeightedMSELossLimitedAreaSchema | ~anemoi.training.schemas.training.CombinedLossSchema | ~anemoi.training.schemas.training.KernelCRPSSchema | ~anemoi.training.schemas.training.AlmostFairKernelCRPSSchema], scale_validation_metrics: ~anemoi.training.schemas.training.ScaleValidationMetrics, rollout: ~anemoi.training.schemas.training.Rollout = <factory>, max_epochs: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] | None = None, max_steps: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 150000, lr: ~anemoi.training.schemas.training.LR = <factory>, optimizer: ~anemoi.training.schemas.training.OptimizerSchema = <factory>, variable_loss_scaling: ~anemoi.training.schemas.training.LossScalingSchema, pressure_level_scaler: ~anemoi.training.schemas.training.PressureLevelScalerSchema, metrics: list[str], node_loss_weights: ~anemoi.training.schemas.training.GraphNodeAttributeSchema | ~anemoi.training.schemas.training.ReweightedGraphNodeAttributeSchema, model_task: ~typing.Literal['anemoi.training.train.forecaster.GraphForecaster'])

Bases: BaseTrainingSchema

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_task: Literal['anemoi.training.train.forecaster.GraphForecaster']

Training objective.

class anemoi.training.schemas.training.ForecasterEnsSchema(*, run_id: str | None, fork_run_id: str | None, load_weights_only: bool, transfer_learning: bool, submodules_to_freeze: list[str], deterministic: bool = False, precision: str = '16-mixed', multistep_input: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)], accum_grad_batches: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 1, num_sanity_val_steps: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)], gradient_clip: ~anemoi.training.schemas.training.GradientClip, strategy: ~anemoi.training.schemas.training.BaseDDPStrategySchema | ~anemoi.training.schemas.training.DDPEnsGroupStrategyStrategySchema, swa: ~anemoi.training.schemas.training.SWA = <factory>, training_loss: ~anemoi.training.schemas.training.BaseLossSchema | ~anemoi.training.schemas.training.HuberLossSchema | ~anemoi.training.schemas.training.WeightedMSELossLimitedAreaSchema | ~anemoi.training.schemas.training.CombinedLossSchema | ~anemoi.training.schemas.training.KernelCRPSSchema | ~anemoi.training.schemas.training.AlmostFairKernelCRPSSchema, loss_gradient_scaling: bool = False, validation_metrics: list[~anemoi.training.schemas.training.BaseLossSchema | ~anemoi.training.schemas.training.HuberLossSchema | ~anemoi.training.schemas.training.WeightedMSELossLimitedAreaSchema | ~anemoi.training.schemas.training.CombinedLossSchema | ~anemoi.training.schemas.training.KernelCRPSSchema | ~anemoi.training.schemas.training.AlmostFairKernelCRPSSchema], scale_validation_metrics: ~anemoi.training.schemas.training.ScaleValidationMetrics, rollout: ~anemoi.training.schemas.training.Rollout = <factory>, max_epochs: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] | None = None, max_steps: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 150000, lr: ~anemoi.training.schemas.training.LR = <factory>, optimizer: ~anemoi.training.schemas.training.OptimizerSchema = <factory>, variable_loss_scaling: ~anemoi.training.schemas.training.LossScalingSchema, pressure_level_scaler: ~anemoi.training.schemas.training.PressureLevelScalerSchema, metrics: list[str], node_loss_weights: ~anemoi.training.schemas.training.GraphNodeAttributeSchema | ~anemoi.training.schemas.training.ReweightedGraphNodeAttributeSchema, model_task: ~typing.Literal['anemoi.training.train.forecaster.GraphEnsForecaster'], ensemble_size_per_device: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)])

Bases: BaseTrainingSchema

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_task: Literal['anemoi.training.train.forecaster.GraphEnsForecaster']

Training objective.

ensemble_size_per_device: PositiveInt

Number of ensemble members per device.

class anemoi.training.schemas.training.InterpolationSchema(*, run_id: str | None, fork_run_id: str | None, load_weights_only: bool, transfer_learning: bool, submodules_to_freeze: list[str], deterministic: bool = False, precision: str = '16-mixed', multistep_input: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)], accum_grad_batches: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 1, num_sanity_val_steps: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)], gradient_clip: ~anemoi.training.schemas.training.GradientClip, strategy: ~anemoi.training.schemas.training.BaseDDPStrategySchema | ~anemoi.training.schemas.training.DDPEnsGroupStrategyStrategySchema, swa: ~anemoi.training.schemas.training.SWA = <factory>, training_loss: ~anemoi.training.schemas.training.BaseLossSchema | ~anemoi.training.schemas.training.HuberLossSchema | ~anemoi.training.schemas.training.WeightedMSELossLimitedAreaSchema | ~anemoi.training.schemas.training.CombinedLossSchema | ~anemoi.training.schemas.training.KernelCRPSSchema | ~anemoi.training.schemas.training.AlmostFairKernelCRPSSchema, loss_gradient_scaling: bool = False, validation_metrics: list[~anemoi.training.schemas.training.BaseLossSchema | ~anemoi.training.schemas.training.HuberLossSchema | ~anemoi.training.schemas.training.WeightedMSELossLimitedAreaSchema | ~anemoi.training.schemas.training.CombinedLossSchema | ~anemoi.training.schemas.training.KernelCRPSSchema | ~anemoi.training.schemas.training.AlmostFairKernelCRPSSchema], scale_validation_metrics: ~anemoi.training.schemas.training.ScaleValidationMetrics, rollout: ~anemoi.training.schemas.training.Rollout = <factory>, max_epochs: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] | None = None, max_steps: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 150000, lr: ~anemoi.training.schemas.training.LR = <factory>, optimizer: ~anemoi.training.schemas.training.OptimizerSchema = <factory>, variable_loss_scaling: ~anemoi.training.schemas.training.LossScalingSchema, pressure_level_scaler: ~anemoi.training.schemas.training.PressureLevelScalerSchema, metrics: list[str], node_loss_weights: ~anemoi.training.schemas.training.GraphNodeAttributeSchema | ~anemoi.training.schemas.training.ReweightedGraphNodeAttributeSchema, model_task: ~typing.Literal['anemoi.training.train.forecaster.GraphInterpolator'], explicit_times: ~anemoi.training.schemas.training.ExplicitTimes, target_forcing: ~anemoi.training.schemas.training.TargetForcing)

Bases: BaseTrainingSchema

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_attribute_docstrings': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_task: Literal['anemoi.training.train.forecaster.GraphInterpolator']

Training objective.

explicit_times: ExplicitTimes

Time indices for input and output.

target_forcing: TargetForcing

Forcing parameters for target output times.
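
The interpolation task adds the explicit_times and target_forcing blocks on top of the base training schema. A minimal sketch validating the target-forcing block on its own; the forcing variable names are illustrative assumptions, not defaults.

```python
# Minimal sketch: target-forcing block for the interpolation task
# (forcing variable names are illustrative assumptions).
from anemoi.training.schemas.training import TargetForcing

forcing = TargetForcing(
    data=["cos_julian_day", "sin_julian_day"],
    time_fraction=True,
)
print(forcing.model_dump())
```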