Configuring the Training
Anemoi training is designed so you can adjust key parts of the models and training process without needing to modify the underlying code.
A basic introduction to the configuration system is provided in the getting started section. This section will go into more detail on how to configure the training pipeline.
Default Config Groups
A typical config file starts by specifying the default config groups at the top, as follows:
defaults:
- data: zarr
- dataloader: native_grid
- diagnostics: evaluation
- hardware: example
- graph: multi_scale
- model: gnn
- training: default
- _self_
Each entry selects the group config to use for that section. Any options that follow the defaults list then override those group configs by assigning new values to their keys.
You can also find these defaults in other configs, like the hardware config, which implements:
defaults:
- paths: example
- files: example
YAML-based config overrides
The config files are written in YAML, which makes it easy to override the default settings. For example, to change the model from the default GNN to a transformer, you can use the following entry in the config groups:
model: transformer
This will override the default model config with the transformer model.
You can also override individual settings. For example, to change the learning rate from the default value of 0.625e-4 to 1e-3, you can add the following to the config you’re using:
training:
  lr:
    rate: 1e-3
You can also change the GPU count to whatever you have available:
hardware:
  num_gpus_per_node: 1
These overrides use the same keys and nesting as the underlying default configs in Anemoi training.
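To make the override behaviour concrete, here is a minimal Python sketch of a recursive merge over plain dictionaries. It is an illustration only, not Anemoi's implementation: `deep_merge` is a hypothetical helper, and apart from the default learning rate quoted above, the values in `defaults` (including `other_setting` and the GPU count of 4) are placeholders.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the override value replaces the default.
            merged[key] = deepcopy(value)
    return merged


# Placeholder defaults: only the learning rate comes from the text above,
# the remaining values are made up for illustration.
defaults = {
    "training": {"lr": {"rate": 0.625e-4, "other_setting": "kept"}},
    "hardware": {"num_gpus_per_node": 4},
}

# The two YAML overrides shown above, written as nested dictionaries.
overrides = {
    "training": {"lr": {"rate": 1e-3}},
    "hardware": {"num_gpus_per_node": 1},
}

config = deep_merge(defaults, overrides)
print(config["training"]["lr"])  # {'rate': 0.001, 'other_setting': 'kept'}
```

Only the keys named in the override change; sibling keys under the same section keep their default values, which is why an override file can stay short.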
Example Config File
Here is an example of a config file that changes the model to a transformer, the learning rate to 1e-3, and the number of GPUs to 1. We also need to specify the paths to the data, output, and graph data and give the names of the files to use. You can get a dataset from the Anemoi Datasets catalogue or create one using the Anemoi Datasets package.
You can create a graph using Anemoi Graphs, or one will be created for you at runtime. Note that you must specify a filename for the graph; here we use first_graph_n320.pt.
You’ll also notice we’ve specified a resolution for the data; this must match the dataset you provide.
defaults:
- data: zarr
- dataloader: native_grid
- diagnostics: evaluation
- hardware: example
- graph: multi_scale
- model: transformer # Change from default group
- training: default
- _self_
data:
  resolution: n320
hardware:
  num_gpus_per_node: 1
  paths:
    output: /home/username/anemoi/training/output
    data: /home/username/anemoi/datasets
    graph: /home/username/anemoi/training/graphs
  files:
    dataset: dataset-n320-2019-2021-6h.zarr
    graph: first_graph_n320.pt
training:
  lr:
    rate: 1e-3
Once this file is saved as example.yaml, we can run the training with this config using:
anemoi-training train --config-name=example.yaml
Command-line config overrides
It is also possible to override config settings from the command line. We can switch out group configs using:
anemoi-training train model=transformer
or override individual config entries, such as:
anemoi-training train diagnostics.plot.enabled=False
or combine everything together:
anemoi-training train --config-name=debug.yaml model=transformer diagnostics.plot.enabled=False
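The dotted `key=value` syntax used above can be read as a path into the nested config. The following Python sketch shows that reading under simplifying assumptions: `parse_value` and `apply_override` are hypothetical helpers rather than part of Anemoi training, and the toy `config` stores the model as a plain string, whereas a real group override such as `model=transformer` swaps in a whole group config.

```python
def parse_value(text: str):
    """Convert a command-line string into a bool, int, or float if possible."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # Fall back to the raw string.


def apply_override(config: dict, override: str) -> None:
    """Apply one `a.b.c=value` override to a nested dictionary in place."""
    dotted_key, _, raw_value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Walk down the tree, creating intermediate sections as needed.
        node = node.setdefault(name, {})
    node[leaf] = parse_value(raw_value)


# Toy config: the structure is a simplification for illustration only.
config = {"diagnostics": {"plot": {"enabled": True}}, "model": "gnn"}

for arg in ["model=transformer", "diagnostics.plot.enabled=False"]:
    apply_override(config, arg)

print(config)
```

Each dot in the key descends one level, so `diagnostics.plot.enabled=False` touches the same entry as a YAML override nested under `diagnostics`, then `plot`, then `enabled`.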