Introduction
An Anemoi dataset is a thin wrapper around a Zarr store, optimised for training data-driven weather forecasting models. Its layout is designed to minimise I/O operations (see Overview).
To open a dataset, you can use the open_dataset function.
from anemoi.datasets import open_dataset
ds = open_dataset("/path/to/dataset.zarr")
You can then access the data through the ds object as if it were a NumPy array.
print(ds.shape)
print(len(ds))
print(ds[0])
print(ds[10:20])
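To make the NumPy-like behaviour concrete, the sketch below uses a plain NumPy array as a stand-in for ds, since the calls above need an existing dataset on disk. The array shape and its axis order (dates, variables, grid points) are assumptions chosen for illustration, not a statement about the actual layout of an Anemoi dataset.

```python
import numpy as np

# Hypothetical stand-in for `ds`: 30 dates, 4 variables, 100 grid points.
# The axis order is an assumption made for this illustration only.
stand_in = np.zeros((30, 4, 100), dtype=np.float32)

full_shape = stand_in.shape      # shape of the whole array
n_first_axis = len(stand_in)     # length of the first axis
first_item = stand_in[0]         # integer index: drops the first axis
window = stand_in[10:20]         # slice: keeps the first axis, 10 items

print(full_shape, n_first_axis, first_item.shape, window.shape)
```

With a real dataset, the same four expressions would be applied to ds instead of stand_in.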
One of the main features of the anemoi-datasets package is the ability to subset or combine datasets. For example, to subset a dataset by date:
from anemoi.datasets import open_dataset
ds = open_dataset("path/to/dataset.zarr", start=2000, end=2020)
In this case, the resulting dataset contains only the data between the years 2000 and 2020. To combine datasets, pass multiple paths to the open_dataset function:
from anemoi.datasets import open_dataset
ds = open_dataset("path/to/dataset1.zarr", "path/to/dataset2.zarr")
In the latter case, the datasets are combined along either the time dimension or the variable dimension, depending on their structure.
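The two operations can be pictured with plain NumPy arrays. The following is a conceptual sketch only, not how anemoi-datasets implements them: the dates, array shapes, and variable names are hypothetical, and the end year is treated as inclusive here as an assumption.

```python
import numpy as np

# Hypothetical yearly dates 1995..2024 and matching data (dates x variables).
dates = np.arange("1995", "2025", dtype="datetime64[Y]")
data = np.arange(len(dates) * 2, dtype=np.float32).reshape(len(dates), 2)

# Subsetting: keep only the rows whose year falls in 2000..2020
# (both bounds inclusive, which is an assumption of this sketch).
mask = (dates >= np.datetime64("2000")) & (dates <= np.datetime64("2020"))
subset = data[mask]

# Combining along the time axis: same variables, consecutive periods.
early, late = data[:10], data[10:]
along_time = np.concatenate([early, late], axis=0)

# Combining along the variable axis: same dates, different variables.
extra_vars = np.ones((len(dates), 3), dtype=np.float32)
along_vars = np.concatenate([data, extra_vars], axis=1)

print(subset.shape, along_time.shape, along_vars.shape)
```

Joining along time requires the inputs to share the same variables, while joining along variables requires them to share the same dates, which is why the applicable dimension depends on how the datasets are structured.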