Opening datasets
The simplest way to open a dataset is to use the open_dataset function:
from anemoi.datasets import open_dataset
ds = open_dataset(dataset, option1=value1, option2=...)
In that example, dataset can be:
a local path to a dataset on disk:
from anemoi.datasets import open_dataset
ds = open_dataset("/path/to/dataset.zarr")
a URL to a dataset in the cloud:
from anemoi.datasets import open_dataset
ds1 = open_dataset("https://path/to/dataset.zarr")
ds2 = open_dataset("s3://path/to/dataset.zarr")
a dataset name, which is a string that identifies a dataset in the anemoi configuration file:
from anemoi.datasets import open_dataset
ds = open_dataset("dataset_name")
an already opened dataset. In that case, the function uses the options to return a modified dataset, for example with a different time range or frequency.
from anemoi.datasets import open_dataset
ds1 = open_dataset("/path/to/dataset.zarr")
ds2 = open_dataset(ds1, frequency="24h", start="2000", end="2010")
a dictionary with a dataset key that can be any of the above; the remaining keys are the options. This form allows a dataset to be opened from a configuration file. See an example below:
from anemoi.datasets import open_dataset
ds = open_dataset({"dataset": dataset, "option1": value1, "option2": ...})
a list of any of the above that will be combined either by concatenation or joining, based on their compatibility.
from anemoi.datasets import open_dataset
ds = open_dataset([dataset1, dataset2, ...])
a combining keyword, such as join, concat, ensemble, etc., followed by a list of the above. See Combining datasets for more information.
from anemoi.datasets import open_dataset
ds = open_dataset(
    ensemble=[dataset1, dataset2],
    option1=value1,
    option2=...,
)
Note
In the example above, option1 and option2 apply to the combined dataset. To apply options to individual datasets, use a list of dictionaries as shown below: option1 and option2 apply to the first dataset, option3 and option4 to the second, and so on.
from anemoi.datasets import open_dataset
ds = open_dataset(
    combine=[
        {"dataset": dataset1, "option1": value1, "option2": ...},
        {"dataset": dataset2, "option3": value3, "option4": ...},
    ]
)
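The list of per-dataset dictionaries described in the note can also be assembled programmatically. The following is a minimal sketch and not part of anemoi-datasets: the helper name with_options, the paths, and the option values are hypothetical and chosen for illustration only. It builds plain Python dictionaries in the shape shown above; the resulting list would then be passed to open_dataset under a combining keyword.

```python
def with_options(dataset, **options):
    """Return a dictionary of the form {"dataset": ..., option: value, ...}.

    Hypothetical helper for illustration; not part of anemoi-datasets.
    """
    return {"dataset": dataset, **options}


# Placeholder paths and option values; substitute real ones.
members = [
    with_options("/path/to/dataset1.zarr", start=2000, end=2010),
    with_options("/path/to/dataset2.zarr", frequency="24h"),
]
```

Each entry carries its own options, so the time range applies only to the first dataset and the frequency only to the second.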
As mentioned above, opening a dataset from a dictionary is useful for software that lets users define their requirements in a configuration file:
import yaml
from anemoi.datasets import open_dataset

with open("config.yaml") as file:
    config = yaml.safe_load(file)
ds = open_dataset(config)
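To make the relationship between the dictionary form and the keyword form concrete, the sketch below separates a configuration dictionary into its dataset entry and its options. It only illustrates the convention described on this page; split_config is a hypothetical name, the path and option values are placeholders, and this is not a description of how the library is implemented internally.

```python
def split_config(config):
    """Split a configuration dictionary into (dataset, options).

    Hypothetical helper for illustration; not part of anemoi-datasets.
    """
    options = dict(config)            # copy so the caller's dictionary is untouched
    dataset = options.pop("dataset")  # raises KeyError if the key is missing
    return dataset, options


config = {
    "dataset": "/path/to/dataset.zarr",  # placeholder path
    "frequency": "24h",
    "start": "2000",
    "end": "2010",
}
dataset, options = split_config(config)
```

Under the convention described above, open_dataset(config) and open_dataset(dataset, **options) are expected to describe the same request.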
The dictionary can be as complex as needed, for example:
from anemoi.datasets import open_dataset
config = {
    "dataset": {
        "ensemble": [
            "/path/to/dataset1.zarr",
            {"dataset": "dataset_name", "end": 2010},
            {"dataset": "s3://path/to/dataset3.zarr", "start": 2000, "end": 2010},
        ],
        "frequency": "24h",
    },
    "select": ["2t", "msl"],
}
ds = open_dataset(config)
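Because such a configuration is made only of dictionaries, lists, strings, and numbers, it can be written to a file and reloaded unchanged. The sketch below assumes JSON instead of the YAML used earlier, purely so that it depends on the standard library alone; the file name config.json and the shortened configuration are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Shortened version of the nested configuration; paths and names are placeholders.
config = {
    "dataset": {
        "ensemble": [
            "/path/to/dataset1.zarr",
            {"dataset": "dataset_name", "end": 2010},
        ],
        "frequency": "24h",
    },
    "select": ["2t", "msl"],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"  # hypothetical file name
    path.write_text(json.dumps(config, indent=2))
    reloaded = json.loads(path.read_text())
```

The reloaded dictionary has the same structure as config, so it could be passed to open_dataset in the same way.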