datasets

The datasets command manages datasets in the registry. It can register or unregister a dataset, add and remove information about it (such as its recipe, status and locations), and upload its data to a platform such as S3, recording the new location in the catalogue.

Dataset names MUST follow the naming convention documented at <TODO ADD NAMING CONVENTION URL>. For instance, dataset-name in the commands below could be aifs-ea-an-oper-0001-mars-o96-1979-2022-6h-v6.

Registering

After creating a new dataset locally (using anemoi-datasets), register it in the catalogue as follows:

anemoi-registry datasets /path/to/dataset-name.zarr --register

Registering a dataset in the catalogue requires write credentials; see Configuring.
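The registration step can also be scripted. The sketch below is a hedged example, not part of anemoi-registry itself: it assumes a POSIX shell, and the REGISTRY_CMD variable and register_dataset function are hypothetical names introduced here. REGISTRY_CMD defaults to anemoi-registry and exists only so the commands can be dry-run by setting it to echo. The function checks that the local path exists, registers the dataset, and then prints its catalogue URL with the documented --url option.

```shell
#!/bin/sh
set -eu

# Hypothetical helper variable (not an anemoi-registry setting): the command to run.
# Set REGISTRY_CMD=echo to print the commands instead of executing them.
REGISTRY_CMD="${REGISTRY_CMD:-anemoi-registry}"

# Register a locally built dataset, then print its URL in the catalogue.
# Usage: register_dataset /path/to/dataset-name.zarr
register_dataset() {
    path=$1
    # The documented example registers from a local path; fail early if it is missing.
    if [ ! -e "$path" ]; then
        echo "register_dataset: no such dataset: $path" >&2
        return 1
    fi
    "$REGISTRY_CMD" datasets "$path" --register
    "$REGISTRY_CMD" datasets "$path" --url
}
```

For example, register_dataset /path/to/dataset-name.zarr runs the registration command shown above followed by the --url lookup.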

Adding metadata

Additional information should be attached to the dataset, such as the recipe used to create it, its status and its locations. The recipe and status can be set at registration time:

anemoi-registry datasets /path/to/dataset-name.zarr --register --set-recipe ./recipe.yaml --set-status experimental

Alternatively, metadata can be added to a dataset that is already registered, referring to it by name:

anemoi-registry datasets dataset-name --set-recipe ./recipe.yaml
anemoi-registry datasets dataset-name --set-status experimental
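When several registered datasets need the same status, the per-dataset command above can be run in a loop. A minimal sketch, assuming a POSIX shell; set_status_all and REGISTRY_CMD are hypothetical names used only for this example, and each dataset is addressed by name, as in the commands above.

```shell
#!/bin/sh
set -eu

# Hypothetical helper variable: the command to run (set to echo for a dry run).
REGISTRY_CMD="${REGISTRY_CMD:-anemoi-registry}"

# Apply the same status to every dataset name given after the status.
# Usage: set_status_all STATUS NAME [NAME ...]
set_status_all() {
    status=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "set_status_all: no dataset names given" >&2
        return 1
    fi
    for name in "$@"; do
        # One call per dataset, using the documented --set-status option.
        "$REGISTRY_CMD" datasets "$name" --set-status "$status"
    done
}
```

For example, set_status_all experimental name-one name-two issues one anemoi-registry datasets call per name.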

Uploading to S3

To upload the dataset to S3 and record the new location in the catalogue, combine --add-location with --upload (here ewc is the name of the target platform):

anemoi-registry datasets /path/to/dataset-name.zarr --add-location ewc --upload

Uploading a dataset requires S3 credentials; see Configuring.
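Putting the previous steps together, a freshly built dataset can be registered, annotated and uploaded in one script. This is a sketch under assumptions: publish_dataset and REGISTRY_CMD are hypothetical names, ewc is only the platform used in the example above, and registration and upload are kept as separate invocations so that, under set -e, a failed registration stops the script before any upload starts.

```shell
#!/bin/sh
set -eu

# Hypothetical helper variable: the command to run (set to echo for a dry run).
REGISTRY_CMD="${REGISTRY_CMD:-anemoi-registry}"

# Register a local dataset with its recipe and status, then upload it.
# Usage: publish_dataset PATH RECIPE STATUS PLATFORM
publish_dataset() {
    path=$1 recipe=$2 status=$3 platform=$4
    # Step 1: register and attach metadata (as in "Adding metadata").
    "$REGISTRY_CMD" datasets "$path" --register --set-recipe "$recipe" --set-status "$status"
    # Step 2: upload to the platform and record the new location (as in "Uploading to S3").
    "$REGISTRY_CMD" datasets "$path" --add-location "$platform" --upload
}
```

For example: publish_dataset /path/to/dataset-name.zarr ./recipe.yaml experimental ewc.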

Command line help

Manage datasets in the catalogue. Register, add locations, set status, etc.

usage: anemoi-registry datasets [-h] [--register] [--unregister] [--url]
                                [--set-status STATUS] [--set-recipe FILE]
                                [--add-local PLATFORM]
                                [--add-location PLATFORM] [--uri-pattern PATH]
                                [--upload | --no-upload]
                                [--remove-location PLATFORM]
                                [--delete-location PLATFORM]
                                NAME_OR_PATH

Positional Arguments

NAME_OR_PATH

The name or the path of a dataset.

Named Arguments

--register

Register a dataset in the catalogue.

Default: False

--unregister

Remove a dataset from the catalogue (without deleting it from its locations). All other options are ignored.

Default: False

--url

Print the URL of the dataset.

Default: False

--set-status

Set the status of the dataset.

--set-recipe

Set the recipe file used to [re-]build the dataset.

--add-local

Platform name under which the local path NAME_OR_PATH is added as a new location. Requires NAME_OR_PATH to be a path.

--add-location

Platform name to add a new location.

--uri-pattern

Path of the new location, using {name} as a placeholder, such as 's3://ml-datasets/{name}.zarr'. Requires a platform name in --add-location.

--upload, --no-upload

Upload the dataset. Requires a platform name in --add-location.

Default: False

--remove-location

Platform name of the location to remove from the catalogue.

--delete-location

Actually delete the data when removing a location from the catalogue. Deleting the data can take a long time. The location is only removed from the catalogue once the deletion has succeeded. Implies --remove-location PLATFORM when the deletion is finished.
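The location options above fit together into a typical dataset lifecycle. The hedged sketch below assumes a POSIX shell with sed; all function names and REGISTRY_CMD are hypothetical. expand_uri_pattern does not call anemoi-registry: it only mimics what the {name} placeholder is assumed to do. add_existing_location assumes that --add-location with --uri-pattern and no upload simply records a copy that is already in place. retire_dataset is destructive: it deletes the data on every platform passed to it before unregistering the dataset.

```shell
#!/bin/sh
set -eu

# Hypothetical helper variable: the command to run (set to echo for a dry run).
REGISTRY_CMD="${REGISTRY_CMD:-anemoi-registry}"

# Illustration only: mimic the ASSUMED substitution of {name} in --uri-pattern.
# Assumes NAME contains no '|', '&' or backslash characters.
# Usage: expand_uri_pattern PATTERN NAME
expand_uri_pattern() {
    printf '%s\n' "$1" | sed "s|{name}|$2|g"
}

# Record a local copy as a location on PLATFORM.
# --add-local requires a path, so refuse anything that does not exist locally.
# Usage: add_local_copy PATH PLATFORM
add_local_copy() {
    if [ ! -e "$1" ]; then
        echo "add_local_copy: not a local path: $1" >&2
        return 1
    fi
    "$REGISTRY_CMD" datasets "$1" --add-local "$2"
}

# Record a copy assumed to already exist on PLATFORM, without uploading anything.
# Usage: add_existing_location NAME PLATFORM PATTERN
add_existing_location() {
    "$REGISTRY_CMD" datasets "$1" --add-location "$2" --uri-pattern "$3" --no-upload
}

# WARNING: destructive. Delete the data on each PLATFORM (each successful
# deletion also removes that location from the catalogue), then unregister.
# --unregister ignores all other options, so it needs its own call.
# Usage: retire_dataset NAME PLATFORM [PLATFORM ...]
retire_dataset() {
    name=$1
    shift
    for platform in "$@"; do
        "$REGISTRY_CMD" datasets "$name" --delete-location "$platform"
    done
    "$REGISTRY_CMD" datasets "$name" --unregister
}
```

Deleting the locations before unregistering keeps the dataset visible in the catalogue until its data is gone, and because --unregister ignores every other option it cannot be folded into the same call.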