datasets
The datasets command manages datasets in the registry. It can register or unregister a dataset, add and remove information about a dataset, and upload it to the catalogue.
The dataset names MUST follow the naming convention documented at <TODO ADD NAMING CONVENTION URL>. For instance, dataset-name can be aifs-ea-an-oper-0001-mars-o96-1979-2022-6h-v6.
Registering
After creating a new dataset locally (using anemoi-datasets), register it in the catalogue as follows:
anemoi-registry datasets /path/to/dataset-name.zarr --register
Write credentials are needed to register a dataset in the catalogue. See Configuring.
Adding metadata
Additional information should be added to the dataset, such as the recipe used to create it, its status, and its location. The recipe and status can be set at registration time as follows:
anemoi-registry datasets /path/to/dataset-name.zarr --register --set-recipe ./recipe.yaml --set-status experimental
Alternatively, the metadata can be added to an existing dataset:
anemoi-registry datasets dataset-name --set-recipe ./recipe.yaml
anemoi-registry datasets dataset-name --set-status experimental
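When registration is driven from a Python script rather than typed by hand, the same commands can be assembled as argument lists. The sketch below is only an illustration: registry_command is a hypothetical helper, not part of anemoi-registry, and it uses only the options documented on this page. It prints the commands instead of running them.

```python
import shlex


def registry_command(name_or_path, register=False, recipe=None, status=None):
    """Assemble an `anemoi-registry datasets` command (hypothetical helper)."""
    cmd = ["anemoi-registry", "datasets", name_or_path]
    if register:
        cmd.append("--register")
    if recipe is not None:
        cmd += ["--set-recipe", recipe]
    if status is not None:
        cmd += ["--set-status", status]
    return cmd


# Register a new dataset and attach its metadata in one call.
new = registry_command(
    "/path/to/dataset-name.zarr",
    register=True,
    recipe="./recipe.yaml",
    status="experimental",
)
print(shlex.join(new))

# Update the status of a dataset that is already registered.
update = registry_command("dataset-name", status="experimental")
print(shlex.join(update))

# To execute a command for real (requires anemoi-registry and write credentials):
# import subprocess
# subprocess.run(new, check=True)
```

Keeping the command as a list and passing it to subprocess.run avoids shell quoting problems with paths that contain spaces.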
Uploading to S3
Uploading the dataset to S3 and recording the new location in the catalogue can be done as follows:
anemoi-registry datasets /path/to/dataset-name.zarr --add-location ewc --upload
S3 credentials are required to upload a dataset. See Configuring.
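To preview where an upload might land before running it, the {name} placeholder accepted by --uri-pattern can be expanded by hand. The sketch below rests on two assumptions about the tool's behaviour that this page does not confirm: that the dataset name is the file name without the .zarr suffix, and that {name} is substituted in the same way as Python's str.format. The pattern is the example from the --uri-pattern description, and dataset_name and upload_command are hypothetical helpers.

```python
import shlex
from pathlib import PurePosixPath


def dataset_name(path):
    """Assumed rule: the dataset name is the file name without '.zarr'."""
    return PurePosixPath(path).stem


def upload_command(path, platform, uri_pattern=None):
    """Assemble the upload command from the options documented on this page."""
    cmd = ["anemoi-registry", "datasets", path, "--add-location", platform]
    if uri_pattern is not None:
        cmd += ["--uri-pattern", uri_pattern]
    cmd.append("--upload")
    return cmd


path = "/path/to/dataset-name.zarr"
pattern = "s3://ml-datasets/{name}.zarr"

# Assumed expansion of the pattern, shown only as a preview.
target = pattern.format(name=dataset_name(path))
print(target)

# shlex.join quotes the pattern so the braces survive a copy-paste into a shell.
print(shlex.join(upload_command(path, "ewc", uri_pattern=pattern)))
```

The location actually recorded in the catalogue is decided by anemoi-registry, so the preview is a sanity check rather than a guarantee.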
Command line help
Manage datasets in the catalogue. Register, add locations, set status, etc.
usage: anemoi-registry datasets [-h] [--register] [--unregister] [--url]
                                [--set-status STATUS] [--set-recipe FILE]
                                [--add-local PLATFORM]
                                [--add-location PLATFORM] [--uri-pattern PATH]
                                [--upload | --no-upload]
                                [--remove-location PLATFORM]
                                [--delete-location PLATFORM]
                                NAME_OR_PATH
Positional Arguments
- NAME_OR_PATH
The name or the path of a dataset.
Named Arguments
- --register
Register a dataset in the catalogue.
Default: False
- --unregister
Remove a dataset from the catalogue (without deleting it from its locations). All other options are ignored.
Default: False
- --url
Print the URL of the dataset.
Default: False
- --set-status
Set the status of the dataset.
- --set-recipe
Set the recipe file to [re-]build the dataset.
- --add-local
Platform name to add a new location to the NAME_OR_PATH. Requires that NAME_OR_PATH is a path.
- --add-location
Platform name to add a new location.
- --uri-pattern
Path of the new location using {name}, such as 's3://ml-datasets/{name}.zarr'. Requires a platform name in --add-location.
- --upload, --no-upload
Upload the dataset. Requires a platform name in --add-location.
Default: False
- --remove-location
Platform name to remove from the catalogue.
- --delete-location
Actually delete the data when removing a location from the catalogue. Deletion of the data can take a long time. The location in the catalogue is only removed when the deletion is successful. Implies --remove-location PLATFORM when the deletion is finished.
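Two of the options above depend on another one: --uri-pattern and --upload both require a platform name in --add-location. A script that generates commands can check this before calling the tool. The following is a minimal sketch of such a pre-flight check; check_options is a hypothetical helper, and the real command may validate its arguments differently.

```python
def check_options(args):
    """Return a list of problems found in a list of command-line arguments."""
    problems = []
    has_location = "--add-location" in args
    for option in ("--uri-pattern", "--upload"):
        if option in args and not has_location:
            problems.append(f"{option} requires a platform name in --add-location")
    return problems


ok = ["dataset-name.zarr", "--add-location", "ewc", "--upload"]
bad = ["dataset-name.zarr", "--upload"]

print(check_options(ok))
print(check_options(bad))
```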