Dataset naming conventions

A dataset name is a string that identifies a dataset. It is designed to be human-readable, not to be parsed and split into parts.

To ensure consistency, a dataset name should follow these rules:

  • All lower case.

  • Only letters, numbers and dashes (-) are allowed.

  • No underscores (_), dots (.), upper-case letters or other special characters (@, #, *, etc.).
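The character rules above can be checked mechanically. The following is a minimal sketch, not part of the Anemoi tooling: the function name `has_valid_characters` is hypothetical, and it assumes that "letters" means the ASCII letters a to z.

```python
import re

# Assumption: "letters" means ASCII a-z. Only lower-case letters,
# digits and dashes are accepted; an empty name is rejected.
_ALLOWED = re.compile(r"[a-z0-9-]+")


def has_valid_characters(name: str) -> bool:
    """Return True if `name` uses only lower-case letters, digits and dashes."""
    return _ALLOWED.fullmatch(name) is not None
```

This only checks the allowed characters. It does not verify that the name follows the part-by-part structure, since the name is not designed to be parsed.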

Additionally, a dataset name is built from the following parts, joined with - (each part can itself contain -):

purpose-content-source-resolution-start-year-end-year-frequency-version[-extra-str]
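Because names are meant to be assembled rather than parsed, one way to apply this pattern is to join the parts in order. The sketch below is illustrative only: `build_dataset_name` and its parameter names are hypothetical, and it assumes the version is passed as a bare number that gets the "v" prefix described in the version entry.

```python
from typing import Optional


def build_dataset_name(
    purpose: str,
    content: str,
    source: str,
    resolution: str,
    start_year: int,
    end_year: int,
    frequency: str,
    version: int,
    extra: Optional[str] = None,
) -> str:
    """Join the parts with "-" in the order given by the naming pattern."""
    parts = [
        purpose,
        content,
        source,
        resolution,
        str(start_year),
        str(end_year),
        frequency,
        f"v{version}",  # the "v" prefix is added here, not by the caller
    ]
    if extra:  # the optional extra string goes last
        parts.append(extra)
    return "-".join(parts)


name = build_dataset_name(
    "aifs", "od-an-oper-0001", "mars", "o96", 1979, 2022, "1h", 5
)
print(name)
```

Each part may itself contain dashes (here `od-an-oper-0001`), which is why the resulting string cannot be split back into its parts unambiguously.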

Note

These are the current naming conventions for datasets in the Anemoi registry. They will need to be updated and adapted as more datasets are added. The purpose part is especially difficult to define for some datasets and may be revisited.

The table below provides more details and some examples.

Dataset naming conventions

Component

Description

purpose

Can be aifs when the data is used to train the AIFS model. It is sometimes metno for data from the Norwegian Meteorological Institute. This definition may need to be revisited.

content

The content of the dataset can have four parts, such as class-type-stream-expver:

  • class: od, the operational archive (class is a MARS keyword)

  • type: an, analysis (type is a MARS keyword)

  • stream: oper, the atmospheric model (stream is a MARS keyword)

  • expver: 0001, the operational model

source

mars when the data is from MARS; could also be opendap or another source.

resolution

o96 (could also be n320, or 0p2 for 0.2 degrees)

start-year

1979 if the first validity time is in 1979.

end-year

2020 if the last validity time is in 2020. Note that if the dataset runs from 18.04.2020 to 19.07.2020, the start-year and end-year are both 2020, for instance in aifs-od-an-oper-0001-mars-o96-2020-2020-6h-v5.

frequency

1h (could also be 6h, or 10m for 10 minutes)

version

This is the version of the content of the dataset, e.g. which variables, levels, etc. This is not the version of the format. There must be a “v” before the version number. The “v” is not part of the version number. For instance …-v5 is the fifth version of the content of the dataset.

extra-str

Experimental datasets can have additional text in the name. This extra string can contain additional - characters. It provides further information about the content of the dataset.
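Several of the parts described above can be derived from the data itself. The sketch below shows one possible way to do this, under stated assumptions: the helper names are hypothetical, the year parts are taken from the first and last validity times, the frequency is expressed in whole hours when possible and in whole minutes otherwise, and a resolution in degrees is written with "p" in place of the decimal point (as in 0p2 for 0.2 degrees).

```python
from datetime import datetime, timedelta


def year_range(first: datetime, last: datetime) -> tuple:
    """Return (start-year, end-year) from the first and last validity times."""
    return str(first.year), str(last.year)


def frequency_label(step: timedelta) -> str:
    """Format a time step as e.g. "6h" or "10m" (assumed convention)."""
    seconds = int(step.total_seconds())
    if seconds <= 0:
        raise ValueError("the time step must be positive")
    if seconds % 3600 == 0:  # whole number of hours
        return f"{seconds // 3600}h"
    if seconds % 60 == 0:  # whole number of minutes
        return f"{seconds // 60}m"
    raise ValueError("the time step must be a whole number of minutes")


def resolution_label(degrees: float) -> str:
    """Format a resolution in degrees, replacing "." with "p" (assumed)."""
    return f"{degrees:g}".replace(".", "p")


# A dataset covering 18.04.2020 to 19.07.2020 has the same start and end year.
start_year, end_year = year_range(datetime(2020, 4, 18), datetime(2020, 7, 19))
```

Grid names such as o96 or n320 are not degrees and would be used as-is rather than passed through `resolution_label`.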

Examples

aifs-od-an-oper-0001-mars-o96-1979-2022-1h-v5

aifs-ea-an-oper-0001-mars-o96-1979-2022-6h-v6

aifs-ea-an-enda-0001-mars-o96-1979-2022-6h-v6-recentered-on-oper

aifs-ea-an-oper-0001-mars-n320-1979-2022-6h-v4

inca-an-oper-0001-gridefix-1km-2023-2024-10m-v1