ERA5 training data

Warning

Do not train a model directly from the URL below; download the dataset locally first. The dataset is large (about 0.5 TB) and consists of more than 65,000 files.

ECMWF provides a dataset of ERA5 reanalysis data on an O96 octahedral reduced Gaussian grid, which has a resolution of approximately 1°. The dataset provides 6-hourly data for the period from 1979 to 2023. The list of variables is provided below.
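As a rough sanity check on these figures, the following Python sketch counts the 6-hourly time steps and the O96 grid points. It assumes the dataset covers every 6-hour step from 1 January 1979 00:00 to 31 December 2023 18:00 (the text only gives the years), and it uses the standard 4N(N+9) point count for an octahedral reduced Gaussian grid. The variable names are illustrative only.

```python
from datetime import datetime, timedelta

# Assumption: the dataset covers whole calendar years, 1979 to 2023 inclusive.
start = datetime(1979, 1, 1, 0)
end = datetime(2023, 12, 31, 18)
step = timedelta(hours=6)

# Number of 6-hourly time steps, both end points included.
n_steps = (end - start) // step + 1

# An octahedral reduced Gaussian grid "ON" has 2N latitude rows; row i,
# counted from the nearest pole, holds 4*i + 16 points.
N = 96
grid_points = 2 * sum(4 * i + 16 for i in range(1, N + 1))

# Approximate latitude spacing in degrees: 180 degrees spread over 2N rows.
lat_spacing = 180 / (2 * N)

print(n_steps, grid_points, lat_spacing)  # 65744 40320 0.9375
```

Under these assumptions there are 65,744 time steps, which would be consistent with the "over 65,000 files" quoted in the warning if the data is stored as roughly one file per time step. That storage layout is a guess, not something stated here.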

The dataset contains data from the Copernicus Climate Data Store and is available under the CC-BY-4.0 license.

The dataset can be downloaded from https://data.ecmwf.int/anemoi-datasets/era5-o96-1979-2023-6h-v8.zarr.

Downloading the dataset

To download the dataset, you can use the anemoi-datasets copy command. You will need version 0.5.22 of the package or above.

% pip install "anemoi-datasets>=0.5.22"
% anemoi-datasets copy \
    --url https://data.ecmwf.int/anemoi-datasets/era5-o96-1979-2023-6h-v8.zarr \
    --target era5-o96-1979-2023-6h-v8.zarr

By default, the download processes 100 files at a time, in one thread. If your internet connection is fast enough, you can increase the number of threads using the --transfers option. If your internet connection is slow, you can decrease the number of files processed at a time using the --blocks option.

If the download fails, you can resume it using the --resume option. This will skip the blocks that have already been downloaded.

Note

The HTTP server hosting the dataset limits the overall number of simultaneous connections, so your download may be affected by other users downloading the same data. If you get a 429 Too Many Requests error, simply restart the download with --resume and lower the number of threads.
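The restart-and-resume advice can be automated. The following Python sketch wraps the copy command in a retry loop that adds --resume after the first failure and halves the thread count each time. It is only an illustration: the helper names (build_copy_command, download), the starting thread count and the retry limit are hypothetical, and it assumes that --transfers takes the number of threads as an integer value.

```python
import subprocess

URL = "https://data.ecmwf.int/anemoi-datasets/era5-o96-1979-2023-6h-v8.zarr"
TARGET = "era5-o96-1979-2023-6h-v8.zarr"


def build_copy_command(transfers, resume):
    """Build the `anemoi-datasets copy` command line as a list of arguments."""
    cmd = ["anemoi-datasets", "copy", "--url", URL, "--target", TARGET]
    # Assumption: --transfers takes the number of threads as an integer value.
    cmd += ["--transfers", str(transfers)]
    if resume:
        cmd.append("--resume")
    return cmd


def download(transfers=4, max_attempts=5, run=subprocess.run):
    """Run the copy, resuming with fewer threads after each failure.

    `run` is injectable so the retry logic can be exercised without
    touching the network. Returns True once a run exits with status 0.
    """
    for attempt in range(max_attempts):
        # The first attempt starts fresh; later attempts add --resume.
        cmd = build_copy_command(transfers, resume=attempt > 0)
        if run(cmd).returncode == 0:
            return True
        # Halve the thread count, but never go below one thread.
        transfers = max(1, transfers // 2)
    return False
```

Calling download() starts the transfer with four threads. The sketch retries on any non-zero exit status, not only on a 429 error, so a persistent problem such as a full disk will simply exhaust the attempts and return False.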

Content of the dataset

Pressure level variables

Variable  Description          Units
q         Specific humidity    kg/kg
t         Temperature          K
u         U-component of wind  m/s
v         V-component of wind  m/s
w         Vertical velocity    Pa/s
z         Geopotential         m²/s²

Each of the variables above is named in the dataset as <variable>_<level>. For example, the geopotential at 1000 hPa is named z_1000. The pressure levels are: 1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100 and 50 hPa.
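The naming scheme can be spelled out with a short Python sketch that builds every <variable>_<level> name from the variables and levels listed above. The constant and function names are illustrative and are not part of the anemoi-datasets package.

```python
# Pressure level variables and levels as listed in this section.
PRESSURE_LEVEL_VARIABLES = ["q", "t", "u", "v", "w", "z"]
PRESSURE_LEVELS_HPA = [1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 50]


def pressure_level_names(variables=PRESSURE_LEVEL_VARIABLES, levels=PRESSURE_LEVELS_HPA):
    """Return the <variable>_<level> name for every variable and level pair."""
    return [f"{var}_{level}" for var in variables for level in levels]


names = pressure_level_names()
print(len(names))         # 78 (6 variables x 13 levels)
print("z_1000" in names)  # True
```

The order of the generated list is arbitrary here and need not match the order in which the variables are stored in the dataset.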

Single level variables

Variable  Description                                    Units
10u       U-component of wind at 10 m                    m/s
10v       V-component of wind at 10 m                    m/s
2d        Dew point temperature at 2 m                   K
2t        Air temperature at 2 m                         K
cp        Convective precipitation (6h accumulation)     m
lsm       Land-sea mask                                  0-1
msl       Mean sea level pressure                        Pa
sdor      Standard deviation of sub-gridscale orography  m
skt       Skin temperature                               K
slor      Slope of sub-gridscale orography               .
sp        Surface pressure                               Pa
tcw       Total column water                             m
tp        Total precipitation (6h accumulation)          m
z         Orography                                      m²/s²

Forcing variables

Variable        Description           Units
cos_latitude    Cosine of latitude    .
cos_longitude   Cosine of longitude   .
sin_latitude    Sine of latitude      .
sin_longitude   Sine of longitude     .
cos_julian_day  Cosine of Julian day  .
cos_local_time  Cosine of local time  .
sin_julian_day  Sine of Julian day    .
sin_local_time  Sine of local time    .
insolation      Insolation            .

For more information on the forcing variables, see the forcings documentation in the anemoi-datasets package.
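To give an idea of what the cosine and sine forcings represent, the following Python sketch encodes position and time as points on a unit circle. It is not the implementation used by anemoi-datasets: the function names, the 365.25-day year and the 15° per hour conversion to local time are assumptions made for illustration, and insolation is left out.

```python
import math
from datetime import datetime


def cyclic(fraction):
    """Return (cos, sin) of a fraction of a full cycle."""
    angle = 2.0 * math.pi * fraction
    return math.cos(angle), math.sin(angle)


def position_forcings(lat_deg, lon_deg):
    """Cosine and sine of latitude and longitude, both given in degrees."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return {
        "cos_latitude": math.cos(lat),
        "sin_latitude": math.sin(lat),
        "cos_longitude": math.cos(lon),
        "sin_longitude": math.sin(lon),
    }


def time_forcings(when, lon_deg):
    """Cyclic encodings of the day of the year and of the local time."""
    hours = when.hour + when.minute / 60.0
    # Assumption: days elapsed since 1 January, scaled by a 365.25-day year.
    day_of_year = when.timetuple().tm_yday - 1 + hours / 24.0
    cos_jd, sin_jd = cyclic(day_of_year / 365.25)
    # Assumption: local time is UTC shifted by one hour per 15 degrees of longitude.
    local_hours = (hours + lon_deg / 15.0) % 24.0
    cos_lt, sin_lt = cyclic(local_hours / 24.0)
    return {
        "cos_julian_day": cos_jd,
        "sin_julian_day": sin_jd,
        "cos_local_time": cos_lt,
        "sin_local_time": sin_lt,
    }


print(position_forcings(45.0, 90.0))
print(time_forcings(datetime(2020, 7, 1, 12), 90.0))
```

Encoding each periodic quantity as a cosine and sine pair avoids the artificial jump between, for example, 359° and 0° of longitude or between 31 December and 1 January.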