Retrieve Command

The retrieve command generates data retrieval requests for running inference. It analyses a checkpoint and creates MARS or JSON requests for the required input data based on the model’s configuration, lagged inputs, and variable requirements.

Description

The retrieve command is primarily used to prepare data for inference by generating retrieval requests. It’s particularly useful for:

  • Preparing operational forecasts with specific date requirements

  • Staging data from remote archives (MARS, FDB, CDS)

  • Understanding what data is needed by a trained model

  • Generating requests for forcing data over forecast periods

The command outputs either:

  • JSON format (default): machine-readable list of requests

  • MARS format (--mars): ready-to-use MARS retrieval commands

This command is commonly used in operational workflows where data retrieval and model inference are separate steps.

Used by prepml.

usage: anemoi-inference retrieve [-h] [--defaults DEFAULTS] [--date DATE]
                                 [--output OUTPUT]
                                 [--staging-dates STAGING_DATES]
                                 [--forecast-dates] [--extra EXTRA]
                                 [--use-scda] [--use-grib-paramid]
                                 [--dont-fail-for-missing-paramid]
                                 [--include INCLUDE] [--exclude EXCLUDE]
                                 [--input-type {constant-forcings,default-input,dynamic-forcings,prognostics}]
                                 [--mars] [--target TARGET] [--verb VERB]
                                 [--dataset-name DATASET_NAME]
                                 config [overrides ...]

Positional Arguments

config

Path to config file. Can be omitted if the configuration is supplied entirely through overrides and --defaults.

overrides

Overrides as key=value

Named Arguments

--defaults

Sources of default values.

--date

Reference date for the requests, in the form used in the examples below (e.g. 2025-01-01T00).

--output

Output file

--staging-dates

Path to a file with staging dates

--forecast-dates

Use forecast dates (for forcings)

Default: False

--extra

Additional request values. Can be repeated

--use-scda

Use the scda stream for the 06 and 18 UTC input times.

Default: False

--use-grib-paramid

Use paramId instead of param.

Default: False

--dont-fail-for-missing-paramid

Do not fail if a parameter ID is missing.

Default: False

--include

Comma-separated list of variable categories to include

--exclude

Comma-separated list of variable categories to exclude

--input-type

Possible choices: constant-forcings, default-input, dynamic-forcings, prognostics

Type of input variables to retrieve.

Default: “default-input”

--mars

Write requests for MARS retrieval

Default: False

--target

Target path for the MARS retrieval requests

Default: “input.grib”

--verb

Verb for the MARS retrieval requests

Default: “retrieve”

--dataset-name

Dataset name to prepare requests for (used for multi-dataset checkpoints)

Default: “data”

Examples

Generate JSON Requests for a Single Date

anemoi-inference retrieve config.yaml --date 2025-01-01T00

Output:

[
  {
    "class": "od",
    "stream": "oper",
    "type": "an",
    "date": "20250101",
    "time": "0000",
    "levtype": "sfc",
    "param": "2t/sp/10u/10v",
    "grid": "0.25/0.25",
    "area": "90/-180/-90/180"
  }
]

Generate MARS Retrieval Commands

anemoi-inference retrieve config.yaml --date 2025-01-01T00 --mars

Output:

retrieve,
   class=od,
   stream=oper,
   type=an,
   date=20250101,
   time=0000,
   levtype=sfc,
   param=2t/sp/10u/10v,
   grid=0.25/0.25,
   area=90/-180/-90/180,
   target=input.grib
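The two example outputs above carry the same information in different layouts. The following is a minimal sketch of that correspondence, not the tool's actual implementation: `to_mars` is a hypothetical helper, and the exact whitespace and key order produced by `--mars` may differ.

```python
import json


def to_mars(request: dict, verb: str = "retrieve", target: str = "input.grib") -> str:
    """Render one request dictionary as MARS request text (illustrative only)."""
    # The target is appended last, mirroring the example output above.
    items = list(request.items()) + [("target", target)]
    lines = [f"{verb},"]
    for i, (key, value) in enumerate(items):
        sep = "," if i < len(items) - 1 else ""  # no trailing comma on the last key
        lines.append(f"   {key}={value}{sep}")
    return "\n".join(lines)


# Sample taken from the JSON example output in this page.
requests = json.loads("""
[
  {"class": "od", "stream": "oper", "type": "an", "date": "20250101",
   "time": "0000", "levtype": "sfc", "param": "2t/sp/10u/10v",
   "grid": "0.25/0.25", "area": "90/-180/-90/180"}
]
""")

mars_text = "\n".join(to_mars(r) for r in requests)
print(mars_text)
```

The `verb` and `target` defaults mirror the documented defaults of `--verb` and `--target`.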

Save to File

anemoi-inference retrieve config.yaml --date 2025-01-01T00 --output requests.json

Generate Requests for Forecast Dates (Forcings)

anemoi-inference retrieve config.yaml \
    --date 2025-01-01T00 \
    --forecast-dates \
    --include forcing

This generates requests for all timesteps from the initial date through the lead time, which is useful for time-varying forcing data.

Bulk Staging with Multiple Dates

# Create a file with dates
echo "2025-01-01T00" > dates.txt
echo "2025-01-01T06" >> dates.txt
echo "2025-01-01T12" >> dates.txt

anemoi-inference retrieve config.yaml --staging-dates dates.txt --mars
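For longer periods, the dates file can be generated rather than typed by hand. The sketch below assumes the file format is one date per line in the `YYYY-MM-DDTHH` form shown above; `staging_dates` and the 6-hour step are illustrative choices, not part of anemoi-inference.

```python
import tempfile
from datetime import datetime, timedelta
from pathlib import Path


def staging_dates(start: datetime, end: datetime, step_hours: int = 6) -> list[str]:
    """Return dates from start to end (inclusive) as YYYY-MM-DDTHH strings."""
    dates = []
    current = start
    while current <= end:
        dates.append(current.strftime("%Y-%m-%dT%H"))
        current += timedelta(hours=step_hours)
    return dates


dates = staging_dates(datetime(2025, 1, 1, 0), datetime(2025, 1, 1, 12))

# Written to a temporary directory here; any writable path works.
out_path = Path(tempfile.mkdtemp()) / "dates.txt"
out_path.write_text("\n".join(dates) + "\n")
print(out_path.read_text(), end="")
```

The resulting file can then be passed to `--staging-dates` as in the command above.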

Exclude Computed Variables

anemoi-inference retrieve config.yaml \
    --date 2025-01-01T00 \
    --exclude computed,forcing

Add Extra MARS Parameters

anemoi-inference retrieve config.yaml \
    --date 2025-01-01T00 \
    --extra class=ea \
    --extra expver=0001 \
    --mars
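When the command is assembled from a script, the repeated `--extra` flags can be built from a dictionary. This is a sketch only: `retrieve_command` is a hypothetical helper that uses just the flags documented above and does not execute anything.

```python
import shlex


def retrieve_command(config, date, extra=None, mars=False):
    """Build the argument list for `anemoi-inference retrieve` (not executed here)."""
    cmd = ["anemoi-inference", "retrieve", config, "--date", date]
    # --extra can be repeated, so emit one flag per key=value pair.
    for key, value in (extra or {}).items():
        cmd += ["--extra", f"{key}={value}"]
    if mars:
        cmd.append("--mars")
    return cmd


cmd = retrieve_command(
    "config.yaml",
    "2025-01-01T00",
    extra={"class": "ea", "expver": "0001"},
    mars=True,
)
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the result directly usable with `subprocess.run` without shell quoting concerns.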

Use with Operational Inference

# Step 1: Generate retrieval request
anemoi-inference retrieve config.yaml \
    --date "$(date -u +%Y-%m-%dT%H)" \
    --mars > retrieve.req

# Step 2: Retrieve data
mars retrieve.req

# Step 3: Run inference
anemoi-inference run config.yaml

Variable Categories

The following variable categories can be used with --include and --exclude. Note that --input-type accepts its own set of values, listed under Named Arguments:

Category     Description
prognostic   Model prognostic variables (state variables evolved by the model)
diagnostic   Diagnostic variables (derived from prognostic state)
constant     Time-invariant fields (land-sea mask, orography, etc.)
forcing      Time-varying external forcing (solar radiation, etc.)
computed     Variables computed from other variables

Use Cases

Operational Forecasting

In operational systems, data retrieval and model inference are often separated:

  1. retrieve generates the request for current conditions

  2. Data is retrieved from archive

  3. run executes the forecast

This separation allows for caching, parallel retrieval, and better error handling.
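The three steps above can be sketched as separately checked stages. This is an illustration under stated assumptions, not a recommended operational setup: `plan_steps` and `execute` are hypothetical helpers, the "skip retrieval if the target file already exists" rule is a deliberately simple stand-in for caching, and the inference config is assumed to read from the retrieved file.

```python
import subprocess
from pathlib import Path


def plan_steps(config, date, target="input.grib", request_file="retrieve.req"):
    """List the commands to run; retrieval is skipped if the target already exists."""
    steps = []
    if not Path(target).exists():  # naive cache check (assumption, see lead-in)
        steps.append({
            "cmd": ["anemoi-inference", "retrieve", config, "--date", date, "--mars"],
            "stdout": request_file,  # request text is redirected to this file
        })
        steps.append({"cmd": ["mars", request_file], "stdout": None})
    steps.append({"cmd": ["anemoi-inference", "run", config], "stdout": None})
    return steps


def execute(steps, runner=subprocess.run):
    """Run each step in order; check=True stops at the first failing command."""
    for step in steps:
        if step["stdout"]:
            with open(step["stdout"], "w") as fh:
                runner(step["cmd"], check=True, stdout=fh)
        else:
            runner(step["cmd"], check=True)


# Planning only; nothing is executed until execute(steps) is called.
steps = plan_steps("config.yaml", "2025-01-01T00", target="no-such-dir/input.grib")
for step in steps:
    print(" ".join(step["cmd"]))
```

Because each stage is its own command, a failure in retrieval raises `subprocess.CalledProcessError` before inference starts, and a rerun can reuse data that was already staged.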

Batch Processing

For historical reanalysis or verification, use --staging-dates to generate requests for many dates at once.

Understanding Model Requirements

Use retrieve to inspect what data a model needs without running inference:

anemoi-inference retrieve config.yaml --date 2025-01-01T00
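The JSON output can be summarised with a few lines of Python. The sketch below assumes each request has `levtype` and `param` keys, with `param` either a "/"-separated string (as in the example output in this page) or a list; `summarise` is a hypothetical helper, and the sample data is copied from that example rather than produced by a real checkpoint.

```python
import json
from collections import defaultdict


def summarise(requests):
    """Group the requested parameters by level type."""
    summary = defaultdict(set)
    for request in requests:
        params = request.get("param", [])
        if isinstance(params, str):  # "2t/sp" style, as in the example output
            params = params.split("/")
        summary[request.get("levtype", "unknown")].update(str(p) for p in params if p)
    return {levtype: sorted(names) for levtype, names in summary.items()}


# Sample copied from the JSON example output in this page.
requests = json.loads("""
[
  {"class": "od", "stream": "oper", "type": "an", "date": "20250101",
   "time": "0000", "levtype": "sfc", "param": "2t/sp/10u/10v",
   "grid": "0.25/0.25", "area": "90/-180/-90/180"}
]
""")

summary = summarise(requests)
print(summary)
```

In practice the list would come from a file written with `--output`, loaded with `json.load`.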

See also