Parallel Output

When running inference on large models with many output fields, writing results to disk can become a bottleneck. The Parallel Output wrapper offloads the I/O work to multiple writer processes, allowing the main inference loop to continue without waiting for disk writes to complete.

Each writer process handles a subset of the output fields and writes them independently to its own file. This distributes both the I/O bandwidth and the encoding/serialisation cost across multiple CPU cores.

How It Works 

The parallel output wraps any other output type (e.g. grib, zarr, netcdf) and:

Spawns num_writers writer processes (using fork).
At each forecast step, splits the output fields evenly across the writers.
Each writer receives its chunk via a multiprocessing queue and writes it using its own instance of the wrapped output.
On shutdown, each writer receives a termination message through its queue and exits gracefully.

Each writer appends a suffix _w<id> to the output file name to avoid conflicts (e.g. output_w0.grib, output_w1.grib).
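The steps above can be sketched with Python's standard multiprocessing module. This is a minimal illustration, not the library's implementation: the names split_evenly, writer_loop, run and demo, the None sentinel, the contiguous chunking strategy and the plain-text files are all assumptions made for the example.

```python
import multiprocessing as mp
import os
import tempfile

STOP = None  # sentinel placed on a queue to ask a writer to exit (assumed convention)


def split_evenly(fields, num_writers):
    """Split `fields` into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(fields), num_writers)
    chunks, start = [], 0
    for i in range(num_writers):
        size = base + (1 if i < extra else 0)
        chunks.append(fields[start:start + size])
        start += size
    return chunks


def writer_loop(queue, path):
    """Run in a writer process: append each received chunk to this writer's file."""
    with open(path, "a") as f:
        while True:
            chunk = queue.get()
            if chunk is STOP:
                break
            for name in chunk:
                f.write(name + "\n")  # stand-in for the wrapped output's write


def run(steps, num_writers, directory):
    """Start the writers, send them one chunk per step, then shut them down."""
    ctx = mp.get_context("fork")
    queues = [ctx.Queue() for _ in range(num_writers)]
    paths = [os.path.join(directory, f"output_w{i}.txt") for i in range(num_writers)]
    procs = [ctx.Process(target=writer_loop, args=(q, p)) for q, p in zip(queues, paths)]
    for proc in procs:
        proc.start()
    for fields in steps:
        for q, chunk in zip(queues, split_evenly(fields, num_writers)):
            q.put(chunk)  # does not wait for the write; the writer does the I/O
    for q in queues:
        q.put(STOP)
    for proc in procs:
        proc.join()
    return paths


def demo():
    """Write two steps of five fields with two writers; return each file's lines."""
    steps = [["2t", "10u", "10v", "msl", "tp"]] * 2
    with tempfile.TemporaryDirectory() as directory:
        contents = []
        for path in run(steps, 2, directory):
            with open(path) as f:
                contents.append(f.read().split())
        return contents
```

Because each writer owns its queue and its file, no locking is needed between writers. The sketch relies on the fork start method, so it runs on POSIX systems only.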

Configuration 

To enable parallel output, wrap your existing output configuration inside a parallel block:

output:
  parallel:
    num_writers: 4
    output:
      grib:
        path: /path/to/output.grib

This produces four output files:

/path/to/output_w0.grib
/path/to/output_w1.grib
/path/to/output_w2.grib
/path/to/output_w3.grib
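When post-processing a run, it can help to compute the per-writer file names from the configured path. The helper below is a hypothetical sketch (writer_paths is not part of the library) and assumes the suffix is always inserted just before the file extension, as in the listing above.

```python
from pathlib import PurePosixPath


def writer_paths(path, num_writers):
    """Return the per-writer file names, inserting `_w<id>` before the extension."""
    p = PurePosixPath(path)
    return [str(p.with_name(f"{p.stem}_w{i}{p.suffix}")) for i in range(num_writers)]


paths = writer_paths("/path/to/output.grib", 4)
```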

This short syntax, where the inner output is implied, is also supported:

output:
  parallel:
    num_writers: 4
    grib:
      path: /path/to/output.grib
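One way to see that the two forms are equivalent is to normalise both into the explicit layout. The function below is an illustrative sketch operating on plain dictionaries (as a YAML loader would produce); normalise_parallel is a hypothetical name and this is not the library's own parsing code.

```python
def normalise_parallel(config):
    """Return the `parallel` block in explicit form: num_writers plus inner output."""
    block = dict(config["output"]["parallel"])
    num_writers = block.pop("num_writers", 1)
    inner = block.pop("output", None)
    if inner is None:
        # Short form: the remaining keys are taken as the inner output.
        inner = block
    return {"num_writers": num_writers, "output": inner}


explicit = {
    "output": {
        "parallel": {
            "num_writers": 4,
            "output": {"grib": {"path": "/path/to/output.grib"}},
        }
    }
}
short = {
    "output": {
        "parallel": {
            "num_writers": 4,
            "grib": {"path": "/path/to/output.grib"},
        }
    }
}
```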

Parameters 

ParallelOutput.__init__(context: Context, metadata: Metadata, *, output: Output | Any | None = None, num_writers: int = 1, **kwargs: Any)

Initialise the ParallelOutput.

Parameters:

context (Context) – The inference context.
metadata (Metadata) – Metadata for the dataset.
output (Output | Any | None) – The inner output (or its config dict) that will be forked into writer processes.
num_writers (int) – Number of writer processes to spawn. Must be >= 1. Defaults to 1 (single output file, asynchronous writes).
**kwargs (Any) – Forwarded to the inner output.

Examples 

GRIB output with 2 writers

output:
  parallel:
    num_writers: 2
    output:
      grib:
        path: forecast.grib

Zarr output with 4 writers

output:
  parallel:
    num_writers: 4
    output:
      zarr:
        store: forecast.zarr

NetCDF output with 8 writers

output:
  parallel:
    num_writers: 8
    output:
      netcdf:
        path: forecast.nc

Combining with parallel inference

Parallel output can be used together with the parallel runner for distributed model inference. In this case, output is handled by the CPU process associated with GPU 0, so only that process spawns writer processes.

runner: parallel
lead_time: 240h
checkpoint: /path/to/checkpoint.ckpt

input:
  grib: /path/to/input.grib

output:
  parallel:
    num_writers: 4
    output:
      grib:
        path: /path/to/output.grib

Choosing num_writers

For small resolutions or infrequent output steps, parallel writing might be unnecessary.

For larger outputs, a reasonable first step is num_writers: 1. This produces a single output file, but the writing happens asynchronously in a separate process, so the main inference loop continues without waiting for disk I/O.

If you suspect that writing to disk is still a bottleneck, experiment with increasing num_writers. Two or four writers are often a good starting point.
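To compare settings, one option is to generate one configuration per writer count and time each run separately. The sketch below only builds the output section as text; the config_variants helper, the writer counts and the output_n{n}.grib naming pattern are assumptions made for illustration.

```python
TEMPLATE = """\
output:
  parallel:
    num_writers: {num_writers}
    output:
      grib:
        path: {path}
"""


def config_variants(path_pattern, writer_counts=(1, 2, 4)):
    """Return {num_writers: YAML text}, with a distinct output path per variant."""
    return {
        n: TEMPLATE.format(num_writers=n, path=path_pattern.format(n=n))
        for n in writer_counts
    }


variants = config_variants("/path/to/output_n{n}.grib")
```

Giving each variant its own output path keeps the files from different runs apart, so their timings and sizes can be compared afterwards.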

Troubleshooting 

Writer process crashes 

If a writer process crashes (e.g. due to a disk-full error or a bug in the output backend), the main process will detect that the writer is dead and abort inference with a RuntimeError.
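The kind of check described above can be sketched with the standard multiprocessing API. This is an assumption about how such detection might look, not the library's code: check_writers, failing_writer and demo are hypothetical names.

```python
import multiprocessing as mp


def check_writers(procs):
    """Raise RuntimeError if any writer process is no longer running."""
    for i, proc in enumerate(procs):
        if not proc.is_alive():
            raise RuntimeError(f"Writer {i} died (exit code {proc.exitcode})")


def failing_writer():
    """Stand-in for a writer that hits a fatal error such as a full disk."""
    raise OSError("No space left on device")


def demo():
    """Start a writer that crashes, then show the main process detecting it."""
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=failing_writer)
    proc.start()
    proc.join()  # wait for the crash so the check below is deterministic
    try:
        check_writers([proc])
    except RuntimeError as err:
        return str(err)
    return None
```

In a real inference loop such a check would run before handing each new chunk to the writers, so that a dead writer stops the run instead of letting the queue fill up unnoticed.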
