Parallel Output
When running inference on large models with many output fields, writing results to disk can become a bottleneck. The Parallel Output wrapper offloads the I/O work to multiple writer processes, allowing the main inference loop to continue without waiting for disk writes to complete.
Each writer process handles a subset of the output fields and writes them independently to its own file. This distributes both the I/O bandwidth and the encoding/serialisation cost across multiple CPU cores.
How It Works
The parallel output wraps any other output type (e.g. grib,
zarr, netcdf) and:
Spawns num_writers writer processes (using fork).
At each forecast step, splits the output fields evenly across the writers.
Each writer receives its chunk via a multiprocessing queue and writes it using its own instance of the wrapped output.
On shutdown, each writer receives a termination signal through its queue and exits gracefully.
Each writer appends a suffix _w<id> to the output file name to
avoid conflicts (e.g. output_w0.grib, output_w1.grib).
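The steps above can be sketched with the standard multiprocessing module. This is a minimal illustration and not the library's implementation: the names writer_loop, split_evenly, run and SENTINEL are hypothetical, the round-robin split is only one way to divide fields evenly, and plain text files stand in for the wrapped output.

```python
import multiprocessing as mp
import os

SENTINEL = None  # hypothetical shutdown marker sent through each queue


def writer_loop(queue, path):
    """Receive chunks from the queue and write them to this writer's own file."""
    with open(path, "w") as f:
        while True:
            chunk = queue.get()
            if chunk is SENTINEL:
                break  # shutdown requested by the main process
            for field_name in chunk:
                f.write(field_name + "\n")


def split_evenly(fields, num_writers):
    """Round-robin split (an assumption): chunk sizes differ by at most one."""
    return [fields[i::num_writers] for i in range(num_writers)]


def run(fields_per_step, num_writers, directory):
    """Fork the writers, feed them one chunk per step, then shut them down."""
    ctx = mp.get_context("fork")
    queues = [ctx.Queue() for _ in range(num_writers)]
    paths = [os.path.join(directory, f"output_w{i}.txt") for i in range(num_writers)]
    procs = [ctx.Process(target=writer_loop, args=(q, p)) for q, p in zip(queues, paths)]
    for proc in procs:
        proc.start()

    for fields in fields_per_step:
        # The main loop only enqueues; the writers do the actual I/O.
        for queue, chunk in zip(queues, split_evenly(fields, num_writers)):
            queue.put(chunk)

    for queue in queues:
        queue.put(SENTINEL)
    for proc in procs:
        proc.join()
    return paths
```

Calling run([["2t", "10u", "10v", "msl"]], 2, some_directory) leaves two files in some_directory, each holding half of the field names.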
Configuration
To enable parallel output, wrap your existing output configuration
inside a parallel block:
output:
  parallel:
    num_writers: 4
    output:
      grib:
        path: /path/to/output.grib
This produces four output files:
/path/to/output_w0.grib
/path/to/output_w1.grib
/path/to/output_w2.grib
/path/to/output_w3.grib
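The naming scheme can be reproduced with a short helper, for example to locate the per-writer files in a post-processing script. This is a sketch that assumes the suffix is always inserted just before the file extension, as in the examples shown here; writer_path is a hypothetical name, not part of the library.

```python
from pathlib import PurePosixPath


def writer_path(path, writer_id):
    """Insert the _w<id> suffix between the file stem and its extension."""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}_w{writer_id}{p.suffix}"))


# Expected per-writer files for num_writers: 4
expected = [writer_path("/path/to/output.grib", i) for i in range(4)]
```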
This short syntax, where the inner output is implied, is also supported:
output:
  parallel:
    num_writers: 4
    grib:
      path: /path/to/output.grib
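One way to read the two syntaxes as equivalent: when no explicit output key is present, the remaining keys of the parallel block describe the inner output. The sketch below illustrates that interpretation on plain dictionaries. It is an assumption based on the constructor signature documented under Parameters, not the library's actual parsing code, and normalise_parallel_config is a hypothetical name.

```python
def normalise_parallel_config(parallel):
    """Return (num_writers, inner_output_config) for either syntax."""
    options = dict(parallel)  # copy so the caller's config is left untouched
    num_writers = options.pop("num_writers", 1)
    if num_writers < 1:
        raise ValueError("num_writers must be >= 1")
    inner = options.pop("output", None)
    if inner is None:
        # Short syntax (assumed): the leftover keys are the inner output.
        inner = options
    return num_writers, inner


long_form = {"num_writers": 4, "output": {"grib": {"path": "/path/to/output.grib"}}}
short_form = {"num_writers": 4, "grib": {"path": "/path/to/output.grib"}}
```

Under this reading, both long_form and short_form normalise to the same pair of writer count and inner output configuration.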
Parameters
- ParallelOutput.__init__(context: Context, metadata: Metadata, *, output: Output | Any | None = None, num_writers: int = 1, **kwargs: Any)
Initialise the ParallelOutput.
- Parameters:
context (Context) – The inference context.
metadata (Metadata) – Metadata for the dataset.
output (Output | Any | None) – The inner output (or its config dict) that will be forked into writer processes.
num_writers (int) – Number of writer processes to spawn. Must be >= 1. Defaults to 1 (single output file, asynchronous writes).
**kwargs (Any) – Forwarded to the inner output.
Examples
GRIB output with 2 writers
output:
  parallel:
    num_writers: 2
    output:
      grib:
        path: forecast.grib
Zarr output with 4 writers
output:
  parallel:
    num_writers: 4
    output:
      zarr:
        store: forecast.zarr
NetCDF output with 8 writers
output:
  parallel:
    num_writers: 8
    output:
      netcdf:
        path: forecast.nc
Combining with parallel inference
Parallel output can be used together with the parallel runner for distributed model inference. In this case, the CPU process associated with GPU 0 will handle the output writing. Only CPU process 0 will spawn writer processes.
runner: parallel
lead_time: 240h
checkpoint: /path/to/checkpoint.ckpt
input:
  grib: /path/to/input.grib
output:
  parallel:
    num_writers: 4
    output:
      grib:
        path: /path/to/output.grib
Choosing num_writers
For low resolutions or infrequent output steps, parallel writing might be unnecessary.
For larger output sizes, you can start with num_writers: 1. This still produces a single output file, but the writing happens asynchronously in a separate process, so the main inference loop continues without waiting for disk I/O.
If you suspect that writing to disk is a bottleneck, you can experiment with increasing num_writers. 2 or 4 writers is often a good starting point.
Troubleshooting
Writer process crashes
If a writer process crashes (e.g. due to a disk-full error or a bug in the output backend), the main process will detect that the writer is dead and abort inference with a RuntimeError.
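A liveness check of this kind can be sketched with the standard multiprocessing API. The example below is illustrative only: failing_writer, check_writers and demo are hypothetical names, and the library's own detection logic and error message may differ.

```python
import multiprocessing as mp
import sys


def failing_writer():
    """Hypothetical stand-in for a writer that hits a fatal error."""
    sys.exit(1)


def check_writers(procs):
    """Raise RuntimeError if any writer process is no longer running."""
    for writer_id, proc in enumerate(procs):
        if not proc.is_alive():
            raise RuntimeError(
                f"writer {writer_id} died with exit code {proc.exitcode}"
            )


def demo():
    """Start a writer that fails immediately and report what the check raises."""
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=failing_writer)
    proc.start()
    proc.join()  # wait until the writer has exited
    try:
        check_writers([proc])
    except RuntimeError as exc:
        return str(exc)
    return None
```

In a real inference loop, such a check would run before each chunk is enqueued, so that a dead writer is noticed at the next forecast step rather than at shutdown.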