clusters
client
- class anemoi.inference.clusters.client.ComputeClient(world_size: int, local_rank: int, global_rank: int, master_addr: str, master_port: int, process_group: 'torch.distributed.ProcessGroup | None')
Bases:
object
- class anemoi.inference.clusters.client.ComputeClientFactory
Bases:
ABC
Abstract factory class for compute client creation.
- create_client() → ComputeClient
Create and return a ComputeClient instance.
- create_model_comm_group() → ProcessGroup | None
Create the communication group for model parallelism.
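The factory keeps client creation separate from communication-group creation. The code below is a minimal sketch of that pattern. It defines local stand-ins for `ComputeClient` and `ComputeClientFactory`, which mirror the documented fields and method names but are not imported from anemoi. Marking both methods abstract is an assumption. It also defines a hypothetical `SingleProcessFactory` for a one-process run.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class ComputeClient:
    # Stand-in mirroring the documented fields; not the real anemoi class.
    world_size: int
    local_rank: int
    global_rank: int
    master_addr: str
    master_port: int
    process_group: Any | None  # a torch.distributed.ProcessGroup in the real API


class ComputeClientFactory(ABC):
    # Stand-in interface; whether these are abstract in anemoi is an assumption.
    @abstractmethod
    def create_client(self) -> ComputeClient: ...

    @abstractmethod
    def create_model_comm_group(self) -> Any | None: ...


class SingleProcessFactory(ComputeClientFactory):
    """Hypothetical factory for a one-process run: no communication group is needed."""

    def create_client(self) -> ComputeClient:
        return ComputeClient(
            world_size=1,
            local_rank=0,
            global_rank=0,
            master_addr="127.0.0.1",
            master_port=0,
            process_group=self.create_model_comm_group(),
        )

    def create_model_comm_group(self) -> None:
        # With a single process there is nothing to communicate with.
        return None
```

A real factory would read its ranks from the environment and would usually return a `torch.distributed` process group. The documented `ProcessGroup | None` return type suggests that `None` is acceptable when no group is needed.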
distributed
- class anemoi.inference.clusters.distributed.DistributedCluster
Bases:
MappingCluster
Distributed cluster that uses environment variables for distributed setup.
manual
- class anemoi.inference.clusters.manual.ManualSpawner(world_size: int, port: int | None = None)
Bases:
ComputeSpawner
Manual cluster that uses a user-defined world size for distributed setup.
Example usage
In the config:
```yaml
cluster:
  manual:
    world_size: 4
    port: 12345
```
- spawn(fn: Callable[[Configuration, ComputeClientFactory], None], config: Configuration) → None
Spawn processes for parallel execution.
- Parameters:
fn (SPAWN_FUNCTION) – The function to run in each process. Expects to receive the configuration and compute client factory as arguments.
config (Configuration) – The configuration object for the runner.
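This page does not document the internals of `ManualSpawner`. The code below is a minimal sketch of the per-process values that such a spawner must give each of its `world_size` processes. It assumes a single-node launch, so the local rank equals the global rank. `rank_settings` and `find_free_port` are hypothetical helpers. Treating `port=None` as "pick any free port" is also an assumption.

```python
from __future__ import annotations

import socket


def find_free_port() -> int:
    # Ask the OS for an unused TCP port by binding to port 0 on localhost.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def rank_settings(world_size: int, port: int | None = None,
                  master_addr: str = "127.0.0.1") -> list[dict]:
    """Hypothetical helper: settings for each of `world_size` processes on one node."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if port is None:
        port = find_free_port()  # assumption: None means "choose any free port"
    return [
        {
            "global_rank": rank,
            "local_rank": rank,  # single node: local rank equals global rank
            "world_size": world_size,
            "master_addr": master_addr,
            "master_port": port,  # every process must agree on the same port
        }
        for rank in range(world_size)
    ]
```

With `world_size: 4` and `port: 12345`, this yields four entries with ranks 0 to 3 that share port 12345. A spawner would then start one process per entry and call `fn(config, factory)` in each.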
mapping
- class anemoi.inference.clusters.mapping.EnvMapping(local_rank: str | list[str], global_rank: str | list[str], world_size: str | list[str], master_addr: str | list[str], master_port: str | list[str], backend: str | None = None, init_method: str = 'env://')
Bases:
object
Dataclass to hold environment variable mappings for cluster configuration.
Each field can be either a string or a list of strings. If a list is given, the first environment variable in the list that is set is used.
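The list-fallback rule can be sketched as follows. `resolve` and `read_settings` are hypothetical helpers, not part of anemoi. The sketch assumes that "set" means "present in the environment", even when the value is empty. It also assumes that ranks, world size and port are converted to integers.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

FIELDS = ("local_rank", "global_rank", "world_size", "master_addr", "master_port")
INT_FIELDS = ("local_rank", "global_rank", "world_size", "master_port")


def resolve(names: str | list[str], environ: Mapping[str, str]) -> str:
    """Return the value of the first environment variable in `names` that is set."""
    candidates = [names] if isinstance(names, str) else names
    for name in candidates:
        if name in environ:
            return environ[name]
    raise KeyError(f"None of {candidates} is set")


def read_settings(mapping: Mapping[str, str | list[str]],
                  environ: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: resolve every EnvMapping-style field from the environment."""
    values: dict = {key: resolve(mapping[key], environ) for key in FIELDS}
    for key in INT_FIELDS:
        values[key] = int(values[key])  # ranks, size and port are integers
    return values
```

For example, given `"global_rank": ["MY_RANK", "RANK"]`, the value of `MY_RANK` is used when it is set. Otherwise the helper falls back to `RANK`.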
- class anemoi.inference.clusters.mapping.MappingCluster(mapping: dict | EnvMapping)
Bases:
ComputeClientFactory
Custom cluster that uses user-defined environment variables for distributed setup.
Example usage
```python
from anemoi.inference.clusters.mapping import MappingCluster

cluster = MappingCluster(mapping={
    "local_rank": "LOCAL_RANK_ENV_VAR",
    "global_rank": "GLOBAL_RANK_ENV_VAR",
    "world_size": "WORLD_SIZE_ENV_VAR",
    "master_addr": "MASTER_ADDR_ENV_VAR",
    "master_port": "MASTER_PORT_ENV_VAR",
    "init_method": "env://",
})
```
- class anemoi.inference.clusters.mapping.CustomCluster(**kwargs)
Bases:
MappingCluster
Custom cluster that uses user-defined environment variables for distributed setup.
Example usage

    parallel:
      cluster:
        custom:
          local_rank: LOCAL_RANK_ENV_VAR
          global_rank: GLOBAL_RANK_ENV_VAR
          world_size: WORLD_SIZE_ENV_VAR
          master_addr: MASTER_ADDR_ENV_VAR
          master_port: MASTER_PORT_ENV_VAR
          init_method: env://
mpi
- class anemoi.inference.clusters.mpi.MPICluster(use_mpi_backend: bool = False, **kwargs)
Bases:
MappingCluster
MPI cluster that uses MPI environment variables for distributed setup.
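Different MPI launchers export rank information under different names. This is where list-valued mappings help. The names below are real: Open MPI uses `OMPI_COMM_WORLD_*`, and MPICH / Intel MPI use `PMI_RANK`, `PMI_SIZE` and `MPI_LOCALRANKID`. However, the grouping is illustrative and is not necessarily the mapping `MPICluster` uses. `mpi_ranks` is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Mapping

# Illustrative fallbacks across MPI implementations (assumption: not MPICluster's own list).
MPI_RANK_VARS = {
    "global_rank": ["OMPI_COMM_WORLD_RANK", "PMI_RANK"],
    "local_rank": ["OMPI_COMM_WORLD_LOCAL_RANK", "MPI_LOCALRANKID"],
    "world_size": ["OMPI_COMM_WORLD_SIZE", "PMI_SIZE"],
}


def mpi_ranks(environ: Mapping[str, str]) -> dict[str, int]:
    """Hypothetical helper: read rank and size from whichever MPI variables are set."""
    result: dict[str, int] = {}
    for key, names in MPI_RANK_VARS.items():
        for name in names:
            if name in environ:
                result[key] = int(environ[name])
                break
        else:
            raise KeyError(f"None of {names} is set; is this running under an MPI launcher?")
    return result
```

MPI launchers do not export a master address or port for `torch.distributed`. This page does not document how `MPICluster` obtains them.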
slurm
- class anemoi.inference.clusters.slurm.SlurmCluster
Bases:
MappingCluster
Slurm cluster that uses SLURM environment variables for distributed setup.
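Under `srun`, Slurm exports each task's global rank (`SLURM_PROCID`), node-local rank (`SLURM_LOCALID`), task count (`SLURM_NTASKS`) and the job's compressed node list (`SLURM_JOB_NODELIST`). The code below is a sketch of how these could be turned into cluster settings, with the first node taken as master address. This is an assumption about what `SlurmCluster` does. `first_host` handles only simple node lists; `scontrol show hostnames` is the robust way to expand them.

```python
from __future__ import annotations

import re
from collections.abc import Mapping


def first_host(nodelist: str) -> str:
    """Return the first hostname of a simple Slurm node list.

    Handles "host1,host2" and a single bracket group such as "nid[0012-0015,0020]".
    """
    match = re.match(r"([^,\[]+)(?:\[([^\]]+)\])?", nodelist)
    if match is None:
        raise ValueError(f"Cannot parse node list: {nodelist!r}")
    prefix, ranges = match.groups()
    if ranges is None:
        return prefix
    # First entry of the bracket, e.g. "0012-0015" -> "0012"; zero padding is kept.
    return prefix + ranges.split(",")[0].split("-")[0]


def slurm_settings(environ: Mapping[str, str]) -> dict:
    """Hypothetical helper: cluster settings for the current Slurm task."""
    return {
        "global_rank": int(environ["SLURM_PROCID"]),
        "local_rank": int(environ["SLURM_LOCALID"]),
        "world_size": int(environ["SLURM_NTASKS"]),
        "master_addr": first_host(environ["SLURM_JOB_NODELIST"]),
    }
```

Slurm does not choose a rendezvous port, so the master port must come from elsewhere.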
spawner
- class anemoi.inference.clusters.spawner.ComputeSpawner
Bases:
ABC
Abstract base class for cluster operations for parallel execution.
- abstractmethod spawn(fn: Callable[[Configuration, ComputeClientFactory], None], config: Configuration) → None
Spawn processes for parallel execution.
- Parameters:
fn (SPAWN_FUNCTION) – The function to run in each process. Expects to receive the configuration and compute client factory as arguments.
config (Configuration) – The configuration object for the runner.
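A concrete spawner only has to implement `spawn`. As a minimal sketch, the code below defines a local stand-in for the abstract base class and a hypothetical `InProcessSpawner`. `Configuration` and the factory are typed as `Any` here rather than imported from anemoi. The spawner calls `fn` once in the current process, which can be handy for debugging without multiprocessing.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Callable

# Stand-ins: the real Configuration and ComputeClientFactory types live in anemoi.inference.
Configuration = Any
SpawnFunction = Callable[[Configuration, Any], None]


class ComputeSpawner(ABC):
    # Stand-in mirroring the documented abstract method.
    @abstractmethod
    def spawn(self, fn: SpawnFunction, config: Configuration) -> None: ...


class InProcessSpawner(ComputeSpawner):
    """Hypothetical spawner that runs `fn` once in the current process."""

    def __init__(self, factory: Any) -> None:
        self.factory = factory

    def spawn(self, fn: SpawnFunction, config: Configuration) -> None:
        # Same contract as a multi-process spawner: fn receives the config and the factory.
        fn(config, self.factory)
```

A real multi-process spawner would start `world_size` processes instead and pass each one a factory for its own rank.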