Migration System
This page gives general information on how the migration system works. It can be useful for contributors who need to write a migration script, for users who want to understand how their checkpoints are updated, and for future contributors to the migration code.
The goal of the migration system is to let users keep a checkpoint trained with one version of anemoi-models and use it with newer or older versions, even when the changes between those versions would otherwise break the checkpoint.
This is convenient for users, who do not have to retrain a full model just because a layer has been renamed. It also gives contributors more flexibility, since they can make changes that they would otherwise have avoided for fear of breaking existing checkpoints.
General Overview
Migrations are stored in anemoi-models in migrations.scripts as an ordered list of scripts. Each script contains:
- some metadata, such as the version of the migration system or the version of anemoi-models,
- a migrate function to migrate checkpoints,
- optionally, a migrate_setup function to fix import issues.
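The sketch below shows what a migration script containing these parts might look like. Everything in it is hypothetical: the layout of the metadata dict, the "state_dict" key that holds the weights, and the renamed layer prefixes are all placeholders. Real scripts in migrations.scripts may declare their metadata using a dedicated object instead. The optional migrate_setup function is omitted here.

```python
from collections.abc import MutableMapping
from typing import Any

# Hypothetical metadata layout: real scripts may use a dedicated metadata object.
metadata = {
    "versions": {"migration": "1.0.0", "anemoi-models": "0.8.3"},
}

# Hypothetical old and new names of a renamed layer.
OLD_PREFIX = "model.encoder.proj."
NEW_PREFIX = "model.encoder.projection."


def migrate(ckpt: MutableMapping[str, Any]) -> MutableMapping[str, Any]:
    """Rename a layer's weights (assumes they live under a "state_dict" key)."""
    state_dict = ckpt["state_dict"]
    # Snapshot the keys because the dict is modified inside the loop.
    for key in list(state_dict):
        if key.startswith(OLD_PREFIX):
            new_key = NEW_PREFIX + key[len(OLD_PREFIX):]
            state_dict[new_key] = state_dict.pop(key)
    return ckpt
```

For example, a checkpoint holding "model.encoder.proj.weight" comes out of migrate holding "model.encoder.projection.weight" instead. Other keys are left untouched.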
Similarly, the checkpoint contains some migration information that describes its migration state:
- name: the name of the migration, which corresponds to the filename of the script in anemoi-models,
- metadata: the same metadata as in the migration script,
- signature: a hash digest of the original migration script. This is used to detect whether already executed scripts have changed. For now, a change only logs a warning, but more complex behavior could be added in the future.
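The signature check can be sketched as follows, assuming the digest is a SHA-256 hash of the script source. The hash algorithm and the helper names script_signature and check_signature are assumptions made for illustration. They are not taken from the anemoi-models implementation.

```python
import hashlib
import logging

LOGGER = logging.getLogger(__name__)


def script_signature(source: str) -> str:
    """Hash digest of a migration script's source (SHA-256 is an assumption)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def check_signature(name: str, stored_signature: str, current_source: str) -> bool:
    """Warn, but do not fail, when an already executed script has changed."""
    if script_signature(current_source) != stored_signature:
        LOGGER.warning("Migration script %s has changed since it was executed.", name)
        return False
    return True
```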
Compatibility groups
Some changes cannot be migrated, for example an architecture change that adds new trainable weights. When this happens, a “final” migration script needs to be created. “Final” migrations act as separators between groups of migrations that are compatible with one another. For example, consider this list of migrations in anemoi-models:
| Name | migration 1 | migration 2 | final migration | migration 3 | final migration | migration 4 |
|---|---|---|---|---|---|---|
| Version | 0.8.1 | 0.8.3 | 0.9.0 | 0.10.5 | 0.12.0 | 0.12.2 |
| Compatibility group | 1 | 1 | 2 | 2 | 3 | 3 |
The last row shows the compatibility groups. Each group contains migrations that are compatible with one another.
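The grouping rule can be expressed in a few lines. In this sketch, MigrationInfo and compatibility_groups are hypothetical helpers. Each “final” migration opens a new group, and the first group starts without one. Applied to the table above, the function reproduces its last row.

```python
from dataclasses import dataclass


@dataclass
class MigrationInfo:
    name: str
    version: str  # anemoi-models version that introduced the migration
    final: bool = False  # True for a "final" migration


def compatibility_groups(migrations: list[MigrationInfo]) -> list[int]:
    """Number the compatibility group of each migration, in order.

    A "final" migration is the first migration of a new group; the first
    group is the exception and does not start with a "final" migration.
    """
    groups = []
    group = 1
    for index, migration in enumerate(migrations):
        if migration.final and index > 0:
            group += 1
        groups.append(group)
    return groups


MIGRATIONS = [
    MigrationInfo("migration 1", "0.8.1"),
    MigrationInfo("migration 2", "0.8.3"),
    MigrationInfo("final migration", "0.9.0", final=True),
    MigrationInfo("migration 3", "0.10.5"),
    MigrationInfo("final migration", "0.12.0", final=True),
    MigrationInfo("migration 4", "0.12.2"),
]

print(compatibility_groups(MIGRATIONS))  # [1, 1, 2, 2, 3, 3]
```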
For example, a checkpoint trained on version 0.8.1 already has migration 1 registered. This checkpoint can be migrated for use with any version in its compatibility group (group 1), up to but excluding 0.9.0.
Similarly, a checkpoint trained on version 0.12.2 can be downgraded to any version down to and including 0.12.0.
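The version ranges in these two examples follow from the positions of the “final” migrations. The sketch below is built on the table above. For a given checkpoint, it returns the lowest compatible version (inclusive) and the first incompatible version (exclusive). It assumes plain dotted integer versions without pre-release tags, and it reports no lower bound for the first group. compatible_range is a hypothetical helper, not part of anemoi-models.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn "0.12.2" into (0, 12, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# (version, is_final) for each migration, in order, taken from the table above.
MIGRATIONS = [
    ("0.8.1", False),
    ("0.8.3", False),
    ("0.9.0", True),
    ("0.10.5", False),
    ("0.12.0", True),
    ("0.12.2", False),
]


def compatible_range(trained_on: str):
    """Return (lowest, first_incompatible) for a checkpoint; None means unbounded."""
    # Versions at which a new compatibility group starts, in ascending order.
    group_starts = [version for version, final in MIGRATIONS if final]
    trained = parse_version(trained_on)
    lowest = None
    first_incompatible = None
    for start in group_starts:
        if parse_version(start) <= trained:
            lowest = start  # latest group boundary not after the checkpoint
        else:
            first_incompatible = start  # first group boundary after the checkpoint
            break
    return lowest, first_incompatible


print(compatible_range("0.8.1"))   # (None, '0.9.0')
print(compatible_range("0.12.2"))  # ('0.12.0', None)
```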
Note
Checkpoints only store the migration information of their own compatibility group. The “final” migration of a group can also be seen as the first migration of the following group. In fact, “final” migrations are always the first registered migration of a group and act as a marker of the checkpoint's compatibility group. The first compatibility group is an exception: it does not start with a “final” migration.
Resolution algorithm
The operations to execute are decided by the following resolution algorithm. To follow along, here is an example:
| In anemoi-models | In the checkpoint |
|---|---|
| migration 1 | migration 1 |
| migration 2 | migration 2 |
| migration 5 | |
| migration 6 | |
| migration 7 | |
First, check whether the checkpoint contains migrations that are not in anemoi-models. If it does, fail.
Then, execute every migration that is missing from the checkpoint, in order, starting from the first one (here migrations 5, 6 and 7).
In the example, it will produce:
MIGRATE migration 5
MIGRATE migration 6
MIGRATE migration 7
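The two steps can be written as a short function. This is a simplified sketch, not the anemoi-models implementation. It identifies migrations by name only and reports a failure with a plain ValueError instead of the library's own exceptions.

```python
def resolve(in_models: list[str], in_checkpoint: list[str]) -> list[str]:
    """Return the names of the migrations to execute, in order.

    Raises ValueError if the checkpoint holds migrations unknown to anemoi-models.
    """
    known = set(in_models)
    extra = [name for name in in_checkpoint if name not in known]
    if extra:
        raise ValueError(f"Unknown migrations in checkpoint: {extra}")
    done = set(in_checkpoint)
    # Keep the anemoi-models order and skip migrations already executed.
    return [name for name in in_models if name not in done]


models = ["migration 1", "migration 2", "migration 5", "migration 6", "migration 7"]
checkpoint = ["migration 1", "migration 2"]
for name in resolve(models, checkpoint):
    print("MIGRATE", name)
# MIGRATE migration 5
# MIGRATE migration 6
# MIGRATE migration 7
```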
Executed migrations
The whole history of migrations is stored in the metadata of the checkpoint. It can be accessed through:
>>> history = metadata.get("migrations", {}).get("history", [])
>>> for executed_migration in history:
...     print(executed_migration)
...
{'type': 'migrate', 'name': 'migration_name2.py', 'signature': '[...]'}
Migrator
- exception anemoi.models.migrations.migrator.IncompatibleCheckpointException
Bases: BaseException
The provided checkpoint cannot be migrated because it is too old or too recent.
- exception anemoi.models.migrations.migrator.IncompleteMigrationScript
Bases: BaseException
The migration script is missing some mandatory content (metadata).
- class anemoi.models.migrations.migrator.MigrationMetadata(versions: MigrationVersions, final: bool = False)
Bases: object
Metadata object of the migration.
- versions: MigrationVersions
Migration and anemoi-models versions.
- class anemoi.models.migrations.migrator.SerializedMigration
Bases: TypedDict
The serialized migration stored in the checkpoint.
- class anemoi.models.migrations.migrator.Migration(name: str, metadata: MigrationMetadata, signature: str, migrate: Callable[[MutableMapping[str, Any]], MutableMapping[str, Any]] | None = None, migrate_setup: Callable[[MigrationContext], None] | None = None)
Bases: object
Represents a migration.
- metadata: MigrationMetadata
Tracked metadata
- migrate: Callable[[MutableMapping[str, Any]], MutableMapping[str, Any]] | None = None
Callback to execute the migration
- migrate_setup: Callable[[MigrationContext], None] | None = None
Setup function to execute before loading the checkpoint. This can be used to mock missing modules or attributes.
- classmethod from_serialized(migration: SerializedMigration) → Migration
Alternative constructor that loads the migration from the serialized migration dict stored in the checkpoint. The resulting migration does not contain the migrate or migrate_setup callbacks, because they are not serialized.
- Parameters:
migration (SerializedMigration) – The serialized migration dict.
- Returns:
The migration.
- Return type:
Migration
- serialize() → SerializedMigration
Serialize this migration.
- Returns:
The serialized dict to store in the checkpoint.
- Return type:
SerializedMigration
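A small stand-in class can illustrate the effect of a serialize / from_serialized round trip. MigrationSketch and SerializedSketch are hypothetical, and their field names (name, metadata, signature) are assumptions. The sketch only shows that the migrate callback is lost in the round trip, because callbacks are not serialized.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, TypedDict


class SerializedSketch(TypedDict):
    name: str
    metadata: dict[str, Any]
    signature: str


@dataclass
class MigrationSketch:
    name: str
    metadata: dict[str, Any]
    signature: str
    migrate: Optional[Callable[[dict], dict]] = None

    def serialize(self) -> SerializedSketch:
        # Only plain data goes into the checkpoint; the callback is left out.
        return {"name": self.name, "metadata": dict(self.metadata), "signature": self.signature}

    @classmethod
    def from_serialized(cls, data: SerializedSketch) -> "MigrationSketch":
        # The callback cannot be restored: it only exists in the script itself.
        return cls(name=data["name"], metadata=dict(data["metadata"]), signature=data["signature"])
```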
- class anemoi.models.migrations.migrator.MigrationOp(run: Callable[[MutableMapping[str, Any]], MutableMapping[str, Any]], migration: Migration)
Bases: object
Migration operation.
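The sketch below shows one way a list of migration operations could be applied. Each operation's run callback transforms the checkpoint, and an entry in the format shown under Executed migrations is then appended to the history. OpSketch and apply_ops are hypothetical stand-ins. Keeping the metadata in a separate dict is also an assumption made for illustration.

```python
from collections.abc import MutableMapping
from dataclasses import dataclass
from typing import Any, Callable

Checkpoint = MutableMapping[str, Any]


@dataclass
class OpSketch:
    run: Callable[[Checkpoint], Checkpoint]  # transforms the checkpoint
    name: str  # script filename of the migration
    signature: str  # hash digest of the script


def apply_ops(ckpt: Checkpoint, metadata: dict, ops: list[OpSketch]) -> Checkpoint:
    """Run each operation in order and record it in the migration history."""
    history = metadata.setdefault("migrations", {}).setdefault("history", [])
    for op in ops:
        ckpt = op.run(ckpt)
        history.append({"type": "migrate", "name": op.name, "signature": op.signature})
    return ckpt
```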
Setup Context
- class anemoi.models.migrations.setup_context.MigrationContext
Bases: object
A context object giving setup callbacks access to the following utilities:
- context.move_attribute("pkg.start.MyClass", "pkg.end.MyRenamedClass") to update paths to attributes,
- context.move_module("pkg.start", "pkg.end") to move a full module,
- context.delete_attribute("pkg.mod.MyClass") to remove a class. You can use “*” as a wildcard for the attribute name: context.delete_attribute("pkg.mod.*") removes all attributes from the module.
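The sketch below shows what a migrate_setup function using these utilities might look like. The module and class paths are hypothetical placeholders. MigrationContext is imported only for type checkers. RecordingContext is an illustrative stub that records calls, so the function can run without anemoi-models installed.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only, so no runtime dependency
    from anemoi.models.migrations.setup_context import MigrationContext


def migrate_setup(context: "MigrationContext") -> None:
    # Hypothetical class that was renamed and moved to another module.
    context.move_attribute("pkg.start.MyClass", "pkg.end.MyRenamedClass")
    # Hypothetical module that was moved as a whole.
    context.move_module("pkg.old_module", "pkg.new_module")
    # Hypothetical module whose attributes were all removed.
    context.delete_attribute("pkg.removed.*")


class RecordingContext:
    """Illustrative stub that records calls instead of patching imports."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def move_attribute(self, start: str, end: str) -> None:
        self.calls.append(("move_attribute", start, end))

    def move_module(self, start: str, end: str) -> None:
        self.calls.append(("move_module", start, end))

    def delete_attribute(self, path: str) -> None:
        self.calls.append(("delete_attribute", path))
```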
- delete_attribute(path: str) None
Indicate that an attribute has been deleted. Any reference to this attribute will be replaced by a MissingAttribute object.
- Parameters:
path (str) – Path to the attribute, for example pkg.mod.MyClass.
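The sketch below shows how a wildcard path could be matched against attribute paths. The matches helper is hypothetical. It assumes that “pkg.mod.*” covers only attributes defined directly in pkg.mod, not those of its submodules, and the library may treat nested paths differently.

```python
def matches(pattern: str, attribute_path: str) -> bool:
    """Return True if attribute_path is covered by a delete_attribute pattern.

    "pkg.mod.MyClass" matches only itself; "pkg.mod.*" matches every
    attribute directly in pkg.mod (an assumption of this sketch).
    """
    if pattern.endswith(".*"):
        module = pattern[:-2]
        parent, _, name = attribute_path.rpartition(".")
        return parent == module and name != ""
    return pattern == attribute_path
```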