Key idea

One source of truth for every model artefact. Each trained model gets a name, a version, metadata (metrics, config, lineage), and a stage (None → Staging → Production → Archived). Deployment systems pull "the model tagged production" by name, never by file path.

Why this matters: in production you want to upgrade or roll back a model without redeploying the serving infrastructure. The registry is the indirection — the serving code asks "give me the current production model"; the registry returns the right artefact. Versioning, audit trails, and rollback are all properties of the registry.

What the registry stores

  • The model artefact (weights / pickle / ONNX)
  • Hyperparameters and the training config
  • Training metrics and final eval scores
  • Dataset version / hash
  • Lineage: parent model, training run, code commit
  • Stage: None / Staging / Production / Archived

Tools

  • MLflow Model Registry: open-source, mature
  • W&B Artifacts: hosted, same-tool as tracking
  • SageMaker Model Registry: AWS-native
  • Vertex Model Registry: GCP-native
  • BentoML Yatai: model + bento packaging
import mlflow

# At the end of training, log the model and metadata
with mlflow.start_run() as run:
    mlflow.log_params(cfg.flatten())
    mlflow.log_metrics({"val/auc": 0.91, "test/auc": 0.89})
    mlflow.pytorch.log_model(model, "model")

# Register as a versioned entity
result = mlflow.register_model(
    f"runs:/{run.info.run_id}/model",
    name="credit-risk-classifier",
)
print(f"Registered version {result.version}")

# Promote to production after promotion gate passes
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="credit-risk-classifier", version=result.version, stage="Production",
)

# At serve time
model = mlflow.pytorch.load_model("models:/credit-risk-classifier/Production")
Want lineage tracking, model cards, & multi-environment promotion?
Model versioning identity $$ \text{model version} \;=\; \text{hash}(\text{code commit}, \text{data hash}, \text{config}, \text{seeds}) $$
  • Reproducibility = re-deriving the same model from these inputs
  • The registry stores the artefact and the inputs separately
  • Audit trail: who created it, when, with what data, why promoted

Stages. "None" (just registered), "Staging" (under evaluation), "Production" (serving live), "Archived" (kept for audit). The transitions need explicit gates — usually CI green + eval pass for Staging, human review for Production.

Aliases vs stages. MLflow has both. Stages are bucket-based; aliases are pointer-based ("@latest", "@champion", "@candidate"). Modern MLflow favours aliases; flexible enough for non-linear workflows.

Multi-environment promotion. Dev → staging → production. Each environment's serving system points at its registry stage. Promotion is a registry operation, not a redeploy.

Lineage. Every model points to its training run, which points to its config, dataset version, and parent model (if fine-tuned). A few hops takes you from a production prediction back to the raw data that produced it.

Model cards. Mitchell et al. (2019). Standardised model documentation — intended use, training data, performance metrics across subgroups, ethical considerations. Increasingly a regulatory requirement; HuggingFace, OpenAI, and Anthropic all ship them.

Artefact storage. The registry tracks; the storage holds the weights. S3 / GCS / Azure Blob / on-prem. Use registries that decouple metadata storage from artefact storage so you can swap backends without losing history.

import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Modern MLflow style — use aliases for routing
client.set_registered_model_alias(
    name="credit-risk-classifier", alias="production", version=42,
)
client.set_registered_model_alias(
    name="credit-risk-classifier", alias="champion", version=42,
)
client.set_registered_model_alias(
    name="credit-risk-classifier", alias="challenger", version=43,
)

# Serving code is decoupled from versions
prod = mlflow.pytorch.load_model("models:/credit-risk-classifier@production")

# Roll back: just re-point the alias, no redeploy
client.set_registered_model_alias(
    name="credit-risk-classifier", alias="production", version=41,
)
Want model cards, governance, & supply-chain attestation?
Supply chain for models $$ (\text{base model}, \text{dataset}, \text{code}, \text{config}) \xrightarrow{\text{training}} (\text{artefact}, \text{provenance}, \text{signature}) $$
  • Each input is itself versioned and signed
  • Output artefact carries a provenance record traceable to all inputs
  • SLSA-like attestation for ML — not yet standard but converging

Model cards in the registry. Auto-generate from the training run: dataset, training metrics, subgroup metrics, intended use, known limitations. Mitchell et al.'s template is the reference; HuggingFace has built-in support.

Governance & audit. Who can promote? Who reviewed? When? Most enterprise registries (MLflow Enterprise, AzureML, Vertex) have role-based access + audit logs. For high-stakes domains (medical, financial), this is a regulatory requirement.

Model signing & attestation. Cryptographically sign artefacts so consumers can verify provenance. Sigstore for ML. SBOM (software bill of materials) extending to model artefacts. Active area in 2024+ as supply-chain attacks become a concern.

Lineage propagation. Fine-tuned model → base model → pre-training data. Each step links to the previous; a prediction can be traced back through fine-tuning to pre-training data. Useful for debugging and compliance.

Registry-driven CI. CI workflows triggered by registry events: new "staging" model → run extended evals; new "production" model → deploy + announce. Registries with webhooks (MLflow Enterprise, W&B) make this natural.

Distillation lineage. A distilled student model points to its teacher. Compound chains (teacher → student → re-distilled student) need careful tracking; otherwise debugging "where did this behaviour come from" is impossible.

Multi-model systems. When a production prediction depends on several models (retrieval encoder + ranker + reranker + LLM), the registry needs to support "deployments" that bundle versions across multiple model names.

import mlflow
from huggingface_hub import ModelCard, ModelCardData

# Auto-generate a model card from the training run
def make_card(run_id, model_name, version):
    run = mlflow.get_run(run_id)
    card_data = ModelCardData(
        language="en",
        license="apache-2.0",
        model_name=model_name,
        tags=["classification", "credit-risk"],
        metrics={
            "val_auc": run.data.metrics["val/auc"],
            "test_auc": run.data.metrics["test/auc"],
        },
    )
    card = ModelCard.from_template(card_data, template_path="MODEL_CARD_TEMPLATE.md")
    card.save(f"cards/{model_name}-{version}.md")
    return card
Too dense?