Why this matters: in production you want to upgrade or roll back a model without redeploying the serving infrastructure. The registry is the indirection — the serving code asks "give me the current production model"; the registry returns the right artefact. Versioning, audit trails, and rollback are all properties of the registry.
Model Registries
A central catalogue of every model version — with metadata, lineage, and clear stages.
One source of truth for every model artefact. Each trained model gets a name, a version, metadata (metrics, config, lineage), and a stage (None → Staging → Production → Archived). Deployment systems pull "the model tagged production" by name, never by file path.
What the registry stores
- The model artefact (weights / pickle / ONNX)
- Hyperparameters and the training config
- Training metrics and final eval scores
- Dataset version / hash
- Lineage: parent model, training run, code commit
- Stage: None / Staging / Production / Archived
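A minimal sketch of that data model, assuming a hypothetical in-memory registry (this is not MLflow's API). Each version carries the fields listed above, and serving resolves a model by name and stage rather than by file path. The URIs and hashes in the usage lines are placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory registry: illustrates the data model only.
@dataclass
class ModelVersion:
    name: str
    version: int
    artifact_uri: str      # where the weights live (e.g. an s3:// URI)
    params: dict           # hyperparameters / training config
    metrics: dict          # training metrics and final eval scores
    dataset_hash: str      # dataset version / hash
    lineage: dict          # parent model, training run, code commit
    stage: str = "None"    # None / Staging / Production / Archived

@dataclass
class Registry:
    versions: dict = field(default_factory=dict)  # name -> list of ModelVersion

    def register(self, name, **fields):
        history = self.versions.setdefault(name, [])
        mv = ModelVersion(name=name, version=len(history) + 1, **fields)
        history.append(mv)
        return mv

    def get_by_stage(self, name, stage):
        # Serving asks for "the Production model", never for a file path
        matches = [mv for mv in self.versions.get(name, []) if mv.stage == stage]
        if not matches:
            raise LookupError(f"no {stage} version of {name}")
        return max(matches, key=lambda mv: mv.version)

# Usage with placeholder values
reg = Registry()
v1 = reg.register("credit-risk-classifier", artifact_uri="s3://models/crc/1",
                  params={"lr": 0.01}, metrics={"val/auc": 0.91},
                  dataset_hash="placeholder-hash-1", lineage={"commit": "placeholder"})
v2 = reg.register("credit-risk-classifier", artifact_uri="s3://models/crc/2",
                  params={"lr": 0.005}, metrics={"val/auc": 0.92},
                  dataset_hash="placeholder-hash-2", lineage={"parent": "credit-risk-classifier/1"})
v1.stage = "Production"
```

Version numbers increase monotonically per name, so promoting v2 later only changes its `stage` field. The serving lookup stays the same.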
Tools
- MLflow Model Registry: open-source, mature
- W&B Artifacts: hosted; the same tool as W&B experiment tracking
- SageMaker Model Registry: AWS-native
- Vertex Model Registry: GCP-native
- BentoML Yatai: model + bento packaging
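```python
import mlflow
from mlflow.tracking import MlflowClient

# Assumes `cfg` (a config object whose .flatten() returns a dict) and a
# trained PyTorch `model` already exist from the training script.
# At the end of training, log the model and metadata to one run
with mlflow.start_run() as run:
    mlflow.log_params(cfg.flatten())
    mlflow.log_metrics({"val/auc": 0.91, "test/auc": 0.89})
    mlflow.pytorch.log_model(model, "model")

# Register the logged artefact as a new version of a named model
result = mlflow.register_model(
    f"runs:/{run.info.run_id}/model",
    name="credit-risk-classifier",
)
print(f"Registered version {result.version}")

# Promote to production after the promotion gate passes.
# Note: stage transitions are deprecated in recent MLflow releases;
# aliases are the recommended replacement.
client = MlflowClient()
client.transition_model_version_stage(
    name="credit-risk-classifier", version=result.version, stage="Production",
)

# At serve time: resolve by name and stage, never by file path
model = mlflow.pytorch.load_model("models:/credit-risk-classifier/Production")
```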
- Reproducibility = re-deriving the same model from these inputs
- The registry stores the artefact and the inputs separately
- Audit trail: who created it, when, with what data, why promoted
Stages. "None" (just registered), "Staging" (under evaluation), "Production" (serving live), "Archived" (kept for audit). The transitions need explicit gates — usually CI green + eval pass for Staging, human review for Production.
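A minimal sketch of gated transitions, assuming a hypothetical stage machine and gate names (`ci_green`, `eval_pass`, `human_review` are illustrative, not a registry feature). Illegal moves are rejected outright. Legal moves report which gates still block them.

```python
# Hypothetical stage machine: legal moves between registry stages.
ALLOWED = {
    "None": {"Staging"},
    "Staging": {"Production", "Archived"},
    "Production": {"Archived"},
    "Archived": set(),
}

# Checks that must be true before a version may enter each stage.
GATES = {
    "Staging": ["ci_green", "eval_pass"],
    "Production": ["ci_green", "eval_pass", "human_review"],
    "Archived": [],
}

def failed_gates(current, target, checks):
    """Return the gates still blocking current -> target; raise if the move is illegal."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return [gate for gate in GATES[target] if not checks.get(gate, False)]

# A version with green CI and passing evals can enter Staging,
# but still needs a human review before Production.
print(failed_gates("Staging", "Production", {"ci_green": True, "eval_pass": True}))
```

Keeping the rules as data rather than code makes them easy to review and audit alongside the registry itself.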
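Aliases vs stages. MLflow has both. Stages are fixed buckets (None / Staging / Production / Archived); aliases are named, movable pointers to one specific version ("@champion", "@challenger", "@production"). Recent MLflow releases deprecate stages in favour of aliases, which are flexible enough for non-linear workflows. MLflow reserves the name "latest", which already resolves to the newest version, so it cannot be used as an alias.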
Multi-environment promotion. Dev → staging → production. Each environment's serving system points at its registry stage. Promotion is a registry operation, not a redeploy.
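A minimal sketch of promotion as a registry write, assuming a hypothetical mapping from each environment to one registry pointer. Each serving system resolves its pointer at load time. Promoting copies a version number from one pointer to another, and nothing is rebuilt or redeployed. The version numbers are placeholders.

```python
# Hypothetical mapping: each environment's serving system reads one pointer.
ENV_POINTER = {"dev": "candidate", "staging": "staging", "production": "production"}

# Registry state for one registered model: pointer name -> version number.
pointers = {"candidate": 44, "staging": 43, "production": 42}

def serving_version(env):
    """The version the serving system in `env` loads, resolved at load time."""
    return pointers[ENV_POINTER[env]]

def promote(src_env, dst_env):
    """Promotion copies a version between pointers: a registry write, no redeploy."""
    pointers[ENV_POINTER[dst_env]] = serving_version(src_env)

promote("staging", "production")  # production now serves what staging validated
```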
Lineage. Every model points to its training run, which points to its config, dataset version, and parent model (if fine-tuned). A few hops takes you from a production prediction back to the raw data that produced it.
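A minimal sketch of that walk, assuming a hypothetical lineage graph in which each node lists the upstream nodes it was built from. The node names are placeholders, not any registry's URI scheme. A depth-first traversal from a single prediction collects every raw-data source behind it, including data reached through the parent model.

```python
# Hypothetical lineage graph: node -> upstream nodes it was built from.
UPSTREAM = {
    "prediction:req-981": ["model:credit-risk-classifier/7"],
    "model:credit-risk-classifier/7": ["run:a1b2"],
    "run:a1b2": ["config:cfg-v3", "dataset:loans@v5", "model:credit-risk-base/2"],
    "model:credit-risk-base/2": ["run:9f9f"],
    "run:9f9f": ["config:cfg-v1", "dataset:loans@v4"],
    "dataset:loans@v5": ["raw:loans-dump-v5"],
    "dataset:loans@v4": ["raw:loans-dump-v4"],
}

def trace_to_raw(node):
    """Depth-first walk upstream; return every raw-data source reachable from `node`."""
    seen, stack, raw = set(), [node], []
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current.startswith("raw:"):
            raw.append(current)
        stack.extend(UPSTREAM.get(current, []))
    return sorted(raw)

print(trace_to_raw("prediction:req-981"))
```

The `seen` set keeps the walk finite even if two runs share a dataset or config.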
Model cards. Mitchell et al. (2019). Standardised model documentation — intended use, training data, performance metrics across subgroups, ethical considerations. Increasingly a regulatory requirement; HuggingFace, OpenAI, and Anthropic all ship them.
Artefact storage. The registry tracks; the storage holds the weights. S3 / GCS / Azure Blob / on-prem. Use registries that decouple metadata storage from artefact storage so you can swap backends without losing history.
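A minimal sketch of that decoupling, assuming a hypothetical in-memory stand-in for an object store. The registry metadata records only the backend, the key, and a SHA-256 digest. Migrating backends copies the bytes and re-points the metadata, so version history and digests survive the swap.

```python
import hashlib

class MemoryStore:
    """Stand-in for an object store (S3 / GCS / Azure Blob): bytes by key."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]

# Registry metadata: where the artefact lives and its digest, never the bytes.
metadata = {}

def register(name, version, data, stores, backend):
    key = f"{name}/{version}/model.bin"
    stores[backend].put(key, data)
    metadata[(name, version)] = {
        "backend": backend,
        "key": key,
        "sha256": hashlib.sha256(data).hexdigest(),
    }

def load(name, version, stores):
    record = metadata[(name, version)]
    data = stores[record["backend"]].get(record["key"])
    if hashlib.sha256(data).hexdigest() != record["sha256"]:
        raise ValueError(f"{name}/{version}: artefact does not match registered digest")
    return data

def migrate(stores, src, dst):
    """Swap storage backends: copy bytes and re-point metadata."""
    for (name, version), record in metadata.items():
        if record["backend"] == src:
            data = load(name, version, stores)  # verifies the digest on the way out
            stores[dst].put(record["key"], data)
            record["backend"] = dst
```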
```python
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Modern MLflow style: use aliases for routing
client.set_registered_model_alias(
    name="credit-risk-classifier", alias="production", version=42,
)
client.set_registered_model_alias(
    name="credit-risk-classifier", alias="champion", version=42,
)
client.set_registered_model_alias(
    name="credit-risk-classifier", alias="challenger", version=43,
)

# Serving code is decoupled from versions
prod = mlflow.pytorch.load_model("models:/credit-risk-classifier@production")

# Roll back: just re-point the alias, no redeploy
client.set_registered_model_alias(
    name="credit-risk-classifier", alias="production", version=41,
)
```
- Each input is itself versioned and signed
- Output artefact carries a provenance record traceable to all inputs
- SLSA-like attestation for ML — not yet standard but converging
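A minimal sketch of such a provenance record, loosely inspired by SLSA-style attestations but not following their actual schema. The record binds the output artefact's digest to the digest of every named input. Canonical JSON serialisation makes the record deterministic, so it can itself be hashed or signed. The input names are assumptions.

```python
import hashlib
import json

def digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()

def provenance_record(artifact: bytes, inputs: dict) -> dict:
    """Bind an output artefact to digests of every versioned input.

    `inputs` maps an input name (dataset, config, code, base model) to its bytes.
    """
    return {
        "artifact": digest(artifact),
        "inputs": {name: digest(blob) for name, blob in sorted(inputs.items())},
    }

def canonical(record: dict) -> bytes:
    """Deterministic serialisation: same record, same bytes, same signature."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
```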
Model cards in the registry. Auto-generate from the training run: dataset, training metrics, subgroup metrics, intended use, known limitations. Mitchell et al.'s template is the reference; HuggingFace has built-in support.
Governance & audit. Who can promote? Who reviewed? When? Most enterprise registries (MLflow Enterprise, AzureML, Vertex) have role-based access + audit logs. For high-stakes domains (medical, financial), this is a regulatory requirement.
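A minimal sketch of role-checked promotion with an append-only audit log, assuming a hypothetical role policy. Each entry answers who, what, when, and why. Denied attempts are logged as well, since they are part of the audit trail.

```python
from datetime import datetime, timezone

# Hypothetical role policy: which stages each role may promote into.
CAN_PROMOTE_TO = {
    "ml-engineer": {"Staging"},
    "model-risk-reviewer": {"Staging", "Production", "Archived"},
}

audit_log = []  # append-only: one entry per attempted transition

def request_promotion(user, role, model, version, target, reason):
    allowed = target in CAN_PROMOTE_TO.get(role, set())
    audit_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "who": user,
        "role": role,
        "model": f"{model}/{version}",
        "target": target,
        "reason": reason,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{role} may not promote to {target}")
```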
Model signing & attestation. Cryptographically sign artefacts so consumers can verify provenance. Sigstore for ML. SBOM (software bill of materials) extending to model artefacts. Active area in 2024+ as supply-chain attacks become a concern.
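A minimal sketch of the sign-then-verify flow, using HMAC from the standard library purely as a stand-in. HMAC is symmetric: the verifier needs the same secret key. Real artefact signing (e.g. Sigstore) uses asymmetric signatures, so consumers can verify without holding a signing secret. The sketch shows only the shape of the check: sign a digest of the artefact, then verify before loading.

```python
import hashlib
import hmac

def sign(artifact: bytes, key: bytes) -> str:
    # Sign a digest of the artefact rather than the raw (possibly huge) bytes
    digest = hashlib.sha256(artifact).digest()
    return hmac.new(key, digest, hashlib.sha256).hexdigest()

def verify(artifact: bytes, key: bytes, signature: str) -> bool:
    expected = sign(artifact, key)
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected, signature)
```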
Lineage propagation. Fine-tuned model → base model → pre-training data. Each step links to the previous; a prediction can be traced back through fine-tuning to pre-training data. Useful for debugging and compliance.
Registry-driven CI. CI workflows triggered by registry events: new "staging" model → run extended evals; new "production" model → deploy + announce. Registries with webhooks (MLflow Enterprise, W&B) make this natural.
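A minimal sketch of the routing side of registry-driven CI. The event payload shape (`type`, `to_stage`, `model`, `version`) is an assumption and does not match any particular registry's webhook schema. The two actions are hypothetical placeholders that return strings instead of calling a real CI system.

```python
# Hypothetical CI actions; a real setup would trigger pipelines here.
def run_extended_evals(model, version):
    return f"evals queued for {model}/{version}"

def deploy_and_announce(model, version):
    return f"deployed {model}/{version}"

HANDLERS = {
    "Staging": run_extended_evals,
    "Production": deploy_and_announce,
}

def on_registry_event(event):
    """Route a 'version moved to stage X' event to the matching CI action."""
    if event.get("type") != "stage_transition":
        return None
    handler = HANDLERS.get(event.get("to_stage"))
    if handler is None:
        return None
    return handler(event["model"], event["version"])
```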
Distillation lineage. A distilled student model points to its teacher. Compound chains (teacher → student → re-distilled student) need careful tracking; otherwise debugging "where did this behaviour come from" is impossible.
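A minimal sketch of following a distillation chain, assuming hypothetical lineage edges that distinguish `teacher` links from `base` links. Walking teacher links from a re-distilled student recovers every model whose behaviour it inherited. The walk fails loudly if the recorded lineage contains a cycle.

```python
# Hypothetical lineage edges: model -> {"teacher": ..., "base": ...}
EDGES = {
    "student-v2": {"teacher": "student-v1"},
    "student-v1": {"teacher": "teacher-xl", "base": "small-base"},
    "teacher-xl": {"base": "xl-pretrained"},
}

def teacher_chain(model):
    """Follow teacher links until reaching a model with no teacher."""
    chain, seen = [model], {model}
    while "teacher" in EDGES.get(chain[-1], {}):
        nxt = EDGES[chain[-1]]["teacher"]
        if nxt in seen:
            raise ValueError(f"cycle in distillation lineage at {nxt}")
        chain.append(nxt)
        seen.add(nxt)
    return chain

print(teacher_chain("student-v2"))
```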
Multi-model systems. When a production prediction depends on several models (retrieval encoder + ranker + reranker + LLM), the registry needs to support "deployments" that bundle versions across multiple model names.
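A minimal sketch of such a deployment bundle, assuming hypothetical deployment records that pin one version per model name. Serving resolves the live bundle rather than individual models. A rollback re-points the whole pipeline at once, and a diff shows which models changed between bundles. All version numbers are placeholders.

```python
# Hypothetical deployment records: each pins a version for every model in the pipeline.
deployments = {
    12: {"retrieval-encoder": 8, "ranker": 15, "reranker": 4, "answer-llm": 3},
    13: {"retrieval-encoder": 9, "ranker": 15, "reranker": 4, "answer-llm": 3},
}
live = {"search-pipeline": 13}

def resolve(pipeline):
    """Every model version the pipeline should load, as one consistent set."""
    return deployments[live[pipeline]]

def rollback(pipeline, deployment_id):
    # Re-pointing the bundle reverts every model together
    live[pipeline] = deployment_id

def diff(a, b):
    """Models whose pinned version differs between deployments a and b."""
    names = set(deployments[a]) | set(deployments[b])
    return {
        n: (deployments[a].get(n), deployments[b].get(n))
        for n in names
        if deployments[a].get(n) != deployments[b].get(n)
    }
```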
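```python
import os

import mlflow
from huggingface_hub import ModelCard, ModelCardData

# Auto-generate a model card from the training run.
# MODEL_CARD_TEMPLATE.md is a local Jinja template, assumed to
# reference {{ val_auc }} and {{ test_auc }} alongside the card_data fields.
def make_card(run_id, model_name, version):
    run = mlflow.get_run(run_id)
    card_data = ModelCardData(
        language="en",
        license="apache-2.0",
        model_name=model_name,
        tags=["classification", "credit-risk"],
        # `metrics` in card metadata is a list of metric names, not values
        metrics=["roc_auc"],
    )
    # Metric values are passed to the template as keyword arguments
    card = ModelCard.from_template(
        card_data,
        template_path="MODEL_CARD_TEMPLATE.md",
        val_auc=run.data.metrics["val/auc"],
        test_auc=run.data.metrics["test/auc"],
    )
    os.makedirs("cards", exist_ok=True)
    card.save(f"cards/{model_name}-{version}.md")
    return card
```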