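Reproducible runs in CI. Pin everything: the Python, PyTorch, CUDA and NumPy versions, plus the OS image. Use a lockfile with hashes (e.g. one generated by pip-compile --generate-hashes). Seed every RNG and set determinism flags. Save the run's full provenance (config, git SHA, data hash) as a CI artefact.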
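A minimal sketch of the seeding and provenance steps, assuming a PyTorch stack. `set_determinism`, `sha256_file`, `git_sha` and `write_provenance` are hypothetical helper names; NumPy and PyTorch are seeded only if they are installed. Be aware that `torch.use_deterministic_algorithms(True)` makes operations without a deterministic implementation raise an error, and that on CUDA, cuBLAS also needs the `CUBLAS_WORKSPACE_CONFIG` environment variable set.

```python
import hashlib
import json
import os
import random
import subprocess
import sys
from pathlib import Path


def set_determinism(seed: int) -> None:
    """Seed the RNGs we know about and request deterministic kernels."""
    random.seed(seed)
    # Must be set before the first cuBLAS call for deterministic matmuls on CUDA.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
        torch.use_deterministic_algorithms(True)  # error on nondeterministic ops
        torch.backends.cudnn.benchmark = False  # no autotuned kernel selection
    except ImportError:
        pass  # PyTorch not installed


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large datasets never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def git_sha() -> str:
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"  # not a git checkout, or git is missing


def write_provenance(config: dict, data_paths: list, out_path: Path) -> dict:
    """Write config, git SHA, interpreter version and data hashes as JSON."""
    record = {
        "config": config,
        "git_sha": git_sha(),
        "python": sys.version.split()[0],
        "data_sha256": {str(p): sha256_file(Path(p)) for p in data_paths},
    }
    out_path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return record
```

In a GitHub Actions workflow, the resulting JSON file can then be uploaded with the `actions/upload-artifact` step so it stays attached to the run.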
Model evaluation gates. A model is promoted only if it beats the production baseline on held-out evals and clears per-subgroup bars (no regressions on minority groups). Tools: model registries (MLflow, Weights & Biases), promotion workflows, model cards.
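The gate itself is a small pure function that a promotion workflow can call before touching the registry. A sketch under stated assumptions: metrics are "higher is better", `EvalResult` and `should_promote` are hypothetical names, and the tolerances (`min_gain`, `max_group_drop`, `group_floors`) are illustrative knobs rather than any registry's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EvalResult:
    overall: float  # headline held-out metric, higher is better
    by_group: Dict[str, float] = field(default_factory=dict)  # subgroup -> metric


def should_promote(
    candidate: EvalResult,
    baseline: EvalResult,
    min_gain: float = 0.0,
    max_group_drop: float = 0.0,
    group_floors: Optional[Dict[str, float]] = None,
) -> Tuple[bool, List[str]]:
    """Return (promote?, reasons for rejection)."""
    reasons = []
    if candidate.overall <= baseline.overall + min_gain:
        reasons.append(
            f"overall {candidate.overall:.3f} does not beat baseline "
            f"{baseline.overall:.3f} by more than {min_gain}"
        )
    # No subgroup may regress (beyond the tolerance) relative to production.
    for group, base_score in baseline.by_group.items():
        score = candidate.by_group.get(group)
        if score is None:
            reasons.append(f"no eval result for group {group!r}")
        elif score < base_score - max_group_drop:
            reasons.append(f"group {group!r} regressed: {score:.3f} < {base_score:.3f}")
    # Absolute per-subgroup bars, independent of the baseline.
    for group, floor in (group_floors or {}).items():
        score = candidate.by_group.get(group, float("-inf"))
        if score < floor:
            reasons.append(f"group {group!r} below floor {floor:.3f}: got {score:.3f}")
    return not reasons, reasons
```

Returning the reasons, not just a boolean, lets the CI job print why a candidate was blocked, which is also useful material for the model card.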
Data drift checks in CI. Before retraining, validate that the new data matches the schema of the training data and that its distribution stays within set thresholds of the training distribution. A Great Expectations suite triggered by a GitHub Actions workflow can run this on every data update.
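Great Expectations expresses such checks as declarative expectation suites; the sketch below is not its API but a standard-library illustration of what a schema check plus a distribution check compute. It swaps in the population stability index, PSI = Σ (c_i - r_i) ln(c_i / r_i) over histogram bin proportions, with equal-width bins over the reference range. The column-dict data layout, the helper names and the 0.2 threshold are all assumptions (0.1 to 0.25 is the range usually quoted as "moderate shift").

```python
import math
from typing import Dict, List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index with equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference column

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def validate_batch(schema: Dict[str, type],
                   reference: Dict[str, list],
                   current: Dict[str, list],
                   psi_threshold: float = 0.2) -> List[str]:
    """Return a list of problems; an empty list means the batch passes."""
    if set(current) != set(schema):
        return [f"columns {sorted(current)} != expected {sorted(schema)}"]
    problems = []
    for col, expected_type in schema.items():
        wrong = [v for v in current[col] if not isinstance(v, expected_type)]
        if wrong:
            problems.append(f"{col}: {len(wrong)} values are not {expected_type.__name__}")
        elif expected_type in (int, float):
            score = psi(reference[col], current[col])
            if score > psi_threshold:
                problems.append(f"{col}: PSI {score:.3f} > {psi_threshold}")
    return problems
```

A CI step would exit non-zero whenever `validate_batch` returns a non-empty list, blocking the retraining job.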
Security & supply chain. Run pip-audit or Safety on every PR. Enable Dependabot for automated security updates. Generate an SBOM for each build. Pin git submodules and externally fetched models by SHA.
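Pinning an external model by hash only helps if something checks the hash. A minimal sketch, assuming a hypothetical pin table (relative path to SHA-256 digest) committed alongside the code; `verify_pins` and `IntegrityError` are illustrative names, not part of any tool.

```python
import hashlib
from pathlib import Path
from typing import Dict


class IntegrityError(RuntimeError):
    """Raised when a fetched artefact does not match its pinned digest."""


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pins(pins: Dict[str, str], root: Path) -> None:
    """Check every pinned file under root and report all mismatches at once."""
    problems = []
    for rel_path, expected in sorted(pins.items()):
        path = root / rel_path
        if not path.is_file():
            problems.append(f"{rel_path}: missing")
            continue
        actual = sha256_of(path)
        if actual != expected.lower():
            problems.append(f"{rel_path}: expected {expected}, got {actual}")
    if problems:
        raise IntegrityError("\n".join(problems))
```

Run it as a CI step immediately after downloading models, so a silently replaced upstream file fails the job instead of reaching training.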
Cost monitoring. Track CI and training cost per PR. Set per-team quotas. Alert when a run uses 10× its normal compute; that usually means a bug.
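The 10× rule needs a definition of "normal". One option, sketched here as an assumption, is the median GPU-hours of the last few comparable runs, which is robust to the occasional legitimately large job. `check_run_cost`, `CostCheck` and the `min_history` cutoff are hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass
class CostCheck:
    over_quota: bool            # this run would push the team past its budget
    anomalous: bool             # run exceeds factor x the recent median
    baseline: Optional[float]   # median of recent runs, None if too few


def check_run_cost(
    run_gpu_hours: float,
    recent_runs: Sequence[float],
    team_spent: float,
    team_quota: float,
    factor: float = 10.0,
    min_history: int = 5,
) -> CostCheck:
    over_quota = team_spent + run_gpu_hours > team_quota
    # Without enough history there is no meaningful "normal" to compare to.
    baseline = median(recent_runs) if len(recent_runs) >= min_history else None
    anomalous = baseline is not None and run_gpu_hours > factor * baseline
    return CostCheck(over_quota, anomalous, baseline)
```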
Self-hosted runners. Use them for GPU jobs or special hardware. Autoscale the fleet (e.g. Karpenter on Kubernetes, or an on-demand GPU cloud such as RunPod) so you pay only for the time you use. They are worth it once GitHub-hosted GPU runner charges exceed what the self-hosted fleet would cost.
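The break-even is simple arithmetic once you account for what autoscaling adds: every job pays for node start-up, and there may be fixed costs (cluster, storage, maintenance). A rough sketch; all rates and parameter names are hypothetical placeholders, so substitute GitHub's current larger-runner pricing and your own cloud bill.

```python
def monthly_cost_hosted(gpu_minutes: float, hosted_rate_per_min: float) -> float:
    """GitHub-hosted runners are billed per minute of job time."""
    return gpu_minutes * hosted_rate_per_min


def monthly_cost_self_hosted(
    gpu_minutes: float,
    jobs: int,
    instance_rate_per_hour: float,
    startup_overhead_min: float = 5.0,  # assumed billed boot time per job
    fixed_monthly: float = 0.0,         # assumed control plane, storage, upkeep
) -> float:
    billed_minutes = gpu_minutes + jobs * startup_overhead_min
    return billed_minutes / 60.0 * instance_rate_per_hour + fixed_monthly


def self_hosting_pays_off(gpu_minutes: float, jobs: int, hosted_rate_per_min: float,
                          instance_rate_per_hour: float, **kwargs: float) -> bool:
    hosted = monthly_cost_hosted(gpu_minutes, hosted_rate_per_min)
    self_hosted = monthly_cost_self_hosted(
        gpu_minutes, jobs, instance_rate_per_hour, **kwargs
    )
    return hosted > self_hosted
```

The start-up overhead term is why many short jobs favour hosted runners while fewer, longer jobs favour a self-hosted fleet.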
Release engineering. Use semantic versioning for the package and a separate version for each trained model (commit SHA + checkpoint ID). Auto-generate the changelog from Conventional Commits.
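Under the Conventional Commits convention, `fix:` maps to a patch bump, `feat:` to a minor bump, and a `!` after the type or a `BREAKING CHANGE:` footer to a major bump. The sketch below applies that mapping and groups headers into a changelog. Treating other types (e.g. `docs`, `chore`) as "no release" follows common tooling defaults, and `model_version`'s `<short-sha>-<checkpoint>` format is an assumed convention, not a standard.

```python
import re
from typing import Iterable, Optional

# "type(scope)!: description"; scope and "!" are optional.
_HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<bang>!)?: ")
_RANK = {None: 0, "patch": 1, "minor": 2, "major": 3}


def _parse(message: str):
    header = message.splitlines()[0] if message else ""
    return header, _HEADER.match(header)


def bump_level(messages: Iterable[str]) -> Optional[str]:
    """Highest semver bump implied by a set of Conventional Commit messages."""
    level = None
    for msg in messages:
        _, m = _parse(msg)
        if m is None:
            continue  # not a conventional commit; ignored
        if m["bang"] or "BREAKING CHANGE:" in msg:
            new = "major"
        elif m["type"] == "feat":
            new = "minor"
        elif m["type"] == "fix":
            new = "patch"
        else:
            new = None  # e.g. docs, chore, refactor: treated as no release here
        if _RANK[new] > _RANK[level]:
            level = new
    return level


def bump(version: str, level: Optional[str]) -> str:
    major, minor, patch = (int(x) for x in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version


def model_version(commit_sha: str, checkpoint_id: str) -> str:
    """Hypothetical model version format: short commit SHA + checkpoint ID."""
    return f"{commit_sha[:12]}-{checkpoint_id}"


def changelog(version: str, messages: Iterable[str]) -> str:
    messages = list(messages)
    out = [f"## {version}"]
    for typ, title in (("feat", "Features"), ("fix", "Bug Fixes")):
        items = []
        for msg in messages:
            header, m = _parse(msg)
            if m and m["type"] == typ:
                items.append(f"- {header[m.end():]}")
        if items:
            out += ["", f"### {title}", *items]
    return "\n".join(out)
```

The package version moves only when commits warrant it, while the model version changes with every training run, which is why the two are kept separate.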