argparse, click, typer, fire — and how to wire a config file into the command line without losing your mind.
Key idea
Every script you'll run more than twice deserves a CLI. A clean command line + a config file gives you reproducibility, schedulability, and shareability for free. Modern Python has good tools (Typer, Hydra) that make this nearly trivial.
Four reasonable choices. typer: type-hint-driven, modern, great defaults. click: classic, mature, used everywhere. argparse: stdlib; fine for tiny scripts. fire: zero-config; useful for prototypes. Skip hand-rolled sys.argv parsing; you'll regret it.
The pattern. One CLI per script. Each command takes a config (YAML / Hydra). Hyperparameters in the config; flags for things that vary per-run (output dir, debug mode). The CLI just dispatches; the work is in modules.
What goes where
CLI flags: run-specific (out dir, debug, dry-run)
Config file: hyperparameters, paths, model architecture
Env vars: secrets, API keys, runtime config
Code: never put paths or hyperparameters here
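The split above can be sketched with the standard library alone. This is a minimal illustration, not a recommended layout: it assumes a JSON config file (a YAML file loaded with yaml.safe_load would slot in the same way), and the names resolve_settings, MYAPP_API_KEY, --config, --out, and --debug are all hypothetical.

```python
import argparse
import json
import os


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train a model (illustrative).")
    parser.add_argument("--config", required=True, help="Path to a JSON config file")
    parser.add_argument("--out", default="runs", help="Output directory for this run")
    parser.add_argument("--debug", action="store_true", help="Quick smoke run")
    return parser


def resolve_settings(argv, env=None):
    """Read each kind of setting from the source that owns it."""
    env = os.environ if env is None else env
    args = build_parser().parse_args(argv)
    with open(args.config, encoding="utf-8") as fh:
        hyperparams = json.load(fh)  # config file: hyperparameters
    return {
        "hyperparams": hyperparams,
        "out": args.out,  # CLI flag: run-specific
        "debug": args.debug,  # CLI flag: run-specific
        "api_key": env.get("MYAPP_API_KEY"),  # env var: secret (hypothetical name)
    }
```

Passing env explicitly keeps the function easy to test; in a real script it falls back to os.environ.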
Common mistakes
15 positional arguments — make them named
Hard-coded paths in the CLI defaults
One giant script with subcommands for unrelated things
No --help text — users (you, in 3 weeks) will hate you
from pathlib import Path

import typer
import yaml

# run_training is assumed to be defined in your own training module.

app = typer.Typer(no_args_is_help=True)


@app.command()
def train(
    config: Path = typer.Argument(..., help="Path to YAML config"),
    out: Path = typer.Option(Path("runs/"), help="Output directory"),
    debug: bool = typer.Option(False, help="Quick smoke run"),
    seed: int = typer.Option(0, help="Random seed"),
):
    """Train a model from a YAML config."""
    cfg = yaml.safe_load(config.read_text())
    if debug:
        cfg["max_steps"] = 10  # cap the run so a smoke test finishes quickly
    run_training(cfg, out=out, seed=seed)


@app.command()
def evaluate(
    checkpoint: Path,
    test_data: Path,
    threshold: float = 0.5,
):
    """Evaluate a checkpoint on a held-out test set."""
    ...


if __name__ == "__main__":
    app()
Hydra. A config framework from Facebook Research (now Meta). YAML configs, overrides from the CLI (python train.py model.lr=1e-3), composable configs (defaults: [model: resnet, data: cifar10]), multi-runs / sweeps. Many ML codebases end up on Hydra or something shaped like it.
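Pydantic + CLI. Define configs as Pydantic models and get validation, type coercion, and defaults. This pairs well with Typer: the command loads the config file and hands the resulting dict to the model for validation. Useful for strict schemas; fits nicely in Pydantic-everywhere codebases.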
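As a rough sketch of the validation step, assuming Pydantic is installed: the schema below is hypothetical (TrainConfig and its fields lr, batch_size, max_steps are illustrative names), and load_config stands in for whatever runs right after yaml.safe_load in a command.

```python
from pydantic import BaseModel, Field, ValidationError


class TrainConfig(BaseModel):
    """Hypothetical schema; the field names are illustrative only."""

    lr: float = Field(1e-3, gt=0)  # must be positive
    batch_size: int = Field(32, gt=0)  # must be positive
    max_steps: int = 1000  # plain default, no extra constraint


def load_config(raw: dict) -> TrainConfig:
    """Validate a raw dict (e.g. the result of yaml.safe_load) against the schema."""
    return TrainConfig(**raw)
```

A bad value such as lr=-1 raises ValidationError before any training code runs, which is the point: fail at startup, not three hours in.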
Subcommands. my-tool train ..., my-tool evaluate ..., my-tool deploy .... Typer's decorator pattern covers this: each @app.command() function becomes a subcommand. Better than one huge script with a --mode flag.
Help is documentation. Every flag gets a one-line description. --help output is what you'll read in 3 weeks; make it good. Examples in the docstring are nicer than the user manual.
Dry-run flag. --dry-run prints what would happen without doing it. Useful for destructive operations (training that overwrites, deployments, data writes).
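A dry-run switch usually comes down to one branch around the destructive call. A minimal sketch, assuming a hypothetical clean_run_dir helper that removes *.ckpt files from a run directory:

```python
import sys
from pathlib import Path


def clean_run_dir(run_dir: Path, dry_run: bool = False) -> list:
    """Delete *.ckpt files under run_dir, or only report them when dry_run is set."""
    targets = sorted(run_dir.glob("*.ckpt"))
    for path in targets:
        if dry_run:
            # Report on stderr so stdout stays clean for real output
            print(f"[dry-run] would delete {path}", file=sys.stderr)
        else:
            path.unlink()
    return targets
```

Both modes walk the same list of targets, so the dry run reports exactly what the real run would touch.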
Config logging. The script writes the fully-resolved config to the run's output directory. Every flag override, every default, every env var — recorded. Reproducibility starts here.
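For scripts that do not use a framework to do this, the idea can be sketched in a few lines of standard library code. The function name save_resolved_config and the file name resolved_config.json are assumptions made for illustration.

```python
import json
import sys
from pathlib import Path


def save_resolved_config(out_dir: Path, config: dict, overrides: dict) -> Path:
    """Apply overrides to config and write the result into the run directory."""
    record = {
        "config": {**config, **overrides},  # what the run actually used
        "overrides": overrides,  # what was changed on the command line
        "argv": list(sys.argv),  # the exact invocation
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "resolved_config.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True, default=str))
    return path
```

Secrets read from the environment should be left out of this record or redacted before writing.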
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="configs", config_name="train")
def main(cfg: DictConfig):
    # Hydra creates a per-run output dir and saves the composed config
    # to .hydra/config.yaml inside it
    print(OmegaConf.to_yaml(cfg, resolve=True))  # print the resolved config
    run_training(cfg)  # assumed to be defined in your own training module


if __name__ == "__main__":
    main()

# Usage:
# python train.py                                      # defaults
# python train.py model.lr=1e-2 data=cifar100          # override individual values
# python train.py --multirun model.lr=1e-2,1e-3,1e-4   # sweep
The CLI contract
$$ \text{stdin} \to \text{exit code, stdout, stderr} $$
Exit code 0 on success, non-zero on failure (matters for CI / shells)
stdout for data / results; stderr for logs / progress
Pipeable, scriptable, testable
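The contract above can be sketched as a main function that returns its exit code and takes its streams as parameters. This is an illustrative toy (a line counter with a hypothetical --min-lines flag), not a prescribed structure; the stream parameters exist only to make the function testable without touching the real stdin and stdout.

```python
import argparse
import sys


def main(argv=None, stdin=None, stdout=None, stderr=None) -> int:
    """Count lines on stdin; result to stdout, logs to stderr, status as return value."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr

    parser = argparse.ArgumentParser(description="Count lines read from stdin.")
    parser.add_argument("--min-lines", type=int, default=1,
                        help="Fail if fewer lines than this are read")
    args = parser.parse_args(argv)

    n = sum(1 for _ in stdin)
    print(f"read {n} lines", file=stderr)  # progress / logs go to stderr
    if n < args.min_lines:
        print("error: not enough input", file=stderr)
        return 1  # non-zero signals failure to shells and CI
    print(n, file=stdout)  # the result goes to stdout, so it can be piped
    return 0
```

A script entry point would finish with sys.exit(main()) so the return value becomes the process exit code.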
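Shell completion. Typer and Click can both generate completion scripts for bash / zsh / fish. Typer apps get my-tool --install-completion by default; Click exposes the script through an environment variable (_MY_TOOL_COMPLETE=bash_source my-tool). Cheap UX win; pays dividends every day.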
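Plugins. Click and Typer CLIs can both load plugins registered as package entry points, usually through importlib.metadata or a small helper package rather than a built-in switch. Useful for very large CLIs (a "platform" CLI with sub-tools). Most ML projects don't need this, but it's there when you do.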
Long-running CLIs. A train command might run for days. Print structured logs to stderr, write metrics to a logger / file, write checkpoints. Support graceful shutdown on Ctrl+C — save state and exit cleanly.
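Graceful shutdown can be sketched by catching KeyboardInterrupt around the training loop. The train function, its step_fn callback, and the JSON state file below are hypothetical stand-ins for a real loop and a real checkpoint format.

```python
import json
import sys
from pathlib import Path


def train(max_steps: int, ckpt_path: Path, step_fn) -> int:
    """Run step_fn for max_steps; on Ctrl+C, save progress and return non-zero."""
    step = 0
    try:
        while step < max_steps:
            step_fn(step)
            step += 1
    except KeyboardInterrupt:
        print(f"interrupted at step {step}, saving state", file=sys.stderr)
        ckpt_path.write_text(json.dumps({"step": step}))
        return 130  # conventional shell exit code for SIGINT (128 + 2)
    ckpt_path.write_text(json.dumps({"step": step}))
    return 0
```

The saved step is the number of completed steps, so a resumed run can start from it directly.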
Daemon mode. Some CLIs spawn long-lived services (serving, monitoring). Run them under systemd unit files, Docker containers, or supervisord. Under a supervisor the CLI should stay in the foreground and let the supervisor handle restarts; fork-and-detach cleanly only when nothing else manages the process.
Testable CLIs. Click and Typer both ship a CliRunner for invoking commands programmatically. Assert on the exit code and stdout, the same way you would test any function.
Environment variable conventions. Pydantic Settings + dotenv for secrets. Prefix env vars (MYAPP_LOG_LEVEL) to avoid collision. Document them.
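Pydantic Settings handles prefixes for you; the underlying convention is simple enough to sketch with the standard library. The helper name read_prefixed_env is hypothetical, and the MYAPP_ prefix follows the MYAPP_LOG_LEVEL example above.

```python
import os

PREFIX = "MYAPP_"  # same prefix as the MYAPP_LOG_LEVEL example


def read_prefixed_env(env=None, prefix: str = PREFIX) -> dict:
    """Collect variables starting with prefix, strip it, and lower-case the keys."""
    env = os.environ if env is None else env
    return {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }
```

Everything outside the prefix is ignored, which is what keeps the app from picking up unrelated variables such as PATH or another tool's LOG_LEVEL.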
Versioning. Every CLI has --version. Helps debugging "which version is the CI runner using" mysteries.
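With argparse, a --version flag is a single argument using the built-in "version" action. A small sketch; the program name my-tool and the hard-coded __version__ are placeholders (real projects often read the version from package metadata instead).

```python
import argparse

__version__ = "0.1.0"  # placeholder value for illustration


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="my-tool")
    # The "version" action prints the string and exits with status 0
    parser.add_argument("--version", action="version",
                        version=f"%(prog)s {__version__}")
    return parser
```

Typer and Click offer their own equivalents; the point is only that the flag exists and prints something a bug report can quote.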
import typer
from typer.testing import CliRunner

app = typer.Typer()


@app.command()
def add(a: int, b: int):
    """Add two numbers."""
    typer.echo(a + b)


# Test the CLI as a unit
def test_add():
    runner = CliRunner()
    # With a single command, Typer runs it directly, so no subcommand name is passed
    result = runner.invoke(app, ["3", "4"])
    assert result.exit_code == 0
    assert result.stdout.strip() == "7"