Where to put your data, models, configs, and code so future-you doesn't curse past-you.
Key idea
Separate the things that change at different rates. Code changes daily, configs change per-experiment, data and models change rarely (and live elsewhere). Put each in its own directory; never mix them.
The single biggest predictor of an ML project becoming unmaintainable is everything-in-one-notebook. Six months in, nobody — including you — knows which version of which preprocessing belongs to which model.
A clean repo splits responsibilities. A reasonable starting layout for a research project:
You're starting any project meant to last more than a week
Multiple people will work on the code
You want to reproduce an experiment 3 months from now
You'll eventually package or deploy the code
Don't over-engineer when
It's a one-off experiment that lives in a single notebook
You're prototyping for an afternoon and will throw it away
The "project" is two files of glue between two libraries
Adding structure would slow you down more than it helps
Want a concrete starter template and the reasoning behind each directory?
Key idea
Use src layout (your code is an importable package in src/) and config-driven entry points (scripts in scripts/ read configs, never embed hyperparameters). Treat notebooks as scratch paper, not source of truth.
The "src layout" puts your package one directory deeper than the project root. Why bother? It prevents you from accidentally importing your code from the project root instead of the installed version — a real bug that wastes hours when tests pass locally but fail in CI.
Why scripts/ separately from src/? Things in src/my_project/ are library code: importable, testable, no argparse. Things in scripts/ are entry points: argparse / Hydra, side effects, calls into the library. This split makes the library reusable from notebooks, CLIs, and other scripts without modification.
Notebooks are scratch paper. Notebooks for exploration, plotting, debugging. Anything you'd want to call again belongs in src/. Promote code out of notebooks aggressively — every function that survives to a second notebook should be in the package.
Don't commit data. Even small data. Use .gitignore + a download script (scripts/download_data.py) or a versioned store (DVC, S3 with a config). Repos with committed CSVs are repos that go bad.
You'll run multiple experiments with different configs
You want to publish the code or share it with collaborators
You'll deploy a model from this codebase
Lighter shape when
Quick proof-of-concept that won't survive a week
Single-file utility that wraps an existing library
Notebook-driven analysis on a single dataset
You're learning, not building
Want monorepo tradeoffs, naming conventions, and templates?
Key idea
Beyond the basic shape, the harder decisions are about boundaries — package per concern vs. flat package, monorepo vs. polyrepo, where to draw the line between library and application code, and how to keep coupling low across teams.
Cookiecutter / templates. Don't hand-roll. cookiecutter-data-science is the canonical research-project template. lightning-hydra-template for PyTorch Lightning + Hydra. cookiecutter-uv for modern uv-based projects. Pick one, customize once, stop bikeshedding.
Monorepo vs. polyrepo. ML projects often involve a data-prep service, a training service, a serving service, and a shared library. Monorepo (Bazel, Pants, uv workspaces) makes refactors across boundaries cheap but raises CI cost. Polyrepo isolates and versions independently but creates coordination overhead. For ≤ 4 services / 1 team: monorepo wins. Beyond that it depends.
Library / application split. The hardest line. Anything reusable (data loaders, model definitions, metrics) is library; anything specific to one experiment (hyperparameters, paths, schedules) is application. The test: could another project consume this without modification? If yes, it's library. If no, it doesn't belong in src/.
Versioning & releases. Use importlib.metadata.version to surface the package version in logs and run metadata. Tag releases (git tags + semver). Log the git SHA on every training run; you'll thank yourself when "model A from June" needs reproducing in October.
Naming conventions. Boring is good. Plurals for collections (models/, not model/). Verbs for action modules (training.py), nouns for data structures (dataset.py). Avoid utils.py if you can — it tends to become a graveyard.
Reach for it when
Multi-service ML platform — monorepo plus shared libs
Library you intend others to depend on
Production deployment with strict release process
You're standardizing across multiple internal ML teams
Skip it when
Research project where boundaries shift weekly
Solo dev — coordination overhead outweighs the structure