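Preprocessing on the full dataset. The most common leak. Fitting StandardScaler, SimpleImputer, PCA, or any encoder on the full dataset (including val/test) before splitting leaks val/test statistics (means, variances, imputation values, category sets) into training. Always fit on the training split only. A scikit-learn Pipeline enforces this, including inside cross-validation, where every step is refit on each training fold.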
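A minimal sketch of the safe pattern, assuming synthetic data from make_classification as a stand-in for a real dataset. The split happens first, and every preprocessing step lives inside the Pipeline, so the imputer and scaler only ever see training rows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with your own X, y.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Split FIRST, before any statistics are computed.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Leaky version, for contrast: StandardScaler().fit(X) would let test rows
# shape the mean/std used to transform the training data.

# Safe version: fitting the pipeline fits each step on X_train only.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
test_acc = model.score(X_test, y_test)

# cross_val_score refits the whole pipeline per fold, so each validation fold
# is transformed with statistics from the other folds only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"test accuracy: {test_acc:.3f}, CV mean: {cv_scores.mean():.3f}")
```

The same applies to PCA or any encoder: put it in the pipeline rather than calling fit on the full array beforehand.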
Target leakage. A feature that is a near-duplicate of the target, typically because it is only recorded after the outcome is known. Often subtle: "was the loan paid off?" as a feature when predicting whether the loan is good. The model gets near-perfect train and val scores, then fails in production, where the feature does not exist yet at prediction time.
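Target leakage is ultimately caught by asking when each feature becomes available, but a cheap screen can surface candidates. A minimal sketch, assuming a binary target and numeric features: the hypothetical helper flag_suspicious_features scores each column on its own with ROC AUC and flags any that separates the classes almost perfectly. The 0.95 threshold and the synthetic loan columns are illustrative assumptions, not a standard.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score


def flag_suspicious_features(X: pd.DataFrame, y, threshold: float = 0.95) -> list[str]:
    """Hypothetical helper: return numeric columns that alone rank the binary
    target almost perfectly (AUC near 0 or 1). A flag means 'check whether this
    is known at prediction time', not proof of leakage."""
    y = np.asarray(y)
    flagged = []
    for col in X.select_dtypes(include="number").columns:
        values = X[col].to_numpy(dtype=float)
        mask = ~np.isnan(values)  # skip missing values for this check
        if len(np.unique(y[mask])) < 2:
            continue  # AUC is undefined with a single class
        auc = roc_auc_score(y[mask], values[mask])
        if max(auc, 1 - auc) >= threshold:
            flagged.append(col)
    return flagged


# Synthetic loan data (assumption): 1 = "good loan".
rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)
X = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, size=n),
    "debt_ratio": rng.uniform(0, 1, size=n),
    # Leaky: only known once the loan term ends, and matches the target ~98% of the time.
    "paid_off": np.where(rng.uniform(size=n) < 0.98, y, 1 - y),
})

print(flag_suspicious_features(X, y))  # ['paid_off']
```

A screen like this only catches blatant cases; a leaky feature that is merely strongly correlated with the target will slip under any threshold, so a per-feature "when is this known?" review is still needed.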
Temporal leakage. Random splits on time-series data put future rows in train and past rows in val, so the model "learns" from information it would not have at prediction time. Always use time-based splits for any data with a time dimension: train on the past, validate on the future.
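Two common ways to split by time, sketched on a toy monthly DataFrame (the dates, the cutoff, and the assumption that rows are sorted oldest first are all illustrative). A single cutoff gives one train/val split; scikit-learn's TimeSeriesSplit gives expanding-window folds for cross-validation.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Toy data (assumption): 12 monthly rows, already sorted by date.
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=12, freq="MS"),  # month starts
    "y": np.arange(12),
})

# Option 1: single cutoff. Everything before it trains, everything after validates.
cutoff = pd.Timestamp("2023-10-01")
train = df[df["date"] < cutoff]
val = df[df["date"] >= cutoff]

# Option 2: expanding-window folds. Each fold trains on a prefix of the
# series and validates on the block that immediately follows it.
tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(df))
for train_idx, val_idx in folds:
    # Every training row precedes every validation row: no future in train.
    assert train_idx.max() < val_idx.min()
    print(train_idx, val_idx)
```

TimeSeriesSplit relies on row order, so sort by timestamp before splitting. It also accepts a gap argument to leave a buffer of rows between the training and validation blocks.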
Group leakage. Multiple records per entity (patient, user, session) end up on both sides of the train/val split. The model memorises entity-specific quirks and reports good val performance, but a new entity at deployment is fundamentally unseen. Split by entity so that all of an entity's records land on the same side.
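A minimal sketch of entity-level splitting with scikit-learn's group-aware splitters, assuming a hypothetical patient_id array with one entry per row. GroupShuffleSplit gives a single hold-out split over groups; GroupKFold gives cross-validation folds in which each group appears in exactly one validation fold.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, GroupShuffleSplit

rng = np.random.default_rng(0)

# Hypothetical data: 20 patients with 5 records each.
patient_id = np.repeat(np.arange(20), 5)
X = rng.normal(size=(len(patient_id), 3))
y = rng.integers(0, 2, size=len(patient_id))

# Single hold-out: test_size is a fraction of patients, not of rows.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(gss.split(X, y, groups=patient_id))
assert set(patient_id[train_idx]).isdisjoint(patient_id[val_idx])

# Cross-validation: no patient appears in both train and val of any fold.
gkf = GroupKFold(n_splits=5)
folds = list(gkf.split(X, y, groups=patient_id))
for tr, va in folds:
    assert set(patient_id[tr]).isdisjoint(patient_id[va])
```

The groups argument is what prevents the leak; a plain KFold or train_test_split on the same rows would scatter each patient's records across both sides.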
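Hyperparameter leakage. Tuning extensively on the test set. After enough trials, you have selected for performance on that particular test set, and its score is no longer an unbiased estimate of true performance. Tune on a validation set or with cross-validation on the training data, and score the test set once, at the end.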
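A minimal sketch of keeping the test set out of tuning, assuming synthetic data and an illustrative grid over LogisticRegression's C. GridSearchCV does all of its selection with cross-validation inside the training split, and the held-out test set is scored exactly once afterwards.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)

# Carve off the test set first; nothing below touches it until the last step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
search = GridSearchCV(
    pipe,
    param_grid={"logisticregression__C": [0.01, 0.1, 1, 10]},  # illustrative grid
    cv=5,
)
search.fit(X_train, y_train)  # every trial is scored on training folds only

# One final evaluation of the refit best model on untouched data.
test_score = search.score(X_test, y_test)
print(search.best_params_, round(search.best_score_, 3), round(test_score, 3))
```

search.best_score_ is the tuned cross-validation score and is optimistically biased for the same reason the paragraph above describes; test_score is the number to report. On small datasets, nested cross-validation (cross_val_score wrapped around the GridSearchCV object) is the usual alternative to a single test split.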