Mathematics
The math that actually shows up when you do ML. Organised by sub-discipline; each entry has a note on what it's uniquely good for.
Start here
If you only read one thing.
- Mathematics for Machine Learning Deisenroth, Faisal & Ong. Covers exactly the math used in ML — linear algebra, vector calculus, probability, optimisation — and nothing else. The single most-recommended starting point.
Linear Algebra
Vectors, matrices, eigendecomposition, SVD — the language every ML model speaks.
- 3Blue1Brown — Essence of Linear Algebra 15-video visual series. The intuition pump for "what is a matrix doing?" Watch alongside any textbook — you'll never see linalg the same way.
- MIT 18.06 — Gilbert Strang The classic MIT linear algebra course. Strang's lectures are widely considered the best math lectures on YouTube. Pair with his book "Introduction to Linear Algebra".
- Immersive Linear Algebra Browser-based textbook with interactive 3D visualizations on every page. Best when you want to play with concepts before reading the formal definitions.
- Matrix calculus quick reference Print and pin to your wall. Covers the matrix-calculus identities you'll keep forgetting when deriving backprop.
- The Matrix Calculus You Need For Deep Learning Parr & Howard. Long but focused: exactly the matrix calculus identities used in neural network derivations, with examples.
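A quick way to trust an identity from the references above is to check it numerically. The sketch below is a minimal illustration, not taken from any of the listed resources: it compares the standard identity "gradient of x^T A x equals (A + A^T) x" against central finite differences. The function names and the 2x2 example matrix are assumptions chosen for the demonstration.

```python
def quad_form(A, x):
    """f(x) = x^T A x for a square matrix A given as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def analytic_grad(A, x):
    """Identity under test: grad f(x) = (A + A^T) x."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

def numeric_grad(f, x, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

A = [[2.0, 1.0], [0.0, 3.0]]  # deliberately non-symmetric (illustrative values)
x = [1.0, -2.0]

g_analytic = analytic_grad(A, x)
g_numeric = numeric_grad(lambda v: quad_form(A, v), x)
```

The same finite-difference check works for any identity you derive by hand, which makes it a cheap guard against sign and transpose mistakes in backprop derivations.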
Calculus & Optimization
Derivatives, the chain rule, gradient descent, and convex optimization. The machinery of training.
- 3Blue1Brown — Essence of Calculus Same author and quality as the linalg series. The chain rule episode alone is worth more than most textbook chapters on it.
- Boyd & Vandenberghe — Convex Optimization The canonical text. Free PDF. Read chapters 1–5 for the foundations; the later chapters are specialist references. Beautiful, careful exposition.
- Ruder — Optimizing Gradient Descent Comprehensive overview of SGD variants — momentum, Adagrad, RMSprop, Adam, AdamW. The single clearest reference for "which optimizer should I use".
- Distill — Why Momentum Really Works The best deep dive on momentum specifically. Interactive visualizations that build intuition you can't get from formulas alone.
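For a feel of what the momentum material above is about, here is a minimal sketch of heavy-ball momentum on an ill-conditioned quadratic. The objective, learning rate, momentum coefficient, and tolerance are all assumptions picked for illustration; none of them come from the listed references.

```python
import math

def grad(p):
    """Gradient of f(x, y) = 0.5 * (x**2 + 100 * y**2), an ill-conditioned bowl."""
    x, y = p
    return (x, 100.0 * y)

def steps_to_converge(lr, beta, start=(1.0, 1.0), tol=1e-3, max_steps=10_000):
    """Heavy-ball update: v <- beta * v + grad, p <- p - lr * v.

    Setting beta = 0 recovers plain gradient descent.
    Returns the first step at which the iterate is within tol of the minimum.
    """
    p, v = start, (0.0, 0.0)
    for step in range(1, max_steps + 1):
        g = grad(p)
        v = (beta * v[0] + g[0], beta * v[1] + g[1])
        p = (p[0] - lr * v[0], p[1] - lr * v[1])
        if math.hypot(*p) < tol:
            return step
    return None  # did not converge within max_steps

plain_steps = steps_to_converge(lr=0.01, beta=0.0)
momentum_steps = steps_to_converge(lr=0.01, beta=0.9)
```

With these settings the momentum run should need noticeably fewer steps than plain gradient descent, because the slow, shallow direction no longer limits progress as severely.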
Probability & Statistics
Distributions, expectation, MLE / MAP, Bayesian inference, hypothesis testing.
- Seeing Theory Brown University's interactive probability primer. Click, drag, see what happens. The best intuition builder for distributions, regression, and inference.
- McElreath — Statistical Rethinking The friendliest Bayesian textbook. Free lectures on YouTube; book is paid but worth every cent. Builds intuition through worked examples and chapter-by-chapter Stan / PyMC code.
- Think Stats — Allen Downey Stats taught through Python. Light on theory, heavy on the things you'll actually do. Free PDF + companion notebooks.
- Wasserman — All of Statistics Compressed treatment of mathematical statistics. Goes fast, assumes mathematical maturity. The reference when you want the formal treatment in one place.
- Murphy — Probabilistic Machine Learning Two volumes, free PDF. Volume 1 (An Introduction) covers ML through a probabilistic lens; volume 2 (Advanced Topics) goes into modern Bayesian methods. Strongest single source on probabilistic ML.
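To make the MLE / MAP distinction concrete, here is a small sketch for a coin with unknown heads probability. It assumes a Bernoulli likelihood and a Beta(a, b) prior with a, b > 1, for which the MAP estimate has the closed form (k + a - 1) / (n + a + b - 2). The function names and the numbers are illustrative assumptions.

```python
def bernoulli_mle(heads, flips):
    """Maximum-likelihood estimate of the heads probability: k / n."""
    return heads / flips

def bernoulli_map(heads, flips, a, b):
    """MAP estimate under a Beta(a, b) prior (mode of the Beta posterior).

    Valid when the posterior mode is interior, e.g. a > 1 and b > 1.
    """
    return (heads + a - 1) / (flips + a + b - 2)

# Three flips, all heads (illustrative data).
mle_estimate = bernoulli_mle(3, 3)
# A Beta(2, 2) prior gently favours a fair coin.
map_estimate = bernoulli_map(3, 3, a=2.0, b=2.0)
```

Here the MLE is 1.0 (the coin "always" lands heads), while the MAP estimate is pulled back to 0.8 by the prior. With more data the two estimates converge.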
Information Theory
Entropy, KL divergence, cross-entropy, mutual information. Underrated for ML — most loss functions are info-theoretic.
- MacKay — Information Theory, Inference, and Learning Algorithms The classic. Free PDF, over 600 pages. Idiosyncratic but illuminating: it connects info theory and Bayesian inference in a way no other book does.
- Olah — Visual Information Theory The visual intuition for entropy and KL divergence. Read it once and the formulas stop being confusing.
- MacKay lectures (Cambridge) David MacKay teaching the contents of his book. Lectures are excellent and free on YouTube.
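The link between these quantities and ML loss functions can be seen in a few lines. The sketch below computes entropy, cross-entropy, and KL divergence in bits for two small discrete distributions and checks the identity cross-entropy = entropy + KL. The distributions p and q are made-up values for illustration.

```python
import math

def entropy(p):
    """H(p) = -sum p_i * log2(p_i), skipping zero-probability outcomes."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p_i * log2(q_i); assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum p_i * log2(p_i / q_i); same support assumption as above."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]   # "true" distribution (illustrative)
q = [0.25, 0.25, 0.5]   # model's predicted distribution (illustrative)

h_p = entropy(p)             # 1.5 bits
h_pq = cross_entropy(p, q)   # 1.75 bits
kl_pq = kl_divergence(p, q)  # 0.25 bits
```

Since H(p) does not depend on the model, minimising cross-entropy over q is the same as minimising KL(p || q), which is one reason cross-entropy is such a common training loss.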
Cheat sheets & references
When you need a formula or identity, fast.
- The Matrix Cookbook Petersen & Pedersen. The definitive reference for matrix identities, derivatives, and inverses. Bookmark this; you'll come back.
- Stanford CS229 — Probability Cheatsheet Compact reference for distributions, expectations, common identities. Companion cheatsheets for linear algebra and the broader course are also worth grabbing.