Grid search. The default beginner choice. Wasteful: hyperparameters are rarely equally important, yet a grid spends the same effort on every axis, and the number of trials grows exponentially with the number of hyperparameters. Reasonable only with ≤ 3 hyperparameters.
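Random search. Sample N points independently from a distribution over each hyperparameter (uniform, or log-uniform for scale parameters such as the learning rate). Bergstra & Bengio (2012) showed that this is more efficient than grid search when only a few of the hyperparameters really matter, because every trial tests a new value on each axis instead of repeating the same handful of grid values. Surprisingly hard to beat in practice when N is large enough.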
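To make the comparison between grid and random search concrete, here is a minimal sketch that spends the same budget of 9 trials both ways and counts how many distinct values each hyperparameter actually gets tested at. The two hyperparameters (a learning rate and a weight decay), their ranges, and the helper names are illustrative assumptions, not taken from any particular library.

```python
import itertools
import random


def grid_search_points(axes):
    """Cartesian product of the per-axis value lists."""
    return list(itertools.product(*axes))


def random_search_points(log10_bounds, n, seed=0):
    """Draw n points; each coordinate is log-uniform within its bounds."""
    rng = random.Random(seed)
    return [
        tuple(10 ** rng.uniform(lo, hi) for lo, hi in log10_bounds)
        for _ in range(n)
    ]


def distinct_per_axis(points):
    """Number of distinct values tried for each hyperparameter."""
    return [len({p[i] for p in points}) for i in range(len(points[0]))]


# Hypothetical search space: (learning rate, weight decay), 9 trials each way.
grid = grid_search_points([[1e-4, 1e-3, 1e-2], [1e-6, 1e-4, 1e-2]])
rand = random_search_points([(-4, -2), (-6, -2)], n=9)

print(distinct_per_axis(grid))  # [3, 3]
print(distinct_per_axis(rand))
```

If only the learning rate matters, the grid has effectively tested 3 settings, while the random sample has tested 9.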
Bayesian optimisation. Build a surrogate model of the objective (Gaussian process, random forest, or tree-structured Parzen estimator); pick the next trial by maximising an acquisition function such as expected improvement. Wins when each trial is expensive. Tools: Optuna, scikit-optimize, BoTorch.
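As an illustration of the acquisition step, the sketch below computes expected improvement for a minimisation problem, assuming the surrogate returns a Gaussian posterior with mean mu and standard deviation sigma at a candidate point. The function name, the optional margin xi, and the two candidate predictions are assumptions made for the example; real libraries wrap this in their own APIs.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` (lower is better) under N(mu, sigma^2)."""
    gain = best - mu - xi
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(gain, 0.0)
    z = gain / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return gain * cdf + sigma * pdf


# Hypothetical surrogate predictions (mean, std) for two candidate configurations.
best_loss = 0.20
candidates = {
    "safe": (0.19, 0.01),       # slightly better mean, low uncertainty
    "uncertain": (0.25, 0.10),  # worse mean, high uncertainty
}
scores = {
    name: expected_improvement(mu, sigma, best_loss)
    for name, (mu, sigma) in candidates.items()
}
next_trial = max(scores, key=scores.get)
print(next_trial)
```

Here the uncertain candidate scores higher despite its worse predicted mean, which is how expected improvement trades exploration against exploitation.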
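Hyperband / ASHA. Start many trials, but kill the bad ones early. Train every configuration with a small budget; promote the top-k to a larger budget; repeat. ASHA does the promotion asynchronously, so workers never sit idle waiting for a whole round to finish. Often the most cost-efficient strategy for deep learning.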
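The train, promote, repeat loop can be sketched as synchronous successive halving, the building block that Hyperband repeats with different starting budgets and that ASHA runs asynchronously. This is a simplified sketch, not either algorithm in full: the evaluate(config, budget) callback, the reduction factor eta=3, and the toy loss are all assumptions made for the example.

```python
import math


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configs each round, multiplying the budget by eta.

    evaluate(config, budget) must return a loss (lower is better).
    Returns the winning config and a list of (budget, n_configs) per round.
    """
    survivors = list(configs)
    budget = min_budget
    history = []
    while len(survivors) > 1:
        history.append((budget, len(survivors)))
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], history


def toy_loss(lr, budget):
    """Hypothetical objective: best at lr = 1e-3, improving with more budget."""
    return (math.log10(lr) + 3.0) ** 2 + 1.0 / budget


# Nine learning rates from 1e-5 to 1e-1, half a decade apart.
learning_rates = [10 ** (-5 + 0.5 * i) for i in range(9)]
best_lr, history = successive_halving(learning_rates, toy_loss)
print(best_lr, history)
```

With 9 configurations and eta=3, the rounds run 9 configurations at budget 1, then 3 at budget 3, leaving a single survivor.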
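Population-based training (PBT). Train many models in parallel; periodically replace the worst performers with copies of the best ones' weights and hyperparameters, then perturb the copied hyperparameters. Adapts hyperparameters during training, so the result is a hyperparameter schedule rather than a single fixed setting.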
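A single exploit-and-explore step of PBT might look like the sketch below. It assumes each population member is a plain dict with a score (higher is better), weights, and numeric hyperparameters; the bottom and top fractions and the 0.8 / 1.2 perturbation factors are illustrative choices, and the training that happens between steps is left out.

```python
import copy
import random


def pbt_step(population, rng, frac=0.25, factors=(0.8, 1.2)):
    """One PBT update: the bottom `frac` copy a top performer, then perturb.

    Assumes the population is large enough that top and bottom do not overlap.
    """
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: inherit the winner's weights.
        loser["weights"] = copy.deepcopy(winner["weights"])
        # Explore: inherit the winner's hyperparameters, each scaled randomly.
        loser["hparams"] = {
            name: value * rng.choice(factors)
            for name, value in winner["hparams"].items()
        }
    return population


# Hypothetical population of four models, identified by their weights.
population = [
    {"score": 0.90, "weights": [1.0, 2.0], "hparams": {"lr": 1e-3}},
    {"score": 0.85, "weights": [0.5, 1.5], "hparams": {"lr": 3e-3}},
    {"score": 0.70, "weights": [0.1, 0.2], "hparams": {"lr": 1e-2}},
    {"score": 0.40, "weights": [9.0, 9.0], "hparams": {"lr": 1e-1}},
]
pbt_step(population, random.Random(0))
print(population[3])
```

In a full run, every member would keep training after this step and the step would be repeated at a fixed interval.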