Hyperparameter optimization

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters are learned.
The same kind of machine learning model can require different constraints, weights or learning rates to generalize different data patterns. These measures are called hyperparameters, and have to be tuned so that the model can optimally solve the machine learning problem. Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given independent data. The objective function takes a tuple of hyperparameters and returns the associated loss. Cross-validation is often used to estimate this generalization performance.
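In symbols, the problem can be written as below. The notation is our own and is not fixed by the text: Λ is the hyperparameter space, A_λ is the learning algorithm run with hyperparameters λ, L is the loss, and D_train and D_valid are the training and validation data.

```latex
\lambda^{*} = \operatorname*{arg\,min}_{\lambda \in \Lambda} f(\lambda),
\qquad
f(\lambda) = \mathcal{L}\bigl(A_{\lambda}(D_{\mathrm{train}}),\; D_{\mathrm{valid}}\bigr)

% With k-fold cross-validation, the data D are split into folds D_1, ..., D_k:
f_{\mathrm{CV}}(\lambda) = \frac{1}{k} \sum_{i=1}^{k}
  \mathcal{L}\bigl(A_{\lambda}(D \setminus D_i),\; D_i\bigr)
```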

Approaches

Grid search

The traditional way of performing hyperparameter optimization has been grid search, or a parameter sweep, which is simply an exhaustive searching through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by cross-validation on the training set or evaluation on a held-out validation set.
Since the parameter space of a machine learner may include real-valued or unbounded value spaces for certain parameters, manually set bounds and discretization may be necessary before applying grid search.
For example, a typical soft-margin SVM classifier equipped with an RBF kernel has at least two hyperparameters that need to be tuned for good performance on unseen data: a regularization constant C and a kernel hyperparameter γ. Both parameters are continuous, so to perform grid search, one selects a finite set of "reasonable" values for each.
Grid search then trains an SVM with each pair in the Cartesian product of these two sets and evaluates their performance on a held-out validation set. Finally, the grid search algorithm outputs the settings that achieved the highest score in the validation procedure.
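The procedure can be sketched in Python as follows. This is a minimal sketch rather than a reference implementation: the candidate values are illustrative assumptions, and `validation_score` is a hypothetical stand-in for training an SVM and scoring it on the validation set (here a toy function that is highest at C = 10, γ = 0.1).

```python
import math
from itertools import product

# Illustrative candidate values (assumed, not taken from the text),
# spaced on a logarithmic scale.
C_VALUES = [0.1, 1.0, 10.0, 100.0]
GAMMA_VALUES = [0.01, 0.1, 1.0]


def validation_score(C, gamma):
    """Hypothetical stand-in for 'train an SVM with (C, gamma) and score it on
    the held-out validation set'. This toy surface is highest at C=10, gamma=0.1."""
    return -((math.log10(C) - 1.0) ** 2 + (math.log10(gamma) + 1.0) ** 2)


def grid_search(score_fn, c_values, gamma_values):
    """Evaluate every pair in the Cartesian product and return the best one."""
    best_params, best_score = None, float("-inf")
    for C, gamma in product(c_values, gamma_values):
        score = score_fn(C, gamma)
        if score > best_score:
            best_params, best_score = (C, gamma), score
    return best_params, best_score


best_params, best_score = grid_search(validation_score, C_VALUES, GAMMA_VALUES)
print(best_params)  # (10.0, 0.1)
```

With 4 × 3 = 12 pairs this costs 12 training runs; adding a third hyperparameter with k candidate values multiplies the cost by k, so the number of runs grows exponentially with the number of hyperparameters.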
Grid search suffers from the curse of dimensionality, but is often embarrassingly parallel because the hyperparameter settings it evaluates are typically independent of each other.
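Because each grid point is scored independently, the evaluations can be dispatched concurrently. The minimal sketch below uses Python's standard `concurrent.futures`; `toy_score` is a hypothetical placeholder for a full training run. Threads keep the example self-contained; for CPU-bound training in CPython, a `ProcessPoolExecutor` or a cluster scheduler would usually be used instead.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def toy_score(params):
    """Hypothetical validation score; in practice each call would train and
    evaluate a model, which is why running calls concurrently pays off."""
    C, gamma = params
    return -(C - 10.0) ** 2 - (gamma - 0.1) ** 2


def parallel_grid_search(score_fn, grid, max_workers=4):
    # Grid points do not depend on each other, so they can be scored in any
    # order on any worker; results are only compared once all are collected.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(score_fn, grid))
    best_index = max(range(len(grid)), key=scores.__getitem__)
    return grid[best_index], scores[best_index]


grid = list(product([1.0, 10.0, 100.0], [0.01, 0.1, 1.0]))
best, _ = parallel_grid_search(toy_score, grid)
print(best)  # (10.0, 0.1)
```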

Random search

Random search replaces the exhaustive enumeration of all combinations by selecting them randomly. This can be applied directly to the discrete setting described above, but also generalizes to continuous and mixed spaces. It can outperform grid search, especially when only a small number of hyperparameters affects the final performance of the machine learning algorithm. In this case, the optimization problem is said to have a low intrinsic dimensionality. Random search is also embarrassingly parallel, and additionally allows the inclusion of prior knowledge by specifying the distribution from which to sample.
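A minimal sketch of random search over a mixed space follows. The ranges, the log-uniform prior, the `kernel` choice and the `toy_score` objective are all illustrative assumptions; `toy_score` deliberately depends on C alone to mimic low intrinsic dimensionality.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Sample from a log-uniform distribution on [low, high]; choosing this
    prior encodes the belief that the order of magnitude matters most."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def toy_score(params):
    """Hypothetical validation score with low intrinsic dimensionality:
    only C matters, and the best value is C = 10."""
    return -abs(math.log10(params["C"]) - 1.0)


def random_search(score_fn, n_trials, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # A mixed search space: two continuous ranges and one categorical choice.
        params = {
            "C": sample_log_uniform(rng, 1e-2, 1e3),
            "gamma": sample_log_uniform(rng, 1e-4, 1e1),
            "kernel": rng.choice(["rbf", "poly"]),
        }
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(toy_score, n_trials=50)
```

Each of the 50 trials tries a distinct value of C, whereas a grid with the same budget over three hyperparameters could test fewer than four distinct values of each.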

Bayesian optimization

Bayesian optimization is a global optimization method for noisy black-box functions. Applied to hyperparameter optimization, Bayesian optimization builds a probabilistic model of the function mapping from hyperparameter values to the objective evaluated on a validation set. By iteratively evaluating a promising hyperparameter configuration based on the current model, and then updating the model, Bayesian optimization aims to gather observations that reveal as much information as possible about this function and, in particular, the location of the optimum. It tries to balance exploration (hyperparameters whose outcome is most uncertain) and exploitation (hyperparameters expected to be close to the optimum). In practice, Bayesian optimization has been shown to obtain better results in fewer evaluations than grid search and random search, because it can reason about the quality of experiments before they are run.
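The loop can be sketched for a single hyperparameter rescaled to [0, 1]. This minimal sketch makes several assumptions not fixed by the text: the probabilistic model is a Gaussian process with a squared-exponential kernel of fixed length scale, the next point is chosen by the expected-improvement criterion over a grid of candidates, and `toy_validation_loss` is a hypothetical objective. Practical libraries fit the kernel parameters and optimize the acquisition function far more carefully.

```python
import math
import random


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(K):
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(K[i][i] - s, 1e-12))
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs, ys, noise=1e-6):
    """Return a function giving the GP posterior mean and std at a point."""
    K = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))  # alpha = K^{-1} y

    def predict(x):
        k = [rbf_kernel(x, a) for a in xs]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve_lower(L, k)
        var = max(rbf_kernel(x, x) - sum(vi * vi for vi in v), 1e-12)
        return mean, math.sqrt(var)

    return predict


def expected_improvement(mean, std, best):
    """EI for minimization: large where the mean is low or the std is high."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective, n_init=3, n_iter=10, seed=0):
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]  # random initial design
    ys = [objective(x) for x in xs]
    candidates = [i / 200 for i in range(201)]
    for _ in range(n_iter):
        predict = fit_gp(xs, ys)  # update the model with all observations
        best = min(ys)
        # Evaluate the candidate the model currently considers most promising.
        x_next = max(candidates, key=lambda c: expected_improvement(*predict(c), best))
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


def toy_validation_loss(x):
    """Hypothetical validation loss of a rescaled hyperparameter, minimal at 0.3."""
    return (x - 0.3) ** 2


x_best, loss_best = bayes_opt(toy_validation_loss)
```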

Gradient-based optimization

For specific learning algorithms, it is possible to compute the gradient with respect to hyperparameters and then optimize the hyperparameters using gradient descent. The first usage of these techniques was focused on neural networks. Since then, these methods have been extended to other models such as support vector machines or logistic regression.
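As a minimal illustration (our own toy example, not taken from the text), consider one-dimensional ridge regression, whose trained weight has the closed form w(λ) = Σxy / (Σx² + λ). The chain rule then gives the derivative of the validation loss with respect to the regularization strength λ, and plain gradient descent can tune λ. The data are made up; for them the best λ is 6.125, where w(λ) = 0.8 fits the validation points exactly.

```python
# Made-up training and validation data for a model y ≈ w * x.
X_TRAIN, Y_TRAIN = [1.0, 2.0, 3.0], [1.5, 2.5, 3.2]
X_VALID, Y_VALID = [1.0, 2.0], [0.8, 1.6]

SXX = sum(x * x for x in X_TRAIN)                   # 14.0
SXY = sum(x * y for x, y in zip(X_TRAIN, Y_TRAIN))  # 16.1 up to rounding


def trained_weight(lam):
    """Closed-form minimizer of sum((w*x - y)^2) + lam * w^2 on the training data."""
    return SXY / (SXX + lam)


def hypergradient(lam):
    """d(validation loss)/d(lam), by the chain rule through w(lam)."""
    w = trained_weight(lam)
    dw_dlam = -SXY / (SXX + lam) ** 2
    dloss_dw = sum(2 * (w * x - y) * x for x, y in zip(X_VALID, Y_VALID))
    return dloss_dw * dw_dlam


def tune_lambda(lam=0.0, lr=10.0, steps=300):
    for _ in range(steps):
        # Gradient step on the hyperparameter, clipped to keep lam >= 0.
        lam = max(lam - lr * hypergradient(lam), 0.0)
    return lam


best_lam = tune_lambda()
```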
Another way to obtain a gradient with respect to hyperparameters is to differentiate through the steps of an iterative optimization algorithm using automatic differentiation.
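A toy one-dimensional ridge-regression problem (made-up data; our own illustration, not a specific library's method) shows the idea. The sketch unrolls plain gradient descent on the training loss and propagates dw/dλ through every update by hand, which is what forward-mode automatic differentiation would do mechanically. With enough steps the result matches the hypergradient of the exact ridge solution.

```python
# Made-up data for a toy ridge-regression problem y ≈ w * x.
X_TRAIN, Y_TRAIN = [1.0, 2.0, 3.0], [1.5, 2.5, 3.2]
X_VALID, Y_VALID = [1.0, 2.0], [0.8, 1.6]


def unrolled_hypergradient(lam, eta=0.02, steps=200):
    """Run gradient descent on the training loss sum((w*x - y)^2) + lam*w^2 and
    carry dw/dlam alongside w (forward-mode differentiation of each update)."""
    w, dw = 0.0, 0.0  # w_0 = 0 does not depend on lam, so dw_0/dlam = 0
    sxx = sum(x * x for x in X_TRAIN)
    sxy = sum(x * y for x, y in zip(X_TRAIN, Y_TRAIN))
    for _ in range(steps):
        # Update rule: w <- w - eta * (2*w*sxx - 2*sxy + 2*lam*w).
        # Differentiate it w.r.t. lam, using the old w and dw (product rule on lam*w).
        dw_next = dw - eta * (2 * sxx * dw + 2 * w + 2 * lam * dw)
        w = w - eta * (2 * w * sxx - 2 * sxy + 2 * lam * w)
        dw = dw_next
    # Chain rule through the validation loss sum((w*x - y)^2).
    dloss_dw = sum(2 * (w * x - y) * x for x, y in zip(X_VALID, Y_VALID))
    return dloss_dw * dw
```

Reverse-mode differentiation of the unrolled iterations computes the same quantity with one backward pass for any number of hyperparameters, but it must store or recompute the intermediate iterates; forward mode needs no such storage but costs one tangent per hyperparameter.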

Evolutionary optimization

Evolutionary optimization is a methodology for the global optimization of noisy black-box functions. In hyperparameter optimization, evolutionary optimization uses evolutionary algorithms to search the space of hyperparameters for a given algorithm. Evolutionary hyperparameter optimization follows a process inspired by the biological concept of evolution:

1. Create an initial population of random solutions
2. Evaluate the hyperparameter tuples and compute their fitness
3. Rank the hyperparameter tuples by their relative fitness
4. Replace the worst-performing hyperparameter tuples with new hyperparameter tuples generated through crossover and mutation
5. Repeat steps 2-4 until satisfactory algorithm performance is reached or algorithm performance is no longer improving
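The steps above can be sketched as a small genetic algorithm. The two hyperparameters (a learning rate and a tree depth), their ranges, the uniform-crossover and mutation operators, and the `fitness` function are all hypothetical choices made for illustration; step 5 is simplified to a fixed number of generations.

```python
import random


def fitness(params):
    """Hypothetical fitness (higher is better) standing in for validation accuracy.
    The toy optimum is learning_rate=0.1, depth=6."""
    lr, depth = params
    return -100 * (lr - 0.1) ** 2 - (depth - 6) ** 2


def random_individual(rng):
    return (rng.uniform(0.001, 1.0), rng.randint(1, 12))


def crossover(rng, a, b):
    # Uniform crossover: each hyperparameter is inherited from either parent.
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(rng, ind, rate=0.3):
    lr, depth = ind
    if rng.random() < rate:
        lr = min(max(lr * rng.uniform(0.5, 1.5), 0.001), 1.0)
    if rng.random() < rate:
        depth = min(max(depth + rng.choice([-1, 1]), 1), 12)
    return (lr, depth)


def evolve(pop_size=20, generations=30, n_replace=10, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]  # step 1
    for _ in range(generations):                                      # step 5
        population.sort(key=fitness, reverse=True)                    # steps 2 and 3
        survivors = population[: pop_size - n_replace]
        children = [                                                  # step 4
            mutate(rng, crossover(rng, *rng.sample(survivors, 2)))
            for _ in range(n_replace)
        ]
        population = survivors + children
    return max(population, key=fitness)


best = evolve()
```

Keeping the best tuples unchanged between generations (elitism) guarantees that the best fitness found never decreases.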

Evolutionary optimization has been used in hyperparameter optimization for statistical machine learning algorithms, automated machine learning, deep neural network architecture search, as well as training of the weights in deep neural networks.

Population-based

Population Based Training learns both hyperparameter values and network weights. Multiple learning processes operate independently, using different hyperparameters. Poorly performing models are iteratively replaced with models that adopt modified hyperparameter values and weights from a better performer. The modification allows the hyperparameters to evolve and eliminates the need for manual hyperparameter tuning. The process makes no assumptions regarding model architecture, loss functions or training procedures.
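A minimal sketch of this replace-and-perturb loop follows. Everything in it is illustrative: `train_step` is a hypothetical stand-in for a few steps of real training, the "weights" are a single number, and copying the bottom quarter of workers from the top quarter with a learning-rate perturbation factor of 0.8 or 1.2 is one possible choice rather than the only one.

```python
import random


def train_step(worker):
    """Hypothetical partial training: progress per step is largest when the
    learning rate is near a toy sweet spot of 0.05."""
    progress = 1.0 / (1.0 + abs(worker["lr"] - 0.05) * 50)
    worker["weights"] += progress
    worker["score"] = worker["weights"]


def pbt(n_workers=8, rounds=20, steps_per_round=5, seed=0):
    rng = random.Random(seed)
    workers = [{"lr": 10 ** rng.uniform(-4, 0), "weights": 0.0, "score": 0.0}
               for _ in range(n_workers)]
    for _ in range(rounds):
        # Each worker trains independently with its own hyperparameters.
        for w in workers:
            for _ in range(steps_per_round):
                train_step(w)
        workers.sort(key=lambda w: w["score"], reverse=True)
        quarter = max(1, n_workers // 4)
        for loser, winner in zip(workers[-quarter:], workers[:quarter]):
            # Replace a poor performer with a copy of a better one (weights too).
            loser["weights"] = winner["weights"]
            loser["score"] = winner["score"]
            # Modify the inherited hyperparameter so the population keeps exploring.
            loser["lr"] = winner["lr"] * rng.choice([0.8, 1.2])
    return max(workers, key=lambda w: w["score"])


best = pbt()
```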

Others

Radial basis function (RBF) and spectral approaches have also been developed.

Open-source software

Grid search

is a Kubernetes-native system which includes grid search.
scikit-learn is a Python package which includes grid search.
is a Python library for distributed hyperparameter tuning and supports grid search.
includes grid search for Keras.
provides grid search over algorithms in the H2O open source machine learning library.
Random search
, also via and , are Python packages which include random search.
is a Kubernetes-native system which includes random search.
scikit-learn is a Python package which includes random search.
is a Python library for distributed hyperparameter tuning and supports random search over arbitrary parameter distributions.
includes a customizable random search for Keras.
Bayesian
is a Bayesian hyperparameter optimization layer on top of scikit-learn.
is a Python-based experimentation platform that supports Bayesian optimization and bandit optimization as exploration strategies.
is a Matlab package which uses semidefinite programming for minimizing a black-box function over discrete inputs. A Python 3 implementation is also included.
is a Python package which combines Bayesian optimization with bandit-based methods.
is a Kubernetes-native system which includes Bayesian optimization.
, also with , is an R package for model-based/Bayesian optimization of black-box functions.
is a Python package for sequential model-based optimization with a scipy.optimize interface.
SMAC is a Python/Java library implementing Bayesian optimization.
is an R package for tuning random forests using model-based optimization.
is a Python package for black box optimization, compatible with arbitrary functions that need to be optimized.
Gradient-based optimization
is a Python package containing TensorFlow implementations and wrappers for gradient-based hyperparameter optimization with forward and reverse mode algorithmic differentiation.
is an open-source software library which provides a gradient boosting framework for C++, Java, Python, R, and Julia.
Evolutionary
is a Python framework for general evolutionary computation which is flexible and integrates with parallelization packages like and pyspark, and other Python frameworks like sklearn via .
is a Python package that performs Deep Neural Network architecture search using genetic programming.
is a Python package which includes differential evolution, evolution strategies, Bayesian optimization, population control methods for the noisy case, and particle swarm optimization.
is a Python library for distributed hyperparameter tuning and leverages for evolutionary algorithm support.
Other
dlib is a C++ package with a Python API which has a parameter-free optimizer based on and trust region optimizers working in tandem.
is a Python library for hyperparameter tuning execution and integrates with/scales many existing hyperparameter optimization libraries such as , , and .
is a Python package for spectral hyperparameter optimization.
, also via and , are Python packages which include Tree of Parzen Estimators based distributed hyperparameter optimization.
is a Kubernetes-native system which includes grid search, random search, Bayesian optimization, hyperband, and NAS based on reinforcement learning.
is a Python package for gradient-free optimization using techniques such as differential evolution, sequential quadratic programming, fastGA, covariance matrix adaptation, population control methods, and particle swarm optimization.
is a Python package which includes hyperparameter tuning for neural networks in local and distributed environments. Its techniques include TPE, random, anneal, evolution, SMAC, batch, grid, and hyperband.
is a similar Python package which includes several techniques: grid search, Bayesian optimization, and genetic optimization.
is a Python implementation of Covariance Matrix Adaptation Evolution Strategy.
is a Python package that uses a radial basis function model.
Commercial services
uses Gaussian processes to tune hyperparameters.
supports mixed search domains
supports mixed search domains
supports multiobjective, multifidelity and constraint optimization
supports mixed search domains, multiobjective, constraints, parallel optimization and surrogate models.
supports mixed search domains, multiobjective, multisolution, multifidelity, constraint, and parallel optimization.
