Stability (learning theory)

Stability, also known as algorithmic stability, is a notion in computational learning theory of how a machine learning algorithm is perturbed by small changes to its inputs. A stable learning algorithm is one for which the prediction does not change much when the training data is modified slightly. For instance, consider a machine learning algorithm that is being trained to recognize handwritten letters of the alphabet, using 1000 examples of handwritten letters and their labels as a training set. One way to modify this training set is to leave out an example, so that only 999 examples of handwritten letters and their labels are available. A stable learning algorithm would produce a similar classifier with both the 1000-element and 999-element training sets.
Stability can be studied for many types of learning problems, from language learning to inverse problems in physics and engineering, as it is a property of the learning process rather than the type of information being learned. The study of stability gained importance in computational learning theory in the 2000s when it was shown to have a connection with generalization. It was shown that for large classes of learning algorithms, notably empirical risk minimization algorithms, certain types of stability ensure good generalization.

History

A central goal in designing a machine learning system is to guarantee that the learning algorithm will generalize, or perform accurately on new examples after being trained on a finite number of them. In the 1990s, milestones were reached in obtaining generalization bounds for supervised learning algorithms. The technique historically used to prove generalization was to show that an algorithm was consistent, using the uniform convergence properties of empirical quantities to their means. This technique was used to obtain generalization bounds for the large class of empirical risk minimization (ERM) algorithms. An ERM algorithm is one that selects a solution from a hypothesis space in such a way as to minimize the empirical error on a training set.
A general result, proved by Vladimir Vapnik for ERM binary classification algorithms, is that for any target function and input distribution, any hypothesis space $H$ with VC-dimension $d$, and $m$ training examples, the algorithm is consistent and will produce a training error that is at most $O\left(\sqrt{d/m}\right)$ (plus logarithmic factors) from the true error. The result was later extended to almost-ERM algorithms with function classes that do not have unique minimizers.
Vapnik's work, using what became known as VC theory, established a relationship between generalization of a learning algorithm and properties of the hypothesis space of functions being learned. However, these results could not be applied to algorithms with hypothesis spaces of unbounded VC-dimension. Put another way, these results could not be applied when the information being learned had a complexity that was too large to measure. Some of the simplest machine learning algorithms—for instance, for regression—have hypothesis spaces with unbounded VC-dimension. Another example is language learning algorithms that can produce sentences of arbitrary length.
Stability analysis was developed in the 2000s for computational learning theory and is an alternative method for obtaining generalization bounds. The stability of an algorithm is a property of the learning process, rather than a direct property of the hypothesis space, and it can be assessed in algorithms that have hypothesis spaces with unbounded or undefined VC-dimension, such as nearest neighbor. A stable learning algorithm is one for which the learned function does not change much when the training set is slightly modified, for instance by leaving out an example. The leave-one-out error, as computed in leave-one-out cross-validation (CVloo), is used to evaluate a learning algorithm's stability with respect to the loss function. As such, stability analysis is the application of sensitivity analysis to machine learning.

Summary of classic results

Early 1900s - Stability in learning theory was first described in terms of continuity of the learning map, traced to Andrey Nikolayevich Tikhonov.
1979 - Devroye and Wagner observed that the leave-one-out behavior of an algorithm is related to its sensitivity to small changes in the sample.
1999 - Kearns and Ron discovered a connection between finite VC-dimension and stability.
2002 - In a landmark paper, Bousquet and Elisseeff proposed the notion of uniform hypothesis stability of a learning algorithm and showed that it implies low generalization error. Uniform hypothesis stability, however, is a strong condition that does not apply to large classes of algorithms, including ERM algorithms with a hypothesis space of only two functions.
2002 - Kutin and Niyogi extended Bousquet and Elisseeff's results by providing generalization bounds for several weaker forms of stability which they called almost-everywhere stability. Furthermore, they took an initial step in establishing the relationship between stability and consistency in ERM algorithms in the Probably Approximately Correct setting.
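2004 - Poggio et al. proved a general relationship between stability and ERM consistency. They proposed a statistical form of leave-one-out stability and showed that it is a) sufficient for generalization in bounded loss classes, and b) necessary and sufficient for consistency of ERM algorithms for certain loss functions.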
2010 - Shalev-Shwartz et al. noticed problems with the original results of Vapnik due to the complex relations between hypothesis space and loss class. They discuss stability notions that capture different loss classes and different types of learning, supervised and unsupervised.

Preliminary definitions

We define several terms related to learning algorithms and training sets, so that we can then define stability in multiple ways and present theorems from the field.
A machine learning algorithm, also known as a learning map $L$, maps a training data set, which is a set of labeled examples $(x, y)$, onto a function $f$ from $X$ to $Y$, where $X$ and $Y$ are the spaces of inputs and labels of the training examples. The functions $f$ are selected from a hypothesis space of functions called $H$.
The training set from which an algorithm learns is defined as
$S = \{z_1 = (x_1, y_1), \ldots, z_m = (x_m, y_m)\}$
and is of size $m$ in $Z = X \times Y$,
drawn i.i.d. from an unknown distribution D.
Thus, the learning map $L$ is defined as a mapping from $Z^m$ into $H$, mapping a training set $S$ onto a function $f_S$ from $X$ to $Y$. Here, we consider only deterministic algorithms where $L$ is symmetric with respect to $S$, i.e. it does not depend on the order of the elements in the training set. Furthermore, we assume that all functions are measurable and all sets are countable.
The loss $V$ of a hypothesis $f$ with respect to an example $z = (x, y)$ is then defined as $V(f, z) = V(f(x), y)$.
The empirical error of $f$ is $I_S[f] = \frac{1}{m} \sum_{i=1}^{m} V(f, z_i)$.
The true error of $f$ is $I[f] = \mathbb{E}_z V(f, z)$.
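As an illustration of the last two definitions, the following is a minimal sketch that computes the empirical error on a training set and a Monte Carlo estimate of the true error. The square loss, the data distribution in `sample_z`, and the hand-picked hypothesis `f` are assumptions made for this example only; they are not part of the definitions above.

```python
import random

def square_loss(f, z):
    """V(f, z) = (f(x) - y)^2 for an example z = (x, y)."""
    x, y = z
    return (f(x) - y) ** 2

def empirical_error(f, S, V=square_loss):
    """Average loss of f over the training set S (the empirical error)."""
    return sum(V(f, z) for z in S) / len(S)

def estimate_true_error(f, sample_z, V=square_loss, n_draws=20_000, seed=0):
    """Monte Carlo estimate of E_z V(f, z) using fresh draws from D."""
    rng = random.Random(seed)
    return sum(V(f, sample_z(rng)) for _ in range(n_draws)) / n_draws

def sample_z(rng):
    # Hypothetical distribution D: x uniform on [-1, 1], y = 2x + Gaussian noise.
    x = rng.uniform(-1.0, 1.0)
    return (x, 2.0 * x + rng.gauss(0.0, 0.1))

rng = random.Random(1)
S = [sample_z(rng) for _ in range(50)]   # training set of size m = 50
f = lambda x: 2.0 * x                    # a fixed hypothesis chosen by hand

print("empirical error:", empirical_error(f, S))
print("estimated true error:", estimate_true_error(f, sample_z))
```

In practice D is unknown, so the true error can only be estimated on held-out data; the sampler is available here only because the distribution was made up for the example.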
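Given a training set $S$ of size $m$, we will build, for all $i = 1, \ldots, m$, modified training sets as follows:

By removing the i-th element: $S^{|i} = \{z_1, \ldots, z_{i-1}, z_{i+1}, \ldots, z_m\}$
By replacing the i-th element: $S^i = \{z_1, \ldots, z_{i-1}, z_i', z_{i+1}, \ldots, z_m\}$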
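The two modified training sets can be sketched in a few lines. The helper names `remove_ith` and `replace_ith` are hypothetical, and the code uses 0-based indices where the text counts from 1.

```python
def remove_ith(S, i):
    """Copy of S with the example at (0-based) index i removed."""
    return S[:i] + S[i + 1:]

def replace_ith(S, i, z_new):
    """Copy of S with the example at (0-based) index i replaced by z_new."""
    return S[:i] + [z_new] + S[i + 1:]

S = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1)]
print(remove_ith(S, 1))                # [(0.0, 0.1), (2.0, 2.1)]
print(replace_ith(S, 1, (1.5, 1.4)))   # [(0.0, 0.1), (1.5, 1.4), (2.0, 2.1)]
```

Both helpers return new lists and leave `S` untouched, so the same training set can be reused for every index.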

Definitions of stability

Hypothesis Stability

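An algorithm $L$ has hypothesis stability β with respect to the loss function V if the following holds:
$\forall i \in \{1, \ldots, m\},\ \mathbb{E}_{S,z}\left[\,|V(f_S, z) - V(f_{S^{|i}}, z)|\,\right] \leq \beta$,
where $f_S$ is the function learned from the training set $S$ and $S^{|i}$ is $S$ with its i-th example removed.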

Point-wise Hypothesis Stability

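An algorithm $L$ has point-wise hypothesis stability β with respect to the loss function V if the following holds:
$\forall i \in \{1, \ldots, m\},\ \mathbb{E}_{S}\left[\,|V(f_S, z_i) - V(f_{S^{|i}}, z_i)|\,\right] \leq \beta$.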

Error Stability

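An algorithm $L$ has error stability β with respect to the loss function V if the following holds:
$\forall S \in Z^m,\ \forall i \in \{1, \ldots, m\},\ \left|\mathbb{E}_z[V(f_S, z)] - \mathbb{E}_z[V(f_{S^{|i}}, z)]\right| \leq \beta$.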

Uniform Stability

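An algorithm $L$ has uniform stability β with respect to the loss function V if the following holds:
$\forall S \in Z^m,\ \forall i \in \{1, \ldots, m\},\ \sup_{z \in Z} |V(f_S, z) - V(f_{S^{|i}}, z)| \leq \beta$.
A probabilistic version of uniform stability β is:
$\forall i \in \{1, \ldots, m\},\ \mathbb{P}_S\left\{\sup_{z \in Z} |V(f_S, z) - V(f_{S^{|i}}, z)| \leq \beta\right\} \geq 1 - \delta$.
An algorithm is said to be stable when the value of $\beta$ decreases as $O(\frac{1}{m})$.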
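The $O(\frac{1}{m})$ behaviour can be probed numerically. The sketch below is only an empirical proxy, not a proof: it assumes a one-dimensional Tikhonov-regularized least squares learner without intercept, a made-up bounded data distribution, and it replaces the supremum over all examples by a maximum over a finite grid. For each training set it records the largest change in loss caused by removing a single example.

```python
import random

def fit_ridge(S, lam):
    """1-D regularized least squares without intercept:
    minimizes (1/m) * sum((w*x - y)^2) + lam * w^2 and returns the weight w."""
    m = len(S)
    sxy = sum(x * y for x, y in S)
    sxx = sum(x * x for x, _ in S)
    return sxy / (sxx + lam * m)

def loss(w, z):
    """Square loss of the linear predictor x -> w*x on the example z = (x, y)."""
    x, y = z
    return (w * x - y) ** 2

def empirical_uniform_beta(S, lam, grid):
    """Max over removed index i and grid points z of the change in loss."""
    w_full = fit_ridge(S, lam)
    beta = 0.0
    for i in range(len(S)):
        w_loo = fit_ridge(S[:i] + S[i + 1:], lam)   # trained without example i
        beta = max(beta, max(abs(loss(w_full, z) - loss(w_loo, z)) for z in grid))
    return beta

rng = random.Random(0)

def draw(m):
    # Hypothetical data: x uniform on [-1, 1], y = 2x + bounded noise.
    S = []
    for _ in range(m):
        x = rng.uniform(-1, 1)
        S.append((x, 2 * x + rng.uniform(-0.5, 0.5)))
    return S

# Finite grid standing in for the supremum over z = (x, y).
grid = [(i / 10, j / 4) for i in range(-10, 11) for j in range(-10, 11)]

betas = {m: empirical_uniform_beta(draw(m), lam=1.0, grid=grid) for m in (25, 100, 400)}
for m, b in betas.items():
    # If beta behaves like O(1/m), the product m * beta stays roughly constant.
    print(m, round(b, 4), round(m * b, 2))
```

The grid maximum only lower-bounds the true supremum, and a single random training set per size stands in for "all $S$", so the printed numbers should be read as a sanity check on the scaling rather than as a stability constant.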

Leave-one-out cross-validation (CVloo) Stability

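An algorithm $L$ has CVloo stability β with respect to the loss function V if the following holds:
$\forall i \in \{1, \ldots, m\},\ \mathbb{P}_S\left\{|V(f_S, z_i) - V(f_{S^{|i}}, z_i)| \leq \beta_{CV}\right\} \geq 1 - \delta_{CV}$.
The definition of CVloo stability is equivalent to the point-wise hypothesis stability seen earlier.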
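The quantity inside the CVloo condition is the change in loss at a training point when that same point is left out. A minimal sketch, assuming a 1-nearest-neighbour classifier on one-dimensional inputs, the 0-1 loss, and synthetic data with 10% label noise (all choices made for illustration), computes this difference for every index of a single training set:

```python
import random

def one_nn(S):
    """Return a 1-nearest-neighbour classifier built from S = [(x, y), ...]."""
    def f(x):
        return min(S, key=lambda z: abs(z[0] - x))[1]
    return f

def zero_one_loss(f, z):
    x, y = z
    return 0 if f(x) == y else 1

def cvloo_differences(S, learn, V):
    """Change in loss at z_i between training on S and on S without z_i, for each i."""
    f_full = learn(S)
    return [abs(V(f_full, z) - V(learn(S[:i] + S[i + 1:]), z))
            for i, z in enumerate(S)]

rng = random.Random(0)
S = []
for _ in range(200):
    x = rng.uniform(-1, 1)
    y = 1 if x > 0 else 0
    if rng.random() < 0.1:      # hypothetical 10% label noise
        y = 1 - y
    S.append((x, y))

diffs = cvloo_differences(S, one_nn, zero_one_loss)
frac_unchanged = sum(d == 0 for d in diffs) / len(diffs)
print("fraction of points whose loss is unchanged:", frac_unchanged)
```

The definition takes a probability over random training sets for each fixed index; averaging over the indices of one training set, as done here, is only a convenient stand-in for that probability.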

Expected-leave-one-out error ($Eloo_{err}$) Stability

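An algorithm $L$ has $Eloo_{err}$ stability if for each $m$ there exist a $\beta_{EL}^m$ and a $\delta_{EL}^m$ such that:
$\mathbb{P}_S\left\{\left|I[f_S] - \frac{1}{m} \sum_{i=1}^{m} V(f_{S^{|i}}, z_i)\right| \leq \beta_{EL}^m\right\} \geq 1 - \delta_{EL}^m$, with $\beta_{EL}^m$ and $\delta_{EL}^m$ going to zero for $m \to \infty$.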
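This condition compares the true error of the learned function with its average leave-one-out loss. The sketch below measures that gap for one random training set at two sizes. It assumes a one-dimensional regularized least squares learner, a made-up data distribution, and a Monte Carlo estimate in place of the true error, so it illustrates the quantity rather than verifying the definition.

```python
import random

def fit_ridge(S, lam):
    """1-D regularized least squares without intercept; returns the weight w."""
    m = len(S)
    return sum(x * y for x, y in S) / (sum(x * x for x, _ in S) + lam * m)

def sq_loss(w, z):
    x, y = z
    return (w * x - y) ** 2

def sample_z(rng):
    # Hypothetical distribution D: x uniform on [-1, 1], y = 2x + bounded noise.
    x = rng.uniform(-1, 1)
    return (x, 2 * x + rng.uniform(-0.5, 0.5))

def loo_error(S, lam):
    """Average over i of the loss at z_i of the model trained without z_i."""
    return sum(sq_loss(fit_ridge(S[:i] + S[i + 1:], lam), z)
               for i, z in enumerate(S)) / len(S)

def true_error(w, n_draws=20_000, seed=123):
    """Monte Carlo estimate of the expected loss of x -> w*x under D."""
    rng = random.Random(seed)
    return sum(sq_loss(w, sample_z(rng)) for _ in range(n_draws)) / n_draws

rng = random.Random(0)
gaps = {}
for m in (20, 200):
    S = [sample_z(rng) for _ in range(m)]
    gaps[m] = abs(true_error(fit_ridge(S, 0.01)) - loo_error(S, 0.01))
    print(m, gaps[m])
```

A single draw per size says nothing about the probability in the definition; repeating the experiment over many training sets would give an empirical picture of how often the gap stays below a chosen threshold.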

Classic theorems

From Bousquet and Elisseeff:
For symmetric learning algorithms with bounded loss, if the algorithm has Uniform Stability with the probabilistic definition above, then the algorithm generalizes.
Uniform Stability is a strong condition which is not met by all algorithms but is, surprisingly, met by the large and important class of Regularization algorithms.
The generalization bound is given in the article.
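For orientation, the bound is commonly quoted in the following form. This is a paraphrase from memory of the result for an algorithm with uniform stability $\beta$ and a loss bounded in $[0, M]$ (the symbol $M$ is introduced here and does not appear elsewhere in this article), so the constants should be checked against the original article. With probability at least $1 - \delta$ over the draw of $S$:

```latex
I[f_S] \;\le\; I_S[f_S] + 2\beta + \left(4 m \beta + M\right)\sqrt{\frac{\ln(1/\delta)}{2m}}
```

When $\beta$ decreases as $O(\frac{1}{m})$, the term $4 m \beta$ stays bounded and the gap between true and empirical error shrinks as $O(1/\sqrt{m})$, which is why that rate of decrease is the one singled out in the definition of a stable algorithm.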
From Mukherjee et al.:

For symmetric learning algorithms with bounded loss, if the algorithm has both Leave-one-out cross-validation Stability and Expected-leave-one-out error Stability as defined above, then the algorithm generalizes.
Neither condition alone is sufficient for generalization. However, both together ensure generalization.
For ERM algorithms specifically, Leave-one-out cross-validation Stability is both necessary and sufficient for consistency and generalization.

This is an important result for the foundations of learning theory, because it shows that two previously unrelated properties of an algorithm, stability and consistency, are equivalent for ERM.
The generalization bound is given in the article.

Algorithms that are stable

This is a list of algorithms that have been shown to be stable, and the article where the associated generalization bounds are provided.

Linear regression
k-NN classifier with a 0-1 loss function.
Support Vector Machine classification with a bounded kernel and where the regularizer is a norm in a Reproducing Kernel Hilbert Space. A large regularization constant leads to good stability.
Soft margin SVM classification.
Regularized Least Squares regression.
The minimum relative entropy algorithm for classification.
A version of bagging regularizers with the number $k$ of regressors increasing with the training set size $m$.
Multi-class SVM classification.
All learning algorithms with Tikhonov regularization satisfy the Uniform Stability criteria and are, thus, generalizable.
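The remark in the list that a large regularization constant leads to good stability can be checked numerically on a toy problem. The sketch below assumes a one-dimensional Tikhonov-regularized least squares learner, synthetic data, and a finite grid of test examples in place of a supremum. It records the largest change in loss caused by removing one training example, for several values of the regularization constant `lam` (a name chosen for this example).

```python
import random

def fit_ridge(S, lam):
    """Minimizes (1/m) * sum((w*x - y)^2) + lam * w^2 over the scalar weight w."""
    m = len(S)
    return sum(x * y for x, y in S) / (sum(x * x for x, _ in S) + lam * m)

def loss(w, z):
    x, y = z
    return (w * x - y) ** 2

def max_loss_change(S, lam, grid):
    """Largest change in loss on the grid when any single example is removed."""
    w_full = fit_ridge(S, lam)
    worst = 0.0
    for i in range(len(S)):
        w_loo = fit_ridge(S[:i] + S[i + 1:], lam)
        worst = max(worst, max(abs(loss(w_full, z) - loss(w_loo, z)) for z in grid))
    return worst

rng = random.Random(0)
S = []
for _ in range(50):
    # Hypothetical data: x uniform on [-1, 1], y = 2x + bounded noise.
    x = rng.uniform(-1, 1)
    S.append((x, 2 * x + rng.uniform(-0.5, 0.5)))

grid = [(i / 10, j / 4) for i in range(-10, 11) for j in range(-10, 11)]
betas = {lam: max_loss_change(S, lam, grid) for lam in (0.1, 10.0, 1000.0)}
for lam, b in betas.items():
    print(lam, b)
```

Stronger regularization makes the fit less sensitive to any one example, but it also pulls the weight toward zero and raises the training error, so stability alone does not determine which constant gives the best predictor.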
