Vapnik–Chervonenkis dimension

In Vapnik–Chervonenkis theory, the Vapnik–Chervonenkis dimension is a measure of the capacity of a space of functions that can be learned by a statistical classification algorithm. It is defined as the cardinality of the largest set of points that the algorithm can shatter. It was originally defined by Vladimir Vapnik and Alexey Chervonenkis.
Informally, the capacity of a classification model is related to how complicated it can be. For example, consider the thresholding of a high-degree polynomial: if the polynomial evaluates above zero, that point is classified as positive, otherwise as negative. A high-degree polynomial can be wiggly, so it can fit a given set of training points well. But one can expect that the classifier will make errors on other points, because it is too wiggly. Such a polynomial has a high capacity. A much simpler alternative is to threshold a linear function. This function may not fit the training set well, because it has a low capacity. This notion of capacity is made rigorous below.

Definitions

VC dimension of a set-family

Let $H$ be a set family and $C$ a set. Their intersection is defined as the following set family:
$$H \cap C := \{h \cap C \mid h \in H\}$$
We say that a set $C$ is shattered by $H$ if $H \cap C$ contains all the subsets of $C$, i.e.:
$$|H \cap C| = 2^{|C|}$$
The VC dimension $D$ of $H$ is the largest cardinality of sets shattered by $H$. If arbitrarily large subsets can be shattered, the VC dimension is $\infty$.
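For a small finite family, the definition can be checked by brute force. The sketch below is illustrative only: the names `trace`, `is_shattered` and `vc_dimension` are hypothetical, and it assumes a non-empty family over a finite universe. It tries subsets in increasing size and stops at the first size where nothing is shattered, which is valid because every subset of a shattered set is itself shattered.

```python
from itertools import combinations

def trace(family, C):
    """Return H ∩ C: the distinct sets h ∩ C for h in the family."""
    C = frozenset(C)
    return {frozenset(h) & C for h in family}

def is_shattered(family, C):
    """C is shattered iff the trace contains all 2^|C| subsets of C."""
    return len(trace(family, C)) == 2 ** len(set(C))

def vc_dimension(family, universe):
    """Largest |C| with C ⊆ universe shattered by a non-empty finite family."""
    d = 0
    for k in range(len(universe) + 1):
        if any(is_shattered(family, C) for C in combinations(universe, k)):
            d = k
        else:
            # Subsets of a shattered set are shattered, so no larger set can be.
            break
    return d

initial_segments = [set(), {1}, {1, 2}, {1, 2, 3}]
print(vc_dimension(initial_segments, [1, 2, 3]))  # 1
```

Exhaustive search is exponential in the size of the universe, so this is only useful for sanity-checking small examples.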

VC dimension of a classification model

A classification model $f$ with some parameter vector $\theta$ is said to shatter a set of data points $(x_1, x_2, \ldots, x_n)$ if, for all assignments of labels to those points, there exists a $\theta$ such that the model $f$ makes no errors when evaluating that set of data points.
The VC dimension of a model $f$ is the maximum number of points that can be arranged so that $f$ shatters them. More formally, it is the maximum cardinal $D$ such that some data point set of cardinality $D$ can be shattered by $f$.
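For one-dimensional models whose label pattern changes only when the parameter crosses finitely many known "critical" values, the quantifiers "for all labelings, there exists a $\theta$" can be checked exactly. The sketch below assumes exactly that; the helper names are hypothetical. It uses a threshold classifier ($1$ if $x > \theta$) and an interval classifier with an assumed fixed width of 4 ($1$ if $\theta \le x \le \theta + 4$) as illustrations. For a general model, a finite parameter grid can only confirm shattering, never refute it.

```python
from itertools import product

def candidate_params(critical):
    """One parameter per piece of the real line cut at the critical values:
    the values themselves, midpoints between them, and one beyond each end."""
    c = sorted(set(critical))
    mids = [(a + b) / 2 for a, b in zip(c, c[1:])]
    return [c[0] - 1, c[-1] + 1] + c + mids

def shatters(classify, params, points):
    """True if every 0/1 labeling of `points` is produced by some param."""
    realized = {tuple(classify(p, x) for x in points) for p in params}
    return all(lab in realized for lab in product((0, 1), repeat=len(points)))

def threshold(theta, x):
    return 1 if x > theta else 0

def interval(theta, x):
    # Width 4 is an assumption made for this illustration.
    return 1 if theta <= x <= theta + 4 else 0

def interval_critical(points):
    # Labels can only change where theta meets x or x - 4.
    return list(points) + [x - 4 for x in points]

print(shatters(threshold, candidate_params([5]), [5]))        # True
print(shatters(threshold, candidate_params([1, 2]), [1, 2]))  # False
print(shatters(interval, candidate_params(interval_critical([10, 12])), [10, 12]))  # True
```

Because these candidates hit every piece on which the labels are constant, a `False` here is a genuine proof for these two models.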

Examples

1. $f$ is a constant classifier (with no parameters). Its VC dimension is 0 since it cannot shatter even a single point. In general, the VC dimension of a finite classification model, which can return at most $2^d$ different classifiers, is at most $d$.
2. $f$ is a single-parametric threshold classifier on real numbers; i.e., for a certain threshold $\theta$, the classifier $f_\theta$ returns 1 if the input number is larger than $\theta$ and 0 otherwise. The VC dimension of $f$ is 1 because: (a) It can shatter a single point. For every point $x$, a classifier $f_\theta$ labels it as 0 if $\theta > x$ and labels it as 1 if $\theta < x$. (b) It cannot shatter any set of two points. For every set of two numbers, if the smaller is labeled 1, then the larger must also be labeled 1, so not all labelings are possible.
3. $f$ is a single-parametric interval classifier on real numbers; i.e., for a certain parameter $\theta$, the classifier $f_\theta$ returns 1 if the input number is in the interval $[\theta, \theta+4]$ and 0 otherwise. The VC dimension of $f$ is 2 because: (a) It can shatter some sets of two points. E.g., for every set $\{x, x+2\}$, a classifier $f_\theta$ labels it as $(0,0)$ if $\theta < x-4$ or if $\theta > x+2$, as $(1,0)$ if $\theta \in [x-4, x-2)$, as $(1,1)$ if $\theta \in [x-2, x]$, and as $(0,1)$ if $\theta \in (x, x+2]$.
(b) It cannot shatter any set of three points. For every set of three numbers, if the smallest and the largest are labeled 1, then the middle one must also be labeled 1, so not all labelings are possible.
4. $f$ is a straight line as a classification model on points in a two-dimensional plane. The line should separate positive data points from negative data points. There exist sets of 3 points that can indeed be shattered using this model (any three points that are not collinear). However, no set of 4 points can be shattered: by Radon's theorem, any four points can be partitioned into two subsets with intersecting convex hulls, so it is not possible to separate one of these two subsets from the other. Thus, the VC dimension of this particular classifier is 3. It is important to remember that while one can choose any arrangement of points, the arrangement of those points cannot change when attempting to shatter for some label assignment. Note that the accompanying figure shows only 3 of the 2³ = 8 possible label assignments for the three points.
5. $f$ is a single-parametric sine classifier, i.e., for a certain parameter $\theta$, the classifier $f_\theta$ returns 1 if $\sin(\theta x)$ is larger than 0 for the input number $x$, and 0 otherwise. The VC dimension of $f$ is infinite, since it can shatter any finite subset of the set $\{2^{-m} \mid m \in \mathbb{N}\}$.
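Examples 4 and 5 can both be checked with exact rational arithmetic. The sketch below is an illustration under stated assumptions, with hypothetical function names. For the sine classifier it uses one construction that works for the points $2^{-1}, \ldots, 2^{-n}$: $\theta = \pi\,(1 + \sum_i (1 - y_i)\,2^i)$, and the code verifies it against every labeling. For the line classifier it computes a Radon partition of four planar points from an affine dependence (a null vector of the $3 \times 4$ matrix of coordinates plus a row of ones), assuming the points are not all collinear.

```python
from fractions import Fraction
from itertools import product

# Example 5: sine classifier on the points 2**-1, ..., 2**-n.
def sine_label(theta_over_pi, x):
    """1 iff sin(theta * x) > 0 with theta = pi * theta_over_pi.
    Exact test: sin(pi * t) > 0 iff t mod 2 lies strictly in (0, 1)."""
    t = (theta_over_pi * x) % 2
    return 1 if 0 < t < 1 else 0

def sine_theta(labels):
    """theta / pi realizing labels[i - 1] on the point 2**-i (i = 1..n)."""
    return 1 + sum((1 - y) * 2 ** i for i, y in enumerate(labels, start=1))

def sine_shatters(n):
    points = [Fraction(1, 2 ** i) for i in range(1, n + 1)]
    return all(
        tuple(sine_label(sine_theta(lab), x) for x in points) == lab
        for lab in product((0, 1), repeat=n)
    )

# Example 4: Radon partition of four points in the plane.
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def radon_partition(pts):
    """Split 4 planar points (not all collinear) into two index sets whose
    convex hulls both contain the returned point."""
    rows = [[p[0] for p in pts], [p[1] for p in pts], [1] * 4]
    # Signed 3x3 minors give a null vector: sum(lam_j * p_j) = 0, sum(lam_j) = 0.
    lam = [(-1) ** j * det3([[r[k] for k in range(4) if k != j] for r in rows])
           for j in range(4)]
    pos = {j for j in range(4) if lam[j] > 0}
    rest = set(range(4)) - pos
    s = sum(lam[j] for j in pos)  # zero only if all four points are collinear
    point = tuple(Fraction(sum(lam[j] * pts[j][c] for j in pos), s) for c in (0, 1))
    return pos, rest, point

print(sine_shatters(6))                                 # True
print(radon_partition([(0, 0), (1, 0), (1, 1), (0, 1)]))  # diagonals meet at (1/2, 1/2)
```

For the unit square the partition is the two diagonals, so the labeling that gives opposite corners the same label cannot be realized by any line.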

Uses

In statistical learning theory

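The VC dimension can predict a probabilistic upper bound on the test error of a classification model. Vapnik proved the following bound, which holds when the test data are drawn i.i.d. from the same distribution as the training set:
$$\Pr\left(\text{test error} \leq \text{training error} + \sqrt{\frac{1}{N}\left[D\left(\log\left(\frac{2N}{D}\right)+1\right)-\log\left(\frac{\eta}{4}\right)\right]}\right) \geq 1 - \eta,$$
where $D$ is the VC dimension of the classification model, $0 < \eta \leq 1$, and $N$ is the size of the training set. The bound is informative only when $D \ll N$; for larger $D$ the test error may be much higher than the training error, which reflects overfitting.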
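The square-root term of this bound is easy to evaluate numerically. This is a minimal sketch: the function name is hypothetical, natural logarithms are assumed, and the input check is a design choice of this example rather than part of the theorem.

```python
import math

def vc_confidence_term(N, D, eta):
    """Square-root term of Vapnik's bound: with probability at least 1 - eta,
    test error <= training error + this value (natural logs assumed)."""
    if not (0 < eta <= 1) or not (0 < D < N):
        raise ValueError("need 0 < eta <= 1 and 0 < D < N")
    return math.sqrt((D * (math.log(2 * N / D) + 1) - math.log(eta / 4)) / N)

print(round(vc_confidence_term(10_000, 10, 0.05), 3))  # 0.095
```

Increasing $N$ shrinks the term roughly like $\sqrt{D \log N / N}$, while increasing $D$ (for $D < 2N$) enlarges it.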
The VC dimension also appears in sample-complexity bounds. In the realizable PAC setting, a space of binary functions with VC dimension $D$ can be learned with:
$$N = \Theta\left(\frac{D + \ln\frac{1}{\delta}}{\varepsilon}\right)$$
samples, where $\varepsilon$ is the learning error and $\delta$ is the failure probability. Thus, the sample complexity is a linear function of the VC dimension of the hypothesis space.

In computational geometry

The VC dimension is one of the critical parameters in the size of ε-nets, which determines the complexity of approximation algorithms based on them; range sets without finite VC dimension may not have finite ε-nets at all.

Bounds

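0. The VC dimension of the dual set-family of $F$ is strictly less than $2^{\operatorname{vc}(F)+1}$, and this is best possible.
1. The VC dimension of a finite set-family $H$ is at most $\log_2 |H|$. This is because $|H \cap C| \leq |H|$ by definition, so a shattered set $C$ must satisfy $2^{|C|} \leq |H|$.
2. Given a set-family $H$, define $H_s$ as a set-family that contains all intersections of $s$ elements of $H$. Then:
$$\operatorname{VCDim}(H_s) \leq \operatorname{VCDim}(H) \cdot \left(2s \log_2(3s)\right)$$
3. Given a set-family $H$ and an element $h_0 \in H$, define $H \,\Delta\, h_0 := \{h \,\Delta\, h_0 \mid h \in H\}$, where $\Delta$ denotes symmetric set difference. Then:
$$\operatorname{VCDim}(H \,\Delta\, h_0) = \operatorname{VCDim}(H)$$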
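Bounds 1 and 3 can be sanity-checked on a small family by brute force. The sketch below assumes the family of all intervals of $\{1,2,3,4\}$ (including the empty set) as a test case; `vc_dim` and `sym_diff_shift` are hypothetical helper names. Bound 3 holds because intersecting with $C$ turns $h \,\Delta\, h_0$ into $(h \cap C) \,\Delta\, (h_0 \cap C)$, a bijection on the subsets of $C$.

```python
import math
from itertools import combinations

def vc_dim(family, universe):
    """Brute-force VC dimension of a non-empty finite family of subsets."""
    fam = [frozenset(h) for h in family]
    best = 0
    for k in range(1, len(universe) + 1):
        if any(len({h & frozenset(C) for h in fam}) == 2 ** k
               for C in combinations(universe, k)):
            best = k
        else:
            break  # no set of size k shattered, so no larger set is either
    return best

def sym_diff_shift(family, h0):
    """H Δ h0 = {h Δ h0 : h in H}."""
    return [frozenset(h) ^ frozenset(h0) for h in family]

U = [1, 2, 3, 4]
H = {frozenset(range(a, b)) for a in range(1, 5) for b in range(a, 6)}  # intervals
print(len(H), vc_dim(H, U), vc_dim(sym_diff_shift(H, {2, 3}), U))  # 11 2 2
```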

VC dimension of a finite projective plane

A finite projective plane of order n is a collection of n² + n + 1 sets (called "lines") over n² + n + 1 elements (called "points"), for which:

Each line contains exactly n + 1 points.
Each line intersects every other line in exactly one point.
Each point is contained in exactly n + 1 lines.
Every pair of distinct points lies in exactly one common line.
There are at least four points, no three of which lie in a common line.

The VC dimension of a finite projective plane is 2.
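Proof: For each pair of distinct points, there is one line that contains both of them, n lines that contain only the first, n lines that contain only the second, and n² − n lines that contain neither of them, so every set of size 2 is shattered. For any triple of distinct points, if there is a line x that contains all three, then there is no line y that contains exactly two of them (since x and y would then intersect in two points, contrary to the definition of a projective plane); and if no line contains all three, the full triple is not cut out by any line. Hence, no set of size 3 is shattered.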
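The counting argument can be checked on the smallest case, the Fano plane (order n = 2, with 7 points and 7 lines). The labeling of points and lines below is one standard presentation of the Fano plane, and the helper names are hypothetical.

```python
from itertools import combinations

ORDER = 2  # the Fano plane is the projective plane of order 2
FANO = [frozenset(s) for s in (
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6})]
POINTS = range(1, 8)

def pair_counts(lines, a, b):
    """(#lines containing both, only a, only b, neither)."""
    both = sum(a in line and b in line for line in lines)
    only_a = sum(a in line and b not in line for line in lines)
    only_b = sum(b in line and a not in line for line in lines)
    neither = sum(a not in line and b not in line for line in lines)
    return both, only_a, only_b, neither

def shattered(lines, C):
    C = frozenset(C)
    return len({line & C for line in lines}) == 2 ** len(C)

print(pair_counts(FANO, 1, 2))  # (1, 2, 2, 2) = (1, n, n, n**2 - n)
print(all(shattered(FANO, C) for C in combinations(POINTS, 2)))      # True
print(any(shattered(FANO, C) for C in combinations(POINTS, 3)))      # False
```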

VC dimension of a boosting classifier

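Suppose we have a base class $B$ of simple classifiers, whose VC dimension is $D$.
We can construct a more powerful classifier by combining several different classifiers from $B$; this technique is called boosting. Formally, given $T$ classifiers $h_1, \ldots, h_T \in B$ and a weight vector $w \in \mathbb{R}^T$, we can define the following classifier:
$$f(x) = \operatorname{sign}\left(\sum_{t=1}^{T} w_t \cdot h_t(x)\right)$$
The VC dimension of the set of all such classifiers (for all selections of $T$ classifiers from $B$ and all weight vectors in $\mathbb{R}^T$), assuming $T, D \geq 3$, is at most:
$$T \cdot (D+1) \cdot \left(3\log\left(T \cdot (D+1)\right) + 2\right)$$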
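As an illustration, assume the base class $B$ consists of threshold "stumps" on the real line, $h(x) = +1$ if $x > \theta$ and $-1$ otherwise. The sketch below (hypothetical names; mapping $\operatorname{sign}(0)$ to $+1$ is an assumption) builds the weighted vote $f$. A single stump changes sign at most once along the line, but three weighted stumps already produce an alternating labeling on four points.

```python
def stump(theta):
    """Base classifier: +1 if x > theta, else -1."""
    return lambda x: 1 if x > theta else -1

def boosted(classifiers, weights):
    """f(x) = sign(sum_t w_t * h_t(x)); sign(0) is mapped to +1 (assumption)."""
    def f(x):
        s = sum(w * h(x) for h, w in zip(classifiers, weights))
        return 1 if s >= 0 else -1
    return f

f = boosted([stump(0), stump(1), stump(5)], [1.0, -1.0, 0.5])
print([f(x) for x in (-1, 0.5, 3, 6)])  # [-1, 1, -1, 1]
```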

VC dimension of a neural network

A neural network is described by a directed acyclic graph G(V, E), where:

V is the set of nodes. Each node is a simple computation cell.
E is the set of edges. Each edge has a weight.
The input to the network is represented by the sources of the graph – the nodes with no incoming edges.
The output of the network is represented by the sinks of the graph – the nodes with no outgoing edges.
Each intermediate node gets as input a weighted sum of the outputs of the nodes at its incoming edges, where the weights are the weights on the edges.
Each intermediate node outputs a certain increasing function of its input, such as the sign function or the sigmoid function. This function is called the activation function.
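The computation described above can be sketched as a forward pass in topological order. This is illustrative only: the edge-dictionary representation and function names are assumptions, `sign(0)` is mapped to $+1$ by convention, and a bias is modelled as an extra source node `"b"` fed the constant 1 (the description above has no explicit bias). The example network computes XOR on inputs in $\{-1, +1\}$.

```python
import math
from graphlib import TopologicalSorter  # Python 3.9+

def sign(z):
    return 1.0 if z >= 0 else -1.0  # sign(0) -> +1 is a convention chosen here

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(edges, inputs, activation=sign):
    """edges: {(u, v): weight}; inputs: {source: value}. Returns {sink: output}."""
    preds = {}
    for (u, v), w in edges.items():
        preds.setdefault(u, [])
        preds.setdefault(v, []).append((u, w))
    graph = {v: [u for u, _ in ps] for v, ps in preds.items()}
    values = {}
    for node in TopologicalSorter(graph).static_order():  # predecessors first
        if not preds[node]:  # source node: read the network input
            values[node] = inputs[node]
        else:  # weighted sum of predecessor outputs, then activation
            values[node] = activation(sum(w * values[u] for u, w in preds[node]))
    has_out = {u for u, _ in edges}
    return {n: values[n] for n in preds if n not in has_out}

XOR_NET = {
    ("x1", "h1"): 1, ("x2", "h1"): -1, ("b", "h1"): -1,
    ("x1", "h2"): -1, ("x2", "h2"): 1, ("b", "h2"): -1,
    ("h1", "y"): 1, ("h2", "y"): 1, ("b", "y"): 1,
}
print(forward(XOR_NET, {"x1": 1, "x2": -1, "b": 1}))  # {'y': 1.0}
```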

The VC dimension of a neural network is bounded as follows:

If the activation function is the sign function and the weights are general, then the VC dimension is at most $O(|E| \cdot \log |E|)$.
If the activation function is the sigmoid function and the weights are general, then the VC dimension is at least $\Omega(|E|^2)$ and at most $O(|E|^2 \cdot |V|^2)$.
If the weights come from a finite family, then, for both activation functions, the VC dimension is at most $O(|E|)$.

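The finite-weight case follows from the finite-class bound $\operatorname{VCDim}(H) \leq \log_2 |H|$: with a fixed graph and activation and weights stored in $b$ bits, there are at most $(2^b)^{|E|}$ weight settings, hence at most that many classifiers, so the VC dimension is at most $b \cdot |E|$. The sketch below assumes a fully connected layered graph without biases; the helper names are hypothetical.

```python
def layered_edge_count(widths):
    """|E| of a fully connected layered DAG with the given layer widths."""
    return sum(a * b for a, b in zip(widths, widths[1:]))

def finite_weight_vc_bound(num_edges, bits_per_weight):
    """log2((2**b) ** |E|) = b * |E| upper-bounds the VC dimension."""
    return bits_per_weight * num_edges

E = layered_edge_count([3, 4, 1])          # 3*4 + 4*1 = 16 edges
print(E, finite_weight_vc_bound(E, 32))     # 16 512
```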
Generalizations

The VC dimension is defined for spaces of binary functions. Several generalizations have been suggested for spaces of non-binary functions.

For multi-valued functions, the Natarajan dimension can be used. Ben-David et al. present a generalization of this concept.
For real-valued functions, Pollard's pseudo-dimension can be used.
The Rademacher complexity provides bounds similar to those obtained from the VC dimension, and can sometimes give more insight than VC-dimension calculations into statistical methods such as those using kernels.