Least-squares support-vector machine

Least-squares support-vector machines (LS-SVMs) are least-squares versions of support-vector machines (SVMs), a set of related supervised learning methods that analyze data and recognize patterns and that are used for classification and regression analysis. In the least-squares version, the solution is found by solving a set of linear equations rather than the convex quadratic programming (QP) problem of classical SVMs. Least-squares SVM classifiers were proposed by Suykens and Vandewalle. LS-SVMs are a class of kernel-based learning methods.

From support-vector machine to least-squares support-vector machine

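Given a training set $\{x_i, y_i\}_{i=1}^N$ with input data $x_i \in \mathbb{R}^n$ and corresponding binary class labels $y_i \in \{-1, +1\}$, the SVM classifier, according to Vapnik’s original formulation, satisfies the following conditions:
$$\begin{cases} w^T \phi(x_i) + b \ge 1, & \text{if } y_i = +1, \\ w^T \phi(x_i) + b \le -1, & \text{if } y_i = -1, \end{cases}$$
which is equivalent to
$$y_i \left[ w^T \phi(x_i) + b \right] \ge 1, \quad i = 1, \ldots, N,$$
where $\phi(x)$ is the nonlinear map from the original space to the high- or infinite-dimensional feature space.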

Inseparable data

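If such a separating hyperplane does not exist, we introduce so-called slack variables $\xi_i$ such that
$$y_i \left[ w^T \phi(x_i) + b \right] \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, N.$$
According to the structural risk minimization principle, the risk bound is minimized by the following minimization problem:
$$\min J_1(w, \xi) = \frac{1}{2} w^T w + c \sum_{i=1}^N \xi_i,$$
subject to the slack constraints above, where $c > 0$ is a constant that weights the total slack against the margin term.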
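To solve this problem, we can construct the Lagrangian function:
$$L_1(w, b, \xi, \alpha, \beta) = \frac{1}{2} w^T w + c \sum_{i=1}^N \xi_i - \sum_{i=1}^N \alpha_i \left\{ y_i \left[ w^T \phi(x_i) + b \right] - 1 + \xi_i \right\} - \sum_{i=1}^N \beta_i \xi_i,$$
where $\alpha_i \ge 0$ and $\beta_i \ge 0$ $(i = 1, \ldots, N)$ are the Lagrangian multipliers. The optimal point is a saddle point of the Lagrangian function, which gives
$$\frac{\partial L_1}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^N \alpha_i y_i \phi(x_i), \qquad \frac{\partial L_1}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^N \alpha_i y_i = 0, \qquad \frac{\partial L_1}{\partial \xi_i} = 0 \;\Rightarrow\; \alpha_i + \beta_i = c, \; 0 \le \alpha_i \le c.$$
Substituting this expression for $w$ into the Lagrangian formed from the objective and constraints gives the following quadratic programming problem:
$$\max Q_1(\alpha) = -\frac{1}{2} \sum_{i,j=1}^N \alpha_i \alpha_j y_i y_j K(x_i, x_j) + \sum_{i=1}^N \alpha_i,$$
where $K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$ is called the kernel function. Solving this QP problem subject to the constraints $\sum_{i=1}^N \alpha_i y_i = 0$ and $0 \le \alpha_i \le c$ gives the hyperplane in the high-dimensional space and hence the classifier in the original space.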
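The role of the kernel function as an inner product in feature space can be checked numerically. The sketch below is only an illustration: the feature map `phi` and the homogeneous degree-2 polynomial kernel `(x^T z)^2` are choices made for this example (they are not taken from the text above), picked because the feature map is small enough to write out explicitly.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the homogeneous degree-2 polynomial kernel in 2-D.

    Illustrative assumption: this particular map is chosen for the example only.
    """
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def kernel(x, z):
    """The same quantity computed without forming phi: K(x, z) = (x^T z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

inner_product_in_feature_space = float(np.dot(phi(x), phi(z)))
kernel_value = kernel(x, z)

print(inner_product_in_feature_space, kernel_value)
```

Both numbers agree, which is why the dual problem only ever needs kernel evaluations and never the map itself.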

Least-squares SVM formulation

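The least-squares version of the SVM classifier is obtained by reformulating the minimization problem as
$$\min J_2(w, b, e) = \frac{\mu}{2} w^T w + \frac{\zeta}{2} \sum_{i=1}^N e_i^2,$$
subject to the equality constraints
$$y_i \left[ w^T \phi(x_i) + b \right] = 1 - e_i, \quad i = 1, \ldots, N.$$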
The least-squares SVM classifier formulation above implicitly corresponds to a regression interpretation with binary targets.
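Using $y_i^2 = 1$, we have
$$\sum_{i=1}^N e_i^2 = \sum_{i=1}^N (y_i e_i)^2 = \sum_{i=1}^N \left( y_i - (w^T \phi(x_i) + b) \right)^2,$$
with $y_i e_i = y_i - (w^T \phi(x_i) + b)$. Note that this error would also make sense for least-squares data fitting, so the same end result holds for the regression case.
Hence the LS-SVM classifier formulation is equivalent to
$$J_2(w, b, e) = \mu E_W + \zeta E_D,$$
with $E_W = \frac{1}{2} w^T w$ and $E_D = \frac{1}{2} \sum_{i=1}^N e_i^2 = \frac{1}{2} \sum_{i=1}^N \left( y_i - (w^T \phi(x_i) + b) \right)^2$.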
Both $\mu$ and $\zeta$ should be considered hyperparameters that tune the amount of regularization against the sum of squared errors. The solution depends only on the ratio $\gamma = \zeta / \mu$, so the original formulation uses only $\gamma$ as a tuning parameter. We use both $\mu$ and $\zeta$ as parameters in order to provide a Bayesian interpretation of LS-SVM.
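That the minimizer depends only on the ratio of the two hyperparameters can be checked numerically. The sketch below makes a simplifying assumption not stated in the text: it uses the identity feature map (a linear model), so the cost with weights `mu` and `zeta` can be minimized in closed form through its normal equations. The function name `primal_lssvm_linear` and the toy data are hypothetical.

```python
import numpy as np

def primal_lssvm_linear(X, y, mu, zeta):
    """Minimize mu/2 * w^T w + zeta/2 * sum_i (y_i - (w^T x_i + b))^2.

    Assumes the identity feature map phi(x) = x (illustrative simplification).
    Setting the gradient to zero gives (zeta * A^T A + mu * D) theta = zeta * A^T y,
    where A = [X, 1], theta = [w; b], and D leaves the bias unregularized.
    """
    n, d = X.shape
    A = np.hstack([X, np.ones((n, 1))])
    D = np.eye(d + 1)
    D[-1, -1] = 0.0  # the bias term b is not penalized
    theta = np.linalg.solve(zeta * A.T @ A + mu * D, zeta * A.T @ y)
    return theta[:-1], theta[-1]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)

w1, b1 = primal_lssvm_linear(X, y, mu=1.0, zeta=10.0)  # ratio 10
w2, b2 = primal_lssvm_linear(X, y, mu=0.1, zeta=1.0)   # ratio 10 again
w3, b3 = primal_lssvm_linear(X, y, mu=1.0, zeta=1.0)   # ratio 1

print(np.allclose(w1, w2), np.allclose(w1, w3))
```

The first two settings share the same ratio and give the same `w` and `b`; the third changes the ratio and gives a different solution.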
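The solution of the LS-SVM regressor is obtained after constructing the Lagrangian function. Dividing the cost by $\mu$ and writing $\gamma = \zeta / \mu$, which does not change the minimizer, gives
$$L_2(w, b, e, \alpha) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^N e_i^2 - \sum_{i=1}^N \alpha_i \left\{ \left[ w^T \phi(x_i) + b \right] + e_i - y_i \right\},$$
where $\alpha_i \in \mathbb{R}$ are the Lagrange multipliers. The conditions for optimality are
$$\frac{\partial L_2}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^N \alpha_i \phi(x_i), \qquad \frac{\partial L_2}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^N \alpha_i = 0,$$
$$\frac{\partial L_2}{\partial e_i} = 0 \;\Rightarrow\; \alpha_i = \gamma e_i, \qquad \frac{\partial L_2}{\partial \alpha_i} = 0 \;\Rightarrow\; y_i = w^T \phi(x_i) + b + e_i, \quad i = 1, \ldots, N.$$
Eliminating $w$ and $e$ yields a linear system instead of a quadratic programming problem:
$$\begin{bmatrix} 0 & 1_N^T \\ 1_N & \Omega + \gamma^{-1} I_N \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ Y \end{bmatrix},$$
with $Y = [y_1, \ldots, y_N]^T$, $1_N = [1, \ldots, 1]^T$, and $\alpha = [\alpha_1, \ldots, \alpha_N]^T$. Here, $I_N$ is the $N \times N$ identity matrix, and $\Omega \in \mathbb{R}^{N \times N}$ is the kernel matrix defined by $\Omega_{ij} = \phi(x_i)^T \phi(x_j) = K(x_i, x_j)$.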
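The linear system can be assembled and solved directly with a dense solver. The following is a minimal sketch, not a reference implementation: the function names (`rbf_kernel`, `lssvm_fit`, `lssvm_decision`), the RBF kernel choice, the values of `gamma` and `sigma`, and the two-cluster toy data are all assumptions made for illustration. Labels are used as regression targets, and new points are classified by the sign of $\sum_i \alpha_i K(x, x_i) + b$, which follows from $w = \sum_i \alpha_i \phi(x_i)$.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Pairwise RBF kernel matrix with entries exp(-||a - b||^2 / sigma^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    Omega = rbf_kernel(X, X, sigma)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0                          # first row: sum(alpha) = 0
    M[1:, 0] = 1.0                          # bias column
    M[1:, 1:] = Omega + np.eye(n) / gamma   # regularized kernel matrix
    rhs = np.concatenate([[0.0], y])
    sol = np.linalg.solve(M, rhs)
    return sol[0], sol[1:]                  # b, alpha

def lssvm_decision(X_train, alpha, b, X_new, sigma=1.0):
    """Latent output sum_i alpha_i K(x, x_i) + b; its sign is the predicted class."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Hypothetical toy data: two Gaussian clusters labelled -1 and +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
               rng.normal(2.0, 0.5, size=(20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])

gamma = 10.0
b, alpha = lssvm_fit(X, y, gamma=gamma)
pred = np.sign(lssvm_decision(X, alpha, b, X))
print("training accuracy:", np.mean(pred == y))
```

Unlike the classical SVM, every training point generally receives a nonzero $\alpha_i$ (since $\alpha_i = \gamma e_i$), so the solution is not sparse and the dense solve costs on the order of $N^3$ operations.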

Kernel function K

For the kernel function $K(x, x_i)$ one typically has the following choices:

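Linear kernel: $K(x, x_i) = x_i^T x$
Polynomial kernel of degree $d$: $K(x, x_i) = \left( 1 + x_i^T x / c \right)^d$
Radial basis function (RBF) kernel: $K(x, x_i) = \exp\left( -\|x - x_i\|^2 / \sigma^2 \right)$
MLP kernel: $K(x, x_i) = \tanh\left( k \, x_i^T x + \theta \right)$

where $d$, $c$, $\sigma$, $k$, and $\theta$ are constants. Note that the Mercer condition holds for all $c, \sigma \in \mathbb{R}^+$ and $d \in \mathbb{N}$ in the polynomial and RBF cases, but not for all possible choices of $k$ and $\theta$ in the MLP case. The scale parameters $c$, $\sigma$, and $k$ determine the scaling of the inputs in the polynomial, RBF, and MLP kernel functions. This scaling is related to the bandwidth of the kernel in statistics, where the bandwidth is shown to be an important parameter for the generalization behavior of a kernel method.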
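The four kernels are short enough to write out, and the remark about the Mercer condition can be illustrated by inspecting the eigenvalues of a Gram matrix. This is a sketch under assumptions: the function names, the helper `gram`, the sample points, and the parameter values are hypothetical. The MLP parameters `k=1.0, theta=-1.0` are chosen because they make $K(0, 0) = \tanh(-1) < 0$, so any Gram matrix containing the origin has a negative diagonal entry and cannot be positive semidefinite.

```python
import numpy as np

def linear_kernel(x, xi):
    return float(xi @ x)

def polynomial_kernel(x, xi, c=1.0, d=2):
    return float((1.0 + (xi @ x) / c) ** d)

def rbf_kernel(x, xi, sigma=1.0):
    return float(np.exp(-np.sum((x - xi) ** 2) / sigma**2))

def mlp_kernel(x, xi, k=1.0, theta=0.0):
    return float(np.tanh(k * (xi @ x) + theta))

def gram(kernel, X, **params):
    """Gram matrix K[i, j] = kernel(X[i], X[j]) for a list of points."""
    n = len(X)
    return np.array([[kernel(X[i], X[j], **params) for j in range(n)]
                     for i in range(n)])

# Hypothetical sample points; the origin is included on purpose.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.5, -0.5]])

min_eig_rbf = np.linalg.eigvalsh(gram(rbf_kernel, X, sigma=1.0)).min()
min_eig_mlp = np.linalg.eigvalsh(gram(mlp_kernel, X, k=1.0, theta=-1.0)).min()

print("smallest RBF eigenvalue:", min_eig_rbf)
print("smallest MLP eigenvalue:", min_eig_mlp)
```

The RBF Gram matrix has no negative eigenvalues, while this MLP setting produces one, so it does not define a valid Mercer kernel on these points.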

Bayesian interpretation for LS-SVM

A Bayesian interpretation of the SVM has been proposed by Smola et al. They showed that the use of different kernels in SVM can be regarded as defining different prior probability distributions on the functional space, as $P[f] \propto \exp\left( -\beta \| \hat{P} f \|^2 \right)$. Here $\beta > 0$ is a constant and $\hat{P}$ is the regularization operator corresponding to the selected kernel.
A general Bayesian evidence framework was developed by MacKay, who applied it to regression, feed-forward neural networks, and classification networks. Given a data set $D$, a model $\mathbb{M}$ with parameter vector $w$, and a so-called hyperparameter or regularization parameter $\lambda$, Bayesian inference is constructed with three levels of inference:

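Level 1: for a given value of $\lambda$, the first level of inference infers the posterior distribution of $w$ by Bayes’ rule: $p(w \mid D, \lambda, \mathbb{M}) \propto p(D \mid w, \mathbb{M}) \, p(w \mid \lambda, \mathbb{M})$.
Level 2: the second level of inference determines the value of $\lambda$ by maximizing $p(\lambda \mid D, \mathbb{M}) \propto p(D \mid \lambda, \mathbb{M}) \, p(\lambda \mid \mathbb{M})$.
Level 3: the third level of inference in the evidence framework ranks different models by examining their posterior probabilities $p(\mathbb{M} \mid D) \propto p(D \mid \mathbb{M}) \, p(\mathbb{M})$.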

The Bayesian evidence framework is thus a unified theory for learning the model and for model selection.
Kwok used the Bayesian evidence framework to interpret the formulation of SVM and model selection, and also applied it to support vector regression.
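Now, given the data points $\{x_i, y_i\}_{i=1}^N$ and the hyperparameters $\mu$ and $\zeta$ of the model $\mathbb{M}$, the model parameters $w$ and $b$ are estimated by maximizing the posterior $p(w, b \mid D, \log\mu, \log\zeta, \mathbb{M})$. Applying Bayes’ rule, we obtain
$$p(w, b \mid D, \log\mu, \log\zeta, \mathbb{M}) = \frac{p(D \mid w, b, \log\mu, \log\zeta, \mathbb{M}) \, p(w, b \mid \log\mu, \log\zeta, \mathbb{M})}{p(D \mid \log\mu, \log\zeta, \mathbb{M})},$$
where $p(D \mid \log\mu, \log\zeta, \mathbb{M})$ is a normalizing constant such that the integral over all possible $w$ and $b$ is equal to 1.
We assume that $w$ and $b$ are independent of the hyperparameter $\zeta$ and are conditionally independent, i.e., we assume
$$p(w, b \mid \log\mu, \log\zeta, \mathbb{M}) = p(w \mid \log\mu, \mathbb{M}) \, p(b \mid \log\sigma_b, \mathbb{M}).$$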
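When $\sigma_b \to \infty$, the distribution of $b$ approximates a uniform distribution. Furthermore, we assume that $w$ and $b$ are Gaussian distributed, so the a priori distribution of $w$ and $b$ with $\sigma_b \to \infty$ is
$$p(w, b \mid \log\mu) = \left( \frac{\mu}{2\pi} \right)^{n_f / 2} \exp\left( -\frac{\mu}{2} w^T w \right) \frac{1}{\sqrt{2\pi\sigma_b^2}} \exp\left( -\frac{b^2}{2\sigma_b^2} \right) \propto \left( \frac{\mu}{2\pi} \right)^{n_f / 2} \exp\left( -\frac{\mu}{2} w^T w \right).$$
Here $n_f$ is the dimensionality of the feature space, which is the same as the dimensionality of $w$.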
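The probability $p(D \mid w, b, \log\mu, \log\zeta, \mathbb{M})$ is assumed to depend only on $w$, $b$, $\zeta$, and $\mathbb{M}$. We assume that the data points are independently and identically distributed, so that
$$p(D \mid w, b, \log\zeta, \mathbb{M}) = \prod_{i=1}^N p(x_i, y_i \mid w, b, \log\zeta, \mathbb{M}).$$
In order to obtain the least-squares cost function, it is assumed that the probability of a data point is proportional to
$$p(x_i, y_i \mid w, b, \log\zeta, \mathbb{M}) \propto p(e_i \mid w, b, \log\zeta, \mathbb{M}).$$
A Gaussian distribution is taken for the errors $e_i = y_i - (w^T \phi(x_i) + b)$:
$$p(e_i \mid w, b, \log\zeta, \mathbb{M}) = \sqrt{\frac{\zeta}{2\pi}} \exp\left( -\frac{\zeta e_i^2}{2} \right).$$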
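It is assumed that $w$ and $b$ are determined in such a way that the class centers $\hat{m}_-$ and $\hat{m}_+$ are mapped onto the targets $-1$ and $+1$, respectively. The projections $w^T \phi(x) + b$ of the class elements $\phi(x)$ follow a multivariate Gaussian distribution with variance $1/\zeta$.
Combining the preceding expressions and neglecting all constants, Bayes’ rule becomes
$$p(w, b \mid D, \log\mu, \log\zeta, \mathbb{M}) \propto \exp\left( -\frac{\mu}{2} w^T w - \frac{\zeta}{2} \sum_{i=1}^N e_i^2 \right) = \exp\left( -J_2(w, b) \right).$$
The maximum posterior density estimates $w_{MP}$ and $b_{MP}$ are then obtained by minimizing the negative logarithm of this posterior, which brings us back to the LS-SVM cost function $J_2 = \mu E_W + \zeta E_D$.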
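The link between the posterior and the least-squares cost can be checked numerically: with a Gaussian prior of precision `mu` on the weights, a flat prior on the bias, and Gaussian errors of precision `zeta`, the negative log posterior should differ from the regularized least-squares cost only by a constant that does not depend on the weights or the bias. The sketch below assumes an identity feature map and randomly drawn toy data; the function names `j2` and `neg_log_posterior` are hypothetical.

```python
import numpy as np

def j2(w, b, X, y, mu, zeta):
    """Regularized least-squares cost mu/2 * w^T w + zeta/2 * sum_i e_i^2."""
    e = y - (X @ w + b)
    return 0.5 * mu * (w @ w) + 0.5 * zeta * np.sum(e**2)

def neg_log_posterior(w, b, X, y, mu, zeta):
    """Negative log of (Gaussian prior on w) * (Gaussian likelihood of the errors).

    The flat prior on b and the evidence term are constants and are omitted.
    Assumes the identity feature map phi(x) = x (illustrative simplification).
    """
    n_f = w.size
    e = y - (X @ w + b)
    log_prior = 0.5 * n_f * np.log(mu / (2.0 * np.pi)) - 0.5 * mu * (w @ w)
    log_lik = np.sum(0.5 * np.log(zeta / (2.0 * np.pi)) - 0.5 * zeta * e**2)
    return -(log_prior + log_lik)

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
mu, zeta = 0.5, 4.0

# The gap between the two quantities should be identical for every (w, b).
offsets = []
for _ in range(3):
    w = rng.normal(size=2)
    b = rng.normal()
    offsets.append(neg_log_posterior(w, b, X, y, mu, zeta) - j2(w, b, X, y, mu, zeta))

print(np.allclose(offsets, offsets[0]))
```

Because the gap is constant, the two functions share the same minimizer, which is the sense in which the maximum posterior estimate coincides with the LS-SVM solution.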
