Sparse PCA

Sparse principal component analysis is a specialised technique used in statistical analysis and, in particular, in the analysis of multivariate data sets. It extends the classic method of principal component analysis for the reduction of dimensionality of data by introducing sparsity structures to the input variables.
A particular disadvantage of ordinary PCA is that the principal components are usually linear combinations of all input variables. Sparse PCA overcomes this disadvantage by finding linear combinations that contain just a few input variables.
Contemporary datasets often have a number of input variables ($p$) comparable with or even much larger than the number of samples ($n$). It has been shown that if $p/n$ does not converge to zero, classical PCA is not consistent. Sparse PCA, however, can retain consistency even if $p \gg n$.

Mathematical formulation

Consider a data matrix $X$, where each of the $p$ columns represents an input variable and each of the $n$ rows represents an independent sample from the data population. One assumes each column of $X$ has mean zero; otherwise one can subtract the column-wise mean from each element of $X$.
Let $\Sigma = \frac{1}{n-1} X^\top X$ be the empirical covariance matrix of $X$, which has dimension $p \times p$. Given an integer $k$ with $1 \le k \le p$, the sparse PCA problem can be formulated as maximizing the variance along a direction represented by a vector $v \in \mathbb{R}^p$ while constraining its cardinality:
$$\begin{aligned} \max_{v}\quad & v^\top \Sigma v \\ \text{subject to}\quad & \|v\|_2 = 1, \\ & \|v\|_0 \le k. \end{aligned} \qquad \text{(Eq. 1)}$$
The first constraint specifies that $v$ is a unit vector. In the second constraint, $\|v\|_0$ represents the L0 norm of $v$, which is defined as the number of its non-zero components. So the second constraint specifies that the number of non-zero components in $v$ is less than or equal to $k$, which is typically an integer much smaller than the dimension $p$. The optimal value of Eq. 1 is known as the k-sparse largest eigenvalue.
If one takes $k = p$, the problem reduces to ordinary PCA, and the optimal value becomes the largest eigenvalue of the covariance matrix $\Sigma$.
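
As an illustration of the formulation above, the following minimal sketch solves the cardinality-constrained problem exactly by enumerating every support of size $k$ and taking the largest eigenvalue of the corresponding principal submatrix. The function name `k_sparse_largest_eigenvalue` and the random test data are assumptions made for this example; the enumeration is only practical for small $p$.

```python
from itertools import combinations

import numpy as np


def k_sparse_largest_eigenvalue(sigma, k):
    """Exhaustively maximize v^T sigma v over unit vectors with at most k non-zeros.

    Checking supports of size exactly k is enough, because the largest
    eigenvalue of a principal submatrix cannot decrease when the support grows.
    """
    p = sigma.shape[0]
    best_val, best_v = -np.inf, None
    for support in combinations(range(p), k):
        idx = np.array(support)
        sub = sigma[np.ix_(idx, idx)]
        vals, vecs = np.linalg.eigh(sub)  # eigenvalues in ascending order
        if vals[-1] > best_val:
            best_val = vals[-1]
            best_v = np.zeros(p)
            best_v[idx] = vecs[:, -1]
    return best_val, best_v


# Hypothetical data: n = 50 samples of p = 6 variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
X -= X.mean(axis=0)                   # center each column
sigma = X.T @ X / (X.shape[0] - 1)    # empirical covariance matrix

val2, v2 = k_sparse_largest_eigenvalue(sigma, 2)
val_full, _ = k_sparse_largest_eigenvalue(sigma, 6)  # k = p: ordinary PCA
```

With $k = p$ the routine returns the ordinary largest eigenvalue of `sigma`, matching the reduction to classical PCA described above.
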
After finding the optimal solution $v$, one deflates $\Sigma$ to obtain a new matrix
$$\Sigma_1 = \Sigma - (v^\top \Sigma v)\, v v^\top,$$
and iterates this process to obtain further principal components. However, unlike PCA, sparse PCA cannot guarantee that different principal components are orthogonal. In order to achieve orthogonality, additional constraints must be enforced.
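
The deflation step can be sketched in a few lines. The helper name `deflate` and the small covariance matrix are assumptions chosen for illustration; the point is that the deflated matrix carries no remaining variance along the extracted direction.

```python
import numpy as np


def deflate(sigma, v):
    """Return sigma - (v^T sigma v) v v^T for a unit vector v."""
    return sigma - (v @ sigma @ v) * np.outer(v, v)


# Hypothetical 3x3 covariance matrix and a 2-sparse unit direction.
sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 1.0]])
v = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)

sigma_1 = deflate(sigma, v)
residual = v @ sigma_1 @ v  # variance left along v after deflation
```

Because $v$ is a unit vector, `residual` equals $v^\top \Sigma v - (v^\top \Sigma v)(v^\top v)^2 = 0$ up to rounding.
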
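The following equivalent definition is in matrix form.
Let $V$ be a $p \times p$ symmetric matrix. One can rewrite the sparse PCA problem as
$$\begin{aligned} \max_{V}\quad & \operatorname{Tr}(\Sigma V) \\ \text{subject to}\quad & \operatorname{Tr}(V) = 1, \\ & \|V\|_0 \le k^2, \\ & \operatorname{Rank}(V) = 1,\ V \succeq 0. \end{aligned} \qquad \text{(Eq. 2)}$$
Tr is the matrix trace, and $\|V\|_0$ represents the number of non-zero elements in the matrix $V$.
The last line specifies that $V$ has matrix rank one and is positive semidefinite.
It therefore means that $V = v v^\top$ for some vector $v$, so Eq. 2 is equivalent to the vector formulation with the constraints $\|v\|_2 = 1$ and $\|v\|_0 \le k$.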
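Moreover, the rank constraint in this formulation is actually redundant, and therefore sparse PCA can be cast as the following mixed-integer semidefinite program, in which the binary vector $z$ indicates which variables may enter the component:
$$\begin{aligned} \max_{V, z}\quad & \operatorname{Tr}(\Sigma V) \\ \text{subject to}\quad & \operatorname{Tr}(V) = 1, \\ & |V_{i,i}| \le z_i, \quad |V_{i,j}| \le \tfrac{1}{2} z_i \quad \forall i, j \in \{1, \dots, p\},\ i \ne j, \\ & V \succeq 0, \quad z \in \{0, 1\}^p, \quad \sum_{i=1}^{p} z_i \le k. \end{aligned}$$
Because of the cardinality constraint, the maximization problem is hard to solve exactly, especially when the dimension $p$ is high. In fact, the cardinality-constrained sparse PCA problem is NP-hard in the strong sense.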
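
The correspondence between the vector form and the matrix form can be checked numerically. The sketch below, with an arbitrary covariance matrix and a hand-picked 2-sparse unit vector (both assumptions for illustration), lifts $v$ to $V = v v^\top$ and confirms that the objective and constraints of the matrix form are satisfied.

```python
import numpy as np

# Hypothetical positive semidefinite covariance matrix (p = 4).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
sigma = A @ A.T

# A unit vector with k = 2 non-zero entries.
v = np.array([0.6, 0.0, 0.8, 0.0])
k = np.count_nonzero(v)

# Lift to the rank-one matrix variable.
V = np.outer(v, v)

objective_vector = v @ sigma @ v        # v^T Sigma v
objective_matrix = np.trace(sigma @ V)  # Tr(Sigma V)
trace_V = np.trace(V)                   # equals ||v||_2^2 = 1
nonzeros_V = np.count_nonzero(V)        # equals k^2 here
rank_V = np.linalg.matrix_rank(V)
min_eig_V = np.linalg.eigvalsh(V)[0]    # non-negative up to rounding
```
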

Algorithms for Sparse PCA

Several alternative approaches have been proposed, including

a regression framework,
a convex relaxation/semidefinite programming framework,
a generalized power method framework,
an alternating maximization framework,
forward-backward greedy search and exact methods using branch-and-bound techniques,
a certifiably optimal branch-and-bound approach,
a Bayesian formulation framework,
a certifiably optimal mixed-integer semidefinite branch-and-cut approach.
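
To give a flavour of the iterative heuristics in this family, the sketch below implements a simple truncated power iteration: multiply by the covariance matrix, keep the $k$ largest-magnitude entries, and renormalize. This is an illustrative heuristic under assumed names and defaults (`truncated_power_iteration`, `n_iter`, `tol`), not the specific generalized power method listed above, and it carries no optimality guarantee.

```python
import numpy as np


def truncated_power_iteration(sigma, k, n_iter=200, tol=1e-10):
    """Heuristic for a k-sparse leading direction of a covariance matrix."""
    p = sigma.shape[0]
    # Start from the k coordinates with the largest variance.
    v = np.zeros(p)
    start = np.argsort(np.diag(sigma))[-k:]
    v[start] = 1.0 / np.sqrt(k)
    for _ in range(n_iter):
        w = sigma @ v
        keep = np.argsort(np.abs(w))[-k:]  # indices of the k largest |w_i|
        v_new = np.zeros(p)
        v_new[keep] = w[keep]
        norm = np.linalg.norm(v_new)
        if norm == 0.0:
            break
        v_new /= norm
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    return v


# Hypothetical planted example: identity plus a 2-sparse spike.
u = np.array([0.8, 0.6, 0.0, 0.0, 0.0, 0.0])
sigma = np.eye(6) + 5.0 * np.outer(u, u)

v = truncated_power_iteration(sigma, k=2)
explained = v @ sigma @ v
```

In this planted example the largest eigenvalue of `sigma` is 6 with eigenvector `u`, so a successful run should return a vector supported on the first two coordinates.
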

The methodological and theoretical developments of sparse PCA, as well as its applications in scientific studies, have recently been reviewed in a survey paper.

Regression approach via lasso (elastic net)

Semidefinite Programming Relaxation

It has been proposed that sparse PCA can be approximated by semidefinite programming. If one drops the rank constraint and relaxes the cardinality constraint to a convex 1-norm constraint, one obtains a semidefinite programming relaxation, which can be solved efficiently in polynomial time:
$$\begin{aligned} \max_{V}\quad & \operatorname{Tr}(\Sigma V) \\ \text{subject to}\quad & \operatorname{Tr}(V) = 1, \\ & \mathbf{1}^\top |V| \mathbf{1} \le k, \\ & V \succeq 0. \end{aligned}$$
In the second constraint, $\mathbf{1}$ is a $p \times 1$ vector of ones, and $|V|$ is the matrix whose elements are the absolute values of the elements of $V$.
The optimal solution $V$ to the relaxed problem is not guaranteed to have rank one. In that case, $V$ can be truncated to retain only the dominant eigenvector.
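
A short derivation may help explain why the 1-norm constraint is a relaxation of the cardinality constraint. Assuming $V = v v^\top$ with $\|v\|_2 = 1$ and $\|v\|_0 \le k$, the Cauchy-Schwarz inequality applied to the non-zero entries of $v$ gives:

```latex
\mathbf{1}^\top |V| \mathbf{1}
  = \sum_{i=1}^{p} \sum_{j=1}^{p} |v_i|\,|v_j|
  = \|v\|_1^2
  \le \|v\|_0 \, \|v\|_2^2
  \le k .
```

Every rank-one matrix that is feasible for the cardinality-constrained problem is therefore feasible for the relaxed problem, so the relaxed optimal value is an upper bound on the k-sparse largest eigenvalue.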
While the semidefinite program does not scale beyond roughly $p = 300$ covariates, it has been shown that a second-order cone relaxation of the semidefinite relaxation is almost as tight and successfully solves problems with thousands of covariates.
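
The truncation step mentioned above, retaining only the dominant eigenvector when the relaxed solution is not rank one, can be sketched as follows. The matrix `V` here is a hand-built stand-in for a solver output (an assumption for illustration), and the resulting vector is not guaranteed to satisfy the cardinality bound in general.

```python
import numpy as np


def dominant_eigenvector(V):
    """Return the unit eigenvector of a symmetric matrix with the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(V)  # eigenvalues in ascending order
    return vecs[:, -1]


# Hypothetical relaxed solution: positive semidefinite, trace one, rank two.
a = np.array([0.8, 0.6, 0.0])
b = np.array([0.0, 0.0, 1.0])
V = 0.9 * np.outer(a, a) + 0.1 * np.outer(b, b)

v = dominant_eigenvector(V)  # equals a up to sign
```
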

Applications

Financial Data Analysis

If ordinary PCA is applied to a dataset in which each input variable represents a different asset, it may generate principal components that are weighted combinations of all the assets. In contrast, sparse PCA would produce principal components that are weighted combinations of only a few input assets, so their meaning is easy to interpret. Furthermore, if one uses a trading strategy based on these principal components, fewer assets imply lower transaction costs.

Biology

Consider a dataset where each input variable corresponds to a specific gene. Sparse PCA can produce a principal component that involves only a few genes, so researchers can focus on these specific genes for further analysis.

High-dimensional Hypothesis Testing

Contemporary datasets often have a number of input variables ($p$) comparable with or even much larger than the number of samples ($n$). It has been shown that if $p/n$ does not converge to zero, classical PCA is not consistent. In other words, if we let $k = p$ in the cardinality-constrained formulation, then the optimal value does not converge to the largest eigenvalue of the data population as the sample size $n \to \infty$, and the optimal solution does not converge to the direction of maximum variance. Sparse PCA, however, can retain consistency even if $p \gg n$.
The k-sparse largest eigenvalue (the optimal value of the cardinality-constrained problem) can be used to discriminate an isotropic model, where every direction has the same variance, from a spiked covariance model in the high-dimensional setting. Consider a hypothesis test where the null hypothesis specifies that the data are generated from a multivariate normal distribution with mean 0 and covariance equal to an identity matrix, and the alternative hypothesis specifies that the data are generated from a spiked model with signal strength $\theta$:
$$H_0: X \sim N(0, I_p), \qquad H_1: X \sim N(0, I_p + \theta v v^\top),$$
where $v \in \mathbb{R}^p$ has only $k$ non-zero coordinates. The largest k-sparse eigenvalue can discriminate the two hypotheses if and only if $\theta > \Theta\left(\sqrt{k \log(p)/n}\right)$.
Since computing the k-sparse eigenvalue is NP-hard, one can approximate it by the optimal value of the semidefinite programming relaxation. In that case, one can discriminate the two hypotheses if $\theta > \Theta\left(\sqrt{k^2 \log(p)/n}\right)$. The additional $\sqrt{k}$ term cannot be improved by any other polynomial-time algorithm if the planted clique conjecture holds.
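
A small simulation can illustrate how the k-sparse largest eigenvalue separates the two hypotheses. Everything in the sketch below is an assumption for illustration: the sizes `n`, `p`, `k`, the signal strength, the choice of a unit-norm spike on the first `k` coordinates, and the use of exhaustive enumeration (feasible only for small `p`) in place of a relaxation.

```python
from itertools import combinations

import numpy as np


def k_sparse_largest_eigenvalue(sigma, k):
    """Largest eigenvalue over all k-by-k principal submatrices (exhaustive)."""
    p = sigma.shape[0]
    return max(
        np.linalg.eigvalsh(sigma[np.ix_(s, s)])[-1]
        for s in combinations(range(p), k)
    )


def sample_statistic(rng, n, p, k, theta):
    """Draw n samples from N(0, I + theta v v^T) and return the test statistic."""
    v = np.zeros(p)
    v[:k] = 1.0 / np.sqrt(k)               # assumed unit-norm k-sparse spike
    cov = np.eye(p) + theta * np.outer(v, v)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    sigma_hat = np.cov(X, rowvar=False)    # centered, 1/(n-1) normalization
    return k_sparse_largest_eigenvalue(sigma_hat, k)


rng = np.random.default_rng(0)
n, p, k = 400, 8, 2
stat_null = sample_statistic(rng, n, p, k, theta=0.0)    # isotropic model
stat_spiked = sample_statistic(rng, n, p, k, theta=3.0)  # spiked model
```

Under the null the statistic concentrates near 1, whereas under the spiked alternative it concentrates near $1 + \theta$, so a threshold between the two values separates the hypotheses when the signal is strong enough.
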

Software/source code

elasticnet – R package for Sparse Estimation and Sparse PCA using Elastic-Nets
nsprcomp – R package for sparse and/or non-negative PCA based on thresholded power iterations
Scikit-learn – Python library for machine learning which contains Sparse PCA and other techniques in the decomposition module.
