Kernel density estimation

In statistics, kernel density estimation is a non-parametric way to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample. In some fields such as signal processing and econometrics it is also termed the Parzen–Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form. One of the famous applications of kernel density estimation is in estimating the class-conditional marginal densities of data when using a naive Bayes classifier, which can improve its prediction accuracy dramatically.

Definition

Let be a univariate independent and identically distributed sample drawn from some distribution with an unknown density ƒ. We are interested in estimating the shape of this function ƒ. Its kernel density estimator is
where K is the kernel — a non-negative function — and is a smoothing parameter called the bandwidth. A kernel with subscript h is called the scaled kernel and defined as. Intuitively one wants to choose h as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. The choice of bandwidth is discussed in more detail below.
A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, Epanechnikov, normal, and others. The Epanechnikov kernel is optimal in a mean square error sense, though the loss of efficiency is small for the kernels listed previously. Due to its convenient mathematical properties, the normal kernel is often used, which means, where ϕ is the standard normal density function.
The construction of a kernel density estimate finds interpretations in fields outside of density estimation. For example, in thermodynamics, this is equivalent to the amount of heat generated when heat kernels are placed at each data point locations x_i. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning.

Example

Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. To see this, we compare the construction of histogram and kernel density estimators, using these 6 data points:

Sample	1	2	3	4	5	6
Value	-2.1	-1.3	-0.4	1.9	5.1	6.2

For the histogram, first the horizontal axis is divided into sub-intervals or bins which cover the range of the data. In this case, we have six bins each of width 2. Whenever a data point falls inside this interval, we place a box of height 1/12. If more than one data point falls inside the same bin, we stack the boxes on top of each other.
For the kernel density estimate, we place a normal kernel with standard deviation 2.25 on each of the data points x_i. The kernels are summed to make the kernel density estimate. The smoothness of the kernel density estimate is evident compared to the discreteness of the histogram, as kernel density estimates converge faster to the true underlying density for continuous random variables.

Bandwidth selection

The bandwidth of the kernel is a free parameter which exhibits a strong influence on the resulting estimate. To illustrate its effect, we take a simulated random sample from the standard normal distribution. The grey curve is the true density. In comparison, the red curve is undersmoothed since it contains too many spurious data artifacts arising from using a bandwidth h = 0.05, which is too small. The green curve is oversmoothed since using the bandwidth h = 2 obscures much of the underlying structure. The black curve with a bandwidth of h = 0.337 is considered to be optimally smoothed since its density estimate is close to the true density. An extreme situation is encountered in the limit , where the estimate is a sum of n delta functions centered at the coordinates of analyzed samples. In the other extreme limit the estimate retains the shape of the used kernel, centered on the mean of the samples.
The most common optimality criterion used to select this parameter is the expected L₂ risk function, also termed the mean integrated squared error:
Under weak assumptions on ƒ and K,,
MISE = AMISE + o where o is the little o notation.
The AMISE is the Asymptotic MISE which consists of the two leading terms
where for a function g,
and ƒ' is the second derivative of ƒ. The minimum of this AMISE is the solution to this differential equation
or
Neither the AMISE nor the h_AMISE formulas are able to be used directly since they involve the unknown density function ƒ or its second derivative ƒ', so a variety of automatic, data-based methods have been developed for selecting the bandwidth. Many review studies have been carried out to compare their efficacies, with the general consensus that the plug-in selectors
and cross validation selectors are the most useful over a wide range of data sets.
Substituting any bandwidth h which has the same asymptotic order n^−1/5 as h_AMISE into the AMISE
gives that AMISE = O, where O is the big o notation. It can be shown that, under weak assumptions, there cannot exist a non-parametric estimator that converges at a faster rate than the kernel estimator. Note that the n^−4/5 rate is slower than the typical n⁻¹ convergence rate of parametric methods.
If the bandwidth is not held fixed, but is varied depending upon the location of either the estimate or the samples, this produces a particularly powerful method termed adaptive or variable bandwidth kernel density estimation.
Bandwidth selection for kernel density estimation of heavy-tailed distributions is relatively difficult.

A rule-of-thumb bandwidth estimator

If Gaussian basis functions are used to approximate univariate data, and the underlying density being estimated is Gaussian, the optimal choice for h is:
In order to make the h value more robust to make the fitness well for both long-tailed and skew distribution and bimodal mixture distribution, it is better to substitute the value of with another parameter A, which is given by:
Another modification that will improve the model is to reduce the factor from 1.06 to 0.9. Then the final formula would be:
where is the standard deviation of the samples, n is the sample size. IQR is the interquartile range.
This approximation is termed the normal distribution approximation, Gaussian approximation, or Silverman's rule of thumb.
While this rule of thumb is easy to compute, it should be used with caution as it can yield widely inaccurate estimates when the density is not close to being normal. For example, consider estimating the bimodal Gaussian mixture:
from a sample of 200 points. The figure on the right below shows the true density and two kernel density estimates—one using the rule-of-thumb bandwidth, and the other using
a solve-the-equation bandwidth. The estimate based on the rule-of-thumb bandwidth is significantly oversmoothed.
The Matlab script for this example uses
and is given below.

% Data
randn % Used for reproducibility
data = ; % Two Normals mixed
% True
phi = @ exp/sqrt; % Normal Density
tpdf = @ phi/2+phi/2; % True Density
% Kernel
h = std*^; % Bandwidth estimated by Silverman's Rule of Thumb
kernel = @ mean/h); % Kernel Density
kpdf = @ arrayfun; % Elementwise application
% Plot
figure, clf, hold on
x = linspace; % Linear Space
plot % Plot True Density
plot % Plot Kernel Density with Silverman's Rule of Thumb
kde % Plot Kernel Density with Solve-the-Equation Bandwidth

The same code with R language
` Data

set.seed #Used for reproducibility
data = c,rnorm) #Two Normals mixed

` True

phi = function exp/sqrt #Normal Density
tpdf = function phi/2+phi/2 #True Density

` Kernel

h = sd*^ #Bandwidth estimated by Silverman's Rule of Thumb
Kernel2 = function mean/h) #Kernel Density
kpdf = function sapply #Elementwise application

` Plot

x=seq #Linear Space
plot,type="l",ylim=c #Plot True Density
par
plot,type="l",ylim=c #Plot Kernel Density with Silverman's Rule of Thumb

Relation to the characteristic function density estimator

Given the sample, it is natural to estimate the characteristic function as
Knowing the characteristic function, it is possible to find the corresponding probability density function through the Fourier transform formula. One difficulty with applying this inversion formula is that it leads to a diverging integral, since the estimate is unreliable for large t’s. To circumvent this problem, the estimator is multiplied by a damping function, which is equal to 1 at the origin and then falls to 0 at infinity. The “bandwidth parameter” h controls how fast we try to dampen the function. In particular when h is small, then ψ_h will be approximately one for a large range of t’s, which means that remains practically unaltered in the most important region of t’s.
The most common choice for function ψ is either the uniform function, which effectively means truncating the interval of integration in the inversion formula to, or the Gaussian function. Once the function ψ has been chosen, the inversion formula may be applied, and the density estimator will be
where K is the Fourier transform of the damping function ψ. Thus the kernel density estimator coincides with the characteristic function density estimator.

Geometric and topological features

We can extend the definition of the mode to a local sense and define the local modes:
Namely, is the collection of points for which the density function is locally maximized. A natural estimator of is a plug-in from KDE, where and are KDE version of and. Under mild assumptions, is a consistent estimator of. Note that one can use the mean shift algorithm to compute the estimator numerically.

Statistical implementation

A non-exhaustive list of software implementations of kernel density estimators includes:

In Analytica release 4.4, the Smoothing option for PDF results uses KDE, and from expressions it is available via the built-in Pdf function.
In C/C++, is a library that can be used to compute kernel density estimates using normal kernels. MATLAB interface available.
In C++, is a library for variable kernel density estimation.
In C++, mlpack is a library that can compute KDE using many different kernels. It allows to set an error tolerance for faster computation. Python and R interfaces available.
in C# and F#, Math.NET Numerics is an open source library for numerical computation which includes
In CrimeStat, kernel density estimation is implemented using five different kernel functions – normal, uniform, quartic, negative exponential, and triangular. Both single- and dual-kernel density estimate routines are available. Kernel density estimation is also used in interpolating a Head Bang routine, in estimating a two-dimensional Journey-to-crime density function, and in estimating a three-dimensional Bayesian Journey-to-crime estimate.
In ELKI, kernel density functions can be found in the package de.lmu.ifi.dbs.elki.math.statistics.kernelfunctions
In ESRI products, kernel density mapping is managed out of the Spatial Analyst toolbox and uses the Quartic kernel.
In Excel, the Royal Society of Chemistry has created an add-in to run kernel density estimation based on their .
In gnuplot, kernel density estimation is implemented by the smooth kdensity option, the datafile can contain a weight and bandwidth for each point, or the bandwidth can be set automatically according to "Silverman's rule of thumb".
In Haskell, kernel density is implemented in the package.
In IGOR Pro, kernel density estimation is implemented by the StatsKDE operation. Bandwidth can be user specified or estimated by means of Silverman, Scott or Bowmann and Azzalini. Kernel types are: Epanechnikov, Bi-weight, Tri-weight, Triangular, Gaussian and Rectangular.
In Java, the Weka package provides , among others.
In JavaScript, the visualization package D3.js offers a KDE package in its science.stats package.
In JMP, the distribution platform can be used to create univariate kernel density estimates, and the Fit Y by X platform can be used to create bivariate kernel density estimates. It uses the formula, but not the optimized formula mentioned in book of Silverman.
In Julia, kernel density estimation is implemented in the package.
In MATLAB, kernel density estimation is implemented through the ksdensity function. As of the 2018a release of MATLAB, Alternatively, a free MATLAB software package which implements an automatic bandwidth selection method is available from the MATLAB Central File Exchange for
*
*
*

A free MATLAB toolbox with implementation of kernel regression, kernel density estimation, kernel estimation of hazard function and many others is available on .

In Mathematica, numeric kernel density estimation is implemented by the function SmoothKernelDistribution and symbolic estimation is implemented using the function KernelMixtureDistribution both of which provide data-driven bandwidths.
In Minitab, the Royal Society of Chemistry has created a macro to run kernel density estimation based on their .
In the NAG Library, kernel density estimation is implemented via the g10ba routine.
In , C++ kernel density methods focus on data from the Special Euclidean group.
In Octave, kernel density estimation is implemented by the kernel_density option.
In Origin, 2D kernel density plot can be made from its user interface, and two functions, Ksdensity for 1D and Ks2density for 2D can be used from its , Python, or C code.
In Perl, an implementation can be found in the
In PHP, an implementation can be found in the
In Python, many implementations exist: in the , SciPy, Statsmodels, and Scikit-learn . supports weighted data and its FFT implementation is orders of magnitude faster than the other implementations. The commonly used pandas library offers support for kde plotting through the plot method. The package for weighted and correlated MCMC samples supports optimized bandwidth, boundary correction and higher-order methods for 1D and 2D distributions. One newly used package for kernel density estimation is seaborn.
In R, it is implemented through density in the base distribution, and bw.nrd0 function is used in stats package, this function uses the optimized formula in Silverman's book. bkde in the , ParetoDensityEstimation in the , kde in the , dkden and dbckden in the , npudens in the , sm.density in the . For an implementation of the kde.R function, which does not require installing any packages or libraries, see . The , dedicated to urban analysis, implements kernel density estimation through kernel_smoothing.
In SAS, proc kde can be used to estimate univariate and bivariate kernel densities.
In Apache Spark, you can use the KernelDensity class
In Stata, it is implemented through kdensity; for example histogram x, kdensity. Alternatively a free Stata module KDENS is available from allowing a user to estimate 1D or 2D density functions.
In Swift, it is implemented through SwiftStats.KernelDensityEstimation in the open-source statistics library .
In python a GPU implementation of KDE

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...