Kriging

In statistics, originally in geostatistics, kriging or Gaussian process regression is a method of interpolation for which the interpolated values are modeled by a Gaussian process governed by prior covariances. Under suitable assumptions on the priors, kriging gives the best linear unbiased prediction of the intermediate values. Interpolating methods based on other criteria such as smoothness may not yield the most likely intermediate values. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as Wiener–Kolmogorov prediction, after Norbert Wiener and Andrey Kolmogorov.
The theoretical basis for the method was developed by the French mathematician Georges Matheron in 1960, based on the Master's thesis of Danie G. Krige, the pioneering plotter of distance-weighted average gold grades at the Witwatersrand reef complex in South Africa. Krige sought to estimate the most likely distribution of gold based on samples from a few boreholes. The English verb is to krige and the most common noun is kriging; both are often pronounced with a hard "g", following an Anglicized pronunciation of the name "Krige". The word is sometimes capitalized as Kriging in the literature.

Main principles

Related terms and techniques

The basic idea of kriging is to predict the value of a function at a given point by computing a weighted average of the known values of the function in the neighborhood of the point. The method is mathematically closely related to regression analysis. Both theories derive a best linear unbiased estimator based on assumptions about covariances, make use of the Gauss–Markov theorem to prove independence of the estimate and the error, and use very similar formulae. Even so, they are useful in different frameworks: kriging is made for estimation of a single realization of a random field, while regression models are based on multiple observations of a multivariate data set.
The kriging estimation may also be seen as a spline in a reproducing kernel Hilbert space, with the reproducing kernel given by the covariance function. The difference with the classical kriging approach is provided by the interpretation: while the spline is motivated by a minimum norm interpolation based on a Hilbert space structure, kriging is motivated by an expected squared prediction error based on a stochastic model.
Kriging with polynomial trend surfaces is mathematically identical to generalized least squares polynomial curve fitting.
Kriging can also be understood as a form of Bayesian inference. Kriging starts with a prior distribution over functions. This prior takes the form of a Gaussian process: the values of the function at any finite set of locations are jointly normally distributed, where the covariance between any two values is the covariance function (or kernel) of the Gaussian process evaluated at the spatial locations of the two points. A set of values is then observed, each value associated with a spatial location. A new value can now be predicted at any new spatial location by combining the Gaussian prior with a Gaussian likelihood function for each of the observed values. The resulting posterior distribution is also Gaussian, with a mean and covariance that can be computed simply from the observed values, their variance, and the kernel matrix derived from the prior.
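The Bayesian reading can be sketched numerically. The following is a minimal illustration, not a reference implementation: it assumes a zero-mean prior, 1-D locations, and a squared-exponential covariance, and the names `sq_exp_kernel`, `gp_posterior` and `noise_var` are hypothetical choices made for this example.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D location arrays a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise_var=1e-6):
    """Posterior mean and covariance of a zero-mean Gaussian process at x_new."""
    # Kernel matrix of the observations plus the Gaussian likelihood (noise) variance
    K = sq_exp_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_s = sq_exp_kernel(x_obs, x_new)    # prior covariance, observed vs new locations
    K_ss = sq_exp_kernel(x_new, x_new)   # prior covariance among the new locations
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov

x_obs = np.array([0.0, 1.0, 2.5])
y_obs = np.array([0.2, 0.9, -0.4])
x_new = np.array([1.0, 1.7])             # one observed location, one unobserved
mean, cov = gp_posterior(x_obs, y_obs, x_new)
```

With a very small noise variance the posterior mean at an observed location nearly reproduces the observation, and the posterior variance there is close to zero, while the unobserved location keeps a larger variance.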

Geostatistical estimator

In geostatistical models, sampled data is interpreted as the result of a random process. The fact that these models incorporate uncertainty in their conceptualization doesn't mean that the phenomenon – the forest, the aquifer, the mineral deposit – has resulted from a random process, but rather it allows one to build a methodological basis for the spatial inference of quantities in unobserved locations, and to quantify the uncertainty associated with the estimator.
A stochastic process is, in the context of this model, simply a way to approach the set of data collected from the samples. The first step in geostatistical modelling is to create a random process that best describes the set of observed data.
A value observed at location $x_1$ is interpreted as a realization $z(x_1)$ of the random variable $Z(x_1)$. In the space $A$, where the set of samples is dispersed, there are $N$ realizations of the random variables $Z(x_1), Z(x_2), \ldots, Z(x_N)$, correlated between themselves.
The set of random variables constitutes a random function $Z(x)$, of which only one realization $z(x_i)$ is known: the set of observed data. With only one realization of each random variable, it is theoretically impossible to determine any statistical parameter of the individual variables or of the function. The solution proposed in the geostatistical formalism is to assume various degrees of stationarity in the random function, so that some statistical quantities can be inferred.
For instance, if one assumes, based on the homogeneity of samples in the area $A$ where the variable is distributed, that the first moment is stationary (i.e. all random variables have the same mean), then one is assuming that the mean can be estimated by the arithmetic mean of the sampled values.
The hypothesis of stationarity related to the second moment is defined in the following way: the correlation between two random variables depends solely on the spatial distance between them and is independent of their location. Thus if $\mathbf{h} = x_2 - x_1$ and $|\mathbf{h}| = h$, then:
$$C(Z(x_1), Z(x_2)) = C(Z(x_i), Z(x_i + \mathbf{h})) = C(h),$$
$$\gamma(Z(x_1), Z(x_2)) = \gamma(Z(x_i), Z(x_i + \mathbf{h})) = \gamma(h),$$
and, for simplicity, we define $C(x_i, x_j) = C(Z(x_i), Z(x_j))$ and $\gamma(x_i, x_j) = \gamma(Z(x_i), Z(x_j))$.
This hypothesis allows one to infer those two measures, the variogram and the covariogram:
$$\gamma(h) = \frac{1}{2|N(h)|} \sum_{(i,j) \in N(h)} \left(Z(x_i) - Z(x_j)\right)^2,$$
$$C(h) = \frac{1}{|N(h)|} \sum_{(i,j) \in N(h)} \left(Z(x_i) - m(h)\right)\left(Z(x_j) - m(h)\right),$$
where:
$$m(h) = \frac{1}{2|N(h)|} \sum_{(i,j) \in N(h)} \left(Z(x_i) + Z(x_j)\right);$$
$N(h)$ denotes the set of pairs of observations $i, j$ such that $|x_i - x_j| = h$, and $|N(h)|$ is the number of pairs in the set. In this set, $(i, j)$ and $(j, i)$ denote the same element. Generally an "approximate distance" $h$ is used, implemented using a certain tolerance.
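As an illustration of the variogram inference, the sketch below averages half the squared differences over all unordered pairs whose separation lies within a tolerance of each lag. The function name `empirical_variogram` and the parameters `lags` and `tol` are assumptions made for this example, not part of any particular library.

```python
import numpy as np

def empirical_variogram(coords, values, lags, tol):
    """Variogram estimate at each lag h, using pairs with | |x_i - x_j| - h | <= tol."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]              # treat 1-D input as n points in R^1
    i, j = np.triu_indices(len(values), k=1)  # each unordered pair (i, j) exactly once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    gamma = np.full(len(lags), np.nan)        # NaN where a lag has no pairs
    counts = np.zeros(len(lags), dtype=int)
    for k, h in enumerate(lags):
        in_bin = np.abs(dist - h) <= tol      # the "approximate distance" with tolerance
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gamma[k] = sq_diff[in_bin].sum() / (2 * counts[k])
    return gamma, counts

coords = np.array([0.0, 1.0, 2.0, 3.0])
values = np.array([1.0, 3.0, 2.0, 5.0])
gamma, counts = empirical_variogram(coords, values, lags=[1.0, 2.0, 3.0], tol=0.1)
```

For this small transect there are three pairs at lag 1, two at lag 2 and one at lag 3; in practice lags with few pairs give unreliable estimates, which is one reason a tolerance is used.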

Linear estimation

Spatial inference, or estimation, of a quantity $Z \colon \mathbb{R}^n \to \mathbb{R}$ at an unobserved location $x_0$ is calculated from a linear combination of the observed values $z_i = Z(x_i)$ and weights $w_i(x_0)$, $i = 1, \ldots, N$:
$$\hat{Z}(x_0) = \begin{bmatrix} w_1 & w_2 & \cdots & w_N \end{bmatrix} \cdot \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_N \end{bmatrix} = \sum_{i=1}^{N} w_i(x_0)\, Z(x_i).$$
The weights $w_i$ are intended to summarize two extremely important procedures in a spatial inference process:

they reflect the structural "proximity" of samples to the estimation location $x_0$;
at the same time, they should have a desegregation (declustering) effect, in order to avoid bias caused by possible sample clusters.

When calculating the weights $w_i$, there are two objectives in the geostatistical formalism: unbiasedness and minimal variance of estimation.
If the cloud of real values $Z(x_i)$ is plotted against the estimated values $\hat{Z}(x_i)$, the criterion for global unbiasedness, under intrinsic stationarity or wide-sense stationarity of the field, implies that the mean of the estimates must equal the mean of the real values.
The second criterion says that the mean of the squared deviations $\left(\hat{Z}(x) - Z(x)\right)^2$ must be minimal: the more dispersed the cloud of estimated values plotted against the real values, the less precise the estimator.

Methods

Depending on the stochastic properties of the random field and the various degrees of stationarity assumed, different methods for calculating the weights can be deduced, i.e. different types of kriging apply. Classical methods are:

Ordinary kriging assumes a constant unknown mean only over the search neighborhood of $x_0$.
Simple kriging assumes stationarity of the first moment over the entire domain with a known mean: $E\{Z(x)\} = E\{Z(x_0)\} = m$, where $m$ is the known mean.
Universal kriging assumes a general polynomial trend model, such as the linear trend model $E\{Z(x)\} = \sum_{k=0}^{p} \beta_k f_k(x)$.
IRFk-kriging assumes $E\{Z(x)\}$ to be an unknown polynomial in $x$.
Indicator kriging uses indicator functions instead of the process itself, in order to estimate transition probabilities.
* Multiple-indicator kriging (MIK) is a version of indicator kriging that works with a family of indicators. MIK initially showed considerable promise as a method that could estimate overall global mineral deposit concentrations or grades more accurately. However, these benefits have been outweighed by practical modelling problems caused by the inherently large block sizes used and by the lack of mining-scale resolution. Conditional simulation is fast becoming the accepted replacement technique in this case.
Disjunctive kriging is a nonlinear generalisation of kriging.
Lognormal kriging interpolates positive data by means of logarithms.

Ordinary kriging

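The unknown value $Z(x_0)$ is interpreted as a random variable located at $x_0$, as are the values of the neighboring samples $Z(x_i)$, $i = 1, \ldots, N$. The estimator $\hat{Z}(x_0)$ is also interpreted as a random variable located at $x_0$, resulting from the linear combination of these variables.
In order to deduce the kriging system under the assumptions of the model, the error committed while estimating $Z(x)$ at $x_0$ is defined as:
$$\epsilon(x_0) = \hat{Z}(x_0) - Z(x_0) = \begin{bmatrix} W^T & -1 \end{bmatrix} \cdot \begin{bmatrix} Z(x_1) & \cdots & Z(x_N) & Z(x_0) \end{bmatrix}^T = \sum_{i=1}^{N} w_i(x_0)\, Z(x_i) - Z(x_0),$$
where $W = \begin{bmatrix} w_1 & \cdots & w_N \end{bmatrix}^T$ is the vector of weights.
The two quality criteria referred to previously can now be expressed in terms of the mean and variance of the new random variable $\epsilon(x_0)$:
Lack of bias:
Since the random function is stationary, $E(Z(x_i)) = E(Z(x_0)) = m$, the following constraint is observed:
$$E(\epsilon(x_0)) = 0 \Leftrightarrow \sum_{i=1}^{N} w_i(x_0)\, E(Z(x_i)) - E(Z(x_0)) = 0 \Leftrightarrow m \sum_{i=1}^{N} w_i(x_0) - m = 0 \Leftrightarrow \sum_{i=1}^{N} w_i(x_0) = 1 \Leftrightarrow \mathbf{1}^T \cdot W = 1.$$
In order to ensure that the model is unbiased, the weights must sum to one.
Minimum variance:
Two estimators can both have $E[\epsilon(x_0)] = 0$, but the dispersion around their mean determines the difference in quality between them. To find an estimator with minimum variance, we need to minimize $E\left(\epsilon(x_0)^2\right)$:
$$\operatorname{Var}(\epsilon(x_0)) = \begin{bmatrix} W^T & -1 \end{bmatrix} \cdot \operatorname{Var}\left(\begin{bmatrix} Z(x_1) & \cdots & Z(x_N) & Z(x_0) \end{bmatrix}^T\right) \cdot \begin{bmatrix} W \\ -1 \end{bmatrix} = \begin{bmatrix} W^T & -1 \end{bmatrix} \cdot \begin{bmatrix} \operatorname{Var}_{x_i} & \operatorname{Cov}_{x_i x_0} \\ \operatorname{Cov}_{x_i x_0}^T & \operatorname{Var}_{x_0} \end{bmatrix} \cdot \begin{bmatrix} W \\ -1 \end{bmatrix}$$
(see covariance matrix for a detailed explanation), where the literals $\operatorname{Var}_{x_i}$, $\operatorname{Var}_{x_0}$ and $\operatorname{Cov}_{x_i x_0}$ stand for $\operatorname{Var}\left(\begin{bmatrix} Z(x_1) & \cdots & Z(x_N) \end{bmatrix}^T\right)$, $\operatorname{Var}(Z(x_0))$ and $\operatorname{Cov}\left(\begin{bmatrix} Z(x_1) & \cdots & Z(x_N) \end{bmatrix}^T, Z(x_0)\right)$ respectively.
Once the covariance model or variogram, $C(\mathbf{h})$ or $\gamma(\mathbf{h})$, has been defined and is valid over the whole field of analysis of $Z(x)$, we can write the estimation variance of any estimator as a function of the covariances between the samples and the covariances between the samples and the point to estimate:
$$\operatorname{Var}(\epsilon(x_0)) = W^T \cdot \operatorname{Var}_{x_i} \cdot W - \operatorname{Cov}_{x_i x_0}^T \cdot W - W^T \cdot \operatorname{Cov}_{x_i x_0} + \operatorname{Var}_{x_0} = C(x_0, x_0) + \sum_{i} \sum_{j} w_i w_j\, C(x_i, x_j) - 2 \sum_{i} w_i\, C(x_i, x_0).$$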
Some conclusions can be asserted from this expression. The variance of estimation:

is quantifiable for any linear estimator, once the stationarity of the mean and of the spatial covariances, or variograms, is assumed.
grows when the covariance between the samples and the point to estimate decreases. This means that, when the samples are farther away from $x_0$, the estimation becomes worse.
grows with the a priori variance $C(0)$ of the variable $Z(x)$. When the variable is less dispersed, the variance is lower at any point of the area $A$.
does not depend on the values of the samples. This means that the same spatial configuration always reproduces the same estimation variance in any part of the area $A$. Consequently, the estimation variance does not measure the uncertainty of estimation produced by the local behavior of the variable.
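These observations can be checked numerically. The sketch below evaluates the estimation variance of an arbitrary linear estimator from the covariances alone; it assumes 1-D locations and an exponential covariance model, and the names `exp_cov`, `corr_range` and `estimation_variance` are hypothetical.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=2.0):
    """Exponential covariance model C(h) (an assumed model for illustration)."""
    return sill * np.exp(-np.abs(h) / corr_range)

def estimation_variance(weights, x_samples, x0, cov=exp_cov):
    """C(0) + sum_ij w_i w_j C(x_i, x_j) - 2 sum_i w_i C(x_i, x0); sample values never enter."""
    C = cov(x_samples[:, None] - x_samples[None, :])  # sample-to-sample covariances
    c0 = cov(x_samples - x0)                          # sample-to-target covariances
    return cov(0.0) + weights @ C @ weights - 2.0 * weights @ c0

x_samples = np.array([0.0, 1.0, 3.0])
w = np.full(3, 1.0 / 3.0)  # equal weights summing to one: unbiased, but not optimal
var_near = estimation_variance(w, x_samples, x0=1.5)
var_far = estimation_variance(w, x_samples, x0=10.0)
```

The function takes no sample values at all, which makes the last point of the list concrete: the variance is fixed by the geometry and the covariance model. Moving the target away from the samples (`var_far`) increases the variance, and doubling the sill doubles it.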

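System of equations

The weights are obtained by minimizing the estimation variance subject to the unbiasedness constraint:
$$\hat{W} = \underset{\mathbf{1}^T W = 1}{\arg\min} \left( W^T \cdot \operatorname{Var}_{x_i} \cdot W - \operatorname{Cov}_{x_i x_0}^T \cdot W - W^T \cdot \operatorname{Cov}_{x_i x_0} + \operatorname{Var}_{x_0} \right),$$
where $\operatorname{Var}_{x_i}$ is the covariance matrix of the samples and $\operatorname{Cov}_{x_i x_0}$ is the vector of covariances between the samples and $Z(x_0)$.
Solving this optimization problem (see Lagrange multipliers) results in the kriging system:
$$\begin{bmatrix} \hat{W} \\ \mu \end{bmatrix} = \begin{bmatrix} \operatorname{Var}_{x_i} & \mathbf{1} \\ \mathbf{1}^T & 0 \end{bmatrix}^{-1} \cdot \begin{bmatrix} \operatorname{Cov}_{x_i x_0} \\ 1 \end{bmatrix}.$$
The additional parameter $\mu$ is a Lagrange multiplier used in the minimization of the kriging error $\sigma_k^2(x)$ to honor the unbiasedness condition.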
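A minimal sketch of solving the ordinary kriging system in covariance form follows. It assumes 1-D locations and an exponential covariance model; the names `exp_cov`, `corr_range` and `ordinary_kriging` are hypothetical, and the sign convention for the Lagrange multiplier is the one in which the sample rows read "covariances times weights plus multiplier equals target covariances".

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=2.0):
    """Exponential covariance model C(h) (an assumed model for illustration)."""
    return sill * np.exp(-np.abs(h) / corr_range)

def ordinary_kriging(x_samples, z_samples, x0, cov=exp_cov):
    """Solve the bordered system [[C, 1], [1^T, 0]] [w; mu] = [c0; 1]."""
    n = len(x_samples)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = cov(x_samples[:, None] - x_samples[None, :])
    A[:n, n] = 1.0                      # column of ones (Lagrange multiplier)
    A[n, :n] = 1.0                      # row of ones (weights sum to one)
    b = np.append(cov(x_samples - x0), 1.0)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    estimate = w @ z_samples
    variance = cov(0.0) - w @ b[:n] - mu  # kriging variance under this sign convention
    return estimate, variance, w

x_samples = np.array([0.0, 1.0, 3.0])
z_samples = np.array([2.0, 3.0, 1.0])
estimate, variance, weights = ordinary_kriging(x_samples, z_samples, x0=2.0)
```

The weights returned always sum to one, and evaluating at a sample location returns that sample's value with zero kriging variance.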

Simple kriging

Simple kriging is mathematically the simplest, but the least general. It assumes the expectation of the random field to be known, and relies on a covariance function. However, in most applications neither the expectation nor the covariance is known beforehand.
The practical assumptions for the application of simple kriging are:

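Wide-sense stationarity of the field.
The expectation is zero everywhere: $\mu(x) = 0$.
Known covariance function $c(x, y) = \operatorname{Cov}(Z(x), Z(y))$.

System of equations

The kriging weights of simple kriging have no unbiasedness condition and are given by the simple kriging equation system:
$$\begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} c(x_1, x_1) & \cdots & c(x_1, x_n) \\ \vdots & \ddots & \vdots \\ c(x_n, x_1) & \cdots & c(x_n, x_n) \end{pmatrix}^{-1} \begin{pmatrix} c(x_1, x_0) \\ \vdots \\ c(x_n, x_0) \end{pmatrix}.$$
This is analogous to a linear regression of $Z(x_0)$ on the other $z_1, \ldots, z_n$.

Estimation

The interpolation by simple kriging is given by:
$$\hat{Z}(x_0) = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}^T \begin{pmatrix} c(x_1, x_1) & \cdots & c(x_1, x_n) \\ \vdots & \ddots & \vdots \\ c(x_n, x_1) & \cdots & c(x_n, x_n) \end{pmatrix}^{-1} \begin{pmatrix} c(x_1, x_0) \\ \vdots \\ c(x_n, x_0) \end{pmatrix}.$$
The kriging error is given by:
$$\operatorname{Var}\left(\hat{Z}(x_0) - Z(x_0)\right) = \underbrace{c(x_0, x_0)}_{\operatorname{Var}(Z(x_0))} - \underbrace{\begin{pmatrix} c(x_1, x_0) \\ \vdots \\ c(x_n, x_0) \end{pmatrix}^T \begin{pmatrix} c(x_1, x_1) & \cdots & c(x_1, x_n) \\ \vdots & \ddots & \vdots \\ c(x_n, x_1) & \cdots & c(x_n, x_n) \end{pmatrix}^{-1} \begin{pmatrix} c(x_1, x_0) \\ \vdots \\ c(x_n, x_0) \end{pmatrix}}_{\operatorname{Var}(\hat{Z}(x_0))},$$
which leads to the generalised least squares version of the Gauss–Markov theorem:
$$\operatorname{Var}(Z(x_0)) = \operatorname{Var}(\hat{Z}(x_0)) + \operatorname{Var}\left(\hat{Z}(x_0) - Z(x_0)\right).$$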
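The simple kriging equations translate almost directly into a few lines of linear algebra. The sketch below assumes a zero-mean field, 1-D locations and an exponential covariance function; `exp_cov`, `corr_range` and `simple_kriging` are hypothetical names used only for this illustration.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=2.0):
    """Exponential covariance function (an assumed model for illustration)."""
    return sill * np.exp(-np.abs(h) / corr_range)

def simple_kriging(x_samples, z_samples, x0, cov=exp_cov):
    """Simple kriging of a zero-mean field: weights w solve C w = c0, no constraint."""
    C = cov(x_samples[:, None] - x_samples[None, :])  # sample covariance matrix
    c0 = cov(x_samples - x0)                          # covariances with the target
    w = np.linalg.solve(C, c0)
    estimate = w @ z_samples
    var_estimator = c0 @ w                     # variance of the estimator
    kriging_var = cov(0.0) - var_estimator     # variance of the kriging error
    return estimate, kriging_var, var_estimator

x_samples = np.array([0.0, 1.0, 3.0])
z_samples = np.array([0.5, -0.2, 0.8])
estimate, kriging_var, var_estimator = simple_kriging(x_samples, z_samples, x0=2.0)
```

Far from all samples the covariances with the target vanish, so the estimate falls back to the known mean (zero here) and the kriging variance approaches the a priori variance of the field.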

Properties

The kriging estimation is unbiased: $E[\hat{Z}(x_i)] = E[Z(x_i)]$.
The kriging estimation honors the actually observed value: $\hat{Z}(x_i) = Z(x_i)$ (assuming no measurement error is incurred).
The kriging estimation $\hat{Z}(x)$ is the best linear unbiased estimator of $Z(x)$ if the assumptions hold. However:
* As with any method, if the assumptions do not hold, kriging may perform poorly.
* There might be better nonlinear and/or biased methods.
* No properties are guaranteed when the wrong variogram is used. However, a 'good' interpolation is typically still achieved.
* Best is not necessarily good: e.g. in the case of no spatial dependence, the kriging interpolation is only as good as the arithmetic mean.
Kriging provides $\sigma_k^2$ as a measure of precision. However, this measure relies on the correctness of the variogram.

Applications

Although kriging was developed originally for applications in geostatistics, it is a general method of statistical interpolation that can be applied within any discipline to sampled data from random fields that satisfy the appropriate mathematical assumptions. It can be used where spatially-related data has been collected and estimates of "fill-in" data are desired in the locations between the actual measurements.
To date kriging has been used in a variety of disciplines, including the following:

Environmental science
Hydrogeology
Mining
Natural resources
Remote sensing
Real estate appraisal
Integrated circuit analysis and optimization
Modelling of microwave devices
Astronomy

Design and analysis of computer experiments

Another very important and rapidly growing field of application, in engineering, is the interpolation of response variables produced by deterministic computer simulations, e.g. finite element method (FEM) simulations. In this case, kriging is used as a metamodeling tool, i.e. a black-box model built over a designed set of computer experiments. In many practical engineering problems, such as the design of a metal forming process, a single FEM simulation may take several hours or even a few days. It is therefore more efficient to design and run a limited number of computer simulations and then use a kriging interpolator to rapidly predict the response at any other design point. Kriging is therefore very often used as a so-called surrogate model, implemented inside optimization routines.
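The surrogate-model workflow can be sketched as follows. Everything here is an assumption made for illustration: `expensive_simulation` is a cheap stand-in for a costly deterministic code, the covariance is a squared-exponential with a hand-picked length scale, the sample mean is plugged in as the "known" mean of simple kriging, and a tiny `jitter` term is added only for numerical stability.

```python
import numpy as np

def expensive_simulation(x):
    """Hypothetical stand-in for a costly deterministic simulation."""
    return np.sin(3.0 * x) + 0.5 * x

def sq_exp_cov(h, length_scale=0.4):
    """Squared-exponential covariance (assumed model, hand-picked length scale)."""
    return np.exp(-0.5 * (h / length_scale) ** 2)

def fit_kriging_surrogate(x_design, y_design, jitter=1e-12):
    """Fit a simple-kriging interpolator to the design runs and return a predictor."""
    mean = y_design.mean()  # plug-in estimate used as the known mean
    C = sq_exp_cov(x_design[:, None] - x_design[None, :])
    C = C + jitter * np.eye(len(x_design))
    alpha = np.linalg.solve(C, y_design - mean)  # solved once, reused per prediction

    def predict(x_new):
        c0 = sq_exp_cov(np.atleast_1d(x_new)[:, None] - x_design[None, :])
        return mean + c0 @ alpha

    return predict

x_design = np.linspace(0.0, 2.0, 10)       # a small designed set of experiments
y_design = expensive_simulation(x_design)  # the only "expensive" calls
surrogate = fit_kriging_surrogate(x_design, y_design)

x_grid = np.linspace(0.0, 2.0, 201)
max_error = np.max(np.abs(surrogate(x_grid) - expensive_simulation(x_grid)))
```

After the ten design runs, each surrogate prediction costs only a small matrix-vector product, which is what makes the approach attractive inside an optimization loop. In a real study the covariance parameters would be estimated from the data rather than fixed by hand.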

Historical references

Agterberg, F P, Geomathematics, Mathematical Background and Geo-Science Applications, Elsevier Scientific Publishing Company, Amsterdam, 1974
Cressie, N. A. C., The origins of kriging, Mathematical Geology, v. 22, pp 239–252, 1990
Krige, D.G, A statistical approach to some mine valuations and allied problems at the Witwatersrand, Master's thesis of the University of Witwatersrand, 1951
Link, R F and Koch, G S, Experimental Designs and Trend-Surface Analysis, Geostatistics, A colloquium, Plenum Press, New York, 1970
Matheron, G., "Principles of geostatistics", Economic Geology, 58, pp 1246–1266, 1963
Matheron, G., "The intrinsic random functions, and their applications", Adv. Appl. Prob., 5, pp 439–468, 1973
Merriam, D F, Editor, Geostatistics, a colloquium, Plenum Press, New York, 1970

Books

Abramowitz, M., and Stegun, I., Handbook of Mathematical Functions, Dover Publications, New York.
Banerjee, S., Carlin, B.P. and Gelfand, A.E.. Hierarchical Modeling and Analysis for Spatial Data. Chapman and Hall/CRC Press, Taylor and Francis Group.
Chiles, J.-P. and P. Delfiner Geostatistics, Modeling Spatial uncertainty, Wiley Series in Probability and statistics.
Clark, I, and Harper, W.V., Practical Geostatistics 2000, Ecosse North America, USA
Cressie, N Statistics for spatial data, Wiley, New York
David, M Handbook of Applied Advanced Geostatistical Ore Reserve Estimation, Elsevier Scientific Publishing
Deutsch, C.V., and Journel, A. G., GSLIB - Geostatistical Software Library and User's Guide, Oxford University Press, New York, 338 pp.
Goovaerts, P. Geostatistics for Natural Resources Evaluation, Oxford University Press, New York
Isaaks, E. H., and Srivastava, R. M., An Introduction to Applied Geostatistics, Oxford University Press, New York, 561 pp.
Journel, A. G. and C. J. Huijbregts Mining Geostatistics, Academic Press London
Journel, A. G., Fundamentals of Geostatistics in Five Lessons, American Geophysical Union, Washington D.C.
Stein, M. L., Statistical Interpolation of Spatial Data: Some Theory for Kriging, Springer, New York.
Wackernagel, H. Multivariate Geostatistics - An Introduction with Applications, Springer Berlin
