Cramér–Rao bound


In estimation theory and statistics, the Cramér–Rao bound expresses a lower bound on the variance of unbiased estimators of a deterministic parameter, stating that the variance of any such estimator is at least as high as the inverse of the Fisher information. The result is named in honor of Harald Cramér and C. R. Rao, but has independently also been derived by Maurice Fréchet, Georges Darmois, as well as Alexander Aitken and Harold Silverstone.
An unbiased estimator which achieves this lower bound is said to be efficient. Such a solution achieves the lowest possible mean squared error among all unbiased methods, and is therefore the minimum variance unbiased estimator. However, in some cases, no unbiased technique exists which achieves the bound. This may occur either if for any unbiased estimator, there exists another with a strictly smaller variance, or if an MVU estimator exists, but its variance is strictly greater than the inverse of the Fisher information.
The Cramér–Rao bound can also be used to bound the variance of biased estimators of given bias. In some cases, a biased approach can result in both a variance and a mean squared error that are below the unbiased Cramér–Rao lower bound; see estimator bias.

Statement

The Cramér–Rao bound is stated in this section for several increasingly general cases, beginning with the case in which the parameter is a scalar and its estimator is unbiased. All versions of the bound require certain [|regularity conditions], which hold for most well-behaved distributions. These conditions are listed [|later in this section].

Scalar unbiased case

Suppose is an unknown deterministic parameter which is to be estimated from independent observations of, each distributed according to some probability density function. The variance of any unbiased estimator of is then bounded by the reciprocal of the Fisher information :
where the Fisher information is defined by
and is the natural logarithm of the likelihood function for a single sample and denotes the expected value. If is twice differentiable and certain regularity conditions hold then the Fisher information can also be defined as follows:
The efficiency of an unbiased estimator measures how close this estimator's variance comes to this lower bound; estimator efficiency is defined as
or the minimum possible variance for an unbiased estimator divided by its actual variance.
The Cramér–Rao lower bound thus gives

General scalar case

A more general form of the bound can be obtained by considering a biased estimator, whose expectation is not but a function of this parameter, say,. Hence is not generally equal to 0. In this case, the bound is given by
where is the derivative of , and is the Fisher information defined [|above].

Bound on the variance of biased estimators

Apart from being a bound on estimators of functions of the parameter, this approach can be used to derive a bound on the variance of biased estimators with a given bias, as follows. Consider an estimator with bias, and let. By the result above, any unbiased estimator whose expectation is has variance greater than or equal to. Thus, any estimator whose bias is given by a function satisfies
The unbiased version of the bound is a special case of this result, with.
It's trivial to have a small variance − an "estimator" that is constant has a variance of zero. But from the above equation we find that the mean squared error of a biased estimator is bounded by
using the standard decomposition of the MSE. Note, however, that if this bound might be less than the unbiased Cramér–Rao bound. For instance, in the [|example of estimating variance below],.

Multivariate case

Extending the Cramér–Rao bound to multiple parameters, define a parameter column vector
with probability density function which satisfies the two regularity conditions below.
The Fisher information matrix is a matrix with element defined as
Let be an estimator of any vector function of parameters,, and denote its expectation vector by. The Cramér–Rao bound then states that the covariance matrix of satisfies
where
If is an unbiased estimator of , then the Cramér–Rao bound reduces to
If it is inconvenient to compute the inverse of the Fisher information matrix,
then one can simply take the reciprocal of the corresponding diagonal element
to find a lower bound.

Regularity conditions

The bound relies on two weak regularity conditions on the probability density function,, and the estimator :
Suppose, in addition, that the operations of integration and differentiation can be swapped for the second derivative of as well, i.e.,
In this case, it can be shown that the Fisher information equals
The Cramèr–Rao bound can then be written as
In some cases, this formula gives a more convenient technique for evaluating the bound.

Single-parameter proof

The following is a proof of the general scalar case of the Cramér–Rao bound described above. Assume that is an estimator with expectation , i.e. that. The goal is to prove that, for all,
Let be a random variable with probability density function.
Here is a statistic, which is used as an estimator for. Define as the score:
where the chain rule is used in the final equality above. Then the expectation of, written, is zero. This is because:
where the integral and partial derivative have been interchanged.
If we consider the covariance of and, we have, because. Expanding this expression we have
again because the integration and differentiation operations commute.
The Cauchy–Schwarz inequality shows that
therefore
which proves the proposition.

Examples

Multivariate normal distribution

For the case of a d-variate normal distribution
the Fisher information matrix has elements
where "tr" is the trace.
For example, let be a sample of independent observations with unknown mean and known variance .
Then the Fisher information is a scalar given by
and so the Cramér–Rao bound is

Normal variance with known mean

Suppose X is a normally distributed random variable with known mean and unknown variance. Consider the following statistic:
Then T is unbiased for, as. What is the variance of T?
. The first term is the fourth moment about the mean and has value ; the second is the square of the variance, or.
Thus
Now, what is the Fisher information in the sample? Recall that the score V is defined as
where is the likelihood function. Thus in this case,
where the second equality is from elementary calculus. Thus, the information in a single observation is just minus the expectation of the derivative of V, or
Thus the information in a sample of independent observations is just times this, or
The Cramer–Rao bound states that
In this case, the inequality is saturated, showing that the estimator is efficient.
However, we can achieve a lower mean squared error using a biased estimator. The estimator
obviously has a smaller variance, which is in fact
Its bias is
so its mean squared error is
which is clearly less than the Cramér–Rao bound found above.
When the mean is not known, the minimum mean squared error estimate of the variance of a sample from Gaussian distribution is achieved by dividing by n + 1, rather than n − 1 or n + 2.