Rao–Blackwell theorem

In statistics, the Rao–Blackwell theorem, sometimes referred to as the Rao–Blackwell–Kolmogorov theorem, is a result which characterizes the transformation of an arbitrarily crude estimator into an estimator that is optimal by the mean-squared-error criterion or any of a variety of similar criteria.
The Rao–Blackwell theorem states that if g is any kind of estimator of a parameter θ, then the conditional expectation of g given T, where T is a sufficient statistic, is typically a better estimator of θ, and is never worse. Sometimes one can very easily construct a very crude estimator g, and then evaluate that conditional expected value to get an estimator that is in various senses optimal.
The theorem is named after Calyampudi Radhakrishna Rao and David Blackwell. The process of transforming an estimator using the Rao–Blackwell theorem is sometimes called Rao–Blackwellization. The transformed estimator is called the Rao–Blackwell estimator.

Definitions

An estimator δ is an observable random variable used for estimating some unobservable quantity. For example, one may be unable to observe the average height of all male students at the University of X, but one may observe the heights of a random sample of 40 of them. The average height of those 40—the "sample average"—may be used as an estimator of the unobservable "population average".
A sufficient statistic T(X) is a statistic calculated from data X such that no other statistic calculated from the same data provides any additional information about the parameter θ. It is defined as an observable random variable such that the conditional probability distribution of all observable data X given T(X) does not depend on the unobservable parameter θ, such as the mean or standard deviation of the whole population from which the data X was taken. In the most frequently cited examples, the "unobservable" quantities are parameters that parametrize a known family of probability distributions according to which the data are distributed.
A Rao–Blackwell estimator δ₁(X) of an unobservable quantity θ is the conditional expected value E(δ(X) | T(X)) of some estimator δ(X) given a sufficient statistic T(X). Call δ(X) the "original estimator" and δ₁(X) the "improved estimator". It is important that the improved estimator be observable, i.e. that it does not depend on θ. Generally, the conditional expected value of one function of these data given another function of these data does depend on θ, but the very definition of sufficiency given above entails that this one does not.
The mean squared error of an estimator is the expected value of the square of its deviation from the unobservable quantity being estimated.

The theorem

Mean-squared-error version

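One case of the Rao–Blackwell theorem states:
The mean squared error of the Rao–Blackwell estimator does not exceed that of the original estimator.
In other words,
$$\operatorname{E}\left[(\delta_1(X)-\theta)^2\right] \le \operatorname{E}\left[(\delta(X)-\theta)^2\right].$$
The essential tools of the proof besides the definition above are the law of total expectation and the fact that for any random variable Y, E(Y²) cannot be less than [E(Y)]². That inequality is a case of Jensen's inequality, although it may also be shown to follow instantly from the frequently mentioned fact that
$$0 \le \operatorname{Var}(Y) = \operatorname{E}\left[(Y-\operatorname{E}(Y))^2\right] = \operatorname{E}(Y^2) - (\operatorname{E}(Y))^2.$$
More precisely, the mean squared error of the Rao–Blackwell estimator has the following decomposition:
$$\operatorname{E}\left[(\delta_1(X)-\theta)^2\right] = \operatorname{E}\left[(\delta(X)-\theta)^2\right] - \operatorname{E}\left[\operatorname{Var}(\delta(X)\mid T(X))\right].$$
Since E[Var(δ(X) | T(X))] ≥ 0, the Rao–Blackwell theorem immediately follows.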
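The decomposition of the mean squared error can be verified in two steps. The following is a sketch, assuming δ(X) has finite variance and writing δ₁(X) = E(δ(X) | T(X)): first apply the identity E(Y²) = Var(Y) + (E(Y))² conditionally on T to Y = δ(X) − θ, then average over T using the law of total expectation.

```latex
\begin{align*}
\operatorname{E}\left[(\delta(X)-\theta)^2 \mid T\right]
  &= \operatorname{Var}(\delta(X)\mid T)
   + \left(\operatorname{E}[\delta(X)\mid T]-\theta\right)^2 \\
  &= \operatorname{Var}(\delta(X)\mid T) + (\delta_1(X)-\theta)^2, \\
\operatorname{E}\left[(\delta(X)-\theta)^2\right]
  &= \operatorname{E}\left[\operatorname{Var}(\delta(X)\mid T)\right]
   + \operatorname{E}\left[(\delta_1(X)-\theta)^2\right].
\end{align*}
```

The conditional variance term is non-negative, so the mean squared error of δ₁ cannot exceed that of δ, with equality only when δ(X) is already (almost surely) a function of T.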

Convex loss generalization

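The more general version of the Rao–Blackwell theorem speaks of the "expected loss" or risk function:
$$\operatorname{E}\left[L(\delta_1(X))\right] \le \operatorname{E}\left[L(\delta(X))\right],$$
where the "loss function" L may be any convex function. If the loss function is twice-differentiable, as in the case of mean squared error, then we have the sharper inequality
$$\operatorname{E}\left[L(\delta(X))\right] - \operatorname{E}\left[L(\delta_1(X))\right] \ge \frac{1}{2}\operatorname{E}_T\left[\inf_x L''(x)\,\operatorname{Var}(\delta(X)\mid T)\right].$$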
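The convex-loss version follows from the conditional form of Jensen's inequality. A brief sketch, assuming L is convex, the expectations involved are finite, and δ₁(X) = E(δ(X) | T):

```latex
\begin{align*}
L(\delta_1(X)) = L\big(\operatorname{E}[\delta(X)\mid T]\big)
  &\le \operatorname{E}\left[L(\delta(X)) \mid T\right]
  && \text{(conditional Jensen)} \\
\operatorname{E}\left[L(\delta_1(X))\right]
  &\le \operatorname{E}\left[L(\delta(X))\right]
  && \text{(law of total expectation)}
\end{align*}
```

For the squared-error loss L(x) = (x − θ)², the second derivative is the constant 2, and the twice-differentiable bound reduces to the term E[Var(δ(X) | T)] from the mean-squared-error decomposition.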

Properties

The improved estimator is unbiased if and only if the original estimator is unbiased, as may be seen at once by using the law of total expectation. The theorem holds regardless of whether biased or unbiased estimators are used.
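The unbiasedness claim can be written out in one line. Assuming δ₁(X) = E(δ(X) | T) and that δ(X) has a finite mean, the law of total expectation gives:

```latex
\operatorname{E}[\delta_1(X)]
  = \operatorname{E}\big[\operatorname{E}[\delta(X)\mid T]\big]
  = \operatorname{E}[\delta(X)]
```

Both estimators therefore have the same expectation, and hence the same bias, for every value of θ.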
The theorem seems very weak: it says only that the Rao–Blackwell estimator is no worse than the original estimator. In practice, however, the improvement is often enormous.

Example

Phone calls arrive at a switchboard according to a Poisson process at an average rate of λ per minute. This rate is not observable, but the numbers X₁,..., X_n of phone calls that arrived during n successive one-minute periods are observed. It is desired to estimate the probability e^−λ that the next one-minute period passes with no phone calls.
An extremely crude estimator of the desired probability is
$$\delta_0 = \begin{cases} 1 & \text{if } X_1 = 0, \\ 0 & \text{otherwise,} \end{cases}$$
i.e., it estimates this probability to be 1 if no phone calls arrived in the first minute and zero otherwise. Despite the apparent limitations of this estimator, the result given by its Rao–Blackwellization is a very good estimator.
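The sum
$$S_n = \sum_{i=1}^{n} X_i = X_1 + \cdots + X_n$$
can be readily shown to be a sufficient statistic for λ, i.e., the conditional distribution of the data X₁,..., X_n given this sum does not depend on λ. Therefore, we find the Rao–Blackwell estimator
$$\delta_1 = \operatorname{E}(\delta_0 \mid S_n = s_n) = P(X_1 = 0 \mid S_n = s_n).$$
After doing some algebra we have
$$\delta_1 = \frac{P\left(X_1 = 0,\ \sum_{i=2}^{n} X_i = s_n\right)}{P(S_n = s_n)} = \frac{e^{-\lambda}\,\dfrac{((n-1)\lambda)^{s_n} e^{-(n-1)\lambda}}{s_n!}}{\dfrac{(n\lambda)^{s_n} e^{-n\lambda}}{s_n!}} = \left(1 - \frac{1}{n}\right)^{s_n}.$$
Since the average number of calls arriving during the first n minutes is nλ, one might not be surprised if this estimator has a fairly high probability (if n is large) of being close to
$$\left(1 - \frac{1}{n}\right)^{n\lambda} \approx e^{-\lambda}.$$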
So δ₁ is clearly a very much improved estimator of that last quantity. In fact, since S_n is complete and δ₀ is unbiased, δ₁ is the unique minimum variance unbiased estimator by the Lehmann–Scheffé theorem.
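The size of the improvement in this example can be checked numerically. The following is a minimal simulation sketch, not part of the original example: it assumes λ = 2, n = 10 and 20,000 repetitions (all arbitrary choices), uses the closed form δ₁ = (1 − 1/n)^S_n for the Rao–Blackwell estimator, and the names `poisson_sample` and `simulate` are hypothetical helpers introduced here.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(lam=2.0, n=10, trials=20000, seed=0):
    """Return {name: (mean, mean squared error)} for both estimators."""
    rng = random.Random(seed)
    target = math.exp(-lam)  # true probability of a call-free minute
    sums = {"crude": 0.0, "rb": 0.0}
    sq_errors = {"crude": 0.0, "rb": 0.0}
    for _ in range(trials):
        counts = [poisson_sample(lam, rng) for _ in range(n)]
        estimates = {
            "crude": 1.0 if counts[0] == 0 else 0.0,  # delta_0
            "rb": (1.0 - 1.0 / n) ** sum(counts),     # delta_1
        }
        for name, value in estimates.items():
            sums[name] += value
            sq_errors[name] += (value - target) ** 2
    return {
        name: (sums[name] / trials, sq_errors[name] / trials)
        for name in sums
    }


if __name__ == "__main__":
    for name, (mean, mse) in simulate().items():
        print(f"{name}: mean={mean:.4f}  mse={mse:.5f}")
```

Under these assumed settings the exact mean squared errors are e^−λ(1 − e^−λ) ≈ 0.117 for δ₀ and e^−2λ(e^(λ/n) − 1) ≈ 0.004 for δ₁, so the simulated values should show a reduction of well over an order of magnitude while both means stay close to e^−λ ≈ 0.135.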

Idempotence

Rao–Blackwellization is an idempotent operation. Using it to improve the already improved estimator does not obtain a further improvement, but merely returns as its output the same improved estimator.
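Idempotence can be illustrated exactly on a small discrete model. The sketch below is an illustration under assumptions not taken from the text: three independent Bernoulli(p) trials with p = 1/3, the sufficient statistic T = X₁ + X₂ + X₃, and the crude estimator δ = X₁. The function names `rao_blackwellize` and `bernoulli_outcomes` are hypothetical. Conditional expectations are computed by grouping sample points with the same value of T.

```python
from fractions import Fraction
from itertools import product


def rao_blackwellize(estimator, statistic, outcomes):
    """Return the map x -> E[estimator(X) | statistic(X) = statistic(x)].

    `outcomes` maps each sample point to its probability.
    """
    totals, weights = {}, {}
    for x, prob in outcomes.items():
        t = statistic(x)
        totals[t] = totals.get(t, 0) + prob * estimator(x)
        weights[t] = weights.get(t, 0) + prob
    return lambda x: totals[statistic(x)] / weights[statistic(x)]


def bernoulli_outcomes(p, n):
    """Joint distribution of n independent Bernoulli(p) trials."""
    table = {}
    for x in product((0, 1), repeat=n):
        prob = Fraction(1)
        for xi in x:
            prob *= p if xi else 1 - p
        table[x] = prob
    return table


outcomes = bernoulli_outcomes(Fraction(1, 3), n=3)
first_trial = lambda x: x[0]  # crude unbiased estimator of p

once = rao_blackwellize(first_trial, sum, outcomes)
twice = rao_blackwellize(once, sum, outcomes)

# The second application changes nothing: both maps agree on every sample point.
assert all(once(x) == twice(x) for x in outcomes)
```

Here `once` maps a sample x to sum(x)/3, the sample proportion. Because that value is already a function of T, conditioning on T a second time returns it unchanged.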

Completeness and Lehmann–Scheffé minimum variance

If the conditioning statistic is both complete and sufficient, and the starting estimator is unbiased, then the Rao–Blackwell estimator is the unique "best unbiased estimator": see the Lehmann–Scheffé theorem.
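An example of an improvable Rao–Blackwell improvement, when using a minimal sufficient statistic that is not complete, was provided by Galili and Meilijson in 2016. Let X₁,..., X_n be a random sample from a scale-uniform distribution X ~ U((1 − k)θ, (1 + k)θ) with unknown mean E(X) = θ and known design parameter k ∈ (0, 1). In the search for "best" possible unbiased estimators for θ, it is natural to consider X₁ as an initial unbiased estimator for θ and then try to improve it. Since X₁ is not a function of T = (X_(1), X_(n)), the minimal sufficient statistic for θ (where X_(1) = min X_i and X_(n) = max X_i), it may be improved using the Rao–Blackwell theorem as follows:
$$\hat{\theta}_{RB} = \operatorname{E}_\theta\left[X_1 \mid X_{(1)}, X_{(n)}\right] = \frac{X_{(1)} + X_{(n)}}{2}.$$
However, the following unbiased estimator can be shown to have lower variance:
$$\hat{\theta}_{LV} = \frac{1}{k^2\frac{n-1}{n+1} + 1} \cdot \frac{(1-k)X_{(1)} + (1+k)X_{(n)}}{2}.$$
And in fact, it could be even further improved when using the following estimator:
$$\hat{\theta}_{\text{BAYES}} = \frac{n+1}{n}\left[1 - \frac{\dfrac{X_{(1)}(1+k)}{X_{(n)}(1-k)} - 1}{\left(\dfrac{X_{(1)}(1+k)}{X_{(n)}(1-k)}\right)^{n+1} - 1}\right]\frac{X_{(n)}}{1+k}.$$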
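The variance comparison in the scale-uniform example can be explored by simulation. The sketch below rests on several assumptions: the data are drawn from U((1 − k)θ, (1 + k)θ); the Rao–Blackwell estimator is taken to be the midrange (X_(1) + X_(n))/2; the lower-variance unbiased estimator is taken to be the weighted combination ((1 − k)X_(1) + (1 + k)X_(n))/2 divided by k²(n − 1)/(n + 1) + 1; and θ = 1, k = 0.9, n = 10 are arbitrary choices. The function name `simulate_scale_uniform` is hypothetical.

```python
import random
import statistics


def simulate_scale_uniform(theta=1.0, k=0.9, n=10, trials=20000, seed=0):
    """Return {name: (mean, variance)} for the two unbiased estimators."""
    rng = random.Random(seed)
    # Normalizing constant that makes the weighted estimator unbiased.
    shrink = 1.0 / (k * k * (n - 1) / (n + 1) + 1.0)
    rb_values, lv_values = [], []
    for _ in range(trials):
        sample = [
            rng.uniform((1 - k) * theta, (1 + k) * theta) for _ in range(n)
        ]
        lo, hi = min(sample), max(sample)
        # Midrange: conditional mean of X_1 given (min, max).
        rb_values.append((lo + hi) / 2)
        # Weighted combination of min and max, rescaled to be unbiased.
        lv_values.append(shrink * ((1 - k) * lo + (1 + k) * hi) / 2)
    return {
        "rb": (statistics.fmean(rb_values), statistics.variance(rb_values)),
        "lv": (statistics.fmean(lv_values), statistics.variance(lv_values)),
    }


if __name__ == "__main__":
    for name, (mean, var) in simulate_scale_uniform().items():
        print(f"{name}: mean={mean:.4f}  variance={var:.5f}")
```

With these assumed settings both sample means should sit near θ = 1, and the weighted estimator's variance should be visibly smaller than that of the midrange, which is consistent with the minimal sufficient statistic not being complete.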
