Overdispersion

In statistics, overdispersion is the presence of greater variability in a data set than would be expected based on a given statistical model.
A common task in applied statistics is choosing a parametric model to fit a given set of empirical observations. This necessitates an assessment of the fit of the chosen model. It is usually possible to choose the model parameters in such a way that the theoretical population mean of the model is approximately equal to the sample mean. However, especially for simple models with few parameters, theoretical predictions may not match empirical observations for higher moments. When the observed variance is higher than the variance of a theoretical model, overdispersion has occurred. Conversely, underdispersion means that there was less variation in the data than predicted. Overdispersion is a very common feature in applied data analysis because in practice, populations are frequently heterogeneous contrary to the assumptions implicit within widely used simple parametric models.

Examples

Poisson

Overdispersion is often encountered when fitting very simple parametric models, such as those based on the Poisson distribution. The Poisson distribution has one free parameter and does not allow for the variance to be adjusted independently of the mean. The choice of a distribution from the Poisson family is often dictated by the nature of the empirical data. For example, Poisson regression analysis is commonly used to model count data. If overdispersion is a feature, an alternative model with additional free parameters may provide a better fit. In the case of count data, a Poisson mixture model like the negative binomial distribution can be proposed instead, in which the mean of the Poisson distribution can itself be thought of as a random variable drawn – in this case – from the gamma distribution thereby introducing an additional free parameter.

Binomial

As a more concrete example, it has been observed that the number of boys born to families does not conform faithfully to a binomial distribution as might be expected. Instead, the sex ratios of families seem to skew toward either boys or girls i.e. there are more all-boy families, more all-girl families and not enough families close to the population 51:49 boy-to-girl mean ratio than expected from a binomial distribution, and the resulting empirical variance is larger than specified by a binomial model.
In this case, the beta-binomial model distribution is a popular and analytically tractable alternative model to the binomial distribution since it provides a better fit to the observed data. To capture the heterogeneity of the families, one can think of the probability parameter of the binomial model as itself a random variable drawn for each family from a beta distribution as the mixing distribution. The resulting compound distribution has an additional free parameter.
Another common model for overdispersion—when some of the observations are not Bernoulli—arises from introducing a normal random variable into a logistic model. Software is widely available for fitting this type of multilevel model. In this case, if the variance of the normal variable is zero, the model reduces to the standard logistic regression. This model has an additional free parameter, namely the variance of the normal variable.
With respect to binomial random variables, the concept of overdispersion makes sense only if n>1.

Normal distribution

As the normal distribution has variance as a parameter, any data with finite variance can be modeled with a normal distribution with the exact variance – the normal distribution is a two-parameter model, with mean and variance. Thus, in the absence of an underlying model, there is no notion of data being overdispersed relative to the normal model, though the fit may be poor in other respects. However, in the case that the data is modeled by a normal distribution with an expected variation, it can be over- or under-dispersed relative to that prediction.
For example, in a statistical survey, the margin of error predicts the sampling error and hence dispersion of results on repeated surveys. If one performs a meta-analysis of repeated surveys of a fixed population, one expects the results to fall on normal distribution with standard deviation equal to the margin of error. However, in the presence of study heterogeneity where studies have different sampling bias, the distribution is instead a compound distribution and will be overdistributed relative to the predicted distribution. For example, given repeated opinion polls all with a margin of error of 3%, if they are conducted by different polling organizations, one expects the results to have standard deviation greater than 3%, due to pollster bias from different methodologies.

Differences in terminology among disciplines

Over- and underdispersion are terms which have been adopted in branches of the biological sciences. In parasitology, the term 'overdispersion' is generally used as defined here – meaning a distribution with a higher than expected variance.
In some areas of ecology, however, meanings have been transposed, so that overdispersion is actually taken to mean more even than expected. This confusion has caused some ecologists to suggest that the terms 'aggregated', or 'contagious', would be better used in ecology for 'overdispersed'. Such preferences are creeping into parasitology too. Generally this suggestion has not been heeded, and confusion persists in the literature.
Furthermore in demography, overdispersion is often evident in the analysis of death count data, but demographers prefer the term 'unobserved heterogeneity'.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...