Clustered standard errors

Clustered standard errors are measurements that estimate the standard error of a regression parameter in settings where observations may be subdivided into smaller-sized groups and where the sampling and/or treatment assignment is correlated within each group. Clustered standard errors are widely used in a variety of applied econometric settings, including difference-in-differences or experiments. Analogous to how Huber-White standard errors are consistent in the presence of heteroscedasticity and Newey–West standard errors are consistent in the presence of accurately-modeled autocorrelation, clustered standard errors are consistent in the presence of cluster-based sampling or treatment assignment. Clustered standard errors are often justified by possible correlation in modeling residuals within each cluster; while recent work suggests that this is not the precise justification behind clustering, it may be pedagogically useful.

Intuitive Motivation

Clustered standard errors are often useful when treatment is assigned at the level of a cluster instead of at the individual level. For example, suppose that an educational researcher wants to discover whether a new teaching technique improves student test scores. She therefore assigns teachers in "treated" classrooms to try this new technique, while leaving "control" classrooms unaffected. When analyzing her results, she may want to keep the data at the student level. However, when estimating the standard error or confidence interval of her statistical model, she realizes that classical or even heteroscedasticity-robust standard errors are inappropriate because student test scores within each class are not independently distributed. Instead, students in classes with better teachers have especially high test scores while students in classes with worse teachers have especially low test scores. The researcher can cluster her standard errors at the level of a classroom to account for this aspect of her experiment.
While this example is very specific, similar issues arise in a wide variety of settings. For example, in many panel data settings clustering often offers a simple and effective way to account for non-independence between periods within each unit. Another common and logically distinct justification for clustering arises when a full population cannot be randomly sampled, and so instead clusters are sampled and then units are randomized within cluster. In this case, clustered standard errors account for the uncertainty driven by the fact that the researcher does not observe large parts of the population of interest.

Mathematical Motivation

A useful mathematical illustration comes from the case of one-way clustering in an ordinary least squares model. Consider a simple model with N observations that are subdivided in C clusters. Let be an vector of outcomes, a matrix of covariates, an vector of unknown parameters, and an vector of unexplained residuals:
As is standard with OLS models, we minimize the sum of squared residuals to get an estimate :
From there, we can derive the classic "sandwich" estimator:
Denoting yields a potentially more familiar form
While one can develop a plug-in estimator by defining and letting, this completely flexible estimator will not converge to as. Given the assumptions that a practitioner deems as reasonable, different types of standard errors solve this problem in different ways. For example, classic homoskedastic standard errors assume that is diagonal with identical elements, which simplifies the expression for. Huber-White standard errors assume is diagonal but that the diagonal value varies, while other types of standard errors make other restrictions on the form of this matrix to reduce the number of parameters that the practitioner needs to estimate.
Clustered standard errors assume that is block-diagonal according to the clusters in the sample, with unrestricted values in each block but zeros elsewhere. In this case, one can define and as the within-block analogues of and and derive the following mathematical fact:
By constructing plug-in matrices, one can form an estimator for that is consistent as the number of clusters becomes large. While no specific number of clusters is statistically proven to be sufficient, practitioners often cite a number in the range of 30-50 and are comfortable using clustered standard errors when the number of clusters exceeds that threshold.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...