Spurious relationship

In statistics, a spurious relationship or spurious correlation is a mathematical relationship in which two or more events or variables are associated but not causally related, due to either coincidence or the presence of a certain third, unseen factor.

Examples

A well-known case of a spurious relationship can be found in the time-series literature, where a spurious regression is a regression that provides misleading statistical evidence of a linear relationship between independent non-stationary variables. In fact, the non-stationarity may be due to the presence of a unit root in both variables. In particular, any two nominal economic variables are likely to be correlated with each other, even when neither has a causal effect on the other, because each equals a real variable times the price level, and the common presence of the price level in the two data series imparts correlation to them.
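The spurious regression effect described above can be reproduced in a small simulation. The following is a minimal sketch rather than a procedure taken from the time-series literature: it assumes Gaussian shocks, series of length 100, 500 trials, and a naive 5% cutoff of |r| > 0.197 (approximately the critical value for 100 independent observations). All function names are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def white_noise(rng, n):
    """Independent standard normal draws: a stationary series."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(rng, n):
    """Cumulative sum of white noise: a series with a unit root."""
    level, path = 0.0, []
    for shock in white_noise(rng, n):
        level += shock
        path.append(level)
    return path


def share_significant(make_series, trials=500, n=100, r_crit=0.197, seed=0):
    """Share of independently generated pairs whose |r| exceeds the naive cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = make_series(rng, n), make_series(rng, n)
        if abs(pearson(x, y)) > r_crit:
            hits += 1
    return hits / trials


noise_rate = share_significant(white_noise)
walk_rate = share_significant(random_walk)
print(f"white noise pairs flagged: {noise_rate:.1%}")
print(f"random walk pairs flagged: {walk_rate:.1%}")
```

Under these assumptions the white-noise pairs should be flagged at a rate close to the nominal 5%, while the independent random walks are flagged far more often, even though no pair is related by construction.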
An example of a spurious relationship can be seen by examining a city's ice cream sales. These sales are highest when the rate of drownings in city swimming pools is highest. To allege that ice cream sales cause drowning, or vice versa, would be to imply a spurious relationship between the two. In reality, a heat wave may have caused both. The heat wave is an example of a hidden or unseen variable, also known as a confounding variable.
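The role of the heat wave as a confounding variable can be illustrated with simulated data. This is a sketch under invented assumptions: the temperature distribution, the coefficients 2.0 and 0.5, and the noise levels are made up for illustration and are not estimates from any real city. The "drownings" series is a continuous index rather than a count. The partial correlation used here removes the linear influence of temperature from both series.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(1)
n_days = 5000

# Hypothetical data: temperature drives both series; neither affects the other.
temperature = [rng.gauss(25.0, 5.0) for _ in range(n_days)]
ice_cream = [2.0 * t + rng.gauss(0.0, 5.0) for t in temperature]
drownings = [0.5 * t + rng.gauss(0.0, 2.0) for t in temperature]

raw = pearson(ice_cream, drownings)
adjusted = partial_corr(ice_cream, drownings, temperature)
print(f"raw correlation:               {raw:.3f}")
print(f"correlation given temperature: {adjusted:.3f}")
```

With these made-up parameters the raw correlation is strongly positive, while the correlation conditional on temperature is close to zero, which is the signature of a relationship created entirely by the confounder.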
Another commonly noted example is a series of Dutch statistics showing a positive correlation between the number of storks nesting in a series of springs and the number of human babies born at that time. Of course there was no causal connection; they were correlated with each other only because they were correlated with the weather nine months before the observations. However, Höfer et al. showed the correlation to be stronger than weather variations alone would explain: in post-reunification Germany, the number of clinical deliveries was not linked with the rise in the stork population, whereas out-of-hospital deliveries did correlate with it.
In rare cases, a spurious relationship can occur between two completely unrelated variables without any confounding variable, as was the case between the success of the Washington Redskins professional football team in a specific game before each presidential election and the success of the incumbent President's political party in said election. For 16 consecutive elections between 1940 and 2000, the Redskins Rule correctly matched whether the incumbent President's political party would retain or lose the Presidency. The rule eventually failed shortly after Elias Sports Bureau discovered the correlation in 2000; in 2004, 2012 and 2016, the results of the Redskins game and the election did not match.

Hypothesis testing

Often one tests a null hypothesis of no correlation between two variables, and chooses in advance to reject the hypothesis if the correlation computed from a data sample would have occurred in less than 5% of data samples if the null hypothesis were true. A true null hypothesis will be accepted 95% of the time, but in the other 5% of cases a true null of no correlation will be wrongly rejected (a Type I error), causing acceptance of a correlation that is spurious. Here the spurious correlation in the sample results from the random selection of a sample that does not reflect the true properties of the underlying population.
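The 5% false rejection rate can be checked by simulation. The sketch below assumes independent standard normal variables, samples of size 30, 2000 repetitions, and the usual t-test for a correlation coefficient; the constant 2.048 is the two-sided 5% critical value of Student's t distribution with 28 degrees of freedom. The names are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rejects_null(xs, ys, t_crit):
    """True if the t-test for zero correlation rejects at the given cutoff."""
    n = len(xs)
    r = pearson(xs, ys)
    t_stat = r * math.sqrt((n - 2) / (1 - r * r))
    return abs(t_stat) > t_crit


T_CRIT_DF28 = 2.048  # two-sided 5% critical value, Student's t, 28 d.o.f.

rng = random.Random(2)
n, trials = 30, 2000
false_positives = 0
for _ in range(trials):
    # x and y are drawn independently, so the null hypothesis is true.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if rejects_null(x, y, T_CRIT_DF28):
        false_positives += 1

false_positive_rate = false_positives / trials
print(f"share of true nulls rejected: {false_positive_rate:.1%}")
```

Every rejection counted here is a spurious correlation in the sense of this section, since the two variables are unrelated by construction; the share should land near 5%.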

Detecting spurious relationships

The term "spurious relationship" is commonly used in statistics and in particular in experimental research techniques, both of which attempt to understand and predict direct causal relationships (X → Y). A non-causal correlation can be spuriously created by an antecedent which causes both variables. A mediating variable M (X → M → Y), if undetected, leads the analysis to estimate a total effect rather than a direct effect, because no adjustment is made for M. Because of this, experimentally identified correlations do not represent causal relationships unless spurious relationships can be ruled out.

Experiments

In experiments, spurious relationships can often be identified by controlling for other factors, including those that have been theoretically identified as possible confounding factors. For example, consider a researcher trying to determine whether a new drug kills bacteria; when the researcher applies the drug to a bacterial culture, the bacteria die. But to help in ruling out the presence of a confounding variable, another culture is subjected to conditions that are as nearly identical as possible to those facing the first-mentioned culture, but the second culture is not subjected to the drug. If there is an unseen confounding factor in those conditions, this control culture will die as well, so that no conclusion of efficacy of the drug can be drawn from the results of the first culture. On the other hand, if the control culture does not die, then the researcher cannot reject the hypothesis that the drug is efficacious.

Non-experimental statistical analyses

Disciplines whose data are mostly non-experimental, such as economics, usually employ observational data to establish causal relationships. The body of statistical techniques used in economics is called econometrics. The main statistical method in econometrics is multivariable regression analysis. Typically a linear relationship such as
$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_k x_k + e$$
is hypothesized, in which $y$ is the dependent variable, $x_j$ for $j = 1, \ldots, k$ is the $j$th independent variable, and $e$ is the error term. If there is reason to believe that none of the $x_j$s is caused by $y$, then estimates of the coefficients $a_j$ are obtained. If the null hypothesis that $a_j = 0$ is rejected, then the alternative hypothesis that $a_j \neq 0$, and equivalently that $x_j$ causes $y$, cannot be rejected. On the other hand, if the null hypothesis that $a_j = 0$ cannot be rejected, then equivalently the hypothesis of no causal effect of $x_j$ on $y$ cannot be rejected. Here the notion of causality is one of contributory causality: if the true value $a_j \neq 0$, then a change in $x_j$ will result in a change in $y$ unless some other causative variables, either included in the regression or implicit in the error term, change in such a way as to exactly offset its effect; thus a change in $x_j$ is not sufficient to change $y$. Likewise, a change in $x_j$ is not necessary to change $y$, because a change in $y$ could be caused by something implicit in the error term.
Regression analysis controls for other relevant variables by including them as regressors. This helps to avoid mistaken inference of causality due to the presence of a third, underlying variable that influences both the potentially causative variable and the potentially caused variable: its effect on the potentially caused variable is captured by directly including it in the regression, so that effect will not be picked up as a spurious effect of the potentially causative variable of interest. In addition, the use of multivariate regression helps to avoid wrongly inferring that an indirect effect of, say, $x_1$ is a direct effect.
Just as an experimenter must be careful to employ an experimental design that controls for every confounding factor, so also must the user of multiple regression be careful to control for all confounding factors by including them among the regressors. If a confounding factor is omitted from the regression, its effect is captured in the error term by default, and if the resulting error term is correlated with one of the included regressors, then the estimated regression may be biased or inconsistent.
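The bias from an omitted confounding factor can be seen in a simulated regression. This sketch assumes a hypothetical data-generating process in which the regressor of interest x1 has no effect on y at all, while a confounder x2 drives both x1 and y; the coefficients 0.8 and 2.0, the unit noise variances, and the function names are all invented for illustration. Slopes are computed from the normal equations on mean-centered data, which is equivalent to ordinary least squares with an intercept.

```python
import random


def centered(values):
    """Subtract the mean, which absorbs the regression intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slope_one_regressor(x, y):
    """OLS slope of y on x alone (with intercept)."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)


def slopes_two_regressors(x1, x2, y):
    """OLS slopes of y on x1 and x2 (with intercept), via 2x2 normal equations."""
    c1, c2, cy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(3)
n = 5000

# Hypothetical process: x2 is the confounder; the true effect of x1 on y is 0.
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x1 = [0.8 * c + rng.gauss(0.0, 1.0) for c in x2]
y = [1.0 + 0.0 * a + 2.0 * c + rng.gauss(0.0, 1.0) for a, c in zip(x1, x2)]

short_slope = slope_one_regressor(x1, y)          # confounder omitted
long_a1, long_a2 = slopes_two_regressors(x1, x2, y)  # confounder included
print(f"x1 slope, x2 omitted:  {short_slope:.3f}")
print(f"x1 slope, x2 included: {long_a1:.3f}")
print(f"x2 slope, x2 included: {long_a2:.3f}")
```

With the confounder omitted, the slope on x1 converges to 2.0 × cov(x1, x2) / var(x1) = 1.6 / 1.64, or roughly 0.98, rather than the true value of zero. Including x2 among the regressors brings the estimated effect of x1 back toward zero.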
In addition to regression analysis, the data can be examined to determine if Granger causality exists. The presence of Granger causality indicates both that x precedes y, and that x contains unique information about y.
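A Granger causality test can be sketched as a comparison of two nested regressions: one predicting a series from its own past, and one that also uses the past of the candidate cause. The version below is a simplified illustration, not a full implementation: it assumes a single lag, a simulated process in which x feeds into y with a one-period delay, and hand-written least squares. The function names and the coefficients 0.5 and 0.8 are invented for the example.

```python
import random


def solve(a, b):
    """Solve the small linear system a @ coef = b by Gauss-Jordan elimination."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def rss(rows, targets):
    """Residual sum of squares from an OLS fit of targets on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    coef = solve(xtx, xty)
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(cause, effect):
    """F statistic for adding one lag of `cause` to an AR(1) model of `effect`."""
    targets = effect[1:]
    steps = range(1, len(effect))
    restricted = [[1.0, effect[t - 1]] for t in steps]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in steps]
    rss_r, rss_u = rss(restricted, targets), rss(unrestricted, targets)
    dof = len(targets) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)  # one restriction in the numerator


rng = random.Random(4)
n = 500

# Hypothetical process: x is white noise and influences y one period later.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0.0, 1.0))

f_x_to_y = granger_f(x, y)
f_y_to_x = granger_f(y, x)
print(f"F statistic, x -> y: {f_x_to_y:.1f}")
print(f"F statistic, y -> x: {f_y_to_x:.1f}")
```

Each statistic would be compared with the 5% critical value of the F distribution with 1 and about 500 degrees of freedom, which is about 3.9. In this simulated setting, past values of x improve the prediction of y substantially, while past values of y should add little to the prediction of x. In practice the lag length has to be chosen, and an established statistics library would normally be used in place of the hand-written solver.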

Other relationships

There are several other relationships defined in statistical analysis as follows.

Direct relationship
Mediating relationship
Moderating relationship