Ecological fallacy

An ecological fallacy is a formal fallacy in the interpretation of statistical data that occurs when inferences about the nature of individuals are deduced from inferences about the group to which those individuals belong. 'Ecological fallacy' is a term that is sometimes used to describe the fallacy of division, which is not a statistical fallacy. The four common statistical ecological fallacies are: confusion between ecological correlations and individual correlations, confusion between group average and total average, Simpson's paradox, and confusion between higher average and higher likelihood.

Examples

Mean and median

An example of ecological fallacy is the assumption that a population mean has a simple interpretation when considering likelihoods for an individual.
For instance, if the mean score of a group is larger than zero, this does not imply that a random individual of that group is more likely to have a positive score than a negative one. Similarly, if a particular group of people is measured to have a lower mean IQ than the general population, it is an error to conclude that a randomly selected member of the group is more likely than not to have a lower IQ than the mean IQ of the general population; nor is it necessarily the case that a randomly selected member of the group is more likely than not to have a lower IQ than a randomly selected member of the general population. Mathematically, this comes from the fact that a distribution can have a positive mean but a negative median. This property is linked to the skewness of the distribution.
Consider the following numerical example:

Group A: 80% of people got 40 points and 20% of them got 95 points. The mean score is 51 points.
Group B: 50% of people got 45 points and 50% got 55 points. The mean score is 50 points.
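If we pick one person at random from each group, there are 4 possible outcomes:
* A – 40, B – 45 (B scores higher, 40% probability)
* A – 40, B – 55 (B scores higher, 40% probability)
* A – 95, B – 45 (A scores higher, 10% probability)
* A – 95, B – 55 (A scores higher, 10% probability)
Although Group A has a higher mean score, 80% of the time a random individual of A will score lower than a random individual of B.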
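The figures in this example can be checked by enumerating the four outcomes. The following is a minimal sketch in Python; the function names are illustrative only, and the two draws are assumed to be independent.

```python
from fractions import Fraction
from itertools import product

# Score distributions from the example above: {score: probability}.
group_a = {40: Fraction(80, 100), 95: Fraction(20, 100)}
group_b = {45: Fraction(50, 100), 55: Fraction(50, 100)}


def mean(dist):
    """Expected score of a distribution given as {score: probability}."""
    return sum(score * p for score, p in dist.items())


def prob_first_lower(dist_x, dist_y):
    """P(X < Y) for independent draws X ~ dist_x and Y ~ dist_y."""
    return sum(
        px * py
        for (x, px), (y, py) in product(dist_x.items(), dist_y.items())
        if x < y
    )


mean_a = mean(group_a)
mean_b = mean(group_b)
p_a_lower = prob_first_lower(group_a, group_b)

print(f"mean A = {float(mean_a)}, mean B = {float(mean_b)}")
print(f"P(A scores lower than B) = {float(p_a_lower)}")
```

Exact fractions are used so that the 51-point and 50-point means and the 80% probability come out without rounding error.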

Individual and aggregate correlations

Assume that at the individual level, being Protestant reduces one's tendency to commit suicide but the probability that one's neighbor commits suicide increases one's tendency to become Protestant. Then, even if at the individual level there is negative correlation between suicidal tendencies and Protestantism, there can be a positive correlation at the aggregate level.
The aggregate model correctly describes a uniquely positive correlation between becoming Protestant and suicide among one's neighbors if and only if, within every other religion, one's tendency to convert or to become more religious is not positively correlated with neighbors committing suicide.
Similarly, even if at the individual level wealth is positively correlated with the tendency to vote Republican, we observe that wealthier states tend to vote Democratic. For example, in 2004, the Republican candidate, George W. Bush, won the 15 poorest states, and the Democratic candidate, John Kerry, won 9 of the 11 wealthiest states. Yet 62% of voters with annual incomes over $200,000 voted for Bush, compared with only 36% of voters with annual incomes of $15,000 or less.
Aggregate-level correlation will differ from individual-level correlation if voting preferences are affected by the total wealth of the state even after controlling for individual wealth. It could be that the true driving factor in voting preference is self-perceived relative wealth; perhaps those who see themselves as better off than their neighbors are more likely to vote Republican. In this case, an individual would be more likely to vote Republican if she became wealthier, but she would be more likely to vote for a Democrat if her neighbor's wealth increased.
However, the observed difference in voting habits based on state-level and individual-level wealth could also be explained by the common confusion between higher averages and higher likelihoods as discussed above. States may not be wealthier because they contain more wealthy people, but rather because they contain a small number of super-rich individuals; the ecological fallacy then results from incorrectly assuming that individuals in wealthier states are more likely to be wealthy.
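The relative-wealth mechanism described above can be illustrated with a toy data set. The sketch below uses entirely made-up numbers: three hypothetical states, seven residents each, and a "Republican lean" score that rises with a resident's wealth relative to the state average and falls with the state average itself. It is not fitted to any election data.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: state average wealth 0, 1, 2 (arbitrary units);
# each resident's wealth deviates from the state average by -3..3.
state_means = [0, 1, 2]
deviations = [-3, -2, -1, 0, 1, 2, 3]

people = []  # (state average, individual wealth, Republican lean)
for m in state_means:
    for d in deviations:
        wealth = m + d
        lean = d - m  # assumed: relative wealth raises lean, state wealth lowers it
        people.append((m, wealth, lean))

# Individual-level correlation, pooling all residents.
individual_r = pearson([w for _, w, _ in people], [l for _, _, l in people])

# Aggregate-level correlation between state averages.
state_wealth = [mean(w for s, w, _ in people if s == m) for m in state_means]
state_lean = [mean(l for s, _, l in people if s == m) for m in state_means]
aggregate_r = pearson(state_wealth, state_lean)

print(f"individual-level r = {individual_r:.3f}")
print(f"aggregate-level r  = {aggregate_r:.3f}")
```

In this construction the individual-level correlation is positive while the state-level correlation is negative, so inferring the individual relationship from the state averages would give the wrong sign.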
An early example of the ecological fallacy was Émile Durkheim's 1897 study of suicide in France, although whether the study actually commits the fallacy has been debated.
Many examples of ecological fallacies can be found in studies of social networks, which often combine analysis and implications from different levels. This has been illustrated in an academic paper on networks of farmers in Sumatra.

Robinson's paradox

A 1950 paper by William S. Robinson computed the illiteracy rate and the proportion of the population born outside the US for each state and for the District of Columbia, as of the 1930 census. He showed that these two figures had a correlation of −0.53; in other words, the greater the proportion of immigrants in a state, the lower its average illiteracy. However, when individuals were considered, the correlation was +0.12; that is, immigrants were on average more likely to be illiterate than native-born citizens. Robinson showed that the negative correlation at the level of state populations arose because immigrants tended to settle in states where the native population was more literate. He cautioned against deducing conclusions about individuals on the basis of population-level, or "ecological", data. In 2011, it was found that Robinson's calculations of the ecological correlations were based on the wrong state-level data; the correlation reported above as −0.53 is in fact −0.46. Robinson's paper was seminal, but the term 'ecological fallacy' was not coined until 1958, by Selvin.

Formal problem

The correlation of aggregate quantities is not equal to the correlation of individual quantities. Denote by X_i, Y_i two quantities at the individual level. The formula for the covariance of the aggregate quantities in groups of size N is
$$\operatorname{cov}\left(\sum_{i=1}^{N} Y_i, \sum_{i=1}^{N} X_i\right) = \sum_{i=1}^{N} \operatorname{cov}(Y_i, X_i) + \sum_{i=1}^{N} \sum_{l \neq i} \operatorname{cov}(Y_l, X_i)$$
The covariance of two aggregated variables depends not only on the covariance of the two variables within the same individuals but also on the covariances of the variables between different individuals. In other words, the correlation of aggregate variables takes into account cross-sectional effects that are not relevant at the individual level.
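The split of the aggregate covariance into same-individual and cross-individual terms can be checked numerically. The sketch below uses a small made-up sample of four groups with N = 2 individuals each and sample covariances in place of population covariances; the data and variable names are purely illustrative.

```python
from statistics import mean


def cov(xs, ys):
    """Sample covariance (dividing by n) of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return mean((x - mx) * (y - my) for x, y in zip(xs, ys))


# Hypothetical sample: each row is one group of N = 2 individuals,
# stored as ((X_1, Y_1), (X_2, Y_2)).
groups = [
    ((1, 2), (4, 1)),
    ((2, 1), (3, 3)),
    ((5, 4), (1, 2)),
    ((3, 5), (2, 6)),
]
N = 2

# X[i] and Y[i] hold the values of individual i across all groups.
X = [[g[i][0] for g in groups] for i in range(N)]
Y = [[g[i][1] for g in groups] for i in range(N)]

# Aggregate quantities: one total per group.
sum_x = [sum(vals) for vals in zip(*X)]
sum_y = [sum(vals) for vals in zip(*Y)]
lhs = cov(sum_y, sum_x)

# Same-individual terms and cross-individual terms.
within = sum(cov(Y[i], X[i]) for i in range(N))
between = sum(cov(Y[l], X[i]) for i in range(N) for l in range(N) if l != i)

print(f"cov of totals   = {lhs:.4f}")
print(f"within + between = {within + between:.4f}")
```

The two printed values agree because covariance is bilinear; the `between` term is the part that has no counterpart at the individual level.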
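The problem for correlations naturally entails a problem for regressions on aggregate variables: the correlation fallacy is therefore an important issue for a researcher who wants to measure causal impacts. Start with a regression model where the outcome Y_i is impacted by X_i:
$$Y_i = \alpha + \beta X_i + u_i, \qquad \operatorname{cov}(u_i, X_i) = 0$$
The regression model at the aggregate level is obtained by summing the individual equations:
$$\sum_{i=1}^{N} Y_i = \alpha N + \beta \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} u_i, \qquad \operatorname{cov}\left(\sum_{i=1}^{N} u_i, \sum_{i=1}^{N} X_i\right) = \sum_{i=1}^{N} \sum_{l \neq i} \operatorname{cov}(u_l, X_i)$$
Nothing prevents the regressors and the errors from being correlated at the aggregate level, because the cross terms between one individual's error and another individual's regressor need not vanish. Therefore, in general, running a regression on aggregate data does not estimate the same model as running a regression on individual data.
The aggregate model is correct if and only if
$$\operatorname{cov}\left(u_i, \sum_{k=1}^{N} X_k\right) = 0 \quad \text{for all } i$$
This means that, controlling for X_i, the group total $\sum_{k} X_k$ does not determine Y_i.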
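A small simulation can show how the two regressions diverge when this condition fails. In the sketch below, all parameters are hypothetical: each individual's error term is set to GAMMA times the sum of the other group members' regressors, so it is uncorrelated with the individual's own regressor in expectation but correlated with the group total. The helper `ols_slope` is a plain least-squares slope written for this illustration.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


random.seed(0)
ALPHA, BETA, GAMMA = 0.0, 1.0, -0.5  # hypothetical parameters
N, GROUPS = 5, 5000                  # group size and number of groups

ind_x, ind_y, agg_x, agg_y = [], [], [], []
for _ in range(GROUPS):
    xs = [random.gauss(0, 1) for _ in range(N)]
    total = sum(xs)
    # u_i = GAMMA * (sum of the other members' X values)
    ys = [ALPHA + BETA * x + GAMMA * (total - x) for x in xs]
    ind_x += xs
    ind_y += ys
    agg_x.append(total)
    agg_y.append(sum(ys))

slope_individual = ols_slope(ind_x, ind_y)  # close to BETA
slope_aggregate = ols_slope(agg_x, agg_y)   # BETA + GAMMA * (N - 1)

print(f"individual-level slope = {slope_individual:.3f}")
print(f"aggregate-level slope  = {slope_aggregate:.3f}")
```

With these assumed values the individual regression recovers a slope near 1, while the aggregate regression yields BETA + GAMMA * (N - 1) = -1, so the two levels disagree even on the sign of the effect.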

Choosing between aggregate and individual inference

There is nothing wrong with running regressions on aggregate data if one is interested in the aggregate model. For instance, it is correct for the governor of a state to regress the crime rate on police force at the state level if he or she is interested in the policy implications of a rise in police force. However, an ecological fallacy would occur if a city council deduced the impact of an increase in police force on the crime rate at the city level from the correlation at the state level.
Choosing to run aggregate or individual regressions to understand the aggregate impacts of some policy depends on the following trade-off: aggregate regressions lose individual-level data, but individual regressions add strong modeling assumptions. Some researchers suggest that the ecological correlation gives a better picture of the outcome of public policy actions, and therefore recommend it over the individual-level correlation for this purpose. Other researchers disagree, especially when the relationships among the levels are not clearly modeled. To prevent ecological fallacy, researchers with no individual data can first model what is occurring at the individual level, then model how the individual and group levels are related, and finally examine whether anything occurring at the group level adds to the understanding of the relationship. For instance, in evaluating the impact of state policies, it is helpful to know that policy impacts vary less among the states than do the policies themselves, suggesting that the policy differences are not well translated into results, despite high ecological correlations.

Group and total averages

Ecological fallacy can also refer to the following fallacy: the average for a group is approximated by the average in the total population divided by the group size. Suppose one knows the number of Protestants and the suicide rate in the USA, but one does not have data linking religion and suicide at the individual level. If one is interested in the suicide rate of Protestants, it is a mistake to estimate it by the total suicide rate divided by the number of Protestants.
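Formally, denoting by $\bar{Y}_{\text{Group}}$ the mean of the group, we generally have:
$$\bar{Y}_{\text{Group}} = E[Y \mid \text{Group}] \neq \frac{E[Y]}{P(\text{Group})}$$
However, the law of total probability gives
$$E[Y] = E[Y \mid \text{Group}]\, P(\text{Group}) + E[Y \mid \text{not Group}]\, \bigl(1 - P(\text{Group})\bigr)$$
As we know that $E[Y \mid \text{not Group}]$ is between 0 and 1, this equation gives a bound for $\bar{Y}_{\text{Group}}$.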
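Rearranging the law of total probability for the group mean, and letting the mean outside the group range over 0 to 1, gives an interval rather than a point estimate. The function below is a minimal sketch of that calculation; its name and the example figures are hypothetical, and the outcome is assumed to be a 0/1 indicator so that every conditional mean lies in [0, 1].

```python
def group_mean_bounds(total_mean, group_share):
    """Bounds on E[Y | Group] given E[Y] and P(Group), for Y in [0, 1].

    Derived from E[Y] = E[Y|Group] * p + E[Y|not Group] * (1 - p)
    with E[Y|not Group] anywhere between 0 and 1.
    """
    if not 0 < group_share <= 1:
        raise ValueError("group_share must be in (0, 1]")
    if not 0 <= total_mean <= 1:
        raise ValueError("total_mean must be in [0, 1]")
    # E[Y|not Group] = 1 gives the smallest group mean, 0 gives the largest.
    lower = (total_mean - (1 - group_share)) / group_share
    upper = total_mean / group_share
    # The group mean itself must also lie in [0, 1].
    return max(0.0, lower), min(1.0, upper)


# Hypothetical figures: overall rate 10%, group makes up 40% of the population.
lower, upper = group_mean_bounds(0.1, 0.4)
print(f"group mean lies between {lower:.3f} and {upper:.3f}")

# Hypothetical figures where both bounds are informative.
lower2, upper2 = group_mean_bounds(0.7, 0.8)
print(f"group mean lies between {lower2:.3f} and {upper2:.3f}")
```

The quantity E[Y] / P(Group) therefore appears only as the upper end of the interval, reached when the outcome never occurs outside the group.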

Simpson's paradox

A striking ecological fallacy is Simpson's paradox: the fact that when comparing two populations divided into groups, the average of some variable in the first population can be higher in every group and yet lower in the total population. Formally, when each value of Z refers to a different group and X refers to some treatment, it can happen that
$$E[Y \mid Z = z, X = 1] > E[Y \mid Z = z, X = 0] \ \text{ for all } z, \quad \text{whereas} \quad E[Y \mid X = 1] < E[Y \mid X = 0]$$
When the within-group difference $E[Y \mid Z = z, X = 1] - E[Y \mid Z = z, X = 0]$ does not depend on $z$, Simpson's paradox is exactly the omitted-variable bias for the regression of Y on X, where the regressor is a dummy variable and the omitted variable is a categorical variable defining a group for each value it takes. The application is striking because the bias is large enough that the parameters have opposite signs.
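The reversal can be reproduced with a small table of counts. The numbers below are invented for illustration: two groups (Z = "a" or "b"), a treatment indicator X, and a binary outcome Y whose mean is the success rate. The treated are concentrated in the group with the lower success rate, which is what drives the reversal.

```python
from fractions import Fraction

# Hypothetical counts: (group z, treatment x) -> (successes, trials).
counts = {
    ("a", 1): (9, 10),
    ("a", 0): (80, 100),
    ("b", 1): (30, 100),
    ("b", 0): (2, 10),
}


def rate(x, z=None):
    """Mean of Y given X = x, optionally also conditioning on Z = z."""
    cells = [v for (g, t), v in counts.items() if t == x and z in (None, g)]
    return Fraction(sum(s for s, _ in cells), sum(n for _, n in cells))


# Within every group the treated do better ...
treated_better_within = {z: rate(1, z) > rate(0, z) for z in ("a", "b")}
# ... but pooled over groups the treated do worse.
treated_better_overall = rate(1) > rate(0)

for z in ("a", "b"):
    print(f"group {z}: treated {float(rate(1, z)):.2f} vs untreated {float(rate(0, z)):.2f}")
print(f"overall: treated {float(rate(1)):.3f} vs untreated {float(rate(0)):.3f}")
```

Here the treated rate is higher in both groups (0.90 vs 0.80 and 0.30 vs 0.20) but lower overall (39/110 vs 82/110).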

Legal applications

The ecological fallacy was discussed in a court challenge to the 2004 Washington gubernatorial election, in which a number of illegal voters were identified after the election; their votes were unknown because the vote was by secret ballot. The challengers argued that illegal votes cast in the election would have followed the voting patterns of the precincts in which they had been cast, and thus that adjustments should be made accordingly. An expert witness said this approach was like trying to figure out Ichiro Suzuki's batting average by looking at the batting average of the entire Seattle Mariners team, since the illegal votes were cast by an unrepresentative sample of each precinct's voters, and might be as different from the average voter in the precinct as Ichiro was from the rest of his team. The judge determined that the challengers' argument was an ecological fallacy and rejected it.
