Causal inference


Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of association is that the former analyzes the response of the effect variable when the cause is changed. The science of why things occur is called etiology. Causal inference is an example of causal reasoning.

Definition

Inferring the cause of something has been described as reasoning to the conclusion that something is, or is likely to be, the cause of something else.
Epidemiological studies employ different methods of collecting and measuring evidence of risk factors and effects, and different ways of measuring the association between the two. A hypothesis is formulated and then tested with statistical methods. Statistical inference helps decide whether the data are due to chance, also called random variation, or are indeed correlated, and if so, how strongly. However, correlation does not imply causation, so further methods must be used to infer causation.
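To make the last point concrete, the following minimal Python simulation (with hypothetical variables) shows how a strong correlation can arise purely from a common cause, and how conditioning on that common cause makes the association vanish.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# A common cause c drives two variables that have no direct
# causal link to each other.
c = rng.normal(size=n)
a = c + rng.normal(scale=0.5, size=n)
b = c + rng.normal(scale=0.5, size=n)

# a and b are strongly correlated (about 0.8) despite neither
# causing the other.
print(np.corrcoef(a, b)[0, 1])

# Stratifying on the common cause removes the association:
# within a narrow band of c, the correlation is near zero.
band = np.abs(c) < 0.1
print(np.corrcoef(a[band], b[band])[0, 1])
```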
Common frameworks for causal inference are structural equation modeling and the Rubin causal model.
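As a sketch of the potential-outcomes idea at the heart of the Rubin causal model, the following Python simulation (with made-up numbers) gives each unit two potential outcomes, only one of which is ever observed; under random assignment, a simple difference in group means estimates the average treatment effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Each unit has two potential outcomes: y0 if untreated, y1 if
# treated. Only one of the two is ever observed -- the
# "fundamental problem of causal inference".
y0 = rng.normal(10.0, 2.0, n)
y1 = y0 + 1.5                    # true average treatment effect: 1.5

# Random assignment makes treatment independent of the potential
# outcomes, so the difference in observed group means is an
# unbiased estimate of the average treatment effect.
treated = rng.random(n) < 0.5
observed = np.where(treated, y1, y0)
ate_hat = observed[treated].mean() - observed[~treated].mean()
print(round(ate_hat, 3))         # close to 1.5
```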

In epidemiology

Epidemiology studies patterns of health and disease in defined populations of living beings in order to infer causes and effects. An association between an exposure to a putative risk factor and a disease may be suggestive of, but is not equivalent to, causality, because correlation does not imply causation. Historically, Koch's postulates have been used since the 19th century to decide whether a microorganism was the cause of a disease. In the 20th century, the Bradford Hill criteria, described in 1965, have been used to assess the causality of variables outside microbiology, although even these criteria are not exclusive ways to determine causality.
In molecular epidemiology the phenomena studied are on a molecular biology level, including genetics, where biomarkers are evidence of cause or effects.
A recent trend is to identify evidence for the influence of an exposure on molecular pathology within diseased tissue or cells, in the emerging interdisciplinary field of molecular pathological epidemiology. Linking the exposure to molecular pathologic signatures of the disease can help to assess causality. Given the inherent heterogeneity of a given disease (the unique disease principle), disease phenotyping and subtyping are trends in the biomedical and public health sciences, exemplified by personalized medicine and precision medicine.

In computer science

Determination of cause and effect from joint observational data for two time-independent variables, say X and Y, has been tackled using asymmetry between the evidence for a model in each of the two directions, X → Y and Y → X. The primary approaches are based on algorithmic information theory models and on noise models.

Algorithmic information models

Compare two programs, both of which output both X and Y. The shortest such program implies that the uncompressed, stored variable more likely causes the computed variable.
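Kolmogorov complexity is uncomputable, so practical variants of this idea substitute a computable proxy for program length. The following Python sketch is one such crude illustration (not a published algorithm): it quantizes the data, uses zlib-compressed size as a codelength, approximates the conditional description by the residuals of a polynomial fit, and prefers the direction with the shorter two-part description.

```python
import zlib
import numpy as np

def compressed_bits(arr, bin_width):
    """Codelength proxy: zlib-compressed size, in bits, of the array
    quantized at a fixed resolution (so the scale of the values,
    not just the shape of their distribution, affects the cost)."""
    q = np.round((arr - arr.min()) / bin_width).astype(np.uint16)
    return 8 * len(zlib.compress(q.tobytes(), 9))

def total_codelength(cause, effect, degree=4):
    """Two-part code L(cause) + L(effect | cause), where the conditional
    term is approximated by the residuals of a polynomial fit standing
    in for a 'program' that computes effect from cause."""
    width = (np.ptp(cause) + np.ptp(effect)) / 128.0  # shared resolution
    residual = effect - np.polyval(np.polyfit(cause, effect, degree), cause)
    return compressed_bits(cause, width) + compressed_bits(residual, width)

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 5000)
y = x**2 + rng.normal(0, 0.05, 5000)   # ground truth: X -> Y

# Under this heuristic, the direction with the shorter two-part
# description is taken as the causal direction.
if total_codelength(x, y) < total_codelength(y, x):
    print("Inferred direction: X -> Y")
else:
    print("Inferred direction: Y -> X")
```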

Noise models

Incorporate an independent noise term into the model and compare the evidence for the two directions.
Here are some of the noise models for the hypothesis Y → X with the noise E:

Additive noise: X = F(Y) + E
Linear noise: X = pY + qE
Post-non-linear: X = G(F(Y) + E)
Heteroskedastic noise: X = F(Y) + E · G(Y)
Functional noise: X = F(Y, E)

The common assumptions in these models are that there are no other causes of X, that Y and E have no common causes, and that the distribution of the cause is independent of the causal mechanisms.
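A minimal Python sketch of the additive-noise idea follows, under the assumptions listed above: fit a function in each direction and prefer the direction whose residual is closer to independent of its input, with dependence measured by a simple biased HSIC estimate (the polynomial fit and the kernel bandwidth are illustrative choices, not part of any specific published method).

```python
import numpy as np

def hsic(a, b, sigma=0.5):
    """Biased HSIC estimate with Gaussian kernels; larger values
    indicate stronger dependence between the two samples."""
    n = len(a)
    def gram(v):
        v = (v - v.mean()) / v.std()           # standardize
        return np.exp(-((v[:, None] - v[None, :]) ** 2) / (2 * sigma**2))
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    return np.trace(gram(a) @ H @ gram(b) @ H) / (n - 1) ** 2

def anm_score(cause, effect, degree=5):
    """Fit effect = F(cause) + E and return the dependence between
    the residual E and the putative cause (low = plausible direction)."""
    residual = effect - np.polyval(np.polyfit(cause, effect, degree), cause)
    return hsic(cause, residual)

rng = np.random.default_rng(0)
y = rng.uniform(-1, 1, 500)                    # ground-truth cause
x = y**3 + y + rng.normal(0, 0.2, 500)         # effect; hypothesis Y -> X

# Prefer the direction whose residual is closer to independent
# of its input.
if anm_score(y, x) < anm_score(x, y):
    print("Inferred direction: Y -> X")
else:
    print("Inferred direction: X -> Y")
```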
On an intuitive level, the idea is that the factorization of the joint distribution P(cause, effect) into P(cause) · P(effect | cause) typically yields models of lower total complexity than the factorization into P(effect) · P(cause | effect). Although the notion of "complexity" is intuitively appealing, it is not obvious how it should be precisely defined. A different family of methods attempts to discover causal "footprints" from large amounts of labeled data, allowing the prediction of more flexible causal relations.

In statistics and economics

In statistics and economics, causality is often tested via regression analysis. Several methods can be used to distinguish actual causality from spurious correlations. First, economists constructing regression models establish the direction of the causal relation based on economic theory. For example, if one studies the dependency between rainfall and the futures price of a commodity, theory indicates that rainfall can influence prices, but futures prices cannot change the amount of rain. Second, the instrumental variables technique may be employed to remove any reverse causation by introducing a role for other variables (instruments) that are known to be unaffected by the dependent variable. Third, economists consider time precedence when choosing an appropriate model specification. Given that partial correlations are symmetrical, one cannot determine the direction of a causal relation from correlations alone; based on a probabilistic view of causality, economists assume that causes must precede their effects in time. This leads to using variables representing earlier phenomena as independent variables, and to econometric tests for causality applicable in time series analysis. Fourth, other regressors are included to ensure that confounding variables are not making a regressor appear significant spuriously; however, in areas suffering from multicollinearity, such as macroeconomics, it is in principle impossible to include all confounding factors, and econometric models therefore remain susceptible to the common-cause fallacy. More recently, the design-based econometrics movement has popularized the use of natural experiments and quasi-experimental research designs to address the problem of spurious correlations.
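As a concrete illustration of the instrumental-variables idea, here is a minimal Python simulation (all variable names and coefficients are made up for the example): a naive regression is biased by an unobserved confounder, while a two-stage least squares estimate based on an instrument recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Simulated system: a hidden confounder u drives both x and y,
# while the instrument z affects y only through x.
u = rng.normal(size=n)                       # unobserved confounder
z = rng.normal(size=n)                       # instrument
x = 0.8 * z + u + rng.normal(size=n)         # endogenous regressor
y = 2.0 * x + 3.0 * u + rng.normal(size=n)   # true causal effect of x: 2.0

def slope(regressor, outcome):
    """OLS slope from a simple linear regression with intercept."""
    X = np.column_stack([np.ones(len(regressor)), regressor])
    return np.linalg.lstsq(X, outcome, rcond=None)[0][1]

# Naive OLS is biased upward because u is omitted.
print("OLS estimate: ", slope(x, y))         # noticeably above 2.0

# Two-stage least squares: first regress x on z, then regress y
# on the first-stage fitted values of x.
b1 = slope(z, x)
x_hat = x.mean() + b1 * (z - z.mean())       # first-stage fitted values
print("2SLS estimate:", slope(x_hat, y))     # close to 2.0
```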

In social science

The social sciences have moved increasingly toward a quantitative framework for assessing causality. Much of this has been described as a means of providing greater rigor to social science methodology. Political science was significantly influenced by the publication of Designing Social Inquiry, by Gary King, Robert Keohane, and Sidney Verba, in 1994. King, Keohane, and Verba recommended that researchers applying both quantitative and qualitative methods adopt the language of statistical inference to be clearer about their subjects of interest and units of analysis. Proponents of quantitative methods have also increasingly adopted the potential outcomes framework, developed by Donald Rubin, as a standard for inferring causality.
Debates over the appropriate application of quantitative methods to infer causality have resulted in increased attention to the reproducibility of studies. Critics of widely practiced methodologies have argued that researchers engage in p-hacking, publishing articles on the basis of spurious correlations. To prevent this, some have advocated that researchers preregister their research designs before conducting their studies, so that they do not inadvertently overemphasize a non-reproducible finding that was not the initial subject of inquiry but was found to be statistically significant during data analysis. Internal debates about methodology and reproducibility within the social sciences have at times been acrimonious.
While much of the emphasis remains on statistical inference in the potential outcomes framework, social science methodologists have developed new tools to conduct causal inference with both qualitative and quantitative methods, sometimes called a “mixed methods” approach. Advocates of diverse methodological approaches argue that different methodologies are better suited to different subjects of study. The sociologist Herbert Smith and the political scientists James Mahoney and Gary Goertz have cited the observation of Paul Holland, a statistician and author of the 1986 article “Statistics and Causal Inference,” that statistical inference is most appropriate for assessing the “effects of causes” rather than the “causes of effects.” Qualitative methodologists have argued that formalized models of causation, including process tracing and fuzzy set theory, provide opportunities to infer causation through the identification of critical factors within case studies or through comparison among several case studies. These methodologies are also valuable for subjects in which a limited number of potential observations or the presence of confounding variables would limit the applicability of statistical inference.