Propensity score matching

In the statistical analysis of observational data, propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment. PSM attempts to reduce the bias due to confounding variables that could be found in an estimate of the treatment effect obtained from simply comparing outcomes among units that received the treatment versus those that did not. Paul Rosenbaum and Donald Rubin introduced the technique in 1983.
The possibility of bias arises because a difference in the treatment outcome between treated and untreated groups may be caused by a factor that predicts treatment rather than by the treatment itself. In randomized experiments, the randomization enables unbiased estimation of treatment effects; for each covariate, randomization implies that treatment groups will be balanced on average, by the law of large numbers. Unfortunately, for observational studies, the assignment of treatments to research subjects is typically not random. Matching attempts to reduce the treatment assignment bias, and to mimic randomization, by creating a sample of units that received the treatment that is comparable on all observed covariates to a sample of units that did not receive the treatment.
For example, one may be interested to know the consequences of smoking. An observational study is required since it is unethical to randomly assign people to the treatment 'smoking.' The treatment effect estimated by simply comparing those who smoked to those who did not smoke would be biased by any factors that predict smoking. PSM attempts to control for these biases by making the groups that did and did not receive the treatment comparable with respect to the control variables.

Overview

PSM is intended for causal inference under simple selection bias in non-experimental settings in which (i) few units in the non-treatment comparison group are comparable to the treatment units, and (ii) selecting a subset of comparison units similar to the treatment units is difficult because units must be compared across a high-dimensional set of pretreatment characteristics.
In normal matching, single characteristics that distinguish treatment and control groups are matched in an attempt to make the groups more alike. But if the two groups do not have substantial overlap, then substantial error may be introduced. For example, if only the worst cases from the untreated "comparison" group are compared to only the best cases from the treatment group, the result may be regression toward the mean, which may make the comparison group look better or worse than reality.
PSM employs a predicted probability of group membership (e.g., treatment versus control group) based on observed predictors, usually obtained from logistic regression, to create a counterfactual group. Propensity scores may be used for matching or as covariates, alone or with other matching variables or covariates.
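As a rough illustration of the score-estimation step, the following Python sketch fits a logistic regression by plain gradient ascent and returns the predicted probabilities as propensity scores. The function names (`fit_logistic`, `propensity_scores`), the learning rate, the iteration count and the synthetic data are all assumptions made for this example; in practice a statistics package would normally be used instead.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def linear_predictor(beta, row):
    """Intercept plus the dot product of coefficients and covariates."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def fit_logistic(X, z, lr=0.5, n_iter=1000):
    """Fit P(Z = 1 | X) by batch gradient ascent on the mean log-likelihood.

    X: list of covariate rows; z: list of 0/1 treatment indicators.
    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, zi in zip(X, z):
            resid = zi - sigmoid(linear_predictor(beta, row))
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Predicted probability of treatment for every unit."""
    return [sigmoid(linear_predictor(beta, row)) for row in X]


# Synthetic data (hypothetical): one covariate that raises the chance of treatment.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0)] for _ in range(500)]
z = [1 if rng.random() < sigmoid(-0.5 + 1.0 * row[0]) else 0 for row in X]

beta = fit_logistic(X, z)
scores = propensity_scores(X, beta)
```

The resulting `scores` list holds one estimated probability per unit and can be passed to whatever matching or weighting step follows.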

General procedure

1. Run logistic regression:

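Dependent variable: Z = 1 if the unit participates (i.e., is in the treatment group); Z = 0 otherwise.
Choose appropriate confounders (variables thought to be associated with both treatment and outcome).
Obtain an estimate of the propensity score: the predicted probability p or the log odds, log[p/(1 − p)].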

2. Check that propensity score is balanced across treatment and comparison groups, and check that covariates are balanced across treatment and comparison groups within strata of the propensity score.

Use standardized differences or graphs to examine distributions

3. Match each participant to one or more nonparticipants on propensity score:

Nearest neighbor matching
Caliper matching: comparison units within a certain width of the propensity score of the treated units get matched, where the width is generally a fraction of the standard deviation of the propensity score
Mahalanobis metric matching in conjunction with PSM
Stratification matching
Difference-in-differences matching
Exact matching

4. Verify that covariates are balanced across treatment and comparison groups in the matched or weighted sample
5. Conduct multivariate analysis based on the new sample:

Use analyses appropriate for non-independent matched samples if more than one nonparticipant is matched to each participant

Note: When you have multiple matches for a single treated observation, it is essential to use Weighted Least Squares rather than Ordinary Least Squares.
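
The matching and balance-checking steps above can be sketched in a few lines of Python. This is only one simple variant (greedy 1:1 nearest-neighbor matching without replacement, with a caliper on the raw score), and the function names, the caliper value and the toy data are assumptions made for the example rather than a recommended configuration.

```python
import math


def caliper_match(scores, treated, caliper):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    scores: propensity score per unit; treated: 0/1 flag per unit.
    Returns (treated_index, control_index) pairs whose scores differ by at
    most `caliper`; treated units with no such control are left unmatched.
    """
    controls = {i for i, t in enumerate(treated) if t == 0}
    treated_idx = [i for i, t in enumerate(treated) if t == 1]
    pairs = []
    for i in treated_idx:
        if not controls:
            break
        # Closest remaining control; ties are broken by the lower index.
        j = min(controls, key=lambda c: (abs(scores[c] - scores[i]), c))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


def standardized_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    def mean(v):
        return sum(v) / len(v)

    def var(v):
        m = mean(v)
        return sum((a - m) ** 2 for a in v) / (len(v) - 1)

    pooled_sd = math.sqrt((var(x_treated) + var(x_control)) / 2.0)
    return (mean(x_treated) - mean(x_control)) / pooled_sd


# Toy data (hypothetical): scores and one covariate (age) for 8 units.
scores  = [0.80, 0.62, 0.55, 0.30, 0.78, 0.60, 0.33, 0.10]
treated = [1,    1,    1,    1,    0,    0,    0,    0]
age     = [61,   50,   47,   33,   60,   52,   35,   22]

pairs = caliper_match(scores, treated, caliper=0.05)

smd_before = standardized_difference(
    [a for a, t in zip(age, treated) if t == 1],
    [a for a, t in zip(age, treated) if t == 0],
)
smd_after = standardized_difference(
    [age[i] for i, _ in pairs], [age[j] for _, j in pairs]
)
```

With these toy numbers, three of the four treated units find a control within the caliper, and the standardized difference in age is smaller in magnitude after matching than before. Greedy matching depends on the order in which treated units are processed, so optimal-matching routines in statistical packages may return different pairs.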

Formal definitions

Basic settings

The basic case is of two treatments (treatment and control), with N subjects. Each subject i would respond to the treatment with $r_{1i}$ and to the control with $r_{0i}$. The quantity to be estimated is the average treatment effect: $E[r_1] - E[r_0]$. The variable $Z_i$ indicates whether subject i received treatment ($Z_i = 1$) or control ($Z_i = 0$). Let $X_i$ be a vector of observed pretreatment measurements (covariates) for the ith subject. The observations of $X_i$ are made prior to treatment assignment, but the features in $X_i$ may not include all of the ones used to decide on the treatment assignment. The numbering of the units (i = 1, ..., N) is assumed not to contain any information beyond what is contained in $X_i$. The following sections omit the i index while still discussing the stochastic behavior of some subject.
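To make the notation concrete, here is a small Python sketch with invented potential outcomes for six subjects. The numbers are hypothetical and chosen so that treated subjects would have had higher outcomes even without treatment. In real data only one of the two potential outcomes is observed per subject, so the true average treatment effect computed here is not directly available.

```python
# Hypothetical potential outcomes for N = 6 subjects.
r1 = [5, 6, 7, 3, 4, 5]   # response each subject would show under treatment
r0 = [4, 5, 6, 1, 2, 3]   # response each subject would show under control
z  = [1, 1, 1, 0, 0, 0]   # treatment actually received

n = len(z)

# Average treatment effect: E[r1] - E[r0], using both potential outcomes.
ate = sum(r1) / n - sum(r0) / n

# Naive contrast: observed treated mean minus observed control mean.
treated_mean = sum(r for r, zi in zip(r1, z) if zi == 1) / sum(z)
control_mean = sum(r for r, zi in zip(r0, z) if zi == 0) / (n - sum(z))
naive = treated_mean - control_mean

print(ate, naive)  # prints: 1.5 4.0
```

The naive contrast (4.0) overstates the average treatment effect (1.5) because treatment assignment in this toy example is related to the potential outcomes.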

Strongly ignorable treatment assignment

Let some subject have a vector of covariates X, and some potential outcomes $r_0$ and $r_1$ under control and treatment, respectively. Treatment assignment is said to be strongly ignorable if the potential outcomes are independent of treatment (Z) conditional on background variables X. This can be written compactly as
$$(r_0, r_1) \perp Z \mid X,$$
where $\perp$ denotes statistical independence.

Balancing score

A balancing score b(X) is a function of the observed covariates X such that the conditional distribution of X given b(X) is the same for treated (Z = 1) and control (Z = 0) units:
$$Z \perp X \mid b(X).$$
The most trivial such function is $b(X) = X$.

Propensity score

A propensity score is the probability of a unit being assigned to a particular treatment given a set of observed covariates. Propensity scores are used to reduce selection bias by equating groups based on these covariates.
Suppose that we have a binary treatment indicator Z, a response variable r, and background observed covariates X. The propensity score is defined as the conditional probability of treatment given background variables:
$$e(x) = \Pr(Z = 1 \mid X = x).$$
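When the covariates take only a few discrete values, this conditional probability can be estimated directly as the fraction of treated units within each covariate stratum. The Python sketch below does that; the function name `empirical_propensity`, the two binary covariates and the data are hypothetical and serve only to illustrate the definition.

```python
from collections import defaultdict


def empirical_propensity(X, Z):
    """Estimate Pr(Z = 1 | X = x) as the treated fraction in each stratum.

    X: hashable covariate value (or tuple of values) per unit.
    Z: 0/1 treatment indicator per unit.
    """
    counts = defaultdict(lambda: [0, 0])   # x -> [n_treated, n_total]
    for x, z in zip(X, Z):
        counts[x][0] += z
        counts[x][1] += 1
    return {x: n_treated / n_total for x, (n_treated, n_total) in counts.items()}


# Hypothetical data: each X value is (diabetic, over_50); Z marks treatment.
X = [(1, 1), (1, 1), (1, 1), (1, 1),
     (0, 0), (0, 0), (0, 0), (0, 0),
     (1, 0), (1, 0)]
Z = [1, 1, 1, 0,
     0, 0, 0, 1,
     1, 0]

e = empirical_propensity(X, Z)
print(e[(1, 1)], e[(0, 0)], e[(1, 0)])  # prints: 0.75 0.25 0.5
```

With many or continuous covariates the strata become too sparse for this direct estimate, which is why a model such as logistic regression is usually fitted instead.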

Main theorems

The following were first presented, and proven, by Rosenbaum and Rubin in 1983:

The propensity score $e(X) = \Pr(Z = 1 \mid X)$ is a balancing score.
Any score that is 'finer' than the propensity score is a balancing score, that is, any $b(X)$ such that $e(X) = f(b(X))$ for some function $f$. The propensity score is the coarsest balancing score function, as it takes a (possibly) multidimensional object $X$ and transforms it into one dimension, while $b(X) = X$ is the finest one.
If treatment assignment is strongly ignorable given X, then it is also strongly ignorable given any balancing score; in particular, given the propensity score: $(r_0, r_1) \perp Z \mid e(X)$.
Using sample estimates of the balancing score can produce sample balance on X.

Relationship to sufficiency

If we think of the value of Z as a parameter of the population that impacts the distribution of X, then the balancing score serves as a sufficient statistic for Z. Furthermore, the above theorems indicate that the propensity score is a minimal sufficient statistic if thinking of Z as a parameter of X. Lastly, if treatment assignment Z is strongly ignorable given X, then the propensity score is a minimal sufficient statistic for the joint distribution of $(r_0, r_1)$.
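The balancing property that underlies these statements follows from a short calculation. The sketch below writes $e(X) = \Pr(Z = 1 \mid X)$ for the propensity score and uses only the law of iterated expectations. It is an outline of the standard argument, not a full proof.

```latex
\begin{aligned}
\Pr\big(Z = 1 \mid X,\, e(X)\big) &= \Pr(Z = 1 \mid X) = e(X), \\
\Pr\big(Z = 1 \mid e(X)\big)
  &= \mathbb{E}\big[\Pr(Z = 1 \mid X) \,\big|\, e(X)\big]
   = \mathbb{E}\big[e(X) \,\big|\, e(X)\big] = e(X).
\end{aligned}
```

Since both conditional probabilities equal $e(X)$, knowing X adds nothing about Z once $e(X)$ is known, which is the statement $Z \perp X \mid e(X)$.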

Graphical test for detecting the presence of confounding variables

Judea Pearl has shown that there exists a simple graphical test, called the back-door criterion, which detects the presence of confounding variables. To estimate the effect of treatment, the background variables X must block all back-door paths in the graph. This blocking can be done either by adding the confounding variable as a control in regression, or by matching on the confounding variable.

Advantages and disadvantages

PSM has been shown to increase "imbalance, inefficiency, model dependence, and bias" and is no longer recommended in preference to other matching methods. The insights behind the use of matching still hold but should be applied with other matching methods; propensity scores also have other productive uses in weighting and doubly robust estimation.
Like other matching procedures, PSM estimates an average treatment effect from observational data. The key advantage of PSM at the time of its introduction was that, by using a linear combination of covariates to form a single score, it balances treatment and control groups on a large number of covariates without losing a large number of observations. If units in the treatment and control groups were instead balanced on a large number of covariates one at a time, large numbers of observations would be needed to overcome the "dimensionality problem", whereby the introduction of a new balancing covariate increases the minimum necessary number of observations in the sample geometrically.
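The geometric growth can be made concrete with a little arithmetic. The Python sketch below counts the distinct covariate cells that exact matching would have to fill. It assumes, purely for illustration, that every covariate is binary, and the function name is hypothetical.

```python
def exact_matching_cells(levels_per_covariate):
    """Number of distinct covariate cells when matching exactly on every covariate."""
    cells = 1
    for levels in levels_per_covariate:
        cells *= levels
    return cells


for k in (5, 10, 20):
    # k binary covariates give 2**k cells: 32, 1024 and 1048576 respectively.
    print(k, exact_matching_cells([2] * k))
```

Each cell needs at least one treated and one control unit for exact matching to work, whereas matching on a single score avoids enumerating the cells at all.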
One disadvantage of PSM is that it only accounts for observed covariates. Factors that affect assignment to treatment and outcome but that cannot be observed cannot be accounted for in the matching procedure. As the procedure only controls for observed variables, any hidden bias due to latent variables may remain after matching. Another issue is that PSM requires large samples, with substantial overlap between treatment and control groups.
General concerns with matching have also been raised by Judea Pearl, who has argued that hidden bias may actually increase because matching on observed variables may unleash bias due to dormant unobserved confounders. Similarly, Pearl has argued that bias reduction can only be assured by modelling the qualitative causal relationships between treatment, outcome, observed and unobserved covariates. Confounding occurs when the experimenter is unable to control for alternative, non-causal explanations for an observed relationship between independent and dependent variables. Such control should satisfy the "backdoor criterion" of Pearl.

Implementations in statistics packages

R: propensity score matching is available as part of the MatchIt package. It can also easily be implemented manually.
SAS: The PSMatch procedure and the macro OneToManyMTCH match observations based on a propensity score.
Stata: several commands implement propensity score matching, including the user-written psmatch2. Stata version 13 and later also offers the built-in command teffects psmatch.
SPSS: A dialog box for Propensity Score Matching is available from the IBM SPSS Statistics menu, and allows the user to set the match tolerance, randomize case order when drawing samples, prioritize exact matches, sample with or without replacement, set a random seed, and maximize performance by increasing processing speed and minimizing memory usage. The FUZZY Python procedure can also easily be added as an extension to the software through the Extensions dialog box. This procedure matches cases and controls by utilizing random draws from the controls, based on a specified set of key variables. The FUZZY command supports exact and fuzzy matching.
