False discovery rate
The false discovery rate (FDR) is a method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the expected proportion of "discoveries" (rejected null hypotheses) that are false (incorrect rejections of the null). FDR-controlling procedures provide less stringent control of type I errors compared to family-wise error rate (FWER) controlling procedures, which control the probability of at least one type I error. Thus, FDR-controlling procedures have greater power, at the cost of increased numbers of type I errors.
History
Technological motivations
The modern widespread use of the FDR is believed to stem from, and be motivated by, the development of technologies that allowed the collection and analysis of a large number of distinct variables in several individuals. By the late 1980s and 1990s, the development of "high-throughput" sciences, such as genomics, allowed for rapid data acquisition. This, coupled with the growth in computing power, made it possible to seamlessly perform hundreds or thousands of statistical tests on a given data set. The technology of microarrays was a prototypical example, as it enabled thousands of genes to be tested simultaneously for differential expression between two biological conditions.

As high-throughput technologies became common, technological and/or financial constraints led researchers to collect datasets with relatively small sample sizes and large numbers of variables measured per sample. In these datasets, too few of the measured variables showed statistical significance after classic correction for multiple tests with standard multiple comparison procedures. This created a need within many scientific communities to abandon FWER and unadjusted multiple hypothesis testing in favor of other ways to highlight and rank in publications those variables showing marked effects across individuals or treatments that would otherwise be dismissed as non-significant after standard correction for multiple tests. In response, a variety of error rates have been proposed, and become commonly used in publications, that are less conservative than FWER in flagging possibly noteworthy observations.
Literature
The FDR concept was formally described by Yoav Benjamini and Yosef Hochberg in 1995 as a less conservative and arguably more appropriate approach for identifying the important few from the trivial many effects tested. The FDR has been particularly influential, as it was the first alternative to the FWER to gain broad acceptance in many scientific fields. In 2005, the Benjamini and Hochberg paper from 1995 was identified as one of the 25 most-cited statistical papers.

Prior to the 1995 introduction of the FDR concept, various precursor ideas had been considered in the statistics literature. In 1979, Holm proposed the Holm procedure, a stepwise algorithm for controlling the FWER that is at least as powerful as the well-known Bonferroni adjustment. This stepwise algorithm sorts the p-values and sequentially rejects the hypotheses starting from the smallest p-values.
Benjamini said that the false discovery rate, and the Benjamini and Hochberg paper, had their origins in two papers concerned with multiple testing:
- The first paper is by Schweder and Spjøtvoll, who suggested plotting the ranked p-values and assessing the number of true null hypotheses (m0) via an eye-fitted line starting from the largest p-values. The p-values that deviate from this straight line should then correspond to the false null hypotheses. This idea was later developed into an algorithm and incorporated the estimation of m0 into procedures such as Bonferroni, Holm or Hochberg. This idea is closely related to the graphical interpretation of the BH procedure.
- The second paper is by Branko Soric, who introduced the terminology of "discovery" in the multiple hypothesis testing context. Soric used the expected number of false discoveries divided by the number of discoveries, E[V]/R, as a warning that "a large part of statistical discoveries may be wrong". This led Benjamini and Hochberg to the idea that a similar error rate, rather than being merely a warning, can serve as a worthy goal to control.
Definitions
Based on the definitions below we can define Q as the proportion of false discoveries among the discoveries (rejections of the null hypothesis):

Q = V/R = V/(V + S),

where V is the number of false discoveries and S is the number of true discoveries.
The false discovery rate (FDR) is then simply the expectation of Q:

FDR = Q_e = E[Q] = E[V/R],

where E[·] denotes expected value. The goal is to keep FDR below a given threshold q. To avoid division by zero, Q is defined to be 0 when R = 0. Formally, FDR = E[V/R | R > 0] · P(R > 0).
Classification of multiple hypothesis tests
The following table defines the possible outcomes when simultaneously testing m null hypotheses, of which m0 are true. Each hypothesis is declared significant (null hypothesis rejected) or non-significant (null hypothesis not rejected); summing each type of outcome gives the following random variables:

| | Null hypothesis is true | Alternative hypothesis is true | Total |
| --- | --- | --- | --- |
| Test is declared significant | V | S | R |
| Test is declared non-significant | U | T | m − R |
| Total | m0 | m − m0 | m |

Here V is the number of false discoveries (type I errors), S is the number of true discoveries, T is the number of false non-discoveries (type II errors), U is the number of true non-discoveries, and R = V + S is the total number of rejected null hypotheses ("discoveries"). R is an observable random variable, while S, T, U, and V are unobservable.
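To make these quantities concrete, here is a minimal Python simulation sketch; the sample sizes, effect size, and threshold are assumptions chosen purely for illustration. It draws a test statistic for each of m hypotheses (m0 of them true nulls) and tabulates V, S, R, and the realized proportion Q under naive, uncorrected testing:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

m, m0, alpha = 100, 90, 0.05                        # assumed: 90 true nulls among 100 tests
z = np.concatenate([rng.normal(0.0, 1.0, m0),       # test statistics under true nulls
                    rng.normal(3.0, 1.0, m - m0)])  # statistics where a real effect exists
p = 2 * norm.sf(np.abs(z))                          # two-sided p-values
reject = p < alpha                                  # naive, uncorrected testing

V = int(reject[:m0].sum())     # false discoveries
S = int(reject[m0:].sum())     # true discoveries
R = V + S                      # total discoveries
Q = V / R if R > 0 else 0.0    # false discovery proportion
print(f"V={V}, S={S}, R={R}, Q={Q:.2f}")
```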
Controlling procedures
The setting for many procedures is such that we have m null hypotheses tested, H1, H2, ..., Hm, and their corresponding p-values, P1, P2, ..., Pm. We list these p-values in ascending order and denote them by P(1) ≤ P(2) ≤ ... ≤ P(m). A procedure that goes from a small p-value to a large one will be called a step-up procedure. In a similar way, in a "step-down" procedure we move from a large corresponding test statistic to a smaller one.

Benjamini–Hochberg procedure
The Benjamini–Hochberg procedure (BH step-up procedure) controls the FDR at level α. It works as follows (a code sketch is given after the steps):
- For a given α, find the largest k such that P(k) ≤ (k/m)α.
- Reject the null hypothesis (i.e., declare discoveries) for all H(i) for i = 1, ..., k.
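The following is a minimal Python sketch of the BH step-up rule; the function name and example p-values are illustrative rather than taken from the original paper. For routine use, established implementations exist, e.g. multipletests(..., method="fdr_bh") in statsmodels.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                         # indices that sort the p-values
    thresholds = alpha * np.arange(1, m + 1) / m  # (k/m)*alpha for k = 1..m
    below = p[order] <= thresholds                # is P_(k) <= (k/m)*alpha ?
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()            # largest k (0-based) meeting the bound
        reject[order[:k + 1]] = True              # reject H_(1), ..., H_(k)
    return reject

# Example: at the default level 0.05, only the two smallest p-values are declared discoveries
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.060]))
```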
The BH procedure is valid when the m tests are independent, and also in various scenarios of dependence, but is not universally valid. It also satisfies the inequality:

E(Q) ≤ (m0/m) α ≤ α.
If an estimator of m0 is inserted into the BH procedure, it is no longer guaranteed to achieve FDR control at the desired level. Adjustments may be needed in the estimator, and several modifications have been proposed.
Note that the mean α for these m tests is α(m+1)/(2m), the Mean(FDR α) or MFDR, i.e. α adjusted for m independent or positively correlated tests (see AFDR below). The MFDR expression here is for a single recomputed value of α and is not part of the Benjamini and Hochberg method.
Benjamini–Yekutieli procedure
The Benjamini–Yekutieli procedure controls the false discovery rate under arbitrary dependence assumptions. This refinement modifies the threshold and finds the largest k such that:

P(k) ≤ (k / (m · c(m))) α

- If the tests are independent or positively correlated (as in the Benjamini–Hochberg procedure): c(m) = 1.
- Under arbitrary dependence (including negative correlation): c(m) is the harmonic number, c(m) = 1 + 1/2 + 1/3 + ... + 1/m, which can be approximated by c(m) ≈ ln(m) + γ + 1/(2m), where γ is the Euler–Mascheroni constant.
Using MFDR and the formulas above, an adjusted MFDR (or AFDR) is the minimum mean α for m dependent tests: AFDR = MFDR / c(m).
Another way to address dependence is by bootstrapping and rerandomization.
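Because the BY rule is simply the BH rule run at the deflated level α/c(m), it can be sketched by reusing the benjamini_hochberg function from the BH sketch above (again an illustrative sketch, not a reference implementation):

```python
import numpy as np

def benjamini_yekutieli(p_values, alpha=0.05):
    """BY procedure: the BH step-up rule run at the deflated level alpha / c(m)."""
    m = len(p_values)
    c_m = np.sum(1.0 / np.arange(1, m + 1))  # harmonic number 1 + 1/2 + ... + 1/m
    return benjamini_hochberg(p_values, alpha / c_m)

# With m = 6 tests, c(m) = 2.45, so BY is noticeably more conservative than BH:
# here only the single smallest p-value survives
print(benjamini_yekutieli([0.001, 0.008, 0.039, 0.041, 0.042, 0.060]))
```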
Properties
Adaptive and scalable
Using a multiplicity procedure that controls the FDR criterion is adaptive and scalable, meaning that controlling the FDR can be very permissive or very conservative, depending on the number of hypotheses tested and on the level of significance.

The FDR criterion is adaptive in that the same number of false discoveries will have different implications depending on the total number of discoveries. This contrasts with the family-wise error rate criterion. For example, if inspecting 100 hypotheses:
- If we make 4 discoveries, having 2 of them be false discoveries is often very costly. Whereas,
- If we make 50 discoveries, having 2 of them be false discoveries is often not very costly.
The FDR criterion is also scalable in that the same proportion of false discoveries out of the total number of discoveries remains sensible for different numbers of total discoveries. For example:
- If we make 100 discoveries, having 5 of them be false discoveries (a proportion of 5%) may not be very costly.
- Similarly, if we make 1000 discoveries, having 50 of them be false discoveries (as before, a proportion of 5%) may still not be very costly.
Dependency among the test statistics
Controlling the FDR using the linear step-up BH procedure, at level q, has several properties related to the dependency structure between the test statistics of the m null hypotheses being corrected for. If the test statistics are:
- Independent: FDR ≤ (m0/m) q
- Independent and continuous: FDR = (m0/m) q
- Positive dependent: FDR ≤ (m0/m) q
- In the general case: FDR ≤ (m0/m) q (1 + 1/2 + 1/3 + ... + 1/m) ≈ (m0/m) q (ln(m) + γ + 1/(2m)), where γ is the Euler–Mascheroni constant (a quick numeric check of this approximation appears below).
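As that numeric check, the following short sketch compares the exact harmonic sum appearing in the general-case bound with the approximation ln(m) + γ + 1/(2m):

```python
import numpy as np

gamma = 0.5772156649015329  # Euler–Mascheroni constant
for m in (10, 100, 1000):
    exact = np.sum(1.0 / np.arange(1, m + 1))   # 1 + 1/2 + ... + 1/m
    approx = np.log(m) + gamma + 1.0 / (2 * m)
    print(f"m={m}: exact={exact:.4f}, approx={approx:.4f}")
```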
Proportion of true hypotheses
If all of the null hypotheses are true (m0 = m), then controlling the FDR at level q guarantees control over the FWER (this is also called "weak control of the FWER"): FWER = P(V ≥ 1) = E(V/R) = FDR ≤ q, simply because the event of rejecting at least one true null hypothesis {V ≥ 1} is exactly the event {V/R = 1}, and the event {V = 0} is exactly the event {V/R = 0} (recall that Q = V/R is defined to be 0 when R = 0). But if there are some true discoveries to be made (m0 < m), then FWER ≥ FDR. In that case there is room for improving detection power, and it also means that any procedure that controls the FWER will also control the FDR.
Related concepts
The discovery of the FDR was preceded and followed by many other types of error rates, including the following (a simulation sketch estimating several of them appears after the list):
- PCER (per-comparison error rate) is defined as: PCER = E[V/m]. Testing individually each hypothesis at level α guarantees that PCER ≤ α (this is testing without any correction for multiplicity).
- FWER (the family-wise error rate) is defined as: FWER = P(V ≥ 1). There are numerous procedures that control the FWER.
- k-FWER (the tail probability of the false discovery proportion), suggested by Lehmann and Romano, van der Laan et al., is defined as: k-FWER = P(V ≥ k) ≤ q.
- Q' is the proportion of false discoveries among the discoveries, suggested by Soric in 1989, and is defined as: Q' = E[V]/R. This is a mixture of expectations and realizations, and has the problem of control for m0 = m.
- was used by Benjamini and Hochberg, and later called "Fdr" by Efron and earlier. It is defined as:. This error rate cannot be strictly controlled because it is 1 when.
- was used by Benjamini and Hochberg, and later called "pFDR" by Storey. It is defined as:. This error rate cannot be strictly controlled because it is 1 when.
- False exceedance rate (the tail probability of the false discovery proportion), defined as: P(V/R > q).
- W-FDR. Associated with each hypothesis i is a weight wi ≥ 0; the weights capture importance/price. The W-FDR is defined as: W-FDR = E(Σi wi Vi / Σi wi Ri), where Vi indicates a false rejection of hypothesis i and Ri indicates any rejection of hypothesis i.
- FDCR (false discovery cost rate). Stemming from statistical process control: associated with each hypothesis i is a cost ci, and with the intersection hypothesis H00 a cost c0. The motivation is that stopping a production process may incur a fixed cost. It is defined as: FDCR = E((c0 V0 + Σi ci Vi) / (c0 R0 + Σi ci Ri)).
- PFER (per-family error rate) is defined as: PFER = E(V).
- FNR (false non-discovery rate), by Sarkar and by Genovese and Wasserman, is defined as: FNR = E(T/(m − R)) = E(T/(T + U)).
- The local fdr is defined as: fdr(z) = π0 f0(z)/f(z), the posterior probability that a hypothesis whose test statistic equals z is null, where f(z) = π0 f0(z) + (1 − π0) f1(z) is the marginal density of the test statistic, f0 and f1 are the null and alternative densities, and π0 = m0/m.
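Several of these error rates can be estimated by simulation. The following sketch extends the single-draw example from the Definitions section; the model, effect size, and per-test threshold are all assumptions chosen for illustration, and the printed values are Monte Carlo approximations of PCER, PFER, FWER, FDR, and FNR under naive, uncorrected testing:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def estimate_error_rates(m=100, m0=90, effect=3.0, alpha=0.05, n_sim=2000):
    """Monte Carlo estimates of several multiple-testing error rates."""
    V = np.empty(n_sim); R = np.empty(n_sim); T = np.empty(n_sim)
    for i in range(n_sim):
        z = np.concatenate([rng.normal(0.0, 1.0, m0),        # true nulls
                            rng.normal(effect, 1.0, m - m0)])  # real effects
        reject = 2 * norm.sf(np.abs(z)) < alpha               # uncorrected per-test rule
        V[i] = reject[:m0].sum()      # false discoveries
        R[i] = reject.sum()           # total discoveries
        T[i] = (~reject[m0:]).sum()   # false non-discoveries
    return {
        "PCER": V.mean() / m,                                           # E[V/m]
        "PFER": V.mean(),                                               # E[V]
        "FWER": (V >= 1).mean(),                                        # P(V >= 1)
        "FDR":  np.where(R > 0, V / np.maximum(R, 1), 0.0).mean(),      # E[V/R], Q = 0 if R = 0
        "FNR":  np.where(R < m, T / np.maximum(m - R, 1), 0.0).mean(),  # E[T/(m - R)]
    }

print(estimate_error_rates())
```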
False coverage rate