John D. Storey


John D. Storey is the William R. Harman '63 and Mary-Love Harman Professor in Genomics at Princeton University. His research focuses on statistical inference for high-dimensional data, particularly genomic data. Storey was the founding director of the Princeton University Center for Statistics and Machine Learning.

Research

Storey's early research focused on the false discovery rate. At the time, the false discovery rate had been studied only in the context of sequential p-value methods and was not yet in widespread use. Storey showed that false discovery rates can also be approached through point estimation, opening up this very active branch of statistics to the study of false discovery rates. He simultaneously proved that the positive false discovery rate is exactly equal to a Bayesian posterior probability, thereby providing the first direct connection between false discovery rates and Bayesian theory. In these works, he also invented the q-value, which is a false discovery rate analogue of the p-value. Storey then introduced false discovery rates and q-values as widely applicable measures of statistical significance in genomics, shifting the focus from false positive control to false discovery rate control.
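The point-estimation view of the q-value can be illustrated with a short sketch. The code below is a simplified illustration, not Storey's published software: the function names `estimate_pi0` and `qvalues` are hypothetical, and the tuning parameter `lam` is fixed at 0.5 as an assumption rather than chosen adaptively. It estimates the proportion of true null hypotheses from the p-values above `lam`, then assigns each test the smallest estimated false discovery rate over all thresholds at which that test would be called significant.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true nulls from the p-values above `lam`.

    Null p-values are roughly uniform, so the fraction above `lam`,
    rescaled by (1 - lam), gives a conservative estimate (capped at 1).
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def qvalues(pvalues, lam=0.5):
    """Return one q-value per p-value, in the original input order."""
    m = len(pvalues)
    pi0 = estimate_pi0(pvalues, lam)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, so that each
    # q-value is the minimum estimated FDR over all thresholds at or
    # above that p-value. This keeps q-values monotone in the p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        est_fdr = pi0 * m * pvalues[i] / rank
        running_min = min(running_min, est_fdr)
        q[i] = running_min
    return q


# Hypothetical p-values, for illustration only.
example = [0.9, 0.01, 0.04, 0.02, 0.03]
print(qvalues(example))
```

Setting the estimated null proportion to 1 in this sketch reduces it to the adjusted p-values of the Benjamini–Hochberg procedure; an estimate below 1 yields proportionally smaller q-values.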
With Jeff Leek, Storey discovered that "expression heterogeneity", meaning unmodeled sources of systematic variation in gene expression data, is very prevalent and needs to be modeled and corrected for when analyzing genome-wide gene expression data. Leek and Storey introduced "surrogate variable analysis", a high-dimensional regression model that includes both known and unknown covariates. Storey has developed a number of methods for estimating this model.
Recently, Storey has shifted his focus to population genomics, where he has introduced genome-wide models of allele frequencies, Hardy–Weinberg equilibrium, and F-statistics that hold under arbitrary population structures.

Honors and awards