Volcano plot (statistics)


In statistics, a volcano plot is a type of scatter plot used to quickly identify changes in large data sets composed of replicate data. It plots statistical significance on the y axis against fold change on the x axis. These plots are increasingly common in omics experiments such as genomics, proteomics, and metabolomics, where one often has many thousands of replicate measurements between two conditions and wishes to quickly identify the most meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (for example, a p value) with the magnitude of the change, enabling quick visual identification of data points that display changes that are both large in magnitude and statistically significant.
A volcano plot is constructed by plotting the negative logarithm of the p value, usually base 10, on the y axis. As a result, data points with low p values (high significance) appear toward the top of the plot. The x axis is the logarithm of the fold change between the two conditions, commonly base 2. The logarithm is used so that changes in both directions appear equidistant from the center: a doubling and a halving, for example, lie at the same distance on opposite sides of zero. Plotting points in this way yields two regions of interest: points toward the top of the plot that lie far to either the left or the right. These represent values that display large-magnitude fold changes as well as high statistical significance.
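The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `volcano_coordinates` and `classify`, the choice of base 2 and base 10, and the cutoffs (an absolute log2 fold change of at least 1 and a p value of at most 0.05) are hypothetical conventions chosen for the example, and the p values are assumed to have been computed beforehand by some statistical test.

```python
import math


def volcano_coordinates(mean_a, mean_b, p_value):
    """Return (x, y) for one feature: x = log2 fold change (b over a), y = -log10 p."""
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("means must be positive to take a log ratio")
    if not 0 < p_value <= 1:
        raise ValueError("p_value must lie in (0, 1]")
    x = math.log2(mean_b / mean_a)
    y = -math.log10(p_value)
    return x, y


def classify(x, y, min_abs_log2fc=1.0, max_p=0.05):
    """Label a point by region; both thresholds are illustrative assumptions."""
    if y < -math.log10(max_p) or abs(x) < min_abs_log2fc:
        return "not significant"
    return "up" if x > 0 else "down"


# A doubling and a halving land at equal distance on opposite sides of x = 0.
up = volcano_coordinates(10.0, 20.0, 0.001)
down = volcano_coordinates(10.0, 5.0, 0.001)
# A large fold change with a weak p value stays near the bottom of the plot.
noisy = volcano_coordinates(1.0, 8.0, 0.5)

labels = [classify(*point) for point in (up, down, noisy)]
```

The resulting (x, y) pairs can be passed to any scatter-plot routine; the labels correspond to the upper-left and upper-right regions of interest described above.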
Additional information can be added by coloring the points according to a third dimension of data, though this is not uniformly done. Volcano plots are also used to graphically display the gene selection criterion of a significance analysis of microarrays (SAM), an example of regularization.
The concept of a volcano plot can be generalized to other applications, in which the x axis is related to a measure of the strength of a statistical signal and the y axis to a measure of the statistical significance of that signal. For example, in a genetic association case-control study, such as a genome-wide association study, each point in a volcano plot represents a single-nucleotide polymorphism. Its x value can be the odds ratio, and its y value can be the -log10 of the p value from a chi-squared test, or the chi-squared test statistic itself.
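As a rough illustration of the case-control example, the sketch below computes one point from a 2x2 table of allele counts. It assumes a Pearson chi-squared test with one degree of freedom and no continuity correction; the function name `snp_volcano_point` and its parameter names are hypothetical, and the counts in the usage lines are made up for the example. For one degree of freedom the p value equals erfc(sqrt(chi2 / 2)), which avoids any dependency beyond the standard library.

```python
import math


def snp_volcano_point(case_alt, case_ref, control_alt, control_ref):
    """Return (odds_ratio, -log10 p) for one SNP from 2x2 allele counts."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    if min(a, b, c, d) <= 0:
        raise ValueError("all four counts must be positive in this sketch")
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-squared statistic for a 2x2 table, no continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    if p_value == 0.0:
        # The p value underflowed; report an unbounded significance.
        return odds_ratio, math.inf
    return odds_ratio, -math.log10(p_value)


# Made-up counts: the alternate allele is more frequent among cases.
associated = snp_volcano_point(30, 70, 15, 85)
# Identical allele frequencies in both groups: odds ratio 1, no signal.
null_snp = snp_volcano_point(10, 10, 10, 10)
```

In practice the odds ratio is often shown on a logarithmic scale so that risk-increasing and protective alleles appear symmetric about the center, for the same reason a log fold change is used above.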