Anderson–Darling test

The Anderson–Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution. In its basic form, the test assumes that there are no parameters to be estimated in the distribution being tested, in which case the test and its set of critical values are distribution-free. However, the test is most often used in contexts where a family of distributions is being tested, in which case the parameters of that family need to be estimated, and this must be taken into account by adjusting either the test statistic or its critical values. When applied to testing whether a normal distribution adequately describes a set of data, it is one of the most powerful statistical tools for detecting most departures from normality.
K-sample Anderson–Darling tests are available for testing whether several collections of observations can be modelled as coming from a single population, where the distribution function does not have to be specified.
In addition to its use as a test of fit for distributions, it can be used in parameter estimation as the basis for a form of minimum distance estimation procedure.
The test is named after Theodore Wilbur Anderson and Donald A. Darling, who invented it in 1952.

The single-sample test

The Anderson–Darling and Cramér–von Mises statistics belong to the class of quadratic EDF statistics (tests based on the empirical distribution function). If the hypothesized distribution is $F$ and the empirical cumulative distribution function is $F_n$, then the quadratic EDF statistics measure the distance between $F$ and $F_n$ by
$$n \int_{-\infty}^{\infty} \left(F_n(x) - F(x)\right)^2 w(x)\, dF(x),$$
where $n$ is the number of elements in the sample and $w(x)$ is a weighting function. When the weighting function is $w(x) = 1$, the statistic is the Cramér–von Mises statistic. The Anderson–Darling test is based on the distance
$$A^2 = n \int_{-\infty}^{\infty} \frac{\left(F_n(x) - F(x)\right)^2}{F(x)\,\left(1 - F(x)\right)}\, dF(x),$$
which is obtained when the weight function is $w(x) = [F(x)\,(1 - F(x))]^{-1}$. Thus, compared with the Cramér–von Mises distance, the Anderson–Darling distance places more weight on observations in the tails of the distribution.
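The difference in tail emphasis can be illustrated numerically. The following is a minimal sketch, assuming the Cramér–von Mises weight is the constant 1 and the Anderson–Darling weight is $1/[F(x)(1 - F(x))]$; the function names are illustrative and not taken from any library.

```python
def cvm_weight(u):
    """Cramér–von Mises weight: constant, whatever u = F(x) is."""
    return 1.0


def ad_weight(u):
    """Anderson–Darling weight 1 / (u * (1 - u)) for u = F(x) in (0, 1)."""
    return 1.0 / (u * (1.0 - u))


# Compare the two weights at the centre of the distribution and in the tail.
for u in (0.5, 0.1, 0.01, 0.001):
    print(f"F(x) = {u:<6} CvM weight = {cvm_weight(u):.1f}   AD weight = {ad_weight(u):.1f}")
```

The Anderson–Darling weight is smallest at the median, where $F(x) = 0.5$, and grows without bound as $F(x)$ approaches 0 or 1, whereas the Cramér–von Mises weight stays at 1 everywhere.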

Basic test statistic

The Anderson–Darling test assesses whether a sample comes from a specified distribution. It makes use of the fact that, given a hypothesized underlying distribution and assuming the data do arise from this distribution, the values obtained by applying the hypothesized cumulative distribution function (CDF) to the data can be assumed to follow a uniform distribution. The data can then be tested for uniformity with a distance test. The formula for the test statistic $A^2$ to assess whether the ordered data $Y_1 < \cdots < Y_n$ come from a distribution with CDF $F$ is
$$A^2 = -n - S,$$
where
$$S = \sum_{i=1}^{n} \frac{2i - 1}{n} \left[\ln F(Y_i) + \ln\left(1 - F(Y_{n+1-i})\right)\right].$$
The test statistic can then be compared against the critical values of the theoretical distribution. Note that in this case no parameters are estimated in relation to the cumulative distribution function.
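As an illustration, the statistic can be computed directly from the sum over the ordered observations, $A^2 = -n - \sum_{i=1}^{n} \frac{2i-1}{n}[\ln F(Y_i) + \ln(1 - F(Y_{n+1-i}))]$. The sketch below assumes a fully specified CDF passed in as a Python callable; the function names and the sample data are invented for the example.

```python
import math


def anderson_darling(sample, cdf):
    """Return A^2 for `sample` against a fully specified CDF (no estimated parameters)."""
    y = sorted(sample)  # the formula requires ordered observations
    n = len(y)
    s = 0.0
    for i in range(1, n + 1):
        # y[i - 1] is Y_i and y[n - i] is Y_{n+1-i} in 1-based notation.
        # math.log raises ValueError if the CDF returns exactly 0 or 1.
        s += (2 * i - 1) / n * (math.log(cdf(y[i - 1])) + math.log(1.0 - cdf(y[n - i])))
    return -n - s


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (hypothetical null distribution)."""
    return min(max(x, 0.0), 1.0)


data = [0.61, 0.12, 0.87, 0.35, 0.48]  # made-up illustrative sample
a2 = anderson_darling(data, uniform_cdf)
print(f"A^2 = {a2:.4f}")
```

A small value of $A^2$ indicates that the empirical distribution lies close to the hypothesized one; the decision itself requires comparison with a critical value for the fully specified case.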

Tests for families of distributions

Essentially the same test statistic can be used in the test of fit of a family of distributions, but then it must be compared against the critical values appropriate to that family of theoretical distributions and dependent also on the method used for parameter estimation.

Test for normality

Empirical testing has found that the Anderson–Darling test is not quite as good as the Shapiro–Wilk test, but is better than other tests. Stephens found $A^2$ to be one of the best empirical distribution function statistics for detecting most departures from normality.
The computation differs based on what is known about the distribution:

Case 0: The mean and the variance are both known.
Case 1: The variance is known, but the mean is unknown.
Case 2: The mean is known, but the variance is unknown.
Case 3: Both the mean and the variance are unknown.

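The $n$ observations $X_i$, for $i = 1, \ldots, n$, of the variable $X$ must be sorted such that $X_1 \le X_2 \le \cdots \le X_n$; the notation below assumes that the $X_i$ are the ordered observations. Let
$$\hat{\mu} = \begin{cases} \mu, & \text{if the mean is known,} \\ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, & \text{otherwise,} \end{cases}$$
$$\hat{\sigma}^2 = \begin{cases} \sigma^2, & \text{if the variance is known,} \\ \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2, & \text{if the variance is unknown but the mean is known,} \\ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2, & \text{otherwise.} \end{cases}$$
The values $X_i$ are standardized to create new values $Y_i$, given by
$$Y_i = \frac{X_i - \hat{\mu}}{\hat{\sigma}}.$$
With the standard normal CDF $\Phi$, $A^2$ is calculated using
$$A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left[\ln \Phi(Y_i) + \ln\left(1 - \Phi(Y_{n+1-i})\right)\right].$$
An alternative expression in which only a single observation is dealt with at each step of the summation is:
$$A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} \left[(2i - 1) \ln \Phi(Y_i) + (2(n - i) + 1) \ln\left(1 - \Phi(Y_i)\right)\right].$$
A modified statistic can be calculated using
$$A^{*2} = \begin{cases} A^2 \left(1 + \frac{4}{n} - \frac{25}{n^2}\right), & \text{if the variance and the mean are both unknown,} \\ A^2, & \text{otherwise.} \end{cases}$$
If $A^2$ or $A^{*2}$ exceeds a given critical value, then the hypothesis of normality is rejected at the corresponding significance level. The critical values are given in the table below for values of $A^{*2}$.
Note 1: If $\hat{\sigma} = 0$ or any $\Phi(Y_i)$ equals 0 or 1, then $A^2$ cannot be calculated and is undefined.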
Note 2: The above adjustment formula is taken from Shorack & Wellner. Care is required in comparisons across different sources, as the specific adjustment formula is often not stated.
Note 3: Stephens notes that the test becomes better when the parameters are computed from the data, even if they are known.
Note 4: Marsaglia & Marsaglia provide a more accurate result for Case 0 at 85% and 99%.

Case	n	15%	10%	5%	2.5%	1%
0		1.621	1.933	2.492	3.070	3.878
1			0.908	1.105	1.304	1.573
2			1.760	2.323	2.904	3.690
3	10	0.514	0.578	0.683	0.779	0.926
	20	0.528	0.591	0.704	0.815	0.969
	50	0.546	0.616	0.735	0.861	1.021
	100	0.559	0.631	0.754	0.884	1.047
	∞	0.576	0.656	0.787	0.918	1.092
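The Case 3 computation (mean and variance both estimated from the data) can be sketched in a few lines. This is an illustrative implementation only: it assumes the single-observation form of the sum, $A^2 = -n - \frac{1}{n}\sum_{i}[(2i-1)\ln\Phi(Y_i) + (2(n-i)+1)\ln(1-\Phi(Y_i))]$, and the adjustment $A^{*2} = A^2(1 + 4/n - 25/n^2)$. The function name and the sample are invented for the example, and the critical value is the 5% entry of the Case 3, n = 10 row of the table.

```python
import math
from statistics import NormalDist, mean, stdev


def ad_normal_case3(sample):
    """Return (A^2, A*^2) for a normality test with mean and variance both unknown."""
    x = sorted(sample)
    n = len(x)
    mu_hat = mean(x)
    sigma_hat = stdev(x)  # sample standard deviation, n - 1 denominator
    if sigma_hat == 0:
        raise ValueError("A^2 is undefined when the estimated standard deviation is 0")
    phi = NormalDist().cdf  # standard normal CDF
    total = 0.0
    for i in range(1, n + 1):
        p = phi((x[i - 1] - mu_hat) / sigma_hat)
        # math.log raises ValueError if p is exactly 0 or 1 (statistic undefined).
        total += (2 * i - 1) * math.log(p) + (2 * (n - i) + 1) * math.log(1.0 - p)
    a2 = -n - total / n
    a2_star = a2 * (1.0 + 4.0 / n - 25.0 / n**2)
    return a2, a2_star


# Made-up sample of 10 observations.
sample = [2.1, 2.5, 2.7, 3.0, 3.1, 3.3, 3.4, 3.8, 4.0, 4.6]
a2, a2_star = ad_normal_case3(sample)
critical_5pct_n10 = 0.683  # Case 3, n = 10, 5% column of the table
print(f"A^2 = {a2:.3f}, A*^2 = {a2_star:.3f}, reject at 5%: {a2_star > critical_5pct_n10}")
```

For this sample the adjusted statistic falls well below the 5% critical value, so normality would not be rejected at that level.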

Alternatively, for case 3 above, D'Agostino in Table 4.7 on p. 123 and on pages 372–373 gives the adjusted statistic:
$$A^{*2} = A^2 \left(1 + \frac{0.75}{n} + \frac{2.25}{n^2}\right),$$
and normality is rejected if $A^{*2}$ exceeds 0.631, 0.752, 0.873, 1.035, or 1.159 at 10%, 5%, 2.5%, 1%, and 0.5% significance levels, respectively; the procedure is valid for sample sizes of at least $n = 8$. The formulas for computing the p-values for other values of $A^{*2}$ are given in Table 4.9 on p. 127 in the same book.
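The decision rule can be written as a small lookup. The sketch below assumes the adjustment $A^{*2} = A^2(1 + 0.75/n + 2.25/n^2)$ and uses the five critical values quoted above; the function names and the input values $A^2 = 0.8$, $n = 20$ are hypothetical.

```python
# (significance level, critical value) pairs quoted in the text for Case 3.
CRITICAL_VALUES = [(0.10, 0.631), (0.05, 0.752), (0.025, 0.873), (0.01, 1.035), (0.005, 1.159)]


def dagostino_adjusted(a2, n):
    """Apply the small-sample adjustment to A^2; stated as valid for n >= 8."""
    if n < 8:
        raise ValueError("the procedure is stated as valid only for n >= 8")
    return a2 * (1.0 + 0.75 / n + 2.25 / n**2)


def rejected_levels(a2_star):
    """Return the significance levels at which normality is rejected."""
    return [alpha for alpha, critical in CRITICAL_VALUES if a2_star > critical]


a2_star = dagostino_adjusted(0.8, 20)  # hypothetical A^2 from a sample of size 20
print(f"A*^2 = {a2_star:.4f}, rejected at levels: {rejected_levels(a2_star)}")
```

With these inputs the adjusted statistic is about 0.8345, which exceeds the 10% and 5% critical values but not the 2.5% one.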

Tests for other distributions

The preceding section assumed that the variable was being tested for a normal distribution. Any other family of distributions can be tested, but the test for each family uses a different modification of the basic test statistic, which is then referred to critical values specific to that family of distributions. The modifications of the statistic and tables of critical values are given by Stephens for the exponential, extreme-value, Weibull, gamma, logistic, Cauchy, and von Mises distributions. Tests for the log-normal distribution can be implemented by transforming the data using a logarithm and using the above test for normality. Details of the required modifications to the test statistic and of the critical values for the normal distribution and the exponential distribution have been published by Pearson & Hartley. Details for these distributions, with the addition of the Gumbel distribution, are also given by Shorack & Wellner. Details for the logistic distribution are given by Stephens. A test for the Weibull distribution can be obtained by making use of the fact that the logarithm of a Weibull variate has a Gumbel distribution.

Non-parametric k-sample tests

Fritz Scholz and Michael A. Stephens discuss a test, based on the Anderson–Darling measure of agreement between distributions, for whether a number of random samples with possibly different sample sizes may have arisen from the same distribution, where this distribution is unspecified. The R package kSamples implements this rank test for comparing k samples among several other such rank tests.
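For illustration, a k-sample statistic of this kind can be computed from the pooled, ordered observations. The sketch below is an assumption-laden reading of the no-ties form, $A^2_{kN} = \frac{1}{N}\sum_{i=1}^{k}\frac{1}{n_i}\sum_{j=1}^{N-1}\frac{(N M_{ij} - j n_i)^2}{j(N-j)}$, where $N$ is the pooled sample size, $n_i$ is the size of sample $i$, and $M_{ij}$ counts the observations in sample $i$ that do not exceed the $j$-th smallest pooled value. The function name is invented, ties are rejected outright, and no standardization or p-value is computed; a package such as kSamples should be used for actual testing.

```python
def anderson_darling_k(samples):
    """Return the unstandardized k-sample statistic, assuming no ties in the pooled data."""
    pooled = sorted(v for s in samples for v in s)
    big_n = len(pooled)
    if len(set(pooled)) != big_n:
        raise ValueError("this sketch assumes the pooled data contain no ties")
    total = 0.0
    for s in samples:
        n_i = len(s)
        inner = 0.0
        for j in range(1, big_n):  # j = 1 .. N - 1
            z_j = pooled[j - 1]  # j-th smallest pooled observation
            m_ij = sum(1 for v in s if v <= z_j)
            inner += (big_n * m_ij - j * n_i) ** 2 / (j * (big_n - j))
        total += inner / n_i
    return total / big_n


# Two made-up configurations: interleaved samples versus well-separated samples.
interleaved = anderson_darling_k([[1, 3, 5, 7], [2, 4, 6, 8]])
separated = anderson_darling_k([[1, 2, 3, 4], [5, 6, 7, 8]])
print(f"interleaved: {interleaved:.3f}, separated: {separated:.3f}")
```

Samples that interleave evenly give a small value, while samples that occupy disjoint ranges give a much larger one, which is the behaviour expected of a measure of disagreement between empirical distributions.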
