Probability distribution fitting

Probability distribution fitting, or simply distribution fitting, is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon.
The aim of distribution fitting is to predict the probability or to forecast the frequency of occurrence of the magnitude of the phenomenon in a certain interval.
There are many probability distributions of which some can be fitted more closely to the observed frequency of the data than others, depending on the characteristics of the phenomenon and of the distribution. The distribution giving a close fit is supposed to lead to good predictions.
In distribution fitting, therefore, one needs to select a distribution that suits the data well.

Selection of distribution

The selection of the appropriate distribution depends on the presence or absence of symmetry of the data set with respect to the mean value.
Symmetrical distributions
When the data are symmetrically distributed around the mean while the frequency of occurrence of data farther away from the mean diminishes, one may for example select the normal distribution, the logistic distribution, or the Student's t-distribution. The first two are very similar, while the last has "heavier tails", meaning that values farther away from the mean occur relatively more often. With one degree of freedom, the Student's t-distribution reduces to the Cauchy distribution, which is also symmetric.
Skew distributions to the right
When the larger values tend to be farther away from the mean than the smaller values, one has a skew distribution to the right. In that case one may for example select the log-normal distribution, the log-logistic distribution, the Gumbel distribution, the exponential distribution, the Pareto distribution, the Weibull distribution, the Burr distribution, or the Fréchet distribution. All of these except the Gumbel distribution are bounded to the left.
Skew distributions to the left
When the smaller values tend to be farther away from the mean than the larger values, one has a skew distribution to the left. In that case one may for example select the square-normal distribution, the inverted Gumbel distribution, the Dagum distribution, or the Gompertz distribution, which is bounded to the left.

Techniques of fitting

The following techniques of distribution fitting exist:

Parametric methods, by which the parameters of the distribution are calculated from the data series. The parametric methods are:
* method of moments
* maximum spacing estimation
* method of L-moments
* maximum likelihood method
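As a minimal sketch of two of the parametric methods listed above, the following Python code estimates the parameters of a Gumbel distribution by the method of moments and the rate of an exponential distribution by maximum likelihood. The function names and the sample values are illustrative assumptions and do not come from any particular package.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_method_of_moments(data):
    """Estimate the Gumbel (maximum) location mu and scale beta by matching
    the sample mean and standard deviation to the theoretical moments:
    mean = mu + gamma * beta, std = beta * pi / sqrt(6)."""
    mean = statistics.fmean(data)
    std = statistics.stdev(data)
    beta = std * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta

def exponential_max_likelihood(data):
    """Maximum likelihood estimate of the exponential rate: 1 / sample mean."""
    return 1.0 / statistics.fmean(data)

# Hypothetical annual maxima, for illustration only.
sample = [12.1, 15.3, 9.8, 20.4, 14.2, 11.7, 17.9, 13.5]
mu, beta = gumbel_method_of_moments(sample)
rate = exponential_max_likelihood(sample)
```

Both estimators are closed-form here; for many other distributions the maximum likelihood estimate has to be found numerically.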

[Figure: distribution fitted by the regression method with added confidence band, using CumFreq]

Regression method, using a transformation of the cumulative distribution function so that a linear relation is found between the cumulative probability and the values of the data, which may also need to be transformed, depending on the selected probability distribution. In this method the cumulative probability needs to be estimated by the plotting position.
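The regression method can be sketched for the Gumbel distribution, whose cumulative distribution function becomes linear after a double logarithm. The plotting position i / (n + 1) and the function name are assumptions made for this illustration; other plotting positions are in use.

```python
import math

def gumbel_regression_fit(data):
    """Fit a Gumbel (maximum) distribution by linear regression.

    The CDF F = exp(-exp(-(x - mu) / beta)) is linearised as
    -ln(-ln F) = (x - mu) / beta, so regressing y = -ln(-ln F) on x
    gives slope 1 / beta and intercept -mu / beta.
    """
    xs = sorted(data)
    n = len(xs)
    # Cumulative probability estimated by the plotting position i / (n + 1).
    ys = [-math.log(-math.log(i / (n + 1))) for i in range(1, n + 1)]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    beta = 1.0 / slope
    mu = -intercept / slope
    return mu, beta
```

For other distributions only the transformation of the cumulative probability (and possibly of the data) changes; the least-squares step stays the same.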

Generalization of distributions

It is customary to transform data logarithmically in order to fit symmetrical distributions to data obeying a positively skewed distribution, as is done with the log-normal and log-logistic distributions. A similar effect can be achieved by taking the square root of the data.
To fit a symmetrical distribution to data obeying a negatively skewed distribution, one can use the squared values of the data.
More generally one can raise the data to a power p in order to fit symmetrical distributions to data obeying a distribution of any skewness, whereby p < 1 when the skewness is positive and p > 1 when the skewness is negative. The optimal value of p is to be found by a numerical method. The numerical method may consist of assuming a range of p values, then applying the distribution fitting procedure repeatedly for all the assumed p values, and finally selecting the value of p for which the sum of squares of deviations of calculated probabilities from measured frequencies is minimum, as is done in CumFreq.
The generalization enhances the flexibility of probability distributions and increases their applicability in distribution fitting.
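The search for the exponent p described above might look as follows. This is a sketch under stated assumptions: a normal distribution is fitted to the transformed data by its sample mean and standard deviation, the measured frequencies are taken as the plotting positions i / (n + 1), and the candidate p values and sample are hypothetical. It is not a reproduction of the CumFreq implementation.

```python
from statistics import NormalDist, fmean, stdev

def fit_power_normal(data, powers):
    """For each candidate exponent p, fit a normal distribution to x**p and
    score it by the sum of squared deviations between the fitted cumulative
    probabilities and the plotting-position frequencies. Returns the best
    (p, score) pair. The data must be positive."""
    xs = sorted(data)
    n = len(xs)
    freqs = [i / (n + 1) for i in range(1, n + 1)]
    best = None
    for p in powers:
        ys = [x ** p for x in xs]
        dist = NormalDist(fmean(ys), stdev(ys))
        sse = sum((dist.cdf(y) - f) ** 2 for y, f in zip(ys, freqs))
        if best is None or sse < best[1]:
            best = (p, sse)
    return best

# Hypothetical positively skewed sample, for illustration only.
sample = [1.2, 1.5, 1.9, 2.1, 2.6, 3.0, 3.8, 4.9, 7.5, 12.0]
powers = [k / 10 for k in range(1, 21)]  # p = 0.1, 0.2, ..., 2.0
best_p, best_sse = fit_power_normal(sample, powers)
```

A finer grid or a one-dimensional optimiser could replace the coarse list of p values once the approximate optimum is known.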

Inversion of skewness

Skewed distributions can be inverted by replacing the cumulative distribution function F in the mathematical expression with its complement, F' = 1 - F, which yields the complementary distribution function and gives a mirror image. In this manner, a distribution that is skewed to the right is transformed into a distribution that is skewed to the left and vice versa.
The technique of skewness inversion increases the number of probability distributions available for distribution fitting and enlarges the distribution fitting opportunities.
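One way to read the mirror image is to take the complement 1 - F at the point reflected about the location parameter. The sketch below does this for the Gumbel distribution and checks numerically that the sign of the skewness flips; the reflection point, the function names, and the integration range are assumptions chosen for illustration.

```python
import math

def gumbel_cdf(x, mu, beta):
    """Gumbel (maximum) CDF, skewed to the right."""
    return math.exp(-math.exp(-(x - mu) / beta))

def mirrored_gumbel_cdf(x, mu, beta):
    """Mirror image about mu: the complement 1 - F evaluated at the
    reflected point 2 * mu - x, which is skewed to the left."""
    return 1.0 - gumbel_cdf(2 * mu - x, mu, beta)

def skewness_from_cdf(cdf, lo, hi, steps=12000):
    """Approximate the skewness from bin probabilities cdf(b) - cdf(a)."""
    width = (hi - lo) / steps
    mids, masses = [], []
    for i in range(steps):
        a = lo + i * width
        mids.append(a + width / 2)
        masses.append(cdf(a + width) - cdf(a))
    total = sum(masses)
    mean = sum(m * x for m, x in zip(masses, mids)) / total
    var = sum(m * (x - mean) ** 2 for m, x in zip(masses, mids)) / total
    third = sum(m * (x - mean) ** 3 for m, x in zip(masses, mids)) / total
    return third / var ** 1.5

right = skewness_from_cdf(lambda x: gumbel_cdf(x, 0.0, 1.0), -30.0, 30.0)
left = skewness_from_cdf(lambda x: mirrored_gumbel_cdf(x, 0.0, 1.0), -30.0, 30.0)
```

Here `right` comes out positive and `left` negative with the same magnitude, which is the mirror-image behaviour described above.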

Shifting of distributions

Some probability distributions, like the exponential, do not support negative data values (X). Yet, when negative data are present, such distributions can still be used by replacing X with Y = X - Xm, where Xm is the minimum value of X. This replacement represents a shift of the probability distribution in the positive direction, i.e. to the right, because Xm is negative. After completing the distribution fitting of Y, the corresponding X-values are found from X = Y + Xm, which represents a back-shift of the distribution in the negative direction, i.e. to the left.
The technique of distribution shifting augments the chance to find a properly fitting probability distribution.
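The shift and back-shift can be sketched for the exponential distribution as follows. The maximum likelihood fit of the shifted data and the sample values are assumptions made for illustration.

```python
import math
import statistics

def fit_shifted_exponential(data):
    """Shift the data by its minimum Xm so that Y = X - Xm >= 0, then fit
    an exponential distribution to Y by maximum likelihood."""
    x_min = min(data)
    shifted = [x - x_min for x in data]
    rate = 1.0 / statistics.fmean(shifted)
    return x_min, rate

def shifted_exponential_quantile(prob, x_min, rate):
    """Quantile of Y for cumulative probability prob, shifted back to X."""
    y = -math.log(1.0 - prob) / rate
    return y + x_min  # back-shift: X = Y + Xm

# Hypothetical sample containing negative values, for illustration only.
sample = [-3.2, -1.0, -2.5, 0.4, 2.1, -2.9, 5.6, -0.3]
x_min, rate = fit_shifted_exponential(sample)
median_x = shifted_exponential_quantile(0.5, x_min, rate)
```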

Composite distributions

The option exists to use two different probability distributions, one for the lower data range and one for the higher, as for example in the Laplace distribution. The ranges are separated by a break-point. Such composite probability distributions can be useful when the data of the phenomenon studied were obtained under two different sets of conditions.
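A minimal sketch of such a composite cumulative distribution function joins two exponential branches at the break-point. The assumption that each branch carries half of the probability, and the parameter names, are illustrative; with equal scales the function reduces to the Laplace distribution.

```python
import math

def composite_cdf(x, break_point, scale_low, scale_high):
    """Two exponential branches joined at break_point, each carrying half
    of the probability. With scale_low == scale_high this is the Laplace CDF."""
    if x < break_point:
        # Lower data range.
        return 0.5 * math.exp((x - break_point) / scale_low)
    # Higher data range.
    return 1.0 - 0.5 * math.exp(-(x - break_point) / scale_high)
```

Both branches give 0.5 at the break-point, so the composite function is continuous there even when the two scales differ.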

Uncertainty of prediction

Predictions of occurrence based on fitted probability distributions are subject to uncertainty, which arises from the following conditions:

* The true probability distribution of events may deviate from the fitted distribution, as the observed data series may not be totally representative of the real probability of occurrence of the phenomenon due to random error.
* The occurrence of events in another situation or in the future may deviate from the fitted distribution, as this occurrence can also be subject to random error.
* A change of environmental conditions may cause a change in the probability of occurrence of the phenomenon.

[Figure: curves of 50-year samples from a theoretical 1000-year record; data from Benson]
An estimate of the uncertainty in the first and second case can be obtained with the binomial probability distribution using for example the probability of exceedance Pe and the probability of non-exceedance Pn. In this case there are only two possibilities: either there is exceedance or there is non-exceedance. This duality is the reason that the binomial distribution is applicable.
With the binomial distribution one can obtain a confidence interval of the prediction. Such an interval also estimates the risk of failure, i.e. the chance that the predicted event still remains outside the confidence interval. The confidence or risk analysis may include the return period T = 1/Pe, as is done in hydrology.
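As an illustration of how the binomial distribution yields such an interval, the sketch below computes a Clopper-Pearson interval for the probability of exceedance Pe from k exceedances in n observations, and the chance of at least one exceedance within a given number of years. The choice of the Clopper-Pearson construction and all function names are assumptions; the text above does not prescribe a specific interval.

```python
import math

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))

def exceedance_confidence_interval(k, n, confidence=0.90):
    """Clopper-Pearson interval for Pe, given k exceedances in n
    observations, found by bisection on the binomial CDF."""
    alpha = 1.0 - confidence

    def solve(func, target):
        # func is decreasing in p on [0, 1]; find p with func(p) = target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if func(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else solve(
        lambda p: binomial_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else solve(
        lambda p: binomial_cdf(k, n, p), alpha / 2)
    return lower, upper

def risk_of_exceedance(return_period, years):
    """Chance of at least one exceedance in `years` when Pe = 1 / T."""
    return 1.0 - (1.0 - 1.0 / return_period) ** years

# Hypothetical record: 2 exceedances observed in 50 years.
pe_low, pe_high = exceedance_confidence_interval(2, 50)
```

The limits on Pe translate into limits on the return period through T = 1/Pe, so the interval for T runs from 1/pe_high to 1/pe_low.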

Goodness of fit

By ranking the goodness of fit of various distributions one can get an impression of which distribution is acceptable and which is not.
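A ranking of this kind can be sketched by scoring each fitted distribution with the sum of squared deviations between its cumulative probabilities and the plotting-position frequencies. The score, the two candidate distributions, and the way they are fitted are assumptions chosen for illustration; other goodness-of-fit measures could be substituted.

```python
import math
from statistics import NormalDist, fmean, stdev

def sum_squared_deviation(data, cdf):
    """Sum of squared differences between fitted cumulative probabilities
    and plotting-position frequencies i / (n + 1)."""
    xs = sorted(data)
    n = len(xs)
    return sum((cdf(x) - i / (n + 1)) ** 2
               for i, x in enumerate(xs, start=1))

def rank_fits(data):
    """Fit a normal and an exponential distribution and rank them,
    best (smallest deviation) first."""
    normal = NormalDist(fmean(data), stdev(data))
    rate = 1.0 / fmean(data)
    candidates = {
        "normal": normal.cdf,
        "exponential": lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0,
    }
    scores = {name: sum_squared_deviation(data, cdf)
              for name, cdf in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```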

Histogram and density function

From the cumulative distribution function one can derive a histogram and the probability density function.
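The derivation can be sketched as follows: the expected histogram is obtained from differences of the cumulative distribution function over the bin edges, and the density from its numerical derivative. The standard normal distribution stands in for any fitted distribution, and the bin edges and step size are arbitrary choices for illustration.

```python
from statistics import NormalDist

def histogram_from_cdf(cdf, edges):
    """Expected probability in each bin: F(upper edge) - F(lower edge)."""
    return [cdf(b) - cdf(a) for a, b in zip(edges, edges[1:])]

def density_from_cdf(cdf, x, h=1e-5):
    """Probability density as the central-difference derivative of F."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)

dist = NormalDist(0.0, 1.0)  # stands in for any fitted distribution
edges = [-3, -2, -1, 0, 1, 2, 3]
bin_probs = histogram_from_cdf(dist.cdf, edges)
```

Multiplying each bin probability by the number of observations gives the expected count, which can be compared with the observed histogram.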
