In statistics, the generalized Pareto distribution (GPD) is a family of continuous probability distributions. It is often used to model the tails of another distribution. It is specified by three parameters: location $\mu$, scale $\sigma$, and shape $\xi$. Sometimes it is specified by only scale and shape, and sometimes only by its shape parameter. Some references give the shape parameter as $\kappa = -\xi$.
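The related location-scale family of distributions is obtained by replacing the argument $z$ of the standard distribution by $(x - \mu)/\sigma$ and adjusting the support accordingly. The cumulative distribution function (cdf) of $X \sim \mathrm{GPD}(\mu, \sigma, \xi)$, with $\mu \in \mathbb{R}$, $\sigma > 0$ and $\xi \in \mathbb{R}$, is
$$F_{(\mu,\sigma,\xi)}(x) = \begin{cases} 1 - \left(1 + \dfrac{\xi (x - \mu)}{\sigma}\right)^{-1/\xi} & \text{for } \xi \neq 0, \\ 1 - \exp\left(-\dfrac{x - \mu}{\sigma}\right) & \text{for } \xi = 0, \end{cases}$$
where the support of $X$ is $x \geq \mu$ when $\xi \geq 0$, and $\mu \leq x \leq \mu - \sigma/\xi$ when $\xi < 0$. The probability density function (pdf) of $X \sim \mathrm{GPD}(\mu, \sigma, \xi)$ is
$$f_{(\mu,\sigma,\xi)}(x) = \frac{1}{\sigma}\left(1 + \frac{\xi (x - \mu)}{\sigma}\right)^{-1/\xi - 1},$$
again, for $x \geq \mu$ when $\xi \geq 0$, and $\mu \leq x \leq \mu - \sigma/\xi$ when $\xi < 0$; for $\xi = 0$ the expression is read as its limit $\frac{1}{\sigma}\exp\left(-\frac{x - \mu}{\sigma}\right)$. The pdf is a solution of the following differential equation:
$$f'(x)\,\bigl(\sigma + \xi (x - \mu)\bigr) + (\xi + 1)\, f(x) = 0, \qquad f(\mu) = \frac{1}{\sigma}.$$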
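If $U$ is uniformly distributed on $(0, 1]$, then
$$X = \mu + \frac{\sigma\,(U^{-\xi} - 1)}{\xi} \sim \mathrm{GPD}(\mu, \sigma, \xi \neq 0)$$
and
$$X = \mu - \sigma \ln(U) \sim \mathrm{GPD}(\mu, \sigma, \xi = 0).$$
Both formulas are obtained by inversion of the cdf. In the MATLAB Statistics Toolbox, the "gprnd" command generates generalized Pareto random numbers.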
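The inversion method can be sketched in Python using only the standard library. The function names `gpd_cdf`, `gpd_from_uniform` and `gpd_sample` are illustrative choices, not part of any library, and the sketch assumes the usual parameterization with location `mu`, scale `sigma > 0` and shape `xi`.

```python
import math
import random


def gpd_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """CDF of GPD(mu, sigma, xi); 0 below the support, 1 above it."""
    z = (x - mu) / sigma
    if z <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-z)
    t = 1.0 + xi * z
    if t <= 0.0:
        # Only reachable for xi < 0: x lies past the upper endpoint mu - sigma/xi.
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


def gpd_from_uniform(u, mu=0.0, sigma=1.0, xi=0.0):
    """Map u in (0, 1] to a GPD(mu, sigma, xi) value by inverting the CDF."""
    if xi == 0.0:
        return mu - sigma * math.log(u)
    return mu + sigma * (u ** (-xi) - 1.0) / xi


def gpd_sample(n, mu=0.0, sigma=1.0, xi=0.0, rng=random):
    """Draw n values; rng is the random module or a random.Random instance."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1]
    # and both log(u) and u ** (-xi) are always defined.
    return [gpd_from_uniform(1.0 - rng.random(), mu, sigma, xi) for _ in range(n)]


rng = random.Random(0)
sample = gpd_sample(5, mu=1.0, sigma=2.0, xi=0.25, rng=rng)
```

Because the transform uses `U` where the cdf has `1 - U`, applying `gpd_cdf` to `gpd_from_uniform(u, ...)` returns `1 - u`, which gives a quick consistency check on both functions.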
GPD as an Exponential-Gamma Mixture
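A GPD random variable can also be expressed as an exponential random variable with a Gamma-distributed rate parameter. If $X \mid \Lambda \sim \mathrm{Exp}(\Lambda)$ and $\Lambda \sim \mathrm{Gamma}(\alpha, \beta)$, with shape $\alpha$ and rate $\beta$, then $X \sim \mathrm{GPD}(\xi = 1/\alpha,\ \sigma = \beta/\alpha)$ with location $\mu = 0$. Notice, however, that since the parameters of the Gamma distribution must be greater than zero, we obtain the additional restriction that $\xi$ must be positive.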
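The mixture representation can be checked by simulation. The sketch below assumes the Gamma distribution is parameterized by shape `alpha` and rate `beta`, so that the resulting GPD has `xi = 1/alpha`, `sigma = beta/alpha` and location 0; `mixture_draw` and `gpd_survival` are hypothetical helper names. Note that Python's `random.gammavariate` takes a scale rather than a rate, hence the `1.0 / beta`.

```python
import random


def mixture_draw(alpha, beta, rng):
    """Draw X | Lambda ~ Exp(rate=Lambda), Lambda ~ Gamma(shape=alpha, rate=beta)."""
    # random.gammavariate takes (shape, scale); a rate of beta is a scale of 1/beta.
    lam = rng.gammavariate(alpha, 1.0 / beta)
    return rng.expovariate(lam)


def gpd_survival(x, sigma, xi):
    """P(X > x) for GPD(mu=0, sigma, xi) with xi > 0 and x >= 0."""
    return (1.0 + xi * x / sigma) ** (-1.0 / xi)


rng = random.Random(12345)
alpha, beta = 4.0, 2.0
xi, sigma = 1.0 / alpha, beta / alpha
draws = [mixture_draw(alpha, beta, rng) for _ in range(100_000)]

# Compare empirical and theoretical tail probabilities at a few points.
for x in (0.25, 1.0, 3.0):
    empirical = sum(d > x for d in draws) / len(draws)
    print(x, round(empirical, 3), round(gpd_survival(x, sigma, xi), 3))
```

The agreement follows from integrating out the rate: $E[e^{-\Lambda x}] = (1 + x/\beta)^{-\alpha}$, which is the GPD survival function with the stated $\xi$ and $\sigma$.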
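If $X \sim \mathrm{GPD}(\mu = 0, \sigma, \xi)$, then $Y = \log(X)$ is distributed according to the exponentiated generalized Pareto distribution (exGPD), denoted by $Y \sim \mathrm{exGPD}(\sigma, \xi)$. The probability density function of $Y \sim \mathrm{exGPD}(\sigma, \xi)$, $\sigma > 0$, is
$$g_{(\sigma,\xi)}(y) = \begin{cases} \dfrac{e^{y}}{\sigma}\left(1 + \dfrac{\xi e^{y}}{\sigma}\right)^{-1/\xi - 1} & \text{for } \xi \neq 0, \\ \dfrac{1}{\sigma}\, e^{\,y - e^{y}/\sigma} & \text{for } \xi = 0, \end{cases}$$
where the support is $-\infty < y < \infty$ for $\xi \geq 0$, and $-\infty < y \leq \log(-\sigma/\xi)$ for $\xi < 0$. For all $\xi$, $\log \sigma$ becomes the location parameter. The accompanying plot shows the pdf when the shape $\xi$ is positive. The exGPD has finite moments of all orders for all $\sigma > 0$ and $-\infty < \xi < \infty$.
[Figure: the variance of $\mathrm{exGPD}(\sigma, \xi)$ as a function of $\xi$. The red dotted line corresponds to the value of the variance evaluated at $\xi = 0$.]
The moment-generating function of $Y \sim \mathrm{exGPD}(\sigma, \xi)$ is
$$M_Y(s) = E\left[e^{sY}\right] = \begin{cases} -\dfrac{1}{\xi}\left(-\dfrac{\sigma}{\xi}\right)^{s} B\left(s + 1, -\dfrac{1}{\xi}\right) & \text{for } s \in (-1, \infty),\ \xi < 0, \\ \dfrac{1}{\xi}\left(\dfrac{\sigma}{\xi}\right)^{s} B\left(s + 1, \dfrac{1}{\xi} - s\right) & \text{for } s \in (-1, 1/\xi),\ \xi > 0, \\ \sigma^{s}\, \Gamma(1 + s) & \text{for } s \in (-1, \infty),\ \xi = 0, \end{cases}$$
where $B(a, b)$ and $\Gamma(a)$ denote the beta function and gamma function, respectively. The variance of $Y \sim \mathrm{exGPD}(\sigma, \xi)$ depends on the shape parameter $\xi$ only, through the polygamma function of order 1 (the trigamma function) $\psi'$:
$$\operatorname{Var}[Y] = \begin{cases} \psi'(1) - \psi'(-1/\xi + 1) & \text{for } \xi < 0, \\ \psi'(1) + \psi'(1/\xi) & \text{for } \xi > 0, \\ \psi'(1) & \text{for } \xi = 0. \end{cases}$$
The figure shows the variance as a function of $\xi$. Note that $\psi'(1) = \pi^2/6 \approx 1.644934$.
Note that the roles of the scale parameter $\sigma$ and the shape parameter $\xi$ under $Y \sim \mathrm{exGPD}(\sigma, \xi)$ are separately interpretable, which may lead to more robust and efficient estimation of $\xi$ than working with $X \sim \mathrm{GPD}(\sigma, \xi)$. Under $X \sim \mathrm{GPD}(\mu = 0, \sigma, \xi)$ the roles of the two parameters are tied to each other; see the formula for the variance $\operatorname{Var}(X)$, in which both parameters appear.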
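The dependence of the exGPD variance on the shape alone can be evaluated numerically. The sketch below assumes the variance has the form $\psi'(1) + \psi'(1/\xi)$ for $\xi > 0$, $\psi'(1) - \psi'(1 - 1/\xi)$ for $\xi < 0$, and $\psi'(1)$ for $\xi = 0$, where $\psi'$ is the trigamma function. Since the Python standard library has no trigamma, `trigamma` here is a hand-rolled approximation (recurrence plus a truncated asymptotic series), not a library routine, and `exgpd_variance` is a hypothetical name.

```python
import math


def trigamma(x):
    """Approximate psi'(x) for x > 0 via recurrence and an asymptotic series."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    total = 0.0
    # psi'(x) = psi'(x + 1) + 1/x**2: shift x upward until the series is accurate.
    while x < 6.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0))
    )
    return total + series


def exgpd_variance(xi):
    """Variance of Y = log(X) for X ~ GPD(0, sigma, xi); sigma does not enter."""
    if xi < 0.0:
        return trigamma(1.0) - trigamma(1.0 - 1.0 / xi)
    if xi > 0.0:
        return trigamma(1.0) + trigamma(1.0 / xi)
    return trigamma(1.0)


for xi in (-1.0, 0.0, 0.5, 1.0):
    print(xi, exgpd_variance(xi))
```

A convenient sanity check is $\xi = -1$: there $X$ is uniform on $(0, \sigma)$, and the variance of the logarithm of a uniform variable is exactly 1.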
The Hill's estimator
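Assume that $X_{1:n} = (X_1, \ldots, X_n)$ are $n$ observations from an unknown heavy-tailed distribution $F$ such that its tail distribution is regularly varying with tail index $1/\xi$ (hence the corresponding shape parameter is $\xi$). To be specific, the tail distribution is described as
$$\bar{F}(x) = 1 - F(x) = L(x) \cdot x^{-1/\xi}, \quad \text{for some } \xi > 0, \text{ where } L \text{ is a slowly varying function.}$$
It is of particular interest in extreme value theory to estimate the shape parameter $\xi$, especially when $\xi$ is positive (the heavy-tailed case). Let $F_u$ be the conditional excess distribution function over a threshold $u$. The Pickands–Balkema–de Haan theorem states that for a large class of underlying distribution functions $F$, and large $u$, $F_u$ is well approximated by the generalized Pareto distribution (GPD). This motivated Peak Over Threshold (POT) methods to estimate $\xi$: the GPD plays the key role in the POT approach.
A renowned estimator using the POT methodology is Hill's estimator. Its technical formulation is as follows. For $1 \leq i \leq n$, write $X_{(i)}$ for the $i$-th largest value of $X_1, \ldots, X_n$. Then, with this notation, Hill's estimator based on the $k$ upper order statistics is defined as
$$\widehat{\xi}_k^{\,\mathrm{Hill}} = \widehat{\xi}_k^{\,\mathrm{Hill}}(X_{1:n}) = \frac{1}{k - 1} \sum_{j=1}^{k-1} \log\left(\frac{X_{(j)}}{X_{(k)}}\right), \quad \text{for } 2 \leq k \leq n.$$
In practice, the Hill estimator is used as follows. First, calculate the estimator $\widehat{\xi}_k^{\,\mathrm{Hill}}$ at each integer $k \in \{2, \ldots, n\}$, and then plot the ordered pairs $\{(k, \widehat{\xi}_k^{\,\mathrm{Hill}})\}_{k=2}^{n}$. Then, select from the set of Hill estimators $\{\widehat{\xi}_k^{\,\mathrm{Hill}}\}_{k=2}^{n}$ those that are roughly constant with respect to $k$: these stable values are regarded as reasonable estimates for the shape parameter $\xi$. If $X_1, \ldots, X_n$ are i.i.d., then Hill's estimator is a consistent estimator for the shape parameter $\xi$. Note that the Hill estimator $\widehat{\xi}_k^{\,\mathrm{Hill}}$ makes use of the log-transformation of the observations $X_{1:n}$.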
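The procedure of computing the estimator at every $k$ can be sketched as follows. This is a minimal illustration that assumes the convention in which the estimate at $k$ averages $\log(X_{(j)}/X_{(k)})$ over $j = 1, \ldots, k-1$ (other references index the estimator slightly differently); `hill_estimates` is a hypothetical function name, and the test data are simulated strict Pareto values with $\xi = 0.5$.

```python
import math
import random


def hill_estimates(data):
    """Return [(k, xi_hat_k)] for k = 2..n, using the k largest observations."""
    xs = sorted(data, reverse=True)  # xs[0] is X_(1), the largest value
    if len(xs) < 2 or xs[-1] <= 0.0:
        raise ValueError("need at least two strictly positive observations")
    logs = [math.log(x) for x in xs]
    out = []
    running = 0.0  # running sum of log X_(j) for j = 1..k-1
    for k in range(2, len(xs) + 1):
        running += logs[k - 2]
        # mean of log X_(j), j < k, minus log X_(k)
        out.append((k, running / (k - 1) - logs[k - 1]))
    return out


# Simulated strict Pareto data: X = U ** (-xi) has tail x ** (-1/xi) for x >= 1.
rng = random.Random(2024)
true_xi = 0.5
data = [(1.0 - rng.random()) ** (-true_xi) for _ in range(5000)]
pairs = hill_estimates(data)

# Inspect a few points of the (k, estimate) sequence that would be plotted.
for k, est in pairs[99::1000]:
    print(k, round(est, 3))
```

For real data the sequence is usually plotted against $k$ and read off by eye in a region where it is roughly flat; the simulated data here are exactly Pareto, so the estimates should settle near the true shape as $k$ grows.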