Probability Distributions


Contents

Probability distribution

Continuous Distributions
Beta distribution
Burr distribution
Cauchy distribution
Chi-square distribution
Dirichlet distribution
F-distribution
Gamma distribution
Exponential distribution
Erlang distribution
Kumaraswamy distribution
Inverse Gaussian distribution
Laplace distribution
Lévy distribution
Log-logistic distribution
Log-normal distribution
Logistic distribution
Normal distribution
Pareto distribution
Student's t-distribution
Uniform distribution (continuous)
Weibull distribution

Discrete distributions
Bernoulli distribution
Beta-binomial distribution
Binomial distribution
Uniform distribution (discrete)
Geometric distribution
Hypergeometric distribution
Negative binomial distribution
Yule–Simon distribution
Zipf's law

Multivariate distributions
Multinomial distribution
Multivariate normal distribution
Wishart distribution

References
Article Sources and Contributors
Image Sources, Licenses and Contributors

Article Licenses
License

Probability distribution


In probability theory and statistics, a probability distribution identifies either the probability of each value of a random variable (when the variable is discrete), or the probability of the value falling within a particular interval (when the variable is continuous).[1] The probability distribution describes the range of possible values that a random variable can attain and the probability that the value of the random variable is within any (measurable) subset of that range. When the random variable takes values in the set of real numbers, the probability distribution is completely described by the cumulative distribution function, whose value at each real x is the probability that the random variable is smaller than or equal to x.

[Figure: The normal distribution, often called the "bell curve".]

The concept of the probability distribution and the random variables which they describe underlies the mathematical discipline of probability theory and the science of statistics. There is spread or variability in almost any value that can be measured in a population (e.g. height of people, durability of a metal, sales growth, traffic flow, etc.); almost all measurements are made with some intrinsic error; in physics many processes are described probabilistically, from the kinetic properties of gases to the quantum mechanical description of fundamental particles. For these and many other reasons, simple numbers are often inadequate for describing a quantity, while probability distributions are often more appropriate.

Various probability distributions show up in different applications. Two of the most important ones are the normal distribution and the categorical distribution. The normal distribution, also known as the Gaussian distribution, has a familiar "bell curve" shape and approximates many naturally occurring distributions over real numbers. The categorical distribution describes the result of an experiment with a fixed, finite number of outcomes. For example, the toss of a fair coin follows a categorical distribution, where the possible outcomes are heads and tails, each with probability 1/2.

Formal definition

In the measure-theoretic formalization of probability theory, a random variable is defined as a measurable function X from a probability space (Ω, F, P) to a measurable space (X, A). A probability distribution is the pushforward measure X*P = P ∘ X⁻¹ on (X, A).

Probability distributions of real-valued random variables

Because a probability distribution Pr on the real line is determined by the probability of a real-valued random variable X being in a half-open interval (−∞, x], the probability distribution is completely characterized by its cumulative distribution function:

    F(x) = Pr[X ≤ x]   for all x in R.

Discrete probability distribution

A probability distribution is called discrete if its cumulative distribution function only increases in jumps. More precisely, a probability distribution is discrete if there is a finite or countable set whose probability is 1. For many familiar discrete distributions, the set of possible values is topologically discrete in the sense that all its points are isolated points. But there are discrete distributions for which this countable set is dense on the real line. Discrete distributions are characterized by a probability mass function, p, such that

    Pr[X = x_i] = p(x_i),   with  Σ_i p(x_i) = 1.


Continuous probability distribution

By one convention, a probability distribution is called continuous if its cumulative distribution function is continuous and, therefore, the probability measure of singletons is zero: Pr[X = x] = 0 for all x.

Another convention reserves the term continuous probability distribution for absolutely continuous distributions. These distributions can be characterized by a probability density function: a non-negative Lebesgue integrable function f defined on the real numbers such that

    Pr[a ≤ X ≤ b] = ∫_a^b f(x) dx   for all a ≤ b.

Discrete distributions and some continuous distributions (like the Cantor distribution) do not admit such a density.

Terminology

The support of a distribution is the smallest closed interval/set whose complement has probability zero. It may be understood as the points or elements that are actual members of the distribution. A discrete random variable is a random variable whose probability distribution is discrete. Similarly, a continuous random variable is a random variable whose probability distribution is continuous.

Simulated sampling

The following algorithm lets one sample from a probability distribution (either discrete or continuous). It assumes access to the inverse of the cumulative distribution function (easy to calculate with a discrete distribution, and approximable for continuous distributions) and to a computational primitive called "random()" which returns an arbitrary-precision floating-point value in the range [0, 1).

    define function sampleFrom(cdfInverse (type="function")):
        // input:
        //   cdfInverse(x) - the inverse of the CDF of the probability distribution
        //     example: if the distribution is Gaussian, one can use a Taylor approximation of the inverse of erf(x)
        //     example: if the distribution is discrete, see the explanation below
        // output:
        //   type="real number" - a value sampled from the probability distribution represented by cdfInverse

        r = random()
        while (r == 0):      // make sure r is not equal to 0; a discontinuity is possible there
            r = random()

        return cdfInverse(r)

For discrete distributions, the function cdfInverse (the inverse of the cumulative distribution function) can be calculated from samples as follows: for each element in the sample range (the discrete values along the x-axis), accumulate the total of the samples before it, then normalize this new discrete distribution. The result is the CDF, which can be turned into an object that acts like a function: calling cdfInverse(query) returns the smallest x-value such that the CDF is greater than or equal to the query.

    define function dataToCdfInverse(discreteDistribution (type="dictionary")):
        // input:
        //   discreteDistribution - a mapping from possible values to frequencies/probabilities
        //     example: {0 -> 1-p, 1 -> p} would be a Bernoulli distribution with chance=p
        //     example: setting p=0.5 in the above example gives a fair coin where P(X=1)->"heads" and P(X=0)->"tails"
        // output:
        //   type="function" - a function that represents (CDF^-1)(x)

        define function cdfInverse(x):
            integral = 0
            go through the mapping (key -> value) in sorted order, adding value to integral
            stop when integral > x (or integral >= x; it doesn't matter)
            return the last key we added

        return cdfInverse

Note that mathematics environments and computer algebra systems often have built-in ways to represent probability distributions and sample from them, and such functionality may also be provided by third-party libraries. These packages greatly facilitate sampling, are likely to include optimizations for common distributions, and are usually more elegant than the bare-bones solution above.
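As a concrete illustration of the pseudocode above, here is a minimal Python sketch of inverse transform sampling (not part of the original article; it assumes NumPy is available, and the function name sample_from is illustrative). The example uses the exponential distribution, whose inverse CDF has the closed form −ln(1 − u)/λ.

    import numpy as np

    def sample_from(cdf_inverse, n, rng=None):
        """Inverse transform sampling: draw n values given the inverse CDF."""
        if rng is None:
            rng = np.random.default_rng()
        u = rng.random(n)                        # uniform variates on [0, 1)
        u[u == 0.0] = np.nextafter(0.0, 1.0)     # avoid the possible discontinuity at 0
        return cdf_inverse(u)

    # Example: exponential distribution with rate lam; its inverse CDF is -ln(1 - u)/lam.
    lam = 2.0
    samples = sample_from(lambda u: -np.log(1.0 - u) / lam, 10_000)
    print(samples.mean())                        # should be close to 1/lam = 0.5

For a discrete distribution, the cdf_inverse argument can be the step function built as described by dataToCdfInverse above.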

Some properties

- The probability density function of the sum of two independent random variables is the convolution of each of their density functions.
- The probability density function of the difference of two independent random variables is the cross-correlation of their density functions.
- Probability distributions are not a vector space: they are not closed under linear combinations, as these do not preserve non-negativity or total integral 1. However, they are closed under convex combination, thus forming a convex subset of the space of functions (or measures).

Common probability distributions

The following is a list of some of the most common probability distributions, grouped by the type of process that they are related to. For a more complete list, see list of probability distributions, which groups by the nature of the outcome being considered (discrete, continuous, multivariate, etc.). Note also that all of the univariate distributions below are singly peaked; that is, it is assumed that the values cluster around a single point. In practice, actually observed quantities may cluster around multiple values. Such quantities can be modeled using a mixture distribution.


Related to real-valued quantities that grow linearly (e.g. errors, offsets)

- Normal distribution (aka Gaussian distribution), for a single such quantity; the most common continuous distribution
- Multivariate normal distribution (aka multivariate Gaussian distribution), for vectors of correlated outcomes that are individually Gaussian-distributed

Related to positive real-valued quantities that grow exponentially (e.g. prices, incomes, populations)

- Log-normal distribution, for a single such quantity whose log is normally distributed
- Pareto distribution, for a single such quantity whose log is exponentially distributed; the prototypical power law distribution

Related to real-valued quantities that are assumed to be uniformly distributed over a (possibly unknown) region

- Discrete uniform distribution, for a finite set of values (e.g. the outcome of a fair die)
- Continuous uniform distribution, for continuously distributed values

Related to Bernoulli trials (yes/no events, with a given probability)

Basic distributions:
- Bernoulli distribution, for the outcome of a single Bernoulli trial (e.g. success/failure, yes/no)
- Binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed total number of independent occurrences
- Negative binomial distribution, for binomial-type observations but where the quantity of interest is the number of failures before a given number of successes occurs
- Geometric distribution, for binomial-type observations but where the quantity of interest is the number of failures before the first success; a special case of the negative binomial distribution

Related to sampling schemes over a finite population:
- Binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed number of total occurrences, using sampling with replacement
- Hypergeometric distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed number of total occurrences, using sampling without replacement
- Beta-binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed number of total occurrences, sampling using a Polya urn scheme (in some sense, the "opposite" of sampling without replacement)


Related to categorical outcomes (events with K possible outcomes, with a given probability for each outcome)

- Categorical distribution, for a single categorical outcome (e.g. yes/no/maybe in a survey); a generalization of the Bernoulli distribution
- Multinomial distribution, for the number of each type of categorical outcome, given a fixed number of total outcomes; a generalization of the binomial distribution
- Multivariate hypergeometric distribution, similar to the multinomial distribution, but using sampling without replacement; a generalization of the hypergeometric distribution

Related to events in a Poisson process (events that occur independently with a given rate)

- Poisson distribution, for the number of occurrences of a Poisson-type event in a given period of time
- Exponential distribution, for the time before the next Poisson-type event occurs

Useful for hypothesis testing related to normally distributed outcomes

- Chi-square distribution, the distribution of a sum of squared standard normal variables; useful e.g. for inference regarding the sample variance of normally distributed samples (see chi-square test)
- Student's t distribution, the distribution of the ratio of a standard normal variable and the square root of a scaled chi-squared variable; useful for inference regarding the mean of normally distributed samples with unknown variance (see Student's t-test)
- F-distribution, the distribution of the ratio of two scaled chi-squared variables; useful e.g. for inferences that involve comparing variances or involving R-squared (the squared correlation coefficient)

Useful as conjugate prior distributions in Bayesian inference

- Beta distribution, for a single probability (real number between 0 and 1); conjugate to the Bernoulli distribution and binomial distribution
- Gamma distribution, for a non-negative scaling parameter; conjugate to the rate parameter of a Poisson distribution or exponential distribution, the precision (inverse variance) of a normal distribution, etc.
- Dirichlet distribution, for a vector of probabilities that must sum to 1; conjugate to the categorical distribution and multinomial distribution; generalization of the beta distribution
- Wishart distribution, for a symmetric non-negative definite matrix; conjugate to the inverse of the covariance matrix of a multivariate normal distribution; generalization of the gamma distribution

See also

- Copula (statistics)
- Cumulative distribution function
- Histogram
- Inverse transform sampling
- Likelihood function
- List of statistical topics
- Probability density function
- Random variable
- Riemann–Stieltjes integral application to probability theory

Notes

[1] Everitt, B.S. (2006) The Cambridge Dictionary of Statistics, Third Edition. pp. 313–314. Cambridge University Press, Cambridge. ISBN 0521690277


External links

- An 8-foot-tall (2.4 m) Probability Machine (named Sir Francis) comparing stock market returns to the randomness of the beans dropping through the quincunx pattern (http://www.youtube.com/watch?v=AUSKTk9ENzg), from Index Funds Advisors IFA.com (http://www.ifa.com), youtube.com
- Interactive Discrete and Continuous Probability Distributions (http://www.socr.ucla.edu/htmls/SOCR_Distributions.html), socr.ucla.edu
- A Compendium of Common Probability Distributions (http://www.causascientia.org/math_stat/Dists/Compendium.pdf)
- A Compendium of Distributions (http://www.vosesoftware.com/content/ebook.pdf), vosesoftware.com
- Statistical Distributions - Overview (http://www.xycoon.com/contdistroverview.htm), xycoon.com
- Probability Distributions (http://www.sitmo.com/eqcat/8) in Quant Equation Archive, sitmo.com
- A Probability Distribution Calculator (http://www.covariable.com/continuous.html), covariable.com
- Sourceforge.net (http://sourceforge.net/projects/distexplorer/), Distribution Explorer: a mixed C++ and C# Windows application that allows you to explore the properties of 20+ statistical distributions and calculate CDF, PDF & quantiles. Written using open-source C++ from the Boost.org (http://www.boost.org) Math Toolkit library.
- Explore different probability distributions and fit your own dataset online - interactive tool (http://www.xjtek.com/anylogic/demo_models/111/), xjtek.com


Continuous Distributions

Beta distribution

[Infobox for the beta distribution: plots of the probability density function and the cumulative distribution function; parameters: α > 0 shape (real), β > 0 shape (real); median: no closed form; other properties (mean, mode, variance, skewness, ex. kurtosis, entropy, mgf, cf): see text.]

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval (0, 1) parameterized by two positive shape parameters, typically denoted by α and β. It is the special case of the Dirichlet distribution with only two parameters. Just as the Dirichlet distribution is the conjugate prior of the multinomial distribution and categorical distribution, the beta distribution is the conjugate prior of the binomial distribution and Bernoulli distribution. In Bayesian statistics, it can be seen as the likelihood of the parameter p of a binomial distribution from observing α − 1 independent events with probability p and β − 1 with probability 1 − p.


Characterization

Probability density function

The probability density function of the beta distribution is

    f(x; α, β) = x^(α−1) (1 − x)^(β−1) / B(α, β)
               = Γ(α + β) / (Γ(α) Γ(β)) · x^(α−1) (1 − x)^(β−1),   0 < x < 1,

where Γ is the gamma function. The beta function, B, appears as a normalization constant to ensure that the total probability integrates to unity.

Cumulative distribution function

The cumulative distribution function is

    F(x; α, β) = B(x; α, β) / B(α, β) = I_x(α, β),

where B(x; α, β) is the incomplete beta function and I_x(α, β) is the regularized incomplete beta function.

Properties

The expected value (μ), variance (second central moment), skewness (third central moment), and kurtosis excess (fourth central moment) of a beta distribution random variable X with parameters α and β are:

    μ = E(X) = α / (α + β),
    Var(X) = αβ / [(α + β)² (α + β + 1)].

The skewness is

    γ₁ = 2 (β − α) √(α + β + 1) / [(α + β + 2) √(αβ)].

The kurtosis excess is:

    γ₂ = 6 [(α − β)² (α + β + 1) − αβ (α + β + 2)] / [αβ (α + β + 2) (α + β + 3)].

In general, the k-th raw moment is given by

    E(X^k) = B(α + k, β) / B(α, β) = (α)_k / (α + β)_k,

where (x)_k is the Pochhammer symbol representing the rising factorial. It can also be written in a recursive form as

    E(X^k) = (α + k − 1) / (α + β + k − 1) · E(X^(k−1)).


One can also show that

    E(log X) = ψ(α) − ψ(α + β).

Quantities of information

Given two beta distributed random variables, X ~ Beta(α, β) and Y ~ Beta(α′, β′), the information entropy of X is[1]

    H(X) = ln B(α, β) − (α − 1) ψ(α) − (β − 1) ψ(β) + (α + β − 2) ψ(α + β),

where ψ is the digamma function.

The cross entropy is

    H(X, Y) = ln B(α′, β′) − (α′ − 1) ψ(α) − (β′ − 1) ψ(β) + (α′ + β′ − 2) ψ(α + β).

It follows that the Kullback–Leibler divergence between these two beta distributions is

    D_KL(X ‖ Y) = H(X, Y) − H(X) = ln [B(α′, β′) / B(α, β)] + (α − α′) ψ(α) + (β − β′) ψ(β) + (α′ + β′ − α − β) ψ(α + β).

Shapes

The beta density function can take on different shapes depending on the values of the two parameters:

- α = β = 1 is the uniform [0,1] distribution
- α < 1, β < 1 is U-shaped (red plot)
- α < 1, β ≥ 1 or α = 1, β > 1 is strictly decreasing (blue plot)
  - α = 1, β > 2 is strictly convex
  - α = 1, β = 2 is a straight line
  - α = 1, 1 < β < 2 is strictly concave
- α = 1, β < 1 or α > 1, β ≤ 1 is strictly increasing (green plot)
  - α > 2, β = 1 is strictly convex
  - α = 2, β = 1 is a straight line
  - 1 < α < 2, β = 1 is strictly concave
- α > 1, β > 1 is unimodal (purple & black plots)

Moreover, if α = β then the density function is symmetric about 1/2 (red & purple plots).

Parameter estimation

Let

    x̄ = (1/N) Σ_{i=1}^{N} X_i

be the sample mean and

    v = (1/N) Σ_{i=1}^{N} (X_i − x̄)²

be the sample variance. The method-of-moments estimates of the parameters are

    α̂ = x̄ ( x̄(1 − x̄)/v − 1 ),
    β̂ = (1 − x̄) ( x̄(1 − x̄)/v − 1 ).

When the distribution is required over an interval other than [0, 1], say [ℓ, h], then replace x̄ with (x̄ − ℓ)/(h − ℓ) and v with v/(h − ℓ)² in the above equations.[2] [3]

There is no closed form for the maximum likelihood estimates of the parameters.
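The method-of-moments formulas above translate directly into code; the following minimal sketch (assuming NumPy is available; the function name is illustrative, not from the article) estimates α and β from data lying in (0, 1).

    import numpy as np

    def beta_method_of_moments(x):
        """Method-of-moments estimates (alpha, beta) for data assumed to lie in (0, 1)."""
        x = np.asarray(x, dtype=float)
        m = x.mean()
        v = x.var()                          # sample variance
        common = m * (1.0 - m) / v - 1.0
        return m * common, (1.0 - m) * common

    rng = np.random.default_rng(0)
    data = rng.beta(2.0, 5.0, size=5_000)
    print(beta_method_of_moments(data))      # roughly (2, 5)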


Related distributions

- If X has a beta distribution, then T = X/(1 − X) has a "beta distribution of the second kind", also called the beta prime distribution.
- The connection with the binomial distribution is mentioned below.
- The Beta(1,1) distribution is identical to the standard uniform distribution.
- If X has the Beta(3/2, 3/2) distribution and R > 0 is a real parameter, then Y := 2RX − R has the Wigner semicircle distribution.
- If X and Y are independently distributed Gamma(α, θ) and Gamma(β, θ) respectively, then X/(X + Y) is distributed Beta(α, β).
- If X and Y are independently distributed Beta(α, β) and F(2β, 2α) (Snedecor's F distribution with 2β and 2α degrees of freedom), then Pr(X ≤ α/(α + βx)) = Pr(Y > x) for all x > 0.
- The beta distribution is a special case of the Dirichlet distribution for only two parameters.
- The Kumaraswamy distribution resembles the beta distribution.
- If U has a uniform distribution on [0, 1], then U^(1/α) is distributed Beta(α, 1), a special case of the beta distribution called the power-function distribution.
- Binomial opinions in subjective logic are equivalent to beta distributions.
- Beta(1/2, 1/2) is the Jeffreys prior for a proportion and is equivalent to the arcsine distribution.
- Beta(i, j) with integer values of i and j is the distribution of the i-th order statistic (the i-th smallest value) of a sample of i + j − 1 independent random variables uniformly distributed between 0 and 1. The cumulative probability from 0 to x is thus the probability that the i-th smallest value is less than x; in other words, it is the probability that at least i of the random variables are less than x, a probability given by summing over the binomial distribution with its p parameter set to x. This shows the intimate connection between the beta distribution and the binomial distribution.

Applications

Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given s successes in n conditionally independent Bernoulli trials with probability p, p should be estimated as (s + 1)/(n + 2). This estimate may be regarded as the expected value of the posterior distribution over p, namely Beta(s + 1, n − s + 1), which is given by Bayes' rule if one assumes a uniform prior over p (i.e., Beta(1, 1)) and then observes that p generated s successes in n trials.

Bayesian statistics

Beta distributions are used extensively in Bayesian statistics, since beta distributions provide a family of conjugate prior distributions for binomial (including Bernoulli) and geometric distributions. The Beta(0,0) distribution is an improper prior and sometimes used to represent ignorance of parameter values.
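As a sketch of the conjugacy just described (assuming SciPy is available; the prior and data values below are invented for illustration), updating a Beta(α, β) prior with s successes out of n Bernoulli trials gives a Beta(α + s, β + n − s) posterior:

    from scipy.stats import beta

    alpha0, beta0 = 1.0, 1.0        # uniform Beta(1, 1) prior
    s, n = 7, 10                    # observed: 7 successes in 10 trials

    alpha_post, beta_post = alpha0 + s, beta0 + (n - s)
    posterior = beta(alpha_post, beta_post)

    print(posterior.mean())          # posterior mean (s + 1)/(n + 2) = 0.666...
    print(posterior.interval(0.95))  # central 95% credible interval

With the uniform Beta(1, 1) prior, the posterior mean reproduces the rule-of-succession estimate (s + 1)/(n + 2) discussed above.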

Task duration modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution, along with the triangular distribution, is used extensively in PERT, the critical path method (CPM) and other project management / control systems to describe the time to completion of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

    E(X) = (a + 4b + c) / 6,
    σ(X) = (c − a) / 6,

where a is the minimum, c is the maximum, and b is the most likely value. Using this set of approximations is known as three-point estimation; the approximations are exact only for particular values of α and β, specifically when[4]

    α = 3 − √2,  β = 3 + √2,

or vice versa. These are notably poor approximations for most other beta distributions, exhibiting average errors of 40% in the mean and 549% in the variance.[5] [6] [7]
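The shorthand computations above are simple enough to code directly; a minimal plain-Python helper (the name pert_estimates is illustrative, not from the article) is:

    def pert_estimates(a, b, c):
        """Three-point (PERT) estimates: a = minimum, b = most likely, c = maximum."""
        mean = (a + 4.0 * b + c) / 6.0
        std_dev = (c - a) / 6.0
        return mean, std_dev

    # Example: a task expected to take between 2 and 10 days, most likely 4 days.
    print(pert_estimates(2.0, 4.0, 10.0))   # (4.666..., 1.333...)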

Information theory

We present one exemplary use of the beta distribution in information theory, specifically for the information-theoretic performance analysis of a communication system. In sensor array systems, the distribution of the inner product of two vectors is frequently used for performance estimation. Assume that s and v are isotropic i.i.d. vectors in the (M − 1)-dimensional nullspace of h, where s, v and h are in C^M and the elements of h are i.i.d. complex Gaussian random values. Then the magnitude of the inner product of s and v, |s^H v|, is Beta(1, M − 2) distributed.

Four parameters

A beta distribution with the two shape parameters α and β is supported on the range [0, 1]. It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum and maximum values of the distribution.[8] The probability density function of the four-parameter beta distribution is given by

    f(y; α, β, a, c) = (y − a)^(α−1) (c − y)^(β−1) / [B(α, β) (c − a)^(α+β−1)],   a ≤ y ≤ c,

where a is the minimum and c is the maximum.

The standard form can be obtained by letting

    x = (y − a) / (c − a).

References

[1] A. C. G. Verdugo Lazo and P. N. Rathie. "On the entropy of continuous probability distributions," IEEE Trans. Inf. Theory, IT-24: 120–122, 1978.
[2] Engineering Statistics Handbook (http://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm)
[3] Brighton Webs Ltd. Data & Analysis Services for Industry & Education (http://www.brighton-webs.co.uk/distributions/beta.asp)
[4] Grubbs, Frank E. (1962). Attempts to Validate Certain PERT Statistics or "Picking on PERT". Operations Research 10(6), pp. 912–915.
[5] Keefer, Donald L. and Verdini, William A. (1993). Better Estimation of PERT Activity Time Parameters. Management Science 39(9), pp. 1086–1091.
[6] Keefer, Donald L. and Bodily, Samuel E. (1983). Three-point Approximations for Continuous Random Variables. Management Science 29(5), pp. 595–609.
[7] DRMI Newsletter, Issue 12, April 8, 2005 (http://www.nps.edu/drmi/docs/1apr05-newsletter.pdf)
[8] Beta4 distribution (http://www.vosesoftware.com/ModelRiskHelp/Distributions/Continuous_distributions/Beta_distribution.htm)


External links

- Weisstein, Eric W., "Beta Distribution" (http://mathworld.wolfram.com/BetaDistribution.html) from MathWorld.
- "Beta Distribution" (http://demonstrations.wolfram.com/BetaDistribution/) by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.
- Beta Distribution Overview and Example (http://www.xycoon.com/beta.htm), xycoon.com
- Beta Distribution (http://www.brighton-webs.co.uk/distributions/beta.asp), brighton-webs.co.uk
- Beta Distributions (http://isometricland.com/geogebra/geogebra_beta_distributions.php), applet showing beta distributions in action.

Burr distribution


[Infobox for the Burr distribution: plots of the probability density function and the cumulative distribution function; B() denotes the beta function.]

In probability theory, statistics and econometrics, the Burr Type XII distribution or simply the Burr distribution is a continuous probability distribution for a non-negative random variable. It is also known as the Singh–Maddala distribution and is one of a number of different distributions sometimes called the "generalized log-logistic distribution". It is most commonly used to model household income (see: Household income in the U.S.). The Burr distribution has probability density function:[1] [2]

    f(x; c, k) = c k x^(c−1) (1 + x^c)^(−k−1),   x > 0,

and cumulative distribution function:

    F(x; c, k) = 1 − (1 + x^c)^(−k),   x > 0.

See also

- Log-logistic distribution

References

[1] Maddala, G.S. (1983, 1996). Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press.
[2] Tadikamalla, Pandu R. (1980), "A Look at the Burr and Related Distributions" (http://links.jstor.org/sici?sici=0306-7734(198012)48:32.0.CO;2-Z), International Statistical Review 48 (3): 337–344, doi:10.2307/1402945

Cauchy distribution


Not to be confused with the Lorenz curve.

[Infobox for the Cauchy–Lorentz distribution: plots of the probability density function (the purple curve is the standard Cauchy distribution) and the cumulative distribution function; parameters: x₀ location (real), γ > 0 scale (real); the mean, variance, skewness, ex. kurtosis and mgf are not defined.]

The Cauchy–Lorentz distribution, named after Augustin Cauchy and Hendrik Lorentz, is a continuous probability distribution. As a probability distribution, it is known as the Cauchy distribution, while among physicists, it is known as the Lorentz distribution, Lorentz(ian) function, or Breit–Wigner distribution.

Its importance in physics is due to its being the solution to the differential equation describing forced resonance.[1] In mathematics, it is closely related to the Poisson kernel, which is the fundamental solution for the Laplace equation in the upper half-plane. In spectroscopy, it is the description of the shape of spectral lines which are subject to homogeneous broadening, in which all atoms interact in the same way with the frequency range contained in the line shape. Many mechanisms cause homogeneous broadening, most notably collision broadening, and Chantler–Alda radiation.[2]


Characterization

Probability density function

The Cauchy distribution has the probability density function

    f(x; x₀, γ) = (1/π) · γ / [(x − x₀)² + γ²],

where x₀ is the location parameter, specifying the location of the peak of the distribution, and γ is the scale parameter which specifies the half-width at half-maximum (HWHM). γ is also equal to half the interquartile range. Cauchy himself exploited such a density function in 1827, with an infinitesimal scale parameter, in defining a Dirac delta function (see there). The amplitude of the above Lorentzian function is given by

    Amplitude (height) = 1 / (π γ).

The special case when x₀ = 0 and γ = 1 is called the standard Cauchy distribution, with the probability density function

    f(x; 0, 1) = 1 / [π (1 + x²)].

In physics, a three-parameter Lorentzian function is often used, as follows:

    f(x; x₀, γ, I) = I · γ² / [(x − x₀)² + γ²],

where I is the height of the peak.


Cumulative distribution function

The cumulative distribution function (cdf) is:

    F(x; x₀, γ) = (1/π) arctan((x − x₀)/γ) + 1/2,

and the inverse cumulative distribution function of the Cauchy distribution is

    F⁻¹(p; x₀, γ) = x₀ + γ tan[π (p − 1/2)].
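Because the quantile function has this simple closed form, Cauchy variates can be generated by inverse transform sampling. A minimal NumPy sketch (not from the article; the function name is illustrative):

    import numpy as np

    def cauchy_samples(x0, gamma, n, rng=None):
        """Draw n Cauchy(x0, gamma) variates via the inverse CDF x0 + gamma*tan(pi*(u - 1/2))."""
        if rng is None:
            rng = np.random.default_rng()
        u = rng.random(n)
        return x0 + gamma * np.tan(np.pi * (u - 0.5))

    x = cauchy_samples(0.0, 1.0, 100_000)
    print(np.median(x))   # close to the location parameter 0; the sample mean is unstable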

Properties

The Cauchy distribution is an example of a distribution which has no mean, variance or higher moments defined. Its mode and median are well defined and are both equal to x₀. When U and V are two independent normally distributed random variables with expected value 0 and variance 1, then the ratio U/V has the standard Cauchy distribution. If X1, ..., Xn are independent and identically distributed random variables, each with a standard Cauchy distribution, then the sample mean (X1 + ... + Xn)/n has the same standard Cauchy distribution (the sample median, which is not affected by extreme values, can be used as a measure of central tendency). To see that this is true, compute the characteristic function of the sample mean:

    φ_X̄(t) = E[e^(i X̄ t)] = [φ_X(t/n)]ⁿ = e^(−|t|),

where X̄ is the sample mean. This example serves to show that the hypothesis of finite variance in the central limit theorem cannot be dropped. It is also an example of a more generalized version of the central limit theorem that is characteristic of all stable distributions, of which the Cauchy distribution is a special case. The Cauchy distribution is an infinitely divisible probability distribution. It is also a strictly stable distribution. The standard Cauchy distribution coincides with the Student's t-distribution with one degree of freedom. Like all stable distributions, the location-scale family to which the Cauchy distribution belongs is closed under linear transformations with real coefficients. In addition, the Cauchy distribution is the only univariate distribution which is closed under linear fractional transformations with real coefficients. In this connection, see also McCullagh's parametrization of the Cauchy distributions.

Characteristic function

Let X denote a Cauchy distributed random variable. The characteristic function of the Cauchy distribution is given by

    φ_X(t) = E[e^(i X t)] = exp(i x₀ t − γ |t|),

which is just the Fourier transform of the probability density. It follows that the probability density may be expressed in terms of the characteristic function by:

    f(x; x₀, γ) = (1/2π) ∫_{−∞}^{∞} φ_X(t) e^(−i x t) dt.

Explanation of undefined moments

Mean

If a probability distribution has a density function f(x), then the mean is

    ∫_{−∞}^{∞} x f(x) dx.   (1)

The question is now whether this is the same thing as

    ∫_{0}^{∞} x f(x) dx − ∫_{−∞}^{0} (−x) f(x) dx.   (2)


If at most one of the two terms in (2) is infinite, then (1) is the same as (2). But in the case of the Cauchy distribution, both the positive and negative terms of (2) are infinite. This means (2) is undefined. Moreover, if (1) is construed as a Lebesgue integral, then (1) is also undefined, since (1) is then defined simply as the difference (2) between positive and negative parts. However, if (1) is construed as an improper integral rather than a Lebesgue integral, then (2) is undefined, and (1) is not necessarily well-defined. We may take (1) to mean

    lim_{a→∞} ∫_{−a}^{a} x f(x) dx,

and this is its Cauchy principal value, which is zero, but we could also take (1) to mean, for example,

    lim_{a→∞} ∫_{−2a}^{a} x f(x) dx,

which is not zero, as can be seen easily by computing the integral. Various results in probability theory about expected values, such as the strong law of large numbers, will not work in such cases.

Second moment

Without a defined mean, it is impossible to consider the variance or standard deviation of a standard Cauchy distribution, as these are defined with respect to the mean. But the second moment about zero can be considered. It turns out to be infinite:

    E(X²) = ∫_{−∞}^{∞} x² / [π (1 + x²)] dx = ∞.

Estimation of parameters

Since the mean and variance of the Cauchy distribution are not defined, attempts to estimate these parameters will not be successful. For example, if N samples are taken from a Cauchy distribution, one may calculate the sample mean as:

    x̄ = (1/N) Σ_{i=1}^{N} x_i.

Although the sample values will be concentrated about the central value x₀, the sample mean will become increasingly variable as more samples are taken, because of the increased likelihood of encountering sample points with a large absolute value. In fact, the distribution of the sample mean will be equal to the distribution of the samples themselves; i.e., the sample mean of a large sample is no better (or worse) an estimator of x₀ than any single observation from the sample. Similarly, calculating the sample variance will result in values that grow larger as more samples are taken.

Therefore, more robust means of estimating the central value x₀ and the scaling parameter γ are needed. One simple method is to take the median value of the sample as an estimator of x₀ and half the sample interquartile range as an estimator of γ. Other, more precise and robust methods have been developed.[3] For example, the truncated mean of the middle 24% of the sample order statistics produces an estimate for x₀ that is more efficient than using either the sample median or the full sample mean.[4] [5] However, because of the fat tails of the Cauchy distribution, the efficiency of the estimator decreases if more than 24% of the sample is used.[4] [5]

Maximum likelihood can also be used to estimate the parameters x₀ and γ. However, this tends to be complicated by the fact that it requires finding the roots of a high degree polynomial, and there can be multiple roots that represent local maxima.[6] Also, while the maximum likelihood estimator is asymptotically efficient, it is relatively inefficient for small samples.[7] The log-likelihood function for the Cauchy distribution for sample size n is:

    ℓ(x₀, γ) = −n log(γ π) − Σ_{i=1}^{n} log[1 + ((x_i − x₀)/γ)²].

Maximizing the log-likelihood function with respect to x₀ and γ produces the following system of equations:

    Σ_{i=1}^{n} (x_i − x₀) / [γ² + (x_i − x₀)²] = 0,
    Σ_{i=1}^{n} γ² / [γ² + (x_i − x₀)²] = n/2.


Solving just for x₀ requires solving a polynomial of degree 2n − 1,[6] and solving just for γ requires solving a polynomial of degree 2n (first for γ², then γ). It is also worthwhile to note that the second equation above is monotone in γ and that the solution γ̂ must satisfy

    min |x_i − x₀| ≤ γ̂ ≤ max |x_i − x₀|.

Therefore, whether solving for one parameter or for both parameters simultaneously, a numerical solution on a computer is typically required. The benefit of maximum likelihood estimation is asymptotic efficiency; estimating x₀ using the sample median is only about 81% as asymptotically efficient as estimating x₀ by maximum likelihood.[5] [8] The truncated sample mean using the middle 24% order statistics is about 88% as asymptotically efficient an estimator of x₀ as the maximum likelihood estimate.[5] When Newton's method is used to find the solution for the maximum likelihood estimate, the middle 24% order statistics can be used as an initial solution for x₀.
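A minimal numerical sketch of the simple robust estimators discussed above (sample median for x₀ and half the sample interquartile range for γ), assuming NumPy is available; the function name is illustrative:

    import numpy as np

    def cauchy_median_iqr_estimates(x):
        """Estimate (x0, gamma) by the sample median and half the sample interquartile range."""
        x = np.asarray(x, dtype=float)
        x0_hat = np.median(x)
        q1, q3 = np.percentile(x, [25, 75])
        gamma_hat = 0.5 * (q3 - q1)
        return x0_hat, gamma_hat

    rng = np.random.default_rng(1)
    data = rng.standard_cauchy(10_000) * 2.0 + 5.0   # Cauchy with x0 = 5, gamma = 2
    print(cauchy_median_iqr_estimates(data))          # roughly (5, 2)

Estimates of this kind can also serve as starting values for a numerical maximum likelihood fit.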

Multivariate Cauchy distribution

A random vector X = (X1, …, Xk) is said to have the multivariate Cauchy distribution if every linear combination of its components Y = a1X1 + … + akXk has a Cauchy distribution. That is, for any constant vector a in R^k, the random variable Y = aᵀX should have a univariate Cauchy distribution.[9] The characteristic function of a multivariate Cauchy distribution is given by

    φ_X(t) = exp(i x₀(t) − γ(t)),

where x₀(t) and γ(t) are real functions, with x₀(t) a homogeneous function of degree one and γ(t) a positive homogeneous function of degree one.[9] More formally:

    x₀(at) = a x₀(t)   and   γ(at) = |a| γ(t)   for all t and all real a.

An example of a bivariate Cauchy distribution can be given by:[10]

    f(x, y; x₀, y₀, γ) = (γ / 2π) · [ (x − x₀)² + (y − y₀)² + γ² ]^(−3/2).

Note that in this example, even though there is no analogue to a covariance matrix, x and y are not statistically independent.[10]

Related distributions

- The ratio of two independent standard normal random variables is a standard Cauchy variable, a Cauchy(0, 1). Thus the Cauchy distribution is a ratio distribution.
- The standard Cauchy(0, 1) distribution arises as a special case of Student's t distribution with one degree of freedom.
- Relation to the stable distribution: if X ~ Stable(1, 0, γ, μ), then X ~ Cauchy(μ, γ).

Relativistic Breit–Wigner distribution

In nuclear and particle physics, the energy profile of a resonance is described by the relativistic Breit–Wigner distribution, while the Cauchy distribution is the (non-relativistic) Breit–Wigner distribution.


See also

- McCullagh's parametrization of the Cauchy distributions
- Lévy flight and Lévy process
- Slash distribution
- Wrapped Cauchy distribution

References

[1] http://webphysics.davidson.edu/Projects/AnAntonelli/node5.html. Note that the intensity, which follows the Cauchy distribution, is the square of the amplitude.
[2] E. Hecht (1987). Optics (2nd ed.). Addison-Wesley. p. 603.
[3] Cane, Gwenda J. (1974). "Linear Estimation of Parameters of the Cauchy Distribution Based on Sample Quantiles" (http://www.jstor.org/stable/2285535). Journal of the American Statistical Association 69 (345): 243–245.
[4] Rothenberg, Thomas J.; Fisher, Franklin M.; Tilanus, C.B. (1966). "A note on estimation from a Cauchy sample". Journal of the American Statistical Association 59 (306): 460–463.
[5] Bloch, Daniel (1966). "A note on the estimation of the location parameters of the Cauchy distribution" (http://www.jstor.org/pss/2282794). Journal of the American Statistical Association 61 (316): 852–855.
[6] Ferguson, Thomas S. (1978). "Maximum Likelihood Estimates of the Parameters of the Cauchy Distribution for Samples of Size 3 and 4" (http://www.jstor.org/pss/2286549). Journal of the American Statistical Association 73 (361): 211.
[7] Cohen Freue, Gabriella V. (2007). "The Pitman estimator of the Cauchy location parameter" (http://faculty.ksu.edu.sa/69424/USEPAP/Coushy dist.pdf). Journal of Statistical Planning and Inference 137: 1901.
[8] Barnett, V. D. (1966). "Order Statistics Estimators of the Location of the Cauchy Distribution" (http://www.jstor.org/pss/2283210). Journal of the American Statistical Association 61 (316): 1205.
[9] Ferguson, Thomas S. (1962). "A Representation of the Symmetric Bivariate Cauchy Distribution" (http://www.jstor.org/pss/2237984). Journal of the American Statistical Association: 1256.
[10] Molenberghs, Geert; Lesaffre, Emmanuel (1997). "Non-linear Integral Equations to Approximate Bivariate Densities with Given Marginals and Dependence Function" (http://www3.stat.sinica.edu.tw/statistica/oldpdf/A7n310.pdf). Statistica Sinica 7: 713–738.

External links

- Earliest Uses: The entry on Cauchy distribution has some historical information (http://jeff560.tripod.com/c.html)
- Weisstein, Eric W., "Cauchy Distribution" (http://mathworld.wolfram.com/CauchyDistribution.html) from MathWorld.
- GNU Scientific Library Reference Manual (http://www.gnu.org/software/gsl/manual/gsl-ref.html#SEC294)

Chi-square distribution


[Infobox for the chi-square distribution: plots of the probability density function and the cumulative distribution function. Notation: χ²(k) or χ²_k. Parameters: k, a positive integer (degrees of freedom). Support: x ∈ [0, +∞). Mean: k. Mode: max{k − 2, 0}. Variance: 2k. Excess kurtosis: 12/k. MGF: (1 − 2t)^(−k/2) for t < 1/2. CF: (1 − 2it)^(−k/2).[1]]

In probability theory and statistics, the chi-square distribution (also chi-squared or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. It is one of the most widely used probability distributions in inferential statistics, e.g. in hypothesis testing or in construction of confidence intervals.[2] [3] [4] [5] The best-known situations in which the chi-square distribution is used are the common chi-square tests for goodness of fit of an observed distribution to a theoretical one, and of the independence of two criteria of classification of

qualitative data. Many other statistical tests also lead to a use of this distribution, like Friedman's analysis of variance by ranks. The chi-square distribution is a special case of the gamma distribution.


Definition

If X1, …, Xk are independent, standard normal random variables, then the sum of their squares

    Q = Σ_{i=1}^{k} X_i²

is distributed according to the chi-square distribution with k degrees of freedom. This is usually denoted as

    Q ~ χ²(k)   or   Q ~ χ²_k.

The chi-square distribution has one parameter: k, a positive integer that specifies the number of degrees of freedom (i.e. the number of X_i's).
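The definition can be checked numerically: summing the squares of k independent standard normal draws reproduces the chi-square mean k and variance 2k. A minimal NumPy sketch (illustrative, not from the article):

    import numpy as np

    rng = np.random.default_rng(0)
    k, n = 5, 200_000

    z = rng.standard_normal((n, k))      # n draws of k independent standard normals
    q = (z ** 2).sum(axis=1)             # each row-sum is one chi-square(k) variate

    print(q.mean(), q.var())             # approximately k = 5 and 2k = 10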

Characteristics

Further properties of the chi-square distribution can be found in the infobox above.

Probability density function

The probability density function (pdf) of the chi-square distribution is

    f(x; k) = x^(k/2 − 1) e^(−x/2) / [2^(k/2) Γ(k/2)],   x ≥ 0,

where Γ(k/2) denotes the Gamma function, which has closed-form values at the half-integers. For derivations of the pdf in the cases of one and two degrees of freedom, see Proofs related to chi-square distribution.

Cumulative distribution function

Its cumulative distribution function is:

    F(x; k) = γ(k/2, x/2) / Γ(k/2) = P(k/2, x/2),

where γ(k, z) is the lower incomplete Gamma function and P(k, z) is the regularized Gamma function. In the special case of k = 2 this function has the simple form:

    F(x; 2) = 1 − e^(−x/2).

Tables of this distribution, usually in its cumulative form, are widely available, and the function is included in many spreadsheets and all statistical packages. For a closed-form approximation for the CDF, see under Noncentral chi-square distribution.


Additivity

It follows from the definition of the chi-square distribution that the sum of independent chi-square variables is also chi-square distributed. Specifically, if X1, …, Xn are independent chi-square variables with k1, …, kn degrees of freedom, respectively, then Y = X1 + … + Xn is chi-square distributed with k1 + … + kn degrees of freedom.

Information entropy

The information entropy is given by

    H = k/2 + ln[2 Γ(k/2)] + (1 − k/2) ψ(k/2),

where ψ(x) is the digamma function.

Noncentral moments

The moments about zero of a chi-square distribution with k degrees of freedom are given by[6] [7]

    E(X^m) = k (k + 2) (k + 4) ⋯ (k + 2m − 2) = 2^m Γ(m + k/2) / Γ(k/2).

Cumulants

The cumulants are readily obtained by a (formal) power series expansion of the logarithm of the characteristic function:

    κ_n = 2^(n−1) (n − 1)! k.

Asymptotic properties

By the central limit theorem, because the chi-square distribution is the sum of k independent random variables, it converges to a normal distribution for large k (k > 50 is approximately normal).[8] Specifically, if X ~ χ²(k), then as k tends to infinity, the distribution of (X − k)/√(2k) tends to a standard normal distribution. However, convergence is slow, as the skewness is √(8/k) and the excess kurtosis is 12/k.

Other functions of the chi-square distribution converge more rapidly to a normal distribution. Some examples are:

- If X ~ χ²(k) then √(2X) is approximately normally distributed with mean √(2k − 1) and unit variance (result credited to R. A. Fisher).
- If X ~ χ²(k) then (X/k)^(1/3) is approximately normally distributed with mean 1 − 2/(9k) and variance 2/(9k) (Wilson and Hilferty, 1931).

Related distributions

A chi-square variable with k degrees of freedom is defined as the sum of the squares of k independent standard normal random variables.

- If Y is a k-dimensional Gaussian random vector with mean vector μ and rank-k covariance matrix C, then X = (Y − μ)ᵀ C⁻¹ (Y − μ) is chi-square distributed with k degrees of freedom.
- The sum of squares of statistically independent unit-variance Gaussian variables which do not have mean zero yields a generalization of the chi-square distribution called the noncentral chi-square distribution.
- If Y is a vector of k i.i.d. standard normal random variables and A is a k × k idempotent matrix with rank k − n, then the quadratic form YᵀAY is chi-square distributed with k − n degrees of freedom.
- The chi-square distribution is also naturally related to other distributions arising from the Gaussian. In particular, Y is F-distributed, Y ~ F(k1, k2), if Y = (X1/k1)/(X2/k2), where X1 ~ χ²(k1) and X2 ~ χ²(k2) are statistically independent.
- If X is chi-square distributed, then √X is chi distributed.
- If X1 ~ χ²(k1) and X2 ~ χ²(k2) are statistically independent, then X1 + X2 ~ χ²(k1 + k2). If X1 and X2 are not independent, then X1 + X2 is not necessarily chi-square distributed.


Generalizations

The chi-square distribution is obtained as the sum of the squares of k independent, zero-mean, unit-variance Gaussian random variables. Generalizations of this distribution can be obtained by summing the squares of other types of Gaussian random variables. Several such distributions are described below.

Chi-square distributions

Noncentral chi-square distribution

The noncentral chi-square distribution is obtained from the sum of the squares of independent Gaussian random variables having unit variance and nonzero means.

Generalized chi-square distribution

The generalized chi-square distribution is obtained from the quadratic form zᵀAz where z is a zero-mean Gaussian vector having an arbitrary covariance matrix, and A is an arbitrary matrix.

Gamma, exponential, and related distributions

The chi-square distribution X ~ χ²(k) is a special case of the gamma distribution, in that X ~ Γ(k/2, 2) (using the shape and scale parameterization of the gamma distribution). Because the exponential distribution is also a special case of the gamma distribution, we also have that if X ~ χ²(2), then X ~ Exp(1/2) is an exponential distribution (with rate parameter 1/2). The Erlang distribution is also a special case of the gamma distribution, and thus we also have that if X ~ χ²(k) with even k, then X is Erlang distributed with shape parameter k/2 and rate parameter 1/2.

Applications

The chi-square distribution has numerous applications in inferential statistics, for instance in chi-square tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student's t-distribution. It enters all analysis of variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent chi-squared random variables, each divided by their respective degrees of freedom.

Following are some of the most common situations in which the chi-square distribution arises from a Gaussian-distributed sample:

- If X1, …, Xn are i.i.d. N(μ, σ²) random variables, then Σ_{i=1}^{n} (X_i − X̄)² ~ σ² χ²(n − 1), where X̄ = (1/n) Σ_{i=1}^{n} X_i.
- The table below shows probability distributions with a name starting with "chi" for some statistics based on X_i ~ Normal(μ_i, σ²_i), i = 1, …, k, independent random variables.


Name                                 Statistic
chi-square distribution              Σ_{i=1}^{k} ((X_i − μ_i)/σ_i)²
noncentral chi-square distribution   Σ_{i=1}^{k} (X_i/σ_i)²
chi distribution                     √( Σ_{i=1}^{k} ((X_i − μ_i)/σ_i)² )
noncentral chi distribution          √( Σ_{i=1}^{k} (X_i/σ_i)² )

Table of χ² value vs P value

The P-value is the probability of observing a test statistic at least as extreme in a chi-square distribution. Accordingly, since the cumulative distribution function (CDF) for the appropriate degrees of freedom (df) gives the probability of having obtained a value less extreme than this point, subtracting the CDF value from 1 gives the P-value. The table below gives a number of P-values matching to χ² values for the first 10 degrees of freedom. A P-value of 0.05 or less is usually regarded as statistically significant.[9]

χ² value by degrees of freedom (df) and P value:

df   P=0.95  0.90   0.80   0.70   0.50   0.30   0.20   0.10   0.05   0.01   0.001
 1   0.004   0.02   0.06   0.15   0.46   1.07   1.64   2.71   3.84   6.64   10.83
 2   0.10    0.21   0.45   0.71   1.39   2.41   3.22   4.60   5.99   9.21   13.82
 3   0.35    0.58   1.01   1.42   2.37   3.66   4.64   6.25   7.82   11.34  16.27
 4   0.71    1.06   1.65   2.20   3.36   4.88   5.99   7.78   9.49   13.28  18.47
 5   1.14    1.61   2.34   3.00   4.35   6.06   7.29   9.24   11.07  15.09  20.52
 6   1.63    2.20   3.07   3.83   5.35   7.23   8.56   10.64  12.59  16.81  22.46
 7   2.17    2.83   3.82   4.67   6.35   8.38   9.80   12.02  14.07  18.48  24.32
 8   2.73    3.49   4.59   5.53   7.34   9.52   11.03  13.36  15.51  20.09  26.12
 9   3.32    4.17   5.38   6.39   8.34   10.66  12.24  14.68  16.92  21.67  27.88
10   3.94    4.86   6.18   7.27   9.34   11.78  13.44  15.99  18.31  23.21  29.59

P values of 0.05 and below are regarded as significant; larger P values are regarded as nonsignificant.
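Entries of this table can be reproduced with any statistics package; for instance, a minimal SciPy sketch (assuming scipy.stats is available):

    from scipy.stats import chi2

    # P-value for an observed chi-square statistic of 3.84 with 1 degree of freedom.
    print(chi2.sf(3.84, df=1))       # survival function, approximately 0.05

    # Conversely, the critical value for P = 0.05 with 1 degree of freedom.
    print(chi2.isf(0.05, df=1))      # approximately 3.84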


See also

- Cochran's theorem
- Degrees of freedom (statistics)
- Fisher's method for combining independent tests of significance
- Generalized chi-square distribution
- High-dimensional space
- Inverse-chi-square distribution
- Noncentral chi-square distribution
- Normal distribution
- Pearson's chi-square test
- Proofs related to chi-square distribution
- Wishart distribution

References

Footnotes

[1] M.A. Sanders. "Characteristic function of the central chi-square distribution" (http://www.planetmathematics.com/CentralChiDistr.pdf). Retrieved 2009-03-06.
[2] Abramowitz, Milton; Stegun, Irene A., eds. (1965), "Chapter 26" (http://www.math.sfu.ca/~cbm/aands/page_940.htm), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York: Dover, p. 940, MR0167642, ISBN 978-0486612720.
[3] NIST (2006). Engineering Statistics Handbook - Chi-Square Distribution (http://www.itl.nist.gov/div898/handbook/eda/section3/eda3666.htm)
[4] Johnson, N.L.; Kotz, S.; Balakrishnan, N. (1994). Continuous Univariate Distributions (Second Ed., Vol. 1, Chapter 18). John Wiley and Sons. ISBN 0-471-58495-9.
[5] Mood, Alexander; Graybill, Franklin A.; Boes, Duane C. (1974). Introduction to the Theory of Statistics (Third Edition, pp. 241–246). McGraw-Hill. ISBN 0-07-042864-6.
[6] Chi-square distribution (http://mathworld.wolfram.com/Chi-SquaredDistribution.html), from MathWorld, retrieved Feb. 11, 2009
[7] M. K. Simon, Probability Distributions Involving Gaussian Random Variables, New York: Springer, 2002, eq. (2.35), ISBN 978-0-387-34657-1
[8] Box, Hunter and Hunter. Statistics for Experimenters. Wiley. p. 46.
[9] Chi-Square Test (http://www2.lv.psu.edu/jxm57/irp/chisquar.html) Table B.2. Dr. Jacqueline S. McLaughlin at The Pennsylvania State University. In turn citing: R.A. Fisher and F. Yates, Statistical Tables for Biological Agricultural and Medical Research, 6th ed., Table IV.

Notations

- Wilson, E.B.; Hilferty, M.M. (1931) The distribution of chi-square. Proceedings of the National Academy of Sciences, Washington, 17, 684–688.

External links

- Earliest Uses of Some of the Words of Mathematics: entry on Chi square has a brief history (http://jeff560.tripod.com/c.html)
- Course notes on Chi-Square Goodness of Fit Testing (http://www.stat.yale.edu/Courses/1997-98/101/chigf.htm) from Yale University Stats 101 class.
- Mathematica demonstration showing the chi-squared sampling distribution of various statistics, e.g. Σx², for a normal population (http://demonstrations.wolfram.com/StatisticsAssociatedWithNormalSamples/)
- Simple algorithm for approximating cdf and inverse cdf for the chi-square distribution with a pocket calculator (http://www.jstor.org/stable/2348373)

Dirichlet distribution


[Figure: Several images of the probability density of the Dirichlet distribution when K = 3, for various parameter vectors α. Clockwise from top left: α = (6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4).]

In probability and statistics, the Dirichlet distribution (after Johann Peter Gustav Lejeune Dirichlet), often denoted Dir(α), is a family of continuous multivariate probability distributions parametrized by a vector α of positive reals. It is the multivariate generalization of the beta distribution, and the conjugate prior of the categorical distribution and multinomial distribution in Bayesian statistics. That is, its probability density function returns the belief that the probabilities of K rival events are x_i, given that each event has been observed α_i − 1 times.

The support of the Dirichlet distribution (i.e. the set of values for which the density is non-zero) is a K-dimensional vector of real numbers in the range (0, 1), all of which sum to 1. These can be viewed as the probabilities of a K-way categorical event. Another way to express this is that the domain of the Dirichlet distribution is itself a probability distribution, specifically a K-dimensional discrete distribution. Note that the technical term for the set of points in the support of a K-dimensional Dirichlet distribution is the open standard (K − 1)-simplex, which is a generalization of a triangle, embedded in the next-higher dimension. For example, with K = 3, the support looks like an equilateral triangle embedded in a downward-angle fashion in three-dimensional space, with vertices at (1, 0, 0), (0, 1, 0) and (0, 0, 1), i.e. touching each of the coordinate axes at a point 1 unit away from the origin.

A very common special case is the symmetric Dirichlet distribution, where all of the elements making up the parameter vector have the same value. In this case, the distribution can be parametrized by a single scalar value α, called the concentration parameter. When this value is 1, the symmetric Dirichlet distribution is equivalent to a uniform distribution over the open standard (K − 1)-simplex, i.e. it is uniform over all points in its support. Values of the concentration parameter above 1 prefer variates that are dense, evenly distributed distributions, i.e. all probabilities returned are similar to each other. Values of the concentration parameter below 1 prefer sparse distributions, i.e. most of the probabilities returned will be close to 0, and the vast majority of the mass will be concentrated in a few of the probabilities.

The infinite-dimensional generalization of the Dirichlet distribution is the Dirichlet process.


Probability density function

The Dirichlet distribution of order K ≥ 2 with parameters α1, ..., αK > 0 has a probability density function with respect to Lebesgue measure on the Euclidean space R^(K−1) given by

    f(x1, …, x_{K−1}; α1, …, αK) = (1/B(α)) ∏_{i=1}^{K} x_i^(α_i − 1),

for all x1, ..., x_{K−1} > 0 satisfying x1 + ... + x_{K−1} < 1, where x_K is an abbreviation for 1 − x1 − ... − x_{K−1}.

Exponential distribution

Memorylessness

For an exponentially distributed waiting time T, given that the first arrival has not occurred after 30 seconds (T > 30), the probability that we'll need to wait another 10 seconds for the first arrival (T > 30 + 10) is the same as the initial probability that we need to wait more than 10 seconds for the first arrival (T > 10). This is often misunderstood by students taking courses on probability: the fact that Pr(T > 40 | T > 30) = Pr(T > 10) does not mean that the events T > 40 and T > 30 are independent. To summarize: "memorylessness" of the probability distribution of the waiting time T until the first arrival means

    Pr(T > 40 | T > 30) = Pr(T > 10).

It does not mean

    Pr(T > 40 | T > 30) = Pr(T > 40).

(That would be independence. These two events are not independent.) The exponential distributions and the geometric distributions are the only memoryless probability distributions. The exponential distribution is consequently also necessarily the only continuous probability distribution that has a constant failure rate.

Quartiles

The quantile function (inverse cumulative distribution function) for Exponential(λ) is

    F⁻¹(p; λ) = −ln(1 − p) / λ,

for 0 ≤ p < 1. The quartiles are therefore:

- first quartile: ln(4/3)/λ
- median: ln(2)/λ
- third quartile: ln(4)/λ

Kullback–Leibler divergence

The directed Kullback–Leibler divergence between Exp(λ₀) (the 'true' distribution) and Exp(λ) (the 'approximating' distribution) is given by

    Δ(λ₀ ‖ λ) = log(λ₀/λ) + λ/λ₀ − 1.

Maximum entropy distribution

Among all continuous probability distributions with support [0, ∞) and mean μ, the exponential distribution with λ = 1/μ has the largest entropy.


Distribution of the minimum of exponential random variables

Let X1, ..., Xn be independent exponentially distributed random variables with rate parameters λ1, ..., λn. Then

    min{X1, …, Xn}

is also exponentially distributed, with parameter

    λ = λ1 + … + λn.

This can be seen by considering the complementary cumulative distribution function:

    Pr(min{X1, …, Xn} > x) = ∏_{i=1}^{n} Pr(X_i > x) = exp(−(λ1 + … + λn) x).

The index of the variable which achieves the minimum is distributed according to the law

    Pr(X_k = min{X1, …, Xn}) = λ_k / (λ1 + … + λn).

Note that max{X1, …, Xn} is not exponentially distributed.

Parameter estimation

Suppose a given variable is exponentially distributed and the rate parameter is to be estimated.

Maximum likelihood

The likelihood function for λ, given an independent and identically distributed sample x = (x1, ..., xn) drawn from the variable, is

    L(λ) = ∏_{i=1}^{n} λ exp(−λ x_i) = λⁿ exp(−λ n x̄),

where

    x̄ = (1/n) Σ_{i=1}^{n} x_i

is the sample mean. The derivative of the likelihood function's logarithm is

    d/dλ ln L(λ) = n/λ − n x̄,

which is positive for λ < 1/x̄ and negative for λ > 1/x̄. Consequently the maximum likelihood estimate for the rate parameter is

    λ̂ = 1/x̄.

While this estimate is the most likely reconstruction of the true parameter λ, it is only an estimate, and as such, one can imagine that the more data points are available, the better the estimate will be. It so happens that one can compute an exact confidence interval, that is, a confidence interval that is valid for all numbers of samples, not just large ones. The 100(1 − α)% exact confidence interval for this estimate is given by[1]

    λ̂ χ²_{2n; α/2} / (2n)  <  λ  <  λ̂ χ²_{2n; 1−α/2} / (2n),

where λ̂ is the MLE estimate, λ is the true value of the parameter, and χ²_{k; x} is the value of the chi-squared distribution with k degrees of freedom that gives x cumulative probability (i.e. the value found in chi-squared tables [2]).
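A minimal sketch of the estimate and the exact confidence interval above, assuming NumPy and SciPy are available (the interval uses the chi-squared quantiles χ²_{2n; α/2} and χ²_{2n; 1−α/2} as described; the function name is illustrative):

    import numpy as np
    from scipy.stats import chi2

    def exponential_mle_with_ci(x, conf=0.95):
        """MLE of the rate parameter and an exact chi-square-based confidence interval."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        lam_hat = 1.0 / x.mean()
        a = 1.0 - conf
        lower = lam_hat * chi2.ppf(a / 2.0, 2 * n) / (2.0 * n)
        upper = lam_hat * chi2.ppf(1.0 - a / 2.0, 2 * n) / (2.0 * n)
        return lam_hat, (lower, upper)

    rng = np.random.default_rng(2)
    data = rng.exponential(scale=1.0 / 3.0, size=500)   # true rate lambda = 3
    print(exponential_mle_with_ci(data))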


Bayesian inferenceThe conjugate prior for the exponential distribution is the gamma distribution (of which the exponential distribution is a special case). The following parameterization of the gamma pdf is useful:

The posterior distribution p can then be expressed in terms of the likelihood function defined above and a gamma prior:

    p(λ | x) ∝ L(λ) × Gamma(λ; α, β).

Now the posterior density p has been specified up to a missing normalizing constant. Since it has the form of a gamma pdf, this can easily be filled in, and one obtains

    p(λ | x) = Gamma(λ; α + n, β + n x̄).

Here the parameter α can be interpreted as the number of prior observations, and β as the sum of the prior observations.
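Writing the update out explicitly (with the shape–rate Gamma parameterization assumed above):

    p(λ | x) ∝ λ^n e^(−λ n x̄) · λ^(α−1) e^(−λ β) = λ^((α + n) − 1) e^( −λ (β + n x̄) ),

which is proportional to a Gamma(α + n, β + n x̄) density, confirming the posterior stated above.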

Prediction
Having observed a sample of n data points from an unknown exponential distribution, a common task is to use these samples to make predictions about future data from the same source. A common predictive distribution over future samples is the so-called plug-in distribution, formed by plugging a suitable estimate for the rate parameter λ into the exponential density function. A common choice of estimate is the one provided by the principle of maximum likelihood, and using this yields the predictive density over a future sample x_(n+1), conditioned on the observed samples x = (x1, ..., xn), given by

    p_ML( x_(n+1) | x ) = λ̂ exp( −λ̂ x_(n+1) ),   with λ̂ = 1/x̄.

The Bayesian approach provides a predictive distribution which takes into account the uncertainty of the estimated parameter, although this may depend crucially on the choice of prior. A recent alternative that is free of the issues of choosing priors is the Conditional Normalized Maximum Likelihood (CNML) predictive distribution [3]

The accuracy of a predictive distribution may be measured using the distance or divergence between the true exponential distribution with rate parameter λ0 and the predictive distribution based on the sample x. The Kullback–Leibler divergence is a commonly used, parameterisation-free measure of the difference between two distributions. Letting Δ(λ0 || p) denote the Kullback–Leibler divergence between an exponential with rate parameter λ0 and a predictive distribution p, it can be shown that

where the expectation is taken with respect to the exponential distribution with rate parameter λ0 ∈ (0, ∞), and ψ( ) is the digamma function. It is clear that the CNML predictive distribution is strictly superior to the maximum likelihood plug-in distribution in terms of average Kullback–Leibler divergence for all sample sizes n > 0.


Generating exponential variates
A conceptually very simple method for generating exponential variates is based on inverse transform sampling: given a random variate U drawn from the uniform distribution on the unit interval (0, 1), the variate

    T = F^(−1)(U)

has an exponential distribution, where F^(−1) is the quantile function, defined by

    F^(−1)(p) = −ln(1 − p) / λ.

Moreover, if U is uniform on (0, 1), then so is 1 − U. This means one can generate exponential variates as follows:

    T = −ln(U) / λ.
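A minimal Java sketch of this recipe (the class and method names are illustrative, not part of the original article):

import java.util.Random;

public class ExponentialSampler {
    // Draw one Exponential(lambda) variate by inverse transform sampling.
    // Uses 1 - u (also uniform, in (0, 1]) inside the logarithm to avoid log(0).
    public static double exponential(double lambda, Random rand) {
        double u = rand.nextDouble();            // uniform on [0, 1)
        return -Math.log(1.0 - u) / lambda;
    }

    public static void main(String[] args) {
        Random rand = new Random();
        System.out.println(exponential(2.0, rand)); // one draw with rate lambda = 2
    }
}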

Other methods for generating exponential variates are discussed by Knuth[4] and Devroye.[5] The ziggurat algorithm is a fast method for generating exponential variates. A fast method for generating a set of ready-ordered exponential variates without using a sorting routine is also available.[5]

Related distributions
• An exponential distribution is a special case of a gamma distribution with shape parameter equal to 1 (k = 1 or α = 1, depending on the parameter set used).
• Both an exponential distribution and a gamma distribution are special cases of the phase-type distribution.
• If X ~ Exponential(λ), then Y = X^(1/γ) has a Weibull distribution with shape parameter γ; in particular, every exponential distribution is also a Weibull distribution (with shape 1).
• If X ~ Exponential(1), then Y = σ√(2X) has a Rayleigh distribution with parameter σ.
• If X ~ Exponential(λ), then Y = μ − β log(λX) has a Gumbel(μ, β) distribution.
• Y has a Laplace distribution if Y = X1 − X2 for two independent exponential distributions X1 and X2.
• Y has an exponential distribution if Y = min(X1, ..., XN) for independent exponential distributions Xi.
• If X ~ Exponential(λ), then Y = exp(−λX) has a uniform distribution on (0, 1).
• If X ~ Exponential(λ = 1/2), then X has a chi-square distribution with 2 degrees of freedom.
• Let X1, ..., Xn ~ Exponential(λ) be exponentially distributed and independent, and let Y = Σ_{i=1}^{n} Xi. Then Y ~ Gamma(n, 1/λ), i.e. Y has a Gamma distribution.
• If X ~ SkewLogistic(θ), then log(1 + e^(−X)) ~ Exponential(θ): see skew-logistic distribution.
• Let X ~ Exponential(λX) and Y ~ Exponential(λY) be independent. Then Z = λX X / (λY Y) has probability density function f_Z(z) = 1/(z + 1)². This can be used to obtain a confidence interval for λX/λY.

Other related distributions:
• Hyper-exponential distribution – the distribution whose density is a weighted sum of exponential densities.
• Hypoexponential distribution – the distribution of a general sum of exponential random variables.
• exGaussian distribution – the sum of an exponential distribution and a normal distribution.


See also
• Dead time – an application of the exponential distribution to particle detector analysis.

References
[1] K. S. Trivedi, Probability and Statistics with Reliability, Queueing and Computer Science Applications, Chapter 10: Statistical Inference, http://www.ee.duke.edu/~kst/BLUEppt/chap10f_secure.pdf
[2] http://www.unc.edu/~farkouh/usefull/chi.html
[3] D. F. Schmidt and E. Makalic, "Universal Models for the Exponential Distribution", IEEE Transactions on Information Theory, Volume 55, Number 7, pp. 3087–3090, 2009. doi:10.1109/TIT.2009.2018331
[4] Donald E. Knuth (1998). The Art of Computer Programming, volume 2: Seminumerical Algorithms, 3rd edn. Boston: Addison–Wesley. ISBN 0-201-89684-2. See section 3.4.1, p. 133.
[5] Luc Devroye (1986). Non-Uniform Random Variate Generation (http://cg.scs.carleton.ca/~luc/rnbookindex.html). New York: Springer-Verlag. ISBN 0-387-96305-7. See chapter IX (http://cg.scs.carleton.ca/~luc/chapter_nine.pdf), section 2, pp. 392–401.


Erlang distribution

Erlang – probability density function and cumulative distribution function plots. Parameters: shape k (a positive integer) and rate λ > 0 (real); alternatively the scale μ = 1/λ (real). There is no simple closed form for the median.

The Erlang distribution is a continuous probability distribution with wide applicability primarily due to its relation to the exponential and Gamma distributions. The Erlang distribution was developed by A. K. Erlang to examine the number of telephone calls which might be made at the same time to the operators of the switching stations. This work on telephone traffic engineering has been expanded to consider waiting times in queueing systems in general.

The distribution is now used in the fields of stochastic processes and of biomathematics.

Overview
The distribution is a continuous distribution, which has a positive value for all real numbers greater than zero, and is given by two parameters: the shape k, which is a positive integer, and the rate λ, which is a positive real number. The distribution is sometimes defined using the inverse of the rate parameter, the scale μ = 1/λ. When the shape parameter k equals 1, the distribution simplifies to the exponential distribution. The Erlang distribution is a special case of the Gamma distribution where the shape parameter k is an integer; in the Gamma distribution, this parameter is not restricted to the integers.

Characterization

Probability density function
The probability density function of the Erlang distribution is

    f(x; k, λ) = λ^k x^(k−1) e^(−λx) / (k−1)!   for x, λ ≥ 0.

The parameter k is called the shape parameter and the parameter λ is called the rate parameter. An alternative, but equivalent, parametrization uses the scale parameter μ, which is the reciprocal of the rate parameter (i.e. μ = 1/λ):

    f(x; k, μ) = x^(k−1) e^(−x/μ) / ( μ^k (k−1)! )   for x, μ ≥ 0.

When the scale parameter μ equals 2, the distribution simplifies to the chi-square distribution with 2k degrees of freedom. It can therefore be regarded as a generalized chi-square distribution. Because of the factorial function in the denominator, the Erlang distribution is only defined when the parameter k is a positive integer. In fact, this distribution is sometimes called the Erlang-k distribution (e.g., an Erlang-2 distribution is an Erlang distribution with k = 2). The Gamma distribution generalizes the Erlang by allowing k to be any positive real number, using the gamma function instead of the factorial function.

Cumulative distribution function
The cumulative distribution function of the Erlang distribution is

    F(x; k, λ) = γ(k, λx) / (k−1)!,

where γ is the lower incomplete gamma function. The CDF may also be expressed as

    F(x; k, λ) = 1 − Σ_{n=0}^{k−1} e^(−λx) (λx)^n / n!.


Occurrence

Waiting times
Events which occur independently with some average rate are modeled with a Poisson process. The waiting times between k occurrences of the event are Erlang distributed. (The related question of the number of events in a given amount of time is described by the Poisson distribution.) The Erlang distribution, which measures the time between incoming calls, can be used in conjunction with the expected duration of incoming calls to produce information about the traffic load measured in Erlang units. This can be used to determine the probability of packet loss or delay, according to various assumptions made about whether blocked calls are aborted (Erlang B formula) or queued until served (Erlang C formula). The Erlang-B and C formulae are still in everyday use for traffic modeling for applications such as the design of call centers.
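The parenthetical remark can be made precise with a standard identity linking the two views: the k-th arrival of a rate-λ Poisson process has occurred by time x exactly when at least k events fall in [0, x], so

    Pr( Erlang(k, λ) ≤ x ) = Pr( Poisson(λx) ≥ k ) = 1 − Σ_{n=0}^{k−1} e^(−λx) (λx)^n / n!,

which is the second form of the CDF given in the Characterization section.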

Compartment models
The Erlang distribution also occurs as a description of the rate of transition of elements through a system of compartments. Such systems are widely used in biology and ecology. For example, in mathematical epidemiology, an individual may progress at an exponential rate from healthy to carrier and again exponentially from carrier to infectious. The probability of seeing an infectious individual at time t would then be given by an Erlang distribution with k = 2. Such models have the useful property that the variance in the infectious compartment is large. In a pure exponential model the variance is 1/λ², which is often unrealistically small.

Stochastic processes
The Erlang distribution is the distribution of the sum of k independent identically distributed random variables, each having an exponential distribution. The rate of the Erlang distribution is the rate of this exponential distribution.
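This characterization suggests a direct way to simulate Erlang variates: sum k exponential draws. A minimal Java sketch (illustrative names, reusing the inverse-transform idea from the exponential article above):

import java.util.Random;

public class ErlangSampler {
    // Draw one Erlang(k, lambda) variate as the sum of k independent
    // Exponential(lambda) variates, each generated by inverse transform sampling.
    public static double erlang(int k, double lambda, Random rand) {
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            sum += -Math.log(1.0 - rand.nextDouble()) / lambda;
        }
        return sum;
    }

    public static void main(String[] args) {
        Random rand = new Random();
        System.out.println(erlang(2, 0.5, rand)); // one Erlang-2 draw with rate 0.5
    }
}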

See also
• Erlang B formula
• Exponential distribution
• Gamma distribution
• Poisson distribution
• Coxian distribution
• Poisson process
• Erlang unit
• Engset calculation
• Phase-type distribution
• Traffic generation model


External links
• Erlang Distribution [1]
• An Introduction to Erlang B and Erlang C by Ian Angus [2] (PDF document – has terms and formulae plus a short biography)
• Resource Dimensioning Using Erlang-B and Erlang-C [3]
• Erlang-C [4]
• Erlang-B and Erlang-C spreadsheets [5]

References
[1] http://www.xycoon.com/erlang.htm
[2] http://www.tarrani.net/linda/ErlangBandC.pdf
[3] http://www.eventhelix.com/RealtimeMantra/CongestionControl/resource_dimensioning_erlang_b_c.htm
[4] http://www.kooltoolz.com/Erlang-C.htm
[5] http://www.pccl.demon.co.uk/spreadsheets/


Kumaraswamy distribution

Kumaraswamy – probability density function and cumulative distribution function plots. Parameters: a > 0 (real) and b > 0 (real); support x ∈ [0, 1]; the variance, skewness and excess kurtosis are complicated (see text).

In probability and statistics, Kumaraswamy's double-bounded distribution is a family of continuous probability distributions defined on the interval [0, 1], differing in the values of their two non-negative shape parameters, a and b. It is similar to the Beta distribution, but much simpler to use, especially in simulation studies, due to the simple closed form of both its probability density function and cumulative distribution function. This distribution was originally proposed by Poondi Kumaraswamy for variables that are lower and upper bounded.

Characterization

Probability density function
The probability density function of the Kumaraswamy distribution is

    f(x; a, b) = a b x^(a−1) (1 − x^a)^(b−1),   0 < x < 1.

Cumulative distribution function
The cumulative distribution function is therefore

    F(x; a, b) = 1 − (1 − x^a)^b.
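Because the cdf has this simple closed form, inverse transform sampling is immediate, which is one reason (as noted above) that the distribution is convenient in simulation studies. A minimal Java sketch, with illustrative names:

import java.util.Random;

public class KumaraswamySampler {
    // Draw one Kumaraswamy(a, b) variate by inverting F(x) = 1 - (1 - x^a)^b:
    // x = (1 - (1 - u)^(1/b))^(1/a), with u uniform on (0, 1).
    public static double kumaraswamy(double a, double b, Random rand) {
        double u = rand.nextDouble();
        return Math.pow(1.0 - Math.pow(1.0 - u, 1.0 / b), 1.0 / a);
    }

    public static void main(String[] args) {
        Random rand = new Random();
        System.out.println(kumaraswamy(2.0, 5.0, rand)); // one draw with a = 2, b = 5
    }
}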

Generalizing to arbitrary range
In its simplest form, the distribution has a range of [0, 1]. In a more general form, we may replace the normalized variable x with the unshifted and unscaled variable z, where

    x = ( z − z_min ) / ( z_max − z_min ).

The distribution is sometimes combined with a "pike probability" or a Dirac delta function, e.g.:

Properties
The raw moments of the Kumaraswamy distribution are given by

    m_n = ( b Γ(1 + n/a) Γ(b) ) / Γ(1 + b + n/a) = b B(1 + n/a, b),

where B is the Beta function. The variance, skewness, and excess kurtosis can be calculated from these raw moments. For example, the variance is

    σ² = m₂ − m₁².


Relation to the Beta distribution
The Kumaraswamy distribution is closely related to the Beta distribution. Assume that X_(a,b) is a Kumaraswamy-distributed random variable with parameters a and b. Then X_(a,b) is the a-th root of a suitably defined Beta-distributed random variable. More formally, let Y_(1,b) denote a Beta-distributed random variable with parameters α = 1 and β = b. One has the following relation between X_(a,b) and Y_(1,b):

    X_(a,b) = Y_(1,b)^(1/a),

with equality in distribution. One may introduce generalised Kumaraswamy distributions by considering random variables of the form Y_(α,β)^(1/γ), with γ > 0 and where Y_(α,β) denotes a Beta-distributed random variable with parameters α and β. The raw moments of this generalized Kumaraswamy distribution are given by

    m_n = Γ(α + β) Γ(α + n/γ) / ( Γ(α) Γ(α + β + n/γ) ).

Note that we can re-obtain the original moments by setting α = 1, β = b and γ = a. However, in general the cumulative distribution function does not have a closed-form solution.

Example
A good example of the use of the Kumaraswamy distribution is the storage volume of a reservoir of capacity z_max whose upper bound is z_max and lower bound is 0 (Fletcher, 1996).

References
• Kumaraswamy, P. (1980). "A generalized probability density function for double-bounded random processes". Journal of Hydrology 46: 79–88. doi:10.1016/0022-1694(80)90036-0.
• Fletcher, S.G., and Ponnambalam, K. (1996). "Estimation of reservoir yield and storage distribution using moments analysis". Journal of Hydrology 182: 259–275. doi:10.1016/0022-1694(95)02946-X.


Inverse Gaussian distribution

Inverse Gaussian (Wald) – probability density function plot. Parameters: mean μ > 0, shape λ > 0; support x ∈ (0, ∞); the cdf is expressed in terms of Φ, the standard normal (standard Gaussian) distribution c.d.f.

In probability theory, the inverse Gaussian distribution (also known as the Wald distribution) is a two-parameter family of continuous probability distributions with support on (0, ∞). Its probability density function is given by

    f(x; μ, λ) = [ λ / (2π x³) ]^(1/2) exp( −λ (x − μ)² / (2 μ² x) )

for x > 0, where μ > 0 is the mean and λ > 0 is the shape parameter.

As λ tends to infinity, the inverse Gaussian distribution becomes more like a normal (Gaussian) distribution. The inverse Gaussian distribution has several properties analogous to a Gaussian distribution. The name can be misleading: it is an "inverse" only in that, while the Gaussian describes a Brownian motion's level at a fixed time, the inverse Gaussian describes the distribution of the time a Brownian motion with positive drift takes to reach a fixed positive level. Its cumulant generating function (logarithm of the characteristic function) is the inverse of the cumulant generating function of a Gaussian random variable. To indicate that a random variable X is inverse Gaussian-distributed with mean μ and shape parameter λ we write X ~ IG(μ, λ).

Properties

Summation
If Xi has an IG(μ0 wi, λ0 wi²) distribution for i = 1, 2, ..., n and all Xi are independent, then

    S = Σ_{i=1}^{n} Xi ~ IG( μ0 Σ wi, λ0 (Σ wi)² ).

Note that

    Var(Xi) / E(Xi) = μ0² wi² / (λ0 wi²) = μ0² / λ0

is constant for all i. This is a necessary condition for the summation; otherwise S would not be inverse Gaussian.

Scaling
For any t > 0 it holds that

    X ~ IG(μ, λ)   ⟹   tX ~ IG(tμ, tλ).

Exponential family
The inverse Gaussian distribution is a two-parameter exponential family with natural parameters −λ/(2μ²) and −λ/2, and natural statistics X and 1/X.
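This can be read off directly by rearranging the density (a short verification):

    f(x; μ, λ) = √( λ / (2π) ) x^(−3/2) exp( −λ(x − μ)² / (2μ²x) )
               = √( λ / (2π) ) x^(−3/2) e^(λ/μ) exp( −(λ/(2μ²)) x − (λ/2) (1/x) ),

so the coefficients of X and 1/X in the exponent are the natural parameters −λ/(2μ²) and −λ/2.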

Relationship with Brownian motion
The stochastic process Xt given by

    Xt = ν t + σ Wt

(where Wt is a standard Brownian motion and ν > 0) is a Brownian motion with drift ν. Then, the first passage time for a fixed level α > 0 by Xt is distributed according to an inverse Gaussian:

    T_α = inf{ t > 0 : Xt = α } ~ IG( α/ν, α²/σ² ).


When drift is zero
A common special case of the above arises when the Brownian motion has no drift. In that case, the parameter μ tends to infinity, and the first passage time for a fixed level α has probability density function

    f(x) = ( α / (σ √(2π x³)) ) exp( −α² / (2 σ² x) ).

Maximum likelihood
The model

    Xi ~ IG(μ, λ wi),   i = 1, 2, ..., n,

with all wi known, (μ, λ) unknown and all Xi independent, has the following likelihood function:

    L(μ, λ) ∝ λ^(n/2) exp( −(λ/2) Σ_{i=1}^{n} wi (xi − μ)² / (μ² xi) ).

Solving the likelihood equation yields the following maximum likelihood estimates:

    μ̂ = Σ wi xi / Σ wi,    1/λ̂ = (1/n) Σ wi ( 1/xi − 1/μ̂ ).

μ̂ and λ̂ are independent and

    μ̂ ~ IG( μ, λ Σ wi ),    n/λ̂ ~ (1/λ) χ²_(n−1).

Generating random variates from an inverse-Gaussian distribution
Generate a random variate v from a normal distribution with a mean of 0 and a standard deviation of 1. Square the value,

    y = v²,

and use the relation

    x = μ + μ² y / (2λ) − ( μ / (2λ) ) √( 4 μ λ y + μ² y² ).

Generate another random variate u, this time sampled from a uniform distribution between 0 and 1. If

    u ≤ μ / (μ + x)

then return x; else return μ²/x.

Sample code in the Java language:

public double inverseGaussian(double mu, double lambda) {
    Random rand = new Random();
    double v = rand.nextGaussian();  // sample from a normal distribution with a mean of 0 and 1 standard deviation
    double y = v * v;
    double x = mu + (mu * mu * y) / (2 * lambda)
             - (mu / (2 * lambda)) * Math.sqrt(4 * mu * lambda * y + mu * mu * y * y);
    double test = rand.nextDouble(); // sample from a uniform distribution between 0 and 1
    if (test <= mu / (mu + x)) {
        return x;
    } else {
        return (mu * mu) / x;
    }
}

Laplace distribution

Probability density function
A random variable has a Laplace(μ, b) distribution if its probability density function is

    f(x | μ, b) = ( 1 / (2b) ) exp( −|x − μ| / b ).

Here μ is a location parameter and b > 0 is a scale parameter. If μ = 0 and b = 1, the positive half-line is exactly an exponential distribution scaled by 1/2. The pdf of the Laplace distribution is also reminiscent of the normal distribution; however, whereas the normal distribution is expressed in terms of the squared difference from the mean μ, the Laplace density is expressed in terms of the absolute difference from the mean. Consequently the Laplace distribution has fatter tails than the normal distribution.

Cumulative distribution function
The Laplace distribution is easy to integrate (if one distinguishes two symmetric cases) due to the use of the absolute value function. Its cumulative distribution function is as follows:

    F(x) = (1/2) exp( (x − μ)/b )         if x < μ,
    F(x) = 1 − (1/2) exp( −(x − μ)/b )    if x ≥ μ,

which can be written compactly as F(x) = 0.5 [ 1 + sgn(x − μ) ( 1 − exp( −|x − μ| / b ) ) ].

The inverse cumulative distribution function is given by

    F^(−1)(p) = μ − b sgn(p − 0.5) ln( 1 − 2 |p − 0.5| ).

Generating random variables according to the Laplace distribution
Given a random variable U drawn from the uniform distribution in the interval (−1/2, 1/2], the random variable

    X = μ − b sgn(U) ln( 1 − 2 |U| )

has a Laplace distribution with parameters μ and b. This follows from the inverse cumulative distribution function given above. A Laplace(0, b) variate can also be generated as the difference of two i.i.d. Exponential(1/b) random variables. Equivalently, a Laplace(0, 1) random variable can be generated as the logarithm of the ratio of two i.i.d. uniform random variables.
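A minimal Java sketch of the inverse-CDF recipe just described (class and method names are illustrative, mirroring the style of the inverse-Gaussian sample code earlier in this collection):

import java.util.Random;

public class LaplaceSampler {
    // Draw one Laplace(mu, b) variate using U uniform on (-1/2, 1/2).
    public static double laplace(double mu, double b, Random rand) {
        double u = rand.nextDouble() - 0.5;   // uniform on [-1/2, 1/2)
        return mu - b * Math.signum(u) * Math.log(1.0 - 2.0 * Math.abs(u));
    }

    public static void main(String[] args) {
        Random rand = new Random();
        System.out.println(laplace(0.0, 1.0, rand)); // one standard Laplace draw
    }
}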


Parameter estimation
Given N independent and identically distributed samples x1, x2, ..., xN, an estimator μ̂ of μ is the sample median,[1] and the maximum likelihood estimator of b is

    b̂ = (1/N) Σ_{i=1}^{N} | xi − μ̂ |

(revealing a link between the Laplace distribution and least absolute deviations).

Moments

Related distributions
• If X ~ Laplace(μ, b), then |X − μ| ~ Exponential(1/b) is an exponential distribution.
• If X ~ Exponential(λ) and Y, independent of X, takes the values 1 and −1 with equal probability, then XY ~ Laplace(0, 1/λ).
• If X1 ~ Exponential(λ) and X2 ~ Exponential(λ) are independent of each other, then X1 − X2 ~ Laplace(0, 1/λ).
• The generalized Gaussian distribution (version 1) equals the Laplace distribution when its shape parameter is set to 1. The scale parameter is then equal to b.

Relation to the exponential distribution
A Laplace random variable can be represented as the difference of two i.i.d. exponential random variables. One way to show this is by using the characteristic function approach: for any set of independent continuous random variables, the characteristic function of any linear combination of those variables (which uniquely determines the distribution) can be acquired by multiplying the corresponding characteristic functions. Consider two i.i.d. random variables X, Y ~ Exponential(λ). The characteristic functions for X and −Y are

    λ / (λ − i t)   and   λ / (λ + i t),

respectively. On multiplying these characteristic functions (equivalent to the characteristic function of the sum of the random variables X + (−Y)), the result is

    λ² / (λ² + t²) = 1 / (1 + t²/λ²).

This is the same as the characteristic function for Z ~ Laplace(0, 1/λ), which is 1 / (1 + b²t²) with b = 1/λ.


Sargan distributions
Sargan distributions are a system of distributions of which the Laplace distribution is a core member. A pth-order Sargan distribution has density[2] [3] defined for parameters α > 0 and βj ≥ 0. The Laplace distribution results for p = 0.

See also
• Log-Laplace distribution
• Cauchy distribution, also called the "Lorentzian distribution" (the Fourier transform of the Laplace)
• Characteristic function (probability theory)

References
[1] Robert M. Norton (May 1984). "The Double Exponential Distribution: Using Calculus to Find a Maximum Likelihood Estimator" (http://www.jstor.org/pss/2683252). The American Statistician (American Statistical Association) 38 (2): 135–136. doi:10.2307/2683252.
[2] Everitt, B.S. (2002) The Cambridge Dictionary of Statistics, CUP. ISBN 0-521-81099-X
[3] Johnson, N.L., Kotz, S., Balakrishnan, N. (1994) Continuous Univariate Distributions, Wiley. ISBN 0-471-58495-9. p. 60


Lévy distribution

Lévy (unshifted) – probability density function and cumulative distribution function plots. Parameters: location μ, scale c > 0; support x ∈ [μ, ∞); the mean and variance are infinite; the skewness, excess kurtosis and mgf are undefined; the entropy involves γ, Euler's constant.

In probability theory and statistics, the Lévy distribution, named after Paul Pierre Lévy, is a continuous probability distribution for a non-negative random variable. In spectroscopy this distribution, with frequency as the dependent variable, is known as a van der Waals profile.[1]

It is one of the few distributions that are stable and that have probability density functions that are analytically expressible, the others being the normal distribution and the Cauchy distribution. All three are special cases of the stable distributions, which do not generally have an analytically expressible probability density function.

Definition
The probability density function of the Lévy distribution over the domain x ≥ μ is

    f(x; μ, c) = √( c / (2π) ) exp( −c / (2(x − μ)) ) / (x − μ)^(3/2),

where μ is the location parameter and c is the scale parameter. The cumulative distribution function is

    F(x; μ, c) = erfc( √( c / (2(x − μ)) ) ),

where erfc(z) is the complementary error function. The shift parameter μ has the effect of shifting the curve to the right by an amount μ, and changing the support to the interval [μ, ∞). Like all stable distributions, the Lévy distribution has a standard form f(x; 0, 1) which has the following property:

    f(x; μ, c) dx = f(y; 0, 1) dy,

where y is defined as

    y = (x − μ) / c.

The characteristic function of the Lévy distribution is given by

    φ(t; μ, c) = exp( i μ t − √( −2 i c t ) ).

Note that the characteristic function can also be written in the same form used for the stable distribution with α = 1/2 and β = 1:

    φ(t; μ, c) = exp( i μ t − |c t|^(1/2) ( 1 − i sign(t) ) ).

Assuming μ = 0, the nth moment of the unshifted Lévy distribution is formally defined by

    m_n = √( c / (2π) ) ∫_0^∞ x^n e^( −c/(2x) ) x^(−3/2) dx,

which diverges for all n > 0, so that the moments of the Lévy distribution do not exist. The moment generating function is then formally defined by

    M(t; c) = √( c / (2π) ) ∫_0^∞ e^(t x) e^( −c/(2x) ) x^(−3/2) dx,

which diverges for t > 0 and is therefore not defined in an interval around zero, so that the moment generating function is not defined per se. Like all stable distributions except the normal distribution, the wing of the probability density function exhibits heavy-tail behavior, falling off according to a power law:

    f(x; μ, c) ~ √( c / (2π) ) x^(−3/2)   as x → ∞.

This is illustrated in the diagram below, in which the probability density functions for various values of c and μ are plotted on a log-log scale.


Probability density function for the Lévy distribution on a log-log scale.

Related distributions
• Relation to stable distribution: if X ~ Lévy(μ, c), then X ~ Stable(1/2, 1, c, μ).
• Relation to scale-inverse-chi-square distribution: if X ~ Lévy(0, c), then X ~ Scale-inv-χ²(1, c).
• Relation to inverse gamma distribution: if X ~ Lévy(0, c), then X ~ Inv-Gamma(1/2, c/2).
• Relation to normal distribution: if Y ~ Normal(μ, σ²), then (Y − μ)^(−2) ~ Lévy(0, 1/σ²).
• Relation to folded normal distribution: if Y ~ FoldedNormal(0, σ), then Y^(−2) ~ Lévy(0, 1/σ²).

Applications
• The Lévy distribution is of interest to the financial modeling community due to its empirical similarity to the returns of securities.
• It is claimed that fruit flies follow a form of the distribution to find food (Lévy flight).[2]
• The frequency of geomagnetic reversals appears to follow a Lévy distribution.
• The time of hitting a single point (different from the starting point 0) by the Brownian motion has the Lévy distribution.
• The length of the path followed by a photon in a turbid medium follows the Lévy distribution.[3]
• The Lévy distribution has been used post-1987 crash by the Options Clearing Corporation for setting margin requirements because its parameters are more robust to extreme events than those of a normal distribution, and thus extreme events do not suddenly increase margin requirements, which may worsen a crisis.[4]
• The statistics of solar flares are described by a non-Gaussian distribution. The solar flare statistics were shown to be describable by a Lévy distribution and it was assumed that intermittent solar flares perturb the intrinsic fluctuations in Earth's average temperature. The end result of this perturbation is that the statistics of the temperature anomalies inherit the statistical structure that was evident in the intermittency of the solar flare data.[5]


Footnotes
[1] "van der Waals profile" appears with lowercase "van" in almost all sources, such as: Statistical mechanics of the liquid surface by Clive Anthony Croxton, 1980, A Wiley-Interscience publication, ISBN 0471276634, 9780471276630 (http://books.google.it/books?id=Wve2AAAAIAAJ&q="Van+der+Waals+profile"&dq="Van+der+Waals+profile"&hl=en); and in Journal of Technical Physics, Volume 36, by Instytut Podstawowych Problemów Techniki (Polska Akademia Nauk), publisher: Państwowe Wydawn. Naukowe, 1995 (http://books.google.it/books?id=2XpVAAAAMAAJ&q="Van+der+Waals+profile"&dq="Van+der+Waals+profile"&hl=en)
[2] "The Lévy distribution as maximizing one's chances of finding a tasty snack" (http://www.livescience.com/animalworld/070403_fly_tricks.html). Retrieved April 7, 2007.
[3] Rogers, Geoffrey L., "Multiple path analysis of reflectance from turbid media". Journal of the Optical Society of America A, 25:11, pp. 2879–2883 (2008).
[4] Do Economists Make Markets?: On the Performativity of Economics (http://books.google.com/books?id=7BkByw1gtigC) by Donald A. MacKenzie, Fabian Muniesa, Lucia Siu, Princeton University Press, 2007, ISBN 978-0-691-13016-3, p. 80 (http://books.google.com/books?id=7BkByw1gtigC&pg=PA80)
[5] Scafetta, N., Bruce, J.W., "Is climate sensitive to solar variability?" Physics Today, 60, 50–51 (2008) (http://www.fel.duke.edu/~scafetta/pdf/opinion0308.pdf).

Notes

References
• "Information on stable distributions" (http://academic2.american.edu/~jpnolan/stable/stable.html). Retrieved July 13, 2005. – John P. Nolan's introduction to stable distributions, some papers on stable laws, and a free program to compute stable densities, cumulative distribution functions, quantiles, estimate parameters, etc. See especially An Introduction to Stable Distributions, Chapter 1 (http://academic2.american.edu/~jpnolan/stable/chap1.pdf)

Log-logistic distribution

Log-logistic – probability density function and cumulative distribution function plots, for values of β as shown in the legends. Parameters: scale α > 0, shape β > 0; support x ∈ [0, ∞); the mean is defined only if β > 1, else undefined, and the mode is non-zero only if β > 1, 0 otherwise; for the variance, skewness, excess kurtosis and entropy see the main text.

In probability and statistics, the log-logistic distribution (known as the Fisk distribution in economics) is a continuous probability distribution for a non-negative random variable. It is used in survival analysis as a parametric model for events whose rate increases initially and decreases later, for example mortality from cancer following diagnosis or treatment. It has also been used in hydrology to model stream flow and precipitation, and in economics as a simple model of the distribution of wealth or income. The log-logistic distribution is the probability distribution of a random variable whose logarithm has a logistic distribution. It is similar in shape to the log-normal distribution but has heavier tails. Its cumulative distribution function can be written in closed form, unlike that of the log-normal.


Characterisation
There are several different parameterizations of the distribution in use. The one shown here gives reasonably interpretable parameters and a simple form for the cumulative distribution function.[1] [2] The parameter α > 0 is a scale parameter and is also the median of the distribution. The parameter β > 0 is a shape parameter. The distribution is unimodal when β > 1 and its dispersion decreases as β increases.

The cumulative distribution function is

    F(x; α, β) = 1 / ( 1 + (x/α)^(−β) ) = (x/α)^β / ( 1 + (x/α)^β ) = x^β / ( α^β + x^β ),

where x > 0, α > 0, β > 0.

The probability density function is

    f(x; α, β) = (β/α) (x/α)^(β−1) / ( 1 + (x/α)^β )².

Properties

Moments
The kth raw moment exists only when k < β, when it is given by[3] [4]

    E(X^k) = α^k B(1 + k/β, 1 − k/β) = α^k (kπ/β) / sin(kπ/β),

where B() is the beta function. Expressions for the mean, variance, skewness and kurtosis can be derived from this. Writing b = π/β for convenience, the mean is

    E(X) = α b / sin b,   β > 1,

and the variance is

    Var(X) = α² ( 2b / sin 2b − b² / sin² b ),   β > 2.

Explicit expressions for the skewness and kurtosis are lengthy.[5] As β tends to infinity the mean tends to α, the variance and skewness tend to zero and the excess kurtosis tends to 6/5 (see also related distributions below).


Quantiles
The quantile function (inverse cumulative distribution function) is

    F^(−1)(p; α, β) = α ( p / (1 − p) )^(1/β).

It follows that the median is α, the lower quartile is α 3^(−1/β) and the upper quartile is α 3^(1/β).

Applications

Survival analysis
The log-logistic distribution provides one parametric model for survival analysis. Unlike the more commonly used Weibull distribution, it can have a non-monotonic hazard function: when β > 1, the hazard function is unimodal (when β ≤ 1, the hazard decreases monotonically). The fact that the cumulative distribution function can be written in closed form is particularly useful for analysis of survival data with censoring.[6] The log-logistic distribution can be used as the basis of an accelerated failure time model by allowing α to differ between groups, or more generally by introducing covariates that affect α but not β by modelling log(α) as a linear function of the covariates.[7]

Hazard function, for values of β as shown in the legend.

The survival function is

    S(t) = 1 − F(t) = 1 / ( 1 + (t/α)^β ),

and so the hazard function is

    h(t) = f(t) / S(t) = (β/α) (t/α)^(β−1) / ( 1 + (t/α)^β ).

Hydrology
The log-logistic distribution has been used in hydrology for modelling stream flow rates and precipitation.[1] [2]

Economics
The log-logistic has been used as a simple model of the distribution of wealth or income in economics, where it is known as the Fisk distribution.[8] Its Gini coefficient is 1/β.[9]

Related distributions
• If X has a log-logistic distribution with scale parameter α and shape parameter β, then Y = log(X) has a logistic distribution with location parameter log(α) and scale parameter 1/β.
• As the shape parameter

o