Special Distributions - Data Science Initiative...2020/09/04  · Bernoulli Dist. Binomial Dist....

04 - Special Distributions Discrete RVs Bernoulli Dist. Binomial Dist. Poisson Dist. Continuous RVs Uniform Distribution Normal Distribution Gamma Distribution Beta Distribution Bivariate Normal Road Map to Distributions References Special Distributions Brian Vegetabile 2017 Statistics Bootcamp Department of Statistics University of California, Irvine September 14th, 2016 1 / 24



Special Distributions

Brian Vegetabile

2017 Statistics Bootcamp

Department of Statistics

University of California, Irvine

September 14th, 2016


Bernoulli Distribution I

• One of the most important distributions in statistics is the Bernoulli distribution.

• The Bernoulli distribution is used to describe experiments with binary outcomes, say 0 and 1.

  • Think ‘heads’ or ‘tails’, ‘yes’ or ‘no’, ‘win’ or ‘loss’
  • Often called a ‘Bernoulli trial’

• Ultimately, there is some probability p of ‘succeeding’ and a corresponding probability (1 − p) of failing, based upon the rules of probability.


Bernoulli Distribution II

• If we define the value 1 as being a success, we can write this as follows

\[
X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}, \qquad 0 \le p \le 1
\]

• To create a probability mass function, consider

\[
P[X = 1] = p, \qquad P[X = 0] = 1 - p
\]

therefore one way to write the mass function is as follows

\[
P[X = x] = p_X(x) = \begin{cases} p^x (1 - p)^{1 - x} & x = 0, 1 \\ 0 & \text{otherwise} \end{cases}
\]

• Show properties of this distribution: CDF, expectation, variance, MGF...


Bernoulli Distribution III

• It is easy to see that this is a probability mass function.

  • $p_X(x) \ge 0$ for all $x$, and
  • $\sum_x p_X(x) = p + (1 - p) = 1$.

• We can also easily find the mean and variance,

\[
E(X) = \sum_x x \, p_X(x) = 1 \times p + 0 \times (1 - p) = p
\]

\[
E(X^2) = \sum_x x^2 \, p_X(x) = 1^2 \times p + 0^2 \times (1 - p) = p
\]

\[
Var(X) = E(X^2) - (E(X))^2 = p - p^2 = p(1 - p)
\]

• Additionally, we can find the moment generating function for this random variable

\[
E(e^{tX}) = \sum_x e^{tx} \, p_X(x) = e^{t(1)} p + e^{t(0)} (1 - p) = (1 - p) + p e^{t}
\]
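The Bernoulli results above can be checked numerically by evaluating each sum over the support {0, 1}. The following is a minimal sketch; the function names `bernoulli_pmf` and `expectation`, and the values p = 0.3 and t = 0.5, are illustrative choices and do not come from the slides.

```python
import math

def bernoulli_pmf(x, p):
    """Mass function p^x (1 - p)^(1 - x) on {0, 1}, zero elsewhere."""
    if x in (0, 1):
        return p ** x * (1 - p) ** (1 - x)
    return 0.0

def expectation(g, p):
    """E[g(X)] for X ~ Bernoulli(p), summing g(x) p_X(x) over the support."""
    return sum(g(x) * bernoulli_pmf(x, p) for x in (0, 1))

p = 0.3  # illustrative success probability
t = 0.5  # illustrative argument of the moment generating function

mean = expectation(lambda x: x, p)
second_moment = expectation(lambda x: x ** 2, p)
variance = second_moment - mean ** 2
mgf = expectation(lambda x: math.exp(t * x), p)

print("mean:", mean, "closed form:", p)
print("variance:", variance, "closed form:", p * (1 - p))
print("MGF at t:", mgf, "closed form:", (1 - p) + p * math.exp(t))
```

Each printed pair should agree up to floating-point rounding.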


Binomial Distribution I

• Related to the Bernoulli distribution is the Binomial distribution.

• A binomial random variable can arise from a sequence of Bernoulli trials with the properties that

  • Trials are independent events
  • Each trial results in exactly one of the same two mutually exclusive outcomes
  • The probability of success (and consequently of failure) remains constant from trial to trial.

• Therefore a binomial random variable can be considered as the sum of n independent Bernoulli random variables with the same p, that is, the number of successes in n Bernoulli trials.

• Example: Number of ‘heads’ in ten independent coin tosses.
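The "sum of Bernoulli trials" view can be illustrated with a small simulation of the coin-toss example. This is only a sketch: the helper names, the seed, and the number of repetitions are arbitrary assumptions made for illustration.

```python
import random

def bernoulli_trial(p, rng):
    """One Bernoulli trial: 1 (success) with probability p, else 0."""
    return 1 if rng.random() < p else 0

def binomial_draw(n, p, rng):
    """Number of successes in n independent Bernoulli(p) trials."""
    return sum(bernoulli_trial(p, rng) for _ in range(n))

rng = random.Random(0)  # fixed seed so the run is repeatable
n, p = 10, 0.5          # ten tosses of a fair coin

draws = [binomial_draw(n, p, rng) for _ in range(5000)]
average = sum(draws) / len(draws)

print("smallest and largest count:", min(draws), max(draws))
print("average number of heads:", average)
```

Every draw lies between 0 and n, and the average count should be close to n × p = 5.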


Binomial Distribution II

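• We can write the probability mass function in a similar way to the Bernoulli distribution

\[
P[X = x] = p_X(x) = \begin{cases} \binom{n}{x} p^x (1 - p)^{n - x} & x = 0, 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}
\]

• Note: Showing that this is indeed a distribution requires the use of the binomial theorem, where

\[
(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^{i}
\]

Taking $x = 1 - p$ and $y = p$ in the theorem gives $\sum_{i=0}^{n} \binom{n}{i} p^{i} (1 - p)^{n-i} = 1^n = 1$.

• The expectation and variance are also similar

\[
E(X) = np, \qquad Var(X) = np(1 - p)
\]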
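The mass function, its normalization, and the two moments can be verified directly by summing over the support. The sketch below assumes Python 3.8 or later for `math.comb`; the function name `binomial_pmf` and the values n = 10, p = 0.3 are illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """Mass function C(n, x) p^x (1 - p)^(n - x) on {0, ..., n}, zero elsewhere."""
    if isinstance(x, int) and 0 <= x <= n:
        return math.comb(n, x) * p ** x * (1 - p) ** (n - x)
    return 0.0

n, p = 10, 0.3  # illustrative parameters
support = range(n + 1)

total = sum(binomial_pmf(x, n, p) for x in support)
mean = sum(x * binomial_pmf(x, n, p) for x in support)
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in support)

print("total mass:", total)
print("mean:", mean, "closed form:", n * p)
print("variance:", variance, "closed form:", n * p * (1 - p))
```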


Poisson Distribution I

• Another important discrete distribution is the Poisson distribution.

• While the Binomial distribution counts the number of successes in a series of trials, the Poisson distribution counts the number of events in a given time interval.

  • Binomial ‘counts’ are bounded by the number of trials
  • Poisson counts in an interval are not bounded.

• Examples that generally can be modeled with a Poisson distribution

  • The number of misprints on a page (or a group of pages) of a book
  • The number of customers entering a post office on a given day
  • The number of α-particles discharged in a fixed period of time from some radioactive material


Poisson Distribution II

• Additionally, the Poisson distribution can be used to model the number of events that occur in a spatial region.

• The distribution is parameterized by a value λ, often referred to as the rate or intensity of the distribution, which governs the mean of the distribution.

• The mass function is given as follows

\[
f(x|\lambda) = \begin{cases} \dfrac{e^{-\lambda} \lambda^{x}}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}
\]


Poisson Distribution III

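• To verify that this is a distribution, we must show that $\sum_{x=0}^{\infty} f(x|\lambda) = 1$. From calculus, we know the power series characterization $e^{a} = \sum_{n=0}^{\infty} \frac{a^{n}}{n!}$. Thus,

\[
\sum_{x=0}^{\infty} \frac{e^{-\lambda} \lambda^{x}}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^{x}}{x!} = e^{-\lambda} e^{\lambda} = 1
\]

• We can use similar mathematical tricks to derive the mean and variance.

• The Poisson distribution can be used to approximate the Binomial distribution when n is large and p is small, taking λ = np.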
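The normalization, the moments, and the Binomial approximation can all be inspected numerically. The sketch below truncates the infinite sums at 100 terms, which is an assumption that is harmless for the small λ used here. The mean and variance of a Poisson(λ) variable are both λ, a standard result the slides do not derive. The function names and parameter values are illustrative.

```python
import math

def poisson_pmf(x, lam):
    """Mass function e^(-lam) lam^x / x! for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def binomial_pmf(x, n, p):
    """Mass function C(n, x) p^x (1 - p)^(n - x) for x = 0, ..., n."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

lam = 3.0
terms = range(100)  # truncation point for the infinite sums (assumption)

total = sum(poisson_pmf(x, lam) for x in terms)
mean = sum(x * poisson_pmf(x, lam) for x in terms)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in terms)

# Binomial(n, p) with large n and small p against Poisson(n * p)
n, p = 1000, 0.003
max_gap = max(
    abs(binomial_pmf(x, n, p) - poisson_pmf(x, n * p)) for x in range(21)
)

print("total mass:", total)
print("mean and variance:", mean, variance)
print("largest pointwise gap to the Binomial:", max_gap)
```

The gap shrinks further as n grows with n × p held fixed.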


Self-Study: Review Poisson Process

• The Poisson distribution can be derived from a few basic assumptions, which we list below without showing the derivation:

  i) Start with no arrivals
  ii) Arrivals in disjoint time periods are independent
  iii) Number of arrivals depends only on the period length
  iv) Arrival probability is proportional to the period length, if length is small
  v) No simultaneous arrivals
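One way to get a feel for these assumptions is to discretize time: split the interval into many short slots, let each slot independently hold at most one arrival, and make the arrival probability proportional to the slot length. The sketch below does this by simulation. It is an approximation of a Poisson process under those stated assumptions, not the derivation itself, and the rate, slot count, seed, and helper name are illustrative choices.

```python
import random

def count_arrivals(rate, length, n_slots, rng):
    """Count arrivals in an interval of the given length.

    The interval is split into n_slots short slots. Each slot independently
    holds at most one arrival, with probability rate * (slot length).
    """
    p_slot = rate * length / n_slots
    return sum(1 for _ in range(n_slots) if rng.random() < p_slot)

rng = random.Random(1)     # fixed seed so the run is repeatable
rate, length = 3.0, 1.0    # illustrative rate and interval length

counts = [count_arrivals(rate, length, 500, rng) for _ in range(1000)]
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)

print("sample mean of counts:", mean)
print("sample variance of counts:", variance)
```

Both sample values should be near rate × length = 3, as expected for a Poisson count whose mean and variance coincide.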


Uniform Distribution

• The simplest continuous distribution is when mass is spread out ‘uniformly’ on some interval [a, b]

• The density function is as follows:

\[
f(x|a, b) = \begin{cases} \dfrac{1}{b - a} & \text{for } x \in [a, b] \\ 0 & \text{otherwise} \end{cases}
\]

• Quickly show CDF and Expected Values
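The slide leaves the CDF and expected values to be shown in class. As a hedged sketch, the code below uses the standard results F(x) = (x − a)/(b − a) on [a, b], E(X) = (a + b)/2, and Var(X) = (b − a)²/12, and checks the two moments by numerical integration of the density. The helper `midpoint_integral` and the endpoints a = 2, b = 5 are illustrative assumptions.

```python
def uniform_pdf(x, a, b):
    """Density 1 / (b - a) on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """CDF: 0 below a, (x - a) / (b - a) on [a, b], 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def midpoint_integral(g, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

a, b = 2.0, 5.0  # illustrative endpoints

mean = midpoint_integral(lambda x: x * uniform_pdf(x, a, b), a, b)
variance = midpoint_integral(
    lambda x: (x - mean) ** 2 * uniform_pdf(x, a, b), a, b
)

print("mean:", mean, "closed form:", (a + b) / 2)
print("variance:", variance, "closed form:", (b - a) ** 2 / 12)
print("CDF at the midpoint:", uniform_cdf((a + b) / 2, a, b))
```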


Normal Distribution I

• The most “famous” distribution is the Normal distribution, and it is often informally referred to as the ‘bell curve’

• The distribution is symmetric and unbounded on the real line, and concentrates mass at its mean/mode/median.

• It is very useful and can be used to satisfactorily represent many phenomena in the world such as

  • Distribution of heights of Air Force pilots
  • Distribution of IQ scores
  • Distribution of measurement errors

• The distribution plays an important role in the central limit theorem, which is used in much of statistics.


Normal Distribution II

• The density of the distribution is

\[
f(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \quad \text{for } -\infty < x < \infty
\]

• The following are the mean and variance of the distribution

\[
E(X) = \mu, \qquad Var(X) = \sigma^2
\]

• $\sqrt{\sigma^2} = \sigma$ is often referred to as the standard deviation of the distribution.

• We do not derive these properties here.
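Although the properties are not derived here, they can be checked numerically. The sketch below integrates the density over μ ± 12σ, a truncation of the real line chosen for illustration; the mass outside that range is negligible. The helper `midpoint_integral` and the values μ = 1, σ² = 4 are assumptions.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Normal density with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def midpoint_integral(g, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

mu, sigma2 = 1.0, 4.0  # illustrative mean and variance
sigma = math.sqrt(sigma2)
lo, hi = mu - 12 * sigma, mu + 12 * sigma  # truncated integration range

total = midpoint_integral(lambda x: normal_pdf(x, mu, sigma2), lo, hi)
mean = midpoint_integral(lambda x: x * normal_pdf(x, mu, sigma2), lo, hi)
variance = midpoint_integral(
    lambda x: (x - mean) ** 2 * normal_pdf(x, mu, sigma2), lo, hi
)

print("total mass:", total)
print("mean:", mean, "variance:", variance)
```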


Gamma Distribution I

• The Gamma distribution is an important positive-valued distribution

• The Gamma distribution, under various parameter settings, is related to many other named distributions (exponential, Weibull, $\chi^2$, etc.)

• The Gamma distribution also plays important roles throughout Bayesian statistics.


Gamma Distribution II

• An important mathematical relationship for this distribution is that of the gamma function; specifically, provided α is positive,

\[
\Gamma(\alpha) = \int_0^{\infty} t^{\alpha - 1} e^{-t} \, dt.
\]

• Related are two important properties of this function

  1. $\Gamma(\alpha + 1) = \alpha \Gamma(\alpha)$

  2. For any integer $n > 1$, $\Gamma(n) = (n - 1)!$.
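Both properties, and the integral definition itself, can be checked with Python's standard `math.gamma`. In the sketch below, `gamma_integral` is a hypothetical helper that approximates the integral with a midpoint rule on (0, 60); the truncation point and the test value α = 3.5 are illustrative.

```python
import math

def gamma_integral(alpha, upper=60.0, n=200_000):
    """Midpoint-rule approximation of the integral of t^(alpha-1) e^(-t) on (0, upper)."""
    h = upper / n
    return h * sum(
        ((i + 0.5) * h) ** (alpha - 1) * math.exp(-(i + 0.5) * h)
        for i in range(n)
    )

alpha = 3.5  # illustrative positive value

integral_value = gamma_integral(alpha)
library_value = math.gamma(alpha)

# Property 1: Gamma(alpha + 1) = alpha * Gamma(alpha)
recursion_holds = math.isclose(math.gamma(alpha + 1), alpha * math.gamma(alpha))

# Property 2: Gamma(n) = (n - 1)! for integers n > 1
factorial_holds = all(
    math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(2, 11)
)

print("integral:", integral_value, "math.gamma:", library_value)
print("recursion holds:", recursion_holds)
print("factorial identity holds:", factorial_holds)
```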


Gamma Distribution III

• The density of the gamma distribution is

\[
f(x|\alpha, \beta) = \frac{1}{\Gamma(\alpha) \beta^{\alpha}} x^{\alpha - 1} \exp\left( -\frac{x}{\beta} \right) \quad \text{for } x > 0,
\]

where α is the shape parameter, since it controls the ‘peakedness’ of the distribution, and β is the scale parameter, since it mainly influences the spread of the distribution.

• There is also an alternative parameterization... See Wikipedia (This will trip you up).
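The usual alternative is the rate parameterization, in which the second parameter is 1/β. The sketch below shows why mixing the two up matters: the densities agree only when the rate is set to the reciprocal of the scale. The function names and the values α = 2, β = 3, x = 1.7 are illustrative assumptions.

```python
import math

def gamma_pdf_scale(x, alpha, beta):
    """Gamma density with shape alpha and scale beta (the form used above)."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

def gamma_pdf_rate(x, alpha, rate):
    """Gamma density with shape alpha and rate = 1 / scale."""
    return rate ** alpha * x ** (alpha - 1) * math.exp(-rate * x) / math.gamma(alpha)

alpha, beta = 2.0, 3.0  # illustrative shape and scale
x = 1.7

scale_value = gamma_pdf_scale(x, alpha, beta)
rate_value = gamma_pdf_rate(x, alpha, 1 / beta)  # correct conversion
mixed_up_value = gamma_pdf_rate(x, alpha, beta)  # scale passed as a rate by mistake

print("scale form:", scale_value)
print("rate form with rate = 1 / beta:", rate_value)
print("rate form with beta passed as the rate:", mixed_up_value)
```

When using a library, it is worth checking which convention its gamma functions follow before passing parameters.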


Kernel Trick for Integration I

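• To illustrate the ‘kernel trick’ for integration, we find the expected value of the gamma distribution.

\[
\begin{aligned}
E(X) &= \int_0^{\infty} x \, \frac{1}{\Gamma(\alpha) \beta^{\alpha}} x^{\alpha - 1} \exp\left( -\frac{x}{\beta} \right) dx \\
&= \frac{1}{\Gamma(\alpha) \beta^{\alpha}} \int_0^{\infty} x^{(\alpha + 1) - 1} \exp\left( -\frac{x}{\beta} \right) dx
\end{aligned}
\]

• We notice though that if we multiply and divide by $\frac{1}{\Gamma(\alpha + 1) \beta^{\alpha + 1}}$, then the integrand becomes the pdf of a Gamma(α + 1, β) distribution.

\[
E(X) = \frac{\Gamma(\alpha + 1) \beta^{\alpha + 1}}{\Gamma(\alpha) \beta^{\alpha}} \int_0^{\infty} \frac{1}{\Gamma(\alpha + 1) \beta^{\alpha + 1}} x^{(\alpha + 1) - 1} \exp\left( -\frac{x}{\beta} \right) dx
\]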


Kernel Trick for Integration II

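• The integral on the right is that of a pdf, so it equals 1, and we are left with the following expression.

\[
E(X) = \frac{\Gamma(\alpha + 1) \beta^{\alpha + 1}}{\Gamma(\alpha) \beta^{\alpha}} = \alpha \beta
\]

where the last equality holds by the property $\Gamma(\alpha + 1) = \alpha \Gamma(\alpha)$ of the gamma function.

• The kernel trick will become invaluable through the course of the year.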
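The result E(X) = αβ can be compared against a brute-force numerical integral of x f(x|α, β). In the sketch below, the helper `midpoint_integral`, the upper limit of 100, and the values α = 2.5, β = 1.5 are illustrative assumptions.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta, for x > 0."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

def midpoint_integral(g, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

alpha, beta = 2.5, 1.5  # illustrative shape and scale
upper = 100.0           # truncation of the infinite range (assumption)

# Brute force: integrate x * f(x) directly
numeric_mean = midpoint_integral(lambda x: x * gamma_pdf(x, alpha, beta), 0.0, upper)

# Kernel trick: the ratio of normalizing constants
kernel_mean = (math.gamma(alpha + 1) * beta ** (alpha + 1)) / (
    math.gamma(alpha) * beta ** alpha
)

print("numerical integral:", numeric_mean)
print("ratio of constants:", kernel_mean)
print("alpha * beta:", alpha * beta)
```

The kernel trick replaces the integral with a ratio of known constants, so no integration is needed once the integrand is recognized as an unnormalized density.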


Special Gamma Distributions

• The gamma(α, β) family has many special distributions.

• When α = 1, the gamma distribution reduces to the exponential distribution

• If α = p/2, where p is an integer, and β = 2, then the gamma distribution becomes a $\chi^2$ distribution with p degrees of freedom

• The $\chi^2$ distribution will become very important throughout the year.

• The list goes on and on....
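Both special cases can be spot-checked. The sketch below compares the gamma density at α = 1 with the exponential density (1/β) e^(−x/β), and compares the gamma mean αβ = p for the $\chi^2$ case with a simulation. The simulation relies on the standard fact, not stated in the slides, that a sum of p squared independent standard normals is $\chi^2$ with p degrees of freedom. The seed, sample size, and parameter values are illustrative.

```python
import math
import random

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta, for x > 0."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

def exponential_pdf(x, beta):
    """Exponential density with scale beta, for x > 0."""
    return math.exp(-x / beta) / beta

# Case alpha = 1: the gamma density matches the exponential density
beta = 2.5
exp_gap = max(
    abs(gamma_pdf(x, 1.0, beta) - exponential_pdf(x, beta)) for x in (0.1, 1.0, 4.0)
)

# Case alpha = p / 2, beta = 2: the gamma mean alpha * beta equals p
p = 4
gamma_mean = (p / 2) * 2

rng = random.Random(2)  # fixed seed so the run is repeatable
draws = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(p)) for _ in range(20_000)]
sim_mean = sum(draws) / len(draws)

print("largest gap to the exponential density:", exp_gap)
print("gamma mean:", gamma_mean, "simulated chi-square mean:", sim_mean)
```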


Beta Distribution I

• Another important distribution that will come up often is the Beta distribution, which describes a continuous and bounded random variable.

• The density is continuous on the interval (0, 1) and is indexed by the parameters α and β.

• Most frequently used in Bayesian statistics to model a priori beliefs about proportions.

• There is a more general family of beta distributions for general intervals


Beta Distribution II

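• The distribution relies on the relationship

\[
B(\alpha, \beta) = \int_0^1 x^{\alpha - 1} (1 - x)^{\beta - 1} \, dx,
\]

where $B(\alpha, \beta) = \dfrac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)}$.

• Thus the density is

\[
f(x|\alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1} \quad \text{for } x \in [0, 1],\ \alpha > 0,\ \beta > 0.
\]

• When β = α = 1 the beta reduces to the Uniform distribution on (0, 1).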
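The relationship between the beta integral and the gamma function, and the reduction to the Uniform distribution, can be checked numerically. In the sketch below, the helper names and the values α = 2.5, β = 3.5 are illustrative. Parameters of at least 1 are used so that the integrand stays bounded at the endpoints.

```python
import math

def beta_function(alpha, beta):
    """B(alpha, beta) written through the gamma function."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

def beta_pdf(x, alpha, beta):
    """Beta density on [0, 1], zero elsewhere."""
    if 0.0 <= x <= 1.0:
        return x ** (alpha - 1) * (1 - x) ** (beta - 1) / beta_function(alpha, beta)
    return 0.0

def midpoint_integral(g, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

alpha, beta = 2.5, 3.5  # illustrative parameters

integral = midpoint_integral(
    lambda x: x ** (alpha - 1) * (1 - x) ** (beta - 1), 0.0, 1.0
)
closed_form = beta_function(alpha, beta)

# With alpha = beta = 1 the density is constant and equal to 1 on (0, 1)
uniform_values = [beta_pdf(x, 1.0, 1.0) for x in (0.1, 0.5, 0.9)]

print("integral:", integral, "gamma-function form:", closed_form)
print("Beta(1, 1) density values:", uniform_values)
```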


Bivariate Normal Distributions

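• To introduce multivariate distributions, we define the bivariate normal distribution.

• A RV $X = (X_1, X_2)$ has the bivariate normal distribution $N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ if (for some $\sigma_i > 0$, $-1 < \rho < 1$, and real-valued $\mu_i$)

\[
f(x|\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left\{ \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right) \left( \frac{x_2 - \mu_2}{\sigma_2} \right) + \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right\} \right)
\]

• When ρ = 0 the cross term vanishes and the density factors into the product of two normal densities, so $X_1$ and $X_2$ are independent normal random variables.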
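The factorization at ρ = 0 can be confirmed at any test point. The sketch below implements the density with the standard −2ρ cross term. Note that the functions take the standard deviations σ₁ and σ₂ rather than the variances; the function names and the chosen point and parameters are illustrative assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

def bivariate_normal_pdf(x1, x2, mu1, mu2, sigma1, sigma2, rho):
    """Bivariate normal density; sigma1 and sigma2 are standard deviations."""
    z1 = (x1 - mu1) / sigma1
    z2 = (x2 - mu2) / sigma2
    quad = z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2
    norm = 2 * math.pi * sigma1 * sigma2 * math.sqrt(1 - rho ** 2)
    return math.exp(-quad / (2 * (1 - rho ** 2))) / norm

x1, x2 = 0.4, -1.2  # illustrative evaluation point
params = dict(mu1=0.0, mu2=1.0, sigma1=1.5, sigma2=0.5)

joint = bivariate_normal_pdf(x1, x2, rho=0.0, **params)
product = normal_pdf(x1, params["mu1"], params["sigma1"]) * normal_pdf(
    x2, params["mu2"], params["sigma2"]
)
correlated = bivariate_normal_pdf(x1, x2, rho=0.6, **params)

print("joint density at rho = 0:", joint)
print("product of the marginals:", product)
print("joint density at rho = 0.6:", correlated)
```

With a nonzero ρ the joint density no longer matches the product of the marginals, which reflects the dependence between the two components.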


Roadmap of Univariate Distributions

• https://en.wikipedia.org/wiki/Relationships_among_probability_distributions

