
Probability, STAT 416, Spring 2007

2 Discrete Distributions

1. Introduction

2. Mean and Variance

3. Binomial Distribution

4. Poisson Distribution

5. Other Discrete Distributions


2.1 Introduction

Example: Fair die. Observations: 1, 2, 3, 4, 5, 6; each observation has probability p = 1/6:

P (1) = 1/6, P (2) = 1/6, . . .

We observe realizations of a random variable

Random variable: a map from a (suitable) probability space into the real numbers, X : Ω → R

Examples:

Ω = {1, 2, 3, 4, 5, 6}, P (i) = 1/6, i = 1, . . . , 6

X(i) = i

Example continued

Two fair dice; sum of the observations X = X1 + X2

X1 and X2 are both random variables as before (independent)

Ω = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, P (2) = P (12) = 1/36

P (3) = P (11) = 2/36

P (4) = P (10) = 3/36

P (5) = P (9) = 4/36

P (6) = P (8) = 5/36

P (7) = 6/36

X : Ω → R, X(i) = i
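These probabilities can be checked by brute-force enumeration; the following Python sketch (not part of the original slides) tallies all 36 equally likely outcomes:

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of two fair dice
# and tally the distribution of the sum X = X1 + X2.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
P = {s: Fraction(c, 36) for s, c in counts.items()}

print(P[2], P[7], P[12])  # 1/36 1/6 1/36
```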


Discrete random variable

Sample space Ω with a finite or countable number of elements, i.e. index set N: Ω = {x1, x2, x3, . . .}

It is always possible to identify the sample space Ω with the set of all possible observations of the random variable

The random variable X then has the form X : Ω → R, X(xi) = xi

X is fully described by its probability function:

P : Ω → [0, 1], P (xi) = pi

The probabilities of the elementary events fully describe the distribution of a discrete random variable

Cumulative distribution function (cdf)

F : R → [0, 1], F (x) = P (X ≤ x)

Example: Fair die

[Figure: step-function CDF F (x) = P (X ≤ x) of a fair die, jumping by 1/6 at each of x = 1, . . . , 6]

Uniform distribution

n possible events with equal probability

Ω = {1, . . . , n}, P (i) = 1/n

Cumulative distribution function:

F (x) =
  0,    x < 1
  i/n,  i ≤ x < i + 1, i = 1, . . . , n − 1
  1,    x ≥ n

At each x ∈ Ω the CDF has a jump of size 1/n

⇒ connection between CDF and probability function

P (i) = F (i) − F (i − 1), for i ∈ Ω
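As a quick sketch (assuming the fair die, n = 6), the connection between the CDF and the probability function can be verified numerically:

```python
from fractions import Fraction

def F(x, n=6):
    """CDF of the uniform distribution on {1, ..., n}: 0 below 1,
    i/n on the interval [i, i+1), and 1 from n onward."""
    if x < 1:
        return Fraction(0)
    if x >= n:
        return Fraction(1)
    return Fraction(int(x), n)

# Recover the probability function from the jumps of the CDF:
probs = [F(i) - F(i - 1) for i in range(1, 7)]
print(probs)
```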


Properties of the CDF

Specifically for discrete random variables:

The CDF is a monotonically increasing step function with jumps at the events with positive probability

In general the CDF satisfies:

• P (x) = F (x) − F (x−), where F (x−) = lim_{h→x, h<x} F (h), due to the definition F (x) = P (X ≤ x)

• P (a < X ≤ b) = F (b) − F (a)

• lim_{a→−∞} F (a) = 0, lim_{b→∞} F (b) = 1

• F (x) is monotonically increasing

Exercise

CDF of a random variable X given by

F (x) =
  0,          x < 1
  1 − 2^(−k), k ≤ x < k + 1, k = 1, 2, . . .

1. Draw the CDF in the range x ∈ [0, 5]

2. Determine the probability function of X

3. Compute the probability of X > 5

2.2 Mean and Variance

Essential properties of a distribution, important for practical purposes

⇒ reduction of the information in the data

The mean is a measure of central tendency, also called the expected value; it corresponds to the arithmetic mean of a sample

The variance is a measure of dispersion; it corresponds to the deviation from the mean of a sample

Both figures are based on the moments of the distribution; they are of major importance specifically for the normal distribution

Mean

Discrete random variable X with probability space (Ω, P)

Definition of the mean:

E(X) = ∑_{x∈Ω} x P (x)

Weighted sum of the values of Ω; the weights are the corresponding probabilities of the events

Usual notation: µ = E(X)

Example: Fair die:

E(X) = 1 · 1/6 + 2 · 1/6 + · · · + 6 · 1/6 = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 3.5
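The same computation as a short Python check (a sketch, not from the slides; exact arithmetic via Fraction):

```python
from fractions import Fraction

# E(X) for a fair die: weighted sum of the outcomes 1..6, weight 1/6 each.
mean = sum(x * Fraction(1, 6) for x in range(1, 7))
print(mean)  # 7/2, i.e. 3.5
```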


Transformation of random variables

Discrete random variable X with probability space (Ω, P)

Specifically, for all x ∈ Ω: P (x) = px

Additionally given f : Ω → R with image set f(Ω)

Definition: f(X) is the random variable Y : f(Ω) → R with

Y (y) = y and P (y) = ∑_{x∈Ω: f(x)=y} px

I.e. the values of events x ∈ Ω are transformed into f(x); the probabilities of all x with the same image f(x) are added

Examples for transformation

1) Fair die, f(x) = x², Y = X²:

Y (y) = y with y ∈ ΩY := {1, 4, 9, 16, 25, 36}
P (1) = P (4) = P (9) = P (16) = P (25) = P (36) = 1/6

2) Fair die, g(x) = (x − 3.5)², Z = (X − 3.5)²:

Z(z) = z with z ∈ ΩZ := {2.5², 1.5², 0.5²} = {6.25, 2.25, 0.25}
P (6.25) = p1 + p6 = 1/3
P (2.25) = p2 + p5 = 1/3
P (0.25) = p3 + p4 = 1/3

Exercise: Ω = {−1, 0, 1}, P (X = −1) = P (X = 1) = 1/4, P (X = 0) = 1/2

Compute the distributions of Y = X² and Z = X³

Expectation of functions

Example: Fair die (continued):

1) E(f(X)) = E(Y ) = 1 · 1/6 + 4 · 1/6 + · · · + 36 · 1/6 = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6 = 15.1667

2) E(g(X)) = E(Z) = 6.25/3 + 2.25/3 + 0.25/3 = 2.9167

In general the expectation of f(X) is computed as:

E(f(X)) = ∑_{x∈Ω} f(x) P (x)

Weighted sum of the values of f(Ω)

Note: grouping the terms with equal image f(x) = y gives ∑_{x∈Ω} f(x) P (x) = ∑_{y∈f(Ω)} y PY (y)

Linear Transformation

For general a, b ∈ R:

E(aX + b) = aE(X) + b

Proof:

E(aX + b) = ∑_{x∈Ω} (ax + b) P (x)

= a ∑_{x∈Ω} x P (x) + b ∑_{x∈Ω} P (x)

= a E(X) + b

Specifically: E(X − µ) = E(X − E(X)) = 0

Variance

Definition:

Var (X) := E((X − µ)²)

Usual notation: σ² = Var (X)

σ . . . standard deviation: SD(X) = √Var (X)

It holds that Var (X) = E(X²) − µ²:

E((X − µ)²) = ∑_{x∈Ω} (x − µ)² P (x) = ∑_{x∈Ω} (x² − 2µx + µ²) P (x)

= ∑_{x∈Ω} x² P (x) − 2µ ∑_{x∈Ω} x P (x) + µ² ∑_{x∈Ω} P (x)

= E(X²) − 2µ² + µ² = E(X²) − µ²
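Both sides of this identity can be checked for the fair die (an illustrative sketch; the result 35/12 ≈ 2.9167 matches E(Z) from the transformation example):

```python
from fractions import Fraction

p = Fraction(1, 6)
omega = range(1, 7)
mu = sum(x * p for x in omega)                     # 7/2

var_def = sum((x - mu) ** 2 * p for x in omega)    # E((X - mu)^2)
var_alt = sum(x * x * p for x in omega) - mu ** 2  # E(X^2) - mu^2
print(var_def, var_alt)  # 35/12 35/12
```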


Example for variance

Three random variables X1, X2, X3

X1 = 0 with probability 1

X2 uniformly distributed on {−1, 0, 1}
X3 uniformly distributed on {−50, −25, 0, 25, 50}

All three random variables have mean 0

Var (X1) = 0² · P (0) = 0

Var (X2) = (−1)² · 1/3 + 1² · 1/3 = 2/3

Var (X3) = (−50)² · 1/5 + (−25)² · 1/5 + 25² · 1/5 + 50² · 1/5 = 1250

The variance gives additional information about the distribution

Properties of variance

For general a, b ∈ R:

Var (aX + b) = a2Var (X)

Proof:

Var (aX + b) = E((aX + b − aµ − b)²) = a² E((X − µ)²) = a² Var (X)

Specifically: Var (−X) = Var (X)

Var (X + b) = Var (X)

At times E(X²) − µ² is easier to compute than E((X − µ)²)

Exercise: Compute the variance of a fair die with both formulas

Moments of a distribution

k-th moment of a random variable: mk := E(X^k)

k-th central moment: zk := E((X − µ)^k)

m1 . . . mean

z2 = m2 − m1² . . . variance

The third and fourth moments are also of practical importance

Skewness: ν(X) := z3/σ³ = E(X∗³), where X∗ := (X − µ)/σ

• ν(X) = 0 . . . symmetric distribution

• ν(X) < 0 . . . left skewed

• ν(X) > 0 . . . right skewed

Kurtosis: z4/σ⁴ = E(X∗⁴) (has to do with curvature → normal distribution)
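A small helper (a sketch, not part of the slides) that computes mean, variance, and skewness of a discrete distribution given as a dict {x: P(x)}; as a sanity check, a symmetric distribution has skewness 0:

```python
from math import sqrt

def moments(dist):
    """Return (mean, variance, skewness) of a discrete distribution."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    sd = sqrt(var)
    # Skewness: third moment of the standardized variable (X - mu)/sd.
    skew = sum(((x - mu) / sd) ** 3 * p for x, p in dist.items())
    return mu, var, skew

print(moments({-1: 0.25, 0: 0.5, 1: 0.25}))
```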


Exercise: Skewness

Random variable X has the following distribution:

P (1) = 0.05, P (2) = 0.1, P (3) = 0.3, P (4) = 0.5, P (5) = 0.05

Draw the probability function and the CDF

Compute the skewness!

Compute the skewness for the slightly changed distribution

P (1) = 0.05, P (2) = 0.3, P (3) = 0.3, P (4) = 0.3, P (5) = 0.05

2.3 Binomial distribution

Bernoulli trial: Two possible outcomes (0 or 1)

P (X = 1) = p, P (X = 0) = q where q = 1− p

E.g. fair coin: p = 1/2

Example: Throw an unfair coin twice, P (head) = p = 0.7. Compute the probability distribution of Z, the number of heads!

Sample space ΩZ = {0, 1, 2}; the two coins are thrown independently!

P (Z = 0) = P (X1 =0, X2 =0) = P (X1 =0) P (X2 =0) = 0.3² = 0.09

P (Z = 1) = P (X1 =0, X2 =1) + P (X1 =1, X2 =0) = 2 · P (X1 =0) P (X2 =1) = 2 · 0.3 · 0.7 = 0.42

P (Z = 2) = P (X1 =1, X2 =1) = P (X1 =1) P (X2 =1) = 0.7² = 0.49

Binomial distribution

n independent Bernoulli trials, P (X = 1) = p

Y : the number of successes (trials with outcome 1) is binomially distributed:

P (Y = k) = (n choose k) p^k q^(n−k)

Proof: Independence ⇒ the probability of each single sequence with k successes (1) and n − k failures (0) is p^k (1 − p)^(n−k)

Number of such sequences: k-combinations without replacement, i.e. (n choose k)

Notation: Y ∼ B(n, p)

Exercise: Throw five fair coins independently

Compute the distribution of the number of heads!
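The exercise can be checked with a short sketch (exact arithmetic via Fraction; not part of the slides):

```python
from math import comb
from fractions import Fraction

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p): C(n, k) p^k q^(n-k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Number of heads in five throws of a fair coin:
dist = [binom_pmf(k, 5, Fraction(1, 2)) for k in range(6)]
print([float(q) for q in dist])
```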


Example binomial distribution

Exam with a failure rate of 20%

Distribution of the number of successes among 10 students?

P (X = 7) = (10 choose 7) · 0.8⁷ · 0.2³ = 0.2013

[Figure: probability function of B(10, 0.8), k = 0, . . . , 10]

Examples binomial distribution: n = 10

[Figure: probability functions of B(10, p) for p = 0.1, 0.2, 0.3, 0.5]

Exercise: S.R. Example 6f

Communication system with n components; each component functions independently with probability p

The total system operates if at least one half of its components work

1. For which values of p is a 5-component system more likely to work than a 3-component system?

2. Generalize: for which values of p is a (2k + 1)-component system more likely to work than a (2k − 1)-component system?

Application: Drawing with replacement

Population of N objects

• M of the N objects have some property E
• Draw n objects with replacement

The number X of drawn objects with property E is binomially distributed:

X ∼ B(n, M/N)

Exercise: Bowl with 3 black and 9 white balls; draw 5 balls with replacement, X . . . number of drawn balls that are black

• Probability function of X?

• Expected value of X?

Mean of binomial distribution

X ∼ B(n, p) ⇒ E(X) = np

Using k (n choose k) = n (n−1 choose k−1) we obtain

E(X) = ∑_{k=1}^{n} k (n choose k) p^k q^(n−k) = np ∑_{k=1}^{n} (n−1 choose k−1) p^(k−1) q^(n−k)

= np ∑_{i=0}^{n−1} (n−1 choose i) p^i q^(n−1−i)

and due to the binomial theorem

∑_{i=0}^{n−1} (n−1 choose i) p^i q^(n−1−i) = (p + q)^(n−1) = 1

Alternative proof: differentiate (p + q)^n = 1 w.r.t. p

Variance of binomial distribution

X ∼ B(n, p) ⇒ Var (X) = npq

Again using k (n choose k) = n (n−1 choose k−1) we obtain

E(X²) = ∑_{k=1}^{n} k² (n choose k) p^k q^(n−k) = np ∑_{k=1}^{n} k (n−1 choose k−1) p^(k−1) q^(n−k)

= np ∑_{i=0}^{n−1} (i + 1) (n−1 choose i) p^i q^(n−1−i) = np ((n − 1)p + 1)

and thus

Var (X) = E(X²) − µ² = np ((n − 1)p + 1) − (np)² = np(1 − p)

Alternative proof: differentiate (p + q)^n = 1 twice w.r.t. p

2.4 Poisson distribution

Definition: Ω = N0 = {0, 1, 2, . . .}

P (X = k) = (λ^k / k!) e^(−λ),   λ > 0

Notation: X ∼ P(λ)

A Poisson-distributed random variable can in principle take arbitrarily large values, though with very small probability

Example: λ = 2

P (X ≤ 1) = (2⁰/0!) e^(−2) + (2¹/1!) e^(−2) = (1 + 2) e^(−2) = 0.4060

P (X > 4) = 1 − P (X ≤ 4) = 1 − (1 + 2 + 4/2 + 8/6 + 16/24) e^(−2) = 1 − 0.9473 = 0.0527
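A sketch reproducing these two probabilities (floating point, so the values match the slide to four decimals):

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """P(X = k) = lam^k / k! * e^(-lam) for X ~ P(lam)."""
    return lam ** k / factorial(k) * exp(-lam)

lam = 2.0
p_le_1 = pois_pmf(0, lam) + pois_pmf(1, lam)          # P(X <= 1)
p_gt_4 = 1 - sum(pois_pmf(k, lam) for k in range(5))  # P(X > 4)
print(round(p_le_1, 4), round(p_gt_4, 4))  # 0.406 0.0527
```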


Examples Poisson distribution

[Figure: probability functions of P(λ) for λ = 1, 1.5, 3, 5]

Application

Used to model rare events

Examples

• Number of clients within a certain time frame

• Radioactive decay

• Number of errors per slide

• Number of people older than 100 years (per 1 000 000)

• Number of false alarms per day

• etc.

Connection between Poisson distributed events and the time between two events ⇒ exponential distribution

Assumptions

Events happening at certain points in time are Poisson-distributed under the following assumptions:

• The probability that exactly 1 event occurs within a given time interval of length h is approximately λh

• The probability that 2 or more events occur within a given time interval of length h is very small compared to h

• For two time intervals that do not overlap, the number of events in one interval is independent of the number of events in the other interval

Then for each time interval [t1, t2] the number of occurring events is Poisson distributed with parameter λ(t2 − t1).

Example

Suppose that the number of earthquakes per week is Poissondistributed with parameter λ = 2

1. What is the probability of at least 3 earthquakes during thenext week?

2. What is the probability of at least 3 earthquakes during thenext two weeks?

Solution: 1) P (X ≥ 3) = 1 − P (X ≤ 2) = 1 − (1 + 2 + 4/2) e^(−2) = 0.3233

2) Now we have a time interval of 2 weeks, therefore we get a Poisson distribution with parameter 2λ = 4

P (X ≥ 3) = 1 − P (X ≤ 2) = 1 − (1 + 4 + 16/2) e^(−4) = 0.7619
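Both answers can be verified with a short sketch (the Poisson CDF is just a partial sum of the pmf):

```python
from math import exp, factorial

def pois_cdf(k, lam):
    """P(X <= k) for X ~ P(lam)."""
    return sum(lam ** j / factorial(j) * exp(-lam) for j in range(k + 1))

p_one_week  = 1 - pois_cdf(2, 2.0)  # parameter lam = 2
p_two_weeks = 1 - pois_cdf(2, 4.0)  # parameter 2*lam = 4
print(round(p_one_week, 4), round(p_two_weeks, 4))  # 0.3233 0.7619
```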


Mean and variance

X ∼ P(λ) ⇒ E(X) = λ

Proof:

E(X) = ∑_{k=0}^{∞} k (λ^k / k!) e^(−λ) = e^(−λ) ∑_{k=1}^{∞} λ^k / (k − 1)! = λ e^(−λ) ∑_{j=0}^{∞} λ^j / j! = λ

X ∼ P(λ) ⇒ Var (X) = λ

Proof:

E(X²) = ∑_{k=0}^{∞} k² (λ^k / k!) e^(−λ) = e^(−λ) ∑_{k=1}^{∞} k λ^k / (k − 1)! = λ e^(−λ) ∑_{j=0}^{∞} (j + 1) λ^j / j! = λ(λ + 1)

E(X²) − E(X)² = λ(λ + 1) − λ² = λ

Exercise

Suppose that a book has on average one typo on every third page.

1. What is the probability that you find at least two errors on the page that you are reading right now?

2. What is the probability that you find at least two errors within 10 pages?

3. What is the probability that you find at least two errors on any of 10 pages?

Approximation of binomial distribution

X ∼ B(n, p), where n is large and p is small (e.g. n > 10 and p < 0.05)

⇒ X ∼ P(np), i.e. X is approximately Poisson-distributed with parameter λ = np

Motivation: Let λ := np

P (X = k) = n! / (k! (n − k)!) · p^k q^(n−k) = [n(n − 1) · · · (n − k + 1) / k!] · (λ^k / n^k) · (1 − λ/n)^n / (1 − λ/n)^k

For n large and moderate λ (i.e. p small) we have

n(n − 1) · · · (n − k + 1) / n^k ≈ 1,   (1 − λ/n)^k ≈ 1,   (1 − λ/n)^n ≈ e^(−λ)

and thus P (X = k) ≈ (λ^k / k!) e^(−λ)

Example Poisson approximation

Comparison of the Poisson approximation (λ = 0.5) with the exact CDF of the binomial distribution (n = 10, p = 0.05)

[Figure: CDFs on 0, . . . , 6; blue: B(10, 0.05), red: P(0.5)]

Binomial:

P (X ≤ 3) = 0.95¹⁰ + 10 · 0.05 · 0.95⁹ + 45 · 0.05² · 0.95⁸ + 120 · 0.05³ · 0.95⁷ = 0.99897150206211

Poisson approximation:

P (X ≤ 3) ≈ (1 + 0.5 + 0.5²/2 + 0.5³/6) e^(−0.5) = 0.99824837744371
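The comparison above can be reproduced in a few lines (a sketch, not from the slides):

```python
from math import comb, exp, factorial

n, p = 10, 0.05
lam = n * p  # 0.5

# P(X <= 3) under the exact binomial and under the Poisson approximation:
binom_cdf3 = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(4))
pois_cdf3 = sum(lam ** k / factorial(k) * exp(-lam) for k in range(4))
print(binom_cdf3, pois_cdf3)
```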


2.5 Other Discrete Distributions

We will discuss

• Geometric

• Hypergeometric

Apart from that

• Negative binomial (more general: Panjer)

• Generalized Poisson

• Zeta distribution

• etc.

Wikipedia is very helpful


Geometric distribution

Independent Bernoulli trials with success probability p

X . . . number of trials until the first success

Therefore P (X = k) = q^(k−1) p

(k − 1 failures, each with probability q = 1 − p, followed by one success)

Exercise: Bowl with N white and M black balls

Drawing with replacement

a) Probability that it takes exactly k trials until one draws a black ball

b) Probability that it takes at most k trials until one draws a black ball
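A sketch of the geometric pmf and CDF; the bowl's N and M are not fixed on this slide, so the illustrative values N = 9 white and M = 3 black (matching the earlier drawing-with-replacement exercise, giving p = M/(N + M) = 1/4) are assumed here:

```python
from fractions import Fraction

def geom_pmf(k, p):
    """P(X = k) = q^(k-1) p: first success exactly on trial k."""
    return (1 - p) ** (k - 1) * p

def geom_cdf(k, p):
    """P(X <= k) = 1 - q^k: a success within the first k trials."""
    return 1 - (1 - p) ** k

p = Fraction(1, 4)  # assumed: 3 black balls out of 12 in total
print(geom_pmf(3, p), geom_cdf(3, p))
```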


Geometric distribution

Compare the shape of this distribution later with the density of the exponential distribution

[Figure: probability function of a geometric distribution, k = 1, . . . , 10]

Memorylessness

Mean and variance

Note that ∑_{j=0}^{∞} q^j = 1/(1 − q) and thus

∑_{k=1}^{∞} q^(k−1) p = p/(1 − q) = p/p = 1

Differentiate: ∑_{k=1}^{∞} k q^(k−1) = d/dq ∑_{k=0}^{∞} q^k = 1/(1 − q)²

E(X) = ∑_{k=1}^{∞} k q^(k−1) p = p/(1 − q)² = 1/p

Differentiate again: ∑_{k=1}^{∞} k(k − 1) q^(k−2) = d²/dq² ∑_{k=0}^{∞} q^k = 2/(1 − q)³

E(X²) = ∑_{k=1}^{∞} k² q^(k−1) p = pq ∑_{k=1}^{∞} k(k − 1) q^(k−2) + p ∑_{k=1}^{∞} k q^(k−1) = 2pq/p³ + 1/p

And thus: Var (X) = E(X²) − E(X)² = 2q/p² + 1/p − 1/p² = (1 − p)/p²

Hypergeometric distribution

Binomial distribution: Drawing with replacement

Exercise: Bowl with 3 black balls and 5 white balls; draw 4 balls with and without replacement, respectively.

Compute for both cases the distribution of the number of drawn black balls!

[Figure: probability functions of the number of drawn black balls, k = 0, . . . , 4; left: with replacement, right: without replacement]

Hypergeometric distribution

N objects of which M have some property E. Draw n objects without replacement; X is the number of drawn objects with property E.

P (X = k) = (M choose k)(N−M choose n−k) / (N choose n)

We use the convention (a choose b) = 0 whenever a < b

Clearly P (X = k) = 0 if M < k

I cannot draw more black balls than there are in the bowl

Also clear: P (X = k) = 0 if N − M < n − k

I cannot draw more white balls than there are in the bowl

Thus: Ω = {k : max(0, n − N + M) ≤ k ≤ min(n, M)}
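A sketch of the pmf; Python's math.comb already returns 0 when the lower index exceeds the upper one, which implements the convention above:

```python
from math import comb
from fractions import Fraction

def hyper_pmf(k, N, M, n):
    """P(X = k) for drawing n of N objects without replacement,
    M of which have property E (comb(a, b) = 0 when b > a)."""
    return Fraction(comb(M, k) * comb(N - M, n - k), comb(N, n))

# Bowl exercise: 3 black, 5 white, draw 4 without replacement.
dist = [hyper_pmf(k, 8, 3, 4) for k in range(5)]
print(dist)
```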


Mean and variance

Without proof (easy but slightly tedious computations):

E(X) = nM/N,   Var (X) = n (M/N)(1 − M/N)(N − n)/(N − 1)

Define p := M/N and compare with the binomial distribution:

E(X) = np, the same formula as for the binomial

Var (X) = np(1 − p)(N − n)/(N − 1), asymptotically like the binomial, because lim_{N→∞} (N − n)/(N − 1) = 1

If N and M are very large compared to n, then approximately X ∼ B(n, M/N) (without proof)

Example hypergeometric distribution

Quality control: delivery of 30 boxes of eggs; 10 boxes contain at least one broken egg. Take a sample of size 6.

• Compute the probability that two boxes within the sample contain broken eggs

N = 30, M = 10, n = 6

P (X = 2) = (10 choose 2)(20 choose 4) / (30 choose 6) = 0.3672

• Mean and variance of the number of boxes within the sample that contain broken eggs?

E(X) = 6 · 10/30 = 2;   Var (X) = 6 · (1/3) · (2/3) · (24/29) = 1.1034
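These numbers can be reproduced exactly (a sketch using Fractions and the formulas from the previous slide):

```python
from math import comb
from fractions import Fraction

N, M, n = 30, 10, 6

# P(X = 2) for the hypergeometric distribution:
p_two = Fraction(comb(M, 2) * comb(N - M, n - 2), comb(N, n))

p = Fraction(M, N)
mean = n * p                                     # n M/N
var = n * p * (1 - p) * Fraction(N - n, N - 1)   # np(1-p)(N-n)/(N-1)
print(float(p_two), mean, float(var))
```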


Exercise: Approximation by binomial distribution

Lottery with 1000 lots, of which 200 are winning. Assume you buy 5 lots.

1. Compute the probability that at least one lot will win

Solution: 0.6731

2. Compute the same probability using the binomial approximation

Solution: 0.6723

Summary discrete distributions

• Uniform: Ω = {x1, . . . , xn}, P (X = xk) = 1/n

• Binomial: X ∼ B(n, p), P (X = k) = (n choose k) p^k q^(n−k)
  E(X) = np, Var (X) = npq,   Ω = {0, . . . , n}

• Poisson: X ∼ P(λ), P (X = k) = (λ^k / k!) e^(−λ)
  E(X) = λ, Var (X) = λ,   Ω = {0, 1, 2, . . .}

• Geometric: P (X = k) = p q^(k−1)
  E(X) = 1/p, Var (X) = q/p²,   Ω = {1, 2, . . .}

• Hypergeometric: P (X = k) = (M choose k)(N−M choose n−k) / (N choose n)
  E(X) = np, Var (X) = np(1 − p)(N − n)/(N − 1),   p = M/N