Probability distributions
• Binomial probability distribution (ASW, section 5.4)
• Using Excel for the binomial (ASW, pp. 222-223)
• Uniform probability distribution (ASW, section 6.1)
• Normal probability distribution (ASW, section 6.2)

Bring the text to class on Monday and Wednesday, Sept. 29 and October 1. We will be using Tables 1 and 5 of Appendix B of ASW.

Notes for September 29, 2008


Variance (ASW, 195)

The variance of a probability distribution is the expected value of the squares of the differences of the random variable x from the mean μ. Symbolically,

\[ \mathrm{Var}(x) = \sigma^{2} = \sum (x - \mu)^{2} f(x) \]

The Greek symbol σ is “sigma.”

The variance can be difficult to calculate and interpret. It is in units that are the square of the units of the random variable x. Partly because of this, in statistical work it is more common to use the square root of the variance, the standard deviation σ. The standard deviation has the same units as x.

Variance of x, number of females selected

x      f(x)          x f(x)   x – μ   (x – μ)²   (x – μ)² f(x)
0      1/8 = 0.125   0.000    -1.5    2.25       0.28125
1      3/8 = 0.375   0.375    -0.5    0.25       0.09375
2      3/8 = 0.375   0.750     0.5    0.25       0.09375
3      1/8 = 0.125   0.375     1.5    2.25       0.28125
Total  8/8 = 1.000   1.500                       0.75000

If a random sample of 3 persons is obtained from a large population composed of half females and half males, the expected number of females selected is μ = 1.5. The variance of the number of females selected is Var(x) = σ² = ∑(x – μ)² f(x) = 0.75. The standard deviation is the square root of 0.75, so that σ = 0.866.
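The calculation in the table can be reproduced with a few lines of Python. This is only a sketch: the dictionary `distribution` and the helper names `mean` and `variance` are illustrative choices, not part of the notes or of ASW.

```python
# Distribution of x = number of females in a sample of 3 (values from the table).
distribution = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

def mean(dist):
    """Expected value: the sum of x * f(x)."""
    return sum(x * fx for x, fx in dist.items())

def variance(dist):
    """Variance: the sum of (x - mu)^2 * f(x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * fx for x, fx in dist.items())

mu = mean(distribution)
var = variance(distribution)
sd = var ** 0.5  # standard deviation is the square root of the variance

print(f"mean = {mu}, variance = {var}, standard deviation = {sd:.3f}")
# prints: mean = 1.5, variance = 0.75, standard deviation = 0.866
```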

Sample and population variance

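• The variance of a probability distribution is (ASW, 195)

\[ \mathrm{Var}(x) = \sigma^{2} = \sum (x - \mu)^{2} f(x) \]

• This is equivalent to the variance of a population (ASW, 92)

\[ \sigma^{2} = \frac{\sum (x_i - \mu)^{2}}{N} \]

Note that the variance of a sample is

\[ s^{2} = \frac{\sum (x_i - \bar{x})^{2}}{n - 1} \]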

Unbiased estimator

• The expected value of s² is equal to σ², a characteristic that is referred to as an unbiased estimate. That is,

\[ E(s^{2}) = \sigma^{2} \]

• Using (n-1) in the denominator of s², rather than n, produces this unbiased estimate.

• The concept of biased and unbiased estimators is important in constructing good estimators and is a major consideration in econometric work.

• When using Excel to estimate mean and standard deviation, make sure you use the proper formulae.
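The effect of dividing by (n-1) rather than n can be checked by brute force on a small case. The sketch below assumes a hypothetical population of four values (2, 4, 6, 8), which is not from the notes, and lists every equally likely sample of size 2 drawn with replacement, so that successive draws are independent.

```python
from itertools import product

# Hypothetical population, chosen only for illustration (not from the notes).
population = [2, 4, 6, 8]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def sample_variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample) / divisor

n = 2
# Every equally likely ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Average of s^2 over all samples, using n - 1 and then n as the divisor.
mean_s2_unbiased = sum(sample_variance(s, n - 1) for s in samples) / len(samples)
mean_s2_biased = sum(sample_variance(s, n) for s in samples) / len(samples)

print(sigma2, mean_s2_unbiased, mean_s2_biased)
# prints: 5.0 5.0 2.5
```

With the divisor n - 1 the average of s² over all samples equals σ². With the divisor n the average is too small.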

Binomial probability distribution (ASW, 200)

A binomial experiment is a probability experiment with the following characteristics:
– The experiment has n identical trials.
– Two outcomes are possible on each trial – one outcome is termed a success and the other is termed a failure.
– The probability of a success occurring on each trial is p. This probability p is the same on each trial.
– Since the outcome must be either a success or a failure, a failure is the complement of a success and the probability of a failure is 1-p. (Some texts refer to this probability as q, that is, q = 1-p.)
– The trials are independent of each other.

Given the above conditions:
• The binomial probability distribution provides the probability of x successes in n trials, where x = 0, 1, 2, 3, … , n.
• Note that there are only two parameters that determine binomial probabilities:
– n = the number of trials.
– p = the probability of success.
• Successive trials must be independent of each other. That is, the outcome of any one trial must not affect the probability of success or failure for any other trial.
P(success | failure on any other trial) = p
P(success | success on any other trial) = p

Example – number of females selected in a random sample of size 3 from a large population of half males and half females.

x      f(x)
0      1/8 = 0.125
1      3/8 = 0.375
2      3/8 = 0.375
3      1/8 = 0.125
Total  8/8 = 1.000

x is the number of females selected and f(x) is the probability of x females being selected.

The above distribution is a binomial probability distribution with success defined as selecting a female. There are n = 3 independent trials, the probability of success is p = 0.5, and x is the number of successes. In this experiment, selecting a male is termed a failure, and the probability of selecting a male is 1-p = 1-0.5 = 0.5.

Formula for binomial probability

If n is the number of trials of the binomial experiment and p is the probability of success, then the probability of x successes in n trials of the experiment is given by the probability function f(x), defined as follows:

\[ f(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{(n-x)} \]

where

\[ n! = n(n-1)(n-2)\cdots(2)(1), \qquad 0! = 1 \]
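The binomial probability function can be sketched in Python as follows. The function name `binomial_pmf` is an illustrative choice, and `math.comb(n, x)` from the standard library supplies n! / (x!(n-x)!). The loop reproduces the distribution of the number of females selected (n = 3, p = 0.5).

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    # comb(n, x) equals n! / (x! * (n - x)!)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Number of females in a sample of 3, with p = 0.5.
for x in range(4):
    print(x, binomial_pmf(x, 3, 0.5))
# prints:
# 0 0.125
# 1 0.375
# 2 0.375
# 3 0.125
```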

Using the binomial formula

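\[ f(0) = \frac{3!}{0!\,(3-0)!}\, 0.5^{0} (1-0.5)^{(3-0)} = \frac{3 \times 2 \times 1}{1 \times 3 \times 2 \times 1}\,(1)(0.125) = 1 \times 0.125 = 0.125 \]

\[ f(2) = \frac{3!}{2!\,(3-2)!}\, 0.5^{2} (1-0.5)^{(3-2)} = \frac{3 \times 2 \times 1}{2 \times 1 \times 1}\,(0.25)(0.5) = 3 \times 0.125 = 0.375 \]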

Combinations and permutations (ASW, 146-147)

Permutations – the number of ways of arranging N objects, taken n at a time, where the order of the objects is taken into account, is:

\[ P^{N}_{n} = n!\, C^{N}_{n} = \frac{N!}{(N-n)!} \]

where \(C^{N}_{n}\) is the number of possible combinations of N objects, taken n at a time, where the order of the objects does not matter:

\[ C^{N}_{n} = \frac{N!}{n!\,(N-n)!} \]
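As a quick check of the relationship between permutations and combinations, Python's standard library provides `math.perm` and `math.comb`. The values N = 5 and n = 2 below are hypothetical and chosen only for illustration.

```python
from math import comb, factorial, perm

# Hypothetical values for illustration: N = 5 objects taken n = 2 at a time.
N, n = 5, 2

permutations = perm(N, n)   # N! / (N - n)!, order matters
combinations = comb(N, n)   # N! / (n! (N - n)!), order does not matter

print(permutations, combinations)
# prints: 20 10

# Each combination of n objects can be ordered in n! ways.
print(permutations == factorial(n) * combinations)
# prints: True
```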

Rationale for the binomial formula

Probability of x successes and (n-x) failures is

\[ \underbrace{p \times p \times \cdots \times p}_{x \text{ times}} \times \underbrace{(1-p) \times (1-p) \times \cdots \times (1-p)}_{(n-x) \text{ times}} \]

This is \(p^{x}(1-p)^{(n-x)}\) and represents the probability of any particular sequence of x successes and (n-x) failures.

And there are \(C^{n}_{x}\) ways of arranging these x successes and (n-x) failures. To obtain the probability of x successes in n trials, multiply the probability of any particular sequence by this combination.
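This rationale can be checked by listing every possible sequence of successes and failures. The sketch below does so for the females example (n = 3, p = 0.5). The helper `sequence_probability` and the coding of a success as 1 and a failure as 0 are illustrative assumptions.

```python
from itertools import product
from math import comb

n, p = 3, 0.5  # the females example: 3 trials, success probability 0.5

def sequence_probability(seq, p):
    """Probability of one particular sequence of successes (1) and failures (0)."""
    prob = 1.0
    for outcome in seq:
        prob *= p if outcome == 1 else (1 - p)
    return prob

for x in range(n + 1):
    # All sequences of n trials that contain exactly x successes.
    sequences = [s for s in product([0, 1], repeat=n) if sum(s) == x]
    by_enumeration = sum(sequence_probability(s, p) for s in sequences)
    by_formula = comb(n, x) * p ** x * (1 - p) ** (n - x)
    # Columns: x, number of sequences, probability by enumeration, by formula.
    print(x, len(sequences), by_enumeration, by_formula)
```

The number of sequences in each row equals the combination count, and the two probability columns agree.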

Example – selection of Saskatchewan workers, classified by years of education and wages and salaries

From all these workers, randomly select 13 workers with 14-17 years of education. What is the probability that exactly 8 of these will have incomes of $45,000 or more? What is the probability of 8 or more?

A random sample from a large population means that successive selections are independent of each other. There are n = 13 workers selected. If success is defined as selecting a worker with an income of $45,000 or more, the probability of success is p = 82/230 = 0.357.

Probability of 8 with $45,000 or more income = 0.0373. See the following slides for the calculation.

Using the formula

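\[ f(13) = \frac{13!}{13!\,(13-13)!}\, 0.357^{13} (1-0.357)^{(13-13)} = \frac{13!}{13!\,0!}\,(0.00000153)(1) = 0.00000153 \]

\[ f(12) = \frac{13!}{12!\,(13-12)!}\, 0.357^{12} (1-0.357)^{(13-12)} = \frac{13 \times 12!}{12! \times 1}\,(0.000004286)(0.643) = 0.000035827 \]

\[ f(11) = \frac{13!}{11!\,(13-11)!}\, 0.357^{11} (1-0.357)^{(13-11)} = \frac{13 \times 12 \times 11!}{11! \times 2 \times 1}\,(0.000012005)(0.413449) = 0.000387139 \]

\[ f(10) = \frac{13!}{10!\,(13-10)!}\, 0.357^{10} (1-0.357)^{(13-10)} = \frac{13 \times 12 \times 11 \times 10!}{10! \times 3 \times 2 \times 1}\,(0.000033627)(0.265847707) = 0.002556743 \]

\[ f(9) = \frac{13!}{9!\,(13-9)!}\, 0.357^{9} (1-0.357)^{(13-9)} = \frac{13 \times 12 \times 11 \times 10 \times 9!}{9! \times 4 \times 3 \times 2 \times 1}\,(0.000094192)(0.170940076) = 0.011512347 \]

\[ f(8) = \frac{13!}{8!\,(13-8)!}\, 0.357^{8} (1-0.357)^{(13-8)} = \frac{13 \times 12 \times 11 \times 10 \times 9 \times 8!}{8! \times 5 \times 4 \times 3 \times 2 \times 1}\,(0.000263843)(0.109914469) = 0.037323 \]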

Probabilities to 4 decimal places

Number of successes (x)   Probability of x or f(x)
8                         0.0373
9                         0.0115
10                        0.0026
11                        0.0004
12                        0.0000
13                        0.0000

The probability of 8 or more successes is the sum of the probabilities of 8, 9, 10, 11, 12, or 13 successes. This is 0.0373 + 0.0115 + 0.0026 + 0.0004 + 0.0000 + 0.0000 = 0.0518.
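The same two answers can be obtained by summing the binomial probability function directly. This is a sketch with an illustrative helper named `binomial_pmf`. It uses the rounded value p = 0.357 from the notes.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 13, 0.357

p_exactly_8 = binomial_pmf(8, n, p)
# "8 or more" means x = 8, 9, ..., 13.
p_8_or_more = sum(binomial_pmf(x, n, p) for x in range(8, n + 1))

print(round(p_exactly_8, 4), round(p_8_or_more, 4))
# prints: 0.0373 0.0518
```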

Using an Excel worksheet to obtain the probabilities

n = 13
p = 0.357

x      f(x)
0      0.003211757
1      0.023181591
2      0.077223901
3      0.157210088
4      0.218211514
5      0.218075768
6      0.161437116
7      0.089631494
8      0.037323223
9      0.011512347
10     0.002556708
11     0.000387139
12     3.58239E-05
13     1.52998E-06
Total  1

Formula in Excel

13
0.357

x      f(x)
0      =BINOMDIST(C4,$A$1,$A$2,FALSE)
1      =BINOMDIST(C5,$A$1,$A$2,FALSE)
2      =BINOMDIST(C6,$A$1,$A$2,FALSE)
3      =BINOMDIST(C7,$A$1,$A$2,FALSE)
4      =BINOMDIST(C8,$A$1,$A$2,FALSE)
5      =BINOMDIST(C9,$A$1,$A$2,FALSE)
6      =BINOMDIST(C10,$A$1,$A$2,FALSE)
7      =BINOMDIST(C11,$A$1,$A$2,FALSE)
8      =BINOMDIST(C12,$A$1,$A$2,FALSE)
9      =BINOMDIST(C13,$A$1,$A$2,FALSE)
10     =BINOMDIST(C14,$A$1,$A$2,FALSE)
11     =BINOMDIST(C15,$A$1,$A$2,FALSE)
12     =BINOMDIST(C16,$A$1,$A$2,FALSE)
13     =BINOMDIST(C17,$A$1,$A$2,FALSE)
Total  =SUM(D4:D17)

n = 13 is in cell A1 and p = 0.357 is in cell A2. The x values are in cells C4 to C17 and the f(x) formulas are in cells D4 to D17.
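For readers without Excel, the worksheet can be mirrored in Python. This is a sketch, not an Excel feature: the list name `table` is an illustrative choice, and the probabilities come from the binomial formula rather than from BINOMDIST.

```python
from math import comb

n, p = 13, 0.357  # the values held in cells A1 and A2 of the worksheet

# f(x) for x = 0, 1, ..., n, mirroring the x and f(x) columns.
table = [(x, comb(n, x) * p ** x * (1 - p) ** (n - x)) for x in range(n + 1)]

for x, fx in table:
    print(f"{x:>2}  {fx:.9f}")

# The probabilities over all possible values of x sum to 1.
total = sum(fx for _, fx in table)
print("Total", round(total, 9))
```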

Mean and standard deviation

• For a binomial distribution with n trials and p as the probability of success, the mean or expected value and the variance of the random variable x are

\[ E(x) = \mu = np \]

\[ \mathrm{Var}(x) = \sigma^{2} = np(1-p) \]

• For the sex distribution of n = 3 individuals, the expected number of females selected is 3 × 0.5 = 1.5 and the variance is 3 × 0.5 × 0.5 = 0.75, as we previously determined.

• For the experiment of selecting 13 individuals with 14-17 years of education, the mean number of those with incomes of $45,000 or more is 13 × 0.357 = 4.64, the variance is 13 × 0.357 × 0.643 = 2.984, and the standard deviation is 1.727.
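The shortcut formulas np and np(1-p) can be compared with the mean and variance computed directly from the probability function. The sketch below does this for both examples in the notes. The function name `binomial_moments` is an illustrative choice.

```python
from math import comb, sqrt

def binomial_moments(n, p):
    """Mean and variance computed directly from the probability function."""
    pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * fx for x, fx in enumerate(pmf))
    var = sum((x - mean) ** 2 * fx for x, fx in enumerate(pmf))
    return mean, var

for n, p in [(3, 0.5), (13, 0.357)]:
    mean, var = binomial_moments(n, p)
    print(f"n = {n}, p = {p}")
    print(f"  mean: {mean:.3f} (direct), {n * p:.3f} (np)")
    print(f"  variance: {var:.3f} (direct), {n * p * (1 - p):.3f} (np(1-p))")
    print(f"  standard deviation: {sqrt(var):.3f}")
```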

Examples where binomial could be applied
• The probability of ten or more heads when flipping a coin twelve times.
• The probability of 6 threes in 15 rolls of a die.
• The probability of selecting 56 or more unemployed persons in a random sample of 500 workers in the province of Saskatchewan.
• The probability that the tax form has been correctly completed in a random sample of 500 Canadian taxpayers.
• The probability that more than 1/3 of a sample of 1,000 Saskatchewan residents have a university degree.
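The first two examples can be worked through with the binomial formula. The sketch below assumes a fair coin (p = 0.5) and a fair die (p = 1/6). Neither assumption is stated in the notes. The helper name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Ten or more heads in twelve flips, assuming a fair coin (p = 0.5).
p_ten_or_more_heads = sum(binomial_pmf(x, 12, 0.5) for x in range(10, 13))

# Exactly 6 threes in 15 rolls, assuming a fair die (p = 1/6).
p_six_threes = binomial_pmf(6, 15, 1 / 6)

print(round(p_ten_or_more_heads, 4), round(p_six_threes, 4))
```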

Why might the binomial not apply in the following?
• The probability that there will be snow on 20 or more days in January.
• The probability of 6 threes and 7 fives in 25 rolls of a die.
• The probability that the UR Rams win all of their remaining football games.
• The probability that the Conservatives win 155 or more seats, among the 308 up for election, in the coming federal election.
• The probability that 10 or more automobiles in a car dealer’s lot in Regina will have defective transmissions.
• The probability that fifty or more clients of the Regina Food Bank, during the month of October, will be unemployed.

Extending the binomial

• When the number of trials of a binomial experiment is large, i.e., if n is large, then it is time-consuming to compute binomial probabilities without a computer.

• In this case, it is possible to use the normal distribution to approximate the binomial probabilities. See ASW, section 6.3.

• In addition, we may not be as interested in the number of successes as in the proportion of successes. In this case, the normal approximation can be used to obtain probabilities for the proportion p of the times that a success occurs. See ASW, section 7.6.
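As a preview of the normal approximation, the sketch below compares an exact binomial probability with its normal approximation. The example (n = 100, p = 0.5, 60 or more successes) is hypothetical and not from the notes. The continuity correction of 0.5 is a common convention assumed here, not something these notes state.

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical example (not from the notes): n = 100 trials, p = 0.5,
# probability of 60 or more successes.
n, p, k = 100, 0.5, 60

# Exact binomial probability of k or more successes.
exact = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))

# Normal distribution with the binomial's mean np and
# standard deviation sqrt(np(1-p)).
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Assumed continuity correction: "60 or more" is treated as "above 59.5".
approx = 1 - normal.cdf(k - 0.5)

print(round(exact, 4), round(approx, 4))
```

The two values should be close, which is why the approximation is useful when n is large.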

Later on Monday or on Wednesday

• Uniform probability distribution.
• Normal probability distribution.
• Normal approximation to the binomial probability distribution.