Chapter 11: The Normal Distributions - University of Notre … · 2015-02-18 · continuous...

ACMS 20340 Statistics for Life Sciences Chapter 11: The Normal Distributions


ACMS 20340 Statistics for Life Sciences

Chapter 11: The Normal Distributions

Introducing the Normal Distributions

The class of Normal distributions is the most widely used variety of continuous probability distributions.

Normal density curves are symmetric, single-peaked, and bell-shaped.

They are not “normal” in the sense of “typical” or “boring”; they are actually quite special.

Why “Normal”?

In 1809 Carl Friedrich Gauss developed his “normal law of errors” to help rationalize the use of the method of least squares.


Why “Normal”?

Many years ago I called the Laplace-Gaussian curve the normal curve, which ... has the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another ‘abnormal’.

-Karl Pearson (1920)


Francis Galton’s Bean Machine


The first generator of Normal random variables.

The Shape of Normality


The density curve of a particular Normal distribution is described by its mean µ and its standard deviation σ.

The equation of the density curve is

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} \]

[Figure: a Normal density curve with its inflection points marked.]
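As a minimal sketch (not part of the original slides), the density can be evaluated directly in Python, using the normalizing factor 1/(σ√(2π)). The name `normal_pdf` is a hypothetical helper chosen for illustration; the comparison with `statistics.NormalDist` from the standard library (Python 3.8+) is only a sanity check.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x; sigma is the standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Illustrative parameters (mu = 15, sigma = 3), assumed for this example.
mu, sigma = 15.0, 3.0

# The curve is symmetric about the mean.
left = normal_pdf(mu - 2.0, mu, sigma)
right = normal_pdf(mu + 2.0, mu, sigma)

# Cross-check against the standard library's implementation.
reference = NormalDist(mu, sigma).pdf(mu - 2.0)

print(left, right, reference)
```

The three printed values should agree, which reflects the symmetry of the curve about µ.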

Mean and Standard Deviation

Changing the mean, µ, merely changes where the curve is centered.


Here are Normal curves with µ = 10, 15, and 20, and σ = 3.

Mean and Standard Deviation

Changing the standard deviation, σ, changes the spread of the curve.

This also changes the height of the curve, since the total area under it must stay equal to 1.


Here are curves with µ = 15 and σ = 2, 4, and 6.
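The link between spread and height can be checked numerically. The sketch below relies on the fact that a Normal density peaks at its mean with height 1/(σ√(2π)); `peak_height` is a hypothetical helper name, not something from the course materials.

```python
import math


def peak_height(sigma):
    """Height of the N(mu, sigma) density at x = mu (independent of mu)."""
    return 1.0 / (sigma * math.sqrt(2 * math.pi))


# The three standard deviations used for the curves above.
heights = {sigma: peak_height(sigma) for sigma in (2, 4, 6)}

for sigma, h in heights.items():
    print(f"sigma = {sigma}: peak height = {h:.4f}")
```

Doubling σ halves the peak, which is what keeps the total area under the curve equal to 1.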

Why Care About Normal Distributions?

1. They provide good descriptions of real data, including many biological characteristics, such as blood pressure, bone density, heights, and yields of corn.

2. They provide good approximations of many chance outcomes, such as the proportion of boys over many hospital births.

3. Many statistical inference methods rely on Normal distributions. (We’ll see this in Chapter 13 and beyond.)

Warning!

Do not assume that every variable has a Normal distribution!!

For example, the guinea pig survival times are skewed to the right.


A Few Words on Notation

There is a common shorthand for Normal distributions.

A Normal distribution with mean µ and standard deviation σ is denoted

N(µ, σ).

The 68-95-99.7 Rule

For a Normal distribution with mean µ and standard deviation σ (i.e., N(µ, σ)),

- approximately 68% of observations fall within σ of µ;

- approximately 95% of observations fall within 2σ of µ; and

- approximately 99.7% of observations fall within 3σ of µ.

This rule holds for all Normal distributions.
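The rule can be verified numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper `prob_within` is a hypothetical name introduced for illustration.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")

# The result does not depend on the particular mu and sigma.
same = prob_within(1, mu=64.5, sigma=2.5)
```

The computed proportions are about 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.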


Heights of Young Women

The heights of young women aged 18 to 24 are approximately Normally distributed with µ = 64.5 in. and σ = 2.5 in.


Heights of Young Women

What is the probability that a young woman is taller than 62 inches?


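Here 62 = µ − σ and 67 = µ + σ, so by the 68-95-99.7 rule about 68% of heights lie between 62 and 67 inches. The remaining 32% is split equally between the two tails, leaving 16% above 67.

(% between 62 and 67) + (% above 67) = (% above 62)

68% + 16% = 84%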
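For comparison, here is a sketch that sets the rule-of-thumb answer next to the value computed by software. It assumes Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)  # N(64.5, 2.5), in inches

# Rule-of-thumb answer: 68% within one sigma plus the 16% upper tail.
rule_estimate = 0.68 + 0.16

# Computed answer via the complement rule: P(X > 62) = 1 - P(X <= 62).
exact = 1.0 - heights.cdf(62)

print(f"rule: {rule_estimate:.2f}, exact: {exact:.4f}")
```

The computed value is about 0.8413, so the 68-95-99.7 rule is accurate to roughly two decimal places here.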

Standard Normal Distribution

There are many possible Normal distributions (one for every µ and positive σ).

Normal distributions do more than share common properties: by sliding and stretching the curve, we can transform any Normal distribution into any other Normal distribution.

We single out the Normal distribution N(0, 1) and call it the standard Normal distribution.

We call the transformation of an arbitrary Normal distribution to the standard one standardizing.

Standardizing

If x is an observation from the Normal distribution N(µ, σ), the standardized value of x is

\[ z = \frac{x - \mu}{\sigma} \]

The standardized values are often called z-scores.

z-scores

z-scores measure how many standard deviations an observation is away from the mean.

A positive z-score indicates the observation is greater than the mean.

A negative z-score indicates the observation is less than the mean.

Heights of Young Women

Recall the height distribution of young women is N(64.5, 2.5).

The standardized height is

\[ z = \frac{\text{height} - 64.5}{2.5} \]

- A woman 70 inches tall has the z-score

\[ z = \frac{70 - 64.5}{2.5} = 2.2. \]

- A woman 5 feet (60 inches) tall has the z-score

\[ z = \frac{60 - 64.5}{2.5} = -1.8. \]
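The two standardizations can be reproduced in a few lines. This is a sketch only; `z_score` is a hypothetical helper name.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma


MU, SIGMA = 64.5, 2.5  # height distribution N(64.5, 2.5), in inches

z_tall = z_score(70, MU, SIGMA)   # woman 70 inches tall
z_short = z_score(60, MU, SIGMA)  # woman 60 inches tall

print(round(z_tall, 2), round(z_short, 2))
```

The sign of the result shows at a glance whether the height is above or below the mean.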

Finding Normal Probabilities

Whether using software or tables, Normal probabilities are given as cumulative probabilities.


The cumulative probability for a value x is the proportion of observations less than or equal to x.

Tips for Finding Normal Probabilities

We use the addition rule and the complement rule to find probabilities.

Recall that the probability of any individual value is 0.

So, P(X ≤ 40) = P(X < 40) + P(X = 40) = P(X < 40).

Sketching a picture of the area you want can be very helpful.
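These tips translate into small helper functions built on a single cumulative probability. The sketch below assumes the standard identity relating the Normal cumulative probability to the error function `math.erf`; the names `normal_cdf`, `prob_above`, and `prob_between` are hypothetical.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative probability P(X <= x) for X ~ N(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_above(a, mu=0.0, sigma=1.0):
    """Complement rule: P(X > a) = 1 - P(X <= a)."""
    return 1.0 - normal_cdf(a, mu, sigma)


def prob_between(a, b, mu=0.0, sigma=1.0):
    """P(a < X <= b) as a difference of two cumulative probabilities."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


# Illustrative check on the standard Normal distribution N(0, 1).
below = normal_cdf(1.0)
above = prob_above(1.0)
middle = prob_between(-1.0, 1.0)
print(below, above, middle)
```

Because any single value has probability 0, the same functions serve for strict and non-strict inequalities.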

Methods of Finding Normal Probabilities

- Normal Curve applet on the website

- CrunchIt! distribution calculator

- Standard Normal Tables

The Standard Normal Table

.0062 is the area under N(0,1) left of z = -2.50

.0060 is the area under N(0,1) left of z = -2.51

.0052 is the area under N(0,1) left of z = -2.56
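The table entries quoted above can be reproduced with software. This sketch assumes Python 3.8+ for `statistics.NormalDist` and rounds to four decimals, the precision these entries are quoted to.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # N(0, 1)

# z-values highlighted on the slides, with the table entries quoted there.
table_entries = {-2.50: 0.0062, -2.51: 0.0060, -2.56: 0.0052}

# Area to the left of each z, rounded to table precision.
computed = {z: round(standard_normal.cdf(z), 4) for z in table_entries}

for z, area in computed.items():
    print(f"z = {z:.2f}: area to the left = {area:.4f}")
```

Each computed area matches the corresponding table entry.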

Heights of Young Women 1

What is the probability that a randomly selected young woman measures between 60 and 68 inches tall?

Recall, our distribution is N(64.5, 2.5).

First, let’s sketch the density curve to find the cumulative probabilities we need.

Heights of Young Women


We need to find the area less than 68 inches and subtract the area less than 60 inches.

Heights of Young Women 2

First we find the z-scores by standardizing:

If 60 ≤ x < 68,

then it follows that

\[ \frac{60 - 64.5}{2.5} \le \frac{x - 64.5}{2.5} < \frac{68 - 64.5}{2.5}. \]

Thus if we set z = (x − 64.5)/2.5, we have

−1.8 ≤ z < 1.4.

Heights of Young Women 3

Second, we use the table to find the areas.


Heights of Young Women 4

Lastly, we finish the calculation.

Area between -1.8 and 1.4 = (area left of 1.4) − (area left of -1.8)

= 0.9192 − 0.0359

= 0.8833

The probability that a randomly selected young woman measures between 60 and 68 inches tall is about 0.88 or 88%.
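The same calculation can be checked in software, either on the original scale or after standardizing. This is a sketch assuming Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)  # N(64.5, 2.5), in inches

# Work directly on the original scale ...
direct = heights.cdf(68) - heights.cdf(60)

# ... or standardize first and use N(0, 1), as the table method does.
z_low = (60 - 64.5) / 2.5   # about -1.8
z_high = (68 - 64.5) / 2.5  # about 1.4
standard = NormalDist()
via_z = standard.cdf(z_high) - standard.cdf(z_low)

print(f"direct: {direct:.4f}, via z-scores: {via_z:.4f}")
```

Both routes give about 0.8833, in agreement with the table lookup.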

Question 7

Using N(1, 5), for what value of a do we have P(X < a) = 1/4?

Standardize a:

\[ z_a = \frac{a - 1}{5} \]

Use the table to find z_a such that P(Z < z_a) = 1/4.

P(Z < −0.68) = 0.2483
P(Z < −0.67) = 0.2514

Choose z_a = −0.67, since 0.2514 is closer to 0.25 than 0.2483 is.

Now solve for a:

\[ z_a = -0.67 = \frac{a - 1}{5} \implies a = (5)(-0.67) + 1 = -2.35 \]

The Other Quartile

Using N(1, 5), for what value of a do we have P(X < a) = 1/4?

a = −2.35

Now, for what value of a do we have P(X > a) = 0.25?

By the complement rule, P(X < a) = 1 − 0.25 = 0.75.

\[ z_a = \frac{a - 1}{5} \]

The table gives P(Z < 0.67) ≈ 0.75, so z_a = 0.67.

a = (5)(0.67) + 1 = 4.35

General Procedure

To solve P(X < a) = p for some Normal distribution N(µ, σ):

- Find the corresponding z_a value using the table for the standard Normal distribution: P(Z < z_a) = p.

- Use algebra to solve for a: a = µ + σ z_a.
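The two steps can be sketched in code, with software standing in for the table lookup. This assumes Python 3.8+, where `statistics.NormalDist.inv_cdf` returns the value with a given cumulative probability; `solve_quantile` is a hypothetical helper name.

```python
from statistics import NormalDist


def solve_quantile(p, mu, sigma):
    """Return a such that P(X < a) = p for X ~ N(mu, sigma)."""
    z_a = NormalDist().inv_cdf(p)  # step 1: standard Normal value z_a
    return mu + sigma * z_a        # step 2: a = mu + sigma * z_a


# The two quartiles of N(1, 5).
lower = solve_quantile(0.25, mu=1, sigma=5)
upper = solve_quantile(0.75, mu=1, sigma=5)

print(f"lower quartile: {lower:.2f}, upper quartile: {upper:.2f}")
```

Software gives about −2.37 and 4.37. The table-based answers of −2.35 and 4.35 differ slightly because the table only resolves z_a to two decimal places (0.67 rather than about 0.6745).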