Stat 155, Section 2, Last Time


Page 1: Stat 155,  Section 2, Last Time

Stat 155, Section 2, Last Time
• Interpreted Midterm Results
• Variance of Random Variables
  – From Probability Table
  – Properties:
    • Ignores shift
    • Multiples come through squared
    • Sum when independent
• Sampling Distributions
  – Binomial Distribution
    • Calculate Probs with BINOMDIST
    • Mean & Standard Deviation
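The variance properties recalled above can be checked exactly on a small probability table. This is a minimal Python sketch (Python is used only for illustration; the course itself works in Excel). The table `X` and the helper names `variance`, `transform`, and `sum_independent` are hypothetical.

```python
from fractions import Fraction

# A probability table is a dict {value: probability}.
def mean(table):
    return sum(x * p for x, p in table.items())

def variance(table):
    # Var(X) = E[(X - mu)^2], computed straight from the table
    mu = mean(table)
    return sum((x - mu) ** 2 * p for x, p in table.items())

def transform(table, a, b):
    # Table for a + b*X (assumes b != 0, so values stay distinct)
    return {a + b * x: p for x, p in table.items()}

def sum_independent(t1, t2):
    # Table for X + Y when X and Y are independent: probabilities multiply
    out = {}
    for x, px in t1.items():
        for y, py in t2.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out

# Hypothetical table, chosen only for illustration
X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(variance(X))                      # 1/2
print(variance(transform(X, 3, 1)))     # shift ignored: 1/2
print(variance(transform(X, 0, 2)))     # multiple comes through squared: 2
print(variance(sum_independent(X, X)))  # sum when independent: 1
```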

Page 2: Stat 155,  Section 2, Last Time

Reading In Textbook

Approximate Reading for Today’s Material:

Pages 334-351, 358-369

Approximate Reading for Next Class:

Pages 382-396, 400-416

Page 3: Stat 155,  Section 2, Last Time

Binomial Distribution

Normal Approximation to the Binomial

Idea:

Bi(n,p) prob. histo. ≈ Normal curve

So can approximate Binomial probs with

Normal areas under $N\left(np, \sqrt{np(1-p)}\right)$
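To see "histogram ≈ curve" numerically, the sketch below compares each Bi(n,p) bar height with the Normal density at the same integer. The "max gap" summary and the function names are illustrative choices, not from the slides; Python's `statistics.NormalDist` (3.8+) stands in for Excel's Normal functions.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def max_gap(n, p):
    # Largest gap between a Bi(n,p) histogram bar height and the
    # N(np, sqrt(np(1-p))) density evaluated at the same integer
    curve = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return max(abs(binom_pmf(k, n, p) - curve.pdf(k)) for k in range(n + 1))

for n, p in [(100, 0.3), (20, 0.5), (20, 0.05), (20, 0.95)]:
    print(f"Bi({n},{p}): max gap = {max_gap(n, p):.4f}")
```

Small gaps mean the curve tracks the bars well; the skewed Bi(20,0.05) and Bi(20,0.95) cases give much larger gaps.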

Page 4: Stat 155,  Section 2, Last Time

Normal Approx. to Binomial

Before modern software, this was a critical issue, since the Binomial Table (Table C in text) only goes to n = 20.

The Normal Approx. made larger n possible…

Now still useful, since BINOMDIST conks out around n = 1000 (but political polls need n ~ 2000-3000).

Page 5: Stat 155,  Section 2, Last Time

Normal Approx. to Binomial

Visualization of Normal Approximation: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg20.xls

Bi(100,0.3): Looks really good

Bi(20,0.5): Chunky, approx. a little weak

Bi(20,0.05): p too small for good approx.

Bi(20,0.95): p too big for good approx.

Page 6: Stat 155,  Section 2, Last Time

Normal Approx. to Binomial

When is Normal Approximation “acceptable”?

Textbook’s “Rule of Thumb” (there are others):

Need: np >= 10 and n(1-p) >= 10

Relate to above examples:

Bi(20,0.5): np = n(1-p) = 10, boundary case

Bi(20,0.05): np = 1 < 10, poor approx.

Bi(20,0.95): n(1-p) = 1 < 10, poor approx.
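The rule of thumb is a one-line check. A minimal sketch, assuming Python; `normal_approx_ok` is a hypothetical helper name.

```python
def normal_approx_ok(n, p):
    """Textbook rule of thumb: need np >= 10 and n(1-p) >= 10."""
    # round() guards against floating-point error right at the boundary
    return round(n * p, 9) >= 10 and round(n * (1 - p), 9) >= 10

for n, p in [(100, 0.3), (20, 0.5), (20, 0.05), (20, 0.95)]:
    print(f"Bi({n},{p}): np = {n * p:g}, n(1-p) = {n * (1 - p):g}, "
          f"OK: {normal_approx_ok(n, p)}")
```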

Page 7: Stat 155,  Section 2, Last Time

Normal Approx. to Binomial

Nice Illustration:

Publisher’s Website

• Statistical Applets

• Normal Approx to Binomial

• Shows how rule of thumb makes sense

Page 8: Stat 155,  Section 2, Last Time

Normal Approx. to Binomial

HW:

5.21

5.23

Page 9: Stat 155,  Section 2, Last Time

Normal Approx. to Binomial

HW C18: In a political poll of 2000, 1010 will vote for A. To decide how “safe” it is to predict A will win:

a. Calculate P{X >= 1010}, for X ~ Bi(2000,1/2) (0.327)
(“could happen”, so not safe to predict)

b. Recalculate, assuming 1050 will vote for A (0.0127)
(now have stronger evidence, will build on this)
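The C18 numbers can be reproduced with a short script. The quoted answers appear to match the Normal approximation without continuity correction, so this sketch computes that alongside the exact Bi(2000, 1/2) tail. The exact tail is summed in integer arithmetic because each outcome has probability $2^{-2000}$, which underflows ordinary floating point. Python and `statistics.NormalDist` are assumptions here, not course tools.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 2000
poll = NormalDist(n * 0.5, sqrt(n * 0.5 * 0.5))  # N(1000, about 22.36)

def upper_tail_normal(k):
    # Naive Normal approx. of P{X >= k} (no continuity correction)
    return 1 - poll.cdf(k)

def upper_tail_exact(k):
    # Exact Bi(2000, 1/2) tail: count favorable outcomes, divide by 2**n
    # (Python's int / int division handles these huge integers exactly)
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

for k in (1010, 1050):
    print(k, round(upper_tail_normal(k), 4), round(upper_tail_exact(k), 4))
```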

Page 10: Stat 155,  Section 2, Last Time

Normal Approx. to Binomial

A refinement: Continuity Correction

Idea:

• Binomial only takes on Integer Values

• Not really Continuous

• Better viewpoint (i.e. better approx.):

Replace points by “± ½ bars”

Page 11: Stat 155,  Section 2, Last Time

Continuity Correction

Graphically:

Approx. P{X = 0} by the area of the bar above (-0.5, 0.5), under the curve. I.e. P{X = 0} = P{-0.5 < X < 0.5}.

Page 12: Stat 155,  Section 2, Last Time

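Continuity Correction

More examples:

[Figure: number line marking the integers 3, 4, 5, 6, 7]

$P\{3 \le X \le 7\} = P\{2.5 < X < 7.5\}$

$P\{3 < X \le 7\} = P\{3.5 < X < 7.5\}$

$P\{3 \le X < 7\} = P\{2.5 < X < 6.5\}$

$P\{3 < X < 7\} = P\{3.5 < X < 6.5\}$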
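The four rules above follow one pattern: an inclusive bound keeps that integer's ±½ bar, a strict bound drops it. A small sketch in Python; the function `continuity_interval` is hypothetical.

```python
def continuity_interval(a, b, include_a=True, include_b=True):
    """Interval (lo, hi) whose Normal area approximates P{a <= X <= b}
    for integer-valued X; set include_a/include_b False for strict '<'."""
    lo = a - 0.5 if include_a else a + 0.5  # '<=' keeps a's bar, '<' drops it
    hi = b + 0.5 if include_b else b - 0.5  # same logic at the upper end
    return lo, hi

print(continuity_interval(3, 7))                    # (2.5, 7.5)
print(continuity_interval(3, 7, include_a=False))   # (3.5, 7.5)
print(continuity_interval(3, 7, include_b=False))   # (2.5, 6.5)
print(continuity_interval(3, 7, False, False))      # (3.5, 6.5)
```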

Page 13: Stat 155,  Section 2, Last Time

Continuity Correction

Next time, consider replacing the above examples with something using number lines….

Page 14: Stat 155,  Section 2, Last Time

Continuity Correction

Excel example: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg20.xls

Part 3

For Bi(20,0.2):

• Continuity Correction much better

• About an order of magnitude smaller % error

Page 15: Stat 155,  Section 2, Last Time

Continuity Correction

Notes:

• Gives substantial improvement for small n

• But doesn’t matter much for large n

• Class rules:

– Do when easy

– Skip when it adds complications…

Page 16: Stat 155,  Section 2, Last Time

Continuity Correction

HW C19: For X ~ Bi(25,0.4)

a. Check this is on the boundary of “OK” for using a Normal Approximation.

Find:

b. P{6 < X <= 11}

c. P{6 <= X < 11}

Using:

Page 17: Stat 155,  Section 2, Last Time

Continuity Correction

Using:

i. The naïve Normal approx.
(0.607, 0.607)

ii. Normal approx. w/ cont. corr.
(0.653, 0.548)

iii. The exact Binomial Distribution
(0.659, 0.556)
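The three answer pairs for C19 can be checked with a short script. This sketch assumes Python 3.8+ for `statistics.NormalDist`; parts b and c are first rewritten as inclusive integer ranges, which is what the continuity correction acts on.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 25, 0.4
Y = NormalDist(n * p, sqrt(n * p * (1 - p)))  # N(10, sqrt(6))

def normal_area(lo, hi):
    # Area under the Normal curve between lo and hi
    return Y.cdf(hi) - Y.cdf(lo)

def exact(lo, hi):
    # P{lo <= X <= hi} for X ~ Bi(25, 0.4), integer bounds inclusive
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

# b. P{6 < X <= 11} = P{7 <= X <= 11};  c. P{6 <= X < 11} = P{6 <= X <= 10}
naive = (normal_area(6, 11), normal_area(6, 11))
corrected = (normal_area(6.5, 11.5), normal_area(5.5, 10.5))
exact_vals = (exact(7, 11), exact(6, 10))

for label, vals in [("naive", naive), ("cont. corr.", corrected), ("exact", exact_vals)]:
    print(label, [round(v, 3) for v in vals])
```

The naive approximation cannot tell `<` from `<=`, so it gives the same value for b and c; the corrected version separates them and lands close to the exact answers.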

Page 18: Stat 155,  Section 2, Last Time

Binomial Distribution

Two Important “scales”:

i. Counts: X ~ Bi(n,p) (done above)

ii. Proportions (~ percentages):

$\hat{p} = X/n$, on [0,1] scale

often very natural, e.g. political polls

Page 19: Stat 155,  Section 2, Last Time

Binomial for Proportions

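Relationship between means and SDs:

$\mu_{\hat{p}} = \mu_{X/n} = \frac{1}{n}\mu_X = \frac{1}{n}np = p$ (“on average, expect p”)

$\sigma^2_{\hat{p}} = \sigma^2_{X/n} = \frac{1}{n^2}\sigma^2_X = \frac{1}{n^2}np(1-p) = \frac{p(1-p)}{n}$

$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$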
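A quick simulation can check these formulas. This is a sketch under assumed values n = 100, p = 0.3 (chosen only for illustration), with a fixed seed so the run is reproducible; `simulate_phat` is a hypothetical helper.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_phat(n, p, reps, seed=155):
    rng = random.Random(seed)
    # Each replicate: count successes in n independent trials, divide by n
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

n, p = 100, 0.3   # hypothetical values for illustration
phats = simulate_phat(n, p, reps=10_000)
print(mean(phats), p)                          # simulated mean vs. p
print(pstdev(phats), sqrt(p * (1 - p) / n))    # simulated SD vs. sqrt(p(1-p)/n)
```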

Page 20: Stat 155,  Section 2, Last Time

Binomial for Proportions

Normal Approx for Proportions:

(just uses above means and SDs)

$\hat{p} \approx N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$
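As an illustration, applying this to the poll in HW C18 (n = 2000, p = 1/2, so 1010 votes means $\hat{p} = 0.505$) should reproduce part a's value, since dividing X by n does not change the z-score. A minimal sketch, assuming Python's `statistics.NormalDist` as a substitute for NORMDIST; `phat_normal` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def phat_normal(n, p):
    # Normal approx. for p-hat: N(p, sqrt(p(1-p)/n))
    return NormalDist(p, sqrt(p * (1 - p) / n))

# Poll of 2000 with p = 1/2: chance that p-hat >= 1010/2000 = 0.505
tail = 1 - phat_normal(2000, 0.5).cdf(1010 / 2000)
print(round(tail, 3))
```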

Page 21: Stat 155,  Section 2, Last Time

Binomial for Proportions

HW:

5.25: Work with both BINOMDIST and the Normal Approx.

(BINOMDIST: (a) 0.1495 (b) 0.0418)

Page 22: Stat 155,  Section 2, Last Time

And now for something completely different….

An interesting advertisement:

http://www.albinoblacksheep.com/flash/honda.php

Page 23: Stat 155,  Section 2, Last Time

Section 5.2: Distrib’n of Sample Means

Idea: Study Probability Structure of $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

• Based on $X_1, \ldots, X_n$

• Drawn independently

• From same distribution,

• Having Expected Value: $\mu = EX$

• And Standard Deviation: $\sigma$

Page 24: Stat 155,  Section 2, Last Time

Expected Value of Sample Mean

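How does $\mu_{\bar{X}} = E\bar{X}$ relate to $\mu$?

$\mu_{\bar{X}} = \mu_{\frac{1}{n}(X_1 + \cdots + X_n)} = \frac{1}{n}\left(\mu_{X_1} + \cdots + \mu_{X_n}\right) = \frac{1}{n}\, n\mu = \mu$

Sample mean “has the same mean” as the original data.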

Page 25: Stat 155,  Section 2, Last Time

Variance of Sample Mean

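Study “spread” (i.e. quantify variation) of $\bar{X}$:

$\sigma^2_{\bar{X}} = \sigma^2_{\frac{1}{n}(X_1 + \cdots + X_n)} = \left(\frac{1}{n}\right)^2\left(\sigma^2_{X_1} + \cdots + \sigma^2_{X_n}\right) = \frac{1}{n^2}\, n\sigma^2 = \frac{1}{n}\sigma^2$

(the multiple comes through squared, and the variances add because the $X_i$ are independent)

Variance of Sample mean “reduced by $\frac{1}{n}$”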

Page 26: Stat 155,  Section 2, Last Time

S. D. of Sample Mean

Since Standard Deviation is square root of Variance,

Take square roots to get:

$\sigma_{\bar{X}} = \frac{1}{\sqrt{n}}\sigma$

S. D. of Sample mean “reduced by $\frac{1}{\sqrt{n}}$”
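Both facts (same centerpoint, SD reduced by $\frac{1}{\sqrt{n}}$) show up in a simulation. This sketch averages n = 4 fair-die rolls; the die, n, seed, and replication count are illustrative assumptions, and Python is used only for illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(155)
n, reps = 4, 20_000
die = range(1, 7)
mu = mean(die)        # 3.5
sigma = pstdev(die)   # sqrt(35/12), about 1.708

# Each replicate: average of n independent fair-die rolls
xbars = [mean(rng.choice(die) for _ in range(n)) for _ in range(reps)]

print(mean(xbars), mu)                 # centerpoint unchanged
print(pstdev(xbars), sigma / sqrt(n))  # spread reduced by 1/sqrt(n)
```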

Page 27: Stat 155,  Section 2, Last Time

Mean & S. D. of Sample Mean

Summary:

Averaging:

1. Gives same centerpoint

2. Reduces variation by factor of $\frac{1}{\sqrt{n}}$

Called “Law of Averages, Part I”

Page 28: Stat 155,  Section 2, Last Time

Law of Averages, Part I

Some consequences (worth noting):

• To “double accuracy”, need 4 times as much data.

• For “10 times the accuracy”, need 100 times as much data.
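Both consequences come straight from $\sigma_{\bar{X}} = \sigma/\sqrt{n}$:

```latex
\frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\cdot\frac{\sigma}{\sqrt{n}},
\qquad
\frac{\sigma}{\sqrt{100n}} = \frac{1}{10}\cdot\frac{\sigma}{\sqrt{n}}
```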

Page 29: Stat 155,  Section 2, Last Time

Law of Averages, Part I

Nice Illustration:

Publisher’s Website

• Statistical Applets

• Law of Large Numbers

Page 30: Stat 155,  Section 2, Last Time

Law of Averages, Part I

HW: 5.29

Page 31: Stat 155,  Section 2, Last Time

Distribution of Sample Mean

Now know center and spread, what about “shape of distribution”?

Case 1: If $X_1, \ldots, X_n$ are indep. $N(\mu, \sigma)$

CAN SHOW:

$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

(knew these, news is “mound shape”)

Thus work with NORMDIST & NORMINV
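In Python, `statistics.NormalDist` offers `cdf` and `inv_cdf`, which play the roles of NORMDIST (cumulative) and NORMINV. A sketch with a hypothetical population N(100, 15) and n = 25, so the sample mean has SD 15/√25 = 3:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical population N(100, 15); sample size n = 25
mu, sigma, n = 100, 15, 25
xbar_dist = NormalDist(mu, sigma / sqrt(n))   # N(100, 3)

# Analogue of Excel NORMDIST(106, 100, 3, TRUE): P{X-bar <= 106}
print(xbar_dist.cdf(106))
# Analogue of Excel NORMINV(0.975, 100, 3): 97.5th percentile of X-bar
print(xbar_dist.inv_cdf(0.975))
```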

Page 32: Stat 155,  Section 2, Last Time

Distribution of Sample Mean

Case 2: If $X_1, \ldots, X_n$ are “almost anything”

STILL HAVE:

$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, “approximately”

Page 33: Stat 155,  Section 2, Last Time

Distribution of Sample Mean

Remarks:

• Mathematics: in terms of $\lim_{n \to \infty}$

• Called “Law of Averages, Part II”

• Also called “Central Limit Theorem”

• Gives sense in which Normal Distribution is in the center

• Hence name “Normal” (ostentatious?)

Page 34: Stat 155,  Section 2, Last Time

Law of Averages, Part II

More Remarks:

• Thus we will work with NORMDIST & NORMINV a lot, for averages

• This is why Normal Dist’n is good model for many different populations

(any sum of small indep. random pieces)

• Also explains Normal Approximation to the Binomial

Page 35: Stat 155,  Section 2, Last Time

Normal Approx. to Binomial

Explained by Law of Averages, Part II, since:

For X ~ Binomial(n,p)

Can represent X as: $X = \sum_{i=1}^{n} X_i$

Where: $X_i = 1$ if S on trial i, $X_i = 0$ if F on trial i

Thus $X = n\bar{X}$ is a rescaled average (a sum), so

Law of Averages gives Normal Dist’n
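The indicator representation can be simulated directly: build each X as a sum of 0/1 trial outcomes and compare the observed frequencies with exact Binomial probabilities. A sketch with hypothetical n = 10, p = 0.3 and a fixed seed; `binomial_as_sum` is an illustrative name.

```python
import random
from collections import Counter
from math import comb

def binomial_as_sum(n, p, rng):
    # X = X_1 + ... + X_n, each X_i = 1 (S on trial i) or 0 (F on trial i)
    return sum(1 if rng.random() < p else 0 for _ in range(n))

n, p, reps = 10, 0.3, 20_000   # hypothetical values for illustration
rng = random.Random(155)
counts = Counter(binomial_as_sum(n, p, rng) for _ in range(reps))

for k in range(n + 1):
    exact = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(k, round(counts[k] / reps, 4), round(exact, 4))
```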

Page 36: Stat 155,  Section 2, Last Time

Law of Averages, Part II

Nice Java Demo: http://www.amstat.org/publications/jse/v6n3/applets/CLT.html

1 Die (think n = 1): Average Dist’n is flat

2 Dice (n = 2): Average Dist’n is triangle

5 Dice (n = 5): Looks quite “mound shaped”
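The dice shapes can also be computed exactly rather than simulated, by enumerating every roll. A sketch in Python; the function name `average_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def average_distribution(num_dice):
    """Exact distribution of the average of num_dice fair dice."""
    outcomes = list(product(range(1, 7), repeat=num_dice))
    counts = Counter(Fraction(sum(o), num_dice) for o in outcomes)
    return {avg: Fraction(c, len(outcomes)) for avg, c in sorted(counts.items())}

one = average_distribution(1)   # flat: each average 1, ..., 6 has prob 1/6
two = average_distribution(2)   # triangle: 1/36, 2/36, ..., 6/36, ..., 1/36
five = average_distribution(5)  # 7776 outcomes, already mound shaped

# Crude text histogram for 2 dice (one '#' per 1/36)
for avg, prob in two.items():
    print(f"{float(avg):4.1f} {'#' * int(prob * 36)}")
```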

Page 37: Stat 155,  Section 2, Last Time

Law of Averages, Part II

Another cool one: http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html

• Create U shaped distribut’n with mouse

• Simul. samples of size 2: non-Normal

• Size n = 5: more normal

• Size n = 10 or 25: mound shaped

Page 38: Stat 155,  Section 2, Last Time

Law of Averages, Part II

Textbook version:

Publisher’s Website

• Statistical Applets

• Central Limit Theorem

Page 39: Stat 155,  Section 2, Last Time

Law of Averages, Part II

Class Example: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg21.xls

Shows:

• Even starting from non-normal shape,

• Averages become normal

• More so for more averaging

• SD smaller with more averaging (by factor $\frac{1}{\sqrt{n}}$)
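The class example's message (non-normal start, averages become more normal, SD shrinks like $1/\sqrt{n}$) can be sketched with a skewed population. The Excel sheet's actual starting distribution is not given here, so this assumes an Exponential(1) population (mean 1, SD 1) and uses sample skewness, which is 0 for a symmetric shape, as a rough "how normal" measure.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def skewness(xs):
    # Sample skewness: average cubed z-score
    m, s = fmean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s**3)

rng = random.Random(155)
reps = 5_000
results = {}
for n in (1, 5, 25):
    # Averages of n draws from a right-skewed Exponential(1) population
    # (a hypothetical stand-in for the Excel example's starting shape)
    xbars = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    results[n] = (pstdev(xbars), skewness(xbars))
    print(f"n={n}: SD={results[n][0]:.3f} (theory {1 / sqrt(n):.3f}), "
          f"skewness={results[n][1]:.2f}")
```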

Page 40: Stat 155,  Section 2, Last Time

Law of Averages, Part II

HW: 5.35, 5.37, 5.39, 5.45

Page 41: Stat 155,  Section 2, Last Time

And now for something completely different….

A statistics professor was describing sampling theory to his class, explaining how a sample can be studied and used to generalize to a population.

One of the students in the back of the room kept shaking his head.

Page 42: Stat 155,  Section 2, Last Time

And now for something completely different….

"What's the matter?" asked the professor.

"I don't believe it," said the student, "why not study the whole population in the first place?"

The professor continued explaining the ideas of random and representative samples.

The student still shook his head.

Page 43: Stat 155,  Section 2, Last Time

And now for something completely different….

The professor launched into the mechanics of proportional stratified samples, randomized cluster sampling, the standard error of the mean, and the central limit theorem.

The student remained unconvinced, saying, "Too much theory, too risky, I couldn't trust just a few numbers in place of ALL of them."

Page 44: Stat 155,  Section 2, Last Time

And now for something completely different….

Attempting a more practical example, the professor then explained the scientific rigor and meticulous sample selection of the Nielsen television ratings which are used to determine how multiple millions of advertising dollars are spent.

The student remained unimpressed saying, "You mean that just a sample of a few thousand can tell us exactly what over 250 MILLION people are doing?"

Page 45: Stat 155,  Section 2, Last Time

And now for something completely different….

Finally, the professor, somewhat disgruntled with the skepticism, replied,

"Well, the next time you go to the campus clinic and they want to do a blood test...tell them that's not good enough ...

tell them to TAKE IT ALL!!"

From: GARY C. RAMSEYER, http://www.ilstu.edu/~gcramsey/Gallery.html