Stat 31, Section 1, Last Time Inference for Proportions –Hypothesis Tests 2 Sample Proportions...


Stat 31, Section 1, Last Time

• Inference for Proportions

– Hypothesis Tests

• 2 Sample Proportions Inference

– Skipped

• 2-way Tables

– Sliced populations in 2 different ways

– Look for independence of factors

– Chi Square Hypothesis test


Reading In Textbook

Approximate Reading for Today’s Material:

Pages 582-611, 634-667

Approximate Reading for Next Class:

Pages 634-667


Midterm I - Results

Preliminary comments:

• Circled numbers are points taken off

• Total for each problem in brackets

• Points evenly divided among parts

• Page total in lower right corner

• Check those sum to total on front

• Overall score out of 100 points


Midterm I - Results

Interpretation of Scores:

• Too early for letter grades

• These will change a lot:

– Some with good grades will relax

– Some with bad grades will wake up

• Don’t believe “A & C” average to “B”


Midterm I - Results

Interpretation of Scores:

• Recall large variation over 2 midterms

– No exception this semester


Midterm I - Results

Compare Midterm Scores

[Figure: scatterplot of Midterm II score (vertical axis) against Midterm I score (horizontal axis), both axes from 40 to 100]


Midterm I - Results

Compare Midterm Scores

[Figure: scatterplot of Midterm II score (vertical axis) against Midterm I score (horizontal axis), both axes from 40 to 100, with the Line of Equal Scores drawn in]


Midterm I - Results

Compare Midterm Scores

[Figure: scatterplot of Midterm II score (vertical axis) against Midterm I score (horizontal axis), both axes from 40 to 100]

Some have dramatically improved

Others have been distracted by other things


Midterm I - Results

Interpretation of Scores:

• Recall large variation over 2 midterms

– No exception this semester

• Get better info from 2 test Total

– So will report answers in those terms


Midterm I - Results

Histogram of Results:

[Figure: histogram "Midterm I + II, Total Score"; horizontal axis: Total Score; vertical axis: Frequency, from 0 to 14]


Midterm I - Results

Interpretation of Scores (2 Test total):

170 - 200 A

155 – 168 B

131 – 154 C

120 – 129 D

-- 119 F


Midterm I - Results

Where do we go from here?

• I see 2 rather different groups…

• Which are you in?

• What can you do?

• Most important:

It is still early days……


Chapter 9: Two-Way Tables

Main idea:

Divide up populations in two ways

– E.g. 1: Age & Sex

– E.g. 2: Education & Income

• Typical Major Question:

How do divisions relate?

• Are the divisions independent?

– Similar idea to independence in prob. theory

– Statistical Inference?


Two-Way Tables

Big Question: Is there a relationship?

Note: tallest bars

– French Wine with French Music

– Italian Wine with Italian Music

– Other Wine with No Music

Suggests there is a relationship

[Figure: bar chart "Class Example 31 - Counts"; # Bottles purchased (0 to 45) for French Wine, Italian Wine, and Other Wine, grouped by Music (None, French, Italian)]


Two-Way Tables

General Directions:

• Can we make this precise?

• Could it happen just by chance?

– Really: how likely to be a chance effect?

• Or is it statistically significant?

– I.e. music and wine purchase are related?


Two-Way Tables

An alternate view:

Replace counts by proportions (or %-ages)

Class Example 31 (Wine & Music), Part 2
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls

Advantage:

May be more interpretable

Drawback:

No real difference (just rescaled)


Two-Way Tables

Testing for independence:

What is it?

From probability theory:

P{A | B} = P{A}

i.e. Chances of A, when B is known, are same as when B is unknown

Table version of this idea?


Independence in 2-Way Tables

Counts analog of P{A|B}???

Equivalent condition for independence is:

P{A & B} = P{A} x P{B}

So for counts, look for:

Table Prop’n = Row Marg’l Prop’n x Col’n Marg’l Prop’n

i.e. Entry = Product of Marginals
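The step from the conditional form of independence to the product form can be spelled out as follows. This is a short sketch that assumes P{B} > 0 and uses the usual definition of conditional probability, which these slides do not restate:

```latex
P\{A \mid B\} = \frac{P\{A \,\&\, B\}}{P\{B\}} = P\{A\}
\quad\Longleftrightarrow\quad
P\{A \,\&\, B\} = P\{A\}\,P\{B\}
```

Reading A as "falls in this row" and B as "falls in this column" gives the table version: each entry proportion should equal the product of its row and column marginal proportions.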


Independence in 2-Way Tables

Visualize Product of Marginals for:

Class Example 31 (Wine & Music), Part 4
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls

Shows same structure as marginals

But no match between music & wine

Good null hypothesis

[Figure: bar chart "Class Example 31 - Independent Model"; vertical axis from 0 to 0.18 for French Wine, Italian Wine, and Other Wine, grouped by Music (None, French, Italian)]


Independence in 2-Way Tables

Approach:

• Measure “distance between tables”

– Use Chi Square Statistic

– Has known probability distribution when table is independent

• Assess significance using P-value

– Set up as: H0: Indep. HA: Dependent

– P-value = P{what saw or m.c. | Indep.}


Independence in 2-Way Tables

Chi-square statistic: Based on:

• Observed Counts (raw data), $Obs_i$

• Expected Counts (under indep.), $Exp_i$

$X^2 = \sum_{\text{cells } i} \frac{(Obs_i - Exp_i)^2}{Exp_i}$

Notes:

– Small for only random variation

– Large for significant departure from indep.


Independence in 2-Way Tables

Chi-square statistic calculation:

$X^2 = \sum_{\text{cells } i} \frac{(Obs_i - Exp_i)^2}{Exp_i}$

Class example 31, Part 5:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls

– Calculate term by term

– Then sum

– Is $X^2 = 18.3$ “big” or “small”?
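The term-by-term calculation can be sketched outside Excel as well. The counts below are an assumption: the slides point to the spreadsheet rather than listing the table, so these are recalled from the commonly cited version of the wine and music example and should be checked against the spreadsheet. Expected counts come from the product of marginals, and the variable names are illustrative only.

```python
# Observed counts, ASSUMED for illustration (not listed in the slides).
# Rows: wine (French, Italian, Other); columns: music (None, French, Italian).
observed = [
    [30, 39, 30],  # French wine
    [11, 1, 19],   # Italian wine
    [43, 35, 35],  # Other wine
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence:
# n * (row marginal proportion) * (column marginal proportion)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# One (Obs - Exp)^2 / Exp term per cell, then sum over all cells.
terms = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_square = sum(sum(row) for row in terms)

# Degrees of freedom for a test of independence.
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 2), df)
```

With these assumed counts the statistic comes out close to the 18.3 quoted on the slide, and the largest single terms come from the Italian wine row.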


Independence in 2-Way Tables

H0 distribution of the $X^2$ statistic:

“Chi Squared” (another Greek letter: $\chi^2$)

Parameter: “degrees of freedom”

(similar to T distribution)

Excel Computation:

– CHIDIST (given cutoff, find area = prob.)

– CHIINV (given prob = area, find cutoff)


Independence in 2-Way Tables

For test of independence, use:

degrees of freedom =

= (#rows – 1) x (#cols – 1)

E.g. Wine and Music:

d.f. = (3 – 1) x (3 – 1) = 4


Independence in 2-Way Tables

E.g. Wine and Music:

P-value = P{Observed $X^2$ or m.c. | Indep.} =

= P{$X^2$ = 18.3 or m.c. | Indep.} =

= P{$X^2$ >= 18.3 | d.f. = 4} =

= 0.0011

Also see Class Example 31, Part 5
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
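As a cross-check on the CHIDIST step, the upper-tail area can be sketched in plain Python. The helper name `chi_square_upper_tail` is hypothetical, and it relies on a closed form that holds only when the degrees of freedom are even (which is the case for d.f. = 4 here); it is not a general replacement for CHIDIST.

```python
import math


def chi_square_upper_tail(x, df):
    """P{chi-square with df degrees of freedom >= x}, for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    # For df = 2k: exp(-x/2) * sum over j = 0..k-1 of (x/2)^j / j!
    return math.exp(-half) * sum(
        half ** j / math.factorial(j) for j in range(df // 2)
    )


rows, cols = 3, 3              # wine types by music types
df = (rows - 1) * (cols - 1)   # degrees of freedom for independence test
p_value = chi_square_upper_tail(18.3, df)

print(df, round(p_value, 4))
```

For odd degrees of freedom a library routine, or Excel itself, would be the safer choice.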


Independence in 2-Way Tables

E.g. Wine and Music:

P-value = 0.001

Yes-No: Very strong evidence against independence, conclude music has a statistically significant effect

Gray-Level: Also very strong evidence


Independence in 2-Way Tables

Excel shortcut:

CHITEST

• Avoids the (obs-exp)^2 / exp calculation

• Automatically computes d.f.

• Returns P-value


Independence in 2-Way Tables

HW:

9.27

9.29


And Now for Something Completely Different

A statistics joke, from:

GARY C. RAMSEYER'S INTERNET GALLERY

OF STATISTICS JOKES

http://www.ilstu.edu/~gcramsey/Gallery.html


And Now for Something Completely Different

A somewhat advanced society has figured how to package basic knowledge in pill form.

A student, needing some learning, goes to the pharmacy and asks what kind of knowledge pills are available.


And Now for Something Completely Different

The pharmacist says "Here's a pill for English literature."

The student takes the pill and swallows it and has new knowledge about English literature!


And Now for Something Completely Different

"What else do you have?" asks the student.

"Well, I have pills for art history, biology, and world history," replies the pharmacist.

The student asks for these, and swallows them and has new knowledge about those subjects!


And Now for Something Completely Different

Then the student asks, "Do you have a pill for statistics?"

The pharmacist says "Wait just a moment", and goes back into the storeroom and brings back a whopper of a pill that is about twice the size of a jawbreaker and plunks it on the counter.

"I have to take that huge pill for statistics?" inquires the student.


And Now for Something Completely Different

The pharmacist understandingly nods his head and replies:

"Well, you know statistics always was a little hard to swallow."


Caution about 2-Way Tables

Simpson’s Paradox:

Aggregation into tables can be dangerous

E.g. from:

http://www.math.sfu.ca/~cschwarz/Stat-301/Handouts/node50.html

Study Admission rates to professional programs, look for sex bias….


Simpson’s Paradox

Admissions to Business School:

         Admit   Deny
Male       480    120
Female     180     20

% Males admitted = 480 / (480 + 120) * 100% = 80%

% Females admitted = 180 / (180 + 20) * 100% = 90%

Better for females???


Simpson’s Paradox

Admissions to Law School:

         Admit   Deny
Male        10     90
Female     100    200

% Males admitted = 10 / (10 + 90) * 100% = 10%

% Females admitted = 100 / (100 + 200) * 100% = 33.3%

Better for females???


Simpson’s Paradox

Combined Admissions:

         Admit   Deny
Male       490    210
Female     280    220

% Males admitted = 490 / (490 + 210) * 100% = 70%

% Females admitted = 280 / (280 + 220) * 100% = 56%

Better for males???


Simpson’s Paradox

How can the rate be higher for females in each school, and also higher for males overall?

Reason: depends on relative proportions

Notes:

• In Business (male applicants dominant), easier to get in (660 / 800)

• In Law (female applicants dominant), much harder to get in (110 / 400)
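The reversal can be reproduced directly from the three tables. The sketch below uses only the counts given on the slides; the dictionary layout and the helper name `admit_rate` are illustrative choices, not part of the course material.

```python
# (admitted, denied) counts from the slides, by school and sex.
counts = {
    "Business": {"Male": (480, 120), "Female": (180, 20)},
    "Law": {"Male": (10, 90), "Female": (100, 200)},
}


def admit_rate(admitted, denied):
    """Fraction of applicants admitted."""
    return admitted / (admitted + denied)


# Rates within each school.
per_school = {
    school: {sex: admit_rate(*cell) for sex, cell in by_sex.items()}
    for school, by_sex in counts.items()
}

# Rates after aggregating the two schools into one table.
combined = {}
for sex in ("Male", "Female"):
    admitted = sum(counts[school][sex][0] for school in counts)
    denied = sum(counts[school][sex][1] for school in counts)
    combined[sex] = admit_rate(admitted, denied)

for school, rates in per_school.items():
    print(school, {sex: round(rate, 3) for sex, rate in rates.items()})
print("Combined", {sex: round(rate, 3) for sex, rate in combined.items()})
```

Females have the higher rate within each school, yet the combined male rate is higher, because most male applicants went to the school that is easier to get into.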


Simpson’s Paradox

Lesson:

Must be very careful about aggregation

Worse: may not be aware that aggregation has been done….

Recall terminology: Lurking Variable

Can hide in aggregation…

Could be used for cheating…


Simpson’s Paradox

HW:

9.15

9.17


Inference for Regression

Chapter 10

Recall:

• Scatterplots

• Fitting Lines to Data

Now study statistical inference associated with fit lines

E.g. When is slope statistically significant?


Recall Scatterplot

For data (x,y)

View by plot:

(1,2)

(3,1)

(-1,0)

(2,-1)

[Figure: "Toy Scatterplot, Separate Points"; the four points above plotted with x from -2 to 4 and y from -1.5 to 2.5]


Recall Linear Regression

Idea:

Fit a line to data in a scatterplot

• To learn about “basic structure”

• To “model data”

• To provide “prediction of new values”


Recall Linear Regression

Recall some basic geometry:

A line is described by an equation:

y = mx + b

m = slope

b = y intercept

Varying m & b gives a “family of lines”, indexed by “parameters” m & b (or a & b)


Recall Linear Regression

Approach:

Given a scatterplot of data:

$(x_1, y_1), \ldots, (x_n, y_n)$

Find a & b (i.e. choose a line)

to “best fit the data”


Recall Linear Regression

Given a line, $y = bx + a$, “indexed” by $b$ & $a$

Define “residuals” = “data Y” – “Y on line”

= $y_i - (b x_i + a)$

Now choose $b$ & $a$ to make these “small”

[Figure: data points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ shown with a line]


Recall Linear Regression

Excellent Demo, by Charles Stanton, CSUSB
http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html

More JAVA Demos, by David Lane at Rice U.
http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html

http://www.ruf.rice.edu/~lane/stat_sim/comp_r/index.html


Recall Linear Regression

Make Residuals ≥ 0, by squaring

Least Squares: adjust $b$ & $a$ to

Minimize the “Sum of Squared Errors”

$SSE = \sum_{i=1}^{n} (y_i - (b x_i + a))^2$
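A small sketch of the least squares fit, using the four toy points from the scatterplot slide. It relies on the standard closed-form minimizer of the SSE (slope = S_xy / S_xx, intercept = y-bar minus slope times x-bar), which the slides do not derive. The names `slope`, `intercept`, and `sse` are illustrative and chosen to sidestep the a/b labelling.

```python
# Toy data from the scatterplot slide.
points = [(1, 2), (3, 1), (-1, 0), (2, -1)]
xs = [x for x, _ in points]
ys = [y for _, y in points]
n = len(points)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)

# Closed-form least squares estimates.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar


def sse(slope_value, intercept_value):
    """Sum of squared residuals for the line y = slope * x + intercept."""
    return sum(
        (y - (slope_value * x + intercept_value)) ** 2 for x, y in points
    )


print(slope, intercept, sse(slope, intercept))
```

Nudging either value away from the fitted one, for example `sse(slope + 0.1, intercept)`, gives a larger sum of squares, which is the sense in which the fitted line is "best".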


Least Squares in Excel

Computation:

1. INTERCEPT (computes y-intercept a)

2. SLOPE (computes slope b)

Revisit Class Example 14
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg14.xls

HW: 10.17a


Inference for Regression

Goal: develop

• Hypothesis Tests and Confidence Int’s

• For slope & intercept parameters, a & b

• Also study prediction


Inference for Regression

Idea: do statistical inference on:

– Slope a

– Intercept b

Model: $Y_i = a X_i + b + e_i$

Assume: $e_i$ are random, independent

and $N(0, \sigma_e)$


Inference for Regression

Viewpoint: Data generated as:

y = ax + b

[Figure: the line y = ax + b, with Yi chosen from a distribution centered on the line at Xi]

Note: a and b are “parameters”


Inference for Regression

Parameters $a$ and $b$ determine the underlying model (distribution)

Estimate with the Least Squares Estimates: $\hat{a}$ and $\hat{b}$

(Using SLOPE and INTERCEPT in Excel, based on data)


Inference for Regression

Distributions of $\hat{a}$ and $\hat{b}$?

Under the above assumptions, the sampling distributions are:

$\hat{a} \sim N(a, \sigma_a)$

$\hat{b} \sim N(b, \sigma_b)$

• Centerpoints are right (unbiased)

• Spreads are more complicated


Inference for Regression

Formula for SD of $\hat{a}$:

$\sigma_a = SD(\hat{a}) = \frac{\sigma_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

• Big (small) for $\sigma_e$ big (small, resp.)

– Accurate data ⇒ Accurate est. of slope

• Small for x’s more spread out

– Data more spread ⇒ More accurate

• Small for more data

– More data ⇒ More accuracy


Inference for Regression

Formula for SD of $\hat{b}$:

$\sigma_b = SD(\hat{b}) = \sigma_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

• Big (small) for $\sigma_e$ big (small, resp.)

– Accurate data ⇒ Accurate est. of intercept

• Smaller for $\bar{x} \approx 0$

– Centered data ⇒ More accurate intercept

• Smaller for more data

– More data ⇒ More accuracy


Inference for Regression

One more detail:

Need to estimate $\sigma_e$ using data

For this use:

$s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{a} x_i - \hat{b})^2}{n - 2}}$

• Similar to earlier sd estimate, $s$

• Except variation is about fit line

• $n - 2$ is similar to $n - 1$ from before


Inference for Regression

Now for Probability Distributions,

Since are estimating $\sigma_e$ by $s_e$

Use TDIST and TINV

With degrees of freedom = $n - 2$
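The pieces above can be put together in a short sketch, again on the four toy points from the scatterplot slide. It estimates the error SD about the fit line, plugs that estimate into the two SD formulas, and forms the usual t statistic for testing whether the slope is zero. The form of that test statistic (estimate divided by its estimated SD) is an assumption here, since the slides only name TDIST and TINV. All variable names are illustrative.

```python
import math

# Toy data from the scatterplot slide.
points = [(1, 2), (3, 1), (-1, 0), (2, -1)]
n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n
s_xx = sum((x - x_bar) ** 2 for x, _ in points)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)

# Least squares estimates of slope and intercept.
slope_hat = s_xy / s_xx
intercept_hat = y_bar - slope_hat * x_bar

# Estimate of the error SD: variation about the fit line, divisor n - 2.
residuals = [y - (slope_hat * x + intercept_hat) for x, y in points]
s_e = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Estimated SDs of the two estimates (s_e in place of the unknown sigma_e).
se_slope = s_e / math.sqrt(s_xx)
se_intercept = s_e * math.sqrt(1 / n + x_bar ** 2 / s_xx)

# t statistic for H0: slope = 0, compared against a t distribution
# with n - 2 degrees of freedom.
t_slope = slope_hat / se_slope
df = n - 2

print(round(s_e, 4), round(se_slope, 4), round(se_intercept, 4))
print(round(t_slope, 4), df)
```

The P-value step is left to TDIST (or a statistics library), since the Python standard library has no t distribution function.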


Inference for Regression

Convenient Packaged Analysis in Excel:

Tools → Data Analysis → Regression

Illustrate application using:

Class Example 27,

Old Text Problem 8.6 (now 10.12)