Stor 155, Section 2, Last Time

Transcript of Stor 155, Section 2, Last Time

Page 1: Stor 155,  Section 2, Last Time

Stor 155, Section 2, Last Time

• Inference for Regression

– Least Square Fits

– Sampling distrib’ns for slope and intercept

• Regression Tool

– Gave many useful answers

(CIs, Hypo Tests, Graphics,…)

– But had to “translate language”

Page 2: Stor 155,  Section 2, Last Time

Reading In Textbook

Approximate Reading for Today’s Material:

Pages 634-667

Next Time: All review

Page 3: Stor 155,  Section 2, Last Time

Stat 31 Final Exam: Date & Time:

Tuesday, May 8,  8:00-11:00

Last Office Hours:

• Thursday, May 3, 12:00 - 5:00

• Monday, May 7, 10:00 - 5:00

• & by email appointment (earlier)

Bring with you, to exam:

• Single (8.5" x 11") sheet of formulas

• Front & back OK

Page 4: Stor 155,  Section 2, Last Time

Prediction in Regression

Idea: Given data $(X_1, Y_1), \ldots, (X_n, Y_n)$, can find the Least Squares Fit Line, and do inference for the parameters.

Given a new X value, say $X_0$, what will the new Y value be?

Page 5: Stor 155,  Section 2, Last Time

Prediction in Regression

Dealing with variation in prediction:

Under the model: $Y_i = a + b X_i + e_i$

A sensible guess about $Y_0$, based on the given $X_0$, is:

$\hat{Y}_0 = \hat{a} + \hat{b} X_0$

(point on the fit line above $X_0$)
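As a quick illustration (not from the original slides), here is a minimal Python sketch of the least squares fit and the point prediction $\hat{Y}_0 = \hat{a} + \hat{b} X_0$; the data values are made up for demonstration:

```python
import numpy as np

# Made-up paired data; any (x, y) sample would do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares slope and intercept.
b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_hat = y.mean() - b_hat * x.mean()

# Point prediction at a new value x0: the point on the fit line above x0.
x0 = 3.5
y0_hat = a_hat + b_hat * x0
print(y0_hat)
```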

Page 6: Stor 155,  Section 2, Last Time

Prediction in Regression

What about variation about this guess?

Natural Approach: present an interval (as done with Confidence Intervals)

Careful: Two Notions of this:

1. Confidence Interval for mean of $Y_0$

2. Prediction Interval for value of $Y_0$

Page 7: Stor 155,  Section 2, Last Time

Prediction in Regression

1. Confidence Interval for mean of $Y_0$:

Use: $\hat{Y}_0 \pm t^* SE_{\hat{\mu}}$

where: $SE_{\hat{\mu}} = s \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$

and where $t^* = TINV(1 - 0.95, n - 2)$
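Purely as an illustration of the formula above (the class itself used Excel's TINV and the Regression Tool), here is a hedged Python sketch; scipy's t.ppf plays the role of TINV, and the function name ci_mean_response is just a made-up label:

```python
import numpy as np
from scipy import stats

def ci_mean_response(x, y, x0, conf=0.95):
    """Confidence interval for the mean of Y at x0, following the slide's formula."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    resid = y - (a + b * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))        # residual standard deviation
    se = s * np.sqrt(1.0 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
    t_star = stats.t.ppf(1 - (1 - conf) / 2, n - 2)  # same value as Excel TINV(1 - conf, n - 2)
    y0_hat = a + b * x0
    return y0_hat - t_star * se, y0_hat + t_star * se
```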

Page 8: Stor 155,  Section 2, Last Time

Prediction in Regression

Interpretation of: $SE_{\hat{\mu}} = s \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$

• Smaller for $x_0$ closer to $\bar{x}$

• But never 0

• Smaller for more spread out $x_i$'s

• Larger for larger $s$

Page 9: Stor 155,  Section 2, Last Time

Prediction in Regression

2. Prediction Interval for value of $Y_0$:

Use: $\hat{Y}_0 \pm t^* SE_{\hat{Y}}$

where: $SE_{\hat{Y}} = s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$

And again $t^* = TINV(1 - 0.95, n - 2)$
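Continuing the illustrative sketch above (again an assumption-laden demo, not the course's Excel tool), only the standard error changes for the prediction interval: note the extra "1 +" under the square root.

```python
import numpy as np
from scipy import stats

def pi_new_value(x, y, x0, conf=0.95):
    """Prediction interval for a new Y at x0: same as the CI sketch,
    except for the extra '1 +' inside the square root."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))
    se = s * np.sqrt(1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
    t_star = stats.t.ppf(1 - (1 - conf) / 2, n - 2)
    y0_hat = a + b * x0
    return y0_hat - t_star * se, y0_hat + t_star * se
```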

Page 10: Stor 155,  Section 2, Last Time

Prediction in Regression

Interpretation of: $SE_{\hat{Y}} = s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$

• Similar remarks to above …

• Additional “1 +” accounts for added variation in $Y_0$ compared to $\hat{Y}_0$

Page 11: Stor 155,  Section 2, Last Time

Prediction in Regression

Revisit Class Example 33, Textbook Problems 10.23-10.25:

Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight and its actual position, in tenths of a millimeter, in excess of 2.9 meters. The data are listed above…

Page 12: Stor 155,  Section 2, Last Time

Prediction in Regression

??? Next time: spruce up these examples a lot ???

Page 13: Stor 155,  Section 2, Last Time

Prediction in Regression

Class Example 33, Textbook Problems 10.23 – 10.25:

(a) Plot the data. Does the trend in lean over time appear to be linear?

(b) What is the equation of the least squares fit line?

(c) Give a 95% confidence interval for the average rate of change of the lean.

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg33.xls

Page 14: Stor 155,  Section 2, Last Time

Prediction in Regression

HW:

10.7 b, c, d

10.8 ((c) [11610, 12660], [9554, 14720])

Page 15: Stor 155,  Section 2, Last Time

Prediction in Regression

Revisit Class Example 33, Textbook Problems 10.23 – 10.25:

Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight and its actual position, in tenths of a millimeter, in excess of 2.9 meters. The data are listed above…

Page 16: Stor 155,  Section 2, Last Time

Prediction in Regression

Class Example 33, Problem 10.24:

(a) In 1918 the lean was 2.9071 (the coded value is 71). Using the least squares equation for the years 1975 to 1987, calculate a predicted value for the lean in 1918.

(b) Although the least squares line gives an excellent fit for 1975 – 1987, this pattern did not extend back to 1918. Why?

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg33.xls

Page 17: Stor 155,  Section 2, Last Time

Prediction in Regression

Class Example 33, Problem 10.25:

(a) How would you code the explanatory variable for the year 2002?

(b) The engineers working on the tower were most interested in how much it would lean if no corrective action were taken. Use the least squares line to predict the lean in 2005.

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg33.xls

Page 18: Stor 155,  Section 2, Last Time

Prediction in Regression

Class Example 33, Problem 10.25:

(c) To give a margin of error for the lean in 2005, would you use a confidence interval for the mean, or a prediction interval? Explain your choice.

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg33.xls

Page 19: Stor 155,  Section 2, Last Time

Prediction in Regression

Class Example 33, Problem 10.25:

(d) Give the values of the 95% confidence interval for the mean, and the 95% prediction interval. How do they compare?

Recall generic formula (same for both): $\hat{Y}_0 \pm t^* SE$

Page 20: Stor 155,  Section 2, Last Time

Prediction in Regression

Class Example 33, Problem 10.25

Difference was in form for SE:

CI for mean: $SE_{\hat{\mu}} = s \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$

PI for value: $SE_{\hat{Y}} = s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg33.xls

Page 21: Stor 155,  Section 2, Last Time

Outliers in Regression

Caution about regression: Outliers can have a major impact

http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html

• Single point can throw slope way off

• And intercept too

• Can watch for this, using plot

• And residual plots show this, too
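A small made-up demonstration (not the applet linked above) of how a single point can throw the slope way off:

```python
import numpy as np

def ls_fit(x, y):
    """Least squares intercept and slope."""
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b * x.mean(), b

x = np.arange(10.0)
y = 2.0 * x + 1.0                  # perfectly linear data: slope 2, intercept 1

a_clean, b_clean = ls_fit(x, y)
y_out = y.copy()
y_out[-1] = 60.0                   # one wild point at the right edge
a_out, b_out = ls_fit(x, y_out)    # slope and intercept both get pulled off

print(b_clean, b_out)              # compare slopes with and without the outlier
```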

Page 22: Stor 155,  Section 2, Last Time

Nonlinear Regression

When lines don’t fit data:

• How do we know?

• What can we do?

• There is a lot…

• But beyond scope of this course

• Some indication…

Page 23: Stor 155,  Section 2, Last Time

Nonlinear Regression

Class Example 34: World Population

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg34.xls

Main lessons:

• Data can be non-linear

• Identify with plot

• Residuals even more powerful at this

• Look for systematic structure

Page 24: Stor 155,  Section 2, Last Time

Nonlinear Regression

Class Example 34: World Population

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg34.xls

When data are non-linear:

• There is non-linear regression

• But not covered here

• Can use lin. reg’n on transformed data

• Log transform often useful (see the sketch below)
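A minimal sketch (made-up exponential-growth numbers, not the Class Example 34 data) of linear regression after a log transform:

```python
import numpy as np

# Made-up exponential-growth data, standing in for something like world population.
year = np.arange(1950.0, 2001.0, 10.0)
pop = 2.5 * np.exp(0.018 * (year - 1950.0))    # "billions"

# Fit a line to log(pop): log(pop) = a + b * year, i.e. linear regression on transformed data.
ly = np.log(pop)
b = np.sum((year - year.mean()) * (ly - ly.mean())) / np.sum((year - year.mean()) ** 2)
a = ly.mean() - b * year.mean()

fitted = np.exp(a + b * year)   # back-transform to the original scale
print(b)                        # recovers the growth rate 0.018 (up to rounding)
```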

Page 25: Stor 155,  Section 2, Last Time

Next time:Additional Issues in Regression

Robustness

Outliers via Java Applet

HW on outliers

Page 26: Stor 155,  Section 2, Last Time

And Now for Something Completely Different

Etymology of:

“And now for something completely

different”

Anybody heard of this before?

(really 2 questions…)

Page 27: Stor 155,  Section 2, Last Time

And Now for Something Completely Different

What is “etymology”?

Google responses to:

define: etymology

• The history of words; the study of the history of words. (csmp.ucop.edu/crlp/resources/glossary.html)

• The history of a word shown by tracing its development from another language. (www.animalinfo.org/glosse.htm)

Page 28: Stor 155,  Section 2, Last Time

And Now for Something Completely Different

What is “etymology”?

• Etymology is derived from the Greek word ἔτυμον (etymon) meaning "a sense" and λόγος (logos) meaning "word." Etymology is the study of the original meaning and development of a word, tracing its meaning back as far as possible. (www.two-age.org/glossary.htm)

Page 29: Stor 155,  Section 2, Last Time

And Now for Something Completely Different

Google response to: define: and now for something completely different

And Now For Something Completely Different is a film spinoff from the television comedy series Monty Python's Flying Circus. The title originated as a catchphrase in the TV show. Many Python fans feel that it excellently describes the nonsensical, non sequitur feel of the program. (en.wikipedia.org/wiki/And_Now_For_Something_Completely_Different)

Page 30: Stor 155,  Section 2, Last Time

And Now for Something Completely Different

Google Search for:

“And now for something completely different”

Gives more than 100 results….

A perhaps interesting one:

http://www.mwscomp.com/mpfc/mpfc.html

Page 31: Stor 155,  Section 2, Last Time

And Now for Something Completely Different

Google Search for: “Stor 155 and now for something completely different”

Gives:

[PPT] Slide 1
File Format: Microsoft Powerpoint - View as HTML
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/ ... And Now for Something Completely Different. P: Dead bugs on windshield. ...
stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155-07-01-30.ppt - Similar pages

Page 32: Stor 155,  Section 2, Last Time

Review Slippery Issues

Major Confusion:

Population Quantities

Vs.

Sample Quantities

Page 33: Stor 155,  Section 2, Last Time

Review Slippery Issues

Population Mathematical Notation: $\mu, \sigma, p$

(fixed & unknown)

Sample Mathematical Notation: $\bar{X}, s, \hat{p}$

(summaries of data, have numbers)

Page 34: Stor 155,  Section 2, Last Time

Hypothesis Testing – Z scores

E.g. Fast Food Menus:

Test $H_0: \mu = \$20{,}000$ vs. $H_A: \mu > \$20{,}000$

Using $\bar{X} = \$21{,}000$, $s = \$2{,}400$, $n = 10$

P-value = P{what saw or m.c. | H0 & HA bd’ry}

(guides where to put $21k & $20k)

Page 35: Stor 155,  Section 2, Last Time

Hypothesis Testing – Z scores

P-value = P{what saw or m.c. | H0 & HA bd’ry}

$= P\{\bar{X} \ge \$21{,}000 \mid \text{bd'ry}\}$

$= P\{\bar{X} \ge \$21{,}000 \mid \mu = \$20{,}000\}$

$= P\left\{\dfrac{\bar{X} - \mu}{s/\sqrt{n}} \ge \dfrac{\$21{,}000 - \$20{,}000}{\$2{,}400/\sqrt{10}}\right\}$

$= P\{Z \ge 1.317\}$
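A quick check of the slide's arithmetic (illustrative Python; the class computation itself was done in Excel, Class E.g. 24):

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, s, n = 21_000, 20_000, 2_400, 10

z = (x_bar - mu0) / (s / sqrt(n))    # about 1.317: "SDs away from $20,000"
p_value = 1 - NormalDist().cdf(z)    # upper-tail probability, about 0.094
print(z, p_value)
```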

Page 36: Stor 155,  Section 2, Last Time

Hypothesis Testing – Z scores

P-value $= P\{Z \ge 1.317\}$

This is the Z-score

Computation: Class E.g. 24, Part 6
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg24.xls

Distribution: N(0,1)

Page 37: Stor 155,  Section 2, Last Time

Hypothesis Testing – Z scores

P-value $= P\{Z \ge 1.317\}$

So instead of reporting the tail probability,

report this cutoff instead,

as “SDs away from mean $20,000”

Page 38: Stor 155,  Section 2, Last Time

Review for Final

An Important Mode of Thinking:

Ideas vs. “Cookbook”

Page 39: Stor 155,  Section 2, Last Time

Ideas vs. “Cookbook”

How do you view your sheet of formulas?

• A set of recipes?

• “Look through list” to solve problems?

• Getting harder to find now?

Problems:

• Too many decisions to make

• Hard to sort out while looking through…

Page 40: Stor 155,  Section 2, Last Time

Ideas vs. “Cookbook”

Too many decisions to make, e.g.:

• Binomial vs. Normal

• 1-sided vs. 2-sided Hypo. Tests

• TDIST (INV) vs. NORMDIST (INV)

• CI vs. Sample Size calculation

• 1 sample vs. 2 sample

• Which is H0, HA? And what direction?

• What is “m.c.”? What is “Bdry”?

Page 41: Stor 155,  Section 2, Last Time

Ideas vs. “Cookbook”

Suggested Approach:

• Use concepts to guide choice

• This is what I try to teach

• And is what I am testing for…

How to learn?

• Go through old HW (random order)

• When stumped, look through notes

(look for main Ideas, not “the right formula”)

Page 42: Stor 155,  Section 2, Last Time

A useful concept

Perhaps not well taught?

|a – b| = “number of spaces between a and b on the number line”

e.g. Midterm II, problem 3c (x = 0, 1, 2, 3)

{|X – 1| > 1} = {“number of spaces between X and 1 is more than 1”}

= {X = 3}

Page 43: Stor 155,  Section 2, Last Time

A useful concept

e.g. Midterm II, problem 3c (x = 0, 1, 2, 3)

{|X – 1| > 1} = {“number of spaces between X and 1 is more than 1”}

= {X = 3}

Because, on the number line: 0   1   2   3
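A one-line check of that event (illustrative Python, not part of the exam solution):

```python
# Enumerate x = 0, 1, 2, 3 and keep those with |x - 1| > 1; only x = 3 qualifies.
values = [0, 1, 2, 3]
event = [x for x in values if abs(x - 1) > 1]
print(event)   # [3]
```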

Page 44: Stor 155,  Section 2, Last Time

Response to a Request

You said at the end of today's class that you would be willing to take class time to "reteach" concepts that might still be unknown to us.

Well, in my case, it seems that probability and probability distribution is a hard concept for me to grasp.

On the first midterm, I missed … and on the second midterm, I missed …

I seem to be able to grasp the other concepts involving binomial distribution, normal distribution, t-distribution, etc fairly well, but probability is really killing me on the exams.

If you could reteach these or brush up on them I would greatly appreciate it.

Page 45: Stor 155,  Section 2, Last Time

A Flash from the Past

Two HW “Traps”

1. Working together:

• Great, if the relationship is equal

• But don’t be the “yes, I get it” person…

2. The HW “Consortium”:

• You do HW 1, and I’ll do HW 2…

• Easy with electronic HW

• Trap: HW is about learning

• You don’t learn on your off weeks…