Stat 31, Section 1, Last Time 2-way tables –Testing for Independence –Chi-Square distance...

Post on 12-Jan-2016


Stat 31, Section 1, Last Time

• 2-way tables
  – Testing for Independence
  – Chi-Square distance between data & model
  – Chi-Square Distribution
  – Gives P-values (CHIDIST)

• Simpson’s Paradox:
  – Lurking variables can reverse comparisons

• Recall Linear Regression
  – Fit a line to a scatterplot

Recall Linear Regression

Idea:

Fit a line to data in a scatterplot

Recall Class Example 14: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg14.xls

• To learn about “basic structure”

• To “model data”

• To provide “prediction of new values”

Inference for Regression

Goal: develop

• Hypothesis Tests and Confidence Int’s

• For slope & intercept parameters, a & b

• Also study prediction

Inference for Regression

Idea: do statistical inference on:

– Slope a

– Intercept b

Model: $Y_i = aX_i + b + e_i$

Assume: $e_i$ are random, independent, and $e_i \sim N(0, \sigma_e)$

Inference for Regression

Viewpoint: Data generated as:

Line y = ax + b, with each $Y_i$ chosen from the normal distribution centered on the line above $X_i$

Note: a and b are “parameters”

Inference for Regression

Parameters $a$ and $b$ determine the underlying model (distribution)

Estimate with the Least Squares Estimates: $\hat{a}$ and $\hat{b}$

(Using SLOPE and INTERCEPT in Excel, based on data)
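As a rough sketch of what the least squares estimates compute, the Python below mirrors the usual slope and intercept formulas. The function name `least_squares` and the four data points are assumptions made for illustration; they do not come from the class examples.

```python
def least_squares(xs, ys):
    """Return (a_hat, b_hat): slope and intercept of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross products and sum of squares about the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a_hat = sxy / sxx               # slope (the role SLOPE plays in Excel)
    b_hat = y_bar - a_hat * x_bar   # intercept (the role INTERCEPT plays)
    return a_hat, b_hat

# Hypothetical data lying exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a_hat, b_hat = least_squares(xs, ys)
print(a_hat, b_hat)
```

Because the illustrative points fall exactly on a line, the estimates recover that line's slope and intercept; with real data they give the line that minimizes the sum of squared vertical distances.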

Inference for Regression

Distributions of $\hat{a}$ and $\hat{b}$?

Under the above assumptions, the sampling distributions are:

$\hat{a} \sim N(a, \sigma_a)$

$\hat{b} \sim N(b, \sigma_b)$

• Centerpoints are right (unbiased)

• Spreads are more complicated

Inference for Regression

Formula for SD of $\hat{a}$:

$\sigma_a = SD(\hat{a}) = \frac{\sigma_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

• Big (small) for $\sigma_e$ big (small, resp.)
  – Accurate data ⇒ Accurate est. of slope

• Small for x’s more spread out
  – Data more spread ⇒ More accurate

• Small for more data
  – More data ⇒ More accuracy

Inference for Regression

Formula for SD of $\hat{b}$:

$\sigma_b = SD(\hat{b}) = \sigma_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

• Big (small) for $\sigma_e$ big (small, resp.)
  – Accurate data ⇒ Accurate est. of intercept

• Smaller for $\bar{x} \approx 0$
  – Centered data ⇒ More accurate intercept

• Smaller for more data
  – More data ⇒ More accuracy

Inference for Regression

One more detail:

Need to estimate $\sigma_e$ using data

For this use: $s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{a}x_i - \hat{b})^2}{n - 2}}$

• Similar to earlier sd estimate, $s$

• Except variation is about fit line

• $n - 2$ is similar to $n - 1$ from before
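To make the estimate of $\sigma_e$ and the two SD formulas concrete, here is a minimal Python sketch. It borrows the gas consumption data of Class Example 27 (listed later in these notes) and plugs $s_e$ in for $\sigma_e$; the variable names are illustrative assumptions, not part of the course materials.

```python
import math

# Gas consumption data from Class Example 27
# (x = heating degree days, y = gas use in 100s of cubic feet)
xs = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
ys = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b_hat = y_bar - a_hat * x_bar

# s_e: spread of the data about the fit line, with n - 2 in the denominator
sse = sum((y - a_hat * x - b_hat) ** 2 for x, y in zip(xs, ys))
s_e = math.sqrt(sse / (n - 2))

# Estimated SDs of the slope and intercept, with s_e replacing sigma_e
se_a = s_e / math.sqrt(sxx)
se_b = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)

print(f"s_e = {s_e:.4f}, SE(a_hat) = {se_a:.4f}, SE(b_hat) = {se_b:.4f}")
```

The intercept's standard error is much larger than the slope's here, partly because the x values are centered far from 0.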

Inference for Regression

Now for Probability Distributions,

Since we are estimating $\sigma_e$ by $s_e$

Use TDIST and TINV

With degrees of freedom = $n - 2$
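For readers working outside Excel, the sketch below imitates the two-tailed versions of TDIST and TINV using only the Python standard library. It is an illustrative approximation (Simpson's rule plus bisection), not Excel's algorithm; the function names, the step count, and the search range of 0 to 100 are assumptions.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def tdist_two_tail(t, df, steps=2000):
    """Two-tailed area P(|T| > t), in the spirit of TDIST(t, df, 2).

    steps must be even for Simpson's rule.
    """
    t = abs(t)
    h = t / steps
    # Simpson's rule for the area under the density between 0 and t
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area_0_to_t = total * h / 3
    return 2 * (0.5 - area_0_to_t)

def tinv(p, df):
    """Two-tailed critical value t* with P(|T| > t*) = p, like TINV(p, df)."""
    lo, hi = 0.0, 100.0  # assumed search range for t*
    for _ in range(60):
        mid = (lo + hi) / 2
        if tdist_two_tail(mid, df) > p:
            lo = mid   # tail area still too large, so t* lies further out
        else:
            hi = mid
    return (lo + hi) / 2

n = 13                           # e.g. a data set with 13 points
t_star = tinv(1 - 0.95, n - 2)   # degrees of freedom = n - 2
print(round(t_star, 3))
```

The call pattern `tinv(1 - 0.95, n - 2)` matches how TINV is used for 95% intervals in regression: the first argument is the total two-tailed area left outside the interval.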

Inference for Regression

Convenient Packaged Analysis in Excel:

Tools → Data Analysis → Regression

Illustrate application using:

Class Example 27,

Old Text Problem 8.6 (now 10.12)

Inference for Regression

Class Example 27, Old Text Problem 8.6 (now 10.12)

Utility companies estimate energy used by their customers. Natural gas consumption depends on temperature. A study recorded average daily gas consumption y (100s of cubic feet) for each month. The explanatory variable x is the average number of heating degree days for that month. Data for October through June are:

Month   X = Deg. Days   Y = Gas Cons’n
Oct     15.6            5.2
Nov     26.8            6.1
Dec     37.8            8.7
Jan     36.4            8.5
Feb     35.5            8.8
Mar     18.6            4.9
Apr     15.3            4.5
May     7.9             2.5
Jun     0               1.1

Inference for Regression

Class Example 27, Old Text Problem 8.6 (now 10.12)

Excel Analysis: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg27.xls

Good News:

Lots of things done automatically

Bad News:

Different language,

so need careful interpretation

Inference for Regression

Excel Glossary:

Excel          Stat 31
R2             $r^2$ = Prop’n of Sum of Squares Explained by Line
intercept      Intercept b
X Variable     Slope a
Coefficient    Estimates $\hat{a}$ & $\hat{b}$

Inference for Regression

Excel Glossary:

Excel             Stat 31
Standard Errors   Estimates of $\sigma_a$ & $\sigma_b$ (recall from Sampling Dist’ns)
T – Stat.         (Est. – mean) / SE, i.e. put on scale of T – distribution
P-value           For 2-sided test of: $H_0: a = 0$ vs. $H_A: a \neq 0$
                  (respectively $H_0: b = 0$ vs. $H_A: b \neq 0$)
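The glossary entries for Standard Errors, T – Stat. and P-value can be traced by hand. The sketch below does so for the gas consumption data of Class Example 27, using only the Python standard library. The numerical integration used for the P-value is an illustrative stand-in for TDIST, and all names are assumptions.

```python
import math

# Gas consumption data from Class Example 27
xs = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
ys = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]

n = len(xs)
df = n - 2
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b_hat = y_bar - a_hat * x_bar
s_e = math.sqrt(sum((y - a_hat * x - b_hat) ** 2 for x, y in zip(xs, ys)) / df)
se_a = s_e / math.sqrt(sxx)                        # "Standard Error" of slope
se_b = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)   # "Standard Error" of intercept

def t_pdf(t, df):
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| > |t|) by Simpson's rule (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 2 * (0.5 - total * h / 3)

# T - Stat: (estimate - hypothesized mean of 0) / SE
t_a = (a_hat - 0) / se_a
t_b = (b_hat - 0) / se_b
p_a = two_sided_p(t_a, df)   # P-value for H0: a = 0 vs. HA: a != 0
p_b = two_sided_p(t_b, df)   # P-value for H0: b = 0 vs. HA: b != 0

print(f"slope:     t = {t_a:.2f}, P-value = {p_a:.2g}")
print(f"intercept: t = {t_b:.2f}, P-value = {p_b:.2g}")
```

A small P-value for the slope is evidence against $a = 0$, i.e. evidence that gas consumption really does change with degree days.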

Inference for Regression

Excel Glossary:

Excel                   Stat 31
Lower 95%, Upper 95%    Ends of 95% Confidence Interval for a and b
                        (since chose 0.95 for Confidence level)
Predicted Y             Points on line at $X_i$, i.e. $\hat{Y}_i = \hat{a}X_i + \hat{b}$

Inference for Regression

Excel Glossary:

Excel                 Stat 31
Residual              $Y_i - (\hat{a}X_i + \hat{b})$ for $X_i$
                      Recall: gave useful information about quality of fit
Standard Residuals    $(Y_i - (\hat{a}X_i + \hat{b})) / \sigma_e$, i.e. residuals on standardized scale
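As an illustration of the Predicted, Residual and Standard Residuals columns, the sketch below computes them for the gas consumption data of Class Example 27. Dividing by $s_e$ is an assumption made here for simplicity; Excel's own column may use a slightly different divisor, so the numbers need not match the spreadsheet exactly.

```python
import math

# Gas consumption data from Class Example 27
xs = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
ys = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b_hat = y_bar - a_hat * x_bar

# "Predicted Y": points on the fit line above each X_i
fitted = [a_hat * x + b_hat for x in xs]

# "Residuals": vertical distance from each data point to the line
residuals = [y - f for y, f in zip(ys, fitted)]

# Residuals on a standardized scale (assumed divisor: s_e)
s_e = math.sqrt(sum(r * r for r in residuals) / (n - 2))
standardized = [r / s_e for r in residuals]

for x, r, z in zip(xs, residuals, standardized):
    print(f"x = {x:5.1f}  residual = {r:7.3f}  standardized = {z:6.2f}")
```

Least squares residuals always sum to zero (up to rounding), so what matters for judging fit is their pattern and size, not their total.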

Inference for Regression

Some useful variations:

Class Example 28, Old Text Problems 10.8 - 10.10 (now 10.13 – 10.15)

Excel Analysis: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls

Inference for Regression

Class Example 28, (now 10.13 – 10.15)

Old 10.8:

Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight, and its actual position, in tenths of a millimeter, in excess of 2.9 meters. The data are:

Year   Lean
75     642
76     644
77     656
78     667
79     673
80     688
81     696
82     698
83     713
84     717
85     725
86     742
87     757

Inference for Regression

Class Example 28, (now 10.13 – 10.15)

Old 10.8:

(a) Plot the data. Does the trend in lean over time appear to be linear?

(b) What is the equation of the least squares fit line?

(c) Give a 95% confidence interval for the average rate of change of the lean.

https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls
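Parts (b) and (c) can also be worked outside Excel. The following is a minimal sketch, assuming the Leaning Tower data listed above and a critical value of about 2.201, which is what TINV(0.05, 11) returns for 13 − 2 = 11 degrees of freedom; it is hard-coded here to keep the example short.

```python
import math

# Leaning Tower of Pisa data from Class Example 28 (year coded as 75..87)
years = list(range(75, 88))
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(lean) / n
sxx = sum((x - x_bar) ** 2 for x in years)
a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, lean)) / sxx
b_hat = y_bar - a_hat * x_bar
print(f"(b) fit line: lean = {a_hat:.3f} * year + ({b_hat:.2f})")

# Standard error of the slope, with s_e estimating sigma_e
s_e = math.sqrt(sum((y - a_hat * x - b_hat) ** 2
                    for x, y in zip(years, lean)) / (n - 2))
se_a = s_e / math.sqrt(sxx)

T_STAR = 2.201  # assumed: TINV(1 - 0.95, 11), rounded to 3 decimals
lower = a_hat - T_STAR * se_a
upper = a_hat + T_STAR * se_a
print(f"(c) 95% CI for the rate of change: ({lower:.2f}, {upper:.2f})")
```

The slope is the average change in lean per year, in the same coded units as the data, so its confidence interval answers part (c) directly.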

Inference for Regression

HW:

10.3 b,c

10.5

And Now for Something Completely Different

Etymology of:

“And now for something completely

different”

Anybody heard of this before?

And Now for Something Completely Different

What is “etymology”?

Google responses to:

define: etymology

• The history of words; the study of the history of words.
  csmp.ucop.edu/crlp/resources/glossary.html

• The history of a word shown by tracing its development from another language.
  www.animalinfo.org/glosse.htm

And Now for Something Completely Different

What is “etymology”?

• Etymology is derived from the Greek word ἔτυμον (etymon) meaning "a sense" and λόγος (logos) meaning "word." Etymology is the study of the original meaning and development of a word tracing its meaning back as far as possible.
  www.two-age.org/glossary.htm

And Now for Something Completely Different

Google response to: define: and now for something completely different

And Now For Something Completely Different is a film spinoff from the television comedy series Monty Python's Flying Circus. The title originated as a catchphrase in the TV show. Many Python fans feel that it excellently describes the nonsensical, non sequitur feel of the program.
en.wikipedia.org/wiki/And_Now_For_Something_Completely_Different

And Now for Something Completely Different

Google Search for:

“And now for something completely different”

Gives more than 100 results….

A perhaps interesting one:

http://www.mwscomp.com/mpfc/mpfc.html

And Now for Something Completely Different

Google Search for:

“Stat 31 and now for something completely different”

Gives:

[PPT] Slide 1
File Format: Microsoft Powerpoint 97
... But what is missing? And now for something completely different… Review Ideas on State Lotteries, from our study of Expected Value ...
https://www.unc.edu/~marron/UNCstat31-2005/Stat31-05-03-31.ppt

Prediction in Regression

Idea: Given data $(X_1, Y_1), \ldots, (X_n, Y_n)$

Can find the Least Squares Fit Line, and do inference for the parameters.

Given a new X value, say $X_0$, what will the new Y value be?

Prediction in Regression

Dealing with variation in prediction:

Under the model: $Y_i = aX_i + b + e_i$

A sensible guess about $Y_0$, based on the given $X_0$, is:

$\hat{Y}_0 = \hat{a}X_0 + \hat{b}$

(point on the fit line above $X_0$)

Prediction in Regression

What about variation about this guess?

Natural Approach: present an interval

(as done with Confidence Intervals)

Careful: Two Notions of this:

1. Confidence Interval for mean of $Y_0$

2. Prediction Interval for value of $Y_0$

Prediction in Regression

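1. Confidence Interval for mean of $Y_0$:

Use: $\hat{Y} \pm t^* \, SE_{\hat{Y}}$

where: $SE_{\hat{Y}} = s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

and where $t^* = \mathrm{TINV}(1 - 0.95, n - 2)$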
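A worked sketch of this confidence interval, using the Leaning Tower data of Class Example 28, is given below. The new X value of 88, the function name `mean_ci`, and the hard-coded critical value 2.201 (approximately TINV(0.05, 11)) are assumptions chosen for illustration.

```python
import math

# Leaning Tower of Pisa data from Class Example 28 (year coded as 75..87)
years = list(range(75, 88))
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(lean) / n
sxx = sum((x - x_bar) ** 2 for x in years)
a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, lean)) / sxx
b_hat = y_bar - a_hat * x_bar
s_e = math.sqrt(sum((y - a_hat * x - b_hat) ** 2
                    for x, y in zip(years, lean)) / (n - 2))

T_STAR = 2.201  # assumed: TINV(1 - 0.95, n - 2) for n = 13

def mean_ci(x0):
    """95% confidence interval for the mean of Y at X = x0."""
    y_hat = a_hat * x0 + b_hat                       # point on the fit line
    se = s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - T_STAR * se, y_hat + T_STAR * se

x0 = 88  # hypothetical new X value, one year past the data
low, high = mean_ci(x0)
print(f"95% CI for mean lean at year {x0}: ({low:.1f}, {high:.1f})")
```

Calling `mean_ci(x_bar)` gives the narrowest interval, since the second term under the square root vanishes when the new X value equals the mean of the x’s.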

Prediction in Regression

Interpretation of:

$SE_{\hat{Y}} = s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

• Smaller for $x_0$ closer to $\bar{x}$

• But never 0

• Smaller for $x_i$ more spread out

• Larger for larger $s_e$

Prediction in Regression

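2. Prediction Interval for value of $Y_0$

Use: $\hat{Y}_0 \pm t^* \, SE_{\hat{Y}}$

where: $SE_{\hat{Y}} = s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

And again $t^* = \mathrm{TINV}(1 - 0.95, n - 2)$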
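The prediction interval can be sketched the same way. The code below again uses the Leaning Tower data of Class Example 28, with the hypothetical new X value 88 and the hard-coded critical value 2.201 (approximately TINV(0.05, 11)); the function name `prediction_interval` is an assumption.

```python
import math

# Leaning Tower of Pisa data from Class Example 28 (year coded as 75..87)
years = list(range(75, 88))
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(lean) / n
sxx = sum((x - x_bar) ** 2 for x in years)
a_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, lean)) / sxx
b_hat = y_bar - a_hat * x_bar
s_e = math.sqrt(sum((y - a_hat * x - b_hat) ** 2
                    for x, y in zip(years, lean)) / (n - 2))

T_STAR = 2.201  # assumed: TINV(1 - 0.95, n - 2) for n = 13

def prediction_interval(x0):
    """95% prediction interval for a single new Y value at X = x0."""
    y_hat = a_hat * x0 + b_hat
    # The leading "1 +" adds the variation of one new observation
    se = s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - T_STAR * se, y_hat + T_STAR * se

x0 = 88  # hypothetical new X value, one year past the data
pi_low, pi_high = prediction_interval(x0)
print(f"95% prediction interval at year {x0}: ({pi_low:.1f}, {pi_high:.1f})")

# Half-width of the confidence interval for the mean, for comparison
ci_half = T_STAR * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
print(f"PI half-width {(pi_high - pi_low) / 2:.1f} vs. CI half-width {ci_half:.1f}")
```

The prediction interval is always the wider of the two, because it must cover an individual new value rather than just the mean.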

Prediction in Regression

Interpretation of:

$SE_{\hat{Y}} = s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

• Similar remarks to above …

• Additional “1 + ” accounts for added variation in $Y_0$ compared to $\hat{Y}$

Prediction in Regression

Revisit Class Example 28, (now 10.13 – 10.15) Old 10.8:

Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight, and its actual position, in tenths of a millimeter, in excess of 2.9 meters. The data are listed above…

Prediction in Regression

Class Example 28, (now 10.13 – 10.15)

Old 10.9:

(a) Plot the data. Does the trend in lean over time appear to be linear?

(b) What is the equation of the least squares fit line?

(c) Give a 95% confidence interval for the average rate of change of the lean.

https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls

Prediction in Regression

HW:

10.20 and add part:

(f) Calculate a 95% Confidence Interval for the mean oxygen uptake of individuals having heart rate 96, and heart rate 115.

Additional Issues in Regression

Robustness

Outliers via Java Applet

HW on outliers