Stat 31, Section 1, Last Time 2-way tables –Testing for Independence –Chi-Square distance...
Stat 31, Section 1, Last Time
• 2-way tables
– Testing for Independence
– Chi-Square distance between data & model
– Chi-Square Distribution
– Gives P-values (CHIDIST)
• Simpson’s Paradox:
– Lurking variables can reverse comparisons
• Recall Linear Regression
– Fit a line to a scatterplot
Recall Linear Regression
Idea:
Fit a line to data in a scatterplot
Recall Class Example 14: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg14.xls
• To learn about “basic structure”
• To “model data”
• To provide “prediction of new values”
Inference for Regression
Goal: develop
• Hypothesis Tests and Confidence Int’s
• For slope & intercept parameters, a & b
• Also study prediction
Inference for Regression
Idea: do statistical inference on:
– Slope a
– Intercept b
Model: $Y_i = a X_i + b + e_i$
Assume: $e_i$ are random, independent
and $e_i \sim N(0, \sigma_e)$
Inference for Regression
Viewpoint: Data generated as:
y = ax + b
$Y_i$ chosen from the $N(a X_i + b, \sigma_e)$ distribution centered on the line above $X_i$
Note: a and b are “parameters”
Inference for Regression
Parameters $a$ and $b$ determine the
underlying model (distribution)
Estimate with the Least Squares Estimates:
$\hat a$ and $\hat b$
(Using SLOPE and INTERCEPT in Excel,
based on data)
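A minimal Python sketch of what SLOPE and INTERCEPT compute, assuming the usual least squares formulas. The function name `least_squares_fit` and the toy data are illustrative only and are not taken from a class example:

```python
def least_squares_fit(xs, ys):
    """Return (a_hat, b_hat) for the least squares line y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared x deviations and sum of cross products
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx              # slope estimate (Excel: SLOPE)
    b_hat = y_bar - a_hat * x_bar  # intercept estimate (Excel: INTERCEPT)
    return a_hat, b_hat

# Toy data lying exactly on y = 2x + 1 (illustrative only)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a_hat, b_hat = least_squares_fit(xs, ys)
print(a_hat, b_hat)
```

Because the toy points lie exactly on a line, the fit should recover that line's slope and intercept.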
Inference for Regression
Distributions of $\hat a$ and $\hat b$?
Under the above assumptions, the sampling
distributions are:
$\hat a \sim N(a, \sigma_a)$
$\hat b \sim N(b, \sigma_b)$
• Centerpoints are right (unbiased)
• Spreads are more complicated
Inference for Regression
Formula for SD of $\hat a$:
$SD(\hat a) = \sigma_a = \frac{\sigma_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar x)^2}}$
• Big (small) for $\sigma_e$ big (small, resp.)
– Accurate data ⇒ Accurate est. of slope
• Small for x’s more spread out
– Data more spread ⇒ More accurate
• Small for more data
– More data ⇒ More accuracy
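The sampling distribution of the slope estimate can be checked by simulation. The sketch below assumes made-up "true" values a = 2, b = 1, $\sigma_e$ = 1 and fixed x values 0, ..., 9 (all hypothetical). It generates many data sets from the model and compares the average and spread of the slope estimates with a and with $\sigma_e / \sqrt{\sum (x_i - \bar x)^2}$:

```python
import random
import statistics

random.seed(31)  # fixed seed so the run is repeatable

a, b, sigma_e = 2.0, 1.0, 1.0        # assumed "true" parameters (hypothetical)
xs = [float(i) for i in range(10)]   # fixed x values 0, 1, ..., 9
x_bar = sum(xs) / len(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)

def slope_estimate(ys):
    """Least squares slope for the fixed xs and the given ys."""
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

# Generate 2000 data sets from Y_i = a X_i + b + e_i and refit each time
slopes = []
for _ in range(2000):
    ys = [a * x + b + random.gauss(0.0, sigma_e) for x in xs]
    slopes.append(slope_estimate(ys))

mean_slope = statistics.mean(slopes)     # should be close to a (unbiased)
sd_slope_sim = statistics.stdev(slopes)  # simulated spread of the estimates
sd_slope_formula = sigma_e / sxx ** 0.5  # spread given by the SD formula
print(mean_slope, sd_slope_sim, sd_slope_formula)
```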
Inference for Regression
Formula for SD of $\hat b$:
$SD(\hat b) = \sigma_b = \sigma_e \sqrt{\frac{1}{n} + \frac{\bar x^2}{\sum_{i=1}^{n} (x_i - \bar x)^2}}$
• Big (small) for $\sigma_e$ big (small, resp.)
– Accurate data ⇒ Accurate est. of intercept
• Smaller for $\bar x \approx 0$
– Centered data ⇒ More accurate intercept
• Smaller for more data
– More data ⇒ More accuracy
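The two SD formulas can be turned into small functions to see the effects described in the bullets. This is a sketch with illustrative names (`sd_slope`, `sd_intercept`) and made-up x values, with $\sigma_e$ treated as known:

```python
import math

def sd_slope(sigma_e, xs):
    """SD of the slope estimate: sigma_e / sqrt(sum (x_i - x_bar)^2)."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma_e / math.sqrt(sxx)

def sd_intercept(sigma_e, xs):
    """SD of the intercept estimate: sigma_e * sqrt(1/n + x_bar^2 / Sxx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma_e * math.sqrt(1.0 / n + x_bar ** 2 / sxx)

# Hypothetical designs: same number of points, different spread and centering
narrow = [4.0, 4.5, 5.0, 5.5, 6.0]
wide = [0.0, 2.5, 5.0, 7.5, 10.0]
centered = [x - 5.0 for x in wide]  # same spread as wide, but x_bar = 0

print(sd_slope(1.0, narrow), sd_slope(1.0, wide))        # spread-out x's help
print(sd_intercept(1.0, wide), sd_intercept(1.0, centered))  # centering helps
```

With centered x values the second term under the root vanishes, so the intercept SD reduces to $\sigma_e / \sqrt{n}$.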
Inference for Regression
One more detail:
Need to estimate $\sigma_e$ using data
For this use:
$s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat a x_i - \hat b)^2}{n - 2}}$
• Similar to earlier sd estimate, $s$
• Except variation is about fit line
• $n - 2$ is similar to $n - 1$ from before
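A short sketch of this estimate in Python, assuming the least squares fit is computed first. The function name `residual_sd` and the four toy points are illustrative:

```python
import math

def residual_sd(xs, ys):
    """Estimate sigma_e by s_e: root of (sum of squared residuals) / (n - 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx
    b_hat = y_bar - a_hat * x_bar
    # Variation is measured about the fit line, not about y_bar
    sse = sum((y - a_hat * x - b_hat) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

# Toy data (illustrative only)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 4.0]
s_e = residual_sd(xs, ys)
print(s_e)
```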
Inference for Regression
Now for Probability Distributions,
Since are estimating $\sigma_e$ by $s_e$
Use TDIST and TINV
With degrees of freedom = $n - 2$
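For readers working outside Excel, here is a rough standard-library analog of the two-tailed TDIST and TINV calls. It is a numerical sketch (Simpson's rule plus bisection), not how Excel computes them, and the function names are made up for illustration:

```python
import math

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the density on [0, t] by Simpson's rule."""
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    return 0.5 - total * h / 3

def t_dist_two_tailed(t, df):
    """Analog of Excel TDIST(t, df, 2): P(|T| > t)."""
    return 2 * t_upper_tail(abs(t), df)

def t_inv_two_tailed(p, df):
    """Analog of Excel TINV(p, df): the t* with P(|T| > t*) = p, by bisection."""
    lo, hi = 0.0, 1.0
    while t_dist_two_tailed(hi, df) > p:  # expand until the root is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_dist_two_tailed(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example: 95% confidence with n = 9 data points, so n - 2 = 7 degrees of freedom
t_star = t_inv_two_tailed(1 - 0.95, 7)
print(round(t_star, 3))
```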
Inference for Regression
Convenient Packaged Analysis in Excel:
Tools → Data Analysis → Regression
Illustrate application using:
Class Example 27,
Old Text Problem 8.6 (now 10.12)
Inference for Regression
Class Example 27,
Old Text Problem 8.6 (now 10.12)
Utility companies estimate energy used by their customers. Natural gas consumption depends on temperature. A study recorded average daily gas consumption y (100s of cubic feet) for each month. The explanatory variable x is the average number of heating degree days for that month. Data for October through June are:
Month   X = Deg. Days   Y = Gas Cons’n
Oct     15.6            5.2
Nov     26.8            6.1
Dec     37.8            8.7
Jan     36.4            8.5
Feb     35.5            8.8
Mar     18.6            4.9
Apr     15.3            4.5
May     7.9             2.5
Jun     0               1.1
Inference for Regression
Class Example 27,
Old Text Problem 8.6 (now 10.12)
Excel Analysis: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg27.xls
Good News:
Lots of things done automatically
Bad News:
Different language,
so need careful interpretation
Inference for Regression
Excel Glossary:
Excel            Stat 31
R2               $r^2$ = Prop’n of Sum of Squares Explained by Line
intercept        Intercept b
X Variable       Slope a
Coefficient      Estimates $\hat a$ & $\hat b$.
Inference for Regression
Excel Glossary:
Excel            Stat 31
Standard Errors  Estimates of $\sigma_a$ & $\sigma_b$.
                 (recall from Sampling Dist’ns)
T – Stat.        (Est. – mean) / SE, i.e. put
                 on scale of T – distribution
P-value          For 2-sided test of:
                 $H_0: a = 0$ vs. $H_A: a \neq 0$ (slope row)
                 $H_0: b = 0$ vs. $H_A: b \neq 0$ (intercept row)
Inference for Regression
Excel Glossary:
Excel            Stat 31
Lower 95%        Ends of 95% Confidence
Upper 95%        Interval for a and b
                 (since chose 0.95 for Confidence level)
Predicted Y      Points on line at $X_i$,
                 i.e. $\hat Y_i = \hat a X_i + \hat b$.
Inference for Regression
Excel Glossary:
Excel                 Stat 31
Residual              $Y_i - \hat a X_i - \hat b$ for $X_i$.
                      Recall: gave useful information about quality of fit
Standard Residuals    $\frac{Y_i - \hat a X_i - \hat b}{\sigma_e}$,
                      on standardized scale
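To connect the glossary to the formulas, the sketch below recomputes the main pieces of a regression summary for the Class Example 27 data in plain Python. It is not Excel's own routine. The critical value 2.365 is the standard table value of t* for 95% confidence with 7 degrees of freedom, entered by hand, and the variable names are illustrative:

```python
import math

# Class Example 27 data: heating degree days (x) and gas consumption (y)
xs = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
ys = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]
T_STAR = 2.365  # table value: 95% confidence, n - 2 = 7 degrees of freedom

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
syy = sum((y - y_bar) ** 2 for y in ys)

a_hat = sxy / sxx                  # "X Variable" coefficient (slope)
b_hat = y_bar - a_hat * x_bar      # "Intercept" coefficient
predicted = [a_hat * x + b_hat for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]
sse = sum(r * r for r in residuals)
s_e = math.sqrt(sse / (n - 2))     # estimate of sigma_e
r_squared = 1 - sse / syy          # proportion of variation explained by line

se_a = s_e / math.sqrt(sxx)                        # standard error of slope
se_b = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)   # standard error of intercept

for name, est, se in [("Intercept", b_hat, se_b), ("X Variable", a_hat, se_a)]:
    t_stat = est / se  # (Est. - 0) / SE, for the test of H0: parameter = 0
    lower, upper = est - T_STAR * se, est + T_STAR * se
    print(f"{name:<11} coef={est:.4f} se={se:.4f} t={t_stat:.2f} "
          f"95% CI=({lower:.4f}, {upper:.4f})")
print(f"R2={r_squared:.3f}  s_e={s_e:.3f}")
```

The printed rows can be compared, entry by entry, with the Coefficients, Standard Error, t Stat, and Lower/Upper 95% columns of the Excel output.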
Inference for Regression
Some useful variations:
Class Example 28,
Old Text Problems 10.8 - 10.10
(now 10.13 – 10.15)
Excel Analysis: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls
Inference for Regression
Class Example 28, (now 10.13 – 10.15)
Old 10.8:
Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight, and its actual position, in tenths of a millimeter, in excess of 2.9 meters. The data are:
Year Lean
75 642
76 644
77 656
78 667
79 673
80 688
81 696
82 698
83 713
84 717
85 725
86 742
87 757
Inference for Regression
Class Example 28, (now 10.13 – 10.15)
Old 10.8:
(a) Plot the data, does the trend in lean over time appear to be linear?
(b) What is the equation of the least squares fit line?
(c) Give a 95% confidence interval for the average rate of change of the lean.
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls
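Parts (b) and (c) can also be worked outside Excel. The following is a sketch in plain Python using the lean data listed above. The value 2.201 is the standard table value of t* for 95% confidence with 13 − 2 = 11 degrees of freedom, entered by hand, and the variable names are illustrative:

```python
import math

# Class Example 28 data: year (last two digits) and coded lean
years = list(range(75, 88))
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]
T_STAR = 2.201  # table value: 95% confidence, 11 degrees of freedom

n = len(years)
x_bar = sum(years) / n
y_bar = sum(lean) / n
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, lean))

# (b) least squares fit line: lean = a_hat * year + b_hat
a_hat = sxy / sxx
b_hat = y_bar - a_hat * x_bar

# (c) 95% confidence interval for the slope (average rate of change of lean)
sse = sum((y - a_hat * x - b_hat) ** 2 for x, y in zip(years, lean))
s_e = math.sqrt(sse / (n - 2))
se_a = s_e / math.sqrt(sxx)
ci_low, ci_high = a_hat - T_STAR * se_a, a_hat + T_STAR * se_a

print(f"fit line: lean = {a_hat:.4f} * year + ({b_hat:.2f})")
print(f"95% CI for slope: ({ci_low:.3f}, {ci_high:.3f})")
```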
Inference for Regression
HW:
10.3 b,c
10.5
And Now for Something Completely Different
Etymology of:
“And now for something completely
different”
Anybody heard of this before?
And Now for Something Completely Different
What is “etymology”?
Google responses to:
define: etymology
• The history of words; the study of the history of words. csmp.ucop.edu/crlp/resources/glossary.html
• The history of a word shown by tracing its development from another language. www.animalinfo.org/glosse.htm
And Now for Something Completely Different
What is “etymology”?
• Etymology is derived from the Greek word ἔτυμον (etymon) meaning "a sense" and λόγος (logos) meaning "word." Etymology is the study of the original meaning and development of a word tracing its meaning back as far as possible. www.two-age.org/glossary.htm
And Now for Something Completely Different
Google response to: define: and now for something completely different
And Now For Something Completely Different is a film spinoff from the television comedy series Monty Python's Flying Circus. The title originated as a catchphrase in the TV show. Many Python fans feel that it excellently describes the nonsensical, non sequitur feel of the program. en.wikipedia.org/wiki/And_Now_For_Something_Completely_Different
And Now for Something Completely Different
Google Search for:
“And now for something completely different”
Gives more than 100 results….
A perhaps interesting one:
http://www.mwscomp.com/mpfc/mpfc.html
And Now for Something Completely Different
Google Search for:
“Stat 31 and now for something completely different”
Gives:
[PPT] Slide 1
File Format: Microsoft Powerpoint 97 - View as HTML
... But what is missing? And now for something completely different… Review Ideas on State Lotteries, from our study of Expected Value ...
https://www.unc.edu/~marron/UNCstat31-2005/Stat31-05-03-31.ppt - Similar pages
Prediction in Regression
Idea: Given data $(X_1, Y_1), \ldots, (X_n, Y_n)$
Can find the Least Squares Fit Line, and do
inference for the parameters.
Given a new X value, say $X_0$, what will the
new Y value be?
Prediction in Regression
Dealing with variation in prediction:
Under the model: $Y_i = a X_i + b + e_i$
A sensible guess about $Y_0$,
based on the given $X_0$,
is: $\hat Y_0 = \hat a X_0 + \hat b$
(point on the fit line above $X_0$)
Prediction in Regression
What about variation about this guess?
Natural Approach: present an interval
(as done with Confidence Intervals)
Careful: Two Notions of this:
1. Confidence Interval for mean of $Y_0$
2. Prediction Interval for value of $Y_0$
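Prediction in Regression
1. Confidence Interval for mean of $Y_0$:
Use: $\hat Y_0 \pm t^* \, SE_{\hat Y}$
where: $SE_{\hat Y} = s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{\sum_{i=1}^{n} (x_i - \bar x)^2}}$
and where $t^* = TINV(1 - 0.95, n - 2)$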
Prediction in Regression
Interpretation of: $SE_{\hat Y} = s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{\sum_{i=1}^{n} (x_i - \bar x)^2}}$
• Smaller for $x_0$ closer to $\bar x$
• But never 0
• Smaller for $x_i$ more spread out
• Larger for larger $s_e$
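Prediction in Regression
2. Prediction Interval for value of $Y_0$
Use: $\hat Y_0 \pm t^* \, SE_{\hat Y}$
where: $SE_{\hat Y} = s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{\sum_{i=1}^{n} (x_i - \bar x)^2}}$
And again $t^* = TINV(1 - 0.95, n - 2)$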
Prediction in Regression
Interpretation of: $SE_{\hat Y} = s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{\sum_{i=1}^{n} (x_i - \bar x)^2}}$
• Similar remarks to above …
• Additional “1 + ” accounts for added
variation in $Y_0$ compared to $\hat Y_0$
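The two intervals differ only in the extra "1 +" under the square root, which a small function makes explicit. This sketch uses the Leaning Tower data from Class Example 28. The function name `intervals_at`, the choice $X_0$ = 88, and the hand-entered table value t* = 2.201 (95%, 11 degrees of freedom) are illustrative assumptions:

```python
import math

def intervals_at(xs, ys, x0, t_star):
    """Return (y0_hat, CI for mean of Y0, prediction interval for Y0)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx
    b_hat = y_bar - a_hat * x_bar
    sse = sum((y - a_hat * x - b_hat) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))

    y0_hat = a_hat * x0 + b_hat                   # point on the fit line above x0
    core = 1 / n + (x0 - x_bar) ** 2 / sxx        # shared term under the root
    se_mean = s_e * math.sqrt(core)               # for the mean of Y0
    se_pred = s_e * math.sqrt(1 + core)           # for a new value of Y0
    ci = (y0_hat - t_star * se_mean, y0_hat + t_star * se_mean)
    pi = (y0_hat - t_star * se_pred, y0_hat + t_star * se_pred)
    return y0_hat, ci, pi

years = list(range(75, 88))
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]

# x0 = 88 is a hypothetical new X value chosen for illustration
y0_hat, ci, pi = intervals_at(years, lean, 88, 2.201)
print(f"guess {y0_hat:.1f}, CI for mean {ci}, prediction interval {pi}")
```

Both intervals are centered at the same point on the fit line, and the prediction interval is always the wider of the two.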
Prediction in Regression
Revisit Class Example 28,
(now 10.13 – 10.15) Old 10.8:
Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight, and its actual position, in tenths of a millimeter, in excess of 2.9 meters. The data are listed above…
Prediction in Regression
Class Example 28, (now 10.13 – 10.15)
Old 10.9:
(a) Plot the data, Does the trend in lean over time appear to be linear?
(b) What is the equation of the least squares fit line?
(c) Give a 95% confidence interval for the average rate of change of the lean.
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls
Prediction in Regression
HW:
10.20 and add part:
(f) Calculate a 95% Confidence Interval for
the mean oxygen uptake of individuals
having heart rate 96, and heart rate
115.
Additional Issues in Regression
Robustness
Outliers via Java Applet
HW on outliers