Regression Analysis
Key Terms and Concepts
Before taking the quiz, you need to be able to explain the meanings (and recognize symbols in cases where there is an associated symbol) of each of these terms or concepts. You should also know when and how to use them in statistics problems.
Unless otherwise noted (by a note such as "see lesson 2") these terms and concepts are defined in the glossary.
categorical variables
cause and effect
conditional distributions
data transformation
explanatory variable
finding residuals on the graphing calculator (see lessons 3 and 4 and your calculator manual)
influential points
interpreting MINITAB for regression (see lesson 3)
joint frequencies
least-squares regression line (line of best fit)
LinReg (a + bx) on the graphing calculator (On the TI-83/84 it is STAT CALC 8. On the TI-89 it is [F4] 3:Regressions)
marginal frequencies
negative association
non-linear bivariate data
numerical variables
outliers (in bivariate data)
positive association
relation between r and the slope of the regression line
resid on the graphing calculator
residual
response variable
row and column percents
scatterplot
Simpson's Paradox
sum of squared residuals
the coefficient of determination ($r^2$)
the correlation coefficient (r)
two-way table
use the graphing calculator to transform data to achieve linearity (see lesson 5)
using residuals to test a linear model (see lessons 3, 4, and 5)
______________________________ Copyright 2011 Apex Learning Inc. (See Terms of Use at www.apexvs.com/TermsOfUse) TI-83 screens are used with the permission of the publisher. Copyright 1996, Texas Instruments, Incorporated. TI-89 screens are used with the permission of the publisher. Copyright 2011, Texas Instruments, Incorporated.
AP Statistics Review: Bivariate Data: Regression Analysis and Two-Way Tables
Objectives, Example Problems, and Study Tips
Introduction to Bivariate Data
Objective 1
Distinguish between quantitative and categorical data.
Examples
1. Which of the following statistics or variables are derived from quantitative data and which are derived from categorical data?
A. Your G.P.A.
B. The political party your father belongs to
C. The cities of residence of 300 people
D. The populations of 10 different cities
Tips
Categorical data are also called qualitative data. Quantitative data are also called numerical data.
A categorical variable (which holds categorical data) tells which of several groups an individual belongs to. A quantitative variable has a numerical value that can be manipulated mathematically.
Categorical variables can be thought of as labels, or names; quantitative variables can be thought of as numerical values, or quantities.
Some categorical data appear as numbers, but they are really just names, or labels, for categories. For example, the variable "favorite radio station" may have the value 102.3 or 104.1, but these numbers are meant as labels and not as quantities. It would be meaningless to add 102.3 and 104.1 to get the average radio station.
Answers
1.
A. Quantitative. Your G.P.A. is an arithmetic mean of the grades you received in many classes.
B. Categorical. "Political party" is a label, not a quantity.
C. Categorical
D. Quantitative
Objective 2
Distinguish between explanatory and response variables.
Examples
1. You want to be able to predict class rank from the number of hours a student spends on homework. Which is the explanatory variable, and which is the response variable?
2. True or False: There is always a cause-and-effect relationship between the explanatory and response variables.
Tips
Sometimes two variables are associated even though we don't know which, if either, caused the other.
The explanatory variable is sometimes called the independent variable, and the response variable is sometimes called the dependent variable.
Answers
1. The variable you're predicting from is the explanatory variable, and the variable you're trying to predict is the response variable. Therefore, hours spent on homework is the explanatory variable, and class rank is the response variable.
2. False. We might suspect a cause-and-effect relationship between two variables if they're strongly related, but association alone does not prove cause and effect. For example, it's well known that success on the SAT predicts success in college (that's one reason why many colleges use SAT scores to help decide on admissions). In no way, however, does that mean a high score on the SAT causes success in college.
Objective 3
Construct a scatterplot when given a set of paired data.
Examples
1. The number of calories and the number of grams of fat in 25 common fast foods (hamburgers, pie, french fries, onion rings, etc.) are given in the following table. Construct a scatterplot of the data where grams of fat is the explanatory (independent) variable. (Data taken from Landwehr & Watkins, Exploring Data, Dale Seymour Publications, Palo Alto, 1995, pg. 21)
Grams of Fat   Calories
31             570
38             660
48             800
55             890
14             300
19             350
10             260
15             300
28             470
33             530
25             450
10             280
32             620
28             450
13             236
9              178
4              142
5              95
0              25
20             372
19             339
14             320
13             360
8              290
14             220
Answer
1. [Scatterplot of calories (response) against grams of fat (explanatory)]
Objective 4
Identify instances of positive and negative association.
Examples
1. As age increases, the average number of years left to live decreases. Is this an instance of positive or negative association?
2. Does the following scatterplot demonstrate positive or negative association?
[Scatterplot]
3. Would you expect hours of exercise per week and weight to be positively or negatively associated?
Tip
Associations are considered to be positive if the response variable increases as the explanatory variable increases, and negative if the response variable decreases as the explanatory variable increases.
Answers
1. Negative. As the explanatory variable (age) increases, the response variable (number of years to live) decreases.
2. Negative
3. Negative, since typically people who exercise weigh less.
The Least-Squares Regression Line
Objective 1
Calculate the linear regression line from a bivariate data set, interpret the regression coefficient, and use the line to predict values of the response variable when given values for the explanatory variable.
Examples
1. In what sense is the linear regression line also the "line of best fit"?
2. Define the least-squares regression line.
3. Consider again the fat vs. calories data you saw earlier:
Fat 31 38 48 55 14 19 10 15 28 33 25 10 32 Calories 570 660 800 890 300 350 260 300 470 530 450 280 620
Fat 28 13 9 4 5 0 20 19 14 13 8 14 Calories 450 236 178 142 95 25 372 339 320 360 290 220
Calculate the line of best fit of calories on fat (that is, use fat as the explanatory variable and use calories as the response variable) and interpret the regression coefficient. (Do this on your calculator, not by hand!)
Tips
The line of "best fit" for any set of points is the line that comes closest to containing all the ordered pairs in the data.
When you interpret a regression coefficient, you simply make a statement that tells the amount of increase in the response variable for every unit increase (an increase of 1) in the explanatory variable. The amount of increase is simply the regression coefficient, or the slope of the regression line.
On the TI-83/TI-84, you get to LinReg(a+bx) this way: press STAT, choose [CALC], then choose [LinReg(a+bx)] (or just press 8).
On the TI-89, you get to LinReg(a+bx) from the Stats/List Editor: press [F4] 3:Regressions [ENTER].
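If you want to check the calculator's output away from the calculator, here is a minimal Python sketch of the arithmetic behind LinReg(a+bx), using the fat and calorie data from this review. The function name least_squares_line is a hypothetical helper (not a calculator or library function); it assumes the standard least-squares formulas, slope = Sxy / Sxx and intercept = (mean of y) - slope * (mean of x).

```python
# Fat (explanatory) and calorie (response) data from the table in this review.
fat = [31, 38, 48, 55, 14, 19, 10, 15, 28, 33, 25, 10, 32,
       28, 13, 9, 4, 5, 0, 20, 19, 14, 13, 8, 14]
calories = [570, 660, 800, 890, 300, 350, 260, 300, 470, 530, 450, 280, 620,
            450, 236, 178, 142, 95, 25, 372, 339, 320, 360, 290, 220]


def least_squares_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x.

    Hypothetical helper for illustration only.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope (regression coefficient)
    a = y_bar - b * x_bar    # intercept
    return a, b


a, b = least_squares_line(fat, calories)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

The printed coefficients should agree, to two decimal places, with the a and b your calculator reports for these data.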
Note: In statistics, do NOT use LinReg(ax+b), which you can get by following the instructions above and choosing [LinReg(ax+b)] rather than [LinReg(a+bx)].
Answers
1. The linear regression line is also called the line of best fit. It's the one line, of all possible lines, that comes closest to the set of points.
2. The least-squares regression line, or the line of best fit, is the line that minimizes the sum of squared residuals from the regression line. A residual is defined as the vertical distance between the actual y-value of the point and its predicted y-value ($\hat{y}$). Symbolically, the regression line is defined as the line that minimizes the expression $\sum (y - \hat{y})^2$.
3. $\hat{y} = 78.03 + 14.96x$ (rounded to two decimal places).
You can display the scatterplot with the line on the TI-83/TI-84 by pressing 2nd STAT PLOT and setting up a scatterplot with L1 as the Xlist and L2 as the Ylist.
You can display the scatterplot with the line on the TI-89 by pressing [F1] and setting up plot1 with x as list1 and y as list2.
The regression coefficient can be interpreted as: "For every 1 gram increase in the variable "fat content," there tends to be an increase in the variable "calories" of 14.96."
Note: Save your data for the next parts of this review.
Objective 2
Calculate a set of residuals from a linear regression.
Examples
1. Using the data from Lesson 3, calculate a complete set of residuals and determine
A. The sum of the residuals, $\sum (y - \hat{y})$.
B. The sum of the squares of the residuals, $\sum (y - \hat{y})^2$.
Here's the data again:
Fat 31 38 48 55 14 19 10 15 28 33 25 10 32 Calories 570 660 800 890 300 350 260 300 470 530 450 280 620
Fat 28 13 9 4 5 0 20 19 14 13 8 14 Calories 450 236 178 142 95 25 372 339 320 360 290 220
Tip
You should be able to calculate the residuals without using the built-in function in your calculator; it teaches you a lot about just what residuals are. However, you should remember that when you do a linear regression on the graphing calculator, a set of residuals is created and stored under the list name [RESID] on the TI-83/TI-84 which can be found under the [NAMES] menu; and [statvars\resid] on the TI-89.
Answer on the TI-83/TI-84: 1. Enter the fat data in L1 and the calorie data in L2.
2. Press STAT CALC 8 (to get LinReg(a+bx)).
3. Press 2nd [L1], [L2], VARS Y-VARS 1 1 (to get LinReg(a+bx) L1,L2,Y1).
4. Press ENTER (the regression equation is stored in Y1).
5. Press STAT 1 and move the cursor on top of L3 (clear L3 if it has numbers in it).
6. Press CLEAR 2nd [L2] - VARS Y-VARS 1 1 ( 2nd [L1] ) ENTER.
7. This will place the residuals in L3. The expression L2-Y1(L1) is equivalent to subtracting the predicted y-value from the actual y-value.
8. Press STAT CALC 1 2nd [L3].
9. The sum of the residuals is $\sum (y - \hat{y})$ and should be very close to 0. The sum of the residuals squared is $\sum (y - \hat{y})^2$ and is something like 44942.81.
10. Just to be sure you've done it right in L3, do the following:
2nd [LIST], select MATH, press 5, press 2nd [LIST], select RESID and press ENTER, press x². (Your screen should end up with this expression: sum(LRESID²).)
11. Press ENTER again, and your answer should be 44942.81, which is the sum of all of the residuals squared.
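The same residual calculation can be sketched in Python as a cross-check on the calculator steps above. This is only an illustration under the usual least-squares formulas; the variable names (residuals, sum_resid, sum_sq_resid) are made up for this example and do not correspond to calculator list names.

```python
# Fat (explanatory) and calorie (response) data from this review.
fat = [31, 38, 48, 55, 14, 19, 10, 15, 28, 33, 25, 10, 32,
       28, 13, 9, 4, 5, 0, 20, 19, 14, 13, 8, 14]
calories = [570, 660, 800, 890, 300, 350, 260, 300, 470, 530, 450, 280, 620,
            450, 236, 178, 142, 95, 25, 372, 339, 320, 360, 290, 220]

n = len(fat)
x_bar = sum(fat) / n
y_bar = sum(calories) / n

# Least-squares slope and intercept (the same line LinReg(a+bx) reports).
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(fat, calories))
     / sum((x - x_bar) ** 2 for x in fat))
a = y_bar - b * x_bar

# Residual = actual y minus predicted y, the analogue of L2 - Y1(L1).
residuals = [y - (a + b * x) for x, y in zip(fat, calories)]

sum_resid = sum(residuals)                      # should be very close to 0
sum_sq_resid = sum(e ** 2 for e in residuals)   # sum of squared residuals

print(f"sum of residuals:         {sum_resid:.6f}")
print(f"sum of squared residuals: {sum_sq_resid:.2f}")
```

The first value should be zero apart from rounding error, and the second should match the sum of squared residuals found on the calculator.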
Note: Save the data for fat, calories, and residuals for the next part of this review.
Answer on the TI-89:
1. Go to the Stats/List Editor and clear list1. Enter the Fat data there. Clear list2 and put the Calorie data there.
2. Compute the least-squares regression line for the data while you are in the lists, by pressing [F4] 3:Regressions->1:LinReg(a+bx) [ENTER] (your list names of list1 and
list2 should already be there; but, if they are not, type them in). Choose a function slot (such as y1(x)) to store your regression equation. [ENTER].
3. The calculator displays the regression information on the screen.
4. You can also write down your equation and press [♦] [F1] to access the Y= Editor. Scroll to y1 [ENTER] and enter your equation [ENTER].
5. When you press [2ND] [APPS], you should be back in the lists again.
6. Scroll over to the top of list3 and press [CLEAR] [ENTER].
7. Type the formula list2-y1(list1) [ENTER].
8. You now see the residual values entered in list3.
9. The sum of the residuals is $\sum (y - \hat{y})$ and should be very close to 0. On the [HOME] screen press [2nd] [5] 3:List [ENTER] 6:sum( [ENTER] list3 [)] [ENTER], and you will see if your sum is indeed close to 0.
10. The sum of the residuals squared is $\sum (y - \hat{y})^2$, something close to 44942.81.
Just to be sure you've done it right in list3, do the following on the TI-89: On the home screen editor press [CLEAR] [2nd] [5] 3:List [ENTER] 6:sum( [ENTER] statvars\resid^2 [)] [ENTER]. Your answer should be 44942.8, which is the sum of all of the residuals squared.
Note: Save the data for Fat, Calories, and residuals for the next part of this review.
Objective 3
Use residuals to discuss the adequacy of a linear regression model.
Objective 4
Use the graphing calculator to create residual plots and determine whether a linear regression gives an acceptable model for the relationship between the explanatory and response variables.
Examples
1. You should have saved the data from the previous section in lists L1 (TI-83/TI-84) or list1 (TI-89) for the Fat data, L2 (TI-83/TI-84) or list2 (TI-89) for the Calorie data, and L3 (TI-83/TI-84) or list3 (TI-89) for the residuals. If not, return to the previous section and follow the steps now. Note that, if you have not done any new regressions, the same data will be in the list named RESID as in L3 on the TI-83/TI-84, and statvars\resid on the TI-89. You may use either list for this section.
Use the data in your calculator to discuss the extent to which a line is a good model for these data.
Tip
A residual plot should show points more-or-less randomly distributed about the average residual value of zero (the average of any set of residuals is always 0).
Distinctive patterns, such as a parabola, about the y = 0 line would indicate that a line was not the best possible model for the data.
Answer
1. First, use the data in L1 and L2 on the TI-83/TI-84 (list1 and list2 on the TI-89) to do a linear regression of Calories on Fat. (If you've forgotten how to do this, return to the previous section and do it now.) Graph the scatterplot of the data and the regression line. It should look like this (note that the regression line and one point on the line also show on the screen):
[Calculator screen: scatterplot of Calories against Fat with the regression line]
Our visual impression is that this is a pretty good fit. Now, to do a residual plot:
On the TI-83/TI-84:
Press Y= and turn off Y1 by moving your cursor over the = and pressing ENTER. (If the function is turned off, the = will not be highlighted.)
Turn 2nd STAT PLOT 1 on and be sure the other STAT PLOTs are off. Set up the STAT PLOT screen the following way, with L3 as your Ylist:
[Calculator screen: STAT PLOT setup]
Then press [ZOOM 9], and you should get:
[Calculator screen: residual plot]
On the TI-89:
Press Y= and turn on your data plot and the best-fit line in y1. To see your plot, press [F2] 9.
On the home screen press [F4] 8:FnOff [ENTER] to turn off all functions on the TI-89.
Press [F1] and scroll to plot1. Turn it on by pressing [F4], which puts a check mark in front of it. Make sure that all other plots are off.
With your cursor on plot1, press [ENTER], and for Plot Type choose "Scatter." Choose the mark you like to use; x is list1 and y is list3. Press [ENTER][ENTER] to save and return to the plot1. Press [F2] 9 (ZoomData) to see your residual plot.
There is a more or less random pattern about the line y = 0 (also known as $y - \hat{y} = 0$), so we would therefore conclude that the line is a reasonable model for the data. There is some tendency for the pattern to get closer to y = 0 at the right of the graph, so we might expect the model to predict somewhat better for higher numbers of grams of fat.
The Correlation Coefficient
Objective 1
Calculate Pearson's correlation coefficient r for a set of paired data and explain its meaning.
Examples
1. Define the correlation coefficient by writing out the formula.
2. The following hypothetical data represents a set of IQ scores and grade point averages:
IQ (x)   100  120  110  105  85   95   130  100  105  90
GPA (y)  3.0  3.8  3.1  2.9  2.6  2.9  3.6  2.8  3.1  2.4
A. Draw a scatterplot of these data (graphing calculator).
B. Find the correlation coefficient (graphing calculator).
C. What does r tell you in terms of the strength of a linear relationship between IQ and GPA?
D. How would the correlation coefficient change if we let GPA be the x-variable (the explanatory variable) and IQ the y-variable (the response variable)?
Tips
The correlation coefficient can be defined in terms of raw scores for the x- and y-variables or in terms of the standardized z-scores for each data set.
The correlation coefficient on your TI-83 appears when you do a linear regression, assuming you've remembered to turn Diagnostics on in the CATALOG menu: 2nd [CATALOG] Alpha [D], scroll to DiagnosticsOn, and press ENTER ENTER. Diagnostics is always on for the TI-89, and r is displayed when you perform a regression.
Answers
1. $r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) = \frac{1}{n-1} \sum z_x z_y$
2.
A. [Scatterplot of GPA against IQ]
B. r = .91
C. The correlation is quite strong (r is quite close to -1 or +1). The visual image of the scatterplot in part A and the correlation coefficient in part B lead us to believe that there is a strong linear relationship between IQ and GPA. It says nothing about cause and effect, however, or about the direction of prediction.
D. The correlation coefficient would not change. Correlation is simply a measure of linear relationship. The regression coefficient (the slope of the least-squares line) talks about a direction of prediction, but correlation is simply a measure of the strength of the linear relationship between two variables.
Objective 2
Explain the relationship between the correlation coefficient and the slope of the regression line.
Examples
1. Consider a data set with r = -.88, sx = 9.17, and sy = 10.54. Find the slope of the regression line.
2. A set of data points has r = .56. A complete set of z-scores for the x- and y-values is computed and a regression line fitted to these data. If there's enough information to tell, what is the slope of the regression line fitted to the scatterplot of the z-scores?
Tips
sx is the standard deviation of the values in the variable x, and sy is the standard deviation of the values in the variable y.
$b = r \frac{s_y}{s_x}$
where
b = the regression coefficient
r = the correlation coefficient
sx = the standard deviation of the values in the variable X
sy = the standard deviation of the values in the variable Y
If you standardize the data in x and y (if you convert the numbers to z-scores), sx and sy will both be one, and the formula is simplified to:
$b = r \cdot \frac{1}{1} = r$
Therefore, the correlation coefficient and the regression coefficient will have the same value.
Answer
1. $b = r \frac{s_y}{s_x} = -.88 \cdot \frac{10.54}{9.17} = -1.011$
2. The slope of the regression line for the scatterplot of standardized scores will be the same as the correlation coefficient for the raw data: r = .56. This happens because when you standardize scores, you end up with a standard deviation of 1. Thus the ratio of the standard deviations in the formula given in example 1 is 1, and you're left with b = r.
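To see both ideas numerically, here is a short Python sketch using the hypothetical IQ and GPA values from the correlation example earlier in this review. It computes r from z-scores, checks that the raw-data slope equals r times sy/sx, and checks that the slope for the standardized data equals r. The helper names (z_scores, correlation, slope) are invented for this illustration; mean and stdev come from Python's standard statistics module, and stdev is the sample standard deviation (dividing by n - 1).

```python
from statistics import mean, stdev

# Hypothetical IQ (x) and GPA (y) data from the correlation example.
iq = [100, 120, 110, 105, 85, 95, 130, 100, 105, 90]
gpa = [3.0, 3.8, 3.1, 2.9, 2.6, 2.9, 3.6, 2.8, 3.1, 2.4]


def z_scores(values):
    """Standardize a list using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(x, y):
    """r = (1 / (n - 1)) * sum of the products of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(p * q for p, q in zip(zx, zy)) / (len(x) - 1)


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    return (sum((p - mx) * (q - my) for p, q in zip(x, y))
            / sum((p - mx) ** 2 for p in x))


r = correlation(iq, gpa)
b_raw = slope(iq, gpa)                       # slope for the raw data
b_std = slope(z_scores(iq), z_scores(gpa))   # slope for the z-scores

print(f"r                  = {r:.2f}")
print(f"slope (raw data)   = {b_raw:.4f}")
print(f"r * sy / sx        = {r * stdev(gpa) / stdev(iq):.4f}")
print(f"slope (z-scores)   = {b_std:.2f}")
```

Swapping the two lists in correlation(gpa, iq) gives the same r, while the slope changes, which is the point of part D of the correlation example.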
Objective 3
Calculate the coefficient of determination ($r^2$) for a set of paired data and explain its meaning.
Examples
1. Given the following data set:
X 114 87 93 74 50
Y 14 12 10 9 7
A. Find the line of best fit, and the correlation coefficient, of Y on X.
B. Find the value of $r^2$.
C. Interpret $r^2$ in the context of the data set.
2. Describe what's meant by the phrase, "$r^2$ is the amount of variation in Y explained by X."
Tip
The "amount of variation in Y" refers to the total variation in Y measured from the
average y-value, that is, from $\bar{y}$ (as opposed to $\hat{y}$). The total variation is often referred to as the "total sum of squares" (SST) and equals $\sum (y - \bar{y})^2$.
Answers
1.
A. $\hat{y} = 1.51 + .11x$, r = .93
B. $r^2$ = .87
C. 87% of the variation in Y (as measured from $\bar{y}$) is attributable to variation in X.
2. The short explanation:
Some of the variation in Y is tied to the trend shown by the least-squares line; as the x-variable increases, the y-variable increases or decreases by a certain amount. That's the variation explained by X. But other components of the variation in Y aren't related to changes in X. That's the variation not explained by X. The coefficient of determination ($r^2$) is simply the proportion of the variation in Y that is explained by X.
The longer but more precise explanation:
Imagine that we had a set of y-values with no knowledge of x-values they might be paired with. With no ability to predict a value of y, our best guess at any value of Y would have to be $\bar{y}$, the average value; graphically, that guess is the horizontal line $y = \bar{y}$. The distance from any point to $\bar{y}$ can be considered an "error" in our prediction. You can compute the sum of the squares of all such "errors" in the data set of y-values and call it the total sum of squares. Now find the line of best fit. This time, consider an "error" to be the distance from the actual y-value to the predicted y-value (that's right, it's a residual). Find the sum of the squares of those residuals. The difference (total sum of squares) - (sum of squares of residuals) is the amount of error eliminated by basing our predictions on the regression line rather than the average y-value. The fraction of the total error from $\bar{y}$ that this represents is the amount of variation "explained" by the variable X.
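The longer explanation can be followed step by step in a few lines of Python, using the five (X, Y) pairs from example 1. This is a sketch for checking your understanding, not a required method; the names sst, sse, and r_squared are labels chosen for this illustration.

```python
# Data from example 1 of this objective.
x = [114, 87, 93, 74, 50]
y = [14, 12, 10, 9, 7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares line y-hat = a + b*x.
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# Total sum of squares: squared "errors" when every guess is y-bar.
sst = sum((yi - y_bar) ** 2 for yi in y)

# Sum of squared residuals: squared "errors" when guessing with the line.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Fraction of the total error eliminated by using the regression line.
r_squared = (sst - sse) / sst

print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, r^2 = {r_squared:.2f}")
```

The value of r_squared should agree with the answer to part B, and the line should agree with part A.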
Objective 4
Interpret MINITAB output for regression and correlation.
Examples
Consider this printout:
[MINITAB regression printout]
1. Identify the correlation coefficient.
2. Identify the regression equation.
Answers
1. $r^2$ ("R-Sq" in the printout) = .776, and r = -.881. NOTE: r is negative since the slope of the regression line is negative.
2. $\hat{y} = 64.2 - 1.01x$
Influential Points and Outliers
Objective 1
Identify and describe the influence of outliers and influential points in a regression setting.
Objective 2
Distinguish between an outlier and an influential point.
Examples
1. Describe the differences between outliers and influential points.
2. Consider the following graph:
[Scatterplot with one point drawn as a box]
Is the point shown as a box an outlier or an influential point?
3. If the "box" point were removed from the above drawing, would the slope of the regression line increase or decrease?
Answers
1. Outliers are points that have large residuals; they're far from the linear pattern in the vertical direction. They usually aren't extreme values of the explanatory (X). Influential points are points whose removal would have a significant effect on the regression line. Influential points are typically on the extreme ends of the range of x-values. Influential points can also be outliers.
Outliers tend to have a large effect on the correlation coefficient (r) and a relatively smaller effect on the regression coefficient (slope of the regression line).
Influential points tend to have a large effect on the regression coefficient (slope of the regression line); they have a relatively smaller effect on the correlation coefficient (r).
2. It's an influential point and an outlier, since it's far from the linear pattern in a vertical direction and its removal would have a large effect on the slope of the regression line. It has a large effect on both the regression coefficient and the correlation coefficient.
3. The slope would increase. For these data, the regression equation with the "box" included is $\hat{y} = 6.03 - .5x$. (Note that the effect of the point is so severe that it makes the regression line have a negative slope when the rest of the pattern is clearly positive.) The equation with the influential point removed is $\hat{y} = 1.9 + 1.5x$.
Transformations to Achieve Linearity
Objective 1
Recognize whether bivariate data may be linearly related.
Objective 2
Use power, logarithmic, and polynomial transformations to linearize non-linear data where appropriate and analyze the results to determine if the transformation has resulted in data that can be appropriately modeled with a straight line.
Objective 3
Use the graphing calculator to do transformation of data and analyze the results to determine if the transformation has resulted in data that can be appropriately modeled with a straight line.
Examples
1. Consider the following set of observations:
Obs. 1    2    3  4  5    6  7  8   9  10  11  12  13  14
x    13.5 13.5 14 15 17.5 19 20 21  22 23  25  25  26  27
y    5    15   35 25 25   70 80 140 75 125 190 300 240 315
A. Enter the data in L1 and L2 in your TI-83/84 or list1 and list2 in your TI-89; find the regression line, and construct a scatterplot with the regression line included. Does a line appear to be a good model for these data? Why not?
B. What is $r^2$ for this model?
C. Find the natural logarithms (ln) of the y-data. Put these values in L3 (L3=ln(L2)) or list3 (list3 = ln(list2)).
D. Draw a scatterplot of x vs. ln y. Find the regression equation of ln y on x and include it on the graph. Does this appear to be a better linear fit? What is $r^2$ for this model?
E. Use the regression equation you found in part D to predict the value of y when x = 16 (remember to "back transform" to the original data!)
Tips To transform a data set (for instance, to find ln y), follow these steps:
1. Go to your lists.
2. Place your cursor over the top of L3 and clear the data in L3 (assuming that your data are in L1 and L2). On the TI-89, place it over list3 and clear the data in list3 (assuming that your data are in list1 and list2).
3. With your cursor over the top of L3 (or list3), press [ENTER] so that your cursor is blinking at the bottom of the screen, and you see the expression L3= (or list3= on the TI-89).
4. Press LN 2nd L2 ENTER (or [2nd] LN list2 [ENTER] on the TI-89).
5. You can now do bivariate statistics and make plots of L1 vs. L3 (list1 vs. list3). Be sure to recalculate the linear regression equation for L1 vs. L3 (list1 vs. list3) and change your STAT PLOT to L1 vs. L3 (list1 vs. list3 for plots on the TI-89).
Important note: Your graphing calculator has options for quadratic regressions, cubic regressions, and other nonlinear regressions. Do not use these, as they will yield incorrect residuals. Instead, transform the data first as taught in the tutorial.
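For readers who want to check the calculator work off the calculator, the same transform-then-regress steps can be sketched in Python. This is only an illustration, not part of the course materials: the helper name linreg and the variable names are assumptions made here, and the data are the 14 observations from the example above.

```python
import math

# The 14 observations from the example (x in L1/list1, y in L2/list2).
x = [13.5, 13.5, 14, 15, 17.5, 19, 20, 21, 22, 23, 25, 25, 26, 27]
y = [5, 15, 35, 25, 25, 70, 80, 140, 75, 125, 190, 300, 240, 315]


def linreg(xs, ys):
    """Least-squares line ys = a + b*xs. Returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    syy = sum((yi - mean_y) ** 2 for yi in ys)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    b = sxy / sxx                  # slope
    a = mean_y - b * mean_x        # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


# Parts A and B: regression of y on x.
a_lin, b_lin, r2_lin = linreg(x, y)

# Part C: the equivalent of L3 = ln(L2).
ln_y = [math.log(v) for v in y]

# Part D: regression of ln y on x.
a_log, b_log, r2_log = linreg(x, ln_y)

# Part E: predict ln y at x = 16, then back-transform with e^(...).
predicted_ln_y = a_log + b_log * 16
predicted_y = math.exp(predicted_ln_y)

print(f"y on x:    r2 = {r2_lin:.3f}")
print(f"ln y on x: r2 = {r2_log:.3f}")
print(f"predicted y at x = 16: {predicted_y:.1f}")
```

The last two lines of the calculation are the "back transform": the regression predicts ln y, so the prediction for y itself is e raised to that value.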
Answers
A. The regression equation is shown at the top of the calculator screen. A line is not a good fit for these data; the points show a definite curved pattern.
B. r2 = [...]
C. Part of the conversions are shown below.
D. The regression equation of ln y on x is shown at the top of the screen below. It does appear to be a better fit: r = [...] and r2 = [...].
E. For this model, with x = 16, ln y = [...] + [...](16) = [...]. Then y = e^[...] = [...]. Note that this was completed by back-transforming.
Categorical Bivariate Data: Two-Way Tables
Objective [...] Recognize situations where using a two-way table is an appropriate method for analyzing bivariate data, and construct a two-way table from the data.
Examples In a recent survey at a public high school, a random sample from each class was asked whether they considered themselves Democrats or Republicans. [...] of [...] freshmen said they were Democrats, [...] of [...] sophomores said they were Democrats, and [...] of [...] juniors said they were Democrats. [...] seniors were interviewed. All together, [...] students, including [...] seniors, said they considered themselves Democrats.
A. Present the results of the survey in a two-way contingency table (let class be the row variable).
B. Give the conditional distribution for Republicans.
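Answers
A.
         Party Preference
Class    Dem     Rep
FR       [...]   [...]
SO       [...]   [...]
JR       [...]   [...]
SR       [...]   [...]
B. Conditional distribution for Republicans: freshmen are [...] or [...] Republican, sophomores are [...] or [...] Republican, juniors are [...] or [...] Republican, and seniors are [...] or [...] Republican.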
Objective [...] Identify and interpret the different types of frequencies and distributions in a two-way table, including marginal frequencies, marginal distributions, and conditional distributions of columns by rows and rows by columns.
Example Consider the following two-way table:
              Variable B
Variable A    E       F       G
C             [...]   [...]   [...]
D             [...]   [...]   [...]
A. Identify the row and column variables.
B. Identify the marginal frequencies.
C. Find the marginal distributions for each variable.
D. Identify the conditional distribution for Variable B, value E.
Tips Conditional distributions are proportions of a value within a given variable.
Conditional distributions can go in two directions: row by column or column by row. In this case, an example of a row-by-column distribution would be the proportion of values in each row of Variable A that fall in the columns E, F, or G.
When conditional distributions are named, the first name is the denominator. For example, if you want the conditional distribution of Whites who are female (race by gender), you're looking for the proportion
(number of Whites who are female) / (total number of Whites)
If you're looking for the conditional distribution of females who are White (gender by race), you're looking for the proportion
(number of females who are White) / (total number of females)
Answer
A. Row variable: A. Column variable: B.
B. For Variable A: [...]. For Variable B: [...].
C. For Variable A: [...]
For Variable B: [...]
D. For value E of Variable B, values C and D of Variable A have, respectively, [...]
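The marginal and conditional calculations in this objective can be sketched in Python. The counts below are hypothetical, chosen only to make the arithmetic easy to follow; they are not the values from the example table, and the variable names are assumptions made for this illustration.

```python
# Hypothetical two-way table: rows are Variable A (C, D),
# columns are Variable B (E, F, G). Counts are made up.
table = {
    "C": {"E": 10, "F": 20, "G": 30},
    "D": {"E": 30, "F": 5, "G": 5},
}
columns = ["E", "F", "G"]

# Marginal frequencies: the row totals and column totals.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {col: sum(table[row][col] for row in table) for col in columns}
grand_total = sum(row_totals.values())

# Marginal distributions: each total as a proportion of the grand total.
marginal_a = {row: total / grand_total for row, total in row_totals.items()}
marginal_b = {col: total / grand_total for col, total in col_totals.items()}

# Conditional distribution of Variable A given Variable B = E:
# each count in column E divided by the column E total.
conditional_given_e = {row: table[row]["E"] / col_totals["E"] for row in table}

print("Row totals:", row_totals)
print("Column totals:", col_totals)
print("Marginal distribution of A:", marginal_a)
print("Marginal distribution of B:", marginal_b)
print("Distribution of A given B = E:", conditional_given_e)
```

Note that the conditional distribution divides by the column E total, not the grand total, which is what distinguishes it from a marginal distribution.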
Objective [...] Recognize and interpret Simpson's Paradox.
Example In [...], Andre Dawson batted [...] against right-handed pitchers and [...] against left-handed pitchers. That same year, Lee Lacy batted [...] against right-handed pitchers and [...] against left-handed pitchers. That is, Andre Dawson batted better than Lee Lacy against both right- and left-handed pitchers. However, for the year, Lee Lacy batted [...] to Andre Dawson's [...]. How is it possible that one batter could be better against both right- and left-handed pitchers, yet have a lower average overall?
Tip There is probably a lurking variable in here somewhere.
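Answer This is an example of Simpson's Paradox. Andre Dawson had many more at-bats, and his at-bats were concentrated at the lower average. Lee Lacy's at-bats, on the other hand, were more concentrated at the higher average. The exact data follow.
Andre Dawson   Hits    At Bats   Average
vs. RHP        [...]   [...]     [...]
vs. LHP        [...]   [...]     [...]
Totals         [...]   [...]     [...]
Lee Lacy       Hits    At Bats   Average
vs. RHP        [...]   [...]     [...]
vs. LHP        [...]   [...]     [...]
Totals         [...]   [...]     [...]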
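The mechanism behind the paradox can be sketched in Python with made-up numbers. The two batters and all of their hit and at-bat counts below are hypothetical; they are not the Dawson and Lacy statistics, and were chosen only so that one batter wins both splits while losing overall.

```python
# Hypothetical (hits, at_bats) records, NOT real player statistics.
records = {
    "Batter X": {"vs RHP": (75, 300), "vs LHP": (35, 100)},
    "Batter Y": {"vs RHP": (24, 100), "vs LHP": (102, 300)},
}


def average(hits, at_bats):
    """Batting average for one split."""
    return hits / at_bats


def overall(splits):
    """Overall average: total hits divided by total at-bats."""
    total_hits = sum(hits for hits, _ in splits.values())
    total_at_bats = sum(at_bats for _, at_bats in splits.values())
    return total_hits / total_at_bats


for name, splits in records.items():
    for split, (hits, at_bats) in splits.items():
        print(f"{name} {split}: {average(hits, at_bats):.3f}")
    print(f"{name} overall: {overall(splits):.3f}")

# Batter X has the higher average in each split...
x_better_in_each_split = all(
    average(*records["Batter X"][split]) > average(*records["Batter Y"][split])
    for split in ("vs RHP", "vs LHP")
)
# ...but the lower average overall, because most of X's at-bats
# came in the split where both batters hit worse.
x_overall = overall(records["Batter X"])
y_overall = overall(records["Batter Y"])
print("X better in each split:", x_better_in_each_split)
print("X better overall:", x_overall > y_overall)
```

The lurking variable here is how each batter's at-bats are distributed between the two kinds of pitchers: the overall average is a weighted average of the splits, and the two batters have very different weights.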
Unit Wrap-Up
Objective [...] Define each key term and concept in this Unit.
Objective [...] Explain the significance of each key term and concept in this Unit.
Objective [...] Apply concepts you learned in this Unit to specific problems.
Tips As you review the key terms, think about how they relate to one another. Play a game in which you pick a term at random, define it, then pick another term and think about how it relates to the first term. The two terms may not seem related at first, but all of the terms you'll learn in this course are somehow related (some are distant cousins rather than brothers and sisters).
Think about how the concepts you learned could be combined and applied to new situations. The questions you see on the Quiz and on the AP Exam may look new on the surface, and they may combine concepts in new ways, but you'll be able to analyze and solve them by applying techniques taught in this Unit.
Sample Free-Response Question
There are many ways the concepts in this Unit can be combined into a free-response question requiring you to use many of the concepts you've learned in order to solve it. Here's just one example.
The following represents the population of the United States (in thousands) in ten-year intervals from [...] to [...] (Source: US Census Bureau).
Answers Population = [...]x; r = [...]. The line does appear to fit the data reasonably well.
The residual plot shows a definite pattern, so even though the line visually appeared to be a good fit and there was a very high correlation coefficient, a line certainly isn't the best possible model for the data.
We know that populations often grow exponentially, so let's try taking the log of the population values and doing a regression on the transformed data.
The regression equation is ln(population) = [...]x; r = [...]. The picture does look like a somewhat better fit, and the correlation coefficient is slightly higher than for the raw data. To be sure, however, we need to look at the residual plot for these data.
Although not completely random (we should worry a bit about the tendency of the residuals to decline during the last [...] years), it certainly has less of a pattern than the residual plot for the raw data. We conclude that the data are more likely exponential than linear, and we note that the rate of increase seems to be declining over the last several decades.
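The residual check used in this answer can be sketched in Python. The series below is synthetic, an exactly exponential sequence invented for this illustration rather than Census data, so the contrast between the two residual lists is sharper than it would be for real population figures. The helper names fit_line and residuals are assumptions made here.

```python
import math


def fit_line(xs, ys):
    """Least-squares line ys = a + b*xs. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residuals(xs, ys):
    """Observed minus predicted values for the least-squares line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Synthetic data: ten "decades" of exactly exponential growth.
decades = list(range(10))
population = [100.0 * 1.25 ** t for t in decades]

# Residuals from a line fitted to the raw values.
raw_residuals = residuals(decades, population)

# Residuals from a line fitted to the natural log of the values.
log_residuals = residuals(decades, [math.log(p) for p in population])

print("raw:", [round(r, 1) for r in raw_residuals])
print("log:", [round(r, 6) for r in log_residuals])
```

For the raw values the residuals are positive at both ends and negative in the middle, the curved pattern that signals a line is the wrong model. After the log transformation the residuals are essentially zero because this synthetic series is exactly exponential; real data would leave some scatter.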