§14.2–The Coefficient of Determination
Tom Lewis
Fall Term 2009
Tom Lewis () §14.2–The Coefficient of Determination Fall Term 2009 1 / 13
Outline
1 Review
2 The regression identity
3 Some computing formulas
4 The coefficient of determination
Tom Lewis () §14.2–The Coefficient of Determination Fall Term 2009 2 / 13
Review
Variation
Given a data set $\{w_1, w_2, \dots, w_n\}$, we define the variation of the set by
\[ \sum (w_i - \bar{w})^2 = \sum w_i^2 - \frac{\left(\sum w_i\right)^2}{n}. \]
The variation of a set measures the deviation of the set from its mean.
The variation of a data set is closely related to its standard deviation. For example, if $s$ is the sample standard deviation of this set, then
\[ s = \sqrt{\frac{\text{variation}}{n - 1}} \quad\text{or}\quad \text{variation} = (n - 1)s^2. \]
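The two forms of the variation, and its link to the standard deviation, can be checked numerically. The sketch below is only an illustration: the function names and the small data set are assumptions made for this example, and `statistics.variance` is used because it divides by n - 1.

```python
from fractions import Fraction
from statistics import variance

def variation(data):
    """Definition form: sum of squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((w - mean) ** 2 for w in data)

def variation_shortcut(data):
    """Computing form: sum of squares minus (sum)^2 / n."""
    n = len(data)
    return sum(w * w for w in data) - sum(data) ** 2 / n

# Hypothetical data set, chosen only for illustration.
w = [Fraction(2), Fraction(5), Fraction(8)]

print(variation(w))                 # definition form
print(variation_shortcut(w))        # computing form, same value
print((len(w) - 1) * variance(w))   # (n - 1) * s^2, same value again
```

Exact fractions are used so that the three results agree exactly rather than up to rounding.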
Regression formulas
Given a set of n ordered pairs (x1, y1), . . . , (xn, yn), let
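\[ S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} \]
\[ S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n} \]
\[ S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} \]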
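As a quick check that the definition and computing forms agree, the sketch below evaluates each quantity both ways with exact fractions. The helper names `s_def` and `s_short` are made up for this example, integer data are assumed, and the three points are the ones used in the worked problem of this section.

```python
from fractions import Fraction

def s_def(us, vs):
    """Definition form: sum of (u_i - u_bar)(v_i - v_bar)."""
    n = len(us)
    u_bar = Fraction(sum(us), n)  # assumes integer data
    v_bar = Fraction(sum(vs), n)
    return sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))

def s_short(us, vs):
    """Computing form: sum of u_i v_i minus (sum u_i)(sum v_i) / n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - Fraction(sum(us) * sum(vs), n)

xs, ys = [1, 3, 4], [2, 5, 8]

s_xx = s_short(xs, xs)
s_yy = s_short(ys, ys)
s_xy = s_short(xs, ys)
print(s_xx, s_yy, s_xy)
```

Passing the same list twice gives $S_{xx}$ or $S_{yy}$, so one helper covers all three formulas.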
The regression equation
The regression equation for a set of n data points is
\[ \hat{y} = b_1 x + b_0, \]
where $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1 \bar{x}$.
The big picture
According to our model,
\[ y = \underbrace{b_1 x + b_0}_{\text{regression}} + \underbrace{e}_{\text{error}} \]
We will show that the total variation in the variable $y$ (SST) can be separated into the variation due to the regression model (SSR) and the variation due to the error (SSE).
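The coefficient formulas translate directly into a few lines of code. The sketch below is one possible way to do it, not a prescribed method: `fit_line` is a hypothetical helper, integer data are assumed, and the points are those of the worked problem in this section.

```python
from fractions import Fraction

def fit_line(points):
    """Return (b1, b0) for the least-squares line y_hat = b1 * x + b0."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    # Computing forms of S_xx and S_xy (integer data assumed).
    s_xx = sum(x * x for x, _ in points) - Fraction(sum_x ** 2, n)
    s_xy = sum(x * y for x, y in points) - Fraction(sum_x * sum_y, n)
    b1 = s_xy / s_xx                                   # slope
    b0 = Fraction(sum_y, n) - b1 * Fraction(sum_x, n)  # intercept
    return b1, b0

b1, b0 = fit_line([(1, 2), (3, 5), (4, 8)])
print(b1, b0)
```

For these three points the result is slope 27/14 and intercept -1/7.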
Problem (Variation in the data)
Recall that the regression equation for the data set (1, 2), (3, 5), and (4, 8) is
\[ \hat{y} = \frac{27}{14}x - \frac{1}{7} \]
This gives us four columns of data to study:
$x$    $y$    $\hat{y}$    $\hat{y} - y$
1      2      25/14        −3/14
3      5      79/14        9/14
4      8      53/7         −3/7
Compute the variation of the data in each of the columns.
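The fitted-value and difference columns can be regenerated from the regression equation. This is a small sketch under the assumption that the last column holds fitted value minus observed value, which is the sign convention that matches the table entries; the variable names are illustrative.

```python
from fractions import Fraction

points = [(1, 2), (3, 5), (4, 8)]
b1, b0 = Fraction(27, 14), Fraction(-1, 7)  # coefficients stated in the problem

rows = []
for x, y in points:
    y_hat = b1 * x + b0                      # fitted value on the regression line
    rows.append((x, y, y_hat, y_hat - y))    # last entry: fitted minus observed

for row in rows:
    print(*row)
```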
Solution (Part I)
The variation in the $x$ data is simply
\[ S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 26 - \frac{8^2}{3} = \frac{14}{3}. \]
The variation in the $y$ data is simply
\[ S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n} = 93 - \frac{(15)^2}{3} = 18. \]
The variation in the $y$ data is also denoted by SST, for the total sum of squares. Let us remember that
\[ SST = S_{yy} \]
Our calculations continue on the next slide.
Solution (Part II)
The variation in the $\hat{y}$ data is denoted by SSR, which stands for the sum of the squares in the regression data. For our set we have
\[ SSR = \sum \hat{y}^2 - \frac{\left(\sum \hat{y}\right)^2}{n} = \frac{1293}{14} - \frac{(15)^2}{3} = \frac{243}{14}. \]
The variation in the $\hat{y} - y$ data is denoted by SSE, which stands for the sum of the squares in the error data. For our set we have
\[ SSE = \sum (\hat{y} - y)^2 - \frac{\left(\sum (\hat{y} - y)\right)^2}{n} = \frac{9}{14} - \frac{0^2}{3} = \frac{9}{14}. \]
Notice that $\sum (\hat{y} - y) = 0$. This is not a coincidence; the sum of the errors is always 0.
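The same numbers can be reproduced by applying the variation formula to the fitted values and to the differences between fitted and observed values. The following is a minimal sketch; `variation` is an illustrative helper and the coefficients are taken from the regression equation stated in the problem.

```python
from fractions import Fraction

def variation(data):
    """Computing form: sum of squares minus (sum)^2 / n."""
    n = len(data)
    return sum(d * d for d in data) - sum(data) ** 2 / n

points = [(1, 2), (3, 5), (4, 8)]
b1, b0 = Fraction(27, 14), Fraction(-1, 7)

y_hat = [b1 * x + b0 for x, _ in points]                  # regression data
errors = [yh - y for yh, (_, y) in zip(y_hat, points)]    # fitted minus observed

ssr = variation(y_hat)
sse = variation(errors)
print(ssr, sse, sum(errors))
```

The last printed value is the sum of the errors, which comes out to exactly 0 for this data set.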
The regression identity
Notice that
\[ SSR + SSE = \frac{243}{14} + \frac{9}{14} = \frac{252}{14} = 18 = SST. \]
This is not a coincidence. In general,
SST = SSR + SSE
This is called the regression identity.
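A short sketch of why the identity holds is given here as a supporting derivation; it is not part of the original slides. It uses only $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$, which give $\hat{y}_i - \bar{y} = b_1(x_i - \bar{x})$ and $y_i - \hat{y}_i = (y_i - \bar{y}) - b_1(x_i - \bar{x})$.

```latex
\begin{aligned}
SST = \sum (y_i - \bar{y})^2
  &= \sum \bigl[(\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)\bigr]^2 \\
  &= \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2
     + 2 \sum (\hat{y}_i - \bar{y})(y_i - \hat{y}_i), \\[4pt]
\sum (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)
  &= b_1 \sum (x_i - \bar{x})\bigl[(y_i - \bar{y}) - b_1 (x_i - \bar{x})\bigr] \\
  &= b_1 \bigl(S_{xy} - b_1 S_{xx}\bigr) = 0 .
\end{aligned}
```

Since the cross term vanishes, the first sum is SSR (the mean of the $\hat{y}$ data equals $\bar{y}$), and the second sum is SSE (the errors sum to 0), this gives $SST = SSR + SSE$.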
Some computing formulas
Computing formula for SST
Recall that
\[ SST = S_{yy}. \]
In other words, SST is nothing more than the variation in the $y$-data.
Computing formula for SSR
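The key observation is that $\sum \hat{y} = \sum y$ and therefore the mean of the regression data (the $\hat{y}$-data) is $\bar{y}$; thus,
\[
\begin{aligned}
SSR = \sum (\hat{y} - \bar{y})^2 &= \sum (b_1 x + b_0 - b_1 \bar{x} - b_0)^2 \\
&= b_1^2 \sum (x - \bar{x})^2 = b_1^2 S_{xx} \\
&= \frac{S_{xy}^2}{S_{xx}^2} S_{xx} = \frac{S_{xy}^2}{S_{xx}}
\end{aligned}
\]
In summary,
\[ SSR = \frac{S_{xy}^2}{S_{xx}}. \]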
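Both the key observation and the final formula can be checked on the three points from the worked problem. This is only a sketch with illustrative variable names; it assumes integer data and builds the fitted values from the least-squares coefficients.

```python
from fractions import Fraction

xs, ys = [1, 3, 4], [2, 5, 8]
n = len(xs)
x_bar, y_bar = Fraction(sum(xs), n), Fraction(sum(ys), n)

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

y_hat = [b1 * x + b0 for x in xs]

# Key observation: the fitted values have the same sum (and mean) as y.
same_sum = sum(y_hat) == sum(ys)

ssr_direct = sum((yh - y_bar) ** 2 for yh in y_hat)  # definition of SSR
ssr_formula = s_xy ** 2 / s_xx                       # computing formula
print(same_sum, ssr_direct, ssr_formula)
```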
Computing formula for SSE
From the regression identity, we have $SSE = SST - SSR$; therefore,
\[ SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}} \]
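Taken together, the three computing formulas need only $S_{xx}$, $S_{yy}$, and $S_{xy}$, with no fitted values. A minimal sketch, assuming integer data and a hypothetical helper name `sums_of_squares`:

```python
from fractions import Fraction

def sums_of_squares(points):
    """Return (SST, SSR, SSE) from the S_xx, S_yy, S_xy computing forms."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    s_xx = sum(x * x for x, _ in points) - Fraction(sum_x ** 2, n)
    s_yy = sum(y * y for _, y in points) - Fraction(sum_y ** 2, n)
    s_xy = sum(x * y for x, y in points) - Fraction(sum_x * sum_y, n)
    sst = s_yy                 # SST = S_yy
    ssr = s_xy ** 2 / s_xx     # SSR = S_xy^2 / S_xx
    sse = sst - ssr            # SSE from the regression identity
    return sst, ssr, sse

sst, ssr, sse = sums_of_squares([(1, 2), (3, 5), (4, 8)])
print(sst, ssr, sse)
```

For the worked problem this reproduces 18, 243/14, and 9/14.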
The coefficient of determination
The coefficient of determination, denoted by $r^2$, is the proportion of the total variation in the response variable explained by the regression model:
\[ r^2 = \frac{SSR}{SST} \]
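The definition can be evaluated directly from the fitted values. The sketch below follows the definition $r^2 = SSR/SST$ step by step; `r_squared` is a hypothetical helper, and it assumes integer data whose $x$ values and $y$ values are not all equal (so neither $S_{xx}$ nor SST is zero).

```python
from fractions import Fraction

def r_squared(points):
    """r^2 = SSR / SST, using fitted values of the least-squares line."""
    n = len(points)
    x_bar = Fraction(sum(x for x, _ in points), n)
    y_bar = Fraction(sum(y for _, y in points), n)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ssr = sum((b1 * x + b0 - y_bar) ** 2 for x, _ in points)  # regression variation
    sst = sum((y - y_bar) ** 2 for _, y in points)            # total variation
    return ssr / sst

r2 = r_squared([(1, 2), (3, 5), (4, 8)])
print(r2, float(r2))
```

For the three points of the worked problem this gives 27/28, consistent with SSR = 243/14 and SST = 18.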
Problem
Develop a computing formula for $r^2$.
What does it mean if $r^2$ is close to 1? What does it mean if $r^2$ is close to 0?
Problem
Work on the regression handout.