Linear Regression Exploring relationships between two metric variables.
The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables...
-
Upload
oscar-stanley -
Category
Documents
-
view
214 -
download
0
Transcript of The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables...
![Page 1: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/1.jpg)
The Practice of StatisticsThird Edition
Chapter 4:More about Relationships
between Two Variables
Copyright © 2008 by W. H. Freeman & Company
Daniel S. Yates
![Page 2: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/2.jpg)
Section 4.1 Modeling Nonlinear Data
Linear relationship
x y y/x
0 2.0 5.1
1 7.14.7
2 11.85.3
3 17.15.1
4 22.2
Constant value y/x indicate linear relationship
y = a + bx
1.501
0.21.7
12
12
x
y
xx
yy
![Page 3: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/3.jpg)
Section 4.1 Modeling Nonlinear Data
Exponential relationship
x y yn/yn-1
0 1.0
2.101 2.12.05
2 4.31.84
3 7.92.03
4 16.0
Constant value yn/yn-1 called common ratio indicates exponential relationship
y = abx
0.1
1.2
1
2
y
y
![Page 4: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/4.jpg)
Section 4.1 Modeling Nonlinear Data
Power relationship
x y yn/yn-1 y/x
0 0.0
2.11 2.17.7 14.1
2 16.23.3 37.3
3 53.52.38 73.9
4 127.4
Neither yn/yn-1 or y/x are constant indicates possible power relationship
y = axb
![Page 5: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/5.jpg)
Section 4.1 Modeling Nonlinear Data
•Many important real world situations exhibit exponential or power relationships.
•Exponential and power relationships can be transformed into linear forms so linear regression analysis can be utilized.
![Page 6: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/6.jpg)
![Page 7: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/7.jpg)
• Linear regression only works for linear models. (That sounds obvious, but when you fit a regression, you can’t take it for granted.)
• A curved relationship between two variables might not be apparent when looking at a scatterplot alone, but will be more obvious in a plot of the residuals. – Remember, we want to see “nothing” in a plot of the
residuals.
![Page 8: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/8.jpg)
• No regression analysis is complete without a display of the residuals to check that the linear model is reasonable.
• Residuals often reveal subtleties that were not clear from a plot of the original data.
![Page 9: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/9.jpg)
Section 4.1 Modeling Nonlinear Data
• For exponential relationship - logy is linear with respect to x
• For power relationship - logy is linear with respect to logx
![Page 10: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/10.jpg)
Transforming Exponential Data
Year Cell Phone Users
1986 503
1987 890
1988 1545
1989 2701
1990 4734
1991 8345
1992 14356
1993 25019
1994 45673
Steps
1) Graph data
2) Check common ratio if you suspect exponential relationship
3) Create new list with log of the y values
4) Graph data. X vs log Y
5) Perform linear regression on the transformed data. Store equ. in Y1
6) Transform back by taking raising 10 to both sides of the equation.
7) Graph data to check
XY 2434186.0727179.480 10*10ˆ
![Page 11: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/11.jpg)
• Linear models give a predicted value for each case in the data.
• We cannot assume that a linear relationship in the data exists beyond the range of the data.
• Once we venture into new x territory, such a prediction is called an extrapolation.
![Page 12: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/12.jpg)
Section 4.2 Interpreting Correlation and Regression
• r and LSRL describe only linear relationships
• r and LSRL are strongly influenced by a few extreme observations – influential points
• Always plot your data
• The use of a regression line to predict outside the domain of values of the explanatory variable x is called extrapolation and cannot be trusted.
![Page 13: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/13.jpg)
Lurking Variables and Causation• No matter how strong the association, no matter how large the
R2 value, no matter how straight the line, there is no way to conclude from a regression alone that one variable causes the other.– There’s always the possibility that some third variable is
driving both of the variables you have observed.• With observational data, as opposed to data from a designed
experiment, there is no way to be sure that a lurking variable is not the cause of any apparent association.
![Page 14: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/14.jpg)
Section 4.2 Interpreting Correlation and Regression
• Lurking variables are variables that can influence the relationship of two variables.
• Lurking variables are not measured or even considered.
• Lurking variables can falsely suggest a strong relationship between two variables or even hide a relationship.
![Page 15: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/15.jpg)
Lurking Variables and Causation (cont.)
• The following scatterplot shows that the average life expectancy for a country is related to the number of doctors per person in that country:
![Page 16: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/16.jpg)
Lurking Variables and Causation (cont.)
• This new scatterplot shows that the average life expectancy for a country is related to the number of televisions per person in that country:
![Page 17: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/17.jpg)
Lurking Variables and Causation (cont.)• Since televisions are cheaper than doctors, send TVs to
countries with low life expectancies in order to extend lifetimes. Right?
• How about considering a lurking variable? That makes more sense…– Countries with higher standards of living have both longer
life expectancies and more doctors (and TVs!).– If higher living standards cause changes in these other
variables, improving living standards might be expected to prolong lives and increase the numbers of doctors and TVs.
![Page 18: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/18.jpg)
New England Patriotsy = 0.9716x + 194.48
R2 = 0.3658
0
50
100
150
200
250
300
350
400
0 20 40 60 80 100 120
Jersey Number
Wei
gh
t (lb
s.)
![Page 19: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/19.jpg)
• Strong association of variables x and y can reflect any of the following underlying relationships– Causation - changes in x cause changes in y
– ex. Consuming more calories with no change in physical activity causes weight gain.
– Common response – both x and y respond to some unobserved variable or variables.
– ex. There may be perceived cause and effect between SAT scores and undergrad GPA but both variables are likely responding to student knowledge and ability
![Page 20: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/20.jpg)
– Confounding – the effect on y of the explanatory variable x is mixed up with the effects on y of other lurking variables.
– ex. Minority students have lower ave. SAT scores than whites; but minorities on average grew up in poorer households and attended poorer schools. These socioeconomic variables make cause and effect suspect.
![Page 21: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/21.jpg)
Strong Association
![Page 22: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/22.jpg)
Strong Association
![Page 23: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/23.jpg)
Strong Association
![Page 24: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/24.jpg)
• A carefully designed experiment is the best way to get evidence that x causes y. Lurking variables must be kept under control.
![Page 25: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/25.jpg)
Section 4.3 Relations in Categorical Data
• Categorical data may be inherently categorical such as; sex,race and occupation.
• Categorical data may be created by grouping quantitative data.
• Two way tables – hold categorical data
![Page 26: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/26.jpg)
Income 25-34 35-54 55 + Total
0-19,999
4,506 2,738 3,400 10,644
20,000-39,999
8,724 5,622 4,789 19,135
40,000-49,999
12,643 16,893 7,642 37,178
Total 25,873 25,253 15,831 66,957
Age Groupexample
Row variable – Income Column variable - Age
![Page 27: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/27.jpg)
• The totals of the rows and column are called marginal distributions.
• The totals may be off from the table data due to rounding error.
• The data may also be represented by percents.
• Relationships between categorical data may be calculated from the two way table.
• Data may be represented by a bar chart.
![Page 28: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/28.jpg)
• Conditional distributions satisfy a certain condition on the table.– Ex. Distribution of income level for 25-34 year
olds.– Ex. Distribution of age for people making
$20,000 - $39,999
![Page 29: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/29.jpg)
Example Outcome Hospital A
Hospital B
Total
Died 63
(3%)
16
(2%)
79
Survived 2037
(97%)
784
(98%)
2,821
Total 2,100 800 2,900
![Page 30: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/30.jpg)
Outcome Hospital A
Hospital B
Hospital A
Hospital B
Died 6
(1%)
8
(1.3%)
57
(3.8%)
8
(4%)
Survived 594
(99%)
592
(98.7%)
1,443
(96.2%)
192
(96%)
Total 600 600 1,500 200
Good Condition Poor Condition
Lurking Variable
![Page 31: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/31.jpg)
![Page 32: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/32.jpg)
![Page 33: The Practice of Statistics Third Edition Chapter 4: More about Relationships between Two Variables Copyright © 2008 by W. H. Freeman & Company Daniel S.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649f285503460f94c40076/html5/thumbnails/33.jpg)