1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in...
![Page 1: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/1.jpg)
1
Chapter 4: More on Two-Variable Data
4.1 Transforming Relationships
4.2 Cautions
4.3 Relations in Categorical Data
[Figure: plot with x-axis "Time, minutes" and y-axis "g dye/kg fiber"]
![Page 2: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/2.jpg)
2
Example
| Year | Cell Phone Users (thousands) |
|---|---|
| 1990 | 5,283 |
| 1993 | 16,009 |
| 1994 | 24,134 |
| 1995 | 33,786 |
| 1996 | 44,043 |
| 1997 | 55,312 |
| 1998 | 69,209 |
| 1999 | 86,047 |
![Page 3: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/3.jpg)
3
Scatterplot for Cell Phone Example
![Page 4: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/4.jpg)
4
Residuals Plot
![Page 5: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/5.jpg)
5
What’s going on here?
• Do the data (y) increase by a constant amount each year?
– This would suggest a linear model.
• Or, do the data increase by a fixed percentage each year? That is, can you multiply the y-value by a fixed number to get the next year’s number, and then multiply that number by the fixed number to get the following year’s number?
– This would suggest an exponential model.
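One way to check these two questions against the cell phone table is to compute the year-over-year differences and ratios. The sketch below uses only the consecutive years 1993 to 1999 (1990 is left out because 1991 and 1992 are missing from the table); the variable names are illustrative, not from the slides.

```python
import numpy as np

# Cell phone users (thousands) for the consecutive years 1993-1999.
years = np.array([1993, 1994, 1995, 1996, 1997, 1998, 1999])
users = np.array([16009, 24134, 33786, 44043, 55312, 69209, 86047])

# Roughly constant differences would point toward a linear model.
differences = np.diff(users)

# Roughly constant ratios (a fixed percentage increase) would point
# toward an exponential model.
ratios = users[1:] / users[:-1]

for year, d, r in zip(years[1:], differences, ratios):
    print(f"{year}: increase = {int(d):6d}, ratio = {r:.3f}")
```

For these data the increases grow every year, which argues against a straight line, while the ratios stay in a much narrower band (all between about 1.2 and 1.5).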
![Page 6: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/6.jpg)
6
Use an Exponential (Non-Linear) Model
![Page 7: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/7.jpg)
7
Plotting our original data vs. our exponential model …
![Page 8: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/8.jpg)
8
Prediction Using the Model
• Model:
• Now use the new model to predict cell phone subscribers for 2000.
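The model equation on this slide did not survive extraction, so the following is only a sketch of how such a prediction could be made, not necessarily the model used in class. It assumes an exponential model fitted by least-squares regression of log10(users) on years since 1990; `predict_users` is a hypothetical helper name.

```python
import numpy as np

# Data from the cell phone example (users in thousands).
years = np.array([1990, 1993, 1994, 1995, 1996, 1997, 1998, 1999])
users = np.array([5283, 16009, 24134, 33786, 44043, 55312, 69209, 86047])

# Assumption: fit a straight line to log10(users) versus years since 1990.
t = years - 1990
slope, intercept = np.polyfit(t, np.log10(users), 1)

# Back-transforming gives users = 10**intercept * (10**slope)**t.
growth_factor = 10 ** slope


def predict_users(year):
    """Predicted users (thousands) from the fitted exponential model."""
    return 10 ** (intercept + slope * (year - 1990))


print(f"estimated yearly growth factor: {growth_factor:.3f}")
print(f"predicted users in 2000 (thousands): {predict_users(2000):,.0f}")
```

Because 2000 lies just beyond the observed years, this prediction is an extrapolation and should be read with the usual caution.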
![Page 9: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/9.jpg)
9
• Problem 4.6, p. 212
– Parts a, b, g
• Problem 4.11, p. 213
– Create a model, and then let’s see how well we can predict population in 2009.
Practice
![Page 10: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/10.jpg)
10
Power Law Models
• General form of a power law model:
$y = a x^{p}$
• Biologists have found that many characteristics of living things are described quite closely by power laws.
– For example, the rate at which animals use energy goes up as the ¾ power of their body weight (Kleiber’s Law).
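Taking logarithms shows why a power law turns into a straight line when both variables are log-transformed. This is the standard algebra; using base-10 logs is an assumption about convention, and any base works the same way.

```latex
y = a x^{p}
\quad\Longrightarrow\quad
\log y = \log a + p \log x
```

A plot of log y against log x is therefore linear with slope p and intercept log a. Under Kleiber's Law, the slope for energy use against body weight would be about 3/4.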
![Page 11: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/11.jpg)
11
Problem 4.13, p. 219
![Page 12: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/12.jpg)
12
Problem 4.13, p. 219
![Page 13: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/13.jpg)
13
Residuals Analysis
![Page 14: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/14.jpg)
14
Predicting Lifespan for Humans
![Page 15: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/15.jpg)
15
Another Practice Problem
• 4.25, pp. 224-225
• Create appropriate model
• Predict seed count for tree with seed weight of 1,000 mg.
![Page 16: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/16.jpg)
16
• HW Problem:
– 4.14, p. 220
![Page 17: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/17.jpg)
17
4.2 Cautions about Correlation and Regression
• The correlation (r) and the LSR line are not resistant.
• As we have seen, extrapolation is often dangerous.
– Extrapolation means predicting outside the range of x-values for which the model was developed.
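To see what "not resistant" means in practice, the sketch below uses a small hypothetical data set (invented for illustration, not from the textbook) and shows how a single unusual point changes both r and the least-squares slope.

```python
import numpy as np

# Hypothetical data: five points lying exactly on the line y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x
r_clean = np.corrcoef(x, y)[0, 1]
slope_clean, _ = np.polyfit(x, y, 1)

# Add one unusual observation at (6, 0) and recompute.
x_out = np.append(x, 6.0)
y_out = np.append(y, 0.0)
r_out = np.corrcoef(x_out, y_out)[0, 1]
slope_out, _ = np.polyfit(x_out, y_out, 1)

print(f"without the point: r = {r_clean:.3f}, slope = {slope_clean:.3f}")
print(f"with the point:    r = {r_out:.3f}, slope = {slope_out:.3f}")
```

One added point drops r from 1 to about 0.14 and the slope from 2 to about 0.29.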
![Page 18: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/18.jpg)
18
The French Paradox
• The paradox refers to the fact that the French have long had low rates of heart disease (Japan is the only developed country with a lower rate), despite a diet relatively rich in saturated animal fats. The French propensity to drink wine the way some Americans guzzle soft drinks has been cited as a likely explanation of the paradox, since numerous studies have indicated that alcohol consumed in moderation helps to prevent atherosclerosis, or accumulation of fatty deposits in arteries, which is the underlying cause of most heart attacks.
+ from NY Times article
![Page 19: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/19.jpg)
19
Lurking Variables
• As we discussed in the example of amount of wine consumed vs. number of incidents of heart disease, there can be other variables not measured in a correlation study that may influence the interpretation of relationships among those variables.
– Lurking variables
• It is possible to show, for example, that there is a high correlation between shoe size and intelligence for a group of children varying in age from, say, 4 to 15.
– What is the lurking variable?
• To control for age, we can calculate the correlation between shoe size and IQ for each of the different ages.
– Age 4, 5, 6, …
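The idea of controlling for age can be illustrated with simulated data. Everything in this sketch is an assumption made for illustration: the children are simulated, `test_score` is a raw score standing in for "intelligence", and both variables are built to depend on age but not on each other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (not real) children: 500 at each age from 4 to 15.
ages = np.repeat(np.arange(4, 16), 500)

# Both variables depend on age plus independent noise.
shoe_size = ages + rng.normal(0, 1, ages.size)
test_score = 5 * ages + rng.normal(0, 5, ages.size)

# Correlation across all ages combined.
overall_r = np.corrcoef(shoe_size, test_score)[0, 1]
print(f"all ages combined: r = {overall_r:.2f}")

# Correlation within each age group separately.
within_r = {}
for age in range(4, 16):
    mask = ages == age
    within_r[age] = np.corrcoef(shoe_size[mask], test_score[mask])[0, 1]
    print(f"age {age:2d}: r = {within_r[age]:.2f}")
```

In the combined data the correlation is strong, while within any single age it is close to zero, which is what a common response to age would produce.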
![Page 20: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/20.jpg)
20
Correlation Between Shoe Size and IQ? (Common Response)
[Diagram: Age, Shoe Size, IQ]
![Page 21: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/21.jpg)
21
See Figure 4.18, p. 227
![Page 22: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/22.jpg)
22
Lurking Variables That Change Over Time
• Many lurking variables change systematically over time.
• One useful method for detecting lurking variables is to plot both the response variable and the regression residuals against the time order of the observations (whenever the time order is available).
• See Example 4.12, p. 228
![Page 23: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/23.jpg)
23
![Page 24: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/24.jpg)
24
Using Averaged Data
• Be careful when applying the results of a study that uses averages to individuals.
• Problem 4.31, p. 231
![Page 25: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/25.jpg)
25
Causation
• Simply put, a strong correlation between two variables says nothing about one variable causing the other. One variable may in fact cause the other to change, but a correlation or LSR line cannot tell us that.
– More investigation is needed!
• A designed study with proper experimental controls should be used.
![Page 26: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/26.jpg)
26
Figure 4.22, p. 232
• Causation
• Common Response
• Confounding
![Page 27: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/27.jpg)
27
Confounding
• The effects of two variables on a response variable are said to be confounded when they cannot be distinguished from one another.
– Definition: Two or more variables that might have caused an effect were simultaneously present, so that we do not know to which to attribute the effect.
– See 1, Example 4.13 (p. 232), and explanation, p. 233, top of p. 234.
• Does this mean that we cannot ever suggest causation?
– Read the two paragraphs on p. 235 (establishing causation).
![Page 28: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/28.jpg)
28
Causation
• Example 4.14, p. 232
– Numbers 1 and 2 (p. 233)
![Page 29: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/29.jpg)
29
Common Response
• Example 4.15, p. 233
![Page 30: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/30.jpg)
30
Homework
• Reading through p. 240
![Page 31: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/31.jpg)
31
Problems
• Problems on p. 237:
– 4.33, 4.34, 4.35
• 4.73, p.257
![Page 32: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/32.jpg)
32
Problem 4.73, p. 257
A power law model might fit best, so take the log of L1 and L2. The plot below shows L3 and L4.
![Page 33: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/33.jpg)
33
4.73, cont.
The pendulum period is proportional to the square root of its length.
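The square-root relationship can be recovered with the same log-log approach. The sketch below does not use the data from Problem 4.73; it generates periods from the ideal-pendulum formula T = 2π·sqrt(L/g) for some hypothetical lengths and then fits a line to the logs.

```python
import numpy as np

# Assumptions: ideal pendulum, g = 9.81 m/s^2, hypothetical lengths in metres.
g = 9.81
lengths = np.array([0.1, 0.2, 0.4, 0.8, 1.6])
periods = 2 * np.pi * np.sqrt(lengths / g)

# Fit log10(period) = log10(a) + p * log10(length).
p, log_a = np.polyfit(np.log10(lengths), np.log10(periods), 1)
a = 10 ** log_a

print(f"estimated power p = {p:.3f}")
print(f"estimated constant a = {a:.3f}")
```

The fitted power comes out as 0.5, matching "proportional to the square root of its length"; with measured data the estimate would only be close to 0.5.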
![Page 34: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/34.jpg)
34
4.3 Relations in Categorical Variables
• There are many relationships of interest to us that cannot be described by using correlation and LSR techniques.
– Recall that correlation and LSR require both variables to be quantitative.
• Often, we want to study the relationship between two variables that are inherently categorical.
![Page 35: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/35.jpg)
35
Two-Way Table (Ex. 4.19, p. 241)
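Columns: Age Group. Rows: Education.

| Education | 25 to 34 | 35 to 54 | 55+ | Total |
|---|---|---|---|---|
| Did not complete HS | 4,474 | 9,155 | 14,224 | 27,853 |
| Complete HS | 11,546 | 26,481 | 20,060 | 58,087 |
| 1-3 yrs college | 10,700 | 22,618 | 11,127 | 44,445 |
| 4+ yrs college | 11,066 | 23,183 | 10,596 | 44,845 |
| Total | 37,786 | 81,435 | 56,008 | 175,230 |

(Each entry in the body of the table is a cell.)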
![Page 36: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/36.jpg)
36
Two-Way Table
• The row variable is level of education.
– In this study, is level of education the explanatory or response variable?
• The column variable is age.
– Explanatory or response?
• Marginal distributions:
– The distributions of education alone and age alone are called marginal distributions because their totals are in the margins: education at the right, and age at the bottom.
![Page 37: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/37.jpg)
37
Marginal Distributions
• It is often advantageous to display the marginal distribution in percents instead of raw numbers.
[Bar chart: Education Level in U.S. (adults age 25+). x-axis: Years of Schooling; y-axis: Percent of Total. No high school degree: 15.9; High school only: 33.1; 1-3 years of college: 25.4; 4+ years of college: 25.6]
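The percents in this marginal distribution can be reproduced from the two-way table of Example 4.19. This is a minimal sketch; the array layout and variable names are illustrative.

```python
import numpy as np

# Cell counts from Example 4.19 (thousands of persons).
# Rows: education level; columns: age group (25 to 34, 35 to 54, 55+).
education = ["Did not complete HS", "Complete HS",
             "1-3 yrs college", "4+ yrs college"]
counts = np.array([
    [4474, 9155, 14224],
    [11546, 26481, 20060],
    [10700, 22618, 11127],
    [11066, 23183, 10596],
])

# Marginal distribution of education: row totals as a percent of the grand total.
row_totals = counts.sum(axis=1)
grand_total = counts.sum()
marginal_pct = 100 * row_totals / grand_total

for level, pct in zip(education, marginal_pct):
    print(f"{level}: {pct:.1f}%")
```

The results round to 15.9, 33.1, 25.4, and 25.6 percent, the values shown in the bar chart.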
![Page 38: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/38.jpg)
38
Conditional Distributions
• The previous graph looked at the breakdown of education levels for the entire population. Many times, however, we are looking for breakdowns (i.e., distributions) for a certain group within the population.
– For example, of those people with 4+ years of college, look at the distribution across age groups.
– Let’s complete a bar graph for this comparison.
– This is a conditional distribution.
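A conditional distribution uses one row (or column) of the two-way table as its own total. The sketch below does this for the "4+ yrs college" row of Example 4.19; the variable names are illustrative.

```python
import numpy as np

# The "4+ yrs college" row from Example 4.19 (thousands of persons).
age_groups = ["25 to 34", "35 to 54", "55+"]
four_plus_college = np.array([11066, 23183, 10596])

# Conditional distribution of age, given 4+ years of college:
# each cell divided by the row total.
conditional_pct = 100 * four_plus_college / four_plus_college.sum()

for group, pct in zip(age_groups, conditional_pct):
    print(f"{group}: {pct:.1f}%")
```

The percents sum to 100 within the row, because the condition (4+ years of college) fixes the group being described.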
![Page 39: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/39.jpg)
39
One Conditional Distribution for Example 4.19
[Bar chart: Breakdown by age group of people with 4+ years of college. x-axis: Age Group; y-axis: Percent. 25-34: 24.7; 35-54: 51.7; 55+: 23.6]
![Page 40: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/40.jpg)
40
Different Question
• What proportion of each age group received 4+ years of college education?
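This question conditions on age rather than on education, so each cell in the "4+ yrs college" row is divided by its column total instead of the row total. In this sketch the column totals are computed from the cells; two of them differ by a unit or two from the totals printed in the table (81,435 and 56,008), presumably because the source counts are rounded.

```python
import numpy as np

# Cell counts from Example 4.19 (thousands of persons).
# Rows: education level; columns: age group (25 to 34, 35 to 54, 55+).
age_groups = ["25 to 34", "35 to 54", "55+"]
counts = np.array([
    [4474, 9155, 14224],    # Did not complete HS
    [11546, 26481, 20060],  # Complete HS
    [10700, 22618, 11127],  # 1-3 yrs college
    [11066, 23183, 10596],  # 4+ yrs college
])

# Column totals computed from the cells.
col_totals = counts.sum(axis=0)

# Percent of each age group with 4+ years of college.
share_four_plus = 100 * counts[3] / col_totals

for group, pct in zip(age_groups, share_four_plus):
    print(f"{group}: {pct:.1f}%")
```

Unlike the previous conditional distribution, these three percents describe three different groups and do not need to sum to 100.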
![Page 41: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/41.jpg)
41
• Read paragraph at the bottom of page 248.
![Page 42: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/42.jpg)
42
One set of conditional distributions: Figure 4.27, p. 248
![Page 43: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/43.jpg)
43
Problems
• 4.53, p. 245
• 4.59, p. 251
![Page 44: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/44.jpg)
44
Graph for Problem 4.59
[Bar chart: Breakdown of Planned Majors in Business School, by Gender. x-axis: Business School Major; y-axis: Percent.
Female: Accounting 30.2; Admin 40.4; Economics 2.2; Finance 27.1.
Male: Accounting 34.8; Admin 24.8; Economics 3.7; Finance 36.6]
![Page 45: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/45.jpg)
45
Homework
• Read through the end of the chapter.
• Be sure you understand “Simpson’s Paradox.”
• Problem:
– 4.62, p. 253
![Page 46: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/46.jpg)
46
Simpson’s Paradox
• Problem 4.60, p. 251
• Statement of the Paradox:
– Simpson’s paradox refers to the reversal of the direction of a comparison or association when data from several groups are combined to form a single group.
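The reversal is easiest to see with numbers. The counts below are hypothetical (they are not the data from Problem 4.60): two options, A and B, are compared within two groups and then again after the groups are combined.

```python
# Hypothetical counts: (successes, trials) for options A and B in two groups.
data = {
    "Group 1": {"A": (81, 87), "B": (234, 270)},
    "Group 2": {"A": (192, 263), "B": (55, 80)},
}

# Success rate of each option within each group.
group_rates = {}
for group, options in data.items():
    group_rates[group] = {opt: s / n for opt, (s, n) in options.items()}
    print(group, {opt: round(r, 3) for opt, r in group_rates[group].items()})

# Success rate of each option after combining the two groups.
combined_rates = {}
for opt in ("A", "B"):
    successes = sum(data[g][opt][0] for g in data)
    trials = sum(data[g][opt][1] for g in data)
    combined_rates[opt] = successes / trials
print("Combined", {opt: round(r, 3) for opt, r in combined_rates.items()})
```

Option A has the higher success rate in each group separately, yet option B has the higher rate once the groups are combined, because the two options were applied to the groups in very different proportions.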
![Page 47: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/47.jpg)
47
Practice/Review Problems
• Problem:
– 4.68, p. 254
– 4.72 (parts a-c), p. 257
![Page 48: 1 Chapter 4: More on Two-Variable Data 4.1Transforming Relationships 4.2Cautions 4.3Relations in Categorical Data.](https://reader035.fdocuments.in/reader035/viewer/2022081516/56649ec65503460f94bd1b21/html5/thumbnails/48.jpg)
48
[Bar chart: Relationship Between Type of College and Management Level. x-axis: Management Level (High, Middle, Low); y-axis: Percent. Two series, Public and Private; values in extraction order for High, Middle, Low: 4.2, 54.4, 41.4 and 7.3, 53.1, 39.8]