Date posted: 20-Dec-2015
More on Two-Variable Data
Chapter Objectives
• Identify settings in which a transformation might be necessary in order to achieve linearity.
• Use transformations involving powers and logarithms to linearize curved relationships.
• Explain what is meant by a two-way table, and describe its parts.
• Give an example of Simpson’s Paradox.
• Explain what gives the best evidence for causation.
• Explain the criteria for establishing causation when experimentation is not feasible.
The Goal
• Our goal is to fit a model to curved data so that we can make predictions as we did in Chapter 3.
• HOWEVER, the only statistical tool we have to fit a model is the least-squares regression line.
• THEREFORE, in order to find a model for curved data, we must first “straighten it out.”
Transforming Relationships
• Data that displays a curved pattern can be modeled by a number of different functions.
• Two most common:
– Exponential ($y = AB^x$)
– Power ($y = Ax^B$)
• Chapter 4 focuses on these two models
pp. 195 – 6
• Example 4.1
• Brain weight v. body weight
• Note about variables:
– Sometimes we wish to transform x, or y, or both x and y.
– Therefore we refer to variables generically as t.
Why
• Linear transformations cannot straighten a curved relationship between two variables.
• Because of this, we must resort to functions that are not linear.
A Note about Monotonic Functions
4.1
• A. y = 2.54x: monotonic increasing
• B. y = 60/x: monotonic decreasing
• C. circumference = π(diameter): monotonic increasing
• D. SquaredError = $(\text{time} - 5)^2$: not monotonic
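The four answers above can be checked numerically with a rough sketch like the one below. The helper name `classify_monotonic` and the grid of positive sample points are illustrative assumptions, not part of the exercise, and a check on a finite grid only suggests monotonicity rather than proving it.

```python
import math

def classify_monotonic(f, xs):
    """Classify f on the sample points xs (hypothetical helper, grid check only)."""
    ys = [f(x) for x in xs]
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    if all(d > 0 for d in diffs):
        return "monotonic increasing"
    if all(d < 0 for d in diffs):
        return "monotonic decreasing"
    return "not monotonic"

# Positive sample points 0.5, 1.0, ..., 10.0 (an assumed domain for illustration).
xs = [0.5 * k for k in range(1, 21)]

results = {
    "A": classify_monotonic(lambda x: 2.54 * x, xs),
    "B": classify_monotonic(lambda x: 60 / x, xs),
    "C": classify_monotonic(lambda d: math.pi * d, xs),
    "D": classify_monotonic(lambda t: (t - 5) ** 2, xs),
}
for label, verdict in results.items():
    print(label, verdict)
```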
Figure 4.5
• What can we learn?
– The graph of a linear function (power p = 1) is a straight line.
– Powers greater than 1 (like p = 2 and p = 4) give graphs that bend upward. The sharpness of the bend increases as p increases.
– Powers less than 1 but greater than 0 (like p = 0.5) give graphs that bend downward.
– Powers less than 0 (like p = -0.5 and p = -1) give graphs that decrease as x increases. Greater negative values of p result in graphs that decrease more quickly.
– Look at the p = 0 graph. You may be surprised that this is not the graph of $y = x^0$. Why not? The 0th power $x^0$ is just the constant 1, which is not very useful. The p = 0 entry in the figure is not constant; it is the logarithm, log x. That is, the logarithm fits into the hierarchy of power transformations at p = 0.
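One way to make the hierarchy of power transformations concrete is a small helper that treats p = 0 as the logarithm. This is only a sketch: the function name `power_transform` and the choice of base-10 logarithm are assumptions made for illustration.

```python
import math

def power_transform(x, p):
    """Apply the power-p transformation; p = 0 is taken to be log10(x)."""
    if x <= 0:
        raise ValueError("x must be positive")
    if p == 0:
        return math.log10(x)
    return x ** p

# Walk down the ladder of powers and show each transformation at x = 1, 2, 4.
ladder = [4, 2, 1, 0.5, 0, -0.5, -1]
for p in ladder:
    row = [round(power_transform(x, p), 3) for x in (1, 2, 4)]
    print(p, row)
```

Rows with p above 0 increase from left to right, rows with p below 0 decrease, and the p = 0 row increases slowly, which matches the description of the figure.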
pp. 201 - 202
• Example 4.2 runs through several steps from the ladder of power transformations.
• This emphasizes that the process can be one of
– (a) making a good guess, based on observations of a graph of the data, about the type of transformation needed and
– (b) trying several types of the transformation chosen.
• This can get tedious, so the next section introduces a more analytic approach.
• The first approach is to look for an exponential growth pattern, which has the advantage that it can be linearized by taking logarithms (of the response variable) to transform the data.
4.3
• Weight = $c_1(\text{height})^3$ and strength = $c_2(\text{height})^2$; therefore, strength = $c(\text{weight})^{2/3}$, where c is a constant.
4.4
• A graph of the power law $y = x^{2/3}$ shows that strength does not increase linearly with body weight, as would be the case if a person 1 million times as heavy as an ant could lift 1 million times more than the ant. Rather, strength increases more slowly. For example, if weight is multiplied by 1000, strength will increase by a factor of $(1000)^{2/3} = 100$.
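The scaling claim is simple arithmetic and can be sketched in a few lines. The function name `strength_factor` is a hypothetical label for the two-thirds power law described above.

```python
def strength_factor(weight_factor):
    """Factor by which strength grows when weight is multiplied by weight_factor."""
    return weight_factor ** (2 / 3)

# Weight multiplied by one thousand, then by one million.
print(round(strength_factor(1000)))
print(round(strength_factor(1_000_000)))
```

Under this model, something a million times heavier than an ant would be about ten thousand times stronger, not a million times stronger.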
4.5
• Let y = average heart rate and x = body weight.
• Kleiber’s law says that total energy consumed is proportional to the three-fourths power of body weight, that is, Energy = $c_1 x^{3/4}$.
• But total energy consumed is also proportional to the product of the volume of blood pumped by the heart and the heart rate, that is, Energy = $c_2(\text{volume})\,y$.
• The volume of blood pumped by the heart is proportional to body weight, that is, Volume = $c_3 x$.
• Putting these three equations together yields $c_1 x^{3/4} = c_2(\text{volume})\,y = c_2(c_3 x)\,y$.
• Solving for y, we obtain $y = \dfrac{c_1 x^{3/4}}{c_2 c_3 x} = c\,x^{-1/4}$, where $c = c_1/(c_2 c_3)$.
Exponential Growth
• Linear growth: adding a fixed increment in each equal time period.
• Exponential growth: multiplying by a fixed number in each equal time period.
– Can also be looked at as growing by a fixed percentage.
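A quick way to tell the two kinds of growth apart is to look at successive differences and successive ratios. The sketch below uses made-up sequences (a start of 100, an increment of 20, and a multiplier of 1.2 are all assumptions for illustration).

```python
def differences(values):
    """Successive differences: constant for linear growth."""
    return [b - a for a, b in zip(values, values[1:])]

def ratios(values):
    """Successive ratios: constant for exponential growth."""
    return [b / a for a, b in zip(values, values[1:])]

linear = [100 + 20 * t for t in range(6)]         # add 20 each period
exponential = [100 * 1.2 ** t for t in range(6)]  # multiply by 1.2 (grow 20%) each period

print(differences(linear))
print([round(r, 3) for r in ratios(exponential)])
```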
p. 205
• Example 4.4
• Is this exponential growth?
• What is the projected amount for 2005?
• Actual was 203,000,000 (2005)
• Other interesting statistics:
– 2,000,000,000 cell phones worldwide
• 4.5% world without
– Average American spends 13 talking hours per month
– Average American in 18 – 24 age group spends 22 talking hours per month
Texting in the United States
Logarithm
$\log_b x = y$ if and only if $b^y = x$
The rules for logarithms are
$\log(AB) = \log A + \log B$
$\log(A/B) = \log A - \log B$
$\log X^p = p \log X$
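The three rules for logarithms (product, quotient, and power) can be spot-checked numerically. This is only a sanity check with arbitrarily chosen positive values, not a proof.

```python
import math

# Arbitrary positive values chosen for illustration.
A, B, X, p = 8.0, 2.0, 5.0, 3.0

product_rule = math.isclose(math.log10(A * B), math.log10(A) + math.log10(B))
quotient_rule = math.isclose(math.log10(A / B), math.log10(A) - math.log10(B))
power_rule = math.isclose(math.log10(X ** p), p * math.log10(X))

print(product_rule, quotient_rule, power_rule)
```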
p. 209
• Example 4.6
4.6
• A. [Figure: scatterplot of Acres (0 to 3,000,000) versus Year (1977 to 1982)]
4.6
• B. 226260/63024 = 3.59
907075/226260 = 4.01
2826095/907075 = 3.12
• C. log y yields 4.7996, 5.3546, 5.9576, 6.4512
4.6
• C. [Figure: scatterplot of log(acres) (4.5 to 6.5) versus Year (1977 to 1982)]
4.6
• D. use calculator to confirm
• E. The residual plot of the transformed data shows no clear pattern, so the line is a reasonable model for these points.
4.6
• F. $\log \hat{y} = -1094.51 + 0.5558x$
$10^{\log \hat{y}} = 10^{-1094.51 + 0.5558x}$
$\hat{y} = 10^{-1094.51 + 0.5558x}$
$\hat{y} = 10^{-1094.51} \cdot 10^{0.5558x}$
4.6
• G. The predicted number of acres defoliated in 1982 is the exponential function evaluated at 1982, which gives 10,719,964.92 acres.
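The steps of this exercise (ratios, logarithms, a least-squares line on the transformed data, then undoing the logarithm to predict) can be sketched as follows. The acre counts are the ones that appear in part B; the years 1978 to 1981 are an assumption inferred from the plot axis, since the slides do not list them. Because the intercept is very sensitive to rounding, the fitted coefficients may differ slightly from the rounded values shown on the slide.

```python
import math

years = [1978, 1979, 1980, 1981]          # assumed years (not stated in the slides)
acres = [63024, 226260, 907075, 2826095]  # values that appear in part B

# Part B: ratios of consecutive values; roughly constant ratios suggest exponential growth.
growth_ratios = [b / a for a, b in zip(acres, acres[1:])]

# Part C: transform the response variable with a base-10 logarithm.
log_acres = [math.log10(y) for y in acres]

# Part D: least-squares line for log10(acres) on year, from the usual formulas.
n = len(years)
x_bar = sum(years) / n
y_bar = sum(log_acres) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, log_acres))
sxx = sum((x - x_bar) ** 2 for x in years)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

def predict_acres(year):
    """Parts F and G: undo the logarithm to predict on the original scale."""
    return 10 ** (intercept + slope * year)

print([round(r, 2) for r in growth_ratios])
print(round(slope, 4))
print(round(predict_acres(1982)))
```

The 1982 prediction is an extrapolation beyond the observed years, so it should be read with the usual caution.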
4.9
• x = 1: $2^{4(1)} = 16$
• x = 5: $2^{4(5)} = 1{,}048{,}576$
4.10
• A.
Year   # children killed
1951 2
1952 4
1953 8
1954 16
1955 32
1956 64
1957 128
1958 256
1959 512
1960 1024
4.10
• B. [Figure: scatterplot of # Children Killed (0 to 1200) versus Year (1950 to 1961)]
4.10
• C. If x = number of years after 1950, then y = the number of children killed x years after 1950 = $2^x$. At x = 45, $y = 2^{45} \approx 3.52 \times 10^{13}$, or about 35,200,000,000,000.
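The doubling model is easy to evaluate directly, which makes the absurdity of the extrapolation plain. The function name `children_killed` is a hypothetical label for the model y = 2^x with x counted in years after 1950.

```python
def children_killed(x):
    """Doubling model y = 2**x, where x is the number of years after 1950."""
    return 2 ** x

print(children_killed(10))            # 1960, the last row of the table
print(children_killed(45))            # 45 years after 1950
print(f"{children_killed(45):.2e}")   # same value in scientific notation
```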
4.10
• D. [Figure: scatterplot of log(# children killed) (0 to 3.5) versus Year (1950 to 1961)]
4.10
• E. b = 0.3010
a = -587.008
$\log \hat{y} = -587.008 + 0.3010x$
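The slope and intercept in part E can be reproduced by fitting a least-squares line to the base-10 logarithms of the table values. This is a sketch using the textbook formulas rather than a calculator; the variable names are illustrative.

```python
import math

years = list(range(1951, 1961))
killed = [2 ** (year - 1950) for year in years]  # 2, 4, ..., 1024, as in part A
log_killed = [math.log10(y) for y in killed]

# Least-squares slope b and intercept a for log10(killed) on year.
n = len(years)
x_bar = sum(years) / n
y_bar = sum(log_killed) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(years, log_killed))
     / sum((x - x_bar) ** 2 for x in years))
a = y_bar - b * x_bar

print(round(b, 4), round(a, 3))
```

The slope is log10(2), which is what one would expect for data that double every year.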
p. 215
• Exponential growth models become linear when we apply the logarithm transformation to the response variable y.
• Power law models become linear when we apply the logarithm transformation to both variables.
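A short derivation shows why these two statements hold. It uses the same A and B as the exponential and power models named earlier, and assumes all quantities inside the logarithms are positive.

```latex
% Exponential model: take the logarithm of the response variable only.
y = A B^{x}
\;\Longrightarrow\;
\log y = \log A + (\log B)\, x
% log y is a linear function of x, with slope log B and intercept log A.

% Power model: take the logarithm of both variables.
y = A x^{B}
\;\Longrightarrow\;
\log y = \log A + B \log x
% log y is a linear function of log x, with slope B and intercept log A.
```

In the power model, the slope of the line fitted to (log x, log y) estimates the exponent B directly.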
4.17
• A. Year Value
1 537.50
2 577.81
3 621.15
4 667.73
5 717.81
6 771.65
7 829.52
8 891.74
9 958.62
10 1030.52
4.17
• B. [Figure: scatterplot of Value (500 to 1100) versus Year (0 to 11)]
4.17
• C. 2.73, 2.76, 2.79, 2.82, 2.86, 2.89, 2.92, 2.95, 2.98, 3.01
[Figure: scatterplot of log(Value) (2.70 to 3.05) versus Year (0 to 12)]
4.18
• Alice has $500(1.075)^{25} = 3049.17$
• Fred has $500 + 100(25) = 3000.00$
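The comparison between Alice and Fred is a direct contrast of exponential and linear growth. The sketch below assumes, from the numbers on the slide, that Alice starts with 500 growing at 7.5% per year and Fred starts with 500 and adds 100 per year, both over 25 years; the function names are hypothetical.

```python
def compound_value(principal, rate, years):
    """Exponential growth: multiply by (1 + rate) each year."""
    return principal * (1 + rate) ** years

def fixed_increment_value(principal, increment, years):
    """Linear growth: add a fixed increment each year."""
    return principal + increment * years

alice = compound_value(500, 0.075, 25)
fred = fixed_increment_value(500, 100, 25)

print(f"Alice: {alice:.2f}")
print(f"Fred:  {fred:.2f}")
```

With these assumed figures, the compounding account overtakes the fixed-increment account by year 25.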
Cautions About Correlation and Regression
Our Tools for Describing Data Sets
• Correlation
– r: Strength, form, direction
• Regression
– Generalized pattern
– Useful for predictions
• Limitations of our tools
– Correlation and regression describe only linear relationships
– The correlation “r” and the “LSRL” are NOT RESISTANT
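To see what "not resistant" means in practice, the sketch below computes r for a small made-up data set and then again after adding a single outlier. The data and the helper name `correlation` are invented purely for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r computed from its definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data lying exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]

r_clean = correlation(xs, ys)
r_outlier = correlation(xs + [7], ys + [0])  # one point far off the line

print(round(r_clean, 2), round(r_outlier, 2))
```

A single unusual point is enough to pull r from a perfect linear association down to a weak one, which is why scatterplots should always be inspected alongside r and the LSRL.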
Other Cautions
• Extrapolation
– The use of a regression line for prediction far outside the domain used.
– Examples:
• Age v. Height
• Time v. Death Rate (Swine Flu)
• Time v. Water Level of a Lake
• Time v. Children gunned down
Other Cautions
• Lurking Variables
– A variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among these variables.
– Can falsely suggest relationship between x and y
– Can hide actual relationship between x and y
Other Cautions
• Lurking Variables
– An example...
• There's this guy who's going to clean the windows of a mental asylum. A patient follows him and shouts, "I gotta secret, I gotta secret..." He ignores the patient. Again the patient follows him, but he ignores his cries. By the time he has nearly finished the building, he's really curious about what the patient's secret is, so he decides to ask. The patient pulls a matchbox out of his pocket, opens it, and puts it on a table. Out crawls a little spider. The patient says, "Spider, go left," and the spider walks to its left a bit. Then he says, "Spider, go right," and the spider walks to its right a little bit. He says, "Spider, turn around, walk forward, then go right," and sure enough the spider turns around, walks forward, and then goes right a bit. The window cleaner is amazed. "Wow!" he says, "that's amazing!" "No, that's not my secret," says the patient. "Watch." He picks up the spider in his hand, pulls all its legs off, and puts it back on the table. "Spider, go right." The spider doesn't move. "Spider, go left." The spider doesn't move. "Spider, turn around." Again the spider doesn't move. "There!" he says. "That's my secret: if you pull all a spider's legs off, they go deaf."
• The answer is not available in the original data, but was discovered through some additional research on the Buick Estate Wagon. These data were collected by Consumers Union on a test track (rather than using the EPA test values for fuel efficiency) following the manufacturer's recommendations for each car's maintenance. Additional research revealed that starting with this model year, Buick recommended a higher tire inflation pressure for the Buick Estate Wagon. The recommended inflation pressure was higher than the level for other cars in the survey. Harder tires present less rolling resistance and improve gas mileage; therefore, the Buick Estate Wagon outperformed our expectations based on our regression model, which did not account for tire inflation pressure. In our model, tire pressure is a lurking variable: a variable that helps predict gas mileage but is not included in the model.
Other Cautions
• Using averaged data
– Pay particular attention to data that has been averaged
– The correlation and LSRL of these data sets should not be applied to the individuals that the averages came from
• Example
– Examining monthly data and attempting to apply it to a day of that month.
Beware the post-hoc fallacy
“Post hoc, ergo propter hoc.”
To avoid the post-hoc fallacy (assuming that an observed correlation is due to causation), you must put any statement of relationship through sharp inspection.
Causation cannot be established “after the fact.” It can only be established through well-designed experiments (see Ch. 5).
Explaining Association
Strong associations can generally be explained by one of three relationships.
Confounding: x may cause y, but y may instead be caused by a confounding variable z
Common Response: x and y are reacting to a lurking variable z
Causation: x causes y
Causation
Causation is not easily established.
The best evidence for causation comes from experiments that change x while holding all other factors fixed.
Even when direct causation is present, it is rarely a complete explanation of an association between two variables.
Even well established causal relations may not generalize to other settings.
Common Response
“Beware the Lurking Variable”
The observed association between two variables may be due to a third variable.
Both x and y may be changing in response to changes in z.
Confounding
Two variables are confounded when their effects on a response variable cannot be distinguished from each other. Confounding prevents us from drawing conclusions about causation.
We can help reduce the chances of confounding by designing a well-controlled experiment.
Example
People with two cars tend to live longer than people who own only one car. Owning three cars is even better, and so on. What might explain the association?
p. 238
• 4.38: People who use artificial sweeteners in place of sugar tend to be heavier than people who use sugar. Does artificial sweetener use cause weight gain?
– There may be a causative effect, but in the direction opposite to the one suggested: People who are overweight are more likely to be on diets, and so choose artificial sweeteners over sugar. Also, heavier people are at a higher risk to develop diabetes; if they do, they are likely to switch to artificial sweeteners.
p. 238
4.39: Women who work in the production of computer chips have abnormally high numbers of miscarriages. The union claimed chemicals cause the miscarriages. Another explanation may be the fact that these workers spend a lot of time on their feet.
– Time standing up is a confounding variable in this case.
p. 239
4.41: Children who watch many hours of TV get lower grades on average than those who watch less TV. Why does this fact not show that watching TV causes low grades?
p. 239
4.43: High school students who take the SAT, enroll in an SAT coaching course, and take the SAT again raise their mathematics score from an average of 521 to 561. Can this increase be attributed entirely to taking the course?
The effects of coaching are confounded with those of experience. A student who has taken the SAT once may improve his or her score on the second attempt because of increased familiarity with the test.