Homework Linear Regression
Problems should be worked out in your notebook
1. Following are the mean heights of Kalama children:
Age (months) 18 19 20 21 22 23 24 25 26 27 28 29
Height (cm) 76.1 77.0 78.1 78.2 78.8 79.7 79.9 81.1 81.2 81.8 82.8 83.5
a) Sketch a scatter plot
b) Describe the pattern of the scatterplot.
There is a strong, positive, linear relationship between age and height of the Kalama children.
c) What is the correlation coefficient? Interpret in terms of the problem.
r = .994366 There is a strong, positive correlation between age and height.
d) Calculate and interpret the slope.
Slope = .634965; For every 1 month increase in age, the height increases .634965 cm, on average
e) Calculate and interpret the y-intercept.
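y-int = 64.9283 If a Kalama child were 0 months old, he would be predicted to be approximately 65 cm tall. That is on
the high side (average length at birth is about 19”-21”, or roughly 48-53 cm), which is not surprising since 0 months is far outside the 18-29 month range of the data.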
f) Write the equation of the regression line. Draw the regression line.
ŷ = 64.9283 + .634965x x = age in months; ŷ = predicted height in cm
g) Predict the height of a 32 month old child.
ŷ = 64.9283 + .634965(32) = 85.2 A 32-month-old child is predicted to be about 85.2 cm tall.
h) Make a residual plot and comment on whether a linear model is appropriate.
The residual plot shows no obvious pattern so a linear model is a good choice.
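The hand calculations in Problem 1 can be checked with a short script. This is a minimal sketch, assuming plain Python 3 with no statistics packages; the helper name `fit_line` is hypothetical and simply applies the least-squares formulas b = Sxy/Sxx, a = ȳ − b·x̄, and r = Sxy/√(Sxx·Syy).

```python
from math import sqrt

# Problem 1 data: ages 18 through 29 months and mean heights (cm)
ages = list(range(18, 30))
heights = [76.1, 77.0, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5]


def fit_line(x, y):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept: the line passes through (x-bar, y-bar)
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return a, b, r


a, b, r = fit_line(ages, heights)
print(f"y-hat = {a:.4f} + {b:.6f}x   r = {r:.6f}")
print(f"predicted height at 32 months: {a + b * 32:.1f} cm")
```

The same helper works for any of the raw-data problems below by swapping in the two lists.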
2. The average prices (in dollars) per ounce of gold and silver for the years 1986 through 1994 are
given below.
Year 1986 1987 1988 1989 1990 1991 1992 1993 1994
Gold 368 478 438 383 385 363 345 361 389
Silver 5.47 7.01 6.53 5.50 4.82 4.04 3.94 4.30 5.30
a. What is the explanatory variable? Explain.
Either could be the explanatory variable. I don’t think there is any obvious
explanatory/response situation present.
b. Find the regression line for gold predicting silver.
ŷ = −3.87734 + .023307x x = average price per oz of gold;
ŷ = predicted average price of silver per oz
c. Interpret the slope and y-intercept.
Slope = .023307; For every 1 dollar increase in average price per oz of gold, the
average price per oz of silver increases about .023 dollars (2.3 cents), on average.
y-int = -3.88 When the average price per oz of gold is 0, the predicted average price per
oz of silver is -$3.88. This is meaningless: a price cannot be negative, and a gold price of $0 is far outside the data.
d. What is the correlation coefficient? Interpret.
r = .92052 There is a strong, positive relationship between average price per oz of
gold and average price per oz of silver.
e. Find the regression line for silver predicting gold.
ŷ = 200.499 + 36.357x x = average price per oz of silver;
ŷ = predicted average price of gold per oz
f. Interpret the slope and y-intercept.
Slope = 36.357; For every 1 dollar increase in average price per oz of silver, the
average price per oz of gold increases 36.36 dollars, on average.
y-int = 200.499 When the average price per oz of silver is 0, the predicted average price per
oz of gold is $200.50.
g. What is the correlation coefficient? Interpret. Compare your answer to part ‘d’.
r = .92052 There is a strong, positive relationship between average price per oz of
silver and average price per oz of gold. Same as part ‘d’.
h. What is the coefficient of determination? Interpret.
r² = .847358 84.7% of the variation in average price per oz of gold is accounted for by
the linear model relating average price per oz of gold to average price per
oz of silver.
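Parts (b) through (g) show that swapping the roles of gold and silver changes the regression line but not r. A minimal sketch, assuming plain Python 3, that checks this and also a standard identity: the two slopes multiply to r², because one slope is Sxy/Sxx and the other is Sxy/Syy. The helper `slope_and_r` is a hypothetical name.

```python
from math import sqrt

gold = [368, 478, 438, 383, 385, 363, 345, 361, 389]
silver = [5.47, 7.01, 6.53, 5.50, 4.82, 4.04, 3.94, 4.30, 5.30]


def slope_and_r(x, y):
    """Return (slope of y on x, correlation r) by least squares (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)


b_silver_on_gold, r_gs = slope_and_r(gold, silver)  # gold predicting silver
b_gold_on_silver, r_sg = slope_and_r(silver, gold)  # silver predicting gold

print(f"slopes: {b_silver_on_gold:.6f} and {b_gold_on_silver:.3f}")
print(f"r both ways: {r_gs:.5f} and {r_sg:.5f}")
# (Sxy/Sxx) * (Sxy/Syy) = Sxy^2 / (Sxx * Syy) = r^2
print(f"product of slopes: {b_silver_on_gold * b_gold_on_silver:.6f}, r^2: {r_gs ** 2:.6f}")
```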
3. Good runners take more steps per second as they speed up. Here are the average numbers of steps
per second for a group of top female runners at different speeds. The speeds are in feet per second.
Speed (ft/s) 15.86 16.88 17.50 18.62 19.97 21.06 22.11
Steps per second 3.05 3.12 3.17 3.25 3.36 3.46 3.55
a) You want to predict steps per second from running speed. Which is the explanatory variable?
Make a scatterplot of the data with this goal in mind.
Running speed would be the explanatory variable.
b) Describe the pattern of the scatterplot.
There is a strong, positive linear relationship between running speed and steps per second.
c) What is the correlation coefficient? Interpret in terms of the problem.
r = .998988 There is a strong, positive relationship between run speed and steps per second.
d) Calculate and interpret the slope.
Slope = .080284 For every 1 ft/sec increase in running speed, the number of steps per
second increases by .080284, on average.
e) Calculate and interpret the y-intercept.
y-int = 1.76608 When the running speed is 0 ft/sec, the predicted rate is 1.76608 steps per second.
This interpretation is meaningless, since a runner moving at 0 ft/sec takes no steps.
f) Write the equation of the regression line. Draw the regression line.
ŷ = 1.76608 + .080284x x = running speed in ft/s; ŷ = predicted number of steps per second
g) If you need to cover 20 ft/s to win a race, predict the steps per second you’ll need to maintain.
ŷ = 1.76608 + .080284(20) = 3.37175 You will need to maintain about 3.37 steps per second.
h) Make a residual plot and comment on whether a linear model is appropriate.
There is an obvious curved pattern so a linear model would not be a good fit.
4. Car dealers across North America use the “Red Book” to help them determine the value of used cars
that their customers trade in when purchasing new cars. The book lists on a monthly basis the
amount paid at recent used-car auctions and indicates the values according to condition and optional
features, but does not inform the dealers as to how odometer readings affect the trade-in value. In an
experiment to determine whether the odometer reading should be included, ten 3-year-old cars of the same
make, condition, and options are randomly selected. The trade-in value (in $100) and
mileage (in 1000s of miles) are shown below.
Odometer 59 92 61 72 52 67 88 62 95 83
Trade-in 37 31 43 39 41 39 35 40 29 33
a) Describe the pattern of the scatterplot.
There is a fairly strong, negative linear relationship between odometer reading and trade-in value.
b) Find the sample regression line for determining how the odometer reading affects the trade-in
value of the car.
ŷ = 56.2047 − .266822x x = odometer reading in 1000s of miles; ŷ = predicted trade-in value in $100
c) Interpret the slope in terms of the problem.
For every 1000 mile increase in odometer reading, the trade-in value decreases by $26.68, on
average
d) Calculate and interpret the correlation coefficient.
r = -.893418 There is a fairly strong, negative relationship between odometer reading and
trade-in value
e) Calculate and interpret the coefficient of determination.
r² = .798195 79.8% of the variation in trade-in value is accounted for by the linear model
relating trade-in value to odometer reading.
f) Predict the trade-in value of a car with 60,000 miles.
ŷ = 56.2047 − .266822(60) = 40.1954 A car with 60,000 miles has a predicted trade-in value of $4019.54.
g) What would be the odometer reading of a car with a trade-in value of $4200?
42 = 56.2047 − .266822x, so x = 53.2366. A car with a trade-in value of $4200 would be predicted to have approximately 53,237 miles on
the odometer.
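Part (g) runs the fitted line backward: set ŷ equal to the target value and solve for x. A minimal sketch, assuming the rounded coefficients from part (b); `solve_for_x` is a hypothetical helper. Inverting the line of trade-in value on odometer reading is what the answer key does here; it is not the same line you would get by regressing odometer reading on trade-in value.

```python
def solve_for_x(y_target, a, b):
    """Return the x at which the fitted line a + b*x equals y_target (hypothetical helper)."""
    if b == 0:
        raise ValueError("a horizontal line cannot be solved for x")
    return (y_target - a) / b


# Problem 4 line: trade-in value ($100) = 56.2047 - 0.266822 * odometer (1000s of miles)
a, b = 56.2047, -0.266822
x = solve_for_x(42, a, b)  # $4200 is 42 in units of $100
print(f"predicted odometer reading: {x:.4f} thousand miles")
```

The same call handles the other "solve for x" parts, for example `solve_for_x(5.7, 0.364654, 0.011429)` for New Zealand in Problem 11(c).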
h) Make a residual plot and comment on whether a linear model is appropriate.
There is no obvious pattern so a linear model would be a good choice.
i) What is the residual for the car with 92,000 miles on the odometer?
ŷ = 56.2047 − .266822(92) = 31.6571
Residual = observed − predicted = 31 − 31.6571 = −.6571 (the observed trade-in value for the car with 92,000 miles is 31, i.e., $3100)
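Residuals for every car can be computed at once, which also makes it easy to read the observed value for the 92,000-mile car straight from the data. A minimal sketch, assuming the rounded line from part (b); keying the dictionary by odometer reading works here only because no two cars share a reading.

```python
odometer = [59, 92, 61, 72, 52, 67, 88, 62, 95, 83]  # 1000s of miles
trade_in = [37, 31, 43, 39, 41, 39, 35, 40, 29, 33]  # $100
a, b = 56.2047, -0.266822

# residual = observed - predicted, keyed by odometer reading
residuals = {x: y - (a + b * x) for x, y in zip(odometer, trade_in)}

for x, res in residuals.items():
    print(f"{x:>3}: {res:+.4f}")
# Least-squares residuals sum to zero (up to rounding in a and b)
print(f"sum of residuals: {sum(residuals.values()):.4f}")
```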
5. In one of the Boston city parks there has been a problem with muggings in the summer months. A
police cadet took a random sample of 10 days (out of the 90-day summer) and compiled the
following data. For each day, x represents the number of police officers on duty in the park and y
represents the number of reported muggings on that day.
Officers on duty (x) 10 15 16 1 4 6 18 12 14 7
Reported muggings (y) 5 2 1 9 7 8 1 5 3 6
a) Sketch a scatter plot. Describe the pattern of the scatterplot.
There is a strong, negative, linear relationship between number of police officers on duty in the
park and number of muggings
b) What is the regression line?
ŷ = 9.7798 − .493184x x = # of officers on duty in the park; ŷ = predicted # of muggings
c) What is the correlation coefficient? Interpret in terms of the problem.
r = -.9691 There is a strong, negative relationship between number of officers on duty in the
park and the number of muggings.
d) Interpret the slope in terms of the problem.
Slope = -.493184 For every 1 officer increase in the number of officers on duty, the number
of muggings decreases by .493184, on average.
e) Find the coefficient of determination and interpret in terms of the problem.
r² = .939113 93.91% of the variation in the number of muggings is accounted for by the linear
model relating number of muggings to the number of police officers on duty in the park.
f) Predict the number of muggings if there are 9 police officers on duty.
ŷ = 9.7798 − .493184(9) = 5.34114 It is predicted that approximately 5
muggings will take place when there are 9 officers on duty in the park.
6. Each of the following statements contains a blunder. Explain in each case what is wrong.
a. “There is a high correlation between the gender of American workers and their income”
Gender is categorical, not quantitative.
b. “We found a high correlation (r = 1.09) between students’ ratings of faculty teaching and
ratings made by other faculty members.”
Correlation cannot be greater than 1; r always lies between −1 and 1.
c. “The correlation between planting rate and yield of corn was found to be r = .23 bushel.”
Correlation does not have units (no r = .23 bushel)
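Parts (b) and (c) rest on two properties of r: it always lies between −1 and 1, and it has no units, so rescaling either variable leaves it unchanged. A minimal sketch, assuming plain Python 3, that illustrates both with the running data from Problem 3 by converting speed from ft/s to m/s (1 ft = 0.3048 m); the helper `correlation` is a hypothetical name.

```python
from math import sqrt

speed_ft_s = [15.86, 16.88, 17.50, 18.62, 19.97, 21.06, 22.11]
steps_per_s = [3.05, 3.12, 3.17, 3.25, 3.36, 3.46, 3.55]


def correlation(x, y):
    """Pearson correlation r (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


speed_m_s = [v * 0.3048 for v in speed_ft_s]  # same speeds, different units

r_feet = correlation(speed_ft_s, steps_per_s)
r_meters = correlation(speed_m_s, steps_per_s)
print(f"r (ft/s) = {r_feet:.6f}, r (m/s) = {r_meters:.6f}")
```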
7. Foal weight at birth is an indicator of health, so it is of interest to breeders of thoroughbred horses.
Is foal weight related to the weight of the mare? The accompanying data are from the article
“Suckling Behavior Does Not Measure Milk Intake in Horses” (Animal Behavior [1999]).
Observation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Mare weight(kg) 556 638 588 550 580 642 568 642 556 616 549 504 515 551 594
Foal weight(kg) 129 119 132 123.5 112 113.5 95 104 104 93.5 108.5 95 117.5 128 127.5
a) Describe the pattern of the scatterplot.
There appears to be no linear relationship between mare weight and foal weight so it does not
make sense to do any linear regression analysis.
b) Find the equation of the regression line.
c) Interpret the slope in terms of the problem.
d) Interpret the y-intercept in terms of the problem.
e) Calculate and interpret the correlation coefficient.
f) Calculate and interpret the coefficient of determination.
8. The scatterplot shows the advertised prices (in thousands of dollars) plotted against ages (in years)
for a random sample of Plymouth Voyagers on several dealers’ lots.
A computer printout showing the results of fitting a straight
line to the data by the method of least squares gives: Price = 12.37 – 1.13 Age
R-sq = 75.5%
a) Find the correlation coefficient for the relationship
between price and age of Voyagers based on these
data.
r = −√.755 = −.868907 (r is negative because the slope, −1.13, is negative)
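Part (a) recovers r from R-sq. Since squaring loses the sign, r takes the sign of the slope. A minimal sketch in Python using `math.copysign`; `r_from_r_squared` is a hypothetical helper name.

```python
from math import copysign, sqrt


def r_from_r_squared(r_squared, slope):
    """Return r given r^2 and the slope of the least-squares line (hypothetical helper)."""
    return copysign(sqrt(r_squared), slope)


print(f"Problem 8: r = {r_from_r_squared(0.755, -1.13):.6f}")
print(f"Problem 15: r = {r_from_r_squared(0.995, 6.3661):.6f}")
```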
b) What is the slope of the regression line? Interpret
it in the context of these data.
Slope = -1.13 For every 1 year increase in car
age, the predicted advertised price decreases by $1,130, on
average.
c) How will the size of the correlation coefficient
change if the 10-year-old Voyager is removed
from the data set? Explain.
The correlation coefficient should get closer to -1;
it should get stronger.
d) How will the slope of the LSRL change if the 10-
year-old Voyager is removed from the data?
The slope should get steeper.
9. One measure of the success of knee surgery is postsurgical range of motion for the knee joint.
Postsurgical range of motion was recorded for 12 patients who had surgery following a knee
dislocation. The age of each patient was also recorded (“Reconstruction…” American Journal of
Sports Medicine). The average age was 25.83 years with a standard deviation of 7.578 years. The
average range of motion was 130.1 degrees with a standard deviation of 11.927 degrees. The
correlation coefficient was r = .5534.
a) If we use age to try and predict the range of motion, what is the slope? What is the y-intercept?
Interpret the two in context of the problem.
$b = r\left(\frac{s_y}{s_x}\right) = .5534\left(\frac{11.927}{7.578}\right) = .870995$
$a = \bar{y} - b\bar{x} = 130.1 - .870995(25.83) = 107.602$
Slope = .870995 For every 1 year increase in age, the range of motion increases by .870995
degrees, on average
y-intercept = 107.602 For someone just born we would expect them to have 107.6 degrees range
of motion. However, babies would not have reconstructive knee surgery.
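When only summary statistics are given, as in Problems 9, 11, and 12, the line comes from b = r(s_y/s_x) and a = ȳ − b·x̄. A minimal sketch, assuming plain Python 3; `line_from_summary` is a hypothetical helper, and its argument order is just a choice made for this example.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b*x from means, standard deviations, and r (hypothetical helper)."""
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x-bar, y-bar)
    return a, b


# Problem 9: x = age (years), y = range of motion (degrees)
a, b = line_from_summary(x_bar=25.83, s_x=7.578, y_bar=130.1, s_y=11.927, r=0.5534)
print(f"y-hat = {a:.3f} + {b:.6f}x")
```

Problems 11 and 12 are the same call with their own summary statistics; small differences in the last digits can come from rounding b before computing a.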
b) Use the regression line to predict the range of motion of someone 32 years of age.
ŷ = 107.602 + .870995(32) = 135.474 A 32-year-old would be predicted to have about a 135 degree range of motion.
c) Use the regression line to predict the range of motion of someone 50 years of age. Do you feel
this is an accurate prediction? Explain your thoughts.
ŷ = 107.602 + .870995(50) = 151.152 A 50-year-old is about 3.2 standard deviations above the mean age ((50 − 25.83)/7.578 ≈ 3.19), which would make this borderline
extrapolation. 150 degrees is a very good range of motion after surgery.
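Part (c) judges extrapolation by counting how many standard deviations the new age lies from the mean age. A minimal sketch of that z-score check, using the summary statistics from the problem; `z_score` is a hypothetical helper, and the z-score is only a rough gauge because the actual ages of the 12 patients are not given.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean (hypothetical helper)."""
    return (x - mean) / sd


mean_age, sd_age = 25.83, 7.578  # Problem 9 summary statistics
for age in (32, 50):
    print(f"age {age}: z = {z_score(age, mean_age, sd_age):.2f}")
```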
[Figure for Problem 8: Plymouth Voyagers Scatter Plot, Price_1000 (price in $1000) vs. Age_in_years]
10. Newsweek gave the following 1994 average weekly earnings from allowances, chores, work, and
gifts for children of ages 4 through 12.
Age 4 5 6 7 8 9 10 11 12
Earnings $5.87 $7.42 $7.62 $10.63 $10.65 $10.69 $12.01 $13.79 $20.19
a. Construct a scatter plot. Describe the pattern of the scatterplot.
There is a strong, positive, linear relationship between age and earnings. The point (12,
20.19) could be an outlier.
b. Interpret the slope in terms of the problem.
Slope = 1.4205 For every 1 year increase in age, the earnings increase by 1.42 dollars, on
average.
c. Find the coefficient of determination and interpret in terms of the problem.
r² = .839758 83.98% of the variation in earnings is accounted for by the linear model
relating earnings to age.
d. Find the correlation coefficient and interpret in terms of the problem.
r = .916383 There is a strong, positive relationship between age and earnings.
e. Predict the weekly earnings of a child who is age 16. Do you think this is a good prediction?
Explain.
ŷ = −.378444 + 1.4205(16) = 22.3496 I don’t think the prediction of $22.35 weekly earnings for a 16-year-old is accurate because
many 16-year-olds have a job and will make more money. The x-value of 16 is extrapolation
for the data we have, and extrapolation is not always accurate.
11. The paper “A Cross-National Relationship between Sugar Consumption and Major Depression?”
(Depression and Anxiety [2002]) concluded that there was a strong correlation (r = .9444) between
refined sugar consumption (calories per person per day) and annual rate of major depression (cases
per 100 people) based on data from 6 countries. The average sugar consumption was 340.83 calories
per person per day with a standard deviation of 110.56 calories while the annual rate of depression
was 4.26 cases with a standard deviation of 1.338 cases.
a) What is the slope of the regression line of annual rate of depression based on sugar consumption?
What is the y-intercept? Interpret the two in context of the problem.
$b = r\left(\frac{s_y}{s_x}\right) = .9444\left(\frac{1.338}{110.56}\right) = .011429$
$a = \bar{y} - b\bar{x} = 4.26 - .011429(340.83) = .364654$
Slope = .011429 For every 1 calorie increase in sugar consumption, the rate of depression
increases .011429 cases per 100 people, on average
y-intercept = .364654 When sugar consumption is 0 calories per person per day there would be
.36 cases of depression per 100 people.
b) Use the regression line to predict the depression rate of the United States if the average person
consumes 300 calories per person per day.
ŷ = .364654 + .011429(300) = 3.79335 If the average person consumes 300 calories of sugar per day, we could predict 3.79 cases of
depression per 100 people.
c) New Zealand’s depression rate is 5.7 annual cases per 100 people. Use the model to find the
possible sugar consumption. Does the regression line allow us to make this prediction? Explain.
5.7 = .364654 + .011429x, so x = 466.825. The possible sugar consumption is 466.825 calories
per person per day. Our regression line allowed us to make this prediction. The 5.7 cases of
depression per 100 people is within 2 standard deviations of the mean (about 1.1 standard deviations above 4.26).
12. How quickly can athletes return to their sport following injuries requiring surgery? The paper
“Arthroscopic Distal Clavicle Resection for Isolated Atraumatic Osteolysis in Weight Lifters”
(American Journal of Sports Medicine, 1998) found a moderate positive (r = .55)
linear relationship between a lifter’s age and the number of days after arthroscopic shoulder surgery
before being able to return to their sport for 10 weight lifters. The average age of the weight
lifters was 30.4 with standard deviation of 2.875 years. The average number of days before being
able to return to their sport was 3.2 days with a standard deviation of 1.398 days.
a. Determine the line to predict the number of days based on the age of the weight lifter.
$b = r\left(\frac{s_y}{s_x}\right) = .55\left(\frac{1.398}{2.875}\right) = .267443$
$a = \bar{y} - b\bar{x} = 3.2 - .267443(30.4) = -4.93027$
ŷ = −4.93027 + .267443x x = age of weight lifter; ŷ = predicted days to return
b. Determine the coefficient of determination and interpret in terms of the problem.
r = .55, so r² = (.55)² = .3025 30.25% of the variation in number of days to return is
accounted for by the linear model relating number of days to return to age of weight lifter.
c. Given the spread of the lifters was from 26 to 34 years old, predict the number of days for a
28 year old lifter. Do you feel this prediction is accurate? Explain.
ŷ = −4.93027 + .267443(28) = 2.55813
The predicted number of days for a 28 year old weight lifter to return is 2.55813 days. The
prediction should be accurate because the age of 28 was within the interval of given ages.
13. Success in hunting varies greatly among species of animals. Lions, who hunt singly, are rarely
successful in more than 10 percent of their hunts. Wild African dogs, who hunt in packs, are among
the most efficient of all hunters, succeeding at a rate of over 90 percent of their hunts.
In the early 1960s, researcher Jane Goodall discovered that chimpanzees were not solely vegetarian
in their diets, as had previously been thought. This discovery spurred a tremendous amount of
primate research. Some of the latest primatology research has been done on chimpanzees to find out
if larger hunting parties increase the chances of a successful hunt. The results of one such research
project are summarized in the table for the number of chimpanzees in the hunting party versus the
percentage of successful hunts.
Number of Chimps 1 2 3 4 5 6 7 8 9 10 12 13 14 15 16
Percent of Success 20 30 28 42 40 58 45 62 65 63 75 75 78 75 82
a. Construct a scatter plot.
b. Determine the regression line.
ŷ = 22.7 + 3.98x x = number of chimps in the hunting party; ŷ = predicted success %
c. Interpret the y-intercept. Does the interpretation make sense in this context?
y-intercept = 22.7 If there are 0 chimps in the hunting party, the predicted success rate is 22.7%.
This can’t happen; there can be no success if 0 chimps are hunting.
d. Interpret the slope.
Slope = 3.98 For every 1 chimp increase in the hunting party, their success percentage
increases 3.98 percentage points, on average.
e. Find the correlation coefficient and interpret in terms of the problem.
r = .958961 There is a strong, positive relationship between number of chimps in the
hunting party and percent of success.
f. Find the coefficient of determination and interpret in terms of the problem.
r² = .919606 91.96% of the variation in percent of success is accounted for by the linear
model relating percent of success to number of chimps in the hunting party.
g. Sketch the residual plot. Interpret in terms of the problem.
There appears to be a slight curve to the pattern so perhaps a linear model is not the best
choice.
14. The following is a table of the number of registered automatic weapons (in thousands) of selected
states and their corresponding murder rates.
Weapons (1000s) 11.6 8.3 3.6 0.6 6.9 2.5 2.4 2.6
Murder rate 13.1 10.6 10.1 4.4 11.5 6.6 3.6 5.3
a. Determine the regression line.
ŷ = 4.04725 + .852519x x = number of registered weapons (in 1000s)
ŷ = predicted murder rate
b. Predict the number of weapons for a state with a murder rate of 8.5.
8.5 = 4.04725 + .852519x, so x = 5.22305
The predicted number of registered weapons is about 5,223.
c. Predict the murder rate for a state with 10,000 registered automatic weapons.
ŷ = 4.04725 + .852519(10) = 12.5724 The predicted murder rate for a state with 10,000 registered automatic weapons is about 12.6.
15. The following output data from MINITAB shows the height of girls (in cm) based on their
age in years.
Predictor Coef Stdev t-ratio p
Constant 76.61 1.188 64.52 0.000
Age(yrs) 6.3661 0.1672 38.02 0.000
s=1.518 R-sq=99.5%
a) What is the equation of the least squares line? Interpret the slope.
ŷ = 76.61 + 6.3661x x = age in years; ŷ = predicted height in cm
Slope = 6.3661 For every 1 year increase in age, the height increases 6.3661 cm, on
average.
b) Find the correlation coefficient and coefficient of determination. Interpret in the context of the
problem.
r = .997497 There is a strong, positive relationship between age and height.
r² = .995 99.5% of the variation in height is accounted for by the linear model relating
height to age.
c) Predict the height of a 3 year old girl.
ŷ = 76.61 + 6.3661(3) = 95.7083 The predicted height of a 3-year-old girl is 95.7 cm.
d) Predict the age if a girl is 135 cm.
135 = 76.61 + 6.3661x, so x = 9.17202. The predicted age of a girl who is 135 cm tall is about 9 years old.
16. Women made significant gains in the 1970s in terms of their acceptance into professions that had
been traditionally populated by men. To measure just how big these gains were, we will compare
the percentage of professional degrees awarded to women in 1973-1974 to the percentage awarded in
1978-1979 for selected fields of study.
Field Degrees in 73-74 Degrees in 78-79
Dentistry 2.0% 11.9%
Law 11.5 28.5
Medicine 11.2 23.1
Optometry 4.2 13.0
Osteopathic medicine 2.8 15.7
Podiatry 1.1 7.2
Theology 5.5 13.1
Veterinary medicine 11.2 28.9
a) What is the regression line?
ŷ = 7.00687 + 1.72414x x = % of degrees in 73-74; ŷ = predicted % of degrees in 78-79
b) Interpret the slope in terms of the problem.
Slope = 1.72414 For every 1 percentage point increase in the % of degrees earned by women in 73-74, the % of degrees
earned in 78-79 increases by 1.72414 percentage points, on average.
c) Find the coefficient of determination and interpret in terms of the problem.
r² = .885862 88.6% of the variation in % of degrees earned in 78-79 is accounted for by the
linear model relating % of degrees earned in 78-79 to % of degrees earned in 73-74.
d) Sketch the residual plot. Interpret.
There appears to be a pattern in the residual plot so a linear model may not be the best choice.
e) Find the residual for optometry.
ŷ = 7.00687 + 1.72414(4.2) = 14.2483 Residual = observed − predicted = 13 − 14.2483 = −1.2483
f) Find the residual for veterinary medicine. Did the regression line over or under predict?
Explain.
ŷ = 7.00687 + 1.72414(11.2) = 26.3173 Residual = observed − predicted = 28.9 − 26.3173 = 2.5827
When the residual is positive, the LSRL underpredicts; when the residual is negative, the LSRL
overpredicts. In this case the residual is positive, so the LSRL predicted less than the actual amount (it underpredicted).
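The sign rule in part (f) can be checked mechanically: residual = observed − predicted, so a positive residual means the line predicted too low. A minimal sketch, assuming the rounded line from part (a); the helper name `describe_residual` is hypothetical.

```python
def describe_residual(observed, predicted):
    """Return the residual and whether the line over- or underpredicted (hypothetical helper)."""
    residual = observed - predicted
    if residual > 0:
        verdict = "underpredicts"  # actual value is above the line
    elif residual < 0:
        verdict = "overpredicts"   # actual value is below the line
    else:
        verdict = "exact"
    return residual, verdict


a, b = 7.00687, 1.72414  # Problem 16 line
for field, x, y in [("Optometry", 4.2, 13.0), ("Veterinary medicine", 11.2, 28.9)]:
    residual, verdict = describe_residual(y, a + b * x)
    print(f"{field}: residual = {residual:+.4f}, LSRL {verdict}")
```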
17. Shells of mollusks function as both part of the skeletal system and as protective armor. It has been
argued that many features of these shells were the result of natural selection in the constant battle
against predators. The paper “Postmortem Changes in Strength of Gastropod Shells” included a
scatter plot of data on x = shell height (cm) and y = breaking strength (newtons). The least squares
line for a sample of 38 hermit crab shells was ŷ = −275.1 + 244.9x.
a. What are the slope and intercept of this line?
Slope = 244.9 y-intercept = -275.1
b. When shell height increases by 1 cm, by how much does breaking strength tend to change?
Breaking strength tends to increase 244.9 newtons, on average.
c. What breaking strength would you predict when shell height is 2 cm?
ŷ = −275.1 + 244.9(2) = 214.7 We would predict a breaking strength of 214.7 newtons when the shell height is 2 cm.
d. Does this approximate linear relationship appear to hold for shell heights as small as 1 cm?
Explain your thoughts.
ŷ = −275.1 + 244.9(1) = −30.2 When the shell height is 1 cm, the predicted breaking strength is -30.2 newtons. I don’t
believe the linear model holds for shell heights as small as 1 cm because breaking strengths
should not be negative.
18. Given the following data sets, find the regression line. Sketch the residual plot and comment on the
likelihood of the regression line being a good model.
First data set:
x 2 3 4 5 6 7 8 9
y 86 96 103 110 115 120 130 131
Second data set:
x 3 6 8 9 11 14 18 20
y 19 22 39 50 75 87 96 125
For the first data set (graph on the left), there appears to be a curved pattern in the residual plot so a
linear model may not be the best choice.
For the second data set (graph on the right), there appears to be a sine wave pattern so a linear model
may not be the best choice.
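The answer key does not list the two regression lines for Problem 18. A minimal sketch, assuming plain Python 3, that fits each data set by least squares and prints the residuals in x order, so the patterns described above can be checked by reading the signs; `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


data_sets = {
    "first": ([2, 3, 4, 5, 6, 7, 8, 9], [86, 96, 103, 110, 115, 120, 130, 131]),
    "second": ([3, 6, 8, 9, 11, 14, 18, 20], [19, 22, 39, 50, 75, 87, 96, 125]),
}

results = {}
for name, (x, y) in data_sets.items():
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    results[name] = (a, b, residuals)
    print(f"{name}: y-hat = {a:.3f} + {b:.3f}x")
    print("  residuals:", [round(res, 2) for res in residuals])
```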
19. The data come from a study of ice cream consumption that spanned the springs and summers of
three years. The ice cream consumption (pints per capita per year), family income of consumers
($1000 per year) and the temperature (degrees Fahrenheit) are listed below.
Consumption 20.07 19.45 20.44 22.1 21.11 17.89 17.00 14.98 13.99 13.31
Income 18.25 13.31 13.98 18.72 17.78 18.25 19.18 18.51 17.78 18.51
Temperature 41 56 63 68 69 65 61 47 32 24
a. Complete two scatter plots with consumption being the response variable for each plot.
b. Find the two regression lines.
ŷ = 26.3401 − .476616x x = income in $1000 per year; ŷ = predicted consumption
ŷ = 10.0151 + .152452x x = temperature in degrees Fahrenheit; ŷ = predicted consumption
c. Interpret the slopes.
For every $1000 increase in family income, the consumption decreases by .476616 pints, on
average
For every 1 degree increase in temperature, the consumption increases by .152452 pints, on
average.
d. Interpret the coefficients of determination.
r² = .097971 9.8% of the variation in consumption of ice cream is accounted for by the
linear model relating consumption to income.
r² = .603245 60.3% of the variation in consumption of ice cream is accounted for by the
linear model relating consumption to temperature.
e. Sketch and interpret both residual plots.
Income: There are two distinct clusters of data, so a linear model may not be a good choice.
Temperature: There is no obvious pattern, so a linear model is a good choice.
f. Which do you think is the better predictor of consumption? Explain.
It appears that temperature may be a better predictor of consumption. There is a strong,
positive linear pattern in the scatterplot, and the coefficient of determination is much
higher (.603 versus .098).
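Part (f) compares the two predictors by their coefficients of determination. A minimal sketch, assuming plain Python 3 and the ten observations listed for Problem 19; `r_squared` is a hypothetical helper.

```python
def r_squared(x, y):
    """Coefficient of determination for the least-squares line of y on x (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


consumption = [20.07, 19.45, 20.44, 22.1, 21.11, 17.89, 17.00, 14.98, 13.99, 13.31]
income = [18.25, 13.31, 13.98, 18.72, 17.78, 18.25, 19.18, 18.51, 17.78, 18.51]
temperature = [41, 56, 63, 68, 69, 65, 61, 47, 32, 24]

for name, predictor in (("income", income), ("temperature", temperature)):
    print(f"{name}: r^2 = {r_squared(predictor, consumption):.3f}")
```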
g. Predict the consumption for a temperature of 53 degrees.
ŷ = 10.0151 + .152452(53) = 18.095
We would predict 18.1 pints per capita per year for a temperature of 53 degrees Fahrenheit.
h. Predict the consumption for an income of $17,500.
ŷ = 26.3401 − .476616(17.5) = 17.9993 We would predict 18 pints per capita per year for an income of $17,500.
i. Predict the income and temperature for 3 gallons a year.
**8 pints per gallon; use 24 pints for 3 gallons
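24 = 10.0151 + .152452x, so x = 91.73
We would predict that 3 gallons will be consumed when it is 91.73 degrees Fahrenheit. This is extrapolation, since the observed temperatures range from 24 to 69 degrees.
24 = 26.3401 − .476616x, so x = 4.90982
We would predict that 3 gallons will be consumed when the family income is $4,909.82. This is also extrapolation, since the observed incomes range from $13,310 to $19,180.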
20. People with diabetes measure their fasting plasma glucose (FPG; measured in units of milligrams per
milliliter) after fasting for at least 8 hours. Another measurement, made at regular medical checkups,
is called HbA. This is roughly the percent of red blood cells that have a glucose molecule attached. It
measures average exposure to glucose over a period of several months. The table below gives data
on both HbA and FPG for 18 diabetics five months after they had completed a diabetes education
class.
HbA FPG HbA FPG
Subject (%) (mg/mL) Subject (%) (mg/mL)
1 6.1 141 10 8.7 172
2 6.3 158 11 9.4 200
3 6.4 112 12 10.4 271
4 6.8 153 13 10.6 103
5 7.0 134 14 10.7 172
6 7.1 95 15 10.7 359
7 7.5 96 16 11.2 145
8 7.7 78 17 13.7 147
9 7.9 148 18 19.3 255
a) Sketch a scatter plot. Describe the scatterplot.
There is a weak, positive linear relationship between HbA and FPG.
Subject 15 is an outlier in the y direction. Subject 18 is an outlier in the x direction.
b) Find the correlation and the regression line for all 18 subjects
r = .481902 ŷ = 66.4285 + 10.4077x
c) Find the correlation and the regression line when only subject 15 is removed.
r = .568397 ŷ = 69.4872 + 8.92039x
d) Find the correlation and the regression line when only subject 18 is removed.
r = .383701 ŷ = 52.2615 + 12.1158x
e) Are either or both of these points influential for the correlation? Explain why r changes in
opposite directions when we remove each of these points.
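They both appear to be influential for the correlation. Removing subject 15 makes the correlation
stronger (closer to 1); removing subject 18 makes the correlation weaker (closer to 0). Subject 15 lies far above the
line (a very high FPG at an ordinary HbA), so it weakens the linear pattern and removing it raises r. Subject 18 lies
far to the right but close to the line, so it extends the positive pattern and removing it lowers r.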
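Parts (b) through (d) can be reproduced by refitting with one subject left out at a time. A minimal sketch, assuming plain Python 3 and the 18 subjects in table order; `fit` and `without` are hypothetical helpers.

```python
from math import sqrt

hba = [6.1, 6.3, 6.4, 6.8, 7.0, 7.1, 7.5, 7.7, 7.9,
       8.7, 9.4, 10.4, 10.6, 10.7, 10.7, 11.2, 13.7, 19.3]
fpg = [141, 158, 112, 153, 134, 95, 96, 78, 148,
       172, 200, 271, 103, 172, 359, 145, 147, 255]


def fit(x, y):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / sqrt(sxx * syy)


def without(values, subject):
    """Drop the given 1-based subject number from a list (hypothetical helper)."""
    return values[:subject - 1] + values[subject:]


fits = {
    "all 18": fit(hba, fpg),
    "without 15": fit(without(hba, 15), without(fpg, 15)),
    "without 18": fit(without(hba, 18), without(fpg, 18)),
}
for label, (a, b, r) in fits.items():
    print(f"{label}: r = {r:.4f}, y-hat = {a:.4f} + {b:.4f}x")
```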
f) Is either Subject 15 or Subject 18 strongly influential for the least-squares line?
They both appear to be about equally influential on the LSRL, but I don’t think either is strongly
influential. A change in the slope of approximately 1.5-1.7 mg/mL of FPG per percentage point of HbA doesn’t seem like much
when the levels of FPG are in the 100s.