Homework Linear Regression Problems should be...


Page 1: Homework Linear Regression Problems should be …staff.katyisd.org/sites/0410576/PublishingImages/Pages/documents/HW...Homework Linear Regression Problems should be worked out in your

Homework Linear Regression

Problems should be worked out in your notebook

1. Following are the mean heights of Kalama children:

Age (months) 18 19 20 21 22 23 24 25 26 27 28 29

Height (cm) 76.1 77.0 78.1 78.2 78.8 79.7 79.9 81.1 81.2 81.8 82.8 83.5

a) Sketch a scatter plot

b) Describe the pattern of the scatterplot.

There is a strong, positive, linear relationship between age and height of the Kalama children.

c) What is the correlation coefficient? Interpret in terms of the problem.

r = .994366 There is a strong, positive correlation between age and height.

d) Calculate and interpret the slope.

Slope = .634965; For every 1 month increase in age, the height increases .634965 cm, on average

e) Calculate and interpret the y-intercept.

y-int = 64.9283 If a Kalama child were 0 months old, the predicted height would be approximately 65 cm. That is on

the high side (average length at birth is 19”-21”, or about 48-53 cm), but not unreasonable.

f) Write the equation of the regression line. Draw the regression line.

ŷ = 64.9283 + .634965x x = age in months; ŷ = predicted height in cm

g) Predict the height of a 32 month old child.

ŷ = 64.9283 + .634965(32) = 85.2 The predicted height at 32 months is 85.2 cm. Age 32 months is outside the observed range of 18 to 29 months, so this is an extrapolation.

h) Make a residual plot and comment on whether a linear model is appropriate.

The residual plot shows no obvious pattern so a linear model is a good choice.
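As a check on parts (c) through (g), the slope, intercept, and correlation can be recomputed from the table with the standard least-squares formulas. This is a minimal sketch in plain Python; the helper name `fit_line` is illustrative and not part of the original key.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r

# Kalama data from the table: ages 18..29 months, mean heights in cm.
ages = list(range(18, 30))
heights = [76.1, 77.0, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5]

intercept, slope, r = fit_line(ages, heights)
predicted_32 = intercept + slope * 32  # extrapolation beyond 29 months
print(round(intercept, 4), round(slope, 6), round(r, 6), round(predicted_32, 1))
```

The printed values should agree with the key to the digits shown there, and the prediction is in centimeters because the heights are.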


2. The average prices (in dollars) per ounce of gold and silver for the years 1986 through 1994 are

given below.

Year 1986 1987 1988 1989 1990 1991 1992 1993 1994

Gold 368 478 438 383 385 363 345 361 389

Silver 5.47 7.01 6.53 5.50 4.82 4.04 3.94 4.30 5.30

a. What is the explanatory variable? Explain.

Either could be the explanatory variable. I don’t think there is any obvious

explanatory/response situation present.

b. Find the regression line for gold predicting silver.

ŷ = −3.87734 + .023307x x = average price per oz of gold;

ŷ = predicted average price of silver per oz

c. Interpret the slope and y-intercept.

Slope = .023307; For every 1 dollar increase in average price per oz of gold, the

average price per oz of silver increases .02 dollars, on average

y-int = -3.88 When the average price per oz of gold is 0, the average price per

oz of silver is -3.88. This is meaningless.

d. What is the correlation coefficient? Interpret.

r = .92052 There is a strong, positive relationship between average price per oz of

gold and average price per oz of silver.

e. Find the regression line for silver predicting gold.

ŷ = 200.499 + 36.357x x = average price per oz of silver;

ŷ = predicted average price of gold per oz

f. Interpret the slope and y-intercept.

Slope = 36.357; For every 1 dollar increase in average price per oz of silver, the

average price per oz of gold increases 36.36 dollars, on average

y-int = 200.499 When the average price per oz of silver is 0, the predicted average price per

oz of gold is $200.50.

g. What is the correlation coefficient? Interpret. Compare your answer to part ‘d’.

r = .92052 There is a strong, positive relationship between average price per oz of

silver and average price per oz of gold. Same as part ‘d’.

h. What is the coefficient of determination? Interpret.

r² = .847358 84.7% of the variation in average price per oz of gold is accounted for by

the linear model relating average price per oz of gold to average price per

oz of silver.
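Parts (b) through (h) can be tied together numerically: the two regression lines have different slopes, but the correlation is the same in either direction, and the product of the two slopes equals r². A minimal sketch in plain Python follows; the helper `slope_and_r` is an illustrative name, not something from the original key.

```python
from math import sqrt

def slope_and_r(xs, ys):
    """Return (slope, r) for the least-squares regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Average prices per ounce, 1986 through 1994, from the table.
gold = [368, 478, 438, 383, 385, 363, 345, 361, 389]
silver = [5.47, 7.01, 6.53, 5.50, 4.82, 4.04, 3.94, 4.30, 5.30]

b_gold_to_silver, r_gs = slope_and_r(gold, silver)   # gold predicting silver
b_silver_to_gold, r_sg = slope_and_r(silver, gold)   # silver predicting gold

print(round(b_gold_to_silver, 6), round(b_silver_to_gold, 3))
print(round(r_gs, 5), round(r_sg, 5))
print(round(b_gold_to_silver * b_silver_to_gold, 6), round(r_gs ** 2, 6))
```

Swapping the roles of the variables changes the line but not r, which is why parts (d) and (g) give the same value.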


3. Good runners take more steps per second as they speed up. Here are the average numbers of steps

per second for a group of top female runners at different speeds. The speeds are in feet per second.

Speed (ft/s) 15.86 16.88 17.50 18.62 19.97 21.06 22.11

Steps per second 3.05 3.12 3.17 3.25 3.36 3.46 3.55

a) You want to predict steps per second from running speed. Which is the explanatory variable?

Make a scatterplot of the data with this goal in mind.

Running speed would be the explanatory variable.

b) Describe the pattern of the scatterplot.

There is a strong, positive linear relationship between running speed and steps per second.

c) What is the correlation coefficient? Interpret in terms of the problem.

r = .998988 There is a strong, positive relationship between running speed and steps per second.

d) Calculate and interpret the slope.

Slope = .080284 For every 1 ft/sec increase in running speed, the number of steps per

second increases by .080284 steps, on average.

e) Calculate and interpret the y-intercept.

y-int = 1.76608 When the running speed is 0 ft/sec, there are 1.76608 steps per second.

This interpretation is meaningless.

f) Write the equation of the regression line. Draw the regression line.

ŷ = 1.76608 + .080284x x = running speed (ft/s); ŷ = predicted number of steps per second

g) If you need to cover 20 ft/s to win a race, predict the steps per second you’ll need to maintain.

ŷ = 1.76608 + .080284(20) = 3.37175 You will need to maintain 3.37175 steps per second

h) Make a residual plot and comment on whether a linear model is appropriate.

There is an obvious curved pattern so a linear model would not be a good fit.
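The curved pattern mentioned in part (h) can be seen by listing the residuals. This sketch uses the coefficients reported in part (f) rather than refitting the line; the variable names are illustrative.

```python
# Data from the table: speed in ft/s and steps per second.
speeds = [15.86, 16.88, 17.50, 18.62, 19.97, 21.06, 22.11]
steps = [3.05, 3.12, 3.17, 3.25, 3.36, 3.46, 3.55]

# Coefficients as reported in part (f) of the key.
intercept, slope = 1.76608, 0.080284

# Residual = observed minus predicted, one per data point.
residuals = [y - (intercept + slope * x) for x, y in zip(speeds, steps)]

for x, e in zip(speeds, residuals):
    print(f"{x:6.2f}  {e:+.4f}")
```

The residuals are positive at the slowest and fastest speeds and negative in between, which is the curved shape that argues against a purely linear model despite the very high correlation.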


4. Car dealers across North America use the “Red Book” to help them determine the value of used cars

that their customers trade in when purchasing new cars. The book lists on a monthly basis the

amount paid at recent used-car auctions and indicates the values according to condition and optional

features, but does not inform the dealers as to how odometer readings affect the trade-in value. In an

experiment to determine whether the odometer reading should be included, ten 3-year-old cars are

randomly selected of the same make, condition, and options. The trade-in value (in $100) and

mileage (in 1000s of miles) are shown below.

Odometer 59 92 61 72 52 67 88 62 95 83

Trade-in 37 31 43 39 41 39 35 40 29 33

a) Describe the pattern of the scatterplot.

There is a fairly strong, negative linear relationship between odometer reading and trade-in value.

b) Find the sample regression line for determining how the odometer reading affects the trade-in

value of the car.

ŷ = 56.2047 − .266822x x = odometer reading in 1000s; ŷ = predicted trade-in value in $100

c) Interpret the slope in terms of the problem.

For every 1000 mile increase in odometer reading, the trade-in value decreases by $26.68, on

average

d) Calculate and interpret the correlation coefficient.

r = -.893418 There is a fairly strong, negative relationship between odometer reading and

trade-in value

e) Calculate and interpret the coefficient of determination.

r² = .798195 79.8% of the variation in trade-in value is accounted for by the linear model

relating trade-in value to odometer reading

f) Predict the trade-in value of a car with 60,000 miles.

ŷ = 56.2047 − .266822(60) = 40.1954 A car with 60,000 miles has a predicted trade-in value of $4019.54.

g) What would be the odometer reading of a car with a trade-in value of $4200?

42 = 56.2047 − .266822x x = 53.2366 A car with a trade-in value of $4200 would be predicted to have approximately 53,237 miles on

the odometer.

h) Make a residual plot and comment on whether a linear model is appropriate.

There is no obvious pattern so a linear model would be a good choice.

i) What is the residual for the car with 92,000 miles on the odometer?

ŷ = 56.2047 − .266822(92) = 31.6571

Residual = observed – predicted = 31 – 31.6571 = −.6571 (the table lists a trade-in value of 31 for the car with 92,000 miles)


5. In one of the Boston city parks there has been a problem with muggings in the summer months. A

police cadet took a random sample of 10 days (out of the 90-day summer) and compiled the

following data. For each day, x represents the number of police officers on duty in the park and y

represents the number of reported muggings on that day.

x 10 15 16 1 4 6 18 12 14 7

y 5 2 1 9 7 8 1 5 3 6

a) Sketch a scatter plot. Describe the pattern of the scatterplot.

There is a strong, negative, linear relationship between number of police officers on duty in the

park and number of muggings

b) What is the regression line?

ŷ = 9.7798 − .493184x x = # of officers on duty in the park; ŷ = predicted # of muggings

c) What is the correlation coefficient? Interpret in terms of the problem.

r = -.9691 There is a strong, negative relationship between number of officers on duty in the

park and the number of muggings.

d) Interpret the slope in terms of the problem.

Slope = -.493184 For every 1 officer increase in the number of officers on duty, the number

of muggings decreases by .493184, on average.

e) Find the coefficient of determination and interpret in terms of the problem.

r² = .939113 93.91% of the variation in the number of muggings is accounted for by the linear

model relating number of muggings to the number of police officers on duty in the park.

f) Predict the number of muggings if there are 9 police officers on duty.

ŷ = 9.7798 − .493184(9) = 5.34114 It is predicted that approximately 5

muggings will take place when there are 9 officers on duty in the park.


6. Each of the following statements contains a blunder. Explain in each case what is wrong.

a. “There is a high correlation between the gender of American workers and their income”

Gender is categorical, not quantitative.

b. “We found a high correlation (r = 1.09) between students’ ratings of faculty teaching and

ratings made by other faculty members.”

Correlation cannot be greater than 1; r always lies between −1 and 1.

c. “The correlation between planting rate and yield of corn was found to be r = .23 bushel.”

Correlation does not have units (no r = .23 bushel)

7. Foal weight at birth is an indicator of health, so it is of interest to breeders of thoroughbred horses.

Is foal weight related to the weight of the mare? The accompanying data are from the article

“Suckling Behavior Does Not Measure Milk Intake in Horses” (Animal Behaviour [1999])

Observation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Mare weight(kg) 556 638 588 550 580 642 568 642 556 616 549 504 515 551 594

Foal weight(kg) 129 119 132 123.5 112 113.5 95 104 104 93.5 108.5 95 117.5 128 127.5

a) Describe the pattern of the scatterplot.

There appears to be no linear relationship between mare weight and foal weight so it does not

make sense to do any linear regression analysis.

b) Find the equation of the regression line.

c) Interpret the slope in terms of the problem.

d) Interpret the y-intercept in terms of the problem.

e) Calculate and interpret the correlation coefficient.

f) Calculate and interpret the coefficient of determination.


8. The scatterplot shows the advertised prices (in thousands of dollars) plotted against ages (in years)

for a random sample of Plymouth Voyagers on several dealers’ lots.

A computer printout showing the results of fitting a straight

line to the data by the method of least squares gives: Price = 12.37 – 1.13 Age

R-sq = 75.5%

a) Find the correlation coefficient for the relationship

between price and age of Voyagers based on these

data.

r = −√.755 = −.868907 (r takes the sign of the slope, which is negative here)

b) What is the slope of the regression line? Interpret

it in the context of these data.

Slope = -1.13 For every 1 year increase in car

age, the value decreases by 1130 dollars, on

average.

c) How will the size of the correlation coefficient

change if the 10-year-old Voyager is removed

from the data set? Explain.

The correlation coefficient should get closer to -1;

it should get stronger.

d) How will the slope of the LSRL change if the 10-

year-old Voyager is removed from the data?

The slope should get steeper.
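Part (a) recovers r from the printout's R-sq, and the only subtlety is the sign, which must match the sign of the slope. A small sketch of that step, using an illustrative helper name `r_from_r_squared`:

```python
from math import copysign, sqrt

def r_from_r_squared(r_squared, slope):
    """Return r given R-sq (as a proportion) and the fitted slope.

    The square root gives the magnitude; the slope supplies the sign.
    """
    return copysign(sqrt(r_squared), slope)

# Values from the printout: R-sq = 75.5%, slope = -1.13.
r = r_from_r_squared(0.755, -1.13)
print(round(r, 6))
```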

9. One measure of the success of knee surgery is postsurgical range of motion for the knee joint.

Postsurgical range of motion was recorded for 12 patients who had surgery following a knee

dislocation. The age of each patient was also recorded (“Reconstruction…” American Journal of

Sports Medicine). The average age was 25.83 years and standard deviation of 7.578 years. The

average range of motion was 130.1 degrees with a standard deviation of 11.927 degrees. The

correlation coefficient was r = .5534.

a) If we use age to try and predict the range of motion, what is the slope? What is the y-intercept?

Interpret the two in context of the problem.

b = r(s_y / s_x) = .5534(11.927 / 7.578) = .870995

a = ȳ − b·x̄ = 130.1 − .870995(25.83) = 107.602

Slope = .870995 For every 1 year increase in age, the range of motion increases by .870995

degrees, on average

y-intercept = 107.602 For someone just born we would expect them to have 107.6 degrees range

of motion. However, babies would not have reconstructive knee surgery.

b) Use the regression line to predict the range of motion of someone 32 years of age.

ŷ = 107.602 + .870995(32) = 135.474 A 32 year old would be predicted to have 135 degree range of motion.

c) Use the regression line to predict the range of motion of someone 50 years of age. Do you feel

this is an accurate prediction? Explain your thoughts.

ŷ = 107.602 + .870995(50) = 151.152 An age of 50 is more than 3 standard deviations above the mean age, which would make this borderline

extrapolation. 150 degrees is very good range of motion for post surgery.
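When only summary statistics are given, as in this problem, the line comes from b = r(s_y / s_x) and a = ȳ − b·x̄. The sketch below applies those two formulas to the knee-surgery numbers and also computes how far age 50 sits from the mean age; the function name `line_from_summary` is an illustrative choice.

```python
def line_from_summary(r, x_bar, s_x, y_bar, s_y):
    """Return (intercept, slope) of the least-squares line from summary statistics."""
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Summary statistics stated in the problem (age in years, range of motion in degrees).
a, b = line_from_summary(r=0.5534, x_bar=25.83, s_x=7.578, y_bar=130.1, s_y=11.927)

pred_32 = a + b * 32
pred_50 = a + b * 50
z_50 = (50 - 25.83) / 7.578  # standard deviations above the mean age

print(round(b, 6), round(a, 3))
print(round(pred_32, 3), round(pred_50, 3), round(z_50, 2))
```

The same function can be reused for the other summary-statistic problems by substituting their values of r, the means, and the standard deviations.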

[Figure for problem 8: Plymouth Voyagers scatter plot, price (in $1000s) plotted against age (in years)]


10. Newsweek gave the following 1994 average weekly earnings from allowances, chores, work, and

gifts for children of ages 4 through 12.

Age 4 5 6 7 8 9 10 11 12

Earnings $5.87 $7.42 $7.62 $10.63 $10.65 $10.69 $12.01 $13.79 $20.19

a. Construct a scatter plot. Describe the pattern of the scatterplot.

There is a strong, positive, linear relationship between age and earnings. The point (12,

20.19) could be an outlier.

b. Interpret the slope in terms of the problem.

Slope = 1.4205 For every 1 year increase in age, the earnings increase by 1.42 dollars, on

average.

c. Find the coefficient of determination and interpret in terms of the problem.

r² = .839758 83.98% of the variation in earnings is accounted for by the linear model

relating earnings to age.

d. Find the correlation coefficient and interpret in terms of the problem.

r = .916383 There is a strong, positive relationship between age and earnings.

e. Predict the weekly earnings of a child who is age 16. Do you think this is a good prediction?

Explain.

ŷ = −.378444 + 1.4205(16) = 22.3496 I don’t think the prediction of $22.35 weekly earnings for a 16 year old is accurate because

many 16 year olds have a job and will make more money. The x-value of 16 is extrapolation

for the data we have and extrapolation is not always accurate.
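The extrapolation concern in part (e) can be made mechanical by comparing the requested x-value with the range of the observed ages. This is a minimal sketch using the coefficients reported in the key; `predict` and `is_extrapolation` are illustrative names.

```python
# Observed ages from the problem statement (4 through 12).
ages = [4, 5, 6, 7, 8, 9, 10, 11, 12]

# Coefficients as reported in part (e) of the key.
intercept, slope = -0.378444, 1.4205

def predict(age):
    """Predicted weekly earnings in dollars from the fitted line."""
    return intercept + slope * age

def is_extrapolation(x, observed_xs):
    """True when x lies outside the range of the observed x-values."""
    return x < min(observed_xs) or x > max(observed_xs)

for age in (8, 16):
    print(age, round(predict(age), 2), is_extrapolation(age, ages))
```

A flag like this only says the data cannot vouch for the prediction; whether the linear trend actually continues past age 12 is a judgment about the context.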


11. The paper “A Cross-National Relationship between Sugar Consumption and Major Depression?”

(Depression and Anxiety [2002]) concluded that there was a strong correlation (r = .9444) between

refined sugar consumption (calories per person per day) and annual rate of major depression (cases

per 100 people) based on data from 6 countries. The average sugar consumption was 340.83 calories

per person per day with a standard deviation of 110.56 calories while the annual rate of depression

was 4.26 cases with a standard deviation of 1.338 cases.

a) What is the slope of the regression line of annual rate of depression based on sugar consumption?

What is the y-intercept? Interpret the two in context of the problem.

b = r(s_y / s_x) = .9444(1.338 / 110.56) = .011429

a = ȳ − b·x̄ = 4.26 − .011429(340.83) = .364654

Slope = .011429 For every 1 calorie increase in sugar consumption, the rate of depression

increases .011429 cases per 100 people, on average

y-intercept = .364654 When sugar consumption is 0 calories per person per day there would be

.36 cases of depression per 100 people.

b) Use the regression line to predict the depression rate of the United States if the average person

consumes 300 calories per person per day.

ŷ = .364654 + .011429(300) = 3.79335 If the average person consumes 300 calories per day of sugar we could predict 3.79 cases of

depression per 100 people.

c) New Zealand’s depression rate is 5.7 annual cases per 100 people. Use the model to find the

possible sugar consumption. Does the regression line allow us to make this prediction? Explain.

5.7 = .364654 + .011429x x = 466.825 The possible sugar consumption is 466.825 calories

per person per day. Our regression line allowed us to make this prediction. The 5.7 cases of

depression per 100 people is within 2 standard deviations of the mean.

12. How quickly can athletes return to their sport following injuries requiring surgery? The paper

“Arthroscopic Distal Clavicle Resection for Isolated Atraumatic Osteolysis in Weight Lifters”

(American Journal of Sports Medicine, 1998) discovered there was a moderate positive (r = .55)

linear relationship between a lifters age and the number of days after arthroscopic shoulder surgery

before being able to return to their sport between 10 weight lifters. The average age of the weight

lifters was 30.4 with standard deviation of 2.875 years. The average number of days before being

able to return to their sport was 3.2 days with a standard deviation of 1.398 days.

a. Determine the line to predict the number of days based on the age of the weight lifter.

b = r(s_y / s_x) = .55(1.398 / 2.875) = .267443

a = ȳ − b·x̄ = 3.2 − .267443(30.4) = −4.93027

ŷ = −4.93027 + .267443x x = age of weight lifter; ŷ = predicted days to return

b. Determine the coefficient of determination and interpret in terms of the problem.

r = .55, so r² = (.55)² = .3025 30.25% of the variation in number of days to return is

accounted for by the linear model relating number of days to return to age of weight lifter.

c. Given the spread of the lifters was from 26 to 34 years old, predict the number of days for a

28 year old lifter. Do you feel this prediction is accurate? Explain.

ŷ = −4.93027 + .267443(28) = 2.55813

The predicted number of days for a 28 year old weight lifter to return is 2.55813 days. The

prediction should be accurate because the age of 28 was within the interval of given ages.


13. Success in hunting varies greatly among species of animals. Lions, who hunt singly, are rarely

successful in more than 10 percent of their hunts. Wild African dogs, who hunt in packs, are among

the most efficient of all hunters, succeeding at a rate of over 90 percent of their hunts.

In the early 1960’s, researcher Jane Goodall discovered that chimpanzees were not solely vegetarian

in their diets, as had previously been thought. This discovery spurred a tremendous amount of

primate research. Some of the latest primatology research has been done on chimpanzees to find out

if larger hunting parties increase the chances of a successful hunt. The results of one such research

project are summarized in the table for the number of chimpanzees in the hunting party versus the

percentage of successful hunts.

Number of Chimps 1 2 3 4 5 6 7 8 9 10 12 13 14 15 16

Percent of Success 20 30 28 42 40 58 45 62 65 63 75 75 78 75 82

a. Construct a scatter plot.

b. Determine the regression line.

ŷ = 22.7 + 3.98x x = number of chimps in the hunting party; ŷ = predicted success %

c. Interpret the y-intercept. Does the interpretation make sense in this context?

y-intercept = 22.7 If there are 0 chimps in the hunting party they will be successful 22.7% of

the time. This can’t happen; there can be no success if 0 chimps are hunting.

d. Interpret the slope.

Slope = 3.98 For every 1 chimp increase in the hunting party, the success percentage

increases by 3.98 percentage points, on average.

e. Find the correlation coefficient and interpret in terms of the problem.

r = .958961 There is a strong, positive relationship between number of chimps in the

hunting party and percent of success.

f. Find the coefficient of determination and interpret in terms of the problem.

r² = .919606 91.96% of the variation in percent of success is accounted for by the linear

model relating percent of success to number of chimps in the hunting party.

g. Sketch the residual plot. Interpret in terms of the problem.

There appears to be a slight curve to the pattern so perhaps a linear model is not the best

choice.


14. The following is a table of the number of registered automatic weapons (in thousands) of selected

states and their corresponding murder rates.

Weapons 11.6 8.3 3.6 0.6 6.9 2.5 2.4 2.6

Rates 13.1 10.6 10.1 4.4 11.5 6.6 3.6 5.3

a. Determine the regression line.

ŷ = 4.04725 + .852519x x = number of registered weapons (in 1000s)

ŷ = predicted murder rate

b. Predict the number of weapons for a state with a rate of 8.5.

8.5 = 4.04725 + .852519x x = 5.22305

The predicted number of registered weapons is 5223.

c. Predict the murder rate for a state with 10,000 registered automatic weapons.

ŷ = 4.04725 + .852519(10) = 12.5724 The predicted murder rate for a state with 10,000 registered automatic weapons is 12.6 (in the same units as the Rates row, not a percent).

15. The following output data from MINITAB shows the height of girls (in cm) based on the number of

years old.

Predictor Coef Stdev t-ratio p

Constant 76.61 1.188 64.52 0.000

Age(yrs) 6.3661 0.1672 38.02 0.000

s=1.518 R-sq=99.5%

a) What is the equation of the least squares line? Interpret the slope.

ŷ = 76.61 + 6.3661x x = age in years; ŷ = predicted height in cm

Slope = 6.3661 For every 1 year increase in age, the height increases 6.3661 cm, on

average.

b) Find the correlation coefficient and coefficient of determination. Interpret in the context of the

problem.

r = .997497 There is a strong, positive relationship between age and height.

r² = .995 99.5% of the variation in height is accounted for by the linear model relating

height to age.

c) Predict the height of a 3 year old girl.

ŷ = 76.61 + 6.3661(3) = 95.7083 The predicted height of a 3 year old girl is 95.7 cm.

d) Predict the age if a girl is 135 cm.

135 = 76.61 + 6.3661x x = 9.17202 The predicted age of a girl that is 135 cm tall is 9 years old.


16. Women made significant gains in the 1970’s in terms of their acceptance into professions that had

been traditionally populated by men. To measure just how big these gains were, we will compare

the percentage of professional degrees awarded to women in 1973-1974 to the percentage awarded in

1978-1979 for selected fields of study.

Field Degrees in 73-74 Degrees in 78-79

Dentistry 2.0% 11.9%

Law 11.5 28.5

Medicine 11.2 23.1

Optometry 4.2 13.0

Osteopathic medicine 2.8 15.7

Podiatry 1.1 7.2

Theology 5.5 13.1

Veterinary medicine 11.2 28.9

a) What is the regression line?

ŷ = 7.00687 + 1.72414x x = % of degrees in 73-74; ŷ = predicted % of degrees in 78-79

b) Interpret the slope in terms of the problem.

Slope = 1.72414 For every 1 % increase in degrees earned in 73-74, the % of degrees

earned in 78-79 increased 1.72414%, on average.

c) Find the coefficient of determination and interpret in terms of the problem.

r² = .885862 88.6% of the variation in % of degrees earned in 78-79 is accounted for by the

linear model relating % of degrees earned in 78-79 to % of degrees earned in 73-74.

d) Sketch the residual plot. Interpret.

There appears to be a pattern in the residual plot so a linear model may not be the best choice.

e) Find the residual for optometry.

ŷ = 7.00687 + 1.72414(4.2) = 14.2483 Residual = observed – predicted = 13 – 14.2483 = -1.2483

f) Find the residual for veterinary medicine. Did the regression line over or under predict?

Explain.

ŷ = 7.00687 + 1.72414(11.2) = 26.3173 Residual = observed – predicted = 28.9 – 26.3173 = 2.5827

When the residual is positive, the LSRL underpredicts; when the residual is negative, the LSRL

overpredicts. In this case the residual is positive, so the LSRL predicted less than the actual amount.
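The sign convention in parts (e) and (f) is easy to get backwards, so here is a small sketch that computes each residual as observed minus predicted and labels the direction of the miss. It uses the coefficients reported in part (a); the function names are illustrative.

```python
# Coefficients as reported in part (a) of the key.
INTERCEPT, SLOPE = 7.00687, 1.72414

def residual(x, observed):
    """Residual = observed value minus the value predicted by the line."""
    return observed - (INTERCEPT + SLOPE * x)

def verdict(res):
    """A positive residual means the line predicted too low (underpredicted)."""
    if res > 0:
        return "underpredicted"
    if res < 0:
        return "overpredicted"
    return "exact"

# (percent in 73-74, percent in 78-79) for the two fields asked about.
fields = {"Optometry": (4.2, 13.0), "Veterinary medicine": (11.2, 28.9)}

report = {}
for name, (x, y) in fields.items():
    res = residual(x, y)
    report[name] = (res, verdict(res))
    print(f"{name}: residual = {res:+.4f}, line {report[name][1]}")
```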


17. Shells of mollusks function as both part of the skeletal system and as protective armor. It has been

argued that many features of these shells were the result of natural selection in the constant battle

against predators. The paper “Postmortem Changes in Strength of Gastropod Shells” included a

scatter plot of data on x = shell height (cm) and y = breaking strength (newtons). The least squares

line for a sample of 38 hermit crab shells was ŷ = −275.1 + 244.9x.

a. What are the slope and intercept of this line?

Slope = 244.9 y-intercept = -275.1

b. When shell height increases by 1 cm, by how much does breaking strength tend to change?

Breaking strength tends to increase 244.9 newtons, on average.

c. What breaking strength would you predict when shell height is 2 cm?

ŷ = −275.1 + 244.9(2) = 214.7 We would predict a breaking strength of 214.7 newtons when the shell height is 2 cm.

d. Does this approximate linear relationship appear to hold for shell heights as small as 1 cm?

Explain your thoughts.

ŷ = −275.1 + 244.9(1) = −30.2 When the shell height is 1 cm, the predicted breaking strength is -30.2 newtons. I don’t

believe the linear model holds for shell heights as small as 1 cm because breaking strengths

should not be negative.

18. Given the following data sets, find the regression line. Sketch the residual plot and comment on the

likelihood of the regression line being a good model.

Data set 1:

x 2 3 4 5 6 7 8 9

y 86 96 103 110 115 120 130 131

Data set 2:

x 3 6 8 9 11 14 18 20

y 19 22 39 50 75 87 96 125

For the first data set (graph on the left), there appears to be a curved pattern in the residual plot so a

linear model may not be the best choice.

For the second data set (graph on the right), there appears to be a sine wave pattern so a linear model

may not be the best choice.


19. The data come from a study of ice cream consumption that spanned the springs and summers of

three years. The ice cream consumption (pints per capita per year), family income of consumers

($1000 per year) and the temperature (degrees Fahrenheit) is listed below.

Consumption 20.07 19.45 20.44 22.1 21.11 17.89 17.00 14.98 13.99 13.31

Income 18.25 13.31 13.98 18.72 17.78 18.25 19.18 18.51 17.78 18.51

Temperature 41 56 63 68 69 65 61 47 32 24

a. Complete two scatter plots with consumption being the response variable for each plot.

b. Find the two regression lines.

ŷ = 26.3401 − .476616x x = income in $1000 per year; ŷ = predicted consumption

ŷ = 10.0151 + .152452x x = temperature in Fahrenheit; ŷ = predicted consumption

c. Interpret the slopes.

For every $1000 increase in family income, the consumption decreases by .476616 pints, on

average

For every 1 degree increase in temperature, the consumption increases by .152452 pints, on

average.

d. Interpret the coefficients of determination.

r² = .097971 9.8% of the variation in consumption of ice cream is accounted for by the

linear model relating consumption to income.

r² = .603245 60.3% of the variation in consumption of ice cream is accounted for by the

linear model relating consumption to temperature.

e. Sketch and interpret both residual plots.

There are two distinct clusters of data so a linear model may not be a good choice.

There is no obvious pattern so a linear model is a good choice.

f. Which do you think is the better predictor of consumption? Explain.

It appears that temperature may be a better predictor of consumption. There is a strong,

positive linear pattern in the scatterplot and the coefficient of determination is significantly

higher.

g. Predict the consumption for a temperature of 53 degrees.

ŷ = 10.0151 + .152452(53) = 18.095

We would predict 18.1 pints per capita per year for a temperature of 53 degrees Fahrenheit.

h. Predict the consumption for an income of $17,500.

ŷ = 26.3401 − .476616(17.5) = 17.9993 We would predict 18 pints per capita per year for an income of $17,500.

i. Predict the income and temperature for 3 gallons a year.

**8 pints per gallon; use 24 pints for 3 gallons

24 = 10.0151 + .152452x x = 91.73

We would predict that 3 gallons will be consumed when it is 91.73 degrees Fahrenheit.

24 = 26.3401 - .476616x x = 4.90982

We would predict that 3 gallons will be consumed when the family income is $4,909.82.


20. People with diabetes measure their fasting plasma glucose (FPG; measured in units of milligrams per

milliliter) after fasting for at least 8 hours. Another measurement, made at regular medical checkups

is called HbA. This is roughly the percent of red blood cells that have a glucose molecule attached. It

measures average exposure to glucose over a period of several months. The table below gives data

on both HbA and FPG for 18 diabetics five months after they had completed a diabetes education

class.

HbA FPG HbA FPG

Subject (%) (mg/mL) Subject (%) (mg/mL)

1 6.1 141 10 8.7 172

2 6.3 158 11 9.4 200

3 6.4 112 12 10.4 271

4 6.8 153 13 10.6 103

5 7.0 134 14 10.7 172

6 7.1 95 15 10.7 359

7 7.5 96 16 11.2 145

8 7.7 78 17 13.7 147

9 7.9 148 18 19.3 255

a) Sketch a scatter plot. Describe the scatterplot.

There is a very mild, positive linear relationship between HbA and FPG.

Subject 15 is an outlier in the y direction. Subject 18 is an outlier in the x direction.

b) Find the correlation and the regression line for all 18 subjects

r = .481902 ŷ = 66.4285 + 10.4077x

c) Find the correlation and the regression line when only subject 15 is removed.

r = .568397 ŷ = 69.4872 + 8.92039x

d) Find the correlation and the regression line when only subject 18 is removed.

r = .383701 ŷ = 52.2615 + 12.1158x

e) Are either or both of these points influential for the correlation? Explain why r changes in

opposite directions when we remove each of these points.

They both appear to be influential for the correlation. Removing subject 15 makes the correlation

stronger (closer to 1); removing subject 18 makes the correlation weaker (closer to 0).

f) Is either Subject 15 or Subject 18 strongly influential for the least-squares line?

They both appear to be equally influential in the LSRL, but I don’t think either is strongly

influential. A difference in the slope of approximately 1.5-1.7 FPGs doesn’t seem like much

when the levels of FPG are in the 100s.
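The effect of each unusual point on r in parts (b) through (e) can be checked by recomputing the correlation with that subject left out. This is a minimal sketch in plain Python; the `correlation` helper and the variable names are illustrative, and the data are typed in from the table above.

```python
from math import sqrt

def correlation(points):
    """Pearson correlation r for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    return sxy / sqrt(sxx * syy)

# (HbA %, FPG) for subjects 1..18; subject k is at index k - 1.
data = [
    (6.1, 141), (6.3, 158), (6.4, 112), (6.8, 153), (7.0, 134), (7.1, 95),
    (7.5, 96), (7.7, 78), (7.9, 148), (8.7, 172), (9.4, 200), (10.4, 271),
    (10.6, 103), (10.7, 172), (10.7, 359), (11.2, 145), (13.7, 147), (19.3, 255),
]

r_all = correlation(data)
r_without_15 = correlation(data[:14] + data[15:])  # drop the outlier in y
r_without_18 = correlation(data[:17])              # drop the outlier in x

print(round(r_all, 4), round(r_without_15, 4), round(r_without_18, 4))
```

Subject 15 sits far above the trend, so dropping it tightens the pattern; subject 18 extends the trend far to the right, so dropping it removes support for the positive association.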