Ch 12 - More about Regression

28
Ch 12 - More about Regression “The Last Chapter!”

description

Ch 12 - More about Regression. “The Last Chapter!”. - PowerPoint PPT Presentation

Transcript of Ch 12 - More about Regression

Page 1: Ch 12 - More about Regression

Ch 12 - More about Regression

Ch 12 - More about Regression

“The Last Chapter!”“The Last Chapter!”

Page 2: Ch 12 - More about Regression

Does seat location matter? Many people believe that students learn better if they sit close to the front of the classroom. Does sitting closer cause higher achievement, or do better students imply chose to sit in the front? Students were randomly assigned to seat locations in a teacher’s classroom for a chapter and their location was recorded as well as their chapter test grade. The explanatory variable in this experiment is which row the student was assigned (Row 1 is the closest to the front and Row 7 is the farthest away). Here are the results:Row 1: 76, 77, 94, 99Row 2: 83, 85, 74, 79Row 3: 90, 88, 68, 78Row 4: 94, 72, 101, 70, 79Row 5: 76, 65, 90, 67, 96Row 6: 88, 79, 90, 83Row 7: 79, 76, 77, 63

Page 3: Ch 12 - More about Regression

Construct a scatter plot and describe what you see.

Interpret the slope of the least-squares regression line in this context.

Explain why it is important to randomly assign the students to seats rather than students selecting their own seats.

Does the negative slope provide convincing evidence that sitting closer to the board causes higher achievement, or is it plausible that the association is due to the chance variation in the random assignment?

Page 4: Ch 12 - More about Regression

What will we make an inference about?

Slope or y-intercept?

Greek symbols for parameters, and “regular” letters for statistics!

Page 5: Ch 12 - More about Regression

Conditions to Check

L - Linear - the actual relationship between x and y is linear

I - Independent - individual observations are independent of one another

N - Normal - for any fixed value of x, the y-value varies according to a Normal distribution

E - Equal Variance - the standard deviation of y is the same for all values of x

R - Random - the data comes from a well-designed random sample or randomized experiment

Page 6: Ch 12 - More about Regression

How are we going to check that stuff?

Linear - check overall pattern of scatterplot, check the residual plot (transformations might be needed)

Independent - look at design to ensure random sampling, if sampling without replacement, check the 10% condition

Normal - create a graph of the residuals to check for any strong skewness or major outliers. Graphs - histogram, Normal Probability plot, or stemplot

Equal Variance - The scatter about zero for the residuals should be evenly above and below for all x-values

Random - same as always!

Page 7: Ch 12 - More about Regression

Check the conditions for our “where you sit” example

L

I

N

E

R

Page 8: Ch 12 - More about Regression

A little review on computer output

Here is the computer output for the least-squares regression line for the seating-chart data:

Regression Analysis: Score versus Row

Predictor Coef SE Coef T P

Constant 85.706 4.239 20.22 0.000

Row -1.1171 0.9472 -1.18 0.248

S = 10.0673 R-Sq = 4.7% R-Sq(adj) = 1.3%

XX X

X

Page 9: Ch 12 - More about Regression

1. State the equation of the least-squares regression line. State any variables you use.

2. Interpret the slope, y-intercept (if possible) and standard deviation of the residuals.

Regression Analysis: Score versus Row

Predictor Coef SE Coef T P

Constant 85.706 4.239 20.22 0.000

Row -1.1171 0.9472 -1.18 0.248

S = 10.0673 R-Sq = 4.7% R-Sq(adj) = 1.3%

Page 10: Ch 12 - More about Regression

Confidence Intervals for the Slope

Confidence Intervals for the Slope

Page 11: Ch 12 - More about Regression

t Interval for the Slope of a LSRL

Formula: b + t*SEb

Where t* = critical value with n - 2 degrees of freedom

SEb =

Page 12: Ch 12 - More about Regression

Back to our Seats & Grades

(a) Identify the standard error of the slope SEb from the computer output provided. Interpret this in context of the problem.

(b) Calculate the 95% confidence interval for the true slope. Show your work.

(c) Interpret the interval from part (b).

(d) Based on your interval, is there convincing evidence that seat location affects scores?

Regression Analysis: Score versus Row

Predictor Coef SE Coef T P

Constant 85.706 4.239 20.22 0.000

Row -1.1171 0.9472 -1.18 0.248

S = 10.0673 R-Sq = 4.7% R-Sq(adj) = 1.3%

Page 13: Ch 12 - More about Regression

(a) SEb = 0.9472. If we repeated the random assignment many times, the slope of the estimated regression line would typically vary by about 0.9472 from the slope of the true regression line for predicting test scores from row number.

Page 14: Ch 12 - More about Regression

(b) df = 30 - 2 = 28, so t* = 2.048 (you can use table B or your invT command)

-1.1171 + 2.048(0.9472) = (-3.0570, 0.8228)

Page 15: Ch 12 - More about Regression

(c) I am 95% confident that the interval from -3.0570 to 0.8228 captures the true slope for the regression line relating a test score (y) and the students row number (x).

Page 16: Ch 12 - More about Regression

(d) Since a 95% confidence interval contains 0 as a plausible slope, I do not have convincing evidence that there is an association between test score and row number.

Page 17: Ch 12 - More about Regression

Two students decided to investigate the effect of sugar on the life of cut flowers. They went to the local grocery store and randomly selected 12 carnations. All the carnations seemed equally healthy when they were selected. When the students got home, they prepared 12 identical vases with exactly the same amount of water in each vase. They put one tablespoon of sugar in 3 vases, two tablespoons of sugar in 3 vases, and three tablespoons of sugar in 3 vases. In the remaining 3 vases, they put no sugar. After the vases were prepared and placed in the same location, the students randomly assigned one flower to each vase and observed how many hours each flower continued to look fresh. Here are the data: Sugar (tbs)

Freshness (hours)

0 168

0 180

0 192

1 192

1 204

1 204

2 204

2 210

2 210

3 222

3 228

3 234

Page 18: Ch 12 - More about Regression

Below are the least squares regression analysis for these data from Minitab.

Regression Analysis: Freshness (hours) versus Sugar (tbs)

Predictor Coef SE Coef T P

Constant 181.200 3.635 49.84 0.000

Sugar (tbs) 15.200 1.943 7.82 0.000

S = 7.52596 R-Sq = 86.0% R-Sq(adj) = 84.5%

Page 19: Ch 12 - More about Regression

a) Construct and interpret a 99% confidence interval for the slope of the true regression line.

b) Would you feel confident predicting the hours of freshness if 10 tablespoons of sugar are used? Explain.

Page 20: Ch 12 - More about Regression

Significance Test for the Slope

Significance Test for the Slope

Page 21: Ch 12 - More about Regression

t Test for the Slope of the Population Regression Line

test statistic = statistic - parameter standard deviation of

statistic

t = b - B0 with df = n - 2 SEb

Page 22: Ch 12 - More about Regression

Why preform a significance test on the slope?

Really no association and we got a nonzero slope due to random chance variation due to random assignment

Really is an association

Page 23: Ch 12 - More about Regression

Do customers who stay longer at buffets give larger tips? While Charlotte was working at the local Asian buffet, she collected a random sample of receipts, which included the length of time (in minutes) the party was in the restaurant and the amount of the tip (in dollars). Do these data provide convincing evidence that customers who stay longer give larger tips? Here are the data:

Time (minutes) Tip (dollars)

23 5.00

29 2.75

44 7.75

55 5.00

61 7.00

65 8.88

67 9.01

70 5.00

74 7.29

85 7.50

90 6.00

99 6.50

Page 24: Ch 12 - More about Regression

a) Construct a scatterplot with the least squares regression line on your calculator. Describe what this graph tells you about the relationship between the two variables.

Page 25: Ch 12 - More about Regression

b)What is the equation of the least squares regression line? Define any variables you use.

c) Interpret the slope and y-intercept of the least squares regression line in context.

Page 26: Ch 12 - More about Regression

d) Using the computer output and graphs below, carry out an appropriate test to answer Charlotte’s question.

Regression Analysis: Tip (dollars) versus Time (minutes)

Predictor Coef SE Coef T P

Constant 4.535 1.657 2.74 0.021

Time 0.03013 0.02448 1.23 0.247

S = 1.77931 R-Sq = 13.2% R-Sq(adj) = 4.5%

Page 27: Ch 12 - More about Regression

Confidence intervals give a better picture

A random sample of 11 used Honda CR-V’s from the 2002-6 model years was selected from the inventory at www.carmax.com. The number of miles driven and the advertised price were recorded for each CR-V. A 95% confidence interval for the slope of the LSRL for predicting advertised price from number of miles (in thousands) driven is (-50.1, -122.3)

Based on the interval, what conclusion would you draw from a test of H0: B = 0 versus Ha: B does not = 0 at the 0.05 significance level?

What more information do you gain from the confidence interval?

Page 28: Ch 12 - More about Regression

Calculators!

Confidence Intervals: LinRegTInt

Significance Tests: LinRegTTest