Chapter 14: Inference for Regression. A brief review of chapter 4... (Regression Analysis: Exploring...

Chapter 14: Inference for Regression


Chapter 14: Inference for Regression


A brief review of Chapter 4... (Regression Analysis: Exploring Association Between Variables)

Bivariate data: relationships between two quantitative variables measured on the same individual

Each individual appears as a point (x, y) on the scatter plot

Explanatory variable; response variable


Scatterplot: label & scale the axes; look for overall patterns (DOFS: direction, outliers, form, strength)


Measuring Linear Association: Correlation or “r”

Correlation (r) measures direction and strength of a linear relationship between two quantitative variables

Correlation (r) is always between -1 and 1; makes no sense to have r = -13 or r = 27

Correlation (r) is not resistant to outliers (look at the formula; it is based on means)

Correlation is for scatter plots (not LSRL)

r is in standard units, so r doesn’t change if units are changed

If we change from yards to feet, r is not affected
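A quick way to convince yourself that r ignores units is to compute it twice. The sketch below uses made-up distances (in yards) and a made-up response; the data and the helper name `correlation` are assumptions for illustration only.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: distances in yards and some response variable.
yards = [10, 20, 30, 40, 50]
response = [12, 19, 33, 38, 52]

# Same distances expressed in feet (1 yard = 3 feet).
feet = [3 * yd for yd in yards]

r_yards = correlation(yards, response)
r_feet = correlation(feet, response)
print(r_yards, r_feet)  # the two values agree up to rounding error
```

Multiplying every x by 3 scales the numerator and the denominator of r by the same factor, so the ratio is unchanged.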


Measuring Linear Association: Correlation or “r”

r ≈ 0: no strong linear relationship

r close to 1: strong positive linear relationship

r close to -1: strong negative linear relationship


One better... Least Squares Regression Line (LSRL)


Least Squares Regression (predicts values)

LSRL Model: ŷ = a + bx

ŷ is the predicted value of the response variable

a is y-intercept of LSRL

b is slope of LSRL; slope is predicted (expected) rate of change

x is explanatory variable
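To see where a and b come from, here is a minimal sketch of the least-squares calculation, assuming the usual formulas b = Sxy / Sxx and a = ȳ − b·x̄. The five data points and the function name `lsrl` are made up for illustration.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: how x and y vary together; Sxx: how x varies on its own.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept (the line passes through (x-bar, y-bar))
    return a, b

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = lsrl(x, y)
y_hat_at_3 = a + b * 3   # prediction inside the range of the data
print(f"y-hat = {a:.2f} + {b:.2f}x; predicted y at x = 3 is {y_hat_at_3:.2f}")
```

The prediction is made at x = 3, inside the observed range of x, so no extrapolation is involved.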


Least Squares Regression (predicts values)

May be asked to interpret slope of LSRL & y-intercept, in context

Caution: Interpret slope of LSRL as the predicted or average change or expected change in the response variable given a unit change in the explanatory variable

NOT the exact change in y for a unit change in x; LSRL is a model; models are not perfect


Extrapolation... What is it again??

Extrapolation! Don’t do it… ever.

Example: Growth data from children from age 1 month to age 12 years … LSRL

What is the predicted height of a 40-year-old?


Outliers & Influential Points

Influential points are often outliers in the x direction, but not all outliers are influential points. Influential points/observations: if removed, would markedly change the LSRL (slope and/or y-intercept)


Coefficient of Determination: r²

r² tells us how well our LSRL describes our data; how well does this linear model fit the data

r² is always between 0 and 1; 0 ≤ r² ≤ 1

r², “fraction of the variation of the values of y that is explained by the LSRL”

VERSUS r, correlation, -1 ≤ r ≤ 1; describes direction and strength of the linear relationship in a scatter plot
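The "fraction of variation explained" reading can be checked numerically. This sketch assumes the standard decomposition r² = 1 − SSE/SST and uses made-up data; it also confirms that the result equals the square of the correlation r.

```python
import math

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b = sxy / sxx
a = y_bar - b * x_bar

# SST: total variation of y about its mean.
sst = syy
# SSE: variation left over after using the LSRL (sum of squared residuals).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - sse / sst          # fraction of variation explained by the LSRL
r = sxy / math.sqrt(sxx * syy)     # correlation

print(f"r = {r:.4f}, r^2 from r = {r * r:.4f}, r^2 from SSE/SST = {r_squared:.4f}")
```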


Chapter 14: Inference for Regression

We are now going to take all of that previous knowledge about bi-variate data and apply it to inference (forming judgments about population parameters on the basis of random sampling; a statistic)

Remember, ŷ = a + bx is just an estimate, a predictor, a statistic (like p̂ or x̄), based on a sample

Statistics vary from sample to sample


SRS BMW Cars (age & price)

What about another SRS of n = 7? Would data/points possibly be different?

So then would LSRL be different?

What about another SRS?

Data varies from sample to sample

Do we know the true population parameter? Do we have info on ALL BMWs?
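A small simulation makes the sample-to-sample variation concrete. Everything here is an assumption for illustration: the population of 500 cars, the price formula, and the noise level are invented, not real BMW data.

```python
import random

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

rng = random.Random(14)  # fixed seed so the run is repeatable

# Hypothetical population: price drops about $3,000 per year of age, plus noise.
population = []
for _ in range(500):
    age = rng.uniform(1, 10)
    price = 45000 - 3000 * age + rng.gauss(0, 4000)
    population.append((age, price))

slopes = []
for _ in range(3):
    sample = rng.sample(population, 7)      # one SRS of n = 7 cars
    ages = [car[0] for car in sample]
    prices = [car[1] for car in sample]
    a, b = lsrl(ages, prices)
    slopes.append(b)
    print(f"sample slope: {b:.0f} dollars per year of age")
```

Each SRS gives a different LSRL, even though all three samples come from the same population with the same underlying slope.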


SRS BMW Cars (age & price)

So this LSRL is just based on THESE 7 pieces of data

We don’t know the true, unknown population regression line, y = β0 + β1x

But we can estimate the true, unknown regression line using a confidence interval... OR... we can test a claim using a hypothesis test


Let’s talk about conditions...

We need to know and check the conditions before we perform inference (confidence intervals, hypothesis testing) in any situation (means, proportions, linear regression, one-sample, two-sample, Chi-Square, etc.)

If conditions are not met, our inference may be very inaccurate; worthless information


Conditions for Linear Regression Inference

1. Linearity: trend is linear (Use Residuals Plot to Check)

2. Normality: errors follow a Normal distribution with a mean of zero; N(0, σ) (Use QQ Plot/Normal Probability Plot to Check)

3. Constant standard deviation: the standard deviation σ must be the same for all values of the predictor variable (Use Residuals Plot to Check)

4. Independence: Errors must be independent of one another (review raw data and collection process)


Residuals ... we look at these to determine if conditions 1 & 3 are met

Least Squares Regression Line is not perfect, but it’s the best model we have

All points on the scatter plot don’t fit perfectly on the LSRL; very common

Vertical distances from each point to the LSRL are called “residuals,” or left-overs


Residual = observed y value – predicted y value (y – ŷ)

LSRL is the line that creates the least “left-overs”: it minimizes the sum of the squared residuals
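Here is a minimal sketch of the residual calculation, using the body weight and backpack weight data that appear later in these slides. The helper name `lsrl` is an assumption; the arithmetic is the usual observed-minus-predicted.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Backpack data from the slides (lbs).
body = [120, 187, 109, 103, 131, 158, 116]
pack = [26, 30, 26, 24, 29, 31, 28]

a, b = lsrl(body, pack)

# Residual = observed y - predicted y, one per student.
residuals = [y - (a + b * x) for x, y in zip(body, pack)]

for x, res in zip(body, residuals):
    print(f"body weight {x:3d} lbs: residual {res:+.2f} lbs")

# For a least-squares line the residuals always sum to (essentially) zero.
print(f"sum of residuals: {sum(residuals):.10f}")
```

Plotting these residuals against body weight gives the residuals plot used to check conditions 1 and 3.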


Graphical Tool: Residuals Plot

We plot the residuals (left overs, points on scatterplot that are above or below LSRL) to determine if a line is the best model to describe our scatterplot of bivariate data

Perhaps a line isn’t the best model... Maybe a quadratic curve or a log curve or square root function is a better model for the data


Residuals Plot (truck example)

On left is scatter plot & LSRL; on right is residuals plot


Graphical Tool: Residuals Plot

To check the linearity and constant standard deviation conditions, the residuals plot should show no obvious pattern: random, unstructured scatter

In the below case, both conditions are met


Residuals Plot

If there is an obvious pattern (such as a curve or a fan shape), condition 1 and/or condition 3 is not met


Condition #2: Normality...

Errors must follow a Normal distribution

Can examine a Normal Probability Plot (NPP) (or a QQ Plot) of the residuals (left-overs)

If NPP is fairly linear, then condition #2 is satisfied
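If software is not drawing the NPP for you, the points can be computed by hand. This is a rough sketch using the backpack data from later in these slides; the plotting positions (i − 0.5)/n are one common convention and are an assumption here, since other software may use slightly different ones.

```python
from statistics import NormalDist

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Backpack data from the slides (lbs).
body = [120, 187, 109, 103, 131, 158, 116]
pack = [26, 30, 26, 24, 29, 31, 28]

a, b = lsrl(body, pack)

# Sort the residuals from smallest to largest.
residuals = sorted(y - (a + b * x) for x, y in zip(body, pack))
n = len(residuals)

# Expected z-score for each ordered residual (assumed plotting positions).
z_scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each (z, residual) pair is one point on the Normal probability plot.
for z, res in zip(z_scores, residuals):
    print(f"z = {z:6.2f}   residual = {res:6.2f}")
```

If the printed pairs fall roughly along a straight line when plotted, the Normality condition is plausible. With only 7 points, the plot is a rough guide at best.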


NPP that shows that errors do not follow a Normal distribution


Condition #4... Independence

Errors must be independent of one another

Examine the collection method of the data if possible

In most cases, we must assume independence unless and until we discover otherwise


Equation...for LSRL (sample statistic)

Sample statistic: ŷ = a + bx, where

x is the value of the explanatory variable

b is estimated slope (sample statistic)

a is estimated y-intercept (sample statistic)

ŷ is the estimated value of the response variable (sample statistic)


Equation... for true, unknown population parameter line

Population parameter: y = β0 + β1x

x is the value of the explanatory variable

β1 is the true, actual (but unknown) population slope

β0 is the true, actual (but unknown) population y-intercept

y is the true, actual (but unknown) mean value of the response variable in the population for a given x


Hypothesis testing...

Most of the time, we are interested in performing a hypothesis test on the slope (not the y-intercept)

Ho: Slope = 0

(OR β1 = 0 OR there is no linear association between the two variables OR correlation = 0)

Ha: Slope ≠ 0 (or > 0, or < 0)

(OR β1 ≠ 0 OR there is a linear association between the two variables OR correlation ≠ 0; use > 0 or < 0 for a one-sided test)


Hypothesis testing...

Same 4 steps:

1. State null and alternative hypotheses

2. Check conditions

3. Do calculations

4. Interpret results in context


Random sample of 9th grade students...

... going on their annual backpacking trip each fall

Is there a linear relationship between body weight and backpack weight?

www.whfreeman.com/tps5e

Body Weight (lbs) vs. Backpack Weight (lbs)

Body Weight (lbs):      120 187 109 103 131 158 116
Backpack Weight (lbs):   26  30  26  24  29  31  28


Ho: No linear relationship between body weight & backpack weight (or β1 = 0)

Ha: There is a linear relationship between body weight & backpack weight (or β1 ≠ 0)

Conditions: Assume all conditions have been checked & met.

Calculations: Enter data into Minitab (one column for body weight & another for backpack weight); then go to regression, simple regression. Be careful to assign response & predictor correctly (they are easy to enter backwards). Choose linear, 95% confidence.

Interpretation: Decision, α level, p-value, context.

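For anyone without Minitab, the calculation step can be sketched by hand. This assumes the standard slope test, t = b / SE(b) with df = n − 2, treats body weight as the predictor (an assumption about which variable goes where), and approximates the two-sided p-value by numerically integrating the t density. The function names are made up for this sketch; it is a check on the software, not a replacement for it.

```python
import math

# Backpack data from the slides (lbs); body weight is assumed to be the predictor.
body = [120, 187, 109, 103, 131, 158, 116]
pack = [26, 30, 26, 24, 29, 31, 28]

def slope_t_test(xs, ys):
    """Return (b, se_b, t, df) for testing H0: beta1 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    s = math.sqrt(sse / df)        # standard deviation of the residuals
    se_b = s / math.sqrt(sxx)      # standard error of the slope
    return b, se_b, b / se_b, df

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3   # area under the density between 0 and |t|
    return 1 - 2 * central_half

b, se_b, t_stat, df = slope_t_test(body, pack)
p_value = two_sided_p(t_stat, df)
print(f"b = {b:.4f}, SE = {se_b:.4f}, t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
```

Compare the printed p-value with your α level to make the decision, then state the conclusion in context.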


Construct a confidence interval at the 95% level.

Conditions: Assume all conditions have been checked & met.

Calculations: Enter data into Minitab (one column for body weight & another for backpack weight); then go to regression, simple regression. Be careful to assign response & predictor correctly (they are easy to enter backwards). Choose linear, 95% confidence.

Interpretation: We are 95% confident that the true, unknown population parameter, the true slope, β, is between ...

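The interval itself is b ± t*·SE(b). A minimal sketch, assuming body weight is the predictor and taking t* = 2.571 (the 95% critical value for df = 5 from a standard t table) as a hard-coded constant:

```python
import math

# Backpack data from the slides (lbs); body weight is assumed to be the predictor.
body = [120, 187, 109, 103, 131, 158, 116]
pack = [26, 30, 26, 24, 29, 31, 28]

# Critical value for 95% confidence with df = n - 2 = 5, read from a t table.
T_STAR = 2.571

n = len(body)
x_bar = sum(body) / n
y_bar = sum(pack) / n
sxx = sum((x - x_bar) ** 2 for x in body)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(body, pack))

b = sxy / sxx                      # sample slope
a = y_bar - b * x_bar              # sample intercept
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(body, pack))
s = math.sqrt(sse / (n - 2))       # standard deviation of the residuals
se_b = s / math.sqrt(sxx)          # standard error of the slope

margin = T_STAR * se_b
lower, upper = b - margin, b + margin
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f}) lbs of pack per lb of body weight")
```

If the interval does not contain 0, that agrees with rejecting Ho: β1 = 0 in a two-sided test at α = 0.05.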


Do customers who stay longer at buffets give larger/smaller tips?


Time (minutes)   Tip ($)
23               5.00
39               2.75
44               7.75
55               5.00
61               7.00
65               8.88
67               9.01
70               5.00
74               7.29
85               7.50
90               6.00
99               6.50


Do customers who stay longer at buffets give larger/smaller tips?

A statistics student investigated this question as part of her project. She obtained an SRS of receipts that included this information.

Do these data provide convincing evidence that customers who stay longer tip differently than customers who stay for shorter periods of time?

Ho: β = 0 (no linear relationship between time spent and tip amount)

Ha: β ≠ 0 (customers who stay longer tip differently)

www.whfreeman.com/tps5e


Do customers who stay longer at buffets give larger/smaller tips?

Ho: β = 0 (no linear relationship between time spent and tip amount)

Ha: β ≠ 0 (customers who stay longer tip differently)

Conditions: Assume all conditions have been checked and met.

Calculations: Enter data into Minitab and run calculations.

Interpretation: Decision, α level, p-value, context.
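As a cross-check on the Minitab output, the test statistic for the buffet data can be sketched by hand. This assumes the same slope test as before (t = b / SE(b), df = n − 2) and uses the critical-value approach with t* = 2.228, the two-sided 5% value for df = 10 from a standard t table.

```python
import math

# Buffet data from the slides: time at the table (minutes) and tip (dollars).
time_min = [23, 39, 44, 55, 61, 65, 67, 70, 74, 85, 90, 99]
tip = [5.00, 2.75, 7.75, 5.00, 7.00, 8.88, 9.01, 5.00, 7.29, 7.50, 6.00, 6.50]

# Two-sided critical value at alpha = 0.05 with df = n - 2 = 10, from a t table.
T_STAR = 2.228

n = len(time_min)
x_bar = sum(time_min) / n
y_bar = sum(tip) / n
sxx = sum((x - x_bar) ** 2 for x in time_min)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(time_min, tip))

b = sxy / sxx                      # sample slope (dollars per extra minute)
a = y_bar - b * x_bar              # sample intercept
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(time_min, tip))
s = math.sqrt(sse / (n - 2))       # standard deviation of the residuals
se_b = s / math.sqrt(sxx)          # standard error of the slope

t_stat = b / se_b
reject_h0 = abs(t_stat) > T_STAR   # two-sided decision at alpha = 0.05

print(f"b = {b:.4f}, SE = {se_b:.4f}, t = {t_stat:.2f}, df = {n - 2}")
print("reject Ho" if reject_h0 else "fail to reject Ho")
```

Remember that the decision is only the first half of the interpretation; the conclusion still has to be written in the context of buffet customers and their tips.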


Homework...

Homework

Section Quiz

Our next test ...