1

Psych 5510/6510

Chapter 10. Interactions and Polynomial Regression: Models with Products of Continuous Predictors

Spring, 2009

2

Broadening the Scope

So far we have been limiting our models by ignoring the possibility that the predictor variables might interact, and by using only straight lines for our regression (i.e. ‘linear’ regression). This chapter provides an approach that allows us to add both the interaction of variables and nonlinear regression to our models.

3

Our ‘Running’ Example

Throughout this chapter we will be working with the following example:

Y is the time (in minutes) taken to run a 5-kilometer race.

X1 is the age of the runner

X2 is how many miles per week the runner ran when in training for the race.

4

‘On Your Mark’

We will begin by taking another perspective of what we have been doing so far in the text, and then use that perspective to understand interactions and nonlinear regression.

5

Time and Age

The analysis of the data leads to the following ‘simple’ relationship between Time (Y) and Age (X1).

MODEL C: Ŷi=β0

MODEL A: Ŷi=β0+β1X1i

Ŷi=15.104 + .213X1i

PRE=.218, F*=21.7, p<.01

6

Simple Relationship between Time and Age

7

Time and Miles

The simple relationship between Time (Y) and Miles of Training (X2).

MODEL C: Ŷi=β0

MODEL A: Ŷi=β0+β2X2i

Ŷi=31.91 - .280X2i

PRE=.535, F*=89.6, p<.01

8

Simple Relationship between Race Time and Miles of Training

9

Both Predictors

Now regress Y on both Age (X1) and Miles of Training (X2).

MODEL C: Ŷi=β0

MODEL A: Ŷi=β0 +β1X1i +β2X2i

Ŷi=24.716 + 1.65X1i - .258X2i

PRE=.662, F*=75.55, p<.01

10

‘Get Set’

Now we will develop another way to think about multiple regression, one that re-expresses multiple regression in the form of a simple regression.

We will start with the Age (X1).

The simple regression of Y on X1 has this form:

Ŷi=(intercept) + (slope)X1i

11

The multiple regression model is: Ŷi=24.716 + 1.65X1i - .258X2i

We can make the multiple regression model fit the simple regression form:

Ŷi= (intercept) + (slope)X1i

Ŷi= (24.716 - .258X2i) + (1.65)X1i

When X2=10, then Ŷi= (22.136) + (1.65)X1i

When X2=30, then Ŷi= (16.976) + (1.65)X1i

From this it is clear that the value of X2 can be thought of as changing the intercept of the simple regression of Y on X1, without changing its slope.
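The intercept-shifting arithmetic above can be sketched in a few lines of Python (a hypothetical sketch using the estimates quoted on these slides; the function name is mine):

```python
# Additive model re-expressed as a simple regression of Y on X1.
# b0, b1, b2 are the estimates quoted above: Y-hat = 24.716 + 1.65*X1 - .258*X2.

def simple_in_x1(x2, b0=24.716, b1=1.65, b2=-0.258):
    """Return (intercept, slope) of the simple regression of Y on X1
    when X2 is held fixed at x2. X2 only shifts the intercept."""
    return (b0 + b2 * x2, b1)

print(simple_in_x1(10))  # intercept ≈ 22.136, slope 1.65
print(simple_in_x1(30))  # intercept ≈ 16.976, slope 1.65
```

Whatever value X2 takes, the second element of the pair never changes: that is the additive model in miniature.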

12

The simple relationship of Time (Y) and Age (X1) at various levels of Training Miles (X2)

13

Of course we can also work the other direction, and change the multiple regression formula to examine the simple regression of Time (Y) on Miles of Training (X2)

14

The multiple regression model is: Ŷi=24.716 + 1.65X1i - .258X2i

We can make the multiple regression model fit the simple regression form:

Ŷi= (intercept) + (slope)X2i

Ŷi= (24.716 +1.65 X1i) + (-.258)X2i

When X1=20, then Ŷi= (57.716) + (-.258)X2i

When X1=60, then Ŷi= (123.72) + (-.258)X2i

From this it is clear that the value of X1 can be thought of as changing the intercept of the simple regression of Y on X2, without changing its slope.

15

The simple relationship of Time (Y) and Training Miles (X2) at various levels of Age (X1)

16

Additive Model

When we look at these simplified models it is clear that the effect of one variable gets added to the effect of the other, moving the line up or down the Y axis but not changing the slope.

This is known as the ‘additive model’.

17

Interactions Between Predictor Variables

Let’s take a look at a non-additive model. In this case, we raise the possibility that the relationship between age (X1) and time (Y) may differ across levels of the other predictor variable miles of training (X2). To say that the relationship between X1 and Y may differ across levels of X2 is to say that the slope of the regression line of Y on X1 may differ across levels of X2.

18

The slope of the relationship between age and time is less for runners who trained a lot than for those who trained less.

Non-Additive Relationship Between X1 and X2

19

Interaction=Non-Additive

Predictor variables interact when the value of one variable influences the relationship (i.e. slope) between another predictor variable and Y.

20

Interaction & Redundancy

Whether or not there is an interaction between two variables in predicting a third is an issue that is totally independent of whether or not the two predictor variables are redundant with each other. Expunge from your mind any connection between these two issues (if it was there in the first place).

21

Adding Interaction to the Model

To add an interaction between variables to the model, simply add a new variable that is the product of the other two (i.e. create a new variable whose values are the score on X1 times the score on X2), then do a linear regression on that new model:

Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)

Ŷi=19.20 +.302X1i +(-.076)X2i +(-.005)(X1iX2i)
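As a hypothetical sketch (the raw race data aren’t reproduced in these slides), we can simulate noise-free data from the fitted equation above and recover the b’s with ordinary least squares via `numpy.linalg.lstsq`; the design matrix simply gains a product column:

```python
import numpy as np

# Simulate predictors, generate Y exactly from the fitted equation above,
# and recover b0..b3 by OLS on a design matrix that includes the product term.
rng = np.random.default_rng(0)
n = 80
age = rng.uniform(20, 60, n)     # X1
miles = rng.uniform(10, 50, n)   # X2
y = 19.20 + 0.302*age - 0.076*miles - 0.005*age*miles  # noise-free for clarity

X = np.column_stack([np.ones(n), age, miles, age*miles])  # 1, X1, X2, X1*X2
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 3))  # ≈ [19.20, .302, -.076, -.005]
```

The point is that an interaction model is still fit as a linear regression: the product X1·X2 is just another column.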

22

Testing Significance of the Interaction

Test significance as you always do using the model comparison approach.

First, to test the overall model that includes the interaction term:

Model C: Ŷi=β0

Model A: Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)

H0: β1 = β2 = β3 = 0
HA: at least one of those betas is not zero.

23

Testing Significance of the Interaction

Second, to test whether adding the interaction term is worthwhile compared to a purely additive model:

Model C: Ŷi= β0 +β1X1i +β2X2i

Model A: Ŷi= β0 +β1X1i +β2X2i +β3(X1iX2i)

H0: β3=0

HA: β3 ≠ 0

The test of the partial regression coefficient gives you: PRE=.055, PC=3, PA=4, F*=4.4, p=.039
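That F* follows the usual model-comparison formula, F* = (PRE/(PA−PC)) / ((1−PRE)/(n−PA)). A small sketch (n = 80 is an assumption of mine; the slides never state the sample size, but it reproduces the quoted F*):

```python
# Model-comparison F statistic from PRE.
# pc, pa = number of parameters in the compact and augmented models.

def f_star(pre, pc, pa, n):
    return (pre / (pa - pc)) / ((1 - pre) / (n - pa))

# PRE=.055, PC=3, PA=4, and an assumed n=80 give the slide's F* of about 4.4.
print(round(f_star(0.055, 3, 4, 80), 1))
```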

24

Understanding the Interaction of Predictor Variables

To develop an understanding of the interaction of predictor variables, we will once again take the full model:

Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)

And translate it into the form of the simple relationship of one predictor variable (X1) and Y:

Ŷi=(intercept) + (slope)X1i

25

‘Go’

Full model:

Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i) =

Ŷi=β0 +β2X2i +β1X1i +β3(X1iX2i) =

Ŷi=β0 +β2X2i +(β1 +β3X2i ) X1i

Simple relationship of Y (time) and X1 (age):

Ŷi= (intercept) + (slope)X1i

Ŷi= (β0 +β2X2i) + (β1 +β3X2i )X1i

26

Simple Relationship of Y (Time) and X1 (Age)

Ŷi= (intercept) + (slope)X1i

Ŷi= (β0 +β2X2i) + (β1 +β3X2i )X1i

It is clear in examining the relationship between X1 and Y, that the value of X2 influences both the intercept and the slope of that relationship.

27

Simple Relationship of Time and Age (cont.)

Ŷi= (intercept) + (slope)X1i

Ŷi= (β0 +β2X2i) + (β1 +β3X2i )X1i

b0=19.20 b1=.302 b2=-.076 b3=-.005

Ŷi=(19.20 - .076X2i) + (.302 - .005X2i)X1i

When X2 (i.e. miles) = 10, then Ŷi=18.44 + .252X1i

When X2 (i.e. miles) = 50, then Ŷi=15.4 + .052X1i
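Those two lines can be generated directly from the coefficient estimates (a small sketch; the helper name is mine):

```python
# Simple relationship of Time (Y) on Age (X1) in the interactive model:
# intercept = b0 + b2*X2, slope = b1 + b3*X2, using the estimates above.

def time_on_age(x2, b0=19.20, b1=0.302, b2=-0.076, b3=-0.005):
    return (b0 + b2*x2, b1 + b3*x2)

print(time_on_age(10))  # ≈ (18.44, 0.252)
print(time_on_age(50))  # ≈ (15.40, 0.052)
```

Unlike the additive model, here X2 changes both elements of the pair: intercept and slope.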

28

Interactive Model

29

Simple Relationship of Y (Time) and X2 (Miles)

Full model: Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i) =

Ŷi= β0 +β1X1i +(β2 +β3X1i ) X2i

Simple relationship of Y (time) and X2 (miles):

Ŷi=(intercept) + (slope)X2i

Ŷi=(β0 +β1X1i) + (β2 +β3X1i )X2i

30

Simple Relationship of Time and Miles (cont.)

Ŷi= (intercept) + (slope)X2i

Ŷi= (β0 +β1X1i) + (β2 +β3X1i )X2i

b0=19.20 b1=.302 b2=-.076 b3=-.005

Ŷi=(19.20 + .302X1i) + (-.076 - .005X1i)X2i

When X1 (i.e. age) = 60, then Ŷi=37.32 - .376X2i

When X1 (i.e. age) = 20, then Ŷi=25.24 - .176X2i

31

Interactive Model

32

Back to the Analysis

We’ve already looked at how you test to see if it is worthwhile to move from the additive model to the interactive model:

Model C: Ŷi= β0 +β1X1i +β2X2i

Model A: Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)

H0: β3=0

HA: β3 ≠ 0

The next topic involves the interpretation of the partial regression coefficients.

33

Interpreting Partial Regression Coefficients

Ŷi= β0 +β1X1i +β2X2i

Additive model: we’ve covered this in previous chapters. The values of β1 and β2 are the slopes of the regression of Y on that variable when the other variable is held constant (i.e. the slope across values of the other variable). Look back at the scatterplots for the additive model: β1 is the slope of the relationship between Y and X1 across various values of X2; note that the slope doesn’t change.

34

Interpreting Partial Regression Coefficients

Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)

Interactive model: when X1 and X2 interact, then the slope of the relationship between Y and X1 changes across values of X2 so what does β1 reflect?

Answer: β1 is the slope of the relationship between Y and X1 when X2=0. Note: the slope will be different for other values of X2.

Likewise: β2 is the slope of the relationship between Y and X2 when X1=0.

35

Interpreting β1 and β2 (cont.)

So, β1 is the slope of the regression of Y on X1 when X2=0, or in other words, the slope of the regression of Time on Age for runners who trained 0 miles per week (even though none of our runners trained that little).

β2 is the slope of the regression of Y on X2 when X1=0, or in other words, the slope of the regression of Time on Miles for runners who are 0 years old!

This is not what we are interested in!

36

Better Alternative

A better alternative, for when scores of zero on our predictor variables are not of interest, is to use mean deviation scores instead (this is called ‘centering’ our data):

Then regress Y on X’1 and X’2

Ŷi=β0 +β1X’1i +β2X’2i +β3(X’1iX’2i)

X′1i = X1i − X̄1    X′2i = X2i − X̄2
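A minimal sketch of centering (the data values below are hypothetical):

```python
# Centering ('mean deviation') sketch: subtract each predictor's mean,
# then form the product of the centered scores for the interaction term.

def center(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

x1 = [25, 40, 55, 60]                   # hypothetical ages
x2 = [10, 30, 20, 40]                   # hypothetical training miles
x1c, x2c = center(x1), center(x2)
prod = [a*b for a, b in zip(x1c, x2c)]  # interaction term X'1 * X'2
print(sum(x1c), sum(x2c))               # both 0 by construction
```

Because each centered variable has mean zero, “X′ = 0” now corresponds to the mean of the original variable.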

37

Interpreting β1 and β2 Now

So, β1 is still the slope of the regression of Y on X1 when X′2=0, but now X′2=0 when X2 equals the mean of X2, which is much more relevant: we now have the relationship between Time and Age for runners who trained an average amount.

β2 is the slope of the regression of Y on X2 when X’1=0, but now X’1=0 when X1=the mean of X1, i.e., we now have the relationship between Time and Miles for runners who were at the average age of our sample.

38

Interpreting β0

For the model:Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)

β0 is the value of Y when all the predictor scores equal zero (rarely of interest)

For the model:Ŷi=β0 +β1X’1i +β2X’2i +β3(X’1iX’2i)

β0 = μY (due to the use of mean deviation scores) and the confidence interval for β0 is thus the confidence interval for μY

39

Interpreting β3

Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)

β3 represents how much the slope for one variable changes as the other variable changes by 1. It is not influenced by whether you use X1 or X′1, or X2 or X′2. So β3 would be the same in both of the following models:

Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i) Ŷi=β0 +β1X’1i +β2X’2i +β3(X’1iX’2i)

But the values of β0 , β1 and β2 would be different in the two models.

40

Interpreting β3 (cont.)

Important note: β3 represents the interaction of X1 and X2 only when both of those variables are included by themselves in the model.

For example, in the following model β3 would not represent the interaction of X1 and X2 because β2X2i is not included in the model:

Ŷi=β0 +β1X1i +β3(X1iX2i)

41

Other Transformations

As we have seen, using X′ = (X − mean of X) allows us to have meaningful β’s, as the partial regression coefficient is the simple relationship of the corresponding variable when the other variable equals its mean.

We can use other transformations. X1i”=(X1i-50) allows us to look at the simple relationship between miles (X2) and time (Y) when age (X1)=50.

42

Regular model: Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)

Model with transformed X1: Ŷi=β0 +β1X″1i +β2X2i +β3(X″1iX2i)

Transforming the X1i score to X″1i will:

1. Affect the value of β2 (as it now gives the slope for the relationship between X2 and Y when X1=50).

2. Not affect β1 (the slope of the relationship between X1 and Y when X2=0).

3. Not affect β3 (the slope of the interaction term is not affected by transformations of its components as long as all components are included in the model).

43

Power Considerations

The confidence interval formula is the same for all partial regression coefficients, whether of interactive terms or not:

bp ± √[ Fcrit · MSE / ( Σ(Xpi − X̄p)² · (1 − R²p.12...p) ) ]

44

Power Considerations

Smaller confidence intervals mean more power:

1. Smaller MSE (i.e. error in the model) means more power.

2. Larger tolerance (1 − R²) means more power.

bp ± √[ Fcrit · MSE / ( Σ(Xpi − X̄p)² · (1 − R²p.12...p) ) ]
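The half-width computation behind those two points can be sketched directly (all input numbers below are hypothetical):

```python
import math

# CI half-width for a partial regression coefficient:
# sqrt(F_crit * MSE / (SSX_p * tolerance)), where SSX_p is the sum of squared
# deviations of X_p and tolerance = 1 - R^2 of X_p regressed on the others.

def ci_halfwidth(f_crit, mse, ssx, tolerance):
    return math.sqrt(f_crit * mse / (ssx * tolerance))

low_tol = ci_halfwidth(f_crit=3.97, mse=9.0, ssx=2000.0, tolerance=0.10)
high_tol = ci_halfwidth(f_crit=3.97, mse=9.0, ssx=2000.0, tolerance=0.80)
print(low_tol > high_tol)  # True: larger tolerance -> narrower CI -> more power
```

Both levers are visible in the denominator: shrink MSE or raise tolerance and the interval shrinks.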

45

Power, Transformations, and Redundancy

If you use transformed scores (e.g. mean deviations), this can affect the redundancy of the interaction term with its component terms (which you might expect to change the confidence intervals and thus the power), but the change in redundancy is completely counterbalanced by changes in MSE. Thus using transformed scores will not affect the confidence intervals or power. So...

46

The Point Being...

If your stat package won’t let you include an interaction term because it is too redundant with its component terms (i.e. its tolerance is too low), then you can try using mean deviation component terms (which will change the redundancy of the interaction term with its components without altering the confidence interval of the interaction term).

47

Polynomial (Non-linear) Regressions

What we have learned about how to examine the interaction of variables also provides exactly what we need to see if there might be non-linear relationships between the predictor variables and the criterion variable (Y).

48

Polynomial (Non-linear) Regressions

Let’s say we suspect that the relationship between Time and Miles is not the same across all levels of Miles. In other words, adding 5 more miles per week of training when you are currently at 10 miles per week, will have a different effect than adding 5 more miles when you are currently training at 50 miles per week.

To say that Miles+5 has a different effect when Miles=10 than when Miles=50 is to say that the slope is different at 10 than at 50.

49

X2 Interacting With Itself

In essence we are saying that X2 is interacting with itself.

Previous model:Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)

This model (ignore X1 and use X2 twice):

Ŷi=β0 +β1X2i +β2X2i +β3(X2iX2i)

50

Interaction Model

Ŷi=β0 +β1X2i +β2X2i +β3(X2iX2i), or,

Ŷi=β0 +β1X2i +β2X2i +β3(X2i²)

However, we cannot calculate the b’s because the variables that go with β1 and β2 are completely redundant (they are the same variable, thus tolerance =0), so we drop one of them (which makes conceptual sense in terms of model building), and get:

Ŷi=β0 +β1X2i +β2(X2i²)
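A hypothetical sketch of this: simulate a curved time-vs-miles relationship from the quadratic equation estimated later in the chapter, and recover the b’s by ordinary least squares with a squared column added (noise omitted so the recovery is exact):

```python
import numpy as np

# Quadratic regression = linear regression with an added X2^2 column.
rng = np.random.default_rng(1)
miles = rng.uniform(5, 55, 60)               # X2, hypothetical values
time = 37.47 - 0.753*miles + 0.008*miles**2  # noise-free quadratic

X = np.column_stack([np.ones_like(miles), miles, miles**2])
b, *_ = np.linalg.lstsq(X, time, rcond=None)
print(np.round(b, 3))  # ≈ [37.47, -0.753, 0.008]
```

Nothing about the fitting machinery changes; the “nonlinearity” lives entirely in the constructed X2² column.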

51

In Terms of Simple Relationship

Now let’s once again organize this into the simple relationship between Y and X2 so we can see how it works.

Model: Ŷi= β0 +β1X2i+β2 X2i²

Ŷi= (intercept) + (slope)X2i

Ŷi= (β0 –β2X2i²) + (β1 +2β2X2i )X2i

Where did those terms for the intercept and slope come from? I’ll show you later, right now take my word for it.

52

Simple Relationship (cont.)

Ŷi= (intercept) + (slope)X2i

Ŷi= (β0 –β2X2i²) + (β1 +2β2X2i )X2i

Note that the value of X2 influences both the intercept and the slope of its simple relationship with Y. Thus the relationship (i.e. the slope) between X2 and Y changes across values of the predictor variable.

53

Ŷi=b0 +b1X2i +b2(X2i²)

b0=37.47 b1=-.753 b2=.008

Ŷi= 37.47 - .753X2i + .008(X2i²)

Ŷi= (intercept) + (slope)X2i

Ŷi= (b0 - b2X2i²) + (b1 + 2b2X2i)X2i

Ŷi= (37.47 - .008X2i²) + (-.753 + 2(.008)X2i)X2i

When X2=0, Ŷi=37.47 + (-.753)X2i

When X2=20, Ŷi=34.27 + (-.433)X2i
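Both lines are instances of the tangent-line formula, which we can sketch as a small helper (the function name is mine; values use the rounded coefficient estimates quoted above):

```python
# Tangent (simple) regression line of the quadratic model at a given X2:
# intercept = b0 - b2*X2^2, slope = b1 + 2*b2*X2.

def tangent_line(x2, b0=37.47, b1=-0.753, b2=0.008):
    return (b0 - b2*x2**2, b1 + 2*b2*x2)

print(tangent_line(0))   # (37.47, -0.753)
print(tangent_line(20))  # ≈ (34.27, -0.433) with these rounded coefficients
```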

54

Nonlinear Relationship

The relationship between Time and Miles at any particular value of Miles is the line that is tangent to the curve at that point.

55

Nonlinear Relationship

More importantly: the above line is the regression line we are fitting to the data with the squared term in the Model.

56

Interpreting β0

Model: Ŷi= β0 +β1X2i+β2 X2i²

β0 is the predicted value of Y when X2 = 0. In other words, it is the predicted time for a runner who trains zero miles per week. If we used mean deviation scores, this would instead be the predicted time for a runner who trains an average number of miles per week.

57

Interpreting β1

Model: Ŷi= β0 +β1X2i+β2 X2i²

β1 is the slope of the relationship between Y and X2 when X2 = 0. The slope will be different at other values of X2.

58

Interpreting β2

Model: Ŷi= β0 +β1X2i+β2 X2i²

β2 times 2 is how much the slope of the relationship between Y and X2 changes when X2 increases by 1.

Why times 2?

Ŷi=(intercept) + (slope)X2i

Ŷi=(β0 - β2X2i²) + (β1 + 2β2X2i)X2i

When X2 changes by 1, the slope is affected by 2 times β2. Another way of saying it is that β2 is half of how much the slope changes.

59

Interpreting β2 (cont.)

This interpretation of the coefficient for a quadratic (or higher) term only applies if all of its component terms are included in the model.

Ŷi= β0 +β1X2i+β2 X2i²

The interpretation of β2 depends upon β1 being there.

Ŷi= β0 +β1X2i+β2 X2i² +β3 X2i³

The interpretation of β3 depends upon β1 and β2 being there.

60

Testing Significance of the Quadratic (i.e. X²) Term

Test significance as you always do using the model comparison approach.

To test the overall model that includes the quadratic term:

Model C: Ŷi=β0

Model A: Ŷi=β0 +β1X2i +β2(X2i²)

H0: β1 = β2 = 0
HA: at least one of those betas is not zero.

61

Testing Significance of the Quadratic (i.e. X²) Term

To test whether adding the quadratic term is worthwhile compared to a linear model:

Model C: Ŷi= β0 +β1X2i

Model A: Ŷi=β0 +β1X2i +β2(X2i²)

The test of the partial regression coefficient does this for you.

62

What About the Linear Term (i.e. X)?

Model: Ŷi=β0 +β1X2i +β2(X2i²)

The t tests for the regression coefficients will tell you whether each β is significantly different from 0. What if, in the example above, β2 is significant but β1 is not? Should you drop β1X2i from your model and keep β2(X2i²)? No: the components of X2² (in this case just X2) give the analysis of X2² its meaning. If the model included X³ we would need to include X² and X in the model for the analysis of X³ to have meaning, and so on.

63

Why?

Our goal is to move forward a step at a time in the complexity of the model. We start with what can be explained linearly, then see how much can be explained above and beyond that by including a quadratic term (i.e. the partial correlation of adding the quadratic term last to a model that contains the linear term). We lose that meaning of the quadratic partial correlation if the linear term is dropped from the model.

Also note that the correlation between two powers of a variable (e.g. X and X²) tends to be very high, meaning that they are quite redundant, and it is not surprising that the linear term might be non-significant when the quadratic term is in the model.

64

Mean Deviation Scores

If mean deviation scores are used (i.e. X′) then:

Ŷi=β0 +β1X’2i +β2(X’2i²)

1. The coefficient for X (i.e. β1) is the slope of the simple relationship between X and Y when X equals its mean.

2. The coefficient for the quadratic term (i.e. β2) is not affected (as long as all of its components are included in the model).

65

General Approach for Arriving at ‘Simple’ Relationships

Being able to turn a complicated model into the simple relationship between Y and the various predictors can be a big aid in understanding how the model works.

In other words, turning Ŷi= β0 +β1X1i + ... + βpXpi into:

Ŷi=(intercept)+(slope)X1i

Ŷi=(intercept)+(slope)X2i

etc.

66

General Approach

We need to find what is called the ‘partial derivative’ of the model for the particular variable whose simple relationship with Y we would like to examine. We will symbolize the partial derivative as: Modelpd

67

General Approach

Then to create the simple relationship of Ŷi=(intercept)+(slope)X, where:

1. Intercept = Model – (Modelpd)(X)

2. Slope = Modelpd

I would say stop there, but if you must know...combining the simple formula and the two pieces given above...

Ŷi=(Model – (Modelpd)(X)) + (Modelpd)X, which, while correct, just looks confusing, and there is no reason to go there.

68

Example

Model we will be working with: Ŷi=β0 +β1X1i +β2X2i +β3(X1i²)

We want to know the simple relationship between X1 and Y.

To make the notation simple, we will call the predictor variable of interest X, and the other predictor variables other letters (in this case we will use Z to stand for X2).

Ŷi=β0 +β1X +β2Z +β3(X²)

69

Rules for Arriving at the Partial Derivative

1. To find the partial derivative of items that are summed together, find the partial derivative of each item and add those together.

2. The partial derivative of aXᵐ is amXᵐ⁻¹. Note that:

a. X¹ = X, so the partial derivative of the term 3X² would be (3)(2)(X¹) = 6X

b. X⁰ = 1, so the partial derivative of the term 2X would be (2)(1)(X⁰) = (2)(1)(1) = 2

3. The partial derivative of any term that doesn’t contain X is 0.

70

Solution for our Example

Model: β0 + β1X + β2Z + β3X²

Modelpd: 0 + (1)(β1)(X⁰) + 0 + (2)(β3)(X¹)

Modelpd: β1 + 2β3X

intercept=Model – (Modelpd)X= β0 + β1X +β2Z + β3X² – (β1 + 2β3X)X

= β0 + β1X +β2Z + β3X² - (β1X + 2β3X²)

= β0 + β1X +β2Z + β3X² - β1X - 2β3X²

= β0 +β2Z - β3X²

slope= Modelpd=β1 + 2β3X

Ŷi=(intercept) + (slope)X

Ŷi=(β0 +β2Z - β3X²) + (β1 + 2β3X)X
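A quick numeric sanity check of that derivative (coefficient values are arbitrary, chosen to match the worked example later in the deck): a central finite difference of the model with respect to X should agree with β1 + 2β3X.

```python
# Check that d/dX of b0 + b1*X + b2*Z + b3*X^2 equals b1 + 2*b3*X,
# using a central finite difference.
b0, b1, b2, b3 = 3.0, 2.5, 6.1, 7.0

def model(x, z):
    return b0 + b1*x + b2*z + b3*x**2

def d_model_dx(x, z, h=1e-6):
    return (model(x + h, z) - model(x - h, z)) / (2 * h)

x, z = 2.0, 5.0
print(d_model_dx(x, z), b1 + 2*b3*x)  # both ≈ 30.5
```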

71

Interpretation

Ŷi=(β0 +β2Z - β3X²) + (β1 + 2β3X)X

So what does this tell us? It tells us that, for any particular value of X, the relationship between X and Y (i.e. the slope) is affected by the value of X itself, and that the intercept (which moves the regression line up or down on the Y axis) is influenced by both Z and X. This may not seem all that important, but in some complex models it can lead to a better understanding of the relationship between X and Y (to see what role the other variables, and X itself, play in that relationship).

72

Interpretation

Ŷi=(β0 +β2Z - β3X²) + (β1 + 2β3X)X

We can also plug in specific values for X and Z, and SPSS’s estimates of the β’s, to see the relationship between Y and X at that point. For example, if SPSS computes b0=3, b1=2.5, b2=6.1, b3=7, and we want to know the relationship between Y and X when X=2 and Z=5, then we have:

Ŷi=(3 +6.1(5) – 7(4)) + (2.5 + 2(7)(2))X

Ŷi=5.5+30.5X