Regression Analysis (with SPSS) by R.A. Yaffee (summary for dummies)

Robert A. Yaffee, Ph.D.
Statistics, Mapping and Social Science Group
Academic Computing Services, Information Technology Services
New York University
Office: 75 Third Ave, Level C3
Tel: 212.998.3402
E-mail: [email protected]
February 2004


Page 2

Outline

1. Conceptualization
2. Schematic diagrams of linear regression processes
3. Using SPSS to plot and test relationships for linearity
4. Transforming nonlinear relationships into linear ones
5. The General Linear Model
6. Derivation of the sums of squares, the ANOVA, the intercept, and the regression coefficients
7. The prediction interval and its derivation
8. Model assumptions: explanation, testing, and assessment
9. Alternatives when assumptions are unfulfilled

Page 3

Conceptualization of Regression Analysis

• Hypothesis testing
• Path analytical decomposition of effects

Page 4

Hypothesis Testing

• For example, hypothesis 1: X is statistically significantly related to Y.
  – The relationship is positive (as X increases, Y increases) or negative (as X increases, Y decreases).
  – The magnitude of the relationship is small, medium, or large. If the magnitude is small, then a unit change in X is associated with a small change in Y.

Page 5

Regression Analysis

Have a clear notion of what you can and cannot do with regression analysis.

• Conceptualization: a path model of a regression analysis

Path Diagram of a Linear Regression Analysis

[Diagram: predictors X1, X2, and X3, plus an error term, each point to Y]

  Yi = k + b1*X1i + b2*X2i + b3*X3i + ei

Page 6

A Path Analysis: Decomposition of Effects into Direct, Indirect, Spurious, and Total Effects

[Path diagram: X1 and X2 point to the intermediate variables Y1 and Y2, which point to Y3; the paths are labeled A through F, and each endogenous variable has an error term]

• Direct effects on Y3: paths C, E, F
• Indirect effects on Y3: compound paths such as AC, BE, DF, and BDF
• Total effects = sum of the direct and indirect effects
• Spurious effects are due to common (antecedent) causes

In a path analysis, Yi is endogenous: it is the outcome of several paths.

Page 7

Interaction Analysis

[Diagram: X1, X2, and their product X1*X2 point to Y via paths A, B, and C]

  Y = K + A*X1 + B*X2 + C*X1*X2

• The interaction coefficient is C.
• X1 and X2 must both be in the model for the interaction to be properly specified.
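The specification above can be checked numerically. A minimal numpy sketch (numpy is an assumption; the deck itself works in SPSS) on hypothetical, noise-free data: with X1, X2, and the product X1*X2 all in the design matrix, least squares recovers K, A, B, and C.

```python
import numpy as np

# Hypothetical data generated from Y = K + A*X1 + B*X2 + C*X1*X2
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + 0.5 * x1 * x2   # K=1, A=2, B=-1.5, C=0.5

# The design matrix includes X1 and X2, not just the product,
# so the interaction is properly specified.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
K, A, B, C = np.linalg.lstsq(X, y, rcond=None)[0]
```

Omitting the X1 or X2 column would fold their main effects into C and misstate the interaction.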

Page 8

A Precursor to Modeling with Regression

• Data exploration: run a scatterplot matrix and search for linear relationships with the dependent variable.

Page 9

Click on Graphs and then on Scatter.

Page 10

When the scatterplot dialog box appears, select Matrix.

Page 11

A matrix of scatterplots will appear. Search for distinct linear relationships.

Pages 12–13: [Scatterplot matrix screenshots]

Page 14

Decomposition of the Sums of Squares

Page 15

Graphical Decomposition of Effects

[Plot: the regression line y = a + b*x through a scatter of (X, Y) points, with the mean Ȳ marked]

For each case i, the deviation of Yi from the mean decomposes as:

  Yi − Ȳ (total effect) = (Yi − Ŷi) (error) + (Ŷi − Ȳ) (regression effect)

Page 16

Decomposition of the sum of squares

  total effect = error effect + regression (model) effect

Per case i:

  Yi − Ȳ = (Yi − Ŷi) + (Ŷi − Ȳ)

  (Yi − Ȳ)² = [(Yi − Ŷi) + (Ŷi − Ȳ)]²

For the data set, summing over i = 1, …, n (the cross-product term sums to zero):

  Σ(Yi − Ȳ)² = Σ(Yi − Ŷi)² + Σ(Ŷi − Ȳ)²
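The identity can be verified numerically. A minimal numpy sketch (numpy is an assumption; the deck works in SPSS) on a small hypothetical data set:

```python
import numpy as np

# Hypothetical bivariate data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

b, a = np.polyfit(x, y, 1)             # slope, intercept
y_hat = a + b * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y - y_hat) ** 2)         # error (residual) sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression (model) sum of squares
# For least squares with an intercept, sst == sse + ssr
```

The cross-product term vanishes precisely because the least-squares residuals are orthogonal to the fitted values.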

Page 17

Decomposition of the sum of squares

• Total SS = model SS + error SS:

  Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)²

• Dividing each sum of squares by its degrees of freedom (n − 1 for the total, k for the model, and n − k − 1 for the error, with k predictors) yields the variance decomposition: total variance = model variance + error variance.

Page 18

F test for significance and R² for magnitude of effect

• R² = model SS / total SS:

  R² = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)²

• The F test for model significance = model variance / error variance:

  F(k, n−k−1) = (R² / k) / [(1 − R²) / (n − k − 1)]
              = [Σ(Ŷi − Ȳ)² / k] / [Σ(Yi − Ŷi)² / (n − k − 1)]
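The two expressions for F above are algebraically the same, which can be confirmed numerically. A numpy sketch on hypothetical data (numpy is an assumption; the deck works in SPSS):

```python
import numpy as np

# Hypothetical bivariate data, one predictor (k = 1)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])
n, k = len(y), 1

b, a = np.polyfit(x, y, 1)
y_hat = a + b * x
ssr = np.sum((y_hat - y.mean()) ** 2)   # model SS
sse = np.sum((y - y_hat) ** 2)          # error SS
sst = ssr + sse

r2 = ssr / sst
f_from_ss = (ssr / k) / (sse / (n - k - 1))          # from sums of squares
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))      # from R^2
```

Both routes give the same F statistic; the R² form is just the sums-of-squares form divided through by the total SS.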

Page 19

The ANOVA tests the significance of the regression model.

Page 20

The Multiple Regression Equation

  Yi = a + b1*x1i + b2*x2i + ei

• We proceed to the derivation of its components:
  – the intercept: a
  – the regression parameters: b1 and b2

Page 21

Derivation of the Intercept

Starting from the model for each case i:

  yi = a + b*xi + ei
  ei = yi − a − b*xi

Summing over the n cases:

  Σei = Σyi − n*a − b*Σxi

Because by definition Σei = 0:

  0 = Σyi − n*a − b*Σxi
  n*a = Σyi − b*Σxi
  a = ȳ − b*x̄

Page 22

Derivation of the Regression Coefficient

Given:  yi = a + b*xi + ei

  ei = yi − a − b*xi
  Σei² = Σ(yi − a − b*xi)²

Minimizing the sum of squared errors with respect to b, set the derivative to zero:

  ∂(Σei²)/∂b = −2*Σxi*(yi − a − b*xi) = 0

With x and y expressed as deviations from their means, this reduces to:

  Σxi*yi = b*Σxi²

  b = Σxi*yi / Σxi²
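The two closed forms just derived, b = Σxy/Σx² (in deviation form) and a = ȳ − b*x̄, can be checked against a library fit. A numpy sketch on hypothetical data (numpy is an assumption; the deck works in SPSS):

```python
import numpy as np

# Hypothetical bivariate data
x = np.array([1.0, 2.0, 4.0, 5.0, 7.0])
y = np.array([2.0, 4.0, 5.0, 8.0, 9.0])

xd, yd = x - x.mean(), y - y.mean()        # deviation form
b = np.sum(xd * yd) / np.sum(xd ** 2)      # slope from the derivation
a = y.mean() - b * x.mean()                # intercept from the derivation

b_ref, a_ref = np.polyfit(x, y, 1)         # reference least-squares fit
```

The hand-derived a and b match np.polyfit to machine precision.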

Page 23

• Recall that the formula for the correlation coefficient can be expressed as follows:

Page 24

  r = Σxi*yi / sqrt(Σxi² * Σyi²)

  b = Σxi*yi / Σxi²

where xi = Xi − X̄ and yi = Yi − Ȳ (deviations from the means).

From these it can be seen that the regression coefficient b is a function of r:

  b = r * (sd_y / sd_x)
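The relation b = r * (sd_y / sd_x) can be confirmed numerically. A numpy sketch on hypothetical data (numpy is an assumption; the deck works in SPSS):

```python
import numpy as np

# Hypothetical bivariate data
x = np.array([1.0, 3.0, 4.0, 6.0, 8.0, 9.0])
y = np.array([2.0, 3.0, 5.0, 6.0, 9.0, 8.0])

xd, yd = x - x.mean(), y - y.mean()
b = np.sum(xd * yd) / np.sum(xd ** 2)          # slope in deviation form

r = np.corrcoef(x, y)[0, 1]                    # correlation coefficient
b_from_r = r * y.std(ddof=1) / x.std(ddof=1)   # b = r * sd_y / sd_x
```

Any consistent choice of ddof works, since the degrees-of-freedom factor cancels in the ratio of standard deviations.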

Page 25

Extending the Bivariate Case to the Multiple Linear Regression Case

Page 26

For two predictors, the bivariate slope generalizes to the partial regression coefficients:

  b_yx1.x2 = [(r_yx1 − r_yx2 * r_x1x2) / (1 − r_x1x2²)] * (sd_y / sd_x1)    (6)

  b_yx2.x1 = [(r_yx2 − r_yx1 * r_x1x2) / (1 − r_x1x2²)] * (sd_y / sd_x2)    (7)

It is also easy to extend the bivariate intercept to the multivariate case as follows:

  a = Ȳ − b1*x̄1 − b2*x̄2    (8)

Page 27

Significance Tests for the Regression Coefficients

1. We find the significance of the parameter estimates by using the F or t test.
2. The R² is the proportion of variance explained.
3. Adjusted R² = 1 − (1 − R²) * (n − 1) / (n − p − 1),
   where n = sample size and p = number of parameters in the model.
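As a small worked example of the adjustment formula above (a sketch with hypothetical numbers, not output from the employee data):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical values: R^2 = .80 with n = 50 cases and p = 4 parameters
adj = adjusted_r2(0.80, n=50, p=4)
```

The adjusted value is always below the raw R² whenever p > 0, penalizing the model for each parameter it spends.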

Page 28

F and t tests for significance of the overall model

  F = model variance / error variance
    = (R² / p) / [(1 − R²) / (n − p − 1)]

where p = number of parameters and n = sample size.

For a single parameter, t = sqrt(F); in the bivariate case the t statistic can also be written:

  t(n−2) = r * sqrt(n − 2) / sqrt(1 − r²)

Page 29

Significance tests

• If we are using a Type II sum of squares, we are dealing with the ballantine (the Venn diagram of shared variance). DV variance explained = a + b.

Page 30

Significance tests

t tests for the statistical significance of the intercept and the regression coefficient:

  t = (a − 0) / SE_a

  t = (b − 0) / SE_b

Page 31

Significance tests

Standard error of the intercept:

  SE_a = sqrt{ [Σ(Yi − Ŷi)² / (n − 2)] * Σxi² / [n * Σ(xi − x̄)²] }

Standard error of the regression coefficient:

  SE_b = sqrt[ σ̂² / Σ(xi − x̄)² ],   where σ̂² = Σei² / (n − 2) is the variance of the residuals
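These scalar formulas agree with the general matrix form Var(β̂) = σ̂²(X'X)⁻¹. A numpy sketch on hypothetical data (numpy is an assumption; the deck works in SPSS):

```python
import numpy as np

# Hypothetical bivariate data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.0, 2.5, 4.1, 4.4, 6.0, 6.2, 8.1])
n = len(y)

b, a = np.polyfit(x, y, 1)
e = y - (a + b * x)
s2 = np.sum(e ** 2) / (n - 2)                        # residual variance

sxx = np.sum((x - x.mean()) ** 2)
se_b = np.sqrt(s2 / sxx)                              # scalar SE(b)
se_a = np.sqrt(s2 * np.sum(x ** 2) / (n * sxx))       # scalar SE(a)

# Matrix form: Var(beta_hat) = s2 * (X'X)^{-1}
X = np.column_stack([np.ones(n), x])
cov = s2 * np.linalg.inv(X.T @ X)
se_a_matrix = np.sqrt(cov[0, 0])
se_b_matrix = np.sqrt(cov[1, 1])
```

Both routes give identical standard errors, so the t tests on the previous page can be computed either way.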

Page 32

Programming Protocol

After invoking SPSS, proceed to File, Open, Data.

Page 33

Select a data set (we choose employee.sav) and click on Open.

Page 34

We open the data set.

Page 35

To inspect the variable formats, click on Variable View on the lower left.

Page 36

Because gender is a string variable, we need to recode gender into a numeric format.

Page 37

We autorecode gender by clicking on Transform and then Autorecode.

Page 38

We select gender and move it into the variable box on the right.

Page 39

Give the variable a new name and click on Add New Name.

Page 40

Click on OK and the numeric variable sex is created. It has values 1 for female and 2 for male, and those value labels are inserted.

Page 41

To invoke regression analysis, click on Analyze.

Page 42

Click on Regression and then Linear.

Page 43

Select the dependent variable: Current Salary.

Page 44

Enter it in the dependent variable box.

Page 45

Entering independent variables

• These variables are entered in blocks. First come the potentially confounding covariates that have to be entered.
• We enter time on job, beginning salary, and previous experience.

Page 46

After entering the covariates, we click on Next.

Page 47

We now enter the hypotheses we wish to test.

• We are testing for minority or sex differences in salary after controlling for time on job, previous experience, and beginning salary.
• We enter minority and numeric gender (sex).

Page 48

After entering these variables, click on Statistics.

Page 49

We select the following statistics from the dialog box and click on Continue.

Page 50

Click on Plots to obtain the plots dialog box.

Page 51

We click on OK to run the regression analysis.

Page 52

Navigation window (left) and output window (right). This shows that SPSS is reading the variables correctly.

Page 53

Variables Entered and Model Summary

Page 54

Omnibus ANOVA: significance tests for the model at each stage of the analysis

Page 55

Full Model Coefficients

The estimated equation relates CurSal to BeginSal, Jobtime, Exper, gender, and Minority; the coefficient magnitudes (signs lost in transcription) are:

  Intercept: 12036.3    BeginSal: 1.83
  Jobtime:   165.17     Exper:    23.64
  gender:    2882.84    Minority: 1419.7

Page 56

We omit the insignificant variables and rerun the analysis to obtain the trimmed model coefficients (magnitudes, signs lost in transcription):

  Intercept: 12126.5    BeginSal: 1.85
  Jobtime:   163.20     Exper:    24.36
  gender:    2694.30

Page 57

Beta weights

• These are standardized regression coefficients, used to compare the contribution each predictor makes to explaining the variance of the dependent variable within the model.

Page 58

t tests and significance

• These are the tests of significance for each parameter estimate.
• The significance level has to be less than .05 for the parameter to be statistically significant.

Page 59

Assumptions of the Linear Regression Model

1. Linear functional form
2. Fixed independent variables
3. Independent observations
4. Representative sample and proper specification of the model (no omitted variables)
5. Normality of the residuals or errors
6. Equality of variance of the errors (homogeneity of residual variance)
7. No multicollinearity
8. No autocorrelation of the errors
9. No outlier distortion

Page 60

Explanation of the Assumptions

1. Linear functional form: a linear model does not detect curvilinear relationships.
2. Independent observations: requires a representative sample. Autocorrelation inflates the t, r, and F statistics and warps the significance tests.
3. Normality of the residuals: permits proper significance testing.
4. Equality of variance: heteroskedasticity precludes generalization and external validity, and it also warps the significance tests.
5. Multicollinearity prevents proper parameter estimation. It may also preclude computation of the parameter estimates completely if it is serious enough.
6. Outlier distortion may bias the results: if outliers have high influence and the sample is not large enough, they may seriously bias the parameter estimates.

Page 61

Diagnostic Tests for the Regression Assumptions

1. Linearity: regression curve fitting; no level shifts (one regime)
2. Independence of observations: runs test
3. Normality of the residuals: Shapiro-Wilk or Kolmogorov-Smirnov test
4. Homogeneity of variance of the residuals: White's general specification test
5. No autocorrelation of residuals: Durbin-Watson test, or the ACF or PACF of the residuals
6. Multicollinearity: correlation matrix of the independent variables; condition index or condition number
7. No serious outlier influence: tests of additive outliers (pulse dummies)
   1. Plot the residuals and look for high leverage
   2. Lists of standardized residuals
   3. Lists of Studentized residuals
   4. Cook's distance or leverage statistics

Page 62

Explanation of Diagnostics

1. Plots show linearity or nonlinearity of the relationship.
2. The correlation matrix shows whether the independent variables are collinear and correlated.
3. A representative sample is obtained with probability sampling.

Page 63

Explanation of Diagnostics

Tests for normality of the residuals: the residuals are saved and then tested. The Kolmogorov-Smirnov test compares the theoretical cumulative normal distribution against the cumulative distribution of your residuals (in SPSS: Nonparametric Tests, 1-Sample K-S).

Page 64

Collinearity Diagnostics

Tolerance = 1 − R_j², where R_j² comes from regressing independent variable j on the other independent variables.

• Small tolerances imply problems.
• Small intercorrelations among the independent variables mean the tolerance is near 1 and the VIF is near 1.

Variance Inflation Factor: VIF = 1 / Tolerance

• A VIF of 10 or more signifies problems.
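The tolerance/VIF computation can be sketched directly from its definition. A numpy sketch on hypothetical data (numpy is an assumption; the deck works in SPSS), where one predictor is deliberately made nearly collinear with another:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
x3 = x1 + 0.1 * rng.normal(size=300)   # x3 nearly collinear with x1

def vif(target, others):
    """VIF_j = 1 / (1 - R_j^2), from regressing predictor j on the others."""
    X = np.column_stack([np.ones(len(target)), *others])
    coef = np.linalg.lstsq(X, target, rcond=None)[0]
    resid = target - X @ coef
    r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
    return 1.0 / (1.0 - r2)

vif_x2 = vif(x2, [x1, x3])   # independent predictor: VIF near 1
vif_x3 = vif(x3, [x1, x2])   # collinear predictor: VIF large
```

The collinear predictor's tolerance is close to zero, so its VIF explodes, exactly the pattern SPSS flags in its collinearity diagnostics table.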

Page 65

More Collinearity Diagnostics

Condition number = maximum eigenvalue / minimum eigenvalue. If the condition number is between 100 and 1000, there is moderate to strong collinearity.

Condition index = sqrt(k), where k = the condition number. If the condition index is > 30, there is strong collinearity.

Page 66

Outlier Diagnostics

1. Residuals: the actual value minus the predicted value, otherwise known as the error.
2. Studentized residuals: the residuals divided by their standard errors computed without the ith observation.
3. Leverage, called the hat diag: the measure of influence of each observation.
4. Cook's distance: the change in the statistics that results from deleting the observation. Watch this if it is much greater than 1.0.

Page 67

Outlier detection

• Outlier detection involves determining whether the residual (error = actual − predicted) is an extreme negative or positive value.
• After running the regression, we may plot the residuals versus the fitted values to determine which errors are large.

Page 68

Create Standardized Residuals

• A standardized residual is a residual divided by the standard deviation of the residuals:

  standardized residual_i = (yi − ŷi) / s,   where s = the standard deviation of the residuals
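A numpy sketch of the computation (numpy and the data are assumptions; the deck works in SPSS), with one hypothetical gross outlier planted so it stands out on the standardized scale:

```python
import numpy as np

# Hypothetical data: roughly y = x, except the last point is wild
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.2, 2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 30.0])

b, a = np.polyfit(x, y, 1)
resid = y - (a + b * x)                   # raw residuals
std_resid = resid / resid.std(ddof=1)     # standardized residuals
```

The planted point produces the largest standardized residual by a wide margin, which is how such cases are spotted in a listing of standardized residuals.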

Page 69

Limits of Standardized Residuals

• If standardized residuals have values in excess of 3.5 or below −3.5, they are outliers.
• If the absolute values are less than 3.5, as these are, then there are no outliers.
• While outliers by themselves only distort mean prediction when the sample size is small enough, it is important to gauge the influence of outliers.

Page 70

Outlier Influence

• Suppose we had a different data set with two outliers.
• We tabulate the standardized residuals and obtain the following output:

Page 71

Outlier a does not distort the results and outlier b does.

Page 72

Studentized Residuals

• Alternatively, we could form studentized residuals. These are distributed as a t distribution with df = n − p − 1, though they are not quite independent. Therefore, we can approximately determine whether they are statistically significant.
• Belsley et al. (1980) recommended the use of studentized residuals.

Page 73

Studentized Residual

  e_si = ei / [s(i) * sqrt(1 − hi)]

where
  e_si = the studentized residual
  s(i) = the standard deviation of the residuals with the ith observation deleted
  hi   = the leverage statistic

These are useful in estimating the statistical significance of a particular observation, for which a dummy-variable indicator can be formed. The t value of the studentized residual will indicate whether or not that observation is a significant outlier. The command to generate studentized residuals, here called rstudt, is (in Stata): predict rstudt, rstudent
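A numpy sketch of the computation (numpy is an assumption; the deck works in SPSS and cites Stata). It uses the standard deletion identity for s(i), and cross-checks against the equivalent route through internally studentized residuals:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2                                  # p = columns of X (intercept + slope)
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)        # hypothetical data

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
h = np.diag(H)                                # leverages
e = y - H @ y                                 # residuals
s2 = e @ e / (n - p)

# s(i)^2 via the standard deletion identity, then e_si = e_i / (s(i)*sqrt(1-h_i))
s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
t_ext = e / np.sqrt(s2_i * (1 - h))

# Equivalent route through internally studentized residuals r_i
r = e / np.sqrt(s2 * (1 - h))
t_alt = r * np.sqrt((n - p - 1) / (n - p - r**2))
```

No n refits are needed: the deletion identity delivers every s(i) from one fit.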

Page 74

Influence of Outliers

1. Leverage is measured by the diagonal elements of the hat matrix.
2. The hat matrix comes from the formula for the regression of Y:

  Ŷ = X(X'X)⁻¹X'Y

where X(X'X)⁻¹X' = the hat matrix, H. Therefore,

  Ŷ = HY
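A numpy sketch (numpy is an assumption; the deck works in SPSS) confirming that H maps Y onto the fitted values and that its diagonal behaves as the next page describes:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
x = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.normal(size=n)        # hypothetical data

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix

y_hat = H @ y                                  # Y_hat = H Y
leverage = np.diag(H)                          # leverages h_i
```

With an intercept in the model, each leverage lies between 1/n and 1, and the trace of H equals the number of fitted parameters (here 2).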

Page 75

Leverage and the Hat matrix

1. The hat matrix transforms Y into the predicted scores.
2. The diagonals of the hat matrix indicate which values will be outliers.
3. The diagonals are therefore measures of leverage.
4. Leverage is bounded by two limits: 1/n and 1. The closer the leverage is to unity, the more leverage the value has.
5. The trace of the hat matrix = the number of parameters in the model.
6. When the leverage > 2p/n, there is high leverage according to Belsley et al. (1980), cited in Long, J.F., Modern Methods of Data Analysis (p. 262). For smaller samples, Velleman and Welsch (1981) suggested 3p/n as the criterion.

Page 76

Cook's D

1. Another measure of influence.
2. This is a popular one. The formula for it is:

  Cook's D_i = [hi / (1 − hi)] * [ei² / (p * s² * (1 − hi))]

Cook and Weisberg (1982) suggested that values of D that exceed the 50th percentile of the F distribution (df = p, n − p) are large.
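The closed form above agrees with the definition of Cook's D as the (scaled) shift in all fitted values when case i is deleted. A numpy sketch on hypothetical data (numpy is an assumption; the deck works in SPSS), cross-checking against direct leave-one-out refits:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 25, 2
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)        # hypothetical data
y[0] += 8.0                                    # plant one aberrant point

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta
e = y - y_hat
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
s2 = e @ e / (n - p)

D = (e**2 * h) / (p * s2 * (1 - h) ** 2)       # closed form

# Leave-one-out check: D_i = sum_j (y_hat_j - y_hat_j(i))^2 / (p * s^2)
D_loo = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    D_loo[i] = np.sum((y_hat - X @ beta_i) ** 2) / (p * s2)
```

The one-fit closed form and the n refits give identical values, which is why software can report Cook's D without refitting.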

Page 77

Using Cook's D in SPSS

• Cook's D is requested through the save options of the regression procedure.
• Finding the influential outliers: list the cases for which Cook's D > 4/n.
• Belsley suggests 4/(n − k − 1) as a cutoff.

Page 78

DFbeta

• One can use the DFbetas to ascertain the magnitude of influence that an observation has on a particular parameter estimate if that observation is deleted:

  DFbeta_j(i) = b_j − b_j(i) = u_ji * ei / [(1 − hi) * Σ u_j²]

where u_j = the residuals of the regression of x_j on the remaining x's.

Page 79

Programming Diagnostic Tests

Testing homoskedasticity: select Histogram and Normal probability plot, and insert *ZRESID in Y and *ZPRED in X. Then click on Continue.

Page 80

Click on Save to obtain the Save dialog box.

Page 81

We select the following options. Then we click on Continue, go back to the main Regression menu, and click on OK.

Page 82

Check for Linear Functional Form

• Run a matrix plot of the dependent variable against each independent variable to be sure that the relationship is linear.

Page 83

Move the variables to be graphed into the box on the upper right, and click on OK.

Page 84

Residual Autocorrelation Check

The Durbin-Watson d tests for first-order autocorrelation of the residuals:

  d = Σ_{t=2..n} (e_t − e_{t−1})² / Σ_{t=1..n} e_t²

See the significance tables for this statistic.
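The statistic is a one-liner to compute. A numpy sketch on two tiny hypothetical residual series (numpy is an assumption; SPSS reports d directly in the Model Summary):

```python
import numpy as np

def durbin_watson(e):
    """d = sum((e_t - e_{t-1})^2, t=2..n) / sum(e_t^2, t=1..n)."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Perfectly alternating residuals (strong negative autocorrelation): d well above 2
d_neg = durbin_watson(np.array([1.0, -1.0, 1.0, -1.0]))

# A slowly drifting series (strong positive autocorrelation): d near 0
d_pos = durbin_watson(np.array([1.0, 1.1, 1.2, 1.3]))
```

Values of d near 2 indicate no first-order autocorrelation; values toward 0 or 4 indicate positive or negative autocorrelation, respectively.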

Page 85

Run the autocorrelation function from the Trends module for a better analysis.

Page 86

Testing for Homogeneity of Variance

Page 87

Normality of residuals can be visually inspected from the histogram with the superimposed normal curve. Here we check the skewness for symmetry and the kurtosis for peakedness.

Page 88

Kolmogorov-Smirnov Test: an objective test of normality

Pages 89–90: [Kolmogorov-Smirnov test output]

Page 91

Multicollinearity test with the correlation matrix

Pages 92–93: [Correlation matrix output]

Page 94

Alternatives to Violations of Assumptions

• 1. Nonlinearity: transform to linearity if there is nonlinearity, or run a nonlinear regression.
• 2. Nonnormality: run a least absolute deviations regression or a median regression (available in other packages, or via generalized linear models: S-PLUS glm, Stata glm, SAS PROC MODEL or PROC GENMOD).
• 3. Heteroskedasticity: weighted least squares regression (SPSS) or the White estimator (SAS, Stata, S-PLUS). One can use a robust regression procedure (SAS, Stata, or S-PLUS) to downweight the effect of outliers in the estimation.
• 4. Autocorrelation: run AREG in the SPSS Trends module, or either the Prais or Newey-West procedure in Stata.
• 5. Multicollinearity: principal components regression, ridge regression, or proxy variables; 2SLS in SPSS, ivreg in Stata, or SAS PROC MODEL or PROC SYSLIN.

Page 95

Model Building Strategies

• Specific to general: Cohen and Cohen
• General to specific: Hendry and Richard
• Extreme bounds analysis: E. Leamer

Page 96

Nonparametric Alternatives

1. If there is nonlinearity, transform to linearity first.
2. If there is heteroskedasticity, use robust standard errors with Stata, SAS, or S-PLUS.
3. If there is non-normality, use quantile regression with bootstrapped standard errors in Stata or S-PLUS.
4. If there is autocorrelation of residuals, use Newey-West regression or a first-order autocorrelation correction with AREG. If there is higher-order autocorrelation, use Box-Jenkins ARIMA modeling.