1 The Power of Regression Previous Research Literature Claim Foreign-owned manufacturing plants have...


The Power of Regression

• Previous Research Literature Claim
  • Foreign-owned manufacturing plants have greater levels of strike activity than domestic plants
  • In Canada, strike rates of 25.5% versus 20.3%
• Budd’s Claim
  • Foreign-owned plants are larger and located in strike-prone industries
  • Need multivariate regression analysis!


The Power of Regression

Dependent Variable: Strike Incidence

                                 (1)        (2)        (3)
U.S. Corporate Parent            0.230**    0.201*     0.065
(Canadian Parent omitted)        (0.117)    (0.119)    (0.132)
Number of Employees (1000s)      ---        0.177**    0.094**
                                            (0.019)    (0.020)
Industry Effects?                No         No         Yes
Sample Size                      2,170      2,170      2,170

* Statistically significant at the 0.10 level; ** at the 0.05 level (two-tailed tests).


Important Regression Topics

• Prediction
  • Various confidence and prediction intervals
• Diagnostics
  • Are assumptions for estimation & testing fulfilled?
• Specifications
  • Quadratic terms? Logarithmic dep. vars.?
• Additional hypothesis tests
  • Partial F tests
• Dummy dependent variables
  • Probit and logit models


Confidence Intervals

• The true population [whatever] is within the following interval $100(1-\alpha)\%$ of the time:

  $\text{Estimate} \pm t_{\alpha/2} \times \text{Standard Error}_{\text{Estimate}}$

• Just need
  • Estimate
  • Standard Error
  • Shape / Distribution (including degrees of freedom)
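The three ingredients listed above are enough to compute an interval by hand. The following is a minimal sketch, not a definitive implementation: the function name `confidence_interval` is made up for illustration, and the t critical value 2.306 (95%, 8 degrees of freedom) is assumed to be read from a standard t table. The estimate and standard error are the `hours` coefficient from the hours-of-study example in these slides.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# Assumption: two-tailed 95% critical value for 8 d.f., taken from a t table.
T_CRIT_95_DF8 = 2.306

# Slope (2.122) and its standard error (0.621) from the exam-score regression.
lower, upper = confidence_interval(2.122, 0.621, T_CRIT_95_DF8)
print(f"95% CI for the hours coefficient: ({lower:.3f}, {upper:.3f})")
```

Up to rounding of the inputs, this should agree with the "Lower 95%" and "Upper 95%" columns that Excel reports for `hours`.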


Prediction Interval for New Observation at xp

1. Point Estimate
2. Standard Error
3. Shape
   • t distribution with n-k-1 d.f.
4. So prediction interval for a new observation is
   Point Estimate ± $t_{\alpha/2,\,n-k-1}$ × Standard Error

Siegel, p. 481


Prediction Interval for Mean Observations at xp

1. Point Estimate
2. Standard Error
3. Shape
   • t distribution with n-k-1 d.f.
4. So the interval for the mean of observations at xp is
   Point Estimate ± $t_{\alpha/2,\,n-k-1}$ × Standard Error

Siegel, p. 483


Earlier Example

Hours of Study (x) and Exam Score (y) Example

Regression Statistics
Multiple R          0.770
R Squared           0.594
Adj. R Squared      0.543
Standard Error     10.710
Obs.               10

ANOVA
             df    SS          MS          F         Significance
Regression    1    1340.452    1340.452    11.686    0.009
Residual      8     917.648     114.706
Total         9    2258.100

             Coeff.    Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept    39.401    12.153       3.242    0.012     11.375      67.426
hours         2.122     0.621       3.418    0.009      0.691       3.554

$\bar{x} = 18.80$

1. Find 95% CI for Joe’s exam score (studies for 20 hours)
2. Find 95% CI for mean score for those who studied for 20 hours
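Both exercises can be worked from the regression output alone. The sketch below assumes the standard simple-regression standard errors, $\sqrt{S_e^2(1 + 1/n) + S_b^2(x_p - \bar{x})^2}$ for a new observation and $\sqrt{S_e^2/n + S_b^2(x_p - \bar{x})^2}$ for the mean, where $S_e$ is the regression standard error and $S_b$ is the standard error of the slope. The function name `interval_at_x` and the table value 2.306 for $t_{0.025,8}$ are assumptions made for illustration.

```python
import math


def interval_at_x(b0, b1, x0, x_bar, n, s_e, s_b, t_crit, new_observation):
    """Interval for y at x0 in a one-predictor regression.

    new_observation=True  -> prediction interval for one new observation
    new_observation=False -> confidence interval for the mean at x0
    """
    y_hat = b0 + b1 * x0
    slope_part = (s_b * (x0 - x_bar)) ** 2
    if new_observation:
        variance = s_e ** 2 * (1 + 1 / n) + slope_part
    else:
        variance = s_e ** 2 / n + slope_part
    margin = t_crit * math.sqrt(variance)
    return y_hat - margin, y_hat + margin


# Inputs from the exam-score output; 2.306 is t(0.025, 8) from a t table.
common = dict(b0=39.401, b1=2.122, x0=20, x_bar=18.80, n=10,
              s_e=10.710, s_b=0.621, t_crit=2.306)

joe_low, joe_high = interval_at_x(**common, new_observation=True)
mean_low, mean_high = interval_at_x(**common, new_observation=False)
print(f"Joe's score: ({joe_low:.2f}, {joe_high:.2f})")
print(f"Mean score:  ({mean_low:.2f}, {mean_high:.2f})")
```

The interval for Joe is much wider than the interval for the mean because a single new observation carries its own error variance in addition to the uncertainty in the fitted line.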


Diagnostics / Misspecification

• For estimation & testing to be valid…
  • y = b0 + b1x1 + b2x2 + … + bkxk + e makes sense
  • Errors ($e_i$) are independent
    • of each other
    • of the independent variables
  • Homoskedasticity
    • Error variance independent of the independent variables
    • $\sigma_e^2$ is a constant
    • $\mathrm{Var}(e_i) \neq x_i^2$ (i.e., not heteroskedasticity)

Violations render our inferences invalid and misleading!


Common Problems

• Misspecification
  • Omitted variable bias
  • Nonlinear rather than linear relationship
  • Levels, logs, or percent changes?
• Data Problems
  • Skewed variables and outliers
  • Multicollinearity
  • Sample selection (non-random data)
  • Missing data
• Problems with residuals (error terms)
  • Non-independent errors
  • Heteroskedasticity


Omitted Variable Bias

• Question 3 from Sample Exam B (standard errors in parentheses)

  wage = 9.05 + 1.39 union
        (1.65)  (0.66)

  wage = 9.56 + 1.42 union + 3.87 ability
        (1.49)  (0.56)       (1.56)

  wage = -3.03 + 0.60 union + 0.25 revenue
         (0.70)  (0.45)       (0.08)

• H. Farber thinks the average union wage is different from the average nonunion wage because unionized employers are more selective and hire individuals with higher ability.

• M. Friedman thinks the average union wage is different from the average nonunion wage because unionized employers have different levels of revenue per employee.
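One way to weigh the two explanations is to see what happens to the union coefficient when each control is added. The sketch below computes t ratios (coefficient divided by standard error) for the union coefficient in the three equations. The dictionary keys are labels invented here, and the cutoff of roughly 2 is only a large-sample rule of thumb, since the slide does not report the sample size or degrees of freedom.

```python
# (coefficient, standard error) of the union dummy in each equation above.
union_estimates = {
    "union only": (1.39, 0.66),
    "union + ability": (1.42, 0.56),
    "union + revenue": (0.60, 0.45),
}

# t ratio = coefficient / standard error.
t_ratios = {name: coef / se for name, (coef, se) in union_estimates.items()}

for name, t in t_ratios.items():
    # Rule-of-thumb comparison only; the exact critical value needs the d.f.
    verdict = "above" if abs(t) > 2 else "below"
    print(f"{name:16s} t = {t:.2f} ({verdict} the rough cutoff of 2)")
```

Controlling for ability leaves the union coefficient essentially unchanged, while controlling for revenue cuts it by more than half and drops its t ratio well below 2, which is the pattern one would expect if revenue per employee, rather than ability, were the omitted variable.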


Checking the Assumptions

• How to check the validity of the assumptions?
  • Cynicism, Realism, and Theory
  • Robustness Checks
    • Check different specifications
    • But don’t just choose the best one!
  • Automated Variable Selection Methods
    • e.g., Stepwise regression (Siegel, p. 547)
  • Misspecification and Other Tests
  • Examine Diagnostic Plots


Diagnostic Plots

[Plot: Residuals (vertical axis) against Predicted Values (horizontal axis)]

Increasing spread might indicate heteroskedasticity. Try transformations or weighted least squares.


Diagnostic Plots

[Plot: Residuals (vertical axis) against Predicted Values (horizontal axis)]

“Tilt” from outliers might indicate skewness. Try log transformation.


Problematic Outliers

Stock Performance and CEO Golf Handicaps (New York Times, 5-31-98)

Without 7 “Outliers”

Number of obs = 44      R-squared = 0.1718
------------------------------------------------
 stockrating |    Coef.   Std. Err.      t    P>|t|
-------------+----------------------------------
    handicap |   -1.711       .580   -2.95    0.005
       _cons |   73.234      8.992    8.14    0.000
------------------------------------------------

With the 7 “Outliers”

Number of obs = 51      R-squared = 0.0017
------------------------------------------------
 stockrating |    Coef.   Std. Err.      t    P>|t|
-------------+----------------------------------
    handicap |    -.173       .593   -0.29    0.771
       _cons |   55.137      9.790    5.63    0.000
------------------------------------------------


Are They Really Outliers??

Stock Performance and CEO Golf Handicaps (New York Times, 5-31-98)

Diagnostic Plot is OK

[Plot: Residuals (vertical axis) against Predicted Values (horizontal axis)]

BE CAREFUL!


Diagnostic Plots

[Plot: Residuals (vertical axis) against Predicted Values (horizontal axis)]

Curvature might indicate nonlinearity. Try quadratic specification.


Diagnostic Plots

[Plot: Residuals (vertical axis) against Predicted Values (horizontal axis)]

Good diagnostic plot. Lacks obvious indications of other problems.


Adding Squared (Quadratic) Term

Job Performance regression on Salary (in $1,000s) (Egg Data)

      Source |     SS      df      MS        Number of obs =    576
-------------+-------------------------      F(2,573)      = 122.42
       Model |  255.61      2    127.8       Prob > F      = 0.0000
    Residual |  598.22    573    1.044       R-squared     = 0.2994
-------------+-------------------------      Adj R-squared = 0.2969
       Total |  853.83    575    1.485       Root MSE      = 1.0218
------------------------------------------------------------
job performance |      Coef.   Std. Err.      t    P>|t|
----------------+-------------------------------------------
         salary |   .0980844   .0260215    3.77    0.000
 salary squared |   -.000337   .0001905   -1.77    0.077
          _cons |  -1.720966   .8720358   -1.97    0.049
------------------------------------------------------------

Salary Squared = Salary² [=salary^2 in Excel]


Quadratic Regression

[Figure: Job Performance (vertical axis, 0 to 8) against Annual Salary in 1000s (horizontal axis, 30 to 150), with the fitted quadratic (nonlinear) regression curve]

Job perf = -1.72 + 0.098 salary - 0.00034 salary squared


Quadratic Regression

[Figure: Job Performance (vertical axis, 0 to 8) against Annual Salary in 1000s (horizontal axis, 30 to 190), with the fitted quadratic regression curve]

Job perf = -1.72 + 0.098 salary - 0.00034 salary squared

Effect of salary will eventually turn negative

But where?

$\text{Max} = \dfrac{-\,\text{linear coeff.}}{2 \times \text{quadratic coeff.}}$
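Applying the formula to the estimated coefficients answers the "But where?" question. The sketch below uses the unrounded coefficients from the Stata output for the salary regression; the function names are made up for illustration.

```python
def quadratic_turning_point(linear_coef, quadratic_coef):
    """x value at which b1*x + b2*x**2 reaches its maximum (or minimum)."""
    return -linear_coef / (2 * quadratic_coef)


def marginal_effect(x, linear_coef, quadratic_coef):
    """Slope of the fitted curve at x: b1 + 2*b2*x."""
    return linear_coef + 2 * quadratic_coef * x


B_SALARY = 0.0980844      # coefficient on salary (in $1,000s)
B_SALARY_SQ = -0.000337   # coefficient on salary squared

peak_salary = quadratic_turning_point(B_SALARY, B_SALARY_SQ)
print(f"Predicted performance peaks at about ${peak_salary:.1f} thousand")
print(f"Slope at 100: {marginal_effect(100, B_SALARY, B_SALARY_SQ):+.4f}")
print(f"Slope at 160: {marginal_effect(160, B_SALARY, B_SALARY_SQ):+.4f}")
```

The turning point is a salary of roughly 145.5 (in $1,000s): below it an extra thousand dollars is associated with higher predicted performance, above it with lower.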


Another Specification Possibility

• If data are very skewed, can try a log specification
  • Can use logs instead of levels for independent and/or dependent variables
  • Note that the interpretation of the coefficients will change
• Re-familiarize yourself with Siegel, pp. 68-69


Quick Note on Logs

• a is the natural logarithm of x if:

  $2.71828^a = x$, or $e^a = x$

• The natural logarithm is abbreviated “ln”
  • ln(x) = a
• In Excel, use ln function
  • We call this the “log” but don’t use the “log” function!
• Usefulness: spreads out small values and narrows large values, which can reduce skewness
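The same naming trap exists in reverse outside Excel. As a small illustration in Python (an assumption of this sketch, since the slides themselves use Excel), `math.log` is the natural logarithm and `math.log10` is the base-10 logarithm. The sample values are arbitrary numbers chosen to show how logs pull in a long right tail.

```python
import math

# math.log is the natural log (Excel's LN); math.log10 is base 10.
natural = math.log(1000.0)
common = math.log10(1000.0)
print(f"ln(1000) = {natural:.4f}, log10(1000) = {common:.4f}")

# e raised to ln(x) recovers x.
recovered = math.exp(math.log(20.0))

# Arbitrary right-skewed values: equal ratios become equal steps in logs.
levels = [10.0, 100.0, 1000.0, 10000.0]
logs = [math.log(v) for v in levels]
gaps_in_levels = [b - a for a, b in zip(levels, levels[1:])]
gaps_in_logs = [b - a for a, b in zip(logs, logs[1:])]
print("gaps in levels:", gaps_in_levels)
print("gaps in logs:  ", [round(g, 4) for g in gaps_in_logs])
```

The gaps between successive values grow tenfold in levels but are constant in logs, which is the sense in which the transformation narrows large values.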


Earnings Distribution

Weekly Earnings from the March 2002 CPS, n=15,000

Skewed to the right


Residuals from Levels Regression

Residuals from a regression of Weekly Earnings on demographic characteristics

Skewed to the right—use of t distribution is suspect


Log Earnings Distribution

Natural Logarithm of Weekly Earnings from the March 2002 CPS, i.e., =ln(weekly earnings)

Not perfectly symmetrical, but better


Residuals from Log Regression

Residuals from a regression of Log Weekly Earnings on demographic characteristics

Almost symmetrical—use of t distribution is probably OK


Hypothesis Tests

• We’ve been doing hypothesis tests for single coefficients
  • $H_0: \beta = 0$; reject if $|t| > t_{\alpha/2,\,n-k-1}$
  • $H_A: \beta \neq 0$
• What about testing more than one coefficient at the same time?
  • e.g., want to see if an entire group of 10 dummy variables for 10 industries should be in the model
• Joint tests can be conducted using partial F tests


Partial F Tests

$H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_C = 0$
$H_A:$ at least one $\beta_i \neq 0$

• How to test this?
• Consider two regressions
  • One as if H0 is true
    • i.e., $\beta_1 = \beta_2 = \beta_3 = \dots = \beta_C = 0$
    • This is a “restricted” (or constrained) model
  • Plus a “full” (or unconstrained) model in which the computer can estimate what it wants for each coefficient


Partial F Tests

• Statistically, need to distinguish between
  • Full regression “no better” than the restricted regression
    – versus –
  • Full regression is “significantly better” than the restricted regression
• To do this, look at variance of prediction errors
  • If this declines significantly, then reject H0
• From ANOVA, we know ratio of two variances has an F distribution
  • So use F test


Partial F Tests

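$F = \dfrac{\left(SS_{\text{residual}}^{\text{Restricted}} - SS_{\text{residual}}^{\text{Full}}\right)/C}{SS_{\text{residual}}^{\text{Full}}/(n-k-1)}$

• $SS_{\text{residual}}$ = Sum of Squares Residual
• C = # constraints
• The partial F statistic has C, n-k-1 degrees of freedom
• Reject H0 if $F > F_{\alpha,\,C,\,n-k-1}$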


Coal Mining Example (Again)

Regression Statistics
R Squared            0.955
Adj. R Squared       0.949
Standard Error     108.052
Obs.                47

ANOVA
              df    SS              MS             F          Significance
Regression     6     9975694.933    1662615.822    142.406    0.000
Residual      40      467007.875      11675.197
Total         46    10442702.809

              Coeff.     Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept    -168.510    258.819      -0.651   0.519     -691.603    354.583
hours           1.244      0.186       6.565   0.000        0.001      0.002
tons            0.048      0.403       0.119   0.906       -0.001      0.001
unemp          19.618      5.660       3.466   0.001        8.178     31.058
WWII          159.851     78.218       2.044   0.048        1.766    317.935
Act1952        -9.839    100.045      -0.098   0.922     -212.038    192.360
Act1969      -203.010    111.535      -1.820   0.076     -428.431     22.411


Minitab Output

Predictor      Coef     StDev       T       P
Constant     -168.5     258.8   -0.65   0.519
hours        1.2235     0.186    6.56   0.000
tons         0.0478     0.403    0.12   0.906
unemp        19.618     5.660    3.47   0.001
WWII         159.85     78.22    2.04   0.048
Act1952        -9.8     100.0   -0.10   0.922
Act1969      -203.0     111.5   -1.82   0.076

S = 108.1    R-Sq = 95.5%    R-Sq(adj) = 94.9%

Analysis of Variance
Source        DF          SS         MS        F       P
Regression     6     9975695    1662616   142.41   0.000
Error         40      467008      11675
Total         46    10442703


Is the Overall Model Significant?

$H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_6 = 0$
$H_A:$ at least one $\beta_i \neq 0$

• Note: for testing the overall model, C = k
  • i.e., testing all coefficients together
• From the previous slides, we have $SS_{\text{residual}}$ for the “full” (or unconstrained) model
  • $SS_{\text{residual}}$ = 467,007.875
• But what about for the restricted (H0 true) regression?
  • Estimate a constant-only regression


Constant-Only Model

Regression Statistics
R Squared            0
Adj. R Squared       0
Standard Error     476.461
Obs.                47

ANOVA
              df    SS              MS            F    Significance
Regression     0               0             0    .    .
Residual      46    10442702.809    227015.278
Total         46    10442702.809

              Coeff.     Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept     671.937    69.499       9.668    0.0000    532.042     811.830


Partial F Tests

$H_0: \beta_1 = \beta_2 = \beta_3 = \dots = \beta_6 = 0$
$H_A:$ at least one $\beta_i \neq 0$

$F = \dfrac{(10{,}442{,}702.809 - 467{,}007.875)/6}{467{,}007.875/(47-6-1)} = 142.406$

• Reject H0 if $F > F_{\alpha,\,C,\,n-k-1} = F_{0.05,\,6,\,40} = 2.34$
• 142.406 > 2.34 so reject H0. Yes, overall model is significant


Select F Distribution

5% Critical Values

                          Numerator Degrees of Freedom
Denominator
Degrees of Freedom      1       2       3       4       5       6      …
1                     161     199     216     225     230     234
2                    18.5    19.0    19.2    19.2    19.3    19.3
3                    10.1    9.55    9.28    9.12    9.01    8.94
8                    5.32    4.46    4.07    3.84    3.69    3.58
10                   4.96    4.10    3.71    3.48    3.33    3.22
11                   4.84    3.98    3.59    3.36    3.20    3.09
12                   4.75    3.89    3.49    3.26    3.11    3.00
18                   4.41    3.55    3.16    2.93    2.77    2.66
40                   4.08    3.23    2.84    2.61    2.45    2.34
1000                 3.85    3.00    2.61    2.38    2.22    2.11
…


A Small Shortcut

[Coal mining regression output, identical to the table under “Coal Mining Example (Again)”]

For the constant-only model, SSresidual = 10,442,702.809, which is the Total SS in the ANOVA table of the full model.

So to test the overall model, you don’t need to run a constant-only model.


An Even Better Shortcut

[Coal mining regression output, identical to the table under “Coal Mining Example (Again)”]

In fact, the ANOVA table F test (F = 142.406) is exactly the test for the overall model being significant (recall Unit 8).


Testing Any Subset

[Coal mining regression output, identical to the table under “Coal Mining Example (Again)”]

Partial F test can be used to test any subset of variables

For example,
$H_0: \beta_{\text{WWII}} = \beta_{\text{Act1952}} = \beta_{\text{Act1969}} = 0$
$H_A:$ at least one $\beta_i \neq 0$


Restricted Model

Restricted regression with WWII = Act1952 = Act1969 = 0

Regression Statistics
R Squared            0.942
Adj. R Squared       0.938
Standard Error     118.651
Obs.                47

ANOVA
              df    SS              MS             F          Significance
Regression     3     9837344.76     3279114.920    232.923    0.000
Residual      43      605358.049      14078.094
Total         46    10442702.809

              Coeff.     Std. Error   t stat    p value
Intercept     147.821    166.406       0.888    0.379
hours          0.0015      0.0001     20.522    0.000
tons          -0.0008      0.0003     -2.536    0.015
unemp           7.298      4.386       1.664    0.103


Partial F Tests

$H_0: \beta_{\text{WWII}} = \beta_{\text{Act1952}} = \beta_{\text{Act1969}} = 0$
$H_A:$ at least one $\beta_i \neq 0$

$F = \dfrac{(605{,}358.049 - 467{,}007.875)/3}{467{,}007.875/(47-6-1)} = 3.950$

• Reject H0 if $F > F_{\alpha,\,C,\,n-k-1} = F_{0.05,\,3,\,40} = 2.84$
• 3.95 > 2.84 so reject H0. Yes, the subset of three coefficients is jointly significant
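The partial F arithmetic is easy to script once the two residual sums of squares are in hand. Below is a minimal sketch; the function name `partial_f` and its argument names are made up for illustration, and the critical values 2.34 and 2.84 are the ones quoted on the slides rather than computed.

```python
def partial_f(ss_resid_restricted, ss_resid_full, num_constraints, df_resid_full):
    """Partial F statistic comparing a restricted model to the full model."""
    numerator = (ss_resid_restricted - ss_resid_full) / num_constraints
    denominator = ss_resid_full / df_resid_full
    return numerator / denominator


SS_RESID_FULL = 467007.875   # full coal mining model
DF_RESID_FULL = 47 - 6 - 1   # n - k - 1 = 40

# Overall test: restricted model is constant-only, C = 6.
f_overall = partial_f(10442702.809, SS_RESID_FULL, 6, DF_RESID_FULL)

# Subset test: WWII = Act1952 = Act1969 = 0, C = 3.
f_subset = partial_f(605358.049, SS_RESID_FULL, 3, DF_RESID_FULL)

print(f"Overall F = {f_overall:.3f} (5% critical value 2.34)")
print(f"Subset  F = {f_subset:.3f} (5% critical value 2.84)")
```

Both statistics exceed their critical values, so both null hypotheses are rejected at the 5% level, matching the conclusions on the slides.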


Regression and Two-Way ANOVA

           Treatments
Blocks     A     B     C
1         10     9     8
2         12     6     5
3         18    15    14
4         20    18    18
5          8     7     8

“Stack” data using dummy variables

A   B   C   B2  B3  B4  B5  Value
1   0   0   0   0   0   0   10
1   0   0   1   0   0   0   12
1   0   0   0   1   0   0   18
1   0   0   0   0   1   0   20
1   0   0   0   0   0   1    8
0   1   0   0   0   0   0    9
0   1   0   1   0   0   0    6
0   1   0   0   1   0   0   15
0   1   0   0   0   1   0   18
0   1   0   0   0   0   1    7
0   0   1   0   0   0   0    8
…
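The stacking step can be automated. The sketch below is one possible way to build the dummy-variable rows from the block-by-treatment table; the function name `stack_with_dummies` and the dictionary layout are assumptions made for illustration. Block 1 is the omitted block category, as in the table above, so it has no dummy of its own.

```python
# Two-way table from the slide: block -> {treatment: value}.
table = {
    1: {"A": 10, "B": 9, "C": 8},
    2: {"A": 12, "B": 6, "C": 5},
    3: {"A": 18, "B": 15, "C": 14},
    4: {"A": 20, "B": 18, "C": 18},
    5: {"A": 8, "B": 7, "C": 8},
}


def stack_with_dummies(two_way):
    """Return one row per cell with treatment dummies and block dummies B2..B5."""
    treatments = ["A", "B", "C"]
    blocks = sorted(two_way)
    rows = []
    for treatment in treatments:
        for block in blocks:
            # Treatment dummies: 1 for this row's treatment, 0 otherwise.
            row = {name: int(name == treatment) for name in treatments}
            # Block dummies for every block except the first (omitted) one.
            for other in blocks[1:]:
                row[f"B{other}"] = int(other == block)
            row["Value"] = two_way[block][treatment]
            rows.append(row)
    return rows


stacked = stack_with_dummies(table)
for row in stacked[:6]:
    print(row)
```

In the regression itself, one of the treatment dummies (A in the output that follows in these slides) is also left out so that the intercept can be estimated.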


Recall Two-Way Results

ANOVA: Two-Factor Without Replication

Source of Variation    SS         df    MS        F         P-value   F crit
Blocks                 312.267     4    78.067    38.711    0.000     3.84
Treatment               26.533     2    13.267     6.579    0.020     4.46
Error                   16.133     8     2.017
Total                  354.933    14


Regression and Two-Way ANOVA

    Source |     SS       df     MS         Number of obs =     15
-----------+--------------------------      F( 6, 8)      =  28.00
     Model |  338.800      6   56.467       Prob > F      = 0.0001
  Residual |   16.133      8    2.017       R-squared     = 0.9545
-----------+--------------------------      Adj R-squared = 0.9205
     Total |  354.933     14   25.352       Root MSE      = 1.4201

-------------------------------------------------------------
 treatment |    Coef.   Std. Err.      t    P>|t|   [95% Conf. Int]
-----------+-------------------------------------------------
         b |   -2.600      .898    -2.89   0.020    -4.671    -.529
         c |   -3.000      .898    -3.34   0.010    -5.071    -.929
        b2 |   -1.333     1.160    -1.15   0.283    -4.007    1.340
        b3 |    6.667     1.160     5.75   0.000     3.993    9.340
        b4 |    9.667     1.160     8.34   0.000     6.993   12.340
        b5 |   -1.333     1.160    -1.15   0.283    -4.007    1.340
     _cons |   10.867      .970    11.20   0.000     8.630   13.104
-------------------------------------------------------------


Regression and Two-Way ANOVA

Regression Excerpt for Full Model
  Source |     SS       df     MS
---------+---------------------------
   Model |  338.800      6   56.467
Residual |   16.133      8    2.017
---------+---------------------------
   Total |  354.933     14   25.352

Regression Excerpt for b2 = b3 = … = 0
  Source |     SS       df     MS
---------+---------------------------
   Model |   26.533      2   13.267
Residual |  328.40      12   27.367
---------+---------------------------
   Total |  354.933     14   25.352

Regression Excerpt for b = c = 0
  Source |     SS       df     MS
---------+---------------------------
   Model |  312.267      4   78.067
Residual |   42.667     10    4.267
---------+---------------------------
   Total |  354.933     14   25.352

Use these SSresidual values to do partial F tests and you will get exactly the same answers as the Two-Way ANOVA tests
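The claim can be checked directly from the three excerpts. The sketch below is a hedged illustration (the helper name `f_from_ss` is invented here): it plugs the SSresidual values into the partial F formula and compares the results with the F values in the two-way ANOVA table.

```python
def f_from_ss(ss_resid_restricted, ss_resid_full, num_constraints, df_resid_full):
    """Partial F statistic from restricted and full residual sums of squares."""
    drop_per_constraint = (ss_resid_restricted - ss_resid_full) / num_constraints
    return drop_per_constraint / (ss_resid_full / df_resid_full)


SS_FULL, DF_FULL = 16.133, 8

# Dropping the four block dummies (b2 = b3 = b4 = b5 = 0) tests Blocks.
f_blocks = f_from_ss(328.40, SS_FULL, 4, DF_FULL)

# Dropping the two treatment dummies (b = c = 0) tests Treatment.
f_treatment = f_from_ss(42.667, SS_FULL, 2, DF_FULL)

print(f"Blocks:    F = {f_blocks:.3f} (ANOVA table: 38.711)")
print(f"Treatment: F = {f_treatment:.3f} (ANOVA table: 6.579)")
```

Any small differences in the last digit come from the rounding of the sums of squares shown on the slides.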


Select F Distribution

5% Critical Values

                          Numerator Degrees of Freedom
Denominator
Degrees of Freedom      1       2       3       4       5       6       9      …
1                     161     199     216     225     230     234     241
2                    18.5    19.0    19.2    19.2    19.3    19.3    19.4
3                    10.1    9.55    9.28    9.12    9.01    8.94    8.81
8                    5.32    4.46    4.07    3.84    3.69    3.58    3.39
10                   4.96    4.10    3.71    3.48    3.33    3.22    3.02
11                   4.84    3.98    3.59    3.36    3.20    3.09    2.90
12                   4.75    3.89    3.49    3.26    3.11    3.00    2.80
18                   4.41    3.55    3.16    2.93    2.77    2.66    2.46
40                   4.08    3.23    2.84    2.61    2.45    2.34    2.12
1000                 3.85    3.00    2.61    2.38    2.22    2.11    1.89
∞                    3.84    3.00    2.60    2.37    2.21    2.10    1.88


3 Seconds of Calculus

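$\dfrac{\partial b_0}{\partial x} = 0$ if $b_0$ is a constant

$\dfrac{\partial (b_1 x)}{\partial x} = b_1$

$\dfrac{\partial \log(x)}{\partial x} = \dfrac{1}{x}$, so $\partial \log(x) = \dfrac{\partial x}{x}$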


Regression Coefficients

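• y = b0 + b1x (linear form)

  $\dfrac{\partial y}{\partial x} = b_1$

  1 unit change in x changes y by b1

• log(y) = b0 + b1x (semi-log form)

  $\dfrac{\partial \log(y)}{\partial x} = \dfrac{\partial y / y}{\partial x} = \dfrac{\%\Delta y}{\Delta x} = b_1$

  1 unit change in x changes y by b1 (x100) percent

• log(y) = b0 + b1log(x) (double-log form)

  $\dfrac{\partial \log(y)}{\partial \log(x)} = \dfrac{\partial y / y}{\partial x / x} = \dfrac{\%\Delta y}{\%\Delta x} = b_1$

  1 percent change in x changes y by b1 percent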


Log Regression Coefficients

• wage = 9.05 + 1.39 union
  • Predicted wage is $1.39 higher for unionized workers (on average)
• log(wage) = 2.20 + 0.15 union
  • Semi-elasticity
  • Predicted wage is approximately 15% higher for unionized workers (on average)
• log(wage) = 1.61 + 0.30 log(profits)
  • Elasticity
  • A one percent increase in profits increases predicted wages by approximately 0.3 percent
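The word "approximately" in both interpretations can be made concrete. The sketch below compares the quick reading of each log coefficient with the exact percentage change implied by the fitted equation; the function names are made up for illustration, and the exact formulas follow from exponentiating both sides of the log equations.

```python
import math


def semilog_exact_pct(b1, delta_x=1.0):
    """Exact % change in y for a delta_x change in x when log(y) = b0 + b1*x."""
    return (math.exp(b1 * delta_x) - 1) * 100


def loglog_exact_pct(b1, pct_change_x=1.0):
    """Exact % change in y for a pct_change_x % change in x when log(y) = b0 + b1*log(x)."""
    return ((1 + pct_change_x / 100) ** b1 - 1) * 100


union_pct = semilog_exact_pct(0.15)    # union dummy in log(wage) = 2.20 + 0.15 union
profit_pct = loglog_exact_pct(0.30)    # log(wage) = 1.61 + 0.30 log(profits)

print(f"Union: approx 15%, exact {union_pct:.2f}%")
print(f"Profits: approx 0.3%, exact {profit_pct:.3f}%")
```

The approximation is very close for the elasticity and a little over one percentage point low for the union semi-elasticity; the gap widens as the coefficient gets larger.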


Multicollinearity

Auto repair records, weight, and engine size

Number of obs =     69
F( 2, 66)     =   6.84
Prob > F      = 0.0020
R-squared     = 0.1718
Adj R-squared = 0.1467
Root MSE      = .91445
----------------------------------------------
repair |     Coef.   Std. Err.      t    P>|t|
-------+--------------------------------------
weight |   -.00017     .00038   -0.41    0.685
engine |   -.00313     .00328   -0.96    0.342
 _cons |   4.50161     .61987    7.26    0.000
----------------------------------------------


Multicollinearity

• Two (or more) independent variables are so highly correlated that a multiple regression can’t disentangle the unique contributions of each
  • Large standard errors and lack of statistical significance for individual coefficients
  • But joint significance
• Identifying multicollinearity
  • Some say “rule of thumb |r|>0.70” (or 0.80)
  • But better to look at results
• OK for prediction
• Bad for assessing theory


Prediction With Multicollinearity

• Prediction at the Mean (weight=3019 and engine=197)

Model for prediction    Predicted Repair    Lower 95% Limit (Mean)    Upper 95% Limit (Mean)
Multiple Regression     3.411               3.191                     3.631
Weight Only             3.412               3.193                     3.632
Engine Only             3.410               3.192                     3.629


Dummy Dependent Variables

• Dummy dependent variables
  • y = b0 + b1x1 + … + bkxk + e
  • Where y is a {0,1} indicator variable
• Examples
  • Do you intend to quit? yes / no
  • Did the worker receive training? yes / no
  • Do you think the President is doing a good job? yes / no
  • Was there a strike? yes / no
  • Did the company go bankrupt? yes / no


Linear Probability Model

• Mathematically / computationally, can estimate a regression as usual (the monkeys won’t know the difference)
• This is called a “linear probability model”
  • Right-hand side is linear
  • And is estimating probabilities
  • P(y=1) = b0 + b1x1 + … + bkxk
• b1=0.15 (for example) means that a one unit change in x1 increases probability that y=1 by 0.15 (fifteen percentage points)


Linear Probability Model

• Excel won’t know the difference, but perhaps it should
• Linear probability model problems
  • $\sigma_e^2 = P(y=1)[1-P(y=1)]$
    • But $P(y=1) = b_0 + b_1x_1 + \dots + b_kx_k$
    • So $\sigma_e^2$ is not constant (heteroskedasticity)
  • Predicted probabilities are not bounded by 0,1
  • $R^2$ is not an accurate measure of predictive ability
    • Can use a pseudo-$R^2$ measure
    • Such as percent correctly predicted
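Two of these problems are easy to see numerically. The sketch below uses hypothetical coefficients (b0 = 0.20 and b1 = 0.15 are invented purely for illustration and are not estimates from any data in these slides) to show that the error variance changes with x and that a predicted "probability" can exceed 1.

```python
def lpm_probability(x, b0=0.20, b1=0.15):
    """Linear probability model prediction; coefficients are hypothetical."""
    return b0 + b1 * x


def error_variance(p):
    """Variance of a 0/1 outcome with success probability p."""
    return p * (1 - p)


for x in range(0, 7):
    p = lpm_probability(x)
    if 0 <= p <= 1:
        note = f"error variance = {error_variance(p):.4f}"
    else:
        note = "outside [0, 1], so not a valid probability"
    print(f"x = {x}: predicted P(y=1) = {p:.2f}, {note}")
```

The variance rises toward 0.25 as the predicted probability approaches 0.5 and falls again beyond it, and by x = 6 the prediction has left the unit interval altogether.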


Logit Model & Probit Model

• Solution to these problems is to use nonlinear functional forms that bound P(y=1) between 0,1
• Logit Model (logistic regression)

  $P(y=1) = \dfrac{e^{b_0 + b_1x_1 + b_2x_2 + \dots + b_kx_k}}{1 + e^{b_0 + b_1x_1 + b_2x_2 + \dots + b_kx_k}}$

• Probit Model

  $P(y=1) = \Phi(b_0 + b_1x_1 + b_2x_2 + \dots + b_kx_k)$

  • Where $\Phi$ is the normal cumulative distribution function

Recall, ln(x) = a when $e^a = x$
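Both functional forms take the linear index b0 + b1x1 + … + bkxk and squeeze it into the interval between 0 and 1. A minimal sketch follows, using only the Python standard library; the function names are invented for illustration and the index values are arbitrary.

```python
import math
from statistics import NormalDist


def logit_probability(index):
    """Logistic function e^z / (1 + e^z) applied to the linear index z."""
    return math.exp(index) / (1 + math.exp(index))


def probit_probability(index):
    """Standard normal cumulative distribution function applied to z."""
    return NormalDist().cdf(index)


# Arbitrary index values, from very negative to very positive.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"z = {z:6.1f}: logit = {logit_probability(z):.4f}, "
          f"probit = {probit_probability(z):.4f}")
```

However large or small the index gets, both predicted probabilities stay strictly between 0 and 1, which is exactly what the linear probability model cannot guarantee.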


Logit Model & Probit Model

• Nonlinear so need statistical package to do the calculations
• Can do individual (z-tests, not t-tests) and joint statistical testing as with other regressions
  • Also confidence intervals
• Need to convert coefficients to marginal effects for interpretation
• Should be aware of these models
  • Though in many cases, a linear probability model works just fine


Example

• Dep. Var: 1 if you know of the FMLA, 0 otherwise

Probit estimates                       Number of obs =   1189
                                       LR chi2(14)   = 232.39
                                       Prob > chi2   = 0.0000
Log likelihood = -707.94377            Pseudo R2     = 0.1410
------------------------------------------------------------
FMLAknow |    Coef.   Std. Err.      z    P>|z|   [95% Conf. Int]
---------+--------------------------------------------------
   union |     .238      .101     2.35   0.019     .039     .436
     age |    -.002      .018    -0.13   0.897    -.038     .033
   agesq |     .135      .219     0.62   0.536    -.293     .564
nonwhite |    -.571      .098    -5.80   0.000    -.764    -.378
  income |    1.465      .393     3.73   0.000     .696    2.235
incomesq |   -5.854     2.853    -2.05   0.040   -11.45    -.262
[other controls omitted]
   _cons |   -1.188      .328    -3.62   0.000   -1.831    -.545
------------------------------------------------------------


Marginal Effects

• For numerical interpretation / prediction, need to convert coefficients to marginal effects
• Example: Logit Model

  $\log\left(\dfrac{P(y=1)}{1-P(y=1)}\right) = b_0 + b_1x_1 + b_2x_2 + \dots + b_kx_k$

  • So b1 gives effect on Log(•), not P(y=1)
  • Probit is similar
• Can re-arrange to find out effect on P(y=1)
  • Usually do this at the sample means
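For a continuous regressor, re-arranging amounts to multiplying the coefficient by the slope of the probability function at the chosen point. The sketch below assumes the standard derivative formulas, b × P(1 − P) for logit and b × φ(z) for probit, where φ is the standard normal density and z is the linear index. The function names are made up, and evaluating at z = 0 is purely an illustrative assumption because the slides do not report the index at the sample means.

```python
import math
from statistics import NormalDist


def logit_marginal_effect(coef, index):
    """dP/dx for a logit model: coef * P * (1 - P), with P evaluated at the index."""
    p = math.exp(index) / (1 + math.exp(index))
    return coef * p * (1 - p)


def probit_marginal_effect(coef, index):
    """dP/dx for a probit model: coef * standard normal density at the index."""
    return coef * NormalDist().pdf(index)


# Probit coefficient on union from the FMLA example; index of 0 is an assumption.
union_effect = probit_marginal_effect(0.238, 0.0)
print(f"Probit marginal effect at z = 0: {union_effect:.3f}")

# The same coefficient matters less far from z = 0, where the curve is flatter.
print(f"Probit marginal effect at z = 2: {probit_marginal_effect(0.238, 2.0):.3f}")
print(f"Logit marginal effect at z = 0:  {logit_marginal_effect(0.238, 0.0):.4f}")
```

With an index of 0 the probit result is about 0.095, so the marginal effect depends on where it is evaluated, not only on the coefficient. For a dummy regressor such as union, software may instead report the discrete change in probability from switching the dummy from 0 to 1, so the derivative here is only an approximation in that case.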


Marginal Effects

Probit estimates                       Number of obs =   1189
                                       LR chi2(14)   = 232.39
                                       Prob > chi2   = 0.0000
Log likelihood = -707.94377            Pseudo R2     = 0.1410
------------------------------------------------------------
FMLAknow |    dF/dx   Std. Err.      z    P>|z|   [95% Conf. Int]
---------+--------------------------------------------------
   union |     .095      .040     2.35   0.019     .017     .173
     age |    -.001      .007    -0.13   0.897    -.015     .013
   agesq |     .054      .087     0.62   0.536    -.117     .225
nonwhite |    -.222      .036    -5.80   0.000    -.293    -.151
  income |     .585      .157     3.73   0.000     .278     .891
incomesq |   -2.335     1.138    -2.05   0.040   -4.566    -.105
[other controls omitted]
------------------------------------------------------------

For numerical interpretation / prediction, need to convert coefficients to marginal effects


But Linear Probability Model is OK, Too

                  Probit Coeff.      Probit Marginal     Regression
Union              0.238 (0.101)      0.095 (0.040)       0.084 (0.035)
Nonwhite          -0.571 (0.098)     -0.222 (0.037)      -0.192 (0.033)
Income             1.465 (0.393)      0.585 (0.157)       0.442 (0.091)
Income Squared    -5.854 (2.853)     -2.335 (1.138)      -1.354 (0.316)

So regression is usually OK, but should still be familiar with logit and probit methods