1 The Power of Regression Previous Research Literature Claim Foreign-owned manufacturing plants have...

1

The Power of Regression

• Previous Research Literature Claim
  • Foreign-owned manufacturing plants have greater levels of strike activity than domestic plants
  • In Canada, strike rates of 25.5% versus 20.3%
• Budd’s Claim
  • Foreign-owned plants are larger and located in strike-prone industries
  • Need multivariate regression analysis!

2

The Power of Regression

Dependent Variable: Strike Incidence

                                 (1)        (2)        (3)
U.S. Corporate Parent            0.230**    0.201*     0.065
(Canadian Parent omitted)       (0.117)    (0.119)    (0.132)
Number of Employees (1000s)      ---        0.177**    0.094**
                                           (0.019)    (0.020)
Industry Effects?                No         No         Yes
Sample Size                      2,170      2,170      2,170

* Statistically significant at the 0.10 level; ** at the 0.05 level (two-tailed tests).

3

Important Regression Topics

• Prediction
  • Various confidence and prediction intervals
• Diagnostics
  • Are assumptions for estimation & testing fulfilled?
• Specifications
  • Quadratic terms? Logarithmic dep. vars.?
• Additional hypothesis tests
  • Partial F tests
• Dummy dependent variables
  • Probit and logit models

4

Confidence Intervals

• The true population [whatever] is within the following interval 100(1-α)% of the time:

$$\text{Estimate} \pm t_{\alpha/2} \times \text{Standard Error}_{\text{Estimate}}$$

• Just need
  • Estimate
  • Standard Error
  • Shape / Distribution (including degrees of freedom)
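As a quick illustration of the recipe above, the sketch below builds a 95% interval for a single coefficient. The estimate and standard error are the ones reported for the hours coefficient in the hours-of-study regression output elsewhere in these slides; the critical value 2.306 is the usual t table entry for 8 degrees of freedom. The variable names are illustrative only.

```python
# Estimate and standard error for the hours coefficient, taken from the
# hours-of-study regression output shown elsewhere in these slides.
estimate = 2.122
std_error = 0.621

# t table value for alpha/2 = 0.025 with n - k - 1 = 10 - 1 - 1 = 8 d.f.
t_crit = 2.306

# Estimate +/- t * Standard Error
lower = estimate - t_crit * std_error
upper = estimate + t_crit * std_error

print(f"95% CI for the hours coefficient: ({lower:.3f}, {upper:.3f})")
```

The result is close to the Lower 95% and Upper 95% columns of that output; the small difference in the last digit comes from using rounded inputs.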

5

Prediction Interval for New Observation at xp

1. Point Estimate
2. Standard Error
3. Shape
  • t distribution with n-k-1 d.f.
4. So prediction interval for a new observation is

$$\text{Point Estimate} \pm t_{\alpha/2,\,n-k-1} \times \text{Standard Error}$$

Siegel, p. 481


6

Prediction Interval for Mean Observations at xp

1. Point Estimate
2. Standard Error
3. Shape
  • t distribution with n-k-1 d.f.

Siegel, p. 483

7

Earlier Example

Regression Statistics
Multiple R        0.770
R Squared         0.594
Adj. R Squared    0.543
Standard Error   10.710
Obs.             10

ANOVA
             df   SS         MS         F        Significance
Regression    1   1340.452   1340.452   11.686   0.009
Residual      8    917.648    114.706
Total         9   2258.100

            Coeff.   Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept   39.401   12.153       3.242    0.012     11.375      67.426
hours        2.122    0.621       3.418    0.009      0.691       3.554

Hours of Study (x) and Exam Score (y) Example

1. Find 95% CI for Joe’s exam score (studies for 20 hours)

2. Find 95% CI for mean score for those who studied for 20 hours

x̄ = 18.80 (sample mean of hours)
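The two questions above can be worked through numerically. The sketch below assumes the standard simple-regression textbook formulas for the two standard errors (the slides cite Siegel, pp. 481 and 483, but the formulas themselves did not survive in this transcript, so treat them as an assumption): for a new observation, sqrt(Se²(1 + 1/n) + Sb²(xp - x̄)²); for the mean, sqrt(Se²/n + Sb²(xp - x̄)²). The function name `intervals` is illustrative.

```python
import math

# Values read from the regression output for the hours / exam score example.
b0, b1 = 39.401, 2.122   # intercept and slope
s_e = 10.710             # standard error of the regression
s_b1 = 0.621             # standard error of the slope
n, x_bar = 10, 18.80     # sample size and mean hours of study
t_crit = 2.306           # t table value, 95% two-sided, n - k - 1 = 8 d.f.


def intervals(x_p):
    """Return (point estimate, half-width for a new obs., half-width for the mean)."""
    y_hat = b0 + b1 * x_p
    slope_part = s_b1 ** 2 * (x_p - x_bar) ** 2
    # Assumed textbook formulas (not taken from the transcript itself).
    se_new = math.sqrt(s_e ** 2 * (1 + 1 / n) + slope_part)
    se_mean = math.sqrt(s_e ** 2 / n + slope_part)
    return y_hat, t_crit * se_new, t_crit * se_mean


y_hat, half_new, half_mean = intervals(20)
print(f"Point estimate at 20 hours: {y_hat:.2f}")
print(f"Joe (new observation): {y_hat - half_new:.2f} to {y_hat + half_new:.2f}")
print(f"Mean of all who study 20 hours: {y_hat - half_mean:.2f} to {y_hat + half_mean:.2f}")
```

Both intervals share the same point estimate; the interval for an individual is much wider because it also has to cover the person-to-person scatter around the regression line.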

8

Diagnostics / Misspecification

• For estimation & testing to be valid…
  • y = b0 + b1x1 + b2x2 + … + bkxk + e makes sense
  • Errors (ei) are independent
    • of each other
    • of the independent variables
  • Homoskedasticity
    • Error variance independent of the independent variables
    • $\sigma_e^2$ is a constant
    • $\mathrm{Var}(e_i) \not\propto x_i^2$ (i.e., not heteroskedasticity)

Violations render our inferences invalid and misleading!

9

Common Problems

• Misspecification
  • Omitted variable bias
  • Nonlinear rather than linear relationship
  • Levels, logs, or percent changes?
• Data Problems
  • Skewed variables and outliers
  • Multicollinearity
  • Sample selection (non-random data)
  • Missing data
• Problems with residuals (error terms)
  • Non-independent errors
  • Heteroskedasticity

10

Omitted Variable Bias

• Question 3 from Sample Exam B

  wage =  9.05 + 1.39 union
         (1.65)  (0.66)

  wage =  9.56 + 1.42 union + 3.87 ability
         (1.49)  (0.56)       (1.56)

  wage = -3.03 + 0.60 union + 0.25 revenue
         (0.70)  (0.45)       (0.08)

• H. Farber thinks the average union wage is different from average nonunion wage because unionized employers are more selective and hire individuals with higher ability.

• M. Friedman thinks the average union wage is different from the average nonunion wage because unionized employers have different levels of revenue per employee.
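One way to weigh the two explanations is to see what happens to the union coefficient when each control is added. The sketch below assumes the numbers in parentheses are standard errors and, because the sample size is not given, uses the large-sample cutoff of 1.96 as a rough significance threshold; both are assumptions for illustration.

```python
# (coefficient on union, assumed standard error) from the three regressions.
models = {
    "union only": (1.39, 0.66),
    "union + ability": (1.42, 0.56),
    "union + revenue": (0.60, 0.45),
}

# Rough large-sample 5% two-tailed cutoff; the true cutoff depends on
# the degrees of freedom, which the slide does not report.
CUTOFF = 1.96

t_stats = {name: coef / se for name, (coef, se) in models.items()}

for name, t in t_stats.items():
    verdict = "significant" if abs(t) > CUTOFF else "not significant"
    print(f"{name:16s} t = {t:5.2f}  ({verdict})")
```

Under these assumptions the union coefficient barely moves when ability is added, but shrinks and loses significance once revenue per employee is controlled for, which is the pattern omitted variable bias from revenue would produce.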

11

Checking the Assumptions

• How to check the validity of the assumptions?
  • Cynicism, Realism, and Theory
  • Robustness Checks
    • Check different specifications
    • But don’t just choose the best one!
  • Automated Variable Selection Methods
    • e.g., Stepwise regression (Siegel, p. 547)
  • Misspecification and Other Tests
  • Examine Diagnostic Plots

12

Diagnostic Plots

[Figure: residuals (vertical axis) plotted against predicted values (horizontal axis)]

Increasing spread might indicate heteroskedasticity. Try transformations or weighted least squares.

13

Diagnostic Plots

[Figure: residuals (vertical axis) plotted against predicted values (horizontal axis)]

“Tilt” from outliers might indicate skewness. Try log transformation.

14

Problematic Outliers

Stock Performance and CEO Golf Handicaps (New York Times, 5-31-98)

Without 7 “Outliers”

Number of obs = 44    R-squared = 0.1718
------------------------------------------------
 stockrating |    Coef.   Std. Err.      t    P>|t|
-------------+----------------------------------
    handicap |   -1.711      .580    -2.95   0.005
       _cons |   73.234     8.992     8.14   0.000
------------------------------------------------

With the 7 “Outliers”

Number of obs = 51    R-squared = 0.0017
------------------------------------------------
 stockrating |    Coef.   Std. Err.      t    P>|t|
-------------+----------------------------------
    handicap |    -.173      .593    -0.29   0.771
       _cons |   55.137     9.790     5.63   0.000
------------------------------------------------

15

Are They Really Outliers??

Stock Performance and CEO Golf Handicaps (New York Times, 5-31-98)

[Figure: residuals (vertical axis) plotted against predicted values (horizontal axis)]

Diagnostic Plot is OK

BE CAREFUL!

16

Diagnostic Plots

[Figure: residuals (vertical axis) plotted against predicted values (horizontal axis)]

Curvature might indicate nonlinearity. Try quadratic specification.

17

Diagnostic Plots

[Figure: residuals (vertical axis) plotted against predicted values (horizontal axis)]

Good diagnostic plot. Lacks obvious indications of other problems.

18

Adding Squared (Quadratic) Term

Job Performance regression on Salary (in $1,000s) (Egg Data)

  Source |      SS     df      MS       Number of obs =    576
---------+------------------------      F(2,573)      = 122.42
   Model |  255.61      2   127.8       Prob > F      = 0.0000
Residual |  598.22    573   1.044       R-squared     = 0.2994
---------+------------------------      Adj R-squared = 0.2969
   Total |  853.83    575   1.485       Root MSE      = 1.0218

------------------------------------------------------------
job performance |      Coef.   Std. Err.      t    P>|t|
----------------+-------------------------------------------
         salary |   .0980844   .0260215     3.77   0.000
 salary squared |   -.000337   .0001905    -1.77   0.077
          _cons |  -1.720966   .8720358    -1.97   0.049
------------------------------------------------------------

Salary Squared = Salary² [=salary^2 in Excel]

19

Quadratic Regression

[Figure: job performance (0 to 8) plotted against annual salary in $1,000s (30 to 150), with the fitted quadratic (nonlinear) regression curve]

Job perf = -1.72 + 0.098 salary - 0.00034 salary squared

20

Quadratic Regression

[Figure: job performance (0 to 8) plotted against annual salary in $1,000s (30 to 190), with the fitted quadratic regression curve]

Job perf = -1.72 + 0.098 salary - 0.00034 salary squared

Effect of salary will eventually turn negative

But where?

$$\text{Max} = \frac{-\,\text{linear coeff.}}{2 \times \text{quadratic coeff.}}$$
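Plugging the estimated coefficients from the job performance regression into the turning-point formula gives a concrete answer to "But where?". This is a small sketch using the unrounded Stata coefficients; the function name `predicted_performance` is illustrative.

```python
# Coefficients from the job performance regression (salary in $1,000s).
b_const = -1.720966
b_linear = 0.0980844
b_quadratic = -0.000337


def predicted_performance(salary):
    """Fitted quadratic: b0 + b1*salary + b2*salary^2."""
    return b_const + b_linear * salary + b_quadratic * salary ** 2


# Maximum of the quadratic: -linear coeff. / (2 * quadratic coeff.)
turning_point = -b_linear / (2 * b_quadratic)

print(f"Predicted performance peaks at a salary of about {turning_point:.1f} thousand dollars")
print(f"Predicted performance at the peak: {predicted_performance(turning_point):.2f}")
```

Beyond that salary the fitted curve slopes downward, so the estimated effect of additional salary on job performance is negative.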

21

Another Specification Possibility

• If data are very skewed, can try a log specification
• Can use logs instead of levels for independent and/or dependent variables
• Note that the interpretation of the coefficients will change
• Re-familiarize yourself with Siegel, pp. 68-69

22

Quick Note on Logs

• a is the natural logarithm of x if:

  $2.71828^a = x$

  or, $e^a = x$
• The natural logarithm is abbreviated “ln”
  • ln(x) = a
• In Excel, use ln function
• We call this the “log” but don’t use the “log” function!
• Usefulness: spreads out small values and narrows large values, which can reduce skewness

23

Earnings Distribution

Weekly Earnings from the March 2002 CPS, n=15,000

Skewed to the right

24

Residuals from Levels Regression

Residuals from a regression of Weekly Earnings on demographic characteristics

Skewed to the right—use of t distribution is suspect

25

Log Earnings Distribution

Natural Logarithm of Weekly Earnings from the March 2002 CPS, i.e., =ln(weekly earnings)

Not perfectly symmetrical, but better

26

Residuals from Log Regression

Residuals from a regression of Log Weekly Earnings on demographic characteristics

Almost symmetrical—use of t distribution is probably OK

27

Hypothesis Tests

• We’ve been doing hypothesis tests for single coefficients
  • H0: β = 0; reject if $|t| > t_{\alpha/2,\,n-k-1}$
  • HA: β ≠ 0
• What about testing more than one coefficient at the same time?
  • e.g., want to see if an entire group of 10 dummy variables for 10 industries should be in the model
• Joint tests can be conducted using partial F tests

28

Partial F Tests

H0: β1 = β2 = β3 = … = βC = 0
HA: at least one βi ≠ 0

• How to test this?
• Consider two regressions
  • One as if H0 is true
    • i.e., β1 = β2 = β3 = … = βC = 0
    • This is a “restricted” (or constrained) model
  • Plus a “full” (or unconstrained) model in which the computer can estimate what it wants for each coefficient

29

Partial F Tests

• Statistically, need to distinguish between
  • Full regression “no better” than the restricted regression
    - versus -
  • Full regression is “significantly better” than the restricted regression
• To do this, look at variance of prediction errors
  • If this declines significantly, then reject H0
• From ANOVA, we know ratio of two variances has an F distribution
  • So use F test

30

Partial F Tests

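• SSresidual = Sum of Squares Residual
• C = # constraints
• The partial F statistic has C, n-k-1 degrees of freedom

$$F = \frac{\left(SS_{\text{residual}}^{\text{Restricted}} - SS_{\text{residual}}^{\text{Full}}\right)/C}{SS_{\text{residual}}^{\text{Full}}/(n-k-1)}$$

• Reject H0 if $F > F_{\alpha,\,C,\,n-k-1}$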
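The partial F statistic is simple enough to compute by hand from two ANOVA tables. The sketch below wraps the formula in a small helper (the name `partial_f` and its argument names are illustrative, not from the slides) and applies it to the sums of squares from the coal mining example that appears later in these slides.

```python
def partial_f(ss_resid_restricted, ss_resid_full, c, n, k):
    """Partial F statistic with (c, n - k - 1) degrees of freedom.

    c = number of constraints, n = observations,
    k = number of independent variables in the FULL model.
    """
    numerator = (ss_resid_restricted - ss_resid_full) / c
    denominator = ss_resid_full / (n - k - 1)
    return numerator / denominator


# Coal mining example: full model has k = 6 variables and n = 47.
SS_FULL = 467007.875

# Overall test (all 6 coefficients): restricted model is constant-only.
f_overall = partial_f(10442702.809, SS_FULL, c=6, n=47, k=6)

# Subset test (WWII, Act1952, Act1969): restricted model drops those three.
f_subset = partial_f(605358.049, SS_FULL, c=3, n=47, k=6)

print(f"Overall model: F = {f_overall:.3f}")
print(f"Three-variable subset: F = {f_subset:.3f}")
```

Each value is then compared with the 5% critical value for its own degrees of freedom, as the later slides do.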

31

Coal Mining Example (Again)


Regression Statistics
R Squared    0.955
Obs.         47

ANOVA
             df   SS             MS            F         Significance
Regression    6    9975694.933   1662615.822   142.406   0.000
Residual     40     467007.875     11675.197
Total        46   10442702.809

            Coeff.     Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept   -168.510   258.819      -0.651   0.519     -691.603    354.583
hours          1.244     0.186       6.565   0.000        0.001      0.002
tons           0.048     0.403       0.119   0.906       -0.001      0.001
unemp         19.618     5.660       3.466   0.001        8.178     31.058
WWII         159.851    78.218       2.044   0.048        1.766    317.935
Act1952       -9.839   100.045      -0.098   0.922     -212.038    192.360
Act1969     -203.010   111.535      -1.820   0.076     -428.431     22.411

32

Minitab Output

Predictor      Coef    StDev       T       P
Constant     -168.5    258.8   -0.65   0.519
hours        1.2235    0.186    6.56   0.000
tons         0.0478    0.403    0.12   0.906
unemp        19.618    5.660    3.47   0.001
WWII         159.85    78.22    2.04   0.048
Act1952        -9.8    100.0   -0.10   0.922
Act1969      -203.0    111.5   -1.82   0.076

S = 108.1   R-Sq = 95.5%   R-Sq(adj) = 94.9%

Analysis of Variance
Source       DF         SS        MS        F       P
Regression    6    9975695   1662616   142.41   0.000
Error        40     467008     11675
Total        46   10442703

33

Is the Overall Model Significant?

H0: β1 = β2 = β3 = … = β6 = 0
HA: at least one βi ≠ 0

• Note: for testing the overall model, C=k
  • i.e., testing all coefficients together
• From the previous slides, we have SSresidual for the “full” (or unconstrained) model
  • SSresidual=467,007.875
• But what about for the restricted (H0 true) regression?
  • Estimate a constant only regression

34

Constant-Only Model


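Regression Statistics
R Squared         0
Adj. R Squared    0
Obs.              47

ANOVA
             df   SS             MS           F   Significance
Regression    0              0            0   .   .
Residual     46   10442702.809   227015.278
Total        46   10442702.809

            Coeff.    Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept   671.937   69.499       9.668    0.0000    532.042     811.830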

35

Partial F Tests

H0: β1 = β2 = β3 = … = β6 = 0
HA: at least one βi ≠ 0

$$F = \frac{(10{,}442{,}702.809 - 467{,}007.875)/6}{467{,}007.875/(47-6-1)} = 142.406$$

• Reject H0 if $F > F_{\alpha,\,C,\,n-k-1} = F_{0.05,\,6,\,40} = 2.34$
• 142.406 > 2.34 so reject H0. Yes, overall model is significant

36

Select F Distribution 5% Critical Values

Columns: numerator degrees of freedom. Rows: denominator degrees of freedom.

          1      2      3      4      5      6   …
   1    161    199    216    225    230    234
   2   18.5   19.0   19.2   19.2   19.3   19.3
   3   10.1   9.55   9.28   9.12   9.01   8.94
   8   5.32   4.46   4.07   3.84   3.69   3.58
  10   4.96   4.10   3.71   3.48   3.33   3.22
  11   4.84   3.98   3.59   3.36   3.20   3.09
  12   4.75   3.89   3.49   3.26   3.11   3.00
  18   4.41   3.55   3.16   2.93   2.77   2.66
  40   4.08   3.23   2.84   2.61   2.45   2.34
1000   3.85   3.00   2.61   2.38   2.22   2.11
   …

37

A Small Shortcut

Regression Statistics
R Squared    0.955
Obs.         47

ANOVA
             df   SS             MS            F         Significance
Regression    6    9975694.933   1662615.822   142.406   0.000
Residual     40     467007.875     11675.197
Total        46   10442702.809

[Coefficient table repeated from the Coal Mining Example slide]

For constant only model, SSresidual=10,442,702.809

So to test overall model, you don’t need to run a constant-only model

38

An Even Better Shortcut


Regression Statistics
R Squared    0.955
Obs.         47

ANOVA
             df   SS             MS            F         Significance
Regression    6    9975694.933   1662615.822   142.406   0.000
Residual     40     467007.875     11675.197
Total        46   10442702.809

[Coefficient table repeated from the Coal Mining Example slide]

In fact, the ANOVA table F test is exactly the test for the overall model being significant—recall Unit 8

39

Testing Any Subset


[Full-model regression output repeated from the Coal Mining Example slide: R Squared 0.955, 47 observations, SSresidual = 467007.875 with 40 d.f., and coefficients on hours, tons, unemp, WWII, Act1952, and Act1969]

Partial F test can be used to test any subset of variables

For example, H0: WWII = Act1952 = Act1969 = 0


40

Restricted Model


Regression Statistics
R Squared    0.942
Obs.         47

ANOVA
             df   SS             MS            F         Significance
Regression    3    9837344.76    3279114.920   232.923   0.000
Residual     43     605358.049     14078.094
Total        46   10442702.809

            Coeff.    Std. Error   t stat   p value
Intercept   147.821   166.406       0.888   0.379
hours        0.0015     0.0001     20.522   0.000
tons        -0.0008     0.0003     -2.536   0.015
unemp         7.298     4.386       1.664   0.103

Restricted regression with WWII = Act1952 = Act1969 = 0

41

Partial F Tests

H0: WWII = Act1952 = Act1969 = 0

$$F = \frac{(605{,}358.049 - 467{,}007.875)/3}{467{,}007.875/(47-6-1)} = 3.950$$

• Reject H0 if $F > F_{\alpha,\,C,\,n-k-1} = F_{0.05,\,3,\,40} = 2.84$
• 3.95 > 2.84 so reject H0. Yes, the subset of three coefficients is jointly significant

42

Regression and Two-Way ANOVA

             Treatments
Blocks     A     B     C
   1      10     9     8
   2      12     6     5
   3      18    15    14
   4      20    18    18
   5       8     7     8

“Stack” data using dummy variables

A   B   C   B2   B3   B4   B5   Value
1   0   0   0    0    0    0    10
1   0   0   1    0    0    0    12
1   0   0   0    1    0    0    18
1   0   0   0    0    1    0    20
1   0   0   0    0    0    1     8
0   1   0   0    0    0    0     9
0   1   0   1    0    0    0     6
0   1   0   0    1    0    0    15
0   1   0   0    0    1    0    18
0   1   0   0    0    0    1     7
0   0   1   0    0    0    0     8
…                               …

43

Recall Two-Way Results

ANOVA: Two-Factor Without Replication

Source of Variation   SS        df   MS       F        P-value   F crit
Blocks                312.267    4   78.067   38.711   0.000     3.84
Treatment              26.533    2   13.267    6.579   0.020     4.46
Error                  16.133    8    2.017
Total                 354.933   14

44


   Source |      SS     df      MS       Number of obs =     15
----------+------------------------      F(6, 8)       =  28.00
    Model |  338.800     6   56.467      Prob > F      = 0.0001
 Residual |   16.133     8    2.017      R-squared     = 0.9545
----------+------------------------      Adj R-squared = 0.9205
    Total |  354.933    14   25.352      Root MSE      = 1.4201

-------------------------------------------------------------
treatment |    Coef.   Std. Err.      t    P>|t|   [95% Conf. Int]
----------+--------------------------------------------------
        b |   -2.600      .898    -2.89   0.020   -4.671    -.529
        c |   -3.000      .898    -3.34   0.010   -5.071    -.929
       b2 |   -1.333     1.160    -1.15   0.283   -4.007    1.340
       b3 |    6.667     1.160     5.75   0.000    3.993    9.340
       b4 |    9.667     1.160     8.34   0.000    6.993   12.340
       b5 |   -1.333     1.160    -1.15   0.283   -4.007    1.340
    _cons |   10.867      .970    11.20   0.000    8.630   13.104
-------------------------------------------------------------

45


Regression Excerpt for Full Model
  Source |      SS     df      MS
---------+------------------------
   Model |  338.800     6   56.467
Residual |   16.133     8    2.017
---------+------------------------
   Total |  354.933    14   25.352

Regression Excerpt for b2 = b3 = … = 0
  Source |      SS     df      MS
---------+------------------------
   Model |   26.533     2   13.267
Residual |   328.40    12   27.367
---------+------------------------
   Total |  354.933    14   25.352

Regression Excerpt for b = c = 0
  Source |      SS     df      MS
---------+------------------------
   Model |  312.267     4   78.067
Residual |   42.667    10    4.267
---------+------------------------
   Total |  354.933    14   25.352

Use these SSresidual values to do partial F tests and you will get exactly the same answers as the Two-Way ANOVA tests
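The equivalence can be checked directly. The sketch below stacks the treatment-by-block data with dummy variables, fits the full and the two restricted regressions by ordinary least squares, and forms the partial F statistics. To stay dependency-free it solves the normal equations with a small hand-written Gauss-Jordan routine; that solver and all function names are illustrative choices, not something taken from the slides.

```python
# Treatment-by-block data from the two-way ANOVA slide (blocks 1 to 5).
data = {
    "A": [10, 12, 18, 20, 8],
    "B": [9, 6, 15, 18, 7],
    "C": [8, 5, 14, 18, 8],
}


def design_row(treatment, block, use_treat, use_block):
    """Constant, then dummies for B, C (A omitted) and blocks 2-5 (block 1 omitted)."""
    row = [1.0]
    if use_treat:
        row += [float(treatment == t) for t in ("B", "C")]
    if use_block:
        row += [float(block == b) for b in (2, 3, 4, 5)]
    return row


def ols(X, y):
    """Solve the normal equations X'X b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def fit(use_treat, use_block):
    """Return (coefficients, residual sum of squares) for the chosen dummies."""
    X, y = [], []
    for treatment, values in data.items():
        for block, value in enumerate(values, start=1):
            X.append(design_row(treatment, block, use_treat, use_block))
            y.append(float(value))
    b = ols(X, y)
    ss = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
             for row, yi in zip(X, y))
    return b, ss


coefs, ss_full = fit(True, True)          # full model
_, ss_no_blocks = fit(True, False)        # restricted: block dummies dropped
_, ss_no_treat = fit(False, True)         # restricted: treatment dummies dropped

df_full = 15 - 6 - 1
f_blocks = ((ss_no_blocks - ss_full) / 4) / (ss_full / df_full)
f_treat = ((ss_no_treat - ss_full) / 2) / (ss_full / df_full)

print("Coefficients (_cons, b, c, b2, b3, b4, b5):", [round(v, 3) for v in coefs])
print(f"Partial F for blocks: {f_blocks:.3f}; for treatments: {f_treat:.3f}")
```

The two partial F statistics can be compared with the Blocks and Treatment rows of the two-way ANOVA table, and the coefficients with the full-model regression output.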

46

Select F Distribution 5% Critical Values

Columns: numerator degrees of freedom. Rows: denominator degrees of freedom.

          1      2      3      4      5      6      9   …
   1    161    199    216    225    230    234    241
   2   18.5   19.0   19.2   19.2   19.3   19.3   19.4
   3   10.1   9.55   9.28   9.12   9.01   8.94   8.81
   8   5.32   4.46   4.07   3.84   3.69   3.58   3.39
  10   4.96   4.10   3.71   3.48   3.33   3.22   3.02
  11   4.84   3.98   3.59   3.36   3.20   3.09   2.90
  12   4.75   3.89   3.49   3.26   3.11   3.00   2.80
  18   4.41   3.55   3.16   2.93   2.77   2.66   2.46
  40   4.08   3.23   2.84   2.61   2.45   2.34   2.12
1000   3.85   3.00   2.61   2.38   2.22   2.11   1.89
   ∞   3.84   3.00   2.60   2.37   2.21   2.10   1.88

47

3 Seconds of Calculus

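$$y = \log(x) \;\Rightarrow\; \frac{\partial y}{\partial x} = \frac{1}{x} \;\Rightarrow\; \partial y = \frac{\partial x}{x}$$

$$\frac{\partial b_0}{\partial x} = 0 \quad \text{if } b_0 \text{ is a constant}$$

$$\frac{\partial (b_1 x)}{\partial x} = b_1$$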

48

Regression Coefficients

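• y = b0 + b1x (linear form)

  $$\frac{\partial y}{\partial x} = b_1$$

  1 unit change in x changes y by b1

• log(y) = b0 + b1x (semi-log form)

  $$\frac{\partial \log(y)}{\partial x} = \frac{\partial y / y}{\partial x} = \frac{\%\Delta y}{\partial x} = b_1$$

  1 unit change in x changes y by b1 (x100) percent

• log(y) = b0 + b1log(x) (double-log form)

  $$\frac{\partial \log(y)}{\partial \log(x)} = \frac{\partial y / y}{\partial x / x} = \frac{\%\Delta y}{\%\Delta x} = b_1$$

  1 percent change in x changes y by b1 percent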

49

Log Regression Coefficients

• wage = 9.05 + 1.39 union
  • Predicted wage is $1.39 higher for unionized workers (on average)
• log(wage) = 2.20 + 0.15 union
  • Semi-elasticity
  • Predicted wage is approximately 15% higher for unionized workers (on average)
• log(wage) = 1.61 + 0.30 log(profits)
  • Elasticity
  • A one percent increase in profits increases predicted wages by approximately 0.3 percent
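The word "approximately" matters in both log interpretations. The sketch below compares the rule-of-thumb reading of the coefficients with the exact change implied by the fitted log equations; the variable names are illustrative.

```python
import math

# Semi-log: log(wage) = 2.20 + 0.15 union
b_union = 0.15
approx_union_pct = 100 * b_union                    # rule of thumb
exact_union_pct = 100 * (math.exp(b_union) - 1)     # exact implied change

# Double-log: log(wage) = 1.61 + 0.30 log(profits)
b_profits = 0.30
approx_profit_pct = b_profits * 1.0                 # per 1% change in profits
exact_profit_pct = 100 * (1.01 ** b_profits - 1)    # exact change for +1% profits

print(f"Union: approx {approx_union_pct:.1f}%, exact {exact_union_pct:.2f}%")
print(f"1% more profits: approx {approx_profit_pct:.2f}%, exact {exact_profit_pct:.3f}%")
```

The approximation is good for small coefficients and small changes, and it drifts further from the exact figure as the coefficient grows.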

50

Multicollinearity

Auto repair records, weight, and engine size

Number of obs = 69
F(2, 66)      = 6.84
Prob > F      = 0.0020
R-squared     = 0.1718
Adj R-squared = 0.1467
Root MSE      = .91445
----------------------------------------------
repair |    Coef.   Std. Err.      t    P>|t|
-------+--------------------------------------
weight |  -.00017     .00038    -0.41   0.685
engine |  -.00313     .00328    -0.96   0.342
 _cons |  4.50161     .61987     7.26   0.000
----------------------------------------------

51

Multicollinearity

• Two (or more) independent variables are so highly correlated that a multiple regression can’t disentangle the unique contributions of each
  • Large standard errors and lack of statistical significance for individual coefficients
  • But joint significance
• Identifying multicollinearity
  • Some say “rule of thumb |r|>0.70” (or 0.80)
  • But better to look at results
• OK for prediction
• Bad for assessing theory

52

Prediction With Multicollinearity

• Prediction at the Mean (weight=3019 and engine=197)

Model for prediction   Predicted Repair   Lower 95% Limit (Mean)   Upper 95% Limit (Mean)
Multiple Regression    3.411              3.191                    3.631
Weight Only            3.412              3.193                    3.632
Engine Only            3.410              3.192                    3.629

53

Dummy Dependent Variables

• Dummy dependent variables
  • y = b0 + b1x1 + … + bkxk + e
  • Where y is a {0,1} indicator variable
• Examples
  • Do you intend to quit? yes / no
  • Did the worker receive training? yes / no
  • Do you think the President is doing a good job? yes / no
  • Was there a strike? yes / no
  • Did the company go bankrupt? yes / no

54

Linear Probability Model

• Mathematically / computationally, can estimate a regression as usual (the monkeys won’t know the difference)
• This is called a “linear probability model”
  • Right-hand side is linear
  • And is estimating probabilities
    • P(y =1) = b0 + b1x1 + … + bkxk
  • b1=0.15 (for example) means that a one unit change in x1 increases probability that y=1 by 0.15 (fifteen percentage points)

55

Linear Probability Model

• Excel won’t know the difference, but perhaps it should
• Linear probability model problems
  • $\sigma_e^2 = P(y=1)[1-P(y=1)]$
    • But P(y =1) = b0 + b1x1 + … + bkxk
    • So $\sigma_e^2$ is not constant: it varies with the x’s (heteroskedasticity)
  • Predicted probabilities are not bounded by 0,1
  • R² is not an accurate measure of predictive ability
    • Can use a pseudo-R² measure
    • Such as percent correctly predicted

56

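Logit Model & Probit Model

• Solution to these problems is to use nonlinear functional forms that bound P(y=1) between 0,1
• Logit Model (logistic regression)

$$P(y=1) = \frac{e^{\,b_0 + b_1x_1 + b_2x_2 + \dots + b_kx_k + e}}{1 + e^{\,b_0 + b_1x_1 + b_2x_2 + \dots + b_kx_k + e}}$$

  (the last term in each exponent, e, is the regression error; the base e is 2.71828)

  Recall, ln(x) = a when $e^a = x$

• Probit Model

$$P(y=1) = \Phi(b_0 + b_1x_1 + b_2x_2 + \dots + b_kx_k + e)$$

  • Where Φ is the normal cumulative distribution function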
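To see how the two functional forms keep probabilities inside 0 and 1 while a linear probability model does not, the sketch below evaluates all three at the same index values. The intercept and slope are hypothetical numbers chosen only for illustration, and the error term is left out of the index; the function names are likewise illustrative.

```python
import math


def logit_prob(index):
    """Logistic function: e^z / (1 + e^z), written as 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-index))


def probit_prob(index):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))


def linear_prob(index):
    """Linear probability model: the index itself is the 'probability'."""
    return index


# Hypothetical coefficients for illustration only (not from the slides).
b0, b1 = 0.5, 0.15

for x in (-6, -2, 0, 2, 6):
    z = b0 + b1 * x
    print(f"x={x:3d}  linear={linear_prob(z):6.3f}  "
          f"logit={logit_prob(z):.3f}  probit={probit_prob(z):.3f}")
```

With these made-up coefficients the linear prediction leaves the unit interval at the extreme x values, while the logit and probit predictions stay strictly between 0 and 1.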

57

Logit Model & Probit Model

• Nonlinear so need statistical package to do the calculations
• Can do individual (z-tests, not t-tests) and joint statistical testing as with other regressions
  • Also confidence intervals
• Need to convert coefficients to marginal effects for interpretation
• Should be aware of these models
  • Though in many cases, a linear probability model works just fine

58

Example

• Dep. Var: 1 if you know of the FMLA, 0 otherwise

Probit estimates                    Number of obs = 1189
                                    LR chi2(14)   = 232.39
                                    Prob > chi2   = 0.0000
Log likelihood = -707.94377         Pseudo R2     = 0.1410
------------------------------------------------------------
FMLAknow |    Coef.   Std. Err.      z    P>|z|   [95% Conf. Int]
---------+--------------------------------------------------
   union |     .238      .101     2.35   0.019     .039     .436
     age |    -.002      .018    -0.13   0.897    -.038     .033
   agesq |     .135      .219     0.62   0.536    -.293     .564
nonwhite |    -.571      .098    -5.80   0.000    -.764    -.378
  income |    1.465      .393     3.73   0.000     .696    2.235
incomesq |   -5.854     2.853    -2.05   0.040   -11.45    -.262
[other controls omitted]
   _cons |   -1.188      .328    -3.62   0.000   -1.831    -.545
------------------------------------------------------------

59

Marginal Effects

• For numerical interpretation / prediction, need to convert coefficients to marginal effects
• Example: Logit Model

$$\log\left(\frac{P(y=1)}{1 - P(y=1)}\right) = b_0 + b_1x_1 + b_2x_2 + \dots + b_kx_k + e$$

  • So b1 gives effect on Log(•), not P(y=1)
  • Probit is similar
• Can re-arrange to find out effect on P(y=1)
  • Usually do this at the sample means

60

Marginal Effects

Probit estimates                    Number of obs = 1189
                                    LR chi2(14)   = 232.39
                                    Prob > chi2   = 0.0000
Log likelihood = -707.94377         Pseudo R2     = 0.1410
------------------------------------------------------------
FMLAknow |    dF/dx   Std. Err.      z    P>|z|   [95% Conf. Int]
---------+--------------------------------------------------
   union |     .095      .040     2.35   0.019     .017     .173
     age |    -.001      .007    -0.13   0.897    -.015     .013
   agesq |     .054      .087     0.62   0.536    -.117     .225
nonwhite |    -.222      .036    -5.80   0.000    -.293    -.151
  income |     .585      .157     3.73   0.000     .278     .891
incomesq |   -2.335     1.138    -2.05   0.040   -4.566    -.105
[other controls omitted]
------------------------------------------------------------

For numerical interpretation / prediction, need to convert coefficients to marginal effects
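For a continuous variable in a probit model, the marginal effect is the coefficient scaled by the standard normal density evaluated at the index. The sketch below applies that to the income coefficients. The value of the index at the sample means is not reported on the slides, so `index_at_means = 0.0` is an assumption made purely for illustration; it happens to give numbers close to the dF/dx column. Dummy variables such as union are usually handled as a discrete 0-to-1 change instead, which this sketch does not do.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_marginal_effect(coef, index):
    """dP(y=1)/dx for a continuous x in a probit model: phi(index) * coef."""
    return normal_pdf(index) * coef


# ASSUMPTION: the slides do not report the index at the sample means;
# 0.0 is used here only as an illustrative value.
index_at_means = 0.0

me_income = probit_marginal_effect(1.465, index_at_means)
me_incomesq = probit_marginal_effect(-5.854, index_at_means)

print(f"income:   coefficient 1.465  -> marginal effect {me_income:.3f}")
print(f"incomesq: coefficient -5.854 -> marginal effect {me_incomesq:.3f}")
```

Because the density is at most about 0.399, a probit marginal effect is always well under half the size of its coefficient, which is why the raw coefficients cannot be read as changes in probability.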

61

But Linear Probability Model is OK, Too

                 Probit Coeff.   Probit Marginal   Regression
Union             0.238           0.095             0.084
                 (0.101)         (0.040)           (0.035)
Nonwhite         -0.571          -0.222            -0.192
                 (0.098)         (0.037)           (0.033)
Income            1.465           0.585             0.442
                 (0.393)         (0.157)           (0.091)
Income Squared   -5.854          -2.335            -1.354
                 (2.853)         (1.138)           (0.316)

So regression is usually OK, but should still be familiar with logit and probit methods
