Analysis of the Propensity to Earn Non-Wage Income in America


MICROECONOMETRICS FINAL PROJECT EMILIO JOSE CALLE

HOPKINS AAP | December 2016


Question of Interest

Differences in income by gender, race, and educational level have been studied extensively in the context of wage discrimination in the workplace. But what happens with non-wage income, such as earnings from dividends, capital gains, rents, and interest? Are the gaps (gender, race, education) just as wide, or even wider, especially in a country such as the US, where non-wage income can make up a substantial share of a household’s total income? In a year filled with racial and gender divides, the sources of inequality in the country are of greater interest than ever.

Literature Review

The first report reviewed was “Expanding the Economic Base Model to Include Nonwage Income”, by Katherine Nesse of Kansas State University. This report was particularly interesting because it details both where some of the most important sources of nonwage income come from and the economic impact they are having. Specifically, it describes how retirement income and Medicare/Medicaid income (both government transfers) affect employment, production, and jobs in places such as Florida, which have a high proportion of recipients because of retirees settling there, as well as stationed military personnel.

Another paper reviewed was “Accounting for Changes in Income Inequality: Decomposition Analyses for the UK, 1978-2008”, by Mike Brewer and Liam Wren-Lewis, which is essentially a British account of the same phenomenon of retirees and people on government support, concentrated in the more affordable parts of the country, which include Scotland and Ireland.

A third paper was “The effects of economic growth on income inequality in the US”, by Amir Rubin and Dan Segal. This article is quite relevant because it points to other sources of income growth for Americans that are related neither to wages nor to transfers, such as the growth of wealth due to the rising value of real estate in the US, non-farm land, and other related investment activities. Even though the article does not say so, these conclusions match what Piketty presents in his book “Capital in the Twenty-First Century”, where he also argues that neither wages nor stocks nor financial instruments, but real estate, has been the biggest driver of wealth creation in developed nations over the last century.

Other documents reviewed include “Reformas de las pensiones públicas y privadas en España” (Reforms of public and private pensions in Spain), by Rosado Cebrian and Alonso Fernandez, which focuses much more on people living longer and having to do so on shrinking pensions (it would seem like nonwage income turning negative up to a point); and “Economic Growth and Income Inequality in the Asia-Pacific Region: A Comparative Study of China, Japan, South Korea, and the United States”, by Yiwen Yang and Theresa Greaney, which offers interesting facts but is too broad for this essay, which focuses on the reality of the United States.

Describe the Data Source

The source used in this study is the “2013 Survey of Consumer Finances” (SCF), developed and published by the Board of Governors of the Federal Reserve System. According to the Fed, the 2013 SCF is the most recent survey conducted. The SCF is normally a triennial cross-sectional survey of U.S. families. The survey data include information on families’ balance sheets, pensions, income, and demographic characteristics. Information is also included from related surveys of pension providers and from earlier such surveys conducted by the Federal Reserve Board. No other study for the country collects comparable information. Although the survey is dated 2013, its income figures refer to 2012. The dataset holds five records per family, so it will be declared as panel data using Stata’s xtset command. Stata describes the dataset, before any editing, as follows: obs: 30,075; vars: 324; size: 31,789,275.

The data from the survey are presented in 2010 dollars according to the source. It is also useful for this research that the variables to be analyzed (Age, Gender, Education, Race, and Marital Status) are recorded as categorical variables. Age of the household head is coded 1: <35, 2: 35-44, 3: 45-54, 4: 55-64, 5: 65-74, 6: >=75. Race is coded 1 = white non-Hispanic, 2 = black/African-American, 3 = Hispanic, 5 = other. Education is coded 1 = no high school diploma/GED, 2 = high school diploma or GED, 3 = some college, 4 = college degree. Marital status is coded 1 = married/living with partner, 2 = neither married nor living with partner. Age and Education are also available as numerical values, so the initial Tobit results are going to be presented both ways (numerical and categorical) whenever possible.

As this project does not need to go beyond these variables, and as the full dataset is too heavy to be manipulated comfortably in Stata, it has been cut down to the aforementioned variables plus the identification variables YY1 and Y1 and the weight WGT for survey weighting purposes. According to the SAS codebook for the survey, the weighting proposed by the authors is that only the data points where Y1, YY1, and WGT are all positive (>0) should be used. This was the first test performed on the dataset, and it was found that

all datapoints were compliant, possibly

Proposed empirical framework

Continuing with survey weighting, a problem with the data provided by the Federal Reserve is that, when working with SCF data, standard error calculations can overstate the reliability of regressions and other statistics unless two other kinds of error are accounted for: imputation error and sampling variability error. One way to account for them would be to use the replicate weight files provided by the Fed, together with the specific Stata code that is suggested. However, this weighting problem mainly affects standard errors through imputation error, as described by the survey authors: “Missing data in the SCF are imputed 5 times, meaning that each SCF family has 5 separate observations (called “implicates”) in the final data.” This situation, added to the complications of fixing the data, prompts the search for a simpler solution, one that relies less on the precision of standard errors. One way to do this is to use nonlinear least squares estimators, as there is a considerable technical difference in how the squared errors behave: “For linear models, the sums of the squared errors always add up in a specific manner: SS Regression + SS Error = SS Total. This seems quite logical. The variance that the regression model accounts for plus the error variance adds up to equal the total variance. Further, R-squared equals SS Regression / SS Total, which mathematically must produce a value between 0 and 100%. In nonlinear regression, SS Regression + SS Error do not equal SS Total! This completely invalidates R-squared for nonlinear models, and it no longer has to be between 0 and 100%.” (from the Minitab Blog). The sampling variability problem, which results from using a sample rather than a population, can also be partially compensated for by the nonlinear model and by the fact that the sample size is fairly large, even after restricting the variables.
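For reference, the imputation error described above is usually handled by estimating the model separately on each of the five implicates and combining the results with Rubin's rules. The sketch below is a minimal illustration in Python, not part of the original Stata analysis. The function name combine_implicates and the example numbers are hypothetical, and the sketch covers only the imputation component, not the sampling variability handled by the replicate weights.

```python
from statistics import mean, variance


def combine_implicates(estimates, variances):
    """Combine per-implicate results with Rubin's rules.

    estimates: point estimate obtained from each implicate
    variances: squared standard error obtained from each implicate
    Returns (combined estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists with at least two implicates")
    q_bar = mean(estimates)        # combined point estimate
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # variance across implicates (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical numbers for five implicates, for illustration only.
example_estimates = [1.0, 2.0, 3.0, 4.0, 5.0]
example_variances = [1.0, 1.0, 1.0, 1.0, 1.0]
combined, total_var = combine_implicates(example_estimates, example_variances)
print(combined, total_var)
```

The square root of the total variance is the standard error that reflects the disagreement between implicates in addition to the usual sampling error within each one.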

The Tobit model is a nonlinear censored-regression model that belongs to the family of selection models and is estimated by maximum likelihood. Jay Stewart, of the US Bureau of Labor Statistics, best describes why the Tobit model has become so popular in cases with censored data: “Tobit has been the predominant approach in more-recent studies. The Tobit model would seem to be a sensible approach, because it was developed specifically for situations where the dependent variable is truncated at zero or some other cutoff. The standard discussion of the Tobit model (Tobin, 1958) assumes that there is a latent variable (for example, desired expenditures) underlying the observed dependent variable (actual expenditures). The two are equal when the latent variable is greater than zero, but the observed variable is zero when the latent variable is negative.” (BLS Working Papers, November 2009, “Tobit or not Tobit?”).
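The latent-variable description quoted above can be written compactly. The notation below (latent outcome $y_i^{*}$, regressors $x_i$, coefficients $\beta$, error scale $\sigma$) is standard textbook notation introduced here for illustration and is not taken from the original report; it shows the model with censoring at zero and the log likelihood that maximum likelihood estimation maximizes.

```latex
\[
\begin{aligned}
y_i^{*} &= x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0,\sigma^{2}),\\[4pt]
y_i &=
\begin{cases}
y_i^{*} & \text{if } y_i^{*} > 0,\\
0 & \text{if } y_i^{*} \le 0,
\end{cases}\\[4pt]
\ln L(\beta,\sigma) &= \sum_{i:\,y_i>0} \ln\!\left[\frac{1}{\sigma}\,
\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]
+ \sum_{i:\,y_i=0} \ln \Phi\!\left(-\frac{x_i'\beta}{\sigma}\right).
\end{aligned}
\]
```

Here $\phi$ and $\Phi$ are the standard normal density and distribution function. When no observation is censored, the second sum is empty and the likelihood reduces to that of a linear regression with normal errors; the Stata output in the annex reports 0 left-censored observations for the log model.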

Another reason for using a nonlinear estimation approach is that, if the dependent variable is constrained and the data cluster at the constraint, OLS on the complete sample would be biased and inconsistent, while OLS on the unclustered part alone would be biased and inconsistent as well. As this research is on Non-Wage Income, all wage income has initially been excluded. The Non-Wage variable also has to be created, as it is not part of the original survey. Here it is important to note that Non-Wage Income can be negative, because some of its components, such as government transfers and business and farm income, can turn negative. Thus a straight Income minus Wage Income approach is not necessarily the ideal formula. Instead, the variable NonWageIncome (named NonWincome in the code) was composed by adding 6 variables: Farm Income (BUSSEFARMINC) + Dividend Income (INTDIVINC) + Government Transfers Income (TRANSFOTHINC) + Social Security Income (SSRETINC) + Retirement Account Income (PENACCTWD) + Appreciation Income (KGINC: income from asset appreciation, gold, etc.).

As the values can be negative depending on where they fall in the distribution, it is important to separate out those who receive no income other than wages. This was done by dropping the data points where all 6 variables were equal to 0 at the same time, thus guaranteeing that the dropped households indeed had no non-wage income, rather than a zero total produced by positive and negative components cancelling out. This step reduced the observations by 7,056, bringing the final sample to 23,019.
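The construction and filtering steps above can be sketched as follows. This is a minimal Python illustration rather than the Stata code actually used; the function name build_non_wage and the three example households are hypothetical, while the six component names are the ones that appear in the Stata log.

```python
COMPONENTS = [
    "BUSSEFARMINC", "INTDIVINC", "KGINC",
    "SSRETINC", "TRANSFOTHINC", "PENACCTWD",
]


def build_non_wage(rows):
    """Add a NonWincome total to each row, skipping rows with no non-wage income.

    rows: list of dicts keyed by the six component names.
    """
    kept = []
    for row in rows:
        parts = [row[name] for name in COMPONENTS]
        if all(value == 0 for value in parts):
            continue  # every component is zero: the household is dropped
        kept.append({**row, "NonWincome": sum(parts)})
    return kept


def make_row(**values):
    """Build a row with all components at zero except those given (hypothetical helper)."""
    return {name: values.get(name, 0) for name in COMPONENTS}


# Hypothetical households, for illustration only.
households = [
    make_row(),                                    # no non-wage income at all
    make_row(BUSSEFARMINC=-500, INTDIVINC=500),    # components cancel to zero
    make_row(INTDIVINC=120, SSRETINC=9000),
]
result = build_non_wage(households)
print([row["NonWincome"] for row in result])
```

The second household is kept even though its total is zero, because the zero comes from offsetting components; only the household with all six components at zero is removed.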

After compensating for these data issues, the plan is to apply a Tobit regression to both the level and the log of Non-Wage Income, treating the variables as panel data and using xttobit. Results are going to be presented as both a numerical value and a propensity. Afterwards, the fit of the model is going to be analyzed by computing the fitted values and correlating them with the dependent variable. The statistical significance of each category is also going to be analyzed, and the marginal effects at the means will be computed. It is important to note that, as Non-Wage Income contains negative numbers, the data had to be transformed before taking logs. The negative values are kept because they are meaningful as well (negative non-wage income might, for example, be a factor driven by discrimination). Finally, the xttobit command has the advantage of reporting the overall and panel-level variance components (labeled sigma_e and sigma_u, respectively) together with rho, which is the share of the total variance contributed by the panel-level variance component. When rho is zero, the panel-level variance component is unimportant, and the panel estimator is not different from the pooled estimator. A likelihood-ratio test of this is included at the bottom of the output; it formally compares the pooled estimator (tobit) with the panel estimator. It is also important to note that Stata rejects Age in numerical form for the xttobit regression, so it is entered in categorical form.

The exclusion restriction in this report is then not having non-wage income.


Descriptive Statistics

Raw data, before any transformation

    Variable |    Obs       Mean    Std. Dev.    Min        Max
-------------+--------------------------------------------------
      INCOME |  30075   856289.2     6014268       0   1.80e+08
     WAGEINC |  30075   122615.7    652381.2       0   2.24e+07
     2.HHSEX |  30075   .2365752    .4249863       0          1
         AGE |  30075   51.75328    16.17062      18         95
        EDUC |  30075   13.96359    2.699456      -1         17
   2.MARRIED |  30075   .3733998     .483715       0          1
      2.RACE |  30075   .1240565    .3296515       0          1
      3.RACE |  30075   .0924688    .2896914       0          1
      5.RACE |  30075   .0478803    .2135165       0          1

Total income has a mean of $856,289.20, yet wage income has a mean of $122,615.70, pointing towards a large inequality in earnings. The standard deviations are not reliable, for the reasons presented in the previous sections. Regarding the other variables, on average the households are headed by women 23.66% of the time and 37.34% of household heads are single; the average household head in this sample is 51.75 years old, has almost 14 years of education (about an associate’s degree), and is identified as “white” 73.56% of the time.

                         NonWincome
      Percentiles      Smallest
 1%    -2312.117        -517411
 5%     202.9063        -517411
10%     1318.891        -517411       Obs               23019
25%     7913.345        -517411       Sum of Wgt.       23019

50%     22319.69                      Mean           895536.8
                        Largest       Std. Dev.       5805035
75%     92084.58       1.34e+08
90%     933368.9       1.38e+08       Variance       3.37e+13
95%      3094321       1.38e+08       Skewness        13.7253
99%     1.93e+07       1.55e+08       Kurtosis       238.8792

The Non-Wage Income variable is heavily skewed and has a high kurtosis. This could be a warning that the Tobit MLE might not be the best model for these data. One option is to check whether these characteristics persist when the observations at 0 and below are ignored. This was done, but the results are basically the same. This report will therefore also transform NonWageIncome by adding 600,000 and then taking the log. This way no value below zero is dropped, and the log can be taken. Working with logs instead of levels reduces both the skewness and the kurtosis significantly, as shown below, so this report will focus on the log. See Annex.

                        LnNonWincome
      Percentiles      Smallest
 1%     13.30082       11.32163
 5%     13.30502       11.32163
10%     13.30688       11.32163       Obs               23019
25%     13.31779       11.32163       Sum of Wgt.       23019

50%     13.34121                      Mean           13.58935
                        Largest       Std. Dev.      .6784092
75%     13.44746       18.71591
90%     14.24298       18.74466       Variance        .460239
95%     15.12231       18.74993       Skewness       3.637583
99%     16.80503       18.86196       Kurtosis        18.2104

Dependent Variable: Log of Non-Wage Income (Transformed)
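The choice of transformation matters for which observations survive the log. The sketch below is a minimal Python illustration, not the original Stata code; the function names log_shifted and log_scaled are hypothetical. It contrasts adding a constant, as described in the text, with multiplying by a constant, which is what the command recorded in the annex log (NonWincome * 600) does. The two are not equivalent: the annex log reports 426 missing values after the log is taken and 22,593 observations in the regression, which is consistent with the multiplicative form dropping the non-positive values.

```python
import math

SHIFT = 600_000  # constant named in the text; larger than the magnitude of the smallest value


def log_shifted(x, shift=SHIFT):
    """Natural log after adding a constant; defined whenever x > -shift."""
    return math.log(x + shift)


def log_scaled(x, factor=600):
    """Natural log after multiplying by a positive constant; undefined for x <= 0."""
    return math.log(x * factor) if x > 0 else None


smallest = -517411  # minimum of NonWincome reported in the summary table
print(round(log_shifted(smallest), 5))  # matches the smallest LnNonWincome in the table
print(log_scaled(smallest))             # None: a negative value cannot be logged after scaling
```

Only the shifted version reproduces the minimum of 11.32163 shown in the LnNonWincome table, so the descriptive statistics and the regression in the annex appear to rest on different transformations.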

The skewness and kurtosis found may result from this survey focusing on household heads, meaning a respondent must be the top earner in a home. This might cause selection bias, as those who are not household heads but do earn income (like students living in a dorm and working part-time) might be left out. More on this issue is discussed below.

Findings

1. As the attached histograms show, NonWageIncome is highly censored and clustered to the left, just as expected. Also as expected, the log of Non-Wage Income is more evenly distributed, approaching a normal distribution.

2. As expressed previously, R2 does not directly apply to nonlinear models. Also, as there is clustering in this data, the pseudo-R2 (McFadden) is not appropriate, thus it is not shown in the results from Stata.

3. The log-likelihood (-49649.405) is very far from 0, which suggests that this is not a well-fitting model. The correlation of the predicted y with the data points the same way: it is 0.5004, which when squared gives just 0.2504, so the fitted values account for only about 25% of the variation in the data. Thus the fit is very loose, although it is better than that of the regular, level Non-Wage Income model.

4. Another test of fit applied was the AIC and BIC, which are within about 112 points of each other (99,326.81 and 99,439.17), meaning that there is probably no practical difference between them. Both are very large, which may point to overfitting, although these criteria are most informative when compared across models.

5. Rho (the share of the total variance due to the panel-level component, tested with the likelihood-ratio test) is very small, very close to 0, meaning that the panel estimator adds little and running the pooled Tobit would probably give about the same results.

6. In this kind of model the marginal effects are the same as the coefficients, and on the log scale they can be read as approximate percentage effects. With that consideration, this model shows that being a woman has a -15.25% effect on Non-Wage Income, that each extra year of education adds 22.87% to Non-Wage Income, and that being single reduces it by 74.63%. Being other than white also reduces non-wage income, but in different proportions: being black has a -27.34% effect; being Latino has a smaller -7.30% effect, which is not statistically significant (p = 0.226); and the rest of the races have a -52.18% effect. This last part makes some sense, as the “rest” of the races includes many migrants who might not have access to the components of non-wage income such as government transfers or retirement accounts, or even own a property or asset that would generate rent or appreciate. Finally, the model shows that non-wage income increases considerably with age, confirming an early suspicion that Non-Wage Income should increase with age as more social safety nets are activated (elderly, disabled, etc.).

7. The same model was run for NonWageIncome in levels, with a much worse fit: extreme AIC/BIC, a correlation of 15.89%, and a squared correlation of 0.025249, thus a pretty bad fit.

8. In both models the p-value of the Wald chi-square test is 0.0000, thus rejecting the hypothesis that all the regressors’ coefficients are 0; at least one of them is different from 0.

9. Wald tests were performed on all group variables, and all were statistically significant, leading to better prediction.

10. It would have been important for this particular report to test this model against Multinomial Logit or Conditional Logit, or a similar selection model such as the Generalized Tobit.

11. The margins at the means were taken as shown in the annex. The results display education as the most significant regressor on the expected observed effect on NonWage Income, followed by age. This finding makes sense as education is both a process that takes time (so its effects are similar to those of age), plus it has a continuous return on investment through time. The conclusion seems to be that to retire into a comfortable life, the best investment to make today is to study, followed maybe by getting married.

12. The model presented here was re-estimated for Wage Income. As wage income cannot be negative, the natural log could be computed without transformation; however, the sample shrank by the observations with a wage income of 0. Running xttobit again with censoring at 0, the results contrast sharply with those for Non-Wage Income: age is more relevant to having Wage Income (finding a job) in the first three age segments (from under 35 to 54); education has a relevant but smaller role, at 18.5% per year of education; being a woman represents a disadvantage of 27.66% in wages; and being single is penalized with 76.71% less. Race had other interesting results: being black meant 38.40% less wage income, against a much less pronounced 27.34% gap in Non-Wage Income, which suggests that this group receives more government help later on. Being Latino represented a loss of just under 7% of wages (6.98%), very similar to the Non-Wage coefficient. The most striking change is in other races: while belonging to this group meant a disadvantage of over 52% in Non-Wage Income, the gap is less pronounced here, at 20.17%.

13. The model estimated above had a much better fit than the Non-Wage model, with a log likelihood of about two-thirds the magnitude (-33,162.6 against -49,649.4), AIC/BIC values of about two-thirds, and a 57.67% correlation, or a 33.26% squared correlation.

14. This topic points towards further research on variables likely missing in the case of Non-Wage Income, such as health or political status (which would be complicated, as the electoral system in America does not allow for a direct correlation between population numbers and the political importance of a geographic group), or even things like the probability of serving in a war or armed conflict, or of working for the government in general.

15. It would also be important to research the possibility of individual-specific effects that are not being properly accounted for, such as risk aversion, propensity to save, or skills that lead people towards more advanced education and that could translate into greater Non-Wage Income later in their lives.
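The arithmetic behind several of the findings above can be checked directly. The sketch below is in Python rather than Stata and is only illustrative; the helper names pct_effect, aic, and bic are introduced here, and the inputs are copied from the Stata output in the annex. It recomputes the AIC and BIC from the reported log likelihood, squares the reported correlation, and converts log-scale coefficients into exact percentage changes of the logged quantity via exp(b) - 1, since reading a coefficient b directly as a percentage is an approximation that is close only when b is small.

```python
import math


def pct_effect(coef):
    """Exact percentage change in the logged quantity implied by a log-scale coefficient."""
    return (math.exp(coef) - 1) * 100


def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


# Coefficients copied from the xttobit output for LnNonWincome.
coefs = {
    "EDUC": 0.2287489,
    "2.HHSEX": -0.1524588,
    "2.MARRIED": -0.7462958,
    "2.RACE": -0.2733963,
    "3.RACE": -0.0729652,
    "5.RACE": -0.5217904,
}
effects = {name: round(pct_effect(b), 2) for name, b in coefs.items()}

# Fit statistics recomputed from the reported log likelihood (14 parameters, 22593 obs).
model_aic = aic(-49649.405, 14)
model_bic = bic(-49649.405, 14, 22593)
squared_corr = 0.5004 ** 2  # squared correlation of fitted and observed values

print(effects)
print(round(model_aic, 2), round(model_bic, 2), round(squared_corr, 4))
```

For the larger coefficients the two readings diverge noticeably: the coefficient of -0.746 for being single corresponds to a change of roughly -53% rather than -74.63%, while the small coefficient for Latino households changes very little.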


Histogram of NonWincome: highly clustered and censored to the left (x-axis: NonWincome; y-axis: Density)

Histogram of the log of Non-Wage Income: more normally distributed (x-axis: LnNonWincome; y-axis: Density)

Histogram of the log of Wage Income (x-axis: LnWage; y-axis: Density)


STATA OUTPUT

name: <unnamed>

log: \\Client\C$\Users\User\OneDrive\Microeconometrics\Project\TobitFinalLog.smcl

log type: smcl

opened on: 9 Dec 2016, 16:26:53

. do "C:\Users\ecalle1\AppData\Local\Temp\560\STD00000000.tmp"

. clear

. import excel "\\Client\C$\Users\User\OneDrive\Microeconometrics\Project\Excel for project 1.

> xlsx", sheet("SCFP2013") firstrow

. ***raw data summary

. summarize INCOME WAGEINC i.HHSEX AGE EDUC i.MARRIED i.RACE

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

INCOME | 30075 856289.2 6014268 0 1.80e+08

WAGEINC | 30075 122615.7 652381.2 0 2.24e+07

2.HHSEX | 30075 .2365752 .4249863 0 1

AGE | 30075 51.75328 16.17062 18 95

-------------+--------------------------------------------------------

EDUC | 30075 13.96359 2.699456 -1 17

2.MARRIED| 30075 .3733998 .483715 0 1

|

RACE |

2 | 30075 .1240565 .3296515 0 1

-------------+--------------------------------------------------------

3 | 30075 .0924688 .2896914 0 1

5 | 30075 .0478803 .2135165 0 1

. *** Compensate survey weights

. drop if Y1<0 & YY1<0 & WGT<0

(0 observations deleted)

. ** CREATING NON-WAGE INCOME

. gen NonWincome = BUSSEFARMINC+ INTDIVINC+ KGINC+ SSRETINC+

TRANSFOTHINC+ PENACCTWD

. drop if BUSSEFARMINC==0 & INTDIVINC==0 & KGINC==0 & SSRETINC==0 &

TRANSFOTHINC==0 & PENACCTW

> D==0

(7056 observations deleted)

. ** CREATING LOG NON WAGE INCOME


. gen NonW600 = (NonWincome * 600)

. gen LnNonWincome = ln(NonW600)

(426 missing values generated)

. *** Histograms

. histogram NonWincome

(bin=43, start=-517411.03, width=3613662.6)

. histogram LnNonWincome

(bin=43, start=-6.0322866, width=.72761192)

. *** Tabulate variables

. tabulate HHSEX

HHSEX | Freq. Percent Cum.

------------+-----------------------------------

1 | 17,564 76.30 76.30

2 | 5,455 23.70 100.00

------------+-----------------------------------

Total | 23,019 100.00

. tabulate AGECL

AGECL | Freq. Percent Cum.

------------+-----------------------------------

1 | 2,810 12.21 12.21

2 | 3,470 15.07 27.28

3 | 4,745 20.61 47.90

4 | 5,349 23.24 71.13

5 | 4,015 17.44 88.57

6 | 2,630 11.43 100.00

------------+-----------------------------------

Total | 23,019 100.00

. tabulate EDCL

EDCL | Freq. Percent Cum.

------------+-----------------------------------

1 | 1,941 8.43 8.43

2 | 5,801 25.20 33.63

3 | 3,751 16.30 49.93

4 | 11,526 50.07 100.00

------------+-----------------------------------

Total | 23,019 100.00


. tabulate MARRIED

MARRIED | Freq. Percent Cum.

------------+-----------------------------------

1 | 14,680 63.77 63.77

2 | 8,339 36.23 100.00

------------+-----------------------------------

Total | 23,019 100.00

. tabulate RACE

RACE | Freq. Percent Cum.

------------+-----------------------------------

1 | 17,719 76.98 76.98

2 | 2,624 11.40 88.37

3 | 1,601 6.96 95.33

5 | 1,075 4.67 100.00

------------+-----------------------------------

Total | 23,019 100.00

. *** Setting the variables in Panel Form (Age has to be categorical to be used this way)

. xtset HHSEX

panel variable: HHSEX (unbalanced)

. xtset AGECL

panel variable: AGECL (unbalanced)

. xtset EDUC

panel variable: EDUC (unbalanced)

. xtset MARRIED

panel variable: MARRIED (unbalanced)

. xtset RACE

panel variable: RACE (unbalanced)

. **** Applying Xttobit

. xttobit LnNonWincome i.AGECL EDUC i.HHSEX i.MARRIED i.RACE

Obtaining starting values for full model:

Iteration 0: log likelihood = -49649.407


Fitting full model:




Random-effects tobit regression Number of obs = 22593

Group variable: RACE Number of groups = 4

Random effects u_i ~ Gaussian Obs per group: min = 1035

avg = 5648.3

max = 17378

Integration method: mvaghermite Integration points = 12

Wald chi2(11) = 7548.31

Log likelihood = -49649.405 Prob > chi2 = 0.0000

------------------------------------------------------------------------------

LnNonWincome | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

AGECL |

2 | .7066101 .056232 12.57 0.000 .5963975 .8168228

3 | 1.302946 .0530967 24.54 0.000 1.198879 1.407014

4 | 1.980092 .0521607 37.96 0.000 1.877859 2.082325

5 | 2.647917 .0549011 48.23 0.000 2.540313 2.755522

6 | 2.785868 .0606315 45.95 0.000 2.667033 2.904704

|

EDUC | .2287489 .0059158 38.67 0.000 .2171541 .2403437

2.HHSEX | -.1524588 .048946 -3.11 0.002 -.2483912 -.0565265

2.MARRIED | -.7462958 .0432882 -17.24 0.000 -.8311391 -.6614524

|

RACE |

2 | -.2733963 .0482893 -5.66 0.000 -.3680417 -.1787509

3 | -.0729652 .0602913 -1.21 0.226 -.191134 .0452036

5 | -.5217904 .0702695 -7.43 0.000 -.6595161 -.3840647

_cons | 12.20378 .0981247 124.37 0.000 12.01146 12.3961

-------------+----------------------------------------------------------------

/sigma_u | 2.37e-19 .0144931 0.00 1.000 -.028406 .028406

/sigma_e | 2.178461 .0102482 212.57 0.000 2.158374 2.198547

-------------+----------------------------------------------------------------

rho | 1.19e-38 1.45e-21 0 1

------------------------------------------------------------------------------

Observation summary: 0 left-censored observations

22593 uncensored observations

0 right-censored observations


. *** Test with AIC and BIC

. estat ic

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------

Model | Obs ll(null) ll(model) df AIC BIC

-------------+---------------------------------------------------------------

. | 22593 . -49649.41 14 99326.81 99439.17

-----------------------------------------------------------------------------

Note: N=Obs used in calculating BIC; see [R] BIC note

. *** Predict yhat

. predict yhat

(option xb assumed; fitted values)

. *** Correlate with model to test fit

. correlate LnNonWincome yhat

(obs=22593)

| LnNonW~e yhat

-------------+------------------

LnNonWincome | 1.0000

yhat | 0.5004 1.0000

. *** Estimate Marginal change

. margins, atmeans

Adjusted predictions Number of obs = 22593

Model VCE : OIM

Expression : Linear prediction, predict()

at : 1.AGECL = .1228257 (mean)

2.AGECL = .1495596 (mean)

3.AGECL = .2021423 (mean)

4.AGECL = .2325499 (mean)

5.AGECL = .1768247 (mean)

6.AGECL = .1160979 (mean)

EDUC = 14.10278 (mean)

1.HHSEX = .7612092 (mean)

2.HHSEX = .2387908 (mean)

1.MARRIED = .6347541 (mean)

2.MARRIED = .3652459 (mean)

1.RACE = .7691763 (mean)

2.RACE = .1148143 (mean)


3.RACE = .0701987 (mean)

5.RACE = .0458106 (mean)

------------------------------------------------------------------------------

| Delta-method

| Margin Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

_cons | 16.68156 .0144931 1151.00 0.000 16.65315 16.70996

------------------------------------------------------------------------------

. *** Wald Tests

. test 2.AGECL 3.AGECL 4.AGECL 5.AGECL 6.AGECL

( 1) [LnNonWincome]2.AGECL = 0





chi2( 5) = 3786.96

Prob > chi2 = 0.0000

. test 2.HHSEX

( 1) [LnNonWincome]2.HHSEX = 0

chi2( 1) = 9.70

Prob > chi2 = 0.0018

. test EDUC

( 1) [LnNonWincome]EDUC = 0

chi2( 1) = 1495.15

Prob > chi2 = 0.0000

. test 2.MARRIED

( 1) [LnNonWincome]2.MARRIED = 0

chi2( 1) = 297.22

Prob > chi2 = 0.0000

. test 2.RACE 3.RACE 5.RACE

( 1) [LnNonWincome]2.RACE = 0




chi2( 3) = 80.15

Prob > chi2 = 0.0000

. *** Run Xttobit for regular non-wage income (To compare)

. xttobit NonWincome i.AGECL i.RACE EDU i.MARRIED i.HHSEX




Fitting full model:






avg = 5754.8

max = 17719


Wald chi2(11) = 596.46


------------------------------------------------------------------------------

NonWincome | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

AGECL |

2 | 66153.51 146599.6 0.45 0.652 -221176.4 353483.4

3 | 574665.5 138139.2 4.16 0.000 303917.7 845413.2

4 | 791156.9 136203 5.81 0.000 524203.9 1058110

5 | 1181212 143680.9 8.22 0.000 899602.1 1462821

6 | 863541.6 158901.4 5.43 0.000 552100.5 1174983

|

RACE |

2 | -263855.5 126283.5 -2.09 0.037 -511366.6 -16344.5

3 | -121636.8 157746 -0.77 0.441 -430813.3 187539.7

5 | -457657.6 181501.1 -2.52 0.012 -813393.2 -101922.1

|


EDUC | 223992.1 15444.48 14.50 0.000 193721.5 254262.7

2.MARRIED | -646968.5 113092.4 -5.72 0.000 -868625.6 -425311.5

2.HHSEX | -233173.2 127978.4 -1.82 0.068 -484006.2 17659.9

_cons | -2536490 256595.3 -9.89 0.000 -3039408 -2033573

-------------+----------------------------------------------------------------

/sigma_u | 1.80e-12 37774.37 0.00 1.000 -74036.4 74036.4

/sigma_e | 5731132 26710.51 214.56 0.000 5678781 5783484

-------------+----------------------------------------------------------------

rho | 9.82e-38 4.13e-21 0 1

------------------------------------------------------------------------------




. *** Test with AIC and BIC

. estat ic


-----------------------------------------------------------------------------


-------------+---------------------------------------------------------------

. | 23019 . -390871 14 781769.9 781882.5

-----------------------------------------------------------------------------


. *** Predict Y for Xttobit with non Wage Income

. predict wagehat


. *** test fit of model for Non-Wage Income

. correlate wagehat NonWincome

(obs=23019)

| wagehat NonWin~e

-------------+------------------

wagehat | 1.0000

NonWincome | 0.1589 1.0000

. log close

name: <unnamed>

log: \\Client\C$\Users\User\OneDrive\Microeconometrics\Project\TobitFinalLog.smcl

log type: smcl

closed on: 9 Dec 2016, 17:31:04

Extra Model Estimation: Xttobit of Ln of Wage Income:


xttobit LnWage i.AGECL EDUC i.HHSEX i.MARRIED i.RACE, ll(0)




Fitting full model:






avg = 5472.3

max = 15862


Wald chi2(11) = 10906.49


------------------------------------------------------------------------------

LnWage | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

AGECL |

2 | .512951 .0231636 22.14 0.000 .4675513 .5583508

3 | .6474464 .0221438 29.24 0.000 .6040453 .6908476

4 | .6041509 .0233908 25.83 0.000 .5583058 .649996

5 | .1406824 .0302148 4.66 0.000 .0814624 .1999024

6 | .026963 .0531135 0.51 0.612 -.0771375 .1310635

|

EDUC | .1850258 .003127 59.17 0.000 .1788969 .1911546

2.HHSEX | -.2765807 .0260926 -10.60 0.000 -.3277213 -.2254401

2.MARRIED| -.7671472 .0223455 -34.33 0.000 -.8109436 -.7233508

|

RACE |

2 | -.3840131 .024279 -15.82 0.000 -.4315991 -.336427

3 | -.0698663 .0267616 -2.61 0.009 -.1223179 -.0174146

5 | -.2017611 .0336816 -5.99 0.000 -.2677758 -.1357465

|

_cons | 8.266862 .0489288 168.96 0.000 8.170963 8.362761

-------------+----------------------------------------------------------------

/sigma_u | 7.25e-22 .0074408 0.00 1.000 -.0145837 .0145837

/sigma_e | 1.100865 .0052615 209.23 0.000 1.090553 1.111177

-------------+----------------------------------------------------------------

rho | 4.34e-43 8.90e-24 0 1

------------------------------------------------------------------------------





. estat ic


-----------------------------------------------------------------------------


-------------+---------------------------------------------------------------

. | 21889 . -33162.6 14 66353.19 66465.1

-----------------------------------------------------------------------------


. predict yhat


. correlate yhat LnWage

(obs=21889)

| yhat LnWage

-------------+------------------

yhat | 1.0000

LnWage | 0.5767 1.0000
