Analysis of the Propensity to Earn Non-Wage Income in America
MICROECONOMETRICS FINAL PROJECT EMILIO JOSE CALLE
HOPKINS AAP | December 2016
Question of Interest
Differences in income have long been studied by gender, race and educational level, mostly in
regard to wage discrimination in the workplace. But what happens with non-wage income, such
as earnings from dividends, capital gains, rents, and interest? Are the gaps (gender, race,
education) just as wide, or are they even wider, especially in a country such as the US where
non-wage income can make up a substantial share of a household’s total income? In a year
filled with racial and gender divides, the sources of inequality in the country are of greater
interest now than ever before.
Literature Review
The first report reviewed was “Expanding the Economic Base Model to Include Nonwage
Income”, by Katherine Nesse from Kansas State University. This report was particularly
interesting as it detailed both where some of the most important sources of nonwage income
come from and the economic impact they are having. Specifically, it mentions how retirement
income and Medicare/Medicaid income (both government transfers) have an impact on
employment, production, and jobs in places such as Florida, which contain a high proportion of
recipients because of the retirees who settle there as well as the military personnel stationed there.
Another paper reviewed was “Accounting for Changes in Income Inequality: Decomposition
Analyses for the UK, 1978-2008”, by Mike Brewer and Liam Wren-Lewis, which is basically a
British account of the same phenomenon of retirees and people on government support, in
particular in the more affordable parts of the country, which include Scotland and
Ireland.
A third article reviewed was “The effects of economic growth on income inequality in the US” by Amir Rubin and Dan Segal.
This article is quite relevant because it points to other sources of income growth for Americans that are
related neither to wages nor to transfers, such as the growth of wealth due to the rising value of real
estate in the US, non-farm land, and other related investment activities. Even though the article
does not say so, these conclusions match what Piketty has presented in his book “Capital in the
21st Century”, where he also argues that neither wages nor stocks nor financial instruments but
real estate has been the biggest driver of wealth creation in developed nations in the last century.
Other documents reviewed include “Reformas de las pensiones públicas y privadas en España”
by Rosado Cebrian and Alonso Fernandez, which had a much greater focus on people living longer
and having to do so with shrinking pensions (it would seem like nonwage income turning negative
up to a point), and “Economic Growth and Income Inequality in the Asia-Pacific Region: A
Comparative Study of China, Japan, South Korea, and the United States” by Yiwen Yang and
Theresa Greaney, which offers interesting facts but is a bit too broad for this essay, which is more
focused on the reality of the United States.
Describe the Data Source
The source to be used in this study is the “2013 Survey of Consumer Finances” (SCF), developed and
published by the Board of Governors of the Federal Reserve System. According to the Fed, the
2013 SCF is the most recent survey conducted. The SCF is normally a triennial cross-sectional survey
of U.S. families. The survey data include information on families’ balance sheets, pensions,
income, and demographic characteristics. Information is also included from related surveys of
pension providers and the earlier such surveys conducted by the Federal Reserve Board. No other
study for the country collects comparable information. Although this survey presents
data from 2013, the data points about income are from 2012 results, gathered through 5 years;
thus it will be transformed into panel data using Xtset. Stata describes this dataset, before any
editing, this way: obs: 30,075 / vars: 324 / size: 31,789,275.
The data from the survey are presented in 2010 dollars according to the source. It’s also useful for
this particular research that the variables that are going to be analyzed (Age, Gender, Education,
Race and Marital Status) are recorded as categorical variables. Age of the household
head is coded 1: <35, 2: 35-44, 3: 45-54, 4: 55-64, 5: 65-74, 6: >=75. Race is defined as
1=white non-Hispanic, 2=black/African-American, 3=Hispanic, 5=other. Education is defined as
1=no high school diploma/GED, 2=high school diploma or GED, 3=some college, 4=college
degree. Finally, marital status is defined as 1=married/living with partner, 2=neither married
nor living with partner. However, Age
and Education are also presented as numerical values. The initial Tobit results are going to be
presented both ways (numerical and categorical, whenever possible).
As it’s not necessary for this project to go beyond the presented variables, and as the dataset is too
heavy to be manipulated comfortably using Stata, it has been cut to include just the aforementioned
variables for the purposes of this study, plus the identification variables YY1 and Y1 and the weight
WGT for survey weighting purposes. According to the SAS codebook used for this survey, the weighting
proposed by the authors is that only the data points where Y1, YY1 and WGT are all positive
(>0) should be used. This was the first test performed on the dataset, and it was found that
all datapoints were compliant, possibly
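The positivity rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the three records are invented, and only the field names Y1, YY1 and WGT come from the text.

```python
# Hypothetical records; only the field names Y1, YY1 and WGT come from the text.
records = [
    {"Y1": 11, "YY1": 1, "WGT": 3100.5},
    {"Y1": 12, "YY1": 1, "WGT": 0.0},     # zero weight: fails the rule
    {"Y1": -1, "YY1": 2, "WGT": 2800.0},  # negative identifier: fails the rule
]


def has_valid_weighting(row):
    """Return True only when Y1, YY1 and WGT are all strictly positive."""
    return row["Y1"] > 0 and row["YY1"] > 0 and row["WGT"] > 0


kept = [row for row in records if has_valid_weighting(row)]
dropped = len(records) - len(kept)
print(f"kept {len(kept)} of {len(records)} records, dropped {dropped}")
```

Because the rule is "all three positive", a record is excluded as soon as any one of the three values fails. A condition that drops a record only when all three are negative at the same time would be a weaker check.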
Proposed empirical framework
Continuing with the survey weighting situation, a problem with the data given by the Federal
Reserve is that, when working with SCF data, standard error calculations can overstate the
reliability of regressions and other statistics unless two other kinds of error are accounted for:
imputation error and sample variability error. One way to account for this would be to use the
Replicate Weight files provided by the Fed, plus the specific Stata code that is suggested. However,
this weighting problem mainly affects standard errors due to imputation error, as described by the
survey authors: “Missing data in the SCF are imputed 5 times, meaning that each SCF family has
5 separate observations (called “implicates”) in the final data.” This situation, added to the
complications of fixing the data, prompts the search for a simpler solution, one that relies less on the
precision of standard errors. One way to do this is by using Non-Linear Least Squares estimators, as there
is a considerable technical difference when analyzing the squared errors: “For linear models,
the sums of the squared errors always add up in a specific manner: SS Regression + SS Error = SS
Total. This seems quite logical. The variance that the regression model accounts for plus the error
variance adds up to equal the total variance. Further, R-squared equals SS Regression / SS Total,
which mathematically must produce a value between 0 and 100%. In nonlinear regression, SS
Regression + SS Error do not equal SS Total! This completely invalidates R-squared for nonlinear
models, and it no longer has to be between 0 and 100%.” (from Minitab Blog). The sample
variability problem, which results from using a sample rather than a population, can also partially
be compensated for by a Non-Linear model and by the fact that the sample size is fairly large, even
after constraining the variables.
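The sum-of-squares identity quoted above for linear models can be checked numerically. The sketch below fits a one-regressor OLS line to five invented data points (the numbers are assumptions made purely for illustration) and compares SS Regression + SS Error with SS Total.

```python
# Invented data, used only to illustrate the identity for a linear model.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form OLS estimates for a single regressor plus an intercept.
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_regression = sum((f - y_bar) ** 2 for f in fitted)
ss_error = sum((y - f) ** 2 for y, f in zip(ys, fitted))
r_squared = ss_regression / ss_total

print("SS Total:                ", ss_total)
print("SS Regression + SS Error:", ss_regression + ss_error)
print("R-squared:               ", r_squared)
```

The two sums agree up to floating-point rounding because the model is linear, includes an intercept and is fitted by least squares. Outside that setting the identity is not guaranteed, which is the point the quotation makes.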
The Tobit model is a nonlinear censored-regression model that belongs to the family of selection
models and is estimated by Maximum Likelihood. Jay Stewart, from the US Bureau of
Labor Statistics, best describes the reason why the Tobit model has become so popular in cases
with censored data in the following way: “Tobit has been the predominant approach in more-recent
studies. The Tobit model would seem to be a sensible approach, because it was developed
specifically for situations where the dependent variable is truncated at zero or some other cutoff.
The standard discussion of the Tobit model (Tobin, 1958) assumes that there is a latent variable
(for example, desired expenditures) underlying the observed dependent variable (actual
expenditures). The two are equal when the latent variable is greater than zero, but the observed
variable is zero when the latent variable is negative.” (BLS Working Papers, November 2009,
“Tobit or not Tobit?”).
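The latent-variable description quoted above can be made concrete with a small sketch of the censoring rule and of the log-likelihood that a Tobit estimator maximizes. It assumes normal errors and a lower limit of zero; the function names are hypothetical and this is not the routine that Stata runs internally.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def observed_outcome(latent, lower=0.0):
    """Censoring rule: equal to the latent value above the limit, else the limit."""
    return max(lower, latent)


def tobit_loglik(y, xb, sigma, lower=0.0):
    """Log-likelihood of a Tobit model that is left-censored at `lower`.

    y     : observed outcomes
    xb    : linear predictions (x_i times beta), one per observation
    sigma : standard deviation of the latent error, must be positive
    """
    total = 0.0
    for y_i, xb_i in zip(y, xb):
        if y_i <= lower:
            # Censored case: probability that the latent value is at or below the limit.
            total += math.log(normal_cdf((lower - xb_i) / sigma))
        else:
            # Uncensored case: normal density of the residual.
            total += math.log(normal_pdf((y_i - xb_i) / sigma) / sigma)
    return total


print(observed_outcome(-3.2), observed_outcome(1.5))
print(tobit_loglik([0.0, 2.0], [0.5, 1.5], sigma=1.0))
```

When no observation sits at the limit, only the uncensored branch is used, so the likelihood is the same as that of a linear regression with normal errors.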
Another reason for using a nonlinear estimation approach such as Tobit is that if the dependent variable is
constrained, and there’s clustering of the data at the constraint, using OLS on the complete sample
would be biased and inconsistent, while performing OLS on the unclustered part would be biased
and inconsistent as well. As this research is on Non-Wage Income, all the wage income has been
initially suppressed. Also, the Non-Wage variable has to be created, as it’s not part of
the original survey. Here it’s important to note that Non-Wage Income can be negative, as some
of its components, such as government transfers and business farm income, can turn negative. Thus
a straight Income minus Wage Income approach is not necessarily the ideal formula to use. Instead, the
variable NonWageIncome (expressed as NonWincome) was composed by adding 6 variables:
Farm Income + Dividend Income + Government Transfers Income + Social Security Income +
Retirement Account Income + Appreciation Income (KG Income: income from asset
appreciation, gold, etc.).
As the values can be negative depending on where they fall in the distribution, it’s important to
separate out those who do not receive any income other than wages. In this case this was done by
dropping those data points where all 6 of these variables were equal to 0 at the same time, thus
guaranteeing that a Non-Wage Income of zero indeed came from having no such income and not from
an arithmetic artifact. This step reduced the observations by 7,056, bringing the final sample to 23,019.
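The construction of the Non-Wage Income variable and the all-zero filter described above can be sketched as follows. The six component names are the Stata variable names that appear in this project's log; the three households and the helper names are invented for illustration.

```python
# Component names as they appear in this project's Stata log.
COMPONENTS = (
    "BUSSEFARMINC",
    "INTDIVINC",
    "KGINC",
    "SSRETINC",
    "TRANSFOTHINC",
    "PENACCTWD",
)


def household(**values):
    """Build a hypothetical household with every component defaulting to 0."""
    row = dict.fromkeys(COMPONENTS, 0)
    row.update(values)
    return row


def non_wage_income(row):
    """Sum of the six non-wage components."""
    return sum(row[name] for name in COMPONENTS)


def has_any_component(row):
    """True when at least one of the six components is non-zero."""
    return any(row[name] != 0 for name in COMPONENTS)


households = [
    household(INTDIVINC=1200),                 # some dividend income
    household(),                               # no non-wage income at all: dropped
    household(BUSSEFARMINC=-5000, KGINC=5000), # components offset each other: kept
]

sample = [
    dict(row, NonWincome=non_wage_income(row))
    for row in households
    if has_any_component(row)
]
print([row["NonWincome"] for row in sample])
```

Filtering on the components rather than on the sum keeps a household whose components happen to offset each other, while a household with no non-wage income at all is removed.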
After compensating for these data situations, the plan is to apply a Tobit regression to both the
levels and the logs of Non-Wage Income, using the variables as Panel Data and using Xttobit. Results
are going to be presented as both a numerical value and a propensity. Afterwards, the fit of the
model is going to be analyzed by computing the fitted y’s and correlating them with the dependent variable.
Also, the statistical significance of each category is going to be analyzed, and the marginal effects at
the means will be computed. It’s important to note that, as there are negative numbers in
Non-Wage Income, it was necessary to transform the data to be able to log it. This was done
because the values, including the negative ones, are meaningful (negative non-wage income
might be a discrimination-motivated factor, for example). Finally, the Xttobit command has the
advantage of reporting the overall and panel-level variance components (labeled sigma_e and
sigma_u, respectively) together with rho, which is the percent contribution to the total variance of
the panel-level variance component. When rho is zero, the panel-level variance component is
unimportant, and the panel estimator is not different from the pooled estimator. A likelihood-ratio
test of this is included at the bottom of the output. This test formally compares the pooled estimator
(tobit) with the panel estimator. It’s also important to note that Stata rejects Age in numerical
form for the Xttobit regression, thus it’s used in categorical form.
The exclusionary restriction in this report is then not having non-wage income.
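The rho statistic described above is the share of the total variance that comes from the panel-level component. A minimal sketch, assuming the usual definition rho = sigma_u^2 / (sigma_u^2 + sigma_e^2), and using the sigma_u and sigma_e values printed in the Stata output of this report for the logged model:

```python
def panel_variance_share(sigma_u, sigma_e):
    """Share of total variance due to the panel-level component (rho)."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)


# Values as printed in the Stata output for the logged Non-Wage Income model.
SIGMA_U = 2.37e-19
SIGMA_E = 2.178461

rho = panel_variance_share(SIGMA_U, SIGMA_E)
print(f"rho = {rho:.3e}")
```

The function name is hypothetical. Because the printed sigma_u is rounded, the result can differ slightly in the last digit from the rho that Stata reports, but either way it is indistinguishable from zero.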
Descriptive Statistics
Raw data, before any transformation
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      INCOME |     30075    856289.2     6014268          0   1.80e+08
     WAGEINC |     30075    122615.7    652381.2          0   2.24e+07
     2.HHSEX |     30075    .2365752    .4249863          0          1
         AGE |     30075    51.75328    16.17062         18         95
-------------+--------------------------------------------------------
        EDUC |     30075    13.96359    2.699456         -1         17
   2.MARRIED |     30075    .3733998     .483715          0          1
             |
        RACE |
          2  |     30075    .1240565    .3296515          0          1
-------------+--------------------------------------------------------
          3  |     30075    .0924688    .2896914          0          1
          5  |     30075    .0478803    .2135165          0          1
The results show that total income has a mean of $856,289.20 while wage income has a mean of
$122,615.70, pointing towards a big inequality in earnings. The standard deviations are not
reliable for the reasons presented in the previous sections. Regarding the other variables, it
can be seen that the households are headed by women 23.66% of the time, that 37.34%
are single, that the average age of the household head in this sample is 51.75 years, that the
head has almost 14 years of education on average (about an associate’s degree), and that 73.55%
are identified as “white”.
                         NonWincome
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -2312.117        -517411
 5%     202.9063        -517411
10%     1318.891        -517411       Obs               23019
25%     7913.345        -517411       Sum of Wgt.       23019

50%     22319.69                      Mean           895536.8
                        Largest       Std. Dev.       5805035
75%     92084.58       1.34e+08
90%     933368.9       1.38e+08       Variance       3.37e+13
95%      3094321       1.38e+08       Skewness        13.7253
99%     1.93e+07       1.55e+08       Kurtosis       238.8792
The Non-Wage Income variable is heavily skewed and has a high kurtosis. This could be a
warning that the Tobit MLE might not be the best model for these data. One option would be to see if these
characteristics persist when ignoring the observations at 0 and below. This was done, but the results
are basically the same. However, this report will also transform NonWageIncome by adding
600,000 and then logging it. This way no value below zero is ignored, and the log can be
taken. Working with logs instead of levels reduces both the Skewness and the Kurtosis
significantly, as shown below, thus this report will focus on the logged variable. See Annex.
                        LnNonWincome
-------------------------------------------------------------
      Percentiles      Smallest
 1%     13.30082       11.32163
 5%     13.30502       11.32163
10%     13.30688       11.32163       Obs               23019
25%     13.31779       11.32163       Sum of Wgt.       23019

50%     13.34121                      Mean           13.58935
                        Largest       Std. Dev.      .6784092
75%     13.44746       18.71591
90%     14.24298       18.74466       Variance        .460239
95%     15.12231       18.74993       Skewness       3.637583
99%     16.80503       18.86196       Kurtosis        18.2104
Dependent Variable: Log of Non-Wage Income (Transformed)
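The shift-and-log transformation described in the text, adding 600,000 before taking the natural log, can be sketched as below. The function names are hypothetical; the two input values are the smallest observation and the median from the NonWincome table.

```python
import math

SHIFT = 600_000  # constant added before logging, as stated in the text


def shifted_log(value, shift=SHIFT):
    """Natural log of (value + shift); the shifted value must be positive."""
    shifted = value + shift
    if shifted <= 0:
        raise ValueError("value + shift must be positive to take the log")
    return math.log(shifted)


def undo_shifted_log(logged, shift=SHIFT):
    """Map a logged value back to the original income scale."""
    return math.exp(logged) - shift


smallest = shifted_log(-517411)   # smallest NonWincome value in the table
median = shifted_log(22319.69)    # median NonWincome value in the table
print(round(smallest, 5), round(median, 5))
```

Under this reading the two results line up with the smallest value and the median reported in the LnNonWincome table, which suggests that table was produced by an additive shift of 600,000.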
The Skewness and the Kurtosis found can be the result of this survey focusing on Household
Heads, meaning the respondent must be the top earner in a home. This might be causing a selection bias, as
those who are not a household head but are earning income (like students in a student dorm who
work part-time) might be left out. More on this issue is discussed below.
Findings
1. As the attached histograms show, NonWageIncome is highly censored and clustered to the left, just as expected. Also as expected, the Log of Non-Wage Income is more evenly distributed, approaching normal.
2. As expressed previously, R2 does not directly apply to Non-Linear models. Also, as there is clustering in this data, the pseudo-R2 (McFadden) is not appropriate, thus it’s not shown in the results from Stata.
3. The log-likelihood result (-49649.405) is very far from 0, which suggests that this is not a well-fitting model. The correlation of the predicted y with the data points the same way with a 0.5004 result, which when squared gives a squared correlation of just 0.2504. Thus the fit is very loose, although it’s better than that of the regular, level Non-Wage Income.
4. Another test for fit applied was the AIC and BIC. The two are within about 112 points of each other (99,326.81 and 99,439.17), meaning that there is probably no meaningful difference between them. Both are very large, pointing to the possibility of overfitting variables.
5. The Rho of the model (the subject of the Likelihood Ratio Test) is very small, very close to 0, meaning that the panel-level variance component is unimportant and that running the pooled Tobit would probably give about the same results.
6. In this kind of model the marginal effects are the same as the coefficients. With that consideration, this model shows that being a woman has a -15.25% effect on Non-Wage Income, that each extra year of education adds 22.87% to Non-Wage Income, that being single reduces it by 74.63%, and that being other than white reduces non-wage income, although in different proportions: being black has a -27.34% effect, being Latino has a smaller negative effect of -7.30% (not statistically significant, with p = 0.226), while the rest of the races have a -52.18% effect. This last part makes a bit of sense as the “rest” of the races includes many migrant people who might not be able to access the components of non-wage income such as government transfers or retirement accounts, or even own a property or asset that would generate rent or appreciate. Finally, the model shows that non-wage income increases considerably with age, confirming an earlier suspicion that Non-Wage Income should increase with age as more social safety nets are activated (elderly, disabled, etc.).
7. The same model was run for NonWageIncome at levels, with a much worse fit: extreme AIC/BIC, a correlation of 15.89% and a squared correlation of 0.025249, thus a pretty bad fit.
8. In both models the p-value of the Wald chi-square test is 0.0000, thus rejecting the hypothesis that all the regressors’ coefficients are 0; at least one of them is different from 0.
9. Wald tests were performed on all group variables, and all were statistically significant, leading to better prediction.
10. It would have been important for this particular report to test this model against a Multinomial Logit or Conditional Logit, or a similar selection model such as the Generalized Tobit.
11. The margins at the means were taken as shown in the annex. The results display education as the most significant regressor on the expected observed effect on NonWage Income, followed by age. This finding makes sense as education is both a process that takes time (so its effects are similar to those of age), plus it has a continuous return on investment through time. The conclusion seems to be that to retire into a comfortable life, the best investment to make today is to study, followed maybe by getting married.
12. The model presented here was re-estimated, this time for Wage Income. As wage income cannot be negative, the natural log could be computed without transformation. However, the sample size was reduced by those with a wage income of 0. Running Xttobit again with censoring at 0, the results contrast sharply with those for Non-Wage Income: Age is most relevant to having Wage Income (finding a job) in the 35-to-64 age segments, which carry the largest coefficients relative to the under-35 group; education has a relevant but small role at just 18.5% per year of education; being a woman represented a disadvantage of 27.66% in wages; while being single was penalized with 76.71% less. Race had other interesting results: being black meant having 38.40% less wage income, thus implying that they receive more government help later on, given the much less pronounced coefficient for Non-Wage Income. Being Latino represented a loss of just under 7% of wages (6.99%), very similar to the Non-Wage coefficient. The staggering change is in other races: while belonging to this group meant a disadvantage of over 52% in Non-Wage Income, the gap is less pronounced here with just a 20.18% disadvantage.
13. The model estimated above had a much better fit than the Non-Wage model, with a Log Likelihood and AIC/BIC of about two-thirds the size, and a 57.67% correlation, or a squared correlation of 33.25%.
14. This topic points towards the possibility of doing more research on likely missing variables in the case of Non-Wage Income, such as Health or Political Status (which would be complicated, as the electoral system in America does not allow for a direct correlation of population numbers to the political importance of a geographic group), or even things like the probability of serving in a war or an armed conflict, or of working for the government in general.
15. Also, it would be important to research the possibility of there being individual-specific effects that are not being properly accounted for, such as risk aversion, propensity to save, or skills that would lead people to pursue more advanced education and that could represent greater Non-Wage Incomes later in their lives.
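Several figures quoted in the findings above can be recomputed from the Stata output with a few lines of arithmetic. The sketch below assumes the standard definitions AIC = -2 ln L + 2k and BIC = -2 ln L + k ln N. It also shows the exact percentage reading of a coefficient on a logged dependent variable, 100 * (exp(b) - 1), as a cross-check on reading coefficients directly as percentages. Since the dependent variable here is the log of shifted income, any such percentage refers to the shifted variable. The function names are hypothetical.

```python
import math


def squared_correlation(r):
    """Square of the correlation between fitted and observed values."""
    return r * r


def aic(loglik, n_params):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def exact_percent_effect(coef):
    """Exact percent change implied by a coefficient in a log-outcome model."""
    return 100.0 * (math.exp(coef) - 1.0)


# Inputs copied from the Stata output for the logged Non-Wage Income model.
LOGLIK, N_PARAMS, N_OBS = -49649.405, 14, 22593

print("squared correlation:", squared_correlation(0.5004))
print("AIC:", aic(LOGLIK, N_PARAMS))
print("BIC:", bic(LOGLIK, N_PARAMS, N_OBS))
for label, coef in [("2.HHSEX", -0.1524588), ("2.MARRIED", -0.7462958)]:
    print(label, "exact percent effect:", round(exact_percent_effect(coef), 2))
```

For small coefficients the exact figure is close to the coefficient multiplied by 100. For larger ones, such as the coefficient on being single, the two readings differ noticeably, so the percentages in the findings are best treated as approximations.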
Histogram of NonWincome: Highly clustered and censored to the left
[Histogram: Density (y-axis) against NonWincome (x-axis, 0 to 1.50e+08)]
Histogram of the log of Non-Wage Income: More normally distributed
[Histogram: Density (y-axis) against LnNonWincome (x-axis, -10 to 30)]
Histogram of Ln of Wage Income
[Histogram: Density (y-axis) against LnWage (x-axis, 0 to 20)]
STATA OUTPUT
name: <unnamed>
log: \\Client\C$\Users\User\OneDrive\Microeconometrics\Project\TobitFinalLog.smcl
log type: smcl
opened on: 9 Dec 2016, 16:26:53
. do "C:\Users\ecalle1\AppData\Local\Temp\560\STD00000000.tmp"
. clear
. import excel "\\Client\C$\Users\User\OneDrive\Microeconometrics\Project\Excel for project 1.
> xlsx", sheet("SCFP2013") firstrow
. ***raw data summary
. summarize INCOME WAGEINC i.HHSEX AGE EDUC i.MARRIED i.RACE
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
INCOME | 30075 856289.2 6014268 0 1.80e+08
WAGEINC | 30075 122615.7 652381.2 0 2.24e+07
2.HHSEX | 30075 .2365752 .4249863 0 1
AGE | 30075 51.75328 16.17062 18 95
-------------+--------------------------------------------------------
EDUC | 30075 13.96359 2.699456 -1 17
2.MARRIED| 30075 .3733998 .483715 0 1
|
RACE |
2 | 30075 .1240565 .3296515 0 1
-------------+--------------------------------------------------------
3 | 30075 .0924688 .2896914 0 1
5 | 30075 .0478803 .2135165 0 1
. *** Compensate survey weights
. drop if Y1<0 & YY1<0 & WGT<0
(0 observations deleted)
. ** CREATING NON-WAGE INCOME
. gen NonWincome = BUSSEFARMINC+ INTDIVINC+ KGINC+ SSRETINC+
TRANSFOTHINC+ PENACCTWD
. drop if BUSSEFARMINC==0 & INTDIVINC==0 & KGINC==0 & SSRETINC==0 &
TRANSFOTHINC==0 & PENACCTW
> D==0
(7056 observations deleted)
. ** CREATING LOG NON WAGE INCOME
. gen NonW600 = (NonWincome * 600)
. gen LnNonWincome = ln(NonW600)
(426 missing values generated)
. *** Histograms
. histogram NonWincome
(bin=43, start=-517411.03, width=3613662.6)
. histogram LnNonWincome
(bin=43, start=-6.0322866, width=.72761192)
. *** Tabulate variables
. tabulate HHSEX
HHSEX | Freq. Percent Cum.
------------+-----------------------------------
1 | 17,564 76.30 76.30
2 | 5,455 23.70 100.00
------------+-----------------------------------
Total | 23,019 100.00
. tabulate AGECL
AGECL | Freq. Percent Cum.
------------+-----------------------------------
1 | 2,810 12.21 12.21
2 | 3,470 15.07 27.28
3 | 4,745 20.61 47.90
4 | 5,349 23.24 71.13
5 | 4,015 17.44 88.57
6 | 2,630 11.43 100.00
------------+-----------------------------------
Total | 23,019 100.00
. tabulate EDCL
EDCL | Freq. Percent Cum.
------------+-----------------------------------
1 | 1,941 8.43 8.43
2 | 5,801 25.20 33.63
3 | 3,751 16.30 49.93
4 | 11,526 50.07 100.00
------------+-----------------------------------
Total | 23,019 100.00
. tabulate MARRIED
MARRIED | Freq. Percent Cum.
------------+-----------------------------------
1 | 14,680 63.77 63.77
2 | 8,339 36.23 100.00
------------+-----------------------------------
Total | 23,019 100.00
. tabulate RACE
RACE | Freq. Percent Cum.
------------+-----------------------------------
1 | 17,719 76.98 76.98
2 | 2,624 11.40 88.37
3 | 1,601 6.96 95.33
5 | 1,075 4.67 100.00
------------+-----------------------------------
Total | 23,019 100.00
. *** Setting the variables in Panel Form (Age has to be categorical to be used this way)
. xtset HHSEX
panel variable: HHSEX (unbalanced)
. xtset AGECL
panel variable: AGECL (unbalanced)
. xtset EDUC
panel variable: EDUC (unbalanced)
. xtset MARRIED
panel variable: MARRIED (unbalanced)
. xtset RACE
panel variable: RACE (unbalanced)
. **** Applying Xttobit
. xttobit LnNonWincome i.AGECL EDUC i.HHSEX i.MARRIED i.RACE
Obtaining starting values for full model:
Iteration 0: log likelihood = -49649.407
Iteration 1: log likelihood = -49649.405
Fitting full model:
Iteration 0: log likelihood = -49649.405
Iteration 1: log likelihood = -49649.405
Random-effects tobit regression Number of obs = 22593
Group variable: RACE Number of groups = 4
Random effects u_i ~ Gaussian Obs per group: min = 1035
avg = 5648.3
max = 17378
Integration method: mvaghermite Integration points = 12
Wald chi2(11) = 7548.31
Log likelihood = -49649.405 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
LnNonWincome | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
AGECL |
2 | .7066101 .056232 12.57 0.000 .5963975 .8168228
3 | 1.302946 .0530967 24.54 0.000 1.198879 1.407014
4 | 1.980092 .0521607 37.96 0.000 1.877859 2.082325
5 | 2.647917 .0549011 48.23 0.000 2.540313 2.755522
6 | 2.785868 .0606315 45.95 0.000 2.667033 2.904704
|
EDUC | .2287489 .0059158 38.67 0.000 .2171541 .2403437
2.HHSEX | -.1524588 .048946 -3.11 0.002 -.2483912 -.0565265
2.MARRIED | -.7462958 .0432882 -17.24 0.000 -.8311391 -.6614524
|
RACE |
2 | -.2733963 .0482893 -5.66 0.000 -.3680417 -.1787509
3 | -.0729652 .0602913 -1.21 0.226 -.191134 .0452036
5 | -.5217904 .0702695 -7.43 0.000 -.6595161 -.3840647
_cons | 12.20378 .0981247 124.37 0.000 12.01146 12.3961
-------------+----------------------------------------------------------------
/sigma_u | 2.37e-19 .0144931 0.00 1.000 -.028406 .028406
/sigma_e | 2.178461 .0102482 212.57 0.000 2.158374 2.198547
-------------+----------------------------------------------------------------
rho | 1.19e-38 1.45e-21 0 1
------------------------------------------------------------------------------
Observation summary: 0 left-censored observations
22593 uncensored observations
0 right-censored observations
. *** Test with AIC and BIC
. estat ic
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | Obs ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
. | 22593 . -49649.41 14 99326.81 99439.17
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note
. *** Predict yhat
. predict yhat
(option xb assumed; fitted values)
. *** Correlate with model to test fit
. correlate LnNonWincome yhat
(obs=22593)
| LnNonW~e yhat
-------------+------------------
LnNonWincome | 1.0000
yhat | 0.5004 1.0000
. *** Estimate Marginal change
. margins, atmeans
Adjusted predictions Number of obs = 22593
Model VCE : OIM
Expression : Linear prediction, predict()
at : 1.AGECL = .1228257 (mean)
2.AGECL = .1495596 (mean)
3.AGECL = .2021423 (mean)
4.AGECL = .2325499 (mean)
5.AGECL = .1768247 (mean)
6.AGECL = .1160979 (mean)
EDUC = 14.10278 (mean)
1.HHSEX = .7612092 (mean)
2.HHSEX = .2387908 (mean)
1.MARRIED = .6347541 (mean)
2.MARRIED = .3652459 (mean)
1.RACE = .7691763 (mean)
2.RACE = .1148143 (mean)
3.RACE = .0701987 (mean)
5.RACE = .0458106 (mean)
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 16.68156 .0144931 1151.00 0.000 16.65315 16.70996
------------------------------------------------------------------------------
. *** Wald Tests
. test 2.AGECL 3.AGECL 4.AGECL 5.AGECL 6.AGECL
( 1) [LnNonWincome]2.AGECL = 0
( 2) [LnNonWincome]3.AGECL = 0
( 3) [LnNonWincome]4.AGECL = 0
( 4) [LnNonWincome]5.AGECL = 0
( 5) [LnNonWincome]6.AGECL = 0
chi2( 5) = 3786.96
Prob > chi2 = 0.0000
. test 2.HHSEX
( 1) [LnNonWincome]2.HHSEX = 0
chi2( 1) = 9.70
Prob > chi2 = 0.0018
. test EDUC
( 1) [LnNonWincome]EDUC = 0
chi2( 1) = 1495.15
Prob > chi2 = 0.0000
. test 2.MARRIED
( 1) [LnNonWincome]2.MARRIED = 0
chi2( 1) = 297.22
Prob > chi2 = 0.0000
. test 2.RACE 3.RACE 5.RACE
( 1) [LnNonWincome]2.RACE = 0
( 2) [LnNonWincome]3.RACE = 0
( 3) [LnNonWincome]5.RACE = 0
chi2( 3) = 80.15
Prob > chi2 = 0.0000
. *** Run Xttobit for regular non-wage income (To compare)
. xttobit NonWincome i.AGECL i.RACE EDU i.MARRIED i.HHSEX
Obtaining starting values for full model:
Iteration 0: log likelihood = -390870.96
Iteration 1: log likelihood = -390870.96
Fitting full model:
Iteration 0: log likelihood = -390870.96
Iteration 1: log likelihood = -390870.96
Random-effects tobit regression Number of obs = 23019
Group variable: RACE Number of groups = 4
Random effects u_i ~ Gaussian Obs per group: min = 1075
avg = 5754.8
max = 17719
Integration method: mvaghermite Integration points = 12
Wald chi2(11) = 596.46
Log likelihood = -390870.96 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
NonWincome | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
AGECL |
2 | 66153.51 146599.6 0.45 0.652 -221176.4 353483.4
3 | 574665.5 138139.2 4.16 0.000 303917.7 845413.2
4 | 791156.9 136203 5.81 0.000 524203.9 1058110
5 | 1181212 143680.9 8.22 0.000 899602.1 1462821
6 | 863541.6 158901.4 5.43 0.000 552100.5 1174983
|
RACE |
2 | -263855.5 126283.5 -2.09 0.037 -511366.6 -16344.5
3 | -121636.8 157746 -0.77 0.441 -430813.3 187539.7
5 | -457657.6 181501.1 -2.52 0.012 -813393.2 -101922.1
|
EDUC | 223992.1 15444.48 14.50 0.000 193721.5 254262.7
2.MARRIED | -646968.5 113092.4 -5.72 0.000 -868625.6 -425311.5
2.HHSEX | -233173.2 127978.4 -1.82 0.068 -484006.2 17659.9
_cons | -2536490 256595.3 -9.89 0.000 -3039408 -2033573
-------------+----------------------------------------------------------------
/sigma_u | 1.80e-12 37774.37 0.00 1.000 -74036.4 74036.4
/sigma_e | 5731132 26710.51 214.56 0.000 5678781 5783484
-------------+----------------------------------------------------------------
rho | 9.82e-38 4.13e-21 0 1
------------------------------------------------------------------------------
Observation summary: 0 left-censored observations
23019 uncensored observations
0 right-censored observations
. *** Test with AIC and BIC
. estat ic
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | Obs ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
. | 23019 . -390871 14 781769.9 781882.5
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note
. *** Predict Y for Xttobit with non Wage Income
. predict wagehat
(option xb assumed; fitted values)
. *** test fit of model for Non-Wage Income
. correlate wagehat NonWincome
(obs=23019)
| wagehat NonWin~e
-------------+------------------
wagehat | 1.0000
NonWincome | 0.1589 1.0000
. log close
name: <unnamed>
log: \\Client\C$\Users\User\OneDrive\Microeconometrics\Project\TobitFinalLog.smcl
log type: smcl
closed on: 9 Dec 2016, 17:31:04
Extra Model Estimation: Xttobit of Ln of Wage Income:
xttobit LnWage i.AGECL EDUC i.HHSEX i.MARRIED i.RACE, ll(0)
Obtaining starting values for full model:
Iteration 0: log likelihood = -33162.597
Iteration 1: log likelihood = -33162.596
Fitting full model:
Iteration 0: log likelihood = -33162.596
Iteration 1: log likelihood = -33162.596
Random-effects tobit regression Number of obs = 21889
Group variable: RACE Number of groups = 4
Random effects u_i ~ Gaussian Obs per group: min = 1160
avg = 5472.3
max = 15862
Integration method: mvaghermite Integration points = 12
Wald chi2(11) = 10906.49
Log likelihood = -33162.596 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
LnWage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
AGECL |
2 | .512951 .0231636 22.14 0.000 .4675513 .5583508
3 | .6474464 .0221438 29.24 0.000 .6040453 .6908476
4 | .6041509 .0233908 25.83 0.000 .5583058 .649996
5 | .1406824 .0302148 4.66 0.000 .0814624 .1999024
6 | .026963 .0531135 0.51 0.612 -.0771375 .1310635
|
EDUC | .1850258 .003127 59.17 0.000 .1788969 .1911546
2.HHSEX | -.2765807 .0260926 -10.60 0.000 -.3277213 -.2254401
2.MARRIED| -.7671472 .0223455 -34.33 0.000 -.8109436 -.7233508
|
RACE |
2 | -.3840131 .024279 -15.82 0.000 -.4315991 -.336427
3 | -.0698663 .0267616 -2.61 0.009 -.1223179 -.0174146
5 | -.2017611 .0336816 -5.99 0.000 -.2677758 -.1357465
|
_cons | 8.266862 .0489288 168.96 0.000 8.170963 8.362761
-------------+----------------------------------------------------------------
/sigma_u | 7.25e-22 .0074408 0.00 1.000 -.0145837 .0145837
/sigma_e | 1.100865 .0052615 209.23 0.000 1.090553 1.111177
-------------+----------------------------------------------------------------
rho | 4.34e-43 8.90e-24 0 1
------------------------------------------------------------------------------
Observation summary: 0 left-censored observations
21889 uncensored observations
0 right-censored observations
. estat ic
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | Obs ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
. | 21889 . -33162.6 14 66353.19 66465.1
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note
. predict yhat
(option xb assumed; fitted values)
. correlate yhat LnWage
(obs=21889)
| yhat LnWage
-------------+------------------
yhat | 1.0000
LnWage | 0.5767 1.0000