Posted on 11-Jan-2016
1st Stata Group Meeting, Mexico
Discussion of user-written Stata programs
Predicting counterfactual densities with the DFL ado-file: A pertinent constructive critique
Luis Huesca Reynoso
Centro de Investigación en Alimentación y Desarrollo, A.C.
Department of Economics. Email: lhuesca@ciad.mx
April 23, 2009, Universidad Iberoamericana Campus Mexico.
Dealing with distributions (and hence with densities!) is not an easy task.
Problems to face:
A. Scale: logarithmic or in levels.
B. Comparison: unit of measurement (in economics and the social sciences: constant prices, among others).
C. Selection of the right window width (by eye or the optimal one); see, for instance, bandw by Salgado-Ugarte, Shimizu and Taniuchi.
D. Joint estimation: compute the densities together (see, for instance, nbins, or the number of grid points in akdensity).
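As a point of reference for problem C, one common "optimal" window width for a Gaussian kernel is Silverman's (1986) rule of thumb, h = 0.9 · min(sd, IQR/1.34) · n^(-1/5). The Python sketch below assumes that rule. It is not the algorithm used by bandw, the function name `silverman_bandwidth` is hypothetical, and the sample values are made up for illustration.

```python
import statistics


def silverman_bandwidth(data):
    """Rule-of-thumb window width for a Gaussian kernel (Silverman, 1986):
    h = 0.9 * min(sd, IQR / 1.34) * n ** (-1/5).
    """
    n = len(data)
    sd = statistics.stdev(data)                  # sample standard deviation
    q1, _, q3 = statistics.quantiles(data, n=4)  # first and third quartiles
    spread = min(sd, (q3 - q1) / 1.34)           # robust measure of spread
    return 0.9 * spread * n ** (-0.2)


if __name__ == "__main__":
    # Illustrative log earnings, not ENEU data
    log_earnings = [7.1, 7.4, 7.8, 8.0, 8.2, 8.3, 8.9, 9.4]
    print(round(silverman_bandwidth(log_earnings), 4))
```

Doubling every observation doubles h, so the window width is expressed in the units of the data (log points or pesos). This is why problems A, B and C cannot be treated separately.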
Stata makes it easier!
Goal: the estimation of kernel density functions and counterfactual densities that are well dimensioned (correctly scaled), using a semiparametric technique.
Estimate densities that recover the true shape not only of the total distribution but also of the subgroups that make it up.
Probability density function (PDF)
Any function f(y) can serve as a density function as long as:
$$f(y) \ge 0 \quad \forall y \qquad \text{and} \qquad \int_{-\infty}^{\infty} f(y)\,dy = 1.$$
By definition, the PDF must integrate to one. The same holds for the Gaussian and any other well-behaved kernel function (Duclos, 2001; Silverman, 1986), for instance the Epanechnikov, biweight, triangular and cosine kernels.
A general kernel function K(u) used to weight the density must then satisfy
$$\int_{-\infty}^{\infty} K(u)\,du = 1,$$
since then
$$\int_{-\infty}^{\infty} \hat{f}(y)\,dy = 1.$$
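The integrate-to-one condition on K(u) can be checked numerically. The Python sketch below uses the textbook forms of the kernels listed above, on the support |u| ≤ 1 (the Gaussian on the whole line), and integrates them with a simple midpoint rule. Both choices are assumptions for illustration. Software implementations may rescale a kernel, for example to a wider support, and a rescaled kernel still integrates to one.

```python
import math

# Textbook kernels on the support |u| <= 1 (the Gaussian on the whole line).
KERNELS = {
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "biweight": lambda u: 15 / 16 * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0,
    "triangular": lambda u: 1 - abs(u) if abs(u) <= 1 else 0.0,
    "cosine": lambda u: math.pi / 4 * math.cos(math.pi * u / 2) if abs(u) <= 1 else 0.0,
}


def integrate(f, a=-8.0, b=8.0, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


if __name__ == "__main__":
    for name, kernel in KERNELS.items():
        print(f"{name:>12}: {integrate(kernel):.6f}")   # each close to 1.000000
```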
Kernel density estimation, letting the data speak for themselves:
$$\hat{f}(y) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{y - y_i}{h}\right),$$
with $(y_1, \dots, y_n)$ as a vector of earnings, h the optimal window width and K a Gaussian kernel function.
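The estimator above can be transcribed directly with a Gaussian K, as in the following plain-Python sketch. The bandwidth h is passed in (for instance from a rule of thumb). The function name and the evaluation grid are assumptions, and no claim is made that dfl or akdensity compute the estimate this way internally.

```python
import math


def gaussian_kde(grid, data, h):
    """Fixed-bandwidth kernel density estimate with a Gaussian kernel:
    f_hat(y) = 1 / (n h) * sum_i K((y - y_i) / h), K = standard normal PDF.
    """
    n = len(data)
    const = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [const * sum(math.exp(-0.5 * ((y - yi) / h) ** 2) for yi in data)
            for y in grid]


if __name__ == "__main__":
    # Illustrative log earnings and an arbitrary window width
    sample = [7.1, 7.4, 7.8, 8.0, 8.2, 8.3, 8.9, 9.4]
    step = 0.01
    grid = [2.0 + i * step for i in range(1201)]   # common grid from 2 to 14
    dens = gaussian_kde(grid, sample, h=0.4)
    print(round(sum(dens) * step, 4))              # Riemann sum, close to 1
```

Evaluating every density on one common grid, as here, makes densities for different groups directly comparable (problem D above).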
Following Jenkins and Van Kerm (2005), for decompositions the aggregate density is written
$$f(y) = \sum_{k=1}^{K} v_k f_k(y),$$
as a weighted sum of the PDFs for each subgroup k, where $v_k$ stands for the population share of group k (so that $\sum_k v_k = 1$) and $f_k(y)$ is the PDF of group k.
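The decomposition also holds exactly for the kernel estimator itself when every group uses the same fixed bandwidth. Since $\hat f(y) = \frac{1}{nh}\sum_i K(\cdot)$ and $n_k/n = v_k$, the pooled estimate equals $\sum_k v_k \hat f_k(y)$. The Python sketch checks this with made-up data for two hypothetical groups. It also shows what happens when the shares are omitted: the unweighted sum of two group densities integrates to two, not one. With group-specific or adaptive bandwidths the identity is only approximate.

```python
import math


def gaussian_kde(grid, data, h):
    """Fixed-bandwidth Gaussian kernel density estimate on a grid."""
    const = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [const * sum(math.exp(-0.5 * ((y - yi) / h) ** 2) for yi in data)
            for y in grid]


def share_weighted_sum(grid, groups, h):
    """f(y) = sum_k v_k * f_k(y), with v_k = n_k / n the population share of group k."""
    n = sum(len(g) for g in groups)
    total = [0.0] * len(grid)
    for g in groups:
        v_k = len(g) / n                   # population share of group k
        f_k = gaussian_kde(grid, g, h)     # group density, integrates to one
        total = [t + v_k * f for t, f in zip(total, f_k)]
    return total


if __name__ == "__main__":
    # Hypothetical log earnings for two subgroups (illustrative values only)
    formal = [8.0, 8.3, 8.6, 8.9, 9.1, 9.4]
    informal = [6.9, 7.2, 7.6]
    step = 0.01
    grid = [3.0 + i * step for i in range(1001)]     # 3 to 13
    h = 0.4

    pooled = gaussian_kde(grid, formal + informal, h)
    mixture = share_weighted_sum(grid, [formal, informal], h)
    print(max(abs(a - b) for a, b in zip(pooled, mixture)))   # ~0 (rounding only)

    # Omitting the shares: the plain sum of group densities integrates to 2
    unweighted = [a + b for a, b in zip(gaussian_kde(grid, formal, h),
                                        gaussian_kde(grid, informal, h))]
    print(round(sum(unweighted) * step, 3))
```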
- In the empirical example an adaptive kernel estimator is used (Van Kerm, 2003).
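The adaptive estimator can be sketched in two stages, along the lines described by Silverman (1986, sec. 5.3) and Van Kerm (2003). First, a fixed-bandwidth pilot estimate is computed at each observation. Second, the estimate is rebuilt with local bandwidths h·λ_i, where λ_i = (g / f̃(y_i))^(1/2) and g is the geometric mean of the pilot values. This Python sketch is not akdensity's code. The Gaussian kernel, the exponent α = 1/2 and the function names are assumptions for illustration.

```python
import math


def _phi(u):
    """Standard normal density (Gaussian kernel)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def adaptive_kde(grid, data, h, alpha=0.5):
    """Two-stage adaptive Gaussian kernel density estimate.

    1. Pilot: fixed-bandwidth estimate f_pilot(y_i) at every observation.
    2. Local factors: lambda_i = (g / f_pilot(y_i)) ** alpha,
       with g the geometric mean of the pilot values.
    3. Final: f_hat(y) = 1/n * sum_i K((y - y_i) / (h * lambda_i)) / (h * lambda_i).
    Returns the estimate on the grid and the local factors.
    """
    n = len(data)
    pilot = [sum(_phi((yj - yi) / h) for yi in data) / (n * h) for yj in data]
    g = math.exp(sum(math.log(p) for p in pilot) / n)   # geometric mean
    lambdas = [(g / p) ** alpha for p in pilot]         # wider windows in sparse regions
    estimate = []
    for y in grid:
        s = sum(_phi((y - yi) / (h * lam)) / (h * lam)
                for yi, lam in zip(data, lambdas))
        estimate.append(s / n)
    return estimate, lambdas


if __name__ == "__main__":
    # Illustrative bimodal log earnings with one isolated high value
    sample = [6.0, 6.1, 6.2, 6.3, 9.0, 9.1, 9.2, 11.5]
    grid = [i * 0.01 for i in range(1801)]          # 0 to 18
    dens, lam = adaptive_kde(grid, sample, h=0.5)
    print([round(v, 2) for v in lam])               # largest factor for 11.5
```

The window narrows where observations cluster (the two modes) and widens in sparse regions. This behaviour helps bimodal shapes survive without oversmoothing.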
DiNardo, Fortin and Lemieux (1996)
Counterfactual estimation compares the distribution of the outcome variable (depvar) with the depvar distribution that would have prevailed if the group had been paid like the comparison group (the counterpart).
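Actual wage distributions for A and B:
$$f_A(y) = \int f_A(y \mid x)\, h(x \mid s=A)\, dx, \qquad f_B(y) = \int f_B(y \mid x)\, h(x \mid s=B)\, dx.$$
DFL (1996) rewrite and reweight the density for B as follows:
$$\text{Actual:} \quad f_B(y) = \int f_B(y \mid x)\, h(x \mid s=B)\, dx$$
$$\text{Counterfactual:} \quad f_B^{A}(y) = \int f_B(y \mid x)\, h(x \mid s=A)\, dx = \int w(x)\, f_B(y \mid x)\, h(x \mid s=B)\, dx, \qquad w(x) = \frac{h(x \mid s=A)}{h(x \mid s=B)}.$$
The weight can be computed using Bayes' theorem. Since $h(x \mid s=A) = P(A \mid x)\, h(x) / P(A)$ and $h(x \mid s=B) = [1 - P(A \mid x)]\, h(x) / [1 - P(A)]$,
$$w(x) = \frac{P(A \mid x)}{1 - P(A \mid x)} \cdot \frac{1 - P(A)}{P(A)}.$$
The conditional treatment probability $P(A \mid x)$ (the propensity score) is estimated by the program with a logistic regression specification (the dfl command can switch to a probit as well). For comparison, I use the pscore ado-file written by Becker and Ichino (2002), which follows the nearest-neighbour technique.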
In Stata: w = 1-Prob(Depvar=1)/Prob(Depvar=0)
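A minimal sketch of the reweighting step follows. It assumes a single discrete covariate, so that P(A|x) can be estimated by cell frequencies; this is a saturated stand-in for the logit or probit used by dfl. The sketch computes w(x) for the B observations and checks the property that makes the counterfactual work: after reweighting, B's covariate distribution matches A's. The function names and data are hypothetical.

```python
from collections import Counter


def dfl_weights(x_a, x_b):
    """DFL reweighting factors for the observations of group B.

    w(x) = P(A|x) / (1 - P(A|x)) * (1 - P(A)) / P(A)

    P(A|x) is estimated by cell frequencies of a discrete covariate, a
    saturated stand-in for the logit/probit used in practice. Every value
    in x_b appears in B, so P(A|x) < 1 and no division by zero occurs.
    """
    count_a, count_b = Counter(x_a), Counter(x_b)
    p_a = len(x_a) / (len(x_a) + len(x_b))            # unconditional P(A)
    weights = []
    for x in x_b:
        p_ax = count_a[x] / (count_a[x] + count_b[x])  # conditional P(A | x)
        weights.append(p_ax / (1 - p_ax) * (1 - p_a) / p_a)
    return weights


def weighted_shares(values, weights):
    """Share of each covariate value once observations are weighted."""
    total = sum(weights)
    shares = Counter()
    for v, w in zip(values, weights):
        shares[v] += w / total
    return dict(shares)


if __name__ == "__main__":
    # Hypothetical schooling levels (0, 1, 2) for groups A and B
    x_a = [0, 1, 1, 2, 2, 2]
    x_b = [0, 0, 0, 1, 1, 2]
    w = dfl_weights(x_a, x_b)
    print(w)                          # about 1/3 for x=0, 1 for x=1, 3 for x=2
    print(weighted_shares(x_b, w))    # reweighted B has A's shares: 1/6, 2/6, 3/6
```

Using these weights as analytic weights in a kernel density estimate of B's outcome gives the counterfactual $f_B^{A}(y)$. A weighted density is normalised by the sum of the weights. Multiplying every weight by the constant (1 - P(A))/P(A) therefore leaves the estimated density unchanged.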
Empirical case (a semiparametric approach):
Estimation of the Mexican earnings distribution and its decomposition by subpopulations of workers in the formal and informal sectors (defined by compliance with social security coverage).
(We assume that self-selection bias does not affect workers' individual decisions about where to work.) Models are estimated separately for each category.
Logit has a practical advantage over probit: the sum of the predicted probabilities equals the sum of the observed values (Butcher and DiNardo, 1998).
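The property behind this remark is the logit first-order condition for the intercept. At the maximum-likelihood estimate, Σ_i (y_i - p̂_i) = 0, so the predicted probabilities add up to the observed number of ones. The Python sketch below fits a logit with an intercept and one regressor by Newton-Raphson on made-up data and checks the equality. A probit fit generally does not satisfy it exactly. The function name `fit_logit` is hypothetical.

```python
import math


def fit_logit(x, y, iters=50):
    """Newton-Raphson for P(y=1|x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
            w = p * (1.0 - p)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det     # step = inverse(H) * score
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1


if __name__ == "__main__":
    # Made-up binary outcome (e.g. 1 = informal) and one regressor
    x = [0, 1, 2, 3, 4, 5, 6, 7]
    y = [0, 0, 1, 0, 1, 0, 1, 1]
    b0, b1 = fit_logit(x, y)
    p_hat = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    print(round(sum(p_hat), 6), sum(y))      # both equal 4 (up to rounding)
```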
ENEU: Encuesta Nacional de Empleo Urbano (National Survey of Urban Employment).
Males aged 16 to 65.
Occupations = (1, …, 4):
1: Formal self-employed
2: Informal self-employed
3: Formal wage-earners
4: Informal wage-earners
Model 1: pooled
Model 2: pooled
$$P(S) = f(x) = \frac{\exp(x\beta_s)}{1 + \exp(x\beta_s)}$$
1. Compute the earnings distribution using the dfl command.
dfl depvar indepvars [if exp] [in range], outcome(varname) [nbins(integer) w(bandwidth) adaptive gauss quietly probit graph(cfactual) graph_combine axis_selection_options axis_scale_options title_options] (logit is the default)
dfl informal esc eda eda2 jefe dmiembros dwmenor drama1 drama3 ///
    drama4 dregion1 dregion2 dregion3 dregion4 dregion6 ///
    if sex==1 & logitp>=1 & logitp<=2, outcome(logwm) nbins(50) ///
    adaptive gauss graph(cfactual)
2. Compute the earnings distribution using a do-file.
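* propensity score from a logit (Becker and Ichino's pscore)
pscore informalb esc eda eda2 jefe dmiembros dwmenor drama1 drama3 drama4 ///
    dregion1 dregion2 dregion3 dregion4 dregion6 if sex==1 & logitp>=1 & logitp<=2, ///
    pscore(mypscore) logit level(0.001)
* adaptive Gaussian kernel density of logwm, weighted by the estimated score
akdensity logwm if sex==1 & logitp==4 [aw = mypscore], gau s(i) ///
    gen(hai92c dhai92c)
lab var dhai92c "Informal wage-earner"
* rescale the subgroup density by the factor .24
replace dhai92c = dhai92c*.24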
Example with my do-file
Syntax
[Figure 1. Decomposition of density functions for self-employed and wage-earners, Mexico 1992. Density of the log of monthly earnings (pesos 2000=100) for the total, self-employed and wage-earners.]
[Figure 2. Wage-earners in Mexico working in a formal world, 1992 (males). Panels for the DFL command and for the do-file (rescaled): factual vs. counterfactual densities and the difference in densities, over the log of monthly earnings (pesos 2000=100).]
[Figure 2a. Wage-earners in Mexico working in a formal world, 1992 (males). Do-file rescaled, adjusting ranges: factual vs. counterfactual densities and the difference in densities, over the log of monthly earnings (pesos 2000=100).]
[Figure 3. Self-employed in Mexico working in a formal world, 1992 (males). Panels for the DFL command and for the do-file (rescaled): factual vs. counterfactual densities and the difference in densities, over the log of monthly earnings (pesos 2000=100).]
[Figure 3a. Self-employed in Mexico working in a formal world, 1992 (males). Do-file rescaled, adjusting ranges: factual vs. counterfactual densities and the difference in densities, over the log of monthly earnings (pesos 2000=100).]
[Figure 4. Informal self-employed males in a formal world, 2002. Panels for the DFL command and for the do-file (rescaled): factual vs. counterfactual densities and the difference in densities, over the log of earnings (pesos 2000=100).]
[Figure 5. Informal wage-earner males in a formal world, 2002. Panels for the DFL command and for the do-file (rescaled): factual vs. counterfactual densities and the difference in densities, over the log of earnings (pesos 2000=100).]
Conclusions:
- The dfl user-written command is useful; just be careful when using subgroups or log scales.
- DFL (1996) use the subgroup decomposability property of the aggregate PDF.
- A suggestion: when computing subgroup densities, weight them by population shares (if necessary).
- The problem of obtaining over-dimensioned densities is most severe when the data are on a logarithmic scale.
- For kernel densities, estimation with the adaptive technique is more time-consuming but also seems more accurate (it avoids smoothing more than needed).
- Adaptive kernel estimation depicts bimodal or multimodal distributions better.
References
Azevedo, Joao Pedro (2005), "DiNardo, Fortin and Lemieux counterfactual kernel density", DFL user-written Stata command.
Becker, Sascha O. and Andrea Ichino (2002), "Estimation of average treatment effects based on propensity scores", The Stata Journal, 2(4), 358-377.
Butcher, K. F. and John DiNardo (1998), "The immigrant and native-born wage distributions: Evidence from United States census", NBER Working Paper No. 6630.
DiNardo, John, Nicole Fortin and Thomas Lemieux (1996), "Labor market institutions and the distribution of wages, 1973-1992: A semiparametric approach", Econometrica, 64(5), 1001-1044.
Duclos, Jean-Yves (2001), "Non-parametric estimation for distributive analysis", Poverty and Equity: Theory and Estimation, Departament d'Economia Aplicada, Universitat Autònoma de Barcelona, mimeo, March, 37-44.
Heckman, James, H. Ichimura and P. E. Todd (1998), "Matching as an econometric evaluation estimator", Review of Economic Studies, 65, 261-294.
Huesca, Luis and Mario Camberos (2009), "El mercado laboral mexicano 1992 y 2002: Un análisis contrafactual de los cambios en la informalidad", Economía Mexicana, Vol. XVIII, Núm. 1, first semester, 5-43.
INEGI (2006), Encuesta Nacional de Empleo Urbano (ENEU), 1992 and 2002, INEGI, Aguascalientes, México, databases.
Jenkins, Stephen and Philippe Van Kerm (2005), "Accounting for income distribution trends: A density function decomposition approach", Journal of Economic Inequality, 3, 43-62.
Silverman, B. W. (1986), Density Estimation for Statistics and Data Analysis, Chapman and Hall, London.
Van Kerm, Philippe (2003), "Adaptive kernel density estimation" (akdensity), The Stata Journal, 3(2), 148-156.