National Transfer Accounts
Data and Estimation Issues
May 15, 2009
Sang-Hyop Lee
University of Hawaii at Manoa

Data Sets for Statistical Analysis
► Cross section
► Time series
► Cross section time series; useful for analysis of aggregate variables (over time or by cohort).
► Panel (longitudinal)
  - Repeated cross-section design: most common
  - Rotating panel design (Côte d'Ivoire 1985 data)
  - Supplemental cross-section design (Kenya & Tanzania 1982/83 data, Malaysia and Indonesia LS)
► Cross section with retrospective information
► Micro vs. Macro

Quality of Survey Data
► Constructing NTA requires individual or household micro survey data sets.
► A good survey data set has the properties of
  - Extent (richness): it has the variables of interest at a certain level of detail.
  - Reliability: the variables are measured without error.
  - Validity: the data set is representative.

Data Problem (An example)
► FIES (64,433 households with 233,225 individuals)
  - Measured for only urban areas (Valid?)
  - No single-person households (Valid?)
  - No information on income for family-owned business (Rich?)
  - Measured for up to 8 household members (Reliable? Valid?)

Extent (Richness): Missing/Change of Variables
► Missing variables
  - Not measured or measured for a certain group
  - Labor portion of self-employment income
► Change of variables over time
  - Institutional/policy change
  - New consumption items, new jobs, etc.
► Change of survey instrument/collapsing

Reliability: Measurement Error
► Response/reporting error
  - Respondents do not know what is required
  - Incentive to understate/overstate
  - Recall bias: related to the period of survey
  - Using wrong/different reporting units
  - Heaping
  - Outliers
► Coding error, top coding, missing observations
► Overestimate/underestimate
  - Parents do not report/register their children until the children have names
  - Detect by checking the survival rate of single ages
► Discrepancy between aggregate and sum of individual values
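One common way to check for heaping in reported ages is Whipple's index. The slides do not name a specific measure, so this is offered only as a sketch in Python of one possible check. The function name and the input layout (a dict from single year of age to a count) are assumptions made for illustration.

```python
def whipple_index(pop_by_age):
    """Whipple's index of age heaping on ages ending in 0 or 5.

    pop_by_age: dict mapping single year of age -> population count
    (hypothetical input layout). A value near 100 suggests no heaping;
    500 means every reported age in the range ends in 0 or 5.
    """
    # Population in the conventional age range 23-62 (40 single ages).
    total = sum(pop_by_age.get(age, 0) for age in range(23, 63))
    if total == 0:
        raise ValueError("no population reported at ages 23-62")
    # Population at ages 25, 30, ..., 60.
    heaped = sum(pop_by_age.get(age, 0) for age in range(25, 61, 5))
    # One in five ages ends in 0 or 5, so compare with a fifth of the total.
    return 100.0 * heaped / (total / 5.0)
```

A value well above 100 would be a reason to look more closely at the age variable before building single-age profiles.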

Validity: Censoring
► Selection based on characteristics
► Censoring due to the time of survey
  - Duration of unemployment (left and right censoring)
  - Completed years of schooling
► Attrition (panel data)

Categorical/Qualitative Variables
► Converting categorical to single continuous variables
  - Grouped by age (population, public education consumption)
  - Income category (FPL)
► Inconsistency over time
► Categorical to continuous, and vice versa
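A simple way to turn a bracketed variable into a continuous one is to assign each category the midpoint of its interval. The Python sketch below assumes hypothetical income brackets and treats the open-ended top bracket by multiplying its lower bound by an arbitrary factor; both the brackets and the factor are illustrative assumptions, not values from the slides.

```python
def to_continuous(category, brackets, open_top_factor=1.5):
    """Map a bracket code to a single value.

    brackets: dict mapping code -> (lower, upper); upper is None for an
    open-ended top bracket (hypothetical layout).
    open_top_factor: assumed multiplier applied to the lower bound of an
    open-ended bracket; this is a placeholder choice, not a rule.
    """
    lower, upper = brackets[category]
    if upper is None:
        return lower * open_top_factor
    # Closed interval: use the midpoint.
    return (lower + upper) / 2.0


# Hypothetical income brackets, for illustration only.
income_brackets = {1: (0, 1000), 2: (1000, 3000), 3: (3000, None)}
values = [to_continuous(code, income_brackets) for code in (1, 2, 3)]
```

If bracket boundaries change between survey rounds, the mapping has to be defined per round so that the continuous variable stays comparable over time.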

Units, Real vs. Nominal
► Be careful about the reporting unit
  - Measurement units
  - Reporting period units (reference period, seasonal fluctuation, recall bias)
► Nominal vs. Real
  - Aggregation across items
  - Quality change (e.g. computer)
  - Where inflation is a substantial problem
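The two unit issues above can be sketched in a few lines of Python: putting amounts on a common annual reference period, and converting nominal values to real values with a price index. The function names, the period table, and the index values are assumptions for illustration. Simple multiplication ignores seasonal fluctuation, and a single index ignores quality change and differences across items.

```python
# Assumed conversion factors from a reporting period to one year.
PERIODS_PER_YEAR = {"week": 52, "month": 12, "quarter": 4, "year": 1}


def annualize(amount, period):
    """Scale an amount reported per week/month/quarter to an annual amount."""
    return amount * PERIODS_PER_YEAR[period]


def to_real(nominal_by_year, price_index, base_year):
    """Express nominal values in prices of base_year.

    nominal_by_year and price_index are dicts keyed by year
    (hypothetical layout); real = nominal * index[base] / index[year].
    """
    base = price_index[base_year]
    return {
        year: value * base / price_index[year]
        for year, value in nominal_by_year.items()
    }
```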

Solution for Missing Variables
► Missing is not usually zero!!
► Ignore it; random non-response
► Give up; find other source of data sets
► Impute; "missing does not mean zero".
  - Based on their characteristics or mean value
  - Based on the value of other peer group
  - Modified zero order regressions (y on x)
    - Create dummy variable for missing variables of x (z)
    - Replace missing variable with 0 (x')
    - Regress y on x' and z, rather than y on x
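The three steps of the modified zero order regression can be sketched as follows. This is a minimal Python illustration using only the standard library; the helper names, the use of None to mark a missing x, and the small least-squares solver are assumptions made for the example, not part of the original material.

```python
def solve(a, b):
    """Solve the linear system a @ coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, y):
    """Least squares via the normal equations (X'X) coef = X'y."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def modified_zero_order(y, x):
    """Regress y on x' and z, where None in x marks a missing value."""
    z = [1.0 if xi is None else 0.0 for xi in x]            # dummy for missing x
    x_prime = [0.0 if xi is None else float(xi) for xi in x]  # missing set to 0
    design = [[1.0, xp, zi] for xp, zi in zip(x_prime, z)]
    return ols(design, y)  # [intercept, coefficient on x', coefficient on z]
```

The dummy z absorbs the level difference of the observations with missing x, so they stay in the sample without being treated as true zeros.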

Households vs. Individuals
► In NTA, estimates such as consumption and labor income should be at the individual level, while inter-household transfers are at the household level.
► But a lot of data are gathered from households
  - Allocating household consumption (income) to individual household members is a critical part of estimation
  - Adjusting using aggregate (macro) control
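Both steps can be illustrated with a short Python sketch. The first function splits a household total among members in proportion to per-member weights; the weights are hypothetical placeholders, since the slides do not specify an allocation rule. The second scales a per capita age profile by a single factor so that it reproduces a macro control total when multiplied by population; the function names and dict layout are assumptions.

```python
def allocate_household(total, member_weights):
    """Split a household total among members in proportion to their weights."""
    weight_sum = sum(member_weights)
    if weight_sum == 0:
        raise ValueError("member weights sum to zero")
    return [total * w / weight_sum for w in member_weights]


def apply_macro_control(per_capita, population, control_total):
    """Scale a per capita age profile to match an aggregate control.

    per_capita, population: dicts keyed by age (hypothetical layout).
    Returns the adjusted profile and the adjustment factor.
    """
    implied = sum(per_capita[age] * population[age] for age in per_capita)
    if implied == 0:
        raise ValueError("implied aggregate is zero; cannot rescale")
    factor = control_total / implied
    adjusted = {age: value * factor for age, value in per_capita.items()}
    return adjusted, factor
```

Applying one factor to every age keeps the shape of the profile and changes only its level.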

Headship (Thailand, 1996)
[Figure: headship rate (percent) by age, 0 to 90+, comparing economic head and self-reported head.]

Measuring Consumption
► Underestimation: e.g. British FES
  - Using aggregate control mitigates the problem.
► Home produced items: both income and consumption.
► Allocation across individuals is difficult
► Estimating some profiles, such as health expenditure, is also difficult, partly due to various sources of financing.

Measuring Income
► "All of the difficulties of measuring consumption apply with greater force to the measurement of income" (Deaton, p. 29).
  - Need detailed information on "transactions" (inflow and outflow): an enormous task
  - Incentive to understate: using aggregate control mitigates the problem.
  - Some surveys did not attempt to collect information on asset income
► Allocating self-employment income across individuals is difficult.

Data Cleaning
► Case by case
► Find out what data sets are available and choose the best one
► Detect outliers and examine them carefully
► A serious examination is required when inflation matters, to check whether the actual estimation process generates a variable
► Make variables consistent
► Convert categorical variables to continuous variables, etc.
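As one possible first pass at detecting outliers, values can be flagged when they fall outside fences built from the interquartile range. The slides do not prescribe a rule, so the 1.5 multiplier and the function name in this Python sketch are assumptions. The output is a list of values to examine, not a list of values to drop.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is a conventional but arbitrary choice (an assumption here).
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```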

Weighting and Clustering
► Weights should be used in the summary of variables/direct tabulation/regression/smoothing.
► Frequency Weights; "fw" indicate replicated data. The weight tells us how many observations each observation really represents.
  . tab edu [w=wgt]         (equivalent to: tab edu [fw=wgt])
► Analytic Weights; "aw" are inversely proportional to the variance of an observation. They are appropriate when you are dealing with data containing averages.
  . su edu [w=wgt]          (equivalent to: su edu [aw=wgt])
  . reg wage edu [w=wgt]    (equivalent to: reg wage edu [aw=wgt])

Weighting and Clustering (cont'd)
► Probability Weights; "pw" are the sample weights, the inverse of the probability that this observation was sampled.
  . reg wage edu [pw=wgt]                   (equivalent to: reg wage edu [(a)w=wgt], robust)
  . reg wage edu [pw=wgt], cluster(hhid)    (equivalent to: reg wage edu [(a)w=wgt], cluster(hhid))
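To make concrete what a weight does to a point estimate, here is a minimal Python sketch of a weighted mean and a weighted tabulation. It is not a reimplementation of the Stata commands above: it covers only point estimates and says nothing about standard errors, robust variance, or clustering. The function names are assumptions.

```python
def weighted_mean(values, weights):
    """Mean of values where each observation counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def weighted_tab(categories, weights):
    """Weighted share of each category (a one-way tabulation)."""
    totals = {}
    for cat, w in zip(categories, weights):
        totals[cat] = totals.get(cat, 0.0) + w
    total_weight = sum(totals.values())
    return {cat: w / total_weight for cat, w in totals.items()}
```

With all weights equal, both functions reduce to the unweighted mean and the unweighted shares.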

Smoothing
► Shows the pattern more clearly by reducing sampling variance
► Type of smoothing
  - "lowess" smoothing (Stata) does not allow the incorporation of weight. Expanding the data is computationally burdensome.
  - Friedman's super smoothing (R) does.

Rule of smoothing
► Basic components should be smoothed, but not aggregations. For example, private health consumption and public health consumption profiles should be smoothed, but the sum of the two should not be smoothed.
► The objective is to reduce sampling variance but not eliminate what may be "real" features of the data.
  - Avoid too much smoothing (e.g., health consumption for old ages). Use the right bandwidth (span).
  - Public health spending may increase dramatically when individuals reach an age threshold, e.g., 65. This kind of feature of the data should not be smoothed away.
  - Due to unusually high health consumption by newborns, we tend not to smooth health consumption at age 0. This could be done by adding the estimated unsmoothed health consumption of newborns to the smoothed age profile of private health consumption for the other age groups.
  - Only adults (usually ages 15 and older) receive income, pay income taxes and make familial transfer outflows. Thus, when we smooth these age profiles, we begin smoothing from the adults, excluding the younger age groups who do not earn income.
  - However, a problem arises when some of the beginning age groups appear to have negative values for these variables. This could be solved by replacing the negative values with the unsmoothed values for the beginning age groups.
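The rules above can be sketched in Python as a thin wrapper around any weighted smoother. The default smoother here is a plain weighted moving average, used only as a stand-in so the example is self-contained; it is not Friedman's super smoother. The function names, the start_age parameter, and the list-by-age layout are assumptions for illustration.

```python
def moving_average(values, weights, half_window=2):
    """Weighted moving average; a simple stand-in for a real smoother."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        w = sum(weights[lo:hi])
        if w == 0:
            out.append(values[i])  # no weight in the window: keep raw value
        else:
            out.append(sum(v * wt for v, wt in zip(values[lo:hi], weights[lo:hi])) / w)
    return out


def smooth_profile(profile, weights, start_age=0, smoother=moving_average):
    """Smooth an age profile (index = age) from start_age onward.

    Ages below start_age are left unsmoothed, e.g. start_age=1 keeps
    newborn health consumption as estimated, and start_age=15 smooths
    labor income for adults only. Where the smoothed value is negative,
    the unsmoothed value is used instead.
    """
    head = list(profile[:start_age])
    raw_tail = list(profile[start_age:])
    smoothed = smoother(raw_tail, list(weights[start_age:]))
    tail = [raw if s < 0 else s for s, raw in zip(smoothed, raw_tail)]
    return head + tail
```

Components would be passed through this separately, and aggregates built by summing the smoothed components rather than smoothing the sum.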

Specific comments
► nta <- read.csv("sample.csv", header=TRUE)
  # Read in data. Work name is nta, data name is sample.csv
► t1 <- supsmu(nta$age, nta$yl, nta$sample, span=0.1)
► t2 <- supsmu(nta$age, nta$yl, nta$sample, span=0.2)
► t3 <- supsmu(nta$age, nta$yl, nta$sample, span="cv")
  # Smooth data, using nta$sample as the weight. Work names are t1, t2, t3.
► write.csv(c(t1, t2, t3), "result.csv")
  # Write out data using name "result"

Discussion
► Data type/quality varies across countries.
► Estimation method could vary across countries depending on data.
► However, some standard procedure could be applied.
  - Definition
  - Estimation using weight
  - Smoothing using weight
  - Macro control
  - Present your work!
  - If some component varies substantially by age, then it is estimated separately (education, health, etc.)

National Transfer Accounts
The End