Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009...
-
Upload
paige-cobb -
Category
Documents
-
view
215 -
download
2
Transcript of Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009...
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009
Improving imputation methodology in the Hungarian Central Statistical Office
(HCSO)
Zoltán CserehátiHCSO Methodological Department
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 2
• 1. Introduction
• 2. The „IDPS” (Creating Integrated Data Processing System) project
• 3. Documentation scheme for imputation
• 4. Training course on imputation
• 5. Future work: Handbook on imputation
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 3
1. Introduction (1)• The work of the HCSO
Methodological department:• Our scope of processing phases:
Sampling Estimation Imputation Seasonal adjustment Data confidentiality
(list gradually widening)
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 4
1. Introduction (2)• Tools offered:
quality guidelines for the elements of value chain methodological documentation schemes good practices quality indicators methodological support training course materials quality assessing tools handbook for several phases
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 5
General issues related to non-response and imputation (1)
• Item / Unit non-response
• Non-response bias(Selecting larger samples is not a solution.)
• Alternatives: Reweighting Imputation
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 6
General issues related to non-response and imputation (2)
• What is special about imputation: There is a huge variety of imputation methods. Many of
them are quite simple and easy to implement. Unlike other methodological areas imputation is a
processing phase which is often conducted by subject matter statisticians without the supervision of methodologists.
Supposedly many of these methods could be improved.
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 7
2. The "IDPS" (Creating Integrated Data Processing System) project
Objectives (1): To develop user-friendly integrated data processing system
based on standard logic covering the widest range of surveys.
Accessible via a standard user interface and providing a clear and efficient tool for the statisticians.
Include data quality requirements and data processing procedures documented in the meta-database
To be integrated with other general purpose systems such as data entry, dissemination
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 8
Objectives (2): To develop applications or frame systems allowing
coordination and quality management in the control of processing
Direct access to data for the purpose of verification and analysis
To restructure the division of labour with the IT staff focusing on innovation, development and production quality data faster through direct data processing
• We anticipate having a (partially) working system by the end of 2010.
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 9
The organization of the IDPS project An IT company chosen by a public procurement
procedure On behalf of the HCSO:
IT Department Methodological Department Selected subject matter statisticians from all the relevant
fields. Project leadership:
Selected members of the HCSO IT Department IT company project leaders
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 10
2. IDPS (2)
Benefits: Common, integrated platform for all the surveys Less redundancy More transparent system Processes documented in a standard way Better overview of the process plans System functionalities by the hand of the user Build new data process flows more easily
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 11
2. IDPS (3)Main steps already done:
Documentation of the data process flow elements Designing a general scheme for a universal data processing
flow Identifying process stages such as editing, imputation,
outlier filtering, consistency checking, etc Identifying basic methods currently in use in the different
stages. Identifying process steps from which the individual
implementations of the methods are built from.
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 12
2. IDPS (4) Standard processes• We do not want to settle strict methodological
standards. The so-called “standards” of the IDPS system will be
optimally designed software components for implementing different algorithms and procedures which are useful as building blocks to compile the IT version of different methods.
• How does an ideal standard process look like? Small and special enough to serve as a building block Flexible and general enough Having a number of parameters for fine tuning
• As a consequence: We will face difficult trade-off situations
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 13
3.1 Documentation schemes
Affected methodological areas: Sampling Imputation Estimation and standard error calculation Seasonal adjustment and confidentiality.
Aims: to build a uniform structure for assessing to gain a better overview of the methods used by various
surveys to improve process quality.
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 14
3.2 A documentation scheme for imputation
General information treatment of item/unit non-response Imputation method applied Is there any guideline? Is the procedure documented? Place in the processing chain Software solution used Auxiliary data sources used Simple or composite method Indicate the applied method(s)
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 15
Indicate the method used for imputation! (If you apply different methods, please indicate all of them.)
Rule based Imputing methods Deductive imputation (logical reasoning) Using external rules for imputation
Model based methods Explicit models
Imputing with mean Imputing with population mean
Imputing with class means Imputing with the median Imputing using random numbers Regression based imputations Impute values using simple linear regression Imputation based on multivariate regression model Predictive mean matching Other (please specify): Implicit models
Hot deck imputation Sequential hot deck method Hierarchical hot deck method Nearest neighbour method
Cold deck imputation Other (please specify):
Time series models Sophisticated models ARIMA models Kalman filter models Simple models
Imputation methods based on population dynamics considering consecutive time periods
Imputation methods based on simple trend analysis of longer time series Imputing using neural networks Other (please specify):
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 16
4. Internal training course on imputation (1)
Concept of imputation Why imputing at all? Drawbacks and benefits of different methods How to reduce non-response bias? Basic weighting techniques Benefits of complete datasets How to organize a method building process? Use of auxiliary information
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 17
4. Internal training course on imputation (2)
Editing and imputation Basic imputation methods / examples Documentation: flow charts, algorithmic
descriptions Flagging the imputed values The place of imputation in the whole data
processing flow Imputation and outlier-filtering How to plan and assess an imputation method? Simulation studies
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 18
4. Internal training course on imputation (3)
Teamwork session: Select a practical problem and try to solve it
together in teams Share the experiences and ideas
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 19
5. Conclusion, future work (1)• Compiling a handbook on imputation
(For internal use in the HCSO): Recommended methods with application areas Detailed guidelines: how to build an imputation method Highlighting current best practices Practical advices, focusing on issues related to Hungarian
specialities • Using the experiences of
The work on the IDPS system The feedbacks from the training course The information collected by the documentation scheme
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 20
5. Conclusion, future work (2)• International background material including:
ONS paper: „Report on the Task Force on Imputation”
Statistics Canada Quality Guidelines The results of the
EUREDIT project EDIMBUS project
• Implementing to the special needs of the HCSO • (In the area of seasonal adjustment a similar work has
been already finished)
Improving imputation methodology
in the Hungarian Central Statistical Office (HCSO)
NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 21
ReferencesThe results of the EUREDIT project:
http://www.cs.york.ac.uk/euredit/results/results.htmlThe results of the EDIMBUS project:
http://edimbus.istat.it/EDIMBUS1/The ONS paper: Report on the Task Force on Imputation (June
1996) GSS Methodology SeriesStatistics Canada Quality Guidelines (Fourth Edition 2003)Quality Guidelines of the HCSO (Legal Act 2007)Hungarian Central Statistical Office: Strategy 2005-2008, pages
26-27.Csereháti, Z. (2006) Multiple Donor Imputation Techniques,
Paper for the European Conference on Quality in Survey Statistics, Cardiff, 24-26 April 2006