Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009...

21
Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) Zoltán Csereháti HCSO Methodological Department

Transcript of Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009...

Page 1: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009

Improving imputation methodology in the Hungarian Central Statistical Office

(HCSO)

Zoltán CserehátiHCSO Methodological Department

Page 2: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 2

• 1. Introduction

• 2. The „IDPS” (Creating Integrated Data Processing System) project

• 3. Documentation scheme for imputation

• 4. Training course on imputation

• 5. Future work: Handbook on imputation

Page 3: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 3

1. Introduction (1)• The work of the HCSO

Methodological department:• Our scope of processing phases:

Sampling Estimation Imputation Seasonal adjustment Data confidentiality

(list gradually widening)

Page 4: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 4

1. Introduction (2)• Tools offered:

quality guidelines for the elements of value chain methodological documentation schemes good practices quality indicators methodological support training course materials quality assessing tools handbook for several phases

Page 5: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 5

General issues related to non-response and imputation (1)

• Item / Unit non-response

• Non-response bias(Selecting larger samples is not a solution.)

• Alternatives: Reweighting Imputation

Page 6: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 6

General issues related to non-response and imputation (2)

• What is special about imputation: There is a huge variety of imputation methods. Many of

them are quite simple and easy to implement. Unlike other methodological areas imputation is a

processing phase which is often conducted by subject matter statisticians without the supervision of methodologists.

Supposedly many of these methods could be improved.

Page 7: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 7

2. The "IDPS" (Creating Integrated Data Processing System) project

Objectives (1): To develop user-friendly integrated data processing system

based on standard logic covering the widest range of surveys.

Accessible via a standard user interface and providing a clear and efficient tool for the statisticians.

Include data quality requirements and data processing procedures documented in the meta-database

To be integrated with other general purpose systems such as data entry, dissemination

Page 8: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 8

Objectives (2): To develop applications or frame systems allowing

coordination and quality management in the control of processing

Direct access to data for the purpose of verification and analysis

To restructure the division of labour with the IT staff focusing on innovation, development and production quality data faster through direct data processing

• We anticipate having a (partially) working system by the end of 2010.

Page 9: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 9

The organization of the IDPS project An IT company chosen by a public procurement

procedure On behalf of the HCSO:

IT Department Methodological Department Selected subject matter statisticians from all the relevant

fields. Project leadership:

Selected members of the HCSO IT Department IT company project leaders

Page 10: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 10

2. IDPS (2)

Benefits: Common, integrated platform for all the surveys Less redundancy More transparent system Processes documented in a standard way Better overview of the process plans System functionalities by the hand of the user Build new data process flows more easily

Page 11: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 11

2. IDPS (3)Main steps already done:

Documentation of the data process flow elements Designing a general scheme for a universal data processing

flow Identifying process stages such as editing, imputation,

outlier filtering, consistency checking, etc Identifying basic methods currently in use in the different

stages. Identifying process steps from which the individual

implementations of the methods are built from.

Page 12: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 12

2. IDPS (4) Standard processes• We do not want to settle strict methodological

standards. The so-called “standards” of the IDPS system will be

optimally designed software components for implementing different algorithms and procedures which are useful as building blocks to compile the IT version of different methods.

• How does an ideal standard process look like? Small and special enough to serve as a building block Flexible and general enough Having a number of parameters for fine tuning

• As a consequence: We will face difficult trade-off situations

Page 13: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 13

3.1 Documentation schemes

Affected methodological areas: Sampling Imputation Estimation and standard error calculation Seasonal adjustment and confidentiality.

Aims: to build a uniform structure for assessing to gain a better overview of the methods used by various

surveys to improve process quality.

Page 14: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 14

3.2 A documentation scheme for imputation

General information treatment of item/unit non-response Imputation method applied Is there any guideline? Is the procedure documented? Place in the processing chain Software solution used Auxiliary data sources used Simple or composite method Indicate the applied method(s)

Page 15: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 15

Indicate the method used for imputation! (If you apply different methods, please indicate all of them.)

Rule based Imputing methods Deductive imputation (logical reasoning) Using external rules for imputation

Model based methods Explicit models

Imputing with mean Imputing with population mean

Imputing with class means Imputing with the median Imputing using random numbers Regression based imputations Impute values using simple linear regression Imputation based on multivariate regression model Predictive mean matching Other (please specify): Implicit models

Hot deck imputation Sequential hot deck method Hierarchical hot deck method Nearest neighbour method

Cold deck imputation Other (please specify):

Time series models Sophisticated models ARIMA models Kalman filter models Simple models

Imputation methods based on population dynamics considering consecutive time periods

Imputation methods based on simple trend analysis of longer time series Imputing using neural networks Other (please specify):

Page 16: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 16

4. Internal training course on imputation (1)

Concept of imputation Why imputing at all? Drawbacks and benefits of different methods How to reduce non-response bias? Basic weighting techniques Benefits of complete datasets How to organize a method building process? Use of auxiliary information

Page 17: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 17

4. Internal training course on imputation (2)

Editing and imputation Basic imputation methods / examples Documentation: flow charts, algorithmic

descriptions Flagging the imputed values The place of imputation in the whole data

processing flow Imputation and outlier-filtering How to plan and assess an imputation method? Simulation studies

Page 18: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 18

4. Internal training course on imputation (3)

Teamwork session: Select a practical problem and try to solve it

together in teams Share the experiences and ideas

Page 19: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 19

5. Conclusion, future work (1)• Compiling a handbook on imputation

(For internal use in the HCSO): Recommended methods with application areas Detailed guidelines: how to build an imputation method Highlighting current best practices Practical advices, focusing on issues related to Hungarian

specialities • Using the experiences of

The work on the IDPS system The feedbacks from the training course The information collected by the documentation scheme

Page 20: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 20

5. Conclusion, future work (2)• International background material including:

ONS paper: „Report on the Task Force on Imputation”

Statistics Canada Quality Guidelines The results of the

EUREDIT project EDIMBUS project

• Implementing to the special needs of the HCSO • (In the area of seasonal adjustment a similar work has

been already finished)

Page 21: Improving imputation methodology in the Hungarian Central Statistical Office (HCSO) NTTS 2009 seminar, Bruxelles18 - 20 February 2009 Improving imputation.

Improving imputation methodology

in the Hungarian Central Statistical Office (HCSO)

NTTS 2009 seminar, Bruxelles 18 - 20 February 2009 21

ReferencesThe results of the EUREDIT project:

http://www.cs.york.ac.uk/euredit/results/results.htmlThe results of the EDIMBUS project:

http://edimbus.istat.it/EDIMBUS1/The ONS paper: Report on the Task Force on Imputation (June

1996) GSS Methodology SeriesStatistics Canada Quality Guidelines (Fourth Edition 2003)Quality Guidelines of the HCSO (Legal Act 2007)Hungarian Central Statistical Office: Strategy 2005-2008, pages

26-27.Csereháti, Z. (2006) Multiple Donor Imputation Techniques,

Paper for the European Conference on Quality in Survey Statistics, Cardiff, 24-26 April 2006