A quality monitoring system for statistics based on administrative data
www.statistik.at We provide information
UNECE Seminar on New Frontiers for Statistical Data Collection, Geneva
Manuela Lenk
Statistics Austria
Registers, Classifications and Methods Division
31 Oct. – 2 Nov. 2012
www.statistik.at slide 2 | 31.10. - 2.11.2012
Register-based census in Austria
First register-based census in Austria 2011
• Full census, no sampling
Census topics
• Population census, housing census, census of enterprises and their local units of employment
Data availability
• On municipality level
• Geo-Codes
• Statistical databases
• Interactive maps
Quality assessment of the census
Application of a quality framework
• The framework is independent of data processing, allowing its application to other statistical projects
• Data processes can be evaluated without influencing them
Three stages of quality evaluation
• Raw data
– Registers provided by the data holders
• Central Database (CDB)
– Combined information from the registers
– Data is merged by a unique key
• Final Data Pool (FDP)
– Final data including imputations
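The step from raw data to the CDB can be pictured with a minimal sketch. It assumes each register extract is a list of records that carry the unique key under the name "key"; the register names, attribute names and the function build_cdb are hypothetical and only illustrate merging by key, not the actual implementation.

```python
from collections import defaultdict

def build_cdb(registers):
    """Merge register extracts into one entry per unique key.

    `registers` maps a (hypothetical) register name to a list of dicts.
    Each dict holds the linking key under "key" plus the register's attributes.
    """
    cdb = defaultdict(dict)
    for name, rows in registers.items():
        for row in rows:
            for attr, value in row.items():
                if attr == "key":
                    continue
                # Keep every source value side by side; decision rules for
                # attributes found in several registers would be applied later.
                cdb[row["key"]].setdefault(attr, {})[name] = value
    return dict(cdb)

# Made-up example data.
registers = {
    "population_register": [{"key": 1, "sex": "f"}, {"key": 2, "sex": "m"}],
    "education_register": [{"key": 1, "edu": "tertiary"}],
}
cdb = build_cdb(registers)
```

Keeping the source register next to each value makes it possible to see later which registers agree or disagree on an attribute.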
Quality framework - Overview
Quality assessment on register level I
Calculation of quality indicators
• Each attribute in each register gets a quality indicator between 0 and 1
• Quality calculation is based on three so-called hyperdimensions (HD)
HD Documentation
• Focuses on factors which possibly predetermine data quality
• Realized by a questionnaire which is filled out in consultation with the data authority
• Questions are weighted by their impact on data quality
• Quality indicator: $\frac{\text{obtained score}}{\text{maximum obtainable score}}$
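The documentation indicator can be sketched as a weighted score ratio. The slide does not say how the weights enter the score, so multiplying each question's score by its weight is an assumption here, and the function name and the example answers are hypothetical.

```python
def documentation_quality(answers):
    """Ratio of obtained to maximum obtainable weighted score.

    `answers` is a list of (weight, score, max_score) tuples, one per question.
    Weighting by multiplication is an assumption, not taken from the slides.
    """
    obtained = sum(weight * score for weight, score, _ in answers)
    maximum = sum(weight * max_score for weight, _, max_score in answers)
    if maximum == 0:
        raise ValueError("questionnaire has no scorable questions")
    return obtained / maximum

# Made-up questionnaire: three yes/no questions with weights 3, 2 and 1.
answers = [(3, 1, 1), (2, 0, 1), (1, 1, 1)]
q_doc = documentation_quality(answers)  # (3 + 0 + 1) / (3 + 2 + 1)
```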
Quality assessment on register level II
HD Pre-processing
• Detection of formal errors, like missing primary keys, out-of-range values and item non-response
• Usable records are calculated by subtracting erroneous records from total records
• Quality indicator: $\frac{\text{usable records}}{\text{total number of records}}$
HD External Source
• The accuracy of the data is checked
• Comparison with existing representative surveys
• Quality indicator: $\frac{\text{number of consistent values}}{\text{total number of linked records}}$
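Both ratios can be sketched in a few lines. The record layout (dicts with "key" and "value"), the set of valid values and the function names are assumptions for illustration; the slides do not specify how errors are detected or how the survey is linked.

```python
def preprocessing_quality(records, valid=None, key="key", attr="value"):
    """Share of usable records: total records minus formally erroneous ones."""
    if not records:
        raise ValueError("no records")

    def erroneous(rec):
        if rec.get(key) is None:      # missing primary key
            return True
        value = rec.get(attr)
        if value is None:             # item non-response
            return True
        # out-of-range value (only checked when a valid set is given)
        return valid is not None and value not in valid

    usable = len(records) - sum(erroneous(r) for r in records)
    return usable / len(records)

def external_source_quality(register, survey, key="key", attr="value"):
    """Share of linked records whose value agrees with the survey."""
    survey_by_key = {r[key]: r[attr] for r in survey}
    linked = [r for r in register if r.get(key) in survey_by_key]
    if not linked:
        raise ValueError("no linked records")
    consistent = sum(r.get(attr) == survey_by_key[r[key]] for r in linked)
    return consistent / len(linked)

# Made-up register extract and survey sample.
register = [
    {"key": 1, "value": "m"},
    {"key": 2, "value": "f"},
    {"key": None, "value": "f"},   # missing primary key
    {"key": 4, "value": "x"},      # out-of-range value
    {"key": 5, "value": None},     # item non-response
]
survey = [{"key": 1, "value": "m"}, {"key": 2, "value": "m"}, {"key": 4, "value": "f"}]

q_pre = preprocessing_quality(register, valid={"m", "f"})  # 2 usable of 5
q_ext = external_source_quality(register, survey)          # 1 consistent of 3 linked
```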
Quality assessment of the CDB and FDP
Unique Attributes
• Attribute exists in only one register, directly transferred to the CDB (e.g. highest level of education)
Multiple Attributes
• Attribute exists in more than one register, combined in the CDB using certain decision rules (e.g. demographic attributes)
Derived Attributes
• Attribute is created based on other attributes (e.g. type of commuter)
Quality assessment of unique attributes
The highest level of education (EDU) is delivered by a single register. Its quality indicator is derived from the three hyperdimensions.
Missing values (with quality = 0) remain and decrease the quality indicator in the CDB.
After imputation of the missing values, we assess the quality indicator of the attribute EDU in the Final Data Pool.
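A minimal sketch of how missing values pull the indicator down, under loudly stated assumptions: the register-level quality of EDU is taken as given (the slides do not say how the three hyperdimensions are combined), the attribute-level indicator is assumed to be the mean of record-level qualities, and imputation_quality is a hypothetical parameter standing in for whatever quality the imputation procedure assigns.

```python
def mean_quality(record_qualities):
    """Average of record-level quality values."""
    if not record_qualities:
        raise ValueError("no records")
    return sum(record_qualities) / len(record_qualities)

def cdb_quality(values, register_quality):
    """CDB stage: delivered values keep the register quality, missing ones get 0."""
    return mean_quality(
        [register_quality if v is not None else 0.0 for v in values]
    )

def fdp_quality(values, register_quality, imputation_quality):
    """FDP stage: formerly missing values carry the (assumed) imputation quality."""
    return mean_quality(
        [register_quality if v is not None else imputation_quality for v in values]
    )

# Made-up EDU values as they stand in the CDB; None marks a missing value.
edu = ["tertiary", None, "secondary", "primary"]
q_cdb = cdb_quality(edu, register_quality=0.9)                          # 2.7 / 4
q_fdp = fdp_quality(edu, register_quality=0.9, imputation_quality=0.6)  # 3.3 / 4
```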
Quality assessment of multiple attributes
SEX is available in two registers. The attribute is evaluated in both data sources with the three hyperdimensions.
Does the information differ between the two data sources? Which register should we trust?
Dempster-Shafer theory takes uncertainty, consistency and conflict into account.
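Dempster's rule of combination, the core operation of Dempster-Shafer theory, can be sketched as follows. How the register qualities are turned into belief masses is not given on the slide. The sketch assumes that each register puts its quality on the value it reports and the remainder on the whole frame (uncertainty). The function names and example qualities are hypothetical.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to its mass.
    Returns the normalized combined masses and the conflict mass.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # The two sources support incompatible values.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}, conflict

def simple_support(value, quality, frame):
    """Assumed mapping: `quality` on the reported value, the rest on the frame."""
    return {frozenset([value]): quality, frozenset(frame): 1.0 - quality}

frame = {"m", "f"}

# Both registers report "f": consistent information reinforces the belief.
agree, k_agree = combine(simple_support("f", 0.9, frame),
                         simple_support("f", 0.8, frame))

# The registers disagree: part of the mass is lost to conflict.
differ, k_differ = combine(simple_support("f", 0.9, frame),
                           simple_support("m", 0.8, frame))
```

In the agreeing case the combined belief in "f" exceeds either single quality; in the disagreeing case the value from the higher-quality register keeps the larger share of the remaining mass.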
Quality assessment of derived attributes
There is no information on current activity status (CAS) or commuters (COM) in the raw data. We derive the information for CAS from two other attributes in two data sources.
We obtain the required information for COM from the already derived attribute CAS. Thus, the quality indicators of both attributes are equal.
Imputations are applied to CAS. The imputed values are transferred to the COM attribute by the same derivation process already used in the CDB.
Usability of the results
Raw data
• Which register delivers a certain attribute with the highest quality indicator?
• Is there a register with below-average quality for all delivered attributes?
• Is the quality indicator of a certain attribute worse than in the last delivery?
Central Database
• Does the use of multiple data sources improve data quality?
• Comparison with prior censuses (plausibility checks)
Final Data Pool
• Comparison of attributes for further improvement
• Comparison of census generations over time
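The raw-data questions above can be sketched as simple queries on a table of quality indicators. The register names and all values are made up, the helper names are hypothetical, and "below average" is assumed to mean below the mean of all indicators in the delivery.

```python
from statistics import mean

def best_register(quality, attribute):
    """Register that delivers `attribute` with the highest quality indicator."""
    candidates = {reg: q for (reg, attr), q in quality.items() if attr == attribute}
    if not candidates:
        raise KeyError(attribute)
    return max(candidates, key=candidates.get)

def below_average_registers(quality):
    """Registers whose delivered attributes all lie below the overall mean."""
    overall = mean(list(quality.values()))
    registers = {reg for reg, _ in quality}
    return sorted(
        reg for reg in registers
        if all(q < overall for (r, _), q in quality.items() if r == reg)
    )

def worsened(current, previous):
    """(register, attribute) pairs whose indicator dropped since the last delivery."""
    return sorted(
        pair for pair, q in current.items()
        if pair in previous and q < previous[pair]
    )

# (register, attribute) -> quality indicator; all values are made up.
current = {
    ("register_a", "sex"): 0.95,
    ("register_b", "sex"): 0.90,
    ("register_a", "age"): 0.92,
    ("register_c", "edu"): 0.70,
    ("register_c", "age"): 0.75,
}
previous = {
    ("register_a", "sex"): 0.95,
    ("register_b", "sex"): 0.93,
    ("register_c", "edu"): 0.65,
}

top_for_sex = best_register(current, "sex")
weak_registers = below_average_registers(current)
declined = worsened(current, previous)
```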
Further Information
Austrian Journal of Statistics, Volume 39 (2010), Number 4
• http://www.stat.tugraz.at/AJS/ausg104/104Berka.pdf
Statistica Neerlandica, Volume 66 (2012), Issue 1
• http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9574.2011.00506.x/pdf
ESSnet on Data Integration 2011, Madrid
• http://www.ine.es/e/essnetdi_ws2011/ppts/Lenk.pdf
ISI World Statistics Congress STS50 - Methods and quality of administrative data used in a census 2011, Dublin
• http://isi2011.congressplanner.eu/pdfs/650199.pdf
NTTS Conference 2011, Brussels
• http://www.cros-portal.eu/sites/default/files/S13P1.pdf
UNECE/Eurostat Expert Group Meeting on Register-Based Censuses 2010, The Hague
• http://live.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.41/2010/wp.4.e.pdf
European Conference on Quality in Official Statistics 2010, Helsinki
• http://q2010.stat.fi/media//presentations/session-26/fiedler_quality-in-official-statistics_statisticsaustria_paper.pdf
European Conference on Quality in Official Statistics, June 2012
• http://www.q2012.gr/articlefiles/sessions/21.2_Manuela%20Lenk%20_A%20quality%20monitoring%20system.pdf
Please address queries to:
Manuela Lenk
Register-based census
Contact information:
Guglgasse 13, 1110 Vienna
phone: +43 (1) 71128-8283
fax: +43 (1) [email protected]