![Page 1: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/1.jpg)
European Conference on Quality 2008 in Official Statistics
Session on Administrative data.
M. Carla Congia, Silvia Pacini,
Donatella Tuzi (tuzi@istat.it)
Istat - Italy
Quality Challenges in Processing Administrative Data to Produce
Short-term Labour Cost Statistics
Rome, 8–11 July 2008
![Page 2: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/2.jpg)
Administrative data
Session
Q2008. Rome, 8-11 July 2008
Presentation Outline
The Italian Oros Survey
The peculiarities of the administrative source used
The quality strategy in a context of timely and extensive use of administrative data
Final remarks
![Page 3: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/3.jpg)
The Oros Survey
Since 2003 the Italian NSI has released quarterly indicators on gross wages and total labour cost (Oros Survey) covering enterprises of all sizes in the private non-agricultural sector. Indices are released 70 days after the end of the reference quarter.
In the past this information was collected monthly, and only for large firms (more than 500 employees), through the Survey on Large Enterprises (LES).
The Oros Survey was designed to fill this gap in Italian statistics, using administrative data (employers' social contribution declarations to the National Social Security Institute, INPS) for Small and Medium Enterprises, integrated with the LES survey data for Large Enterprises.
Today the Oros Survey is an innovative example in Italy of administrative data used extensively to produce timely business statistics.
![Page 4: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/4.jpg)
The Administrative Sources
All Italian non-agricultural firms in the private sector with at least one employee (roughly 12 million employees and 1.3 million employers per year) have to pay monthly social security contributions to INPS.
Employers' monthly declaration (DM10 form)
A highly detailed grid organized by administrative codes, with information on employment by type, paid days, wage bills, social contributions, credit terms and tax reliefs. Each DM10 spans several records (on average 8 records per unit). About 10 million records each month.
Transmitted 35 days after the end of the reference quarter.
INPS administrative register (AR)
Contains structural information for each administrative unit (administrative id., fiscal code, name, legal form, dates of registration and cancellation, etc.). About 4 million records each quarter.
Transmitted at the end of the reference quarter.
![Page 5: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/5.jpg)
Peculiarities of the Administrative Source
Unlike survey data, the use of an administrative source:
reduces the financial costs of direct collection and avoids further response burden on enterprises;
satisfies the growing demand for timely and detailed statistical information, for multiple statistical aims.
Yet data collection is beyond the control of the NSI, which therefore needs information about the quality of the administrative data used.
Close relationships and coordination with the administrative institutions help reduce the risk of data quality problems arising from dependence on the data supplier.
In this, the Oros Survey does not differ from other register-based statistics.
![Page 6: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/6.jpg)
Peculiarities of the Administrative Source (2)
What sets the Oros Survey apart from other register-based statistics is the timeliness of its release, which obliges Istat to acquire the data completely raw, without any prior checking or aggregation. This raises unusual statistical quality issues:
the processing of a huge quantity of complex data in a very short time;
the lack of standardized metadata to translate administrative information;
the continuous changes in administrative definitions and concepts.
Acquiring raw information allows Istat to monitor most aspects of the processing, but hard work is needed to guarantee a high standard of quality.
A pervasive quality strategy has been implemented, covering the whole Oros production process.
![Page 7: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/7.jpg)
The Quality Strategy in the Oros Production Process
Preliminary checks and retrieval of the statistical variables
Treatment of measurement errors (micro editing)
Treatment of non-response errors (imputation of temporary employment agencies)
The large firms: integration with survey data
Checks on macro data
DM10 micro data
Oros Survey indicators
Metadata Database
Administrative Register (AR)
![Page 8: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/8.jpg)
The Administrative Register
The AR is used as a representation of the current population.
But:
it suffers from over-coverage (temporary suspensions and firm closures are under-recorded);
the economic activity code is drawn from the Italian Business Register (BR) (90% of the Oros active units);
hard work is needed to define the estimation frame (exclusion of units not belonging to the Oros target population);
special attention is paid to the quality of the fiscal code, the main matching variable.
![Page 9: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/9.jpg)
Preliminary Checks and Retrieval of the Statistical Variables
Meta-information on laws, regulations, contribution rates, codes and other technical aspects of Social Security is promptly collected and updated in a standardized METADATA DATABASE built in-house. It is necessary to carry out:
preliminary checks on raw data and correction of errors in codes, duplicated records and inconsistencies with current legislation;
translation of the administrative data into statistical variables, through complex additions and subtractions of a huge number of wage and contribution items identified by numerous administrative codes (currently more than 5,000);
estimation of some components for which information is not available in the administrative form (e.g. employers' injury insurance premiums and severance payments).
In this step each DM10 is reorganized into one record.
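The translation step can be pictured with a minimal sketch. It assumes a small code-to-variable mapping with signs; the codes `W001`, `W002`, `C001`, `R001` and the function `collapse_dm10` are hypothetical illustrations, not actual DM10 codes or Istat routines. Records carrying a code that is missing from the metadata are set aside rather than silently added.

```python
from collections import defaultdict

# Hypothetical metadata: administrative code -> (statistical variable, sign).
# The real DM10 codes and their mapping are not given in the slides.
CODE_MAP = {
    "W001": ("wages", +1),
    "W002": ("wages", +1),
    "C001": ("social_contributions", +1),
    "R001": ("social_contributions", -1),  # e.g. a contribution rebate
}


def collapse_dm10(records, code_map=CODE_MAP):
    """Collapse DM10 records into one record per unit.

    records: iterable of (unit_id, admin_code, amount) tuples.
    Returns (per_unit, unknown_codes): per_unit maps each unit_id to a
    dict of statistical variables; unknown_codes collects the codes that
    are missing from the metadata and need to be looked up.
    """
    per_unit = defaultdict(lambda: defaultdict(float))
    unknown_codes = set()
    for unit_id, code, amount in records:
        if code not in code_map:
            unknown_codes.add(code)
            continue
        variable, sign = code_map[code]
        per_unit[unit_id][variable] += sign * amount
    return {u: dict(v) for u, v in per_unit.items()}, unknown_codes


# Illustrative records only.
records = [
    ("A", "W001", 1000.0),
    ("A", "W002", 200.0),
    ("A", "C001", 400.0),
    ("A", "R001", 50.0),
    ("B", "W001", 500.0),
    ("B", "X999", 10.0),
]
per_unit, unknown_codes = collapse_dm10(records)
```

Keeping the mapping in a table rather than in code mirrors the role of the metadata database: when a code or a rule changes, only the table needs updating.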
![Page 10: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/10.jpg)
Treatment of Measurement Errors
Once the statistical data are available, a more traditional micro editing procedure is applied. However, given the huge number of units, it relies heavily on selective criteria.
A score function assigns to each of the 1.3 million units the probability that an error occurs in the target variables.
Cut-off thresholds are fixed to select anomalous values, but their identification is deeply affected by the significant tails in the distribution of the target variables:
very low per capita wages (e.g. units with only supplementary earnings);
negative per capita other labour costs (e.g. social contribution rebates).
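A minimal sketch of selective editing follows. The slides do not specify the Oros score function, so this example substitutes a simple robust distance (absolute deviation from the median, scaled by the median absolute deviation) and a fixed cut-off; the function names, the threshold and the data are assumptions for illustration.

```python
from statistics import median


def robust_scores(values):
    """Return |x - median| / MAD for each value (0.0 if the MAD is zero)."""
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        return [0.0 for _ in values]
    return [abs(x - med) / mad for x in values]


def select_for_review(units, threshold=5.0):
    """Return the ids of units whose score exceeds the cut-off threshold.

    units: dict mapping unit id -> value of the target variable.
    The threshold is an illustrative assumption, not the Oros cut-off.
    """
    ids = list(units)
    scores = robust_scores([units[i] for i in ids])
    return [i for i, s in zip(ids, scores) if s > threshold]


# Illustrative per capita other labour costs (euro), including the kind
# of tail values mentioned above.
per_capita_other_costs = {
    "u1": 430.0, "u2": 450.0, "u3": 410.0, "u4": 470.0,
    "u5": -1350.0, "u6": 6900.0, "u7": 440.0,
}
flagged = select_for_review(per_capita_other_costs)
```

A flagged unit is only a candidate for review: a negative value may be a legitimate rebate, which is why the choice of cut-off is delicate.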
![Page 11: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/11.jpg)
Figure 1: Distribution (%) of the per capita other labour costs (euro values) in the Oros manufacturing small and medium enterprises, July 2007.
Mean = 450; Median = 430; Max = 6,900; Min = -1,350
![Page 12: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/12.jpg)
Treatment of Measurement Errors (2)
The edit and imputation rules are based on known functional relations among the analysed variables. They aim to assess and preserve both cross-sectional and longitudinal consistency at unit record level, using information from the closest months.
The number of monthly edits is generally not high, but even a single oversight may have a significant effect.
Figure: Quarterly changes of the Oros wage index in the Wholesale and retail trade sector (G), 2005Q1 to 2007Q3. Series with measurement error compared with the corrected series.
In the third quarter of 2007 the number of employees of one unit was affected by a measurement error: part-time workers reported as 73,000; imputed value: 2.
Left uncorrected, the error would have implied a change of 0.8% instead of 3%.
This step is mainly interactive. Given the nature of the data, experience has shown that automatic corrections are best avoided.
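A longitudinal edit of the kind described can be sketched as a simple ratio check against the closest months. The function `longitudinal_flag`, the tolerance `max_ratio` and the neighbouring values are assumptions; only the reported 73,000 and the imputed 2 come from the example above. Consistent with the interactive approach, the check only flags a value for review and does not correct it.

```python
def longitudinal_flag(current, neighbours, max_ratio=3.0):
    """Return True if `current` differs from the mean of the closest
    months by more than a factor of `max_ratio` in either direction.

    neighbours: values reported by the same unit in the closest months.
    Non-positive values are left to other checks and never flagged here.
    """
    valid = [v for v in neighbours if v > 0]
    if not valid or current <= 0:
        return False
    reference = sum(valid) / len(valid)
    ratio = current / reference
    return ratio > max_ratio or ratio < 1 / max_ratio


# 73,000 part-time workers reported against (hypothetical) neighbouring
# months of 2 each: flagged for interactive review.
suspicious = longitudinal_flag(73000, [2, 2])
plausible = longitudinal_flag(2, [2, 3])
```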
![Page 13: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/13.jpg)
Treatment of Non-response Errors
In the Oros Survey, non-responses are units that deliver the DM10 late. Nevertheless, about 95-98% of the Oros population is represented in the preliminary administrative data.
Since tests indicate that the missing units are missing at random (MAR) and they are few in the preliminary data, they do not significantly affect the Oros wage and other labour cost changes.
Units belonging to Temporary Employment Agencies (TEA) are an exception, because of their distinctive characteristics.
These are about 100 units accounting for 3% of total employment in the private sector (20% in sector K - Real estate, renting and business activities).
The absence of even a few of these units may significantly affect changes in the per capita indicators.
![Page 14: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/14.jpg)
Treatment of Non-response Errors (2)
Singling out TEA unit non-responses is not an easy task:
the population under study is represented by the current AR, which suffers from over-coverage (a list of respondents is not available). It follows that the active status of a unit must be predicted through a longitudinal analysis of its activity in the nearby quarters;
given the highly dynamic nature of TEA, hard work is necessary to follow their frequent changes over time (e.g. mergers, split-ups, etc.) in order to separate real non-responses from non-active units.
Imputation of missing data is deterministic and largely based on past information on non-respondents and panel information on the current respondents.
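The two ideas above (predicting the active status, then imputing deterministically) can be sketched as follows. The slides do not give the actual rules, so this is one plausible reading: a unit is treated as active if it reported in any of the last few quarters, and its missing value is set to its last reported value multiplied by the growth observed among units reporting in both quarters. All names, the `lookback` window and the data are assumptions.

```python
def is_probably_active(history, lookback=2):
    """Predict active status from past quarters (oldest first).

    history holds the values a unit reported, with None for quarters
    in which nothing was delivered.
    """
    return any(v is not None for v in history[-lookback:])


def panel_growth(previous, current):
    """Growth ratio computed on units reporting in both quarters."""
    common = [u for u in current if u in previous]
    base = sum(previous[u] for u in common)
    if base == 0:
        raise ValueError("no usable panel of respondents")
    return sum(current[u] for u in common) / base


def impute_missing(history, growth):
    """Deterministic imputation: last reported value times panel growth.

    Returns None when the unit is predicted to be non-active, so that
    over-coverage in the register is not turned into imputed employment.
    """
    if not is_probably_active(history):
        return None
    last = next(v for v in reversed(history) if v is not None)
    return last * growth


# Illustrative employment figures for responding agencies.
previous_quarter = {"t1": 1000, "t2": 2000}
current_quarter = {"t1": 1100, "t2": 2200}
growth = panel_growth(previous_quarter, current_quarter)

late_unit = impute_missing([480.0, 500.0], growth)       # real non-response
closed_unit = impute_missing([500.0, None, None], growth)  # likely non-active
```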
![Page 15: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/15.jpg)
Integration with Survey Data on Large Enterprises
In the Oros estimates special attention is given to Large Enterprises (LE, firms with more than 500 employees). In the Italian non-agricultural sector LE account for about 1,000 units employing 2 million workers.
In the past, the integration of survey data on LE was strongly motivated by the insufficient representation of these units in the preliminary administrative data.
Today the INPS source guarantees good coverage of these units but, as experience has shown, the statistical source provides higher quality data:
enterprises can be re-contacted in case of non-response or suspected measurement errors;
the frequent legal changes these units undergo (e.g. mergers, split-ups, acquisitions, etc.) are managed more rapidly and efficiently.
![Page 16: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/16.jpg)
Integration with Survey Data on Large Enterprises (2)
Combining survey and administrative data raises specific quality issues:
harmonisation of variables;
record matching: the fiscal code is the main linking variable, but ambiguities may arise from formal errors or from different updating times in the two sources (mergers, hive-offs and split-ups might be recorded in different periods). Considerable effort goes into avoiding omissions and duplications, using supplementary information (legal name, number of employees, etc.).
About 12% of LES employment is manually reviewed and matched to the corresponding administrative firms.
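A minimal sketch of this matching logic: units are linked on the fiscal code when it identifies exactly one administrative unit; otherwise a candidate is sought using the legal name and the number of employees, and the case is sent to manual review. The field names, the name normalisation, the `size_tolerance` and the sample codes are assumptions for illustration.

```python
def normalise_name(name):
    """Upper-case, drop full stops and collapse whitespace."""
    return " ".join(name.upper().replace(".", "").split())


def match_units(survey_units, admin_units, size_tolerance=0.2):
    """Match survey units to administrative units.

    Both inputs are lists of dicts with keys 'fiscal_code', 'name' and
    'employees'. Returns (matched, to_review) as lists of
    (survey_code, admin_code) pairs; admin_code is None when no
    candidate was found.
    """
    by_code, by_name = {}, {}
    for a in admin_units:
        by_code.setdefault(a["fiscal_code"], []).append(a)
        by_name.setdefault(normalise_name(a["name"]), []).append(a)

    matched, to_review = [], []
    for s in survey_units:
        exact = by_code.get(s["fiscal_code"], [])
        if len(exact) == 1:
            matched.append((s["fiscal_code"], exact[0]["fiscal_code"]))
            continue
        # Fallback on supplementary information: same normalised name
        # and a similar number of employees.
        candidates = [
            a for a in by_name.get(normalise_name(s["name"]), [])
            if abs(a["employees"] - s["employees"])
            <= size_tolerance * s["employees"]
        ]
        admin_code = candidates[0]["fiscal_code"] if len(candidates) == 1 else None
        to_review.append((s["fiscal_code"], admin_code))
    return matched, to_review


# Placeholder codes and names, not real enterprises.
survey_units = [
    {"fiscal_code": "001", "name": "Alfa S.p.A.", "employees": 800},
    {"fiscal_code": "002", "name": "Beta S.r.l.", "employees": 1200},
    {"fiscal_code": "003", "name": "Gamma", "employees": 600},
]
admin_units = [
    {"fiscal_code": "001", "name": "ALFA SPA", "employees": 790},
    {"fiscal_code": "992", "name": "Beta SRL", "employees": 1150},
]
matched, to_review = match_units(survey_units, admin_units)
```

Fallback candidates are never accepted automatically: they only shorten the manual review.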
![Page 17: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/17.jpg)
Checks on Macro Data
Final checks on macro data are a key step in the quality strategy: they aim to identify any residual errors that may affect the estimates. These checks are mainly based on:
analytic and graphical inspection of the time series at a sub-population detail: acceptance boundaries must be respected by pre-defined statistical measures;
automatic detection of outliers based on TERROR, an application of the software TRAMO-SEATS, where the detection of suspected errors is based on REG-ARIMA model estimates;
comparison with other statistical source figures (e.g. National Accounts, Indices of wages according to collective agreements, etc.);
relationships between variables, whose coherence has to be guaranteed (e.g. the ratio of other labour costs to wages, etc.).
If any error is detected, a drill-down to micro data may be necessary.
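Two of these checks can be illustrated with a simple sketch. It does not reproduce TERROR, which relies on model-based forecasts; instead it uses a plain acceptance band (mean plus or minus `k` standard deviations of past values) and a bounds check on a ratio. The function names, `k`, the bounds and the historical series are assumptions; only the 0.8 and 3 values echo the wage index example given earlier.

```python
from statistics import mean, stdev


def outside_boundaries(history, latest, k=3.0):
    """True if `latest` lies outside mean +/- k standard deviations
    of the historical values (at least two values are required)."""
    return abs(latest - mean(history)) > k * stdev(history)


def ratio_in_range(other_labour_costs, wages, lower, upper):
    """True if the ratio of other labour costs to wages is in [lower, upper]."""
    return lower <= other_labour_costs / wages <= upper


# Hypothetical past quarterly changes (%) of a sector wage index.
past_changes = [3.1, 2.9, 3.0, 3.2, 2.8, 3.0]

needs_drill_down = outside_boundaries(past_changes, 0.8)
accepted = not outside_boundaries(past_changes, 3.0)
ratio_ok = ratio_in_range(450.0, 1500.0, lower=0.2, upper=0.5)
```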
![Page 18: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/18.jpg)
Internal Oros Quality Reporting
Documenting and updating the Oros production process every quarter is a fundamental task in the overall quality strategy:
metadata are archived;
methodological information is documented;
imputed data are flagged (and pre-imputation data are archived);
quality indicators on the impact of imputation are calculated.
The documentation of the Oros process guarantees its reproducibility and repeatability.
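As an illustration of what such quality indicators might look like, the sketch below computes an imputation rate and the relative impact of imputation on a total from flagged records with archived pre-imputation values. The slides do not define the Oros indicators, so both formulas, the field names and the data are assumptions.

```python
def imputation_indicators(units):
    """Compute simple indicators on the impact of imputation.

    units: list of dicts with keys 'pre' (archived value before
    imputation, None if the unit was missing), 'post' (value used in
    the estimates) and 'imputed' (flag).
    """
    n_imputed = sum(1 for u in units if u["imputed"])
    total_pre = sum(u["pre"] for u in units if u["pre"] is not None)
    total_post = sum(u["post"] for u in units)
    return {
        "imputation_rate": n_imputed / len(units),
        "relative_impact": (total_post - total_pre) / total_pre,
    }


# Illustrative records: one edited value and one imputed non-response.
units = [
    {"pre": 100.0, "post": 100.0, "imputed": False},
    {"pre": 400.0, "post": 200.0, "imputed": True},
    {"pre": None, "post": 100.0, "imputed": True},
    {"pre": 500.0, "post": 500.0, "imputed": False},
]
indicators = imputation_indicators(units)
```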
![Page 19: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/19.jpg)
Final Remarks
The Oros Survey was:
developed without any previous experience in the use of administrative data for the production of short-term official statistics;
gradually implemented, learning by doing.
High timeliness, frequent changes in Social Security laws and regulations, and highly detailed raw data imply significant and unusual quality problems, which are managed through:
close relationships and coordination with the administrative institution;
a pervasive quality strategy along the whole production process;
highly skilled human resources to handle the wide-ranging and non-conventional processing aspects, which are subject to frequent modifications;
systematic documentation of the production steps.
Less “standardizable” than a traditional survey quality strategy?
![Page 20: European Conference on Quality 2008 in Official Statistics Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it)](https://reader036.fdocuments.in/reader036/viewer/2022082422/56649eca5503460f94bd89be/html5/thumbnails/20.jpg)
References
Baldi C., Ceccato F., Cimino E., Congia M.C., Pacini S., Rapiti F., Tuzi D. (2004) Use of Administrative Data to produce Short Term Statistics on Employment, Wages and Labour Cost. Essays, n.15/2004, Istat, Rome.
Caporello G., Maravall A. (2002) A tool for quality control of time series data. Program TERROR. Bank of Spain.
Eurostat (2003) Quality assessment of administrative data for statistical purposes. Doc. Eurostat/A4/Quality/03/item6, available on the web site: http://epp.eurostat.ec.europa.eu/pls/portal/docs/PAGE/PGP_DS_QUALITY/TAB47141301/DEFINITION_2.PDF
Istat, CBS, SFSO, Eurostat (2007) Recommended Practices for Editing and Imputation in Cross-Sectional Business Surveys, available on the web site: http://edimbus.istat.it/dokeos/document/document.php?openDir=%2FRPM_EDIMBUS
Thank you for your attention
Donatella Tuzi