European Conference on Quality 2008 in Official Statistics. Session on Administrative data. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it), Istat - Italy. Quality Challenges in Processing Administrative Data to Produce Short-term Labour Cost Statistics. Rome, 8–11 July 2008

Presentation Outline

The Italian Oros Survey

The peculiarities of the administrative source used

The quality strategy in a context of timely and extensive use of administrative data

Final remarks

The Oros Survey

Since 2003 the Italian NSI has released quarterly indicators on gross wages and total labour cost (the Oros Survey), covering enterprises of all sizes in the private non-agricultural sector. The indices are released 70 days after the end of the reference quarter.

Previously this information was collected monthly, and only for large firms, through the Survey on Large Enterprises (more than 500 employees).

The Oros Survey was designed to fill this gap in Italian statistics. It uses administrative data (employees’ social contribution declarations submitted to the National Social Security Institute, INPS) for small and medium enterprises, integrated with the survey data on Large Enterprises (LES).

Today the Oros Survey is an innovative Italian example of the extensive use of administrative data to produce timely business statistics.

The Administrative Sources

All Italian non-agricultural firms in the private sector with at least one employee (roughly 12 million employees and 1.3 million employers per year) must pay monthly social security contributions to INPS.

Employers' monthly declaration (DM10 form): a highly detailed grid organized by administrative codes, with information on employment by type, paid days, wage bills, social contributions, credit terms and tax reliefs. Each DM10 is spread over several records (on average 8 records per unit), for about 10 million records each month. Transmitted 35 days after the end of the reference quarter.

INPS administrative register (AR): contains structural information on each administrative unit (administrative id, fiscal code, name, legal form, dates of registration and cancellation, etc.). About 4 million records each quarter. Transmitted at the end of the reference quarter.

Peculiarities of the Administrative Source

Unlike survey data, the use of an administrative source:

reduces the financial cost of direct collection and avoids further response burden on enterprises;

satisfies the growing demand for timely and detailed statistical information, for multiple statistical purposes.

Yet data collection is beyond the control of the NSI, which therefore needs information on the quality of the administrative data it uses.

Close relationships and coordination with the administrative institutions help to reduce the risk of data quality problems arising from dependence on the data supplier.

In this respect, the Oros Survey does not differ from other register-based statistics.

Peculiarities of the Administrative Source (2)

What distinguishes the Oros Survey from other register-based statistics is the timeliness of its release, which has obliged Istat to acquire the data completely raw, without any prior checking or aggregation. This raises unusual statistical quality issues:

the processing of a huge quantity of complex data in a very short time;

the lack of standardized metadata to translate administrative information;

the continuous changes in administrative definitions and concepts.

Acquiring raw information allows Istat to monitor most aspects of the processing, but hard work is needed to guarantee a high standard of quality.

A pervasive quality strategy has been implemented, covering the whole Oros production process.

The Quality Strategy in the Oros Production Process

Preliminary checks and retrieval of the statistical variables

Treatment of measurement errors (micro editing)

Treatment of non-response errors (imputation of temporary employment agencies)

The large firms: integration with survey data

Checks on macro data

DM10 micro data

Oros Survey indicators

Metadata Database

Administrative Register (AR)

The Administrative Register

The AR is used as a representation of the current population.

But:

it suffers from over-coverage (temporary suspensions and firm closures are under-recorded);

the economic activity code is drawn from the Italian Business Register (BR) (for 90% of the Oros active units);

hard work is needed to define the estimation frame (exclusion of units not belonging to the Oros target population);

special attention must be paid to the quality of the fiscal code, the main matching variable.
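As an illustration of why the quality of the fiscal code matters for matching, the following minimal sketch cleans the code before using it as a key to look up the activity code in the BR. The format rule (11 digits or 16 alphanumeric characters), the field names and the sample values are assumptions made for the example, not the actual Oros procedure.

```python
import re

# Assumed format rule: 11 digits (legal entities) or 16 alphanumerics (individuals).
FISCAL_CODE = re.compile(r"^(\d{11}|[A-Z0-9]{16})$")

def normalise_fiscal_code(raw):
    """Return a cleaned fiscal code, or None if it is formally invalid."""
    code = re.sub(r"[\s.\-]", "", raw or "").upper()
    return code if FISCAL_CODE.match(code) else None

def attach_activity_code(ar_units, br_activity):
    """Split AR units into those matched to a BR activity code and the rest."""
    matched, unmatched = [], []
    for unit in ar_units:
        key = normalise_fiscal_code(unit["fiscal_code"])
        if key is not None and key in br_activity:
            matched.append({**unit, "fiscal_code": key, "activity": br_activity[key]})
        else:
            unmatched.append(unit)
    return matched, unmatched

# Hypothetical AR extract and BR lookup table.
ar = [
    {"admin_id": "A1", "fiscal_code": " 01234567890 "},
    {"admin_id": "A2", "fiscal_code": "0123-456"},      # formally invalid
    {"admin_id": "A3", "fiscal_code": "09876543210"},   # valid, but not in the BR
]
br = {"01234567890": "G"}
matched, unmatched = attach_activity_code(ar, br)
```

Units left in `unmatched` would need either a corrected key or an activity code from another source.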

Preliminary Checks and Retrieval of the Statistical Variables

Meta-information on laws, regulations, contribution rates, codes and other technical aspects of Social Security is promptly collected and updated in a standardized METADATA DATABASE built in-house. It is needed to carry out:

preliminary checks on raw data and correction of errors in codes, duplicated records and inconsistencies with current legislation;

translation of the administrative data into statistical variables, through complex additions and subtractions of a huge number of wage and contribution items identified by numerous administrative codes (currently more than 5,000);

estimation of some components for which information is not available in the administrative form (e.g. employers’ injury insurance premiums and severance payments).

In this step each DM10 is reorganized into a single record.
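The translation step can be sketched as a lookup from administrative codes to signed contributions to statistical variables, followed by a collapse of the multi-record DM10 into one record per unit and month. The codes, variable names and amounts below are hypothetical; the real mapping lives in the metadata database and covers thousands of codes.

```python
from collections import defaultdict

# Hypothetical metadata: administrative code -> (statistical variable, sign).
CODE_MAP = {
    "W001": ("wages", +1),
    "W002": ("wages", +1),
    "C010": ("social_contributions", +1),
    "R020": ("social_contributions", -1),  # rebate, to be subtracted
}

def collapse_dm10(records, code_map=CODE_MAP):
    """Collapse multi-record DM10 forms into one record per (unit, month)."""
    totals = defaultdict(lambda: defaultdict(float))
    unknown = set()
    for rec in records:
        mapping = code_map.get(rec["code"])
        if mapping is None:
            unknown.add(rec["code"])  # code missing from the metadata
            continue
        variable, sign = mapping
        totals[(rec["unit"], rec["month"])][variable] += sign * rec["amount"]
    return {key: dict(values) for key, values in totals.items()}, unknown

records = [
    {"unit": "U1", "month": "2007-07", "code": "W001", "amount": 1000.0},
    {"unit": "U1", "month": "2007-07", "code": "W002", "amount": 200.0},
    {"unit": "U1", "month": "2007-07", "code": "C010", "amount": 400.0},
    {"unit": "U1", "month": "2007-07", "code": "R020", "amount": 50.0},
    {"unit": "U1", "month": "2007-07", "code": "X999", "amount": 10.0},
]
collapsed, unknown_codes = collapse_dm10(records)
```

Collecting unknown codes rather than silently dropping them reflects the need to keep the metadata aligned with changing administrative definitions.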

Treatment of Measurement Errors

Once the statistical data are available, a more traditional micro-editing procedure is set up. However, given the huge number of units, it relies heavily on selective criteria. A score function assigns to each of the 1.3 million units the probability that an error occurs in the target variables.

Cut-off thresholds are fixed to select anomalous values, but their identification is strongly affected by the heavy tails in the distribution of the target variables:

very low per capita wages (e.g. units with only supplementary earnings);

negative per capita other labour costs (e.g. social contribution rebates).
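The idea of scoring units and applying a cut-off can be illustrated with a deliberately simple stand-in: a robust distance from the median, measured in units of the median absolute deviation (MAD). This is not the Oros score function and not a calibrated probability; the data and the cut-off value are hypothetical.

```python
from statistics import median

def robust_scores(values):
    """Absolute deviation from the median, expressed in MAD units."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    return [abs(v - med) / mad for v in values]

def select_for_review(units, variable, cutoff):
    """Return the ids of units whose score exceeds the cut-off threshold."""
    scores = robust_scores([u[variable] for u in units])
    return [u["unit"] for u, s in zip(units, scores) if s > cutoff]

# Hypothetical per capita wages; U6 has only supplementary earnings.
units = [
    {"unit": "U1", "per_capita_wage": 1900.0},
    {"unit": "U2", "per_capita_wage": 2000.0},
    {"unit": "U3", "per_capita_wage": 2100.0},
    {"unit": "U4", "per_capita_wage": 2050.0},
    {"unit": "U5", "per_capita_wage": 1950.0},
    {"unit": "U6", "per_capita_wage": 150.0},
    {"unit": "U7", "per_capita_wage": 2000.0},
]
to_review = select_for_review(units, "per_capita_wage", cutoff=5.0)
```

The selected unit is anomalous but not necessarily wrong: a very low value may be a legitimate tail observation, which is why the threshold choice is delicate.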

Figure 1 – Distribution of the per capita other labour costs (euro values) in the Oros manufacturing small and medium enterprises, July 2007. Horizontal axis: per capita other labour costs; vertical axis: %.

Mean = 450; Median = 430; Max = 6,900; Min = -1,350

Treatment of Measurement Errors (2)

The edit and imputation rules are based on known functional relations among the variables analysed. They aim to assess and preserve, at unit record level, both cross-sectional and longitudinal consistency, using information from the closest months.

The number of monthly edits is generally not high, but even a single oversight may have a significant effect.

[Figure: quarterly series from 2005Q1 to 2007Q3, vertical axis from 0.0 to 6.0. Legend: series with measurement error; corrected series.]

Quarterly changes of the Oros wage index in the Wholesale and retail trade sector (G). In the third quarter of 2007 the number of employees of one unit was affected by a measurement error: 73,000 part-time workers were reported, against an imputed value of 2.

Left uncorrected, the error would have implied a change of 0.8% instead of 3%.

This step is mainly interactive: given the nature of the data, experience suggests avoiding automatic corrections.
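The mechanism behind this example can be sketched in two parts: a longitudinal check that flags an implausible jump in a unit's series, and the effect of an inflated employment figure on a per capita wage change. The jump threshold and all sector totals are hypothetical and do not reproduce the Oros figures; only the values 73,000 and 2 come from the slide.

```python
def pct_change(new, old):
    """Percentage change between two positive values."""
    return 100.0 * (new / old - 1.0)

def jumps(series, max_ratio=10.0):
    """Indices where a positive value differs from the previous one by more than max_ratio."""
    flagged = []
    for i in range(1, len(series)):
        prev, cur = series[i - 1], series[i]
        if prev > 0 and cur > 0 and max(cur / prev, prev / cur) > max_ratio:
            flagged.append(i)
    return flagged

# Part-time workers of one unit: the last value is the reported error,
# the earlier months are assumed for illustration.
part_time = [2, 2, 73_000]
flagged = jumps(part_time)  # -> [2]

# Hypothetical sector totals (not Oros data).
wages_prev, emp_prev = 4_000_000_000.0, 2_000_000
wages_cur = 4_120_000_000.0
emp_corrected = 2_000_000
emp_with_error = emp_corrected + 73_000 - 2

change_corrected = pct_change(wages_cur / emp_corrected, wages_prev / emp_prev)
change_with_error = pct_change(wages_cur / emp_with_error, wages_prev / emp_prev)
```

With these assumed totals the erroneous employment figure inflates the denominator and pulls the per capita change well below the corrected one, which is the direction of the distortion described above.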

Treatment of Non-response Errors

In the Oros Survey, non-respondents are units that deliver the DM10 late. Nevertheless, about 95-98% of the Oros population is represented in the preliminary administrative data.

Given the tested MAR (missing at random) nature of the missing units and their limited number in the preliminary data, they do not significantly affect the changes in Oros wages and other labour costs.

Units belonging to Temporary Employment Agencies (TEA) are an exception, because of their distinctive characteristics.

There are about 100 such units, accounting for 3% of total employment in the private sector (20% in sector K - Real estate, renting and business activities).

The absence of even a few of these units may significantly affect changes in the per capita indicators.
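A small worked example shows why a few missing TEA units can move a per capita indicator: if their per capita values differ from the rest of the sector, their absence changes the composition of the average. The wage levels and employment counts are hypothetical; only the 20% employment share in sector K is taken from the text.

```python
def per_capita(groups):
    """Employment-weighted per capita wage over a list of groups."""
    total_wages = sum(g["per_capita_wage"] * g["employees"] for g in groups)
    total_employees = sum(g["employees"] for g in groups)
    return total_wages / total_employees

# Hypothetical sector K: TEA account for 20% of employment.
other_units = {"per_capita_wage": 2000.0, "employees": 800}
tea_all = {"per_capita_wage": 1200.0, "employees": 200}
tea_half_missing = {"per_capita_wage": 1200.0, "employees": 100}

complete = per_capita([other_units, tea_all])
with_missing = per_capita([other_units, tea_half_missing])
spurious_change = 100.0 * (with_missing / complete - 1.0)
```

Under these assumptions the per capita wage rises purely because part of the TEA employment is absent, not because any wage changed.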

Treatment of Non-response Errors (2)

Singling out non-responding TEA units is not an easy task:

the population under study is represented by the current AR, which suffers from over-coverage (a list of respondents is not available). The active status of a unit must therefore be predicted through a longitudinal analysis of its activity in the neighbouring quarters;

given the highly dynamic nature of TEA, hard work is needed to follow their frequent changes over time (e.g. mergers, split-ups, etc.) in order to separate real non-responses from non-active units.

Imputation of missing data is deterministic and largely based on past information on the non-respondents and panel information on the current respondents.
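One possible reading of this approach, as a minimal sketch: a unit with no declaration in the current quarter is presumed active if it reported in one of the preceding quarters, and its value is imputed by carrying forward its last reported value, scaled by the growth observed on units that responded in both quarters. The window length, the rule itself and the data are assumptions, not the documented Oros rules.

```python
def presumed_active(history, t, window=2):
    """True if the unit reported in any of the `window` quarters before t."""
    return any(v is not None for v in history[max(0, t - window):t])

def panel_growth(respondents, t):
    """Ratio of totals at t and t-1 over units reporting in both quarters."""
    both = [h for h in respondents if h[t] is not None and h[t - 1] is not None]
    return sum(h[t] for h in both) / sum(h[t - 1] for h in both)

def impute(history, t, growth):
    """Last reported value before t, scaled by the panel growth factor."""
    for s in range(t - 1, -1, -1):
        if history[s] is not None:
            return history[s] * growth
    return None

# Hypothetical employment by quarter; None means no DM10 was received.
respondents = [[100, 100, 110, 121], [200, 200, 220, 242]]
late_unit = [50, 55, 60, None]
closed_unit = [40, None, None, None]

t = 3
growth = panel_growth(respondents, t)
imputed = impute(late_unit, t, growth) if presumed_active(late_unit, t) else None
closed_is_active = presumed_active(closed_unit, t)
```

Because the rule is deterministic, the same inputs always give the same imputed value, which makes the step easy to document and reproduce.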

Integration with Survey Data on Large Enterprises

In the Oros estimates, special attention is given to Large Enterprises (LE, firms with more than 500 employees). In the Italian non-agricultural sector, LE account for about 1,000 units employing 2 million workers.

In the past, the integration of survey data on LE was strongly motivated by the poor representation of these units in the preliminary administrative data.

Today the INPS source guarantees good coverage of these units but, as experience has shown, the statistical source provides higher-quality data, thanks to:

re-contacting of enterprises in case of non-response or suspected measurement errors;

faster and more efficient handling of the frequent legal changes these units undergo (e.g. mergers, split-ups, acquisitions, etc.).

Integration with Survey Data on Large Enterprises (2)

Combining survey and administrative data raises specific quality issues:

harmonisation of variables;

record matching: the fiscal code is the main linking variable, but ambiguities may arise from formal errors or from different updating times in the two sources (mergers, hive-offs and split-ups may be recorded in different periods). Considerable effort goes into avoiding omissions and duplications, using supplementary information (legal name, number of employees, etc.).

About 12% of LES employment is manually reviewed and matched to the corresponding administrative firms.
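The matching logic can be sketched as follows: link on the fiscal code, accept a link automatically only when it is unique and the employment figures roughly agree, and send everything else to manual review. The 20% tolerance, the field names and the sample units are hypothetical.

```python
from collections import defaultdict

def employees_close(a, b, tolerance=0.20):
    """True if two employment figures differ by at most `tolerance` (relative)."""
    return abs(a - b) / max(a, b) <= tolerance

def match_by_fiscal_code(les_units, admin_units):
    """Return automatic matches and the LES ids left for manual review."""
    index = defaultdict(list)
    for unit in admin_units:
        index[unit["fiscal_code"]].append(unit)
    matched, review = [], []
    for unit in les_units:
        candidates = index.get(unit["fiscal_code"], [])
        if len(candidates) == 1 and employees_close(
            unit["employees"], candidates[0]["employees"]
        ):
            matched.append((unit["les_id"], candidates[0]["admin_id"]))
        else:
            review.append(unit["les_id"])  # none, several or inconsistent candidates
    return matched, review

les = [
    {"les_id": "L1", "fiscal_code": "00000000001", "employees": 1000},
    {"les_id": "L2", "fiscal_code": "00000000002", "employees": 600},
    {"les_id": "L3", "fiscal_code": "00000000003", "employees": 800},
]
admin = [
    {"admin_id": "A1", "fiscal_code": "00000000001", "employees": 1010},
    {"admin_id": "A2", "fiscal_code": "00000000002", "employees": 300},
    {"admin_id": "A3", "fiscal_code": "00000000002", "employees": 310},
]
matched, review = match_by_fiscal_code(les, admin)
```

Routing ambiguous cases to review instead of guessing is what keeps both omissions and duplications out of the integrated data.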

Checks on Macro Data

Final checks on macro data are a key step towards the quality target: they identify possible residual errors that may affect the estimates. These checks are mainly based on:

analytic and graphical inspection of the time series at sub-population level: pre-defined statistical measures must stay within acceptance boundaries;

automatic detection of outliers based on TERROR, an application of the software TRAMO-SEATS, in which suspected errors are detected from REG-ARIMA model estimates;

comparison with figures from other statistical sources (e.g. National Accounts, indices of wages according to collective agreements, etc.);

relationships between variables, whose coherence has to be guaranteed (e.g. the ratio of other labour costs to wages, etc.).

If any error is detected, a drill-down to the micro data may be necessary.
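Two of these checks can be illustrated with a much simpler stand-in than TERROR: an acceptance band on the ratio of other labour costs to wages, and a robust screen on quarter-on-quarter changes based on the median absolute deviation. Neither function fits a REG-ARIMA model; the bounds, the multiplier and the series are hypothetical.

```python
from statistics import median

def ratio_out_of_bounds(other_costs, wages, lower, upper):
    """Quarter indices where other labour costs / wages leaves [lower, upper]."""
    return [
        i for i, (c, w) in enumerate(zip(other_costs, wages))
        if not lower <= c / w <= upper
    ]

def suspicious_changes(index_series, k=5.0):
    """Quarter indices whose % change lies more than k MADs from the median change."""
    changes = [100.0 * (b / a - 1.0) for a, b in zip(index_series, index_series[1:])]
    med = median(changes)
    mad = median(abs(c - med) for c in changes)
    if mad == 0:
        return []
    return [i + 1 for i, c in enumerate(changes) if abs(c - med) > k * mad]

# Hypothetical macro series.
wages = [100.0, 101.0, 102.0, 103.0]
other_costs = [40.0, 41.0, 42.0, 60.0]
bad_ratio = ratio_out_of_bounds(other_costs, wages, lower=0.35, upper=0.45)

wage_index = [100.0, 101.0, 102.0, 103.0, 110.0, 105.0, 106.0]
suspects = suspicious_changes(wage_index)
```

A flagged quarter is only a signal: the next step would be the drill-down to the micro data of the sub-population concerned.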

Internal Oros Quality Reporting

Documenting and updating the Oros production process every quarter is a fundamental task in the overall quality strategy:

metadata are archived;

methodological information is documented;

imputed data are flagged (and pre-imputation data are archived);

quality indicators on the impact of imputation are calculated.

The documentation of the Oros process guarantees its reproducibility and repeatability.
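Flagging imputed values while archiving the pre-imputation data makes it straightforward to compute an indicator of the impact of imputation. The sketch below uses hypothetical field names and values, and the relative change of the total is only one of several indicators that could be chosen.

```python
def impute_with_flag(record, variable, new_value):
    """Return a copy holding the imputed value, a flag and the archived original."""
    updated = dict(record)
    updated[f"{variable}_raw"] = record[variable]
    updated[variable] = new_value
    updated[f"{variable}_imputed"] = True
    return updated

def imputation_impact(records, variable):
    """Percentage change of the total caused by imputation."""
    after = sum(r[variable] for r in records)
    before = sum(r.get(f"{variable}_raw", r[variable]) for r in records)
    return 100.0 * (after - before) / before

def imputation_rate(records, variable):
    """Share of records (in %) carrying an imputation flag."""
    flagged = sum(1 for r in records if r.get(f"{variable}_imputed", False))
    return 100.0 * flagged / len(records)

# Hypothetical micro data: the second unit is corrected from 300 to 200.
records = [
    {"unit": "U1", "employees": 100},
    {"unit": "U2", "employees": 300},
]
records[1] = impute_with_flag(records[1], "employees", 200)

impact = imputation_impact(records, "employees")
rate = imputation_rate(records, "employees")
```

Because the original value is kept next to the imputed one, the indicator can be recomputed at any later time from the archived data.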

Final Remarks

The Oros Survey was:

developed without any previous experience in the use of administrative data for the production of short-term official statistics;

implemented gradually, learning by doing.

High timeliness, frequent changes in Social Security laws and regulations, and highly detailed raw data raise significant and unusual quality problems, which are managed through:

close relationships and coordination with the administrative institution;

a pervasive quality strategy along the whole production process;

highly skilled human resources to handle the wide-ranging and non-conventional processing aspects, which are subject to frequent modifications;

systematic documentation of the production steps.

Less “standardizable” than a traditional survey quality strategy?

References

Baldi C., Ceccato F., Cimino E., Congia M.C., Pacini S., Rapiti F., Tuzi D. (2004) Use of Administrative Data to produce Short Term Statistics on Employment, Wages and Labour Cost. Essays, n.15/2004, Istat, Rome.

Caporello G., Maravall A. (2002) A tool for quality control of time series data. Program TERROR. Bank of Spain.

Eurostat (2003) Quality assessment of administrative data for statistical purposes. Doc. Eurostat/A4/Quality/03/item6, available on the web site: http://epp.eurostat.ec.europa.eu/pls/portal/docs/PAGE/PGP_DS_QUALITY/TAB47141301/DEFINITION_2.PDF

Istat, CBS, SFSO, Eurostat (2007) Recommended Practices for Editing and Imputation in Cross-Sectional Business Surveys, available on the web site: http://edimbus.istat.it/dokeos/document/document.php?openDir=%2FRPM_EDIMBUS

Thank you for your attention

Donatella Tuzi

tuzi@istat.it