Monica Consalvi – Giuseppe Garofalo – Caterina Viviano Italian National Statistical Institute

Quality in statistics: the BR case

Session: Quality indicators and quality measurement of Statistical Registers, 10 July 2008



Page 2

Business Register vs Statistical Survey

Business Registers (BRs) are statistical products with their own specific features:

• Extensive use of administrative data

• Heterogeneity and variability of inputs

• Heterogeneity of users

• Relevance of technological aspects

• Output specificity (dissemination of microdata)

• Continuous data updating

Page 3

Business Register vs Statistical Survey – Quality specificities

Extensive use of administrative data: compared with statistical surveys, the quality problem arises in a different context and can be addressed only ex post, because the data are known but not how they were generated.

Heterogeneity and variability of inputs: quality indicators are needed for specific subsets of units and for different variables.

Relevance of technological aspects: huge amounts of data, complex procedures for data integration and for applying methodologies, and changes over time in the applied rules (e.g. changes in classifications or in the contents of administrative sources).

Output specificity: because microdata are disseminated, the assumption that “errors cancel each other out on average” no longer holds. In a BR, errors add up (e.g. over-coverage and under-coverage).
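A minimal sketch of why over-coverage (units in the BR that do not belong to the population) and under-coverage (real units missing from the BR) do not offset each other at the micro level. It compares a hypothetical set of BR unit identifiers with an assumed known true population: the net count error can be zero while every spurious and every missing unit is still an error. The function name and identifiers are illustrative only.

```python
def coverage_errors(register: set[str], true_population: set[str]) -> dict[str, int]:
    """Compare register units with an (assumed known) true population."""
    over = register - true_population   # spurious units in the register
    under = true_population - register  # real units missing from the register
    return {
        "over_coverage": len(over),
        "under_coverage": len(under),
        # Aggregate view: errors of opposite sign cancel out in the total count
        "net_count_error": len(register) - len(true_population),
        # Microdata view: every wrong or missing unit counts as an error
        "gross_error": len(over) + len(under),
    }


# Hypothetical example: one spurious unit (U9) and one missing unit (U3)
br = {"U1", "U2", "U4", "U9"}
truth = {"U1", "U2", "U3", "U4"}
print(coverage_errors(br, truth))
# {'over_coverage': 1, 'under_coverage': 1, 'net_count_error': 0, 'gross_error': 2}
```

A count of units would look perfect here, yet two of the five distinct units involved are wrong when the register is used record by record.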

Page 4

Business Register vs Statistical Survey – Quality specificities

Heterogeneity of users
• The BR’s reference universe and updating period will differ depending on whether the BR is used for short-term statistics (STS) or for structural business statistics (SBS).
• If value added is estimated with reference to the BR’s universe, the quality of the large units (e.g. their activity code and size) will be crucial.
• If business demography indicators take the BR as their reference, the quality of the smaller units will be very important.

Continuous data updating: actual changes must be distinguished from spurious ones.

Changes arising from the structural development of the economy:
• demographic aspects
• changes in size
• changes in economic activity

Changes arising from the process of revising the register:
• the BR may acquire data referring to a previous period
• actual changes may be recorded at a later time
• births/deaths, or changes in characteristics, may be recorded late in the administrative registers

Page 5

The BR quality indicators

The system of quality indicators refers to three dimensions:

1. The phases of the BR’s updating process

2. A framework of quality components

3. The factors used to build the indicators

Page 6

The phases of the BR’s updating process

The BR results from the conceptual and physical integration of several administrative and statistical input sources. Its quality can therefore be assessed at three levels:

1) Quality of the INPUT (input sources)

2) Quality of the PROCESS (matching, merging, editing, updating)

3) Quality of the OUTPUT

Page 7

A framework of quality components

To monitor BR quality, the most frequently used components are:

- Coverage in terms of both units and variables

- Timeliness in terms of delay in updating

- Completeness

- Accuracy

Page 8

The factors used to build the indicators

The quality of the ASIA register

A methodological process for assessing variables coming from administrative sources


Five factors define a BR quality indicator: time, scope, subpopulation, variable and criterion.

The most important factor is the criterion: a method for evaluating, unit by unit, the correctness of the values of the variables of interest. Four criteria are used:

• Compliance
• Internal consistency
• Temporal consistency
• Metadata
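One way to read the five factors is as the parameters of a single indicator definition: a criterion is applied unit by unit to one variable, for one subpopulation and scope at one time point, and the result is the share of units judged correct. The sketch below assumes that reading; the class `QualityIndicator`, the record fields (`year`, `size`, `nace`, `ref_nace`) and the compliance-style criterion `agrees_with_reference` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

Unit = dict[str, Any]  # one BR record: variable name -> value


@dataclass
class QualityIndicator:
    """Hypothetical container for the five factors of a BR quality indicator."""
    time: int                               # reference year of the data
    scope: str                              # e.g. an input source or the BR output (label only here)
    subpopulation: Callable[[Unit], bool]   # selects the units to evaluate
    variable: str                           # variable whose values are assessed
    criterion: Callable[[Unit, str], bool]  # unit-by-unit correctness check

    def evaluate(self, units: list[Unit]) -> float:
        """Percentage of selected units whose value passes the criterion (NaN if none)."""
        selected = [u for u in units if u["year"] == self.time and self.subpopulation(u)]
        if not selected:
            return float("nan")
        correct = sum(self.criterion(u, self.variable) for u in selected)
        return 100 * correct / len(selected)


def agrees_with_reference(unit: Unit, variable: str) -> bool:
    """Compliance-style criterion: exact agreement with a value stored as 'ref_<variable>'."""
    return unit[variable] == unit.get(f"ref_{variable}")


units = [
    {"year": 2005, "size": 3, "nace": "52.11", "ref_nace": "52.11"},
    {"year": 2005, "size": 1, "nace": "52.11", "ref_nace": "52.12"},
    {"year": 2005, "size": 60, "nace": "15.81", "ref_nace": "15.81"},
]
small_units = QualityIndicator(2005, "BR output", lambda u: u["size"] < 10, "nace", agrees_with_reference)
print(small_units.evaluate(units))  # 50.0
```

Keeping the criterion as a plain function makes it easy to swap in internal-consistency, temporal-consistency or metadata checks without changing the other four factors.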

Page 9

Criteria (1)


1. Compliance: the value of a BR unit can be considered correct if it is sufficiently “close” to the reference value taken from external sources. Compliance thus measures whether or not the BR agrees with an external source; when the true value is not known, compliance approximates reliability.

2. Internal consistency: a value is deemed “correct” if it is coherent with the other variables of the same unit.
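A minimal sketch of the two criteria as unit-level checks. The 10% relative tolerance that defines “close” and the example consistency rule (employees cannot exceed persons employed) are illustrative assumptions, as are the field names; the presentation does not specify either.

```python
def is_compliant(br_value: float, reference_value: float, tolerance: float = 0.10) -> bool:
    """Compliance: the BR value is 'close enough' to the external reference value.

    The relative tolerance (10% by default) is an illustrative assumption.
    """
    if reference_value == 0:
        return br_value == 0
    return abs(br_value - reference_value) / abs(reference_value) <= tolerance


def is_internally_consistent(unit: dict) -> bool:
    """Internal consistency: a hypothetical rule linking two variables of the same unit.

    Employees must be non-negative and cannot exceed total persons employed.
    """
    return 0 <= unit["employees"] <= unit["persons_employed"]


print(is_compliant(105, 100))  # True  (5% difference)
print(is_compliant(130, 100))  # False (30% difference)
print(is_internally_consistent({"employees": 12, "persons_employed": 10}))  # False
```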

Page 10

Criteria (2)

3. Temporal consistency: quality is assessed by comparing the values of a variable in two different periods. Large changes over short time lags are deemed impossible or less plausible.

4. Quality without a ‘witness’ (use of metadata): information already held in the BR is used to measure quality without a reference value or any other element of comparison. This information consists of BR management variables or metadata: validity date, estimation methodology, origin of the data, data validation process.
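A minimal sketch of a temporal-consistency check on one unit’s values in two consecutive periods. The thresholds are assumptions for illustration: a change is flagged when it exceeds 100% of the previous value, and a minimum base of 5 prevents small absolute changes in very small units from being flagged.

```python
def flag_implausible_change(prev: float, curr: float,
                            max_rel_change: float = 1.0, min_base: float = 5.0) -> bool:
    """Temporal consistency: flag large changes between two periods.

    max_rel_change and min_base are illustrative thresholds, not values from the BR.
    """
    base = max(abs(prev), min_base)
    return abs(curr - prev) / base > max_rel_change


# Hypothetical employment values for the same unit in years t-1 and t
print(flag_implausible_change(2, 4))      # False: +2 on a base of 5 is 40%
print(flag_implausible_change(10, 250))   # True: +240 on a base of 10
print(flag_implausible_change(200, 230))  # False: +15%
```

Flagged units are candidates for review rather than automatic errors, since a large change may also reflect a real event such as a merger.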

Page 11

Phase: Input / Component: timeliness / Factor: temporal consistency

[Chart: monthly series (January to December) for Supply_2004 and Supply_2005, with data labels 71% and 57%]

Source: Social Security. Indicator: percentage of records with declared employees, by month
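A minimal sketch of how a monthly percentage of this kind could be computed, assuming (hypothetically) that each supply record lists the months for which employees were declared. The function name and data layout are illustrative and are not taken from the Social Security source.

```python
from collections import Counter


def monthly_declaration_share(declared_months: list[set[int]]) -> dict[int, float]:
    """Percentage of records with declared employees in each month (1 = January ... 12 = December).

    declared_months holds, for each supply record, the set of months in which
    employees were declared (a hypothetical data layout).
    """
    if not declared_months:
        raise ValueError("no records in the supply")
    counts = Counter(month for months in declared_months for month in months)
    return {month: 100 * counts[month] / len(declared_months) for month in range(1, 13)}


# Four hypothetical records: full year, first half, first quarter, no declarations
records = [set(range(1, 13)), set(range(1, 7)), {1, 2, 3}, set()]
shares = monthly_declaration_share(records)
print(shares[1], shares[6], shares[12])  # 75.0 50.0 25.0
```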

Page 12

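Cessation date | Supply 2001 | Supply 2002 | Supply 2003 | Supply 2004 | Supply 2005 | Supply 2006 | I(t) %
BR reference year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
2000 | 332.878 19 374.341 100 19 20
2001 | 194.634 | 350.462 | 384.199 | 178 | 31 | 19 | -9,6
2002 | 14 | 30.055 | 408.291 | 419.144 | 36 | 19 | -2,7
2003 | - | - | 129.661 | 358.822 | 369.815 | 35 | -3,1
2004 | - | - | 4 | 130.247 | 357.907 | 380.778 | -6,4
2005 | - | - | - | 28 | 79.721 | 384.272 |

$I(t) = \left[1 - N(t+1)/N(t)\right] \times 100$, where $N(t)$ is the number of records with cessation date $t$ in the supply for BR reference year $t$, and $N(t+1)$ is the number of records with the same cessation date in the following supply.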

Phase: Input / Component: coverage / Factor: temporal consistency

Source: Chamber of Commerce. Indicator: loss of information in dates of cessation
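The I(t) column can be reproduced from the counts in the table above if N(t) is read as the records with cessation date t in the supply for BR reference year t, and N(t+1) as the same records in the next supply. The sketch below is a minimal check under that reading; the function name is illustrative.

```python
def cessation_loss(n_t: int, n_t_plus_1: int) -> float:
    """I(t) = [1 - N(t+1)/N(t)] * 100.

    A negative value means the later supply contains more records with cessation
    date t than the earlier one.
    """
    return 100 * (1 - n_t_plus_1 / n_t)


# (N(t), N(t+1)) for cessation years 2001-2004, read from the table
pairs = {2001: (350_462, 384_199), 2002: (408_291, 419_144),
         2003: (358_822, 369_815), 2004: (357_907, 380_778)}
for year, (n_t, n_next) in pairs.items():
    print(year, round(cessation_loss(n_t, n_next), 1))
# prints: 2001 -9.6 / 2002 -2.7 / 2003 -3.1 / 2004 -6.4 (one per line)
```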

Page 13

Phase: Process / Component: accuracy / Factor: metadata

Indicator: edit and imputation of variables

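Variable | Indicator | I(t=2005) | I(t=2005) %
NACE | No. of edits | 202.333 | 1,85 %
NACE | No. of imputations | 87.628 | 43,31 %
NACE | No. of edits without imputation | 114.705 | 56,69 %
Empl. | No. of edits | 74.312 | 0,68 %
Empl. | No. of imputations | 72.768 | 97,92 %
Empl. | No. of edits without imputation | 1.544 | 2,08 %

The percentages for imputations and for edits without imputation are shares of the edited records (e.g. 87.628 / 202.333 = 43,31 %).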
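A minimal sketch that reproduces the two share columns from the absolute counts, taking the number of edits as the denominator (the function name is illustrative). The “No. of edits” percentage uses a denominator that the slide does not state, so it is not recomputed here.

```python
def imputation_shares(n_edit: int, n_imputed: int) -> tuple[float, float]:
    """Shares (in %) of edited records that were imputed / not imputed."""
    imputed_pct = 100 * n_imputed / n_edit
    return round(imputed_pct, 2), round(100 - imputed_pct, 2)


print(imputation_shares(202_333, 87_628))  # NACE: (43.31, 56.69)
print(imputation_shares(74_312, 72_768))   # Empl.: (97.92, 2.08)
```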

Page 14

Phase: Process / Component: accuracy / Factor: internal consistency

Source: Tax Authority. Indicator: out-of-date classification

Indicator | I(t=2005) | I(t=2005) % | Var_I[t-(t-1)]
No. of records with an out-of-date classification that are not decoded using NACE Rev. 1.1 | 725.697 | 9,53 % | 0,84 %

Page 15

[Chart: differences in address and in activity status, by year, 1999 to 2005; vertical axis from 0 to 10]


Phase: Output / Component: accuracy / Factor: compliance

Source: SME sample survey. Indicator: differences in address and activity status

Page 16

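Time series (t-2)(t-1)(t) | Population | N | Error e
001 | Entries | 442,352 | 0
000 | Out never active | 2,275,196 | 0
111 | Active | 3,597,559 | 0
110 | Exits | 313,413 | 0
100 | Exits in t-1 and not active | 225,868 | 0
011 | Entries in t-1 and active | 365,097 | 0
010 | Dis-activations | 54,848 | 1.5
101 | Reactivations | 52,567 | 1.5

In the time series, 1 = active and 0 = not active in the given year.

$$I_j = 100 - \frac{\sum_k x_{kj}\, e_k}{\sum_k x_{kj}} \times 100$$

where $x_{kj}$ is the number of units with time series $k$ in year $j$ and $e_k$ is the error weight of time series $k$.

$I_{2005} = 97.8$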
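A minimal sketch of the coherence index, applying the formula with sums over all eight time series to the 2005 counts and error weights in the table above. This reading of the formula reproduces the published value of 97.8; the dictionary layout and function name are illustrative.

```python
# Activity-status time series over (t-2, t-1, t), 1 = active, 0 = not active:
# series -> (population, number of units x_k, error weight e_k), 2005 values from the table
PATTERNS_2005 = {
    "001": ("Entries", 442_352, 0.0),
    "000": ("Out never active", 2_275_196, 0.0),
    "111": ("Active", 3_597_559, 0.0),
    "110": ("Exits", 313_413, 0.0),
    "100": ("Exits in t-1 and not active", 225_868, 0.0),
    "011": ("Entries in t-1 and active", 365_097, 0.0),
    "010": ("Dis-activations", 54_848, 1.5),
    "101": ("Reactivations", 52_567, 1.5),
}


def coherence_index(patterns: dict[str, tuple[str, int, float]]) -> float:
    """I_j = 100 - (sum_k x_kj * e_k / sum_k x_kj) * 100."""
    weighted_errors = sum(n * e for _, n, e in patterns.values())
    total_units = sum(n for _, n, _ in patterns.values())
    return 100 - weighted_errors / total_units * 100


print(round(coherence_index(PATTERNS_2005), 1))  # 97.8
```

Only the two implausible series (010 and 101) carry a non-zero weight, so the index falls as the share of dis-activations and reactivations rises.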


Phase: Output / Component: accuracy / Factor: temporal consistency

Indicator: coherence in activity status

Page 17

The BR’s Quality Declaration (QD)

The QD is a complex system of quality indicators.

The QD is based on the concept of transparency: it supplies all the meaningful and useful tools for measuring the different quality components at each stage of the process. It consists of rich documentation made up of a set of important direct and indirect indicators, with a time dimension for data, sources and variables.

The QD contains:
- metadata
- a set of easily interpretable indicators

Page 18

The BR’s Quality Declaration (QD)

Phase of the process | Components | Factors
Input | timeliness, coverage, completeness | temporal consistency, internal consistency
Process | coverage, accuracy | temporal consistency, internal consistency, metadata
Output | timeliness, coverage, completeness, accuracy | compliance, internal consistency, metadata

Page 19

The BR’s Quality Declaration (QD)

37 indicators have been identified. Counts of indicators by phase and criterion, as listed on the slide in a grid of criteria (compliance, internal consistency, temporal consistency, metadata) by components (timeliness, coverage, completeness, accuracy):

INPUT: internal consistency 7; temporal consistency 2, 3, 1 (13 indicators)

PROCESS: internal consistency 2, 2; metadata 1, 3 (8 indicators)

OUTPUT: compliance 6; internal consistency 2, 4, 1; metadata 2, 1 (16 indicators)

Page 20

The BR’s Quality Declaration (QD)

1. Quality of input – Component 1.1: Completeness – 1.1.1) Address, source (s) = CCIAA (Chamber of Commerce): number of records (% weight) with missing information

Indicator | Computation | I(t=2005) | VI = I(2005) - I(2004)
Records with missing address (CCIAA) | % weight (absolute number of records) | 0.49 (37,408) | -0.03

2. Quality of process – Component 2.1: Coverage – 1) Number of records, s = CCIAA, not matched with the MEF (Ministry of Economy and Finance) base

Indicator | Computation | I(t=2005) | VI = I(2005) - I(2004)
Not matched records (CCIAA) | % weight (absolute number of records) | 5.25 (338,304) | 0.03

3. Quality of output – Component 3.2: Timeliness – Lag, in days, between the dissemination of the BR and the reference year of the data

Indicator | Computation | I(t=2005) | VI = I(2005) - I(2004)
Timeliness of BR dissemination | Days of delay between the dissemination time and the reference year of the data | 492 | +24

Page 21

The BR’s Quality Declaration (QD)

The QD was disseminated to internal users for the first time in 2007.

Unsolved problems:

1. Dissemination of a different version for external users, containing only metadata and indicators on output quality.

2. The need for a synthetic view of the proposed indicators using “compound indicators”.

3. Internal users were involved in the discussion of the QD, but a deeper analysis of their suggestions has not yet been carried out.
