Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

Cristina Casciano, Viviana De Giorgi, Filippo Oropallo Istat Division for Structural Business Statistics, Agriculture, Foreign Trade and Consumer Prices First meeting ESSnet on Data Integration Rome 28 – 29 January 2010 The use of administrative and accounts data for business statistics (ESSnet AdminData)


Page 1: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

Istat Division for Structural Business Statistics, Agriculture, Foreign Trade and Consumer Prices

First meeting ESSnet on Data Integration

Rome 28 – 29 January 2010

The use of administrative and accounts data for business statistics (ESSnet AdminData)

Page 2: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

Background

ISTAT is about to start a major research project to support the transition of Italian SBS statistics from a data collection system based extensively on direct reporting (involving around 120,000 companies) to a new survey system based largely on administrative data sources.

A more intensive use of administrative data for compiling SBS statistics requires a careful evaluation of the most relevant administrative sources for the different business populations (large companies, small and medium-sized companies, micro-businesses), and a careful assessment of the impact of potential sources of bias in those data.

Administrative data sources are currently used in compiling SBS statistics to produce the preliminary estimates required by the SBS Regulation, and as a complement to direct reporting in producing the definitive SBS data. They are also used to build the Italian Business Register (Asia), including business demography, and the Oros database (STS on employment, wages and salaries, and social contributions).

First meeting ESSnet on Data Integration

Rome, January 29, 2010

Page 3: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

ESSnet AdminData

Page 4: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

WP5 - Plan of activities for the period 2010-2013

Assessing the relevance of the data-matching and definition-inconsistency problems that arise when combining multiple administrative data sources (balance sheet data, VAT data, Fiscal Authority surveys) with survey data

Designing and testing a methodological approach (and drafting related recommendations) to deal with the inconsistencies in variables and the data-matching problems that arise when fiscal data are used to estimate SBS data for the smaller size classes (small and micro-businesses)

Designing and testing a methodological approach (and drafting related recommendations) to statistically correct the inconsistencies in variables and the data-matching problems that arise when balance sheet data are used to estimate SBS data for the larger size classes (medium and large businesses)

Designing and testing a methodological approach (and drafting related recommendations) for using complementary fiscal and administrative data sources to estimate the specific variables and segments of the SBS target population not covered by the above-mentioned data sources

Drafting the final reports and comparing results with other EU countries


Page 5: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

Italian Case – Proposed Actions (First Year)

- Integration issues:
  - Matching issues: detecting different subsets of the population (micro-small/medium-large; unincorporated/incorporated)
  - Metadata issues: comparison between SBS definitions and fiscal data

- Review of the administrative sources useful for producing Structural Business Statistics (SBS; EU Regulations 58/97, 410/98, 2700/98, 2056/02, 1670/03, 295/2008)

- First step in the reconstruction of the main economic variables for small firms using fiscal sources


Page 6: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

Integration issues (1)

Advantages:
- Reduced statistical burden
- Reduced bias in estimates (due to TMR)
- Reduced costs
- More timely estimates

Drawbacks:
- Confidentiality problems related to access to administrative data
- Administrative data are customarily collected for other purposes
- No control over the data production process at the origin (to check missing values, outliers, etc.); cooperation with the agencies that provide the data should be considered
- The data may refer to legal units rather than statistical units


Page 7: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

Integration issues (2)

Matching different data sources (statistical/administrative) means tackling a host of issues, e.g.:

- Identifying business units: finding an identifying variable that works as a unique key and as a natural join between the different sources. In almost all firm databases we choose the fiscal code (available from Asia)

- Dealing with matching problems: whenever a key variable is unavailable or insufficient to identify the statistical unit, in case of mismatches, or when the sources do not contain the same units

- Identifying changes in business units:
  - changes involving a single unit (changes in the kind-of-business classification, in legal form or in location)
  - changes in the number of units (deaths, births, break-ups and split-offs, mergers and acquisitions)

- Addressing sampling problems: when merging survey data with exhaustive data for a subset of the population

- Reconciling definitions and values among sources: whenever a variable does not have the same definition or value across different sources

- Handling data editing and data reconstruction issues: measurement errors, missing data, outliers, etc.
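As a minimal sketch of the key-based matching step, assuming each source can be reduced to a mapping from fiscal code to record, the example below reports which administrative sources contain each survey unit and which units fail to match. The fiscal codes, source names and values are invented for illustration and are not taken from the Istat archives.

```python
# Hypothetical records keyed by fiscal code; all codes and values are invented.
survey = {
    "FC001": {"turnover": 120.0},
    "FC002": {"turnover": None},   # unit with a missing survey value
    "FC003": {"turnover": 75.0},
}
balance_sheets = {"FC002": {"turnover": 310.0}, "FC004": {"turnover": 55.0}}
tax_returns = {"FC003": {"turnover": 80.0}, "FC005": {"turnover": 20.0}}


def match_by_key(reference, *sources):
    """For each key in `reference`, list the names of the sources that contain it."""
    return {
        key: [name for name, data in sources if key in data]
        for key in reference
    }


matches = match_by_key(
    survey,
    ("balance_sheets", balance_sheets),
    ("tax_returns", tax_returns),
)

# Survey units found in no administrative source (candidate mismatches).
unmatched = sorted(key for key, found in matches.items() if not found)

# Administrative units that are not in the survey sample.
admin_only = sorted((set(balance_sheets) | set(tax_returns)) - set(survey))

print(matches)
print("survey units with no administrative match:", unmatched)
print("administrative units not in the survey:", admin_only)
```

In practice the unmatched units would feed the "matching problems" step, where a secondary key or a record-linkage method is needed.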


Page 8: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

Review of sources

1) Fiscal Agency

• Fiscal Survey
  - Purpose: to enhance fiscal compliance
  - Does not cover all firms

• Tax return data
  - Unico (personal tax), 770 (withholding tax on employees and temporary workers)
  - More information for micro-firms with simplified bookkeeping, less information for other firms

• VAT data
  - Changes in legal units and turnover data

2) Chambers of Commerce

• Balance sheet data
  - All corporate firms
  - Better coherence with SBS variables

3) Social Security Institute

• Data from the enterprises' monthly declarations on employees
  - All firms with at least one employee in any month of the year
  - Number of employees, typology, wages and salaries, social contributions


Page 9: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

First step in the reconstruction of the main economic variables for SME by using Administrative data (1)

Purpose of administrative sources
- To support the Tax Administration's control action on small and medium firms

Population coverage
- Sole proprietorships, partnerships and corporate firms
- Turnover greater than €30,000 and less than €7.5 million
- Roughly 4 million records

Variables
- More variables comparable with balance sheets (turnover, value of production, intermediate costs, value added, personnel costs, gross and net operating surplus)
- Different definitions of accounting variables (e.g. freelancers)


Page 10: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

First step in the reconstruction of the main economic variables for SME by using Administrative data (2)


Representative sample of small and medium firms (total 93k, respondents 44k)

Corporate firms (coverage of financial statements: 31k)

Coverage of the Fiscal Authority Survey (Fas: 63k)

Coverage of Corp + Fas − (Fas ∩ Corp): 76k

A + B uncovered (93k − 76k = 17k), although a partial reconstruction is possible through tax return data

Types of tax return data: CM (Minimum), RE (Freelancers), RG (Simplified), RF (Ordinary), RS (Companies)

[Figure: Coverage analysis by legal type and size class. Legal type (sole proprietorships, partnerships, corporate firms) is crossed with size (small, medium). Regions shown: Fiscal Authority Survey; financial statements; tax return data (RS, RF forms); tax return data (RE and RG forms); minimum taxpayers; areas A and B.]
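The coverage figures on this slide follow from inclusion-exclusion. The short sketch below redoes that arithmetic with the rounded values quoted on the slide (in thousands of units); the variable names are only illustrative, and the overlap of 18k is derived here rather than stated on the slide.

```python
# Rounded figures from the slide, in thousands of units.
sample = 93   # representative sample of small and medium firms
corp = 31     # corporate firms covered by financial statements
fas = 63      # units covered by the Fiscal Authority Survey
union = 76    # Corp + Fas - (Fas ∩ Corp), as quoted on the slide

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
overlap = corp + fas - union

# Units covered by neither source (areas A + B in the figure).
uncovered = sample - union

print(f"Fas ∩ Corp: {overlap}k, uncovered: {uncovered}k")
```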

Page 11: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

First step in the reconstruction of the main economic variables for SME by using Administrative data (3)


List of harmonized variables from the various sources, defined according to the SBS regulation and international accounting standards

Description                                          SME Survey      Fiscal (a)      Financial Statement  Tax return (b)
Income from sales and services                       fatt_tot_pmi    fatt_tot_sdsx   fatt_tot_bil         fatt_tot_uyyzz
Changes in stock of finished and semi-fin. products  var_rpfpcl_pmi  -               var_rpfpcl_bil       -
Changes in stock                                     var_riman_pmi   var_riman_sdsx  var_riman_bil        var_riman_uyyzz
Changes in contract work in progress                 var_lavco_pmi   var_lavco_sdsx  var_lavco_bil        var_lavco_uyyzz
Changes in internal work capitalized                 inc_immli_pmi   inc_immli_sdsx  inc_immli_bil        -
Other income and earnings                            ric_altri_pmi   ric_altri_sdsx  ric_altri_bil        ric_altri_uyyzz
Purchases                                            acq_beni_pmi    acq_beni_sdsx   acq_beni_bil         acq_beni_uyyzz
Purchases of goods and services                      acq_bese_pmi    acq_bese_sdsx   acq_bese_bil         acq_bese_uyyzz
Goods for resale                                     CRS353 (source column unclear)
Services (Total)                                     acq_ser_pmi     acq_ser_sdsx    acq_ser_bil          acq_ser_uyyzz
Use of third party assets                            acq_gdbt_pmi    acq_gdbt_sdsx   acq_gdbt_bil         acq_gdbt_uyyzz
Value adjustments                                    acq_amm_pmi     acq_amm_sdsx    acq_amm_bil          acq_amm_uyyzz
Changes in stocks of raw mat. and for resale         var_rmpriv_pmi  -               var_rmpriv_bil       -
Changes in stock for resale                          -               -               -                    var_rriv_uscrs
Fund allocations                                     acq_acc_pmi     acq_acc_sdsx    acq_acc_bil          acq_acc_uyyzz
Other operating charges                              acq_oneri_pmi   acq_oneri_sdsx  acq_oneri_bil        acq_oneri_uyyzz
Value added                                          vagg_pmi        vagg_sdsx       vagg_bil             vagg_uyyzz
Wages and salaries                                   ret_pmi         -               ret_bil              -
Social security contributions                        onerisoc_pmi    -               onerisoc_bil         -
Share of leaving indemnity                           tfr_pmi         -               tfr_bil              -
Personnel costs                                      clav_pmi        clav_sdsx       clav_bil             clav2_uyyzz
Other personnel costs                                altrclav_pmi    -               altrclav_bil         -
Gross operating surplus                              margope_pmi     margope_sdsx    margope_bil          margope_uyyzz

(a) Sector study form: x = enterprise, freelance
(b) Tax return form: yy = sole proprietorship, partnership, limited company; zz = freelance, simplified, company

Page 12: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

First step in the reconstruction of the main economic variables for SME by using a Fiscal archive (4)


Coverage of the initial sample of the SME survey by type of response and administrative data

Initial theoretical sample

Source                            Non respondents  Respondents  Total
Financial Statements              10,370           19,739       30,109
Fiscal Authority Survey (F)       24,655           17,798       42,453
Fiscal Authority Survey (G)       1,343            1,223        2,566
Tax Return data - PF-RG           2,312            990          3,302
Tax Return data - PF-RE           747              483          1,230
Tax Return data - SP-RG           810              378          1,188
Tax Return data - SC-RS           4,546            1,839        6,385
From survey only                  -                1,251        1,251
Total                             44,783           43,701       88,484

Out of coverage and list errors                                 10,218
No sources                                                      4,337
Total sample units                                              103,039
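The totals in this coverage table can be turned into missing-response rates with a few lines of arithmetic. The sketch below assumes that the in-scope sample is the total sample minus the out-of-coverage units and list errors; under that assumption the two rates come out at 52.9% and 4.7%, which agree with the "Total" rows of the weight tables later in the deck. Variable names are illustrative.

```python
# Counts taken from the coverage table.
total_sample = 103_039
out_of_coverage = 10_218   # out of coverage and list errors
no_source = 4_337          # units with neither survey nor administrative data
respondents = 43_701       # survey respondents
covered = 88_484           # units with survey and/or administrative data

# Assumption: in-scope units are the sample net of out-of-coverage units.
in_scope = total_sample - out_of_coverage

# Missing-response rate when only survey respondents are usable.
tmr_survey = 100 * (in_scope - respondents) / in_scope

# Missing-response rate after integrating the administrative sources.
tmr_integrated = 100 * no_source / in_scope

print(f"in-scope units: {in_scope}")
print(f"missing response, survey only: {tmr_survey:.1f}%")
print(f"missing response, integrated:  {tmr_integrated:.1f}%")
```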

Page 13: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

First step in the reconstruction of the main economic variables for SME by using a Fiscal archive (5)


Integration scheme

[Figure: Information content of the SME survey and administrative sources (SBS variables), for the Small-Medium enterprises Survey. The recoverable labels and counts are listed below; the values 2,503 and 1.05 also appear in the scheme without a recoverable label.]

Respondents (47%), survey variables available:
- Financial Statements (19,739)
- Fiscal Authority Survey (19,021)
- Tax Return data (3,690)
- Survey only (1,251)

Non-respondents (53%), partial missing response to impute:
- Financial Statements (10,370)
- Fiscal Authority Survey (25,998)
- Tax Return data (8,415)
- Total missing response, not covered (4,337) (5%)

Current imputation (6,371)

Out of coverage (>100 workers or not active) (10,218)

Estimation of the source substitution effect, on S1 (43,701 units):

$$SE = \sum_{S_1} (Y' - Y)\, w$$

Estimation of the difference in the final estimates:

$$DF = \sum Y' w' - \sum Y w$$

Evaluation of the non-response bias effect, on S2 (88,484 units):

$$BE = \sum_{S_2} Y' (w' - w)$$

Page 14: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

First step in the reconstruction of the main economic variables for SME by using a Fiscal archive (6)


Calibration and bias estimation

Final estimates on the subset of respondents (S1):

$$\tilde{Y} = \sum_{k \in S_1} y_k w_k$$

Final estimates on the integrated sample (S2):

$$\tilde{Y}^* = \sum_{k \in S_2} y_k^* w_k^*$$

The difference in the final estimation is equal to

$$DIFF = \tilde{Y}^* - \tilde{Y} = \sum_{k \in S_2} y_k^* w_k^* - \sum_{k \in S_1} y_k w_k$$

If we add and subtract $\sum_{k \in S_1} y_k^* w_k = \sum_{k \in S_2} y_k^* w_k$, where $w_k$ is zero for all units of S2 not included in S1, we obtain:

$$DIFF = \sum_{k \in S_1} y_k^* w_k - \sum_{k \in S_1} y_k w_k + \sum_{k \in S_2} y_k^* w_k^* - \sum_{k \in S_2} y_k^* w_k$$

In this way we can distinguish, in the final estimated difference, two possible biases due to:

- The source substitution effect for S1 = $\sum_{k \in S_1} (y_k^* - y_k)\, w_k$

- The difference originating from the calibration procedure for S2 = $\sum_{k \in S_2} y_k^* (w_k^* - w_k)$
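The decomposition on this slide splits the difference between the two final estimates into a source substitution term (summed over S1 with the old weights) and a calibration term (summed over S2, with the old weight set to zero for units outside S1). The sketch below checks that identity numerically on toy data; the four units, their values and their weights are invented for illustration and have no connection to the survey data.

```python
# Toy units, invented for illustration. For each unit:
#   y      survey value (None if the unit did not respond, i.e. not in S1)
#   y_star administrative value
#   w      old weight (zero for units of S2 that are not in S1)
#   w_star new weight after calibration on the integrated sample
units = {
    "a": {"y": 100.0, "y_star": 104.0, "w": 3.0, "w_star": 1.5},
    "b": {"y": 50.0, "y_star": 48.0, "w": 4.0, "w_star": 2.0},
    "c": {"y": None, "y_star": 80.0, "w": 0.0, "w_star": 2.5},
    "d": {"y": None, "y_star": 20.0, "w": 0.0, "w_star": 1.0},
}

s2 = list(units.values())                      # integrated sample
s1 = [u for u in s2 if u["y"] is not None]     # survey respondents

# Final estimates on S1 (survey values, old weights) and S2 (admin values, new weights).
y_hat = sum(u["y"] * u["w"] for u in s1)
y_hat_star = sum(u["y_star"] * u["w_star"] for u in s2)
diff = y_hat_star - y_hat

# Source substitution effect: change of value, old weights, respondents only.
substitution = sum((u["y_star"] - u["y"]) * u["w"] for u in s1)

# Calibration effect: change of weight, administrative values, whole integrated sample.
calibration = sum(u["y_star"] * (u["w_star"] - u["w"]) for u in s2)

print(f"DIFF = {diff}, substitution = {substitution}, calibration = {calibration}")
```

With these toy numbers the two components sum exactly to the overall difference, which is the property the add-and-subtract step relies on.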

Page 15: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

First step in the reconstruction of the main economic variables for SME by using a Fiscal archive (7)


Analysis variable: w

Sector              n      m      TMR (%)  w average  Std dev  Min   Max
Mining              927    425    54.2     7.88       12.69    0.30  136.92
Manufacturing       35372  16845  52.4     30.04      110.57   0.01  3515.18
Energy              1013   540    46.7     5.31       9.55     0.01  95.44
Construction        4447   2066   53.5     297.86     726.75   0.01  8715.29
Trade               16995  8400   50.6     147.78     490.70   0.01  12673.87
Hotel, Restaurant   2586   1066   58.8     256.41     794.33   0.01  9711.67
Transport           6107   2530   58.6     60.07      290.17   0.01  7202.51
Financial services  1328   598    55.0     112.49     265.59   0.11  2336.82
Business services   14967  7202   51.9     152.94     685.02   0.01  16046.91
Social services     9079   4029   55.6     122.83     395.96   0.01  12952.08

Size class          n      m      TMR (%)  w average  Std dev  Min   Max
1-9                 61480  24570  60.0     172.0      576.6    0.01  16046.91
10-19               14541  6237   57.1     25.5       91.0     0.01  2339.39
20-49               11720  8829   24.7     6.7        17.9     0.01  902.19
50-99               5080   4065   20.0     3.5        5.7      0.01  181.37

Total               92821  43701  52.9     102.0      441.0    0.01  16046.91

Page 16: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

First step in the reconstruction of the main economic variables for SME by using a Fiscal archive (8)


Analysis variable: w*

Sector              n      m*     TMR* (%)  w* average  Std dev  Min   Max
Mining              927    895    3.5       3.75        4.33     0.46  61.12
Manufacturing       35372  33724  4.7       15.00       47.00    0.01  1806.62
Energy              1013   918    9.4       3.14        4.98     0.01  79.37
Construction        4447   4243   4.6       145.03      321.42   0.01  4349.11
Trade               16995  16438  3.3       75.52       208.93   0.01  4148.60
Hotel, Restaurant   2586   2477   4.2       110.35      306.91   0.01  5056.08
Transport           6107   5654   7.4       26.89       113.64   0.01  2441.72
Financial services  1328   1235   7.0       54.47       111.93   0.48  1594.54
Business services   14967  14352  4.1       76.75       336.71   0.01  8822.26
Social services     9079   8548   5.8       57.89       193.18   0.01  5603.26

Size class          n      m*     TMR* (%)  w* average  Std dev  Min   Max
1-9                 61480  58300  5.2       72.5        245.2    0.01  8822.26
10-19               14541  13959  4.0       11.3        41.6     0.01  2234.70
20-49               11720  11320  3.4       5.1         10.8     0.01  292.57
50-99               5080   4905   3.4       2.9         4.6      0.01  192.40

Total               92821  88484  4.7       50.4        202.1    0.01  8822.26
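A quick consistency check on the two "Total" rows: the number of usable units times the average weight gives the sum of the weights, so if both weighting systems are calibrated to the same population the two products should be close. The sketch below uses the rounded averages from the tables (102.0 for w, 50.4 for w*); reading the sum of weights as an estimate of the population size is an assumption made here, not a statement from the slides.

```python
# "Total" rows of the two weight tables: (usable units, average weight).
rows = {
    "respondents only (w)": (43_701, 102.0),
    "integrated sample (w*)": (88_484, 50.4),
}

# Sum of weights = number of units x average weight (averages are rounded).
weight_sums = {label: units * avg for label, (units, avg) in rows.items()}

ratio = weight_sums["integrated sample (w*)"] / weight_sums["respondents only (w)"]

for label, total in weight_sums.items():
    print(f"{label}: sum of weights ≈ {total:,.0f}")
print(f"ratio of the two sums: {ratio:.4f}")
```

Both products are close to 4.46 million, so roughly doubling the number of usable units halves the average weight while leaving the weighted total almost unchanged.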

Page 17: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

First step in the reconstruction of the main economic variables for SME by using a Fiscal archive (9)


[Figure: CDF of weights w and w*. One axis runs from 0 to 100,000 and the other from 0 to 18,000; series: w and w*.]

Page 18: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

First step in the reconstruction of the main economic variables for SME by using a Fiscal archive (10)


[Figure: Distribution of old and new weights. Series W and W* plotted at the percentiles 1%, 5%, 10%, 25% (Q1), 50% (median), 75% (Q3), 90%, 95% and 99%, on a scale from 0 to 1,800; a second panel shows the percentiles from 1% to 50% on a scale from 0 to 10.]

Page 19: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

First step in the reconstruction of the main economic variables for SME by using a Fiscal archive (11)


Bias and evaluation of errors on Y = Turnover

After a simulation of 1,000 re-samplings:

Variable  Measure  N strata  Average error
Y         arb      187       9%
Y         rmse     187       11%
Y*        arb      189       8%
Y*        rmse     189       10%

Source substitution bias   -1.1
Non response bias           1.1
Gross bias (Y*-Y)/Y %       0.1
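The slide reports arb and rmse as percentages averaged over strata but does not give their formulas. The sketch below uses common definitions as an assumption: absolute relative bias as |mean(estimates) − true| / true, and relative root mean square error as sqrt(mean((estimate − true)²)) / true, both computed over the re-sampling replicates of one stratum. The simulated replicates are drawn from an arbitrary normal distribution and are not the Istat results.

```python
import random
import statistics


def arb(estimates, true_value):
    """Absolute relative bias of the replicate estimates (assumed definition)."""
    return abs(statistics.fmean(estimates) - true_value) / true_value


def rrmse(estimates, true_value):
    """Relative root mean square error of the replicate estimates (assumed definition)."""
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return mse ** 0.5 / true_value


# Invented example: 1000 replicate estimates of a stratum total whose true value is 1000.
random.seed(0)
true_total = 1000.0
replicates = [random.gauss(1020.0, 100.0) for _ in range(1000)]

print(f"arb:  {100 * arb(replicates, true_total):.1f}%")
print(f"rmse: {100 * rrmse(replicates, true_total):.1f}%")
```

Because the mean square error equals squared bias plus variance, the relative rmse of a stratum can never be smaller than its arb, which is in line with the ordering of the averages on the slide.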

Page 20: Cristina Casciano, Viviana De Giorgi, Filippo Oropallo

Looking ahead

Heavily reduced missing response rates and reduced impact of the calibration procedure

Analysis of the two types of estimation bias for other variables:
- Source substitution effect
- Total non-response effect

Evaluation of the Absolute Relative Bias and the Root Mean Square Error of the two estimates Y and Y*

Development of a data imputation pattern to cover:
- remaining variables not contained in administrative data (as PMR)
- mismatches (as TMR)

Restructuring SBS by integrating administrative sources into the statistical production process
