Monitoring year-to-year variation in structural business statistics
Contribution to Q2008 – Rome, 9 July 2008
Session: Editing and Imputation
[email protected], ESTAT – G1
Structural Business Statistics (SBS)
Yearly statistics covering the ‘Business Economy’ (NACE sections C-K).
Geographical coverage: EEA + candidate countries
Many characteristics: financial, employment…
Multiple breakdowns: activity, size class, region
Produced by National Statistical Institutes, using uniform definitions (but data collection methodologies may vary)
Role of ESTAT: collecting, validating data flows, confidentiality treatment, publishing data series.
Actors
Data validation: a ‘macro editing’ tool
Main variation causes of aggregates in data flows:
– performance of the individual enterprises
– change in the composition of the set of enterprises
– raw data error (misreporting)
– data processing error (editing flaw).
Essential characteristics of a macro-editing tool:
– not to overload correspondents with false alerts
– suitable threshold to single out influential anomalies
Previous practice
Symmetric [-20%, +20%] confidence interval
Applied to all (but a few) characteristics
Possibly generating hundreds (if not thousands) of “anomalous” variations
Skilful application required by ESTAT database manager
Small aggregates vary more -> Unreasonable burden for NSIs of small countries
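The fixed-band rule described above can be sketched in a few lines. This is a minimal illustration, not the actual ESTAT routine: the function name `flag_fixed_band` and the two aggregates are hypothetical, chosen only to show why small aggregates trip the ±20% test more easily.

```python
def flag_fixed_band(previous, current, band=0.20):
    """Return True when the relative change falls outside [-band, +band]."""
    if previous == 0:
        raise ValueError("relative change is undefined for a zero base value")
    change = (current - previous) / previous
    return abs(change) > band

# Hypothetical aggregates: (value in year t-1, value in year t).
aggregates = {
    "large sector": (10_000.0, 10_900.0),  # +9%
    "small sector": (12.0, 15.0),          # +25%, caused by only three extra units
}

flags = {name: flag_fixed_band(p, c) for name, (p, c) in aggregates.items()}
print(flags)
```

The same band is applied regardless of how many enterprises stand behind the aggregate, which is the weakness the rest of the presentation addresses.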
Factors influencing evolution of SBS data
Macro-economic
– Economic growth
– Inflation PPI/CPI (SBS data are in current prices)
– Currency fluctuations
Micro-economic
– Prospering of enterprises
– Business demography in the sector
Administrative: business register related
– Registering enterprises / deregistering merged, closed down or suspended units
– Activity classification of enterprises
<- can be compensated for
Heuristics: Basic assumptions
Assumptions:
– Year-to-year variations (YTYV) of individual enterprises = set of random observations
– Enterprises are very unevenly distributed in size, and the YTYV of a large corporation is influential on the sector average.
– Economies of scale come to our rescue, since large YTYV are more typical for small companies.
– Variance of the average YTYV: $\sim 1/n$
– Standard deviation of the average YTYV: $\sim 1/\sqrt{n}$
Knowing economic growth $G$ and inflation $I$, the change of the aggregates could be estimated.
– So can we expect $V_t \in \left[ V_{t-1}\,(1+G_t)\,(1+I_t)\,\left(1 \pm 2\sigma/\sqrt{n_{t-1}}\right) \right]$ with 95% probability?
No, because of several sources of bias.
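To make the naive expectation concrete, the interval V(t-1)·(1+G)·(1+I)·(1 ± 2σ/√n) can be evaluated for illustrative inputs. This is only a sketch: the function name `naive_interval` and all numbers (including σ = 0.5, which the slides state is unknown in practice) are assumptions made for the example.

```python
import math

def naive_interval(v_prev, growth, inflation, sigma, n_prev):
    """Interval V_{t-1} (1+G)(1+I) (1 +/- 2*sigma/sqrt(n_{t-1}))."""
    centre = v_prev * (1 + growth) * (1 + inflation)
    half_width = 2 * sigma / math.sqrt(n_prev)
    return centre * (1 - half_width), centre * (1 + half_width)

# Hypothetical inputs: 2% real growth, 3% inflation, sigma = 0.5 (assumed).
for n in (100, 10_000):
    low, high = naive_interval(1000.0, 0.02, 0.03, 0.5, n)
    print(f"n = {n:6d}: [{low:.2f}, {high:.2f}]")
```

With these inputs the relative half-width shrinks from 10% at n = 100 to 1% at n = 10 000, which is the 1/√n behaviour assumed for the standard deviation of the average.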
Heuristics: sources of bias
Non-financial business economy: NACE C-K \ J: not a full coverage
Stratification by NACE: non-random sample -> heavily biased sector evolution; moreover:
– We use one unique ‘inflation’ number (CPI) instead of an array of sectoral PPI
GDP is a sum of values added. Other characteristics: possibly different evolution
Result of bias: expectation value => expectation interval
Heuristics: variability of characteristics
A few characteristics can be negative or close to zero:
– Change in stocks or work in progress (frequently)
– Gross operating surplus (rarely)
– Value added (almost never)
Consequences:
– Volatile characteristics -> large % YTYV
– Variance increase of the characteristics
Measures taken:
– Dropping volatile characteristics
– Widening confidence limits of expectation interval (lack of predictability ≈ extra bias source)
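A small numerical illustration of why near-zero characteristics produce large percentage variations; the helper `pct_change` and the figures are hypothetical and serve only to show the arithmetic.

```python
def pct_change(previous, current):
    """Percentage change relative to the absolute value of the base year."""
    return (current - previous) / abs(previous) * 100

# The same absolute movement (+1.5) on two hypothetical aggregates.
stocks_change = pct_change(0.5, 2.0)        # characteristic close to zero
turnover_change = pct_change(1000.0, 1001.5)  # characteristic far from zero

print(f"change in stocks: {stocks_change:+.1f}%")
print(f"turnover:         {turnover_change:+.2f}%")
```

Dividing by a base value near zero inflates the percentage, so any fixed percentage band flags such characteristics almost every year.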
Heuristics: Bringing it together
(Standard) confidence interval limited by a standard lower boundary (SLB) and a standard upper boundary (SUB)
Adapted boundaries, depending on the number of enterprises in year t-1 ($n_{t-1}$):
$\mathrm{SLB} / \left(1 + 2\sigma/\sqrt{n_{t-1}}\right)$ ; $\mathrm{SUB} \cdot \left(1 + 2\sigma/\sqrt{n_{t-1}}\right)$
$2\sigma$ implies 95% confidence limits, leaving 5% anomalies (too many) … but we have no idea about $\sigma$.
=> $2\sigma$ is considered a parameter: we fit the value 4 to obtain an 80% reduction of the number of ‘anomalies’ as compared to previous practice.
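The effect of the fitted parameter can be seen by evaluating the adapted boundaries for a few population sizes. This is a minimal sketch, assuming the widening factor is 1 + k/√n(t-1) with k = 4 as stated above; the function name `adapted_bounds` is hypothetical, and SLB = 0.82, SUB = 1.22 are the values the presentation lists for most characteristics.

```python
import math

def adapted_bounds(slb, sub, n_prev, k=4.0):
    """Widen the standard bounds by the factor 1 + k/sqrt(n_prev)."""
    if n_prev <= 0:
        raise ValueError("n_prev must be a positive number of enterprises")
    widen = 1 + k / math.sqrt(n_prev)
    return slb / widen, sub * widen

for n in (4, 100, 10_000):
    low, high = adapted_bounds(0.82, 1.22, n)
    print(f"n = {n:6d}: [{low:.3f}, {high:.3f}]")
```

For a very small aggregate the interval becomes wide enough to tolerate large swings, while for a large one it converges towards the standard [SLB, SUB] band.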
Heuristics: Method applied
Standard Confidence Interval:
– width depending on characteristics
– tuned using CPI and/or growth data (compare in national currency)
– Symmetrical on log-scale
Tuned interval for ‘Business demography’ characteristics
– $\mathrm{SLB} / \left(1 + 4/\sqrt{n_{t-1}}\right) < n_t / n_{t-1} < \mathrm{SUB} \cdot \left(1 + 4/\sqrt{n_{t-1}}\right)$
Tuned interval for Financial characteristics
– [SLB / (1+…) * (1+real growth) * (1+inflation rate) ; SUB * (1+…) * (1+real growth) * (1+inflation rate)]
Tuned interval for Employment characteristics
– [SLB / (1+…) * (1+real growth) ; SUB * (1+…) * (1+real growth)]
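The three tuned intervals differ only in which macro-economic corrections are applied, so they can be sketched with one function. Assumptions to note: the elided "(1+…)" factor in the financial and employment intervals is taken here to be the same 1 + 4/√n(t-1) widening used for business demography (the slides do not spell it out), and the names `tuned_interval` and `is_anomalous`, as well as the growth, inflation and population figures, are hypothetical.

```python
import math

def tuned_interval(slb, sub, n_prev, real_growth=0.0, inflation=0.0, k=4.0):
    """Return (lower, upper) limits for the ratio value_t / value_{t-1}.

    Assumes the widening factor is 1 + k/sqrt(n_prev) for every family.
    """
    widen = 1 + k / math.sqrt(n_prev)
    shift = (1 + real_growth) * (1 + inflation)
    return slb / widen * shift, sub * widen * shift

def is_anomalous(ratio, interval):
    """Flag a year-to-year ratio that falls outside the tuned interval."""
    lower, upper = interval
    return not (lower < ratio < upper)

n_prev = 400  # hypothetical number of enterprises in year t-1
demography = tuned_interval(0.82, 1.22, n_prev)
employment = tuned_interval(0.82, 1.22, n_prev, real_growth=0.02)
financial = tuned_interval(0.82, 1.22, n_prev, real_growth=0.02, inflation=0.03)

print(financial)
print(is_anomalous(1.30, financial), is_anomalous(1.60, financial))
```

In this example a +30% change in a financial aggregate is accepted, although it would have been flagged by a fixed ±20% band, while a +60% change is still signalled.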
Characteristics               Inflation?  Growth?  SLB   SUB
Number of enterprises         N           N        0.82  1.22
Turnover                      Y           Y        0.82  1.22
Purchases                     Y           Y        0.82  1.22
Value added                   Y           Y        0.77  1.30
Personnel costs               Y           Y        0.82  1.22
Number of employees           N           Y        0.82  1.22
Turnover / person employed    Y           N        0.85  1.18
Purchases / production value  N           N        0.85  1.15
Confidence interval: standard lower and upper boundaries
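Since the standard interval is described as symmetrical on a log scale, the listed boundaries can be checked against that property: ln(SLB) + ln(SUB) should be close to zero. The sketch below copies the (SLB, SUB) pairs from the table; the function name `log_asymmetry` is an assumption made for the example.

```python
import math

# (SLB, SUB) pairs copied from the table of standard boundaries.
BOUNDS = {
    "Number of enterprises": (0.82, 1.22),
    "Turnover": (0.82, 1.22),
    "Purchases": (0.82, 1.22),
    "Value added": (0.77, 1.30),
    "Personnel costs": (0.82, 1.22),
    "Number of employees": (0.82, 1.22),
    "Turnover / person employed": (0.85, 1.18),
    "Purchases / production value": (0.85, 1.15),
}

def log_asymmetry(slb, sub):
    """ln(SLB) + ln(SUB); zero means the bounds mirror each other on a log scale."""
    return math.log(slb) + math.log(sub)

asymmetry = {name: log_asymmetry(*pair) for name, pair in BOUNDS.items()}
for name, value in asymmetry.items():
    print(f"{name:30s} {value:+.4f}")
```

Most rows come out within about 0.003 of zero, consistent with rounding to two decimals. The last row deviates by roughly 0.02; the slides do not say whether that is intentional.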
Implementation and discussion
Deterministic method => programmed in Access for distribution
Test more tolerant on small aggregates => Reduced burden for small MS (confirmation in ‘2003-04 field test’)
Raising awareness on influential changes
‘Macro-editing tool’: signalling suspicious aggregates:
– Business demographic change?
– Micro-data to be reviewed? Selective editing of ‘suspect’ subset.
– Same ‘macro editing tool’ front end (NSI) and back end (ESTAT) -> shorter validation cycle
Field test: Number of anomalies varies between 0.37% and 4.6% (!)
Correlation low (0.15) between ‘country size’ (number of inhabitants) and anomaly frequency: small and large MS are treated on equal footing.