ESSNET
USE OF ADMINISTRATIVE AND ACCOUNTS DATA IN BUSINESS
STATISTICS
WP4
TIMELINESS OF ADMINISTRATIVE SOURCES FOR MONTHLY AND QUARTERLY
ESTIMATES
STS-ESTIMATES BASED ON ADMIN DATA: DEALING WITH REVISIONS
(SGA 2011: DELIVERABLE 4.4)
Ciro Baldi a, Donatella Tuzi, Francesca Ceccato, Silvia Pacini, Epp Karus, Pieter Vlag
a Corresponding author: Senior Researcher
ISTAT – Italian NSI
Directorate for Short-Term Economic Statistics
Wages and Labour Input Business Statistics Division
Via Tuscolana, 1788 00173 Rome
Contents
1. Introduction
2. Using admin data for STS-estimates
2.1 The general system of admin data based STS-estimates
2.2 Incompleteness of admin data: two situations
2.3 The large enterprise survey
2.4 STS-estimates and active enterprises
3. Admin data based STS-estimates and revisions
3.1 Components for revisions
3.2 The complete sequence of revisions
4. Revision strategy
4.1 Introduction
4.2 Factors influencing the revision policy
4.3 Updating of admin data
4.4 Updating of the survey data
4.5 Updating of the SBR (and population changes)
4.6 Benchmarking
4.7 The second component: publication strategy and output obligations
4.8 Examples and considerations
5. A complete sequence of revisions: an example from Finland
6. A structural way to analyse revisions: an example from Italy
6.1 Introduction
6.2 General outline: 5 steps
6.3 Step 1: context information – example Italian employment data
6.4 Step 2: revision measures – example for Italian employment data
6.5 Step 2a: graphical analysis of problematic domains
6.6 Step 2b: decomposition of the revision error into a survey part and an admin data part
6.7 Step 3: further analysis
6.8 Cause and effect report: a synthetic description of the main causes of revisions
6.9 Generalisation and general remarks
7. Conclusions
Acknowledgements
Appendix 1. An application of revision analysis to the Estonian turnover estimates on retail trade
Appendix 2. Summary statistics on revisions
Appendix 3. Contribution of admin data and survey data to revisions
Appendix 4. SAS code for the calculation of the summary statistics on revisions and graphs
Summary
Dealing with revisions is an important part of any short-term statistical (STS) process. This is
especially the case when STS-estimates are based on a combination of a survey for the largest
enterprises and administrative data (admin data) for medium and small enterprises. In such a
system, admin data are structurally incomplete when the first estimates have to be made and this
incompleteness leads to revisions when replacing these first preliminary estimates with later final
estimates based on a complete set of admin data. As National Statistical Institutes (NSIs) have
little control over the completeness of the admin data, it is important that revision analysis is
carried out in order to monitor this effect. However, our analyses show that updates to the
information from the large enterprises survey and corrections in the determination of the active
enterprise population between the first preliminary and last final estimate are also important
sources of revisions.
Based on these observations, this paper describes three aspects of revision analysis. The first is
revision analysis as a tool to understand the characteristics of the preliminary estimates. The
second is revision analysis to suggest areas for improvement, either in a development stage or
during current production. The third is to support NSIs when setting up revision policies for an
STS-production process based on a combination of a large enterprise survey and admin data.
Keywords: Revisions, Administrative Data, Revision Policy, Revision Analysis, Short term
Statistics, Quality
1. Introduction
Short-term statistics (STS) have to provide early signals on the dynamics of the economy. For this reason
they are needed as quickly as possible and are usually released when information is still partial or
subject to change. Subsequently, revised estimates are released to incorporate newly available
information to improve the quality of the indicators. Since, by definition, later estimates are
considered more accurate, the revision is a primary quality indicator of the preliminary estimates,
for the producer as well as for the user. The reliability of the preliminary estimates can be easily
assessed by looking at the magnitude and characteristics of revisions. Large and/or systematic
revisions may be interpreted as a signal of bad performance of the early estimates and thus damage
their credibility.
These considerations lead the NSIs to design preliminary estimates in such a way that revisions are
minimised. As a consequence, the analysis and monitoring of revisions are essential both in a
development stage and in current production.
In a development stage, the comparison of simulated preliminary estimates and final estimates
provides useful information to improve and fine-tune the estimation methodology through the
detection of the main causes of discrepancies and through comparisons of variants of the
methodology. In current production, the constant monitoring of revisions allows detection as soon
as possible of problems that may have arisen in the estimation process, so that the necessary actions
may be taken.
The analysis of revisions is important in all short-term statistical processes, but even more so in
those based on admin data, for two reasons. Firstly, the NSIs have limited control over the coverage
and completeness of the data, implying that statisticians may face occasional or structural drops in
preliminary data. The monitoring of revisions may be essential to identify these problems at an
early stage. Secondly, at least for final estimates, the admin data frequently cover the target
population on a census basis. This means that the final estimate is not affected by sampling errors.
The revision may therefore be interpreted as the difference between an estimate and the “true” value,
assuming that the final estimate is accurate.
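To make this interpretation concrete, the following minimal Python sketch computes a few common summary measures of revisions, taking the revision for a period as the final estimate minus the preliminary estimate. The function name, the choice of measures and the numbers are illustrative assumptions only; the indicators actually used in this deliverable are those defined in its appendix on summary statistics.

```python
from statistics import mean


def revision_summary(preliminary, final):
    """Summarise revisions R_t = final_t - preliminary_t over a series of periods."""
    if len(preliminary) != len(final) or not preliminary:
        raise ValueError("series must be non-empty and of equal length")
    revisions = [f - p for p, f in zip(preliminary, final)]
    return {
        # A mean far from zero points to systematic under- or over-estimation.
        "mean_revision": mean(revisions),
        # The mean absolute revision measures the typical size of a revision.
        "mean_absolute_revision": mean([abs(r) for r in revisions]),
        # Share of periods in which the estimate was revised upwards.
        "share_upward": sum(r > 0 for r in revisions) / len(revisions),
    }


# Hypothetical growth rates (%) for four periods: first release and final release.
first_release = [1.0, 2.0, -1.0, 0.5]
final_release = [1.5, 2.5, -0.5, 0.5]
summary = revision_summary(first_release, final_release)
print(summary)
```

In this made-up series the mean revision equals the mean absolute revision, which is the pattern one would expect when preliminary estimates are systematically too low.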
A second aspect in dealing with revisions is to set up an appropriate revision schedule of subsequent
releases of the indicators for the same reference period. In the case of admin data based processes,
this revision policy has to take into account the updating of multiple sources of information.
The aim of this deliverable is twofold. On one hand, it documents the work of the ESSnet Admin
Data on the analysis of revisions. It proposes a way to analyse revisions systematically for STS-
estimates based on admin data for the small and medium sized enterprises and a survey of the
largest enterprises. On the other hand, it illustrates the main sources of revisions in STS-estimates in
an admin data based process and relates these sources of revisions to a revision strategy.
The document is organised as follows. Chapter 2 provides a summary of methodological and
practical issues when producing STS-estimates with a combination of a survey of the largest
enterprises and admin data for the medium-sized and small enterprises. Basically, this chapter
summarises the results of deliverables 4.1, 4.2 and 4.3 of the ESSnet on Admin Data. Chapters 3
and 4 discuss the relationship between revisions and the three data sources, i.e. a survey of the
largest enterprises, admin data for the other enterprises and the business register to determine the
population frame, and link these revision sources to a revision policy. Practical examples of
revision analysis are shown in chapters 5 and 6. Chapter 5 presents a complete sequence of
revisions from Finland, and chapter 6 mainly deals with an example of revision analysis from the
Italian estimates of the number of employees based on Social Security data. Some additional
examples using VAT-data from Estonia, Finland and the Netherlands are also included.
Appendix 1 reports a second full application of the revision analysis, on Estonian VAT data on
turnover in retail trade. In Appendix 2 the revision indicators are explained, together with their
formulae. Appendix 3 describes a simple way to decompose the revisions into contributions due to
the data sources. Finally, Appendix 4 reports some SAS code snippets that might be used to
replicate the analysis shown here in other NSIs.
2. Using admin data for STS-estimates
2.1 The general system of admin data based STS-estimates
The general set-up when utilizing admin data for producing STS is that a combination of a survey
and admin data is used (see, e.g., Orchard et al. 2011; Karus, 2012; Kavaliauskiene, 2011; Lorenz,
2011; and Šličkutė-Šeštokienė, 2011). Since large enterprises often have a complex structure and
their impact on the estimates is large, correct surveyed observations from those large enterprises are
considered crucial for producing reliable STS figures. In the survey the large enterprises are
generally completely enumerated.
For the remaining small and medium enterprises, VAT data are used instead of direct observations
by the NSI. In some specific cases only, a small sample may be surveyed for small and medium
enterprises. In other words, the general system of admin data based STS-estimates consists of two
parts: the use of a survey for the large enterprises; and the use of admin data, i.e. VAT data for
turnover estimates and social security data for employment estimates.
2.2 Incompleteness of admin data: two situations
A drawback of using admin data for small and medium enterprises is that these admin data are still
incomplete when the monthly or quarterly STS-estimates have to be produced. This incompleteness
might be temporary (e.g. due to late response of enterprises) or structural (e.g. because enterprises
below a fixed income threshold may report at a different periodicity to the admin data holder).
Roughly speaking, two general situations can be distinguished:
I. The vast majority of the admin data for period t are available in time (with the remainder of the
data being provided later).
II. No or very limited data for period t are available in time.
In both cases a common practice among NSIs is to enumerate the population of large enterprises
completely with a survey.
Situation I applies in general to regularly produced quarterly estimates, because the general
situation in continental Europe is that commercial enterprises have to declare their VAT and Social
Security data on a monthly or quarterly basis, and the deadline for reporting these data to the
authorities is much earlier than publication deadlines for quarterly turnover and employment
estimates according to the STS-regulation.
The second situation (situation II) mostly applies for monthly estimates, because some enterprises
declare per quarter, and some deadlines for monthly statistical publications are early, e.g. before the
deadline of reporting to the tax office.
Many NSIs consider that coverage of about 80% of the total turnover by the large enterprises survey
and available admin data is necessary, before reliable figures of turnover can be published in a
certain publication cell (see, e.g., Vlag, 2012).
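The coverage rule of thumb can be sketched as a simple check per publication cell. This is an illustrative Python sketch, not a procedure taken from the cited work: the function name and figures are hypothetical, and it assumes that an estimate of the cell's total turnover (for instance from the previous period) is available as the denominator.

```python
def coverage_check(le_survey_turnover, admin_turnover_received,
                   estimated_total_turnover, min_coverage=0.80):
    """Return (coverage, publishable) for one publication cell.

    Coverage is the share of the estimated total turnover that is directly
    observed, either in the large enterprise survey or in the admin data
    received so far. The 0.80 default mirrors the rule of thumb in the text.
    """
    if estimated_total_turnover <= 0:
        raise ValueError("estimated total turnover must be positive")
    observed = le_survey_turnover + admin_turnover_received
    coverage = observed / estimated_total_turnover
    return coverage, coverage >= min_coverage


# Hypothetical cells: the same survey part, with more or less admin data received.
well_covered = coverage_check(500, 340, 1000)
poorly_covered = coverage_check(500, 250, 1000)
print(well_covered, poorly_covered)
```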
In situation I, the natural wish of a statistician would be to complete the dataset to the whole
population. Theoretically, many ways of doing this are possible. In practice, various methods of
imputation are used. Several statistical production systems using an almost complete VAT-dataset
have been described in deliverable 4.1 of SGA-2011 of the ESSnet Admin Data (Maasing et al.,
2013). An important conclusion of Maasing et al. (2013) was that both level and growth rate
estimates for turnover can be produced using VAT if:
- VAT provides a good coverage of the population. A good coverage is defined as 80 % or more
  of the estimated population covered by available VAT (Maasing et al., 2013);
- the data transfer from the tax office to the statistical institute is guaranteed; and
- the link with the Business Register is well established.
In case of situation II (no or few admin data available) it was concluded that the statistical survey
cannot be directly substituted by admin data, because the available admin data are generally not
representative of the target population and this selectivity cannot be determined beforehand. In this
case several estimation methods are available:
- maintaining a small survey for the current period t and weighting this mini-survey with the help of
  admin data of the previous period. Hence, the admin data are used as auxiliary information for the
  estimates. This method is described in deliverable 4.2 of SGA-2011 of the ESSnet Admin Data
  (Kavaliauskiene et al., 2013);
- alternatively, the admin data estimates of the previous month or quarter are used to check whether
  long-term trends and short-term movements are similar for the larger enterprises and smaller
  enterprises. Depending on the outcome, this information can be used to decide whether
  - a survey of the largest enterprises (LE-survey) only is sufficient for the (first) monthly
    estimates, knowing that the structural series based on a LE-survey and admin data become
    available at a later stage or for the quarters; or
  - the LE-survey should be combined with a separate estimate for the smallest enterprises
    based on extrapolation of the VAT-series.
These model-based estimation methods are described in deliverable 4.3 of SGA-2011 of the ESSnet
Admin Data (Vlag et al., 2013).
[Figure 1: flowchart. General problem: administrative data are not available at the time they are
needed. A. Admin data almost complete: use of incomplete dataset of the current period, imputation
of missing data (Deliverable 4.1, SGA-2011). B. No or limited admin data: use of admin data of
previous period(s), with a regression type estimation technique (Deliverable 4.2, SGA-2011) or
benchmarking / nowcasting (Deliverable 4.3, SGA-2011).]
Figure 1. Scope of the timeliness problem, and relationship with deliverables.
2.3 The large enterprise survey
Deliverable 4.1 of the ESSnet Admin Data discusses how the coverage of the large enterprise
survey differs per country. It concluded that this coverage is in practice often based
on a balance between:
1. targets for administrative burden reduction and statistical production costs;
2. the link between the statistical business register and the (units of) VAT and social security
admin data; and
3. the impact on growth rates of definitional differences between the ‘administrative’ variables
and the ‘statistical’ variables required by the STS-regulation.
However, when defining the coverage of the large enterprise (LE) survey, it has to be kept in mind
that the maintenance of this survey provides some insurance against unexpected breaks in the
system (such as drops in admin data). As preliminary STS-estimates are generally designed in such
a way that revisions are minimised, Langford and Teneva (2012) developed a method to calculate
the impact of the ‘incompleteness’ factor on revisions, to determine the boundary between the LE-
survey and admin data parts in admin data based STS-estimates.
This method is based on calculating revisions between the first STS-estimates (incomplete admin
data) and final STS-estimates (complete admin data) by defining the boundary between the LE-
survey and the admin data parts at several thresholds. More specifically, revisions are calculated
when using admin data for enterprises with fewer than 20, 50, 100 and 200 persons employed,
respectively. These authors have argued that the boundary between the LE-survey and the admin
data parts should be set at the threshold at which revisions start to increase considerably. As a
consequence, it is recommended that coverage of the large enterprise survey differs per activity.
Langford and Teneva (2012) tested the method on VAT-data in the United Kingdom. The coverage
of VAT for the “first estimates” (3 months after the reporting periods) was about 60 % in terms of
turnover. The coverage of VAT in the final estimates is close to 100 %. Langford and Teneva
(2012) assumed that the data of the LE-survey were complete at the first estimates and remained
unchanged between the first estimates and the final estimates. The validity of this assumption will
be discussed in the next chapters of this paper.
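The threshold comparison can be sketched as follows. This is not the implementation of Langford and Teneva (2012): the function name, the simulated revision sizes and, in particular, the rule that "increase considerably" means exceeding 1.5 times the revision at the lowest threshold are all assumptions made for the sake of a runnable illustration.

```python
def choose_boundary(mar_by_threshold, max_ratio=1.5):
    """Pick the highest employment threshold at which revisions stay moderate.

    mar_by_threshold maps a size threshold (admin data are used for enterprises
    below it) to the mean absolute revision obtained in a simulation with that
    threshold. As an assumption, revisions are deemed to 'increase considerably'
    once they exceed max_ratio times the revision at the lowest threshold.
    """
    thresholds = sorted(mar_by_threshold)
    baseline = mar_by_threshold[thresholds[0]]
    chosen = thresholds[0]
    for threshold in thresholds[1:]:
        if mar_by_threshold[threshold] > max_ratio * baseline:
            break  # revisions start to increase considerably from here on
        chosen = threshold
    return chosen


# Hypothetical mean absolute revisions (percentage points) for one activity.
simulated_mar = {20: 0.40, 50: 0.45, 100: 0.55, 200: 1.10}
boundary = choose_boundary(simulated_mar)
print(boundary)
```

Running the same comparison separately for each activity would generally yield different boundaries, which is in line with the recommendation that the coverage of the LE-survey differs per activity.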
2.4 STS-estimates and active enterprises
Deliverable 4.1 of the ESSnet Admin Data (Maasing et al., 2013) extensively discusses a subtle but
relevant aspect of the STS estimates: uncertainty about which enterprises are active and which are
not during the reference period.
The fact that admin data for a reference period normally cover the population of active enterprises
in that period defines a distinctive feature of STS based on admin data. In other words, the admin
data provide a representation of the currently active population of enterprises, which may or may
not coincide with the active population of enterprises according to the Statistical Business Register
(SBR). In practice, this disparity causes several challenges.
The SBR is often used:
- as the sample frame for surveys, including the LE-survey in an admin data based STS-system;
  and/or
- to maintain consistency between the various business surveys of the NSI.
For these reasons, it is preferred that the admin data used for STS-estimates are linked to the SBR.
In practice, most countries link the admin data to a ‘frozen’ SBR for a certain period. A frozen SBR
defines the enterprise population characteristics as registered in the SBR at a certain date (for
example 31 December 20xx). However, linking the admin data to a frozen SBR is not
straightforward, even for a complete admin dataset (e.g. for annual SBS-statistics or final
STS-estimates). Reasons for complications are:
1. different enterprise units in the SBR and the admin data;
2. different registrations of mergers/split-offs in admin data and the SBR due to time-lags;
3. different registration of information in admin data than in the SBR due to maintenance
peculiarities;
4. (slightly) different population coverage.
The incompleteness of admin data is an important issue for STS. Due to time-lags between the SBR
and the admin data source, newly started enterprises that report late are missed in the first
estimates, as they are not yet included in the SBR. For the same reason, it is difficult to determine
whether admin data are missing because of a) late reporting or b) the enterprise having stopped. In
the latter case no imputation is needed for the missing units. Hence, the so-called provisional target
population is uncertain at the time of the preliminary estimates. This situation is sketched in
Figure 2 below.
[Figure 2: diagram. Middle column (SBS): admin data (i.e. VAT) are linked to the population frame
(= business register); complications: coverage, different units, mergers. Right column (additional
challenge for STS: time-lags): estimation of the provisional active population, with units missing
in VAT but active, units missing but stopped, and units not in the BR but present in the admin data
(i.e. VAT).]
Figure 2. Schematic sketch of a) general challenges when linking admin data to the SBR (middle column)
and b) specific challenges for STS when linking incomplete admin data to the SBR
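The linkage situation at the time of a preliminary estimate can be sketched with simple set operations. The sketch below is illustrative only: the function name, category labels and unit identifiers are hypothetical, and it assumes that SBR units and admin data units share a common identifier, which (as noted above) is often not the case in practice.

```python
def classify_units(sbr_active_ids, admin_reporter_ids):
    """Split units into the three groups that arise when incomplete admin
    data are linked to a frozen SBR at the time of a preliminary estimate."""
    sbr = set(sbr_active_ids)
    admin = set(admin_reporter_ids)
    return {
        # Present in both sources: no linkage problem.
        "linked": sbr & admin,
        # In the SBR without admin data: either a late reporter (imputation
        # needed) or a stopped enterprise (no imputation needed); the two
        # cannot be told apart yet.
        "in_sbr_no_admin_data": sbr - admin,
        # Reporting but not in the SBR: possibly a start-up not yet registered.
        "admin_data_not_in_sbr": admin - sbr,
    }


# Hypothetical unit identifiers.
groups = classify_units(
    sbr_active_ids=["A", "B", "C", "D"],
    admin_reporter_ids=["A", "B", "E"],
)
print(groups)
```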
In contrast to the first estimates, this ‘time-lag’ problem does not exist for the final estimates. All
admin data are available when the final estimates have to be produced, and the final target
population, which consists of all active enterprises in period t, can be compiled by linking all
available admin data with the SBR (Baldi et al., 2012).
Analyses in several countries show that the uncertainty in the provisional target population is a
major source of revisions in STS-admin data estimates (Maasing et al., 2013).
3. Admin data based STS-estimates and revisions
3.1 Components for revisions
The previous chapter mentions three major sources for revisions in an admin data based STS-
system:
1. the estimation for the small and medium sized enterprises, because the admin data are
incomplete for the preliminary first estimates but complete for the final estimates;
2. the estimation for the large enterprises, as the data of the LE-survey might be updated between
the first and final estimates;
3. the estimation of the target population, i.e. the active enterprises, for the preliminary estimates.
Taking into account these three components, Roestel (2011) proposed to perform revision analyses
on the total estimate, i.e. in most cases the published results, sub-divided by revision analyses on:
a. both data sources, i.e.
a1. the admin data based estimate for the small and medium sized enterprises; and
a2. the LE-survey estimate for the large enterprises
b. (the uncertainty in) the active population at the time of the preliminary estimates.
The basic idea behind this sub-division is that (simulations of) revisions between the preliminary
estimates and the final estimates can be used for fine-tuning an admin data based STS-production
system. More specifically, these can be used to decide whether available resources should be
concentrated on optimising:
- the estimation for missing VAT-data in the preliminary estimates (or VAT-data analysis in
  general);
- the estimation for the LE-survey (i.e. dealing with missing survey data or revisions to these
  data), or the size of the LE-survey;
- the link between the admin data and the business register.
In a production setting, this sub-division can help to find the cause for an unusually large revision
more quickly. This is especially the case if the revision is not caused by a single unit, but by a more
general problem, such as:
- drops in responses in VAT or the LE-survey; or
- the relationship with periodicity and changes in the business cycle.
Note that these three components of revisions do not cover all possible causes. For example,
revisions may also be caused by changes in the SBR (corrections of erroneous NACE-codes;
incorporation of mergers/split-offs in the SBR), by revised VAT or social security declarations, or
by combining different sources. However, these three components are normally sufficient to find the
underlying causes of revisions. Chapter 6 explains how the revisions of the separate components can
be calculated.
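As a simple illustration of the sub-division by data source, the revision of a level estimate splits additively into a survey part and an admin data part, because the total is the sum of the two parts. The sketch below uses hypothetical names and figures and is not the decomposition method of this deliverable. It also does not isolate the population component: the effect of changes in the active population ends up inside the admin data part here.

```python
def decompose_revision(preliminary, final):
    """Decompose the revision of a level estimate by data source.

    Both arguments are dicts holding the 'le_survey' and 'admin' totals of the
    preliminary and the final estimate for the same period.
    """
    parts = {
        source: final[source] - preliminary[source]
        for source in ("le_survey", "admin")
    }
    # The total revision is, by construction, the sum of the two parts.
    parts["total"] = parts["le_survey"] + parts["admin"]
    return parts


# Hypothetical turnover levels for one publication cell.
preliminary = {"le_survey": 600, "admin": 380}
final = {"le_survey": 610, "admin": 400}
parts = decompose_revision(preliminary, final)
print(parts)
```

In this example one third of the total revision stems from updated survey data and two thirds from the completion of the admin data.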
3.2 The complete sequence of revisions
Depending on the timeliness of the admin data and the publication deadlines, the theoretically most
extensive sequence of admin data based STS-estimates for period t would consist of the following
estimates:
1. A first preliminary estimate for period t based on a LE-survey plus model-based estimation for
small and medium sized enterprises, because the admin data are not available yet (= situation II
in Chapter 2);
2. A second preliminary estimate for period t based on a LE-survey plus an estimation for small
and medium sized enterprises based on fairly complete admin data (= situation I in Chapter 2);
3. A series of estimates based on a LE-survey and admin data gradually becoming complete;
4. A final estimate for period t based on:
- a LE-survey (which is complete and completely analysed);
- the complete (and analysed) set of admin data for small and medium sized enterprises;
- a population frame derived from an ‘analysed and corrected’ SBR for this period.
In practice, such a complete sequence was only found for the retail trade in Finland which covers a
sufficiently long period. This sequence covers all estimates between the first output as required by
the European STS-regulation (30 days after the end of the month) and a final estimate (225 days
after the month). For other activities the output obligations for the first estimates are later, so
the admin data are already available by then. Hence, in Finland such a complete sequence does not
exist for other activities, and for these activities the sequence starts with the second preliminary
estimate.
4. Revision strategy
4.1 Introduction
The design of a revision policy is an important part of the development of any short-term statistics
process. Because short-term indicators have to provide early signals on the dynamics of the economy,
they are needed as quickly as possible and are usually released when only partial information is
available, and are thus subject to change when more information becomes available. In setting up a
revision policy an NSI has to balance the need to release more accurate indicators when updated
information is available against the costs of frequent releases for both producers and users. As
Eurostat (2012) puts it:
“Revisions are a two-sided affair from the producer’s perspective as well. The new information they provide
is needed to describe economic developments more precisely, yet, frequent and/or major revisions can
damage the credibility of the statistical data. …Both, producers and users have extra work caused by
revisions. Producers have to develop revised and new data. Users have to update their databank and to adjust
their analysis.”
And NSIs have to “… find a balance between the demands for the best statistical information at all points in
time (which then suggests a continuous revision policy) and avoiding unnecessary changes in the data.
The basic principle is that significant information for politically or economically important data should be
incorporated as quickly as possible into published data in order to avoid a wrong assessment of the economic
development, whereas minor changes should first be collected before being implemented.”
While the general principles of revision policies are well described in the OECD/Eurostat (2008)
and Eurostat (2012) documents cited above, this paper presents some specific elements
for admin data based STS estimates.
4.2 Factors influencing the revision policy
While the first release is constrained by deadlines set by European Regulations or national
obligations, subsequent releases are planned according to a revision policy fixed by the NSI. The
factors that influence such a policy may be grouped in two categories:
1. the updates of available data sources (input), due to revisions of:
1.1 admin data;
1.2 survey data;
1.3 SBR;
1.4 other information (e.g. benchmarking);
2. publication strategy and output obligations.
Note that the first factor corresponds with the three components which determine the quality of the
admin data based STS-estimate (see chapter 2).
4.3 Updating of admin data
VAT or Social Security data for the reference period might be either incomplete or missing
altogether at the time of the first estimate. Depending on the deadline of the first estimate and the
legislation that sets the reporting obligations for firms, the admin data may be missing altogether
(or available only to a very limited extent), as in situation II (see Vlag et al., 2013;
Kavaliauskiene et al., 2013; Teneva 2012; Orchard et al., 2011), or only partly incomplete, as in
situation I (see Baldi et al., 2011; Vlag et al., 2013). For subsequent estimates of the same period
(revisions) the NSI may rely on more complete datasets, up to the point where the data for the whole
population covered by the admin data are available. In situation II, a subsequent estimate can be
based on incomplete admin data. Also in situation II, the data are eventually completed as all late
reporters become available.
During the development of the statistical process, the causes and timing of the completion
and stabilization of the admin data should be analysed in order to set up a proper estimation
methodology and a revision policy. A plot of the cumulative number of reporters against the
transmission dates, or an analysis of the impact of value changes in different deliveries of the data,
is a useful tool for deciding when to revise the estimate and how long a stretch of the series
should be revised. Such data analyses may not only help to determine the revision strategy, but also
help in designing the transmission schedule of admin data from the data holder to the NSI. For
instance, Italy requests two transmissions of Social Security data: one at 45 days from the end of
the quarter, the latest moment at which the STS deadline can be met taking into account the
processing time, and a second one year later, when the data are complete and stabilized (Baldi et
al. 2011). The schedule of transmission of VAT data set up by Estonia for the turnover index at the
very start of the new system implies 4 transmissions of data with a monthly periodicity (Karus 2012).
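The numbers behind such a cumulative plot can be computed as in the following minimal sketch. The function name, the checkpoints and the reporting delays are hypothetical; the sketch assumes that, for a past reference period, the delay in days between the end of the period and the arrival of each unit's declaration is known.

```python
def cumulative_coverage(delays_in_days, checkpoints):
    """Share of eventual reporters whose declaration had arrived by each checkpoint.

    delays_in_days holds, per reporting unit, the number of days between the
    end of the reference period and the arrival of its declaration.
    """
    total = len(delays_in_days)
    if total == 0:
        raise ValueError("no reporters")
    return {
        checkpoint: sum(delay <= checkpoint for delay in delays_in_days) / total
        for checkpoint in checkpoints
    }


# Hypothetical delays for ten units of one past reference period.
delays = [10, 15, 20, 20, 35, 40, 45, 60, 90, 400]
coverage = cumulative_coverage(delays, checkpoints=[30, 45, 90, 365])
print(coverage)
```

Checkpoints after which the share hardly increases are natural candidates for scheduling a data transmission or a revision. Weighting each unit by its turnover or employment instead of counting units would show the impact on the estimates rather than on the number of reporters.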
In general, it can be stated that the majority of the late information will become available at two or
three specific moments. More precisely:
- most data of late reporters will be available at the end of the next period, i.e. when the estimates
  for the next month or next quarter have to be produced;
- in the case of monthly output, missing data due to quarterly reporting become available after
  the end of the quarter. More specifically, quarterly admin data become available 20-40 days
  after the quarter, because the deadlines for submitting VAT and/or social security data for
  monthly and quarterly periods are in general 20-40 days after the reference period (Vlag et al.,
  2013);
- the remaining missing information, from the annual VAT and employment reporters, becomes
  available x months after the end of the year. When these missing data are received, the admin
  dataset can be considered complete.
Besides completing the data, the subsequent transmissions of admin data may contain changed
values for units already present in earlier transmissions. This may happen either because the
firm has amended previously declared data or because the administrative institution has adjusted
the data following its check procedures. The timing of revisions in previously reported values is,
in general, more erratic than the timing of the completion of the missing admin data.
Therefore, analyses of the completion and stabilization of the admin data over time are extremely
important in both the development and the production phase, even though the most important
moments of the completion of the data can be estimated beforehand.
The consequences for developing a revision strategy are as follows:
- the timing of the first estimate is defined by output needs, i.e. European regulations or national
  requirements.
- If an incomplete set of admin data is available, then:
  - an obvious timing for the first revision is the publication moment of the next period,
    because the majority of the late reporters are available then;
  - an obvious timing for another revision is the publication moment of the quarterly results
    (in case of monthly publication) of the next period, because the majority of the late
    reporters are available then;
  - an obvious timing for the last revision is when the information of the yearly reporters
    becomes available.
The final estimate (on which the last revision is based) is considered the most accurate estimate and
therefore serves as the reference.
If no or very limited admin data are available at the time of the first estimate (situation II), an
obvious additional revision moment is the first publication with an incomplete admin data set
(situation I).
This raw revision scheme can, however, be improved on the basis of analyses of the completion and
stabilization of the data and of the other factors determining the revision strategy mentioned
earlier, such as the large enterprise survey, updates of the Business Register, benchmarking and
output obligations.
4.4 Updating of the survey data
Since the general set-up when utilizing admin data for producing STS is a combination of a
survey for large enterprises and admin data for small and medium-sized enterprises, updates in the
survey data (new or revised responses) play an important role in the amount of revision. However,
in contrast to admin data, the NSI has much more control over the timing of these updates because
it controls the data collection. This data source therefore has less impact on the revision strategy
than the availability of admin data.
When carrying out revision analysis, it is however important that the original survey data (used for
the first estimates and other estimates) remain stored. This recommendation seems obvious but, in
practice, the revision analyses presented in Chapter 6 of this deliverable were hampered by the fact
that some countries do not preserve the ‘original’ survey data. The ESSnet project has also observed
that – like the admin data part – the large enterprises survey is sometimes not complete when the
first estimates have to be made. Therefore, it is recommended to perform analyses about completion
and the stabilization of the large enterprise survey when developing an admin data based STS-
system. Improving the data collection and data treatment system of the few remaining surveyed
enterprises may considerably improve the quality of the output. Some examples will be given in
Chapter 6.
4.5 Updating of the SBR (and population changes)
The role of the SBR is crucial for the definition of the target population. Some countries involved in
the ESSnet use for the current period t the latest available frozen version of the SBR, which is a file
normally released yearly. Typically this file is released in the last part of the year (October-
November) and is used for the subsequent year, both for sampling the monthly to yearly surveys and
to define the target population for that year. Another important point is whether the frozen SBR file
is time-referenced or not, that is, whether the file refers to a particular moment or period of time
(e.g. a year), indicating that the information represents well the population of enterprises active at
that moment or during that period. In Finland it refers to the year before and in Italy, where the file
is released at the beginning of the year, to two years before. Other countries, like the Netherlands,
use a current version of the SBR, i.e. the population for period t corresponds with the list of
enterprises in the SBR for period t.
Within admin data based statistical processes, the way the (frozen version of the) SBR is actually
used, however, varies among countries. In Lithuania and Estonia, beyond being used to sample the
survey part, it establishes the population frame for the coming year. Only changes affecting big
enterprises (entries in the sector/births, exits/deaths or spurious demographic events such as those
implied by mergers or demergers of enterprises) are taken into consideration during the year. In
Estonia, in the current phase of testing, for the admin data part the units identified as dead according
to the administrative sources are removed from the target population (Maasing 2012). On the other
hand, Italy and Germany use the SBR only as a starting point to define the target population of the
reference period, adding to it the units born during the year and removing those deemed dead (see
Baldi et al. 2011, Lorenz 2011). The characteristics of these approaches are discussed elsewhere
(De Waal et al. 2012, Maasing et al. 2012).
What matters here are the implications for the revision policy. Due to time-lag issues, starters,
stoppers, mergers and split-offs are registered differently in the admin data sources. This
time-lag effect increases when an older frozen version of the SBR is used as population frame.
However, it also exists, although to a smaller extent, when a current version of the SBR is used.
In deliverable 4.1 of the ESSnet on AdminData (Vlag et al., 2013) and Chapter 2.3 it is argued that
this time-lag effect leads to uncertainty in the determination of the provisional active population and
that this uncertainty is a major source of revision.
The implication for the revision strategy is that the final revision should be based on a complete set
of admin data plus the final version of the SBR for this period. Ideally, it is recommended that the
intermediate revisions also correspond with updates of the SBR information.
4.6 Benchmarking
The availability of estimates for a lower level of temporal disaggregation (e.g. yearly or quarterly)
may trigger the necessity to revise the estimates at the higher level (quarterly or monthly) to
produce consistent estimates. For instance, the release of quarterly data on turnover in retail trade is
used in Estonia and the Netherlands to benchmark the monthly estimates. In some cases the
benchmark to annual estimates (e.g. SBS) involves the adjustment to a more appropriate population
of enterprises. In other countries, such as Italy, the practice of revising the STS indicators to
achieve consistency with the SBS indicators is not applied.
Whatever the practice of the country, it is recommended that, if benchmarking is applied, the timing
of the benchmarking corresponds with the revision strategy. More specifically, when monthly
results are benchmarked with quarterly information it is preferred that this benchmarking
coincides with the incorporation of the quarterly admin data in the published estimates. When the
STS-series are benchmarked with annual information it is preferred that this benchmarking
coincides with the incorporation of the annual admin data (and the version of the SBR) in the
published STS-estimates. Of course, this ideal situation should be balanced against publication
obligations (see Chapter 4.7).
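As an illustration of what benchmarking monthly results to a quarterly figure involves, the sketch below uses simple pro-rata scaling. This is an assumption made for the example: the text does not state which benchmarking method Estonia or the Netherlands apply, and the function name `benchmark_pro_rata` and the numbers are hypothetical.

```python
def benchmark_pro_rata(monthly, quarterly_total):
    """Scale the monthly estimates of one quarter so that they add up to the
    quarterly benchmark (pro-rata adjustment, assumed here for illustration)."""
    current_total = sum(monthly)
    if current_total == 0:
        raise ValueError("cannot scale a zero monthly total")
    factor = quarterly_total / current_total
    return [value * factor for value in monthly]


# Invented monthly turnover estimates and a quarterly benchmark.
monthly_estimates = [100.0, 110.0, 90.0]
revised = benchmark_pro_rata(monthly_estimates, 315.0)
print([round(v, 1) for v in revised])
```

Each benchmarked month is revised by the same factor, so the revision triggered by benchmarking is best released together with the other revisions of that quarter, as recommended above.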
4.7 The second component: publication strategy and output obligations
Beyond the availability of more up-to-date information, output obligations and publication strategy
also determine the revision policy.
Firstly, there are the actual users' needs. Questions are often implicitly raised when setting up the
schedule of revisions, e.g. how frequent revisions should be in order to be useful for analysts and
other users, and what maximum delay is acceptable for the final estimate.
Secondly, the revision scheduling may be influenced by the fact that some statistics are inputs for
other statistics (for instance National Accounts). A related issue arises because, in this case, the NSI
may set up common revision policies for indicators used for other statistics. See for instance ONS's
Revisions and Corrections Policy
(http://www.ons.gov.uk/ons/guide-method/revisions/revisions-and-corrections-policy/index.html).
Thirdly, the relationship with the admin data holder may influence the planning of revisions, since it
may only be feasible to ask for the data for one period a few times.
Finally, the work load necessary to manage (organise, process, store) multiple versions of micro-
and macro-data should not be underestimated. This remark seems obvious, but one of the most
important conclusions of this work was that several NSIs have implemented an admin data based
STS system, but only a few countries have organized this system in such a way that complete
revision analyses can be carried out.
4.8 Examples and considerations
The following examples show how the revision policy is affected by the available information.
In Lithuania, the monthly indicators of income for the Retail trade are released only twice, at 27-28
days after the end of the reference month and at about 58 days. The second release incorporates late
respondents and revised values from the surveys and also the VAT data for the reference month,
which were not available for the first estimate (Table 1). In the same country, the income indicator
for Manufacturing is released three times: the first estimate is released at t+21 or t+22 days, based
only on survey data; the second at t+51 also incorporates the admin data for the reference month;
the third estimate, released at the end of the year/beginning of the following year incorporates
updated information (Table 2).
In Estonia, the new methodology for the estimates of monthly indicators on turnover contemplates
up to six estimates (Table 3). Considering the available information, three versions of preliminary
estimates are released: the first estimate is released 30 days after the reference month and is based
on the first version of the survey data and the first transmission of admin data (obtained at t+24
days). At 60 days a second version of the estimates is released, on the basis of survey updates and a
second admin data transmission (obtained at t+54 days). A third version of the estimates is available
90 days after the reference month, which uses only updated survey information because the VAT is
complete and checked after t+60 days. Further versions of the preliminary estimates are also
calculated as a consequence of benchmarking to quarterly estimates (implemented in February,
May, August and November), which are based on a larger survey.
Italy releases five estimates of the quarterly index on number of employees, the first at 60 days from
the end of the period and the others each quarter, up to one year later. The final estimate
incorporates the second (and last) transmission of Social Security data. The rationale for this
transmission schedule is that the first transmission is already almost complete and the data can be
considered definitive after one year. These facts, coupled with the long and labour-intensive
processing needed to derive the requested indicators, advised against requesting further intermediate
transmissions. The release of a newer version of the SBR, as well as the revision of the large
enterprise monthly survey, which affects all the months of the previous year, are incorporated once
a year in the May release (Table 4).
It is worthwhile to mention that the Lithuanian and Estonian revision policies are comprehensive
and well described but do not include a final revision which takes into account a new version of the
SBR as revised population frame. The inventory of practices of admin based STS-systems
revealed that uncertainty about the active population is a major source of revisions. As the final
estimate is treated as the accurate one in revision analyses and revision policies, it is
recommended that in new systems the final estimate should include updated information on the
population, i.e. the business register.
Table 1 - Lithuania: revision policy in the monthly income estimates (retail trade)
Table 2 - Lithuania: revision policy in the monthly income estimates (manufacturing)
Table 3 - Estonia: revision policy in the monthly turnover estimates (retail trade)
Table 4 - Italy: revision policy in the quarterly employment estimates
5. A complete sequence of revisions: an example from Finland
The monthly retail trade survey of Statistics Finland is mainly based on VAT data. Only the largest
enterprises are surveyed, plus some smaller enterprises which are considered to be crucial for the
estimates. However, VAT data arrive too late for the first estimates. As a result, the first estimates
for month t are produced without using VAT, from a large enterprises survey only. The
consecutive turnover growth rates for month t are determined as follows:
- A first estimate is provided 30 days after the end of the month. It is based on unweighted
  survey results of about 100 large enterprises. The estimate is made for the highest aggregates
  only.
- A second estimate is provided 45 days after the end of the month. It is based on unweighted
  survey results of about 280 large enterprises. This estimate is used for high and some lower
  aggregates. The selected 280 enterprises cover about 70 % of total turnover.
- VAT is used for the estimates provided at 75 days after the end of the month. At this stage all
  aggregates are published. The estimation techniques are described by Maasing et al., 2013.
- The results are revised up to 225 days after the month. The ‘225 days’ estimate is considered
  as final, because the VAT information and the “Business Register” information about the
  population are complete.
The first two estimates are based on a simple model: “growth rate of the large enterprises = growth
rate of the entire target population”. In formulae, with $G_{t;t-1}$ denoting the turnover growth rate
between months t-1 and t:
$$G^{SME+LE}_{t;t-1} = G^{LE}_{t;t-1}$$
The results of the estimates a) after 30 days, b) after 45 days, c) after 75 days and d) after 225 days
are shown in Figures 3 and 4. Figure 3 focuses on the revisions between the first two preliminary
estimates and the third (first admin data based) estimate for month t. Figure 4 focuses on the
revisions between the second preliminary estimate, the first (incomplete) admin data based
estimate and the final (complete) admin data based estimate.
Figure 3 shows that, for the retail trade, the 30 day and 45 day estimates are quite similar, because
the revisions between each of these estimates and the t+75 days estimate are quite similar. In general
the t+45 days estimates are closer to the t+75 days estimates than the t+30 days estimates are. This
was expected, as the sample is larger at t+45 days. No systematic bias was detected between the
t+30 and t+45 days estimates on the one hand and the t+75 days estimate on the other hand. Neither
was a relationship between growth and revision detected. As a result, it can be concluded that these
revision analyses indicate that the t+30 and t+45 days estimates are satisfactory, taking into account
the small sample size. This demonstrates that for high aggregates first estimates can be produced
with a small survey and a simple model, if the necessary assumptions of the latter (in this case: the
short-term movement of growth of large enterprises is correlated with the growth of all enterprises)
are fulfilled.
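The simple model behind the t+30 and t+45 days estimates can be sketched as follows. The sketch is an illustration under assumptions, not Statistics Finland's implementation: it assumes the growth rate is computed as an unweighted ratio of summed turnover over the enterprises that responded in both months, and the names `large_enterprise_growth` and `first_estimate` and all figures are hypothetical.

```python
def large_enterprise_growth(prev, curr):
    """Unweighted growth rate of summed turnover over the large enterprises
    that reported in both months (matched panel, an assumption)."""
    common = prev.keys() & curr.keys()
    base = sum(prev[unit] for unit in common)
    if base == 0:
        raise ValueError("no common turnover to compute a growth rate from")
    return sum(curr[unit] for unit in common) / base - 1.0


def first_estimate(total_prev, prev, curr):
    """Apply the large-enterprise growth rate to the previous month's total
    for the whole target population (the model assumption in the text)."""
    return total_prev * (1.0 + large_enterprise_growth(prev, curr))


# Invented survey responses: unit C did not respond in the current month,
# unit D is new, so only A and B enter the growth rate.
survey_prev = {"A": 200.0, "B": 300.0, "C": 50.0}
survey_curr = {"A": 210.0, "B": 315.0, "D": 80.0}
growth = large_enterprise_growth(survey_prev, survey_curr)
estimate = first_estimate(1000.0, survey_prev, survey_curr)
print(f"growth {growth:.1%}, estimated population turnover {estimate:.0f}")
```

Because the whole population total moves with a handful of survey responses, a single erroneous or missing response has a direct effect on the estimate.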
Figure 4 shows that the growth rates after 75 days (the first ones produced with VAT) are
structurally higher than the estimates after 225 days, i.e. they are biased upwards. This is caused by
uncertainty about the active population 75 days after the month, which in turn stems from time-lags
in VAT and Business Register information about starting and stopping enterprises, leading to
uncertainty about which missing VAT-data need to be imputed (Maasing et al., 2013). The results
also show large revisions in summer 2010 and summer 2011. Further analyses have revealed that
these high revisions are related to a low survey response in the summer periods. This example
illustrates a major drawback of this estimation method: as a small dataset is grossed up to the entire
target population, errors and irregularities in this dataset may easily lead to errors in the estimates.
Hence, it is recommended to check the (few) available data thoroughly if they are used for a
provisional estimation of many small enterprises.
Figure 3 Growth rates derived from a) LE-estimate 30 days after the end of the month, b) LE-estimate 45
days after the end of the month based on a larger sample, c) LE + VAT estimate 75 days after the end of the
month using incomplete VAT-data. Revisions are shown in bars below.
Figure 4 Growth rates derived from a) LE-estimate 45 days after the end of the month, b) LE + VAT
estimate 75 days after the end of the month and c) a final estimate 225 days after the end of the month,
when the VAT-data and information about the population are complete.
These series provide useful information about revisions and the quality of the series. The results
generally agree with the observations of deliverables 4.1, 4.2 and 4.3 of the ESSnet Admin Data.
However, they also reveal additional issues (slight structural bias in t+45 day estimate; effect of
lower responses in summer) that were only detected afterwards. In the next chapter, the approach of
structural revision analyses and an accompanying revision sheet is presented, in order to detect
larger revisions and underlying causes at an earlier stage.
Column groups: Size (MAR, RMAR, MaxAR, MeAR); Direction (MR, % > 0, % < 0, MeR); Variability (SDR, Range); Impact on growth rates direction (% Sign(Later) = Sign(Early)).
| Variable | Domain | Vintage | No. of occurrences | Period | MAR | RMAR | MaxAR | MeAR | MR | % > 0 | % < 0 | MeR | SDR | Range | % Sign(Later) = Sign(Early) |
| Turnover | Retail trade | t+75 | 36 | Jan09-Dec11 | 0.7 | 0.2 | 1.4 | 0.7 | -0.7 | 0.0 | 100.0 | -0.7 | 0.3 | 1.3 | 94.4 |
| Turnover | Construction | t+75 | 36 | Jan09-Dec11 | 1.5 | 0.1 | 6.3 | 1.5 | -1.0 | 8.3 | 91.7 | -1.2 | 1.6 | 9.4 | 94.4 |
| Wages and salaries | Retail trade | t+45 | 36 | Jan09-Dec11 | 0.8 | n.a. | 1.6 | 0.8 | -0.8 | 0.0 | 100.0 | -0.8 | 0.3 | 1.6 | n.a. |
| Wages and salaries | Whole economy | t+45 | 36 | Jan09-Dec11 | 0.8 | n.a. | 1.7 | 0.7 | -0.8 | 0.0 | 100.0 | -0.7 | 0.3 | 1.5 | n.a. |
Table 5 - Summary statistics on revisions - Finland
6. A structural way to analyse revisions: an example from Italy
6.1 Introduction
A structured way to analyse revisions is proposed in this section. It takes into account the general
context as described in chapters 1 to 5. These analyses focus on:
1. the revision between the first and the last estimate, although the method can be used between
any two vintages;
2. the original series (not adjusted for calendar or seasonal factors);
3. the growth rate as the target parameter.
The framework presented here aims to provide a concrete example of structural revision analyses
along the lines sketched in the previous chapters. It continues the work of Roestel (2011): although
the example is worked out for social security data of Italy and VAT-data for Estonia, it builds on the
attempt to formalise VAT-analyses in the Netherlands and Germany (Roestel, 2011). More
specifically, it provides the tools to assess the size, systematic nature and plausibility of revisions
and to direct further efforts to understand the causes of revisions, splitting up the revision into the
part due to admin data and the part due to survey data.
The framework proposed follows a top-down approach, as illustrated in Figure 5.
The aim of this example is not only to provide a concrete example of revision analyses and show
which factors in the statistical process may lead to revisions, but also to demonstrate which data and
time-series need to be stored to perform structural revision analyses. The importance of the latter
remark must not be under-estimated, because several admin data STS systems exist where there is
no opportunity to perform structural (automatic) revision analyses.
6.2 General outline: 5 steps
In the first step, general information on the target parameters and their estimation procedures is
summarised according to a pre-defined form, in order to provide the context for the analysis of
revisions (step 1).
In the second step, synthetic measures of revisions are reported for all the publication domains (step
2). From this bird’s eye view of the aggregate revisions of all the publication domains, the analysis
drills down in two directions. The first direction (step 3) aims at deepening the analysis only on
domains that have shown problems in step 2 through:
a) a time-series representation of the revisions; and
b) decomposition, showing the contributions of admin and survey data.
The second direction aims at detecting, at the level of estimation methodology, the critical issues
that may have influenced revisions (step 4), either with cross-domain analysis or, again, by drilling
down into specific domains. Being method-specific, the appropriate analysis for this second
direction can only be chosen in the specific context. In the final step (step 5) a cause and effect
report is prepared in order to summarise the most relevant issues which have emerged from the
previous analyses.
This framework is intended as a monitoring tool, for internal control and not for publication, which
might provide useful feedback for process management and in the evaluation of the estimates for
future improvements.
In this context, the purpose of this analysis framework is threefold. Firstly, the analysis of revisions
can be useful in experimental contexts to evaluate a tentative methodology and possibly compare
some variants. Secondly, even in current production contexts it can be a useful tool to monitor the
performance of the estimates, through the analysis of characteristics and causes of revisions.
Thirdly, applying these analyses to subsequent deliveries of admin data referring to the same period
should help to evaluate the gains in terms of revisions of successive deliveries and to design a
revision policy.
The proposed analysis requires not only that all vintages of macro data are stored, but also, for the
analysis in steps 3 and 4, that it is possible to replicate the data situation of the different sources at
different points in time. In other words, the database system should be able to track the changes in
micro data (value substitutions, availability of late units, changes in the SBR etc.).
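One possible way to meet this storage requirement is to keep every received micro-data value together with the date it reached the NSI, never overwriting earlier values. The sketch below is a hypothetical minimal design (the `Record` fields and the `snapshot` function are assumptions, not a description of any NSI's database) showing how the data situation at a given date can then be replicated.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    unit_id: str
    period: str      # reference period, e.g. "2011Q1"
    source: str      # "admin" or "survey"
    value: float
    received: date   # date on which this value reached the NSI


def snapshot(records, period, as_of):
    """Return the latest value per (unit, source) for `period`,
    using only what had been received on or before `as_of`."""
    latest = {}
    for rec in sorted(records, key=lambda r: r.received):
        if rec.period == period and rec.received <= as_of:
            latest[(rec.unit_id, rec.source)] = rec.value
    return latest


# Invented history: unit A is later amended, unit B reports late.
history = [
    Record("A", "2011Q1", "admin", 10.0, date(2011, 5, 15)),
    Record("B", "2011Q1", "admin", 7.0, date(2011, 7, 1)),
    Record("A", "2011Q1", "admin", 12.0, date(2012, 5, 15)),
]
first_view = snapshot(history, "2011Q1", date(2011, 5, 30))
final_view = snapshot(history, "2011Q1", date(2012, 6, 1))
print(first_view, final_view)
```

Comparing two snapshots separates the revision due to late units (B) from the revision due to value substitutions (A), which is what the decomposition in step 3 needs.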
Figure 5. The revisions analysis framework
Step 1 - Context information:
- Target parameters.
- Data sources.
- Estimation procedures.
Step 2 - Revision measures (first/last estimate): synthetic measures for publication domains.
Step 3 - Analysis of problematic domains:
- Graphical analysis.
- Revision decomposition (survey/admin).
Step 4 - Further analysis: estimation methodology (country specific).
Step 5 - Cause effect analysis:
- Legislation changes.
- Sudden (unexpected) drop in data depending on?
- External original source revision of microdata.
- Internal processing revision of microdata.
- Changes in methodology.
- Reclassification/errors in Nace code.
- Others.
Chapter 6.3 shows how this framework is used for quarterly estimates of the number of employees
based on Social Security in Italy (Baldi et al. 2012). An additional example on the use of VAT for
turnover estimates for the retail trade in Estonia is shown in Appendix 1. A further application is
found in Langford and Teneva (2012). Note that these examples are presented as case studies and
the framework can be adapted to a more general situation.
6.3 Step 1: context information – example Italian employment data
The purpose of the ‘context information’ section is to provide context information and metadata to
understand the analysis of revisions that follows.
This template and the corresponding analyses are designed to study the revisions per indicator
(same target variable, same source, same periodicity, same deadline). Therefore, a separate form for
each indicator should be compiled. The template is filled in with the Italian data in Table 6.
However, it is relatively straightforward to complete this form for admin data based statistics (VAT
or social security data) in other countries. In Appendix 1 this example has been filled in for Estonia.
General information
| Indicator | Number of employees |
| Target domains | All divisions of the B-N (Nace Rev.2) aggregate |
| Periodicity | Quarterly |
| Number of routine revisions | 1 (q-4) |
| Deadline of the first estimate | 60 days |
| Release of the first estimate | 60 days |
| Release of the second estimate | 150 days |
| Release of the third estimate | 240 days |
| Release of the fourth estimate | 330 days |
| Release of the final estimate | 420 days |

Sub-populations
| Large enterprises | Enterprises with 500 employees or more |
| Share of LEs in terms of target variable (i.e. turnover or employment) | 20% of total employment |
| Small and medium enterprises | Enterprises with less than 500 employees |
| Share of SMEs in terms of target variable (i.e. turnover or employment) | 80% of total employment |

First estimate - Large enterprises
| Use of survey | Yes, census, 1400 enterprises |
| % of survey respondents on final data | 84.2% (2011 average value) |
| % of target variable on final data | 87.7% (2011 average value) |
| Use of admin data | No |
| % of data reporters on final data | |
| % of target variable on final data | |
| Estimator | Enumeration of available data + imputation of missing units |

First estimate - Small and medium enterprises
| Use of survey | No |
| % of survey respondents on final data | |
| % of target variable on final data | |
| Use of admin data | Yes: direct use of almost complete data |
| % of data reporters on final data | 98.15% (2007-2010 average value) |
| % of main variable on final data | 98.14% (2007-2010 average value) |
| Estimator | Enumeration of available data + imputation of missing units. |

First estimate - Combined
| Combined estimate between LEs and SMEs | Sum of LEs and SMEs estimation results by domain |
| Notes | In the imputation procedure, the list of active non reporting units is predicted through adjacent reporting. Employment values are imputed using E(t)/E(t-1) calculated by domains and on panel reporting units |

Final estimate - Large enterprises
| Use of survey | Yes |
| % of survey respondents on final data | 96.8% (2011 average value) |
| % of target variable on final data | 98% (2011 average value) |
| Use of admin data | No |
| % of survey respondents on final data | |
| % of target variable on final data | |
| Estimator | Enumeration of available data (included late respondents not included in the preliminary estimate) + imputation of missing units on the basis of a deterministic approach |

Final estimate - Small and medium enterprises
| Use of survey | No |
| % of survey respondents on final data | |
| % of target variable on final data | |
| Use of admin data | Yes |
| % of data reporters on final data | 100% |
| % of main variable on final data | 100% |
| Estimator | Enumeration of available data |

Final estimate - Combined
| Combined estimate between LEs and SMEs | Sum of LEs and SMEs estimation results by domain |
| Notes | |

Table 6. Context information, as filled in for quarterly employment estimates in Italy.
In relation to the framework, it is worth emphasising that the subsection on “sub-populations”
should list each sub-population of enterprises for which there is a significant difference in
data source (survey or admin data) or methodology.
Routine revisions in this template are defined as all the regular revisions scheduled according to the
revision policy. Occasional revisions, due to a change in methodology or unexpected events, are not
mentioned in this part of the framework. If a country is still developing an STS-production system
which uses admin data, this template may be useful to develop a tentative revision policy.
All information relevant to understanding the methodology, especially with regard to the analysis of
revisions, should be reported under the line ‘notes’.
With regard to the Italian example, some details are evident from the template; a summary with
further details follows. The Italian employment estimates are obtained with a mixed mode
approach where the population of enterprises with at least 500 employees is covered by a traditional
survey and the population of small and medium enterprises is covered with Social Security data
(Baldi et al. 2011). The large enterprises survey (LE-survey) is a monthly panel survey that follows
all the enterprises that had at least 500 employees in the base year regardless of whether their size
remained over the threshold since that year. In order to cover the population of enterprises not
represented by the survey, a complementary list of enterprises is built from the Social Security data
in the base year and maintained in every quarter by adding the firms that entered the population. In
every period, the estimate is produced by adding the estimated employment of the LE-survey to the
estimated employment derived from the admin data. The preliminary first estimates of the LE-
survey are obtained by adding imputed values of the non-reporters to the enumeration of the
reporters. Once a year the LE-survey data of the previous 12 months are revised, replacing the
imputed values with the reported values.
The estimates for the population of SMEs covered by the admin data, shown in the example, are
obtained with an imputation procedure much like the one used in Germany for the turnover
estimates based on VAT data. It is a current population methodology in which the STS-estimate for
the target population of enterprises for month t is approximated by the early reporters for that
month, plus imputed values for the enterprises assumed active for that month but not reporting. The
list of such enterprises is formed using the following deterministic rules:
- In each of the first two months of a quarter, a unit is declared missing (that is, assumed to be
  active) if it has reported for both the month before and the month after the reference month.
- For the third month of the quarter, for which there is no information about the following
  month, the unit is assumed to be active if it has reported the month before.
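The two deterministic rules above, together with the growth-rate imputation described in the following paragraph, can be sketched as below. This is a simplified illustration, not ISTAT's production code: months are represented by consecutive integers, domains are ignored, and the names `missing_units` and `impute` as well as the data are hypothetical.

```python
def missing_units(reported, month, is_third_month):
    """Units assumed active in `month` although they have not reported for it.

    reported: dict mapping a month index to the set of units that have
              reported for that month in the current transmission.
    """
    before = reported.get(month - 1, set())
    current = reported.get(month, set())
    if is_third_month:
        # No information on the following month: rely on the month before.
        candidates = set(before)
    else:
        # Must have reported for both the month before and the month after.
        candidates = before & reported.get(month + 1, set())
    return candidates - current


def impute(values, month, missing):
    """Impute employment as the unit's previous value times the growth rate
    of the units that reported in both month-1 and month."""
    prev, curr = values[month - 1], values[month]
    panel = prev.keys() & curr.keys()
    base = sum(prev[unit] for unit in panel)
    if base == 0:
        raise ValueError("no panel units to derive a growth rate from")
    growth = sum(curr[unit] for unit in panel) / base
    return {unit: prev[unit] * growth for unit in missing}


# Invented employment data: unit C has not (yet) reported for month 2.
values = {
    1: {"A": 10.0, "B": 20.0, "C": 30.0},
    2: {"A": 11.0, "B": 22.0},
    3: {"A": 12.0, "B": 22.0, "C": 33.0},
}
reported = {month: set(units) for month, units in values.items()}
missing = missing_units(reported, 2, is_third_month=False)
imputed = impute(values, 2, missing)
print(missing, imputed)
```

When the late report of unit C for month 2 eventually arrives, the difference between its reported value and the imputed one is the admin data contribution to the revision for that month.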
The employment value is imputed by applying the growth rate between month t-1 and month t (of
the units present in both months) to the value of employment for the missing unit at time t-1. In
formulae, the provisional population of units $P_t^{p}$ is composed of the population of early
reporters $P_t^{er}$ and the population of units defined as missing reporters $P_t^{mr}$:
$$P_t^{p} = P_t^{er} \cup P_t^{mr} \qquad [1]$$
The above deterministic rules imply that a unit is defined to belong to the population of missing
units for the month t as follows:
$$i \in P_t^{mr} \ \text{if} \ i \in P_{t-1}^{er} \ \text{and} \ i \in P_{t+1}^{er}$$
when t is the first or second month of the quarter;
$$i \in P_t^{mr} \ \text{if} \ i \in P_{t-1}^{er}$$
when t is the third month of the quarter.
The value imputed for a missing unit is therefore:
$$\hat{y}_{it} = y_{i,t-1} \, \frac{\sum_{j \in P_t^{er} \cap P_{t-1}^{er}} y_{jt}}{\sum_{j \in P_t^{er} \cap P_{t-1}^{er}} y_{j,t-1}} \qquad [2]$$
6.4 Step 2: revision measures – example for Italian employment data
This step is the heart of the revision analyses, as it provides a set of summary statistics
about the revisions for all the publication domains. The revisions in question are meant to be
evaluated on the year-on-year growth rates.
The literature indicates a variety of summary measures (see for instance Di Fonzo, 2005; McKenzie
and Gamba, 2008). In the following table (table 2), the most commonly used are examined. The
summary measures are classified in four groups, depending on the kind of information they provide
(size, direction, variability, impact on sign of growth rates). A detailed description of the summary
statistics of table 6, together with their formulae, is provided in Appendix 2.
The aim of these revision measures is to provide two kinds of indications:
- a general idea about the (size of) revisions and a check on whether there is anything systematic in
  the revisions;
- the identification of domains that need in-depth analysis. The analyst can define rules and
  thresholds to identify the problematic domains. One can also have a pair of thresholds for
  each indicator: a first to indicate mild problems and a second to indicate severe
  problems.
The table has been filled in with the results on the Italian estimates for 10 selected domains only.
The most critical values of the summary statistics report are picked out in yellow.
The main points highlighted by the analysis are:
1. Overall the revisions are quite limited in size: they range in terms of Mean Absolute Revision
between 0.1 and 0.3 with the exceptions of division 10 (0.4) and division 81 (0.8). In terms of
Median Absolute Revision they are below 0.3, with the only exception of division 81 (0.6).
2. The analysis of direction measures shows that the revisions are slightly but systematically
positive, implying a small under-estimation in the first estimates.
3. On the basis of the Max Absolute Revision and the size of the domain, three cases are chosen for a
more detailed analysis: division 10 (manufacture of food products), division 41 (construction of
buildings), division 81 (services to buildings and landscape activities).
The first conclusion from these results is that the first estimates tend to be too low, which is useful
information when publishing the results.
More detailed analyses, relating this under-estimation to:
- the large enterprise survey;
- the estimation procedure; and/or
- the uncertainty in the provisional target population
are discussed in the next chapter and are of course crucial to improving the system (or comparing
systems across countries). The second conclusion from these results is that divisions 10, 41 and 81
are weak domains, which is again useful information when publishing the results.
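The summary measures used in this step can be sketched in Python as follows. The formulae of the
report are given in Appendix 2 and are not reproduced here, so this sketch rests on assumptions:
a revision is taken as the later minus the earlier growth rate, SDR is computed as a sample standard
deviation, Range as maximum minus minimum revision, and RMAR is left out. The function names and
the two threshold values are hypothetical.

```python
from statistics import mean, median, stdev


def revision_summary(early, later):
    """Summary measures for one domain.

    early, later: first and last estimates of the year-on-year growth rate (%)
    for the same sequence of periods.
    """
    rev = [l - e for e, l in zip(early, later)]  # revision = later - early
    abs_rev = [abs(r) for r in rev]
    n = len(rev)
    # A growth rate of exactly zero is treated as "not positive" here.
    same_sign = sum((e > 0) == (l > 0) for e, l in zip(early, later))
    return {
        "MAR": mean(abs_rev),        # size
        "MeAR": median(abs_rev),
        "MaxAR": max(abs_rev),
        "MR": mean(rev),             # direction
        "MeR": median(rev),
        "pct_positive": 100.0 * sum(r > 0 for r in rev) / n,
        "pct_negative": 100.0 * sum(r < 0 for r in rev) / n,
        "SDR": stdev(rev),           # variability (sample standard deviation)
        "Range": max(rev) - min(rev),
        "pct_same_sign": 100.0 * same_sign / n,  # impact on sign of growth rates
    }


def flag(value, mild, severe):
    """Classify an indicator against a pair of analyst-defined thresholds."""
    if value >= severe:
        return "severe"
    if value >= mild:
        return "mild"
    return "ok"


early = [1.0, -2.0, 0.5, 3.0]
later = [1.5, -1.5, 0.5, 2.5]
summary = revision_summary(early, later)
status = flag(summary["MAR"], mild=0.3, severe=0.6)
```

Running the same function per publication domain and flagging each indicator gives a table of the
kind shown in Table 6, with the flagged cells playing the role of the highlighted values.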
6.5 Step 2a: graphical analysis of problematic domains
For each problematic domain selected from the previous analysis, a graph comparing the first and
the last estimate and related revisions is shown below (Figure 6). The aim of these graphical
analyses is to determine whether the weak domains have large revisions for all periods or only
certain periods. It is important to address this for two reasons – firstly for explaining published
results, and secondly in case high revisions occur regularly and are related to the business cycle
or seasonality, which would show that the chosen methodology is unsuitable for that particular
domain.
Domain  Period          N  Employees   Cov.   MAR  RMAR  MaxAR  MeAR    MR    %>0    %<0   MeR   SDR  Range   Sign
10      2008:1-2010:3  11    307,084   15.3   0.4   0.3    1.5   0.1   0.4  100.0    0.0   0.1   0.6    1.5   90.9
15      2008:1-2010:3  11    122,478    4.6   0.2   0.0    0.4   0.1   0.1   81.8   18.2   0.1   0.2    0.6  100.0
25      2008:1-2010:3  11    510,910    2.3   0.1   0.0    0.3   0.1   0.1   81.8   18.2   0.1   0.1    0.5  100.0
28      2008:1-2010:3  11    426,635   15.4   0.2   0.1    0.5   0.2   0.2   90.9    9.1   0.2   0.1    0.5   90.9
30      2008:1-2010:3  11     92,197   53.8   0.3   0.1    1.0   0.2   0.1   63.6   36.4   0.1   0.4    1.3  100.0
41      2008:1-2010:3  11    469,929    1.8   0.2   0.0    0.9   0.1   0.1   63.6   36.4   0.0   0.3    1.0  100.0
47      2008:1-2010:3  11  1,033,858   22.7   0.2   0.1    0.6   0.2   0.2   90.9    9.1   0.2   0.2    0.6   90.9
64      2008:1-2010:3  11    388,278   75.3   0.2   0.2    0.6   0.3   0.2   90.9    9.1   0.3   0.2    0.7  100.0
71      2008:1-2010:3  11     78,519   11.9   0.2   0.1    0.6   0.2   0.2   90.9    9.1   0.2   0.2    0.6   90.9
81      2008:1-2010:3  11    426,375   19.8   0.8   0.4    2.2   0.6   0.8  100.0    0.0   0.6   0.6    1.9   90.9

Column groups (revision error): Size = MAR, RMAR, MaxAR, MeAR; Direction = MR, %>0, %<0, MeR;
Variability = SDR, Range; Impact on growth rates direction = Sign.
N = no. of occurrences (for which revisions are calculated); Employees = size of domain (target
population), no. of employees; Cov. = survey coverage, target variable (% on target population);
%>0 and %<0 = % of revisions > 0 and < 0; Sign = % Sign(Later) = Sign(Early).
Table 6. Summary statistics on revisions (application to Italian employment estimates)
Figure 6. Graphical analyses: preliminary estimate, last estimate and revisions (application to Italian
employment estimates).
The graphical analyses show that high revisions are concentrated at the end of 2009 for division 10,
while for division 41 the only noticeable revision (2010 Q2) is very small compared to the size of
the growth rate (-15%). Division 81 is different since it shows sizeable, positive, revisions in almost
every quarter, with an increase at the beginning of 2010.
6.6 Step 2b: decomposition of the revision error into a survey part and an admin
data part
In this step, for the problematic domains, the revisions are broken down according to the
contribution due to the survey and the contribution due to admin data. The exact calculations behind
this decomposition are provided in Appendix 2. In this chapter only the results will be presented.
A simple way of presenting the revision and its components can be through graphs like those in
Figure 7. The red line in this graph is the total revision and the bars represent the contribution of
the two sources of data. The application to the three selected Italian divisions is shown.
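A simplified version of such a decomposition can be sketched in Python as follows. The exact
calculation of the report is the one in Appendix 2; the sketch below is only an additive
approximation under a stated assumption, namely that the level of one year earlier (the base of the
year-on-year growth rate) is the same in the first and in the last vintage. The function name and the
dictionary keys are hypothetical.

```python
def growth_revision_contributions(prelim, final, base_total):
    """Split the revision of a year-on-year growth rate (%) by data source.

    prelim, final: dicts with the level estimates of the 'survey' (LE) part and
    the 'admin' (SME) part for period t, in the first and in the last vintage.
    base_total: total level one year earlier, assumed identical in both vintages.
    """
    contrib = {
        src: 100.0 * (final[src] - prelim[src]) / base_total
        for src in ("survey", "admin")
    }
    # With a fixed base the two contributions add up to the total revision.
    contrib["total"] = contrib["survey"] + contrib["admin"]
    return contrib


prelim = {"survey": 400.0, "admin": 590.0}
final = {"survey": 404.0, "admin": 596.0}
contrib = growth_revision_contributions(prelim, final, base_total=1000.0)
```

In this toy case the growth rate is revised upwards by 1.0 percentage point, of which 0.4 comes from
the survey part and 0.6 from the admin data part. If the base period is itself revised between
vintages, the contributions are no longer exactly additive and a fuller formula is needed.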
Figure 7. Graphical analysis: decomposition of revisions due to admin and survey data.
Decomposing the revisions in this way shows that for division 10 in 2009 Q3 and 2009 Q4 the
revisions were substantially due to the LE-survey. The explanation is that a couple of influential
enterprises were missing for the preliminary estimates and imputed. Since they had a seasonal
pattern different from that of the respondent enterprises, the imputation procedure was unable to
predict their values accurately.
For the other two divisions, where LEs are much less relevant, the bulk of revisions are due to the
admin data. In division 81, in the last part of the time series, the contribution of the large enterprises
compensates a little for the contribution of the SMEs.
This Figure is important because it reveals that large revisions might be due to missing data in the
LE-survey. Due to the significant contribution of the LE-survey data to the estimate, the data
collection and imputation of the LE-survey also need to be sound in an admin data based STS-
system! If this is not the case, large revisions in admin data based STS-estimates may arise which
are NOT caused by deficiencies in the admin data.
Large revisions caused by the LE-survey were also detected in a new VAT-system for quarterly
turnover estimates in the Netherlands, in a VAT-based STS-system in Estonia (Appendix 1) and
were also described by Roestel (2012) when using VAT-data in Germany. Hence, there seems to be
a general problem of large revisions due to missing (or revised) data in the STS-survey, not just
with Italian employment data.
6.7 Step 3: further analysis
This step focuses on the main mechanisms that may have caused revisions. The analysis drills
down into the most critical features of the adopted method, to identify and quantify the contribution
of the most crucial steps of the estimation methodology to the revision. This step covers
method-specific explorations and is therefore left to each NSI. For methodologies based on imputation
procedures, it is recommended that the analysis addresses whether the revisions are due to the
identification of units for imputation or to the applied method of imputation. Further insights may
come from analysis aimed at understanding how the methodology works in sectors characterised by
different dynamics or behaviours (seasonality, trend, business demography etc.).
In the example of Italian employment estimates, because the table of summary measures of
revisions has highlighted a slight but systematically positive revision error, further analysis is
required into problems with the imputation procedure for the part covered by admin data. Since this
procedure is composed of two parts, it is useful to see whether the errors are due to the imputation
method or to the uncertainty in the active population at the time of the first estimates.
To clarify the results, using the same notation as in Appendices 1 and 2 and dropping the redundant
subscript and superscripts, the preliminary estimate may be written as:

$$Y_t^{p} = \sum_{i \in P_t^{p}} \hat{y}_{it} \qquad [3]$$

The preliminary estimate can thus be rewritten, separating the reported values of the early reporters
from the imputed values of the missing reporters, as:

$$Y_t^{p} = \sum_{i \in P_t^{er}} y_{it} + \sum_{i \in P_t^{mr}} \hat{y}_{it} \qquad [4]$$

Analogously, since the final population may be seen as composed of the early reporters $P_t^{er}$ and
the late reporters $P_t^{lr}$ (because $P_t^{l} = P_t^{er} \cup P_t^{lr}$), the last estimate is defined as:

$$Y_t^{l} = \sum_{i \in P_t^{er}} y_{it} + \sum_{i \in P_t^{lr}} y_{it} \qquad [5]$$

The difference between the final estimate and the preliminary estimate (the revision error) on the
subpopulation is thus:

$$Y_t^{l} - Y_t^{p} = \Big( \sum_{i \in P_t^{er}} y_{it} + \sum_{i \in P_t^{lr}} y_{it} \Big) - \Big( \sum_{i \in P_t^{er}} y_{it} + \sum_{i \in P_t^{mr}} \hat{y}_{it} \Big) = \sum_{i \in P_t^{lr}} y_{it} - \sum_{i \in P_t^{mr}} \hat{y}_{it} \qquad [6]$$

Since part of the population of assumed missing reporters is effectively constituted by late reporters,
this can also be written as:

$$Y_t^{l} - Y_t^{p} = \sum_{i \in P_t^{lr}} y_{it} - \sum_{i \in P_t^{mr}} \hat{y}_{it} = \sum_{i \in P_t^{lr} \cap P_t^{mr}} (y_{it} - \hat{y}_{it}) + \sum_{i \in P_t^{lr} \setminus (P_t^{lr} \cap P_t^{mr})} y_{it} - \sum_{i \in P_t^{mr} \setminus (P_t^{lr} \cap P_t^{mr})} \hat{y}_{it} \qquad [7]$$

where the first term represents the imputation error on the population of units correctly assumed as
active, the second term represents the under-estimation due to the population of units which were
not identified as active (or were incorrectly defined as inactive) and the third term represents the
over-estimation due to the population of units incorrectly defined as active.
Table 7 shows for the selected domains the terms described above as averages for the period 2008-
2010. All values are expressed as percentage share of the last estimate. Starting from the total
preliminary estimate (column h) we find, as in Table 6, the result of a slight downward bias. On
average, this accounts for a maximum of 0.2-0.3 percentage points, with the exception of division
81. Analysing the results in more detail, it can be seen that there are no significant differences (for
the population of enterprises imputed at t and then reporting in the final population) between the
imputed values and the reported values (columns b and c). This result suggests that the imputation
method on average is not responsible for the error. The slight downward bias derives instead from
errors in the provisional population: in every domain the values imputed for units which are
incorrectly defined as active (column d) are smaller than the values reported by units which were
incorrectly identified as inactive in the provisional population (column e).
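The three terms of decomposition [7] can be computed directly from the late reporters' values and the
imputed values. The following Python sketch illustrates this; the function name and the dictionary
layout ({unit: value}) are assumptions made for the example and are not part of the Italian
production system.

```python
def decompose_revision(reported_late, imputed):
    """Three-term decomposition of the revision error on the admin data part.

    reported_late: {unit: y_it} for the late reporters (final data).
    imputed: {unit: yhat_it} for the units assumed missing in the first estimate.
    """
    both = reported_late.keys() & imputed.keys()
    # Units correctly assumed active: reported minus imputed value.
    imputation_error = sum(reported_late[i] - imputed[i] for i in both)
    # Late reporters that were not imputed (under-estimation).
    not_imputed = sum(v for i, v in reported_late.items() if i not in both)
    # Imputed units that never reported (over-estimation).
    wrongly_imputed = sum(v for i, v in imputed.items() if i not in both)
    total = imputation_error + not_imputed - wrongly_imputed
    return imputation_error, not_imputed, wrongly_imputed, total


reported_late = {"C": 6.0, "D": 3.0}  # D was not identified as active
imputed = {"C": 5.5, "E": 2.0}        # E was imputed but never reported
terms = decompose_revision(reported_late, imputed)
```

In the toy data the total revision (1.5) equals the sum of the late reporters' values (9.0) minus the
sum of the imputed values (7.5), as required by [6].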
Table 7. Summary results on the Italian imputation method
(a) Early reporters
(b) Units with imputed missing values at t, with later reporting: imputed values in the preliminary
    estimates
(c) Units with imputed missing values at t, with later reporting: reported values in the final data
(d) Units with imputed missing values at t, without later reporting: imputed values in the
    preliminary estimates
(e) Units without imputed values in the preliminary estimates but reporting in the final data
(f) Imputed value, f=(b+d)
(g) Reported values, g=(c+e)
(h) Total estimated value, h=(a+f)
(i) Total reported value, i=(a+g)

Domain    (a)   (b)   (c)   (d)   (e)   (f)   (g)    (h)     (i)
10       98.2   1.2   1.2   0.4   0.7   1.6   1.8   99.8   100.0
15       98.4   1.0   1.0   0.5   0.7   1.5   1.6   99.8   100.0
25       98.6   0.9   0.9   0.3   0.4   1.3   1.4   99.9   100.0
28       98.5   1.0   1.0   0.3   0.5   1.3   1.5   99.8   100.0
30       98.1   1.2   1.2   0.5   0.7   1.7   1.9   99.8   100.0
41       97.7   1.4   1.4   0.8   1.0   2.2   2.3   99.9   100.0
47       98.0   1.2   1.2   0.6   0.8   1.8   2.0   99.8   100.0
64       98.2   1.2   1.2   0.3   0.6   1.5   1.8   99.7   100.0
71       98.3   1.1   1.1   0.3   0.6   1.5   1.7   99.8   100.0
81       96.3   1.8   1.8   0.7   2.0   2.5   3.7   98.8   100.0

The main conclusion from these analyses is that uncertainty in the active enterprise population
causes systematic revisions. This has been found not only with Social Security data in Italy, but also
with VAT-data in Estonia, Finland and Germany (Vlag et al., 2013).
6.8 Cause and effect report: a synthetic description of the main causes of
revisions
The main causes of revision that emerged from the previous analyses can be summarised in a cause
and effect report, using the following grid (Table 8). The first section of the grid describes the main
characteristics of the revisions from a general perspective; in the example it contains the synthetic
report on the main causes of revision for the Italian employment estimates.
In the second part of the grid, for the most problematic domains (10, 41 and 81 in the Italian
example), for each point in time for which the revision is particularly significant, the possible
causes are reported, according to a pre-specified codification (see below).
Table 8. Causes of point in time errors. Application to the Italian employment estimates

GENERIC (CROSS DOMAINS)
- Slight general under-estimation.
- Under-estimation systematically characterises SMEs subpopulation.
- Revisions of survey data are in general low and sometimes compensate for SMEs
  revisions. But for some domains they appear to be significant.
- Almost all domains record high revisions in the first and second quarters of 2010, due to
  an administrative change.

DOMAIN       POINT IN TIME ERRORS/  CAUSES  DESCRIPTION
             GENERIC REVISIONS
Division 10  2009q3                 SR      High revisions on LEs survey data. Sector
                                            characterised by seasonal activity.
             2009q4                 SR      High revisions on LEs survey data.
             2010q2                 LC      Severe change of the Social Security declaration
                                            form.
             Generic
Division 41  2010q2                 LC      Severe change of the Social Security declaration
                                            form
             Generic
Division 81  2009q1                 MP      General underestimation of the imputation method +
             2009q4                 MP      General underestimation of the imputation method +
             2010q1                 MR+LC   General underestimation of the imputation method +
             2010q2                 MP+LC   Severe change of the Social Security declaration
                                            form
             2010q3                 MP+LC   Severe change of the Social Security declaration
             Generic                        General underestimation of the method. Sector
                                            characterised by higher than average non-reporting
                                            rates and relevant business dynamics.

Classification of possible causes of revisions
Legislation changes (LC)
Sudden (unexpected) drop in data (AD)
External original source revision of microdata (MR)
Internal processing revision of microdata (PE)
Changes in methodology (MC)
Reclassification/errors in Nace code (NC)
Method performance (MP)
Revision of Survey data (SR)
Others (explain) (O)
Summarising, these revision analyses revealed that:
- the revisions are generally quite low, suggesting that the preliminary estimate is quite reliable.
  However, it appears that the first estimate is characterised by a small systematic under-
  estimation (due to the provisional active population estimate);
- regarding the estimation using admin data, the imputation for late reporters does not seem to be
  a problem provided the units to be imputed are correctly identified. The imputation procedure
  adopted, based on the month-on-month growth rate of the reporters, works well overall in
  estimating the employment of the units that are correctly identified as late reporters;
- more important is the (in)ability of the first estimates to capture the true population dynamics.
  In practice, over-imputation for inactive units does not fully compensate for the under-
  imputation of new starters;
- the imputation methodology did not fully cope with the unexpected and considerable drop in
  data due to the legislation change which occurred in 2010.
6.9 Generalisation and general remarks
Two generalisations can be considered depending on the available data in the individual countries.
- The first consists in analysing the average measures of revisions through time to check whether
  they are influenced by the business cycle or by changes in information availability (for
  instance because of legislation changes or other administrative issues). A simple way to do this
  is to calculate the table of summary revisions for sub-periods. An alternative is to follow
  revision measures on “sliding time windows”.
- A second useful generalisation is to study the revisions across vintages. This may highlight
  whether the estimates converge gradually to the final values or not (Röstel 2011). Moreover, it
  may provide insights into the impact of updates to the various sources (admin data, survey data,
  benchmarking, etc.).
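The first generalisation, following a revision measure on sliding time windows, can be sketched in
Python as follows. The choice of the Mean Absolute Revision as the measure, the window length and
the function name are assumptions made for the example.

```python
def rolling_mar(revisions, window):
    """Mean Absolute Revision on sliding windows of consecutive periods."""
    return [
        sum(abs(r) for r in revisions[k:k + window]) / window
        for k in range(len(revisions) - window + 1)
    ]


# Revisions of the growth rate (later minus earlier estimate) for five periods.
revisions = [0.5, -0.5, 1.0, 0.0, 2.0]
mar_by_window = rolling_mar(revisions, window=2)
```

A rise of the measure in the most recent windows, as in the last value of the toy series, points to a
change in the business cycle or in information availability rather than to a stable property of the
method.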
A general comment is that we have noticed that only a few countries structurally carry out revision
analyses. Moreover, the necessary data to perform revision analyses are sometimes not available, or
are difficult to extract from existing systems. While the available admin data used for the different
releases are often stored, it was sometimes difficult to trace back afterwards which version of the
Business Register or which LE-survey data were used for the consecutive estimates. The latter may
suggest that the contributions of these factors to the quality of the admin based STS-estimates might
be under-estimated when developing systems.
As revision analyses provide essential information about the quality of the preliminary estimates
and the underlying causes (missing admin data, missing LE-survey data and/or inability to capture
population dynamics), when developing an admin data based STS-system it is recommended that
the database is constructed in such a way that structural revision analyses can be carried out.
7 Conclusions
Analyses of current practices suggest that
- estimation methods for missing admin data;
- missing data in the large enterprises survey; and
- uncertainty of the active population when the estimates have to be made
are the most important components determining the quality of admin data based STS-estimates (see
deliverable 4.1 of the ESSnet on AdminData). If preliminary estimates have to be made when no
admin data are available, the factor ‘estimation methods for missing admin data’ is replaced by the
quality of temporary model-based estimations.
In this case a complete sequence of STS-estimates for period t (and its accompanying revisions)
should consist of:
- model-based estimates for small and medium sized enterprises, replaced by admin data
  estimates with a decreasing amount of missing admin data over time, and
- a survey-based estimate for the largest enterprises with a decreasing amount of missing survey
  data over time.
Within such a sequence, the size of revisions is the primary quality indicator of the accuracy of the
preliminary estimates. Moreover this indicator is available to the general public. Large and/or
systematic revisions may be interpreted as a signal of unreliability and undermine the credibility of
the preliminary estimates. Revision analyses are a powerful tool to increase the reliability of (admin
based) STS-estimates.
Revision analyses are highly recommended in the development stage of such a system, as they help
to evaluate the methodology, compare its variants and indicate the direction for improvements. If an
NSI is considering replacing a survey based approach with an admin data based or mixed approach,
the study of the revisions is essential to compare the old and the new methods. Another interesting
application for revision analyses is in the choice of the boundary between the survey part and the
admin data part.
Revision analyses are equally important during current production, as they can help discover sudden
events in the available admin (or survey) data, or changes in the business cycle, that are causing
larger revisions. Moreover, they can detect publication levels for which the preliminary estimates
are ‘strong’ and publication levels for which they are ‘weaker’. At the same time, revision analyses
reveal the strength of the estimates under normal or abnormal circumstances. The availability of
such information is very useful for fine-tuning the estimation methodology or explaining results to
the public.
In this context, it is remarkable that few countries perform structural revision analyses on their
admin data based STS series. In some cases, the necessary information cannot even be derived from
the database. Hence, one of the recommendations when developing or updating an admin data based
STS-system is that the database is constructed in such a way that structural revision analyses can be
carried out.
The proposed revision sheet might be a useful tool to analyse revisions structurally, because it
includes all necessary elements. Depending on the exact data situation in an individual country, the
practical use of such a sheet will differ per country.
Revision analyses are closely related to a revision policy. Generally the deadline for the first
estimates is determined by European Regulations or by national needs. The publication dates of
later (more final) estimates are often more flexible. As uncertainty in the provisional active
population is a major source for revisions and often leads to slightly biased (under- or
overestimated) growth rates in the first estimates, it is recommended that the final estimates are
made when both the admin data and survey data are complete, and the Statistical Business Register
has been finalised. Other revision points can be determined based on output obligations, or at times
when the missing or revised data become available.
Acknowledgements
This deliverable is the result of the work and discussions of the complete ESSnet WP4 group. The
authors would like to acknowledge the contributions and constructive comments of all the countries
participating in WP4 of the ESSnet Admin Data. The gratitude and appreciation of the authors are
sent to them all.
References
Baldi C., Congia M.C., Pacini S., Tuzi D. (2011). The STS-employment estimates in Italy based
on admin data. Deliverable of Work package 4.
Baldi C., Tuzi D. (2012). Analysis of revisions of admin data based short term statistics. Proposal
of a template and an application to Italian employment estimates. Interim report of Work package 4.
Baldi C., Ceccato F., Pacini S., Tuzi D. (2012). Imputation of employment admin data in Italy.
Interim report of the Work package 4.
De Waal, A.G., Vlag, P.A., Baldi, C. Tuzi, D. (2012), The use of administrative data for STS.
Situation I: Good coverage provided by administrative data. Milestone of Work package 4.
http://essnet.admindata.eu
Di Fonzo T. (2005). The OECD project on revisions analysis: First elements for discussion, paper
presented at the OECD STESEG Meeting, Paris, 27-28 June 2005
(http://www.oecd.org/dataoecd/55/17/35010765.pdf).
Eurostat (2012). ESS Guidelines on Revision Policy for Principal European Economic Indicators
(https://circabc.europa.eu/faces/jsp/extension/wai/navigation/container.jsp).
Kavaliauskiene D., (2011). Application of Ratio and GREG-Estimator to VAT for Monthly
Turnover Estimates. Deliverable of Work package 4.
Kavaliauskiene D, Slickute-Sestokiene M & Vlag P (2013), The use of regression estimators for
admin data based STS estimates, Deliverable 4.2 of ESSnetAdminData – SGA2011,
http://essnet.admindata.eu
Karus E. (2012). Revision analysis on Estonian Retail Trade Data. Presentation for Work package
4.
Kiema S., Remes T. (2012). Comparison of Imputation with Realisations. Presentation for Work
package 4.
Langford, A., and Teneva, M. (2012). Analysis of revisions of admin data based short term
statistics. Application to UK retail sales data and implications for the definition of the boundary
between survey and administrative data coverage. Internal report of the Work package 4 (upon
request).
Lorenz R. (2011). Current Results and Future Improvements in Respect of Estimates for Missing
Values in the VAT Registration. Deliverable of Work package 4.
Maasing E. (2012). Testing Imputations on Estonian Retail Trade Data. Presentation for work
package 4.
Maasing E, Remes T, Baldi C & Vlag P (2013), STS estimates based solely on admin data: final
results and recommendations, Deliverable 4.1 of ESSnetADminData – SGA2010,
http://essnet.admindata.eu
Mazzi G. L., Ruggeri Cannata, R. (2008). A Proposal for a Revisions Policy of Principal
European Economic Indicators (PEEIs), Contribution to the OECD/Eurostat Task Force on
Performing Revisions Analysis for Sub-Annual Economic Statistics
(http://www.oecd.org/dataoecd/44/39/40309491.pdf).
McKenzie R., Gamba M. (2008). Interpreting the results of Revision Analyses: Recommended
Summary Statistics. Contribution to the OECD/Eurostat Task Force on “Performing Revisions
Analysis for Sub-Annual Economic Statistics” (www.oecd.org/dataoecd/47/18/40315546.pdf).
OECD/Eurostat Task Force on Performing Revisions Analysis for Sub-Annual Economic
Statistics (2008). A basis for classifying reasons for revisions to short term statistics,
(http://www.oecd.org/dataoecd/44/37/40309451.pdf).
Orchard C., Langford A., Moore K. (2011). National practices of the use of administrative and
accounts data in UK short-term business statistics. Deliverable of Work package 4.
Röstel D. (2011). Attempts to improve methods of plausibility checks of combined turnover data in
German service statistics (STS). Deliverable of Work package 4 (SGA-2010).
Sirviö M. (2011a). Turnover Indices (incl. Retail Trade) and Value Added Tax (VAT) Data.
Deliverable of work package 4 (SGA-2010).
Sirviö M. (2011b). Industrial Production Index and Value Added Tax (VAT) Data. Deliverable of
Work package 4 (SGA-2010).
- M. (2011). Application of GREG Estimators for (Administrative Data Based)
Short Term Lithuanian Labour Statistics. Deliverable of Work package 4 (SGA-2010).
Teneva M. (2012). Use of VAT data in Monthly Business Survey. Presentation for Work package
4.
Toivanen E. (2012). Practical Experiences with Estimating Incomplete VAT Data. Presentation for
Work package 4.
Vlag P. (2012). Using Incomplete VAT-data for Turnover Estimates in Europe. Presentation for
Work package 4.
Vlag P., Ortega Azurduy S., Karus E. (2011a). The use of admin data for monthly and quarterly
estimates: common issues and challenges in Estonia, Finland, Germany, Italy, Lithuania, The
Netherlands and the United Kingdom. Deliverable of Work package 4 (SGA, 2010).
Vlag P., Ortega Azurduy S., Van Loon A., Scholtus S. (2011b). Monthly turnover estimates with
VAT: challenges in the Netherlands. Deliverable of Work package 4 (SGA, 2010).
Pieter Vlag, Reinier Bikker, Ton de Waal, Eetu Toivanen, Mila Teneva (2013), Extrapolating
admin data for early estimation: some findings and recommendations for the ESS, Deliverable 4.3
of ESSnetADminData – SGA2010, http://essnet.admindata.eu
Appendix 1. An application of revision analysis to the Estonian
turnover estimates on retail trade
This section describes the results of the application of revision analysis to the Estonian retail trade
turnover estimates (division 47 of the Nace Rev.2) (Karus 2012). The analysis refers to the year on
year growth rates of the target variable and covers the period from January to May 2012.
Table A3.1 below summarises the main information on sources, revision policy, estimation
method, coverage etc. for the domain considered.
The Estonian monthly estimates on retail trade turnover are based on a mixed source approach,
where survey data are complemented by admin data on VAT. A census survey collects information
on large enterprises (20+ persons employed) while data on medium enterprises (2-19 persons
employed) are collected through a sample survey. VAT admin data are used as the direct source for
the estimates for small enterprises (1 person employed) and as an auxiliary source for the imputation
of missing units in the survey sources. Outliers from both survey and admin data are corrected with
stratum averages. The estimates are based on a fixed population, which for the year 2012 was
created in November 2011 on the basis of a frozen version of the SBR. Only the changes for large
and important enterprises are taken into account while updating the population during the reference
year. In order to take exits into account in the admin data portion, the activity status is predicted
using information on VAT reporting: an enterprise is considered non-active if the VAT declaration
for the current and previous month was not reported. This information is reinforced using an
additional admin source: when the previous condition on VAT data holds, an enterprise is
considered non-active if the social security declaration for the current and previous month was not
reported or was reported with salary=0.
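The activity rule described above can be sketched in Python as follows. The sketch reflects one
reading of the rule, which is an assumption: a unit is treated as non-active only when, for both the
current and the previous month, no VAT declaration was reported and the social security declaration
was either missing or reported with a zero salary. The function name and the data layout
({month: value}) are hypothetical.

```python
def is_non_active(vat, social, t):
    """Predict the activity status of an enterprise for month t.

    vat: {month: declared turnover} for the months with a VAT declaration.
    social: {month: declared salary} for the months with a social security
    declaration.
    """
    months = (t - 1, t)
    no_vat = all(vat.get(m) is None for m in months)
    # Not reported (None) and reported with salary=0 both count as no salary.
    no_salary = all(not social.get(m) for m in months)
    return no_vat and no_salary


# No VAT declaration for months 2 and 3 and no social security declaration.
status_a = is_non_active({1: 100.0}, {}, t=3)
# No VAT declaration, but a positive salary declared for month 3.
status_b = is_non_active({}, {3: 1200.0}, t=3)
```

Under this reading the first enterprise is predicted to be non-active, while the second is kept in
the active population because of its positive salary declaration.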
Considering the available information, three versions of preliminary estimates are released (30 days,
60 days and 90 days after the reference month); further versions of the preliminary estimates are
also calculated through benchmarking, when quarterly survey data are available (benchmarking is
implemented in February, May, August and November). Benchmarking also allows elimination of
differences in definitions between survey and admin data. For some months the number of revisions
may reach six (see §3 for further details on the Estonian revision policy). Final estimates are
released 1 year after the reference month.
This new method, based on the exploitation of the VAT data, has been in production only since
January 2012. For this reason only a few vintages of the estimates are available at the moment.
Table A3.1 - Context information. Estonian retail trade turnover estimates
General Information
Indicator Turnover
Target Domains NACE 47
Periodicity Monthly
Number of routine revisions First estimate + 3 revisions
46
Deadline of the first estimate T+15 days after the reference month for the survey
T+20 days after the reference month for VAT data
Release of the first estimate T+30 days (1 month) from the reference month
Release of the second estimate T+60 days (2 months) from the reference month
Release of the third estimate T+90 days (3 months) from the reference month
47
February, May, August and November Benchmarking with quarterly survey data
…
Release of the final estimate T+365 days (12 months) from the reference month
Subpopulations
Large enterprises Census (survey) for large enterprises, 20 and more persons
employed
Share of LEs in terms of target variable
(turnover)
84.2% (January 2012 8th estimate in September 2012)
Medium enterprises Sample survey for medium enterprises, 2-9 persons employed and
10-19 persons employed
Share of MEs in terms of target variable
(turnover)
13.1% (January 2012 8th estimate in September 2012)
Small enterprises Admin data (VAT) for small enterprises, 1 person employed
Share of SEs in terms of target variable
(turnover)
2.7% (January 2012 8th estimate in September 2012)
First estimate
Large and medium enterprises
Use of survey Yes, census and sample survey.
Sample size – 567 enterprises
% of survey respondents on final data 91.2% (August 2012 first estimate LME survey respondents/January
2012 8th LME survey respondents)
% of target variable on final data 98.1% (January 2012 first LME respondents turnover
estimate/January 2012 8th LME respondents turnover estimate)
Use of admin data Yes. VAT for imputation of missing data.
% of survey respondents on final data 25 respondents
% of target variable on final data 0.9%
Estimator Census – enumeration of available data + imputation of missing
data. Sample survey – enumeration of available data + imputation of
missing data, weighting.
Small enterprises
Use of survey Yes. 7 outliers
% of survey respondents on final data 100% (January 2012 first SE outlier respondents/ January 2012 8th
SE outlier respondents)
% of target variable on final data 100% (January 2012 first SE outlier turnover/ January 2012 8th SE
outlier turnover)
Use of admin data Yes. Population 4331 (2012). Direct use VAT data, statistical
turnover is calculated.
% of data reporters on final data 97.5% (January 2012 first SE VAT respondents/January 2012 2 nd
SE VAT respondents)
% of main variable on final data 88.8% (January 2012 first SE VAT turnover estimate/January 2012
48
8th SE VAT turnover estimate)
Estimator Enumeration of available data + imputation of missing data. Outliers
(turnover exceeding average +/– 3x StdDev) imputed with average
of the stratum.
Combined estimate between LMEs and
SEs
Sum of large, medium and small enterprises estimation results by
domain.
Final estimate
Large and medium enterprises
Use of survey Yes, census and sample survey.
Sample size – 567 enterprises
% of survey respondents on final data 14.1% of survey and VAT respondents
% of target variable on final data 97.0% of total NACE 47 turnover
Use of admin data Yes. VAT for imputation of missing data.
% of survey respondents on final data 0.2% of survey and VAT respondents
% of target variable on final data 0.1% of total NACE 47 turnover
Estimator Census – enumeration of available data + imputation of missing
data. Sample survey – enumeration of available data + imputation of
missing data, weighting.
Small enterprises
Use of survey Yes. 7 outliers
% of survey respondents on final data
0.2% of survey and VAT respondents
% of target variable on final data 0.2% of total NACE 47 turnover
Use of admin data Yes. Population 4331 (2012). Direct use VAT data, statistical
turnover is calculated.
% of data reporters on final data 61.4% of survey and VAT respondents
% of main variable on final data 2.5% of total NACE 47 turnover
Estimator Enumeration of available data + imputation of missing data. Outliers
(turnover exceeding average +/– 3x StdDev) imputed with average
of the stratum.
Combined estimate between LMEs and
SEs
Sum of large, medium and small enterprises estimation results by
domain.
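The outlier rule quoted in the table (turnover outside average +/- 3x StdDev, imputed with the stratum average) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the production routine: the function name `impute_outliers`, the use of the sample standard deviation, and the choice to impute with the average of the non-outlying units are all assumptions, since the table does not specify these details.

```python
from statistics import mean, stdev

def impute_outliers(turnover, k=3.0):
    """Replace values outside mean +/- k*stdev with the mean of the other units.

    Assumption: the bounds are computed on all units of the stratum, and the
    imputed value is the average of the units that fall inside the bounds.
    """
    if len(turnover) < 2:
        return list(turnover)          # no spread can be computed
    centre = mean(turnover)
    spread = stdev(turnover)           # sample standard deviation (n - 1)
    lower, upper = centre - k * spread, centre + k * spread
    inliers = [v for v in turnover if lower <= v <= upper]
    if not inliers:
        return list(turnover)          # nothing to impute from
    replacement = mean(inliers)
    return [v if lower <= v <= upper else replacement for v in turnover]

# Hypothetical stratum: 19 ordinary units and one extreme VAT declaration.
stratum = [10.0] * 19 + [1000.0]
cleaned = impute_outliers(stratum)
```

With very small strata a single extreme value can never exceed three sample standard deviations, so a rule of this kind only bites when the stratum has enough units.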
In order to get comparable revisions, the following analysis compares the first estimate with the
fourth, which is not the final version. In addition to the information used in the first estimate
(see above for more details of the revision policy), the fourth estimate includes:
1. updated survey data;
2. updated admin data; and
3. at least two revisions due to benchmarking with quarterly data (for March, April and May
three revisions due to benchmarking).
The analysis focuses on division 47 and some of its sub-domains, as described in table A3.2.
Table A3.3. reports the summary statistics on revisions calculated as described above. The most
critical values of the summary statistics are highlighted in yellow.
Although the time-series is relatively short, some important points about the quality of the output
are apparent:
1) Revisions are quite small compared with the size of the growth rate: the RMAR takes the value
of 0.1 in all the domains considered. The Mean Absolute Revision ranges between 1 and 2.3
percentage points, and the largest single revision (3.3) is recorded in sub-domain G47-4711-472-473.
2) The analysis of direction measures shows that revisions are systematically positive (i.e. the first
estimates are slightly underestimated).
3) Looking at variability, the revisions in the domain G4711+472 are the most erratic.
An important difference is apparent between the sector with a large share of survey data
(G4711+472) and those with fewer survey data.
Table A3.2 - Description of the domains considered in the analysis
NACE code Description
G47 Retail trade, except of motor vehicles and motorcycles
G4711+472 Retail sale of food, beverages and tobacco
G47-4711-472-473 Retail sale of manufactured goods excl automotive fuel
G47-473 Retail sale excl. automotive fuel
Table A3.3 - Summary statistics on revisions
Columns: Domain; No. of occurrences; Period; Target population (No. of units); Survey coverage (% of target variable of population); Size (MAR, RMAR, MaxAR, MeAR); Direction (MR, % > 0, % < 0, MeR); Variability (SDR, Range); Impact on growth rates' direction (% Sign(Later)=Sign(Early), shown as "% same sign").

Domain            No. occ.  Period           No. units  Coverage  MAR  RMAR  MaxAR  MeAR  MR   % > 0  % < 0  MeR  SDR  Range  % same sign
G47               5         JAN2012-MAY2012  5566       97        1.2  0.1   2.4    1.5   1.1  80     20     1.5  1    2.6    100
G4711+472         5         JAN2012-MAY2012  277        94        1    0.1   2.5    1.1   0.6  80     20     0.4  1.3  3.5    100
G47-4711-472-473  5         JAN2012-MAY2012  609        64        2.3  0.1   3.3    2.4   2.3  100    0      2.4  1    2.4    100
G47-473           5         JAN2012-MAY2012  919        78        1.4  0.1   2.8    1.6   1.4  80     20     1.6  1.1  3      100
The time series of the first and fourth estimates are represented in the following graphs, together
with revisions.
Figure A3.1 - Graphical analysis: first estimate, fourth estimate and revisions.
The graphs confirm a systematic under-estimation, in division 47 as well as its sub-domains.
In Figure A3.2 revisions are decomposed into the part covered in the preliminary estimate by the
admin data and the part covered by the survey. Looking at the four graphs as a whole, the survey
source appears as the predominant cause of revisions because the survey part has the largest share
of the total estimate. The contribution of the smallest enterprises (estimates using admin
data) is proportional to their share of turnover. For instance in division 47 (where admin data
account for only 3% of the target variable; see table A3.3), they account for just about half
the revision of May 2012 and about one fourth in April 2012. In sector 4711-472-473, where admin
data account for about one third of the target variable, they were the predominant source of
revisions in almost every month. A remarkable observation is that the general under-estimation of
the first estimates comes from both the survey part and the admin data part.
Figure A3.2 - Graphical analysis: decomposition of revision due to admin and survey data.
Table A3.4 summarises the main causes of revisions. The first section describes in general terms
the main characteristics of the revisions, while the second part explains the causes of the most
significant revisions for the problematic subdomains 472, 4791 and 4779+4781+4782+4789+4799.
Table A3.4 - Causes of point in time errors.
Generic (cross domains)
- Slight general underestimation.
- Underestimation characterises both the survey and the admin data subpopulations; the one
exception is March 2012.
- Revisions of data are in general low, for some activity groups even negligible. Survey
and admin data revisions go in the same direction and therefore do not offset each other.
- Higher revisions were observed during benchmarking with quarterly data, which may
be considered as the elimination of definitional differences.
By domain (for each domain: point in time errors or generic revisions, causes, description)
Domain 472
  2012_5 (causes: SL, BQ): Late reporter of survey data, imputation of survey data was underestimated. Benchmarking with quarterly survey data.
  Generic: Small subpopulation. Every reporter is important.
Domain 4791
  2012_1, 2012_2, 2012_3 (causes: SR, BQ): Correction of quarterly survey microdata. Benchmarking with quarterly survey data.
  Generic: Activity with highest revisions. First estimate is underestimated.
Domain 4779+4781+4782+4789+4799
  2012_2, 2012_3 (cause: OT): Treatment of admin data outliers.
  Generic: Only activity with negative revisions. First estimate is overestimated.
Classification of possible causes of revisions
Revision of Survey data (SR)
Benchmarking of data with quarterly survey data (BQ)
Outlier treatment (OT)
Late reporter of survey data (SL)
Conclusions
The application of the revision analysis framework to the year-on-year growth rates (t,t-12) of the
Estonian monthly retail trade turnover estimates revealed quality aspects of these estimates which
are important when explaining the estimates or improving the system:
1) total revisions are low, at least compared to the magnitude of the growth rates, suggesting that
the first estimate is quite reliable;
2) total revisions are positive: it appears that the first estimate is slightly underestimated;
3) the main revisions of the estimates from the survey source are due to late responses and
corrections of preliminary reported data, while estimates from the admin data are revised because
of late reporters and benchmarking with quarterly survey data.
Appendix 2. Summary statistics on revisions
The revision indicators reported in table 2 are described in this section, together with their formulae.
Before presenting the formulae themselves, we first present the general relationships. Let us define
$Y_t^p$ and $Y_t^l$ as, respectively, the preliminary estimate and the last estimate of month t. The preliminary
and last estimates of the y-o-y growth rate will therefore be respectively equal to:
$$\dot{Y}_t^p = \frac{Y_t^p - Y_{t-12}^l}{Y_{t-12}^l}$$ [A1.1]
and
$$\dot{Y}_t^l = \frac{Y_t^l - Y_{t-12}^l}{Y_{t-12}^l}$$ [A1.2]
The revision of the y-o-y growth rate will thus be defined as:
$$R_t = \dot{Y}_t^l - \dot{Y}_t^p = \frac{Y_t^l - Y_t^p}{Y_{t-12}^l}$$ [A1.3]
That is, the ratio of the level revision $Y_t^l - Y_t^p$ to the last estimate of t-12.
$n$ will be used to refer to the number of periods t.
Size of revisions
Mean Absolute Revision
$$MAR = \frac{1}{n}\sum_{t=1}^{n}\left|\dot{Y}_t^l - \dot{Y}_t^p\right| = \frac{1}{n}\sum_{t=1}^{n}|R_t|$$ [A1.4]
gives a measure of the revision size. The use of absolute value avoids compensation effects between positive and negative revisions. It does not provide information on directional bias.
In order to get a measure that is normalised by the size of the estimate the following indicator is used: Relative Mean Absolute Revision
$$RMAR = \frac{\sum_{t=1}^{n}\left|\dot{Y}_t^l - \dot{Y}_t^p\right|}{\sum_{t=1}^{n}\left|\dot{Y}_t^l\right|}$$ [A1.5]
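The relationships [A1.1] to [A1.3] can be checked numerically with a short Python sketch. The function names and the level values below are hypothetical and serve only to illustrate that the revision of the growth rate equals the level revision divided by the last estimate of t-12.

```python
def yoy_growth(level, last_level_lag12):
    """Year-on-year growth rate against the last estimate of t-12 ([A1.1], [A1.2])."""
    return (level - last_level_lag12) / last_level_lag12

def growth_revision(prelim_level, last_level, last_level_lag12):
    """Revision of the y-o-y growth rate, last minus preliminary ([A1.3])."""
    g_prelim = yoy_growth(prelim_level, last_level_lag12)
    g_last = yoy_growth(last_level, last_level_lag12)
    return g_last - g_prelim

# Hypothetical levels: preliminary 104, last 105, last estimate of t-12 equal to 100.
y_p, y_l, y_l_lag12 = 104.0, 105.0, 100.0
r_t = growth_revision(y_p, y_l, y_l_lag12)
level_ratio = (y_l - y_p) / y_l_lag12   # right-hand side of [A1.3]
```

Both `r_t` and `level_ratio` come out at about 0.01, i.e. a revision of one percentage point.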
Similar to MAR is the Mean Squared Revision (MSR), which emphasises the highest revisions. A further measure of size is the median of revisions in absolute value (MeAR), which is not affected
by extreme observations:
Median Absolute Revision
$$MeAR = Me(|R_1|, |R_2|, \ldots, |R_n|)$$ [A1.6]
and the highest revision considered in absolute value (MaxAR), which immediately highlights the
most extreme case:
Maximum Absolute Revision
$$MaxAR = \max(|R_1|, |R_2|, \ldots, |R_n|)$$ [A1.7]
Direction
Mean Revision
$$MR = \frac{1}{n}\sum_{t=1}^{n}\left(\dot{Y}_t^l - \dot{Y}_t^p\right)$$ [A1.8]
gives an indication of the direction of revisions: if positive (negative), the preliminary estimate underestimates (overestimates) the last estimate on average. This measure does not give useful information on the size of revisions, because revisions of opposite sign offset each other. Other simple measures that can supplement the mean revision are the % of positive revisions, the % of negative revisions and the % of revisions equal to zero. A further measure of the direction of revisions is the median revision
(MeR), which is not affected by extreme revisions and reinforces the interpretation of the MR:
Median Revision
$$MeR = Me(R_1, R_2, \ldots, R_n)$$ [A1.9]
Variability
Standard Deviation of Revisions
$$SDR = \sqrt{\frac{1}{n-1}\sum_{t=1}^{n}(R_t - MR)^2}$$ [A1.10]
gives a measure of the spread of revisions around their mean value (MR), providing an indication of
the volatility of revisions in a given time interval. It is affected by extreme values, so it is not a good
measure of dispersion of revisions when their distribution is asymmetric. Another measure of
variability is the Range.
Range
$$Range = \max(R_1, R_2, \ldots, R_n) - \min(R_1, R_2, \ldots, R_n)$$ [A1.11]
Various versions of the range can also be considered: Range90 is the interval containing 90% of the
revisions, Range50 etc.
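As an illustration, the indicators [A1.4] to [A1.11] can be computed from two vintages of growth rates with the Python standard library. This is a minimal sketch, not the SAS implementation of Appendix 4: the function name `revision_summary`, the dictionary keys and the input figures are hypothetical, and the RMAR is normalised by the last estimate as in [A1.5].

```python
from statistics import mean, median, stdev

def revision_summary(prelim, last):
    """Summary statistics on revisions of growth rates (last minus preliminary)."""
    revisions = [l - p for p, l in zip(prelim, last)]
    abs_rev = [abs(r) for r in revisions]
    n = len(revisions)
    return {
        "MAR": mean(abs_rev),
        "RMAR": sum(abs_rev) / sum(abs(l) for l in last),
        "MeAR": median(abs_rev),
        "MaxAR": max(abs_rev),
        "MR": mean(revisions),
        "MeR": median(revisions),
        "SDR": stdev(revisions),                    # divisor n - 1
        "Range": max(revisions) - min(revisions),
        "pct_positive": 100.0 * sum(r > 0 for r in revisions) / n,
        "pct_negative": 100.0 * sum(r < 0 for r in revisions) / n,
    }

# Hypothetical y-o-y growth rates (in %) for five months.
prelim = [2.0, 3.0, -1.0, 4.0, 0.5]
last = [3.0, 3.5, -1.5, 6.0, 1.0]
summary = revision_summary(prelim, last)
```

In this made-up series the MR (0.7) is close to the MAR (0.9), which is the pattern one expects when revisions are mostly of the same sign.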
Indicators of the skewness of revisions can also be considered, in order to get indications of the
shape of the distribution of revisions around the median value.
Impact of revisions on sign of growth rates
It may be of interest to look at how often the last estimates have the opposite sign to the
preliminary estimates. This issue may be important in periods of low economic growth, when the
growth rates lie around zero and revisions may change the sign of the growth rates. In order to measure
this aspect one can refer to the % of observations for which the final and the earlier estimates have
the same sign.
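A possible way to compute this percentage is sketched below in Python. The function name `pct_same_sign` and the sample figures are hypothetical; the treatment of zeros (a zero growth rate is never counted as having the same sign) is an assumption that mirrors the flag used in the SAS code of Appendix 4.

```python
def pct_same_sign(early, later):
    """Percentage of periods in which early and later growth rates share the same sign.

    Assumption: a pair counts only if both values are strictly positive or
    both strictly negative, so zeros never count as agreement.
    """
    pairs = list(zip(early, later))
    same = sum(1 for e, l in pairs if (e > 0 and l > 0) or (e < 0 and l < 0))
    return 100.0 * same / len(pairs)

# Hypothetical growth rates close to zero: the second month changes sign.
early = [0.2, -0.1, 0.3, -0.4]
later = [0.5, 0.1, 0.1, -0.2]
share = pct_same_sign(early, later)
```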
Appendix 3. Contribution of admin data and survey data to revisions
The issue we are tackling is how to measure the impact on the total revision of the change in the
availability of admin data between the preliminary and the last estimate. Ideally we would like to
be able to decompose the revision into a part due to admin data and a part due to survey data. At the
time when the final estimate is released, provided that the databases and the information systems
allow the estimate to be reconstructed by combining different versions of the data, one can decompose
the impact of updating the different sources of data.
Formally, let us write the last estimate and the preliminary estimate very generally as:
$$Y_t^l = g(s^l, a^l)$$ [A2.1]
$$Y_t^p = g(s^p, a^p)$$ [A2.2]
That is, as a function of the survey (s) and admin (a) data available respectively at the deadline of the
last (l) and preliminary (p) estimate.
The function g will depend on the specific estimator.
In order to isolate the contribution of each of the two data sources, one can build an estimate
simulating the situation where the survey data are preliminary and the admin data are final. We will
refer to this kind of estimate as the counterfactual estimate:
$$Y_t^c = g(s^p, a^l)$$ [A2.3]
Considering this counterfactual estimate, the revision of the level can be decomposed as:
$$(Y_t^l - Y_t^p) = (Y_t^l - Y_t^c) + (Y_t^c - Y_t^p)$$ [A2.4]
where the first term represents the revision due to the survey data and the second term is the one
due to the admin data (see footnote 1).
Dividing [A2.4] by $Y_{t-12}^l$ gives a decomposition of the revision of the growth rate:
$$\frac{Y_t^l - Y_t^p}{Y_{t-12}^l} = \frac{Y_t^l - Y_t^c}{Y_{t-12}^l} + \frac{Y_t^c - Y_t^p}{Y_{t-12}^l}$$ [A2.5]
The specific form of [A2.5] to be used in each situation depends on the specific estimator used by
each country. The contribution of admin data to the estimate, in fact, depends on the role that they
play in the estimator. For instance, in Italy, after the micro imputation, the admin data are simply
enumerated to get the estimate for small and medium enterprises (Baldi et al. 2011, 2012). In
the regression estimator of Lithuania, used for the small and medium enterprises, the admin data
instead play the role of auxiliary variable (Kavaliauskiene 2011; see footnote 2). In the UK tests, the admin data are used
after a correction (Orchard et al 2011, Teneva 2012).
1 Alternatively the counterfactual estimate can be defined as one where the survey data are final and the admin data are
preliminary. In this case [A2.3] becomes $Y_t^{ac} = g(s^l, a^p)$ and accordingly [A2.4] becomes
$(Y_t^l - Y_t^p) = (Y_t^l - Y_t^{ac}) + (Y_t^{ac} - Y_t^p)$, where the first term represents the revision due to the admin data and the second
term is the one due to the survey data.
2 Here the estimate of Lithuania is a bit simplified: it is not considered that, in reality, the estimate of the smallest
enterprises is obtained through the survey and a HT estimator.
In the following, the specific formulations for the cases of quasi complete data and of the regression
estimator are presented. Since the source of data and the methodology used are often different for the
populations of large enterprises and of small and medium enterprises, it is important to identify the two
estimates separately. We will refer to the largest enterprises with the subscript LE and to the small and
medium enterprises with the subscript SME.
Quasi complete data and mixed mode enumeration.
In this situation, such as the one used for the Istat employment indicators (Baldi et al. 2011, 2012) and
the SE and Destatis turnover indicators (Lorenz 2011), the estimate of Y can be obtained as the sum of the
estimates obtained through survey and admin data respectively for the large enterprises and the
small and medium enterprises.
$$Y_t = Y_{LE,t}^{s} + Y_{SME,t}^{a}$$ [A2.6]
The preliminary estimate and the last estimate will be expressed accordingly as:
$$Y_t^{p} = Y_{LE,t}^{ps} + Y_{SME,t}^{pa}$$ [A2.7]
$$Y_t^{l} = Y_{LE,t}^{ls} + Y_{SME,t}^{la}$$ [A2.8]
The revision error can be decomposed into the part due to the administrative source and the part due
to the survey source. Dropping the LE and SME subscripts and writing the source before the vintage in
the superscript (for example $Y_t^{sl}$ for $Y_{LE,t}^{ls}$):
$$R_t = \frac{(Y_t^{sl} - Y_t^{sp}) + (Y_t^{al} - Y_t^{ap})}{Y_{t-12}^{sl} + Y_{t-12}^{al}}$$ [A2.9]
By rewriting it in the following form:
$$R_t = \frac{Y_t^{sl} - Y_t^{sp}}{Y_{t-12}^{sl} + Y_{t-12}^{al}} + \frac{Y_t^{al} - Y_t^{ap}}{Y_{t-12}^{sl} + Y_{t-12}^{al}}$$ [A2.10]
the first term represents the contribution of the revision of the survey part to the growth rate,
while the second represents the contribution of the admin data part. In this case the two terms also
correspond to the contributions of large and small/medium enterprises to the total.
A step further is the representation in terms of growth rates of the two components. In fact,
multiplying and dividing the previous expression by $Y_{t-12}^{sl}$ and concentrating on the part due to
survey data, we can write:
$$\frac{Y_t^{sl} - Y_t^{sp}}{Y_{t-12}^{sl} + Y_{t-12}^{al}} \cdot \frac{Y_{t-12}^{sl}}{Y_{t-12}^{sl}} = \frac{Y_t^{sl} - Y_t^{sp}}{Y_{t-12}^{sl}} \cdot \frac{Y_{t-12}^{sl}}{Y_{t-12}^{sl} + Y_{t-12}^{al}}$$ [A2.11]
that is, $R_t^{s}\,\omega_{t-12}^{s}$, where $R_t^{s} = (Y_t^{sl} - Y_t^{sp})/Y_{t-12}^{sl}$ is the revision of the growth rate of the survey
part and $\omega_{t-12}^{s} = Y_{t-12}^{sl}/(Y_{t-12}^{sl} + Y_{t-12}^{al})$ represents the weight in the final estimate due to the survey.
The same operation can be performed on the part of the estimate due to admin data, so that the total
revision error can be written as:
$$R_t = R_t^{s}\,\omega_{t-12}^{s} + R_t^{a}\,\omega_{t-12}^{a}$$ [A2.12]
This approach is valid for those cases where admin data are used as the target variable and no
correction is required.
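The weighted form [A2.12] can be verified on made-up numbers with the following Python sketch. The function name `weighted_decomposition` and the levels are hypothetical; the weights are the shares of each source in the last estimate of t-12.

```python
def weighted_decomposition(y_sp, y_sl, y_ap, y_al, y_sl_lag12, y_al_lag12):
    """Contributions of the survey and admin parts to the growth-rate revision [A2.12]."""
    total_lag12 = y_sl_lag12 + y_al_lag12
    r_s = (y_sl - y_sp) / y_sl_lag12      # growth-rate revision of the survey part
    r_a = (y_al - y_ap) / y_al_lag12      # growth-rate revision of the admin part
    w_s = y_sl_lag12 / total_lag12        # weight of the survey part
    w_a = y_al_lag12 / total_lag12        # weight of the admin part
    return r_s * w_s, r_a * w_a

# Hypothetical levels: the survey part covers 80% of the total at t-12.
survey_contrib, admin_contrib = weighted_decomposition(
    y_sp=84.0, y_sl=86.0, y_ap=20.5, y_al=21.5,
    y_sl_lag12=80.0, y_al_lag12=20.0,
)
total_revision = ((86.0 - 84.0) + (21.5 - 20.5)) / (80.0 + 20.0)   # [A2.9]
```

Here the admin part is revised twice as much as the survey part in relative terms (5% against 2.5%), but its small weight limits its contribution to one third of the total revision.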
Regression estimator
For a regression estimator, such as the one used by Statistics Lithuania for the estimates of Income
for retail trade (Kavaliauskiene 2011), the estimate of the target variable comes from a mixed
approach, that is:
$$Y_t = Y_{LE,t} + Y_{SME,t}$$ [A2.13]
where the estimate of small and medium enterprises is obtained with a regression estimator (here
we omit the subscript SME, and the s indicating the survey sample, for simplicity):
$$Y_t = \hat{Y}_t + \hat{\beta}_t (X_t - \hat{X}_t)$$ [A2.14]
where
$$\hat{Y}_t = \sum_{i \in s} d_{it}\, y_{it}$$ [A2.15]
is the Horvitz-Thompson estimator of the target total and $d_{it}$ are the direct weights,
$$\hat{X}_t = \sum_{i \in s} d_{it}\, x_{it}$$ [A2.16]
is the Horvitz-Thompson estimator of the auxiliary total,
$$X_t = \sum_{i \in L} x_{it}$$ [A2.17]
is the total of the auxiliary variable over the target population list L, and $\hat{\beta}_t$ is the weighted least
squares estimator obtained by regressing $y_{it}$ on $x_{it}$ over the sample s. Notice that the direct weights,
and possibly a term for heteroskedasticity, enter the formula of $\hat{\beta}_t$.
Now let us introduce the notation for the preliminary estimate and the last estimate in columns 1 and
2 of table A2.1.
Table A2.1. Regression estimator for the preliminary, last and counterfactual estimates for small
enterprises
Preliminary estimate
  $Y_t^p = \hat{Y}_t^p + \hat{\beta}_t^p (X_t^p - \hat{X}_t^p)$
  $\hat{Y}_t^p = \sum_{i \in s^p} d_{it}\, y_{it}^p$
  $\hat{X}_t^p = \sum_{i \in s^p} d_{it}\, x_{it}^p$
  $X_t^p = \sum_{i \in L_t^p} x_{it}^p$
  $\hat{\beta}_t^p$ - obtained on $s^p$ regressing $y_{it}^p$ on $x_{it}^p$
Last estimate
  $Y_t^l = \hat{Y}_t^l + \hat{\beta}_t^l (X_t^l - \hat{X}_t^l)$
  $\hat{Y}_t^l = \sum_{i \in s^l} d_{it}\, y_{it}^l$
  $\hat{X}_t^l = \sum_{i \in s^l} d_{it}\, x_{it}^l$
  $X_t^l = \sum_{i \in L_t^l} x_{it}^l$
  $\hat{\beta}_t^l$ - obtained on $s^l$ regressing $y_{it}^l$ on $x_{it}^l$
Counterfactual estimate
  $Y_t^c = \hat{Y}_t^p + \hat{\beta}_t^c (X_t^c - \hat{X}_t^c)$
  $\hat{Y}_t^p = \sum_{i \in s^p} d_{it}\, y_{it}^p$
  $\hat{X}_t^c = \sum_{i \in s^p} d_{it}\, x_{it}^l$
  $X_t^c = \sum_{i \in L_t^p} x_{it}^l$
  $\hat{\beta}_t^c$ - obtained on $s^p$ regressing $y_{it}^p$ on $x_{it}^l$
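The regression estimator [A2.14] to [A2.17] can be illustrated with a small Python sketch. The function names and the data are hypothetical, and the exact form of the weighted least squares slope (with an intercept and no heteroskedasticity term) is an assumption, since the text does not spell it out.

```python
def ht_total(weights, values):
    """Horvitz-Thompson estimator of a total: sum of d_it times the sample values."""
    return sum(d * v for d, v in zip(weights, values))

def wls_slope(weights, x, y):
    """Weighted least squares slope of y on x (assumed form, with an intercept)."""
    w_sum = sum(weights)
    x_bar = ht_total(weights, x) / w_sum
    y_bar = ht_total(weights, y) / w_sum
    num = sum(d * (xi - x_bar) * (yi - y_bar) for d, xi, yi in zip(weights, x, y))
    den = sum(d * (xi - x_bar) ** 2 for d, xi in zip(weights, x))
    return num / den

def regression_estimate(weights, y_sample, x_sample, x_population):
    """Regression estimator: HT total of y corrected with the auxiliary totals [A2.14]."""
    y_ht = ht_total(weights, y_sample)          # [A2.15]
    x_ht = ht_total(weights, x_sample)          # [A2.16]
    x_total = sum(x_population)                 # [A2.17]
    beta = wls_slope(weights, x_sample, y_sample)
    return y_ht + beta * (x_total - x_ht)

# Hypothetical sample of 3 units (weight 2 each) drawn from a list of 6 units.
estimate = regression_estimate(
    weights=[2.0, 2.0, 2.0],
    y_sample=[2.0, 4.0, 6.0],
    x_sample=[1.0, 2.0, 3.0],
    x_population=[1.0, 2.0, 3.0, 1.0, 2.0, 4.0],
)
```

In this example the HT total of y is 24, the slope is 2 and the auxiliary total over the list exceeds its HT estimate by 1, so the regression estimate is 26.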
In table A2.1 the superscripts p and l indicate, as usual, the information available for the preliminary
and last estimates. Thus $s^p$, $y_{it}^p$, $L_t^p$, $x_{it}^p$ and $\hat{\beta}_t^p$ indicate respectively the sample of respondents, the
value of the target variable, the Register list that identifies the target population, the auxiliary
variable coming from the admin data, and the regression coefficient available at the preliminary
estimate deadline. The terms $s^l$, $y_{it}^l$, $L_t^l$, $x_{it}^l$ and $\hat{\beta}_t^l$ have an analogous meaning for the final estimate.
In the case of Statistics Lithuania (SL), $x_{it}^p$, the admin data available for the preliminary estimate, are
actually the VAT data referring to the previous month, while $x_{it}^l$ are the admin data referring to the
current month. The notation may be confusing, since the subscript t is used in both cases. However we
prefer to keep the subscript t in order to cover the more general case of a country that has current
information from admin data available and chooses to use a regression estimator to
adjust for definitional or other measurement-related issues.
To measure the contribution to the revision of the change in admin data, the counterfactual
estimate can be built using all the information referring to the time of the preliminary estimate, with the
exception of the admin data (used as auxiliary variables), which have to refer to the time of the
last estimate.
In practice, the regression coefficient $\hat{\beta}_t^c$ can be obtained by regressing $y_{it}^p$ on $x_{it}^l$ over the
sample $s^p$, while the total $X_t^c$ is calculated by summing $x_{it}^l$ over the list defined by the registry
used for the preliminary estimate, $L^p$, and $\hat{X}_t^c$ is calculated over the sample $s^p$ (see footnote 3).
Reintroducing the estimate of the largest enterprises, the revision of the level can thus be
decomposed as follows:
$$(Y_t^l - Y_t^p) = (Y_t^l - Y_t^c) + (Y_t^c - Y_t^p) = [(Y_{LE,t}^{ls} + Y_{SME,t}^{ls}) - (Y_{LE,t}^{ps} + Y_{SME,t}^{cs})] + [(Y_{LE,t}^{ps} + Y_{SME,t}^{cs}) - (Y_{LE,t}^{ps} + Y_{SME,t}^{ps})]$$ [A2.18]
where the first term represents the revision due to the change in the survey part (respondents,
modification of the values y, and the (business) register list used as the population frame, if it is
changed between the first and the final estimate; see footnote 4) and the second term is due to the
change in the admin data part.
This formula also makes it easy to separate the estimate of large enterprises from the
estimate of small and medium enterprises.
3 An alternative formulation of the preliminary, last and counterfactual estimates can be obtained starting from the
representation of the regression estimator as a weighted sum of the sample units, where the weights are a product of the
direct weights and the g weights.
4 Following the logic behind the decomposition one might also want to isolate the contribution due to the updating of
the register, if it enters the final estimate.
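For the small and medium enterprises, the counterfactual construction of table A2.1 can be sketched in Python as follows. The function names, the unit identifiers and all the figures are hypothetical, and the weighted least squares slope is written in an assumed simple form (with an intercept, no heteroskedasticity term). The large enterprise part is left out because it is identical in the preliminary and counterfactual estimates.

```python
def reg_estimate(d, y, x, sample, frame):
    """Regression estimator Y_ht + beta * (X - X_ht) with an assumed weighted LS slope."""
    w = sum(d[i] for i in sample)
    y_ht = sum(d[i] * y[i] for i in sample)
    x_ht = sum(d[i] * x[i] for i in sample)
    x_bar, y_bar = x_ht / w, y_ht / w
    beta = (sum(d[i] * (x[i] - x_bar) * (y[i] - y_bar) for i in sample)
            / sum(d[i] * (x[i] - x_bar) ** 2 for i in sample))
    return y_ht + beta * (sum(x[i] for i in frame) - x_ht)

def decompose_sme_revision(d, y_p, y_l, x_p, x_l, s_p, s_l, frame_p, frame_l):
    """Return (survey part, admin part) of the level revision for the SME estimate."""
    est_p = reg_estimate(d, y_p, x_p, s_p, frame_p)   # preliminary
    est_l = reg_estimate(d, y_l, x_l, s_l, frame_l)   # last
    est_c = reg_estimate(d, y_p, x_l, s_p, frame_p)   # counterfactual: final admin data only
    return est_l - est_c, est_c - est_p

# Hypothetical data: unit 3 revises its survey value, unit 4 revises its VAT value.
d = {1: 2.0, 2: 2.0, 3: 2.0}
y_p = {1: 2.0, 2: 4.0, 3: 6.0}
y_l = {1: 2.0, 2: 4.0, 3: 7.0}
x_p = {1: 1.0, 2: 2.0, 3: 3.0, 4: 2.0, 5: 1.0, 6: 3.0}
x_l = {1: 1.0, 2: 2.0, 3: 3.0, 4: 3.0, 5: 1.0, 6: 3.0}
units = [1, 2, 3, 4, 5, 6]
survey_part, admin_part = decompose_sme_revision(
    d, y_p, y_l, x_p, x_l, s_p=[1, 2, 3], s_l=[1, 2, 3], frame_p=units, frame_l=units
)
```

In this example the preliminary, counterfactual and last estimates are 24, 26 and 28.5, so the admin data explain 2 and the survey data 2.5 of the total level revision of 4.5.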
Appendix 4. SAS code for the calculation of the summary statistics on
revisions and graphs
This section provides the SAS (v9.2) code to calculate the table of summary measures and the
graphs of estimates and revisions.
The input data should have a cross-sectional/time series form as follows:
year month domain v1 v2 ….. Vn
2009 1 RTD 2.5 2.5 0.6
2009 2 RTD -2.2 ….. -1.1
….. ….. ….. ….. ….. …..
2010 1 RTD
2010 2 RTD
….. ….. ….. ….. ….. …..
2009 1 CON 4.3 ….. 2.5
2009 2 CON 2.1 ….. 2.0
….. ….. ….. ….. ….. …..
2010 1 CON
2010 2 CON
….. ….. ….. ….. ….. …..
where, for each time occurrence (year, month) and for each target domain (RTD=retail trade,
CON=construction…), the various vintages of year-on-year growth rates are variables, here labelled
v1,v2,…,vn.
The code is divided into 3 steps: 1) manual settings, 2) data management, 3) calculation and output of
statistics and graphs.
In the manual settings, beyond indicating the input and output paths, three macro parameters are set:
the input dataset, and the two vintages to compare.
The code also provides an example of conditional formatting of the cells of the table of summary
measures, to highlight the values exceeding certain predefined thresholds. This might be useful when
the table contains several domains, to identify the problematic ones quickly. In the example the values
between 0.3 and 1 are coloured in yellow and those exceeding 1 in red. This conditional formatting
is applied to the RMAR. It should be stressed that this is just an example and that different rules and
thresholds may be used according to the choices of the analyst.
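For readers who do not use SAS, the threshold rule of the example can be written as a small Python function. This is only a sketch of the same rule, with a hypothetical function name; the boundaries follow the `cback` format defined in the code below (0.3 itself is left white, 1 itself is yellow).

```python
def rmar_colour(rmar):
    """Background colour for an RMAR cell, mirroring the thresholds of the example."""
    if rmar is None:
        return None            # missing value: no colour assigned (assumption)
    if rmar <= 0.3:
        return "white"
    if rmar <= 1:
        return "yellow"
    return "red"

# Hypothetical RMAR values for four domains.
colours = [rmar_colour(v) for v in (0.1, 0.3, 0.7, 1.4)]
```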
The code produces the following outputs :
1) Report_&vi.xls that contains the report on summary statistics on revisions based on the
vintages chosen for the comparison;
2) Graph_&vi.html that contains the graphical analysis on y-on-y growth rate.
/*Step 1 – Manual Setting*/
libname libIO "\\PATH WHERE READ THE INPUT(directory)";
filename outpath "\\PATH WHERE SAVE THE REPORTS AND THE GRAPHS(directory)";
proc format;
value cback
low - 0.3 = 'white'
0.3<- 1 = 'yellow'
1< - high = 'red';
run;
*datain is the name of the input sas dataset;
%let datain=dati;
*vj is the name of the y-on-y growth rate chosen as base (i.e. the last estimate);
*vi is the name of the y-on-y growth rate chosen as comparison (i.e. the first estimate);
%let vj=vn;
%let vi=v1;
/*Step 2 – Data management */
%macro prepare;
data &datain (drop= year month);
set libIO.&datain ;
date=mdy(month,1,year);
format date monyy7.;
run;
proc sort data=&datain;
by domain date; run;
data &datain;
set &datain;
if (&vi gt .) then do;
rev&vi._flgt0_vt=0;
rev&vi._fllt0_vt=0;
rev&vi._flsign_vt=0;
&vi._flsign_vt=0;
rev&vi=(&vj-&vi);
revabs&vi=abs(rev&vi);
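/* Note: the RMAR below is normalised with the absolute value of the comparison estimate (vi), whereas formula [A1.5] uses the last estimate */
&vi.abs=abs(&vi);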
if rev&vi gt 0 then rev&vi._flgt0_vt=1;
if .<rev&vi<0 then rev&vi._fllt0_vt=1;
if (&vj gt 0 and &vi gt 0) or (.<&vj<0 and .<&vi<0) then &vi._flsign_vt=1;
end;
run;
%mend prepare;
/*Step 3 – Macro to produce a report on summary statistics on revisions and a graph analysis */
%macro revision;
*Calculation of summary statistics on revisions;
proc means data=&datain nway noprint;
where rev&vi ne .;
class domain;
var &vj &vi.abs rev&vi revabs&vi rev&vi._flgt0_vt
rev&vi._fllt0_vt &vi._flsign_vt date;
output out=summary&vi n(rev&vi)=nrev mean(revabs&vi)=MAR mean(&vi.abs)=MAP4
max(revabs&vi)=MaxAR
median(revabs&vi)=MeAR mean(rev&vi)=MR mean(rev&vi._flgt0_vt)=Perc_gt0
mean(rev&vi._fllt0_vt)=Perc_lt0
median(rev&vi)=MeR std(rev&vi)=sdr min(rev&vi)=minrev max(rev&vi)=maxrev
range(rev&vi)=range mean(&vi._flsign_vt)=Perc_signeq min(date)=mindate
max(date)=maxdate ;
run;
data summary&vi (drop=_TYPE_ _freq_ );
set summary&vi;
RMAR=MAR/MAP4;
Perc_gt0=Perc_gt0*100;
Perc_lt0=perc_lt0*100;
Perc_signeq=Perc_signeq*100;
period=put(mindate,monyy7.)||'-'||put(maxdate,monyy7.);
run;
ods tagsets.excelxp path=outpath file="REPORT_&vi..xls"
style=journal options(index='yes' embedded_titles='yes' embedded_footnotes='yes'
sheet_interval='none') ;
/*Report On Summary Measures Of Revisions*/
proc report data=summary&vi nowd split='*';
title 'Summary measures of revisions';
col ('Domain' domain) ('N.*occurrences' nrev) ('Period' period)
('Size' MAR RMAR maxar mear)
('Direction' mr perc_gt0 perc_lt0 mer) ('Variability' sdr range) ('Impact on*growth rates*direction'
perc_signeq);
define domain /' ' style={tagattr="format:@"};
define nrev /' ' display;
define period /' ' display;
define MAR/'MAR' display;
define RMAR/'RMAR' display style(column) ={background=cback.};
define maxar/'MaxAR' display;
define mear/'MeAR' display;
define mr/'MR' display;
define perc_gt0/ '% > 0' display;
define perc_lt0/'% < 0' display;
define mer/'MeR' display;
define sdr/'SDR' display;
define range/ 'Range' display;
define perc_signeq/ '% Sign(Later)*=Sign(Early)' display;
format MAR RMAR maxar mear mr perc_gt0 perc_lt0 mer sdr range perc_signeq 8.1 ;
run;
ods tagsets.excelxp close;
*Graph Analysis;
proc sort data=&datain;
by domain date; run;
ods html path=outpath file="GRAPH_&vi..html" style=journal;
proc sgplot data=&datain ;
title "Growth rates: base estimate (&vj), compare estimate (&vi) and revision (&vj minus &vi)";
where &vi ne .;
by domain;
vline date/ response=&vi lineattrs=(color=red) legendlabel="&vi";
vline date/response=&vj lineattrs=(color=blue) legendlabel="&vj";
vbar date / response=rev&vi fillattrs=(color=lightgrey) barwidth=.4 legendlabel='Rev.';
yaxis label=' ';
xaxis label=" " grid interval=month fitpolicy=thin;
format date monyy7.;
run;
ods html close;
%mend revision;
%prepare;
%revision;