Optimal Allocation in the Multi-way Stratification Design for Business Surveys (*)
Paolo Righi, Piero Demetrio Falorsi
Italian National Statistical Institute
(*) Research of National Interest n. 2007RHFBB3 (PRIN), “Efficient use of auxiliary information at the design and at the estimation stage of complex surveys: methodological aspects and applications for producing official statistics”
Outline
- Statement of the problem
- Multi-way sampling design
- Multi-way optimal allocation algorithm
- Monte Carlo simulation
Statement of the problem
Large-scale surveys in official statistics usually produce estimates for a set of parameters over a very large number of highly detailed estimation domains.
These domains generally define non-nested partitions of the target population.
When the domain indicator variables are available in the sampling frame, we can plan a sample that covers each domain.
Fixing the domain sample sizes:
- helps to control the sampling errors of the main estimates;
- when direct estimators are not reliable (small area problem), having sampled units in the domains makes it possible to bound the bias of small area indirect estimators and to use models with specific small area effects.
Statement of the problem
The standard solution for fixing the sample sizes stratifies the sample with strata given by the cross-classification of the variables defining the different partitions (cross-classified or one-way stratified design).
Main drawback: the stratification is too detailed, which brings:
- risk of sample size explosion;
- inefficient sample allocation (constraint of at least 2 units per stratum);
- risk of statistical burden (e.g. in repeated business surveys).
Statement of the problem
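Multivariate (r = 1, …, R) and multidomain (d = 1, …, D) context.
Domain of interest: $U_d$ (d = 1, …, D).
Parameter of interest and estimator:
$$t_{(dr)} = \sum_{k \in U} y_{rk}\,\gamma_{dk}, \qquad \hat{t}_{(dr)} = \sum_{k \in s} \frac{y_{rk}\,\gamma_{dk}}{\pi_k},$$
where $y_{rk}$ is the value of the r-th variable of interest on unit k and $\pi_k$ is the inclusion probability of unit k, being $\gamma_{dk} = 1$ if $k \in U_d$ and $\gamma_{dk} = 0$ otherwise.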
Statement of the problem
The sampling strategy proposed here bases each domain estimate on a planned sample size. We consider a general random sampling design where the $U_h$ (h = 1, …, H), of size $N_h$, define the minimal planned subpopulations. We assume two cases: $U_d = U_h$, or $U_d = \bigcup_{h \in H_d} U_h$, where $H_d$ is a subset of $\{1, \ldots, H\}$.
Statement of the problem
Example: three domain types $T_l$ (l = 1, …, 3): NACE four-digit; NACE three-digit by size; NACE two-digit by geography. Each domain type defines a partition of the population of cardinality $D_l$, being $D = D_1 + D_2 + D_3$. Different sampling designs allow planning the sample sizes of the domains of interest:
- the standard approach defines the $U_h$'s by combining the populations of the three domain types. Then $H = D_1 \times D_2 \times D_3$ and the $\boldsymbol{\delta}_k$ are defined as (0, …, 1, …, 0) vectors. We denote this design as the cross-classified or one-way stratified design;
Statement of the problem
Example (continued):
- the $U_h$'s are defined by combining all the pairs of domain types. Then $H = (D_1 \times D_2) + (D_1 \times D_3) + (D_2 \times D_3)$;
- some $U_h$'s coincide with the domains of one population partition (for instance $T_1$) and the other $U_h$'s are defined by combining the remaining pair of domain types ($T_2$ and $T_3$). Then $H = D_1 + (D_2 \times D_3)$;
- the $U_h$'s coincide with the domains of interest. Then $H = D_1 + D_2 + D_3$.
Sampling designs defining the $U_h$'s as in the last three points are called multi-way (or incomplete) stratification designs.
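To make the difference between the four designs of the example concrete, the following minimal sketch counts the planned subpopulations H for each of them. The function name and the domain counts (50, 20, 10) are hypothetical and chosen only for illustration; the products are upper bounds, since some cross-classification cells may be empty in a real frame.

```python
def planned_subpopulations(d1, d2, d3):
    """Number H of planned subpopulations U_h under the four designs
    of the example, for partitions with d1, d2 and d3 domains."""
    return {
        "cross-classified (one-way)": d1 * d2 * d3,
        "multi-way, all pairs": d1 * d2 + d1 * d3 + d2 * d3,
        "multi-way, T1 plus T2 x T3": d1 + d2 * d3,
        "multi-way, domains only": d1 + d2 + d3,
    }

# Hypothetical domain counts, not taken from the survey in the slides.
counts = planned_subpopulations(50, 20, 10)
for design, h in counts.items():
    print(f"{design}: H = {h}")
```

With a minimum of 2 units per planned subpopulation, the smallest possible sample is 2H, which is why the cross-classified design is the one most exposed to sample size explosion.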
Multi-way Sampling Design
Main problem of the multi-way design (MWD): defining a procedure for random selection.
We propose to use the cube method (Deville and Tillé, 2004), which:
- selects a random sample under a multi-way stratified design;
- can be applied to a large population with many domains.
Multi-way Sampling Design
ITACOSM 2011 - 27-29 June 2011, Pisa, Italy - 6
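The cube algorithm selects a sample s respecting the following general balancing equations:
$$\sum_{k \in s} \frac{\mathbf{x}_k}{\pi_k} = \sum_{k \in U} \mathbf{x}_k$$
Setting $\mathbf{x}_k = \pi_k \boldsymbol{\delta}_k$ with $\boldsymbol{\delta}_k = (\delta_{1k}, \ldots, \delta_{hk}, \ldots, \delta_{Hk})'$, being $\delta_{hk} = 1$ if $k \in U_h$ and $\delta_{hk} = 0$ otherwise, we have
$$\sum_{k \in s} \delta_{hk} = n_h \quad (h = 1, \ldots, H),$$
with $n_h = \sum_{k \in U} \pi_k \delta_{hk}$ fixed for each sample selection.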
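The balancing equations above can be checked on a toy population. The sketch below does not implement the cube method; it only computes the planned sizes $n_h$ and tests whether a given sample satisfies the balancing equations with $\mathbf{x}_k = \pi_k \boldsymbol{\delta}_k$. The population (8 units in a 2 x 2 layout), the inclusion probabilities and the function names are assumptions made for illustration.

```python
def planned_sizes(pi, delta):
    """n_h = sum over U of pi_k * delta_hk, for each planned subpopulation h."""
    n_sub = len(delta[0])
    return [sum(p * d[h] for p, d in zip(pi, delta)) for h in range(n_sub)]

def is_balanced(sample, pi, delta, tol=1e-9):
    """Check sum_{k in s} x_k / pi_k == sum_{k in U} x_k with x_k = pi_k * delta_k.
    Since x_k / pi_k = delta_k, the left side is the sample count in each U_h."""
    n_sub = len(delta[0])
    target = planned_sizes(pi, delta)
    achieved = [sum(delta[k][h] for k in sample) for h in range(n_sub)]
    return all(abs(a - t) <= tol for a, t in zip(achieved, target))

# Toy population: two partitions (rows, columns) with two domains each.
rows = [0, 0, 0, 0, 1, 1, 1, 1]
cols = [0, 0, 1, 1, 0, 0, 1, 1]
# delta_k has H = 2 + 2 components: row 0, row 1, column 0, column 1.
delta = [[int(r == 0), int(r == 1), int(c == 0), int(c == 1)]
         for r, c in zip(rows, cols)]
pi = [0.5] * 8

print(planned_sizes(pi, delta))
print(is_balanced([0, 2, 4, 6], pi, delta))  # two units in every row and column
print(is_balanced([0, 1, 4, 5], pi, delta))  # all four units fall in column 0
```

Only the marginal counts are fixed: a sample such as units 0, 3, 5, 6 is also balanced although the cell counts differ, which is the sense in which the stratification is incomplete.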
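Optimal allocation algorithm
Deville and Tillé (2005) proposed an approximated expression of the variance:
$$V_p(\hat{t}_{(dr)} \mid \boldsymbol{\pi}) = f \sum_{k \in U} (1/\pi_k - 1)\, \eta_{(dr)k}^2$$
where $f = N/(N-H)$, $\eta_{(dr)k} = y_{rk}\gamma_{dk} - \pi_k\, g_{(dr)k}$ and $g_{(dr)k} = \boldsymbol{\delta}_k' \mathbf{B}_{(dr)}$, being
$$\mathbf{B}_{(dr)} = \Big[\sum_{k \in U} \boldsymbol{\delta}_k \boldsymbol{\delta}_k'\, \pi_k^2 (1/\pi_k - 1)\Big]^{-1} \sum_{k \in U} \boldsymbol{\delta}_k\, \pi_k (1/\pi_k - 1)\, y_{rk}\gamma_{dk} \qquad (2)$$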
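As a numerical illustration of the variance approximation, the sketch below evaluates it for one domain and one variable in pure Python. It assumes the form in which the matrix to be inverted in $\mathbf{B}_{(dr)}$ is weighted by $\pi_k^2(1/\pi_k - 1)$, and it assumes that this matrix is nonsingular, which holds for the one-way stratification used in the example; with overlapping partitions the indicator columns are linearly dependent and a generalized inverse would be needed. All names and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def approx_variance(y, gamma, pi, delta):
    """Approximate variance for one domain d and one variable r.
    y[k] is y_rk, gamma[k] is gamma_dk, pi[k] is pi_k, delta[k] is the vector delta_k."""
    n_pop, n_sub = len(y), len(delta[0])
    w = [p * (1.0 / p - 1.0) for p in pi]  # pi_k (1/pi_k - 1)
    # Matrix sum_k delta_k delta_k' pi_k^2 (1/pi_k - 1)
    a = [[sum(delta[k][i] * delta[k][j] * pi[k] * w[k] for k in range(n_pop))
          for j in range(n_sub)] for i in range(n_sub)]
    # Vector sum_k delta_k pi_k (1/pi_k - 1) y_rk gamma_dk
    b = [sum(delta[k][i] * w[k] * y[k] * gamma[k] for k in range(n_pop))
         for i in range(n_sub)]
    beta = solve(a, b)
    f = n_pop / (n_pop - n_sub)
    total = 0.0
    for k in range(n_pop):
        g = sum(delta[k][h] * beta[h] for h in range(n_sub))
        eta = y[k] * gamma[k] - pi[k] * g
        total += (1.0 / pi[k] - 1.0) * eta ** 2
    return f * total

# Hypothetical data: two non-overlapping strata, domain equal to the whole population.
y = [1.0, 3.0, 2.0, 4.0, 6.0]
gamma = [1, 1, 1, 1, 1]
pi = [0.5] * 5
delta = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
v = approx_variance(y, gamma, pi, delta)
print(v)  # about 16.67 (= 50/3)
```

In this one-way case with equal probabilities within strata, the residuals reduce to deviations from the stratum means, so the result can be verified by hand: (5/3) x (2 + 8).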
Optimal allocation algorithm
We propose an algorithm for the definition of an optimal inclusion probability vector $\boldsymbol{\pi}^*$ according to the following optimality criterion:
$$\min \sum_{k \in U} \pi_k^*$$
subject to
(a) $V_p(\hat{t}_{(dr)} \mid \boldsymbol{\pi}^*) \le \bar{V}_{(dr)}$ (d = 1, …, D; r = 1, …, R)
(b) $0 < \pi_k^* \le 1$ (k = 1, …, N)
being $\bar{V}_{(dr)}$ a fixed variance threshold for the domain $U_d$ on the r-th variable of interest.
Optimal allocation algorithm
Note that the constraints (a) depend on the unknown variables of interest. In practice, only model-predicted values can be used. In the paper we sketch the main phases of the algorithm in this operational context.
We consider a general prediction model M:
$$y_{rk} = \tilde{y}_{rk} + u_{rk}; \quad E_M(u_{rk}) = 0; \quad E_M(u_{rk}^2) = \sigma_{rk}^2; \quad E_M(u_{rk}\,u_{rl}) = 0 \;\; (k \ne l).$$
We suppose the $\sigma_{rk}^2$ values are known or can be predicted.
Optimal allocation algorithm
To take the model uncertainty into account, the sampling variance is replaced by the anticipated variance (Isaki and Fuller, 1982). An upward approximation of the anticipated variance for the proposed strategy is
$$AV(\hat{t}_{(dr)}) = E_M E_p\big[(\hat{t}_{(dr)} - t_{(dr)})^2 \mid \boldsymbol{\pi}^*\big] \cong f \Big[\sum_{k \in U} (1/\pi_k^* - 1)\, \tilde{\eta}_{(dr)k}^2 + \sum_{k \in U} (1/\pi_k^* - 1)\, \sigma_{rk}^2 \gamma_{dk}\Big]$$
where $\tilde{\eta}_{(dr)k}^2$ is computed by means of the model-predicted value $\tilde{y}_{rk}$ (that is, $\eta_{(dr)k}$ with $y_{rk}$ replaced by $\tilde{y}_{rk}$). The approximation neglects a residual term that we do not show for the sake of brevity. However, the optimization procedure does not change if the corrected anticipated variance is taken into account.
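Optimal allocation algorithm
The optimization problem is defined as
$$\min \sum_{k \in U} \pi_k^*$$
subject to
(5) $AV(\hat{t}_{(dr)}) \le \bar{V}_{(dr)}$ (d = 1, …, D; r = 1, …, R)
$0 < \pi_k^* \le 1$ (k = 1, …, N)
To obtain a solution of the optimization problem we formulate constraints (5) as
$$f \sum_{k \in U} \frac{(\tilde{y}_{rk}^2 + \sigma_{rk}^2)\,\gamma_{dk}}{\pi_k^*} \le \bar{V}_{(dr)} + f \sum_{k \in U} (\tilde{y}_{rk}^2 + \sigma_{rk}^2)\,\gamma_{dk} + C_{(dr)}(\boldsymbol{\pi}^*, \tilde{\mathbf{g}}_d)$$
being
$$C_{(dr)}(\boldsymbol{\pi}^*, \tilde{\mathbf{g}}_d) = f \Big[2 \sum_{k \in U} (1 - \pi_k^*)\, \tilde{y}_{rk}\gamma_{dk}\, \tilde{g}_{dk} - \sum_{k \in U} \pi_k^* (1 - \pi_k^*)\, \tilde{g}_{dk}^2\Big],$$
$\tilde{\mathbf{g}}_d = (\tilde{g}_{d1}, \ldots, \tilde{g}_{dk}, \ldots, \tilde{g}_{dN})'$ and $\tilde{g}_{dk} = \boldsymbol{\delta}_k' \tilde{\mathbf{B}}_{(dr)}$, with $\tilde{\mathbf{B}}_{(dr)}$ given by (2) replacing $y_{rk}$ with $\tilde{y}_{rk}$.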
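The reformulation of constraints (5) can be checked by expanding the squared residual. The lines below are a sketch of that algebra, assuming $\tilde{\eta}_{(dr)k} = \tilde{y}_{rk}\gamma_{dk} - \pi_k^* \tilde{g}_{dk}$, the identity $\gamma_{dk}^2 = \gamma_{dk}$, and the upward approximation of the anticipated variance with the common factor $f$ on both sums:

```latex
\begin{align*}
(1/\pi_k^* - 1)\,\tilde{\eta}_{(dr)k}^2
  &= (1/\pi_k^* - 1)\big(\tilde{y}_{rk}^2\gamma_{dk}
     - 2\pi_k^*\,\tilde{y}_{rk}\gamma_{dk}\,\tilde{g}_{dk}
     + \pi_k^{*2}\,\tilde{g}_{dk}^2\big) \\
  &= \frac{\tilde{y}_{rk}^2\gamma_{dk}}{\pi_k^*} - \tilde{y}_{rk}^2\gamma_{dk}
     - 2(1-\pi_k^*)\,\tilde{y}_{rk}\gamma_{dk}\,\tilde{g}_{dk}
     + \pi_k^*(1-\pi_k^*)\,\tilde{g}_{dk}^2 , \\[4pt]
AV(\hat{t}_{(dr)})
  &\cong f\sum_{k\in U}\frac{(\tilde{y}_{rk}^2+\sigma_{rk}^2)\gamma_{dk}}{\pi_k^*}
     - f\sum_{k\in U}(\tilde{y}_{rk}^2+\sigma_{rk}^2)\gamma_{dk}
     - C_{(dr)}(\boldsymbol{\pi}^*,\tilde{\mathbf{g}}_d) .
\end{align*}
```

Imposing $AV(\hat{t}_{(dr)}) \le \bar{V}_{(dr)}$ and moving the last two terms to the right-hand side gives the reformulated constraint. The left-hand side is then a sum of terms of the form $a_k/\pi_k^*$, which is the structure handled by Chromy-type allocation algorithms once $C_{(dr)}$ is held fixed.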
Optimal allocation algorithm
The algorithm consists of two nested calculation loops.
Let $G^{(\alpha)}$ and $G^{(\alpha,\tau)}$ denote the generic quantity $G$ as calculated, respectively, at iteration α (α = 0, 1, 2, …) of the first loop (outer process) and at iteration τ (τ = 0, 1, 2, …) of the second loop (inner process).
Optimal allocation algorithm
For a given value of the vector of inclusion probabilities $\boldsymbol{\pi}^{(\alpha)} = (\pi_1^{(\alpha)}, \ldots, \pi_k^{(\alpha)}, \ldots, \pi_N^{(\alpha)})'$, the first calculation loop computes the (D × R) terms $C_{(dr)}^{(\alpha)}(\boldsymbol{\pi}^*, \tilde{\mathbf{g}}_d)$.
For given values of $C_{(dr)}^{(\alpha)}(\boldsymbol{\pi}^*, \tilde{\mathbf{g}}_d)$, the second calculation loop finds the inclusion probabilities $\boldsymbol{\pi}^{(\alpha,\tau)} = (\pi_1^{(\alpha,\tau)}, \ldots, \pi_k^{(\alpha,\tau)}, \ldots, \pi_N^{(\alpha,\tau)})'$ that solve the optimization problem, by a slight modification of the Chromy algorithm.
Monte Carlo simulation
Objectives of the simulation:
- test the convergence of the optimization algorithm (optimization step);
- compare the expected AV with the Monte Carlo empirical AV;
- compare with the standard cross-classified stratified design.
Monte Carlo simulation
Data:
- subpopulation of the Istat Italian Graduates’ Career Survey (3,427 units);
- driving allocation variables: employment status (yes/no); actively seeking work (yes/no);
- the values of the two variables are generated by means of an additive logistic model (prediction model);
- explanatory variables: degree mark, sex, age class and aggregated subject area of the degree;
- the parameters are estimated from the data of the previous survey.
Monte Carlo simulation
Survey target estimates:
- two partitions define the most disaggregated domains:
  - first partition: university by subject area of the degree (9 classes);
  - second partition: degree by sex;
- domains in the real survey: 448 + 94; strata: 2,981 (university, degree, sex);
- in the simulation: domains 20 + 15; strata 91;
- error thresholds are fixed in terms of CV (%).
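Since the error thresholds are stated as CVs while the optimization constraints are stated as variances, a conversion is needed. The sketch below shows the standard relation CV(%) = 100 x sqrt(V) / t solved for V; whether the slides use the true or the model-predicted domain total for t is not stated, so the total passed in is an assumption, as are the function name and the numbers.

```python
def variance_threshold(cv_percent, total):
    """Variance threshold implied by a CV(%) target for an estimated total:
    CV(%) = 100 * sqrt(V) / total, hence V = (CV(%) * total / 100)^2."""
    return (cv_percent * total / 100.0) ** 2

# Hypothetical target: CV of 10% on a domain total of 500 units.
v_bar = variance_threshold(10.0, 500.0)
print(v_bar)  # 2500.0
```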
Monte Carlo simulation
Results:
Assuming the values of the variables of interest as known:
- iterations (outer process): 6;
- optimal sample size: 171 (182 after calibration).
Assuming predicted values:
- iterations (outer process): 3;
- optimal sample size: 699 (707 after calibration).
Monte Carlo simulation
Analysis of the allocation with the predicted values:
- the sample allocation procedure uses an approximation of the AV;
- the simulation confirms that the input AV is an upward approximation of the real AV.

Average of expected anticipated CV (%)
Partition    y1     y2
1            8.1    17.8
2            9.2    19.1

Average of empirical anticipated CV (%) (10,000 Monte Carlo simulations)
Partition    y1     y2
1            6.7    14.7
2            7.4    15.5
Monte Carlo simulation
Comparison with the standard approach:
- the implicit model (one-way stratification model) is similar to the model used in our approach;
- the allocation differences depend on the constraint of a minimum number of units (2) in each stratum;
- the sample size is 751 units (+7.4%);
- considering only the domains with small population strata (fewer than 10 units per stratum on average), the standard approach produces a sample size 14.4% larger.
References
Bethel J. (1989). Sample Allocation in Multivariate Surveys. Survey Methodology, 15, 47-57.
Chromy J. (1987). Design Optimization with Multiple Objectives. Proceedings of the Survey Research Methods Section, American Statistical Association, 194-199.
Deville J.-C., Tillé Y. (2004). Efficient Balanced Sampling: the Cube Method. Biometrika, 91, 893-912.
Deville J.-C., Tillé Y. (2005). Variance Approximation under Balanced Sampling. Journal of Statistical Planning and Inference, 128, 569-591.
Falorsi P. D., Righi P. (2008). A Balanced Sampling Approach for Multi-way Stratification Designs for Small Area Estimation. Survey Methodology, 34, 223-234.
Falorsi P. D., Orsini D., Righi P. (2006). Balanced and Coordinated Sampling Designs for Small Domain Estimation. Statistics in Transition, 7, 1173-1198.
Isaki C. T., Fuller W. A. (1982). Survey Design under a Regression Superpopulation Model. Journal of the American Statistical Association, 77, 89-96.