Optimal Stratification and Allocation for the June Agricultural Survey
Jonathan Lisic¹, Hejian Sang², Zhengyuan Zhu², Stephanie Zimmer²
¹ United States Department of Agriculture, National Agricultural Statistics Service, Washington, D.C. 20250, U.S.A.
² Department of Statistics, Iowa State University, Ames, IA 50011, U.S.A.
December 2nd, 2015
“... providing timely, accurate, and useful statistics in service to U.S. agriculture.”
Overview
This presentation will cover:
- Optimal stratification and allocation through simulated annealing under coefficient of variation (CV) and fixed sample size constraints.
- Application to simulated data.
- Application to the June Agricultural Survey.
FCSM 2015 2
Background
The June Agricultural Survey (JAS) is an annual area survey of agriculture over the contiguous 48 states.
- Stratification is performed on a state-by-state basis.
- Characteristics of interest include major commercial crop acreages (corn, soybeans, winter wheat, etc.).
- Sampling units (segments) are approximately one square mile in size (up to 268,518 segments in Texas).
- Characteristics are not necessarily correlated with each other.
- Target CVs are set for estimates, not administrative data.
- Highly correlated covariates available through remote sensing for crops.
- Fixed sample size.
Background
The current stratification is non-optimal:
- Strata are formed through univariate bounds on cultivated acreage within segments.
- Stratification is not based on the characteristics of interest.
- Optimal allocation is performed given a stratification.
The Problem
How do you create an optimal design under quality and sample size constraints?
Prior Approaches
- The problem has been addressed by Dalenius and Hodges (1959) and Lavallée and Hidiroglou (1988) for the specific case of two strata (one census and one non-census).
- Lavallée and Hidiroglou (1988) formed strata through univariate thresholding, e.g. establishments with more than 100 people.
- This work has been extended to multiple dimensions (see Benedetti and Piersimoni, 2012), but not to more strata.
- The multivariate extension initially forms boundaries through univariate thresholding of each of the characteristics being sampled.
- These boundaries are relaxed through a sequence of exchanges.
- These methods require strong population asymmetry, and the sample size cannot be fixed.
Approach
- Use existing, computationally efficient machine learning methods to form an initial stratification.
- Use simulated annealing to both obtain an optimal sample allocation and provide a stratification aligned with our desired objective function.
- The approach taken does not require strong population asymmetry, but requires the sample size to be fixed (potentially empty feasible region).
Objective Function
How do you define an objective function if you have vector-valued CVs, $c = (c_1, c_2, \ldots, c_J)$, and targets $c^{*} = (c^{*}_1, c^{*}_2, \ldots, c^{*}_J)$?

$$c_j = \frac{\sqrt{S_j^2}}{y_j}$$

where $y$ is the set of PSUs with fully observed administrative data indexed by $j$.
Objective Function
We apply a penalized objective function, with penalty $\lambda$:

$$\|c\|_2^2 + \lambda \, \|c - c^{*}\|_{2+}^2 \qquad (1)$$

- This objective function penalizes departures from the vector-valued target CVs $c^{*}$.
- The function $\|x\|_{2+} = \left(\sum_{j=1}^{J} x_j^2 \, I_{x_j > 0}\right)^{1/2}$.
- This approach is “soft” in that it does not have “hard” CV constraints.
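As a minimal sketch of Equation (1), the following Python computes the penalized objective for a vector of attained CVs and a vector of targets. The function names are illustrative and not taken from the presentation or its R package; the input values are borrowed from the worked example later in the talk.

```python
import math

def positive_part_norm(x):
    """||x||_{2+}: Euclidean norm taken over the positive entries of x only."""
    return math.sqrt(sum(v * v for v in x if v > 0))

def penalized_objective(c, target, lam):
    """Equation (1) as printed: ||c||_2^2 + lam * ||c - target||_{2+}^2."""
    size = sum(v * v for v in c)
    excess = [cj - tj for cj, tj in zip(c, target)]
    return size + lam * positive_part_norm(excess) ** 2

# Only the first CV exceeds its target, so only that coordinate is penalized.
value = penalized_objective([0.082, 0.068], [0.050, 0.100], lam=100)

# CVs that are all below target incur no penalty at all.
no_penalty = penalized_objective([0.01, 0.02], [0.05, 0.05], lam=1000)
```

Because the penalty only counts coordinates where the attained CV is above its target, being better than the target on one characteristic cannot offset missing the target on another.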
Simulated Annealing
- Simulated annealing is a stochastic optimization process that minimizes an objective function (possibly with constraints).
- It avoids becoming trapped in a local minimum by occasionally accepting non-optimal states.
- The general form of an algorithm to perform this stochastic process is:
  1. Start with initial state $X_0$;
  2. Randomly generate a candidate state $Y_l$, $l \geq 1$;
  3. If $Y_l$ has a lower objective function than $X_{l-1}$, set $X_l = Y_l$;
  4. Else accept $Y_l$ with probability $\rho = \exp\{\Delta h_l / t(l)\}$, otherwise $X_l = X_{l-1}$, where $\Delta h_l = h(X_{l-1}) - h(Y_l)$ is the difference in objective function values;
  5. Go back to Step 2 until a threshold of iterations has been met.
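The steps above can be sketched generically in Python. This is an illustration under stated assumptions, not the authors' implementation: the toy objective, the proposal function, the cooling constant and the bookkeeping of the best state seen are all additions made for the example.

```python
import math
import random

def simulated_annealing(h, x0, propose, temperature, n_iter, seed=0):
    """Minimize h with the accept/reject scheme of Steps 1-5."""
    rng = random.Random(seed)
    x, hx = x0, h(x0)                      # Step 1: initial state
    best, hbest = x, hx                    # extra bookkeeping, not one of the listed steps
    for l in range(1, n_iter + 1):
        y = propose(x, rng)                # Step 2: random candidate
        hy = h(y)
        delta = hx - hy                    # positive when the candidate is better
        # Step 3 accepts improvements outright; Step 4 accepts the rest
        # with probability exp(delta / t(l)), which is at most 1.
        if delta > 0 or rng.random() < math.exp(delta / temperature(l)):
            x, hx = y, hy
            if hx < hbest:
                best, hbest = x, hx
    return best, hbest                     # Step 5: stop after n_iter iterations

def toy_objective(k):
    """Hypothetical integer objective with its minimum value 1 at k = 16 and k = 18."""
    return abs(k - 17) + 3 * (k % 2)

def step(k, rng):
    """Hypothetical proposal: move by one or two units in either direction."""
    return k + rng.choice([-2, -1, 1, 2])

best, value = simulated_annealing(
    toy_objective, 0, step, temperature=lambda l: 10.0 / (l + 1), n_iter=2000
)
```

The temperature function shrinks with the iteration count, so uphill moves are tolerated early and become increasingly rare later.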
Simulated Annealing (Example 1)
[Figure: simulated annealing illustration shown over two slides; images not preserved in this transcript.]
Simulated Annealing
1. Start with initial stratification $I^{(0)}$ and allocation $\eta^{(0)}$;
2. Randomly generate a candidate state $I^{(l)}_{*}$, $l \geq 1$;
3. Randomly generate a candidate state $\eta^{(l)}_{*}$ (possibly the same as the prior state);
4. If $(I^{(l)}_{*}, \eta^{(l)}_{*})$ has a lower objective function than $(I^{(l-1)}, \eta^{(l-1)})$, set $(I^{(l)}, \eta^{(l)}) = (I^{(l)}_{*}, \eta^{(l)}_{*})$;
5. Else accept $I^{(l)}_{*}$ with probability $\rho = \exp\{\Delta h_l / t(l)\}$, otherwise $I^{(l)} = I^{(l-1)}$;
6. Go back to Step 2 until a threshold of iterations has been met.

In this application $t(l) = \alpha (l + 1)^{-1}$, where $\alpha$ is a tuning parameter.
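A compact sketch of this joint search over stratification and allocation follows. Several pieces are assumptions made for illustration rather than details from the slides: the CV is taken to be that of an estimated total without a finite population correction, a candidate always moves exactly one PSU, the allocation shifts one sample unit half of the time, the candidate pair is accepted or rejected as a whole, and infeasible candidates are simply skipped. The best state seen is also tracked, which the listed steps do not require.

```python
import math
import random
from statistics import variance

def cvs(data, strata, alloc):
    """CV of the estimated total per variable (assumed form, no fpc)."""
    out = []
    for j in range(len(data[0])):
        var = 0.0
        for h, n_h in enumerate(alloc):
            vals = [row[j] for row, s in zip(data, strata) if s == h]
            var += len(vals) ** 2 * variance(vals) / n_h
        out.append(math.sqrt(var) / sum(row[j] for row in data))
    return out

def objective(c, target, lam):
    """Penalized objective with squared norms, as in Equation (1)."""
    penalty = sum((a - b) ** 2 for a, b in zip(c, target) if a > b)
    return sum(a * a for a in c) + lam * penalty

def anneal(data, strata, alloc, target, lam, alpha, n_iter, seed=0):
    rng = random.Random(seed)
    strata, alloc = list(strata), list(alloc)
    H = len(alloc)
    h_cur = objective(cvs(data, strata, alloc), target, lam)
    best = (strata, alloc, h_cur)
    for l in range(1, n_iter + 1):
        cand_s, cand_a = list(strata), list(alloc)
        i = rng.randrange(len(data))                      # Step 2: move one PSU
        cand_s[i] = rng.choice([h for h in range(H) if h != cand_s[i]])
        if rng.random() < 0.5:                            # Step 3: maybe shift one sample unit
            a, b = rng.sample(range(H), 2)
            cand_a[a] -= 1
            cand_a[b] += 1
        sizes = [cand_s.count(h) for h in range(H)]
        if any(N < 2 or n < 1 or n > N for N, n in zip(sizes, cand_a)):
            continue                                      # skip infeasible candidates
        h_new = objective(cvs(data, cand_s, cand_a), target, lam)
        delta = h_cur - h_new
        # t(l) = alpha / (l + 1), so delta / t(l) = delta * (l + 1) / alpha.
        if delta > 0 or rng.random() < math.exp(delta * (l + 1) / alpha):
            strata, alloc, h_cur = cand_s, cand_a, h_new  # Steps 4 and 5
            if h_cur < best[2]:
                best = (strata, alloc, h_cur)
    return best

x = [2.3, 2.5, 2.1, 2.8, 2.9, 4.9, 5.1, 4.2, 2.8, 4.3]
y = [72, 55, 42, 61, 68, 58, 44, 51, 48, 52]
data = list(zip(x, y))
start_strata, start_alloc, target = [0] * 5 + [1] * 5, [3, 3], (0.050, 0.100)
h_start = objective(cvs(data, start_strata, start_alloc), target, 100)
best_strata, best_alloc, best_h = anneal(
    data, start_strata, start_alloc, target, lam=100, alpha=1.0, n_iter=2000
)
```

Moving a sample unit from one stratum to another, instead of redrawing the allocation, keeps the total sample size fixed at every iteration.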
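Simulated Annealing (Example 2, Iteration 0)

| Index | Strata | x | y |
|---|---|---|---|
| 1 | 1 | 2.3 | 72 |
| 2 | 1 | 2.5 | 55 |
| 3 | 1 | 2.1 | 42 |
| 4 | 1 | 2.8 | 61 |
| 5 | 1 | 2.9 | 68 |
| 6 | 2 | 4.9 | 58 |
| 7 | 2 | 5.1 | 44 |
| 8 | 2 | 4.2 | 51 |
| 9 | 2 | 2.8 | 48 |
| 10 | 2 | 4.3 | 52 |

For sample size $n = (3, 3)$, $\lambda = 100$, $\alpha = 1$, attained CVs $c = (0.082, 0.068)$, target CVs $c^{*} = (0.050, 0.100)$, objective function = 3.307.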
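The slides do not spell out the variance formula behind these CVs. The sketch below assumes the CV of an estimated total under stratified simple random sampling with no finite population correction, $\sqrt{\sum_h N_h^2 S_h^2 / n_h}$ divided by the population total; that form is an assumption, chosen because it reproduces the reported CVs of 0.082 and 0.068 to within rounding.

```python
import math
from statistics import variance

x = [2.3, 2.5, 2.1, 2.8, 2.9, 4.9, 5.1, 4.2, 2.8, 4.3]
y = [72, 55, 42, 61, 68, 58, 44, 51, 48, 52]

def cv_of_total(values, strata, alloc):
    """sqrt(sum_h N_h^2 * S_h^2 / n_h) / total, with no finite population correction."""
    var = 0.0
    for h, n_h in alloc.items():
        vals = [v for v, s in zip(values, strata) if s == h]
        var += len(vals) ** 2 * variance(vals) / n_h
    return math.sqrt(var) / sum(values)

strata0 = [1] * 5 + [2] * 5          # Iteration 0 stratum labels
alloc0 = {1: 3, 2: 3}                # n = (3, 3)
c0 = (cv_of_total(x, strata0, alloc0), cv_of_total(y, strata0, alloc0))
```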
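Simulated Annealing (Example 2, Iteration 1)

| Index | Strata | x | y |
|---|---|---|---|
| 1 | 1 | 2.3 | 72 |
| 2 | 1 | 2.5 | 55 |
| 3 | 1 → 2 | 2.1 | 42 |
| 4 | 1 | 2.8 | 61 |
| 5 | 1 | 2.9 | 68 |
| 6 | 2 | 4.9 | 58 |
| 7 | 2 | 5.1 | 44 |
| 8 | 2 | 4.2 | 51 |
| 9 | 2 | 2.8 | 48 |
| 10 | 2 | 4.3 | 52 |

For sample size $n = (2, 4)$, attained CVs $c = (0.108, 0.050)$, target CVs $c^{*} = (0.050, 0.100)$, objective function = 5.919, and $\rho = 0.271$.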
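Simulated Annealing (Example 2, Iteration 2)

| Index | Strata | x | y |
|---|---|---|---|
| 1 | 1 | 2.3 | 72 |
| 2 | 1 | 2.5 | 55 |
| 3 | 2 | 2.1 | 42 |
| 4 | 1 | 2.8 | 61 |
| 5 | 1 | 2.9 | 68 |
| 6 | 2 | 4.9 | 58 |
| 7 | 2 | 5.1 | 44 |
| 8 | 2 | 4.2 | 51 |
| 9 | 2 → 1 | 2.8 | 48 |
| 10 | 2 | 4.3 | 52 |

For sample size $n = (2, 4)$, attained CVs $c = (0.092, 0.069)$, target CVs $c^{*} = (0.050, 0.100)$, and objective function = 4.315 < 5.919.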
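As an arithmetic check on Example 2, the sketch below recomputes the objective from the CVs as reported (rounded to three decimals). Two observations here come from the arithmetic and are not stated in the slides: the reported values 3.307, 5.919 and 4.315 are matched by the penalized objective with unsquared norms, and $\rho = 0.271$ is matched by $\exp\{\Delta h / t\}$ with a temperature of $t = 2$. Both are treated as assumptions.

```python
import math

def objective_unsquared(c, target, lam):
    """||c||_2 + lam * ||c - target||_{2+}, i.e. Equation (1) without the squares."""
    size = math.sqrt(sum(v * v for v in c))
    excess = math.sqrt(sum((a - b) ** 2 for a, b in zip(c, target) if a > b))
    return size + lam * excess

target = (0.050, 0.100)
h0 = objective_unsquared((0.082, 0.068), target, 100)   # Iteration 0
h1 = objective_unsquared((0.108, 0.050), target, 100)   # Iteration 1
h2 = objective_unsquared((0.092, 0.069), target, 100)   # Iteration 2

# Acceptance probability for the worse Iteration 1 state, assuming t = 2.
rho = math.exp((h0 - h1) / 2.0)
```

In Iteration 2 the candidate lowers the objective relative to Iteration 1, so it is accepted without drawing a random number.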
Simulated Annealing
Why would this work?
- Each move is reversible, ensuring that, for an infinitely long run time with exact precision, the method will converge to the global minimum.
- For large populations with small sample sizes, little change is needed to retain optimal allocation after a single PSU exchange.
- Furthermore, if a large change in optimal allocation needs to occur after a single PSU exchange, that PSU probably shouldn't be moved.
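Simulation
A simulation was performed with two population sizes, $N = 2{,}800$ and $N = 280{,}000$, both with sample size 60.
- $N_1/N = 8/28$, $x_1 \sim N(\mu_1, \Sigma_1)$, $\mu_1 = (60, 10)$, and $\Sigma_1 = \begin{pmatrix} 6 & 0 \\ 0 & 6 \end{pmatrix}$.
- $N_2/N = 10/28$, $x_2 \sim N(\mu_2, \Sigma_2)$, $\mu_2 = (20, 10)$, and $\Sigma_2 = \begin{pmatrix} 6 & 0 \\ 0 & 6 \end{pmatrix}$.
- $N_3/N = 10/28$, $x_3 \sim N(\mu_3, \Sigma_3)$, $\mu_3 = (20, 30)$, and $\Sigma_3 = \begin{pmatrix} 6 & 0 \\ 0 & 6 \end{pmatrix}$.
- $\lambda = 10{,}000$.
- Target CVs $c^{*} = (0.020, 0.070)$.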
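A population with this structure can be generated with the Python standard library alone, since a diagonal covariance matrix means the two coordinates are independent normals with variance 6. The function name and seed below are illustrative and not taken from the presentation.

```python
import math
import random

def simulate_population(N, seed=0):
    """Draw N points from the three-component mixture described above."""
    rng = random.Random(seed)
    sd = math.sqrt(6.0)                      # diagonal covariance, variance 6 per coordinate
    groups = [(8, (60, 10)), (10, (20, 10)), (10, (20, 30))]   # (share out of 28, mean)
    population = []
    for g, (share, mu) in enumerate(groups):
        size = N * share // 28
        population += [
            (g, rng.gauss(mu[0], sd), rng.gauss(mu[1], sd)) for _ in range(size)
        ]
    return population

pop = simulate_population(2800)
sizes = [sum(1 for g, _, _ in pop if g == k) for k in range(3)]
```

With $N = 2{,}800$ the three groups contain 800, 1,000 and 1,000 units respectively.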
Simulation (N=2,800)
| Target | Univariate Optimal X | K-means Optimal Alloc. | Simulated Annealing |
|---|---|---|---|
| 0.020 | 0.017 | 0.023 | 0.020 |
| 0.070 | 0.084 | 0.050 | 0.070 |
Table: Attained CVs for simulated population size of 2,800.
Run Time = 7 seconds for 1,000,000 iterations
Simulation (N=280,000)
| Target | Univariate Optimal X | K-means Optimal Alloc. | Simulated Annealing |
|---|---|---|---|
| 0.020 | 0.017 | 0.022 | 0.020 |
| 0.070 | 0.089 | 0.047 | 0.071 |
Table: Attained CVs for simulated population size of 280,000.
Run Time = 3.0 hours for 50,000,000 iterations
Speed and Stability
That’s a lot of iterations!
- Computational speed:
  - Variances are saved and only updated on accepted exchanges.
  - Variances are updated using online methods.
- Computational stability:
  - After a fixed number of iterations, variances are recalculated from current strata assignments.
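One way to update a stratum variance online when a PSU leaves or joins is a Welford-style running update, sketched below. This is an illustration of the general technique; the slides do not say which online method the authors use, and the class name is hypothetical.

```python
from statistics import variance

class RunningVariance:
    """Running mean and sum of squared deviations, supporting add and remove."""

    def __init__(self, values=()):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        for v in values:
            self.add(v)

    def add(self, v):
        self.n += 1
        d = v - self.mean
        self.mean += d / self.n
        self.m2 += d * (v - self.mean)

    def remove(self, v):
        # Exact inverse of add(); requires at least two stored values.
        mean_without = (self.n * self.mean - v) / (self.n - 1)
        self.m2 -= (v - mean_without) * (v - self.mean)
        self.mean = mean_without
        self.n -= 1

    def variance(self):
        return self.m2 / (self.n - 1)

# Exchange from Example 2, Iteration 1: the PSU with x = 2.1 moves from stratum 1 to 2.
s1 = RunningVariance([2.3, 2.5, 2.1, 2.8, 2.9])
s2 = RunningVariance([4.9, 5.1, 4.2, 2.8, 4.3])
s1.remove(2.1)
s2.add(2.1)

# Periodic recalculation from the current assignments guards against drift.
check1 = variance([2.3, 2.5, 2.8, 2.9])
check2 = variance([4.9, 5.1, 4.2, 2.8, 4.3, 2.1])
```

Each update costs constant time per variable, whereas recomputing from scratch costs time proportional to the stratum size. Floating-point error accumulates over many updates, which is the motivation for the periodic recalculation.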
More Speed
Can we make this faster?
- Most successful exchanges occur near the initial boundaries between strata from the applied machine-learning methods.
- Weighting can be applied to increase the number of exchanges near the boundaries relative to other locations.
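The slides do not specify the weighting scheme. As one hypothetical possibility, the sketch below gives a PSU a larger selection weight when it is almost as close to another stratum's centroid as to its own, then samples the PSU to exchange with `random.choices`. The weight formula, the centroids and the points are all assumptions made for illustration.

```python
import math
import random

def boundary_weights(points, labels, centroids):
    """Hypothetical weight: 1 / (1 + gap), where gap is the distance to the
    nearest other centroid minus the distance to the PSU's own centroid."""
    weights = []
    for p, lab in zip(points, labels):
        own = math.dist(p, centroids[lab])
        other = min(math.dist(p, c) for k, c in centroids.items() if k != lab)
        weights.append(1.0 / (1.0 + abs(other - own)))
    return weights

centroids = {0: (0.0, 0.0), 1: (10.0, 0.0)}
points = [(1.0, 0.0), (4.5, 0.0), (9.0, 0.0)]
labels = [0, 0, 1]
weights = boundary_weights(points, labels, centroids)

# Draw the index of the PSU to propose for exchange, favouring boundary PSUs.
rng = random.Random(0)
chosen = rng.choices(range(len(points)), weights=weights, k=1)[0]
```

Here the middle point sits near the boundary between the two strata and receives a weight of 0.5, against 1/9 for each of the other two points.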
June Agricultural Survey
This method was tested on South Dakota.
- Target crops included cultivated acreage, corn, soybeans, winter wheat and spring wheat.
- Survey using covariate data from 2013-2014.
- Each year-by-administrative variable pair is treated as a distinct administrative variable.
- 2015-2019 response is simulated using the 2008-2012 Cropland Data Layer (CDL) (see Boryan et al., 2011).
- The algorithm was run for 5,000,000 iterations.
June Agricultural Survey
Results:

| | Cultivated | Corn | Soybeans | Winter Wht. | Spring Wht. |
|---|---|---|---|---|---|
| Target | 0.01 | 0.05 | 0.05 | 0.19 | 0.16 |
| 2013 | 0.01 | 0.03 | 0.04 | 0.09 | 0.09 |
| 2014 | 0.01 | 0.02 | 0.04 | 0.07 | 0.07 |
| *2015 | 0.02 | 0.04 | 0.05 | 0.10 | 0.10 |
| *2016 | 0.02 | 0.04 | 0.05 | 0.10 | 0.10 |
| *2017 | 0.02 | 0.04 | 0.05 | 0.10 | 0.12 |
| *2018 | 0.02 | 0.04 | 0.04 | 0.10 | 0.11 |
| *2019 | 0.02 | 0.04 | 0.04 | 0.12 | 0.12 |

*Using CDL data from prior years.
Open and Reproducible Research
R package available at https://github.com/jlisic/saAlloc.
Future Work
- Consider moving to more efficient methods such as differential evolution (see Day, 2009).
- Investigate adaptive methods for weighting.
- Consider alternatives to moving a single PSU, maybe hyperplanes?
- For JAS, understand the relationship between the administrative data CVs and the estimate CVs.
- For JAS, consider ways to predict future land cover.
Thank You
References

Roberto Benedetti and Federica Piersimoni. Multivariate boundaries of a self representing stratum of large units in agricultural survey design. Survey Research Methods, 6(3):125–135, 2012.
Claire Boryan, Zhengwei Yang, Rick Mueller, and Mike Craig. Monitoring US agriculture: the US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer program. Geocarto International, 26(5):341–358, 2011.
T. Dalenius and J. L. Hodges Jr. Minimum variance stratification. Journal of the American Statistical Association, 54:88–101, 1959.
Charles D. Day. Evolutionary algorithms for optimal sample design. Paper presented at the 2009 Federal Committee on Statistical Methodology Conference, Washington, DC, 2009.
Pierre Lavallée and M. Hidiroglou. On the stratification of skewed populations. Survey Methodology, 14(1):33–43, 1988.