
Lenka Mach, Statistics Canada
Ioana Şchiopu-Kratina, Statistics Canada
Philip T. Reiss, New York University Child Study Center
Jean-Marc Fillion, Statistics Canada

ICES III, June 2007

Optimal Coordination of Samples in Business Surveys

2

OUTLINE OF THE PRESENTATION:

1. Coordinated sampling

2. Optimal Sample Coordination

2.1 Transportation Problem

2.2 Reduced Transportation Problem

2.3 Variability of the Overlap

3. Example 1: NWCR method for negative coordination of two surveys.

4. Example 2: Reduced TP for positive coordination after re-stratification.

5. Conclusion

3

1. COORDINATED SAMPLING

• Needed when multiple sample surveys of overlapping populations are conducted.

• Encompasses many different techniques to control the overlap of samples (overlap = number of common units).

• Objective: higher overlap (positive coordination) or lower overlap (negative coordination) than if samples are selected independently.

• References: Ernst (1999), ICES II (2000), etc.

4

1. COORDINATED SAMPLING

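First Survey:

S = set of all possible samples s

$P = \{p(s),\ s \in S\}$ = (marginal) prob. distribution on S

Second Survey:

S’ = set of all possible samples s’

$Q = \{p(s'),\ s' \in S'\}$ = (marginal) prob. distribution on S’

Integrated surveys:

$\{p(s, s'),\ s \in S,\ s' \in S'\}$ = joint prob. distribution s. t.

$$\sum_{s' \in S'} p(s, s') = p(s),\ s \in S \qquad \text{and} \qquad \sum_{s \in S} p(s, s') = p(s'),\ s' \in S'$$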

5

1. COORDINATED SAMPLING

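Overlap of s and s’:

$o(s, s')$ = number of units that s and s’ have in common

Expected sample overlap:

$$E[o(s, s')] = \sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s, s') \qquad (1)$$

Surveys are positively coordinated if

$$\sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s, s') > \sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s)\, p(s')$$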
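The independent-selection benchmark on the right-hand side can be checked by brute force on a toy frame. The sketch below is illustrative only: the function name and the tiny frame sizes are assumptions, and both surveys are assumed to draw SRSWOR samples from the same N units.

```python
from fractions import Fraction
from itertools import combinations

def expected_overlap_independent(N, n, n_prime):
    """Expected overlap (1) when p(s, s') = p(s) * p(s'), i.e. independent SRSWOR."""
    units = range(N)
    S = [frozenset(s) for s in combinations(units, n)]
    S_prime = [frozenset(s) for s in combinations(units, n_prime)]
    p_s = Fraction(1, len(S))          # SRSWOR: every sample equally likely
    p_sp = Fraction(1, len(S_prime))
    return sum(len(s & sp) * p_s * p_sp for s in S for sp in S_prime)

print(expected_overlap_independent(4, 2, 2))  # 1
```

Any coordinated design whose expected overlap exceeds this value is positively coordinated in the sense above.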

6

2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem

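We integrate two surveys so that the expected overlap is maximized (minimized):

Find max (min) of the objective function

$$E[o(s, s')] = \sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s, s') \qquad (1)$$

over all unknown

$$X = \{p(s, s'),\ s \in S,\ s' \in S'\} \qquad (2)$$

subject to the constraints

$$\sum_{s' \in S'} p(s, s') = p(s),\ s \in S; \qquad \sum_{s \in S} p(s, s') = p(s'),\ s' \in S'; \qquad \sum_{s \in S} \sum_{s' \in S'} p(s, s') = 1 \qquad (3)$$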
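For a toy case the transportation problem can be solved by enumeration. The sketch below is not the method of the presentation: it assumes both surveys use the same SRSWOR design, so all marginals equal 1/K, and relies on the fact that the vertices of that feasible set are 1/K times permutation matrices, so checking every permutation finds the optimum of the linear objective. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations

def optimal_overlap_same_design(N, n):
    """Min and max of the expected overlap (1) under constraints (3), toy sizes only."""
    samples = [frozenset(s) for s in combinations(range(N), n)]
    K = len(samples)
    o = [[len(s & t) for t in samples] for s in samples]  # overlap matrix o(s, s')
    # A permutation pairs sample i with sample perm[i]; each pair gets probability 1/K.
    values = [Fraction(sum(o[i][j] for i, j in enumerate(perm)), K)
              for perm in permutations(range(K))]
    return min(values), max(values)

low, high = optimal_overlap_same_design(4, 2)
print(low, high)  # 0 2
```

With N = 4 and n = 2 there are only 6 samples and 720 permutations; the approach is hopeless for realistic sizes, which is the motivation for grouping samples.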

7

2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem

s’1 s’2 s’3 … s’L p(s)

s1

…p(s1)

s2

…p(s2)

s3

…p(s3)

… … … … … … …

sK … p(sK)

p(s’) p(s’1) p(s’2) p(s’3) … p(s’L) 1

ss’

o(s1,s’1) o(s1,s’2)

o(s2,s’1)

o(s1,s’3) o(s1,s’L)

o(s3,s’1)

o(s2,s’2) o(s2,s’3) o(s2,s’L)

o(s3,s’2)

o(sK,s’L)

o(s3,s’L)o(s3,s’3)

o(sK,s’1) o(sK,s’2) o(sK,s’3)

X1 1 X12 X1 3 X1 L

X2 1 X2 2 X2 3X2L

X3 1 X3 3 X3 LX3 2

XK 1 XK 3XK 2 XK L

8

2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem

TP is too large, too many variables!

Example: First survey selects SRSWOR of n = 20 from N = 40.

$$K = \binom{N}{n} = \binom{40}{20} = 137{,}846{,}528{,}820$$

BUT, for stratified SRSWOR designs, we can reduce TP by grouping samples!

Condition: The matrix of o(s, s’) within each group must be “symmetric”.

We use a two-stage procedure.
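The count K above is easy to confirm; a one-line check using Python's standard library:

```python
from math import comb

K = comb(40, 20)  # number of possible SRSWOR samples of n = 20 from N = 40
print(f"{K:,}")   # 137,846,528,820
```

The full TP has one unknown for every pair (s, s’), that is K x L unknowns for K rows and L columns.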

9

2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem

Notation:
P = frame for Survey 1, P’ = frame for Survey 2, C = P ∩ P’
c = c(s) = number of units in C ∩ s
c’ = c’(s’) = number of units in C ∩ s’

Solution - Stage 1:
• Group samples s → super-rows c
• Group samples s’ → super-columns c’
• Form a matrix of blocks (c, c’), define block optimum o(c, c’)
• Solve the reduced TP → joint probabilities p(c, c’)

Solution - Stage 2:
Distribute p(c, c’) evenly among the pairs (s, s’) that have the optimum overlap
– each row s within the block gets the same probability
– each column s’ within the block gets the same probability

10

2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem

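Matrix of o(s, s’) within a block (u = units in C, d = deaths, b = births):

| s \ s’ | u1 u2 b1 b2 | u1 u3 b1 b2 | u2 u3 b1 b2 |
|---|---|---|---|
| u1 u2 d1 | 2 | 1 | 1 |
| u1 u3 d1 | 1 | 2 | 1 |
| u2 u3 d1 | 1 | 1 | 2 |
| u1 u2 d2 | 2 | 1 | 1 |
| u1 u3 d2 | 1 | 2 | 1 |
| u2 u3 d2 | 1 | 1 | 2 |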

11

2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem

Example 1:

Survey 1: N = 40, SRSWOR n = 20

Survey 2: N’ = 41, SRSWOR n’ = 20

C = 37, D = 3, B = 4

c = 17, 18, 19, 20 → 4 super-rows
c’ = 16, 17, 18, 19, 20 → 5 super-columns

Reduced TP has only 4 x 5 = 20 unknowns.

Constraints:

$$p(c) = \frac{\binom{C}{c}\binom{D}{n-c}}{\binom{N}{n}}, \qquad p(c') = \frac{\binom{C}{c'}\binom{B}{n'-c'}}{\binom{N'}{n'}}$$
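The marginal constraints of the reduced TP are hypergeometric probabilities and can be evaluated directly. A minimal sketch using exact fractions (variable names are illustrative):

```python
from fractions import Fraction
from math import comb

N, n = 40, 20        # Survey 1
N_P, n_P = 41, 20    # Survey 2
C, D, B = 37, 3, 4   # common units, deaths, births

# p(c): c common units and n - c deaths in s
p_c = {c: Fraction(comb(C, c) * comb(D, n - c), comb(N, n)) for c in range(17, 21)}
# p(c'): c' common units and n' - c' births in s'
p_cp = {c: Fraction(comb(C, c) * comb(B, n_P - c), comb(N_P, n_P)) for c in range(16, 21)}

for label, probs in (("p(c)", p_c), ("p(c')", p_cp)):
    print(label, {c: round(float(p), 4) for c, p in probs.items()})
```

Both sets of probabilities sum to 1, as required for the super-row and super-column totals.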

12

2. OPTIMAL SAMPLE COORDINATION 2.3 Variability of the Overlap

• Optimal coordination maximizes (minimizes) $E[o(s, s')]$.
• In practice, one pair of samples (s, s’) is selected → its overlap o(s, s’) should be close to $E[o(s, s')]$!
• TP can be used in 2 steps:

– Step 1: as described on Slide 6

– Step 2:
  - Use $E[o(s, s')]$ from Step 1 as an additional constraint
  - New objective function: For example, find the minimum of

$$V[o(s, s')] = \sum_{s \in S} \sum_{s' \in S'} \{o(s, s') - E[o(s, s')]\}^2\, p(s, s') \qquad (4)$$
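Written out, Step 2 is again a linear program. In the sketch below $E^{*}$ is an assumed symbol for the optimal expected overlap found in Step 1; because $E^{*}$ is a fixed number, the coefficients $\{o(s,s') - E^{*}\}^{2}$ are constants and the objective (4) stays linear in the unknowns $p(s, s')$.

```latex
\begin{aligned}
\min_{X}\quad & \sum_{s}\sum_{s'} \{o(s,s') - E^{*}\}^{2}\, p(s,s') \\
\text{subject to}\quad & \sum_{s'} p(s,s') = p(s), \qquad \sum_{s} p(s,s') = p(s'), \\
& \sum_{s}\sum_{s'} o(s,s')\, p(s,s') = E^{*}, \qquad p(s,s') \ge 0 .
\end{aligned}
```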

13

3. Example 1 NWCR method for negative coordination of two surveys.

Survey 1: N = 40, SRSWOR n = 20

Survey 2: N’ = 41, SRSWOR n’ = 20
D = 3, C = 37, B = 4

Minimize $E[o(s, s')]$.

Stage 1 – Solve the Reduced TP:
• Group samples s into super-rows and s’ into super-columns.
• Order super-rows by ascending c and super-columns by descending c’, form a matrix of blocks.
• Block optimum o(c, c’) = max{0, c+c’–C} = smallest possible overlap o(s, s’) within (c, c’).
• Use the NWCR (Northwest Corner Rule) algorithm to obtain a solution.

Stage 2 - Determine p(s, s’) for each pair (s, s’):
• Distribute p(c, c’) equally among all pairs (s, s’) within the block that have o(s, s’) = o(c, c’).

14

3. Example 1 NWCR method for negative coordination of two surveys.

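Table 1a: Reduced TP, p(c, c’) assigned by NWCR. Each cell shows o(c, c’) / p(c, c’).

| c \ c’ | 20 | 19 | 18 | 17 | 16 | p(c) |
|---|---|---|---|---|---|---|
| 17 | 0 / 0.0591 | 0 / 0.0563 | 0 / 0 | 0 / 0 | 0 / 0 | 0.1154 |
| 18 | 1 / 0 | 0 / 0.2064 | 0 / 0.1782 | 0 / 0 | 0 / 0 | 0.3846 |
| 19 | 2 / 0 | 1 / 0 | 0 / 0.2158 | 0 / 0.1689 | 0 / 0 | 0.3846 |
| 20 | 3 / 0 | 2 / 0 | 1 / 0 | 0 / 0.0675 | 0 / 0.0478 | 0.1154 |
| p(c’) | 0.0591 | 0.2627 | 0.3940 | 0.2364 | 0.0478 | 1.0000 |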
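The assignment in Table 1a can be reproduced with a few lines of code. The sketch below is a generic northwest corner rule applied to the hypergeometric marginals of Example 1; the function and variable names are illustrative, not taken from the presentation.

```python
from fractions import Fraction
from math import comb

def northwest_corner(row_probs, col_probs):
    """Start in the top-left cell and exhaust one margin at a time."""
    rows, cols = list(row_probs), list(col_probs)
    cells = {}
    i = j = 0
    while i < len(rows) and j < len(cols):
        x = min(rows[i], cols[j])   # largest mass the current cell can take
        cells[(i, j)] = x
        rows[i] -= x
        cols[j] -= x
        if rows[i] == 0:
            i += 1                  # row exhausted: move down
        else:
            j += 1                  # column exhausted: move right
    return cells

C, D, B = 37, 3, 4
c_values = [17, 18, 19, 20]        # super-rows, ascending c
cp_values = [20, 19, 18, 17, 16]   # super-columns, descending c'
p_c = [Fraction(comb(C, c) * comb(D, 20 - c), comb(40, 20)) for c in c_values]
p_cp = [Fraction(comb(C, c) * comb(B, 20 - c), comb(41, 20)) for c in cp_values]

solution = northwest_corner(p_c, p_cp)
table = {(c_values[i], cp_values[j]): x for (i, j), x in solution.items()}
expected_overlap = sum(max(0, c + cp - C) * x for (c, cp), x in table.items())

for block, x in table.items():
    print(block, round(float(x), 4))
print("E[o] =", expected_overlap)
```

Every block that receives positive probability has c + c’ ≤ 37, so its block optimum is 0 and the expected overlap is exactly 0.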

15

3. Example 1 NWCR method for negative coordination of two surveys.

Stage 2 - Distribution of probabilities within blocks

Consider (c=17, c’=20) with o(c, c’)=0:

• there are $\binom{37}{17}\binom{3}{3}$ = 15,905,368,710 different samples (rows) s

• there are $\binom{37}{20}\binom{4}{0}$ = 15,905,368,710 different samples (columns) s’

The matrix of overlaps o(s, s’) is symmetric:
For each sample s, there is exactly one sample s’ such that o(s, s’)=0.
For each sample s’, there is exactly one sample s such that o(s, s’)=0.

Each sample s will get probability of 0.0591 / 15,905,368,710.
Each sample s’ will get probability of 0.0591 / 15,905,368,710.
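As a consistency check (a worked calculation added here, using the rounded values of Table 1a), a sample s in super-row c = 17 receives probability from the two blocks (17, 20) and (17, 19). Since each row within a block gets the same share, the total recovers the SRSWOR probability of s:

```latex
\frac{0.0591 + 0.0563}{15{,}905{,}368{,}710}
  \approx \frac{p(c = 17)}{\binom{37}{17}\binom{3}{3}}
  = \frac{\binom{37}{17}\binom{3}{3} \big/ \binom{40}{20}}{\binom{37}{17}\binom{3}{3}}
  = \frac{1}{\binom{40}{20}} = p(s)
```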

16

3. Example 1 NWCR method for negative coordination of two surveys.

Theorem:

(a) The joint density $X_{NWCR}$ obtained by the NWCR method for negative coordination satisfies the constraints given in (3).

(b) $X_{NWCR}$ has the minimum expected overlap within the set of joint densities that satisfy (3).

(c) $X_{NWCR}$ has the minimum variance within this set of joint densities.

Proof in Mach, Reiss, Şchiopu-Kratina (2006).

17

3. Example 1 NWCR method for negative coordination of two surveys.

Simultaneous Selection

i) Select one block using the joint probabilities p(c, c’) in Table 1a.
ii) To draw samples s and s’, randomly select units from each set: C = common units, D = deaths, B = births.

Suppose block (19, 18) is selected in i). To select s, randomly select 19 units from the 37 in C, and 1 unit from the 3 in D. To select s’, take the remaining 37-19=18 units from C, and randomly select 2 units from the 4 in B.

Sequential Selection (s drawn first)

i) Select one block from the super-row c(s) using the conditional probabilities p{(c, c’)| c(s)} corresponding to the joint probabilities in Table 1b.
ii) Randomly select units from C and B sets to form s’.
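Step ii) of the simultaneous selection can be sketched as follows for a block (c, c’) that has already been chosen. The unit labels and the function name are hypothetical; the rule for blocks with c + c’ > 37 (re-using units of s only when the remaining common units run out) is an assumption that matches the block optimum max{0, c+c’–C}.

```python
import random

COMMON = [f"C{i}" for i in range(37)]   # hypothetical unit labels
DEATHS = [f"D{i}" for i in range(3)]
BIRTHS = [f"B{i}" for i in range(4)]

def draw_pair(c, c_prime, rng, n=20, n_prime=20):
    """Draw (s, s') within block (c, c') with the smallest possible overlap."""
    s_common = rng.sample(COMMON, c)
    s = set(s_common) | set(rng.sample(DEATHS, n - c))
    rest = [u for u in COMMON if u not in s]        # common units not used by s
    if c_prime <= len(rest):
        sp_common = rng.sample(rest, c_prime)
    else:
        # not enough unused common units: take them all, then re-use units of s
        sp_common = rest + rng.sample(s_common, c_prime - len(rest))
    s_prime = set(sp_common) | set(rng.sample(BIRTHS, n_prime - c_prime))
    return s, s_prime

rng = random.Random(2007)
s, s_prime = draw_pair(19, 18, rng)
print(len(s), len(s_prime), len(s & s_prime))  # 20 20 0
```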

18

3. Example 1 NWCR method for negative coordination of two surveys.

[Figure: the units split into Deaths (D = 3), Common Units (C = 37) and Births (B = 4). Sample s (n = 20, c = 19) and sample s’ (n’ = 20, c’ = 18) have o(s, s’) = 0.]

19

3. Example 1 NWCR method for negative coordination of two surveys.

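Table 1b: Empirical block probabilities for Sequential SRSWOR (PRN). Each cell shows o(c, c’) / p(c, c’).

| c \ c’ | 20 | 19 | 18 | 17 | 16 | p(c) |
|---|---|---|---|---|---|---|
| 17 | 0 / 0.0083 | 0 / 0.0336 | 0 / 0.0426 | 0 / 0.0239 | 0 / 0.0043 | 0.1127 |
| 18 | 1 / 0.0225 | 0 / 0.1022 | 0 / 0.1574 | 0 / 0.0890 | 0 / 0.0160 | 0.3871 |
| 19 | 2 / 0.0210 | 1 / 0.0987 | 0 / 0.1508 | 0 / 0.0993 | 0 / 0.0182 | 0.3880 |
| 20 | 3 / 0.0052 | 2 / 0.0251 | 1 / 0.0426 | 0 / 0.0319 | 0 / 0.0074 | 0.1122 |
| p(c’) | 0.0570 | 0.2596 | 0.3934 | 0.2441 | 0.0459 | 1.0000 |

Table 1c: Expectations

| Method | E [o(s, s’)] | V [o(s, s’)] |
|---|---|---|
| NWCR | 0 | 0 |
| PRN | 0.2716 | 0.3212 |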
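The PRN row of Table 1c can be recovered from the block probabilities of Table 1b. The sketch below makes one assumption that is not stated on the slide: every pair in a block is treated as having overlap o(c, c’) = max{0, c+c’–C}. Under that assumption the mean and variance agree with the tabulated values.

```python
C = 37
c_values = [17, 18, 19, 20]
cp_values = [20, 19, 18, 17, 16]
# Empirical block probabilities from Table 1b (rows c, columns c')
p = [
    [0.0083, 0.0336, 0.0426, 0.0239, 0.0043],
    [0.0225, 0.1022, 0.1574, 0.0890, 0.0160],
    [0.0210, 0.0987, 0.1508, 0.0993, 0.0182],
    [0.0052, 0.0251, 0.0426, 0.0319, 0.0074],
]

cells = [(max(0, c + cp - C), p[i][j])
         for i, c in enumerate(c_values)
         for j, cp in enumerate(cp_values)]
mean = sum(o * q for o, q in cells)               # equation (1) at block level
var = sum((o - mean) ** 2 * q for o, q in cells)  # equation (4) at block level
print(round(mean, 4), round(var, 4))  # 0.2716 0.3212
```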

20

4. Example 2 Reduced TP for positive coordination after re-stratification.

Old stratum 1: N1 = 20, n1 = 10; C1 = 2 units belong to the new stratum

Old stratum 2: N2 = 6, n2 = 3; C2 = 3 units belong to the new stratum

Old stratum 3: N3 = 10, n3 = 2; C3 = 10 units belong to the new stratum

New stratum: N’ = 15, n’ = 5

Objective: Maximize $E[o(s, s')]$.

21

4. Example 2 Reduced TP for positive coordination after re-stratification.

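Super-rows: $c = (c_1, c_2, c_3)$ with $c_1 = 0, 1, 2$; $c_2 = 0, 1, 2, 3$; $c_3 = 2$. → 3 x 4 x 1 = 12 super-rows

Super-columns: $c' = (c_1', c_2', c_3')$ with $c_1' + c_2' + c_3' = 5$:

(0, 0, 5), (0, 1, 4), (0, 2, 3), (0, 3, 2), (1, 0, 4), (1, 1, 3), (1, 2, 2), (1, 3, 1),

(2, 0, 3), (2, 1, 2), (2, 2, 1), (2, 3, 0). → 12 super-columns

Reduced TP has 12 x 12 = 144 unknowns.

Constraints:

Product of hypergeometric probabilities (the factor for old stratum 3 equals 1 because C3 = N3):

$$p(c) = \frac{\binom{C_1}{c_1}\binom{N_1 - C_1}{n_1 - c_1}}{\binom{N_1}{n_1}} \cdot \frac{\binom{C_2}{c_2}\binom{N_2 - C_2}{n_2 - c_2}}{\binom{N_2}{n_2}}$$

Multihypergeometric probabilities:

$$p(c') = \frac{\binom{C_1}{c_1'}\binom{C_2}{c_2'}\binom{C_3}{c_3'}}{\binom{N'}{n'}}$$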
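The super-rows, super-columns and their marginal probabilities for Example 2 can be enumerated directly. A minimal sketch with illustrative names, using the stratum sizes given for Example 2:

```python
from fractions import Fraction
from itertools import product
from math import comb

N_OLD, n_OLD = (20, 6, 10), (10, 3, 2)   # old strata 1-3
C_SIZES = (2, 3, 10)                     # units of each old stratum in the new stratum
N_NEW, n_NEW = 15, 5

def p_row(c):
    """Product of hypergeometric probabilities for c = (c1, c2, c3)."""
    prob = Fraction(1)
    for ci, Ci, Ni, ni in zip(c, C_SIZES, N_OLD, n_OLD):
        prob *= Fraction(comb(Ci, ci) * comb(Ni - Ci, ni - ci), comb(Ni, ni))
    return prob

def p_col(cp):
    """Multihypergeometric probability for c' = (c1', c2', c3')."""
    ways = 1
    for ci, Ci in zip(cp, C_SIZES):
        ways *= comb(Ci, ci)
    return Fraction(ways, comb(N_NEW, n_NEW))

super_rows = [c for c in product(range(3), range(4), range(3)) if p_row(c) > 0]
super_cols = [cp for cp in product(range(3), range(4), range(11)) if sum(cp) == n_NEW]

print(len(super_rows), len(super_cols))        # 12 12
print(round(float(p_row((2, 3, 2))), 4))       # 0.0118
print(round(float(p_col((1, 2, 2))), 4))       # 0.0899
```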

22

4. Example 2 Reduced TP for positive coordination after re-stratification.

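Table 2a: Block overlap and probabilities p(c, c’) (TP solution). Each cell shows o(c, c’) / p(c, c’).

| c \ c’ | 1,2,2 | 2,1,2 | 0,3,2 | … | 0,0,5 | p(c) |
|---|---|---|---|---|---|---|
| 2,3,2 | 5 / 0 | 5 / 0.0115 | 5 / 0 | … | 2 / 0 | 0.0118 |
| 2,2,2 | 5 / 0 | 5 / 0.0301 | 4 / 0 | … | 2 / 0 | 0.1066 |
| 1,3,2 | 5 / 0 | 4 / 0 | 5 / 0.0031 | … | 2 / 0 | 0.0263 |
| … | … | … | … | … | … | … |
| 0,0,2 | 2 / 0 | 2 / 0 | 2 / 0 | … | 2 / 0.0118 | 0.0118 |
| p(c’) | 0.0899 | 0.0450 | 0.0150 | … | 0.0839 | 1.0000 |

o(c, c’) = min(c1,c1’) + min(c2,c2’) + min(c3,c3’)

ETP [o(s, s’)] = 3.6494, VTP [o(s, s’)] = 0.7292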
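The block optimum for positive coordination follows from the fact that, within each group Ci, the two samples can share at most min(ci, ci’) units. A short sketch (the function name is illustrative) that reproduces the overlap entries of Table 2a:

```python
def block_overlap(c, c_prime):
    """Largest possible overlap o(c, c') within a block: sum of min(c_i, c_i')."""
    return sum(min(a, b) for a, b in zip(c, c_prime))

for c in [(2, 3, 2), (2, 2, 2), (1, 3, 2), (0, 0, 2)]:
    print(c, [block_overlap(c, cp) for cp in [(1, 2, 2), (2, 1, 2), (0, 3, 2), (0, 0, 5)]])
```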

23

4. Example 2 Reduced TP for positive coordination after re-stratification.

Sequential selection: Suppose c = (2,3,2), with p(c) = 0.01184.

Table 2b: Probabilities for c = (2,3,2)

| c’ | 2,1,2 | 2,3,0 | Σ |
|---|---|---|---|
| p(c, c’) | 0.01151 | 0.00033 | 0.01184 |
| p{c’ ∣ c=(2,3,2)} | 0.97213 | 0.02787 | 1 |

ETP{o ∣ c=(2,3,2)} = 5

VTP{o ∣ c=(2,3,2)} = 0

i) Select super-column c’ using p{c’ ∣ c=(2,3,2)}.

ii) Suppose c’ = (2,1,2) is selected. → Randomly de-select 2 units from s ∩ C2 to form s’.
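The two steps of the sequential selection can be sketched as below. The conditional probabilities are obtained by dividing the joint probabilities for c = (2,3,2) by their sum. The unit labels and function names are hypothetical, and the handling of c’ = (2,3,0), which drops the two C3 units, is an assumption by analogy with the case described on the slide.

```python
import random

# Joint probabilities p(c, c') for c = (2, 3, 2), taken from Table 2b
JOINT = {(2, 1, 2): 0.01151, (2, 3, 0): 0.00033}

def conditional(joint):
    """p{c' | c}: joint probabilities rescaled to sum to 1."""
    total = sum(joint.values())
    return {cp: p / total for cp, p in joint.items()}

def draw_s_prime(s_by_group, rng):
    """s_by_group: the units of s in C1, C2 and C3 (2, 3 and 2 units)."""
    cond = conditional(JOINT)
    cp = rng.choices(list(cond), weights=list(cond.values()), k=1)[0]
    # Keeping cp[i] of the units s holds in C_i is the same as de-selecting the rest.
    kept = [rng.sample(units, k) for units, k in zip(s_by_group, cp)]
    return cp, [u for part in kept for u in part]

print({cp: round(p, 5) for cp, p in conditional(JOINT).items()})
s_by_group = [["x1", "x2"], ["y1", "y2", "y3"], ["z4", "z7"]]
print(draw_s_prime(s_by_group, random.Random(0)))
```

In both cases s’ consists of 5 units that are all in s, so the overlap is 5 with certainty.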

24

4. Example 2 Reduced TP for positive coordination after re-stratification.

Is the matrix of overlaps o(s, s’) within a block symmetric?

Consider block {c =(2,3,2), c’ =(2,1,2)} with o(c, c’)=5:

• $\binom{2}{2}\binom{18}{8} \cdot \binom{3}{3}\binom{3}{0} \cdot \binom{10}{2}$ = 43,758 x 1 x 45 different samples (rows) s

• $\binom{2}{2} \cdot \binom{3}{1} \cdot \binom{10}{2}$ = 1 x 3 x 45 different samples (columns) s’

For each s, there are exactly 3 samples s’ such that o(s, s’)=5.
For each s’, there are exactly 43,758 samples s such that o(s, s’)=5.

Each s’ will get probability of 0.01151 / (3 x 45).
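The counts for this block can be verified by enumeration. The sketch below uses hypothetical unit labels and enumerates only the part of each sample that lies in the new stratum; the 18 units of old stratum 1 outside C1 never affect the overlap, so they enter only through the multiplier C(18, 8) = 43,758.

```python
from itertools import combinations
from math import comb

C1 = ["x1", "x2"]
C2 = ["y1", "y2", "y3"]
C3 = [f"z{i}" for i in range(10)]

# Rows: s holds all of C1, all of C2 and 2 units of C3 (c = (2, 3, 2)).
rows = [frozenset(C1) | frozenset(C2) | frozenset(pair) for pair in combinations(C3, 2)]
# Columns: s' holds all of C1, 1 unit of C2 and 2 units of C3 (c' = (2, 1, 2)).
cols = [frozenset(C1) | frozenset(one) | frozenset(pair)
        for one in combinations(C2, 1) for pair in combinations(C3, 2)]

multiplier = comb(18, 8)  # choices of the 8 units of s from old stratum 1 outside C1
per_row = {sum(len(r & c) == 5 for c in cols) for r in rows}
per_col = {multiplier * sum(len(r & c) == 5 for r in rows) for c in cols}

print(len(rows) * multiplier, len(cols))  # 1969110 135
print(per_row, per_col)                   # {3} {43758}
print(0.01151 / len(cols))                # probability assigned to each s'
```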

25

4. Example 2 Reduced TP for positive coordination after re-stratification.

Table 2c: Matrix of o(s, s’); block {c =(2,3,2), c’ =(2,1,2)}

[Matrix not reproduced: its entries take the values 3, 4 and 5; rows are shown in groups of 43,758, and columns in groups of 16 and 28 samples s’.]

26

4. Example 2 Reduced TP for positive coordination after re-stratification.

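Table 2d: Empirical block probabilities for Sequential SRSWOR (PRN). Each cell shows o(c, c’) / p(c, c’).

| c \ c’ | 1,2,2 | 2,1,2 | 0,3,2 | … | 0,0,5 | p(c) |
|---|---|---|---|---|---|---|
| 2,3,2 | 5 / 0.0022 | 5 / 0.0015 | 5 / 0.0007 | … | 2 / 0.0002 | 0.0124 |
| 2,2,2 | 5 / 0.0160 | 5 / 0.0173 | 4 / 0.0006 | … | 2 / 0.0022 | 0.1067 |
| 1,3,2 | 5 / 0.0055 | 4 / 0.0001 | 5 / 0.0025 | … | 2 / 0.0007 | 0.0254 |
| … | … | … | … | … | … | … |
| 0,0,2 | 2 / 0.0001 | 2 / 0 | 2 / 0 | … | 2 / 0.0069 | 0.0116 |
| p(c’) | 0.0897 | 0.0453 | 0.0153 | … | 0.0847 | 1.0000 |

Table 2e: Expectations

| Method | E [o(s, s’)] | V [o(s, s’)] | E{o ∣ c=(2,3,2)} | V{o ∣ c=(2,3,2)} |
|---|---|---|---|---|
| TP | 3.6494 | 0.7292 | 5 | 0 |
| PRN | 3.5602 | 0.6940 | 4.3282 | 0.5746 |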

5. CONCLUSION

Optimal sample coordination is a TP.

For stratified SRSWOR, we can reduce TP by grouping samples.

The groups must be formed so that the matrix of o(s, s’) within each group is symmetric.

The solution and the selection are done in two stages.

Different objective functions can be defined, depending on the goal of the sample coordination project.

For more information please contact

Lenka Mach

E-mail: Lenka.Mach@statcan.ca

www.statcan.ca

29

REFERENCES

Ernst, L.R. (1999), “The Maximization and Minimization of Sample Overlap Problems: A Half Century of Results,” Bulletin of the International Statistical Institute, Proceedings, Tome LVIII, Book 2, pp 293-296.

Mach, L., Reiss, P.T., and Şchiopu-Kratina, I. (2006), “Optimizing the Expected Overlap of Survey Samples via the Northwest Corner Rule,” Journal of the American Statistical Association, Vol. 101, No. 476, Theory and Methods, pp. 1671-1679.

McKenzie, B. and Gross, B. (2000), “Synchronized Sampling,” ICES II, The Second International Conference on Establishment Surveys, American Statistical Association, pp. 237-243.

Ohlsson, E. (2000), “Coordination of PPS Samples Over Time,” ICES II, The Second International Conference on Establishment Surveys, American Statistical Association, pp. 255-264.

Royce, D. (2000), “Issues in Coordinated Sampling at Statistics Canada,” ICES II, The Second International Conference on Establishment Surveys, American Statistical Association, pp. 245-254.