1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units...


Page 1: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

1

Ch 5: Cluster sampling with equal probabilities

DEFN: A cluster is a group of observation units (or “elements”)

Population                Obs Unit     Cluster
U.S. residents            person       household
Lincoln households        household    city block, or postal route
UNL employees             employee     department
Maple trees in Vermont    tree         1 km x 1 km plot

Page 2: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

2

Cluster sample

DEFN: A cluster sample is a probability sample in which a sampling unit is a cluster

Frame                       SU              OU
List of phone numbers       phone number    person
List of blocks              block           household
List of UNL departments     department      faculty member
List of plots               plot            tree

Page 3: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

3

Cluster sample – 2

1-stage cluster sampling
  Divide the population (of K elements) into N clusters (of size Mi for cluster i)
    Cluster = group of elements
    An element belongs to 1 and only 1 cluster
  Sampling unit
    Cluster = group of elements = PSU = primary sampling unit
    We’ll start by assuming an SRS of clusters (equal prob)
    Can use any design to select clusters (STS, PPS); we’ll work with other designs in Ch 6
  Data collection
    Collect information on ALL elements in the cluster

Page 4: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

4

1-stage CS vs. STS

[Figure: two grids of cells grouped into blocks, one panel per design; sample of 40 elements]

STS: Take an SRS from every stratum
  A block of cells is a stratum
  SU is an element (or OU)
  Sample from every stratum

1-stage CS: Take an SRS of clusters; observe all elements within the clusters in the sample
  A block of cells is a cluster
  SU is a cluster
  Don’t sample from every cluster

Page 5: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

5

Cluster vs. stratified sampling

Cluster sample
  Divide K elements into N clusters, so that $K = \sum_{i=1}^{N} M_i$
  Cluster or PSU i has Mi elements
  Take a sample of n clusters

Stratified sampling
  N elements divided into H strata
  An element belongs to 1 and only 1 stratum
  Take a sample of n elements, consisting of nh elements from stratum h for each of the H strata

Page 6: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

6

Cluster sample – 3

2-stage cluster sampling (later)
  Process
    Select PSUs (stage 1)
    Select elements within each sampled PSU (stage 2)
  First stage sampling unit is a …
    PSU = primary sampling unit = cluster
  Second stage sampling unit is a …
    SSU = secondary sampling unit = element = OU
  Only collect data on the SSUs that were sampled from the cluster

Page 7: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

7

1-stage vs. 2-stage cluster sampling

[Figure: population of PSUs with the sampled PSUs highlighted, followed by two alternative panels]

Stage 1 of 2-stage cluster sample (select PSUs), then either
  1-stage cluster sample (stop here): Sample all SSUs in sampled PSUs
  OR
  Stage 2 of 2-stage cluster sample (select SSUs w/in PSUs): Take an SRS of mi SSUs in sampled PSU i

Page 8: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

8

Why use cluster sampling?

May not have a list of OUs for a frame, but a list of clusters may be available
  List of Lincoln phone numbers (= group of residents) is available, but a list of Lincoln residents is not available
  List of all NE primary and secondary schools (= group of students) is available, but a list of all students in NE schools is not available

May be cheaper to conduct the study if OUs are clustered
  Occurs when cost of data collection increases with distance between elements
  Household surveys using in-person interviews (household = cluster of people)
  Field data collection (plot = cluster of plants, or animals)

Page 9: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

9

Defining clusters due to frame limitations

A cluster (or PSU) is a group of elements corresponding to a record (row) in the frame

Example
  Population = employees in McDonald’s franchises
  Element = employee
  Frame = list of McDonald’s stores
  PSU = store = cluster of employees

Page 10: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

10

Defining clusters to reduce travel costs

A cluster (or PSU) is a group of nearby elements

Example
  Population = all farms
  Element = farm
  Frame = list of sections (1 mi x 1 mi areas) in rural area
  PSU = section = cluster of farms

Page 11: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

11

Cluster samples usually lead to less precise estimates

Elements within clusters tend to be correlated due to exposure to similar conditions
  Members of a household
  Employees in a business
  Plants or soil within a field plot

We are getting less information than if we had selected the same number of unrelated elements

Example: Select sample of city blocks (clusters of households)
  Ask each household: Should the city upgrade its storm sewer system?
  PSU (city block) 1: No storm sewer, so households will tend to say yes
  PSU (city block) 2: New development, so households will tend to say no

Page 12: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

12

Defining clusters for improved precision

Define clusters for which within-cluster variation is high (rarely possible)
  Make each cluster as heterogeneous as possible
    Like making each cluster a mini-population that reflects variation in population
    Minimizes the amount of correlation among elements in the cluster
  Opposite of the approach to stratification
    Large variation among strata, homogeneous within strata

Define clusters that are relatively small
  Extreme case is cluster = element
  Decreasing the number of correlated observations in the sample

Page 13: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

13

Example for single-stage cluster sampling w/ equal prob (CSE1)

Dorm has N = 100 suites (clusters)
Each suite has Mi = 4 students (4 elements in cluster i, i = 1, 2, … , N)
  Note that there are $K = \sum_{i=1}^{N} M_i = 100(4) = 400$ students in the population
Take SRS of n = 5 suites (clusters)
Ask each student living in each of the 5 suites
  How many nights per week do you eat dinner in the dining hall?
Will get observations from a sample of 20 students = 5 suites x 4 students/suite

Page 14: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

14

Dorm example – 2

Student   Suite 6   Suite 21   Suite 28   Suite 54   Suite 89
1         5         3          6          5          1
2         5         2          4          4          4
3         4         4          4          6          3
4         6         5          5          6          2
Total     20        14         19         21         10

Page 15: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

15

Dorm example – 3

SRS of n = 5 dorm rooms
Data on each cluster (all students in dorm room)
  ti = total number of dining hall dinners for dorm room i
  t2 = 14 dining hall dinners for 4 students in dorm room 2
Estimated total number of dining hall nights for the dorm students
  HT estimator of total = pop size x sample mean (of cluster totals)

$$\hat{t}_{unb} = N \cdot \frac{1}{n}\sum_{i=1}^{n} t_i = 100 \cdot \frac{1}{5}(20 + 14 + 19 + 21 + 10) = 100(16.8) = 1680 \text{ dining hall dinners}$$

Page 16: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

16

Notation

Indices
  i = index for PSU i
  j = index for SSU j in PSU i
Number of PSUs (clusters) in the population
  N clusters
Number of SSUs (elements) in a PSU (cluster)
  Mi elements
Number of SSUs (elements) in the population
  $K = \sum_{i=1}^{N} M_i$
  In Chapters 1-4, this was designated as N elements

Page 17: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

17

Notation – 2

[Figure: population drawn as N = 12 PSUs, labeled i = 1, 2, …, 12, each containing its SSUs; two SSUs in PSU 9 are labeled (i = 9, j = 1) and (i = 9, j = 7)]

N = 12 PSUs
M1 = 20 SSUs, M2 = 12 SSUs, …, M11 = 9 SSUs, M12 = 16 SSUs
K = 20 + 12 + … + 9 + 16 = 150 SSUs

Page 18: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

18

Notation – 3

Response variable for SSU j in PSU i: yij
  e.g., age of j-th resident in household i
  e.g., whether or not dorm resident j in room i owns a computer

Page 19: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

19

Cluster-level population parameters (for cluster i )

Cluster size = Mi elements

Cluster population total
$$t_i = \sum_{j=1}^{M_i} y_{ij}$$

Note that we observe cluster population total (or mean or variance) for each sample cluster in 1-stage cluster sampling

We will estimate cluster parameters in 2-stage cluster sampling

Page 20: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

20

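Cluster-level population parameters (for cluster i ) – 2

Cluster population mean
$$\bar{y}_{iU} = \frac{t_i}{M_i} = \frac{1}{M_i}\sum_{j=1}^{M_i} y_{ij}$$

Within-cluster variance
$$S_i^2 = \frac{1}{M_i - 1}\sum_{j=1}^{M_i} \left(y_{ij} - \bar{y}_{iU}\right)^2$$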

Page 21: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

21

[Figure: Population drawn as PSUs (boxes of SSUs), annotated with cluster-level parameters for the PSUs in a 1-stage cluster sample]

PSU i   M_i         t_i   $\bar{y}_{iU}$   $S_i^2$
1       20          99    4.95             7.00
2       12 boxes    46    3.83             6.88
6       9           30    3.33             9.00
11      9           39    4.33             7.75

Page 22: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

22

Cluster-level population parameters (for cluster i ) – 3

For 1-stage cluster samples
  Have a complete enumeration of the cluster elements
  Cluster population parameters are known
For 2-stage cluster samples
  Observe data on a sample of elements in a cluster
  Estimate cluster population parameters

Page 23: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

23

Population parameters

Same parameters as in previous chapters, rewritten in notation for cluster sampling

Population size
$$K = \sum_{i=1}^{N} M_i \text{ elements}$$
(** K was referred to as N in previous chapters)

Population total (sum of all cluster totals)
$$t = \sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij} = \sum_{i=1}^{N} t_i$$

Page 24: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

24

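Population Parameters – 2

Population mean (of K elements)
$$\bar{y}_U = \frac{1}{K}\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij}$$

Population variance (among K elements)
$$S^2 = \frac{1}{K - 1}\sum_{i=1}^{N}\sum_{j=1}^{M_i} \left(y_{ij} - \bar{y}_U\right)^2$$

Variance among N cluster totals
$$S_t^2 = \frac{1}{N - 1}\sum_{i=1}^{N} \left(t_i - \frac{t}{N}\right)^2$$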

Page 25: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

25

Data from cluster samples

Work with element and cluster-level data

Element data set will have columns for
  Cluster id
  Element id within cluster
  Variable (y)

Will also summarize this data set to generate cluster parameters (1-stage) or estimates of cluster parameters (2-stage)
  Cluster id
  Cluster total (or estimate)
  Cluster mean (or estimate)
  Cluster variance (or estimate)

Page 26: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

26

1-stage cluster sample

Element data
i   j   yij
1   1   y11
1   2   y12
1   3   y13
1   4   y14
2   1   y21
2   2   y22
2   3   y23
3   1   y31

Cluster summary
i   ti   $\bar{y}_{iU}$   $S_i^2$
1   t1   $\bar{y}_{1U}$   $S_1^2$
2   t2   $\bar{y}_{2U}$   $S_2^2$
3   t3   $\bar{y}_{3U}$   $S_3^2$
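The step from the element data set to the cluster summary can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `cluster_summary` and the tuple layout are assumptions, and the data are the five sampled suites from the dorm example.

```python
from statistics import mean, variance

# Element-level data set: (cluster id i, element id j, response y_ij).
# Values are the five sampled suites from the dorm example.
elements = [
    (6, 1, 5), (6, 2, 5), (6, 3, 4), (6, 4, 6),
    (21, 1, 3), (21, 2, 2), (21, 3, 4), (21, 4, 5),
    (28, 1, 6), (28, 2, 4), (28, 3, 4), (28, 4, 5),
    (54, 1, 5), (54, 2, 4), (54, 3, 6), (54, 4, 6),
    (89, 1, 1), (89, 2, 4), (89, 3, 3), (89, 4, 2),
]

def cluster_summary(rows):
    """Collapse element rows into one summary record per cluster (hypothetical helper)."""
    by_cluster = {}
    for i, _j, y in rows:
        by_cluster.setdefault(i, []).append(y)
    return {
        i: {
            "M": len(ys),        # cluster size M_i
            "t": sum(ys),        # cluster total t_i
            "mean": mean(ys),    # cluster mean
            "S2": variance(ys),  # within-cluster variance, divisor M_i - 1
        }
        for i, ys in by_cluster.items()
    }

summary = cluster_summary(elements)
```

In a 1-stage sample every element of a sampled cluster is observed, so these summaries are the cluster parameters themselves rather than estimates.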

Page 27: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

27

Estimation for CSE1

Chapter reading
  Section 5.2.1 covers equal sized clusters (Mi constant, read)
  We’ll start with 5.2.3 (unequal sized clusters, Mi varies)
  Section 5.2.2 covers theory
Two types of estimators
  Unbiased: HT estimator
  Ratio estimation
Equal probability sample of clusters: assume SRS of clusters

Page 28: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

28

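CSE1 unbiased estimation under SRS – total t

Estimator for population total using data collected from a 1-stage cluster sample
  SRS of clusters
$$\hat{t}_{unb} = \frac{N}{n}\sum_{i=1}^{n} t_i$$

Estimator of variance of $\hat{t}_{unb}$
$$\hat{V}(\hat{t}_{unb}) = N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n} \quad \text{where} \quad s_t^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(t_i - \frac{\hat{t}_{unb}}{N}\right)^2$$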

Page 29: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

29

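Dorm example – 4

Estimated population total
$$\hat{t}_{unb} = N \cdot \frac{1}{n}\sum_{i=1}^{n} t_i = 100 \cdot \frac{1}{5}(20 + 14 + 19 + 21 + 10) = 100(16.8) = 1680 \text{ dining hall dinners}$$

Estimated variance
$$s_t^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(t_i - \frac{\hat{t}_{unb}}{N}\right)^2 = \frac{1}{5 - 1}\left[(20 - 16.8)^2 + \dots + (10 - 16.8)^2\right] = 21.7$$
$$\hat{V}(\hat{t}_{unb}) = N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n} = 100^2\left(1 - \frac{5}{100}\right)\frac{21.7}{5} = 41{,}230$$
$$SE(\hat{t}_{unb}) = \sqrt{41{,}230} \approx 203.05$$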
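The dorm calculation above can be checked with a short Python sketch. The function name `cse1_unbiased_total` is an illustrative assumption, not something defined in the slides; it applies the CSE1 formulas for the total and its estimated variance to the five observed suite totals.

```python
import math
from statistics import mean, variance

def cse1_unbiased_total(cluster_totals, N):
    """Unbiased total and variance estimate for an SRS of clusters (1-stage)."""
    n = len(cluster_totals)
    t_hat = N * mean(cluster_totals)          # N x sample mean of cluster totals
    s2_t = variance(cluster_totals)           # divisor n - 1, deviations from t_hat / N
    v_hat = N**2 * (1 - n / N) * s2_t / n     # includes the finite population correction
    return t_hat, s2_t, v_hat, math.sqrt(v_hat)

# Suite totals from the dorm example, N = 100 suites in the dorm.
t_hat, s2_t, v_hat, se = cse1_unbiased_total([20, 14, 19, 21, 10], N=100)
```

With these inputs the sketch gives a total of about 1680 dinners, a variance among totals of 21.7, and an estimated variance of about 41,230.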

Page 30: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

30

CSE1 inclusion probability for an element

Two events: A and B
  Pr{A and B both occur} = Pr{A occurs} x Pr{B occurs given A occurs}
In our setting
  A = sample cluster i
  B = sample element j (in cluster i)
Inclusion probability for element j in cluster i
  $\pi_{ij}$ = Pr{including element j and cluster i in sample}
   = Pr{including cluster i in sample} x Pr{including element j given cluster i has been included in sample}

Page 31: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

31

CSE1 inclusion probability for an element – 2

Need two pieces
  Pr{including cluster i in sample} = n / N
  Pr{including element j given cluster i has been included in sample} = 1
Inclusion probability
  $\pi_{ij}$ = Pr{including element j and cluster i in sample}
   = Pr{including cluster i in sample} x Pr{including element j given cluster i has been included in sample}
   = (n / N) x 1 = n / N

Page 32: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

32

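CSE1 weight for an element

Weight for element j in cluster i
  Inverse element inclusion probability
  $w_{ij} = 1/\pi_{ij} = N/n$

Estimator using weights
$$\hat{t}_{unb} = \sum_{i=1}^{n}\sum_{j=1}^{M_i} w_{ij}\, y_{ij} = \frac{N}{n}\sum_{i=1}^{n}\sum_{j=1}^{M_i} y_{ij} = \frac{N}{n}\sum_{i=1}^{n} t_i$$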

Page 33: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

33

Dorm example – 5

Inclusion probability for student j in dorm room i
  N = 100 dorm rooms
  n = 5 sample dorm rooms
  Take all 4 students in dorm room
  $\pi_{ij}$ = n / N = 5/100 = 1/20 = 0.05
Weight for student j in dorm room i
  $w_{ij}$ = N / n = 20 students (each sampled student represents 20 students in the population)
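As a small check that the weighted form agrees with the cluster-total form, the following Python sketch applies the weight N / n to every sampled student's response. The dictionary layout is an assumption made for illustration; the responses are the dorm example data.

```python
N, n = 100, 5                 # suites in the dorm, suites sampled
pi_ij = n / N                 # element inclusion probability in CSE1
w_ij = 1 / pi_ij              # sampling weight, the same for every sampled student

# Responses y_ij for all 4 students in each sampled suite.
sample = {
    6: [5, 5, 4, 6],
    21: [3, 2, 4, 5],
    28: [6, 4, 4, 5],
    54: [5, 4, 6, 6],
    89: [1, 4, 3, 2],
}

# Weighted estimator: sum of w_ij * y_ij over all sampled elements.
t_hat_weighted = sum(w_ij * y for ys in sample.values() for y in ys)

# Cluster-total form: (N / n) * sum of cluster totals.
t_hat_totals = N / n * sum(sum(ys) for ys in sample.values())
```

Both forms give about 1680 dinners, because the weight is constant across elements in CSE1.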

Page 34: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

34

CSE1 unbiased estimation under SRS – mean

Unbiased estimator for population mean $\bar{y}_U$
  For SRS, estimator for total divided by number of population elements (OUs)
  Units are y-units per element
$$\hat{\bar{y}}_{unb} = \frac{1}{K}\,\hat{t}_{unb}$$
$$\hat{V}(\hat{\bar{y}}_{unb}) = \frac{1}{K^2}\,\hat{V}(\hat{t}_{unb})$$

Page 35: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

35

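Dorm example – 6

$$\hat{\bar{y}}_{unb} = \frac{\hat{t}_{unb}}{K} = \frac{1680}{100(4)} = 4.20 \text{ dining hall dinners per student per week}$$
$$\hat{V}(\hat{\bar{y}}_{unb}) = \frac{\hat{V}(\hat{t}_{unb})}{K^2} = \frac{41{,}230}{400^2} = 0.257688$$
$$SE(\hat{\bar{y}}_{unb}) = 0.51$$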

Page 36: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

36

Unbiased estimation – proportion p What is y ?

Page 37: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

37

Ratio estimation

Usually ti (cluster total) is correlated with Mi (cluster size)
  As Mi (# SSUs/elements in cluster i ) increases, value for ti (total of yij for cluster i ) increases
  Positive correlation between Mi and ti
  No intercept
Perfect conditions for SRS ratio estimator

Notation of Ch 3              Notation of Ch 5
yi (variable of interest)     ti (cluster total)
xi (auxiliary info)           Mi (cluster size)

Page 38: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

38

Ratio estimation for CSE1

Estimator for population mean
  Units are y-units per element
$$\hat{\bar{y}}_r = \frac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} M_i}$$

Page 39: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

39

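Ratio estimation for CSE1 – 2

Estimator for variance of ratio estimator of population mean
$$\hat{V}(\hat{\bar{y}}_r) = \left(1 - \frac{n}{N}\right)\frac{1}{n\bar{M}_U^2}\,\frac{\sum_{i=1}^{n}\left(t_i - \hat{\bar{y}}_r M_i\right)^2}{n - 1} = \left(1 - \frac{n}{N}\right)\frac{1}{n\bar{M}_U^2}\,\frac{\sum_{i=1}^{n} M_i^2\left(\bar{y}_i - \hat{\bar{y}}_r\right)^2}{n - 1}$$

$\bar{M}_U$ is average cluster size for population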

Page 40: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

40

Ratio estimation for CSE1 – 3

Average cluster size
$$\bar{M}_U = \frac{1}{N}\sum_{i=1}^{N} M_i = \frac{K}{N}$$

If unknown, can estimate with sample mean of cluster sizes
$$\bar{M}_S = \frac{1}{n}\sum_{i=1}^{n} M_i$$

Page 41: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

41

Dorm example – 7

Estimated population mean
$$\hat{\bar{y}}_r = \frac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} M_i} =$$

Average cluster size
$$\bar{M}_U = \frac{1}{N}\sum_{i=1}^{N} M_i = \frac{K}{N} =$$

Page 42: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

42

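Dorm example – 8

Estimated variance
$$\hat{V}(\hat{\bar{y}}_r) = \left(1 - \frac{n}{N}\right)\frac{1}{n\bar{M}_U^2}\,\frac{\sum_{i=1}^{n}\left(t_i - \hat{\bar{y}}_r M_i\right)^2}{n - 1} = \left(1 - \frac{n}{N}\right)\frac{1}{n\bar{M}_U^2}\,\frac{\sum_{i=1}^{n} M_i^2\left(\bar{y}_i - \hat{\bar{y}}_r\right)^2}{n - 1} =$$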

Page 43: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

43

Ratio estimation for CSE1 – 4

Estimator for population total
$$\hat{t}_r = K\,\hat{\bar{y}}_r$$
$$\hat{V}(\hat{t}_r) = K^2\,\hat{V}(\hat{\bar{y}}_r)$$

Page 44: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

44

Dorm example – 9

Estimated population total
$$\hat{t}_r = K\,\hat{\bar{y}}_r =$$

Estimated variance
$$\hat{V}(\hat{t}_r) = K^2\,\hat{V}(\hat{\bar{y}}_r) =$$
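The slides leave the dorm ratio calculations as an exercise. One way to work them is the Python sketch below, which applies the CSE1 ratio formulas to the suite totals; the function name `cse1_ratio_mean` and its arguments are illustrative assumptions, not notation from the slides.

```python
def cse1_ratio_mean(t, M, N, K=None):
    """Ratio estimate of the population mean and its variance estimate (1-stage)."""
    n = len(t)
    y_r = sum(t) / sum(M)                          # sum of totals / sum of sizes
    # Use the population average cluster size if K is known, else the sample average.
    M_bar = K / N if K is not None else sum(M) / n
    ss = sum((ti - y_r * Mi) ** 2 for ti, Mi in zip(t, M))
    v_hat = (1 - n / N) * ss / ((n - 1) * n * M_bar**2)
    return y_r, v_hat

totals = [20, 14, 19, 21, 10]   # t_i for the five sampled suites
sizes = [4, 4, 4, 4, 4]         # M_i, all suites have 4 students
N, K = 100, 400

y_r, v_y_r = cse1_ratio_mean(totals, sizes, N, K)
t_r = K * y_r                   # ratio estimate of the population total
v_t_r = K**2 * v_y_r            # its estimated variance
```

Because every suite has the same size, the ratio results coincide with the unbiased ones here (mean about 4.2, variance about 0.2577, total about 1680). The two estimators generally differ when the Mi vary.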

Page 45: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

45

CSE1: impact of cluster size

If cluster sizes Mi are variable across clusters, generally estimate population parameter with less precision
  If ti is related to Mi , then get large variation among cluster totals if Mi is variable
  Variance of population parameter estimator (unbiased or ratio) is a function of variation among cluster totals

Page 46: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

46

2-stage equal probability cluster sampling (CSE2)

CSE2 has 2 stages of sampling
  Stage 1. Select SRS of n PSUs from population of N PSUs
  Stage 2. Select SRS of mi SSUs from Mi elements in PSU i sampled in stage 1

Page 47: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

47

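2-stage cluster sampling

[Figure: population of PSUs with the sampled PSUs highlighted, followed by the stage 2 selection of SSUs]

Stage 1 of 2-stage cluster sample (select PSUs)
  1-stage alternative: Sample all SSUs in sampled PSUs
Stage 2 of 2-stage cluster sample (select SSUs w/in PSUs)
  Take an SRS of mi SSUs in sampled PSU i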

Page 48: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

48

Motivation for 2-stage cluster samples

Recall motivations for cluster sampling in general
  Only have access to a frame that lists clusters
  Reduce data collection costs by going to groups of nearby elements (cluster defined by proximity)

Page 49: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

49

Motivation for 2-stage cluster samples – 2

Likely that elements in cluster will be correlated
  May be inefficient to observe all elements in a sample PSU
  Extra effort required to fully enumerate a PSU does not generate that much extra information
May be better to spend resources to sample many PSUs and a small number of SSUs per PSU
  Possible opposing force: study costs associated with going to many clusters

Page 50: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

50

CSE2 unbiased estimation for population total t

Have a sample of elements from a cluster
  We no longer know the value of cluster parameter, ti
  Estimate ti using data observed for mi SSUs
$$\hat{t}_i = M_i\,\bar{y}_i = \frac{M_i}{m_i}\sum_{j=1}^{m_i} y_{ij}$$

Page 51: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

51

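CSE2 unbiased estimation for population total – 2

Approach is to plug estimated cluster totals into CSE1 formula

CSE1
$$\hat{t}_{unb} = \frac{N}{n}\sum_{i=1}^{n} t_i = \frac{N}{n}\sum_{i=1}^{n} M_i\,\bar{y}_{iU}$$

CSE2
$$\hat{t}_{unb} = \frac{N}{n}\sum_{i=1}^{n} \hat{t}_i = \frac{N}{n}\sum_{i=1}^{n} M_i\,\bar{y}_i$$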

Page 52: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

52

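CSE2 unbiased estimation for population total – 3

The variance of $\hat{t}_{unb}$ has 2 components associated with the 2 sampling stages
  1. Variation among PSUs
  2. Variation among SSUs within PSUs

$$\hat{V}(\hat{t}_{unb}) = \underbrace{N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n}}_{\text{among PSU}} + \underbrace{\frac{N}{n}\sum_{i=1}^{n}\left(1 - \frac{m_i}{M_i}\right)M_i^2\,\frac{s_i^2}{m_i}}_{\text{within PSU}}$$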

Page 53: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

53

CSE2 unbiased estimation for population total – 4

In CSE1, we observe all elements in a cluster
  We know ti
  Have variance component 1, but no component 2
In CSE2, we sample a subset of elements in a cluster
  We estimate ti with $\hat{t}_i$
  Component 2 is a function of the estimated variance for $\hat{t}_i$
$$M_i^2\left(1 - \frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$

Page 54: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

54

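CSE2 unbiased estimation for population total – 5

Estimated variance among cluster totals
$$s_t^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(\hat{t}_i - \frac{\hat{t}_{unb}}{N}\right)^2$$

Estimated variance among elements in a cluster
$$s_i^2 = \frac{1}{m_i - 1}\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2$$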

Page 55: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

55

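CSE2 unbiased estimation for population total – 6

$$\hat{V}(\hat{t}_{unb}) = N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n} + \frac{N}{n}\sum_{i=1}^{n}\left(1 - \frac{m_i}{M_i}\right)M_i^2\,\frac{s_i^2}{m_i}$$
$$s_t^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(\hat{t}_i - \frac{\hat{t}_{unb}}{N}\right)^2$$
$$s_i^2 = \frac{1}{m_i - 1}\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2$$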

Page 56: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

56

Dorm example – 10

Stage 2: select 2 students in each room

Student   Rm 6   Rm 21   Rm 28   Rm 54   Rm 89
1         5      3       6       5       1
2         5      2       4       4       4
3         4      4       4       6       3
4         6      5       5       6       2
Total     ?      ?       ?       ?       ?

Page 57: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

57

Dorm example – 11

Stage 1
  Cluster =
  N =
  n =
  SRS
Stage 2
  Element =
  Mi =
  mi =
  SRS
  $\hat{t}_i$ =

Page 58: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

58

Dorm example – 12

Student (j)                  Rm 6 (i=1)   Rm 21 (i=2)   Rm 28 (i=3)   Rm 54 (i=4)   Rm 89 (i=5)
1                            5            3             4             5             4
2                            6            2             5             4             2
$\bar{y}_i$
$\hat{t}_i = M_i \bar{y}_i$
$s_i^2$

where $s_i^2 = \frac{1}{m_i - 1}\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2$

Page 59: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

59

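Dorm example – 13

$$\hat{t}_{unb} = \frac{N}{n}\sum_{i=1}^{n} \hat{t}_i =$$
$$s_t^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(\hat{t}_i - \frac{\hat{t}_{unb}}{N}\right)^2 =$$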

Page 60: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

60

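Dorm example – 14

$$\hat{V}(\hat{t}_{unb}) = N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n} + \frac{N}{n}\sum_{i=1}^{n}\left(1 - \frac{m_i}{M_i}\right)M_i^2\,\frac{s_i^2}{m_i} =$$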

Page 61: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

61

CSE2 unbiased estimation for population mean $\bar{y}_U$

$$\hat{\bar{y}}_{unb} = \frac{\hat{t}_{unb}}{K}$$
$$\hat{V}(\hat{\bar{y}}_{unb}) = \frac{\hat{V}(\hat{t}_{unb})}{K^2}$$

Page 62: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

62

Dorm example – 15

$$\hat{\bar{y}}_{unb} = \frac{\hat{t}_{unb}}{K} =$$
$$\hat{V}(\hat{\bar{y}}_{unb}) = \frac{\hat{V}(\hat{t}_{unb})}{K^2} =$$
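The blanks in the 2-stage dorm example can be worked with a Python sketch like the one below. It is one possible working under the CSE2 formulas, not the slides' own solution; the function name `cse2_unbiased_total` is an assumption, and the data are the 2 sampled students per room (5, 6), (3, 2), (4, 5), (5, 4), (4, 2).

```python
import math
from statistics import mean, variance

def cse2_unbiased_total(samples, M, N):
    """Unbiased total and variance estimate for a 2-stage equal-probability sample.

    samples: one list of observed y_ij per sampled PSU
    M:       cluster sizes M_i for the sampled PSUs
    N:       number of PSUs in the population
    """
    n = len(samples)
    t_hat_i = [Mi * mean(ys) for ys, Mi in zip(samples, M)]   # estimated cluster totals
    t_hat = N / n * sum(t_hat_i)
    s2_t = variance(t_hat_i)                                  # among estimated totals
    between = N**2 * (1 - n / N) * s2_t / n                   # among-PSU component
    within = N / n * sum(                                     # within-PSU component
        (1 - len(ys) / Mi) * Mi**2 * variance(ys) / len(ys)
        for ys, Mi in zip(samples, M)
    )
    return t_hat, between + within

rooms = [[5, 6], [3, 2], [4, 5], [5, 4], [4, 2]]   # m_i = 2 students per room
sizes = [4, 4, 4, 4, 4]                            # M_i = 4 students per room
N, K = 100, 400

t_hat, v_t = cse2_unbiased_total(rooms, sizes, N)
y_hat, v_y = t_hat / K, v_t / K**2                 # mean and its variance estimate
se_t, se_y = math.sqrt(v_t), math.sqrt(v_y)
```

For these data the among-PSU component (45,600) dominates the within-PSU component (320), giving a total of about 1600 dinners with estimated variance 45,920, or a mean of about 4.0 dinners per student.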

Page 63: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

63

CSE2 inclusion probability for an element

Two events: A and B
  Pr{A and B both occur} = Pr{A occurs} x Pr{B occurs | A occurs}
  “|” denotes “given” (a condition)
In our setting
  A = sample cluster i
  B = sample element j
Inclusion probability symbols
  $\pi_{ij}$ = Pr{including element j and cluster i in sample}
  $\pi_i$ = Pr{including cluster i in sample}
  $\pi_{j|i}$ = Pr{including element j | cluster i has been included in sample}

Page 64: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

64

CSE2 inclusion probability for an element – 2

Need two pieces
  $\pi_i$ = Pr{including cluster i in sample} = n / N
  $\pi_{j|i}$ = Pr{including element j | cluster i has been included in sample} = mi / Mi
Inclusion probability for element j in cluster i
$$\pi_{ij} = \pi_i\,\pi_{j|i} = \frac{n}{N}\cdot\frac{m_i}{M_i}$$

Page 65: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

65

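CSE2 weight for an element

Sampling weight for element j in cluster i
$$w_{ij} = \frac{1}{\pi_{ij}} = \frac{N}{n}\cdot\frac{M_i}{m_i}$$

Estimator for population total
$$\hat{t}_{unb} = \sum_{i=1}^{n}\sum_{j=1}^{m_i} w_{ij}\,y_{ij} = \sum_{i=1}^{n}\sum_{j=1}^{m_i}\frac{N}{n}\frac{M_i}{m_i}\,y_{ij} = \frac{N}{n}\sum_{i=1}^{n}\frac{M_i}{m_i}\sum_{j=1}^{m_i} y_{ij} = \frac{N}{n}\sum_{i=1}^{n} M_i\,\bar{y}_i = \frac{N}{n}\sum_{i=1}^{n}\hat{t}_i$$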

Page 66: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

66

What does equal probability mean in Ch 5?

Clusters (PSUs) sampled using SRS
  Equal inclusion probability for stage 1 PSUs (clusters)
  $\pi_i$ is same for all i
$$\pi_i = \frac{n}{N}$$

Page 67: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

67

What does equal probability mean in Ch 5? – 2

Elements (SSUs) in a given PSU are sampled using SRS
  All elements (j ) in a sample PSU (i ) are selected with equal probability
  This is a conditional probability (given PSU i )
  For a given PSU i , $\pi_{j|i}$ is the same for all elements j
$$\pi_{j|i} = \frac{m_i}{M_i}$$

Page 68: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

68

What does equal probability mean in Ch 5? – 3

Note that
  Equal probability at stage 1 ($\pi_i$)
  plus
  Equal probability at stage 2 given PSU i ($\pi_{j|i}$)
  does NOT imply equal inclusion probability for an element
In fact, element-level (unconditional) inclusion probability is not necessarily constant
  Depends on cluster size Mi and sample size mi for the cluster to which the element belongs
$$\pi_{ij} = \frac{n}{N}\cdot\frac{m_i}{M_i}$$
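A small numeric illustration of this point, written as a Python sketch: the cluster sizes 4 and 8 are hypothetical values chosen for illustration, and `inclusion_probability` is an assumed helper name. Both stages use SRS, yet elements in the larger cluster end up with a smaller inclusion probability and a larger weight.

```python
def inclusion_probability(n, N, m_i, M_i):
    """Unconditional inclusion probability of an element in a 2-stage SRS/SRS design."""
    return (n / N) * (m_i / M_i)

n, N = 5, 100        # PSUs sampled and PSUs in the population
m = 2                # SSUs sampled in every selected PSU

# Hypothetical clusters of different sizes, each sampled with m = 2.
pi_small = inclusion_probability(n, N, m, M_i=4)   # element in a cluster of 4
pi_large = inclusion_probability(n, N, m, M_i=8)   # element in a cluster of 8

w_small = 1 / pi_small   # sampling weight for the small-cluster element
w_large = 1 / pi_large   # sampling weight for the large-cluster element
```

Here the small-cluster element has probability 0.025 (weight about 40) and the large-cluster element has probability 0.0125 (weight about 80). Choosing mi proportional to Mi would equalize them.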

Page 69: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

69

CSE2 ratio estimation for population mean $\bar{y}_U$

$$\hat{\bar{y}}_r = \frac{\sum_{i=1}^{n}\hat{t}_i}{\sum_{i=1}^{n} M_i} = \frac{\sum_{i=1}^{n} M_i\,\bar{y}_i}{\sum_{i=1}^{n} M_i}$$

Page 70: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

70

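CSE2 ratio estimation for population mean – 2

$$\hat{V}(\hat{\bar{y}}_r) = \frac{1}{\bar{M}_U^2}\left[\left(1 - \frac{n}{N}\right)\frac{s_r^2}{n} + \frac{1}{nN}\sum_{i=1}^{n} M_i^2\left(1 - \frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}\right]$$
$$s_r^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(M_i\,\bar{y}_i - M_i\,\hat{\bar{y}}_r\right)^2 = \frac{1}{n - 1}\sum_{i=1}^{n} M_i^2\left(\bar{y}_i - \hat{\bar{y}}_r\right)^2$$

$\bar{M}_U$ can be estimated by the sample mean of the $M_i$:
$$\bar{M}_S = \frac{1}{n}\sum_{i=1}^{n} M_i$$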

Page 71: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

71

Dorm example – 16

Student (j)                  Rm 6 (i=1)   Rm 21 (i=2)   Rm 28 (i=3)   Rm 54 (i=4)   Rm 89 (i=5)
1                            5            3             4             5             4
2                            6            2             5             4             2
$\bar{y}_i$                  5.5          2.5           4.5           4.5           3.0
$\hat{t}_i = M_i \bar{y}_i$  22           10            18            18            12
$s_i^2$                      0.5          0.5           0.5           0.5           2.0

where $s_i^2 = \frac{1}{m_i - 1}\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2$

Page 72: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

72

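Dorm example – 16

$$\hat{\bar{y}}_r = \frac{\sum_{i=1}^{n}\hat{t}_i}{\sum_{i=1}^{n} M_i} =$$
$$s_r^2 = \frac{1}{n - 1}\sum_{i=1}^{n} M_i^2\left(\bar{y}_i - \hat{\bar{y}}_r\right)^2 =$$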

Page 73: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

73

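Dorm example – 17

$$\hat{V}(\hat{\bar{y}}_r) = \frac{1}{\bar{M}^2}\left[\left(1 - \frac{n}{N}\right)\frac{s_r^2}{n} + \frac{1}{nN}\sum_{i=1}^{n} M_i^2\left(1 - \frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}\right] =$$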

Page 74: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

74

CSE2 ratio estimation for population total t

$$\hat{t}_r = K\,\hat{\bar{y}}_r$$
$$\hat{V}(\hat{t}_r) = K^2\,\hat{V}(\hat{\bar{y}}_r)$$

Page 75: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

75

Dorm example – 18

$$\hat{t}_r = K\,\hat{\bar{y}}_r =$$
$$\hat{V}(\hat{t}_r) = K^2\,\hat{V}(\hat{\bar{y}}_r) =$$
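The CSE2 ratio blanks for the dorm example can be worked in the same way. The Python sketch below is one possible working, with `cse2_ratio_mean` as an assumed helper name; it uses the 2 sampled students per room and the known population values N = 100 and K = 400.

```python
from statistics import mean, variance

def cse2_ratio_mean(samples, M, N, M_bar=None):
    """Ratio estimate of the population mean and its variance estimate (2-stage)."""
    n = len(samples)
    ybar = [mean(ys) for ys in samples]
    y_r = sum(Mi * yb for Mi, yb in zip(M, ybar)) / sum(M)
    if M_bar is None:                      # fall back to the sample mean of the M_i
        M_bar = sum(M) / n
    s2_r = sum(Mi**2 * (yb - y_r) ** 2 for Mi, yb in zip(M, ybar)) / (n - 1)
    within = sum(
        Mi**2 * (1 - len(ys) / Mi) * variance(ys) / len(ys)
        for ys, Mi in zip(samples, M)
    )
    v_hat = ((1 - n / N) * s2_r / n + within / (n * N)) / M_bar**2
    return y_r, s2_r, v_hat

rooms = [[5, 6], [3, 2], [4, 5], [5, 4], [4, 2]]   # sampled students per room
sizes = [4, 4, 4, 4, 4]                            # M_i
N, K = 100, 400

y_r, s2_r, v_y_r = cse2_ratio_mean(rooms, sizes, N, M_bar=K / N)
t_r, v_t_r = K * y_r, K**2 * v_y_r                 # ratio estimate of the total
```

With equal cluster sizes these match the unbiased CSE2 results: a mean of about 4.0 with estimated variance 0.287, and a total of about 1600 with estimated variance 45,920.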

Page 76: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

76

Coots egg example

Target pop = American coot eggs in Minnedosa, Manitoba
PSU / cluster = clutch (nest)
SSU / element = egg w/in clutch
Stage 1
  SRS of n = 184 clutches
  N = ??? clutches, but probably pretty large
Stage 2
  SRS of mi = 2 from Mi eggs in a clutch
  Do not know K = ??? eggs in population, also large
  Can count Mi = # eggs in sampled clutch i
Measurement
  yij = volume of egg j from clutch i

Page 77: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

77

Coots egg example – 2

[Figure: scatter plot of egg volume $y_{ij}$ (vertical axis) vs. clutch id i (horizontal axis)]

Scatter plot of volumes vs. i (clutch id)
  Double dot pattern - high correlation among eggs WITHIN a clutch
  Quite a bit of clutch to clutch variation
Implies
  May not have very high precision unless sample a large number of clutches
  Certainly lower precision than if obtained an SRS of $\sum_{i=1}^{n} m_i = 368$ eggs

Could use a side-by-side plot for data with larger cluster sizes – PROC UNIVARIATE w/ BY CLUSTER and PLOTS option

Page 78: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

78

Coots egg example – 3

[Figure: egg volume $y_{ij}$ (vertical axis) vs. clutch i sorted by $\bar{y}_i$ (horizontal axis)]

Plot
  Rank the mean egg volume for clutch i , $\bar{y}_i$
  Plot yij vs. rank for clutch i
  Draw a line between yi1 and yi2 to show how close the 2 egg volumes in a clutch are
Observations
  Same results as Fig 5.3, but more clear
  Small within-cluster variation
  Large between-cluster variation
  Also see 1 clutch with large WITHIN clutch variation: check data (i = 88)

Page 79: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

79

Coots egg example – 4

[Figure: $s_i$ (vertical axis) vs. $\bar{y}_i$ (horizontal axis)]

Plot si vs. $\bar{y}_i$ for clutch i
Since volumes are always positive, might expect si to increase as $\bar{y}_i$ gets larger
  If $\bar{y}_i$ is very small, yi1 and yi2 are likely to be very small and close together, so si is small
  See this to a moderate degree
Clutch 88 has large si , as noted in previous plot

Page 80: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

80

Coots egg example – 5

Estimation goal
  Estimate $\bar{y}_U$, population mean volume per coot egg in Minnedosa, Manitoba
What estimator?
  Unbiased estimation
    Don’t know N = total number of clutches or K = total number of eggs in Minnedosa, Manitoba
  Ratio estimation
    Only requires knowledge of Mi , number of eggs in selected clutch i , in addition to data collected
    May want to plot $\hat{t}_i$ versus Mi

Page 81: 1 Ch 5: Cluster sampling with equal probabilities DEFN: A cluster is a group of observation units (or elements)

81

Coots egg example – 6

Columns: Clutch i; $M_i$; $\bar{y}_i$; $s_i^2$; $\hat{t}_i$; $\left(1 - \frac{2}{M_i}\right)M_i^2\frac{s_i^2}{m_i}$; $\left(\hat{t}_i - M_i\hat{\bar{y}}_r\right)^2$

Clutch   Mi     ybar_i   s_i^2      t-hat_i     within term   (t-hat_i - Mi ybar_r)^2
1        13     3.86     0.0094     50.23594    0.671901      318.9232
2        13     4.19     0.0009     54.52438    0.065615      490.4832
3        6      0.92     0.0005     5.49750     0.005777      89.22633
4        11     3.00     0.0008     32.98168    0.039354      31.19576
5        10     2.50     0.0002     24.95708    0.006298      0.002631
6        13     3.98     0.0003     51.79537    0.023622      377.053
7        9      1.93     0.0051     17.34362    0.159441      25.72099
8        11     2.96     0.0051     32.57679    0.253589      26.83682
9        12     3.46     0.0001     41.52695    0.006396      135.4898
10       11     2.96     0.0224     32.57679    1.108664      26.83682
…        …      …        …          …           …             …
180      9      1.95     0.0001     17.51918    0.002391      23.97106
181      12     3.45     0.0017     41.43934    0.102339      133.4579
182      13     4.22     0.00003    54.85854    0.002625      505.3962
183      13     4.41     0.0088     57.39262    0.630563      625.7549
184      12     3.48     0.000006   41.81168    0.000400      142.1994
sum      1757                       4375.947    42.17445      11,439.58
var                                 149.565814

$\hat{\bar{y}}_r$ = 2.490579

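Coots egg example – 7

$$\hat{\bar{y}}_r = \frac{\sum_{i \in S} \hat{t}_i}{\sum_{i \in S} M_i} = \frac{4375.947}{1757} = 2.49$$

$$s_r^2 = \frac{1}{n-1} \sum_{i \in S} \left(\hat{t}_i - M_i \hat{\bar{y}}_r\right)^2 = \frac{11{,}439.58}{183} = 62.511$$

$$\bar{M}_S = \frac{1757}{184} = 9.549$$

$$\hat{V}(\hat{\bar{y}}_r) = \frac{1}{\bar{M}_S^2}\left[\left(1 - \frac{n}{N}\right)\frac{s_r^2}{n} + \frac{1}{nN}\sum_{i \in S} M_i^2\left(1 - \frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}\right] = \frac{1}{9.549^2}\left[\left(1 - \frac{184}{N}\right)\frac{62.511}{184} + \frac{1}{N}\cdot\frac{42.17}{184}\right]$$

$$SE(\hat{\bar{y}}_r) \approx \frac{1}{9.549}\sqrt{\frac{62.511}{184}} = 0.061$$

Don't know $\bar{M}_U$; use $\bar{M}_S$

Don't know N, but assumed large, so FPC ≈ 1

2nd term is very small, so approximate SE ignores 2nd term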
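The arithmetic on this slide can be checked with a short script. This is only a sketch: the function name `ratio_estimate` and its optional argument `N` are illustrative choices, and the inputs are the column sums from the clutch table. Passing `N=None` mirrors the slide's approximation (N assumed large, FPC taken as 1, second term dropped); any finite value of N supplied to it would be hypothetical, since N is unknown here.

```python
import math

# Summary values copied from the clutch table (n = 184 sampled clutches).
n = 184
sum_M = 1757              # sum of M_i over the sampled clutches
sum_t_hat = 4375.947      # sum of t_hat_i = M_i * ybar_i
sum_within = 42.17445     # sum of (1 - m_i/M_i) * M_i^2 * s_i^2 / m_i
sum_sq_resid = 11439.58   # sum of (t_hat_i - M_i * ybar_r)^2


def ratio_estimate(N=None):
    """Return (ybar_r, SE).

    N is the number of clutches in the population, which is unknown here.
    N=None treats N as very large: FPC = 1 and the second term vanishes.
    """
    ybar_r = sum_t_hat / sum_M
    s_r2 = sum_sq_resid / (n - 1)
    M_bar = sum_M / n                      # sample mean cluster size
    if N is None:
        var = (s_r2 / n) / M_bar**2
    else:
        var = ((1 - n / N) * s_r2 / n + sum_within / (n * N)) / M_bar**2
    return ybar_r, math.sqrt(var)


ybar_r, se = ratio_estimate()
print(round(ybar_r, 2), round(se, 3))      # 2.49 0.061
```

Calling `ratio_estimate(N=...)` with a made-up N shows how little the second term contributes relative to the first.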

Coots egg example – 8

What is first-stage PSU inclusion probability?

What is conditional SSU inclusion probability at second stage?

What is unconditional SSU inclusion probability?
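One way to work through these three questions is sketched below. It assumes the clutches are treated as an SRS of n = 184 from the N clutches in the population, and that m_i = 2 eggs are drawn by SRS from the $M_i$ eggs in each sampled clutch; the symbols $\pi_i$, $\pi_{j \mid i}$ and $\pi_{ij}$ are notation chosen for this sketch.

```latex
\begin{align*}
\pi_i &= \frac{n}{N} = \frac{184}{N}, \\
\pi_{j \mid i} &= \frac{m_i}{M_i} = \frac{2}{M_i}, \\
\pi_{ij} &= \pi_i \, \pi_{j \mid i} = \frac{184}{N} \cdot \frac{2}{M_i}.
\end{align*}
```

Under these assumptions the unconditional probability depends on $M_i$, so it differs between clutches of different sizes.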

CSE2: Unbiased vs. ratio estimation

Unbiased estimator can have poor precision if

Cluster sizes ($M_i$) are unequal

$t_i$ (cluster total) is roughly proportional to $M_i$ (cluster size)

Biased (ratio) estimator can be precise if $t_i$ is roughly proportional to $M_i$

This happens frequently in populations where cluster sizes ($M_i$) vary
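A small simulation can illustrate the contrast. Everything in this sketch is hypothetical: the population (200 clusters, sizes between 2 and 40, element mean near 5 in every cluster), the sample size of 20 clusters, and the use of 1-stage sampling to keep the code short. It is meant only to show the pattern described above, not to reproduce any data set from these slides.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: unequal cluster sizes M_i, and cluster totals t_i
# roughly proportional to M_i (element mean close to 5 in every cluster).
N = 200
M = [rng.randint(2, 40) for _ in range(N)]
t = [5 * m + rng.gauss(0, 2) for m in M]
K = sum(M)                       # total number of elements
true_mean = sum(t) / K


def one_sample(n):
    """Draw an SRS of n clusters (1-stage) and return both mean estimates."""
    s = rng.sample(range(N), n)
    t_sum = sum(t[i] for i in s)
    m_sum = sum(M[i] for i in s)
    unbiased = (N / n) * t_sum / K   # needs N and K
    ratio = t_sum / m_sum            # needs only the sampled M_i
    return unbiased, ratio


draws = [one_sample(20) for _ in range(2000)]
sd_unbiased = statistics.pstdev([d[0] for d in draws])
sd_ratio = statistics.pstdev([d[1] for d in draws])

print(f"true mean   {true_mean:.3f}")
print(f"SD unbiased {sd_unbiased:.3f}")
print(f"SD ratio    {sd_ratio:.3f}")
```

In this setup the unbiased estimator inherits the variability of the cluster sizes, while the ratio estimator only reflects how far $t_i$ strays from proportionality with $M_i$.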

CSE2: Self-weighting design

Stage 1: Select n PSUs from N PSUs in pop using SRS

Inclusion probability for PSU i: $\pi_i = n/N$

Stage 2: Choose $m_i$ proportional to $M_i$ so that $m_i/M_i$ is constant, use SRS to select sample

Inclusion probability for SSU j given PSU i: $\pi_{j|i} = m_i/M_i = c$

Unconditional inclusion probability for SSU j in cluster i is constant for all elements: $\pi_{ij} = \pi_i \, \pi_{j|i} = c\,n/N$

Inclusion probability may vary in practice because it may not be possible for $m_i/M_i$ to equal c for all clusters
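The last point can be seen numerically: $m_i$ must be a whole number, so $m_i/M_i$ only approximates c. The sketch below uses made-up values (N = 50 PSUs, n = 5 sampled, c = 0.25, and five hypothetical cluster sizes), and the helper names `second_stage_sizes` and `inclusion_probs` are illustrative. Rounding to the nearest integer with a minimum of 1 is one possible allocation rule, not one prescribed by the slides.

```python
def second_stage_sizes(M, c):
    """Round c * M_i to a whole number of SSUs, taking at least 1 per PSU."""
    return [max(1, round(c * Mi)) for Mi in M]


def inclusion_probs(M, m, n, N):
    """Unconditional SSU inclusion probability (n/N) * (m_i/M_i) per PSU."""
    return [(n / N) * (mi / Mi) for mi, Mi in zip(m, M)]


# Hypothetical design values, for illustration only.
N, n, c = 50, 5, 0.25
M = [8, 12, 10, 7, 20]               # sizes of the n sampled PSUs

m = second_stage_sizes(M, c)
pi = inclusion_probs(M, m, n, N)
weights = [1 / p for p in pi]        # sampling weight for each SSU in PSU i

for Mi, mi, p, w in zip(M, m, pi, weights):
    print(f"M_i={Mi:2d}  m_i={mi}  pi_ij={p:.4f}  weight={w:.1f}")
```

PSUs whose size is a multiple of 4 hit the target probability c n/N = 0.025 exactly; the others end up with somewhat different weights, so the design is only approximately self-weighting.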

Self-weighting designs in general

Why are self-weighting samples appealing?

Are dorm student or coot egg samples self-weighting 2-stage cluster samples?

What other (non-cluster) self-weighting designs have we discussed?

Self-weighting designs in general – 2

What is the caveat for variance estimation in self-weighting samples?

No break on variance of estimator: must use proper formula for design

Why are self-weighting samples appealing?

Simple mean estimator

Homogeneous weights tend to make estimates more precise

Return to systematic sampling (SYS)

Have a frame, or list of N elements

Determine sampling interval, k

k is the next integer after N/n

Select first element in the list

Choose a random number, R, between 1 & k

R-th element is the first element to be included in the sample

Select every k-th element after the R-th element

Sample includes element R, element R + k, element R + 2k, …, element R + (n-1)k

SYS example

Telephone survey of members in an organization about organization's website use

N = 500 members

Have resources to do n = 75 calls

N / n = 500/75 = 6.67, so k = 7

Random number table entry: 52994

Rule: if pick 1, 2, …, 7, assign as R; otherwise discard #

Select R = 5

Take element 5, then element 5 + 7 = 12, then element 12 + 7 = 19, 26, 33, 40, 47, …
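The selection steps for this example can be sketched in a few lines. The function name `systematic_sample` is illustrative, and `math.ceil` is used as a stand-in for "next integer after N/n" (it returns N/n itself when that ratio is already a whole number, which is an assumption about how that case is meant to be handled).

```python
import math


def systematic_sample(N, n, R):
    """Return the 1-based list positions selected by SYS with random start R."""
    k = math.ceil(N / n)                 # sampling interval
    if not 1 <= R <= k:
        raise ValueError("R must be between 1 and k")
    return list(range(R, N + 1, k))      # R, R + k, R + 2k, ... up to N


sample = systematic_sample(N=500, n=75, R=5)
print(sample[:7])                        # [5, 12, 19, 26, 33, 40, 47]
print(len(sample), sample[-1])           # 71 495
```

Because k was rounded up from 6.67 to 7, the list runs out before 75 elements are reached: a start of R = 5 yields 71 elements, and starts of 1, 2 or 3 yield 72.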

SYS – 2

Arrange population in rows of length k = 7

R      1     2     3     4     5     6     7      i
       1     2     3     4     5     6     7      1
       8     9    10    11    12    13    14      2
      15    16    17    18    19    20    21      3
      22    23    24    25    26    27    28      4
       …                                          …
     491   492   493   494   495   496   497     71
     498   499   500                             72

Relationship between SYS and cluster sampling

Design relationships

Element = ?

Cluster = ?

Sampling unit(s) = ?

Cluster sampling design = ?

Relationship between frame ordering and expected precision of an estimate from a cluster sample?

Periodic, where cycle of pattern is coincident with sampling interval k

Ordered by X, which is correlated with response variable Y

Random
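One way to explore the three orderings is to treat each possible start R as a cluster (positions R, R + k, R + 2k, …) and SYS as selecting one of the k clusters at random. The sketch below then averages the squared error of the sample mean over all k starts. The responses (the integers 1 to 500) and the way the "periodic" frame is built are hypothetical choices made for illustration; the helper names are not from the slides.

```python
import random


def sys_means(y, k):
    """Sample mean for each of the k possible systematic samples (clusters)."""
    return [sum(y[r::k]) / len(y[r::k]) for r in range(k)]


def mse_of_sys_mean(y, k):
    """Average squared error of the SYS mean over the k equally likely starts."""
    mu = sum(y) / len(y)
    return sum((m - mu) ** 2 for m in sys_means(y, k)) / k


def periodic_order(values, k):
    """Arrange values so positions r, r+k, r+2k, ... hold similar values,
    giving a low-to-high cycle whose length matches the interval k."""
    vals = sorted(values)
    out = [None] * len(vals)
    idx = 0
    for r in range(k):
        for pos in range(r, len(vals), k):
            out[pos] = vals[idx]
            idx += 1
    return out


k = 7
values = list(range(1, 501))             # hypothetical responses, N = 500

sorted_frame = sorted(values)            # frame ordered by the response itself
random_frame = values[:]
random.Random(0).shuffle(random_frame)   # frame in random order
periodic_frame = periodic_order(values, k)

for name, frame in [("sorted", sorted_frame),
                    ("random", random_frame),
                    ("periodic", periodic_frame)]:
    print(name, round(mse_of_sys_mean(frame, k), 1))
```

With these made-up values the sorted frame gives the smallest error, the random frame sits in between, and the periodic frame is by far the worst, because every element within a cluster is similar and the clusters differ sharply from one another.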

SYS – 3

Suppose X [age of member] is correlated with Y [use of org website]

Sort list by X before selecting sample

k      1     2     3     4     5     6     7     X        i
       1     2     3     4     5     6     7     young    1
       8     9    10    11    12    13    14              2
      15    16    17    18    19    20    21              3
      22    23    24    25    26    27    28              4
       …                                         mid      …
     491   492   493   494   495   496   497             71
     498   499   500                             old     72