1 Sampling Models for the Population Mean Ed Stanek UMASS Amherst.

1

Sampling Models for the Population Mean

Ed Stanek

UMASS Amherst

2

Basic Problem (Population Mean)

Population Data

Listing    Latent Value
Rose       $y_{Rose}$
Lily       $y_{Lily}$
Daisy      $y_{Daisy}$

$$\mu = \frac{y_{Rose} + y_{Lily} + y_{Daisy}}{3}$$

What is $\mu$?

3

Basic Problem (Population Mean)

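Some Notation

Population

Listing    Latent Value
Rose       $y_{Rose}$
Lily       $y_{Lily}$
Daisy      $y_{Daisy}$

$$L = \{ y_j,\ j = 1, \ldots, N \}$$

Label Set of Subjects in the Population

$$\lambda_0 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \qquad \text{Listing} = \begin{pmatrix} \text{Rose} \\ \text{Lily} \\ \text{Daisy} \end{pmatrix}$$

Latent Values

$$y_0 = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} y_{Rose} \\ y_{Lily} \\ y_{Daisy} \end{pmatrix}$$

Assumption: Response is equal to the latent value for the subject. There is no measurement error.

$$\mu = \frac{1}{N} \sum_{j=1}^{N} y_j$$

Using vector notation:

$$\mu = \frac{1}{N} 1_N' y_0$$

Using set notation:

$$L = \{ y_{Lily}, y_{Rose}, y_{Daisy} \}$$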
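As a small numerical illustration of the population mean in sum and vector form, the sketch below assumes the latent values 6, 10 and 8 for Rose, Lily and Daisy (the numbers that appear later in the reference-set example). The variable names are hypothetical.

```python
from fractions import Fraction

# Assumed latent values in listing order j = 1, 2, 3 (Rose, Lily, Daisy).
y0 = [6, 10, 8]
N = len(y0)

# Sum form: mu = (1/N) * sum_j y_j
mu_sum = Fraction(sum(y0), N)

# Vector form: mu = (1/N) * 1_N' y_0, with 1_N a vector of ones
ones = [1] * N
mu_vec = Fraction(sum(o * y for o, y in zip(ones, y0)), N)

print(mu_sum, mu_vec)
```

Both forms give the same value because multiplying by a vector of ones is just summation.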

4

Sampling Model

• Select a simple random sample without replacement of size n
  – Define an estimator that is a linear function of the sample data
  – Require the estimator to be unbiased
  – Determine coefficients that minimize the variance (over all possible samples)

• Best Linear Unbiased Estimator (BLUE)

5

Sampling Model

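Select a simple random sample without replacement

All possible Permutations of subjects (R = Rose, L = Lily, D = Daisy)

Order i   Potential Response   p*=1   p*=2   p*=3   p*=4   p*=5   p*=6
1         $Y_1$                R      R      L      L      D      D
2         $Y_2$                L      D      R      D      R      L
3         $Y_3$                D      L      D      R      L      R

Probability of Permutation: $p_{01}, p_{02}, p_{03}, p_{04}, p_{05}, p_{06}$

$$E(I_{p^*}) = p_{0p^*} = \frac{1}{N!} \quad \text{for all } p^* = 1, 2, \ldots, N!$$

where $y_0$ holds the latent values in Listing order and

$$I_{p^*} = \begin{cases} 1 & \text{if } Y = u_{p^*} y_0 \\ 0 & \text{otherwise} \end{cases}$$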
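The permutation table can be reproduced by brute force. The following is a minimal sketch, not part of the original slides, that lists all N! orderings of the three subjects and attaches the probability 1/N! to each; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

listing = ["Rose", "Lily", "Daisy"]   # subjects in listing order
N = len(listing)

# All N! permutations; each column of the table is one tuple.
perms = list(permutations(listing))

# Under simple random sampling every permutation is equally likely.
prob = Fraction(1, factorial(N))

for p_star, perm in enumerate(perms, start=1):
    print(p_star, perm, prob)
```

With this listing order, `itertools.permutations` returns the orderings in the same order as the columns p*=1 through p*=6.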

6

Sampling Model


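All possible Permutations of latent values

Position             p*=1          p*=2          p*=3          p*=4          p*=5          p*=6
$Y_1$                $y_{Rose}$    $y_{Rose}$    $y_{Lily}$    $y_{Lily}$    $y_{Daisy}$   $y_{Daisy}$
$Y_2$                $y_{Lily}$    $y_{Daisy}$   $y_{Rose}$    $y_{Daisy}$   $y_{Rose}$    $y_{Lily}$
$Y_3$                $y_{Daisy}$   $y_{Lily}$    $y_{Daisy}$   $y_{Rose}$    $y_{Lily}$    $y_{Rose}$
Potential Response   $u_1 y_0$     $u_2 y_0$     ...           ...           ...           $u_6 y_0$

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \sum_{p^*=1}^{N!} I_{p^*} u_{p^*} y_0, \qquad y_0 = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} y_{Rose} \\ y_{Lily} \\ y_{Daisy} \end{pmatrix}$$

$$u_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad u_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad \ldots, \quad u_6 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$$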
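To make the role of the permutation matrices concrete, here is a hedged sketch that builds each 0/1 matrix and applies it to the vector of latent values. The helper names `perm_matrix` and `apply_perm` are hypothetical, and the latent values are kept as symbolic strings.

```python
from itertools import permutations

y0 = ["y_Rose", "y_Lily", "y_Daisy"]   # latent values in listing order
N = len(y0)

def perm_matrix(order):
    """Row i has a single 1 in column order[i]: position i receives subject order[i]."""
    return [[1 if j == order[i] else 0 for j in range(N)] for i in range(N)]

def apply_perm(u, y):
    """Compute u y for a 0/1 permutation matrix u by picking the selected entry per row."""
    return [y[row.index(1)] for row in u]

# u[0] is u_1 (the identity), u[1] is u_2, ..., u[5] is u_6.
u = [perm_matrix(order) for order in permutations(range(N))]
Y = [apply_perm(u_p, y0) for u_p in u]

for p_star, Y_p in enumerate(Y, start=1):
    print(p_star, Y_p)
```

Each printed row is one possible realization of the random vector Y.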

7

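Permutation

All possible Permutations

Order i   Potential Response
1         $Y_1$
2         $Y_2$
3         $Y_3$

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix}$$

The positions of $Y$ are split into Data and Remainder.

$$Y_i = E(Y_i) + E_i$$

Sampling Model

$$Y = \sum_{p^*=1}^{N!} I_{p^*} u_{p^*} y_0$$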

8

Sampling Model

• Represent the Population as a Vector of Random Variables

• The random variables are indexed by their position, not by the label of the subject in that position

• The subject corresponding to a random variable cannot be identified

Permutation

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix}$$

Sample Size: n=1

Data: $Y_1$ (Position i=1). Remainder: $Y_2, Y_3$.

Sample Size: n=2

Data: $Y_1, Y_2$. Remainder: $Y_3$.

9

Sampling Model

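Define the Target

Linear combination of Population Random Variables:

$$P = g'Y = \sum_{i=1}^{N} g_i Y_i, \qquad g' = (g_1\ g_2\ \cdots\ g_N)$$

• May be a Parameter
• May be a Random variable

Special case: Mean (Parameter)

$$g_i = \frac{1}{N} \text{ for all } i = 1, \ldots, N, \qquad P = \frac{1}{N} \sum_{i=1}^{N} Y_i = \frac{1}{N} 1_N' Y = \mu$$

Special case: Latent value for Randomly Selected Subject

$$g_i = \begin{cases} 1 & \text{for } i = i^* \\ 0 & \text{for all } i \ne i^* \end{cases}, \qquad P = Y_{i^*}$$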
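The distinction between a target that is a parameter and one that is a random variable can be checked by enumeration. The sketch below assumes the illustrative latent values 6, 10 and 8 and evaluates the target for both choices of g over every permutation; the function name `target` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

y0 = [6, 10, 8]          # assumed latent values (Rose, Lily, Daisy)
N = len(y0)

def target(g, Y):
    """P = g'Y, a linear combination of the population random variables."""
    return sum(g_i * Y_i for g_i, Y_i in zip(g, Y))

g_mean = [Fraction(1, N)] * N   # g_i = 1/N for all i
g_first = [1, 0, 0]             # g_i = 1 for i = i* = 1, else 0

# Collect the distinct values of P across all permutations.
mean_targets = {target(g_mean, Y) for Y in permutations(y0)}
first_targets = {target(g_first, Y) for Y in permutations(y0)}

print(mean_targets, first_targets)
```

The mean takes a single value over all permutations, so it is a parameter. The value in position 1 varies with the permutation, so that target is a random variable.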

10

Sampling Model

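Expected Value

$$Y = \begin{pmatrix} Y_I \\ Y_{II} \end{pmatrix}, \qquad Y_I = \text{Data}$$

$$Y_i = E(Y_i) + E_i$$

Under SRS w/o Rep: $E(Y_i) = \mu$, so $E(Y) = \mu 1_N$ and

$$E(Y_I) = X_I \beta = \mu 1_n, \qquad E(Y_{II}) = X_{II} \beta = \mu 1_{N-n}$$

$$Y = X\beta + E, \qquad E = Y - X\beta$$

Linear Link Function

$$E(Y) = X\beta = \mu 1_N$$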
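The claim that every position has expected value equal to the population mean under simple random sampling can be verified by averaging over all equally likely permutations. This is a minimal sketch with assumed latent values 6, 10 and 8.

```python
from fractions import Fraction
from itertools import permutations

y0 = [6, 10, 8]                  # assumed latent values
N = len(y0)
mu = Fraction(sum(y0), N)

perms = list(permutations(y0))   # all realizations of Y
p = Fraction(1, len(perms))      # each has probability 1/N!

# E(Y_i) for each position i = 1, ..., N
expected = [sum(p * Y[i] for Y in perms) for i in range(N)]

print(expected, mu)
```

Every entry of `expected` equals the population mean, which is the statement E(Y) = mu 1_N.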

11

Sampling Model

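Variance

$$Y = \begin{pmatrix} Y_I \\ Y_{II} \end{pmatrix}, \qquad Y_i = E(Y_i) + E_i$$

$$\operatorname{var}(Y) = \sigma^2 \left( I_N - \frac{1}{N} J_N \right) = \sigma^2 P_N$$

where

$$P_N = I_N - \frac{1}{N} J_N, \qquad \sigma^2 = \frac{1}{N-1} \sum_{s=1}^{N} (y_s - \mu)^2$$

The $-\frac{1}{N} J_N$ term is due to the finite population correction factor.

Partitioned into Data and Remainder:

$$\operatorname{var} \begin{pmatrix} Y_I \\ Y_{II} \end{pmatrix} = \sigma^2 \begin{pmatrix} I_n - \frac{1}{N} J_n & -\frac{1}{N} 1_n 1_{N-n}' \\ -\frac{1}{N} 1_{N-n} 1_n' & I_{N-n} - \frac{1}{N} J_{N-n} \end{pmatrix} = \begin{pmatrix} V_I & V_{I,II} \\ V_{II,I} & V_{II} \end{pmatrix}$$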
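As a check on the covariance structure, the sketch below computes the covariance matrix of the permuted vector by direct enumeration and compares it with the closed form sigma^2 (I_N - J_N / N). The latent values 6, 10 and 8 are assumed for illustration.

```python
from fractions import Fraction
from itertools import permutations

y0 = [6, 10, 8]                  # assumed latent values
N = len(y0)
mu = Fraction(sum(y0), N)
sigma2 = sum((y - mu) ** 2 for y in y0) / (N - 1)   # divisor N - 1

perms = list(permutations(y0))
p = Fraction(1, len(perms))

# Covariance by enumeration: E[(Y_i - mu)(Y_k - mu)] over all permutations.
cov = [[sum(p * (Y[i] - mu) * (Y[k] - mu) for Y in perms) for k in range(N)]
       for i in range(N)]

# Closed form: sigma^2 * (I_N - (1/N) J_N)
formula = [[sigma2 * ((1 if i == k else 0) - Fraction(1, N)) for k in range(N)]
           for i in range(N)]

print(cov == formula)
```

The negative off-diagonal entries reflect that, in a finite population sampled without replacement, a large value in one position makes large values in other positions less likely.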

12

Sampling Model

Expected Value and Variance

Reference Sets

Reference Set: The set of possible values that sample random variables can have with positive probability

Expectation is evaluated over a reference set

Example: If $n = 1$, then $Y_I = Y_1$.

Reference set for $Y_I = Y_1$: $\{ y_{Lily}, y_{Rose}, y_{Daisy} \}$

13

Sampling Model

Expected Value and Variance:

Reference Sets

$n = 1$: $Y_I = Y_1$

Reference set for $Y_I = Y_1$: $\{ y_{Lily}, y_{Rose}, y_{Daisy} \}$

$$E(Y_I) = E(Y_1) = \sum_{\text{Reference Elements}} P(\text{Reference Element}) \, y_{\text{Reference Element}}, \qquad P(\text{Reference Element}) = \frac{1}{3}$$

$$E(Y_1) = \frac{1}{3} y_{Lily} + \frac{1}{3} y_{Rose} + \frac{1}{3} y_{Daisy} = \mu$$

14


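Reference Sets

Example when $n = 2$:

$$Y = \begin{pmatrix} Y_I \\ Y_{II} \end{pmatrix}, \qquad Y_I = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \text{ (Data)}$$

Reference set for $Y_I$ (sets of possible latent values):

$$\{ \{ y_{Lily}, y_{Rose} \}, \{ y_{Lily}, y_{Daisy} \}, \{ y_{Rose}, y_{Daisy} \} \}$$

If $y_{Lily} = 10$, $y_{Daisy} = 8$, $y_{Rose} = 6$:

Reference set for $Y_I$: $\{ (10\ \ 6), (10\ \ 8), (6\ \ 8) \}$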
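The reference set for n = 2 can be generated mechanically as all unordered pairs of latent values. A minimal sketch using the values given above (the dictionary name `y` is hypothetical):

```python
from itertools import combinations

# Latent values from the example: y_Lily = 10, y_Rose = 6, y_Daisy = 8
y = {"Lily": 10, "Rose": 6, "Daisy": 8}
n = 2

# Each element of the reference set is one unordered pair of latent values.
reference_set = [tuple(y[name] for name in names) for names in combinations(y, n)]

print(reference_set)
```

There are N choose n = 3 elements, one for each possible sample when order is ignored.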

15


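Reference Sets vs Sequence

Example when $n = 2$: $Y_I = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}$ (Data)

Permutation (sequences) (L = Lily, R = Rose, D = Daisy)

         p*=1   p*=2   p*=3   p*=4   p*=5   p*=6
$Y_1$    L      L      R      R      D      D
$Y_2$    R      D      L      D      L      R
$Y_3$    D      R      D      L      R      L

Reference Set for $Y_I$: rows $Y_1$ and $Y_2$ of the table

Reference Sequence for $Y_I$:

$$\left\{ \begin{pmatrix} y_{Lily} \\ y_{Rose} \end{pmatrix}, \begin{pmatrix} y_{Lily} \\ y_{Daisy} \end{pmatrix}, \begin{pmatrix} y_{Rose} \\ y_{Lily} \end{pmatrix}, \begin{pmatrix} y_{Rose} \\ y_{Daisy} \end{pmatrix}, \begin{pmatrix} y_{Daisy} \\ y_{Lily} \end{pmatrix}, \begin{pmatrix} y_{Daisy} \\ y_{Rose} \end{pmatrix} \right\}$$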

16


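Reference Sets vs Sequence

Example when $n = 2$: $Y_I = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}$ (Data)

Reference Sequence: Used in Random Permutation Model

$$\left\{ \begin{pmatrix} y_{Lily} \\ y_{Rose} \end{pmatrix}, \begin{pmatrix} y_{Lily} \\ y_{Daisy} \end{pmatrix}, \begin{pmatrix} y_{Rose} \\ y_{Lily} \end{pmatrix}, \begin{pmatrix} y_{Rose} \\ y_{Daisy} \end{pmatrix}, \begin{pmatrix} y_{Daisy} \\ y_{Lily} \end{pmatrix}, \begin{pmatrix} y_{Daisy} \\ y_{Rose} \end{pmatrix} \right\}$$

Reference Set: Sufficient, assuming order doesn't matter

$$\{ \{ y_{Lily}, y_{Rose} \}, \{ y_{Lily}, y_{Daisy} \}, \{ y_{Rose}, y_{Daisy} \} \}$$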
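To illustrate why the reference set is sufficient when order does not matter, the sketch below computes the expected sample mean over both the reference set (unordered pairs) and the reference sequence (ordered pairs). The latent values 10, 6 and 8 and the helper name `expected_sample_mean` are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations, permutations

y0 = [10, 6, 8]   # assumed latent values (Lily, Rose, Daisy)
n = 2

reference_set = list(combinations(y0, n))        # 3 unordered pairs
reference_sequence = list(permutations(y0, n))   # 6 ordered pairs

def expected_sample_mean(elements):
    """Average the sample mean over equally likely reference elements."""
    p = Fraction(1, len(elements))
    return sum(p * Fraction(sum(e), n) for e in elements)

print(expected_sample_mean(reference_set), expected_sample_mean(reference_sequence))
```

The sample mean ignores order, so each unordered pair simply appears twice in the sequence and the two expectations agree.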

17

Sampling Model

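Determining the BLUE for $\mu$

Target:

$$P = g'Y = (g_I'\ \ g_{II}') \begin{pmatrix} Y_I \\ Y_{II} \end{pmatrix} = g_I' Y_I + g_{II}' Y_{II}$$

where $g' = (g_I'\ \ g_{II}') = \frac{1}{N} (1_n'\ \ 1_{N-n}')$

Linear Estimator (a function of the data $Y_I$):

$$\hat{P} = (g_I + a)' Y_I = g_I' Y_I + a' Y_I, \qquad a' = (a_1\ a_2\ \cdots\ a_n)$$

Question: What should $a$ be so that the estimator is unbiased and has minimum variance?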

18

Sampling Model

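Determining the BLUE for $\mu$

Unbiased Constraint

$$\hat{P} = g_I' Y_I + a' Y_I, \qquad P = g_I' Y_I + g_{II}' Y_{II}, \qquad \hat{P} - P = a' Y_I - g_{II}' Y_{II}$$

Unbiased requirement:

$$E(\hat{P} - P) = 0$$

With $E(Y) = \mu 1_N$, that is,

$$E \begin{pmatrix} Y_I \\ Y_{II} \end{pmatrix} = \begin{pmatrix} X_I \\ X_{II} \end{pmatrix} \beta = \begin{pmatrix} 1_n \\ 1_{N-n} \end{pmatrix} \mu,$$

$$E(\hat{P} - P) = (a'\ \ -g_{II}') \begin{pmatrix} X_I \\ X_{II} \end{pmatrix} \beta$$

Implies that

$$a' X_I - g_{II}' X_{II} = 0$$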
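For the population mean, where every position has the same expected value, the unbiased constraint reduces to requiring that the coefficients in a sum to (N - n)/N. The sketch below checks this by enumerating all permutations for N = 3 and n = 2; the latent values and the function name `bias` are assumptions for illustration.

```python
from fractions import Fraction
from itertools import permutations

y0 = [6, 10, 8]                 # assumed latent values
N, n = len(y0), 2
g = [Fraction(1, N)] * N        # target is the mean: g_i = 1/N

def bias(a):
    """E(P_hat - P) over all equally likely permutations, for coefficients a."""
    perms = list(permutations(y0))
    p = Fraction(1, len(perms))
    total = Fraction(0)
    for Y in perms:
        P = sum(g_i * Y_i for g_i, Y_i in zip(g, Y))
        P_hat = sum((g[i] + a[i]) * Y[i] for i in range(n))   # uses data only
        total += p * (P_hat - P)
    return total

# Coefficients summing to (N - n)/N = 1/3 are unbiased, even if unequal.
print(bias([Fraction(1, 3), Fraction(0)]))
print(bias([Fraction(1, 6), Fraction(1, 6)]))
# Coefficients with a different sum are biased.
print(bias([Fraction(1, 2), Fraction(0)]))
```

Many choices of a satisfy the constraint, which is why a second criterion (minimum variance) is needed to pick one.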

19

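Sampling Model

Determining the BLUE

Minimizing the Variance

$$\hat{P} - P = a' Y_I - g_{II}' Y_{II}$$

Unbiased Constraint

$$a' X_I - g_{II}' X_{II} = 0$$

Variance

$$\operatorname{var}(\hat{P} - P) = (a'\ \ -g_{II}') \begin{pmatrix} V_I & V_{I,II} \\ V_{II,I} & V_{II} \end{pmatrix} \begin{pmatrix} a \\ -g_{II} \end{pmatrix}$$

Lagrangian Function to Minimize with Respect to $a$

$$f(a, \eta) = a' V_I a - 2 a' V_{I,II} g_{II} + g_{II}' V_{II} g_{II} + 2 (a' X_I - g_{II}' X_{II}) \eta$$

$$\frac{\partial f(a, \eta)}{\partial a} = 2 V_I a - 2 V_{I,II} g_{II} + 2 X_I \eta$$

$$\frac{\partial f(a, \eta)}{\partial \eta} = 2 (X_I' a - X_{II}' g_{II})$$

Setting both derivatives to zero at $(\hat{a}, \hat{\eta})$:

$$\frac{1}{2} \begin{pmatrix} \partial f / \partial a \\ \partial f / \partial \eta \end{pmatrix}_{(\hat{a}, \hat{\eta})} = \begin{pmatrix} V_I & X_I \\ X_I' & 0 \end{pmatrix} \begin{pmatrix} \hat{a} \\ \hat{\eta} \end{pmatrix} - \begin{pmatrix} V_{I,II} g_{II} \\ X_{II}' g_{II} \end{pmatrix} = \begin{pmatrix} 0_n \\ 0 \end{pmatrix}$$

$$\begin{pmatrix} V_I & X_I \\ X_I' & 0 \end{pmatrix} \begin{pmatrix} \hat{a} \\ \hat{\eta} \end{pmatrix} = \begin{pmatrix} V_{I,II} g_{II} \\ X_{II}' g_{II} \end{pmatrix}$$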

20



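Solving the Estimating Equations

$$\begin{pmatrix} V_I & X_I \\ X_I' & 0 \end{pmatrix} \begin{pmatrix} \hat{a} \\ \hat{\eta} \end{pmatrix} = \begin{pmatrix} V_{I,II} g_{II} \\ X_{II}' g_{II} \end{pmatrix}$$

For a partitioned matrix

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad M^{-1} = \begin{pmatrix} A^{-1} + A^{-1} B Q^{-1} C A^{-1} & -A^{-1} B Q^{-1} \\ -Q^{-1} C A^{-1} & Q^{-1} \end{pmatrix}$$

where $Q = D - C A^{-1} B$.

$$\begin{pmatrix} V_I & X_I \\ X_I' & 0 \end{pmatrix}^{-1} = \begin{pmatrix} V_I^{-1} - V_I^{-1} X_I (X_I' V_I^{-1} X_I)^{-1} X_I' V_I^{-1} & V_I^{-1} X_I (X_I' V_I^{-1} X_I)^{-1} \\ (X_I' V_I^{-1} X_I)^{-1} X_I' V_I^{-1} & -(X_I' V_I^{-1} X_I)^{-1} \end{pmatrix}$$

$$\hat{a} = \left[ V_I^{-1} - V_I^{-1} X_I (X_I' V_I^{-1} X_I)^{-1} X_I' V_I^{-1} \right] V_{I,II} g_{II} + V_I^{-1} X_I (X_I' V_I^{-1} X_I)^{-1} X_{II}' g_{II}$$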
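The estimating equations can also be solved numerically for a small case. The sketch below sets up the bordered system for N = 3, n = 2 and the mean target, using exact fractions and a small Gauss-Jordan routine. It assumes sigma^2 = 4 and X_I, X_II proportional to vectors of ones with factor 1/N; any common nonzero scaling of X_I and X_II gives the same coefficients. All names are hypothetical.

```python
from fractions import Fraction

def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with exact Fraction arithmetic."""
    size = len(M)
    A = [list(map(Fraction, row)) + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(size):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [row[-1] for row in A]

N, n = 3, 2
sigma2 = Fraction(4)            # assumed population variance

# V_I = sigma^2 (I_n - J_n / N); V_{I,II} = -(sigma^2 / N) 1_n 1_{N-n}'
V_I = [[sigma2 * ((1 if i == k else 0) - Fraction(1, N)) for k in range(n)]
       for i in range(n)]
V_I_II = [[-sigma2 / N for _ in range(N - n)] for _ in range(n)]
g_I = [Fraction(1, N)] * n
g_II = [Fraction(1, N)] * (N - n)
X_I = [Fraction(1, N)] * n
X_II = [Fraction(1, N)] * (N - n)

# Bordered system [[V_I, X_I], [X_I', 0]] (a, eta)' = (V_{I,II} g_II, X_II' g_II)'
M = [V_I[i] + [X_I[i]] for i in range(n)] + [X_I + [Fraction(0)]]
b = [sum(V_I_II[i][k] * g_II[k] for k in range(N - n)) for i in range(n)]
b.append(sum(x * g for x, g in zip(X_II, g_II)))

solution = solve(M, b)
a_hat = solution[:n]
weights = [g + a for g, a in zip(g_I, a_hat)]   # coefficients on Y_I in P_hat

print(a_hat, weights)
```

In this small case the combined weights on the data are all 1/n, so the resulting estimator is the sample mean.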

21



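Solving the Estimating Equations

$$\hat{a} = \left[ V_I^{-1} - V_I^{-1} X_I (X_I' V_I^{-1} X_I)^{-1} X_I' V_I^{-1} \right] V_{I,II} g_{II} + V_I^{-1} X_I (X_I' V_I^{-1} X_I)^{-1} X_{II}' g_{II}$$

$$\hat{a}' = g_{II}' \left\{ V_{II,I} \left[ V_I^{-1} - V_I^{-1} X_I (X_I' V_I^{-1} X_I)^{-1} X_I' V_I^{-1} \right] + X_{II} (X_I' V_I^{-1} X_I)^{-1} X_I' V_I^{-1} \right\}$$

Let

$$\hat{\beta} = (X_I' V_I^{-1} X_I)^{-1} X_I' V_I^{-1} Y_I$$

$$\hat{P} = g_I' Y_I + \hat{a}' Y_I$$

$$\hat{P} = g_I' Y_I + g_{II}' \left[ X_{II} \hat{\beta} + V_{II,I} V_I^{-1} (Y_I - X_I \hat{\beta}) \right]$$

$$\operatorname{var}(\hat{P}) = \operatorname{var}\left( (g_I + \hat{a})' Y_I \right) = (g_I + \hat{a})' V_I (g_I + \hat{a})$$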

22

Sampling Model

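Determining the BLUE of $\mu$

Using

$$\hat{P} = g_I' Y_I + g_{II}' \left[ X_{II} \hat{\beta} + V_{II,I} V_I^{-1} (Y_I - X_I \hat{\beta}) \right]$$

$$g_I = \frac{1}{N} 1_n, \qquad g_{II} = \frac{1}{N} 1_{N-n}$$

$$V_I^{-1} = \frac{1}{\sigma^2} \left( I_n + \frac{1}{N-n} J_n \right)$$

and

$$X_I = \frac{1}{N} 1_n, \qquad X_{II} = \frac{1}{N} 1_{N-n}$$

$$X_I' V_I^{-1} X_I = \frac{n}{N \sigma^2 (N-n)}$$

so that

$$\hat{\beta} = (X_I' V_I^{-1} X_I)^{-1} X_I' V_I^{-1} Y_I = \frac{N}{n} 1_n' Y_I$$

Then

$$\hat{P} = \frac{1}{N} 1_n' Y_I + \frac{1}{N} 1_{N-n}' \left[ \frac{1}{n} 1_{N-n} 1_n' + V_{II,I} V_I^{-1} \left( I_n - \frac{1}{n} 1_n 1_n' \right) \right] Y_I$$

$$= f \frac{1}{n} 1_n' Y_I + \frac{1}{N} \left[ \frac{N-n}{n} 1_n' + 1_{N-n}' V_{II,I} V_I^{-1} \left( I_n - \frac{1}{n} J_n \right) \right] Y_I$$

$$= f \bar{Y} + (1 - f) \bar{Y} + \frac{1}{N} 1_{N-n}' V_{II,I} V_I^{-1} \left( I_n - \frac{1}{n} J_n \right) Y_I$$

where $f = \frac{n}{N}$ and $\bar{Y} = \frac{1}{n} 1_n' Y_I$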

23

Sampling Model


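Now

$$\hat{P} = f \bar{Y} + (1 - f) \bar{Y} + \frac{1}{N} 1_{N-n}' V_{II,I} V_I^{-1} \left( I_n - \frac{1}{n} J_n \right) Y_I$$

where $\bar{Y} = \frac{1}{n} 1_n' Y_I$

$$V_{II,I} V_I^{-1} = -\frac{1}{N} 1_{N-n} 1_n' \left( I_n + \frac{1}{N-n} J_n \right) = -\frac{1}{N} \left( 1 + \frac{n}{N-n} \right) 1_{N-n} 1_n' = -\frac{1}{N-n} 1_{N-n} 1_n'$$

and

$$1_n' \left( I_n - \frac{1}{n} J_n \right) = 0_n'$$

As a result

$$\hat{P} = f \bar{Y} + (1 - f) \bar{Y} = \bar{Y}$$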

24

Sampling Model


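Now

$$\hat{P} = g_I' Y_I + \hat{a}' Y_I = \bar{Y}$$

where $g_I' Y_I = f \bar{Y}$ and $\hat{a}' Y_I = (1 - f) \bar{Y}$

$$\operatorname{var}(\hat{P}) = \operatorname{var}\left( (g_I + \hat{a})' Y_I \right) = \operatorname{var}(\bar{Y}) = \frac{1}{n^2} 1_n' \operatorname{var}(Y_I) 1_n$$

Since $V_I = \sigma^2 \left( I_n - \frac{1}{N} J_n \right)$

$$\operatorname{var}(\hat{P}) = (1 - f) \frac{\sigma^2}{n}$$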
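The final variance formula can be confirmed by enumerating every possible sample. The sketch below assumes the latent values 10, 6 and 8 with n = 2 and compares the variance of the sample mean over all samples with (1 - f) sigma^2 / n.

```python
from fractions import Fraction
from itertools import combinations

y0 = [10, 6, 8]                  # assumed latent values (Lily, Rose, Daisy)
N, n = len(y0), 2
mu = Fraction(sum(y0), N)
sigma2 = sum((y - mu) ** 2 for y in y0) / (N - 1)
f = Fraction(n, N)               # sampling fraction

# Every simple random sample without replacement is equally likely.
samples = list(combinations(y0, n))
means = [Fraction(sum(s), n) for s in samples]

# The sample mean is unbiased, so its variance is the mean squared deviation from mu.
var_enumerated = sum((m - mu) ** 2 for m in means) / len(samples)
var_formula = (1 - f) * sigma2 / n

print(var_enumerated, var_formula)
```

The factor (1 - f) is the finite population correction: as the sample approaches the whole population, the variance of the estimator shrinks to zero.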
