1 Ch 2: probability sampling, SRS Overview of probability sampling Establish basic notation and...

Page 1: Ch 2: probability sampling, SRS

- Overview of probability sampling
  - Establish basic notation and concepts
  - Population distribution of Y: object of inference
  - Sampling distribution of an estimator under a design: assessing the quality of the estimate used to make inference
- Apply these to SRS
  - Selecting a SRS sample
  - Estimating population parameters (means, totals, proportions)
  - Estimating standard errors and confidence intervals
  - Determining the sample size

Page 2: Assume ideal setting

- Sampled population = target population
  - Sampling frame is complete and does not contain any observation units (OUs) beyond the target population
  - No unit nonresponse
- Measurement process is perfect
  - All measurements are accurate
  - No missing data (no item nonresponse)
- That is, nonsampling error is absent

Page 3: Survey error model

Total survey error = sampling error + nonsampling error

- Sampling error
  - Due to the sampling process (i.e., we observe only part of the population)
  - Assessed via bias and variance
- Nonsampling error
  - Measurement error
  - Nonresponse error
  - Frame error

Page 4: Probability sample

- DEFN: A sample in which each unit in the population has a known, nonzero probability of being included in the sample
- Known probability: we can quantify the probability that a sampling unit (SU) is included in the sample
  - Assign during design, use in estimation
- Nonzero probability: every SU has a positive chance of being included in the sample
- Proper survey estimates represent the entire target population (under our ideal setting)

Page 5: Probability sampling relies on random selection methods

- Random sampling is NOT a haphazard method of selection
  - Involves very specific rules that include an element of chance as to which unit is selected
  - Only the outcome of the probability sampling process (i.e., the resulting sample) is random
- More complicated than non-random samples, but provides important advantages
  - Avoids bias that can be induced by the selector
  - Required to calculate valid statistical estimates (e.g., mean) and measures of the quality of the estimates (e.g., standard error of the mean)

Page 6: Representative sample

- Goal is to have a “representative sample”
- Probability sampling is used to achieve this by giving each OU in the target population an explicit chance to be included in the sample
  - Sample reflects variability in the population
  - Applies to the sample, but does not apply to the OU/SU (don’t expect each observation to be a “typical” population unit)
- Can create legitimate sample designs that deliberately skew the sample to include adequate numbers of important parts of the variation
  - Common example: oversampling minorities, women
  - MUST use estimation procedures that take into account the sample design to make inferences about the target population (e.g., sample weights)

Page 7: Basic sampling designs

- Simple selection methods
  - Simple random sampling (Ch 2 & 3)
    - Select the sample using, e.g., a random number table
  - Systematic sampling (2.6, 5.6)
    - Random start, take every k-th SU
  - Probability proportional to size (6.2.3)
    - “Larger” SUs have a higher chance of being included in the sample
- Selection methods with explicit structure
  - Stratified sampling (Ch 4)
    - Divide population into groups (strata)
    - Take a sample in every stratum
  - Cluster sampling (Ch 5 & 6)
    - OUs aggregated into larger units called clusters
    - SU is a cluster

Page 8: Examples

- Select a sample of n faculty from the 1500 UNL faculty on campus
  - Goal: estimate total (or average) number of hours faculty spend per week teaching courses
- Simple random sampling (SRS)
  - Number faculty from 1 to 1500
  - Select a set of n random numbers (integers) between 1 and 1500
  - Faculty with ids that match the random numbers are included in the sample

Page 9: Examples - 2

- Systematic sampling (SYS)
  - Choose a random number between 1 and 1500/n
  - Select the faculty member with that id, and then take every k-th faculty member in the list, where the sampling interval is k = 1500/n
- SRS / SYS
  - Each faculty member has an equal chance of being included in the sample
  - Each sample of n faculty is equally likely (under SRS, every subset of n faculty; under SYS, only the k samples that the random start can produce)
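The systematic selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: n is taken to divide 1500 evenly so that the interval k is an integer, and the function name `systematic_sample`, the choice n = 100, and the seed are hypothetical.

```python
import random

def systematic_sample(N, n, rng):
    """Systematic sample of ids 1..N: random start in 1..k, then every k-th id.

    Assumption for this sketch: n divides N, so k = N / n is an integer.
    """
    if N % n != 0:
        raise ValueError("sketch assumes n divides N evenly")
    k = N // n                      # sampling interval
    start = rng.randint(1, k)       # random start between 1 and k (inclusive)
    return list(range(start, N + 1, k))

rng = random.Random(2024)           # seed chosen only for reproducibility
sys_sample = systematic_sample(1500, 100, rng)
```

With n = 100 the interval is k = 15, so only 15 different samples can occur, one for each possible start.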

Page 10: Examples - 3

- Probability proportional to size (PPS)
  - With a PPS design, we assign a selection probability to each faculty member that is proportional to the number of courses taught by that faculty member that semester
    - “Size” measure = # of courses taught by faculty member
  - Faculty who teach more courses are more likely to be included in the sample, but those who teach fewer still have a positive chance of being included
    - Motivation: faculty who spend more hours on courses are more critical to getting a good estimate of total hours spent
  - Data from faculty with higher inclusion probabilities will be “down weighted” relative to those with lower probabilities during the estimation process
    - Typically accomplished using weights for each observation in the dataset
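The down-weighting idea can be illustrated with a small sketch. The slide does not say how the PPS sample is drawn or which estimator is used, so the code below assumes a with-replacement PPS design and the Hansen-Hurwitz form of the estimated total (each draw weighted by 1 / (n x selection probability)). The four faculty members, their course counts, and their hours are hypothetical.

```python
from fractions import Fraction

# Hypothetical size measure: number of courses taught by four faculty members.
courses = {"A": 1, "B": 2, "C": 3, "D": 4}
total_courses = sum(courses.values())

# Per-draw selection probability proportional to size.
psi = {f: Fraction(c, total_courses) for f, c in courses.items()}

def draw_weight(faculty, n_draws):
    """Weight 1 / (n * psi_i) for one draw in a with-replacement PPS sample."""
    return 1 / (n_draws * psi[faculty])

def hansen_hurwitz_total(drawn, y, n_draws):
    """Estimate the population total as the weighted sum of drawn values."""
    return sum(draw_weight(f, n_draws) * y[f] for f in drawn)

# Hypothetical weekly teaching hours, exactly 3 hours per course.
hours = {f: 3 * c for f, c in courses.items()}
estimate = hansen_hurwitz_total(["D", "B"], hours, n_draws=2)
```

Faculty member D is four times as likely to be drawn as A and receives one quarter of A's weight. Because the hypothetical hours are exactly proportional to the size measure, the weighted estimate equals the true total of 30 hours for any sample drawn.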

Page 11: Examples - 4

- Stratified random sampling (STS)
  - Organize list of faculty by college
    - Stratum = college
  - Allocate n (divide the sample size) among colleges so that we select $n_h$ faculty in the h-th college
    - Sum of $n_h$ over strata equals n
  - Use SRS, e.g., to select the sample in each of the college strata
    - Could use SYS or PPS rather than SRS
    - Could have different selection methods in each stratum
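A minimal sketch of stratified selection with SRS inside each stratum follows. The college names, stratum sizes, and allocation are hypothetical; the slides do not prescribe how n should be allocated.

```python
import random

# Hypothetical frame: faculty ids grouped by college (the stratum).
frame = {
    "Arts": list(range(1, 601)),            # 600 faculty
    "Engineering": list(range(601, 1001)),  # 400 faculty
    "Business": list(range(1001, 1501)),    # 500 faculty
}
# Hypothetical allocation n_h; the stratum sample sizes sum to n = 30.
n_h = {"Arts": 12, "Engineering": 8, "Business": 10}

def stratified_srs(frame, n_h, rng):
    """Draw an independent SRS without replacement inside each stratum."""
    return {h: rng.sample(units, n_h[h]) for h, units in frame.items()}

rng = random.Random(7)                      # seed chosen only for reproducibility
strat_sample = stratified_srs(frame, n_h, rng)
n_total = sum(len(ids) for ids in strat_sample.values())
```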

Page 12: Examples - 5

- Cluster sampling (CS)
  - Aggregate faculty into departments
    - OU = faculty member, SU = dept
  - Select a sample of departments, e.g., using SRS
    - Very common to use PPS for selecting clusters
    - “Size” measure = number of OUs in the cluster SU
- Many variants for cluster sampling
  - After selecting clusters, may want to select a sample of OUs in the cluster rather than taking data on every OU
    - E.g., select 15 depts in the first stage of sampling, then select 10 faculty in each dept in a second stage of sampling
    - This is called 2-stage sampling
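The two-stage example (15 departments, then 10 faculty per department) might be sketched as follows, assuming SRS at both stages. The frame of 50 departments with 30 faculty each is hypothetical and chosen only so that the total matches 1500 faculty.

```python
import random

# Hypothetical frame: 50 departments with 30 faculty each (1500 faculty in total).
departments = {d: [f"dept{d}-fac{j}" for j in range(1, 31)] for d in range(1, 51)}

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: SRS of clusters. Stage 2: SRS of OUs within each selected cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

rng = random.Random(11)                     # seed chosen only for reproducibility
stage2 = two_stage_sample(departments, n_clusters=15, n_per_cluster=10, rng=rng)
```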

Page 13: Examples - 6

- Complex sample designs (Ch 7)
  - Combine basic selection methods (SRS, SYS, PPS) with different methods of organizing the population for sampling (strata, clusters)
- Typically have more than one stage of sampling (multi-stage design)
  - Often cannot create a frame of all OUs in the population
  - Need to select larger units first and then construct a frame
- Stratification and systematic sampling are often used to encourage spread across the population
  - This improves chances of obtaining a representative sample
- Costs are often reduced by selecting clusters of OUs, although cluster sampling may lead to less precision in estimates

Page 14: Notation for target population

- The total number of OUs in the population (also called the universe) is denoted by N
  - Note UPPER CASE
  - Ideally for SRS, the sampling frame is a list of the N OUs in the population
  - EX: there are N = 4 households in our class
- Index set (labels) for all OUs in the population (or universe) is called U
  - U = {1, 2, …, N}
  - A different index set could be our names, or our SSNs
- Each person has a value for the characteristic of interest or random variable Y, the number of people in the household
  - The value of Y for household i is denoted by $y_i$
  - Values in the population are $y_1, y_2, \ldots, y_N$

Page 15: Notation for sample

- Sample size is denoted by n
  - Note lower case
  - n is always less than or equal to N (n = N is a census)
- Index set (labels) for OUs in the sample is denoted by S
  - To select a sample, we are selecting n indices (labels) from the universe U, consisting of N indices for the population
  - U is our sampling frame in this simple setting
  - Labels in S may not be sequential because we are selecting a subset of U

Page 16: Class example

- Suppose n = 2 households are selected from a population of N = 4 households in the class
  - U = {1, 2, 3, 4}
- Randomly select sample using SRS and get 2 and 3
  - S = {2, 3}
- The data collected on OUs in the sample are values for Y = number of people in the household
  - Data:

Page 17: Summary of probability sampling framework

- Assumptions (for now)
  - Observation unit = sampling unit
  - Target population = sampling universe = sampling frame
  - N = finite number of OUs in the population
  - U = {1, 2, …, N} is the index set for the OUs in the population
- Sample
  - n = sample size (n is less than or equal to N)
  - S = index set for the n elements selected from the population of N units (S is a subset of U)

Page 18: Conceptual basis for probability sampling

- Conceptual framework for selecting samples
  - Enumerate all possible samples of size n from the population of size N
  - Each sample has a known probability of being selected
    - P(S) = probability of selecting sample S
  - Use this probability scheme to randomly choose the sample
- Using the probability scheme for the samples, can determine the inclusion probability for each SU
  - $\pi_i$ = probability that a sample is selected that includes unit i

Page 19: Simple example

- Population of 4 students in study group, take a random sample of 2 students
- Setting
  - U = {1, 2, 3, 4}
  - N = 4
  - n = 2
- All possible samples of size n = 2 from N = 4 elements
  - Note: n < N and $S \subset U$

Page 20: Simple example - 2

- All possible samples

  S1 = {1, 2}    S3 = {1, 4}    S5 = {2, 4}
  S2 = {1, 3}    S4 = {2, 3}    S6 = {3, 4}

- Design is determined by assigning a selection probability to each possible sample

  P(S1) = 1/3    P(S3) = 1/2    P(S5) = 0
  P(S2) = 1/6    P(S4) = 0      P(S6) = 0

Page 21: Simple example - 3

- Inclusion probability definition?
- What is the probability that student 1 is included in the sample?
  - $\pi_1$ =
- Inclusion probability for student 2, 3, 4?
  - $\pi_2$ =
  - $\pi_3$ =
  - $\pi_4$ =
- Is this a probability sample?
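One way to work the exercise is to add up P(S) over every sample that contains a given student. The sketch below does this for the six samples and selection probabilities listed in the simple example; the function name `inclusion_probabilities` is hypothetical.

```python
from fractions import Fraction

# Design from the simple example: each possible sample and its probability P(S).
design = {
    (1, 2): Fraction(1, 3),
    (1, 3): Fraction(1, 6),
    (1, 4): Fraction(1, 2),
    (2, 3): Fraction(0),
    (2, 4): Fraction(0),
    (3, 4): Fraction(0),
}

def inclusion_probabilities(design, units):
    """pi_i = sum of P(S) over all samples S that contain unit i."""
    return {i: sum(p for s, p in design.items() if i in s) for i in units}

pi = inclusion_probabilities(design, units=[1, 2, 3, 4])
for i, p in pi.items():
    print(f"pi_{i} = {p}")
```

Under this design student 1 is always in the sample, and students 2, 3, and 4 have inclusion probabilities 1/3, 1/6, and 1/2. Every inclusion probability is known and nonzero, which is the condition in the definition of a probability sample, even though three of the six samples can never be drawn.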

Page 22: Population distribution

- Response variables represent values associated with a characteristic of interest for the i-th OU
  - Y is the random variable for the characteristic of interest (CAP Y)
  - $y_i$ = value of characteristic for OU i (small y)
- The population distribution is the distribution of Y for the target population
  - Y is a discrete random variable with a finite number of possible values (<= N values)
  - Use a discrete probability distribution to represent the distribution of Y

Page 23: Population distribution - 2

- A discrete probability distribution is denoted by a series of pairs corresponding to
  - Value of the random variable Y, denoted by y
  - Relative frequency of the value y for the random variable Y in the population, denoted by P(Y = y)
  - Pair is { y , P(Y = y) }
- Constructing a probability distribution
  - List all unique values y of random variable Y
  - Record the relative frequency of y in the population, P(Y = y)
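The two construction steps can be sketched directly: count each unique value, then divide by N. The population values below are hypothetical household sizes for N = 4 units, and the function name is an assumption made for illustration.

```python
from collections import Counter
from fractions import Fraction

def population_distribution(values):
    """Return {y: P(Y = y)} as relative frequencies over the N population values."""
    N = len(values)
    counts = Counter(values)
    return {y: Fraction(c, N) for y, c in sorted(counts.items())}

# Hypothetical household sizes for a population of N = 4 units.
y = [3, 5, 1, 3]
dist = population_distribution(y)
```

Here the pairs are {1, 1/4}, {3, 1/2}, and {5, 1/4}, and the relative frequencies sum to 1.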

Page 24: Class example - 2

- Back to # of people in household for each class member
  - What are the unique values in the pop?
  - What is the frequency of each value?
  - What is the relative frequency of each value?
  - Construct a histogram depicting the variation in values

Page 25: Summarizing the population distribution

- Use population parameters to summarize population distribution
  - Mean or expected value of y (parameter: $\bar{y}_U$)
  - Proportion of population having a particular characteristic = mean of a binary (0, 1) variable (parameter: p)
  - For finite populations, population total of y is often of interest (parameter: t)
  - Variance of y (parameter: $S^2$)

Page 26: Mean of Y for population

- Expected value, or population mean, of Y

  $$\bar{y}_U = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{t}{N}$$

  - Mean is in y-units per OU-unit
  - Measure of central tendency (middle of distn)
  - Related to population total (t) and proportion (p)
- Examples
  - Average number of miles driven per week by adults in US
  - Average number of phone lines per household

Page 27: Class example - 3

- What is the mean household size for people in this classroom?

Page 28: Total of Y in population

- Population total of Y

  $$t = \sum_{i=1}^{N} y_i = N\bar{y}_U$$

  - Total number of y-units in the population
- Examples
  - Number of households in market area with DSL
    - $y_i = 1$ if household i has DSL, $y_i = 0$ if not
    - N = number of households in market area
  - Number of deer in Iowa
    - $y_i$ = number of deer observed in area i
    - N = number of observation areas in Iowa

Page 29: Class example - 4

- What is the total number of people living in households of people in the classroom?

Page 30: Proportion

- Proportion (p) of population having a particular characteristic
  - Mean of a binary variable

  $$p = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{t}{N}, \qquad y_i = \begin{cases} 1, & \text{if OU } i \text{ has the characteristic} \\ 0, & \text{if OU } i \text{ doesn't have the characteristic} \end{cases}$$

Page 31: Class example - 5

- What proportion of people in the classroom have a cell phone?

Page 32: Population variance of Y

- Population variance of Y

  $$V[Y] = S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y}_U)^2$$

  - Measure of spread or variability in the population’s response values
  - Analogous to $\sigma^2$ in other stat classes
  - Not the standard error of an estimate
  - Note this is CAP $S^2$

Page 33: Coefficient of variation for Y

- Variation relative to mean (unitless)

  $$CV = \frac{S}{\bar{y}_U}$$
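As a worked illustration of the parameters defined so far, the sketch below computes the total, mean, variance, standard deviation, and CV for a small population. The values 3, 5, 1, 3 are the illustrative N = 4 population used in a later slide; `statistics.variance` is used because it divides by one less than the number of values, which matches the N - 1 divisor in $S^2$ when it is applied to the whole population.

```python
import statistics

y = [3, 5, 1, 3]                    # illustrative population values, N = 4
N = len(y)

t = sum(y)                          # population total
y_bar_U = t / N                     # population mean
S2 = statistics.variance(y)         # divisor N - 1, matching the definition of S^2
S = S2 ** 0.5                       # population standard deviation
CV = S / y_bar_U                    # coefficient of variation (unitless)
```

For these values t = 12, the mean is 3, $S^2$ = 8/3, and the CV is roughly 0.54.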

Page 34: Class example - 6

- What is the population variance for number of people in households of people in the classroom?
- What is the CV?

Page 35: Summary of population distribution of Y

- Basic pop unit: OU (i)
- Number of units or size of pop: N
- Random variable: Y
- Parameters: characterize the target population
  - Mean $\bar{y}_U$
  - Total t
  - Proportion (mean) p
  - Variance $S^2$
  - Coefficient of variation CV = $S / \bar{y}_U$
- STATIC: it is the object of inference and never changes with design or estimator

Page 36: What’s next

- Population distribution of Y is the object of inference
- Use SRS to select a sample and estimate the parameters of the population distribution
  - How to select a sample
  - Estimators for population parameters of Y under SRS
    - Sample mean estimates population mean
    - N x sample mean estimates population total
    - Sample variance estimates population variance
- Assessing the quality of an estimator of a population parameter under SRS
  - Sampling distribution
  - Bias, standard error, confidence intervals for the estimator

Page 37: Simple random sample (SRS)

- DEFN: A SRS is a sample in which every possible subset of n SUs has an equal chance of being selected as the sample
- Every sampling unit has an equal chance of being included in the sample
  - Example of an “equal probability” sample
- The converse does not hold: a sample in which each SU has the same inclusion probability is not necessarily a SRS
  - Other non-SRS designs can generate equal probability samples

Page 38: Simple random sampling (SRS)

- Two types
  - SRSWR (SRS with replacement)
    - Return SU after each step in the selection process
  - SRSWOR (SRS without replacement)
    - Do not return SU after it has been selected
- Selection probability
  - Probability that a unit is selected in a single draw
    - Constant throughout SRSWR process
    - Changes with each draw in the SRSWOR process
  - NOT an inclusion probability, which considers the probability of drawing a sample that includes unit i

Page 39: SRSWR (SRS with replacement)

- Selection procedure
  - Select one OU with probability 1/N from N OUs
    - This is the selection probability for each draw
  - Return the selected OU to the universe
  - Repeat n times
- Procedure is like drawing n independent samples of size 1
- Can draw a sampling unit twice – duplicate units
- Unappealing for finite populations – no additional info in having a duplicate unit
- Useful in theoretical development for large populations

Page 40: Focus: SRSWOR (SRS without replacement)

- Selection procedure
  - Select one OU from the universe of size N with probability 1/N
    - DON’T return selected unit to universe
  - Select 2nd OU from remaining units in universe with probability 1/(N - 1)
    - DON’T return selected unit to universe
  - Repeat until n sampling units have been selected
- Selection probabilities change with each draw
  - 1/N, then 1/(N - 1), then 1/(N - 2), …, 1/(N - n + 1)
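The draw-by-draw procedure can be sketched as below. The function name `srswor_draws` and the seed are hypothetical; the code also records the selection probability in force at each draw so the 1/N, 1/(N - 1), … pattern is visible.

```python
import random
from fractions import Fraction

def srswor_draws(N, n, rng):
    """Draw n units one at a time without replacement from ids 1..N.

    Also records the selection probability that applied at each draw.
    """
    remaining = list(range(1, N + 1))
    sample, draw_probs = [], []
    for _ in range(n):
        draw_probs.append(Fraction(1, len(remaining)))   # 1/N, 1/(N-1), ...
        pick = rng.randrange(len(remaining))             # each remaining unit equally likely
        sample.append(remaining.pop(pick))               # do NOT return the unit
    return sample, draw_probs

rng = random.Random(3)                                   # seed only for reproducibility
sample, draw_probs = srswor_draws(N=4, n=2, rng=rng)
```

In practice Python's `random.sample` performs the same kind of selection in one call; the explicit loop is shown only to mirror the steps on the slide.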

Page 41: SRSWOR (SRS without replacement)

- Probability of selecting a sampling unit in a single draw depends on the number of SUs already selected (conditional probability)
  - On the c-th step of the process, c - 1 SUs have already been selected for a sample of size n
  - Probability of selecting any of the remaining N - c + 1 SUs in the next draw is

    $$\frac{1}{N - c + 1}$$

- Inclusion probability for SU i (unconditional probability) (see p. 44 in text)

  $$\pi_i = \frac{n}{N}$$

Page 42: SRSWOR (SRS without replacement)

- Number of possible SRSWOR samples of size n from universe of size N

  $$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}, \quad \text{where } x! = x(x-1)(x-2)\cdots 2 \cdot 1$$

- Probability of selecting a sample S

  $$P(S) = \frac{1}{\binom{N}{n}}$$

  (Probability is the same for all samples)
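A quick numerical check of these formulas for the N = 4, n = 2 example, using only the Python standard library. It also confirms, by enumeration, that summing P(S) over the samples containing a unit gives the inclusion probability n/N.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, n = 4, 2
num_samples = comb(N, n)                     # N! / (n! (N - n)!)
p_sample = Fraction(1, num_samples)          # same P(S) for every sample under SRSWOR

# Inclusion probability of unit 1: add P(S) over the samples that contain it.
all_samples = list(combinations(range(1, N + 1), n))
pi_1 = sum(p_sample for s in all_samples if 1 in s)
```

There are 6 possible samples, each with probability 1/6, and unit 1 appears in 3 of them, so its inclusion probability is 3/6 = n/N.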

Page 43: Selecting a SRS using SRSWOR

- Create a sampling frame
  - List of sampling units in the universe or population
  - Assigns an index to each sampling unit
- Determine a selection procedure that performs SRSWOR
  - Procedure must generate n unique sampling units such that each SU has an equal chance of being included in the sample
  - Random number generator or table is a common basis
  - Need rules to identify when the selected unit is included in the sample or tossed
- Select random numbers and determine sampled units

Page 44: Using random numbers to select a SRSWOR sample

- Determine a rule to assign random numbers to the sampling universe index set U
  - Rule must give each unit an equal chance of being included in the sample
- Select the set of random numbers, e.g., using computer or printed random number table
- Apply the rule to each random number to determine the sampled OU
  - Check to see if this OU has already been selected
  - If already selected, ignore it
- Keep going until you have n SUs in the sample

Page 45: Census of Agriculture example

- Select 300 counties from 3078 counties in the US
  - N = 3078
  - n = 300
  - Sampling frame = ?
- Generate random numbers between 0 and 1 on the computer
  - Need n or more random numbers depending on rule
- Multiply each random number by N = 3078 and round up to the nearest integer
  - Random number = .61663
  - Multiply random # by N = 3078 x .61663 = 1897.98714
  - Round up to 1898
  - Take 1898th county in the frame
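The multiply-and-round-up rule, combined with the "ignore repeats" rule from the previous slide, might look like this in Python. The function name `srswor_by_rule` and the seed are hypothetical. One detail is an assumption of this sketch: a generated value of exactly 0 would map to index 0, which is not in the frame, so it is discarded.

```python
import math
import random

def srswor_by_rule(N, n, rng):
    """Select n distinct frame positions using index = ceil(u * N), u uniform on [0, 1)."""
    selected, seen = [], set()
    while len(selected) < n:
        u = rng.random()
        idx = math.ceil(u * N)          # round up to the nearest integer
        if idx == 0 or idx in seen:     # u == 0 is unusable; repeats are ignored
            continue
        seen.add(idx)
        selected.append(idx)
    return selected

# The worked value from the slide: .61663 x 3078 = 1897.98714, rounded up to 1898.
worked_index = math.ceil(0.61663 * 3078)

rng = random.Random(2078)               # seed chosen only for reproducibility
counties = srswor_by_rule(N=3078, n=300, rng=rng)
```

Because repeats are tossed, the loop usually consumes somewhat more than n random numbers, which is why the slide says "n or more".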

Page 46: Estimating population mean under SRS

- Target population mean

  $$\bar{y}_U = \frac{1}{N}\sum_{i=1}^{N} y_i$$

- Estimator of $\bar{y}_U$ for SRS sample of size n is the sample mean

  $$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

- Note
  - “Estimator” refers to the formula
  - “Estimate” refers to the value obtained from using the formula with data

Page 47: Class example - 7

- Estimate the average household size for our classroom

Page 48: Estimating population total

- Target population total

  $$t = \sum_{i=1}^{N} y_i = N\bar{y}_U$$

- Estimator of t for SRS sample of size n

  $$\hat{t} = N\bar{y} = \frac{N}{n}\sum_{i=1}^{n} y_i$$

Page 49: Class example - 8

- Estimate the total number of people living in the households of people in this classroom

Page 50: Estimating population proportion

- Target population proportion

  $$p = \bar{y}_U = \frac{1}{N}\sum_{i=1}^{N} y_i$$

  - Y takes on values 0 or 1, where 1 means the unit has the characteristic of interest
- Estimator of p for SRS sample of size n

  $$\hat{p} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Page 51: Class example - 9

- Estimate the proportion of people with cell phones in this classroom

Page 52: Estimating population variance

- Target population variance

  $$V[Y] = S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y}_U)^2$$

- Estimator of $S^2$ for SRS sample of size n is the sample variance (note lower case s)

  $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$$
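The SRS estimators of the mean, total, proportion, and variance can be computed together from one sample. The sketch below uses hypothetical sample data (household sizes 5 and 1 for the n = 2 sampled units out of N = 4), and the 0/1 indicator used for the proportion is also hypothetical.

```python
import statistics

N = 4                          # population size from the class example
sample_y = [5, 1]              # hypothetical data for the sampled units, n = 2
n = len(sample_y)

y_bar = sum(sample_y) / n                  # estimates the population mean
t_hat = N * y_bar                          # estimates the population total
s2 = statistics.variance(sample_y)         # sample variance, divisor n - 1

# Proportion: mean of a 0/1 indicator (hypothetical: household has 4+ people).
indicator = [1 if v >= 4 else 0 for v in sample_y]
p_hat = sum(indicator) / n
```

For this sample the estimates are a mean of 3, a total of 12, a variance of 8, and a proportion of 0.5. A different sample would generally give different estimates.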

Page 53: Class example - 10

- Estimate the variance of number of people in households of people in this classroom

Page 54: Estimating population standard deviation and CV

- Standard deviation of Y, S?
- Estimator of standard deviation of Y?
- CV of population distribution?
- Estimator of CV?

Page 55: What would happen if we took another sample?

- S =
- Data =
- Estimates
  - Mean
  - Total
  - Proportion
  - Standard deviation
  - CV

Page 56: Sampling distribution

- Need to assess the quality of our estimates
  - Is $\bar{y}$ a good estimator of $\bar{y}_U$?
  - Is $\hat{p}$ a good estimator of p?
  - Is $s^2$ a good estimator of $S^2$?
- Use the sampling distribution to assess the quality of the estimator
  - Distribution of the estimator over all possible samples
  - EX: distribution of $\bar{y}$ over all possible SRS samples of size n from a population of size N

Page 57: Sampling distribution

- Simulation

Page 58: Measures of quality

- Denote
  - Population parameter as $\theta$ [think pop mean $\bar{y}_U$]
  - Estimator of $\theta$ as $\hat{\theta}$ [think sample mean $\bar{y}$]
- Mean of the sampling distribution is the expected value of the estimator, $E\{\hat{\theta}\}$
  - An estimator is unbiased if $E\{\hat{\theta}\} = \theta$
- Variance of the sampling distribution, $V\{\hat{\theta}\}$
  - Precision: want variance of estimator to be small
- Coefficient of variation

  $$CV\{\hat{\theta}\} = \frac{\sqrt{V\{\hat{\theta}\}}}{E\{\hat{\theta}\}}$$

  - Relative precision: want CV to be small

Page 59: Sampling distribution of estimator

- Basic pop unit: sample selected using a specific design, S
- Number of units or size of pop: number of possible samples
  - Need probability of selecting sample!
- Random variable: estimator of parameter, $\hat{\theta}$
- Parameters: characterize the quality of the estimator
  - Mean $E\{\hat{\theta}\}$ (assesses bias of the estimator)
  - Variance, SE, CV (assesses precision of estimator)
- DEPENDS on population parameter, estimator of population parameter, sample design

Page 60: Population distribution vs. sampling distribution

Population distribution
- Basic unit: OU (i)
- Total number of units: N
- Random variable: characteristic of interest, Y
- Parameters: characterize the target population
  - Mean $\bar{y}_U$, proportion p (central tendency)
  - Total t
  - Variance $S^2$, std dev S, CV (spread of distn)
- STATIC: once you identify Y, the pop distribution is the object of inference and never changes with design or estimator

Sampling distribution
- Basic unit: sample selected using a specific design, S
- Total number of units: number of possible samples
- Random variable: estimator of parameter, $\hat{\theta}$
- Parameters: characterize the quality of the estimator
  - Mean $E\{\hat{\theta}\}$ (used to assess bias of the estimator)
  - Variance $V\{\hat{\theta}\}$, SE, CV (precision of estimator)
- DEPENDS on population parameter, estimator of population parameter, sample design

Page 61: Conceptual framework for a sampling distribution - 1

- List out all possible samples of size n from the population of size N
  - A sample is the BASIC UNIT for the population of all possible samples
- We determine the probability of selecting the sample
  - Unequal probability sample (now)
  - Simple random sample
- NOTE: sampling distribution depends on the design selected

62

Simple example from earlier lecture (not SRS!)
All possible samples

S1 = {1, 2} S3 = {1, 4} S5 = {2, 4}

S2 = {1, 3} S4 = {2, 3} S6 = {3, 4}

Design is determined by assigning a selection probability to each possible sample

P(S1) = 1/3 P(S3) = 1/2 P(S5) = 0

P(S2) = 1/6 P(S4) = 0 P(S6) = 0

63

Conceptual framework for a sampling distribution - 2
List out all possible samples
Using the n data values associated with each sample, calculate the value of the estimator for each sample
  The estimator is the random variable of our distribution
  Example: sample mean $\bar{y}$ is calculated for each of the possible samples
NOTE: the sampling distribution depends on the estimator selected

64

Simple example from earlier lecture - 2
Population values for Y

i       1   2   3   4
$y_i$   3   5   1   3

All possible samples of size n = 2
  S1 = {1, 2}, S2 = {1, 3}, S3 = {1, 4}, S4 = {2, 3}, S5 = {2, 4}, S6 = {3, 4}
Values of $\bar{y}$ corresponding to each sample
  $\bar{y}_1 = (3+5)/2 = 4.0$    $\bar{y}_4 = (5+1)/2 = 3.0$
  $\bar{y}_2 = (3+1)/2 = 2.0$    $\bar{y}_5 = (5+3)/2 = 4.0$
  $\bar{y}_3 = (3+3)/2 = 3.0$    $\bar{y}_6 = (1+3)/2 = 2.0$

65

Conceptual framework for a sampling distribution - 3
List out all possible samples
Using the data values, calculate the estimator for each sample
Sampling distribution is described by pairs of values: the value of the estimator from the sample and the relative frequency of obtaining that value
  We are using the steps we used before for creating a discrete distribution

66

Representing the sampling distribution
Probability distribution: pairs of $\{c,\ P(\bar{y} = c)\}$
  $\bar{y}$ is a random variable, c is a value of $\bar{y}$
  $P(\bar{y} = c) = \sum_{S:\ \bar{y}_S = c} P(S)$, where "$S: \bar{y}_S = c$" means "all samples S such that $\bar{y}_S = c$"

67

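Simple example from previous lecture - 3
Number of possible samples
  $\binom{N}{n} = \binom{4}{2} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{(2 \cdot 1)(2 \cdot 1)} = \frac{24}{4} = 6$
Probability of selecting sample
  $P(S_1) = 1/3,\ \bar{y}_1 = 4.0$    $P(S_4) = 0,\ \bar{y}_4 = 3.0$
  $P(S_2) = 1/6,\ \bar{y}_2 = 2.0$    $P(S_5) = 0,\ \bar{y}_5 = 4.0$
  $P(S_3) = 1/2,\ \bar{y}_3 = 3.0$    $P(S_6) = 0,\ \bar{y}_6 = 2.0$
Probability distribution: unique values of $\bar{y}$ and relative frequency

c                  2.0   3.0   4.0
$P(\bar{y} = c)$   1/6   1/2   1/3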

68

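Conceptual framework for a sampling distribution - 4
List out all possible samples
Using the data values, calculate the estimator for each sample
Sampling distribution: pairs of estimator values and relative frequencies
Parameters summarize sampling distribution
  Mean of sampling distribution
  Variance, std dev (SE) of sampling distribution
  CV of sampling distribution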

69

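Ex: mean and variance of sampling distribution for $\bar{y}$ - 4
Mean of sampling distribution
  Same concept of expected value used with population distribution
  $E\{\bar{y}\} = \sum_c c\,P(\bar{y} = c) = (2.0)\tfrac{1}{6} + (3.0)\tfrac{1}{2} + (4.0)\tfrac{1}{3} = \tfrac{2}{6} + \tfrac{9}{6} + \tfrac{8}{6} = \tfrac{19}{6} = 3.1\overline{6} \approx 3.17$
Variance of sampling distribution
  Use more general formula for variance
  Later, we’ll use reductions that are easier to calculate
  $V\{\bar{y}\} = E\{(\bar{y} - E\{\bar{y}\})^2\} = \sum_c (c - E\{\bar{y}\})^2\,P(\bar{y} = c)$
  $= (2.0 - 3.1\overline{6})^2\tfrac{1}{6} + (3.0 - 3.1\overline{6})^2\tfrac{1}{2} + (4.0 - 3.1\overline{6})^2\tfrac{1}{3} = 0.47222$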
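The four framework steps (list the samples, attach selection probabilities, compute the estimator, summarize) can be sketched in a few lines of Python. This is only an illustrative sketch: the names `design`, `sampling_distribution`, `mean`, and `var` are made up here, and exact fractions are used so the results can be compared with the hand calculation.

```python
from fractions import Fraction as F

# Population values y_i for units i = 1..4 (from the example).
y = {1: 3, 2: 5, 3: 1, 4: 3}

# Unequal-probability design: P(S) for each possible sample of size 2.
design = {
    (1, 2): F(1, 3), (1, 3): F(1, 6), (1, 4): F(1, 2),
    (2, 3): F(0), (2, 4): F(0), (3, 4): F(0),
}

def sampling_distribution(design, y):
    """Return {c: P(ybar = c)} by summing P(S) over samples with ybar_S = c."""
    dist = {}
    for sample, prob in design.items():
        ybar = F(sum(y[i] for i in sample), len(sample))
        dist[ybar] = dist.get(ybar, F(0)) + prob
    return dist

dist = sampling_distribution(design, y)
mean = sum(c * p for c, p in dist.items())               # E{ybar}
var = sum((c - mean) ** 2 * p for c, p in dist.items())  # V{ybar}

print({float(c): str(p) for c, p in dist.items()})
print(float(mean), float(var))
```

Changing the probabilities in `design` changes the sampling distribution even though the population values stay fixed, which is the point of the NOTE about design dependence.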

70

What if we took a SRS of size n from N units?
List out all possible samples
  # possible samples: $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$
Determine the probability of a sample
  $P(S) = 1 \big/ \binom{N}{n}$, constant for all samples
Calculate estimator for each sample
  Examples: $\bar{y}$ or $\hat{t}$ or $\hat{p}$
Create a discrete probability distribution
Calculate summary parameters
  For $\bar{y}$: $E\{\bar{y}\}$ and $V\{\bar{y}\}$
  For $\hat{t}$: $E\{\hat{t}\}$ and $V\{\hat{t}\}$

71

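Back to example with SRS
Number of possible samples
  $\binom{N}{n} = \binom{4}{2} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{(2 \cdot 1)(2 \cdot 1)} = \frac{24}{4} = 6$
Probability of selecting sample
  $P(S_1) = 1/6,\ \bar{y}_1 = 4.0$    $P(S_4) = 1/6,\ \bar{y}_4 = 3.0$
  $P(S_2) = 1/6,\ \bar{y}_2 = 2.0$    $P(S_5) = 1/6,\ \bar{y}_5 = 4.0$
  $P(S_3) = 1/6,\ \bar{y}_3 = 3.0$    $P(S_6) = 1/6,\ \bar{y}_6 = 2.0$
Probability distribution: unique values of $\bar{y}$ and relative frequency

c                  2.0   3.0   4.0
$P(\bar{y} = c)$   1/3   1/3   1/3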

72

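Example: mean of sampling distribution for $\bar{y}$ under SRS
Mean of sampling distribution
  $E\{\bar{y}\} = \sum_c c\,P(\bar{y} = c) = (2.0)\tfrac{1}{3} + (3.0)\tfrac{1}{3} + (4.0)\tfrac{1}{3} = \tfrac{9}{3} = 3.0$
Mean of population distribution
  $\bar{y}_U = \frac{1}{N}\sum_{i=1}^{N} y_i = \tfrac{1}{4}(3 + 5 + 1 + 3) = \tfrac{12}{4} = 3.0$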
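For SRS the list of samples and their probabilities can be generated rather than typed in. The sketch below (variable names such as `srs_dist` and `e_ybar` are illustrative, not from the slides) enumerates every sample of size 2 with `itertools.combinations` and compares the mean of the sampling distribution with the population mean.

```python
from fractions import Fraction as F
from itertools import combinations

y = {1: 3, 2: 5, 3: 1, 4: 3}   # population values from the example
N, n = len(y), 2

samples = list(combinations(sorted(y), n))   # all C(N, n) possible samples
p_sample = F(1, len(samples))                # SRS: every sample equally likely

# Sampling distribution of the sample mean: {c: P(ybar = c)}
srs_dist = {}
for s in samples:
    ybar = F(sum(y[i] for i in s), n)
    srs_dist[ybar] = srs_dist.get(ybar, F(0)) + p_sample

e_ybar = sum(c * p for c, p in srs_dist.items())   # mean of sampling distribution
ybar_U = F(sum(y.values()), N)                      # population mean

print(len(samples), e_ybar, ybar_U)
```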

73

Bias of an estimator
Estimation bias of $\hat{\theta}$
  $\mathrm{Bias}[\hat{\theta}] = E\{\hat{\theta}\} - \theta$
  Note that this is the mean of the estimator (from sampling distribution) minus the population parameter (from population distribution)
If $\mathrm{Bias}[\hat{\theta}] = 0$, then $\hat{\theta}$ is said to be an unbiased estimator of $\theta$

74

Variance of sample mean under SRS
Don’t have to use the general formula
Variance of sample mean (derived stat using theory)
  $V[\bar{y}] = \left(1 - \frac{n}{N}\right)\frac{S^2}{n}$, where $S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{y}_U)^2$ is the population variance
  Similar to infinite population formula $\frac{\sigma^2}{n}$
  Has an extra factor called the finite population correction factor (FPC)

75

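Example
Variance of sampling distribution for $\bar{y}$
  $S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{y}_U)^2 = \frac{1}{4-1}\left[(1-3)^2 + 2(3-3)^2 + (5-3)^2\right] = \frac{8}{3}$
  $V\{\bar{y}\} = \left(1 - \frac{n}{N}\right)\frac{S^2}{n} = \left(1 - \frac{2}{4}\right)\frac{8/3}{2} = \frac{2}{3} = 0.6667$
Other measures of dispersion for sampling distribution
  $SE\{\bar{y}\} = \sqrt{V\{\bar{y}\}} = \sqrt{0.6667} = 0.8165$
  $CV\{\bar{y}\} = \frac{SE\{\bar{y}\}}{E\{\bar{y}\}} = \frac{0.8165}{3.0} = 0.2722$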
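One way to check the variance calculation is to compute it twice: once from the FPC formula and once by brute force over all equally likely SRS samples. The following is a minimal sketch with illustrative names (`v_formula`, `v_enum`); the two routes should agree exactly for the four-unit population.

```python
from fractions import Fraction as F
from itertools import combinations
from math import sqrt

y = [3, 5, 1, 3]                      # population values from the example
N, n = len(y), 2
ybar_U = F(sum(y), N)                 # population mean
S2 = sum((yi - ybar_U) ** 2 for yi in y) / (N - 1)   # population variance S^2

v_formula = (1 - F(n, N)) * S2 / n    # (1 - n/N) * S^2 / n

# Brute force: variance of ybar over all C(N, n) equally likely samples.
means = [F(sum(s), n) for s in combinations(y, n)]
v_enum = sum((m - ybar_U) ** 2 for m in means) / len(means)

se = sqrt(float(v_formula))           # SE{ybar}
cv = se / float(ybar_U)               # CV{ybar} = SE{ybar} / E{ybar}
print(v_formula, v_enum, round(se, 4), round(cv, 4))
```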

76

Finite population correction factor (FPC)
  $V[\bar{y}] = \left(1 - \frac{n}{N}\right)\frac{S^2}{n}$, with $\mathrm{FPC} = 1 - \frac{n}{N}$
Sampling fraction is the proportion of the population sampled, or n/N
Larger sample
  Larger fraction of population
  Smaller FPC
  Smaller variance of sample mean

77

Impact of FPC on estimated variance of parameter estimate
Often FPC is very close to 1
  Sample of 3000 households from total of 1,200,000 households
  Sampling fraction $= \frac{n}{N} = \frac{3000}{1{,}200{,}000} = 0.0025$
  $\mathrm{FPC} = 1 - \frac{n}{N} = 1 - 0.0025 = 0.9975$
In cases where sampling fraction is very small and FPC is very close to 1, FPC has no practical effect on the SE or estimated variance of the param estimate
Sampling fraction n/N is not a good measure of whether your estimate will be precise
The sample size n is the most important part of the variance or SE formulas given variance $s^2$
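To see how little the FPC matters in the household example, the sketch below computes the sampling fraction, the FPC, and the factor by which the FPC shrinks the SE (the square root of the FPC). Variable names are illustrative.

```python
from math import sqrt

n, N = 3000, 1_200_000      # sample and population sizes from the example

sampling_fraction = n / N   # proportion of the population sampled
fpc = 1 - sampling_fraction # finite population correction
se_shrink = sqrt(fpc)       # SE with FPC divided by SE without FPC

print(sampling_fraction, fpc, round(se_shrink, 5))
```

The SE changes by roughly a tenth of a percent here, which is why the sample size n, not n/N, drives precision.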

78

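Estimating population variance under SRS
Do not know variance of population distribution, $S^2$
Unbiased estimator for $S^2$ (sum over the units i in the sample $\mathcal{S}$)
  $s^2 = \frac{1}{n-1}\sum_{i \in \mathcal{S}}(y_i - \bar{y})^2$
Estimator for $V[\bar{y}]$
  $\hat{V}[\bar{y}] = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$
Note that $\widehat{SE}(\bar{y}) = \sqrt{\hat{V}[\bar{y}]}$ is the standard error of the sample mean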

79

Ag example
Interested in average number of acres per county devoted to farms
Sample 300 counties from list of 3078
Collect data and get following summary statistics
  $\bar{y} = 297{,}897$ farm acres per county in 1992
  $s^2 = 344{,}551.9^2$
What are estimated mean and standard error?
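A short sketch of the calculation for this example follows. It assumes the summary value 344,551.9 is the sample standard deviation s (so the sample variance is its square); the variable names are illustrative.

```python
from math import sqrt

N, n = 3078, 300          # counties in the frame, counties sampled
ybar = 297_897.0          # sample mean, acres per county
s = 344_551.9             # assumed: sample standard deviation

v_hat = (1 - n / N) * s**2 / n   # estimated variance of ybar, with FPC
se_hat = sqrt(v_hat)             # estimated standard error of ybar

print(f"estimated mean = {ybar:,.0f}, estimated SE = {se_hat:,.1f}")
```

Here the sampling fraction is close to 10%, so the FPC is not negligible: dropping it would overstate the SE by about 5%.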

80

Rounding rules
Always keep all of the digits while you are doing calculations
Round only when you get ready to report the result at the end of the calculation …
  Round the estimated SE to 2 significant digits
    107,789 is rounded to 110,000
    0.0325329 is rounded to 0.033
  Round estimate to precision of the SE
    If SE is 110,000, round estimate to nearest 10,000 (xx0,000)
    If SE is 0.033, round estimate to nearest 1/1000 (x.xxx)
  Estimated variances are usually reported to 5 significant digits
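These rules can be expressed as a small helper. The functions `se_digits` and `report` below are hypothetical names for this sketch, not part of any library; they use Python's built-in `round`, whose `ndigits` argument may be negative to round to tens, hundreds, and so on.

```python
from math import floor, log10

def se_digits(se, sig=2):
    """ndigits for round() that keeps `sig` significant digits of a nonzero se."""
    return sig - 1 - floor(log10(abs(se)))

def report(estimate, se, sig=2):
    """Round se to `sig` significant digits and the estimate to the same place."""
    nd = se_digits(se, sig)
    return round(estimate, nd), round(se, nd)

print(report(1_234_567, 107_789))    # SE to nearest 10,000
print(report(0.512345, 0.0325329))   # SE to nearest 1/1000
```

Rounding is applied only at reporting time; intermediate calculations should keep full precision.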

81

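Sampling distribution for $\bar{y}$ using SRS of size n from N
$\bar{y}$ is an unbiased estimator of $\bar{y}_U$
  $E\{\bar{y}\} = \bar{y}_U$
  Mean of sampling distribution is always equal to population mean under SRS
Variance of $\bar{y}$ is
  $V[\bar{y}] = \left(1 - \frac{n}{N}\right)\frac{S^2}{n}$
Estimate the variance of $\bar{y}$ using sample variance $s^2$
  $\hat{V}[\bar{y}] = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$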

82

Sampling distribution of $\hat{t}$ under SRS
Mean of $\hat{t}$ for population total t under SRS
  $E\{\hat{t}\} = E\{N\bar{y}\} = N\,E\{\bar{y}\} = N\bar{y}_U = t$
Expectation of a linear function of a random variable
  If a, b are constants & Y, $\hat{\theta}$ are random variables, then
  $E\{aY + b\} = a\,E\{Y\} + b$
  $E\{a\hat{\theta} + b\} = a\,E\{\hat{\theta}\} + b$
Is $\hat{t}$ an unbiased estimator of t?

83

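Sampling distribution of $\hat{t}$ under SRS - 2
Variance of estimator of total under SRS
  $V[\hat{t}] = V[N\bar{y}] = N^2\,V[\bar{y}] = N^2\left(1 - \frac{n}{N}\right)\frac{S^2}{n}$
Variance of a linear function of a random variable
  If a, b are constants & Y, $\hat{\theta}$ are random variables, then
  $V\{aY + b\} = a^2\,V\{Y\}$
  $V\{a\hat{\theta} + b\} = a^2\,V\{\hat{\theta}\}$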

84

Sampling distribution of $\hat{t}$ under SRS - 3
Estimator for variance of $\hat{t}$ under SRS
  $\hat{V}[\hat{t}] = N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}$

85

Ag example - 2
Estimated total acres devoted to farms in the US in 1992?
Estimated variance of estimated total?
Other measures of dispersion for sampling distribution?
  Estimated SE
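The total is the mean scaled by N, so its SE is N times the SE of the mean. A minimal sketch using the summary statistics quoted earlier for this example (again assuming 344,551.9 is the sample standard deviation; names are illustrative):

```python
from math import sqrt

N, n = 3078, 300
ybar, s = 297_897.0, 344_551.9   # sample mean; assumed sample std dev

t_hat = N * ybar                             # estimated total
v_t_hat = N**2 * (1 - n / N) * s**2 / n      # estimated variance of t_hat
se_t_hat = sqrt(v_t_hat)                     # estimated SE of t_hat
cv_t_hat = se_t_hat / t_hat                  # estimated CV (same as CV of ybar)

print(f"t_hat = {t_hat:,.0f}, SE = {se_t_hat:,.0f}, CV = {cv_t_hat:.4f}")
```

Because the rounded sample mean is used as input, the estimated total may differ slightly from a value computed from the raw data.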

86

Sampling distribution of $\hat{p}$ under SRS
Mean of estimator $\hat{p}$ for population proportion p under SRS
  $E\{\hat{p}\} = p$
Is $\hat{p}$ unbiased for p?

87

Sampling distribution of $\hat{p}$ under SRS - 2
Variance of sample proportion (derived stat using theory)
  $V[\hat{p}] = \left(\frac{N-n}{N-1}\right)\frac{p(1-p)}{n}$
  Very similar to infinite population formula $\frac{p(1-p)}{n}$
  Extra factor arises from finite pop and is NOT the same as the FPC
Estimator does have the FPC in the formula
  $\hat{V}[\hat{p}] = \left(1 - \frac{n}{N}\right)\frac{\hat{p}(1-\hat{p})}{n-1}$

88

Ag example - 3
Suppose we are interested in the proportion of counties with fewer than 200,000 acres devoted to farms in 1992
Data from our sample of 300 indicate that 153 counties have fewer than 200,000 acres devoted to farms
Estimated population proportion?
Estimated SE of estimated proportion?
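A sketch of the proportion calculation, using the estimator with the FPC and the n - 1 divisor; N = 3078 is the frame size given earlier for this example, and the variable names are illustrative.

```python
from math import sqrt

N, n, count = 3078, 300, 153     # frame size, sample size, counties under 200,000 acres

p_hat = count / n                                       # estimated proportion
v_p_hat = (1 - n / N) * p_hat * (1 - p_hat) / (n - 1)   # estimated variance of p_hat
se_p_hat = sqrt(v_p_hat)                                # estimated SE of p_hat

print(f"p_hat = {p_hat:.2f}, SE = {se_p_hat:.4f}")
```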

89

Quality of estimates (Fig 2.2, p. 29)
Estimator under a given design is unbiased
  On average over a large number of samples, the mean of the estimates “hits” the target population parameter (centered on the bull’s eye)
Estimator under a given design is precise
  Over a large number of samples, estimates will tend to be close to one another, indicating that the variance of the sampling distribution for the estimator is small
  Clump pattern, but may not be centered on bull’s eye (precise but biased)
Estimator under a given design is accurate
  Estimator comes close to hitting target and is precise
  Assess this with the mean squared error (MSE)

90

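Mean Squared Error of an Estimator
Mean squared error (MSE) of $\hat{\theta}$
  $\mathrm{MSE}[\hat{\theta}] = E\{(\hat{\theta} - \theta)^2\} = V[\hat{\theta}] + (\mathrm{Bias}[\hat{\theta}])^2$
Combines measures of bias and precision to provide an index of the accuracy of an estimator under a given design
  Sometimes we are willing to accept a little bias to get a more precise estimator, provided the MSE is improved
If $\mathrm{Bias}[\hat{\theta}] = 0$, then $\mathrm{MSE}[\hat{\theta}] = V[\hat{\theta}]$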
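The decomposition can be checked numerically on the unequal-probability design from the earlier example, where the sample mean is biased for the population mean of 3. This sketch hard-codes that sampling distribution (values 2, 3, 4 with probabilities 1/6, 1/2, 1/3); the names are illustrative.

```python
from fractions import Fraction as F

# Sampling distribution of ybar under the earlier unequal-probability design.
ybar_dist = {F(2): F(1, 6), F(3): F(1, 2), F(4): F(1, 3)}
theta = F(3)   # population mean, the parameter being estimated

e_hat = sum(c * p for c, p in ybar_dist.items())                   # E{ybar}
v_hat = sum((c - e_hat) ** 2 * p for c, p in ybar_dist.items())    # V{ybar}
bias = e_hat - theta                                               # Bias[ybar]

mse_direct = sum((c - theta) ** 2 * p for c, p in ybar_dist.items())
mse_decomp = v_hat + bias ** 2

print(bias, v_hat, mse_direct, mse_decomp)
```

The bias term is small relative to the variance here, so the MSE is only slightly larger than the variance.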

91

MSE of SRS estimators
All of these estimators are unbiased under SRS (Bias = 0)
So under SRS
  $\mathrm{MSE}[\bar{y}] = V\{\bar{y}\}$
  $\mathrm{MSE}[\hat{p}] = V\{\hat{p}\}$
  $\mathrm{MSE}[\hat{t}] = V\{\hat{t}\}$

92

Confidence intervals
Estimate variance, SE, CV, MSE of estimator under a design to provide indication of quality of estimate
Another approach
  Estimate a confidence interval to express precision of estimate

93

Book example 2.7, p. 35-6
True parameter value: t = 40
CI of interest: $[\hat{t} - 4\,\widehat{se}(\hat{t}),\ \hat{t} + 4\,\widehat{se}(\hat{t})]$
List 70 possible samples of size n = 4
Each sample has a probability of selection P(S)
For each sample, record value of a variable u that indicates whether CI from sample S includes t = 40
  $u(S) = 1$ if $40 \in [\hat{t} - 4\,\widehat{se}(\hat{t}),\ \hat{t} + 4\,\widehat{se}(\hat{t})]$
  $u(S) = 0$ if $40 \notin [\hat{t} - 4\,\widehat{se}(\hat{t}),\ \hat{t} + 4\,\widehat{se}(\hat{t})]$
Confidence coefficient: $\sum_{k=1}^{70} P(S_k)\,u_k = 0.77$
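The procedure on this slide (list the samples, flag whether each interval covers t, sum the probabilities of the covering samples) can be sketched as below. The population values are a hypothetical stand-in with N = 8 and total 40, not necessarily the book's data, and the standard error uses the SRS formula for the estimated total; the multiplier `k` is left as a parameter. No particular output is claimed.

```python
from itertools import combinations
from math import sqrt
from statistics import variance

pop = [1, 2, 4, 4, 7, 7, 7, 8]    # hypothetical population, N = 8, total t = 40
N, n, t = len(pop), 4, sum(pop)

samples = list(combinations(range(N), n))   # all possible SRS samples
p_s = 1 / len(samples)                      # SRS: equal selection probability

def covers(idx, k=4):
    """u(S): True if [t_hat - k*se, t_hat + k*se] contains the true total t."""
    ys = [pop[i] for i in idx]
    t_hat = N * sum(ys) / n
    se = N * sqrt((1 - n / N) * variance(ys) / n)   # variance() divides by n - 1
    return t_hat - k * se <= t <= t_hat + k * se

conf_coef = sum(p_s * covers(s) for s in samples)   # sum of P(S_k) * u_k
print(len(samples), round(conf_coef, 3))
```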

94

Ex - 2: Assume SRSWOR
If 60 of the 70 SRSWOR samples resulted in CIs that included the true total, what is the confidence coefficient?
What is alpha?

95

What is a 95% confidence interval (CI) under SRS?
Heuristic definition
  Take repeated samples of size n from population of size N
  Collect data on Y
  Calculate an estimate of a population parameter using data from n observations
  Calculate 95% CI for parameter estimate using data from n observations
  Expect 95% of the CIs to contain the true value of the parameter

96

Interpreting CIs in general
More generally (for any design), a $(1-\alpha)100\%$ CI has the interpretation
  There is a $(1-\alpha)100\%$ chance of selecting a sample for which the CI will include the true population parameter
Note
  The upper and lower limits of the CI are random variables, calculated from the sample data
  The true parameter value is either included or not included in a single CI
  Confidence coefficient of a CI has a relative frequency interpretation across samples

97

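Confidence interval definition
Standard estimator for a $(1-\alpha)100\%$ confidence interval (CI):
  $\hat{\theta} \pm z_{\alpha/2}\,\widehat{se}(\hat{\theta})$
  or equivalently
  $[\hat{\theta} - z_{\alpha/2}\,\widehat{se}(\hat{\theta}),\ \hat{\theta} + z_{\alpha/2}\,\widehat{se}(\hat{\theta})]$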
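As an illustration, the sketch below applies the standard interval to the county farm-acreage example from earlier slides (N = 3078, n = 300, mean 297,897, and 344,551.9 assumed to be the sample standard deviation). The helper name `normal_ci` is hypothetical; the critical value comes from `statistics.NormalDist`, available in Python 3.8 and later.

```python
from math import sqrt
from statistics import NormalDist

def normal_ci(estimate, se, alpha=0.05):
    """Return (lower, upper) limits of estimate +/- z_{alpha/2} * se."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return estimate - z * se, estimate + z * se

N, n = 3078, 300
ybar, s = 297_897.0, 344_551.9               # s assumed to be the sample std dev
se_ybar = sqrt((1 - n / N) * s**2 / n)       # estimated SE with FPC

lower, upper = normal_ci(ybar, se_ybar)
print(f"95% CI: [{lower:,.0f}, {upper:,.0f}]")
```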

98

Standard normal distribution
Z ~ N(0, 1)
  Z is the random variable
  Mean E{Z} = 0 and variance V{Z} = 1
Two-sided $(1-\alpha)100\%$ confidence interval
  Use critical value $z_{\alpha/2}$, where $P\{|Z| > z_{\alpha/2}\} = \alpha$

99

Infinite vs. finite populations
In other stat classes …
  Assume SRS with replacement from infinite pop
  Justify CI by applying the Central Limit Theorem (CLT)
In sample surveys, we have a finite number of possible samples
  Can calculate exact confidence coefficient $1-\alpha$ for a stated interval (see previous example)
In practice, it is not possible to list all possible samples, so we have a special CLT that relies on a “superpopulation” framework

100

Superpopulation framework
Asymptotic framework for SRSWOR in finite populations
  Population is part of a larger superpopulation
  There is a series of increasingly larger superpopulations
  Use superpopulation concept to derive a Central Limit Theorem for SRSWOR
Bottom line
  We will use the standard CI estimator with a different theoretical justification

101

When is CLT justified?
Confidence coefficient is approximate
  Quality of approximation depends on n and the distribution of the underlying random variable, Y
“n is large enough for CLT” is less clear for finite populations
  n = 30 rule in other stat classes does NOT apply
Rules of thumb
  If distribution of Y is close to normal, n = 50
  Need larger n if distribution of Y deviates from normal, e.g., skewed
  Y categorical: if p is proportion with characteristic of interest, $np \ge 5$ and $n(1-p) \ge 5$

102

Determining sample size: a general approach
Specify tolerable error (level of precision, level of confidence)
Identify appropriate equation relating tolerable error (e, $\alpha$) to sample size (n)
Estimate unknown parameters in equation
Solve for n
Evaluate (and return to first step)
  Can you afford sample size?
  What expectations can be altered?

103

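Specify tolerable error
Two parameters
  e : margin of error or half-width of CI
  $\alpha$ : $[1-\alpha]100\%$ is confidence level
Absolute expression (half-width of CI): estimate within e of true pop parameter
  $P\{|\hat{\theta} - \theta| \le e\} = 1 - \alpha$
Relative expression: within 100e% of $\theta$
  $P\left\{\left|\frac{\hat{\theta} - \theta}{\theta}\right| \le e\right\} = 1 - \alpha$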

104

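Equation linking e, $\alpha$, and n
Most common equation is half-width of CI
  $e = z_{\alpha/2}\,SE[\hat{\theta}]$
Example: sample mean under SRSWOR
  $e = z_{\alpha/2}\sqrt{\left(1 - \frac{n}{N}\right)\frac{S^2}{n}}$
  $n = \frac{z_{\alpha/2}^2 S^2}{e^2 + \frac{z_{\alpha/2}^2 S^2}{N}} = \frac{n_0}{1 + \frac{n_0}{N}}$, where $n_0 = \frac{z_{\alpha/2}^2 S^2}{e^2}$
Notes
  For $\hat{p}$, use $S^2 \approx p(1-p)$
  For $\alpha = 0.05$, use $z_{\alpha/2}^2 = 1.96^2 \approx 3.84$
  $n_0$ is sample size under SRSWR (ignoring FPC)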
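The sample size formula is easy to wrap in a function. The sketch below is one possible form: `srs_sample_size` is a hypothetical helper, and the illustrative inputs (a proportion with p = 0.5, e = 0.05, and a population of 3078 units) are chosen only to show the effect of the finite population adjustment.

```python
from math import ceil
from statistics import NormalDist

def srs_sample_size(e, S2, N=None, alpha=0.05):
    """Sample size for margin of error e and population variance S2.

    Returns n0 = z^2 * S2 / e^2 when N is None (ignoring the FPC),
    otherwise the SRSWOR size n = n0 / (1 + n0 / N).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n0 = z**2 * S2 / e**2
    return n0 if N is None else n0 / (1 + n0 / N)

n0 = srs_sample_size(e=0.05, S2=0.5 * 0.5)             # proportion, S^2 ~ p(1-p)
n_wor = srs_sample_size(e=0.05, S2=0.5 * 0.5, N=3078)  # with finite population
print(ceil(n0), ceil(n_wor))
```

Rounding up with `ceil` keeps the achieved margin of error at or below the target.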

105

Estimate unknowns: population variance of y, $S^2$
Use estimator for variance, $s^2$
  Pilot study
  Previous study
    Careful about comparability
Use CV from previous study
  Careful about comparability
Guess variance under normality
  estimate of S = (range for 95% of values) / 4
  estimate of S = (range for 99% of values) / 6

106

Estimating unknowns: population proportion, p
Use estimates from pilot or previous study
If know nothing of true proportion
  Use p = 0.5
  Max possible variance for estimated proportion under SRS, so this is conservative
  Commonly used

107

Practicalities for determining n
Sampling fraction rarely important
  Most populations are large enough that sampling fraction n/N is small for practical values of n
Subpopulations should influence sample size
  95% CI for a proportion ($\alpha$ = 0.05, p = 0.5) implies $e \approx 1/\sqrt{n}$
    n = 400 for e ≈ 0.05 (whole sample)
    n = 100 for e ≈ 0.10 (subpopulation)
    n = 50 for e ≈ 0.15 (subpopulation)
    n = 500 for e ≈ 0.04 (little gain over 400)
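The rule of thumb can be compared with the exact half-width for p = 0.5, ignoring the FPC. This sketch (names are illustrative) tabulates the margin of error for the sample sizes listed above alongside the 1/sqrt(n) approximation.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)   # critical value for a 95% CI
p = 0.5                           # conservative choice: maximizes p(1 - p)

# Half-width e = z * sqrt(p(1-p)/n), ignoring the FPC.
margins = {n: z * sqrt(p * (1 - p) / n) for n in (50, 100, 400, 500)}

for n, e in margins.items():
    print(f"n = {n:3d}: e = {e:.3f}, 1/sqrt(n) = {1 / sqrt(n):.3f}")
```

Because e shrinks only with the square root of n, going from 400 to 500 buys little, while a subpopulation with 100 respondents has about twice the margin of error of the full sample.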

108

SRS: pros and cons
Cons
  SRS is rarely the “best” design
  May not have list of all OUs, so need different design
  May have additional info on pop to create a more efficient design (improve precision)
Pros / uses
  Standard stat procedures can be used with little or no bias
  Mainly interested in regression rather than estimating pop params (ignore sample design, but could still get a better sample)