Stratified Random Sampling
• A stratified random sample is obtained by separating the population elements into non-overlapping groups, called strata
• Select a simple random sample from each stratum
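The two steps above can be sketched in R. This is only an illustration: the data frame pop, its columns habitat and length, and the per-stratum sample sizes n_h are hypothetical names and values, not part of the original slides.

```r
# Hypothetical population: 'habitat' is the stratum label, 'length' the response.
set.seed(1)
pop <- data.frame(
  habitat = rep(c("riffle", "run", "pool"), times = c(200, 500, 800)),
  length  = c(rnorm(200, 5, 1), rnorm(500, 10, 2), rnorm(800, 20, 4))
)

# Number of units to draw from each stratum (hypothetical allocation).
n_h <- c(riffle = 13, run = 33, pool = 54)

# Step 1: separate the population rows into non-overlapping strata.
rows_by_stratum <- split(seq_len(nrow(pop)), pop$habitat)

# Step 2: draw a simple random sample (without replacement) in each stratum.
picked <- unlist(lapply(names(n_h), function(h) {
  sample(rows_by_stratum[[h]], n_h[[h]])
}))
strat_sample <- pop[picked, ]

# Check how many units came from each stratum.
table(strat_sample$habitat)
```

Every sampled unit belongs to exactly one stratum, and the number taken from each stratum is fixed in advance rather than left to chance.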
Stratified Random Sampling…
• Eg: sampling fish from a stream with the goal being to estimate the average length of trout
– Want to know the size of fish (length)
– Stream is made up of riffles, runs and pools
• larger (longer) fish live in the pools
• smaller fish live in the riffles
– Strata = stream habitat type
Why Choose Stratification?
• Minimize uncertainty
– equivalent to minimizing the variability associated with our response variable
• Example
– If fish in riffles are similar in length (small within-habitat variability), then averaging stratum by stratum gives a low-variance estimate of each stratum mean
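The link between within-stratum variability and uncertainty can be written out with the standard textbook formulas for stratified random sampling. The notation is an assumption introduced here, not taken from the slides: L strata, stratum size N_h, stratum sample size n_h, stratum weight W_h = N_h / N, stratum sample mean \bar{y}_h, and within-stratum variance S_h^2.

```latex
\bar{y}_{st} = \sum_{h=1}^{L} W_h \, \bar{y}_h ,
\qquad W_h = \frac{N_h}{N}

\operatorname{Var}\left(\bar{y}_{st}\right)
  = \sum_{h=1}^{L} W_h^{2} \left(1 - \frac{n_h}{N_h}\right) \frac{S_h^{2}}{n_h}
```

Only the within-stratum variances S_h^2 appear in the variance, so differences between stratum means (pools versus riffles, for example) do not add to the uncertainty of the stratified estimate.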
Simulation Comparing Stratified and Simple Random Sampling
simu <- function(N1, N2, N3, n, no) {
  # N1: size of population 1
  # N2: size of population 2
  # N3: size of population 3
  # n:  sample size
  # no: total number of iterations

  # Generate the three strata and the combined population
  pop1 <- rnorm(N1, 5, 1)
  pop2 <- rnorm(N2, 10, 2)
  pop3 <- rnorm(N3, 20, 4)
  pop <- c(pop1, pop2, pop3)
  pop.mean <- mean(pop)

  # Proportional allocation of the sample to the strata
  N <- N1 + N2 + N3
  n1 <- round(n * N1 / N, 0)
  n2 <- round(n * N2 / N, 0)
  n3 <- round(n * N3 / N, 0)

  me.srs <- numeric(no)
  me.st <- numeric(no)
  for (i in 1:no) {
    sample.srs <- sample(pop, n)
    sample.st <- c(sample(pop1, n1), sample(pop2, n2), sample(pop3, n3))
    me.srs[i] <- mean(sample.srs)
    me.st[i] <- mean(sample.st)
  }

  # Plot both sampling distributions on the same x-axis range
  a <- min(me.srs)
  b <- max(me.srs)
  par(mfrow = c(2, 1))
  hist(me.srs, main = "mean obtained by Simple Random Sample",
       col = "red", xlim = c(a, b))
  abline(v = pop.mean, lwd = 2.5)
  hist(me.st, main = "mean obtained by Stratified Random Sample",
       col = "blue", xlim = c(a, b))
  abline(v = pop.mean, lwd = 2.5)
  cat("Population mean:", pop.mean, "\n")
}

simu(200, 500, 800, 100, 1000)
[Figure: two stacked histograms of the simulated sample means on a common x-axis (13 to 16), y-axis Frequency. Top panel: "mean obtained by Simple Random Sample" (me.srs). Bottom panel: "mean obtained by Stratified Random Sample" (me.st).]
Note that with the stratified random sample, the sampling distribution of the sample mean is characterized by less variation/uncertainty than in the simple random sample protocol.
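To put a number on this, one option is to compare the standard deviations of the two sets of simulated means. The sketch below repeats the simulation settings used above (strata of size 200, 500 and 800, n = 100, 1000 iterations) without the plots. The names n_iter, n_h, me_srs and me_st are introduced here for illustration, and the exact values depend on the random seed.

```r
set.seed(42)

# Same three strata as in the simulation above.
pop1 <- rnorm(200, 5, 1)
pop2 <- rnorm(500, 10, 2)
pop3 <- rnorm(800, 20, 4)
pop <- c(pop1, pop2, pop3)

n_iter <- 1000
n <- 100
# Proportional allocation, rounded as in simu(): 13, 33 and 53 units.
n_h <- round(n * c(200, 500, 800) / 1500)

# Sample means under simple random sampling.
me_srs <- replicate(n_iter, mean(sample(pop, n)))

# Sample means under stratified random sampling.
me_st <- replicate(n_iter, mean(c(sample(pop1, n_h[1]),
                                  sample(pop2, n_h[2]),
                                  sample(pop3, n_h[3]))))

# The standard deviation of the simulated means is an empirical standard error.
c(srs = sd(me_srs), stratified = sd(me_st))
```

The stratified value should come out clearly smaller, because the large differences between the three stratum means no longer contribute to the variability of the estimate.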
Why Choose Stratification…
• Estimates of population parameters may be desired for subgroups of the population
Eg: By stratifying on stream habitat type
• You can easily provide estimates of the mean fish length for each habitat type (riffle, run, and pool)
• Separate confidence intervals for each of the strata
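As a sketch of what per-stratum estimates could look like, the code below uses the eight complete three-column rows of the Example Data in these slides. The t-based interval from t.test() is an assumption made here (roughly normal lengths within a habitat, no finite population correction); the slides do not say which interval method is intended.

```r
# First eight rows of the Example Data, one vector per habitat type.
strata <- list(
  Riffles = c(5.02, 5.28, 6.40, 5.15, 5.86, 6.25, 4.73, 6.11),
  Runs    = c(14.10, 12.80, 14.22, 15.42, 14.47, 14.71, 14.10, 13.47),
  Pools   = c(16.39, 18.34, 17.73, 17.96, 18.27, 19.44, 17.76, 17.61)
)

# For each stratum: sample mean and a 95% t-based confidence interval.
ci_table <- t(sapply(strata, function(x) {
  ci <- t.test(x, conf.level = 0.95)$conf.int
  c(mean = mean(x), lower = ci[1], upper = ci[2])
}))

round(ci_table, 2)
```

Each row of ci_table is one habitat type, so the three intervals can be reported separately.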
Why Choose Stratification…
• The cost per observation in the sample may be reduced
– Eg: Gear changes when habitat changes
• Simple random sampling of stream sections means more gear changes
Riffles   Runs    Pools
5.02      14.10   16.39
5.28      12.80   18.34
6.40      14.22   17.73
5.15      15.42   17.96
5.86      14.47   18.27
6.25      14.71   19.44
4.73      14.10   17.76
6.11      13.47   17.61
13.67 18.76 14.00 16.22 14.10 16.88 14.94 18.91 16.98 20.49 16.90 17.79 19.69 19.31 18.66 16.29
Example Data
Obtain a 95% bootstrap CI on the mean length of fish across the three habitats
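One possible approach is a stratified percentile bootstrap: resample with replacement within each habitat, combine the stratum means, and take the 2.5% and 97.5% quantiles of the results. This is a sketch under stated assumptions, not the intended solution. It uses only the eight complete three-column rows of the Example Data. The equal stratum weights W and the number of resamples B are hypothetical choices; W should be replaced by the true habitat proportions if they are known.

```r
set.seed(2024)

# Eight complete rows of the Example Data, one vector per habitat type.
strata <- list(
  Riffles = c(5.02, 5.28, 6.40, 5.15, 5.86, 6.25, 4.73, 6.11),
  Runs    = c(14.10, 12.80, 14.22, 15.42, 14.47, 14.71, 14.10, 13.47),
  Pools   = c(16.39, 18.34, 17.73, 17.96, 18.27, 19.44, 17.76, 17.61)
)

# Assumed stratum weights (hypothetical: each habitat counts equally).
W <- c(Riffles = 1/3, Runs = 1/3, Pools = 1/3)

# Point estimate: weighted combination of the stratum means.
est <- sum(W * sapply(strata, mean))

# Bootstrap: resample within each stratum, keeping stratum sample sizes fixed.
B <- 5000
boot_means <- replicate(B, {
  m_h <- sapply(strata, function(x) mean(sample(x, length(x), replace = TRUE)))
  sum(W[names(m_h)] * m_h)
})

# 95% percentile bootstrap confidence interval.
ci <- quantile(boot_means, c(0.025, 0.975))
est
ci
```

Resampling within strata keeps the design intact: every bootstrap sample has the same number of fish from each habitat as the original sample. With eight fish per habitat, equal weights give the same point estimate as the plain mean of all 24 lengths.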