SAMPLING METHODS


Reasons for Sampling

•Samples can be studied more quickly than populations.

•A study of a sample is less expensive than studying an entire population, because a smaller number of items or subjects is examined. This consideration is especially important in the design of large studies that require a lengthy follow-up.

•A study of an entire population (census) is impossible in most situations. Sometimes, the process of the study destroys or depletes the item being studied.


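•Sample results are often more accurate than results based on a population, because more time and resources can be spent on training the observers and on measuring each subject carefully.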

•If samples are properly selected, probability methods can be used to estimate the error in the resulting statistics. It is this aspect of sampling that permits investigators to make probability statements about observations in a study.


SAMPLING METHODS

The primary purpose of sampling is to estimate certain population parameters such as means, totals, proportions or ratios.

A probability sample has the characteristic that every element in the population has a known, nonzero probability of being included in the sample. A non-probability sample is one that does not have this feature.

Sampling methods fall into two groups: probability sampling and non-probability sampling.


Probability Sampling Methods

•Simple Random Sampling

•Stratified Random Sampling

•Systematic Sampling

•Cluster Sampling


Simple Random Sampling

A simple random sample is one in which every subject has an equal probability of being selected for the study.

The recommended way to select a simple random sample is to use a table of random numbers or a computer-generated list of random numbers.


To select a simple random sample of size n from a population of size N:

1. List and number each element in the population from 1 to N.

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ..............................., N-1, N

2. Determine the required sample size, n.

3. Select n random numbers by a random process, e.g. a table of random numbers or software such as MS Excel.

4. Take subjects from the population corresponding to the selected random numbers.

5. Estimate the population values (parameters).
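Steps 1 to 4 can also be carried out with a computer-generated list of random numbers. The following is a minimal sketch in Python, assuming the standard `random` module; the function name `simple_random_sample` and the seed value are illustrative choices, not part of the slides.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw subject numbers without replacement from 1..population_size."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    # range(1, N + 1) plays the role of the numbered list 1, 2, ..., N
    numbers = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(numbers)


# N = 500, n = 10 (the seed is an arbitrary, hypothetical choice)
selected = simple_random_sample(500, 10, seed=42)
print(selected)
```

Every subject has the same chance of selection, and no subject can be drawn twice.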


83760 31255 71609 89887 00940 54355 44351 89781 58054 65813
66280 56046 50526 33649 87067 02697 06577 16707 96368 47678
70218 28376 98535 34190 96911 81578 97312 20500 48030 27256
02349 88955 52760 73696 91510 38633 38883 90419 26716 98215
93606 21415 34843 12969 84847 06280 95916 12991 08262 58385
24274 18747 37327 06780 08032 98544 24902 81607 87914 22721
67778 70496 57588 89813 71211 83848 93494 27946 79722 70315
89134 06458 40897 73025 04191 77144 49340 89446 71852 80854
83625 00097 71092 12009 63223 37993 50067 25688 98179 34628
03324 68196 72460 55616 27006 50790 28629 88726 97143 63218
84392 36623 91964 03505 46525 40490 77787 68545 02795 72676
76926 10866 39734 50512 04181 78012 78705 86194 28371 54535
06612 60200 49085 85108 71438 10099 99027 65081 82492 77584
76721 02889 95600 07984 31925 59685 91510 40039 43205 37149
64599 51953 55612 89088 58436 21501 86219 74528 59805 65020
79440 99677 49530 55291 34867 54774 52449 23294 94815 95124
35839 00177 57742 09502 42624 29017 94284 81409 36904 54329
83013 94568 75490 12138 24067 86954 00910 61171 82982 87191
19980 47085 46064 19102 26297 79745 99611 04555 52501 32088
55716 10350 67645 62922 81919 47925 91448 36025 20611 38939
36624 03992 27656 33092 22252 54461 83386 55340 11313 23290
50678 33814 07643 81452 60689 48745 49894 27285 90420 31188
17932 27351 34623 55864 58659 06992 88558 45742 56792 71027
76795 23022 20409 60100 59507 40596 16971 96490 47676 49129
20654 64916 59927 62495 81133 29095 64024 02792 39809 85302
73601 60099 50404 41700 53664 54397 49600 46980 13882 54275
59678 14528 96293 12957 68229 95753 15727 75113 09892 71487
92132 51012 09399 30175 73025 99849 34334 20089 19323 95149
76143 16802 32819 34057 94227 25779 93959 89810 47627 70561
99617 64239 13967 90188 60291 38478 09723 10697 78020 51388
02841 25077 02368 75931 42679 70900 33040 08871 46696 18647
57979 28621 03155 03704 98473 25894 26753 62390 54746 84189
41233 68027 17036 28310 50551 84295 80793 93235 78902 18351
48049 09367 15040 29166 64290 16439 67192 16681 46304 68190
10984 97394 23070 90585 53139 96998 39834 27678 42288 33778
59531 76937 15645 70938 00036 72773 25984 06507 27933 46779
36874 61476 74611 74476 48713 36124 98549 70465 58742 28707
49377 53222 14506 80260 59070 47101 02248 99520 08803 79772
59707 00510 29216 53012 47115 39798 79797 06491 72669 05055
63469 49151 35960 88792 43961 62352 78114 77810 95638 84227

From a population of size N=500, select a random sample of size n=10.

•Number subjects from 1 to 500.

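•From a random starting point, 838 (the first three digits of the entry in row 7, column 6 of the table), move down the column. Take numbers ≤ 500.

•The 379th, 404th, 100th, 215th, 290th, 479th, 487th, 69th and 405th subjects in the population will constitute the sample. The next eligible number, 290, has already been drawn, so it is skipped and the following eligible number, 257, completes the sample of 10.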

•Make observations on selected subjects.

•Estimate parameters.
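The reading of the random number table can be traced step by step. The sketch below assumes the entries are those of column 6 of the table, from row 7 (the starting point 838) downwards, and that the first three digits of each entry are used; the names `COLUMN_6` and `read_table` are hypothetical.

```python
# Column 6 of the random number table, from row 7 (starting point) downwards.
COLUMN_6 = ["83848", "77144", "37993", "50790", "40490", "78012", "10099",
            "59685", "21501", "54774", "29017", "86954", "79745", "47925",
            "54461", "48745", "06992", "40596", "29095", "54397", "95753",
            "99849", "25779"]


def read_table(entries, population_size, sample_size, allow_repeats=False):
    """Keep the first three digits of each entry when they fall in 1..N."""
    chosen = []
    for entry in entries:
        number = int(entry[:3])
        if number < 1 or number > population_size:
            continue  # outside 1..N, move down to the next entry
        if not allow_repeats and number in chosen:
            continue  # subject already selected, skip the repeat
        chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


with_repeats = read_table(COLUMN_6, 500, 10, allow_repeats=True)
without_repeats = read_table(COLUMN_6, 500, 10)
print(with_repeats)
print(without_repeats)
```

With repeats allowed, subject 290 appears twice; skipping the repeat brings in the next eligible number from the column, 257.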


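From the sample, calculate statistics to estimate parameters (population values).

Point Estimates

The sample mean estimates the population mean μ:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The sample proportion estimates the population proportion P, where a is the number of subjects in the sample with the characteristic of interest:

$$p = \frac{a}{n}$$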


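Interval Estimates

Confidence interval for the population mean:

$$\bar{x} - t_{(n-1;\,\alpha)} \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{(n-1;\,\alpha)} \frac{s}{\sqrt{n}}$$

where s is the sample standard deviation and t is the tabulated t value.

$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}}$$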


Tabulated t values (rows: degrees of freedom, df; columns: tail probability)

One Tail:    0.250   0.100   0.050   0.025   0.010   0.005
Two Tails:   0.500   0.200   0.100   0.050   0.020   0.010
df
1            1.000   3.078   6.314  12.706  31.821  63.657
2            0.816   1.886   2.920   4.303   6.965   9.925
3            0.765   1.638   2.353   3.182   4.541   5.841
4            0.741   1.533   2.132   2.776   3.747   4.604
5            0.727   1.476   2.015   2.571   3.365   4.032
6            0.718   1.440   1.943   2.447   3.143   3.707
7            0.711   1.415   1.895   2.365   2.998   3.499
8            0.706   1.397   1.860   2.306   2.896   3.355
9            0.703   1.383   1.833   2.262   2.821   3.250
10           0.700   1.372   1.812   2.228   2.764   3.169
11           0.697   1.363   1.796   2.201   2.718   3.106
12           0.695   1.356   1.782   2.179   2.681   3.055
13           0.694   1.350   1.771   2.160   2.650   3.012
14           0.692   1.345   1.761   2.145   2.624   2.977
15           0.691   1.341   1.753   2.131   2.602   2.947
16           0.690   1.337   1.746   2.120   2.583   2.921
17           0.689   1.333   1.740   2.110   2.567   2.898
18           0.688   1.330   1.734   2.101   2.552   2.878
19           0.688   1.328   1.729   2.093   2.539   2.861
20           0.687   1.325   1.725   2.086   2.528   2.845
25           0.684   1.316   1.708   2.060   2.485   2.787
30           0.683   1.310   1.697   2.042   2.457   2.750
40           0.681   1.303   1.684   2.021   2.423   2.704
50           0.679   1.299   1.676   2.009   2.403   2.678
60           0.679   1.296   1.671   2.000   2.390   2.660
70           0.678   1.294   1.667   1.994   2.381   2.648
80           0.678   1.292   1.664   1.990   2.374   2.639
90           0.677   1.291   1.662   1.987   2.368   2.632
100          0.677   1.290   1.660   1.984   2.364   2.626
∞            0.674   1.282   1.645   1.960   2.326   2.576


Example

A researcher wishes to estimate the average age of the mother at first birth. He selects 10 mothers at random, and gathers the following data:

Mother No            1   2   3   4   5   6   7   8   9  10
Age at first birth  24  20  26  19  20  23  28  22  18  25


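Point estimate of the population mean:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{225}{10} = 22.5 \text{ years}$$

Sample standard deviation:

$$s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} = \sqrt{\frac{5159 - \frac{(225)^2}{10}}{9}} = 3.27$$

Estimated standard error of the mean:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{3.27}{\sqrt{10}} = 1.04$$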


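If the researcher wishes to be 95% confident in his estimate, the tabulated t value for n − 1 = 9 degrees of freedom and a two-tailed probability of 0.05 is 2.262, rounded here to 2.26:

$$\bar{x} - t_{(n-1;\,\alpha)} \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{(n-1;\,\alpha)} \frac{s}{\sqrt{n}}$$

$$22.5 - 2.26(1.04) \le \mu \le 22.5 + 2.26(1.04)$$

$$20.15 \le \mu \le 24.85$$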
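The calculations of this example can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the constant name `T_9_TWO_TAILED_05` is a hypothetical label for the tabulated t value (9 degrees of freedom, two-tailed 0.05), which is typed in from the t table rather than computed.

```python
import math
import statistics

ages = [24, 20, 26, 19, 20, 23, 28, 22, 18, 25]  # age at first birth
T_9_TWO_TAILED_05 = 2.262  # from the t table: df = 9, two tails = 0.05

n = len(ages)
mean = statistics.mean(ages)   # point estimate of the population mean
s = statistics.stdev(ages)     # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)          # estimated standard error of the mean

lower = mean - T_9_TWO_TAILED_05 * se
upper = mean + T_9_TWO_TAILED_05 * se
print(f"mean={mean:.1f} s={s:.2f} se={se:.2f} CI=({lower:.2f}, {upper:.2f})")
```

With unrounded intermediate values the interval is about 20.16 to 24.84. The small difference from 20.15 and 24.85 comes from rounding t and the standard error to two decimals in the hand calculation.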



CONFIDENCE INTERVAL FOR A POPULATION PROPORTION

When the population proportion, P, is unknown, its estimate, the sample proportion p, can be used.

$$p - t_{(n-1;\,\alpha)} \sqrt{\frac{p(1-p)}{n}} \le P \le p + t_{(n-1;\,\alpha)} \sqrt{\frac{p(1-p)}{n}}$$


Example

A researcher wishes to estimate, with 95% confidence, the proportion of women who are at or below 20 years of age at first birth.


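Point estimate of the population proportion:

$$p = \frac{a}{n} = \frac{4}{10} = 0.4$$

Estimated standard error of the proportion:

$$s_p = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.4)(0.6)}{10}} = 0.15$$

95% confidence interval:

$$p - t_{(n-1;\,\alpha)} \sqrt{\frac{p(1-p)}{n}} \le P \le p + t_{(n-1;\,\alpha)} \sqrt{\frac{p(1-p)}{n}}$$

$$0.4 - (2.26)(0.15) \le P \le 0.4 + (2.26)(0.15)$$

$$0.06 \le P \le 0.74$$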

In the above example, if the sample size were 100 instead of 10, the standard error would be $\sqrt{(0.4)(0.6)/100} = 0.049$ and the tabulated t value for 99 degrees of freedom would be about 1.98, so the 95% confidence interval would be:

$$0.30 \le P \le 0.50$$


Among 250 students of Hacettepe University interviewed, 185 responded that they regularly read a daily newspaper. With 95% confidence, find an interval within which the proportion of students who regularly read a newspaper in Hacettepe University lies.

Point estimate of the proportion of students who read a newspaper:

$$p = \frac{185}{250} = 0.74$$

$$S_p = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.74 \times 0.26}{250}} = 0.028$$

The standard error of the estimate is 0.028.


In other words, the standard deviation of the proportions that can be computed from all possible samples of size 250 is estimated to be 0.028.

The 95% Confidence Interval is:

$$p - t_{(n-1;\,\alpha)} S_p \le P \le p + t_{(n-1;\,\alpha)} S_p$$

$$0.74 - 1.96(0.028) \le P \le 0.74 + 1.96(0.028)$$

$$0.69 \le P \le 0.79$$
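The newspaper example can be reproduced with a small helper. This is only a sketch: the function name `proportion_ci` is hypothetical, and the critical value (1.96 here, the tabulated value for very large degrees of freedom) is passed in by hand rather than looked up.

```python
import math


def proportion_ci(successes, n, critical_value):
    """Return (p, standard error, lower limit, upper limit)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # estimated standard error of p
    margin = critical_value * se
    return p, se, p - margin, p + margin


# 185 of 250 students read a daily newspaper; 95% confidence
p, se, lower, upper = proportion_ci(185, 250, 1.96)
print(f"p={p:.2f} se={se:.3f} CI=({lower:.3f}, {upper:.3f})")
```

The same function can be applied to the mothers example by calling `proportion_ci(4, 10, 2.262)`.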


Systematic Sampling

A systematic random sample is one in which every kth item is selected, starting from an item i chosen at random among the first k; k is determined by dividing the number of items in the population by the desired sample size.

$$k = N/n$$

1 2 3 4 … i … k … i+k … i+2k … i+3k … N
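The selection pattern i, i+k, i+2k, ... can be sketched as follows. The function name `systematic_sample` is hypothetical, and two assumptions are made that the slide does not state: the starting point i is drawn at random from 1..k, and k is taken as the integer part of N/n.

```python
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Select every kth subject after a random start within the first k."""
    k = population_size // sample_size  # sampling interval, integer part of N/n
    rng = random.Random(seed)
    start = rng.randint(1, k)           # random start i, 1 <= i <= k
    return [start + j * k for j in range(sample_size)]


# N = 500, n = 10 gives k = 50 (the seed is an arbitrary choice)
sample = systematic_sample(500, 10, seed=1)
print(sample)
```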


Stratified Sampling

A stratified random sample is one in which the population is first divided into relevant strata (subgroups) that are internally homogeneous with respect to the variable of interest, and a random sample is then selected from each stratum.

Characteristics used to stratify should be related to the measurement of interest, in which case stratified random sampling is the most efficient, meaning that it requires the smallest sample size.


Stratum   Stratum size   Sample size
1         N1             n1
2         N2             n2
...
k         Nk             nk
TOTAL     N              n

From each stratum, select random samples independently, whose sizes are proportional to the size of that stratum:

$$n_i = n \frac{N_i}{N}$$
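Proportional allocation can be illustrated with a few lines of code. The stratum sizes below are hypothetical numbers chosen so the arithmetic is easy to follow, and `proportional_allocation` is an illustrative name.

```python
def proportional_allocation(stratum_sizes, total_sample_size):
    """Sample size per stratum: n_i = n * N_i / N."""
    N = sum(stratum_sizes)
    return [total_sample_size * N_i / N for N_i in stratum_sizes]


# Hypothetical strata of 500, 300 and 200 subjects; total sample n = 100
allocation = proportional_allocation([500, 300, 200], 100)
print(allocation)
```

When the results are not whole numbers they have to be rounded, so the stratum sample sizes may need a small adjustment to add up to n.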


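Estimation of the parameters

$$\bar{x} = \sum_{i=1}^{k} \frac{N_i}{N} \bar{x}_i \qquad p = \sum_{i=1}^{k} \frac{N_i}{N} p_i$$

Standard error of estimates

$$se(\bar{x}) = \sqrt{\sum_{i=1}^{k} \left(\frac{N_i}{N}\right)^2 \frac{s_i^2}{n_i}} \qquad se(p) = \sqrt{\sum_{i=1}^{k} \left(\frac{N_i}{N}\right)^2 \frac{p_i(1-p_i)}{n_i}}$$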
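A stratified estimate of the mean weights each stratum mean by its share of the population, N_i/N. The sketch below assumes the usual textbook form of the standard error without a finite population correction, which may differ from the exact expression on the slide; the function name and all numbers are hypothetical.

```python
import math


def stratified_mean(strata):
    """strata: list of (N_i, n_i, mean_i, s_i) tuples, one per stratum."""
    N = sum(N_i for N_i, _, _, _ in strata)
    # weighted mean of the stratum means, weights N_i / N
    mean = sum(N_i / N * m for N_i, _, m, _ in strata)
    # variance of the weighted mean (no finite population correction)
    variance = sum((N_i / N) ** 2 * s ** 2 / n_i for N_i, n_i, _, s in strata)
    return mean, math.sqrt(variance)


# Hypothetical strata: (stratum size, sample size, sample mean, sample sd)
strata = [(500, 50, 20.0, 4.0), (300, 30, 25.0, 5.0), (200, 20, 30.0, 6.0)]
mean, se = stratified_mean(strata)
print(f"mean={mean:.2f} se={se:.3f}")
```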


Cluster Sampling

A cluster random sample results from a two-stage process in which the population is divided into clusters and a subset of the clusters is randomly selected.

Clusters are commonly based on geographic areas or districts, so this approach is used more often in epidemiologic research than clinical studies.
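A minimal sketch of the cluster selection step is given below. It assumes that every subject in a selected cluster is included; the district names and subject numbers are made up for illustration, and `cluster_sample` is a hypothetical name.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # pick cluster names
    return {name: clusters[name] for name in chosen}


# Hypothetical population of 10 subjects grouped into 4 districts
districts = {
    "district_A": [1, 2, 3],
    "district_B": [4, 5],
    "district_C": [6, 7, 8, 9],
    "district_D": [10],
}
picked = cluster_sample(districts, 2, seed=0)
print(picked)
```

The random selection is applied to the clusters, not to the individual subjects.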


Non-probability Sampling

The sampling methods just discussed are all based on probability, but nonprobability sampling methods also exist, such as convenience samples or quota samples. Nonprobability samples are those in which the probability that a subject is selected is unknown. Nonprobability samples often reflect selection biases of the person doing the study and do not fulfill the requirements of randomness needed to estimate sampling error. When we use the term “sample” in the context of observational studies, we will assume that the sample has been randomly selected in an appropriate way.


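DETERMINATION OF THE SAMPLE SIZE

How large a sample is needed for estimating

a) Population mean, μ:

i) When population size, N, is unknown

$$n = \frac{z_{(\alpha)}^2 \sigma^2}{d^2}$$

ii) When population size, N, is known

$$n = \frac{N z_{(\alpha)}^2 \sigma^2}{d^2 (N-1) + z_{(\alpha)}^2 \sigma^2}$$

Here z is the standard normal value for the chosen confidence level, σ is the population standard deviation, and d is the largest acceptable difference between the estimate and the population mean.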


Example

If we wish, with 95% confidence, to estimate the average birth weight of infants within 250 gr of the unknown population mean, how large a sample should we select? (Assume σ = 700 gr)

$$n = \frac{(1.96)^2 (700)^2}{(250)^2} \approx 30$$

When N = 60:

$$n = \frac{60 (1.96)^2 (700)^2}{(250)^2 (59) + (1.96)^2 (700)^2} \approx 20$$

When d = 400 gr and N = 60, the required sample size, n, is 9.97 ≈ 10.
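The three results of this example can be checked with a short function. This is a sketch; the name `sample_size_mean` is hypothetical, and the function returns the unrounded value so the rounding step stays visible.

```python
def sample_size_mean(z, sigma, d, N=None):
    """Sample size for estimating a mean; N=None means N is unknown."""
    base = (z * sigma) ** 2            # z^2 * sigma^2
    if N is None:
        return base / d ** 2
    return N * base / (d ** 2 * (N - 1) + base)


n_unknown = sample_size_mean(1.96, 700, 250)        # N unknown
n_known = sample_size_mean(1.96, 700, 250, N=60)    # N = 60
n_wider = sample_size_mean(1.96, 700, 400, N=60)    # d = 400 gr, N = 60
print(round(n_unknown, 2), round(n_known, 2), round(n_wider, 2))
```

A wider tolerance d or a smaller population lowers the required sample size.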


b) Population proportion, P:

i) When population size, N, is unknown

$$n = \frac{z_{(\alpha)}^2 P Q}{d^2}$$

ii) When population size, N, is known

$$n = \frac{N z_{(\alpha)}^2 P Q}{d^2 (N-1) + z_{(\alpha)}^2 P Q}$$

where Q = 1 − P.


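Example

If we wish, with 95% confidence, to estimate the proportion of infants with low birth weight within 10% of the unknown population proportion, how many infants should be selected? Since P is unknown, P = Q = 0.50 is used, which gives the largest required sample size.

$$n = \frac{z_{(\alpha)}^2 P Q}{d^2} = \frac{(1.96)^2 (0.50)(0.50)}{(0.10)^2} \approx 96$$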


Example

If we know that the population size from which we will sample is 100, how many infants should be selected?

$$n = \frac{N z_{(\alpha)}^2 P Q}{d^2 (N-1) + z_{(\alpha)}^2 P Q} = \frac{(100)(1.96)^2 (0.50)(0.50)}{(0.10)^2 (99) + (1.96)^2 (0.50)(0.50)} \approx 49$$
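Both proportion examples can be verified in the same way. The function name `sample_size_proportion` is hypothetical, and P = 0.50 is the value used in the examples.

```python
def sample_size_proportion(z, P, d, N=None):
    """Sample size for estimating a proportion; N=None means N is unknown."""
    base = z ** 2 * P * (1 - P)        # z^2 * P * Q with Q = 1 - P
    if N is None:
        return base / d ** 2
    return N * base / (d ** 2 * (N - 1) + base)


n_unknown = sample_size_proportion(1.96, 0.50, 0.10)         # N unknown
n_known = sample_size_proportion(1.96, 0.50, 0.10, N=100)    # N = 100
print(round(n_unknown, 2), round(n_known, 2))
```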