Sampling Size Determination in Farmer Surveys


  • 8/13/2019 Sampling Size Determination in Farmer Surveys


    ICRAF Research Support Unit

    Technical Note 4

    SAMPLING SIZE DETERMINATION IN FARMER SURVEYS

    ICRAF

    World Agroforestry Centre

    PO Box 30677, Nairobi, Kenya
    Telephone: +254 2 524 000 or +1 650 833 6645
    Fax: +254 2 524 001 or +1 650 833 6646
    http://www.worldagroforestrycentre.org

    Ric Coe, Version January 1996


    Correct citation: Ric Coe, 1996. Sampling Size Determination in Farmer Surveys. ICRAF Research Support Unit Technical Note No 4. ICRAF World Agroforestry Centre, Nairobi, Kenya. 11 pp.

    Copyright 1996 ICRAF World Agroforestry Centre

    This publication is the intellectual property of the International Centre for Research in Agroforestry. While use of the information it contains and its reproduction is encouraged, the content should not be republished in any way for commercial purposes without the permission of the publishers.

    The publisher and the author make no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

    All terms mentioned in this publication that are known to be trademarks or service marks have been appropriately capitalized. The publisher and the author cannot attest to the accuracy of this information. Use of a term in this publication should not be regarded as affecting the validity of any trademark or service mark.

    ICRAF
    World Agroforestry Centre
    PO Box 30677
    Nairobi
    Kenya
    http://www.worldagroforestrycentre.org


    Contents

    1 Introduction
        1.1 Types of objective
            1.1.1 Informal/Exploratory
            1.1.2 Estimation of proportions
            1.1.3 Estimation of means and totals
            1.1.4 Comparison of groups (sub-populations)
            1.1.5 Estimating relationships
    2 Sampling and Estimates
    3 Factors to consider when choosing sample size
    4 Calculations for a simple random sample
    5 Factors increasing required sample size
    6 Factors decreasing the required sample size
    7 Steps in fixing the sample size


    1 Introduction

    Choosing the sample size is a problem faced by anyone doing a survey of any type. "What sample size do I need?" is one of the questions most frequently asked of statisticians. The response always starts "It depends on...". In this note I have summarised what it depends on, and the steps needed to reach a decision. The sample size must depend on what you want to know about (hence the section on objectives) and how well you want to know about it (the section on sampling variation). Factors such as how the sample is selected then further modify the required sample size.

    None of this material is new. It can be found in many text books. The books range from the practical to the mathematical. Good sources covering many practical points of farmer surveys are:

    - Casley D.J. and D.A. Lury (1987). Data Collection in Developing Countries, 2nd Ed. Oxford: Oxford University. 225pp.

    - Poate C.D. and P.F. Daplyn (1993). Data for Agrarian Development. Cambridge: Cambridge University Press. 387pp.

    Details of the mathematics can be found in books such as:

    - Cochran W.G. (1977). Sampling Techniques, 3rd Ed. New York: Wiley. 428pp.

    1.1 Types of objective

    1.1.1 Informal/Exploratory

    At early stages in many research programs surveys have general exploratory objectives such as "To understand constraints in the farming system..." or "To examine farmers' soil fertility maintenance strategies". Informal survey techniques are used and informal approaches to sample size are sufficient. The simple rule is "stop collecting data when you stop learning anything new". Such surveys are essential for developing understanding of issues and developing hypotheses, but will not be considered further here.

    1.1.2 Estimation of proportions

    Objectives of focused, formal surveys often can be reduced to estimation of proportions of a specified population. Examples are: "The proportion of farmers in Embu that plant beans in the long rains", "The proportion of individuals who spend at least 20% of work time working off farm", "The proportion of farmland occupied by permanent tree crops".


    1.1.3 Estimation of means and totals

    Examples are: "The average amount of tree fodder consumed per cow during the 4 month dry season", "The labour required to clear a 2 year old improved fallow plot", "The number of grevillea trees per 100m of farm boundary".

    Note

    Totals (for a population) are estimated as mean × population size. If the population size is known then the estimation of a total is the same problem as estimation of a mean.

    Sample size for estimation of other quantities such as the median or 25% point follows the same principles as for means.

    1.1.4 Comparison of groups (sub-populations)

    The objectives may require comparison of proportions or means (totals, medians, ...) between different groups. For example: "Do farmers with large farms have more livestock?", "Is grevillea more common in agro-ecological zone UM2 than UM3?", "Do female headed households use less hired labour?"

    1.1.5 Estimating relationships

    Objectives may be reduced to confirming the existence of a hypothesized relationship or estimating parameters in a known relationship. For example: "Confirm that the number of fruit trees planted is inversely proportional to distance from the road".

    2 Sampling and Estimates

    For each of the above objectives there is a quantity M (or quantities) to be estimated. There is a true value of the quantity, which is the answer we would get if we could measure the whole population without error. This is (almost) always impossible, but the idea is useful.

    A sample is taken and an estimate $\hat{M}$ of M is found (e.g. if M is the mean then $\hat{M}$ could be the sample mean). If $\hat{M}$ is close to M we have a good estimate; if it is very different from M it is a poor estimate. If we took another sample we would get a different value of $\hat{M}$. The set of possible values of $\hat{M}$ is the sampling distribution of the estimate. In practice we take only one sample, but the distribution of possible values is again a useful idea. Your one sample could give any one of the possible estimates so it is useful to know whether they are all close to the true value, or if there is a fair chance of getting an estimate which is very far from the true value.
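The idea of a sampling distribution can be made concrete with a small simulation. The sketch below is illustrative only: the population of 1000 farms, its values and the number of repeated samples are all hypothetical assumptions, not data from this note. It draws many simple random samples and reports how widely the sample means are spread for two sample sizes.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: tree counts on 1000 farms (illustrative values only).
population = [max(0.0, rng.gauss(11, 6)) for _ in range(1000)]
true_mean = statistics.fmean(population)


def sampling_distribution(n, replicates=2000):
    """Return the sample means from repeated simple random samples of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(replicates)]


spread = {}
for n in (25, 100):
    estimates = sampling_distribution(n)
    # The standard deviation of the estimates approximates the standard error.
    spread[n] = statistics.stdev(estimates)
    print(f"n={n}: true mean {true_mean:.2f}, spread of estimates {spread[n]:.2f}")
```

In a real survey only one of these samples is ever taken; the simulation simply shows that larger samples give estimates that cluster more tightly around the true value.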


    3 Factors to consider when choosing sample size

    Accuracy.

    How close do you want your estimate to be to the true answer? It will never be equal to the true answer, but how much inaccuracy can you tolerate? Accuracy (or inaccuracy) depends on:

    - Sampling error, the deviation of $\hat{M}$ from M due to the fact that we only measure some (a sample) of the individuals in the population. Sampling error can be mathematically examined and the effect of sample size determined.

    - Non-sampling error, the deviation of $\hat{M}$ from M due to anything other than sampling error. For example, poorly phrased questions, inaccurate recording, refusal to respond, faking data (it happens!). Non-sampling error is very difficult to quantify and is usually ignored in sample size calculations, yet can be larger than sampling error. Sample size may or may not affect non-sampling errors. Many non-sampling errors increase with increasing sample size because of the less careful supervision and quality control possible, the larger number of enumerators, longer time for collecting and processing the data, and so on.

    - Bias. An estimate is biased if it is consistently too large or too small. The definition of bias is the difference between M and the mean of the sampling distribution of $\hat{M}$. Bias can arise for many reasons. For example:
        (i) Sample selection favouring large farms, so that farm size and anything correlated with it will be overestimated.
        (ii) Respondents rounding up values to please interviewers.
        (iii) Interviewers rounding up answers because they are sure the farmer underestimated.
      Many sources of bias are unaffected by sample size.

    Precision.

    The precision of an estimate is the spread of its sampling distribution, usually measured by its standard deviation. The standard deviation of the sampling distribution is called the standard error of the estimate. For many sampling schemes the effect of sample size on standard error can be calculated.

    Cost.

    Increasing sample size will increase costs in the field (transport, enumerators) and afterwards (coding, data entry).

    Choosing sample size involves balancing all these factors.


    4 Calculations for a simple random sample

    A simple random sample is a sample of n selected from the population in such a way that every individual in the population has equal chance of being included in the sample.

    Estimating a proportion

    If you are trying to estimate a proportion P then the estimate will be $\hat{P}$, the sample proportion. The standard error of $\hat{P}$ is

        $se(\hat{P}) = \sqrt{\hat{P}(1 - \hat{P}) / n}$

    A confidence interval is

        $\hat{P} \pm t \cdot se(\hat{P})$

    The value of t depends mainly on the level of confidence required. A common level is 95%, which gives a value of t of (about) 2, leading to simple calculations. A 95% confidence interval is a range of values which, roughly, we are 95% certain contains the true value P.

    Note the width of the confidence interval is $C = 2t \cdot se(\hat{P}) = 4 \cdot se(\hat{P})$ when t = 2.

    Example: A simple random sample of 50 farmers is interviewed. 40 of them own cattle. If P = proportion owning cattle then $\hat{P}$ = 40/50 = 0.8

    $se(\hat{P}) = \sqrt{0.8 \times 0.2 / 50} = 0.057$

    95% c.i. for P is $0.8 \pm 2 \times 0.057$ = (0.69, 0.91).

    P lies somewhere between 69% and 91%.
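The cattle-ownership example can be checked with a few lines of Python. This is a minimal sketch using the approximate multiplier t = 2 from the text; the function name `proportion_ci` is an illustrative choice, not something defined in this note.

```python
import math


def proportion_ci(successes, n, t=2.0):
    """Sample proportion, its standard error and an approximate confidence interval."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - t * se, p_hat + t * se)


# Worked example from the text: 40 of 50 sampled farmers own cattle.
p_hat, se, (low, high) = proportion_ci(40, 50)
print(f"estimate {p_hat:.2f}, se {se:.3f}, 95% c.i. ({low:.2f}, {high:.2f})")
```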

    Estimating a mean

    The estimate of a population mean M is the sample mean $\hat{M} = \sum_i m_i / n$, where $m_i$ are the sample values.

        $se(\hat{M}) = \sqrt{s^2 / n}$,

    where $s^2$ = sample variance = $\sum_i (m_i - \hat{M})^2 / (n - 1)$.


    A 95% confidence interval is $\hat{M} \pm 2 \cdot se(\hat{M})$.

    Example: Twenty five 100m lengths of farm boundary were selected from all boundaries. The mean number of grevillea trees was $\hat{M}$ = 11.2. The variance was $s^2$ = 37.2.

    Then $se(\hat{M}) = \sqrt{37.2 / 25} = 1.22$.

    A 95% c.i. is $11.2 \pm 2 \times 1.22$ = (8.8, 13.6).

    The mean number of trees per 100 m of boundary is between 8.8 and 13.6.
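The grevillea example can be reproduced from its summary statistics. The sketch below assumes only the sample mean, sample variance and sample size are available, and again uses t = 2; `mean_ci` is a hypothetical helper name.

```python
import math


def mean_ci(sample_mean, sample_variance, n, t=2.0):
    """Standard error of a sample mean and an approximate confidence interval."""
    se = math.sqrt(sample_variance / n)
    return se, (sample_mean - t * se, sample_mean + t * se)


# Worked example from the text: 25 boundary lengths, mean 11.2, variance 37.2.
se, (low, high) = mean_ci(11.2, 37.2, 25)
print(f"se {se:.2f}, 95% c.i. ({low:.1f}, {high:.1f})")
```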

    Estimating the difference between two proportions.

    $P_1$ is the proportion in one population and $P_2$ is the proportion in another. If interested in $d = P_1 - P_2$, then samples would be taken in each population and the estimated difference is $\hat{d} = \hat{P}_1 - \hat{P}_2$.

    The standard error is

        $se(\hat{d}) = \sqrt{\hat{P}_1(1 - \hat{P}_1)/n_1 + \hat{P}_2(1 - \hat{P}_2)/n_2}$,

    where $n_i$ is the size of the sample taken in population i.

    Example: A simple random sample of 25 large (>2.5ha) farms was selected. The proportion owning cattle was 84%. Another sample of 25 small (<1.0ha) farms was selected. The proportion owning cattle was 53%.

    The difference is $\hat{d}$ = 0.84 - 0.53 = 0.31

    $se(\hat{d}) = \sqrt{(0.84 \times 0.16 + 0.53 \times 0.47)/25} = 0.12$

    A 95% c.i. is $0.31 \pm 2 \times 0.12$ = (0.07, 0.55). The difference in rate of cattle ownership is somewhere between 7% and 55%.
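The same calculation can be sketched in Python. Note that the text rounds the standard error to 0.12 before forming the interval; carrying the unrounded value (about 0.124) through gives limits of roughly 0.06 and 0.56, a small rounding difference. The function name is illustrative.

```python
import math


def diff_proportions_ci(p1, n1, p2, n2, t=2.0):
    """Difference of two sample proportions, its standard error and an approximate c.i."""
    d = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d, se, (d - t * se, d + t * se)


# Worked example from the text: 84% of 25 large farms, 53% of 25 small farms.
d, se, (low, high) = diff_proportions_ci(0.84, 25, 0.53, 25)
print(f"difference {d:.2f}, se {se:.3f}, 95% c.i. ({low:.2f}, {high:.2f})")
```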

    Note that when comparing two populations, results can be presented in two ways: the significance of the difference can be calculated, or the size of the difference can be reported together with its uncertainty (standard error or confidence interval). The latter is much more informative. Lack of a significant difference can be due either to the real difference being very small, or to the difference being quite large but poorly estimated. Quoting a confidence interval for the difference distinguishes these two cases.


    Estimating the difference between two means

    $M_1$ and $M_2$ are the means of two populations and $d = M_1 - M_2$ is the difference. A simple random sample is taken in each population and the sample means found. Then

        $\hat{d} = \hat{M}_1 - \hat{M}_2$ and $se(\hat{d}) = \sqrt{s_1^2/n_1 + s_2^2/n_2}$,

    where $s_i^2$ is the variance and $n_i$ the sample size in population i.

    The calculations in each of the above sections can be inverted to give n, the sample size, if the other quantities are known.

    Example: The number of grevillea trees per 100m of farm boundary was measured separately for farms settled over 10 years ago and for those settled less than 10 years ago with the following results:

                    < 10 years    > 10 years
        n               15            10
        mean            14.2           8.4
        s^2             23.2          12.8

    The difference in mean number of grevillea is $\hat{d}$ = 14.2 - 8.4 = 5.8.

    $se(\hat{d}) = \sqrt{23.2/15 + 12.8/10} = 1.7$

    A 95% c.i. is $5.8 \pm 2 \times 1.7$ = (2.4, 9.2).

    The difference in mean number of grevillea per 100m is somewhere between 2.4 and 9.2.
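A short Python sketch reproduces this comparison of means from the summary table, assuming t = 2 as elsewhere in the note. The helper name `diff_means_ci` is hypothetical.

```python
import math


def diff_means_ci(mean1, var1, n1, mean2, var2, n2, t=2.0):
    """Difference of two sample means, its standard error and an approximate c.i."""
    d = mean1 - mean2
    se = math.sqrt(var1 / n1 + var2 / n2)
    return d, se, (d - t * se, d + t * se)


# Worked example from the text: farms settled < 10 years ago vs > 10 years ago.
d, se, (low, high) = diff_means_ci(14.2, 23.2, 15, 8.4, 12.8, 10)
print(f"difference {d:.1f}, se {se:.1f}, 95% c.i. ({low:.1f}, {high:.1f})")
```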


    Estimating                  Need to know                                 n

    Proportion P                1. Approximate value of P
                                2. Width of confidence interval, C           $16 P(1 - P) / C^2$
                                or
                                2. $se(\hat{P})$                             $P(1 - P) / se(\hat{P})^2$

    Mean M                      1. Variance in population, $s^2$
                                2. Width of confidence interval, C           $16 s^2 / C^2$
                                or
                                2. $se(\hat{M})$                             $s^2 / se(\hat{M})^2$

    Difference in proportions   1. Approximate $P_1$ and $P_2$
    $P_1 - P_2$                 2. Width of confidence interval of           $32 [P_1(1 - P_1) + P_2(1 - P_2)] / C^2$ *
                                   difference, C

    Difference in means         1. Population variances $s_i^2$
    $M_1 - M_2$                 2. Width of confidence interval of           $32 (s_1^2 + s_2^2) / C^2$ *
                                   difference, C

    (* Total, assuming equal sample size in each population)
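The sample size formulae can be written as small functions. This is a sketch under the same assumptions as the table (simple random sampling, t = 2 so that C = 4 se, and equal sample sizes in the two populations for the difference cases); the function names and the target widths in the usage lines are illustrative choices, not values from the note. The functions return unrounded values, which in practice would be rounded up to a whole number.

```python
def n_proportion(p, width):
    """Sample size to estimate a proportion with 95% c.i. of total width `width`."""
    return 16 * p * (1 - p) / width ** 2


def n_mean(variance, width):
    """Sample size to estimate a mean with 95% c.i. of total width `width`."""
    return 16 * variance / width ** 2


def n_diff_proportions(p1, p2, width):
    """Total sample size (both populations together) for a difference in proportions."""
    return 32 * (p1 * (1 - p1) + p2 * (1 - p2)) / width ** 2


def n_diff_means(var1, var2, width):
    """Total sample size (both populations together) for a difference in means."""
    return 32 * (var1 + var2) / width ** 2


# Illustrative planning values taken from the earlier worked examples.
print(round(n_proportion(0.8, 0.2), 1))              # about 64 farmers
print(round(n_mean(37.2, 4.0), 1))                   # about 37 boundary lengths
print(round(n_diff_proportions(0.84, 0.53, 0.4), 1)) # about 77 farms in total
print(round(n_diff_means(23.2, 12.8, 6.0), 1))       # about 32 farms in total
```

Halving the target width C multiplies each required sample size by four, which is why very precise estimates quickly become expensive.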

    Sampling fraction

    The sampling fraction, f, is the proportion of the population included in the sample. The sampling fraction does not enter the above calculations. Sample size should not be selected by choosing the sampling fraction.

    The only exception is when the population is very small and the sampling fraction becomes large (>10%). Then standard errors are reduced by a factor $\sqrt{1 - f}$. Note that when a census is done (i.e. every individual is measured and f = 100%), there is no sampling error. There may well still be substantial non-sampling errors.
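The effect of a large sampling fraction can be sketched as follows. The code assumes the usual finite population correction for simple random sampling without replacement, in which the standard error is multiplied by the square root of (1 - f); the population size of 200 in the usage lines is a hypothetical value.

```python
import math


def se_with_fpc(se, n, population_size):
    """Apply the finite population correction to a standard error."""
    if not 0 < n <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    f = n / population_size  # sampling fraction
    return se * math.sqrt(1 - f)


se = math.sqrt(0.8 * 0.2 / 50)  # proportion example: se of about 0.057
print(f"uncorrected {se:.3f}, corrected for N=200: {se_with_fpc(se, 50, 200):.3f}")
print(f"census (f = 100%): {se_with_fpc(se, 200, 200):.3f}")
```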


    5 Factors increasing required sample size

    Cluster sampling.

    Simple random sampling is rarely practical. Some form of multistage or cluster sampling is usually used. For example, 50 farmers might be selected by choosing 5 districts at random, 2 villages in each district and 5 farmers in each village. Nearly always observations in the same cluster will be positively correlated (i.e. more similar to each other than observations in different clusters). Hence each new observation gives less new information than if it was independent, so the standard error is inflated. A clustered or multistage sample of size n is less precise than a simple random sample of the same size n. If cluster or multistage sampling is used then the required sample size will be larger than that predicted by the formulae above. How much larger depends on the size of the intracluster correlation.

    Choosing a suitable sampling scheme (e.g. how the total sample size is to be distributed between districts and villages within districts) is a separate problem not dealt with here. The basic principle is to ensure the sample is spread out through the whole population as much as possible. Thus, for example, choosing 2 villages then 50 farmers per village is likely to give far less precise results than choosing 20 villages and 5 farmers per village, though both have the same total sample size.
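The note gives no formula for the effect of clustering, but a widely used approximation (not taken from this note) is the design effect 1 + (m - 1)ρ for single-stage clusters of equal size m with intracluster correlation ρ. The sketch below applies it to the two village layouts just described; the value ρ = 0.1 is purely an assumption for illustration.

```python
def design_effect(cluster_size, rho):
    """Approximate variance inflation for equal-sized clusters (assumed formula)."""
    return 1 + (cluster_size - 1) * rho


def effective_sample_size(n_total, cluster_size, rho):
    """Size of the simple random sample giving roughly the same precision."""
    return n_total / design_effect(cluster_size, rho)


rho = 0.1  # hypothetical intracluster correlation
for villages, farmers in ((2, 50), (20, 5)):
    n_total = villages * farmers
    n_eff = effective_sample_size(n_total, farmers, rho)
    print(f"{villages} villages x {farmers} farmers: "
          f"design effect {design_effect(farmers, rho):.1f}, "
          f"effective sample size about {n_eff:.0f}")
```

Under these assumptions both layouts interview 100 farmers, but the spread-out design behaves like a much larger simple random sample than the concentrated one.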

    Other non-independence

    Other sources of non-independence in the observations have the same effect as clustering. Examples include interviewer effects (responses collected by the same interviewer tend to be similar) and communication between respondents (the extreme case being attempting to collect data from individuals at group meetings).

    These will increase the required sample size beyond that estimated above.

    Non-random sampling

    Non-random methods of sampling can have the same effect as clustering if the result is non-independent observations.

    Non-sampling errors

    Non-sampling errors will inflate true standard errors beyond those given by the formulae above. A larger sample size will thus be required. Note that increasing the sample size may not help if the problem is bias, for example that caused by the sample selection procedure.

    Non-response, drop out, lost respondents.

    The planned sample size is often not achieved as selected respondents refuse to take part, cannot be found or drop out of multi-round surveys. Some of these problems can be avoided by having a well-defined rule for replacing selected respondents (e.g. if no one is at home at the selected house go to the next but one house on the other side of the road). Beware that selected respondents who refuse to take part or cannot be found may introduce bias (but that is another problem).

    The planned sample size must be increased to allow for non-response.
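One simple way to allow for non-response, sketched below, is to divide the required sample size by the expected response rate. This assumes the response rate can be guessed in advance (for example from earlier surveys) and that non-respondents are simply lost; the 80% rate used here is hypothetical, and the adjustment does nothing about any bias that non-response introduces.

```python
def planned_sample_size(n_required, response_rate):
    """Number of respondents to select so that about n_required actually respond."""
    if not 0 < response_rate <= 1:
        raise ValueError("response rate must be in (0, 1]")
    return n_required / response_rate


n_planned = planned_sample_size(64, 0.8)
print(f"select about {n_planned:.0f} farmers to achieve 64 completed interviews")
```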


    6 Factors decreasing the required sample size

    Stratification

    Stratification means dividing the population into subgroups (strata) that can be identified before data collection, and taking a separate sample in each stratum. Two main reasons for stratification are:

    - Because results are required for each stratum. Sample size determination will have to be done for each stratum.

    - Reducing variability and increasing precision. If strata are selected so that the variation between individuals within each stratum is less than the overall variance, the gain in precision can be substantial, hence reducing the required sample size.

    Effective stratification to increase precision requires knowledge of the population being studied, so that relevant, identifiable strata can be chosen. Once within and between stratum variances are known sample sizes can be calculated.

    The ideas of stratification are also used to design surveys for estimating relationships. Consider the problem of looking at the relationship between the number of fruit trees (y) and distance from the road (x). A simple random sample is likely to have many observations with x values close to the mean. The scatter diagram of y against x will then have most observations in a cloud near the middle and it will not be possible to get a clear picture (and good estimate) of the relationship. If the population is stratified by distance from the road (for example dividing it into 3 groups of 1000m) a stratified sample can be taken. This ensures that there are sufficient observations at the low and high ends of the x range. Relationships involving several x variables will require stratification by each of those variables.

    Non-random sampling

    Some non-random sampling schemes can be much more efficient than simple random sampling, giving smaller standard errors, or reduced sample sizes for the same precision. The best example is grid sampling of spatially defined populations (e.g. land use, soil). The gain in precision that can be achieved by systematic sampling is difficult to quantify.


    7 Steps in fixing the sample size

    1. Refine the objectives of the survey. In order to make rational sample size choices, both the quantities to be estimated and the precision required must be specified.

    Note that objectives (or hypotheses) should never be stated in terms of lack of significance (for example "Confirm there is no significant difference in the density of grevillea planted on large and small farms"). Lack of significant effects can always be guaranteed by doing a poor survey!

    2. Decide which are the key objectives. Most surveys have multiple objectives which might require different sample sizes. Choose 2 or 3 principal objectives.

    3. Collect the information needed to make sample size calculations. Information on the variation expected ($s^2$, CV) and possibly useful stratification come from:

    - Previous surveys in the area
    - Similar surveys in other areas
    - Censuses
    - Rough estimates from informal data collection
    - Pilot surveys.

    4. Estimate the sample size for each of the main objectives, and select the largest.

    5. Estimate the cost (in terms of any limiting resource: money, transport, time...).

    If the cost of the required sample is too high go back to 1. The objectives will have to be made more modest! There is no point just reducing sample size to match available resources, as the objectives will not be met.
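Steps 3 to 5 can be sketched as a short calculation. Everything numeric below is hypothetical: the two objectives reuse planning values from the worked examples, and the cost per interview and budget are invented units for illustration. The formulae are the simple random sampling ones with t = 2, so any allowance for clustering or non-response would have to be added on top.

```python
# Step 3: planning values for the key objectives (hypothetical, e.g. from a pilot survey).
objectives = {
    "proportion owning cattle": {"kind": "proportion", "p": 0.8, "width": 0.2},
    "mean grevillea per 100m": {"kind": "mean", "variance": 37.2, "width": 4.0},
}


def required_n(spec):
    """Simple random sample size for one objective, using C = 4 se (t = 2)."""
    if spec["kind"] == "proportion":
        return 16 * spec["p"] * (1 - spec["p"]) / spec["width"] ** 2
    if spec["kind"] == "mean":
        return 16 * spec["variance"] / spec["width"] ** 2
    raise ValueError(f"unknown objective kind: {spec['kind']}")


# Step 4: sample size for each objective, then take the largest.
sizes = {name: required_n(spec) for name, spec in objectives.items()}
n_needed = max(sizes.values())

# Step 5: compare the cost with the limiting resource (hypothetical units).
cost_per_interview = 5.0
budget = 400.0
affordable = n_needed * cost_per_interview <= budget

for name, n in sizes.items():
    print(f"{name}: n of about {n:.0f}")
print(f"largest n of about {n_needed:.0f}, affordable: {affordable}")
```

If `affordable` came out false, the sketch would not shrink `n_needed`; following the text, the widths (that is, the objectives) would have to be revisited instead.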