Lecture 0317(1)

  • 8/12/2019 Lecture 0317(1)

    Sampling and Sampling Distributions

    Biswo Poudel

    KUSOM

    Why Take Samples?

    To save money and time.

    To maximize the information gleaned from limited resources.

    Sometimes it is the only option, for example when access to the whole population is impossible. (How would you survey the owners of old Omega watches in Nepal?)

    Census vs sampling

    Question: When is taking a census a better option than taking a sample?

    Answer: When omitting any group of the population is not tolerable to the researcher.

    Example: all airplanes are tested thoroughly because the performance of each individual airplane is so important.

    Frame

    The sample is taken from the target population.

    The population list, map, directory or other source used to represent that population is called the frame.

    A frame can also be a school list, a trade association list, or a list sold by list brokers.

    Frames may have overregistration (including more than the target population) or underregistration (omitting part of it).

    Random Vs Nonrandom Sampling

    Random Sampling: Every unit of the population has the same probability of being selected into the sample. For example: lottery outcomes. This is also called probability sampling.

    Nonrandom Sampling: Not every unit of the population has the same probability of being selected into the sample. This is also called nonprobability sampling. Assigning the probability of occurrence in nonrandom sampling is impossible.

    Nonrandom sampling data are not amenable to analysis by most statistical techniques.

    Random Sample Techniques

    There are four basic random sampling techniques.

    1. Simple Random Sampling: this is the most elementary technique. Number each unit of the frame from 1 to N. Select n of those units into the sample by using a random number generator.
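
    The numbering-and-drawing procedure above can be sketched in Python. This is only an illustration: the frame of 500 numbered units, the sample size of 20 and the function name simple_random_sample are assumptions, not part of the lecture.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame without replacement; every unit is equally likely."""
    rng = random.Random(seed)   # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical frame: units numbered 1..N
N = 500
frame = list(range(1, N + 1))

sample = simple_random_sample(frame, n=20, seed=42)
print(sorted(sample))
```

    Passing a seed is optional; it is used here only so that the same sample can be drawn again.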

    2. Stratified Random Sampling

    Here the population is subdivided into nonoverlapping subpopulations called strata.

    The researcher then extracts a random sample from each subpopulation. This has the potential to reduce sampling error.

    How to choose strata? (a) Each stratum must be internally homogeneous and must contrast with the other strata. (b) Stratify by demographic variables such as gender, socioeconomic class, geographic region, religion and ethnicity.

    Stratified random sampling (SRS) can be either proportionate or disproportionate.

    Proportionate SRS occurs when the percentage of the sample taken from each stratum equals the percentage that the stratum makes up of the whole population. If the sampling is not proportionate, it is disproportionate SRS.

    Example of proportionate SRS: Suppose we are sampling the population of Kathmandu, which is 30% Newar, and the population has been divided into strata by ethnicity. If you are taking a sample of 100 people, then you want to make sure 30 Newars are in the sample.
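
    A minimal sketch of proportionate allocation follows. The frame of 10,000 residents, the two stratum labels and the function name proportionate_stratified_sample are hypothetical; only the 30% share and the sample size of 100 come from the example.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to the list of its units.

    Each stratum contributes round(n * share) units, drawn at random
    within the stratum. With awkward shares the rounded counts may not
    add up to exactly n; this sketch does not handle that case.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(n * len(units) / total)   # stratum's share of the sample
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical frame: 10,000 residents, 30% of them Newar
strata = {
    "Newar": [f"N{i}" for i in range(3000)],
    "Other": [f"O{i}" for i in range(7000)],
}

sample = proportionate_stratified_sample(strata, n=100, seed=1)
print({name: len(units) for name, units in sample.items()})
# {'Newar': 30, 'Other': 70}
```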

    3. Systematic Sampling

    Used because of its convenience and relative ease of administration.

    Every kth item is selected to produce a sample of size n from a population of size N.

    The value of k, sometimes called the sampling cycle, is given by $k = N/n$.

    For this to be useful, the ordering of the population elements in the source must be random.
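
    The every-kth-item rule can be sketched as below. The random starting point within the first k units is an assumption added for the illustration (the slide does not say where to start), and the frame of 1,000 units is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit, with k = N // n, beginning at a random start."""
    N = len(frame)
    k = N // n                      # sampling cycle
    rng = random.Random(seed)
    start = rng.randrange(k)        # assumed: random start among the first k units
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 1001))        # hypothetical frame, N = 1000
sample = systematic_sample(frame, n=50, seed=7)
print(sample[:5])
```

    With N = 1000 and n = 50 the cycle is k = 20, so consecutive selected units are always 20 positions apart.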

    4. Cluster (Area) Sampling

    Divide the population into nonoverlapping areas (clusters) that are internally heterogeneous. Each cluster is, in theory, a microcosm of the population.

    For example, Chitwan could be a cluster when taking a sample of Nepal. Other cities, districts and metropolitan areas can also qualify as clusters.

    After choosing clusters, the researcher either selects all elements from each cluster or randomly selects individual elements into the sample from the clusters.

    Two-stage sampling: when the clusters are too big, a second, smaller cluster is picked from within each big cluster.

    Advantage: cost and convenience. Since all data are collected from within a cluster, the travel cost is reduced.

    Disadvantage: If the elements of a cluster are similar, cluster sampling may be less efficient than simple random sampling. If all elements of a cluster are the same, sampling the cluster is no better than sampling one individual.
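
    The two ways of using the chosen clusters (take every element, or draw elements at random within each cluster) can be sketched as follows. The district names are used only as labels for a made-up frame, and cluster_sample and its parameters are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """clusters maps a cluster name to the list of its units.

    First pick n_clusters clusters at random. Then either keep every
    unit of each chosen cluster (n_per_cluster is None) or draw
    n_per_cluster units at random from it.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        if n_per_cluster is None:
            sample[name] = list(units)
        else:
            take = min(n_per_cluster, len(units))
            sample[name] = rng.sample(units, take)
    return sample

# Hypothetical frame: household IDs grouped by district
clusters = {
    "Chitwan": [f"C{i}" for i in range(40)],
    "Kaski": [f"K{i}" for i in range(30)],
    "Lalitpur": [f"L{i}" for i in range(50)],
    "Jhapa": [f"J{i}" for i in range(35)],
}

sample = cluster_sample(clusters, n_clusters=2, n_per_cluster=5, seed=3)
print({name: len(units) for name, units in sample.items()})
```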

    Nonrandom Sampling techniques

    Also called nonprobability techniques, since chance is not used to select elements for the sample.

    Four nonrandom sampling techniques are presented here.

    1. Convenience Sampling: elements for the sample are selected for the convenience of the researcher. The researcher chooses elements that are readily available.

    2. Judgment Sampling

    Elements selected for the sample are chosen by the judgment of the researcher.

    Researchers often believe they can obtain the right sample by using their sound judgment.

    Sampling errors are hard to determine because the samples are put together nonrandomly.

    Problems: judgment errors might all be in one direction (introducing bias), and the sample is unlikely to include extreme elements.

    3. Quota Sampling

    In essence, similar to Stratified Random Sampling (SRS).

    Certain population subclasses are used as strata. A nonrandom sampling technique is then used to gather data from each stratum.

    For example: one may go to a Newar community (say in Sundhara, Lalitpur) and interview people there until the quota is filled.

    Advantage: low cost and ease of administration.

    Disadvantage: it is essentially a nonrandom sampling technique.

    4. Snowball Sampling

    Survey subjects are selected based on referrals from other survey respondents.

    First pick a person who fits the profile of the subjects wanted for the study. Then ask this person to refer others who have a similar profile.

    Advantage: survey subjects are identified cheaply and efficiently.

    Disadvantage: this is nonrandom.

    Sampling Errors and Nonsampling Errors

    Sampling errors: errors that occur when the sample is not representative of the population.

    Nonsampling errors: all other errors, such as missing data, recording errors, input processing errors, analysis errors, response errors, errors caused by the measurement instrument, defective questionnaire errors, poor concept errors, etc.

    Sample Mean and Sample Proportion

    Whenever research produces measurable data such as weight, distance, time and income, the sample mean is often the statistic of choice. If the research results in countable items, such as how many people in a sample choose Coca Cola, the sample proportion is often the statistic of choice.

    Sample Proportion ($\hat{p}$):

    $\hat{p} = \frac{x}{n}$

    where x = number of items in a sample that have the characteristic, and n = number of items in the sample.
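
    As a small illustration of the two statistics, the sketch below computes a sample mean for measured data and a sample proportion for counted data. Both data sets are made up for the example.

```python
from statistics import mean

# Hypothetical measured data (e.g. income in thousands)
incomes = [32, 41, 28, 55, 47, 39]

# Hypothetical counted data: which drink each respondent chose
choices = ["Coca Cola", "Other", "Coca Cola", "Other",
           "Coca Cola", "Coca Cola", "Other", "Coca Cola"]

x_bar = mean(incomes)                                # sample mean

x = sum(1 for c in choices if c == "Coca Cola")      # items with the characteristic
n = len(choices)                                     # items in the sample
p_hat = x / n                                        # sample proportion

print(round(x_bar, 2), p_hat)
# 40.33 0.625
```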

    Distribution of Sample Mean

    Central Limit Theorem: If samples of size n are drawn randomly from a population that has a mean of $\mu$ and a standard deviation of $\sigma$, then the sample means $\bar{x}$ are approximately normally distributed for sufficiently large sample sizes (greater than 30), regardless of the shape of the population distribution. If the population is normally distributed, the sample means are normally distributed for any sample size.

    The sample means have mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, so the z-value of a sample mean is $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
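
    The theorem can be checked informally by simulation. The sketch below assumes a skewed exponential population with mean 1 and standard deviation 1, a sample size of 36 and 5,000 repetitions; these choices, and the helper name sample_means, are illustrative and not from the lecture.

```python
import random
from statistics import mean, stdev

def sample_means(draw, n, reps, seed=0):
    """Return reps sample means, each computed from n independent draws."""
    rng = random.Random(seed)
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]

# Assumed population: exponential, mean 1, standard deviation 1 (strongly skewed)
n = 36
means = sample_means(lambda rng: rng.expovariate(1.0), n=n, reps=5000)

print("mean of sample means:", round(mean(means), 3))    # should be close to 1
print("sd of sample means:  ", round(stdev(means), 3))   # should be close to 1/6
```

    Even though the population is far from normal, the sample means cluster around the population mean with a spread close to $\sigma/\sqrt{n} = 1/6$.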

    Example:

    Suppose during any hour in a large department store, the average number of shoppers is 448, with a standard deviation of 21 shoppers. What is the probability that a random sample of 49 different shopping hours will yield a sample mean between 441 and 446 shoppers?

    Answer: the problem is to determine $P(441 \le \bar{x} \le 446)$.

    Notice that

    $z = \frac{441 - 448}{21/\sqrt{49}} = -2.33$ and $z = \frac{446 - 448}{21/\sqrt{49}} = -0.67$,

    with associated z-table values 0.4901 and 0.2486.

    This leads to the probability of the value being between 441 and 446 to be 0.4901 - 0.2486 = 0.2415, i.e. 24.15%.
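
    The same calculation can be reproduced with the standard normal distribution from Python's statistics module instead of a printed z-table. This is a sketch of the arithmetic only; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 448, 21, 49
se = sigma / sqrt(n)              # standard deviation of the sample mean: 21/7 = 3

z_low = (441 - mu) / se           # about -2.33
z_high = (446 - mu) / se          # about -0.67

std_normal = NormalDist()         # mean 0, standard deviation 1
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(z_low, 2), round(z_high, 2), round(prob, 4))
```

    The result differs slightly from the table-based 0.2415 because the z-values are not rounded to two decimals before the lookup.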

    Correction for finite population

    If the sample is taken from a finite population of size N, then the z-value for sample size n has to be calculated using the following formula:

    $z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}}$

    Example

    A production company's 350 hourly employees average 37.6 years of age, with a standard deviation of 8.3 years. If a random sample of 45 hourly employees is taken, what is the probability that the sample will have an average age of less than 40 years?

    $z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}} = \frac{40 - 37.6}{\frac{8.3}{\sqrt{45}} \sqrt{\frac{350 - 45}{350 - 1}}} = 2.07$

    Associated probability: 0.4808. The probability of getting an average of less than 40 years is 0.5 + 0.4808 = 0.9808.

    If the correction had not been used...

    The answer would have been 0.9738 (with an associated z-value of 1.94).
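
    Both versions of the employee-age example, with and without the finite population correction, can be reproduced with the sketch below. The function name z_sample_mean and its optional N argument are assumptions made for the illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_sample_mean(x_bar, mu, sigma, n, N=None):
    """z-value of a sample mean; applies the finite correction when N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))   # finite population correction factor
    return (x_bar - mu) / se

z_fpc = z_sample_mean(40, 37.6, 8.3, n=45, N=350)   # with correction
z_plain = z_sample_mean(40, 37.6, 8.3, n=45)        # without correction

p_fpc = NormalDist().cdf(z_fpc)       # P(sample mean < 40), corrected
p_plain = NormalDist().cdf(z_plain)   # P(sample mean < 40), uncorrected

print(round(z_fpc, 2), round(z_plain, 2))
# 2.07 1.94
```

    The correction shrinks the standard error, so the corrected z-value is larger and the probability is slightly higher than without it.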

    Sampling Distribution of proportion

    The normal distribution approximates the shape of the distribution of sample proportions if $n \cdot p > 5$ and $n \cdot q > 5$ (where $q = 1 - p$).

    Z-value for a proportion:

    $z = \frac{\hat{p} - p}{\sqrt{\frac{p \cdot q}{n}}}$

    Example:

    Suppose 60% of the electrical contractors in a region use a particular brand of wire. What is the probability of taking a random sample of size 120 from these electrical contractors and finding that 0.5 or less use that brand of wire?

    Here $p = 0.60$; $\hat{p} = 0.50$; $n = 120$;

    $z = \frac{\hat{p} - p}{\sqrt{\frac{p \cdot q}{n}}} = \frac{0.5 - 0.6}{\sqrt{\frac{0.6 \times 0.4}{120}}} = -2.24$

    The z-table value associated with -2.24 is 0.4875. Hence the probability of z falling below this value is 0.5 - 0.4875 = 0.0125.
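
    A short sketch of the same calculation, including the n.p > 5 and n.q > 5 check from the sampling distribution slide, is given below. The function name z_sample_proportion is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_sample_proportion(p_hat, p, n):
    """z-value of a sample proportion under the normal approximation."""
    q = 1 - p
    if not (n * p > 5 and n * q > 5):
        raise ValueError("normal approximation needs n*p > 5 and n*q > 5")
    return (p_hat - p) / sqrt(p * q / n)

z = z_sample_proportion(p_hat=0.50, p=0.60, n=120)
prob = NormalDist().cdf(z)        # P(sample proportion <= 0.50)

print(round(z, 2), round(prob, 4))
```

    The printed probability is close to, but not exactly, 0.0125, because the z-value is not rounded to -2.24 before the lookup.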

    Example 2:

    If 10% of a population of parts is defective, what is the probability of randomly selecting 80 parts and finding that 12 or more parts are defective?

    Here $\hat{p} = 12/80 = 0.15$; $p = 0.10$;

    $z = \frac{0.15 - 0.10}{\sqrt{\frac{0.10 \times 0.90}{80}}} = 1.49$

    which is associated with an upper-tail probability of 0.0681.
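
    For this example the quantity of interest is the upper tail, so the sketch below converts the count to a proportion and takes one minus the cumulative probability. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, x, p = 80, 12, 0.10

p_hat = x / n                                  # 12/80 = 0.15
z = (p_hat - p) / sqrt(p * (1 - p) / n)        # about 1.49
upper_tail = 1 - NormalDist().cdf(z)           # P(sample proportion >= 0.15)

print(round(z, 2), round(upper_tail, 3))
# 1.49 0.068
```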