Statistics- Random Sampling

download Statistics- Random Sampling

of 23

Transcript of Statistics- Random Sampling

  • 8/3/2019 Statistics- Random Sampling

    1/23

    Slide

    8-1

    2/10/2012

    Chapter 8

    Random Sampling: Planning

    Ahead for Data Gathering

  • 8/3/2019 Statistics- Random Sampling

    2/23

    Slide

    8-2

    2/10/2012

    Random Sampling

    The basis for statistical inference about a

    population based on a sample

    Example: Build restaurant in neighborhood?

    Population: the collection of items you want tounderstand

    Nitems. Example: all people in neighborhood

    Sample: a smaller collection of population units

    n items. Example: 100 neighborhood residents whoagree to be interviewed

    Which 100 residents?

    How to select?

  • 8/3/2019 Statistics- Random Sampling

    3/23

    Slide

    8-3

    2/10/2012

    Terminology of Sampling

    Representative Sample

    Same percentages in sample as in population

    Example: sample is representative if same percent:

    work, are young/old, are single/married, etc

    Biased Sample

    Not representative of population in an important way

    Example: sample is biased if too many retired people

    Frame: Access to population (by number from 1 to N)

    Example: from phone book:

    1. Fred Jones

    2. Mary Kipling

    N. Carol Lewis

  • 8/3/2019 Statistics- Random Sampling

    4/23

    Slide

    8-4

    2/10/2012

    Visualizing the Sampling Process

    Here is a fairly (but not perfectly) representative sample

    Note that neither open triangle was selected

    Sample

    Population

  • 8/3/2019 Statistics- Random Sampling

    5/23

    Slide

    8-5

    2/10/2012

    Sampling Terminology (continued)

    Sampling Without Replacement

    No unit may appear more than once in the sample

    After a unit is selected, it is removed from the population

    before another population unit is drawn

    Sampling With Replacement A unit can be represented more than once in the sample

    After a unit is selected, it is replaced back in the population

    before another unit is drawn

    Census A sample that consists of the entire population

    Often too expensive

    Even when affordable, may not be cost-efficient

  • 8/3/2019 Statistics- Random Sampling

    6/23

    Slide

    8-6

    2/10/2012

    Statistic and Parameter

    Sample Statistic

    Any number computed from sample data

    A random variable. Known

    Example: Average weekly food expenditures for 100 sampled

    residents Random? Yes! Due to randomness of sample selection

    Population Parameter

    Any number computed for the entire population

    A fixed number. Unknown

    Example: mean weekly food expenditures for all 77,386

    residents

    Do we ever know this? NO!

    But we estimate it (with error)

  • 8/3/2019 Statistics- Random Sampling

    7/23

    Slide

    8-7

    2/10/2012

    Estimator and Estimate

    Estimator

    A sample statistic used to guess a population parameter

    Example: Sample average for100 selected residents is an

    estimator of the population mean of all 77,386 residents

    Estimate [WRONG! Estimators are usually wrong. Often useful anyway]

    The actual number computed from the data

    Example: $33.91 is an estimate of neighborhood weekly food

    expenditures per person

    Estimation error Estimator minus population parameter. Unknown

    Example: 33.91 35.69 = 1.78

    Average, 100 residents Mean, all 77,386

    residents (unknown)

    Estimation error (unknown)

  • 8/3/2019 Statistics- Random Sampling

    8/23

    Slide

    8-8

    2/10/2012

    Sampling Terminology (continued)

    Unbiased Estimator

    Correct on average. Neither systematically too high nor

    too low

    Hits the target on average although it may miss each time

    Pilot Study

    Small-scale test before full study is run

    Test your questionnaire on a few people, change as needed,

    and then use it on hundreds more

    Sampling Distribution The probability distribution of a statistic computed

    using data from a random sample

  • 8/3/2019 Statistics- Random Sampling

    9/23

    Slide

    8-9

    2/10/2012

    Random Sample

    Provides a foundation for statistical inference

    A random sample must satisfy:

    1. Each population unit must have an equal chance of

    being selected

    This helps assure representation, because all units in the

    population are equally accessible

    2. Units must be selected independently of one another

    This guarantees that each item to be selected will bring new,

    independent information

    Properties

    Sample is representative of population (on average)

    Statistical inference will use randomness of the sample

  • 8/3/2019 Statistics- Random Sampling

    10/23

    Slide

    8-10

    2/10/2012

    Random Samples of5 from 36

    On 6by 6 grid

    5 squares shaded in, selected at random, one at a time

    Without replacement

    Note that adjacent (touching) squares can be selected

    To exclude them would make selection non-independent

    Perhaps you see systematic patterns

    They are not there by design, but by coincidence

    But a checkerboard pattern would very likely notbe random

  • 8/3/2019 Statistics- Random Sampling

    11/23

    Slide

    8-11

    2/10/2012

    Selecting a Random Sample

    Use Table of Random Digits

    Establish the frame (population units from 1 to N)

    Decide starting place in table of random digits

    Read random digits in groups

    e.g., ifN= 5,281, then use groups of4 digits (Nhas 4 digits)

    Include number group

    if it is from 1 to N, and has not yet been chosen

    Shuffle the Population (Spreadsheet)

    Arrange the population items in a column from 1 to N

    Put random numbers in an adjacent column: =RAND()

    Sort population items in order by random numbers

  • 8/3/2019 Statistics- Random Sampling

    12/23

    Slide

    8-12

    2/10/2012

    Table ofRandom Digits

    For example

    Starting in row 21, column 3

    We find 52794, then 01466

    1 2 3 4 5 6 7 8 9 10

    1 51449 39284 85527 67168 91284 19954 91166 70918 85957 194922 16144 56830 67507 97275 25982 69294 32841 20861 83114 12531

    3 48145 48280 99481 13050 81818 25282 66466 24461 97021 210724 83780 48351 85422 42978 26088 17869 94245 26622 48318 73850

    5 95329 38482 93510 39170 63683 40587 80451 43058 81923 970726 11179 69004 34273 36062 26234 58601 47159 82248 95968 99722

    7 94631 52413 31524 02316 27611 15888 13525 43809 40014 306678 64275 10294 35027 25604 65695 36014 17988 02734 31732 29911

    9 72125 19232 10782 30615 42005 90419 32447 53688 36125 2845610 16463 42028 27927 48403 88963 79615 41218 43290 53618 68082

    11 10036 66273 69506 19610 01479 92338 55140 81097 73071 61544

    12 85356 51400 88502 98267 73943 25828 38219 13268 09016 77465

    13 84076 82087 55053 75370 71030 92275 55497 97123 40919 57479

    14 76731 39755 78537 51937 11680 78820 50082 56068 36908 55399

    15 19032 73472 79399 05549 14772 32746 38841 45524 13535 03113

    16 72791 59040 61529 74437 74482 76619 05232 28616 98690 24011

    17 11553 00135 28306 65571 34465 47423 39198 54456 95283 54637

    18 71405 70352 46763 64002 62461 41982 15933 46942 36941 93412

    19 17594 10116 55483 96219 85493 96955 89180 59690 82170 7764320 09584 23476 09243 65568 89128 36747 63692 09986 47687 46448

    21 81677 62634 52794 01466 85938 14565 79993 44956 82254 6522322 45849 01177 13773 43523 69825 03222 58458 77463 58521 07273

    23 97252 92257 90419 01241 52516 66293 14536 23870 78402 4175924 26232 77422 76289 57587 42831 87047 20092 92676 12017 43554

    25 87799 33602 01931 66913 63008 03745 93939 07178 70003 1815826 46120 62298 69126 07862 76731 58527 39342 42749 57050 91725

    27 53292 55652 11834 47581 25682 64085 26587 92289 41853 3835428 81606 56009 06021 98392 40450 87721 50917 16978 39472 23505

    29 67819 47314 96988 89931 49395 37071 72658 53947 11996 6463130 50458 20350 87362 83996 86422 58694 71813 97695 28804 58523

    31 59772 27000 97805 25042 09916 77569 71347 62667 09330 02152

    32 94752 91056 08939 93410 59204 04644 44336 55570 21106 76588

    33 01885 82054 45944 55398 55487 56455 56940 68787 36591 29914

    34 85190 91941 86714 76593 77199 39724 99548 13827 84961 76740

    35 97747 67607 14549 08215 95408 46381 12449 03672 40325 77312

    36 43318 84469 26047 86003 34786 38931 34846 28711 42833 93019

    37 47874 71365 76603 57440 49514 17335 71969 58055 99136 7358938 24259 48079 71198 95859 94212 55402 93392 31965 94622 11673

    39 31947 64805 34133 03245 24546 48934 41730 47831 26531 0220340 37911 93224 87153 54541 57529 38299 65659 00202 07054 40168

    41 82714 15799 93126 74180 94171 97117 31431 00323 62793 1199542 82927 37884 74411 45887 36713 52339 68421 35968 67714 05883

    43 65934 21782 35804 36676 35404 69987 52268 19894 81977 8776444 56953 04356 68903 21369 35901 86797 83901 68681 02397 55359

    45 16278 17165 67843 49349 90163 97337 35003 34915 91485 3381446 96339 95028 48468 12279 81039 56531 10759 19579 00015 22829

    47 84110 49661 13988 75909 35580 18426 29038 79111 56049 9645148 49017 60748 03412 09880 94091 90052 43596 21424 16584 67970

    49 43560 05552 54344 69418 01327 07771 25364 77373 34841 7592750 25206 15177 63049 12464 16149 18759 96184 15968 89446 07168

    Table 8.2.1

    19 17594 10116 55483 96219 85493 96955

    20 09584 23476 09243 65568 89128 36747

    21 81677 62634 52794 01466 85938 14565

    22 45849 01177 13773 43523 69825 03222

    23 97252 92257 90419 01241 52516 66293

    24 26232 77422 76289 57587 42831 87047

  • 8/3/2019 Statistics- Random Sampling

    13/23

    Slide

    8-13

    2/10/2012

    Example: Sample Selection

    Select random sample using random number table

    Of size n = 4 from a population of size N= 861, starting

    at row 21, column 3 of the Table of Random Digits

    Start with random digits

    52794 01466 85938 14565 79993

    Group by 3 (because N= 861 has 3 digits)

    527 940 146 685 938 145 657

    Omit 000, and also 862, 863, , 999

    Omit duplicates, until n = 4 are obtained in the sample

    527 146 685 145

    The random sample includes the following population units

    (numbered by frame):

    527, 146, 685, 145

  • 8/3/2019 Statistics- Random Sampling

    14/23

    Slide

    8-14

    2/10/2012

    Visualizing the Sampling Process

    The process:

    Sample n, compute statistic

    is the same as the process:

    Choose 1 from sampling distribution

    Sample

    n units

    Statistic

    (estimator)

    Sample

    n units

    Statistic

    (estimator)

    Sample

    n units

    Statistic

    (estimator)

    Sample

    n units

    Statistic

    (estimator)

    The Population

    A histogram of these imagined

    values represents the sampling

    distribution of this statistic

  • 8/3/2019 Statistics- Random Sampling

    15/23

    Slide

    8-15

    2/10/2012

    Central Limit Theorem

    Helps you find probabilities for an average ofn

    independent individuals by giving you

    The mean

    The standard deviation

    The right to use the normal probability tables:

    Ifn is large, then the average is approximately normal,

    even if individuals are skewed

    Works for totals also by giving you

    The mean

    The standard deviation

    The right to use the normal probability tables fortotal

    Q!Q!QXX

    nX

    /W!W

    X

    Q!Q ntotal

    ntotal

    W!W

    X

  • 8/3/2019 Statistics- Random Sampling

    16/23

    Slide

    8-16

    2/10/2012

    Central Limit Theorem (continued)

    Individuals in population

    Highly non-normal distribution

    Mean Q, standard deviation W

    Averages ofn = 3 individuals

    Non-normal, but less so

    Same mean Q

    Lower std. deviation:

    Averages ofn = 10 individuals

    Close to normal

    Same mean Q

    Lower std. deviation

    0

    300

    0 5 10

    0

    100

    200

    0 5 10

    0

    100

    0 5 10

    3/W!WX

    10/W!WX

    Q

    Q

    Q

  • 8/3/2019 Statistics- Random Sampling

    17/23

    Slide

    8-17

    2/10/2012

    What good to us is ?

    Why bother choosing a larger sample to estimate Q?

    Even sampling just n= 1 (or a few) we have an unbiased

    estimator that is, on average, equal to Q

    But the error can be very large! When we sample n= 100 orn= 1,000, what we gain for our

    trouble is the LOWER ERROR , which says that

    our estimator, , is closer to the unknown population mean Q

    It tells us how the variability of the sample average

    is related to the variability of individuals in thepopulation

    This will be useful soon in defining thestandard error

    Central Limit Theorem (continued)

    nX

    /W!W

    nX

    /W!W

    X

    X

  • 8/3/2019 Statistics- Random Sampling

    18/23

  • 8/3/2019 Statistics- Random Sampling

    19/23

    Slide

    8-19

    2/10/2012

    Example (continued) The problem: Find the probability that the totalweight of a

    box exceeds 215 ounces

    By central limit theorem, total weight is

    approximately normal with

    Standardize, using CLT:

    Find

    Draw picture, use normal table

    Answer is 0.32

    210730 !v!Q!Q ntotal 95.10302 !!W!W ntotal

    "!

    "

    W

    Q46.0

    normal

    standardProb

    95.10

    210215Prob

    total

    totaltotal

    -3 -2 -1 0 1 2 3

    0.46

  • 8/3/2019 Statistics- Random Sampling

    20/23

    Slide

    8-20

    2/10/2012

    Standard Error

    Definition: the Estimated Standard Deviation (ofthe sampling distribution) of a statistic

    Standard Error of the Average

    Indicates approximately how far the sample average is

    from the population mean Q

    Advantage: can be computed using sample data

    Standard Deviation of the Average

    Also indicates approximately how far the sample average

    is from the population mean Q

    Problem: cannot be computed without population parameters

    X

    n

    SS

    X!

    n

    X

    W

    !W

    X

  • 8/3/2019 Statistics- Random Sampling

    21/23

    Slide

    8-21

    2/10/2012

    Standard Error for Binomial

    Standard Error ofp (binomial percent) is

    Indicates approximately how far the sample estimatep is fromits population value T

    Advantage: can be computed using sample data

    Standard Deviation ofp is

    Also indicates approximately how far the sample estimatep is

    from its population value T

    Problem: cannot be computed without population parameters

    n

    ppSp

    )1( !

    np

    )1( TT!W

  • 8/3/2019 Statistics- Random Sampling

    22/23

    Slide

    8-22

    2/10/2012

    Example of Standard Error

    n= 100 Sample Size: Number of people in sample

    = $23.91 Sample Average: Typical expenditure for people

    in sample. An estimate of typical expenditure Q

    for people in the population

    S= $11.49 Standard Deviation: Individuals in the sampleare approximately S= $11.49 away from =

    $23.91. Individuals in the population are

    approximately W=? away from Q=?

    Standard Error: Thesample average is approximately

    from Q=?

    X

    X

    15.1$100/49.11/ !!! nSSX

    15.1$!X

    S

  • 8/3/2019 Statistics- Random Sampling

    23/23

    Slide

    8-23

    2/10/2012

    Example: Binomial Standard Error

    X= 83 out ofn = 268 interviewed said they would buy theproduct

    p = X/n = 83/268 = 0.3097 or31.0% is the sample percent, an

    estimate ofT, the (unknown) population percent

    How far is the (known)p = 31.0% from the (unknown) T =?Answer: aboutone standard error

    The observed sample percentage,

    31.0%, is approximately 2.82%

    (in percentage points) away from

    the unknown population

    percentage T

    %82.20282.0268

    )3097.01(3097.0

    )1(

    !!

    v!

    !

    n

    ppSp