SADC Course in Statistics Estimation in Stratified Random Sampling (Session 07)
Statistics- Random Sampling
Transcript of Statistics- Random Sampling
-
8/3/2019 Statistics- Random Sampling
1/23
Slide
8-1
2/10/2012
Chapter 8
Random Sampling: Planning
Ahead for Data Gathering
-
8/3/2019 Statistics- Random Sampling
2/23
Slide
8-2
2/10/2012
Random Sampling
The basis for statistical inference about a
population based on a sample
Example: Build restaurant in neighborhood?
Population: the collection of items you want tounderstand
Nitems. Example: all people in neighborhood
Sample: a smaller collection of population units
n items. Example: 100 neighborhood residents whoagree to be interviewed
Which 100 residents?
How to select?
-
8/3/2019 Statistics- Random Sampling
3/23
Slide
8-3
2/10/2012
Terminology of Sampling
Representative Sample
Same percentages in sample as in population
Example: sample is representative if same percent:
work, are young/old, are single/married, etc
Biased Sample
Not representative of population in an important way
Example: sample is biased if too many retired people
Frame: Access to population (by number from 1 to N)
Example: from phone book:
1. Fred Jones
2. Mary Kipling
N. Carol Lewis
-
8/3/2019 Statistics- Random Sampling
4/23
Slide
8-4
2/10/2012
Visualizing the Sampling Process
Here is a fairly (but not perfectly) representative sample
Note that neither open triangle was selected
Sample
Population
-
8/3/2019 Statistics- Random Sampling
5/23
Slide
8-5
2/10/2012
Sampling Terminology (continued)
Sampling Without Replacement
No unit may appear more than once in the sample
After a unit is selected, it is removed from the population
before another population unit is drawn
Sampling With Replacement A unit can be represented more than once in the sample
After a unit is selected, it is replaced back in the population
before another unit is drawn
Census A sample that consists of the entire population
Often too expensive
Even when affordable, may not be cost-efficient
-
8/3/2019 Statistics- Random Sampling
6/23
Slide
8-6
2/10/2012
Statistic and Parameter
Sample Statistic
Any number computed from sample data
A random variable. Known
Example: Average weekly food expenditures for 100 sampled
residents Random? Yes! Due to randomness of sample selection
Population Parameter
Any number computed for the entire population
A fixed number. Unknown
Example: mean weekly food expenditures for all 77,386
residents
Do we ever know this? NO!
But we estimate it (with error)
-
8/3/2019 Statistics- Random Sampling
7/23
Slide
8-7
2/10/2012
Estimator and Estimate
Estimator
A sample statistic used to guess a population parameter
Example: Sample average for100 selected residents is an
estimator of the population mean of all 77,386 residents
Estimate [WRONG! Estimators are usually wrong. Often useful anyway]
The actual number computed from the data
Example: $33.91 is an estimate of neighborhood weekly food
expenditures per person
Estimation error Estimator minus population parameter. Unknown
Example: 33.91 35.69 = 1.78
Average, 100 residents Mean, all 77,386
residents (unknown)
Estimation error (unknown)
-
8/3/2019 Statistics- Random Sampling
8/23
Slide
8-8
2/10/2012
Sampling Terminology (continued)
Unbiased Estimator
Correct on average. Neither systematically too high nor
too low
Hits the target on average although it may miss each time
Pilot Study
Small-scale test before full study is run
Test your questionnaire on a few people, change as needed,
and then use it on hundreds more
Sampling Distribution The probability distribution of a statistic computed
using data from a random sample
-
8/3/2019 Statistics- Random Sampling
9/23
Slide
8-9
2/10/2012
Random Sample
Provides a foundation for statistical inference
A random sample must satisfy:
1. Each population unit must have an equal chance of
being selected
This helps assure representation, because all units in the
population are equally accessible
2. Units must be selected independently of one another
This guarantees that each item to be selected will bring new,
independent information
Properties
Sample is representative of population (on average)
Statistical inference will use randomness of the sample
-
8/3/2019 Statistics- Random Sampling
10/23
Slide
8-10
2/10/2012
Random Samples of5 from 36
On 6by 6 grid
5 squares shaded in, selected at random, one at a time
Without replacement
Note that adjacent (touching) squares can be selected
To exclude them would make selection non-independent
Perhaps you see systematic patterns
They are not there by design, but by coincidence
But a checkerboard pattern would very likely notbe random
-
8/3/2019 Statistics- Random Sampling
11/23
Slide
8-11
2/10/2012
Selecting a Random Sample
Use Table of Random Digits
Establish the frame (population units from 1 to N)
Decide starting place in table of random digits
Read random digits in groups
e.g., ifN= 5,281, then use groups of4 digits (Nhas 4 digits)
Include number group
if it is from 1 to N, and has not yet been chosen
Shuffle the Population (Spreadsheet)
Arrange the population items in a column from 1 to N
Put random numbers in an adjacent column: =RAND()
Sort population items in order by random numbers
-
8/3/2019 Statistics- Random Sampling
12/23
Slide
8-12
2/10/2012
Table ofRandom Digits
For example
Starting in row 21, column 3
We find 52794, then 01466
1 2 3 4 5 6 7 8 9 10
1 51449 39284 85527 67168 91284 19954 91166 70918 85957 194922 16144 56830 67507 97275 25982 69294 32841 20861 83114 12531
3 48145 48280 99481 13050 81818 25282 66466 24461 97021 210724 83780 48351 85422 42978 26088 17869 94245 26622 48318 73850
5 95329 38482 93510 39170 63683 40587 80451 43058 81923 970726 11179 69004 34273 36062 26234 58601 47159 82248 95968 99722
7 94631 52413 31524 02316 27611 15888 13525 43809 40014 306678 64275 10294 35027 25604 65695 36014 17988 02734 31732 29911
9 72125 19232 10782 30615 42005 90419 32447 53688 36125 2845610 16463 42028 27927 48403 88963 79615 41218 43290 53618 68082
11 10036 66273 69506 19610 01479 92338 55140 81097 73071 61544
12 85356 51400 88502 98267 73943 25828 38219 13268 09016 77465
13 84076 82087 55053 75370 71030 92275 55497 97123 40919 57479
14 76731 39755 78537 51937 11680 78820 50082 56068 36908 55399
15 19032 73472 79399 05549 14772 32746 38841 45524 13535 03113
16 72791 59040 61529 74437 74482 76619 05232 28616 98690 24011
17 11553 00135 28306 65571 34465 47423 39198 54456 95283 54637
18 71405 70352 46763 64002 62461 41982 15933 46942 36941 93412
19 17594 10116 55483 96219 85493 96955 89180 59690 82170 7764320 09584 23476 09243 65568 89128 36747 63692 09986 47687 46448
21 81677 62634 52794 01466 85938 14565 79993 44956 82254 6522322 45849 01177 13773 43523 69825 03222 58458 77463 58521 07273
23 97252 92257 90419 01241 52516 66293 14536 23870 78402 4175924 26232 77422 76289 57587 42831 87047 20092 92676 12017 43554
25 87799 33602 01931 66913 63008 03745 93939 07178 70003 1815826 46120 62298 69126 07862 76731 58527 39342 42749 57050 91725
27 53292 55652 11834 47581 25682 64085 26587 92289 41853 3835428 81606 56009 06021 98392 40450 87721 50917 16978 39472 23505
29 67819 47314 96988 89931 49395 37071 72658 53947 11996 6463130 50458 20350 87362 83996 86422 58694 71813 97695 28804 58523
31 59772 27000 97805 25042 09916 77569 71347 62667 09330 02152
32 94752 91056 08939 93410 59204 04644 44336 55570 21106 76588
33 01885 82054 45944 55398 55487 56455 56940 68787 36591 29914
34 85190 91941 86714 76593 77199 39724 99548 13827 84961 76740
35 97747 67607 14549 08215 95408 46381 12449 03672 40325 77312
36 43318 84469 26047 86003 34786 38931 34846 28711 42833 93019
37 47874 71365 76603 57440 49514 17335 71969 58055 99136 7358938 24259 48079 71198 95859 94212 55402 93392 31965 94622 11673
39 31947 64805 34133 03245 24546 48934 41730 47831 26531 0220340 37911 93224 87153 54541 57529 38299 65659 00202 07054 40168
41 82714 15799 93126 74180 94171 97117 31431 00323 62793 1199542 82927 37884 74411 45887 36713 52339 68421 35968 67714 05883
43 65934 21782 35804 36676 35404 69987 52268 19894 81977 8776444 56953 04356 68903 21369 35901 86797 83901 68681 02397 55359
45 16278 17165 67843 49349 90163 97337 35003 34915 91485 3381446 96339 95028 48468 12279 81039 56531 10759 19579 00015 22829
47 84110 49661 13988 75909 35580 18426 29038 79111 56049 9645148 49017 60748 03412 09880 94091 90052 43596 21424 16584 67970
49 43560 05552 54344 69418 01327 07771 25364 77373 34841 7592750 25206 15177 63049 12464 16149 18759 96184 15968 89446 07168
Table 8.2.1
19 17594 10116 55483 96219 85493 96955
20 09584 23476 09243 65568 89128 36747
21 81677 62634 52794 01466 85938 14565
22 45849 01177 13773 43523 69825 03222
23 97252 92257 90419 01241 52516 66293
24 26232 77422 76289 57587 42831 87047
-
8/3/2019 Statistics- Random Sampling
13/23
Slide
8-13
2/10/2012
Example: Sample Selection
Select random sample using random number table
Of size n = 4 from a population of size N= 861, starting
at row 21, column 3 of the Table of Random Digits
Start with random digits
52794 01466 85938 14565 79993
Group by 3 (because N= 861 has 3 digits)
527 940 146 685 938 145 657
Omit 000, and also 862, 863, , 999
Omit duplicates, until n = 4 are obtained in the sample
527 146 685 145
The random sample includes the following population units
(numbered by frame):
527, 146, 685, 145
-
8/3/2019 Statistics- Random Sampling
14/23
Slide
8-14
2/10/2012
Visualizing the Sampling Process
The process:
Sample n, compute statistic
is the same as the process:
Choose 1 from sampling distribution
Sample
n units
Statistic
(estimator)
Sample
n units
Statistic
(estimator)
Sample
n units
Statistic
(estimator)
Sample
n units
Statistic
(estimator)
The Population
A histogram of these imagined
values represents the sampling
distribution of this statistic
-
8/3/2019 Statistics- Random Sampling
15/23
Slide
8-15
2/10/2012
Central Limit Theorem
Helps you find probabilities for an average ofn
independent individuals by giving you
The mean
The standard deviation
The right to use the normal probability tables:
Ifn is large, then the average is approximately normal,
even if individuals are skewed
Works for totals also by giving you
The mean
The standard deviation
The right to use the normal probability tables fortotal
Q!Q!QXX
nX
/W!W
X
Q!Q ntotal
ntotal
W!W
X
-
8/3/2019 Statistics- Random Sampling
16/23
Slide
8-16
2/10/2012
Central Limit Theorem (continued)
Individuals in population
Highly non-normal distribution
Mean Q, standard deviation W
Averages ofn = 3 individuals
Non-normal, but less so
Same mean Q
Lower std. deviation:
Averages ofn = 10 individuals
Close to normal
Same mean Q
Lower std. deviation
0
300
0 5 10
0
100
200
0 5 10
0
100
0 5 10
3/W!WX
10/W!WX
Q
Q
Q
-
8/3/2019 Statistics- Random Sampling
17/23
Slide
8-17
2/10/2012
What good to us is ?
Why bother choosing a larger sample to estimate Q?
Even sampling just n= 1 (or a few) we have an unbiased
estimator that is, on average, equal to Q
But the error can be very large! When we sample n= 100 orn= 1,000, what we gain for our
trouble is the LOWER ERROR , which says that
our estimator, , is closer to the unknown population mean Q
It tells us how the variability of the sample average
is related to the variability of individuals in thepopulation
This will be useful soon in defining thestandard error
Central Limit Theorem (continued)
nX
/W!W
nX
/W!W
X
X
-
8/3/2019 Statistics- Random Sampling
18/23
-
8/3/2019 Statistics- Random Sampling
19/23
Slide
8-19
2/10/2012
Example (continued) The problem: Find the probability that the totalweight of a
box exceeds 215 ounces
By central limit theorem, total weight is
approximately normal with
Standardize, using CLT:
Find
Draw picture, use normal table
Answer is 0.32
210730 !v!Q!Q ntotal 95.10302 !!W!W ntotal
"!
"
W
Q46.0
normal
standardProb
95.10
210215Prob
total
totaltotal
-3 -2 -1 0 1 2 3
0.46
-
8/3/2019 Statistics- Random Sampling
20/23
Slide
8-20
2/10/2012
Standard Error
Definition: the Estimated Standard Deviation (ofthe sampling distribution) of a statistic
Standard Error of the Average
Indicates approximately how far the sample average is
from the population mean Q
Advantage: can be computed using sample data
Standard Deviation of the Average
Also indicates approximately how far the sample average
is from the population mean Q
Problem: cannot be computed without population parameters
X
n
SS
X!
n
X
W
!W
X
-
8/3/2019 Statistics- Random Sampling
21/23
Slide
8-21
2/10/2012
Standard Error for Binomial
Standard Error ofp (binomial percent) is
Indicates approximately how far the sample estimatep is fromits population value T
Advantage: can be computed using sample data
Standard Deviation ofp is
Also indicates approximately how far the sample estimatep is
from its population value T
Problem: cannot be computed without population parameters
n
ppSp
)1( !
np
)1( TT!W
-
8/3/2019 Statistics- Random Sampling
22/23
Slide
8-22
2/10/2012
Example of Standard Error
n= 100 Sample Size: Number of people in sample
= $23.91 Sample Average: Typical expenditure for people
in sample. An estimate of typical expenditure Q
for people in the population
S= $11.49 Standard Deviation: Individuals in the sampleare approximately S= $11.49 away from =
$23.91. Individuals in the population are
approximately W=? away from Q=?
Standard Error: Thesample average is approximately
from Q=?
X
X
15.1$100/49.11/ !!! nSSX
15.1$!X
S
-
8/3/2019 Statistics- Random Sampling
23/23
Slide
8-23
2/10/2012
Example: Binomial Standard Error
X= 83 out ofn = 268 interviewed said they would buy theproduct
p = X/n = 83/268 = 0.3097 or31.0% is the sample percent, an
estimate ofT, the (unknown) population percent
How far is the (known)p = 31.0% from the (unknown) T =?Answer: aboutone standard error
The observed sample percentage,
31.0%, is approximately 2.82%
(in percentage points) away from
the unknown population
percentage T
%82.20282.0268
)3097.01(3097.0
)1(
!!
v!
!
n
ppSp