Sampling Distributions
Lecture 9
Background
We want to learn about a feature of a population (a parameter). In many situations it is impossible to examine every element of a population, because the elements are physically inaccessible, examining all of them is too costly, or the examination would destroy the item.
A sample is a relatively small subset of the total population. We study a random sample to draw conclusions about the population; this is where statistics come into the picture. Statistics such as the sample mean and the sample variance are computed from sample measurements and vary from sample to sample. Therefore, they are random variables.
The probability distribution of a statistic is called a sampling distribution.
Sampling Distributions
Two sampling distributions are considered: the sampling distribution of the mean and the sampling distribution of the proportion.
A sampling distribution is the distribution of all possible values of a statistic for samples of a given size selected from a population.
Developing a Sampling Distribution
Assume there is a population of size N = 4, consisting of individuals A, B, C, D.
The random variable X is the age of an individual.
Values of X: 18, 20, 22, 24 (years) for A, B, C, D, respectively.
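Developing a Sampling Distribution (continued)
[Figure: population distribution, P(x) plotted against x = 18, 20, 22, 24 for individuals A, B, C, D; each value has probability 0.25]
Summary measures for the population distribution:
$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$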
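As a quick check of the arithmetic, the two population summary measures can be computed directly. This is a minimal Python sketch with illustrative variable names; `statistics.pstdev` from the standard library would give the same σ.

```python
import math

# Ages (years) of the N = 4 individuals A, B, C, D
population = [18, 20, 22, 24]
N = len(population)

# Population mean: mu = sum(X_i) / N
mu = sum(population) / N

# Population standard deviation: divide by N (not N - 1) because this is the whole population
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)

print(mu, round(sigma, 3))  # 21.0 2.236
```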
All possible samples of size n = 2, sampling with replacement:
Samples Ages Sample means
A, A 18, 18 18
A, B 18, 20 19
A, C 18, 22 20
A, D 18, 24 21
B, A 20, 18 19
B, B 20, 20 20
B, C 20, 22 21
B, D 20, 24 22
C, A 22, 18 20
C, B 22, 20 21
C, C 22, 22 22
C, D 22, 24 23
D, A 24, 18 21
D, B 24, 20 22
D, C 24, 22 23
D, D 24, 24 24
Developing a Sampling Distribution (continued)
The 16 sample means, by first and second observation:
1st Obs \ 2nd Obs   18   20   22   24
18                  18   19   20   21
20                  19   20   21   22
22                  20   21   22   23
24                  21   22   23   24
Sampling distribution of all sample means:
[Figure: sample means distribution of the 16 sample means, P($\bar X$) plotted against $\bar X$ = 18, 19, 20, 21, 22, 23, 24]
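The 16 samples and the resulting distribution of the sample mean can be enumerated mechanically. A minimal sketch, assuming ordered samples drawn with replacement, so that (A, B) and (B, A) count separately as they do in the table:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [18, 20, 22, 24]  # ages of A, B, C, D
n = 2                          # sample size

# All ordered samples of size n drawn WITH replacement: 4**2 = 16 samples
samples = list(product(population, repeat=n))
means = [sum(s) / n for s in samples]

# Exact probability of each sample mean (all 16 samples are equally likely)
counts = Counter(means)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, p in dist.items():
    print(m, p)
```

Each probability is the count of that mean divided by 16; for example, a mean of 21 arises from 4 of the 16 samples, giving 1/4.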
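Developing a Sampling Distribution (continued)
Summary measures of this sampling distribution (note that N = 16 for the population of sample means):
$$\mu_{\bar X} = \frac{\sum \bar X_i}{N} = \frac{18 + 19 + 20 + \cdots + 24}{16} = 21$$
$$\sigma_{\bar X} = \sqrt{\frac{\sum (\bar X_i - \mu_{\bar X})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$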
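The two summary measures of the sampling distribution can be verified the same way. This sketch recomputes the 16 sample means, treats them as a population of N = 16 values, and also prints σ/√n for comparison.

```python
import math
from itertools import product

population = [18, 20, 22, 24]
n = 2
mu = sum(population) / len(population)
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / len(population))

# The "population" of sample means: one mean per ordered sample, N = 16
means = [sum(s) / n for s in product(population, repeat=n)]
N = len(means)

mu_xbar = sum(means) / N
sigma_xbar = math.sqrt(sum((m - mu_xbar) ** 2 for m in means) / N)

print(mu_xbar, round(sigma_xbar, 2))   # 21.0 1.58
print(round(sigma / math.sqrt(n), 2))  # 1.58, i.e. sigma / sqrt(n)
```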
Comparing the Population with its Sampling Distribution (with replacement)
[Figure: two panels. Population distribution (N = 4): P(X) against X = 18, 20, 22, 24 (A, B, C, D), with $\mu = 21$, $\sigma = 2.236$. Sample means distribution (n = 2): P($\bar X$) against $\bar X$ = 18, ..., 24, with $\mu_{\bar X} = 21$, $\sigma_{\bar X} = 1.58$]
Mean and Standard Error of the Sample Mean (sampling with replacement)
The mean of the distribution of the sample mean equals the population mean:
$$\mu_{\bar X} = \mu$$
A measure of the variability in the mean from sample to sample is given by the standard error of the mean:
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}$$
(This assumes that sampling is with replacement, or without replacement from an infinite population.)
Note that the standard error of the mean decreases as the sample size increases.
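The claim that the standard error shrinks like 1/√n can be checked exactly for the small age population by enumerating every sample of size n = 1, 2, 3, 4 drawn with replacement (4^n samples each). A sketch; the helper name `standard_error_by_enumeration` is hypothetical.

```python
import math
from itertools import product

population = [18, 20, 22, 24]
N = len(population)
mu = sum(population) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)

def standard_error_by_enumeration(n):
    """Std. dev. of X-bar over all N**n equally likely samples drawn with replacement."""
    means = [sum(s) / n for s in product(population, repeat=n)]
    m = sum(means) / len(means)
    return math.sqrt(sum((x - m) ** 2 for x in means) / len(means))

for n in range(1, 5):
    # Enumerated standard error next to the formula sigma / sqrt(n)
    print(n, round(standard_error_by_enumeration(n), 4), round(sigma / math.sqrt(n), 4))
```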
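If the Population is Normal
If a population is normal with mean μ and standard deviation σ, the sampling distribution of
$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$$
is also normally distributed with
$$\mu_{\bar X} = \mu \quad\text{and}\quad \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}.$$
Or, equivalently, the sampling distribution of $\sum_{i=1}^{n} X_i$ is normally distributed with
$$\mu_{\sum X_i} = n\mu \quad\text{and}\quad \sigma_{\sum X_i} = \sqrt{n}\,\sigma.$$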
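Sampling Distribution Properties (continued)
As n increases, $\sigma_{\bar X}$ decreases.
[Figure: sampling distributions of $\bar X$ for a smaller and a larger sample size, both centered at μ; the larger sample size gives the narrower distribution]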
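If the Population is Not Normal
The central limit theorem states that when the number of observations in each sample (called the sample size) gets large enough, the sampling distribution of
$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$$
is approximately normally distributed with
$$\mu_{\bar X} = \mu \quad\text{and}\quad \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}.$$
Or, equivalently, the sampling distribution of $\sum_{i=1}^{n} X_i$ is also approximately normally distributed with
$$\mu_{\sum X_i} = n\mu \quad\text{and}\quad \sigma_{\sum X_i} = \sqrt{n}\,\sigma.$$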
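A small simulation illustrates the theorem for a population that is clearly not normal. The population is an assumption made purely for illustration: an exponential distribution with mean 1 and standard deviation 1, which is strongly right-skewed. With n = 40, the sample means should center on μ, have a spread close to σ/√n ≈ 0.158, and fall within ±1.96 standard errors of μ about 95% of the time, as a normal distribution would.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Assumed population for illustration: exponential with rate 1 (mean 1, sd 1), right-skewed
mu, sigma = 1.0, 1.0
n = 40          # observations per sample
reps = 10_000   # number of simulated samples

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

se = sigma / n ** 0.5  # theoretical standard error sigma / sqrt(n)
share_within = sum(abs(m - mu) < 1.96 * se for m in sample_means) / reps

print(round(statistics.fmean(sample_means), 3))   # close to mu = 1
print(round(statistics.pstdev(sample_means), 3))  # close to se, about 0.158
print(round(share_within, 3))                     # roughly 0.95
```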
Visualizing the Central Limit Theorem
[Figure: a population distribution alongside the sampling distribution of $\bar X$ for a smaller and a larger sample size; the sampling distribution becomes normal as n increases]
Sampling distribution properties:
Central tendency: $\mu_{\bar X} = \mu$
Variation: $\sigma_{\bar X} = \dfrac{\sigma}{\sqrt{n}}$
How Large is Large Enough?
For most distributions, n > 30 gives a sampling distribution that is nearly normal.
For fairly symmetric distributions, n > 15 is usually enough.
Recall that for a normal population distribution, the sampling distribution of the mean is always normally distributed, regardless of the sample size n.
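Calculating Probabilities
Suppose we want to find $P(a < \bar X < b)$.
If the population is normal, then regardless of the value of n:
$$P(a < \bar X < b) = P\left(\frac{a - \mu}{\sigma/\sqrt{n}} < Z < \frac{b - \mu}{\sigma/\sqrt{n}}\right)$$
If the population is not normal, then, when n is large enough (n > 30):
$$P(a < \bar X < b) \approx P\left(\frac{a - \mu}{\sigma/\sqrt{n}} < Z < \frac{b - \mu}{\sigma/\sqrt{n}}\right)$$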
Example
Suppose a population has mean μ = 10 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.
What is the probability that the sample mean is between 9.7 and 10.3?
Example (continued)
Solution:
Even if the population is not normally distributed, the central limit theorem can be used (n > 30), so the sampling distribution of $\bar X$ is approximately normal, with mean $\mu_{\bar X} = \mu = 10$ and standard deviation
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$
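Example (continued)
Solution (continued):
$$P(9.7 < \bar X < 10.3) = P\left(\frac{9.7 - 10}{3/\sqrt{36}} < \frac{\bar X - \mu}{\sigma/\sqrt{n}} < \frac{10.3 - 10}{3/\sqrt{36}}\right) = P(-0.6 < Z < 0.6) = 0.4514$$
[Figure: population distribution and sampling distribution of $\bar X$, both centered at $\mu = \mu_{\bar X} = 10$, with 9.7 and 10.3 marked on the sampling distribution]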
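The probability calculation can be wrapped in a small helper. This sketch uses `statistics.NormalDist` from the Python standard library (3.8+) for the standard normal CDF; the function name `prob_sample_mean_between` is hypothetical. Because it uses the exact CDF instead of a four-digit table, it prints 0.4515 rather than the table value 0.4514.

```python
import math
from statistics import NormalDist

def prob_sample_mean_between(a, b, mu, sigma, n):
    """P(a < X-bar < b), treating X-bar as N(mu, sigma**2 / n).

    Exact for a normal population; a CLT approximation otherwise (large n).
    """
    se = sigma / math.sqrt(n)   # standard error of the mean
    std_normal = NormalDist()   # Z ~ N(0, 1)
    return std_normal.cdf((b - mu) / se) - std_normal.cdf((a - mu) / se)

# Example above: mu = 10, sigma = 3, n = 36
p = prob_sample_mean_between(9.7, 10.3, mu=10, sigma=3, n=36)
print(round(p, 4))  # 0.4515
```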
One more example
Time spent using e-mail per session is normally distributed with μ = 8 minutes and σ = 2 minutes.
1. If a random sample of 25 sessions is selected, what proportion of sample means would be between 7.8 and 8.2 minutes?
Example (Cont’d)
2. If a random sample of 100 sessions is selected, what proportion of sample means would be between 7.8 and 8.2 minutes?
3. What sample size would you suggest if it is desired to have at least a 0.90 probability that the sample mean is within 0.2 minutes of the population mean?
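One way to work the three e-mail questions numerically is sketched below; it is an illustration, not an official solution. Since 7.8 and 8.2 are each 0.2 minutes from μ = 8, parts 1 and 2 ask for $P(|\bar X - \mu| < 0.2)$. For part 3, requiring $0.2/(\sigma/\sqrt{n}) \ge z_{0.95}$ gives the smallest n, where $z_{0.95}$ is the 95th percentile of Z (a 0.90 probability in the middle leaves 0.05 in each tail). The helper name `prob_within` is hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()       # standard normal
mu, sigma = 8.0, 2.0   # minutes per session (normal population)

def prob_within(delta, n):
    """P(|X-bar - mu| < delta) for a normal population, X-bar ~ N(mu, sigma**2 / n)."""
    se = sigma / math.sqrt(n)
    return Z.cdf(delta / se) - Z.cdf(-delta / se)

p25 = prob_within(0.2, 25)    # part 1: se = 0.4, z = 0.5
p100 = prob_within(0.2, 100)  # part 2: se = 0.2, z = 1.0

# Part 3: need 0.2 / (sigma / sqrt(n)) >= z_{0.95}, i.e. n >= (z_{0.95} * sigma / 0.2) ** 2
z95 = Z.inv_cdf(0.95)
n_min = math.ceil((z95 * sigma / 0.2) ** 2)

print(round(p25, 4), round(p100, 4), n_min)  # 0.3829 0.6827 271
```

Rounding n up (rather than to the nearest integer) is what guarantees the probability is at least 0.90.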
Sampling Distribution of the Proportion
Population Proportions
In Bernoulli trials, let π = the proportion (probability) of successes.
Recall that Y = the number of successes in n Bernoulli trials follows Bin(n, π).
For the ith Bernoulli trial, define
$$X_i = \begin{cases} 1 & \text{if the } i\text{th outcome is a ``success''} \\ 0 & \text{if the } i\text{th outcome is a ``failure''} \end{cases}$$
Then, obviously,
$$E(X_i) = \pi \quad\text{and}\quad \mathrm{Var}(X_i) = \pi(1 - \pi).$$
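Population Proportions (Cont’d)
For large n, apply the CLT to the sample mean and the sum:
$$p = \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \ \text{ is approximately distributed as } N\!\left(\pi, \frac{\pi(1 - \pi)}{n}\right)$$
Or
$$Y = \sum_{i=1}^{n} X_i \ \text{ is approximately distributed as } N\big(n\pi,\ n\pi(1 - \pi)\big)$$
How large is large?
$$n\pi \ge 5 \quad\text{and}\quad n(1 - \pi) \ge 5$$
or, in terms of the sample proportion,
$$np \ge 5 \quad\text{and}\quad n(1 - p) \ge 5$$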
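A simulation can make the link between Bernoulli trials and the sample proportion concrete: p is simply the sample mean of 0/1 indicators, so its sampling distribution should have mean π and standard deviation $\sqrt{\pi(1-\pi)/n}$. The values n = 200 and π = 0.4 are assumptions chosen for illustration; they satisfy nπ ≥ 5 and n(1 − π) ≥ 5.

```python
import math
import random
import statistics

random.seed(1)
n, pi = 200, 0.4  # illustrative values; n*pi = 80 and n*(1 - pi) = 120, both >= 5

def sample_proportion():
    # X_i = 1 for a "success" (probability pi), 0 for a "failure"; p is the mean of the X_i
    xs = [1 if random.random() < pi else 0 for _ in range(n)]
    return sum(xs) / n

ps = [sample_proportion() for _ in range(10_000)]

print(round(statistics.fmean(ps), 3))          # close to pi = 0.4
print(round(statistics.pstdev(ps), 4))         # close to the value below
print(round(math.sqrt(pi * (1 - pi) / n), 4))  # sqrt(pi(1 - pi)/n) = 0.0346
```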
Example
If the true proportion of voters who support
Proposition A is π = 0.4, what is the probability
that a sample of size 200 yields a sample
proportion between 0.40 and 0.45?
i.e.: if π = 0.4 and n = 200, what is
P(0.40 ≤ p ≤ 0.45) ?
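Example (continued)
Find $\sigma_p$:
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$$
Convert to standard normal:
$$P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44) = 0.4251$$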
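The voter calculation can be reproduced, and compared with the exact binomial probability, in a few lines. A sketch: because the exact normal CDF is used with the unrounded z ≈ 1.443 rather than 1.44, the normal approximation comes out slightly above 0.4251. The exact binomial value is larger, because p is discrete and the interval includes the point p = 0.40 itself (Y = 80); a continuity correction would narrow that gap.

```python
import math
from statistics import NormalDist

pi, n = 0.4, 200
sigma_p = math.sqrt(pi * (1 - pi) / n)  # about 0.03464

# Normal approximation: P(0.40 <= p <= 0.45) = P(0 <= Z <= (0.45 - 0.40) / sigma_p)
Z = NormalDist()
approx = Z.cdf((0.45 - pi) / sigma_p) - Z.cdf((0.40 - pi) / sigma_p)

# Exact check: p = Y / n with Y ~ Bin(200, 0.4), so 0.40 <= p <= 0.45 means 80 <= Y <= 90
exact = sum(math.comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(80, 91))

print(round(sigma_p, 5), round(approx, 4), round(exact, 4))
```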
Review Example
The number of claims received by an automobile insurance company on collision insurance on one day follows this probability distribution:
x      0     1     2     3     4
p(x)   0.65  0.20  0.10  0.03  0.02
with
$$\mu = \sum_{x=0}^{4} x\,p(x) = 0.57, \qquad \sigma = \sqrt{\sum_{x=0}^{4} (x - \mu)^2\,p(x)} = 0.93$$
Suppose the numbers of claims received are independent from day to day.
Review Example (Cont’d)
For a 50-day period, find the probability of each of the following events:
1) The total number of claims exceeds 20.
2) At least one claim is received on more than 20 days.
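One possible CLT-based approach to the two review questions is sketched below; it is an illustration, not the lecture's answer key, and it assumes no continuity correction (with one, the numbers shift slightly). Part 1 treats the 50-day total as approximately $N(50\mu, 50\sigma^2)$. Part 2 notes that each day has at least one claim with probability 1 − 0.65 = 0.35, so the number of such days is Bin(50, 0.35), approximated by a normal distribution since 50(0.35) and 50(0.65) both exceed 5.

```python
import math
from statistics import NormalDist

xs = [0, 1, 2, 3, 4]
probs = [0.65, 0.20, 0.10, 0.03, 0.02]
days = 50
Z = NormalDist()

mu = sum(x * p for x, p in zip(xs, probs))                           # 0.57 claims per day
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(xs, probs)))  # about 0.93

# 1) Total claims T = X_1 + ... + X_50 is approximately N(days*mu, days*sigma**2) by the CLT
p_total = 1 - Z.cdf((20 - days * mu) / (math.sqrt(days) * sigma))

# 2) A day has at least one claim with probability pi = 1 - P(X = 0) = 0.35;
#    the number of such days D ~ Bin(50, 0.35) is approximately N(days*pi, days*pi*(1 - pi))
pi = 1 - probs[0]
p_days = 1 - Z.cdf((20 - days * pi) / math.sqrt(days * pi * (1 - pi)))

print(round(mu, 2), round(sigma, 2), round(p_total, 3), round(p_days, 3))
```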
Sampling Distribution of the Difference Between Two Independent Populations
An important estimation problem involves comparing two populations, for example their means. You may want to make comparisons like these:
The average GRE scores of students who majored in mathematics versus chemistry
The average income of male and female college graduates
The proportion of patients receiving different medications who recovered from a certain disease
Sampling Distribution of the Difference of Two Independent Sample Means
Suppose there are two populations:
Population   Mean   S.d.
I            μ₁     σ₁
II           μ₂     σ₂
Independent random samples of n₁ and n₂ observations have been selected from the two populations, with sample means $\bar X_1$ and $\bar X_2$, respectively.
Recall that when n₁ and n₂ are large, $\bar X_1$ and $\bar X_2$ are approximately normally distributed with
$$E(\bar X_1) = \mu_1,\ \ \sigma_{\bar X_1} = \frac{\sigma_1}{\sqrt{n_1}} \qquad\text{and}\qquad E(\bar X_2) = \mu_2,\ \ \sigma_{\bar X_2} = \frac{\sigma_2}{\sqrt{n_2}}$$
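Since the two samples are independent,
$$\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2, \qquad \sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Standardize:
$$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$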
Example
A light bulb factory operates two different types of machines. The mean life expectancy of bulbs is 385 hours from machine I and 365 hours from machine II. The process standard deviation of life expectancy is 110 hours for machine I and 120 hours for machine II.
What is the probability that the average life expectancy of a random sample of 100 light bulbs from machine I is shorter than the average life expectancy of a random sample of 100 light bulbs from machine II?
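Example (Cont’d)
Note that
$$\mu_1 = 385,\ \mu_2 = 365; \qquad \sigma_1 = 110,\ \sigma_2 = 120; \qquad n_1 = n_2 = 100$$
Therefore
$$P(\bar X_1 - \bar X_2 < 0) = P\left(Z < \frac{0 - (385 - 365)}{\sqrt{\dfrac{110^2}{100} + \dfrac{120^2}{100}}}\right) = P\left(Z < \frac{-20}{16.28}\right) = P(Z < -1.23) = 0.1093$$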
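The light-bulb calculation in code, as a sketch using `statistics.NormalDist` for the standard normal CDF. With the unrounded z ≈ −1.229 the result is about 0.110, slightly different from the table value 0.1093 obtained after rounding z to −1.23.

```python
import math
from statistics import NormalDist

mu1, sigma1, n1 = 385, 110, 100  # machine I
mu2, sigma2, n2 = 365, 120, 100  # machine II

# X1bar - X2bar is approximately N(mu1 - mu2, sigma1**2/n1 + sigma2**2/n2)
se_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # about 16.28
z_value = (0 - (mu1 - mu2)) / se_diff                  # about -1.23

p = NormalDist().cdf(z_value)  # P(X1bar - X2bar < 0)
print(round(se_diff, 2), round(z_value, 2), round(p, 4))
```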
Sampling Distribution of the Difference of Two Independent Sample Proportions
Assume that independent random samples of n₁ and n₂ observations have been selected from binomial populations with parameters π₁ and π₂, respectively.
The sampling distribution of the difference in sample proportions (p₁ − p₂) can be approximated by a normal distribution with mean and standard deviation
$$\mu_{p_1 - p_2} = \pi_1 - \pi_2, \qquad \sigma_{p_1 - p_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}$$
The Z statistic is
$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\dfrac{\pi_1(1 - \pi_1)}{n_1} + \dfrac{\pi_2(1 - \pi_2)}{n_2}}}$$
Example
From a study by the Charles Schwab Corporation, 74% of African Americans and 84% of Whites with an annual income above $50,000 owned stocks.
For a random sample of 500 African Americans and a random sample of 500 Whites with income above $50,000, what is the probability that the sample proportion of Whites who own stocks is greater than that of African Americans?
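Example (Cont’d)
Summary data (1 = African Americans, 2 = Whites):
$$\pi_1 = 0.74,\ \pi_2 = 0.84; \qquad n_1 = n_2 = 500$$
It follows that
$$P(p_2 - p_1 > 0) = P\left(Z > \frac{0 - (0.84 - 0.74)}{\sqrt{\dfrac{0.74 \times 0.26}{500} + \dfrac{0.84 \times 0.16}{500}}}\right) = P(Z > -3.91) = 0.99995$$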
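The stock-ownership calculation as a sketch, labeling group 1 as African Americans and group 2 as Whites (a labeling chosen here for illustration) and using `statistics.NormalDist` for the standard normal CDF.

```python
import math
from statistics import NormalDist

pi1, n1 = 0.74, 500  # group 1: African Americans with income above $50,000
pi2, n2 = 0.84, 500  # group 2: Whites with income above $50,000

# p2 - p1 is approximately N(pi2 - pi1, pi1*(1 - pi1)/n1 + pi2*(1 - pi2)/n2)
se_diff = math.sqrt(pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2)
z_value = (0 - (pi2 - pi1)) / se_diff  # about -3.91

p = 1 - NormalDist().cdf(z_value)      # P(p2 - p1 > 0)
print(round(z_value, 2), round(p, 5))
```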
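Summary of Important Sampling Distributions
$$
\begin{array}{llll}
\text{Parameter} & \text{Point estimate} & \text{Sampling distribution} & \text{Standardized } Z \\
\mu & \bar X & N\!\left(\mu, \dfrac{\sigma^2}{n}\right) & Z = \dfrac{\bar X - \mu}{\sigma/\sqrt{n}} \\
\pi & p & N\!\left(\pi, \dfrac{\pi(1 - \pi)}{n}\right) & Z = \dfrac{p - \pi}{\sqrt{\pi(1 - \pi)/n}} \\
\mu_1 - \mu_2 & \bar X_1 - \bar X_2 & N\!\left(\mu_1 - \mu_2, \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right) & Z = \dfrac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \\
\pi_1 - \pi_2 & p_1 - p_2 & N\!\left(\pi_1 - \pi_2, \dfrac{\pi_1(1 - \pi_1)}{n_1} + \dfrac{\pi_2(1 - \pi_2)}{n_2}\right) & Z = \dfrac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\pi_1(1 - \pi_1)/n_1 + \pi_2(1 - \pi_2)/n_2}}
\end{array}
$$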
Simple Random Samples
Every individual or item from the frame has an equal chance of being selected.
Selection may be with replacement or without replacement.
Samples are obtained from a table of random numbers or from a computer random number generator.
Simple to use, but may not be a good representation of the population’s underlying characteristics.
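Python's standard `random` module covers both selection schemes: `random.sample` draws without replacement and `random.choices` draws with replacement. A minimal sketch; the frame of 100 numbered items is hypothetical.

```python
import random

random.seed(7)  # reproducible illustration
frame = list(range(1, 101))  # hypothetical sampling frame: items numbered 1..100

# Without replacement: each item can appear at most once
srs_without = random.sample(frame, k=10)

# With replacement: draws are independent, so repeats are possible
srs_with = random.choices(frame, k=10)

print(srs_without)
print(srs_with)
```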
Stratified Samples
Divide the population into two or more subgroups (called strata) according to some common characteristic.
A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes.
Samples from the subgroups are combined into one.
Ensures representation of individuals across the entire population.
[Figure: a population divided into 4 strata, with a sample drawn from each stratum]
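A minimal sketch of proportional allocation, assuming the population is given as a dictionary from stratum name to a list of units (a hypothetical input format). Each stratum's sample size is rounded from total_n × N_h / N, so the combined size can differ slightly from total_n when the proportions are not whole numbers. The four strata in the usage lines are hypothetical and sized so the allocation is exact.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Proportional-allocation stratified sample (hypothetical helper).

    strata: dict mapping stratum name -> list of population units.
    Each stratum contributes round(total_n * N_h / N) units, drawn by simple random
    sampling without replacement; the per-stratum samples are combined into one list.
    """
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = []
    for units in strata.values():
        n_h = round(total_n * len(units) / N)  # sample size proportional to stratum size
        sample.extend(rng.sample(units, n_h))
    return sample

# Hypothetical population split into 4 strata of sizes 40, 30, 20, 10
population = {
    "stratum 1": [f"A{i}" for i in range(40)],
    "stratum 2": [f"B{i}" for i in range(30)],
    "stratum 3": [f"C{i}" for i in range(20)],
    "stratum 4": [f"D{i}" for i in range(10)],
}
s = stratified_sample(population, total_n=20, seed=3)
print(len(s))  # 20, i.e. 8 + 6 + 4 + 2
```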