9 - 1
Is there a familiar pattern to the variability of $\bar{X}$?
• As the sample size becomes larger, the distribution of the sample mean becomes closer to a normal distribution, regardless of the population from which the sample is drawn.
• The central limit theorem (a name due to Pólya, 1920) is a very important theorem which states that, for sufficiently large samples, the distribution of the sample mean $\bar{X}$ is approximately normal.
9 - 2
Central Limit Theorem
If a sufficiently large random sample (i.e. n > 30) is drawn from a population with mean $\mu$ and variance $\sigma^2$, the distribution of the sample mean $\bar{X}$ will have the following characteristics:
1. an approximately normal distribution regardless of the distribution of the underlying population.
2. $\mu_{\bar{X}} = E(\bar{X}) = \mu$
3. $\sigma_{\bar{X}}^2 = \dfrac{\sigma^2}{n}$
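The three characteristics can be checked with a small simulation. The sketch below is only an illustration: the exponential population (mean 1, variance 1), the sample size of 40, the 5000 repetitions, and the function name `simulate_sample_means` are all assumptions chosen for the example, not part of the slides.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from an exponential population
    (mean 1, variance 1) and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

means = simulate_sample_means(n=40, reps=5000)

# Theory: the mean of the sample means is mu = 1 and
# their variance is sigma^2 / n = 1 / 40 = 0.025.
print("mean of sample means:    ", round(statistics.fmean(means), 3))
print("variance of sample means:", round(statistics.pvariance(means), 4))
```

Even though the exponential population is strongly skewed, the simulated means should cluster around 1 with a variance close to 0.025.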
9 - 3
Example 7
Suppose the random variable X has a mean of 50 and a standard deviation of 10.
Calculate the mean and the standard deviation of the sample mean (standard error) for each of following sample sizes: (Assume the population is infinite.)
a. n=40
b. n=55
c. n=100
d. What are the sizes of the standard deviation of the sample mean (Standard errors) as the sample size increases?
9 - 4
Example 7 - Solution
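We are given that X has $\mu = 50$ and $\sigma = 10$ and the population is infinite. The standard error is $SE = \sigma_{\bar{X}} = \sigma/\sqrt{n}$.
a. $\mu_{\bar{X}} = \mu = 50$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 10/\sqrt{40} \approx 1.5811$
b. $\mu_{\bar{X}} = \mu = 50$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 10/\sqrt{55} \approx 1.3484$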
9 - 5
Example 7 - Solution
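c. $\mu_{\bar{X}} = \mu = 50$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 10/\sqrt{100} = 1$
d. It decreases, reflecting the additional information provided by a larger sample size.
Summary
n = 40: $\sigma_{\bar{X}} = 1.5811$
n = 55: $\sigma_{\bar{X}} = 1.3484$
n = 100: $\sigma_{\bar{X}} = 1$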
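The Example 7 arithmetic can be reproduced in a few lines. This is a minimal sketch for an infinite population; the helper name `standard_error` is hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean for an infinite population."""
    return sigma / math.sqrt(n)

# Sample sizes from Example 7, with sigma = 10.
for n in (40, 55, 100):
    print(n, round(standard_error(10, n), 4))
```

Because the standard error scales with $1/\sqrt{n}$, quadrupling the sample size only halves it.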
9 - 6
Importance of the Central Limit Theorem
• The most important feature of this theorem is that it can be applied to any population with a finite mean and variance, whatever its shape.
• Because the theorem makes no assumption about the form of the population distribution, it is widely applicable and is one of the cornerstones of statistical inference.
9 - 7
Central Limit Theorem and Sample Size
• The only restrictive feature of the theorem is that the sample size must be sufficiently large for the theorem to be applicable.
• Even if the distribution of the population deviates substantially from the normal distribution, a sample size of 30 will usually be sufficiently large to produce a sampling distribution for $\bar{X}$ that is approximately normal.
9 - 8
Distribution Shapes
[Figure: population distribution versus distribution of the sample mean for large samples, shown for a bimodal population and an exponential population.]
9 - 9
Distribution Shapes
[Figure: population distribution versus distribution of the sample mean for large samples, shown for a normal population and a uniform population.]
9 - 10
Example 8
Suppose a sample of size 40 is drawn from a population that has a mean of 276 and a variance of 81.
What is the probability that the mean of the sample will be less than 273?
9 - 11
Example 8 - Solution
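We are given that a sample of size n = 40 is drawn from a population that has $\mu = 276$ and $\sigma = \sqrt{81} = 9$.
By the CLT, $\bar{X}$ has an approximately normal distribution with
$\mu_{\bar{X}} = 276$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 9/\sqrt{40} \approx 1.423$.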
9 - 12
Example 8 - Solution
9 - 13
Example 8 - Solution
$P(\bar{X} < 273) = P\left(\dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \dfrac{273 - 276}{1.423}\right)$
= P(z < -2.11) = .5 - P(-2.11 < z < 0)
= .5 - .4826 = .0174
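The table lookup in Example 8 can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative. Because it does not round z to two decimals, the result can differ from the table answer in the fourth decimal place.

```python
import math
from statistics import NormalDist

mu, variance, n = 276, 81, 40

se = math.sqrt(variance) / math.sqrt(n)      # standard error of the sample mean
sampling_dist = NormalDist(mu=mu, sigma=se)  # CLT approximation for the sample mean

p = sampling_dist.cdf(273)                   # P(sample mean < 273)
print(round(se, 3), round(p, 4))
```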
9 - 14
Example 9
Suppose there is a normally distributed population with a mean of 100 and a standard deviation of 10.
If $\bar{X}$ is the average of a sample of 50, find the following probabilities.
a. $P(\bar{X} \le 103)$
b. $P(\bar{X} \ge 96)$
c. $P(95 \le \bar{X} \le 103)$
9 - 15
Example 9 - Solution
We are given that X has a normal distribution with $\mu = 100$ and $\sigma = 10$ and n = 50.
By the CLT, $\bar{X}$ has a normal distribution with
$\mu_{\bar{X}} = 100$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 10/\sqrt{50} \approx 1.4142$.
9 - 16
Example 9 - Solution
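a. $P(\bar{X} \le 103) = P\left(\dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \le \dfrac{103 - 100}{1.4142}\right)$
= P(z ≤ 2.12) = .5 + P(0 < z < 2.12)
= .5 + .4830 = .9830
b. $P(\bar{X} \ge 96) = P\left(\dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \ge \dfrac{96 - 100}{1.4142}\right)$
= P(z ≥ -2.83) = .5 + P(-2.83 < z < 0)
= .5 + .4977 = .9977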
9 - 17
Example 9 - Solution
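c. $P(95 \le \bar{X} \le 103) = P\left(\dfrac{95 - 100}{1.4142} \le \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \le \dfrac{103 - 100}{1.4142}\right)$
= P(-3.54 ≤ z ≤ 2.12)
= P(-3.54 < z < 0) + P(0 < z < 2.12)
≈ .5 + .4830 = .9830 (taking P(-3.54 < z < 0) as approximately .5)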
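All three parts of Example 9 follow from one sampling distribution. The sketch below assumes the events are $\bar{X} \le 103$, $\bar{X} \ge 96$, and $95 \le \bar{X} \le 103$, which is what the worked solution implies; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Sampling distribution of the mean of n = 50 draws from N(100, 10).
xbar = NormalDist(mu=100, sigma=10 / math.sqrt(50))

p_a = xbar.cdf(103)                  # P(sample mean <= 103)
p_b = 1 - xbar.cdf(96)               # P(sample mean >= 96)
p_c = xbar.cdf(103) - xbar.cdf(95)   # P(95 <= sample mean <= 103)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

Parts a and c give nearly the same answer because almost no probability lies below 95, which is more than 3.5 standard errors under the mean.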
9 - 18
Example 10
A travel agency conducted a survey of the prices charged by ocean cruise ship lines and determined they were approximately normally distributed with a mean of $110 per day and a standard deviation of $20 per day.
9 - 19
Example 10 - Questions
1. If an ocean cruise ship line is chosen at random, what is the probability that it will charge less than $99 per day?
2. What is the probability that the average charge for a randomly selected sample of 35 ocean cruise ship lines will be less than $99 per day?
9 - 20
Example 10 - Solution
1. $P(X < 99) = P\left(\dfrac{X - \mu}{\sigma} < \dfrac{99 - 110}{20}\right)$
= P(z < -.55)
= .5 - P(-.55 < z < 0)
= .5 - .2088 = .2912
9 - 21
Example 10 - Solution
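2. By the CLT, $\bar{X}$ has a normal distribution with
$\mu_{\bar{X}} = 110$, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 20/\sqrt{35} \approx 3.381$.
$P(\bar{X} < 99) = P\left(\dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \dfrac{99 - 110}{3.381}\right)$
= P(z < -3.25) = .5 - P(-3.25 < z < 0)
= .5 - .4994 = .0006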
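The two parts of Example 10 differ only in the standard deviation used: $\sigma$ for a single cruise line and $\sigma/\sqrt{n}$ for the mean of 35. A small sketch makes the contrast explicit; the helper name `prob_mean_below` is hypothetical.

```python
import math
from statistics import NormalDist

def prob_mean_below(threshold, mu, sigma, n=1):
    """P(mean of n observations < threshold) for a normal population."""
    return NormalDist(mu, sigma / math.sqrt(n)).cdf(threshold)

single = prob_mean_below(99, mu=110, sigma=20)         # one cruise line
average = prob_mean_below(99, mu=110, sigma=20, n=35)  # mean of 35 lines

print(round(single, 4), round(average, 4))
```

An individual price below $99 is fairly common, while a sample average that low is very unlikely, because averaging shrinks the spread.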
9 - 22
The Distribution of the Sample Proportion
9 - 23
Proportions
• There are many instances in which the variable of interest is a proportion.
• Examples:
– A marketing researcher may be interested in what proportion of persons on a mailing list will buy their product.
– A college is concerned with the fraction of freshmen that will be in academic difficulty after the first year.
9 - 24
Population Proportions and Sample Proportions
• Population proportions must be estimated just like population means.
• The sample proportion is a reasonable estimate of the population proportion.
• Sample proportions vary depending on the selected samples.
9 - 25
Symbols
The symbols used to represent the population and sample proportions are
p - population proportion,
$\hat{p}$ - sample proportion.
9 - 26
How do you determine a sample proportion?
When calculating a proportion, the number in the sample that possesses the characteristic of interest goes in the numerator, and the size of the sample is placed in the denominator.
$\hat{p} = \dfrac{x}{n}$,
where x is the number in the sample possessing the characteristic of interest and n is the sample size.
9 - 27
What is the central value of $\hat{p}$?
• The expected value (mean) of the sample proportion is the population proportion.
$E(\hat{p}) = p$
• Since the expected value of the estimator is equal to p, $\hat{p}$ is an unbiased estimator of p.
9 - 28
What is the variance of $\hat{p}$?
• The variance of $\hat{p}$ is given by
$\sigma_{\hat{p}}^2 = \dfrac{p(1 - p)}{n}$.
• If the population proportion is unknown (which is usually the case), p can be estimated by $\hat{p}$, and the variance of the sample proportion is estimated as
$\hat{\sigma}_{\hat{p}}^2 = \dfrac{\hat{p}(1 - \hat{p})}{n}$.
9 - 29
Is there a familiar pattern to the variability of $\hat{p}$?
• The sampling distribution of $\hat{p}$ approaches normality as n becomes sufficiently large.
• The sample size is generally considered “sufficiently large” if np ≥ 5 and n(1-p) ≥ 5.
[Figure: sampling distribution of $\hat{p}$.]
9 - 30
Sampling distribution of the Sample Proportion
If the population is infinite and the sample is sufficiently large, the distribution of $\hat{p}$ has the following characteristics:
1. an approximately normal distribution.
2. $\mu_{\hat{p}} = E(\hat{p}) = p$.
3. $\sigma_{\hat{p}}^2 = \dfrac{p(1 - p)}{n}$, so that $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$.
9 - 31
Sampling Distribution of the Sample Proportion
If the population is finite and the sample is sufficiently large, the distribution of $\hat{p}$ has the following characteristics:
1. an approximately normal distribution.
2. $\mu_{\hat{p}} = E(\hat{p}) = p$.
3. $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} \sqrt{\dfrac{N - n}{N - 1}}$,
where N is the size of the population.
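The infinite and finite population cases can share one helper. The sketch below is an illustration only: it assumes the standard finite population correction factor $\sqrt{(N - n)/(N - 1)}$, and the function name `proportion_standard_error` and the population size of 500 are hypothetical.

```python
import math

def proportion_standard_error(p, n, population_size=None):
    """Standard deviation of the sample proportion.

    With population_size=None the population is treated as infinite.
    Otherwise the standard finite population correction
    sqrt((N - n) / (N - 1)) is applied (assumed form of the correction).
    """
    se = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

print(proportion_standard_error(0.35, 38))                       # infinite population
print(proportion_standard_error(0.35, 38, population_size=500))  # hypothetical N = 500
```

The correction always reduces the standard error, and it shrinks to zero when the sample is the whole population, since a census leaves no sampling variability.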
9 - 32
Since $\hat{p}$ is a good estimator of p ...
Can limits be established for the error in estimation?
Since the sampling distribution of $\hat{p}$ is known, probabilities for various errors of estimation can be determined.
9 - 33
Example 11
A random sample of 100 employees of a large steel company has 30 females and 70 males.
1. Find the sample proportion of female employees.
2. Find the sample proportion of male employees.
9 - 34
Example 11 - Solution
1. $\hat{p} = \dfrac{30}{100} = .30$
2. $\hat{p} = 1 - \dfrac{30}{100} = .70$
9 - 35
Example 12
Suppose that the true proportion of Americans over 25 years old who have a four-year college degree is .35.
Find the mean and the standard deviation of the sample proportion for samples of the following sizes.
a. n = 38
b. n = 52
c. n = 75
d. What happens to the size of the standard deviation of the sample proportion as the sample size increases?
9 - 36
Example 12 - Solution
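a. $\mu_{\hat{p}} = p = .35$, $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{.35(1 - .35)}{38}} \approx .0774$
b. $\mu_{\hat{p}} = p = .35$, $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{.35(1 - .35)}{52}} \approx .0661$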
9 - 37
Example 12 - Solution
c. $\mu_{\hat{p}} = p = .35$, $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{.35(1 - .35)}{75}} \approx .0551$
d. It decreases, reflecting the additional information provided by the larger sample size.
9 - 38
Example 13
Suppose that the true population proportion, p = .30.
What is the probability that the sample proportion of a sample of size 30 will be less than .20?
9 - 39
Example 13 - Solution
$\hat{p}$ has an approximately normal distribution because
np = (30)(.3) = 9, and
n(1 - p) = (30)(.7) = 21
are both greater than or equal to 5.
$\mu_{\hat{p}} = p = .30$, $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{.3(1 - .3)}{30}} \approx .08367$
9 - 40
Example 13 - Solution
• z = (0.2 - 0.3)/0.08367 = -1.195172, rounded to -1.20
• The area from 0 to 1.20 in Table A is 0.3849
• Tail area = 0.5 - 0.3849 = 0.1151; this is the area in the left tail
• So $P(\hat{p} < .20) \approx .1151$
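With n = 30 the normal approximation is fairly rough, so it is instructive to compare it with the exact binomial probability. The exact calculation is an addition for illustration, not part of the slides; it uses the fact that $\hat{p} < .20$ with n = 30 means fewer than 6 successes.

```python
import math
from statistics import NormalDist

n, p = 30, 0.30
se = math.sqrt(p * (1 - p) / n)

# Normal approximation, as in the worked solution (z is not rounded here).
normal_approx = NormalDist(p, se).cdf(0.20)

# Exact binomial: P(X <= 5), i.e. fewer than 6 successes out of 30.
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(6))

print(round(normal_approx, 4), round(exact, 4))
```

The exact value comes out at about .077, against about .116 for the normal approximation, so without a continuity correction the approximation overstates this tail probability at such a small sample size.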
9 - 41
Example 14
• The property manager of a large office building would like to make the building smoke free; however, he does not want to upset too many of his customers.
• He decides to randomly select 50 of the workers in the building and ask them whether or not they smoke.
• If the sample proportion of workers who smoke is less than .30, the property manager will make the building smoke free.
9 - 42
Example 14
1. Find the probability that the property manager will make the building smoke free when the true proportion of smokers is .50.
2. Find the probability that the property manager will not make the building smoke free when the true proportion of smokers is .20.
9 - 43
Example 14 - Solution
1. Because
np = (50)(.50) = 25 and
n(1-p) = (50)(.50) = 25
are both greater than or equal to 5, we can assume that $\hat{p}$ has an approximately normal distribution with
$\mu_{\hat{p}} = p = .50$, $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{.5(1 - .5)}{50}} \approx .0707$.
9 - 44
Example 14 - Solution
1. The property manager will make the building smoke free if $\hat{p}$ is less than .30.
$P(\hat{p} < .30) = P\left(\dfrac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} < \dfrac{.3 - .5}{.0707}\right)$
= P(z < -2.83)
= .5 - P(-2.83 < z < 0)
= .5 - .4977 = .0023
9 - 45
Example 14 - Solution
2. Because
np = (50)(.20) = 10 and
n(1-p) = (50)(.80) = 40
are both greater than or equal to 5, we can assume that $\hat{p}$ has an approximately normal distribution with
$\mu_{\hat{p}} = p = .20$, $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{.2(1 - .2)}{50}} \approx .0566$.
9 - 46
Example 14 - Solution
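2. The property manager will not make the building smoke free if $\hat{p}$ is .30 or greater. Under the normal approximation, $P(\hat{p} \ge .30) = P(\hat{p} > .30)$.
$P(\hat{p} > .30) = P\left(\dfrac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} > \dfrac{.3 - .2}{.0566}\right)$
= P(z > 1.77)
= .5 - P(0 < z < 1.77)
= .5 - .4616 = .0384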
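Both parts of Example 14 evaluate the same decision rule (go smoke free if the sample proportion is below .30) under two different true proportions. The sketch below computes the two probabilities with the normal approximation; the helper name `phat_dist` is hypothetical.

```python
import math
from statistics import NormalDist

def phat_dist(p, n):
    """Normal approximation to the sampling distribution of the sample proportion."""
    return NormalDist(p, math.sqrt(p * (1 - p) / n))

n, cutoff = 50, 0.30

# 1. Building goes smoke free (p-hat < .30) although half the workers smoke.
p_smoke_free = phat_dist(0.50, n).cdf(cutoff)

# 2. Building does not go smoke free (p-hat >= .30) although only 20% smoke.
p_not_smoke_free = 1 - phat_dist(0.20, n).cdf(cutoff)

print(round(p_smoke_free, 4), round(p_not_smoke_free, 4))
```

Moving the cutoff trades one of these probabilities against the other: a higher cutoff makes the first outcome more likely and the second less likely.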
9 - 47
Other Forms of Sampling
9 - 48
Probability Samples
• Probability samples enable an analyst to determine the probable errors that an estimator might generate.
• They allow the analyst a known degree of confidence in their estimation.
• All statistical inference relies on probability sampling.
9 - 49
Types of Probability Samples
• Cluster sampling involves dividing the population into clusters, and randomly selecting a sample of clusters to represent the population.
• In stratified sampling, the population is divided into strata, which are sub-populations.
• A stratum can be defined by any identifiable characteristic that can be used to classify the population.
• If the population consisted of people, then strata could be defined by sex, income, political party, religion, education, race, or location.
9 - 50
Pros and Cons of Cluster Sampling
• Cluster sampling can be as effective as simple random sampling if the clusters are as heterogeneous as the population; however, clusters are almost never as diverse as the population.
• Smaller cluster sizes will result in more representative samples.
• Cluster sampling simplifies the task of constructing the sampling frame, since the initial frame is composed only of clusters.
9 - 51
Stratified Sampling
Stratified sampling can provide greater accuracy if the population is heterogeneous, and sub-populations of the population can be identified that are relatively homogeneous.
9 - 52
Non-probability Samples
• Non-probability samples are a convenient means of obtaining sample data.
• If data from a non-probability sample is used to estimate a population parameter, there is no statistical theory that helps define the potential error of the estimate, and hence no statement about an estimate’s reliability can be made.
9 - 53
Types of Non-probability Samples
• A judgment sample is a sample in which sample values are selected by an expert in the field.
• A convenience sample is a convenient group of observations.
• One of the worst forms of non-probability samples is the voluntary or self-selected sample.
9 - 54
Almost Random Samples
• The systematic sample does not clearly belong to either probability or non-probability samples.
• In a systematic sample, every kth member of the population is included in the sample.
• Note: If there is some pattern in the sampling frame that corresponds to the sampling pattern, an unrepresentative sample may result.
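A systematic sample is easy to sketch in code. The version below picks a random starting point among the first k members and then takes every kth member after it; the random start is a common convention assumed here, and the function name `systematic_sample` and the frame of 100 units are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Return every k-th element of `frame`, starting from a random
    position among the first k elements."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # assumed convention: random start in 0..k-1
    return frame[start::k]

frame = list(range(1, 101))    # hypothetical sampling frame of 100 units
sample = systematic_sample(frame, k=10, seed=1)
print(sample)
```

If the frame repeats with a period that matches k, for example a list ordered so that every tenth unit is of the same kind, every selected unit falls at the same point of the cycle, which is the danger the note describes.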
9 - 55
Example 15 (a - c)
A social researcher in Florida wants to determine the average number of children per family in the state.
a. What is the population of interest?
b. What variable will be measured?
c. What level of measurement is the variable of interest?
9 - 56
Example 15 (a - c) Solution
a. Population - families in the state of Florida
b. Variable measured - number of children per family
c. Level of measurement - ratio
9 - 57
Example 15 (d)
d. What are the steps that would be necessary for each of the following sampling methods:
1. Simple random sampling
2. Cluster sampling
3. Stratified sampling
9 - 58
Example 15 (d) Solution
1. Simple Random Sample -
– List all families in the state of Florida (perhaps from a census, phone books, tax returns, etc.).
– Assign sequential numbers to all of the families (1 to N).
– Select n random numbers between 1 and N from a random number table (or generate these).
– Select the families corresponding to the random numbers.
9 - 59
Example 15 (d) Solution
2. Cluster Sampling -
– e.g. Take a map and divide the state of Florida into 1000 regions.
– Number the regions from 1 to 1000.
– Select n random numbers between 1 and 1000.
– Select the n regions corresponding to the random numbers.
– Survey every family in the region indicated by the random numbers.
9 - 60
Example 15 (d) Solution
3. Stratified Sampling -
– e.g. Separate all families in the state by income level.
– Number each family within the income level.
– Select e.g. 100 random numbers for each income level.
– Select the 100 families for each income level indicated by the random numbers.
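The three procedures in Example 15 (d) can be sketched side by side. Everything about the data here is an assumption made for illustration: the frame of 10,000 families, the random assignment to 1000 regions and three income levels, and the function names.

```python
import random

rng = random.Random(42)

# Hypothetical frame: (family_id, region, income_level) for 10,000 families.
families = [(i, rng.randrange(1000), rng.choice(["low", "middle", "high"]))
            for i in range(10_000)]

def simple_random_sample(frame, n):
    """Select n families at random from the whole frame."""
    return rng.sample(frame, n)

def cluster_sample(frame, n_regions):
    """Select n_regions regions at random and survey every family in them."""
    chosen = set(rng.sample(range(1000), n_regions))
    return [f for f in frame if f[1] in chosen]

def stratified_sample(frame, n_per_stratum):
    """Select n_per_stratum families at random within each income level."""
    strata = {}
    for f in frame:
        strata.setdefault(f[2], []).append(f)
    return [f for members in strata.values()
            for f in rng.sample(members, n_per_stratum)]

srs = simple_random_sample(families, 300)
clusters = cluster_sample(families, 30)
stratified = stratified_sample(families, 100)
print(len(srs), len(clusters), len(stratified))
```

Note that only the cluster design avoids needing a list of every family up front: the initial frame is just the list of regions, and families are enumerated only inside the regions selected.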
9 - 61
Example 15 (e)
What sampling method do you believe would be most cost effective?
9 - 62
Example 15 (e) Solution
The most cost effective method would be cluster sampling.
9 - 63
Example 16
• A biology professor is interested in the proportion of students at his college who are pre-med. majors.
• In his next class he asks the students who are pre-med. majors to raise their hands.
• Fifty percent of the students raise their hands.
9 - 64
Example 16
1. What type of sampling technique was used for this survey?
2. What type of biases may be present in the responses?
3. Is 50% a reasonable point estimate of the proportion of students at the college who are pre-med. majors? Explain.
9 - 65
Example 16 - Solution
1. Convenience
2. If the Biology course is a required course for all majors, then there may be a larger proportion of freshmen and sophomores in the class than in the college population as a whole.
9 - 66
Example 16 - Solution
2. If the Biology course is not a required course for all majors, then there may be a larger proportion of students in the class who are in majors which require the course, than in the college population as a whole.
3. No. For the reasons cited in part 2.