ioc.pdf
SAMPLING DISTRIBUTION
David M. Lane. et al. Introduction to Statistics: Chapter 9
[email protected] ICY0006: Lecture 7 1 / 39
Contents
1 Probability density, distribution
2 Inferential statistics
3 Sampling Distribution of the Mean
4 Sampling Distribution of Difference Between Means
5 Sampling Distribution of the Pearson's r
6 Sampling Distribution of a Proportion
Next section
1 Probability density, distribution
Probability density
Definition
A probability density function (or density) of a continuous random variable is a function that describes the relative likelihood that the random variable takes on a given value.
The analogue for a discrete random variable:
Definition
A probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value.
Probability as area under density curve
For any continuous probability distribution, areas under the density curve represent probabilities.
For any numbers a and b with a < b, the probability P(a < x < b) equals the area under the curve between a and b.
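The area interpretation can be checked numerically. The following is a minimal sketch in R, assuming a standard Normal variable and the interval (−1, 2); both choices are illustrative and not taken from the lecture. It compares the integral of the density with the difference of cumulative probabilities.

```r
# Assumed example: a standard Normal variable and the interval (a, b) = (-1, 2).
a <- -1
b <- 2

# Area under the density curve, obtained by numerical integration.
area_by_integration <- integrate(dnorm, lower = a, upper = b)$value

# The same probability from the cumulative distribution function.
area_by_cdf <- pnorm(b) - pnorm(a)

print(c(integration = area_by_integration, cdf = area_by_cdf))
```

Both numbers should agree at about 0.82, which is P(−1 < x < 2) for this assumed distribution.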
Next section
2 Inferential statistics
Inferential statistics
Inferential statistics allow the researcher to come to conclusions about a population on the basis of descriptive statistics about a sample.
Inferential statistics includes:
- Making inferences based on sample data.
- Confidence intervals.
- Margin of error.
- Hypothesis testing.
Example
A sample shows that a candidate is supported by 47% of respondents.
Inferential statistics allow you to conclude that the candidate is supported by 47% of the population, with a margin of error of ±4%.
This means that the support in the population is likely somewhere between 43% and 51%.
Random sampling
The samples here are of the same size but have different means.
Recall that Simple Random Sampling (SRS) is a method of obtaining a sample from a population in which every member of the population has an equal chance of being selected.
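To see that equally sized simple random samples give different means, one can draw a few of them. This is only a sketch: the population of 1,000 Normal values, the sample size of 20 and the variable names are all assumptions made for illustration.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1,000 values; not taken from the lecture.
population <- rnorm(1000, mean = 50, sd = 10)

# Draw five simple random samples of the same size (without replacement)
# and record the mean of each one.
sample_size <- 20
sample_means <- replicate(5, mean(sample(population, size = sample_size)))

print(mean(population))  # a single fixed number for this population
print(sample_means)      # five different values, one per sample
```

`sample()` without `replace = TRUE` gives every member the same chance of selection, which is the SRS property described above.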
Population vs. samples
Population: what we want to talk about
Sample: what we have with our data
Sampling distribution: the means by which we will go from our sample to the population
A parameter (population parameter) is
- a number that describes the population (for example, the population mean is a parameter)
- fixed but unknown
A sample statistic is
- a number that describes a sample
- known after we take a sample
- changing from sample to sample
- used to estimate an unknown parameter
Sampling distributions may concern any statistic:
- Measures of Central Tendency
- Measures of Variability
- Measures of Relationship
- Ratios
Statistical estimation
Random variables are used to estimate a population parameter. The reason: good samples are chosen randomly, so statistics such as x̄ are random variables.
The probability of any outcome of a random phenomenon is the proportion of times the outcome will occur in the long run. Thus, we can describe the behavior of a sample statistic by a probability model that answers the questions
"What would happen if we did this many times?"
and
"What would happen if we took a large number of observations?"
Next section
3 Sampling Distribution of the Mean
A Sampling Distribution
Let's try to create a sampling distribution of means...
Take a sample of size 1,500 from the US. Record the mean income.
The US Census reports that per capita income in the past 12 months (in 2014) was $28,555.
Say that the standard deviation of this distribution is $9.5K.
What are the odds you would get a sample mean that is more than $19K off?
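Since $19K is exactly two standard deviations, the question can be answered from the Normal curve. The sketch below assumes that the sampling distribution is approximately Normal and that "off" means off in either direction; both are assumptions, not statements from the slide.

```r
sd_of_means <- 9.5   # standard deviation of the sampling distribution, in $K
offset <- 19         # how far "off" the sample mean is, in $K

z <- offset / sd_of_means          # number of standard deviations
p_two_sided <- 2 * pnorm(-z)       # area in both tails beyond +/- z

print(z)            # 2
print(p_two_sided)  # about 0.0455
```

Under these assumptions, fewer than 5% of samples would miss by that much.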
Law of Large Numbers
Draw observations at random from any population with finite mean µ. As the number of observations drawn increases, the sample mean x̄ of the observed values gets closer and closer to the mean µ of the population.
Example: How sample means approach the population mean (µ = 25).
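The behaviour in this example can be imitated with a running mean. The sketch uses µ = 25 from the example; the Normal shape and the standard deviation of 7 are assumptions introduced only for the simulation.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

mu <- 25  # population mean, as in the example above
# Assumption: a Normal population with standard deviation 7 (hypothetical).
observations <- rnorm(10000, mean = mu, sd = 7)

# Running mean after 1, 2, ..., 10000 observations.
running_mean <- cumsum(observations) / seq_along(observations)

# Sample mean after n = 10, 100, 1000 and 10000 observations.
print(running_mean[c(10, 100, 1000, 10000)])
```

The printed values should drift towards 25 as the number of observations grows.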
Comparison of population and sample distributions
Sampling Distribution of the Mean x̄
When we choose many SRSs from a population, the sampling distribution of the sample mean is centred at the population mean µ and is less spread out than the population distribution.
NB! Here σ is the standard deviation of the population.
The value σx̄ = σ/√n is called the standard error of the mean; it is the standard deviation of the sampling distribution of x̄.
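The relation σx̄ = σ/√n can be checked by simulation. This is a sketch under assumed values (a Normal population with µ = 100, σ = 15 and samples of size n = 25); none of these numbers come from the lecture.

```r
set.seed(7)  # fixed seed so the sketch is reproducible

# Assumed population: Normal with mean 100 and standard deviation 15.
mu <- 100
sigma <- 15
n <- 25  # size of each simple random sample

# Means of 10,000 independent samples of size n.
sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

theoretical_se <- sigma / sqrt(n)  # 15 / 5 = 3
simulated_se <- sd(sample_means)   # should be close to 3

print(c(mean_of_means = mean(sample_means),
        simulated_se = simulated_se,
        theoretical_se = theoretical_se))
```

The mean of the sample means should sit near µ, and their standard deviation near σ/√n rather than σ.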
The Central Limit Theorem
Most population distributions are not Normal. What is the shape of the sampling distribution of sample means when the population distribution isn't Normal?
It is a remarkable fact that as the sample size increases, the distribution of sample means changes its shape: it looks less like that of the population and more like a Normal distribution!
When the sample is large enough, the distribution of sample means is very close to Normal, no matter what shape the population distribution has, as long as the population has a finite standard deviation.
The Central Limit Theorem
For random sampling, as the sample size n grows, the sample mean x̄ is approximately Normal:
$\bar{x} \approx N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
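A small simulation can make the theorem concrete. The sketch below assumes an exponential population with rate 1 (strongly right-skewed, µ = 1, σ = 1); this population, the sample sizes and the helper names `simulate_means` and `skewness` are illustrative choices, not part of the lecture.

```r
set.seed(123)  # fixed seed so the sketch is reproducible

# Assumed population: exponential with rate 1, which is strongly
# right-skewed and has mean 1 and standard deviation 1.
simulate_means <- function(n, reps = 10000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

# Simple moment-based skewness; 0 for a symmetric distribution.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

for (n in c(2, 10, 50)) {
  means <- simulate_means(n)
  cat("n =", n,
      " sd of means =", round(sd(means), 3),
      " sigma/sqrt(n) =", round(1 / sqrt(n), 3),
      " skewness =", round(skewness(means), 3), "\n")
}
```

As n grows, the skewness of the sample means should shrink towards 0 while their standard deviation tracks σ/√n.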
The Central Limit Theorem
Consider the strange population distribution.
Describe the shape of the sampling distributions as n increases. What do you notice?
Normal Condition for Sample Means
If the population distribution is Normal, then so is the sampling distribution of x̄. This is true no matter what the sample size n is.
If the population distribution is not Normal, then the sampling distribution of x̄ is approximately Normal in most cases if n > 30.
In summary, sample means:
are random;
are approximately normally distributed for large sample sizes;
have a distribution with mean µ;
have a distribution with standard error (standard deviation) σ/√n.
Confidence intervals
Draw a sample: it gives us a mean x̄ that is our best guess at µ (for most samples x̄ will be close to µ);
x̄ is a 'point' estimate for the mean of the population.
However, we can also give a range or interval estimate that takes into account the uncertainty involved in that point estimate.
The confidence interval equation is Limits = x̄ ± z(σx̄), where x̄ is the sample mean, z is a value from the normal curve, and σx̄ is the standard error of the mean.
95% confidence interval
Let's say we want a 95% confidence interval.
Obtain the 'critical' z-score for p = 0.025 (the area in each tail).
If p = 0.025, then z = 1.96.
When the population standard deviation is not known, we use the t critical value (we will discuss it later on) instead: Limits = x̄ ± t(sx̄)
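The z-based interval can be computed directly. The sketch below uses made-up inputs (x̄ = 50, σ = 10, n = 25), chosen only so the arithmetic is easy to follow.

```r
# Hypothetical inputs, chosen only for illustration.
x_bar <- 50   # sample mean
sigma <- 10   # known population standard deviation
n <- 25       # sample size

standard_error <- sigma / sqrt(n)  # 10 / 5 = 2
z_critical <- qnorm(1 - 0.025)     # about 1.96

limits <- x_bar + c(-1, 1) * z_critical * standard_error
print(round(z_critical, 2))  # 1.96
print(round(limits, 2))      # 46.08 53.92
```

`qnorm(1 - 0.025)` recovers the critical value 1.96 because 2.5% of the area lies in each tail of a 95% interval.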
Confidence interval example
Based on service records from the past year, the time (in hours) that a technician requires to complete preventative maintenance on an air conditioner follows a distribution that is strongly right-skewed and whose most likely outcomes are close to 0. The mean time is µ = 1 hour and the standard deviation is σ = 1 hour.
Your company will service an SRS of 70 air conditioners. You have budgeted 1.1 hours per unit. Will this be enough?
By the central limit theorem, the sampling distribution of the mean time spent working on the 70 units has:
$\mu_{\bar{x}} = \mu = 1, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{70}} \approx 0.12$
The sampling distribution of the mean time spent working is approximately N(1, 0.12) since n = 70 > 30.
$z = \frac{1.1 - 1}{0.12} = 0.83, \qquad P(\bar{x} > 1.1) = P(z > 0.83) = 1 - 0.7967 = 0.2033$
If you budget 1.1 hours per unit, there is a 20% chance the technicians will not complete the work within the budgeted time.
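The same calculation can be done without a z-table. This sketch uses the numbers from the example; because it does not round the standard error to 0.12, the result differs slightly from 0.2033 in the third decimal place.

```r
mu <- 1        # mean service time in hours
sigma <- 1     # standard deviation of service time in hours
n <- 70        # number of units in the sample
budget <- 1.1  # budgeted hours per unit

standard_error <- sigma / sqrt(n)  # about 0.12

# P(sample mean > budget) under the Normal approximation.
p_over_budget <- pnorm(budget, mean = mu, sd = standard_error,
                       lower.tail = FALSE)

print(round(standard_error, 2))  # 0.12
print(round(p_over_budget, 2))   # 0.2
```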
Next section
4 Sampling Distribution of Difference Between Means
Sampling distribution of difference between means
Statistical analyses are very often concerned with the difference between means.
A typical example is an experiment designed to compare the mean of a control group with the mean of an experimental group.
It is generally assumed that the two populations are normally distributed: Population 1 ∼ $N(\mu_1, \sigma_1^2)$ and Population 2 ∼ $N(\mu_2, \sigma_2^2)$.
The sampling distribution of the difference between means can be thought of as the distribution that would result if we repeated the following three steps over and over again:
1 draw a sample $M_1$ of $n_1$ scores from Population 1 and a sample $M_2$ of $n_2$ scores from Population 2;
2 compute the means of the two samples ($\mu_{M_1}$ and $\mu_{M_2}$);
3 compute the difference between means, $\mu_{M_1} - \mu_{M_2}$.
The distribution of the differences between means is the sampling distribution of the difference between means.
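The three steps above can be sketched as a simulation. The population means, standard deviations and sample sizes below are hypothetical values picked for illustration, and `one_difference` is a helper name introduced here.

```r
set.seed(2024)  # fixed seed so the sketch is reproducible

# Hypothetical populations and sample sizes, chosen only for illustration.
mu1 <- 34; sigma1 <- 6; n1 <- 20
mu2 <- 25; sigma2 <- 8; n2 <- 30

# One repetition of the three steps: sample, average, subtract.
one_difference <- function() {
  sample1 <- rnorm(n1, mean = mu1, sd = sigma1)
  sample2 <- rnorm(n2, mean = mu2, sd = sigma2)
  mean(sample1) - mean(sample2)
}

differences <- replicate(10000, one_difference())

print(mean(differences))  # close to mu1 - mu2 = 9
print(sd(differences))    # close to sqrt(sigma1^2 / n1 + sigma2^2 / n2)
print(sqrt(sigma1^2 / n1 + sigma2^2 / n2))
```

The last line is the standard error given by the variance sum law for independent samples, so the second and third printed values should nearly coincide.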
Mean of the Distribution
The mean of the sampling distribution of the difference between means is:
$\mu_{M_1 - M_2} = \mu_{M_1} - \mu_{M_2} = \mu_1 - \mu_2$
Example
The mean test score of all 12-year-olds in a population is $\mu_1 = 34$.
The mean test score of all 10-year-olds is $\mu_2 = 25$.
If numerous samples were taken from each age group and the mean difference computed each time, the mean of these numerous differences between sample means would be $\mu_{M_1 - M_2} = 34 - 25 = 9$.
Variance of the Distribution
From the variance sum law, we know that
$\sigma^2_{M_1 - M_2} = \sigma^2_{M_1} + \sigma^2_{M_2}$
The variance of the sampling distribution of the mean is
$\sigma^2_M = \sigma^2 / N$
Thus, the formula for the variance of the sampling distribution of the difference between means is:
$\sigma^2_{M_1 - M_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
The standard error of the difference between means is:
$\sigma_{M_1 - M_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Variance of the Distribution (2)
Example
In a study of annual household expenditures of families, two populations were surveyed with the following results:
Population 1: $n_1 = 40$, $\mu_{M_1} = 346$ €
Population 2: $n_2 = 35$, $\mu_{M_2} = 300$ €
If the variances of the populations are $\sigma_1^2 = 2800$ and $\sigma_2^2 = 3250$, what is the probability that the mean expenditure of Population 1 is at least 30 euros greater than that of Population 2?
The mean of the sampling distribution of the difference between means is:
$\mu_{M_1 - M_2} = 346 - 300 = 46$
The standard error is:
$\sigma_{M_1 - M_2} = \sqrt{\frac{2800}{40} + \frac{3250}{35}} = \sqrt{70 + 92.86} = 12.76$
The probability that the mean expenditure of Population 1 exceeds the mean expenditure of Population 2 by 30 euros or more is equal to the shaded area under the density curve.
The area can be computed with the following R command:
> pnorm(30, mean = 46, sd = 12.76, lower.tail = FALSE)
[1] 0.8950642
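Equal Sample Sizes
The formula for the standard error simplifies when the sample sizes are equal (and the two populations have the same variance σ²):
$\sigma_{M_1 - M_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{n}} = \sqrt{\frac{2\sigma^2}{n}}$
15 Year-old Beings From Earth
8 girls
Mean height: 165
Variance of height: 64
8 boys
Mean height: 175
Variance of height: 64
The mean of the sampling distribution of the difference between means is:
$\mu_{M_1 - M_2} = 165 - 175 = -10$
The standard error is:
$\sigma_{M_1 - M_2} = \sqrt{\frac{2\sigma^2}{n}} = \sqrt{\frac{2 \cdot 64}{8}} = 4$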
Equal Sample Sizes (2)
What is the probability that the mean height for girls would be higher than the mean height for boys?
$\mu_{M_1 - M_2} = -10$
$\sigma_{M_1 - M_2} = 4$
Shaded area = 0.0062
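The shaded area is the probability that the difference (girls minus boys) is greater than 0. A short sketch in R, using the mean and standard error from the example, reproduces it:

```r
mean_difference <- -10  # girls' mean minus boys' mean
standard_error <- 4     # standard error of the difference

# P(difference > 0): the girls' sample mean exceeds the boys' sample mean.
p_girls_taller <- pnorm(0, mean = mean_difference, sd = standard_error,
                        lower.tail = FALSE)

print(round(p_girls_taller, 4))  # 0.0062
```

A difference of 0 lies 2.5 standard errors above the mean of −10, which is why the probability is so small.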
Next section
5 Sampling Distribution of the Pearson's r
Sampling Distribution of the Pearson's r
Let the population correlation be ρ = 0.60
Different samples yield different values of r
The distribution of r over repeated samples is the sampling distribution of r
Shapes of the distribution
[Figure: sampling distributions of r for ρ = 0.60 with N = 12 and for ρ = 0.90 with N = 12]
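The two shapes can be approximated by simulation. The sketch below assumes that the pairs come from a bivariate Normal population, which the slide does not state; `sample_r` is a helper name introduced here.

```r
set.seed(99)  # fixed seed so the sketch is reproducible

# Draw N pairs with population correlation rho and return the sample r.
# Assumption: both variables are standard Normal (bivariate Normal pairs).
sample_r <- function(N, rho) {
  x <- rnorm(N)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(N)
  cor(x, y)
}

r_060 <- replicate(10000, sample_r(N = 12, rho = 0.60))
r_090 <- replicate(10000, sample_r(N = 12, rho = 0.90))

# Both distributions are negatively skewed (longer left tail),
# so the median lies above the mean.
print(summary(r_060))
print(summary(r_090))
```

Because r cannot exceed 1, the values pile up below that bound and the sampling distribution is not symmetric.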
Fisher's z′-transformation
To compute probabilities from a normal curve we need a normally distributed variable, but r is not normally distributed.
Fisher's z′-transformation provides a related variable z′ that is approximately normally distributed.
The transformation:
$z' = 0.5 \ln\frac{1 + r}{1 - r}$
The standard error of z′ is
$\frac{1}{\sqrt{N - 3}}$
where N is the number of pairs of scores.
Example: the probability that in a sample of 12 students, the sample value of r would be 0.75 or higher
ρ = 0.60, N = 12, r > 0.75
0.60 → 0.693
0.75 → 0.973
Standard error:
$\frac{1}{\sqrt{12 - 3}} = \frac{1}{\sqrt{9}} = \frac{1}{3} = 0.333$
Probability of obtaining an r above a specified value
Calculation of the probability of obtaining a z′ above the value 0.973 (that is, an r above 0.75):
Mean = 0.693
Sd = 0.333
Shaded area: 0.200219
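The whole calculation can be sketched in a few lines of R. The inputs are the values from the example; since the code does not round the intermediate results, its answer differs from 0.200219 in the fourth decimal place.

```r
rho <- 0.60   # population correlation
r <- 0.75     # sample correlation of interest
N <- 12       # number of pairs of scores

# Fisher's z' transformation; atanh(r) equals 0.5 * log((1 + r) / (1 - r)).
z_rho <- atanh(rho)  # about 0.693
z_r <- atanh(r)      # about 0.973
standard_error <- 1 / sqrt(N - 3)  # 1/3

# P(z' > z_r) when z' is approximately Normal with mean z_rho.
p_above <- pnorm(z_r, mean = z_rho, sd = standard_error, lower.tail = FALSE)

print(round(c(z_rho, z_r), 3))  # 0.693 0.973
print(round(p_above, 2))        # 0.2
```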
Next section
6 Sampling Distribution of a Proportion
Proportions
Candidate A
Favoured by 0.60
Candidate B
Favoured by 0.40
Relation to Sampling Distribution of the Mean
Here p = 7/10 = 0.7
The binomial distribution is the distribution of the number of successes
The sampling distribution of p is the distribution of the proportion of successes (do not confuse the sample proportion p with the population proportion π)
Mean and Standard Deviation
Mean of binomial
$\mu = N\pi$
Mean of sampling distribution of p
$\mu_p = \pi$
Standard deviation of binomial
$\sigma = \sqrt{N\pi(1 - \pi)}$
Standard deviation of sampling distribution of p
$\sigma_p = \frac{\sqrt{N\pi(1 - \pi)}}{N} = \sqrt{\frac{\pi(1 - \pi)}{N}}$
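These four formulas can be checked by simulation. The sketch below uses N = 10 and π = 0.60 from the candidate example; the number of simulated samples (100,000) is an arbitrary choice.

```r
set.seed(10)  # fixed seed so the sketch is reproducible

N <- 10        # sample size
pi_pop <- 0.6  # population proportion (named pi_pop to avoid masking R's pi)

# Number of successes in each of 100,000 samples, and the sample proportions.
successes <- rbinom(100000, size = N, prob = pi_pop)
p <- successes / N

print(c(mean(successes), N * pi_pop))                     # both near 6
print(c(sd(successes), sqrt(N * pi_pop * (1 - pi_pop))))  # both near 1.549
print(c(mean(p), pi_pop))                                 # both near 0.6
print(c(sd(p), sqrt(pi_pop * (1 - pi_pop) / N)))          # both near 0.155
```

Dividing the counts by N rescales both the mean and the standard deviation by N, which is exactly the step from the binomial formulas to those for p.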
Back to the Example
π = 0.60
p = 0.70
$\sigma_p = \sqrt{\frac{0.6(1 - 0.6)}{10}} = 0.155$
The sampling distribution of p is approximately normally distributed if N is fairly large and π is not close to 0 or 1.
The rule of thumb is that the approximation is good if both Nπ and N(1−π) are greater than 10.
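The rule of thumb is easy to apply in code. In this sketch the function name `normal_approximation_ok` is hypothetical, and the second call uses an assumed larger sample of N = 100 for contrast with the N = 10 example.

```r
# Rule of thumb from the text: both N*pi and N*(1 - pi) must exceed 10.
normal_approximation_ok <- function(N, pi_pop) {
  N * pi_pop > 10 && N * (1 - pi_pop) > 10
}

print(normal_approximation_ok(N = 10, pi_pop = 0.6))   # FALSE: 6 and 4
print(normal_approximation_ok(N = 100, pi_pop = 0.6))  # TRUE: 60 and 40
```

For the example with N = 10 and π = 0.60 the rule is not met, so a Normal approximation there should be treated with caution.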