
ioc.pdf

SAMPLING DISTRIBUTION

David M. Lane et al., Introduction to Statistics, Chapter 9

ICY0006: Lecture 7


Contents

1 Probability density, distribution

2 Inferential statistics

3 Sampling Distribution of the Mean

4 Sampling Distribution of Difference Between Means

5 Sampling Distribution of Pearson's r

6 Sampling Distribution of a Proportion



Probability density

Definition

A probability density function (or density) of a continuous random variable is a function that describes the relative likelihood for this random variable to take on a given value.

The analogue for a discrete random variable:

Definition

A probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value.



Probability as area under density curve

For any continuous probability distribution, areas under the density curve represent probabilities.

For any numbers a and b, the probability P(a < x < b) equals the area under the curve between a and b.
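The area interpretation can be checked numerically. The sketch below is only an illustration: the standard Normal distribution and the endpoints a = -1 and b = 2 are assumptions chosen for the example, and `area_under_density` is a hypothetical helper that applies the midpoint rule to the density.

```python
from statistics import NormalDist


def area_under_density(dist, a, b, steps=10_000):
    """Approximate P(a < X < b) by the midpoint rule applied to the density."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width


standard_normal = NormalDist(mu=0.0, sigma=1.0)  # assumed example distribution
a, b = -1.0, 2.0                                  # assumed interval endpoints

area = area_under_density(standard_normal, a, b)
exact = standard_normal.cdf(b) - standard_normal.cdf(a)

print(f"area under the density between a and b: {area:.6f}")
print(f"cdf(b) - cdf(a):                        {exact:.6f}")
```

Both numbers should agree to several decimal places, which is the statement that probability equals area under the density curve.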



Inferential statistics

Inferential statistics allow the researcher to come to conclusions about a population on the basis of descriptive statistics about a sample.

Inferential statistics includes:
- Making inferences based on sample data.
- Confidence intervals.
- Margin of error.
- Hypothesis testing.


Example

A sample shows that a candidate gets support from 47%.

Inferential statistics allow you to conclude that the candidate gets support from 47% of the population with a margin of error of ±4%.

This means that the support in the population is likely somewhere between 43% and 51%.



Random sampling

The samples here are of the same size but have different means.

Recall that Simple Random Sampling (SRS) is a method of obtaining a sample from a population in which every member of the population has an equal chance of being selected.



Population vs. samples

Population: what we want to talk about

Sample: what we have with our data

Sampling distribution: the means by which we will go from our sample to the population

A parameter (population parameter) is
- a number that describes the population (for example, the population mean is a parameter)
- fixed but unknown

A sample statistic is
- a number that describes a sample
- known after we take a sample
- changing from sample to sample
- used to estimate an unknown parameter

Sampling distributions may concern any statistic:
- Measures of Central Tendency
- Measures of Variability
- Measures of Relationship
- Ratios



Statistical estimation

Sample statistics are used to estimate population parameters. Because good samples are chosen randomly, statistics such as x̄ are random variables.

The probability of any outcome of a random phenomenon is the proportion of times the outcome will occur in the long run. Thus, we can describe the behavior of a sample statistic by a probability model that answers the questions

"What would happen if we do this many times?"

and

"What would happen if we take a large number of observations?"
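The first question can be explored by simulation. The following is a minimal sketch, not part of the original slides: the population (10,000 Normal values with mean 50 and standard deviation 10), the sample size 25, and the number of repetitions are all assumptions made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population, assumed only for illustration.
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)  # the parameter: fixed, usually unknown


def sample_mean(n):
    """Draw one simple random sample of size n and return its mean."""
    return statistics.fmean(random.sample(population, n))


# "What would happen if we do this many times?"
means = [sample_mean(25) for _ in range(2_000)]

print(f"population mean (parameter): {mu:.2f}")
print(f"first five sample means:     {[round(m, 2) for m in means[:5]]}")
print(f"mean of all sample means:    {statistics.fmean(means):.2f}")
```

Each sample gives a different value of the statistic, while the average of many sample means stays close to the fixed parameter.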



A Sampling Distribution

Let's try to create a sampling distribution of means.

Take a sample of size 1,500 from the US. Record the mean income.

The US Census reports that the per capita income in the past 12 months (in 2014) was $28,555.



Say that the standard deviation of this distribution is $9.5K.

What are the odds that you would get a sample mean that is more than $19K off?
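One way to answer the question is sketched below. It rests on two assumptions that the slide does not state explicitly: that $9.5K is the standard deviation of the sampling distribution of the mean, and that this sampling distribution is approximately Normal. Under those assumptions, $19K is two standard deviations away from the centre.

```python
from statistics import NormalDist

mean_income = 28_555  # per capita income quoted above
std_error = 9_500     # assumed: standard deviation of the sampling distribution
offset = 19_000       # "more than $19K off"

# Assumed Normal model for the sampling distribution of the mean.
sampling_dist = NormalDist(mu=mean_income, sigma=std_error)

# Two-sided probability: below mean - offset or above mean + offset.
p_low = sampling_dist.cdf(mean_income - offset)
p_high = 1 - sampling_dist.cdf(mean_income + offset)
p_off = p_low + p_high

print(f"z = {offset / std_error:.1f}")
print(f"P(sample mean is more than 19K off) = {p_off:.4f}")
```

Under these assumptions the probability is a little under 5%, in line with the familiar rule that about 95% of a Normal distribution lies within two standard deviations of its mean.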



Law of Large Numbers

Draw observations at random from any population with finite mean µ. As the number of observations drawn increases, the sample mean x̄ of the observed values gets closer and closer to the mean µ of the population.

Example: How sample means approach the population mean (µ = 25).
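A simulation along these lines can be sketched as follows. The population mean µ = 25 comes from the example; the choice of an exponential population, the seed, and the checkpoints are assumptions made only for illustration.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

MU = 25.0  # population mean from the example


def draw():
    """One observation from an assumed right-skewed population with mean 25."""
    return random.expovariate(1 / MU)


checkpoints = {10, 100, 1_000, 10_000, 100_000}
running_mean = {}
total = 0.0
for n in range(1, 100_001):
    total += draw()
    if n in checkpoints:
        running_mean[n] = total / n  # sample mean of the first n observations

for n in sorted(running_mean):
    print(f"n = {n:>6}: sample mean = {running_mean[n]:.3f}")
```

The early sample means can be noticeably off, while the later ones settle near 25.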



Comparison of population and sample distributions



Sampling Distribution of the Mean x̄

When we choose many SRSs from a population, the sampling distribution of the sample mean is centred at the population mean µ and is less spread out than the population distribution.

NB! Here σ is the standard deviation of the population.

The value σx̄ = σ/√n is called the standard error of the sampling distribution.
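The relation σ/√n can be checked empirically. This is a sketch under assumed values (a Normal population with µ = 100 and σ = 15, samples of size n = 36, and 5,000 repetitions); none of these numbers come from the slides.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

MU, SIGMA = 100.0, 15.0  # assumed population parameters
n = 36                    # assumed sample size


def one_sample_mean():
    """Mean of one random sample of size n from the assumed population."""
    return statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)])


means = [one_sample_mean() for _ in range(5_000)]

theoretical_se = SIGMA / math.sqrt(n)    # sigma / sqrt(n) = 2.5
empirical_se = statistics.stdev(means)   # spread of the simulated sample means

print(f"centre of sample means: {statistics.fmean(means):.2f} (population mean {MU})")
print(f"theoretical standard error: {theoretical_se:.3f}")
print(f"empirical standard error:   {empirical_se:.3f}")
```

The simulated sample means are centred near µ and their spread is close to σ/√n, which is much smaller than the population standard deviation σ.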



The Central Limit Theorem

Most population distributions are not Normal. What is the shape of the sampling distribution of sample means when the population distribution isn't Normal?

It is a remarkable fact that as the sample size increases, the distribution of sample means changes its shape: it looks less like that of the population and more like a Normal distribution!

When the sample is large enough, the distribution of sample means is very close to Normal, no matter what shape the population distribution has, as long as the population has a finite standard deviation.


For random sampling, as the sample size n grows, the sample mean x̄ is approximately Normal:

$$\bar{x} \approx N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$




Consider the strange population distribution.

Describe the shape of the sampling distributions as n increases. What do you notice?

Normal Condition for Sample Means

If the population distribution is Normal, then so is the sampling distribution of x̄, no matter what the sample size n is.

If the population distribution is not Normal, then the sampling distribution of x̄ is approximately Normal in most cases if n > 30.
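The effect of n on the shape can be illustrated with a strongly skewed population. The sketch below assumes an Exponential(1) population and uses sample skewness as a rough measure of non-Normality (a Normal distribution has skewness 0); the `skewness` and `sample_means` helpers are hypothetical names introduced for this example.

```python
import random
import statistics

random.seed(11)  # fixed seed so the sketch is reproducible


def skewness(values):
    """Sample skewness: 0 for a symmetric distribution such as the Normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def sample_means(n, repeats=4_000):
    """Means of `repeats` samples of size n from an assumed Exponential(1) population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]


skew_by_n = {n: skewness(sample_means(n)) for n in (1, 5, 30, 100)}
for n, g in skew_by_n.items():
    print(f"n = {n:>3}: skewness of the sample means = {g:.2f}")
```

The skewness shrinks towards 0 as n grows, which is the pattern the Normal condition describes: the larger the sample, the closer the distribution of x̄ is to Normal.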



In summary, sample means:

are random;

are approximately normally distributed for large sample sizes;

have a distribution with mean µ;

have a distribution with standard error (standard deviation) σ/√n.



Confidence intervals

Draw a sample: it gives us a mean x̄ that is our best guess at µ (for most samples x̄ will be close to µ).

x̄ is a 'point' estimate for the mean of the population.

However, we can also give a range or interval estimate that takes into account the uncertainty involved in that point estimate.

The confidence interval equation is Limits = x̄ ± z·σx̄, where x̄ is the sample mean, z is a value from the standard normal curve, and σx̄ is the standard error of the mean.

95% confidence interval

Let's say we want a 95% confidence interval.

Obtain the 'critical' z-score for p = 0.025 (the area in each tail).

If p = 0.025, then z = 1.96.

When the population standard deviation is not known, we use the t critical value (we will discuss it later on) instead: Limits = x̄ ± t·sx̄
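The z-based interval can be sketched in a few lines. The function name `z_confidence_interval` and the numbers passed to it (x̄ = 50, σ = 10, n = 25) are hypothetical and serve only to exercise the formula Limits = x̄ ± z·σx̄ with a known population standard deviation.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) limits x_bar +/- z * sigma / sqrt(n)."""
    tail = (1 - confidence) / 2          # 0.025 in each tail for a 95% interval
    z = NormalDist().inv_cdf(1 - tail)   # critical value, about 1.96
    margin = z * sigma / math.sqrt(n)    # z times the standard error
    return x_bar - margin, x_bar + margin


z_95 = NormalDist().inv_cdf(0.975)

# Hypothetical numbers, chosen only to illustrate the calculation.
lower, upper = z_confidence_interval(x_bar=50.0, sigma=10.0, n=25)

print(f"critical z for 95%: {z_95:.2f}")
print(f"95% confidence limits: ({lower:.2f}, {upper:.2f})")
```

With these inputs the standard error is 10/√25 = 2, so the margin is about 1.96 × 2 = 3.92.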



Confidence interval example

Based on service records from the past year, the time (in hours) that a technician requires to complete preventative maintenance on an air conditioner follows a distribution that is strongly right-skewed, and whose most likely outcomes are close to 0. The mean time is µ = 1 hour and the standard deviation is σ = 1 hour.

Your company will service an SRS of 70 air conditioners. You have budgeted 1.1 hours per unit. Will this be enough?

The sampling distribution of the mean time spent working on the 70 units has:

$$\mu_{\bar{x}} = \mu = 1, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{70}} \approx 0.12$$

By the central limit theorem, the sampling distribution of the mean time spent working is approximately N(1, 0.12), since n = 70 > 30.

$$z = \frac{1.1 - 1}{0.12} = 0.83, \qquad P(\bar{x} > 1.1) = P(z > 0.83) = 1 - 0.7967 = 0.2033$$

If you budget 1.1 hours per unit, there is about a 20% chance that the technicians will not complete the work within the budgeted time.
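The arithmetic of this example can be reproduced as a short sketch. It computes the probability twice: once with the unrounded standard error and once with the rounded values 0.12 and 0.83 used in the worked solution. The Normal model for x̄ is the approximation the example itself relies on.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1.0, 1.0, 70  # values from the example
budget = 1.1                  # budgeted hours per unit

# Unrounded calculation.
se = sigma / math.sqrt(n)
p_exact = 1 - NormalDist(mu, se).cdf(budget)

# Same calculation with the rounded values used in the worked solution.
se_rounded = 0.12
z_rounded = round((budget - mu) / se_rounded, 2)  # 0.83
p_rounded = 1 - NormalDist().cdf(z_rounded)

print(f"standard error: {se:.4f}")
print(f"P(mean time > 1.1), unrounded: {p_exact:.4f}")
print(f"P(mean time > 1.1), rounded z = {z_rounded}: {p_rounded:.4f}")
```

Both versions give a probability of roughly 20%; the small difference comes only from rounding the standard error to 0.12.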



Sampling distribution of difference between means

Statistical analyses are very often concerned with the difference between means.

A typical example is an experiment designed to compare the mean of a control group with the mean of an experimental group.

It is generally assumed that the two populations are normally distributed: Population 1 ∼ $N(\mu_1, \sigma_1^2)$ and Population 2 ∼ $N(\mu_2, \sigma_2^2)$.

The sampling distribution of the difference between means can be thought of as the distribution that would result if we repeated the following three steps over and over again:

1 draw a sample of n1 scores from Population 1 and a sample of n2 scores from Population 2;

2 compute the means of the two samples (M1 and M2);

3 compute the difference between means, M1 − M2.

The distribution of the differences between means is the sampling distribution of the difference between means.



Mean of the Distribution

The mean of the sampling distribution of the difference between means is:

$$\mu_{M_1 - M_2} = \mu_{M_1} - \mu_{M_2} = \mu_1 - \mu_2$$

Example

The mean test score of all 12-year-olds in a population is µ1 = 34.

The mean test score of all 10-year-olds is µ2 = 25.

If numerous samples were taken from each age group and the mean difference computed each time, the mean of these numerous differences between sample means would be $\mu_{M_1 - M_2} = 34 - 25 = 9$.



Variance of the Distribution

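From the variance sum law, we know that (for independent samples)

$$\sigma^2_{M_1 - M_2} = \sigma^2_{M_1} + \sigma^2_{M_2}$$

The variance of the sampling distribution of the mean is

$$\sigma^2_M = \frac{\sigma^2}{N}$$

Thus, the formula for the variance of the sampling distribution of the difference between means is:

$$\sigma^2_{M_1 - M_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

The standard error of the difference between means is:

$$\sigma_{M_1 - M_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$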
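The mean and standard error of the difference between means can be checked by simulating the repeated-sampling procedure. In this sketch the population means 34 and 25 are taken from the test-score example; the variances (16 and 9), the sample sizes (20 and 15), and the helper names are assumptions introduced only for illustration.

```python
import math
import random
import statistics


def se_diff_means(var1, n1, var2, n2):
    """Standard error of M1 - M2 for independent samples."""
    return math.sqrt(var1 / n1 + var2 / n2)


random.seed(5)  # fixed seed so the sketch is reproducible

mu1, var1, n1 = 34.0, 16.0, 20  # variance and sample size are assumed
mu2, var2, n2 = 25.0, 9.0, 15   # variance and sample size are assumed


def one_difference():
    """Draw one sample from each Normal population; return M1 - M2."""
    m1 = statistics.fmean([random.gauss(mu1, math.sqrt(var1)) for _ in range(n1)])
    m2 = statistics.fmean([random.gauss(mu2, math.sqrt(var2)) for _ in range(n2)])
    return m1 - m2


diffs = [one_difference() for _ in range(5_000)]

theoretical_se = se_diff_means(var1, n1, var2, n2)

print(f"mean of differences: {statistics.fmean(diffs):.3f} (theory {mu1 - mu2})")
print(f"sd of differences:   {statistics.stdev(diffs):.3f} (theory {theoretical_se:.3f})")
```

The simulated differences should be centred near µ1 − µ2 = 9, with a spread close to the standard error given by the formula.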



Variance of the Distribution (2)

Example

In a study of annual family household expenditures, two populations were surveyed with the following results:

Population 1: n1 = 40, $\mu_{M_1}$ = 346 €

Population 2: n2 = 35, $\mu_{M_2}$ = 300 €

If the variances of the populations are $\sigma_1^2 = 2800$ and $\sigma_2^2 = 3250$, what is the probability that the mean expenditure of Population 1 is at least 30 euros greater than that of Population 2?

The mean of the sampling distribution of the difference between means:

$$\mu_{M_1 - M_2} = 346 - 300 = 46$$

The standard error is:

$$\sigma_{M_1 - M_2} = \sqrt{\frac{2800}{40} + \frac{3250}{35}} = \sqrt{70 + 92.86} = 12.76$$



The probability that the mean expenditure of Population 1 exceeds that of Population 2 by 30 euros or more is equal to the shaded area under the density curve of the sampling distribution.

The area can be computed with the following R command:

> pnorm(30,mean=46,sd=12.76, lower.tail = FALSE)

[1] 0.8950642
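For readers without R, the same calculation can be sketched in Python with the standard library. The values n2 = 35 and a second mean of 300 are read off the arithmetic in the worked solution (346 − 300 and 3250/35), so treat them as inferred rather than stated.

```python
import math
from statistics import NormalDist

n1, mean1, var1 = 40, 346.0, 2800.0
n2, mean2, var2 = 35, 300.0, 3250.0  # n2 and mean2 inferred from the worked arithmetic

mean_diff = mean1 - mean2                    # 46
se_diff = math.sqrt(var1 / n1 + var2 / n2)   # about 12.76

# Upper-tail area, the counterpart of pnorm(..., lower.tail = FALSE).
p = 1 - NormalDist(mu=mean_diff, sigma=se_diff).cdf(30)

print(f"mean of the difference: {mean_diff:.0f}")
print(f"standard error:         {se_diff:.2f}")
print(f"P(difference >= 30):    {p:.4f}")
```

The result agrees with the R output to about three decimal places; any remaining difference comes from using the unrounded standard error instead of 12.76.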



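Equal Sample Sizes

The formula for the standard error simplifies with equal sample sizes and equal variances:

$$\sigma_{M_1 - M_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{n}} = \sqrt{\frac{2\sigma^2}{n}}$$

15 Year-old Beings From Earth

8 girls
Mean height: 165
Variance of height: 64

8 boys
Mean height: 175
Variance of height: 64

The mean of the sampling distribution of the difference between means:

$$\mu_{M_1 - M_2} = 165 - 175 = -10$$

The standard error is:

$$\sigma_{M_1 - M_2} = \sqrt{\frac{2\sigma^2}{n}} = \sqrt{\frac{2 \cdot 64}{8}} = 4$$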



Equal Sample Sizes (2)

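What is the probability that the mean height for the sample of girls would be higher than the mean height for the sample of boys?

$$\mu_{M_1 - M_2} = -10$$

$$\sigma_{M_1 - M_2} = 4$$

The girls' mean is higher when the difference M1 − M2 is greater than 0, which is z = (0 − (−10))/4 = 2.5 standard errors above the mean of the sampling distribution.

Shaded area = 0.0062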
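The shaded area can be reproduced with a short sketch that uses the numbers from this example; the Normal model for the difference between sample means is the one assumed throughout this section.

```python
import math
from statistics import NormalDist

n = 8                                    # girls and boys per sample
variance = 64.0                          # common variance of height
mean_girls, mean_boys = 165.0, 175.0

mean_diff = mean_girls - mean_boys       # -10
se = math.sqrt(2 * variance / n)         # sqrt(2 * 64 / 8) = 4

# The girls' sample mean is higher when the difference is above 0.
z = (0 - mean_diff) / se                 # 2.5
p_girls_taller = 1 - NormalDist(mu=mean_diff, sigma=se).cdf(0.0)

print(f"z = {z:.1f}")
print(f"P(girls' mean > boys' mean) = {p_girls_taller:.4f}")
```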



Sampling Distribution of Pearson's r

Let the population correlation be ρ = 0.60.

Different samples yield different values of r.

The distribution of r over repeated samples is the sampling distribution of r.

Shapes of the distribution (figure, two panels): ρ = 0.60 and N = 12; ρ = 0.90 and N = 12



Fisher's z′-transformation

For a true comparison, the variable needs to be normally distributed, but r is not.

Fisher's z′-transformation provides a related variable z′ that is approximately normally distributed.

The transformation:

$$z' = 0.5 \ln\frac{1 + r}{1 - r}$$

The standard error of z′ is

$$\sigma_{z'} = \frac{1}{\sqrt{N - 3}}$$

where N is the number of pairs of scores.

Example: the probability that, in a sample of 12 students, the sample value of r would be 0.75 or higher.

ρ = 0.60, N = 12, r > 0.75

Transformed values: ρ = 0.60 → z′ = 0.693 and r = 0.75 → z′ = 0.973

Standard error:

$$\frac{1}{\sqrt{12 - 3}} = \frac{1}{\sqrt{9}} = \frac{1}{3} = 0.333$$



Probability of obtaining an r above a specified value

Calculation of the probability of obtaining a z′ above 0.973 (that is, an r above 0.75):

Mean = 0.693

Sd = 0.333

Shaded area: 0.200219
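The whole calculation, from the transformation to the tail area, can be sketched as follows. The function name `fisher_z` is a hypothetical helper; the inputs ρ = 0.60, r = 0.75, and N = 12 come from the example.

```python
import math
from statistics import NormalDist


def fisher_z(r):
    """Fisher's z' transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))


rho, r, n_pairs = 0.60, 0.75, 12

z_rho = fisher_z(rho)             # about 0.693
z_r = fisher_z(r)                 # about 0.973
se = 1 / math.sqrt(n_pairs - 3)   # 1/3

# Upper-tail area of a Normal distribution with mean z_rho and sd se.
p = 1 - NormalDist(mu=z_rho, sigma=se).cdf(z_r)

print(f"z'(rho) = {z_rho:.3f}, z'(r) = {z_r:.3f}, se = {se:.3f}")
print(f"P(r > 0.75) = {p:.4f}")
```

With unrounded intermediate values the result is close to 0.2006; the figure 0.200219 corresponds to using the rounded inputs 0.693, 0.973, and 0.333.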



Proportions

Candidate A: favoured by 0.60

Candidate B: favoured by 0.40



Relation to Sampling Distribution of the Mean

Here p = 7/10 = 0.7

The binomial distribution is the distribution of the number of successes.

The sampling distribution of p is the distribution of the proportion of successes. (Do not confuse the sample proportion p with the population proportion π.)



Mean and Standard Deviation

Mean of binomial:

$$\mu = N\pi$$

Mean of sampling distribution of p:

$$\mu_p = \pi$$

Standard deviation of binomial:

$$\sigma = \sqrt{N\pi(1 - \pi)}$$

Standard deviation of sampling distribution of p:

$$\sigma_p = \frac{\sqrt{N\pi(1 - \pi)}}{N} = \sqrt{\frac{\pi(1 - \pi)}{N}}$$



Back to the Example

π = 0.60

p = 0.70

$$\sigma_p = \sqrt{\frac{0.6(1 - 0.6)}{10}} = 0.155$$
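The standard error in this example, and its relation to the binomial standard deviation, can be verified with a small sketch. The helper name `se_proportion` is hypothetical; π = 0.60 and N = 10 are the values from the example.

```python
import math


def se_proportion(pi, n):
    """Standard deviation of the sampling distribution of a proportion."""
    return math.sqrt(pi * (1 - pi) / n)


pi, n = 0.60, 10

sigma_p = se_proportion(pi, n)                # about 0.155
sd_binomial = math.sqrt(n * pi * (1 - pi))    # sd of the number of successes

print(f"sd of number of successes (binomial): {sd_binomial:.3f}")
print(f"sd of proportion of successes:        {sigma_p:.3f}")
print(f"binomial sd divided by N:             {sd_binomial / n:.3f}")
```

Dividing the binomial standard deviation by N gives the same value as the formula for σp, which is how the two distributions are related.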



The sampling distribution of p is approximately normally distributed if N is fairly large and π is not close to 0 or 1.

The rule of thumb is that the approximation is good if both Nπ and N(1−π) are greater than 10.