Psych 5500/6500 The Sampling Distribution of the Mean Fall, 2008.

Sampling Distribution of the Mean

The ‘sampling distribution of the mean’ (SDM): the population of all the sample means you could get if you sampled a certain number of scores from a certain population.

For Example

In a previous semester I asked the students to draw a sample from a deck of playing cards.

Original Population from Which the Sample Was Drawn

4 cards of each type (jacks counted as ‘11’, queens as ‘12’, kings as ‘13’). This is a graph of individual scores in the population (i.e. ‘Y’). The mean of the population of playing cards is μY=7 and its standard deviation is σY=3.74. Note the population is not normally distributed, and the exact values of μ and σ are known (not estimated).
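The two population values can be checked directly. The sketch below assumes aces count as 1 (the slide only states the values for jacks, queens, and kings) and uses the population form of the standard deviation, since the whole deck is known; the variable names are illustrative.

```python
import statistics

# Card values 1 (ace, assumed) through 13 (king), four of each.
deck = [value for value in range(1, 14) for _suit in range(4)]

mu_y = statistics.fmean(deck)       # population mean
sigma_y = statistics.pstdev(deck)   # population SD (divides by N, not N-1)

print(mu_y, round(sigma_y, 2))      # 7.0 3.74
```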

Sample Means When N=4

The students were asked to sample four cards and find the mean of the sample. Not surprisingly they obtained many different sample means.

Sample means from when n=4:

6.75, 5.5, 7.25, 7.5, 6.25, 8, 9.75, 7, 9.5, 4.25, 3.75, 9, 5.25, 7.75, 7.5, 11.5, 2.25, 9.25, 3.5, 8.75, 5.5, 6, 7.25, 7.75, 9.75, 7, 9.5, 7.5, 6.5, 9.25, 7.25, 7.25, 9.5

SDM for N=4

This is a graph of the 33 sample means (rounded off to the nearest whole number). We are starting to see the shape of the sampling distribution of the mean when n=4. Note that the mean of the sample means looks to be around ‘7’, the mean of the original population (which is why the sample mean is an unbiased estimate of the population mean).

Sample Means When N=8

The students were then asked to sample eight cards and find the sample mean.

Sample means from when n=8:

6.25, 8.25, 5.75, 5.38, 6.63, 7.5, 7.5, 9.13, 8.38, 5.63, 6.13, 5.88, 8.13, 7.75, 7.13, 4.7, 5.63, 6.63, 9.13, 5.88, 5.88, 5.13, 8.63, 6.13, 7.5, 9.13, 8.13, 7.63, 6.75, 7.88, 7.38, 7.50, 7.85

SDM for N=8

Again, this is a graph of the sample means for when N=8. And again, the mean of the sample means looks to be the same as the mean of the original population (7).

Comparisons

The next three slides show the three graphs. Note the following:

1. While the population from which we sampled was not normally distributed, the graphs of the sample means begin to look more like normal curves.

2. The variance of the sample means is less than the variance of the original population, and as n moves from 4 to 8 the variance of the sample means decreases (the sample mean is a ‘consistent’ estimate of the population mean).
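Both observations can be explored with a small simulation in place of a classroom of students. This is only a sketch under stated assumptions: cards are drawn without replacement from one 52-card deck with aces counted as 1, and the seed, the number of repetitions, and the helper name `simulate_sample_means` are arbitrary choices made here, not part of the original exercise.

```python
import random
import statistics

def simulate_sample_means(population, n, repetitions, rng):
    """Draw `repetitions` samples of size n (without replacement) and return their means."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)]

deck = [value for value in range(1, 14) for _suit in range(4)]
rng = random.Random(2008)  # fixed seed (arbitrary) so the run is repeatable

means_n4 = simulate_sample_means(deck, 4, 20_000, rng)
means_n8 = simulate_sample_means(deck, 8, 20_000, rng)

# Print the mean and standard deviation of each set of simulated sample means.
for label, means in (("n=4", means_n4), ("n=8", means_n8)):
    print(label, round(statistics.fmean(means), 2), round(statistics.pstdev(means), 2))
```

The mean of the simulated sample means should sit close to 7 for both sample sizes, while their spread shrinks as n grows. Because the cards are drawn without replacement from a small deck, the simulated spread comes out slightly smaller than σY divided by the square root of n.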

Original Population from Which the Sample Was Drawn

This is a graph of individual scores (Y).

SDM for N=4

This is a graph of sample means (when n=4).

SDM for N=8

This is a graph of the sample means for when N=8.

Short Cut

The preceding approach for finding the sampling distribution of the mean would actually require that we obtain an infinite number of sample means to arrive at a true picture of the population of sample means we could obtain if we sampled a certain number of scores from a certain population (i.e. the SDM). This is a good way to introduce the concept of SDM but we need a short cut for actually producing an SDM...

1) The Shape of the SDM

You can count on the SDM being normally distributed if either of the following two conditions is met.

1. The SDM will be normally distributed if the population you sampled from is normally distributed.

2. The SDM will be normally distributed (even if the population you sampled from is not) if the N of your sample is large enough (Central Limit Theorem). Rule of thumb: N ≥ 30

2) The Mean of the SDM

The mean of the population of sample means equals the mean of the population from which you sample (that is why the sample mean is an ‘unbiased’ estimate of the population mean).

$\mu_{\bar{Y}} = \mu_Y$

3) The Standard Deviation of the SDM

The standard deviation of the sample means is less than the standard deviation of the population from which you sampled, as the means will vary less than the scores do.

$\sigma_{\bar{Y}} = \dfrac{\sigma_Y}{\sqrt{N}}$

$\sigma_{\bar{Y}}$ is also known as the ‘standard error of the mean’; can you figure out why it is called that?
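Why the standard deviation of the sample means equals the population standard deviation divided by the square root of N can be sketched in a few lines. The derivation assumes the N scores are drawn independently, each with variance $\sigma_Y^2$; that assumption and the index $i$ are introduced here for illustration and are not stated on the slides.

```latex
\begin{align*}
\bar{Y} &= \frac{1}{N}\sum_{i=1}^{N} Y_i \\
\operatorname{Var}(\bar{Y}) &= \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}(Y_i)
  = \frac{N\,\sigma_Y^2}{N^2} = \frac{\sigma_Y^2}{N} \\
\sigma_{\bar{Y}} &= \sqrt{\operatorname{Var}(\bar{Y})} = \frac{\sigma_Y}{\sqrt{N}}
\end{align*}
```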

Example: Original Population

Let’s say the population is normally distributed, which means that the SDM will be normally distributed as well.

SDM for N=4

$\mu_{\bar{Y}} = \mu_Y = 60$

$\sigma_{\bar{Y}} = \dfrac{\sigma_Y}{\sqrt{N}} = \dfrac{16}{\sqrt{4}} = 8$

SDM for N=64

$\mu_{\bar{Y}} = \mu_Y = 60$

$\sigma_{\bar{Y}} = \dfrac{\sigma_Y}{\sqrt{N}} = \dfrac{16}{\sqrt{64}} = 2$
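The two standard errors in this example can be reproduced with a one-line function. The sketch takes the example population's standard deviation of 16 from the slides; the function name `standard_error` is an illustrative choice.

```python
import math

def standard_error(sigma_y, n):
    """Standard deviation of the sample means for samples of size n."""
    return sigma_y / math.sqrt(n)

print(standard_error(16, 4))    # 8.0
print(standard_error(16, 64))   # 2.0
```

Going from N=4 to N=64 multiplies the sample size by 16 but only cuts the standard error by a factor of 4, because N enters through its square root.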

Probability and the SDM

When the SDM is normally distributed we can answer certain types of questions. The following slides take us through a typical question from the homework assignment.

Question

We will begin by repeating a process learned in an earlier lecture.

We are sampling from a population that is normally distributed with a mean of 55 and a standard deviation of 10.

What is the probability of drawing a score from that population that is between 50 and 60?

$p(50 \le Y \le 60)$?

Original Population

Step 1: draw and label the population.

Original Population

Step 2: shade in the area of question.

Original Population

Step 3: compute the z scores and look up the area under the normal curve. Here $z = (50 - 55)/10 = -0.5$ and $z = (60 - 55)/10 = 0.5$, so the probability of obtaining a single score with $50 \le Y \le 60$ is .1915 + .1915 = .3830, p = .3830
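The table lookup can be cross-checked with Python's standard library, which provides the normal cumulative distribution through `statistics.NormalDist` (Python 3.8 or later). This is a sketch for checking the arithmetic, not a replacement for the draw, shade, and look-up steps.

```python
from statistics import NormalDist

# Original population: normal with mean 55 and standard deviation 10.
population = NormalDist(mu=55, sigma=10)

# Area under the curve between 50 and 60.
p_single = population.cdf(60) - population.cdf(50)
print(round(p_single, 4))   # 0.3829
```

The computed area rounds to .3829; the .3830 from the table comes from adding two entries that were each already rounded to four places.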

Question

Now we are going to ask a new question. If we sample nine scores from that population, what is the probability of obtaining a sample mean that is between 50 and 60?

$p(50 \le \bar{Y} \le 60)$?

SDM for N=9

Step 1: draw the sampling distribution of the mean, which is the population of all the sample means we could get if we sample 9 scores from the original population. We know the SDM is normally distributed, its mean is the same as the mean of the population, and we can compute the standard deviation of the curve (‘standard error’). Note this is a population of sample means.

SDM for N=9

Step 2: shade in the area of question.

SDM & Standard Score

To figure out the shaded area of the normal curve we need to change the sample means of 50 and 60 to standard scores.

As always, the standard score will be the ‘raw’ score on the graph (this is a graph of sample means) minus the mean of the graph (the mean of the sample means), divided by the standard deviation of the graph (the standard deviation of the sample means, a.k.a. the ‘standard error’).

$z = \dfrac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}}$

SDM for N=9

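Step 3: compute the z scores and look up the area under the normal curve. The probability of obtaining a sample mean between 50 and 60 = .4332 + .4332, p = .8664

$z = \dfrac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}} = \dfrac{50 - 55}{3.33} = -1.5$ and $z = \dfrac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}} = \dfrac{60 - 55}{3.33} = 1.5$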
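The same check works for the sample mean once the curve is the SDM rather than the original population. The sketch below assumes the standard error is the population standard deviation divided by the square root of N (10 divided by 3, about 3.33) and uses `statistics.NormalDist` from the standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu_y, sigma_y, n = 55, 10, 9

# The SDM: same mean as the population, standard deviation equal to the standard error.
sdm = NormalDist(mu=mu_y, sigma=sigma_y / math.sqrt(n))

# Area under the SDM between sample means of 50 and 60.
p_mean = sdm.cdf(60) - sdm.cdf(50)
print(round(p_mean, 4))   # 0.8664
```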

Looking Back

When we sampled one score from a normal population that had μ=55 and σ=10 there was a 38.3% chance that the score would be within 5 of the population mean.

When we sampled 9 scores from that population there was an 86.64% chance that the sample mean would be within 5 of the population mean.

1-tail and 2-tail p values

We are very close to doing some statistical analyses to test specific hypotheses. The next step is to play with scenarios such as:

You sample 36 scores from a population that has a μ=80 and σ=12. For what value of the sample mean is there only a 5% chance that you would obtain a sample mean that is that far or farther above the population mean?

To set up the problem first draw the population you will be sampling from, and then the SDM (population of sample means for N=36). We don’t know if the population is normally distributed; do we know if the SDM is?

Formulas

$z = \dfrac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}}$ and $\bar{Y} = \mu_{\bar{Y}} + (z)(\sigma_{\bar{Y}})$

What sample mean would be 1.65 standard deviations above the mean on this curve?

$\bar{Y} = \mu_{\bar{Y}} + (z)(\sigma_{\bar{Y}}) = 80 + (1.65)(2) = 83.3$
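Working backwards from a probability to a sample mean can be checked with the inverse of the normal cumulative distribution. The sketch below uses `NormalDist.inv_cdf` from Python's standard library and assumes the SDM is normal with a standard error of 12 divided by the square root of 36, which is 2; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu_y, sigma_y, n = 80, 12, 36
standard_error = sigma_y / math.sqrt(n)          # 2.0

# z score with 5% of the curve above it (the tables round this to 1.65).
z_upper = NormalDist().inv_cdf(0.95)

# Sample mean that sits z_upper standard errors above the mean of the SDM.
cutoff_upper = mu_y + z_upper * standard_error

print(round(z_upper, 3), round(cutoff_upper, 1))  # 1.645 83.3
```

The unrounded z score is slightly below 1.65, so the unrounded cutoff is about 83.29; to one decimal place it agrees with the hand calculation.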

Conditional Probability

Let’s think of it as a conditional probability.

$p(\bar{Y} \ge 83.3 \mid \text{sampling 36 scores from a population with a } \mu \text{ of 80 and } \sigma \text{ of 12}) = .05$

Another Example

You sample 36 scores from a population that has a μ=80 and σ=12. For what value of the sample mean is there only a 5% chance that you would obtain a sample mean that is that far or farther below the population mean?

Conditional Probability

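$\bar{Y} = \mu_{\bar{Y}} + (z)(\sigma_{\bar{Y}}) = 80 + (-1.65)(2) = 76.7$

$p(\bar{Y} \le 76.7 \mid \text{sampling 36 scores from a population with a } \mu \text{ of 80 and } \sigma \text{ of 12}) = .05$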

Final Example

You sample 36 scores from a population that has a μ=80 and σ=12. For what values of the sample mean is there only a 5% chance that you would obtain a sample mean that is that far or farther away from the population mean (in either direction)?

For a normal curve the z scores that cut off a total of the 5% most extreme scores (in both directions) are: $z = -1.96$ and $z = +1.96$

Conditional Probability

$p(\bar{Y} \le 76.08 \text{ or } \bar{Y} \ge 83.92 \mid \text{sampling 36 scores from a population with a } \mu \text{ of 80 and } \sigma \text{ of 12}) = .05$
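The two-tailed cutoffs can be checked the same way by putting 2.5% in each tail. This is a sketch that assumes a normal SDM with a standard error of 2 (12 divided by the square root of 36) and uses `NormalDist.inv_cdf` from the standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu_y, sigma_y, n = 80, 12, 36
standard_error = sigma_y / math.sqrt(n)   # 2.0

# Splitting 5% across both tails leaves 2.5% in each, so look up the 97.5th percentile.
z_two_tail = NormalDist().inv_cdf(0.975)

lower_cutoff = mu_y - z_two_tail * standard_error
upper_cutoff = mu_y + z_two_tail * standard_error

print(round(z_two_tail, 2), round(lower_cutoff, 2), round(upper_cutoff, 2))  # 1.96 76.08 83.92
```

Because the 5% is split between the two tails, each cutoff lies farther from 80 than the one-tailed cutoff found with z = 1.65.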