1 The (“Sampling”) Distribution for the Sample Mean*


Page 1: 1 The (“Sampling”) Distribution for the Sample Mean*

1

The (“Sampling”) Distribution for the Sample Mean*

Page 2: 1 The (“Sampling”) Distribution for the Sample Mean*

2

Distribution of Sample Means

A quantitative population of N units with parameters

mean $\mu$, standard deviation $\sigma$

A random sample of n units from the population

Statistic: The sample mean $\bar{X}$.

Page 3: 1 The (“Sampling”) Distribution for the Sample Mean*

3

Distribution of Sample Means

Statistic: The sample mean $\bar{X}$.

This statistic is an unbiased point estimate (on average correct) of the parameter $\mu$:

$\mu_{\bar{X}} = \mu$

Page 4: 1 The (“Sampling”) Distribution for the Sample Mean*

4

20 Times Rule / 5% Rule (same thing)

If the population size (N) is at least 20 times the sample size (n)

N / n ≥ 20 or n / N ≤ 0.05

then the standard deviation of the sample mean is (essentially)

$\sigma_{\bar{X}} = \sigma / \sqrt{n}$

Page 5: 1 The (“Sampling”) Distribution for the Sample Mean*

5

Distribution of the Sample Mean

Given

A variable with population that is not Normally distributed with mean $\mu$ and standard deviation $\sigma$.

A random sample of size n.

Result

The sample mean $\bar{X}$ has approximate Normal distribution with

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma / \sqrt{n}$

when the population size is at least 20 times n.

Page 6: 1 The (“Sampling”) Distribution for the Sample Mean*

6

Example

Rolls of paper leave a factory with weights that are Normal with mean $\mu$ = 1493 lbs and standard deviation $\sigma$ = 12 lbs.

Page 7: 1 The (“Sampling”) Distribution for the Sample Mean*

7

Finding probabilities

What is the probability a roll weighs over 1500 lbs?

$Z = \frac{1500 - 1493}{12} = 0.5833$

ANS: 0.2798

(about 28% of rolls exceed 1500 lbs)

Page 8: 1 The (“Sampling”) Distribution for the Sample Mean*

8

New Question

A truck transports 8 rolls at a time. The legal weight limit for the truck is 12,000 lbs. What is the probability 8 rolls have total weight exceeding this limit?

Since 12000/8 = 1500, the question could also be phrased:

What is the probability 8 rolls have (sample) mean weight exceeding 1500?

The bad news: The answer is not 0.2798.

The good news: It’s not that tough.

Page 9: 1 The (“Sampling”) Distribution for the Sample Mean*

9

Distribution of the Sample Mean

Given

A variable with population that is Normally distributed with mean $\mu$ and standard deviation $\sigma$.

A random sample of size n. (N / n ≥ 20)

Result

The sample mean $\bar{X}$ has Normal distribution with

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma / \sqrt{n}$

Review of previous slide

Page 10: 1 The (“Sampling”) Distribution for the Sample Mean*

10

Example - continued

Rolls (single rolls) of paper leave a factory with weights that are Normal with mean $\mu$ = 1493 lbs and standard deviation $\sigma$ = 12 lbs.

If n = 8 rolls are randomly selected, what is the probability their sample mean weight exceeds 1500?

The distribution of sample means $\bar{X}$ is Normal.

$\mu_{\bar{X}} = 1493$

$\sigma_{\bar{X}} = \sigma / \sqrt{n} = 12 / \sqrt{8} = 4.243$

Page 11: 1 The (“Sampling”) Distribution for the Sample Mean*

11

Finding probabilities

Find the probability the sample mean is over 1500 lbs.

Here we’re using the same mean, but a standard deviation reduced to 4.243.

$Z = \frac{1500 - 1493}{4.243} = \frac{7}{4.243} = 1.650$

ANS: 0.0495

Page 12: 1 The (“Sampling”) Distribution for the Sample Mean*

12

Interpreting the Result

The probability the sample mean for 8 rolls exceeds 1500 lbs is 0.0495.

For 4.95% of all possible samples of 8 rolls, the sample mean exceeds 1500 lbs.

Equivalent: There is a 0.0495 probability that the total weight will exceed 8 × 1500 = 12,000 lbs.

We’re working towards using the sample mean as an estimate of the population mean.
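The two roll calculations can be checked with a short Python sketch. It assumes only the standard library; `normal_cdf` is a helper name introduced here for illustration, not something from the slides.

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a Normal(mean, sd) variable, via the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

mu, sigma = 1493.0, 12.0   # single-roll mean and standard deviation (lbs)
n, limit = 8, 1500.0       # rolls per truck, mean weight at the legal limit

# Single roll: Z = (1500 - 1493) / 12
p_single = 1.0 - normal_cdf(limit, mu, sigma)

# Mean of 8 rolls: same mean, standard deviation sigma / sqrt(n)
sd_mean = sigma / sqrt(n)
p_mean = 1.0 - normal_cdf(limit, mu, sd_mean)

print(f"P(single roll > 1500)     = {p_single:.4f}")
print(f"SD of the sample mean     = {sd_mean:.3f}")
print(f"P(mean of 8 rolls > 1500) = {p_mean:.4f}")
```

The only change between the two probabilities is the standard deviation passed to the Normal curve.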

Page 13: 1 The (“Sampling”) Distribution for the Sample Mean*

13

The Picture

[Figure: two Normal curves on a Weight (lbs) axis from 1453 to 1533, with 1500 marked. One curve shows weights of single rolls; the other shows sample mean weights for samples of 8 rolls.]

Page 14: 1 The (“Sampling”) Distribution for the Sample Mean*

14

The Picture

About 28% of all rolls are > 1500 lbs

Page 15: 1 The (“Sampling”) Distribution for the Sample Mean*

15

The Picture

About 5% of all samples of 8 rolls have mean > 1500 lbs

Page 16: 1 The (“Sampling”) Distribution for the Sample Mean*

16

Example

Survival times have a right skewed distribution with mean $\mu$ = 13 months and standard deviation $\sigma$ = 12 months.

What can we say about the distribution of sample mean survival times for samples of n patients?

$\mu_{\bar{X}} = 13$

$\sigma_{\bar{X}} = \sigma / \sqrt{n} = 12.0 / \sqrt{n}$

As n gets larger, the distribution gets closer to Normal.

Page 17: 1 The (“Sampling”) Distribution for the Sample Mean*

17

[Figure: four distributions on a common axis from 0 to 60, each with mean 13.]

Single values: SD = 12.0

Sample mean, n = 4: SD = 6.0

Sample mean, n = 16: SD = 3.0

Sample mean, n = 64: SD = 1.5
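The standard deviations for the survival-time example follow directly from $\sigma / \sqrt{n}$. A small Python sketch (the function name `sd_of_sample_mean` is introduced here for illustration):

```python
from math import sqrt

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

sigma = 12.0  # population standard deviation of survival times (months)
for n in (1, 4, 16, 64):
    print(f"n = {n:2d}: SD of sample mean = {sd_of_sample_mean(sigma, n):.1f}")
```

Each fourfold increase in n cuts the standard deviation in half.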

Page 18: 1 The (“Sampling”) Distribution for the Sample Mean*

18

Distribution of the Sample Mean

Given

A variable with population that is not Normally distributed with mean $\mu$ and standard deviation $\sigma$.

A random sample of size n.

Result

The sample mean $\bar{X}$ has approximate Normal distribution with

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma / \sqrt{n}$

Assume the population size is at least 20 times n.

Page 19: 1 The (“Sampling”) Distribution for the Sample Mean*

19

Distribution of the Sample Mean

Given

A variable with population that is not Normally distributed with mean $\mu$ and standard deviation $\sigma$.

A random sample of size n.

Result

The sample mean $\bar{X}$ has generally unknown distribution with

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma / \sqrt{n}$

Page 20: 1 The (“Sampling”) Distribution for the Sample Mean*

20

Distribution of the Sample Mean

Given

A variable with population that is not Normally distributed with mean $\mu$ and standard deviation $\sigma$.

A random sample of size n, where n is sufficiently large.

Result

The sample mean $\bar{X}$ has approximate Normal distribution with

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma / \sqrt{n}$

Central Limit Theorem (CLT)

Page 21: 1 The (“Sampling”) Distribution for the Sample Mean*

21

What is “Sufficiently Large?”

Your book says “generally n at least 30.”

If the population is fairly symmetric without outliers, considerably less than 30 will do the trick.

If the population is highly skewed, or not unimodal, considerably more than 30 may be required.

If the population is Normal then sample size is not a concern: The sample mean is Normal.

You may use the “30” rule if you recognize that it’s not that black and white, and that for Normal populations, n = 1 is “sufficiently large.”
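To see how skew fades as n grows, here is a rough simulation sketch in Python. The population is an assumption made purely for illustration: a right-skewed exponential distribution with mean 1, not any data set from these slides. The helper names `skewness` and `simulate_sample_means` are also introduced here.

```python
import random

def skewness(values):
    """Moment-based skewness: 0 for a symmetric shape, positive for right skew."""
    m = sum(values) / len(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from the assumed exponential population
    (mean 1) and return the list of sample means."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (1, 4, 16, 64):
    means = simulate_sample_means(n, 4000, rng)
    results[n] = skewness(means)
    print(f"n = {n:2d}: skewness of sample means = {results[n]:.2f}")
```

For this particular population the skewness of the sample mean shrinks roughly like $2 / \sqrt{n}$; other skewed populations will approach Normal at different rates, which is why “30” is only a rule of thumb.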

Page 22: 1 The (“Sampling”) Distribution for the Sample Mean*

22

Example

The Census Bureau reports the average age at death for female Americans is 79.7 years, with standard deviation 14.5 years.

$\mu$ = 79.7 years, $\sigma$ = 14.5 years

What can we say about the distribution of sample means for samples of size 7?

It has mean $\mu_{\bar{X}} = 79.7$

It has standard deviation $\sigma_{\bar{X}} = \sigma / \sqrt{n} = 14.5 / \sqrt{7} = 5.48$

Is the distribution Normal?

Page 29: 1 The (“Sampling”) Distribution for the Sample Mean*

29

Example

Distribution of longevity: $\mu \approx 80$, $\sigma \approx 15$

If Normal

Within 1 s.d.: (65, 95) 68%

Within 2 s.d.s: (50, 110) 95%

Above 110: 2.5%

1 in 40 ???

No way! The distribution is not Normal.

Page 30: 1 The (“Sampling”) Distribution for the Sample Mean*

Example

The Normal shouldn’t be used here (why not?)

[Figure: histogram of Age at Death (years) for Women, horizontal axis from 30 to 90; vertical axis: Percent of Women, 0 to 16.]

Page 31: 1 The (“Sampling”) Distribution for the Sample Mean*

31

Example

The Normal shouldn’t be used here (why not?)

The distribution of age at death is not Normal. It is quite left skewed.

The sample size is not sufficiently large. (At least 30 by your book, although for this situation your instructor would probably buy into as low as 20.)

The Central Limit Theorem can’t be applied.

The sample mean doesn’t have approximate Normal distribution

Page 32: 1 The (“Sampling”) Distribution for the Sample Mean*

32

Example

What can we say about the distribution of sample means for samples of size 7?

It has mean $\mu_{\bar{X}} = 79.7$

It has standard deviation $\sigma_{\bar{X}} = \sigma / \sqrt{n} = 14.5 / \sqrt{7} = 5.48$

Is the distribution Normal?

NO!

Page 33: 1 The (“Sampling”) Distribution for the Sample Mean*

33

Example

$\mu$ = 79.7 years, $\sigma$ = 14.5 years

I looked at a few recent obituaries in the Oswego Daily News (online):

79 70 48 99 85 71 45

$\bar{X} = 71.00$, $S = 19.36$

Page 34: 1 The (“Sampling”) Distribution for the Sample Mean*

34

Example

This sample has $\bar{X} = 71.0$. A difference of 8.7 from $\mu$ = 79.7.

$\mu_{\bar{X}} = 79.7$

$\sigma_{\bar{X}} = \sigma / \sqrt{n} = 14.5 / \sqrt{7} = 5.48$

Can we compute a Z score for 71.0? Should we?

Z = (71.0 – 79.7) / 5.48 = –8.7 / 5.48 = –1.59

Why not? This suggests 71.0 (8.7 from 79.7) is somewhat, but not extremely, unusually low. 71.0 is 1.59 standard deviations below 79.7.
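The summary statistics and Z score for the seven obituary ages can be reproduced with Python's standard `statistics` module. This is a sketch that uses only the values quoted on the slides:

```python
from math import sqrt
from statistics import mean, stdev

ages = [79, 70, 48, 99, 85, 71, 45]   # the seven obituary ages
mu, sigma = 79.7, 14.5                # population parameters from the Census Bureau

x_bar = mean(ages)                    # sample mean
s = stdev(ages)                       # sample standard deviation (n - 1 divisor)
sd_mean = sigma / sqrt(len(ages))     # standard deviation of the sample mean
z = (x_bar - mu) / sd_mean            # Z score of the observed sample mean

print(f"sample mean = {x_bar:.2f}, S = {s:.2f}")
print(f"SD of sample mean = {sd_mean:.2f}, Z = {z:.2f}")
```

The Z score is a legitimate measure of distance here, even though a Normal table should not be used to turn it into a probability for n = 7.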

Page 35: 1 The (“Sampling”) Distribution for the Sample Mean*

35

Example

Should we use the Table to obtain probabilities from Z scores (such as our Z = –1.59)?

NO

If not, how could we get the probability of a result within 8.7 of 79.7?

Using either

a huge database of longevities:

Simulate many (all possible) samples of size 7. Determine what proportion of samples give a mean no more than 8.7 from 79.7.

or a mathematical model for the longevities:

Either determine the model for sample means using calculus, or approximate it using numerical methods.

Preferred method: Much more compact; faster to work with; essentially identical results.

Page 36: 1 The (“Sampling”) Distribution for the Sample Mean*

36

Example

What is the distribution of the sample mean of samples of size n = 48?

$\mu_{\bar{X}} = 79.7$

$\sigma_{\bar{X}} = \sigma / \sqrt{n} = 14.5 / \sqrt{48} = 2.09$

Even though age at death is left skewed, with n = 48 (large enough) the Central Limit Theorem applies, and the sample mean has approximate Normal distribution.

Page 37: 1 The (“Sampling”) Distribution for the Sample Mean*

37

Example

I looked at 41 more recent obituaries (total of 48)

79 70 48 99 85 71 45

more data

87 75 90 95 51 99 69 71 49 93 80 89 77 72 101 69 92 92 86 78 92 89 91 81 74 68 89 92 64 71 50 81 88 42 91 44 51 85 81 92 93

$\bar{X} = 77.52$, $S = 16.37$

Page 38: 1 The (“Sampling”) Distribution for the Sample Mean*

38

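Example

[Figure: graphical summary of the 48 ages at death on an axis from 40 to 100, with the mean, median, and mode marked, and 95% confidence intervals for the mean and median on an axis from 75 to 90.]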

Page 39: 1 The (“Sampling”) Distribution for the Sample Mean*

39

Example

Means for samples of 48 US longevities: Normal, with $\mu_{\bar{X}} = 79.7$ and $\sigma_{\bar{X}} = 2.09$.

My sample: $\bar{X} = 77.52$

The sample mean is (79.7 – 77.52) = 2.18 from the population mean.

What is the probability that a random sample of 48 U.S. women’s deaths gives a sample mean within 2.18 of 79.7?

2.18 below 79.7 is 77.52.

2.18 above 79.7 is 81.88.

Page 40: 1 The (“Sampling”) Distribution for the Sample Mean*

40

Example

[Figure: Normal curve, Mean=79.7, StDev=2.08; the area between 77.52 and 81.88 is 0.7054.]

Between 77.52 and 81.88.

Z = 2.18/2.08 = 1.04

Probability = 0.852 – 0.148 = 0.704

Page 41: 1 The (“Sampling”) Distribution for the Sample Mean*

41

Example

Find the probability that a random sample of 48 U.S. women’s deaths gives a sample mean within 2.18 of 79.7.

Probability = 0.704

About 30% (that’s almost 1 in 3) of all samples of 48 deaths give a sample mean more than 2.18 from 79.7.
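A Python sketch of this "within 2.18" calculation follows. It uses the unrounded standard deviation $14.5 / \sqrt{48}$, so its result can differ from the slide values in the third decimal place. `normal_cdf` is a helper name introduced here.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma, n = 79.7, 14.5, 48   # population mean, population SD, sample size
x_bar = 77.52                   # observed sample mean of the 48 obituaries

sd_mean = sigma / sqrt(n)       # about 2.09
z = abs(x_bar - mu) / sd_mean   # about 1.04

p_within = normal_cdf(z) - normal_cdf(-z)   # mean within 2.18 of 79.7
p_beyond = 1.0 - p_within                   # mean more than 2.18 from 79.7

print(f"SD of sample mean = {sd_mean:.2f}, Z = {z:.2f}")
print(f"P(within 2.18) = {p_within:.3f}, P(beyond 2.18) = {p_beyond:.3f}")
```

The Normal approximation is reasonable here only because n = 48 is large enough for the Central Limit Theorem to apply.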

Page 42: 1 The (“Sampling”) Distribution for the Sample Mean*

42

Example

Give two explanations that account for the 2.18 year difference between the data on Oswego longevity (which were lower on average) and the U.S. longevity parameter of 79.7.

1. Women in Oswego do not live as long on average as they do nationwide. That is:

$\mu_{\text{Oswego}} < 79.7$

Page 43: 1 The (“Sampling”) Distribution for the Sample Mean*

43

Example

Give two explanations that account for the 2.18 year difference between the data on Oswego longevity (which were lower on average) and the U.S. longevity parameter of 79.7.

2. Sampling variability (sampling “error”):

$\mu_{\text{Oswego}} = 79.7$

About 30% of all samples of 48 women yield a mean 2.18 or more from 79.7. That isn’t so uncommon. Our data aren’t very inconsistent with the national result.

Page 44: 1 The (“Sampling”) Distribution for the Sample Mean*

44

Sampling Without Replacement

What to do if the sample size is more than 5% of the population size…

N = population size

n = sample size

N / n ≥ 20, n / N ≤ 0.05

Page 45: 1 The (“Sampling”) Distribution for the Sample Mean*

45

Distribution of Sample Means

The distribution of the sample mean has

> mean $\mu_{\bar{X}} = \mu$ (“unbiased”)

> standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

> shape closer to Normal (but not necessarily Normal)

Page 46: 1 The (“Sampling”) Distribution for the Sample Mean*

46

Word Lengths – Gettysburg Address

[Figure: dotplot of Individual Word Lengths, axis from 1 to 11. Each symbol represents up to 2 observations.]

N = 268 words: Mean length = 4.295.

Standard Deviation = 2.123.

Not Normal. Right skewed. Can’t use Table A2.

Page 47: 1 The (“Sampling”) Distribution for the Sample Mean*

47

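Distribution of Sample Means: n = 5

Sample means from samples of size n = 5 have

> mean $\mu_{\bar{X}} = 4.295$

> standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{5}} \sqrt{\frac{268 - 5}{268 - 1}} = 0.9494 \sqrt{\frac{263}{267}} = 0.9494 \times 0.9925 = 0.942$

> shape closer to Normal (but not Normal – a bit right skewed)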

Page 48: 1 The (“Sampling”) Distribution for the Sample Mean*

48

Distribution of Sample Means: n = 5

[Figure: dotplot of Sample Mean Word Length, axis from 1.6 to 8.0, with 4.295 marked. Each symbol represents up to 22 observations.]

The mean of this distribution is 4.295.

The standard deviation of this distribution is 0.942.

The shape is close to Normal (but not Normal – there’s right skew).

Page 49: 1 The (“Sampling”) Distribution for the Sample Mean*

49

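Distribution of Sample Means: n = 10

Sample means from samples of size n = 10 have

> mean $\mu_{\bar{X}} = 4.295$

> standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{10}} \sqrt{\frac{268 - 10}{268 - 1}} = 0.6714 \sqrt{\frac{258}{267}} = 0.6714 \times 0.9830 = 0.660$

> shape closer to Normal (but not exactly Normal – a bit right skewed)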

Page 50: 1 The (“Sampling”) Distribution for the Sample Mean*

50

Distribution of Sample Means: n = 10

[Figure: dotplot of Sample Mean Word Length, axis from 1.6 to 8.0, with 4.295 marked. Each symbol represents up to 31 observations.]

The mean of this distribution is 4.295.

The standard deviation of this distribution is 0.660.

The shape is quite close to Normal (just a little right skew – not enough to fuss over).

Page 51: 1 The (“Sampling”) Distribution for the Sample Mean*

51

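n = 5

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{5}} \sqrt{\frac{268 - 5}{268 - 1}} = 0.9494 \sqrt{\frac{263}{267}} = 0.9494 \times 0.9925 = 0.942$

n = 10

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{10}} \sqrt{\frac{268 - 10}{268 - 1}} = 0.6714 \sqrt{\frac{258}{267}} = 0.6714 \times 0.9830 = 0.660$

The correction factors 0.9925 and 0.9830: Awful close to 1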

Page 52: 1 The (“Sampling”) Distribution for the Sample Mean*

52

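n = 5

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{5}} \sqrt{\frac{268 - 5}{268 - 1}} = 0.9494 \sqrt{\frac{263}{267}} = 0.9494 \times 0.9925 = 0.942$

n = 10

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{10}} \sqrt{\frac{268 - 10}{268 - 1}} = 0.6714 \sqrt{\frac{258}{267}} = 0.6714 \times 0.9830 = 0.660$

With and without the correction (0.9494 vs 0.942, 0.6714 vs 0.660): Almost the same.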

Page 53: 1 The (“Sampling”) Distribution for the Sample Mean*

53

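n = 100

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.123}{\sqrt{100}} \sqrt{\frac{268 - 100}{268 - 1}} = 0.2123 \sqrt{\frac{168}{267}} = 0.2123 \times 0.7932 = 0.1684$

The correction factor 0.7932: Not so close to 1

With and without the correction (0.2123 vs 0.1684): Not almost the same.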
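The effect of the finite population correction on the Gettysburg Address word lengths can be tabulated with a short Python sketch. The function names are introduced here for illustration.

```python
from math import sqrt

def correction_factor(n, N):
    """Finite population correction: sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

def sd_of_sample_mean(sigma, n, N=None):
    """SD of the sample mean; applies the correction when N is given."""
    base = sigma / sqrt(n)
    return base if N is None else base * correction_factor(n, N)

sigma, N = 2.123, 268   # word-length SD and number of words in the population
for n in (5, 10, 100):
    plain = sd_of_sample_mean(sigma, n)
    corrected = sd_of_sample_mean(sigma, n, N)
    print(f"n = {n:3d}: n/N = {n / N:.3f}, factor = {correction_factor(n, N):.4f}, "
          f"uncorrected = {plain:.4f}, corrected = {corrected:.4f}")
```

For n = 5 and n = 10 the sample is under 5% of the population and the factor barely matters; for n = 100 (about 37% of the population) it cannot be ignored.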

Page 54: 1 The (“Sampling”) Distribution for the Sample Mean*

54

Distribution of the Sample Mean

Given (PARAMETERS)

A variable with population that is distributed with mean $\mu$ and standard deviation $\sigma$.

A random sample of size n.

Results 1 and 2 (STATISTIC)

The sample mean $\bar{X}$ has distribution with the same mean and a smaller standard deviation.

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

Page 55: 1 The (“Sampling”) Distribution for the Sample Mean*

55

Distribution of the Sample Mean

Given

A variable with population that is distributed with mean $\mu$ and standard deviation $\sigma$.

A random sample of size n.

Result 3

The sample mean $\bar{X}$ has distribution with a shape that is closer to Normal.

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$