226 lec9 jda

Chapter 4

Probability and Sampling Distributions

Random Variable

Definition: A random variable is a variable whose value is a numerical outcome of a random phenomenon. A statistic calculated from a randomly chosen sample is an example of a random variable: we don’t know its exact value before the sample is drawn.

A statistic from a random sample will take different values if we take more samples from the same population.

Section 4.4

The Sampling Distribution of a Sample Mean

Introduction

A statistic from a random sample will take different values if we take more samples from the same population.

The values of a statistic do not vary haphazardly from sample to sample; they have a regular pattern in many samples. We have already seen the idea of a sampling distribution.

We’re now going to discuss an important sampling distribution: the sampling distribution of the sample mean, x-bar ($\bar{x}$).

Example

Suppose that we are interested in the workout times of ISU students at the Recreation Center.

Let μ be the average workout time of all ISU students. To estimate μ, let’s take a simple random sample of 100 students at ISU. We record each student’s workout time (x) and then find the average workout time x-bar for the 100 students.

The population mean μ is the parameter of interest. The sample mean x-bar is the statistic (which is a random variable). We use x-bar to estimate μ (this seems like a sensible thing to do).

Example

A SRS should be a fairly good representation of the population, so x-bar should be somewhere near μ. Because of the randomization, x-bar from a SRS is an unbiased estimate of μ. Still, we don’t expect x-bar to be exactly equal to μ.

There is variability in x-bar from sample to sample: if we take another simple random sample (SRS) of 100 students, x-bar will probably be different. Why, then, can I use the results of one sample to estimate μ?

Statistical Estimation

If x-bar is rarely exactly right and varies from sample to sample, why is x-bar a reasonable estimate of the population mean μ? Answer: if we keep on taking larger and larger samples, the statistic x-bar is guaranteed to get closer and closer to the parameter μ.

We have the comfort of knowing that if we can afford to keep on measuring more subjects, eventually we will estimate the mean workout time of ISU students very accurately.

The Law of Large Numbers

Law of Large Numbers (LLN): Draw independent observations at random from any population with finite mean μ. As the number of observations drawn increases, the mean x-bar of the observed values gets closer and closer to the mean μ of the population.

If n is the sample size, then as n gets large,

$$\bar{x} \to \mu$$

The Law of Large Numbers holds for any population, not just for special classes such as Normal distributions.
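Here is a minimal Python simulation of the LLN. The population is an assumption, not one from the slides: rolls of a fair six-sided die. This population is clearly non-Normal and has mean μ = 3.5. The running mean x-bar of the first n rolls should drift toward 3.5 as n grows.

```python
import random

def running_means(values):
    """Return the mean of the first n values for n = 1, 2, ..., len(values)."""
    means = []
    total = 0.0
    for n, v in enumerate(values, start=1):
        total += v
        means.append(total / n)
    return means

rng = random.Random(226)  # arbitrary fixed seed so the run is reproducible
rolls = [rng.randint(1, 6) for _ in range(100_000)]  # fair die: mu = 3.5
xbar = running_means(rolls)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: x-bar = {xbar[n - 1]:.4f}")
```

Early values of x-bar can be well away from 3.5. By n = 100,000 the running mean is typically within about 0.01 of it.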

Example

Suppose we have a bowl with 21 small pieces of paper inside, each labeled with one of the numbers 0 through 20. We will draw several random samples of size n out of the bowl and record the sample mean x-bar for each sample. What is the population?

Since we know the value for each individual in the population (i.e., for each paper in the bowl), we can actually calculate µ, the true population mean: µ = (0 + 1 + ... + 20)/21 = 210/21 = 10.

Draw a random sample of size n = 1. Calculate x-bar for this sample.

Example

Draw a second random sample of size n = 5. Calculate x-bar for this sample.

Draw a third random sample of size n = 10. Calculate x-bar for this sample.

Draw a fourth random sample of size n = 15. Calculate x-bar for this sample.

Draw a fifth random sample of size n = 20. Calculate x-bar for this sample.

What can we conclude about the value of x-bar as the sample size increases?

THIS IS CALLED THE LAW OF LARGE NUMBERS.
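The bowl experiment can be sketched in Python. This sketch makes two assumptions beyond the slides: papers are drawn without replacement (not returned to the bowl), and the seed is arbitrary. A single draw at each size bounces around. The second loop therefore repeats each sample size 2,000 times and reports how spread out x-bar is around µ = 10. That spread shrinks as n grows.

```python
import random
import statistics

population = list(range(21))          # papers labeled 0, 1, ..., 20
mu = statistics.mean(population)      # (0 + 1 + ... + 20) / 21 = 10

rng = random.Random(9)
for n in (1, 5, 10, 15, 20):
    sample = rng.sample(population, n)   # draw n papers without replacement
    print(f"n = {n:>2}: x-bar = {statistics.mean(sample):.2f}")

def spread_of_xbar(n, reps=2_000):
    """Standard deviation of x-bar over many repeated draws of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

for n in (1, 5, 10, 15, 20):
    print(f"n = {n:>2}: spread of x-bar over 2000 draws = {spread_of_xbar(n):.3f}")
```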

Another Example

Example: Suppose we know that the average height of all high school students in Iowa is 5.70 feet. We take SRSs from this population and calculate the mean height of the first n observations.

[Figure: mean of first n observations (feet), between about 5.695 and 5.710, plotted against number of observations, from 0 to 20,000.]

Example 4.21 From Book

Sulfur compounds such as dimethyl sulfide (DMS) are sometimes present in wine.

DMS causes “off-odors” in wine, so winemakers want to know the odor threshold: what is the lowest concentration of DMS that the human nose can detect?

Different people have different thresholds, so we start by asking about the mean threshold μ in the population of all adults. μ is a parameter that describes this population.

To estimate μ, we present tasters with both natural wine and the same wine spiked with DMS at different concentrations to find the lowest concentration at which they can identify the spiked wine.

The odor thresholds for 10 randomly chosen subjects (in micrograms/liter): 28 40 28 33 20 31 29 27 17 21

The mean threshold for these subjects is x-bar = 27.4; x-bar is a statistic calculated from this sample. A statistic, such as the mean of a random sample of 10 adults, is a random variable.

Example

Suppose μ = 25 is the true value of the parameter we seek to estimate. The first subject had threshold 28, so the line (x-bar plotted against the number of subjects) starts there. The second point is the mean of the first two subjects:

$$\bar{x} = \frac{28 + 40}{2} = 34$$

This process continues many, many times, and our line begins to settle around μ = 25.
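The running means for Example 4.21 can be computed directly from the ten thresholds listed for the subjects. The sketch then mimics the line settling around μ = 25 by appending simulated subjects drawn from N(25, 7), the population model given later in this section. Those extra values are simulated, not data.

```python
import random

# Odor thresholds (micrograms/liter) for the 10 subjects in Example 4.21
thresholds = [28, 40, 28, 33, 20, 31, 29, 27, 17, 21]

def running_means(values):
    """Mean of the first n observations, for every n."""
    out, total = [], 0.0
    for n, v in enumerate(values, start=1):
        total += v
        out.append(total / n)
    return out

real = running_means(thresholds)
print(real[0], real[1], round(real[-1], 1))   # 28.0 34.0 27.4

# Hypothetical continuation: simulated subjects drawn from N(25, 7)
rng = random.Random(421)
extended = thresholds + [rng.gauss(25, 7) for _ in range(10_000)]
longrun = running_means(extended)
for n in (10, 100, 1_000, 10_010):
    print(f"n = {n:>6}: x-bar = {longrun[n - 1]:.2f}")
```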

The law of large numbers in action: as we take more observations, the sample mean x-bar always approaches the mean μ = 25 of the population.

[Figure: x-bar plotted against the number of observations, settling around μ = 25.]

The law of large numbers is the foundation of business enterprises such as casinos and insurance companies. The winnings (or losses) of a gambler on a few plays are uncertain; that’s why gambling is exciting(?). But the “house” plays tens of thousands of times.

So the house, unlike individual gamblers, can count on the long-run regularity described by the Law of Large Numbers.

The average winnings of the house on tens of thousands of plays will be very close to the mean of the distribution of winnings.

Hence, because the games are set up so that this mean favors the house, the LLN all but guarantees the house a profit!
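A small simulation shows the house's long-run regularity. The slides do not name a game, so this sketch assumes a $1 bet on red in American roulette (18 red pockets out of 38). For that bet, the house's mean winnings are 2/38 ≈ $0.053 per play. Over a few plays the house's average is erratic; over many plays it settles near that mean.

```python
import random

# Assumption for illustration: a $1 bet on red in American roulette.
# The gambler wins $1 with probability 18/38 and loses $1 otherwise.
P_GAMBLER_WINS = 18 / 38
house_mean = (1 - P_GAMBLER_WINS) * 1 + P_GAMBLER_WINS * (-1)   # = 2/38

def house_average(plays, rng):
    """Average house winnings per $1 bet over a number of plays."""
    total = 0
    for _ in range(plays):
        total += -1 if rng.random() < P_GAMBLER_WINS else 1
    return total / plays

rng = random.Random(2024)
print(f"mean of the distribution: {house_mean:.4f}")
for plays in (10, 100, 10_000, 100_000):
    print(f"{plays:>7} plays: house average = {house_average(plays, rng):+.4f}")
```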

Thinking about the Law of Large Numbers

The Law of Large Numbers says broadly that the average results of many independent observations are stable and predictable.

A grocery store deciding how many gallons of milk to stock and a fast-food restaurant deciding how many beef patties to prepare can predict demand even though their customers make independent decisions. The Law of Large Numbers says that the many individual decisions will produce a stable result.

The “Law of Small Numbers” or “Averages”

The Law of Large Numbers describes the regular behavior of chance phenomena in the long run.

Many people believe in an incorrect “law of small numbers.” We falsely expect even short sequences of random events to show the kind of average behavior that in fact appears only in the long run.

Example: Pretend you have an average free throw success rate of 70%. One day on the free throw line, you miss 8 shots in a row. Should you expect to hit the next shot, by the mythical “law of averages”?

No. The law of large numbers tells us that the long-run average will be close to 70%. Missing 8 shots in a row simply means you are having a bad day; 8 shots is hardly the “long run.” Furthermore, the law of large numbers says nothing about the next event. It only tells us what will happen if we keep track of the long-run average.
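A simulation can check the free throw claim. It assumes each shot is independent with a 70% success rate; independence is the key assumption, and it is what the “law of averages” intuition ignores. The sketch records every shot taken right after 8 or more straight misses. That hit rate should stay near 70% rather than jumping up. Runs of 8 straight misses are rare (about 0.3^8 ≈ 0.00007 per shot), which is why the sketch simulates 2,000,000 shots.

```python
import random

P_HIT = 0.70      # long-run free throw success rate from the example
STREAK = 8        # condition on 8 misses in a row
SHOTS = 2_000_000

rng = random.Random(70)
misses_in_a_row = 0
after_streak = []  # outcomes of shots taken right after >= 8 straight misses
hits = 0
for _ in range(SHOTS):
    hit = rng.random() < P_HIT
    if misses_in_a_row >= STREAK:
        after_streak.append(hit)
    hits += hit
    misses_in_a_row = 0 if hit else misses_in_a_row + 1

print(f"overall hit rate: {hits / SHOTS:.3f}")
print(f"shots after {STREAK}+ misses: {len(after_streak)}, "
      f"hit rate = {sum(after_streak) / len(after_streak):.3f}")
```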

The Hot Hand Debate

In some sports, if a player makes several consecutive good plays, like a few good golf shots in a row, they often claim to have the “hot hand,” which generally implies that their next shot is likely to be a good one.

There have been studies suggesting that runs of golf shots, good or bad, are no more frequent in golf than would be expected if each shot were independent of the player’s previous shots.

Players perform consistently, not in streaks. Our perception of hot or cold streaks simply shows that we don’t perceive random behavior very well!
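Streaks arise from independent shots far more often than intuition suggests. This hypothetical sketch gives a player a fixed 70% success rate with no hot hand built in. It plays many 100-shot rounds and records the longest run of made shots in each round. The 70% rate, the 100-shot round, and the 8-shot threshold are illustrative choices, not figures from the slides.

```python
import random
import statistics

def longest_hit_streak(shots):
    """Length of the longest run of consecutive successes (True values)."""
    best = current = 0
    for hit in shots:
        current = current + 1 if hit else 0
        best = max(best, current)
    return best

# Hypothetical player: every shot succeeds with probability 0.7,
# independently of all previous shots (no "hot hand" in the model).
rng = random.Random(20)
streaks = [
    longest_hit_streak([rng.random() < 0.7 for _ in range(100)])
    for _ in range(5_000)
]
print("median longest streak in 100 independent shots:",
      statistics.median(streaks))
print("share of rounds with a streak of 8 or more:",
      sum(s >= 8 for s in streaks) / len(streaks))
```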

The Gambling Hot Hand

Gamblers often follow the hot-hand theory, betting that a “lucky” run will continue.

At other times, however, they draw the opposite conclusion when confronted with a run of outcomes. If a coin gives 10 straight heads, some gamblers feel that it must now produce some extra tails to get back to the average of half heads and half tails.

Not true! If the next 10,000 tosses give about 50% tails, those 10 straight heads will be swamped by the later thousands of heads and tails.

No short-run compensation is needed to get back to the average in the long run.
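The swamping argument is just arithmetic. This sketch idealizes the later tosses as coming up exactly half heads; that is an assumption, since real tosses only come close. It tracks the overall proportion of heads after the run of 10 straight heads. The proportion approaches 1/2 even though the later tosses never produce extra tails.

```python
from fractions import Fraction

FIRST_HEADS = 10   # the run of 10 straight heads

def expected_proportion_heads(later_tosses):
    """Overall proportion of heads if the later fair tosses are exactly half heads."""
    heads = FIRST_HEADS + Fraction(later_tosses, 2)
    return heads / (FIRST_HEADS + later_tosses)

for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} later tosses: proportion of heads = "
          f"{float(expected_proportion_heads(n)):.4f}")
```

With 10 later tosses the proportion is still 15/20 = 0.75. With 10,000 later tosses it is 5010/10010 ≈ 0.5005.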

Need for Law of Large Numbers

Our inability to accurately distinguish random behavior from systematic influences points out the need for statistical inference to supplement exploratory analysis of data.

Probability calculations can help verify that what we see in the data is more than a random pattern.

How Large is a Large Number?

The Law of Large Numbers says that the actual mean outcome of many trials gets close to the distribution mean μ as more trials are made.

It doesn’t say how many trials are needed to guarantee a mean outcome close to μ. That depends on the variability of the random outcomes.

The more variable the outcomes, the more trials are needed to ensure that the mean outcome x-bar is close to the distribution mean μ.
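This sketch shows how variability affects the mean outcome. It assumes two hypothetical Normal populations with the same mean, 25, but standard deviations 7 and 14. Reusing the same seed gives both populations the same standardized draws, so the comparison isolates the effect of σ. At the same n, the more variable population's x-bar is about twice as spread out. It takes about four times as many observations to match the less variable one.

```python
import random
import statistics

def spread_of_sample_means(sigma, n, reps=5_000, mu=25.0, seed=0):
    """Standard deviation of x-bar over many samples of size n from N(mu, sigma)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

n = 30
narrow = spread_of_sample_means(7, n)             # less variable population
wide = spread_of_sample_means(14, n)              # more variable population
wide_bigger_n = spread_of_sample_means(14, 4 * n)

print(f"sigma =  7, n = {n}:  spread of x-bar ~ {narrow:.2f}")
print(f"sigma = 14, n = {n}:  spread of x-bar ~ {wide:.2f}")
print(f"sigma = 14, n = {4 * n}: spread of x-bar ~ {wide_bigger_n:.2f}")
```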

More Laws of Large Numbers

The Law of Large Numbers is one of the central facts about probability.

LLN explains why gambling casinos and insurance companies make money.

LLN assures us that statistical estimation will be accurate if we can afford enough observations.

The basic Law of Large Numbers applies to independent observations that all have the same distribution. Mathematicians have extended the law to many more general settings.

What if Observations are not Independent?

You are in charge of a process that manufactures video screens for computer monitors.

Your equipment measures the tension on the metal mesh that lies behind each screen and is critical to its image quality.

You want to estimate the mean tension μ for the process by the average x-bar of the measurements.

The tension measurements are not independent.

AYK 4.82: Use the Law of Large Numbers applet on the textbook website.

Sampling Distributions

The Law of Large Numbers assures us that if we measure enough subjects, the statistic x-bar will eventually get very close to the unknown parameter μ.

What if we don’t have a large sample?

Take a large number of samples of the same size from the same population.

Calculate the sample mean for each sample.

Make a histogram of the sample means. The histogram of values of the statistic approximates the sampling distribution that we would see if we kept on sampling forever.
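The three steps above can be simulated in Python. The population is an assumption: N(25, 7), the DMS odor-threshold model that appears later in this section, with samples of size 10. The mean and standard deviation of the 10,000 simulated sample means, together with the text histogram, approximate the sampling distribution of x-bar.

```python
import random
import statistics
from collections import Counter

MU, SIGMA, N, REPS = 25, 7, 10, 10_000   # assumed population N(25, 7), samples of size 10

rng = random.Random(4)
means = [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
         for _ in range(REPS)]

print(f"mean of the {REPS} sample means: {statistics.fmean(means):.2f}")
print(f"std. dev. of the sample means:   {statistics.stdev(means):.2f}")

# Crude text histogram of the sample means, one bar per 1-unit bin
bins = Counter(int(m // 1) for m in means)
for left in range(min(bins), max(bins) + 1):
    print(f"{left:>3}-{left + 1:<3} {'#' * (bins[left] // 50)}")
```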

The idea of a sampling distribution is the foundation of statistical inference. The laws of probability can tell us about sampling distributions without the need to actually choose or simulate a large number of samples.

Mean and Standard Deviation of a Sample Mean

Suppose that x-bar is the mean of a SRS of size n drawn from a large population with mean μ and standard deviation σ.

The mean of the sampling distribution of x-bar is μ and its standard deviation is σ/√n:

$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Notice: averages are less variable than individual observations!
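One way to see why x-bar has mean μ and standard deviation σ/√n uses two assumptions. The n observations x_1, ..., x_n are independent, which is reasonable for a SRS from a large population. Each observation has mean μ and standard deviation σ.

$$E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{n\mu}{n} = \mu$$

$$\operatorname{Var}(\bar{x}) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(x_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \qquad \sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$$

The variance step is where independence is used: for independent observations the variances add.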

The mean of the statistic x-bar is always the same as the mean μ of the population: the sampling distribution of x-bar is centered at μ. In repeated sampling, x-bar will sometimes fall above the true value of the parameter and sometimes below, but there is no systematic tendency to overestimate or underestimate the parameter.

Because the mean of x-bar is equal to μ, we say that the statistic x-bar is an unbiased estimator of the parameter μ.

An unbiased estimator is “correct on the average” in many samples. How close the estimator falls to the parameter in most samples is determined by the spread of the sampling distribution.

If individual observations have standard deviation σ, then sample means x-bar from samples of size n have standard deviation σ/√n.

Again, notice that averages are less variable than individual observations.

Not only is the standard deviation of the distribution of x-bar smaller than the standard deviation of individual observations, but it gets smaller as we take larger samples. The results of large samples are less variable than the results of small samples. Remember, we divided σ by the square root of n.

If n is large, the standard deviation of x-bar is small and almost all samples will give values of x-bar that lie very close to the true parameter μ. The sample mean from a large sample can be trusted to estimate the population mean accurately.

Notice that the standard deviation of the sampling distribution gets smaller only at the rate √n. To cut the standard deviation of x-bar in half, we must take four times as many observations, not just twice as many (the square root of 4 is 2).

Example

Suppose we take samples of size 15 from a distribution with mean 25 and standard deviation 7. Then the sampling distribution of x-bar has:

the mean of x-bar: $\mu_{\bar{x}} = 25$

the standard deviation of x-bar: $\sigma_{\bar{x}} = 7/\sqrt{15} \approx 1.80739$
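The example's arithmetic fits in a small Python helper; the function name sd_of_xbar is illustrative. The second call also checks the earlier remark that four times as many observations cut the standard deviation of x-bar in half.

```python
import math

def sd_of_xbar(sigma, n):
    """Standard deviation of the sampling distribution of x-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

print(f"{sd_of_xbar(7, 15):.5f}")   # 1.80739, matching the example
print(f"{sd_of_xbar(7, 60):.5f}")   # 4 times as many observations: half the spread
```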

What About Shape?

We have described the center and spread of the sampling distribution of a sample mean x-bar, but not its shape

The shape of the distribution of x-bar depends on the shape of the population distribution

Sampling Distribution of a Sample Mean

If a population has the N(μ, σ) distribution, then the sample mean x-bar of n independent observations has the distribution

$$N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

Example

Adults differ in the smallest amount of dimethyl sulfide they can detect in wine.

Extensive studies have found that the DMS odor threshold of adults follows roughly a Normal distribution with mean μ = 25 micrograms per liter and standard deviation σ = 7 micrograms per liter.

Because the population distribution is Normal, the sampling distribution of x-bar is also Normal.

If n = 10, what is the distribution of x-bar?

$$\bar{x} \sim N\left(25, \frac{7}{\sqrt{10}}\right) \approx N(25, 2.21)$$
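Python's standard-library statistics.NormalDist can add independent Normal distributions and divide by a constant. That gives a quick check of this result for the DMS example: adding ten independent N(25, 7) variables and dividing by 10 should give N(25, 7/√10) ≈ N(25, 2.21).

```python
from statistics import NormalDist

population = NormalDist(mu=25, sigma=7)   # DMS odor thresholds, micrograms/liter
n = 10

# The sum of n independent N(25, 7) observations is Normal;
# dividing that sum by n gives the distribution of x-bar.
total = population
for _ in range(n - 1):
    total = total + population
xbar = total / n

print(f"x-bar ~ N({xbar.mean:.2f}, {xbar.stdev:.4f})")
```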

What if the Population Distribution is not Normal?

As the sample size increases, the distribution of x-bar changes shape: it looks less like that of the population and more like a Normal distribution. When the sample is large enough, the distribution of x-bar is very close to Normal. This result is true no matter what the shape of the population distribution, as long as the population has a finite standard deviation σ.

Central Limit Theorem

Draw a SRS of size n from any population with mean μ and finite standard deviation σ. When n is large, the sampling distribution of the sample mean x-bar is approximately Normal:

$$\bar{x} \text{ is approximately } N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
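Here is a simulation sketch of the central limit theorem. It assumes an Exponential population with mean 1, which is strongly right-skewed, like the maintenance-time example in this section. For each n it draws 20,000 samples and measures the skewness of the sample means. For this population the theoretical skewness of x-bar is 2/√n, so the distribution of x-bar should look less skewed, and more Normal, as n grows.

```python
import random
import statistics

def skewness(data):
    """Sample skewness: the average cubed standardized deviation."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean(((x - m) / s) ** 3 for x in data)

rng = random.Random(41)
REPS = 20_000
skews = {}
for n in (1, 5, 30):
    # Assumed population: Exponential with mean 1 (skewness 2)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(REPS)]
    skews[n] = skewness(means)
    print(f"n = {n:>2}: mean of x-bar = {statistics.fmean(means):.3f}, "
          f"sd of x-bar = {statistics.stdev(means):.3f}, "
          f"skewness = {skews[n]:.2f}")
```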

More general versions of the central limit theorem say that the distribution of a sum or average of many small random quantities is close to Normal.

The central limit theorem suggests why Normal distributions are common models for observed data.

How Large a Sample is Needed?

The sample size needed depends on whether the population distribution is close to Normal. We require more observations if the shape of the population distribution is far from Normal.

Example

The time X that a technician requires to perform preventive maintenance on an air-conditioning unit is governed by the Exponential distribution (Figure 4.17(a)) with mean time μ = 1 hour and standard deviation σ = 1 hour.

Your company operates 70 of these units. By the central limit theorem, the distribution of the mean time x-bar your company spends on preventive maintenance is approximately

$$N\left(1, \frac{1}{\sqrt{70}}\right) \approx N(1, 0.12)$$

What is the probability that the average maintenance time for your company’s units exceeds 50 minutes?

50/60 = 0.83 hour, so we want to know P(x-bar > 0.83). Use the Normal distribution calculations we learned in Chapter 2!

$$P(\bar{x} > 0.83) = P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} > \frac{0.83 - 1}{0.12}\right) = P(z > -1.42) = 1 - P(z \le -1.42) = 1 - 0.0778 = 0.9222$$
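The same calculation can be done with Python's statistics.NormalDist. The first result reproduces the slide's rounded hand calculation, with z rounded to -1.42. The second skips the rounding and gives about 0.918. Both rely on the central limit theorem's Normal approximation, not on the exact Exponential model.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.0, 1.0, 70            # hours; Exponential population, 70 units
sd_xbar = sigma / sqrt(n)              # about 0.12 hour
cutoff = 50 / 60                       # 50 minutes in hours, about 0.83

# Central limit theorem: x-bar is approximately N(mu, sigma / sqrt(n))
p_unrounded = 1 - NormalDist(mu, sd_xbar).cdf(cutoff)

# Reproducing the hand calculation with the rounded values from the slide
z = round((0.83 - 1) / 0.12, 2)        # -1.42
p_slide = 1 - NormalDist().cdf(z)

print(f"z = {z}, P(x-bar > 0.83) ~ {p_slide:.4f}")
print(f"without rounding: P(x-bar > 50/60) ~ {p_unrounded:.4f}")
```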

4.86 ACT scores

The scores of students on the ACT college entrance examination in a recent year had the Normal distribution with mean µ = 18.6 and standard deviation σ = 5.9

What is the probability that a single student randomly chosen from all those taking the test scores 21 or higher?

$$P(x \ge 21) = P\left(\frac{x - \mu}{\sigma} \ge \frac{21 - 18.6}{5.9}\right) = P(z \ge 0.4068) \approx 1 - P(z < 0.41) = 1 - 0.6591 = 0.3409$$

About 34% of students (from this population) scored a 21 or higher on the ACT.

The probability that a single student randomly chosen from this population would have a score of 21 or higher is 0.34.

Now take a SRS of 50 students who took the test. What are the mean and standard deviation of the sample mean score x-bar of these 50 students?

Mean = 18.6 [same as µ]

Standard deviation = 0.8344 [σ/√50 = 5.9/√50]

What is the probability that the mean score x-bar of these students is 21 or higher?

$$P(\bar{x} \ge 21) = P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \ge \frac{21 - 18.6}{0.834}\right) = P(z \ge 2.8778) \approx 1 - P(z < 2.88) = 1 - 0.9980 = 0.002$$

About 0.2% of all random samples of size 50 (from this population) would have a mean score x-bar of 21 or higher.

The probability of having a mean score x-bar of 21 or higher from a sample of 50 students (from this population) is 0.002.
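Both ACT probabilities can be checked with statistics.NormalDist. Without rounding z to two decimals, the single-student probability comes out near 0.342 rather than the table-based 0.3409. The sample-mean probability is about 0.002 either way.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 18.6, 5.9
scores = NormalDist(mu, sigma)                  # one randomly chosen student
n = 50
sample_means = NormalDist(mu, sigma / sqrt(n))  # x-bar for a SRS of 50 students

p_single = 1 - scores.cdf(21)
p_mean = 1 - sample_means.cdf(21)

print(f"sd of x-bar = {sigma / sqrt(n):.4f}")
print(f"P(one student scores 21 or higher)       ~ {p_single:.4f}")
print(f"P(x-bar of 50 students is 21 or higher)  ~ {p_mean:.4f}")
```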

Section 4.4 Summary

When we want information about the population mean µ for some variable, we often take a SRS and use the sample mean x-bar to estimate the unknown parameter µ.

The Law of Large Numbers states that the actually observed mean outcome x-bar must approach the mean µ of the population as the number of observations increases.

The sampling distribution of x-bar describes how the statistic x-bar varies in all possible samples of the same size from the same population.

The mean of the sampling distribution is µ, so that x-bar is an unbiased estimator of µ.

The standard deviation of the sampling distribution of x-bar is σ/√n for a SRS of size n if the population has standard deviation σ. That is, averages are less variable than individual observations.

If the population has a Normal distribution, so does x-bar.

The Central Limit Theorem states that for large n the sampling distribution of x-bar is approximately Normal for any population with finite standard deviation σ. That is, averages are more Normal than individual observations. We can use the fact that x-bar has a known Normal distribution to calculate approximate probabilities for events involving x-bar.