Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an...

28
Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1) Find and interpret the percentile of an individual value within a distribution of data. 2) Find and interpret the standardized score (z-score) of an individual value within a distribution of data. 3) Use the 68-95-99.7 rule to estimate areas (proportions of values) in a Normal distribution. 4) Use Table A or technology to find (i) the proportion of z-values in a specified interval, or (ii) a z-score from a percentile in the standard Normal distribution. 5) Use Table A or technology to find (i) the proportion of values in a specified interval, or (ii) the value that corresponds to a given percentile in any Normal distribution.

Transcript of Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an...

Page 1: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Chapter 2 Modeling Distributions of Data

ObjectivesSWBAT:1) Find and interpret the percentile of an individual value within a distribution of data.2) Find and interpret the standardized score (z-score) of an individual value within a

distribution of data.3) Use the 68-95-99.7 rule to estimate areas (proportions of values) in a Normal

distribution.4) Use Table A or technology to find (i) the proportion of z-values in a specified

interval, or (ii) a z-score from a percentile in the standard Normal distribution.5) Use Table A or technology to find (i) the proportion of values in a specified interval,

or (ii) the value that corresponds to a given percentile in any Normal distribution.

Page 2: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

The Normal Distribution• A Normal distribution is described by a

Normal density curve (bell-shaped curve).

• Any particular Normal distribution is completely specified by two numbers:

• The mean of the Normal distribution is at the center of the symmetric Normal curve.

Page 3: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Important Properties of a Normal Distribution

• The Normal distribution is roughly symmetric, unimodal, and bell-shaped.• The mean, median, and mode have roughly the same value, which is

located exactly in the center of the distribution.• The total area under the Normal curve equals 1. The area to the left of

the mean is .5 and the area to the right of the mean is .5.• Data that lie beyond two standard deviations from the mean are rare, and

data that lie beyond three standard deviations from the mean are very rare. Outliers are considered values falling below a z-score of -2.68 (area is .0037) or above a z-score of 2.68 (area is .9963).

• Many variables approximate the Normal distribution. However, even if a data set is skewed, the sample from that data set will likely approximate the Normal distribution.

Page 4: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

The 68-95-99.7 Rule (aka the Empirical Rule)

The 68-95-99.7 RuleIn the Normal distribution with mean µ and standard deviation σ:

• Approximately 68% of the observations fall within σ of µ.

• Approximately 95% of the observations fall within 2σ of µ.

• Approximately 99.7% of the observations fall within 3σ of µ.

Page 5: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Example: Suppose a sample of scores yields a mean of 100 and a standard deviation of 15. Assume that the distribution is Normal.Approximately what percent of scores should fall between 85 and 115? (Hint: Draw a diagram first!)

85 and 115 are both one standard deviation from the mean, so the percent of scores that fall between 85 and 115 is approximately 68%

Let’s try some more with the same distribution…

Page 6: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

What percent of scores should fall:

a) Between 70 and 130?95%c) Between 70 and 115?13.5%+68%=81.5%e) Less than 70?2.5%

b) Between 55 and 145?99.7%d) Greater than 115?13.5%+2.5%=16%

Page 7: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

• We know approximately what percent of data fall exactly 1, 2, and 3 standard deviations from the mean. However, how can we find a percent if a value does not fall exactly 1, 2, or 3 standard deviations from the mean.

• The first thing we have to do is standardize our score(s). This is referred to as finding the z-score.

Page 8: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

The Standard Normal DistributionAll Normal distributions are the same if we measure in units of size σ from the mean µ as center.

The standard Normal distribution is the Normal distribution with mean 0 and standard deviation 1.

If a variable x has any Normal distribution N(µ,σ) with mean µ and standard deviation σ, then the standardized variable

has the standard Normal distribution, N(0,1).

Page 9: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Some notes on z-scores:• Z-scores can be negative, positive, or 0.• A positive z-score would indicate the original value is above the mean. For

example, a z-score of 1.24 would mean that the score is 1.24 standard deviations above the mean.

• A negative z-score would indicate the original value is below the mean. For example, a z-score of -0.27 would mean that the score is 0.27 standard deviations below the mean.

• A z-score of 0 would indicate that the original value was the same as the mean.• Question: Are all negative z-scores bad?

Page 10: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Example: A distribution is approximately Normal with a mean of 26 and a standard deviation of 7. Calculate and interpret the z-score for a value of 21.

The value 21 is 0.71 standard deviations below the mean.

Page 11: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

• Example: Seth recently took two tests in school. On his history test he scored a 75. The class average on the test was a 63 and the test had a standard deviation of 3. On his biology test, Seth scored a 81. The class average on the test was a 76 and the test had a standard deviation of 7. Relatively speaking, on which test did he perform better?

He performed better on his history test as he was more standard deviations above that respective class mean.

Page 12: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Steps to Use the standard Normal table1) Draw the Normal curve. Make sure to identify the mean and standard deviation.2) Standardize the score(s) of interest.3) Plot the score(s), draw a vertical line(s), and shade the area of interest.4) Look up the z-score on the standard Normal table.

• If the area of interest is shaded to the left, the value in the table is the desired area.• If the area of interest is shaded to the right, we need to subtract the area in the table

from 1.• If the area of interest is shaded between two z-scores, we need to look up the area for

both z-scores and subtract.

Note for the AP test: For all Normal distribution problems (and other distributions which we will get to) you need to do 3 things:

1) state the distribution and identify values of interest2) show work3) answer the question

Page 13: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Example: A data set is Normally distributed with a mean of 259 and a standard deviation of 74. Find the area under the curve less than a score of 180.

• We have to look up the z-score of -1.07 on the table.

• Since the score starts out as “-1.0”, go to the z column and go down until you reach -1.0.

• Since there is a 7 in the hundredths place (.07), go to the right until you reach the .07 column. You should now be located in the spot that has -1.0 to the left, and .07 on the top.

This value should be .1423.

Page 14: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Example: A data set is Normally distributed with a mean of 26 and a standard deviation of 2.4. Find the area under the curve more than a score of 29.N(26, 2.4)

A z-score of 1.25 on the table gives an area of .8944. However, this is the area to the LEFT of the score. We want the area to the right of the score. Since the area under the Normal curve totals 1, we need to subtract .8944 from 1. This gives us our desired area, which is .1056.

Page 15: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Example: In the 2008 Wimbledon tennis tournament, Rafael Nadal averaged 115 miles per hour on his first-serves. Assuming that the distribution of his first-serve speeds is Normal with a standard deviation of 6 mph, find what proportion of his first-serves you would expect to be between 110 and 125 mph.N(115, 6)

Look up both areas and subtract..9525-.2033=.7492

Page 16: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Using the Normal Distribution in Reverse

• In 2008, the distribution of batting averages for MLB players with at least 300 plate appearances was approximately Normal with a mean of 0.272 and a standard deviation of 0.027.

• Suppose a player gets a salary bonus if his batting average is in the top 10% of all players. How well must a player hit for his batting average to be in the top 10%?

• We need to find the boundary between the lowest 90% of the distribution and the highest 10%.

• The boundary value is called the 90th percentile, because 90% of the values fall below it.

• A percentile describes location in a distribution, like quartiles.

Page 17: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

• N(.272, .027)• We know that the area under the

curve is .90. Therefore, we want to look at the interior of the standard Normal table for a proportion closest to 0.9000 and get the z-score associated with this proportion.

• The closest value is 0.8997. This corresponds to a z-score of 1.28. This means the 90th percentile is 1.28 standard deviations above the mean.

• Now let’s find the batting average associated with this z-score.

Page 18: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

How can we do these calculations on the TI-84?

• To find areas: normalcdf(lower, upper, mean, SD)• 2nd, DISTR, 2: normalcdf

• To find boundaries: invNorm(area to left, mean, SD)• 2nd, DISTR, 3: invNorm

• Note: mean and SD default to 0,1 if not entered.• Note: You must show your three steps!

• State distribution and identify values of interest• Show work• Answer

Page 19: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Example: Suppose that Clayton Kershaw of the LA Dodgers throws his fastball with a mean velocity of 94 miles per hour (mph) and a standard deviation of 2 mph, and that the distribution if his fastball speeds can be modeled by a Normal distribution.a) About what proportion of his fastballs will travel at least 100 mph?N(94, 2)normalcdf (100, 100000, 94, 2)Lower bound: 100Upper bound: 100000Mean: 94SD: 2

Approximately .0013 of his fastballs will travel at least 100 mph.Note: (a) and (b) in the notes are the same question.

Page 20: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

c) About what proportion of his fastballs will travel less than 90 mph?N(94, 2)normalcdf (0, 90, 94, 2)Lower bound: 0 (use 0 because a pitch cannot be negative mph)Upper bound: 90Mean: 94SD: 2

Approximately .0228 of his fastballs will travel less than 90 mph.

Page 21: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

d) About what proportion of his fastballs will travel between 93 and 95 mph?N(94, 2)normalcdf (93, 95, 94, 2)Lower bound: 93Upper bound: 95Mean: 94SD: 2

Approximately .3829 of his fastballs will travel between 93 and 95 mph.

Page 22: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

e) What is the 30th percentile of Kershaw’s distribution of fastball velocities?N(94, 2)invNorm(.3, 94, 2)area to left: .30mean: 94SD: 2

The 30th percentile is 92.9512 mph.

Page 23: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

f) What fastball velocities would be considered low outliers for Kershaw?N(94, 2)The values would fall below a z-score of -2.68 (area of .0037)

Fastballs below 88.64 mph would be considered outliers.

On the calculator:invNorm(.0037, 94, 2)Area to the left: .0037Mean: 94SD: 2

Same answer!!!

Page 24: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

g) Suppose that a different pitcher’s fastballs have a mean velocity of 92 mph and 40% of his fastballs go less than 90 mph. What is his standard deviation of his fastball velocities, assuming his distribution of velocities can be modeled by a Normal distribution?N(92, ?)Use your table and work backwards to find the z-score associated with .40.Z=-0.25.Now substitute into our equation.

To check: use invNorm (.4, 92, 8) and you should get 90.

Page 25: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Normal Distribution Calculations

We can answer a question about areas in any Normal distribution by standardizing and using Table A or by using technology.

Step 1: State the distribution and the values of interest. Draw a Normal curve with the area of interest shaded and the mean, standard deviation, and boundary value(s) clearly identified.

Step 2: Perform calculations—show your work! Do one of the following: (i) Compute a z-score for each boundary value and use Table A or technology to find the desired area under the standard Normal curve; or (ii) use the normalcdf command and label each of the inputs.

Step 3: Answer the question.

How To Find Areas In Any Normal Distribution

Page 26: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Working Backwards: Normal Distribution Calculations

Sometimes, we may want to find the observed value that corresponds to a given percentile. There are again three steps.

Step 1: State the distribution and the values of interest. Draw a Normal curve with the area of interest shaded and the mean, standard deviation, and unknown boundary value clearly identified.

Step 2: Perform calculations—show your work! Do one of the following: (i) Use Table A or technology to find the value of z with the indicated area under the standard Normal curve, then “unstandardize” to transform back to the original distribution; or (ii) Use the invNorm command and label each of the inputs.

Step 3: Answer the question.

How To Find Values From Areas In Any Normal Distribution

Page 27: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Assessing Normality

The Normal distributions provide good models for some distributions of real data.

Many statistical inference procedures are based on the assumption that the population is approximately Normally distributed.

A Normal probability plot provides a good assessment of whether a data set follows a Normal distribution.

If the points on a Normal probability plot lie close to a straight line, the plot indicates that the data are Normal.

Systematic deviations from a straight line indicate a non-Normal distribution.

Outliers appear as points that are far away from the overall pattern of the plot.

Interpreting Normal Probability Plots

Page 28: Chapter 2 Modeling Distributions of Data Objectives SWBAT: 1)Find and interpret the percentile of an individual value within a distribution of data. 2)Find.

Transforming Data

Adding the same number a to (subtracting a from) each observation:

• adds a to (subtracts a from) measures of center and location (mean, median, quartiles, percentiles), but

• Does not change the shape of the distribution or measures of spread (range, IQR, standard deviation).

Effect of Adding (or Subtracting) a Constant

Multiplying (or dividing) each observation by the same number b:

• multiplies (divides) measures of center and location (mean, median, quartiles, percentiles) by b

• multiplies (divides) measures of spread (range, IQR, standard deviation) by |b|, but

• does not change the shape of the distribution

Effect of Multiplying (or Dividing) by a Constant