Normal Distribution

Transcript of Normal Distribution

Page 1: Normal Distribution

Excursions in Modern Mathematics, 7e: 16.1 - 1 Copyright © 2010 Pearson Education, Inc.

Chapter 16: Normal Distributions

Peter Tannenbaum

Everything is Back to Normal (Almost)

Page 2: Normal Distribution

Normal Distributions: Outline/Learning Objectives

• To identify and describe an approximately normal distribution.

• To state properties of a normal distribution.

• To understand a data set in terms of standardized data values.

Page 3: Normal Distribution

• To state the 68-95-99.7 rule.

• To apply the honest and dishonest-coin principles to understand the concept of a confidence interval.

Normal Distributions: Outline/Learning Objectives

Page 4: Normal Distribution

16 Mathematics of Normal Distributions

16.1 Approximately Normal Distributions of Data

16.2 Normal Curves and Normal Distributions

16.3 Standardizing Normal Data

16.4 The 68-95-99.7 Rule

16.5 Normal Curves as Models of Real-Life Data Sets

16.6 Distribution of Random Events

16.7 Statistical Inference

Page 5: Normal Distribution

This table is a frequency table giving the heights of 430 NBA players listed on team rosters at the start of the 2008–2009 season.

Example 16.1 Distribution of Heights of NBA Players

Page 6: Normal Distribution

The bar graph for this data set is shown.

Example 16.1 Distribution of Heights of NBA Players

Page 7: Normal Distribution

We can see that the bar graph fits roughly the pattern of a somewhat skewed (off-center) bell-shaped curve (the orange curve).

Example 16.1 Distribution of Heights of NBA Players

Page 8: Normal Distribution

An idealized bell-shaped curve for this data (the red curve) is shown for comparison purposes. The data would be even more bell-shaped if it weren’t for all the 6’7” to 7’ players.

Example 16.1 Distribution of Heights of NBA Players

Page 9: Normal Distribution

This is not a quirk of nature but rather a reflection of the way NBA teams draft players.

Example 16.1 Distribution of Heights of NBA Players

Page 10: Normal Distribution

The table on the next slide shows the scores of N = 1,494,531 college-bound seniors on the mathematics section of the 2007 SAT. (Scores range from 200 to 800 and are grouped in class intervals of 50 points.) The table shows the score distribution and the percentage of test takers in each class interval.

Example 16.2 2007 SAT Math Scores

Page 11: Normal Distribution

Page 12: Normal Distribution

Here is a bar graph of the data.

Example 16.2 2007 SAT Math Scores

Page 13: Normal Distribution

The orange bell-shaped curve traces the pattern of the data in the bar graph. If the data followed a perfect bell curve, it would follow the red curve shown in the figure.

Example 16.2 2007 SAT Math Scores

Page 14: Normal Distribution

Unlike the curves in Fig. 16-3, here the orange and red curves are very close.

Example 16.2 2007 SAT Math Scores

Page 15: Normal Distribution

The two very different data sets discussed in Examples 16.1 and 16.2 have one thing in common–both can be described as having bar graphs that roughly fit a bell-shaped pattern. In Example 16.1, the fit is crude; in Example 16.2, it is very good. In either case, we say that the data set has an approximately normal distribution.

Approximately Normal Distribution

Page 16: Normal Distribution

The word normal in this context is to be interpreted as meaning that the data fits into a special type of bell-shaped curve; the word approximately is a reflection of the fact that with real-world data we should not expect an absolutely perfect fit. A distribution of data that has a perfect bell shape is called a normal distribution.

Normal Distribution

Page 17: Normal Distribution

Perfect bell-shaped curves are called normal curves. Every approximately normal data set can be idealized mathematically by a corresponding normal curve (the red curves in Examples 16.1 and 16.2). This is important because we can then use the mathematical properties of the normal curve to analyze and draw conclusions about the data.

Normal Curves

Page 18: Normal Distribution

The tighter the fit between the approximately normal distribution and the normal curve, the better our analysis and conclusions are going to be. Thus, to understand real-world data sets that have an approximately normal distribution, we first need to understand some of the mathematical properties of normal curves.

Normal Curves

Page 19: Normal Distribution

As usual, we will use the letter N to represent the size of the data set. In real-life applications, data sets can range in size from reasonably small (a dozen or so data points) to very large (hundreds of millions of data points), and the larger the data set is, the more we need a good way to describe and summarize it.

Data Set

Page 20: Normal Distribution

Example 14.1 Stat 101 Test Scores

Page 21: Normal Distribution

Like students everywhere, the students in the Stat 101 class have one question foremost on their mind when they look at the results: How did I do? Each student can answer this question directly from the table. It’s the next question that is statistically much more interesting. How did the class as a whole do? To answer this last question, we will have to find a way to package the results into a compact, organized, and intelligible whole.

Example 14.1 Stat 101 Test Scores

Page 22: Normal Distribution

The first step in summarizing the information in Table 14-1 is to organize the scores in a frequency table such as Table 14-2. In this table, the number below each score gives the frequency of the score–that is, the number of students getting that particular score.

Example 14.2 Stat 101 Test Scores: Part 2

Page 23: Normal Distribution

We can readily see from Table 14-2 that there was one student with a score of 1, one with a score of 6, two with a score of 7, six with a score of 8, and so on. Notice that the scores with a frequency of zero are not listed in the table.

Example 14.2 Stat 101 Test Scores: Part 2
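As a rough illustration of how such a frequency table can be built, here is a minimal Python sketch. The list `scores` is only a hypothetical excerpt containing the four lowest scores mentioned above (the full class list is not reproduced here), and `frequency_table` is a helper name chosen for this sketch.

```python
from collections import Counter

# Hypothetical excerpt: only the scores quoted in the text
# (one 1, one 6, two 7s, six 8s), not the full Stat 101 data set.
scores = [1, 6, 7, 7, 8, 8, 8, 8, 8, 8]


def frequency_table(values):
    """Return {score: frequency}, listing only the scores that actually occur."""
    return dict(sorted(Counter(values).items()))


table = frequency_table(scores)
for score, freq in table.items():
    print(f"score {score}: frequency {freq}")
```

As in the table described above, scores with a frequency of zero (2 through 5 in this excerpt) simply do not appear.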

Page 24: Normal Distribution

We can do even better. Figure 14-1 (next slide) shows the same information in a much more visual way called a bar graph, with the test scores listed in increasing order on a horizontal axis and the frequency of each test score displayed by the height of the column above that test score. Notice that in the bar graph, even the test scores with a frequency of zero show up–there simply is no column above these scores.

Example 14.2 Stat 101 Test Scores: Part 2

Page 25: Normal Distribution

Figure 14-1

Example 14.2 Stat 101 Test Scores: Part 2

Page 26: Normal Distribution

Bar graphs are easy to read, and they are a nice way to present a good general picture of the data. With a bar graph, for example, it is easy to detect outliers–extreme data points that do not fit into the overall pattern of the data. In this example there are two obvious outliers–the score of 24 (head and shoulders above the rest of the class) and the score of 1 (lagging way behind the pack).

Example 14.2 Stat 101 Test Scores: Part 2

Page 27: Normal Distribution

Sometimes it is more convenient to express the bar graph in terms of relative frequencies–that is, the frequencies given in terms of percentages of the total population. Figure 14-2 shows a relative frequency bar graph for the Stat 101 data set. Notice that we indicated on the graph that we are dealing with percentages rather than total counts and that the size of the data set is N = 75.

Example 14.2 Stat 101 Test Scores: Part 2

Page 28: Normal Distribution

Figure 14-2

Example 14.2 Stat 101 Test Scores: Part 2

Page 29: Normal Distribution

This allows anyone who wishes to do so to compute the actual frequencies. For example, Fig. 14-2 indicates that 12% of the 75 students scored a 12 on the exam, so the actual frequency is given by 75 × 0.12 = 9 students. The change from actual frequencies to percentages (or vice versa) does not change the shape of the graph–it is basically a change of scale.

Example 14.2 Stat 101 Test Scores: Part 2
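The conversion between relative and actual frequencies can be sketched in a few lines of Python. The function names `actual_frequency` and `relative_frequency` are assumptions made for this sketch; the numbers N = 75 and 12% come from the example above.

```python
def actual_frequency(n, percent):
    """Convert a relative frequency (in percent) into a count out of n."""
    return round(n * percent / 100)


def relative_frequency(count, n):
    """Convert a count out of n into a relative frequency (in percent)."""
    return 100 * count / n


students = actual_frequency(75, 12)   # 12% of 75 students
print(students)
print(relative_frequency(students, 75))
```

Going from counts to percentages and back only rescales the vertical axis, which is why the shape of the bar graph is unchanged.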

Page 30: Normal Distribution

Frequency charts that use icons or pictures instead of bars to show the frequencies are commonly referred to as pictograms. The point of a pictogram is that a graph is often used not only to inform but also to impress and persuade, and, in such cases, a well-chosen icon or picture can be a more effective tool than just a bar.

Here’s a pictogram displaying the same data as in Figure 14-2.

Bar Graph versus Pictogram

Page 31: Normal Distribution

Figure 14-3

Bar Graph versus Pictogram

Page 32: Normal Distribution

This figure is a pictogram showing the growth in yearly sales of the XYZ Corporation between 2001 and 2006. It’s a good picture to show at a shareholders meeting, but the picture is actually quite misleading.

Example 14.3 Selling the XYZ Corporation

Page 33: Normal Distribution

This figure shows a pictogram for exactly the same data with a much more accurate and sobering picture of how well the XYZ Corporation had been doing.

Example 14.3 Selling the XYZ Corporation

Page 34: Normal Distribution

The difference between the two pictograms can be attributed to a couple of standard tricks of the trade: (1) stretching the scale of the vertical axis and (2) “cheating” on the choice of starting value on the vertical axis. As an educated consumer, you should always be on the lookout for these tricks. In graphical descriptions of data, a fine line separates objectivity from propaganda.

Example 14.3 Selling the XYZ Corporation

Page 35: Normal Distribution

16.2 Normal Curves and Normal Distributions

Page 36: Normal Distribution

The study of normal curves can be traced back to the work of the great German mathematician Carl Friedrich Gauss, and for this reason, normal curves are sometimes known as Gaussian curves. Normal curves all share the same basic shape–that of a bell–but otherwise they can differ widely in their appearance.

Normal Curves

Page 37: Normal Distribution

Some bells are short and squat, others are tall and skinny, and others fall somewhere in between.

Normal Curves

Page 38: Normal Distribution

Mathematically speaking, however, they all have the same underlying structure. In fact, whether a normal curve is skinny and tall or short and squat depends on the choice of units on the axes, and any two normal curves can be made to look the same by just fiddling with the scales of the axes. What follows is a summary of some of the essential facts about normal curves and their associated normal distributions. These facts are going to help us greatly later on in the chapter.

Normal Curves

Page 39: Normal Distribution

Symmetry.

Every normal curve has a vertical axis of symmetry, splitting the bell-shaped region outlined by the curve into two identical halves. This is the only line of symmetry of a normal curve, so we can refer to it without ambiguity as the line of symmetry.

Essential Facts About Normal Curves

Page 40: Normal Distribution

Median / mean.

We will call the point of intersection of the horizontal axis and the line of symmetry of the curve the center of the distribution.

Essential Facts About Normal Curves

Page 41: Normal Distribution

Median / mean.

The center represents both the median M and the mean μ (average) of the data. Thus, in a normal distribution, M = μ. The fact that in a normal distribution the median equals the mean implies that 50% of the data are less than or equal to the mean and 50% of the data are greater than or equal to the mean.

Essential Facts About Normal Curves

Page 42: Normal Distribution

In a normal distribution, M = μ.

(If the distribution is approximately normal, then M ≈ μ.)

MEDIAN AND MEAN OF A NORMAL DISTRIBUTION
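The equality of median and mean can be checked with Python's standard-library `statistics.NormalDist`. This is only a sketch: the values 100 and 15 are arbitrary illustrative parameters, not taken from the slides.

```python
from statistics import NormalDist

# Arbitrary illustrative parameters (not from the slides).
dist = NormalDist(mu=100, sigma=15)

# For a normal distribution the mean and the median coincide.
print(dist.mean, dist.median)

# Fraction of the distribution at or below the mean.
print(dist.cdf(dist.mean))
```

The cumulative value at the mean is one half, which is the "50% below, 50% above" statement in computational form.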

Page 43: Normal Distribution

Standard Deviation.

The standard deviation–traditionally denoted by the Greek letter σ (sigma)–is an important measure of spread, and it is particularly useful when dealing with normal (or approximately normal) distributions, as we will see shortly. The easiest way to describe the standard deviation of a normal distribution is to look at the normal curve.

Essential Facts About Normal Curves

Page 44: Normal Distribution

Standard Deviation.

If you were to bend a piece of wire into a bell-shaped normal curve, at the very top you would be bending the wire downward.

Essential Facts About Normal Curves

Page 45: Normal Distribution

Standard Deviation.

But, at the bottom you would be bending the wire upward.

Essential Facts About Normal Curves

Page 46: Normal Distribution

Standard Deviation.

As you move your hands down the wire, the curvature gradually changes, and there is one point on each side of the curve where the transition from being bent downward to being bent upward takes place. Such a point is called a point of inflection of the curve.

Essential Facts About Normal Curves

Page 47: Normal Distribution

Standard Deviation.

The standard deviation of a normal distribution is the horizontal distance between the line of symmetry of the curve and one of the two points of inflection, P′ or P in the figure.

Essential Facts About Normal Curves

Page 48: Normal Distribution

In a normal distribution, the standard deviation equals the distance between a point of inflection and the line of symmetry of the curve.

STANDARD DEVIATION OF A NORMAL DISTRIBUTION
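One way to see this numerically is to check where the normal curve switches from bending downward to bending upward. The sketch below is an illustration under assumptions: the mean 45 and standard deviation 10 are illustrative values, and `second_difference` is a hypothetical helper that estimates the curvature of the curve with a finite difference.

```python
from statistics import NormalDist


def second_difference(dist, x, h=1e-3):
    """Finite-difference estimate of the second derivative of the normal curve at x."""
    return (dist.pdf(x + h) - 2 * dist.pdf(x) + dist.pdf(x - h)) / h**2


# Illustrative parameters chosen for this sketch.
mu, sigma = 45, 10
dist = NormalDist(mu=mu, sigma=sigma)

inside = second_difference(dist, mu + 0.9 * sigma)   # closer to the center than sigma
outside = second_difference(dist, mu + 1.1 * sigma)  # farther from the center than sigma

# Negative curvature means "bent downward", positive means "bent upward".
print(inside < 0, outside > 0)
```

The sign change happens between 0.9 and 1.1 standard deviations from the center, consistent with the point of inflection sitting one standard deviation from the line of symmetry.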

Page 49: Normal Distribution

Quartiles.

We learned in Chapter 14 how to find the quartiles of a data set. When the data set has a normal distribution, the first and third quartiles can be approximated using the mean μ and the standard deviation σ. The magic number to memorize is 0.675. Multiplying the standard deviation by 0.675 tells us how far to go to the right or left of the mean to locate the quartiles.

Essential Facts About Normal Curves

Page 50: Normal Distribution

In a normal distribution,

Q3 ≈ μ + (0.675)σ and

Q1 ≈ μ – (0.675)σ.

QUARTILES OF A NORMAL DISTRIBUTION
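The number 0.675 is a rounded standardized value. A short Python sketch using the standard library's `statistics.NormalDist` suggests where it comes from: it is the point below which 75% of a normal distribution with mean 0 and standard deviation 1 lies. The variable names are choices made for this sketch.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist()

# Standardized position of the third quartile (75% of the data lie below it).
z_q3 = std_normal.inv_cdf(0.75)
# By symmetry, the first quartile sits at the same distance on the other side.
z_q1 = std_normal.inv_cdf(0.25)

print(z_q1, z_q3)
```

Both values agree with ±0.675 to within about 0.001, which is why the slides treat 0.675 as the number to memorize.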

Page 51: Normal Distribution

Imagine you are told that a data set of N = 1,494,531 numbers has a normal distribution with mean μ = 515 and standard deviation σ = 114. For now, let’s not worry about the source of this data–we’ll discuss this soon. Just knowing the mean and standard deviation of this normal distribution will allow us to draw a few useful conclusions about this data set.

Example 16.3 A Mystery Normal Distribution

Page 52: Normal Distribution

■ In a normal distribution, the median equals the mean, so the median value is M = 515. This implies that of the 1,494,531 numbers, there are 747,266 that are smaller than or equal to 515 and 747,266 that are greater than or equal to 515.

Example 16.3 A Mystery Normal Distribution

Page 53: Normal Distribution

■ The first quartile is given by Q1 ≈ 515 – 0.675 × 114 ≈ 438. This implies that 25% of the data set (373,633 numbers) are smaller than or equal to 438.

■ The third quartile is given by Q3 ≈ 515 + 0.675 × 114 ≈ 592. This implies that 25% of the data set (373,633 numbers) are bigger than or equal to 592.

Example 16.3 A Mystery Normal Distribution
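The arithmetic of Example 16.3 can be reproduced with a short Python sketch. The variable names are choices made here; the values of N, the mean, and the standard deviation are those given in the example.

```python
# Values from Example 16.3.
N, mu, sigma = 1_494_531, 515, 114

# Quartiles from the 0.675 rule.
q1 = mu - 0.675 * sigma
q3 = mu + 0.675 * sigma

# N is odd, so the median is the middle value and (N + 1) / 2 values
# are at or below it (and the same number at or above it).
half = (N + 1) // 2
quarter = round(0.25 * N)

print(round(q1), round(q3))
print(half, quarter)
```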

Page 54: Normal Distribution

16.3 Standardizing Normal Data

Page 55: Normal Distribution

We have seen that normal curves don’t all look alike, but this is only a matter of perception. In fact, all normal distributions tell the same underlying story but use slightly different dialects to do it. One way to understand the story of any given normal distribution is to rephrase it in a simple common language–a language that uses the mean and the standard deviation as its only vocabulary. This process is called standardizing the data.

Standardizing the Data

Page 56: Normal Distribution

To standardize a data value x, we measure how far x has strayed from the mean using the standard deviation as the unit of measurement. A standardized data value is often referred to as a z-value.

The best way to illustrate the process of standardizing normal data is by means of a few examples.

z-value

Page 57: Normal Distribution

Let’s consider a normally distributed data set with mean μ = 45 ft and standard deviation σ = 10 ft. We will standardize several data values, starting with a couple of easy cases.

Example 16.4 Standardizing Normal Data

Page 58: Normal Distribution

■ x1 = 55 ft is a data point located 10 ft above (A in the figure) the mean μ = 45 ft.

Example 16.4 Standardizing Normal Data

Page 59: Normal Distribution

■ Coincidentally, 10 ft happens to be exactly one standard deviation. The fact that x1 = 55 ft is located one standard deviation above the mean can be rephrased by saying that the standardized value of x1 = 55 is z1 = 1.

Example 16.4 Standardizing Normal Data

Page 60: Normal Distribution

■ x2 = 35 ft is a data point located 10 ft (i.e., one standard deviation) below the mean (B in the figure). This means that the standardized value of x2 = 35 is z2 = –1.

Example 16.4 Standardizing Normal Data

Page 61: Normal Distribution

■ x3 = 50 ft is a data point that is 5 ft (i.e., half a standard deviation) above the mean (C in the figure). This means that the standardized value of x3 = 50 is z3 = 0.5.

Example 16.4 Standardizing Normal Data

Page 62: Normal Distribution

■ x4 = 21.58 ft is ... uh, this is a slightly more complicated case. How do we handle this one? First, we find the signed distance between the data value and the mean by taking their difference (x4 – μ). In this case we get 21.58 ft – 45 ft = –23.42 ft. (Notice that for data values smaller than the mean this difference will be negative.)

Example 16.4 Standardizing Normal Data

Page 63: Normal Distribution

■ If we divide this difference by σ = 10 ft, we get the standardized value z4 = –2.342. This tells us the data point x4 lies 2.342 standard deviations below the mean μ = 45 ft (D in the figure).

Example 16.4 Standardizing Normal Data

Page 64: Normal Distribution

In Example 16.4 we were somewhat fortunate in that the standard deviation was σ = 10, an especially easy number to work with. It helped us get our feet wet. What do we do in more realistic situations, when the mean and standard deviation may not be such nice round numbers? Other than the fact that we may need a calculator to do the arithmetic, the basic idea we used in Example 16.4 remains the same.

Standardizing Values

Page 65: Normal Distribution

In a normal distribution with mean μ and standard deviation σ, the standardized value of a data point x is z = (x – μ)/σ.

STANDARDIZING RULE
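The standardizing rule translates directly into code. The following is a minimal Python sketch; the function name `standardize` is a choice made here, and the data values are the four points of Example 16.4 (mean 45 ft, standard deviation 10 ft).

```python
def standardize(x, mu, sigma):
    """Return the z-value of x: its signed distance from mu, measured in units of sigma."""
    if sigma <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mu) / sigma


# Data values from Example 16.4 (in feet).
for x in (55, 35, 50, 21.58):
    print(x, standardize(x, mu=45, sigma=10))
```

Because x, the mean, and the standard deviation share the same unit, the unit cancels in the division and the z-value is a pure count of standard deviations.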

Page 66: Normal Distribution

This time we will consider a normally distributed data set with mean μ = 63.18 lb and standard deviation σ = 13.27 lb. What is the standardized value of x = 91.54 lb?

This looks nasty, but with a calculator, it’s a piece of cake:

z = (x – μ)/σ = (91.54 – 63.18)/13.27 = 28.36/13.27 ≈ 2.14

Example 16.5 Standardizing Normal Data: Part 2

Page 67: Normal Distribution

One important point to note is that while the original data is given in pounds, there are no units given for the z-value.

The units for the z-value are standard deviations, and this is implicit in the very fact that it is a z-value.

Example 16.5 Standardizing Normal Data: Part 2

Page 68: Normal Distribution

The process of standardizing data can also be reversed, and given a z-value we can go back and find the corresponding x-value. All we have to do is take the formula z = (x – μ)/σ and solve for x in terms of z. When we do this we get the equivalent formula x = μ + σ•z. Given μ, σ, and a value for z, this formula allows us to “unstandardize” z and find the original data value x.

Finding the Value of a Data Point

Page 69: Normal Distribution

Consider a normal distribution with mean μ = 235.7 m and standard deviation σ = 41.58 m. What is the data value x that corresponds to the standardized z-value z = –3.45?

Example 16.6 “Unstandardizing” a z-Value

Page 70: Normal Distribution

We first compute the value of –3.45 standard deviations:

–3.45σ = –3.45 × 41.58 m = –143.451 m.

The negative sign indicates that the data point is to be located below the mean.

Thus, x = 235.7 m – 143.451 m = 92.249 m.

Example 16.6 “Unstandardizing” a z-Value
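The same computation can be sketched in Python. The function name `unstandardize` is a choice made for this sketch; the mean, standard deviation, and z-value are those of Example 16.6.

```python
def unstandardize(z, mu, sigma):
    """Recover the data value x from its z-value: x = mu + sigma * z."""
    return mu + sigma * z


# Values from Example 16.6 (in meters).
mu, sigma, z = 235.7, 41.58, -3.45
x = unstandardize(z, mu, sigma)
print(round(x, 3))

# Standardizing x again should give back the original z-value.
print(round((x - mu) / sigma, 2))
```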

Page 71: Normal Distribution

16.4 The 68-95-99.7 Rule

Page 72: Normal Distribution

When we look at a typical bell-shaped distribution, we can see that most of the data are concentrated near the center.

As we move away from the center the heights of the columns drop rather fast, and if we move far enough away from the center, there are essentially no data to be found.

The 68-95-99.7 Rule

Page 73: Normal Distribution

These are all rather informal observations, but there is a more formal way to phrase this, called the 68-95-99.7 rule.

This useful rule is obtained by using one, two, and three standard deviations above and below the mean as special landmarks. In effect, the 68-95-99.7 rule is three separate rules in one.

The 68-95-99.7 Rule

Page 74: Normal Distribution

1. In every normal distribution, about 68% of all the data values fall within one standard deviation above and below the mean. In other words, 68% of all the data have standardized values between z = –1 and z = 1. The remaining 32% of the data are divided equally between data with standardized values z ≤ –1 and data with standardized values z ≥ 1.

THE 68-95-99.7 RULE

Page 75: Normal Distribution

THE 68-95-99.7 RULE

Page 76: Normal Distribution

2. In every normal distribution, about 95% of all the data values fall within two standard deviations above and below the mean. In other words, 95% of all the data have standardized values between z = –2 and z = 2. The remaining 5% of the data are divided equally between data with standardized values z ≤ –2 and data with standardized values z ≥ 2.

THE 68-95-99.7 RULE

Page 77: Normal Distribution

THE 68-95-99.7 RULE

Page 78: Normal Distribution

3. In every normal distribution, about 99.7% (i.e., practically 100%) of all the data values fall within three standard deviations above and below the mean. In other words, 99.7% of all the data have standardized values between z = –3 and z = 3. There is a minuscule amount of data with standardized values outside this range.

THE 68-95-99.7 RULE
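The three percentages in the rule can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `fraction_within` is an assumption made for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1,
# so data values and z-values coincide.
std_normal = NormalDist()


def fraction_within(k):
    """Fraction of a normal distribution with z-values between -k and k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    inside = fraction_within(k)
    each_tail = (1 - inside) / 2   # the remainder splits equally between the two tails
    print(k, round(100 * inside, 2), round(100 * each_tail, 2))
```

The computed fractions are close to, but not exactly, 68%, 95%, and 99.7%, which is why each part of the rule says "about".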

Page 79: Normal Distribution

THE 68-95-99.7 RULE

Page 80: Normal Distribution

For approximately normal distributions, it is often convenient to round the 99.7% to 100% and work under the assumption that essentially all of the data fall within three standard deviations above and below the mean.

This means that if there are no outliers in the data, we can figure that there are approximately six standard deviations separating the smallest (Min) and the largest (Max) values of the data.

Practical Implications

Page 81: Normal Distribution

Earlier in the text, we defined the range R of a data set (R = Max – Min) and, in the case of an approximately normal distribution, we can conclude that the range is about six standard deviations.

Remember that this is true as long as we can assume that there are no outliers.

Practical Implications
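Read backward, this observation gives a quick way to estimate the standard deviation from the range. The following Python sketch assumes an approximately normal data set with no outliers; the function name is hypothetical, and the Min and Max of 15 ft and 75 ft are simply three standard deviations on either side of the mean in Example 16.4 (mean 45 ft, standard deviation 10 ft).

```python
def estimate_sigma_from_range(minimum, maximum):
    """Rough estimate of the standard deviation, assuming range ≈ 6 standard deviations."""
    return (maximum - minimum) / 6


# Min and Max taken three standard deviations below and above 45 ft.
print(estimate_sigma_from_range(15, 75))
```

This is only a rule of thumb: a single outlier stretches the range and inflates the estimate.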

Page 82: Normal Distribution

16.5 Normal Curves as Models of Real-Life Data Sets

Page 83: Normal Distribution

The reason we like to idealize a real-life, approximately normal data set by means of a normal distribution is that we can use many of the properties we just learned about normal distributions to draw useful conclusions about our data. For example, the 68-95-99.7 rule for normal curves can be reinterpreted in the context of an approximately normal data set as follows.

Real-Life and The 68-95-99.7 Rule

Page 84: Normal Distribution

1. In an approximately normal data set, about 68% of the data values fall within (plus or minus) one standard deviation of the mean.

2. In an approximately normal data set, about 95% of the data values fall within (plus or minus) two standard deviations of the mean.

THE 68-95-99.7 RULE FOR APPROXIMATELY NORMAL DATA

Page 85: Normal Distribution

3. In an approximately normal data set, about 99.7%, or practically 100%, of the data values fall within (plus or minus) three standard deviations of the mean.

THE 68-95-99.7 RULE FOR APPROXIMATELY NORMAL DATA
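The three percentages in the rule can be checked against an exact normal curve. The following is a minimal sketch, assuming a perfectly normal distribution; the helper name `prob_within` is ours, not the text's, and it relies on the standard identity P(|Z| ≤ k) = erf(k/√2).

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    # For an exactly normal distribution, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.1%}")
```

The values come out at roughly 68.3%, 95.4%, and 99.7%, which is why the rule rounds them to 68, 95, and 99.7.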

Page 86: Normal Distribution

We are now going to use what we have learned so far to analyze the 2007 SAT mathematics scores. As you may recall, there were N = 1,494,531 scores, distributed in a nice, approximately normal distribution. The two new pieces of information that we are going to use now are that the mean score was $\mu = 515$ points and the standard deviation was $\sigma = 114$ points.

Example 16.7 2007 SAT Math Scores: Part 2

Page 87: Normal Distribution

Just knowing the mean and the standard deviation (and that the distribution of test scores is approximately normal) allows us to draw a lot of useful conclusions:

Example 16.7 2007 SAT Math Scores: Part 2

Page 88: Normal Distribution

Median. In an approximately normal distribution, the mean and the median should be about the same. Given that the mean score was $\mu = 515$, we can expect the median score to be close to 515. Moreover, the median has to be an actual test score when N is odd (which it is in this example), and SAT scores come in multiples of 10, so a reasonable guess for the median would be either 510 or 520 points.

Example 16.7 2007 SAT Math Scores: Part 2

Page 89: Normal Distribution

First Quartile. Recall that the first quartile is located 0.675 standard deviation below the mean. This means that in this example the first quartile should be close to 515 – 0.675 × 114 ≈ 438 points. But again, the first quartile has to be an actual test score (the only time this is not the case is when N is divisible by 4), so a reasonable guess is that the first quartile of the test scores is either 430 or 440 points.

Example 16.7 2007 SAT Math Scores: Part 2

Page 90: Normal Distribution

Third Quartile. We know that the third quartile is located 0.675 standard deviation above the mean. In this case this gives 515 + 0.675 × 114 ≈ 592 points.

The most reasonable guess is that the third quartile of the test scores is either 590 or 600 points.

Example 16.7 2007 SAT Math Scores: Part 2
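The median and quartile estimates above follow one small recipe: shift the mean by 0.675 standard deviations, then look at the neighboring multiples of 10. Here is a sketch of that arithmetic; the helper `neighbors_on_grid` is a hypothetical name introduced only for illustration, and the numbers are the ones given in the example.

```python
MEAN = 515.0      # mean of the 2007 SAT math scores (from the example)
STD_DEV = 114.0   # standard deviation (from the example)
Q_OFFSET = 0.675  # quartiles sit about 0.675 standard deviations from the mean

def neighbors_on_grid(x: float, step: int = 10) -> tuple[int, int]:
    """Return the multiple of `step` at or below x and the next one above it."""
    low = int(x // step) * step
    return low, low + step

estimates = {
    "median": MEAN,
    "first quartile": MEAN - Q_OFFSET * STD_DEV,
    "third quartile": MEAN + Q_OFFSET * STD_DEV,
}
for name, value in estimates.items():
    print(f"{name}: about {value:.0f}, so guess one of {neighbors_on_grid(value)}")
```

The two candidates printed for each statistic are the same pairs of guesses named in the slides.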

Page 91: Normal Distribution

In all three cases our guesses were very good: as reported by the College Board, the 2007 SAT math scores had median M = 510, first quartile Q1 = 430, and third quartile Q3 = 590.

We are now going to analyze the 2007 SAT scores in a little more depth, using the 68-95-99.7 rule.

Example 16.7 2007 SAT Math Scores: Part 2

Page 92: Normal Distribution

The Middle 68. Approximately 68% of the scores should have fallen within plus or minus one standard deviation from the mean. In this case, this range of scores goes from 515 – 114 = 401 to 515 + 114 = 629 points. Since SAT scores can only come in multiples of 10, we can estimate that a little over two-thirds of students had scores between 400 and 630 points.

Example 16.7 2007 SAT Math Scores: Part 2

Page 93: Normal Distribution

The Middle 68.

The remaining third were equally divided between those scoring 630 points or more (about 16% of test takers) and those scoring 400 points or less (the other 16%).

Example 16.7 2007 SAT Math Scores: Part 2

Page 94: Normal Distribution

The Middle 95. Approximately 95% of the scores should have fallen within plus or minus two standard deviations from the mean, that is, between 515 – 228 = 287 and 515 + 228 = 743 points. In practice this means SAT scores between 290 and 740 points. The remaining 5% of the scores were 740 points or above (about 2.5%) and 290 points or below (the other 2.5%).

Example 16.7 2007 SAT Math Scores: Part 2

Page 95: Normal Distribution

Everyone.

The 99.7 part of the 68-95-99.7 rule is not much help in this example. Essentially, it says that all test scores fell between 515 – 342 = 173 points and 515 + 342 = 857 points. Duh! SAT mathematics scores are always between 200 and 800 points, so what else is new?

Example 16.7 2007 SAT Math Scores: Part 2
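The three intervals used in this example can be generated in one pass. This is only a sketch of the arithmetic, using the mean and standard deviation given in the example; clipping to the 200 to 800 scale is our addition, to show why the 99.7 part says nothing new here.

```python
MEAN, STD_DEV = 515, 114          # values given in the example
SCORE_MIN, SCORE_MAX = 200, 800   # SAT math scale mentioned in the text

def interval(k: int) -> tuple[int, int]:
    """Scores within k standard deviations of the mean."""
    return MEAN - k * STD_DEV, MEAN + k * STD_DEV

for k, share in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    low, high = interval(k)
    # Clip to the scale: scores outside 200-800 cannot occur.
    clipped_low, clipped_high = max(low, SCORE_MIN), min(high, SCORE_MAX)
    print(f"about {share} of scores: {low} to {high} "
          f"(on the SAT scale: {clipped_low} to {clipped_high})")
```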

Page 96: Normal Distribution

16.6 Distribution of Random Events

Page 97: Normal Distribution

We are now ready to take up another important aspect of normal curves–their connection with random events and, through that, their critical role in margins of error of public opinion polls.

Normal Curves and Random Events

Page 98: Normal Distribution

John Kerrich, while he was a prisoner of war during World War II, tossed a coin 10,000 times and kept records of the number of heads in groups of 100 tosses. With modern technology, we can repeat Kerrich’s experiment and take it much further. Practically any computer can imitate the tossing of a coin by means of a random-number generator. If we use this technique, we can “toss coins” in mind-boggling numbers–millions of times if we so choose.

Example 16.8 Coin-Tossing Experiments

Page 99: Normal Distribution

We will start modestly. We will toss our make-believe coin 100 times and count the number of heads, which we will denote by X. Before we do that, let’s say a few words about X. Since we cannot predict ahead of time its exact value–we are tempted to think that it should be 50, but, in principle, it could be anything from 0 to 100–we call X a random variable.

Example 16.8 Coin-Tossing Experiments

Page 100: Normal Distribution

The possible values of the random variable X are governed by the laws of probability: Some values of X are extremely unlikely (X = 0, X = 100) and others are much more likely (X = 50), although the likelihood of X = 50 is not as great as one would think. It also seems reasonable (assuming that the coin is fair and heads and tails are equally likely) that the likelihood of X = 49 should be the same as the likelihood of X = 51, the likelihood of X = 48 should be the same as the likelihood of X = 52, and so on.

Example 16.8 Coin-Tossing Experiments

Page 101: Normal Distribution

While all of the preceding statements are true, we still don’t have a clue as to what is going to happen when we toss the coin 100 times. One way to get a sense of the probabilities of the different values of X is to repeat the experiment many times and check the frequencies of the various outcomes. Finally, we are ready to do some experimenting!

Example 16.8 Coin-Tossing Experiments

Page 102: Normal Distribution

Our first trial results in 46 heads out of 100 tosses (X = 46). The first 10 trials give X = 46, 49, 51, 53, 49, 52, 47, 46, 53, 49.

Example 16.8 Coin-Tossing Experiments

Page 103: Normal Distribution

Continuing this way, we collect data for the values of X in 100 trials.

Example 16.8 Coin-Tossing Experiments

Page 104: Normal Distribution

Then, we collect data for the values of X in 500 trials.

Example 16.8 Coin-Tossing Experiments

Page 105: Normal Distribution

Then, we collect data for the values of X in 1000 trials.

Example 16.8 Coin-Tossing Experiments

Page 106: Normal Distribution

Then, we collect data for the values of X in 5000 trials.

Example 16.8 Coin-Tossing Experiments

Page 107: Normal Distribution

Then, we collect data for the values of X in 10,000 trials.

Example 16.8 Coin-Tossing Experiments
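The experiment described in these slides is easy to imitate with a random-number generator. The sketch below is one possible way to do it, not the procedure used to produce the original bar graphs; the function names and the fixed seed are our own choices, and a different seed will give different counts.

```python
import random
from collections import Counter

def heads_in_tosses(n_tosses: int, rng: random.Random) -> int:
    """Toss a simulated honest coin n_tosses times and count the heads."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses))

def run_trials(n_trials: int, n_tosses: int = 100, seed: int = 0) -> Counter:
    """Frequency table of X (number of heads) over n_trials repetitions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return Counter(heads_in_tosses(n_tosses, rng) for _ in range(n_trials))

for n_trials in (100, 500, 1000, 5000, 10000):
    counts = run_trials(n_trials)
    mean = sum(x * c for x, c in counts.items()) / n_trials
    print(f"{n_trials:>6} trials: mean X = {mean:.2f}, "
          f"min = {min(counts)}, max = {max(counts)}")
```

Plotting each `counts` table as a bar graph should show the same progression toward a bell shape as the number of trials grows.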

Page 108: Normal Distribution

The bar graphs paint a pretty clear picture of what happens: As the number of trials increases, the distribution of the data becomes more and more bell shaped. At the end, we have data from 10,000 trials, and the bar graph gives an almost perfect normal distribution!

What would happen if someone else decided to repeat what we did–toss an honest coin (be it by hand or by computer) 100 times, count the number of heads, and repeat this experiment a few times?

Example 16.8 Coin-Tossing Experiments

Page 109: Normal Distribution

The first 10 trials are likely to produce results very different from ours, but as the number of trials increases, their results and our results will begin to look more and more alike. After 10,000 trials, their bar graph will be almost identical to our bar graph. In a sense, this says that doing the experiments a second time is a total waste of time–in fact, it was even a waste the first time! The outline of the final distribution could have been predicted without ever tossing a coin!

Example 16.8 Coin-Tossing Experiments

Page 110: Normal Distribution

Knowing that the random variable X has an approximately normal distribution is, as we have seen, quite useful. The clincher would be to find out the values of the mean and the standard deviation of this distribution. Looking at the bar graphs, we can pretty much see where the mean is–right at 50. This is not surprising, since the axis of symmetry of the distribution has to pass through 50 as a simple consequence of the fact that the coin is honest. For now, let’s accept that the standard deviation is $\sigma = 5$.

Example 16.8 Coin-Tossing Experiments

Page 111: Normal Distribution

Let’s summarize what we now know. An honest coin is tossed 100 times. The number of heads in the 100 tosses is a random variable, which we call X. If we repeat this experiment a large number of times (say N), the random variable X will have an approximately normal distribution with mean $\mu = 50$ and standard deviation $\sigma = 5$, and the larger the value of N is, the better this approximation will be.

Example 16.8 Coin-Tossing Experiments

Page 112: Normal Distribution

The real significance of these facts is that they are true not because we took the trouble to toss a coin a million times. Even if we did not toss a coin at all, all of these statements would still be true. For a sufficiently large number of repetitions of the experiment of tossing an honest coin 100 times, the number of heads X is a random variable that has an approximately normal distribution with center $\mu = 50$ heads and standard deviation $\sigma = 5$ heads. This is a mathematical, rather than an experimental, fact.

Example 16.8 Coin-Tossing Experiments
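Because this is a mathematical fact, the mean of 50 and the standard deviation of 5 can be computed without tossing anything. A short sketch, assuming the usual binomial model in which each of the 2^100 head/tail sequences is equally likely:

```python
import math

N_TOSSES = 100

# Exact probability of each number of heads k for an honest coin:
# P(X = k) = C(100, k) / 2**100.
pmf = [math.comb(N_TOSSES, k) / 2**N_TOSSES for k in range(N_TOSSES + 1)]

mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
std_dev = math.sqrt(variance)

print(f"mean = {mean:.4f}, standard deviation = {std_dev:.4f}")
print(f"P(X = 50) = {pmf[50]:.4f}")
```

The last line also backs up the earlier remark that X = 50 is less likely than one might think: its probability is only about 8%.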

Page 113: Normal Distribution

16.7 Statistical Inference

Page 114: Normal Distribution

Suppose that we have an honest coin and intend to toss it 100 times. We are going to do this just once, and we will let X denote the resulting number of heads. Been there, done that! What’s new now is that we have a solid understanding of the statistical behavior of the random variable X–it has an approximately normal distribution with mean $\mu = 50$ and standard deviation $\sigma = 5$–and this allows us to make some very reasonable predictions about the possible values of X.

Statistical Inference

Page 115: Normal Distribution

For starters, we can predict the chance that X will fall somewhere between 45 and 55 (one standard deviation below and above the mean)–it is 68%. Likewise, we know that the chance that X will fall somewhere between 40 and 60 is 95%, and between 35 and 65 is a whopping 99.7%.

What if, instead of tossing the coin 100 times, we were to toss it n times?

Statistical Inference

Page 116: Normal Distribution

Not surprisingly, the bell-shaped distribution would still be there–only the values of $\mu$ and $\sigma$ would change. Specifically, for n sufficiently large (typically n ≥ 30), the number of heads in n tosses would be a random variable with an approximately normal distribution with mean $\mu = n/2$ heads and standard deviation $\sigma = \sqrt{n}/2$ heads. This is an important fact for which we have coined the name the honest-coin principle.

Statistical Inference

Page 117: Normal Distribution

Let X denote the number of heads in n tosses of an honest coin (assume n ≥ 30). Then, X has an approximately normal distribution with mean $\mu = n/2$ and standard deviation $\sigma = \sqrt{n}/2$.

THE HONEST-COIN PRINCIPLE
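The honest-coin principle amounts to two one-line formulas. The sketch below wraps them in a helper (the name `honest_coin` is ours) and tries them on the two toss counts that appear in the examples, n = 100 and n = 256.

```python
import math

def honest_coin(n: int) -> tuple[float, float]:
    """Mean and standard deviation of the number of heads in n tosses of an honest coin."""
    if n < 30:
        # The principle is stated only for n >= 30.
        raise ValueError("the principle assumes n >= 30")
    return n / 2, math.sqrt(n) / 2

for n in (100, 256):
    mu, sigma = honest_coin(n)
    print(f"n = {n}: mean = {mu:g} heads, standard deviation = {sigma:g} heads")
```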

Page 118: Normal Distribution

An honest coin is going to be tossed 256 times. Before this is done, we have the opportunity to make some bets. Let’s say that we can make a bet (with even odds) that if the number of heads tossed falls somewhere between 120 and 136, we will win; otherwise, we will lose. Should we make such a bet?

Let X denote the number of heads in 256 tosses of an honest coin.

Example 16.9 Coin-Tossing Experiments: Part 2

Page 119: Normal Distribution

By the honest-coin principle, X is a random variable having a distribution that is approximately normal with mean $\mu = 256/2 = 128$ heads and standard deviation $\sigma = \sqrt{256}/2 = 8$ heads. The values 120 to 136 are exactly one standard deviation below and above the mean of 128, which means that there is a 68% chance that the number of heads will fall somewhere between 120 and 136.

Example 16.9 Coin-Tossing Experiments: Part 2

Page 120: Normal Distribution

We should indeed make this bet! A similar calculation tells us that there is a 95% chance that the number of heads will fall somewhere between 112 and 144, and the chance that the number of heads will fall somewhere between 104 and 152 is 99.7%.

Example 16.9 Coin-Tossing Experiments: Part 2
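The 68% figure is a normal-curve approximation. As a cross-check, the sketch below sums the exact binomial probabilities for 120 through 136 heads in 256 tosses; this exact calculation is our addition and is not part of the original example.

```python
import math

N = 256

# Exact binomial probability that 120 <= X <= 136 for an honest coin.
exact = sum(math.comb(N, k) for k in range(120, 137)) / 2**N

# Normal-curve estimate used in the example: within one standard deviation.
rule_of_thumb = math.erf(1 / math.sqrt(2))

print(f"exact binomial probability: {exact:.3f}")
print(f"68-95-99.7 estimate:        {rule_of_thumb:.3f}")
```

The exact value should come out a little above the 68% estimate, since both endpoints 120 and 136 are counted as wins, so the bet is at least as favorable as the example suggests.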

Page 121: Normal Distribution

What happens when the coin being tossed is not an honest coin? Surprisingly, the distribution of the number of heads X in n tosses of such a coin is still approximately normal, as long as the number n is not too small (a good rule of thumb is n ≥ 30). All we need now is a dishonest-coin principle to tell us how to find the mean and the standard deviation.

Dishonest Coin

Page 122: Normal Distribution

Let X denote the number of heads in n tosses of a coin (assume n ≥ 30). Let p denote the probability of heads on each toss of the coin. Then, X has an approximately normal distribution with mean $\mu = n \cdot p$ and standard deviation $\sigma = \sqrt{n \cdot p \cdot (1 - p)}$.

THE DISHONEST-COIN PRINCIPLE
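A quick way to see that the dishonest-coin principle contains the honest-coin principle as a special case is to code the formulas and plug in p = 0.5. In this sketch the helper name `dishonest_coin` and the sample values of p are illustrative assumptions.

```python
import math

def dishonest_coin(n: int, p: float) -> tuple[float, float]:
    """Mean and standard deviation of the number of heads in n tosses
    of a coin that lands heads with probability p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability between 0 and 1")
    return n * p, math.sqrt(n * p * (1 - p))

for p in (0.5, 0.2):
    mu, sigma = dishonest_coin(100, p)
    print(f"n = 100, p = {p}: mean = {mu:g}, standard deviation = {sigma:g}")
```

With p = 0.5 the standard deviation reduces to √n / 2, the honest-coin value.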

Page 123: Normal Distribution

A coin is rigged so that it comes up heads only 20% of the time (i.e., p = 0.20). The coin is tossed 100 times (n = 100) and X is the number of heads in the 100 tosses. What can we say about X?

Example 16.10 Coin-Tossing Experiments: Part 3

Page 124: Normal Distribution

According to the dishonest-coin principle, the distribution of the random variable X is approximately normal with mean $\mu = 100 \times 0.20 = 20$ and standard deviation $\sigma = \sqrt{100 \times 0.20 \times 0.80} = 4$. Applying the 68-95-99.7 rule with $\mu = 20$ and $\sigma = 4$ gives the following facts:

Example 16.10 Coin-Tossing Experiments: Part 3

Page 125: Normal Distribution

■ There is about a 68% chance that X will be somewhere between 16 and 24 ($\mu - \sigma \le X \le \mu + \sigma$).

■ There is about a 95% chance that X will be somewhere between 12 and 28 ($\mu - 2\sigma \le X \le \mu + 2\sigma$).

■ The number of heads is almost guaranteed (about 99.7%) to fall somewhere between 8 and 32 ($\mu - 3\sigma \le X \le \mu + 3\sigma$).

Example 16.10 Coin-Tossing Experiments: Part 3

Page 126: Normal Distribution

In this example, heads and tails are no longer interchangeable concepts–heads is an outcome with probability p = 0.20 while tails is an outcome with much higher probability (0.8). We can, however, apply the principle equally well to describe the distribution of the number of tails in 100 coin tosses of the same dishonest coin: The distribution for the number of tails is approximately normal with mean $\mu = 100 \times 0.80 = 80$ and standard deviation $\sigma = \sqrt{100 \times 0.80 \times 0.20} = 4$.

Example 16.10 Coin-Tossing Experiments: Part 3
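The heads and tails descriptions of this rigged coin mirror each other, which the following sketch makes visible. It applies the dishonest-coin formulas and the 68-95-99.7 rule to both p = 0.20 and p = 0.80; the helper name `rule_intervals` is hypothetical.

```python
import math

def rule_intervals(n: int, p: float) -> dict[str, tuple[float, float]]:
    """68-95-99.7 intervals for the number of successes in n tosses
    when each toss succeeds with probability p."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return {
        # Rounding only removes floating-point noise in the last digits.
        label: (round(mu - k * sigma, 6), round(mu + k * sigma, 6))
        for k, label in ((1, "68%"), (2, "95%"), (3, "99.7%"))
    }

heads = rule_intervals(100, 0.20)
tails = rule_intervals(100, 0.80)
for label in heads:
    print(f"{label}: heads in {heads[label]}, tails in {tails[label]}")
```

Each tails interval is the corresponding heads interval subtracted from 100, as expected, since tails = 100 minus heads.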

Page 127: Normal Distribution

The dishonest-coin principle is a special version of one of the most important laws in statistics, a law generally known as the central limit theorem. We will now briefly illustrate why the importance of the dishonest-coin principle goes beyond the tossing of coins.

Central Limit Theorem

Page 128: Normal Distribution

An assembly line produces 100,000 light bulbs a day, 20% of which generally turn out to be defective. Suppose that we draw a random sample of n = 100 light bulbs. Let X represent the number of defective light bulbs in the sample. What can we say about X?

A moment’s reflection will show that, in a sense, this example is completely parallel to Example 16.10–think of selecting defective light bulbs as analogous to tossing heads with a dishonest coin.

Example 16.11 Sampling for Defective Light Bulbs

Page 129: Normal Distribution

We can use the dishonest-coin principle to infer that the number of defective light bulbs in the sample is a random variable having an approximately normal distribution with a mean of 20 light bulbs and standard deviation of 4 light bulbs.

Using these facts, we can draw the following conclusions:

Example 16.11 Sampling for Defective Light Bulbs

Page 130: Normal Distribution

■ There is a 68% chance that the number of defective light bulbs in the sample will fall somewhere between 16 and 24.

■ There is a 95% chance that the number of defective light bulbs in the sample will fall somewhere between 12 and 28.

■ The number of defective light bulbs in the sample is practically guaranteed (a 99.7% chance) to fall somewhere between 8 and 32.

Example 16.11 Sampling for Defective Light Bulbs

Page 131: Normal Distribution

Probably the most important point here is that each of the preceding facts can be rephrased in terms of sampling errors (Chapter 13). For example, say we had 24 defective light bulbs in the sample; in other words, 24% of the sample (24 out of 100) are defective light bulbs. If we use this statistic to estimate the percentage of defective light bulbs overall, then the sampling error would be 4% (because the estimate is 24% and the value of the parameter is 20%).

Example 16.11 Sampling for Defective Light Bulbs

Page 132: Normal Distribution

By the same token, if we had 16 defective light bulbs in the sample, the sampling error would be –4%. Coincidentally, the standard deviation is $\sigma = 4$ light bulbs, or 4% of the sample. (We computed it in Example 16.10.) Thus, we can rephrase our previous assertions about sampling errors as follows:

Example 16.11 Sampling for Defective Light Bulbs

Page 133: Normal Distribution

■ When estimating the proportion of defective light bulbs coming out of the assembly line by using a sample of 100 light bulbs, there is a 68% chance that the sampling error will fall somewhere between –4% and 4%.

Example 16.11 Sampling for Defective Light Bulbs

Page 134: Normal Distribution

■ When estimating the proportion of defective light bulbs coming out of the assembly line by using a sample of 100 light bulbs, there is a 95% chance that the sampling error will fall somewhere between –8% and 8%.

Example 16.11 Sampling for Defective Light Bulbs

Page 135: Normal Distribution

■ When estimating the proportion of defective light bulbs coming out of the assembly line by using a sample of 100 light bulbs, there is a 99.7% chance that the sampling error will fall somewhere between –12% and 12%.

Example 16.11 Sampling for Defective Light Bulbs
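The step from counts of defective bulbs to sampling errors is a change of units: divide by the sample size and subtract the true rate. A sketch with the numbers from this example (the function name `sampling_error` is ours):

```python
import math

n, p = 100, 0.20                      # sample size and true defect rate from the example
sigma = math.sqrt(n * p * (1 - p))    # standard deviation of the defective count

def sampling_error(defective_in_sample: int) -> float:
    """Sample proportion minus the true proportion, in percentage points."""
    return 100 * (defective_in_sample / n - p)

print(f"24 defective -> sampling error {sampling_error(24):+.0f}%")
print(f"16 defective -> sampling error {sampling_error(16):+.0f}%")

for k, chance in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    half_width = 100 * k * sigma / n  # k standard deviations, as a percent of the sample
    print(f"{chance} chance: sampling error within ±{half_width:.0f}%")
```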

Page 136: Normal Distribution

Suppose that we have the same assembly line as in Example 16.11, but this time we are going to take a really big sample of n = 1600 light bulbs. Before we even count the number of defective light bulbs in the sample, let’s see how much mileage we can get out of the dishonest-coin principle. The standard deviation for the distribution of defective light bulbs in the sample is $\sigma = \sqrt{1600 \times 0.2 \times 0.8} = 16$,

Example 16.12 Sampling with Larger Samples

Page 137: Normal Distribution

which just happens to be exactly 1% of the sample (16/1600 = 1%). This means that when we estimate the proportion of defective light bulbs coming out of the assembly line using this sample, we can have some sort of a handle on the sampling error.

Example 16.12 Sampling with Larger Samples

Page 138: Normal Distribution

■ We can say with some confidence (68%) that the sampling error will fall somewhere between –1% and 1%.

■ We can say with a lot of confidence (95%) that the sampling error will fall somewhere between –2% and 2%.

■ We can say with tremendous confidence (99.7%) that the sampling error will fall somewhere between –3% and 3%.

Example 16.12 Sampling with Larger Samples
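Comparing the two sample sizes side by side shows what the larger sample buys. This sketch expresses the standard deviation as a percentage of the sample for n = 100 and n = 1600; the function name is an illustrative choice.

```python
import math

def std_dev_as_percent(n: int, p: float) -> float:
    """Standard deviation of the defective count, as a percentage of the sample size."""
    return 100 * math.sqrt(n * p * (1 - p)) / n

for n in (100, 1600):
    print(f"n = {n}: standard deviation is {std_dev_as_percent(n, 0.20):.0f}% of the sample")
```

A sample 16 times larger cuts the percentage by a factor of 4, because the standard deviation grows like √n while the sample grows like n.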

Page 139: Normal Distribution

In California, school bond measures require a 66.67% vote for approval. Suppose that an important school bond measure is on the ballot in the upcoming election. In the most recent poll of 1200 randomly chosen voters, 744 of the 1200 voters sampled, or 62%, indicated that they would vote for the school bond measure. Let’s assume that the poll was properly conducted and that the 1200 voters sampled represent an unbiased sample of the entire population.

Example 16.13 Measuring the Margin of Error of a Poll

Page 140: Normal Distribution

What are the chances that the 62% statistic is the result of sampling variability and that the actual vote for the bond measure will be 66.67% or more?

Here, we will use a variation of the dishonest-coin principle, with each vote being likened to a coin toss: A vote for the bond measure is equivalent to flipping heads, a vote against the bond measure is equivalent to flipping tails.

Example 16.13 Measuring the Margin of Error of a Poll

Page 141: Normal Distribution

In this analogy, the probability (p) of “heads” represents the proportion of voters in the population that support the bond measure: If p turns out to be 0.6667 or more, the bond measure will pass. Our problem is that we don’t know p, so how can we use the dishonest-coin principle to estimate the mean and standard deviation of the sampling distribution?

Example 16.13 Measuring the Margin of Error of a Poll

Page 142: Normal Distribution

We start by letting the 62% (0.62) statistic from the sample serve as an estimate for the actual value of p in the formula for the standard deviation given by the dishonest-coin principle. (Even though we know that this is only a rough estimate for p, it turns out to give us a good estimate for the standard deviation $\sigma$.)

Example 16.13 Measuring the Margin of Error of a Poll

Page 143: Normal Distribution

Using p = 0.62 and the dishonest-coin principle, we get $\sigma = \sqrt{n \cdot p \cdot (1 - p)} = \sqrt{1200 \times 0.62 \times 0.38} \approx 16.8$ votes. This number represents the approximate standard deviation for the number of “heads” (i.e., voters who will vote for the school bond measure) in the sample.

Example 16.13 Measuring the Margin of Error of a Poll

Page 144: Normal Distribution

If we express this number as a percentage of the sample size, we can say that the standard deviation represents approximately 1.4% of the sample (16.8/1200 = 0.014).

The standard deviation for the sampling distribution of the proportion of voters in favor of the measure expressed as a percentage of the entire sample is called the standard error. (For our example, we have found above that the standard error is approximately 1.4%.)

Example 16.13 Measuring the Margin of Error of a Poll

Page 145: Normal Distribution

In sampling and public opinion polls, it is customary to express the information about the population in terms of confidence intervals, which are themselves based on standard errors: A 95% confidence interval is given by two standard errors below and above the statistic obtained from the sample, and a 99.7% confidence interval is given by going three standard errors below and above the sample statistic.

Example 16.13 Measuring the Margin of Error of a Poll

Page 146: Normal Distribution

For the school bond measure, a 95% confidence interval is 62% plus or minus 2 × (1.4%) = 2.8%. This means that we can say with 95% confidence (we would be right approximately 95 out of 100 times) that the actual vote for the bond measure will fall somewhere between 59.2% (62 – 2.8) and 64.8% (62 + 2.8) and thus that the bond measure will lose.

Example 16.13 Measuring the Margin of Error of a Poll

Page 147: Normal Distribution

Take a 99.7% confidence interval of 62% plus or minus 3 × (1.4%) = 4.2%, that is, from 57.8% to 66.2%: it is almost certain that the actual vote will turn out somewhere in that range. Even in the most optimistic scenario, the vote will not reach the 66.67% needed to pass the bond measure.

Example 16.13 Measuring the Margin of Error of a Poll
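The whole poll calculation, from the raw counts to the two confidence intervals, fits in a few lines. The sketch below follows the steps of this example under the same assumption that the sample proportion can stand in for the unknown p; the function name `poll_interval` is ours, and the code keeps the standard error unrounded, so later digits may differ slightly from the 1.4% used in the slides.

```python
import math

def poll_interval(n: int, in_favor: int, k: int) -> tuple[float, float, float]:
    """Return (standard error, low, high) in percent, going k standard errors
    below and above the sample statistic."""
    p_hat = in_favor / n                          # sample proportion, used as the estimate of p
    sigma = math.sqrt(n * p_hat * (1 - p_hat))    # dishonest-coin standard deviation (in votes)
    std_error = 100 * sigma / n                   # as a percentage of the sample
    center = 100 * p_hat
    return std_error, center - k * std_error, center + k * std_error

NEEDED = 66.67  # percent required for approval, from the example
for k, level in ((2, "95%"), (3, "99.7%")):
    std_error, low, high = poll_interval(1200, 744, k)
    outcome = "stays below" if high < NEEDED else "reaches"
    print(f"{level} interval: {low:.1f}% to {high:.1f}% ({outcome} {NEEDED}%)")
```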