CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf ·...

27
CHAPTER 2 In this chapter we cover... Measuring center: the mean Measuring center: the median Comparing the mean and the median Measuring spread: the quartiles The five-number summary and boxplots Spotting suspected outliers* Measuring spread: the standard deviation Choosing measures of center and spread Using technology Organizing a statistical problem Mitchell Funk/Getty Images Describing Distributions with Numbers How long does it take you to get from home to work? Here are the travel times in minutes for 15 workers in North Carolina, chosen at random by the Census Bureau: 1 30 20 10 40 25 20 10 60 15 40 5 30 12 10 10 We aren’t surprised that most people estimate their travel time in multiples of 5 minutes. Here is a stemplot of these data: 0 2 3 4 5 6 5 000025 005 00 00 0 1 The distribution is single-peaked and right-skewed. The longest travel time (60 minutes) may be an outlier. Our goal in this chapter is to describe with numbers the center and spread of this and other distributions. 37

Transcript of CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf ·...

Page 1: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

CH

AP

TE

R

2In this chapter we cover...

Measuring center: the mean

Measuring center: themedian

Comparing the mean andthe median

Measuring spread: thequartiles

The five-number summaryand boxplots

Spotting suspected outliers*

Measuring spread: thestandard deviation

Choosing measures ofcenter and spread

Using technology

Organizing a statisticalproblem

Mit

chel

l Fun

k/G

etty

Imag

es

Describing Distributionswith Numbers

How long does it take you to get from home to work? Here are the travel timesin minutes for 15 workers in North Carolina, chosen at random by the CensusBureau:1

30 20 10 40 25 20 10 60 15 40 5 30 12 10 10

We aren’t surprised that most people estimate their travel time in multiples of 5minutes. Here is a stemplot of these data:

0

2

3

4

5

6

5

0 0 0 0 2 5

0 0 5

0 0

0 0

0

1

The distribution is single-peaked and right-skewed. The longest travel time (60minutes) may be an outlier. Our goal in this chapter is to describe with numbersthe center and spread of this and other distributions.

37

Page 2: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

38 C H A P T E R 2 • Describing Distributions with Numbers

Measuring center: the meanThe most common measure of center is the ordinary arithmetic average, or mean.

THE MEAN x

To find the mean of a set of observations, add their values and divide by thenumber of observations. If the n observations are x1, x2, . . . , xn , their mean is

x = x1 + x2 + · · · + xn

nor, in more compact notation,

x = 1n

∑xi

Don’t hide the outliers

Data from an airliner’s controlsurfaces, such as the vertical tailrudder, go to cockpit instrumentsand then to the “black box” flightdata recorder. To avoid confusingthe pilots, short erratic movementsin the data are “smoothed” so thatthe instruments show overallpatterns. When a crash killed 260people, investigators suspected acatastrophic movement of the tailrudder. But the black box containedonly the smoothed data. Sometimesoutliers are more important thanthe overall pattern.

The � (capital Greek sigma) in the formula for the mean is short for “addthem all up.”The subscripts on the observations xi are just a way of keeping the nobservations distinct. They do not necessarily indicate order or any other specialfacts about the data. The bar over the x indicates the mean of all the x-values.Pronounce the mean x as “x-bar.” This notation is very common. When writerswho are discussing data use x or y, they are talking about a mean.

E X A M P L E 2 . 1 Travel times to work

The mean travel time of our 15 North Carolina workers is

x = x1 + x2 + · · · + xn

n

= 30 + 20 + · · · + 1015

= 33715

= 22.5 minutes

In practice, you can key the data into your calculator and hit the x button. You don’thave to actually add and divide. But you should know that this is what the calculator isdoing.

Notice that only 6 of the 15 travel times are larger than the mean. If we leave outthe longest single travel time, 60 minutes, the mean for the remaining 14 people is 19.8minutes. That one observation raises the mean by 2.7 minutes.

Example 2.1 illustrates an important fact about the mean as a measure of cen-ter: it is sensitive to the influence of a few extreme observations. These may be out-liers, but a skewed distribution that has no outliers will also pull the mean towardits long tail. Because the mean cannot resist the influence of extreme observations,we say that it is not a resistant measure of center.resistant measure

Page 3: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

Measuring center: the median 39

A P P L Y Y O U R K N O W L E D G E

2.1 Pulling wood apart. Example 1.10 (page 19) gives the breaking strength inpounds of 20 pieces of Douglas fir. Find the mean breaking strength. How many ofthe pieces of wood have strengths less than the mean? What feature of thestemplot (Figure 1.10, page 21) explains the fact that the mean is smaller thanmost of the observations?

Measuring center: the medianIn Chapter 1, we used the midpoint of a distribution as an informal measure ofcenter. The median is the formal version of the midpoint, with a specific rule forcalculation.

THE MEDIAN M

The median M is the midpoint of a distribution, the number such that halfthe observations are smaller and the other half are larger. To find themedian of a distribution:1. Arrange all observations in order of size, from smallest to largest.2. If the number of observations n is odd, the median M is the center

observation in the ordered list. Find the location of the median bycounting (n + 1)/2 observations up from the bottom of the list.

3. If the number of observations n is even, the median M is the mean ofthe two center observations in the ordered list. The location of themedian is again (n + 1)/2 from the bottom of the list.

Note that the formula (n + 1)/2 does not give the median, just the locationof the median in the ordered list. Medians require little arithmetic, so they areeasy to find by hand for small sets of data. Arranging even a moderate number ofobservations in order is very tedious, however, so that finding the median by handfor larger sets of data is unpleasant. Even simple calculators have an x button,but you will need to use software or a graphing calculator to automate finding themedian.

E X A M P L E 2 . 2 Finding the median: odd n

What is the median travel time for our 15 North Carolina workers? Here are the dataarranged in order:

5 10 10 10 10 12 15 20 20 25 30 30 40 40 60

The count of observations n = 15 is odd. The bold 20 is the center observation in theordered list, with 7 observations to its left and 7 to its right. This is the median, M = 20minutes.

Page 4: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

40 C H A P T E R 2 • Describing Distributions with Numbers

Because n = 15, our rule for the location of the median gives

Mitchell Funk/Getty Images

location of M = n + 12

= 162

= 8

That is, the median is the 8th observation in the ordered list. It is faster to use this rulethan to locate the center by eye.

E X A M P L E 2 . 3 Finding the median: even n

Travel times to work in New York State are (on the average) longer than in NorthCarolina. Here are the travel times in minutes of 20 randomly chosen New York workers:

10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45

A stemplot not only displays the distribution but also makes finding the median easybecause it arranges the observations in order:

0 5

1 0 0 5 5 5 5

2 0 0 00 55

3 0 0

4 0 0 5

5

6 0 0 5

7

8 5

The distribution is single-peaked and right-skewed, with several travel times of an houror more. There is no center observation, but there is a center pair. These are the bold20 and 25 in the stemplot, which have 9 observations before them in the ordered listand 9 after them. The median is midway between these two observations:

M = 20 + 252

= 22.5 minutes

With n = 20, the rule for locating the median in the list gives

location of M = n + 12

= 212

= 10.5

The location 10.5 means “halfway between the 10th and 11th observations in the or-dered list.” That agrees with what we found by eye.

Comparing the mean and the medianExamples 2.1 and 2.2 illustrate an important difference between the mean and themedian. The median travel time (the midpoint of the distribution) is 20 minutes.The mean travel time is higher, 22.5 minutes. The mean is pulled toward the righttail of this right-skewed distribution. The median, unlike the mean, is resistant.If the longest travel time were 600 minutes rather than 60 minutes, the meanwould increase to more than 58 minutes but the median would not change at all.The outlier just counts as one observation above the center, no matter how farabove the center it lies. The mean uses the actual value of each observation andso will chase a single large observation upward. The Mean and Median applet is an

APPLETAPPLET excellent way to compare the resistance of M and x .

Page 5: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls June 26, 2006 20:3

Measuring spread: the quartiles 41

COMPARING THE MEAN AND THE MEDIAN

The mean and median of a roughly symmetric distribution are closetogether. If the distribution is exactly symmetric, the mean and median areexactly the same. In a skewed distribution, the mean is usually farther outin the long tail than is the median.2

Many economic variables have distributions that are skewed to the right. Forexample, the median endowment of colleges and universities in 2004 was $72million—but the mean endowment was $360 million. Most institutions have mod-est endowments, but a few are very wealthy. Harvard’s endowment topped $22billion. The few wealthy institutions pull the mean up but do not affect the me-dian. Reports about incomes and other strongly skewed distributions usually givethe median (“midpoint”) rather than the mean (“arithmetic average”). However,a county that is about to impose a tax of 1% on the incomes of its residents caresabout the mean income, not the median. The tax revenue will be 1% of total in-come, and the total is the mean times the number of residents. The mean andmedian measure center in different ways, and both are useful. Don’t confuse the“average” value of a variable (the mean) with its “typical” value, which we might de-

CAUTIONUTIONscribe by the median.

Lucy Nicholson/CORBIS

A P P L Y Y O U R K N O W L E D G E

2.2 New York travel times. Find the mean of the travel times to work for the 20New York workers in Example 2.3. Compare the mean and median for these data.What general fact does your comparison illustrate?

2.3 House prices. The mean and median selling price of existing single-familyhomes sold in October 2005 were $216,200 and $265,000.3 Which of thesenumbers is the mean and which is the median? Explain how you know.

2.4 Barry Bonds. The Major League Baseball single-season home run record is heldby Barry Bonds of the San Francisco Giants, who hit 73 in 2001. Bonds playedonly 14 games in 2005 because of injuries, so let’s look at his home run totals from1986 (his first year) to 2004:

16 25 24 19 33 25 34 46 37 33 42 40 37 34 49 73 46 45 45

Bonds’s record year is a high outlier. How do his career mean and median numberof home runs change when we drop the record 73? What general fact about themean and median does your result illustrate?

Measuring spread: the quartilesThe mean and median provide two different measures of the center of a distribu-tion. But a measure of center alone can be misleading. The Census Bureau reportsthat in 2004 the median income of American households was $44,389. Half of all

Page 6: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

42 C H A P T E R 2 • Describing Distributions with Numbers

households had incomes below $44,389, and half had higher incomes. The meanwas higher, $60,528, because the distribution of incomes is skewed to the right.But the median and mean don’t tell the whole story. The bottom 10% of house-holds had incomes less than $10,927, and households in the top 5% took in morethan $157,185.4 We are interested in the spread or variability of incomes as well astheir center. The simplest useful numerical description of a distribution requires both ameasure of center and a measure of spread.

One way to measure spread is to give the smallest and largest observations. ForCAUTIONUTION

example, the travel times of our 15 North Carolina workers range from 5 minutesto 60 minutes. These single observations show the full spread of the data, but theymay be outliers. We can improve our description of spread by also looking at thespread of the middle half of the data. The quartiles mark out the middle half. Countup the ordered list of observations, starting from the smallest. The first quartile liesone-quarter of the way up the list. The third quartile lies three-quarters of the way upthe list. In other words, the first quartile is larger than 25% of the observations, andthe third quartile is larger than 75% of the observations. The second quartile is themedian, which is larger than 50% of the observations. That is the idea of quartiles.We need a rule to make the idea exact. The rule for calculating the quartiles usesthe rule for the median.

THE QUARTILES Q1 and Q3

To calculate the quartiles:1. Arrange the observations in increasing order and locate the median M

in the ordered list of observations.2. The first quartile Q1 is the median of the observations whose position

in the ordered list is to the left of the location of the overall median.3. The third quartile Q3 is the median of the observations whose position

in the ordered list is to the right of the location of the overall median.

Here are examples that show how the rules for the quartiles work for both oddand even numbers of observations.

E X A M P L E 2 . 4 Finding the quartiles: odd n

Our North Carolina sample of 15 workers’ travel times, arranged in increasing order, is

5 10 10 10 10 12 15 20 20 25 30 30 40 40 60

There is an odd number of observations, so the median is the middle one, the bold 20 inthe list. The first quartile is the median of the 7 observations to the left of the median.This is the 4th of these 7 observations, so Q1 = 10 minutes. If you want, you can usethe rule for the location of the median with n = 7:

location of Q1 = n + 12

= 7 + 12

= 4

Page 7: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

The five-number summary and boxplots 43

The third quartile is the median of the 7 observations to the right of the median, Q3 =30 minutes. When there is an odd number of observations, leave the overall median out of thecalculation of the quartiles.

The quartiles are resistant. For example, Q3 would still be 30 if the outlier were 600rather than 60.

E X A M P L E 2 . 5 Finding the quartiles: even n

Here are the travel times to work of the 20 New Yorkers from Example 2.3, arranged inincreasing order:

5 10 10 15 15 15 15 20 20 20 | 25 30 30 40 40 45 60 60 65 85

There is an even number of observations, so the median lies midway between the middlepair, the 10th and 11th in the list. Its value is M = 22.5 minutes. We have marked thelocation of the median by |. The first quartile is the median of the first 10 observations,because these are the observations to the left of the location of the median. Check thatQ1 = 15 minutes and Q3 = 42.5 minutes. When the number of observations is even, useall the observations in calculating the quartiles.

Be careful when, as in these examples, several observations take the samenumerical value. Write down all of the observations and apply the rules just asif they all had distinct values.

The five-number summary and boxplotsThe smallest and largest observations tell us little about the distribution as a whole,but they give information about the tails of the distribution that is missing if weknow only Q1, M, and Q3. To get a quick summary of both center and spread,combine all five numbers.

THE FIVE-NUMBER SUMMARY

The five-number summary of a distribution consists of the smallestobservation, the first quartile, the median, the third quartile, and the largestobservation, written in order from smallest to largest. In symbols, thefive-number summary is

Minimum Q1 M Q3 Maximum

These five numbers offer a reasonably complete description of center andspread. The five-number summaries of travel times to work from Examples 2.4and 2.5 are

North Carolina 5 10 20 30 60New York 5 15 22.5 42.5 85

Page 8: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

44 C H A P T E R 2 • Describing Distributions with Numbers

010

2030

4050

6070

8090

Tra

vel t

ime

to w

ork

(m

inu

tes)

North Carolina New York

Maximum = 85

Minimum = 5

Third quartile = 42.5

First quartile = 15

Median = 22.5

F I G U R E 2 . 1 Boxplots comparing the travel times to work of samples of workersin North Carolina and New York.

The five-number summary of a distribution leads to a new graph, the boxplot.Figure 2.1 shows boxplots comparing travel times to work in North Carolina andin New York.

BOXPLOT

A boxplot is a graph of the five-number summary.• A central box spans the quartiles Q1 and Q3.• A line in the box marks the median M.• Lines extend from the box out to the smallest and largest observations.

Because boxplots show less detail than histograms or stemplots, they are bestused for side-by-side comparison of more than one distribution, as in Figure 2.1.Be sure to include a numerical scale in the graph. When you look at a boxplot,first locate the median, which marks the center of the distribution. Then look at

Page 9: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

Spotting suspected outliers 45

the spread. The height of the box shows the spread of the middle half of the data,and the extremes (the smallest and largest observations) show the spread of theentire data set. We see from Figure 2.1 that travel times to work are in general a bitlonger in New York than in North Carolina. The median, both quartiles, and themaximum are all larger in New York. New York travel times are also more variable,as shown by the height of the box and the spread between the extremes.

Finally, the New York data are more strongly right-skewed. In a symmetric dis-tribution, the first and third quartiles are equally distant from the median. In mostdistributions that are skewed to the right, on the other hand, the third quartilewill be farther above the median than the first quartile is below it. The extremesbehave the same way, but remember that they are just single observations and maysay little about the distribution as a whole.

A P P L Y Y O U R K N O W L E D G E

2.5 Pulling wood apart. Example 1.10 (page 19) gives the breaking strengths of 20pieces of Douglas fir.

(a) Give the five-number summary of the distribution of breaking strengths. (Thestemplot, Figure 1.10, helps because it arranges the data in order, but you shoulduse the unrounded values in numerical work.)

(b) The stemplot shows that the distribution is skewed to the left. Does thefive-number summary show the skew? Remember that only a graph gives a clearpicture of the shape of a distribution.

2.6 Comparing investments. Should you put your money into a fund that buysstocks or a fund that invests in real estate? The answer changes from time to time,and unfortunately we can’t look into the future. Looking back into the past, theboxplots in Figure 2.2 compare the daily returns (in percent) on a “total stockmarket” fund and a real estate fund over 14 months ending in May 2005.5

(a) Read the graph: about what were the highest and lowest daily returns on thestock fund?

(b) Read the graph: the median return was about the same on both investments.About what was the median return?

(c) What is the most important difference between the two distributions?

Spotting suspected outliers∗

Look again at the stemplot of travel times to work in New York in Example 2.3.The five-number summary for this distribution is

5 15 22.5 42.5 85

How shall we describe the spread of this distribution? The smallest and largestobservations are extremes that don’t describe the spread of the majority of the

∗This short section is optional.

Page 10: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

46 C H A P T E R 2 • Describing Distributions with Numbers

−2.5

−2.0

−1.5

−1.0

−0.5

0.0

0.5

1.0

1.5

2.0

2.5

Dai

ly p

erce

nt

retu

rn

Stocks

Type of investmentReal estate

F I G U R E 2 . 2 Boxplots comparing the distributions of daily returns on two kindsof investment, for Exercise 2.6.

data. The distance between the quartiles (the range of the center half of the data)is a more resistant measure of spread. This distance is called the interquartile range.

THE INTERQUARTILE RANGE IQR

The interquartile range I QR is the distance between the first and thirdquartiles:

I QR = Q3 − Q1

For our data on New York travel times, I QR = 42.5 − 15 = 27.5 minutes.However, no single numerical measure of spread, such as I QR, is very useful for

CAUTIONUTION

describing skewed distributions. The two sides of a skewed distribution have differentspreads, so one number can’t summarize them. The interquartile range is mainlyused as the basis for a rule of thumb for identifying suspected outliers.

Page 11: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

Measuring spread: the standard deviation 47

THE 1.5 × I Q R RULE FOR OUTLIERS

Call an observation a suspected outlier if it falls more than 1.5 × I QRabove the third quartile or below the first quartile.

E X A M P L E 2 . 6 Using the 1.5 × I Q R rule

For the New York travel time data, I QR = 27.5 and

1.5 × I QR = 1.5 × 27.5 = 41.25

Any values not falling between

Q1 − (1.5 × I QR) = 15.0 − 41.25 = −26.25 and

Q3 + (1.5 × I QR) = 42.5 + 41.25 = 83.75

are flagged as suspected outliers. Look again at the stemplot in Example 2.3: the onlysuspected outlier is the longest travel time, 85 minutes. The 1.5 × I QR rule suggeststhat the three next-longest travel times (60 and 65 minutes) are just part of the longright tail of this skewed distribution.

How much is that houseworth?

The town of Manhattan, Kansas, issometimes called “the little Apple”to distinguish it from that otherManhattan. A few years ago, ahouse there appeared in the countyappraiser’s records valued at$200,059,000. That would be quitea house even on Manhattan Island.As you might guess, the entry waswrong: the true value was $59,500.But before the error was discovered,the county, the city, and the schoolboard had based their budgets onthe total appraised value of realestate, which the one outlier jackedup by 6.5%. It can pay to spotoutliers before you trust your data.

The 1.5 × I QR rule is not a replacement for looking at the data. It is mostuseful when large volumes of data are scanned automatically.

A P P L Y Y O U R K N O W L E D G E

2.7 Travel time to work. In Example 2.1, we noted the influence of one long traveltime of 60 minutes in our sample of 15 North Carolina workers. Does the1.5 × I QR rule identify this travel time as a suspected outlier?

2.8 Older Americans. The stemplot in Exercise 1.19 (page 25) displays thedistribution of the percents of residents aged 65 and older in the 50 states.Stemplots help you find the five-number summary because they arrange theobservations in increasing order.

(a) Give the five-number summary of this distribution.

(b) Does the 1.5 × I QR rule identify Alaska and Florida as suspected outliers?Does it also flag any other states?

Measuring spread: the standard deviationThe five-number summary is not the most common numerical description of adistribution. That distinction belongs to the combination of the mean to measurecenter and the standard deviation to measure spread. The standard deviation and itsclose relative, the variance, measure spread by looking at how far the observationsare from their mean.

Page 12: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

48 C H A P T E R 2 • Describing Distributions with Numbers

THE STANDARD DEVIATION s

The variance s2 of a set of observations is an average of the squares of thedeviations of the observations from their mean. In symbols, the variance ofn observations x1, x2, . . . , xn is

s 2 = (x1 − x)2 + (x2 − x)2 + · · · + (xn − x)2

n − 1

or, more compactly,

s 2 = 1n − 1

∑(xi − x)2

The standard deviation s is the square root of the variance s 2:

s =√

1n − 1

∑(xi − x)2

In practice, use software or your calculator to obtain the standard deviationfrom keyed-in data. Doing an example step-by-step will help you understand howthe variance and standard deviation work, however.

Tom Tracy Photography/Alamy

E X A M P L E 2 . 7 Calculating the standard deviation

A person’s metabolic rate is the rate at which the body consumes energy. Metabolic rateis important in studies of weight gain, dieting, and exercise. Here are the metabolic ratesof 7 men who took part in a study of dieting. The units are calories per 24 hours. Theseare the same calories used to describe the energy content of foods.

1792 1666 1362 1614 1460 1867 1439

The researchers reported x and s for these men. First find the mean:

x = 1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 14397

= 11,2007

= 1600 calories

Figure 2.3 displays the data as points above the number line, with their mean marked byan asterisk (∗). The arrows mark two of the deviations from the mean. These deviationsshow how spread out the data are about their mean. They are the starting point forcalculating the variance and the standard deviation.

Page 13: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

Measuring spread: the standard deviation 49

Metabolic rate

150

0

deviation = –161 deviation = 192

x = 1792

190

0

130

0

180

0

170

0

160

0

140

0

x = 1439

****

–x = 1600

1362

1439

1460

1614

1666

1792

1867

F I G U R E 2 . 3 Metabolicrates for 7 men, with their mean(∗) and the deviations of twoobservations from the mean.

Observations Deviations Squared deviationsxi xi − x (xi − x)2

1792 1792 − 1600 = 192 1922 = 36,8641666 1666 − 1600 = 66 662 = 4,3561362 1362 − 1600 = −238 (−238)2 = 56,6441614 1614 − 1600 = 14 142 = 1961460 1460 − 1600 = −140 (−140)2 = 19,6001867 1867 − 1600 = 267 2672 = 71,2891439 1439 − 1600 = −161 (−161)2 = 25,921

sum = 0 sum = 214,870

The variance is the sum of the squared deviations divided by one less than the numberof observations:

s 2 = 214, 8706

= 35,811.67

The standard deviation is the square root of the variance:

s =√

35,811.67 = 189.24 calories

Notice that the “average” in the variance s 2 divides the sum by one fewerthan the number of observations, that is, n − 1 rather than n. The reason is thatthe deviations xi − x always sum to exactly 0, so that knowing n − 1 of themdetermines the last one. Only n − 1 of the squared deviations can vary freely, andwe average by dividing the total by n − 1. The number n − 1 is called the degrees degrees of freedomof freedom of the variance or standard deviation. Some calculators offer a choicebetween dividing by n and dividing by n − 1, so be sure to use n − 1.

More important than the details of hand calculation are the properties thatdetermine the usefulness of the standard deviation:

• s measures spread about the mean and should be used only when the mean ischosen as the measure of center.

• s is always zero or greater than zero. s = 0 only when there is no spread. Thishappens only when all observations have the same value. Otherwise, s > 0.As the observations become more spread out about their mean, s gets larger.

Page 14: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

50 C H A P T E R 2 • Describing Distributions with Numbers

• s has the same units of measurement as the original observations. For example, ifyou measure metabolic rates in calories, both the mean x and the standarddeviation s are also in calories. This is one reason to prefer s to the variances 2, which is in squared calories.

• Like the mean x , s is not resistant. A few outliers can make s very large.

The use of squared deviations renders s even more sensitive than x to a few extreme

CAUTIONUTION

observations. For example, the standard deviation of the travel times for the 15North Carolina workers in Example 2.1 is 15.23 minutes. (Use your calculatorto verify this.) If we omit the high outlier, the standard deviation drops to 11.56minutes.

If you feel that the importance of the standard deviation is not yet clear, youare right. We will see in Chapter 3 that the standard deviation is the natural mea-sure of spread for a very important class of symmetric distributions, the Normaldistributions. The usefulness of many statistical procedures is tied to distributionsof particular shapes. This is certainly true of the standard deviation.

Choosing measures of center and spreadWe now have a choice between two descriptions of the center and spread of adistribution: the five-number summary, or x and s . Because x and s are sensitiveto extreme observations, they can be misleading when a distribution is stronglyskewed or has outliers. In fact, because the two sides of a skewed distribution havedifferent spreads, no single number such as s describes the spread well. The five-number summary, with its two quartiles and two extremes, does a better job.

CHOOSING A SUMMARY

The five-number summary is usually better than the mean and standarddeviation for describing a skewed distribution or a distribution with strongoutliers. Use x and s only for reasonably symmetric distributions that arefree of outliers.

Remember that a graph gives the best overall picture of a distribution. Numerical

CAUTIONUTION

measures of center and spread report specific facts about a distribution, but they do notdescribe its entire shape. Numerical summaries do not disclose the presence of mul-tiple peaks or clusters, for example. Exercise 2.10 shows how misleading numericalsummaries can be. Always plot your data.

A P P L Y Y O U R K N O W L E D G E

2.9 Blood phosphate. The level of various substances in the blood influences ourhealth. Here are measurements of the level of phosphate in the blood of a patient,in milligrams of phosphate per deciliter of blood, made on 6 consecutive visits toa clinic:

5.6 5.2 4.6 4.9 5.7 6.4

Page 15: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

Using technology 51

A graph of only 6 observations gives little information, so we proceed to computethe mean and standard deviation.

(a) Find the mean step-by-step. That is, find the sum of the 6 observations anddivide by 6.

(b) Find the standard deviation step-by-step. That is, find the deviations of eachobservation from the mean, square the deviations, then obtain the variance andthe standard deviation. Example 2.7 shows the method.

(c) Now enter the data into your calculator and use the mean and standarddeviation buttons to obtain x and s . Do the results agree with your handcalculations?

2.10 x and s are not enough. The mean x and standard deviation s measurecenter and spread but are not a complete description of a distribution. Data setswith different shapes can have the same mean and standard deviation. Todemonstrate this fact, use your calculator to find x and s for these two smalldata sets. Then make a stemplot of each and comment on the shape of eachdistribution.

Data A 9.14 8.14 8.74 8.77 9.26 8.10 6.13 3.10 9.13 7.26 4.74

Data B 6.58 5.76 7.71 8.84 8.47 7.04 5.25 5.56 7.91 6.89 12.50

2.11 Choose a summary. The shape of a distribution is a rough guide to whether themean and standard deviation are a helpful summary of center and spread. Forwhich of these distributions would x and s be useful? In each case, give a reasonfor your decision.

(a) Percents of college graduates in the states, Figure 1.5 (page 15).

(b) Iowa Test scores, Figure 1.6 (page 17).

(c) Breaking strength of wood, Figure 1.10 (page 21).

Using technologyAlthough a calculator with “two-variable statistics” functions will do the basiccalculations we need, more elaborate tools are helpful. Graphing calculators andcomputer software will do calculations and make graphs as you command, freeingyou to concentrate on choosing the right methods and interpreting your results.Figure 2.4 displays output describing the travel times to work of 20 people in NewYork State (Example 2.3). Can you find x , s , and the five-number summary in eachoutput? The big message of this section is: Once you know what to look for, you canread output from any technological tool.

The displays in Figure 2.4 come from the TI-83 (or TI-84) graphing calcula-tor, two statistical programs, and the Microsoft Excel spreadsheet program. Thestatistical programs are CrunchIt! and Minitab. The statistical programs allow youto choose what descriptive measures you want. Excel and the TI calculators givesome things we don’t need. Just ignore the extras. Excel’s “Descriptive Statistics”menu item doesn’t give the quartiles. We used the spreadsheet’s separate quartilefunction to get Q1 and Q3.

Page 16: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 9, 2006 21:33

52 C H A P T E R 2 • Describing Distributions with Numbers

Microsoft Excel

minutesA B

12

34

56

7

89

10

1112

1314

1516

C D

Mean

QUARTILE(A2:A21,1)

QUARTILE(A2:A21,3)

31.25

Median

Standard Error

Mode

Standard Deviation 21.8773495

22.5

15

1542.5

4.891924064

Sample VarianceKurtosis

Skewness

478.6184211

0.329884126

1.040110836

80

585

62520

Range

Minimum

Maximum

Sum

Count

Sheet4 Sheet1 SheetSheet2

Texas Instruments TI-83

Minitab

CrunchIt!

Descriptive Statistics: NYtime

variable Count Mean StDev Variance Minimum Q1 Median Q3 MaximumNYtime 20 31.25 21.88 478.62 5.00 15.00 22.50 43.75 85.00

Total

F I G U R E 2 . 4 Output from a graphing calculator and three software packagesdescribing the data on travel times to work in New York State.

Page 17: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

Organizing a statistical problem 53

E X A M P L E 2 . 8 What is the third quartile?

In Example 2.5, we saw that the quartiles of the New York travel times are Q1 = 15 andQ3 = 42.5. Look at the output displays in Figure 2.4. The TI-83, CrunchIt!, and Excelagree with our work. Minitab says that Q3 = 43.75. What happened? There are severalrules for finding the quartiles. Some software packages use rules that give results different fromours for some sets of data. This is true of Minitab and also of Excel, though Excel agreeswith our work in this example. Our rule is simplest for hand computation. Results fromthe various rules are always close to each other, so to describe data you should just use theanswer your technology gives you.

Organizing a statistical problemMost of our examples and exercises have aimed to help you learn basic tools (graphs

CAUTIONUTION

and calculations) for describing and comparing distributions. You have also learnedprinciples that guide use of these tools, such as “always start with a graph”and “lookfor the overall pattern and striking deviations from the pattern.” The data youwork with are not just numbers. They describe specific settings such as water depthin the Everglades or travel time to work. Because data come from a specific setting,the final step in examining data is a conclusion for that setting. Water depth inthe Everglades has a yearly cycle that reflects Florida’s wet and dry seasons. Traveltimes to work are generally longer in New York than in North Carolina.

As you learn more statistical tools and principles, you will face more complexstatistical problems. Although no framework accommodates all the varied issuesthat arise in applying statistics to real settings, the following four-step thoughtprocess gives useful guidance. In particular, the first and last steps emphasize thatstatistical problems are tied to specific real-world settings and therefore involvemore than doing calculations and making graphs.

ORGANIZING A STATISTICAL PROBLEM: A FOUR-STEP PROCESS

STATE: What is the practical question, in the context of the real-worldsetting?FORMULATE: What specific statistical operations does this problem callfor?SOLVE: Make the graphs and carry out the calculations needed for thisproblem.CONCLUDE: Give your practical conclusion in the setting of thereal-world problem.

To help you master the basics, many exercises will continue to tell you what todo—make a histogram, find the five-number summary, and so on. Real statisticalproblems don’t come with detailed instructions. From now on, especially in thelater chapters of the book, you will meet some exercises that are more realistic.Use the four-step process as a guide to solving and reporting these problems. Theyare marked with the four-step icon, as Example 2.9 illustrates.

Page 18: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

54 C H A P T E R 2 • Describing Distributions with Numbers

E X A M P L E 2 . 9 Comparing tropical flowers

STATE: Ethan Temeles of Amherst College, with his colleague W. John Kress, stud-4STEPSTEP

ied the relationship between varieties of the tropical flower Heliconia on the island ofDominica and the different species of hummingbirds that fertilize the flowers.6 Overtime, the researchers believe, the lengths of the flowers and the form of the humming-birds’ beaks have evolved to match each other. If that is true, flower varieties fertilizedby different hummingbird species should have distinct distributions of length.

Table 2.1 gives length measurements (in millimeters) for samples of three varietiesof Heliconia, each fertilized by a different species of hummingbird. Do the three varietiesdisplay distinct distributions of length? How do the mean lengths compare?

T A B L E 2 . 1 Flower lengths (millimeters) for three Heliconia varieties

H. bihai

47.12 46.75 46.81 47.12 46.67 47.43 46.44 46.6448.07 48.34 48.15 50.26 50.12 46.34 46.94 48.36

H. caribaea red

41.90 42.01 41.93 43.09 41.47 41.69 39.78 40.5739.63 42.18 40.66 37.87 39.16 37.40 38.20 38.0738.10 37.97 38.79 38.23 38.87 37.78 38.01

H. caribaea yellow

36.78 37.02 36.52 36.11 36.03 35.45 38.13 37.1035.17 36.82 36.66 35.68 36.03 34.57 34.63

FORMULATE: Use graphs and numerical descriptions to describe and compare thesethree distributions of flower length.

Art Wolfe/Getty Images

SOLVE: We might use boxplots to compare the distributions, but stemplots preservemore detail and work well for data sets of these sizes. Figure 2.5 displays stemplots withthe stems lined up for easy comparison. The lengths have been rounded to the nearesttenth of a millimeter. The bihai and red varieties have somewhat skewed distributions,so we might choose to compare the five-number summaries. But because the researchersplan to use x and s for further analysis, we instead calculate these measures:

Variety Mean length Standard deviation

bihai 47.60 1.213red 39.71 1.799yellow 36.18 0.975

CONCLUDE: The three varieties differ so much in flower length that there is littleoverlap among them. In particular, the flowers of bihai are longer than either red or

Page 19: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 9, 2006 21:33

Chapter 2 Summary 55

bihai

3 4 6 7 8 8 9

1 1 4

1 2 3 4

1 3

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

red

4 8 9

0 0 1 1 2 2 8 9

2 6 8

6 7

5 7 9 9

0 2

1

yellow

6 6

2 5 7

0 0 1 5 7 8 8

0 1

1

F I G U R E 2 . 5 Stemplots comparing the distributions of flower lengths from Table 2.1.The stems are whole millimeters and the leaves are tenths of a millimeter.

yellow. The mean lengths are 47.6 mm for H. bihai, 39.7 mm for H. caribaea red, and36.2 mm for H. caribaea yellow.

Digital Vision/Getty Images

A P P L Y Y O U R K N O W L E D G E

2.12 Logging in the rain forest. “Conservationists have despaired over destruction of 4STEPSTEP

tropical rain forest by logging, clearing, and burning.” These words begin a reporton a statistical study of the effects of logging in Borneo.7 Researchers comparedforest plots that had never been logged (Group 1) with similar plots nearby thathad been logged 1 year earlier (Group 2) and 8 years earlier (Group 3). All plotswere 0.1 hectare in area. Here are the counts of trees for plots in each group:

Group 1: 27 22 29 21 19 33 16 20 24 27 28 19Group 2: 12 12 15 9 20 18 17 14 14 2 17 19Group 3: 18 4 22 15 18 19 22 12 12

To what extent has logging affected the count of trees? Follow the four-stepprocess in reporting your work.

C H A P T E R 2 SUMMARYA numerical summary of a distribution should report at least its center and itsspread or variability.The mean x and the median M describe the center of a distribution in differentways. The mean is the arithmetic average of the observations, and the median isthe midpoint of the values.

Page 20: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 9, 2006 21:33

56 C H A P T E R 2 • Describing Distributions with Numbers

When you use the median to indicate the center of the distribution, describe itsspread by giving the quartiles. The first quartile Q1 has one-fourth of theobservations below it, and the third quartile Q3 has three-fourths of theobservations below it.The five-number summary consisting of the median, the quartiles, and thesmallest and largest individual observations provides a quick overall descriptionof a distribution. The median describes the center, and the quartiles andextremes show the spread.Boxplots based on the five-number summary are useful for comparing severaldistributions. The box spans the quartiles and shows the spread of the centralhalf of the distribution. The median is marked within the box. Lines extend fromthe box to the extremes and show the full spread of the data.The variance s2 and especially its square root, the standard deviation s, arecommon measures of spread about the mean as center. The standard deviation sis zero when there is no spread and gets larger as the spread increases.A resistant measure of any aspect of a distribution is relatively unaffected bychanges in the numerical value of a small proportion of the total number ofobservations, no matter how large these changes are. The median and quartilesare resistant, but the mean and the standard deviation are not.The mean and standard deviation are good descriptions for symmetricdistributions without outliers. They are most useful for the Normal distributionsintroduced in the next chapter. The five-number summary is a better descriptionfor skewed distributions.Numerical summaries do not fully describe the shape of a distribution. Alwaysplot your data.A statistical problem has a real-world setting. You can organize many problemsusing the four steps state, formulate, solve, and conclude.

C H E C K Y O U R S K I L L S

2.13 Here are the IQ test scores of 10 randomly chosen fifth-grade students:

145 139 126 122 125 130 96 110 118 118

The mean of these scores is

(a) 122.9. (b) 123.4. (c) 136.6.

2.14 The median of the 10 IQ test scores in Exercise 2.13 is

(a) 125. (b) 123.5. (c) 122.9.

2.15 The five-number summary of the 10 IQ scores in Exercise 2.13 is

(a) 96, 114, 125, 134.5, 145.

(b) 96, 118, 122.9, 130, 145.

(c) 96, 118, 123.5, 130, 145.

Page 21: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

Chapter 2 Exercises 57

2.16 If a distribution is skewed to the right,

(a) the mean is less than the median.

(b) the mean and median are equal.

(c) the mean is greater than the median.

2.17 What percent of the observations in a distribution lie between the first quartileand the third quartile?

(a) 25% (b) 50% (c) 75%

2.18 To make a boxplot of a distribution, you must know

(a) all of the individual observations.

(b) the mean and the standard deviation.

(c) the five-number summary.

2.19 The standard deviation of the 10 IQ scores in Exercise 2.13 (use your calculator) is

(a) 13.23. (b) 13.95. (c) 194.6.

2.20 What are all the values that a standard deviation s can possibly take?

(a) 0 ≤ s (b) 0 ≤ s ≤ 1 (c) −1 ≤ s ≤ 1

2.21 You have data on the weights in grams of 5 baby pythons. The mean weight is31.8 and the standard deviation of the weights is 2.39. The correct units for thestandard deviation are

(a) no units—it’s just a number.

(b) grams.

(c) grams squared.

2.22 Which of the following is least affected if an extreme high outlier is added to yourdata?

(a) The median

(b) The mean

(c) The standard deviation

C H A P T E R 2 EXERCISES

2.23 Incomes of college grads. The Census Bureau reports that the mean and medianincome of people at least 25 years old who had a bachelor’s degree but no higherdegree were $42,087 and $53,581 in 2004. Which of these numbers is the meanand which is the median? Explain your reasoning.

2.24 Assets of young households. A report on the assets of American householdssays that the median net worth of households headed by someone younger thanage 35 is $11,600. The mean net worth of these same young households is$90,700.8 What explains the difference between these two measures of center?

2.25 Where are the doctors? Table 1.4 (page 32) gives the number of medical doctorsper 100,000 people in each state. Exercise 1.34 asked you to plot the data. Thedistribution is right-skewed with several high outliers.

(a) Do you expect the mean to be greater than the median, about equal to themedian, or less than the median? Why? Calculate x and M and verify yourexpectation.

Page 22: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

58 C H A P T E R 2 • Describing Distributions with Numbers

(b) The District of Columbia, at 683 doctors per 100,000 residents, is a highoutlier. If you remove D.C. because it is a city rather than a state, do you expect xor M to change more? Why? Omitting D.C., calculate both measures for the 50states and verify your expectation.

2.26 Making resistance visible. In the Mean and Median applet, place three

APPLETAPPLET

observations on the line by clicking below it: two close together near the centerof the line, and one somewhat to the right of these two.

(a) Pull the single rightmost observation out to the right. (Place the cursor onthe point, hold down a mouse button, and drag the point.) How does the meanbehave? How does the median behave? Explain briefly why each measure acts as itdoes.

(b) Now drag the single rightmost point to the left as far as you can. Whathappens to the mean? What happens to the median as you drag this point past theother two (watch carefully)?

2.27 Comparing tropical flowers. An alternative presentation of the flower lengthdata in Table 2.1 reports the five-number summary and uses boxplots to displaythe distributions. Do this. Do the boxplots fail to reveal any importantinformation visible in the stemplots in Figure 2.5?

2.28 University endowments. The National Association of College and UniversityBusiness Officers collects data on college endowments. In 2004, 741 colleges anduniversities reported the value of their endowments. When the endowmentvalues are arranged in order, what are the positions of the median and thequartiles in this ordered list?

2.29 How much fruit do adolescent girls eat? Figure 1.13 (page 29) is a histogram ofthe number of servings of fruit per day claimed by 74 seventeen-year-old girls.With a little care, you can find the median and the quartiles from the histogram.What are these numbers? How did you find them?

Photodisc Red/Getty Images

2.30 Weight of newborns. Here is the distribution of the weight at birth for all babiesborn in the United States in 2002:9

Weight Count Weight Count

Less than 500 grams 6,268 3,000 to 3,499 grams 1,521,884500 to 999 grams 22,845 3,500 to 3,999 grams 1,125,9591,000 to 1,499 grams 29,431 4,000 to 4,499 grams 314,1821,500 to 1,999 grams 61,652 4,500 to 4,999 grams 48,6062,000 to 2,499 grams 193,881 5,000 to 5,499 grams 5,3962,500 to 2,999 grams 688,630

(a) For comparison with other years and with other countries, we prefer ahistogram of the percents in each weight class rather than the counts. Explain why.

(b) How many babies were there? Make a histogram of the distribution, usingpercents on the vertical scale.

(c) What are the positions of the median and quartiles in the ordered list of allbirth weights? In which weight classes do the median and quartiles fall?

Page 23: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

Chapter 2 Exercises 59

T A B L E 2 . 2 Positions and weights (pounds) for a major college football team

QB 208 QB 195 QB 209 RB 185 RB 221 RB 221 RB 211RB 206 RB 193 RB 235 RB 220 OL 308 OL 298 OL 285OL 281 OL 286 OL 275 OL 293 OL 283 OL 337 OL 284OL 325 OL 334 OL 325 OL 310 OL 290 OL 291 OL 254WR 185 WR 183 WR 174 WR 162 WR 154 WR 188 WR 182WR 215 WR 210 TE 224 TE 247 TE 215 KP 207 KP 192KP 205 KP 179 KP 182 KP 207 KP 201 DB 193 DB 184DB 177 DB 198 DB 185 DB 188 DB 188 DB 201 DB 181DB 195 DB 187 LB 220 LB 230 LB 237 LB 237 LB 208LB 220 LB 222 LB 199 DL 240 DL 286 DL 240 DL 232DL 230 DL 270 DL 300 DL 246 DL 264 DL 267 DL 285DL 263 DL 301

2.31 More on study times. In Exercise 1.37 you examined the nightly study timeclaimed by first-year college men and women. The most common methods forformal comparison of two groups use x and s to summarize the data.

(a) What kinds of distributions are best summarized by x and s ?

(b) One student in each group claimed to study at least 300 minutes (five hours)per night. How much does removing these observations change x and s for eachgroup?

2.32 Behavior of the median. Place five observations on the line in the Mean and

APPLETAPPLET

Median applet by clicking below it.

(a) Add one additional observation without changing the median. Where is yournew point?

(b) Use the applet to convince yourself that when you add yet anotherobservation (there are now seven in all), the median does not change no matterwhere you put the seventh point. Explain why this must be true.

Jason Arnold/CORBIS

2.33 A football team. The University of Miami Hurricanes have been among themore successful teams in college football. Table 2.2 gives the weights in poundsand the positions of the players on the 2005 team.10 The positions arequarterback (QB), running back (RB), offensive line (OL), wide receiver (WR),tight end (TE), kicker/punter (KP), defensive back (DB), linebacker (LB), anddefensive line (DL).

(a) Make boxplots of the weights for running backs, wide receivers, offensivelinemen, defensive linemen, linebackers, and defensive backs.

(b) Briefly compare the weight distributions. Which position has the heaviestplayers overall? Which has the lightest?

(c) Are any individual players outliers within their position?

2.34 Guinea pig survival times. Listed on the next page are the survival times in daysof 72 guinea pigs after they were injected with infectious bacteria in a medicalexperiment.11 Survival times, whether of machines under stress or cancerpatients after treatment, usually have distributions that are skewed to the right.

Page 24: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

60 C H A P T E R 2 • Describing Distributions with Numbers

43 45 53 56 56 57 58 66 67 73 74 7980 80 81 81 81 82 83 83 84 88 89 9191 92 92 97 99 99 100 100 101 102 102 102

103 104 107 108 109 113 114 118 121 123 126 128137 138 139 144 145 147 156 162 174 178 179 184191 198 211 214 243 249 329 380 403 511 522 598

(a) Graph the distribution and describe its main features. Does it show theexpected right skew?

(b) Which numerical summary would you choose for these data? Calculate yourchosen summary. How does it reflect the skewness of the distribution?

Dorling Kindersley/Getty Images

2.35 Never on Sunday: also in Canada? Exercise 1.4 (page 10) gives the number ofbirths in the United States on each day of the week during an entire year. Theboxplots in Figure 2.6 are based on more detailed data from Toronto, Canada: thenumber of births on each of the 365 days in a year, grouped by day of the week.12

Based on these plots, give a more detailed description of how births depend onthe day of the week.

120

100

80

Nu

mb

er o

f b

irth

s

Day of week

Monday Tuesday Wednesday Thursday Friday Saturday Sunday

60

F I G U R E 2 . 6 Boxplots of thedistributions of numbers ofbirths in Toronto, Canada, oneach day of the week during ayear, for Exercise 2.35.

2.36 Does breast-feeding weaken bones? Breast-feeding mothers secrete calciuminto their milk. Some of the calcium may come from their bones, so mothers may4

STEPSTEP lose bone mineral. Researchers compared 47 breast-feeding women with 22women of similar age who were neither pregnant nor lactating. They measuredthe percent change in the mineral content of the women’s spines over threemonths. Here are the data:13

Page 25: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

Chapter 2 Exercises 61

Breast-feeding women Other women

−4.7 −2.5 −4.9 −2.7 −0.8 −5.3 2.4 0.0 0.9 −0.2 1.0 1.7−8.3 −2.1 −6.8 −4.3 2.2 −7.8 2.9 −0.6 1.1 −0.1 −0.4 0.3−3.1 −1.0 −6.5 −1.8 −5.2 −5.7 1.2 −1.6 −0.1 −1.5 0.7 −0.4−7.0 −2.2 −6.5 −1.0 −3.0 −3.6 2.2 −0.4 −2.2 −0.1−5.2 −2.0 −2.1 −5.6 −4.4 −3.3−4.0 −4.9 −4.7 −3.8 −5.9 −2.5−0.3 −6.2 −6.8 1.7 0.3 −2.3

0.4 −5.3 0.2 −2.2 −5.1

Do the data show distinctly greater bone mineral loss among the breast-feedingwomen? Follow the four-step process illustrated by Example 2.9.

2.37 Compressing soil. Farmers know that driving heavy equipment on wet soil 4STEPSTEP

compresses the soil and injures future crops. Table 2.3 gives data on the“penetrability” of the same soil at three levels of compression.14 Penetrability is ameasure of the resistance plant roots meet when they grow through the soil. Lowpenetrability means high resistance. How does increasing compression affectpenetrability? Follow the four-step process in your work.

T A B L E 2 . 3 Penetrability of soil at three compression levels

Soil Compression Level

Compressed Intermediate Loose

2.86 3.13 3.992.68 3.38 4.202.92 3.10 3.942.82 3.40 4.162.76 3.38 4.292.81 3.14 4.192.78 3.18 4.133.08 3.26 4.412.94 2.96 3.982.86 3.02 4.413.08 3.54 4.112.82 3.36 4.302.78 3.18 3.962.98 3.12 4.033.00 3.86 4.892.78 2.92 4.122.96 3.46 4.002.90 3.44 4.343.18 3.62 4.273.16 4.26 4.91

Page 26: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

62 C H A P T E R 2 • Describing Distributions with Numbers

T A B L E 2 . 4 2005 salaries for the Boston Red Sox baseball team

Player Salary Player Salary Player Salary

Manny Ramirez $19,806,820 Tim Wakefield $4,670,000 Doug Mirabelli $1,500,000Curt Schilling 14,500,000 David Wells 4,075,000 Wade Miller 1,500,000Johnny Damon 8,250,000 Jay Payton 3,500,000 John Halama 850,000Edgar Renteria 8,000,000 Kevin Millar 3,500,000 Matt Mantei 750,000Jason Varitek 8,000,000 Alan Embree 3,000,000 Ramon Vazquez 700,000Trot Nixon 7,500,000 Mike Timlin 2,750,000 Mike Myers 600,000Keith Foulke 7,500,000 Mark Bellhorn 2,750,000 Kevin Youkilis 323,125Matt Clement 6,500,000 Bill Mueller 2,500,000 Adam Stern 316,000David Ortiz 5,250,000 Bronson Arroyo 1,850,000

Brian Snyder/CORBIS

2.38 Athletes’ salaries. In 2004, the Boston Red Sox won the World Series for thefirst time in 86 years. Table 2.4 gives the salaries of the Red Sox players as ofopening day of the 2005 season. Describe the distribution of salaries both with agraph and with a numerical summary. Then write a brief description of theimportant features of the distribution.

2.39 Returns on stocks. How well have stocks done over the past generation? TheStandard & Poor’s 500 stock index describes the average performance of thestocks of 500 leading companies. Because the average is weighted by the totalmarket value of each company’s stock, the index emphasizes larger companies.Here are the real (that is, adjusted for the changing buying power of the dollar)returns on the S&P 500 for the years 1972 to 2004:

Year Return Year Return Year Return

1972 15.070 1983 18.075 1994 −1.3161973 −21.522 1984 2.253 1995 34.1671974 −34.540 1985 26.896 1996 19.0081975 28.353 1986 17.390 1997 31.1381976 18.177 1987 0.783 1998 26.5341977 −12.992 1988 11.677 1999 17.8811978 −2.264 1989 25.821 2000 −12.0821979 4.682 1990 −8.679 2001 −13.2301980 17.797 1991 26.594 2002 −23.9091981 −12.710 1992 4.584 2003 26.3111982 17.033 1993 7.127 2004 7.370

What can you say about the distribution of real returns on stocks? Follow thefour-step process in your answer.

2.40 A standard deviation contest. This is a standard deviation contest. You must

4STEPSTEP

choose four numbers from the whole numbers 0 to 10, with repeats allowed.

(a) Choose four numbers that have the smallest possible standard deviation.

Page 27: CHAPTER Mitchell Funk/Getty Images 2virtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch2-37-63.pdf · Describing Distributions ... you can key the data into your calculator and hit the

P1: OSO

GTBL011-02 GTBL011-Moore-v14.cls May 3, 2006 8:46

Chapter 2 Exercises 63

(b) Choose four numbers that have the largest possible standard deviation.

(c) Is more than one choice possible in either (a) or (b)? Explain.

2.41 Test your technology. This exercise requires a calculator with a standarddeviation button or statistical software on a computer. The observations

10,001 10,002 10,003

have mean x = 10, 002 and standard deviation s = 1. Adding a 0 in the center ofeach number, the next set becomes

100,001 100,002 100,003

The standard deviation remains s = 1 as more 0s are added. Use your calculatoror software to find the standard deviation of these numbers, adding extra 0s untilyou get an incorrect answer. How soon did you go wrong? This demonstrates thatcalculators and software cannot handle an arbitrary number of digits correctly.

2.42 You create the data. Create a set of 5 positive numbers (repeats allowed) thathave median 10 and mean 7. What thought process did you use to create yournumbers?

2.43 You create the data. Give an example of a small set of data for which the meanis larger than the third quartile.

Exercises 2.44 to 2.47 make use of the optional material on the 1.5 × I QR rule forsuspected outliers.

2.44 Tornado damage. Table 1.3 (page 32) shows the average property damage causedby tornadoes over a 50-year period in each of the states and Puerto Rico. Thedistribution is strongly skewed to the right.

(a) Give the five-number summary. Explain why you can see from these fivenumbers that the distribution is right-skewed.

(b) Your histogram from Exercise 1.33 suggests that a few states are outliers.Show that there are no suspected outliers according to the 1.5 × I QR rule. Yousee once again that a rule is not a substitute for plotting your data.

(c) Find the mean property damage. Explain why the mean and median differ sogreatly for this distribution.

2.45 Carbon dioxide emissions. Table 1.5 (page 33) gives carbon dioxide (CO2)emissions per person for countries with population at least 20 million. A stemplotor histogram shows that the distribution is strongly skewed to the right. TheUnited States and several other countries appear to be high outliers.

(a) Give the five-number summary. Explain why this summary suggests that thedistribution is right-skewed.

(b) Which countries are outliers according to the 1.5 × I QR rule? Make astemplot of the data or look at your stemplot from Exercise 1.35. Do you agreewith the rule’s suggestions about which countries are and are not outliers?

2.46 Athletes’ salaries. Which members of the Boston Red Sox (Table 2.4) havesalaries that are suspected outliers by the 1.5 × I QR rule?

2.47 Returns on stocks. The returns on stocks in Exercise 2.39 vary a lot: they rangefrom a loss of more than 34% to a gain of more than 34%. Are any of these yearssuspected outliers by the 1.5 × I QR rule?