© 2011 THE CARNEGIE FOUNDATION FOR THE ADVANCEMENT OF TEACHING A PATHWAY THROUGH STATISTICS, VERSION 1.5, STATWAY™ - STUDENT HANDOUT

STATWAY™ STUDENT HANDOUT

Lesson 2.3.1 Quantifying Variability Relative to the Median

STUDENT NAME DATE

INTRODUCTION

Recall the monthly normal temperatures for St. Louis, Missouri, and San Francisco, California, which are

presented again in the following table:

Monthly Normal Temperatures (°F) for St. Louis and San Francisco

Month          Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept.  Oct.  Nov.  Dec.
St. Louis      29.3  33.9  45.1  56.7  66.1  75.4  79.8  77.6  70.2   58.4  46.2  33.9
San Francisco  51.1  54.4  54.9  56.0  56.6  58.4  59.1  60.1  62.3   62.0  57.2  51.7

In earlier lessons, you examined representative values for the center of the distributions of the monthly

normal temperatures, including the mean and median. In this lesson, you will consider the variability (or

spread) in the data.

1 Examine the dotplots below for the monthly normal temperatures for St. Louis and San Francisco.

Write a sentence comparing how variable the monthly normal temperature values are for the two

cities.

[Figure: Dotplots of Temperatures for St. Louis and San Francisco. Axis: Temperatures, 30 to 80 °F; one row of dots for each city.]

2 In previous lessons, you explored how to represent and interpret data in graphical displays such as

dotplots and histograms. Then you learned to summarize the center of a distribution numerically

using measures such as the sample mean and sample median. Reporting a number to represent

the distribution is important, but in this example the centers of the distributions are similar and

the variability is very different.

A Consider how you might report a single number to represent the variability in the data for St.

Louis. What value would you report?

B Use the data from San Francisco to report a similar single number to represent the variability

in the monthly normal temperatures.

C Do the numbers you determined to represent the variability in monthly normal temperatures

for St. Louis and San Francisco capture the differences in the distributions? Is the value for St.

Louis substantially larger to reflect greater variability?

3 As individuals or in groups, share the numeric values you used to represent the variability in

monthly normal temperatures for St. Louis and San Francisco with the rest of the class. Explain

why you chose these values to represent variability and how they facilitate the comparison of the

variability for the two cities.

NEXT STEPS – PART 1

4 One statistic (or number) that can be used to represent the variability in quantitative data is the

range. This is the maximum data value minus the minimum data value:

range = maximum – minimum

Report the values of the minimum, maximum, and range in monthly normal temperature for St.

Louis. Report the corresponding values for San Francisco. Make a brief comparison of the ranges

for the two cities.

One problem with the range is that it is sensitive to outliers and extreme observations. In other words, it

only depends on the values at the lower and upper ends of the list of sorted values. These values do not

represent the central portion of the data. It is possible that they are unusual compared to the rest of the

distribution. Also, the range depends on only two values; the rest of the values do not affect it.
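
The point that the range depends on only the two extreme values can be checked with a short Python sketch. The function name data_range and the "flattened" list are illustrative assumptions, not part of the lesson; the flattened list is made-up data that keeps the St. Louis minimum and maximum but replaces every other value.

```python
def data_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)

# Monthly normal temperatures (°F) for St. Louis, from the table in this lesson.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]

# Hypothetical variant (made-up data, for illustration only): keep the
# smallest and largest values, but replace every other value with 50.0.
lowest, highest = min(st_louis), max(st_louis)
flattened = [lowest] + [50.0] * (len(st_louis) - 2) + [highest]

same_range = data_range(st_louis) == data_range(flattened)
print(same_range)  # True: only the two extreme values enter the calculation
```

The two lists have very different shapes, yet the range cannot tell them apart.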

Another approach to quantify variability is to find the quartiles, which are the first and third quarter points

in the data.

Here are the monthly normal temperatures for St. Louis, sorted from smallest to largest.

St. Louis 29.3 33.9 33.9 45.1 46.2 56.7 58.4 66.1 70.2 75.4 77.6 79.8

Consider breaking the data up into quarters. Since there are 12 observations in the St. Louis data, each

quarter of the data contains 12/4 = 3 observations. The dividers in the following table illustrate the four

quarters of the data.

St. Louis 29.3 33.9 33.9 | 45.1 46.2 56.7 | 58.4 66.1 70.2 | 75.4 77.6 79.8

The median value, which you computed in Lesson 2.2.1, falls in the middle of this list, at the middle divider.

The value of the median is (56.7 + 58.4)/2 = 57.55 °F.

The first-quarter point occurs between the values 33.9 and 45.1. This value is called the first quartile, and is

denoted by Q1. In this example, Q1 = (33.9 + 45.1)/2 = 39.5 °F.

Note that the first quartile (Q1) represents the median of the lower half of values in the dataset. That is, a

median of half of the dataset is a quarter point.

The third-quarter point occurs between the values of 70.2 and 75.4 °F. The third quartile is denoted by Q3

and is found to be Q3 = (70.2 + 75.4)/2 = 72.8 °F. Once again, the third quartile represents the median of the

higher half of values in the data set.
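
The quartile calculation described above can be sketched in Python as follows. This is a minimal sketch that assumes an even number of observations, as in the St. Louis data; the variable names are illustrative. Statistical software often uses other quartile conventions, so built-in quartile functions may give slightly different values than this median-of-halves approach.

```python
from statistics import median

# St. Louis monthly normal temperatures (°F), from the table in this lesson.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]

ordered = sorted(st_louis)
half = len(ordered) // 2        # 12 observations, so 6 in each half

lower_half = ordered[:half]     # the six smallest values
upper_half = ordered[half:]     # the six largest values

q1 = median(lower_half)         # average of 33.9 and 45.1
med = median(ordered)           # average of 56.7 and 58.4
q3 = median(upper_half)         # average of 70.2 and 75.4

# Rounding only tidies small floating-point noise in the printed values.
print(round(q1, 2), round(med, 2), round(q3, 2))  # 39.5 57.55 72.8
```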

5 Together with the median, minimum, and maximum, the quartiles form what is called the five-number summary. Here is the five-number summary for the monthly normal temperatures in St. Louis.

St. Louis

Minimum 29.3

Q1 (first quartile) 39.5

Median 57.55

Q3 (third quartile) 72.8

Maximum 79.8

Use the information for the monthly normal temperatures in San Francisco to construct the

five-number summary. Report the values in the following table:

San Francisco

Minimum

Q1 (first quartile)

Median

Q3 (third quartile)

Maximum

Note: When the median is an exact value from the data, that value is not included in either the upper

half or lower half of the data.
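
One way to see how the Note above works in practice is the following Python sketch. The function name five_number_summary and the seven-value example are assumptions made for illustration (the example data are made up), and the sketch assumes the list holds at least two values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    When the number of values is odd, the median is one of the data values
    and is left out of both the lower half and the upper half.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For odd n, skip the middle value; for even n, the halves simply meet.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

# Hypothetical data with an odd count (made up for illustration).
example = [1, 2, 3, 4, 5, 6, 7]
print(five_number_summary(example))  # (1, 2, 4, 6, 7)
```

Here the median 4 is a data value, so the lower half is 1, 2, 3 and the upper half is 5, 6, 7.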

6 An alternative way to quantify variability in numeric data is to find the distance between the

quartiles (Q1 and Q3). This value is called the interquartile range and is abbreviated as IQR. The

formula is IQR = Q3 – Q1. The middle 50% of the data fall between Q1 and Q3, so the IQR gives

the width of the middle 50%.

For the monthly normal temperatures in St. Louis, the IQR is Q3 – Q1 = 72.8 – 39.5 = 33.3 °F.

Report the value of the IQR for the monthly normal temperatures in San Francisco.

7 Write a comparison of the values of the IQRs for the monthly normal temperatures in St. Louis and

San Francisco. Did your conclusion match the conclusion from your visual examination of the data

in Question 1? Did you reach the same conclusion as the one you reached when you compared

the ranges of the values in Question 4? Explain.

The values in a five-number summary can be represented in a graph called a boxplot (sometimes called a

“box-and-whisker plot”).

Here is a boxplot for the monthly normal temperatures for St. Louis:

[Figure: Boxplot of St. Louis Temperatures. Vertical axis: Temperature, 30 to 80 °F.]

8 Here is the five-number summary for the temperatures in St. Louis.

min = 29.3°F Q1 = 39.5°F med = 57.55°F Q3 = 72.8°F max = 79.8°F

What does each part of the boxplot represent relative to the five-number summary? Label the

boxplot above with the name of each number.

9 The graph below contains the boxplot for St. Louis. Sketch the boxplot for San Francisco beside the

boxplot for St. Louis.

[Figure: Boxplots of St. Louis and San Francisco Temperatures. Vertical axis: Temperature, 30 to 80 °F. Panels: St. Louis, San Francisco.]

10 Write a brief comparison of the variability (spread) represented in the boxplots of monthly normal

temperatures for St. Louis and San Francisco. Compare your conclusion based on the boxplots

with the other comparisons you have made in this lesson.

NEXT STEPS – PART 2

We mentioned outliers in the previous lesson. An outlier is a data value that deviates greatly from the

overall pattern. In statistics, we need a rule to define outliers; we cannot just use our own judgment. While

there is no universally agreed-upon rule, one common rule is based on the IQR, the interquartile range. We start

at Q1 and Q3 and go outward 1.5 IQRs. Here are the formulas:

Q1 – 1.5(IQR) and Q3 + 1.5(IQR)

These are called the fences for outliers. We call any value below the first fence or above the second fence

an outlier.
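
The fence rule can be sketched in Python as shown below. The function names outlier_fences and find_outliers are illustrative assumptions. The sketch uses the St. Louis temperatures with the quartiles from the five-number summary in this lesson (Q1 = 39.5, Q3 = 72.8).

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# St. Louis monthly normal temperatures (°F), from the table in this lesson.
st_louis = [29.3, 33.9, 45.1, 56.7, 66.1, 75.4, 79.8, 77.6, 70.2, 58.4, 46.2, 33.9]

lower, upper = outlier_fences(39.5, 72.8)
# Rounding only tidies small floating-point noise in the printed values.
print(round(lower, 2), round(upper, 2))      # -10.45 122.75
print(find_outliers(st_louis, 39.5, 72.8))   # []
```

Every St. Louis value lies between the two fences, so this rule flags no outliers in that data set.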

11 The ages of the last 30 Academy Award winners for Best Actress are given in the table below.

21 25 26 26 28 29 29 29 30 32 33 33 33 33 34

35 35 35 38 39 41 43 45 45 49 49 61 61 74 80

A The ages are already sorted. Find the five-number summary for the data and the IQR.

B Find the fences for outliers. Identify any outliers in the data.

When we draw boxplots for data that contain outliers, we stop the line at the last data value that is not an

outlier and draw the outliers separately. The boxplot for the actresses is given below. Notice that boxplots

can be vertical or horizontal.

[Figure: Boxplot of Best Actress Oscar Winner Ages, 1982-2011. Horizontal axis: Age, 20 to 80.]

SUMMARY

The example in this lesson focuses attention on differences in variability because the centers of the two

distributions are similar. The context of monthly normal temperatures is accessible and

provides the opportunity to concentrate on the concept of quantifying variability.

The range is simple to compute, but it is sensitive to outliers and extreme observations. The IQR offers a

relatively simple measure of variability, and it is resistant to the effect of outliers and extreme observations.

Boxplots provide a graph with a simple structure that contains visual representations of center and

variability. Boxplots are analogous to a skeleton of a data set: They are always based on five simple summary

values of the data set, like bones, but represent data sets of different sizes and characteristics, like bodies

built on top of skeletons.

The IQR also allows us to determine whether high or low values are outliers. We use the heart of the data to

determine fences or boundaries where outliers start. We need to remember that the method we used is

somewhat arbitrary and will sometimes give us neighboring values where one is an outlier and one is not an

outlier.

TAKE IT HOME

1 Recall the home team–visiting team differences in scores for NBA games during the last week

of 2006 (using a synthetic basketball) and the first week of 2007 (using the traditional leather

basketball) displayed in the following tables:

2006 Final Score Differences

16 13 11 1 19 5 2 23 –7 8 10

15 –13 –7 6 23 –16 –25 6 –13 15 25

3 2 14 23 9 10 –1 10 26 9 –10

19 10 22 –3 –10 1 7 14 –11 6 17

8 29 23 10 –4 –7 2 10 10 14 6

2007 Final Score Differences

–6 –8 –1 4 24 –11 –8 5 12 –3 7

9 –15 2 –18 –2 11 3 9 –24 –4 14

19 –9 –9 2 5 32 28 –5 –18 13 11

12 17 5 –12 4 –7 –5 3 –14 4 8

23 –3 5

A Report the values for the minimum, maximum, and range for the data values in 2006. Report the corresponding values for 2007.

B Report the values of the first and third quartiles (Q1 and Q3) for the data values in 2006. Report the IQR for 2006. Report the corresponding values for 2007.

C Construct side-by-side boxplots for the values in 2006 and 2007. Describe the important features of the distributions of the home team minus visiting team score differences based on the graphs. Make sure to talk about both center and variability. Refer to the specific graphs to support your conclusion.

2 Suppose an error was made when the data values for monthly normal temperatures in St. Louis

were recorded in the table below. The first two digits for the July temperature were reversed.

Month        Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept.  Oct.  Nov.  Dec.
Temperature  29.3  33.9  45.1  56.7  66.1  75.4  97.8  77.6  70.2   58.4  46.2  33.9

A How does this error affect the range? Does it increase the range, decrease the range, or leave it unchanged?

B How does this error affect the IQR? Does it increase the IQR, decrease the IQR, or leave it unchanged?

Note: The IQR is resistant to outliers or extreme observations. The range is sensitive to outliers and extreme observations.

C How does the change affect the boxplot for the data? Explain, or illustrate by sketching the boxplot.

3 The following table contains the ages of the 30 women on the 15th season (2010) of “The

Bachelor.”

Bachelorette Age Bachelorette Age

Alli 24 Lindsay 25

Ashley H 26 Lisa M 24

Ashley S 26 Lisa P 27

Britnee 25 Madison 25

Britt 25 Marissa 26

Chantal 28 Meghan 30

Cristy 30 Melissa 32

Emily 24 Michelle 30

J 26 Raichel 29

Jackie 27 Rebecca 30

Jill 28 Renee 28

Keltie 28 Sarah L 25

Kimberly 27 Sarah P 27

Lacey 27 Shawntel 25

Lauren 26 Stacey 26

A Construct a dotplot or histogram for the ages of the bachelorettes. Describe important features of the graph.

B Report the values in the five-number summary for the ages of the bachelorettes.

C Construct a boxplot for the ages of the bachelorettes.

D Compare the boxplot you constructed in Question 3C to the graph you constructed in Question 3A. Which do you prefer to represent the distribution of the ages of the bachelorettes? Explain your reasoning.

This lesson is part of STATWAY™, A Pathway Through College Statistics, which is a product of a Carnegie Networked Improvement Community that seeks to advance student success. Version 1.0, A Pathway Through Statistics, Statway™ was created by the Charles A. Dana Center at the University of Texas at Austin under sponsorship of the Carnegie Foundation for the Advancement of Teaching. This version 1.5 and all subsequent versions result from the continuous improvement efforts of the Carnegie Networked Improvement Community. The network brings together community college faculty and staff, designers, researchers and developers. It is an open-resource research and development community that seeks to harvest the wisdom of its diverse participants in systematic and disciplined inquiries to improve developmental mathematics instruction. For more information on the Statway Networked Improvement Community, please visit carnegiefoundation.org. For the most recent version of instructional materials, visit Statway.org/kernel.

STATWAY™ and the Carnegie Foundation logo are trademarks of the Carnegie Foundation for the Advancement of Teaching. A Pathway Through College Statistics may be used as provided in the CC BY license, but neither the Statway trademark nor the Carnegie Foundation logo may be used without the prior written consent of the Carnegie Foundation.