Copyright ©2011 Brooks/Cole, Cengage Learning Turning Data Into Information Chapter 2 1.


Turning Data Into Information

Chapter 2


2.1 Raw Data

• Raw data are numbers and category labels that have been collected but have not yet been processed in any way.

• Example list of questions and raw data for a student:


2.1 Raw Data

• An observation is an individual entity in a study.

• A variable is a characteristic that may differ among individuals.

• Sample data are collected from a subset of a larger population.

• Population data are collected when all individuals in a population are measured.

• A statistic is a summary measure of sample data.

• A parameter is a summary measure of population data.


2.2 Types of Variables

• Raw data from categorical variables consist of group or category names that don’t necessarily have a logical ordering. Examples: eye color, country of residence.

• Categorical variables for which the categories have a logical ordering are called ordinal variables. Examples: highest educational degree earned, tee shirt size (S, M, L, XL).

• Raw data from quantitative variables consist of numerical values taken on each individual. Examples: height, number of siblings.


Asking the Right Questions

One Categorical Variable

Example: What percentage of college students favor the legalization of marijuana, and what percentage of college students oppose legalization of marijuana?

Ask: How many and what percentage of individuals fall into each category?


Asking the Right Questions

Two Categorical Variables

Example: In Case Study 1.6, we asked if the risk of having a heart attack was different for the physicians who took aspirin than for those who took a placebo.

Ask: Is there a relationship between the two variables? Does the chance of falling into a particular category for one variable depend on which category an individual is in for the other variable?


Asking the Right Questions

One Quantitative Variable

Example: What is the average body temperature for adults, and how much variability is there in body temperature measurements?

Ask: What are the interesting summary measures, like the average or the range of values?


Asking the Right Questions

One Categorical and One Quantitative Variable

Example: Do men and women drive at the same “fastest speeds” on average?

Ask: Are the measurements similar across categories or do they differ? Could be asked regarding the averages or the ranges.


Asking the Right Questions

Two Quantitative Variables

Example: Does average body temperature change as people age?

Ask: Are these variables related so that when measurements are high (or low) on one variable the measurements for the other variable also tend to be high (or low)?


Explanatory and Response Variables

Many questions are about the relationship between two variables.

It is useful to identify one variable as the explanatory variable and the other variable as the response variable.

In general, the value of the explanatory variable for an individual is thought to partially explain the value of the response variable for that individual.


2.3 Summarizing One or Two Categorical Variables

Numerical Summaries

• Count how many fall into each category.

• Calculate the percent in each category.

• If there are two variables, let the categories of the explanatory variable define the rows and compute row percentages.


Example 2.1 Seatbelt Use by Twelfth-Graders

2003 nationwide survey of American high school students: “How often do you wear a seatbelt when driving a car?”

• Total sample size n = 3042 students.

• A majority, 1686/3042 = .554, or 55.4%, said they always wear a seatbelt, while 115/3042 = .038, or 3.8%, said they never wear a seatbelt.

• Rarely or never: 8.2% + 3.8% = 12%


Example 2.1 Seatbelt Use by Twelfth-Graders

Are females more likely to say they always wear a seatbelt? Are males more likely to say they rarely or never wear one?

• Females: 915/1467 = 62.4% said they always wear a seatbelt.

• Males: 771/1575 = 49.0% said they always wear a seatbelt.

• Males: 10.5% + 5.7% = 16.2% rarely or never wear one.

• Females: 5.7% + 1.7% = 7.4% rarely or never wear one.
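The row percentages in this example can be reproduced with a few lines of Python. This is only a sketch: the counts come from the slide, while the function and variable names (`row_percentages`, `always`, `totals`) are made up for illustration.

```python
# Counts from Example 2.1: students who said "always", and group sizes.
always = {"Female": 915, "Male": 771}
totals = {"Female": 1467, "Male": 1575}

def row_percentages(counts, row_totals):
    """Return, for each row (group), the percentage that falls in the category."""
    return {group: 100 * counts[group] / row_totals[group] for group in counts}

pct_always = row_percentages(always, totals)
for group, pct in pct_always.items():
    print(f"{group}: {pct:.1f}% always wear a seatbelt")

# Overall percentage, ignoring the explanatory variable (sex).
overall = 100 * sum(always.values()) / sum(totals.values())
print(f"Overall: {overall:.1f}%")
```

Dividing by the row total rather than the grand total is what makes the two groups comparable even though they have different sizes.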


2.3 Summarizing One or Two Categorical Variables

Frequency and Relative Frequency

• A frequency distribution for a categorical variable is a listing of all categories along with their frequencies (counts).

• A relative frequency distribution is a listing of all categories along with their relative frequencies (given as proportions or percentages, for example).
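As a rough illustration of the two definitions, the sketch below builds both distributions from a list of category labels. The ten tee shirt sizes are hypothetical data invented for this example, and `frequency_table` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical raw data: tee shirt sizes reported by ten students.
sizes = ["M", "L", "S", "M", "XL", "M", "L", "S", "M", "L"]

def frequency_table(values):
    """Return {category: (frequency, relative frequency)} for category labels."""
    counts = Counter(values)
    n = len(values)
    return {cat: (k, k / n) for cat, k in counts.items()}

table = frequency_table(sizes)
for cat in ["S", "M", "L", "XL"]:   # list categories in their logical (ordinal) order
    count, prop = table[cat]
    print(f"{cat:>2}: frequency = {count}, relative frequency = {prop:.0%}")
```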


Example 2.2 Lighting the Way to Nearsightedness

Survey of n = 479 children. Those who slept with a nightlight or in a fully lit room before age 2 had a higher incidence of nearsightedness (myopia) later in childhood.

Note: The study does not prove that sleeping with a light actually caused myopia in more children.


Visual Summaries for Categorical Variables

• Pie Charts: useful for summarizing a single categorical variable if not too many categories.

• Bar Graphs: useful for summarizing one or two categorical variables and particularly useful for making comparisons when there are two categorical variables.


Example 2.3 Humans Are Not Good Randomizers

Survey of n = 190 college students. “Randomly pick a number between 1 and 10.”

Results: Most chose 7, very few chose 1 or 10.


Example 2.4 Revisiting Nightlights and Nearsightedness

Survey of n = 479 children.

Response: Degree of Myopia

Explanatory: Amount of Sleeptime Lighting


2.4 Finding Information in Quantitative Data

Long list of numbers – needs to be organized to obtain answers to questions of interest.


Five-Number Summaries

• Find extremes (high, low), the median, and the quartiles (medians of lower and upper halves of the values).

• Quick overview of the data values.

• Information about the center, spread, and shape of data.


Example 2.5 Right Handspans

• Majority of females had handspans between 19 and 21 cm, and many males had handspans between 21.5 and 23 cm.

• Two females with unusually small handspans.


Example 2.5 Right Handspans

For the handspans of females:

• about 25% are between 12.5 and 19.0 cm,

• about 25% are between 19 and 20 cm,

• about 25% are between 20 and 21 cm, and

• about 25% are between 21 and 23.25 cm.


Summary Features of Quantitative Variables

• Location: center or average, e.g. median

• Spread: variability, e.g. difference between two extremes or two quartiles

• Shape: clumped in middle or on one end (more later)

• Outliers: a data point that is not consistent with the bulk of the data


Example 2.6 Annual Compensation for Highest Paid CEOs in the United States

Paid compensation (in millions of $) for 50 highest-paid CEOs in 2008 for Fortune Magazine’s Top 500 companies

• Median: ~ $35.6 million

• Minimum: $24.3 million

• Maximum: $557 million (perhaps outlier?)

• Shape: most clumped on lower end (= skewed)


Example 2.7 Ages of Death of U.S. First Ladies

Partial Data Listing and five-number summary:

Extremes are more interesting here:

• Who died at 34? Martha Jefferson

• Who lived to be 97? Bess Truman


Pictures for Quantitative Data

• Histograms: similar to bar graphs, used for any number of data values.

• Stem-and-leaf plots and dotplots: present all individual values, useful for small to moderate sized data sets.

• Boxplot or box-and-whisker plot: useful summary for comparing two or more groups.


Interpreting Histograms, Stemplots, and Dotplots

• Values are centered around 20 cm.

• Two possible low outliers.

• Apart from outliers, spans range from about 16 to 23 cm.


Creating a Histogram

Step 1: Decide how many equally spaced (same width) intervals to use for the horizontal axis. Between 6 and 15 intervals is a good number.

Step 2: Decide whether to use frequencies (counts) or relative frequencies (proportions) on the vertical axis.

Step 3: Draw equally spaced intervals on the horizontal axis covering the entire range of the data. Determine the frequency or relative frequency of data values in each interval and draw a bar with the corresponding height. Decide which rule to use for values that fall on the border between two intervals.
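The three steps can be sketched in Python as a text histogram. The data are the song counts listed in Example 2.9 of this chapter. The choice of six intervals of width 1000, the border rule (a border value goes into the higher interval), and the name `histogram_counts` are assumptions made for this illustration only.

```python
# Song counts from Example 2.9 in this chapter.
songs = [2510, 500, 500, 1300, 687, 600, 500, 2600, 30, 900, 800, 0,
         750, 1500, 1500, 2400, 800, 2017, 1150, 5000, 4000, 1250, 1700, 3305]

def histogram_counts(values, low, width, n_bins):
    """Count values in n_bins equal-width intervals starting at `low`.

    Border rule (assumed): a value on a border goes in the higher interval;
    anything at or past the top edge is kept in the last interval.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - low) // width)
        index = min(index, n_bins - 1)
        counts[index] += 1
    return counts

width, n_bins = 1000, 6                  # Step 1: six intervals of width 1000
counts = histogram_counts(songs, 0, width, n_bins)   # Step 2: use frequencies
for i, c in enumerate(counts):           # Step 3: one bar per interval
    lo = i * width
    print(f"{lo:>4} to {lo + width:<4} | {'#' * c} ({c})")
```

To plot relative frequencies instead, each count would be divided by `len(songs)`; the shape of the picture stays the same.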


Example 2.8 Ages of Death of First Ladies

Two different histograms


Creating a Dotplot

• Draw a number line (horizontal axis) to cover the range from the smallest to the largest data value.

• For each observation, place a dot above the number line located at the observation’s data value.

• When multiple observations have the same value, stack the dots vertically.
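A minimal text-only sketch of these steps follows. It assumes whole-number data and uses the six female handspans (21, 19, 20, 20, 22, 19 cm) that appear in this chapter's notation example; `text_dotplot` is an illustrative name, not a library function.

```python
from collections import Counter

# Six female handspans (cm) from the notation example in this chapter.
handspans = [21, 19, 20, 20, 22, 19]

def text_dotplot(values):
    """Return text rows: stacked dots above an integer number line."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)   # smallest to largest value
    rows = []
    for level in range(max(counts.values()), 0, -1):   # top row first
        rows.append(" ".join(" ." if counts[x] >= level else "  " for x in axis))
    rows.append(" ".join(f"{x:>2}" for x in axis))      # the number line
    return rows

print("\n".join(text_dotplot(handspans)))
```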


Creating a Stem-and-Leaf Plot

Step 1: Determine stem values. The “stem” contains all but the last of the displayed digits of a number. Stems should define equally spaced intervals.

Step 2: For each individual, attach a “leaf” to the appropriate stem. A “leaf” is the last of the displayed digits of a number. Often leaves are ordered on each stem.

Note: More than one way to define stems. Can use split-stems or truncate/round values first.


Example 2.9 Big Music Collection

How many songs on iPod or MP3?

Data: 2510, 500, 500, 1300, 687, 600, 500, 2600, 30, 900, 800, 0, 750, 1500, 1500, 2400, 800, 2017, 1150, 5000, 4000, 1250, 1700, 3305

Final two digits truncated:

• 2510: stem label of 2 and leaf value of 5

• 500: stem label of 0 and leaf value of 5

• 30: stem label of 0 and leaf value of 0

Two stems for each 1000s possibility:

• first = leaf values 0, 1, 2, 3, 4

• second = leaf values 5, 6, 7, 8, 9

Shape is skewed right.
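The truncation and split-stem rules described for this example can be sketched as follows. The data are the 24 song counts from the slide; the function name `split_stem_plot` and the text layout are illustrative assumptions, not the textbook's own display.

```python
songs = [2510, 500, 500, 1300, 687, 600, 500, 2600, 30, 900, 800, 0,
         750, 1500, 1500, 2400, 800, 2017, 1150, 5000, 4000, 1250, 1700, 3305]

def split_stem_plot(values):
    """Stem = thousands digit, leaf = hundreds digit (last two digits truncated).

    Each stem appears twice: leaves 0-4 on the first row, 5-9 on the second.
    """
    rows = {}
    for v in sorted(values):                 # sorting keeps leaves ordered
        hundreds = v // 100                  # truncate the final two digits
        stem, leaf = divmod(hundreds, 10)
        half = 0 if leaf <= 4 else 1
        rows.setdefault((stem, half), []).append(leaf)
    top_stem = max(stem for stem, _ in rows)
    lines = []
    for stem in range(top_stem + 1):
        for half in (0, 1):
            leaves = "".join(str(d) for d in rows.get((stem, half), []))
            lines.append(f"{stem} | {leaves}")
    return lines

print("\n".join(split_stem_plot(songs)))
```

The long rows at the top and the thinning rows toward the bottom are the text version of the right skew noted on the slide.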


Describing Shape

• Symmetric, bell-shaped

• Symmetric, not bell-shaped

• Skewed Right: values trail off to right

• Skewed Left: values trail off to left


Boxplots: Picturing Location and Spread for Group Comparisons

• Box covers the middle 50% of the data

• Line within box marks the median value

• Possible outliers are marked with asterisk

• Apart from outliers, lines extending from box reach to min and max values.


Using Visual Displays

• To illustrate location and spread, any of the pictures work well.

• To illustrate shape, histograms and stem-and-leaf plots are best.

• To see individual values, use stem-and-leaf plots and dotplots.

• To sort values, use stem-and-leaf plots.

• To compare groups, use side-by-side boxplots.

• To identify outliers using the standard definition, use a boxplot.


2.5 Numerical Summaries of Quantitative Data

Notation for Raw Data:

n = number of individuals in a data set

$x_1, x_2, x_3, \ldots, x_n$ represent individual raw data values

Example: A data set consists of handspan values in centimeters for six females; the values are 21, 19, 20, 20, 22, and 19.

Then, n = 6 and $x_1 = 21$, $x_2 = 19$, $x_3 = 20$, $x_4 = 20$, $x_5 = 22$, and $x_6 = 19$


Describing the Location of a Data Set

• Mean: the numerical average

• Median: the middle value (if n odd) or the average of the middle two values (n even)

Symmetric: mean = median

Skewed Left: mean < median

Skewed Right: mean > median


Determining the Mean and Median

The Mean

$\bar{x} = \dfrac{\sum x_i}{n}$

where $\sum x_i$ means “add together all the values”

The Median

If n is odd: M = middle of ordered values. Count (n + 1)/2 down from top of ordered list.

If n is even: M = average of middle two ordered values. Average values that are (n/2) and (n/2) + 1 down from top of ordered list.
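The two rules translate directly into code. The sketch below is one possible implementation (the function names are illustrative), applied to the six handspan values from the notation example; the comments give the 1-based positions used on the slide.

```python
def mean(values):
    """Add together all the values and divide by n."""
    return sum(values) / len(values)

def median(values):
    """Middle ordered value (n odd) or average of the middle two (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]    # position (n + 1)/2 in the ordered list
    lower = ordered[n // 2 - 1]             # position n/2
    upper = ordered[n // 2]                 # position n/2 + 1
    return (lower + upper) / 2

handspans = [21, 19, 20, 20, 22, 19]
print(f"mean = {mean(handspans):.2f} cm, median = {median(handspans)} cm")
```

Python's standard `statistics` module provides `mean` and `median` functions that follow the same definitions.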


Example 2.12 Will “Normal” Rainfall Get Rid of Those Odors?

Data: Average rainfall (inches) for Davis, California for 47 years

Mean = 18.69 inches

Median = 16.72 inches

In 1997-98, a company with an odor problem blamed it on excessive rain. That year rainfall was 29.69 inches. More rain occurred in 4 other years.


The Influence of Outliers on the Mean and Median

Larger influence on mean than median.

High outliers will increase the mean. Low outliers will decrease the mean.

If ages at death are: 76, 78, 80, 82, and 84 then mean = median = 80 years.

If ages at death are: 46, 78, 80, 82, and 84 then median = 80 but mean = 74 years.
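The comparison can be checked with Python's standard `statistics` module. The two lists are the ages from the slide; only the variable names are invented here.

```python
from statistics import mean, median

ages = [76, 78, 80, 82, 84]
ages_with_low_outlier = [46, 78, 80, 82, 84]   # 76 replaced by a low outlier

for label, data in [("original", ages),
                    ("with low outlier", ages_with_low_outlier)]:
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")
```

Replacing a single value moves the mean by 6 years but leaves the median where it was, because the median depends only on the order of the values and not on how extreme the smallest one is.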


Describing Spread: Range and Interquartile Range

• Range = high value – low value

• Interquartile Range (IQR) = upper quartile – lower quartile

• Standard Deviation (covered later in Section 2.7)


Example 2.13 Fastest Speeds Ever Driven

Five-Number Summary for 87 males

• Median = 110 mph measures the center of the data

• Two extremes describe spread over 100% of data: Range = 150 – 55 = 95 mph

• Two quartiles describe spread over middle 50% of data: Interquartile Range = 120 – 95 = 25 mph


Notation and Finding the Quartiles

Split the ordered values into the half that is below the median and the half that is above the median.

Q1 = lower quartile = median of data values that are below the median

Q3 = upper quartile = median of data values that are above the median
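A short sketch of this rule, producing a full five-number summary, is given below. It follows the "median of each half" definition from the slide (for odd n the median itself belongs to neither half). Statistical software often uses slightly different quartile conventions, so results can differ a little from library functions. The name `five_number_summary` is illustrative, and the function assumes at least two values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[n - half:]      # values above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Six female handspans (cm) from the notation example in this chapter.
print(five_number_summary([21, 19, 20, 20, 22, 19]))

# With n = 87 ordered positions 1..87, the rule picks positions 22, 44, and 66
# (66 is the 22nd position counting from the end of the list).
print(five_number_summary(range(1, 88))[1:4])
```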


Example 2.14 Fastest Speeds for Men

Ordered Data (in rows of 10 values) for the 87 males:

• Median = the (87+1)/2 = 44th value in the list = 110 mph

• Q1 = median of the 43 values below the median = the (43+1)/2 = 22nd value from the start of the list = 95 mph

• Q3 = median of the 43 values above the median = the (43+1)/2 = 22nd value from the end of the list = 120 mph


How to Draw a Boxplot and Identify Outliers

Step 1: Label either a vertical axis or a horizontal axis with numbers from min to max of the data.

Step 2: Draw box with lower end at Q1 and upper end at Q3.

Step 3: Draw a line through the box at the median M.

Step 4: Calculate IQR = Q3 – Q1.

Step 5: Draw a line from Q1 end of box to smallest data value that is not further than 1.5 IQR from Q1. Draw a line from Q3 end of box to largest data value that is not further than 1.5 IQR from Q3.

Step 6: Mark data points further than 1.5 IQR from either edge of the box with an asterisk. Points represented with asterisks are considered to be outliers.
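The outlier rule in Steps 4 to 6 can be sketched numerically. The quartiles (95 and 120 mph) and the extremes (55 and 150 mph) are the values reported for the 87 males in Example 2.13; the function names are illustrative, and the full data set is not reproduced here, so only the two extremes are checked.

```python
def outlier_fences(q1, q3):
    """Return the cutoffs 1.5 IQR below Q1 and 1.5 IQR above Q3."""
    iqr = q3 - q1                           # Step 4
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """True if the value is further than 1.5 IQR from either edge of the box."""
    low_fence, high_fence = outlier_fences(q1, q3)
    return value < low_fence or value > high_fence

q1, q3 = 95, 120                            # quartiles from Example 2.13
print(outlier_fences(q1, q3))
for speed in (55, 150):                     # the two extremes from Example 2.13
    print(speed, "outlier" if is_outlier(speed, q1, q3) else "not an outlier")
```

A value exactly on a cutoff is treated as not an outlier, matching the "not further than 1.5 IQR" wording in Step 5.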


Percentiles

The kth percentile is a number that has k% of the data values at or below it and (100 – k)% of the data values at or above it.

• Lower quartile = 25th percentile

• Median = 50th percentile

• Upper quartile = 75th percentile


2.6 How to Handle Outliers

Outlier: a data point that is not consistent with the bulk of the data.

• Look for them via graphs.

• Can have big influence on conclusions.

• Can cause complications in some statistical analyses.

• Cannot discard without justification.


Possible Reasons for Outliers and Reasonable Actions

• Outlier is legitimate data value and represents natural variability for the group and variable(s) measured. Values may not be discarded — they provide important information about location and spread.

• Mistake made while taking measurement or entering it into computer. If verified, should be discarded/corrected.

• Individual in question belongs to a different group than bulk of individuals measured. Values may be discarded if summary is desired and reported for the majority group only.


Example 2.16 Tiny Boatsmen

Weights (in pounds) of 18 men on crew team:

Note: last weight in each list is unusually small.

They are the coxswains for their teams, while others are rowers.

Cambridge:188.5, 183.0, 194.5, 185.0, 214.0, 203.5, 186.0, 178.5, 109.0

Oxford: 186.0, 184.5, 204.0, 184.5, 195.5, 202.5, 174.0, 183.0, 109.5
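To see how much the two coxswains affect the summaries, the sketch below computes the mean and median for each crew with and without the last weight in the list. The weights are copied from the slide; treating the final entry as the coxswain follows the note above.

```python
from statistics import mean, median

cambridge = [188.5, 183.0, 194.5, 185.0, 214.0, 203.5, 186.0, 178.5, 109.0]
oxford = [186.0, 184.5, 204.0, 184.5, 195.5, 202.5, 174.0, 183.0, 109.5]

for name, crew in [("Cambridge", cambridge), ("Oxford", oxford)]:
    rowers = crew[:-1]          # last weight in each list is the coxswain
    print(f"{name}: all nine  mean = {mean(crew):.1f}, median = {median(crew):.1f}")
    print(f"{name}: rowers    mean = {mean(rowers):.1f}, median = {median(rowers):.1f}")
```

Because the coxswains belong to a different group than the rowers, reporting the rowers-only summary is one of the reasonable actions listed earlier for this kind of outlier.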


2.7 Bell-Shaped Distributions of Numbers

Many measurements follow a predictable pattern:

• Most individuals are clumped around the center

• The greater the distance a value is from the center, the fewer individuals have that value.

Variables that follow such a pattern are said to be “bell-shaped”. A special case is called a normal distribution or normal curve.


Example 2.17 Bell-Shaped British Women’s Heights

Data: representative sample of 199 married British couples. The figure shows a histogram of the wives’ heights with a normal curve superimposed. The mean height = 1602 millimeters.


Describing Spread with Standard Deviation

Standard deviation measures variability by summarizing how far individual data values are from the mean.

Think of the standard deviation as roughly the average distance values fall from the mean.


Describing Spread with Standard Deviation

Both sets have same mean of 100.

Set 1: all values are equal to the mean so there is no variability at all.

Set 2: one value equals the mean and other four values are 10 points away from the mean, so the average distance away from the mean is about 10.


Calculating the Standard Deviation

Formula for the (sample) standard deviation:

$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$

The value of $s^2$ is called the (sample) variance. An equivalent formula, easier to compute, is:

$s = \sqrt{\dfrac{\sum x_i^2 - n\bar{x}^2}{n - 1}}$


Calculating the Standard Deviation

Step 1: Calculate $\bar{x}$, the sample mean.

Step 2: For each observation, calculate the difference between the data value and the mean.

Step 3: Square each difference in step 2.

Step 4: Sum the squared differences in step 3, and then divide this sum by n – 1.

Step 5: Take the square root of the value in step 4.


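Example 2.18 Calculating a Standard Deviation

Consider four pulse rates: 62, 68, 74, 76

Step 1: $\bar{x} = \dfrac{62 + 68 + 74 + 76}{4} = \dfrac{280}{4} = 70$

Steps 2 and 3: The differences from the mean are –8, –2, 4, and 6; their squares are 64, 4, 16, and 36, which sum to 120.

Step 4: $s^2 = \dfrac{120}{4 - 1} = 40$

Step 5: $s = \sqrt{40} \approx 6.3$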
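The five steps, and the equivalent shortcut formula for the sample standard deviation, can be sketched in Python and checked against the four pulse rates in this example. The function names `sample_sd` and `sample_sd_shortcut` are made up for this illustration.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation, following Steps 1 to 5."""
    n = len(values)
    xbar = sum(values) / n                               # Step 1: sample mean
    squared_diffs = [(x - xbar) ** 2 for x in values]    # Steps 2 and 3
    variance = sum(squared_diffs) / (n - 1)              # Step 4: divide by n - 1
    return sqrt(variance)                                # Step 5

def sample_sd_shortcut(values):
    """Equivalent formula: sqrt((sum of x_i^2 - n * xbar^2) / (n - 1))."""
    n = len(values)
    xbar = sum(values) / n
    return sqrt((sum(x * x for x in values) - n * xbar ** 2) / (n - 1))

pulses = [62, 68, 74, 76]
print(round(sample_sd(pulses), 1), round(sample_sd_shortcut(pulses), 1))
```

The standard library's `statistics.stdev` computes the same sample quantity, and `statistics.pstdev` divides by n instead, for data covering an entire population.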


Population Mean and Standard Deviation

Data sets usually represent a sample from a larger population. If the data set includes measurements for an entire population, the notations for the mean and standard deviation are different, and the formula for the standard deviation is also slightly different. A population mean is represented by the symbol $\mu$ (“mu”), and the population standard deviation is

$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{n}}$


Interpreting the Standard Deviation for Bell-Shaped Curves: The Empirical Rule

For any bell-shaped curve, approximately

• 68% of the values fall within 1 standard deviation of the mean in either direction

• 95% of the values fall within 2 standard deviations of the mean in either direction

• 99.7% of the values fall within 3 standard deviations of the mean in either direction

Note: ~0.3% fall farther than 3 standard deviations from the mean.


Example 2.19 Women’s Heights revisited

Mean height for the 199 British women is 1602 mm and standard deviation is 62.4 mm.

• 68% of the 199 heights would fall in the range 1602 ± 62.4, or 1539.6 to 1664.4 mm

• 95% of the heights would fall in the interval 1602 ± 2(62.4), or 1477.2 to 1726.8 mm

• 99.7% of the heights would fall in the interval 1602 ± 3(62.4), or 1414.8 to 1789.2 mm
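The three intervals are simply the mean plus or minus 1, 2, and 3 standard deviations. A minimal sketch, using the mean and standard deviation quoted for the 199 women (the function name is illustrative):

```python
def empirical_rule_intervals(mean, sd):
    """Return {coverage: (low, high)} for 1, 2, and 3 standard deviations."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in coverage.items()}

intervals = empirical_rule_intervals(1602, 62.4)
for pct, (low, high) in intervals.items():
    print(f"about {pct} of heights: {low:.1f} to {high:.1f} mm")
```

These percentages are approximations that apply only when the data are roughly bell-shaped.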


Example 2.19 Women’s Heights revisited

Note: Not perfect, but follows Empirical Rule quite well


The Empirical Rule, the Standard Deviation, and the Range

• Empirical Rule => the range from the minimum to the maximum data values equals about 4 to 6 standard deviations for data with an approximate bell shape.

• You can get a rough idea of the value of the standard deviation by dividing the range by 6.

$s \approx \dfrac{\text{Range}}{6}$


Standardized z-Scores

Standardized score or z-score:

$z = \dfrac{\text{Observed value} - \text{Mean}}{\text{Standard deviation}}$

Example: Mean resting pulse rate for adult men is 70 beats per minute (bpm), standard deviation is 8 bpm. The standardized score for a resting pulse rate of 80:

$z = \dfrac{80 - 70}{8} = 1.25$

A pulse rate of 80 is 1.25 standard deviations above the mean pulse rate for adult men.
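The calculation is a one-line function. The sketch below uses the pulse-rate numbers from the example; `z_score` is an illustrative name.

```python
def z_score(observed, mean, sd):
    """Number of standard deviations the observed value lies from the mean."""
    return (observed - mean) / sd

z = z_score(80, 70, 8)
direction = "above" if z > 0 else "below"
print(f"z = {z}: {abs(z)} standard deviations {direction} the mean")
```

A negative z-score would indicate a value below the mean; for example, a pulse rate of 62 bpm gives z = (62 – 70)/8 = –1.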


The Empirical Rule Restated

For bell-shaped data,

• About 68% of values have z-scores between –1 and +1.

• About 95% of values have z-scores between –2 and +2.

• About 99.7% of values have z-scores between –3 and +3.