Descriptive Statistics Unit

Descriptive Statistics (Level IV Graduate Math) Draft (NSSAL) C. David Pilmer ©2011 (Last Updated: Dec 2011)


This resource is the intellectual property of the Adult Education Division of the Nova Scotia Department of Labour and Advanced Education. The following are permitted to use and reproduce this resource for classroom purposes.

• Nova Scotia instructors delivering the Nova Scotia Adult Learning Program
• Canadian public school teachers delivering public school curriculum
• Canadian nonprofit tuition-free adult basic education programs

The following are not permitted to use or reproduce this resource without the written authorization of the Adult Education Division of the Nova Scotia Department of Labour and Advanced Education.

• Upgrading programs at post-secondary institutions
• Core programs at post-secondary institutions
• Public or private schools outside of Canada
• Basic adult education programs outside of Canada

Individuals, not including teachers or instructors, are permitted to use this resource for their own learning. They are not permitted to make multiple copies of the resource for distribution. Nor are they permitted to use this resource under the direction of a teacher or instructor at a learning institution.

Acknowledgments

The Adult Education Division would also like to thank the following NSCC instructors for reviewing this resource and offering suggestions during its development.

Eileen Burchill (IT Campus)

Nancy Harvey (Akerley Campus)

Eric Tetford (Burridge Campus)

Tanya Tuttle-Comeau (Cumberland Campus)

Alice Veenema (Kingstec Campus)


NSSAL i Draft ©2011 C. D. Pilmer

Table of Contents

Introduction
Negotiated Completion Date
The Big Picture
Course Timelines

Populations and Samples
Tables
Types of Data
Bar Graphs and Histograms
Circle Graphs and Line Graphs
First Impressions
Second Impressions
What Type of Graph Should be Used
Mean, Median, Mode, and Trimmed Mean
Box and Whisker Plots
Using Technology to Make Box and Whisker Plots
Standard Deviation
Using Technology to Calculate Population Standard Deviation
Distributions
Normal Distributions and the 68-95-99.7 Rule
Z-Scores
Growth Charts
Putting It Together

Appendix
Area Under the Normal Curve (z-Table)
Weight-for-Age Percentiles: Boys
Length-for-Age Percentiles: Boys
Head Circumference-for-Age: Boys
Post-Unit Reflections
Answers


Introduction

Statistics is the discipline concerned with the collection, organization, and analysis of data to draw conclusions or make predictions. Statistics is widely employed in government, business, and the natural and social sciences. In this unit we will focus on descriptive statistics, the branch of statistics that deals with the description of data. In the first part of the unit, we will look at the different ways data can be presented using graphs (e.g. bar graphs, histograms, circle graphs, line graphs) and how these graphs can be interpreted. In the next part of the unit, we will learn how to determine and interpret measures of central tendency and standard deviation.

In descriptive statistics, we must differentiate between two important terms: population and sample. A population is the set representing all measurements of interest to an investigator. A sample is a subset of measurements selected randomly from the population of interest. It is probably easier to look at these terms in the following way. Suppose you wanted to know the average income of working adults in your community. If you asked every working adult in the community, then you are dealing with the population. If, however, you randomly selected and interviewed only a portion of the working adults in your community, then you are dealing with a sample. For the sake of simplicity, this unit will only focus on populations. For example, if one of the questions supplies student scores on a test, you will assume that these scores represent all the student scores, not a randomly selected portion of the scores.

The other branch of statistics, which we have not yet discussed, is inferential statistics. In inferential statistics, one makes inferences about population characteristics based on evidence drawn from samples. In other words, you take a random sample from a population and use the information collected from that small sample to make a prediction about the much larger population. For example, if you wanted to know how much time Nova Scotian adults between the ages of 20 and 40 years spent watching television on weekdays, it would be impractical to collect data from every NS adult in that age group. It would be very challenging, time-consuming, and expensive. It would make more sense to randomly select 300 adults from that age group, collect the data, analyze the data, and use that data to predict the average number of hours all NS adults in that age group view television on weekdays. Although inferential statistics is an extremely important branch of statistics, it goes beyond what is needed for a graduate level math course. Inferential statistics is, however, examined in the Academic Level IV Math course.

Negotiated Completion Date

After working for a few days on this unit, sit down with your instructor and negotiate a completion date for this unit.

Start Date: _________________

Completion Date: _________________

Instructor Signature: __________________________

Student Signature: __________________________


The Big Picture

The following flow chart shows the five required units and the four optional units (choose two of the four) in Level IV Graduate Math. These have been presented in a suggested order.

Note: You are not permitted to complete four ALP Approved Projects and thus avoid selecting from the Linear Functions and Linear Systems Unit, Trigonometry Unit, or Statistics Unit.

Math in the Real World Unit (Required)
• Fractions, decimals, percents, ratios, proportions, and signed numbers in real world applications
• Career Exploration and Math

Solving Equations Unit (Required)
• Solve and check equations of the form $Ax + B = Cx + D$, $Ax^2 + B = C$, and $Ax^3 + B = C$.

Consumer Finance Unit (Required)
• Simple Interest and Compound Interest
• TVM Solver (Loans and Investments)
• Credit and Credit Scores

Graphs and Functions Unit (Required)
• Understanding Graphs
• Linear Functions and Line of Best Fit

Measurement Unit (Required)
• Imperial and Metric Measures
• Precision and Accuracy
• Perimeter, Area and Volume

Choose two of the four.
• Linear Functions and Linear Systems Unit
• Trigonometry Unit
• Statistics Unit
• ALP Approved Projects (Complete 2 of the 5 projects.)


Course Timelines

Graduate Level IV Math is a two credit course within the Adult Learning Program. As a two credit course, learners are expected to complete 200 hours of course material. Since most ALP math classes meet for 6 hours each week, the course should be completed within 35 weeks. The curriculum developers have worked diligently to ensure that the course can be completed within this time span. Below you will find a chart containing the unit names and suggested completion times. The hours listed are classroom hours.

Unit Name                      Minimum Completion Time in Hours   Maximum Completion Time in Hours
Math in the Real World Unit    24                                 36
Solving Equations Unit         20                                 28
Consumer Finance Unit          18                                 24
Graphs and Functions Unit      28                                 34
Measurement Unit               24                                 30
Selected Unit #1               20                                 24
Selected Unit #2               20                                 24
Total                          154 hours                          200 hours

As one can see, this course covers numerous topics and for this reason may seem daunting. You can complete this course in a timely manner if you manage your time wisely, remain focused, and seek assistance from your instructor when needed.


Populations and Samples

As we learned in the introduction, descriptive statistics is concerned with the description of data. This means that we look at methods that organize data and summarize data in an effective presentation that ultimately increases our understanding of the data.

In the same introduction, we learned about populations and samples. A population is the set representing all measurements of interest to an investigator. A sample is a subset of measurements selected randomly from the population of interest. The relationship between a sample and population can be represented by the diagram on the right where the sample is a small portion of the population. With the exception of this small section of the unit, we are only going to focus on populations.

Example 1
The Testing and Evaluation Division of the Department of Education reported that the average mark on the grade 12 provincial math exam was 68%. This average was obtained by randomly selecting 500 exams from throughout the province. Are we dealing with a sample or a population? Explain.

Answer: The Testing and Evaluation Division randomly selected 500 exams, rather than every exam. For this reason they were dealing with a sample (i.e. a subset of the population).
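The difference between a population and a sample can also be sketched in a few lines of Python. This is only an illustration: the 5000 exam marks below are randomly generated stand-ins (not real provincial data), and the names `population` and `sample` are chosen for this sketch.

```python
import random
from statistics import mean

# Hypothetical population: 5000 exam marks (percentages), invented for illustration only.
random.seed(1)
population = [random.randint(30, 100) for _ in range(5000)]

# A sample is a randomly selected subset of the population.
sample = random.sample(population, 500)

population_mean = mean(population)
sample_mean = mean(sample)

print(f"Population size: {len(population)}, mean mark: {population_mean:.1f}%")
print(f"Sample size: {len(sample)}, mean mark: {sample_mean:.1f}%")
```

The two means are usually close but not identical, which is why it matters whether a reported average comes from a sample or from the whole population.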

Example 2 Statistics Canada had all households complete the long-form census. They reported that the average salary, after tax, of unattached individuals in 2009 was $31 500. Are we dealing with a sample or a population? Explain.

Answer: Since every household, which would include every unattached individual, was reporting, then we are dealing with a population (i.e. all measurements of interest).

Questions:

1. The town’s mayor is interested in knowing what portion of her 4127 taxpayers support the development of a new recreational center in the community. Because it is too costly to contact all the taxpayers, a survey of 300 randomly selected taxpayers is conducted. Describe the population and sample for this problem.

Population

Sample


2. A building contractor just purchased 6000 used bricks. He knows that a small portion of these bricks are cracked and therefore unusable. He randomly selected 200 bricks and discovered that 14 of them were unusable. Describe the population and sample for this problem.

3. A company conducted a phone survey that involved 1200 randomly selected employed workers from Nova Scotia. Each participant had to report their annual gross income. At the time (2009) it was known that there were 453 000 employed workers in Nova Scotia. After conducting the survey and analyzing the data, the company reported an average annual income of $29 900 for the 1200 participants. Describe the population and sample for this problem.

4. Between 2001 and 2009, 3730 adults obtained high school diplomas through the Nova Scotia School for Adult Learning (NSSAL). The Nova Scotia government wanted to know how many of these adults pursued further education after obtaining their diploma. After interviewing 240 randomly selected graduates, it was discovered that 65% had pursued post-secondary education, primarily at the Nova Scotia Community College. Describe the population and sample for this problem.


Tables

Investigation: The Fringe Movie Festival

A small privately owned multiplex movie theatre has decided to host a fringe movie festival. Over the weekend, they are showing "cheesy" prequel movies that are obvious parodies of the original blockbusters. The following table shows the number of tickets sold for each movie over the weekend. They have broken the tickets into three categories: senior, adult, and child tickets.

Movie                                      Senior Tickets   Adult Tickets   Child Tickets
Jaws: The Teething Years                   158              349             54
Terminator: Rise of the Toasters           33               412             47
Star Wars: Episode 0                       133              341             146
Avatar: Evolving from the Blue Man Group   51               409             136
Transformers: The Horse and Buggy Years    62               350             122

Use the table to answer the following questions.

1. Which movie had the greatest number of child viewers?

2. Which movie had the greatest number of viewers during the festival? How did you arrive at this answer?

3. Which movie had the smallest number of viewers during the festival?

4. Based solely on ticket sales, what movie appeared to be most popular with both seniors and adults? How did you arrive at this answer?

5. Based solely on ticket sales, what movie appeared to be least popular with both seniors and adults?

6. Could you quickly answer the questions above? Besides a table, what other way could the data be displayed so that you can more efficiently address the questions?


7. Here is the stacked bar graph corresponding to the fringe movie festival ticket sales data.

[Stacked bar graph: Number of Tickets Sold (vertical axis, 0 to 700) for each of the five movies, with each bar divided into Senior Tickets, Adult Tickets, and Child Tickets.]

What are your thoughts regarding presenting the data in this graphical form?

8. Was the fringe movie festival data in the table above derived from a sample or a population? Justify your answer.
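The row totals needed for questions 2 to 5 can also be worked out with a short Python sketch. The ticket counts come from the table in the investigation; the variable names (`ticket_sales`, `totals`, `senior_adult`) are made up for this illustration.

```python
# Ticket sales from the festival table: (senior, adult, child) for each movie.
ticket_sales = {
    "Jaws: The Teething Years": (158, 349, 54),
    "Terminator: Rise of the Toasters": (33, 412, 47),
    "Star Wars: Episode 0": (133, 341, 146),
    "Avatar: Evolving from the Blue Man Group": (51, 409, 136),
    "Transformers: The Horse and Buggy Years": (62, 350, 122),
}

# Total viewers per movie = senior + adult + child tickets.
totals = {movie: sum(counts) for movie, counts in ticket_sales.items()}

# Combined senior and adult tickets per movie.
senior_adult = {movie: s + a for movie, (s, a, c) in ticket_sales.items()}

most_viewers = max(totals, key=totals.get)
fewest_viewers = min(totals, key=totals.get)

for movie, total in totals.items():
    print(f"{movie}: {total} tickets in total, {senior_adult[movie]} senior + adult")
print("Most viewers:", most_viewers)
print("Fewest viewers:", fewest_viewers)
```

Adding each row by hand is exactly what the table forces a reader to do, which is one reason a graph can answer these questions faster.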


Types of Data

In the last section we learned that data is often easier to understand if it is expressed as a graph instead of a table. Before we can look at all the different ways data can be displayed in graphical form (e.g. line graphs, circle graphs, histograms), we need to take a few minutes and learn about the different types of data. These different types influence the type of graph that can be used.

When data is collected, the responses can be classified as a categorical data set or a numerical data set. These two terms are most easily explained using an example. Suppose we have an adult education class comprised of 10 learners who all have cell phones. The instructor asks two questions and obtains the following responses.

Question 1: What cell phone provider do you use?
Responses to Question 1: {Telus, Bell Aliant, Telus, Bell Aliant, Rogers, Rogers, Koodo, Rogers, Telus, Rogers}

Question 2: What was your cell phone bill for the previous month?
Responses to Question 2: {$27.80, $33.50, $45.70, $32.00, $54.90, $29.00, $43.65, $67.40, $35.89, $39.67}

The collection of responses to the first question is called a categorical data set. Categorical data is data that can be assigned to distinct non-overlapping categories. The responses to question 1 fit into four categories: Bell Aliant, Koodo, Rogers, and Telus. The collection of responses to the second question is called a numerical data set. This is the case because the data is comprised of numbers, specifically different amounts of money.

There are two types of numerical data: discrete and continuous. Numerical data is discrete if the possible values are isolated points on a number line. For example, if survey participants were asked how many phone calls they made today, their responses would be whole numbers like 0, 4, or 12. They would not respond with something like 7.8 phone calls. Since they can only report isolated points, we end up with discrete numerical data. Numerical data is continuous if the set of possible values forms an entire interval on the number line. For example, if soil samples were tested for acidity, the pH could be reported with numbers like 4, 4.17, 4.173, or any other number in the interval. Generally, continuous data arises when observations involve making measurements (e.g. weighing objects, recording temperatures, recording time to complete tasks) while discrete data arises when observations involve counting.
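One way to see the difference between the two data sets from the cell phone example is to look at what can sensibly be done with each. The Python sketch below (variable names are chosen for this illustration) counts the categorical responses by category and averages the numerical responses.

```python
from collections import Counter
from statistics import mean

# Responses to Question 1 (categorical) and Question 2 (numerical, in dollars).
providers = ["Telus", "Bell Aliant", "Telus", "Bell Aliant", "Rogers",
             "Rogers", "Koodo", "Rogers", "Telus", "Rogers"]
bills = [27.80, 33.50, 45.70, 32.00, 54.90, 29.00, 43.65, 67.40, 35.89, 39.67]

# Categorical data: the natural summary is a count for each category.
provider_counts = Counter(providers)

# Numerical data: arithmetic such as an average is meaningful.
average_bill = mean(bills)

print(provider_counts)
print(f"Average bill: ${average_bill:.2f}")
```

An average of the provider names would be meaningless, while counting how many bills were exactly $27.80 would say very little; the type of data guides the type of summary, and later the type of graph.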


Question:

1. For each of the following, state whether the data collection would result in a categorical data set or numerical data set. If the data is numerical, indicate whether we are dealing with discrete or continuous data.

(a) Concentration in parts per million (ppm) of a particular contaminant in water supplies

(b) Brand of personal computer purchased by customers

(c) The sex of children born at the IWK Hospital in December

(d) The height of male adult education learners at a specific campus

(e) The number of children in each household.

(f) The gross income of adult workers between the ages of 25 and 35 in Nova Scotia

(g) The races of people immigrating to Canada

(h) The time it takes for females between the ages of 20 and 30 to complete the 100 m dash

(i) The sum of the numbers rolled on two dice

(j) The amount of gas purchased by individual UltraCan customers on a specific day

(k) The size of shoe purchased by teenage males

(l) The destination city or town for summer vacations

(m) The head circumference of a newborn child

(n) The country of manufacture for vehicles in the staff parking lot at the NSCC Waterfront Campus


Bar Graphs and Histograms

Bar graphs and histograms look very similar so learners often get them confused. Bar graphs are used to display categorical data or discrete numerical data. The bars in bar graphs are separated from one another. Examples of bar graphs are shown below.

Bar Graph #1 In this survey, 60 randomly selected Australian students were asked to report in which month they were born.

Bar Graph #2 In this survey, 200 randomly selected international students were asked which hand they write with.

Histograms are used to display continuous numerical data where the data is organized into classes. The bars on a histogram are not separated from one another.

Histogram #1 In this survey, 100 randomly selected students from all over the world were asked to report how long it took to travel from home to school. In this case the class width is 5. The first class goes from 0 to 5, not including five. The second class goes from 5 to 10, not including 10.

Histogram #2 Forty randomly selected secondary students from Canada were asked to report their heights in centimeters. As with Histogram #1, the class width in this case is 5; however, the intervals do not start and end on multiples of 5. For example, the first class showing a value is centered at 120. That means that this class goes from 117.5 to 122.5, not including 122.5.


Bar graphs also come in different forms; two of the most common are stacked bar graphs and double bar graphs. We have already been exposed to stacked bar graphs when we completed the questions regarding the fringe movie festival in the section titled "Tables." On a stacked bar graph the bars are divided into categories so that we can compare the parts to the whole. In the case of the fringe movie festival graph, the bars were divided into three categories: senior tickets, adult tickets, and child tickets. By doing this we can quickly see how those three types of ticket sales contributed to the overall sales for each movie.

Double bar graphs allow one to present more than one kind of information, situation, or event in one graph, instead of drawing two separate bar graphs. One of the most common uses is to simultaneously display data for both males and females. The example on the right shows how the coffee purchasing decisions for males and females differ at a particular coffee shop on a particular morning.

It should be mentioned that in all the bar graph examples we have provided to this point, the bars have been oriented vertically. Bar graphs can also be drawn such that the bars are in a horizontal orientation. That is what we have done with the stacked bar graph on the right, which was obtained using the data from the fringe movie festival.

[Stacked bar graph (vertical bars): Number of Tickets Sold (0 to 700) for each of the five fringe movie festival movies, divided into Senior Tickets, Adult Tickets, and Child Tickets.]

[Double bar graph: quantity sold (0 to 45) of small coffee, medium coffee, and large coffee, with separate bars for male and female.]

[Stacked bar graph (horizontal bars): tickets sold (0 to 700) for each of the five fringe movie festival movies, divided into Senior Tickets, Adult Tickets, and Child Tickets.]
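To make the idea of a horizontal stacked bar graph concrete, the following Python sketch draws a rough text version using the fringe movie festival ticket counts from the "Tables" section. The scale of one character per 20 tickets and the function name `stacked_bar` are assumptions made only for this illustration; it is not how the original graphs were produced.

```python
# Ticket sales from the fringe movie festival table: (senior, adult, child).
ticket_sales = {
    "Jaws: The Teething Years": (158, 349, 54),
    "Terminator: Rise of the Toasters": (33, 412, 47),
    "Star Wars: Episode 0": (133, 341, 146),
    "Avatar: Evolving from the Blue Man Group": (51, 409, 136),
    "Transformers: The Horse and Buggy Years": (62, 350, 122),
}

TICKETS_PER_CHAR = 20  # assumed scale: one character represents about 20 tickets


def stacked_bar(senior, adult, child, scale=TICKETS_PER_CHAR):
    """Return one text bar with a segment for each ticket type (S, A, C)."""
    return ("S" * round(senior / scale)
            + "A" * round(adult / scale)
            + "C" * round(child / scale))


for movie, (senior, adult, child) in ticket_sales.items():
    total = senior + adult + child
    print(f"{movie:<42}{stacked_bar(senior, adult, child)} {total}")
```

Each bar's full length shows the total for the movie, while the S, A, and C segments show how the parts contribute to that whole.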


Example 1
Anne tracked the additional time, in minutes, she spent outside of regular class time to work on her five courses, over two days (Wednesday and Thursday). That information is displayed in the graph below.

[Double bar graph: Minutes of Additional Work (vertical axis, 0 to 40) for Biology, Communications, Math, History, and Sociology, with separate bars for Wednesday and Thursday.]

(a) How much time did she spend on Thursday doing additional work in History?
(b) In what subject and on what day did she spend 25 minutes doing additional work?
(c) In what subject did she spend the same amount of time on Wednesday and Thursday doing additional work?
(d) How much more time did she spend on Wednesday doing additional work in Math compared to Thursday?
(e) How much more time did she spend on Thursday doing additional work in Biology compared to History?
(f) How much time over the two days did she spend doing additional work in Biology and Communications?

Answers:
(a) 10 minutes
(b) Math on Thursday
(c) Sociology (she spent 15 minutes each day)
(d) Math Wednesday: 30 minutes; Math Thursday: 25 minutes; 30 - 25 = 5 minutes
(e) Biology Thursday: 20 minutes; History Thursday: 10 minutes; 20 - 10 = 10 minutes
(f) 15 + 20 + 20 + 35 = 90 minutes or 1.5 hours


Example 2
Thirty-six randomly selected males between the ages of 20 and 29 years were weighed. The weights in pounds are shown below.

210 174 224 186 188 182 166 188 207 178 160 188 143 203 171 182 215 194 177 191 189 162 193 181 194 181 178 186 192 174 192 167 155 202 181 196

(a) Construct a histogram with class widths of 10 starting at 140.
(b) What percentage of the randomly selected males weighed less than 180 pounds?

Answers:
(a) Construct a table to organize the data in terms of the classes. The first class, from 140 to 150, includes 140 but does not include 150.

Class        Tally           Frequency
140 to 150   |               1
150 to 160   |               1
160 to 170   ||||            4
170 to 180   ||||| |         6
180 to 190   ||||| ||||| |   11
190 to 200   ||||| ||        7
200 to 210   |||             3
210 to 220   ||              2
220 to 230   |               1

Now construct the histogram.

(b) Out of the 36 participants, 12 weighed less than 180 pounds.

$\frac{12}{36} \times 100 = 33\frac{1}{3}\%$
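The frequency table and the percentage in Example 2 can be double-checked with a short Python sketch. The helper name `frequency_table` is made up for this illustration; each class includes its lower boundary and excludes its upper boundary, as described in the answer to part (a).

```python
# Weights in pounds of the 36 males from Example 2.
weights = [210, 174, 224, 186, 188, 182, 166, 188, 207, 178, 160, 188,
           143, 203, 171, 182, 215, 194, 177, 191, 189, 162, 193, 181,
           194, 181, 178, 186, 192, 174, 192, 167, 155, 202, 181, 196]


def frequency_table(data, start, width, num_classes):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = [0] * num_classes
    for value in data:
        index = (value - start) // width  # which class the value belongs to
        if 0 <= index < num_classes:
            counts[index] += 1
    return counts


frequencies = frequency_table(weights, start=140, width=10, num_classes=9)

for i, count in enumerate(frequencies):
    lower = 140 + 10 * i
    print(f"{lower} to {lower + 10}: {'|' * count} ({count})")

# Part (b): the classes below 180 pounds are the first four classes.
under_180 = sum(frequencies[:4])
percent_under_180 = under_180 / len(weights) * 100
print(f"{under_180} of {len(weights)} weighed less than 180 pounds: {percent_under_180:.1f}%")
```

The printed rows of tally marks already resemble a histogram turned on its side, since the length of each row matches the frequency of its class.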


Questions

1. A study was conducted to see which major league sport is most popular. In the study, they looked at how many fans (in millions) each sport has. The information is displayed using a bar graph.

[Graph: Number of Fans (in millions) on the vertical axis, 0 to 200, for the NFL, NBA, MLB, NHL, and NASCAR.]

Acronyms:
NFL: National Football League
NBA: National Basketball Association
MLB: Major League Baseball
NHL: National Hockey League
NASCAR: National Association for Stock Car Auto Racing

(a) Which sport is most popular amongst the fans?
(b) Approximate the number of fans the National Hockey League has.
(c) Which major league sport has 120 million fans?
(d) Approximately how many more fans does the NFL have compared to the NBA?
(e) Is this a bar graph or histogram?

2. The medal counts for the 2006 and 2010 winter Olympics for four countries have been provided in the following graph.

[Graph: number of medals (horizontal axis, 0 to 40) for Canada, United States, Germany, and Norway, with separate bars for 2010 and 2006.]

(a) What type of graph are we dealing with?


(b) Of the four countries, which had the highest medal count in 2006?
(c) What was the medal count for the United States in 2010?
(d) Which country had a medal count of 19 in 2006?
(e) How many more medals did Canada obtain in 2010 compared to 2006?
(f) In 2010, how many more medals did the United States get compared to Germany?
(g) What was the total medal count for all four countries in 2010?
(h) What was the total medal count for both Germany and the United States over the 2006 and 2010 winter Olympics?

3. The Canadian Nurses Association reported the age distribution of all registered nurses (RNs) in Canada for the year 2009. This data was used to construct the following graph.

[Graph: Number of RNs (vertical axis, 0 to 40 000) by Age class: <24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65+.]

Source: Canadian Institute for Health Information

(a) What type of graph are we dealing with?

(b) What type of data was used to construct this graph?

(c) Approximately how many registered nurses in 2009 were between the ages of 30 and 39?


(d) In 2009, approximately how many more 55 to 59 year old RNs were there compared to 60 to 64 year old RNs?
(e) What three classes of ages had the greatest number of RNs in 2009?
(f) Considering that Canada has an aging population, what potential problem is likely to occur in the near future based on the information supplied in this graph?

4. The Nephrology and Hypertension Department of the Children's Hospital in London, Ontario reported the number of cases they addressed over the different fiscal years (i.e. from April 1 of one year to March 31 of the next year). They broke the cases into three categories: new consults, consult visits, and inpatient days. New consults refer to cases that have been referred by an outside source (typically a family doctor) to the department. With each case, the information in the patient's medical file is reviewed to see if the patient's needs can be served by the department. Consult visits refer to day clinic visits by patients. Inpatient days refer to hospital stays by patients whose immediate needs cannot be met by day clinic visits.

[Graph: Number of Cases (vertical axis, 0 to 1400) by Fiscal Year (2004/2005 to 2009/2010), with a legend for Inpatient Days, Consult Visits, and New Consults.]

Source: University of Western Ontario, Department of Paediatrics

(a) What type of graph are we dealing with?

(b) Were there significant changes in the number of new consults to the Nephrology and Hypertension Department over the six fiscal years?


(c) Approximately how many cases were dealt with in the 2008/2009 fiscal year?

(d) Approximately how many consult visits were dealt with in 2004/2005?

(e) Approximately how many cases involving inpatient visits were addressed in 2005/2006?

(f) Approximately how many more cases involving consult visits occurred in 2006/2007 compared to 2005/2006?

(g) What was the big shift from 2008/2009 to 2009/2010?

5. Thirty randomly selected families of four were asked how much they spent on their last family meal at a restaurant. The following data was obtained.

70 86 94 74 65 68 67 72 90 66 68 78 82 66 97 80 71 69 72 64 62 67 75 103 64 83 77 64 78 86

(a) Construct a histogram with class widths of 5 starting at 60. Remember that the class 60 to 65 does not include the number 65; the 65 is in the next class.

Class        Tally   Frequency
60 to 65
65 to 70
70 to 75
75 to 80
80 to 85
85 to 90
90 to 95
95 to 100
100 to 105

(b) What percentage of the families spent $90 or more on their meal?
(c) What type of data are we dealing with?
(d) Are we dealing with a sample or population?


Circle Graphs and Line Graphs

Circle graphs, also called pie charts, are divided into sectors where each sector represents part of a whole. Each sector is proportional in size to the amount each sector represents. For example, if 70 out of 140 people responded that their favorite ice cream was chocolate, then the "chocolate" sector of the circle graph would be 50% or half of the circle graph.

Example 1
In 1999, registered nurses were asked to report where they were employed. The results are presented in the circle graph on the right. At the time there were 229 000 registered nurses in Canada. Source: Registered Nurses Database

(a) What percentage of registered nurses worked in nursing homes in 1999?

(b) Approximately how many registered nurses worked in hospitals in 1999?

(c) Approximately 9160 RNs were employed in what sector?

(d) Approximately how many RNs were employed in either home care or nursing homes?

(e) Approximately how many more RNs were employed in hospitals than in community health agencies?

(f) What is the ratio of RNs employed in community health agencies to RNs employed in nursing homes?

Answers:
(a) 12%

(b) 59% of 229 000: 0.59 × 229 000 = 135 110 RNs

(c) $\frac{9160}{229\,000} \times 100 = 4\%$. These RNs are working in home care.

(d) 4% + 12% = 16%; 16% of 229 000: 0.16 × 229 000 = 36 640 RNs

(e) 59% - 8% = 51%; 51% of 229 000: 0.51 × 229 000 = 116 790 RNs

(f) $\frac{\text{community health agency}}{\text{nursing home}} = \frac{8}{12} = \frac{8 \div 4}{12 \div 4} = \frac{2}{3}$ ← desired ratio

Line graphs are created by plotting data points and connecting them with lines. These lines are useful for showing trends; that is, how something changes in value as something else happens.

[Circle graph for Example 1: Home Care 4%, Nursing Home 12%, Hospital 59%, Not Stated 1%, Other 16%, Community Health Agency 8%.]
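The calculations in Example 1 all follow the same pattern: convert between a sector's percentage and a count out of the 229 000 registered nurses. The Python sketch below reproduces that arithmetic; the dictionary `percent` and the helper `count_for` are names invented for this illustration.

```python
from fractions import Fraction

TOTAL_RNS = 229_000

# Sector percentages read from the circle graph in Example 1.
percent = {
    "Home Care": 4,
    "Nursing Home": 12,
    "Hospital": 59,
    "Not Stated": 1,
    "Other": 16,
    "Community Health Agency": 8,
}


def count_for(sector_percent, total=TOTAL_RNS):
    """Number of RNs represented by a given percentage of the circle."""
    return sector_percent * total / 100


hospital_rns = count_for(percent["Hospital"])                                       # part (b)
percent_for_9160 = 9160 * 100 / TOTAL_RNS                                           # part (c)
home_or_nursing = count_for(percent["Home Care"] + percent["Nursing Home"])         # part (d)
hospital_minus_community = count_for(percent["Hospital"]
                                     - percent["Community Health Agency"])          # part (e)
ratio = Fraction(percent["Community Health Agency"], percent["Nursing Home"])       # part (f)

print(hospital_rns, percent_for_9160, home_or_nursing, hospital_minus_community, ratio)
```

`Fraction` reduces 8/12 to lowest terms automatically, which mirrors dividing the numerator and denominator by 4 in the worked answer.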


Example 2
This line graph shows how the fertility rate in Canada has changed since 1950. The fertility rate is the average number of children born to women between the ages of 15 and 49. Source: Statistics Canada

(a) What was the approximate fertility rate in 1970?
(b) In what year was the fertility rate approximately 3.2?
(c) How much did the fertility rate drop by between 1960 and 1970?
(d) After 1960, when did the fertility rate increase?

Answers:
(a) 2.3
(b) 1965
(c) 3.9 - 2.3 = 1.6. The fertility rate dropped by approximately 1.6.
(d) It only increased slightly between 1985 and 1990.

Questions

1. The following circle graph was constructed using data collected from all patients over a one month period at a specific emergency room. That month 1200 patients visited the site.

(a) What is the leading cause of emergency room visits to this location during this month?
(b) How many more times likely was the staff at this emergency room going to see a patient injured in an auto accident compared to a patient having respiratory problems?
(c) How many patients suffering from work related injuries sought treatment at the emergency room?

[Circle graph for Question 1: auto accidents 27%, work injuries 24%, home injuries 19%, heart attacks 14%, respiratory problems 9%, miscellaneous 7%]

[Line graph for Example 2: Fertility Rate (vertical axis, 0 to 4.5) against Year (horizontal axis, 1950 to 2000)]


(d) How many more patients sought treatment for heart attacks compared to patients suffering from respiratory problems?
(e) Which one of the following represents the ratio of patients with work-related injuries to patients suffering from heart attacks? (Multiple Choice)

(i) \( \frac{12}{7} \) (ii) \( \frac{7}{12} \) (iii) \( \frac{27}{14} \) (iv) \( \frac{14}{27} \)

(f) What was the cause of emergency room visits for 228 patients?

2. The following graph shows the value of Canada's exports from January 2008 until November 2010. The values are expressed in millions of Canadian dollars; for example, the number 20,000 on the vertical scale represents $20,000 million or $20 billion.

[Line graph for Question 2: Exports in Millions of Dollars (vertical axis, 0 to 50,000) by month, from Jan-08 to Nov-10]

Source: Statistics Canada

(a) Name at least three periods when Canada's exports largely remained unchanged.


(b) During what month and year did Canada's exports almost reach $45 billion?
(c) When were Canada's exports lowest between Jan-08 and Nov-10?
(d) Approximately how much did exports drop by between October 2008 and January 2009? Based on your knowledge of world events, why do you think this occurred?

3. There were 725 housing starts in the first quarter of 2011 in Nova Scotia. These starts were broken into four categories: single detached (i.e. single dwelling homes), semi-detached (i.e. a single-family home that is joined on one side to another home), row housing (i.e. townhouses), and apartments.

[Circle graph for Question 3: Apartments, 337; Single Detached, 293; Semi-detached, 60; Row Housing, 35]

Source: Canada Mortgage and Housing Corporation

(a) What percentage of the housing starts was for single detached homes?
(b) What is the ratio of row housing starts to semi-detached starts?
(c) How many more apartment starts were there compared to the combined row housing and semi-detached starts?


(d) The Canada Mortgage and Housing Corporation predicts that the second quarter housing starts in Nova Scotia will increase from 725 to 850. If they assume that the proportion of single detached starts remains the same from the first quarter to the second, how many single detached starts do they anticipate in this second quarter?

4. The value of a stock changes over time. The following line graph shows how the value of Research in Motion (RIM) stock changed over the month of June 2011. Notice that the graph covers 22 days rather than 30. There were only 22 trading days in June 2011; stocks are not traded on weekends.

[Line graph for Question 4: Value of RIM Stock ($) (vertical axis, 20 to 42) against Trading Day (horizontal axis, 0 to 22)]

Source: Nasdaq.com

(a) On what trading day was the greatest single day loss in the value of RIM shares during the month of June? Approximate the amount that was lost per share on that day.
(b) Approximately how much did the stock drop from the beginning of the month until the end of the month?
(c) On what trading day was the greatest single day gain in the value of RIM shares during the month of June? Approximate the amount that each share increased by on that day.


First Impressions

Part 1
Grocery store customers were asked to identify their favorite brand of ice cream. Once the data was collected, a circle graph was constructed. It is shown on the right.

What is your first impression regarding customers' preferences for particular brands of ice cream?

Part 2
The 2001 population counts for five urban centres in Canada were used to construct this graph. Source: Statistics Canada

What is your first impression regarding the population counts for these centres?

[Circle graph for Part 1, sector labels only: Jen and Berry Ice Cream, Faxter Ice Cream, Charmer Dairies Ice Cream]

[Bar graph for Part 2: Population (vertical axis, 50000 to 140000) for Lethbridge, AB; Moncton, NB; Nanaimo, BC; Sarnia, ON; Trois-Rivières, QC]


Part 3
The owners of an amusement park kept track of the number of male and female patrons that used four particular rides in the park on a weekday morning. They used the data to construct the following graph.

What is your first impression regarding the patron usage of these rides?

Part 4
The following line graph shows how the average price of a domestic flight from Halifax changed from the first quarter of 2007 to the third quarter of 2010. Source: Statistics Canada

What is your first impression regarding the change in the price of a domestic flight?

[Stacked bar graph for Part 3: Percentage (vertical axis, 0% to 100%) of Females and Males for Hurl-a-Twirl, Death Drop, Bumper Boats, and Zip Line]

[Line graph for Part 4: Average Domestic Fare (vertical axis, 160 to 210) by Quarters, from I-2007 to III-2010]


Second Impressions

We are going to re-examine some of the real world applications that we were exposed to in the section titled "First Impressions."

In part 1 of First Impressions, we looked at a circle graph regarding customers' preferences for particular brands of ice cream. We have redrawn the circle graph using the same data. Based on this new perspective of the circle graph, have your first impressions changed? Why or why not?

In part 2 of First Impressions, we looked at a bar graph regarding 2001 population counts for five Canadian urban centres. We have redrawn the graph using the same data. Based on this new graph, has your first impression changed? Why or why not?

[Redrawn circle graph for Part 1: Jen and Berry Ice Cream 36%, Charmer Dairies Ice Cream 36%, Faxter Ice Cream 28%]

[Redrawn bar graph for Part 2: Population (vertical axis, 0 to 140000) for Lethbridge, AB; Moncton, NB; Nanaimo, BC; Sarnia, ON; Trois-Rivières, QC]


In part 3 of First Impressions, we looked at a stacked bar graph regarding the patron usage of four specific rides in an amusement park. We have redrawn the graph using the same data. Based on this new graph, has your first impression changed? Why or why not?

In part 4 of First Impressions, we looked at a line graph regarding the average price of domestic flights from Halifax. We have redrawn the graph using the same data. Based on this new graph, has your first impression changed? Why or why not?

Why did we bother exposing you to the two versions of each of these graphs?

[Redrawn bar graph for Part 3: Number of People (vertical axis, 0 to 250), Females and Males, for Hurl-a-Twirl, Death Drop, Bumper Boats, and Zip Line]

[Redrawn line graph for Part 4: Average Domestic Fare (vertical axis, 0 to 250) by Quarters, from I-2007 to III-2010]


What Type of Graph Should Be Used?

Below you have been provided with data tables. Indicate what type of graph (histogram, line, circle, bar, double bar, or stacked bar graph) you would use for this data. In a few cases, there can be more than one acceptable answer.

1. Graph Type: _______________________

Favorite Music Genre    Male    Female
Pop                     90      150
Rock                    120     70
Hip Hop                 70      60
Country                 100     120
Blues                   50      40
Other                   70      60

2. Graph Type: _______________________

Favorite Movie Genre    Percentage
Action                  32
Comedy                  18
Drama                   15
Horror                  8
Science Fiction         21
Other                   6

3. Graph Type: _______________________

time (s)    distance (m)
0           0
2           1.6
4           3.2
6           4.8
8           6.4

4. Graph Type: _______________________

Time Commuting to Work (min)    Frequency
0 - 10                          27
10 - 20                         39
20 - 30                         58
30 - 40                         43
> 40                            12

5. Graph Type: _______________________

Triathlon
Athlete    Swim Time (min)    Bike Time (min)    Run Time (min)
Anne       10                 55                 35
Jane       12                 54                 37
Denise     13                 58                 40
Meera      11                 53                 39
Yoshi      10                 53                 41

6. Graph Type: _______________________

Blood Type    Percentage
A+            35.7
A-            6.3
O+            37.4
O-            6.6
B+            8.5
B-            1.5
AB+           3.4
AB-           0.6


7. Graph Type: _______________________

Town               Population in 2006
Amherst            9505
Digby              2092
Kentville          5812
Pictou             3813
Port Hawkesbury    3517

8. Graph Type: _______________________

Television Audience Share (%)
Program Type    1996 - 1997    2001 - 2002
Comedy          12             8
Drama           13             9
Reality         10             8

9. Graph Type: _______________________

Salaries in Thousands of Dollars    Number of Employees
15 - 25                             16
25 - 35                             43
35 - 45                             57
45 - 55                             48
55 - 65                             23
65 - 75                             11
more than 75                        6

10. Graph Type: _______________________

Year    Cell Phone Revenues (Billions of Canadian Dollars)
1997    3.3
1998    4.4
1999    4.6
2000    5.4
2001    6.0
2002    7.2
2003    8.1


Mean, Median, Mode, and Trimmed Mean

Charlie looks at the marks his Level IV Graduate Math learners earned in a particular unit over the last year.

{81, 74, 91, 82, 79, 95, 78, 92, 86, 74, 78, 69, 84, 77, 88, 78, 71}

He wants to report how well his students performed on this particular unit without having to supply all seventeen pieces of data. He could use a histogram to display the results but he decides instead to calculate two measures of central tendency: the mean (arithmetic average) and median (middle).

Mean

The most common measure of central tendency is the arithmetic average, or mean. When calculating a mean, statisticians differentiate between population means and sample means by using different symbols. The procedure for calculating either of these means is identical. The population mean and sample mean are each calculated by adding all the data points and then dividing by the number of data points.

\[ \mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} \] where µ (mu) is the population mean

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} \] where x̄ (x bar) is the sample mean

Although in later sections of this unit we are only going to concentrate on populations, in this section we will ask you to know both formulas, specifically the two symbols (µ and x̄) used to represent the different means.

Let's return to Charlie’s math marks. Since he is looking at the marks of all of the learners who completed the unit, he is dealing with a population. The population mean, µ, is calculated below.

\[ \mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} \]

\[ \mu = \frac{81 + 74 + 91 + 82 + 79 + 95 + 78 + 92 + 86 + 74 + 78 + 69 + 84 + 77 + 88 + 78 + 71}{17} \]

\[ \mu = \frac{1377}{17} \]

\[ \mu = 81 \]

The mean mark for Charlie’s learners on this unit is 81%.
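The same calculation can be checked with a few lines of Python. This is only a sketch of the formula above; the names `charlie_marks` and `mean` are illustrative and not part of the original resource.

```python
# Charlie's seventeen marks, as listed at the start of this section.
charlie_marks = [81, 74, 91, 82, 79, 95, 78, 92, 86,
                 74, 78, 69, 84, 77, 88, 78, 71]


def mean(data):
    """Add all the data points, then divide by the number of data points."""
    return sum(data) / len(data)


total = sum(charlie_marks)   # numerator of the formula
n = len(charlie_marks)       # denominator of the formula
mu = mean(charlie_marks)

print(total, n, mu)
```

The printed total and count should match the 1377 and 17 in the worked calculation. The same function applies to a sample mean, since only the symbol differs.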


Median

The mean is not the only way to describe the center. Another method is to use the “middle value” of the data, which is called the median. The median separates the higher half of the data from the lower half. The median can be calculated in the following manner.

1. Arrange the data points in order of size, from smallest to largest.
2. If the number of data points is odd, then the median is the data point in the middle of the ordered list.
3. If the number of data points is even, then the median is the mean of the two data points that share the middle of the ordered list.

Return to Charlie’s math marks. The median is calculated by following the procedure provided below.

Order the data points from smallest to largest.
69, 71, 74, 74, 77, 78, 78, 78, 79, 81, 82, 84, 86, 88, 91, 92, 95

Since we have an odd number of data points (n = 17), the median will be the middle data point of the ordered list.

69, 71, 74, 74, 77, 78, 78, 78, 79, 81, 82, 84, 86, 88, 91, 92, 95

The median will be 79.

Suppose we had another instructor, Angela, who had sixteen learners who completed the same unit. She has recorded the marks that they earned and worked out the mean and median.

{99, 94, 80, 63, 77, 99, 68, 62, 95, 78, 66, 93, 65, 64, 98, 95}

Mean:

\[ \mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} \]

\[ \mu = \frac{99 + 94 + 80 + 63 + 77 + 99 + 68 + 62 + 95 + 78 + 66 + 93 + 65 + 64 + 98 + 95}{16} \]

\[ \mu = \frac{1296}{16} \]

\[ \mu = 81 \]

The mean mark for these learners on this unit is 81%.

Median:
Order the data points from smallest to largest.
62, 63, 64, 65, 66, 68, 77, 78, 80, 93, 94, 95, 95, 98, 99, 99

Since the number of data points is even (n = 16), the median is the mean of the two data points that share the middle of the ordered list.

62, 63, 64, 65, 66, 68, 77, 78, 80, 93, 94, 95, 95, 98, 99, 99

\[ \text{Median} = \frac{78 + 80}{2} = 79 \]
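The three-step median procedure can be sketched in Python as follows. The function name `median` and the two list names are illustrative choices for this sketch.

```python
def median(data):
    """Median by the three-step procedure: order, then take the middle."""
    ordered = sorted(data)          # step 1: smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # step 2: odd count, the middle data point is the median
        return ordered[middle]
    # step 3: even count, average the two data points sharing the middle
    return (ordered[middle - 1] + ordered[middle]) / 2


charlie_marks = [81, 74, 91, 82, 79, 95, 78, 92, 86,
                 74, 78, 69, 84, 77, 88, 78, 71]
angela_marks = [99, 94, 80, 63, 77, 99, 68, 62,
                95, 78, 66, 93, 65, 64, 98, 95]

print(median(charlie_marks))
print(median(angela_marks))
```

Both calls should give 79, matching the medians worked out by hand for the two groups.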


Are the Mean and Median Enough?

These measures of central tendency often do not give us a complete understanding of the data set because they do not give any indication of how the data is spread out. This is especially evident when we look at the means and medians for the two groups of math students previously discussed. Although the means and medians are identical for Charlie's and Angela's learners, the marks earned by the two groups are vastly different.

• In Charlie’s group, the majority of students earned marks between 71 and 88. There was only one mark in the sixties and only three marks in the nineties. The marks are clustered together.

• In Angela's group, learners could largely be divided into two groups: learners who did very well (i.e. obtained marks in the 90's) and learners who found the material challenging (i.e. obtained marks in the 60's). The marks are not clustered together as they were with Charlie's learners.

Range of Marks    Number of Charlie's Learners    Number of Angela's Learners
60 to 65          0                               3
65 to 70          1                               3
70 to 75          3                               0
75 to 80          5                               2
80 to 85          3                               1
85 to 90          2                               0
90 to 95          2                               2
95 to 100         1                               5
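The counts in this table can be reproduced by tallying how many marks fall in each class. The sketch below assumes that each class includes its lower boundary but not its upper boundary (so a mark of 95 is counted in 95 to 100), which is consistent with the counts shown; the function name `class_counts` is an illustrative choice.

```python
def class_counts(marks, start=60, stop=100, width=5):
    """Count marks in each class [lower, lower + width)."""
    counts = []
    for lower in range(start, stop, width):
        upper = lower + width
        # Assumption: lower boundary included, upper boundary excluded.
        counts.append(sum(1 for m in marks if lower <= m < upper))
    return counts


charlie_marks = [81, 74, 91, 82, 79, 95, 78, 92, 86,
                 74, 78, 69, 84, 77, 88, 78, 71]
angela_marks = [99, 94, 80, 63, 77, 99, 68, 62,
                95, 78, 66, 93, 65, 64, 98, 95]

for lower, c, a in zip(range(60, 100, 5),
                       class_counts(charlie_marks),
                       class_counts(angela_marks)):
    print(f"{lower} to {lower + 5}: {c} {a}")
```

Laid side by side, the two columns make the difference in spread visible even though the means and medians agree.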

It is important to note that our two measures of central tendency, mean and median, did not reveal this important difference between the two data sets. We will address this issue in a later section of this unit.

When are the Mean and Median Not Close to Each Other?

There are times when the mean and median may not be close to each other. One case is if an outlier exists within the data set. An outlier is a data point that falls outside the overall pattern of the data set. Consider the following data set where the data points have already been arranged in ascending order.

{2.8, 3.0, 3.0, 3.1, 3.2, 3.4, 3.4, 3.5, 3.5, 3.6, 3.7, 3.9, 4.0, 4.2, 16.7}

Notice that all but one of the data points are between 2.8 and 4.2. The mean for this data set is 4.3 and the median is 3.5. It is obvious that in this case the median is a far better measure of central tendency than the mean. The outlier, 16.7, greatly influenced the mean to a point where it no longer accurately represented the center of the data set.

The extreme sensitivity of the mean to even a single outlier and the insensitivity of the median to outliers led to the development of trimmed means. Trimmed means are calculated by ordering the data points from smallest to largest, deleting a selected number of points from both ends of the ordered list, and finally averaging the remaining numbers. For example, to calculate the 5% trimmed mean, the bottom 5% of the data points and the top 5% of the data points are deleted.

Consider the data set introduced above. We will calculate its 5% trimmed mean. Since 5% of the number of data points (i.e. 5% of 15) is 0.75, we round to the nearest whole number, which is 1. We therefore drop one data point from the bottom and one data point from the top of the data set.

2.8, 3.0, 3.0, 3.1, 3.2, 3.4, 3.4, 3.5, 3.5, 3.6, 3.7, 3.9, 4.0, 4.2, 16.7

Finally we work out the mean of the remaining thirteen data points.

\[ \text{5\% trimmed mean} = \frac{3.0 + 3.0 + 3.1 + 3.2 + 3.4 + 3.4 + 3.5 + 3.5 + 3.6 + 3.7 + 3.9 + 4.0 + 4.2}{13} = 3.5 \]

Notice that this trimmed mean is equal to the median that we previously calculated. By eliminating the effects of outliers, the median and resulting mean should be in close proximity.
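The trimming procedure can be sketched in Python as shown here. The function name `trimmed_mean` is illustrative, and the sketch assumes that a value ending in exactly .5 is rounded up when deciding how many points to drop, a case the text does not address.

```python
import math


def trimmed_mean(data, percent):
    """Mean after dropping `percent`% of the points from each end."""
    ordered = sorted(data)
    n = len(ordered)
    # Points to drop from each end, rounded to the nearest whole number
    # (assumption: an exact .5 rounds up).
    k = math.floor(n * percent / 100 + 0.5)
    if 2 * k >= n:
        raise ValueError("trimming would remove every data point")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


data = [2.8, 3.0, 3.0, 3.1, 3.2, 3.4, 3.4, 3.5, 3.5,
        3.6, 3.7, 3.9, 4.0, 4.2, 16.7]

print(round(sum(data) / len(data), 1))   # ordinary mean
print(round(trimmed_mean(data, 5), 1))   # 5% trimmed mean
```

Rounded to one decimal place, the two printed values should be the 4.3 and 3.5 obtained above, showing how far the single outlier pulls the ordinary mean.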

The symbol \( \bar{x}_{(T)} \) is used to represent a trimmed mean. The only problem with this symbol is that it does not indicate whether we are dealing with a 5%, 10%, 15% or 20% trimmed mean.

Example 1
Twenty-two runners of the 100 m dash were randomly selected from colleges and universities in Canada. The time of each runner in the last competition was recorded. Of these runners, one person had pulled a hamstring and another had tripped during their last competition. The times in seconds are recorded below. Determine the mean, median, and 10% trimmed mean.

10.23 10.89 11.76 9.87 11.54 10.52 18.57 9.72 12.05 11.56 10.15

11.33 10.75 9.96 19.42 11.68 12.09 11.49 11.67 10.19 10.52 9.99

Answer:

\[ \text{Mean} = \frac{10.23 + 10.89 + 11.76 + \dots + 10.19 + 10.52 + 9.99}{22} = 11.63 \]

Median: Rearrange the data points from smallest to largest. Since we are dealing with an even number of data points (22), then the median is the mean of the two data points that share the middle of the ordered list.

9.72, 9.87, 9.96, 9.99,…, 10.75, 10.89, 11.33, 11.49,…, 12.05, 12.09, 18.57, 19.42

\[ \text{Median} = \frac{10.89 + 11.33}{2} = 11.11 \]


10% Trimmed Mean: Since 10% of the number of data points (i.e. 10% of 22) is 2.2, we round to the nearest whole number, which is 2. We will now drop two data points from the bottom and two data points from the top of the data set, and then work out the mean of the remaining eighteen data points.

9.72, 9.87, 9.96, 9.99, 10.15,…, 11.76, 12.05, 12.09, 18.57, 19.42

\[ \text{10\% trimmed mean} = \frac{9.96 + 9.99 + 10.15 + \dots + 11.76 + 12.05 + 12.09}{18} = 11.02 \]

Mode

The mode of a set of data is the value in the set that occurs most frequently. For the following data, the mode is 6 because it occurs more times than any other value.

{2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 7, 7, 9, 10} Mode = 6

Many textbooks and websites refer to the mode as a measure of central tendency; this is incorrect. Although the mode is often around the center of the data set when the points are arranged from smallest to largest, this is not always the case. Consider the data we previously examined concerning Charlie's and Angela's Graduate Math learners.

Data for Charlie's Learners
Order the data points from smallest to largest, and identify the data point that occurs most frequently.
69, 71, 74, 74, 77, 78, 78, 78, 79, 81, 82, 84, 86, 88, 91, 92, 95
Mode = 78

Data for Angela's Learners
Order the data points from smallest to largest, and identify the data point(s) that occur most frequently.
62, 63, 64, 65, 66, 68, 77, 78, 80, 93, 94, 95, 95, 98, 99, 99
The data points 95 and 99 occur the most frequently; therefore we state that this data set is bimodal.
Mode = 95 and 99

The mode for Charlie's data is close to the center of the data set; however, the modes for Angela's data are not near the center.
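Finding the mode amounts to counting how often each value occurs and keeping every value that reaches the highest count. A minimal Python sketch follows; the function name `modes` is illustrative, and it returns a list so that a bimodal data set such as Angela's reports both values.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs the greatest number of times."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, c in counts.items() if c == highest)


charlie_marks = [81, 74, 91, 82, 79, 95, 78, 92, 86,
                 74, 78, 69, 84, 77, 88, 78, 71]
angela_marks = [99, 94, 80, 63, 77, 99, 68, 62,
                95, 78, 66, 93, 65, 64, 98, 95]

print(modes(charlie_marks))
print(modes(angela_marks))
```

For Charlie's marks the list should contain only 78, and for Angela's marks it should contain both 95 and 99.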


Questions

Please use the appropriate symbols (x̄, µ, and \( \bar{x}_{(T)} \)) when answering these questions.

1. A study regarding the size of winter wolf packs in regions of the United States, Canada, and Finland was conducted. The following data from 18 randomly selected packs was obtained.

2 3 15 8 7 8 2 4 13

7 3 7 10 7 5 4 2 4

(a) Are we dealing with a sample or a population? _____________________
(b) Determine the mean, median, and mode.
(c) Why would the researchers not likely use a trimmed mean with this data set?

2. A local cab company has a fleet of nine cars. The company kept records of the amount of money each vehicle required for a one-week period. The data is shown below.

$125 $157 $210 $139 $182 $167 $143 $150 $162

(a) Are we dealing with a sample or a population? _____________________
(b) Are we dealing with a numerical or categorical data set? _____________________
(c) Determine the mean, median, and mode.


3. A magazine conducted a survey where they wished to understand the average class size of first year courses at a local community college. They randomly selected 17 first year classes and obtained the following numbers.

23 37 36 40 39 115 28 25 41

23 32 27 16 15 31 27 34

(a) Are we dealing with a sample or a population? ____________________
(b) Determine the mean, median, mode, 5% trimmed mean, and 10% trimmed mean.
(c) Why is it appropriate to use trimmed means in this situation?
(d) If this data set was comprised of 78 data points and we wanted to calculate a 5% trimmed mean, how many data points would be dropped from the bottom and top of the data set?

4. A new subdivision outside of Halifax was constructed over the last few years. Barb wanted to know what the average value of the new homes was. She was not prepared to look at the assessed values of all 218 new homes. Instead she randomly selected 24 homes and recorded their assessed values. These values in thousands of dollars are shown below.

267 265 226 254 231 221 246 252 253 241 261 589

243 269 267 253 287 320 221 264 257 249 226 267


(a) Calculate the mean, median, mode, and 5% trimmed mean.
(b) Which of these measures is not influenced or less influenced by extremely high or low data points?
(c) Would a histogram or a bar graph be used with this data set?

5. In gymnastics and diving, several judges score each athlete. The final score for the athlete is calculated by removing the high and low scores and averaging the remainder. Why do you think they use this trimmed mean scoring method in gymnastics and diving?


Box and Whisker Plots

Box and whisker plots, also called box plots, are a quick graphic approach for examining one or more sets of data. They are so named because the middle portion consists of a rectangular box, which typically has a line (whisker) extending from each end of the box.

The box and whisker plot provides us with five critical pieces of information regarding the data that was used to construct it. (Refer to the diagram below.)

• We are supplied with the minimum value in our data set. In this case, that value is 17.
• We are supplied with the maximum value in our data set. In this case, that value is 36.
• We are supplied with the median (or middle) of the data set. In this case, the median is 26.
• We are supplied with the lower quartile (also called first quartile or Q1). This value is found by working out the median of the numbers below the median of the entire set of data. The lower quartile is the number that 25% of the data is below. In this case, the lower quartile is 21.
• We are supplied with the upper quartile (also called third quartile or Q3). This value is found by working out the median of the numbers above the median of the entire set of data. The upper quartile is the number that 25% of the data is above. In this case, the upper quartile is 30.

Before we learn how to construct a box and whisker plot, we are going to look at a sample question involving a real world context where we have to compare two plots.

[Diagrams: a box and whisker plot on a number line from 15 to 40, first with the box and the two whiskers labelled, then with the minimum value, lower quartile, median, upper quartile, and maximum value labelled]


Example 1
Two blood testing departments at different Nova Scotia hospitals recorded their patient wait times in minutes. This data was used to construct the two box and whisker plots.

How do the wait times compare at these two blood testing departments?

Answer:
Although the minimum value for Department B is 2 minutes less than the minimum value for Department A, and the lower quartile for Department B is 1 minute less than the lower quartile of Department A, the overall results for Department A are better. The median for Department A is slightly better, and the upper quartile and maximum value for Department A are much better than those for Department B. Department A appears to deliver a more consistent level of service in terms of wait times; that is why the box and whiskers are shorter for Department A's plot. We can say that the wait times are clustered closer together for Department A than for Department B. To explain this further, just look at the boxes for the two plots. Based on the first box, we can see that the middle 50% of Department A's patients are served between 10 minutes and 16 minutes. Based on the second box, the middle 50% of Department B's patients are, however, served between 9 minutes and 21 minutes; a much longer time span. We can also conclude that generally patients had shorter wait times at Department A.

Making a Box and Whisker Plot

It is a six step process to construct a box and whisker plot.

(i) Arrange the data points in order of size, from smallest to largest.

(ii) Identify the minimum value and maximum value.

(iii) Determine the median.

(iv) Find the lower quartile by finding the median of the numbers below, but not including, the median of the entire set of numbers.

(v) Find the upper quartile by finding the median of the numbers above, but not including, the median of the entire set of numbers.

(vi) Draw your box and whisker plot along a number line using the values you found in steps (ii) through (v).
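Steps (i) through (v) can be sketched in Python to obtain the five values needed for the plot. The function names are illustrative. The quartiles here follow the method described in this section (the median of each half, leaving out the overall median when the number of data points is odd); calculators and statistical software sometimes use other quartile definitions and may report slightly different values.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(data)                      # step (i)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower_half = ordered[:half]                 # points below the median
    # With an odd count the median is a data point and is left out.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0],                         # step (ii) minimum
            median_of_sorted(lower_half),       # step (iv)
            median_of_sorted(ordered),          # step (iii)
            median_of_sorted(upper_half),       # step (v)
            ordered[-1])                        # step (ii) maximum


print(five_number_summary([22, 4, 11, 24, 18, 9, 19, 21, 13]))
```

The data in the final line is the set used in Example 2, so the printed values can be compared with that worked answer.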

[Box and whisker plots for Example 1: Department A and Department B on a number line from 0 to 30]


Example 2
Construct a box and whisker plot for the following data.
22, 4, 11, 24, 18, 9, 19, 21, 13

Answer:
(i) Arrange from smallest to largest.
4, 9, 11, 13, 18, 19, 21, 22, 24
(ii) Minimum Value = 4, Maximum Value = 24
(iii) Find the median (i.e. middle value).
4, 9, 11, 13, 18, 19, 21, 22, 24
Median = 18
(iv) Find the lower quartile. This is done by taking the lower 50% of the data, not including the median from step (iii), and finding the median of these data points.
4, 9, 11, 13
\( \frac{9 + 11}{2} = 10 \) Lower Quartile = 10
(v) Find the upper quartile. This is done by taking the upper 50% of the data, not including the median from step (iii), and finding the median of these data points.
19, 21, 22, 24
\( \frac{21 + 22}{2} = 21.5 \) Upper Quartile = 21.5
(vi) Draw the plot along a number line.

[Box and whisker plot for Example 2 on a number line from 5 to 25]

Example 3
Display the following as a box and whisker plot.
10, 14, 21, 26, 16, 12, 14, 9, 17, 26

Answers:
(i) Arrange from smallest to largest.
9, 10, 12, 14, 14, 16, 17, 21, 26, 26
(ii) Minimum Value = 9, Maximum Value = 26
(iii) Find the median.
9, 10, 12, 14, 14, 16, 17, 21, 26, 26
\( \frac{14 + 16}{2} = 15 \) Median = 15
(iv) Find the lower quartile using the lower 50% of the data, not including the median.
9, 10, 12, 14, 14
Lower Quartile = 12
(v) Find the upper quartile using the upper 50% of the data, not including the median.
16, 17, 21, 26, 26
Upper Quartile = 21
(vi) Draw the plot along a number line.

[Box and whisker plot for Example 3 on a number line from 5 to 25]

Questions

1. Construct a box and whisker plot for each of the following sets of data.

(a) 30, 15, 6, 24, 19, 15, 17, 21, 20, 11, 9

Remember to start by reorganizing the data.

[Blank number line from 5 to 30]

(b) 45, 46, 37, 52, 33, 34, 43, 43, 48, 50, 49, 43, 46, 40

Remember to start by reorganizing the data.

[Blank number line from 25 to 55]


(c) 31, 26, 38, 25, 24, 29, 31, 37, 38, 30, 40, 27, 24, 24, 31, 26, 33

[Blank number line from 20 to 45]

(d) 38, 37, 40, 28, 34, 36, 35, 41, 38, 35

[Blank number line from 20 to 45]

2. A reaction time experiment is conducted in several adult education classrooms. In the experiment one student releases a ruler and a second student tries to grasp it as quickly as possible. The distance that the ruler drops is one way to measure the second student's reaction time. For example, if Student A's ruler only drops 7 cm compared to Student B's ruler that drops 12 cm, then we could say that Student A has a better reaction time.


(a) Each member of Mrs. Leck's math class participated in the experiment. The following data was collected. Construct a box-and-whisker plot.

18 22 10 19 12 21 7 16 22 20 9 20 11

[Blank number line from 5 to 30]

(b) Mr. Porter's class and Mr. Churchill's class participated in the same experiment. A box-and-whisker plot was constructed for both classes.

[Box-and-whisker plots for Mr. Porter's Class and Mr. Churchill's Class on a number line from 5 to 30]

How do the two classes compare in terms of reaction times?

(c) Mrs. Lowe's class and Mr. Vroom's class participated in the same experiment. The following data was collected.

Mrs. Lowe's Class 9 17 6 12 15 20 10 17 13 19 20 10

Mr. Vroom's Class 16 20 23 10 23 18 6 21 17 23 15

Construct two box-and-whisker plots.


[Blank number line from 5 to 30]

How do the two classes compare in terms of reaction times?

(d) Mrs. Burchill's class and Mr. Rhodenizer's class participated in the same experiment. The following data was collected.

Mrs. Burchill's Class 16 7 12 5 21 13 16 10 18 11 8 19 14 11

Mr. Rhodenizer's Class 9 14 13 19 8 16 11 22 14 6 11

Construct two box-and-whisker plots.

[Blank number line from 5 to 30]

How do the two classes compare in terms of reaction times?


Using Technology to Make Box and Whisker Plots

The TI-83 and TI-84 graphing calculators can draw box and whisker plots. This is particularly useful when we have lots of data. In this example we are going to use two sets of data to create two box and whisker plots at the same time.

First Data Set
5.8 3.9 11.0 9.3 5.3 4.5 14.5 6.1 16.1 7.1 12.7 6.9 4.7 3.1 4.5 7.2 6.0 6.2 4.7 10.2 3.2 8.0 5.2 15.9 7.8 13.2 6.7 4.9

Second Data Set
7.3 10.2 8.3 9.9 5.0 9.4 8.1 9.7 7.5 7.9 8.6 4.8 8.3 13.2 7.2 12.6 7.7 9.0 6.9 8.5 8.7 4.9 10.0 7.7 7.2 8.2

Procedure:

1. Enter the First Data Set in List 1 and the Second Data Set in List 2

STAT > EDIT > Edit > Enter first data set in L1 > Enter second data set in L2

2. Turn on the Plots

STAT PLOT > Select Plot 1 > Select On, Box and Whisker, and L1 > STAT PLOT > Select Plot 2 > Select On, Box and Whisker, and L2

3. Draw the Box-and-Whisker Plot

ZOOM > ZoomStat > TRACE > Use the right, left, up and down buttons to see the different values on the box and whisker plots.


Questions

In the following questions you will be asked to draw histograms as well as box-and-whisker plots. You are required to draw the histograms by hand and the box and whisker plots using technology.

1. Mrs. Ross is coaching her daughter's junior high basketball team. She has three players to choose from the bench. The statistics for each of the players are shown below. You are going to use your knowledge of statistics to help Mrs. Ross in making a selection.

Tanya 8 4 20 22 25 14 23 24 2 10 23 25 16 2 25

Barb 22 6 12 18 18 12 25 14 13 20 8 20 18 16

Suzette 30 29 11 16 4 5 20 6 8 22 9 6 28 11 9 9

(a) Using technology, construct three box-and-whisker plots.

[Blank number line from 0 to 30]

(b) Determine the mean score for each player.
(c) Draw three histograms for the three sets of data. Note that the classes will include the first number but not the second. For example the class 0 to 5 includes 0, but not 5.

Class       Frequency (Tanya)    Frequency (Barb)    Frequency (Suzette)
0 to 5      ____                 ____                ____
5 to 10     ____                 ____                ____
10 to 15    ____                 ____                ____
15 to 20    ____                 ____                ____
20 to 25    ____                 ____                ____
25 to 30    ____                 ____                ____
30 to 35    ____                 ____                ____

Page 49: Descriptive Statistics Unit

NSSAL 43 Draft ©2011 C. D. Pilmer

(d) Which player has two distinct clusters within their data? __________________
(e) Who is the best player? __________________
(f) Who is the most consistent player? __________________
(g) What range of scores would be considered Tanya's top 25%? __________________
(h) What range of scores would be considered Barb's bottom 25%? __________________
(i) What range of scores would be considered Suzette's top 50%? __________________

2. Mrs. Tuttle-Comeau is an assistant coach for her son's high school track and field team. At the last track meet (Track Meet A) she gathered the following data regarding 30 sprinters in the 100 m race. Each of these pieces of data represents the best time each of the high school sprinters obtained during this meet.

11.0 12.5 12.1 11.2 12.2 12.7 11.4 13.7 10.9 12.9 12.2 10.6 12.8 13.0 12.2 11.2 13.2 12.2 16.2 11.9 11.5 12.2 11.0 11.6 10.9 12.0 10.7 11.5 11.1 12.2

(a) Determine the mean time.

(b) Construct a box and whisker plot for this data.

[Blank axis for the box and whisker plot, labelled 10 to 16 in steps of 1]

(c) Construct a histogram. Note that the classes will include the first number but not the second. For example the class 10 to 11 includes 10, but not 11.

Class       Frequency
10 to 11    ____
11 to 12    ____
12 to 13    ____
13 to 14    ____
14 to 15    ____
15 to 16    ____
16 to 17    ____

(d) Are there two distinct clusters within this data? __________________
(e) What range of times would place an individual in the top 50% of the competitors?
(f) What range of times would place an individual in the bottom 25% of the competitors?
(g) What range of times would place an individual in the top 25% of the competitors?
(h) Here's a box-and-whisker plot for another track meet (Track Meet B). Which track meet, A or B, resulted in a greater percentage of strong performances? How did you arrive at this answer?

[Figure: box-and-whisker plot for Track Meet B on an axis labelled 10 to 16]

3. Body mass index (BMI) is a calculation that uses an individual's height and weight to estimate how much body fat they have. In Canada, BMI is recorded in kg/m² and the result is then matched with one of four categories designated by Health Canada. These categories are:
• underweight (BMIs less than 18.5);
• normal weight (BMIs 18.5 to 24.9);
• overweight (BMIs 25 to 29.9); and
• obese (BMIs 30 and over).

The BMIs for adult learners from two different college classes were calculated and recorded.

Class A
29.3 27.3 24.3 23.5 27.2 28.6 20.2 24.6 27.3 29.4 21.8 25.2 27.9 28.5 26.8 23.1 28.4 26.9 22.9 28.1 26.7 22.5

Class B
30.2 21.4 17.2 28.6 20.9 26.8 20.7 30.8 21.8 17.8 23.6 18.8 24.2 19.6 32.7 23.8 18.5 31.4 22.5 18.3

Using technology, construct two box and whisker plots and record the results below.

How do the BMIs for the two classes compare?

[Blank axis for the box and whisker plots, labelled 15, 20, 25, 30]

Standard Deviation

Measures of central tendency (mean, median, and mode) do not give us any indication of how the data is spread out. Consider the following two sets of data.

First Data Set: 13, 14, 15, 15, 15, 16, 17

Second Data Set: 10, 12, 13, 15, 17, 18, 20

The mean for both of these data sets is 15; however, the individual pieces of data in these sets are considerably different. In the first set, the numbers range from 13 to 17, and clearly cluster around the number 15. In the second set, the numbers range from 10 to 20 and tend to be more spread out around the mean. The dispersion is far greater in the second set than in the first.

Standard deviation is one way of measuring dispersion. If the standard deviation is low, then the data clusters around the mean. If the standard deviation is high, then the data is spread out around the mean. Without getting into the actual calculations, the standard deviation for the first data set is 1.20 and the standard deviation for the second data set is 3.30. The larger number indicates greater dispersion.

Calculating Standard Deviation

Before we get to the calculations, we have to remind you of an important point and introduce two formulas. In the unit introduction we stated that this unit would focus on populations, rather than samples. A population is the set representing all measurements of interest to an investigator, while a sample is simply a subset of the measurements from the population chosen at random. We learned that the mean is calculated by adding all the data values and then dividing by the number of data values. This can be expressed using the following formula.

$$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

where µ (mu) is the population mean

The formula for population standard deviation, σ (sigma), is shown below. You are not expected to memorize this formula.

$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \dots + (x_n - \mu)^2}{n}}$$

This formula requires that you complete six steps.

Step 1: Find the mean, $\mu$.

Step 2: Calculate the difference between each data value and the mean, $x_i - \mu$.

Step 3: Square those differences found in Step 2, $(x_i - \mu)^2$.

Step 4: Add the squared differences, $(x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \dots + (x_n - \mu)^2$.

Step 5: Divide the sum from Step 4 by the number of data values.

Step 6: Square root the value from Step 5.
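The six steps can also be sketched in Python, one line per step. This is an illustrative sketch only: the function name `population_std_dev` is hypothetical, and the data used is the second data set from the start of this section.

```python
from math import sqrt

def population_std_dev(data):
    """Population standard deviation, following the six steps in order."""
    n = len(data)
    mu = sum(data) / n                         # Step 1: find the mean
    differences = [x - mu for x in data]       # Step 2: x_i - mu
    squared = [d ** 2 for d in differences]    # Step 3: square each difference
    total = sum(squared)                       # Step 4: add the squared differences
    variance = total / n                       # Step 5: divide by the number of values
    return sqrt(variance)                      # Step 6: take the square root

second_data_set = [10, 12, 13, 15, 17, 18, 20]
print(round(population_std_dev(second_data_set), 2))   # prints 3.3
```

Keeping each step on its own line mirrors the table method used in the worked examples, where only a small portion of the calculation is completed at a time.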

The easiest way to learn how to use this formula (i.e. complete the six steps) is to construct a table where only small portions of the calculation are completed at any one time.

Example 1
Determine the standard deviation for the following set of data.
10, 12, 13, 15, 17, 18, 20

Answer:

Find the mean. (Step 1)

$$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

$$\mu = \frac{10 + 12 + 13 + 15 + 17 + 18 + 20}{7}$$

$$\mu = 15$$

Construct the table.

$x_i$     $x_i - \mu$ (Step 2)     $(x_i - \mu)^2$ (Step 3)
10        -5                       25
12        -3                       9
13        -2                       4
15        0                        0
17        2                        4
18        3                        9
20        5                        25
                                   Sum = 76 (Step 4)

$$\sigma = \sqrt{\frac{76}{7}}$$ (Steps 5 and 6)

$$\sigma = 3.3$$

The population standard deviation is 3.3.

Example 2
Mrs. Gillis teaches math to adults. At the end of the year she examines the final marks for all of her students who have completed the course. She wants to work out the standard deviation of those marks.

87 72 91 82 74 93 75 83 78 75

Answer:

Find the mean.

$$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

$$\mu = \frac{87 + 72 + 91 + 82 + 74 + 93 + 75 + 83 + 78 + 75}{10}$$

$$\mu = 81$$

Construct the table.

$x_i$     $x_i - \mu$     $(x_i - \mu)^2$
87        6               36
72        -9              81
91        10              100
82        1               1
74        -7              49
93        12              144
75        -6              36
83        2               4
78        -3              9
75        -6              36
                          Sum = 496

$$\sigma = \sqrt{\frac{496}{10}}$$

$$\sigma = 7.04$$

The population standard deviation is 7.04.

Questions

1. Determine the standard deviation for the following data.

25 32 24 28 31 28

$x_i$     $x_i - \mu$     $(x_i - \mu)^2$

2. Determine the standard deviation for the following data.

3.7 4.3 5.0 4.6 4.0 4.7 3.9 4.2

$x_i$     $x_i - \mu$     $(x_i - \mu)^2$

3. Two data sets have been provided.

15 14 13 18 16 13 16 15 15

17 15 16 14 11 19 16 11 16

(a) Calculate the standard deviation for each data set.

$\mu =$ ____                                    $\mu =$ ____

$x_i$     $x_i - \mu$     $(x_i - \mu)^2$       $x_i$     $x_i - \mu$     $(x_i - \mu)^2$

(b) The standard deviations are different for the two data sets. What is this telling you?

4. Barb, a math instructor, recorded the height in centimetres of all of the male students in her Level IV math courses. She obtained the following measurements.

181 173 184 183 190 180 186 176 185

(a) What is the median for this data?
(b) What is the mean for this data?
(c) Is Barb dealing with a categorical or numerical data set?
(d) Determine the standard deviation.

$x_i$     $x_i - \mu$     $(x_i - \mu)^2$

(e) Another instructor at a different campus also has 9 male learners in his Level IV Math courses. He measured their heights. He found the mean to be 182 cm with a standard deviation of 6.4 cm. Based on these results, what can you say about the heights of this instructor’s male learners compared to Barb’s male learners?

(f) A third instructor at another campus also has 9 male learners in her Level IV Math courses. She measured their heights. She found the mean to be 179 cm with a standard deviation of 4.8 cm. Based on these results, what can you say about the heights of this instructor’s male learners compared to Barb’s male learners?

5. Without attempting any calculations, match each standard deviation with the appropriate histogram. Please note that all of the histograms are drawn at the same scale.

Standard Deviations: (a) 0.69 (b) 1.40 (c) 3.34 (d) 3.62

Histograms:
[Figure: four histograms drawn at the same scale, the first labelled (i)]

Matches with _____     Matches with _____     Matches with _____     Matches with _____

6. Create two data sets that meet all of the following conditions.

• They have at least six pieces of data.
• They must have a mean of 10.
• They have standard deviations that are quite different.

Using Technology to Calculate Population Standard Deviation

In the last section we learned how to work out the population standard deviation (σ) using paper and pencil. The TI graphing calculators can calculate this along with several other measures we have been exposed to in this unit. Using such technology is particularly useful when we are dealing with a large number of data points.

Example
Tylena was teaching an evening class comprised of 30 adult learners. She asked them all to complete a series of thirty basic math problems. She recorded how long, in minutes, it took each learner to complete the task. The data is shown below.

40 46 68 51 42 55 48 52 38 49 56 50 35 54 50 60 56 44 53 58 60 45 52 55 46 51 40 50 64 45

(a) Draw a histogram using technology. Use class widths of 5 starting at 35.
(b) Determine the mean time.
(c) Determine the standard deviation.
(d) Determine the median.

Answers:

Step 1: Enter the Data in the Calculator

STAT > Edit > Enter the data in L1

Note: If data already exists in L1, move the cursor up so L1 is highlighted, press CLEAR, and move the cursor back down.

Step 2: Draw the Histogram

STAT PLOT > Select Plot 1 > Turn on the plot and select histogram; Xlist should be L1 and Freq should be 1 > WINDOW > Set Xmin at 35, Xmax at 70, Xscl at 5, Ymin at 0, Ymax at 10, Yscl at 1 > GRAPH > TRACE > Use the right and left arrows

Note: The Xmin on the Window setting is the starting point for the first class and the Xscl sets the class width. In this case the first class is 35 - 40.
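The class frequencies behind the histogram can be cross-checked with a short Python sketch. It is not part of the original unit: the function name `class_frequencies` is hypothetical, and it assumes the convention used earlier in this unit, where each class includes its first number but not its second.

```python
def class_frequencies(data, start, width, n_classes):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = [0] * n_classes
    for x in data:
        index = int((x - start) // width)   # which class the value belongs to
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

times = [40, 46, 68, 51, 42, 55, 48, 52, 38, 49, 56, 50, 35, 54, 50,
         60, 56, 44, 53, 58, 60, 45, 52, 55, 46, 51, 40, 50, 64, 45]

# Xmin = 35, Xscl = 5, Xmax = 70 gives seven classes.
counts = class_frequencies(times, start=35, width=5, n_classes=7)
for i, count in enumerate(counts):
    lower = 35 + 5 * i
    print(f"{lower} to {lower + 5}: {count}")
```

Each printed count should match the bar height shown when tracing across the calculator's histogram.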

STAT > CALC > 1-Var Stats > Enter the List (typically L1) > ENTER

The calculator does not report the population mean (µ); however, as we previously learned, the formulas for the sample mean and the population mean are the same. The calculator reports the sample mean (x̄), but we know that we are actually dealing with a population mean of 50.4 minutes. We are also asked to determine the standard deviation, which is actually the population standard deviation (σ). This calculator uses the symbol σx, rather than σ, to represent the population standard deviation. Therefore our population standard deviation is 7.5 minutes. To find the median, scroll down using the down arrow while still on the 1-Var Stats results until you find Med. The median in this case is 50.5.
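The three values read from the calculator can be double-checked with Python's standard `statistics` module. This is a sketch for verification only, not part of the original unit. Note that `pstdev` divides by the number of data values (the population formula used in this unit), while `stdev` divides by one less than that (the sample formula), so `pstdev` is the one that corresponds to σx.

```python
from statistics import mean, median, pstdev

times = [40, 46, 68, 51, 42, 55, 48, 52, 38, 49, 56, 50, 35, 54, 50,
         60, 56, 44, 53, 58, 60, 45, 52, 55, 46, 51, 40, 50, 64, 45]

population_mean = round(mean(times), 1)       # (b) mean time in minutes
population_sd = round(pstdev(times), 1)       # (c) population standard deviation
middle_value = median(times)                  # (d) median

print(population_mean, population_sd, middle_value)   # prints 50.4 7.5 50.5
```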

(b) population mean ( µ ) = 50.4 minutes

(c) population standard deviation (σ ) = 7.5 minutes

(d) median = 50.5 minutes

Questions

1. Provincial governments keep records of the number of young offenders who are incarcerated each year. The incarceration rates vary greatly from province to province. In 2006 Nova Scotia reported an incarceration rate of 9.91. That means that 9.91 young persons out of every 10 000 young persons were incarcerated. Below you will find the incarceration rates for the provinces and territories for 2006. (Source: Statistics Canada)

Province  Rate     Province  Rate     Province  Rate
YT        8.57     SK        24.54    NB        10.20
NT        46.12    MB        21.25    PE        7.21
NU        20.49    ON        7.51     NS        9.91
BC        4.45     QC        3.89     NL        11.93
AB        7.18

(a) Are we dealing with a population or a sample? Explain.
(b) Using technology draw a histogram showing the distribution of incarceration rates. Use class widths of 5 starting at 0.
(c) Determine the mean, median, and standard deviation.

(d) There is a substantial difference between the mean and median. Why is this so?

2. Below you will find a list of Prime Ministers of Canada since Confederation in 1867. We have also been supplied with their age upon first taking office as PM.

Prime Minister (PM)            First Term Starts    Age
John A. MacDonald              1867                 52
Alexander Mackenzie            1873                 51
John Abbott                    1891                 70
John Sparrow Thompson          1892                 48
Mackenzie Bowell               1894                 70
Charles Tupper                 1896                 74
Wilfrid Laurier                1896                 54
Robert Borden                  1911                 57
Arthur Meighen                 1920                 46
William Lyon Mackenzie King    1921                 47
Richard Bennett                1930                 60
Louis St-Laurent               1948                 66
John Diefenbaker               1957                 61
Lester Pearson                 1963                 65
Pierre Trudeau                 1968                 48
Joe Clark                      1979                 39
John Turner                    1984                 55
Brian Mulroney                 1984                 45
Kim Campbell                   1993                 46
Jean Chretien                  1993                 59
Paul Martin                    2003                 65
Stephen Harper                 2006                 46

(a) Are we dealing with a population or a sample? Explain.
(b) Using technology draw a histogram showing the distribution of ages for PMs first taking office. Use class widths of 5 starting at 35.
(c) Determine the mean PM age for first taking office.

(d) Determine the standard deviation.
(e) Determine the median.
(f) What can you conclude based on the histogram and standard deviation?

3. Cholesterol is a waxy, fat-like substance found in all cells of the body. Our bodies need it to make hormones, vitamin D, and substances used in digestion. However, cholesterol, specifically low density lipoprotein (LDL) cholesterol, in high amounts is dangerous to one's health. The following chart looks at various cholesterol ranges and their classifications. The units of measure are millimoles per litre (mmol/L).

LDL Cholesterol Levels    Classification
below 2.6                 desirable
from 2.6 to 3.3           near optimal
from 3.4 to 4.1           borderline
from 4.2 to 4.9           high
above 4.9                 too high

Dr. Gillis is looking through the records for all her male patients over the last year who are between the ages of 50 and 60 years. They have all had blood work and she records all the LDL cholesterol levels for these patients in the chart below.

4.1 3.6 3.4 5.1 2.4 2.5 2.5 3.5 3.8 4.8 4.4 2.4 2.3 4.2 3.3 5.2 2.9 2.7 5.3 2.6 2.8 3.0 4.6 4.9 3.3 3.2 3.0 3.7 3.7 3.4

(a) Using technology draw a histogram showing the distribution of LDL cholesterol levels. Use class widths of 0.8 starting at 1.8.
(b) Determine the mean LDL cholesterol levels for Dr. Gillis' male patients between the ages of 50 and 60 years.
(c) Determine the standard deviation.
(d) Determine the median.
(e) What can you conclude based on the histogram and standard deviation?

Distributions

A frequency polygon is the shape that is formed when the midpoints of the tops of the bars on a histogram are joined by straight lines.

In this case, the frequency polygon forms a bell-shaped curve that is associated with a population that follows a normal distribution. Many variables observed in nature, including heights, weights, and reaction times, follow normal distributions. Consider the heights of female students at college. There are a few women who are less than 5 feet tall, a few who are taller than 6 feet, but the majority of the women are probably between 5’3” and 5’8”. We would expect a normal distribution for the heights of women attending college.

Let’s consider a population that results in a normal distribution. The normal curve will be centered about the population mean (µ). The standard deviation (σ) determines the extent to which the curve spreads out. If we look at the two normal distributions supplied below, we can see that both distributions are centered around the same value, 65. That means that the mean for both of these populations is 65. The standard deviations, although not supplied, are not the same. The standard deviation for normal distribution A must be lower than that for distribution B because curve A is narrower, meaning that its data points are more clustered around the mean. Please note that the horizontal axis is labeled x. This indicates that we are looking at the distribution of the individual data points denoted by the symbol x.

[Figure: normal distribution curves A and B, both centered at 65]

Do not assume that we have to have a perfectly symmetrical bell-shaped distribution to have a normal distribution. The histogram on the right would create a frequency polygon which is almost symmetrical, but we would still say that we are dealing with a normal distribution.

For this course, most of our time will be spent examining situations that follow normal distributions. However, it is important to understand that other types of distributions exist. These other types are shown below.

• A uniform distribution occurs when every class has equal frequency.
• A skewed distribution occurs when one tail is much larger than the other tail.
• A bimodal distribution occurs when two classes with the largest frequencies are separated by at least one class.

Uniform Distribution

Skewed Left Distribution

Skewed Right Distribution

Bimodal Distribution

Question

1. Based on the situation, what type of distribution (normal, uniform, bimodal,…) would you likely obtain?

(a) You randomly select 100 students at an elementary school and each must report their grade level. There are two classes at each grade level and between 22 and 26 students in each class. What would the distribution of grade levels look like?
Distribution Type: __________________

(b) Two groups of athletes are running the 100 m dash. One group is comprised of males 12 years of age or younger, and the other is comprised of males between 16 and 20 years of age. You randomly select 150 athletes and ask them to report their time for the 100 m dash. What would the distribution of times look like?
Distribution Type: __________________

(c) Mrs. Chopra teaches one of the three grade six classes. Normally the administration tries to distribute the strongest math students evenly between the three classes. That did not occur this year and now Mrs. Chopra has a large portion of strong math students in her class. If her class was asked to complete a fair math test, what would the distribution of marks look like?
Distribution Type: __________________

(d) You randomly select 100 females between the ages of 20 and 29 and record their heights. What would the distribution of heights look like?
Distribution Type: __________________

(e) A college instructor had what he described as an average class of students. From his perspective there were a few weak students and a few strong students, but the majority of the students were of average ability. He gave the class an extremely challenging test where only the strongest students could maintain good marks, ranging from 75% to 95%. The rest of the students did poorly, and many resoundingly failed the test. What would the distribution of marks for this test look like?
Distribution Type: __________________

(f) You spin the following spinner 300 times, recording how many times you obtain each of the results (1, 2, 3, 4). What would the distribution of results look like?
[Figure: spinner with sections labelled 1, 2, 3, and 4]
Distribution Type: __________________

(g) A nursing student working at the children's hospital looks at the birth weights of all babies born in the hospital during June, July, and August. What would the distribution of birth weights look like?
Distribution Type: __________________

(h) Eastern American Toads, common in Nova Scotia, enter the world as small dark polliwogs, become miniature toads, and finally mature to be adult toads. What would the distribution of ages for Eastern American Toads of all forms (polliwogs to adults) look like?
Distribution Type: __________________

(i) A personal trainer at a coed gym recorded the maximum resistance people would set on a particular piece of exercise equipment over a one month period. What would the distribution of resistance settings look like?
Distribution Type: __________________

(j) A kinesiologist is recording the grip strength of 250 randomly selected males between the ages of 25 and 35. What would the distribution of grip strengths look like?
Distribution Type: __________________

Normal Distributions and the 68-95-99.7 Rule

In the last section we learned about symmetrical bell-shaped distributions called normal distributions. We also mentioned that the normal curve will be centered about the population mean (µ), and that the standard deviation (σ) determines the extent to which the curve spreads out. Lower standard deviations result in taller, narrower curves. There is something else that is important to learn about the normal curve. It is the 68-95-99.7 rule. According to the 68-95-99.7 rule, in any bell-shaped distribution, the following holds true.

• Approximately 68% of the data points will lie within one standard deviation of the mean.
• Approximately 95% of the data points will lie within two standard deviations of the mean.
• Approximately 99.7% of the data points will lie within three standard deviations of the mean.

Let's describe this rule again using the proper symbols that we use for populations. According to the 68-95-99.7 rule, in any bell-shaped distribution of a population, the following holds true.

• Approximately 68% of the data points are between µ − σ and µ + σ.
• Approximately 95% of the data points are between µ − 2σ and µ + 2σ.
• Approximately 99.7% of the data points are between µ − 3σ and µ + 3σ.

Let’s see how this rule applies to a population with a normal distribution where the population mean ( µ ) is 40 and the population standard deviation (σ ) is 10. This distribution is shown below. Notice that it is centered about the mean.

For this population we would expect that approximately 68% of the data points would be between 30 (µ − σ or 40 − 10) and 50 (µ + σ or 40 + 10). We would expect that approximately 95% of the data points would be between 20 (µ − 2σ) and 60 (µ + 2σ). Finally we would expect that approximately 99.7% of the data points would be between 10 (µ − 3σ) and 70 (µ + 3σ).
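The three intervals can be generated, and the rule's percentages compared against an exact normal model, with a short Python sketch. This is an illustration under stated assumptions rather than part of the original unit: the function name `rule_intervals` is hypothetical, and `NormalDist` from the standard `statistics` module stands in for a perfectly normal population.

```python
from statistics import NormalDist

def rule_intervals(mu, sigma):
    """Intervals within 1, 2 and 3 standard deviations of the mean."""
    return [(mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)]

mu, sigma = 40, 10
population = NormalDist(mu, sigma)   # idealized normal population

for (low, high), rule_percent in zip(rule_intervals(mu, sigma), (68, 95, 99.7)):
    exact = population.cdf(high) - population.cdf(low)
    print(f"{low} to {high}: rule says about {rule_percent}%, "
          f"normal model gives {exact:.2%}")
```

The small differences between the two columns are why the rule is stated with the word "approximately".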

Let's take what we just learned and expand upon it. Consider the following statements for a normal population.

• If 68% of the data points are found between µ − σ and µ + σ, then 34% of the data points would be between µ and µ + σ.
• If 68% of the data points are found between µ − σ and µ + σ, then 34% of the data points would be between µ − σ and µ.

If we extend this line of thinking, we can state the following.

• If 95% of the data points are found between µ − 2σ and µ + 2σ, then 47.5% of the data points would be between µ and µ + 2σ.
• If 95% of the data points are found between µ − 2σ and µ + 2σ, then 47.5% of the data points would be between µ − 2σ and µ.
• If 99.7% of the data points are found between µ − 3σ and µ + 3σ, then 49.85% of the data points would be between µ and µ + 3σ.
• If 99.7% of the data points are found between µ − 3σ and µ + 3σ, then 49.85% of the data points would be between µ − 3σ and µ.

Hopefully it makes sense that 50% of the data points should be above the mean, and 50% of the data points must be below the mean. It should also be noted that these values (68%, 95%, 99.7%, 34%, 47.5%,…) can be expressed as probabilities. Probability is the chance that something will happen, that is, how likely it is that some event will occur. Referring back to our normal distribution, there is a 0.68 probability that a randomly selected data point can be found within one standard deviation of the mean (i.e. from µ − σ to µ + σ).

[Figure: normal curve with 68% of the area between µ − σ and µ + σ, split into 34% on each side of µ]

Example 1
For a normal population with a mean of 15 and standard deviation of 2, what percentage of the data points would measure
(a) between 15 and 19?
(b) between 13 and 21?
(c) between 11 and 13?

Answers:
(a) This question could be restated. It would read, “What percentage of the data points would be between µ and µ + 2σ?” (Reason: 15 is µ, and 19 is 2σ to the right of µ.)

[Figure: normal curve with 47.5% of the area between 15 (µ) and 19 (µ + 2σ)]

Therefore approximately 47.5% of the data points will be between 15 and 19.

(b) This question could be restated. It would read, “What percentage of the data points would be between µ − σ and µ + 3σ?”

[Figure: normal curve with 34% of the area between 13 (µ − σ) and 15 (µ), and 49.85% between 15 (µ) and 21 (µ + 3σ)]

Therefore approximately 83.85% (34% + 49.85%) of the data points will be between 13 and 21.

(c) This question could be restated. It would read, “What percentage of the data points would be between µ − 2σ and µ − σ?”

[Figure: normal curve with 47.5% of the area between 11 (µ − 2σ) and 15 (µ), of which 34% lies between 13 (µ − σ) and 15 (µ)]

Therefore approximately 13.5% (47.5% − 34%) of the data points will be between 11 and 13.

Example 2
The quality control officer at a cereal factory knows that the mean weight for the cereal in their regular size box is 461 grams with a standard deviation of 6 grams.
(a) What is the probability of randomly choosing a cereal box off the assembly line that weighs between 461 grams and 467 grams?
(b) What is the probability of randomly choosing a cereal box off the assembly line that weighs between 455 grams and 479 grams?
(c) What is the probability of randomly choosing a cereal box off the assembly line that weighs between 443 grams and 449 grams?
(d) What is the probability of randomly choosing a cereal box off the assembly line that weighs more than 455 grams?
(e) If we randomly chose 800 boxes, how many would we expect to be between 449 grams and 473 grams?

Answers:
(a) Attack this logically.
• We were told that µ is 461, and that σ is 6.
• We were told that we are dealing with boxes between 461 and 467 grams. Notice that 467 is 6 (or one standard deviation) away from 461 (µ). That means that 467 is actually µ + σ.
• Let's find the percentage of data points that are between µ and µ + σ. The answer is 34%.
• Now convert that percentage to a probability. The probability is 0.34.

(b) Think logically.
• 455 is one standard deviation to the left of the mean, and therefore can be expressed as µ − σ.
• 479 is three standard deviations to the right of the mean and therefore can be expressed as µ + 3σ.
• We actually need to find the percentage of boxes that are between µ − σ and µ + 3σ.
• We know that 34% of the data points are between µ − σ and µ. We also know that 49.85% of the data points are between µ and µ + 3σ. Therefore we can conclude that 83.85% (34% + 49.85%) of the data points are between µ − σ and µ + 3σ.
• Convert 83.85% to a probability of 0.8385. Based on this number, we can say that there is a very high chance that a randomly selected cereal box will have a weight between 455 g and 479 g.

(c) Think logically.
• 443 is three standard deviations to the left of the mean, and therefore can be expressed as µ − 3σ.
• 449 is two standard deviations to the left of the mean, and therefore can be expressed as µ − 2σ.
• We actually need to find the percentage of boxes that are between µ − 3σ and µ − 2σ.
• We know that 49.85% of the data points are between µ − 3σ and µ. We also know that 47.5% of the data points are between µ − 2σ and µ. Therefore we can conclude that 2.35% (49.85% − 47.5%) of the data points are between µ − 3σ and µ − 2σ.
• Convert 2.35% to a probability of 0.0235. Based on this number, we can say that there is a very slight chance that a randomly selected cereal box will have a weight between 443 g and 449 g.

(d) Think logically.
• 34% of the data points are between 455 (µ − σ) and 461 (µ).
• 50% of the data points are greater than 461 (µ).
• Therefore 84% of the data is greater than 455. This gives us a probability of 0.84.

(e) The number 449 is µ − 2σ. The number 473 is µ + 2σ. We know that 95% of the data points should be within two standard deviations to the left and right of the mean. As a probability, it is expressed as 0.95.

0.95 × 800 = 760

Of the 800 randomly selected cereal boxes, we would expect 760 boxes to be between 449 g and 473 g.
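The reasoning used in parts (a) to (e) can be sketched in Python as a lookup of the half-areas from the 68-95-99.7 rule. This is an illustrative sketch, not part of the original unit: the names `HALF_AREA`, `signed_percent_from_mean`, and `percent_between` are hypothetical, and the code only accepts values that sit exactly 0, 1, 2, or 3 standard deviations from the mean, just like the technique in this section.

```python
import math

# Percentage of data between the mean and k standard deviations away
# (half of 68%, 95% and 99.7%).
HALF_AREA = {0: 0.0, 1: 34.0, 2: 47.5, 3: 49.85}

def signed_percent_from_mean(x, mu, sigma):
    """Percentage of data between mu and x; negative when x is below mu."""
    if math.isinf(x):
        return math.copysign(50.0, x)        # an entire half of the curve
    k = round((x - mu) / sigma, 9)           # number of standard deviations
    if abs(k) not in HALF_AREA:
        raise ValueError("x must be 0, 1, 2 or 3 standard deviations from the mean")
    return math.copysign(HALF_AREA[abs(k)], k)

def percent_between(low, high, mu, sigma):
    """Percentage of data between low and high under the 68-95-99.7 rule."""
    return signed_percent_from_mean(high, mu, sigma) - signed_percent_from_mean(low, mu, sigma)

mu, sigma = 461, 6
print(round(percent_between(461, 467, mu, sigma), 2))             # part (a)
print(round(percent_between(455, 479, mu, sigma), 2))             # part (b)
print(round(percent_between(443, 449, mu, sigma), 2))             # part (c)
print(round(percent_between(455, math.inf, mu, sigma), 2))        # part (d)
print(round(percent_between(449, 473, mu, sigma) / 100 * 800))    # part (e), number of boxes
```

Treating areas to the left of the mean as negative lets one subtraction handle every case: intervals that straddle the mean add their two half-areas, while intervals on one side of the mean subtract them.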

Questions

1. Use the 68-95-99.7 rule on a distribution of data points with a population mean of 230 and a population standard deviation of 15 to answer the following questions. You may wish to draw and label a normal distribution curve to assist you with each of these questions. This is what we did in Example 1.

(a) What percentage of the data points would measure between 215 and 245?
(b) What percentage of the data points would measure between 230 and 260?
(c) What percentage of the data points would measure between 215 and 230?
(d) What percentage of the data points would measure between 185 and 230?
(e) What percentage of the data points would measure between 200 and 245?
(f) What percentage of the data points would measure between 215 and 275?
(g) What is the probability that a randomly selected data point would be between 185 and 260?
(h) What is the probability that a randomly selected data point would be between 245 and 260?
(i) What is the probability that a randomly selected data point would be between 185 and 200?
(j) What is the probability that a randomly selected data point would be between 245 and 275?
(k) What is the probability that a randomly selected data point would be less than 245?
(l) What is the probability that a randomly selected data point is greater than 200?
(m) What is the probability that a randomly selected data point is less than 215?

2. A company monitored the production of 2000 bagels for a one day period. They determined that the mean weight (population mean) of the bagels was 104 grams with a standard deviation of 3 grams. Assume the distribution of bagel weights is bell-shaped. You may choose to draw and label a normal distribution curve to assist you with each of these questions.

(a) How many of the 2000 bagels were within 9 grams of the mean?
(b) How many of the 2000 bagels were within 3 grams of the mean?
(c) How many of the 2000 bagels are between 98 grams and 104 grams?
(d) How many of the 2000 bagels are between 101 grams and 110 grams?
(e) How many of the 2000 bagels are between 107 grams and 110 grams?
(f) How many of the 2000 bagels are between 98 grams and 110 grams?
(g) How many of the 2000 bagels are between 95 grams and 101 grams?
(h) How many of the 2000 bagels are between 98 grams and 113 grams?
(i) How many of the 2000 bagels are between 95 grams and 104 grams?
(j) How many of the 2000 bagels are between 110 grams and 113 grams?
(k) How many of the 2000 bagels are less than 98 grams?

Z-Score

In the last section, the problems used numbers that were always 1, 2, or 3 standard deviations from the mean. For example, in question 1 (e), we were told that the population mean was 230 and the population standard deviation was 15, and then we were asked to find the percentage of the data points that were between 200 and 245. The number 200 is exactly two standard deviations below the mean, while the number 245 is exactly one standard deviation above the mean. What if we were asked to find the percentage of data points between 197 and 251? These two values are not 1, 2, or 3 standard deviations from the mean; rather, they are located some fractional number of standard deviations away from the mean. Because of this, the technique that we learned in the previous section will not work. We need another approach: z-scores.

In statistics, the z-score (also called the standard score) indicates how many standard deviations a data point is above or below the mean. It is found using the following formula.

$$z = \frac{x - \mu}{\sigma}$$

where x is the data point (also called an observation or raw value), µ is the population mean, and σ is the population standard deviation.

Example 1
A population, which results in a bell-shaped distribution, has a mean of 26.1 and standard deviation of 2.3. How many standard deviations from the mean is each of these data points?
(a) 28.9
(b) 24.7

Answers:

(a)
$$z = \frac{x - \mu}{\sigma} = \frac{28.9 - 26.1}{2.3} = 1.22$$

The data point 28.9 is 1.22 standard deviations from the mean of 26.1. The z-score is positive because the data point is larger than the mean (i.e. to the right of the mean).

(b)
$$z = \frac{x - \mu}{\sigma} = \frac{24.7 - 26.1}{2.3} = -0.61$$

The data point 24.7 is 0.61 standard deviations from the mean of 26.1. The z-score is negative because the data point is smaller than the mean (i.e. to the left of the mean).

What we have just learned regarding z-scores is not yet enough to answer questions like the one introduced at the beginning of this section.

Original Question: A population, which results in a bell-shaped distribution, has a mean of 230 and standard deviation of 15. What percentage of data points would be between 197 and 251?
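The z-score calculation is easy to check with a few lines of code. The following is a minimal Python sketch, not part of the original resource; the function name z_score is our own choice. It reproduces the two answers from Example 1 and then computes the z-scores for 197 and 251 from the opening question (mean 230, standard deviation 15).

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies above (+) or below (-) mu."""
    return (x - mu) / sigma


# Example 1: population mean 26.1, standard deviation 2.3
print(round(z_score(28.9, 26.1, 2.3), 2))   # 1.22
print(round(z_score(24.7, 26.1, 2.3), 2))   # -0.61

# Opening question: population mean 230, standard deviation 15
print(round(z_score(197, 230, 15), 2))      # -2.2
print(round(z_score(251, 230, 15), 2))      # 1.4
```

So 197 sits 2.2 standard deviations below the mean and 251 sits 1.4 standard deviations above it.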


Using the z-score we can now determine how many standard deviations the data points 197 and 251 are away from the mean, 230. This, however, does not tell us the percentage of data points that are between 197 and 251. We need to learn about area under the standard normal curve.

The mathematics necessary to understand how one determines the area under the standard normal curve is well beyond the scope of this course. At this level all we need to know is that the standard normal curve is centered at 0 (i.e. has a mean of 0), has a standard deviation of 1, that the total area under this curve is equal to 1, and that the area over an interval is equal to the probability that a randomly selected data point falls within that interval. We use the standard normal curve to understand other populations that are normally distributed, even though these populations have different means and standard deviations.

Standard Normal Curve: µ = 0, σ = 1, Area Under the Complete Curve = 1

If we look at the first diagram of the standard normal curve, we notice that we have gone 2 standard deviations to the left and right of the mean (represented by the -2.0 and 2.0). The area under the curve within this interval (i.e. the shaded region on the diagram) is 0.9544. This area is equivalent to the probability that a randomly selected data point falls within that interval. This makes sense when we remember that we had already learned that there is a 95% chance that a randomly selected data point is within two standard deviations of the mean.

[Diagram: standard normal curve shaded from -2.0 to 2.0, Area = 0.9544]

If we look at the next diagram, we have gone 1.2 standard deviations to the left of the mean and 1.6 standard deviations to the right of the mean on the standard normal curve. In this case, the area under the curve in that interval is 0.8301. That means that there is a 0.8301 probability that a randomly selected data point will fall within that interval.

[Diagram: standard normal curve shaded from -1.2 to 1.6, Area = 0.8301]

In the last two diagrams, we supplied the areas under the curves in the defined intervals, but how do we determine these areas when they are not supplied? We have to use a chart and a procedure that is identical to what we used in the last section. The chart allows us to determine areas/probabilities from a specific number of standard deviations to the mean. The easiest way to show how to use the chart is through worked examples.
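The two areas quoted above can be cross-checked with software instead of a chart. The sketch below is an illustration only, not part of the original resource: it assumes Python 3.8 or later, whose standard library provides statistics.NormalDist, and the helper name area_between is our own.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)


def area_between(z_low, z_high):
    """Area under the standard normal curve between two z values."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)


print(round(area_between(-2.0, 2.0), 4))   # 0.9545
print(round(area_between(-1.2, 1.6), 4))   # 0.8301
```

The first value prints as 0.9545 rather than 0.9544 because the chart rounds the area for each half (0.4772) before the two halves are added.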


Example 2
A population, which results in a bell-shaped distribution, has a mean of 250 and standard deviation of 30. What is the probability that a measurement from a randomly selected item is between 250 and 272?

Answer:
Start by considering the interval from 250 to 272. The 250 is equivalent to the population mean (µ). The 272 is 22 units to the right of the mean; we need to determine how many standard deviations this value (272) is away from the mean. This is when we use z-scores.

$$z = \frac{x - \mu}{\sigma} = \frac{272 - 250}{30} = 0.73$$

We can now rephrase the original question. We are really trying to find the probability that a randomly selected data point is between µ and µ + 0.73σ.

Now let's put this in the context of our standard normal curve. Remember that on our standard normal curve, the mean is 0 and the standard deviation is 1. We are going to find the area under this curve from 0 (µ) to 0.73 (µ + 0.73σ). The area under this curve in this interval has been shaded on our diagram. We can use our knowledge of the standard normal curve to understand other populations that are normally distributed, even though these populations have different means and standard deviations. The area under our standard normal curve from 0 to 0.73 is equivalent to the area under our original normal distribution from µ (250) to µ + 0.73σ (272).

To find the area under our standard normal curve, we go to the Areas Under the Standard Normal Curve chart found in the back of this resource (page 96). We have reproduced a portion of this chart below. We work with the row labeled 0.7 and the column labeled 0.03 (Reason: 0.7 + 0.03 = 0.73). We find that this row and column intersect at 0.2673.

z    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0  0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1  0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
...
0.7  0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
...


That means that the area under the standard normal curve between 0 (µ) and 0.73 (µ + 0.73σ) is 0.2673. In terms of our original normal distribution, it means that there is a 0.2673 probability that a randomly selected data point will be between 250 (µ) and 272 (µ + 0.73σ).
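The chart lookup in Example 2 can be imitated in code. This is a hedged Python sketch rather than the method used in this resource: the helper chart_area is a hypothetical name, and it relies on statistics.NormalDist from the Python standard library (version 3.8 or later) to compute what the chart stores, namely the area from 0 to a given z value rounded to four decimal places.

```python
from statistics import NormalDist


def chart_area(z):
    """Area under the standard normal curve from 0 to |z|, to 4 decimal places."""
    return round(NormalDist().cdf(abs(z)) - 0.5, 4)


# Example 2: mean 250, standard deviation 30, upper limit 272
z = round((272 - 250) / 30, 2)   # z-score rounded to two decimals, as in the text
print(z)               # 0.73
print(chart_area(z))   # 0.2673
```

Because the curve is symmetric, the helper takes the absolute value of z, which mirrors how the same chart is used for values below the mean.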

Example 3
Data for a population was normally distributed with a mean of 167 and standard deviation of 18. What is the probability that a randomly selected data point from this population is between 144 and 181?

Answer:
This question is more challenging than the last one because neither of the values supplied (144 or 181) is the population mean. The lower limit, 144, is below the mean, while the upper limit, 181, is above the mean.

We need to find out how far below and above the mean these two values are, in terms of standard deviations. That means we need to work out the z-scores.

$$z = \frac{x - \mu}{\sigma} = \frac{144 - 167}{18} = -1.28$$

$$z = \frac{x - \mu}{\sigma} = \frac{181 - 167}{18} = 0.78$$

Our question can now be rephrased as "What is the probability that a randomly selected data point from this population is between µ − 1.28σ and µ + 0.78σ?"

To tackle this, we need to work with the standard normal curve and have to break the question into parts. We start by finding the area/probability on our standard normal curve from -1.28 (µ − 1.28σ) to 0 (µ), then find the area/probability from 0 (µ) to 0.78 (µ + 0.78σ), and finally we add the two areas/probabilities.

Area/Probability between µ − 1.28σ and µ (Find 1.28 on the chart.)

z    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0  0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1  0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
...
1.2  0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015


Area/Probability between µ and µ + 0.78σ (Find 0.78 on the chart.)

z    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0  0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1  0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
...
0.7  0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852

0.3997 + 0.2823 = 0.6820

For the standard normal curve, the area from -1.28 to 0.78 is 0.6820.

In terms of our original normal distribution, there is a 0.6820 probability that a randomly selected data point from this population is between 144 (µ − 1.28σ) and 181 (µ + 0.78σ).

The Different Cases

The biggest struggle with these questions is the determination of the areas, since the chart only shows areas from 0 (µ) to the specified z value. There are five different cases we may encounter, two of which we have already examined in Examples 2 and 3.

Case 1
This occurs when we need to find the area/probability between a given z value and 0 (µ). With these questions we simply use the chart once. This is what we did in Example 2.

Case 2
This occurs when we need to find the area/probability between two given z values that are on either side of 0 (µ). With these questions, we find two separate areas/probabilities and add them together. This is what we did in Example 3.

[Diagram: required area = area on the left of 0 + area on the right of 0]


Case 3
This occurs when we need to find the area/probability between two given z values that are on the same side of 0 (µ). With these questions, we find two separate areas/probabilities and subtract the smaller from the larger.

[Diagram: required area = larger area from 0 - smaller area from 0]

Case 4
This occurs when we need to find the area/probability to the right of a positive z value, or to the left of a negative z value. With these questions, we take the area to the right (or left) of 0 (this area is always equal to 0.5 because it is half the area of our standard normal curve) and subtract the area from 0 to the z value.

[Diagram: required area = 0.5 - area from 0 to the z value]

Case 5
This occurs when we need to find the area/probability to the right of a negative z value, or to the left of a positive z value. With these questions, we take the area to the right (or left) of 0 (this area is always equal to 0.5 because it is half the area of our standard normal curve) and add the area from 0 to the z value.

[Diagram: required area = 0.5 + area from 0 to the z value]
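The five cases can be summarized as three small functions. This is only a sketch under stated assumptions, not part of the original resource: all function names are our own, and chart_area stands in for the printed chart by using statistics.NormalDist (Python 3.8 or later) and rounding to four decimal places. Each branch is labeled with the case it implements.

```python
from statistics import NormalDist


def chart_area(z):
    """Stand-in for the chart: area from 0 to |z|, rounded to 4 decimal places."""
    return round(NormalDist().cdf(abs(z)) - 0.5, 4)


def area_between(z1, z2):
    """Area between two z values, built only from chart areas."""
    low, high = sorted((z1, z2))
    if low <= 0 <= high:
        # Case 1 (one value is 0) and Case 2 (values on either side of 0): add
        return round(chart_area(low) + chart_area(high), 4)
    # Case 3 (both values on the same side of 0): subtract smaller from larger
    return round(abs(chart_area(high) - chart_area(low)), 4)


def area_left_of(z):
    """Area to the left of z."""
    if z >= 0:
        return round(0.5 + chart_area(z), 4)   # Case 5: left of a positive z
    return round(0.5 - chart_area(z), 4)       # Case 4: left of a negative z


def area_right_of(z):
    """Area to the right of z."""
    if z >= 0:
        return round(0.5 - chart_area(z), 4)   # Case 4: right of a positive z
    return round(0.5 + chart_area(z), 4)       # Case 5: right of a negative z


print(area_between(0, 0.73))       # 0.2673  (Example 2, Case 1)
print(area_between(-1.28, 0.78))   # 0.682   (Example 3, Case 2)
```

Keeping the chart lookup in one helper means each case differs only in whether the areas are added, subtracted, or combined with 0.5.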


Example 4
Porphyrin is a pigment in blood protoplasm. In the population of healthy adults, the concentration of porphyrin is normally distributed with mean µ = 38 mg/dL and standard deviation σ = 12 mg/dL.
(a) What is the probability that a randomly selected healthy adult would have a porphyrin concentration between 43 mg/dL and 54 mg/dL?
(b) What is the probability that a randomly selected healthy adult would have a porphyrin concentration less than 47 mg/dL?

Answers:
(a) Both 43 and 54 are above the mean (38). We need to find out how far above the mean these two values are, in terms of standard deviations. That means we need to determine the z-scores.

$$z = \frac{x - \mu}{\sigma} = \frac{43 - 38}{12} = 0.42$$

$$z = \frac{x - \mu}{\sigma} = \frac{54 - 38}{12} = 1.33$$

Based on this work the question can be rephrased. "What is the probability that a randomly selected healthy adult would have a porphyrin concentration between µ + 0.42σ and µ + 1.33σ?"

Now let's put this in the context of our standard normal curve. We need to find the area under the curve (which is equivalent to the probability) from 0.42 to 1.33. Notice that both of these values are to the right of 0 (µ on our standard normal curve). That means that we are dealing with Case 3.

• Find the area/probability from 0 to 0.42. From the chart we find that the answer is 0.1628.

• Find the area/probability from 0 to 1.33. From the chart we find that the answer is 0.4082.

• Now subtract the two areas/probabilities. 0.4082 - 0.1628 = 0.2454

There is a 0.2454 probability that a randomly selected healthy adult would have a porphyrin concentration between 43 mg/dL and 54 mg/dL.

(b) We start by finding how far 47 is above the mean (38) in terms of standard deviations.

$$z = \frac{x - \mu}{\sigma} = \frac{47 - 38}{12} = 0.75$$

The question can now be rephrased. "What is the probability that a randomly selected healthy adult would have a porphyrin concentration less than µ + 0.75σ?"

Let's put this in the context of our standard normal curve. We need to find the area under the curve (which is equivalent to the probability) below 0.75. Notice we are trying to find the area under the curve to the left of a positive z value; this is Case 5.

• Find the area/probability less than 0. It is always 0.5 because we are dealing with exactly half of our standard normal curve.

• Find the area/probability from 0 to 0.75. From the chart we find that the answer is 0.2734.

• Now add the two areas/probabilities. 0.5 + 0.2734 = 0.7734

There is a 0.7734 probability that a randomly selected healthy adult would have a porphyrin concentration less than 47 mg/dL.

Checking Your Answers on the TI-83 or TI-84 (Optional)

The normalcdf command (normal cumulative distribution function command) allows one to determine the probability that a data point will fall within an interval for a known normal distribution. This command is found using the DISTR button.

normalcdf(lower limit, upper limit, mean, standard deviation)

In part (a) of Example 4, we wanted to find the probability that a randomly selected healthy adult would have a porphyrin concentration between 43 mg/dL and 54 mg/dL. To do this we enter normalcdf(43, 54, 38, 12) into the calculator. It generates the probability 0.2472. This is very close to the 0.2454 we worked out by hand. The calculator actually produced a more accurate answer because we had to round off our z-scores to two decimal places when working things out by hand.

For questions where there is only one endpoint, it is recommended that one go 5 (or more) standard deviations above or below the mean to supply the missing limit. This happened in part (b) of Example 4 where we had to find the probability that a randomly selected healthy adult would have a porphyrin concentration less than 47 mg/dL. Five standard deviations below the mean is -22 (38 - 5×12). We would enter normalcdf(-22, 47, 38, 12) into the calculator. It generates the probability 0.7734.
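Learners without a TI-83 or TI-84 can run the same check in software. The sketch below is an assumption on our part, not a feature of this resource: it defines a Python function that we have named normalcdf to mirror the calculator command's argument order, built on statistics.NormalDist from the Python standard library (version 3.8 or later).

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean, sd):
    """Probability that a normally distributed value lies between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Example 4 (a): between 43 and 54 mg/dL
print(round(normalcdf(43, 54, 38, 12), 3))    # 0.247

# Example 4 (b): less than 47 mg/dL, using 5 standard deviations below the mean
print(round(normalcdf(-22, 47, 38, 12), 4))   # 0.7734
```

Both results agree with the calculator values quoted above (0.2472 and 0.7734) to the number of decimal places shown.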


Questions

1. A population, which results in a bell-shaped distribution, has a mean of 42.7 and standard deviation of 7.9. How many standard deviations from the mean is each of these data points?

(a) 37.6
(b) 53.2

2. It may surprise you, but professors at universities do not spend all their time teaching graduate and undergraduate students. A significant amount of time is spent on research. So what percentage of time do professors spend teaching and on teaching-related activities? The NEA Almanac of Higher Education reported that the mean percentage of time spent on teaching activities is about 51% with a standard deviation of 25%. If we are dealing with a bell-shaped distribution, determine the z-scores corresponding to the following professors' percentage of time devoted to teaching activities.

(a) Dr. B. Pletner, 68%
(b) Dr. R. Dawson, 43%

3. An NSCC instructor examined the results from a common exam offered at all campuses. She discovered that the marks were normally distributed. She calculated the z-scores for her six learners. These are shown below.

Tylena, 0.93 Meera, -0.42 Elliott, 1.27 Hamid, -1.13 Beverly, 0.00 Marcus, 0.58

(a) Which of these learners scored above the mean?
(b) Which of these learners scored below the mean?
(c) Which of these learners scored on the mean?
(d) Which of her learners obtained the best mark? Based on the information provided, can you determine the mark?
(e) Can you tell if every one of her learners passed the test? Explain.


4. The concentration of red blood cells in whole blood is measured in millions per cubic millimetre. Within the population of healthy females, the red blood cell concentration is normally distributed with a mean of 4.8 million/mm³ and a standard deviation of 0.3 million/mm³.

(Hint: Each of these five questions corresponds to one of the five cases we described earlier for area under the standard normal curve. You may wish to draw the standard normal curve as was done in the worked examples to assist you with each part of this question.)

(a) What is the probability that a randomly selected healthy female would have a red blood cell concentration between 4.8 and 5.3 million/mm³?

(b) What is the probability that a randomly selected healthy female would have a red blood cell concentration between 4.4 and 5.0 million/mm³?

(c) What is the probability that a randomly selected healthy female would have a red blood cell concentration between 5.2 and 5.5 million/mm³?

(d) What is the probability that a randomly selected healthy female would have a red blood cell concentration less than 4.6 million/mm³?

(e) What is the probability that a randomly selected healthy female would have a red blood cell concentration greater than 4.3 million/mm³?


5. A community examined the response times of their police department over a three year period. They discovered that the distribution of response times was bell-shaped and that the mean response time was 8.2 minutes with a standard deviation of 1.9 minutes. For a randomly received emergency call to the police department in that three year period, what is the likelihood that the response time will be:

(a) greater than 8.2 minutes? (b) between 6.0 and 8.2 minutes? (c) less than 9.3 minutes? (d) between 6.4 and 7.7 minutes? (e) between 4.2 and 8.8 minutes? (f) greater than 9.7 minutes?


6. A consumer magazine reports that the average life of a refrigerator before replacement is 14 years with a standard deviation of 2.5 years. Assume that the distribution of refrigerator life spans is approximately normal. What is the probability that someone will keep a refrigerator:

(a) between 11 years and 16 years? (b) greater than 15 years? (c) less than 14 years? (d) between 10 years and 13 years? (e) greater than 12 years? (f) between 8 years and 14 years?


Growth Charts

One of the most common uses of standard deviations is in the production of growth charts used in the health sciences. These charts show the wide range of values for a particular measurement (e.g. weight, height, head circumference) for different ages. Normally we would use standard deviation to describe the spread of these measurements, but many growth charts use percentiles. Although the charts use percentiles, it is important to note that standard deviations were used in the construction of these percentiles. Each standard deviation represents a fixed percentile. For example, −3σ is the 0.13th percentile, −2σ the 2.28th percentile, −1σ the 15.87th percentile, 0σ the 50th percentile, +1σ the 84.13th percentile, +2σ the 97.72nd percentile, and +3σ the 99.87th percentile. You are not expected to know these values. Growth charts don't use percentiles like 0.13, 2.28 or 15.87; rather, they use whole numbers like 3, 5, 10, 25, and so on.

[Figure credit: Source: Wikimedia Commons, Author: Mwtoews]

Percentiles rank the position of an individual by indicating what percent of the reference population the individual would equal or exceed. For example, on the weight growth charts, a 30-month-old boy whose weight is at the 25th percentile weighs the same as or more than 25 percent of the reference population of 30-month-old boys, and weighs less than 75 percent of the 30-month-old boys in the reference population.

It is important to understand that the growth charts are best used to follow a child's growth over time or to find a pattern of his/her growth. Should one be concerned if a child is consistently in a low percentile for a particular measure? For example, should a parent be concerned if from the ages of 10 months to 32 months their girl ranks between the 5th and 10th percentile for weight? The answer is no; she is exhibiting normal growth. Should one be concerned with a sudden drop or sudden increase in a percentile value for a particular measure? For example, should a parent be concerned if their son dropped from the 90th percentile for weight at the age of 6 months to the 25th percentile at the age of 12 months? The answer is yes; such a large drop may indicate a problem.

On the growth charts we will be using, there are nine lines/curves. The bottom line represents the 3rd percentile and the top line represents the 97th percentile. The other lines from bottom to top are the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentile. We have included these growth charts in the appendix, found at the end of this resource. We will need to use these charts to answer all the questions in this section. All of these charts are from the 2000 CDC Growth Charts for the United States: Methods and Development (Kuczmarski RJ, Ogden CL, Guo SS, et al. 2000 CDC growth charts for the United States: Methods and development. National Center for Health Statistics. Vital Health Stat 11(246). 2002).

We should apologize ahead of time that we have only supplied growth charts for boys. The growth charts for boys are blue and those for girls are pink. Unfortunately charts in pink do not reproduce well in a black and white resource so we had to omit them.
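The fixed percentiles quoted earlier in this section for −3σ through +3σ come directly from the normal curve. As an illustration only (this code is not part of the original resource, and the function name percentile_at is our own), the following Python sketch recomputes them with statistics.NormalDist from the standard library (Python 3.8 or later).

```python
from statistics import NormalDist


def percentile_at(k):
    """Percentile (to 2 decimal places) of a value k standard deviations from the mean."""
    return round(NormalDist().cdf(k) * 100, 2)


for k in range(-3, 4):
    print(f"{k:+d} standard deviations: percentile {percentile_at(k)}")
```

The loop prints 0.13, 2.28, 15.87, 50.0, 84.13, 97.72 and 99.87, matching the values listed in the text.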

Example 1
Using the weight growth chart for boys, answer the following.
(a) In what percentile is a 3 month old boy weighing 12 pounds (or 5.44 kg)? What does this percentile mean?
(b) What weight would one expect for a four month old boy who is in the 75th percentile for weight?
(c) What range of weights would one expect for two month old boys who are between the 3rd and 97th percentile for weight?
(d) What range of ages would one expect for boys whose weights are 12 pounds yet stay within the 3rd and 97th percentile for their age?

Answers:

(a) On the vertical axis, find 12 pounds and on the horizontal axis, find 3 months. Plot the point (3, 12) on the coordinate system. This point lies on the fourth curve from the bottom (i.e. the 25th percentile curve). It means that this 3 month old, 12 pound boy weighs as much as or more than 25 percent of the boys of the same age.

(b) On the horizontal axis, find 4 months. Move up until we intersect the sixth curve from the bottom (i.e. the 75th percentile curve). This point corresponds with a weight of 16 pounds (or approximately 7.26 kg).


(c) A two month old boy in the 3rd percentile would only weigh approximately 8.8 pounds. A two month old boy in the 97th percentile weighs approximately 14.5 pounds. Therefore we would expect that weights between 8.8 pounds and 14.5 pounds would cover all two month old boys between the 3rd and 97th percentile.

(d) A one month old boy could weigh as much as 12 pounds if he is in the 97th percentile. A boy a little more than 4 months old could weigh as little as 12 pounds if he is in the 3rd percentile. Therefore, boys between 1 month and a little more than 4 months of age could weigh 12 pounds yet still be within the 3rd and 97th percentile for their age.

Questions

1. In what percentile for head circumference is a 12 month old boy with a head circumference of 46.2 cm? Explain what this percentile means.

2. In what percentile for length is a 31 month old boy with a length of 99 cm (or 39 inches)? Explain what this percentile means.


3. For each case, determine the percentile ranking.
(a) 33 month old boy, length = 36 inches
(b) 21 month old boy, weight = 31 pounds
(c) 30 month old boy, weight = 26 pounds
(d) 23 month old boy, head circumference = 19.5 inches
(e) 10 month old boy, length = 28.5 inches
(f) 33 month old boy, head circumference = 19.75 inches (or approximately 51 cm)
(g) 10 month old boy, weight = 24.5 pounds (or approximately 11.3 kg)
(h) 28 month old boy, length = 33.5 inches (or approximately 86 cm)

4. For each case, determine the measure.
(a) What weight would one expect for a twelve month old boy who is in the 5th percentile for weight?
(b) What length would one expect for a 20 month old boy who is in the 50th percentile for length?
(c) What head circumference would one expect for a 10 month old boy who is in the 97th percentile for head circumference?

5. What range of lengths would one expect for 15 month old boys who are between the 3rd and 97th percentile for length?

6. What range of head circumferences would one expect for 30 month old boys who are between the 3rd and 97th percentile for head circumference?


7. What range of ages would one expect for boys whose lengths are 31 inches yet stay within the 3rd and 97th percentile for their age?

8. What range of ages would one expect for boys whose head circumferences are 16.25 inches yet stay within the 3rd and 97th percentile for their age?

9. What range of weights would one expect for 33 month old boys who are between the 25th and 75th percentile for weight?

10. What range of lengths would one expect for 22 month old boys who are between the 10th and 90th percentile for length?

11. What range of ages would one expect for boys whose weights are 21 pounds yet stay within the 5th and 90th percentile for their age?

12. What range of ages would one expect for boys whose lengths are 29 inches yet stay within the 25th and 75th percentile for their age?

13. Look at the weights of a particular boy over a 12 month period. Do you have concerns regarding his weight? Explain.

Months       0     2     4     6     8     10    12
Weight (kg)  4.55  5.89  6.80  7.58  7.82  8.16  8.42


Putting It Together

In this unit we looked at the following.

• Populations and Samples
• Categorical and Numerical Data
• Bar Graphs, Double Bar Graphs, Stacked Bar Graphs, Histograms, Circle Graphs and Line Graphs
• Mean, Trimmed Mean, Median, and Mode
• Box and Whisker Plots (with and without technology)
• Standard Deviation (with and without technology)
• Distributions (Normal, Skewed, Bimodal, Uniform)
• The 68-95-99.7 Rule for Normal Distributions
• Z-Scores
• Growth Charts

Questions:

1. The manager of the community sportsplex wanted to know how the 1386 members might feel about the discussion concerning an addition to the existing building that included a 25 metre, 8 lane pool. He asked 230 randomly selected members if they were willing to pay an additional $35 a year on their membership fee to have these new features. Describe the population and the sample for this situation.

2. For each of the following, state whether the data collection would result in a categorical data set or numerical data set. If the data is numerical, indicate whether we are dealing with discrete or continuous data.

(a) The number of pets in Nova Scotian households

(b) The type of MP3 player owned by adults.

(c) The diameter of the trunk of spruce trees growing in a particular valley.

(d) The size of T-shirts worn by boys between the ages of 16 and 18 years

(e) The number of children traveling more than 1.5 kilometres to school.

(f) The time to complete a driver’s license renewal at a specific Access Nova Scotia location


3. The 5-year survival rates for six different types of cancers have been supplied in the graph below.

[Graph: Survival Rate % (vertical axis, 0 to 100) for Prostate, Skin Melanoma, Breast, Colorectal, Ovary, and Brain cancers, with one series for 1992 to 1994 and one series for 2004 to 2006]

Source: Canadian Cancer Registry

(a) What was the approximate survival rate for colorectal cancer between 1992 and 1994?
(b) What was the approximate survival rate for breast cancer between 2004 and 2006?
(c) By approximately how much did the survival rate for ovarian cancer improve from 1992-1994 to 2004-2006?
(d) If approximately 22 200 Canadian women were diagnosed with breast cancer in 2006, then how many are expected to survive?
(e) What type of graph (bar, double bar, stacked bar, circle,…) are we dealing with here?
(f) Can you conclude that there were fewer cases of brain cancer than prostate cancer based on this graph? Why or why not?


4. A major fast food chain that specializes in pizzas had all its stores report on the topping selected by every customer for their pizzas. This data was used to construct the circle graph below. It is also important to know that this chain sold 564 000 pizzas over a one year period across all of its establishments.

[Circle graph: pepperoni 42%, sausage 19%, vegetable 15%, mushroom 14%, other 6%, onions 4%]

(a) Are we dealing with a sample or a population? Explain.
(b) What percentage of customers ordered vegetables on their pizza?
(c) What percentage of customers ordered sausage and/or onion on their pizzas?
(d) What percentage of customers ordered sausage and onion on their pizzas?
(e) How many pizzas with pepperoni topping were sold during this year?
(f) How many pizzas with sausage and/or mushroom toppings were sold during this year?
(g) What is the ratio of pizzas with mushroom toppings to pizzas with pepperoni toppings?
(h) There were 107 160 pizzas with a particular topping. What topping was it?


5. The following graph shows the number of infant deaths in Canada from 1999 to 2007.

[Line graph: Number of Infant Deaths (vertical axis, 1,720 to 1,900) plotted against Year (horizontal axis, 1998 to 2008)]

Source: Statistics Canada

What are your thoughts regarding the scale used on the vertical axis of this line graph?

6. Below you have been provided with data tables. Indicate what type of graph (histogram, line, circle, bar, double bar, or stacked bar graph) you would use for this data.

(a)
Brand of Car   Canadian Market Share (Sept 2011)
Toyota         9.9%
GM             12.8%
Honda          8.1%
Ford           16.7%
Chrysler       15.8%
Volkswagen     4.8%
Hyundai        13.1%
Other          18.8%

Graph Type: ___________________

(b)
Canadian Police-reported Crimes
                    2008     2009
Impaired Driving    84 759   88 630
Abduction           464      429
Arson               13 270   13 372
Counterfeiting      1015     798
Theft over $5000    16 743   15 573
Fraud               90 932   90 623
Uttering Threats    78 500   78 407
Extortion           1385     1701

Graph Type: ___________________


(c)
Cause for Lateness       Frequency
Snoozing after Alarm     83
Car Problems             23
Missed Public Transit    47
Family Crisis            62
Stuck in Traffic         113
Other                    59

Graph Type: ___________________

(d)
Mean Amount of Sleep in Hours   Number of People
5 - 6                           26
6 - 7                           74
7 - 8                           103
8 - 9                           57
9 - 10                          21

Graph Type: ___________________

(e)
Time   Height of Projectile in Metres
0      2.0
1      22.1
2      32.4
3      32.9
4      23.6
5      4.5

Graph Type: ___________________

(f)
Department    Jan Profit ($)   Feb Profit ($)   Mar Profit ($)
Automotive    4045             5612             6289
Toys          2045             2549             3283
Electronics   6845             2248             1867
Sporting G.   2567             1217             1506
Footwear      4753             5608             6099
Men's         1598             2286             1894
Women's       3725             4589             4635

Graph Type: ___________________

7. An airline company randomly selected eighteen suitcases from domestic flights and recorded their weights in kilograms.

16.2 11.3 15.7 14.7 15.1 19.6 16.0 14.1 3.9 18.0 14.8 16.3 13.6 11.9 12.4 14.8 13.5 19.7

(a) Although the airline collected a sample, describe the population in this situation.
(b) Would a histogram or bar graph be used with this data set?
(c) Calculate the mean, median, mode, and 5% trimmed mean without using the STAT feature on a TI-83/84 calculator.


8. Mr. Tetford's and Mrs. Gatien's learners wrote the same math test. The test was out of 30. The results for the two classes are shown below.

Mr. Tetford's Class: 26 26 29 22 23 19 25 27 23 27 24 20 25

Mrs. Gatien's Class: 25 27 23 21 23 22 20 24 20 30 21 24 20 22

(a) Construct box and whisker plots for each set of data without using a graphing calculator.

[Number line for the plots, marked 5 10 15 20 25 30]

(b) What range of marks would place a learner in the top 50% of Mr. Tetford's class?
(c) What range of marks would place a learner in the bottom 25% of Mrs. Gatien's class?
(d) What range of marks would place a learner in the top 25% of Mrs. Gatien's class?
(e) How do the two classes compare in terms of marks on this math test?


9. A study looked at the concentration of iron in the bloodstream of ten randomly selected high performance female athletes. The following data was collected. The concentrations are measured in grams per decilitre (g/dl).

15.3 14.2 13.6 11.9 14.8 12.6 14.6 13.9 14.2 12.9

(a) Are we dealing with a population or a sample?
(b) Calculate the mean without using the STAT features on your calculator. Use the appropriate symbol.
(c) Calculate the standard deviation without using the STAT features on your calculator.

10. If you were collecting a random sample in each situation, what type of distribution (normal, uniform, bimodal, skewed) would you likely obtain?

(a) Hodgkin’s lymphoma is a type of cancer that originates from white blood cells. This disease typically affects people either in early adulthood or when they are 55 years of age or older. You randomly select 250 patients with Hodgkin’s lymphoma and ask them to report the age of their initial diagnosis. What would the distribution of ages likely look like?
Distribution Type: ___________________

(b) Most people make under $40,000 a year, but some make quite a bit more, with a smaller number making many millions of dollars a year. What would the distribution of yearly earnings likely look like?
Distribution Type: ___________________

(c) James is working as a biologist for the summer and measuring the circumferences of randomly selected maple trees in a natural growth forest. What would the distribution of circumferences likely look like?
Distribution Type: ___________________


(d) You use the random number generator on your calculator to find 500 random whole numbers between 1 and 10. What would the distribution of numbers likely look like?
Distribution Type: ___________________

11. The body mass indexes of all 6000 new recruits to the armed forces were taken. The mean was 23.0 kg/m² and the standard deviation was 2.5 kg/m². Assume that the distribution of body mass indexes was bell-shaped. (Hint: Use the 68-95-99.7% rule to solve these questions, rather than z-scores and the standard normal curve.)

(a) How many new recruits had body mass indexes between 23.0 kg/m² and 25.5 kg/m²?

(b) How many new recruits had body mass indexes between 18.0 kg/m² and 23.0 kg/m²?

(c) How many new recruits had body mass indexes between 15.5 kg/m² and 30.5 kg/m²?

(d) How many new recruits had body mass indexes between 20.5 kg/m² and 28.0 kg/m²?

(e) How many new recruits had body mass indexes between 18.0 kg/m² and 30.5 kg/m²?

(f) How many new recruits had body mass indexes between 15.5 kg/m² and 25.5 kg/m²?

(g) How many new recruits had body mass indexes between 25.5 kg/m² and 28.0 kg/m²?


(h) How many new recruits had body mass indexes between 15.5 kg/m² and 18.0 kg/m²?
(i) How many new recruits had body mass indexes greater than 23.0 kg/m²?
(j) How many new recruits had body mass indexes greater than 20.5 kg/m²?
(k) How many new recruits had body mass indexes less than 28.0 kg/m²?
(l) How many new recruits had body mass indexes greater than 25.5 kg/m²?
(m) How many new recruits had body mass indexes less than 18.0 kg/m²?

12. Data collected over the last 100 years indicates that the average daily temperature for a particular location in August is 26°C with a standard deviation of 3°C. If we are dealing with a bell-shaped distribution, determine the z-scores corresponding to each of these temperatures.

(a) 31°C
(b) 24°C


13. Scores on the Wechsler Adult Intelligence Scale (i.e. an IQ test) for 20 to 34 year old adults are approximately normal with a mean of 110 and a standard deviation of 25. For a randomly selected adult within that age group, determine (without using a graphing calculator) the likelihood that their IQ will be:

(a) between 104 and 128?
(b) between 80 and 110?
(c) greater than 110?
(d) less than 132?
(e) between 90 and 107?
(f) greater than 150?


14. In what percentile for head circumference is an 11 month old boy with a head circumference of 44.4 cm? Explain what this percentile means.

15. What weight would one expect for a 24 month old boy who is in the 25th percentile for weight?

16. What range of lengths would one expect for 28 month old boys who are between the 3rd and 97th percentile for length?

17. What range of ages would one expect for boys whose lengths are 25 inches yet stay within the 3rd and 97th percentile for their age?

18. What range of head circumferences would one expect for 25 month old boys who are between the 10th and 90th percentile for head circumference?


Areas Under the Normal Curve (z-Table)

The values inside the table represent the areas under the normal curve between 0 and a given z-score. For example, to determine the area under the curve between 0 and 1.37, look in the intersecting cell for the row labeled 1.3 and the column labeled 0.07. The area is 0.4147.

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359

0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753

0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141

0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517

0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879

0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224

0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549

0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852

0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133

0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389

1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621

1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830

1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015

1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177

1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319

1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441

1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545

1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633

1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706

1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767

2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817

2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857

2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890

2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916

2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936

2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952

2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964

2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974

2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981

2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986

3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
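
The table entries can also be checked numerically. The following is a minimal Python sketch, not part of the original unit: it assumes the table holds the standard normal area between 0 and z rounded to four decimal places, and the function names `area_zero_to_z` and `table_entry` are illustrative only.

```python
import math

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    # The cumulative area up to z is 0.5 * (1 + erf(z / sqrt(2))),
    # so the area from 0 to z is that value minus 0.5.
    return 0.5 * math.erf(z / math.sqrt(2))

def table_entry(z):
    """Round to four decimal places, as the printed table does."""
    return round(area_zero_to_z(z), 4)

if __name__ == "__main__":
    # Row 1.3, column 0.07 of the table.
    print(table_entry(1.37))
```

Running the sketch for z = 1.37 gives the same value as the worked example above the table, 0.4147.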


Post-Unit Reflections

What is the most valuable or important thing you learned in this unit?

What part did you find most interesting or enjoyable?

What was the most challenging part, and how did you respond to this challenge?

How did you feel about this topic when you started this unit?

How do you feel about this topic now?

Of the skills you used in this unit, which is your strongest skill?

What skill(s) do you feel you need to improve, and how will you improve them?

How does what you learned in this unit fit with your personal goals?


Answers

Populations and Samples (pages 1 to 2)
1. Population: all the taxpayers in this community (4127)
   Sample: the 300 randomly selected taxpayers
2. Population: all the used bricks that the contractor purchased (6000)
   Sample: the 200 randomly selected bricks that were examined to determine usability
3. Population: all of the employed workers in Nova Scotia (453 000)
   Sample: the 1200 randomly selected employed workers who participated in the survey and reported their annual gross income
4. Population: all of the adults who received a high school diploma from NSSAL between 2001 and 2009
   Sample: the 240 randomly selected NSSAL graduates who participated in the interview

Tables (pages 3 to 4)
1. Star Wars: Episode 0
2. Star Wars: Episode 0
3. Terminator: Rise of the Toasters
4. Jaws: The Teething Years
5. Transformers: The Horse and Buggy Years
6. A graph of some fashion
7. It is far easier to use this graph to answer the questions on the previous page.
8. Population

Types of Data (pages 5 to 6)
1. (a) numerical (continuous)
   (b) categorical
   (c) categorical
   (d) numerical (continuous)
   (e) numerical (discrete)
   (f) numerical (continuous)
   (g) categorical
   (h) numerical (continuous)
   (i) numerical (discrete)
   (j) numerical (continuous)
   (k) numerical (discrete)
   (l) categorical


   (m) numerical (continuous)
   (n) categorical

Bar Graphs and Histograms (pages 7 to 14)
1. (a) baseball
   (b) approximately 78 million fans
   (c) football
   (d) little less than 20 million fans
   (e) bar graph
2. (a) double bar graph
   (b) Germany
   (c) 37 medals
   (d) Norway
   (e) 2 medals
   (f) 7 medals
   (g) 116 medals
   (h) 121 medals
3. (a) histogram
   (b) numerical, continuous
   (c) approximately 52 000 RNs (24 000 + 28 000)
   (d) approximately 18 000 RNs (36 000 - 18 000)
   (e) three classes: 45 to 49 years, 50 to 54 years, and 55 to 59 years
   (f) shortage of RNs in the future
4. (a) stacked bar graph
   (b) no
   (c) little more than 1300 cases
   (d) approximately 550 cases
   (e) approximately 300 cases (850 - 550)
   (f) consult visits 2005/2006: 460 (540 - 80)
       consult visits 2006/2007: 660 (750 - 90)
       660 - 460 = 200 cases
   (g) inpatient days decreased significantly but consult visits increased by a similar amount
5. (a)


(b) %3216

   (c) numerical, continuous
   (d) sample

Circle Graphs and Line Graphs (pages 15 to 19)
1. (a) automobile accidents
   (b) 3 times
   (c) 288
   (d) 60
   (e) (ii) 7/12
   (f) home injuries
2. (a) Jan - Feb 08, Aug - Sept 08, Jan - Feb 09, Jan - Feb 10, Oct - Nov 10
   (b) Oct 08
   (c) May 09
   (d) $15 000 million ($15 billion)
3. (a) 40%
   (b) 127
   (c) 242 starts
   (d) 340
4. (a) 13th day, $7.40
   (b) $11.40 per share
   (c) 15th day, $2.50 per share

First Impression/Second Impressions (pages 20 to 23)
(More detailed responses are required than what is supplied below.)

Part 1 - The perspective of the circle graph that was initially presented can lead one to believe that the three brands of ice cream are favored equally; this is not the case.

Part 2 - One may initially assume that the population of Trois-Rivieres is 4 to 5 times that of Lethbridge if one did not consider the scale on the vertical axis. On the first bar graph, the vertical axis starts at 50 000, rather than 0 (as it does on the second graph).

Part 3 - Because the first graph deals with percentages, we do know what percentage of patrons for each ride were male and female. However, we are unable to see how the rides compared to each other in terms of attracting patrons. This only became possible when we examined the second graph, which plotted number of people on the vertical axis.

Part 4 - The first graph may have made individuals believe that the average price of a domestic airfare was fluctuating wildly. This occurs when one fails to look at the scale on the vertical axis. In the first graph, the scale starts at $160, rather than $0 (as it does in the second graph).

What Type of Graph Should Be Used? (pages 24 to 25)
1. Double Bar Graph (or Stacked Bar Graph)
2. Circle Graph (or Bar Graph)
3. Line Graph
4. Histogram
5. Stacked Bar Graph
6. Circle Graph (or Bar Graph)
7. Bar Graph
8. Double Bar Graph
9. Histogram
10. Line Graph

Mean, Median, Mode, and Trimmed Mean (pages 26 to 33)
1. (a) sample
   (b) x̄ = 6.2, Median = 6, Mode = 7
   (c) There are no outliers.
2. (a) population
   (b) numerical
   (c) µ = 159.44, Median = 157, No Mode
3. (a) sample
   (b) x̄ = 35 (34.6), Median = 31, Mode = 23 and 27 (bimodal)
       5% Trimmed Mean: x̄_T = 31 (30.6)
       10% Trimmed Mean: x̄_T = 31 (30.9)


   (c) Trimmed means are appropriate because the outlier 115 exists within the data set.
   (d) Four data points from the bottom and four data points from the top of the data set
4. (a) x̄ = 268 (267.875), Median = 254 (253.5), Mode = 267, x̄_T = 255 (255.409)
   (b) Median and Trimmed Mean
   (c) Histogram
5. This score system was likely implemented to eliminate the effect of a single rogue judge who would inflate or deflate the score of a particular athlete.

Box and Whisker Plots (pages 34 to 40)
1. (a) minimum: 6, lower quartile: 11, median: 17, upper quartile: 21, maximum: 30
   (b) minimum: 33, lower quartile: 40, median: 44, upper quartile: 48, maximum: 52
   (c) minimum: 24, lower quartile: 25.5, median: 30, upper quartile: 35, maximum: 40

   (d) minimum: 28, lower quartile: 35, median: 36.5, upper quartile: 38, maximum: 41
2. (a) minimum: 7, lower quartile: 10.5, median: 18, upper quartile: 20.5, maximum: 22
   (b) The median, upper quartile and maximum for Mr. Porter's class are equal to those for Mr. Churchill's class. That means that in both classes students with slower reaction times (i.e. worse than the median) were performing at approximately the same level. When we compare students with faster reaction times (i.e. better than the median), however, we notice a difference between the two classes. Because Mr. Churchill's class has a smaller minimum and lower quartile, we can say that his faster reaction time students in general out-performed Mr. Porter's faster reaction time students.

   (c)                   Mrs. Lowe's Class    Mr. Vroom's Class
       minimum           6                    6
       lower quartile    10                   15
       median            14                   18
       upper quartile    18                   23
       maximum           20                   23
       With the exception of the minimum, all other values are lower (faster reaction times) for Mrs. Lowe's class. That means that the majority of Mrs. Lowe's students out-performed Mr. Vroom's students in the reaction time experiment.
   (d)                   Mrs. Burchill's Class    Mr. Rhodenizer's Class
       minimum           5                        6
       lower quartile    10                       9
       median            12.5                     13
       upper quartile    16                       16
       maximum           21                       22
       The two box-and-whisker plots are very similar. One can conclude that the students performed at about the same level on the reaction time experiment.

Using Technology to Make Box-and-Whisker Plots (pages 41 to 45)
1. (a)                   Tanya    Barb    Suzette
       minimum           2        6       4
       lower quartile    8        12      7
       median            20       17      10
       upper quartile    24       20      21
       maximum           25       25      30
   (b) Tanya's Mean: 16.2, Barb's Mean: 15.9, Suzette's Mean: 13.9
   (c) Frequency by class:
       Class       Tanya    Barb    Suzette
       0 to 5      3        0       1
       5 to 10     1        2       7
       10 to 15    2        4       2
       15 to 20    1        4       1
       20 to 25    5        3       2
       25 to 30    3        1       2
       30 to 35    0        0       1


   (d) Tanya
   (e) Tanya
   (f) Barb
   (g) 24 to 25 points
   (h) 6 to 12 points
   (i) 10 to 30 points
2. (a) Mean Time: 12.0
   (b) minimum: 10.6, lower quartile: 11.2, median: 12.05, upper quartile: 12.5, maximum: 16.2
   (c) Class       Frequency
       10 to 11    4
       11 to 12    10
       12 to 13    12
       13 to 14    3
       14 to 15    0
       15 to 16    0
       16 to 17    1
   (d) no
   (e) 10.6 to 12.05 seconds
   (f) 12.5 to 16.2 seconds
   (g) 10.6 to 11.2 seconds
   (h) Track Meet A
3.                    Class A    Class B
   minimum            20.2       17.2
   lower quartile     23.5       19.2
   median             26.85      22.15
   upper quartile     28.1       27.7
   maximum            29.4       32.7
   Although the median for Class B is much lower (and in the normal range), we have far more extremes in this class. There are a significant number in Class B that are underweight or obese; that is why the box and whiskers are so much larger when plotting this class's BMI data. For Class A the data is more clustered together, with all individuals being found within the normal and overweight range, although more than half are in the overweight category.

Standard Deviation (pages 46 to 50)
1. σ = 2.89
2. σ = 0.41


3. (a) σ = 1.49 and σ = 2.49
   (b) The standard deviation is lower for the first data set. That means this data is not as spread out as the data in the second data set.
4. (a) 183
   (b) 182
   (c) numerical data set
   (d) σ = 4.90
   (e) The average heights of these two groups of learners are the same; however, the standard deviation for Barb’s group is much lower. That means that there is less variation in heights between Barb’s male learners compared to the other instructor’s learners. The heights of her learners are more clustered around the mean.
   (f) The standard deviations are almost the same for the two groups of male learners; however, the mean height for Barb’s group is higher. We can conclude that the average height of male learners in Barb’s math courses is three centimeters more than the third instructor’s male students. The variation in heights between the two groups is essentially the same.
5. Histogram (i) matches with (c). Histogram (ii) matches with (b). Histogram (iii) matches with (d). Histogram (iv) matches with (a).
6. Answers will vary.

Using Technology to Calculate Population Standard Deviation (pages 52 to 56)
1. (a) population
   (b)
   (c) µ = 14.1, median: 9.91, σ = 11.2 (Units: young persons out of 10 000 young persons)
   (d) The mean is high because the incarceration rate for the Northwest Territories is so much higher than the other rates.
2. (a) population
   (b)


   (c) µ = 55.6 years
   (d) σ = 9.5 years
   (e) median: 54.5 years
   (f) The data does not cluster well around the mean.
3. (a)
   (b) µ = 3.6 mmol/L
   (c) σ = 0.90 mmol/L
   (d) median: 3.4 mmol/L
   (e) Most of the patients are clustered in the near optimal and borderline ranges. There are a few who are in the desirable range, and even a few more in the high and too high ranges.

Distributions (pages 57 to 59)
1. (a) uniform
   (b) bimodal
   (c) skewed right
   (d) normal
   (e) skewed left
   (f) uniform
   (g) normal
   (h) skewed left
   (i) bimodal
   (j) normal

Normal Distributions and the 68-95-99.7 Rule (pages 60 to 67)
1.     Hint                             Calculation       Answer
   (a) Between µ − σ and µ + σ          --                68%
   (b) Between µ and µ + 2σ             --                47.5%
   (c) Between µ − σ and µ              --                34%
   (d) Between µ − 3σ and µ             --                49.85%
   (e) Between µ − 2σ and µ + σ         47.5% + 34%       81.5%
   (f) Between µ − σ and µ + 3σ         34% + 49.85%      83.85%
   (g) Between µ − 3σ and µ + 2σ        49.85% + 47.5%    97.35%
   (h) Between µ + σ and µ + 2σ         47.5% - 34%       13.5%
   (i) Between µ − 3σ and µ − 2σ        49.85% - 47.5%    2.35%
   (j) Between µ + σ and µ + 3σ         49.85% - 34%      15.85%
   (k) Less than µ + σ                  50% + 34%         84%
   (l) Greater than µ − 2σ              47.5% + 50%       97.5%
   (m) Less than µ − σ                  50% - 34%         16%
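
The Calculation column in question 1 combines three half-areas taken from the rule: 34%, 47.5% and 49.85% (half of 68%, 95% and 99.7%). As a check, the short Python sketch below reproduces those additions and subtractions. It is not part of the original unit; the names `HALF_AREA`, `percent_below` and `percent_between` are illustrative, and it assumes the boundaries sit at whole-number multiples of σ from −3 to 3.

```python
# Percent of the data between the mean and k standard deviations,
# taken from the 68-95-99.7 rule (68/2, 95/2 and 99.7/2).
HALF_AREA = {0: 0.0, 1: 34.0, 2: 47.5, 3: 49.85}

def percent_below(k):
    """Percent of the data below mu + k*sigma, for whole k from -3 to 3."""
    if k >= 0:
        return 50.0 + HALF_AREA[k]
    return 50.0 - HALF_AREA[-k]

def percent_between(a, b):
    """Percent of the data between mu + a*sigma and mu + b*sigma (a < b)."""
    return round(percent_below(b) - percent_below(a), 2)

if __name__ == "__main__":
    print(percent_between(-2, 1))   # question 1(e)
    print(percent_between(1, 2))    # question 1(h)
    print(percent_below(-1))        # question 1(m)
```

Working from the percent below each boundary means a single subtraction covers every case, whether the two boundaries fall on the same side of the mean or on opposite sides.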


2.     Hint                             Calculation       Percentage    Answer
   (a) Between µ − 3σ and µ + 3σ        --                99.7%         1994
   (b) Between µ − σ and µ + σ          --                68%           1360
   (c) Between µ − 2σ and µ             --                47.5%         950
   (d) Between µ − σ and µ + 2σ         34% + 47.5%       81.5%         1630
   (e) Between µ + σ and µ + 2σ         47.5% - 34%       13.5%         270
   (f) Between µ − 2σ and µ + 2σ        --                95%           1900
   (g) Between µ − 3σ and µ − σ         49.85% - 34%      15.85%        317
   (h) Between µ − 2σ and µ + 3σ        47.5% + 49.85%    97.35%        1947
   (i) Between µ − 3σ and µ             --                49.85%        997
   (j) Between µ + 2σ and µ + 3σ        49.85% - 47.5%    2.35%         47
   (k) Less than µ − 2σ                 50% - 47.5%       2.5%          50

Z-Scores (pages 68 to 79)
1. (a) -0.65
   (b) 1.33
2. (a) 0.68
   (b) -0.32
3. (a) Tylena, Elliott, Marcus
   (b) Meera, Hamid
   (c) Beverly
   (d) Elliott, no
   (e) No, they may have all passed if the mean mark was very high or the majority could have failed if the mean mark was very low. Without the mean and standard deviation we cannot tell who passed and who failed.
4. (a) 0.4525
   (b) 0.4082 + 0.2486 = 0.6568
   (c) 0.4901 - 0.4082 = 0.0819
   (d) 0.5 - 0.2486 = 0.2514
   (e) 0.5 + 0.4525 = 0.9525
5. (a) 0.5
   (b) 0.3770
   (c) 0.5 + 0.2190 = 0.7190
   (d) 0.3289 - 0.1026 = 0.2263
   (e) 0.4826 + 0.1255 = 0.6081
   (f) 0.5 - 0.2852 = 0.2148
6. (a) 0.3849 + 0.2881 = 0.6730


   (b) 0.5 - 0.1554 = 0.3446
   (c) 0.5
   (d) 0.4452 - 0.1554 = 0.2898
   (e) 0.2881 + 0.5 = 0.7881
   (f) 0.4918

Growth Charts (pages 80 to 84)
1. 50th percentile; The head circumference for this 12 month old boy is equal to or greater than the head circumference of 50% of the boys of the same age.
2. 95th percentile; The length of this 31 month old boy is equal to or greater than the length of 95% of the boys of the same age.
3. (a) 25th percentile
   (b) 90th percentile
   (c) 10th percentile
   (d) 75th percentile
   (e) Between the 25th and 50th percentile
   (f) Between the 50th and 75th percentile
   (g) Between the 90th and the 95th percentile
   (h) Between the 5th and 10th percentile
4. (a) 19 pounds (approximately 8.6 kg)
   (b) 33 inches (approximately 83.7 cm)
   (c) 19 inches (approximately 48.2 cm)
5. 29 inches (approximately 73.6 cm) to 33.5 inches (approximately 85.1 cm)
6. 18.25 inches (approximately 46.3 cm) to 20.5 inches (approximately 52 cm)
7. 10 to 21 months
8. 1 to 6 months
9. 28.5 pounds (approximately 12.9 kg) to 33 pounds (approximately 15 kg)
10. 32 inches (approximately 81.3 cm) to 35.5 inches (approximately 90.2 cm)
11. 6 to 17 months
12. 9 to 12 months
13. (Hint: Change to Percentiles) Should be concerned; the boy went from the 97th percentile for weight at birth to the 3rd percentile for weight by the age of 12 months.


Putting It Together (pages 85 to 95)
1. Population: all 1386 members of the sportsplex
   Sample: the 230 randomly selected members
2. (a) Numerical, Discrete
   (b) Categorical
   (c) Numerical, Continuous
   (d) Categorical
   (e) Numerical, Discrete
   (f) Numerical, Continuous
3. (a) 56%
   (b) 87%
   (c) 4%
   (d) 19314 (if you use a survival rate of 87%)
   (e) double bar
   (f) No. The graph does not show the number of cases. It only shows survival rates.
4. (a) population, because all stores had to report toppings selected by all customers
   (b) 15%
   (c) 23%
   (d) Cannot determine based on the information supplied.
   (e) 236 880 pizzas
   (f) 186 120 pizzas
   (g) 31
   (h) sausage
5. The scale used makes one initially feel that there were drastic fluctuations in the number of infant deaths between 2004 and 2007. This is not the case.
6. (a) circle graph
   (b) double bar graph
   (c) bar graph
   (d) histogram
   (e) line graph
   (f) stacked bar graph
7. (a) Population: All suitcases on domestic flights
   (b) Histogram
   (c) x̄ = 14.5 kg, Median = 14.8 kg, Mode = 14.8 kg, x̄_T = 14.9 kg
8. (a)                   Mr. Tetford's Class    Mrs. Gatien's Class
       Minimum           19                     20
       Lower Quartile    22.5                   21
       Median            25                     22.5
       Upper Quartile    26.5                   24
       Maximum           29                     30


   (b) 25 to 29
   (c) 20 to 21
   (d) 24 to 30
   (e) Although the lowest and highest marks in Mrs. Gatien's class are better than those in Mr. Tetford's class, the middle 50% of her learners obtained marks between 21 and 24, while the middle 50% of Mr. Tetford's learners obtained marks between 22.5 and 26.5 (actually between 23 and 26 because half points were not awarded on the test). Mr. Tetford's class outperformed Mrs. Gatien's class on this particular test.
9. (a) sample
   (b) 13.8 g/dl
   (c) 1.01 g/dl
10. (a) Bimodal
    (b) Skewed (left)
    (c) Normal
    (d) Uniform
11. (a) 2040
    (b) 2850
    (c) 5982
    (d) 4890
    (e) 5841
    (f) 5031
    (g) 810
    (h) 141
    (i) 3000
    (j) 5040
    (k) 5850
    (l) 960
    (m) 150
12. (a) 1.67
    (b) -0.67
13. (a) 0.0948 + 0.2642 = 0.3590
    (b) 0.3849
    (c) 0.50
    (d) 0.3106 + 0.5 = 0.8106
    (e) 0.2881 - 0.0478 = 0.2403
    (f) 0.5 - 0.4452 = 0.0548
14. 10th percentile; The head circumference for this 11 month old boy is equal to or greater than the head circumference of 10% of the boys of the same age.
15. 26 pounds (or 11.8 kg)
16. 33 inches to 38.5 inches
17. 2 months to approximately 6.7 months
18. 18.5 inches to almost 20 inches
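
The answers to question 9 (b) and (c) can be checked with a few lines of Python. This is a minimal sketch and not part of the original unit; the function names are illustrative. The stated answer of 1.01 g/dl matches the formula that divides the sum of squared deviations by n. Dividing by n − 1, as many references do for a sample, gives about 1.06 g/dl instead, and the sketch shows both.

```python
import math

# Iron concentrations (g/dl) of the ten athletes in question 9.
data = [15.3, 14.2, 13.6, 11.9, 14.8, 12.6, 14.6, 13.9, 14.2, 12.9]

def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def std_dev(values, sample=False):
    """Standard deviation; divides by n, or by n - 1 when sample=True."""
    m = mean(values)
    squared_deviations = sum((x - m) ** 2 for x in values)
    divisor = len(values) - 1 if sample else len(values)
    return math.sqrt(squared_deviations / divisor)

if __name__ == "__main__":
    print(round(mean(data), 1))                   # 13.8
    print(round(std_dev(data), 2))                # 1.01 (divide by n)
    print(round(std_dev(data, sample=True), 2))   # 1.06 (divide by n - 1)
```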