Chapter Statistics 81&255(&7('6$03/(3$*(6

70
Chapter Collecting and using data Review of statistical graphs (Consolidating) Summary statistics Box plots Standard deviation (10A) Time-series data Bivariate data and scatter plots Line of best fit by eye Linear regression with technology 9A 9B 9C 9D 9E 9F 9G 9H 9I STATISTICS AND PROBABILITY Data representation and interpretation Determine quartiles and interquartile range. Construct and interpret box plots and use them to compare data sets. Compare shapes of box plots with corresponding histograms and dot plots. Use scatter plots to investigate and comment on relationships between two continuous variables. Investigate and describe bivariate numerical data for which the independent variable is time. Evaluate statistical reports in the media and other places by linking claims to displays, statistics and representative data. Calculate and interpret the mean and standard deviation data and use these to compare data sets. Use information technologies to investigate bivariate numerical data sets. Where appropriate, use a straight line to describe the relationship, allowing for variation. What you will learn Australian curriculum 9 Statistics (10A) (10A) (10A) UNCORRECTED SAMPLE PAGES Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Transcript of Chapter Statistics 81&255(&7('6$03/(3$*(6

Page 1: Chapter Statistics 81&255(&7('6$03/(3$*(6

Chapter

Collecting and using dataReview of statistical graphs (Consolidating)Summary statisticsBox plotsStandard deviation (10A)

Time-series dataBivariate data and scatter plotsLine of best fit by eyeLinear regression with technology

9A9B

9C9D9E9F9G

9H9I

S T A T I S T I C S A N D P R O B A B I L I T Y

Data representation and interpretationDetermine quartiles and interquartile range.Construct and interpret box plots and use them to compare data sets.Compare shapes of box plots with corresponding histograms and dot plots.Use scatter plots to investigate and comment on relationships between two continuous variables.Investigate and describe bivariate numerical data for which the independent variable is time.Evaluate statistical reports in the media and other places by linking claims to displays, statistics and representative data.

Calculate and interpret the mean and standard deviation data and use these to compare data sets.

Use information technologies to investigate bivariate numerical data sets. Where appropriate, use a straight line to describe the relationship, allowing for variation.

16x16 32x32

What you will learn

Australian curriculum

9 Statistics

9781107xxxxxxc09.indd 2 30/07/15 4:55 PM

(10A)

(10A)

(10A)

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 2: Chapter Statistics 81&255(&7('6$03/(3$*(6

According to the Australian Bureau of Statistics the population of Indigenous Australians has increased from 351 000 in 1991 to 548 000 in 2011. This represents an annual increase of 2.3% compared with a total Australian annual population increase of 1.2%.

The life expectancy for Indigenous Australians, however, is not as high as for non-Indigenous Australians. For males it is about 69 years (compared to about 80 years Australia-wide) and about

74 years for females (compared to about 83 years Australia-wide). This represents a life expectancy gap for Indigenous Australians of about 11 years for males and 9 years for females.

The Australian government works with departments such as the Australian Bureau of Statistics to collect accurate data, like the examples above, so it can best determine how to allocate resources in assisting Indigenous communities.

• Chapter pre-test• Videos of all worked

examples• Interactive widgets• Interactive walkthroughs• Downloadable HOTsheets• Access to HOTmaths

Australian Curriculum courses

Online resources

Closing the gap

9781107xxxxxxc09.indd 3 30/07/15 4:55 PM

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 3: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

612 612 Chapter 9 Statistics

9A Collecting and using dataThere are many reports on television and radio that begin with the

words ‘A recent study has found that …’. These are usually the result

of a survey or investigation that a researcher has conducted to collect

information about an important issue, such as unemployment, crime or

obesity.

Sometimes the results of these surveys are used to persuade people

to change their behaviour. Sometimes they are used to pressure

the government into changing the laws or to change the way the

government spends public money.

Results of surveys and other statistics can sometimes be misused or

displayed in a way to present a certain point of view.

Let’s start: Improving survey questions

Here is a short survey. It is not very well constructed.

Question 1: How old are you?

Question 2: How much time did you spend sitting in front of the television or a computer yesterday?

Question 3: Some people say that teenagers like you are lazy and spend way toomuch time sitting around

when you should be outside exercising. What do you think of that comment?

Have a class discussion about the following.

• What will the answers to Question 1 look like? How could they be displayed?

• What will the answers to Question 2 look like? How could they be displayed?

• Is Question 2 going to give a realistic picture of your normal daily activity?

• Do you think Question 2 could be improved somehow?

• What will the answers to Question 3 look like? How could they be displayed?

• Do you think Question 3 could be improved somehow?

Keyideas

Surveys are used to collect statistical data.

• Survey questions need to be constructed carefully so that the person knows exactly what

sort of answer to give. Survey questions should use simple language and should not be

ambiguous.

• Survey questions should not be worded so that they deliberately try to provoke a certain kind

of response.

• If the question contains an option to be chosen from a list, the number of options should be an

odd number, so that there is a ‘neutral’ choice. For example, the options could be:

strongly agree agree unsure disagree strongly disagree

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 4: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 613 613

Keyideas

A population is a group of people, animals or objects with something in common. Some

examples of populations are:

• all the people in Australia on Census Night

• all the students in your school

• all the boys in your maths class

• all the tigers in the wild in Sumatra

• all the cars in Brisbane

• all the wheat farms in NSW.

A sample is a group that has been chosen from a population. Sometimes information from a

sample is used to describe the whole population, so it is important to choose the sample carefully.

Statistical data can be divided into subgroups.

Data

Categorical Numerical

Nominal(Categories

have no order;e.g. colours:

red, blue, yellow)

Ordinal(Categories can

be ordered;e.g. low, medium,

high)

Discrete(Data can take

on only a limitednumber of values;

e.g. number ofchildren ina family)

Continuous(Data can takeon any value ina given range;e.g. time taken

to complete a race)

Example 1 Describing types of data

What type of data would the following survey questions generate?

a How many televisions do you have in your home?

b To what type of music do you most like to listen?

SOLUTION EXPLANATION

a numerical and discrete The answer to the question is a number with a

limited number of values; in this case, a whole

number.

b categorical and nominal The answer is a type of music and these

categories have no order.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 5: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

614 614 Chapter 9 Statistics

Example 2 Choosing a survey sample

A survey is carried out on the internet to determine Australia’s favourite musical performer.

Why will this sample not necessarily be representative of Australia’s views?

SOLUTION EXPLANATION

An internet survey is restricted to people with

a computer and internet access, ruling out some

sections of the community from participating in

the survey.

The sample may not include some of the older

members of the community or those in areas

without access to the internet. Also, the survey

would need to be set up so that people can

do it only once so that ‘fake’ surveys are not

completed.

Exercise 9A

1 Match each word (a–e) with its definition (A–E).populationa a group chosen from a populationAcensusb a tool used to collect statistical dataBsamplec all the people or objects in questionCsurveyd statistics collected from every member of the populationDdatae the factual information collected from a survey or other sourceE

2 Match each word (a–f) with its definition (A–F).numericala categorical data that has no orderAcontinuousb data that are numbersBdiscretec numerical data that take on a limited number of valuesCcategoricald data that can be divided into categoriesDordinale numerical data that take any value in a given rangeEnominalf categorical data that can be orderedF

3 Classify each set of data as categorical or numerical.

4.7, 3.8, 1.6, 9.2, 4.8ared, blue, yellow, green, blue, redblow, medium, high, low, low, mediumc3 g, 7 g, 8 g, 7 g, 4 g, 1 g, 10 gd

4 Which one of the following survey questions would generate categorical data?

How many times do you eat at your favourite fast-food place in a typical week?AHow much do you usually spend buying your favourite fast food?BHow many items did you buy last time you went to your favourite fast-food place?CWhich is your favourite fast food?D

UNDE

RSTA

NDING

—1–4 3–4

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 6: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 615 615

9A5Example 1 Year 10 students were asked the following questions in a survey. Describe what type of

data each question generates.

How many people under the age of 18 years are

there in your immediate family?

a

How many letters are there in your first name?bWhich company is the carrier of your mobile

telephone calls? Optus/Telstra/

Vodafone/3/Virgin/Other (Please specify.)

c

What is your height?d

Choices

categorical numerical

ordinal nominal discrete continuous

How would you describe your level of application in Maths? (Choose from very high,

high, medium or low.)

e

FLUE

NCY

55 5

6Example 2 The principal decides to survey Year 10 students to determine their opinion of Mathematics.

a In order to increase the chance of choosing a representative sample, the principal should:

Give a survey form to the first 30 Year 10 students who arrive at school.AGive a survey form to all the students studying the most advanced Maths subject.BGive a survey form to five students in every Maths class.CGive a survey form to 20% of the students in every class.D

b Explain your choice of answer in part a. Describe what is wrong with the other three options.

7 Discuss some of the problems with the selection of a survey sample for each given topic.

A survey at the train station of how Australians get to work.aAn email survey on people’s use of computers.bPhoning people on the electoral roll to determine Australia’s favourite sport.c

8 Choose a topic in which you are especially interested, such as football, cricket, movies, music,

cooking, food, computer games or social media.

Make up a survey about your topic that you could give to the people in your class.

It must have four questions.

Question 1 must produce data that are categorical and ordinal.

Question 2 must produce data that are categorical and nominal.

Question 3 must produce data that are numerical and discrete.

Question 4 must produce data that are numerical and continuous.

PROB

LEM-SOLVING

7–96, 7 6–8

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 7: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

616 616 Chapter 9 Statistics

9A9 A television news reporter surveyed four companies and found that the profits of three of

these companies had reduced over the past year. They report that this means the country is

facing an economic downturn and that only one in four companies is making a profit.

What are some of the problems in this media report?aHow could the news reporter improve their sampling methods?bIs it correct to say that only one in four companies is making a profit? Explain.c

PROB

LEM-SOLVING

10 Here are two column graphs, each showing the same results of a survey that asked people

which TV channel they preferred.

Graph ATV channel preference

Chann

el 15

Chann

el 16

Chann

el 17

Chann

el 18

25

24

23

26

27

Num

ber

of p

eopl

e

Graph BTV channel preference

Chann

el 15

Chann

el 16

Chann

el 17

Chann

el 18

Num

ber

of p

eopl

e

5

10

15

20

25

30

Which graph could be titled ‘Channel 15 is clearly most popular’?aWhich graph could be titled ‘All TV channels have similar popularity’?bWhat is the difference between the two graphs?cWhich graph is misleading and why?d

11 Describe three ways that graphs or statistics could be used to mislead people and give a false

impression about the data.

12 Search the internet or newspaper for ‘misleading graphs’ and ‘how to lie with statistics’.

Explain why they are misleading.

REAS

ONING

11, 1210 10, 11

The 2011 Australian Census

13 Research the 2011 Australian Census on the website of the Australian Bureau of Statistics.

Find out something interesting from the results of the 2011 Australian Census and write a

short news report.

14 It is often said that Australia has an ageing population. What does this mean?

Search the internet for evidence showing that the ‘average’ Australian is getting older every year.

ENRICH

MEN

T

13, 14— —

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 8: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 617 617

9B Review of statistical graphs CONSOLIDATING

Statistical graphs are an essential element in the

analysis and representation of data. Graphs can

help to show the most frequent category, the range

of values, the shape of the distribution and the

centre of the data. By looking at statistical graphs

the reader can quickly draw conclusions about the

numbers or categories in the data set and interpret

this within the context of the data.

Let’s start: Public transport analysis

A survey was carried out to find out how many times people in the group had used public transport in the

past month. The results are shown in this histogram.

010 20 30 40

2

6

Num

ber

of p

eopl

e

Transport use

8

4

50 60

Discuss what the histogram tells you about this group of people and their use of public transport.

You may wish to include these points:

• How many people were surveyed?

• Is the data symmetrical or skewed?

• Is it possible to work out the exact mean? Why/why not?

• Do you think these people were selected from a group in your own community? Give reasons.

Keyideas

The types of statistical data that we saw in the previous section; i.e. categorical (nominal or

ordinal) and numerical (discrete or continuous), can be displayed using different types of graphs

to represent the different data.

Graphs for a single set of categorical or discrete data

Dot plot

3 421

• Stem-and-leaf plotStem Leaf

0 1 31 2 5 92 1 4 6 73 0 4

2 | 4 means 24

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 9: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

618 618 Chapter 9 Statistics

Keyideas Column graph

0

1

3

Fre

quen

cy

Score

4

2

5

Poor

Low

Med

iumGoo

d

Excell

ent

Histograms can be used for grouped discrete or continuous numerical data.

The interval 10– includes all numbers from 10 (including 10) to fewer than 20.

PercentageClass interval Frequency frequency

0– 3 15

10– 5 25

20– 8 40

30–40 4 200

10 20 30 40

2

6F

requ

ency

% F

requ

ency

Score

8

4

10

30

40

20

Measures of centre include:

• mean (x) x =sum of all data valuesnumber of data values

• median the middle value when data are placed in order

The mode of a data set is the most common value.

Data can be symmetrical or skewed.

symmetrical

tail

positively skewed

tail

negatively skewed

Example 3 Presenting and analysing data

Twenty people were surveyed to find out how many times they use the internet in a week. The raw

data are listed.21, 19, 5, 10, 15, 18, 31, 40, 32, 25

11, 28, 31, 29, 16, 2, 13, 33, 14, 24

a Organise the data into a frequency table using class intervals of 10. Include a percentage

frequency column.

b Construct a histogram for the data, showing both the frequency and percentage frequency on the

one graph.

c Construct a stem-and-leaf plot for the data.

d Use your stem-and-leaf plot to find the median.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 10: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 619 619

SOLUTION EXPLANATION

a Class interval Frequency Percentage frequency0– 2 10

10– 8 40

20– 5 25

30– 4 20

40–50 1 5

Total 20 100

Calculate each percentage frequency by

dividing the frequency by the total (i.e. 20) and

multiplying by 100.

b Number of times the internet is accessed

02010 30 40 50

2

6

Fre

quen

cy

Per

cent

age

freq

uenc

y

Score

8

410

3040

20

Transfer the data from the frequency table to the

histogram. Axis scales are evenly spaced and the

histogram bar is placed across the boundaries of

the class interval. There is no space between the

bars.

c Stem Leaf0 2 51 0 1 3 4 5 6 8 92 1 4 5 8 93 1 1 2 34 0

3 | 1 means 31

Order the data in each leaf and also show a key

(e.g. 3 | 1 means 31).

d Median =19 + 21

2= 20 After counting the scores from the lowest value

(i.e. 2), the two middle values are 19 and 21, so

the median is the mean of these two numbers.

Exercise 9B

1 A number of families were surveyed to find the number of children in each. The results are shown

in this dot plot.

a How many families were surveyed?

b Find the mean number of children in the families surveyed.

c Find the median number of children in the families surveyed.

d Find the mode for the number of children in the families

surveyed.

e What percentage of the families have, at most, two children?

2 310

UNDE

RSTA

NDING

—1–2 2(a)UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 11: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

620 620 Chapter 9 Statistics

9B2 Complete these frequency tables.

a Class interval Frequency Percentage frequency0– 2

10– 1

20– 5

30–40 2

Total

b Class interval Frequency Percentage frequency80– 8

85– 23

90– 13

95–100

Total 50

UNDE

RSTA

NDING

3Example 3 The number of wins scored this season is given for 20 hockey teams. Here are the raw data.

4, 8, 5, 12, 15, 9, 9, 7, 3, 7,

10, 11, 1, 9, 13, 0, 6, 4, 12, 5

a Organise the data into a frequency table using class intervals of 5 and include a percentage

frequency column.

b Construct a histogram for the data, showing both the frequency and percentage frequency

on the one graph.

c Construct a stem-and-leaf plot for the data.

d Use your stem-and-leaf plot to find the median.

FLUE

NCY

3, 5, 63–6 3–6

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 12: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 621 621

9B4 This frequency table displays the way in which 40 people travel to and from work.

Type of transport Frequency Percentage frequencyCar 16

Train 6

Tram 8

Walking 5

Bicycle 2

Bus 3

Total 40

a Copy and complete the table.

b Use the table to find:

i the frequency of people who travel by train

ii the most popular form of transport

iii the percentage of people who travel by car

iv the percentage of people who walk or cycle to work

v the percentage of people who travel by public transport, including trains, buses and trams.

5 Describe each graph as symmetrical, positively skewed or negatively skewed.

7 8 965

a

Verylow

Veryhigh

Low Medium High

b

020 40 60 80

Score

Fre

quen

cy 6

3

9c Stem Leaf4 1 65 0 5 4 86 1 8 9 9 97 2 7 88 3 8

d

6 For the data in these stem-and-leaf plots, find:

the mean (rounded to one decimal place)ithe medianiithe modeiii

Stem Leaf2 1 3 73 2 8 9 94 4 6

3 | 2 means 32

a Stem Leaf0 41 0 4 92 1 7 83 2

2 | 7 means 27

b

FLUE

NCY

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 13: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

622 622 Chapter 9 Statistics

9B7 Two football players, Nick and Jayden, compare their personal tallies of the number of goals scored

for their team over a 12-match season. Their tallies are as follows.

Game 1 2 3 4 5 6 7 8 9 10 11 12

Nick 0 2 2 0 3 1 2 1 2 3 0 1

Jayden 0 0 4 1 0 5 0 3 1 0 4 0

a Draw a dot plot to display Nick’s goal-scoring achievement.

b Draw a dot plot to display Jayden’s goal-scoring achievement.

c How would you describe Nick’s scoring habits?

d How would you describe Jayden’s scoring habits?

8 Three different electric sensors, A, B andC, are used to detect movement inHarvey’s backyard over

a period of 3 weeks. An in-built device counts the number of times the sensor detects movement

each night. The results are as follows.

Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Sensor A 0 0 1 0 0 1 1 0 0 2 0 0 0 0 0 1 1 0 0 1 0

Sensor B 0 15 1 2 18 20 2 1 3 25 0 0 1 15 8 9 0 0 2 23 2

Sensor C 4 6 8 3 5 5 5 4 8 2 3 3 1 2 2 1 5 4 0 4 9

a Using class intervals of 3 and starting

at 0, draw up a frequency table for each

sensor.

b Draw histograms for each sensor.

c Given that it is known that stray cats

consistently wander into Harvey’s

backyard, how would you describe the

performance of:

sensor A?isensor B?iisensor C?iii

9 This tally records the number of mice that were weighed and categorised into particular mass

intervals for a scientific experiment.

Construct a table using these column headings: Mass,

Frequency and Percentage frequency.

a

Find the total number of mice weighed in the

experiment.

b

State the percentage of mice that were in the 20– gram

interval.

c

Mass (grams) Tally10– |||

15– �|||| |20– �|||| �|||| �|||| |25– �|||| �|||| �|||| �|||| |30–35 ||||

Which was the most common weight interval?dWhat percentage of mice were in the most common mass interval?eWhat percentage of mice had a mass of 15 grams or more?f

PROB

LEM-SOLVING

9, 107, 8 8, 9

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 14: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 623 623

9B10 A school symphony orchestra contains four musical

sections: strings, woodwind, brass and percussion. The

number of students playing in each section is summarised

in this tally.

Construct and complete a percentage frequency table

for the data.

a

What is the total number of students in the school

orchestra?

b

Section TallyString �|||| �|||| �|||| �|||| |Woodwind �|||| |||Brass �|||| ||Percussion ||||

What percentage of students play in the string

section?

c

What percentage of students do not play in the

string section?

d

If the number of students in the string section

increases by 3, what will be the percentage of

students who play in the percussion section?

Round y your answer to one decimal place.

e

What will be the percentage of students in the string section of the orchestra if the entire

woodwind section is absent? Round your answer to one decimal place.

f

PROB

LEM-SOLVING

11 This histogram shows the distribution of test scores for a class. Explain why the percentage

of scores in the 20–30 range is 25%.

010 20 30 40

Score50

2

6

Fre

quen

cy 8

4

10

12

12 Explain why the exact value of the mean, median and mode cannot be determined directly

from a histogram.

13 State the possible values of a, b and c in this ordered stem-and-leaf plot.Stem Leaf

3 2 3 a 74 b 4 8 9 95 0 1 4 9 c6 2 6

REAS

ONING

12, 1311 11, 12

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 15: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

624 624 Chapter 9 Statistics

9B Cumulative frequency curves and percentiles

14 Cumulative frequency is obtained by adding a frequency to the total of its predecessors.

It is sometimes referred to as a ‘running total’.

Percentage cumulative frequency =cumulative frequency

total number of data elements× 100

A cumulative frequency graph is one in which the heights of the columns are proportional to

the corresponding cumulative frequencies.

The points in the upper right-hand corners of these rectangles join to form a smooth curve

called the cumulative frequency curve.

If a percentage scale is added to the vertical axis, the same graph can be used as a percentage

cumulative frequency curve, which is convenient for the reading of percentiles.

PercentageNumber of Cumulative cumulativehours Frequency frequency frequency0– 1 1 3.3

4– 4 5 16.7

8– 7 12 40.0

12– 12 24 80.0

16– 3 27 90.0

20– 2 29 96.7

24–28 1 30 100.0

5

4 89.5 13

12 16

80th percentile25th percentile

50th percentile

20 24 28 Hours

25

50

75

100P

erce

ntag

e cu

mul

ativ

efr

eque

ncy

10

20

30

Cumulative frequency

15

25

The following information relates to the amount, in dollars, of winter gas bills for houses

in a suburban street.

Amount ($) Frequency Cumulative frequency Percentage cumulative frequency0– 2

40– 1

80– 12

120– 18

160– 3

200–240 1

a Copy and complete the table. Round the percentage cumulative frequency to one

decimal place.

b Find the number of houses that have gas bills of less than $120.

c Construct a cumulative frequency curve for the gas bills.

d Estimate the following percentiles.

50thi 20thii 80thiii

e In this street, 95% of households pay less than what amount?

f What percentage of households pay less than $100?

ENRICH

MEN

T

14— —

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 16: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 625 625

Using calculators to graph grouped data

1 Enter the following data in a list called data and find the mean and median.

21, 34, 37, 24, 19, 11, 15, 26, 43, 38, 25, 16, 9, 41, 36, 31, 24, 21, 30, 39, 17

2 Construct a histogram using intervals of 3 and percentage frequency for the

data above.

Using the TI-Nspire: Using the ClassPad:

1 Go to a Lists and spreadsheets page and enter thedata into list A. Select menu, Statistics, StatCalculations, One-Variable Statistics. Press enter tofinish and scroll to view the statistics.

2 Go to a Data and Statistics page and select the datavariable for the horizontal axis. Select menu, PlotType, Histogram. Then select menu, Plot Properties,Histogram Properties, Bin Settings. Choose the Widthto be 3 and Alignment to be 0. Drag the scale to suit.Select menu, Plot Properties, Histogram Properties,Histogram Scale to receive the percentage frequency.

1 In the Statistics application enter the data into list1.Tap Calc, One-Variable and then OK. Scroll to view thestatistics.

2 Tap SetGraph, ensure StatGraph1 is ticked and thentap Setting. Change the Type to Histogram, set XList

to list1, Freq to 1 and then tap on Set. Tap and setHStart to 9 and HStep to 3.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 17: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

626 626 Chapter 9 Statistics

9C Summary statisticsIn addition to the median of a single set of data, there are

two related statistics called the upper and lower quartiles.

When data are placed in order, then the lower quartile is

central to the lower half of the data and the upper quartile

is central to the upper half of the data. These quartiles are

used to calculate the interquartile range, which helps to

describe the spread of the data, and determine whether or

not any data points are outliers.

Let’s start: House prices

A real estate agent tells you that the median house price for a suburb in 2015 was $753 000 and the mean

was $948 000.

• Is it possible for the median and the mean to differ by so much?

• Under what circumstances could this occur? Discuss.

Keyideas

Five-figure summary• Minimum value (min): the minimum value

• Lower quartile (Q1): the number above 25% of the ordered data

• Median (Q2): the middle value above 50% of the ordered data

• Upper quartile (Q3): the number above 75% of the ordered data

• Maximum value (max): the maximum value

IQR (4.5)

Odd number

1 2 2 3 5 6 6 7 9

↓ ↓ ↓Q1 (2) Q2 (5) Q3 (6.5)

{ {()Even number

2 3 3 4 7 8 8 9 9 9

↓ ↓ ↓Q1 (3) Q2 (7.5) Q3 (9)

IQR (6)

Measures of spread• Range = max value – min value

• Interquartile range (IQR)

IQR = upper quartile – lower quartile

= Q3 – Q1• The standard deviation is discussed in section 9E.

Outliers are data elements outside the vicinity of the rest of the data. More formally, a data point

is an outlier when it is below the lower fence (i.e. lower limit) or above the upper fence (i.e.

upper limit).

• Lower fence = Q1 – 1.5 × IQR or

• Upper fence = Q3 + 1.5 × IQR

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 18: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 627 627

Example 4 Finding the range and IQR

Determine the range and IQR for these data sets by finding the five-figure summary.

a 2, 2, 4, 5, 6, 8, 10, 13, 16, 20

b 1.6, 1.7, 1.9, 2.0, 2.1, 2.4, 2.4, 2.7, 2.9

SOLUTION EXPLANATION

a Range = 20 – 2 = 18

2 2 4 5 6 8 10 13 16 20

↑ ↑ ↑

Q1 Q2 (7) Q3

Q2 = 7, so Q1 = 4 and Q3 = 13.

IQR = 13 – 4 = 9 IQR = Q3 – Q1

Range = max – min

First, split the ordered data in half to locate the

median, which is6 + 82

= 7.

Q1 is the median of the lower half and Q3 is

the median of the upper half.

b Range = 2.9 – 1.6 = 1.3

1.6 1.7 1.9 2.0 2.1 2.4 2.4 2.7 2.9

Q1 =1.7 + 1.9

2= 1.8

Q3 =2.4 + 2.7

2= 2.55

IQR = 2.55 – 1.8 = 0.75

Max = 2.9, min = 1.6

1.6 1.7 1.9 2.0 2.1 2.4 2.4 2.7 2.9

↑ ↑ ↑Q1 Q2 Q3

Leave the median out of the upper and lower

halves when locating Q1 and Q3.

Example 5 Finding the five-figure summary and outliers

The following data set represents the number of flying geese spotted on each day of a 13-day tour

of England.

5, 1, 2, 6, 3, 3, 18, 4, 4, 1, 7, 2, 4

a For the data, find:

the minimum and maximum number of geese spottedi

the medianii

the upper and lower quartilesiii

the IQRiv

any outliers by determining the lower and upper fences.v

b Can you give a possible reason for why the outlier occurred?

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 19: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

628 628 Chapter 9 Statistics

SOLUTION EXPLANATION

a i Min = 1, max = 18

ii 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 6, 7, 18

∴ Median = 4

iii Lower quartile =2 + 22

= 2

Upper quartile =5 + 62

= 5.5

iv IQR = 5.5 – 2

= 3.5

v Lower fence = Q1 – 1.5 × IQR

= 2 – 1.5 × 3.5

= –3.25

Upper fence = Q3 + 1.5 × IQR

= 5.5 + 1.5 × 3.5

= 10.75∴ The outlier is 18.

Look for the largest and smallest numbers

and order the data:

1 1 2 2 3 3 (4 4 4 5 6 7 18

↑ ↑ ↑Q1 Q2 Q3

)

Since Q2 falls on a data value, it is not

included in the lower or higher halves when

Q1 and Q3 are calculated.

IQR = Q3 – Q1

A data point is an outlier when it is

less than Q1 – 1.5 × IQR or greater than

Q3 + 1.5 × IQR.

There are no numbers less than –3.25 but 18

is greater than 10.75.

b Perhaps a flock of geese was spotted that day.

Exercise 9C

1 a State the types of values that must be calculated for a five-figure summary.

b Explain the difference between the range and the interquartile range.

c What is an outlier?

d How do you determine if a score in a single data set is an outlier?

2 This data set shows the number of cars in 13 families surveyed.

1, 4, 2, 2, 3, 8, 1, 2, 2, 0, 3, 1, 2

a Arrange the data in ascending order.

b Find the median (i.e. the middle value).

c By first removing the middle value, determine:

the lower quartile Q1 (middle of lower half)ithe upper quartile Q3 (middle of upper half).ii

d Determine the interquartile range (IQR).

e Calculate Q1 – 1.5 × IQR and Q3 + 1.5 × IQR.

f Are there any values that are outliers (numbers

below Q1 – 1.5 × IQR or above Q3 + 1.5 × IQR)?

UNDE

RSTA

NDING

—1–3 3

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 20: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 629 629

9C3 The number of ducks spotted in eight different flocks are given in this data set.

2, 7, 8, 10, 11, 11, 13, 15

a i Find the median (i.e. average of the middle two numbers).

ii Find the lower quartile (i.e. middle of the smallest four numbers).

iii Find the upper quartile (i.e. middle of the largest four numbers).

b Determine the IQR.

c Calculate Q1 – 1.5 × IQR and Q3 + 1.5 × IQR.

d Are there any outliers (i.e. numbers below Q1 – 1.5 × IQR or above Q3 + 1.5 × IQR)?

UNDE

RSTA

NDING

4Example 4 Determine the range and IQR for these data sets by finding the five-figure summary.

3, 4, 6, 8, 8, 10, 13a 10, 10, 11, 14, 14, 15, 16, 18b1.2, 1.8, 1.9, 2.3, 2.4, 2.5, 2.9, 3.2, 3.4c 41, 49, 53, 58, 59, 62, 62, 65, 66, 68d

5 Determine the median and mean of the following sets of data. Round to one decimal

place where necessary.

6, 7, 8, 9, 10a 2, 3, 4, 5, 5, 6b 4, 6, 3, 7, 3, 2, 5c

6Example 5 The following numbers of cars, travelling on a quiet suburban street, were counted on each

day for 15 days.

10, 9, 15, 14, 10, 17, 15, 0, 12, 14, 8, 15, 15, 11, 13

For the given data, find:

the minimum and maximum number of cars countedathe medianbthe lower and upper quartilescthe IQRdany outliers by determining the lower and upper fencesea possible reason for the outlier.f

FLUE

NCY

4–5(½), 6, 74–5(½), 6 4–5(½), 6, 7

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 21: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

630 630 Chapter 9 Statistics

9C7 Summarise the data sets below by finding:

the minimum and maximum valuesi the median (Q2)iithe lower and upper quartiles (Q1 and Q3)iii the IQRivany outliers.v

4, 5, 10, 7, 5, 14, 8, 5, 9, 9a 24, 21, 23, 18, 25, 29, 31, 16, 26, 25, 27b

FLUE

NCY

8 Twelve different calculators had the following numbers of

buttons.

36, 48, 52, 43, 46, 53, 25, 60, 128, 32, 52, 40a For the given data, find:

the minimum and maximum number of buttons on

the calculators

i

the medianiithe lower and upper quartilesiiithe IQRivany outliersvthe mean.vi

b Which is a better measure of the centre of the data,

the mean or the median? Explain.

c Can you give a possible reason why the outlier has

occurred?

9 Using the definition of an outlier, decide whether or not any outliers exist in the following

sets of data. If so, list them.

a 3, 6, 1, 4, 2, 5, 9, 8, 6, 3, 6, 2, 1

b 8, 13, 12, 16, 17, 14, 12, 2, 13, 19, 18, 12, 13

c 123, 146, 132, 136, 139, 141, 103, 143, 182, 139, 127, 140

d 2, 5, 5, 6, 5, 4, 5, 6, 7, 5, 8, 5, 5, 4

10 For the data in this stem-and-leaf plot, find:

a the IQR

b any outliers

c the median if the number 37 is added to the list

d the median if the number 22 is added to the list instead of 37.

Stem Leaf0 11 6 82 0 4 63 2 3

2 | 4 means 24

11 Three different numbers have median 2 and range 2. Find the three numbers.

PROB

LEM-SOLVING

10, 118, 9 9, 10

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 22: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 631 631

9C

12 Explain what happens to the mean of a data set if all the values are:

increased by 5a multiplied by 2b divided by 10.c

13 Explain what happens to the IQR of a data set if all values are:

increased by 5a multiplied by 2b divided by 10.c

14 Give an example of a small data set that satisfies the following.

median = meana median = upper quartileb range = IQRc

15 Explain why, in many situations, the median is preferred to the mean as a measure of centre.

REAS

ONING

13–1512 12, 13

Some research

16 Use the internet to search for data about a topic that interests you. Try to choose a single set

of data that includes between 15 and 50 values.a Organise the data using:

a stem-and-leaf ploti a frequency table and histogram.iib Find the mean and the median.

c Find the range and the interquartile range.

d Write a brief report describing the centre and spread of the data, referring to parts a to c above.

e Present your findings to your class or a partner.

ENRICH

MEN

T

16— —

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 23: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

632 632 Chapter 9 Statistics

9D Box plotsThe five-figure summary (min, Q1, Q2, Q3, max) can be represented in graphical form as a box plot.

Box plots are graphs that summarise single data sets. They clearly display the minimum and

maximum values, the median, the quartiles and any outliers. Box plots also give a clear indication

of how data are spread, as the IQR is shown by the width of the central box.

Let’s start: Fuel consumption

This parallel box plot summarises the average

fuel consumption (litres per 100 km) for a

group of Australian-made and European-made cars.

64 10Fuel consumption L/100 km

12 14 168

Australian cars

European cars

• What do the box plots say about how the fuel consumption compares between Australian-made and

European-made cars?

• What does each part of the box plot represent?

• What do you think the cross (

Statistics and Probability 3 3

9A Box plotsThe ive-igure summary (min, Q1, Q2, Q3, max) can be represented in graphical form as a box plot.

Box plots are graphs that summarise single data sets. They clearly display the minimum and

maximum values, the median, the quartiles and any outliers. Box plots also give a clear indication

of how data are spread, as the IQR is shown by the width of the central box.

Let’s start: Fuel consumption

This parallel box plot summarises the average

fuel consumption (litres per 100 km) for a

group of Australian-made and European-made cars.

What do the box plots say about how the fuel consumption compares between Australian-made and

European-made cars?

What does each part of the box plot represent?

What do you think the cross () represents on the European cars box plot?

Keyideas

A box plot (also called a box-and-whisker plot) can be used to summarise a data set.

The number of data values in each quarter (25%) are approximately equal.

An outlier is marked with a cross ( × ).

An outlier is greater than Q3 + 1.5 × IQR or less than Q1 – 1.5 × IQR.

The whiskers stretch to the lowest and highest data values that are not outliers.

Parallel box plots are two or more box plots drawn on the same scale. They are used to compare

data sets within the same context.

) represents on the European cars box plot?

Keyideas

A box plot (also called a box-and-whisker plot) can be used to summarise a data set.

• The number of data values in each quarter (25%) are approximately equal.

Minimumvalue

Q2 Q3 Maximumvalue

Whisker

Q1

25 % 25 % 25 % 25 %

Whisker Box

An outlier is marked with a cross (

Statistics and Probability 3 3

9A Box plotsThe ive-igure summary (min, Q1, Q2, Q3, max) can be represented in graphical form as a box plot.

Box plots are graphs that summarise single data sets. They clearly display the minimum and

maximum values, the median, the quartiles and any outliers. Box plots also give a clear indication

of how data are spread, as the IQR is shown by the width of the central box.

Let’s start: Fuel consumption

This parallel box plot summarises the average

fuel consumption (litres per 100 km) for a

group of Australian-made and European-made cars.

What do the box plots say about how the fuel consumption compares between Australian-made and

European-made cars?

What does each part of the box plot represent?

What do you think the cross () represents on the European cars box plot?

Keyideas

A box plot (also called a box-and-whisker plot) can be used to summarise a data set.

The number of data values in each quarter (25%) are approximately equal.

An outlier is marked with a cross ( × ).

An outlier is greater than Q3 + 1.5 × IQR or less than Q1 – 1.5 × IQR.

The whiskers stretch to the lowest and highest data values that are not outliers.

Parallel box plots are two or more box plots drawn on the same scale. They are used to compare

data sets within the same context.

).

• An outlier is greater than Q3 + 1.5 × IQR or less than Q1 – 1.5 × IQR.

• The whiskers stretch to the lowest and highest data values that are not outliers.

Minimumvalue

Q2 Q3 OutlierQ1

Parallel box plots are two or more box plots drawn on the same scale. They are used to compare

data sets within the same context.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 24: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 633 633

Example 6 Constructing box plots

Consider the given data set:

5, 9, 4, 3, 5, 6, 6, 5, 7, 12, 2, 3, 5

a Determine whether any outliers exist by first finding Q1 and Q3.

b Draw a box plot to summarise the data, marking outliers if they exist.

SOLUTION EXPLANATION

a2 3 3 4 5 5 5 5 6 6 7 9 12

↑ ↑ ↑Q1 Q2 Q3

Q13 4

23 5

=

= 3 5

Q16 7

26 5

= 6 7

= 6 5

{ {) (

∴ IQR = 6.5 – 3.5

= 3

Q1 – 1.5 × IQR = 3.5 – 1.5 × 3

= –1

Q3 + 1.5 × IQR = 6.5 + 1.5 × 3

= 11∴ 12 is an outlier.

Order the data to help find the quartiles.

Locate the median Q2 then split the data in

half above and below this value.

Q1 is the middle value of the lower half and

Q3 the middle value of the upper half.

Determine IQR = Q3 – Q1.

Check for any outliers; i.e. values below

Q1 – 1.5 × IQR or above Q3 + 1.5 × IQR.

There are no data values below –1 but 12 > 11.

b

4 52 3 8 9 10 11 126 7

Draw a line and mark in a uniform scale

reaching from 2 to 12. Sketch the box plot by

marking the minimum 2 and the outlier 12

and Q1, Q2 and Q3. The end of the five-point

summary is the nearest value below 11; i.e. 9.

Exercise 9D

1 For this simple box plot, find:

105 20 2515the median (Q2)a the minimumbthe maximumc the rangedthe lower quartile (Q1)e the upper quartile (Q3)fthe interquartile range (IQR).g

UNDE

RSTA

NDING

—1–3 2UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 25: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

634 634 Chapter 9 Statistics

9D2 Complete the following for this box plot.

40 12 16 208Find the IQR.a Calculate Q1 – 1.5 × IQR.bCalculate Q3 + 1.5 × IQR.c State the value of the outlier.dCheck that the outlier is greater than Q3 + 1.5 × IQR.e

3 Construct a box plot that shows these features.

a min = 1, Q1 = 3, Q2 = 4, Q3 = 7, max = 8

b outlier = 5, minimum above outlier = 10, Q1 = 12, Q2 = 14, Q3 = 15, max = 17

UNDE

RSTA

NDING

4Example 6 Consider the data sets below.

i Determine whether any outliers exist by first finding Q1 and Q3.

ii Draw a box plot to summarise the data, marking outliers if they exist.

a 4, 6, 5, 2, 3, 4, 4, 13, 8, 7, 6

b 1.8, 1.7, 1.8, 1.9, 1.6, 1.8, 2.0, 1.1, 1.4, 1.9, 2.2

c 21, 23, 18, 11, 16, 19, 24, 21, 23, 22, 20, 31, 26, 22

d 0.04, 0.04, 0.03, 0.03, 0.05, 0.06, 0.07, 0.03, 0.05, 0.02

5 First, find Q1, Q2 and Q3 and then draw box plots for the given data sets. Remember to find

outliers and mark them on your box plot if they exist.

a 11, 15, 18, 17, 1, 2, 8, 12, 19, 15

b 37, 48, 52, 51, 51, 42, 48, 47, 39, 41, 65

c 0, 1, 5, 4, 4, 4, 2, 3, 3, 1, 4, 3

d 124, 118, 73, 119, 117, 120, 120, 121, 118, 122FLUE

NCY

4–5(½)4–5(½) 4–5(½)

6 Consider these parallel box plots, A and B.

A

B

5 71 3 9 15 17 19 21 2311 13

a What statistical measure do these box plots have in common?

b Which data set (A or B) has a wider range of values?

c Find the IQR for:

data set Ai data set B.ii

d How would you describe the main difference between the two sets of data from which the

parallel box plots have been drawn?

PROB

LEM-SOLVING

7, 86, 7 7, 8

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 26: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 635 635

9D7 The following masses, in kilograms, of 15 Madagascan lemurs are recorded

as part of a conservation project.

14.4, 15.5, 17.3, 14.6, 14.7

15.0, 15.8, 16.2, 19.7, 15.3

13.8, 14.6, 15.4, 15.7, 14.9

a Find Q1, Q2 and Q3.

b Which masses, if any, would be considered outliers?

c Draw a box plot to summarise the lemurs’ masses.

8 Two data sets can be compared using parallel box plots on the

same scale, as shown below.

A

B

1210 14 18 20 2216a What statistical measures do these box plots have in common?

b Which data set (A or B) has a wider range of values?

c Find the IQR for:

data set Ai data set B.ii

d How would you describe the main difference between the two sets of data from which the

parallel box plots have been drawn?

PROB

LEM-SOLVING

9 Select the box plot (A or B) that best matches the given dot plot or histogram.

1 2 3 4 5 6 7 3 42

A

1 75 6

Ba

4 5 6 7 8 9 6 75

A

4 8 9

Bb

1

010 20 30 40

Score

2

4

Fre

quen

cy

5

3

6

20 3010

A

0 40

B

c

REAS

ONING

10, 119 9, 10

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 27: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

636 636 Chapter 9 Statistics

9D10 Fifteen essays are marked for spelling errors by a particular examiner and the following

numbers of spelling errors are counted.

3, 2, 4, 6, 8, 4, 6, 7, 6, 1, 7, 12, 7, 3, 8

The same 15 essays are marked for spelling errors by a second examiner and the following

numbers of spelling errors are counted.

12, 7, 9, 11, 15, 5, 14, 16, 9, 11, 8, 13, 14, 15, 13

a Draw parallel box plots for the data.

b Do you believe there is a major difference in the way the essays were marked by the two

examiners? If yes, describe this difference.

11 The results for a Year 12 class are to be

compared with the Year 12 results of the

school and the State, using the parallel box

plots shown.

a Describe the main differences between

the performance of:

i the class against the school

ii the class against the State

iii the school against the State.

20

Class

0 60Score %

80 10040

School

State

b Why is an outlier shown on the class box plot but not shown on the school box plot?

REAS

ONING

Creating your own parallel box plots

12 a Choose an area of study for which you can collect data easily, for example:

• heights or weights of students

• maximum temperatures over a weekly period

• amount of pocket money received each week for a group of students.

b Collect at least two sets of data for your chosen area of study – perhaps from two or three

different sources, including the internet.

Examples:

• Measure student heights in your class and from a second class in the same year level.

• Record maximum temperatures for 1 week and repeat for a second week to obtain a

second data set.

• Use the internet to obtain the football scores of two teams for each match in the

previous season.

c Draw parallel box plots for your data.

d Write a report on the characteristics of each data set and the similarities and differences

between the data sets collected.

ENRICH

MEN

T

12— —

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 28: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 637 637

Using calculators to draw box plots

1 Type these data into lists and define them as Test A and Test B.

Test A: 4, 6, 3, 4, 1, 3, 6, 4, 5, 3, 4, 3

Test B: 7, 3, 5, 6, 9, 3, 6, 7, 4, 1, 4, 6

2 Draw parallel box plots for the data.

Using the TI-Nspire: Using the ClassPad:

1 Go to a new Lists and spreadsheets page and enterthe data into the lists. Give each column a title.

2 Go to a new Data and Statistics page and select thetesta variable for the horizontal axis. Select menu, PlotType, Box Plot. Trace to reveal the statisticalmeasures. To show the box plot for testb, go to menu,Plot Properties, Add X Variable and click on testb.

1 In the Statistics application enter the data into the lists.Give each column a title.

2 Tap . For graph 1, set Draw to On, Type to Med-Box, XList to main\TestA and Freq to 1. For graph 2, setDraw to On, Type to MedBox, XList to main\TestB and

Freq to 1. Tap Set. Tap .

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 29: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

638 638 Chapter 9 Statistics

9E Standard deviation 10A

For a single data set we have already used the range and interquartile range to describe the spread of

the data. Another statistic commonly used to describe spread is standard deviation. The standard deviation

is a number that describes how far data values are from the mean. A data set with a relatively small

standard deviation will have data values concentrated about the mean, and if a data set has a relatively large

standard deviation then the data values will be more spread out from the mean.

The standard deviation can be calculated by hand but, given the tedious nature of the calculation,

technology can be used for more complex data sets. In this section technology is not required but you

will be able to find a function on your calculator (often denoted s) that can be used to find the

standard deviation.

Let’s start: Which is the better team?

These histograms show the number

of points scored by the Eagles and the

Monsters basketball teams in an 18-round

competition. The mean and standard

deviation are given for each team.

• Which team has the higher mean?

What does this say about the team’s

performance?

• Which team has the smaller standard

deviation? What does this say about

the team’s performance? Discuss.

2010 4030 6050

Eagles

Mean = 52Standard deviation = 21

8070 10090

1

3

Fre

quen

cy

Score

4

2

5

2010 4030 6050 8070

Monsters

Mean = 43Standard deviation = 11

10090

1

3

Score

4

2

5

6

7

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 30: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 639 639

Keyideas

The standard deviation is a number that describes the spread of data about the mean.

• The symbol used for standard deviation is s (sigma).

• The sample standard deviation is for a sample data set drawn from the population.

• If every data value from a population is used, then we calculate the population standard

deviation.

To calculate the sample standard deviation, follow these steps.

1 Find the mean (x̄).

2 Find the difference between each value and the mean (called the deviation).

3 Square each deviation.

4 Sum the squares of each deviation.

5 Divide by the number of data values less 1

(i.e. n – 1).

6 Take the square root.

s =

√(x1 – x̄)2 + (x2 – x̄)2 + . . . + (xn – x̄)2

(n – 1)

If the data represent the complete population, then divide by n instead of (n – 1). This would give

the population standard deviation. Dividing by (n – 1) for the sample standard deviation gives a

better estimate of the population standard deviation.

If data are concentrated about the mean, then the standard deviation is relatively small.

If data are spread out from the mean, then the standard deviation is relatively large.

In many common situations we can expect 95% of the data to be within two standard deviations

of the mean.

Example 7 Calculating the standard deviation

Calculate the mean and sample standard deviation for this small data set, correct to one decimal place.

2, 4, 5, 8, 9

SOLUTION EXPLANATION

x̄ =2 + 4 + 5 + 8 + 9

5

= 5.6

s =

√(2 – 5.6)2 + (4 – 5.6)2 + (5 – 5.6)2 + (8 – 5.6)2 + (9 – 5.6)2

5 – 1

Sum all the data values and divide

by the number of data values (i.e. 5)

to find the mean.

=

√(–3.6)2 + (–1.6)2 + (–0.6)2 + (2.4)2 + (3.4)2

4

= 2.9 (to 1 d.p.)

Sum the square of all the

deviations, divide by n – 1 (i.e. 4)

and then take the square root.

Deviation 1 is 2 – 5.6 = –3.6

Deviation 2 is 4 – 5.6 = –1.6 etc.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 31: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

640 640 Chapter 9 Statistics

Example 8 Interpreting the standard deviation

This back-to-back stem-and-leaf plot shows the distribution of distances that 17 people in Darwin

and Sydney travel to work. The means and standard deviations are given.

Darwin Sydney SydneyLeaf Stem Leaf x̄ = 27.9

s = 15.18 7 4 2 0 1 59 9 5 5 3 1 2 3 78 7 4 3 0 2 0 5 5 6 Darwin

5 2 2 3 2 5 9 9 x̄ = 19.0

4 4 4 6 s = 10.1

5 2

3 | 5 means 35 kmConsider the position and spread of the data and then answer the following.

a By looking at the stem-and-leaf plot, suggest why Darwin’s mean is less than that of Sydney.

b Why is Sydney’s standard deviation larger than that of Darwin?

c Give a practical reason for the difference in centre and spread for the data for Darwin and Sydney.

SOLUTION EXPLANATION

a The maximum score for Darwin is 35.

Sydney’s mean is affected by several values

larger than 35.

The mean depends on every value in the

data set.

b The data for Sydney are more spread out from

the mean. Darwin’s scores are more closely

clustered near its mean.

Sydney has more scores with a large distance

from its mean. Darwin’s scores are closer to

the Darwin mean.

c Sydney is a larger city and more spread out,

so people have to travel farther to get to work.

Higher populations often lead to larger cities

and longer travel distances.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 32: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 641 641

Exercise 9E

1 Insert the word smaller or larger into each sentence.

a If data are more spread out from the mean, then the standard deviation is .

b If data are more concentrated about the mean, then the standard deviation is .

2 Here are two dot plots, A and B.

3 4 521

A

3 4 521

B

a Which data set (A or B) would have the higher mean?

b Which data set (A or B) would have the higher standard deviation?

3 These dot plots show the results for a class of 15 students who sat tests A and B. Both sets of

results have the same mean and range.

3 4 5 6 7 8 9Mean

A

3 4 5 6Mean

7 8 9

B

a Which data set (A or B) would have the higher standard deviation?

b Give a reason for your answer in part a.

4 This back-to-back stem-and-leaf plot compares the

number of trees or shrubs in the backyards of homes

in the suburbs of Gum Heights and Oak Valley.

a Which suburb has the smaller mean number of

trees or shrubs? Do not calculate the actual means.

b Which suburb has the smaller standard deviation?

Gum Heights Oak ValleyLeaf Stem Leaf7 3 1 0

8 6 4 0 1 09 8 7 2 2 0 2 3 6 8 8 99 6 4 3 4 6 8 9

4 3 6

2 | 8 means 28

UNDE

RSTA

NDING

—1–4 3, 4

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 33: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

642 642 Chapter 9 Statistics

9E5Example 7 Calculate the mean and sample standard deviation for these small data sets. Round the

standard deviation to one decimal place where necessary.

3, 5, 6, 7, 9a 1, 1, 4, 5, 7b2, 5, 6, 9, 10, 11, 13c 28, 29, 32, 33, 36, 37d

6 Calculate the mean and sample standard deviation for the data in these graphs, correct to one

decimal place.

3 421

a Stem Leaf0 41 1 3 72 0 2

1 | 7 means 17

b

FLUE

NCY

5(½), 65(½), 6 5(½), 6

7Example 8 This back-to-back stem-and-leaf plot

shows the distribution of distances

travelled by students at an inner-city and

an outer-suburb school. The means and

standard deviations are given.

Consider the position and spread of the

data and then answer the following.a Why is the mean for the outer-suburb

Inner city Outer suburb Inner cityLeaf Stem Leaf x̄ = 10.6

9 6 4 3 1 1 0 3 4 9 s = 8.0

9 4 2 0 1 2 8 8 97 1 2 1 3 4 Outer suburb

3 4 x̄ = 18.8

4 1 s = 10.7

2 | 4 means 24school larger than that for the inner 1-1 city school?

b Why is the mean for the outer-suburb school larger than that for the inner-city school?

c Why is the standard deviation for the inner-city school smaller than that for the

outer-suburb school?

d Give a practical reason for the difference in centre and spread for the two schools.

8 Consider these two histograms, and then state whether the following are true or false.

100

20 30 40 50

1

34

2

5A

100

20 30 40 50

1

34

2

5B

a The mean for set A is greater than the mean for set B.

b The range for set A is greater than the range for set B.

c The standard deviation for set A is greater than the standard deviation for set B.

9 Find the mean and sample standard deviation for the scores in these frequency tables. Round

the standard deviations to one decimal place.

Score Frequency1 32 13 3

a Score Frequency4 15 46 3

b

PROB

LEM-SOLVING

8, 97, 8 7, 8

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 34: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 643 643

9E10 Two simple data sets, A and B, are identical except for the maximum value, which is an outlier

for set B.

A: 4, 5, 7, 9, 10

B: 4, 5, 7, 9, 20

a Is the range for set A equal to the range for set B?

b Is the mean for each data sets the same?

c Is the median for each data set the same?

d Would the standard deviation be affected by the outlier? Explain.

11 Data sets 1 and 2 have means x̄1 and x̄2, and standard deviations s1 and s2.

a If x̄1 > x̄2, does this necessarily mean that s1 > s2? Give a reason.

b If s1 < s2 does this necessarily mean that x̄1 < x̄2?

12 Data sets A and B each have 20 data values and are very similar except for an outlier in set A.

Explain why the interquartile range might be a better measure of spread than the range or

the standard deviation.

REAS

ONING

11, 1210 10, 11

Study scores

13 The Mathematics study scores (out of 100) for 50 students in a school are as listed.

71, 85, 62, 54, 37, 49, 92, 85, 67, 89

96, 44, 67, 62, 75, 84, 71, 63, 69, 81

57, 43, 64, 61, 52, 59, 83, 46, 90, 32

94, 84, 66, 70, 78, 45, 50, 64, 68, 73

79, 89, 80, 62, 57, 83, 86, 94, 81, 65

The mean (x̄) is 69.16 and the population standard deviation (s) is 16.0.

a Calculate:

x̄ + si x̄ – sii x̄ + 2siiix̄ – 2siv x̄ + 3sv x̄ – 3svi

b Use your answers from part a to find the percentage of students with a score within:

one standard deviation from the meanitwo standard deviations from the meaniithree standard deviations from the mean.iii

c i Research what it means when we say that the data are ‘normally distributed’. Give a

brief explanation.

ii For data that are normally distributed, find out what percentage of data are within one,

two and three standard deviations from the mean. Compare this with your results for

part b above.

ENRICH

MEN

T

13— —

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 35: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

644 644 Chapter 9 Statistics

Progress quiz

138pt9A What type of data would these survey questions generate?

a How many pets do you have?

b What is your favourite ice-cream flavour?

238pt9B A Year 10 class records the length of time (in minutes) each student takes to travel from home

to school. The results are listed here.

15 32 6 14 44 28 15 9 25 188 16 33 20 19 27 23 12 38 15

a Organise the data into a frequency table, using class intervals of 10. Include a percentage

frequency column.

b Construct a histogram for the data, showing both the frequency and percentage frequency

on the one graph.

c Construct a stem-and-leaf plot for the data.

d Use your stem-and-leaf plot to find the median.

338pt9C Determine the range and IQR for these data sets by finding the five-figure summary.

a 4, 9, 12, 15, 16, 18, 20, 23, 28, 32

b 4.2, 4.3, 4.7, 5.1, 5.2, 5.6, 5.8, 6.4, 6.6

438pt9C The following numbers of parked cars were counted in the school car park and adjacent street

each day at morning recess for 14 school days.

36, 38, 46, 30, 69, 31, 40, 37, 55, 34, 44, 33, 47, 42

a For the data, find:

i the minimum and maximum number of cars

ii the median

iii the upper and lower quartiles

iv the IQR

v any outliers.

b Can you give a possible reason for why the outlier occurred?

538pt9D The ages of a team of female gymnasts are given in this data set:

18, 23, 14, 28, 21, 19, 15, 32, 17, 18, 20, 13, 21

a Determine whether any outliers exist by first finding Q1 and Q3.

b Draw a box plot to summarise the data, marking outliers if they exist.

638pt9E

10A

Find the sample standard deviation for this small data set, correct to one decimal place.

2, 3, 5, 6, 9

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 36: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 645 645

9F Time-series dataA time series is a sequence of data values that

are recorded at regular time intervals. Examples

include temperature recorded on the hour, speed

recorded every second, population recorded every

year and profit recorded every month. A line

graph can be used to represent time-series data

and these can help to analyse the data, describe

trends and make predictions about the future.

Let’s start: Share price trends

A company’s share price is recorded at the end of

each month of the financial year, as shown in this

time-series graph.

• Describe the trend in the data at different times of

the year.

• At what time of year do you think the company

starts reporting bad profit results?

• Does it look like the company’s share price will

return to around $4 in the next year? Why?

Share price

0July October January April

Month

1

3

4

2

$

Keyideas

Time-series data are recorded at regular time intervals.

The graph or plot of a time series uses:

• time on the horizontal axis

• line segments connecting points on the graph.

Time

Var

iabl

e

If the time-series plot results in points being on or near a straight line,

then we say that the trend is linear.Downwardlinear trend

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 37: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

646 646 Chapter 9 Statistics

Example 9 Plotting and interpreting a time-series plot

The approximate population of an outback town is recorded from 1990 to 2005.

Year 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005

Population 1300 1250 1250 1150 1000 950 900 950 800 700 800 750 850 950 800 850

a Plot the time series.

b Describe the trend in the data over the 16 years.

SOLUTION EXPLANATION

a

01990 1992 1994 1996

Time (year)

800Popu

lati

on

900

700

1000

1998 2000 2002 2004 2006

1100

1200

1300Use time on

the horizontal

axis. Break the

y-axis so as to not

include 0–700.

Join points with

line segments.

b The population declines steadily for the first 10 years. The population

rises and falls in the last 6 years, resulting in a slight upwards trend.

Interpret the

overall rise and

fall of the lines on

the graph.

Exercise 9F

1 Describe the following time-series plots as having a linear (i.e. straight-line trend),

non-linear trend (i.e. a curve) or no trend.

Var

iabl

e

Time

a

Var

iabl

e

Time

b

Var

iabl

e

Time

c

Var

iabl

e

Time

d

UNDE

RSTA

NDING

—1, 2 2

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 38: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 647 647

9F2 This time-series graph shows the temperature over

the course of an 8-hour school day.

a State the temperature at:

8 a.m.i 12 p.m.ii1 p.m.iii 4 p.m.iv

b What was the maximum temperature?

c During what times did the temperature:

stay the same?i decrease?ii

d Describe the general trend in the temperature

for the 8-hour school day.

0

8 a.m

.

10 a.

m.

12 p.

m.

2 p.m

.

Time

30

Tem

pera

ture

(°C

)

32

28

34

4 p.m

.

36

UNDE

RSTA

NDING

3Example 9 The approximate population of a small village is recorded from 2000 to 2010.

Year 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015

Population 550 500 550 600 700 650 750 750 850 950 900

a Plot the time-series graph.

b Describe the general trend in the data over the

11 years.

c For the 11 years, what was the:

minimum population?imaximum population?ii

4 A company’s share price over 12 months is recorded in this table.

Month J F M A M J J A S O N D

Price ($) 1.30 1.32 1.35 1.34 1.40 1.43 1.40 1.38 1.30 1.25 1.22 1.23

a Plot the time-series graph. Break the y-axis to exclude values from $0 to $1.20.

b Describe the way in which the share price has changed over the 12 months.

c What is the difference between the maximum and minimum share price in the 12 months?

5 The pass rate (%) for a particular examination is given in a table over 10 years.

Year 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004

Pass rate (%) 74 71 73 79 85 84 87 81 84 83

a Plot the time-series graph for the 10 years.

b Describe the way in which the pass rate for the examination has changed in the given

time period.

c In what year was the pass rate a maximum?

d By how much had the pass rate improved from 1995 to 1999?

FLUE

NCY

3, 53, 4 3–5

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 39: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

648 648 Chapter 9 Statistics

9F6 This time-series plot shows the upwards trend of house prices in an Adelaide suburb over

7 years from 2008 to 2014.

2008

2009

2010

2011

2012

2013

2014

2015

2016

2017

300

500

Pri

ce (

$’00

0)

600

400

700

a Would you say that the general trend in house prices is linear or non-linear?

b Assuming the trend in house prices continues for this suburb, what would you expect the

house price to be in:

2015?i 2017?ii

7 The two top-selling book stores for a company list their sales figures for the first 6 months

of the year. Sales amounts are in thousands of dollars.July August September October November December

City Central ($’000) 12 13 12 10 11 13

Southbank ($’000) 17 19 16 12 13 9

a What was the difference in the sales volume for:

August?i December?ii

b In how many months did the City Central store sell more books than the Southbank store?

c Construct a time-series plot for both stores on the same set of axes.

d Describe the trend of sales for the 6 months for:

City Centrali Southbankii

e Based on the trend for the sales for the Southbank store, what would you expect the

approximate sales volume to be in January?

8 Two pigeons (Green Tail and Blue Crest) each have a

beacon that communicates with a recording machine.

The distance of each pigeon from the machine is

recorded every hour for 8 hours.

a State the distance from the machine at 3 p.m. for:

Blue Cresti Green Tailii

b Describe the trend in the distance from the

recording machine for:

Blue Cresti Green Tailii

c Assuming that the given trends continue, predict

the time when the pigeons will be the same

distance from the recording machine.1 p

.m.

2 p.m

.

3 p.m

.

4 p.m

.

5 p.m

.

6 p.m

.

7 p.m

.

8 p.m

.

Time (hours)

1

3Dis

tanc

e (k

m)

4

2

5

6

78

Green Tail

Blue Crest

PROB

LEM-SOLVING

7, 86, 7 6, 7

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 40: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 649 649

9F9 The average monthly maximum temperature for a city is illustrated in this graph.

0

Jan

Feb

Month

15

Ave

rage

tem

p. (

°C)

20

10

25

Apr

il

Mar

June

May

Aug

July

Oct

Sep

Dec

Nov

5

30

a Explain why the average maximum temperature for December is close to the average

maximum temperature for January.

b Do you think this graph is for an Australian city?

c Do you think the data are for a city in the Northern Hemisphere or the Southern Hemisphere?

Give a reason.

10 The balance of an investment account is shown in this

time-series plot.

a Describe the trend in the account balance over the

7 years.

b Give a practical reason for the shape of the curve

that models the trend in the graph.

11 A drink at room temperature is placed in a fridge that

is at 4◦C.

01 2 3 4 5 6 7

Year

15

Bal

ance

($’

000)

20

10

25

5

a Sketch a time-series plot that might show the temperature of the drink after it has been

placed in the fridge.

b Would the temperature of the drink ever get to 3◦C? Why?

REAS

ONING

10, 119 9, 10

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 41: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

650 650 Chapter 9 Statistics

9F Moving run average

12 In this particular question, a moving average is determined by calculating the average of all

data values up to a particular time or place in the data set.

Consider a batsman in cricket with the following runs scored from 10 completed innings.

Innings 1 2 3 4 5 6 7 8 9 10

Score 26 38 5 10 52 103 75 21 33 0

Moving average 26 32 23

a Complete the table by calculating the moving average for innings 4–10. Round to the

nearest whole number where required.

b Plot the score and moving averages for the batter on the same set of axes.

c Describe the behaviour of the:

score graphimoving average graph.ii

d Describe the main difference in the behaviour of the two graphs. Give reasons.

ENRICH

MEN

T

12— —

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 42: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 651 651

9G Bivariate data and scatter plotsWhen we collect information about two variables in a given context,

we are collecting bivariate data. As there are two variables involved in

bivariate data, we use a number plane to graph the data. These graphs are

called scatter plots and are used to illustrate a relationship that may exist

between the variables. Scatter plots make it very easy to see the strength

of the association between the two variables.

Weight

Hei

ght

Let’s start: A relationship or not?

Consider the two variables in each part below.

• Would you expect there to be some relationship between the two variables in each of these cases?

• If you think a relationship exists, would you expect the second listed variable to increase or to decrease

as the first variable increases?

a Height of person and Weight of person

b Temperature and Life of milk

c Length of hair and IQ

d Depth of topsoil and Brand of motorcycle

e Years of education and Income

f Spring rainfall and Crop yield

g Size of ship and Cargo capacity

h Fuel economy and CD track number

i Amount of traffic and Travel time

j Cost of 2 litres of milk and Ability to swim

k Background noise and Amount of work completed

Keyideas

Bivariate data include data for two variables.

• The two variables are usually related; for example, height and weight.

A scatter plot is a graph on a number plane in which the axes variables correspond to the two

variables from the bivariate data.

The words relationship, correlation and association are used to describe the way in which

variables are related.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 43: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

652 652 Chapter 9 Statistics

Keyideas

Types of correlation:

Examples

Strong positive correlation Weak negative correlation No correlation

An outlier can clearly be identified as a data point that is isolated from the rest of the data.

outliery

x

Example 10 Constructing and interpreting scatter plots

Consider this simple bivariate data set.

x 13 9 2 17 3 6 8 15

y 2.1 4.0 6.2 1.3 5.5 0.9 3.5 1.6

a Draw a scatter plot for the data.

b Describe the correlation between x and y as positive or negative.

c Describe the correlation between x and y as strong or weak.

d Identify any outliers.

SOLUTION EXPLANATION

a y

xO

1

3

4

2

5

6

7

2015105

Plot each point using a × symbol on graph paper.

b negative correlation As x increases, y decreases.

c strong correlation The downwards trend in the data is clearly defined.

d The outlier is (6, 0.9). This point defies the trend.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 44: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 653 653

Exercise 9G

1 Decide if it is likely for there to be a strong correlation between these pairs of variables.

a Height of door and Thickness of door handle

b Weight of car and Fuel consumption

c Temperature and Length of phone calls

d Size of textbook and Number of textbook

e Diameter of flower and Number of bees

f Amount of rain and Size of vegetables in the vegetable garden

2 For each of the following sets of bivariate data with variables x and y:

Draw a scatter plot by hand.iDecide whether y generally increases or decreases as x increases.ii

a x 1 2 3 4 5 6 7 8 9 10

y 3 2 4 4 5 8 7 9 11 12

b x 0.1 0.3 0.5 0.9 1.0 1.1 1.2 1.6 1.8 2.0 2.5

y 10 8 8 6 7 7 7 6 4 3 1

UNDE

RSTA

NDING

—1, 2(a) 2(b)

3Example 10 Consider this simple bivariate data set. (Use technology to assist if desired. See page 648.)

x 1 2 3 4 5 6 7 8

y 1.0 1.1 1.3 1.3 1.4 1.6 1.8 1.0

a Draw a scatter plot for the data.

b Describe the correlation between x and y as positive or negative.

c Describe the correlation between x and y as strong or weak.

d Identify any outliers.

4 Consider this simple bivariate data set. (Use technology to assist if desired. See page 648.)

x 14 8 7 10 11 15 6 9 10

y 4 2.5 2.5 1.5 1.5 0.5 3 2 2

a Draw a scatter plot for the data.

b Describe the correlation between x and y as positive or negative.

c Describe the correlation between x and y as strong or weak.

d Identify any outliers.

FLUE

NCY

3, 5(c), 63, 4, 5(a), 6 3, 5(b), 6

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 45: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

654 654 Chapter 9 Statistics

9G5 By completing scatter plots (by hand or using technology) for each of the following data sets,

describe the correlation between x and y as positive, negative or none.

a x 1.1 1.8 1.2 1.3 1.7 1.9 1.6 1.6 1.4 1.0 1.5

y 22 12 19 15 10 9 14 13 16 23 16

b x 4 3 1 7 8 10 6 9 5 5

y 115 105 105 135 145 145 125 140 120 130

c x 28 32 16 19 21 24 27 25 30 18

y 13 25 22 21 16 9 19 25 15 12

6 For the following scatter plots, describe the correlation between x and y.

y

x

a b

y

x

c y

x

d

FLUE

NCY

7 For common motor vehicles, consider the two variables Engine size (cylinder volume) and Fuel

economy (number of kilometres travelled for every litre of petrol).

a Do you expect there to be some relationship between these two variables?

b As the engine size increases, would you expect the fuel economy to increase or decrease?

c The following data were collected for 10 vehicles.

Car A B C D E F G H I JEngine size 1.1 1.2 1.2 1.5 1.5 1.8 2.4 3.3 4.2 5.0

Fuel economy 21 18 19 18 17 16 15 20 14 11

PROB

LEM-SOLVING

8, 107, 8 8, 9

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 46: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 655 655

9GDo the data generally support your answers to parts a and b?iWhich car gives a fuel economy reading that does not support the general trend?ii

8 A tomato grower experiments with a new organic fertiliser and sets up five separate garden beds:

A, B, C, D and E. The grower applies different amounts of fertiliser to each bed and records the

diameter of each tomato picked.

The average diameter of a tomato from each garden bed and the corresponding amount of

fertiliser are recorded below.

Bed A B C D EFertiliser (grams per week) 20 25 30 35 40

Average diameter (cm) 6.8 7.4 7.6 6.2 8.5

a Draw a scatter plot for the data with ‘Diameter’ on the vertical axis and ‘Fertiliser’ on the

horizontal axis. Label the points A, B, C, D and E.

b Which garden bed appears to go against the trend?

c According to the given results, would you be confident in saying that the amount of fertiliser

fed to tomato plants does affect the size of the tomato produced?

PROB

LEM-SOLVING

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 47: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

656 656 Chapter 9 Statistics

9G9 In a newspaper, the number of photos and number of words were counted for 15 different pages.

Here are the results.

Page 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15Number of photos 3 2 1 2 6 4 5 7 4 5 2 3 1 0 1

Number of words 852 1432 1897 1621 912 1023 817 436 1132 1201 1936 1628 1403 2174 1829

a Sketch a scatter plot using ‘Number of photos’ on the horizontal axis and ‘Number of words’

on the vertical axis.

b From your scatter plot, describe the general relationship between the number of photos and

the number of words per page. Use the words positive, negative, strong correlation or weak

correlation.

10 On 14 consecutive days, a local council measures the

volume of sound heard from a freeway at

various points in a local suburb. The volume of sound,

in decibels, is recorded against the

distance (in metres) between the freeway and the point

in the suburb.

FPO

Distance (m) 200 350 500 150 1000 850 200 450 750 250 300 1500 700 1250

Volume (dB) 4.3 3.7 2.9 4.5 2.1 2.3 4.4 3.3 2.8 4.1 3.6 1.7 3.0 2.2

a Draw a scatter plot of Volume against Distance, plotting Volume on the vertical axis and

Distance on the horizontal axis.

b Describe the correlation between Distance and Volume as positive, negative or none.

c Generally, as Distance increases does Volume increase or decrease?

PROB

LEM-SOLVING

11 A government department is interested in convincing the electorate that a larger number of police

on patrol leads to a lower crime rate. Two separate surveys are completed over a one-week

period and the results are listed in this table.

Area A B C D E F GNumber of police 15 21 8 14 19 31 17

Survey 1Incidence of crime 28 16 36 24 24 19 21

Number of police 12 18 9 12 14 26 21Survey 2

Incidence of crime 26 25 20 24 22 23 19

a By using scatter plots, determine whether or not there is a relationship between the number

of police on patrol and the incidence of crime, using the data in:

survey 1i survey 2ii

b Which survey results do you think the government will use to make its point? Why?

REAS

ONING

12, 1311 11, 12

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 48: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 657 657

9G12 A student collects some data and finds that there is a positive correlation between height and

the ability to play tennis. Does that mean that if you are tall you will be better at tennis? Explain.

13 A person presents you with this scatter plot and suggests a

strong correlation between the amount of sleep and exam

marks. What do you suggest is the problem with the person’s

graph and conclusions?

O

100

50

84Amount of sleep (hours)

Exa

m m

arks

(%

)

x

y

REAS

ONING

Does television provide a good general knowledge?

14 A university graduate is conducting a test to see whether a student’s general knowledge is in

some way linked to the number of hours of television watched.

Twenty Year 10 students sit a written general knowledge test marked out of 50. Each student

also provides the graduate with details about the number of hours of television watched

per week. The results are given in the table below.

Student A B C D E F G H I J K L M N O P Q R S THours of TV 11 15 8 9 9 12 20 6 0 15 9 13 15 17 8 11 10 15 21 3

Test score 30 4 13 35 26 31 48 11 50 33 31 28 27 6 39 40 36 21 45 48

a Which two students performed best on the general knowledge test, having watched TV for

the following numbers of hours?

fewer than 10i more than 4ii

b Which two students performed worst on the general knowledge test, having watched TV for

the following numbers of hours?

fewer than 10i more than 4ii

c Which four students best support the argument that the more hours of TV watched, the better

your general knowledge will be?

d Which four students best support the argument that the more hours of TV watched, the

worse your general knowledge will be?

e From the given data, would you say that the graduate should conclude that a student’s general

knowledge is definitely linked to the number of hours of TV watched per week?

ENRICH

MEN

T

14— —

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 49: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

658 658 Chapter 9 Statistics

Using calculators to draw scatter plots

Type the following data about car fuel economy into two lists and draw a scatter plot.

Car A B C D E F G H I JEngine size 1.1 1.2 1.2 1.5 1.5 1.8 2.4 3.3 4.2 5.0

Fuel economy 21 18 19 18 17 16 15 20 14 11

Using the TI-Nspire: Using the ClassPad:

1 Go to a new Lists and spreadsheets page and enterthe data into the lists. Title each column.

2 Go to a new Data and Statistics page and select theengine variable for the horizontal axis and fuel for thevertical axis. Hover over points to reveal coordinates.

1 In the Statistics application, enter the data into thelists. Assign a title to each column.

2 Tap . For graph 1 set Draw to On, Type toScatter, XList to main\Engine, YList to main\Fuel, Freq

to 1 and Mark to square. Tap Set. Tap . TapAnalysis, Trace to reveal coordinates.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 50: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 659 659

9H Line of best fit by eyeWhen bivariate data have a strong linear correlation, we can model the data with a straight line. This

line is called a trend line or line of best fit. When we fit the line ‘by eye’, we try to balance the number

of data points above the line with the number of points below the line. This trend line and its equation can

then be used to construct other data points within and outside the existing data points.

Let’s start: Size versus cost

This scatter plot shows the estimated cost of building a house of a given size, as quoted by a building

company. The given trend line passes through the points (200, 200) and (700, 700).

O 200 400Size of house (m2)

Cos

t of c

onst

ruct

ion

($’0

00)

400

600

200

600 800

(700, 700)

(200, 200)

800

y

x

• Do you think the trend line is a good fit to the points on the scatter plot? Why?

• How can you find the equation of the trend line?

• How can you predict the cost of a house of 1000 m2 with this building company?

Keyideas

A line of best fit or (trend line) is positioned by eye by balancing the number of points above the

line with the number of points below the line.

• The distance of each point from the trend line also must be taken into account.

The equation of the line of best fit can be found using two

points that are on the line of best fit.

For y = mx + c:

m =y2 – y1x2 – x1

and substitute a point to find the value of c.

• Alternatively, use y – y1 = m(x – x1).

The line of best fit and its equation can be used for:

• interpolation: constructing points within the given

data range

• extrapolation: constructing points outside the given

data range.

y(x1, y1)

(x2, y2)

x

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 51: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

660 660 Chapter 9 Statistics

Example 11 Fitting a line of best fit

Consider the variables x and y and the corresponding bivariate data.

x 1 2 3 4 5 6 7

y 1 2 2 3 4 4 5

a Draw a scatter plot for the data.

b Is there positive, negative or no correlation between x and y?

c Fit a line of best fit by eye to the data on the scatter plot.

d Use your line of best fit to estimate:

y when x = 3.5i y when x = 0ii

x when y = 1.5iii x when y = 5.5iv

SOLUTION EXPLANATION

a

O 1 2 3 5

3

4

2

5

6 7 84

1

6

x

y Plot the points on graph paper.

b Positive correlation As x increases, y increases.

c

O 1 2 3 5

3

4

2

5

6 7 84

1

6

x

y Since a relationship exists, draw a line on the

plot, keeping as many points above as below

the line. (There are no outliers in this case.)

d i y ≈ 2.7

ii y ≈ 0.4

iii x ≈ 1.7

iv x ≈ 7.8

Extend vertical and horizontal lines from the

values given and read off your solution. As

they are approximations, we use the ≈ sign

and not the = sign.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 52: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 661 661

Example 12 Finding the equation of a line of best fit

This scatter plot shows a linear relationship

between English marks and Literature marks

in a small class of students. A trend line

passes through (40, 30) and (60, 60).

a Find the equation of the trend line.

b Use your equation to estimate a Literature score

if the English score is:

50i 86ii

c Use your equation to estimate the English score

if the Literature score is:

42i 87iiO 20 40

English mark (%)

Lit

erat

ure

mar

k (%

)

40

60

20

60 80

(60, 60)

(40, 30)

100

80

100

y

x

SOLUTION EXPLANATION

a y = mx + c

m =60 – 3060 – 40

=3020

=32

∴ y =32x + c

(40, 30): 30 =32(40) + c

30 = 60 + c

c = –30

∴ y =32x – 30

Use m =y2 – y1x2 – x1

for the two given points.

Substitute either (40, 30) or (60, 60) to find c.

b i y =32(50) – 30 = 45

∴ Literature score is 45.

ii y =32(86) – 30 = 99

∴ Literature score is 99.

Substitute x = 50 and find the value of y.

Repeat for x = 86.

c i 42 =32x – 30

72 =32x, so x = 48.

∴ English score is 48.

ii 87 =32x – 30

117 =32x, so x = 78.

∴ English score is 78.

Substitute y = 42 and solve for x.

Repeat for y = 87.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 53: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

662 662 Chapter 9 Statistics

Exercise 9H

1 Practise fitting a line of best fit on these scatter plots by trying to balance the number of points

above the line with the numbers of points below the line. (Using a pencil might help.)

a b

c d

2 For each graph find the equation of the line in the form y = mx + c. First, find the gradient

m =y2 – y1x2 – x1

and then subsititute a point.

(3, 5)

(11, 9)

x

ya

(1, 5)

(4, 3)

x

yb

3 Using y =54x – 3, find:

a y when:

x = 16i x = 7ii

b x when:

y = 4i y =12

ii

UNDE

RSTA

NDING

—1–3 3

4Example 11 Consider the variables x and y and the corresponding bivariate data.

x 1 2 3 4 5 6 7

y 2 2 3 4 4 5 5

a Draw a scatter plot for the data.

b Is there positive, negative or no correlation between x and y?

c Fit a line of best fit by eye to the data on the scatter plot.

d Use your line of best fit to estimate:

y when x = 3.5i y when x = 0iix when y = 2iii x when y = 5.5iv

FLUE

NCY

4, 64–6 4–6UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 54: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 663 663

9H5 For the following scatter plots, pencil in a line of best fit by eye, and then use your line to

estimate the value of y when x = 5.

O

10

5

105x

ya

xO

20

10

105

yb

xO

1.0

0.5

102.5 5 7.5

yc

xO

100

50

105

yd

6Example 12 This scatter plot shows a linear relationship between Mathematics marks and Physics marks in

a small class of students. A trend line passes through (20, 30) and (70, 60).

a Find the equation of the trend line.

b Use your equation to find the Physics score if the Mathematics score is:

40i 90iic Use your equation to find the Mathematics score if the Physics score is:

36i 78ii

O 20 40 60Mathematics (%)

Phy

sics

(%

)

80 100

40

20

60

80

100

y

x

(20, 30)

(70, 60)

FLUE

NCY

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 55: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

664 664 Chapter 9 Statistics

9H7 Over eight consecutive years, a city nursery has

measured the growth of an outdoor bamboo species

for that year. The annual rainfall in the area where

the bamboo is growing was also recorded. The data

are listed in the table. FPO

Rainfall (mm) 450 620 560 830 680 650 720 540

Growth (cm) 25 45 25 85 50 55 50 20

a Draw a scatter plot for the data, showing growth on the vertical axis.

b Fit a line of best fit by eye.

c Use your line of best fit to estimate the growth expected for the following rainfall readings.

You do not need to find the equation of the line.

500 mmi 900 mmii

d Use your line of best fit to estimate the rainfall for a given year if the growth of the

bamboo was:

30 cmi 60 cmii

8 A line of best fit for a scatter plot, relating the weight (kg) and length (cm) of a group of dogs,

passes through the points (15, 70) and (25, 120). Assume weight is on the x-axis.

a Find the equation of the trend line.

b Use your equation to estimate the length of an 18 kg dog.

c Use your equation to estimate the weight of a dog that has a length of 100 cm.

PROB

LEM-SOLVING

7, 87 7, 8

9 Describe the problem when using each trend line below for interpolation.

a b

10 Describe the problem when using each trend line below for extrapolation.

a b

REAS

ONING

10, 119 9, 10

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 56: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 665 665

9H11 A trend line relating the percentage

scores for Music performance (y)

and Music theory (x) is given by

y =45x + 10.

a Find the value of x when:

y = 50iy = 98ii

b What problem occurs in predicting

Music theory scores when using

high Music performance scores?

REAS

ONING

Heart rate and age

12 Two independent scientific experiments confirmed a correlation between Maximum heart rate

(in beats per minute or b.p.m.) and Age (in years). The data for the two experiments are as follows.

Experiment 1Age (years) 15 18 22 25 30 34 35 40 40 52 60 65 71

Max. heart rate (b.p.m.) 190 200 195 195 180 185 170 165 165 150 125 128 105

Experiment 2Age (years) 20 20 21 26 27 32 35 41 43 49 50 58 82

Max. heart rate (b.p.m.) 205 195 180 185 175 160 160 145 150 150 135 140 90

a Sketch separate scatter plots for experiment 1 and experiment 2.

b By fitting a line of best fit by eye to your scatter plots, estimate the maximum heart rate

for a person aged 55 years, using the results from:

experiment 1i experiment 2ii

c Estimate the age of a person who has a maximum heart rate of 190, using the results from:

experiment 1i experiment 2ii

d For a person aged 25 years, which experiment estimates a lower maximum heart rate?

e Research the average maximum heart rate of people according to age and compare with the

results given above.

ENRICH

MEN

T

12— —

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 57: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

666 666 Chapter 9 Statistics

9I Linear regression with technology 10A

In section 9Hwe used a line of best fit by eye to describe a general linear (i.e. straight line) trend for bivariate

data. In this section we look at the more formal methods for fitting straight lines to bivariate data. This is

called linear regression. There are many different methods used by statisticians to model bivariate data.

Two common methods are least squares regression and median–median regression. These methods are best

handled with the use of technology.

Let’s start: What can my calculator or software do?

Explore the menus of your chosen technology to see what kind of regression tools are available. For

CAS calculator users, refer to page 6 7.

• Can you find the least squares regression and median–median regression tools?

• Use your technology to try Example 13 below.

Keyideas

Linear regression involves using a method to fit a straight line to bivariate data.

• The result is a straight line equation that can be used for interpolation and extrapolation.

The least squares regression line minimises the sum of the square of the deviations of each point

from the line.

• Outliers have an effect on the least squares regression line because all deviations are included

in the calculation of the equation of the line.

The median–median regression line uses the three medians from the lower, middle and upper

groups within the data set.

• Outliers do not have an effect on the median–median line as they do not significantly alter the

median values.

Example 13 Finding and using regression lines

Consider the following data and use a graphics or CAS calculator or software to help answer

the questions below.

x 1 2 2 4 5 5 6 7 9 11

y 1.8 2 1.5 1.6 1.7 1.3 0.8 1.1 0.8 0.7

a Construct a scatter plot for the data.

b Find the equation of the least squares regression line.

c Find the equation of the median–median regression line.

d Sketch the graph of the regression line onto the scatter plot.

e Use the least squares regression line to estimate the value of y when x is:

4.5i 15ii

f Use the median–median regression line to estimate the value of y when x is:

4.5i 15ii

6

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 58: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 667 667

Using calculators to find equations of regression

Using the TI-Nspire: Using the ClassPad:

a, b, d Go to a new Lists and spreadsheets page andenter the data into the lists. Title each columnxvalue and yvalue. Go to a new Data andStatistics page and select xvalue variable for thehorizontal axis and yvalue for the vertical axis. Toproduce the least squares regression line, go tomenu, Analyze, Regression, Show Linear (mx+b).

c, d To produce the median–median regression line, goto menu, Analyze, Regression, ShowMedian-Median.

e i y ≈ 1.39ii y ≈ 0.06

f i y ≈ 1.47ii y ≈ 0.03

a, b, d In the Statistics application enter the data into thelists. Tap Calc, Linear Reg and set XList to list1,YList to list2, Freq to 1, Copy Formula to y1 andCopy Residual to Off. Tap OK to view theregression equation. Tap on OK again to view theregression line. Tap Analysis, Trace and thenscroll along the regression line.

c, d To produce the median–median regression line, tapCalc, MedMed Line and apply the settings statedpreviously.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 59: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

668 668 Chapter 9 Statistics

Exercise 9I

1 A regression line for a bivariate data set is given by y = 2.3x – 4.1. Use this equation to find:

a the value of y when x is:

7i 3.2ii

b the value of x when y is:

12i 0.5ii

2 Give a brief reason why a linear regression line is not very useful in the following scatter plots.

y

x

a y

x

b

UNDE

RSTA

NDING

—1, 2 2

3Example 13 Consider the data in tables A–C and use a graphics or CAS calculator or software to help

answer the following questions.

A x 1 2 3 4 5 6 7 8

y 3.2 5 5.6 5.4 6.8 6.9 7.1 7.6

B x 3 6 7 10 14 17 21 26

y 3.8 3.7 3.9 3.6 3.1 2.5 2.9 2.1

C x 0.1 0.2 0.5 0.8 0.9 1.2 1.6 1.7

y 8.2 5.9 6.1 4.3 4.2 1.9 2.5 2.1

a Construct a scatter plot for the data.

b Find the equation of the least squares regression line.

c Find the equation of the median–median regression line.

d Sketch the graph of the regression line onto the scatter plot.

e Use the least squares regression line to estimate the value of y when x is:

7i 12ii

f Use the median–median regression line to estimate the value of y when x is:

7i 12ii

FLUE

NCY

3(B), 43(A, B), 4 3(A, C), 4

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 60: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 669 669

9I4 The values and ages of 14 cars are summarised in these tables.

Age (years) 5 2 4 9 10 8 7

Price ($’000) 20 35 28 14 11 12 15

Age (years) 11 2 1 4 7 6 9

Price ($’000) 5 39 46 26 19 17 14

a Using Age for the x-axis, find:

the least squares regression linei the median–median regression line.iib Use your least squares regression line to estimate the value of a 3-year-old car.

c Use your median–median regression line to estimate the value of a 12-year-old car.

d Use your least squares regression line to estimate the age of a $15 000 car.

e Use your median–median regression line to estimate the age of an $8000 car.

FLUE

NCY

5 A factory that produces denim jackets does not have air-conditioning. It was suggested that

high temperatures inside the factory were having an effect on the number of jackets able

to be produced, so a study was completed and data collected on 14 consecutive days.

Max. daily temp. inside factory (◦C) 28 32 36 27 24 25 29 31 34 38 41 40 38 31

Number of jackets produced 155 136 120 135 142 148 147 141 136 118 112 127 136 132

Use a graphics or CAS calculator to complete the following.

a Draw a scatter plot for the data.

b Find the equation of the least squares regression line.

c Graph the line onto the scatter plot.

d Use the regression line to estimate how many jackets would be able to be produced if the

maximum daily temperature in the factory was:

30◦Ci 35◦Cii 45◦Ciii

6 A particular brand of electronic photocopier is considered for scrap once it has broken down

more than 50 times or if it has produced more than 200 000 copies. A study of one particular

copier gave the following results.

Number of copies ( × 1000) 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150

Total number of breakdowns 0 0 1 2 2 5 7 9 12 14 16 21 26 28 33

a Sketch a scatter plot for the data.

b Find the equation of the median–median regression line.

c Graph the median–median regression line onto the scatter plot.

d Using your regression line, estimate the number of copies the photocopier will have produced

at the point when you would expect 50 breakdowns.

e Would you expect this photocopier to be considered for scrap because of the number of

breakdowns or the number of copies made?

PROB

LEM-SOLVING

6, 75, 6 5, 6

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 61: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

670 670 Chapter 9 Statistics

9I7 At a suburban sports club, the distance record for the hammer throw has increased over time.

The first recorded value was 72.3 m in 1967 and the most recent record was 118.2 m in 1996.

Further details are as follows.

Year 1967 1968 1969 1976 1978 1983 1987 1996

New record (m) 72.3 73.4 82.7 94.2 99.1 101.2 111.6 118.2

a Draw a scatter plot for the data.

b Find the equation of the median–median regression line.

c Use your regression equation to estimate the distance record for the hammer throw for:

2000i 2020ii

d Would you say that it is realistic to use your regression equation to estimate distance

records beyond 2020? Why?

PROB

LEM-SOLVING

8 a Briefly explain why the least squares regression line is affected by outliers.

b Briefly explain why the median–median regression line is not strongly affected by outliers.

9 This scatter plot shows both the least squares regression line and

the median–median regression line. Which line (i.e. A or B) do

you think is the least squares line? Give a reason.

y

A

B

x

REAS

ONING

8, 98 8, 9

Correlation coefficient

10 Use the internet to find out about the

Pearson correlation coefficient and then

answer these questions.

a What is the coefficient used for?

b Do most calculators include the

coefficient as part of their statistical

functions?

c What does a relatively large or small

correlation coefficient mean?

Statisticians work in many fields of industry, business,finance, research, government and social services.As computers are used to process the data, they canspend more time on higher-level skills, such as designingstatistical investigations, and data analysis.

ENRICH

MEN

T

10— —

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 62: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 671 671

InvestigationIndigenous population comparisonThe following data was collected by the Australian

Bureau of Statistics during the 2011 National

Census. It shows the population of Indigenous and

non-Indigenous people in Australia and uses class

intervals of 5 years.

Cat. No. 2068.0 – 2006 Census Tables2006 Census of Population and HousingAustralia (Australia)INDIGENOUS STATUS BY AGECount of personsBased on place of usual residenceCommonwealth of Australia 2007

Age Total Indigenous Non-Indigenous Indigenous status not statedTotal all ages 21 507 719 548 371 19 900 765 1 058 5830–4 years 1 421 048 67 416 1 282 738 70 8945–9 years 1 351 921 64 935 1 222 111 64 87510–14 years 1 371 055 64 734 1 241 794 64 52715–19 years 1 405 798 59 200 1 282 018 64 58020–24 years 1 460 675 46 455 1 333 622 80 59825–29 years 1 513 237 38 803 1 387 922 86 51230–34 years 1 453 777 33 003 1 345 763 75 01135–39 years 1 520 138 34 072 1 414 171 71 89540–44 years 1 542 879 33 606 1 438 346 70 92745–49 years 1 504 142 28 818 1 407 494 67 83050–54 years 1 447 403 24 326 1 357 677 65 40055–59 years 1 297 247 18 640 1 220 529 58 07860–64 years 1 206 117 13 592 1 138 395 54 13065 years and over 3 012 282 20 771 2 828 185 163 326

Indigenous histogram

a Use the given data to construct a histogram for the population of Indigenous people in Australia

in 2011.

b Which age group contained the most Indigenous people?

c Describe the shape of the histogram. Is it symmetrical or skewed?

Non-Indigenous histogram

a Use the given data to construct a histogram for the population of non-Indigenous people in

Australia in 2011.

Try to construct this histogram so it is roughly the same width and height as the histogram

for the Indigenous population. You will need to rescale the y-axis.

b Which age group contains the most number of non-Indigenous people?

c Describe the shape of the histogram. Is it symmetrical or skewed?

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 63: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

672 672 Chapter 9 Statistics

Comparisons

a Explain the main differences in the shapes of the two histograms.

b What do the histograms tell you about the age of Indigenous and non-Indigenous people in

Australia in 2011?

c What do the graphs tell you about the difference in life expectancy for Indigenous and

non-Indigenous people?

Antarctic iceAccording to many different studies, the Antarctic ice mass is decreasing over time. The following

data show the approximate change in ice mass in gigatonnes (Gt; 109 tonnes) from 2002 to 2009.

Year 2002 2003 2004 2005 2006 2007 2008 2009

Change (Gt) 450 300 400 200 –100 –400 –200 –700

A change of 300, for example, means that the ice mass has increased by 300 Gt in that year.

Data interpretation

a By how much did the Antarctic ice mass

increase in:

2002?i

2005?ii

b By how much did the Antarctic ice mass

decrease in:

2006?i

2009?ii

c What was the overall change in ice mass from the beginning of 2005 to the end of 2007?

Time-series plot

a Construct a time-series plot for the given data.

b Describe the general trend in the change in ice mass over the 8 years.

Line of best fit

a Fit a line of best fit by eye to your time-series plot.

b Find an equation for your line of best fit.

c Use your equation to estimate the change in ice mass for:

2010i 2020ii

Regression

a Use technology to find either the least squares regression line or the median–median regression

line for your time-series plot.

b How does the equation of these lines compare to your line of best fit found above?

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 64: Chapter Statistics 81&255(&7('6$03/(3$*(6

ii

ii

ii

ii

Statistics and Probability 673 673

Problems and challengesCheck out the 'Working

with unfamiliar problems'poster on the inside cover

of your book to help youanswer these questions.

1 The mean mass of six boys is 71 kg, and the

mean mass of five girls is 60 kg. Find the

average mass of all 11 people put together.

2 Sean has a current four-topic average of 78% for Mathematics. What score does he need in the fifth

topic to have an overall average of 80%?

3 A single-ordered data set includes the following data.

2, 4, 5, 6, 8, 10, x

What is the largest possible value of x if it is not an outlier?

4 Find the interquartile range for a set of data if 75% of the data are above 2.6 and 25% of the data are

above 3.7.

5 A single data set has 3 added to every value. Describe the change in:

the meana the medianb the rangec

the interquartile ranged the standard deviation.e

6 Three key points on a scatter plot have coordinates (1, 3), (2, 3) and (4, 9).

Find a quadratic equation that fits these three points exactly.

O 3

6

3

96

9

x

y

7 Six numbers are written in ascending order: 1.4, 3, 4.7, 5.8, a, 11.

Find all possible values of a if the number 11 is considered to be an outlier.

8 The class mean, x̄, and standard deviation, s, for some Year 10 term tests are:

Maths (x̄ = 70%,s = 9%); Physics (x̄ = 70%,s = 6%); Biology (x̄ = 80%,s = 6.5%).

If Emily gained 80% in each of these subjects, which was her best and worst result? Give reasons

for your answer.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 65: Chapter Statistics 81&255(&7('6$03/(3$*(6

Chapters

ummary

ii

ii

ii

ii

674 674 Chapter 9 Statistics

Measures of centre

Graphs for single set of categorical or discrete data

Dot plot Column graphs Stem-and-leaf plot

01

1LeafStem

22 3

678

3 4

9

Measures of spread

• Range = Max − Min• Interquartile range (IQR) = Q3 − Q1• Sample standard deviation (σ ) for n data values (10A)

– If σ is relatively small the data is concentrated about the mean.– If σ is relatively large the data is spread out from the mean.

Data

Grouped data

Histogram

Categorical• Nominal (red, blue, ...)• Ordinal (low, medium, ...)

Class interval

0–10–20–30–40

2431

20403010

Total 10 100

FrequencyPercentagefrequency

Numerical• Discrete (1, 2, 3, ...)• Continuous (0.31, 0.481, ...)

Outliers

• Single data set – less than Q1 − 1.5 × IQR or more than Q3 + 1.5 × IQR• Bivariate – not in the vicinity of the rest of the data

Quartiles

sum of all values

2 3 means 23

number of scores

n − 1

Statistics

10O

20 30 40

1

3

Freq

uenc

y

Perc

enta

gefre

quen

cy4

210

3040

20

Bivariate data

– Two related variables– Scatter plot

A strong, positive

correlation

line ofbest fit(y = mx + c )

Linear regression with

technology (10A)

– Least squares line– Median–median line

Time-series data

Weak, negative

correlation No correlation

Downwards linear trend

Time

Box plots

Min Q1 Q2 Q3Max

Outlier

(x1 − x_

)2 + (x2 − x_

)2 + ... + (xn − x_

)2=Q1: above 25% of the dataQ3: above 75% of the data

Collecting data

• Mean (x_

) =

• Median (Q2) = middle value(Mode = most common value)

2 3 5 7 8 9 11 12 14 15

((Q2 = 8.5Q1 Q3

Q1 Q2 Q3

1 2 3 4

Freq

uenc

y

Size

Very

high

Low

Med

ium

High

y

x

1 4 7 8 12 16 21

A survey can be used to collect data from a population. The sample of the population chosento be surveyed should be selected without bias and should be representative of the population.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 66: Chapter Statistics 81&255(&7('6$03/(3$*(6

Chapterreview

ii

ii

ii

ii

Statistics and Probability 675 675

Multiple-choice questions

138pt9A The type of data generated by the survey question What is your favourite food? is:

numerical and discreteA numerical and continuousBa sampleC categorical and nominalDcategorical and ordinalE

Questions 2–4 refer to the dot plot shown at right.

238pt9B The mean of the scores in the data is:

3.5A 3.9B 3C4D 5E 4

Score532

338pt9B The mode for the data is:

3.5A 2B 3C 4D 5E

438pt9B The dot plot is:

symmetricalA positively skewedB negatively skewedCbimodalD correlatedE

Questions 5 and 6 refer to this box plot.

6 842 14 16 1810 12 2220538pt9D The interquartile range is:

8A 5B 3C 20D 14E

638pt9D The range is:

5A 3B 20C 14D 8E

738pt9G The variables x and y in this scatter plot could be described as having:

no correlationAa strong, positive correlationBa strong, negative correlationCa weak, negative correlationDa weak, positive correlationE

y

xUNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 67: Chapter Statistics 81&255(&7('6$03/(3$*(6

Chapterreview

ii

ii

ii

ii

676 676 Chapter 9 Statistics

838pt9H The equation of the line of best fit for a set of bivariate data is given by y = 2.5x – 3. An

estimate for the value of x when y = 7 is:

–1.4A 1.2B 1.6C 7D 4E

938pt9E

10A

The sample standard deviation for the small data set 1, 1, 2, 3, 3 is:

0.8A 2B 1C 0.9D 2.5E

1038pt9H The equation of the line of best fit connecting the points (1, 1) and (4, 6) is:

y = 5x + 3A y =53x –

23

B y = –53x +

83

C

y =53x –

83

D y =35x –

23

E

Short-answer questions

138pt9B A group of 16 people was surveyed

to find the number of hours of

television they watch in a week. The

raw data are listed:

6, 5, 11, 13, 24, 8, 1, 12

7, 6, 14, 10, 9, 16, 8, 3

Organise the data into a table

with class intervals of 5 and

include a percentage frequency

column.

a

Construct a histogram for the

data, showing both the frequency

and percentage frequency on the graph.

b

Would you describe the data as symmetrical, positively skewed or negatively skewed?cConstruct a stem-and-leaf plot for the data, using 10s as the stem.dUse your stem-and-leaf plot to find the median.e

238pt9D For each set of data below, complete the following tasks.

Find the range.iFind the lower quartile (Q1) and the upper quartile (Q3).iiFind the interquartile range.iiiLocate any outliers.ivDraw a box plot.v

a 2, 2, 3, 3, 3, 4, 5, 6, 12

b 11, 12, 15, 15, 17, 18, 20, 21, 24, 27, 28

c 2.4, 0.7, 2.1, 2.8, 2.3, 2.6, 2.6, 1.9, 3.1, 2.2

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 68: Chapter Statistics 81&255(&7('6$03/(3$*(6

Chapterreview

ii

ii

ii

ii

Statistics and Probability 677 677

338pt9D Compare these parallel box plots, A and B, and answer the following as true or false.

a The range for A is greater than the range for B.

b The median for A is equal to the median for B.

c The interquartile range is smaller for B.

d 75% of the data for A sit below 80. 20

A

0 60 80 10040

B

438pt9G Consider the simple bivariate data set.

x 1 4 3 2 1 4 3 2 5 5

y 24 15 16 20 22 11 5 17 6 8

a Draw a scatter plot for the data.

b Describe the correlation between x and y as positive or negative.

c Describe the correlation between x and y as strong or weak.

d Identify any outliers.

538pt9H The line of best fit passes through the two points labelled

on this graph.

a Find the equation of the line of best fit.

b Use your equation to estimate the value of y when:

x = 4i x = 10iic Use your equation to estimate the value of x when:

y = 3i y = 12ii

(6, 5)

(1, 2)

y

x

638pt9F Describe the trend in these time-series plots as linear, non-linear or no trend.

Time

a

Time

b

738pt9E

10A

Calculate the mean and sample standard deviation for these small data sets. Round the standard

deviation to one decimal place.

4, 5, 7, 9, 10a 1, 1, 3, 5, 5, 9b

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 69: Chapter Statistics 81&255(&7('6$03/(3$*(6

Chapterreview

ii

ii

ii

ii

678 678 Chapter 9 Statistics

838pt9E The Cats and The Vipers basketball teams compare their number of points per match for a

season. The data are presented in this back-to-back stem-and-leaf plot.

The Cats The VipersLeaf Stem Leaf

0 92 1 9

8 3 2 0 4 8 97 4 3 2 4 7 8 9

9 7 4 1 0 4 2 87 6 2 5 0

2 | 4 means 24

State which team has:

the higher rangeathe higher meanbthe higher medianc

10A the higher standard deviation.d

938pt9I

10A

For the simple bivariate data set in Question 4, use technology to find the equation of the:

a least squares regression line

b median–median regression line.

Extended-response questions

1 The number of flying foxes taking refuge in two different fig trees was recorded over a

period of 14 days. The data collected are given here.

Tree 1 56 38 47 59 63 43 49 51 60 77 71 48 50 62

Tree 2 73 50 36 82 15 24 73 57 65 86 51 32 21 39

a Find the IQR for:

tree 1itree 2ii

b Identify any outliers for:

tree 1itree 2ii

c Draw parallel box plots for the data.

d By comparing your box plots, describe

the difference in the ways the flying

foxes use the two fig trees for taking

refuge.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400

Page 70: Chapter Statistics 81&255(&7('6$03/(3$*(6

Chapterreview

ii

ii

ii

ii

Statistics and Probability 679 679

2 The approximate number of shoppers in an air-conditioned shopping plaza was recorded for 14 days,

along with the corresponding maximum daily outside temperatures for those days.

Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Max. daily temp. (T ) (°C) 27 26 28 33 38 36 28 30 32 25 25 27 29 33

No. of shoppers (N ) 1050 950 1200 1550 1750 1800 1200 1450 1350 900 850 700 950 1250

a Draw a scatter plot for the number of shoppers versus the maximum daily temperatures, with the

number of shoppers on the vertical axis, and describe the correlation between the variables as

either positive, negative or none.

b10A Find the equation of the median–median regression line for the data, using technology.

c Use your regression equation to estimate:

i the number of shoppers on a day with a maximum daily temperature of 24◦C

ii the maximum daily temperature if the number of shoppers is 1500.

d Use technology to determine the least squares regression line for the data.

e Use your least squares regression equation to estimate:

i the number of shoppers on a day with a maximum daily temperature of 24◦C

ii the maximum daily temperature if the number of shoppers at the plaza is 1500.

UNCORRECTED

SAMPLE P

AGES

Uncorrected 3rd sample pages • Cambridge University Press © Greenwood et al., 2014 • ISBN 978-1-107-56890-7 • Ph 03 8671 1400