MBA 1 Quantitative Methods January 2013

QUANTITATIVE METHODS

STUDY GUIDE

PROGRAMME : MBA Year 1

CREDIT POINTS : 20 points

NOTIONAL LEARNING : 200 hours over 1 semester

TUTOR SUPPORT : [email protected]

Copyright 2013

MANAGEMENT COLLEGE OF SOUTHERN AFRICA

All rights reserved; no part of this book may be reproduced in any form or by any means, including

photocopying machines, without the written permission of the publisher

REF: EQM 2013

Quantitative Methods

MANCOSA - MBA 1

TABLE OF CONTENTS

UNIT    TITLE OF SECTION

        General Outcomes
        Prescribed Reading
1       Graphical Representation
2       Measure of Central Tendency
3       Measure of Dispersion (Variability)
4       Probability
5       Probability Distribution
6       Hypothesis Testing
7       Simple Linear Regression and Correlation Analysis
8       Forecasting Time Series Analysis
9       Decision Analysis Decision Trees and Payoff Tables
        Solutions to Units Exercises
        References
        Tables



General Outcomes

Studying this module will enable the student to:

Apply simple statistical tools and analyses to solve business-related problems.

Interpret and analyse business data for production, planning, forecasting and other decision-making

functions.

Communicate effectively with statistical analysts.

Apply quantitative methods and techniques to other management disciplines, such as Economics, Accounting,

Financial Management, Marketing and Research.

Syllabus: The syllabus for the module is as follows:

Topic 1: Descriptive Statistics:

a. Graphical Representation

b. Measures of central Tendency

c. Measures of spread

d. Probability and Probability distributions

Topic 2: Inferential Statistics:

a. Hypothesis testing

b. Simple linear regression and correlation analysis

Topic 3: Forecasting Time series analysis

Topic 4: Decision Analysis Decision Trees and payoff tables

Topic 5: Time Value of Money

a. Simple and Compound Interest

b. Depreciation

c. Present Value

d. NPV

e. IRR



READING

Prescribed Textbook:

Trevor Wegner (2006). Applied Business Statistics: Methods and Applications, Juta & Co, Ltd: Cape Town

Recommended Textbook:

Lind, Marchal and Wathen (2005). Statistical Techniques in Business and Economics (12th Edition), New York:

McGraw-Hill. Chapter 1

The purpose of this course

Statistics as a subject has been included in the MBA curriculum because it is needed in two main areas:

1. Descriptive statistics are used in subjects like Finance, Operations etc. to describe business phenomena.

When you get to these study areas it will be explained where they are used, and

2. It is a requirement for an MBA degree that you must complete a research project. In this research project

you will have to collect data. In processing the data to make decisions you will need inference. Inference

(hypothesis testing) is covered in the latter part of this course.



UNIT 1

GRAPHICAL REPRESENTATION



UNIT 1: GRAPHICAL REPRESENTATION

OBJECTIVES

By the end of this study unit, you should be able to:

1. Recognise whether the type of data under consideration is quantitative, qualitative, or ranked.

2. Summarise a set of quantitative data by means of a frequency distribution, a histogram, or a relative

frequency polygon.

3. Summarise a set of qualitative data by means of a pie chart and bar chart.

CONTENTS

1.1 Introduction

1.2 Types of data

1.3 Graphical Techniques for Quantitative Data

1.4 Pie Charts, Bar Charts, and Line Charts

1.5 Scatter Diagrams

1.1 Introduction

The basic types of data are described in this unit. Section 1.3 presents some graphical methods for

displaying the data.

1.2 Types of data

Statistics is the science of collecting and analyzing data. Data are obtained by measuring the values of one or

more variables. Data can be classified as either quantitative data or qualitative data.

Quantitative data are measurements that are recorded on a naturally occurring numerical scale.

Some examples of quantitative data are:

The time that you have to wait for the next bus.

Your height or weight

Qualitative data can only be classified into categories like:

The political party that you support

Your gender

Sometimes arbitrary numerical values are assigned to qualitative data, for example coding males as 1. Such codes are labels, not measurements.



The appropriate graphical method to be used in presenting data depends, in part, on the type of data

under consideration. Later in the guide, when statistical inference is covered, the data type will help to

identify the appropriate statistical technique to be used in solving a problem. In a few situations, it will be

necessary to recognise whether or not a set of non-quantitative data can be ordered. If the categories for

a set of non-quantitative data can be ordered or ranked, we have a third type of data, called ranked data.

SELF-ASSESSMENT ACTIVITY 1.1

How do I identify quantitative data?

SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.1

Quantitative data are real numbers. They are not numbers arbitrarily assigned to represent qualitative data. An

experiment that produces qualitative data always asks for verbal, non-numerical responses (e.g., yes and no;

defective and non-defective; Catholic, Protestant, and other).

Numerical data can also be classified as discrete (when there are only specific values that appear like the

number of students in a class) or continuous (when you can have intermediate values like your height that can

be measured more accurately).

Continuous data are sometimes summarized in tables where the number of data items in each interval is given.

See the example of interval data in the next table:

Mass (kg) Frequency

45-49 6

50-54 14

55-59 25

60-64 11




How do I identify quantitative data?

For each of the following examples of data, determine whether the data type is quantitative, qualitative,

or ranked.

a) the weekly level of the prime interest rate during the past year.

b) the make of car driven by each of a sample of executives.

c) the number of contacts made by each of a company's salespersons during a week.

d) the rating (excellent, good, fair, or poor) given to a particular television program by each of a sample

of viewers.

e) the number of shares traded on the New York Stock Exchange each week throughout 2005.


a) Quantitative, if the interest rate level is expressed as a percentage. If the level is simply observed as

being high, moderate, or low, then the data type is qualitative.

b) Qualitative.

c) Quantitative.

d) Ranked, because the categories can be ordered.

e) Quantitative.

1.3. Graphical Techniques for Quantitative Data

This section introduces the basic methods of descriptive statistics used for organising a set of numerical

data in tabular form and presenting it graphically. Summarising data in this way requires that you first

group the data into classes. Judgment is required concerning the number and the size of the classes to

be used. The presentation of the grouped data should enable the user to quickly grasp the general shape

of the distribution of the data.


How do I choose the number of classes and the width of the classes to be used in constructing a

frequency distribution?




Although this choice is arbitrary and no hard-and-fast rules can be given, here are a few useful

guidelines:

1. The classes must be non-overlapping, so that each measurement falls into exactly one class.

Therefore, choose the classes so that no measurement falls on a class boundary.

2. Choose the number of classes to be used as a number between 5 and 20, with smaller numbers of

classes being chosen for smaller data sets.

3. The approximate width of each class is given by the following:

$$\text{Approximate class width} = \frac{\text{Maximum value} - \text{Minimum value}}{\text{Number of classes}}$$

Choose the actual class width to be a value close to the approximate width that is convenient to work with.

Avoid awkward fractional values.


The weights in kilograms of a group of workers are as follows:

173 165 171 175 188

183 177 160 151 169

162 179 145 171 175

168 158 186 182 162

154 180 164 166 157

1.4.1 Construct a stem and leaf display for these data.

1.4.2 Construct a frequency distribution for these data.


1.4.1 The first step in constructing a stem and leaf display is to decide how to split each observation

(weight) into two parts: a stem and a leaf. For this example, we will define the first two digits of an

observation to be its stem and the third digit to be its leaf. Thus, the first two weights are split into

a stem and a leaf as follows:

Weight Stem Leaf

173 17 3

183 18 3



Scanning the remaining weights, we find that there are five possible stems (14, 15, 16, 17 and 18), which

we list in a column from smallest to largest, as shown below. Next, we consider each observation in turn

and place its leaf in the same row as its stem, to the right of the vertical line. The resulting stem and leaf

display shown below has grouped the 25 weights into five categories. The second row of the display,

corresponding to the stem 15, has four leaves: 4, 8, 1 and 7. The four weights represented in the second

row are therefore 151, 154, 157 and 158.

Stem | Leaf
14   | 5
15   | 4 8 1 7
16   | 2 8 5 0 4 6 9 2
17   | 3 7 9 1 5 1 5
18   | 3 0 6 2 8
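The splitting procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `stem_and_leaf` is a hypothetical helper, and the weights are assumed to be read down the columns of the data table, which is the order in which the leaves appear in the display.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # 173 -> stem 17, leaf 3
        rows[stem].append(leaf)
    # Stems are listed from smallest to largest; leaves keep their reading order.
    return dict(sorted(rows.items()))

# Worker weights, read down the columns of the data table (an assumption).
weights = [173, 183, 162, 168, 154,
           165, 177, 179, 158, 180,
           171, 160, 145, 186, 164,
           175, 151, 171, 182, 166,
           188, 169, 175, 162, 157]

display = stem_and_leaf(weights)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Sorting the leaves within each row is a common refinement, but it is not required by the procedure in the text.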

1.4.2 The hardest, and most important, step in constructing a frequency distribution is choosing the

number and width of the classes. Constructing a stem and leaf display first is often helpful. For this

example, the display in part 1.4.1 suggests using five classes, each with a width of 10 kg. The number

(or frequency) of weights falling into each class is then recorded as shown in the table that follows. Care

must be taken to define the classes in such a way that each measurement belongs to exactly one class.

We will follow the convention that a class (such as 140 up to 150) contains all measurements from the

lower limit (140) up to, but not including, the upper limit (150).

Class limits     Frequency
140 up to 150    1
150 up to 160    4
160 up to 170    8
170 up to 180    7
180 up to 190    5
Total            25

Suppose that we hadn't first constructed a stem and leaf display, or that the stem and leaf display

contained only a few, or too many, categories. (If the number of measurements is less than 50, the

frequency distribution should contain between 5 and 7 classes.) We might then begin by noting that the

smallest and largest measurements are 145 and 188, respectively, so that the range of the

measurements is 188 - 145 = 43. If we decide to use five classes, the approximate width of each class is

43/5 = 8.6. In order to work with "round" numbers, we have chosen to use a class width of 10 and to set

the lower limit of the first class at 140.
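As a rough check of the arithmetic above, the following Python sketch computes the approximate class width and then counts the weights in each class. The variable names are illustrative only, and the class width of 10 and the starting limit of 140 are the choices made in the text, not values derived by the code.

```python
weights = [173, 165, 171, 175, 188,
           183, 177, 160, 151, 169,
           162, 179, 145, 171, 175,
           168, 158, 186, 182, 162,
           154, 180, 164, 166, 157]

num_classes = 5
# Approximate class width = (maximum value - minimum value) / number of classes
approx_width = (max(weights) - min(weights)) / num_classes

width = 10    # convenient "round" width chosen close to the approximate width
lower = 140   # lower limit of the first class

frequencies = {}
for k in range(num_classes):
    lo = lower + k * width
    hi = lo + width
    # Convention from the text: a class contains values from lo up to, but not including, hi.
    frequencies[(lo, hi)] = sum(1 for w in weights if lo <= w < hi)

for (lo, hi), f in frequencies.items():
    print(f"{lo} up to {hi}: {f}")
```

Using the half-open test `lo <= w < hi` guarantees that each measurement falls into exactly one class, even when a value sits on a class boundary.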




Refer to the data in Example 1.4 above

1.5.1 Construct a relative frequency histogram for the data.

1.5.2 Construct a relative frequency polygon for the data.

1.5.3 Construct an ogive for the data.


1.5.1 The relative frequencies, obtained by dividing each frequency by 25, are shown below:

Class Limits     Frequency    Relative Frequency    Cumulative Relative Frequency
140 up to 150    1            0.04                  0.04
150 up to 160    4            0.16                  0.20
160 up to 170    8            0.32                  0.52
170 up to 180    7            0.28                  0.80
180 up to 190    5            0.20                  1.00

[Figure: Relative frequency histogram for weight of workers. Horizontal axis: Weight (kg), 140 to 190. Vertical axis: Relative frequency, 0 to 0.35.]

The relative frequency histogram is constructed by erecting over each class interval a rectangle, the height

of which equals the relative frequency of that class.



1.5.2 The relative frequency polygon is constructed by plotting the relative frequency of each class above

the midpoint of that class and then joining the points with straight lines. The polygon is closed by

considering one additional class (with zero frequency) at each end of the distribution and extending a

straight line to the midpoint of each of these classes.

1.5.3 The cumulative relative frequencies are shown in the table in part 1.5.1. The cumulative relative

frequency of a particular class is the proportion of measurements that fall below the upper limit

of that class. To construct the ogive, the cumulative relative frequency of each class is plotted

above the upper limit of that class, and the points representing the cumulative frequencies are

then joined by straight lines. The ogive is closed at the lower end by extending a straight line to

the lower limit of the first class.
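The quantities needed for the histogram, the polygon and the ogive can all be derived from the class frequencies. The Python sketch below is one possible way to do this; the names `polygon_points` and `ogive_points` are illustrative, and the code only produces the coordinates to plot, not the charts themselves.

```python
from itertools import accumulate

class_limits = [(140, 150), (150, 160), (160, 170), (170, 180), (180, 190)]
frequencies = [1, 4, 8, 7, 5]
total = sum(frequencies)  # 25 workers

# Relative frequency: each class frequency divided by the total.
relative = [f / total for f in frequencies]

# Cumulative relative frequency: proportion of values below each upper limit.
cumulative = [c / total for c in accumulate(frequencies)]

# Polygon: relative frequency plotted above the class midpoint.
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]
polygon_points = list(zip(midpoints, relative))

# Ogive: cumulative relative frequency plotted above the upper class limit,
# closed at the lower end with a point at the lower limit of the first class.
ogive_points = [(class_limits[0][0], 0.0)] + [
    (hi, c) for (_, hi), c in zip(class_limits, cumulative)
]

for (lo, hi), r, c in zip(class_limits, relative, cumulative):
    print(f"{lo} up to {hi}: {r:.2f}  {c:.2f}")
```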

[Figure not reproduced. Horizontal axis: Weights (kg).]



1.4 Pie Charts, Bar Charts, and Line Charts

The methods described in the previous section are appropriate for summarizing data that are quantitative,

or numerical measurements. But we must also be able to describe data that are qualitative, or

categorical data. These data consist of attributes, which are the names of the categories into which the

observations are sorted.

1.4.1 Pie Chart

A pie chart is a useful method for displaying the percentage of observations that fall into each category of

qualitative data, while a bar chart can be used to display the frequency of observations that fall into each

category. If the categories consist of points in time and the objective is to focus on the trend in

frequencies over time, a line chart is useful.



According to the New York Times (27 September 1987), the June levels of unemployment in the United

States for five years were as follows:

Year Unemployed (millions)

1983 10.7

1984 8.5

1985 8.3

1986 8.2

1987 7.3

1.6.1 Use a bar chart to depict these data.

1.6.2 Use a line chart to depict these data.




1.6.1 The five years, or categories, are represented by intervals of equal width on the horizontal axis. The

height of the vertical bar erected above any year is proportional to the frequency (number of

unemployed) corresponding to that year.

[Figure: Bar Chart for Unemployment. Horizontal axis: Year, 1983 to 1987. Vertical axis: Frequency (millions), 0 to 12.]

1.6.2 A line chart is obtained by plotting the frequency of a category above the point on the horizontal axis

representing that category and then joining the points with straight lines.

[Figure: line chart of the same data. Horizontal axis: 1983 to 1987. Vertical axis: 0 to 12.]




The New York Times article alluded to in self-assessment 1.6 reported that 6 million Americans who say

they want work are not even seeking jobs.

A breakdown of these 6 million Americans by race follows:

Race Frequency

White 4320000

Black 1500000

Other 180000

Required: Use a pie chart to depict these data.


A pie chart is an effective method of showing the percentage breakdown of a whole entity into its

component parts. We must first determine the percentage of the 6 million Americans belonging to each of

the three racial categories: 72% white, 25% black, and 3% other. Each category is represented by a slice of

the pie (a circle) that is proportional in size to the percentage (or relative frequency) corresponding to that

category. Since the entire circle corresponds to 360°, the angle between the lines demarcating the White

sector is therefore (0.72)(360°) = 259.2°. In a similar manner, we can determine that the angles for the

Black and Other sectors are 90° and 10.8°, respectively.

[Figure: pie chart with sectors of 259.2° (White), 90° (Black) and 10.8° (Other).]
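The percentage and angle calculations for a pie chart follow the same pattern for every category, so they are easy to sketch in Python. The dictionary names below are illustrative; the counts are those given in the table above.

```python
counts = {"White": 4_320_000, "Black": 1_500_000, "Other": 180_000}
total = sum(counts.values())  # 6 million

percentages = {}
angles = {}
for category, count in counts.items():
    percentages[category] = 100 * count / total  # share of the whole, in percent
    angles[category] = 360 * count / total       # sector angle, in degrees

for category in counts:
    print(f"{category}: {percentages[category]:.0f}%  {angles[category]:.1f} degrees")
```

The sector angles always add up to 360 degrees, which is a useful check before drawing the chart.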



1.4.2 Bar charts

Bar charts are a quick and easy way of showing variation in or between variables.

Rectangles of equal width are drawn so that the area enclosed by each rectangle is proportional to the

size of the variable it represents. This type of graph not only illustrates a general trend, but also allows a

quick and accurate comparison of one period with another or the illustration of a situation at a particular

time. When drawing up bar charts take care to:

make the bars reasonably wide so that they can be clearly seen;

draw them neatly and professionally;

ensure that the bars all have the same width;

ensure that the gaps between the bars have the same width.

We can produce a variety of bar charts to provide an overview of the data.

Simple bars representing each variable are drawn either vertically or horizontally.

1.4.3 Component or stacked bar chart

A single bar is drawn for each variable, with the heights of the bars representing the totals of the

categories. Each bar is then subdivided to show the components that make up the total bar. These

components may be identified by colouring or shading, accompanied by an explanatory key to show what

each component represents.



Percentage component bar chart

The components are converted to percentages of the total, and the bars are divided in proportion to

these percentages. The scale is a percentage scale and the height of each bar is therefore 100%.

1.4.4 Multiple bar charts

Two or more bars are grouped together in each category. The use of a key helps to distinguish between

the categories.

1.5 Scatter Diagrams

This section introduces the notion of the relationship between two quantitative variables. Economists, for

example, are interested in the relationship between inflation rates and unemployment rates. Business

owners are interested in many variables, including the relationship between their advertising

expenditures and sales levels. The graphical technique used to depict the relationship between the

variables X and Y is the scatter diagram, which is a plot of all pairs of values (x, y) for the variables X

and Y.




An educational economist wants to establish the relationship between an individual's income and

education. She takes a random sample of 10 individuals and asks for their income (in $1,000s) and

education (in years). The results are shown below. Construct a scatter diagram for these data, and

describe the relationship between the number of years of education and income level.

x (education) y (income)

11 25

12 33

11 22

15 41

8 18

10 28

11 32

11 24

17 53

11 26


If we feel that the value of one variable (such as income) depends to some degree on the value of the

other variable (such as years of education), the first variable (income) is called the dependent variable

and is plotted on the vertical axis. The ten pairs of values for education (x) and income (y) are plotted in

Figure 1.1, forming a scatter diagram.

The scatter diagram allows us to observe two characteristics about the relationship between education

(x) and income (y):

1. Because these two variables move together (that is, their values tend to increase together and

decrease together), there is a positive relationship between the two variables.

2. The relationship between income and years of education appears to be linear, since we can

imagine drawing a straight line (as opposed to a curved line) through the scatter diagram that

approximates the positive relationship between the two variables.
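The direction of the relationship can also be checked numerically without drawing the diagram. The Python sketch below is an illustration rather than part of the self-assessment solution: it sums the products of each point's deviations from the two means, a quantity named `co_movement` here purely for illustration. A positive sum indicates that x and y tend to lie on the same side of their means together. Formal measures of the strength of a linear relationship are left to Unit 7.

```python
education = [11, 12, 11, 15, 8, 10, 11, 11, 17, 11]   # x, in years
income = [25, 33, 22, 41, 18, 28, 32, 24, 53, 26]     # y, in $1,000s

n = len(education)
mean_x = sum(education) / n
mean_y = sum(income) / n

# Sum of the products of deviations from the means (illustrative quantity).
co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(education, income))

if co_movement > 0:
    direction = "positive"
elif co_movement < 0:
    direction = "negative"
else:
    direction = "none"

print(direction)
```

This check says nothing about whether the relationship is linear; that still has to be judged from the scatter diagram.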



The pattern of a scatter diagram provides us with information about the relationship between two

variables. Figure 1.1 depicts a positive linear relationship. If two variables move in opposite directions,

and the scatter diagram consists of points that appear to cluster around a straight line, then the variables

have a negative linear relationship (see Figure 1.2). It is possible to have nonlinear relationships (see

Figures 1.3 and 1.4), as well as situations in which the two variables are unrelated (see Figure 1.5). In

Unit 7, we will compute numerical measures of the strength of the linear relationship between two

variables.

[Figure 1.1: Scatter Diagram for Self Assessment. Horizontal axis: Years of Education, 0 to 20. Vertical axis: Income ($'000), 0 to 60.]

[Figure 1.2: Negative Linear Relationship]

[Figure 1.3: Nonlinear Relationship]

[Figure 1.4: Nonlinear Relationship]

[Figure 1.5: No Relationship]



Unit 1 Exercises: (Solutions are found at the end of the module guide)

Exercise 1.1

Describe three ways of (graphically) representing data which you can consider to be appropriate for inclusion in a

company's annual report and accounts. Name the advantages of these forms of data representation.

Exercise 1.2

Produce a pie chart showing the percentage market share of the passenger car market held by each of South

Africa's car manufacturers.

Manufacturer        1991 Sales (Units)
Toyota              51 653
Nissan              20 793
Volkswagen          39 757
Delta               20 949
Ford                18 631
MBSA                15 756
BMW                 15 431
MMI                 14 731
Total 1991 Sales    197 701

Exercise 1.3

Produce a component bar chart showing the breakdown of car sales for Toyota, Nissan and Ford only between

the first and second half of 1991.

1991 Sales (units)

Manufacturer    Total units    First Half (Jan - June)    Second Half
Toyota          51 653         19 629                     32 024
Nissan          20 793         9 565                      11 228
Ford            18 631         9 875                      8 756
Totals          91 077         39 069                     52 008



Exercise 1.4

Produce a line graph showing the trend in market share for Volkswagen and Nissan over the period 1982 to

1991.

Year    Volkswagen    Nissan
1982    13.4          9.9
1983    11.6          9.6
1984    9.8           8.2
1985    14.4          6.8
1986    17.4          7.8
1987    19.9          9.7
1988    21.3          11.7
1989    22.2          10.2
1990    19.6          10.6
1991    20.1          10.5

Comment on the findings.

Exercise 1.5

Areas of Continents of the World.

Continent        Area (millions of square kilometres)
Africa           30.3
Asia             26.3
Europe           4.9
North America    24.3
Oceania          8.5
South America    17.9
Russia           20.5

(i) Draw a bar chart of the above information

(ii) Construct a pie chart to represent the total area.



Exercise 1.6

The distances travelled (in kilometres) by a courier service motorcycle on 30 trips were recorded by the driver.

24 19 21 27 20 17 17 32 22 26

18 13 23 30 10 13 18 22 34 16

18 23 15 19 28 25 25 20 17 15

a) Define the random variable, the data type, and the measurement scale.

b) From the data set, prepare:

i. an absolute frequency distribution,

ii. a relative frequency distribution, and

iii. the (relative) less than ogive.

c) Construct the following graphs:

i. a histogram of the relative frequency distribution, and

ii. the cumulative frequency polygon.

d) From the graphs, read off:

i. what percentage of trips were between 25 and 30 km long?

ii. what percentage of trips were under 25 km long?

iii. what percentage of trips were 22 km or more?

iv. below which distance were 55% of the trips made?

v. above which distance were 20% of the trips made?

Exercise 1.7

Vorovka Director Marketing has offices in Windhoek, Johannesburg, Durban and Botswana. The number of

employees in each location and their genders are tabulated below.

Office Females Males Total

Windhoek 12 8 20

Johannesburg 9 15 24

Durban 23 6 29

a) Plot a cluster bar chart to show the total number of employees in each office.

b) Plot a component bar chart to show the number of employees in each office by gender.

c) Plot a cluster bar chart to show the number of employees at each office by gender.



Exercise 1.8

Tourists seeking holiday accommodation in a self-catering complex in the resort ABC of Namibia can make either

a one- or two-week booking. The manager of the complex has produced the following table to show the bookings

she received last season:

                          Type of booking
Tourist's home country    One-week    Two-week
France                    13          44
Germany                   29          36
Holland                   17          21
Ireland                   8           5

a) Produce a simple bar chart to show the total number of bookings by home country.

b) Produce a component bar chart to show the number of bookings by home country and type of booking.

c) Produce a cluster bar chart to show the number of bookings by home country and type of booking.

Exercise 1.9

A roadside breakdown assistance service answered 37 calls in Cape Town on one day. The response times taken

to deal with these calls were noted and have been arranged in the grouped frequency distribution below.

Response time (minutes) Number of calls

20 to under 30 4

30 to under 40 8

40 to under 50 17

50 to under 60 6

60 to under 70 2

a) Produce a histogram to portray this distribution and describe the shape of the distribution.

b) Find the cumulative frequency for each class.

c) Produce a cumulative frequency graph of the distribution.



Exercise 1.10

Rents per person (to the nearest $) for 83 flats and houses advertised on the notice boards at a university were

collected and the following grouped frequency distribution compiled:

Rent per person ($) Frequency

35 - 39 13

40 - 44 29

45 - 49 22

50 - 54 10

55 - 59 7

60 - 64 2

a) Plot a histogram to portray this distribution and comment on the shape of the distribution.

b) Find the cumulative frequency for each class.

c) Plot a cumulative frequency graph of the distribution.

Exercise 1.11

Monthly membership fees in $ for 22 health clubs are:

34 43 44 22 73 69 48 67 33 56 67

27 78 60 63 32 67 41 65 48 48 77

Compile a stem and leaf display of these data.

The clubs whose fees appear in bold do not have a swimming pool. Highlight them in your display.

Exercise 1.12

Select which of the statements listed below on the right-hand side describes the words listed on the left-hand

side.

(i) Histogram                a) can only take a limited number of values
(ii) Time series             b) segments or slices represent categories
(iii) Pictogram              c) each plotted point represents a pair of values
(iv) Discrete data           d) separates parts of each observation
(v) Stem and leaf display    e) each block represents a class
(vi) Scatter diagram         f) data collected at regular intervals over time
(vii) Pie chart              g) comprises a set of small pictures



Student review questions

1. Describe the difference between quantitative data and qualitative data.

2. For each of the following examples of data, determine whether the data are quantitative, qualitative,

or ranked.

a) the month of the highest sales for each firm in a sample.

b) the department in which each of a sample of university professors teaches.

c) the weekly closing price of gold throughout a year.

d) the size of soft drink (large, medium, or small) ordered by a sample of customers in a restaurant.

e) the number of barrels of crude oil imported monthly by the United States.

3. Identify the type of data observed for each of the following variables.

a) the number of students in a statistics class.

b) the student evaluations of the professor (1 = poor, 5 = excellent).

c) the political preferences of voters.

d) the states in the United States of America.

e) the size of a condominium (in square feet).



UNIT 2

MEASURES OF CENTRAL TENDENCY



UNIT 2: MEASURES OF CENTRAL TENDENCY

OBJECTIVES


Determine the mean, median and mode for grouped and ungrouped data.

Describe the symmetry/skewness of a set of data in terms of the mean, median and mode.

Calculate the range, standard deviation, variance, quartiles and inter-quartile range for grouped as well as

ungrouped data.

CONTENTS

2.1 Introduction

2.2 Ungrouped data

2.2.1 Mean

2.2.2 Median

2.2.3 Mode

2.3 Grouped data

2.3.1 Mean for grouped data

2.3.2 Median for grouped data

2.3.3 Mode for grouped data

2.4 The best average

2.5 Box plots

2.6 Self-evaluation



2.1 Introduction

This unit discusses numerical descriptive measures used to summarise and describe sets of data. There are

three commonly used numerical measures of central tendency of a data set: the mean, the median, and the

mode. You are expected to know how to compute each of these measures for a given data set. Moreover, you

are expected to know the advantages and disadvantages of each of these measures, as well as the type of data

for which each is an appropriate measure.

An average is a single value that is central to, or representative of, the entire data set, which makes it

information of great importance. The most commonly used averages are the mean, the median and the mode.

2.2 Ungrouped data

2.2.1 The arithmetic mean

The first and most important one is the arithmetic mean (at school you just called this the average). Sometimes

we merely call the arithmetic mean the mean.

To calculate the mean of some numbers we merely add the numbers together and divide the total by the number

of values.

The mean of 4, 5, 6, 7, 8, 10 is 40 / 6 = 6.67 (the total of the values is 40 and there are 6 values).

In Excel the mean can be found by placing =AVERAGE(4,5,6,7,8,10) in a cell.

The mean can be written as a formula:

$$\bar{x} = \frac{\sum x_i}{N}$$

We say X-bar (or the mean) is the sum of the values (the $x_i$'s) divided by the number of values (N).

The arithmetic mean is the most important of all numerical descriptive measurements, and it corresponds to what

most people call an average.

Definition 2.1: The arithmetic mean of a list of scores is obtained by adding the scores and dividing the total by

the number of scores. It will be referred to simply as the mean.



Example 1

Find the mean of the scores 2, 3, 6, 7, 12.

The mean score is

$$\frac{2 + 3 + 6 + 7 + 12}{5} = 6.$$

Formula 1: Mean:

$$\bar{x} = \frac{\sum x}{n}$$

Where,

$\sum$ denotes summation of a set of values.

x is the variable used to represent raw scores.

n represents the number of scores being considered.

The result can be denoted by $\bar{x}$ if the available scores are samples from a larger population. If all scores of the

population are available, then we can denote the computed mean by the Greek letter $\mu$ (pronounced mu).
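Formula 1 translates directly into code. The short Python sketch below defines an illustrative helper, `mean`, and applies it to the two sets of scores used in this section. Python's standard library also provides `statistics.mean`, which gives the same result.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

example_scores = [2, 3, 6, 7, 12]
earlier_scores = [4, 5, 6, 7, 8, 10]

print(mean(example_scores))            # sum is 30, n is 5
print(round(mean(earlier_scores), 2))  # sum is 40, n is 6
```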

2.2.2 The median

The median is the middle value of an ordered set of numbers. In the case 4, 5, 6, 7, 8, 10 the middle value is

between the 6 and the 7. So we say that the median is 6.5.

Note: It is important that the values must be in the correct order before you choose the middle value.

Definition 2: The median of a set of scores is the middle value when the scores are arranged in order of

increasing (or decreasing) magnitude.

After first arranging the original scores in increasing (or decreasing) order, the median will be either of the

following:

1. If the number of scores is odd, the median is the number that is exactly in the middle of the list.

2. If the number of scores is even, the median is found by computing the mean of the two middle numbers.

Steps

Arrange the data in an array.

Determine the position of the median.

$$\text{Median position} = \frac{n + 1}{2}$$

Read the value of the median from the number list.



Example: Find the median of each data set.

1. Over a 7-day period, the number of customers (per day) purchasing at Hides Leather Shop was as follows:

4 80 50 10 60 12 5

Array:

4 5 10 12 50 60 80

Median position = (n + 1)/2 = (7 + 1)/2 = 4.

The median is the fourth item, which is 12.

2. Over an 8-day period, the number of customers observed at the shop per day was as follows:

21 5 11 7 12 15 20 5

Array:

5 5 7 11 12 15 20 21

Position of median: (n + 1)/2 = (8 + 1)/2 = 4.5 (between the 4th and 5th positions)

Median = (11+12)/2 = 11.5 (Average of 4th and 5th values)


The time taken to complete an assembling task has been measured for a group of employees and the results are

shown below:

Find the median in the scores 8, 2, 7, 3, 6, 9.


Begin by arranging the scores in increasing order.

2 3 6 7 8 9

We note that the numbers 6 and 7 share the middle position which is the average of the 3rd and 4th positions, i.e.

the (3+4)/2 = 3.5th position. Thus the median is the average of the 3rd and 4th values.

The mean of these two scores is therefore $\frac{6 + 7}{2} = 6.5$, which is the median.
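The two cases in the definition (an odd or an even number of scores) can be sketched in Python as follows. The function name `median` is an illustrative helper; the standard library's `statistics.median` behaves the same way for these inputs.

```python
def median(scores):
    """Middle value of the ordered scores; mean of the two middle values if n is even."""
    ordered = sorted(scores)          # the scores must be arranged in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

print(median([4, 80, 50, 10, 60, 12, 5]))       # 7-day example
print(median([21, 5, 11, 7, 12, 15, 20, 5]))    # 8-day example
print(median([8, 2, 7, 3, 6, 9]))               # six scores
```

Note that list positions in Python start at 0, so the position (n + 1)/2 from the text corresponds to index `n // 2` when n is odd.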



2.2.3 The mode

The mode is the most common value. If we look at the following set of numbers:

3, 4, 5, 6, 6, 6, 7 the mode is 6 because it is the number that appears most often.

Definition 3: The mode is obtained from a collection of scores by selecting the score that occurs most frequently.

In those cases where no score is repeated there is no mode. Where two scores both occur with the same

greatest frequency, the data set is bimodal. If more than two scores occur with the same greatest frequency,

each is a mode and the data set is multimodal.

For ungrouped data the mode requires no calculation and can easily be obtained from a number list. If there is

no value that occurs more often than the others, then there is no mode, but this is not the same as a mode of

zero. A set of data may also have more than one mode and is then said to be bi-modal or multi-modal.

Example

1. The commission earnings of five salespeople were as follows for the previous month:

R5000 R5200 R5200 R5700 R8600

The modal commission was R5200

2. The lengths of stay (in days) for a sample of 9 patients in a hospital are:

17 19 19 4 19 26 4 21 4

The modal lengths of stay are 19 and 4 days.

Example

There are 40 buck, 25 elephants and 20 smaller animals at a water hole. The modal category is buck since it

has the highest frequency.

The mode is the only central measure that can be used with data at the nominal level of measurement.

Example

The hourly income rates (in $) of 5 students are: 4 9 7 16 10

There is no mode.
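The mode examples above can be checked with a short Python sketch. It follows Definition 3 (no repeated score means no mode; ties for the greatest frequency give several modes). The function name `modes` and the choice to return a list are our own assumptions.

```python
from collections import Counter


def modes(scores):
    """Return a sorted list of modes; an empty list means 'no mode'."""
    if not scores:
        return []
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        # no score is repeated, so there is no mode
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(modes([5000, 5200, 5200, 5700, 8600]))        # commission earnings
print(modes([17, 19, 19, 4, 19, 26, 4, 21, 4]))     # lengths of stay (bimodal)
print(modes([4, 9, 7, 16, 10]))                     # hourly rates (no mode)
```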



2.3 Grouped data

The problem is that we do not always have the actual data.

Sometimes the data is given as a frequency distribution. If we look at Table 1:

Table 1

Mass (in kg) Frequency

45-49 6

50-54 14

55-59 25

60-64 11

We know that there are 6 values in the first interval (first class) 45-49, but we do not have the actual values.

We must still be able to find the mean, the median and the mode.

2.3.1 Mean for grouped data

To get the mean, we take the midpoint of every class to represent the class.

There are 6 values in the first class. The midpoint of the first class is (45+49) / 2 = 47.

The total for the values of the first class is therefore 6 times 47 = 282.

The total for the values in the second class is 14 times 52 = 728.

The total for the values in the class 55-59 is 25 times 57 = 1425.

The total for the values in the interval 60-64 is 11 times 62 = 682.

If we add the class totals together we get 3117 (Check if this is correct)

To get the mean we must now divide by the number of values.

The number of values is 6+14+25+11 = 56.

The mean is 3117 divided by 56 = 55.66 kg.

As a formula we can write this as

$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$

where we say x-bar (the mean) is the sum of the frequency times the class midpoint, divided by the sum of the frequencies.

The value for the arithmetic mean that you get from ungrouped data is a better value to use, if the actual

ungrouped data is available.
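The midpoint calculation for Table 1 can be sketched in Python as follows. The representation of each class as a `(lower, upper, frequency)` tuple and the name `grouped_mean` are assumptions made for this illustration.

```python
def grouped_mean(classes):
    """Estimate the mean from (lower_limit, upper_limit, frequency) tuples."""
    total_f = sum(f for _, _, f in classes)
    # each class is represented by its midpoint (lower + upper) / 2
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


# Table 1: mass (in kg) and frequency
mass = [(45, 49, 6), (50, 54, 14), (55, 59, 25), (60, 64, 11)]
print(round(grouped_mean(mass), 2))   # 3117 / 56
```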



The mean for grouped data or the mean from a frequency distribution

Simple Frequency Distribution

Formula 2.2: mean: $\bar{x} = \dfrac{\sum fx}{\sum f}$

where x = class mark

f = frequency


The number of times per week that a particular photocopy machine breaks down was recorded over a period of

60 weeks. The results are given in the frequency table below.

Number of breakdowns 0 1 2 3 4 5

Number of weeks 15 12 16 10 5 2

Required

1. Find the mean number of breakdowns per week over the 60-week period.

2. A metro council needs information about the times local bicycle commuters spend on the road. A sample of

12 local bicycle commuters yields the following times in minutes:

22 29 27 30 12 22 31 15 26 16 48 23

Determine the mean travelling time.




Table 2.1: Calculations for self assessment activity 2.1

Note that the figures in the third (fx) column have been formed by multiplying the corresponding figures in the

first two columns. From Equation 2.2, the mean number of breakdowns per week is:

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{104}{60} = 1.73$

(Reasonable check: The data are very roughly balanced around 2, which is also the mode. A mean not too far

from 2 is therefore reasonable.)

2. Mean: $\bar{x} = \dfrac{22+29+27+30+12+22+31+15+26+16+48+23}{12} = \dfrac{301}{12} = 25.1$

Grouped Frequency Distribution

When using tabulated or grouped data from a frequency distribution, the individual values are not known. To enable us to calculate the mean, we need to assume that the observations in a particular interval all take the same value, and that this value is the midpoint of the interval.

$\bar{x} = \dfrac{\sum fx}{\sum f}$

x = class midpoint

f = frequency of each class

n = number of observations in the sample = $\sum f$

Steps

Compute the midpoint (x) for each class.

Multiply each midpoint by the frequency of that class (fx) and sum the products ($\sum fx$).

Sum the frequency column: n = $\sum f$.

Divide $\sum fx$ by n.

x        f        fx
0        15       0
1        12       12
2        16       32
3        10       30
4        5        20
5        2        10
Total    $\sum f$ = 60    $\sum fx$ = 104



Example

The times taken to complete a particular assembling task have been measured for 250 employees and the

results are shown below.

Time (min) No. of people (f) x fx

0 - 5 2 2.5 5.0

5 - 10 2 7.5 15.0

10 - 15 3 12.5 37.5

15 - 20 5 17.5 87.5

20 - 25 5 22.5 112.5

25 - 30 18 27.5 495.0

30 - 35 85 32.5 2 762.5

35 - 40 92 37.5 3 450.0

40 - 45 37 42.5 1 572.5

45 - 50 1 47.5 47.5

Total 250 8 585.0

The arithmetic mean time is: $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{8585}{250} = 34.34$ min.

Activity

The times during working hours in a factory when a certain machine is not operating as a result of breakage are

recorded for a sample of 100 breakdowns and summarized in the following distribution. Find the mean of the

distribution

Time (min)   f
0 - 10       3
10 - 20      13
20 - 30      30
30 - 40      25
40 - 50      14
50 - 60      8
60 - 70      4
70 - 80      2
80 - 90      1
Total        100



2.3.2 The median for grouped data

As with the mean, we can get the median from grouped data as well. In this case we look at the cumulative

frequency.

There are 56 values in the table below, so the middle value will be value number 56 divided by 2 = 28. We want

to estimate what value number 28 was.

Mass (in kg) Frequency Cumulative

Frequency

45-49 6 6

50-54 14 20

55-59 25 45

60-64 11 56

At the end of the interval 45-49, we only have 6 values, so this is not at the median yet.

At the end of the 50-54 interval, we have 20 values, this is still short of the value 28 that we are looking for.

At the end of the interval 55-59, we have passed 45 values, this means we passed value number 28 as we

moved through the interval 55-59.

The median can be found from the following interpolation formula:

$\text{Median} = L_{Me} + \dfrac{(n/2 - F_{Me-1})\,c}{f_{Me}}$

where $L_{Me}$ is the lower limit of the median class. We said that the median class is the class 55-59. The lower limit is the smallest value that will be rounded to this class, which is 54.5.

n is 56, the sum of the frequencies, so n/2 is 28.

$F_{Me-1}$ is the cumulative frequency of the class that precedes the median class, which is 20. (Make sure you can see where this value comes from in the table of cumulative frequencies.)

$f_{Me}$ is the frequency of the median class, which is 25 (from the table).

c is the class width, which is 5. You can take 59 - 54 to get it, or you can take the actual class limits, 59.5 minus 54.5.

Put these values into the formula and we get

$\text{Median} = 54.5 + \dfrac{(28 - 20) \times 5}{25} = 54.5 + 1.6 = 56.1$

Check that this value is in fact in the class 55-59.
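The interpolation for the median can be sketched in Python. This is a minimal illustration under stated assumptions: each class is given as a `(lower_boundary, upper_boundary, frequency)` tuple using the true class boundaries (44.5, 49.5, ...), and the name `grouped_median` is our own.

```python
def grouped_median(classes):
    """Interpolated median from (lower_boundary, upper_boundary, frequency) tuples."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0                      # cumulative frequency before the current class
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= half:
            width = upper - lower       # class width c
            return lower + (half - cumulative) * width / f
        cumulative += f
    raise ValueError("no observations")


# mass data with true class boundaries
mass_boundaries = [(44.5, 49.5, 6), (49.5, 54.5, 14), (54.5, 59.5, 25), (59.5, 64.5, 11)]
print(round(grouped_median(mass_boundaries), 1))
```

The loop finds the first class whose cumulative frequency reaches n/2 = 28, which is the class 55-59, and then applies the formula with 20 values below it.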



Note: You have to know the basic structure of the formula. In this guide different letters will be used in the

formula. You must know the formula, not the symbols used to represent the different variables. What would

happen in the exam if the formula is given with different symbols, would you still be able to calculate the median?

As with the mean, the value for the median that you get from ungrouped data is more accurate. If you have the

data available (like when you do your research project) it is better to use the ungrouped data to get the median.

Calculation of Median for Grouped data

The median can be determined either graphically or by calculation. With grouped data we are unable to determine where the true middle value falls, but we can estimate the median by using a formula and assuming that the median value will be the $\frac{n}{2}$th item.

$\text{Median} = L + \dfrac{(n/2 - F)\,c}{f_m}$

L = lower boundary of the median class

F = sum of all the frequencies up to, but not including, the median class (the cumulative frequency of the class preceding the median class)

$f_m$ = frequency of the median class

c = class width




The time taken to complete an assembling task has been measured for 250 employees and the results are

shown below:

Time taken (min) Number of people (f) Cumulative



2.3.3 The mode from grouped data

The mode is the most common value. For grouped data we want to estimate the value at which the histogram reaches its peak.

Mass (in kg) Frequency Cumulative

Frequency

45-49 6 6

50-54 14 20

55-59 25 45

60-64 11 56

The mode can be found by first deciding in what class it is and then using an interpolation formula.

From the table we see that the class (interval) with the highest frequency is the class 55-59 with a frequency of

25. So we say that the class 55-59 is the modal class.

The interpolation formula is

$\text{Mode} = L_{Mo} + \dfrac{(f_{Mo} - f_{Mo-1})\,c}{2f_{Mo} - f_{Mo-1} - f_{Mo+1}}$

$L_{Mo}$, the lower limit of the modal class, is 54.5,

$f_{Mo}$, the frequency of the modal class, is 25,

$f_{Mo-1}$, the frequency of the previous class, is 14,

$f_{Mo+1}$, the frequency of the next class, is 11,

and c, the class width, is 5.

Put these values into the formula to get

$\text{Mode} = 54.5 + \dfrac{(25 - 14) \times 5}{2 \times 25 - 14 - 11} = 54.5 + 2.2 = 56.7$

So the mode is 56.7.
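The same interpolation can be sketched in Python. The tuple layout `(lower_boundary, upper_boundary, frequency)` and the name `grouped_mode` are assumptions for this illustration, and the sketch assumes there is a single modal class.

```python
def grouped_mode(classes):
    """Interpolated mode from (lower_boundary, upper_boundary, frequency) tuples.

    Assumes one class has a strictly highest frequency.
    """
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                          # position of the modal class
    lower, upper, f_mo = classes[i]
    f_prev = freqs[i - 1] if i > 0 else 0                # frequency of the previous class
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0   # frequency of the next class
    width = upper - lower
    return lower + (f_mo - f_prev) * width / (2 * f_mo - f_prev - f_next)


mass_boundaries = [(44.5, 49.5, 6), (49.5, 54.5, 14), (54.5, 59.5, 25), (59.5, 64.5, 11)]
print(round(grouped_mode(mass_boundaries), 1))
```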

Later a different formula will be given where $d_1 = f_{Mo} - f_{Mo-1}$, so again make sure that you are not confused if the formula looks different; it is the same formula. Remember, if a lecturer uses a formula that looks slightly different, it is up to you as a master's-level student to check that it is still the same formula.

Unlike the median and the mean, the value we get for the mode is more accurate from grouped data. So

whenever possible calculate the mode from the grouped data.



Calculation of the mode from a grouped frequency distribution.

It is not possible to calculate the exact value of the mode of the original data in a grouped frequency distribution,

since information is lost when the data are grouped. However, it is possible to make an estimate of the mode.

The class interval with the largest frequency is called the modal class.

(Note: The following formula looks different. Does it give the same answer?)

$\text{Mode} = L + \dfrac{d_1}{d_1 + d_2} \times c$

Where:

L = lower limit of the modal class.

$d_1$ = frequency of the modal class minus the frequency of the immediately preceding class.

$d_2$ = frequency of the modal class minus the frequency of the class that immediately follows the modal class.

c = the length of the class interval of the modal class.

Steps

Select the class containing the highest frequency as the modal class.

Use the formula to estimate the modal value.

Activity

The times during working hours in a factory when a certain machine is not operating as a result of breakage are recorded for a sample of 100 breakdowns and summarized in the following distribution. Find the mode of the distribution.

Time (min)   f
0 - 10       3
10 - 20      13
20 - 30      30
30 - 40      25
40 - 50      14
50 - 60      8
60 - 70      4
70 - 80      2
80 - 90      1
Total        100



Solution

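The interval having the highest frequency, namely 30, is the 3rd interval (20 - 30).

$\text{Mode} = L + \dfrac{d_1}{d_1 + d_2} \times c = 20 + \dfrac{30 - 13}{(30 - 13) + (30 - 25)} \times 10 = 20 + \dfrac{17}{22} \times 10 = 20 + \dfrac{170}{22} = 20 + 7.73 = 27.73$ min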

We used 20 as the lower limit, because if you look at the table you will see that the data are continuous and the

values are not rounded off. 19.999 would be in the class 10 to 20, while 20.00001 would be in the class 20 to 30.
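To check that the $d_1$/$d_2$ form agrees with the earlier interpolation formula, the calculation can be sketched in Python. The function name `grouped_mode_d` and its parameter names are our own; the inputs are read directly from the frequency tables.

```python
def grouped_mode_d(lower, width, f_modal, f_before, f_after):
    """Mode = L + d1 / (d1 + d2) * c for a single modal class."""
    d1 = f_modal - f_before    # modal frequency minus preceding class frequency
    d2 = f_modal - f_after     # modal frequency minus following class frequency
    return lower + d1 / (d1 + d2) * width


# breakdown times: modal class 20 - 30 with f = 30, neighbours 13 and 25
print(round(grouped_mode_d(20, 10, 30, 13, 25), 2))
# mass data: modal class 55-59 (lower limit 54.5), neighbours 14 and 11
print(round(grouped_mode_d(54.5, 5, 25, 14, 11), 1))
```

Since $d_1 + d_2 = 2f_{Mo} - f_{Mo-1} - f_{Mo+1}$, the second call gives the same 56.7 as the earlier formula, and the first gives 20 + 170/22, which rounds to 27.73.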

2.4 The Best Average/Symmetry

The different averages have different advantages and disadvantages, and there are no objective criteria that

determine the most representative average of all data sets. Each researcher has to use his/her own discretion on

a set of data.

The mean is the most familiar average. It exists for each data set, takes every score into account, is affected by

extreme scores, and works well with many statistical methods.

The median is commonly used. It always exists, does not take every score into account, is not affected by

extreme scores, and is often a good choice if there are some extreme scores in the data set.

The mode is sometimes used. It might not exist, or there may be more than one mode. It does not take every score into account, is not affected by extreme scores, and is appropriate for data at the nominal level.

The best measure for central location

The arithmetic mean is more affected by extreme values. If your data has some values that are very large or

small (relative to the other values) then it is better to use the median. When we get to the normal distribution in a

later unit, you will see why the arithmetic mean is important.

Skewness

If there are large extreme values in your data, the mean will be pulled to the right and we say that the distribution is positively skewed.

For a symmetrical distribution the mean, median and mode will be about the same:

$\bar{x} = \text{Median} = \text{Mode}$

If we measure the mass or height of people, it is usually a symmetrical (or normal) distribution. IQs or test results are also usually from a normal distribution.



A histogram of a symmetrical distribution is given in the following figure:

For a distribution that is skewed to the right, the mode will be less than the median and the median will be less than the mean:

$\bar{x} > \text{Median} > \text{Mode}$



A histogram that is skewed to the left (negative skewed) is shown in the following figure:

As a general rule the difference between the median and the mode is about twice the difference between the

mean and the median.

If the data are skewed to the left there are some outliers on the left (small values). If the data are skewed to the

right then there are some large outliers.

For the mass data the mean is 55.66, the median is 56.1 and the mode is 56.7. We thus have $\bar{x} < \text{Median} < \text{Mode}$, so the distribution is slightly skewed to the left.
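The ordering rule for skewness can be written as a small Python check. This is only a rough classification based on the relative positions of the three averages; the function name `skew_direction`, its labels and the tolerance parameter `tol` are our own assumptions.

```python
def skew_direction(mean, median, mode, tol=0.0):
    """Classify skewness from the relative positions of mean, median and mode."""
    if abs(mean - median) <= tol and abs(median - mode) <= tol:
        return "symmetric"
    if mean > median > mode:
        return "right (positive)"
    if mean < median < mode:
        return "left (negative)"
    return "unclear"


# mass data: mean 55.66, median 56.1, mode 56.7
print(skew_direction(55.66, 56.1, 56.7))
```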



A comparison of the mean and median can reveal information about skewness. Data can be identified as

skewed to the left, symmetric, skewed to the right. Data skewed to the left will have the mean and median to the

left of the mode, but in unpredictable order, as illustrated below:

The Relative Positions of the Mean, Median, and Mode:

Symmetric distribution (zero skewness): Mean = Median = Mode

Right-skewed distribution (positively skewed): Mean > Median > Mode

Left-skewed distribution (negatively skewed): Mean < Median < Mode



2.5 Box plots

The box plot (box-and-whisker diagram) is a part of exploratory data analysis and reveals more information about

how the data is spread. The construction of a box plot requires the minimum, the maximum, the median, and two

other values called hinges.

Definition 1: The minimum score, the maximum score, the median, and two hinges constitute a 5-

number summary of a set of data.

Definition 2: The lower hinge is the median of the lower half of all scores (from the minimum score up

to the original median).

Definition 3: The upper hinge is the median of the upper half of all scores (from the original median up

to the maximum score).

1. Arrange the data in ascending order.

2. Find the median.

3. List the lower half of the data from the minimum score up to and including the median found in step 2. The

left hinge is the median of these scores (This value is called the first quartile).

4. List the upper half of the data starting with the median and including it in the scores up to and including the

maximum. The right hinge is the median of these scores. (This is called the third quartile).

5. List the minimum, the left hinge (from step 3), the median (from step 2), the right hinge (from step 4), and the

maximum.

Example. Construct the box plot for the following 20 scores:

9, 8, 6, 12, 4, 15, 7, 16, 8, 6, 13, 5, 9, 16, 4, 2, 6, 15, 9, 3

Arranging in increasing order, the list is:

2, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8, 9, 9, 9, 12, 13, 15, 15, 16, 16.

The lower half, after finding the median score 8 and including it, is:

2, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8. The median of these scores is 6.

The upper half including the median of 8 is:

8, 8, 9, 9, 9, 12, 13, 15, 15, 16, 16. The median of these scores is 12.

The minimum score is 2, the maximum score is 16, the median is 8, the left hinge is 6, and the right hinge is 12.

To construct the box plot, begin with a horizontal (vertical) scale. Box the hinges as shown and extend the lines

to connect the minimum score to a hinge and the maximum score to a hinge.
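The 5-number summary can be sketched in Python. The guide's steps include the median in both halves; for an even number of scores this sketch assumes, as in the worked example, that both middle values are included in each half. That reading, and the names `middle` and `five_number_summary`, are assumptions. Statistical packages often use slightly different quartile rules and may give different hinges.

```python
def middle(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def five_number_summary(scores):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    ordered = sorted(scores)
    n = len(ordered)
    lower_half = ordered[: n // 2 + 1]      # minimum up to and including the median
    upper_half = ordered[(n - 1) // 2:]     # median up to and including the maximum
    return (ordered[0], middle(lower_half), middle(ordered),
            middle(upper_half), ordered[-1])


scores = [9, 8, 6, 12, 4, 15, 7, 16, 8, 6, 13, 5, 9, 16, 4, 2, 6, 15, 9, 3]
print(five_number_summary(scores))
```

For the 20 scores in the example this gives minimum 2, hinges 6 and 12, median 8 and maximum 16.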



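[Figure: box plot on a horizontal scale from 0 to 20, with the box drawn from the left hinge (6) to the right hinge (12), the median marked at 8, and lines extending to the minimum (2) and the maximum (16).]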

Unit 2 Exercises: (Solutions are found at the end of the module guide)

Exercise 2.1

A supermarket sells kilogram-bags of pears. The numbers of pears in 21 bags were:

7 9 8 8 10 9 8 10 10 8 9

10 7 9 9 9 7 8 7 8 9

a) Find the mode, median and mean for these data.

b) Compare your results and comment on the likely shape of the distribution.

c) Plot a simple bar chart to portray the data.

Exercise 2.2

The number of credit cards carried by 25 shoppers are:

2 5 2 0 4 3 0 1 1 7 1 4 1

3 9 4 1 4 1 5 5 2 3 1 1

a) Determine the mode and median of this distribution.

b) Calculate the mean of the distribution and compare it to the mode and median.

What can you conclude about the shape of the distribution?

c) Draw a bar chart to represent the distribution and confirm your conclusions in (b).



Exercise 2.3

A supermarket has one checkout for customers who wish to purchase 10 items or less.

The numbers of items presented at this checkout by 19 customers were:

10 8 7 7 6 11 10 8 9 9

9 6 10 9 8 9 10 10 10

a) Find the mode, median and mean for these data.

b) What do your results for (a) tell you about the shape of the distribution?

c) Plot a simple bar chart to portray the distribution.

Exercise 2.4

The numbers of driving tests taken to pass by 28 clients of a driving school are given in the following table:

Tests taken   Number of clients
1             10
2             8
3             4
4             3
5             3

a) Obtain the mode, median and mean from this frequency distribution and compare their values.

b) Plot a simple bar chart of the distribution.

Exercise 2.5

Spina Software Solutions operates an on-line help and advice service for PC owners. The numbers of calls made to them by subscribers in a month are tabulated below.

              Number of subscribers
Calls made    Female    Male
1             31        47
2             44        42
3             19        24
4             6         15
5             1         4

Find the mode, median and mean for both distributions and use them to compare the two distributions.



Exercise 2.6

Toofley the chemists own 29 pharmacies. The number of packets of a new skin medication sold in each of their

shops in a week were:

7 22 17 13 11 20 15 18 5 22

6 18 10 13 33 13 9 8 9 19

19 8 12 12 21 20 12 13 22

a) Find the mode and range of the data.

b) Identify the median of the data.

c) Find the lower and upper quartile values.

d) Determine the semi-interquartile range.

Exercise 2.7

Voditel international owns a large fleet of company cars. The mileages, in thousands of miles, of a sample of 17

of their cars over the last financial year were:

11 31 27 26 27 35 23 19 28 25

15 36 29 27 26 22 20

Calculate the mean and standard deviation of these mileage figures.

Exercise 2.8

Three credit companies each produced an analysis of its customers' bills over the last month. The following

results have been published:

Company Mean bill size Standard deviation of bill size

Akula N$559 N$172

Bremia N$612 N$147

Dolg N$507 N$161

Are the following statements true or false?

a) Dolg bills are on average the smallest and vary more than those from the other companies.

b) Bremia bills are on average the largest and vary more than those from other companies.

c) Akula bills are on average larger than those from Dolg and vary more than those from Bremia.

d) Akula bills are on average smaller than those from Bremia and vary less than those from Dolg.

e) Bremia bills are on average larger than those from Akula and vary more than those from Dolg.

f) Dolg bills vary less than those from Akula and are on average less than those from Bremia.



Exercise 2.9

The kilocalories per portion in a sample of 32 different breakfast cereals were recorded and collated into the

following grouped frequency distribution:

Kcal per portion Frequency

80 up to 120 3

120 up to 160 11

160 up to 200 9

200 up to 240 7

240 up to 280 2

a) Obtain an approximate value for the median of the distribution.

b) Calculate approximate values for the mean and standard deviation of the distribution.

Exercise 2.10

The stem and leaf display below shows the Friday night admission prices for 31 clubs.

Stem Leaves

0 44

0 5555677789

1 000224444

1 5555588

2 002

Leaf unit =N$1

Find the values of the median and semi-interquartile range.

Exercise 2.11

Select which of the statements on the right-hand side best defines the words on the left-hand side.

(i) median (a) the square of the standard deviation

(ii) range (b) a diagram based on order statistics

(iii) variance (c) the most frequently occurring value

(iv) boxplot (d) the difference between the extreme observations

(v) SIQR (e) the middle value

(vi) mode (f) half the difference between the first and third quartiles



Student self review questions

1) What is a measure of location?

2) How is the arithmetic mean defined?

3) Why is the special notation $x_1, x_2, \ldots, x_n$ used?

4) What does fx mean?

5) Why is the formula for the arithmetic mean of a frequency distribution different to that for the mean of a

set?

6) How is it that the mean of a grouped frequency distribution cannot be calculated exactly?

7) In what situation would a weighted mean be used?

8) Why is the mean considered to be the mathematical average?

9) What is the main disadvantage of the mean?

10) How is the mode defined?

11) Why is the mode not used extensively in statistical analysis?

12) Under what conditions may any one of the mean, median or mode be estimated, given the values of the

other two?

13) Write down the definition of the geometric mean and the type of values that it can be used to average.

14) Write down the definition of the harmonic mean and type of values that it can be used to average.

15) How is the median defined?

16) If a set has an even number of items, how can the median be determined?

17) Describe briefly how to estimate the median of a grouped frequency distribution graphically.

18) What is the graphical equivalent of the interpolation formula?

19) On balance, why is the graphical method preferred to the formula method for estimating the median?

20) Name two separate conditions under which the median rather than the mean would be chosen as a

measure of location and explain why.

21) What is the main disadvantage of the median?

22) What characteristic of the mean deviation precludes it from being the natural partner to the mean?

23) How is the standard deviation defined?

24) What is the practical advantage in using the computational formula for calculating the standard deviation?

25) The standard deviation is the natural partner to the mean. Explain why this is so.

26) What percentage of an approximately symmetric distribution lies within two standard deviations from the mean?

27) What is the coefficient of variation and how is it used?

28) How is Pearson's measure of skewness calculated and how does it measure skewness?

29) What is the variance and why is it not used for practical purposes as a measure of dispersion?



UNIT 3

MEASURE OF DISPERSION (VARIABILITY)



UNIT 3: MEASURE OF DISPERSION (VARIABILITY)

OBJECTIVES


Define the various measures of dispersion.

Compute each dispersion measure for both grouped and ungrouped sets of data.

Interpret each measure of dispersion.

CONTENT

3.1 Introduction

3.2 Range

3.3 Standard deviation

3.4 Variance

3.5 Coefficient of variation

3.6 Measure of non-central position

3.7 Self-evaluation



3.1 Introduction

For two projects A and B, we estimate the returns on the projects over the next year. We look at the percentage

return that will be achieved under different conditions (pessimistic, normal or optimistic).

Pessimistic Normal Optimistic

Project A 12 13 14

Project B 0 13 26

Must the company invest in Project A or Project B if the probability that the pessimistic, normal or optimistic

conditions will prevail are equal?

For Project A the mean is $\bar{x} = (12 + 13 + 14)/3 = 13\%$.

For Project B the mean is $\bar{x} = (0 + 13 + 26)/3 = 13\%$.

The mean returns for the projects are equal. That means the expected returns for the projects are equal. Would

you prefer Project A, where your minimum return is 12% or Project B, where you could make no return at all

(0%)? You will do a course on Finance as part of the MBA. In this course you will learn that you have to select

the project that is more predictable (you want to maximize your return, but at the same time you want to minimize

your risk). The returns for the projects are the same, but Project A is more predictable. In statistics we need

measures to measure this spread. For the example above (with only three values) it is easy to see that Project B

has a wider spread of returns, but what happens if we have hundreds of values?

The variability among data is one characteristic to which averages are not sensitive. Consider the following two groups of data:

Group A   Group B
65        42
66        54
67        58
68        62
71        67
73        77
74        77
77        85
77        93
77        100



Computed Averages:

Group A

Mean = 715/10 = 71.5

Median = 72

Mode = 77

Group B

Mean = 715/10 = 71.5

Median = 72

Mode = 77

Interpretation

Although there is no difference in the computed central measures between the two groups, the scores of Group

B are much more widely scattered than the scores for Group A.
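The computed averages for the two groups can be confirmed with Python's standard `statistics` module. The variable and function names below are our own; the data are taken from the table above.

```python
import statistics

group_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
group_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]


def averages(scores):
    """Return (mean, median, mode) for a list of scores."""
    return (statistics.mean(scores),
            statistics.median(scores),
            statistics.mode(scores))


print(averages(group_a))
print(averages(group_b))
```

Both calls return the same three averages, even though the Group B scores are far more spread out, which is why a separate measure of dispersion is needed.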


Which types of measures are used to measure dispersion (variability)?


The measures that are used to measure dispersion are:

Range

Standard deviation

Interquartile range

Quartile deviation

Variance

The method of computation, appropriate data types, uses and interpretation of each are now described.

3.2 Range

The first measure is the range. This is merely the biggest value minus the smallest value. For Project A above it is 14 - 12 = 2%, while for Project B it is 26 - 0 = 26%. The problem with this measure is that it looks only at the two extreme observations; we would rather have a measure that uses all the values.

The range is simply the difference between the highest value and the lowest value. For Group A, the range is 77 - 65 = 12, and the range for Group B is 100 - 42 = 58, which suggests greater dispersion. The range depends only on the maximum and minimum scores, and is a rough measure of spread.



Ungrouped data: Range = Maximum value - Minimum value = $x_{max} - x_{min}$

Grouped data: Range = Upper limit of highest class - Lower limit of lowest class.


The merchandising manager for a retail clothing chain has recorded 30 observations on the number of days between re-orders for a particular range of women's clothing.

The re-order intervals (in days) are:

18 26 15 17 7 27 24 17 10 17

23 29 28 18 10 23 16 9 12 26

5 12 23 22 24 14 16 26 19 22

Find the range of the number of days between re-orders.


$x_{max}$ = 29

$x_{min}$ = 5

Range = 29 - 5 = 24 days

Interpretation

24 days separate the shortest time ($x_{min}$) between successive re-orders from the longest time ($x_{max}$) between successive re-orders for a particular range of women's clothing. The range depends only on the minimum and maximum scores.
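Both range definitions can be sketched in Python. The function names are our own, and the grouped example reuses the machine-breakdown classes from Unit 2 (0 - 10 up to 80 - 90) purely as an illustration.

```python
def data_range(values):
    """Ungrouped data: maximum value minus minimum value."""
    return max(values) - min(values)


def grouped_range(classes):
    """Grouped data: upper limit of highest class minus lower limit of lowest class."""
    lowest = min(lower for lower, _ in classes)
    highest = max(upper for _, upper in classes)
    return highest - lowest


reorder_days = [18, 26, 15, 17, 7, 27, 24, 17, 10, 17,
                23, 29, 28, 18, 10, 23, 16, 9, 12, 26,
                5, 12, 23, 22, 24, 14, 16, 26, 19, 22]
print(data_range(reorder_days))

# (lower limit, upper limit) of each class, 0 - 10 up to 80 - 90
breakdown_classes = [(lower, lower + 10) for lower in range(0, 90, 10)]
print(grouped_range(breakdown_classes))
```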



3.3 Standard deviation

The standard deviation is given by the formula:

$S = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

For the following data, with the probabilities of the outcomes assumed equal, the standard deviation is calculated as:

Pessimistic Normal Optimistic

Project A 12 13 14

Project B 0 13 26

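For Project A the standard deviation is $S = \sqrt{\dfrac{(12-13)^2 + (13-13)^2 + (14-13)^2}{3 - 1}} = \sqrt{\dfrac{2}{2}} = 1$.

For Project B the standard deviation is $S = \sqrt{\dfrac{(0-13)^2 + (13-13)^2 + (26-13)^2}{3 - 1}} = \sqrt{\dfrac{338}{2}} = 13$.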

We see that the standard deviation for Project B is 13 times as large as the standard deviation for Project A.

On Excel the standard deviation for project A can be found by placing =stdev(12,13,14) in a cell.

In the exam you do not have Excel, so you will have to use a calculator. Most calculators can calculate the

statistical functions.

1. Put the calculator on Stat mode.

2. Enter 12

3. Press the DATA button (usually the M+ button).

4. The calculator displays 1, this means that you have entered one value.

5. Enter 13 and press DATA, the calculator displays 2.

6. Enter 14 and press DATA, the calculator displays 3.

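7. Now ask for $\bar{x}$ (it is usually second function 4) and the calculator will display 13.

8. Ask for the standard deviation (usually second function 6). For a sample, use $S_{n-1}$ (some calculators show this as $\sigma_{n-1}$), which divides by n - 1 as in the formula above; the calculator will display 1. The $S_n$ function divides by n instead and displays about 0.82 for these data.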

Try this for Project B to see that you are doing it correctly.
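If Python is available, the same check can be done with the standard `statistics` module: `stdev` divides by n - 1 (the sample formula used above) and `pstdev` divides by n. The variable names are our own.

```python
import statistics

project_a = [12, 13, 14]
project_b = [0, 13, 26]

s_a = statistics.stdev(project_a)      # sample standard deviation, divides by n - 1
s_b = statistics.stdev(project_b)
sn_a = statistics.pstdev(project_a)    # population version, divides by n

print(s_a, s_b)
print(round(sn_a, 2))
```

The first line shows the values 1 and 13 found above; the second shows what a calculator's divide-by-n function would display for Project A.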



In Unit 5 we will come back to this. At this stage we can state that about two thirds of the values fall within one standard deviation from the mean. For the mass data of Unit 2 (mean 55.66 kg) the standard deviation is about 4.53 kg, the square root of the variance of 20.53. About two thirds (about 37) of the 56 values therefore fall between 55.66 - 4.53 = 51.13 and 55.66 + 4.53 = 60.19. This gives us an indication of how far the values are from the mean (the central value).

In Corporate Finance the risk (uncertainty) is often measured with the standard deviation. People often say that the risk is 4.53, but to be correct they should say that the standard deviation is 4.53.

3.3.1 Ungrouped data

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$   Mathematical formula.

or

$s = \sqrt{\dfrac{n\sum x^2 - (\sum x)^2}{n(n - 1)}}$   Computational formula.

Steps (Mathematical formula)

1. Compute the arithmetic mean ($\bar{x}$).

2. Subtract the mean from each data value: $(x - \bar{x})$.

3. Square each difference: $(x - \bar{x})^2$.

4. Sum the squared differences: $\sum (x - \bar{x})^2$.

5. Calculate the average by dividing the sum by $(n - 1)$. Division by $(n - 1)$ is to correct the bias in estimating the population standard deviation using the sample standard deviation.

6. The standard deviation is the square root of this average.

Example

Find the standard deviation of the following sample scores: 2, 3, 5, 6, 9, 17

x        $x - \bar{x}$   $(x - \bar{x})^2$
2        -5        25
3        -4        16
5        -2        4
6        -1        1
9        2         4
17       10        100
$\sum$ = 42   $\sum$ = 0   $\sum$ = 150

Mean: $\bar{x} = \dfrac{42}{6} = 7$



Using the mathematical formula for the ungrouped data, the standard deviation is:

$s = \sqrt{\dfrac{150}{6 - 1}} = \sqrt{\dfrac{150}{5}} = \sqrt{30} \approx 5.5$

We will now use the computational formula for the self assessment exercise above.

From the previous table above, the sum of x is: $\sum x = 42$.

The sum of the squares is: $\sum x^2$ = 4 + 9 + 25 + 36 + 81 + 289 = 444.

Thus the standard deviation is:

$s = \sqrt{\dfrac{n\sum x^2 - (\sum x)^2}{n(n - 1)}} = \sqrt{\dfrac{6(444) - (42)^2}{6(6 - 1)}} = \sqrt{\dfrac{2664 - 1764}{30}} = \sqrt{\dfrac{900}{30}} = \sqrt{30} \approx 5.5$

The answer is identical to the result calculated previously.

Check whether you get the same answer if you use the statistics function on the calculator.
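As a further check, both formulas can be sketched in Python and compared on the same sample. The function names `std_mathematical` and `std_computational` are our own labels for the two formulas above.

```python
import math


def std_mathematical(scores):
    """s = sqrt( sum((x - mean)^2) / (n - 1) )"""
    n = len(scores)
    mean = sum(scores) / n
    squared_diffs = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(squared_diffs / (n - 1))


def std_computational(scores):
    """s = sqrt( (n * sum(x^2) - (sum x)^2) / (n * (n - 1)) )"""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


sample = [2, 3, 5, 6, 9, 17]
print(round(std_mathematical(sample), 1))
print(round(std_computational(sample), 1))
```

Both functions return the square root of 30, which rounds to 5.5. The computational form avoids working out each deviation from the mean, which is convenient by hand when the mean is not a round number.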

3.3.2 Grouped data

If the actual raw data are not available and we have to calculate the standard deviation from the grouped data, we use the formula:

$S = \sqrt{\dfrac{\sum fx^2 - n\bar{x}^2}{n - 1}}$

Table 1

Mass (in kg)   Class midpoint   Frequency   $fx^2$
45-49          47               6           6 times 47 times 47 = 13254
50-54          52               14          14 times 52 times 52 = 37856
55-59          57               25          25 times 57 times 57 = 81225
60-64          62               11          11 times 62 times 62 = 42284

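In Unit 2 (see 2.3.1) we calculated the mean as 55.66 kg and we saw that the total frequency is 56.

To get $\sum fx^2$, we have to add the last column: 13254 + 37856 + 81225 + 42284 = 174 619.

$S = \sqrt{\dfrac{\sum fx^2 - n\bar{x}^2}{n - 1}} = \sqrt{\dfrac{174\,619 - 56(55.66)^2}{56 - 1}} = \sqrt{20.53} \approx 4.53$ kg

Note that 20.53 is the value under the square root (the variance); the standard deviation is its square root, about 4.53 kg.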



If data have been grouped into a frequency distribution, each class is represented by its midpoint ( )x . 2( )

1x x f

sn

=

Mathematical formula

Steps

1. Compute the arithmetic mean ( )x . 2. Subtract the mean from each midpoint and square the difference: 2( )x x . 3. Multiply the squared difference by the frequency within each class: 2( )x x f . 4. Sum the result to obtain the total squared deviation from the mean: 2( )x x f . 5. Calculate the average by dividing this total by ( 1)n . 6. The standard deviation is the square root of this total.

OR

Computational formula

$$ s = \sqrt{\frac{n\sum fx^2 - \left(\sum fx\right)^2}{n(n - 1)}} $$

x = class mark (midpoint of class interval)

f = frequency

n = sample size


The errors in seven invoices were recorded as follows: 120, 30, 40, 8, 5, 20, 29

Use these data to calculate the standard deviation using both the mathematical formula and the computational formula.





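| $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
|---|---|---|
| 120 | 84 | 7 056 |
| 30 | -6 | 36 |
| 40 | 4 | 16 |
| 8 | -28 | 784 |
| 5 | -31 | 961 |
| 20 | -16 | 256 |
| 29 | -7 | 49 |
| $\sum x = 252$ | $\sum (x - \bar{x}) = 0$ | $\sum (x - \bar{x})^2 = 9\,158$ |

$$ \bar{x} = \frac{252}{7} = 36 $$

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{9158}{7 - 1}} \approx 39.07 $$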


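| $x$ | $x^2$ |
|---|---|
| 120 | 14400 |
| 30 | 900 |
| 40 | 1600 |
| 8 | 64 |
| 5 | 25 |
| 20 | 400 |
| 29 | 841 |
| $\sum x = 252$ | $\sum x^2 = 18\,230$ |

$$ s = \sqrt{\frac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)}} = \sqrt{\frac{7(18230) - (252)^2}{7(7 - 1)}} = \sqrt{\frac{127610 - 63504}{42}} = \sqrt{\frac{64106}{42}} = \sqrt{1526.333} \approx 39.07 $$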




The times (in hours per week) that 50 office staff members spent using personal computers were as follows:

| Time (hours/week) | Frequency ($f$) |
|---|---|
| 0 - 3 | 14 |
| 3 - 6 | 6 |
| 6 - 9 | 6 |
| 9 - 12 | 7 |
| 12 - 15 | 14 |
| 15 - 18 | 3 |
| Total | $\sum f = 50$ |

Use these data to compute the standard deviation using both the mathematical and computational formulae.


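Mathematical formula approach

| Time (h) | $f$ | $x$ | $fx$ | $(x - \bar{x})^2$ | $(x - \bar{x})^2 f$ |
|---|---|---|---|---|---|
| 0 - 3 | 14 | 1.5 | 21 | 43.56 | 609.84 |
| 3 - 6 | 6 | 4.5 | 27 | 12.96 | 77.76 |
| 6 - 9 | 6 | 7.5 | 45 | 0.36 | 2.16 |
| 9 - 12 | 7 | 10.5 | 73.5 | 5.76 | 40.32 |
| 12 - 15 | 14 | 13.5 | 189 | 29.16 | 408.24 |
| 15 - 18 | 3 | 16.5 | 49.5 | 70.56 | 211.68 |
| Total | $\sum f = 50$ | | $\sum fx = 405$ | | $\sum (x - \bar{x})^2 f = 1\,350.00$ |

Mean: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{405}{50} = 8.1$ h.

Standard deviation:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{1350}{50 - 1}} \approx 5.25 \text{ h} $$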



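Computational formula approach

| Time (h) | $f$ | $x$ | $fx$ | $x^2$ | $fx^2$ |
|---|---|---|---|---|---|
| 0 - 3 | 14 | 1.5 | 21 | 2.25 | 31.5 |
| 3 - 6 | 6 | 4.5 | 27 | 20.25 | 121.5 |
| 6 - 9 | 6 | 7.5 | 45 | 56.25 | 337.5 |
| 9 - 12 | 7 | 10.5 | 73.5 | 110.25 | 771.75 |
| 12 - 15 | 14 | 13.5 | 189 | 182.25 | 2551.5 |
| 15 - 18 | 3 | 16.5 | 49.5 | 272.25 | 816.75 |
| Total | $\sum f = 50$ | $\sum x = 54$ | $\sum fx = 405$ | $\sum x^2 = 643.5$ | $\sum fx^2 = 4630.5$ |

$$ s = \sqrt{\frac{n\sum fx^2 - \left(\sum fx\right)^2}{n(n - 1)}} = \sqrt{\frac{50(4630.5) - (405)^2}{50(50 - 1)}} = \sqrt{\frac{231525 - 164025}{2450}} = \sqrt{\frac{67500}{2450}} = \sqrt{27.55} \approx 5.25 \text{ hours} $$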
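Both grouped-data approaches can be checked with a small Python helper. This is a minimal sketch, not a prescribed method: the function name `grouped_stdev` and the (lower limit, upper limit, frequency) layout are our own assumptions. Each class midpoint is taken as the average of the two class limits, as in the tables above.

```python
import math

def grouped_stdev(classes):
    """Return (mean, s_mathematical, s_computational) for grouped data.

    classes is a list of (lower_limit, upper_limit, frequency) tuples.
    """
    midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
    freqs = [f for _, _, f in classes]
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    mean = sum_fx / n
    # Mathematical formula: frequency-weighted squared deviations over (n - 1)
    s_math = math.sqrt(
        sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1)
    )
    # Computational formula: uses only the sums of fx and fx squared
    s_comp = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))
    return mean, s_math, s_comp

# Personal computer usage data from the exercise (hours per week)
classes = [(0, 3, 14), (3, 6, 6), (6, 9, 6), (9, 12, 7), (12, 15, 14), (15, 18, 3)]
mean, s_math, s_comp = grouped_stdev(classes)
print(round(mean, 1), round(s_math, 2), round(s_comp, 2))
```

For the exercise data this should reproduce a mean of 8.1 hours and a standard deviation of about 5.25 hours from both formulas.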

3.4 Variance

The variance is the square of the standard deviation.

The variance for Project A is 12, and for Project B it is $13^2 = 169$.

Computation for ungrouped data

Example:

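Consider the ages (in years) of 7 second-hand cars: 13, 7, 10, 15, 12, 18, 9

| Age in years ($x$) | $\bar{x}$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
|---|---|---|---|
| 13 | 12 | +1 | 1 |
| 7 | 12 | -5 | 25 |
| 10 | 12 | -2 | 4 |
| 15 | 12 | +3 | 9 |
| 12 | 12 | 0 | 0 |
| 18 | 12 | +6 | 36 |
| 9 | 12 | -3 | 9 |
| Total | | $\sum (x - \bar{x}) = 0$ | $\sum (x - \bar{x})^2 = 84$ |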



Step 1:

Find the sample mean: $\bar{x} = \frac{\sum x}{n} = \frac{84}{7} = 12$ years.

Step 2:

Find the squared deviation of each observation from the sample mean.

Since $\sum (x - \bar{x}) = 0$ (see column 3 above), the deviations must first be squared to prevent the positive and negative deviations from cancelling each other. These squared deviations are then summed (see column 4 above).

Step 3:

Compute the variance by dividing the total squared deviation by $(n - 1)$, i.e.

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{84}{7 - 1} = \frac{84}{6} = 14 $$

The formula for the variance can now be expressed as:

$$ \text{Variance} = \frac{\text{sum of squared deviations}}{\text{sample size} - 1} $$

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$


The above mathematical formula for the variance can be cumbersome to apply by hand. A more efficient approach, using the computational formula, is strongly recommended for students.

$$ s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} $$
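The computational formula gives the same answer as the mathematical formula. A short derivation, using only the definition $\bar{x} = \frac{\sum x}{n}$ (so that $\sum x = n\bar{x}$), may help to show why:

```latex
\begin{align*}
\sum (x - \bar{x})^2
  &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
  &= \sum x^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum x^2 - n\bar{x}^2
\end{align*}
```

Dividing both sides by $(n - 1)$ turns the mathematical formula for the variance into the computational one.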


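Example

The variance for the car age problem, using the computational variance formula.

| Age of car in years ($x$) | $x^2$ |
|---|---|
| 13 | 169 |
| 7 | 49 |
| 10 | 100 |
| 15 | 225 |
| 12 | 144 |
| 18 | 324 |
| 9 | 81 |
| $\sum x = 84$ | $\sum x^2 = 1\,092$ |

$n = 7$ and $\bar{x} = \frac{84}{7} = 12$ years

$$ s^2 = \frac{1092 - (7)(12^2)}{7 - 1} = \frac{84}{6} = 14 $$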
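The car age example can also be verified in Python. The sketch below is illustrative (the variable names are our own) and compares the mathematical formula, the computational formula, and `statistics.variance` from the standard library, which likewise divides by n - 1. It also recovers the standard deviation as the square root of the variance.

```python
import math
import statistics

ages = [13, 7, 10, 15, 12, 18, 9]  # ages of the 7 second-hand cars, in years
n = len(ages)
mean = sum(ages) / n

# Mathematical formula: sum of squared deviations divided by (n - 1)
var_math = sum((x - mean) ** 2 for x in ages) / (n - 1)

# Computational formula: sum of squares minus n times the squared mean, over (n - 1)
var_comp = (sum(x * x for x in ages) - n * mean ** 2) / (n - 1)

# Library routine for the sample variance
var_lib = statistics.variance(ages)

# The standard deviation is the square root of the variance
sd = math.sqrt(var_math)

print(var_math, var_comp, var_lib, round(sd, 2))
```

All three variance values should equal 14, which corresponds to a standard deviation of about 3.74 years.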

