MBA 1 Quantitative Methods January 2013
QUANTITATIVE METHODS
STUDY GUIDE
PROGRAMME : MBA Year 1
CREDIT POINTS : 20 points
NOTIONAL LEARNING : 200 hours over 1 semester
TUTOR SUPPORT : [email protected]
Copyright 2013
MANAGEMENT COLLEGE OF SOUTHERN AFRICA
All rights reserved; no part of this book may be reproduced in any form or by any means, including
photocopying machines, without the written permission of the publisher
REF: EQM 2013
-
Quantitative Methods
MANCOSA - MBA 1
TABLE OF CONTENTS
General Outcomes
Prescribed Reading
Unit 1: Graphical Representation
Unit 2: Measure of Central Tendency
Unit 3: Measure of Dispersion (Variability)
Unit 4: Probability
Unit 5: Probability Distribution
Unit 6: Hypothesis Testing
Unit 7: Simple Linear Regression and Correlation Analysis
Unit 8: Forecasting Time Series Analysis
Unit 9: Decision Analysis Decision Trees and Payoff Tables
Solutions to Units Exercises
References
Tables
General Outcomes
Studying this module will enable the student to:
Apply simple statistical tools and analyses to solve business-related problems.
Interpret and analyse business data for production, planning, forecasting and other decision-making
functions.
Communicate effectively with statistical analysts.
Apply quantitative methods and techniques to other management disciplines: Economics, Accounting,
Financial Management, Marketing and Research.
Syllabus: The syllabus for the module is as follows:
Topic 1: Descriptive Statistics:
a. Graphical Representation
b. Measures of central Tendency
c. Measures of spread
d. Probability and Probability distributions
Topic 2: Inferential Statistics:
a. Hypothesis testing
b. Simple linear regression and correlation analysis
Topic 3: Forecasting (Time series analysis)
Topic 4: Decision Analysis (Decision Trees and payoff tables)
Topic 5: Time Value of Money
a. Simple and Compound Interest
b. Depreciation
c. Present Value
d. NPV
e. IRR
READING
Prescribed Textbook:
Trevor Wegner (2006). Applied Business Statistics: Methods and Applications, Juta & Co, Ltd: Cape Town
Recommended Textbook:
Lind, Marchal and Wathen (2005). Statistical Techniques in Business and Economics (12th Edition), New York:
McGraw-Hill. Chapter 1
The purpose of this course
Statistics as a subject has been included in the MBA curriculum because it is needed in two main areas:
1. Descriptive statistics are used in subjects like Finance, Operations etc. to describe business phenomena.
When you get to these study areas it will be explained where they are used, and
2. It is a requirement for an MBA degree that you must complete a research project. In this research project
you will have to collect data. In processing the data to make decisions you will need inference. Inference
(hypothesis testing) is covered in the latter part of this course.
UNIT 1: GRAPHICAL REPRESENTATION
OBJECTIVES
By the end of this study unit, you should be able to:
1. Recognise whether the type of data under consideration is quantitative, qualitative, or ranked.
2. Summarise a set of quantitative data by means of a frequency distribution, histogram, relative
frequency polygon.
3. Summarise a set of qualitative data by means of a pie chart and bar chart.
CONTENTS
1.1 Introduction
1.2 Types of data
1.3 Graphical Techniques for Quantitative Data
1.4 Pie Charts, Bar Charts, and Line Charts
1.5 Scatter Diagrams
1.1 Introduction
The basic types of data are described in this unit. Section 1.3 covers some graphical methods for presenting
data.
1.2 Types of data
Statistics is the science of collecting and analyzing data. Data are obtained by measuring the values of one or
more variables. Data can be classified as either quantitative data or qualitative data.
Quantitative data are measurements that are recorded on a naturally occurring numerical scale.
Some examples of quantitative data are:
The time that you have to wait for the next bus.
Your height or weight
Qualitative data can only be classified into categories like:
The political party that you support
Your gender
Sometimes arbitrary numerical values are assigned to qualitative data, for example coding males as 1.
The appropriate graphical method to be used in presenting data depends, in part, on the type of data
under consideration. Later in the guide, when statistical inference is covered, the data type will help to
identify the appropriate statistical technique to be used in solving a problem. In a few situations, it will be
necessary to recognise whether or not a set of non-quantitative data can be ordered. If the categories for
a set of non-quantitative data can be ordered or ranked, we have a third type of data, called ranked data.
SELF-ASSESSMENT ACTIVITY 1.1
How do I identify quantitative data?
SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.1
Quantitative data are real numbers. They are not numbers arbitrarily assigned to represent qualitative data. An
experiment that produces qualitative data always asks for verbal, non-numerical responses (e.g., yes and no;
defective and non-defective; Catholic, Protestant, and other).
Numerical data can also be classified as discrete (when there are only specific values that appear like the
number of students in a class) or continuous (when you can have intermediate values like your height that can
be measured more accurately).
Continuous data are sometimes summarized in tables where the number of data items in each interval is given.
See the example of interval data in the next table:
Mass (kg) Frequency
45-49 6
50-54 14
55-59 25
60-64 11
SELF-ASSESSMENT ACTIVITY 1.2
How do I identify quantitative data?
For each of the following examples of data, determine whether the data type is quantitative, qualitative,
or ranked.
a) the weekly level of the prime interest rate during the past year.
b) the make of car driven by each of a sample of executives.
c) the number of contacts made by each of a company's salespersons during a week.
d) the rating (excellent, good, fair, or poor) given to a particular television program by each of a sample
of viewers.
e) the number of shares traded on the New York Stock Exchange each week throughout 2005.
SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.2
a) Quantitative, if the interest rate level is expressed as a percentage. If the level is simply observed as
being high, moderate, or low, then the data type is qualitative.
b) Qualitative.
c) Quantitative.
d) Ranked, because the categories can be ordered.
e) Quantitative.
1.3. Graphical Techniques for Quantitative Data
This section introduces the basic methods of descriptive statistics used for organising a set of numerical
data in tabular form and presenting it graphically. Summarising data in this way requires that you first
group the data into classes. Judgment is required concerning the number and the size of the classes to
be used. The presentation of the grouped data should enable the user to quickly grasp the general shape
of the distribution of the data.
SELF-ASSESSMENT ACTIVITY 1.3
How do I choose the number of classes and the width of the classes to be used in constructing a
frequency distribution?
SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.3
Although this choice is arbitrary and no hard-and-fast rules can be given, here are a few useful
guidelines:
1. The classes must be non-overlapping, so that each measurement falls into exactly one class.
Therefore, choose the classes so that no measurement falls on a class boundary.
2. Choose the number of classes to be used as a number between 5 and 20, with smaller numbers of
classes being chosen for smaller data sets.
3. The approximate width of each class is given by the following:
$$\text{Approximate class width} = \frac{\text{Maximum value} - \text{Minimum value}}{\text{Number of classes}}$$
Choose the actual class width to be a value close to the approximate width that is convenient to work with.
Avoid awkward fractional values.
SELF-ASSESSMENT ACTIVITY 1.4
The weights in kilograms of a group of workers are as follows:
173 165 171 175 188
183 177 160 151 169
162 179 145 171 175
168 158 186 182 162
154 180 164 166 157
1.4.1 Construct a stem and leaf display for these data.
1.4.2 Construct a frequency distribution for these data.
SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.4
1.4.1 The first step in constructing a stem and leaf display is to decide how to split each observation
(weight) into two parts: a stem and a leaf. For this example, we will define the first two digits of an
observation to be its stem and the third digit to be its leaf. Thus, the first two weights are split into
a stem and a leaf as follows:
Weight Stem Leaf
173 17 3
183 18 3
Scanning the remaining weights, we find that there are five possible stems (14, 15, 16, 17 and 18), which
we list in a column from smallest to largest, as shown below. Next, we consider each observation in turn
and place its leaf in the same row as its stem, to the right of the vertical line. The resulting stem and leaf
display shown below has grouped the 25 weights into five categories. The second row of the display,
corresponding to the stem 15, has four leaves: 4, 8, 1 and 7. The four weights represented in the second
row are therefore 151, 154, 157 and 158.
Stem | Leaf
14   | 5
15   | 4 8 1 7
16   | 2 8 5 0 4 6 9 2
17   | 3 7 9 1 5 1 5
18   | 3 0 6 2 8
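The construction described above can be sketched in a few lines of Python. This is an illustration only, not part of the prescribed material: the function name `stem_and_leaf` is hypothetical, and the weights are entered column by column from the data table so that the leaves come out in the order in which the weights are read.

```python
from collections import defaultdict

# Weights from Self-Assessment Activity 1.4, read column by column.
weights = [173, 183, 162, 168, 154,
           165, 177, 179, 158, 180,
           171, 160, 145, 186, 164,
           175, 151, 171, 182, 166,
           188, 169, 175, 162, 157]


def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    display = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # e.g. 173 -> stem 17, leaf 3
        display[stem].append(leaf)
    # Return the stems from smallest to largest.
    return dict(sorted(display.items()))


display = stem_and_leaf(weights)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Counting the leaves in each row gives the class frequencies used in part 1.4.2.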
1.4.2 The hardest, and most important, step in constructing a frequency distribution is choosing the
number and width of the classes. Constructing a stem and leaf display first is often helpful. For this
example, the display in part 1.4.1 suggests using five classes, each with a width of 10 kg. The number
(or frequency) of weights falling into each class is then recorded as shown in the table that follows. Care
must be taken to define the classes in such a way that each measurement belongs to exactly one class.
We will follow the convention that a class (such as 140 up to 150) contains all measurements from the
lower limit (140) up to, but not including, the upper limit (150).
Class limits      Frequency
140 up to 150     1
150 up to 160     4
160 up to 170     8
170 up to 180     7
180 up to 190     5
Total             25
Suppose that we hadn't first constructed a stem and leaf display, or that the stem and leaf display
contained only a few, or too many, categories. (If the number of measurements is less than 50, the
frequency distribution should contain between 5 and 7 classes.) We might then begin by noting that the
smallest and largest measurements are 145 and 188, respectively, so that the range of the
measurements is 188 - 145 = 43. If we decide to use five classes, the approximate width of each class is
43/5 = 8.6. In order to work with "round" numbers, we have chosen to use a class width of 10 and to set
the lower limit of the first class at 140.
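As a rough check of the arithmetic above, the following Python sketch computes the approximate class width and then counts the weights in each class, using the "lower limit up to, but not including, the upper limit" convention. The function names are hypothetical, and the choice of five classes of width 10 starting at 140 is taken from the worked solution.

```python
# Weights (kg) from Self-Assessment Activity 1.4.
weights = [173, 165, 171, 175, 188,
           183, 177, 160, 151, 169,
           162, 179, 145, 171, 175,
           168, 158, 186, 182, 162,
           154, 180, 164, 166, 157]


def approximate_class_width(values, n_classes):
    """(Maximum value - minimum value) / number of classes."""
    return (max(values) - min(values)) / n_classes


def frequency_distribution(values, lower, width, n_classes):
    """Count values in classes [lower, lower + width), and so on.

    Assumes every value lies inside the range covered by the classes.
    """
    counts = [0] * n_classes
    for value in values:
        counts[(value - lower) // width] += 1
    return [(lower + i * width, lower + (i + 1) * width, counts[i])
            for i in range(n_classes)]


print(approximate_class_width(weights, 5))  # range 43 divided by 5 classes

table = frequency_distribution(weights, lower=140, width=10, n_classes=5)
for low, high, count in table:
    print(f"{low} up to {high}: {count}")
```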
SELF-ASSESSMENT ACTIVITY 1.5
Refer to the data in Self-Assessment Activity 1.4 above.
1.5.1 Construct a relative frequency histogram for the data.
1.5.2 Construct a relative frequency polygon for the data.
1.5.3 Construct an ogive for the data.
SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.5
1.5.1 The relative frequencies, obtained by dividing each frequency by 25, are shown below:
Class limits     Frequency   Relative frequency   Cumulative relative frequency
140 up to 150    1           0.04                 0.04
150 up to 160    4           0.16                 0.20
160 up to 170    8           0.32                 0.52
170 up to 180    7           0.28                 0.80
180 up to 190    5           0.20                 1.00
[Figure: Relative frequency histogram for weight of workers. Horizontal axis: Weight (kg), 140 to 190; vertical axis: Relative frequency, 0 to 0.35.]
The relative frequency histogram is constructed by erecting over each class interval a rectangle, the height
of which equals the relative frequency of that class.
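The relative and cumulative relative frequencies in the table for 1.5.1 can be reproduced with a short Python sketch. The variable names are illustrative; the class frequencies are those from Self-Assessment Activity 1.4.

```python
from itertools import accumulate

# Class frequencies for 140-150, 150-160, ..., 180-190 (from Activity 1.4).
frequencies = [1, 4, 8, 7, 5]
total = sum(frequencies)  # 25 observations

# Relative frequency: each class frequency divided by the total.
relative = [f / total for f in frequencies]

# Cumulative relative frequency: running total of the counts divided by the total.
# Accumulating the integer counts first avoids floating-point drift.
cumulative = [running / total for running in accumulate(frequencies)]

for rel, cum in zip(relative, cumulative):
    print(f"{rel:.2f}  {cum:.2f}")
```

The heights of the histogram rectangles are the values in `relative`; the values in `cumulative` are the ones plotted above the upper class limits when drawing an ogive.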
1.5.2 The relative frequency polygon is constructed by plotting the relative frequency of each class above
the midpoint of that class and then joining the points with straight lines. The polygon is closed by
considering one additional class (with zero frequency) at each end of the distribution and extending a
straight line to the midpoint of each of these classes.
1.5.3 The cumulative relative frequencies are shown in the table in part 1.5.1. The cumulative relative
frequency of a particular class is the proportion of measurements that fall below the upper limit
of that class. To construct the ogive, the cumulative relative frequency of each class is plotted
above the upper limit of that class, and the points representing the cumulative frequencies are
then joined by straight lines. The ogive is closed at the lower end by extending a straight line to
the lower limit of the first class.
[Figure: Weights (kg) on the horizontal axis.]
1.4 Pie Charts, Bar Charts, and Line Charts
The methods described in the previous section are appropriate for summarizing data that are quantitative,
or numerical measurements. But we must also be able to describe data that are qualitative, or
categorical data. These data consist of attributes, which are the names of the categories into which the
observations are sorted.
1.4.1 Pie Chart
A pie chart is a useful method for displaying the percentage of observations that fall into each category of
qualitative data, while a bar chart can be used to display the frequency of observations that fall into each
category. If the categories consist of points in time and the objective is to focus on the trend in
frequencies over time, a line chart is useful.
SELF-ASSESSMENT ACTIVITY 1.6
According to the New York Times (27 September 1987), the June levels of unemployment in the United
States for five years were as follows:
Year Unemployed (millions)
1983 10.7
1984 8.5
1985 8.3
1986 8.2
1987 7.3
1.6.1 Use a bar chart to depict these data.
1.6.2 Use a line chart to depict these data.
SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.6
1.6.1 The five years, or categories, are represented by intervals of equal width on the horizontal axis. The
height of the vertical bar erected above any year is proportional to the frequency (number of
unemployed) corresponding to that year.
[Figure: Bar Chart for Unemployment. Horizontal axis: Year (1983 to 1987); vertical axis: Frequency (millions), 0 to 12.]
1.6.2 A line chart is obtained by plotting the frequency of a category above the point on the horizontal axis
representing that category and then joining the points with straight lines.
[Figure: Line chart of the unemployment data. Horizontal axis: 1983 to 1987; vertical axis: 0 to 12.]
SELF-ASSESSMENT ACTIVITY 1.7
The New York Times article alluded to in self-assessment 1.6 reported that 6 million Americans who say
they want work are not even seeking jobs.
A breakdown of these 6 million Americans by race follows:
Race Frequency
White 4320000
Black 1500000
Other 180000
Required: Use a pie chart to depict these data.
SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.7
A pie chart is an effective method of showing the percentage breakdown of a whole entity into its
component parts. We must first determine the percentage of the 6 million Americans belonging to each of
the three racial categories: 72% white, 25% black, and 3% other. Each category is represented by a slice of
the pie (a circle) that is proportional in size to the percentage (or relative frequency) corresponding to that
category. Since the entire circle corresponds to 360°, the angle between the lines demarcating the White
sector is therefore (0.72)(360°) = 259.2°. In a similar manner, we can determine that the angles for the
Black and Other sectors are 90° and 10.8°, respectively. The pie chart is shown below.
[Figure: Pie chart with sectors White (259.2°), Black (90°) and Other.]
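The sector angles can be checked with a small Python sketch. The function name `pie_angles` is hypothetical; the frequencies are those given in Self-Assessment Activity 1.7.

```python
# Frequencies from Self-Assessment Activity 1.7.
frequencies = {"White": 4_320_000, "Black": 1_500_000, "Other": 180_000}


def pie_angles(freqs):
    """Angle of each sector in degrees: 360 times the category's share of the total."""
    total = sum(freqs.values())
    return {category: 360 * count / total for category, count in freqs.items()}


angles = pie_angles(frequencies)
for category, angle in angles.items():
    share = frequencies[category] / sum(frequencies.values())
    print(f"{category}: {share:.0%} of the total, {angle:.1f} degrees")
```

Because the shares add up to 100%, the angles always add up to 360 degrees, which is a quick way to catch an arithmetic slip.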
1.4.2 Bar charts
Bar charts are a quick and easy way of showing variation in or between variables.
Rectangles of equal width are drawn so that the area enclosed by each rectangle is proportional to the
size of the variable it represents. This type of graph not only illustrates a general trend, but also allows a
quick and accurate comparison of one period with another or the illustration of a situation at a particular
time. When drawing up bar charts take care to:
make the bars reasonably wide so that they can be clearly seen;
draw them neatly and professionally;
ensure that the bars all have the same width;
ensure that the gaps between the bars have the same width.
We can produce a variety of bar charts to provide an overview of the data.
Simple bars representing each variable are drawn either vertically or horizontally.
1.4.3 Component or stacked bar chart
A single bar is drawn for each variable, with the heights of the bars representing the totals of the
categories. Each bar is then subdivided to show the components that make up the total bar. These
components may be identified by colouring or shading, accompanied by an explanatory key to show what
each component represents.
Percentage component bar chart
The components are converted to percentages of the total, and the bars are divided in proportion to
these percentages. The scale is a percentage scale and the height of each bar is therefore 100%
1.4.4 Multiple bar charts
Two or more bars are grouped together in each category. The use of a key helps to distinguish between
the categories.
1.5 Scatter Diagrams
This section introduces the notion of the relationship between two quantitative variables. Economists, for
example, are interested in the relationship between inflation rates and unemployment rates. Business
owners are interested in many variables, including the relationship between their advertising
expenditures and sales levels. The graphical technique used to depict the relationship between the
variables X and Y is the scatter diagram, which is a plot of all pairs of values (x, y) for the variables X
and Y.
SELF-ASSESSMENT ACTIVITY 1.8
An educational economist wants to establish the relationship between an individual's income and
education. She takes a random sample of 10 individuals and asks for their income (in $1,000s) and
education (in years). The results are shown below. Construct a scatter diagram for these data, and
describe the relationship between the number of years of education and income level.
x (education) y (income)
11 25
12 33
11 22
15 41
8 18
10 28
11 32
11 24
17 53
11 26
SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.8
If we feel that the value of one variable (such as income) depends to some degree on the value of the
other variable (such as years of education), the first variable (income) is called the dependent variable
and is plotted on the vertical axis. The ten pairs of values for education (x) and income (y) are plotted in
Figure 1.1, forming a scatter diagram.
The scatter diagram allows us to observe two characteristics about the relationship between education
(x) and income (y):
1. Because these two variables move together, that is, their values tend to increase together and
decrease together, there is a positive relationship between the two variables.
2. The relationship between income and years of education appears to be linear, since we can
imagine drawing a straight line (as opposed to a curved line) through the scatter diagram that
approximates the positive relationship between the two variables.
The pattern of a scatter diagram provides us with information about the relationship between two
variables. Figure 1.1 depicts a positive linear relationship. If two variables move in opposite directions,
and the scatter diagram consists of points that appear to cluster around a straight line, then the variables
have a negative linear relationship (see Figure 1.2). It is possible to have nonlinear relationships (see
Figures 1.3 and 1.4), as well as situations in which the two variables are unrelated (see Figure 1.5). In
Unit 7, we will compute numerical measures of the strength of the linear relationship between two
variables.
[Figure 1.1: Scatter Diagram for Self-Assessment Activity 1.8. Horizontal axis: Years of Education; vertical axis: Income ($'000).]
[Figure 1.2: Negative Linear Relationship]
[Figure 1.3: Nonlinear Relationship]
[Figure 1.4: Nonlinear Relationship]
[Figure 1.5: No Relationship]
Unit 1 Exercises: (Solutions are found at the end of the module guide)
Exercise 1.1
Describe three ways of (graphically) representing data which you consider to be appropriate for inclusion in a
company's annual report and accounts. Name the advantages of these forms of data representation.
Exercise 1.2
Produce a pie chart showing the percentage market share of the passenger car market held by each of South
Africa's car manufacturers.
Manufacturer        1991 Sales (Units)
Toyota              51 653
Nissan              20 793
Volkswagen          39 757
Delta               20 949
Ford                18 631
MBSA                15 756
BMW                 15 431
MMI                 14 731
Total 1991 Sales    197 701
Exercise 1.3
Produce a component bar chart showing the breakdown of car sales for Toyota, Nissan and Ford only between
the first and second half of 1991.
1991 Sales (units)
Manufacturer   Total units   First Half (Jan - June)   Second Half
Toyota         51 653        19 629                    32 024
Nissan         20 793        9 565                     11 228
Ford           18 631        9 875                     8 756
Totals         91 077        39 069                    52 008
Exercise 1.4
Produce a line graph showing the trend in market share for Volkswagen and Nissan over the period 1982 to
1991.
Year    Volkswagen    Nissan
1982    13.4          9.9
1983    11.6          9.6
1984    9.8           8.2
1985    14.4          6.8
1986    17.4          7.8
1987    19.9          9.7
1988    21.3          11.7
1989    22.2          10.2
1990    19.6          10.6
1991    20.1          10.5
Comment on the findings.
Exercise 1.5
Areas of Continents of the World.
Continent        Area (millions of square kilometres)
Africa           30.3
Asia             26.3
Europe           4.9
North America    24.3
Oceania          8.5
South America    17.9
Russia           20.5
(i) Draw a bar chart of the above information
(ii) Construct a pie chart to represent the total area.
Exercise 1.6
The distances travelled (in kilometres) by a courier service motorcycle on 30 trips were recorded by the driver.
24 19 21 27 20 17 17 32 22 26
18 13 23 30 10 13 18 22 34 16
18 23 15 19 28 25 25 20 17 15
a) Define the random variable, the data type, and the measurement scale.
b) From the data set, prepare:
i. an absolute frequency distribution,
ii. a relative frequency distribution, and
iii. the (relative) less than ogive.
c) Construct the following graphs:
i. a histogram of the relative frequency distribution, and
ii. the cumulative frequency polygon.
d) From the graphs, read off:
i. what percentage of trips were between 25 and 30 km long?
ii. what percentage of trips were under 25 km long?
iii. what percentage of trips were 22 km or more?
iv. below which distance were 55% of the trips made?
v. above which distance were 20% of the trips made?
Exercise 1.7
Vorovka Director Marketing has offices in Windhoek, Johannesburg, Durban and Botswana. The number of
employees in each location and their genders are tabulated below.
Office Females Males Total
Windhoek 12 8 20
Johannesburg 9 15 24
Durban 23 6 29
a) Plot a simple bar chart to show the total number of employees in each office.
b) Plot a component bar chart to show the number of employees in each office by gender.
c) Plot a cluster bar chart to show the number of employees at each office by gender.
Exercise 1.8
Tourists seeking holiday accommodation in a self-catering complex in the resort ABC of Namibia can make either
a one-or two-week booking. The manager of the complex has produced the following table to show the bookings
she received last season:
                          Type of booking
Tourist's home country    One-week    Two-week
France                    13          44
Germany                   29          36
Holland                   17          21
Ireland                   8           5
a) Produce a simple bar chart to show the total number of bookings by home country.
b) Produce a component bar chart to show the number of bookings by home country and type of booking.
c) Produce a cluster bar chart to show the number of bookings by home country and type of booking.
Exercise 1.9
A roadside breakdown assistance service answered 37 calls in Cape Town on one day. The response times taken
to deal with these calls were noted and have been arranged in the grouped frequency distribution below.
Response time (minutes) Number of calls
20 to under 30 4
30 to under 40 8
40 to under 50 17
50 to under 60 6
60 to under 70 2
a) Produce a histogram to portray this distribution and describe the shape of the distribution.
b) Find the cumulative frequency for each class.
c) Produce a cumulative frequency graph of the distribution.
Exercise 1.10
Rents per person (to the nearest $) for 83 flats and houses advertised on the notice boards at a university were
collected and the following grouped frequency distribution compiled:
Rent per person ($) Frequency
35 - 39 13
40 - 44 29
45 - 49 22
50 - 54 10
55 - 59 7
60 - 64 2
a) Plot a histogram to portray this distribution and comment on the shape of the distribution.
b) Find the cumulative frequency for each class.
c) Plot a cumulative frequency graph of the distribution.
Exercise 1.11
Monthly membership fees in $ for 22 health clubs are:
34 43 44 22 73 69 48 67 33 56 67
27 78 60 63 32 67 41 65 48 48 77
Compile a stem and leaf display of these data.
The clubs whose fees appear in bold do not have a swimming pool. Highlight them in your display.
Exercise 1.12
Select which of the statements listed below on the right-hand side describes the words listed on the left-hand
side.
(i) Histogram                a) can only take a limited number of values
(ii) Time series             b) segments or slices represent categories
(iii) Pictogram              c) each plotted point represents a pair of values
(iv) Discrete data           d) separates parts of each observation
(v) Stem and leaf display    e) each block represents a class
(vi) Scatter diagram         f) data collected at regular intervals over time
(vii) Pie chart              g) comprises a set of small pictures
Student review questions
1. Describe the difference between quantitative data and qualitative data.
2. For each of the following examples of data, determine whether the data are quantitative, qualitative,
or ranked.
a) the month of the highest sales for each firm in a sample.
b) the department in which each of a sample of university professors teaches.
c) the weekly closing price of gold throughout a year.
d) the size of soft drink (large, medium, or small) ordered by a sample of customers in a restaurant.
e) the number of barrels of crude oil imported monthly by the United States.
3. Identify the type of data observed for each of the following variables.
a) the number of students in a statistics class.
b) the student evaluations of the professor (1 = poor, 5 = excellent).
c) the political preferences of voters.
d) the states in the United States of America.
e) the size of a condominium (in square feet).
UNIT 2: MEASURES OF CENTRAL TENDENCY
OBJECTIVES
By the end of this study unit, you should be able to:
Determine the mean, median and mode for grouped and ungrouped data.
Describe the symmetry/skewness of a set of data in terms of the mean, median and mode.
Calculate the range, standard deviation, variance, quartiles and inter-quartile range for grouped as well as
ungrouped data.
CONTENTS
2.1 Introduction
2.2 Ungrouped data
2.2.1 Mean
2.2.2 Median
2.2.3 Mode
2.3 Grouped data
2.3.1 Mean for grouped data
2.3.2 Median for grouped data
2.3.3 Mode for grouped data
2.4 The best average
2.5 Box plots
2.6 Self-evaluation
2.1 Introduction
This unit discusses numerical descriptive measures used to summarise and describe sets of data. There are
three commonly used numerical measures of central tendency of a data set: the mean, the median, and the
mode. You are expected to know how to compute each of these measures for a given data set. Moreover, you
are expected to know the advantages and disadvantages of each of these measures, as well as the type of data
for which each is an appropriate measure.
An average is a single value that is central to, or representative of, the entire data set, which makes it information
of great importance. The three measures of central tendency most commonly used are the mean, the median and the
mode.
2.2 For ungrouped data
2.2.1 The arithmetic mean
The first and most important one is the arithmetic mean (at school you just called this the average). Sometimes
we merely call the arithmetic mean the mean.
To calculate the mean of some numbers we merely add the numbers together and divide the total by the number
of values.
The mean of 4, 5, 6, 7, 8, 10 is 40 / 6 = 6.67, rounded to two decimal places (the total of the values is 40 and there are 6 values).
In Excel the mean can be found by entering =AVERAGE(4,5,6,7,8,10) in a cell.
The mean can be written as a formula:
$$\bar{x} = \frac{\sum x_i}{N}$$
We say x-bar (or the mean) is the sum of the values (the $x_i$'s) divided by the number of values (N).
The arithmetic mean is the most important of all numerical descriptive measurements, and it corresponds to what
most people call an average.
Definition 2.1: The arithmetic mean of a list of scores is obtained by adding the scores and dividing the total by
the number of scores. It will be referred to simply as the mean.
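For readers who prefer code to a spreadsheet, the definition can be sketched in Python as follows. The function name `arithmetic_mean` is illustrative; Python's standard library also offers `statistics.mean`, which gives the same result.

```python
def arithmetic_mean(values):
    """Add the scores and divide the total by the number of scores."""
    return sum(values) / len(values)


scores = [4, 5, 6, 7, 8, 10]
mean = arithmetic_mean(scores)

# The total is 40 and there are 6 values.
print(sum(scores), len(scores), round(mean, 2))
```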
Example 1
Find the mean of the scores 2, 3, 6, 7, 12.
The mean score is
$$\bar{x} = \frac{2 + 3 + 6 + 7 + 12}{5} = \frac{30}{5} = 6.$$
Formula 1: Mean:
$$\bar{x} = \frac{\sum x}{n}$$
Where,
$\sum$ denotes summation of a set of values.
x is the variable used to represent raw scores.
n represents the number of scores being considered.
The result can be denoted by $\bar{x}$ if the available scores are samples from a larger population. If all scores of the
population are available, then we can denote the computed mean by the Greek letter $\mu$ (pronounced mu).
2.2.2 The median
The median is the middle value of an ordered set of numbers. In the case 4, 5, 6, 7, 8, 10 the middle value is
between the 6 and the 7. So we say that the median is 6.5.
Note: It is important that the values must be in the correct order before you choose the middle value.
Definition 2: The median of a set of scores is the middle value when the scores are arranged in order of
increasing (or decreasing) magnitude.
After first arranging the original scores in increasing (or decreasing) order, the median will be either of the
following:
1. If the number of scores is odd, the median is the number that is exactly in the middle of the list.
2. If the number of scores is even, the median is found by computing the mean of the two middle numbers.
Steps
Arrange the data in an array.
Determine the position of the median.
Median position = $\frac{n + 1}{2}$
Read the value of the median from the number list.
Example: Find the median of each data set.
1. Over a 7-day period, the number of customers (per day) purchasing at Hides Leather Shop was as follows:
4 80 50 10 60 12 5
Array:
4 5 10 12 50 60 80
Median position = (n + 1)/2 = (7 + 1)/2 = 4th item.
The median is the fourth item which is 12.
2. Over an 8-day period, the number of customers observed at the shop per day was as follows:
21 5 11 7 12 15 20 5
Array:
5 5 7 11 12 15 20 21
Position of median: (n + 1)/2 = (8 + 1)/2 = 4.5 (between 4th and 5th positions)
Median = (11+12)/2 = 11.5 (Average of 4th and 5th values)
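The two cases (odd and even number of scores) can be sketched in Python as below. The function names are illustrative and not from the guide; the data are the two customer-count examples above.

```python
def median_position(n):
    """Position of the median in an ordered list of n values: (n + 1) / 2."""
    return (n + 1) / 2


def median(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(values)  # the data must be arranged in an array first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: the value exactly in the middle.
        return ordered[middle]
    # Even number of scores: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


seven_days = [4, 80, 50, 10, 60, 12, 5]
eight_days = [21, 5, 11, 7, 12, 15, 20, 5]

print(median_position(len(seven_days)), median(seven_days))
print(median_position(len(eight_days)), median(eight_days))
```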
SELF-ASSESSMENT ACTIVITY 2.3
The time taken to complete an assembling task has been measured for a group of employees and the results are
shown below:
Find the median of the scores 8, 2, 7, 3, 6, 9.
SOLUTION TO SELF-ASSESSMENT ACTIVITY 2.3
Begin by arranging the scores in increasing order.
2 3 6 7 8 9
We note that the numbers 6 and 7 share the middle position which is the average of the 3rd and 4th positions, i.e.
the (3+4)/2 = 3.5th position. Thus the median is the average of the 3rd and 4th values.
The mean of these two scores is therefore $\frac{6+7}{2} = 6.5$, which is the median.
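The steps for the median (arrange the data, find the position (n + 1)/2, read off the value) can be sketched in Python. This is only an illustrative sketch; the function name `median_by_position` is an assumption made for this example and is not part of the guide.

```python
def median_by_position(scores):
    """Return the median using the (n + 1)/2 position rule."""
    ordered = sorted(scores)            # Step 1: arrange the data in an array
    n = len(ordered)
    position = (n + 1) / 2              # Step 2: 1-based position of the median
    lower = ordered[int(position) - 1]  # value at (or just below) that position
    if n % 2 == 1:
        return lower                    # odd n: the exact middle value
    upper = ordered[int(position)]      # even n: the value just above the position
    return (lower + upper) / 2          # mean of the two middle values


print(median_by_position([4, 80, 50, 10, 60, 12, 5]))      # 12
print(median_by_position([21, 5, 11, 7, 12, 15, 20, 5]))   # 11.5
print(median_by_position([8, 2, 7, 3, 6, 9]))              # 6.5
```

The three calls reuse the data from the two shop examples and from Self-Assessment Activity 2.3.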
2.2.3 The mode
The mode is the most common value. If we look at the following set of numbers:
3, 4, 5, 6, 6, 6, 7 the mode is 6 because it is the number that appears most often.
Definition 3: The mode is obtained from a collection of scores by selecting the score that occurs most frequently.
In those cases where no score is repeated there is no mode. Where two scores both occur with the same
greatest frequency, the data set is bimodal. If more than two scores occur with the same greatest frequency,
each is a mode and the data set is multimodal.
For ungrouped data the mode requires no calculation and can easily be obtained from a number list. If there is
no value that occurs more often than the others, then there is no mode, but this is not the same as a mode of
zero. A set of data may also have more than one mode and is then said to be bi-modal or multi-modal.
Example
1. The commission earnings of five salespeople were as follows for the previous month:
R5000 R5200 R5200 R5700 R8600
The modal commission was R5200
2. The lengths of stay (in days) for a sample of 9 patients in a hospital are:
17 19 19 4 19 26 4 21 4
The modal lengths of stay are 19 and 4 days.
Example
There are 40 buck, 25 elephants and 20 smaller animals at a water hole. The modal category is buck since it
has the highest frequency.
The mode is the only central measure that can be used with data at the nominal level of measurement.
Example
The hourly income rates (in $) of 5 students are: 4 9 7 16 10
There is no mode.
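The three cases in Definition 3 (one mode, several modes, no mode) can be checked with a short Python sketch. The function name `modes` is an assumption for this illustration, and the sketch assumes a non-empty list of scores.

```python
from collections import Counter


def modes(scores):
    """Return a sorted list of modes; an empty list means there is no mode."""
    counts = Counter(scores)            # frequency of each distinct score
    highest = max(counts.values())      # the greatest frequency
    if highest == 1:
        return []                       # no score is repeated, so no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5000, 5200, 5200, 5700, 8600]))       # [5200]
print(modes([17, 19, 19, 4, 19, 26, 4, 21, 4]))    # [4, 19]
print(modes([4, 9, 7, 16, 10]))                    # []
```

Returning an empty list for "no mode" keeps that case distinct from a mode of zero, which is the distinction the text draws.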
2.3 Grouped data
The problem is that we do not always have the actual data.
Sometimes the data is given as a frequency distribution. If we look at Table 1:
Table 1
Mass (in kg) Frequency
45-49 6
50-54 14
55-59 25
60-64 11
We know that there are 6 values in the first interval (first class) 45-49, but we do not have the actual values.
We must still be able to find the mean, the median and the mode.
2.3.1 Mean for grouped data
To get the mean, we take the midpoint of every class to represent the class.
There are 6 values in the first class. The midpoint of the first class is (45+49) / 2 = 47.
The total for the values of the first class is therefore 6 times 47 = 282.
The total for the values in the second class is 14 times 52 = 728.
The total for the values in the class 55-59 is 25 times 57 = 1425.
The total for the values in the interval 60-64 is 11 times 62 = 682.
If we add the class totals together we get 3117 (Check if this is correct)
To get the mean we must now divide by the number of values.
The number of values is 6+14+25+11 = 56.
The mean is 3117 divided by 56 = 55.66 kg.
As a formula we can write this as
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, where we say x-bar (the mean) is the sum of the frequency
times the class midpoint, divided by the sum of the frequencies.
The value for the arithmetic mean that you get from ungrouped data is a better value to use, if the actual
ungrouped data is available.
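The grouped-mean calculation for Table 1 can be reproduced with a few lines of Python. This is a minimal sketch; the function name `grouped_mean` and the tuple layout (lower limit, upper limit, frequency) are assumptions chosen for the illustration.

```python
def grouped_mean(classes):
    """Mean of grouped data given (lower_limit, upper_limit, frequency) rows."""
    total = 0.0
    n = 0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2   # the midpoint represents the class
        total += freq * midpoint         # class total = frequency times midpoint
        n += freq                        # running sum of the frequencies
    return total / n


# Table 1: mass (in kg) and frequency
mass_table = [(45, 49, 6), (50, 54, 14), (55, 59, 25), (60, 64, 11)]
print(round(grouped_mean(mass_table), 2))   # 55.66
```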
The mean for grouped data or the mean from a frequency distribution
Simple Frequency Distribution
Formula 2.2: mean:
$\bar{x} = \frac{\sum fx}{\sum f}$
where x = class mark
f = frequency
SELF-ASSESSMENT ACTIVITY 2.1
The number of times per week that a particular photocopy machine breaks down was recorded over a period of
60 weeks. The results are given in the frequency table below.
Number of breakdowns 0 1 2 3 4 5
Number of weeks 15 12 16 10 5 2
Required
1. Find the mean number of breakdowns per week over the 60-week period.
2. A metro council needs information about the times local bicycle commuters spend on the road. A sample of
12 local bicycle commuters yields the following times in minutes:
22 29 27 30 12 22 31 15 26 16 48 23
Determine the mean travelling time.
SOLUTION TO SELF-ASSESSMENT ACTIVITY 2.1
Table 2.1: Calculations for self assessment activity 2.1
x f fx
0 15 0
1 12 12
2 16 32
3 10 30
4 5 20
5 2 10
Total $\sum f$ = 60 $\sum fx$ = 104
1. Note that the figures in the third (fx) column have been formed by multiplying the corresponding figures in the
first two columns. From Formula 2.2, the mean number of breakdowns per week is:
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{104}{60} = 1.73$
(Reasonableness check: The data are very roughly balanced around 2, which is also the mode. A mean not too far
from 2 is therefore reasonable.)
2. Mean: $\bar{x} = \frac{22+29+27+30+12+22+31+15+26+16+48+23}{12} = \frac{301}{12} = 25.1$ minutes
Grouped Frequency Distribution
When using tabulated or grouped data from a frequency distribution, the individual values are not known. To
enable us to calculate the mean, we need to assume that the observations in a particular interval all take the same
value, and that value is the midpoint of the interval.
$\bar{x} = \frac{\sum fx}{\sum f}$
x = class midpoint
f = frequency of each class
n = number of observations in the sample = $\sum f$
Steps
Compute the midpoint (x) for each class.
Multiply each midpoint by the respective frequency of that class (fx) and sum the products ($\sum fx$).
Sum the frequency column, n = $\sum f$.
Divide $\sum fx$ by n.
Example
The times taken to complete a particular assembling task have been measured for 250 employees and the
results are shown below.
Time (min) No. of people (f) x fx
0 - 5 2 2.5 5.0
5 - 10 2 7.5 15.0
10 - 15 3 12.5 37.5
15 - 20 5 17.5 87.5
20 - 25 5 22.5 112.5
25 - 30 18 27.5 495.0
30 - 35 85 32.5 2 762.5
35 - 40 92 37.5 3 450.0
40 - 45 37 42.5 1 572.5
45 - 50 1 47.5 47.5
Total 250 8 585.0
The arithmetic mean time is: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{8585}{250} = 34.34$ min.
Activity
The times during working hours in a factory when a certain machine is not operating as a result of breakage are
recorded for a sample of 100 breakdowns and summarized in the following distribution. Find the mean of the
distribution.
Time (min) f
0 - 10 3
10 - 20 13
20 - 30 30
30 - 40 25
40 - 50 14
50 - 60 8
60 - 70 4
70 - 80 2
80 - 90 1
Total 100
2.3.2 The median for grouped data
As with the mean, we can get the median from grouped data as well. In this case we look at the cumulative
frequency.
There are 56 values in the table below, so the middle value will be value number 56 divided by 2 = 28. We want
to estimate what value number 28 was.
Mass (in kg) Frequency Cumulative
Frequency
45-49 6 6
50-54 14 20
55-59 25 45
60-64 11 56
At the end of the interval 45-49, we only have 6 values, so this is not at the median yet.
At the end of the 50-54 interval, we have 20 values, this is still short of the value 28 that we are looking for.
At the end of the interval 55-59, we have passed 45 values, this means we passed value number 28 as we
moved through the interval 55-59.
The median can be found from the following interpolation formula:
$\text{Median} = L_{Me} + \frac{c\,(n/2 - F_{Me-1})}{f_{Me}}$
where $L_{Me}$ is the lower limit of the median class. We said that the median class is the class 55-59. The lower
limit is the smallest value that will be rounded to this class, which is 54.5.
n is 56, the sum of the frequencies, so n/2 is 28.
$F_{Me-1}$ is the cumulative frequency of the class that precedes the median class, which is 20. (Make sure you can
see where this value comes from in the table of cumulative frequencies.)
$f_{Me}$ is the frequency of the median class, which is 25 (from the table).
c is the class width, which is 5. You can take 59 - 54 to get it, or you can take the actual class limits 59.5 minus
54.5.
Put these values into the formula and we get
$\text{Median} = 54.5 + \frac{5(28 - 20)}{25} = 54.5 + 1.6 = 56.1$
Check that this value is in fact in the class 55-59.
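The interpolation can also be written as a short Python sketch that locates the median class from the cumulative frequencies and then applies the formula. The function name `grouped_median` is an assumption for this illustration, and each row is assumed to hold the true class boundaries (for example 54.5 to 59.5 for the class 55-59) in ascending order.

```python
def grouped_median(classes):
    """Median of grouped data given (lower_boundary, upper_boundary, frequency) rows."""
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("the frequency table is empty")
    half = n / 2                       # the n/2-th value is the one we look for
    cumulative = 0                     # cumulative frequency before the current class
    for lower, upper, freq in classes:
        if cumulative + freq >= half:  # the n/2-th value lies in this class
            width = upper - lower      # class width c
            return lower + width * (half - cumulative) / freq
        cumulative += freq
    raise ValueError("no median class found")


# Mass table with class boundaries instead of the stated limits
mass_table = [(44.5, 49.5, 6), (49.5, 54.5, 14), (54.5, 59.5, 25), (59.5, 64.5, 11)]
print(round(grouped_median(mass_table), 1))   # 56.1
```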
Note: You have to know the basic structure of the formula. In this guide different letters will be used in the
formula. You must know the formula, not the symbols used to represent the different variables. What would
happen in the exam if the formula is given with different symbols, would you still be able to calculate the median?
As with the mean, the value for the median that you get from ungrouped data is more accurate. If you have the
data available (like when you do your research project) it is better to use the ungrouped data to get the median.
Calculation of Median for Grouped data
The median can be determined either graphically or by calculation. With grouped data we are unable to
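determine where the true middle value falls, but we can estimate the median by using a formula and assuming
that the median value will be the $\frac{n}{2}$th item.
$\text{Median} = L + \frac{c\left(\frac{n}{2} - F\right)}{f_m}$
L = lower boundary of the median class
F = sum of all the frequencies up to, but not including, the median class, or the cumulative frequency of the class preceding the median class
$f_m$ = frequency of the median class
c = width of the median class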
SELF-ASSESSMENT ACTIVITY 2.2
The time taken to complete an assembling task has been measured for 250 employees and the results are
shown below:
Time taken (min) Number of people (f) Cumulative
2.3.3 The mode from grouped data
The mode is the most common value. It corresponds to the highest point of the histogram, and this is the value that we want to estimate.
Mass (in kg) Frequency Cumulative
Frequency
45-49 6 6
50-54 14 20
55-59 25 45
60-64 11 56
The mode can be found by first deciding in what class it is and then using an interpolation formula.
From the table we see that the class (interval) with the highest frequency is the class 55-59 with a frequency of
25. So we say that the class 55-59 is the modal class.
The interpolation formula is
$\text{Mode} = L_{Mo} + \frac{c\,(f_{Mo} - f_{Mo-1})}{2f_{Mo} - f_{Mo-1} - f_{Mo+1}}$
$L_{Mo}$, the lower limit of the modal class, is 54.5,
$f_{Mo}$, the frequency of the modal class, is 25,
$f_{Mo-1}$, the frequency of the previous class, is 14,
$f_{Mo+1}$, the frequency of the next class, is 11,
and c, the class width, is 5.
Put these values into the formula to get
$\text{Mode} = 54.5 + \frac{5(25 - 14)}{2(25) - 14 - 11} = 54.5 + 2.2 = 56.7$
So the mode is 56.7.
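As a check on the arithmetic, the same interpolation can be sketched in Python. The function name `grouped_mode` is an assumption for this illustration. The sketch also assumes that a missing neighbouring class (when the modal class is first or last) counts as frequency 0, which the guide does not discuss.

```python
def grouped_mode(classes):
    """Estimate the mode from (lower_boundary, upper_boundary, frequency) rows."""
    freqs = [freq for _, _, freq in classes]
    i = freqs.index(max(freqs))              # first class with the highest frequency
    lower, upper, f_mo = classes[i]
    # Assumption: a neighbour that does not exist is treated as frequency 0.
    f_prev = freqs[i - 1] if i > 0 else 0
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0
    c = upper - lower                        # class width
    return lower + c * (f_mo - f_prev) / (2 * f_mo - f_prev - f_next)


# Mass table with class boundaries instead of the stated limits
mass_table = [(44.5, 49.5, 6), (49.5, 54.5, 14), (54.5, 59.5, 25), (59.5, 64.5, 11)]
print(round(grouped_mode(mass_table), 1))   # 56.7
```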
Later a different formula will be given where $d_1 = f_{Mo} - f_{Mo-1}$, so again make sure that you are not confused if the formula looks different; it is the same formula. Remember, if a lecturer uses a formula that looks slightly
different, it is up to you as a master's-level student to check that it is still the same formula.
Unlike the median and the mean, the value we get for the mode is more accurate from grouped data. So
whenever possible calculate the mode from the grouped data.
Calculation of the mode from a grouped frequency distribution.
It is not possible to calculate the exact value of the mode of the original data in a grouped frequency distribution,
since information is lost when the data are grouped. However, it is possible to make an estimate of the mode.
The class interval with the largest frequency is called the modal class.
(Note: The following formula looks different. Does it give the same answer?)
Mode = $L + \frac{c\,d_1}{d_1 + d_2}$
Where:
L = lower limit of the modal class.
$d_1$ = frequency of the modal class minus the frequency of the immediately preceding class.
$d_2$ = frequency of the modal class minus the frequency of the class that immediately follows the modal class.
c = the length of the class interval of the modal class.
Steps
Select the class containing the highest frequency as the modal class.
Use the formula to estimate the modal value.
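To answer the question in the note above, the following short derivation (written in the symbols of section 2.3.3) shows that the two versions of the mode formula agree, and checks this on the mass table.

```latex
\begin{align*}
d_1 + d_2 &= (f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1}) = 2f_{Mo} - f_{Mo-1} - f_{Mo+1} \\
\text{Mode} &= L + \frac{c\,d_1}{d_1 + d_2}
             = L_{Mo} + \frac{c\,(f_{Mo} - f_{Mo-1})}{2f_{Mo} - f_{Mo-1} - f_{Mo+1}} \\
\text{Mass table: } d_1 &= 25 - 14 = 11, \quad d_2 = 25 - 11 = 14, \quad
\text{Mode} = 54.5 + \frac{5 \times 11}{11 + 14} = 56.7
\end{align*}
```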
Activity
The number of times during working hours in a factory when a certain machine is not operating as a result of
breakage are recorded for a sample of 100 breakdowns and summarized in the following distribution. Find the
mode of the distribution.
Time (min) f
0 - 10 3
10 - 20 13
20 - 30 30
30 - 40 25
40 - 50 14
50 - 60 8
60 - 70 4
70 - 80 2
80 - 90 1
Total 100
Solution
The interval having the highest frequency, namely 30, is the 3rd interval: (20 - 30).
$\text{Mode} = L + \frac{c\,d_1}{d_1 + d_2} = 20 + \frac{10(30 - 13)}{(30 - 13) + (30 - 25)} = 20 + \frac{10(17)}{17 + 5} = 20 + \frac{170}{22} = 20 + 7.73 = 27.73$ min
+=
We used 20 as the lower limit, because if you look at the table you will see that the data are continuous and the
values are not rounded off. 19.999 would be in the class 10 to 20, while 20.00001 would be in the class 20 to 30.
2.4 The Best Average/Symmetry
The different averages have different advantages and disadvantages, and there are no objective criteria that
determine the most representative average of all data sets. Each researcher has to use his/her own discretion on
a set of data.
The mean is the most familiar average. It exists for each data set, takes every score into account, is affected by
extreme scores, and works well with many statistical methods.
The median is commonly used. It always exists, does not take every score into account, is not affected by
extreme scores, and is often a good choice if there are some extreme scores in the data set.
The mode is sometimes used. It might not exist, or there may be more than one mode. It does not take every
score into account, is not affected by extreme scores, and is appropriate for data at the nominal level.
The best measure for central location
The arithmetic mean is more affected by extreme values. If your data has some values that are very large or
small (relative to the other values) then it is better to use the median. When we get to the normal distribution in a
later unit, you will see why the arithmetic mean is important.
Skewness
If there are large extreme values in your data the mean will be pulled to the right and we say that the distribution
is positively skewed.
For a symmetrical distribution the mean, median and mode will be about the same:
$\bar{x} = \text{Median} = \text{Mode}$
If we measure the mass or height of people it is usually a symmetrical (or normal) distribution. IQs or test results
are also usually from a normal distribution.
A histogram of a symmetrical distribution is given in the following figure:
For a distribution that is skewed to the right the mode will be less than the median and the median will be less
than the mean.
$\text{Mode} < \text{Median} < \bar{x}$
A histogram that is skewed to the left (negatively skewed) is shown in the following figure:
As a general rule the difference between the median and the mode is about twice the difference between the
mean and the median.
If the data are skewed to the left there are some outliers on the left (small values). If the data are skewed to the
right then there are some large outliers.
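For the mass data, the mean is 55.66, the median is 56.1 and the mode is 56.7. We thus have $\bar{x} < \text{Median} < \text{Mode}$, which indicates a distribution that is skewed to the left.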
A comparison of the mean and median can reveal information about skewness. Data can be identified as
skewed to the left, symmetric, skewed to the right. Data skewed to the left will have the mean and median to the
left of the mode, but in unpredictable order, as illustrated below:
The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution
Zero skewness: Mean = Median = Mode
The Relative Positions of the Mean, Median, and Mode: Right Skewed Distribution
Positively skewed: Mean > Median > Mode
Negatively skewed: Mean < Median < Mode
2.5 Box plots
The box plot (box-and-whisker diagram) is a part of exploratory data analysis and reveals more information about
how the data is spread. The construction of a box plot requires the minimum, the maximum, the median, and two
other values called hinges.
Definition 1: The minimum score, the maximum score, the median, and two hinges constitute a 5-
number summary of a set of data.
Definition 2: The lower hinge is the median of the lower half of all scores (from the minimum score up
to the original median).
Definition 3: The upper hinge is the median of the upper half of all scores (from the original median up
to the maximum score).
1. Arrange the data in ascending order.
2. Find the median.
3. List the lower half of the data from the minimum score up to and including the median found in step 2. The
left hinge is the median of these scores (This value is called the first quartile).
4. List the upper half of the data starting with the median and including it in the scores up to and including the
maximum. The right hinge is the median of these scores. (This is called the third quartile).
5. List the minimum, the left hinge (from step 3), the median (from step 2), the right hinge (from step 4), and the
maximum.
Example. Construct the box plot for the following 20 scores:
9, 8, 6, 12, 4, 15, 7, 16, 8, 6, 13, 5, 9, 16, 4, 2, 6, 15, 9, 3
Arranging in increasing order, the list is:
2, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8, 9, 9, 9, 12, 13, 15, 15, 16, 16.
The lower half, after finding the median score 8 and including it, is:
2, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8. The median of these scores is 6.
The upper half including the median of 8 is:
8, 8, 9, 9, 9, 12, 13, 15, 15, 16, 16. The median of these scores is 12.
The minimum score is 2, the maximum score is 16, the median is 8, the left hinge is 6, and the right hinge is 12.
To construct the box plot, begin with a horizontal (vertical) scale. Box the hinges as shown and extend the lines
to connect the minimum score to a hinge and the maximum score to a hinge.
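[Figure: box plot drawn on a scale from 0 to 20. The whiskers run from the minimum 2 to the maximum 16, the box spans the hinges 6 and 12, and the median is marked at 8.]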
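The 5-number summary for the 20 scores can be reproduced with the Python sketch below. The function names are assumptions for this illustration. The sketch interprets "up to and including the median" as "all scores less than or equal to the median" (and likewise "greater than or equal to" for the upper half), which is an assumption that matches the halves listed in the example.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(scores):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    ordered = sorted(scores)
    med = median(ordered)
    # Assumption: each half includes every score equal to the median.
    lower_half = [x for x in ordered if x <= med]
    upper_half = [x for x in ordered if x >= med]
    return (ordered[0], median(lower_half), med, median(upper_half), ordered[-1])


scores = [9, 8, 6, 12, 4, 15, 7, 16, 8, 6, 13, 5, 9, 16, 4, 2, 6, 15, 9, 3]
print(five_number_summary(scores))   # (2, 6, 8.0, 12, 16)
```

Other textbooks and software packages define quartiles slightly differently, so their values can differ a little from the hinges computed here.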
Unit 2 Exercises: (Solutions are found at the end of the module guide)
Exercise 2.1
A supermarket sells kilogram-bags of pears. The numbers of pears in 21 bags were:
7 9 8 8 10 9 8 10 10 8 9
10 7 9 9 9 7 8 7 8 9
a) Find the mode, median and mean for these data.
b) Compare your results and comment on the likely shape of the distribution.
c) Plot a simple bar chart to portray the data.
Exercise 2.2
The numbers of credit cards carried by 25 shoppers are:
2 5 2 0 4 3 0 1 1 7 1 4 1
3 9 4 1 4 1 5 5 2 3 1 1
a) Determine the mode and median of this distribution.
b) Calculate the mean of the distribution and compare it to the mode and median.
What can you conclude about the shape of the distribution?
c) Draw a bar chart to represent the distribution and confirm your conclusions in (b).
Exercise 2.3
A supermarket has one checkout for customers who wish to purchase 10 items or less.
The numbers of items presented at this checkout by 19 customers were:
10 8 7 7 6 11 10 8 9 9
9 6 10 9 8 9 10 10 10
a) Find the mode, median and mean for these data.
b) What do your results for (a) tell you about the shape of the distribution?
c) Plot a simple bar chart to portray the distribution.
Exercise 2.4
The numbers of driving tests taken to pass by 28 clients of a driving school are given in the following table:
Tests taken Number of clients
1 10
2 8
3 4
4 3
5 3
a) Obtain the mode, median and mean from this frequency distribution and compare their values.
b) Plot a simple bar chart of the distribution.
Exercise 2.5
Spina Software Solutions operates an on-line help and advice service for PC owners. The numbers of
calls made to them by subscribers in a month are tabulated below.
Number of subscribers
Calls made Female Male
1 31 47
2 44 42
3 19 24
4 6 15
5 1 4
Find the mode, median and mean for both distributions and use them to compare the two distributions.
Exercise 2.6
Toofley the chemists own 29 pharmacies. The numbers of packets of a new skin medication sold in each of their
shops in a week were:
7 22 17 13 11 20 15 18 5 22
6 18 10 13 33 13 9 8 9 19
19 8 12 12 21 20 12 13 22
a) Find the mode and range of the data.
b) Identify the median of the data.
c) Find the lower and upper quartile values.
d) Determine the semi-interquartile range.
Exercise 2.7
Voditel international owns a large fleet of company cars. The mileages, in thousands of miles, of a sample of 17
of their cars over the last financial year were:
11 31 27 26 27 35 23 19 28 25
15 36 29 27 26 22 20
Calculate the mean and standard deviation of these mileage figures.
Exercise 2.8
Three credit companies each produced an analysis of its customers' bills over the last month. The following
results have been published:
Company Mean bill size Standard deviation of bill size
Akula N$559 N$172
Bremia N$612 N$147
Dolg N$507 N$161
Are the following statements true or false?
a) Dolg bills are on average the smallest and vary more than those from the other companies.
b) Bremia bills are on average the largest and vary more than those from other companies.
c) Akula bills are on average larger than those from Dolg and vary more than those from Bremia.
d) Akula bills are on average smaller than those from Bremia and vary less than those from Dolg.
e) Bremia bills are on average larger than those from Akula and vary more than those from Dolg.
f) Dolg bills vary less than those from Akula and are on average less than those from Bremia.
Exercise 2.9
The Kilocalories per portion in a sample of 32 different breakfast cereals were recorded and collated into the
following grouped frequency distribution:
Kcal per portion Frequency
80 up to 120 3
120 up to 160 11
160 up to 200 9
200 up to 240 7
240 up to 280 2
a) Obtain an approximate value for the median of the distribution.
b) Calculate approximate values for the mean and standard deviation of the distribution.
Exercise 2.10
The stem and leaf display below shows the Friday night admission prices for 31 clubs.
Stem Leaves
0 44
0 5555677789
1 000224444
1 5555588
2 002
Leaf unit =N$1
Find the values of the median and semi-interquartile range.
Exercise 2.11
Select which of the statements on the right-hand side best defines the words on the left-hand side.
(i) median (a) the square of the standard deviation
(ii) range (b) a diagram based on order statistics
(iii) variance (c) the most frequently occurring value
(iv) boxplot (d) the difference between the extreme observations
(v) SIQR (e) the middle value
(vi) mode (f) half the difference between the first and third quartiles
Student self review questions
1) What is a measure of location?
2) How is the arithmetic mean defined?
3) Why is the special notation $x_1, x_2, \ldots, x_n$ used?
4) What does $\sum fx$ mean?
5) Why is the formula for the arithmetic mean of a frequency distribution different to that for the mean of a
set?
6) How is it that the mean of a grouped frequency distribution cannot be calculated exactly?
7) In what situation would a weighted mean be used?
8) Why is the mean considered to be the mathematical average?
9) What is the main disadvantage of the mean?
10) How is the mode defined?
11) Why is the mode not used extensively in statistical analysis?
12) Under what conditions may any one of the mean, median or mode be estimated, given the values of the
other two?
13) Write down the definition of the geometric mean and the type of values that it can be used to average.
14) Write down the definition of the harmonic mean and type of values that it can be used to average.
15) How is the median defined?
16) If a set has an even number of items, how can the median be determined?
17) Describe briefly how to estimate the median of a grouped frequency distribution graphically.
18) What is the graphical equivalent of the interpolation formula?
19) On balance, why is the graphical method preferred to the formula method for estimating the median?
20) Name two separate conditions under which the median rather than the mean would be chosen as a
measure of location and explain why.
21) What is the main disadvantage of the median?
22) What characteristic of the mean deviation precludes it from being the natural partner to the mean?
23) How is the standard deviation defined?
24) What is the practical advantage in using the computational formula for calculating the standard deviation?
25) The standard deviation is the natural partner to the mean. Explain why this is so.
26) What percentage of an approximately symmetric distribution lies within two standard deviations from the
mean?
27) What is the coefficient of variation and how is it used?
28) How is Pearson's measure of skewness calculated and how does it measure skewness?
29) What is the variance and why is it not used for practical purposes as a measure of dispersion?
UNIT 3
MEASURE OF DISPERSION (VARIABILITY)
UNIT 3: MEASURE OF DISPERSION (VARIABILITY)
OBJECTIVES
By the end of this study unit, you should be able to:
Define the various measures of dispersion.
Compute each dispersion measure for both grouped and ungrouped sets of data.
Interpret each measure of dispersion.
CONTENT
3.1 Introduction
3.2 Range
3.3 Standard deviation
3.4 Variance
3.5 Coefficient of variation
3.6 Measure of non-central position
3.7 Self-evaluation
3.1 Introduction
For two projects A and B, we estimate the returns on the projects over the next year. We look at the percentage
return that will be achieved under different conditions (pessimistic, normal or optimistic).
Pessimistic Normal Optimistic
Project A 12 13 14
Project B 0 13 26
Should the company invest in Project A or Project B if the probabilities that the pessimistic, normal or optimistic
conditions will prevail are equal?
For Project A the mean is $\bar{x} = (12 + 13 + 14)/3 = 13\%$.
For Project B the mean is $\bar{x} = (0 + 13 + 26)/3 = 13\%$.
The mean returns for the projects are equal. That means the expected returns for the projects are equal. Would
you prefer Project A, where your minimum return is 12% or Project B, where you could make no return at all
(0%)? You will do a course on Finance as part of the MBA. In this course you will learn that you have to select
the project that is more predictable (you want to maximize your return, but at the same time you want to minimize
your risk). The returns for the projects are the same, but Project A is more predictable. In statistics we need
measures to measure this spread. For the example above (with only three values) it is easy to see that Project B
has a wider spread of returns, but what happens if we have hundreds of values?
The variability among data is one characteristic to which averages are not sensitive. Consider the following two
groups of data:
Group A Group B
65 42
66 54
67 58
68 62
71 67
73 77
74 77
77 85
77 93
77 100
Computed Averages:
Group A
Mean = 715/10 = 71.5
Median = 72
Mode = 77
Group B
Mean = 715/10 = 71.5
Median = 72
Mode = 77
Interpretation
Although there is no difference in the computed central measures between the two groups, the scores of Group
B are much more widely scattered than the scores for Group A.
SELF-ASSESSMENT ACTIVITY 3.1
Which types of measures are used to measure dispersion (variability)?
SOLUTION TO SELF-ASSESSMENT ACTIVITY 3.1
The measures that are used to measure dispersion are:
Range
Standard deviation
Interquartile range
Quartile deviation
Variance
The method of computation, appropriate data types, uses and interpretation of each are now described.
3.2 Range
The first measure is the range. This is merely the biggest value minus the smallest value. For Project A above it
is 14 - 12 = 2%, while for Project B it is 26 - 0 = 26%. The problem with this measure is that it looks at only
two observations; we would rather have a measure that uses all the values.
The range is simply the difference between the highest value and the lowest value. For Group A, the range is
77 - 65 = 12, and the range for Group B is 100 - 42 = 58, which suggests greater dispersion. The range depends
only on the maximum and minimum scores, and is a rough measure of spread.
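The figures for Group A and Group B can be checked with Python's standard `statistics` module. This is a minimal sketch; the helper name `summarise` is an assumption for this illustration.

```python
import statistics

group_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
group_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]


def summarise(scores):
    """Return the three central measures and the range of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores),   # highest value minus lowest value
    }


print(summarise(group_a))   # mean 71.5, median 72.0, mode 77, range 12
print(summarise(group_b))   # mean 71.5, median 72.0, mode 77, range 58
```

The central measures are identical for the two groups; only the range separates them.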
Ungrouped data: Range = Maximum value - Minimum value = $x_{max} - x_{min}$
Grouped data: Range = Upper limit of highest class - Lower limit of lowest class.
SELF-ASSESSMENT ACTIVITY 3.2
The merchandising manager for a retail clothing chain has recorded 30 observations on the number of days
between re-orders for a particular range of women's clothing.
The re-order intervals (in days) are:
18 26 15 17 7 27 24 17 10 17
23 29 28 18 10 23 16 9 12 26
5 12 23 22 24 14 16 26 19 22
Find the range of the number of days between re-orders.
SOLUTION TO SELF-ASSESSMENT ACTIVITY 3.2
$x_{max}$ = 29
$x_{min}$ = 5
Range = 29 - 5 = 24 days
Interpretation
24 days separate the shortest time ($x_{min}$) between successive re-orders from the longest time ($x_{max}$) between
successive re-orders for a particular range of women's clothing. The range depends only on the minimum and
maximum scores.
3.3 Standard deviation
The standard deviation is given by the formula:
$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$.
For the following data, with the probabilities of the outcomes assumed equal, the standard deviation is calculated
as:
Pessimistic Normal Optimistic
Project A 12 13 14
Project B 0 13 26
For Project A the standard deviation is $S = \sqrt{\frac{(12-13)^2 + (13-13)^2 + (14-13)^2}{3 - 1}} = \sqrt{\frac{2}{2}} = 1$.
For Project B the standard deviation is $S = \sqrt{\frac{(0-13)^2 + (13-13)^2 + (26-13)^2}{3 - 1}} = \sqrt{\frac{338}{2}} = 13$.
We see that the standard deviation for Project B is 13 times as large as the standard deviation for Project A.
On Excel the standard deviation for project A can be found by placing =stdev(12,13,14) in a cell.
In the exam you do not have Excel, so you will have to use a calculator. Most calculators can calculate the
statistical functions.
1. Put the calculator on Stat mode.
2. Enter 12
3. Press the DATA button (usually the M+ button).
4. The calculator displays 1, this means that you have entered one value.
5. Enter 13 and press DATA, the calculator displays 2.
6. Enter 14 and press DATA, the calculator displays 3.
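7. Now ask for $\bar{x}$ (it is usually second function 4) and the calculator will display 13.
8. Ask for $S_{n-1}$, the sample standard deviation (some calculators show this as $\sigma_{n-1}$), and the calculator will
display 1. If you ask for $S_n$ instead (usually second function 6), the calculator divides by n rather than n - 1 and
displays 0.82, which is the population standard deviation.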
Try this for Project B to see that you are doing it correctly.
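If Python is available, the two projects can be checked with the standard `statistics` module. In this sketch `stdev` divides by n - 1 (the sample formula used in this section), while `pstdev` divides by n (the population version).

```python
import statistics

project_a = [12, 13, 14]
project_b = [0, 13, 26]

# Sample standard deviation: divides by n - 1
print(statistics.stdev(project_a))              # 1.0
print(statistics.stdev(project_b))              # 13.0

# Population standard deviation: divides by n
print(round(statistics.pstdev(project_a), 2))   # 0.82
```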
In Unit 5 we will come back to this. At this stage we can state that about two thirds of the values fall within one
standard deviation from the mean. For the mass data of Table 1 (mean 55.66 kg, 56 values) the quantity under the
square root in the grouped-data calculation is 20.53, so the standard deviation is $\sqrt{20.53} = 4.53$ kg. About two
thirds (about 37) of the values therefore fall between 55.66 - 4.53 = 51.13 and 55.66 + 4.53 = 60.19. This gives us an
indication of how far the values are from the mean (the central value).
In Corporate Finance the risk (uncertainty) is often measured with the standard deviation. They often say that
the risk is 4.53, but to be correct they should say that the standard deviation is 4.53.
3.3.1 Ungrouped data
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ Mathematical formula.
or
$s = \sqrt{\frac{n\sum x^2 - (\sum x)^2}{n(n-1)}}$ Computational formula.
Steps (Mathematical formula)
1. Compute the arithmetic mean ($\bar{x}$).
2. Subtract the mean from each data value: $(x - \bar{x})$.
3. Square each difference: $(x - \bar{x})^2$.
4. Sum the squared differences: $\sum (x - \bar{x})^2$.
5. Calculate the average by dividing the sum by (n - 1). Division by (n - 1) is to correct the bias in estimating
the population standard deviation using the sample standard deviation.
6. The standard deviation is the square root of this average.
Example
Find the standard deviation of the following sample scores: 2, 3, 5, 6, 9, 17
x $(x - \bar{x})$ $(x - \bar{x})^2$
2 -5 25
3 -4 16
5 -2 4
6 -1 1
9 2 4
17 10 100
$\sum x$ = 42 $\sum (x - \bar{x})$ = 0 $\sum (x - \bar{x})^2$ = 150
Mean: $\bar{x} = \frac{42}{6} = 7$
Using the mathematical formula for the ungrouped data, the standard deviation is:
$s = \sqrt{\frac{150}{6 - 1}} = \sqrt{\frac{150}{5}} = \sqrt{30} = 5.5$
We will now use the computational formula for the example above.
From the previous table, the sum of x is: $\sum x = 42$.
The sum of the squares is: $\sum x^2$ = 4 + 9 + 25 + 36 + 81 + 289 = 444.
Thus the standard deviation is:
$s = \sqrt{\frac{n\sum x^2 - (\sum x)^2}{n(n-1)}} = \sqrt{\frac{6(444) - (42)^2}{6(6-1)}} = \sqrt{\frac{2664 - 1764}{30}} = \sqrt{\frac{900}{30}} = \sqrt{30} = 5.5$
The answer is identical to the result calculated previously.
Check whether you get the same answer if you use the statistics function on the calculator.
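The agreement between the mathematical and the computational formula can also be checked in Python. This is a minimal sketch; the function names `sd_mathematical` and `sd_computational` are assumptions for this illustration.

```python
import math


def sd_mathematical(xs):
    """Sample standard deviation from the squared deviations about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def sd_computational(xs):
    """Sample standard deviation from the sum of x and the sum of x squared."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


scores = [2, 3, 5, 6, 9, 17]
print(round(sd_mathematical(scores), 2))    # 5.48
print(round(sd_computational(scores), 2))   # 5.48
```

Both functions return the square root of 30, which is 5.48 to two decimals (5.5 to one decimal, as in the worked example).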
3.3.2 Grouped data
If the actual raw data are not available and we have to calculate the standard deviation from the grouped data,
we use the formula:
$$ S = \sqrt{\frac{\sum f x^2 - n\bar{x}^2}{n - 1}} $$
Table 1
Mass (in kg)    Class midpoint ($x$)    Frequency ($f$)    $fx^2$
45-49           47                      6                  6 × 47 × 47 = 13254
50-54           52                      14                 14 × 52 × 52 = 37856
55-59           57                      25                 25 × 57 × 57 = 81225
60-64           62                      11                 11 × 62 × 62 = 42284
In Unit 2 (see 2.2.1) we calculated the mean as 55.66 kg and we saw that the total frequency is 56.
To get $\sum fx^2$, we add the entries in the last column: 13254 + 37856 + 81225 + 42284 = 174 619.
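$$ S = \sqrt{\frac{\sum f x^2 - n\bar{x}^2}{n - 1}} = \sqrt{\frac{174619 - 56(55.66)^2}{56 - 1}} = \sqrt{20.53} \approx 4.53 $$
Note that 20.53 is the quantity under the square root, that is, the variance $S^2$. The standard deviation is its square root, about 4.53 kg.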
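The grouped-data calculation for Table 1 can be reproduced as follows. This is a sketch under stated assumptions: the variable names are illustrative, and the mean is rounded to two decimals (55.66) to match the figure carried over from Unit 2. Note that 20.53 is the value under the square root (the variance); the standard deviation itself is about 4.53.

```python
import math

# Table 1: class midpoints (kg) and class frequencies
midpoints = [47, 52, 57, 62]
frequencies = [6, 14, 25, 11]

n = sum(frequencies)                                               # total frequency
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))        # sum of f*x
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))   # sum of f*x^2

mean = round(sum_fx / n, 2)        # rounded to 55.66, as in the guide (assumption)

variance = (sum_fx2 - n * mean ** 2) / (n - 1)   # quantity under the square root
sd = math.sqrt(variance)                         # standard deviation

print(n, sum_fx2, mean, round(variance, 2), round(sd, 2))
```

If the mean is left unrounded, the results come out slightly lower (a variance of about 20.45 and a standard deviation of about 4.52), which shows how sensitive this formula is to rounding of the mean.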
If data have been grouped into a frequency distribution, each class is represented by its midpoint ($x$).
$$ s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} $$
Mathematical formula
Steps
1. Compute the arithmetic mean ($\bar{x}$).
2. Subtract the mean from each midpoint and square the difference: $(x - \bar{x})^2$.
3. Multiply the squared difference by the frequency within each class: $(x - \bar{x})^2 f$.
4. Sum the results to obtain the total squared deviation from the mean: $\sum (x - \bar{x})^2 f$.
5. Calculate the average by dividing this total by $(n - 1)$.
6. The standard deviation is the square root of this average.
OR
$$ s = \sqrt{\frac{n \sum f x^2 - \left(\sum f x\right)^2}{n(n - 1)}} $$
Computational formula
x = class mark (midpoint of class interval)
f = frequency
n = sample size
SELF-ASSESSMENT ACTIVITY 3.3
The errors in seven invoices were recorded as follows: 120, 30, 40, 8, 5, 20, 29
Use this data to calculate the standard deviation using both the Mathematical formula and Computational
formula.
SOLUTION TO SELF-ASSESSMENT ACTIVITY 3.3
Mathematical formula
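$x$      $x - \bar{x}$      $(x - \bar{x})^2$
120      84                 7 056
30       -6                 36
40       4                  16
8        -28                784
5        -31                961
20       -16                256
29       -7                 49
$\sum x = 252$      $\sum (x - \bar{x}) = 0$      $\sum (x - \bar{x})^2 = 9\,158$
$$ \bar{x} = \frac{252}{7} = 36 $$
$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{9158}{7 - 1}} = 39.07 $$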
Computational formula
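$x$      $x^2$
120      14400
30       900
40       1600
8        64
5        25
20       400
29       841
$\sum x = 252$      $\sum x^2 = 18\,230$
$$ s = \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}} = \sqrt{\frac{7(18230) - (252)^2}{7(7 - 1)}} = \sqrt{\frac{127610 - 63504}{42}} = \sqrt{\frac{64106}{42}} = \sqrt{1526.333} = 39.07 $$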
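The solution to Self-Assessment Activity 3.3 can be cross-checked in a few lines. The following Python sketch is illustrative (the variable names are assumptions); it evaluates both formulas on the seven invoice errors and shows that they give the same standard deviation.

```python
import math

errors = [120, 30, 40, 8, 5, 20, 29]    # errors recorded in the seven invoices
n = len(errors)

# Mathematical formula: based on deviations from the mean
mean = sum(errors) / n
sum_sq_dev = sum((x - mean) ** 2 for x in errors)
s_mathematical = math.sqrt(sum_sq_dev / (n - 1))

# Computational formula: based on the sums of x and x squared
sum_x = sum(errors)
sum_x2 = sum(x * x for x in errors)
s_computational = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(mean, sum_sq_dev, sum_x, sum_x2)
print(round(s_mathematical, 2), round(s_computational, 2))
```

The intermediate totals (252, 9 158 and 18 230) match the tables above, which is a useful way to locate an arithmetic slip when a hand calculation disagrees.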
SELF-ASSESSMENT ACTIVITY 3.4
The times (in hours per week) that 50 office staff members spent using personal computers were as follows:
Time (hours/week)    Frequency ($f$)
0 - 3                14
3 - 6                6
6 - 9                6
9 - 12               7
12 - 15              14
15 - 18              3
                     $\sum f = 50$
Use this data to compute the standard deviation using both Mathematical and Computational formulae.
SOLUTION TO SELF-ASSESSMENT ACTIVITY 3.4
Mathematical formula approach
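Time (h)    $f$    $x$     $fx$     $(x - \bar{x})^2$    $(x - \bar{x})^2 f$
0 - 3       14     1.5     21       43.56                609.84
3 - 6       6      4.5     27       12.96                77.76
6 - 9       6      7.5     45       0.36                 2.16
9 - 12      7      10.5    73.5     5.76                 40.32
12 - 15     14     13.5    189      29.16                408.24
15 - 18     3      16.5    49.5     70.56                211.68
Total       $\sum f = 50$      $\sum fx = 405$      $\sum (x - \bar{x})^2 f = 1\,350.00$
Mean: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{405}{50} = 8.1$ h.
Standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{1350}{50 - 1}} = 5.25$ h.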
Computational formula approach
Time (h)    $f$    $x$     $fx$     $x^2$     $fx^2$
0 - 3       14     1.5     21       2.25      31.5
3 - 6       6      4.5     27       20.25     121.5
6 - 9       6      7.5     45       56.25     337.5
9 - 12      7      10.5    73.5     110.25    771.75
12 - 15     14     13.5    189      182.25    2551.5
15 - 18     3      16.5    49.5     272.25    816.75
Total       $\sum f = 50$    $\sum x = 54$    $\sum fx = 405$    $\sum x^2 = 643.5$    $\sum fx^2 = 4630.5$
$$ s = \sqrt{\frac{n \sum f x^2 - \left(\sum f x\right)^2}{n(n - 1)}} = \sqrt{\frac{50(4630.5) - (405)^2}{50(50 - 1)}} = \sqrt{\frac{231525 - 164025}{2450}} = \sqrt{\frac{67500}{2450}} = \sqrt{27.55} \approx 5.249 \approx 5.25 \text{ hours} $$
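Both grouped-data formulas used in Self-Assessment Activity 3.4 can be wrapped in small functions and compared. This is a minimal sketch; the function names `grouped_sd_mathematical` and `grouped_sd_computational` are hypothetical and introduced only for illustration.

```python
import math

def grouped_sd_mathematical(midpoints, frequencies):
    """Grouped-data standard deviation from frequency-weighted squared deviations."""
    n = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    total = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))
    return math.sqrt(total / (n - 1))

def grouped_sd_computational(midpoints, frequencies):
    """Grouped-data standard deviation from the sums of f*x and f*x^2."""
    n = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

# Hours per week spent on personal computers by 50 office staff members
midpoints = [1.5, 4.5, 7.5, 10.5, 13.5, 16.5]
frequencies = [14, 6, 6, 7, 14, 3]

s_math = grouped_sd_mathematical(midpoints, frequencies)
s_comp = grouped_sd_computational(midpoints, frequencies)
print(round(s_math, 2), round(s_comp, 2))
```

The computational version needs only two running totals, which is why it is usually the quicker route when working by hand or with a basic calculator.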
3.4 Variance
The variance is the square of the standard deviation.
The variance for Project A is $1^2 = 1$, and for Project B it is $13^2 = 169$.
Computation for ungrouped data
Example:
Consider the ages (in years) of 7 second-hand cars: 13, 7, 10, 15, 12, 18, 9
Age in years ($x$)    $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
13                    12           +1               1
7                     12           -5               25
10                    12           -2               4
15                    12           +3               9
12                    12           0                0
18                    12           +6               36
9                     12           -3               9
Total                              $\sum (x - \bar{x}) = 0$    $\sum (x - \bar{x})^2 = 84$
Step 1:
Find the sample mean: $\bar{x} = \frac{\sum x}{n} = \frac{84}{7} = 12$ years.
Step 2:
Find the squared deviation of each observation from the sample mean.
Since $\sum (x - \bar{x}) = 0$ (column 3 above), each deviation must first be squared so that the positive and negative deviations do not cancel each other. These squared deviations are then summed (see column 4 above).
Step 3:
Compute the variance by dividing the total squared deviation by $(n - 1)$, i.e.
$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{84}{7 - 1} = \frac{84}{6} = 14 $$
The formula for the variance can now be expressed as:
$$ \text{Variance} = \frac{\text{sum of squared deviations}}{\text{sample size} - 1} $$
$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$
Mathematical formula
The mathematical formula above is laborious to apply by hand, because every deviation from the mean must be computed and
squared individually. A more efficient computational approach is strongly recommended for students.
$$ s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} $$
Computational formula
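The mathematical and computational formulas for the variance are algebraically identical. The short derivation below is added for illustration and is not part of the original guide; it uses only the definition $\bar{x} = \sum x / n$, so that $\sum x = n\bar{x}$.

```latex
\begin{align*}
\sum (x - \bar{x})^2
  &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
  &= \sum x^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum x^2 - n\bar{x}^2
\end{align*}
```

Dividing both sides by $(n - 1)$ turns the mathematical formula into the computational one. For the car ages, $\sum x^2 - n\bar{x}^2 = 1092 - 7(12^2) = 84$, the same total squared deviation found in the table.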
Example
The variance for the car age problem, using the computational variance formula.
Age of car in years ($x$)    $x^2$
13                           169
7                            49
10                           100
15                           225
12                           144
18                           324
9                            81
$\sum x = 84$                $\sum x^2 = 1\,092$
$$ s^2 = \frac{1092 - (7)(12^2)}{7 - 1} = \frac{84}{6} = 14 $$
where $n = 7$ and $\bar{x} = \frac{84}{7} = 12$ years.
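The car age example can be verified with a short script. The sketch below is illustrative: the variable names are assumptions, and Python's standard `statistics.variance` is used only as an independent check on the two hand formulas.

```python
import statistics

ages = [13, 7, 10, 15, 12, 18, 9]    # ages (years) of the 7 second-hand cars
n = len(ages)
mean = sum(ages) / n

# Mathematical formula: sum of squared deviations divided by (n - 1)
var_mathematical = sum((x - mean) ** 2 for x in ages) / (n - 1)

# Computational formula: (sum of x^2 - n * mean^2) divided by (n - 1)
var_computational = (sum(x * x for x in ages) - n * mean ** 2) / (n - 1)

# Library check: sample variance, which also divides by (n - 1)
var_library = statistics.variance(ages)

print(mean, var_mathematical, var_computational, var_library)
```

All three results should equal 14, and taking the square root of the variance gives the standard deviation of the car ages.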