Lec 03_Describing Data(1)
kenneth-woong -
Measures of Central Tendency
and Dispersion
RECOMMENDED READING
Customised Text, Adapted from ‘Statistical Techniques in Business
& Economics by Lind, Marchal 16th Edition’ McGraw Hill
Chapter 3, Page 50 – 92, excluding Geometric Mean, Chebyshev’s
Theorem
Menu
Central Tendency
Measures of Central Tendency
Measures of Central Tendency for Ungrouped Data
Measures of Central Tendency for Grouped Data
Properties of Measures of Central Tendency
Relationship between mean, median and mode
Dispersion
Measures of Dispersion
Range
Variance and Standard Deviation – Ungrouped Data
Variance and Standard Deviation – Grouped Data
Coefficient of Variation
Measures of Central Tendency
Objectives
By the end of this lecture, you should be able to:
• Calculate the mean, median and mode of a data set
• Compare the strengths and weaknesses of these measures of central tendency
• Describe the relationship between these measures of central tendency
Basic Terms and Definition
Measure of Central Tendency
A single value that summarizes a set of data
It locates the centre of the values
Also called: Measures of Central Location
What is the function of these measures?
Describe and summarize a data set
Enable comparisons between data sets
Types of Measures of Central Tendency
Mean: the average
Median: the middle most value
Mode: the most frequently occurring value
Un-Grouped Data
Mean: ungrouped data
Population Mean μ (pronounced as mu)
$$\mu = \frac{\text{sum of values of all observations in the population}}{N} = \frac{\sum X}{N}$$
where
N = number of items in the population
i.e. N = population size
Σ = Sum (means add up)
ΣX = add up all the X values
Mean: ungrouped data
Sample Mean x̄ (pronounced as x-bar)
$$\bar{x} = \frac{\text{sum of values of all observations in a sample}}{n} = \frac{\sum x}{n}$$
where n = number of items in the sample
i.e. n = sample size
Note
• the population parameters are represented by Greek letters while
the sample statistics are represented by Roman letters
• the population size is represented by capital N while
the sample size is represented by the small letter n
Mean
Example
Find the average weekly earnings for the population of 5 workers
151 179 163 142 180
$$\mu = \frac{\sum X}{N} = \frac{815}{5} = 163$$
Mean
Example
Find the average weekly earnings for the sample of 5 workers
151 179 163 142 180
$$\bar{x} = \frac{\sum x}{n} = \frac{815}{5} = 163$$
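The two examples above use the same arithmetic; only the notation differs. A minimal Python sketch of that calculation follows (the variable names are illustrative, not from the lecture):

```python
# Weekly earnings of the 5 workers from the example.
earnings = [151, 179, 163, 142, 180]

# Mean = sum of all values / number of values.
# The arithmetic is identical for a population (mu) and a sample (x-bar).
mean = sum(earnings) / len(earnings)
print(mean)  # 163.0
```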
Median: Ungrouped Data
Arrange all observations in ascending order
Find the median value
Odd number of observations:
The median is the unique middle value
Even number of observations:
The median is the average of two middle values
Median: Ungrouped Data
Example: Odd number of observations
Find the median of the weekly earnings of 5
workers: 151 179 163 142 180
Re-arrange the data in ascending order:
142 151 163 179 180
The median is the middle most value
Median = 163
Median: Ungrouped Data
Example: Even number of observations
Find the median of the weekly earnings of 6
workers: 151 179 163 142 180 195
Re-arrange the data in ascending order:
142 151 163 179 180 195
Take the average of the 2 middle most values, 163 and 179:
$$\text{Median} = \frac{163 + 179}{2} = 171$$
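The odd and even cases can be sketched in Python as below. The function name `median` is an illustrative choice; the standard library's `statistics.median` gives the same results.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)           # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                     # odd: unique middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: average of two middle values

print(median([151, 179, 163, 142, 180]))       # 163
print(median([151, 179, 163, 142, 180, 195]))  # 171.0
```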
Mode: Ungrouped Data
Example
Find the mode of the weekly earnings for the following samples:
151 179 163 142 180 195: There is no mode. Every value occurs only once.
151 180 163 142 180 195: Mode = 180
142 180 163 142 180 195: There are two modes. Mode = 142 and 180
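The three cases above (no mode, one mode, two modes) can be reproduced with a short Python sketch. The helper `modes` is a hypothetical name; it follows the lecture's convention that a data set in which every value occurs once has no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if every value occurs only once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([151, 179, 163, 142, 180, 195]))  # []
print(modes([151, 180, 163, 142, 180, 195]))  # [180]
print(modes([142, 180, 163, 142, 180, 195]))  # [142, 180]
```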
Grouped Data
Mean: Grouped Data
To compute the mean for grouped data, we need to compute the mid-point for each class interval.
A mid-point is an estimate of the values that fall within a particular class.
$$\text{Mid-point} = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$$
We may round the mid-point for convenience in computation.
Mean: Grouped Data
Formulae:
Population:
$$\mu = \frac{\sum fx}{N}$$
Sample:
$$\bar{x} = \frac{\sum fx}{n}$$
where n or N = Σf = number of observations
f = frequency count in each class
x = the mid-point of each class
(the text book uses M to represent the mid-point)
Σfx = the total of the values in the data set.
Calculating Σfx:
calculate f times x for each class
total the fx values of each class
Mean: Grouped Data
Given the weekly earnings of the workers as follows:

Earnings           No. of workers (f)   Mid-point (x)         fx
140 – up to 150     4                   (150+140)/2 = 145     4 x 145 = 580
150 – up to 160     6                   155                   930
160 – up to 170     9                   165                   1485
170 – up to 180    12                   175                   2100
180 – up to 190     9                   185                   1665
190 – up to 200     7                   195                   1365
200 – up to 210     3                   205                   615
Total              N = Σf = 50                                Σfx = 8740
Mean: Grouped Data
For the weekly earnings data, N = Σf = 50 and Σfx = 8740, so
$$\mu = \frac{\sum fx}{N} = \frac{8740}{50} = 174.8$$
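The grouped-mean calculation can be checked with a minimal Python sketch. The `(lower, upper, frequency)` tuple layout and the variable names are assumptions made for illustration.

```python
# Each class as (lower limit, upper limit, frequency), from the earnings table.
classes = [
    (140, 150, 4), (150, 160, 6), (160, 170, 9), (170, 180, 12),
    (180, 190, 9), (190, 200, 7), (200, 210, 3),
]

n = sum(f for _, _, f in classes)                            # sum of f
total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)   # sum of f * mid-point
grouped_mean = total_fx / n

print(n, total_fx, grouped_mean)  # 50 8740.0 174.8
```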
Median: Grouped Data (not in the text book)
Steps
Compute the cumulative frequency for each class.
Identify the median class: the class that includes the middle most value
$$\text{middle most value} = \frac{n + 1}{2}\text{th item}$$
Apply the following formula to find the median:
$$M_d = L_{Md} + w\left(\frac{\frac{n}{2} - c}{f_{Md}}\right)$$
Median: Grouped Data
Notations:
L_Md = lower limit of the median class
f_Md = frequency of the median class
w = width of the class intervals
n = total number of observations
c = cumulative frequency up to the class before the median class
Median: Grouped Data
Example: Workers' Earnings

Earnings           No. of workers (f)   Cumulative Frequency (cf)
140 – up to 150     4                    4
150 – up to 160     6                   10
160 – up to 170     9                   19
170 – up to 180    12                   31
180 – up to 190     9                   40
190 – up to 200     7                   47
200 – up to 210     3                   50

Middle position = (50 + 1)/2 = 25.5th item
Median class: the 1st class with cumulative frequency cf > middle position of 25.5, i.e. 170 – up to 180
L_Md = 170, f_Md = 12, c = 19
w = 200 – 190 = 10
$$M_d = L_{Md} + w\left(\frac{\frac{n}{2} - c}{f_{Md}}\right) = 170 + 10\left(\frac{\frac{50}{2} - 19}{12}\right) = 175$$
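The steps in this example can be sketched in Python as follows. The function name and the `(lower, upper, frequency)` tuple layout are illustrative assumptions; the median class is chosen, as in the lecture, as the first class whose cumulative frequency exceeds the middle position (n + 1)/2.

```python
def grouped_median(classes):
    """Median for grouped data: L + w * (n/2 - c) / f for the median class."""
    n = sum(f for _, _, f in classes)
    middle = (n + 1) / 2
    c = 0                                   # cumulative frequency before current class
    for lower, upper, f in classes:
        if c + f > middle:                  # first class with cf > middle position
            w = upper - lower
            return lower + w * (n / 2 - c) / f
        c += f
    raise ValueError("no median class found")

classes = [
    (140, 150, 4), (150, 160, 6), (160, 170, 9), (170, 180, 12),
    (180, 190, 9), (190, 200, 7), (200, 210, 3),
]
print(grouped_median(classes))  # 175.0
```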
Median: Grouped Data
Finding Median with Cumulative Frequency Polygon
Locate on the y-axis, the value that is equal to half of the total number of data points.
Draw a horizontal line across to cut the cumulative frequency polygon.
Drop a perpendicular line from the polygon and find the median value on the x-axis.
Median: Grouped Data
Given the cumulative frequency polygon for the weekly earnings of the workers as follows, find the median earnings.
[Figure: Cumulative Frequency Ogive. x-axis: Weekly Earnings (135 to 205); y-axis: No. of workers (0 to 50)]
1 There are 50 workers (the last point is marked at 50 on the y-axis). So the middle position is 50/2 = 25th item. Look up 25 on the y-axis.
2 Draw a horizontal line across to cut the polygon.
3 Drop a perpendicular line from the polygon to cut the x-axis.
Median = $175
Mode: Grouped Data (not in the text book)
Steps
Find the “Modal Class”: the class with the highest frequency.
Apply the following formula to find the mode:
$$M_o = L_{Mo} + w\left(\frac{d_1}{d_1 + d_2}\right)$$
Mode: Grouped Data
Notations:
L_Mo = lower limit of the modal class
w = width of the class intervals
d1 = frequency of the modal class minus frequency of the class before it
d2 = frequency of the modal class minus frequency of the class after it
Mode: Grouped Data
Weekly earnings of the workers:

Earnings           No. of workers (f)
140 – up to 150     4
150 – up to 160     6
160 – up to 170     9
170 – up to 180    12
180 – up to 190     9
190 – up to 200     7
200 – up to 210     3

Modal class: the class with the highest frequency, i.e. 170 – up to 180
L_Mo = 170
d1 = 12 – 9 = 3
d2 = 12 – 9 = 3
$$M_o = L_{Mo} + w\left(\frac{d_1}{d_1 + d_2}\right) = 170 + 10\left(\frac{3}{3 + 3}\right) = 175$$
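A minimal Python sketch of the modal-class calculation is given below. It assumes a single modal class, and it treats a missing neighbouring class (when the modal class is first or last) as having frequency 0; that edge-case rule is an assumption of this sketch, not something stated in the lecture.

```python
def grouped_mode(classes):
    """Mode for grouped data: L + w * d1 / (d1 + d2) for the modal class."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                        # modal class: highest frequency
    lower, upper, f = classes[i]
    before = freqs[i - 1] if i > 0 else 0              # assumption: 0 if no class before
    after = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumption: 0 if no class after
    d1 = f - before
    d2 = f - after
    return lower + (upper - lower) * d1 / (d1 + d2)

classes = [
    (140, 150, 4), (150, 160, 6), (160, 170, 9), (170, 180, 12),
    (180, 190, 9), (190, 200, 7), (200, 210, 3),
]
print(grouped_mode(classes))  # 175.0
```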
Properties of mean, median and mode
Please refer to the text book
pp 59 and 63 - 64 for the properties of the mean, median and mode.
Note the disadvantages of mean:
Mean is affected by extreme values
Extreme values = very large or very small values
Inappropriate if there is an open-ended class in grouped data because we cannot find the mid-point of an open-ended class
Open-ended class: a class without a lower limit or an upper limit,
e.g. "$50 or less" or "$1000 or more"
Relation: mean median & mode
The relative positions of the mean, median and mode indicate the shape of the distribution.
Shapes of the distribution:
Symmetric
Right-Skewed (Positive Skewed)
Left-Skewed (Negative Skewed)
Relation: Symmetric
Symmetric:
the areas on both sides of the distribution are equal
Mean = Median = Mode
There are no extreme values (values that are very large or very small)
Relation: Right-skewed
Right-skewed (Positive skewed):
More values in the lower end than the higher end
Long tail at the right
Mean > Median > Mode
Arises when the mean is increased by some unusually high values
Relation: Left-Skewed
Left-skewed (Negative skewed):
More values at the higher end than the lower end
Long tail at the left
Mean < Median < Mode
Arises when the mean is reduced by some unusually low values
Choice of Measures of Central Tendency
If the distribution is symmetrical, no choice is needed. We can use the mean, median or mode, since they all have the same value.
If the distribution is skewed, either to the right or left, then the median is often the best measure because the mean will be distorted by the extreme values in these cases.
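To illustrate how an extreme value distorts the mean but barely moves the median, here is a small Python sketch. The value 900 is a hypothetical outlier added to the five weekly earnings used earlier; it is not from the lecture.

```python
from statistics import mean, median

earnings = [142, 151, 163, 179, 180]
with_outlier = earnings + [900]        # 900 is a hypothetical extreme value

# Without the outlier the mean and median coincide at 163.
print(mean(earnings), median(earnings))

# With the outlier the mean is pulled far to the right, while the median
# only moves to the average of 163 and 179: a right-skewed pattern (mean > median).
print(mean(with_outlier), median(with_outlier))
```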
Measures of Dispersion
Objectives
By the end of this session, you should be able to:
• Calculate the range, variance, standard deviation and coefficient of variation
• Compare the differences between the measures of central tendency and the measures of dispersion
Measures of Dispersion
What is dispersion?
Dispersion refers to the spread or variability of values in a distribution with respect to its central location,
i.e. it shows the extent to which the observations are scattered
Importance
enables us to judge the reliability of the measures of
Central Tendency
enables us to compare dispersions of various data
sets
Measures of Dispersion
Data sets may have the same mean, but differ in the dispersion
[Figure: three curves with the same mean. Curve A has a very small spread, Curve B has a moderate spread, Curve C has a very large spread.]
The bigger the spread, the less reliable the measures of central tendency are as representatives of the values in the data set.
Importance of Measures of Dispersion
Given the earnings of 2 firms as follows, which firm will you invest in?

Earnings ($)
Year     Firm A      Firm B
1994     50,000      20,000
1995     55,000     100,000
1996     60,000      (6,000)
1997     58,000       5,000
1998     65,000     300,000
1999     70,000      10,000
2000     73,000     (50,000)
2001     79,000     150,000
Mean    $63,750     $66,125

Although Firm B gives a higher mean return than Firm A, most people do not like to take risks, so they will invest in Firm A because its dispersion is smaller.
With a smaller dispersion, we are more certain that we will get a return close to $63,750.
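The comparison can be checked numerically with a short Python sketch. It assumes that the figures in parentheses are losses (negative earnings), which is consistent with the stated mean of $66,125 for Firm B, and it uses the sample standard deviation from Python's `statistics` module as the measure of spread.

```python
from statistics import mean, stdev

# Earnings 1994-2001; parenthesised figures are assumed to be losses (negative).
firm_a = [50_000, 55_000, 60_000, 58_000, 65_000, 70_000, 73_000, 79_000]
firm_b = [20_000, 100_000, -6_000, 5_000, 300_000, 10_000, -50_000, 150_000]

for name, earnings in (("Firm A", firm_a), ("Firm B", firm_b)):
    # Similar means, very different spreads.
    print(name, "mean:", mean(earnings), "sample SD:", round(stdev(earnings)))
```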
Types of Measures of Dispersion
Range
Variance and Standard Deviation (SD)
Coefficient of Variation (CV)
Note the units of measurement of these measures:
Range & Standard Deviation: same unit as the variable
Variance: in squared units of the variable
Coefficient of Variation: expressed as a percentage (%)
Dispersion - Range
Range = Xmaximum – Xminimum
Example
Find the range of the weekly earnings for a
sample of 5 workers
151 179 163 142 180
Xmaximum = 180 Xminimum = 142
Range = 180 – 142 = 38
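The same calculation as a minimal Python sketch (the variable names are illustrative):

```python
earnings = [151, 179, 163, 142, 180]

# Range = maximum value - minimum value; only these two values are used.
data_range = max(earnings) - min(earnings)
print(data_range)  # 38
```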
Dispersion - Range
Major drawbacks of range:
involves only 2 values and thus ignores how other
data are distributed
it is affected by extreme values
[Figure: two data sets with the same range but different spread. One is widely spread out; the other is more concentrated at lower values but affected by an extremely high value.]
Variance and Standard Deviation
Most commonly used measures for
dispersion
Take into account how data are
distributed
Show data dispersion (variation) with
respect to the central location, mean
Dispersion – Variance (ungrouped data)
A measure of the average squared difference between the mean and each item in the data set.
Formula:
Population:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Sample:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Note:
The unit of measurement for variance is in squared terms, so it is hard to interpret the results
The population variance σ² (sigma squared) has the population size N as the denominator.
The sample variance s² has n – 1 as the denominator
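Standard Deviation (ungrouped data)
Square root of the average squared difference between observations and the mean.
It has the same unit of measurement as the variable
Formula:
Definition:
Population:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Sample:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Computational:
Population:
$$\sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2}$$
Sample:
$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$
We will use the definition formula in our examples.
Only the definition formula will be provided in the examination.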
Standard Deviation (ungrouped data)
Example
Find the standard deviation for weekly earnings of a population of 5 workers:
151 179 163 142 180
$$\mu = \frac{\sum X}{N} = \frac{815}{5} = 163$$

X           X – μ                (X – μ)²
151         151 – 163 = –12      (–12)² = 144
179         179 – 163 = 16       256
163         0                    0
142         –21                  441
180         17                   289
ΣX = 815                         Σ(X – μ)² = 1130

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{1130}{5}} = 15.033$$
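The worked example follows the definition formula step by step, which can be sketched in Python as below (variable names are illustrative; `statistics.pstdev` would give the same result):

```python
from math import sqrt

earnings = [151, 179, 163, 142, 180]

N = len(earnings)
mu = sum(earnings) / N                          # population mean: 163.0
squared_devs = [(x - mu) ** 2 for x in earnings]  # 144, 256, 0, 441, 289
sigma = sqrt(sum(squared_devs) / N)             # divide by N for a population

print(sum(squared_devs))   # 1130.0
print(round(sigma, 3))     # 15.033
```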
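Dispersion – Variance (Grouped data)
Compute the mid-point of each class
Assume the values of the observations fall at the mid-points of their respective classes
The formulae below give the standard deviation; the variance is its square (σ² or s²).
Formula:
Definition:
Population:
$$\sigma = \sqrt{\frac{\sum f(x - \mu)^2}{N}}$$
Sample:
$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$$
Computational:
Population:
$$\sigma = \sqrt{\frac{\sum fx^2}{N} - \left(\frac{\sum fx}{N}\right)^2}$$
Sample:
$$s = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}}$$
x = mid-point of each class; f = frequency of each class
n or N = Σf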
Standard Deviation - Grouped Data: Example
Calculate the standard deviation for the weekly earnings of a sample of 50 workers
$$\bar{x} = \frac{\sum fx}{n} = \frac{8740}{50} = \$174.8$$

Weekly Earnings    No. of workers (f)   Mid-point (x)   (x – x̄)                 (x – x̄)²              f(x – x̄)²
140 – up to 150     4                   145             145 – 174.8 = –29.8     (–29.8)² = 888.04     4 x 888.04 = 3552.16
150 – up to 160     6                   155             –19.8                   392.04                2352.24
160 – up to 170     9                   165             –9.8                    96.04                 864.36
170 – up to 180    12                   175             0.2                     0.04                  0.48
180 – up to 190     9                   185             10.2                    104.04                936.36
190 – up to 200     7                   195             20.2                    408.04                2856.28
200 – up to 210     3                   205             30.2                    912.04                2736.12
Total              n = Σf = 50                                                                        Σf(x – x̄)² = 13298

$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{13298}{49}} = \$16.47$$
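A minimal Python sketch of this grouped-data calculation, using the definition formula with n – 1 in the denominator. The `(mid-point, frequency)` pair layout and the variable names are assumptions made for illustration.

```python
from math import sqrt

# (mid-point x, frequency f) for each class in the earnings table.
grouped = [(145, 4), (155, 6), (165, 9), (175, 12), (185, 9), (195, 7), (205, 3)]

n = sum(f for _, f in grouped)
x_bar = sum(f * x for x, f in grouped) / n                 # 8740 / 50 = 174.8
sum_sq = sum(f * (x - x_bar) ** 2 for x, f in grouped)     # approximately 13298
s = sqrt(sum_sq / (n - 1))                                 # sample: divide by n - 1

print(round(x_bar, 1), round(sum_sq, 2), round(s, 2))  # 174.8 13298.0 16.47
```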
Mean and Standard Deviation
Ungrouped data Vs Grouped Data
Grouped Data are more organized
The calculation of the mean and standard deviation with grouped data is less accurate as the values are estimated by the mid-points instead of the actual values of the data
The mean and standard deviation cannot be computed for grouped data with open-ended classes because the mid-points of the open-ended classes cannot be ascertained.
Dispersion – Coefficient of Variation
Measure of relative dispersion
Show data dispersion (variation) in terms of
the percentage (%) of the central location,
mean
Free of units – a good choice of measure for
comparing dispersion of 2 or more data sets
when there is substantial difference in
the size of the mean values
the units of measurement
Dispersion - Coefficient of Variation
Formula
Population:
$$C.V. = \frac{\sigma}{\mu} \times 100\%$$
Sample:
$$C.V. = \frac{s}{\bar{x}} \times 100\%$$
Coefficient of Variation
(a) difference in the size of the average
Data are collected on the income of the executives and unskilled workers.
Do the executives have greater dispersion in their incomes?

                 Executives        Unskilled Workers
Mean (μ)         $500,000          $32,000
SD (σ)           $50,000           $3,200

$$CV = \frac{\sigma}{\mu} \times 100\%$$
Executives:
$$CV = \frac{50{,}000}{500{,}000} \times 100\% = 10\%$$
Unskilled workers:
$$CV = \frac{3{,}200}{32{,}000} \times 100\% = 10\%$$
Since the relative dispersions are the same (10%), the executives do not have greater dispersion.
Dispersion – Coefficient of Variation
(b) difference in the units of measurement
Comparing the variability of the amount of bonus paid and the number of years of service yields:

                 Bonus        Years of Service
Mean (μ)         $200         20 years
SD (σ)           $40          2 years

Bonus:
$$CV = \frac{40}{200} \times 100\% = 20\%$$
Years of service:
$$CV = \frac{2}{20} \times 100\% = 10\%$$
The relative dispersion of the bonus is 20%, higher than the 10% for years of service, so the bonus is more dispersed.
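Both comparisons, (a) and (b), can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is an illustrative choice, not from the lecture.

```python
def coefficient_of_variation(sd, mean):
    """Relative dispersion: standard deviation as a percentage of the mean."""
    return sd * 100 / mean

# (a) Same relative dispersion despite very different means.
print(coefficient_of_variation(50_000, 500_000))  # 10.0  (executives)
print(coefficient_of_variation(3_200, 32_000))    # 10.0  (unskilled workers)

# (b) Different units: dollars versus years.
print(coefficient_of_variation(40, 200))          # 20.0  (bonus)
print(coefficient_of_variation(2, 20))            # 10.0  (years of service)
```

Because the result is a percentage, the units cancel, which is why dollars and years can be compared directly.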
Flaw of Average