Lec 03_Describing Data(1)

Measures of Central Tendency and Dispersion

RECOMMENDED READING

Customised Text, adapted from ‘Statistical Techniques in Business & Economics’ by Lind, Marchal, 16th Edition, McGraw Hill

Chapter 3, pages 50–92, excluding Geometric Mean and Chebyshev’s Theorem

Menu

Central Tendency

Measures for Ungrouped Data

Measures for Grouped Data

Properties of Measures of Central Tendency

Relationship between mean, median and mode

Dispersion

Range

Variance and Standard Deviation – Ungrouped Data

Variance and Standard Deviation – Grouped Data

Coefficient of Variation

Measures of Central Tendency


Objectives

By the end of this lecture, you should be able to

Calculate the mean, median and mode of a data set

Compare the strengths and weaknesses of these measures of central tendency

Describe the relationship between these measures of central tendency

Basic Terms and Definitions

Measure of Central Tendency

A single value that summarizes a set of data

It locates the centre of the values

Also called: Measure of Central Location

What is the function of these measures?

Describe and summarize a data set

Enable comparisons between data sets

Types of Measures of Central Tendency

Mean: the average

Median: the middle-most value

Mode: the most frequently occurring value

Un-Grouped Data

Mean: ungrouped data

Population Mean (pronounced as mu)

\[ \mu = \frac{\sum X}{N} = \frac{\text{sum of the values of all observations in the population}}{N} \]

where

N = number of items in the population, i.e. N = population size

\( \sum \) = Sum (means add up)

\( \sum X \) = add up all the X values

Mean: ungrouped data

Sample Mean (pronounced as x-bar)

\[ \bar{x} = \frac{\sum x}{n} = \frac{\text{sum of the values of all observations in a sample}}{n} \]

where n = number of items in the sample, i.e. n = sample size

Note

• the population parameters are represented by Greek letters while the sample statistics are represented by ordinary (Roman) letters

• the population size is represented by capital N while the sample size is represented by the small letter n

Mean

Example

Find the average weekly earnings for the population of 5 workers

151 179 163 142 180

\[ \mu = \frac{\sum X}{N} = \frac{815}{5} = 163 \]

Mean

Example

Find the average weekly earnings for the sample of 5 workers

151 179 163 142 180

\[ \bar{x} = \frac{\sum x}{n} = \frac{815}{5} = 163 \]
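The mean calculation above can be checked with a few lines of Python. This is only a minimal sketch; the function name `mean` and the variable names are chosen here for illustration and are not part of the lecture.

```python
# Weekly earnings of the 5 workers from the example above.
earnings = [151, 179, 163, 142, 180]


def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    return sum(values) / len(values)


average = mean(earnings)
print(average)  # 163.0
```

The same arithmetic applies to a population and to a sample; only the notation (mu and N versus x-bar and n) differs.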

Median: Ungrouped Data

Arrange all observations in ascending order

Find the median value

Odd number of observations:

The median is the unique middle value

Even number of observations:

The median is the average of two middle values


Example: Odd number of observations

Find the median of the weekly earnings of 5

workers: 151 179 163 142 180

Re-arrange the data in ascending order:

142 151 163 179 180

The median is the middle most value

Median = 163


Example: Even number of observations

Find the median of the weekly earnings of 6

workers: 151 179 163 142 180 195

Re-arrange the data in ascending order:

142 151 163 179 180 195

Take the average of the 2 middle-most values:

\[ \text{Median} = \frac{163 + 179}{2} = 171 \]
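The two median cases (odd and even number of observations) can be sketched in Python as follows. The function name `median` is an illustrative choice, not something defined in the lecture.

```python
def median(values):
    """Median of ungrouped data: sort, then take the middle value
    (odd count) or the average of the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median([151, 179, 163, 142, 180])        # 5 workers
even_median = median([151, 179, 163, 142, 180, 195])  # 6 workers
print(odd_median, even_median)  # 163 171.0
```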

Mode: Ungrouped Data

Example

Find the mode of the weekly earnings for the following samples:

151 179 163 142 180 195: There is no mode. Every value occurs only once.

151 180 163 142 180 195: Mode = 180

142 180 163 142 180 195: There are two modes. Mode = 142 and 180
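The three cases above (no mode, one mode, two modes) can be reproduced with a short Python sketch. The helper name `modes` and the convention of returning an empty list when every value occurs only once are assumptions made here to match the slide's wording.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'
    because every value occurs only once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


no_mode = modes([151, 179, 163, 142, 180, 195])
one_mode = modes([151, 180, 163, 142, 180, 195])
two_modes = modes([142, 180, 163, 142, 180, 195])
print(no_mode, one_mode, two_modes)  # [] [180] [142, 180]
```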

Grouped Data

Mean: Grouped Data

To compute the mean for grouped data, we need to compute the mid-point for each class interval.

A mid-point is an estimate of the values that fall within a particular class.

\[ \text{Mid-point} = \frac{\text{Lower Limit} + \text{Upper Limit}}{2} \]

We may round the mid-point for convenience in computation.

Mean: Grouped Data

Formulae:

Population: \[ \mu = \frac{\sum fx}{N} \]

Sample: \[ \bar{x} = \frac{\sum fx}{n} \]

where

n or N = \( \sum f \) = number of observations

f = frequency count in each class

x = the mid-point of each class (the text book uses M to represent the mid-point)

\( \sum fx \) = the estimated total of the values in the data set

Calculating \( \sum fx \):

calculate f times x for each class

total the fx values of all classes

Mean: Grouped Data

Given the weekly earnings of the workers as follows:

Earnings           No. of workers (f)   Mid-point (x)   fx
140 – up to 150     4                   145              580
150 – up to 160     6                   155              930
160 – up to 170     9                   165             1485
170 – up to 180    12                   175             2100
180 – up to 190     9                   185             1665
190 – up to 200     7                   195             1365
200 – up to 210     3                   205              615
Total              N = Σf = 50                          Σfx = 8740

For the first class, the mid-point is (150 + 140)/2 = 145 and fx = 4 × 145 = 580.

\[ \mu = \frac{\sum fx}{N} = \frac{8740}{50} = 174.8 \]
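The grouped-mean table can be reproduced in Python. This is a sketch only: representing each class as a `(lower, upper, frequency)` tuple and the name `grouped_mean` are choices made here for illustration.

```python
# Each class: (lower limit, upper limit, frequency), from the earnings table.
classes = [
    (140, 150, 4), (150, 160, 6), (160, 170, 9), (170, 180, 12),
    (180, 190, 9), (190, 200, 7), (200, 210, 3),
]


def grouped_mean(classes):
    """Estimate the mean from grouped data using class mid-points."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
mean_earnings = grouped_mean(classes)
print(total_fx, mean_earnings)  # 8740.0 174.8
```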


Median: Grouped Data (not in the text book)

Steps

Compute the cumulative frequency for each class.

Identify the median class: the class that includes the middle-most value

\[ \text{middle-most value} = \frac{n+1}{2}\text{-th item} \]

Apply the following formula to find the median:

\[ M_d = L_{Md} + w \left( \frac{\frac{n}{2} - c}{f_{Md}} \right) \]

Notations:

\( L_{Md} \) lower limit of the median class

\( f_{Md} \) frequency of the median class

w width of the class intervals

n total number of observations

c cumulative frequency up to the class before the median class

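Median: Grouped Data

Example: Workers' Earnings

Earnings           No. of workers (f)   Cumulative Frequency (cf)
140 – up to 150     4                    4
150 – up to 160     6                   10
160 – up to 170     9                   19
170 – up to 180    12                   31
180 – up to 190     9                   40
190 – up to 200     7                   47
200 – up to 210     3                   50

\[ \text{middle position} = \frac{50 + 1}{2} = 25.5\text{th item} \]

Median class: the 1st class with cumulative frequency cf > middle position of 25.5, i.e. 170 – up to 180 (cf = 31)

\( L_{Md} \) = 170, \( f_{Md} \) = 12, c = 19 (the cf of the class before the median class)

w = 200 - 190 = 10

\[ M_d = L_{Md} + w \left( \frac{\frac{n}{2} - c}{f_{Md}} \right) = 170 + 10 \left( \frac{\frac{50}{2} - 19}{12} \right) = 175 \]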
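The steps for the grouped median can be sketched in Python. The tuple layout `(lower, upper, frequency)` and the name `grouped_median` are illustrative assumptions; the median-class rule (first class whose cumulative frequency exceeds the middle position) follows the slide.

```python
classes = [
    (140, 150, 4), (150, 160, 6), (160, 170, 9), (170, 180, 12),
    (180, 190, 9), (190, 200, 7), (200, 210, 3),
]


def grouped_median(classes):
    """Median for grouped data: L + w * (n/2 - c) / f for the median class."""
    n = sum(f for _, _, f in classes)
    middle = (n + 1) / 2          # position of the middle-most item
    c = 0                         # cumulative frequency before current class
    for lower, upper, f in classes:
        if c + f > middle:        # first class with cf > middle position
            w = upper - lower
            return lower + w * (n / 2 - c) / f
        c += f
    raise ValueError("no median class found (empty data?)")


median_earnings = grouped_median(classes)
print(median_earnings)  # 175.0
```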


Finding Median with Cumulative Frequency Polygon

Locate on the y-axis, the value that is equal to half of the total number of data points.

Draw a horizontal line across to cut the cumulative frequency polygon.

Drop a perpendicular line from the polygon and find the median value on the x-axis.


Given the Cumulative Frequency Polygon for the weekly earnings of the workers as follows, find the median earnings.

[Figure: Cumulative Frequency Ogive. x-axis: Weekly Earnings (135 to 205); y-axis: No. of workers (0 to 50)]

1 There are 50 workers (the last point is marked at 50 on the y-axis), so the middle position is 50/2 = 25th item. Look up 25 on the y-axis.

2 Draw a horizontal line across to cut the polygon.

3 Drop a perpendicular line from the polygon to cut the x-axis.

Median = $175

Mode: Grouped Data (not in the text book)

Steps

Find the “Modal Class”: the class with the highest frequency.

Apply the following formula to find the mode:

\[ M_o = L_{Mo} + w \left( \frac{d_1}{d_1 + d_2} \right) \]

Notations:

\( L_{Mo} \) lower limit of the modal class

w width of the class intervals

\( d_1 \) frequency of the modal class minus frequency of the class before it

\( d_2 \) frequency of the modal class minus frequency of the class after it

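Mode: Grouped Data

Weekly Earnings of the workers:

Earnings           No. of workers (f)
140 – up to 150     4
150 – up to 160     6
160 – up to 170     9
170 – up to 180    12
180 – up to 190     9
190 – up to 200     7
200 – up to 210     3

Modal class: the class with the highest frequency, i.e. 170 – up to 180 (f = 12), so \( L_{Mo} \) = 170

\( d_1 \) = 12 – 9 = 3

\( d_2 \) = 12 – 9 = 3

\[ M_o = L_{Mo} + w \left( \frac{d_1}{d_1 + d_2} \right) = 170 + 10 \left( \frac{3}{3 + 3} \right) = 175 \]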
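The grouped-mode formula can be sketched in Python as below. The name `grouped_mode` is illustrative. Two assumptions are made here that the slides do not state: there is a single modal class, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
classes = [
    (140, 150, 4), (150, 160, 6), (160, 170, 9), (170, 180, 12),
    (180, 190, 9), (190, 200, 7), (200, 210, 3),
]


def grouped_mode(classes):
    """Mode for grouped data: L + w * d1 / (d1 + d2) for the modal class."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                 # position of the modal class
    lower, upper, f = classes[i]
    # Assumption: a missing neighbour counts as frequency 0.
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = f - before
    d2 = f - after
    return lower + (upper - lower) * d1 / (d1 + d2)


mode_earnings = grouped_mode(classes)
print(mode_earnings)  # 175.0
```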

Properties of mean, median and mode

Please refer to the text book, pp 59 and 63–64, for the properties of the mean, median and mode.

Note the disadvantages of the mean:

The mean is affected by extreme values

Extreme values = very large or very small values

The mean is inappropriate if there is an open-ended class in grouped data, because we cannot find the mid-point of an open-ended class

Open-ended class: a class without a lower limit or an upper limit, e.g. "$50 or less" or "$1000 or more"

Relation: mean, median & mode

The relative positions of the mean, median and mode indicate the shape of the distribution.

Shapes of the distribution:

Symmetric

Right-Skewed (Positively Skewed)

Left-Skewed (Negatively Skewed)

Relation: Symmetric

Symmetric:

the areas on both sides of the distribution are equal

Mean = Median = Mode

There are no extreme values (values that are very large or very small)

Relation: Right-skewed

Right-skewed (Positive skewed):

More values in the lower end than the higher end

Long tail at the right

Mean > Median > Mode

Arises when the mean is increased by some unusually high values

Relation: Left-Skewed

Left-skewed (Negative skewed):

More values at the higher end than the lower end

Long tail at the left

Mean < Median < Mode

Arises when the mean is reduced by some unusually low values

Choice of Measures of Central Tendency

If the distribution is symmetrical, no choice is needed. We can use the mean, median or mode, since they all have the same value.

If the distribution is skewed, either to the right or to the left, then the median is often the best measure because the mean will be distorted by the extreme values in these cases.
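A small Python sketch can illustrate why the median is preferred for skewed data. The first data set is the 5 workers' earnings used earlier; the extra value of 1000 is a hypothetical extreme earning added here purely for illustration. `statistics.mean` and `statistics.median` are from the Python standard library.

```python
import statistics

earnings = [151, 179, 163, 142, 180]
# Hypothetical: add one unusually high earner to create a right skew.
earnings_with_outlier = earnings + [1000]

mean_before = statistics.mean(earnings)
median_before = statistics.median(earnings)
mean_after = statistics.mean(earnings_with_outlier)
median_after = statistics.median(earnings_with_outlier)

print(mean_before, median_before)  # 163 163
print(mean_after, median_after)    # 302.5 171.0
```

The single extreme value pulls the mean far above the median (mean > median, as in a right-skewed distribution), while the median moves only slightly.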

Measures of Dispersion

Objectives

By the end of this session, you should be able to

Calculate the range, variance, standard deviation and coefficient of variation

Compare the differences between the measures of central tendency and the measures of dispersion


What is dispersion?

Dispersion refers to the spread or variability of values in a distribution with respect to its central location, i.e. it shows the extent to which the observations are scattered

Importance

enables us to judge the reliability of the measures of central tendency

enables us to compare the dispersions of various data sets


Data sets may have the same mean, but differ in the dispersion

Curve A has a very small spread

Curve B has a moderate spread

Curve C has a very large spread

The bigger the spread, the less reliable it is to use the measures of central tendency as representatives of the values in the data set

Importance of Measures of Dispersion

Given the earnings of 2 firms as follows, which firm will you invest in?

Earnings ($) (figures in parentheses are losses)

Year    Firm A     Firm B
1994    50,000     20,000
1995    55,000     100,000
1996    60,000     (6,000)
1997    58,000     5,000
1998    65,000     300,000
1999    70,000     10,000
2000    73,000     (50,000)
2001    79,000     150,000
Mean    $63,750    $66,125

Firm B gives a higher average return than Firm A. However, as most people do not like to take risks, they will invest in Firm A because its dispersion is smaller. With a smaller dispersion, we are more certain of getting a return close to $63,750.
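The comparison of the two firms can be checked numerically. The sketch below uses the earnings from the table (losses entered as negative numbers); treating the 8 years as a sample and using `statistics.stdev` (the sample standard deviation from the Python standard library) as the measure of dispersion is an assumption made here, since the slide does not say which measure it has in mind.

```python
import statistics

# Earnings for 1994-2001; losses shown in parentheses are negative.
firm_a = [50_000, 55_000, 60_000, 58_000, 65_000, 70_000, 73_000, 79_000]
firm_b = [20_000, 100_000, -6_000, 5_000, 300_000, 10_000, -50_000, 150_000]

mean_a = statistics.mean(firm_a)
mean_b = statistics.mean(firm_b)
range_a = max(firm_a) - min(firm_a)
range_b = max(firm_b) - min(firm_b)
sd_a = statistics.stdev(firm_a)   # assumption: years treated as a sample
sd_b = statistics.stdev(firm_b)

print(mean_a, mean_b)    # 63750 66125
print(range_a, range_b)  # 29000 350000
print(sd_b > sd_a)       # True: Firm B is far more dispersed
```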

Types of Measures of Dispersion

Range

Variance and Standard Deviation (SD)

Coefficient of Variation (CV)

Note the units of measurement of these measures:

Range & Standard Deviation – same unit as the variable

Variance – in squared units of the variable

Coefficient of Variation – in percentage (%) terms

Dispersion - Range

\[ \text{Range} = X_{\text{maximum}} - X_{\text{minimum}} \]

Example

Find the range of the weekly earnings for a sample of 5 workers

151 179 163 142 180

\( X_{\text{maximum}} \) = 180, \( X_{\text{minimum}} \) = 142

Range = 180 – 142 = 38
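As a quick check, the range example can be computed in Python; the variable names are illustrative only.

```python
earnings = [151, 179, 163, 142, 180]

# Range = largest value minus smallest value.
earnings_range = max(earnings) - min(earnings)
print(earnings_range)  # 38
```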

Dispersion - Range

Major drawbacks of range:

involves only 2 values and thus ignores how the other data are distributed

it is affected by extreme values

Two data sets with the same range but different spread:

one is widely spread out

the other is more concentrated at lower values but affected by an extremely high value

Variance and Standard Deviation

Most commonly used measures of dispersion

Take into account how the data are distributed

Show data dispersion (variation) with respect to the central location, the mean

Dispersion – Variance (ungrouped data)

A measure of the average squared difference between the mean and each item in the data set.

Formula:

Population: \[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

Sample: \[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]

Note:

The unit of measurement for the variance is in squared terms, so the results are hard to interpret

The population variance \( \sigma^2 \) (sigma squared) has the population size N as the denominator.

The sample variance \( s^2 \) has n – 1 as the denominator

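Standard Deviation (ungrouped data)

Square root of the average squared difference between the observations and the mean.

Has the same unit of measurement as the variable

Definition:

Population: \[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]

Sample: \[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]

Computational:

Population: \[ \sigma = \sqrt{\frac{\sum X^2}{N} - \left( \frac{\sum X}{N} \right)^2} \]

Sample: \[ s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \]

We will use the definition formula in our examples.

Only the definition formula will be provided in the examination.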

Standard Deviation (ungrouped data)

Example

Find the standard deviation for the weekly earnings of a population of 5 workers:

151 179 163 142 180

\[ \mu = \frac{\sum X}{N} = \frac{815}{5} = 163 \]

X           X – μ               (X – μ)²
151         151 – 163 = –12     (–12)² = 144
179         179 – 163 = 16      256
163         0                   0
142         –21                 441
180         17                  289
ΣX = 815                        Σ(X – μ)² = 1130

\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{1130}{5}} = 15.033 \]
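The population standard deviation example can be verified in Python. The sketch below computes it with the definition formula and, as a cross-check, with the computational formula; the variable names are illustrative.

```python
import math

earnings = [151, 179, 163, 142, 180]
N = len(earnings)
mu = sum(earnings) / N                                # 163.0

# Definition formula: square root of the average squared deviation.
sum_sq_dev = sum((x - mu) ** 2 for x in earnings)     # 1130.0
sigma_definition = math.sqrt(sum_sq_dev / N)

# Computational formula: sqrt(sum(X^2)/N - (sum(X)/N)^2).
sigma_computational = math.sqrt(
    sum(x ** 2 for x in earnings) / N - (sum(earnings) / N) ** 2
)

print(sum_sq_dev)                   # 1130.0
print(round(sigma_definition, 3))   # 15.033
```

Both formulas give the same value; the computational form simply avoids calculating each deviation from the mean.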

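Dispersion – Variance (Grouped data)

compute the mid-point of each class

assume the values of the observations fall at the mid-points of the respective classes

Definition (standard deviation, the square root of the variance):

Population: \[ \sigma = \sqrt{\frac{\sum f (x - \mu)^2}{N}} \]

Sample: \[ s = \sqrt{\frac{\sum f (x - \bar{x})^2}{n - 1}} \]

Computational:

Population: \[ \sigma = \sqrt{\frac{\sum f x^2}{N} - \left( \frac{\sum f x}{N} \right)^2} \]

Sample: \[ s = \sqrt{\frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}} \]

x = mid-point of each class; f = frequency of each class

n or N = \( \sum f \)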

Standard Deviation - Grouped Data: Example

Calculate the standard deviation for the weekly earnings of a sample of 50 workers

\[ \bar{x} = \frac{\sum fx}{n} = \frac{8740}{50} = \$174.8 \]

Weekly Earnings    No. of workers (f)   Mid-point (x)   (x – x̄)                  (x – x̄)²              f(x – x̄)²
140 – up to 150     4                   145             145 – 174.8 = –29.8      (–29.8)² = 888.04     4 × 888.04 = 3552.16
150 – up to 160     6                   155             –19.8                    392.04                2352.24
160 – up to 170     9                   165             –9.8                     96.04                 864.36
170 – up to 180    12                   175             0.2                      0.04                  0.48
180 – up to 190     9                   185             10.2                     104.04                936.36
190 – up to 200     7                   195             20.2                     408.04                2856.28
200 – up to 210     3                   205             30.2                     912.04                2736.12
Total              n = Σf = 50                                                                         Σf(x – x̄)² = 13298

\[ s = \sqrt{\frac{\sum f (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{13298}{49}} = \$16.47 \]
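The grouped-data table above can be reproduced in Python with the definition formula for a sample. The `(lower, upper, frequency)` tuple layout and the variable names are illustrative choices, not part of the lecture.

```python
import math

classes = [
    (140, 150, 4), (150, 160, 6), (160, 170, 9), (170, 180, 12),
    (180, 190, 9), (190, 200, 7), (200, 210, 3),
]

n = sum(f for _, _, f in classes)
# Mid-point of each class paired with its frequency.
mid_f = [((lower + upper) / 2, f) for lower, upper, f in classes]

x_bar = sum(f * x for x, f in mid_f) / n                  # about 174.8
sum_f_sq_dev = sum(f * (x - x_bar) ** 2 for x, f in mid_f)  # about 13298

# Sample standard deviation: divide by n - 1.
s = math.sqrt(sum_f_sq_dev / (n - 1))

print(round(x_bar, 1), round(sum_f_sq_dev, 2), round(s, 2))  # 174.8 13298.0 16.47
```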


Mean and Standard Deviation

Ungrouped data Vs Grouped Data

Grouped data are more organized

The mean and standard deviation calculated from grouped data are less accurate, as the values are estimated by the mid-points instead of the actual values of the data

The mean and standard deviation cannot be computed from grouped data with open-ended classes because the mid-points of the open-ended classes cannot be ascertained.

Dispersion – Coefficient of Variation

Measure of relative dispersion

Shows data dispersion (variation) as a percentage (%) of the central location, the mean

Free of units: a good choice of measure for comparing the dispersion of 2 or more data sets when there is a substantial difference in

• the size of the mean values

• the units of measurement

Formula

Population: \[ CV = \frac{\sigma}{\mu} \times 100\% \]

Sample: \[ CV = \frac{s}{\bar{x}} \times 100\% \]

Coefficient of Variation

(a) difference in the size of the average

Data are collected on the income of the executives and unskilled workers.

Do the executives have greater dispersion in their incomes?

\[ CV = \frac{s}{\bar{x}} \times 100\% \]

Executives: \( \bar{x} \) = $500,000, s = $50,000

\[ CV = \frac{50{,}000}{500{,}000} \times 100\% = 10\% \]

Unskilled Workers: \( \bar{x} \) = $32,000, s = $3,200

\[ CV = \frac{3{,}200}{32{,}000} \times 100\% = 10\% \]

Since the relative dispersions are the same (10%), the executives do not have greater dispersion.

Dispersion – Coefficient of Variation

(b) difference in the units of measurement

Compare the variability of the amount of bonus paid and the number of years of service:

Bonus: \( \bar{x} \) = $200, s = $40

\[ CV = \frac{40}{200} \times 100\% = 20\% \]

Years of Service: \( \bar{x} \) = 20 years, s = 2 years

\[ CV = \frac{2}{20} \times 100\% = 10\% \]

The relative dispersion of the bonus is 20%, higher than the 10% for the years of service, so the bonus is more dispersed.
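Both coefficient of variation examples, (a) and (b), can be checked with a short Python sketch. The function name `coefficient_of_variation` is an illustrative choice; the means and standard deviations are the values given on the slides.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative dispersion: standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


# (a) difference in the size of the average
cv_executives = coefficient_of_variation(50_000, 500_000)
cv_unskilled = coefficient_of_variation(3_200, 32_000)

# (b) difference in the units of measurement
cv_bonus = coefficient_of_variation(40, 200)
cv_service = coefficient_of_variation(2, 20)

print(cv_executives, cv_unskilled)  # 10.0 10.0
print(cv_bonus, cv_service)         # 20.0 10.0
```

Because the CV is a percentage, the results can be compared directly even though incomes differ greatly in size and bonus and years of service are measured in different units.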

Flaw of Average
