Statistics 3, 4

44
MEASURES OF LOCATION and VARIATION for 1 variable

Description: second year

Transcript of Statistics 3, 4

Page 1: Statistics 3, 4

MEASURES OF LOCATION and VARIATION for 1 variable

Page 2: Statistics 3, 4

Lectures 3+4+5 Topics
• Measures of Central Tendency for numerical and categorical data: Mean, Median, Mode + other means, Fractiles
• Measures of Variation for numerical and binary data: The Range, Variance and Standard Deviation
• Shape: Symmetric, Skewed, Skewness, Kurtosis

Page 3: Statistics 3, 4

Summary Measures
• Central Tendency (part of Location): Mean, Median, Mode
• Variation: Variance, Standard Deviation, Coefficient of Variation, Range, Fractiles

Page 4: Statistics 3, 4

Measures of Central Tendency
• Mean: \( \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \)
• Median
• Mode

Page 5: Statistics 3, 4

The Mean (Arithmetic mean, Average)

• It is the arithmetic average of the data values.
• The most common measure of central tendency
• Affected by extreme values (outliers)

Sample mean:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

[Figure: two number-line plots, one on a 0 to 10 axis with Mean = 5 and one on a 0 to 14 axis with Mean = 6]

Page 6: Statistics 3, 4

THE ARITHMETIC MEAN

This is the most popular and useful measure of central location.

Mean = (Sum of the observations) / (Number of observations)

Page 7: Statistics 3, 4

THE ARITHMETIC MEAN

Sample mean (n = sample size):
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Population mean (N = population size):
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]

Page 8: Statistics 3, 4

THE ARITHMETIC MEAN

• Example 1
The reported times spent on the Internet by 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time spent on the Internet.

\[ \bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{x_1 + x_2 + \cdots + x_{10}}{10} = \frac{0 + 7 + \cdots + 22}{10} = 11.0 \text{ hours} \]

• Example 2
Suppose the telephone bills represent the population of measurements (N = 200). The population mean is

\[ \mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \cdots + x_{200}}{200} = \frac{42.19 + 38.45 + \cdots + 45.77}{200} = 43.59 \]
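As a quick check of Example 1, here is a minimal Python sketch that computes the mean both directly from the formula and with the standard library. The variable names are illustrative only.

```python
from statistics import fmean

# Internet hours reported by the 10 adults of Example 1.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Sum of the observations divided by the number of observations.
mean_by_formula = sum(hours) / len(hours)

# The standard library computes the same quantity as a float.
mean_by_library = fmean(hours)

print(mean_by_formula)  # 11.0
print(mean_by_library)  # 11.0
```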

Page 9: Statistics 3, 4

WEIGHTED MEAN FOR DATA GROUPED BY CATEGORIES OR VARIANTS

\[ \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} \]

Page 10: Statistics 3, 4

When many of the measurements have the same value, the measurements can be summarized in a frequency table. Suppose the number of children in a sample of 16 families was recorded as follows:

NUMBER OF CHILDREN   0   1   2   3
NUMBER OF FAMILIES   3   4   7   2    (16 families)

\[ \bar{x} = \frac{\sum_{i=1}^{16} x_i}{16} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_4 f_4}{16} = \frac{3(0) + 4(1) + 7(2) + 2(3)}{16} = 1.5 \]
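The frequency-table calculation can be sketched in Python as follows. It applies the weighted-mean formula to the children-per-family table, then expands the table back into 16 raw observations to confirm both routes agree. The variable names are illustrative.

```python
# Frequency table: number of children (x_i) and number of families (f_i).
children = [0, 1, 2, 3]
families = [3, 4, 7, 2]

# Weighted mean: sum of x_i * f_i divided by the sum of f_i.
total_families = sum(families)
weighted_sum = sum(x * f for x, f in zip(children, families))
weighted_mean = weighted_sum / total_families

# Expanding the table into raw observations gives the same mean.
raw = [x for x, f in zip(children, families) for _ in range(f)]
raw_mean = sum(raw) / len(raw)

print(total_families)  # 16
print(weighted_mean)   # 1.5
print(raw_mean)        # 1.5
```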

Page 11: Statistics 3, 4

MEAN

FOR TABULATED DATA BY CLASSES

Page 12: Statistics 3, 4

APPROXIMATING DESCRIPTIVE MEASURES FOR GROUPED DATA BY CLASSES

Approximating descriptive measures for grouped data may be needed in two cases:
• when approximate values are sufficient;
• when only secondary grouped data are available.

\[ \bar{x} \approx \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} \]

where x_i is the midpoint and f_i is the frequency of class i.

Page 13: Statistics 3, 4

Example 3
Approximate the mean of the telephone call durations problem as represented by the frequency distribution.

Class i   Class limits   Frequency f_i   Midpoint x_i   x_i f_i
1         2-5            3               3.5            10.5
2         5-8            6               6.5            39.0
3         8-11           8               9.5            76.0
…         …              …               …              …
6         17-20          2               18.5           37.0
Total                    30                             312.0

n = sample size = 30 = f_1 + … + f_k

\[ \bar{x} \approx \frac{312.0}{30} = 10.4 \]

Real value: \( \bar{x} = 10.26 \)

[Figure: histogram of the call durations with class limits 2, 5, 8, 11, 14, 17, 20, More]

Page 14: Statistics 3, 4

The Median

• Important measure of central tendency
• In an ordered array, the median is the “middle” number.
• If n is odd, the median is the middle number.
• If n is even, the median is the average of the 2 middle numbers.
• Not affected by extreme values

[Figure: two number-line plots, one on a 0 to 10 axis and one on a 0 to 14 axis; Median = 5 in both]

Page 15: Statistics 3, 4

THE MEDIAN

The Median of a set of observations is the value that falls in the middle when the observations are arranged in order of magnitude (ranked increasingly).

Example
Find the median of the time spent on the Internet for the adults of Example 1.

Even number of observations:
0, 0, 5, 7, 8, 9, 12, 14, 22, 33
The median is the average of the two middle values: (8 + 9) / 2 = 8.5

Comment
Suppose only 9 adults were sampled (exclude, say, the longest time (33)).

Odd number of observations:
0, 0, 5, 7, 8, 9, 12, 14, 22
The median is the middle value: 8
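The two cases of the median example can be reproduced with a short Python sketch. It uses the Example 1 data and the standard-library `median` function. Dropping the longest time (33) gives the 9-observation case.

```python
from statistics import median

# Internet hours of Example 1 (10 observations, an even count).
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Even n: the median is the average of the two middle ranked values.
median_even = median(hours)

# Odd n: drop the longest time (33); the median is the middle ranked value.
nine_hours = sorted(hours)[:-1]
median_odd = median(nine_hours)

print(sorted(hours))  # [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]
print(median_even)    # 8.5
print(median_odd)     # 8
```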

Page 16: Statistics 3, 4

MEDIAN

Data Tabulated discretely – as ungrouped

Data Tabulated by classes - estimation

Page 17: Statistics 3, 4

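MEDIAN AND MODE

Median (data tabulated by classes)

\[ Me = x_0 + K \cdot \frac{\frac{1}{2}\left(\sum n_i + 1\right) - \sum_{i=1}^{Me-1} n_i}{n_{Me}} \]

where x_0 is the lower limit of the median class, K is the class width, n_i are the class frequencies and n_Me is the frequency of the median class.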

Page 18: Statistics 3, 4

The Mode

• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• There may not be a mode
• There may be several modes
• Used for either numerical or categorical data

[Figure: two number-line plots, one on a 0 to 14 axis with Mode = 9 and one on a 0 to 6 axis with No Mode]

Page 19: Statistics 3, 4

THE MODE

The Mode of a set of observations is the variable value that occurs most frequently.

A set of data may have one mode (or modal class), or two or more modes.

The modal class
For large data sets, the modal class is much more relevant than a single-value mode.
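A minimal Python sketch of finding the mode, covering the one-mode, several-modes and no-mode cases. The helper `modes_or_none` and the categorical list are hypothetical illustrations, not part of the lecture. The helper returns an empty list when no value repeats, matching the "there may not be a mode" convention.

```python
from collections import Counter

def modes_or_none(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return [v for v, c in counts.items() if c == top]

# Example 1 hours: 0 occurs twice, every other value once.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(modes_or_none(hours))  # [0]

# No repeated value: no mode.
print(modes_or_none([0, 1, 2, 3, 4, 5, 6]))  # []

# Categorical data (hypothetical) with two modes.
print(modes_or_none(["a", "b", "b", "c", "c"]))  # ['b', 'c']
```

The standard-library function `statistics.multimode` behaves similarly, except that it returns every value when none repeats.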

Page 20: Statistics 3, 4

MEDIAN AND MODE

Mode

\[ Mo = x_0 + K \cdot \frac{\Delta_1}{\Delta_1 + \Delta_2} \]

Page 21: Statistics 3, 4

RELATIONSHIP AMONG MEAN, MEDIAN, AND MODE

• If a distribution is symmetrical, the mean, median and mode coincide.
• If a distribution is nonsymmetrical, and skewed to the left or to the right, the three measures differ.

A positively skewed distribution (“skewed to the right”): Mode < Median < Mean

A negatively skewed distribution (“skewed to the left”): Mean < Median < Mode
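As a rough illustration, the ordering of the three measures can be checked on the Example 1 Internet hours. Comparing mean and median only hints at the direction of skew and is not a formal test. The labels in `shape` are illustrative.

```python
from statistics import fmean, median, multimode

# Internet hours of Example 1.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

mean_hours = fmean(hours)      # 11.0
median_hours = median(hours)   # 8.5
mode_hours = multimode(hours)  # [0]

# Heuristic only: the mean is pulled toward the long tail.
if mean_hours > median_hours:
    shape = "skewed to the right (positive)"
elif mean_hours < median_hours:
    shape = "skewed to the left (negative)"
else:
    shape = "roughly symmetric"

print(mode_hours, median_hours, mean_hours)  # [0] 8.5 11.0
print(shape)  # skewed to the right (positive)
```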

Page 22: Statistics 3, 4

OTHER MEANS

Harmonic

Geometric

Square
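The slide only names these means, so the sketch below relies on their standard textbook definitions, which is an assumption. In particular, "square" is read here as the quadratic mean (root mean square). The three values are hypothetical illustration data.

```python
from math import sqrt
from statistics import fmean, geometric_mean, harmonic_mean

values = [1, 2, 4]  # hypothetical positive observations

# Harmonic mean: n divided by the sum of reciprocals.
harmonic = harmonic_mean(values)

# Geometric mean: n-th root of the product of the values.
geometric = geometric_mean(values)

# Arithmetic mean, for comparison.
arithmetic = fmean(values)

# Quadratic (square) mean: square root of the mean of the squares.
quadratic = sqrt(sum(x * x for x in values) / len(values))

print(round(harmonic, 4))    # 1.7143
print(round(geometric, 4))   # 2.0
print(round(arithmetic, 4))  # 2.3333
print(round(quadratic, 4))   # 2.6458
```

For positive data that are not all equal, the four means come out in the order harmonic < geometric < arithmetic < quadratic, as the printed values show.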

Page 23: Statistics 3, 4

FRACTILES

Quartiles: 3

Percentiles: 99

Page 24: Statistics 3, 4

Summary Measures
• Central Tendency: Mean \( \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \), Median, Mode
• Variation: Range, Variance \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \), Standard Deviation, Coefficient of Variation

Page 25: Statistics 3, 4

Measures of Variation
• Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation: \( CV = \frac{S}{\bar{X}} \cdot 100\% \)

Page 26: Statistics 3, 4

The Range

• Measure of variation
• Difference between the largest and smallest observations:
  Absolute Range = \( x_{Largest} - x_{Smallest} \)
• Relative Range = \( (x_{Largest} - x_{Smallest}) / \text{mean} \)
• Ignores how data are distributed

[Figure: two dot plots on a 7 to 12 axis; in both, Range = 12 - 7 = 5]

Page 27: Statistics 3, 4

INTERQUARTILE RANGE

Can eliminate some outlier problems by using the interquartile range

Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data

Interquartile range = 3rd quartile – 1st quartile

IQR = Q3 – Q1

Page 28: Statistics 3, 4

INTERQUARTILE RANGE

Example:

X minimum   Q1   Median (Q2)   Q3   X maximum
12          30   45            57   70

Each of the four segments between these values holds 25% of the data.

Interquartile range = 57 - 30 = 27

Page 29: Statistics 3, 4

QUARTILES

Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% each).

• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
• Q2 is the same as the median (50% are smaller, 50% are larger).
• Only 25% of the observations are greater than the third quartile, Q3.

Page 30: Statistics 3, 4

QUARTILE FORMULAS

Find a quartile by determining the value in the appropriate position in the ranked data, where

First quartile position: Q1 = 0.25(n+1)

Second quartile position: Q2 = 0.50(n+1) (the median position)

Third quartile position: Q3 = 0.75(n+1)

where n is the number of observed values

Page 31: Statistics 3, 4

QUARTILES

Sample Ranked Data: 11 12 13 16 16 17 18 21 22   (n = 9)

• Example: Find the first quartile.

Q1 is in the 0.25(9+1) = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = 12.5
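The (n+1) position rule can be sketched as a small Python function. The helper name `quartile` is hypothetical. It interpolates linearly between neighbouring ranked values when the position is fractional, which for a position of 2.5 gives the halfway value. Applied to the sample ranked data, it also yields the interquartile range.

```python
def quartile(ranked, q):
    """Value at position q * (n + 1) in ranked data (1-based positions)."""
    n = len(ranked)
    position = q * (n + 1)
    lower = int(position)        # rank just below the position
    fraction = position - lower  # how far to move toward the next rank
    if lower < 1:
        return ranked[0]
    if lower >= n:
        return ranked[-1]
    below, above = ranked[lower - 1], ranked[lower]
    return below + fraction * (above - below)

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # already ranked, n = 9

q1 = quartile(data, 0.25)  # position 2.5
q2 = quartile(data, 0.50)  # position 5.0 (the median)
q3 = quartile(data, 0.75)  # position 7.5
iqr = q3 - q1

print(q1, q2, q3)  # 12.5 16.0 19.5
print(iqr)         # 7.0
```

For this sample, the standard-library call `statistics.quantiles(data, n=4)` returns the same three cut points, since its default "exclusive" method is also based on n + 1.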

Page 32: Statistics 3, 4

DEVIATION

Individual deviation from the mean = \( x_i - \text{mean} \)

Overall deviation = 0, because \( \sum (X_i - \bar{X}) = 0 \)

Summing squared deviations: \( \sum (X_i - \bar{X})^2 \)

or

absolute values of the deviations: \( \sum |x_i - \bar{x}| \)
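A short Python sketch, reusing the Example 1 Internet hours as illustration data, shows why the raw deviations cannot measure spread: they cancel to zero, while the squared and absolute deviations do not.

```python
# Internet hours of Example 1; their mean is 11.0.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
mean_hours = sum(hours) / len(hours)

# Individual deviations from the mean.
deviations = [x - mean_hours for x in hours]

sum_of_deviations = sum(deviations)                 # positives and negatives cancel
sum_of_squares = sum(d ** 2 for d in deviations)    # squared deviations
sum_of_absolutes = sum(abs(d) for d in deviations)  # absolute deviations

print(sum_of_deviations)  # 0.0
print(sum_of_squares)     # 922.0
print(sum_of_absolutes)   # 74.0
```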

Page 33: Statistics 3, 4

Variance

• Important measure of variation
• Shows variation about the mean
• Computed as the arithmetic mean of squared deviations, or as the square of the square (quadratic) mean of the individual deviations
• For the Population: \( \sigma^2 = \frac{\sum (X_i - \mu)^2}{N} \)
• For the Sample: \( s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} \)

For the Population: use N in the denominator.

For the Sample: use n - 1 in the denominator.

Page 34: Statistics 3, 4

Standard Deviation

• Most important measure of variation
• Shows variation about the mean
• For the Population: \( \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} \)
• For the Sample: \( s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \)

For the Population: use N in the denominator.

For the Sample: use n - 1 in the denominator.

Page 35: Statistics 3, 4

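Sample Standard Deviation

\[ s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \]

Data X_i: 10 12 14 15 17 18 18 24

n = 8   Mean = 16

\[ s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095 \]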

Page 36: Statistics 3, 4

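Comparing Standard Deviations

Data X_i: 10 12 14 15 17 18 18 24

N = 8   Mean = 16

\[ s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} = 4.3095 \]

\[ \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} = 4.0311 \]

The value of the standard deviation is larger when the data are considered as a sample.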
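The sample and population standard deviations for these eight values can be checked with a short Python sketch. It first applies the formulas directly and then cross-checks against the standard library. Summing the squared deviations over all eight listed values, including both 18s, gives 130.

```python
from math import sqrt
from statistics import pstdev, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
mean = sum(data) / n  # 16.0

# Sum of squared deviations about the mean.
sum_sq = sum((x - mean) ** 2 for x in data)  # 130.0

# Sample: divide by n - 1. Population: divide by N.
s = sqrt(sum_sq / (n - 1))
sigma = sqrt(sum_sq / n)

print(round(s, 4))      # 4.3095
print(round(sigma, 4))  # 4.0311

# Standard-library equivalents of the two formulas.
print(round(stdev(data), 4))   # 4.3095
print(round(pstdev(data), 4))  # 4.0311
```

Dividing by n - 1 rather than N makes the sample value the larger of the two for the same data.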

Page 37: Statistics 3, 4

Comparing Standard Deviations

[Figure: three dot plots of AGE, each on an 11 to 21 axis]

Data A - AGE: Mean = 15.5, s = 3.338
Data B - AGE: Mean = 15.5, s = .9258
Data C - AGE: Mean = 15.5, s = 4.57

Page 38: Statistics 3, 4

COEFFICIENT OF VARIATION

• Measure of relative variation
• Always a % or coefficient
• Shows variation relative to the mean
• Used to compare 2 or more groups
• Formula (for Sample): \( CV = \frac{S}{\bar{X}} \cdot 100\% \)

Page 39: Statistics 3, 4

COMPARING COEFFICIENT OF VARIATION

Stock A: Average Price last year = $50, Standard Deviation (sd) = $5

Stock B: Average Price last year = $100, Standard Deviation (sd) = $5

Coefficient of Variation: \( CV = \frac{S}{\bar{X}} \cdot 100\% \)

Stock A: CV = (5 / 50) · 100% = 10%

Stock B: CV = (5 / 100) · 100% = 5%

Both average prices are representative.
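The stock comparison can be reproduced with a minimal Python sketch. The function name `coefficient_of_variation` is illustrative. It shows that the same standard deviation of $5 means twice as much relative variation for the stock with the lower average price.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * sd / mean

# Stock A: average price $50, sd $5. Stock B: average price $100, sd $5.
cv_a = coefficient_of_variation(5, 50)
cv_b = coefficient_of_variation(5, 100)

print(cv_a)  # 10.0
print(cv_b)  # 5.0
```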

Page 40: Statistics 3, 4

SHAPE

Describes how data are distributed between the smallest and largest values.

Measures of Shape: symmetric or skewed

• Left-Skewed or Negatively Skewed: Mean < Median < Mode
• Symmetric: Mean = Median = Mode
• Right-Skewed or Positively Skewed: Mode < Median < Mean

Page 41: Statistics 3, 4

BOX PLOT – GRAPHICAL PRESENTATION OF CTM

Page 42: Statistics 3, 4
Page 43: Statistics 3, 4
Page 44: Statistics 3, 4

CENTRAL TENDENCY MEASURES SUMMARY FOR 1 VARIABLE

• Discussed Measures of Central Tendency: Mean, Median, Mode
• Addressed Measures of Variation: The Range, Variance, Standard Deviation, Coefficient of Variation
• Determined Shape of Distributions: Symmetric or Skewed, Coefficient of skewness

Left-skewed: Mean < Median < Mode. Symmetric: Mean = Median = Mode. Right-skewed: Mode < Median < Mean.