Statistics 3, 4

Post on 05-Jul-2015


second year

Transcript of Statistics 3, 4

MEASURES OF LOCATION and VARIATION for 1 variable

Lectures 3+4+5 Topics

• Measures of Central Tendency for numerical and categorical data: Mean, Median, Mode + other means, Fractiles

• Measures of Variation for numerical and binary data: The Range, Variance and Standard Deviation

• Shape: Symmetric, Skewed, Skewness, Kurtosis

Summary Measures

Central Tendency (part of Location): Mean, Median, Mode

Variation: Variance, Standard Deviation, Coefficient of Variation, Range, Fractiles

Measures of Central Tendency

Central Tendency: Mean, Median, Mode

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The Mean (Arithmetic mean, Average)

•It is the Arithmetic Average of data values:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Sample Mean = (Sum of the observations) / (Number of observations)

•The Most Common Measure of Central Tendency

•Affected by Extreme Values (Outliers)

[Figure: two number lines, one from 0 to 10 with Mean = 5, and one extending to 14 with Mean = 6.]

THE ARITHMETIC MEAN

This is the most popular and useful measure of central location.

Sample mean (n = sample size):

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Population mean (N = population size):

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

• Example 1
The reported times spent on the Internet by 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time spent on the Internet.

$$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{x_1 + x_2 + \cdots + x_{10}}{10} = \frac{0 + 7 + \cdots + 22}{10} = 11.0 \text{ hours}$$
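The calculation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch; the names `hours` and `arithmetic_mean` are illustrative and do not come from the lecture.

```python
# Reported Internet time (hours) for the 10 adults in Example 1.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


mean_hours = arithmetic_mean(hours)
print(f"Mean time: {mean_hours:.1f} hours")  # Mean time: 11.0 hours
```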

• Example 2
Suppose the telephone bills represent the population of measurements (N = 200). The population mean is

$$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \cdots + x_{200}}{200} = \frac{42.19 + 38.45 + \cdots + 45.77}{200} = 43.59$$


WEIGHTED MEAN FOR DATA GROUPED BY CATEGORIES OR VARIANTS

$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$

When many of the measurements have the same value, the measurements can be summarized in a frequency table. Suppose the number of children in a sample of 16 families was recorded as follows:

NUMBER OF CHILDREN    0    1    2    3
NUMBER OF FAMILIES    3    4    7    2    (16 families)

$$\bar{x} = \frac{\sum_{i=1}^{4} x_i f_i}{16} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_4 f_4}{16} = \frac{3(0) + 4(1) + 7(2) + 2(3)}{16} = 1.5$$
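The weighted mean of the families example can be sketched in Python as follows. The dictionary layout (value mapped to frequency) and the function name `weighted_mean` are assumptions made for illustration only.

```python
# Frequency table from the families example: number of children -> number of families.
children_counts = {0: 3, 1: 4, 2: 7, 3: 2}


def weighted_mean(freq_table):
    """Mean of tabulated data: sum(x_i * f_i) / sum(f_i)."""
    weighted_total = sum(x * f for x, f in freq_table.items())
    n = sum(freq_table.values())
    return weighted_total / n


mean_children = weighted_mean(children_counts)
print(mean_children)  # 1.5
```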

MEAN FOR TABULATED DATA BY CLASSES

APPROXIMATING DESCRIPTIVE MEASURES FOR GROUPED DATA BY CLASSES

Approximating descriptive measures for grouped data may be needed in two cases: when approximate values suffice, and when only secondary grouped data are available.

$$\bar{x} \approx \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$

where x_i is the midpoint and f_i the frequency of class i.

Class i   Class limits   Frequency f_i   Midpoint x_i   x_i f_i
1         2-5            3               3.5            10.5
2         5-8            6               6.5            39.0
3         8-11           8               9.5            76.0
….        ….             …               ….             ….
6         17-20          2               18.5           37.0

n = sample size = f_1 + … + f_k = 30; sum of x_i f_i = 312.0

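Example 3
Approximate the mean of the telephone call durations problem as represented by the frequency distribution.

[Figure: histogram of call durations, with class boundaries 2, 5, 8, 11, 14, 17, 20, More on the horizontal axis and frequencies from 0 to 10 on the vertical axis.]

Approximated value: $\bar{x} \approx 312.0 / 30 = 10.4$

Real value: $\bar{x} = 10.26$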

The Median

•Important Measure of Central Tendency

•In an ordered array, the median is the “middle” number.

•If n is odd, the median is the middle number.

•If n is even, the median is the average of the 2 middle numbers.

•Not Affected by Extreme Values

[Figure: two number lines, one from 0 to 10 and one extending to 14; Median = 5 in both.]

THE MEDIAN
The median of a set of observations is the value that falls in the middle when the observations are arranged in order of magnitude (ranked in increasing order).

Example
Find the median of the time spent on the Internet for the adults of Example 1.

Even number of observations:
0, 0, 5, 7, 8, 9, 12, 14, 22, 33
Median = (8 + 9) / 2 = 8.5

Comment
Suppose only 9 adults were sampled (exclude, say, the longest time (33)).

Odd number of observations:
0, 0, 5, 7, 8, 9, 12, 14, 22
Median = 8
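The odd/even rule for the median can be sketched as below, using the Example 1 data. The function name `median` and the variable names are illustrative assumptions; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

median_all = median(hours)                # even n = 10 -> (8 + 9) / 2 = 8.5
hours_without_max = sorted(hours)[:-1]    # drop the longest time (33)
median_nine = median(hours_without_max)   # odd n = 9 -> 8

print(median_all, median_nine)
```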

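MEDIAN

Data tabulated discretely: treated as ungrouped data

Data tabulated by classes: estimation

MEDIAN AND MODE

Median

$$Me = x_0 + K \times \frac{\frac{1}{2}\left(\sum n_i + 1\right) - \sum_{i=1}^{Me-1} n_i}{n_{Me}}$$

Here x_0 presumably denotes the lower limit of the median class, K the class width, n_i the class frequencies, and n_Me the frequency of the median class.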

The Mode

•A Measure of Central Tendency

•Value that Occurs Most Often

•Not Affected by Extreme Values

•There May Not be a Mode

•There May be Several Modes

•Used for Either Numerical or Categorical Data

[Figure: two dot plots. On a 0 to 14 scale: Mode = 9. On a 0 to 6 scale: No Mode.]

THE MODE

The mode of a set of observations is the variable value that occurs most frequently.

A set of data may have one mode (or modal class), or two or more modes.

The modal class
For large data sets, the modal class is much more relevant than a single-value mode.
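A minimal Python sketch of finding the mode(s) is given below. It assumes the convention that a data set in which every value occurs exactly once has no mode; the function name `modes` and the second and third data sets are hypothetical illustrations, not data from the lecture.

```python
from collections import Counter


def modes(values):
    """Return all most frequent values; an empty list if every value occurs only once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # assumed convention: no repeated value means "no mode"
    return sorted(v for v, c in counts.items() if c == top)


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]   # Example 1 data
print(modes(hours))                   # [0]    one mode
print(modes([1, 1, 2, 2, 3]))         # [1, 2] several modes (hypothetical data)
print(modes([0, 1, 2, 3, 4, 5, 6]))   # []     no mode (hypothetical data)
```

The same function works for categorical data, since it only counts occurrences.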

MEDIAN AND MODE

Mode

$$Mo = x_0 + K \times \frac{\Delta_1}{\Delta_1 + \Delta_2}$$

RELATIONSHIP AMONG MEAN, MEDIAN, AND MODE

• If a distribution is symmetrical, the mean, median and mode coincide.

• If a distribution is non-symmetrical, and skewed to the left or to the right, the three measures differ.

A positively skewed distribution (“skewed to the right”): Mode < Median < Mean

A negatively skewed distribution (“skewed to the left”): Mean < Median < Mode

OTHER MEANS

Harmonic

Geometric

Square
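The slide only names these means. As a hedged illustration, the sketch below uses their standard textbook definitions (harmonic: n divided by the sum of reciprocals; geometric: n-th root of the product; square, or quadratic: square root of the mean of squares). The data set is hypothetical, and strictly positive values are assumed.

```python
import math


def harmonic_mean(values):
    """n divided by the sum of the reciprocals."""
    return len(values) / sum(1 / x for x in values)


def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


def quadratic_mean(values):
    """Square root of the arithmetic mean of the squares."""
    return math.sqrt(sum(x * x for x in values) / len(values))


data = [2, 4, 8]  # hypothetical positive data
arithmetic = sum(data) / len(data)

print(harmonic_mean(data), geometric_mean(data), arithmetic, quadratic_mean(data))
```

For positive data that are not all equal, the results are ordered harmonic < geometric < arithmetic < quadratic.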

FRACTILES

Quartiles: 3

Percentiles: 99


Summary Measures

Variation: Range, Variance, Standard Deviation, Coefficient of Variation

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Measures of Variation

Variation

• Range

• Variance: Population Variance, Sample Variance

• Standard Deviation: Population Standard Deviation, Sample Standard Deviation

• Coefficient of Variation:

$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$$

The Range

• Measure of Variation

• Difference Between Largest & Smallest Observations:

Absolute Range = $x_{Largest} - x_{Smallest}$

• Relative Range = $(x_{Largest} - x_{Smallest}) / \text{mean}$

•Ignores How Data Are Distributed:

[Figure: two dot plots on a 7 to 12 scale with differently distributed values; in both, Range = 12 - 7 = 5.]
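The two range measures can be sketched in Python as follows. Both data sets are hypothetical, chosen only so that each spans 7 to 12 like the figure; the function names are illustrative.

```python
def absolute_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


def relative_range(values):
    """Absolute range divided by the arithmetic mean."""
    return absolute_range(values) / (sum(values) / len(values))


# Hypothetical data: same extremes, different distribution in between.
spread_out = [7, 8, 9, 10, 11, 12]
clustered = [7, 10, 10, 10, 10, 12]

print(absolute_range(spread_out), absolute_range(clustered))  # 5 5
print(relative_range(spread_out), relative_range(clustered))
```

Both sets give the same absolute range, which illustrates that the range ignores how the data are distributed between the extremes.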

INTERQUARTILE RANGE

Can eliminate some outlier problems by using the interquartile range

Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data

Interquartile range = 3rd quartile – 1st quartile

IQR = Q3 – Q1

INTERQUARTILE RANGE

Example:

X minimum    Q1    Median (Q2)    Q3    X maximum
12           30    45             57    70

Each of the four segments between these values holds 25% of the data.

Interquartile range = 57 – 30 = 27

QUARTILES
Quartiles split the ranked data into 4 segments with an equal number of values per segment:

25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger

• Q2 is the same as the median (50% are smaller, 50% are larger)

• Only 25% of the observations are greater than the third quartile, Q3

QUARTILE FORMULAS

Find a quartile by determining the value in the appropriate position in the ranked data, where

First quartile position: Q1 = 0.25(n+1)

Second quartile position: Q2 = 0.50(n+1) (the median position)

Third quartile position: Q3 = 0.75(n+1)

where n is the number of observed values

QUARTILES

• Example: Find the first quartile

Sample Ranked Data: 11 12 13 16 16 17 18 21 22 (n = 9)

Q1 is in the 0.25(9+1) = 2.5 position of the ranked data,

so use the value halfway between the 2nd and 3rd values,

so Q1 = 12.5
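The position rule q(n+1) can be sketched in Python as below. The slide only shows the halfway case; using linear interpolation for other fractional positions is an assumption of this sketch, as is the function name `quartile`.

```python
def quartile(ranked, q):
    """Value at the 1-based position q * (n + 1) of the ranked data.

    Fractional positions are resolved by linear interpolation between
    the two neighbouring values (an assumption of this sketch).
    """
    n = len(ranked)
    pos = q * (n + 1)
    lower = int(pos)      # integer part of the position
    frac = pos - lower    # fractional part of the position
    if lower < 1:
        return ranked[0]
    if lower >= n:
        return ranked[-1]
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])


ranked_data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

q1 = quartile(ranked_data, 0.25)  # position 2.5 -> halfway between 12 and 13
q2 = quartile(ranked_data, 0.50)  # position 5   -> the median
q3 = quartile(ranked_data, 0.75)  # position 7.5 -> halfway between 18 and 21
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 12.5 16.0 19.5 7.0
```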

DEVIATION

Individual deviation from the mean = $x_i - \bar{x}$

Overall deviation = 0, because $\sum (X_i - \bar{X}) = 0$

Summing squared deviations, $\sum (X_i - \bar{X})^2$,

or

absolute values of the deviations, $\sum |x_i - \bar{x}|$
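The reason the deviations always sum to zero follows directly from the definition of the mean, since the sum of the observations equals n times the mean:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

This is why the deviations have to be squared, or taken in absolute value, before they are summed.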

Variance

•Important Measure of Variation

•Shows Variation About the Mean

• Computed as the arithmetic mean of the squared deviations (equivalently, the square of the quadratic, or "square", mean of the individual deviations)

•For the Population:

$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$

•For the Sample:

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

For the Population: use N in the denominator.

For the Sample: use n - 1 in the denominator.

Standard Deviation

•Most Important Measure of Variation

•Shows Variation About the Mean:

•For the Population:

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$

•For the Sample:

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

For the Population: use N in the denominator.

For the Sample: use n - 1 in the denominator.

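Sample Standard Deviation

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

Data X_i: 10 12 14 15 17 18 18 24

n = 8, Mean = 16

$$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$$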
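The sample standard deviation of this data set can be worked through step by step in Python. This is a sketch with illustrative variable names; note that all eight observations, including both values of 18, enter the sum of squared deviations, which is then 130.

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)                                          # 8
mean = sum(data) / n                                   # 16.0
squared_deviations = [(x - mean) ** 2 for x in data]   # 36, 16, 4, 1, 1, 4, 4, 64
sum_sq = sum(squared_deviations)                       # 130.0

sample_variance = sum_sq / (n - 1)                     # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

print(round(sample_sd, 4))  # 4.3095
```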

Comparing Standard Deviations

Data X_i: 10 12 14 15 17 18 18 24

N = 8, Mean = 16

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = 4.3095$$

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 4.0311$$

The value of the standard deviation is larger when the data are considered as a sample.
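The sample and population versions can also be compared with Python's standard `statistics` module, where `stdev` divides by n - 1 and `pstdev` divides by N. The sketch below uses all eight observations of the data set; the variable names are illustrative.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]

s = statistics.stdev(data)       # sample standard deviation (n - 1 in the denominator)
sigma = statistics.pstdev(data)  # population standard deviation (N in the denominator)

print(round(s, 4), round(sigma, 4))  # 4.3095 4.0311

# The two differ only by the denominator, so s = sigma * sqrt(N / (N - 1)).
ratio = math.sqrt(len(data) / (len(data) - 1))
print(math.isclose(s, sigma * ratio))  # True
```

Because N / (N - 1) is greater than 1, the sample value is always the larger of the two for the same data.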

Comparing Standard Deviations

[Figure: three dot plots of AGE, each on an 11 to 21 scale.]

Data A - AGE: Mean = 15.5, s = 3.338

Data B - AGE: Mean = 15.5, s = .9258

Data C - AGE: Mean = 15.5, s = 4.57

COEFFICIENT OF VARIATION

• Measure of Relative Variation

• Always a % or coefficient

• Shows Variation Relative to Mean

• Used to Compare 2 or More Groups

• Formula (for Sample):

$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$$

COMPARING COEFFICIENT OF VARIATION

Stock A: Average Price last year = $50

Standard Deviation (sd) = $5

Stock B: Average Price last year = $100

Standard Deviation (sd) = $5

Coefficient of Variation:

CV = (S / X̄) · 100%

Stock A: CV = (5 / 50) · 100% = 10%

Stock B: CV = (5 / 100) · 100% = 5%

Both average prices are representative.
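The stock comparison can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation relative to the mean, in percent."""
    return 100 * sd / mean


cv_a = coefficient_of_variation(sd=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(sd=5, mean=100)   # Stock B

print(f"Stock A: CV = {cv_a:.0f}%")  # Stock A: CV = 10%
print(f"Stock B: CV = {cv_b:.0f}%")  # Stock B: CV = 5%
```

Although both stocks have the same standard deviation of $5, Stock A varies twice as much relative to its average price.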

SHAPE
Describes how data are distributed between the smallest and largest values.

Measures of Shape: symmetric or skewed

Left-Skewed or Negatively Skewed: Mean < Median < Mode

Symmetric: Mean = Median = Mode

Right-Skewed or Positively Skewed: Mode < Median < Mean
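As a rough illustration, the direction of skew can be guessed by comparing the mean with the median. The sketch below applies this rule of thumb to the Internet-time data of Example 1; it is only a heuristic under the assumption of a roughly unimodal distribution, not a formal coefficient of skewness, and the function name `skew_direction` is hypothetical.

```python
import statistics


def skew_direction(mean, median):
    """Rule of thumb: the mean is pulled toward the longer tail."""
    if mean > median:
        return "right-skewed (positive)"
    if mean < median:
        return "left-skewed (negative)"
    return "symmetric"


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]   # Example 1 data

mean_hours = sum(hours) / len(hours)         # 11.0
median_hours = statistics.median(hours)      # 8.5

print(skew_direction(mean_hours, median_hours))  # right-skewed (positive)
```

The large value of 33 hours pulls the mean above the median, which is consistent with a tail to the right.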

BOX PLOT – GRAPHICAL PRESENTATION OF CTM

CENTRAL TENDENCY MEASURES SUMMARY FOR 1 VARIABLE

• Discussed Measures of Central Tendency: Mean, Median, Mode

• Addressed Measures of Variation: The Range, Variance, Standard Deviation, Coefficient of Variation

• Determined Shape of Distributions: Symmetric or Skewed, Coefficient of skewness