1 Descriptive Statistics: Numerical Methods Chapter III.

Descriptive Statistics: Numerical Methods

Chapter III

Key Learning Objectives and Topics in this Chapter

Measures of Location: (Mean, Median, Mode, Percentiles, Quartiles)

Measures of Dispersion/Variability (Range, Variance, Standard Deviation, Coefficient of Variation)

Measures of distribution shape, and association between two variables

Important Note

In all cases:

Know the formulas, learn the computation procedures (i.e., apply the formulas), and know the meaning (interpretation) of the measures computed.

Use Excel. Practice! Practice! and Practice!

3.1. Introduction

When describing data, we usually focus our attention on two types of measures:

• Central location (e.g., average)
• Variability or spread

These measures can be computed for:

• A population, in which case they are called parameters
• A sample, in which case they are called statistics

3.2 Measures of Central Location

A center is a reference point. Thus, a good measure of central location is expected to reflect the locations of all the actual points in the data.

How?

• With one data point, the central location is clearly at the point itself.
• With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).
• If a third data point appears on the left-hand side of the center, it should “pull” the central location to the left.

Measures of Location

• Mean
• Median
• Mode
• Percentiles
• Quartiles

If the measures are computed for data from a sample, they are called sample statistics.

If the measures are computed for data from a population, they are called population parameters.

A sample statistic is referred to as the point estimator of the corresponding population parameter.

i) The Arithmetic Mean (µ)

This is the most popular and useful measure of central location.

$$\text{Mean} = \frac{\text{Sum of the observations}}{\text{Number of observations}}$$

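i) The Arithmetic Mean

Sample mean:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The numerator is the sum of the values of the observations in the data, and $n$ is the number of observations in the sample (sample size).

Population mean:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

The numerator is the sum of the values of the observations in the data, and $N$ is the number of observations in the population (population size).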

i) The Arithmetic Mean

• Example 1: The times (in hours) spent by 10 adults on the Internet are as follows: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22.

Based on this data, compute the mean (average) amount of time spent on the Internet.

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0 + 7 + 12 + 5 + 33 + 14 + 8 + 0 + 9 + 22}{10} = \frac{110}{10} = 11 \text{ hours}$$

Based on this data, the average amount of time spent on the Internet by a typical adult is 11 hours.
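The computation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch; the function name `sample_mean` is a hypothetical helper chosen for illustration, not something from the slides.

```python
# Internet-use times (hours) for the 10 adults in Example 1.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]


def sample_mean(values):
    """Arithmetic mean: sum of the observations / number of observations."""
    return sum(values) / len(values)


mean_hours = sample_mean(hours)
print(mean_hours)  # 11.0
```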

ii) The Median

The median of a set of observations is the value that falls in the middle when the data are arranged in order (ascending or descending).

It is the value that divides the observations into two equal halves.

ii) The Median

To find the median, put the data in an array (in increasing or decreasing order).

• If the total number of observations in the data set is an ODD number, the median is the middle value.
• If the total number of observations in the data set is EVEN, the median is the AVERAGE of the two middle values.

ii) The Median

Example 2a: Find the median for the following observations.

0, 7, 12, 5, 14, 8, 0, 9, 22

Step-1: Arrange the data in increasing/decreasing order:

0, 0, 5, 7, 8, 9, 12, 14, 22

Step-2: Count the total number of observations in the data (9). This is an odd number of observations, so the median is the middle (5th) value.

Median = 8

ii) The Median

Example 2b: Find the median for the following observations.

0, 7, 12, 5, 33, 14, 8, 0, 9, 22

Step-1: Arrange the data in increasing/decreasing order:

0, 0, 5, 7, 8, 9, 12, 14, 22, 33

Step-2: Count the total number of observations in the data (10). This is an even number of observations, so the median is the average of the two middle (5th and 6th) values.

Median = (8 + 9)/2 = 8.5
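The odd/even rule used in Examples 2a and 2b can be sketched as a small Python function. The name `median` and the variable names are illustrative assumptions; the logic follows the two steps on the slides (sort, then pick the middle value or average the two middle values).

```python
def median(values):
    """Median by the slide's rule: sort, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    ordered = sorted(values)          # Step 1: arrange in increasing order
    n = len(ordered)                  # Step 2: count the observations
    mid = n // 2
    if n % 2 == 1:                    # odd: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: average of middle two


example_2a = [0, 7, 12, 5, 14, 8, 0, 9, 22]
example_2b = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(median(example_2a))  # 8
print(median(example_2b))  # 8.5
```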

ii) The Median

Note:

The median (8 in Example 2a) of a data set with an odd number of observations is a member of the data values.

The median (8.5 in Example 2b) of a data set with an even number of observations is not necessarily a member of the set of values.

Unlike the mean, the median is not affected by extreme values in the data set.
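The contrast between the mean and the median can be illustrated with a short sketch. It reuses the Example 2b data and then replaces 33 with 330; the value 330 is made up purely for illustration and does not come from the slides.

```python
import statistics

original = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]       # Example 2b data
with_extreme = [0, 7, 12, 5, 330, 14, 8, 0, 9, 22]  # 33 replaced by a made-up 330


def mean(values):
    return sum(values) / len(values)


print(mean(original), statistics.median(original))          # 11.0 8.5
print(mean(with_extreme), statistics.median(with_extreme))  # 40.7 8.5
```

The mean moves from 11.0 to 40.7, while the median stays at 8.5.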

iii) The Center: Mode

The mode is the value that occurs most frequently in the data. It is the value with the highest frequency.

In any data set there is only one value for the mean or the median. However, a data set may have more than one value for the mode.

iii) The Center: Mode

[Figure: two histograms of income distribution, one with one modal class and one with two modal classes.]

iii) The Center: Mode

Example 3: What is the mode for the following data?

0, 7, 12, 5, 33, 14, 8, 0, 9, 22

Solution: All observations except “0” occur once. There are two “0” values. Thus, the mode is zero.

Is this a good measure of central location?

The value “0” does not reside at the center of this set (compare with the mean = 11.0 and the median = 8.5).
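Finding the mode amounts to counting frequencies, which can be sketched as follows. The helper `modes` is a hypothetical name; it returns every value that shares the highest frequency, since a data set may have more than one mode. The second list is a made-up data set used only to show the two-mode case.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


example_3 = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(modes(example_3))        # [0]

two_modes = [1, 1, 2, 2, 3]    # made-up data with two modes
print(modes(two_modes))        # [1, 2]
```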

Comparing Measures of Central Tendency: Mean, Median, Mode

• If mean = median = mode, the shape of the distribution is symmetric.

Comparing Measures of Central Tendency: Mean, Median, Mode

• If mode < median < mean, the shape of the distribution trails to the right; it is positively skewed.

• If mode > median > mean, the shape of the distribution trails to the left; it is negatively skewed.

[Figure: a positively skewed distribution (“skewed to the right”) and a negatively skewed distribution (“skewed to the left”), each with the positions of the mode, median, and mean marked.]
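As a rough illustration, the comparison rules can be written as a small function and applied to the values from Example 3 (mean 11.0, median 8.5, mode 0). The function name `skew_direction` and the fallback label are assumptions made for this sketch; the rules themselves are only a rule of thumb.

```python
def skew_direction(mean, median, mode):
    """Classify the shape using the mean/median/mode ordering from the slides."""
    if mean == median == mode:
        return "symmetric"
    if mode < median < mean:
        return "positively skewed"
    if mode > median > mean:
        return "negatively skewed"
    return "no clear pattern"   # orderings the slides do not cover


# Values from Example 3: mean = 11.0, median = 8.5, mode = 0
print(skew_direction(mean=11.0, median=8.5, mode=0))  # positively skewed
```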

Percentiles

A percentile provides information about the relative location and spread of the data from the smallest to the largest value.

It is a measure of relative location, but not necessarily of central location.

A percentile tells us the proportion of observations that lie below or above a certain value in the data.

Example: Admission test scores for colleges and universities are frequently reported in terms of percentiles.

Percentiles

Definition:

The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

Computing Percentiles

• Arrange the data in ascending order.
• Compute the position $i$ of the $p$th percentile as $i = \left(\frac{p}{100}\right) n$, where $n$ is the number of observations.
• If $i$ is not an integer, round up. The $p$th percentile is the value in the $i$th position.
• If $i$ is an integer, the $p$th percentile is the average of the values in positions $i$ and $i + 1$.
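The percentile procedure can be sketched in Python as below. The function name `percentile` and the restriction to 0 < p < 100 are assumptions of this sketch; positions are 1-based as on the slide, so position $i$ corresponds to list index `i - 1`. The data are the ten Internet-use times from the earlier examples.

```python
import math


def percentile(values, p):
    """p-th percentile using the index rule i = (p/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)           # arrange the data in ascending order
    n = len(ordered)
    i = p * n / 100                    # position of the p-th percentile
    if i != int(i):                    # not an integer: round up
        return ordered[math.ceil(i) - 1]
    i = int(i)                         # integer: average positions i and i + 1
    return (ordered[i - 1] + ordered[i]) / 2


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(percentile(hours, 75))  # 14   (i = 7.5, rounded up to the 8th value)
print(percentile(hours, 50))  # 8.5  (i = 5, average of 5th and 6th values)
```

The 50th percentile agrees with the median found in Example 2b.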

Compute the 75th percentile of the following data

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

The computation below uses n = 10, which corresponds to the first ten values (the first row) of the data.

i = (p/100)n = (75/100) × 10 = 7.5

Rounding 7.5 up to 8, we note that the 8th data value is 435.

The 75th Percentile = 435

Compute the 50th percentile of the following data

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

The computation below uses n = 10, which corresponds to the first ten values (the first row) of the data.

i = (p/100)n = (50/100) × 10 = 5

Averaging the 5th and 6th data values, we get

50th Percentile = (435 + 435)/2 = 435

Quartiles

Quartiles are specific percentiles.

First Quartile = 25th Percentile

Second Quartile = 50th Percentile = the Median

Third Quartile = 75th Percentile

Quartiles

Quartiles divide a data set into four equal parts.

$$Q_1 = \frac{1(N+1)}{4}; \quad Q_2 = \frac{2(N+1)}{4}; \quad Q_3 = \frac{3(N+1)}{4}$$

where $Q_i$ is the location of the $i$th quartile.
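The quartile location formula can be evaluated with a one-line function. This sketch assumes that $N$ is the number of observations in the ordered data and only computes the location, not the quartile value itself; the name `quartile_location` is hypothetical. It uses N = 9, the size of the data set in Example 2a.

```python
def quartile_location(k, n):
    """Location of the k-th quartile (k = 1, 2, 3) among n ordered observations."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    return k * (n + 1) / 4


# N = 9 observations, as in Example 2a
for k in (1, 2, 3):
    print(k, quartile_location(k, 9))
# 1 2.5
# 2 5.0
# 3 7.5
```

With N = 9, the second quartile falls at location 5, which is the position of the median (8) in Example 2a.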

3.2 Measures of Variability

Measures of central location fail to tell the whole story about the distribution.

A question of interest that remains unanswered even after obtaining measures of central location is: how spread out are the observations around the central (say, mean) value?

• Variability is important in business decisions.

• For example, in choosing between two suppliers A and B, we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Measures of Variability

Range

Inter-Quartile Range

Variance

Standard Deviation

Coefficient of Variation

i) The Range

The range in a set of observations is the difference between the largest and smallest observations, i.e., the distance between the smallest and the largest data value in the set.

• Range = largest value – smallest value

Its major advantage is the ease with which it can be computed.

Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points.

It is also very sensitive to the smallest and largest data values.
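A short sketch of the range computation, using the Internet-use data from the earlier examples. The helper name `data_range` and the modified data set (33 replaced by a made-up 330) are assumptions added to illustrate the sensitivity to the end points.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(data_range(hours))               # 33

# Changing only the largest value (made-up 330) changes the range completely.
hours_with_extreme = [0, 7, 12, 5, 330, 14, 8, 0, 9, 22]
print(data_range(hours_with_extreme))  # 330
```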

ii) Interquartile Range

• Interquartile range = Q3 – Q1

This is a measure of the spread of the middle 50% of the observations.

A large value indicates a large spread of the observations.

It is not sensitive to extreme data values.
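The interquartile range can be sketched by taking Q1 and Q3 as the 25th and 75th percentiles, computed with the index rule i = (p/100)n from the percentile slides. The helper names are hypothetical, and the value 330 is made up to show that an extreme value leaves the result unchanged.

```python
import math


def percentile(values, p):
    """p-th percentile (0 < p < 100) using the index rule i = (p/100) * n."""
    ordered = sorted(values)
    i = p * len(ordered) / 100
    if i != int(i):                       # not an integer: round up
        return ordered[math.ceil(i) - 1]
    i = int(i)                            # integer: average positions i and i + 1
    return (ordered[i - 1] + ordered[i]) / 2


def interquartile_range(values):
    """IQR = Q3 - Q1, with Q1 and Q3 as the 25th and 75th percentiles."""
    return percentile(values, 75) - percentile(values, 25)


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
hours_with_extreme = [0, 7, 12, 5, 330, 14, 8, 0, 9, 22]  # made-up extreme value
print(interquartile_range(hours))               # 9  (Q3 = 14, Q1 = 5)
print(interquartile_range(hours_with_extreme))  # 9
```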

iii) The Variance

The variance is a measure of variability that utilizes all the data.

It is the average of the squared differences between each data value and the measure of central location (the mean).

It is calculated differently for a population and for a sample.

iii) The Variance

Variance of a population:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Variance of a sample:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
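The two variance formulas differ only in the divisor, which the following sketch makes explicit. The function names are hypothetical, and the data are the ten Internet-use times from Example 1, treated first as a whole population and then as a sample purely for comparison.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(population_variance(hours))        # 92.2
print(round(sample_variance(hours), 2))  # 102.44
```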

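iii) The Variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Why divide by n-1 instead of n?

• Dividing by n-1 gives a better approximation of the population variance.

Why square the difference?

• The sum of the deviations from the mean is zero, so the unsquared deviations cannot be used to measure spread.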

Example: Computing the Variance Based on Sample Data

Find the variance of the following sample observations:

9 11 8 12

Variance of a sample:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Computing Variance of a sample

Step-1: Find the mean

$$\bar{x} = \frac{9 + 11 + 8 + 12}{4} = \frac{40}{4} = 10$$

Step-2: Compute deviations from the mean

8 - 10 = -2
9 - 10 = -1
11 - 10 = +1
12 - 10 = +2

Step-3: Square the deviations, add them together, and divide the sum of the squared deviations by n-1

$$s^2 = \frac{(-1)^2 + 1^2 + (-2)^2 + 2^2}{4 - 1} = \frac{10}{3} = 3.33$$

iv) Standard Deviation

The standard deviation of a set of observations is the square root of the variance.

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample standard deviation: $s = \sqrt{s^2}$

Why Standard Deviation?

The standard deviation is often reported in the actual unit of measure in which the data is recorded.

Thus, it can be used to compare the variability of several distributions that are measured in the same units.

It can also be used to make a statement about the general shape of a distribution (Kurtosis).

Computing the standard deviation

Step-1: Find the mean

$$\bar{x} = \frac{9 + 11 + 8 + 12}{4} = \frac{40}{4} = 10$$

Step-2: Compute deviations from the mean

8 - 10 = -2
9 - 10 = -1
11 - 10 = +1
12 - 10 = +2

Step-3: Square the deviations, add them together, and divide the sum of the squared deviations by n-1

$$s^2 = \frac{(-1)^2 + 1^2 + (-2)^2 + 2^2}{4 - 1} = \frac{10}{3} = 3.33$$

Step-4: Take the square root of the variance

$$s = \sqrt{s^2} = \sqrt{3.33} \approx 1.82$$
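The four steps can be traced in Python for the sample 9, 11, 8, 12. This is a sketch with illustrative variable names; it carries the unrounded variance 10/3 into the square root, so the last digits may differ slightly from a hand computation that rounds the variance to 3.33 first.

```python
import math

sample = [9, 11, 8, 12]
n = len(sample)

# Step 1: find the mean
mean = sum(sample) / n                               # 10.0

# Step 2: compute deviations from the mean
deviations = [x - mean for x in sample]              # [-1.0, 1.0, -2.0, 2.0]

# Step 3: square the deviations, add them, divide by n - 1
sum_of_squares = sum(d ** 2 for d in deviations)     # 10.0
variance = sum_of_squares / (n - 1)                  # 3.333...

# Step 4: take the square root of the variance
std_dev = math.sqrt(variance)

print(round(variance, 2), round(std_dev, 2))  # 3.33 1.83
```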

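v) Coefficient of Variation

The coefficient of variation is a measure of how large the standard deviation is relative to the mean.

The coefficient of variation is computed as follows:

$$CV = \left(\frac{s}{\bar{x}} \times 100\right)\% \quad \text{for a sample}$$

$$CV = \left(\frac{\sigma}{\mu} \times 100\right)\% \quad \text{for a population}$$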

Why Coefficient of Variation?

Example: Is a standard deviation of 10 large?

A standard deviation of 10 may be perceived as large when the mean value is 100, but it is only moderately large if the mean value is 500.

The coefficient of variation can be used to compare variability in data sets that are measured in different units.
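The question "Is a standard deviation of 10 large?" can be made concrete by computing the coefficient of variation for the two means mentioned above. The function name `coefficient_of_variation` is a hypothetical helper for this sketch.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


print(coefficient_of_variation(10, 100))  # 10.0  (10% of the mean)
print(coefficient_of_variation(10, 500))  # 2.0   (2% of the mean)
```

The same standard deviation is five times larger relative to a mean of 100 than relative to a mean of 500.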

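Variance, Standard Deviation, and Coefficient of Variation

Variance:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$$

Standard Deviation:

$$s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$$

Coefficient of Variation:

$$\left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{54.74}{490.80} \times 100\right)\% = 11.15\%$$

The standard deviation is about 11% of the mean.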

Compute every single measure of central location and variability you have learned in this chapter for the following sample rent data on 70 efficiency apartments:

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
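One possible way to check answers to this exercise is with Python's standard `statistics` module, as sketched below. The variable names are illustrative. The sketch treats the 70 rents as a sample (divisor n - 1) and covers the mean, median, mode, range, variance, standard deviation, and coefficient of variation; percentiles and quartiles are left out because their values depend on the computation rule used.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(rents)             # sample mean
median = statistics.median(rents)         # average of the 35th and 36th values
mode = statistics.mode(rents)             # most frequent value
rent_range = max(rents) - min(rents)      # largest - smallest
variance = statistics.variance(rents)     # sample variance (divides by n - 1)
std_dev = statistics.stdev(rents)         # sample standard deviation
cv = 100 * std_dev / mean                 # coefficient of variation, in percent

print(f"n = {len(rents)}")
print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
print(f"range = {rent_range}")
print(f"variance = {variance:.2f}, std dev = {std_dev:.2f}, CV = {cv:.2f}%")
```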