DESCRIPTIVE MEASURES. Frequency distributions and graphs may be considered a first kind of...

DESCRIPTIVE MEASURES


Frequency distributions and graphs may be considered a first kind of summarization. However, they are not helpful when we need to describe verbally the main features of a data set.

Summaries are extremely useful in understanding and communicating the most important characteristics of a data set.

Example: these techniques can help us to graph data on family incomes. However, we may want to know the income of a "typical" family, the spread of the distribution of incomes, or the location of a family with a particular income.

[Figure: distribution of family incomes (horizontal axis: Income), annotated with the center ($56,260), the spread, and the position of a particular family.]

TYPES OF DESCRIPTIVE MEASURES

Such questions can be answered using summary measures. Included among these are:

1. Measures of Central Tendency

2. Measures of Position

3. Measures of Dispersion

MEASURES OF CENTRAL TENDENCY

The simplest and most extreme kind of summary is to reduce the entire group of observations down to one single value that best represents the data set.

This single-value summary should be a value that is typical of the observations of the group (a population or a sample).

Measures of central tendency are measures of the location of the middle or the center of a distribution. The definition of "middle" or "center" is purposely left somewhat vague so that the term "central tendency" can refer to a wide variety of measures.


Mode: for every kind of qualitative variable ("nominal" and "ordinal") and for quantitative variables

Median: for qualitative "ordinal" variables and quantitative variables

Mean: only for quantitative variables

Nominal variables: the values do not have any quantitative meaning and there is no ordering relationship between them. E.g.: gender.

Ordinal variables: there is an order relationship between values, but the difference between two successive modalities is not quantifiable. E.g.: a three-point rating scale measuring customer satisfaction ("Not Satisfied", "Satisfied", "Very Satisfied").

Mode

Definition:

Mode is a French word that means fashion. In statistics, the mode represents the most common value in a data set.

The mode is the value that occurs with the highest frequency in a data set.

Mode: examples

Stress on Job    Frequency (ni)
Very             10
Somewhat         14    <- highest frequency
None              6

Mode = "Somewhat"

Vehicles Owned    Number of Households (ni)
0                  2
1                 18    <- highest frequency
2                 11
3                  4
4                  3
5                  2

Mode = 1
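As a minimal sketch, the two modes above can be read off the frequency tables in Python. The helper name `mode_from_frequencies` is hypothetical, and the sketch assumes a single highest frequency (ties are not handled).

```python
# Frequency tables copied from the two examples above.
stress = {"Very": 10, "Somewhat": 14, "None": 6}
vehicles = {0: 2, 1: 18, 2: 11, 3: 4, 4: 3, 5: 2}

def mode_from_frequencies(freq):
    """Return the value whose frequency is highest (hypothetical helper).

    Assumes exactly one value has the highest frequency.
    """
    return max(freq, key=freq.get)

stress_mode = mode_from_frequencies(stress)
vehicles_mode = mode_from_frequencies(vehicles)
print(stress_mode, vehicles_mode)  # Somewhat 1
```

The same function works for qualitative and quantitative values because it only compares frequencies, never the values themselves.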

Mode - Grouped data: example

When quantitative variables are grouped in classes, the mode is defined as the class interval where most observations lie. This is called the modal-class interval.

MODAL-CLASS INTERVAL: the class interval that occurs with the highest frequency in a data set.

Weekly Earnings (dollars)    Number of Employees (n)
400 -| 600                   14
600 -| 800                   22
800 -| 1000                  49    <- highest frequency
1000 -| 1200                 20
1200 -| 1400                  9
1400 -| 1600                  6

Modal-class interval = 800 -| 1000

Mode: other features

One advantage of the mode is that it can be calculated for both kinds of data, quantitative and qualitative.

The mode is rarely used as a measure of central tendency for numeric variables. However, for categorical variables, the mode is more useful because the mean and median do not make sense.

A data set may have none or many modes:

• The data set with only one mode is called unimodal.

• The data set with two modes is called bimodal.

• The data set with more than two modes is called multimodal.
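The classification above can be sketched in Python as follows. The function names and the sample lists are made up for illustration, and the sketch assumes the common textbook convention that a data set in which every value occurs exactly once has no mode.

```python
from collections import Counter

def find_modes(values):
    """Return all values that share the highest frequency (hypothetical helper).

    Assumption: if every value occurs only once, the data set has no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

def classify(values):
    """Label a data set by its number of modes."""
    k = len(find_modes(values))
    return {0: "no mode", 1: "unimodal", 2: "bimodal"}.get(k, "multimodal")

# Illustrative lists, not taken from the slides.
print(classify([1, 2, 2, 3]))           # unimodal
print(classify([1, 1, 2, 2, 3]))        # bimodal
print(classify([1, 1, 2, 2, 3, 3, 4]))  # multimodal
print(classify([1, 2, 3]))              # no mode
```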

Median

Definition

The median is the value of the middle term in a data set that has been ranked in increasing order.

In other words, the median divides a ranked data set into two equal parts.

The calculation of the median consists of the following two steps:

1. Rank the data set in increasing order;

2. Find the middle term in a data set with n values. The value of this term is the median.

The position of the middle term in a data set with n values is obtained as follows:

Position of the middle term = \( \frac{n+1}{2} \) if n is odd

Positions of the two middle terms = \( \frac{n}{2} \) and \( \frac{n}{2}+1 \) if n is even (the median is the average of the two middle values)

Thus, we can redefine the median as follows:

Median = value of the \( \frac{n+1}{2} \)th term in a ranked data set if n is odd

Median = value of the \( \frac{\frac{n}{2} + \left(\frac{n}{2}+1\right)}{2} \)th term in a ranked data set if n is even, that is, the average of the \( \frac{n}{2} \)th and \( \left(\frac{n}{2}+1\right) \)th values

Median: example 1

The following data give the weight lost (in pounds) by a sample of five members of a health club at the end of two months of membership:

10 5 19 8 3

1. We rank the given data in increasing order as follows:

3 5 8 10 19

2. We find the position of the middle term (n is odd):

Position of the middle term = \( \frac{n+1}{2} = \frac{5+1}{2} = 3 \)

Therefore, the median is the value of the third term in the ranked data.

3 5 [8] 10 19

The median weight loss for this sample of five members of this health club is 8 pounds.

Median: example 2

The table lists the total revenue for the 12 top-grossing North American concert tours of all time.

Tour                        Artist                   Total Revenue (millions of dollars)
Steel Wheels, 1989          The Rolling Stones        98.0
Magic Summer, 1990          New Kids on the Block     74.1
Voodoo Lounge, 1994         The Rolling Stones       121.2
The Division Bell, 1994     Pink Floyd               103.5
Hell Freezes Over, 1994     The Eagles                79.4
Bridges to Babylon, 1997    The Rolling Stones        89.3
Popmart, 1997               U2                        79.9
Twenty-Four Seven, 2000     Tina Turner               80.2
No Strings Attached, 2000   'N-Sync                   76.4
Elevation, 2001             U2                       109.7
Popodyssey, 2001            'N-Sync                   86.8
Black and Blue, 2001        The Backstreet Boys       82.1

1. We rank the given data in increasing order as follows:

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

2. We find the position of the middle term (n is even):

Position of the middle term = \( \frac{\frac{12}{2} + \left(\frac{12}{2}+1\right)}{2} = \frac{6+7}{2} = 6.5 \)

74.1 76.4 79.4 79.9 80.2 [82.1 86.8] 89.3 98.0 103.5 109.7 121.2

The median is given by the mean of the sixth and the seventh values in the ranked data.

Median = \( \frac{82.1 + 86.8}{2} = 84.45 \) = $84.45 million
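The two-step procedure and both examples above can be sketched in Python. This is an illustrative implementation for numeric data only; the function name `median` is chosen here for the sketch.

```python
def median(values):
    """Median of numeric data, following the two steps in the text."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ranked = sorted(values)              # step 1: rank in increasing order
    n = len(ranked)
    if n % 2 == 1:
        # odd n: the (n + 1)/2-th term (1-based), so subtract 1 for indexing
        return ranked[(n + 1) // 2 - 1]
    # even n: average of the (n/2)-th and (n/2 + 1)-th terms
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2

weight_loss = [10, 5, 19, 8, 3]                      # example 1
revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,    # example 2
            79.9, 80.2, 76.4, 109.7, 86.8, 82.1]

print(median(weight_loss))           # 8
print(round(median(revenues), 2))    # 84.45
```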


The following data consider a questionnaire item on the time involvement of 11 scientists in the 'perception and identification of research problems':

Great, Very low or nil, Very great, Very great, Great, Very great, Medium, Low, Great, Medium, Medium.

We rank the given data in increasing order as follows:

Very low or nil, Low, Medium, Medium, Medium, Great, Great, Great, Very great, Very great, Very great.

Since n = 11 is odd, the middle term is the (11 + 1)/2 = 6th term, so the median is "Great".

Median: frequency distribution

In order to find the median using frequency distributions, you must calculate the cumulative frequency distribution.

Then, the first value with a cumulative frequency greater than or equal to the position of the middle value is the median.

If the position of the middle value is exactly 0.5 more than the cumulative frequency of the previous value, then the median is the midpoint between the two values.


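Stress on Job    Frequency (ni)    Cumulative frequency
None              6                 6
Somewhat         14                20
Very             10                30

Position of the middle term = \( \frac{\frac{30}{2} + \left(\frac{30}{2}+1\right)}{2} = 15.5 \)

Median = "Somewhat", the first value with a cumulative frequency ≥ 15.5.

Vehicles Owned    Number of Households (ni)    Cumulative frequency
0                  2                            2
1                 18                            2 + 18 = 20
2                 11                            2 + 18 + 11 = 31
3                  4                            2 + 18 + 11 + 4 = 35
4                  3                            2 + 18 + 11 + 4 + 3 = 38
5                  2                            2 + 18 + 11 + 4 + 3 + 2 = 40

Position of the middle term = \( \frac{\frac{40}{2} + \left(\frac{40}{2}+1\right)}{2} = 20.5 \)

The first value with a cumulative frequency ≥ 20.5 is 2, but 20.5 is exactly 0.5 more than the previous cumulative frequency (20, for the value 1), so the median is the midpoint between the two values:

Median = \( \frac{1 + 2}{2} = 1.5 \)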
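A minimal Python sketch of the cumulative-frequency rule follows. The function name is hypothetical; it assumes the values are listed in increasing order with positive frequencies, and that the midpoint case only arises for numeric values.

```python
def median_from_frequencies(values, freqs):
    """Median from a frequency table (hypothetical helper).

    values: distinct values in increasing order
    freqs:  their (positive) frequencies
    """
    n = sum(freqs)
    position = (n + 1) / 2     # same as (n/2 + (n/2 + 1)) / 2 when n is even
    cumulative = 0
    for i, (value, f) in enumerate(zip(values, freqs)):
        cumulative += f
        if cumulative >= position:
            # first value whose cumulative frequency reaches the position
            return value
        if position - cumulative == 0.5:
            # the middle falls between this value and the next one
            return (value + values[i + 1]) / 2
    raise ValueError("empty frequency table")

stress_median = median_from_frequencies(["None", "Somewhat", "Very"], [6, 14, 10])
vehicles_median = median_from_frequencies([0, 1, 2, 3, 4, 5], [2, 18, 11, 4, 3, 2])
print(stress_median, vehicles_median)  # Somewhat 1.5
```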

Mean

Definition

The mean, also called the arithmetic mean, is the most frequently used measure of central tendency and is obtained by dividing the sum of all values by the number of values in the data set.

Mean = \( \frac{\text{Sum of all values}}{\text{Number of values}} \)

The mean calculated for sample data is denoted by \( \bar{x} \) and the mean calculated for the population is denoted by \( \mu \):

\( \bar{x} = \frac{\sum x}{n} \)        \( \mu = \frac{\sum x}{N} \)

where n is the sample size and N is the population size.

Mean: example 1

The following data are the 2002 total payrolls of 5 Major League Baseball (MLB) teams.

MLB Team                 2002 Total Payroll (millions of dollars)
Anaheim Angels            62
Atlanta Braves            93
New York Yankees         126
St. Louis Cardinals       75
Tampa Bay Devil Rays      34

\( \bar{x} = \frac{\sum x}{n} = \frac{62 + 93 + \dots + 34}{5} = \frac{390}{5} = 78 \) = $78 million


The following are the ages of all 8 employees of a small company.

53 32 61 27 39 44 49 57

\( \mu = \frac{\sum x}{N} = \frac{53 + 32 + \dots + 57}{8} = \frac{362}{8} = 45.25 \) years
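Both calculations above reduce to the same arithmetic, as this short Python sketch shows. The function name `mean` is chosen here for illustration; whether the result is read as a sample mean or a population mean depends only on what the data represent.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

payrolls = [62, 93, 126, 75, 34]          # sample of 5 MLB teams, millions of dollars
ages = [53, 32, 61, 27, 39, 44, 49, 57]   # all 8 employees (a population)

print(mean(payrolls))  # 78.0
print(mean(ages))      # 45.25
```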

Mean: frequency distribution

To find the mean of a frequency distribution, multiply each value by its frequency and add them up. Then divide by the total number of elements in your data set:

\( \bar{x} = \frac{\sum_i x_i n_i}{n} \)

Vehicles Owned (x_i)    Number of Households (n_i)    x_i n_i
0                         2                           0 * 2 = 0
1                        18                           1 * 18 = 18
2                        11                           2 * 11 = 22
3                         4                           3 * 4 = 12
4                         3                           4 * 3 = 12
5                         2                           5 * 2 = 10
Sum                      40                           74

\( \bar{x} = \frac{\sum_i x_i n_i}{n} = \frac{74}{40} = 1.85 \)
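The table calculation can be reproduced with a few lines of Python. This is a sketch; the function name `mean_from_frequencies` is hypothetical.

```python
def mean_from_frequencies(values, freqs):
    """Mean of a frequency distribution: sum of x_i * n_i divided by n."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    weighted_sum = sum(x * f for x, f in zip(values, freqs))
    return weighted_sum / n

vehicles = [0, 1, 2, 3, 4, 5]
households = [2, 18, 11, 4, 3, 2]

vehicles_mean = mean_from_frequencies(vehicles, households)
print(vehicles_mean)  # 1.85
```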

Mean: frequency distribution with classes

When data are organized in classes, we don't know the values of individual observations. In these cases the mean is computed as follows:

\( \bar{x} = \frac{\sum_i m_i n_i}{n} \)

where \( m_i \) is the midpoint of each class interval.

Mean: example 4

The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company.

Number of orders (x)    Number of days (n_i)    m_i                   m_i * n_i
10 – 12                  4                      (10 + 12)/2 = 11       44
13 – 15                 12                      14                    168
16 – 18                 20                      17                    340
19 – 21                 14                      20                    280
Sum                     n = 50                                        ∑ m_i n_i = 832

\( \bar{x} = \frac{\sum_i m_i n_i}{n} = \frac{832}{50} = 16.64 \) orders.
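The grouped-data calculation can be sketched in Python as below. The function name is hypothetical, and each class is represented here as a (lower limit, upper limit) pair, which is an assumption of this sketch rather than notation from the slides.

```python
def grouped_mean(classes, freqs):
    """Approximate mean of grouped data using class midpoints m_i."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    midpoints = [(low + high) / 2 for low, high in classes]
    return sum(m * f for m, f in zip(midpoints, freqs)) / n

order_classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
days = [4, 12, 20, 14]

orders_mean = grouped_mean(order_classes, days)
print(orders_mean)  # 16.64
```

Because every observation in a class is replaced by the class midpoint, the result is an approximation of the mean of the original (unknown) observations.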

Mode, Median, Mean: relationships

Mode: one advantage of the mode is that it can be calculated for both kinds of data, quantitative and qualitative. The mode is rarely used as a measure of central tendency for numeric variables.

Median: the median can be computed for data that are at least ordinal (qualitative ordinal or quantitative). The advantage of using the median is that it is not influenced by outliers, so it is preferred over the mean when outliers are present.

Mean: the mean can be calculated only for quantitative data. The mean is influenced by outliers.

OUTLIERS or EXTREME VALUES: values that are very small or very large relative to the majority of the values in a data set.

Median, Mean, outlier: example 1

The following table shows the 2000 populations (in thousands) of the five Pacific states.

State         Population (thousands)
Washington     5894
Oregon         3421
Alaska          627
Hawaii         1212
California    33,872    <- outlier

Mean without California = \( \frac{5894 + 3421 + 627 + 1212}{4} = 2788.5 \) thousand

Mean with California = \( \frac{5894 + 3421 + 627 + 1212 + 33872}{5} = 9005.2 \) thousand

Median (with California) = 3421 thousand
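The effect of the outlier can be checked with Python's standard `statistics` module. This is a sketch using the table's figures; the variable names are chosen for illustration, and the median without California is computed here only for comparison.

```python
import statistics

# 2000 populations in thousands, as given in the table above.
populations = {
    "Washington": 5894,
    "Oregon": 3421,
    "Alaska": 627,
    "Hawaii": 1212,
    "California": 33872,   # the outlier
}

with_ca = list(populations.values())
without_ca = [p for state, p in populations.items() if state != "California"]

mean_without = statistics.mean(without_ca)
mean_with = statistics.mean(with_ca)
median_with = statistics.median(with_ca)
median_without = statistics.median(without_ca)

print(mean_without, mean_with)       # 2788.5 9005.2
print(median_without, median_with)   # 2316.5 3421
```

Adding California more than triples the mean, while the median stays within the range of the other four states.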

Mode, Median, Mean: relationships

1. For a symmetric histogram and frequency curve with one peak, the values of the mean, median, and mode are identical, and they lie at the center of the distribution.

2. For a histogram and a frequency curve skewed to the right, the value of the mean is the largest, that of the mode is the smallest, and the value of the median lies between these two.

3. If a histogram and a distribution curve are skewed to the left, the value of the mean is the smallest and that of the mode is the largest, with the value of the median lying between these two.
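As a small illustration of the right-skewed case, the vehicles-owned distribution used earlier has a long right tail, and its three measures fall in the order mode < median < mean. The sketch below expands the frequency table into raw observations and uses Python's standard `statistics` module; the variable names are chosen for illustration.

```python
import statistics

# Vehicles owned -> number of households, from the earlier frequency table.
freq = {0: 2, 1: 18, 2: 11, 3: 4, 4: 3, 5: 2}

# Expand the table into the 40 individual observations.
observations = [value for value, count in freq.items() for _ in range(count)]

mode_v = statistics.mode(observations)
median_v = statistics.median(observations)
mean_v = statistics.mean(observations)

print(mode_v, median_v, mean_v)      # 1 1.5 1.85
print(mode_v < median_v < mean_v)    # True
```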

MEASURES OF POSITION

Definition

A measure of position determines the position of a single value in relation to other values in a sample or a population data set.

There are many measures of position; we will see quartiles and percentiles.

QUARTILES

Quartiles are three summary measures that divide a ranked data set into four equal parts.

SECOND QUARTILE

The second quartile is the same as the median of a data set.

FIRST QUARTILE

The first quartile is the value of the middle term among the observations that are less than the median.

THIRD QUARTILE

The third quartile is the value of the middle term among the observations that are greater than the median.

Each of these portions contains 25% of the observations of a data set arranged in increasing order:

|---- 25% ----|---- 25% ----|---- 25% ----|---- 25% ----|
              Q1            Q2            Q3

Approximately 25% of the values in a ranked data set are less than Q1 and about 75% are greater than Q1.

The second quartile, Q2, divides a ranked data set into two equal parts (median).

Approximately 75% of the values in a ranked data set are less than Q3 and about 25% are greater than Q3.

QUARTILES: example 1

The following data are the total revenues (in millions of dollars) for the 12 top-grossing North American concert tours of all time.

98.0 74.1 121.2 103.5 79.4 89.3 79.9 80.2 76.4 109.7 86.8 82.1

Ranked in increasing order:

74.1 76.4 79.4 79.9 80.2 82.1 | 86.8 89.3 98.0 103.5 109.7 121.2

(values less than the median | values greater than the median)

Q1 = \( \frac{79.4 + 79.9}{2} = 79.65 \)

Q2 = \( \frac{82.1 + 86.8}{2} = 84.45 \) (also the median)

Q3 = \( \frac{98.0 + 103.5}{2} = 100.75 \)

QUARTILES: example 2

The following are the ages of nine employees of an insurance company:

47 28 39 51 33 37 59 24 33

Ranked in increasing order:

24 28 33 33 | 37 | 39 47 51 59

(values less than the median | median | values greater than the median)

Q1 = \( \frac{28 + 33}{2} = 30.5 \)

Q2 = 37

Q3 = \( \frac{47 + 51}{2} = 49 \)
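The "median of each half" method used in both examples can be sketched in Python as follows. The function names are hypothetical, and the sketch makes one assumption explicit: the data are split by position, so when n is odd the middle term itself is left out of both halves. Other quartile conventions exist and can give slightly different values.

```python
def median_of_ranked(ranked):
    """Median of a list that is already in increasing order."""
    n = len(ranked)
    if n % 2 == 1:
        return ranked[n // 2]
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method from the text."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]                               # terms below the median position
    upper = ranked[half + 1:] if n % 2 else ranked[half:]  # terms above it
    return median_of_ranked(lower), median_of_ranked(ranked), median_of_ranked(upper)

revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
            79.9, 80.2, 76.4, 109.7, 86.8, 82.1]
ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]

print([round(q, 2) for q in quartiles(revenues)])  # [79.65, 84.45, 100.75]
print(quartiles(ages))                             # (30.5, 37, 49.0)
```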

PERCENTILES

Percentiles are the summary measures that divide a ranked data set into 100 equal parts.

Each (ranked) data set has 99 percentiles that divide it into 100 equal parts.

The kth percentile is denoted by \( P_k \).

[Figure: a ranked data set divided into 100 portions. Each of these portions contains 1% of the observations of a data set arranged in increasing order.]

Calculating Percentiles

The (approximate) value of the kth percentile, denoted by \( P_k \), is

\( P_k \) = value of the \( \left(\frac{kn}{100}\right) \)th term in a ranked data set

where k denotes the number of the percentile and n represents the sample size.

PERCENTILES: example

Refer to the data on revenues (in millions of dollars) for the 12 top-grossing North American concert tours of all time:

98.0 74.1 121.2 103.5 79.4 89.3 79.9 80.2 76.4 109.7 86.8 82.1

Find the value of the 42nd percentile.

The data arranged in increasing order are as follows:

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

The position of the 42nd percentile is

\( \frac{kn}{100} = \frac{(42)(12)}{100} = 5.04 \)th term

The value of the 5.04th term is approximated by the value of the 5th term in the ranked data:

\( P_{42} \) = 42nd percentile = 80.2 = $80.2 million

Thus, approximately 42% of the revenues in the given data are equal to or less than $80.2 million and 58% are greater than $80.2 million.
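The percentile calculation can be sketched in Python as below. The slides only say the value is approximate, so the rounding rule is an assumption of this sketch: the position kn/100 is rounded to the nearest whole term and kept within 1..n. The function name is hypothetical.

```python
def approximate_percentile(values, k):
    """Return (position, value) for the k-th percentile, 1 <= k <= 99.

    position is kn/100; the value is taken from the nearest whole term
    (assumption of this sketch), clamped to the range 1..n.
    """
    if not values:
        raise ValueError("empty data set")
    ranked = sorted(values)
    n = len(ranked)
    position = k * n / 100
    term = min(max(int(position + 0.5), 1), n)   # nearest 1-based term
    return position, ranked[term - 1]

revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
            79.9, 80.2, 76.4, 109.7, 86.8, 82.1]

position, p42 = approximate_percentile(revenues, 42)
print(round(position, 2), p42)  # 5.04 80.2
```

Statistical libraries usually interpolate between neighbouring terms instead of rounding, so their results for the same data may differ slightly from this approximation.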
