DESCRIPTIVE MEASURES
Frequency distributions and graphs may be considered a first kind of summarization. However, they are not helpful when we need to describe verbally the main features of a data set.
Summaries are extremely useful in understanding and communicating the most important characteristics of a data set.
Example: frequency distributions and graphs can help us to graph data on family incomes. However, we may want to know the income of a "typical" family, the spread of the distribution of incomes, or the location of a family with a particular income.
[Figure: distribution of family incomes (horizontal axis: Income), annotated with the center ($56,260), the spread, and the position of a particular family.]
TYPES OF DESCRIPTIVE MEASURES
Such questions can be answered using summary measures. These include:
1. Measures of Central Tendency
2. Measures of Position
3. Measures of Dispersion
MEASURES OF CENTRAL TENDENCY
The simplest and most extreme kind of summary is to reduce the entire group of observations to a single value that best represents the data set.
This single-value summary should be a value that is typical of the observations of the group (a population or a sample).
Measures of central tendency are measures of the location of the middle or the center of a distribution. The definition of "middle" or "center" is purposely left somewhat vague so that the term "central tendency" can refer to a wide variety of measures.
Which measure can be used depends on the type of variable:
•Mode: for every kind of variable, qualitative ("nominal" and "ordinal") and quantitative.
•Median: for qualitative "ordinal" variables and quantitative variables.
•Mean: only for quantitative variables.
Nominal variables: the values do not have any quantitative meaning and there is no ordering relationship between them. E.g.: gender.
Ordinal variables: there is an order relationship between values, but the difference between two successive modalities is not quantifiable. E.g.: a three-point rating scale measuring customer satisfaction ("Not Satisfied", "Satisfied", "Very Satisfied").
Mode
Definition
"Mode" is a French word that means fashion. In statistics, the mode represents the most common value in a data set.
The mode is the value that occurs with the highest frequency in a data set.
Mode: examples

Stress on Job    Frequency (n_i)
Very             10
Somewhat         14    <- Mode (highest frequency)
None              6

Vehicles Owned   Number of Households (n_i)
0                 2
1                18    <- Mode (highest frequency)
2                11
3                 4
4                 3
5                 2
Mode - Grouped data: example
When quantitative variables are grouped in classes, the mode is defined as the class interval where most observations lie. This is called the modal-class interval.
MODAL-CLASS INTERVAL: the class interval that occurs with the highest frequency in a data set.

Weekly Earnings (dollars)   Number of Employees (n_i)
400 -| 600                  14
600 -| 800                  22
800 -| 1000                 49    <- Modal-class interval (highest frequency)
1000 -| 1200                20
1200 -| 1400                 9
1400 -| 1600                 6
Mode: other features
One advantage of the mode is that it can be calculated for both kinds of data, quantitative and qualitative.
The mode is rarely used as a measure of central tendency for numeric variables. For categorical variables, however, the mode is more useful: the mean cannot be computed for them, and the median cannot be computed when the variable is nominal.
A data set may have none or many modes:
•A data set with only one mode is called unimodal.
•A data set with two modes is called bimodal.
•A data set with more than two modes is called multimodal.
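As a rough illustration of the unimodal/bimodal/multimodal distinction, the following Python sketch returns every value that reaches the highest frequency. The function name `modes` is hypothetical and not part of the slides. The sketch also assumes a non-empty data set and does not special-case data in which every value occurs exactly once (often described as having no mode).

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency, in first-seen order."""
    counts = Counter(values)
    top = max(counts.values())  # highest frequency; raises ValueError if values is empty
    return [value for value, count in counts.items() if count == top]


# Vehicles-owned data from the mode example: 2, 18, 11, 4, 3, 2 households.
vehicles = [0] * 2 + [1] * 18 + [2] * 11 + [3] * 4 + [4] * 3 + [5] * 2
print(modes(vehicles))          # one mode: unimodal
print(modes([1, 1, 2, 2, 3]))   # two modes: bimodal
```

Because the function only counts occurrences, it works equally for qualitative values such as "Very", "Somewhat", and "None".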
Median
Definition
The median is the value of the middle term in a data set that has been ranked in increasing order.
In other words, the median divides a ranked data set into two equal parts.
The calculation of the median consists of the following two steps:
1. Rank the data set in increasing order;
2. Find the middle term in a data set with n values. The value of this term is the median.
The position of the middle term in a data set with n values is obtained as follows:
$$\text{Position of the middle term} = \frac{n+1}{2} \quad \text{if } n \text{ is odd}$$
$$\text{Positions of the two middle terms} = \frac{n}{2} \text{ and } \frac{n}{2}+1 \quad \text{if } n \text{ is even (the median is the average of the two middle values)}$$
Thus, we can redefine the median as follows:
$$\text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)\text{th term in a ranked data set, if } n \text{ is odd}$$
$$\text{Median} = \text{value of the } \left(\frac{\frac{n}{2} + \left(\frac{n}{2}+1\right)}{2}\right)\text{th term in a ranked data set, if } n \text{ is even}$$
In the even case the position is not a whole number: it lies halfway between the two middle terms, so the median is the average of their values.
Median: example 1
The following data give the weight lost (in pounds) by a sample of five members of a health club at the end of two months of membership:
10 5 19 8 3
1. We rank the given data in increasing order as follows:
3 5 8 10 19
2. We find the position of the middle term (n is odd):
$$\text{Position of the middle term} = \frac{5+1}{2} = 3$$
Therefore, the median is the value of the third term in the ranked data, which is 8.
The median weight loss for this sample of five members of this health club is 8 pounds.
Median: example 2
The table lists the total revenue for the 12 top-grossing North American concert tours of all time.

Tour                        Artist                   Total Revenue (millions of dollars)
Steel Wheels, 1989          The Rolling Stones        98.0
Magic Summer, 1990          New Kids on the Block     74.1
Voodoo Lounge, 1994         The Rolling Stones       121.2
The Division Bell, 1994     Pink Floyd               103.5
Hell Freezes Over, 1994     The Eagles                79.4
Bridges to Babylon, 1997    The Rolling Stones        89.3
Popmart, 1997               U2                        79.9
Twenty-Four Seven, 2000     Tina Turner               80.2
No Strings Attached, 2000   'N-Sync                   76.4
Elevation, 2001             U2                       109.7
Popodyssey, 2001            'N-Sync                   86.8
Black and Blue, 2001        The Backstreet Boys       82.1

1. We rank the given data in increasing order as follows:
74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
2. We find the position of the middle term (n is even):
$$\text{Position of the middle term} = \frac{\frac{12}{2} + \left(\frac{12}{2}+1\right)}{2} = 6.5$$
The median is given by the mean of the sixth and the seventh values in the ranked data:
$$\text{Median} = \frac{82.1 + 86.8}{2} = 84.45 = \$84.45 \text{ million}$$
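The two-step rule (rank the data, then take the middle term or the average of the two middle terms) can be sketched in Python as follows. The function name `median` is illustrative and not taken from the slides; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Median by the two-step rule: rank the data, then locate the middle term."""
    ranked = sorted(values)                 # step 1: rank in increasing order
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # n odd: the (n + 1) / 2-th term (1-based), i.e. index (n + 1) // 2 - 1
        return ranked[(n + 1) // 2 - 1]
    # n even: average of the (n / 2)-th and (n / 2 + 1)-th terms
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2


weight_loss = [10, 5, 19, 8, 3]                                    # example 1
revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
            79.9, 80.2, 76.4, 109.7, 86.8, 82.1]                   # example 2
print(median(weight_loss))
print(median(revenues))
```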
Median: example 3
The following data consider a questionnaire item on the time involvement of 11 scientists in the "perception and identification of research problems":
Great, Very low or nil, Very great, Very great, Great, Very great, Medium, Low, Great, Medium, Medium.
We rank the given data in increasing order as follows:
Very low or nil, Low, Medium, Medium, Medium, Great, Great, Great, Very great, Very great, Very great.
With n = 11 the middle term is the (11 + 1)/2 = 6th term, so the median is "Great".
Median: frequency distribution
In order to find the median using frequency distributions, you must calculate the cumulative frequency distribution.
Then, the first value with a cumulative frequency greater than or equal to the position of the middle value is the median.
If the position of the middle value is exactly 0.5 more than the cumulative frequency of the previous value, then the median is the midpoint between the two values.
Median: example 4

Stress on Job    Frequency (n_i)    Cumulative frequency
None              6                  6
Somewhat         14                 20    <- Median (first value with cumulative frequency ≥ 15.5)
Very             10                 30

$$\text{Position of the middle term} = \frac{\frac{30}{2} + \left(\frac{30}{2}+1\right)}{2} = 15.5$$
Median: example 5

Vehicles Owned   Number of Households (n_i)   Cumulative frequency
0                 2                           2
1                18                           2 + 18 = 20
2                11                           2 + 18 + 11 = 31    <- first value with cumulative frequency ≥ 20.5
3                 4                           2 + 18 + 11 + 4 = 35
4                 3                           2 + 18 + 11 + 4 + 3 = 38
5                 2                           2 + 18 + 11 + 4 + 3 + 2 = 40

$$\text{Position of the middle term} = \frac{\frac{40}{2} + \left(\frac{40}{2}+1\right)}{2} = 20.5$$
The position 20.5 is exactly 0.5 more than the previous cumulative frequency (20, for the value 1), so the median is the midpoint between the values 1 and 2:
$$\text{Median} = \frac{1 + 2}{2} = 1.5$$
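The cumulative-frequency rule, including the midpoint case, might be sketched in Python like this. The function name `median_from_frequencies` and its argument layout are assumptions made for illustration. The sketch also assumes the values are listed in increasing order and that every listed value has a positive frequency.

```python
def median_from_frequencies(values, frequencies):
    """Median of a frequency distribution; values must be in increasing order."""
    n = sum(frequencies)
    position = (n + 1) / 2   # equals (n/2 + (n/2 + 1)) / 2 when n is even
    cumulative = 0
    for index, (value, freq) in enumerate(zip(values, frequencies)):
        previous = cumulative
        cumulative += freq
        if cumulative >= position:
            # Position exactly 0.5 past the previous cumulative frequency:
            # the median is the midpoint of the previous and current values.
            if index > 0 and position == previous + 0.5:
                return (values[index - 1] + value) / 2
            return value
    raise ValueError("frequencies must sum to a positive number")


# Example 5: vehicles owned
print(median_from_frequencies([0, 1, 2, 3, 4, 5], [2, 18, 11, 4, 3, 2]))
# Example 4: ordinal categories, listed from lowest to highest
print(median_from_frequencies(["None", "Somewhat", "Very"], [6, 14, 10]))
```

For ordinal categories the midpoint case cannot be evaluated, because two labels cannot be averaged; the sketch would raise a `TypeError` there rather than pick a category.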
Mean
Definition
The mean, also called the arithmetic mean, is the most frequently used measure of central tendency. It is obtained by dividing the sum of all values by the number of values in the data set.
$$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$$
The mean calculated for sample data is denoted by $\bar{x}$ and the mean calculated for the population is denoted by $\mu$:
$$\bar{x} = \frac{\sum x}{n} \qquad \mu = \frac{\sum x}{N}$$
where n is the sample size and N is the population size.
Mean: example 1
The following data are the 2002 total payrolls of 5 Major League Baseball (MLB) teams.

MLB Team                 2002 Total Payroll (millions of dollars)
Anaheim Angels            62
Atlanta Braves            93
New York Yankees         126
St. Louis Cardinals       75
Tampa Bay Devil Rays      34

$$\bar{x} = \frac{\sum x}{n} = \frac{62 + 93 + \dots + 34}{5} = \frac{390}{5} = \$78 \text{ million}$$
Mean: example 2
The following are the ages of all 8 employees of a small company.
53 32 61 27 39 44 49 57
Because the data cover all employees of the company, they form a population, and the result is a population mean:
$$\mu = \frac{\sum x}{N} = \frac{53 + 32 + \dots + 57}{8} = \frac{362}{8} = 45.25 \text{ years}$$
Mean: frequency distribution
To find the mean of a frequency distribution, multiply each value by its frequency and add the products. Then divide by the total number of elements in the data set:
$$\bar{x} = \frac{\sum_i x_i n_i}{n}$$
where $x_i$ is the i-th value, $n_i$ is its frequency, and $n = \sum_i n_i$.
Mean: example 3

Vehicles Owned (x_i)   Number of Households (n_i)   x_i n_i
0                       2                           0*2 = 0
1                      18                           1*18 = 18
2                      11                           2*11 = 22
3                       4                           3*4 = 12
4                       3                           4*3 = 12
5                       2                           5*2 = 10
Sum                    40                           74

$$\bar{x} = \frac{\sum_i x_i n_i}{n} = \frac{74}{40} = 1.85$$
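The frequency-weighted mean above can be checked with a short Python sketch. The function name `mean_from_frequencies` is hypothetical and introduced only for illustration.

```python
def mean_from_frequencies(values, frequencies):
    """Mean of a frequency distribution: sum of x_i * n_i divided by n."""
    n = sum(frequencies)                     # total number of observations
    if n == 0:
        raise ValueError("frequencies must sum to a positive number")
    weighted_sum = sum(x * n_i for x, n_i in zip(values, frequencies))
    return weighted_sum / n


vehicles_owned = [0, 1, 2, 3, 4, 5]          # x_i
households = [2, 18, 11, 4, 3, 2]            # n_i
print(mean_from_frequencies(vehicles_owned, households))
```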
Mean: frequency distribution with classes
When data are organized in classes, we don't know the values of the individual observations. In these cases the mean is computed as follows:
$$\bar{x} = \frac{\sum_i m_i n_i}{n}$$
where $m_i$ is the midpoint of each class interval.
Mean: example 4
The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company.

Number of orders (x)   Number of days (n_i)   m_i                 m_i*n_i
10 – 12                 4                     (10+12)/2 = 11       44
13 – 15                12                     14                  168
16 – 18                20                     17                  340
19 – 21                14                     20                  280
Sum                    n = 50                                     ∑ m_i*n_i = 832

$$\bar{x} = \frac{\sum_i m_i n_i}{n} = \frac{832}{50} = 16.64 \text{ orders}$$
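For grouped data the only new step is replacing each class by its midpoint. A minimal Python sketch, assuming each class is given as a (lower limit, upper limit) pair, is shown below. The name `grouped_mean` is illustrative and does not come from the slides.

```python
def grouped_mean(class_limits, frequencies):
    """Approximate mean of grouped data using class midpoints m_i."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequencies must sum to a positive number")
    return sum(m * n_i for m, n_i in zip(midpoints, frequencies)) / n


order_classes = [(10, 12), (13, 15), (16, 18), (19, 21)]   # number of orders
days = [4, 12, 20, 14]                                     # n_i
print(grouped_mean(order_classes, days))
```

The result is an approximation of the true mean, since every observation in a class is treated as if it were equal to the class midpoint.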
Mode, Median, Mean: relationships
Mode: one advantage of the mode is that it can be calculated for both kinds of data, quantitative and qualitative. The mode is rarely used as a measure of central tendency for numeric variables.
Median: the median can be computed for data that are at least ordinal (qualitative ordinal or quantitative). The advantage of using the median is that it is not influenced by outliers; when outliers are present, it is preferred over the mean.
Mean: the mean can be calculated only for quantitative data. The mean is influenced by outliers.
OUTLIERS or EXTREME VALUES: values that are very small or very large relative to the majority of the values in a data set.
Median, Mean, outlier: example 1
The following table shows the 2000 populations (in thousands) of the five Pacific states.

State         Population (thousands)
Washington     5894
Oregon         3421
Alaska          627
Hawaii         1212
California    33,872    <- Outlier

$$\text{Mean without California} = \frac{5894 + 3421 + 627 + 1212}{4} = 2788.5 \text{ thousand}$$
$$\text{Mean with California} = \frac{5894 + 3421 + 627 + 1212 + 33872}{5} = 9005.2 \text{ thousand}$$
Median = 3421 thousand
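The effect of the outlier can be reproduced with Python's standard `statistics` module. The variable names below are illustrative, and the median without California is not given in the slides; it is computed here only for comparison.

```python
from statistics import mean, median

# 2000 populations of the five Pacific states, in thousands
populations = {
    "Washington": 5894,
    "Oregon": 3421,
    "Alaska": 627,
    "Hawaii": 1212,
    "California": 33872,   # the outlier
}

with_ca = list(populations.values())
without_ca = [pop for state, pop in populations.items() if state != "California"]

print("mean:  ", mean(without_ca), "->", mean(with_ca))
print("median:", median(without_ca), "->", median(with_ca))
```

Adding California more than triples the mean (2788.5 to 9005.2 thousand), while the median moves far less (2316.5 to 3421 thousand) and stays within the range of the other four states.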
1. For a symmetric histogram and frequency curve with one peak, the values of the mean, median, and mode are identical, and they lie at the center of the distribution.
2. For a histogram and a frequency curve skewed to the right, the value of the mean is the largest, that of the mode is the smallest, and the value of the median lies between these two.
3. If a histogram and a distribution curve are skewed to the left, the value of the mean is the smallest and that of the mode is the largest, with the value of the median lying between these two.
MEASURES OF POSITION
Definition
A measure of position determines the position of a single value in relation to other values in a sample or a population data set.
There are many measures of position; we will see quartiles and percentiles.
QUARTILES
Quartiles are three summary measures that divide a ranked data set into four equal parts.
SECOND QUARTILE: the second quartile is the same as the median of a data set.
FIRST QUARTILE: the first quartile is the value of the middle term among the observations that are less than the median.
THIRD QUARTILE: the third quartile is the value of the middle term among the observations that are greater than the median.
Each of these portions contains 25% of the observations of a data set arranged in increasing order:

   25%   |   25%   |   25%   |   25%
         Q1        Q2        Q3

Approximately 25% of the values in a ranked data set are less than Q1 and about 75% are greater than Q1.
The second quartile, Q2, divides a ranked data set into two equal parts (median).
Approximately 75% of the values in a ranked data set are less than Q3 and about 25% are greater than Q3.
QUARTILES: example 1
Consider again the total revenue (in millions of dollars) for the 12 top-grossing North American concert tours of all time. The tours, artists, and revenues are the same as in the table of Median: example 2.
The ranked revenues are:
74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
Values less than the median: 74.1 76.4 79.4 79.9 80.2 82.1
Values greater than the median: 86.8 89.3 98.0 103.5 109.7 121.2
$$Q_1 = \frac{79.4 + 79.9}{2} = 79.65$$
$$Q_2 = \frac{82.1 + 86.8}{2} = 84.45 \quad \text{(also the median)}$$
$$Q_3 = \frac{98.0 + 103.5}{2} = 100.75$$
QUARTILES: example 2
The following are the ages of nine employees of an insurance company:
47 28 39 51 33 37 59 24 33
Ranked in increasing order:
24 28 33 33 37 39 47 51 59
Values less than the median: 24 28 33 33
Values greater than the median: 39 47 51 59
$$Q_1 = \frac{28 + 33}{2} = 30.5 \qquad Q_2 = 37 \qquad Q_3 = \frac{47 + 51}{2} = 49$$
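The quartile rule used in both examples (median of the lower half, overall median, median of the upper half) might be sketched in Python as follows. The names `quartiles` and `_middle` are hypothetical. The sketch splits the ranked data by position around the median, which is an assumption about how ties with the median are handled. Statistical packages use several other quartile conventions and can give slightly different values.

```python
def _middle(ranked):
    """Median of an already ranked, non-empty list."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 == 1 else (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("at least two observations are needed")
    half = n // 2
    lower = ranked[:half]                                   # terms below the median position
    upper = ranked[half + 1:] if n % 2 == 1 else ranked[half:]   # terms above it
    return _middle(lower), _middle(ranked), _middle(upper)


ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]                 # example 2
tour_revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
                 79.9, 80.2, 76.4, 109.7, 86.8, 82.1]       # example 1
print(quartiles(ages))
print(quartiles(tour_revenues))
```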
PERCENTILES
Percentiles are the summary measures that divide a ranked data set into 100 equal parts. Each (ranked) data set has 99 percentiles.
The kth percentile is denoted by P_k.
Each of the 100 portions contains 1% of the observations of a data set arranged in increasing order.
[Figure: a ranked data set divided into consecutive portions of 1% each.]
Calculating Percentiles
The (approximate) value of the kth percentile, denoted by P_k, is
$$P_k = \text{value of the } \left(\frac{kn}{100}\right)\text{th term in a ranked data set}$$
where k denotes the number of the percentile and n represents the sample size.
PERCENTILES: example
Refer to the data on revenues for the 12 top-grossing North American concert tours of all time (the same table as in Median: example 2).
Find the value of the 42nd percentile.
The data arranged in increasing order are as follows:
74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
The position of the 42nd percentile is
$$\frac{kn}{100} = \frac{(42)(12)}{100} = 5.04\text{th term}$$
The value of the 5.04th term can be approximated by the value of the 5th term in the ranked data. Therefore,
P_42 = 42nd percentile = 80.2 = $80.2 million
Thus, approximately 42% of the revenues in the given data are equal to or less than $80.2 million and approximately 58% are greater than $80.2 million.
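The position rule kn/100 can be sketched in Python as below. The slides do not say how a fractional position is turned into a term number; rounding to the nearest whole position (5.04 becomes 5) is an assumption of this sketch, and the function name `percentile` is hypothetical. Library routines such as NumPy's percentile function interpolate by default and may return different values.

```python
import math


def percentile(values, k):
    """Approximate k-th percentile: value of the (k * n / 100)-th ranked term."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("the data set is empty")
    position = k * n / 100
    # Assumption: round the position half-up to a whole term number,
    # then keep it inside the valid range 1..n.
    term = min(max(math.floor(position + 0.5), 1), n)
    return ranked[term - 1]                  # convert 1-based term to 0-based index


concert_revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
                    79.9, 80.2, 76.4, 109.7, 86.8, 82.1]
print(percentile(concert_revenues, 42))
```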