1
Central Tendency and Dispersion of a Distribution
2
Review Topics
• Measures of Central Tendency: Mean, Median, Mode
• Quartile
• Measures of Variation: The Range, Variance and Standard Deviation, Coefficient of Variation
• Shape: Symmetric, Skewed
3
Important Summary Measures
One-Sample Summary Measures
Central Tendency: Mean, Median, Mode
Quartile
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
4
Measures of Central Tendency
Central Tendency: Mean, Median, Mode
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Data: You can access practice sample data on HMO premiums here.
5
Measures of Central Location (Tendency)
Usually, we focus our attention on two aspects of measures of central location:
– Measure of the central data point (the average).
– Measure of dispersion of the data about the average.
With one data point, clearly the central location is at the point itself.
With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).
If the third data point appears exactly in the middle of the current range, the central location should not change (because it is currently residing in the middle).
But if the third data point appears on the left-hand side of the midrange, it should “pull” the central location to the left.
6
Arithmetic mean
– This is the most popular and useful measure of central location.
Mean = (Sum of the measurements) / (Number of measurements)
Sample mean (n = sample size):
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Population mean (N = population size):
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
7
• Example 4.1
The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by
$$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = 4.5$$
• Example 4.2
Suppose the telephone bills of Example 2.1 represent a population of measurements. The population mean is
$$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \dots + x_{200}}{200} = \frac{42.19 + 15.30 + \dots + 53.21}{200} = 43.59$$
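The arithmetic in Example 4.1 can be checked with a few lines of Python. This is only a minimal sketch: the helper name `arithmetic_mean` is ours, not part of the course material.

```python
# Sample of six measurements from Example 4.1.
measurements = [7, 3, 9, -2, 4, 6]


def arithmetic_mean(values):
    """Return the sum of the measurements divided by their number."""
    return sum(values) / len(values)


sample_mean = arithmetic_mean(measurements)
print(sample_mean)  # 27 / 6 = 4.5
```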
8
The median
– The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.
Example 4.4
Seven employee salaries were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29. Find the median salary.
Odd number of observations
First, sort the salaries. Then, locate the value in the middle.
26, 26, 28, 29, 30, 32, 60 → median = 29
Suppose one employee’s salary of $31,000 was added to the group recorded before. Find the median salary.
Even number of observations
First, sort the salaries. Then, locate the values in the middle. There are two middle values!
26, 26, 28, 29, 30, 31, 32, 60 → median = (29 + 30) / 2 = 29.5
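The sort-then-pick-the-middle procedure of Example 4.4 can be sketched in Python as follows. The function name `median` is our own choice; the standard library's `statistics.median` should give the same results.

```python
# Salaries from Example 4.4, in $1000s.
salaries = [28, 60, 26, 32, 30, 26, 29]


def median(values):
    """Sort the values, then return the middle one (odd count)
    or the average of the two middle ones (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median(salaries)          # seven observations
even_median = median(salaries + [31])  # eight observations
print(odd_median, even_median)  # 29 29.5
```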
9
The mode
– The mode of a set of measurements is the value that occurs most frequently.
– A set of data may have one mode (or modal class), or two or more modes.
The modal class: For large data sets, the modal class is much more relevant than a single-value mode.
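As a rough illustration, the mode and a modal class can be found with the Python standard library. The salaries are those of Example 4.4; the `marks` list and the class width of 10 are made up for this sketch and are not the XM04-06 data.

```python
from collections import Counter
from statistics import multimode

# Salaries from Example 4.4 (in $1000s): 26 occurs twice.
salaries = [28, 60, 26, 32, 30, 26, 29]
salary_modes = multimode(salaries)       # list of all most frequent values

# A small made-up data set with two modes.
two_modes = multimode([1, 1, 2, 3, 3])

# Hypothetical marks, grouped into classes of width 10 (50-59, 60-69, ...).
marks = [52, 67, 71, 74, 78, 81, 84, 84, 88, 93]
class_counts = Counter((m // 10) * 10 for m in marks)
modal_class_start, modal_class_count = class_counts.most_common(1)[0]

print(salary_modes, two_modes, modal_class_start, modal_class_count)
```

Here the modal class is identified by its lower bound, so a result of 80 stands for the class 80 to 89.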
10
• Example 4.6
A professor of statistics wants to report the results of a midterm exam taken by 100 students. The data appear in file XM04-06. Find the mean, median, and mode, and describe the information they provide.
Excel Results
Marks
Mean                  73.98
Standard Error        2.1502163
Median                81
Mode                  84
Standard Deviation    21.502163
Sample Variance       462.34303
Kurtosis              0.3936606
Skewness              -1.073098
Range                 89
Minimum               11
Maximum               100
Sum                   7398
Count                 100
The mean provides information about the overall performance level of the class. The median indicates that half of the class received a grade below 81%, and half of the class received a grade above 81%. The mode must be used when data are qualitative. If marks are classified by letter grade, the frequency of each grade can be calculated. Then, the mode becomes a logical measure to compute.
11
Relationship among Mean, Median, and Mode
If a distribution is symmetrical, the mean, median and mode coincide.
If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ.
A positively skewed distribution (“skewed to the right”)
[Figure: positively skewed distribution with the mode, median, and mean marked; typically Mode < Median < Mean]
12
A negatively skewed distribution (“skewed to the left”)
[Figure: negatively skewed distribution with the mean, median, and mode marked; typically Mean < Median < Mode]
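The ordering of the three measures under skewness can be illustrated with the salary data of Example 4.4, where one large salary (60) stretches the right tail. The reflected data set below is constructed only for this sketch (each value is replaced by 86 minus itself) to obtain a left-skewed mirror image.

```python
import statistics

# Salaries from Example 4.4 (in $1000s): one large value creates a right tail.
salaries = [28, 60, 26, 32, 30, 26, 29]
right_skewed = (
    statistics.mode(salaries),
    statistics.median(salaries),
    statistics.mean(salaries),
)  # (mode, median, mean)

# Constructed mirror image of the same data: the tail now points left.
reflected = [86 - s for s in salaries]
left_skewed = (
    statistics.mean(reflected),
    statistics.median(reflected),
    statistics.mode(reflected),
)  # (mean, median, mode)

print(right_skewed)  # mode < median < mean
print(left_skewed)   # mean < median < mode
```

This ordering is typical for skewed, single-peaked data, but it is a tendency and not a guarantee for every data set.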
13
Measures of Variation
Variation
Range, Interquartile Range
Variance: Population Variance, Sample Variance
Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Coefficient of Variation:
$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$$
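A minimal sketch of the coefficient of variation, assuming the sample standard deviation (n - 1 denominator) is used for S. The data are borrowed from the sample standard deviation slide later in this deck; the function name is our own.

```python
import statistics

# Eight observations used later in the deck (mean = 16).
data = [10, 12, 14, 15, 17, 18, 18, 24]


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


cv = coefficient_of_variation(data)
print(f"CV = {cv:.1f}%")  # roughly 27% of the mean
```

Because it is a ratio, the CV has no units, which makes it convenient for comparing the variability of data sets measured on different scales.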
14
Measures of variability (Looking beyond the average)
Measures of central location fail to tell the whole story about the distribution.
A question of interest still remains unanswered:
How typical is the average value of all the measurements in the data set?
or
How spread out are the measurements about the average value?
15
Observe two hypothetical data sets
Low variability data set: The average value provides a good representation of the values in the data set.
High variability data set: The same average value does not provide as good a representation of the values in the data set as before.
16
The range
– The range of a set of measurements is the difference between the largest and smallest measurements.
– Its major advantage is the ease with which it can be computed.
– Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.
[Figure: the range shown as the distance from the smallest measurement to the largest measurement, with question marks in between]
But, how do all the measurements spread out? The range cannot assist in answering this question.
17
The variance
– This measure of dispersion reflects the values of all the measurements.
– The variance of a population of N measurements $x_1, x_2, \dots, x_N$ having a mean $\mu$ is defined as
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
– The variance of a sample of n measurements $x_1, x_2, \dots, x_n$ having a mean $\bar{x}$ is defined as
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
18
Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10...
…but measurements in B are much more dispersed than those in A.
Thus, a measure of dispersion is needed that agrees with this observation.
Let us start by calculating the sum of deviations.
A: 8-10 = -2, 9-10 = -1, 10-10 = 0, 11-10 = +1, 12-10 = +2; Sum = 0
B: 4-10 = -6, 7-10 = -3, 10-10 = 0, 13-10 = +3, 16-10 = +6; Sum = 0
The sum of deviations is zero in both cases; therefore, another measure is needed.
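The deviations for populations A and B can be reproduced with a short Python sketch (the helper name `deviations` is ours). It shows why the plain sum of deviations cannot measure dispersion: positive and negative deviations from the mean cancel.

```python
population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]


def deviations(values):
    """Return each value minus the mean of the values."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


dev_a = deviations(population_a)
dev_b = deviations(population_b)
print(dev_a, sum(dev_a))  # [-2.0, -1.0, 0.0, 1.0, 2.0] 0.0
print(dev_b, sum(dev_b))  # [-6.0, -3.0, 0.0, 3.0, 6.0] 0.0
```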
19
The sum of deviations is zero in both cases; therefore, another measure is needed.
The sum of squared deviations is used in calculating the variance. See the example next.
20
Let us calculate the variance of the two populations
$$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$$
$$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$$
Why is the variance defined as the average squared deviation?Why not use the sum of squared deviations as a measure of dispersion instead?
After all, the sum of squared deviations increases in magnitude when the dispersionof a data set increases!!
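A short sketch of the two variance calculations, followed by one possible way to look at the question above. The `doubled_a` data set is constructed for illustration only: it repeats every value of population A, so its spread is unchanged, yet its sum of squared deviations doubles while the average squared deviation stays the same.

```python
population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]


def sum_of_squared_deviations(values):
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def population_variance(values):
    """Average squared deviation from the mean (denominator N)."""
    return sum_of_squared_deviations(values) / len(values)


var_a = population_variance(population_a)
var_b = population_variance(population_b)
print(var_a, var_b)  # 2.0 18.0

# Same values, each listed twice: the spread is the same as in A.
doubled_a = population_a * 2
print(sum_of_squared_deviations(population_a))  # 10.0
print(sum_of_squared_deviations(doubled_a))     # 20.0
print(population_variance(doubled_a))           # 2.0
```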
21
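– Example 4.8
Find the mean and the variance of the following sample of measurements (in years): 3.4, 2.5, 4.1, 1.2, 2.8, 3.7
– Solution
$$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{3.4 + 2.5 + 4.1 + 1.2 + 2.8 + 3.7}{6} = \frac{17.7}{6} = 2.95$$
A shortcut formula:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$
$$s^2 = \frac{1}{6 - 1}\left[\left(3.4^2 + 2.5^2 + \dots + 3.7^2\right) - \frac{(17.7)^2}{6}\right] = 1.075 \text{ (years)}^2$$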
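The following sketch checks that the shortcut formula and the definitional formula give the same sample variance for Example 4.8. Variable names are our own; small floating-point differences are expected, so the comparison uses a tolerance.

```python
import math

# Sample of measurements (in years) from Example 4.8.
measurements = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]
n = len(measurements)
mean = sum(measurements) / n

# Definitional formula: sum of squared deviations over n - 1.
definitional = sum((x - mean) ** 2 for x in measurements) / (n - 1)

# Shortcut formula: [sum of squares - (sum)^2 / n] over n - 1.
sum_x = sum(measurements)
sum_x_squared = sum(x * x for x in measurements)
shortcut = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

print(round(mean, 2), round(definitional, 3), round(shortcut, 3))
print(math.isclose(definitional, shortcut))
```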
22
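Sample Standard Deviation
$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$
For the sample: use n - 1 in the denominator.
Data $X_i$: 10 12 14 15 17 18 18 24
n = 8, Mean = 16
$$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$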
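A step-by-step sketch of the sample standard deviation for these eight observations. Note that the value 18 occurs twice, so its squared deviation must be counted twice: with both included the squared deviations sum to 130, whereas leaving one out gives 126 and a result of 4.2426.

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
mean = sum(data) / n  # 16.0

# One squared deviation per observation, including both 18s.
squared_deviations = [(x - mean) ** 2 for x in data]
total = sum(squared_deviations)

# Sample standard deviation: divide by n - 1, then take the square root.
s = math.sqrt(total / (n - 1))
print(mean, total, round(s, 4))
```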
23
Interpreting Standard Deviation
The standard deviation can be used to
– compare the variability of several distributions
– make a statement about the general shape of a distribution.
The empirical rule: If a sample of measurements has a mound-shaped distribution, the interval
$(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements
$(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements
$(\bar{x} - 3s, \bar{x} + 3s)$ contains virtually all of the measurements
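Comparing Standard Deviations
Data $X_i$: 10 12 14 15 17 18 18 24
N = 8, Mean = 16
Treated as a sample:
$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$
Treated as a population:
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} \approx 4.0311$$
The value of the standard deviation is larger when the data are considered as a sample.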
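The two denominators can be compared directly with the standard library: `statistics.stdev` divides by n - 1 and `statistics.pstdev` divides by N. This is a minimal sketch using all eight observations listed above.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]

s = statistics.stdev(data)       # sample: denominator n - 1
sigma = statistics.pstdev(data)  # population: denominator N

print(round(s, 4), round(sigma, 4))

# The two differ only by the factor sqrt(n / (n - 1)).
ratio = s / sigma
print(math.isclose(ratio, math.sqrt(len(data) / (len(data) - 1))))
```

The gap between the two values shrinks as the number of observations grows, since n / (n - 1) approaches 1.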
25
Comparing Standard Deviations
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
[Figure: dot plots of Data A, Data B, and Data C, each on an axis running from 11 to 21]
26
Measures of Association
Two numerical measures are presented for describing the linear relationship between two variables depicted in a scatter diagram.
– Covariance: is there any pattern to the way two variables move together?
– Correlation coefficient: how strong is the linear relationship between two variables?
27
The covariance
$$\text{Population covariance} = COV(X,Y) = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$
$$\text{Sample covariance} = cov(X,Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
$\mu_x$ ($\mu_y$) is the population mean of the variable X (Y); $\bar{x}$ ($\bar{y}$) is the corresponding sample mean.
N is the population size. n is the sample size.
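Both covariance formulas can be sketched in a few lines of Python. The paired data `xs` and `ys` are made up for illustration, and the function names are our own.

```python
# Hypothetical paired observations (not from the course data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sum_of_cross_products(x_values, y_values):
    """Sum of (x_i - mean of x) * (y_i - mean of y) over all pairs."""
    mean_x = sum(x_values) / len(x_values)
    mean_y = sum(y_values) / len(y_values)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))


def population_covariance(x_values, y_values):
    return sum_of_cross_products(x_values, y_values) / len(x_values)


def sample_covariance(x_values, y_values):
    return sum_of_cross_products(x_values, y_values) / (len(x_values) - 1)


pop_cov = population_covariance(xs, ys)
samp_cov = sample_covariance(xs, ys)
print(pop_cov, samp_cov)  # positive: the two variables tend to move together
```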
28
The coefficient of correlation
– This coefficient answers the question: How strong is the association between X and Y?
$$\text{Population coefficient of correlation:} \quad \rho = \frac{COV(X,Y)}{\sigma_x \sigma_y}$$
$$\text{Sample coefficient of correlation:} \quad r = \frac{cov(X,Y)}{s_x s_y}$$
29
$\rho$ or r = +1: Strong positive linear relationship (COV(X,Y) > 0)
$\rho$ or r = 0: No linear relationship (COV(X,Y) = 0)
$\rho$ or r = -1: Strong negative linear relationship (COV(X,Y) < 0)
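The three reference values of r can be reproduced with small made-up data sets: one perfectly increasing, one perfectly decreasing, and one with zero covariance. This is a sketch of the sample formula r = cov(X,Y) / (s_x s_y); all data and function names here are our own.

```python
import math


def sample_covariance(x_values, y_values):
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    return sum((x - mean_x) * (y - mean_y)
               for x, y in zip(x_values, y_values)) / (n - 1)


def sample_std(values):
    # The sample variance is the covariance of a variable with itself.
    return math.sqrt(sample_covariance(values, values))


def correlation(x_values, y_values):
    return sample_covariance(x_values, y_values) / (
        sample_std(x_values) * sample_std(y_values))


xs = [1, 2, 3, 4, 5]
r_positive = correlation(xs, [2, 4, 6, 8, 10])  # y rises with x along a line
r_negative = correlation(xs, [10, 8, 6, 4, 2])  # y falls with x along a line
r_none = correlation(xs, [3, 1, 2, 1, 3])       # no linear pattern

print(round(r_positive, 6), round(r_negative, 6), round(r_none, 6))
```

Unlike the covariance, r is unit-free and always lies between -1 and +1, which is what makes its magnitude interpretable as the strength of the linear relationship.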