1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.
Transcript of 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.
![Page 1: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/1.jpg)
© 2007 Thomson South-Western. All Rights Reserved.
![Page 2: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/2.jpg)
Chapter 3
Descriptive Statistics: Numerical Measures
• Measures of Location
• Measures of Variability
• Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Measures of Association Between Two Variables
• Weighted Mean
![Page 3: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/3.jpg)
Measures of Location
• Mean
• Median
• Mode
• Percentiles
• Quartiles
If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
A sample statistic is referred to as the point estimator of the corresponding population parameter.
![Page 4: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/4.jpg)
Mean
The mean of a data set is the average of all the data values.
The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.
![Page 5: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/5.jpg)
Sample Mean $\bar{x}$
$$\bar{x} = \frac{\sum x_i}{n}$$
where $\sum x_i$ is the sum of the values of the $n$ observations and $n$ is the number of observations in the sample.
![Page 6: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/6.jpg)
Population Mean $\mu$
$$\mu = \frac{\sum x_i}{N}$$
where $\sum x_i$ is the sum of the values of the $N$ observations and $N$ is the number of observations in the population.
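The two formulas above perform the same arithmetic; only the interpretation (sample versus population) differs. A minimal Python sketch, assuming a hypothetical helper named `mean` and using the seven observations from the median example in this deck as illustrative data:

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Illustrative data only: the seven observations used in the median example.
sample = [26, 28, 27, 22, 24, 27, 12]

x_bar = mean(sample)  # the values sum to 166 and n is 7
print(f"sample mean = {x_bar:.2f}")
```

If the seven values were the entire population rather than a sample, the same number would be reported as $\mu$ instead of $\bar{x}$.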
![Page 7: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/7.jpg)
Median
The median of a data set is the value in the middle when the data items are arranged in ascending order.
Whenever a data set has extreme values, the median is the preferred measure of central location.
The median is the measure of location most often reported for annual income and property value data.
A few extremely large incomes or property values can inflate the mean.
![Page 8: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/8.jpg)
Median
For an odd number of observations:
7 observations: 26, 28, 27, 22, 24, 27, 12
in ascending order: 12, 22, 24, 26, 27, 27, 28
the median is the middle value.
Median = 26
![Page 9: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/9.jpg)
Median
For an even number of observations:
8 observations: 26, 28, 27, 22, 24, 27, 30, 12
in ascending order: 12, 22, 24, 26, 27, 27, 28, 30
the median is the average of the middle two values.
Median = (26 + 27)/2 = 26.5
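The odd and even cases can be checked with a short Python sketch. The function name `median` and the variable names are illustrative; the data are the observations from the two median slides.

```python
def median(values):
    """Return the middle value of the sorted data (odd n),
    or the average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


seven_obs = [26, 28, 27, 22, 24, 27, 12]
eight_obs = seven_obs + [30]

print(median(seven_obs))  # middle (4th) value of the sorted data
print(median(eight_obs))  # average of the 4th and 5th sorted values
```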
![Page 10: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/10.jpg)
Mean VS Median
The mean IS affected by outliers (extreme observations)
The median IS NOT affected by outliers
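A small experiment illustrates the contrast. The sketch below uses Python's standard `statistics` module; the data set and the replacement of 28 by 280 are made up for illustration and are not from the textbook.

```python
import statistics

# Seven observations, then the same data with the largest value
# replaced by an extreme one (a hypothetical recording error).
original = [12, 22, 24, 26, 27, 27, 28]
with_outlier = [12, 22, 24, 26, 27, 27, 280]

for label, data in [("original", original), ("with outlier", with_outlier)]:
    print(label, "mean:", round(statistics.mean(data), 2),
          "median:", statistics.median(data))
```

In this example the mean is pulled far above every typical value, while the median stays at the same middle observation.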
![Page 11: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/11.jpg)
Mode
The mode of a data set is the value that occurs with greatest frequency.
The greatest frequency can occur at two or more different values.
If the data have exactly two modes, the data are bimodal.
If the data have more than two modes, the data are multimodal.
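One way to find every mode, sketched in Python with `collections.Counter`. The helper name `modes` and the second data set are assumptions made for illustration.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the greatest frequency, sorted."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(modes([26, 28, 27, 22, 24, 27, 12]))  # one mode
print(modes([1, 1, 2, 2, 3]))               # two modes: bimodal
```

Note that under this simple rule a data set in which every value appears once returns every value, so in practice a mode is usually reported only when some value actually repeats.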
![Page 12: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/12.jpg)
Percentiles
A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
Admission test scores for colleges and universities are frequently reported in terms of percentiles.
![Page 13: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/13.jpg)
Percentiles
The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
![Page 14: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/14.jpg)
Percentiles
• Arrange the data in ascending order.
• Compute index $i$, the position of the pth percentile: $i = (p/100)n$
• If $i$ is not an integer, round up. The pth percentile is the value in the $i$th position.
• If $i$ is an integer, the pth percentile is the average of the values in positions $i$ and $i + 1$.
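The steps above can be sketched in Python as follows. The function name `textbook_percentile` is hypothetical, and `Fraction` is used only so that the "is i an integer?" test is not disturbed by floating-point rounding. The sketch assumes 0 < p < 100.

```python
import math
from fractions import Fraction


def textbook_percentile(values, p):
    """pth percentile by the index rule i = (p/100) * n (positions are 1-based)."""
    if not 0 < p < 100:
        raise ValueError("this sketch assumes 0 < p < 100")
    ordered = sorted(values)          # step 1: ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("empty data set")
    i = Fraction(p) * n / 100         # step 2: index i
    if i.denominator != 1:            # i is not an integer: round up
        return ordered[math.ceil(i) - 1]
    position = int(i)                 # i is an integer: average positions i and i + 1
    return (ordered[position - 1] + ordered[position]) / 2


eight_obs = [26, 28, 27, 22, 24, 27, 30, 12]
print(textbook_percentile(eight_obs, 25))  # i = 2, so average positions 2 and 3
print(textbook_percentile(eight_obs, 50))  # i = 4, so average positions 4 and 5
```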
![Page 15: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/15.jpg)
Note on Excel’s PERCENTILE Function
The formula that Excel uses is different from the one used in the textbook!
To find the position of the pth percentile, Excel uses the following formula:
$$L_p = (p/100)n + (1 - p/100)$$
Once the position is identified:
1. If $L_p$ is a whole number (e.g. 12), Excel’s result will be the same as the textbook’s.
2. If $L_p$ is not a whole number (e.g. 12.3), Excel interpolates between the two neighbouring observations, so its result may differ from the textbook’s.
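A rough Python sketch of this location formula, assuming linear interpolation between the two neighbouring sorted observations when $L_p$ is not a whole number. The function name `excel_style_percentile` is hypothetical and this is not Excel's actual code.

```python
import math


def excel_style_percentile(values, p):
    """Percentile located at L_p = (p/100)*n + (1 - p/100), 1-based,
    with linear interpolation between neighbouring sorted values."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("empty data set")
    location = (p / 100) * n + (1 - p / 100)
    lower = min(max(math.floor(location), 1), n)  # 1-based position at or below L_p
    if lower == n:                                # L_p points at the largest value
        return float(ordered[-1])
    fraction = location - lower
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


eight_obs = [26, 28, 27, 22, 24, 27, 30, 12]
print(excel_style_percentile(eight_obs, 25))  # L_p = 2.75, between the 2nd and 3rd values
```

For these eight observations the textbook rule gives 23 for the 25th percentile (the average of positions 2 and 3), so the two methods disagree here, while for the 50th percentile both give 26.5.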
![Page 16: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/16.jpg)
Quartiles
Quartiles are specific percentiles.
• First Quartile = 25th Percentile
• Second Quartile = 50th Percentile = Median
• Third Quartile = 75th Percentile
![Page 17: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/17.jpg)
Measures of Variability
It is often desirable to consider measures of variability (dispersion), as well as measures of location.
For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.
![Page 18: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/18.jpg)
Measures of Variability
• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation
![Page 19: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/19.jpg)
Range
The range of a data set is the difference between the largest and smallest data values.
It is the simplest measure of variability.
It is very sensitive to the smallest and largest data values.
![Page 20: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/20.jpg)
Interquartile Range
The interquartile range of a data set is the difference between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values.
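To see how the range and the interquartile range react to an extreme value, here is a minimal Python sketch. The quartiles are computed with the textbook index rule $i = (p/100)n$; the function names and the data (with 28 replaced by 280 in the second set) are illustrative assumptions.

```python
import math
from fractions import Fraction


def percentile(values, p):
    """Textbook index rule: i = (p/100)*n, round up or average positions i and i+1."""
    ordered = sorted(values)
    i = Fraction(p) * len(ordered) / 100
    if i.denominator != 1:
        return ordered[math.ceil(i) - 1]
    return (ordered[int(i) - 1] + ordered[int(i)]) / 2


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Third quartile (75th percentile) minus first quartile (25th percentile)."""
    return percentile(values, 75) - percentile(values, 25)


data = [26, 28, 27, 22, 24, 27, 12]
data_with_extreme = [26, 280, 27, 22, 24, 27, 12]  # hypothetical extreme value

for d in (data, data_with_extreme):
    print("range:", value_range(d), "IQR:", interquartile_range(d))
```

In this example the range grows dramatically when the extreme value is introduced, while the interquartile range is unchanged.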
![Page 21: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/21.jpg)
Variance
The variance is a measure of variability that utilizes all the data.
It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).
![Page 22: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/22.jpg)
Variance
The variance is the average of the squared differences between each data value and the mean.
The variance is computed as follows:
for a sample:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
for a population:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
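The only difference between the two variance formulas is the divisor: $n - 1$ for a sample and $N$ for a population. A short Python sketch with hypothetical function names and illustrative data:

```python
def sample_variance(values):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """sigma^2: sum of squared deviations from the population mean, divided by N."""
    N = len(values)
    if N == 0:
        raise ValueError("empty data set")
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


data = [26, 28, 27, 22, 24, 27, 12]  # illustrative data
print(round(sample_variance(data), 4), round(population_variance(data), 4))
```

For the same data the sample variance is always the larger of the two, because the same sum of squares is divided by a smaller number.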
![Page 23: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/23.jpg)
Standard Deviation
The standard deviation of a data set is the positive square root of the variance.
It is measured in the same units as the data, making it more easily interpreted than the variance.
![Page 24: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/24.jpg)
Standard Deviation
The standard deviation is computed as follows:
for a sample:
$$s = \sqrt{s^2}$$
for a population:
$$\sigma = \sqrt{\sigma^2}$$
![Page 25: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/25.jpg)
Coefficient of Variation
The coefficient of variation indicates how large the standard deviation is in relation to the mean.
The coefficient of variation is computed as follows:
for a sample:
$$\left( \frac{s}{\bar{x}} \times 100 \right)\%$$
for a population:
$$\left( \frac{\sigma}{\mu} \times 100 \right)\%$$
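As a sketch of how the sample standard deviation and the coefficient of variation follow from the variance, assuming hypothetical function names and illustrative data:

```python
import math


def sample_std(values):
    """s: positive square root of the sample variance (divisor n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def coefficient_of_variation(std, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return std / mean * 100


data = [26, 28, 27, 22, 24, 27, 12]  # illustrative data
s = sample_std(data)
x_bar = sum(data) / len(data)
print(f"s = {s:.2f}, CV = {coefficient_of_variation(s, x_bar):.1f}%")
```

Because the coefficient of variation is a ratio, it has no units, which makes it convenient for comparing the variability of data sets measured on different scales.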
![Page 26: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/26.jpg)
Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Distribution Shape
• z-Scores
• Chebyshev’s Theorem
• Empirical Rule
• Detecting Outliers
![Page 27: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/27.jpg)
Distribution Shape: Skewness
An important measure of the shape of a distribution is called skewness.
The formula for computing skewness for a data set is somewhat complex.
• Skewness can be easily computed using statistical software.
• Excel’s SKEW function can be used to compute the skewness of a data set.
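For readers without a spreadsheet at hand, the sketch below computes a sample skewness in Python. It assumes the adjusted formula $\frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3$, which to my understanding is the one documented for Excel's SKEW; the function name and the data sets are hypothetical.

```python
import math


def sample_skewness(values):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum(((x - x_bar)/s)**3)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)


print(sample_skewness([1, 2, 3, 4, 5]))     # symmetric data
print(sample_skewness([1, 2, 3, 4, 20]))    # long right tail
print(sample_skewness([1, 17, 18, 19, 20])) # long left tail
```

The sign of the result is the part that matters for interpretation: roughly zero for symmetric data, positive for a long right tail, negative for a long left tail.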
![Page 28: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/28.jpg)
Distribution Shape: Skewness
Symmetric (not skewed)
• Skewness is zero.
• Mean and median are equal.
[Figure: relative frequency histogram (vertical axis from 0 to .35) of a symmetric distribution; Skewness = 0]
![Page 29: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/29.jpg)
Distribution Shape: Skewness
Moderately Skewed Left
• Skewness is negative.
• Mean will usually be less than the median.
[Figure: relative frequency histogram (vertical axis from 0 to .35) of a distribution skewed to the left; Skewness = -.31]
![Page 30: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/30.jpg)
Distribution Shape: Skewness
Moderately Skewed Right
• Skewness is positive.
• Mean will usually be more than the median.
[Figure: relative frequency histogram (vertical axis from 0 to .35) of a distribution skewed to the right; Skewness = .31]
![Page 31: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/31.jpg)
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value $x_i$ is from the mean.
$$z_i = \frac{x_i - \bar{x}}{s}$$
![Page 32: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/32.jpg)
z-Scores
An observation’s z-score is a measure of the relative location of the observation in a data set.
• A data value less than the sample mean will have a z-score less than zero.
• A data value greater than the sample mean will have a z-score greater than zero.
• A data value equal to the sample mean will have a z-score of zero.
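The sign behaviour described above is easy to verify numerically. A minimal Python sketch, assuming the sample standard deviation (divisor $n - 1$) and a hypothetical function name and data set:

```python
import math


def z_scores(values):
    """Return (x_i - x_bar) / s for every observation, with s the sample std."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - x_bar) / s for x in values]


data = [1, 2, 3, 4, 5]  # illustrative data with mean 3
for x, z in zip(data, z_scores(data)):
    print(x, round(z, 3))
```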
![Page 33: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/33.jpg)
Chebyshev’s Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within $z$ standard deviations of the mean, where $z$ is any value greater than 1.
![Page 34: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/34.jpg)
Chebyshev’s Theorem
• At least 75% of the data values must be within z = 2 standard deviations of the mean.
• At least 89% of the data values must be within z = 3 standard deviations of the mean.
• At least 94% of the data values must be within z = 4 standard deviations of the mean.
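The three percentages follow directly from evaluating $1 - 1/z^2$. A tiny Python sketch (the function name is hypothetical):

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of items within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem applies only for z greater than 1")
    return 1 - 1 / z ** 2


for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.0%}")
```

The bound also works for non-integer values such as z = 1.5, which is one reason the theorem is stated for any z greater than 1.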
![Page 35: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/35.jpg)
Empirical Rule
For data having a bell-shaped distribution:
• 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
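These percentages can be reproduced from the normal distribution itself. The sketch below uses the identity that the share of a normal variable within $k$ standard deviations of its mean equals $\operatorname{erf}(k/\sqrt{2})$; the function name is hypothetical.

```python
import math


def normal_share_within(k):
    """Proportion of a normal random variable within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k} standard deviation(s): {normal_share_within(k):.2%}")
```

Direct evaluation gives figures that are 0.01 percentage points higher than the ones quoted above (68.27%, 95.45%, 99.73%); the small gap is consistent with the quoted figures having been obtained by doubling rounded entries from a printed normal table.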
![Page 36: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/36.jpg)
Empirical Rule
[Figure: bell-shaped curve over x with the axis marked at μ - 3σ, μ - 2σ, μ - 1σ, μ, μ + 1σ, μ + 2σ, and μ + 3σ; the three nested intervals contain 68.26%, 95.44%, and 99.72% of the values]
![Page 37: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/37.jpg)
Detecting Outliers
An outlier is an unusually small or unusually large value in a data set.
A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
It might be:
• an incorrectly recorded data value
• a data value that was incorrectly included in the data set
• a correctly recorded data value that belongs in the data set
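The z-score screen can be sketched in Python as below. The function name, the threshold parameter, and the data are illustrative assumptions; a flagged value is only a candidate for review, since it may still be a correctly recorded observation.

```python
import math


def find_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    if s == 0:
        return []
    return [x for x in values if abs((x - x_bar) / s) > threshold]


# Made-up data: 19 values near 10 and one suspicious value of 95.
data = [9, 10, 11, 10, 12, 8, 10, 11, 9, 10,
        10, 11, 9, 10, 12, 8, 10, 11, 9, 95]
print(find_outliers(data))
```

One caveat worth knowing: in a sample of size n, no z-score can exceed $(n - 1)/\sqrt{n}$ in absolute value, so with ten or fewer observations this rule can never flag anything.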
![Page 38: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/38.jpg)
Measures of Association Between Two Variables
• Covariance
• Correlation Coefficient
![Page 39: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/39.jpg)
Covariance
The covariance is a measure of the linear association between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.
![Page 40: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/40.jpg)
Covariance
The covariance is computed as follows:
for samples:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
for populations:
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$
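A minimal Python sketch of both covariance formulas, using hypothetical function names and made-up paired data in which y is exactly twice x:

```python
def sample_covariance(xs, ys):
    """s_xy: sum of products of deviations from the sample means, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least two pairs")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def population_covariance(xs, ys):
    """sigma_xy: sum of products of deviations from the population means, divided by N."""
    N = len(xs)
    if N != len(ys) or N == 0:
        raise ValueError("need two equally long, non-empty series")
    mu_x, mu_y = sum(xs) / N, sum(ys) / N
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / N


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(sample_covariance(xs, ys), population_covariance(xs, ys))
```

The size of a covariance depends on the units of both variables, which is why a positive or negative sign is easier to interpret than the magnitude.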
![Page 41: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/41.jpg)
Correlation Coefficient
The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.
![Page 42: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/42.jpg)
Correlation Coefficient
The correlation coefficient is computed as follows:

for samples:
$$ r_{xy} = \frac{s_{xy}}{s_x s_y} $$

for populations:
$$ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} $$
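The sample formula divides the sample covariance by the product of the two sample standard deviations. The sketch below spells that out in plain Python; the function name `sample_correlation` and both small data sets are hypothetical and serve only as an illustration.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r_xy = s_xy / (s_x * s_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Hypothetical data: points lying exactly on a rising line and on a falling line.
r_up = sample_correlation([1, 2, 3, 4], [10, 20, 30, 40])
r_down = sample_correlation([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

Points that fall exactly on a straight line give the extreme values +1 and -1 (up to floating-point rounding), which matches the stated range of the coefficient.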
![Page 43: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/43.jpg)
Correlation Coefficient
Correlation is a measure of linear association and not necessarily causation.
Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
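The word "linear" matters here: the coefficient describes only straight-line patterns. As a small illustration with hypothetical data (not from the slides), the sketch below uses y = x² on values of x that are symmetric around zero. Here y is completely determined by x, yet the sample correlation comes out as zero. The example addresses the linearity point only; it says nothing about causation.

```python
import math

# Hypothetical data: y depends entirely on x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))

r = s_xy / (s_x * s_y)
print(r)
```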
![Page 44: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/44.jpg)
Covariance and Correlation Coefficient
A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

| Average Driving Distance (yds.) | Average 18-Hole Score |
|---|---|
| 277.6 | 69 |
| 259.5 | 71 |
| 269.1 | 70 |
| 267.0 | 70 |
| 255.6 | 71 |
| 272.9 | 69 |
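As a sketch, the sample formulas can be applied to the six observations of the golfer example. The variable names are assumptions; the data are the values listed on the slide, treated as a sample.

```python
from statistics import mean, stdev

# Data from the golfer example.
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                        # average 18-hole score

n = len(distance)
x_bar = mean(distance)
y_bar = mean(score)

# Sample covariance: sum of cross-products of deviations, divided by n - 1.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, score)) / (n - 1)

# Sample correlation: covariance divided by the product of the
# sample standard deviations (stdev also divides by n - 1).
r_xy = s_xy / (stdev(distance) * stdev(score))

print(round(s_xy, 2))  # about -7.08
print(round(r_xy, 4))  # about -0.9631
```

Under these assumptions the covariance is negative and the correlation is close to -1, which points to a strong negative linear relationship: in this sample, longer average drives go with lower average scores.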
![Page 45: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/45.jpg)
Weighted Mean
When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.
When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.
![Page 46: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/46.jpg)
Weighted Mean

$$ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} $$

where:
x_i = value of observation i
w_i = weight for observation i
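The weighted mean formula can be sketched in a few lines of Python. The function name `weighted_mean` and the GPA figures (grade points and credit hours for three courses) are hypothetical and chosen only to illustrate the calculation.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical GPA example: grade points weighted by credit hours.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 3]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)  # (12 + 12 + 6) / 10 = 3.0
```

With equal weights the result reduces to the ordinary mean, so the weights matter only when the observations differ in importance.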
![Page 47: 1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649eeb5503460f94bfcb43/html5/thumbnails/47.jpg)
In-class empirical exercises