Slides Prepared by JOHN S. LOUCKS - Cameron Universitysyeda/orgl333/ch3b.pdf · Chebyshev’s...

© 2006 Thomson/South-Western

Slides Prepared by

JOHN S. LOUCKS, St. Edward’s University



Chapter 3
Descriptive Statistics: Numerical Measures

Part B
• Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Exploratory Data Analysis
• Measures of Association Between Two Variables
• The Weighted Mean and Working with Grouped Data


Measures of Distribution Shape, Relative Location, and Detecting Outliers

• Distribution Shape
• z-Scores
• Chebyshev’s Theorem
• Empirical Rule
• Detecting Outliers


Distribution Shape: Skewness

An important measure of the shape of a distribution is called skewness.

The formula for computing skewness for a data set is somewhat complex.

Skewness can be easily computed using statistical software.


Distribution Shape: Skewness

Symmetric (not skewed)
• Skewness is zero.
• Mean and median are equal.

[Figure: relative frequency histogram, vertical axis "Relative Frequency" from 0 to .35; Skewness = 0]


Distribution Shape: Skewness

Moderately Skewed Left
• Skewness is negative.
• Mean will usually be less than the median.

[Figure: relative frequency histogram, vertical axis "Relative Frequency" from 0 to .35; Skewness = −.31]



Moderately Skewed Right
• Skewness is positive.
• Mean will usually be more than the median.

[Figure: relative frequency histogram, vertical axis "Relative Frequency" from 0 to .35; Skewness = .31]



Highly Skewed Right
• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.

[Figure: relative frequency histogram, vertical axis "Relative Frequency" from 0 to .35; Skewness = 1.25]


Example: Apartment Rents

Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed below in ascending order.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

[Figure: relative frequency histogram of the apartment rents, vertical axis "Relative Frequency" from 0 to .35; Skewness = .92]
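The slides leave the skewness formula to statistical software. As a minimal sketch, assuming the software reports the adjusted sample skewness n/((n-1)(n-2)) · Σ((x_i - x̄)/s)³ (the convention used by Excel's SKEW function; the slides do not say which formula produced the .92), the calculation for the rent data could look like this. The name `sample_skewness` is illustrative only.

```python
from statistics import mean, stdev

# Monthly rents for the 70 sampled efficiency apartments (from the slide).
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_skewness(data):
    """Adjusted sample skewness (assumed formula, see the lead-in)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation, n - 1 divisor
    cubed = sum(((x - x_bar) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed


# A positive result indicates a right-skewed distribution.
print(round(sample_skewness(RENTS), 2))
```

A perfectly symmetric data set such as 1, 2, 3, 4, 5 gives a skewness of zero under this formula, matching the "Symmetric" case above.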



z-Scores

The z-score is often called the standardized value.

It denotes the number of standard deviations a data value $x_i$ is from the mean.

$z_i = \dfrac{x_i - \bar{x}}{s}$


z-Scores

An observation’s z-score is a measure of the relative location of the observation in a data set.

• A data value less than the sample mean will have a z-score less than zero.
• A data value greater than the sample mean will have a z-score greater than zero.
• A data value equal to the sample mean will have a z-score of zero.


z-Score of Smallest Value (425)

$z = \dfrac{x_i - \bar{x}}{s} = \dfrac{425 - 490.80}{54.74} = -1.20$

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
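The standardized values can be reproduced directly from the rent data. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and `stdev` is the sample standard deviation (n - 1 divisor), which is what the 54.74 on the slide corresponds to.

```python
from statistics import mean, stdev

# Monthly rents for the 70 sampled efficiency apartments (from the slide).
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = mean(RENTS)   # sample mean
s = stdev(RENTS)      # sample standard deviation

# z_i = (x_i - x_bar) / s for every observation
z_scores = [(x - x_bar) / s for x in RENTS]

print(round(x_bar, 2), round(s, 2))                   # 490.8 54.74
print(round(z_scores[0], 2), round(z_scores[-1], 2))  # -1.2 2.27
```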


Chebyshev’s Theorem

At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.


At least 75% of the data values must be within z = 2 standard deviations of the mean.











For example:

Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74.

At least $(1 - 1/(1.5)^2)$ = 1 − 0.44 = 0.56 or 56% of the rent values must be between

$\bar{x} - z(s)$ = 490.80 − 1.5(54.74) = 409
and
$\bar{x} + z(s)$ = 490.80 + 1.5(54.74) = 573

(Actually, 86% of the rent values are between 409 and 573.)
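The worked example can be checked numerically. This is a minimal sketch, not part of the original slides: `chebyshev_bound` is a hypothetical helper name, and the script compares the guaranteed minimum share with the share actually observed in the rent data.

```python
from statistics import mean, stdev

# Monthly rents for the 70 sampled efficiency apartments (from the slide).
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def chebyshev_bound(z):
    """Minimum share of items within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


z = 1.5
x_bar, s = mean(RENTS), stdev(RENTS)
lower, upper = x_bar - z * s, x_bar + z * s

# Share of rents that actually fall inside the interval
actual = sum(lower <= x <= upper for x in RENTS) / len(RENTS)

print(round(chebyshev_bound(z), 2))   # guaranteed minimum: 0.56
print(round(lower), round(upper))     # 409 573
print(round(actual, 2))               # observed share: 0.86
```

The theorem only guarantees a floor, so the observed share being well above 56% is expected.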


Empirical Rule

For data having a bell-shaped distribution:

• 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.


Empirical Rule

[Figure: bell-shaped curve over x with axis marks at μ – 3σ, μ – 2σ, μ – 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ; 68.26% of the area lies within μ ± 1σ, 95.44% within μ ± 2σ, and 99.72% within μ ± 3σ]
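The empirical-rule percentages are areas under the normal curve. As a rough check, the probability that a normal random variable lies within k standard deviations of its mean equals erf(k/√2), which Python's `math.erf` can evaluate. The helper name `normal_within` is illustrative. The exact values differ from the slide's figures in the last digit, presumably because the slide's numbers come from a rounded normal table (an assumption; the slides do not say).

```python
import math


def normal_within(k):
    """P(|X - mu| <= k * sigma) for a normal random variable."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints the percentage of values within +/- k standard deviations.
    print(k, round(100 * normal_within(k), 2))
```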


Detecting Outliers

An outlier is an unusually small or unusually large value in a data set.

A data value with a z-score less than -3 or greater than +3 might be considered an outlier.

It might be:
• an incorrectly recorded data value
• a data value that was incorrectly included in the data set
• a correctly recorded data value that belongs in the data set


Detecting Outliers

Among the standardized values for the apartment rents, the most extreme z-scores are -1.20 and 2.27.

Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.


Exploratory Data Analysis

• Five-Number Summary
• Box Plot


Five-Number Summary

1. Smallest Value
2. First Quartile
3. Median
4. Third Quartile
5. Largest Value


Five-Number Summary

For the 70 apartment rents:

Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615


Box Plot

A box is drawn with its ends located at the first and third quartiles.

A vertical line is drawn in the box at the location of the median (second quartile).

[Figure: box plot on an axis from 375 to 625 in steps of 25; Q1 = 445, Q2 = 475, Q3 = 525]


Box Plot

Limits are located (not drawn) using the interquartile range (IQR).

Data outside these limits are considered outliers.

The location of each outlier is shown with the symbol *.

… continued


Box Plot

With Q1 = 445 and Q3 = 525, IQR = Q3 - Q1 = 80.

The lower limit is located 1.5(IQR) below Q1.

Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

The upper limit is located 1.5(IQR) above Q3.

Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

There are no outliers (values less than 325 or greater than 645) in the apartment rent data.


Box Plot

Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

[Figure: box plot with whiskers on an axis from 375 to 625 in steps of 25; smallest value inside limits = 425, largest value inside limits = 615]
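The five-number summary and the box-plot limits can be sketched in a few lines. This part of the slides does not state how the quartiles are located, so the sketch assumes an index rule: compute i = (p/100)n, round up when i is not an integer, and average the values in positions i and i + 1 when it is. That rule reproduces the quartiles quoted for the rent data. The helper name `percentile` is hypothetical, and the sketch computes IQR as Q3 - Q1 = 525 - 445 = 80.

```python
import math

# Monthly rents for the 70 sampled efficiency apartments, in ascending order.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(sorted_data, p):
    """p-th percentile (0 < p < 100) by the assumed index rule."""
    n = len(sorted_data)
    i = p * n / 100
    if i == int(i):
        i = int(i)
        # Integer index: average the values in positions i and i + 1.
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    # Non-integer index: round up to the next position.
    return sorted_data[math.ceil(i) - 1]


q1, median, q3 = (percentile(RENTS, p) for p in (25, 50, 75))
five_number = (RENTS[0], q1, median, q3, RENTS[-1])

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

outliers = [x for x in RENTS if x < lower_limit or x > upper_limit]
inside = [x for x in RENTS if lower_limit <= x <= upper_limit]
whiskers = (min(inside), max(inside))  # whisker end points

print(five_number)
print(iqr, lower_limit, upper_limit)
print(outliers, whiskers)
```

Library routines may use different quartile conventions, so their results can differ slightly from the hand method assumed here.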


Measures of Association Between Two Variables

• Covariance
• Correlation Coefficient


Covariance

The covariance is a measure of the linear association between two variables.

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.


Covariance

The covariance is computed as follows:

for samples:
$s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

for populations:
$\sigma_{xy} = \dfrac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$


Correlation Coefficient

The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear relationship.

Values near +1 indicate a strong positive linear relationship.


The correlation coefficient is computed as follows:

for samples:
$r_{xy} = \dfrac{s_{xy}}{s_x s_y}$

for populations:
$\rho_{xy} = \dfrac{\sigma_{xy}}{\sigma_x \sigma_y}$




Correlation is a measure of linear association and not necessarily causation.

Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.


Covariance and Correlation Coefficient

A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving Distance (yds.)    Average 18-Hole Score
277.6                              69
259.5                              71
269.1                              70
267.0                              70
255.6                              71
272.9                              69



x           y        (x_i - x̄)    (y_i - ȳ)    (x_i - x̄)(y_i - ȳ)
277.6       69        10.65        -1.0         -10.65
259.5       71        -7.45         1.0          -7.45
269.1       70         2.15         0             0
267.0       70         0.05         0             0
255.6       71       -11.35         1.0         -11.35
272.9       69         5.95        -1.0          -5.95

Average     267.0     70.0                      Total: -35.40
Std. Dev.   8.2192    .8944


Sample Covariance

$s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \dfrac{-35.40}{6 - 1} = -7.08$

Sample Correlation Coefficient

$r_{xy} = \dfrac{s_{xy}}{s_x s_y} = \dfrac{-7.08}{(8.2192)(.8944)} = -.9631$
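The golfer example can be reproduced with a short script. This is a minimal sketch of the sample formulas (n - 1 divisor) using only the standard library; `sample_covariance` and the variable names are illustrative, not taken from the slides.

```python
from statistics import mean, stdev

# Golfer data from the slide: average driving distance (yds.) and 18-hole score.
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
score = [69, 71, 70, 70, 71, 69]


def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    x_bar, y_bar = mean(x), mean(y)
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (len(x) - 1)


s_xy = sample_covariance(distance, score)

# r_xy = s_xy / (s_x * s_y), using sample standard deviations
r_xy = s_xy / (stdev(distance) * stdev(score))

print(round(s_xy, 2))   # -7.08
print(round(r_xy, 4))   # -0.9631
```

The strongly negative coefficient says that, in this sample, longer drives go with lower scores; it does not by itself show that one causes the other.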


The Weighted Mean and Working with Grouped Data

• Weighted Mean
• Mean for Grouped Data
• Variance for Grouped Data
• Standard Deviation for Grouped Data


Weighted Mean

When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.

In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.

When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.


Weighted Mean

$\bar{x} = \dfrac{\sum w_i x_i}{\sum w_i}$

where:
$x_i$ = value of observation i
$w_i$ = weight for observation i
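To make the formula concrete, here is a minimal sketch of a GPA-style calculation. The grade points and credit hours below are hypothetical numbers chosen for illustration; they do not come from the slides, and `weighted_mean` is an illustrative name.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical transcript: grade points earned and credit hours per course.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 3]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)  # (12 + 12 + 6) / 10 = 3.0
```

When all weights are equal, the weighted mean reduces to the ordinary mean.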


Grouped Data

The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for grouped data.

To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.

We compute a weighted mean of the class midpoints using the class frequencies as weights.

Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.


Mean for Grouped Data

Sample Data:
$\bar{x} = \dfrac{\sum f_i M_i}{n}$

Population Data:
$\mu = \dfrac{\sum f_i M_i}{N}$

where:
$f_i$ = frequency of class i
$M_i$ = midpoint of class i


Sample Mean for Grouped Data

Given below is the previous sample of monthly rents for 70 efficiency apartments, presented here as grouped data in the form of a frequency distribution.

Rent ($)    Frequency
420-439      8
440-459     17
460-479     12
480-499      8
500-519      7
520-539      4
540-559      2
560-579      4
580-599      2
600-619      6


Sample Mean for Grouped Data

Rent ($)    f_i    M_i      f_i M_i
420-439      8     429.5     3436.0
440-459     17     449.5     7641.5
460-479     12     469.5     5634.0
480-499      8     489.5     3916.0
500-519      7     509.5     3566.5
520-539      4     529.5     2118.0
540-559      2     549.5     1099.0
560-579      4     569.5     2278.0
580-599      2     589.5     1179.0
600-619      6     609.5     3657.0
Total       70              34525.0

$\bar{x} = \dfrac{34,525}{70} = 493.21$

This approximation differs by $2.41 from the actual sample mean of $490.80.


Variance for Grouped Data

For sample data:
$s^2 = \dfrac{\sum f_i (M_i - \bar{x})^2}{n - 1}$

For population data:
$\sigma^2 = \dfrac{\sum f_i (M_i - \mu)^2}{N}$


Sample Variance for Grouped Data

Rent ($)    f_i    M_i      M_i - x̄    (M_i - x̄)²    f_i(M_i - x̄)²
420-439      8     429.5    -63.7       4058.96        32471.71
440-459     17     449.5    -43.7       1910.56        32479.59
460-479     12     469.5    -23.7        562.16         6745.97
480-499      8     489.5     -3.7         13.76          110.11
500-519      7     509.5     16.3        265.36         1857.55
520-539      4     529.5     36.3       1316.96         5267.86
540-559      2     549.5     56.3       3168.56         6337.13
560-579      4     569.5     76.3       5820.16        23280.66
580-599      2     589.5     96.3       9271.76        18543.53
600-619      6     609.5    116.3      13523.36        81140.18
Total       70                                        208234.29

continued


Sample Variance for Grouped Data

Sample Variance
$s^2 = 208,234.29/(70 - 1) = 3,017.89$

Sample Standard Deviation
$s = \sqrt{3,017.89} = 54.94$

This approximation differs by only $.20 from the actual standard deviation of $54.74.
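The grouped-data approximations can be reproduced from the frequency distribution alone. The sketch below is illustrative rather than the authors' procedure: it takes each class midpoint as the average of the class limits (which matches the midpoints in the table) and carries the unrounded mean through the variance calculation, so intermediate values can differ from the table in the last decimal place.

```python
import math

# Frequency distribution from the slide: ((lower, upper), frequency).
CLASSES = [
    ((420, 439), 8), ((440, 459), 17), ((460, 479), 12), ((480, 499), 8),
    ((500, 519), 7), ((520, 539), 4), ((540, 559), 2), ((560, 579), 4),
    ((580, 599), 2), ((600, 619), 6),
]

freqs = [f for _, f in CLASSES]
midpoints = [(lo + hi) / 2 for (lo, hi), _ in CLASSES]  # M_i

n = sum(freqs)
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))

# Sample mean for grouped data: sum(f_i * M_i) / n
grouped_mean = sum_fm / n

# Sample variance for grouped data: sum(f_i * (M_i - mean)^2) / (n - 1)
sum_sq = sum(f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints))
grouped_var = sum_sq / (n - 1)
grouped_sd = math.sqrt(grouped_var)

print(n, sum_fm)               # 70 34525.0
print(round(grouped_mean, 2))  # 493.21
print(round(grouped_var, 2))   # 3017.89
print(round(grouped_sd, 2))    # 54.94
```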


End of Chapter 3, Part B
