Slides Prepared by JOHN S. LOUCKS - Cameron Universitysyeda/orgl333/ch3b.pdf · Chebyshev’s...


Transcript of Slides Prepared by JOHN S. LOUCKS - Cameron Universitysyeda/orgl333/ch3b.pdf · Chebyshev’s...

Page 1: Slides Prepared by JOHN S. LOUCKS, St. Edward's University

Slide 1 © 2006 Thomson/South-Western

Page 2: Chapter 3, Descriptive Statistics: Numerical Measures

Part B
• Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Exploratory Data Analysis
• Measures of Association Between Two Variables
• The Weighted Mean and Working with Grouped Data

Page 3: Measures of Distribution Shape, Relative Location, and Detecting Outliers

• Distribution Shape
• z-Scores
• Chebyshev's Theorem
• Empirical Rule
• Detecting Outliers

Page 4: Distribution Shape: Skewness

An important measure of the shape of a distribution is called skewness.

The formula for computing skewness for a data set is somewhat complex.

Skewness can be easily computed using statistical software.

Page 5: Distribution Shape: Skewness

Symmetric (not skewed)
• Skewness is zero.
• Mean and median are equal.

[Histogram: relative frequency, axis from 0 to .35; Skewness = 0]

Page 6: Distribution Shape: Skewness

Moderately Skewed Left
• Skewness is negative.
• Mean will usually be less than the median.

[Histogram: relative frequency, axis from 0 to .35; Skewness = -.31]

Page 7: Distribution Shape: Skewness

Moderately Skewed Right
• Skewness is positive.
• Mean will usually be more than the median.

[Histogram: relative frequency, axis from 0 to .35; Skewness = .31]

Page 8: Distribution Shape: Skewness

Highly Skewed Right
• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.

[Histogram: relative frequency, axis from 0 to .35; Skewness = 1.25]

Page 9: Distribution Shape: Skewness

Example: Apartment Rents

Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed in ascending order on the next slide.

Page 10: Distribution Shape: Skewness

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Page 11: Distribution Shape: Skewness

[Histogram of the apartment rents: relative frequency, axis from 0 to .35; Skewness = .92]
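The slides report the skewness of the rent data without showing the formula. As a rough sketch, the value can be reproduced in Python under the assumption that the adjusted sample skewness formula, $\frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$, was used; the slides do not say which formula produced .92, and the names `RENTS` and `sample_skewness` are illustrative only.

```python
from statistics import mean, stdev

# Monthly rents for the 70 sampled apartments, as listed on the slides.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_skewness(data):
    """Adjusted sample skewness (assumed formula, not stated on the slides)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation, n - 1 in the denominator
    cubed = sum(((x - x_bar) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed


skew = sample_skewness(RENTS)
print(f"skewness = {skew:.2f}")
```

Under that assumption the result should land close to the .92 shown on the slide, and its positive sign matches the right-skewed histogram.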

Page 12: z-Scores

The z-score is often called the standardized value.

It denotes the number of standard deviations a data value $x_i$ is from the mean.

$$z_i = \frac{x_i - \bar{x}}{s}$$

Page 13: z-Scores

• A data value less than the sample mean will have a z-score less than zero.
• A data value greater than the sample mean will have a z-score greater than zero.
• A data value equal to the sample mean will have a z-score of zero.

An observation's z-score is a measure of the relative location of the observation in a data set.

Page 14: z-Scores

z-Score of Smallest Value (425)

$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
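The standardized values above follow directly from $z_i = (x_i - \bar{x})/s$. A minimal Python sketch, assuming the sample standard deviation (n - 1 denominator) and using the illustrative name `RENTS` for the slide data:

```python
from statistics import mean, stdev

# Monthly rents for the 70 sampled apartments, as listed on the slides.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = mean(RENTS)   # sample mean
s = stdev(RENTS)      # sample standard deviation

# One z-score per observation, in the same order as the data.
z_scores = [(x - x_bar) / s for x in RENTS]

print(f"mean = {x_bar:.2f}, s = {s:.2f}")
print(f"z-score of {RENTS[0]} = {z_scores[0]:.2f}")
```

The printed mean and standard deviation can be compared with the 490.80 and 54.74 used on the slide.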

Page 15: Chebyshev's Theorem

At least $(1 - 1/z^2)$ of the items in any data set will be within $z$ standard deviations of the mean, where $z$ is any value greater than 1.

Page 16: Chebyshev's Theorem

• At least 75% of the data values must be within z = 2 standard deviations of the mean.
• At least 89% of the data values must be within z = 3 standard deviations of the mean.
• At least 94% of the data values must be within z = 4 standard deviations of the mean.

Page 17: Chebyshev's Theorem

For example:

Let $z = 1.5$ with $\bar{x} = 490.80$ and $s = 54.74$.

At least $(1 - 1/(1.5)^2) = 1 - 0.44 = 0.56$ or 56% of the rent values must be between

$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$

and

$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$

(Actually, 86% of the rent values are between 409 and 573.)
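The bound and the actual share can be checked side by side. The sketch below is illustrative: `chebyshev_bound` and `RENTS` are names introduced here, and the mean and standard deviation are the rounded values quoted on the slide.

```python
# Monthly rents for the 70 sampled apartments, as listed on the slides.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def chebyshev_bound(z):
    """Minimum share of values within z standard deviations (requires z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem applies only for z > 1")
    return 1 - 1 / z ** 2


x_bar, s, z = 490.80, 54.74, 1.5   # values quoted on the slide
lower, upper = x_bar - z * s, x_bar + z * s

# Count the rents that actually fall inside the interval.
within = sum(lower <= x <= upper for x in RENTS)
share = within / len(RENTS)

print(f"guaranteed share: {chebyshev_bound(z):.2f}")
print(f"interval: {lower:.0f} to {upper:.0f}")
print(f"actual share: {share:.2f}")
```

The actual share is well above the guaranteed minimum because Chebyshev's theorem makes no assumption about the shape of the distribution.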

Page 18: Empirical Rule

For data having a bell-shaped distribution:

• 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
• 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
• 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.

Page 19: Empirical Rule

[Figure: normal curve over x, marked at μ - 3σ, μ - 2σ, μ - 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ; 68.26% of the area lies within ±1σ, 95.44% within ±2σ, 99.72% within ±3σ]
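For a normal random variable, the share within $k$ standard deviations of the mean equals $\operatorname{erf}(k/\sqrt{2})$. The following sketch evaluates it with Python's standard `math.erf`; the function name `normal_share_within` is illustrative.

```python
import math


def normal_share_within(k):
    """Share of a normal random variable within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/- {k} sd: {100 * normal_share_within(k):.2f}%")
```

Computed this way the shares are 68.27%, 95.45%, and 99.73%. The slide's figures differ in the last digit, presumably because they were read from a normal table rounded to four decimals.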

Page 20: Detecting Outliers

An outlier is an unusually small or unusually large value in a data set.

A data value with a z-score less than -3 or greater than +3 might be considered an outlier.

It might be:
• an incorrectly recorded data value
• a data value that was incorrectly included in the data set
• a correctly recorded data value that belongs in the data set

Page 21: Detecting Outliers

The most extreme z-scores are -1.20 and 2.27.

Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
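The |z| > 3 screen is a one-line filter once the z-scores are available. A minimal sketch, with `RENTS` and `outliers` as illustrative names and the threshold of 3 taken from the slide:

```python
from statistics import mean, stdev

# Monthly rents for the 70 sampled apartments, as listed on the slides.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar, s = mean(RENTS), stdev(RENTS)
z_scores = [(x - x_bar) / s for x in RENTS]

# Keep only the data values whose z-score is beyond +/- 3.
outliers = [x for x, z in zip(RENTS, z_scores) if abs(z) > 3]

print(f"most extreme z-scores: {min(z_scores):.2f} and {max(z_scores):.2f}")
print("outliers:", outliers)
```

An empty list here agrees with the slide's conclusion that the rent data contain no outliers by this criterion.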

Page 22: Exploratory Data Analysis

• Five-Number Summary
• Box Plot

Page 23: Five-Number Summary

1 Smallest Value
2 First Quartile
3 Median
4 Third Quartile
5 Largest Value

Page 24: Five-Number Summary

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
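Quartile conventions differ between textbooks and software, so library defaults may not reproduce these numbers. The sketch below assumes the index rule $i = (p/100)n$: if $i$ is not an integer, round up and take the value in that position; if it is an integer, average the values in positions $i$ and $i + 1$. That rule is an assumption (this part of the deck does not state it), and the `percentile` helper is illustrative.

```python
import math

# Monthly rents for the 70 sampled apartments, as listed on the slides.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(sorted_data, p):
    """p-th percentile (0 < p < 100) by the index rule i = (p/100) * n."""
    n = len(sorted_data)
    i = p * n / 100
    if i != int(i):
        # Not an integer: round up and take the value in that 1-based position.
        return sorted_data[math.ceil(i) - 1]
    # Integer: average the values in 1-based positions i and i + 1.
    i = int(i)
    return (sorted_data[i - 1] + sorted_data[i]) / 2


data = sorted(RENTS)
five = (data[0], percentile(data, 25), percentile(data, 50),
        percentile(data, 75), data[-1])
print(five)
```

With this rule the quartiles come out as 445, 475, and 525, matching the slide.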

Page 25: Box Plot

A box is drawn with its ends located at the first and third quartiles.

A vertical line is drawn in the box at the location of the median (second quartile).

[Box plot: axis from 375 to 625 in steps of 25; box from Q1 = 445 to Q3 = 525 with a line at Q2 = 475]

Page 26: Box Plot (continued)

• Limits are located (not drawn) using the interquartile range (IQR).
• Data outside these limits are considered outliers.
• The location of each outlier is shown with the symbol *.

Page 27: Box Plot

The interquartile range is IQR = Q3 - Q1 = 525 - 445 = 80.

The lower limit is located 1.5(IQR) below Q1.

Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

The upper limit is located 1.5(IQR) above Q3.

Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
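The limits follow from the quartiles alone. A short sketch using Q1 = 445 and Q3 = 525 and the smallest and largest rents (425 and 615) from the slides; the variable names are illustrative:

```python
q1, q3 = 445, 525            # first and third quartiles from the slides
smallest, largest = 425, 615  # extreme rents from the slides

iqr = q3 - q1                 # interquartile range
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# A data set has outliers by this rule only if some value lies outside the limits.
has_outliers = smallest < lower_limit or largest > upper_limit

print(f"IQR = {iqr}")
print(f"limits: {lower_limit} to {upper_limit}")
print(f"outliers present: {has_outliers}")
```

Since IQR = 525 - 445 = 80, the limits evaluate to 325 and 645, and both extreme rents fall inside them.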

Page 28: Box Plot

Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

[Box plot: axis from 375 to 625 in steps of 25; smallest value inside limits = 425, largest value inside limits = 615]

Page 29: Measures of Association Between Two Variables

• Covariance
• Correlation Coefficient

Page 30: Covariance

The covariance is a measure of the linear association between two variables.

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.

Page 31: Covariance

The covariance is computed as follows:

For samples:

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

For populations:

$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$

Page 32: Correlation Coefficient

The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear relationship.

Values near +1 indicate a strong positive linear relationship.

Page 33: Correlation Coefficient

The correlation coefficient is computed as follows:

For samples:

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

For populations:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

Page 34: Correlation Coefficient

Correlation is a measure of linear association and not necessarily causation.

Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

Page 35: Covariance and Correlation Coefficient

A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving      Average
Distance (yds.)      18-Hole Score
    277.6                69
    259.5                71
    269.1                70
    267.0                70
    255.6                71
    272.9                69

Page 36: Covariance and Correlation Coefficient

               x        y     (x_i - x̄)   (y_i - ȳ)   (x_i - x̄)(y_i - ȳ)
             277.6      69      10.65       -1.0           -10.65
             259.5      71      -7.45        1.0            -7.45
             269.1      70       2.15        0               0
             267.0      70       0.05        0               0
             255.6      71     -11.35        1.0           -11.35
             272.9      69       5.95       -1.0            -5.95
Average      267.0    70.0                       Total     -35.40
Std. Dev.   8.2192   .8944

Page 37: Covariance and Correlation Coefficient

Sample Covariance

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$$

Sample Correlation Coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$$
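The golf example can be recomputed from the raw pairs. This is a sketch written out by hand rather than with a library routine, so that each step mirrors the sample formulas; the names `distance`, `score`, `s_xy`, and `r_xy` are illustrative.

```python
from statistics import mean, stdev

# Golf data from the slides: average driving distance (yds.) and 18-hole score.
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
score = [69, 71, 70, 70, 71, 69]

n = len(distance)
x_bar, y_bar = mean(distance), mean(score)

# Sample covariance: sum of cross-products of deviations over n - 1.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, score)) / (n - 1)

# Sample correlation: covariance scaled by both sample standard deviations.
r_xy = s_xy / (stdev(distance) * stdev(score))

print(f"s_xy = {s_xy:.2f}")
print(f"r_xy = {r_xy:.4f}")
```

A correlation this close to -1 indicates a strong negative linear relationship in this sample: longer average drives go with lower scores.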

Page 38: The Weighted Mean and Working with Grouped Data

• Weighted Mean
• Mean for Grouped Data
• Variance for Grouped Data
• Standard Deviation for Grouped Data

Page 39: Weighted Mean

When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.

In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.

When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

Page 40: Weighted Mean

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$

where:
$x_i$ = value of observation $i$
$w_i$ = weight for observation $i$
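As a small illustration of the formula, consider a GPA calculation. The grade points and credit hours below are hypothetical values invented for this sketch; they do not come from the slides.

```python
# Hypothetical transcript: grade points earned and credit hours per course.
grade_points = [4.0, 3.0, 2.0, 4.0]   # x_i, the observed values
credit_hours = [3, 4, 3, 2]           # w_i, the weights


def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


gpa = weighted_mean(grade_points, credit_hours)
print(f"GPA = {gpa:.2f}")
```

Here the weighted sum is 38 over 12 credit hours, so courses with more credit hours pull the average toward their grade.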

Page 41: Grouped Data

• The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data.
• To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
• We compute a weighted mean of the class midpoints using the class frequencies as weights.
• Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

Page 42: Mean for Grouped Data

Sample Data

$$\bar{x} = \frac{\sum f_i M_i}{n}$$

Population Data

$$\mu = \frac{\sum f_i M_i}{N}$$

where:
$f_i$ = frequency of class $i$
$M_i$ = midpoint of class $i$

Page 43: Sample Mean for Grouped Data

Given below is the previous sample of monthly rents for 70 efficiency apartments, presented here as grouped data in the form of a frequency distribution.

Rent ($)    Frequency
420-439         8
440-459        17
460-479        12
480-499         8
500-519         7
520-539         4
540-559         2
560-579         4
580-599         2
600-619         6

Page 44: Sample Mean for Grouped Data

Rent ($)    f_i     M_i      f_i M_i
420-439       8    429.5     3436.0
440-459      17    449.5     7641.5
460-479      12    469.5     5634.0
480-499       8    489.5     3916.0
500-519       7    509.5     3566.5
520-539       4    529.5     2118.0
540-559       2    549.5     1099.0
560-579       4    569.5     2278.0
580-599       2    589.5     1179.0
600-619       6    609.5     3657.0
Total        70             34525.0

$$\bar{x} = \frac{34{,}525}{70} = 493.21$$

This approximation differs by $2.41 from the actual sample mean of $490.80.
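The grouped-mean table can be rebuilt from the class limits and frequencies alone. In this sketch each class midpoint is taken as the average of its lower and upper limits, which reproduces the $M_i$ column; `CLASSES` and the other names are illustrative.

```python
# (lower limit, upper limit, frequency) for each rent class on the slides.
CLASSES = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]

n = sum(f for _, _, f in CLASSES)

# Weighted sum of class midpoints, using class frequencies as weights.
total = sum(f * (lo + hi) / 2 for lo, hi, f in CLASSES)

grouped_mean = total / n
print(f"n = {n}, sum of f*M = {total}, grouped mean = {grouped_mean:.2f}")
```

The result is an approximation because every rent in a class is treated as if it equalled the class midpoint.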

Page 45: Variance for Grouped Data

For sample data

$$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$$

For population data

$$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$$

Page 46: Sample Variance for Grouped Data

Rent ($)    f_i     M_i     M_i - x̄    (M_i - x̄)^2    f_i(M_i - x̄)^2
420-439       8    429.5     -63.7       4058.96         32471.71
440-459      17    449.5     -43.7       1910.56         32479.59
460-479      12    469.5     -23.7        562.16          6745.97
480-499       8    489.5      -3.7         13.76           110.11
500-519       7    509.5      16.3        265.36          1857.55
520-539       4    529.5      36.3       1316.96          5267.86
540-559       2    549.5      56.3       3168.56          6337.13
560-579       4    569.5      76.3       5820.16         23280.66
580-599       2    589.5      96.3       9271.76         18543.53
600-619       6    609.5     116.3      13523.36         81140.18
Total        70                                         208234.29

continued

Page 47: Sample Variance for Grouped Data

Sample Variance

$$s^2 = \frac{208{,}234.29}{70 - 1} = 3{,}017.89$$

Sample Standard Deviation

$$s = \sqrt{3{,}017.89} = 54.94$$

This approximation differs by only $.20 from the actual standard deviation of $54.74.
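The same frequency table yields the grouped variance and standard deviation. A minimal sketch, again assuming midpoints are the averages of the class limits and using illustrative names:

```python
import math

# (lower limit, upper limit, frequency) for each rent class on the slides.
CLASSES = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]

n = sum(f for _, _, f in CLASSES)
midpoints = [(lo + hi) / 2 for lo, hi, _ in CLASSES]
freqs = [f for _, _, f in CLASSES]

# Grouped mean: frequency-weighted mean of the class midpoints.
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Frequency-weighted sum of squared deviations of midpoints from the mean.
sum_sq = sum(f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints))

variance = sum_sq / (n - 1)      # sample variance for grouped data
std_dev = math.sqrt(variance)    # sample standard deviation

print(f"sum of f(M - mean)^2 = {sum_sq:.2f}")
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}")
```

Because the code carries the unrounded mean through the calculation, individual terms can differ slightly from products of the rounded table entries, but the totals agree with the slide to two decimals.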

Page 48: End of Chapter 3, Part B