The Variance and Standard Deviation


Page 1: The Variance and Standard Deviation

The Variance and Standard Deviation

The most important measure of variability is based on deviations of individual observations about the central value. For this purpose the mean usually serves as the center.

Page 2: The Variance and Standard Deviation

MEASURES OF VARIABILITY

• Variance
  – Population variance
  – Sample variance

• Standard Deviation
  – Population standard deviation
  – Sample standard deviation

• Coefficient of Variation (CV)
  – Sample CV
  – Population CV

Page 3: The Variance and Standard Deviation

MEASURES OF VARIABILITY: POPULATION VARIANCE

• The population variance is the mean squared deviation from the population mean:

$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$$

• Where $\sigma^2$ stands for the population variance, $\mu$ is the population mean, $N$ is the total number of values in the population, $x_i$ is the value of the i-th observation, and $\Sigma$ represents a summation.
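A minimal Python sketch of the population-variance formula, assuming the whole population is held in a list. The function name `population_variance` and the eight-value data set are hypothetical, chosen only for illustration; the standard library's `statistics.pvariance` serves as a cross-check.

```python
from statistics import pvariance


def population_variance(values):
    """Mean squared deviation from the mean, dividing by N (population formula)."""
    n = len(values)
    if n == 0:
        raise ValueError("population_variance requires at least one value")
    mu = sum(values) / n  # population mean
    return sum((x - mu) ** 2 for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical toy population
print(population_variance(data))  # 4.0
print(pvariance(data))            # standard-library cross-check, same value
```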

Page 4: The Variance and Standard Deviation

An example related to deviation about the central value

• There are five SAT scores as below:

584, 613, 622, 693, 755.

• The mean is

(584+613+622+693+755)/5 = 653.4

• The deviation for each score can be computed by subtracting the mean from the score:

755-653.4 = 101.6

Page 5: The Variance and Standard Deviation

An example related to deviation about the central value (cont.)

693-653.4 = 39.6

622-653.4 = -31.4

613-653.4 = -40.4

584-653.4 = -69.4

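These deviations may be summarized by a single collective measure that takes every deviation into account. Because deviations from the mean always sum to zero (here 101.6 + 39.6 - 31.4 - 40.4 - 69.4 = 0), they are squared before being averaged.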

Page 6: The Variance and Standard Deviation

An example related to deviation about the central value (cont.)

With the previous data, this procedure results in

$$\sigma^2 = \frac{(101.6)^2 + (-40.4)^2 + (-69.4)^2 + (39.6)^2 + (-31.4)^2}{5} = \frac{19325.2}{5} = 3865.04$$
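The arithmetic of the SAT example can be checked with a few lines of Python. Like the example, this sketch treats the five scores as the whole population and divides by N = 5; it also confirms that the raw deviations sum to zero, which is why they are squared.

```python
scores = [584, 613, 622, 693, 755]  # SAT scores from the example
N = len(scores)
mu = sum(scores) / N  # 653.4

deviations = [x - mu for x in scores]
# Deviations from the mean sum to zero (up to floating-point rounding).
print(abs(sum(deviations)) < 1e-9)  # True

sum_sq = sum(d ** 2 for d in deviations)  # sum of squared deviations
sigma2 = sum_sq / N                       # population variance
print(round(sum_sq, 2), round(sigma2, 2))  # 19325.2 3865.04
```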

Page 7: The Variance and Standard Deviation

Population Variance

• In practice, the population variance usually cannot be computed directly because the entire population is not ordinarily observed.

• An analogous measure of variability may be determined from sample data.

• This is referred to as the sample variance.

Page 8: The Variance and Standard Deviation

MEASURES OF VARIABILITY: SAMPLE VARIANCE

• The sample variance is defined as follows:

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$

• Where $s^2$ stands for the sample variance, $\bar{x}$ is the sample mean, $n$ is the total number of values in the sample, $x_i$ is the value of the i-th observation, and $\Sigma$ represents a summation.
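A sketch of the sample-variance definition in Python. The function name `sample_variance` is hypothetical; the standard library's `statistics.variance` also divides by n - 1 and is used as a cross-check. The data are the advertising figures that appear later in these slides.

```python
from statistics import variance


def sample_variance(values):
    """Sum of squared deviations from the sample mean divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_variance requires at least two values")
    xbar = sum(values) / n  # sample mean
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # advertising data (000 $)
print(round(sample_variance(advertising), 6))  # 0.363
print(round(variance(advertising), 6))         # 0.363
```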

Page 9: The Variance and Standard Deviation

MEASURES OF VARIABILITY: SAMPLE VARIANCE

• Notice that the sample variance is defined as the sum of the squared deviations divided by n-1.

• Sample variance is computed to estimate the population variance.

• An unbiased estimate of the population variance may be obtained by defining the sample variance as the sum of the squared deviations divided by n-1 rather than by n.

• Defining sample variance as the mean squared deviation from the sample mean tends to underestimate the population variance.
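The claim that dividing by n underestimates the population variance can be checked exactly on a tiny hypothetical population. Assuming samples of size 2 are drawn with replacement (so every ordered pair is equally likely), the sketch below averages both estimators over all nine possible samples, using `fractions.Fraction` to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N  # 2/3

n = 2  # sample size
divide_by_n = []
divide_by_n_minus_1 = []
# Every ordered sample of size n drawn with replacement is equally likely.
for sample in product(population, repeat=n):
    xbar = Fraction(sum(sample), n)
    ss = sum((Fraction(x) - xbar) ** 2 for x in sample)  # sum of squared deviations
    divide_by_n.append(ss / n)
    divide_by_n_minus_1.append(ss / (n - 1))

# Averaging over all equally likely samples gives each estimator's expected value.
print(sigma2)                                               # 2/3
print(sum(divide_by_n) / len(divide_by_n))                  # 1/3 (too small)
print(sum(divide_by_n_minus_1) / len(divide_by_n_minus_1))  # 2/3 (matches)
```

The divide-by-n average equals σ²(n - 1)/n, so dividing by n - 1 instead removes the bias.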

Page 10: The Variance and Standard Deviation

MEASURES OF VARIABILITY: SAMPLE VARIANCE

• A shortcut formula for the sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$

• Where $s^2$ is the sample variance, $n$ is the total number of values in the sample, $x_i$ is the value of the i-th observation, and $\Sigma$ represents a summation.
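A sketch comparing the shortcut formula with the definition, using the sales data that appears later in these slides. Both function names are hypothetical.

```python
def sample_variance_definition(values):
    n = len(values)
    xbar = sum(values) / n  # sample mean
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    # s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


sales = [264, 116, 165, 101, 209]  # sales data (000 units)
print(sample_variance_definition(sales))  # 4513.5
print(sample_variance_shortcut(sales))    # 4513.5
```

Both forms agree here. With floating-point data whose values are large relative to their spread, the shortcut can lose precision because it subtracts two nearly equal large numbers, which is why software usually works from deviations.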

Page 11: The Variance and Standard Deviation

MEASURES OF VARIABILITY: POPULATION/SAMPLE STANDARD DEVIATION

• The standard deviation is the positive square root of the variance:

Population standard deviation:

$$\sigma = \sqrt{\sigma^2}$$

Sample standard deviation:

$$s = \sqrt{s^2}$$

• Compute the standard deviations of advertising and sales.

Page 12: The Variance and Standard Deviation

MEASURES OF VARIABILITY: POPULATION/SAMPLE STANDARD DEVIATION

• Compute the sample standard deviation of advertising data: 2.5, 1.3, 1.4, 1.0 and 2.0

• Compute the sample standard deviation of sales data: 264, 116, 165, 101 and 209
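One way to work this exercise in Python: the sketch computes each sample standard deviation as the positive square root of the sample variance and cross-checks it against `statistics.stdev`.

```python
from math import sqrt
from statistics import stdev

advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # 000 $
sales = [264, 116, 165, 101, 209]        # 000 units

results = {}
for name, data in [("advertising", advertising), ("sales", sales)]:
    n = len(data)
    mean = sum(data) / n
    s2 = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance s^2
    results[name] = sqrt(s2)  # sample standard deviation: positive root of s^2
    print(name, round(results[name], 6), round(stdev(data), 6))
```

The printed values (about 0.602495 for advertising and 67.182587 for sales) agree with the SD row of the summary table in the covariance example.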

Page 13: The Variance and Standard Deviation

MEASURES OF VARIABILITY: POPULATION/SAMPLE CV

• The coefficient of variation is the standard deviation divided by the mean:

Population coefficient of variation:

$$CV = \frac{\sigma}{\mu}$$

Sample coefficient of variation:

$$cv = \frac{s}{\bar{x}}$$

Page 14: The Variance and Standard Deviation

MEASURES OF VARIABILITY: POPULATION/SAMPLE CV

• Compute the sample coefficient of variation of advertising data: 2.5, 1.3, 1.4, 1.0 and 2.0

• Compute the sample coefficient of variation of sales data: 264, 116, 165, 101 and 209
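A sketch of the CV exercise using `statistics.mean` and `statistics.stdev`; the helper name `sample_cv` is hypothetical. Because the CV is unit-free, it lets the variability of advertising (in 000 $) and sales (in 000 units) be compared directly.

```python
from statistics import mean, stdev

advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # 000 $
sales = [264, 116, 165, 101, 209]        # 000 units


def sample_cv(data):
    """Sample coefficient of variation: s divided by the sample mean (unit-free)."""
    return stdev(data) / mean(data)


cv_adv = sample_cv(advertising)
cv_sales = sample_cv(sales)
print(round(cv_adv, 4), round(cv_sales, 4))  # 0.3674 0.3929
```

Relative to its mean, sales is slightly more variable than advertising. The CV is undefined when the mean is zero and hard to interpret when the mean is negative.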

Page 15: The Variance and Standard Deviation

MEASURES OF ASSOCIATION

• A scatter diagram provides a graphical description of whether a relationship is positive or negative, linear or non-linear.

• Numerical descriptions of the direction and strength of a linear relationship are obtained by:
  – Covariance
    • Population covariance
    • Sample covariance
  – Coefficient of correlation
    • Population coefficient of correlation
    • Sample coefficient of correlation

Page 16: The Variance and Standard Deviation

MEASURES OF ASSOCIATION: EXAMPLE

• A sample of monthly advertising and sales data was collected and is shown below:

Month   Sales (000 units)   Advertising (000 $)
1       264                 2.5
2       116                 1.3
3       165                 1.4
4       101                 1.0
5       209                 2.0

• What is the relationship between sales and advertising? Is it linear or non-linear, positive or negative?

Page 17: The Variance and Standard Deviation

POPULATION COVARIANCE

• The population covariance is the mean of the products of the deviations from the population means:

$$COV(X,Y) = \frac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{N}$$

• Where $COV(X,Y)$ is the population covariance, $\mu_x$ and $\mu_y$ are the population means of X and Y respectively, $N$ is the total number of values in the population, $x_i$ and $y_i$ are the values of the i-th observations of X and Y respectively, and $\Sigma$ represents a summation.

Page 18: The Variance and Standard Deviation

SAMPLE COVARIANCE

• The sample covariance is the sum of the products of the deviations from the sample means, divided by n - 1:

$$cov(X,Y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

• Where $cov(X,Y)$ is the sample covariance, $\bar{x}$ and $\bar{y}$ are the sample means of X and Y respectively, $n$ is the total number of values in the sample, $x_i$ and $y_i$ are the values of the i-th observations of X and Y respectively, and $\Sigma$ represents a summation.
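A sketch covering both covariance formulas with one Python function. The `ddof` ("delta degrees of freedom") argument name is borrowed from NumPy's convention but is only a hypothetical parameter of this helper: `ddof=0` divides by N (population formula) and `ddof=1` divides by n - 1 (sample formula). The population result for the advertising and sales data is shown only for comparison, since those five months are a sample.

```python
def covariance(xs, ys, ddof=1):
    """Covariance of paired data.

    ddof=0 divides by N (population formula); ddof=1 divides by n - 1 (sample formula).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Sum of products of deviations from the respective means
    total = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return total / (n - ddof)


advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # 000 $
sales = [264, 116, 165, 101, 209]        # 000 units
print(round(covariance(advertising, sales), 2))          # 39.65 (sample formula)
print(round(covariance(advertising, sales, ddof=0), 2))  # 31.72 (population formula)
```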

Page 19: The Variance and Standard Deviation

SAMPLE COVARIANCE

Month   Advertising (000 $)   Sales (000 units)
1       2.5                   264
2       1.3                   116
3       1.4                   165
4       1.0                   101
5       2.0                   209
Mean    1.64                  171
SD      0.602495              67.18258703

Total of the products of deviations = 158.6, so cov = 158.6/4 = 39.65

Page 20: The Variance and Standard Deviation

POPULATION/SAMPLE COVARIANCE

• If two variables tend to increase and decrease together, the covariance is a large positive number and the relationship is called positive.

• If one variable tends to decrease when the other increases, and vice versa, the covariance is a large negative number and the relationship is called negative.

• If two variables are unrelated, the covariance tends to be close to zero.

• How large is large? How small is small?

Page 21: The Variance and Standard Deviation

POPULATION/SAMPLE COVARIANCE

• How large is large? How small is small? A drawback of covariance is that its value depends on the units in which X and Y are measured, so there is no general guideline for how large a covariance must be to indicate a strong relationship, or how small to indicate no relationship.

• The coefficient of correlation overcomes this drawback to a certain extent: it does not depend on the units of measurement and always lies between -1 and +1.
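The unit dependence of covariance is easy to demonstrate: re-expressing sales in units rather than thousands of units multiplies the covariance by 1000 but leaves the correlation unchanged. The helper names below are hypothetical and implement the sample formulas from these slides.

```python
from math import sqrt


def sample_cov(xs, ys):
    """Sample covariance: sum of products of deviations divided by n - 1."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)


def sample_r(xs, ys):
    """Sample correlation cov(X, Y) / (s_x * s_y), using cov(X, X) = s_x^2."""
    return sample_cov(xs, ys) / sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


advertising = [2.5, 1.3, 1.4, 1.0, 2.0]            # 000 $
sales_thousands = [264, 116, 165, 101, 209]        # 000 units
sales_units = [1000 * y for y in sales_thousands]  # same sales, in units

print(round(sample_cov(advertising, sales_thousands), 2))  # 39.65
print(round(sample_cov(advertising, sales_units), 2))      # 39650.0
print(round(sample_r(advertising, sales_thousands), 4))    # 0.9796
print(round(sample_r(advertising, sales_units), 4))        # 0.9796
```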

Page 22: The Variance and Standard Deviation

POPULATION COEFFICIENT OF CORRELATION

• The population coefficient of correlation is the population covariance divided by the product of the population standard deviations of X and Y:

$$\rho = \frac{COV(X,Y)}{\sigma_x \sigma_y}$$

• Where $\rho$ is the population coefficient of correlation, $COV(X,Y)$ is the population covariance, and $\sigma_x$, $\sigma_y$ are the population standard deviations of X and Y respectively.

Page 23: The Variance and Standard Deviation

SAMPLE COEFFICIENT OF CORRELATION

• The sample coefficient of correlation is the sample covariance divided by the product of the sample standard deviations of X and Y:

$$r = \frac{cov(X,Y)}{s_x s_y}$$

• Where $r$ is the sample coefficient of correlation, $cov(X,Y)$ is the sample covariance, and $s_x$, $s_y$ are the sample standard deviations of X and Y respectively.

Page 24: The Variance and Standard Deviation

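SAMPLE COEFFICIENT OF CORRELATION

Month   Advertising (000 $)   Sales (000 units)
1       2.5                   264
2       1.3                   116
3       1.4                   165
4       1.0                   101
5       2.0                   209
Mean    1.64                  171
SD      0.602495              67.18258703

Total of the products of deviations = 158.6, so cov = 158.6/4 = 39.65

r = 39.65 / (0.602495 × 67.18258703) ≈ 0.9796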

Page 25: The Variance and Standard Deviation

RELATIVE STANDING: BOX PLOTS

• When the data set contains a small number of values, a box plot is used to graphically represent the data set. These plots involve five values:
  – the minimum value (S)
  – the lower quartile (Q1)
  – the median (Q2)
  – the upper quartile (Q3)
  – and the maximum value (L)

Page 26: The Variance and Standard Deviation

RELATIVE STANDING: BOX PLOTS – EXAMPLE

• Example: Construct a box plot from the following data, which show the assets of the 15 largest North American banks, rounded to the nearest hundred million dollars: 111, 135, 217, 108, 51, 98, 65, 85, 75, 75, 93, 64, 57, 56, 98

Page 27: The Variance and Standard Deviation

RELATIVE STANDING: BOX PLOTS – RANKING AND SUMMARIZING

Ranked data (rank in parentheses): 217 (1), 135 (2), 111 (3), 108 (4), 98 (5), 98 (6), 93 (7), 85 (8), 75 (9), 75 (10), 65 (11), 64 (12), 57 (13), 56 (14), 51 (15)

Smallest = 51
Q1 = 64
Median = 85
Q3 = 108
Largest = 217
IQR = 44
Outliers = 217
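A sketch of the ranking-and-summarizing step in Python. Quartile conventions differ between textbooks and software; `statistics.quantiles` with `method="exclusive"` happens to reproduce the slide's Q1 = 64 and Q3 = 108 for these 15 values. The 1.5 × IQR outlier rule is an assumption (a common convention), since the slide does not state which rule it uses.

```python
from statistics import quantiles

assets = [111, 135, 217, 108, 51, 98, 65, 85, 75, 75, 93, 64, 57, 56, 98]

data = sorted(assets)
# method="exclusive" (the default) places Q1 and Q3 at ranks (n + 1)/4 and 3(n + 1)/4,
# which for n = 15 are the 4th and 12th smallest values.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

# Assumed outlier rule: values more than 1.5 * IQR beyond the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(min(data), q1, q2, q3, max(data))  # 51 64.0 85.0 108.0 217
print(iqr, outliers)                     # 44.0 [217]
```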

Page 28: The Variance and Standard Deviation

Box Plot

[Figure: box plot of the bank assets; horizontal axis: Assets (in 100 million dollars)]

Page 29: The Variance and Standard Deviation

RELATIVE STANDING: BOX PLOTS – INTERPRETATION

• If the median is near the center of the box, the distribution is approximately symmetric.

• If the median falls to the left of the center of the box, the distribution is positively skewed.

• If the median falls to the right of the center of the box, the distribution is negatively skewed.

• If the lines (whiskers) on either side of the box are about the same length, the distribution is approximately symmetric.

• If the line segment to the right of the box is longer than the one to the left, the distribution is positively skewed.

• If the line segment to the left of the box is longer than the one to the right, the distribution is negatively skewed.
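The reading rules above can be applied mechanically to a five-number summary. In this sketch the helper `describe_box_plot` and its `tol` threshold are hypothetical: "near the center" and "about the same length" are taken to mean within 10% of the box width or of the longer whisker. Whiskers are assumed to extend to the smallest and largest values, matching the five values listed for box plots.

```python
def describe_box_plot(smallest, q1, median, q3, largest, tol=0.10):
    """Apply the box-plot reading rules to a five-number summary.

    tol is a hypothetical threshold for "near the center" / "about the same length".
    Whiskers are assumed to run from the box to the smallest and largest values.
    """
    width = q3 - q1
    center = (q1 + q3) / 2
    if abs(median - center) <= tol * width:
        box = "median near center of box: approximately symmetric"
    elif median < center:
        box = "median left of center: positively skewed"
    else:
        box = "median right of center: negatively skewed"

    left, right = q1 - smallest, largest - q3
    if abs(right - left) <= tol * max(left, right):
        whiskers = "whiskers about equal: approximately symmetric"
    elif right > left:
        whiskers = "right whisker longer: positively skewed"
    else:
        whiskers = "left whisker longer: negatively skewed"
    return box, whiskers


# Five-number summary of the bank-assets example (in 100 million dollars)
for line in describe_box_plot(51, 64, 85, 108, 217):
    print(line)
```

For the bank assets the median sits almost at the middle of the box, but the right whisker (109) is far longer than the left (13), pointing to positive skew driven largely by the value 217.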

Page 30: The Variance and Standard Deviation

SYMMETRIC BOX PLOT

[Figure: symmetric box plot; horizontal axis: Number of units sold]

Page 31: The Variance and Standard Deviation

POSITIVELY SKEWED BOX PLOT

[Figure: positively skewed box plot; horizontal axis: Number of units sold]

Page 32: The Variance and Standard Deviation

Summary Statistical Measure: The Proportion

Page 33: The Variance and Standard Deviation

EXAMPLE

• Salary and expenses for cultural activities and for sports-related activities were collected from 100 households. Data for only 5 households are shown below:

How are the relationships (linear/non-linear, positive/negative) between (i) salary and culture, (ii) salary and sports, and (iii) sports and culture?

Salary and expenses data for 100 households

Salary    Culture   Sports
$54,600   $1,020    $990
$57,500   $1,100    $460
$53,300   $900      $780
$43,500   $570      $860
$57,200   $900      $1,390

Page 34: The Variance and Standard Deviation

SALARY-CULTURE

[Scatter plot: Salary (horizontal axis) vs. Expenses for Cultural Activities (vertical axis)]

cov = 1094787, r = 0.5065 (positive, linear)

Page 35: The Variance and Standard Deviation

SPORTS-CULTURE

[Scatter plot: Expenses for sports related activities (horizontal axis) vs. Expenses for cultural activities (vertical axis)]

cov = -33608, r = -0.5201 (negative, linear)

Page 36: The Variance and Standard Deviation

SALARY-SPORTS

[Scatter plot: Salary (horizontal axis) vs. Expenses for sports related activities (vertical axis)]

cov = -219026, r = -0.08122 (no linear relationship)