EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our...

47
EXPECTATION, VARIANCE ETC. - APPLICATION 1

description

3 With one data point clearly the central location is at the point itself. Measures of Central Location The measure of central location reflects the locations of all the data points. How? But if the third data point appears on the left hand-side of the midrange, it should “pull” the central location to the left. With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).

Transcript of EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our...

Page 1: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

EXPECTATION, VARIANCE ETC. - APPLICATION

1

Page 2: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

2

Measures of Central Location

• Usually, we focus our attention on two types of measures when describing population characteristics:– Central location– Variability or spread

Page 3: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

3

With one data pointclearly the central location is at the pointitself.

Measures of Central Location• The measure of central location reflects

the locations of all the data points.• How?

But if the third data point appears on the left hand-sideof the midrange, it should “pull”the central location to the left.

With two data points,the central location should fall in the middlebetween them (in order to reflect the location ofboth of them).

Page 4: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

4

Sum of the observationsNumber of observationsMean =

• This is the most popular measure of central location

The Arithmetic Mean

Page 5: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

5

nx

x in

1i

Sample mean Population mean

Nx i

N1i

Sample size Population size

nx

x in

1i

The Arithmetic Mean

Page 6: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

6

10...

101021

101 xxxx

x ii

• ExampleThe reported time on the Internet of 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time on the Internet.

0 7 22 11.0

The Arithmetic Mean

Page 7: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

7

The Arithmetic Mean

• Drawback of the mean: It can be influenced by unusual observations, because it uses all the information in the data set.

Page 8: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

8

Odd number of observations

0, 0, 5, 7, 8 9, 12, 14, 220, 0, 5, 7, 8, 9, 12, 14, 22, 330, 0, 5, 7, 8, 9, 12, 14, 22, 33

Even number of observations

ExampleFind the median of the time on the internetfor the 10 adults of previous example

• The Median of a set of observations is the value that falls in the middle when the observations are arranged in order of magnitude. It divides the data in half.

The Median

Suppose only 9 adults were sampled (exclude, say, the longest time (33))

Comment

8.5, 8

Page 9: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

The Median

• Depth of median = (n+1)/2

9

)2(2

Median)1()(

)2/)1((

knevenisnifXX

oddisnifX

kk

n

Page 10: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

10

• The Mode of a set of observations is the value that occurs most frequently.

• Set of data may have one mode (or modal class), or two or more modes.

The modal class

The Mode

Page 11: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

11

• Find the mode for the data in the Example. Here are the data again: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22

Solution

• All observation except “0” occur once. There are two “0”s. Thus, the mode is zero.

• Is this a good measure of central location?• The value “0” does not reside at the center of this set

(compare with the mean = 11.0 and the median = 8.5).

The Mode

Page 12: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

12

Relationship among Mean, Median, and Mode• If a distribution is from a bell shaped symmetrical

one, the mean, median and mode coincide

• If a distribution is asymmetrical, and skewed to the left or to the right, the three measures differ.

A positively skewed distribution(“skewed to the right”)

MeanMedian

Mode

Mean = Median = Mode

Mode < Median < Mean

Page 13: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

13

• If a distribution is non symmetrical, and skewed to the left or to the right, the three measures differ.

A positively skewed distribution(“skewed to the right”)

MeanMedian

Mode MeanMedian

Mode

A negatively skewed distribution(“skewed to the left”)

Relationship among Mean, Median, and Mode

Mean < Median < Mode

Page 14: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

14

Measures of variability

• Measures of central location fail to tell the whole story about the distribution.

• A question of interest still remains unanswered:

How much are the observations spread outaround the mean value?

Page 15: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

15

Measures of variability

Observe two hypothetical data sets:

The average value provides a good representation of theobservations in the data set.

Small variability

This data set is now changing to...

Page 16: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

16

Measures of Variability

Observe two hypothetical data sets:

The average value provides a good representation of theobservations in the data set.

Small variability

Larger variabilityThe same average value does not provide as good representation of theobservations in the data set as before.

Page 17: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

17

– The range of a set of observations is the difference between the largest and smallest observations.

– Its major advantage is the ease with which it can be computed.

– Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points.

? ? ?

But, how do all the observations spread out?

Smallestobservation

Largestobservation

The range cannot assist in answering this question

Range

The Range

Page 18: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

18

This measure reflects the dispersion of all the observations

The variance of a population of size N x1, x2,…,xN

whose mean is is defined as

The variance of a sample of n observationsx1, x2, …,xn whose mean is is defined asx

N

)x( 2i

N1i2

1n

)xx(s

2i

n1i2

The Variance

Page 19: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

19

Why not use the sum of deviations?

Consider two small populations:

1098

74 10

11 12

13 16

8-10= -2

9-10= -111-10= +1

12-10= +2

4-10 = - 6

7-10 = -3

13-10 = +3

16-10 = +6

Sum = 0

Sum = 0

The mean of both populations is 10...

…but measurements in Bare more dispersedthan those in A.

A measure of dispersion Should agrees with this observation.

Can the sum of deviationsBe a good measure of dispersion?

A

B

The sum of deviations is zero for both populations, therefore, is not a good measure of dispersion.

Page 20: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

20

Let us calculate the variance of the two populations

185

)1016()1013()1010()107()104( 222222B

25

)1012()1011()1010()109()108( 222222A

Why is the variance defined as the average squared deviation?Why not use the sum of squared deviations as a measure of variation instead?

After all, the sum of squared deviations increases in magnitude when the variationof a data set increases!!

The Variance

Page 21: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

21

Which data set has a larger dispersion?

1 3 1 32 5

A B

Data set Bis more dispersedaround the mean

Let us calculate the sum of squared deviations for both data sets

The Variance

Page 22: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

22

1 3 1 32 5

A B

SumA = (1-2)2 +…+(1-2)2 +(3-2)2 +… +(3-2)2= 10SumB = (1-3)2 + (5-3)2 = 8

SumA > SumB. This is inconsistent with the observation that set B is more dispersed.

The Variance

Page 23: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

23

1 3 1 32 5

A B

However, when calculated on “per observation” basis (variance), the data set dispersions are properly ranked.

A2 = SumA/N = 10/5 = 2

B2 = SumB/N = 8/2 = 4

The Variance

Page 24: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

24

• Example– The following sample consists of the number of

jobs six students applied for: 17, 15, 23, 7, 9, 13. Find its mean and variance

• Solution

2

2222

in

1i2

jobs2.33

)1413...()1415()1417(16

11n

)xx(s

jobs146

846

13972315176

xx i

61i

The Variance

Page 25: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

25

2

2222

2i

n1i2

i

n

1i

2

jobs2.33

613...1517

13...151716

1

n)x(

x1n

1s

The Variance – Shortcut method

Page 26: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

26

• The standard deviation of a set of observations is the square root of the variance.

2

2

:deviationandardstPopulation

ss:deviationstandardSample

Standard Deviation

Page 27: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

27

• Example– To examine the consistency of shots for a new

innovative golf club, a golfer was asked to hit 150 shots, 75 with a currently used (7-iron) club, and 75 with the new club.

– The distances were recorded. – Which club is better?

Standard Deviation

Page 28: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

28

• Example – solution

Standard Deviation

Excel printout, from the “Descriptive Statistics” sub-menu.

Current Innovation

Mean 150.5467 Mean 150.1467Standard Error 0.668815 Standard Error 0.357011Median 151 Median 150Mode 150 Mode 149Standard Deviation 5.792104 Standard Deviation 3.091808Sample Variance 33.54847 Sample Variance 9.559279Kurtosis 0.12674 Kurtosis -0.88542Skewness -0.42989 Skewness 0.177338Range 28 Range 12Minimum 134 Minimum 144Maximum 162 Maximum 156Sum 11291 Sum 11261Count 75 Count 75

The innovation club is more consistent, and because the means are close, is considered a better club

Page 29: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

29

• The coefficient of variation of a set of measurements is the standard deviation divided by the mean value.

• This coefficient provides a proportionate measure of variation.

CV : variationoft coefficien Population

xscv : variationoft coefficien Sample

A standard deviation of 10 may be perceivedlarge when the mean value is 100, but only moderately large when the mean value is 500

The Coefficient of Variation

Page 30: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

Percentiles• Example from http://www.ehow.com/how_2310404_calculate-percentiles.html

• Your test score, e.g. 70%, tells you how many questions you answered correctly. However, it doesn’t tell how well you did compared to the other people who took the same test.

• If the percentile of your score is 75, then you scored higher than 75% of other people who took the test.

30

Page 31: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

31

Sample Percentiles and Box Plots

• Percentile– The pth percentile of a set of measurements is the

value for which • p percent of the observations are less than that value• 100(1-p) percent of all the observations are greater than

that value.

Page 32: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

32

Sample Percentiles

•Find the 10 percentile of 6 8 3 6 2 8 1

•Order the data: 1 2 3 6 6 8 8

•7*(0.10) = 0.70; round up to 1

The first observation, 1, is the 10 percentile.

Page 33: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

33

• Commonly used percentiles– First (lower) quartile, Q1 = 25th percentile

– Second (middle) quartile,Q2 = 50th percentile

– Third quartile, Q3 = 75th percentile

– Fourth quartile, Q4 = 100th percentile– First (lower) decile = 10th percentile– Ninth (upper) decile = 90th percentile

Page 34: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

34

Quartiles and Variability

• Quartiles can provide an idea about the shape of a histogram

Q1 Q2 Q3

Positively skewedhistogram

Q1 Q2 Q3

Negatively skewedhistogram

Page 35: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

35

• Large value indicates a large spread of the observations

Interquartile range = Q3 – Q1

Interquartile Range

Page 36: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

36

Paired Data Sets and the Sample Correlation Coefficient

• The covariance and the coefficient of correlation are used to measure the direction and strength of the linear relationship between two variables.– Covariance - is there any pattern to the way two

variables move together? – Coefficient of correlation - how strong is the linear

relationship between two variables

Page 37: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

37

N

)y)((xY)COV(X,covariance Population yixi

x (y) is the population mean of the variable X (Y).N is the population size.

1-n)yy)(x(x

y) cov(x,covariance Sample ii

Covariance

x (y) is the sample mean of the variable X (Y).n is the sample size.

Page 38: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

38

• If the two variables move in opposite directions, (one increases when the other one decreases), the covariance is a large negative number.

• If the two variables are unrelated, the covariance will be close to zero.

• If the two variables move in the same direction, (both increase or both decrease), the covariance is a large positive number.

Covariance

Page 39: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

39

• Compare the following three sets

Covariance

xi yi (x – x) (y – y) (x – x)(y – y)

267

132027

-312

-707

21014

x=5 y =20 Cov(x,y)=17.5

xi yi (x – x) (y – y) (x – x)(y – y)

267

272013

-312

70-7

-210-14

x=5 y =20 Cov(x,y)=-17.5

xi yi

267

202713

Cov(x,y) = -3.5

x=5 y =20

Page 40: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

40

– This coefficient answers the question: How strong is the association between X and Y?

yx

)Y,X(COV

ncorrelatio oft coefficien Population

yxss)Y,Xcov(

r

ncorrelatio oft coefficien Sample

The coefficient of correlation

Page 41: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

41

COV(X,Y)=0 or r =

+1

0

-1

Strong positive linear relationship

No linear relationship

Strong negative linear relationship

or

COV(X,Y)>0

COV(X,Y)<0

The coefficient of correlation

Page 42: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

42

• If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship).

• If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship).

• No straight line relationship is indicated by a coefficient close to zero.

The Coefficient of Correlation

Page 43: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

The Coefficient of Correlation

43

Page 44: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

Correlation and causation• Recognize the difference between correlation and

causation — just because two things occur together, that does not necessarily mean that one causes the other.

• For random processes, causation means that if A occurs, that causes a change in the probability that B occurs.

44

Page 45: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

Correlation and causation• Existence of a statistical relationship, no matter how strong it

is, does not imply a cause-and-effect relationship between X and Y. for ex, let X be size of vocabulary, and Y be writing speed for a group of children. There most probably be a positive relationship but this does not imply that an increase in vocabulary causes an increase in the speed of writing. Other variables such as age, education etc will affect both X and Y.

• Even if there is a causal relationship between X and Y, it might be in the opposite direction, i.e. from Y to X. For eg, let X be thermometer reading and let Y be actual temperature. Here Y will affect X.

45

Page 46: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

ExampleDr. Leonard Eron, professor at the University of Illinois at Chicago, has

conducted a longitudinal study of the long–term effects of violent television programming. In 1960, he asked 870 third grade children their favorite television shows. He found that children judged most violent by their peers also watched the most violent television. Dr. Eron noted, however, that it was not clear which came first — the child’s behavior or the influence of television.

In follow-up interviews at ten–year intervals, Eron found that youngsters who at age eight were nonaggressive but were watching violent television were more aggressive than children who at age eight were aggressive and watched non–violent television. Eron claims that this establishes a cause–and–effect relationship between watching violent television and aggressive behavior.

Can you think of any other possible causes?

46

Page 47: EXPECTATION, VARIANCE ETC. - APPLICATION 1. 2 Measures of Central Location Usually, we focus our attention on two types of measures when describing population.

Example - solution

• It could be that the difference in aggressive behavior is due to other familial influences. Perhaps children who are permitted to watch violent programming are more likely to come from violent or abusive families, which could also lead to more aggressive behavior.

47