Statistics

63
Copyright 2001, Alan Marshall Statistics Statistics

description

Statistics. Statistics. Branch of Mathematics that deals with the collection and analysis of data Descriptive Statistics: used to analyze and describe data - PowerPoint PPT Presentation

Transcript of Statistics

Page 1: Statistics

© Copyright 2001, Alan Marshall 1

StatisticsStatistics

Page 2: Statistics

© Copyright 2001, Alan Marshall 2

StatisticsStatistics

Branch of Mathematics that deals with the collection and analysis of data

Descriptive Statistics: used to analyze and describe data

Inferential Statistics: used to use the information to make statements regarding the relationships between variables or the expectations about future events.

Page 3: Statistics

© Copyright 2001, Alan Marshall 3

Measures of Central Measures of Central TendencyTendency

Page 4: Statistics

© Copyright 2001, Alan Marshall 4

Measures of Central TendencyMeasures of Central Tendency

Arithmetic Mean Median Mode Geometric Mean

Page 5: Statistics

© Copyright 2001, Alan Marshall 5

Arithmetic MeanArithmetic Mean

Other names Average Mean

tsmeasuremen of Number

tsmeasuremen the of SumMean

Page 6: Statistics

© Copyright 2001, Alan Marshall 6

Arithmetic MeanArithmetic Mean

n

xx

Samplen

1ii

The calculation is identical, just the notation varies slightly

N

x

PopulationN

1ii

x

Page 7: Statistics

© Copyright 2001, Alan Marshall 7

Summation NotationSummation Notation

N

1tt

N

1t t xx

Notice that the first form uses less vertical space on the page

This makes accountants very happy The first can also be easier to fit into a line of text

Page 8: Statistics

© Copyright 2001, Alan Marshall 8

ExampleExample

Ten second year BBA students wrote the CSC exam last month

Their scores were:

71, 72, 88, 69, 77, 63, 91, 81, 83, 75

77

10

75838191637769887271x

Page 9: Statistics

© Copyright 2001, Alan Marshall 9

Calculating the MeanCalculating the Mean

Arithmetic mean sum the observations and divide by the number of

observations

Example: 5%, 7%, -2%, 12%, 8%

%65

%30

N

rr

N

1tt

Page 10: Statistics

© Copyright 2001, Alan Marshall 10

Problem with the Arithmetic MeanProblem with the Arithmetic Mean

Arithmetic mean is incorrect for variables that are related multiplicatively, like rates of growth, rates of return and rates of change

$1,000 at 6% for 5 years should be $1,338.23

Starting Rate of EndingValue Return Value1000.00 5% 1050.00

(1000)(1.06)5 1050.00 7% 1123.50=1338.23 1123.50 -2% 1101.03

1101.03 12% 1233.151233.15 8% 1331.81

Page 11: Statistics

© Copyright 2001, Alan Marshall 11

Geometric MeanGeometric Mean

The Geometric Mean should be used for rates of change, like rates of return

%898.51)3318059.1(

108.112.198.007.105.1

1)r1(r

2.0

5

1

N

1N

1tt

Page 12: Statistics

© Copyright 2001, Alan Marshall 12

Geometric MeanGeometric Mean

The Geometric Mean should be used for rates of change, like rates of return

%898.51)3318059.1(

108.112.198.007.105.1

1)r1(r

2.0

5

1

N

1N

1tt

Means: The product of these factors from 1 to N

Page 13: Statistics

© Copyright 2001, Alan Marshall 13

Geometric vs. Arithmetic MeanGeometric vs. Arithmetic Mean

The more variable the underlying data, the greater the error using the Arithmetic mean

The Geometric Mean is often easier to calculate: Stock prices: 1992: $20; 1999: $40, R =

10.41%

Page 14: Statistics

© Copyright 2001, Alan Marshall 14

Geometric vs. Arithmetic MeanGeometric vs. Arithmetic Mean

For analysis of past performance, use the Geometric mean The past returns have averaged 5.898%

To use the past returns to estimate the future expected return, use the Arithmetic mean The expected return is 6%

Page 15: Statistics

© Copyright 2001, Alan Marshall 15

Median and ModeMedian and Mode

Median: Midpoint If odd number of observations: Middle

observation If even number of observations: Average of

middle 2 observations Mode: Most frequent

Page 16: Statistics

© Copyright 2001, Alan Marshall 16

ExampleExample

Our CSC mark data was (sorted):

63, 69, 71, 72, 75, 77, 81, 83, 88, 91 The median is 76 There is no mode

Page 17: Statistics

© Copyright 2001, Alan Marshall 17

ExampleExample

The Deviation is the difference between each observation and the mean

The sign indicates whether the observation is above (+) or below (-) the mean

x Deviation

71 -6 72 -5 88 11 69 -8 77 0 63 -14 91 14 81 4 83 6 75 -2

77 0

Page 18: Statistics

© Copyright 2001, Alan Marshall 18

ExampleExample

The average deviation is always zero

If it isn’t, you must have made a mistake!

x Deviation

71 -6 72 -5 88 11 69 -8 77 0 63 -14 91 14 81 4 83 6 75 -2

77 0

Page 19: Statistics

© Copyright 2001, Alan Marshall 19

Measures of DispersionMeasures of Dispersion

Page 20: Statistics

© Copyright 2001, Alan Marshall 20

Measures of DispersionMeasures of Dispersion

So far, we have look at measures of central tendency

What about measuring the tendency of the data to vary from these centre?

Page 21: Statistics

© Copyright 2001, Alan Marshall 21

Measures of DispersionMeasures of Dispersion

Range Highest - Lowest

Variance Standard Deviation

Page 22: Statistics

© Copyright 2001, Alan Marshall 22

ExampleExample

The range is 91-63=28 The range can be

extremely sensitive to outlier observations

Suppose one of these students had a very bad day and scored 8. The range would now

be 91-8=83

x Deviation

71 -6 72 -5 88 11 69 -8 77 0 63 -14 91 14 81 4 83 6 75 -2

77 0

Page 23: Statistics

© Copyright 2001, Alan Marshall 23

Mean Absolute DeviationMean Absolute Deviation

The Mean Absolute Deviation is a measure of average dispersion that is not used very much

It has some undesirable mathematical properties beyond the level of this course

x Deviation |D|

71 -6 6 72 -5 5 88 11 11 69 -8 8 77 0 0 63 -14 14 91 14 14 81 4 4 83 6 6 75 -2 2

77 0 7

Page 24: Statistics

© Copyright 2001, Alan Marshall 24

Mean Squared DeviationMean Squared Deviation

The Mean Squared Deviation is very commonly used

The MSD in this example is 694/10=69.4

The more common name of the MSD is the VARIANCE

x Deviation D2

71 -6 36 72 -5 25 88 11 121 69 -8 64 77 0 0 63 -14 196 91 14 196 81 4 16 83 6 36 75 -2 4

77 0 694

Page 25: Statistics

© Copyright 2001, Alan Marshall 25

N

x 2i2

1n

xxsˆ

2

i22

VarianceVariance

Variance measures the amount of dispersion from the mean.

For Populations: For Samples:

Page 26: Statistics

© Copyright 2001, Alan Marshall 26

N

xx2

i

1N

xxsˆ

2

i

Standard DeviationStandard Deviation

Standard Deviation measures the amount of dispersion from the mean.

For Populations: For Samples:

Page 27: Statistics

© Copyright 2001, Alan Marshall 27

Standard Deviation ExampleStandard Deviation Example

Using the previous example The data is sample data

78.8110

694

1N

xxsˆ

2

i

Page 28: Statistics

© Copyright 2001, Alan Marshall 28

Interpreting the Std. Dev.Interpreting the Std. Dev.

You have heard of the Bell Shaped or Normal Distribution

The properties of the Normal Distribution are well known and give us the EMPIRICAL RULE

Page 29: Statistics

© Copyright 2001, Alan Marshall 29

Normal DistributionNormal Distribution

-4 -3 -2 -1 0 1 2 3 4

Z Value- Standard Deviations from Mean

2/3

95%

99.7%

0.15%Tail

2.5%Tail

0.15%Tail

2.5%Tail

Page 30: Statistics

© Copyright 2001, Alan Marshall 30

Empirical RuleEmpirical Rule

For approximately Normally Distributed data: Within 1 of the mean: approx.. 2/3s Within 2 of the mean: approx. 95% (19/20) Within 3 of the mean: virtually all

Page 31: Statistics

© Copyright 2001, Alan Marshall 31

Quartiles, Percentiles, etc.Quartiles, Percentiles, etc.

The Median splits the data in half Quartiles split the data into quarters Deciles split the data into tenths Percentiles split the data into one-

hundredths

Page 32: Statistics

© Copyright 2001, Alan Marshall 32

Rank MeasuresRank Measures

“That was a top-half performance” “WTG Special fund has been a top quartile

performer for the past 5 years” “Our programme accepts only students

proven to be top decile performers” “I was in the 92nd percentile on the GMAT”

Page 33: Statistics

© Copyright 2001, Alan Marshall 33

Using ExcelUsing Excel

Full Descriptive Statistics Tools

Data Analysis Descriptive Statistics

Page 34: Statistics

© Copyright 2001, Alan Marshall 34

Measures of AssociationMeasures of Association

Page 35: Statistics

© Copyright 2001, Alan Marshall 35

Bivariate StatisticsBivariate Statistics

So far, we have been dealing with statistics of individual variables

We also have statistics that relate pairs of variables

Page 36: Statistics

© Copyright 2001, Alan Marshall 36

InteractionsInteractions

Sometimes two variables appear related: smoking and lung cancers height and weight years of education and income engine size and gas mileage GMAT scores and MBA GPA house size and price

Page 37: Statistics

© Copyright 2001, Alan Marshall 37

InteractionsInteractions

Some of these variables would appear to positively related & others negatively

If these were related, we would expect to be able to derive a linear relationship:

y = a + bx where, b is the slope, and a is the intercept

Page 38: Statistics

© Copyright 2001, Alan Marshall 38

Linear RelationshipsLinear Relationships

We will be deriving linear relationships from bivariate (two-variable) data

Our symbols will be:

term Error

Interceptˆ Slopeˆ

xy or xy

01

1010

Page 39: Statistics

© Copyright 2001, Alan Marshall 39

ExampleExample

Consider the following example comparing the returns of Consolidated Moose Pasture stock (CMP) and the TSE 300 Index

The next slide shows 25 monthly returns

Page 40: Statistics

© Copyright 2001, Alan Marshall 40

Example DataExample Data

TSE CMP TSE CMP TSE CMPx y x y x y

3 4 -4 -3 2 4-1 -2 -1 0 -1 12 -2 0 -2 4 34 2 1 0 -2 -15 3 0 0 1 2

-3 -5 -3 1 -3 -4-5 -2 -3 -2 2 11 2 1 3 -2 -22 -1

Page 41: Statistics

© Copyright 2001, Alan Marshall 41

ExampleExample

From the data, it appears that a positive relationship may exist Most of the time when the TSE is up, CMP is

up Likewise, when the TSE is down, CMP is down

most of the time Sometimes, they move in opposite directions

Let’s graph this data

Page 42: Statistics

© Copyright 2001, Alan Marshall 42

Graph Of DataGraph Of Data

-6

-4

-2

0

2

4

6

-6 -4 -2 0 2 4 6TSE

CMP

Page 43: Statistics

© Copyright 2001, Alan Marshall 43

Example Summary StatisticsExample Summary Statistics

The data do appear to be positively related Let’s derive some summary statistics about these

data:

Page 44: Statistics

© Copyright 2001, Alan Marshall 44

ObservationsObservations

Both have means of zero and standard deviations just under 3

However, each data point does not have simply one deviation from the mean, it deviates from both means

Consider Points A, B, C and D on the next graph

Page 45: Statistics

© Copyright 2001, Alan Marshall 45

Graph of DataGraph of Data

-6

-4

-2

0

2

4

6

-6 -4 -2 0 2 4 6TSE

CMP

A

B

C

D

Page 46: Statistics

© Copyright 2001, Alan Marshall 46

ImplicationsImplications

When points in the upper right and lower left quadrants dominate, then the sums of the products of the deviations will be positive

When points in the lower right and upper left quadrants dominate, then the sums of the products of the deviations will be negative

Page 47: Statistics

© Copyright 2001, Alan Marshall 47

An Important ObservationAn Important Observation

The sums of the products of the deviations will give us the appropriate sign of the slope of our relationship

Page 48: Statistics

© Copyright 2001, Alan Marshall 48

CovarianceCovariance(Showing the formula only to demonstrate a concept)(Showing the formula only to demonstrate a concept)

1nn

yxyx

1n

yyxxsYXcov

N

yxYXCOV

iiiii

n

1ii

XY

yi

N

1ixi

XY

),(

),(

Page 49: Statistics

© Copyright 2001, Alan Marshall 49

CovarianceCovariance

-6

-4

-2

0

2

4

6

-6 -4 -2 0 2 4 6TSE

CMP

A

B

C

D

Page 50: Statistics

© Copyright 2001, Alan Marshall 50

CovarianceCovariance

In the same units as Variance (if both variables are in the same unit), i.e. units squared

Very important element of measuring portfolio risk in finance

Page 51: Statistics

© Copyright 2001, Alan Marshall 51

Covariance in ExcelCovariance in Excel

Tools Data Analysis

Covariance

Column 1 Column 2Column 1 7.25Column 2 4.875 6.25

Page 52: Statistics

© Copyright 2001, Alan Marshall 52

Interpreting the ResultInterpreting the Result

Column 1 Column 2Column 1 7.25Column 2 4.875 6.25

This gives us the variances (7.25 & 6.25) and the covariance between the variables, 4.875

In fact, variance is simply the covariance of a variable with itself!

Page 53: Statistics

© Copyright 2001, Alan Marshall 53

Using CovarianceUsing Covariance

Very useful in Finance for measuring portfolio risk

Unfortunately, it is hard to interpret for two reasons: What does the magnitude/size imply? The units are confusing

Page 54: Statistics

© Copyright 2001, Alan Marshall 54

A More Useful StatisticA More Useful Statistic

We can simultaneously adjust for both of these shortcomings by dividing the covariance by the two relevant standard deviations

This operation Removes the impact of size & scale Eliminates the units

Page 55: Statistics

© Copyright 2001, Alan Marshall 55

CorrelationCorrelation

Correlation measures the sensitivity of one variable to another, but ignoring magnitude

Range: -1 to 1 +1: Implies perfect positive co-movement -1: Implies perfect negative co-movement 0: No relationship

Page 56: Statistics

© Copyright 2001, Alan Marshall 56

Calculating CorrelationCalculating Correlation

YXXYXY

YXXY

ss

Y)cov(X,r

YXCOV

ˆ

),(

Page 57: Statistics

© Copyright 2001, Alan Marshall 57

Correlation in ExcelCorrelation in Excel

Tools Data Analysis

Correlation

Column 1 Column 2Column 1 1Column 2 0.724212 1

Page 58: Statistics

© Copyright 2001, Alan Marshall 58

Interpreting the ResultInterpreting the Result

The correlation of a variable with itself is 1 The correlation between CMP and the TSE Index

in this example is 0.724 This is positive, and relatively strong

Column 1 Column 2Column 1 1Column 2 0.724212 1

Page 59: Statistics

© Copyright 2001, Alan Marshall 59

Estimating Linear Estimating Linear RelationshipsRelationships

Page 60: Statistics

© Copyright 2001, Alan Marshall 60

Estimating Linear RelationshipsEstimating Linear Relationships

Often the data imply that a linear relationship exists

We can estimate this relationship using the Least Squares Method of Regression

We will just learn to use the Excel output and interpret it

Page 61: Statistics

© Copyright 2001, Alan Marshall 61

TSE-CMP Regression OutputTSE-CMP Regression Output(Abridged)(Abridged)

SUMMARY OUTPUT

Regression StatisticsMultiple R 0.724211819R Square 0.524482759Adjusted R Square 0.503808096Standard Error 1.76102226Observations 25

Coefficients Standard Error t Stat P-valueIntercept 0 0.352204452 0 1X Variable 1 0.672413793 0.133502753 5.036704 4.26E-05

Page 62: Statistics

© Copyright 2001, Alan Marshall 62

Interpreting the OutputInterpreting the Output

SUMMARY OUTPUT rCMP = 0 + 0.6724(rTSE) + e

Regression StatisticsMultiple R 0.724211819 CorrelationR Square 0.524482759 CoefficientAdjusted R Square 0.503808096Standard Error 1.76102226Observations 25 Intercept

Coefficients Standard Error t Stat P-valueIntercept 0 0.352204452 0 1X Variable 1 0.672413793 0.133502753 5.036704 4.26E-05

Slope

Page 63: Statistics

© Copyright 2001, Alan Marshall 63

Where We Are GoingWhere We Are Going

We will develop the use of the regression technique more fully Multiple explanatory variables

Some Time-Series Applications