© Copyright 2001, Alan Marshall1 Statistics. 2 Statistics è Branch of Mathematics that deals with...

63
Copyright 2001, Alan Marshall Statistics Statistics

Transcript of © Copyright 2001, Alan Marshall1 Statistics. 2 Statistics è Branch of Mathematics that deals with...

© Copyright 2001, Alan Marshall 1

StatisticsStatistics

© Copyright 2001, Alan Marshall 2

StatisticsStatistics

Branch of Mathematics that deals with the collection and analysis of data

Descriptive Statistics: used to analyze and describe data

Inferential Statistics: used to use the information to make statements regarding the relationships between variables or the expectations about future events.

© Copyright 2001, Alan Marshall 3

Measures of Central Measures of Central TendencyTendency

© Copyright 2001, Alan Marshall 4

Measures of Central TendencyMeasures of Central Tendency

Arithmetic Mean Median Mode Geometric Mean

© Copyright 2001, Alan Marshall 5

Arithmetic MeanArithmetic Mean

Other names Average Mean

tsmeasuremen of Number

tsmeasuremen the of SumMean

© Copyright 2001, Alan Marshall 6

Arithmetic MeanArithmetic Mean

n

xx

Samplen

1ii

The calculation is identical, just the notation varies slightly

N

x

PopulationN

1ii

x

© Copyright 2001, Alan Marshall 7

Summation NotationSummation Notation

N

1tt

N

1t t xx

Notice that the first form uses less vertical space on the page

This makes accountants very happy The first can also be easier to fit into a line of text

© Copyright 2001, Alan Marshall 8

ExampleExample

Ten second year BBA students wrote the CSC exam last month

Their scores were:

71, 72, 88, 69, 77, 63, 91, 81, 83, 75

77

10

75838191637769887271x

© Copyright 2001, Alan Marshall 9

Calculating the MeanCalculating the Mean

Arithmetic mean sum the observations and divide by the number of

observations

Example: 5%, 7%, -2%, 12%, 8%

%65

%30

N

rr

N

1tt

© Copyright 2001, Alan Marshall 10

Problem with the Arithmetic MeanProblem with the Arithmetic Mean

Arithmetic mean is incorrect for variables that are related multiplicatively, like rates of growth, rates of return and rates of change

$1,000 at 6% for 5 years should be $1,338.23

Starting Rate of EndingValue Return Value1000.00 5% 1050.00

(1000)(1.06)5 1050.00 7% 1123.50=1338.23 1123.50 -2% 1101.03

1101.03 12% 1233.151233.15 8% 1331.81

© Copyright 2001, Alan Marshall 11

Geometric MeanGeometric Mean

The Geometric Mean should be used for rates of change, like rates of return

%898.51)3318059.1(

108.112.198.007.105.1

1)r1(r

2.0

5

1

N

1N

1tt

© Copyright 2001, Alan Marshall 12

Geometric MeanGeometric Mean

The Geometric Mean should be used for rates of change, like rates of return

%898.51)3318059.1(

108.112.198.007.105.1

1)r1(r

2.0

5

1

N

1N

1tt

Means: The product of these factors from 1 to N

© Copyright 2001, Alan Marshall 13

Geometric vs. Arithmetic MeanGeometric vs. Arithmetic Mean

The more variable the underlying data, the greater the error using the Arithmetic mean

The Geometric Mean is often easier to calculate: Stock prices: 1992: $20; 1999: $40, R =

10.41%

© Copyright 2001, Alan Marshall 14

Geometric vs. Arithmetic MeanGeometric vs. Arithmetic Mean

For analysis of past performance, use the Geometric mean The past returns have averaged 5.898%

To use the past returns to estimate the future expected return, use the Arithmetic mean The expected return is 6%

© Copyright 2001, Alan Marshall 15

Median and ModeMedian and Mode

Median: Midpoint If odd number of observations: Middle

observation If even number of observations: Average of

middle 2 observations Mode: Most frequent

© Copyright 2001, Alan Marshall 16

ExampleExample

Our CSC mark data was (sorted):

63, 69, 71, 72, 75, 77, 81, 83, 88, 91 The median is 76 There is no mode

© Copyright 2001, Alan Marshall 17

ExampleExample

The Deviation is the difference between each observation and the mean

The sign indicates whether the observation is above (+) or below (-) the mean

x Deviation

71 -6 72 -5 88 11 69 -8 77 0 63 -14 91 14 81 4 83 6 75 -2

77 0

© Copyright 2001, Alan Marshall 18

ExampleExample

The average deviation is always zero

If it isn’t, you must have made a mistake!

x Deviation

71 -6 72 -5 88 11 69 -8 77 0 63 -14 91 14 81 4 83 6 75 -2

77 0

© Copyright 2001, Alan Marshall 19

Measures of DispersionMeasures of Dispersion

© Copyright 2001, Alan Marshall 20

Measures of DispersionMeasures of Dispersion

So far, we have look at measures of central tendency

What about measuring the tendency of the data to vary from these centre?

© Copyright 2001, Alan Marshall 21

Measures of DispersionMeasures of Dispersion

Range Highest - Lowest

Variance Standard Deviation

© Copyright 2001, Alan Marshall 22

ExampleExample

The range is 91-63=28 The range can be

extremely sensitive to outlier observations

Suppose one of these students had a very bad day and scored 8. The range would now

be 91-8=83

x Deviation

71 -6 72 -5 88 11 69 -8 77 0 63 -14 91 14 81 4 83 6 75 -2

77 0

© Copyright 2001, Alan Marshall 23

Mean Absolute DeviationMean Absolute Deviation

The Mean Absolute Deviation is a measure of average dispersion that is not used very much

It has some undesirable mathematical properties beyond the level of this course

x Deviation |D|

71 -6 6 72 -5 5 88 11 11 69 -8 8 77 0 0 63 -14 14 91 14 14 81 4 4 83 6 6 75 -2 2

77 0 7

© Copyright 2001, Alan Marshall 24

Mean Squared DeviationMean Squared Deviation

The Mean Squared Deviation is very commonly used

The MSD in this example is 694/10=69.4

The more common name of the MSD is the VARIANCE

x Deviation D2

71 -6 36 72 -5 25 88 11 121 69 -8 64 77 0 0 63 -14 196 91 14 196 81 4 16 83 6 36 75 -2 4

77 0 694

© Copyright 2001, Alan Marshall 25

N

x 2i2

1n

xxsˆ

2

i22

VarianceVariance

Variance measures the amount of dispersion from the mean.

For Populations: For Samples:

© Copyright 2001, Alan Marshall 26

N

xx2

i

1N

xxsˆ

2

i

Standard DeviationStandard Deviation

Standard Deviation measures the amount of dispersion from the mean.

For Populations: For Samples:

© Copyright 2001, Alan Marshall 27

Standard Deviation ExampleStandard Deviation Example

Using the previous example The data is sample data

78.8110

694

1N

xxsˆ

2

i

© Copyright 2001, Alan Marshall 28

Interpreting the Std. Dev.Interpreting the Std. Dev.

You have heard of the Bell Shaped or Normal Distribution

The properties of the Normal Distribution are well known and give us the EMPIRICAL RULE

© Copyright 2001, Alan Marshall 29

Normal DistributionNormal Distribution

-4 -3 -2 -1 0 1 2 3 4

Z Value- Standard Deviations from Mean

2/3

95%

99.7%

0.15%Tail

2.5%Tail

0.15%Tail

2.5%Tail

© Copyright 2001, Alan Marshall 30

Empirical RuleEmpirical Rule

For approximately Normally Distributed data: Within 1 of the mean: approx.. 2/3s Within 2 of the mean: approx. 95% (19/20) Within 3 of the mean: virtually all

© Copyright 2001, Alan Marshall 31

Quartiles, Percentiles, etc.Quartiles, Percentiles, etc.

The Median splits the data in half Quartiles split the data into quarters Deciles split the data into tenths Percentiles split the data into one-

hundredths

© Copyright 2001, Alan Marshall 32

Rank MeasuresRank Measures

“That was a top-half performance” “WTG Special fund has been a top quartile

performer for the past 5 years” “Our programme accepts only students

proven to be top decile performers” “I was in the 92nd percentile on the GMAT”

© Copyright 2001, Alan Marshall 33

Using ExcelUsing Excel

Full Descriptive Statistics Tools

Data Analysis Descriptive Statistics

© Copyright 2001, Alan Marshall 34

Measures of AssociationMeasures of Association

© Copyright 2001, Alan Marshall 35

Bivariate StatisticsBivariate Statistics

So far, we have been dealing with statistics of individual variables

We also have statistics that relate pairs of variables

© Copyright 2001, Alan Marshall 36

InteractionsInteractions

Sometimes two variables appear related: smoking and lung cancers height and weight years of education and income engine size and gas mileage GMAT scores and MBA GPA house size and price

© Copyright 2001, Alan Marshall 37

InteractionsInteractions

Some of these variables would appear to positively related & others negatively

If these were related, we would expect to be able to derive a linear relationship:

y = a + bx where, b is the slope, and a is the intercept

© Copyright 2001, Alan Marshall 38

Linear RelationshipsLinear Relationships

We will be deriving linear relationships from bivariate (two-variable) data

Our symbols will be:

term Error

Interceptˆ Slopeˆ

xy or xy

01

1010

© Copyright 2001, Alan Marshall 39

ExampleExample

Consider the following example comparing the returns of Consolidated Moose Pasture stock (CMP) and the TSE 300 Index

The next slide shows 25 monthly returns

© Copyright 2001, Alan Marshall 40

Example DataExample Data

TSE CMP TSE CMP TSE CMPx y x y x y

3 4 -4 -3 2 4-1 -2 -1 0 -1 12 -2 0 -2 4 34 2 1 0 -2 -15 3 0 0 1 2

-3 -5 -3 1 -3 -4-5 -2 -3 -2 2 11 2 1 3 -2 -22 -1

© Copyright 2001, Alan Marshall 41

ExampleExample

From the data, it appears that a positive relationship may exist Most of the time when the TSE is up, CMP is

up Likewise, when the TSE is down, CMP is down

most of the time Sometimes, they move in opposite directions

Let’s graph this data

© Copyright 2001, Alan Marshall 42

Graph Of DataGraph Of Data

-6

-4

-2

0

2

4

6

-6 -4 -2 0 2 4 6TSE

CMP

© Copyright 2001, Alan Marshall 43

Example Summary StatisticsExample Summary Statistics

The data do appear to be positively related Let’s derive some summary statistics about these

data:

© Copyright 2001, Alan Marshall 44

ObservationsObservations

Both have means of zero and standard deviations just under 3

However, each data point does not have simply one deviation from the mean, it deviates from both means

Consider Points A, B, C and D on the next graph

© Copyright 2001, Alan Marshall 45

Graph of DataGraph of Data

-6

-4

-2

0

2

4

6

-6 -4 -2 0 2 4 6TSE

CMP

A

B

C

D

© Copyright 2001, Alan Marshall 46

ImplicationsImplications

When points in the upper right and lower left quadrants dominate, then the sums of the products of the deviations will be positive

When points in the lower right and upper left quadrants dominate, then the sums of the products of the deviations will be negative

© Copyright 2001, Alan Marshall 47

An Important ObservationAn Important Observation

The sums of the products of the deviations will give us the appropriate sign of the slope of our relationship

© Copyright 2001, Alan Marshall 48

CovarianceCovariance(Showing the formula only to demonstrate a concept)(Showing the formula only to demonstrate a concept)

1nn

yxyx

1n

yyxxsYXcov

N

yxYXCOV

iiiii

n

1ii

XY

yi

N

1ixi

XY

),(

),(

© Copyright 2001, Alan Marshall 49

CovarianceCovariance

-6

-4

-2

0

2

4

6

-6 -4 -2 0 2 4 6TSE

CMP

A

B

C

D

© Copyright 2001, Alan Marshall 50

CovarianceCovariance

In the same units as Variance (if both variables are in the same unit), i.e. units squared

Very important element of measuring portfolio risk in finance

© Copyright 2001, Alan Marshall 51

Covariance in ExcelCovariance in Excel

Tools Data Analysis

Covariance

Column 1 Column 2Column 1 7.25Column 2 4.875 6.25

© Copyright 2001, Alan Marshall 52

Interpreting the ResultInterpreting the Result

Column 1 Column 2Column 1 7.25Column 2 4.875 6.25

This gives us the variances (7.25 & 6.25) and the covariance between the variables, 4.875

In fact, variance is simply the covariance of a variable with itself!

© Copyright 2001, Alan Marshall 53

Using CovarianceUsing Covariance

Very useful in Finance for measuring portfolio risk

Unfortunately, it is hard to interpret for two reasons: What does the magnitude/size imply? The units are confusing

© Copyright 2001, Alan Marshall 54

A More Useful StatisticA More Useful Statistic

We can simultaneously adjust for both of these shortcomings by dividing the covariance by the two relevant standard deviations

This operation Removes the impact of size & scale Eliminates the units

© Copyright 2001, Alan Marshall 55

CorrelationCorrelation

Correlation measures the sensitivity of one variable to another, but ignoring magnitude

Range: -1 to 1 +1: Implies perfect positive co-movement -1: Implies perfect negative co-movement 0: No relationship

© Copyright 2001, Alan Marshall 56

Calculating CorrelationCalculating Correlation

YXXYXY

YXXY

ss

Y)cov(X,r

YXCOV

ˆ

),(

© Copyright 2001, Alan Marshall 57

Correlation in ExcelCorrelation in Excel

Tools Data Analysis

Correlation

Column 1 Column 2Column 1 1Column 2 0.724212 1

© Copyright 2001, Alan Marshall 58

Interpreting the ResultInterpreting the Result

The correlation of a variable with itself is 1 The correlation between CMP and the TSE Index

in this example is 0.724 This is positive, and relatively strong

Column 1 Column 2Column 1 1Column 2 0.724212 1

© Copyright 2001, Alan Marshall 59

Estimating Linear Estimating Linear RelationshipsRelationships

© Copyright 2001, Alan Marshall 60

Estimating Linear RelationshipsEstimating Linear Relationships

Often the data imply that a linear relationship exists

We can estimate this relationship using the Least Squares Method of Regression

We will just learn to use the Excel output and interpret it

© Copyright 2001, Alan Marshall 61

TSE-CMP Regression OutputTSE-CMP Regression Output(Abridged)(Abridged)

SUMMARY OUTPUT

Regression StatisticsMultiple R 0.724211819R Square 0.524482759Adjusted R Square 0.503808096Standard Error 1.76102226Observations 25

Coefficients Standard Error t Stat P-valueIntercept 0 0.352204452 0 1X Variable 1 0.672413793 0.133502753 5.036704 4.26E-05

© Copyright 2001, Alan Marshall 62

Interpreting the OutputInterpreting the Output

SUMMARY OUTPUT rCMP = 0 + 0.6724(rTSE) + e

Regression StatisticsMultiple R 0.724211819 CorrelationR Square 0.524482759 CoefficientAdjusted R Square 0.503808096Standard Error 1.76102226Observations 25 Intercept

Coefficients Standard Error t Stat P-valueIntercept 0 0.352204452 0 1X Variable 1 0.672413793 0.133502753 5.036704 4.26E-05

Slope

© Copyright 2001, Alan Marshall 63

Where We Are GoingWhere We Are Going

We will develop the use of the regression technique more fully Multiple explanatory variables

Some Time-Series Applications