Statistics
description
Transcript of Statistics
© Copyright 2001, Alan Marshall 1
StatisticsStatistics
© Copyright 2001, Alan Marshall 2
StatisticsStatistics
Branch of Mathematics that deals with the collection and analysis of data
Descriptive Statistics: used to analyze and describe data
Inferential Statistics: used to use the information to make statements regarding the relationships between variables or the expectations about future events.
© Copyright 2001, Alan Marshall 3
Measures of Central Measures of Central TendencyTendency
© Copyright 2001, Alan Marshall 4
Measures of Central TendencyMeasures of Central Tendency
Arithmetic Mean Median Mode Geometric Mean
© Copyright 2001, Alan Marshall 5
Arithmetic MeanArithmetic Mean
Other names Average Mean
tsmeasuremen of Number
tsmeasuremen the of SumMean
© Copyright 2001, Alan Marshall 6
Arithmetic MeanArithmetic Mean
n
xx
Samplen
1ii
The calculation is identical, just the notation varies slightly
N
x
PopulationN
1ii
x
© Copyright 2001, Alan Marshall 7
Summation NotationSummation Notation
N
1tt
N
1t t xx
Notice that the first form uses less vertical space on the page
This makes accountants very happy The first can also be easier to fit into a line of text
© Copyright 2001, Alan Marshall 8
ExampleExample
Ten second year BBA students wrote the CSC exam last month
Their scores were:
71, 72, 88, 69, 77, 63, 91, 81, 83, 75
77
10
75838191637769887271x
© Copyright 2001, Alan Marshall 9
Calculating the MeanCalculating the Mean
Arithmetic mean sum the observations and divide by the number of
observations
Example: 5%, 7%, -2%, 12%, 8%
%65
%30
N
rr
N
1tt
© Copyright 2001, Alan Marshall 10
Problem with the Arithmetic MeanProblem with the Arithmetic Mean
Arithmetic mean is incorrect for variables that are related multiplicatively, like rates of growth, rates of return and rates of change
$1,000 at 6% for 5 years should be $1,338.23
Starting Rate of EndingValue Return Value1000.00 5% 1050.00
(1000)(1.06)5 1050.00 7% 1123.50=1338.23 1123.50 -2% 1101.03
1101.03 12% 1233.151233.15 8% 1331.81
© Copyright 2001, Alan Marshall 11
Geometric MeanGeometric Mean
The Geometric Mean should be used for rates of change, like rates of return
%898.51)3318059.1(
108.112.198.007.105.1
1)r1(r
2.0
5
1
N
1N
1tt
© Copyright 2001, Alan Marshall 12
Geometric MeanGeometric Mean
The Geometric Mean should be used for rates of change, like rates of return
%898.51)3318059.1(
108.112.198.007.105.1
1)r1(r
2.0
5
1
N
1N
1tt
Means: The product of these factors from 1 to N
© Copyright 2001, Alan Marshall 13
Geometric vs. Arithmetic MeanGeometric vs. Arithmetic Mean
The more variable the underlying data, the greater the error using the Arithmetic mean
The Geometric Mean is often easier to calculate: Stock prices: 1992: $20; 1999: $40, R =
10.41%
© Copyright 2001, Alan Marshall 14
Geometric vs. Arithmetic MeanGeometric vs. Arithmetic Mean
For analysis of past performance, use the Geometric mean The past returns have averaged 5.898%
To use the past returns to estimate the future expected return, use the Arithmetic mean The expected return is 6%
© Copyright 2001, Alan Marshall 15
Median and ModeMedian and Mode
Median: Midpoint If odd number of observations: Middle
observation If even number of observations: Average of
middle 2 observations Mode: Most frequent
© Copyright 2001, Alan Marshall 16
ExampleExample
Our CSC mark data was (sorted):
63, 69, 71, 72, 75, 77, 81, 83, 88, 91 The median is 76 There is no mode
© Copyright 2001, Alan Marshall 17
ExampleExample
The Deviation is the difference between each observation and the mean
The sign indicates whether the observation is above (+) or below (-) the mean
x Deviation
71 -6 72 -5 88 11 69 -8 77 0 63 -14 91 14 81 4 83 6 75 -2
77 0
© Copyright 2001, Alan Marshall 18
ExampleExample
The average deviation is always zero
If it isn’t, you must have made a mistake!
x Deviation
71 -6 72 -5 88 11 69 -8 77 0 63 -14 91 14 81 4 83 6 75 -2
77 0
© Copyright 2001, Alan Marshall 19
Measures of DispersionMeasures of Dispersion
© Copyright 2001, Alan Marshall 20
Measures of DispersionMeasures of Dispersion
So far, we have look at measures of central tendency
What about measuring the tendency of the data to vary from these centre?
© Copyright 2001, Alan Marshall 21
Measures of DispersionMeasures of Dispersion
Range Highest - Lowest
Variance Standard Deviation
© Copyright 2001, Alan Marshall 22
ExampleExample
The range is 91-63=28 The range can be
extremely sensitive to outlier observations
Suppose one of these students had a very bad day and scored 8. The range would now
be 91-8=83
x Deviation
71 -6 72 -5 88 11 69 -8 77 0 63 -14 91 14 81 4 83 6 75 -2
77 0
© Copyright 2001, Alan Marshall 23
Mean Absolute DeviationMean Absolute Deviation
The Mean Absolute Deviation is a measure of average dispersion that is not used very much
It has some undesirable mathematical properties beyond the level of this course
x Deviation |D|
71 -6 6 72 -5 5 88 11 11 69 -8 8 77 0 0 63 -14 14 91 14 14 81 4 4 83 6 6 75 -2 2
77 0 7
© Copyright 2001, Alan Marshall 24
Mean Squared DeviationMean Squared Deviation
The Mean Squared Deviation is very commonly used
The MSD in this example is 694/10=69.4
The more common name of the MSD is the VARIANCE
x Deviation D2
71 -6 36 72 -5 25 88 11 121 69 -8 64 77 0 0 63 -14 196 91 14 196 81 4 16 83 6 36 75 -2 4
77 0 694
© Copyright 2001, Alan Marshall 25
N
x 2i2
1n
xxsˆ
2
i22
VarianceVariance
Variance measures the amount of dispersion from the mean.
For Populations: For Samples:
© Copyright 2001, Alan Marshall 26
N
xx2
i
1N
xxsˆ
2
i
Standard DeviationStandard Deviation
Standard Deviation measures the amount of dispersion from the mean.
For Populations: For Samples:
© Copyright 2001, Alan Marshall 27
Standard Deviation ExampleStandard Deviation Example
Using the previous example The data is sample data
78.8110
694
1N
xxsˆ
2
i
© Copyright 2001, Alan Marshall 28
Interpreting the Std. Dev.Interpreting the Std. Dev.
You have heard of the Bell Shaped or Normal Distribution
The properties of the Normal Distribution are well known and give us the EMPIRICAL RULE
© Copyright 2001, Alan Marshall 29
Normal DistributionNormal Distribution
-4 -3 -2 -1 0 1 2 3 4
Z Value- Standard Deviations from Mean
2/3
95%
99.7%
0.15%Tail
2.5%Tail
0.15%Tail
2.5%Tail
© Copyright 2001, Alan Marshall 30
Empirical RuleEmpirical Rule
For approximately Normally Distributed data: Within 1 of the mean: approx.. 2/3s Within 2 of the mean: approx. 95% (19/20) Within 3 of the mean: virtually all
© Copyright 2001, Alan Marshall 31
Quartiles, Percentiles, etc.Quartiles, Percentiles, etc.
The Median splits the data in half Quartiles split the data into quarters Deciles split the data into tenths Percentiles split the data into one-
hundredths
© Copyright 2001, Alan Marshall 32
Rank MeasuresRank Measures
“That was a top-half performance” “WTG Special fund has been a top quartile
performer for the past 5 years” “Our programme accepts only students
proven to be top decile performers” “I was in the 92nd percentile on the GMAT”
© Copyright 2001, Alan Marshall 33
Using ExcelUsing Excel
Full Descriptive Statistics Tools
Data Analysis Descriptive Statistics
© Copyright 2001, Alan Marshall 34
Measures of AssociationMeasures of Association
© Copyright 2001, Alan Marshall 35
Bivariate StatisticsBivariate Statistics
So far, we have been dealing with statistics of individual variables
We also have statistics that relate pairs of variables
© Copyright 2001, Alan Marshall 36
InteractionsInteractions
Sometimes two variables appear related: smoking and lung cancers height and weight years of education and income engine size and gas mileage GMAT scores and MBA GPA house size and price
© Copyright 2001, Alan Marshall 37
InteractionsInteractions
Some of these variables would appear to positively related & others negatively
If these were related, we would expect to be able to derive a linear relationship:
y = a + bx where, b is the slope, and a is the intercept
© Copyright 2001, Alan Marshall 38
Linear RelationshipsLinear Relationships
We will be deriving linear relationships from bivariate (two-variable) data
Our symbols will be:
term Error
Interceptˆ Slopeˆ
xy or xy
01
1010
© Copyright 2001, Alan Marshall 39
ExampleExample
Consider the following example comparing the returns of Consolidated Moose Pasture stock (CMP) and the TSE 300 Index
The next slide shows 25 monthly returns
© Copyright 2001, Alan Marshall 40
Example DataExample Data
TSE CMP TSE CMP TSE CMPx y x y x y
3 4 -4 -3 2 4-1 -2 -1 0 -1 12 -2 0 -2 4 34 2 1 0 -2 -15 3 0 0 1 2
-3 -5 -3 1 -3 -4-5 -2 -3 -2 2 11 2 1 3 -2 -22 -1
© Copyright 2001, Alan Marshall 41
ExampleExample
From the data, it appears that a positive relationship may exist Most of the time when the TSE is up, CMP is
up Likewise, when the TSE is down, CMP is down
most of the time Sometimes, they move in opposite directions
Let’s graph this data
© Copyright 2001, Alan Marshall 42
Graph Of DataGraph Of Data
-6
-4
-2
0
2
4
6
-6 -4 -2 0 2 4 6TSE
CMP
© Copyright 2001, Alan Marshall 43
Example Summary StatisticsExample Summary Statistics
The data do appear to be positively related Let’s derive some summary statistics about these
data:
© Copyright 2001, Alan Marshall 44
ObservationsObservations
Both have means of zero and standard deviations just under 3
However, each data point does not have simply one deviation from the mean, it deviates from both means
Consider Points A, B, C and D on the next graph
© Copyright 2001, Alan Marshall 45
Graph of DataGraph of Data
-6
-4
-2
0
2
4
6
-6 -4 -2 0 2 4 6TSE
CMP
A
B
C
D
© Copyright 2001, Alan Marshall 46
ImplicationsImplications
When points in the upper right and lower left quadrants dominate, then the sums of the products of the deviations will be positive
When points in the lower right and upper left quadrants dominate, then the sums of the products of the deviations will be negative
© Copyright 2001, Alan Marshall 47
An Important ObservationAn Important Observation
The sums of the products of the deviations will give us the appropriate sign of the slope of our relationship
© Copyright 2001, Alan Marshall 48
CovarianceCovariance(Showing the formula only to demonstrate a concept)(Showing the formula only to demonstrate a concept)
1nn
yxyx
1n
yyxxsYXcov
N
yxYXCOV
iiiii
n
1ii
XY
yi
N
1ixi
XY
),(
),(
© Copyright 2001, Alan Marshall 49
CovarianceCovariance
-6
-4
-2
0
2
4
6
-6 -4 -2 0 2 4 6TSE
CMP
A
B
C
D
© Copyright 2001, Alan Marshall 50
CovarianceCovariance
In the same units as Variance (if both variables are in the same unit), i.e. units squared
Very important element of measuring portfolio risk in finance
© Copyright 2001, Alan Marshall 51
Covariance in ExcelCovariance in Excel
Tools Data Analysis
Covariance
Column 1 Column 2Column 1 7.25Column 2 4.875 6.25
© Copyright 2001, Alan Marshall 52
Interpreting the ResultInterpreting the Result
Column 1 Column 2Column 1 7.25Column 2 4.875 6.25
This gives us the variances (7.25 & 6.25) and the covariance between the variables, 4.875
In fact, variance is simply the covariance of a variable with itself!
© Copyright 2001, Alan Marshall 53
Using CovarianceUsing Covariance
Very useful in Finance for measuring portfolio risk
Unfortunately, it is hard to interpret for two reasons: What does the magnitude/size imply? The units are confusing
© Copyright 2001, Alan Marshall 54
A More Useful StatisticA More Useful Statistic
We can simultaneously adjust for both of these shortcomings by dividing the covariance by the two relevant standard deviations
This operation Removes the impact of size & scale Eliminates the units
© Copyright 2001, Alan Marshall 55
CorrelationCorrelation
Correlation measures the sensitivity of one variable to another, but ignoring magnitude
Range: -1 to 1 +1: Implies perfect positive co-movement -1: Implies perfect negative co-movement 0: No relationship
© Copyright 2001, Alan Marshall 56
Calculating CorrelationCalculating Correlation
YXXYXY
YXXY
ss
Y)cov(X,r
YXCOV
ˆ
),(
© Copyright 2001, Alan Marshall 57
Correlation in ExcelCorrelation in Excel
Tools Data Analysis
Correlation
Column 1 Column 2Column 1 1Column 2 0.724212 1
© Copyright 2001, Alan Marshall 58
Interpreting the ResultInterpreting the Result
The correlation of a variable with itself is 1 The correlation between CMP and the TSE Index
in this example is 0.724 This is positive, and relatively strong
Column 1 Column 2Column 1 1Column 2 0.724212 1
© Copyright 2001, Alan Marshall 59
Estimating Linear Estimating Linear RelationshipsRelationships
© Copyright 2001, Alan Marshall 60
Estimating Linear RelationshipsEstimating Linear Relationships
Often the data imply that a linear relationship exists
We can estimate this relationship using the Least Squares Method of Regression
We will just learn to use the Excel output and interpret it
© Copyright 2001, Alan Marshall 61
TSE-CMP Regression OutputTSE-CMP Regression Output(Abridged)(Abridged)
SUMMARY OUTPUT
Regression StatisticsMultiple R 0.724211819R Square 0.524482759Adjusted R Square 0.503808096Standard Error 1.76102226Observations 25
Coefficients Standard Error t Stat P-valueIntercept 0 0.352204452 0 1X Variable 1 0.672413793 0.133502753 5.036704 4.26E-05
© Copyright 2001, Alan Marshall 62
Interpreting the OutputInterpreting the Output
SUMMARY OUTPUT rCMP = 0 + 0.6724(rTSE) + e
Regression StatisticsMultiple R 0.724211819 CorrelationR Square 0.524482759 CoefficientAdjusted R Square 0.503808096Standard Error 1.76102226Observations 25 Intercept
Coefficients Standard Error t Stat P-valueIntercept 0 0.352204452 0 1X Variable 1 0.672413793 0.133502753 5.036704 4.26E-05
Slope
© Copyright 2001, Alan Marshall 63
Where We Are GoingWhere We Are Going
We will develop the use of the regression technique more fully Multiple explanatory variables
Some Time-Series Applications