1
Correlation and Simple Regression
2
Introduction
Interested in the relationships between variables.
What will happen to one variable if another is changed?
To what extent is it the case that increases in the interest rate reduce inflation?
Might want to know how sensitive the relationship is, and if possible, what form it takes.
Models needed.
3
Koop’s Deforestation Data
Y – average annual forest loss, 1981-1990 as % of total forested area
X – number of people per 1000 hectares
Data on 70 tropical countries (N=70)
4
Line of Best Fit to Forest Data
[Figure 1.1: Deforestation/Population Density Data with Line of Best Fit. Scatter plot of Forest Loss (Y) against Population Density; series shown: Y and Predicted Y.]
5
Line of Best Fit to Forest Data
[Figure: scatter plot of Forest Loss (Y) against Population Density with the line of best fit; series shown: Y and Predicted Y. Annotation: "Predicted Value of Forest Loss Given Population Density".]
6
X=2000 implies Y=2.3
If there are 2000 people per 1000 hectares, forest loss would be about 2.3%.
Comments
i) Increased dispersion about the line as X increases; more uncertainty about predictions for higher population densities.
ii) Ignores other impacts on deforestation.
7
Correlation
Objectives of Correlation
To measure how close the relationship between two variables is to linearity – strength of linear association
Capture the sign of the relationship
Determine on a common scale for all cases: -1 to +1
Closer to zero, weaker correlation
8
Sample Covariance
X and Y vary about their mean values.
To what extent is this variation aligned?
9
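Scatter Plot of Forest Loss Against Population Density: Axes Crossing at Mean Points
[Figure: scatter plot of forest loss against Population Density, with the axes crossing at the mean point, $\bar{X} = 639$, $\bar{Y} = 1.14$. The four quadrants are labelled by the signs of the deviations: $X_i - \bar{X} > 0,\ Y_i - \bar{Y} > 0$; $X_i - \bar{X} < 0,\ Y_i - \bar{Y} > 0$; $X_i - \bar{X} < 0,\ Y_i - \bar{Y} < 0$; $X_i - \bar{X} > 0,\ Y_i - \bar{Y} < 0$.]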
10
Deviations from Mean
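$(X_i - \bar{X})$, $(Y_i - \bar{Y})$ same sign: $(X_i - \bar{X})(Y_i - \bar{Y}) > 0$
$(X_i - \bar{X})$, $(Y_i - \bar{Y})$ opposite sign: $(X_i - \bar{X})(Y_i - \bar{Y}) < 0$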
11
Sample Covariance Formula
$\sigma_{X,Y} = \dfrac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1}$
Problem: varies with the scale of the data
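The scale problem can be seen in a small sketch. The function below assumes the N-1 divisor of the sample covariance; the five data points are made up for illustration and are not Koop's data.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by N-1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Products of deviations are positive when X and Y are on the same
    # side of their means, negative when they are on opposite sides.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

cov_xy = sample_covariance(x, y)
# Re-express X in units 1000 times smaller: the covariance is 1000 times larger.
cov_scaled = sample_covariance([1000.0 * v for v in x], y)
print(cov_xy, cov_scaled)
```

Here the covariance is 2.0 in the original units and 2000.0 after rescaling X, although the relationship between the two variables has not changed.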
12
Sample Correlation I
Standardise using sample standard deviations
Sample variance: $\sigma_X^2 = \dfrac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}$
Sample standard deviation: $\sigma_X = \sqrt{\sigma_X^2}$
13
Sample Correlation II
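$r_{X,Y} = \dfrac{\sigma_{X,Y}}{\sigma_X \sigma_Y} = \dfrac{\sum_{i=1}^{N} X_i Y_i - N \bar{X} \bar{Y}}{\sqrt{\left(\sum_{i=1}^{N} X_i^2 - N \bar{X}^2\right)\left(\sum_{i=1}^{N} Y_i^2 - N \bar{Y}^2\right)}}$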
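A minimal sketch of the sample correlation, computed two ways: from the covariance and standard deviations (N-1 divisor), and from raw sums of products and squares. The function names and the data are illustrative assumptions, not part of the slides.

```python
import math


def corr_from_deviations(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))


def corr_from_sums(xs, ys):
    """Same quantity from raw sums; the N-1 divisors cancel."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    den_x = sum(x * x for x in xs) - n * x_bar ** 2
    den_y = sum(y * y for y in ys) - n * y_bar ** 2
    return num / math.sqrt(den_x * den_y)


# Hypothetical illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r_dev = corr_from_deviations(x, y)
r_sum = corr_from_sums(x, y)
# Unlike the covariance, r is unchanged when X is rescaled.
r_rescaled = corr_from_deviations([1000.0 * v for v in x], y)
print(r_dev, r_sum, r_rescaled)
```

Both forms give about 0.8 for these numbers, and rescaling X leaves the result unchanged, which is the point of standardising.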
14
Calculations for Deforestation Data
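$\sigma_X^2 = 527569.76$, $\sigma_X = 726.34$
$\sigma_Y^2 = 0.8615$, $\sigma_Y = 0.9282$
$\sigma_{X,Y} = 444.39$
$r_{X,Y} = \dfrac{\sigma_{X,Y}}{\sigma_X \sigma_Y} = 0.6592$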
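As a rough arithmetic check, the sketch below reads the slide's summary statistics as a covariance of 444.39, variances of 527569.76 and 0.8615, and standard deviations of 726.34 and 0.9282. That reading of the figures is an assumption. Because the inputs are rounded, the result agrees with the reported r only to about three decimal places.

```python
import math

# Summary statistics as read from the slide (rounded values; assumption).
var_x = 527569.76
var_y = 0.8615
sigma_x = 726.34
sigma_y = 0.9282
sigma_xy = 444.39

# The standard deviations should be the square roots of the variances.
print(round(math.sqrt(var_x), 2), round(math.sqrt(var_y), 4))

# Correlation = covariance / (product of standard deviations).
r = sigma_xy / (sigma_x * sigma_y)
print(r)
```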
15
Correlation and Causality
Must distinguish between causality and correlation.
Correlation does not imply causality.
Not even an indication from a correlation of which way the causality should run (from X to Y or the other way round).
Two trending time series variables may be spuriously correlated.
Causality is judgmental.
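The point about trending series can be illustrated with a small simulation. This is a sketch under stated assumptions: the two series, their trend slopes, the noise level and the seed are all invented, and the series are built independently of each other apart from both rising over time.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
time_index = list(range(30))

# Two series that share nothing except an upward trend (hypothetical).
series_a = [2.0 * t + rng.gauss(0.0, 1.0) for t in time_index]
series_b = [50.0 + 3.0 * t + rng.gauss(0.0, 1.0) for t in time_index]

r_levels = pearson_r(series_a, series_b)
print(r_levels)
```

The correlation comes out very close to 1 even though neither series was constructed to depend on the other.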
16
Example: UK Aggregate Consumption and Income
Aggregate UK consumption and income over a period of years are highly correlated.
Economists believe there is a relationship between these two variables.
Take correlation to be evidence in favour of the existence of a causal relationship: income causes consumption.
17
Time Series Plot of UK Aggregate Consumption and Income
[Figure: Time Series Plot of UK Constant Price Consumption and Income, £Million, 1955-1984. Horizontal axis: Time Index; series shown: Consumption and Income.]
18
Scatter Plot of UK Aggregate Consumption Against Income
[Figure: Scatter Plot of UK Constant Price Consumption Against Constant Price Income, £Million, 1955-1984. Horizontal axis: Income; vertical axis: Consumption. r=0.9982]
19
Another Example
Ratio of unemployment benefit to wages, X, and the unemployment rate, Y.
Annual observations for 1920-1938 for the UK.
Theory: X causes Y
Policy implication: r>0 implies cut benefits relative to wages to reduce unemployment.
20
Scatter Plot of Unemployment Against Benefit/Wage Ratio
[Figure: Scatter Plot of UK Unemployment Rate (Y) Against Benefit/Wage Ratio (X). Horizontal axis: Benefit/Wage Ratio; vertical axis: Unemployment Rate. One observation is singled out in the plot. r = 0.3888]
What happens to r if the observation singled out in the plot is not included?
21
Final Comments
Correlation measures linear association on scale [-1,+1].
r=-1,+1 indicates PERFECT linear correlation (exact straight line).
Only concerned with the relationship between TWO variables (bivariate).
This measure is sensitive to outliers.
Correlation may be taken as supportive evidence of a causal relationship, but correlation does not imply causality.
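The sensitivity to outliers can be seen in a short sketch. The data are invented for illustration (they are not the UK unemployment data): six points with a weak relationship, plus one extreme observation.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a weakly related cloud of six points.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 3.0, 2.0, 3.0, 2.0]

r_without = pearson_r(x, y)
# Add a single extreme observation far from the rest of the data.
r_with = pearson_r(x + [20.0], y + [20.0])
print(r_without, r_with)
```

One added point moves r from a weak positive value to a value close to 1, so it is worth checking whether a reported correlation rests on a handful of observations.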
22
Bivariate regression
Correlation can:
Indicate the strength of a relationship
It cannot:
Contribute to an understanding of how the variables may be related
Make predictions about Y based on knowledge of X
Regression analysis can:
Examine the nature of the relationship between X and Y
Make predictions from that.
23
Line of Best Fit to Forest Data
[Figure 2.1: Deforestation/Population Density Data with Line of Best Fit. Scatter plot of Forest Loss (Y) against Population Density; series shown: Y and Predicted Y.]
24
Introduction
What is the line of best fit?
How can it be defined?
What does it mean?
Can place line by eye, but non-systematic.
25
[Figure: Scatter Plot of UK Constant Price Consumption Against Constant Price Income, £Million, 1955-1984. Horizontal axis: Income; vertical axis: Consumption. r=0.9982]
UK consumption-income scatter plot gives a very strong indication of a linear relationship.
26
[Figure: Scatter Plot of UK Unemployment Rate (Y) Against Benefit/Wage Ratio (X). Horizontal axis: Benefit/Wage Ratio; vertical axis: Unemployment Rate. r=0.3888]
UK unemployment-benefit to wage ratio plot does not look linear.
27
Models
Simplest model: straight line
$Y = \alpha + \beta X$
Too constrained – will never hold exactly.
Allow for disturbances for each case, i=1,2,…,N
$Y_i = \alpha + \beta X_i + \varepsilon_i$
Properties of disturbances: on average zero, but they vary.
They have: mean zero, and variance denoted: $\sigma^2$
28
[Figure: Scatter Plot of Data Generated According to $Y = 1 + 2X + \varepsilon$, $\mathrm{Var}(\varepsilon) = 1$. Axes: X (horizontal), Y (vertical). r=0.9991]
29
[Figure: Scatter Plot of Data Generated According to $Y = 1 + 2X + \varepsilon$, $\mathrm{Var}(\varepsilon) = 20$. Axes: X (horizontal), Y (vertical). r=0.6749]
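Data of this kind can be generated with a short simulation. The sketch below assumes X takes the values 1 to 30 and that the disturbances are normally distributed; neither is stated on the slides, so the r values it prints will not match the ones in the figures. It only shows the direction of the effect: the larger the disturbance variance, the weaker the correlation.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_r(noise_variance, seed):
    """Generate Y = 1 + 2X + disturbance and return the sample correlation."""
    rng = random.Random(seed)
    xs = [float(k) for k in range(1, 31)]  # assumed X values
    sd = math.sqrt(noise_variance)
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, sd) for x in xs]
    return pearson_r(xs, ys)


r_low_noise = simulate_r(1.0, seed=1)
r_high_noise = simulate_r(20.0, seed=1)
print(r_low_noise, r_high_noise)
```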
30
So what?
We have a theory that allows us to think of there being an underlying linear relationship, but one that isn’t exact.
This fits with what we observe.
It leads to a statistical theory of errors, the real life equivalent of the theoretical disturbances, that eventually allows testing of various sorts.
31
Least Squares Line: Bivariate Linear Regression
Want the BEST LINEAR description of the way Y depends on X
Deforestation on population density, or consumption on income, or unemployment on the benefit to wage ratio.
Geometrically, we want the best fitting straight line to the data presented on a scatter plot.
Needs to be defined
32
[Figure: Scatter Plot of UK Unemployment Rate (Y) Against Benefit/Wage Ratio (X), r=0.3888. Horizontal axis: Benefit/Wage Ratio; vertical axis: Unemployment Rate. Several individual errors are marked, with the annotations "Lots of big errors, $e_i$" and "Errors smaller here".]
33
Want to calculate the best values of $\hat{\alpha}, \hat{\beta}$ in
$Y_i = \hat{\alpha} + \hat{\beta} X_i + e_i$, i=1,2,...,N.
34
Equation of the ‘fitted’ line – note that subscripts are not used here:
$Y = \hat{\alpha} + \hat{\beta} X$
Predicted (fitted) value of $Y_i$ given $X_i$:
$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$
35
[Figure: Scatter Plot of UK Unemployment Rate (Y) Against Benefit/Wage Ratio (X), r=0.3888, with the fitted line $\hat{Y} = \hat{\alpha} + \hat{\beta} X$. A data point $(X_i, Y_i)$ is marked, together with its observed value $Y_i$ and its fitted value $\hat{Y}_i$ at $X_i$.]
36
The Error
Also called the RESIDUAL
$e_i = Y_i - \hat{Y}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$
There are N of these, one for each i=1,2,…,N
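A minimal sketch of the residual calculation for a given line. The data and the candidate intercept and slope are invented for illustration; `alpha_hat` and `beta_hat` are hypothetical names for the two coefficients of the line.

```python
def residuals(xs, ys, alpha_hat, beta_hat):
    """Residual for each observation: actual Y minus fitted Y."""
    fitted = [alpha_hat + beta_hat * x for x in xs]
    return [y - y_hat for y, y_hat in zip(ys, fitted)]


# Hypothetical data and a candidate line Y = 1 + 2X.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 8.0]

e = residuals(x, y, alpha_hat=1.0, beta_hat=2.0)
print(e)  # one residual per observation: [0.0, 0.0, 0.0, 1.0]
```

The first three points lie exactly on the candidate line, so their residuals are zero; the last point lies one unit above it.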
37
The Best Line
Actually, a best line – others can be defined
That which minimises the sum of the squares of the errors
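The criterion can be sketched by brute force: compute the sum of squared errors for many candidate lines and keep the smallest. This is an illustration of the definition only, not the usual way the line is calculated; the data, the grid of candidate values and the function names are assumptions.

```python
def sum_squared_errors(xs, ys, alpha_hat, beta_hat):
    """Sum over observations of (actual Y - fitted Y) squared."""
    return sum((y - (alpha_hat + beta_hat * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 8.0]

# One candidate line placed "by eye": Y = 1 + 2X.
sse_by_eye = sum_squared_errors(x, y, 1.0, 2.0)

# Search a coarse grid of intercepts (0.0 to 2.0) and slopes (1.0 to 3.0),
# in steps of 0.1, for the pair with the smallest sum of squared errors.
best = None
for i in range(0, 21):
    for j in range(10, 31):
        a, b = i / 10, j / 10
        sse = sum_squared_errors(x, y, a, b)
        if best is None or sse < best[0]:
            best = (sse, a, b)

best_sse, best_alpha, best_beta = best
print(sse_by_eye, best_sse, best_alpha, best_beta)
```

For these numbers the grid search settles on an intercept of 0.8 and a slope of 2.3, with a smaller sum of squared errors than the line placed by eye.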