1 Correlation and Simple Regression. 2 Introduction Interested in the relationships between...


Correlation and Simple Regression


Introduction

Interested in the relationships between variables.

What will happen to one variable if another is changed?

To what extent is it the case that increases in the interest rate reduce inflation?

Might want to know how sensitive the relationship is, and if possible, what form it takes.

Models needed.


Koop’s Deforestation Data

Y: average annual forest loss, 1981-1990, as a % of total forested area

X: population density, number of people per 1000 hectares

Data on 70 tropical countries (N=70)


Line of Best Fit to Forest Data

[Figure 1.1: Deforestation/Population Density Data with Line of Best Fit. Forest loss (Y) plotted against population density, showing observed Y and predicted Y.]


Line of Best Fit to Forest Data

[Figure: Predicted Value of Forest Loss Given Population Density. Forest loss (Y) plotted against population density, showing observed Y and predicted Y.]


X = 2000 implies Y = 2.3. If there are 2000 people per 1000 hectares, forest loss would be about 2.3%.

Comments:

i) Increased dispersion about the line as X increases; more uncertainty about predictions for higher population densities.

ii) Ignores other impacts on deforestation.
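The prediction at X = 2000 can be checked roughly from the summary statistics reported elsewhere in these slides for the same data. The sketch below assumes those statistics read as a mean density of 639, a mean loss of 1.14, a variance of X of 527569.76 and a covariance of 444.39 (the extracted slide text is garbled, so these readings are assumptions), and it uses the standard least-squares formulas for a straight line, which the slides have not yet introduced at this point.

```python
# Summary statistics for the deforestation data as read from the slides.
# The extracted text is garbled, so treat these values as assumptions.
mean_x = 639.0       # mean population density (people per 1000 hectares)
mean_y = 1.14        # mean annual forest loss (%)
var_x = 527569.76    # sample variance of X
cov_xy = 444.39      # sample covariance of X and Y

# Standard least-squares formulas for the line Y = alpha + beta * X.
beta_hat = cov_xy / var_x
alpha_hat = mean_y - beta_hat * mean_x


def predict_forest_loss(density):
    """Predicted forest loss (%) at a given population density."""
    return alpha_hat + beta_hat * density


prediction = predict_forest_loss(2000)
print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.6f}")
print(f"predicted loss at X = 2000: {prediction:.2f}%")
```

Under these assumed inputs the predicted value comes out close to the 2.3% quoted on the slide.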


Correlation

Objectives of Correlation

To measure how close the relationship between two variables is to linearity (the strength of linear association)

Capture the sign of the relationship

Determine on a common scale for all cases: -1 to +1

Closer to zero, weaker correlation


Sample Covariance

X and Y vary about their mean values.

To what extent is this variation aligned?


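Scatter Plot of Forest Loss Against Population Density: Axes Crossing at Mean Points

[Figure: forest loss (Y) plotted against population density (X), with the axes redrawn to cross at the sample means, $\bar{X} = 639$ and $\bar{Y} = 1.14$. Each of the four quadrants is labelled with the signs of the deviations $X_i - \bar{X}$ and $Y_i - \bar{Y}$.]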


Deviations from Mean

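$X_i - \bar{X}$ and $Y_i - \bar{Y}$ have the same sign: $(X_i - \bar{X})(Y_i - \bar{Y}) > 0$

$X_i - \bar{X}$ and $Y_i - \bar{Y}$ have opposite signs: $(X_i - \bar{X})(Y_i - \bar{Y}) < 0$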


Sample Covariance Formula

$$\sigma_{X,Y} = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$$

Problem: the covariance varies with the scale of the data.
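A minimal sketch of the scale problem, using made-up numbers rather than the deforestation data: the same relationship is measured once with density in people per 1000 hectares and once in people per hectare. The function name and the data values are illustrative assumptions.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_products / (n - 1)


# Made-up illustrative data, not the deforestation sample.
density_per_1000ha = [100.0, 400.0, 900.0, 1500.0, 2500.0]
forest_loss_pct = [0.5, 0.9, 1.2, 2.1, 2.6]

# Same densities expressed per hectare instead of per 1000 hectares.
density_per_ha = [d / 1000 for d in density_per_1000ha]

cov_original = sample_covariance(density_per_1000ha, forest_loss_pct)
cov_rescaled = sample_covariance(density_per_ha, forest_loss_pct)

print(f"covariance, X per 1000 ha: {cov_original:.4f}")
print(f"covariance, X per ha:      {cov_rescaled:.7f}")
```

Changing the units of X by a factor of 1000 changes the covariance by the same factor, even though the underlying relationship is unchanged.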


Sample Correlation I

Standardise using sample standard deviations.

Sample variance:

$$\sigma_X^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2$$

Sample standard deviation:

$$\sigma_X = \sqrt{\sigma_X^2}$$


Sample Correlation II

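$$r_{X,Y} = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{N} X_i Y_i - N\bar{X}\bar{Y}}{\left(\sum_{i=1}^{N} X_i^2 - N\bar{X}^2\right)^{1/2}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right)^{1/2}}$$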


Calculations for Deforestation Data

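$\sigma_X = 726.34$

$\sigma_X^2 = 527569.76$

$\sigma_Y^2 = 0.8615$

$\sigma_Y = 0.9282$

$\sigma_{X,Y} = 444.39$

$r_{X,Y} = \dfrac{\sigma_{X,Y}}{\sigma_X \sigma_Y} = 0.6592$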
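As a quick arithmetic check, the correlation can be recomputed from the other figures on this slide. The sketch assumes the slide's values read as a standard deviation of X of 726.34, a standard deviation of Y of 0.9282 and a covariance of 444.39; the extracted text is garbled, so these readings are assumptions.

```python
# Values as read from the slide (assumed readings of garbled text).
sd_x = 726.34      # sample standard deviation of population density
sd_y = 0.9282      # sample standard deviation of forest loss
cov_xy = 444.39    # sample covariance of X and Y

# The variances should be the squares of the standard deviations.
var_x = sd_x ** 2
var_y = sd_y ** 2

# Correlation: covariance standardised by both standard deviations.
r = cov_xy / (sd_x * sd_y)

print(f"variance of X: {var_x:.2f}")
print(f"variance of Y: {var_y:.4f}")
print(f"correlation:   {r:.3f}")
```

The recomputed variances and correlation agree with the slide's figures up to rounding, which supports this reading of the numbers.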


Correlation and Causality

Must distinguish between causality and correlation.

Correlation does not imply causality.

A correlation does not even indicate which way the causality should run (from X to Y or the other way round).

Two trending time series variables may be spuriously correlated.

Causality is judgmental.
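The point about trending series can be illustrated with simulated data. In this sketch two series are built from unrelated random noise plus separate upward trends; the trend slopes, noise level, seed and series length are arbitrary assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def first_differences(series):
    """Period-to-period changes, which remove a linear trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(42)  # fixed seed so the run is repeatable
periods = range(30)

# Two series with independent noise; all they share is an upward trend.
series_a = [2.0 * t + rng.gauss(0.0, 3.0) for t in periods]
series_b = [5.0 * t + rng.gauss(0.0, 3.0) for t in periods]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(first_differences(series_a), first_differences(series_b))

print(f"correlation of levels:  {r_levels:.3f}")
print(f"correlation of changes: {r_changes:.3f}")
```

The levels are strongly correlated only because both series trend upwards; once the trend is removed by differencing, the correlation is much weaker.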


Example: UK Aggregate Consumption and Income

Aggregate UK consumption and income over a period of years are highly correlated.

Economists believe there is a relationship between these two variables.

Take correlation to be evidence in favour of the existence of a causal relationship: income causes consumption.


Time Series Plot of UK Aggregate Consumption and Income

[Figure: Time Series Plot of UK Constant Price Consumption and Income, £ million, 1955-1984. Consumption and income plotted against a time index.]


Scatter Plot of UK Aggregate Consumption Against Income

[Figure: Scatter Plot of UK Constant Price Consumption Against Constant Price Income, £ million, 1955-1984. r = 0.9982.]


Another Example

Ratio of unemployment benefit to wages, X, and the unemployment rate, Y.

Annual observations for 1920-1938 for the UK.

Theory: X causes Y.

Policy implication: r > 0 implies cut benefits relative to wages to reduce unemployment.


Scatter Plot of Unemployment Against Benefit/Wage Ratio

[Figure: Scatter Plot of UK Unemployment Rate (Y) Against Benefit/Wage Ratio (X), with one observation singled out.]

r = 0.3888

What happens to r if the singled-out observation is not included?


Final Comments

Correlation measures linear association on the scale [-1,+1].

r = -1 or r = +1 indicates PERFECT linear correlation (an exact straight line).

Only concerned with the relationship between TWO variables (bivariate).

This measure is sensitive to outliers.

Correlation may be taken as supportive evidence of a causal relationship, but correlation does not imply causality.
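The sensitivity to outliers can be seen in a small made-up example. The eight base points below are constructed to have no linear association at all, and a single extreme point is then added; none of these numbers come from the unemployment data.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data with no linear association between X and Y.
base_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
base_y = [5.0, 3.0, 6.0, 4.0, 5.0, 3.0, 6.0, 4.0]

# One extreme observation, far from the rest in both directions.
outlier_x, outlier_y = 20.0, 20.0

r_without_outlier = pearson_r(base_x, base_y)
r_with_outlier = pearson_r(base_x + [outlier_x], base_y + [outlier_y])

print(f"r without the outlier: {r_without_outlier:.3f}")
print(f"r with the outlier:    {r_with_outlier:.3f}")
```

A single observation is enough to move r from no correlation to a strong positive one, which is why it is worth asking what happens to r when an unusual point is dropped.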


Bivariate regression

Correlation can indicate the strength of a relationship.

It cannot contribute to an understanding of how the variables may be related, or make predictions about Y based on knowledge of X.

Regression analysis can examine the nature of the relationship between X and Y, and make predictions from that.


Line of Best Fit to Forest Data

[Figure 2.1: Deforestation/Population Density Data with Line of Best Fit. Forest loss (Y) plotted against population density, showing observed Y and predicted Y.]


Introduction

What is the line of best fit?

How can it be defined?

What does it mean?

Can place the line by eye, but that is non-systematic.


[Figure: Scatter Plot of UK Constant Price Consumption Against Constant Price Income, £ million, 1955-1984. r = 0.9982.]

UK consumption-income scatter plot gives a very strong indication of a linear relationship.


[Figure: Scatter Plot of UK Unemployment Rate (Y) Against Benefit/Wage Ratio (X). r = 0.3888.]

The plot of UK unemployment against the benefit-to-wage ratio does not look linear.


Models

Simplest model: a straight line, $Y = \alpha + \beta X$.

Too constrained: it will never hold exactly.

Allow for disturbances for each case, i = 1,2,…,N:

$$Y_i = \alpha + \beta X_i + \varepsilon_i$$

Properties of disturbances: on average zero, but they vary.

They have mean zero and a variance denoted $\sigma^2$.


[Figure: Scatter Plot of Data Generated According to $Y = 1 + 2X + \varepsilon$, $\mathrm{Var}(\varepsilon) = 1$. r = 0.9991.]


[Figure: Scatter Plot of Data Generated According to $Y = 1 + 2X + \varepsilon$, $\mathrm{Var}(\varepsilon) = 20$. r = 0.6749.]
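The effect of the disturbance variance on the correlation can be reproduced in a small simulation. This is a sketch only: the X values, the two noise standard deviations and the seed are assumptions, so the resulting correlations will not match the values quoted for the plots above.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [float(x) for x in range(35)]  # assumed X values: 0, 1, ..., 34


def simulate_y(noise_sd):
    """Draw Y = 1 + 2X + disturbance, with mean-zero normal disturbances."""
    return [1.0 + 2.0 * x + rng.gauss(0.0, noise_sd) for x in xs]


y_low_noise = simulate_y(1.0)    # small disturbances
y_high_noise = simulate_y(20.0)  # large disturbances

r_low = pearson_r(xs, y_low_noise)
r_high = pearson_r(xs, y_high_noise)

print(f"r with small disturbances: {r_low:.4f}")
print(f"r with large disturbances: {r_high:.4f}")
```

The underlying line is the same in both cases; only the spread of the disturbances differs, and the correlation falls as that spread grows.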


So what?

We have a theory that allows us to think of there being an underlying linear relationship, but one that isn’t exact.

This fits with what we observe.

It leads to a statistical theory of errors, the real-life equivalent of the theoretical disturbances, that eventually allows testing of various sorts.


Least Squares Line: Bivariate Linear Regression

Want the BEST LINEAR description of the way Y depends on X

Deforestation on population density, or consumption on income, or unemployment on the benefit to wage ratio.

Geometrically, we want the best fitting straight line to the data presented on a scatter plot.

Needs to be defined


[Figure: Scatter Plot of UK Unemployment Rate (Y) Against Benefit/Wage Ratio (X), r = 0.3888, with a line drawn through the data and the vertical errors $e_i$ marked. Annotations: lots of big errors $e_i$ in one part of the plot; errors smaller in another.]


Want to calculate the best values of $\hat{\alpha}$ and $\hat{\beta}$ in

$$Y_i = \hat{\alpha} + \hat{\beta} X_i + e_i, \qquad i = 1, 2, \ldots, N.$$


Equation of the ‘fitted’ line (note that subscripts are not used here):

$$Y = \hat{\alpha} + \hat{\beta} X$$

Predicted (fitted) value of $Y_i$ given $X_i$:

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$$


[Figure: Scatter Plot of UK Unemployment Rate (Y) Against Benefit/Wage Ratio (X), r = 0.3888, with the fitted line $\hat{Y} = \hat{\alpha} + \hat{\beta} X$. For one observation $(X_i, Y_i)$, the plot marks the actual value $Y_i$ and the fitted value $\hat{Y}_i$ at $X_i$.]


The Error

Also called the RESIDUAL

$$e_i = Y_i - \hat{Y}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$$

There are N of these, one for each i = 1,2,…,N.


The Best Line

Actually, a best line – others can be defined

That which minimises the sum of the squares of the errors.
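A minimal sketch of this criterion on made-up data. The closed-form expressions used for the intercept and slope are the standard least-squares solutions, which these slides have not derived, and the function names and data values are illustrative assumptions. The sketch then checks that nudging either coefficient away from the fitted value increases the sum of squared errors.

```python
def fit_least_squares(xs, ys):
    """Intercept and slope of the least-squares line (standard closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta_hat = sxy / sxx
    alpha_hat = mean_y - beta_hat * mean_x
    return alpha_hat, beta_hat


def sum_squared_errors(alpha, beta, xs, ys):
    """Sum over all cases of (Y_i - alpha - beta * X_i) squared."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

alpha_hat, beta_hat = fit_least_squares(xs, ys)
residuals = [y - alpha_hat - beta_hat * x for x, y in zip(xs, ys)]
best_sse = sum_squared_errors(alpha_hat, beta_hat, xs, ys)

# Any other line should do worse on the sum-of-squares criterion.
nearby_lines = [
    (alpha_hat + 0.1, beta_hat),
    (alpha_hat - 0.1, beta_hat),
    (alpha_hat, beta_hat + 0.1),
    (alpha_hat, beta_hat - 0.1),
]
nearby_sse = [sum_squared_errors(a, b, xs, ys) for a, b in nearby_lines]

print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
print(f"sum of squared errors at the fitted line: {best_sse:.4f}")
print(f"smallest sum among the nearby lines:      {min(nearby_sse):.4f}")
```

The nearby lines are only a spot check; the criterion itself ranges over all possible intercepts and slopes.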