Regression Analysis

Lecturer: Dr. Bo Yuan

E-mail: [email protected]

Page 2: Regression Analysis

Regression

To express the relationship between two or more variables by a mathematical formula.

x: predictor (independent) variable

y: response (dependent) variable

Identify how y varies as a function of x.

y is also considered as a random variable.

Real-World Example:

Footwear impressions are commonly observed at crime scenes. While numerous forensic properties can be obtained from these impressions, one in particular is the shoe size. Detectives would like to estimate the height of the impression maker from the shoe size.

The relationship between shoe sizes and heights

Page 3: Regression Analysis

Shoe Size vs. Height

Page 4: Regression Analysis

Shoe Size vs. Height

What is the predictor?

What is the response?

Can the height be accurately estimated from the shoe size?

If a shoe size is 11, what would you advise the police?

What if the size is 7 or 12.5?

Page 5: Regression Analysis

General Regression Model

$$y(x) = m(x) + \varepsilon(x)$$

The systematic part m(x) is deterministic.

The error ε(x) is a random variable.

Measurement Error

Natural Variations

Additive

Page 6: Regression Analysis

Example: Sin Function

$$y(x) = A\sin(\omega x + \varphi) + \varepsilon(x)$$
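A minimal Python sketch of the sine example, assuming NumPy is available. The amplitude, frequency, phase, and noise level are illustrative values chosen here, not taken from the lecture, and the Gaussian form of the error is also an assumption.

```python
import numpy as np

def noisy_sine(x, A=2.0, omega=1.5, phi=0.3, sigma=0.5, seed=0):
    """Return (m, y): the deterministic part m(x) and noisy observations y(x)."""
    rng = np.random.default_rng(seed)
    m = A * np.sin(omega * x + phi)               # systematic part m(x)
    eps = rng.normal(0.0, sigma, size=x.shape)    # additive random error (assumed Gaussian)
    return m, m + eps

x = np.linspace(0.0, 2.0 * np.pi, 200)
m, y = noisy_sine(x)
print("sample standard deviation of the error:", np.std(y - m))
```

Repeating the call with a different seed gives a different y for the same m, which is the sense in which y is a random variable.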

Page 7: Regression Analysis

Standard Assumptions

Page 8: Regression Analysis

A1

Page 9: Regression Analysis

A2

Page 10: Regression Analysis

A3

Page 11: Regression Analysis

Back to Shoes

Page 12: Regression Analysis

Simple Linear Regression

$$m(x) = \beta_0 + \beta_1 x$$

Page 13: Regression Analysis

Model Parameters

Page 14: Regression Analysis

Derivation

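$$R(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

$$\frac{\partial R}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0 \;\Rightarrow\; \beta_0 = \bar{y} - \beta_1 \bar{x}$$

$$\frac{\partial R}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0$$

$$\sum_{i=1}^{n} \left( x_i y_i - \bar{y} x_i + \beta_1 \bar{x} x_i - \beta_1 x_i^2 \right) = 0$$

$$\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$$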
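The closed-form least-squares estimates for a straight line can be sketched in Python as follows. The function name fit_line and the data are hypothetical; the points are chosen to lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares estimates (beta0, beta1) for m(x) = beta0 + beta1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: (sum x_i y_i - n x_bar y_bar) / (sum x_i^2 - n x_bar^2)
    beta1 = (np.sum(x * y) - n * x_bar * y_bar) / (np.sum(x**2) - n * x_bar**2)
    # Intercept: y_bar - beta1 * x_bar
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical noise-free data on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
beta0, beta1 = fit_line(x, y)
print(beta0, beta1)
```

np.polyfit(x, y, 1) should return the same two numbers, with the slope listed first.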

Page 15: Regression Analysis

Standard Deviations

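$$\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \varepsilon_i^2$$

$$\sigma_{\beta_0} = \sigma \left( \frac{\sum_{i=1}^{n} x_i^2}{n \left( \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 \right)} \right)^{1/2}$$

$$\sigma_{\beta_1} = \sigma \left( \frac{1}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \right)^{1/2}$$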
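A sketch of how the standard deviations of the two estimates might be computed, assuming the usual textbook formulas: the noise variance is estimated from the residuals with n - 2 degrees of freedom, and the data are synthetic (a hypothetical line y = 1 + 2x plus unit-variance Gaussian noise).

```python
import numpy as np

def line_standard_errors(x, y):
    """Return (sigma, se_beta0, se_beta1) for a least-squares line fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar = x.mean()
    sxx = np.sum(x**2) - n * x_bar**2              # sum x_i^2 - n x_bar^2
    beta1 = (np.sum(x * y) - n * x_bar * y.mean()) / sxx
    beta0 = y.mean() - beta1 * x_bar
    resid = y - beta0 - beta1 * x                  # estimated errors
    sigma = np.sqrt(np.sum(resid**2) / (n - 2))    # residual standard deviation
    se_beta0 = sigma * np.sqrt(np.sum(x**2) / (n * sxx))
    se_beta1 = sigma / np.sqrt(sxx)
    return sigma, se_beta0, se_beta1

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)
sigma, se0, se1 = line_standard_errors(x, y)
print(sigma, se0, se1)
```

The same values appear as the square roots of the diagonal of sigma^2 (X^T X)^{-1}, where X has a column of ones and a column of x.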

Page 16: Regression Analysis

Polynomial Terms

Modeling the data as a line is not always adequate.

Polynomial Regression

$$m(x) = \beta_0 + \beta_1 x + \dots + \beta_p x^p = \sum_{k=0}^{p} \beta_k x^k$$

This is still a linear model!

m(x) is a linear combination of β.

Danger of Overfitting
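Because a polynomial is linear in its coefficients, it can be fitted with ordinary linear least squares on the columns 1, x, ..., x^p. The sketch below illustrates this and the overfitting risk. The data, the degrees tried, and the every-other-point split are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 40)
y = x**2 + rng.normal(0.0, 1.0, size=x.size)

# Hold out every other point to see how each fit behaves on unseen data.
train = np.arange(x.size) % 2 == 0
held_out = ~train

def design_matrix(x, p):
    """Columns 1, x, ..., x^p, so that m(x) = design_matrix(x, p) @ beta."""
    return np.vander(x, p + 1, increasing=True)

train_sse, held_out_sse = {}, {}
for p in (1, 2, 6):
    beta, *_ = np.linalg.lstsq(design_matrix(x[train], p), y[train], rcond=None)
    train_sse[p] = np.sum((y[train] - design_matrix(x[train], p) @ beta) ** 2)
    held_out_sse[p] = np.sum((y[held_out] - design_matrix(x[held_out], p) @ beta) ** 2)
    print(p, train_sse[p], held_out_sse[p])
```

The training error can only go down as p grows, since each model contains the previous one. The held-out error carries no such guarantee, which is why a low training error alone is not evidence of a good model.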

Page 17: Regression Analysis

Matrix Representation

$$y_i = \sum_{k=0}^{p} \beta_k x_i^k + \varepsilon_i$$

$$Y = X\beta + \varepsilon$$

Page 18: Regression Analysis

Matrix Representation

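$$R(\beta) = (Y - X\beta)^T (Y - X\beta) = Y^T Y - Y^T X \beta - \beta^T X^T Y + \beta^T X^T X \beta$$

$$\frac{\partial R}{\partial \beta} = -2 X^T Y + 2 X^T X \beta = 0 \;\Rightarrow\; X^T X \beta = X^T Y$$

$$\beta = (X^T X)^{-1} X^T Y$$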
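A minimal sketch of the matrix solution in Python. It solves the normal equations X^T X β = X^T Y with a linear solver rather than forming the inverse explicitly. The function name and the noise-free quadratic data are hypothetical, chosen so the recovered coefficients are known in advance.

```python
import numpy as np

def solve_normal_equations(X, Y):
    """Solve (X^T X) beta = X^T Y for beta."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = 1.0 - 2.0 * x + 0.5 * x**2           # exact quadratic, no noise
X = np.vander(x, 3, increasing=True)     # columns: 1, x, x^2
beta = solve_normal_equations(X, Y)
print(beta)
```

For ill-conditioned design matrices (for example, high polynomial degrees), np.linalg.lstsq is usually the safer choice because it avoids forming X^T X.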

Page 19: Regression Analysis

Model Comparison

Total Sum of Squares:

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Error Sum of Squares:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Page 20: Regression Analysis

R²

$$R^2 = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}$$

$$R^2_{adj} = 1 - \frac{SSE / (n - (p + 1))}{SST / (n - 1)}$$
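The two measures can be sketched in Python as below. The function names are hypothetical, p is taken to be the polynomial degree (so the model has p + 1 coefficients), and the five data points are made up so that SST = 10 and SSE = 1 can be checked by hand.

```python
import numpy as np

def sums_of_squares(y, y_hat):
    """Return (SST, SSE) for observations y and fitted values y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sst = np.sum((y - y.mean()) ** 2)
    sse = np.sum((y - y_hat) ** 2)
    return sst, sse

def r_squared(y, y_hat):
    sst, sse = sums_of_squares(y, y_hat)
    return 1.0 - sse / sst

def adjusted_r_squared(y, y_hat, p):
    """Penalize the number of coefficients (p + 1) used by the model."""
    n = len(y)
    sst, sse = sums_of_squares(y, y_hat)
    return 1.0 - (sse / (n - (p + 1))) / (sst / (n - 1))

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = np.array([1.0, 2.0, 3.0, 4.0, 4.0])   # one prediction is off by 1
r2 = r_squared(y, y_hat)
r2_adj = adjusted_r_squared(y, y_hat, p=1)
print(r2, r2_adj)
```

The adjusted value is always at most R² for p ≥ 1 and SSE > 0, which makes it the more cautious figure when comparing models with different numbers of terms.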

Page 21: Regression Analysis

Example

[Figure: data generated from Y = X² + N(0, 1) for X between 0 and 5, with linear and quadratic fits]

Linear fit: Y = -3.6029 + 4.8802X, R² = 0.9131

Quadratic fit: Y = 0.7341 - 0.4303X + 1.0621X², R² = 0.9880
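An experiment of this kind can be sketched in Python as follows. The sample size, the grid of X values, and the random seed are assumptions, so the fitted coefficients and R² values will differ from the numbers on the slide.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0.0, 5.0, 50)                      # assumed sampling grid
y = x**2 + rng.normal(0.0, 1.0, size=x.size)       # Y = X^2 + N(0, 1)

def fit_and_score(x, y, degree):
    """Fit a polynomial of the given degree and return (coefficients, R^2)."""
    coeffs = np.polyfit(x, y, degree)              # highest power first
    y_hat = np.polyval(coeffs, x)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return coeffs, 1.0 - sse / sst

lin_coeffs, r2_lin = fit_and_score(x, y, 1)
quad_coeffs, r2_quad = fit_and_score(x, y, 2)
print("linear:   ", lin_coeffs, r2_lin)
print("quadratic:", quad_coeffs, r2_quad)
```

The quadratic fit cannot score a lower R² than the linear one on the same data, because the linear model is a special case of it.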

Page 22: Regression Analysis

Tricky Relationship

[Figure: Fitness (vertical axis) vs. Exercise Time (horizontal axis), with separate groups for Youth and Elderly]

Page 23: Regression Analysis

Violent Crime vs. Video Game

[Figure: yearly series for 2000 to 2012 showing Violent Crime, Aggravated Assault, Robbery, Murder & Manslaughter, and Forcible Rape alongside Video Game Sales]

Page 24: Regression Analysis

Is This True?

Page 25: Regression Analysis

Where Did the Time Go?

Page 27: Regression Analysis

Summary

Regression is one of the oldest data mining techniques.

Probably the first thing that you want to try on a new data set.

No need to do programming!

Matlab, Excel …

Quality of Regression

R²

Residual Plot

Cross Validation
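As one possible illustration of cross validation, the sketch below scores polynomial fits of different degrees by k-fold held-out error. The function name kfold_mse, the choice of k = 5, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Average held-out mean squared error of a polynomial fit over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(x.size), k)
    errors = []
    for fold in folds:
        train = np.ones(x.size, dtype=bool)
        train[fold] = False                         # True marks training points
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[fold])
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 60)
y = x**2 + rng.normal(0.0, 1.0, size=x.size)
scores = {d: kfold_mse(x, y, d) for d in (1, 2, 3)}
print(scores)
```

Unlike R² computed on the fitting data, this score does not automatically improve when more terms are added, so it can be used to choose between candidate models.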

What you should learn after class:

The Influence of Outliers

Confidence Interval

Nonlinear Regression
