Regression Analysis

Post on 02-Jan-2016


Regression Analysis. Lecturer: Dr. Bo Yuan E-mail: yuanb@sz.tsinghua.edu.cn. Regression. To express the relationship between two or more variables by a mathematical formula. x : predictor (independent) variable y : response (dependent) variable - PowerPoint PPT Presentation

Transcript of Regression Analysis


Regression Analysis

Lecturer: Dr. Bo Yuan

E-mail: yuanb@sz.tsinghua.edu.cn

Regression

To express the relationship between two or more variables by a mathematical formula.

x: predictor (independent) variable

y: response (dependent) variable

Identify how y varies as a function of x.

y is also considered as a random variable.

Real-World Example:

Footwear impressions are commonly observed at crime scenes. While numerous forensic properties can be obtained from these impressions, one in particular is the shoe size. Detectives would like to estimate the height of the impression maker from the shoe size.

The relationship between shoe sizes and heights

Shoe Size vs. Height


What is the predictor?

What is the response?

Can the height be accurately estimated from the shoe size?

If a shoe size is 11, what would you advise the police?

What if the size is 7 or 12.5?


General Regression Model

The systematic part m(x) is deterministic.

The error ε(x) is a random variable.

Measurement Error

Natural Variations

Additive

$$y(x) = m(x) + \varepsilon(x)$$

Example: Sin Function

$$y(x) = A \sin(\omega x + \phi) + \varepsilon(x)$$
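To make the additive model concrete, the sketch below simulates noisy observations around a sine-shaped systematic part. It is only an illustration: the function name `simulate_sine`, the frequency and phase parameters, the Gaussian form of the error, and every numeric value are assumptions rather than details from the lecture.

```python
import math
import random


def simulate_sine(xs, amplitude=1.0, omega=1.0, phase=0.0, noise_sd=0.1, seed=0):
    """Return (systematic, observed) values for y(x) = A*sin(omega*x + phase) + eps(x).

    The error eps(x) is drawn independently from N(0, noise_sd^2); this choice
    of error distribution is an assumption made for the illustration.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    systematic = [amplitude * math.sin(omega * x + phase) for x in xs]
    observed = [m + rng.gauss(0.0, noise_sd) for m in systematic]
    return systematic, observed


# Hypothetical settings: 63 points covering roughly one period of the sine.
xs = [i * 0.1 for i in range(63)]
m_vals, y_vals = simulate_sine(xs, amplitude=2.0, noise_sd=0.3)

# The residuals y(x) - m(x) are exactly the simulated errors.
residuals = [y - m for y, m in zip(y_vals, m_vals)]
```

Setting `noise_sd=0.0` returns the deterministic part alone, which makes the split between m(x) and the error term easy to see.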

Standard Assumptions

A1

A2

A3

Back to Shoes

Simple Linear Regression

$$m(x) = \beta_0 + \beta_1 x$$

Model Parameters


Derivation

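$$R(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

$$\frac{\partial R}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0 \;\Rightarrow\; \beta_0 = \bar{y} - \beta_1 \bar{x}$$

$$\frac{\partial R}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0$$

$$\sum_{i=1}^{n} \left( x_i y_i - \bar{y} x_i + \beta_1 \bar{x} x_i - \beta_1 x_i^2 \right) = 0$$

$$\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$$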
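The least-squares estimates for the line m(x) = β0 + β1·x have a closed form: β1 = (Σ x_i·y_i − n·x̄·ȳ) / (Σ x_i² − n·x̄²) and β0 = ȳ − β1·x̄. A minimal sketch of that computation follows; the function name `fit_simple_linear` and the toy data are hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Closed-form least-squares estimates (beta0, beta1) for y = beta0 + beta1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope estimate.
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Toy data lying exactly on y = 1 + 2x, so the fit should recover (1, 2).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
beta0, beta1 = fit_simple_linear(xs, ys)
```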

Standard Deviations

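$$\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \varepsilon_i^2$$

$$\sigma_{\beta_0} = \sigma \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \right)^{1/2}$$

$$\sigma_{\beta_1} = \sigma \left( \frac{1}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \right)^{1/2}$$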
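As a rough illustration of how these standard deviations are computed in practice, the sketch below uses the usual textbook estimates for simple linear regression: the residual variance is SSE/(n − 2), the standard error of the slope is σ/√Sxx, and that of the intercept is σ·√(1/n + x̄²/Sxx), with Sxx = Σ x_i² − n·x̄². The function name `slr_standard_errors` and the data are hypothetical.

```python
import math


def slr_standard_errors(xs, ys):
    """Return (sigma, se_beta0, se_beta1) for a simple linear regression fit."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    if sxx == 0:
        raise ValueError("all x values are identical")
    beta1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / sxx
    beta0 = y_bar - beta1 * x_bar
    # Residual sum of squares, divided by n - 2 because two parameters were fitted.
    sse = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(sse / (n - 2))
    se_beta0 = sigma * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    se_beta1 = sigma / math.sqrt(sxx)
    return sigma, se_beta0, se_beta1


# Hypothetical data with some scatter around a line.
sigma, se_beta0, se_beta1 = slr_standard_errors(
    [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0]
)
```

The intercept is typically less certain than the slope when the x values sit far from zero, because the x̄² term inflates its standard error.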

Polynomial Terms

Modeling the data as a line is not always adequate.

Polynomial Regression

This is still a linear model!

m(x) is a linear combination of β.

Danger of Overfitting

$$m(x) = \beta_0 + \beta_1 x + \dots + \beta_p x^p = \sum_{k=0}^{p} \beta_k x^k$$
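A small sketch may help show why a polynomial model still counts as linear: m(x) is a dot product between the coefficients and the features 1, x, ..., x^p, so it is linear in the coefficients even though it is nonlinear in x. The helper names and coefficient values here are hypothetical.

```python
def design_row(x, degree):
    """Polynomial features [1, x, x^2, ..., x^degree] for a single input x."""
    return [x ** k for k in range(degree + 1)]


def predict(beta, x):
    """Evaluate m(x) = sum_k beta[k] * x^k as a dot product with the features."""
    row = design_row(x, len(beta) - 1)
    return sum(b * f for b, f in zip(beta, row))


# Two hypothetical coefficient vectors of degree 2.
beta_a = [1.0, -2.0, 0.5]   # m_a(x) = 1 - 2x + 0.5x^2
beta_b = [0.0, 1.0, 1.0]    # m_b(x) = x + x^2
beta_sum = [a + b for a, b in zip(beta_a, beta_b)]

x = 3.0
# Linearity in beta: predicting with the summed coefficients equals
# the sum of the two separate predictions.
lhs = predict(beta_sum, x)
rhs = predict(beta_a, x) + predict(beta_b, x)
```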

Matrix Representation

$$y_i = \sum_{k=0}^{p} \beta_k x_i^k + \varepsilon_i$$

$$Y = X\beta + \varepsilon$$

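$$R(\beta) = (Y - X\beta)^T (Y - X\beta) = Y^T Y - Y^T X \beta - \beta^T X^T Y + \beta^T X^T X \beta$$

$$\frac{\partial R}{\partial \beta} = -2 X^T Y + 2 X^T X \beta = 0 \;\Rightarrow\; X^T X \beta = X^T Y$$

$$\beta = (X^T X)^{-1} X^T Y$$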
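The matrix solution can be sketched without any numerical library by forming XᵀX and XᵀY and solving the resulting linear system, the normal equations XᵀXβ = XᵀY. This is an illustrative sketch only: the function names are hypothetical, and Gaussian elimination with partial pivoting stands in here for the explicit matrix inverse, which is an implementation choice rather than anything stated in the lecture.

```python
def design_matrix(xs, degree):
    """Rows [1, x, ..., x^degree], one per observation."""
    return [[x ** k for k in range(degree + 1)] for x in xs]


def solve_linear_system(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients from the normal equations X^T X beta = X^T Y."""
    x_mat = design_matrix(xs, degree)
    p = degree + 1
    xtx = [[sum(row[i] * row[j] for row in x_mat) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(x_mat, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Noise-free data from a known quadratic, so the fit should recover it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 - 2.0 * x + 0.5 * x ** 2 for x in xs]
beta = fit_polynomial(xs, ys, degree=2)
```

For high polynomial degrees XᵀX becomes badly conditioned, so a production implementation would more likely rely on a QR or SVD based least-squares routine.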

Model Comparison

Total Sum of Squares:

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Error Sum of Squares:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

R²

$$R^2 = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}$$

$$R^2_{adj} = 1 - \frac{SSE / (n - (p + 1))}{SST / (n - 1)}$$
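Both measures follow directly from the two sums of squares: R² = 1 − SSE/SST, and the adjusted version divides SSE by n − (p + 1) and SST by n − 1, where p is the polynomial degree. A minimal sketch, with a hypothetical function name and made-up observations and predictions:

```python
def r_squared(ys, y_hats, degree):
    """Return (R^2, adjusted R^2) for predictions y_hats of observations ys.

    `degree` is p, so the model has p + 1 fitted coefficients.
    """
    n = len(ys)
    if n != len(y_hats) or n <= degree + 1:
        raise ValueError("need more observations than fitted coefficients")
    y_bar = sum(ys) / n
    sst = sum((y - y_bar) ** 2 for y in ys)                 # total sum of squares
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))   # error sum of squares
    if sst == 0:
        raise ValueError("all y values are identical; R^2 is undefined")
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / (n - (degree + 1))) / (sst / (n - 1))
    return r2, r2_adj


# Hypothetical observations and the predictions of a straight-line fit (p = 1).
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hats = [2.8, 3.4, 4.0, 4.6, 5.2]
r2, r2_adj = r_squared(ys, y_hats, degree=1)
```

Because the adjusted value penalizes extra coefficients, it can fall when a higher-degree term adds little explanatory power, whereas plain R² never decreases as terms are added.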

Example

[Figure: data generated from Y = X² + N(0,1), plotted as Y against X for X from 0 to 5, together with two fitted models]

Linear fit: Y = -3.6029 + 4.8802X, R² = 0.9131

Quadratic fit: Y = 0.7341 - 0.4303X + 1.0621X², R² = 0.9880

Tricky Relationship

[Figure: Fitness plotted against Exercise Time for two groups, Youth and Elderly]

Violent Crime vs. Video Game

[Figure: yearly series for 2000 to 2012 showing Violent Crime, Aggravated Assault, Robbery, Murder & Manslaughter, and Forcible Rape alongside Video Game Sales]

Is this true?

Where did the time go?

Summary

Regression is the oldest data mining technique.

Probably the first thing that you want to try on a new data set.

No need to do programming!

Matlab, Excel …

Quality of Regression

R²

Residual Plot

Cross Validation

What you should learn after class:

The Influence of Outliers

Confidence Interval

Nonlinear Regression
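Cross validation, listed above as one way to judge the quality of a regression, can be sketched as follows: hold out part of the data, fit on the rest, and measure the squared prediction error on the held-out part. This is a minimal k-fold sketch for a straight-line model; the function names, the interleaved fold assignment, and the data are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares estimates (beta0, beta1) for y = beta0 + beta1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    if sxx == 0:
        raise ValueError("all x values in the training fold are identical")
    beta1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / sxx
    return y_bar - beta1 * x_bar, beta1


def k_fold_mse(xs, ys, k=5):
    """Average held-out mean squared error over k interleaved folds."""
    n = len(xs)
    if n != len(ys) or not 2 <= k <= n:
        raise ValueError("need 2 <= k <= number of observations")
    fold_errors = []
    for fold in range(k):
        # Fold `fold` holds out every k-th point, starting at index `fold`.
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        beta0, beta1 = fit_line(train_x, train_y)
        errors = [(ys[i] - beta0 - beta1 * xs[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k


# Hypothetical data: an exact line, and the same line with alternating +/-1 offsets.
xs = [float(i) for i in range(10)]
exact_ys = [2.0 + 3.0 * x for x in xs]
noisy_ys = [y + (1.0 if i % 2 == 0 else -1.0) for i, y in enumerate(exact_ys)]

mse_exact = k_fold_mse(xs, exact_ys, k=5)
mse_noisy = k_fold_mse(xs, noisy_ys, k=5)
```

Unlike R² computed on the training data, the held-out error does not automatically improve as the model becomes more flexible, which is why it is useful for detecting overfitting. Shuffling the data before assigning folds is usually advisable when the observations are ordered.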
