Regression Analysis
Uploaded by aileen-rodriguez
Regression
Regression expresses the relationship between two or more variables as a mathematical formula.
x: predictor (independent) variable
y: response (dependent) variable
Identify how y varies as a function of x.
y is also treated as a random variable.
Real-World Example:
Footwear impressions are commonly observed at crime scenes.
While there are numerous forensic properties that can be obtained
from these impressions, one in particular is the shoe size. The
detectives would like to be able to estimate the height of the
impression maker from the shoe size.
The relationship between shoe sizes and heights
Shoe Size vs. Height
[Figure: Shoe Size vs. Height]
Shoe Size vs. Height
What is the predictor?
What is the response?
Can the height be accurately estimated from the shoe size?
If a shoe size is 11, what would you advise the police?
What if the size is 7 or 12.5?
General Regression Model
$y(x) = m(x) + \varepsilon(x)$
The systematic part m(x) is deterministic.
The error ε(x) is a random variable, arising from:
Measurement Error
Natural Variations
The error is additive.
Example: Sin Function
$y(x) = A\sin(x) + \varepsilon(x)$
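A minimal Python sketch of drawing data from this model, assuming Gaussian noise ε(x) ~ N(0, σ²). The function name `simulate_sin`, the amplitude A = 2.0, the noise level σ = 0.3, and the x grid are illustrative choices, not values from the slides.

```python
import math
import random


def simulate_sin(xs, amplitude, noise_sd, seed=0):
    """Draw y(x) = A*sin(x) + eps(x), assuming eps(x) ~ N(0, noise_sd^2)."""
    rng = random.Random(seed)  # seeded generator so repeated calls give the same draw
    ys = []
    for x in xs:
        systematic = amplitude * math.sin(x)  # deterministic part m(x)
        error = rng.gauss(0.0, noise_sd)      # random part eps(x), assumed Gaussian
        ys.append(systematic + error)         # the error enters additively
    return ys


xs = [0.5 * i for i in range(13)]  # hypothetical grid 0.0, 0.5, ..., 6.0
ys = simulate_sin(xs, amplitude=2.0, noise_sd=0.3)
```

Setting `noise_sd=0.0` returns the deterministic part A·sin(x) alone, which makes the split between m(x) and ε(x) easy to inspect.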
Standard Assumptions
A1
A2
A3
Back to Shoes
Simple Linear Regression
$m(x) = \beta_0 + \beta_1 x$
Model Parameters
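Derivation
Choose β0 and β1 to minimize the residual sum of squares:
$R(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
$\frac{\partial R}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0 \;\Rightarrow\; \beta_0 = \bar{y} - \beta_1 \bar{x}$
$\frac{\partial R}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i y_i = \sum_{i=1}^{n} x_i (\bar{y} - \beta_1 \bar{x}) + \beta_1 \sum_{i=1}^{n} x_i^2$
$\Rightarrow\; \beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$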
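The closed-form result translates directly into code. This is a plain-Python sketch (the name `fit_simple_linear` is hypothetical) that follows the derivation's formulas for β1 and β0.

```python
def fit_simple_linear(xs, ys):
    """Least-squares (beta0, beta1) for y = beta0 + beta1 * x, using the closed form."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of beta1, as in the derivation
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar  # from the first normal equation
    return beta0, beta1


print(fit_simple_linear([0, 1, 2, 3], [2, 5, 8, 11]))  # (2.0, 3.0)
```

This mirrors the slide's algebra. When the x values sit far from zero, the equivalent centered form, Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)², avoids subtracting two large nearly equal numbers and is numerically safer.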
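Standard Deviations
$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$\sigma_{\beta_0} = \hat{\sigma} \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \right]^{1/2}$
$\sigma_{\beta_1} = \hat{\sigma} \left[ \frac{1}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \right]^{1/2}$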
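These quantities can be estimated from data. A sketch in plain Python, assuming the residual-variance estimate divides by n − 2 (two fitted parameters) as in the slide's formula; `standard_errors` is a hypothetical helper name.

```python
import math


def standard_errors(xs, ys):
    """Return (sigma_hat, se_beta0, se_beta1) for a simple linear regression fit."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points for n - 2 degrees of freedom")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    # Residual sum of squares, then the n - 2 variance estimate
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    sigma_hat = math.sqrt(sse / (n - 2))
    se_beta0 = sigma_hat * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    se_beta1 = sigma_hat * math.sqrt(1.0 / sxx)
    return sigma_hat, se_beta0, se_beta1


print(standard_errors([0, 1, 2, 3], [1, 3, 2, 5]))
```

For the small data set in the last line the fit is y = 1.1 + 1.1x, the residual sum of squares is 2.7, and so σ̂² = 2.7 / 2 = 1.35.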
Polynomial Terms
Modeling the data as a line is not always adequate.
Polynomial Regression
$m(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p = \sum_{k=0}^{p} \beta_k x^k$
This is still a linear model!
m(x) is a linear combination of β.
Danger of Overfitting
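To see why a polynomial model is still linear, note that m(x) is a dot product between β and a fixed feature vector [1, x, x², ..., x^p]. The sketch below (hypothetical helper names) checks that property numerically: the model built from β_a + β_b gives the same value as the sum of the two separate models.

```python
def poly_features(x, p):
    """Fixed feature vector [1, x, x^2, ..., x^p]; it does not depend on beta."""
    return [x ** k for k in range(p + 1)]


def poly_m(beta, x):
    """m(x) = sum_k beta_k * x^k, a linear combination of the coefficients beta."""
    return sum(b * f for b, f in zip(beta, poly_features(x, len(beta) - 1)))


beta_a = [1, 0, 2]   # 1 + 2x^2
beta_b = [0, 3, -1]  # 3x - x^2
beta_sum = [a + b for a, b in zip(beta_a, beta_b)]
x = 3
print(poly_m(beta_a, x) + poly_m(beta_b, x), poly_m(beta_sum, x))  # 19 19
```

The model is nonlinear in x but linear in β, and linearity in β is what lets the least-squares machinery for straight lines carry over unchanged.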
Matrix Representation
$y_i = \sum_{k=0}^{p} \beta_k x_i^k + \varepsilon_i, \quad i = 1, \dots, n$
$Y = X\beta + \varepsilon$
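Matrix Representation
$R(\beta) = (Y - X\beta)^T (Y - X\beta) = Y^T Y - Y^T X \beta - \beta^T X^T Y + \beta^T X^T X \beta$
$\frac{\partial R}{\partial \beta} = -2 X^T Y + 2 X^T X \beta = 0 \;\Rightarrow\; X^T X \beta = X^T Y$
$\hat{\beta} = (X^T X)^{-1} X^T Y$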
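A plain-Python sketch of fitting a degree-p polynomial through the normal equations XᵀXβ = XᵀY. It solves the linear system by Gaussian elimination with partial pivoting rather than forming (XᵀX)⁻¹ explicitly, a common numerical choice that gives the same β̂ in exact arithmetic. The function names `design_matrix`, `solve`, and `fit_polynomial` are hypothetical.

```python
def design_matrix(xs, p):
    """Rows [1, x_i, x_i^2, ..., x_i^p]; this is X in Y = X beta + eps."""
    return [[x ** k for k in range(p + 1)] for x in xs]


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def fit_polynomial(xs, ys, p):
    """Least-squares beta from the normal equations X^T X beta = X^T Y."""
    X = design_matrix(xs, p)
    cols = range(p + 1)
    xtx = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in cols]
    return solve(xtx, xty)


print(fit_polynomial([0, 1, 2, 3, 4], [0, 1, 4, 9, 16], 2))
```

On the exact data y = x² the result is [0, 0, 1] up to rounding. For real work, a QR- or SVD-based routine such as NumPy's `numpy.linalg.lstsq` is more robust, because XᵀX becomes ill-conditioned quickly as p grows.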
Model Comparison
Sum of Squares Total: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
Sum of Squares Error: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
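R²
$R^2 = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}$
$R^2_{adj} = 1 - \frac{SSE / (n - (p + 1))}{SST / (n - 1)}$
where n is the number of observations and p + 1 is the number of model parameters.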
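Both measures are short to compute once fitted values ŷ_i are available. A sketch, assuming p counts the non-intercept terms so the model has p + 1 parameters, matching the n − (p + 1) in the adjusted formula; `r_squared` is a hypothetical name.

```python
def r_squared(ys, y_hat, p):
    """Return (R^2, adjusted R^2) for a model with p + 1 fitted parameters."""
    n = len(ys)
    if n <= p + 1:
        raise ValueError("need more observations than parameters")
    y_bar = sum(ys) / n
    sst = sum((y - y_bar) ** 2 for y in ys)             # total sum of squares
    sse = sum((y - f) ** 2 for y, f in zip(ys, y_hat))  # error sum of squares
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / (n - (p + 1))) / (sst / (n - 1))
    return r2, r2_adj


print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], p=1))
```

Here SST = 5 and SSE = 0.1, so R² = 0.98 while the adjusted value is 0.97: the adjustment charges the model for each extra parameter.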
Example
Data: Y = X² + N(0,1)
[Figure: Y vs. X for X from 0 to 5, with linear and quadratic fits]
Linear fit: Y = -3.6029 + 4.8802X, R² = 0.9131
Quadratic fit: Y = 0.7341 - 0.4303X + 1.0621X², R² = 0.9880
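The experiment can be re-run in plain Python. This sketch draws a fresh sample from Y = X² + N(0,1), assuming 50 evenly spaced X values on [0, 5] and a fixed seed (neither is given on the slide), so its coefficients and R² values will differ from those shown; what it illustrates is the qualitative outcome, a higher R² for the quadratic fit. The helper names are hypothetical.

```python
import random


def fit_poly(xs, ys, p):
    """Least-squares coefficients [b0, ..., bp] via the normal equations."""
    k = p + 1
    X = [[x ** j for j in range(k)] for x in xs]
    a = []  # augmented matrix [X^T X | X^T Y]
    for i in range(k):
        row = [sum(r[i] * r[j] for r in X) for j in range(k)]
        row.append(sum(r[i] * y for r, y in zip(X, ys)))
        a.append(row)
    # X^T X is symmetric positive definite here, so no pivoting (acceptable for a sketch)
    for c in range(k):
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= f * a[c][j]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (a[r][k] - sum(a[r][j] * beta[j] for j in range(r + 1, k))) / a[r][r]
    return beta


def r2(ys, fitted):
    """Coefficient of determination 1 - SSE/SST."""
    y_bar = sum(ys) / len(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return 1.0 - sse / sst


rng = random.Random(1)                           # assumed seed
xs = [5.0 * i / 49 for i in range(50)]           # assumed: 50 points on [0, 5]
ys = [x ** 2 + rng.gauss(0.0, 1.0) for x in xs]  # Y = X^2 + N(0, 1)

results = {}
for p in (1, 2):
    beta = fit_poly(xs, ys, p)
    fitted = [sum(b * x ** j for j, b in enumerate(beta)) for x in xs]
    results[p] = (beta, r2(ys, fitted))
    print(p, [round(b, 4) for b in beta], round(results[p][1], 4))
```

Because the quadratic model contains the linear one, its in-sample R² can never be lower. That is why R² alone cannot detect overfitting, and why adjusted R² and held-out data are used for comparison.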
Tricky Relationship
[Figure: Fitness vs. Exercise Time, for Youth and Elderly groups]
Violent Crime vs. Video Game
[Figure: yearly Violent Crime, Aggravated Assault, Robbery, Murder & Manslaughter, and Forcible Rape figures plotted alongside Video Game Sales, 2000 to 2012]
Is This True?
Where Did the Time Go?
Summary
Regression is one of the oldest data mining techniques.
Probably the first thing that you want to try on a new data set.
No need to do programming!
Matlab, Excel …
Quality of Regression
R2
Residual Plot
Cross Validation
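Cross-validation scores a model on data it was not fitted to. A minimal k-fold sketch in plain Python, assuming interleaved folds (point i goes to fold i mod k) instead of the shuffled folds many tools use; `k_fold_mse`, `fit_line`, and `predict_line` are hypothetical names, and any fit/predict pair can be swapped in.

```python
def fit_line(xs, ys):
    """Simple linear regression (beta0, beta1) in centered form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return (y_bar - b1 * x_bar, b1)


def predict_line(beta, x):
    return beta[0] + beta[1] * x


def k_fold_mse(xs, ys, k, fit, predict):
    """Mean squared error on held-out points; fold f holds indices f, f+k, f+2k, ..."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    total = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        beta = fit(train_x, train_y)  # fit only on the training folds
        total += sum((ys[i] - predict(beta, xs[i])) ** 2 for i in test_idx)
    return total / n  # every point is held out exactly once


xs = list(range(10))
ys = [1 + 2 * x + (-1) ** x for x in xs]  # a line plus an alternating wobble
print(k_fold_mse(xs, ys, 5, fit_line, predict_line))
```

Unlike R², this score can get worse when a model is made more flexible, which is what makes it useful for spotting overfitting. Setting k = n gives leave-one-out cross-validation.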
What you should learn after class:
The Influence of Outliers
Confidence Interval
Nonlinear Regression