11/3/2008
Linear Regression (Part II)
Outline
• Generalizing linear regression
• Simple multiple regression
• Regularization
• Bias vs. variance tradeoff
• Model selection
Linear regression in D dimensions
From D = 1
to D > 1, linear in x (multiple regression)
to D > 1 with any set of nonlinear functions of x
All of these remain linear in w.
Linear basis function models
y(x, w) = Σⱼ wⱼ φⱼ(x) = wᵀφ(x), with φ₀(x) = 1
Example of basis functions transformation
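As a minimal numpy sketch of such a transformation (the function name `poly_basis` and the sample inputs are illustrative, not from the slides), here is the polynomial case φⱼ(x) = xʲ:

```python
import numpy as np

def poly_basis(x, M):
    """Polynomial basis transformation phi_j(x) = x**j for j = 0..M."""
    x = np.asarray(x, dtype=float)
    return np.vstack([x**j for j in range(M + 1)]).T   # one row per input

Phi = poly_basis([0.0, 0.5, 1.0], M=2)   # columns: 1, x, x^2
```

Each row of `Phi` is the feature vector φ(x) for one input; the model stays linear in w even though the features are nonlinear in x.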
Maximum likelihood and least squares
Assuming a normal (Gaussian) noise model, t = y(x, w) + ε with ε ~ N(0, β⁻¹),
the likelihood of observing the N target values t given the set X of D-dimensional predictors is
p(t | X, w, β) = ∏ₙ₌₁ᴺ N(tₙ | wᵀφ(xₙ), β⁻¹)
Maximizing the log of the likelihood with respect to w is the same as minimizing the sum-of-squares error
E(w) = ½ Σₙ₌₁ᴺ (tₙ − wᵀφ(xₙ))²
Solving the MLE: find the optimal w
Setting ∇E(w) = 0 gives w_ML = (ΦᵀΦ)⁻¹Φᵀt, where Φ† = (ΦᵀΦ)⁻¹Φᵀ is the Moore-Penrose pseudo-inverse of the design matrix Φ.
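A minimal numpy sketch of this solution (the synthetic data, weights, and variable names are assumptions for illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 samples, 3 predictors, known weights (illustrative values)
N, D = 50, 3
X = rng.normal(size=(N, D))
true_w = np.array([1.0, 2.0, -1.0, 0.5])      # bias w0 plus three weights
Phi = np.hstack([np.ones((N, 1)), X])         # design matrix with phi_0(x) = 1
t = Phi @ true_w + 0.01 * rng.normal(size=N)

# w_ML = (Phi^T Phi)^{-1} Phi^T t, computed via the Moore-Penrose pseudo-inverse
w_ml = np.linalg.pinv(Phi) @ t
```

With low noise, `w_ml` recovers the generating weights closely; `np.linalg.pinv` handles rank-deficient Φ gracefully, which the explicit normal-equation inverse does not.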
Outline
• Generalizing linear regression
• Simple multiple regression
• Regularization
• Bias vs. variance tradeoff
• Model selection
The case of simple multiple regression
Y = w₀ + Σᵢ₌₁ᴰ wᵢ Xᵢ
Least squares estimate
Minimizing E(w) = (y − Xw)ᵀ(y − Xw) gives the normal equations XᵀXw = Xᵀy, so
ŵ = (XᵀX)⁻¹Xᵀy
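A small worked sketch of the estimate (toy data and names are mine, not the slides'), comparing the normal-equation solution with numpy's built-in least squares:

```python
import numpy as np

# Toy data generated from y = 1 + 2*x1 + 3*x2 with no noise,
# so least squares recovers the coefficients exactly
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 1.]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]
Xd = np.hstack([np.ones((len(X), 1)), X])     # design matrix with intercept column

w_normal = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)   # (X^T X)^{-1} X^T y
w_lstsq, *_ = np.linalg.lstsq(Xd, y, rcond=None)  # numerically preferred in practice
```

Both routes give the same ŵ here; `lstsq` is the safer choice when XᵀX is ill-conditioned.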
Example
Solution
Solution (cont.)
What does it look like?
Outline
• Generalizing linear regression
• Simple multiple regression
• Regularization
• Bias vs. variance tradeoff
• Model selection
Regularization
Recall the case of
• a single predictor
• and polynomial basis functions φⱼ(x) = xʲ
1st Order Polynomial
3rd Order Polynomial
9th Order Polynomial
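The three fits above can be reproduced with a short numpy sketch (the noisy sinusoidal target, sample sizes, and helper names are assumptions in the style of this example, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: a noisy nonlinear target, 10 training points
x_train = np.linspace(0, 1, 10)
t_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.normal(size=10)
x_test = np.linspace(0, 1, 100)
t_test = np.sin(2 * np.pi * x_test) + 0.2 * rng.normal(size=100)

def fit_poly(x, t, M):
    """Least-squares fit of an order-M polynomial, phi_j(x) = x**j."""
    Phi = np.vander(x, M + 1, increasing=True)
    return np.linalg.lstsq(Phi, t, rcond=None)[0]

def rms_error(x, t, w):
    Phi = np.vander(x, len(w), increasing=True)
    return np.sqrt(np.mean((Phi @ w - t) ** 2))

train_rms = {M: rms_error(x_train, t_train, fit_poly(x_train, t_train, M))
             for M in (1, 3, 9)}
test_rms = {M: rms_error(x_test, t_test, fit_poly(x_train, t_train, M))
            for M in (1, 3, 9)}
```

Training error falls monotonically with M (the 9th-order fit interpolates the 10 points almost exactly), while test error exposes the over-fitting.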
Over‐fitting
Root-Mean-Square (RMS) Error: E_RMS = √(2E(w*)/N)
Polynomial Coefficients
Data Set Size: 9th Order Polynomial
For a fixed model complexity, increasing the data set size reduces the over-fitting.
Regularization
Penalize large coefficient values:
Ẽ(w) = ½ Σₙ₌₁ᴺ (tₙ − y(xₙ, w))² + (λ/2)‖w‖²
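A minimal sketch of this penalized fit in numpy (the closed form below follows from setting the gradient of the penalized error to zero; the data and the name `ridge_fit` are illustrative assumptions):

```python
import numpy as np

def ridge_fit(Phi, t, lam):
    """Minimize 0.5*||t - Phi @ w||^2 + 0.5*lam*||w||^2,
    giving w = (lam*I + Phi^T Phi)^{-1} Phi^T t."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=10)
Phi = np.vander(x, 10, increasing=True)       # 9th-order polynomial basis

w_weak = ridge_fit(Phi, t, 1e-8)              # almost unregularized: huge coefficients
w_strong = ridge_fit(Phi, t, 1.0)             # strong penalty: shrunken coefficients
```

The penalty directly shrinks the coefficient vector: ‖w_strong‖ is far smaller than ‖w_weak‖, which is exactly the mechanism that tames the wild 9th-order fit.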
Regularization: fits for different values of λ
Regularization: E_RMS vs. ln λ
Polynomial Coefficients
Outline
• Generalizing linear regression
• Simple multiple regression
• Regularization
• Bias vs. variance tradeoff
• Model selection
Bias vs. Variance
expected loss = (bias)² + variance + noise
How does this apply to regression?
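One way to see the tradeoff in regression is to fit many polynomials on repeated noisy data sets and measure bias² and variance of the predictions directly. A minimal numpy sketch (the target function, degrees, and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 25)
f_true = np.sin(2 * np.pi * x)                # assumed true function
n_sets, noise = 200, 0.3

def bias2_and_variance(degree):
    """Fit a polynomial of the given degree on many noisy data sets,
    then estimate bias^2 and variance of the resulting predictions."""
    Phi = np.vander(x, degree + 1, increasing=True)
    preds = []
    for _ in range(n_sets):
        t = f_true + noise * rng.normal(size=x.size)
        w = np.linalg.lstsq(Phi, t, rcond=None)[0]
        preds.append(Phi @ w)
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - f_true) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias2, variance

bias2_lo, var_lo = bias2_and_variance(1)      # rigid model: high bias, low variance
bias2_hi, var_hi = bias2_and_variance(9)      # flexible model: low bias, high variance
```

The rigid degree-1 model misses the curve systematically (high bias) but barely changes across data sets; the flexible degree-9 model tracks the curve on average but varies wildly from data set to data set.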
Outline
• Generalizing linear regression
• Simple multiple regression
• Regularization
• Bias vs. variance tradeoff
• Model selection
Model Selection
M (the degree of the polynomial) corresponds to the complexity of the model.
The larger M is, the greater the risk of over-fitting.
Regularization (λ) helps by controlling the effective complexity.
But how do we find good combinations of M and λ?
Information criteria
• AIC (Akaike Information Criterion): choose the model that maximizes ln p(D|w_ML) − M
where p(D|w_ML) is the likelihood under the maximum-likelihood parameters and M is the number of adjustable parameters.
Criteria like this tend to favor overly simple models.
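The criterion can be sketched for the polynomial regression example as follows (the data, the Gaussian log-likelihood with its ML noise variance, and the exact parameter count are my assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=30)

def aic_score(degree):
    """ln p(D|w_ML) - M for a Gaussian noise model,
    where M counts the polynomial coefficients plus the noise variance."""
    Phi = np.vander(x, degree + 1, increasing=True)
    w = np.linalg.lstsq(Phi, t, rcond=None)[0]
    n = x.size
    sigma2 = np.mean((Phi @ w - t) ** 2)               # ML estimate of noise variance
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return log_lik - (degree + 2)                      # (degree+1 weights) + sigma^2

scores = {deg: aic_score(deg) for deg in range(10)}
best_degree = max(scores, key=scores.get)
```

Higher scores are better: the likelihood term rewards fit while the subtracted parameter count penalizes complexity.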