Regularization Methods for Linear Regression
Mathilde Mougeot
ENSIIE
2017-2018
Agenda
Lessons
• 3 plenary lessons (MRR)
• 3 Practical work sessions using R (mrr)
• 10 Project sessions (ipR)
Before next session, install on your computer
1 R software, https://www.r-project.org/
2 Rstudio, https://www.rstudio.com/
Documents are available (at this stage)
• https://sites.google.com/site/MougeotMathilde/teaching
A word on data and predictive models

Data are everywhere:
• Industry (temperature, IR sensors...)
• Finance: transactions
• Marketing: consumer data
• On your phone (GPS, mail, music...)
→ Databases are available everywhere: from small data sets to Big Data
A word on data and predictive models

Nowadays, predictive models are crucial for monitoring and diagnosis:
• Industry: health monitoring, energy...
• Finance: forecast of the evolution of the market
• Marketing: scoring
• Health
→ Machine learning models are used to mine and operate on the data.
Regularization Methods for Linear Regression

- Linear regression and regularized linear regression belong to the predictive model family.
- Linear regression is an old model, but still very useful! (Gauss, 1795; Legendre, 1805)

Outline of the lesson:
• Motivations
• Ordinary Least Squares (OLS) (geometrical approach)
• The linear model (probabilistic approach)
• Using R software for modeling
Evolution of the average age of the French population

[Figure: average age of the French population over time.]

Modeling the average age of the French population

[Figure.]
Application: Social Networks

Facebook users (millions):

date       users (millions)
31/12/04     0
31/12/05     5
31/12/06    10
31/3/07     20
30/12/07    60
30/6/08    100
30/12/08   150
30/3/09    170
30/6/09    200
30/9/09    250
30/12/09   300
30/3/10    400
30/6/10    500
30/6/11    750
Evolution of the number of Facebook users

[Figure: number of Facebook users over time.]

Motivations: investment, forecast
Modeling the evolution of the number of Facebook users

[Figure.]
Introduction: Regression model

• $(Y, X)$: couple of variables
  - $Y$: target, quantitative variable
  - $X = (X_1, X_2, \dots, X_p)$: covariates, quantitative variables

→ The goal is to propose a regression model to explain $Y$ given $X$. The parameters of the model are computed using a set of data:

$$Y = F_{data}(X) = F_{data}(X_1, \dots, X_p)$$

In this case, $F$ is a linear function.

• Questions:
  - What are the performances of this model?
  - What are the main explanatory variables?
  - Is it possible to use the model to predict new values? To forecast?
  - Can we improve the model?
Boston Housing Data

The original data are n = 506 observations on p = 14 variables, medv (median value) being the target variable:

crim     per capita crime rate by town
zn       proportion of residential land zoned for lots over 25,000 sq.ft
indus    proportion of non-retail business acres per town
chas     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
nox      nitric oxides concentration (parts per 10 million)
rm       average number of rooms per dwelling
age      proportion of owner-occupied units built prior to 1940
dis      weighted distances to five Boston employment centres
rad      index of accessibility to radial highways
tax      full-value property-tax rate per USD 10,000
ptratio  pupil-teacher ratio by town
b        1000(B − 0.63)² where B is the proportion of blacks by town
lstat    percentage of lower status of the population
medv     median value of owner-occupied homes in USD 1000's
Boston Housing Data

The data:

nr  crim   zn  indus chas  nox   rm    age   dis   rad  tax  ptratio  b      lstat  medv
1   0.006  18  2.3   0     0.53  6.57  65.2  4.09  1    296  15.3     396.9  4.9    24.0
2   0.027  0   7.0   0     0.46  6.42  78.9  4.96  2    242  17.8     396.9  9.1    21.6
3   0.027  0   7.0   0     0.46  7.18  61.1  4.96  2    242  17.8     392.8  4.0    34.7
4   0.032  0   2.1   0     0.45  6.99  45.8  6.06  3    222  18.7     394.6  2.9    33.4
5   0.069  0   2.1   0     0.45  7.14  54.2  6.06  3    222  18.7     396.9  5.3    36.2
... ...    ... ...   ...   ...   ...   ...   ...   ...  ...  ...      ...    ...    ...
Boston Housing Data

Different points of view may exist for studying these data, with the model $Y = F_{data}(X)$:

• Mining approach
  − Evaluate the performances of the model
  − What are the most important variables? (variable selection)
  → sparse models: less complex, best performances

• Predictive approach
  − Inference and simulation
  → point estimation for new values of the covariates
  → confidence interval computation
Outline

• Applications
• Ordinary Least Squares (OLS) / Moindres Carrés Ordinaires (MCO)
• Linear Model
• Regularization methods: ridge, lasso
Ordinary Least Squares (OLS)
Ordinary Least Squares (OLS)

• Values/variables:
  - $Y \in \mathbb{R}$: value / target variable
  - $X = (X^1, \dots, X^p) \in \mathbb{R}^p$: values / covariates
• Data: $S = \{(x_i, y_i),\ i = 1, \dots, n,\ y_i \in \mathbb{R},\ x_i \in \mathbb{R}^p\}$
• Goal: model $Y$ linearly with $X$, with a "small" $\varepsilon$ term
• $Y = \beta_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$
• If $X_1 = 1$, we can write $Y = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$
• $Y = \sum_{j=1}^p \beta_j X_j + \varepsilon$ ($p$ parameters $\beta_j$, $p$ variables, "one constant" inside)
Ordinary Least Squares (OLS): Simple Linear Regression Model
Simple linear regression: example

We only have one covariate (X) to explain the target variable (Y). The scatter plot:

[Scatter plot of y (about 12 to 24) against x (about 5 to 10).]
Simple linear regression: example

For all observation couples $(Y_i, X_i)$, $1 \le i \le n$, the goal here is to minimize $\sum_{i=1}^n (Y_i - (\beta_1 + \beta_2 X_i))^2$.

[The same scatter plot with the fitted regression line.]
Ordinary Least Squares: simple linear model

Formalism:
• Distance to a single point: $(y_i - \beta_1 - \beta_2 x_i)^2$
• Distance to the whole sample: $\sum_{i=1}^n (y_i - \beta_1 - \beta_2 x_i)^2$
→ Best line: the intercept $\beta_1$ and slope $\beta_2$ such that $\sum_{i=1}^n (y_i - \beta_1 - \beta_2 x_i)^2$ is minimum, among all possible values of $\beta_1$ and $\beta_2$.

OLS estimator: the values estimated by OLS (the estimates) for $\beta_1$ and $\beta_2$ satisfy
$$(\hat\beta_1^{ols}, \hat\beta_2^{ols}) = \underset{(\beta_1,\beta_2) \in \mathbb{R}^2}{\operatorname{argmin}} \left\{ \sum_{i=1}^n (y_i - \beta_1 - \beta_2 x_i)^2 \right\}$$
Ordinary Least Squares: simple linear model

OLS estimator (observations):
$$(\hat\beta_1^{ols}, \hat\beta_2^{ols}) = \underset{(\beta_1,\beta_2) \in \mathbb{R}^2}{\operatorname{argmin}} \left\{ \sum_{i=1}^n (y_i - \beta_1 - \beta_2 x_i)^2 \right\}$$

OLS estimator (vector notations $y$, $x$, $1_n$):
$$(\hat\beta_1^{ols}, \hat\beta_2^{ols}) = \underset{(\beta_1,\beta_2) \in \mathbb{R}^2}{\operatorname{argmin}}\ \|y - \beta_2 x - \beta_1 1_n\|_2^2$$

OLS estimator (matrix notations, $Y$ of size $(n,1)$, $X$ of size $(n,2)$):
$$\hat\beta^{ols} = \underset{\beta \in \mathbb{R}^2}{\operatorname{argmin}}\ \|Y - X\beta\|_2^2$$
Ordinary Least Squares: simple linear model

Theorem: the OLS estimators have the following expressions:
$$\hat\beta_1 = \bar y - \hat\beta_2 \bar x, \qquad \hat\beta_2 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}$$

Proof: by zeroing the derivative of the objective function, which is convex.
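As a numerical check of this theorem, here is a minimal R sketch on simulated data (all names and values are hypothetical, not from the course):

```r
set.seed(1)
n <- 100
x <- runif(n, 5, 10)
y <- 2 + 1.5 * x + rnorm(n, sd = 1)

# Closed-form OLS estimates: slope = Cov(x,y)/Var(x), intercept = ybar - slope*xbar
beta2 <- cov(x, y) / var(x)
beta1 <- mean(y) - beta2 * mean(x)
c(beta1, beta2)

# Same values from lm()
coef(lm(y ~ x))
```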
Ordinary Least Squares: simple linear model

For the simple linear model, the correlation coefficient may be very useful:
• $r(x, y)$: (linear) correlation coefficient
$$r(x, y) = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)}\sqrt{\mathrm{var}(y)}}$$
• $|r(x, y)| = 1$ if and only if $Y = aX + b$, a linear relation between $Y$ and $X$

R-square, used in multiple regression:
• $R^2 = \mathrm{Var}(\hat Y)/\mathrm{Var}(Y)$
• $R^2 \in [0, 1]$
• Simple regression: $R^2 = r^2$
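A minimal R check, on simulated data with hypothetical names, that the R² reported by `lm` in a simple regression equals the squared correlation:

```r
set.seed(2)
x <- rnorm(200)
y <- 1 + 0.8 * x + rnorm(200)

r  <- cor(x, y)                      # correlation coefficient
R2 <- summary(lm(y ~ x))$r.squared   # R-square of the simple regression
c(r^2, R2)                           # identical up to rounding
```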
The value of a correlation coefficient may be non-informative

Illustration of two joint distributions, each with a correlation coefficient of about 0.999:
[Two scatter plots: left, n = 500, correlation 0.9998; right, n = 500, correlation 0.9990. The two point clouds have very different shapes.]
→ The scatter plots show two very different distributions. Always look at the data!
Always study the data carefully.
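A simulated sketch of this warning, in the spirit of the figure above (hypothetical data; the exact values differ from the slide): two samples with essentially the same correlation, about 0.999, but very different point clouds.

```r
set.seed(3)
n <- 500

# Cloud 1: a genuinely linear relation with small noise
x1 <- rnorm(n)
y1 <- x1 + rnorm(n, sd = 0.04)

# Cloud 2: a tight cluster plus one far-away point that drives the correlation
x2 <- c(rnorm(n - 1, sd = 0.01), 50)
y2 <- c(rnorm(n - 1, sd = 0.01), 50)

c(cor(x1, y1), cor(x2, y2))          # both close to 0.999
par(mfrow = c(1, 2)); plot(x1, y1); plot(x2, y2)
```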
Ordinary Least Squares (OLS): Multiple Linear Regression Model
Ordinary Least Squares: multiple linear model

• We suppose $Y = \sum_{j=1}^p \beta_j X_j + \varepsilon$ and $S = \{(x_i, y_i),\ i = 1, \dots, n,\ y_i \in \mathbb{R},\ x_i \in \mathbb{R}^p\}$
• The quadratic error is defined by
$$E(\beta) = \sum_{i=1}^n \varepsilon_i^2 = \sum_{i=1}^n \Big(y_i - \sum_j x_i^j \beta_j\Big)^2$$
with matrix notation:
$$E(\beta) = (Y - X\beta)^T (Y - X\beta)$$
• Goal: minimize the error $E(\beta)$ on the data set $S$, i.e. compute $\hat\beta \in \mathbb{R}^p$:
$$\hat\beta = \underset{\beta \in \mathbb{R}^p}{\operatorname{argmin}}\ E(\beta)$$
Ordinary Least Squares: multiple regression model

• We aim to compute $\hat\beta$ that minimizes
$$E(\beta) = \|Y - X\beta\|_2^2 = (Y - X\beta)^T (Y - X\beta)$$
• Assumption: $X^T X$ invertible ($n \ge p$).

Theorem:
$$\hat\beta_{MCO} = (X^T X)^{-1} X^T Y$$
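A minimal sketch checking the theorem on simulated data (hypothetical design and names): solving the normal equations directly reproduces the `lm` coefficients.

```r
set.seed(4)
n <- 100; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # first column = intercept
beta <- c(1, -2, 0.5)
Y <- drop(X %*% beta) + rnorm(n, sd = 0.3)

# beta_hat = (X'X)^{-1} X'Y via solve()
beta_hat <- solve(t(X) %*% X, t(X) %*% Y)
cbind(beta_hat, coef(lm(Y ~ X[, -1])))  # identical columns
```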
MCO (OLS)

• Estimation: $\hat\beta = (X^T X)^{-1} X^T Y$
• Prediction: knowing $\hat\beta$ and given $X_1, \dots, X_p$, the prediction of the target can be computed: $\hat Y = \sum_j \hat\beta_j X_j$, i.e.
$$\hat Y = X\hat\beta = X (X^T X)^{-1} X^T Y$$
• $P$: projection matrix onto the hyperplane (hat matrix),
$$P = X (X^T X)^{-1} X^T, \qquad P^2 = P$$
• Residuals: $\hat\varepsilon = Y - \hat Y$
• Remark: no assumption on the law or distribution of $\varepsilon$
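A short sketch, on simulated data, verifying numerically that the hat matrix $P$ is idempotent and that $\hat Y = PY$:

```r
set.seed(5)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))
Y <- drop(X %*% c(1, 2, -1)) + rnorm(n)

P <- X %*% solve(t(X) %*% X) %*% t(X)        # hat matrix
max(abs(P %*% P - P))                         # ~ 0: P is idempotent
max(abs(P %*% Y - fitted(lm(Y ~ X[, -1]))))   # ~ 0: Y_hat = P Y
```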
Ordinary Least Squares: geometrical interpretation

$$Y = \sum_{j=1}^p X_j \hat\beta_j + \hat\varepsilon$$

with $Y \in \mathbb{R}^n$; the fitted part $\hat Y = \sum_j X_j \hat\beta_j$ lies in the $p$-dimensional subspace spanned by the columns of $X$, and the residual $\hat\varepsilon$ in its $(n-p)$-dimensional orthogonal complement.
Ordinary Least Squares: properties

• Orthogonality:
  - $\hat Y \perp \hat\varepsilon$
  - $X_j \perp \hat\varepsilon$ for all $j \in [1, \dots, p]$: $\langle X^j, \hat\varepsilon \rangle = 0$
• Residual average:
  - $\sum_i \hat\varepsilon_i = 0$ if there is an intercept in the model ($X^1 = (1, 1, \dots, 1)$)
    → the average point belongs to the hyperplane
  - $\bar{\hat Y} = \bar Y$
• Analysis of variance (ANOVA, Pythagoras):
  - $\mathrm{var}(Y) = \mathrm{var}(\hat Y) + \mathrm{var}(\hat\varepsilon)$
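These properties can be checked numerically; a minimal sketch on simulated data:

```r
set.seed(6)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)                 # model with an intercept

e <- residuals(fit); yhat <- fitted(fit)
sum(e)                           # ~ 0: residuals sum to zero (intercept present)
sum(yhat * e)                    # ~ 0: fitted values orthogonal to residuals
c(var(y), var(yhat) + var(e))    # variance decomposition (Pythagoras)
```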
Multiple linear model: example with R

```r
> head(mydata, 3)
      y   x1   x2   x3
1 -2.20 0.38 0.98 0.46
2 -1.75 0.11 0.62 0.37
3 -0.24 0.80 0.59 0.87
...
> modlm = lm(y ~ x1 + x2 + x3, data = mydata)

Call:
lm(formula = y ~ x1 + x2 + x3, data = mydata)

Coefficients:
(Intercept)       x1       x2       x3
    0.02754  1.98163 -3.03612  0.01903
```
Multiple linear model: example with R

```r
> modlm = lm(y ~ x1 + x2 + x3, data = mydata)
> summary(modlm)

Call:
lm(formula = y ~ x1 + x2 + x3, data = mydata)

Residuals:
    Min      1Q  Median      3Q     Max
  -0.29  -0.075 -0.0035   0.073   0.281

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept)   0.02754    0.01503    1.833   0.0674 .
x1            1.98163    0.01577  125.652   <2e-16 ***
x2           -3.03612    0.01621 -187.286   <2e-16 ***
x3            0.01903    0.01576    1.208   0.2277
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1009 on 496 degrees of freedom
Multiple R-squared: 0.9904, Adjusted R-squared: 0.9904
F-statistic: 1.707e+04 on 3 and 496 DF, p-value: < 2.2e-16
```
Ordinary Least Squares: quality of the adjustment

• $R^2$, R-square (French: coefficient de détermination)
• $R^2 = \mathrm{var}(\hat Y)/\mathrm{var}(Y)$; remark: no unit
• $\cos^2 \omega = R^2 = \dfrac{\|\hat Y - \bar Y 1_n\|^2}{\|Y - \bar Y 1_n\|^2}$, where $\omega$ is the angle between the centered vector $(Y - \bar Y 1_n)$ and its centered prediction $(\hat Y - \bar{\hat Y} 1_n)$
• $\mathrm{var}(\hat\varepsilon) = \frac{1}{n}\sum_{i=1}^n (y_i - \hat y_i)^2 = (1 - R^2)\,\mathrm{var}(Y)$, in units of $Y^2$
Ordinary Least Squares: quality of the adjustment

• R-square (for few variables):
  - $R^2 = \cos^2 \omega = \mathrm{Var}(\hat Y)/\mathrm{Var}(Y)$
  - $R^2 \in [0, 1]$
  - $R^2$ increases mechanically with the number of variables
• The adjusted R-squared is sometimes preferred (penalization by the number of variables):
  - $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p}$
  - $R^2_{adj}$ may be negative
• Residual study:
  - $\hat\varepsilon_i = y_i - \hat y_i$, $\forall i \in 1..n$
  - visualization of $(\hat\varepsilon_i, i)$ and of $(\hat\varepsilon_i, \hat y_i)$: homoscedastic vs heteroscedastic behavior
• Prediction: visualization of $(\hat y_i, y_i)$, $\forall i \in 1..n$, and comparison with the first bisector (the $y = x$ line).
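A minimal sketch computing $R^2$ and the adjusted $R^2$ by hand and comparing with `summary.lm` (simulated data; here $p$ counts the intercept, as in the slide's formula):

```r
set.seed(7)
n <- 100; x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 + rnorm(n)
fit <- lm(y ~ x1 + x2); p <- 3   # intercept + 2 covariates

R2    <- var(fitted(fit)) / var(y)
R2adj <- 1 - (1 - R2) * (n - 1) / (n - p)
c(R2, summary(fit)$r.squared)          # match
c(R2adj, summary(fit)$adj.r.squared)   # match
```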
Ordinary Least Squares: quality of the adjustment

Graphics of $(\hat y_i, y_i)$, $1 \le i \le n$: VERY USEFUL.
[Four scatter plots of y against modlm$fit (observed vs. predicted values).]
Ordinary Least Squares: Studentized residual graph

$$\frac{\hat\varepsilon_i}{SE} = \frac{y_i - \hat y_i}{SE} \quad \text{(a unit-free term)}$$

[Residual graph: studentized residuals against $y$ (12 to 24), scattered between −4 and 4.]

→ Random distribution: there is no information left to capture.
Ordinary Least Squares: Studentized residual graph

$$\frac{\hat\varepsilon_i}{SE} = \frac{y_i - \hat y_i}{SE} \quad \text{(unit-free)}$$

[Residual graph against the observation index (0 to 100); a few flagged points take large values.]

→ Large values for some points? Outlier detection?
Ordinary Least Squares: Studentized residual graph

$$\frac{\hat\varepsilon_i}{SE} = \frac{y_i - \hat y_i}{SE} \quad \text{(unit-free)}$$

[Residual graph as a function of $\hat Y$ (50 to 200): the residuals show a clear structure.]

→ There is still some information in the residuals.
→ The model needs to be changed.
Ordinary Least Squares: curse of dimensionality

Data set: $\{(y_i, x_i),\ 1 \le i \le n\}$; one target variable, one covariate.

[Scatter plot of $y$ against $x_1$; $Y \sim \mathcal{N}(0,1)$, $X \sim \mathcal{N}(0,1)$, $n = 50$, $p = 1+1$.]

→ $R^2 = 0.07$, $R^2_{adj} = -0.02$
Ordinary Least Squares: illustration of the impact of the number of covariates on the model

Initial data and OLS line:

[Scatter plot of mco$fit against $y$; $Y \sim \mathcal{N}(0,1)$, $X \sim \mathcal{N}(0,1)$, $n = 50$, $p = 1+1$.]

→ $R^2 = 0.07$, $R^2_{adj} = -0.02$
Ordinary Least Squares: illustration of the impact of the number of covariates on the model

Initial data; 48 more covariates drawn from $\mathcal{N}(0,1)$ are added to the initial data set.

[Scatter plot of mco$fit against $y$; $n = 50$, $p = 1+48$: the fit now looks almost perfect.]

→ $R^2 = 0.99$, $R^2_{adj} = 0.93$
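A simulated sketch of this inflation effect, in the spirit of the slides (pure-noise target and covariates; the exact values will differ from the figures):

```r
set.seed(8)
n <- 50
y <- rnorm(n)                      # pure-noise target
x <- rnorm(n)                      # one useless covariate
summary(lm(y ~ x))$r.squared       # small

X <- matrix(rnorm(n * 48), n, 48)  # 48 pure-noise covariates, p = 1 + 48
fit <- lm(y ~ X)
c(summary(fit)$r.squared,          # mechanically close to 1
  summary(fit)$adj.r.squared)      # penalized, much less flattering
```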
Ordinary Least Squares: need to change the model (1/2)

Initial data set: [figure]

Ordinary Least Squares: need to change the model (2/2)

Logarithmic transformation: [figure]
MCO (OLS) regression, some limits:

If $X^T X$ is not invertible:
• $n \gg p$, collinearity between some $X_j$:
  - pseudo-inverse; the solution is not unique
  - variable selection
• $p \gg n$, when the number of variables is larger than the number of observations:
  - regularization methods
  - ridge (L2), lasso (L1)
  - variable selection

OLS model: point estimation.
OLS when $X^T X$ is non-invertible → pseudo-inverse computation

Solution ($n > p$, $X^T X$ non-invertible with rank $k$, $k < p$):

$$X^T X = U \Sigma^2 U^T = U \begin{pmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \ddots & 0 & 0 \\ 0 & 0 & \sigma_k^2 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} U^T = U_k \Sigma_k^2 U_k^T$$

$$(X^T X)^{*-1} = U_k \Sigma_k^{-2} U_k^T \quad \text{with} \quad \Sigma_k^2 = \begin{pmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sigma_k^2 \end{pmatrix}$$

$$\hat\beta = (X^T X)^{*-1} X^T Y$$

The solution is not unique.
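A minimal sketch of this computation in R, on a deliberately rank-deficient design (one column duplicated; the rank threshold and all names are assumptions): the truncated-SVD pseudo-inverse yields one particular least-squares solution.

```r
set.seed(9)
n <- 30
x1 <- rnorm(n); x2 <- x1                 # exact collinearity: rank-deficient design
X <- cbind(1, x1, x2)
Y <- 1 + 2 * x1 + rnorm(n, sd = 0.1)

s <- svd(t(X) %*% X)
k <- sum(s$d > 1e-8)                     # numerical rank k < p
Uk <- s$u[, 1:k, drop = FALSE]
XtX_pinv <- Uk %*% diag(1 / s$d[1:k], nrow = k) %*% t(Uk)  # U_k Sigma_k^{-2} U_k'
beta_hat <- XtX_pinv %*% t(X) %*% Y      # one (non-unique) least-squares solution
beta_hat
```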
Outline

• Motivations
• Ordinary Least Squares
• Linear Model
• Penalized regression: ridge, lasso
Linear Model

Probabilistic assumption on the residuals

The probabilistic assumption on the residuals allows us to introduce:
- statistical tests on the values of the coefficients (to study whether a variable is useful or not to explain the target),
- a test of whether the whole linear model is useful (all coefficients equal to zero).
Linear model

• We write: $Y = X\beta + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$
• We have: $\varepsilon_i = Y_i - \sum_j X_i^j \beta_j$ with density $f(\varepsilon) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\varepsilon^2/(2\sigma^2)}$, i.i.d.
• Residual density and maximum likelihood estimation:
$$f(\varepsilon_1, \dots, \varepsilon_n) = \prod_i f(\varepsilon_i) = \frac{1}{(2\pi)^{n/2}\sigma^n}\, e^{-\frac{\sum_i \varepsilon_i^2}{2\sigma^2}} = \frac{1}{(2\pi)^{n/2}\sigma^n}\, e^{-\frac{\|Y - X\beta\|^2}{2\sigma^2}}$$
• The goal is to compute $\hat\beta, \hat\sigma^2$, solutions of the maximum likelihood estimation (MLE). The MLE and the OLS have the same solution:
  - $\hat\beta = (X^T X)^{-1} X^T Y$
  - $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i^2$
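Note that the MLE of $\sigma^2$ divides by $n$, while the residual standard error reported by `summary.lm` divides by $n - p$ (see the law of $\hat\sigma^2$ below); a minimal check on simulated data:

```r
set.seed(10)
n <- 100; x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
fit <- lm(y ~ x); p <- 2

e <- residuals(fit)
sigma2_mle <- sum(e^2) / n         # maximum likelihood estimate
sigma2_hat <- sum(e^2) / (n - p)   # unbiased estimate used by summary.lm
c(sigma2_mle, sigma2_hat, summary(fit)$sigma^2)
```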
Linear model: what are the laws of the estimators?

$Y = X\beta + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$

Laws of the estimators:
• Law of $\hat\beta$?
• Law of $\hat Y$?
• Law of $\hat\sigma^2$?

Benefits:
→ allows computing confidence intervals for $\hat\beta$ and $\hat Y$;
→ allows testing the parameters.
Law of $\hat\beta$:

$$\hat\beta \sim \mathcal{N}\big(\beta,\ \sigma^2 (X^T X)^{-1}\big)$$

Expectation and variance of $\hat\beta$: from $\hat\beta = (X^T X)^{-1} X^T Y$ and $Y = X\beta + \varepsilon$,

• $\mathbb{E}(\hat\beta) = \beta$: unbiased estimator, $\mathbb{E}(\hat\beta) - \beta = 0$
• $\mathrm{Var}(\hat\beta) = \sigma^2 (X^T X)^{-1}$:
$$\mathrm{Var}(\hat\beta) = (X^T X)^{-1} X^T [\mathrm{Var}(Y)]\, X (X^T X)^{-1} = (X^T X)^{-1} X^T [\mathrm{Var}(\varepsilon)]\, X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}$$
• $\mathbb{E}[(\hat\beta - \beta)^2] = \mathrm{Var}(\hat\beta) + 0$

Recall: $\mathrm{Var}(aY) = a\,\mathrm{Var}(Y)\,a^T$.
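A minimal numerical check, on simulated data, that `vcov` of an `lm` fit equals $\hat\sigma^2 (X^T X)^{-1}$ (with $\hat\sigma^2$ the unbiased estimate):

```r
set.seed(11)
n <- 80; x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

X <- model.matrix(fit)
sigma2_hat <- sum(residuals(fit)^2) / df.residual(fit)
V <- sigma2_hat * solve(t(X) %*% X)   # sigma^2 (X'X)^{-1}
max(abs(V - vcov(fit)))               # ~ 0
```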
Law of $\hat Y$:

$$\hat Y \sim \mathcal{N}\big(X\beta,\ \sigma^2 X (X^T X)^{-1} X^T\big)$$

Expectation and variance of $\hat Y = X\hat\beta$:

• $\mathbb{E}(\hat Y) = X\beta$: $\mathbb{E}(\hat Y) = \mathbb{E}(X\hat\beta) = X\,\mathbb{E}(\hat\beta) = X\beta = \mathbb{E}(Y)$
• $\mathrm{Var}(\hat Y) = \sigma^2 X (X^T X)^{-1} X^T$:
$$\mathrm{Var}(\hat Y) = \mathrm{Var}(X\hat\beta) = X\,\mathrm{Var}(\hat\beta)\,X^T = \sigma^2 X (X^T X)^{-1} X^T$$
Law of $\hat\varepsilon$:

$$\hat\varepsilon \sim \mathcal{N}\big(0,\ \sigma^2 (I_n - X (X^T X)^{-1} X^T)\big)$$

Expectation and variance of $\hat\varepsilon = Y - \hat Y$:

• $\mathbb{E}(\hat\varepsilon) = 0$
• $\mathrm{Var}(\hat\varepsilon) = \sigma^2 (I_n - X (X^T X)^{-1} X^T)$:
$$\mathrm{Var}(\hat\varepsilon) = \mathrm{Var}(Y - \hat Y) = \mathrm{Var}(Y - X\hat\beta) = \sigma^2 I_n - X\,\mathrm{Var}(\hat\beta)\,X^T = \sigma^2 (I_n - X (X^T X)^{-1} X^T)$$

Recall: $\mathrm{Var}(aY) = a\,\mathrm{Var}(Y)\,a^T$.
Linear model: law of the estimators

Under the assumption that the $\varepsilon_i$ are i.i.d. with $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$:

Theorem: if $p \le n$ and $X^T X$ is invertible, the vector $\begin{pmatrix} \hat\beta \\ \hat\varepsilon \end{pmatrix}$ of dimension $(p + n)$ is a Gaussian vector with mean $\begin{pmatrix} \beta \\ 0 \end{pmatrix}$ and variance

$$\sigma^2 \begin{pmatrix} (X^T X)^{-1} & 0 \\ 0 & I_n - X (X^T X)^{-1} X^T \end{pmatrix}$$
Law of $\hat\sigma^2$:

$$\frac{n-p}{\sigma^2}\,\hat\sigma^2 \sim \chi^2_{n-p}$$

We note $\hat\sigma^2 = \frac{\|\hat\varepsilon\|^2}{n-p}$, with $\|\hat\varepsilon\|^2 = \sum_{i=1}^n \hat\varepsilon_i^2$; $\|\hat\varepsilon\|^2$ follows a $\sigma^2 \chi^2(n-p)$ law (Cochran theorem).

Then the expectation of $\hat\sigma^2 = \frac{\|\hat\varepsilon\|^2}{n-p}$ is $\sigma^2$ (since $\mathbb{E}(\chi^2(n-p)) = n-p$).

We deduce the law of $\hat\sigma^2$: $\hat\sigma^2 \sim \frac{\sigma^2}{n-p}\,\chi^2(n-p)$.

Recall (Student theorem): if $U \sim \mathcal{N}(0,1)$ and $V \sim \chi^2(d)$, with $U$ and $V$ independent, then $Z = \frac{U}{\sqrt{V/d}}$ follows a Student law with parameter $d$.
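A small Monte Carlo sketch of this law, under an assumed simulation setup: draw many regressions with known $\sigma$ and compare $(n-p)\hat\sigma^2/\sigma^2$ with the $\chi^2(n-p)$ density.

```r
set.seed(12)
n <- 30; p <- 3; sigma <- 2
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))

stat <- replicate(5000, {
  y <- drop(X %*% c(1, -1, 0.5)) + rnorm(n, sd = sigma)
  e <- residuals(lm(y ~ X[, -1]))
  sum(e^2) / sigma^2                  # = (n-p) * sigma2_hat / sigma^2
})
c(mean(stat), n - p)                  # the mean of a chi2(n-p) is n-p
hist(stat, freq = FALSE, breaks = 40)
curve(dchisq(x, df = n - p), add = TRUE)
```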
Significance test of $\beta_j$, $\sigma^2$ unknown

• Student statistic: $T$
• Significance test (two-sided):
  - $H_0$: $\beta_j = 0$
  - $H_1$: $\beta_j \neq 0$
• Decision with risk $\alpha$: reject $H_0$ if
  - $\left|\dfrac{\hat\beta_j}{\sqrt{\hat\sigma^2 S_{j,j}}}\right| > t_{n-p}(1 - \alpha/2)$, with $S_{j,j}$ the $j$-th diagonal term of $(X^T X)^{-1}$
  - equivalently, if the p-value $< \alpha$
• Conclusion:
  - $\hat\beta_j$ is significantly different from zero
  - $X_j$ has an influence in the model

Not true if there is collinearity between the variables.
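A minimal sketch recomputing the Student statistic and its p-value by hand and comparing with the `summary.lm` table (simulated data):

```r
set.seed(13)
n <- 60; x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + rnorm(n)
fit <- lm(y ~ x1 + x2); p <- 3

X <- model.matrix(fit)
S <- solve(t(X) %*% X)
sigma2_hat <- sum(residuals(fit)^2) / (n - p)

j <- 2                                    # test beta_{x1}
t_j <- coef(fit)[j] / sqrt(sigma2_hat * S[j, j])
pval <- 2 * (1 - pt(abs(t_j), df = n - p))
c(t_j, pval)                              # matches summary(fit)$coefficients[j, 3:4]
```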
Global significance of the model

Test of the model with risk $\alpha$:
$H_0$: $\beta_2 = \beta_3 = \dots = \beta_p = 0$
$H_1$: $\exists j \in \{2, \dots, p\}$, $\beta_j \neq 0$

Statistic:
$$F = \frac{n-p}{p-1} \cdot \frac{\|\hat Y - \bar Y\|^2}{\|Y - \hat Y\|^2} \sim \mathrm{Fisher}(p-1,\ n-p)$$

Remark: $\frac{n-p}{p-1} \frac{\|\hat Y - \bar Y\|^2}{\|Y - \hat Y\|^2} = \frac{SSE/(p-1)}{SSR/(n-p)}$ (E: estimated; R: residuals)

Decision rule:
• if $F_{obs} > q_{F_\alpha}$, $H_0$ is rejected: there exists a coefficient that is not zero; the regression is "useful"
• if $F_{obs} \le q_{F_\alpha}$, $H_0$ is accepted: all the coefficients are supposed to be null; the regression is not "useful"
Global significance of the model

• Fisher statistic
• Significance test:
  - $H_0$: $\beta_2 = \dots = \beta_p = 0$
  - $H_1$: $\exists \beta_j \neq 0$
• Decision with risk $\alpha$: reject $H_0$ if
  - $\frac{n-p}{p-1} \cdot \frac{R^2}{1-R^2} > f_{p-1,\,n-p}(1-\alpha)$
  - or if the p-value $< \alpha$

→ The linear model has an added value.
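A minimal check, on simulated data, that this R²-based formula reproduces the F statistic printed by `summary.lm`:

```r
set.seed(14)
n <- 100; x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 + rnorm(n)
fit <- lm(y ~ x1 + x2); p <- 3

R2 <- summary(fit)$r.squared
F_manual <- (n - p) / (p - 1) * R2 / (1 - R2)
c(F_manual, summary(fit)$fstatistic["value"])   # match
```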
Global significance of the model

Remark 1, on the Fisher statistic:
$$F = \frac{n-p}{p-1} \cdot \frac{R^2}{1-R^2}$$
The $R^2$ coefficient increases mechanically with the number of variables.

Remark 2: the adjusted $R^2$ may be used:
$$R^2_{adj} = 1 - \frac{(1-R^2)(n-1)}{n-p}$$
The $R^2_{adj}$ does not increase mechanically with the number of variables.
Boston Housing Data

The original data are $n = 506$ observations on $p = 14$ variables, medv (median value of owner-occupied homes) being the target variable; the variables are described earlier in this lesson.
Boston Housing Data

The data:

nr  crim     zn  indus  chas  nox    rm     age   dis     rad  tax  ptratio  b       lstat  medv
1   0.00632  18  2.31   0     0.538  6.575  65.2  4.0900  1    296  15.3     396.90  4.98   24.0
2   0.02731  0   7.07   0     0.469  6.421  78.9  4.9671  2    242  17.8     396.90  9.14   21.6
3   0.02729  0   7.07   0     0.469  7.185  61.1  4.9671  2    242  17.8     392.83  4.03   34.7
4   0.03237  0   2.18   0     0.458  6.998  45.8  6.0622  3    222  18.7     394.63  2.94   33.4
5   0.06905  0   2.18   0     0.458  7.147  54.2  6.0622  3    222  18.7     396.90  5.33   36.2
... ...      ... ...    ...   ...    ...    ...   ...     ...  ...  ...      ...     ...    ...
Boston Housing Data

MCO (OLS) in R:

```r
library(mlbench)

# Data
data(BostonHousing)
tab = BostonHousing; names(tab)
target = "medv"; Y = tab[, target]; X = tab[, names(tab) != target]; names(X)

# OLS fit
resfit = lsfit(x = X, y = Y, intercept = TRUE)
resfit$coef
hist(resfit$res)
```

Estimated coefficients (rounded):

```
  Cst  crim    zn indus chas    nox   rm  age   dis  rad   tax ptratio     b lstat
36.45 -0.10 0.046 0.020 2.68 -17.76 3.80 0.00 -1.47 0.30 -0.01   -0.95 0.009 -0.52
```
Boston Housing Data

Linear model in R:

```r
reslm = lm(medv ~ ., data = tab)
summary(reslm)
```

Results ($n = 506$, $p = 14$):

```
Residuals:
    Min      1Q  Median      3Q     Max
-15.595  -2.730  -0.518   1.777  26.199

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.646e+01  5.103e+00   7.144 3.28e-12 ***
crim        -1.080e-01  3.286e-02  -3.287 0.001087 **
zn           4.642e-02  1.373e-02   3.382 0.000778 ***
indus        2.056e-02  6.150e-02   0.334 0.738288
chas1        2.687e+00  8.616e-01   3.118 0.001925 **
nox         -1.777e+01  3.820e+00  -4.651 4.25e-06 ***
rm           3.810e+00  4.179e-01   9.116  < 2e-16 ***
age          6.922e-04  1.321e-02   0.052 0.958229
dis         -1.476e+00  1.995e-01  -7.398 6.01e-13 ***
rad          3.060e-01  6.635e-02   4.613 5.07e-06 ***
tax         -1.233e-02  3.760e-03  -3.280 0.001112 **
ptratio     -9.527e-01  1.308e-01  -7.283 1.31e-12 ***
b            9.312e-03  2.686e-03   3.467 0.000573 ***
lstat       -5.248e-01  5.072e-02 -10.347  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.745 on 492 degrees of freedom
Multiple R-squared: 0.7406, Adjusted R-squared: 0.7338
F-statistic: 108.1 on 13 and 492 DF, p-value: < 2.2e-16
```
Precautions

• Multicollinearity: the OLS solution requires computing $(X^T X)^{-1}$. When the determinant of this matrix is very close to zero, the problem is ill-conditioned.
• Choice of variables: the coefficient of determination $R^2$ increases with the number of variables; if $p = n$, then $R^2 = 1$, which is not necessarily meaningful.
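A minimal sketch of the multicollinearity warning, on simulated data: as two covariates become nearly identical, the condition number of $X^T X$ explodes and the OLS coefficients become unstable.

```r
set.seed(15)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 1e-4)   # almost perfectly collinear with x1
X <- cbind(1, x1, x2)
y <- 1 + x1 + rnorm(n)

kappa(t(X) %*% X)                # huge condition number: ill-conditioned problem
coef(lm(y ~ x1 + x2))            # wildly large, offsetting coefficients
```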
Demonstration in R