Sparse Inverse Covariance Estimation with the Graphical Lasso
Paper by: Jerome Friedman, Trevor Hastie, and Robert Tibshirani
Presented by: Joseph Lubars
Preliminaries: LASSO
• We have a model 𝑌 = 𝑋𝛽 + 𝐸, where 𝐸 ∼ 𝑁(0, 𝜎²𝐼)
• 𝑌 ∈ ℝⁿ is the vector of observations, 𝑋 ∈ ℝ^{𝑛×𝑝} the observation matrix
• Want to estimate 𝛽
• Use MLE (Least Squares):
  𝛽 = argmin_𝛽 ‖𝑌 − 𝑋𝛽‖₂²
• Closed form solution: 𝛽 = (𝑋ᵀ𝑋)⁻¹𝑋ᵀ𝑌 (a small numpy sketch follows this slide)
• What if 𝑝 > 𝑛? What if we want sparsity?
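As a quick illustration of the closed form, here is a toy numpy sketch; the data and dimensions are invented here, not taken from the paper.

```python
# Toy illustration of the closed-form least-squares solution (data invented here).
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))                      # observation matrix
beta_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
Y = X @ beta_true + 0.1 * rng.standard_normal(n)     # Y = X beta + E

# beta_hat = (X^T X)^{-1} X^T Y, only well defined when X^T X is invertible (needs n >= p)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.round(beta_hat, 2))
```

When 𝑝 > 𝑛, the matrix 𝑋ᵀ𝑋 is singular and this closed form breaks down, which is one motivation for the penalized approach on the next slide.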
LASSO (Continued)
• What if we want sparsity in our solution? One attempt:
  𝛽 = argmin_𝛽 ‖𝑌 − 𝑋𝛽‖₂² s.t. ‖𝛽‖₀ = 𝑐
• This problem is not convex and is very difficult to solve.
• We relax the 𝐿0 norm to the 𝐿1 norm and use the Lagrangian version:
  𝛽 = argmin_𝛽 ‖𝑌 − 𝑋𝛽‖₂² + 𝜆‖𝛽‖₁
• This is called LASSO!
• Can be solved efficiently using coordinate descent (a rough sketch follows this slide)
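Below is a rough, self-contained sketch of how cyclic coordinate descent with soft-thresholding could solve the LASSO objective above. The helper names and setup are mine, not the authors'.

```python
# A rough numpy sketch of cyclic coordinate descent for the LASSO objective
#   min_beta ||Y - X beta||_2^2 + lam * ||beta||_1
# (illustrative only; not the paper's implementation).
import numpy as np

def soft_threshold(x, t):
    """S(x, t) = sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(X, Y, lam, n_iter=100):
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)              # ||X_j||^2 for each column
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j's contribution added back
            r = Y - X @ beta + X[:, j] * beta[j]
            # one-dimensional problem: ||r - X_j b||^2 + lam * |b|
            beta[j] = soft_threshold(X[:, j] @ r, lam / 2.0) / col_sq[j]
    return beta
```

On toy data like the earlier sketch, increasing 𝜆 should drive more coefficients of lasso_cd(X, Y, lam) to exactly zero.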
Recall: Gaussian Graphical Models
• We have a Gaussian random vector 𝑥 ∈ ℝᵖ
• Covariance Form: 𝑥 ∼ 𝑁(𝜇, Σ)
• Information Form: 𝑥 ∼ 𝑁⁻¹(ℎ, Θ)
• Θ = Σ⁻¹ encodes conditional independences: Θᵢⱼ = 0 exactly when 𝑥ᵢ and 𝑥ⱼ are conditionally independent given the rest
• We find the graph structure by estimating the non-zero entries of Θ
Estimating Θ
• Given data 𝑋 ∈ ℝ^{𝑛×𝑝} with 𝑛 observations of 𝑥 ∼ 𝑁(0, Σ)
• We can calculate the empirical covariance matrix: 𝑆 = (1/𝑛) 𝑋ᵀ𝑋
• Goal: Estimate Θ = Σ⁻¹
• Log-Likelihood for Θ (up to constants):
  log det Θ − tr(𝑆Θ)
• Maximized when Θ = 𝑆⁻¹ (a small numerical check follows this slide)
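A small numpy sketch of the empirical covariance and this log-likelihood; it only checks numerically that Θ = 𝑆⁻¹ scores no worse than a perturbed alternative (toy data and variable names are mine).

```python
# Sketch: empirical covariance S = (1/n) X^T X and the log-likelihood
#   ell(Theta) = log det Theta - tr(S Theta)   (constants dropped)
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.standard_normal((n, p))          # pretend these are draws from N(0, Sigma)
S = (X.T @ X) / n                        # empirical covariance (mean assumed zero)

def log_lik(Theta, S):
    sign, logdet = np.linalg.slogdet(Theta)
    return logdet - np.trace(S @ Theta)  # valid when Theta is positive definite

Theta_mle = np.linalg.inv(S)             # unpenalized maximizer
Theta_other = Theta_mle + 0.1 * np.eye(p)
print(log_lik(Theta_mle, S) >= log_lik(Theta_other, S))   # expect True
```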
Graphical Lasso Formulation
• Want to encourage sparsity in Θ
• 𝐿1 penalized formulation:
  log det Θ − tr(𝑆Θ) − 𝜆‖Θ‖₁
• We want to maximize this over non-negative definite matrices Θ
• This is an SDP, but we want to solve it more efficiently
Optimality Conditions
• Recall the objective:
  max_Θ (log det Θ − tr(𝑆Θ) − 𝜆‖Θ‖₁)
• KKT Conditions:
  Θ⁻¹ − 𝑆 − 𝜆Γ = 0
• Γ = (𝛾ᵢⱼ) is a subgradient of ‖Θ‖₁: 𝛾ᵢⱼ = sign(Θᵢⱼ) if Θᵢⱼ ≠ 0, 𝛾ᵢⱼ ∈ [−1, 1] otherwise
• We will estimate Θ⁻¹ as 𝑊
Optimality Conditions, Blockwise
• Consider blocks of 𝑊 and 𝑆:
  𝑊 = [ 𝑊11  𝑤12 ; 𝑤21  𝑤22 ],   𝑆 = [ 𝑆11  𝑠12 ; 𝑠21  𝑠22 ]
• Write the conditions from the previous page for the upper-right block:
  𝑤12 − 𝑠12 − 𝜆𝛾12 = 0
• Time for the magic… (a small partitioning helper follows this slide)
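For concreteness, here is one way the blockwise partition could be coded; the helper name and the choice to partition out column j are illustrative, not from the paper.

```python
# Sketch: partition a p x p matrix M into (M11, m12, m22) with respect to
# row/column j, matching the W11 / w12 / w22 notation on this slide.
import numpy as np

def partition(M, j):
    idx = np.array([k for k in range(M.shape[0]) if k != j])
    M11 = M[np.ix_(idx, idx)]   # (p-1) x (p-1) block
    m12 = M[idx, j]             # off-diagonal column (without the diagonal entry)
    m22 = M[j, j]               # diagonal entry
    return M11, m12, m22
```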
A Certain Equivalence
• Consider the following Quadratic Program:
  min_{𝛽∈ℝ^{𝑝−1}} { ½ 𝛽ᵀ𝑊11𝛽 − 𝛽ᵀ𝑠12 + 𝜆‖𝛽‖₁ }
• Its KKT Conditions (𝜌 a subgradient of ‖𝛽‖₁):
  𝑊11𝛽 − 𝑠12 + 𝜆𝜌 = 0
• The Conditions for the Upper-Right Block:
  𝑤12 − 𝑠12 − 𝜆𝛾12 = 0
• Equivalent if:
  • 𝛽 = 𝑊11⁻¹𝑤12
  • 𝜌 = −𝛾12
Exploring the Equivalence
• All right, so what if 𝛽 = 𝑊11⁻¹𝑤12?
• We know how to do block matrix inverses from class:
  𝑊 = [ 𝑊11  𝑤12 ; 𝑤21  𝑤22 ] = [ (Θ11 − Θ12Θ21/Θ22)⁻¹   −𝑊11Θ12/Θ22 ;
                                    −Θ21𝑊11/Θ22            1/Θ22 + Θ21𝑊11Θ12/Θ22² ]
• So 𝛽 = 𝑊11⁻¹𝑤12 = −Θ12/Θ22, and we also have 𝜌 = −𝛾12 (a quick numerical check of this identity follows this slide)
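A quick numerical sanity check of the identity 𝛽 = 𝑊11⁻¹𝑤12 = −Θ12/Θ22 for a random positive-definite Θ (variable names are mine).

```python
# Sanity check: beta = W11^{-1} w12 equals -Theta12 / Theta22, where W = Theta^{-1}
# and the last row/column plays the role of block 2.
import numpy as np

rng = np.random.default_rng(2)
p = 5
A = rng.standard_normal((p, p))
Theta = A @ A.T + p * np.eye(p)          # random positive-definite precision matrix
W = np.linalg.inv(Theta)

W11, w12 = W[:-1, :-1], W[:-1, -1]
theta12, theta22 = Theta[:-1, -1], Theta[-1, -1]

beta = np.linalg.solve(W11, w12)
print(np.allclose(beta, -theta12 / theta22))                # expect True
```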
How Does This Help Us?
• We can now solve one block of our objective by solving:
  min_{𝛽∈ℝ^{𝑝−1}} ( ½ 𝛽ᵀ𝑊11𝛽 − 𝛽ᵀ𝑠12 + 𝜆‖𝛽‖₁ )
• But this is secretly LASSO (the two objectives differ only by a constant in 𝛽):
  min_𝛽 ( ½ ‖𝑊11^{1/2}𝛽 − 𝑊11^{−1/2}𝑠12‖₂² + 𝜆‖𝛽‖₁ )
• And we can solve LASSO efficiently! (a numerical check of the equivalence follows this slide)
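To see the "secretly LASSO" claim concretely, the sketch below checks that the two quadratic parts differ only by a constant that does not depend on 𝛽, so they share minimizers (matrix square roots via an eigendecomposition; toy data and names are mine).

```python
# Check that  1/2 b^T W11 b - b^T s12   and   1/2 ||W11^{1/2} b - W11^{-1/2} s12||^2
# differ by a constant independent of b.
import numpy as np

rng = np.random.default_rng(3)
p = 4
A = rng.standard_normal((p, p))
W11 = A @ A.T + p * np.eye(p)            # positive-definite stand-in for W11
s12 = rng.standard_normal(p)

# symmetric square root and inverse square root of W11 via eigendecomposition
vals, vecs = np.linalg.eigh(W11)
W_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
W_inv_half = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def quad(b):
    return 0.5 * b @ W11 @ b - b @ s12

def lasso_quad(b):
    r = W_half @ b - W_inv_half @ s12
    return 0.5 * r @ r

b1, b2 = rng.standard_normal(p), rng.standard_normal(p)
print(np.isclose(lasso_quad(b1) - quad(b1), lasso_quad(b2) - quad(b2)))  # expect True
```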
Our Strategy
1. Start with 𝑊 = 𝑆 + 𝜆𝐼
2. Solve the LASSO sub-problem (and save the value of 𝛽):
   𝛽 = argmin_𝛽 ( ½ ‖𝑊11^{1/2}𝛽 − 𝑊11^{−1/2}𝑠12‖₂² + 𝜆‖𝛽‖₁ )
3. Update 𝑤12 and 𝑤21 using 𝑤12 = 𝑊11𝛽
4. Rearrange 𝑊 so the next row and column are in position 12
5. Repeat steps 2-4 until convergence
6. Calculate the diagonals of Θ (using Θ22 = 1/(𝑤22 − 𝑤12ᵀ𝛽))
7. Use the most recent values of 𝛽 to complete Θ (using 𝛽 = −Θ12/Θ22, i.e. Θ12 = −𝛽Θ22)
(A full numpy sketch of this strategy follows this slide.)
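Putting the steps together, here is a minimal, self-contained numpy sketch of this strategy. It is one possible reading of steps 1-7, not the authors' reference implementation; the inner solver uses the coordinate update from the "Solving LASSO efficiently" slide, and all helper names are mine.

```python
# A minimal numpy sketch of the graphical lasso strategy on this slide
# (one reading of steps 1-7; illustrative only).
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def graphical_lasso(S, lam, n_sweeps=100, inner_iters=50, tol=1e-6):
    p = S.shape[0]
    W = S + lam * np.eye(p)                  # step 1
    betas = np.zeros((p, p - 1))             # one saved beta per column (step 2)

    for _ in range(n_sweeps):                # step 5: repeat until convergence
        W_old = W.copy()
        for j in range(p):                   # step 4: cycle through the columns
            idx = np.concatenate([np.arange(j), np.arange(j + 1, p)])
            W11 = W[np.ix_(idx, idx)]
            s12 = S[idx, j]
            beta = betas[j]                  # warm start from the previous sweep
            for _ in range(inner_iters):     # step 2: LASSO sub-problem by coord. descent
                for k in range(p - 1):
                    r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
                    beta[k] = soft_threshold(r, lam) / W11[k, k]
            w12 = W11 @ beta                 # step 3: update w12 and w21
            W[idx, j] = w12
            W[j, idx] = w12
        if np.abs(W - W_old).mean() < tol:
            break

    Theta = np.zeros((p, p))
    for j in range(p):                       # steps 6-7: recover Theta from W and beta
        idx = np.concatenate([np.arange(j), np.arange(j + 1, p)])
        beta = betas[j]
        theta_jj = 1.0 / (W[j, j] - W[idx, j] @ beta)
        Theta[j, j] = theta_jj
        Theta[idx, j] = -beta * theta_jj
        Theta[j, idx] = -beta * theta_jj
    return Theta, W
```

To try it, one could build 𝑆 from simulated data (as in the earlier covariance sketch) and inspect how the zero pattern of the returned Θ changes as 𝜆 grows.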
What is Going On?
• We used block coordinate descent:
• Start with a problem of optimizing 𝑊
• Break it into a number of smaller sub-problems (blocks of 𝑊)
• Solve the sub-problems and update the associated variables in 𝑊
• Because our problem is convex, block coordinate descent is guaranteed to converge
• Each of the sub-problems is equivalent to LASSO, so we can solve them efficiently!
Solving LASSO efficiently
• We have the problem:
  min_𝛽 ( ½ ‖𝑊11^{1/2}𝛽 − 𝑊11^{−1/2}𝑠12‖₂² + 𝜆‖𝛽‖₁ )
• We don't actually have to compute these matrix powers and multiplications, since we can calculate a closed-form solution for each coordinate without them:
  𝛽_𝑗 = 𝑆( (𝑠12)_𝑗 − Σ_{𝑘≠𝑗} (𝑊11)_{𝑘𝑗} 𝛽_𝑘 , 𝜆 ) / (𝑊11)_{𝑗𝑗}
• Here 𝑆 is the soft-thresholding operator: 𝑆(𝑥, 𝑡) = sign(𝑥)(|𝑥| − 𝑡)₊
• Cycling this update over 𝑗 until convergence solves the sub-problem (a minimal sketch follows this slide)
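In code, this coordinate update might look like the following minimal sketch; W11, s12, beta and lam are placeholders standing in for the quantities on this slide.

```python
# One pass of the coordinate-wise update on this slide:
#   beta_j = S( (s12)_j - sum_{k != j} (W11)_{kj} beta_k , lam ) / (W11)_{jj}
import numpy as np

def soft_threshold(x, t):
    """S(x, t) = sign(x) * (|x| - t)_+"""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def coordinate_sweep(W11, s12, beta, lam):
    for j in range(len(beta)):
        partial = s12[j] - W11[j] @ beta + W11[j, j] * beta[j]   # excludes the k = j term
        beta[j] = soft_threshold(partial, lam) / W11[j, j]
    return beta
```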
Questions?