Differential Network Analysis via Lasso Penalized D-Trace Loss · 2019. 3. 31.


Differential Network Analysis via Lasso Penalized D-Trace Loss

Ruibin Xi

School of Mathematical Sciences and Center for Statistical Science, Peking University

Joint work with Huili Yuan, Chong Chen and Minghua Deng

April 22, 2018



Gaussian Graphical Model

In a Gaussian graphical model, the precision matrix is $\Theta = \Sigma^{-1}$.

Nonzero elements of $\Theta$ correspond to edges in the Gaussian graphical model: if $x \sim N_p(0, \Sigma)$, then $\Theta_{ij} = 0$ iff $x_i \perp x_j \mid \{x_k, k \neq i, j\}$ (Whittaker, 1990).

We can impose sparsity on Θ to study the Gaussian graphical model.
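As a small illustration of the link between zeros of the precision matrix and missing edges, the sketch below builds an AR(1)-type covariance (an illustrative choice, not taken from the slides) and reads the graph off its inverse.

```python
import numpy as np

p = 5
idx = np.arange(p)
# Illustrative covariance: Sigma_ij = 0.5 ** |i - j|.
sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
theta = np.linalg.inv(sigma)  # precision matrix

# Keep the pairs (i, j), i < j, whose precision entry is not numerically zero.
edges = [(i, j) for i in range(p) for j in range(i + 1, p)
         if abs(theta[i, j]) > 1e-8]
print(edges)
```

Only neighbouring variables are connected here, because the inverse of this covariance is tridiagonal.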



Gaussian Graphical Model

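Meinshausen and Bühlmann (2006): neighborhood selection scheme based on lasso-penalized regression.

Yuan and Lin (2006) and Friedman et al. (2007) proposed to estimate $\Theta$ by minimizing

$$-\log\det\Theta + \operatorname{tr}(\Theta\hat\Sigma) + \lambda|\Theta|_{1,\mathrm{off}}$$

Zhang and Zou (2014) proposed minimizing

$$L_D(\Theta, \hat\Sigma) + \lambda|\Theta|_{1,\mathrm{off}} = \frac{1}{2}\langle\Theta^2, \hat\Sigma\rangle - \operatorname{tr}(\Theta) + \lambda|\Theta|_{1,\mathrm{off}},$$

where $\hat\Sigma$ is the sample covariance matrix and $\langle A, B\rangle = \operatorname{tr}(A^{T}B)$.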
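To see why the unpenalized loss of Zhang and Zou targets the precision matrix, the sketch below checks numerically that its gradient vanishes at $\Sigma^{-1}$ and that symmetric perturbations do not decrease the loss. The matrix `sigma` is a randomly generated positive definite stand-in, and the function name `dtrace_loss` is only illustrative.

```python
import numpy as np

def dtrace_loss(theta, sigma):
    """Unpenalized loss 0.5 * <Theta^2, Sigma> - tr(Theta) for symmetric Theta."""
    return 0.5 * np.trace(theta @ theta @ sigma) - np.trace(theta)

rng = np.random.default_rng(0)
p = 4
b = rng.standard_normal((p, p))
sigma = b @ b.T + p * np.eye(p)       # positive definite stand-in for a covariance
theta_star = np.linalg.inv(sigma)

# Gradient for symmetric Theta: 0.5 * (Theta Sigma + Sigma Theta) - I.
grad = 0.5 * (theta_star @ sigma + sigma @ theta_star) - np.eye(p)

# Random symmetric perturbations should never decrease the loss.
base = dtrace_loss(theta_star, sigma)
worse = []
for _ in range(20):
    e = rng.standard_normal((p, p))
    e = 0.1 * (e + e.T)
    worse.append(dtrace_loss(theta_star + e, sigma) >= base)
print(np.abs(grad).max(), all(worse))
```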



Difference of Precision Matrices

Suppose that $X_1, \dots, X_{n_X} \sim N(0, \Sigma_X)$ and $Y_1, \dots, Y_{n_Y} \sim N(0, \Sigma_Y)$; we try to estimate

$$\Delta = \Theta_X - \Theta_Y = \Sigma_X^{-1} - \Sigma_Y^{-1}.$$

$\Theta_X$ may be the gene regulatory network in a normal condition, and $\Theta_Y$ may be the gene regulatory network in a perturbed condition.

We are more interested in the “change” of the network.

The change can be measured by $\Delta = \Theta_X - \Theta_Y$.

Usually we can assume that the changes are sparse.



Difference network

Zhou et al., Nature (1995) found a network-attacking mutation at RET in cancer.

Bandyopadhyay et al., Science (2010) profiled genetic interaction differences with and without a DNA-damaging agent.

Guenole et al., Cell (2013) studied genetic interaction changes in different conditions.



Difference network

Guenole et al. Cell (2013)




Existing works

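Danaher et al. (2014) considered minimizing

$$-\sum_{k=1}^{K}\left[n_k\log\det(\Theta_k) - \operatorname{tr}(\Sigma_k\Theta_k)\right] + \lambda_1\sum_{k=1}^{K}|\Theta_k|_{1,\mathrm{off}} + \lambda_2\sum_{k<k'}|\Theta_k - \Theta_{k'}|_1$$

If $K = 2$, this can be used for estimating the difference of the precision matrices. No theoretical development.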

Zhao et al. (2014) proposed estimating $\Delta = \Theta_X - \Theta_Y$ by solving

$$\arg\min_{\Delta}|\Delta|_1 \quad \text{subject to} \quad |\Sigma_X\Delta\Sigma_Y - (\Sigma_Y - \Sigma_X)|_\infty \le \lambda_n.$$

The constraint relaxes the identity $\Sigma_X\Delta\Sigma_Y = \Sigma_Y - \Sigma_X$, which holds for $\Delta = \Theta_X - \Theta_Y$.

Advantage: no need to specify the sparsity of $\Sigma_X$ and $\Sigma_Y$. Computational complexity $O(p^4)$: very expensive!

To solve this problem, Zhao et al. (2014) have to transform it into

$$\arg\min_{\Delta}|\Delta|_1 \quad \text{subject to} \quad |(\Sigma_X\otimes\Sigma_Y)\operatorname{vec}(\Delta) - \operatorname{vec}(\Sigma_Y - \Sigma_X)|_\infty \le \lambda_n.$$

Then the problem can be solved with a method developed for linear regression (implemented in the R package flare).
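The vectorized form can be checked numerically. The sketch below uses randomly generated symmetric matrices as stand-ins for the covariances and assumes the row-stacking convention for vec (NumPy's default `ravel`); it also shows that the resulting design matrix is $p^2 \times p^2$, which is where the heavy cost comes from.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
bx, by = rng.standard_normal((2, p, p))
sigma_x = bx @ bx.T + np.eye(p)   # stand-in covariance for X
sigma_y = by @ by.T + np.eye(p)   # stand-in covariance for Y
delta = rng.standard_normal((p, p))
delta = delta + delta.T           # any symmetric candidate difference

# With row stacking, vec(A X B) = (A kron B^T) vec(X); B^T = B because
# Sigma_Y is symmetric.
lhs = (sigma_x @ delta @ sigma_y).ravel()
kron = np.kron(sigma_x, sigma_y)
rhs = kron @ delta.ravel()

print(kron.shape, np.allclose(lhs, rhs))
```

Already for $p = 1000$ the Kronecker matrix would have $10^{12}$ entries, so forming it explicitly is impractical.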



A new loss function

Estimation is based on a new loss function. The idea is to find a new loss function $L_D(\Delta \mid \Sigma_X, \Sigma_Y)$ such that

$L_D$ is convex in $\Delta$;

$L_D$ achieves its minimum at $\Theta_X - \Theta_Y$.



A new loss function

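We chose

$$L_D(\Delta \mid \Sigma_X, \Sigma_Y) = \frac{1}{2}\left(\langle\Sigma_X\Delta, \Delta\Sigma_Y\rangle + \langle\Sigma_Y\Delta, \Delta\Sigma_X\rangle\right) - 2\langle\Delta, \Sigma_Y - \Sigma_X\rangle$$

Note that

$$\frac{\partial L_D}{\partial\Delta} = \Sigma_X\Delta\Sigma_Y + \Sigma_Y\Delta\Sigma_X - 2(\Sigma_Y - \Sigma_X)$$

If $\Delta = \Theta_X - \Theta_Y = \Sigma_X^{-1} - \Sigma_Y^{-1}$, we have $\frac{\partial L_D}{\partial\Delta}(\Delta) = 0$.

$$\frac{\partial^2 L_D}{\partial\Delta^2} = \Sigma_X\otimes\Sigma_Y + \Sigma_Y\otimes\Sigma_X \succeq 0$$

We also call $L_D(\Delta \mid \Sigma_X, \Sigma_Y)$ the D-trace loss.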
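The two properties can be checked numerically. The sketch below reads the quadratic part of the loss as $\operatorname{tr}(\Delta\Sigma_X\Delta\Sigma_Y)$ for symmetric $\Delta$, which is the reading that reproduces the stated gradient; the covariances are random positive definite stand-ins and the function names are illustrative.

```python
import numpy as np

def d_trace_loss(delta, sigma_x, sigma_y):
    """D-trace loss for symmetric Delta, quadratic part tr(Delta Sx Delta Sy)."""
    quad = np.trace(delta @ sigma_x @ delta @ sigma_y)
    return quad - 2.0 * np.trace(delta @ (sigma_y - sigma_x))

def d_trace_grad(delta, sigma_x, sigma_y):
    """Gradient Sx Delta Sy + Sy Delta Sx - 2 (Sy - Sx)."""
    return (sigma_x @ delta @ sigma_y + sigma_y @ delta @ sigma_x
            - 2.0 * (sigma_y - sigma_x))

rng = np.random.default_rng(2)
p = 4
bx, by = rng.standard_normal((2, p, p))
sigma_x = bx @ bx.T + p * np.eye(p)
sigma_y = by @ by.T + p * np.eye(p)

# The gradient should vanish at Theta_X - Theta_Y.
delta_star = np.linalg.inv(sigma_x) - np.linalg.inv(sigma_y)
grad_at_star = d_trace_grad(delta_star, sigma_x, sigma_y)

# Central finite difference along a symmetric direction versus the formula.
delta = rng.standard_normal((p, p))
delta = delta + delta.T
direction = rng.standard_normal((p, p))
direction = direction + direction.T
eps = 1e-4
fd = (d_trace_loss(delta + eps * direction, sigma_x, sigma_y)
      - d_trace_loss(delta - eps * direction, sigma_x, sigma_y)) / (2 * eps)
analytic = np.sum(d_trace_grad(delta, sigma_x, sigma_y) * direction)
print(np.abs(grad_at_star).max(), fd, analytic)
```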



Penalized D-trace loss

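We use the lasso-penalized D-trace loss to estimate the difference of precision matrices.

Given $X_1, \dots, X_{n_X} \sim N(0, \Sigma_X)$ and $Y_1, \dots, Y_{n_Y} \sim N(0, \Sigma_Y)$.

Let $\hat\Sigma_X$ and $\hat\Sigma_Y$ be the sample covariance matrices.

Estimate $\Delta$ by minimizing

$$L_D(\Delta \mid \hat\Sigma_X, \hat\Sigma_Y) + \lambda|\Delta|_1 = \frac{1}{2}\left(\langle\hat\Sigma_X\Delta, \Delta\hat\Sigma_Y\rangle + \langle\hat\Sigma_Y\Delta, \Delta\hat\Sigma_X\rangle\right) - 2\langle\Delta, \hat\Sigma_Y - \hat\Sigma_X\rangle + \lambda|\Delta|_1$$

How to solve this?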




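The augmented Lagrangian:

$$\begin{aligned}
L(\Delta_1, \Delta_2, \Delta_3, \Lambda_1, \Lambda_2, \Lambda_3)
={}& \frac{1}{2}\left(\langle\hat\Sigma_Y\Delta_1, \Delta_1\hat\Sigma_X\rangle + \langle\hat\Sigma_X\Delta_2, \Delta_2\hat\Sigma_Y\rangle\right) - \langle\Delta_1, \hat\Sigma_Y - \hat\Sigma_X\rangle - \langle\Delta_2, \hat\Sigma_Y - \hat\Sigma_X\rangle + \lambda\|\Delta_3\|_1 \\
&+ \frac{\rho}{2}\|\Delta_1 - \Delta_2\|_F^2 + \frac{\rho}{2}\|\Delta_2 - \Delta_3\|_F^2 + \frac{\rho}{2}\|\Delta_3 - \Delta_1\|_F^2 \\
&+ \langle\Lambda_1, \Delta_1 - \Delta_2\rangle + \langle\Lambda_2, \Delta_2 - \Delta_3\rangle + \langle\Lambda_3, \Delta_3 - \Delta_1\rangle.
\end{aligned}$$

Iteratively update $\Delta_1, \Delta_2, \Delta_3, \Lambda_1, \Lambda_2, \Lambda_3$.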




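Taking the partial derivative with respect to $\Delta_1$ and setting it to zero, we get

$$\frac{\partial L}{\partial\Delta_1} = \hat\Sigma_Y\Delta_1\hat\Sigma_X + 2\rho\Delta_1 - \rho(\Delta_2 + \Delta_3) + \Lambda_1 - \Lambda_3 - (\hat\Sigma_Y - \hat\Sigma_X) = 0$$

This is an equation of the form $A\Delta B + \xi\Delta = C$, with $A = \hat\Sigma_Y$, $B = \hat\Sigma_X$, $\xi = 2\rho$ and $C = \rho(\Delta_2 + \Delta_3) + \Lambda_3 - \Lambda_1 + \hat\Sigma_Y - \hat\Sigma_X$.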



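Lemma (1)

Assume that $A$, $B$ are symmetric positive semidefinite matrices, $C$ is a symmetric matrix and $\rho$ is a positive real number. Let $G(A, B, C, \rho)$ be the solution to the equation

$$A\Delta B + \rho\Delta = C,$$

then

$$G(A, B, C, \rho) = U_A\left[D\circ(U_A^{T}CU_B)\right]U_B^{T},$$

where $A = U_A\Sigma_AU_A^{T}$ and $B = U_B\Sigma_BU_B^{T}$ are eigendecompositions, $\circ$ is the elementwise product and $D_{ij} = 1/(\sigma_i^A\sigma_j^B + \rho)$.

The proof of lemma 1

Proof. First, we vectorize the original equation, stacking the rows of a matrix so that $\operatorname{vec}(PXQ) = (P\otimes Q^{T})\operatorname{vec}(X)$:

$$\begin{aligned}
\operatorname{vec}(C) &= (A\otimes B)\operatorname{vec}(\Delta) + (\rho I\otimes I)\operatorname{vec}(\Delta) \\
&= \left((U_A\Sigma_AU_A^{T})\otimes(U_B\Sigma_BU_B^{T}) + \rho I\otimes I\right)\operatorname{vec}(\Delta) \\
&= (U_A\otimes U_B)(\Sigma_A\otimes\Sigma_B + \rho I\otimes I)(U_A^{T}\otimes U_B^{T})\operatorname{vec}(\Delta)
\end{aligned}\tag{1}$$

Since $(U_A\otimes U_B)(U_A^{T}\otimes U_B^{T}) = I$, it is easy to get

$$\begin{aligned}
\operatorname{vec}(\Delta) &= (U_A\otimes U_B)(\Sigma_A\otimes\Sigma_B + \rho I\otimes I)^{-1}(U_A^{T}\otimes U_B^{T})\operatorname{vec}(C) \\
&= (U_A\otimes U_B)(\Sigma_A\otimes\Sigma_B + \rho I\otimes I)^{-1}\operatorname{vec}(U_A^{T}CU_B) \\
&= (U_A\otimes U_B)\operatorname{vec}\left(D\circ(U_A^{T}CU_B)\right) \\
&= \operatorname{vec}\left(U_A\left[D\circ(U_A^{T}CU_B)\right]U_B^{T}\right)
\end{aligned}\tag{2}$$

So $\Delta = U_A\left[D\circ(U_A^{T}CU_B)\right]U_B^{T}$.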
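The closed form of Lemma 1 can be sketched in a few lines of NumPy and verified by substituting the result back into $A\Delta B + \rho\Delta = C$. The function name `lemma1_solve` and the random test matrices are illustrative assumptions; the formula used is $U_A[D\circ(U_A^{T}CU_B)]U_B^{T}$ with $D_{ij} = 1/(\sigma_i^A\sigma_j^B + \rho)$.

```python
import numpy as np

def lemma1_solve(a, b, c, rho):
    """Solve A @ Delta @ B + rho * Delta = C for symmetric PSD A, B and rho > 0."""
    sig_a, u_a = np.linalg.eigh(a)              # A = U_A diag(sig_a) U_A^T
    sig_b, u_b = np.linalg.eigh(b)              # B = U_B diag(sig_b) U_B^T
    d = 1.0 / (np.outer(sig_a, sig_b) + rho)    # D_ij = 1 / (sig_a[i] * sig_b[j] + rho)
    return u_a @ (d * (u_a.T @ c @ u_b)) @ u_b.T

rng = np.random.default_rng(3)
p = 5
ba, bb, bc = rng.standard_normal((3, p, p))
a = ba @ ba.T          # symmetric positive semidefinite
b = bb @ bb.T
c = bc + bc.T          # symmetric right-hand side
rho = 0.7

delta = lemma1_solve(a, b, c, rho)
residual = np.abs(a @ delta @ b + rho * delta - c).max()
print(residual)
```

The cost is two eigendecompositions and a few matrix products, that is $O(p^3)$ per solve, and the eigendecompositions can be reused whenever $A$ and $B$ stay fixed.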




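Fixing $\Delta_1, \Delta_2, \Lambda_1, \Lambda_2, \Lambda_3$, the part of the augmented Lagrangian involving $\Delta_3$ is

$$\rho\|\Delta_3\|_F^2 + \lambda|\Delta_3|_1 - \langle\Delta_3, \rho\Delta_1 + \rho\Delta_2 + \Lambda_2 - \Lambda_3\rangle$$

Lemma (2)

Let $S(A, \lambda) = \arg\min_{\Delta}\frac{1}{2}\|\Delta\|_F^2 + \lambda\|\Delta\|_1 - \langle\Delta, A\rangle$, then

$$S(A, \lambda)_{ij} = \begin{cases} A_{ij} - \lambda, & A_{ij} > \lambda \\ A_{ij} + \lambda, & A_{ij} < -\lambda \\ 0, & \text{otherwise} \end{cases}$$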
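The operator of Lemma 2 is elementwise soft-thresholding. A minimal sketch follows, with an illustrative function name and a small hand-picked matrix; the check compares the objective at the thresholded matrix against random perturbations.

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise S(A, lam): shrink each entry toward zero by lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def objective(delta, a, lam):
    """0.5 * ||Delta||_F^2 + lam * ||Delta||_1 - <Delta, A>."""
    return 0.5 * np.sum(delta ** 2) + lam * np.abs(delta).sum() - np.sum(delta * a)

a = np.array([[1.5, -0.2],
              [0.4, -2.0]])
lam = 0.5
s = soft_threshold(a, lam)

# The thresholded matrix should not be beaten by nearby candidates.
rng = np.random.default_rng(4)
best = objective(s, a, lam)
not_beaten = all(
    objective(s + 0.1 * rng.standard_normal(a.shape), a, lam) >= best
    for _ in range(50)
)
print(s, not_beaten)
```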



Algorithm

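Require: $\hat\Sigma_Y$, $\hat\Sigma_X$, $\rho$, $\lambda$

1: Initialize $\Delta_1^0, \Delta_2^0, \Delta_3^0, \Lambda_1^0, \Lambda_2^0, \Lambda_3^0$, $k = 0$

2: while the stopping condition is not met do

3: $\Delta_1^{k+1} = G(\hat\Sigma_Y, \hat\Sigma_X, \rho\Delta_2^k + \rho\Delta_3^k + (\hat\Sigma_Y - \hat\Sigma_X) + \Lambda_3^k - \Lambda_1^k, 2\rho)$

4: $\Delta_2^{k+1} = G(\hat\Sigma_X, \hat\Sigma_Y, \rho\Delta_1^{k+1} + \rho\Delta_3^k + (\hat\Sigma_Y - \hat\Sigma_X) + \Lambda_1^k - \Lambda_2^k, 2\rho)$

5: $\Delta_3^{k+1} = S\left(\dfrac{\rho\Delta_1^{k+1} + \rho\Delta_2^{k+1} + \Lambda_2^k - \Lambda_3^k}{2\rho}, \dfrac{\lambda}{2\rho}\right)$

6: $\Lambda_1^{k+1} = \Lambda_1^k + \rho(\Delta_1^{k+1} - \Delta_2^{k+1})$

7: $\Lambda_2^{k+1} = \Lambda_2^k + \rho(\Delta_2^{k+1} - \Delta_3^{k+1})$

8: $\Lambda_3^{k+1} = \Lambda_3^k + \rho(\Delta_3^{k+1} - \Delta_1^{k+1})$; $k \leftarrow k + 1$

9: end while

10: return $\Delta_3^k$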
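The updates above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name `dtrace_admm`, the default `rho`, the iteration cap and the stopping rule are all assumptions, since the slides only say "stop condition". `G` uses the closed form $U_A[D\circ(U_A^{T}CU_B)]U_B^{T}$ and `S` is elementwise soft-thresholding.

```python
import numpy as np

def G(a, b, c, rho):
    """Solve A @ D @ B + rho * D = C via eigendecompositions (Lemma 1)."""
    sig_a, u_a = np.linalg.eigh(a)
    sig_b, u_b = np.linalg.eigh(b)
    d = 1.0 / (np.outer(sig_a, sig_b) + rho)
    return u_a @ (d * (u_a.T @ c @ u_b)) @ u_b.T

def S(a, lam):
    """Elementwise soft-thresholding (Lemma 2)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def dtrace_admm(sigma_x, sigma_y, lam, rho=1.0, max_iter=2000, tol=1e-10):
    """Alternating updates of the Algorithm slide; returns Delta_3."""
    p = sigma_x.shape[0]
    d1, d2, d3 = (np.zeros((p, p)) for _ in range(3))
    l1, l2, l3 = (np.zeros((p, p)) for _ in range(3))
    diff = sigma_y - sigma_x
    for _ in range(max_iter):
        d1 = G(sigma_y, sigma_x, rho * (d2 + d3) + diff + l3 - l1, 2 * rho)
        d2 = G(sigma_x, sigma_y, rho * (d1 + d3) + diff + l1 - l2, 2 * rho)
        d3_old = d3
        d3 = S((rho * (d1 + d2) + l2 - l3) / (2 * rho), lam / (2 * rho))
        l1 = l1 + rho * (d1 - d2)
        l2 = l2 + rho * (d2 - d3)
        l3 = l3 + rho * (d3 - d1)
        # Assumed stopping rule: the three copies agree and Delta_3 has stalled.
        gap = max(np.abs(d1 - d2).max(), np.abs(d2 - d3).max(),
                  np.abs(d3 - d3_old).max())
        if gap < tol:
            break
    return d3

# Toy check with known covariances (illustrative choice of matrices).
idx = np.arange(4)
sigma_x = 0.3 ** np.abs(idx[:, None] - idx[None, :])
sigma_y = np.eye(4)
target = np.linalg.inv(sigma_x) - np.linalg.inv(sigma_y)

unpenalized = dtrace_admm(sigma_x, sigma_y, lam=0.0)   # should recover target
all_zero = dtrace_admm(sigma_x, sigma_y, lam=10.0)     # heavy penalty removes every entry
print(np.abs(unpenalized - target).max(), np.count_nonzero(all_zero))
```

For clarity the sketch recomputes the eigendecompositions in every call to `G`; since the two covariance matrices never change, a practical version would compute them once and reuse them.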



Theoretical Results

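Theorem (1)

Assume that $X_i$, $Y_j$ ($i = 1, \dots, n_X$, $j = 1, \dots, n_Y$) are sub-Gaussian. Assume that $\max\{\|\Sigma_X^*\|_\infty, \|\Sigma_Y^*\|_\infty\} \le M$ and $s < p$. Under an irrepresentability condition, if $\lambda_n$ is chosen properly, then with probability larger than $1 - 2/p^{\eta-2}$ ($\eta > 2$), we have

$$\|\hat\Delta - \Delta^*\|_\infty \le M_G\left\{\frac{\eta\log p + \log 4}{\min(n_X, n_Y)}\right\}^{1/2},$$

$$\|\hat\Delta - \Delta^*\|_F \le M_G\left\{\frac{\eta\log p + \log 4}{\min(n_X, n_Y)}\right\}^{1/2}s^{1/2},$$

where $G_A$, $G_B$, $\delta$, $C_G$, $M_G$ are constants depending on $M$, $s$, $\kappa_\Gamma$, $\alpha$, $\sigma_X$ and $\sigma_Y$.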
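To get a feel for how the bound scales, the sketch below evaluates only the factor $\{(\eta\log p + \log 4)/\min(n_X, n_Y)\}^{1/2}$; the constant $M_G$ is not given on the slides and is left out. The function name `rate_factor` and the value `eta=2.5` are illustrative assumptions (the theorem only requires $\eta > 2$).

```python
import math

def rate_factor(p, n_x, n_y, eta=2.5):
    """sqrt((eta * log p + log 4) / min(n_x, n_y)), without the constant M_G."""
    return math.sqrt((eta * math.log(p) + math.log(4)) / min(n_x, n_y))

for p in (100, 200, 500, 1000):
    print(p, round(rate_factor(p, 100, 100), 3))
```

The factor grows only logarithmically in $p$, while quadrupling the smaller sample size halves it.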



The irrepresentability condition

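$\Gamma(\Sigma_X, \Sigma_Y) = \frac{1}{2}(\Sigma_X\otimes\Sigma_Y + \Sigma_Y\otimes\Sigma_X)$.

$S = \{(i, j) : \Delta^*_{i,j} \neq 0\}$ is the support of $\Delta^*$.

$\Gamma^* = \Gamma(\Sigma_X^*, \Sigma_Y^*)$.

The irrepresentability condition:

$$\max_{e\in S^c}\left\|\Gamma^*_{e,S}(\Gamma^*_{S,S})^{-1}\right\|_1 < 1 - \alpha$$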



The irrepresentability condition

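Zhao et al. (2014) assume a stronger condition, which implies

$$\max_{i\neq j}|\Gamma^*_{ij}| \le \min_j\Gamma^*_{jj}(2s)^{-1}$$

Let

$$A = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix},$$

$\Sigma_X^* = I_p$ and $\Sigma_Y^* = \operatorname{diag}\{A, I_{p-2}\}$. Then $\Gamma^* = \Gamma(\Sigma_X^*, \Sigma_Y^*)$ satisfies the irrepresentability condition, but not the above condition.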
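The counterexample can be verified numerically for a small dimension. The sketch below takes $p = 4$ and assumes that $s$ denotes the number of nonzero entries of $\Delta^*$ (the slides do not define $s$); pairs $(i, j)$ are indexed by $i\,p + j$, matching `np.kron` and `ravel`.

```python
import numpy as np

p = 4
a = np.array([[1.0, 0.5], [0.5, 1.0]])
sigma_x = np.eye(p)
sigma_y = np.eye(p)
sigma_y[:2, :2] = a                      # Sigma_Y = diag{A, I_{p-2}}

gamma = 0.5 * (np.kron(sigma_x, sigma_y) + np.kron(sigma_y, sigma_x))

delta_star = np.linalg.inv(sigma_x) - np.linalg.inv(sigma_y)
support = np.flatnonzero(np.abs(delta_star.ravel()) > 1e-12)
complement = np.setdiff1d(np.arange(p * p), support)
s = support.size                         # assumed meaning of s

# Irrepresentability: max over e in S^c of || Gamma_{e,S} (Gamma_{S,S})^{-1} ||_1.
m = gamma[np.ix_(complement, support)] @ np.linalg.inv(gamma[np.ix_(support, support)])
irrep = np.abs(m).sum(axis=1).max()

# The other condition: max off-diagonal entry versus min diagonal entry / (2 s).
off_diag_max = np.abs(gamma - np.diag(np.diag(gamma))).max()
bound = np.diag(gamma).min() / (2 * s)

print(irrep, off_diag_max, bound)
```

In this example the rows of $\Gamma^*$ indexed by $S^c$ have no overlap with the columns indexed by $S$, so the irrepresentability quantity is zero, while the largest off-diagonal entry of $\Gamma^*$ exceeds the bound.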



Theoretical Results

Define $M(\Delta) = \{\operatorname{sgn}(\Delta_{j,k}) : j = 1, \dots, p,\ k = 1, \dots, p\}$.

Theorem (2)

Under the same conditions and notation as in Theorem (1), if

$$\min_{j,k:\,\Delta^*_{j,k}\neq 0}|\Delta^*_{j,k}| \ge 2M_G\left\{\frac{\eta\log p + \log 4}{\min(n_X, n_Y)}\right\}^{1/2}$$

for some $\eta > 2$, then $M(\hat\Delta) = M(\Delta^*)$ with probability at least $1 - 2/p^{\eta-2}$.



Simulation

$n = 100$.

$\Theta_X = \Sigma_X^{-1}$ was defined by $(\Theta_X)_{ij} = 0.5^{|i-j|}$.

$\Delta$: around $p/4$ nonzero elements.

Data were generated from Gaussian distributions.

Compared with the fused graphical lasso (with $\lambda_1 = 0$) and Zhao et al. (2014).
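A data-generating setup of this kind can be sketched as below. The slides do not say how the nonzero entries of $\Delta$ were placed or how large they were, so the positions (disjoint neighbouring pairs) and the magnitude 0.2 are assumptions, chosen so that $\Theta_Y$ stays positive definite. The helper `tp_fp_rates` computes the true positive rate and the false positive rate (one minus the true negative rate) of an estimated support; all names are illustrative.

```python
import numpy as np

def make_precisions(p):
    """Theta_X with entries 0.5 ** |i - j| and a sparse, symmetric Delta."""
    idx = np.arange(p)
    theta_x = 0.5 ** np.abs(idx[:, None] - idx[None, :])
    delta = np.zeros((p, p))
    for i in range(0, p - 1, 8):          # about p/8 symmetric pairs, so about p/4 nonzeros
        delta[i, i + 1] = delta[i + 1, i] = 0.2   # assumed magnitude
    theta_y = theta_x - delta             # because Delta = Theta_X - Theta_Y
    return theta_x, theta_y, delta

def sample_covariance(theta, n, rng):
    """Draw n zero-mean Gaussian vectors with precision theta; return X^T X / n."""
    p = theta.shape[0]
    x = rng.multivariate_normal(np.zeros(p), np.linalg.inv(theta), size=n)
    return x.T @ x / n

def tp_fp_rates(estimate, truth, tol=1e-8):
    """True positive rate and false positive rate of the estimated support."""
    est = np.abs(estimate) > tol
    tru = np.abs(truth) > tol
    return (est & tru).sum() / tru.sum(), (est & ~tru).sum() / (~tru).sum()

rng = np.random.default_rng(0)
theta_x, theta_y, delta = make_precisions(p=40)
s_x = sample_covariance(theta_x, 100, rng)
s_y = sample_covariance(theta_y, 100, rng)
print(np.count_nonzero(delta), np.linalg.eigvalsh(theta_y).min())
```

Sweeping the penalty $\lambda$ of an estimator and recording `tp_fp_rates` at each value traces out a curve of TP against 1−TN.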



Simulation

[Figure: curves for DTL, FGL and L1-M at p = 100; axes labelled 1−TN and TP, both ranging from 0 to 1.]



Simulation

[Figure: curves for DTL and FGL at p = 200; axes labelled 1−TN and TP, both ranging from 0 to 1.]



Simulation

[Figure: curves for DTL and FGL at p = 500; axes labelled 1−TN and TP, both ranging from 0 to 1.]



Simulation

[Figure: curves for DTL and FGL at p = 1000; axes labelled 1−TN and TP, both ranging from 0 to 1.]



Real Data Analysis

Colorectal cancer patients from TCGA

Two groups: 77 microsatellite instable (MSI) patients and 122 microsatellite stable (MSS) patients.

Expression data of the genes in the DNA mismatch repair pathway.



Real Data Analysis

Figure: (a) D-trace loss estimate under the $L_F$-norm; (b) fused graphical lasso under the $L_F$-norm; (c) the $L_1$-minimization estimate under the $L_F$-norm.

Genes shown in panel (a): AXIN2, MLH1, BIRC5, PIK3CB, PIK3CG.

Genes shown in panel (b): APC, AXIN2, MLH1, BIRC5, TGFB3, PIK3CB, PIK3CG.

Genes shown in panel (c): APC2, AKT3, AXIN2, APC, MLH1, CYCS, PIK3CB, TGFB2.



Real Data Analysis

Figure: Boxplots of somatic mutation numbers in patients with/without an AXIN2 or PIK3CG mutation (panels: AXIN2+ vs. AXIN2−, PIK3CG+ vs. PIK3CG−); the y-axis (log10 number of mutations) is in log10 scale.



Conclusion

Proposed a new loss function, the D-trace loss, for the estimation of the difference of precision matrices.

Proposed an efficient greedy algorithm for the penalized D-trace loss.

Developed asymptotic results.



Acknowledgement

NSFC

The Recruitment Program of Global Youth Experts of China

National Key Basic Research Program of China



Thank you for your attention!

