Learning Gaussian Tree Models: Analysis of Error Exponents and Extremal Structures

Vincent Tan, Animashree Anandkumar, Alan Willsky

Stochastic Systems Group,
Laboratory for Information and Decision Systems,
Massachusetts Institute of Technology

Allerton Conference (Sep 30, 2009)

Motivation

Given a set of i.i.d. samples drawn from p, a Gaussian tree model.

Inferring the structure of phylogenetic trees from observed data (Carlson et al. 2008, PLoS Comp. Bio.).

More motivation

What is the exact rate of decay of the probability of error?

How do the structure and parameters of the model influence the error exponent (rate of decay)?

What are the extremal tree distributions for learning?

Consistency is well established (Chow and Wagner 1973). The error exponent is a quantitative measure of the "goodness" of learning.

Main Contributions

1. Provide the exact rate of decay for a given p.

2. Rate of decay ≈ SNR for learning.

3. Characterize the extremal tree structures for learning, i.e., stars and Markov chains: stars have the slowest rate; chains have the fastest rate.

Notation and Background

p = N(0, Σ): d-dimensional Gaussian tree model.

Samples x^n = {x_1, x_2, ..., x_n} drawn i.i.d. from p.

p: Markov on T_p = (V, E_p), a tree.

p: factorizes according to T_p. For example, for the star with hub node 1:

$$p(\mathbf{x}) = p_1(x_1)\,\frac{p_{1,2}(x_1,x_2)}{p_1(x_1)}\,\frac{p_{1,3}(x_1,x_3)}{p_1(x_1)}\,\frac{p_{1,4}(x_1,x_4)}{p_1(x_1)}, \qquad \Sigma^{-1} = \begin{pmatrix} \spadesuit & \clubsuit & \clubsuit & \clubsuit \\ \clubsuit & \spadesuit & 0 & 0 \\ \clubsuit & 0 & \spadesuit & 0 \\ \clubsuit & 0 & 0 & \spadesuit \end{pmatrix}$$
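To make the star example concrete, here is a minimal numpy sketch (correlation values are illustrative, not from the talk) that builds the covariance of a unit-variance Gaussian star and confirms the sparsity pattern of Σ⁻¹ shown above: the precision matrix vanishes exactly at the non-edges.

```python
import numpy as np

# Illustrative correlations (mine, not from the talk): a unit-variance
# Gaussian star with hub node 1 (index 0) and three leaves.
rho = {(0, 1): 0.6, (0, 2): 0.5, (0, 3): 0.4}   # hub-leaf edge correlations

d = 4
Sigma = np.eye(d)
for (i, j), r in rho.items():
    Sigma[i, j] = Sigma[j, i] = r
# Markov property on the star: a leaf-leaf correlation is the product of the
# two hub-leaf correlations on the path between them.
for j in range(1, d):
    for k in range(j + 1, d):
        Sigma[j, k] = Sigma[k, j] = rho[(0, j)] * rho[(0, k)]

K = np.linalg.inv(Sigma)
print(np.round(K, 6))   # leaf-leaf entries of the precision matrix are 0
```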

Max-Likelihood Learning of Tree Distributions (Chow-Liu)

Denote p̂ = p̂_{x^n} as the empirical distribution of x^n, i.e.,

$$\hat{p}(\mathbf{x}) := \mathcal{N}(\mathbf{x};\, \mathbf{0},\, \hat{\Sigma}),$$

where Σ̂ is the empirical covariance matrix of x^n. p̂_e: empirical distribution on edge e.

Reduces to a max-weight spanning tree problem (Chow-Liu 1968):

$$E_{\mathrm{CL}}(\mathbf{x}^n) = \operatorname*{argmax}_{E_q :\, q \in \text{Trees}} \sum_{e \in E_q} I(\hat{p}_e), \qquad I(\hat{p}_e) := I(X_i; X_j).$$
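A minimal Python sketch of this estimator (the function name and the use of scipy's MST routine are my own choices, not from the talk). For jointly Gaussian pairs, I(X_i; X_j) = −(1/2) log(1 − ρ_ij²), so maximizing the total empirical mutual information is a max-weight spanning tree over these weights, which we can compute by running a minimum spanning tree on the negated weights.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def chow_liu_gaussian(X):
    """Chow-Liu edge set from samples X (n x d); a sketch assuming zero-mean data.

    Note: scipy treats exact zeros as absent edges, which is harmless here
    since empirical correlations are almost never exactly zero.
    """
    R = np.corrcoef(X, rowvar=False)            # empirical correlations
    with np.errstate(divide="ignore"):
        MI = -0.5 * np.log(1.0 - R**2)          # pairwise empirical Gaussian MI
    np.fill_diagonal(MI, 0.0)
    mst = minimum_spanning_tree(-MI)            # max-weight tree via negation
    i, j = mst.nonzero()
    return {(min(a, b), max(a, b)) for a, b in zip(i.tolist(), j.tolist())}
```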

Max-Likelihood Learning of Tree Distributions

True MIs {I(p_e)} → max-weight spanning tree → E_p.

Empirical MIs {I(p̂_e)} from x^n → max-weight spanning tree → E_CL(x^n), possibly ≠ E_p.

Problem Statement

The estimated edge set is E_CL(x^n) and the error event is {E_CL(x^n) ≠ E_p}.

Find and analyze the error exponent K_p:

$$K_p := \lim_{n\to\infty} -\frac{1}{n} \log \mathbb{P}\big(\{E_{\mathrm{CL}}(\mathbf{x}^n) \neq E_p\}\big).$$

Alternatively (with ≐ denoting equality to first order in the exponent),

$$\mathbb{P}\big(\{E_{\mathrm{CL}}(\mathbf{x}^n) \neq E_p\}\big) \doteq \exp(-n K_p).$$

The Crossover Rate I

Two pairs of nodes e, e' ∈ $\binom{V}{2}$ with joint distribution p_{e,e'}, s.t. I(p_e) > I(p_{e'}).

Consider the crossover event: {I(p̂_e) ≤ I(p̂_{e'})}.

Definition: Crossover Rate

$$J_{e,e'} := \lim_{n\to\infty} -\frac{1}{n} \log \mathbb{P}\big(\{I(\hat{p}_e) \leq I(\hat{p}_{e'})\}\big).$$

This event may potentially lead to an error in structure learning. Why?

The Crossover Rate II

Theorem. The crossover rate is

$$J_{e,e'} = \inf_{q \in \text{Gaussians}} \big\{ D(q \,\|\, p_{e,e'}) :\; I(q_e) = I(q_{e'}) \big\}.$$

By assumption, I(p_e) > I(p_{e'}).

[Figure: q*_{e,e'} is the I-projection of p_{e,e'} onto the constraint set {I(q_e) = I(q_{e'})}, at divergence D(q*_{e,e'} || p_{e,e'}).]
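The infimum can be approximated numerically. Below is a rough scipy sketch (my own construction, not the paper's method) for the case where e and e' are disjoint node pairs: we parametrize q by a Cholesky factor and use the fact that for Gaussians I(q_e) = I(q_{e'}) iff the two squared correlations are equal.

```python
import numpy as np
from scipy.optimize import minimize

def crossover_rate(Sigma4):
    """Numerically approximate J_{e,e'} by I-projection (a rough sketch).

    Sigma4: 4x4 covariance of (x_i, x_j, x_k, x_l), where e = (i, j) occupies
    the first two coordinates and e' = (k, l) the last two (disjoint pairs).
    Minimizes D(q || p_{e,e'}) over zero-mean Gaussians q subject to equal
    squared correlations on e and e'.
    """
    d = 4
    iS = np.linalg.inv(Sigma4)
    logdetS = np.linalg.slogdet(Sigma4)[1]
    tril = np.tril_indices(d)

    def unpack(theta):
        L = np.zeros((d, d))
        L[tril] = theta
        return L @ L.T                      # candidate covariance of q

    def kl(theta):                          # D(N(0,Q) || N(0,Sigma4))
        Q = unpack(theta)
        return 0.5 * (np.trace(iS @ Q) - d + logdetS - np.linalg.slogdet(Q)[1])

    def corr2(Q, a, b):                     # squared correlation of pair (a, b)
        return Q[a, b] ** 2 / (Q[a, a] * Q[b, b])

    cons = {"type": "eq",
            "fun": lambda th: corr2(unpack(th), 0, 1) - corr2(unpack(th), 2, 3)}
    theta0 = np.linalg.cholesky(Sigma4)[tril]   # start at p_{e,e'} itself
    res = minimize(kl, theta0, method="SLSQP", constraints=[cons])
    return res.fun
```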

Error Exponent for Structure Learning II

$$\mathbb{P}\big(\{E_{\mathrm{CL}}(\mathbf{x}^n) \neq E_p\}\big) \doteq \exp(-n K_p).$$

Theorem (First Result)

$$K_p = \min_{e' \notin E_p}\; \min_{e \in \mathrm{Path}(e';\, E_p)} J_{e,e'}.$$
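In words: the exponent is dominated by the most error-prone pair, minimizing over every non-edge e' and every true edge e on the path that e' would displace. A small sketch of that double minimization (the callback J is hypothetical; it stands in for whichever crossover-rate computation is used):

```python
from collections import deque

def error_exponent(d, tree_edges, J):
    """K_p = min over non-edges e' of min over e in Path(e'; E_p) of J(e, e').

    A sketch: nodes are 0..d-1, `tree_edges` is the true edge set E_p stored
    as (min, max) tuples, and `J(e, e_prime)` is a hypothetical callback
    returning the crossover rate for the pair.
    """
    adj = {v: [] for v in range(d)}
    for i, j in tree_edges:
        adj[i].append(j)
        adj[j].append(i)

    def path_edges(s, t):                   # edges on the unique s-t tree path
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent:
                    parent[w] = u
                    queue.append(w)
        edges, v = [], t
        while parent[v] is not None:
            edges.append((min(v, parent[v]), max(v, parent[v])))
            v = parent[v]
        return edges

    non_edges = [(i, j) for i in range(d) for j in range(i + 1, d)
                 if (i, j) not in tree_edges]
    return min(J(e, ep) for ep in non_edges for e in path_edges(*ep))
```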

Approximating the Crossover Rate I

Definition: p_{e,e'} satisfies the very noisy learning condition if

$$\big|\, |\rho_e| - |\rho_{e'}| \,\big| < \epsilon \;\Rightarrow\; I(p_e) \approx I(p_{e'}).$$

Euclidean Information Theory (Borade, Zheng 2007).

Approximating the Crossover Rate II

Theorem (Second Result). The approximate crossover rate is

$$\tilde{J}_{e,e'} = \frac{\big(I(p_{e'}) - I(p_e)\big)^2}{2\, \mathrm{Var}(s_{e'} - s_e)},$$

where s_e is the information density:

$$s_e(x_i, x_j) = \log \frac{p_{i,j}(x_i, x_j)}{p_i(x_i)\, p_j(x_j)}.$$

The approximate error exponent is

$$\tilde{K}_p = \min_{e' \notin E_p}\; \min_{e \in \mathrm{Path}(e';\, E_p)} \tilde{J}_{e,e'}.$$
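A quick Monte Carlo check of this SNR-like expression (a sketch assuming unit-variance marginals; the function names are mine): for a unit-variance bivariate Gaussian the information density has a simple closed form, and Var(s_{e'} − s_e) can be estimated from samples of p.

```python
import numpy as np

def info_density(x, y, rho):
    """s_e for a unit-variance bivariate Gaussian pair with correlation rho."""
    return (-0.5 * np.log(1 - rho ** 2)
            - (rho ** 2 * (x ** 2 + y ** 2) - 2 * rho * x * y)
              / (2 * (1 - rho ** 2)))

def approx_crossover_rate(Sigma, e, e_prime, n_mc=200_000, seed=0):
    """Plug-in estimate of J~ = (I(p_e') - I(p_e))^2 / (2 Var(s_e' - s_e))."""
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(np.zeros(len(Sigma)), Sigma, size=n_mc)
    (i, j), (k, l) = e, e_prime
    s_e = info_density(X[:, i], X[:, j], Sigma[i, j])
    s_ep = info_density(X[:, k], X[:, l], Sigma[k, l])
    I_e = -0.5 * np.log(1 - Sigma[i, j] ** 2)      # Gaussian MI on e
    I_ep = -0.5 * np.log(1 - Sigma[k, l] ** 2)     # Gaussian MI on e'
    return (I_ep - I_e) ** 2 / (2 * np.var(s_ep - s_e))
```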

Correlation Decay

[Figure: a 4-node chain x1 − x2 − x3 − x4 with edge correlations ρ_{1,2}, ρ_{2,3}, ρ_{3,4} and non-edge correlations ρ_{1,3}, ρ_{1,4}.]

ρ_{i,j} = E[x_i x_j].

Markov property ⇒ ρ_{1,3} = ρ_{1,2} × ρ_{2,3}.

Correlation decay ⇒ |ρ_{1,4}| ≤ |ρ_{1,3}|.

(1, 4) is not likely to be mistaken for a true edge.

Only need to consider triangles in the true tree.
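As a quick numerical illustration (the numbers are mine, not from the talk), with every edge correlation equal to 0.8:

$$\rho_{1,3} = 0.8 \times 0.8 = 0.64, \qquad |\rho_{1,4}| = 0.8^3 = 0.512 \leq |\rho_{1,3}|,$$

so the farther a non-edge is from the true edges, the smaller its correlation (and mutual information), and the less likely the corresponding crossover event.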

Extremal Structures I

Fix ρ, a vector of correlation coefficients on the tree, e.g., ρ := [ρ1, ρ2, ρ3, ρ4, ρ5].

[Figure: a 6-node tree whose five edges carry the correlations ρ1, ..., ρ5.]

Which structures give the highest and lowest exponents?

Extremal Structures II

Theorem (Main Result)

Worst: the star minimizes K_p, i.e., K_star ≤ K_p.

Best: the Markov chain maximizes K_p, i.e., K_chain ≥ K_p.

[Figure: a 5-node star with edge correlations ρ1, ..., ρ4, and a 5-node chain carrying the same correlations in some order ρ_{π(1)}, ..., ρ_{π(4)}.]

Extremal Structures III

Chain, star, and hybrid graphs for d = 10.

[Figure: two panels vs. the number of samples n (10^3 to 10^4): simulated probability of error (left) and simulated error exponent (right), for the chain, hybrid, and star graphs.]

Plot of the error probability and error exponent for 3 tree graphs.
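An experiment of this shape is straightforward to reproduce. The sketch below (correlation values and trial counts are illustrative; it reuses chow_liu_gaussian from the earlier sketch) builds unit-variance chain and star models with the same edge correlations and Monte Carlo-estimates the Chow-Liu error probability:

```python
import numpy as np
from collections import deque

def tree_covariance(d, edge_rho):
    """Unit-variance Gaussian tree: Sigma_ij = product of edge correlations
    along the unique path between i and j."""
    adj = {v: [] for v in range(d)}
    for (i, j), r in edge_rho.items():
        adj[i].append((j, r))
        adj[j].append((i, r))
    Sigma = np.eye(d)
    for s in range(d):
        corr, queue = {s: 1.0}, deque([s])
        while queue:                        # BFS accumulating path products
            u = queue.popleft()
            for w, r in adj[u]:
                if w not in corr:
                    corr[w] = corr[u] * r
                    queue.append(w)
        for t, c in corr.items():
            Sigma[s, t] = c
    return Sigma

def error_prob(Sigma, true_edges, n, trials=200, seed=0):
    """Monte Carlo estimate of P(E_CL(x^n) != E_p)."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(trials):
        X = rng.multivariate_normal(np.zeros(len(Sigma)), Sigma, size=n)
        errors += chow_liu_gaussian(X) != true_edges
    return errors / trials

# 10-node chain vs. star with the same edge correlations (values illustrative)
rhos = [0.5] * 9
chain = {(i, i + 1): rhos[i] for i in range(9)}
star = {(0, i + 1): rhos[i] for i in range(9)}
for name, edges in [("chain", chain), ("star", star)]:
    Sigma = tree_covariance(10, edges)
    print(name, error_prob(Sigma, set(edges), n=1000))
```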

Extremal Structures IV

Remarks:

Universal result.

Extremal structures w.r.t. diameter are the extremal structures for learning.

Corroborates our intuition about correlation decay.

Extensions

Significant reduction of complexity for computation of the error exponent.

Finding the best distributions for fixed ρ.

Effect of adding and deleting nodes and edges on the error exponent.

Conclusion

1. Found the rate of decay of the error probability using large deviations.

2. Used Euclidean Information Theory to obtain an SNR-like expression for the crossover rate.

3. We can say which structures are easy and hard to learn based on the error exponent: the extremal structures are extremal in terms of the tree diameter.

Full versions can be found at:
http://arxiv.org/abs/0905.0940
http://arxiv.org/abs/0909.5216