The Structure of Level-k Phylogenetic Networks

Lille – 24/06/2009 – CPM'09 The Structure of Level-k Phylogenetic Networks Philippe Gambette in collaboration with Vincent Berry, Christophe Paul

description

A presentation on the combinatorics of level-k networks (CPM 2009, Lille)

Transcript of The Structure of Level-k Phylogenetic Networks

Page 1: The Structure of Level-k Phylogenetic Networks

Lille – 24/06/2009 – CPM'09

The Structure of Level-k Phylogenetic Networks

Philippe Gambette

in collaboration with Vincent Berry, Christophe Paul

Page 2: The Structure of Level-k Phylogenetic Networks

Outline

• Phylogenetic networks

• Decomposition of level-k networks

• Construction of level-k generators

• Number of level-k generators

• Simulated level-k networks

Page 4: The Structure of Level-k Phylogenetic Networks

Phylogenetic networks

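[Figure: examples of phylogenetic networks and of the software that builds them. Network types: median network, minimum spanning network, level-2 network, split network, reticulogram, synthesis diagram. Software: TCS, SplitsTree, Network, Level-2, T-Rex, HorizStory.]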

Page 6: The Structure of Level-k Phylogenetic Networks

Abstract or explicit networks

An explicit phylogenetic network is a phylogenetic network where all reticulations can be interpreted as precise biological events.

An abstract network reflects some phylogenetic signals rather than explicitly displaying biological reticulation events.

Doolittle: Uprooting the Tree of Life, Scientific American (Feb. 2000)

Page 7: The Structure of Level-k Phylogenetic Networks

Abstract or explicit networks

Abstract phylogenetic network of mushroom species, www.splitstree.org

Page 8: The Structure of Level-k Phylogenetic Networks

Hierarchy of network subclasses

http://www.lirmm.fr/~gambette/RePhylogeneticNetworks.php

Page 9: The Structure of Level-k Phylogenetic Networks

Level-k phylogenetic networks

[Figure: trees = level 0, level 1 = galled trees.]

http://www.lirmm.fr/~gambette/RePhylogeneticNetworks.php

Page 10: The Structure of Level-k Phylogenetic Networks

Level-k phylogenetic networks

Motivation to generalize “galled trees” (= level-1):

Arenas, Valiente, Posada: Characterization of Phylogenetic Reticulate Networks based on the Coalescent with Recombination, Molecular Biology and Evolution, to appear.

Page 11: The Structure of Level-k Phylogenetic Networks

Level-k phylogenetic networks

A level-k phylogenetic network N on a set X of n taxa is a multidigraph in which:
- exactly one vertex has indegree 0 and outdegree 2: the root,
- all other vertices have either:
  - indegree 1 and outdegree 2: split vertices,
  - indegree 2 and outdegree ≤ 1: reticulation vertices,
  - or indegree 1 and outdegree 0: leaves labeled by X,
- any blob has at most k reticulation vertices.

[Figure: example network N with vertices labeled r, r1, r2, r3, r4 and leaves a to k.]

All arcs are oriented downwards.

Page 12: The Structure of Level-k Phylogenetic Networks

Level-k phylogenetic networks

A blob is a maximal induced connected subgraph with no cut arc.

A cut arc is an arc whose removal disconnects the graph.

Page 13: The Structure of Level-k Phylogenetic Networks

Level-k phylogenetic networks

N has level 2.
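The definitions above (cut arc, blob, level) can be turned into a small computation. The following is a minimal sketch, not the authors' program: it assumes a network is given as a list of (tail, head) arcs, treats blobs as the connected pieces left after deleting every cut arc of the underlying undirected multigraph, and returns the largest number of reticulation vertices (indegree 2) found in one blob. The function names `find_cut_arcs` and `level` and the two small example networks in the usage note are hypothetical.

```python
from collections import defaultdict

def find_cut_arcs(arcs):
    """Indices of arcs whose removal disconnects the underlying undirected multigraph."""
    adj = defaultdict(list)
    for i, (u, v) in enumerate(arcs):
        adj[u].append((v, i))
        adj[v].append((u, i))
    disc, low, cut, counter = {}, {}, set(), 0
    for start in list(adj):
        if start in disc:
            continue
        disc[start] = low[start] = counter
        counter += 1
        stack = [(start, None, iter(adj[start]))]  # iterative depth-first search
        while stack:
            node, parent_arc, neighbours = stack[-1]
            descended = False
            for nxt, idx in neighbours:
                if idx == parent_arc:          # skip only the arc we came through,
                    continue                   # so parallel arcs count as a cycle
                if nxt in disc:
                    low[node] = min(low[node], disc[nxt])
                else:
                    disc[nxt] = low[nxt] = counter
                    counter += 1
                    stack.append((nxt, idx, iter(adj[nxt])))
                    descended = True
                    break
            if not descended:
                stack.pop()
                if stack:
                    parent = stack[-1][0]
                    low[parent] = min(low[parent], low[node])
                    if low[node] > disc[parent]:
                        cut.add(parent_arc)
    return cut

def level(arcs):
    """Largest number of reticulation vertices in a blob (0 for a tree)."""
    cut = find_cut_arcs(arcs)
    indegree = defaultdict(int)
    adj = defaultdict(list)
    vertices = set()
    for i, (u, v) in enumerate(arcs):
        indegree[v] += 1
        vertices.update((u, v))
        if i not in cut:                       # keep only arcs inside blobs
            adj[u].append(v)
            adj[v].append(u)
    seen, best = set(), 0
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        blob, todo = [start], [start]
        while todo:
            x = todo.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    blob.append(y)
                    todo.append(y)
        best = max(best, sum(1 for x in blob if indegree[x] == 2))
    return best
```

For instance, a network whose only cycle is root, a, h, b (arcs root→a, root→b, a→h, b→h, plus pendant arcs to leaves) gets level 1, and a plain tree gets level 0.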

Page 15: The Structure of Level-k Phylogenetic Networks

Decomposition of level-k networks

We formalize the decomposition into blobs:

[Figure: N, a level-k network on leaves a to k, and the same network N decomposed as a tree of simple graph patterns: generators.]

Generators were introduced by van Iersel et al. (RECOMB 2008) for the restricted class of simple level-k networks.

Page 16: The Structure of Level-k Phylogenetic Networks

Level-k generators

A level-k generator is a level-k network with no cut arc.

[Figure: the generators G0, G1 and the four level-2 generators 2a, 2b, 2c, 2d.]

The sides of the generator are:
- its arcs
- its reticulation vertices of outdegree 0
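The definition of sides can be illustrated with a few lines of code. This is a sketch under assumptions: a generator is represented as a list of (tail, head) arcs, the helper name `sides` is hypothetical, and the example in the usage note assumes that the level-1 generator G1 is a root joined by two parallel arcs to one reticulation vertex of outdegree 0.

```python
from collections import Counter

def sides(arcs):
    """Sides of a generator: all its arcs, plus its reticulation vertices of outdegree 0."""
    indegree = Counter(head for _, head in arcs)
    outdegree = Counter(tail for tail, _ in arcs)
    # A reticulation vertex has indegree 2; only those without a child are sides.
    open_reticulations = [v for v in indegree if indegree[v] == 2 and outdegree[v] == 0]
    return list(arcs) + open_reticulations

# Assumed shape of G1: two parallel arcs from the root "rho" to the reticulation "h".
g1 = [("rho", "h"), ("rho", "h")]
print(len(sides(g1)))
```

Under that assumption G1 has three sides (two arcs and one reticulation vertex).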

Page 17: The Structure of Level-k Phylogenetic Networks

Decomposition theorem of level-k networks

N is a level-k network

iff

there exists a sequence $(l_j)_{j \in [1,r]}$ of r locations (arcs or reticulation vertices of outdegree 0) and a sequence $(G_j)_{j \in [0,r]}$ of generators of level at most k, such that:

- $N = \mathrm{Attach}_k(l_r, G_r, \mathrm{Attach}_k(\dots \mathrm{Attach}_k(l_2, G_2, \mathrm{Attach}_k(l_1, G_1, G_0)) \dots))$,

- or $N = \mathrm{Attach}_k(l_r, G_r, \mathrm{Attach}_k(\dots \mathrm{Attach}_k(l_2, G_2, \mathrm{SplitRoot}_k(G_1, G_0)) \dots))$.

Page 18: The Structure of Level-k Phylogenetic Networks

Decomposition theorem of level-k networks

[Figure: $\mathrm{SplitRoot}_k(G_1, G_0)$, built from the generators $G_0$ and $G_1$.]

Page 19: The Structure of Level-k Phylogenetic Networks

Decomposition theorem of level-k networks

[Figure: $\mathrm{Attach}_k(l_i, G_i, N)$ when $l_i$ is an arc of N.]

Page 20: The Structure of Level-k Phylogenetic Networks

Decomposition theorem of level-k networks

[Figure: $\mathrm{Attach}_k(l_i, G_i, N)$ when $l_i$ is a reticulation vertex of N.]

Page 21: The Structure of Level-k Phylogenetic Networks

Decomposition theorem of level-k networks

This decomposition is not unique!
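The two Attach cases (the location is an arc, or a reticulation vertex of outdegree 0) can be sketched in code. The slides define the operation only through figures, so the semantics below are an assumption: for an arc, the arc is subdivided by a new vertex from which the root of the generator hangs; for a reticulation vertex, one arc is added from it to the root of the generator. The function name `attach`, the dictionary representation and the `new_vertex` parameter are hypothetical, and vertex names of the two graphs are assumed to be disjoint strings.

```python
def attach(network, location, generator, new_vertex=None):
    """Return a new network with `generator` hung at `location` (assumed semantics).

    network, generator: {"root": name, "arcs": [(tail, head), ...]}
    location: an arc (tail, head) of the network, or the name of a
              reticulation vertex of outdegree 0.
    new_vertex: name of the subdividing vertex, needed only for an arc location.
    """
    arcs = list(network["arcs"])               # copy, the input is left untouched
    if isinstance(location, tuple):
        tail, head = location
        arcs.remove(location)                  # drops one copy if arcs are parallel
        arcs += [(tail, new_vertex), (new_vertex, head),
                 (new_vertex, generator["root"])]
    else:
        arcs.append((location, generator["root"]))
    return {"root": network["root"], "arcs": arcs + list(generator["arcs"])}

# Two copies of the (assumed) level-1 generator: a root with two parallel
# arcs to one reticulation vertex.
first = {"root": "r", "arcs": [("r", "h"), ("r", "h")]}
second = {"root": "s", "arcs": [("s", "g"), ("s", "g")]}

on_arc = attach(first, ("r", "h"), second, new_vertex="w")
on_reticulation = attach(first, "h", second)
print(on_arc["arcs"])
print(on_reticulation["arcs"])
```

Because several generators can be attached in different orders to reach the same network, a sequence of such calls is one decomposition among possibly many.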

Page 22: The Structure of Level-k Phylogenetic Networks

Decomposition theorem of level-k networks

Unique “graph-labeled tree” decomposition:

Possible applications:
- exhaustive generation of level-k networks
- counting of level-k networks

[Figure: a network N on leaves a to i and its graph-labeled tree decomposition, with vertices labeled ρ and h1 to h4.]

Page 24: The Structure of Level-k Phylogenetic Networks

Construction of the generators

Van Iersel et al. build the 4 level-2 generators by a case analysis, which Steven Kelk generalized into an exponential algorithm that finds all 65 level-3 generators.

Page 25: The Structure of Level-k Phylogenetic Networks

Construction of the generators

Van Iersel et al. give a simple case analysis for level 2.

We give rules to build level-(k+1) from level-k generators.

Page 26: The Structure of Level-k Phylogenetic Networks

Construction of the generators

Construction rules of level-(k+1) generators from level-k generators:

Rule R1:

[Figure: a generator N with arcs e1, e2 and reticulation vertices h1, h2, followed by the generators R1(N, h1, h2), R1(N, h1, e2), R1(N, e2, e2) and R1(N, e1, e2), each drawn with reticulation vertices h1, h2, h3.]

Page 27: The Structure of Level-k Phylogenetic Networks

Construction of the generators

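Construction rules of level-(k+1) generators from level-k generators:

Rule R2:

[Figure: a generator N with arcs e1, e2 and reticulation vertices h1, h2, followed by the generators R2(N, h1, e2), R2(N, e1, e1) and R2(N, e2, e1), each drawn with reticulation vertices h1, h2, h3.]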

Page 28: The Structure of Level-k Phylogenetic Networks

Construction of the generators

Problem! Some of the level-(k+1) generators obtained from level-k generators are isomorphic!

[Figure: R1(2b, h1, e2) and R1(2b, h2, e1), both drawn with reticulation vertices h1, h2, h3.]

Page 29: The Structure of Level-k Phylogenetic Networks

Construction of the generators

Problem! Some of the level-(k+1) generators obtained from level-k generators are isomorphic!

[Figure: R1(N, h1, e2) and R1(N, h2, e1), both drawn with reticulation vertices h1, h2, h3.]

→ difficult to count!

Page 31: The Structure of Level-k Phylogenetic Networks

Upper bound

R1 and R2 can be applied at most on all pairs of sides.

A level-k generator has at most 5k sides:

$g_{k+1} < 50\,k^2\,g_k$

Upper bound: $g_k < k!^2 \, 50^k$

Theoretical corollary: There is a polynomial algorithm to build the set of level-(k+1) generators from the set of level-k generators.

Practical corollary: $g_4 < 28350$

→ it is possible to enumerate all level-4 generators.
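One way to see where the two bounds come from is sketched below. It is a reading of the slide rather than the authors' proof: it assumes that each of the two rules is tried on every ordered pair of sides, and that $g_1 = 1$ (a single level-1 generator, G1); neither assumption is spelled out here.

```latex
\begin{align*}
g_{k+1} &< \underbrace{2}_{R_1,\ R_2} \cdot \underbrace{(5k)^2}_{\text{pairs of sides}} \cdot\, g_k
         \;=\; 50\,k^2\,g_k \\
g_k &< \Big(\prod_{i=1}^{k-1} 50\,i^2\Big)\, g_1
     \;=\; 50^{\,k-1}\,\big((k-1)!\big)^2
     \;\le\; k!^2\,50^k \qquad (k \ge 2)
\end{align*}
```

The second line simply unrolls the first one from level 1 up to level k; for k = 1 the closed form reads $g_1 = 1 < 50$.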

Page 32: The Structure of Level-k Phylogenetic Networks

Number of level-k generators

http://www.lirmm.fr/~gambette/ProgramGenerators

It is possible to enumerate all level-4 generators.

Isomorphism of graphs of bounded valence: polynomial (Luks, FOCS 1980)

Practical algorithm? A simple exponential backtracking algorithm is sufficient for level 4:

go through both graphs from their root in parallel and identify their vertices: $O(n\,2^{n-h})$

→ g4 = 1993

→ g5 > 71000
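The backtracking idea (walk both graphs from their roots in parallel and identify vertices) can be sketched as follows. This is an illustration written for this transcript, not the program behind the counts above: the function name `isomorphic` and the arc-list representation are assumptions, every vertex is assumed to be reachable from the root, and no attempt is made to match the stated complexity.

```python
from collections import Counter
from itertools import permutations

def isomorphic(arcs1, root1, arcs2, root2):
    """Backtracking isomorphism test for two rooted multidigraphs given as arc lists."""
    if len(arcs1) != len(arcs2):
        return False

    def extend(pairs, mapping, used):
        # pairs: vertex pairs (v1, v2) that still have to be identified
        if not pairs:
            mapped = Counter((mapping[u], mapping[v]) for u, v in arcs1)
            return mapped == Counter(arcs2)    # final check on arc multisets
        (v1, v2), rest = pairs[0], pairs[1:]
        if v1 in mapping:                      # already identified: must agree
            return mapping[v1] == v2 and extend(rest, mapping, used)
        if v2 in used:                         # v2 already taken by another vertex
            return False
        children1 = [w for u, w in arcs1 if u == v1]
        children2 = [w for u, w in arcs2 if u == v2]
        if len(children1) != len(children2):
            return False
        mapping = {**mapping, v1: v2}
        used = used | {v2}
        # Backtrack over every way of pairing the children (2 ways at most
        # in a network whose vertices have outdegree <= 2).
        return any(extend(rest + list(zip(children1, order)), mapping, used)
                   for order in permutations(children2))

    return extend([(root1, root2)], {}, frozenset())

# Same shape with different vertex names, children listed in a different order.
left = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h")]
right = [("s", "y"), ("s", "x"), ("x", "g"), ("y", "g")]
print(isomorphic(left, "r", right, "s"))
```

Such a test is what allows isomorphic duplicates to be discarded when the construction rules are applied to every pair of sides.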

Page 33: The Structure of Level-k Phylogenetic Networks

Number of level-k generators

Page 34: The Structure of Level-k Phylogenetic Networks

Lower bound

Lower bound: $g_k \geq 2^{k-1}$

There is an exponential number of generators!

Idea: Code every number between 0 and $2^{k-1}-1$ by a level-k generator.
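As a quick sanity check, the lower bound and the upper bound $g_k < k!^2 \, 50^k$ can be compared with the generator counts quoted in this talk (4 level-2, 65 level-3 and 1993 level-4 generators). The snippet below is only an arithmetic illustration; the value $g_1 = 1$ is an assumption (a single level-1 generator), and the names `known`, `lower_bound` and `upper_bound` are hypothetical.

```python
from math import factorial

# Generator counts quoted in the talk; g_1 = 1 is an assumption.
known = {1: 1, 2: 4, 3: 65, 4: 1993}

def lower_bound(k):
    """Lower bound 2^(k-1) on the number of level-k generators."""
    return 2 ** (k - 1)

def upper_bound(k):
    """Upper bound k!^2 * 50^k on the number of level-k generators."""
    return factorial(k) ** 2 * 50 ** k

for k, g in known.items():
    print(k, lower_bound(k), g, upper_bound(k))
```

The known counts sit much closer to the lower bound than to the upper bound, which grows far faster.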

Page 35: The Structure of Level-k Phylogenetic Networks

Lower bound

[Figure: the encoding for k = 1.]

Page 36: The Structure of Level-k Phylogenetic Networks

Lower bound

[Figure: the encoding for k = 2, with generators coding the numbers 0 to 3.]

Page 38: The Structure of Level-k Phylogenetic Networks

Simulated level-k networks

http://www.lirmm.fr/~gambette/ProgramGenerators

Simulate 1000 phylogenetic networks using the coalescent model with recombination.

Arenas, Valiente, Posada 2008. Program: Recodon

How many are level-1,2,3... networks?

[Figure: two plots of the number of networks (0 to 1000) against the level (0 to 100). First plot: n=50 with r = 1, 2, 4, 8, 16, 32. Second plot: n=10 with r = 1, 2, 4, 8, 16, 32.]

Page 39: The Structure of Level-k Phylogenetic Networks

Simulated level-k networks

Simulate 1000 phylogenetic networks using the coalescent model with recombination.

Link between level and number of reticulations:

[Figure: level (0 to 9) plotted against the number of reticulations (0 to 9).]

http://www.lirmm.fr/~gambette/ProgramGenerators

Page 40: The Structure of Level-k Phylogenetic Networks

Summary on the level parameter

Advantages:
• natural structure for all explicit phylogenetic networks
• global tree-structure used algorithmically
• finite graph patterns to represent blobs: generators

Limits:
• number of generators exponential in the level
• complex structure of generators
• when recombination is not local, level doesn't help

Page 41: The Structure of Level-k Phylogenetic Networks

Questions?

Thank you for your attention!

TreeCloud available at http://treecloud.org - Slides available at http://www.lirmm.fr/~gambette/RePresentationsENG.php

Split network and reticulogram of the 50 most frequent words in CPM titles, hyperlex cooccurrence distance.
