The Structure of Level-k Phylogenetic Networks

Post on 09-May-2015

603 views 1 download

description

A presentation on the combinatorics of level-k networks (CPM 2009, Lille)

Transcript of The Structure of Level-k Phylogenetic Networks

Lille – 24/06/2009 – CPM'09

The Structure of Level-k Phylogenetic Networks

Philippe Gambette

in collaboration withVincent Berry, Christophe Paul

• Phylogenetic networks

• Decomposition of level-k networks

• Construction of level-k generators

• Number of level-k generators

• Simulated level-k networks

Outline

• Phylogenetic networks

• Decomposition of level-k networks

• Construction of level-k generators

• Number of level-k generators

• Simulated level-k networks

Outline

Phylogenetic networks

median network

minimumspanning network

TCS

level-2 network

split network

SplitsTree

Network

Level-2

T-Rex

reticulogram

synthesisdiagram

HorizStory

median network

minimumspanning network

TCS

level-2 network

split network

SplitsTree

Network

Level-2

T-Rex

reticulogram

synthesisdiagram

HorizStory

Phylogenetic networks

Abstract or explicit networks

An explicit phylogenetic network is a phyogenetic network where all reticulations can be interpreted as precise biological events.

An abstract network reflects some phylogenetic signals rather than explicitly displaying biological reticulation events.

Doolittle : Uprooting the Tree of Life, Scientific American (Feb..2000)

Abstract or explicit networks

An explicit phylogenetic network is a phyogenetic network where all reticulations can be interpreted as precise biological events.

An abstract network reflects some phylogenetic signals rather than explicitly displaying biological reticulation events.

Abstract phylogenetic network of mushroom species, www.splitstree.org

Hierarchy of network subclasses

http://www.lirmm.fr/~gambette/RePhylogeneticNetworks.php

Level-k phylogenetic networks

= level 0

level 1 =

http://www.lirmm.fr/~gambette/RePhylogeneticNetworks.php

Level-k phylogenetic networks

Motivation to generalize “galled trees” (= level-1) :

Arenas, Valiente, Posada : Characterization of Phylogenetic ReticulateNetworks based on the Coalescent withRecombination, Molecular Biology and Evolution, to appear.

r1

r2

r3

r4

r

Level-k phylogenetic networks

A level-k phylogenetic network N on a set X of n taxa is a multidigraph in which:- exactly one vertex has indegree 0 and outdegree 2: the root,- all other vertices have either:

- indegree 1 and outdegree 2: split vertices,- indegree 2 and outdegree ≤ 1: reticulation vertices,- or indegree 1 and outdegree 0: leaves labeled by X,

- any blob has at most k reticulation vertices.

a b c d e f g h i j k

N

All arcs are oriented downwards

r1

r2

r3

r4

r

Level-k phylogenetic networks

A level-k phylogenetic network N on a set X of n taxa is a multidigraph in which:- exactly one vertex has indegree 0 and outdegree 2: the root,- all other vertices have either:

- indegree 1 and outdegree 2: split vertices,- indegree 2 and outdegree ≤ 1: reticulation vertices,- or indegree 1 and outdegree 0: leaves labeled by X,

- any blob has at most k reticulation vertices.

a b c d e f g h i j k

N A blob is a maximal induced connected subgraph with no cut arc.

A cut arc is an arc which disconnects the graph.

r1

r2

r3

r4

r

Level-k phylogenetic networks

A level-k phylogenetic network N on a set X of n taxa is a multidigraph in which:- exactly one vertex has indegree 0 and outdegree 2: the root,- all other vertices have either:

- indegree 1 and outdegree 2: split vertices,- indegree 2 and outdegree ≤ 1: reticulation vertices,- or indegree 1 and outdegree 0: leaves labeled by X,

- any blob has at most k reticulation vertices.

a b c d e f g h i j k

N A blob is a maximal induced connected subgraph with no cut arc.

A cut arc is an arc which disconnects the graph.

N has level 2.

• Phylogenetic networks

• Decomposition of level-k networks

• Construction of level-k generators

• Number of level-k generators

• Simulated level-k networks

Outline

Decomposition of level-k networks

We formalize the decomposition into blobs:

a b c d e f g h i j k a b c d e f g h i j kN decomposed as a tree of simple graph patterns: generators.

N, a level-k network.

Generators were introduced by van Iersel & al (Recomb 2008) for the restricted class of simple level-k networks.

Level-k generators

A level-k generator is a level-k network with no cut arc.

G0 G1 2a 2b 2c 2d

The sides of the generator are:- its arcs- its reticulation vertices of outdegree 0

Decomposition theorem of level-k networks

N is a level-k network

iff

there exists a sequence (lj)

jϵ[1,r] of r locations

(arcs or reticulation vertices of outdegree 0)and a sequence (G

j)

jϵ[0,r] of generators of level at most k, such that:

- N = Attachk(l

r,G

r,Attach

k(... Attach

k(l

2,G

2,Attach

k(l

1,G

1,G

0))...)),

- or N = Attachk(l

r,G

r,Attach

k(... Attach

k(l

2,G

2,SplitRoot

k(G

1,G

0))...)).

G0

Decomposition theorem of level-k networks

SplitRootk(G

1,G

0)

G1 G

0G

1

N is a level-k network

iff

there exists a sequence (lj)

jϵ[1,r] of r locations

(arcs or reticulation vertices of outdegree 0)and a sequence (G

j)

jϵ[0,r] of generators of level at most k, such that:

- N = Attachk(l

r,G

r,Attach

k(... Attach

k(l

2,G

2,Attach

k(l

1,G

1,G

0))...)),

- or N = Attachk(l

r,G

r,Attach

k(... Attach

k(l

2,G

2,SplitRoot

k(G

1,G

0))...)).

Decomposition theorem of level-k networks

N is a level-k network

iff

there exists a sequence (lj)

jϵ[1,r] of r locations

(arcs or reticulation vertices of outdegree 0)and a sequence (G

j)

jϵ[0,r] of generators of level at most k, such that:

- N = Attachk(l

r,G

r,Attach

k(... Attach

k(l

2,G

2,Attach

k(l

1,G

1,G

0))...)),

- or N = Attachk(l

r,G

r,Attach

k(... Attach

k(l

2,G

2,SplitRoot

k(G

1,G

0))...)).

Attachk(l

i,G

i,N)

Gi

li is an arc of N

li

Gi

N

Decomposition theorem of level-k networks

Attachk(l

i,G

i,N)

Gi

li is a reticulation vertex of N

li G

i

N

N is a level-k network

iff

there exists a sequence (lj)

jϵ[1,r] of r locations

(arcs or reticulation vertices of outdegree 0)and a sequence (G

j)

jϵ[0,r] of generators of level at most k, such that:

- N = Attachk(l

r,G

r,Attach

k(... Attach

k(l

2,G

2,Attach

k(l

1,G

1,G

0))...)),

- or N = Attachk(l

r,G

r,Attach

k(... Attach

k(l

2,G

2,SplitRoot

k(G

1,G

0))...)).

Decomposition theorem of level-k networks

This decomposition is not unique!

N is a level-k network

iff

there exists a sequence (lj)

jϵ[1,r] of r locations

(arcs or reticulation vertices of outdegree 0)and a sequence (G

j)

jϵ[0,r] of generators of level at most k, such that:

- N = Attachk(l

r,G

r,Attach

k(... Attach

k(l

2,G

2,Attach

k(l

1,G

1,G

0))...)),

- or N = Attachk(l

r,G

r,Attach

k(... Attach

k(l

2,G

2,SplitRoot

k(G

1,G

0))...)).

Decomposition theorem of level-k networks

Unique “graph-labeled tree” decomposition:

Possible applications:- exhaustive generation of level-k networks- counting of level-k networks

31

21

a b c d e f g h i

N

a b c d e f g h i

1h1 h

2 h3 h

4

ρ

• Phylogenetic networks

• Decomposition of level-k networks

• Construction of level-k generators

• Number of level-k generators

• Simulated level-k networks

Outline

Construction of the generatorsVan Iersel & al build the 4 level-2 generators by a case analysis, generalized by Steven Kelk into an exponential algorithm to find all 65 level-3 generators.

Construction of the generators

Van Iersel & al give a simple case analysis for level-2.

We give rules to build level-(k+1) from level-k generators.

h2

h1

Construction of the generators

Construction rules of level-k+1 generatorsfrom level k-generators:

e1

e2

h2

h1

h3

h2

h1 h

3

h2

h1 h

3

h2

h1 h

3R

1(N,h

1,h

2) R

1(N,h

1,e

2) R

1(N,e

2,e

2) R

1(N,e

1,e

2)

N

Rule R1 :

Construction of the generators

e1

e2

h2

h1

N

Rule R2 :

R2(N,h

1,e

2) R

2(N,e

1,e

1) R

2(N,e

2,e

1)

h2

h1 h

2h1

h2h

1

h3

h3

h3

Construction rules of level-k+1 generatorsfrom level k-generators:

h3

h2

h1

Construction of the generators

Problem!Some of the level-k+1 generators obtained from level-k generators are isomorphic!

h2

h1 h

3R

1(2b,h

1,e

2) R

1(2b,h

2,e

1)

h3

h2

h1

Construction of the generators

Problem!Some of the level-k+1 generators obtained from level-k generators are isomorphic!

h2

h1 h

3R

1(N,h

1,e

2) R

1(N,h

2,e

1)

→ difficult to count!

• Phylogenetic networks

• Decomposition of level-k networks

• Construction of level-k generators

• Number of level-k generators

• Simulated level-k networks

Outline

Upper bound

R1 and R

2 can be applied at most on all pairs of sides

A level-k generator has at most 5k slides:

gk+1

< 50 k² gk

Upper bound:g

k < k!² 50k

Theoretical corollary:There is a polynomial algorithm to build the set of level-k+1 generators from the set of level-k generators.

Practical corollary:g

4 < 28350

→ it is possible to enumerate all level-4 generators.

Number of level-k generators

http://www.lirmm.fr/~gambette/ProgramGenerators

It is possible to enumerate all level-4 generators.

Isomorphism of graphs of bounded valence:polynomial

(Luks, FOCS 1980)

Practical algorithm?Simple backtracking exponential algorithm sufficient for level 4 :

go through both graphs from their root in parallel andidentify their vertices: O(n2n-h)

→ g4 = 1993

→ g5 > 71000

Number of level-k generators

Lower bound:g

k ≥ 2k-1

There is an exponential number of generators!

Idea:Code every number between 0 and 2k-1-1 by a level-k generator.

Lower bound

Lower bound

0 1

0 1

k = 1

Lower bound:g

k ≥ 2k-1

There is an exponential number of generators!

Idea:Code every number between 0 and 2k-1-1 by a level-k generator.

Lower bound

0 1

10

0 1

1 0

0 1 2 3

k = 2

Lower bound:g

k ≥ 2k-1

There is an exponential number of generators!

Idea:Code every number between 0 and 2k-1-1 by a level-k generator.

• Phylogenetic networks

• Decomposition of level-k networks

• Construction of level-k generators

• Number of level-k generators

• Simulated level-k networks

Outline

Simulated level-k networks

http://www.lirmm.fr/~gambette/ProgramGenerators

Simulate 1000 phylogenetic networks using the coalescent model with recombination.

Arenas, Valiente, Posada 2008Program Recodon

How many are level-1,2,3... networks?

0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100

0

100

200

300

400

500

600

700

800

900

1000

n=50,r=1n=50,r=2n=50,r=4n=50,r=8n=50,r=16n=50,r=32

level

nb of networks

0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100

0

100

200

300

400

500

600

700

800

900

1000

n=10,r=1n=10,r=2n=10,r=4n=10,r=8n=10,r=16n=10,r=32

level

nb of networks

Simulated level-k networks

level

number of reticulations

0 5 9

9

5

0

Simulate 1000 phylogenetic networks using the coalescent model with recombination.

Link between level and number of reticulations:

http://www.lirmm.fr/~gambette/ProgramGenerators

Summary on the level parameter

Advantages:• natural structure for all explicit phylogenetic networks• global tree-structure used algorithmically• finite graph patterns to represent blobs: generators

Limits:• number of generators exponential in the level• complex structure of generators• when recombination is not local, level doesn't help

Questions?Thank you for your attention!

TreeCloud available at http://treecloud.org - Slides available at http://www.lirmm.fr/~gambette/RePresentationsENG.php

Split network and reticulogram of the 50 most frequent words in CPM titles, hyperlex cooccurrence distance.

TreeCloud

TreeCloud T-Rex