MCB 372 #12: Tree, Quartets and Supermatrix Approaches


Page 1: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

MCB 372 #12: Tree, Quartets and Supermatrix Approaches

Collaborators:

Olga Zhaxybayeva (Dalhousie)
Jinling Huang (ECU)
Tim Harlow (UConn)
Pascal Lapierre (UConn)
Greg Fournier (UConn)

Funded through the NASA Exobiology and AISR Programs, and NSF Microbial Genetics

Edvard Munch, The Dance of Life (1900)

J. Peter Gogarten

University of Connecticut, Dept. of Molecular and Cell Biology

Page 2: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

In the Felsenstein Zone, “long branches attract”

[Figure: the “true” tree and the inferred tree for taxa A, B, C and D. Branch lengths shown: two long branches of 0.8 and three short branches of 0.1.]
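The effect can be illustrated with a small calculation. The sketch below is not the simulation behind the slide. It assumes a two-state symmetric substitution model (the lecture's examples use protein sequences) and assumes that A and D carry the long branches, as in the tree ((A:x,B:0.1):0.1,C:0.1,D:x) shown later in this transcript. It enumerates all character histories and sums the probabilities of the three parsimony-informative site patterns. The function names are hypothetical.

```python
import itertools
import math

def change_prob(t):
    """Probability that a two-state symmetric character differs across a branch of length t."""
    return 0.5 * (1.0 - math.exp(-2.0 * t))

def pattern_probs(long_len=0.8, short_len=0.1):
    """Informative site-pattern probabilities on ((A:long,B:short):short,C:short,D:long)."""
    p_long, p_short = change_prob(long_len), change_prob(short_len)

    def edge(x, y, p):
        # Probability of the observed end states on one branch.
        return p if x != y else 1.0 - p

    probs = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    # u is the internal node joining A and B; v joins C and D (assumed layout).
    for a, b, c, d, u, v in itertools.product((0, 1), repeat=6):
        pr = (0.5 * edge(u, a, p_long) * edge(u, b, p_short)
              * edge(u, v, p_short)
              * edge(v, c, p_short) * edge(v, d, p_long))
        if a == b and c == d and a != c:
            probs["AB|CD"] += pr      # pattern supporting the true grouping
        elif a == c and b == d and a != b:
            probs["AC|BD"] += pr
        elif a == d and b == c and a != b:
            probs["AD|BC"] += pr      # pattern joining the two long branches
    return probs

probs = pattern_probs()
for split, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{split}: {p:.4f}")
```

Under these assumptions the pattern that joins the two long branches is expected more often than the pattern that supports the true grouping, so adding more sites makes parsimony more confident in the wrong tree. With all branches set to 0.1 the true grouping is favoured again.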

Page 3: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

Protpars reconstructions

[Figure: outcome of the reconstructions plotted against the long branch length. x-axis: Long Branch Length (substitutions per site), 0.1 to 1; y-axis: Reconstructions, 0 to 100; series: LBR, Correct, LBA. Two four-taxon tree insets with taxa A, B, C and D.]

Page 4: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

ML reconstructions with alignment step

((A:x,B:0.1):0.1,C:0.1,D:x) With Alignment

[Figure: x-axis from 0.1 to 45.1; y-axis from 0 to 100; series: LBA, LBR, Tie (Non-original), Tie (Original), Original.]

Page 5: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

Long branch attraction artifact

100% bootstrap support for bipartition (AD)(CB): the two longest branches join together.

What could you do to investigate if this is a possible explanation? Use only slowly evolving positions, or use an algorithm that corrects for among-site rate variation (ASRV).

[Figure: the inferred tree and the true tree, each with tips labeled "seq. from A", "seq. from B", "seq. from C" and "seq. from D".]

Page 6: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

Consensus of all trees from all bootstrap samples

2 P_mobilis, 3 Thermosipho, 1 F_nodosum, 4 T_lettingae, 5 T_maritima, 7 T_RQ2, 6 T_petrophila

+----------------------------------2 P_mobilis
|
|                           +------6 T_petrophila
|                    +134.0-|
|             +675.0-|      +------7 T_RQ2
|             |      |
|      +-60.0-|      +-------------5 T_maritima
|      |      |
+------|      +--------------------4 T_lettingae
       |
       |                    +------3 Thermosipho
       +--------------206.0-|
                            +------1 F_nodosum
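The numbers on the branches of a consensus tree count how often each bipartition occurs among the input trees. A minimal sketch of that counting step follows. It assumes trees are given as nested tuples of taxon names. The toy trees on taxa A to D are hypothetical and are not the Thermotogae data shown above, and this is not the consensus program's own code.

```python
from collections import Counter

def collect_clades(tree):
    """Return (leaf set, list of clade leaf sets) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found

def bipartitions(tree):
    """Non-trivial bipartitions (splits) of the taxa implied by one tree."""
    all_leaves, found = collect_clades(tree)
    splits = set()
    for clade in found:
        rest = all_leaves - clade
        if len(clade) > 1 and len(rest) > 1:
            splits.add(frozenset([clade, rest]))
    return splits

def split_support(trees):
    """Fraction of trees that contain each bipartition."""
    counts = Counter()
    for tree in trees:
        counts.update(bipartitions(tree))
    return {split: n / len(trees) for split, n in counts.items()}

# Hypothetical bootstrap trees: two recover (AB)(CD), one recovers (AD)(BC).
toy_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "D"), ("B", "C")),
]
support = split_support(toy_trees)
for split, value in support.items():
    sides = " | ".join("".join(sorted(side)) for side in sorted(split, key=sorted))
    print(f"{sides}: {value:.2f}")
```

A majority-rule consensus keeps the bipartitions whose support exceeds 0.5. Collapsing at 95% keeps only those at or above 0.95.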

Page 7: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

Consensus of all consensus trees

2 P_mobilis, 3 Thermosipho, 1 F_nodosum, 4 T_lettingae, 5 T_maritima, 7 T_RQ2, 6 T_petrophila

                            +------6 T_petrophila
                     +318.0-|
              +745.0-|      +------7 T_RQ2
              |      |
       +330.0-|      +-------------5 T_maritima
       |      |
+------|      +--------------------4 T_lettingae
|      |
|      |                    +------3 Thermosipho
|      +--------------562.0-|
|                           +------1 F_nodosum
|
+----------------------------------2 P_mobilis

Page 8: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

Consensus of all collapsed (<95%) consensus trees

+----------------------------------2 P_mobilis
|
|                           +------6 T_petrophila
|                    +134.0-|
|             +675.0-|      +------7 T_RQ2
|             |      |
|      +-60.0-|      +-------------5 T_maritima
|      |      |
+------|      +--------------------4 T_lettingae
       |
       |                    +------3 Thermosipho
       +--------------206.0-|
                            +------1 F_nodosum

If you still have difficulties with trees, do the tree tests at

http://www.sciencemag.org/cgi/content/full/310/5750/979/DC1

Page 9: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

METAGENOME

Welch et al., 2002

[Figure: Edwards-Venn cogwheel diagram (A.W.F. Edwards 1998). Labels: core, strain-specific, pan-genome.]

Page 10: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

Genomic Islands

Binnewies, Motro et al., Funct. Integr. Genomics (2006) 6: 165–185

Page 11: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

Gene frequency in a typical genome

- Pick a random gene from any of the 293 genomes
- Determine in how many genomes this gene is present
- Sampling of 15,000 genes

$F(x) = \sum_n A_n e^{K_n x}$

[Figure labels: Character genes, Accessory pool, Extended Core]

Page 12: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

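Kézdy-Swinbourne Plot

If $f(x) = K + A e^{-kx}$, then

$f(x+\Delta x) = K + A e^{-k(x+\Delta x)}$.

Through elimination of $A$:

$f(x+\Delta x) = e^{-k\Delta x} f(x) + K'$, where $K' = K(1 - e^{-k\Delta x})$.

And for $x \to \infty$: $f(x) \to K$ and $f(x+\Delta x) \to K$.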

[Figure: Kézdy-Swinbourne plot with delta x = 10. x-axis: novel genes after looking in x genomes; y-axis: novel genes after looking in x + ∆x genomes. Points are labeled with x = 20, 30, ..., 130.]

Only values with x ≥ 80 genomes were included.

Even after comparing to a very large (infinite) number of bacterial genomes, on average, each new genome will contain about 230 genes that do not have a homolog in the other genomes.

~230 novel genes per genome
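The asymptote follows from the linear relation above: regressing f(x+∆x) on f(x) gives a slope of exp(-k∆x) and an intercept K', and the fitted line meets the diagonal at K = K'/(1 - slope). The sketch below applies this to synthetic data, not to the genome counts behind the slide. The asymptote of 230 is taken from the slide, while the amplitude A = 1000 and rate k = 0.02 are made-up values, and the function name is hypothetical.

```python
import math

def kezdy_swinbourne(xs, fs, delta):
    """Estimate the asymptote K and rate k from pairs (f(x), f(x + delta))."""
    lookup = dict(zip(xs, fs))
    pairs = [(f, lookup[x + delta]) for x, f in zip(xs, fs) if x + delta in lookup]
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    # Ordinary least-squares line v = slope * u + intercept.
    slope = (sum((u - mean_u) * (v - mean_v) for u, v in pairs)
             / sum((u - mean_u) ** 2 for u, _ in pairs))
    intercept = mean_v - slope * mean_u
    K = intercept / (1.0 - slope)      # where the line crosses v = u
    k = -math.log(slope) / delta       # because slope = exp(-k * delta)
    return K, k

# Synthetic decay curve; A and k are illustrative assumptions.
TRUE_K, TRUE_A, TRUE_RATE = 230.0, 1000.0, 0.02
xs = list(range(80, 301, 10))          # mirrors the x >= 80 cutoff on the slide
fs = [TRUE_K + TRUE_A * math.exp(-TRUE_RATE * x) for x in xs]

K_est, k_est = kezdy_swinbourne(xs, fs, delta=10)
print(f"estimated K = {K_est:.3f}, estimated k = {k_est:.5f}")
```

With noise-free input the fit recovers K and k up to rounding. Real counts scatter around the line, and the estimate of K becomes sensitive to the slope when the slope is close to 1.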

Page 13: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

Gene frequency in individual genomes

[Figure labels: ca. 70%, ca. 5%, ca. 25%]

Extended Core
Character Genes
Accessory Pool (mainly genes acquired from the mobilome)

Approximate number of genes sampled in 200 bacterial genomes:

25,160 core genes
453,781 extended core genes
156,259 accessory genes

Page 14: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

The Phylogenetic Position of Thermotoga

(a) according to concordant genes, (b) according to all genes and to 16S, (c) according to phylogenetically discordant genes

Gophna, U., Doolittle, W.F. & Charlebois, R.L.: Weighted genome trees: refinements and applications. J. Bacteriol. (2005)

Page 15: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

From: Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 2005 May;6(5):361-75.


Page 16: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

Supertree vs. Supermatrix

Schematic of MRP supertree (left) and parsimony supermatrix (right) approaches to the analysis of three data sets. Clade C+D is supported by all three separate data sets, but not by the supermatrix. Synapomorphies for clade C+D are highlighted in pink. Clade A+B+C is not supported by separate analyses of the three data sets, but is supported by the supermatrix. Synapomorphies for clade A+B+C are highlighted in blue. E is the outgroup used to root the tree.


From: Alan de Queiroz, John Gatesy: The supermatrix approach to systematics. Trends Ecol Evol. 2007 Jan;22(1):34-41
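In matrix representation with parsimony (MRP), each clade of each source tree becomes one binary character: taxa inside the clade are scored 1, other taxa in that tree 0, and taxa missing from that tree "?". The resulting matrix is then analysed with parsimony. A minimal sketch of the encoding step only is given below. It assumes rooted source trees written as nested tuples. The three toy trees, with outgroup E, are hypothetical and are not the data sets in the figure.

```python
def collect_clades(tree):
    """Return (leaf set, list of clade leaf sets) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found

def mrp_matrix(trees):
    """Encode every proper clade of every source tree as one binary column."""
    per_tree = [collect_clades(tree) for tree in trees]
    taxa = sorted(set().union(*(leaves for leaves, _ in per_tree)))
    columns = []
    for leaves, found in per_tree:
        for clade in found:
            if 1 < len(clade) < len(leaves):   # skip the all-taxa "clade"
                columns.append({
                    t: "1" if t in clade else "0" if t in leaves else "?"
                    for t in taxa
                })
    return {t: "".join(col[t] for col in columns) for t in taxa}

# Hypothetical source trees; the third one lacks taxon A.
source_trees = [
    ("E", ("A", ("B", ("C", "D")))),
    ("E", (("A", "B"), ("C", "D"))),
    ("E", ("B", ("C", "D"))),
]
matrix = mrp_matrix(source_trees)
for taxon, row in matrix.items():
    print(taxon, row)
```

A supermatrix analysis instead concatenates the original character data of all data sets and analyses them together, which is why the two approaches can support different clades.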

Page 17: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

A) Template tree

B) Generate 100 datasets using Evolver with a certain number of HGTs

C) Calculate 1 tree using the concatenated dataset, or 100 individual trees

D) Calculate quartet-based tree using Quartet Suite

Repeated 100 times…
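A quartet-based analysis breaks every gene tree into its four-taxon subtrees and asks which of the three possible topologies each gene supports. The sketch below shows that decomposition and a simple plurality vote. It is a generic illustration, not the method implemented in Quartet Suite. The nested-tuple gene trees are hypothetical, with the third one mimicking a gene affected by a transfer.

```python
from collections import Counter
from itertools import combinations

def collect_clades(tree):
    """Return (leaf set, list of clade leaf sets) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found

def quartet_topologies(tree):
    """Map each set of four taxa to the split that the tree induces on it."""
    leaves, found = collect_clades(tree)
    result = {}
    for quartet in combinations(sorted(leaves), 4):
        members = set(quartet)
        for clade in found:
            inside = members & clade
            if len(inside) == 2:       # this clade separates two taxa from the other two
                result[quartet] = frozenset([frozenset(inside),
                                             frozenset(members - inside)])
                break
    return result

def plurality_quartets(trees):
    """For every quartet, the topology found in the most gene trees."""
    votes = {}
    for tree in trees:
        for quartet, topology in quartet_topologies(tree).items():
            votes.setdefault(quartet, Counter())[topology] += 1
    return {q: counts.most_common(1)[0][0] for q, counts in votes.items()}

# Hypothetical gene trees; the third places A with D instead of B.
gene_trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "D"), "C"), ("B", "E")),
]
winners = plurality_quartets(gene_trees)
for quartet, topology in winners.items():
    sides = " | ".join("".join(sorted(side)) for side in sorted(topology, key=sorted))
    print("".join(quartet), "->", sides)
```

The winning quartets would then be assembled into a supertree. That assembly step is not shown here.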

Page 18: MCB 372 #12:   Tree, Quartets and Supermatrix Approaches

Supermatrix versus Quartet based Supertree

inset: simulated phylogeny