MCB 372 #12: Tree, Quartets and Supermatrix Approaches

Post on 15-Jan-2016

30 views 0 download

Tags:

description

MCB 372 #12: Tree, Quartets and Supermatrix Approaches. J. Peter Gogarten University of Connecticut Dept. of Molecular and Cell Biology. Collaborators: Olga Zhaxybayeva (Dalhousie) Jinling Huang (ECU) Tim Harlow (UConn) Pascal Lapierre (UConn) Greg Fournier (UConn). - PowerPoint PPT Presentation

Transcript of MCB 372 #12: Tree, Quartets and Supermatrix Approaches

MCB 372 #12: Tree, Quartets and Supermatrix Approaches

Collaborators:

Olga Zhaxybayeva (Dalhousie)

Jinling Huang (ECU) Tim Harlow (UConn) Pascal Lapierre (UConn) Greg Fournier (UConn)

Funded through the NASA Exobiology and AISR Programs, and NSF Microbial Genetics

Edvard Munch, The Dance of Life (1900)

J. Peter Gogarten

University of ConnecticutDept. of Molecular and Cell Biology

In the Felsenstein Zone“long branches attract”

0.8

0.1

0.8

0.10.1

C

B

D

A

A B

C D“true” tree

inferred treeA

B

C

D

0

10

20

30

40

50

60

70

80

90

100

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Long Branch Length (substitutions per site)

Reconstructions

LBR

Correct

LBA

Protpars reconstructions

B

A C

D

D

A C

B

ML reconstructions with alignment step

((A:x,B:0.1):0.1,C:0.1,D:x) With Alignment

0

10

20

30

40

50

60

70

80

90

100

0.1 5.1 10.1 15.1 20.1 25.1 30.1 35.1 40.1 45.1

LBA LBR Tie (Non-original) Tie (Original) Original

long branch attraction artifact

100% bootstrap support for bipartition (AD)(CB)

the two longest branches join together

What could you do to investigate if this is a possible explanation? use only slow positions, use an algorithm that corrects for ASRV

seq. from B

seq. from A

seq. from Cseq. from D

seq. from B

seq. from Aseq. from C

seq. from D

True Tree:

Consensus of all trees from all bootstrap samples

2 P_mobilis3 Thermosipho1 F_nodosum4 T_lettingae5 T_maritima7 T_RQ26 T_petrophila

+----------------------------------2 P_mobilis | | +------6 T_petrophila | +134.0-| | +675.0-| +------7 T_RQ2 | | | | +-60.0-| +-------------5 T_maritima | | | +------| +--------------------4 T_lettingae | | +------3 Thermosipho +--------------206.0-| +------1 F_nodosum

Consensus of all consensus trees

2 P_mobilis3 Thermosipho1 F_nodosum4 T_lettingae5 T_maritima7 T_RQ26 T_petrophila

+------6 T_petrophila +318.0-| +745.0-| +------7T_RQ2 | | +330.0-| +-------------5T_maritima | | +------| +--------------------4 T_lettingae | | | | +------3 Thermosipho | +--------------562.0-| | +------1 F_nodosum | +----------------------------------2 P_mobilis

Consensus of all collapsed (<95%) consensus trees

+----------------------------------2 P_mobilis | | +------6 T_petrophila | +134.0-| | +675.0-| +------7 T_RQ2 | | | | +-60.0-| +-------------5 T_maritima | | | +------| +--------------------4 T_lettingae | | +------3 Thermosipho +--------------206.0-| +------1 F_nodosum

If you still have difficulties with tree do the tree tests at

http://www.sciencemag.org/cgi/content/full/310/5750/979/DC1

METAGENOME

Welch et al, 2002

A.W

.F. E

dwar

ds 1

998

Edw

ards

-Ve

nn c

ogw

hee

l

core

Strain-specific

Pan-genome+ +

+

Genomic Islands

Binnewies, Motro et al., Funct. Integr. Genomics (2006) 6: 165–185

Gene frequency in a typical genome

-Pick a random gene from any of the 293 genomes

-Search in how many genomes this gene is present

-Sampling of 15,000 genes

F(x) = sum [ An*exp(Kn*x)]

(Character genes)(Accessory pool) (Extended Core)

Kézdy-Swinbourne Plot If f(x)=K+A • exp(-k•x), then

f(x+∆x)=K+A • exp(-k•(x+∆x)).

Through elimination of A:

f(x+∆x)=exp(-k • ∆x) • f(x) + K’

And for x, f(x)K, f(x+∆x)K

0

50

100

150

200

250

300

350

400

450

500

-100 0 100 200 300 400 500 600

delta x = 10

2030405060708090100110120130

Novel genes after looking in x genomes

No

vel g

en

es

aft

er

loo

kin

g in

x +

∆x

ge

no

me

s

only values with x ≥ 80 genomes were included

Even after comparing to a very large (infinite) number of bacterial genomes, on average, each new genome will contain about 230 genes that do not have a homolog in the other genomes.

~230 novel genes

per genome

ca.70%

ca. 5%

ca. 25%

Gene frequency in individual genomes

Extended Core

Character Genes

Accessory Pool(mainly genes acquired form the mobilome)

Approximate number of genes sampled in 200 bacterial genomes:

25,160 core genes453,781 extended core genes156,259 accessory genes

The Phylogenetic Position of Thermotoga

(a) concordant genes, (b) all genes & according to 16S (c) according to phylogenetically discordant genes

Gophna, U., Doolittle, W.F. & Charlebois, R.L.:

Weighted genome trees: refinements and applications. J. Bacteriol. (2005)

From

: D

elsuc F, B

rinkmann H

, Philippe H

.P

hylogenomics and the reconstruction of the tree of life.

Nat R

ev Genet. 2005 M

ay;6(5):361-75.

QuickTime™ and a decompressor

are needed to see this picture.

Supertree vs. Supermatrix

Schematic of MRP supertree (left) and parsimony supermatrix (right) approaches to the analysis of three data sets. Clade C+D is supported by all three separate data sets, but not by the supermatrix. Synapomorphies for clade C+D are highlighted in pink. Clade A+B+C is not supported by separate analyses of the three data sets, but is supported by the supermatrix. Synapomorphies for clade A+B+C are highlighted in blue. E is the outgroup used to root the tree.

QuickTime™ and a decompressor

are needed to see this picture.

From

: A

lan de Queiroz John G

atesy: T

he supermatrix approach to system

aticsT

rends Ecol E

vol. 2007 Jan;22(1):34-41

A) Template tree

B) Generate 100 datasets using Evolver with certain amount of HGTs

C) Calculate 1 tree using the concatenated dataset or 100 individual trees

D) Calculate Quartet based tree using Quartet Suite Repeated 100 times…

Supermatrix versus Quartet based Supertree

inset: simulated phylogeny