Parsimony and searching tree-space

The basic idea

To infer trees we want to find clades (groups) that are supported by synapomorphies (shared derived traits).

More simply put, we assume that if species are similar it is usually due to common descent rather than due to chance.

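Sometimes the data agrees

[Figure: a tree drawn against a time axis, with a sequence at each node. Sequences shown: ACCGCTTA, ACTGCTTA, ACTGCTAA, ACCCCTTA, ACCCCATA. The four tip sequences are ACCCCTTA, ACCCCATA, ACTGCTTA and ACTGCTAA.]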

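Sometimes not

[Figure: a tree drawn against a time axis, with a sequence at each node. Sequences shown: ACCGCTTA, ACTGCTTA, ACTGCTAA, ACTGCTTC, ACCCCTTA, ACCCCTTC, ACCCCATA. The four tip sequences are ACCCCTTC, ACCCCATA, ACTGCTTC and ACTGCTAA.]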

Homoplasy

When two or more characters cannot all fit on the same tree unless at least one of them undergoes a parallel change or a reversal, this is called homoplasy.

[Figure: a tree drawn against a time axis, with sequences ACCGCTTA, ACTGCTTA, ACTGCTAA, ACTGCTTC, ACCCCTTA, ACCCCTTC and ACCCCATA at its nodes.]

How can we choose the best tree?

To decide which tree is best we can use an optimality criterion.

Parsimony is one such criterion. It chooses the tree which requires the fewest mutations to explain the data.

The Principle of Parsimony is the general scientific principle that accepts the simpler of two explanations as preferable.

Alignment:

S1  ACCCCTTC
S2  ACCCCATA
S3  ACTGCTTC
S4  ACTGCTAA

Mutations required at each site on the two candidate trees, (1,2),(3,4) and (1,3),(2,4):

Site           1  2  3  4  5  6  7  8   Total
States         A  C  C  C  C  T  T  C
(S1 to S4)     A  C  C  C  C  A  T  A
               A  C  T  G  C  T  T  C
               A  C  T  G  C  T  A  A
(1,2),(3,4)    0  0  1  1  0  1  1  2   6
(1,3),(2,4)    0  0  2  2  0  1  1  1   7

According to the parsimony optimality criterion we should prefer the tree (1,2),(3,4) over the tree (1,3),(2,4), as it requires fewer mutations (6 rather than 7).

Maximum Parsimony

The parsimony criterion tries to minimise the number of mutations required to explain the data.

The “Small Parsimony Problem” is to compute the number of mutations required on a given tree.

For small examples it is straightforward to see how many mutations are needed.

[Figure: two alternative trees on Cat, Dog, Rat and Mouse, with nodes labelled A or G.]

The Fitch algorithm

For larger examples we need an algorithm to solve the small parsimony problem.

[Figure: a tree with eight tips labelled a to h.]

Taxon  a  b  c  d  e  f  g  h
Site   A  A  C  C  G  G  T  A

The Fitch algorithm

Label the tips of the tree with the observed sequence at the site

[Figure: the tree with its tips labelled with their observed states (A, A, C, C, G, G, T, A).]

The Fitch algorithm

Pick an arbitrary root to work towards

[Figure: the same tip-labelled tree (A, A, C, C, G, G, T, A).]

The Fitch algorithm

Work from the tips of the tree towards the root. Label each node with the intersection of the states of its child nodes.

If the intersection is empty, label the node with the union and add one to the cost.

[Figure: the tip-labelled tree with internal nodes labelled A, {A,T}, A, {C,G}, C and {A,C,G}. Cost 4.]

Fitch continued…

The Fitch algorithm also has a second phase that allocates states to the internal nodes but it does not affect the cost.

To find the Fitch cost of an alignment for a particular tree we just sum the Fitch costs of all the sites.
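
The first (bottom-up) phase of the Fitch algorithm can be sketched in a few lines of Python. This is only an illustration, not code from the slides: it assumes a rooted binary tree written as nested pairs of tip names, and the function names `fitch_site` and `fitch_cost` are made up for the example. It scores the four-sequence alignment S1 to S4 used earlier on the trees (1,2),(3,4) and (1,3),(2,4).

```python
def fitch_site(tree, states):
    """Return (state_set, cost) for one site on a rooted binary tree.

    tree   -- a tip name (str) or a pair (left, right) of subtrees (assumed encoding)
    states -- dict mapping tip name to the observed state at this site
    """
    if isinstance(tree, str):              # tip: label with the observed state
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:                             # non-empty intersection: no extra cost
        return common, left_cost + right_cost
    # empty intersection: label with the union and add one to the cost
    return left_set | right_set, left_cost + right_cost + 1


def fitch_cost(tree, alignment):
    """Sum the per-site Fitch costs over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


alignment = {"S1": "ACCCCTTC", "S2": "ACCCCATA", "S3": "ACTGCTTC", "S4": "ACTGCTAA"}
tree_12_34 = (("S1", "S2"), ("S3", "S4"))
tree_13_24 = (("S1", "S3"), ("S2", "S4"))
print(fitch_cost(tree_12_34, alignment))   # 6
print(fitch_cost(tree_13_24, alignment))   # 7
```

The two totals match the counts of 6 and 7 obtained by hand for these trees. The position of the root does not change the cost, so any convenient rooting of an unrooted tree can be used.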

The “large parsimony problem”

The small parsimony problem, finding the score of a given tree, can be solved in time linear in the size of the tree.

The large parsimony problem is to find the tree with minimum score.

It is known to be NP-hard.

How many trees are there?

#species   #unrooted binary tip-labelled trees
4          3
5          3*5 = 15
6          3*5*7 = 105
7          3*5*7*9 = 945
10         2,027,025
20         2.2*10^20
n          (2n-5)!!

An exact search for the best tree, where each tree is evaluated according to some optimality criterion such as parsimony, quickly becomes intractable as the number of species increases.
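
As a quick check of the table, the double factorial (2n-5)!! can be computed directly. The short Python sketch below is illustrative only; the function name `count_unrooted_trees` is an assumption, not something from the slides.

```python
def count_unrooted_trees(n):
    """Number of unrooted binary tip-labelled trees on n >= 3 species: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 species")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


for n in (4, 5, 6, 7, 10, 20):
    print(n, count_unrooted_trees(n))
```

For n = 20 the result is about 2.2*10^20, which is why scoring every tree stops being practical well before that point.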

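Counting trees

[Figure: the single unrooted tree on taxa 1, 2 and 3; the trees obtained by attaching taxon 4 to each of its 3 branches; and the trees obtained by attaching taxon 5 to each of the 5 branches of a 4-taxon tree.]

1 x 3 = 3

1 x 3 x 5 = 15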

Search strategies

Exact search: possible for small n only.

Branch and Bound: up to ~20 taxa.

Local Search (heuristics): pick a good starting tree and use moves within a “neighbourhood” to find a better tree.

Meta-heuristics: genetic algorithms, simulated annealing, the ratchet.

Exact searches

For a small number of taxa (n<=12) it is possible to compute the score of every tree.

Branch and Bound searches are also guaranteed to find the optimal solution, but use some clever rules to avoid having to check all trees. They may be effective for up to 25 taxa.

http://evolution.gs.washington.edu/gs541/2005/lecture25.pdf

No need to evaluate this whole branch of the search tree, as no tree can have a score better than 9
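
A minimal branch and bound search might look like the Python sketch below. It is an illustration under stated assumptions rather than the procedure shown in the figure: trees are nested pairs of tip names, taxa are added one at a time to every branch of a partial tree, and the bound is simply that adding a taxon can never lower the Fitch cost, so a partial tree that already costs at least as much as the best complete tree found so far is abandoned. All function names are hypothetical.

```python
def fitch_cost(tree, alignment):
    """Fitch cost of the alignment on a rooted binary tree of nested pairs."""
    def site(node, i):
        if isinstance(node, str):                    # tip: observed state, no cost
            return {alignment[node][i]}, 0
        (ls, lc), (rs, rc) = site(node[0], i), site(node[1], i)
        common = ls & rs
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)
    n_sites = len(next(iter(alignment.values())))
    return sum(site(tree, i)[1] for i in range(n_sites))


def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf to one branch of tree."""
    yield (tree, leaf)                               # the branch above this subtree
    if not isinstance(tree, str):
        left, right = tree
        for sub in insertions(left, leaf):
            yield (sub, right)
        for sub in insertions(right, leaf):
            yield (left, sub)


def branch_and_bound(alignment):
    """Return (best_score, best_tree) over all unrooted binary trees."""
    taxa = sorted(alignment)
    best = {"score": float("inf"), "tree": None}

    def extend(subtree, remaining):
        tree = (taxa[0], subtree)                    # first taxon fixed next to the root
        score = fitch_cost(tree, alignment)
        if score >= best["score"]:                   # bound: no completion can do better
            return
        if not remaining:                            # complete tree that beats the best
            best["score"], best["tree"] = score, tree
            return
        for bigger in insertions(subtree, remaining[0]):
            extend(bigger, remaining[1:])

    extend(taxa[1], taxa[2:])
    return best["score"], best["tree"]


alignment = {"S1": "ACCCCTTC", "S2": "ACCCCATA", "S3": "ACTGCTTC", "S4": "ACTGCTAA"}
score, tree = branch_and_bound(alignment)
print(score, tree)
```

On the four-sequence example this returns a score of 6 for the tree that groups S1 with S2 and S3 with S4. Real implementations use tighter bounds and a good starting tree, so that more of the search tree is pruned.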

The problem of local optima

Nearest Neighbor Interchange (NNI)

Subtree Pruning and Regrafting (SPR)

Tree Bisection and Reconnection (TBR)