BNFO 602 Phylogenetics Usman Roshan. Summary of last time Models of evolution Distance based tree...

Date posted: 21-Dec-2015


Page 1: BNFO 602 Phylogenetics Usman Roshan. Summary of last time Models of evolution Distance based tree reconstruction –Neighbor joining –UPGMA.

BNFO 602 Phylogenetics

Usman Roshan


Summary of last time

• Models of evolution

• Distance-based tree reconstruction
  – Neighbor joining
  – UPGMA


Why phylogenetics?

• Study of evolution
  – Origin and migration of humans
  – Origin and spread of disease

• Many applications in comparative bioinformatics
  – Sequence alignment
  – Motif detection (phylogenetic motifs, evolutionary trace, phylogenetic footprinting)
  – Correlated mutation (useful for structural contact prediction)
  – Protein interaction
  – Gene networks
  – Vaccine development
  – And many more…


Maximum Parsimony

• Character based method

• NP-hard (equivalent to a Steiner tree problem on the space of sequences under Hamming distance)

• Widely-used in phylogenetics

• Slower than NJ but more accurate

• Faster than ML

• Assumes sites are independent and identically distributed (i.i.d.)


Maximum Parsimony

• Input: Set S of n aligned sequences of length k

• Output: A phylogenetic tree T
  – leaf-labeled by sequences in S
  – additional sequences of length k labeling the internal nodes of T

such that $\sum_{(i,j) \in E(T)} H(i,j)$ is minimized, where $H(i,j)$ is the Hamming distance between the sequences labeling nodes i and j.


Maximum parsimony (example)

• Input: Four sequences
  – ACT
  – ACA
  – GTT
  – GTA

• Question: which of the three trees has the best MP score?


Maximum Parsimony

[Figure: the three candidate trees on the leaves ACT, ACA, GTT, GTA]


Maximum Parsimony

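[Figure: the three candidate trees with internal-node labels and edge costs]

• Tree 1: leaves ACT, GTT, ACA, GTA; internal nodes GTT, GTA; edge costs 1, 2, 2; MP score = 5

• Tree 2: leaves ACA, ACT, GTA, GTT; internal nodes ACA, ACT; edge costs 3, 1, 3; MP score = 7

• Tree 3: leaves ACT, ACA, GTT, GTA; internal nodes ACA, GTA; edge costs 1, 2, 1; MP score = 4

Optimal MP tree: Tree 3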
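
As a check on the three scores, here is a minimal Python sketch of the MP objective: sum the Hamming distances over the edges of a fully labeled tree. The edge lists are an assumed reading of the slide figure (which leaf attaches to which internal node is inferred from the labels and edge costs), and `hamming` and `tree_length` are illustrative names, not part of the lecture.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def tree_length(edges):
    """MP score of a fully labeled tree: total Hamming distance over its edges."""
    return sum(hamming(u, v) for u, v in edges)


# Each tree is a list of (label, label) edges: four leaf edges plus one
# internal edge. The groupings are an assumption read off the slide figure.
tree1 = [("ACT", "GTT"), ("GTT", "GTT"), ("GTT", "GTA"), ("GTA", "ACA"), ("GTA", "GTA")]
tree2 = [("ACA", "ACA"), ("GTT", "ACA"), ("ACA", "ACT"), ("ACT", "ACT"), ("GTA", "ACT")]
tree3 = [("ACT", "ACA"), ("ACA", "ACA"), ("ACA", "GTA"), ("GTA", "GTT"), ("GTA", "GTA")]

scores = [tree_length(t) for t in (tree1, tree2, tree3)]
print(scores)  # [5, 7, 4]
```

These scores are for the internal labels as drawn; a different labeling of the same topology can give a different length.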


Maximum Parsimony: computational complexity

[Figure: the optimal MP tree from the example; leaves ACT, ACA, GTT, GTA; internal nodes ACA, GTA; edge costs 1, 2, 1; MP score = 4]

Finding the optimal MP tree is NP-hard

Optimal labeling of a fixed tree can be computed in linear time O(nk)
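
The slide does not name the linear-time labeling method; a common choice is Fitch's algorithm, so the sketch below assumes it. It scores a fixed rooted binary tree given as nested tuples (rooting an unrooted tree on any edge leaves the parsimony score unchanged). `fitch_score` is an illustrative name.

```python
def fitch_score(tree, k):
    """Minimum number of changes on a fixed rooted binary tree (Fitch's algorithm).

    tree: a leaf is a string of length k; an internal node is a pair (left, right).
    For a constant-size alphabet this does O(n*k) work for n leaves.
    """
    def state_sets(node):
        # Returns (candidate state set per site, changes counted in this subtree).
        if isinstance(node, str):
            if len(node) != k:
                raise ValueError("every sequence must have length k")
            return [{c} for c in node], 0
        left, right = node
        lsets, lcost = state_sets(left)
        rsets, rcost = state_sets(right)
        sets, cost = [], lcost + rcost
        for a, b in zip(lsets, rsets):
            common = a & b
            if common:
                sets.append(common)   # children agree: no change needed here
            else:
                sets.append(a | b)    # children disagree: one change at this site
                cost += 1
        return sets, cost

    return state_sets(tree)[1]


print(fitch_score((("ACT", "ACA"), ("GTT", "GTA")), 3))  # 4
print(fitch_score((("ACT", "GTT"), ("ACA", "GTA")), 3))  # 5
```

The sketch returns only the score; recovering an actual internal labeling needs a second, top-down pass that picks one state from each set.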


Local search strategies

[Figure: cost plotted over the space of phylogenetic trees, with a global optimum and a local optimum marked]


Local search for MP

• Determine a candidate solution s

• While s is not a local minimum
  – Find a neighbor s' of s such that MP(s') < MP(s)
  – If found, set s = s'
  – Else return s and exit

• Time complexity: unknown; the search could end quickly or run for a very long time, depending on the starting tree and the local move

• Need to specify how to construct the starting tree and the local move
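
The loop above can be sketched generically in Python. This is a minimal sketch, not the lecture's implementation: `local_search`, `cost`, and `neighbors` are illustrative names, and the toy usage runs on integers rather than trees so that it stays self-contained.

```python
def local_search(start, cost, neighbors):
    """Hill-climb from `start` until no neighbor has strictly lower cost."""
    current = start
    while True:
        current_cost = cost(current)
        # Take the first strictly improving neighbor, if any.
        better = next((s for s in neighbors(current) if cost(s) < current_cost), None)
        if better is None:
            return current  # local minimum: no neighbor improves the cost
        current = better


# Toy stand-in for tree search: states are integers, cost is a parabola.
result = local_search(10, cost=lambda x: (x - 3) ** 2,
                      neighbors=lambda x: [x - 1, x + 1])
print(result)  # 3
```

For MP, `cost` would be the parsimony score of a tree and `neighbors` would enumerate the trees reachable by one local move.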


Starting tree for MP

• Random phylogeny: O(n) time

• Greedy-MP
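
The slides do not say how the random phylogeny is generated. One way to get O(n) time, assumed here, is to attach each new leaf to a randomly chosen existing edge; with edges kept in a list, every attachment is constant time. `random_tree` and the internal node names `i0`, `i1`, ... are illustrative.

```python
import random


def random_tree(leaves, rng=random):
    """Random unrooted binary tree built by random edge attachment.

    Returns a list of edges (u, v). Internal nodes are named "i0", "i1", ...
    (assumed not to clash with any leaf name). Runs in O(n) for n leaves.
    """
    if len(leaves) < 3:
        raise ValueError("need at least three leaves")
    # Start from the unique three-leaf tree: a star around one internal node.
    edges = [("i0", leaves[0]), ("i0", leaves[1]), ("i0", leaves[2])]
    for j, leaf in enumerate(leaves[3:], start=1):
        idx = rng.randrange(len(edges))
        u, v = edges[idx]
        mid = f"i{j}"
        edges[idx] = (u, mid)      # split the chosen edge at a new internal node
        edges.append((mid, v))
        edges.append((mid, leaf))  # hang the new leaf off that node
    return edges


tree = random_tree(["ACT", "ACA", "GTT", "GTA"], random.Random(0))
print(len(tree))  # 5 edges, i.e. 2n - 3 for n = 4
```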


Greedy-MP

Greedy-MP takes O(n^2k^2) time


Local moves for MP: NNI

• For each internal edge we get two different topologies

• Neighborhood size is 2n-6 (two neighbors for each of the n-3 internal edges)
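
To make the NNI neighborhood concrete, here is a minimal Python sketch that enumerates all NNI neighbors of an unrooted binary tree. The adjacency-dictionary representation, the function name `nni_neighbors`, and the node names are assumptions for illustration; all node names must be strings so they can be sorted.

```python
def nni_neighbors(adj):
    """All trees one NNI move away from an unrooted binary tree.

    adj: dict mapping each node name to the set of its neighbors
    (leaves have degree 1, internal nodes degree 3).
    Returns a list of new adjacency dicts, two per internal edge.
    """
    result = []
    seen = set()
    for u in adj:
        if len(adj[u]) != 3:
            continue
        for v in adj[u]:
            # Only internal edges, and each undirected edge only once.
            if len(adj[v]) != 3 or (v, u) in seen:
                continue
            seen.add((u, v))
            _, b = sorted(adj[u] - {v})   # subtrees hanging off u
            c, d = sorted(adj[v] - {u})   # subtrees hanging off v
            for x in (c, d):
                # Swap subtree b (on u's side) with subtree x (on v's side).
                new = {n: set(nb) for n, nb in adj.items()}
                new[u].remove(b); new[b].remove(u)
                new[v].remove(x); new[x].remove(v)
                new[u].add(x); new[x].add(u)
                new[v].add(b); new[b].add(v)
                result.append(new)
    return result


# Four-leaf tree ((ACT, ACA), (GTT, GTA)) with internal nodes "u" and "v".
four = {"u": {"ACT", "ACA", "v"}, "v": {"GTT", "GTA", "u"},
        "ACT": {"u"}, "ACA": {"u"}, "GTT": {"v"}, "GTA": {"v"}}
print(len(nni_neighbors(four)))  # 2, matching 2n - 6 for n = 4
```

For four leaves the two neighbors are exactly the other two topologies on the same leaf set.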


Local moves for MP: SPR

• Neighborhood size is quadratic in the number of taxa

• Computing the minimum number of SPR moves between two rooted phylogenies is NP-hard


Local moves for MP: TBR

• Neighborhood size is cubic in the number of taxa

• Computing the minimum number of TBR moves between two unrooted phylogenies is NP-hard


Local optima are a problem

[Figure: plot labeled TNT; x-axis from 1 to 336, y-axis from 0 to 0.08]


Iterated local search: escape local optima by perturbation

[Figure: local search reaches a local optimum]


[Figure: a perturbation moves the search from the local optimum to a new point, the output of perturbation]


[Figure: local search restarts from the output of perturbation and reaches a new local optimum]


ILS for MP

• Ratchet

• Iterative-DCM3

• TNT
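
The general shape of iterated local search (local search, perturbation, local search again) can be sketched as follows. This is a generic sketch under stated assumptions, not how Ratchet, Iterative-DCM3, or TNT work internally: the function names are illustrative, the "accept only if better" rule is one possible choice, and the toy landscape uses list indices in place of trees with a fixed jump standing in for a random perturbation.

```python
def local_search(start, cost, neighbors):
    """Hill-climb from `start` until no neighbor has strictly lower cost."""
    current = start
    while True:
        current_cost = cost(current)
        better = next((s for s in neighbors(current) if cost(s) < current_cost), None)
        if better is None:
            return current
        current = better


def iterated_local_search(start, cost, neighbors, perturb, rounds):
    """Alternate perturbation and local search, keeping the best local optimum."""
    best = local_search(start, cost, neighbors)
    for _ in range(rounds):
        candidate = local_search(perturb(best), cost, neighbors)
        if cost(candidate) < cost(best):  # assumed acceptance rule: improve only
            best = candidate
    return best


# Toy landscape: index 1 is a local optimum (cost 3), index 5 the global one (cost 1).
landscape = [5, 3, 4, 6, 2, 1, 2, 7]
cost = lambda i: landscape[i]
neighbors = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
perturb = lambda i: min(i + 3, len(landscape) - 1)  # deterministic stand-in

print(local_search(0, cost, neighbors))                       # 1
print(iterated_local_search(0, cost, neighbors, perturb, 2))  # 5
```

Plain local search from index 0 stops at the local optimum, while the perturbation step lets the iterated version reach the global one.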
