MP, ML, AML Reconstruction of Phylogenetic Trees: A Status Report

Benny Chor
School of Computer Science, Tel-Aviv University

Phylogenetic Reconstruction

• Input: A set of n aligned sequences (genes, proteins) from n species.
• Goal: Reconstruct the tree which best explains the evolutionary history of this gene/protein.
• Tree reconstruction is still a challenge today.
• Many concrete questions are still unresolved (e.g. the mammalian evolutionary tree).
• Most realistic formulations of the problem, which take errors into account, give rise to hard computational problems.

Popular Methods

• Distance based methods:
  • UPGMA.
  • Neighbor Joining.
  • Buneman trees.
• Character based methods:
  • Maximum Parsimony.
  • Maximum Likelihood.
• Additional methods:
  • Quartets Based.
  • Disc Covering.

Talk Outline

• Maximum likelihood (ML).
• The likelihood surface.
• Existence of multiple maxima.
• Computational complexity: Maximum likelihood vs. maximum parsimony (MP).
• Ancestral maximum likelihood (AML) and its computational complexity.

Maximum Likelihood

• Input: A set of n observed sequences and an underlying substitution model.
• Desired output: The weighted tree T that maximizes the likelihood of the data.
• Likelihood of the data: The conditional probability of producing the data, given the model parameters.
• Likelihood is a common optimization criterion in numerous settings, including phylogenetics (Felsenstein 1981).

2–State Substitution Model

species  observed data
1        XXXXXXXYYY XXY XY YX XY X
2        XXXXXXXYYY YYX YX YX YX X
3        XXXXXXXYYY YYX XY XY XY X
4        XXXXXXXYYY YYX XY XY YX Y

• Just two character states, X and Y.
• Transitions between states are symmetric.
• Equal rates across sites.
• Every column induces a pattern.
• Remark: A simple model, yet very powerful.
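To make "every column induces a pattern" concrete, here is a minimal Python sketch that reads the four rows of the table column by column and counts how often each pattern occurs. The function name `column_patterns` is illustrative only and does not come from the talk.

```python
from collections import Counter

def column_patterns(sequences):
    """Count the patterns induced by the columns of an alignment.

    Each column is read top to bottom (species 1..n) and joined into
    one string, e.g. "XYYY" for a column where only species 1 has X.
    """
    return Counter("".join(column) for column in zip(*sequences))

# The four observed sequences from the table, with the spaces removed.
rows = [
    "XXXXXXXYYYXXYXYYXXYX",
    "XXXXXXXYYYYYXYXYXYXX",
    "XXXXXXXYYYYYXXYXYXYX",
    "XXXXXXXYYYYYXXYXYYXY",
]
counts = column_patterns(rows)
for pattern, k in sorted(counts.items()):
    print(pattern, k)
```

Under the equal-rates assumption the sites are treated alike, so the order of the columns plays no role and these pattern counts summarize the data for the likelihood computation.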

Neyman 2–State Substitution Model

[Figure: unrooted tree on leaves 1, 2, 3, 4 with edges labeled e1, e2, e3, e12, e123.]

For each edge e of a tree T, the edge weight p_e represents the probability of having different states at the two ends of e.

A Very Simple Example

Four species (n = 4), just one site (c = 1).

species  observed data
1        X
2        X
3        Y
4        Y

Analyze the natural tree (12)(34).

[Figure: tree (12)(34) with edges e1, e2, e3, e12, e123; leaves (1) X, (2) X, (3) Y, (4) Y; the two internal nodes are unknown (?).]

Computing the Likelihood

Each unknown state (?) can assume one of two possibilities, X or Y. For example, the assignment

[Figure: tree (12)(34) with edge weights p1, p2, p3, p12, p123; leaves (1) X, (2) X, (3) Y, (4) Y; internal nodes assigned X (next to leaves 1, 2) and Y (next to leaves 3, 4).]

contributes (1 − p1)·(1 − p2)·p12·(1 − p3)·(1 − p123).

The likelihood is the sum of this + three similar expressions...
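The sum over the four internal assignments can be written out directly. The following Python sketch follows the slide's expression as given (one factor per edge, no extra factor for the state at an internal node); the function name and the dictionary keys for the edge weights are illustrative choices, not notation from the talk.

```python
from itertools import product

def site_likelihood(leaves, p):
    """Likelihood of one site on the tree (12)(34).

    leaves: states of species 1..4, e.g. ("X", "X", "Y", "Y").
    p: edge weights keyed "e1", "e2", "e3", "e12", "e123".
    """
    def factor(a, b, pe):
        # An edge contributes pe if its two ends differ, 1 - pe otherwise.
        return pe if a != b else 1.0 - pe

    s1, s2, s3, s4 = leaves
    total = 0.0
    # u is the internal node next to leaves 1, 2; v is next to leaves 3, 4.
    for u, v in product("XY", repeat=2):
        total += (factor(s1, u, p["e1"]) * factor(s2, u, p["e2"])
                  * factor(u, v, p["e12"])
                  * factor(s3, v, p["e3"]) * factor(s4, v, p["e123"]))
    return total

weights = {"e1": 0.1, "e2": 0.1, "e3": 0.1, "e12": 0.1, "e123": 0.1}
print(site_likelihood(("X", "X", "Y", "Y"), weights))
```

With all five edge weights set to 0.1, the assignment shown on the slide (X, Y) contributes 0.9·0.9·0.1·0.9·0.9 = 0.06561, and the three other assignments add 0.00729 + 0.00729 + 0.00001, for a total of about 0.0802.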

Computing the Likelihood (2)

$$L(\text{data} \mid T, \text{edge parameters}) = \sum_{\text{internal assignments}} \; \prod_{\text{edges } e} p_e^{d_e} (1 - p_e)^{c - d_e}.$$

Each d_e is the number of unequal sites along edge e. It depends on the internal assignment a, and the input pattern t at the two ends of the edge.

A well defined objective function to maximize.

Termed average likelihood by Penny and Steel.

Widely used in practice.

Three Likelihood Versions

• Big Likelihood: Given the sequence data, find a tree and edge weights that maximize L(data | tree & edge weights).
• Small Likelihood: Given observed data & a tree, but not the edge weights, find the edge weights that maximize the likelihood.
• Tiny Likelihood: Given observed data & a tree & edge weights, find the likelihood.
• Tiny likelihood can be efficiently computed using dynamic programming (Felsenstein, 1981).
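A minimal sketch of the dynamic programming idea for tiny likelihood under the 2-state symmetric model is given below. It is not Felsenstein's code: the tree encoding (a leaf is an integer index into the site pattern, an internal node is a list of (child, edge weight) pairs) and the function names are assumptions made for this illustration. As in the earlier expression, no extra factor is included for the state at the root.

```python
def conditional(node, site):
    """Return [L_X, L_Y]: the probability of the leaf data below `node`,
    given that `node` itself is in state X or in state Y."""
    if isinstance(node, int):               # leaf: its state is observed
        return [1.0 if site[node] == s else 0.0 for s in "XY"]
    result = [1.0, 1.0]
    for child, p in node:                   # internal: (child, edge weight) pairs
        below = conditional(child, site)
        for i in range(2):
            # Keep state i along the edge with prob. 1 - p, switch with prob. p.
            result[i] *= (1.0 - p) * below[i] + p * below[1 - i]
    return result

def tiny_likelihood(root, sites):
    """Product over sites of the likelihood summed over the root state."""
    total = 1.0
    for site in sites:
        total *= sum(conditional(root, site))
    return total

# Tree (12)(34), rooted at the internal node next to leaves 1 and 2
# (leaf indices 0..3), all edge weights 0.1.
right = [(2, 0.1), (3, 0.1)]
root = [(0, 0.1), (1, 0.1), (right, 0.1)]
print(tiny_likelihood(root, ["XXYY"]))
```

Each node is visited once per site, so the work grows linearly with the number of nodes, instead of exponentially with the number of internal nodes as in the explicit sum over internal assignments.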

Hill Climbing / Small Likelihood

• Typical approach to small likelihood, used in practice:
  • Start at some initial point with edge weights p.
  • Apply hill climbing to the likelihood function, till reaching a maximum.
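The two steps above can be sketched as a simple coordinate-wise hill climber. This is only one possible variant, written for illustration; the talk does not specify which hill climbing scheme practical programs use. The bounds [0, 0.5] on the edge weights, the step schedule, and the toy objective at the end are all assumptions of this sketch.

```python
def hill_climb(f, start, lo=0.0, hi=0.5, step=0.1, tol=1e-9):
    """Coordinate-wise hill climbing on f over the box [lo, hi]^k.

    Tries +step and -step on each coordinate, keeps any move that
    increases f, and halves the step when no move helps.
    """
    point = list(start)
    best = f(point)
    while step > tol:
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                candidate = list(point)
                candidate[i] = min(hi, max(lo, point[i] + delta))
                value = f(candidate)
                if value > best:
                    point, best = candidate, value
                    improved = True
        if not improved:
            step /= 2.0
    return point, best

# Toy objective (hypothetical): a single edge joining two sequences of
# c = 10 sites that differ at d = 3 sites; the maximum is at p = d / c.
def single_edge(point):
    p = point[0]
    return p ** 3 * (1.0 - p) ** 7

point, value = hill_climb(single_edge, [0.05])
print(point, value)
```

A procedure of this kind stops at whatever maximum it reaches from its starting point, so it can only be trusted to find the best edge weights when the likelihood surface has a single maximum.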

The Likelihood Surface

• For hill climbing to be guaranteed to find the maximum, there must be a single local and global maximum in the parameter space.
• Fukami and Tateno (89), Tillier (94): For any tree, the ML point will be unique.
• Steel (94): The proofs are erroneous; there is a simple but pathological counterexample (multiple maxima on the wrong tree).
• (94–present): Hill climbing techniques are still used. Steel's counterexample is considered too "biologically unrealistic" to warrant concern.

The Likelihood Surface (cont.)

• Rogers and Swofford (99): Simulation study.
  • Data is simulated on a tree.
  • Multiple optima are rare...
  • ...especially on the correct tree.
• Goal here: Investigate the problem analytically (joint work with Hendy, Holland, Penny).

Maximizing Likelihood on Trees

Tools used:

• Hadamard conjugation (Hendy and Penny 93).
• Splits and sequence spectra (change of variables).
• Constrained optimization.
• Systems of polynomial equations.
• Analytical solution: very hard in general, even for four taxa.
• Employing computer algebra and algebraic geometry tools.

Example: Conservative Data, Two Very Different ML Trees

species  observed data
1        XXXXXXXYYY XXY XY YX XY X
2        XXXXXXXYYY YYX YX YX YX X
3        XXXXXXXYYY YYX XY XY XY X
4        XXXXXXXYYY YYX XY XY YX Y

Small Likelihood & Multiple Maxima

• Small Likelihood (reminder): Given observed data & a tree, but not the edge weights, find the edge weights that maximize the likelihood.
• Multiple ML points for the general case imply small likelihood cannot be solved by hill climbing.
• Not clear if small likelihood has efficient (worst case) solutions.

Maximum Parsimony (MP)

• Big Parsimony: Given the sequence data, find a tree and assignment of sequences to internal nodes that minimizes the number of changes across all edges.
• Small Parsimony: Given the sequence data and a tree, find internal assignment(s) that minimize the total number of changes.
• MP is considered by practitioners easier than ML. Indeed, small parsimony has efficient algorithms (Fitch 1971, Sankoff and Cedergren 1983).
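As an illustration of why small parsimony is efficient, here is a short Python sketch of the bottom-up pass of Fitch's algorithm, which computes the minimum number of changes on a given tree. The tree encoding (a leaf is an integer index into the site pattern, an internal node is a pair of subtrees) and the function names are assumptions of this sketch; the tree is rooted arbitrarily, which does not affect the score.

```python
def fitch(node, site):
    """Return (candidate state set, minimum number of changes) for the
    subtree rooted at `node`, for a single site."""
    if isinstance(node, int):               # leaf: observed state, no changes
        return {site[node]}, 0
    left_set, left_cost = fitch(node[0], site)
    right_set, right_cost = fitch(node[1], site)
    common = left_set & right_set
    if common:                              # children can agree: no new change
        return common, left_cost + right_cost
    # Children disagree: one change is needed on an edge below this node.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, sites):
    """Total number of changes over all sites (sites are independent)."""
    return sum(fitch(tree, site)[1] for site in sites)

tree = ((0, 1), (2, 3))                     # the tree (12)(34), leaf indices 0..3
print(parsimony_score(tree, ["XXYY", "XYXY", "XXXX"]))
```

On the tree (12)(34), the pattern XXYY needs one change, XYXY needs two, and XXXX needs none, so the three sites together score 3. Each node is processed once per site.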

Complexity of Reconstruction

Both MP and ML have well-defined objective functions
=⇒ Reconstruction is a computational problem.

The number of trees over n leaves is exponential in n
=⇒ Cannot exhaustively search all trees.
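To make the growth concrete: the number of unrooted binary tree topologies on n labelled leaves is (2n−5)!! = 3 · 5 · … · (2n−5), a standard counting fact. The sketch below only evaluates that formula; the function name is hypothetical.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Number of unrooted binary topologies on n >= 3 labelled leaves."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for odd in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= odd
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_binary_trees(n))
```

Already for 10 leaves there are 2,027,025 topologies, which is why exhaustive search is ruled out for realistic data sets.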

Complexity: Small MP vs. ML

• Small parsimony is in P.

• Small likelihood: complexity unknown.

Complexity: Big MP vs. ML

Is ML Computationally Intractable?

Big MP has been known for almost 20 years to be computationally intractable [Day et al., 1986, reduction from vertex cover].

No such result has been found for Big ML to date (2004).

Tuffley and Steel (1997): Relations between likelihood and parsimony.

Addario-Berry et al. (2003): Big Ancestral ML is hard.

Still, no cigar (and not even close).

Ancestral ML (AML)

• A tree reconstruction method that is “in between” ML and MP.

• The goal is to simultaneously find edge weights and an assignment of sequences to internal nodes so that the likelihood of the data, given the tree parameters, is maximized.

• AML is widely used in evolutionary studies.

• Also termed joint reconstruction of ancestral sequences.

• AML computes the likelihood contribution resulting from the best assignment to internal nodes, while “regular ML” sums over all assignments.

Two AML Versions

• Big AML: Given the sequence data, find a tree, an assignment to internal nodes, and edge weights that maximize the likelihood of the data.

• Small AML: Given the observed data, a tree, and edge weights, but not the internal assignment, find the assignment that maximizes the likelihood.

• PPSG 2000: A polynomial-time dynamic programming algorithm for small AML.

• Remark: The version where the tree is given, but neither the edge weights nor the assignment, is still open.

• ACHLPW 2003: Big AML is NP-hard.
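For intuition on why small AML is tractable, here is a generic max-product dynamic program for one site. It is a sketch in the spirit of the result cited above, not a transcription of the PPSG algorithm, and it assumes a symmetric two-state model with a uniform root prior. The tree encoding and function names are hypothetical.

```python
from typing import Tuple

# Leaf: an observed state (0 or 1).
# Internal node: a tuple of (child, p) pairs, where p is the change
# probability on the edge leading to that child.


def best_scores(node) -> Tuple[float, float]:
    """Return (best0, best1): the largest probability of the subtree below
    `node`, over all internal assignments, when `node` is in state 0 or 1."""
    if isinstance(node, int):
        return (1.0, 0.0) if node == 0 else (0.0, 1.0)
    best = [1.0, 1.0]
    for child, p in node:
        child_best = best_scores(child)  # computed once per child
        for state in (0, 1):
            best[state] *= max(
                (1.0 - p if s == state else p) * child_best[s]
                for s in (0, 1)
            )
    return best[0], best[1]


def small_aml_site(root) -> float:
    """Best joint probability of one site over all internal assignments."""
    return 0.5 * max(best_scores(root))


# Leaves 0, 0 hang off the root; leaves 1, 1 hang off a second internal node.
v = ((1, 0.1), (1, 0.1))
root = ((0, 0.1), (0, 0.1), (v, 0.1))
print(small_aml_site(root))
```

Each node is processed once, so the work per site is linear in the size of the tree; recording the maximizing child state at each step would recover the assignment itself.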

Useful AML Observation

• Given sequence data, a tree, and an assignment to internal nodes:

• The edge weights that maximize the likelihood of the data equal $d_e/k$,

• where $d_e$ is the number of changes across edge e, and k is the common sequence length.
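A short derivation of this observation, assuming (the slide does not restate the model) that each of the k sites changes independently across edge e with probability $p_e$:

```latex
\begin{align*}
\Pr(\text{changes on } e \mid p_e) &= p_e^{\,d_e}\,(1-p_e)^{\,k-d_e},\\
\frac{\partial}{\partial p_e}\log \Pr
  &= \frac{d_e}{p_e}-\frac{k-d_e}{1-p_e}=0
  \;\Longrightarrow\; p_e^{*}=\frac{d_e}{k},\\
-\log \Pr\big|_{p_e^{*}}
  &= -d_e\log\frac{d_e}{k}-(k-d_e)\log\Bigl(1-\frac{d_e}{k}\Bigr)
   = k\,H\!\Bigl(\frac{d_e}{k}\Bigr).
\end{align*}
```

Here $H(p) = -p\log p - (1-p)\log(1-p)$ is the binary entropy function, so maximizing the likelihood over edge weights amounts to minimizing a sum of per-edge entropies.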

AML, Reformulated

The previous observation implies:

• Input: A set S of n binary sequences, each of length k.

• Goal: Find a tree T with n leaves, an assignment $p : E(T) \to [0, 1]$ of edge probabilities, and a labelling $\lambda : V(T) \to \{0, 1\}^k$ of the vertices such that
  1. the n labels of the leaves are exactly the sequences from S;
  2. the sum of all “edge entropies” $\sum_{e \in E(T)} H(d_e/k)$ is minimized.

• $H(p) = -p \log(p) - (1 - p) \log(1 - p)$ is the binary entropy function.

AML vs. MP

Optimization criteria

• Input: A set S of n binary sequences, each of length k.

• AML: Minimize the sum of all “edge entropies” $\sum_{e \in E(T)} H(d_e/k)$.

• MP: Minimize the sum of all “edge differences” $\sum_{e \in E(T)} d_e/k$.

• Can think of the two problems as attempting to minimize different edge weights (functions of $d_e$).
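The two criteria can be evaluated side by side on a fully labelled tree. The sketch below assumes the tree is given as a list of edges, each a pair of equal-length binary strings, and uses base-2 logarithms (the base is an assumption). The function names are hypothetical.

```python
import math


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mp_cost(edges, k: int) -> float:
    """MP criterion: sum of d_e / k over all edges."""
    return sum(hamming(x, y) / k for x, y in edges)


def aml_cost(edges, k: int) -> float:
    """AML criterion: sum of H(d_e / k) over all edges."""
    return sum(binary_entropy(hamming(x, y) / k) for x, y in edges)


# Two labelled paths with the same total number of changes (2 out of k = 4).
concentrated = [("0000", "0011"), ("0011", "0011")]  # changes 2, 0
spread = [("0000", "0001"), ("0001", "0011")]        # changes 1, 1
print(mp_cost(concentrated, 4), mp_cost(spread, 4))
print(aml_cost(concentrated, 4), aml_cost(spread, 4))
```

MP scores the two labellings equally, while AML prefers the one that concentrates the changes on a single edge, because H is concave. This is one way to see that the entropy weights behave less simply than plain edge differences.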

NP hardness of AML: Ideas

• MP was shown NP-hard by Day, Johnson, and Sankoff using a reduction from vertex cover (VC).

• The analogy between the AML and MP optimization criteria suggests using a similar approach.

• The reduction from VC is indeed identical.

• The proof is substantially more involved, as the entropy $H(d_e/k)$ is not as “well behaved” as the plain edge differences $d_e/k$.

Open Problems as of 2004

• Hardness proof for big AML as a stepping stone for big ML?

• Is small ML in poly-time?

Our Major Question

Is ML Computationally Intractable?

MP has been known for almost 20 years to be computationally intractable [Day et al., 1986, translation from vertex cover].

No such result has been found for ML to date.

Tuffley and Steel (1997): Relations between likelihood and parsimony.

Addario-Berry et al. (2003): Ancestral ML is hard.

Still, no cigar (and not even close).

Is ML Computationally Intractable?

Still, no cigar (and not even close).

Particularly frustrating in light of the intuition among practitioners that ML is harder than MP.

Maybe some slick and efficient ML algorithm lurks out there, waiting to be discovered?

CT2005: ML is computationally hard (NP-complete)
=⇒ No such algorithm exists (unless P=NP).

Intractability Proof: The Big Picture

Efficiently translate vertex cover (VC) to ML.

[Figure: a graph =⇒ a tree whose leaves are labelled with binary sequences such as 110..0, 0110..0, 00110..0, 000110..0, 0000110..0, and 00000110..0.]

“Translation” means

Small cover =⇒ Large likelihood.

Large cover =⇒ Small likelihood.

Vertex Cover in Graphs

Given a graph (V, E), find a small set of vertices C such that for each edge in the graph, C contains at least one endpoint.

(figure from www.cc.ioc.ee/jus/gtglossary/assets/vertex_cover.gif)
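The definition can be checked mechanically. The sketch below tests whether a vertex set is a cover and answers the decision question by brute force over subsets, which takes exponential time and is meant only as an illustration. The function names and the small example graph are hypothetical.

```python
from itertools import combinations


def is_vertex_cover(edges, cover) -> bool:
    """True if every edge has at least one endpoint in `cover`."""
    cover = set(cover)
    return all(u in cover or v in cover for u, v in edges)


def has_cover_of_size(vertices, edges, c: int) -> bool:
    """Decision version: is there a cover with at most c vertices?"""
    vertices = list(vertices)
    return any(
        is_vertex_cover(edges, subset)
        for size in range(min(c, len(vertices)) + 1)
        for subset in combinations(vertices, size)
    )


edges = [(1, 3), (1, 4), (2, 3), (2, 5), (3, 4), (3, 5)]
print(is_vertex_cover(edges, {1, 2, 3}))         # True
print(has_cover_of_size(range(1, 6), edges, 2))  # False
print(has_cover_of_size(range(1, 6), edges, 3))  # True
```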

Well Known: Vertex Cover is Intractable

The decision version of this problem (does G have a cover of size ≤ c?) is computationally intractable.

(figure from http://wwwbrauer.in.tum.de/gruppen/theorie/hard/vc1.png)

Maximum Likelihood: Decision Problem

Input: A set of equal-length binary sequences $S_1, S_2, \ldots, S_m$, and a real number D.

Question: Is there a tree T and edge lengths such that $\log_2 L(S_1, S_2, \ldots, S_m \mid T, \text{edge parameters}) > D$?

Notice: Yes/No question.

Translating VC to ML

The following graph, with 5 vertices and 6 edges,

[Figure: a graph on the vertices 1, 2, 3, 4, 5.]

translates to a set of 7 binary sequences, each of length 5:

00000 10100 10010 01100 01001 00110 00101

Translating VC to ML

If G has n vertices and m edges, we construct m + 1 binary sequences, each of length n.

One sequence is all zeroes.

For every edge (i, j) ∈ E with i < j, we have the sequence

$$\underbrace{\overbrace{0\cdots0}^{i-1}\;1\;\overbrace{0\cdots0}^{j-i-1}\;1\;\overbrace{0\cdots0}^{n-j}}_{n}$$
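The construction is easy to state in code. This is a minimal sketch with a hypothetical function name; the edge list in the example is read off the seven sequences of the 5-vertex example given earlier in the talk.

```python
def graph_to_sequences(n: int, edges):
    """Encode a graph on vertices 1..n as m + 1 binary strings of length n:
    the all-zero string, plus one string per edge with 1s at both endpoints."""
    seqs = ["0" * n]
    for i, j in edges:
        bits = ["0"] * n
        bits[i - 1] = "1"
        bits[j - 1] = "1"
        seqs.append("".join(bits))
    return seqs


edges = [(1, 3), (1, 4), (2, 3), (2, 5), (3, 4), (3, 5)]
print(graph_to_sequences(5, edges))
# ['00000', '10100', '10010', '01100', '01001', '00110', '00101']
```

The translation runs in time O(nm), so it is efficient in the sense a reduction requires.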

Relation between likelihood and parsimony (Tuffley and Steel)

$$L(S \mid T) \equiv \Pr(S \mid p^*, T) \ge 2^{-\log(k_c)\cdot \mathrm{pars}(S,T) - C_d}$$

$$L(S \mid T) \equiv \Pr(S \mid p^*, T) \le 2^{-\log(k_c)\cdot \mathrm{pars}(S,T) - C_u}$$

$C_u$ and $C_d$ are sub-quadratic functions of $|V(T)|$, $\mathrm{pars}(S, T)$, and $k - k_c$.

Conclusion: if $k - k_c = O(|V(T)|)$ (as in our case), then

$$-\log L(S \mid T) = O(\mathrm{pars}(S,T)\cdot\log(n)) + O(|V(T)|^2) + O(\mathrm{pars}(S,T)\log(\mathrm{pars}(S,T)))$$

Canonical Trees: Definition

1. The tree has an internal node (called the “root”) with a 0-length edge to the all-zero leaf.

2. All leaves are at distance one or two from the root.

3. Subtrees of distance-two leaves contain one, two, or three leaves. All sequences in a subtree with two or three leaves share a “1” in the same position.

[Figure: two example trees rooted at “Root”, with leaves labelled by binary sequences such as 11000000..000 and 01100000..000; one tree is marked “Yes” and the other “No”.]

Canonical Trees and Vertex Covers

[Day, 1986]: A canonical tree with degree d at the root exists ⇐⇒ G has a cover of size d.

Canonical Trees and Likelihood

Now establish the relationship between the degree of the root of canonical trees and likelihood.

Given a canonical tree T with m + 1 leaves labelled by sequences from S, let d denote the degree of the root. Then for the optimal edge lengths $p^*$,

$$\log(\Pr(S \mid T, p^*)) = -(m + d)\cdot\log n + \theta(n).$$

So as $n \to \infty$,

$$\frac{-\log(L)}{(m + d)\log(n)} \to 1.$$

Do We Have the Desired Proof?

$$\frac{-\log(L)}{(m + d)\log(n)} \xrightarrow[n\to\infty]{} 1.$$

Seems to imply

Small cover =⇒ Large likelihood.

Large cover =⇒ Small likelihood.

Take it easy. There is a problem here. What we actually showed is a reduction from VC to ML of canonical trees.

But the ML tree need not be canonical!

All Is Lost?

Not necessarily.

Starting from any ML tree (arbitrary shape).

Carry out a sequence of gentle modifications.

Each modification may decrease log likelihood.

But only by a little (B log n).

The number of modifications is small enough.

So accumulated loss in log likelihood is small.
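
The accounting behind this argument can be written out as a short bound. This is a sketch under assumptions: k (the number of modifications) and T_0, ..., T_k (the intermediate trees) are labels introduced here, "B log n" is read as an upper bound on the loss per modification, and the final tree T_k is taken to be canonical.

```latex
% T_0 is an arbitrary ML tree, T_k the tree reached after k modifications.
\log L(T_0) - \log L(T_k)
  \;=\; \sum_{i=1}^{k} \bigl( \log L(T_{i-1}) - \log L(T_i) \bigr)
  \;\le\; k \, B \log n .
```

Under this reading, the accumulated loss is small when k B log n is of lower order than (m + d) log n, the scale of the log likelihood in the earlier limit.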

Hardness Conclusion

Maximum likelihood of phylogenetic trees is computationally intractable.

No magic bullet!

Open Problems and Further Research

Four-state characters & beyond. ✓

Hardness of ML approximation. ✓

ML hardness under molecular clock. ✓

Efficient approximation algorithms.

Efficient algorithms for small likelihood: Given observed data & a tree, but not the edge weights, find the edge weights that maximize the likelihood.

Regions of input parameters where ML can be efficiently solved.
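
To make the small likelihood problem listed above concrete, here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration, not taken from the slides: a 3-leaf star tree, two-state characters under a symmetric model, edge weights expressed as per-edge change probabilities in (0, 0.5), and a crude grid search standing in for a real optimizer. The function names `site_likelihood`, `log_likelihood` and `fit_edges` are hypothetical.

```python
import math
from itertools import product


def site_likelihood(p, leaves):
    """Probability of one site pattern on a 3-leaf star tree.

    p      -- per-edge change probabilities (p1, p2, p3), each in (0, 0.5)
    leaves -- observed states (0 or 1) at the three leaves
    """
    total = 0.0
    for center in (0, 1):            # sum over the unobserved central state
        term = 0.5                   # uniform prior on the central state
        for state, pe in zip(leaves, p):
            term *= pe if state != center else 1.0 - pe
        total += term
    return total


def log_likelihood(p, sites):
    """Log likelihood of all sites, assuming sites evolve independently."""
    return sum(math.log(site_likelihood(p, s)) for s in sites)


def fit_edges(sites, steps=20):
    """Brute-force grid search for the edge probabilities maximizing the likelihood."""
    grid = [0.5 * k / steps for k in range(1, steps)]   # points strictly inside (0, 0.5)
    best = max(product(grid, repeat=3), key=lambda p: log_likelihood(p, sites))
    return best, log_likelihood(best, sites)


if __name__ == "__main__":
    # Hypothetical data: the third species differs from the other two at two of six sites.
    sites = [(0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
    print(fit_edges(sites))
```

The grid search costs time exponential in the number of edges, so it only illustrates the objective being maximized. It is not a candidate for the efficient algorithm the open problem asks for.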
