Perfect Phylogeny Tutorial #10 © Ilan Gronau. Original slides by Shlomo Moran.

Perfect Phylogeny

The underlying model:
• A character vector is given for every species in S.
• Each character represents some observable trait.
• Each character takes values from a finite set.
• Basic underlying assumption: characters are homoplasy-free.

Homoplasy-Free Characters

• No reversals: a character never returns to a state it has left.
• No convergence: the same state never arises independently in separate lineages.

Homoplasy-free characters induce a convex coloring of the phylogenetic tree.

The Perfect Phylogeny Problem:
Given character vectors for S, find (if one exists):
- a phylogenetic tree T over S (S is the leaf set of T);
- convex character assignments to all vertices of T.

This problem is NP-hard in general.

Directed Binary Perfect Phylogeny

Directed binary characters:
• 0: property does not exist
• 1: property exists
• Initially (at the root) no property exists.

Input: a binary coloring (C1,…,Cm) of a set S, given as an n×m binary matrix M.

Problem: find a phylogenetic tree T over S (if one exists) such that:
1. For j=1,…,m, the partial coloring induced by Cj is convex in T.
2. The root has state 0 in all characters.

We will present a polynomial-time solution.
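
Example

Input: an n×m binary matrix (n species as rows, m characters as columns).

     C1 C2 C3 C4 C5
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  0  1
  D   0  0  1  1  0
  E   0  1  0  0  0

Possible output:
[Figure: a tree with a zero root (00000), internal vertices (01000), (11000) and (00100), and leaves A (11000), B (00100), C (11001), D (00110), E (01000). The edges on which C2 and C3 are turned on are marked.]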

An Important Observation

A tree is a directed perfect phylogeny for a given 0/1 matrix iff we can map each character to an edge/vertex on which this character was "turned on".

Example:

     C1 C2 C3 C4 C5
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  0  1
  D   0  0  1  1  0
  E   0  1  0  0  0

[Figure: the tree over leaves A, B, C, D, E with each of C1,…,C5 labeling one edge; the edge labeled C2 is marked as the origin of C2.]

Laminar Matrices

Definitions:
• Oj: the set of objects that have character Cj (Oj = {i : Mij = 1}).
• A collection of sets {S1,…,Sk} is laminar if for all i, j, either Si and Sj are disjoint, or one includes the other.

Theorem: A binary matrix M has a perfect phylogenetic tree iff the collection {O1,…,Om} is laminar.

Laminar:

     C1 C2 C3 C4 C5
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  0  1
  D   0  0  1  1  0
  E   0  1  0  0  0

Not laminar (O3 = {B,D} and O5 = {B,C,E} overlap, but neither includes the other):

     C1 C2 C3 C4 C5
  A   1  1  0  0  0
  B   0  0  1  0  1
  C   1  1  0  0  1
  D   0  0  1  1  0
  E   0  1  0  0  1
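
The laminarity definition can be checked directly, pair by pair. The following is a minimal sketch, not the efficient method presented later in the slides; the function names `character_sets` and `is_laminar` are illustrative assumptions, and rows are indexed 0..4 for species A..E.

```python
from itertools import combinations

def character_sets(matrix):
    """Return the list of sets O_j = {i : matrix[i][j] == 1}, one per column."""
    n_cols = len(matrix[0]) if matrix else 0
    return [{i for i, row in enumerate(matrix) if row[j] == 1}
            for j in range(n_cols)]

def is_laminar(sets):
    """True if every pair of sets is disjoint or nested (naive pairwise check)."""
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(sets, 2))

# The two example matrices from the slide (rows A..E, columns C1..C5).
LAMINAR = [[1, 1, 0, 0, 0],
           [0, 0, 1, 0, 0],
           [1, 1, 0, 0, 1],
           [0, 0, 1, 1, 0],
           [0, 1, 0, 0, 0]]
NOT_LAMINAR = [[1, 1, 0, 0, 0],
               [0, 0, 1, 0, 1],
               [1, 1, 0, 0, 1],
               [0, 0, 1, 1, 0],
               [0, 1, 0, 0, 1]]

print(is_laminar(character_sets(LAMINAR)))      # True
print(is_laminar(character_sets(NOT_LAMINAR)))  # False
```

This check compares all pairs of columns, so it is slower than the O(mn) procedure described under Efficient Implementation; it is meant only to make the definition concrete.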

Proof of Theorem

Assume M has a perfect phylogeny. Consider the edges labeled Ci and Cj:
• If there is a root-to-leaf path containing both edges (e.g. C1 and C2 in the example tree), then Oi includes Oj or vice versa.
• Otherwise, Oi and Oj are disjoint (e.g. C1 and C3 in the example tree).

[Figure: the example tree over leaves A, B, C, D, E with edges labeled C1,…,C5.]

Proof of Theorem (cont)

Assume that the collection {O1,…,Ok} is laminar. We prove by induction on the number of characters k that M has a perfect phylogenetic tree.

Basis: one character. There are at most two (distinct) objects, one with and one without this character.

     C1
  A   1
  B   0

[Figure: a root with leaves A and B; the edge leading to A is labeled C1.]
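
Proof of Theorem (cont)

Assume that the collection {O1,…,Ok} is laminar.

Induction step: assume correctness for k-1 characters. Consider a matrix with k characters (non-zero columns), and assume WLOG that O1 is not properly contained in any Oj with j > 1.
• S1: the set of objects i for which Mi1 = 1 (that is, S1 = O1).
• S2: the remaining objects.

Claim: each character belongs to objects in S1 or in S2, but not to both.
Why is this? By laminarity, every Oj is either disjoint from O1 or nested with it. An Oj that meets O1 cannot properly contain it, so it lies inside O1 = S1; an Oj that is disjoint from O1 lies inside S2.

By induction there are trees T1 and T2 for S1 and S2.

     C1 C2 C3 C4 C5
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  0  1
  D   0  0  1  1  0
  E   1  0  0  0  0

S1 = {A,C,E}, S2 = {B,D}

[Figure: trees T1 and T2, with C1 labeling the edge above T1.]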

Efficient Implementation

1. Sort the columns (characters) by decreasing binary value, reading each column from top to bottom as a binary number.

Claim: if the binary value of column i is larger than that of column j, then Oi is not a proper subset of Oj.

Proof: if Oi were a proper subset of Oj, column j would have a 1 wherever column i does and at least one more, so its binary value would be larger.

Before sorting:

     C1 C2 C3 C4 C5
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  0  1
  D   0  0  1  1  0
  E   0  1  0  0  0

After sorting:

     C2 C1 C3 C5 C4
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  1  0
  D   0  0  1  0  1
  E   1  0  0  0  0
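
The sorting step can be illustrated as follows. This is a sketch under stated assumptions: the helper name `sort_columns` is hypothetical, and it uses Python's built-in comparison sort on column tuples rather than the radix sort the slides rely on for the O(mn) bound. Comparing equal-length 0/1 tuples lexicographically is the same as comparing their binary values with the top row as the most significant bit.

```python
def sort_columns(matrix, names):
    """Reorder columns by decreasing binary value (top row = most significant bit)."""
    columns = list(zip(*matrix))  # columns[j] is column j as a tuple, top row first
    order = sorted(range(len(columns)), key=lambda j: columns[j], reverse=True)
    sorted_names = [names[j] for j in order]
    sorted_matrix = [[row[j] for j in order] for row in matrix]
    return sorted_names, sorted_matrix

# Example matrix from the slides (rows A..E, columns C1..C5).
M = [[1, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [1, 1, 0, 0, 1],
     [0, 0, 1, 1, 0],
     [0, 1, 0, 0, 0]]

names, sorted_M = sort_columns(M, ["C1", "C2", "C3", "C4", "C5"])
print(names)  # ['C2', 'C1', 'C3', 'C5', 'C4']
```

The resulting column order matches the sorted matrix shown on the slide.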

Efficient Implementation (cont)

2. Make a backwards linked list of the 1's in each row: each 1 points to the nearest 1 to its left in the same row.

Claim: if the columns are sorted, then the set of columns is laminar iff for each column i, all the links leaving column i point at the same column. (Why is this?)

If the matrix is laminar then these pointers define the inclusion hierarchy.

Links agree in every column (laminar):

     C2 C1 C3 C5 C4
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  1  0
  D   0  0  1  0  1
  E   1  0  0  0  0

Links leaving C5 disagree, since row C points at C1 and row E points at C3 (not laminar):

     C2 C1 C3 C5 C4
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  1  0
  D   0  0  1  0  1
  E   0  0  1  1  0
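
The backward-link test can be sketched in a single pass over the sorted matrix. The names below (`backward_links`, the `-1` marker for "no 1 to the left") are illustrative assumptions, and the input is assumed to be already sorted by decreasing binary value of its columns.

```python
def backward_links(sorted_matrix):
    """Map each column index to the column its links point at (-1 = no column).

    Returns None as soon as two rows disagree on where a column's link points,
    which, for sorted columns, means the matrix is not laminar.
    """
    parent = {}
    for row in sorted_matrix:
        prev = -1  # index of the nearest 1 to the left in this row, if any
        for j, bit in enumerate(row):
            if bit == 1:
                if j not in parent:
                    parent[j] = prev       # first link seen leaving column j
                elif parent[j] != prev:
                    return None            # links leaving column j disagree
                prev = j
    return parent

# Columns are in sorted order C2, C1, C3, C5, C4 (indices 0..4).
SORTED_OK = [[1, 1, 0, 0, 0],
             [0, 0, 1, 0, 0],
             [1, 1, 0, 1, 0],
             [0, 0, 1, 0, 1],
             [1, 0, 0, 0, 0]]
SORTED_BAD = [[1, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 1, 1, 0]]

print(backward_links(SORTED_OK))   # {0: -1, 1: 0, 2: -1, 3: 1, 4: 2}
print(backward_links(SORTED_BAD))  # None
```

In the first result, C1 (index 1) points at C2 (index 0), C5 (index 3) points at C1, and C4 (index 4) points at C3 (index 2), which is the inclusion hierarchy for the example.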

Efficient Implementation (cont)

3. If the matrix is laminar, compute the inclusion hierarchy.
4. Reconstruct the topology of the phylogenetic tree and the ancestral character states.

     C2 C1 C3 C5 C4
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  1  0
  D   0  0  1  0  1
  E   1  0  0  0  0

[Figure: the inclusion hierarchy, in which C2 includes C1, C1 includes C5, and C3 includes C4; and the resulting tree over leaves A, B, C, D, E with edges labeled C1,…,C5 and character vectors at the vertices, starting from the zero root (00000).]

Efficient Implementation - Summary

1. Sort the columns (characters) by decreasing binary value.
2. Make a backwards linked list of the 1's in each row.
3. If the matrix is laminar, compute the inclusion hierarchy.
4. Reconstruct the topology of the phylogenetic tree and the ancestral character states.

Complexity: O(mn), using radix (bucket) sort in stage 1.

Input matrix:

     C1 C2 C3 C4 C5
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  0  1
  D   0  0  1  1  0
  E   0  1  0  0  0

After sorting the columns:

     C2 C1 C3 C5 C4
  A   1  1  0  0  0
  B   0  0  1  0  0
  C   1  1  0  1  0
  D   0  0  1  0  1
  E   1  0  0  0  0
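
The four steps can be combined into one short routine. What follows is a hedged sketch rather than the slides' implementation: the function name `directed_perfect_phylogeny`, the `"root"` label, and the returned dictionaries are all assumptions made for illustration. It also uses a comparison sort and a linear lookup per character, so it does not attain the O(mn) bound quoted above.

```python
def directed_perfect_phylogeny(matrix, species, characters):
    """Sketch of steps 1-4; returns None if the matrix is not laminar.

    Assumes no character is literally named "root".
    Returns (parent, attach, states):
      parent[c]  = character just above c in the inclusion hierarchy, or "root"
      attach[s]  = character node (or "root") under which species s hangs
      states[v]  = character vector of node v, in the original column order
    """
    columns = list(zip(*matrix))
    # Step 1: column indices by decreasing binary value (top row = high bit).
    order = sorted(range(len(columns)), key=lambda j: columns[j], reverse=True)
    parent, attach = {}, {}
    for name, row in zip(species, matrix):
        prev = "root"
        for j in order:
            if row[j] == 1:
                c = characters[j]
                # Step 2: every backward link leaving column c must agree.
                if parent.setdefault(c, prev) != prev:
                    return None
                prev = c
        attach[name] = prev  # species hangs below its last character in sorted order

    # Steps 3-4: the links form the hierarchy; walk up it to get node states.
    def state(node):
        bits = [0] * len(characters)
        while node != "root":
            bits[characters.index(node)] = 1
            node = parent[node]
        return "".join(map(str, bits))

    states = {node: state(node) for node in ["root", *parent]}
    return parent, attach, states

M = [[1, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [1, 1, 0, 0, 1],
     [0, 0, 1, 1, 0],
     [0, 1, 0, 0, 0]]
SPECIES = ["A", "B", "C", "D", "E"]
CHARS = ["C1", "C2", "C3", "C4", "C5"]

parent, attach, states = directed_perfect_phylogeny(M, SPECIES, CHARS)
print(parent)  # {'C2': 'root', 'C1': 'C2', 'C3': 'root', 'C5': 'C1', 'C4': 'C3'}
print(attach)  # {'A': 'C1', 'B': 'C3', 'C': 'C5', 'D': 'C4', 'E': 'C2'}
```

For the example matrix, the node reached by the C1 edge has state 11000 and the root has state 00000, in agreement with the vectors shown in the example tree.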