![Page 1: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/1.jpg)
Optimal Efficient Reconstruction of Root-Unknown Phylogenetic
Networks with Constrained and Structured Recombination
Author: Dan Gusfield
Presentation by: C. Badri Narayanan
![Page 2: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/2.jpg)
Agenda
• Main Problem – Root-Unknown galled-tree problem
• Solving Optimal Root-Unknown Galled-Tree Problem
![Page 3: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/3.jpg)
Root-Unknown Galled-Tree problem
Given a set of sequences (say, M), find a galled-tree for M with the minimum number of recombinations if one exists; otherwise output “none”
Let’s see the approach previously taken
![Page 4: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/4.jpg)
Points Considered in Theorem(s)
• Only single-crossover recombinations are considered
• The algorithm will be extended to multiple crossover recombinations
Before seeing the approach let’s consider some definitions
![Page 5: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/5.jpg)
Definition of Terms
• Trivial Component: A node with no incident edges
• Component (a.k.a. Connected/Non-Trivial Component): A set of nodes in which every pair of nodes is joined by at least one path
• Reduced galled-tree: A galled-tree in which no gall contains a character site from a trivial component
![Page 6: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/6.jpg)
Previous Approaches – A Roadmap
• To construct a galled-tree for M with known ancestral sequence (say, A):
– Focus on each non-trivial component of the incompatibility graph separately
– For each component in the incompatibility graph, determine the site arrangement on a gall
– Connect the galls in a tree structure
– Place the sites from the trivial components
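The first step of the roadmap, splitting the sites into components of the incompatibility graph, can be sketched as follows. This is a minimal sketch, not the author's code. It assumes the standard four-gamete definition of incompatibility (two sites are incompatible when all of 00, 01, 10 and 11 occur in some row), which the slides do not spell out. The function names and the 0-based site indices are illustrative only.

```python
from itertools import combinations

def incompatible(M, i, j):
    """Four-gamete test (assumed definition): sites i and j are
    incompatible when all of 00, 01, 10, 11 occur among the rows."""
    return len({(row[i], row[j]) for row in M}) == 4

def incompatibility_components(M):
    """Return (non-trivial components, trivial sites) for binary rows M."""
    m = len(M[0])
    adj = {s: set() for s in range(m)}
    for i, j in combinations(range(m), 2):
        if incompatible(M, i, j):
            adj[i].add(j)
            adj[j].add(i)
    seen, comps = set(), []
    for s in range(m):
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:  # depth-first search over one component
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(sorted(comp))
    nontrivial = [c for c in comps if len(c) > 1]
    trivial = [c[0] for c in comps if len(c) == 1]
    return nontrivial, trivial

# Hypothetical input: sites 0 and 1 show all four gametes, site 2 conflicts with nothing.
M = ["000", "010", "100", "110", "001"]
nontrivial, trivial = incompatibility_components(M)
```

In this example the sites of each non-trivial component would go on one gall, and the trivial sites would be placed last, as in the roadmap.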
![Page 7: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/7.jpg)
Difficulties for Unknown Ancestral Sequence
• For any two sequences S and S’ (in M), the conflict and incompatibility graphs may be different
• How do we know which (ancestral) sequence will allow a galled-tree?
![Page 8: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/8.jpg)
Optimal Galled-Tree
• A galled-tree that minimizes the number of recombinations over all galled-trees for a set of sequences (say, M) and over all choices of ancestral sequence is called an “Optimal Galled-Tree”
• The ancestral sequence of an optimal galled-tree is called an “optimal ancestral sequence”
![Page 9: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/9.jpg)
Author’s Approach: Theorem on Galled Trees – Finding an Ancestral Sequence
If there is a galled-tree for M with some ancestral sequence, then there is an optimal galled-tree for M where the (optimal) ancestral sequence is one of the sequences in M
![Page 10: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/10.jpg)
Proof for the Theorem
T – optimal galled-tree for M
A – ancestral sequence for T
Every gall must have at least three edges branching off of it
![Page 11: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/11.jpg)
Proof continued….
P – a path in T from the root to some leaf z that doesn’t contain any recombination nodes
Zz – the sequence labeling z, where Zz is in M
Make Zz the ancestral sequence and reverse the directions of all edges on path P
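The re-rooting step can be illustrated with a small sketch. It assumes the network is stored as a list of directed edges (tail, head, label), which is an illustrative representation and not one taken from the slides. The function names are hypothetical.

```python
def reroot_along_path(edges, path):
    """Reverse every edge lying on `path` (a node list from the old root
    to leaf z). Edge labels are kept; only the direction changes."""
    on_path = set(zip(path, path[1:]))
    rerooted = []
    for tail, head, label in edges:
        if (tail, head) in on_path:
            rerooted.append((head, tail, label))  # flip direction, keep label
        else:
            rerooted.append((tail, head, label))
    return rerooted

def roots(edges):
    """Nodes with no incoming edge."""
    tails = {t for t, _, _ in edges}
    heads = {h for _, h, _ in edges}
    return tails - heads

# Hypothetical tree: r -> a -> z and a -> b, with edges labeled by site numbers.
edges = [("r", "a", 1), ("a", "z", 2), ("a", "b", 3)]
new_edges = reroot_along_path(edges, ["r", "a", "z"])
```

After the reversal the former leaf z is the only node without an incoming edge, so its sequence takes the role of the ancestral sequence.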
![Page 12: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/12.jpg)
Main Problem contd..
• Each such reversal of edges changes the direction of mutation on edges
• The reversal of edges doesn’t change:
> Labels on edges in T
> Recombination node on a gall
• The modified tree T’ also derives M
![Page 13: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/13.jpg)
Main Problem contd..
• Ancestral sequence of T’ is Zz which is a member of M
• T’ also contains the same number of galls and hence T’ is also optimal
• Running time is O(n^2 m + n^4) where
n – number of sequences
m – length of binary sequence
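The theorem suggests a simple outer loop: try each distinct sequence of M as the ancestral sequence and keep the best result. The sketch below assumes a rooted solver is available as a callback; `solve_rooted` is a hypothetical name, and it is assumed to return the number of recombinations, or None when no galled-tree exists for that ancestor. If one assumes a rooted routine costing O(nm + n^3) per call (not stated in the slides), n calls would give the bound above.

```python
def best_ancestor_in_M(M, solve_rooted):
    """Try every distinct row of M as the ancestral sequence.

    solve_rooted(M, ancestor) is a hypothetical callback returning the
    number of recombinations, or None if no galled-tree exists.
    Returns (ancestor, count) for the best candidate, or None.
    """
    best = None
    for candidate in dict.fromkeys(M):  # distinct rows, input order kept
        count = solve_rooted(M, candidate)
        if count is not None and (best is None or count < best[1]):
            best = (candidate, count)
    return best

# Stub solver for illustration only: a fixed table of made-up answers.
def stub_solver(M, ancestor):
    return {"000": 2, "011": 1}.get(ancestor)

M = ["000", "011", "101", "011"]
result = best_ancestor_in_M(M, stub_solver)
```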
![Page 14: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/14.jpg)
Solving Optimal Root-Unknown Galled-Tree Problem
• M – can be derived on a galled-tree; T* - an optimal galled-tree for M
• A* - an optimal ancestral sequence
![Page 15: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/15.jpg)
Connecting galls of T*
Assumptions
• Every node v on a gall Q in T* is incident with exactly one edge whose other end is off of Q (a.k.a. an “off-edge”)
• An off-edge may be directed into or out of a node (say, x)
![Page 16: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/16.jpg)
Connecting Galls of T*
• Transform T* to T’ (conceptually) as follows
– Node 00100 (say, x) is incident with 2 edges
– A new edge (say, y) is introduced
– Connect the 2 original edges (that were initially out of x) from y
– T’ specifies how galls of T* are connected to each other but does not show the internal arrangement of the sites on any gall
![Page 17: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/17.jpg)
Connecting Galls of T*
If x is root of T* then create a new root and connect it with an edge to x
Contract each gall Q in T* to a single node (say, q) and make all edges undirected
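The contraction step can be sketched as follows, assuming the network is given as directed edge pairs plus a map from each gall node to its gall. Both the representation and the names are illustrative assumptions. Gall identifiers must not collide with node names.

```python
def contract_galls(edges, gall_of):
    """Contract each gall to one node and drop edge directions.

    edges: iterable of (u, v) directed edges.
    gall_of: dict mapping each node on a gall to that gall's id;
             nodes not on any gall are absent from the dict.
    Returns a set of undirected edges, each a frozenset of two endpoints.
    """
    contracted = set()
    for u, v in edges:
        cu = gall_of.get(u, u)
        cv = gall_of.get(v, v)
        if cu != cv:  # edges inside a gall disappear
            contracted.add(frozenset((cu, cv)))
    return contracted

# Hypothetical gall Q on nodes a, b, c, d; each gall node has one off-edge.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),
         ("r", "a"), ("b", "e"), ("c", "g"), ("d", "f")]
gall_of = {"a": "Q", "b": "Q", "c": "Q", "d": "Q"}
tree_edges = contract_galls(edges, gall_of)
```

The result shows only how galls and the remaining nodes are connected, with no information about the arrangement of sites inside a gall.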
![Page 18: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/18.jpg)
Algorithmic Construction of T’
• Find a family of splits SP(T)
• C1 & C2 are obtained from the incompatibility graph
• The leaf nodes for the tree (on the right side of the figure) are determined by the sites that have a unique combination of characters
![Page 19: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/19.jpg)
Extensions to Complex Biological Phenomena & Structured Recombination
• Site-Arrangement algorithm for gall Q corresponding to component C
Let M(C) be matrix M restricted to sites in C
![Page 20: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/20.jpg)
Extensions to Complex Biological Phenomena & Structured Recombination
For each distinct sequence X in M(C):
Let M(C, X) be M(C) after removal of all rows with sequence X
If there is an undirected perfect phylogeny T(C) for M(C, X) where all sites on C are contained in one path whose end sequences can be recombined (with single-crossover) to create sequence X then output the pair (X, T(C))
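The bookkeeping parts of this algorithm can be sketched as below. The perfect-phylogeny search itself is omitted. The sketch assumes rows are binary strings, sites are 0-based indices, and a single crossover means a non-empty prefix of one end sequence followed by a non-empty suffix of the other. All names are hypothetical.

```python
def restrict_to_sites(M, C):
    """M(C): keep only the columns listed in C, in the given order."""
    return ["".join(row[s] for s in C) for row in M]

def remove_rows_equal_to(MC, X):
    """M(C, X): drop every row equal to sequence X."""
    return [row for row in MC if row != X]

def is_single_crossover(left, right, X):
    """True if X is a non-empty prefix of `left` followed by a
    non-empty suffix of `right` (assumed single-crossover definition)."""
    n = len(X)
    if not (len(left) == len(right) == n):
        return False
    return any(left[:k] + right[k:] == X for k in range(1, n))

# Hypothetical data: component C made of sites 0, 1 and 4.
M = ["00110", "11001", "00001"]
MC = restrict_to_sites(M, [0, 1, 4])
MCX = remove_rows_equal_to(MC, "001")
```

A caller would try both orders of the two end sequences, since the prefix may come from either end of the path.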
![Page 21: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/21.jpg)
Extensions to Complex Biological Phenomena & Structured Recombination
• Step 2 of the above algorithm is modified for multiple-crossover recombination
• Let Su(C) and Sy(C) denote two sequences
• The goal is to determine whether X can be created by a multiple-crossover recombination of Su(C) and Sy(C), starting with Su(C)
![Page 22: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/22.jpg)
Extensions to Complex Biological Phenomena & Structured Recombination
• Algorithm:
– i = 1; Z = Su(C)
– do {
• Find the longest substring of Z starting at position i that matches a substring of X starting at position i
• If there is none, return no
• Otherwise, set i to the position just past the right end of those matching substrings
• If Z = Su(C) then set Z = Sy(C), else set Z = Su(C)
} while position i is still within X
– Return yes
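The greedy test above can be written as a short runnable sketch. It assumes the three sequences are equal-length strings and uses 0-based positions. The function name is hypothetical, and the loop condition (stop once the whole of X is covered) is an assumption filled in from context.

```python
def multiple_crossover(su, sy, x):
    """Greedy check: can x be built by alternating between su and sy,
    starting with su, copying the longest matching run each time?"""
    if not (len(su) == len(sy) == len(x)):
        raise ValueError("sequences must have equal length")
    parents = (su, sy)
    current = 0  # index into parents; 0 means we start with su
    i = 0
    while i < len(x):
        z = parents[current]
        j = i
        # extend the match between z and x as far as possible
        while j < len(x) and z[j] == x[j]:
            j += 1
        if j == i:
            return False  # no match at position i: x cannot be created
        i = j
        current = 1 - current  # switch to the other sequence
    return True
```

For example, `multiple_crossover("00000", "11111", "00110")` copies two positions from the first sequence, two from the second, and the last from the first again.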
![Page 23: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/23.jpg)
Extensions to Complex Biological Phenomena & Structured Recombination
The above algorithm produces a multiple-crossover galled-tree for M
![Page 24: Optimal Efficient Reconstruction of Root-Unknown Phylogenetic Networks with Constrained and Structured Recombination](https://reader033.fdocuments.in/reader033/viewer/2022042718/56816807550346895ddd8a52/html5/thumbnails/24.jpg)
Thank You