Subtrees Comparison of Phylogenetic Trees with Applications to Two Component Systems Sequence Classifications in Bacterial Genome
Yaw-Ling Lin 1 Ming-Tat Ko 2
1 Dept. Computer Sci. & Info. Management, Providence University, Taichung, Taiwan.
2 Institute of Information Science, Academia Sinica, Taipei, Taiwan.
Yaw-Ling Lin, Providence, Taiwan 2
Motivation – Where do the problems come from?
Two-Component System
• Two-component systems (2CS):
  – Sensor histidine kinase
  – Response regulator
• The major control machinery that lets bacteria cope with diverse and often hostile environments
2CS in Pseudomonas aeruginosa PAO1
http://www.pseudomonas.com/
“Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen.” Nature. 2000 Aug 31;406(6799):947-8. by Stover CK, Pham XQ, Erwin AL, et al.
• Genome: 6.3M bp
• Predicted genes: 5570
• 123 genes were classified as 2CSs.
2CS in PAO1
• There are 123 annotated 2CS genes in PAO1.
• Use systematic analysis of the evolutionary relationships between the sensor kinase and response regulator of a 2CS.
• Construct phylogenetic trees using ClustalW for 54 sensor kinases and 59 response regulators.
2CS in PAO1 -- Sensor Tree
2CS: Regulator Tree
Subtrees Analysis of 2CS
Co-evolution subtree Analysis
Sensor Tree versus Regulator Tree
Problem Definition
• A phylogenetic tree with n leaves is a (rooted binary) tree such that all the leaf nodes are uniquely labelled from 1 to n.
• Given two n-leaf phylogenetic trees, we wish to explore the relationships between the subtrees of one tree and the subtrees of the other.
Normalized cluster distance between two sets
• Symmetric set difference:
• Normalized cluster distance:
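The formulas on this slide did not survive extraction. As a minimal sketch, the symmetric set difference of A and B is the set of elements that belong to exactly one of the two sets. The normalization used below, dividing its size by |A| + |B|, is an assumption made for illustration; the function names are hypothetical as well.

```python
def symmetric_difference_size(a, b):
    """Size of the symmetric set difference: elements in exactly one set."""
    return len(set(a) ^ set(b))


def normalized_cluster_distance(a, b):
    """Symmetric difference size divided by |A| + |B|.

    ASSUMPTION: this normalization is chosen for illustration; the
    slide's own formula is missing. With it, the value is 0.0 for
    identical sets and 1.0 for disjoint sets.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty clusters are treated as identical
    return len(a ^ b) / (len(a) + len(b))
```

Under this assumed normalization, two leaf sets that share most of their labels get a distance close to 0, which is what makes it usable for ranking candidate subtree matches.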
All Pairs Subtrees Comparison – A naïve O(n^3) algorithm
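The algorithm on this slide is not preserved. The sketch below shows one straightforward way to reach a cubic bound: collect the leaf set of every subtree in both trees, then compare every pair with a set operation. The nested-tuple tree encoding, the function names, and the |A| + |B| normalization are all assumptions made for illustration.

```python
def subtree_leaf_sets(tree):
    """Return the leaf set of every subtree in post-order (root last).

    A tree is an int (a leaf label) or a pair (left, right) of trees.
    """
    out = []

    def visit(node):
        if isinstance(node, int):
            leaves = frozenset([node])
        else:
            left, right = node
            leaves = visit(left) | visit(right)
        out.append(leaves)
        return leaves

    visit(tree)
    return out


def all_pairs_naive(t1, t2):
    """Distance for every pair of subtrees, keyed by their leaf sets.

    There are O(n) subtrees per tree, hence O(n^2) pairs, and each
    set comparison costs O(n): O(n^3) overall.
    """
    s1, s2 = subtree_leaf_sets(t1), subtree_leaf_sets(t2)
    # ASSUMPTION: normalization by |A| + |B|.
    return {(a, b): len(a ^ b) / (len(a) + len(b)) for a in s1 for b in s2}
```

For example, `all_pairs_naive(((1, 2), (3, 4)), ((1, 3), (2, 4)))` compares the 7 subtrees of each tree against each other.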
All Pairs Subtrees Comparison – Property
All Pairs Subtrees Comparison – an O(n^2) algorithm
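The property and algorithm from these slides are not preserved, so the following is only one standard way to obtain a quadratic bound, not necessarily the authors' method. It relies on the fact that the two children of a node have disjoint leaf sets, so intersection sizes add up: |L(u) ∩ L(v)| = |L(u1) ∩ L(v)| + |L(u2) ∩ L(v)|. The symmetric difference size then follows as |A| + |B| - 2|A ∩ B|. The tree encoding, names, and normalization are assumptions.

```python
def postorder(tree):
    """Flatten a nested-tuple tree into post-order records.

    Each record is (leaf_label_or_None, left_index, right_index, size).
    """
    nodes = []

    def visit(node):
        if isinstance(node, int):
            nodes.append((node, None, None, 1))
        else:
            left = visit(node[0])
            right = visit(node[1])
            nodes.append((None, left, right, nodes[left][3] + nodes[right][3]))
        return len(nodes) - 1

    visit(tree)
    return nodes


def all_pairs_quadratic(t1, t2):
    """dist[i][j] for the i-th and j-th post-order subtrees, O(1) per pair."""
    n1, n2 = postorder(t1), postorder(t2)
    inter = [[0] * len(n2) for _ in n1]
    for i, (lab1, l1, r1, _) in enumerate(n1):
        for j, (lab2, l2, r2, _) in enumerate(n2):
            if lab1 is not None and lab2 is not None:
                inter[i][j] = int(lab1 == lab2)
            elif lab1 is not None:
                # Children of j come earlier in post-order: already filled.
                inter[i][j] = inter[i][l2] + inter[i][r2]
            else:
                # Children of i come earlier in post-order: already filled.
                inter[i][j] = inter[l1][j] + inter[r1][j]
    # ASSUMPTION: normalization by |A| + |B|.
    return [[(a[3] + b[3] - 2 * inter[i][j]) / (a[3] + b[3])
             for j, b in enumerate(n2)]
            for i, a in enumerate(n1)]
```

Because every table entry is filled from at most two earlier entries, the total work is proportional to the number of subtree pairs.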
Lowest Common Ancestor
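The content of this slide is not preserved. As a reminder of the concept, the lowest common ancestor of two nodes is the deepest node that has both of them as descendants (a node counts as its own descendant). The sketch below is a simple parent-pointer version that takes time proportional to the tree depth per query; the `parent` dictionary encoding is an assumption, and faster preprocessing-based schemes exist.

```python
def lowest_common_ancestor(parent, u, v):
    """Return the lowest common ancestor of u and v.

    `parent` maps each non-root node to its parent; the root is
    either absent from the mapping or mapped to None. Both nodes
    are assumed to belong to the same tree.
    """
    ancestors = set()
    while u is not None:
        ancestors.add(u)
        u = parent.get(u)
    # Climb from v until we reach a node on u's path to the root.
    while v not in ancestors:
        v = parent[v]
    return v
```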
Confluent subtree
Confluent subtree – Illustration
Constructing confluent subtree
Nearest subtree
Nearest subtree: reasoning
Nearest subtree: Algorithm
k-agreement Problem
Correlation analysis
• Does gene duplication tend to occur within a relatively short distance on a bacterial genome?
• Idea: create a dot-matrix plot with the X-axis being the physical distance and the Y-axis being the evolutionary distance between the two 2CS genes being compared.
• Some subsets of 2CS, presumably functionally related, could show a correlation between their physical and evolutionary distances.
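The dot-matrix idea above can be sketched as follows: for every pair of genes, emit one point (physical distance, evolutionary distance), then measure how strongly the two coordinates are related. Several details here are assumptions made for illustration: the genome is treated as circular, evolutionary distances are supplied in a dictionary keyed by gene pairs, and Pearson's coefficient is used as the correlation measure. All function and parameter names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def physical_distance(p, q, genome_length):
    """Shortest distance between two positions on a circular genome.

    ASSUMPTION: circular chromosome, positions in base pairs.
    """
    d = abs(p - q)
    return min(d, genome_length - d)


def dot_plot_points(positions, evo_dist, genome_length):
    """One (physical, evolutionary) point per unordered gene pair.

    `positions` maps gene name to position; `evo_dist` maps
    frozenset({g, h}) to the evolutionary distance between g and h.
    """
    genes = sorted(positions)
    return [(physical_distance(positions[g], positions[h], genome_length),
             evo_dist[frozenset((g, h))])
            for g, h in combinations(genes, 2)]


def pearson(points):
    """Pearson correlation of the x and y coordinates.

    Raises ZeroDivisionError if either coordinate is constant.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)
```

A coefficient near 1 for some subset of genes would indicate that, within that subset, genes that are close on the chromosome also tend to be close in the tree.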
k-correlation Problem
k-correlation is NP-complete
• Let M1 be the adjacency matrix of a graph G, and let M2 be a zero matrix.
• If we could solve the k-correlation problem in polynomial time, then the maximum independent set problem would also be solvable in polynomial time.
Conclusion
• Identifying novel 2CS in other bacterial genomes as well as in eukaryotic genomes
• Clustering analysis of 2CS for functional prediction of uncharacterized genes
• Co-evolutionary analysis of 2CS
Future Research
• Identifying novel 2CS in other bacterial genomes as well as in eukaryotic genomes
• Clustering analysis of 2CS for functional prediction of uncharacterized genes
• Co-evolutionary analysis of 2CS