Subtrees Comparison of Phylogenetic Trees with Applications to Two Component Systems Sequence...


Subtrees Comparison of Phylogenetic Trees with Applications to Two Component Systems Sequence Classifications in Bacterial Genome

Yaw-Ling Lin 1, Ming-Tat Ko 2

1 Dept. Computer Sci. & Info. Management, Providence University, Taichung, Taiwan.

2 Institute of Information Science, Academia Sinica, Taipei, Taiwan


Yaw-Ling Lin, Providence, Taiwan 2

Motivation – Where do the problems come from?


Two-Component System

• Two-component systems (2CS):
  – Sensor histidine kinase
  – Response regulator

• The major control machinery that lets bacteria cope with a diverse and often hostile environment


2CS in Pseudomonas aeruginosa PAO1

http://www.pseudomonas.com/

“Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen.” Nature. 2000 Aug 31;406(6799):947-8. by Stover CK, Pham XQ, Erwin AL, et al.

• Genome: 6.3M bp

• Predicted genes: 5570

• 123 genes were classified as 2CSs.


2CS in PAO1


2CS in PAO1


2CS in PAO1


2CS in PAO1

• There are 123 annotated 2CS genes in PAO1.

• Use systematic analysis of the evolutionary relationships between the sensor kinase and response regulator of a 2CS.

• Construct phylogenetic trees using Clustal-W for 54 sensor kinases and 59 response regulators.


2CS in PAO1 -- Sensor Tree


2CS: Regulator Tree


Subtrees Analysis of 2CS


Co-evolution subtree Analysis

Sensor Tree versus Regulator Tree


Problem Definition

• A phylogenetic tree with n leaves is a (rooted binary) tree such that all the leaf nodes are uniquely labelled from 1 to n.

• Given two n-leaf phylogenetic trees, we wish to explore the relationships between subtrees of the two trees.
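
The definition above can be made concrete with a small sketch. It assumes a representation that is not from the talk: a tree is either an integer leaf label or a pair `(left, right)`. The hypothetical helper `clusters` lists the leaf set of every subtree, which is the raw material for comparing subtrees of two trees over the same labels.

```python
def clusters(tree):
    """Return the leaf set (cluster) of every subtree, in post-order.

    Assumed representation: a tree is an int (leaf label) or a pair
    (left, right) of trees.
    """
    found = []

    def visit(node):
        if isinstance(node, int):
            leaves = frozenset([node])
        else:
            left, right = node
            leaves = visit(left) | visit(right)
        found.append(leaves)
        return leaves

    visit(tree)
    return found


# Toy 4-leaf trees over the same labels (illustrative, not data from the talk).
tree_a = ((1, 2), (3, 4))
tree_b = ((1, 3), (2, 4))
print(clusters(tree_a))
print(clusters(tree_b))
```

A rooted binary tree with n leaves has 2n - 1 nodes, so each tree contributes 2n - 1 clusters.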


Normalized cluster distance between two sets

• Symmetric set difference:

• Normalized cluster distance:
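
The formulas on this slide did not survive extraction. The symmetric set difference is the standard (A \ B) ∪ (B \ A). For the normalized cluster distance, the sketch below assumes the form |A Δ B| / |A ∪ B|, which is 0 for identical sets and 1 for disjoint sets. That normalization is an assumption and may differ from the one on the original slide.

```python
def symmetric_difference(a, b):
    """Elements that belong to exactly one of the two sets."""
    return (a - b) | (b - a)


def normalized_cluster_distance(a, b):
    """ASSUMED normalization: |A delta B| / |A union B|, a value in [0, 1]."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(symmetric_difference(a, b)) / len(union)


print(symmetric_difference({1, 2, 3}, {2, 3, 4}))
print(normalized_cluster_distance({1, 2, 3}, {2, 3, 4}))
```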


All Pairs Subtrees Comparison – A naïve O(n^3) algorithm
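
The slide body is missing, so the following is only a plausible reading of the naïve approach: materialize the leaf set of every subtree in both trees and compare every pair directly. With O(n) subtrees per tree and O(n) work per set comparison, this costs O(n^3) overall. The tuple-based tree representation and the function names are assumptions made for illustration.

```python
def clusters(tree):
    """Leaf set of every subtree; a tree is an int leaf or a (left, right) pair."""
    found = []

    def visit(node):
        if isinstance(node, int):
            leaves = frozenset([node])
        else:
            leaves = visit(node[0]) | visit(node[1])
        found.append(leaves)
        return leaves

    visit(tree)
    return found


def all_pairs_naive(t1, t2):
    """Map each (cluster of t1, cluster of t2) pair to |A delta B|.

    O(n) clusters per tree and O(n) per set operation give O(n^3) total.
    """
    return {(a, b): len(a ^ b) for a in clusters(t1) for b in clusters(t2)}


table = all_pairs_naive(((1, 2), (3, 4)), ((1, 3), (2, 4)))
print(table[(frozenset({1, 2}), frozenset({1, 3}))])
```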


All Pairs Subtrees Comparison – Property


All Pairs Subtrees Comparison – an O(n^2) algorithm
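
The algorithm on this slide was not captured in the transcript. One way to reach O(n^2), offered here as an assumption rather than the authors' method, uses the fact that the two children of a node have disjoint leaf sets, so intersection sizes add: |L(u) ∩ L(v)| = |L(u_left) ∩ L(v)| + |L(u_right) ∩ L(v)|. Each of the O(n^2) node pairs then costs O(1), and |A Δ B| = |A| + |B| - 2|A ∩ B|. All names below are hypothetical.

```python
def postorder(tree):
    """Flatten a tree into (label, left_index, right_index) triples, post-order.

    A tree is an int leaf or a (left, right) pair; children always get
    smaller indices than their parent.
    """
    nodes = []

    def visit(node):
        if isinstance(node, int):
            nodes.append((node, None, None))
        else:
            left = visit(node[0])
            right = visit(node[1])
            nodes.append((None, left, right))
        return len(nodes) - 1

    visit(tree)
    return nodes


def all_pairs_quadratic(t1, t2):
    """Return diff[i][j] = |L(i) delta L(j)| for all node pairs in O(n^2)."""
    n1, n2 = postorder(t1), postorder(t2)

    def sizes(nodes):
        size = []
        for label, left, right in nodes:
            size.append(1 if label is not None else size[left] + size[right])
        return size

    size1, size2 = sizes(n1), sizes(n2)
    inter = [[0] * len(n2) for _ in n1]
    for i, (label_i, left_i, right_i) in enumerate(n1):
        for j, (label_j, left_j, right_j) in enumerate(n2):
            if label_i is None:          # split the node of the first tree
                inter[i][j] = inter[left_i][j] + inter[right_i][j]
            elif label_j is None:        # leaf vs internal: split the second
                inter[i][j] = inter[i][left_j] + inter[i][right_j]
            else:                        # leaf vs leaf
                inter[i][j] = int(label_i == label_j)
    return [[size1[i] + size2[j] - 2 * inter[i][j] for j in range(len(n2))]
            for i in range(len(n1))]


diff = all_pairs_quadratic(((1, 2), (3, 4)), ((1, 3), (2, 4)))
print(diff[2][2])  # cluster {1, 2} of the first tree vs {1, 3} of the second
```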


Lowest Common Ancestor
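
Only the slide title survives here. As a reminder of the concept, the lowest common ancestor (LCA) of two nodes is the deepest node that has both as descendants. The sketch below is a simple parent-pointer version that answers a query in time proportional to the tree height; constant-time queries after linear preprocessing are known to be possible, and which variant the talk used is not recoverable from the transcript. The tuple tree representation and all names are assumptions.

```python
def build_parents(tree):
    """Number nodes in post-order; return (parent, depth_of, leaf_index)."""
    parent, depth_of, leaf_index = {}, {}, {}
    counter = [0]

    def visit(node, depth):
        is_leaf = isinstance(node, int)
        children = [] if is_leaf else [visit(child, depth + 1) for child in node]
        idx = counter[0]
        counter[0] += 1
        depth_of[idx] = depth
        for child in children:
            parent[child] = idx
        if is_leaf:
            leaf_index[node] = idx
        return idx

    root = visit(tree, 0)
    parent[root] = None
    return parent, depth_of, leaf_index


def lca(parent, depth_of, u, v):
    """Walk the deeper node up to equal depth, then climb both in lockstep."""
    while depth_of[u] > depth_of[v]:
        u = parent[u]
    while depth_of[v] > depth_of[u]:
        v = parent[v]
    while u != v:
        u, v = parent[u], parent[v]
    return u


parent, depth_of, leaf_index = build_parents(((1, 2), (3, 4)))
print(lca(parent, depth_of, leaf_index[1], leaf_index[2]))
```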


Confluent subtree


Confluent subtree – Illustration


Constructing confluent subtree


Nearest subtree


Nearest subtree: reasoning


Nearest subtree: Algorithm


k-agreement Problem


Correlation analysis

• Does gene duplication tend to occur within a relatively short distance on a bacterial genome?

• Idea: create a dot-matrix plot, with the X-axis being the physical distance and the Y-axis the evolutionary distance between the two 2CS being compared.

• Some subsets of 2CS, presumably functionally related, could show a correlation between their physical and evolutionary distances.
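
The idea in these bullets can be sketched as follows: produce one (physical distance, evolutionary distance) point per gene pair, then measure how strongly the two coordinates are related. The gene names, positions and distances below are made-up toy values, the physical distance is taken as a plain coordinate difference (chromosome circularity is ignored), and Pearson correlation is one possible choice of measure rather than the one stated in the talk.

```python
import math
from itertools import combinations


def distance_pairs(position, evo_dist):
    """One (physical, evolutionary) point per unordered pair of genes."""
    points = []
    for a, b in combinations(sorted(position), 2):
        physical = abs(position[a] - position[b])
        points.append((physical, evo_dist[frozenset((a, b))]))
    return points


def pearson(points):
    """Pearson correlation coefficient of the x and y coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy input: genome coordinates (bp) and tree distances.
position = {"g1": 0, "g2": 100, "g3": 300}
evo_dist = {
    frozenset(("g1", "g2")): 1.0,
    frozenset(("g1", "g3")): 3.0,
    frozenset(("g2", "g3")): 2.0,
}
points = distance_pairs(position, evo_dist)
print(points, pearson(points))
```

The `points` list is what would be drawn on the dot-matrix plot; the coefficient summarizes it in a single number.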


k-correlation Problem


k-correlation is NP-complete

• Let M1 be the adjacency matrix of a graph G, and let M2 be a zero matrix.

• If we could solve the k-correlation problem in polynomial time, then the maximum independent set problem would also be solvable in polynomial time.
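
The formal statement of the k-correlation problem is not in the transcript, so the sketch below illustrates only the fact behind the reduction: a vertex subset S is independent in G exactly when the submatrix of M1 indexed by S is all zeros, i.e. when it coincides with the same submatrix of the zero matrix M2. The brute-force search and the function names are hypothetical and serve only to make that correspondence concrete.

```python
from itertools import combinations


def agrees_on(m1, m2, subset):
    """True if m1 and m2 coincide on the submatrix indexed by subset."""
    return all(m1[i][j] == m2[i][j] for i in subset for j in subset)


def find_k_agreeing_subset(m1, m2, k):
    """Exhaustive search (exponential time) for k indices where m1 equals m2."""
    for subset in combinations(range(len(m1)), k):
        if agrees_on(m1, m2, subset):
            return subset
    return None


# Path graph 0 - 1 - 2: the largest independent set is {0, 2}.
m1 = [[0, 1, 0],
      [1, 0, 1],
      [0, 1, 0]]
m2 = [[0] * 3 for _ in range(3)]
print(find_k_agreeing_subset(m1, m2, 2))
print(find_k_agreeing_subset(m1, m2, 3))
```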


Conclusion

• Identifying novel 2CS in other bacterial genomes as well as in eukaryotic genomes

• Clustering analysis of 2CS for functional prediction of uncharacterized genes

• Co-evolutionary analysis of 2CS


Future Research

• Identifying novel 2CS in other bacterial genomes as well as in eukaryotic genomes

• Clustering analysis of 2CS for functional prediction of uncharacterized genes

• Co-evolutionary analysis of 2CS