Page 1: Subtrees Comparison of Phylogenetic Trees with Applications to Two Component Systems Sequence Classifications in Bacterial Genome Yaw-Ling Lin 1 Ming-Tat.

Subtrees Comparison of Phylogenetic Trees with Applications to Two Component Systems Sequence Classifications in Bacterial Genome

Yaw-Ling Lin 1 Ming-Tat Ko 2

1 Dept. Computer Sci. & Info. Management, Providence University, Taichung, Taiwan.

2 Institute of Information Science, Academia Sinica, Taipei, Taiwan.

Yaw-Ling Lin, Providence, Taiwan 2

Motivation – Where do the problems come from?

Two-Component System

• Two-component systems (2CS):
  – Sensor histidine kinase
  – Response regulator

• The major control machinery that lets bacteria cope with diverse and often hostile environments

2CS in Pseudomonas aeruginosa PAO1

http://www.pseudomonas.com/

“Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen.” Stover CK, Pham XQ, Erwin AL, et al. Nature. 2000 Aug 31;406(6799):959-64.

• Genome: 6.3 Mbp

• Predicted genes: 5570

• 123 genes were classified as 2CSs.

2CS in PAO1

2CS in PAO1

2CS in PAO1

2CS in PAO1

• There are 123 annotated 2CS genes in PAO1.

• Use systematic analysis of the evolutionary relationships between the sensor kinase and response regulator of a 2CS.

• Construct phylogenetic trees using Clustal-W for 54 sensor kinases and 59 response regulators.

2CS in PAO1 -- Sensor Tree

2CS: Regulator Tree

Subtrees Analysis of 2CS

Co-evolution subtree Analysis

Sensor Tree versus Regulator Tree

Problem Definition

• A phylogenetic tree with n leaves is a (rooted binary) tree such that all the leaf nodes are uniquely labelled from 1 to n.

• Given two n-leaf phylogenetic trees, we wish to explore the relationships between the subtrees of the two trees.

Normalized cluster distance between two sets

• Symmetric set difference:

• Normalized cluster distance:
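The formulas on this slide did not survive extraction. As a minimal sketch, assume the symmetric set difference is the usual A Δ B = (A − B) ∪ (B − A), and assume the normalized cluster distance divides its size by |A ∪ B|. That normalization is an assumption, not something recovered from the slide, and the function names are illustrative only.

```python
def symmetric_difference(a, b):
    """Elements that are in exactly one of the two sets."""
    return (a - b) | (b - a)


def normalized_cluster_distance(a, b):
    """|A delta B| / |A union B| (assumed normalization).

    0.0 means identical sets, 1.0 means disjoint sets.
    """
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return len(symmetric_difference(a, b)) / len(union)


a = {1, 2, 3, 4}
b = {3, 4, 5}
print(sorted(symmetric_difference(a, b)))  # [1, 2, 5]
print(normalized_cluster_distance(a, b))   # 0.6
```

Under this assumed definition the distance always lies between 0 and 1, which makes clusters of different sizes comparable.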

All Pairs Subtrees Comparison – A naïve O(n^3) algorithm
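The body of this slide is missing from the transcript. A plausible reading of the naïve approach, offered only as a sketch, is to collect the leaf set of every subtree in both trees and compare every pair directly. The nested-tuple tree encoding, the function names, and the distance |A Δ B| / |A ∪ B| are all assumptions made for illustration.

```python
def clusters(tree):
    """Return the leaf set of every subtree, in post-order.

    A tree is either an int (leaf label) or a pair (left, right).
    """
    out = []

    def walk(node):
        if isinstance(node, int):
            leaves = frozenset([node])
        else:
            left, right = node
            leaves = walk(left) | walk(right)
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def all_pairs_naive(t1, t2):
    """Distance between every subtree of t1 and every subtree of t2.

    There are O(n) subtrees per tree and each set comparison costs O(n),
    so the total is O(n^3).
    """
    result = {}
    for a in clusters(t1):
        for b in clusters(t2):
            # a ^ b is the symmetric difference, a | b the union
            result[(a, b)] = len(a ^ b) / len(a | b)
    return result


t1 = ((1, 2), (3, 4))
t2 = ((1, 3), (2, 4))
d = all_pairs_naive(t1, t2)
print(d[(frozenset({1, 2}), frozenset({1, 3}))])  # 0.6666666666666666
```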

All Pairs Subtrees Comparison – Property

All Pairs Subtrees Comparison – an O(n^2) algorithm
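The algorithm itself is not in the transcript. One standard way to reach O(n^2), which may or may not be the authors' method, relies on two facts: the intersection size of two leaf sets is additive over the children of either subtree, and |A ∪ B| = |A| + |B| − |A ∩ B|. Each table entry then costs O(1). The tree encoding, the names, and the distance |A Δ B| / |A ∪ B| are assumptions for this sketch.

```python
def flatten(tree):
    """Post-order list of nodes as (label, left_index, right_index, size).

    A tree is either an int (leaf label) or a pair (left, right).
    Leaves have left_index = right_index = None.
    """
    nodes = []

    def walk(node):
        if isinstance(node, int):
            nodes.append((node, None, None, 1))
        else:
            li = walk(node[0])
            ri = walk(node[1])
            nodes.append((None, li, ri, nodes[li][3] + nodes[ri][3]))
        return len(nodes) - 1

    walk(tree)
    return nodes


def all_pairs_fast(t1, t2):
    """Table dist[u][v] for subtree u of t1 and subtree v of t2, in O(n^2)."""
    n1, n2 = flatten(t1), flatten(t2)
    inter = [[0] * len(n2) for _ in n1]
    dist = [[0.0] * len(n2) for _ in n1]
    # Post-order guarantees children are filled in before their parents.
    for u, (lab_u, ul, ur, size_u) in enumerate(n1):
        for v, (lab_v, vl, vr, size_v) in enumerate(n2):
            if vl is not None:        # v is internal: split v
                common = inter[u][vl] + inter[u][vr]
            elif ul is not None:      # u is internal, v is a leaf: split u
                common = inter[ul][v] + inter[ur][v]
            else:                     # both are leaves
                common = 1 if lab_u == lab_v else 0
            inter[u][v] = common
            union = size_u + size_v - common
            dist[u][v] = (union - common) / union  # |A delta B| / |A union B|
    return dist


dist = all_pairs_fast(((1, 2), (3, 4)), ((1, 3), (2, 4)))
print(dist[6][6])  # 0.0 (the two roots have the same leaf set)
```

Indices follow post-order, so in this example index 6 is the root of each tree and index 2 is the first internal node.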

Lowest Common Ancestor

Confluent subtree

Confluent subtree – Illustration

Constructing confluent subtree

Nearest subtree

Nearest subtree: reasoning

Nearest subtree: Algorithm

k-agreement Problem

Correlation analysis

• Does gene duplication tend to occur within a relatively short distance on a bacterial genome?

• Idea: create a dot-matrix plot in which the X-axis is the physical distance, and the Y-axis the evolutionary distance, between the two 2CS genes being compared.

• Some subsets of 2CS, presumably functionally related, could show a correlation between their physical and evolutionary distances.
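The dot-matrix idea can be sketched as follows: build one (physical distance, evolutionary distance) point per gene pair, then measure how strongly the two coordinates are related. Several things here are assumptions rather than details from the slides: physical distance is taken as the shorter way around a circular chromosome, the Pearson coefficient is used as the correlation measure, and the gene names and numbers are toy values, not PAO1 data.

```python
from itertools import combinations
from math import sqrt


def circular_distance(p, q, genome_length):
    """Shortest distance in bp between two positions on a circular genome."""
    d = abs(p - q) % genome_length
    return min(d, genome_length - d)


def dot_plot_points(positions, evo_dist, genome_length):
    """(physical distance, evolutionary distance) for every pair of genes."""
    return [
        (circular_distance(positions[a], positions[b], genome_length),
         evo_dist[frozenset((a, b))])
        for a, b in combinations(sorted(positions), 2)
    ]


def pearson(points):
    """Pearson correlation coefficient of a list of (x, y) points.

    Raises ZeroDivisionError if either coordinate is constant.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)


# Toy input (hypothetical values, not data from PAO1).
positions = {"g1": 100, "g2": 300, "g3": 900}
evo = {
    frozenset(("g1", "g2")): 0.2,
    frozenset(("g1", "g3")): 0.2,
    frozenset(("g2", "g3")): 0.6,
}
points = dot_plot_points(positions, evo, genome_length=1000)
print(points)  # [(200, 0.2), (200, 0.2), (400, 0.6)]
```

A coefficient near 1 on some subset of genes would suggest that, within that subset, genes that are close on the chromosome are also close in the tree.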

k-correlation Problem

k-correlation is NP-complete

• Let M1 be the adjacency matrix of a graph G, and M2 be a zero matrix.

• If we could solve the k-correlation problem in polynomial time, then the maximum independent set problem would be solvable in polynomial time.
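The reduction rests on a simple observation: a vertex set S is independent in G exactly when the adjacency matrix restricted to S is all zeros, that is, when it coincides with the zero matrix on S. The brute-force sketch below only illustrates that correspondence. It runs in exponential time, the function names are hypothetical, and it does not reproduce the formal k-correlation objective, which is missing from the transcript.

```python
from itertools import combinations


def submatrices_agree(m1, m2, subset):
    """True if m1 and m2 coincide on every pair of distinct indices in subset."""
    return all(m1[i][j] == m2[i][j] for i, j in combinations(subset, 2))


def largest_agreeing_subset(m1, m2):
    """Largest index set on which the two matrices agree (brute force)."""
    n = len(m1)
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if submatrices_agree(m1, m2, subset):
                return subset
    return ()


# Path graph 0 - 1 - 2 - 3 as an adjacency matrix, and a zero matrix.
m1 = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
m2 = [[0] * 4 for _ in range(4)]
print(largest_agreeing_subset(m1, m2))  # (0, 2)
```

With M2 set to zero, the returned index set is a maximum independent set of the path graph, which is why a polynomial-time solver for the matrix problem would also solve maximum independent set.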

Conclusion

• Identifying novel 2CS in other bacterial genomes as well as in eukaryotic genomes

• Clustering analysis of 2CS for functional prediction of uncharacterized genes

• Co-evolutionary analysis of 2CS

Future Research

• Identifying novel 2CS in other bacterial genomes as well as in eukaryotic genomes

• Clustering analysis of 2CS for functional prediction of uncharacterized genes

• Co-evolutionary analysis of 2CS