Computational Analysis of Protein-DNA Interactions

Changhui (Charles) Yan, Department of Computer Science, Utah State University

Date posted: 20-Dec-2015

Page 1: 1 Computational Analysis of Protein-DNA Interactions Changhui (Charles) Yan Department of Computer Science Utah State University.

Computational Analysis of Protein-DNA Interactions

Changhui (Charles) Yan

Department of Computer Science

Utah State University

Page 2:

Problem I

Identifying amino acid residues involved in protein-DNA interactions from sequence

Page 3:

Materials And Methods

Dataset: 56 double-stranded DNA-binding proteins previously used in the study of Jones et al. (2003)

Encoding

Page 4:

Materials And Methods

Page 5:

Leave-one-out cross-validation

Naïve Bayes

Naïve Bayes Classifier
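
The leave-one-out protocol named on this slide can be sketched roughly as follows. This is only an illustration, not the author's code: it assumes the unit held out is a whole protein (the slides do not say whether proteins or individual residues are left out), and `train_fn`, `predict_fn`, the majority-class stand-in model, and the toy data are all hypothetical.

```python
from collections import Counter


def leave_one_out(proteins, train_fn, predict_fn):
    """Hold out each protein once, train on the rest, predict the held-out one.

    proteins: dict mapping protein id -> list of (features, label) pairs.
    Returns a dict mapping protein id -> list of (predicted, actual) pairs.
    """
    results = {}
    for held_out in proteins:
        # Pool the examples of every protein except the held-out one.
        training = [ex for pid, exs in proteins.items()
                    if pid != held_out for ex in exs]
        model = train_fn(training)
        results[held_out] = [(predict_fn(model, x), y)
                             for x, y in proteins[held_out]]
    return results


# Hypothetical stand-in model: always predict the majority training label.
def train_majority(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]


def predict_majority(model, features):
    return model


# Toy data (invented): label 1 = interface residue, 0 = non-interface.
toy = {
    "protA": [("K", 1), ("L", 0), ("R", 1)],
    "protB": [("A", 0), ("G", 0)],
    "protC": [("R", 1), ("V", 0), ("E", 0), ("T", 0)],
}
loo_results = leave_one_out(toy, train_majority, predict_majority)
```

Any classifier exposing a train and a predict function could be passed in place of the majority-class placeholder.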

Page 6:

Naïve Bayes

$$
\frac{P(c=1 \mid X = x_1 x_2 \ldots x_n)}{P(c=0 \mid X = x_1 x_2 \ldots x_n)}
= \frac{P(c=1) \prod_{i=1}^{n} P(x_i \mid c=1)}{P(c=0) \prod_{i=1}^{n} P(x_i \mid c=0)}
$$

Naïve Bayes Classifier

Leave-one-out cross-validation
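
The ratio P(c=1 | x_1 ... x_n) / P(c=0 | x_1 ... x_n) on this slide can be evaluated in log space, as in the sketch below. It is not the author's implementation. The window-of-residues encoding, the window length of 3, the add-one pseudocount, the zero threshold, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet


def train_naive_bayes(examples, pseudocount=1.0):
    """examples: list of (window, label); window is a string of residues."""
    class_counts = Counter()
    feature_counts = {0: Counter(), 1: Counter()}
    for window, label in examples:
        class_counts[label] += 1
        for i, aa in enumerate(window):
            # Count residue aa at window position i within this class.
            feature_counts[label][(i, aa)] += 1
    return class_counts, feature_counts, pseudocount


def log_odds(model, window):
    """log[P(c=1) prod P(x_i|c=1)] - log[P(c=0) prod P(x_i|c=0)], smoothed."""
    class_counts, feature_counts, pseudocount = model
    total = class_counts[0] + class_counts[1]
    score = 0.0
    for c, sign in ((1, 1.0), (0, -1.0)):
        log_p = math.log((class_counts[c] + pseudocount)
                         / (total + 2 * pseudocount))
        for i, aa in enumerate(window):
            num = feature_counts[c][(i, aa)] + pseudocount
            den = class_counts[c] + pseudocount * len(AMINO_ACIDS)
            log_p += math.log(num / den)
        score += sign * log_p
    return score


def predict(model, window, threshold=0.0):
    """Label the window's central residue as interface (1) if the ratio is high."""
    return 1 if log_odds(model, window) > threshold else 0


# Invented toy windows: basic residues labelled as interface (1).
train = [("KRK", 1), ("RKR", 1), ("KKR", 1),
         ("LVA", 0), ("AVL", 0), ("GLV", 0), ("VAG", 0)]
nb_model = train_naive_bayes(train)
```

Working with log-probabilities avoids numeric underflow when the product runs over many features; raising or lowering `threshold` trades sensitivity+ against specificity+.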

Page 7:

Leave-One-Out Cross-Validations

                          Sequence-based                   Sequence/structure-based
                          Identities (ID)   ID + entropy   ID + rASA   ID + rASA + entropy
Correlation coefficient   0.25              0.29           0.28        0.30
Accuracy (%)              77                75             76          77
Specificity+ (%)          37                37             36          39
Sensitivity+ (%)          43                53             51          52

Page 8:

Predictions in The Context of 3-D Structures

Pit-1, PDB 1au7

TP: 30, FP: 16, TN: 86, FN: 14; CC: 0.51 (2nd); Accuracy: 79%

Figure panels: Predicted, Actual
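
The performance measures reported in these slides can be derived from the four confusion-matrix counts. The slides do not define the measures, so the formulas below are assumptions: specificity+ is taken as TP / (TP + FP), sensitivity+ as TP / (TP + FN), and CC as the Matthews-style correlation coefficient. The Pit-1 counts are used as input; values computed this way may differ slightly from the rounded figures on the slide.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, specificity+, sensitivity+ and CC from raw counts."""
    total = tp + fp + tn + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        # Fraction of predicted interface residues that are real (assumed definition).
        "specificity_plus": tp / (tp + fp) if (tp + fp) else 0.0,
        # Fraction of real interface residues that are predicted (assumed definition).
        "sensitivity_plus": tp / (tp + fn) if (tp + fn) else 0.0,
        # Correlation coefficient; defined as 0 when any marginal is empty.
        "cc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


pit1 = binary_metrics(tp=30, fp=16, tn=86, fn=14)
for name, value in pit1.items():
    print(f"{name}: {value:.3f}")
```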

Page 9:

Predictions in The Context of 3-D Structures

λ-Cro, PDB 6cro

TP: 10, FP: 5, TN: 34, FN: 10; CC: 0.37 (19th); Accuracy: 73%

Figure panels: Predicted, Actual

Page 10:

Predictions Compared With PROSITE Motifs

Predicted binding sites substantially overlap with 34 of the 37 “DNA-binding” PROSITE motifs

In 52 of the 56 proteins, the predictor identifies at least 20% of the DNA-binding residues

28 of the 56 proteins contain no PROSITE motifs that are annotated as “DNA-binding”

Page 11:

Comparison With Previous Study

Method                    Naïve Bayes classifier   Ahmad and Sarai method*
Correlation coefficient   0.26                     0.23
Accuracy (%)              80                       66
Specificity+ (%)          29                       21
Sensitivity+ (%)          48                       68

*Ahmad, S. and Sarai, A. (2005) PSSM-based prediction of DNA binding sites in proteins. BMC Bioinformatics, 6, 33.

Page 12:

Summary

A simple sequence-based Naïve Bayes classifier predicts interface residues in DNA-binding proteins with 75% accuracy, 37% specificity+, 53% sensitivity+, and a correlation coefficient of 0.29.

Predicted binding sites:

  • correctly indicate the locations of actual binding sites

  • substantially overlap with known PROSITE motifs

Page 13:

Problem II

Identification of Helix-Turn-Helix (HTH) DNA-binding motifs

Page 14:

HTH Motifs

Sequences sharing low similarities can fold into a similar HTH structure

Identifying HTH motifs from sequence is extremely challenging

Page 15:

Trick 1

Including more information

  • Amino acid sequence

  • Secondary structure

Page 16:

Hidden Markov Model (HMM)

LQQITHIANQL-GLE----KDVVRVWF

Page 17:

Hidden Markov Model (HMM_AA_SS)

LQQITHIANQL-GLE----KDVVRVWF

HHHEEHEEEHMHE----HHEEMMEH
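
One possible way to present an amino acid row and a secondary-structure row to a single model is to pair them column by column, as sketched below. The slides do not say how HMM_AA_SS actually combines the two tracks, so this is an assumption and not the author's method. The function name `pair_tracks` and the toy rows are hypothetical; the slide's own rows are not used because their column alignment cannot be verified from the transcript.

```python
def pair_tracks(aa_row, ss_row, gap="-"):
    """Zip two aligned rows into (amino acid, secondary structure) symbols.

    Columns where the amino acid row has a gap are skipped.
    """
    if len(aa_row) != len(ss_row):
        raise ValueError("rows must be aligned to the same length")
    return [(aa, ss) for aa, ss in zip(aa_row, ss_row) if aa != gap]


# Hypothetical aligned toy rows (not taken from the slide).
aa_toy = "LQQ-ITH"
ss_toy = "HHH-EEH"
joint_symbols = pair_tracks(aa_toy, ss_toy)
print(joint_symbols)
```

Each pair could then serve as one emission symbol, or the two components could be given separate emission distributions per state; which of these the deck uses is not stated.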

Page 18:

Trick 2

There are similarities among the 20 naturally occurring amino acids

  • Reduced alphabets

Page 19:

Reduced Alphabets

Schemes for reducing the amino acid alphabet based on the BLOSUM50 matrix by Henikoff and Henikoff (1992), derived by grouping and averaging the similarity matrix elements as described in the text (Murphy et al. 2000).
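
Re-encoding a sequence with a reduced alphabet amounts to a table lookup, as the sketch below shows. The groupings are not given on the slide; they are listed here as they are commonly quoted from Murphy et al. (2000) and should be checked against the paper before use. Representing each group by its first letter is an arbitrary choice made for this example.

```python
# Groupings as commonly quoted from Murphy et al. (2000); assumed, not from the slide.
MURPHY_15 = ["LVIM", "C", "A", "G", "S", "T", "P", "FY", "W",
             "E", "D", "N", "Q", "KR", "H"]
MURPHY_10 = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]
MURPHY_8 = ["LVIMC", "AG", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def build_mapping(groups):
    """Map every amino acid to the first letter of its group."""
    return {aa: group[0] for group in groups for aa in group}


def reduce_sequence(seq, groups):
    """Re-encode seq; characters outside the alphabet (e.g. gaps) pass through."""
    mapping = build_mapping(groups)
    return "".join(mapping.get(ch, ch) for ch in seq)


# The example sequence shown on the HMM slides, re-encoded with 10 letters.
reduced = reduce_sequence("LQQITHIANQL-GLE----KDVVRVWF", MURPHY_10)
print(reduced)
```

A smaller alphabet leaves fewer emission parameters to estimate per HMM state, at the cost of discarding distinctions between similar residues.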

Page 20:

Cross-Families Evaluations

Model (alphabet [3])     True positives [1]   False positives [2]
HMM_AA                   3                    0
HMM_AA_SS (20 letters)   227                  0
HMM_AA_SS (Murphy_15)    474                  0
HMM_AA_SS (Murphy_10)    470                  3
HMM_AA_SS (Murphy_8)     431                  5

[1] True positive: HTH motifs that are correctly identified as such.

[2] False positive: Non-HTH motifs that are identified as HTH motifs.

[3] The alphabet used to encode amino acid sequences.

Page 21:

Questions

Page 22:

Within-family Three-Fold Cross-Validations

Family (number of HTH motifs in the family)   HMM_AA   HMM_AA_SS (Murphy_15)
PF00126 (1635)                                1594     1622
PF00165 (90)                                  63       80
PF00196 (30)                                  26       30
PF04545 (164)                                 137      164
PF01022 (42)                                  39       39
PF00046 (189)                                 176      188
PF03965 (48)                                  48       48
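
The counts in this table are easier to compare across families of different sizes when expressed as fractions. The short sketch below does that arithmetic, assuming each count is the number of HTH motifs in the family that the model identified correctly (the slide does not label the columns beyond the model names).

```python
# (motifs in family, found by HMM_AA, found by HMM_AA_SS with Murphy_15)
within_family = {
    "PF00126": (1635, 1594, 1622),
    "PF00165": (90, 63, 80),
    "PF00196": (30, 26, 30),
    "PF04545": (164, 137, 164),
    "PF01022": (42, 39, 39),
    "PF00046": (189, 176, 188),
    "PF03965": (48, 48, 48),
}

# Fraction of each family's motifs recovered by each model.
fractions = {
    family: (found_aa / total, found_aa_ss / total)
    for family, (total, found_aa, found_aa_ss) in within_family.items()
}

for family, (frac_aa, frac_aa_ss) in fractions.items():
    print(f"{family}: HMM_AA {frac_aa:.1%}, HMM_AA_SS {frac_aa_ss:.1%}")
```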

Page 23:

Comparisons of HMM_AA_SS with FFAS03 in Cross-Family Evaluations

Total HTH motifs                          563
Recognized by both FFAS03 and HMM_AA_SS   135
Recognized by FFAS03 only                 24
Recognized by HMM_AA_SS only              71

Page 24:

Putative HTH motifs in Ureaplasma parvum

Protein                  Location   Annotation from UniProt
sp|Q9PQE5|SCPB_UREPA     176-214    Participates in chromosomal partition during cell division
sp|Q9PQV6|RPOB_UREPA     540-587    DNA-directed RNA polymerase
sp|Q9PR27|SYY_UREPA      340-380    Tyrosyl-tRNA synthetase
sp|Q9PQC2|SYA_UREPA      217-265    Alanyl-tRNA synthetase
sp|Q9PQ74|DPO3A_UREPA    365-400    DNA polymerase III subunit alpha
sp|Q9PQX7|Y166_UREPA     507-553    Hypothetical protein