PREDICTION OF ANTIMICROBIAL PEPTIDES USING MACHINE LEARNING METHODS
BILAL NIZAMI, M.Tech (Bioinformatics)
Under the guidance of
Dr. SUSAN THOMAS, Biomedical Informatics Center (BIC)
NIRRH, Mumbai
We will be discussing:
• The problem
• The solution
• Objectives
• Literature reviews
• Machine learning in biological problems
• Antimicrobial activity prediction
• Technical background
• Methodology
• Results
• Conclusions
• Future perspective
• Availability and publications
The Problem
• Increasing resistance to conventional antibiotics has become a global concern.
Source: Center for Global Development (CGD)
The solution
• Novel antibacterial agents.
• Antimicrobial peptides (AMPs) are potential alternatives to conventional antibiotics because of their:
1. ability to kill target cells rapidly,
2. broad spectrum of activity, and
3. modularity.
Yet another obstacle
• The exact mode of action (MOA) and structure-activity relationship (SAR) of AMPs are not completely known*.
• Several reasons contribute to this:
1. diversity in AMP sequences,
2. varied structures,
3. unordered structure in solution, and
4. unknown structures of numerous AMPs.
• Beyond high-throughput screening, large-scale synthesis methods and automated assay techniques, two other important prerequisites are (a) open-source in silico libraries of AMPs and (b) efficient computational methods.
• Such computational methods include prediction tools for antimicrobial activity.
* Rahnamaeian M. Antimicrobial peptides: modes of mechanism, modulation of defense responses. Plant Signaling & Behavior 6(9):1325-1332; 2011. Landes Bioscience.
Objectives
• Develop machine-learning-based prediction tools for antimicrobial activity.
• Compare SVM-, RF- and ANN-based prediction models.
• Assess the relative importance of various peptide descriptors for the models' predictive ability.
Literature Reviews
• AMPs are an abundant and diverse group of biomolecules.
• They are selectively lethal against microbes.
• They are found everywhere, e.g. Monera (Eubacteria), Protista (protozoans and algae), Fungi (yeasts), Plantae (plants) and Animalia (insects, fish, amphibians, reptiles, birds and mammals) (Sang Y et al. 2008).
• They exist as α-helical and β-sheet peptides.
• Differences in membrane composition, polarization and structure between eukaryotic and prokaryotic cells are responsible for their selective action (Brogden KA 2005).
• Attraction, attachment and pore formation are observed during the action of AMPs (Roland 2009).
Literature Reviews…
• Two key properties are considered in the de novo design of AMPs (Richard W. 2008; Prenner 2005):
1. a net positive charge, to interact with the negatively charged bacterial membrane;
2. an amphipathic structure, to facilitate integration into the bacterial membrane (Sarika P 2011).
[Figure: red, basic (positively charged) amino acids; green, hydrophobic amino acids. Source: Michael Zasloff (2002).]
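Both properties can be estimated roughly from sequence alone. The Python sketch below is illustrative only: it assumes a simplified charge model at neutral pH (K and R count +1, D and E count −1; histidine and the termini are ignored) and a hypothetical binary hydrophobicity scale in place of a published one. Amphipathicity is summarized as the mean hydrophobic moment, assuming an ideal α-helix with 100° of rotation per residue.

```python
import math

# Simplified charge model at neutral pH (assumption): K/R = +1, D/E = -1; His and termini ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}
# Hypothetical binary scale for illustration only: +1 for these residues, -1 for all others.
HYDROPHOBIC = set("AILMFVWC")

def net_charge(seq):
    """Approximate net charge of a peptide at neutral pH."""
    return sum(CHARGE.get(aa, 0) for aa in seq.upper())

def hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment, assuming an ideal alpha-helix (100 degrees per residue)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    sin_sum = cos_sum = 0.0
    for i, aa in enumerate(seq):
        h = 1.0 if aa in HYDROPHOBIC else -1.0
        theta = math.radians(angle_deg * i)  # angular position of residue i on the helix wheel
        sin_sum += h * math.sin(theta)
        cos_sum += h * math.cos(theta)
    # Length of the summed hydrophobicity vector, averaged per residue.
    return math.hypot(sin_sum, cos_sum) / len(seq)

if __name__ == "__main__":
    peptide = "KLAKLAKKLAKLAK"  # example sequence chosen for illustration
    print(net_charge(peptide), round(hydrophobic_moment(peptide), 3))
```

For the example sequence the net charge is +6. Swapping in a published hydrophobicity scale would change the moment values but not the calculation.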
Machine learning in Biological problems
• 1958: first attempt to model the neuronal architecture of the brain.
• 1982: Stormo et al. applied the perceptron algorithm to distinguish E. coli translational initiation sequences from other sites.
• Machine learning is employed for:
1. prediction models
2. automatic annotation
3. protein structure and function prediction
4. active site determination in proteins
5. evolutionary analysis
6. determination of binding sites on protein targets
7. biological network analysis
8. pattern discovery in biochemical pathways
9. phylogenetic tree analysis
10. identification of genetic markers of disease.
Antimicrobial activity prediction
• Several machine-learning-based prediction methods have been developed: support vector machines (SVM), discriminant analysis (DA), sliding window (SW), artificial neural networks (ANN), quantitative matrix (QM), hidden Markov models (HMM), sequence alignment (SA) and weighted finite-state transducers (WFST).
• However, a large gap still exists between what needs to be achieved and what has been achieved.
| Algorithm / method | Reference | Associated database |
|---|---|---|
| SVM | Lata et al. | AntiBP |
| ANN | Lata et al. | AntiBP |
| SW | Torrent et al. | -- |
| DA | Thomas et al. | CAMP |
| QM | Lata et al. | AntiBP |
| WFST | Whelan et al. | -- |
| HMM | Hammami et al. | PhytAMP |
| HMM | Hammami et al. | BACTIBASE |
| SA | Wang et al. | APD2 |
Antimicrobial activity prediction
• This is a challenging task because of:
1. low sequence similarity among diverse AMPs (Hancock RE 1999), and
2. their unordered conformation in solution.
• Moreover, experimental methods are costly, so good prediction models are needed.
• Physicochemical properties such as charge, size, amphipathicity, amino acid composition, structural conformation, hydrophobicity and polar angle are responsible for antimicrobial activity.
• A total of 257 peptide descriptors were used, including dipeptide and tripeptide composition, composition based on reduced alphabets, amino acid indices, charge and hydrophobicity indices.
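Composition-type descriptors are simple frequency counts. The hypothetical Python sketch below computes amino acid composition (20 values) and overlapping dipeptide composition (400 values); it is not the Perl script used in this work and does not reproduce the 257-descriptor set.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids (20 features)."""
    seq = seq.upper()
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}

def dipeptide_composition(seq):
    """Fraction of each of the 400 overlapping dipeptides (400 features)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = {dp: 0 for dp in DIPEPTIDES}
    for p in pairs:
        if p in counts:  # skips pairs containing non-standard residues such as X
            counts[p] += 1
    total = max(len(pairs), 1)
    return {dp: c / total for dp, c in counts.items()}
```

For example, `dipeptide_composition("KLAK")` assigns 1/3 to each of KL, LA and AK and 0 to the other 397 dipeptides.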
Technical background
SVMs
• Supervised learning model.
• Originally formulated for the linearly separable case.
• In 1995 it was extended to linearly non-separable cases as well.
Linear SVMs
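• Given a dataset
$$D = \{(\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathbb{R}^p,\; y_i \in \{-1, +1\}\}_{i=1}^{n},$$
where n is the number of points in D and $y_i$ are their class labels.
• The task is to determine the class label of a new data point.
• Many separating hyperplanes are possible (H0, H1, H2).
• The maximum-margin hyperplane (H0, the one with the largest separation between the classes) is preferred.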
Linear SVMs…
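• For linearly separable data, there exist a vector $\mathbf{w}$ and a scalar $b$ such that the following inequalities hold for all points in D:
$$\mathbf{w}\cdot\mathbf{x}_i + b \ge +1 \ \text{if } y_i = +1, \qquad \mathbf{w}\cdot\mathbf{x}_i + b \le -1 \ \text{if } y_i = -1,$$
or equivalently $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1$.
• We want the classifier with the lowest possible generalization error, i.e. the widest margin. The margin width is $2/\lVert\mathbf{w}\rVert$, so we minimize $\lVert\mathbf{w}\rVert$.
• The goal of the SVM is therefore to maximize the margin width while minimizing classification errors.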
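The trade-off above (wide margin, few errors) can be sketched as a soft-margin linear SVM trained by subgradient descent on the hinge loss. This is one standard way to fit a linear SVM, chosen here for brevity; it is not the solver used by kernlab, and the regularization strength `lam`, the learning rate and the toy data are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Soft-margin linear SVM trained by full-batch subgradient descent.

    Minimizes  lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b)):
    the first term widens the margin, the hinge term penalizes margin violations.
    Labels y_i must be +1 or -1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the margin (regularization) term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Class label of a new point: the side of the hyperplane w . x + b = 0 it falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

if __name__ == "__main__":
    X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]  # hypothetical toy data
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(X, y)
    print([predict(w, b, x) for x in X])
```

A larger `lam` favors a wider margin at the cost of more margin violations; a smaller one fits the training points more tightly.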
Non linear SVM
• Kernel trick: data points are non-linearly mapped into a high-dimensional feature space, where a linear separating hyperplane may be found.
• In the example shown, the transformation is $f([x, y]) = [x, y, x^2 + y^2]$.
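The effect of this transformation can be checked numerically. In the hypothetical toy data below, one class lies on a circle of radius 1 and the other on a circle of radius 3, so no straight line separates them in two dimensions; after mapping, the third coordinate $x^2 + y^2$ equals 1 for one class and 9 for the other, and the plane $z = 5$ separates them. The `kernel` function computes the same inner product as the explicit map without building the 3-D vectors, which is the essence of the kernel trick.

```python
import math

def feature_map(p):
    """f([x, y]) = [x, y, x^2 + y^2]."""
    x, y = p
    return (x, y, x * x + y * y)

def kernel(p, q):
    """Inner product of feature_map(p) and feature_map(q), computed from the 2-D points."""
    return p[0] * q[0] + p[1] * q[1] + (p[0] ** 2 + p[1] ** 2) * (q[0] ** 2 + q[1] ** 2)

# Hypothetical toy data: class -1 on a circle of radius 1, class +1 on a circle of radius 3.
inner = [(math.cos(t), math.sin(t)) for t in (0.0, 1.5, 3.0, 4.5)]
outer = [(3 * math.cos(t), 3 * math.sin(t)) for t in (0.5, 2.0, 3.5, 5.0)]

def classify(p, threshold=5.0):
    """Linear rule in the mapped space: which side of the plane z = threshold the point lies on."""
    return 1 if feature_map(p)[2] > threshold else -1
```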
Random Forest
• Ensemble learning framework.
• It grows multiple classification trees.
• A decision tree is a flowchart-like structure used to represent classification problems.
Random forest…
• Each decision tree in RF is grown as follows:
1. Sample N cases at random, with replacement, from the original data, where N is the size of the original dataset; roughly one-third of the cases are left out of each such bootstrap sample.
2. At each node, randomly select m of the M predictors (m << M); the variable among these m that provides the best split is used to split the node.
3. Each tree is grown to the largest possible extent.
• Each tree votes for a class label, and the class winning the most votes is chosen.
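The growing procedure above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions (numeric features, binary splits of the form `x[f] <= threshold` chosen by Gini impurity, toy data); it is not the `randomForest` R package used in this work, and it omits out-of-bag error and variable importance.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return (weighted_gini, feature, threshold) of the best split, or None."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best

def grow_tree(X, y, m, rng):
    """Grow an unpruned tree; only m randomly chosen features are tried at each node."""
    if len(set(y)) == 1:
        return y[0]  # pure node becomes a leaf
    features = rng.sample(range(len(X[0])), m)
    split = best_split(X, y, features)
    if split is None:  # no usable split among the sampled features
        return Counter(y).most_common(1)[0][0]
    _, f, thr = split
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            grow_tree([X[i] for i in left], [y[i] for i in left], m, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], m, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are (feature, threshold, left, right)
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def grow_forest(X, y, n_trees=25, m=1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: N draws with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest

def predict_forest(forest, x):
    """Each tree votes; the class with the most votes wins."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]
```

Here `m = 1` of `M = 2` features is tried at each node; with hundreds of descriptors, as in this work, m would be much smaller than M.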
Advantages of RF
• High prediction accuracy.
• Works well on large datasets with a large number of variables.
• Built-in variable selection based on variable importance and variable interactions.
• Handles data with missing values efficiently.
• A trained forest can be saved and reused for future predictions.
• Estimates the relationship between variables and the classification.
• Computes proximities between cases.
• Can be used for unsupervised learning and outlier detection.
• Provides an internal unbiased estimate of the generalization error (the out-of-bag error).
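The internal error estimate rests on the cases left out of each bootstrap sample. A given case is missed by one draw with probability $1 - 1/N$, so it is absent from a sample of N draws with probability $(1 - 1/N)^N$, which tends to $e^{-1} \approx 0.368$; this is where the usual "about one-third" figure comes from. The short check below compares the formula with a simulated bootstrap sample (sample sizes and seed are arbitrary choices).

```python
import random

def expected_oob_fraction(n):
    """Probability that a given case is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n

def simulated_oob_fraction(n, seed=0):
    """Fraction of cases missing from one simulated bootstrap sample of size n."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1.0 - len(drawn) / n
```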
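ANN
• Inspired by biological networks of neurons such as the central nervous system (CNS).
• Adaptive learning system.
• It consists of a dense, complex, interconnected web of units (perceptrons), analogous to the brain's neurons.
• Given inputs $x_1, \ldots, x_n$, the output of a perceptron is
$$o(x_1, \ldots, x_n) = \begin{cases} +1 & \text{if } w_0 + w_1 x_1 + \cdots + w_n x_n > 0 \\ -1 & \text{otherwise} \end{cases}$$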
ANN..
• An ANN is formed by an interconnected, complex network of perceptrons.
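A ±1 threshold unit, and the idea of combining such units into a network, can be illustrated with XOR: no single perceptron can represent it, but a two-layer network can. The weights below are hand-picked for illustration (hypothetical), not learned.

```python
def perceptron(weights, inputs):
    """Threshold unit: +1 if w0 + w1*x1 + ... + wn*xn > 0, else -1."""
    w0, *ws = weights
    total = w0 + sum(w * x for w, x in zip(ws, inputs))
    return 1 if total > 0 else -1

# Hand-picked weights (w0, w1, w2) for a two-layer network computing XOR.
HIDDEN = [(-0.5, 1.0, 1.0),   # fires for x1 OR x2
          (-1.5, 1.0, 1.0)]   # fires for x1 AND x2
OUTPUT = (-1.0, 1.0, -1.0)    # fires for OR and NOT AND

def xor_network(x1, x2):
    hidden = [perceptron(w, (x1, x2)) for w in HIDDEN]  # first layer
    return perceptron(OUTPUT, hidden)                   # second layer
```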
Perceptron learning rule
• Learning means finding a weight vector with which the perceptron produces the correct ±1 output.
• The learning rule is a method to adjust and readjust the weights.
Perceptron rule
• Assign initial weights randomly.
• Apply the perceptron iteratively to the training examples.
• Whenever the perceptron misclassifies an example, readjust the weights; repeat until all examples are classified correctly.
Delta rule
• The perceptron rule fails to converge when the data are not linearly separable.
• The delta rule is based on the gradient descent search algorithm.
• It searches a hypothesis space of possible weight vectors for the weights that best fit the training examples.
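Both learning rules can be sketched in Python. The update formulas are the standard textbook forms and are an assumption here, since the slides do not state them: the perceptron rule applies $w_i \leftarrow w_i + \eta (t - o) x_i$ after each misclassified example, while the delta rule performs gradient descent on $E(\mathbf{w}) = \tfrac{1}{2}\sum_d (t_d - o_d)^2$ using the unthresholded linear output. The learning rates, epoch counts and AND-function training data are illustrative choices.

```python
import random

def predict(w, x):
    """Perceptron output: +1 if w0 + sum(wi * xi) > 0, else -1."""
    return 1 if w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) > 0 else -1

def perceptron_rule(X, t, eta=0.1, max_epochs=1000, seed=0):
    """w_i <- w_i + eta * (t - o) * x_i, applied after each misclassified example."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(X[0]) + 1)]  # random initial weights
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, t):
            o = predict(w, x)
            if o != target:
                errors += 1
                w[0] += eta * (target - o)  # bias input is fixed at 1
                for i, xi in enumerate(x, start=1):
                    w[i] += eta * (target - o) * xi
        if errors == 0:  # every example classified correctly
            break
    return w

def delta_rule(X, t, eta=0.05, epochs=2000):
    """Batch gradient descent on E(w) = 1/2 * sum_d (t_d - o_d)^2 for a linear unit."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        delta = [0.0] * len(w)
        for x, target in zip(X, t):
            o = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))  # unthresholded output
            delta[0] += eta * (target - o)
            for i, xi in enumerate(x, start=1):
                delta[i] += eta * (target - o) * xi
        w = [wi + di for wi, di in zip(w, delta)]
    return w
```

On the AND data (inputs in {0, 1}, targets ±1), the delta rule converges to the least-squares weights $(w_0, w_1, w_2) \approx (-1.5, 1, 1)$, which, once thresholded, also classify all four examples correctly.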
Methodology
Key issues
• Data representation
• Cross-validation
• Measurement of classifier performance:
  • Sensitivity
  • Specificity
  • MCC (Matthews correlation coefficient)
  • Prediction accuracy
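The four performance measures follow directly from the confusion-matrix counts (TP, TN, FP, FN). In the sketch below, AMPs are assumed to be the positive class, and the counts in the usage example are hypothetical, not results from this work.

```python
import math

def classifier_metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics; AMPs are taken as the positive class."""
    sensitivity = tp / (tp + fn)  # fraction of true AMPs recovered
    specificity = tn / (tn + fp)  # fraction of non-AMPs correctly rejected
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}
```

For example, `classifier_metrics(90, 80, 20, 10)` gives sensitivity 0.90, specificity 0.80, accuracy 0.85 and MCC ≈ 0.70. Unlike accuracy, MCC stays informative when the two classes are of very different sizes.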
Methodology …
• CAMP currently contains 4020 AMPs.
• Sequences containing X (unknown residues) were removed.
• Redundant sequences were removed with CD-HIT (cut-off of 0.9).
• Final negative dataset: 4011 sequences.
• A Perl script was used to calculate 257 peptide features.
• The data were split into training and test sets in a 70:30 ratio.
• The best 64 features were selected by RF Gini score.
• RF: package randomForest in R, with 1000 trees and the default mtry.
• SVM: package kernlab, polynomial kernel.
• ANN: package nnet, log-linear model with 65 weights.
• Evaluation: package ROCR.
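kernlab's polynomial kernel has the form $k(\mathbf{x}, \mathbf{z}) = (\text{scale}\,\langle \mathbf{x}, \mathbf{z} \rangle + \text{offset})^{\text{degree}}$. The kernel parameters used in this work are not stated, so the degree-2 setting below is an assumption. The sketch checks that, for 2-D inputs, the degree-2 kernel equals an ordinary inner product in an explicit six-dimensional feature space, so the SVM can work in that space without ever constructing it.

```python
import math

def poly_kernel(x, z, degree=2, scale=1.0, offset=1.0):
    """Polynomial kernel k(x, z) = (scale * <x, z> + offset) ** degree."""
    return (scale * sum(a * b for a, b in zip(x, z)) + offset) ** degree

def explicit_map(x, offset=1.0):
    """Feature map whose inner product equals poly_kernel(degree=2, scale=1) for 2-D inputs."""
    x1, x2 = x
    c = offset
    r = math.sqrt(2.0)
    s = math.sqrt(2.0 * c)
    # Expanding (x.z + c)^2 gives these six terms.
    return (x1 * x1, x2 * x2, r * x1 * x2, s * x1, s * x2, c)
```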
Results
• The test dataset contains 1470 AMPs and 532 non-AMPs (NAMPs).
• RF shows the best prediction accuracy.

| Algorithm | MCC (test dataset) | Prediction accuracy (%) | AUC of ROC curve |
|---|---|---|---|
| RF | 0.87 | 94.2 | 0.98 |
| SVM | 0.82 | 92.3 | 0.97 |
| ANN | 0.74 | 87.9 | 0.94 |
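ROCR computes the AUC from the ROC curve. An equivalent definition, used in the sketch below, is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The pairwise loop is O(P × N) and is meant only to illustrate the definition; the scores in the usage example are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

`roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])` returns 0.75: three of the four positive/negative pairs are ranked correctly.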
Comparison with other prediction tools
Prediction accuracy (%):

| Server / tool | RF | SVM | ANN | SW | QM |
|---|---|---|---|---|---|
| Our method | 94.2 | 92.3 | 87.9 | -- | -- |
| AntiBP | -- | 92.1 | 88.17 | -- | 90.37 |
| AMPA | -- | -- | -- | 85 | -- |
Random Forest (RF), Support vector machines (SVM), Artificial neural network (ANN), sliding window (SW), quantitative matrix (QM)
Figure 1. Cumulative error rates of the RF model: black, overall; red, class 0 (AMP); green, class 1 (NAMP).
Figure 2. Variable importance plot; variable importance is measured by the mean decrease in Gini score.
Figure 3. Scatter plot of the RF model (red triangles, AMP; black circles, NAMP).
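The mean decrease in Gini score used for variable importance builds on the impurity decrease of individual splits. The sketch below computes that decrease for one split; the importance of a variable then adds up such decreases over every split made on that variable and averages them over all trees. The labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its two children."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
```

A split that separates the classes perfectly, e.g. `gini_decrease([0, 0, 1, 1], [0, 0], [1, 1])`, removes all of the parent's impurity (0.5), whereas a split that leaves both children mixed removes none.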
Conclusions
General conclusions
• Prediction tools are crucial for the design and synthesis of novel AMPs.
• The sequence of an AMP plays an important role in its antimicrobial activity.
• It is necessary to understand the role of peptide features in antimicrobial activity.
• Prediction accuracy relies on the relevant information contained in the descriptors.
Specific conclusions
• RF has the highest prediction performance; its ensemble nature appears to be the reason.
• The best 64 peptide features were identified.
• The prediction tools developed in this study are expected to help in identifying new potential AMPs.
Future perspectives
• Better prediction methods, by incorporating diverse peptide features and a more stringent noise-removal strategy.
• Prediction of the antimicrobial region within a peptide would be very useful.
• Developing a benchmark dataset would be a great milestone.
• Prediction based on position-specific scoring matrices (PSSM).
• Classification of predicted AMPs into subfamilies based on their functions.
Although this work has been carried out, it still leaves room for improvement in accuracy and methodology.
Availability & Publication
• Version 2 of CAMP: http://www.bicnirrh.res.in/antimicrobial/
• The CAMP version 2 manuscript has been communicated to Nucleic Acids Research (NAR), http://nar.oxfordjournals.org/.