Using Real-Valued Meta Classifiers to Integrate Binding Site Predictions
Yi Sun, Mark Robinson, Rod Adams, Paul Kaye, Alistair G. Rust, Neil Davey
University of Hertfordshire, 2005
Outline
• Problem Domain
• Description of the Datasets
• Experimental Techniques
• Experiments
• Summary
Problem Domain (1)
• Understanding the regulation of gene expression is currently one of the most exciting and active areas of research in biology.
• It is known that many of the mechanisms of gene regulation take place directly at the transcriptional or sequence level.
Problem Domain (2)
• Transcription factors will bind to a number of different but related sequences, thereby effecting changes in the expression of genes.
• The current state-of-the-art algorithms for transcription factor binding site prediction are, in spite of recent advances, still severely limited in accuracy.
Description of the Datasets (1)
• The original dataset has 68910 possible binding sites.
• Each site has a prediction result from each of 12 algorithms:
– Single sequence algorithms (7);
– Coregulatory algorithms (3);
– Comparative algorithm (1);
– Evolutionary algorithm (1).
• Each site is labelled as either a binding site or a non-binding site; about 93% are non-binding sites.
Description of the Datasets (2)
Fig. 1. Organisation of dataset, showing alignment of algorithmic predictions, known information and original DNA sequence data.
Description of the Datasets (3): Windowing
Fig. 2. The window size is set to 7 in this study. The label of the middle site of 7 consecutive prediction sites becomes the label of the new windowed input. Each windowed input now has length 12 × 7 = 84.
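The windowing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `make_windows` is hypothetical, the sites are assumed to come from one contiguous sequence, and sites too close to either end to have a full window are simply dropped (the slides do not say how edges are handled).

```python
def make_windows(predictions, labels, window=7):
    """Build windowed inputs from per-site predictions.

    predictions: list of per-site rows, one value per algorithm (12 in the slides).
    labels:      list of per-site class labels, same length as predictions.
    Returns (inputs, targets): each input concatenates `window` consecutive
    rows, and its target is the label of the middle site.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that a middle site exists")
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    half = window // 2
    inputs, targets = [], []
    # Assumption: only full windows are kept, so edge sites are dropped.
    for start in range(len(predictions) - window + 1):
        rows = predictions[start:start + window]
        inputs.append([value for row in rows for value in row])
        targets.append(labels[start + half])
    return inputs, targets


# Toy data: 10 sites, 12 algorithm outputs per site.
toy_predictions = [[site * 12 + alg for alg in range(12)] for site in range(10)]
toy_labels = list(range(10))
windowed_inputs, windowed_labels = make_windows(toy_predictions, toy_labels)
print(len(windowed_inputs), len(windowed_inputs[0]))  # 4 windows of length 84
```

With 10 sites and a window of 7 there are 4 full windows, each of length 12 × 7 = 84, labelled by sites 3 to 6.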
Sampling Techniques for Imbalanced Dataset Learning
Imbalanced Data (93% being Non-binding Sites)
• For under-sampling, we randomly select a subset of data points from the majority class.
• For over-sampling, we apply the synthetic minority over-sampling technique (SMOTE) proposed by N. V. Chawla et al.:
– For each pattern in the minority class, we search for its K nearest neighbours in the minority class using Euclidean distance.
– For continuous features, the difference of each feature between the pattern and its nearest neighbour is taken, multiplied by a random number between 0 and 1, and added to the corresponding feature of the pattern.
– For binary features, each element of the new pattern is set by majority vote over the corresponding element of the K nearest neighbours in the feature vector space.
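The two sampling steps described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names, the default `k=5`, drawing a single random number per synthetic pattern, picking the neighbour at random among the K nearest, and breaking voting ties in favour of 1 are all assumptions made for the example.

```python
import math
import random


def undersample(majority, n_keep, rng):
    """Randomly keep n_keep patterns from the majority class (no replacement)."""
    return rng.sample(majority, n_keep)


def k_nearest(index, patterns, k):
    """Indices of the k patterns closest to patterns[index] (Euclidean distance)."""
    others = [j for j in range(len(patterns)) if j != index]
    others.sort(key=lambda j: math.dist(patterns[index], patterns[j]))
    return others[:k]


def smote(minority, n_new, k=5, binary_cols=(), rng=None):
    """Create n_new synthetic minority patterns in the spirit of SMOTE."""
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority patterns")
    k = min(k, len(minority) - 1)
    binary = set(binary_cols)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))      # pattern to start from
        neighbours = k_nearest(i, minority, k)
        j = rng.choice(neighbours)            # assumption: random neighbour among the k
        gap = rng.random()                    # random number in [0, 1)
        new = []
        for f, value in enumerate(minority[i]):
            if f in binary:
                # Majority vote over the k nearest neighbours (ties go to 1).
                ones = sum(minority[n][f] for n in neighbours)
                new.append(1 if 2 * ones >= len(neighbours) else 0)
            else:
                # Interpolate between the pattern and the chosen neighbour.
                new.append(value + gap * (minority[j][f] - value))
        synthetic.append(new)
    return synthetic
```

Because each continuous feature is interpolated between two existing minority patterns, synthetic values always stay within the range already spanned by the minority class.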
Experimental Techniques
The Classification Techniques
• Majority Voting (MV);
• Weighted Majority Voting (WMV);
• Single Layer Networks (SLN);
• Rule Sets (C4.5-Rules);
• Support Vector Machines (SVM).
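As a rough illustration of the two voting schemes in this list, the sketch below combines per-algorithm votes for a single site. It assumes each algorithm's output has already been reduced to a binary vote (1 = binding site), that ties count as non-binding, and that the weights are supplied by the caller. The slides do not state how the WMV weights were chosen, so the weights here are purely hypothetical.

```python
def majority_vote(votes):
    """Return 1 if strictly more than half of the binary votes are 1."""
    return 1 if 2 * sum(votes) > len(votes) else 0


def weighted_majority_vote(votes, weights):
    """Return 1 if the weighted votes for class 1 exceed half the total weight."""
    score = sum(w * v for w, v in zip(weights, votes))
    return 1 if 2 * score > sum(weights) else 0


# Twelve hypothetical votes for one site, one per base algorithm.
votes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
# Hypothetical weights that trust the 1st, 4th and 8th algorithms most.
weights = [5, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 1]
print(majority_vote(votes), weighted_majority_vote(votes, weights))
```

In this toy case only 3 of 12 algorithms vote for a binding site, so plain majority voting rejects it, while the weighted vote accepts it because those three algorithms carry 15 of the 24 units of weight.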
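Performance Metrics

A confusion matrix:

| | Predicted non-binding | Predicted binding |
|---|---|---|
| Actual non-binding | TN | FP |
| Actual binding | FN | TP |

$$\text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP},$$

$$\text{F-Score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}},$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP},$$

$$\text{Fp-Rate} = \frac{FP}{FP + TN}.$$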
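The five metrics can be computed directly from the confusion-matrix counts. The sketch below is illustrative only: the function names and the example counts are made up, and returning 0 when a denominator is zero is a convention chosen for this example, not something stated in the slides.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (example convention)."""
    return numerator / denominator if denominator else 0.0


def f_score(recall, precision):
    """Harmonic mean of recall and precision."""
    return safe_div(2 * recall * precision, recall + precision)


def metrics(tp, fp, tn, fn):
    """Recall, precision, F-score, accuracy and FP rate from confusion counts."""
    recall = safe_div(tp, tp + fn)
    precision = safe_div(tp, tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "f_score": f_score(recall, precision),
        "accuracy": safe_div(tp + tn, tp + fn + tn + fp),
        "fp_rate": safe_div(fp, fp + tn),
    }


# Hypothetical counts: a classifier that labels every site as non-binding
# on data with 93% non-binding sites.
print(metrics(tp=0, fp=0, tn=93, fn=7))

# Consistency check against the "Best Alg." row of Table 1:
# recall 40.95% and precision 17.46% give an F-score of about 24.48%.
print(round(100 * f_score(0.4095, 0.1746), 2))
```

The first call shows why accuracy alone is misleading on this data: predicting "non-binding" everywhere scores 93% accuracy with zero recall, which is why recall, precision and F-score are reported alongside it.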
Experiments (1): Consistent Dataset

| Inputs | Classifier | Recall | Precision | F-Score | Accuracy | Fp-Rate |
|---|---|---|---|---|---|---|
| single | Best Alg. | 40.95 | 17.46 | 24.48 | 82.22 | 14.66 |
| single | MV | 43.10 | 13.14 | 20.14 | 75.95 | 21.57 |
| single | WMV | 41.19 | 17.35 | 24.42 | 82.05 | 14.86 |
| single | SLN | 28.81 | 22.16 | 25.05 | 87.86 | 7.66 |
| single | SVM | 32.14 | 24.46 | 27.78 | 88.23 | 7.52 |
| single | C4.5Rules | 29.29 | 23.08 | 25.81 | 88.15 | 7.39 |
| windowed | SLN | 34.29 | 18.87 | 24.34 | 85.00 | 11.16 |
| windowed | SVM | 38.81 | 20.25 | 26.61 | 84.93 | 11.58 |
| windowed | C4.5Rules | 23.57 | 18.64 | 20.82 | 87.38 | 7.79 |

Table 1: Common performance metrics (%) tested on the same consistent set of possible binding sites, with single and windowed inputs separately.
Experiments (2): Full Dataset

| Inputs | Classifier | Recall | Precision | F-Score | Accuracy | Fp-Rate |
|---|---|---|---|---|---|---|
| single | Best Alg. | 36.36 | 18.40 | 24.44 | 85.97 | 10.73 |
| single | MV | 35.73 | 15.12 | 21.25 | 83.48 | 13.35 |
| single | WMV | 34.75 | 20.04 | 25.42 | 87.28 | 9.23 |
| single | SLN | 25.19 | 25.09 | 25.14 | 90.64 | 5.01 |
| single | SVM | 27.91 | 26.97 | 27.43 | 90.79 | 5.03 |
| single | C4.5Rules | 23.03 | 23.14 | 23.08 | 90.43 | 5.09 |
| windowed | SLN | 31.82 | 22.66 | 26.47 | 88.97 | 7.23 |
| windowed | SVM | 36.78 | 23.50 | 28.67 | 88.58 | 7.97 |
| windowed | C4.5Rules | 22.26 | 19.70 | 20.90 | 89.49 | 6.04 |

Table 2: Common performance metrics (%) tested on the full test dataset with single and windowed inputs separately.
Experiments (3)
Fig. 3. ROC graph: five classifiers applied to the consistent test set with single inputs.
Experiments (4)
Fig. 4. ROC graph: three classifiers applied to the full test set with windowed inputs.
Summary
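• Integrating the 12 algorithms with an SVM meta-classifier considerably improves binding site prediction: on the full test set, the windowed SVM raises the F-Score from 24.44% (best single algorithm) to 28.67% and lowers the Fp-Rate from 10.73% to 7.97%.
• Using a ‘window’ of consecutive results as the input vector gives the classifier the context of neighbouring results, so that it can use the distribution of the data to improve binding site prediction.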