Predicting Drug-gene and Drug-disease Networks using Functional Flow Bioinformatics Capstone Project...

Predicting Drug-gene and Drug-disease Networks using Functional Flow

Bioinformatics Capstone Project

School of Informatics, Indiana University

Bloomington, Indiana

Ryan Tran Rene

Purpose: Given putative drug associations with genes, find other drugs that may be associated with those genes.

For each unique gene, Functional Flow will be used to determine which unannotated drugs are most likely to interact with that gene.

The method is based on the similarity of the molecular fingerprints of the drugs.

Methods, Algorithms, Results and Conclusions

Unique drugs (pcid) -> Daylight SMILES -> molecular fingerprints (gNova; MACCS) -> Tanimoto scores T(u,v) -> edges between nodes E(u,v): 0 or 1

Known drug-gene interactions -> unique genes (pcid)

For each unique gene: Functional Flow from annotated drugs (R = ∞) to unannotated drugs (R = 0)

Large functional flows to unannotated drugs may indicate new drug-gene interactions.

Matador (Gene Name + PubChem ID)

DrugBank (HGNC ID number + PubChem ID)

HGNC database (Gene Name to HGNC ID)

Goal: To create two databases, one mapping genes to drugs (PubChem ID) and one mapping diseases to drugs, plus a mapping from PubChem ID to molecular fingerprints.

PDB (PDB ID number + chemical compound name)

UniProt (PDB ID to HGNC ID)

Script (chemical name to PubChem ID)

HGNC database (HGNC ID to Gene Name)

PharmGKB (disease name to gene name) (disease name to drug PubChem ID)

Tools for parsing & scripting: perl, awk, sed, UNIX, Excel, MATLAB (Log-Log), eliminate duplicate pairs, …
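The merging step can be illustrated with a minimal Python sketch. It is not the project's actual pipeline (which used perl, awk, sed and related tools); the function name and all identifiers in the example data are hypothetical placeholders. It assumes Matador pairs are keyed by gene name, DrugBank pairs by HGNC ID, and that the HGNC table supplies the gene-name-to-HGNC-ID lookup.

```python
def merge_gene_drug_pairs(matador_pairs, drugbank_pairs, name_to_hgnc):
    """Merge (gene, PubChem ID) pairs from two sources, keyed by HGNC ID.

    matador_pairs:  iterable of (gene_name, pubchem_id)
    drugbank_pairs: iterable of (hgnc_id, pubchem_id)
    name_to_hgnc:   dict mapping gene name -> HGNC ID
    Returns (sorted unique pairs, Matador pairs whose gene name had no HGNC ID).
    """
    pairs = set(drugbank_pairs)          # a set eliminates duplicate pairs
    unmapped = []
    for gene_name, pubchem_id in matador_pairs:
        hgnc_id = name_to_hgnc.get(gene_name)
        if hgnc_id is None:
            unmapped.append((gene_name, pubchem_id))
        else:
            pairs.add((hgnc_id, pubchem_id))
    return sorted(pairs), unmapped


# Hypothetical placeholder records, not real database entries.
name_to_hgnc = {"GENE_A": 101, "GENE_B": 102}
matador = [("GENE_A", 5001), ("GENE_B", 5002), ("GENE_X", 5003)]
drugbank = [(101, 5001), (102, 5004)]

merged, unmapped = merge_gene_drug_pairs(matador, drugbank, name_to_hgnc)
print(merged)
print(unmapped)
```

Gene names that cannot be mapped are returned separately rather than dropped silently, so they can be checked by hand.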

Daylight SMILES (from PubChem ID)

MACCS structural key molecular fingerprints (gNova; from SMILES)

Example: Sucrose, PubChem ID = 1115

SMILES: OC1C(OC(CO)C(O)C1O)OC2(CO)OC(CO)C(O)C2O


Tanimoto coefficient (extended Jaccard coefficient)

$$T(u,v) = \frac{u \cdot v}{\|u\|^2 + \|v\|^2 - u \cdot v}, \qquad 0 \le T(u,v) \le 1$$

Molecular fingerprints (0's and 1's):

$u = (1,0,1,1,0,1,0,0,1)$, so $\|u\|^2 = u \cdot u = 5$

$v = (0,1,1,1,1,0,1,0,1)$, so $\|v\|^2 = v \cdot v = 6$

Shared bits $(0,0,1,1,0,0,0,0,1)$, so $u \cdot v = 3$

$T(u,v) = 3/(5+6-3) = 3/8$

Random fingerprints (N large):

$u = (1,0,1,0,\ldots,1,0,1,0)$, so $\|u\|^2 \to N/2$

$v = (1,0,0,1,\ldots,1,0,0,1)$, so $\|v\|^2 \to N/2$

Shared bits $(1,0,0,0,\ldots,1,0,0,0)$, so $u \cdot v \to N/4$

$T(u,v) \to (N/4)/(N/2+N/2-N/4) = 1/3$

Edges between nodes:

$$E(u,v) = \begin{cases} 1, & T(u,v) \ge \text{threshold} \\ 0, & T(u,v) < \text{threshold} \end{cases}$$
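The Tanimoto score and the edge rule can be checked with a short Python sketch. The function names are illustrative, and returning 0 for two all-zero fingerprints is a convention assumed here, not something stated on the slides.

```python
import random


def tanimoto(u, v):
    """Tanimoto coefficient of two equal-length 0/1 fingerprints."""
    if len(u) != len(v):
        raise ValueError("fingerprints must have the same length")
    dot = sum(a * b for a, b in zip(u, v))      # u . v = number of shared on-bits
    uu = sum(a * a for a in u)                  # ||u||^2
    vv = sum(b * b for b in v)                  # ||v||^2
    denom = uu + vv - dot
    return dot / denom if denom else 0.0        # assumed convention for all-zero input


def edge(u, v, threshold):
    """E(u,v): 1 if the Tanimoto score reaches the threshold, else 0."""
    return 1 if tanimoto(u, v) >= threshold else 0


u = (1, 0, 1, 1, 0, 1, 0, 0, 1)
v = (0, 1, 1, 1, 1, 0, 1, 0, 1)
print(tanimoto(u, v))        # 0.375, i.e. 3/8
print(edge(u, v, 0.8))       # 0: below an 80% threshold

# Random fingerprints with bits on with probability 1/2 give scores near 1/3.
rng = random.Random(0)
n_bits = 20000
ru = [rng.randint(0, 1) for _ in range(n_bits)]
rv = [rng.randint(0, 1) for _ in range(n_bits)]
random_score = tanimoto(ru, rv)
print(random_score)
```

The random-fingerprint score is the reason thresholds well above 1/3 (60% to 80% in the results) are needed before an edge carries information.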


Iterated Functional Flow

[Figure: drug-drug network showing flow from drug D5 (u). Nodes are annotated (R_0 = ∞) or not annotated (R_0 = 0); edges include E(D1,D5), E(D2,D5), E(D5,D7) and E(D5,D8); arrows mark 1st-iteration and 2nd-iteration flows.]

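Note: Nabieva et al. (2005) accidentally omitted $R^a_{t-1}(u)$ from their published equation for $E'(u,v)$.

Initial reservoirs for gene a:

$$R^a_0(u) = \begin{cases} \infty, & \text{node (drug) } u \text{ annotated for gene } a \\ 0, & \text{else} \end{cases}$$

Edge capacity rescaled by the reservoir of the source node:

$$E'(u,v) = \frac{E(u,v)\, R^a_{t-1}(u)}{\sum_y E(u,y)}, \qquad \sum_y E'(u,y) = R^a_{t-1}(u)$$

Flow from u to v at iteration t:

$$g^a_t(u,v) = \begin{cases} 0, & R^a_{t-1}(v) > R^a_{t-1}(u) \\ \min[E(u,v),\, E'(u,v)], & R^a_{t-1}(u) > R^a_{t-1}(v) \end{cases}$$

Example, 2nd iteration: u = D5, v = D6, $R_1(u) = 3$, $E'(u,v) = 1 \cdot 3/6 = 1/2$, so the flow is $g(u,v) = \min[1, 1/2] = 1/2$.

Reservoirs increase by net flow into nodes:

$$R^a_t(u) = R^a_{t-1}(u) + \sum_y g^a_t(y,u) - \sum_y g^a_t(u,y)$$

Functional score = sum of all flows into a node during all iterations:

$$f_a(u) = \sum_t \sum_y g^a_t(y,u)$$

Functional Flow Input and Output

Input: the initial reservoirs $R^a_0 = (\infty, 0, \ldots, 0, \infty, \infty, 0, \ldots, 0)$ and the N × N edge matrix

$$E = \begin{pmatrix} 0 & E_{1,2} & E_{1,3} & \cdots & E_{1,N} \\ E_{2,1} & 0 & E_{2,3} & \cdots & E_{2,N} \\ E_{3,1} & E_{3,2} & 0 & \cdots & E_{3,N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ E_{N,1} & E_{N,2} & \cdots & E_{N,N-1} & 0 \end{pmatrix}$$

Output: the functional scores $f_a(u)$.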

Functional Flow Algorithm

for t = 2 : d + 1                          % iteration t-1
    f(t, :) = f(t - 1, :);
    for u = 1 : N-1
        for v = u+1 : N
            % no flow if E(u, v) = 0
            if E(u, v) ~= 0
                if R(u) > R(v)
                    % compute flow from u to v : ...
                    g = min(E(u, v), R(u) * W(u, v));
                    S(v) = S(v) + g;
                    S(u) = S(u) - g;
                    f(t, v) = f(t, v) + g;
                elseif R(v) > R(u)
                    % compute flow from v to u : ...
                    g = min(E(u, v), R(v) * W(v, u));
                    S(u) = S(u) + g;
                    S(v) = S(v) - g;
                    f(t, u) = f(t, u) + g;
                end
            end
        end
    end
    R(:) = S(:);                           % ...
end
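For readers without MATLAB, the same loop can be sketched in Python. This is a transcription under assumptions, not the project's code: it takes W(u,v) to be E(u,v) divided by the sum of E(u,y) over y, initializes the working copy S from R at the start of each iteration, and returns only the final scores. The seven-node test network is hypothetical; it mirrors the D5 example by giving one hub three annotated and three unannotated neighbors.

```python
import math


def functional_flow(E, annotated, d):
    """Iterated functional flow on a symmetric edge matrix E (list of lists).

    annotated: set of node indices that start with infinite reservoirs.
    d: number of iterations.
    Returns the functional score (total inflow) of every node.
    """
    n = len(E)
    R = [math.inf if u in annotated else 0.0 for u in range(n)]
    out_sum = [sum(row) for row in E]           # sum_y E(u, y)
    score = [0.0] * n
    for _ in range(d):
        S = list(R)                             # reservoirs being updated
        for u in range(n - 1):
            for v in range(u + 1, n):
                if E[u][v] == 0 or R[u] == R[v]:
                    continue                    # no edge, or no downhill direction
                src, dst = (u, v) if R[u] > R[v] else (v, u)
                # E'(src, dst) = E(src, dst) * R(src) / sum_y E(src, y)
                scaled = E[src][dst] * R[src] / out_sum[src]
                g = min(E[src][dst], scaled)
                S[dst] += g
                S[src] -= g
                score[dst] += g
        R = S
    return score


# Hypothetical network: nodes 0-2 annotated, node 3 is the hub, nodes 4-6 unannotated.
N = 7
E = [[0] * N for _ in range(N)]
for neighbor in (0, 1, 2, 4, 5, 6):
    E[3][neighbor] = E[neighbor][3] = 1

scores = functional_flow(E, {0, 1, 2}, 2)
print(scores)    # hub receives 3 per iteration; each unannotated neighbor receives 1/2
```

After the first iteration the hub holds a reservoir of 3 and has six edges, so the second-iteration flow to each unannotated neighbor is min(1, 1 · 3/6) = 1/2, matching the worked example.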

Functional Flow - Application and Tests

[Diagram labels: genes, drugs; unique genes; unique drugs; annotated (R = ∞); unannotated (R = 0); test drugs (R = ∞); test drugs (R = 0)]

Repeat the process for each gene associated with a minimal number of drugs.

Drug Search (Application): ranking by sorted scores.

Leave-one-out cross-validation and random numbers: sorted* scores, then precision & recall, precision-recall plot, average over unique genes.

* Not necessary to sort scores for LOOCV
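The leave-one-out step can be sketched in Python as follows. The scorer here is a deliberately simple stand-in (a count of edges to the remaining annotated drugs), not Functional Flow itself, and the five-drug network is hypothetical. The sketch also shows why sorting is unnecessary for LOOCV: the rank of the held-out drug only requires counting how many candidates score higher. Ties are resolved optimistically, which is an assumption.

```python
def loocv_ranks(n_drugs, annotated, score_fn):
    """Rank each annotated drug after hiding its annotation.

    score_fn(training) must return one score per drug, given the set of
    drugs that keep their annotation. Returns {held_out_drug: rank}.
    """
    ranks = {}
    for held_out in sorted(annotated):
        training = annotated - {held_out}
        scores = score_fn(training)
        candidates = [u for u in range(n_drugs) if u not in training]
        held_score = scores[held_out]
        # Rank = 1 + number of candidates scoring strictly higher (no sort needed).
        ranks[held_out] = 1 + sum(1 for u in candidates if scores[u] > held_score)
    return ranks


# Hypothetical 5-drug similarity network.
E = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]


def toy_scores(training):
    """Stand-in scorer: number of edges from each drug to the training drugs."""
    return [sum(E[u][t] for t in training) for u in range(len(E))]


ranks = loocv_ranks(len(E), {0, 1, 4}, toy_scores)
print(ranks)
```

Any scorer with the same signature, including a functional flow routine, could be passed in place of `toy_scores`.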

Information Retrieval:

Precision = items found / items retrieved

Recall = items found / items sought

Classification:

Precision = True Pos / (True Pos + False Pos)

Recall = True Pos / (True Pos + False Neg) = True Pos / # Positives

Leave-one-out cross-validation (LOOCV)

Omit, then rank by Functional Flow, each of Drug 1, Drug 2 and Drug 3 in turn (ranks k = 1 to 7).

k    Prec.    Recall    F1
1    1/3      1/3       0.33
2    1/6      1/3       0.22
3    2/9      2/3       0.33
4    3/12     3/3       0.40
5    3/15     3/3       0.33
6    3/18     3/3       0.29
7    3/21     3/3       0.25

F1 measure = 2 • prec. • recall / (prec. + recall)
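The table can be reproduced with a few lines of Python. The sketch assumes the three omitted drugs were ranked 1st, 3rd and 4th in their respective runs, which is what the precision and recall columns imply; which drug received which rank is not stated, and the function name is illustrative.

```python
def precision_recall_f1(ranks, k):
    """Precision, recall and F1 when the top k ranks of every LOOCV run are retrieved.

    ranks: rank of the omitted drug in each leave-one-out run.
    """
    found = sum(1 for r in ranks if r <= k)     # true positives
    retrieved = k * len(ranks)                  # k items retrieved per run
    precision = found / retrieved
    recall = found / len(ranks)
    f1 = 0.0 if found == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


ranks = [3, 1, 4]   # assumed ranks of the three omitted drugs
table = [precision_recall_f1(ranks, k) for k in range(1, 8)]
for k, (p, r, f) in enumerate(table, start=1):
    print(f"{k}  {p:.3f}  {r:.3f}  {f:.2f}")
```

F1 peaks at k = 4, the smallest k at which all three omitted drugs are recovered.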

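LOOCV results (Classifications)

Each row is one leave-one-out run and each column is a rank from 1 to 7; the top k ranks are retrieved.

rank     1  2  3  4  5  6  7

k = 1    FP TN FN TN TN TN TN
         TP TN TN TN TN TN TN
         FP TN TN FN TN TN TN

k = 2    FP FP FN TN TN TN TN
         TP FP TN TN TN TN TN
         FP FP TN FN TN TN TN

k = 3    FP FP TP TN TN TN TN
         TP FP FP TN TN TN TN
         FP FP FP FN TN TN TN

k = 4    FP FP TP FP TN TN TN
         TP FP FP FP TN TN TN
         FP FP FP TP TN TN TN

Precision = TP/(TP+FP)

Recall = TP/(TP+FN) = TP / (# positives)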

Information Retrieval vs. Classifications

[Figure: the LOOCV grids for k = 1 to 4 repeated with the true negatives (TN) left blank, since precision and recall do not use TN.]

Precision = items found / items retrieved = TP/(TP+FP)

Recall = items found / items sought = TP/(TP+FN)

LOOCV Results

Parameters:

Minimum number of annotated drugs

Number of functional flow iterations

Tanimoto threshold for non-zero edge

Precision-Recall Plots:

Leave-One-Out cross-validation for rankings k of 1 through 50; averages for genes to which LOOCV was applied

Random rankings

Comparison of 4 vs. 10 iterations for a minimum of 25 annotated drugs/unique gene and a Tanimoto threshold of 80%

[Figure: two precision-recall plots, LOOCV vs. random rankings; one panel is titled "threshold 80, annotated 25, intervals 10".]

10 iterations is too many (low precision). Note: prec.(1) = recall(1).

Comparison of 4 vs. 8 iterations for a minimum of 50 annotated drugs/unique gene and a Tanimoto threshold of 80%

[Figure: two precision-recall plots, LOOCV vs. random rankings; one panel is titled "threshold 80, annotated 50, iterations 8".]

8 iterations is too many (low precision). Note again: for the top-ranked LOOCV functional flow scores (k = 1), precision equals recall.

Comparison of 25 vs. 50 minimum numbers of annotated drugs/unique gene (for 4 iterations and a Tanimoto threshold of 80%)

[Figure: two precision-recall plots, LOOCV vs. random rankings.]

Requiring at least 50 annotated drugs increased precision and recall significantly.

Effects of averaging: KMAX = min(50, # annotated drugs - 1)
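One way the KMAX rule could interact with averaging is sketched below in Python. The rule KMAX = min(50, # annotated drugs - 1) is from the slide; the averaging rule (at each k, average over only those genes whose KMAX is at least k) is an assumption, and the gene names and ranks are hypothetical.

```python
def average_precision_curve(ranks_by_gene, k_cap=50):
    """Average per-gene LOOCV precision at each k.

    ranks_by_gene: {gene: [rank of each omitted annotated drug]}.
    Each gene contributes only for k <= KMAX = min(k_cap, #annotated - 1).
    """
    curves = []
    for ranks in ranks_by_gene.values():
        n = len(ranks)
        kmax = min(k_cap, n - 1)
        curves.append([sum(1 for r in ranks if r <= k) / (k * n)
                       for k in range(1, kmax + 1)])
    longest = max((len(c) for c in curves), default=0)
    averaged = []
    for k in range(1, longest + 1):
        values = [c[k - 1] for c in curves if len(c) >= k]   # genes with KMAX >= k
        averaged.append(sum(values) / len(values))
    return averaged


# Hypothetical LOOCV ranks for two genes with 3 and 5 annotated drugs.
curve = average_precision_curve({"gene_A": [3, 1, 4], "gene_B": [1, 1, 2, 6, 9]})
print(curve)
```

Under this assumption fewer genes contribute at larger k, so the averaged curve becomes noisier toward the right-hand end.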

Comparison of 60 vs. 80% Tanimoto thresholds (for 4 iterations and a minimum number of 50 annotated drugs/unique gene)

[Figure: precision-recall plots, LOOCV vs. random rankings.]

Increasing the Tanimoto score threshold from 60% to 80% doubled the precision.

For a Tanimoto score threshold of 60% the precision is low. The results are quite variable for k > 28 with fewer annotated drugs.

[Figure: precision-recall plots, LOOCV vs. random rankings.]

Requiring at least 25 annotated drugs increased precision significantly, but predictions using fewer annotated drugs may nevertheless be useful.

[Figure: precision-recall plot, LOOCV vs. random rankings.]

Comparison of 70 vs. 80% Tanimoto thresholds (for 4 iterations and a minimum number of

Increasing the Tanimoto score threshold from 70% to 80% decreased the precision for the top-ranked scores (k=1).

[Figure: precision-recall plot, LOOCV vs. random rankings.]

Using Clustered Drugs: Comparison of 60 vs. 70% Tanimoto thresholds (for 4 iterations and a minimum number of 25 annotated drugs/unique gene; graphconncomp)

[Figure: two precision-recall plots, LOOCV vs. random rankings; one panel is titled "cluster threshold 60, annotated 25, iterations 4".]

Average precision of > 6% achieved for top-ranked drugs (k=1) using clustered drugs only.
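The slide names MATLAB's graphconncomp for the clustering step. A plain-Python sketch of the same idea, finding connected components of the thresholded drug graph by breadth-first search, is given below. Treating "clustered drugs" as drugs in components with at least two members is an assumption, and the five-drug matrix is hypothetical.

```python
from collections import deque


def connected_components(E):
    """Label the connected components of an undirected graph.

    E: symmetric 0/1 adjacency matrix (list of lists).
    Returns (number of components, component label per node).
    """
    n = len(E)
    label = [-1] * n
    count = 0
    for start in range(n):
        if label[start] != -1:
            continue                         # already assigned to a component
        label[start] = count
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if E[u][v] and label[v] == -1:
                    label[v] = count
                    queue.append(v)
        count += 1
    return count, label


def clustered_nodes(E, min_size=2):
    """Nodes whose component has at least min_size members (assumed meaning of 'clustered')."""
    _, label = connected_components(E)
    sizes = {}
    for c in label:
        sizes[c] = sizes.get(c, 0) + 1
    return [u for u, c in enumerate(label) if sizes[c] >= min_size]


# Hypothetical thresholded graph: drugs 0-1-2 form a chain, drugs 3 and 4 are isolated.
E = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
n_components, labels = connected_components(E)
print(n_components, labels)
print(clustered_nodes(E))
```

Isolated drugs have no edges and can never receive flow, which is one plausible reason restricting the analysis to clustered drugs raises the measured precision.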

Using Clustered Drugs: 70% Tanimoto threshold (for 6 iterations and a minimum number of

Average precision of > 6% achieved for top-ranked drugs (k=1) using clustered drugs only.

[Figure: precision-recall plot, LOOCV vs. random rankings.]

Disease to Drugs: 80% Tanimoto threshold (4 iterations and a minimum number of 50 annotated drugs/unique disease)

[Figure: precision-recall plot, LOOCV vs. random rankings, titled "Disease to Drugs 80% threshold 50 annotations 4 intervals".]

Average precision for top ranks (k=1) is only 2%, but LOOCV precision is double that of the random model for k < 10.

Conclusions

With Tanimoto thresholds of 70-80% and relatively few iterations (~4), Functional Flow may be useful for predicting new drugs that will interact with genes and diseases.

Decisions on parameters will depend on the economics of trading less precision for greater recall (increasing k) and on the performance of Leave-One-Out Cross-Validation (LOOCV) for the genes and diseases that are of most interest.

If you look at more rankings you find more drugs, but you have to test more drugs.

References

Nabieva, et al., 2005, Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps: Bioinformatics, 21, Suppl. 1, i302–i310.

MacCuish, J. D., and MacCuish, N. E., 2003, Mesa Suite Version 1.2: Fingerprint Module: Mesa Analytics & Computing, LLC.

Brown, R. D., and Martin, Y. C., 1996, Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection: J. Chem. Inf. Comput. Sci., 36, 572-584.

Gunther, et al., 2007, SuperTarget and Matador: resources for exploring drug-target relationships: Nucleic Acids Research, 1-4.

Acknowledgments

Special thanks to Drs. Predrag Radivojac, David Wild, Sun Kim, Mehemet Dalkilic, Rajarshi Guha, Haixu Tang and the faculty of Bioinformatics and Cheminformatics. Also thanks to Jefferson Davis (Math/Stat), Bob Konicek, and of course Linda Hostetter.

Thank you all and enjoy the rest of the summer!
