Pharmacology Dentalelle Tutoring. Page 9 = Pharmacology abbreviation.
Predicting Pharmacology
-
Upload
willem-van-hoorn -
Category
Technology
-
view
666 -
download
1
description
Transcript of Predicting Pharmacology
WP van Hoorn, Feb 20061
Predicting Pharmacology
Willem van Hoorn
Pfizer Global Research & Development
Sandwich
UK
Pipeline Pilot UGM, San Diego, Mar 2006
WP van Hoorn, Feb 20062
Willem van Hoorn
Standing on the Shoulders of Giants
Gaia Paolini
Richard Shapland
Andrew Hopkins
Jonathan Mason
WP van Hoorn, Feb 20063
The Work of Giants
4.8 M structures
275k active compounds
600k activities (IC50, etc)
3k targets
800 human targets
InpharmaticaStARLITe
CerepBioprint
ThomsonIDDB
Pfizer in house
• Oracle / DayCard cartridge• Structures stored as smiles• Pipeline Pilot:• Canonical tautomers, salt stripping, etc• Access: ODBC components + web service• Pfizer compound structure retrieval
Unified DB
WP van Hoorn, Feb 20064
Why Giants Are Required
WP van Hoorn, Feb 20065
Unified DB
Unified Database as Starting Point
Bayesian Learn Molecular Categories
Predicting activities
Linear Discriminant Analysis (LDA)
Predicting gene families
Polypharmacology interaction network
WP van Hoorn, Feb 20066
MetalloproteasesMetalloproteases
Cysteine proteasesCysteine proteases
Serine proteasesSerine proteases
PhosphodiesterasesPhosphodiesterases Aminergic GPCRsAminergic GPCRs
Peptide GPCRsPeptide GPCRs
GPCRs (others: classes A, B & C)GPCRs (others: classes A, B & C)
Enzymes Enzymes (hydrolases, transferases, oxidoreductases & others)(hydrolases, transferases, oxidoreductases & others)
Ion ChannelsIon Channels
Nuclear hormoneNuclear hormonereceptorsreceptors
Aspartyl proteasesAspartyl proteases
KinasesKinases
MiscellaneousMiscellaneous
Polypharmacology Network From Binding Data
Node : targetEdge : compound
WP van Hoorn, Feb 20067
Deriving Multi-Category Bayesian Model
Unified DB
238k actives (≤ 10 µM),human target, Mw < 1000,pass reactivity filter,≥ 10 actives / target
FCFP_6
90% / 214k 10% / 23,792
55,781 activities
698 models
WP van Hoorn, Feb 20068
Assessing the Predictions of the Random Test Set
Large number of predictions:• 23,792 * 698 ~ 16.6M• 55,781 activities, rest unknown presumed inactive• Interpretation of Bayesian score?• Score ≥ cut-off : active, rest inactive• # predicted actives = F(cut-off)
Comparison with random:• For each cut-off: calculate number of predicted actives• Generate exactly same number of random predicted actives
WP van Hoorn, Feb 20069
50
Assessing the Predictions of the Random Test Set
58,428 predictions / 17,210 compounds16,281 compounds ≥1 correct prediction31,600 true positives (random: 292)Enrichment ~ 100 fold26,828 false positives (random: 55,489)24,181 false negatives
WP van Hoorn, Feb 200610
Nuclear hormone receptorsNuclear hormone receptors
Ion ChannelsIon Channels
PhosphodiesterasesPhosphodiesterases
AminergicAminergicGPCRsGPCRs
PeptidePeptideGPCRsGPCRs
GPCRs (others)GPCRs (others)
Enzymes Enzymes (others)(others)
True positive prediction
False positive prediction
Predicted Polypharmacology Network At Bayesian Cut-off 50
WP van Hoorn, Feb 200611
Predicted Polypharmacology Network At Bayesian Cut-off 50
• At confidence level 50, most predictions are intra gene class• Quite a few false positive connections coincide with true positives• Exceptions: Ion Channels, Enzymes-others• Although the prediction is wrong, the connection is right?• Or the prediction is right and the connection is false negative (not measured?)• Most interesting part of predicted connections to test• Compare to Peter Willett’s work in similarity searches:
(Next) Nearest neighbours of inactive nearest neighbours are equal likely to
be active as nearest neighbours themselves: J. Med. Chem. 2005, 48, 7049
WP van Hoorn, Feb 200612
A More Challenging Test Set: Cerep Bioprint
Unified DB
238k actives (≤ 10 µM),human target, Mw < 1000,pass reactivity filter,≥ 10 actives / target
FCFP_6
237k
Bioprint997 compounds316 targets
694 models
WP van Hoorn, Feb 200613
A More Challenging Test Set: Cerep Bioprint
50
720 predictions / 291 compounds210 compounds ≥1 correct prediction433 true positives (random: 17)Enrichment ~ 25 fold287 false positives (random: 55,489)12,281 false negatives
WP van Hoorn, Feb 200614
Another Look At The Same Data
0
36,222 predictions 6,121 true positives30,101 false positives6,593 false negatives48% of actives in 11% of dataPlus 378 extra predicted targets
WP van Hoorn, Feb 200615
A More Challenging Test Set: Cerep Bioprint
• Bioprint harder to predict than 10% random test set • Data can be interpreted depending on need• Few high confidence predictions, appropriate for triaging HTS hits• Many low confidence predictions, appropriate for risk assessment of lead
WP van Hoorn, Feb 200616
length
height
left rim bottom rim
H. LohningerTeach/Me Data Analysishttp://www.vias.org/tmdatanaleng
Linear Discriminant Analysis
diagonal
NOTE Length Left Right Bottom Top Diagonal GenuineBN1 214.8 131.0 131.1 9.000 9.700 141.0 true
BN2 214.6 129.7 129.7 8.100 9.500 141.7 true
BN3 214.8 129.7 129.7 8.700 9.600 142.2 true
BN4 214.8 129.7 129.6 7.500 10.40 142.0 true
BN5 215.0 129.6 129.7 10.40 7.700 141.8 true
BN6 215.7 130.8 130.5 9.000 10.10 141.4 true
BN7 215.5 129.5 129.7 7.900 9.600 141.6 true
BN8 214.5 129.6 129.2 7.200 10.70 141.7 true
BN9 214.9 129.4 129.7 8.200 11.00 141.9 true
BN10 215.2 130.4 130.3 9.200 10.00 140.7 true
…. …. …. …. …. …. …. ….
BN195 214.9 130.3 130.5 11.60 10.60 139.8 false
BN196 215.0 130.4 130.3 9.900 12.10 139.6 false
BN197 215.1 130.3 129.9 10.30 11.50 139.7 false
BN198 214.8 130.3 130.4 10.60 11.10 140.0 false
BN199 214.7 130.7 130.8 11.20 11.20 139.4 false
BN200 214.3 129.9 129.9 10.20 11.50 139.6 false
• Similar to PCA which tries to represent classes• Tries to discover what distinguishes classes• Compare letters: O and Q• PCA focuses on circle, LDA on tail• Web example: distinguish between genuine and false banknotes• Training set: 200 banknotes, 100 genuine / 100 forgeries
WP van Hoorn, Feb 200617
Predicting Forgeries with LDA and Bayesian
NOTE Length Left Right Bottom Top Diagonal BankNotes LD1
BN1 215.1 130.0 129.8 9.100 10.20 141.5 true 2.501
BN2 214.7 130.7 130.8 11.20 11.20 139.4 false -4.561
BN3 214.3 129.9 129.9 10.20 11.50 139.6 false -3.390
BN4 214.7 130.0 129.4 7.800 10.00 141.2 true 4.060
NOTE Length Left Right Bottom Top Diagonal BankNotesBayes
BN1 215.1 130.0 129.8 9.100 10.20 141.5 1.992
BN2 214.7 130.7 130.8 11.20 11.20 139.4 -6.611
BN3 214.3 129.9 129.9 10.20 11.50 139.6 -6.341
BN4 214.7 130.0 129.4 7.800 10.00 141.2 1.771
LDA
Bayesian
WP van Hoorn, Feb 200618
Predicting Gene Class by Physical Properties
Compounds binding to different gene classes posses different
physical property distributions:
Can this be used to predict gene class from physical properties alone?
How does LDA compare to Bayesian?
Mw clogP
WP van Hoorn, Feb 200619
Predicting Gene Class by Physical Properties
Unified DB
148k actives (≤ 10 µM),human target, Mw < 1000,pass reactivity filter,binding to single target class only
Aminergic GPCRsAspartyl ProteasesCysteine ProteasesEnzymes- othersGPCRs Class A- othersGPCRs Class BGPCRs Class CHydrolasesIon Channels- Ligand_GatedIon Channels- othersKinases- othersMetalloproteasesNuclear hormone receptorsOthersOxidoreductasesPDEsPeptide GPCRsProtein KinasesSerine ProteasesTransferases
20 Gene Classes:
WP van Hoorn, Feb 200620
Molecular_WeightNum_H_Acceptors Num_H_DonorsNum_RotatableBondsMolecular_PolarSurfaceAreaNo_IonCenters Molecular_SolubilityMolecular_SurfaceAreaClogP *Andrews*
Predicting Gene Class by Physical Properties
10 Descriptors:
147,534
118,118
29,416
WP van Hoorn, Feb 200621
Predicting Gene Class by Physical Properties
29416 (9025)1 (0)
349 (137)5309 (1423)8123 (2811)
791 (248)888 (241)2638 (499)482 (163)279 (74)
0 (0)152 (59)47 (0)0 (0)0 (0)1 (0)
1268 (366)1969 (321)
75 (28)1180 (613)
5864 (2042)LDA (correct)
29416 (5631)1012 (125)792 (133)341 (147)
2809 (1135)2176 (392)1437 (329)
90 (47)2083 (345)1626 (293)1545 (100)964 (104)
2109 (280)350 (42)
3346 (146)2340 (115)962 (309)
1 (0)1464 (73)
1670 (614)2299 (902)
Bayes (correct)
29416 (1447)1460 (36)1526 (53)
1488 (148)1461 (236)1468 (56)1492 (54)
1465 (167)1459 (53)1515 (47)1430 (11)1441 (29)1448 (52)1461 (15)1438 (29)1477 (14)
1524 (117)1451 (135)1470 (13)1479 (29)
1463 (153)Random (correct)
29416727913
292750271178138533361238849198594764286339226
26472574252728
3228ExperimentTarget class
TotalTransferasesSerine ProteasesProtein KinasesPeptide GPCRsPDEsOxidoreductasesOthersNuclear hormone receptorsMetalloproteasesKinases- othersIon Channels- othersIon Channels- Ligand_GatedHydrolasesGPCRs Class CGPCRs Class BGPCRs Class A- othersEnzymes- othersCysteine ProteasesAspartyl ProteasesAminergic GPCRs
WP van Hoorn, Feb 200622
Predicting Gene Class by Physical Properties
• Enrichment over random: LDA ~ 6 fold, Bayes ~4 fold• Bayesian: more equal spread• LDA: some baskets contain too many eggs?• Some of the misclassifications might be true: many missing values• Unbiased and fast method to (pre)screen large compound collection• Compare with other unbiased methods: docking, pharmacophore search
WP van Hoorn, Feb 200623
Conclusions
• Data from heterogeneous sources can be combined in one knowledge base• Predictive Bayesian models can be derived from it• Models are adaptive, regenerate to incorporate latest experimental results• Models are not replacement for experiment• Models can lead to substantially lower screening investment• Drug design compared to supermarket stock inventory:
Just in time delivery vs. just enough screening
• Don’t discount simple molecular properties
WP van Hoorn, Feb 200624