Extracting actionable knowledge from large scale in vitro pharmacology data


Ed Griffen, MedChemica Ltd


Why improve medicinal chemistry practice?
For an aging population and emerging pathogens

“Eroom’s Law” – The cost of discovering a new drug has doubled every 9 years consistently for the last 60 years [1], equivalent to cost rising by about 8%/year.

Cost/Launch (2010): $873m; capitalised: $1.8Bn [2]

[Figure: bar chart of cost / $million (axis 0 to 500) by stage: Target-to-hit, Hit-to-Lead, Lead Optimisation, Preclinical, Phase I, Phase II, Phase III, Submission to Launch. Series: Cost/project, Cost/Launch, Cost/Launch (capitalized).]

1. Scannell et al., Nature Reviews Drug Discovery (2012), 11, 191-200
2. Paul et al., Nature Reviews Drug Discovery (2010), 9, 203-214

Actionable knowledge

Critical information from which the user can immediately choose a course of action:

• ADME – ways to ‘fix’ your molecule
• Toxicology – substructures to avoid
• Pharmacology – substructural leads built for practical design

ADMET Rule database
Better medicinal chemistry by combining knowledge

[Diagram: for each contributor (AZ, Roche, Genentech, Pharma 4, Pharma 5) the boxes are Data, rule finder, Database, Grand Rule Database and Exploitation, with MedChemica and the Grand Rule Database at the centre.]

>500 million pairs from companies + 12 million from public data

Current Knowledge sets – GRDv3
Numbers of statistically valid transforms

Grouped datasets | Number of rules
logD7.4 | 153449
Merged solubility | 46655
In vitro microsomal clearance: human, rat, mouse, cyno, dog | 88423
In vitro hepatocyte clearance: human, rat, mouse, cyno, dog | 26627
MDCK permeability A-B / B-A efflux | 1852
Cytochrome P450 inhibition: 2C9, 2D6, 3A4, 2C19, 1A2 | 40605
Cardiac ion channels: NaV 1.5, hERG ion channel inhibition | 15636
Glutathione stability | 116
Plasma protein or albumin binding: human, rat, mouse, cyno, dog | 64622

Actionable knowledge

Critical information from which the user can immediately choose a course of action:

• ADME – ways to ‘fix’ your molecule
• Toxicology – substructures to avoid
• Pharmacology – substructural leads built for practical design

Clear structural direction from Big Data
Example: Dopamine Transporter inhibitors

CHEMBL538405
pKi predicted: 8.6
pKi measured: 9.1
Mean with pharmacophore: 8.3
Mean without: 6.7
n examples: 27
Odds ratio vs ChEMBL: 407

What do I want?
• Substructures associated with potency
• Specificity of model
• Predictions
• Domain of Applicability

MedChemica Principles of Pharmacophore Extraction

• Pharmacophores must be clear and understandable
• Pharmacophore generation must be transparent to allow checking and validation
• Use as much measured data as possible
• Look for key elements influencing potency
• Don’t base pharmacophores on a few compounds
• Pharmacophore must be specific (not like phenyl + amine = hERG inhibitor)
• Can be applied quickly (to large libraries)

[Figure: pharmacophore diagram with features labelled Cation, HyAr and HyAr. Caption: How do I actually use this?]

QSAR and Knowledge extraction
Model as filter or knowledge?

[Figure: INTERPRETABILITY chart of Descriptors against Method.
Descriptors: substructures; physical chemistry descriptors (Hansch, Taft, Fujita, Abraham); atomic, pair, triplet descriptors; indices.
Method: (M)LR; Free Wilson; PLS; Trees / Forests; SVM; Bayesian NN; Deep Learning.
Shading labels: Dark, Black.]

Specific Pharmacophore extraction from MMPA

• Identify key potency-giving changes by matched molecular pair analysis on large datasets
• Extract fragments that are associated with potency
• Find pairs of fragments and linkers that are specific to potent compound subsets
• Easy-to-use 2D pharmacophores: 2 potency-enhancing fragments joined by a specific linker

Workflow: Public Data → Find Matched Pairs → Find Potent Fragments → Find Pharmacophore dyads → Pharmacophores → Model → Predict potency and show Pharmacophore match

Example: 1470 compounds, CHEMBL339, Dopamine transporter
Pharmacophore identification (Fragment 1, Linker, Fragment 2) in CHEMBL538405, pKi 9.1
pKi predicted: 8.6; measured: 9.1; mean with pharmacophore: 8.3; mean without: 6.7; n examples: 27; odds ratio vs ChEMBL: 407

Matched Molecular Pairs
• Molecules that differ only by a particular, well-defined structural transformation

Transformation with environment capture
• MMPs can be recorded as transformations from A → B
• Environment is essential to understand chemistry

Statistical analysis
• Learn what effect the transformation has had on properties in the past

Griffen, E. et al. Matched Molecular Pairs as a Medicinal Chemistry Tool. Journal of Medicinal Chemistry. 2011, 54(22), pp.7739-7750.
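The statistical step can be illustrated with a minimal Python sketch. It assumes the matched pairs have already been found and arrive as hypothetical (transformation, value for A, value for B) records; the transformation labels, values and the `min_pairs` cut-off are invented for illustration and are not taken from the talk.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarise_transforms(pairs, min_pairs=2):
    """Group matched pairs by transformation and summarise the change B - A."""
    deltas = defaultdict(list)
    for transform, value_a, value_b in pairs:
        deltas[transform].append(value_b - value_a)
    summary = {}
    for transform, changes in deltas.items():
        if len(changes) < min_pairs:
            continue  # too few examples to learn anything from the past
        summary[transform] = {
            "n": len(changes),
            "mean_change": mean(changes),
            "sd": stdev(changes) if len(changes) > 1 else 0.0,
        }
    return summary

# Hypothetical property measurements for compounds A and B of each pair.
example_pairs = [
    ("H>>F", 2.0, 2.3),
    ("H>>F", 1.5, 1.7),
    ("H>>F", 3.1, 3.4),
    ("H>>OH", 2.0, 1.2),
]
result = summarise_transforms(example_pairs)
```

Here `result` keeps only the "H>>F" transformation, because the single "H>>OH" pair falls below the assumed minimum count.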

Advanced MMPA

[Figure: structures A and B with atoms labelled 1 to 4; Δ Data A-B]

Matched pair methodology
because MCSS and F&I each find different pairings

[Figure: example compound pairs, each marked as found (✓) or not found (✗) by MCSS and by F&I. Labelled examples: A – CHEMBL156639, B – CHEMBL2387702; A – CHEMBL100461, B – CHEMBL103900. Markings shown: MCSS ✓, F&I ✗ (four pairs); MCSS ✗, F&I ✓ (two pairs); MCSS ✗, F&I ✗ (two pairs).]

Does the Matched Pair method really matter?
Using only one technique will miss between 12% and 56% of pairings

Target | Number of compounds | Common pairings | FI only pairings | MCSS only pairings | Total pairings | FI only % | Common % | MCSS only %
VEGF | 4466 | 14631 | 17172 | 14823 | 46626 | 37 | 31 | 32
Dopamine Transporter | 1470 | 4480 | 8930 | 3497 | 16907 | 53 | 26 | 21
GABAA | 848 | 2500 | 1722 | 4205 | 8427 | 20 | 30 | 50
D2 human | 3873 | 12995 | 13811 | 13098 | 39904 | 35 | 33 | 33
D2 rat | 1807 | 5408 | 6595 | 7346 | 19349 | 34 | 28 | 38
Acetylcholine esterase | 383 | 536 | 725 | 1434 | 2695 | 27 | 20 | 53
Monoamine oxidase | 264 | 653 | 1156 | 246 | 2055 | 56 | 32 | 12
min | | | | | | 20 | 20 | 12
max | | | | | | 56 | 33 | 53

[Figure: Venn diagram of FI and MCSS pairings with the common overlap]
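The percentage columns follow directly from the pairing counts. A short Python sketch, using three rows of counts from the table (the function and variable names are illustrative only):

```python
def pairing_shares(common, fi_only, mcss_only):
    """Return (FI only %, common %, MCSS only %) of all pairings, as whole percents."""
    total = common + fi_only + mcss_only
    return tuple(round(100 * n / total) for n in (fi_only, common, mcss_only))

# Counts from the table: (common, FI only, MCSS only).
counts = {
    "VEGF": (14631, 17172, 14823),
    "Dopamine Transporter": (4480, 8930, 3497),
    "Monoamine oxidase": (653, 1156, 246),
}
shares = {target: pairing_shares(*c) for target, c in counts.items()}

# Using MCSS alone misses the FI-only share; using F&I alone misses the MCSS-only share.
missed_by_mcss_alone = {t: s[0] for t, s in shares.items()}
missed_by_fi_alone = {t: s[2] for t, s in shares.items()}
```

For Monoamine oxidase this gives 56% missed by MCSS alone and 12% missed by F&I alone, the two extremes quoted in the slide title.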

Mining transform sets to find potent fragments

Identify the ‘A’ fragments associated with a significant number of potency-decreasing changes, irrespective of what they are replaced with.
‘A’ is ‘better than anything you replace it with’.

[Figure: table of Fragment A, Fragment B and change in binding measurement; pKi/pIC50 distributions for compounds containing the potent fragment versus the remaining compounds. Effect size = Cohen’s d.]

Statistics:
• One-tailed binomial test with Holm–Bonferroni correction at 95% confidence identifies potent fragments
• Compare the mean of the compounds that contain the fragment with the mean of the remaining compounds
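A minimal Python sketch of the first statistical step, under stated assumptions: the null hypothesis is taken to be that a replacement is equally likely to raise or lower potency (p = 0.5), the fragment names and counts are hypothetical, and alpha = 0.05 stands in for "95% confidence". The talk does not give these details.

```python
from math import comb

def binomial_p_upper(k, n, p=0.5):
    """One-tailed P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def holm_bonferroni(p_values, alpha=0.05):
    """Return the keys whose p-values pass the Holm-Bonferroni step-down procedure."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    significant = set()
    for rank, (key, p) in enumerate(ordered):
        if p > alpha / (m - rank):
            break  # this p-value and all larger ones fail
        significant.add(key)
    return significant

# Hypothetical counts: fragment -> (potency-decreasing replacements, total replacements).
counts = {"frag_A": (19, 20), "frag_B": (12, 20), "frag_C": (9, 10)}
p_values = {frag: binomial_p_upper(k, n) for frag, (k, n) in counts.items()}
potent = holm_bonferroni(p_values)
```

With these made-up counts, `frag_A` and `frag_C` survive the correction and `frag_B` does not.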

[Figure: fragment A and replacement fragments B, C, D, E and F; changes in binding measurement shown: +2.1, +2.2, +1.4, +0.4, +1.8]

Cohen’s d effect sizes:
Large | >= 0.8
Medium | 0.5 – 0.8
Small | 0.2 – 0.5
Trivial | 0.1 – 0.2
No effect | < 0.1
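As a sketch of how the effect-size bands might be applied, the Python below computes Cohen's d with the pooled sample standard deviation and maps it onto the bands above. The pKi values are hypothetical, and the choice of the pooled-variance form of d is an assumption; the talk does not say which variant was used.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(with_fragment, remaining):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(with_fragment), len(remaining)
    pooled_var = (
        (n1 - 1) * variance(with_fragment) + (n2 - 1) * variance(remaining)
    ) / (n1 + n2 - 2)
    return (mean(with_fragment) - mean(remaining)) / sqrt(pooled_var)

def effect_size_label(d):
    """Map |d| onto the effect-size bands listed on the slide."""
    size = abs(d)
    if size >= 0.8:
        return "Large"
    if size >= 0.5:
        return "Medium"
    if size >= 0.2:
        return "Small"
    if size >= 0.1:
        return "Trivial"
    return "No effect"

# Hypothetical pKi values for the two groups of compounds.
with_fragment = [8.1, 8.4, 7.9, 8.6]
remaining = [6.5, 7.0, 6.8, 6.2, 7.1]
d = cohens_d(with_fragment, remaining)
label = effect_size_label(d)
```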

Mining transform sets to find destructive fragments

Identify the ‘Z’ fragments associated with a significant number of potency-increasing changes, irrespective of what they are replaced with.
‘Z’ is ‘worse than anything you replace it with’.

[Figure: table of Fragment A, Fragment B and change in binding measurement; fragment Z with changes of +2.7, +3.2, +0.6 and +0.6; pKi/pIC50 distributions for compounds containing the destructive fragment versus the remaining compounds.]

Mining transform sets to find influential fragments

• Identify the ‘Z’ fragments associated with a significant number of potency-increasing changes, irrespective of what they are replaced with: ‘Z’ is ‘worse than anything you replace it with’.
• Identify the ‘A’ fragments associated with a significant number of potency-decreasing changes, irrespective of what they are replaced with: ‘A’ is ‘better than anything you replace it with’.

[Figure: table of Fragment A, Fragment B and change in binding measurement; fragment Z with changes of +2.7, +3.2, +0.6 and +0.6; fragment A with changes of +2.1, +2.2, +1.4, +0.4 and +1.8; pKi/pIC50 distributions for compounds with the destructive fragment versus compounds with constructive fragments.]

Building Pharmacophores from potent Fragments

But individual fragments are small and often non-specific, so…

• Permute all the pairs of fragments and find the shortest path between them (pharmacophore dyads) in the training set
• The shortest path between them encodes distance & geometry
• Select pharmacophore dyads with PLS to identify the dyads that explain most of the potency
• Check for significance and effect size with Welch’s t-test and Cohen’s d
• But what about specificity?
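The shortest-path step can be sketched with a breadth-first search. This Python example is an illustration only: it represents a molecule as a plain adjacency list of atom indices (hypothetical data), whereas a real implementation would presumably work on cheminformatics toolkit objects and SMARTS matches.

```python
from collections import deque

def shortest_path(bonds, start_atoms, end_atoms):
    """Breadth-first search for the shortest bond path from any atom in
    start_atoms to any atom in end_atoms. Returns a list of atom indices."""
    targets = set(end_atoms)
    queue = deque([atom] for atom in start_atoms)
    seen = set(start_atoms)
    while queue:
        path = queue.popleft()
        if path[-1] in targets:
            return path
        for neighbour in bonds.get(path[-1], ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # the two fragments are not connected

# Hypothetical molecule: ring atoms 0-5 (fragment 1), chain 6-7-8, atom 9 (fragment 2).
bonds = {
    0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4, 6], 4: [3, 5], 5: [4, 0],
    6: [3, 7], 7: [6, 8], 8: [7, 9], 9: [8],
}
path = shortest_path(bonds, start_atoms=[0, 1, 2, 3, 4, 5], end_atoms=[9])
linker_length = len(path) - 2  # atoms strictly between the two fragments
```

Here the path runs from ring atom 3 through atoms 6, 7 and 8 to atom 9, giving a three-atom linker between the two fragments.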

[Figure: example pharmacophore dyad labelled Fragment 1, Path and Fragment 2; structure text shown: [CH2]CN]

Testing for specificity - pharmacophores

• How selective is the pharmacophore?
• What are the odds of it hitting a molecule in the test set vs CHEMBL?

$$\text{Odds of finding in potency set} = \frac{n(\text{pharmacophore hits in potency set})}{n(\text{in potency set})} = \frac{27}{1470}$$

$$\text{Odds of finding in CHEMBL} = \frac{n(\text{pharmacophore hits in CHEMBL not in potency set})}{n(\text{in CHEMBL})} = \frac{62}{1351211}$$

$$\text{Odds ratio} = \text{selectivity} = \frac{\text{Odds of finding in potency set}}{\text{Odds of finding in CHEMBL (not potency set)}} = \frac{27/1470}{62/1351211} = 407 \quad (95\%\ \text{confidence limits: } 259\text{-}642)$$

Odds of hitting a potent compound are 407 times greater than a random compound in CHEMBL

[Figure: example pharmacophore dyad labelled Fragment 1, Path and Fragment 2; structure text shown: [CH2]CN]
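The calculation can be reproduced with a small Python sketch. Two variants are shown: the ratio of hit fractions, as the formula on the slide is written, and the conventional odds ratio (hits divided by non-hits) with a log-normal 95% interval. The interval method is an assumption; the talk does not state how its confidence limits were derived.

```python
from math import exp, sqrt

def hit_fraction_ratio(hits_set, n_set, hits_bg, n_bg):
    """Ratio of hit fractions, as the formula on the slide is written."""
    return (hits_set / n_set) / (hits_bg / n_bg)

def odds_ratio_with_ci(hits_set, n_set, hits_bg, n_bg, z=1.96):
    """Conventional odds ratio with a log-normal (Woolf) interval."""
    a, b = hits_set, n_set - hits_set  # potency set: hits, misses
    c, d = hits_bg, n_bg - hits_bg     # background: hits, misses
    odds_ratio = (a / b) / (c / d)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(odds ratio)
    return odds_ratio, odds_ratio * exp(-z * se), odds_ratio * exp(z * se)

# Counts from the dopamine transporter slide.
ratio = hit_fraction_ratio(27, 1470, 62, 1351211)
odds_ratio, lower, upper = odds_ratio_with_ci(27, 1470, 62, 1351211)
```

With these counts the conventional odds ratio comes out near 408 with an interval of roughly 259 to 643, close to the reported 407 (259-642), while the plain ratio of hit fractions comes out near 400.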

How specific is a Pharmacophore?
What does a bad odds ratio look like?

What is the odds ratio?
• Found in CHEMBL: 565658/1352681 = 0.42
• Found in CHEMBL240 – hERG where pIC50 >= 5: 1985/2451 = 0.81
• OR = (1985/2451) / (565658/1352681) = 0.81 / 0.42 = 1.94 (95% conf 1.83 – 2.05)

[Figure: generic structure with labels X, N, R1 and R2. Lipophilic base, usually a tertiary amine; X = 2-5 atom chain, may include rings, heteroatoms or polar groups]
e.g. sertindole: 14 nM vs hERG

[$([NX3;H2,H1,H0;!$(N[C,S]=[O,N])]~*~*~*~c),$([NX3;H2,H1,H0;!$(N[C,S]=[O,N])]~*~*~c),$([NX3;H2,H1,H0;!$(N[C,S]=[O,N])]~*~*~*~*~c),$([NX3;H2,H1,H0;!$(N[C,S]=[O,N])]~*~*~*~*~*~c)]

Early simple hERG model

Ar-linker-base has only been found 1.9x more often in hERG inhibitors than at random in ChEMBL

Domain of Applicability
“Whereof one cannot speak, thereof one must be silent.” [1]

Claiming to have extracted knowledge or making a prediction when we know we don’t have enough evidence is:
• Delusional
• Dangerous
    • it would be more productive to act on a different hypothesis or at random
• Degrades using rational analysis at all

Compound activity prediction should have three classes of output:
• Active
• Inactive
• Out of domain – no prediction possible

Only fragments with a sufficient evidential base are used to form pharmacophore dyads.
In turn, only pharmacophore dyads that have enough support are used in the model.
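A three-class output of this kind might look like the following Python sketch. Everything specific here is an assumption made for illustration: the rule that a compound is in domain when it matches at least one supported dyad, the dyad names, and the pKi threshold of 6.0 are all hypothetical.

```python
def classify(matched_dyads, supported_dyads, predicted_pki, active_threshold=6.0):
    """Return 'Active', 'Inactive' or 'Out of domain' for one compound.

    matched_dyads: pharmacophore dyads found in the compound.
    supported_dyads: dyads with enough training support to be in the model.
    predicted_pki: model output, only used when there is supporting evidence.
    """
    if not set(matched_dyads) & set(supported_dyads):
        return "Out of domain"  # no evidence, so make no prediction
    return "Active" if predicted_pki >= active_threshold else "Inactive"

# Hypothetical dyad identifiers and predictions.
supported = {"dyad_1", "dyad_2"}
calls = [
    classify({"dyad_1"}, supported, 8.6),
    classify({"dyad_2"}, supported, 5.1),
    classify({"dyad_9"}, supported, 7.0),
]
```

The third compound matches no supported dyad, so it is reported as out of domain even though a number could have been produced for it.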

1. Wittgenstein, Tractatus Logico-Philosophicus, 1922

Model activity from presence of Pharmacophores

• Identify and group fragment SMARTS from MMPA
• If n ≥ 8, perform a one-tailed binomial test with Holm-Bonferroni adjustment
• Remove non-significant ‘Biophores’
• Compare the mean of the compounds containing the biophore with the mean of the remaining compounds for significance (Welch’s t-test) and effect size (Cohen’s d)
• Permute all the significant Biophores and determine the shortest paths between them in the training set = Pharmacophore dyads
• Select Pharmacophore dyads with n ≥ 6 examples
• Use presence/absence of each Pharmacophore dyad as an indicator variable in PLS modelling

[Figure: Dopamine Transporter +/- pharmacophores]
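The last two steps, filtering dyads by support and encoding presence or absence as indicator variables, can be sketched in Python as below. The compound and dyad names are hypothetical, and the PLS fit itself is not shown; the resulting rows would be passed to whichever PLS implementation is in use.

```python
from collections import Counter

def indicator_matrix(compound_dyads, min_examples=6):
    """Build a 0/1 presence matrix over dyads seen in at least min_examples compounds."""
    support = Counter(d for dyads in compound_dyads.values() for d in set(dyads))
    kept = sorted(d for d, n in support.items() if n >= min_examples)
    rows = {
        compound: [1 if d in dyads else 0 for d in kept]
        for compound, dyads in compound_dyads.items()
    }
    return kept, rows

# Hypothetical dyad assignments for eight compounds.
compound_dyads = {f"cpd{i}": {"dyad_1"} for i in range(6)}
compound_dyads["cpd6"] = {"dyad_1", "dyad_2"}
compound_dyads["cpd7"] = {"dyad_2"}
columns, X = indicator_matrix(compound_dyads)
```

In this toy set `dyad_2` appears in only two compounds, so it is dropped and the matrix has a single column for `dyad_1`.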

Modelling critical safety targets

Target | Class | Effect | Number of compounds
Acetylcholine esterase - human | enzyme | CV: drop in BP, drop in HR, bronchoconstriction | 383
β1 adrenergic receptor | GPCR | CV: change in HR, BP, bronchodilation, vasodilation, tremor | 505
Androgen receptor | NHR | Endocrine: agonism: androgenicity / gynecomastia, prostate / breast carcinoma | 1064
CB1 cannabinoid receptor | GPCR | CNS: euphoria, dysphoria, anxiety, memory impairment, analgesia, hypothermia, weight loss, emesis, depression | 1104
CB2 cannabinoid receptor | GPCR | increased inflammation | 1112
Dopamine D2 receptor - human | GPCR | CNS: hallucinations, drowsiness, confusion, emesis; CV: drop in heart rate | 3873
Dopamine D2 receptor - rat | GPCR | As human | 1807
Dopamine Transporter | Transporter | CNS: addictive psychostimulation, depression, parkinsonism, seizures | 1470
GABA A receptor | Ion channel | CNS: anxiolysis, ataxia, sedation, depression, amnesia | 848
hERG ion channel | Ion channel | CV: QT prolongation | 4189
5HT2a receptor | GPCR | CNS: drop in body temp, anxiogenic | 642
Monoamine oxidase | enzyme | CV: increase BP, DDI potential; CNS: dizziness, nausea | 264
Muscarinic acetylcholine receptor M1 | GPCR | CNS: proconvulsant, drop in cognitive function, vision impairment | 628
μ opioid receptor | GPCR | CNS: sedation, abuse liability, respiratory depression, hypothermia | 1128

1. J. Bowes, A. J. Brown, J. Hamon, W. Jarolimek, A. Sridhar, G. Waldron, and S. Whitebread, “Reducing safety-related drug attrition: the use of in vitro pharmacological profiling,” Nat. Rev. Drug Discov., vol. 11, no. 12, pp. 909–922, Nov. 2012

Modelling critical safety targets

• Build models using 10-fold cross-validated PLS
• Assess using ROC / BEDROC, R2 vs 100-fold y-scrambled R2 and geomean odds ratio

Target | Number of compounds | Number of compound pairs | Number of Fragments | Number of Pharmacophore dyads after filtering | R2 | RMSEP | ROC | odds_ratio (geomean)
Acetylcholine esterase - human | 383 | 27755 | 44 | 10 | 0.43 | 1.57 | 0.80 | 4
β1 adrenergic receptor | 505 | 145447 | 276 | 313 | 0.64 | 0.70 | 0.96 | 833
Androgen receptor | 1064 | 113163 | 186 | 46 | 0.47 | 0.77 | 0.86 | 140
CB1 cannabinoid receptor | 1104 | 88091 | 165 | 90 | 0.61 | 1.02 | 0.87 | 96
CB2 cannabinoid receptor | 1112 | 82130 | 194 | 158 | 0.19 | 0.85 | 0.64 | 5.7
Dopamine D2 receptor - human | 3873 | 230962 | 483 | 602 | 0.42 | 0.88 | 0.69 | 110
Dopamine D2 receptor - rat | 1807 | 118736 | 267 | 377 | 0.29 | 0.85 | 0.78 | 125
Dopamine Transporter | 1470 | 106969 | 282 | 336 | 0.58 | 0.73 | 0.88 | 141
GABA A receptor | 848 | 39494 | 106 | 167 | 0.70 | 0.76 | 0.97 | 560
hERG ion channel | 4189 | 242261 | 392 | 76 | 0.61 | 0.96 | 0.92 | 55
5HT2a receptor | 642 | 50870 | 197 | 267 | 0.61 | 0.59 | 0.83 | 600
Monoamine oxidase | 264 | 15439 | 44 | 11 | 0.12 | 1.25 | 0.48 | 181
Muscarinic acetylcholine receptor M1 | 628 | 48200 | 97 | 510 | 0.62 | 0.94 | 0.89 | 48
μ opioid receptor | 1128 | 37184 | 33 | 11 | 0.69 | 1.30 | 0.87 | 81

Toxophore examples
Detailed, specific & transparent

Target | Actual | Predicted | Mean with | Mean without | Odds ratio
Dopamine D2 receptor human | 9.5 | 9.1 | 8.0 | 6.6 | 340
Dopamine Transporter | 9.1 | 8.6 | 8.3 | 6.7 | 407
GABA-A | 9.0 | 8.7 | 8.0 | 6.8 | 1506
β1 adrenergic receptor | 7.8 | 7.7 | 6.5 | 5.7 | 1501

Safety Target Conclusions

• We can model safety-critical targets and extract both predictive models and useful ligand structural information
• Clear areas to action
• Clearly defined domain of applicability
• No prediction where there is insufficient evidence (conservative method)
• The method relies on having large data sets (>= 500 data points)
• MMPA is a computationally intense phase
• But of course molecules only need pairing once…

Actionable knowledge

Critical information from which the user can immediately choose a course of action:

• ADME – ways to ‘fix’ your molecule
• Toxicology – substructures to avoid
• Pharmacology – substructural leads built for practical design

Prediction of unseen new molecules
The acid test…

• Vascular endothelial growth factor receptor 2 tyrosine kinase (KDR)
• Inhibitors have oncology and ophthalmic indications
• Large dataset in CHEMBL
• 10-fold cross-validated PLS model
• Selected model by minimised RMSEP

Compounds | 4466
Matched Pairs | 288100
Fragments | 678
Pharmacophore dyads | 787
RMSEP | 0.8
R2 | 0.64
Y-scrambled R2 | 0.0
ROC | 0.95
Geomean odds ratio | 80

Novartis Predictions From Our Model
Domain of Applicability…

[Figure: four compound structures with actual and predicted values]
• Actual: 8.4 [1]; Predicted: 7.5
• Actual: 7.6 [1]; Predicted: 7.5
• Actual: 7.7 [2]; Predicted: 7.1
• Actual: 9.0 [2]; Predicted: Out of Domain

1. J Med Chem (2016), Bold et al.
2. Med Chem Lett (2016), Mainolfi et al.

Value of Potency prediction from MMPA:
Clear substructures enable rapid actions

[Diagram: Compounds + data (Safety data, Potency data, HTS data) feeding into: Toxicity alerts; Virtual Library prioritisation; Virtual Library design; Fragment set design; Retest prioritisation; Hit re-mining / analogue hunting; Substructure modification; Lead design; Fast Follower design.]

[Figure: mean without pharmacophore versus mean with pharmacophore; 26 examples in training set]

The MedChemica team

Andrew G Leach
Al Dossetter
Shane Montague
Lauren Reid*
Jess Stacey*

*Royal Society of Chemistry Industrial Placements Grant Scheme

A Collaboration of the willing

Craig Bruce, OE
David Cosgrove, GalCoz
Andy Grant ★
Martin Harrison, Elixir
Paul Faulder, Elixir
Andrew Griffin, Elixir
Huw Jones, Base360
Al Rabow
David Riley, AZ
Graeme Robb, AZ
Attilla Ting, AZ
Howard Tucker, retired
Dan Warner, Myjar
Steve St-Galley, Sygnature
David Wood, JBA Risk Management
Phil Jewsbury, AZ
Mike Snowden, AZ
Peter Sjo, AZ
Martin Packer, AZ
Manos Perros, AZ
Nick Tomkinson, AZ
Martin Stahl, Roche
Jerome Hert, Roche
Martin Blapp, Roche
Torsten Schindler, Roche
Paula Petrone, Roche
John Cumming, Roche
Jeff Blaney, Genentech
Hao Zheng, Genentech
Slaton Lipscomb, Genentech
James Crawford, Genentech