ChEMBL – Large-Scale Open Access Data for Drug Discovery John Overington EMBL-EBI jpo@ebi.ac.uk.

Page 1: ChEMBL – Large-Scale Open Access Data for Drug Discovery John Overington EMBL-EBI jpo@ebi.ac.uk.

ChEMBL – Large-Scale Open Access Data for Drug Discovery

John Overington, EMBL-EBI

jpo@ebi.ac.uk

Page 2:

Private to Public Domain Transfer

• Five year strategic award from Wellcome Trust

• Large-scale Drug Discovery Structure Activity Relationship (SAR) data

• Linking small molecule structures to ‘targets’ and pharmacological activities – Chemogenomics/Chemical Biology

• ‘Open Access’, ‘User Friendly’, ‘Translational’, ‘Free’

• Multiple access mechanisms

• Full database download, web front-ends, web services

• Actively support ad hoc sabbaticals (academic and commercial) at EMBL-EBI

Page 3:

ChEMBL Research Strategy

• Comprehensively catalogue historical drug discovery

• Include successes and failures

• Drugs can be small molecules, recombinant proteins, siRNA, etc.

• Derive rules for drug discovery ‘success’ from these data

• Target selection and prioritisation

• Lead discovery, optimisation, candidate selection

Page 4:

Drug Discovery Process (simplified)

[Figure: pipeline diagram. Stages and the activities listed under each:]

• Target Discovery: target identification, microarray profiling, target validation, assay development, biochemistry, clinical/animal disease models

• Lead Discovery: high-throughput screening (HTS), fragment-based screening, focused libraries, screening collection

• Lead Optimisation: medicinal chemistry, structure-based drug design, selectivity screens, ADMET screens, cellular/animal disease models, pharmacokinetics

• Preclinical Development: toxicology, in vivo safety pharmacology, formulation, dose prediction

• Phase 1: PK, tolerability

• Phase 2: efficacy

• Phase 3: safety & efficacy

• Launch: indication discovery & expansion

Other figure labels: Discovery, Development, Use; Clinical Trials; Med. Chem. SAR, Clinical Candidates, Drugs

Scale: >450,000 distinct compounds, ~25,000 distinct lead series, ~12,000 candidates, ~1,300 drugs

Page 5:

ChEMBL: Launched Drugs

• Database of all approved drugs

• Chemistry and sequence ‘aware’

• Contents

• Small molecules and biological therapeutics

• USANs, INNs, research codes, other synonyms

• Pharmaceutical properties, prodrugs, dosage, form, etc.

• PK data and metabolites, black box warnings, etc.

• 1,378 chemically distinct ‘drugs’, 324 distinct molecular targets

• Controlled vocabulary indications dictionary and hierarchy

Page 6:

New Drugs 2006-2009

[Figure: chart of new drugs by type. Categories: Enzyme, mAb, Peptide, Other, Protein, Natural Product, Synthetic small molecule]

Page 7:

ChEMBL: Launched Drugs

Nat. Rev. Drug Disc., 5, pp. 993-996 (2006)

Page 8:

ChEMBL: Drug Dosage

[Figure: histogram of drug doses. x-axis: binned log10 mole dose, from -8.4 to -2.32 in steps of 0.32; y-axis ticks: 0 to 80. Scale markers: nmol, µmol, mmol. Annotations: ~150-200 µmol; steroids, thyroids; metformin, hydroxyurea]

Page 9:

Affinity Of Drugs For Their Targets

• Retrieved Ki, Kd, IC50, EC50, pA2, … endpoints for drugs against their ‘efficacy targets’

[Figure: histogram. x-axis: -log10 affinity, 2 to 12 (10 mM, 1 mM, 100 µM, 10 µM, 1 µM, 100 nM, 10 nM, 1 nM, 100 pM, 10 pM, 1 pM); y-axis: frequency, 0 to 400]
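The x-axis above pairs each -log10 affinity value with a concentration (2 with 10 mM, 9 with 1 nM, 12 with 1 pM). A minimal sketch of that conversion follows; the function name and the unit table are illustrative assumptions, not part of ChEMBL or StARlite.

```python
import math

# Multipliers that convert a concentration unit to mol/L.
# "uM" stands in for µM; the table itself is an illustrative assumption.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def neg_log10_affinity(value, unit):
    """Return -log10 of an affinity (Ki, Kd, IC50, EC50) given in `unit`."""
    if value <= 0:
        raise ValueError("affinity must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


# Axis labels from the figure, plus the Ki quoted later in the talk.
for value, unit in [(10, "mM"), (1, "nM"), (1, "pM"), (4.5, "nM")]:
    print(f"{value} {unit} -> {neg_log10_affinity(value, unit):.2f}")
```

pA2 values are already on a negative log scale, so they would not pass through this conversion.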

Page 10:

Function for Drug Efficacy/Affinity

• Empirical function that estimates the probability of in vivo activity for a compound with acceptable PK characteristics as a function of target affinity

[Figure: curve of P(efficacy) (y-axis, 0.0 to 1.0) against -log10 affinity (x-axis, 0 to 12; scale markers mM, µM, nM, pM)]
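The transcript gives neither the functional form nor the parameters of this empirical function. Purely to illustrate the kind of curve the slide describes (a probability between 0 and 1 that rises with -log10 affinity), here is a logistic sketch. The logistic form and the `midpoint` and `slope` values are assumptions, not the author's fitted function.

```python
import math


def p_efficacy(neg_log10_affinity, midpoint=7.0, slope=1.5):
    """Hypothetical logistic curve for P(efficacy).

    `midpoint` (the -log10 affinity where P = 0.5) and `slope` are
    placeholder values chosen for illustration, not fitted to any data.
    """
    return 1.0 / (1.0 + math.exp(-slope * (neg_log10_affinity - midpoint)))


# Evaluate at mM, µM, nM and pM affinities (-log10 = 3, 6, 9, 12).
for p in (3, 6, 9, 12):
    print(p, round(p_efficacy(p), 3))
```

Any monotonic sigmoid would serve the same illustrative purpose; a real version would be fitted to the drug affinity distribution described on the previous slide.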

Page 11:

ChEMBL: Clinical Candidates

• Database of clinical development candidates

• Contains ~10,000 2-D structures

• Estimated size ~35-45,000 compounds

• Work in progress

• Deeper coverage of key gene families

• e.g. Protein kinases, 184 distinct clinical candidates

[Figure: two charts. "Kinase clinical candidates by highest phase": categories Launched, III, II, I; y-axis ticks 0 to 90. "Clinical candidates by target": VEGFR, PDGFR, p38a, C-Kit, CDK, ErbB, Aurora]

Page 12:

Industry Productivity

[Figure: "File Registration number vs USAN date". x-axis: USAN date, 1960 to 2010; y-axis: file registration number, 0 to 800,000]

Page 13:

Industry Productivity

[Figure: chart by file registration number range. x-axis: bins of 100,000 compounds, from 1-100,000 to 700,001-800,000; y-axis ticks: 0 to 70]

64 USANs/100,000 compounds

1.9 USANs/100,000 compounds

16 Drugs/100,000 compounds

0.4 Drugs/100,000 compounds

USAN assignment typically at entry to phase 3
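The four annotated rates can be turned into fold changes with simple division. The sketch below assumes that the higher and lower figure of each pair belong to early and late registration-number ranges; the transcript does not say which bins they annotate.

```python
def fold_decline(early_rate, late_rate):
    """Ratio of two success rates, each expressed per 100,000 compounds."""
    if late_rate <= 0:
        raise ValueError("late_rate must be positive")
    return early_rate / late_rate


# Rates as annotated on the slide (per 100,000 registered compounds).
usan_fold = fold_decline(64, 1.9)
drug_fold = fold_decline(16, 0.4)

print(f"USANs per 100,000 compounds: {usan_fold:.1f}-fold lower")
print(f"Drugs per 100,000 compounds: {drug_fold:.1f}-fold lower")
```

Under that assumption the USAN rate is roughly 34-fold lower and the drug rate about 40-fold lower.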

Page 14:

ChEMBL: SAR data

• Bioactive compounds

• Link through to validated synthetic routes and assay protocols

• Bidirectionally linking compounds to/from targets

• Built from 12 primary journals

• J. Med. Chem., Bioorg. Med. Chem., PNAS, JBC, Bioorg. Med. Chem. Lett., Eur. J. Med. Chem., DMD, Xenobiotica, Nature, Science, AACR, J. Nat. Prod.

• StARlite 1 – June 2001

• StARlite 31 – August 2008

[Figure: StARlite schema linking Compound, Bioactivity and Target, illustrated with a compound structure, the measurement Ki=4.5 nM, and the target sequence:]

>Thrombin (Homo sapiens) MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE
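The bidirectional compound/target linking described on this slide can be sketched as two indexes over the same bioactivity records. This is an in-memory illustration only, not the StARlite schema: the class names, compound identifiers and all values except the Ki of 4.5 nM against thrombin are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bioactivity:
    compound: str    # hypothetical compound identifier
    target: str      # target name
    endpoint: str    # e.g. "Ki", "IC50"
    value_nm: float  # measured value in nM


class ActivityIndex:
    """Index bioactivities by compound and by target for two-way lookup."""

    def __init__(self):
        self._by_compound = defaultdict(list)
        self._by_target = defaultdict(list)

    def add(self, activity):
        self._by_compound[activity.compound].append(activity)
        self._by_target[activity.target].append(activity)

    def targets_of(self, compound):
        return sorted({a.target for a in self._by_compound.get(compound, [])})

    def compounds_for(self, target):
        return sorted({a.compound for a in self._by_target.get(target, [])})


index = ActivityIndex()
index.add(Bioactivity("compound-A", "Thrombin (Homo sapiens)", "Ki", 4.5))
# The two records below are invented for illustration.
index.add(Bioactivity("compound-B", "Thrombin (Homo sapiens)", "IC50", 120.0))
index.add(Bioactivity("compound-A", "hypothetical-target-X", "IC50", 900.0))

print(index.targets_of("compound-A"))
print(index.compounds_for("Thrombin (Homo sapiens)"))
```

Storing each record once and indexing it twice keeps both lookup directions consistent; a relational database would achieve the same with a bioactivity table carrying foreign keys to compound and target tables.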

Page 15:

Drug Optimisation

[Figure: chemical structures arranged from Prototype through 1st, 2nd, 3rd and 4th generation compounds, with class labels imidazole and triazole. Compounds and dates shown:]

• Azomycin (1956): Streptomyces natural product, trichomonacidal, ‘toxic’

• Metronidazole 1962

• Clotrimazole 1970

• Miconazole 1970

• Econazole 1972

• Tinidazole 1970

• Bifonazole 1981

• Sulconazole 1980

• Ketoconazole 1978

• Itraconazole 1984

• Terconazole 1980

• Fluconazole 1988

• Voriconazole 2002

• Fosfluconazole 2004

• Posaconazole 2005

After W. Sneader

Page 16:

ChEMBL SAR Contents

• Abstracted from 26,299 papers from 12 journals

• Monthly update cycle - optimised curation pipeline

• Autocuration tools – clean up and index other large SAR datasets

• Updates and ongoing curation process all data, not simply new article data

• 521,237 compound records

• 440,055 distinct compound structures

• 5,439 targets

• 3,512 protein molecular targets

• ~2,200 orthologous targets (1,644 human)

• 1,936,969 experimental bioactivities

Counts refer to StARlite release 31

Pages 17-23:

Interface and Searching

Page 24:

Rule-based Optimisation – Bioisosteres

• Identify data-driven ‘rational’ lead-optimisation strategies

• Useful in automated design

• e.g. Replacement of carboxylic acid

• Reflect synthetic ease and expectation for functional effect

[Figure: workflow. Search StARlite for functional group; search for all ‘contexts’ where acid has been replaced; retrieve assay value (IC50). Replacement groups shown: tetrazole, sulphonamide, ester, sulphonic acid. Histogram: x-axis, effect on affinity (-log10 IC50), -6 to 6; y-axis, frequency (%), ticks 10 to 60]
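The workflow on this slide (find matched pairs in which the acid was replaced, retrieve both assay values, and plot the distribution of the change in -log10 IC50) can be sketched as follows. The IC50 pairs are invented for illustration, and the one-log-unit binning is an assumption about how the histogram was built.

```python
import math
from collections import Counter


def affinity_effect(acid_ic50, replacement_ic50):
    """Change in -log10 IC50 on replacing the acid (positive = more potent).

    Both values must be in the same unit; the unit cancels in the ratio.
    """
    return math.log10(acid_ic50 / replacement_ic50)


def histogram_percent(effects):
    """Bin effects into one-log-unit bins; return percent of pairs per bin."""
    bins = Counter(math.floor(e) for e in effects)
    total = len(effects)
    return {b: 100.0 * n / total for b, n in sorted(bins.items())}


# Hypothetical matched pairs: (IC50 of acid, IC50 of replacement), in nM.
pairs_nm = [(100.0, 30.0), (50.0, 400.0), (20.0, 20.0), (1000.0, 250.0)]

effects = [affinity_effect(acid, rep) for acid, rep in pairs_nm]
print([round(e, 2) for e in effects])
print(histogram_percent(effects))
```

Running the same calculation separately for each replacement group (tetrazole, sulphonamide, ester, sulphonic acid) would give one distribution per candidate bioisostere.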

Page 25:

Typical Compound Collection - Novartis

Ertl, Koch and Roggo, Novartis

[Figure: ring fragments shown as chemical structures, labelled:]

benzene pyridine piperidine piperazine cyclohexane pyrimidine indole

imidazole naphthalene morpholine thiophene pyrazole pyrrolidine thiazole

furan quinoline cyclopropane benzimidazole imidazoline pyrrole cyclopentane

pyran quinazoline benzthiazole benzodioxole isoxazole purine tetrahydrofuran

tetrazole triazine isoquinoline tetrahydroisoquinoline benzofuran triazole adamantane

Page 26:

Screening File Comparison - Novartis

[Figure: scatter plot of fragment ranks. x-axis: StARlite rank, 0 to 35; y-axis: Novartis rank, 0 to 35. Regions marked "Enriched fragments" and "Depleted fragments". Labelled points: benzene, pyridine, pyrrolidine, tetrahydrofuran, purine, tetrazole, pyrimidine, morpholine, pyrazole, piperidine]
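A rank-versus-rank comparison of this kind can be sketched by ranking fragments by frequency in each collection and taking the difference. The counts below are invented for illustration and do not come from the Novartis file or StARlite. The transcript does not say which side of the diagonal is "enriched", so the sign convention here is also an assumption.

```python
def rank_by_count(counts):
    """Rank fragments 1..n by descending count (ties broken by name)."""
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return {name: i + 1 for i, name in enumerate(ordered)}


def rank_shifts(counts_a, counts_b):
    """Rank in collection B minus rank in collection A, per shared fragment.

    A negative shift means the fragment ranks higher (is relatively more
    common) in B than in A.
    """
    ranks_a, ranks_b = rank_by_count(counts_a), rank_by_count(counts_b)
    shared = ranks_a.keys() & ranks_b.keys()
    return {name: ranks_b[name] - ranks_a[name] for name in sorted(shared)}


# Hypothetical fragment counts for two compound collections.
collection_a = {"benzene": 900, "pyridine": 300, "purine": 40, "morpholine": 120}
collection_b = {"benzene": 950, "pyridine": 280, "purine": 150, "morpholine": 60}

shifts = rank_shifts(collection_a, collection_b)
print(shifts)
```

Fragments with a zero shift sit on the diagonal of the scatter plot; the others fall on one side or the other according to the sign of the shift.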

Page 27:

Genome-Scale Druggability Assessment

• Now possible to rapidly map chemical intervention points onto genomic data

• In ‘real time’ as gene model is developed

• Develop therapeutic hypotheses for expert review/analysis/validation

• Reuse existing drugs/clinical candidates in new contexts

• Anticipate required optimisation (comparative modelling, etc.)

Nature 460, 352-358 (2009)

Nat. Rev. Drug. Disc., 8, pp. 900-907 (2008)

Page 28:

Indication Discovery

Marks et al., Lancet, 367, pp. 668-678 (2006)

• Map chemical biology/pharmacology data onto microarray datasets

• Rapid path to clinic and patient benefit

• Develop therapeutic hypotheses for expert review/analysis/validation

• Reuse existing drugs/clinical candidates in new contexts

Page 29:

The ChEMBL-og - www.chemblog.org