Post on 22-Dec-2014
Cheminformatics & the evolving relationship between data in the public domain & pharma
Dana Vanderwall, Cheminformatics, Bristol-Myers Squibb
How do we start to find a new chemical that might be the next drug?
Typically:
• Need a specific protein target that we think we can use to fix the problem that causes the disease. Caveat: emerging trends (& what's old is new again)
• Need to design experiments that test for chemicals that can fit that protein (lock & key)
• Thousands to >2 million chemicals are tested against that protein to look for a starting point
This is where drug discovery gets really modern: highly automated robots and informatics can do in 1 week work that used to take years
Compound optimization
Compounds are optimized for many parameters, including potency, selectivity, oral bioavailability, and safety
1-3 years
>2 million compounds tested in primary assay
Make another 2-10,000
Getting ready for the clinic
• All compounds are tested for safety in animals
• Need to prove we can give enough to get the positive benefit without side effects
• We have to be able to make it on a scale and in a form suitable for dosing in the clinic
• The early stages need milligrams or grams (tablespoons)
• To start testing in humans requires many kilograms of very, very pure material
1-3 years
2-3 years
Profiling Assays and Lead Op Progression
[Figure: progression funnel. Stages: HTS, Hit Triage, Early Lead Op, Late Lead Op. Phases: Target to Hit, Hit to Lead, Lead to Candidate. # Compounds: 2M, 2K, 200, 1-10. # Assays: 1, 2-5, 10-50, 300-600.]
Chemical Structures are the Intellectual Property
• The targets exist in nature; chemical structures are the unique component that pharma & biotech can bring to the table (biologicals are increasing in importance)
• As such, the structures, and their biological activity, are extremely sensitive
• Captured in the patents filed
• Never disclosed until protected
• Even similarity/sub-structure searches on public sites are treated cautiously
GlaxoSmithKline moves to stimulate public-private partnerships for R&D in neglected tropical diseases
http://www.gsk.com/responsibility/access/rnd-neglected-tropical-diseases.htm
GSK launched the Open Lab at Tres Cantos as one way to share our expertise and to stimulate open innovation in drug discovery for diseases of the developing world (DDW)
• 60 slots for scientists
• Access to the screening facility & TC staff scientists to support collaborations
• $5M GBP facilities expansion
Committed to sharing data & IP on GSK research in DDW, starting with recently generated novel anti-malarial hits
Malaria
Mosquito-borne infectious disease, caused by the Plasmodium parasite
250 M cases/annum, 1-3 M deaths
Variety of drugs available, but resistance is a constant problem
http://www.mcwhealthcare.com/malaria_drugs_medicines/life_cycle_of_plasmodium.htm
The Complexity of Cell Biology
The assumption is: one target, one consequence.
In reality: this target is one component of a complicated biochemical network.
• A selective probe may influence many pathways.
• Probes can interact with multiple targets.
• Network interactions can be redundant.
• Biological effects are often a consequence of interaction with multiple targets.
Emerging paradigm: look for the cellular activity first
• Advances in cell biology & HTS platforms are enabling screening for a cellular phenotype
• Start with something that works in a cellular model of the disease phenotype (a.k.a. black box), then figure out how it works: target deconvolution
Supporting black-box HTS for anti-malarials
• 2M-compound GSK HTS collection screened at 2 µM vs. P. falciparum (3D7)-infected human erythrocytes
• 12 months of screening in a biohazard lab; avg. Z' = 0.7
• 19,451 primary hits (inhibited parasite growth >80%); 13,533 confirmed via retests
• 1,982 showed cytotoxicity in HepG2 cells at 10 µM
• None active in the cell background control
• 8,000 also active (>50%) against DD2 (multi-drug resistant strain)
F-J Gamo et al. Nature 465, 305-310 (2010) doi:10.1038/nature09107
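The slide quotes an average Z' of 0.7 but does not show how it was computed. As a rough illustration only, the commonly used Z'-factor definition, 1 - 3(sd_pos + sd_neg) / |mean_pos - mean_neg|, can be sketched as follows; the control-well values are made up and the function name is hypothetical.

```python
from statistics import mean, stdev

def z_prime(positive_controls, negative_controls):
    """Z'-factor: 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(positive_controls) - mean(negative_controls))
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    spread = 3 * (stdev(positive_controls) + stdev(negative_controls))
    return 1 - spread / separation

# Hypothetical control-well signals for a single plate (arbitrary units).
pos = [98.0, 102.0, 100.0, 101.0, 99.0]
neg = [10.0, 12.0, 9.0, 11.0, 8.0]
plate_z = z_prime(pos, neg)
print(f"Z' = {plate_z:.2f}")
```

Values close to 1 indicate well-separated controls; a campaign-level figure like the one quoted would presumably be an average over many plates.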
Characterizing the hits
Clustering was used to help characterize chemical space:
• 416 “molecular frameworks” (Bemis & Murcko, J. Med. Chem. 39, 2887 (1996))
• 857 clusters / 1,978 singletons by Daylight FP/Tanimoto (0.85)
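The slides do not say which clustering algorithm was applied to the Daylight fingerprints. Purely as a sketch of threshold-based clustering on Tanimoto similarity, the example below uses a simple leader algorithm (a swapped-in stand-in, not necessarily the method used) on hypothetical fingerprints represented as sets of on-bit indices.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def leader_cluster(fingerprints, threshold=0.85):
    """Put each compound in the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of compound ids; element 0 is the leader
    for cid, fp in fingerprints.items():
        for members in clusters:
            if tanimoto(fp, fingerprints[members[0]]) >= threshold:
                members.append(cid)
                break
        else:
            clusters.append([cid])  # no leader matched: start a new cluster
    return clusters

# Hypothetical bit sets, not real Daylight fingerprints.
fps = {
    "cmpd_A": set(range(0, 20)),
    "cmpd_B": set(range(0, 19)),     # shares 19 of 20 bits with cmpd_A
    "cmpd_C": set(range(100, 120)),  # shares no bits with the others
}
clusters = leader_cluster(fps)
singletons = [c for c in clusters if len(c) == 1]
```

Leader clustering depends on input order; other schemes would give different cluster and singleton counts at the same 0.85 threshold.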
F-J Gamo et al. Nature 465, 305-310 (2010) doi:10.1038/nature09107
Three-dimensional plot of some of the novel chemical diversity present in TCAMS
Characterizing the hits
Compounds with an abnormally high frequency of activity across HTS campaigns were filtered out
Exclusion threshold ranged from IFI = 5% for compounds tested in >100 HTS to IFI = 20% for compounds tested in >25 HTS (~1,800 compounds)
~70 compounds clustered with known anti-malarials
How are the rest of these compounds working?
\[
\text{Inhibition Frequency Index (IFI)} = \frac{\text{number of HTS screens where \% Inh.} > 50\%}{\text{total number of HTS screens}} \times 100
\]
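A minimal sketch of the Inhibition Frequency Index as a percentage of HTS campaigns in which a compound passed the 50% inhibition mark. The strict ">" comparison is an assumption (the comparator is not legible in the slide text), and the screening history is invented for illustration.

```python
def inhibition_frequency_index(percent_inhibitions, cutoff=50.0):
    """Percentage of HTS campaigns in which a compound exceeded the inhibition cutoff."""
    if not percent_inhibitions:
        raise ValueError("compound has no HTS results")
    # Assumption: strictly greater than the cutoff counts as active.
    n_active = sum(1 for value in percent_inhibitions if value > cutoff)
    return 100.0 * n_active / len(percent_inhibitions)

# Hypothetical % inhibition values for one compound across ten HTS campaigns.
history = [5.0, 72.0, 12.0, 55.0, 3.0, 8.0, 0.0, 61.0, 10.0, 4.0]
ifi = inhibition_frequency_index(history)
print(f"IFI = {ifi:.0f}% over {len(history)} screens")
```

A compound like this one, active in 3 of 10 campaigns, would be a candidate for the frequent-hitter filter described above, subject to the minimum number of screens required.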
Can we leverage the historical target data on compounds?
Target assays: clear relationship between interactions and measurements, but what does it mean biologically?
Phenotypic assays: clear biological result associated with readout, but from which interaction(s)?
Can we use the data to figure out which targets lead to which biological response?
[Figure: target assays (kinase_1 to kinase_4, 7TM_1, 7TM_2, NR_1 to NR_3) linked by "??" to a phenotypic assay (stimulant, readout)]
Can we leverage the historical target data on compounds?
• Find all target assay data for compounds tested in the anti-malarial screen
• Aggregate at the target-result type level (max pIC50/pEC50)
• Of the 2M tested, 130K had some associated target assay data, incl. 3,435 of the 13,500 ‘actives’
• “Hits”* at 413 targets
  *pIC50 >7.0 for antag/inh/blocker; pEC50 >6.5 for ag/activation/opener
• Given that some targets are screened in 2-3 modes, >650 target-result type combinations
• Surely not all 400 targets are significant
• Data very sparse: avg. ~2 pXC50s per compound that had data
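The aggregation step described above (max pIC50/pEC50 per compound at each target-result type, then the quoted hit thresholds) might look roughly like this. The record layout, function names and example values are assumptions made for illustration; only the two thresholds come from the slide.

```python
from collections import defaultdict

# Thresholds quoted on the slide: pIC50 > 7.0 for antagonist/inhibitor/blocker,
# pEC50 > 6.5 for agonist/activator/opener.
HIT_THRESHOLDS = {"pIC50": 7.0, "pEC50": 6.5}

def max_pxc50(records):
    """Collapse repeat measurements to the max per (compound, target, result type)."""
    best = defaultdict(lambda: float("-inf"))
    for compound, target, result_type, value in records:
        key = (compound, target, result_type)
        best[key] = max(best[key], value)
    return dict(best)

def target_hits(aggregated):
    """Keys whose aggregated value exceeds the threshold for their result type."""
    return {key for key, value in aggregated.items()
            if value > HIT_THRESHOLDS[key[2]]}

# Hypothetical measurements: (compound, target, result type, pXC50).
records = [
    ("cmpd_1", "kinase_1", "pIC50", 6.8),
    ("cmpd_1", "kinase_1", "pIC50", 7.4),
    ("cmpd_1", "7TM_1", "pEC50", 6.2),
    ("cmpd_2", "7TM_1", "pEC50", 6.9),
]
aggregated = max_pxc50(records)
hits = target_hits(aggregated)
```

Taking the maximum is the optimistic choice: a single potent measurement is enough to flag a compound as a hit at that target.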
Finding targets ‘enriched’ among the anti-malarials
An ‘enrichment’ was calculated for each possible target-result type combination: are compounds active at target X more prevalent amongst the compounds that inhibited P. falciparum, or equally distributed across all screened compounds?
For each target-result type, calculate:
\[
\text{Enrichment factor} = \frac{\text{target actives among hits}}{\text{all target actives in all screened compounds}} = \frac{x/n}{X/N}
\]
where
N = the number of compounds from the entire screening set with a measured pIC50/pEC50 @ target
X = the number of compounds from the entire screening set meeting the activity threshold @ target
n = the number of antimalarial hits with a measured pIC50/pEC50 @ target
x = the number of antimalarial hits meeting the activity threshold @ target
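As a worked sketch of the enrichment factor, (x/n) / (X/N), with x, n, X and N as defined above. The counts are invented for illustration and the function name is hypothetical.

```python
def enrichment_factor(x, n, X, N):
    """(x / n) / (X / N): target hit rate among antimalarial hits vs. all screened compounds."""
    if n == 0 or N == 0 or X == 0:
        raise ValueError("enrichment is undefined when n, N or X is zero")
    return (x / n) / (X / N)

# Hypothetical counts for one target-result type:
#   12 of 40 antimalarial hits with data are active at the target,
#   300 of 4000 screened compounds with data are active at the target.
ef = enrichment_factor(x=12, n=40, X=300, N=4000)
print(f"enrichment = {ef:.1f}-fold")
```

With so few measurements per compound, small values of n make the ratio noisy, so a fold-enrichment cutoff on its own says little about statistical significance.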
Narrowing down the possible candidates
[Figure: funnel from 400 targets, to ~140 targets at ≥2-fold enrichment, to ~50 with homologues in P. falciparum]
F-J Gamo et al. Nature 465, 305-310 (2010) doi:10.1038/nature09107
Targets with homologues in P. falciparum genome
• Aspartic protease
• β-Ketoacid reductase
• Calcium/calmodulin-dependent kinase
• Cysteine protease
• Dihydrofolate reductase
• Dihydroorotate dehydrogenase
• DNA gyrase
• Isoleucyl-tRNA synthetase
• Methionyl-tRNA synthetase
• Phenylalanyl-tRNA synthetase
• Phosphatidylinositol 3-kinase
• Plasmodium electron transport chain
• Ribosome
• Ser/Thr protein kinase
• Tyrosyl-tRNA synthetase
Targets with NO homologues in P. falciparum genome
• GPCR: Adrenergic antag
• GPCR: Cannabinoid antag
• GPCR: Chemokine antag
• GPCR: Cholinergic ag
• GPCR: Free Fatty Acid ag
• GPCR: Serotonin ag/antag
• GPCR: Opioid ag/antag
• GPCR: Peptide hormone receptor ag/antag
• Nuclear Receptor ag/antag
• Ion Channel inh
• Phospholipase inh
• Lipid amide hydrolase inh
• Serine protease inh
• Toll-like receptor ag
Data publicly available
All chemical structures and experimental data for compounds are available at http://www.ebi.ac.uk/chemblntd
Data fields:
EXT_CMPD_NUMBER
SMILES
Percentage_inhibition_3D7
Percentage_inhibition_DD2
Percentage_inhibition_3D7_PFLDH
XC50_MOD_3D7
XC50_3D7 (µM)
Percentage_inhibition_HEPG2
Chemical cluster Nr
IFI
Graph_Frame_Cluster
Target_Hypothesis
P. falciparum locus
Commercial Supplier_Reference
For additional information, or interest in additional collaborations, contact:
jose.f.garcia-bustos@gsk.com
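As a sketch of how the released fields could be used, the snippet below filters rows on the two strain columns using the >80% (3D7) and >50% (DD2) cutoffs quoted earlier in the talk. It assumes the data has been exported as CSV with the column names listed above; the two rows are made up and are not real records from the data set.

```python
import csv
import io

# Two made-up rows using a subset of the listed column names; not real data.
SAMPLE = (
    "EXT_CMPD_NUMBER,SMILES,Percentage_inhibition_3D7,Percentage_inhibition_DD2\n"
    "EXAMPLE-0001,CCO,95,62\n"
    "EXAMPLE-0002,CCN,88,31\n"
)

def active_on_both_strains(handle, min_3d7=80.0, min_dd2=50.0):
    """Yield compound ids whose % inhibition exceeds both strain cutoffs."""
    for row in csv.DictReader(handle):
        if (float(row["Percentage_inhibition_3D7"]) > min_3d7
                and float(row["Percentage_inhibition_DD2"]) > min_dd2):
            yield row["EXT_CMPD_NUMBER"]

selected = list(active_on_both_strains(io.StringIO(SAMPLE)))
print(selected)
```

For a real export, the `io.StringIO` handle would be replaced by an opened file, and the actual file format and column spelling should be checked against the download.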
And the raw target data used to develop hypotheses? That was trickier
Releasing the list of 400 targets & all the inactive compounds would reveal:
• Our whole compound collection
• All the targets in the current (and past) portfolio
Needed some level of validation for the analysis in order to publish
Surrogates for internal data
• Chemical structures associated with a particular target hypothesis were used as ‘bait’ to find published structures & data that validate the proposed MOA for each chemotype
• Similarity & SSS in Aureus DBs & SciFinder
• Exemplars and their similarity to original hits published in Suppl. Material with reference
• We often found our own compounds and data in J Med Chem and the patent literature.
Acknowledgements
Anti-malarial HTS
Tres Cantos Medicines Development Campus, Tres Cantos, Spain: Francisco-Javier Gamo, Laura Sanz, Jaume Vidal, Cristina de Cozar, Emilio Alvarez, Jose-Luis Lavandera, Jose Garcia-Bustos
Medicines Research Centre, Stevenage, UK: Darren VS Green
Collegeville & King of Prussia, PA, USA: Vinod Kumar, Samiul Hasan, James Brown, Catherine Peishoff, Lon Cardon