W O M B A T WOrld of Molecular BioAcTivity...Bora, Ionela Olah, Marius Olah, Magdalena Banda...

W O M B A T: WOrld of Molecular BioAcTivity. Tudor Oprea, Sunset Molecular Discovery LLC, http://www.sunsetmolecular.com. Daylight MUG 18, Santa Fe, NM, 02/24/04. Copyright © Tudor I. Oprea 2004. All rights reserved

  • W O M B A T: WOrld of Molecular BioAcTivity

    Tudor Oprea, Sunset Molecular Discovery LLC, http://www.sunsetmolecular.com

    Daylight MUG 18, Santa Fe, NM, 02/24/04. Copyright © Tudor I. Oprea 2004. All rights reserved

  • W O M B A T: WOrld of Molecular BioAcTivity

    Axes: MW, LogP, LogSw

    0.001 – 6: 38%

    6 – 8: 43%

    8 – 14.4: 18%

    Inactives/SingleDose: 1%

  • Bioactivity Distribution By Target Type

    Target type          0.0 – 6   6 – 8   8 – 14.4   Inactives/SingleDose
    Receptors (56.2%)    33%       44%     22%        1%
    Proteins (4.8%)      43%       45%     11%        0%
    Enzymes (39%)        44%       41%     14%        2%

    • Enzymes tend to have a higher rate of inactives/low actives
    • Receptors tend to have more medium/high actives

  • Target Type Distribution By Activity

    Activity bin         enzyme   protein   receptor
    Inactives (1%)       55%      1%        44%
    Low Act. (38%)       45%      5%        49%
    Medium Act. (43%)    37%      5%        58%
    High Act. (18%)      29%      3%        68%

    • Enzymes dominate the inactive/low activity bins
    • Receptors clearly dominate the medium/high activity bins
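
    The activity bins used on these slides (up to 6, 6 – 8, 8 – 14.4) can be sketched as a small classifier. This is a minimal illustration, assuming the binned values are -log10 of the molar activity and that a value lying exactly on a boundary goes to the higher bin; neither assumption is stated on the slides. The function name `activity_bin` and the sample values are hypothetical.

```python
from collections import Counter


def activity_bin(p_activity):
    """Assign an activity value to the low/medium/high bins from the slides.

    Assumption: p_activity is -log10(molar activity); boundary values
    (exactly 6 or 8) are placed in the higher bin.
    """
    if p_activity < 6.0:
        return "low"
    if p_activity < 8.0:
        return "medium"
    return "high"


# Hypothetical sample values, for illustration only.
sample = [4.5, 5.9, 6.0, 7.2, 8.0, 9.7]
bin_counts = Counter(activity_bin(value) for value in sample)
print(bin_counts)
```

    Inactive and single-dose records carry no comparable value, so they would need to be counted separately before binning.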

  • WOMBAT History

    • SADB5 (May 2002):
      • Project funded initially by AstraZeneca
      • 21700 structures (includes duplicates)
      • 36738 activities on 324 targets
      • 837 papers indexed (JMC 1996-1999)
      • 39.56% Ki, 53.52% IC50
      • 5.54% D2 or EC50

    • WOMBAT 2003.2 (September ’03):
      • 53126 entries (47872 unique structures); 98662 activities on 506 unique targets, plus 236 inactives, 7982 ‘smaller than’ & 159 ‘greater than’ values
      • 2143 papers (2148 series) indexed (JMC 1994-1999)
      • 35.5% Ki, 56.6% IC50, 4.85% D2 or EC50
      • literature coverage included BMCL (2002), QSAR (2000-2001)

    Romanian Academy, Institute of Chemistry Timisoara

  • WOMBAT 2004.1

    • 76,165 entries (68,543 unique SMILES) covering 3039 series (over 3000 papers) and ~143,000 activities on ~630 targets

    • Activities now include inactives (635), < (8916), > (259), @ (578 – single dose)

    • 37.1% Ki (& variations), 55.85% IC50 (& variations), 4.44% D2 or EC50, 0.9% Kb and Kd, 0.1% MIC, 0.04% ED50

    • Biochem. Pharmacol. 2001 [partial coverage], Bioorg. Med. Chem. Lett. 2002 [1-24], Chembiochem 2002 [partial], Eur. J. Med. Chem. 2001 [partial], J. Amer. Chem. Soc. 1975, 1992, 1993 [partial], J. Health Sci. 2003 [partial], J. Med. Chem. 1991 [partial], 1992-2000 [complete], 2003 [partial], Quant. Struct.-Act. Relat. 1998-2000 [partial]

    • Fully integrated FEDORA server (Metaphorics LLC)

    • New features include SwissProt IDs for most Targets and links (via the DOI format for 1737 entries) to PDF files for all literature entries

    http://www.sunsetmolecular.com/products/?id=4

  • Activity Profile for WOMBAT 2004.1

    Target Class                   Compounds (a)   Percent
    G-Protein Coupled Receptors    28973           38.04
    Nuclear Hormone Receptors      688             0.90
    Integrins                      1772            2.33
    Ion Channels                   9008            11.83
    Aspartyl Proteases             3351            4.40
    Serine Proteases               1459            1.92
    Kinases                        2842            3.73
    Cysteine Proteases             704             0.92
    Phosphodiesterases             1689            2.22
    Oxidoreductases                2010            2.64
    Oxygenases                     2829            3.71
    Transporters                   2264            2.97
    Others                         18576           24.39

    (a) number of structures active at least once/target, % of total entries
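
    The Percent column can be recomputed from the compound counts. The sketch below is only an arithmetic check, not part of WOMBAT itself; the variable names are illustrative. The counts add up to 76,165, the number of entries reported for WOMBAT 2004.1, and each percentage is the count divided by that total.

```python
# Compound counts per target class, copied from the slide.
counts = {
    "G-Protein Coupled Receptors": 28973,
    "Nuclear Hormone Receptors": 688,
    "Integrins": 1772,
    "Ion Channels": 9008,
    "Aspartyl Proteases": 3351,
    "Serine Proteases": 1459,
    "Kinases": 2842,
    "Cysteine Proteases": 704,
    "Phosphodiesterases": 1689,
    "Oxidoreductases": 2010,
    "Oxygenases": 2829,
    "Transporters": 2264,
    "Others": 18576,
}

total = sum(counts.values())

# Percentage of total entries, rounded to two decimals as on the slide.
percent = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, value in percent.items():
    print(f"{name:<30}{counts[name]:>8}{value:>8.2f}")
```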

  • References Are Stored Separately

  • WOMBAT Quality Control

    • Chirality: what chemists can interpret, computers cannot always read (the “above/below the plane” convention must be strictly enforced)

    [Figure: example structures labelled "Not machine-readable" and "Machine-readable"; drawings not reproduced]

    • Missing/altered atoms/substituents – overall error rate above 9%
    • Incorrectly drawn or written structures (3.4%); incorrect molecular formula or molecular weight (3.4%);
    • Unspecified binding position for substituents or ambiguous numbering scheme for the heterocyclic backbone (0.91%);
    • Structures with the incorrect backbone (0.71%);
    • Incorrect generic names or chemical names (0.24%);
    • Incorrect biological activity (0.34%);
    • Incorrect references (0.2%).

  • WOMBAT Quality Control…

  • JMC Errors… 1

    [Published and corrected structure drawings not reproduced]

    Reference                          Comment
    JMC 37-476 chart 1                 rolipram: incorrect N atom position
    JMC 43-2217 chart 1                A-85380: incorrect ring size
    (as above) & JMC 36-2645           tropisetron: extra methyl group
    (as above)                         DAU-6285: missing methoxy; N instead of O
    JMC 37-758 chart 1                 Ro-15-4513: methyl group missing
    JMC 37-787 figure 1                epalrestat: E/Z configuration: E instead of Z

  • JMC Errors… 2

    [Published and corrected structure drawings not reproduced]

    Reference                          Comment
    JMC 35-1969 chart 1                bicuculline: incorrect chirality; incorrect ring fusion
    (as above)                         (+)-tubocurarine: incorrect N atom position; substitution position
    JMC 37-1769 chart 1                haloperidol: Br instead of Cl
    JMC 38-16 scheme 1                 divaplon: missing nitrogen atom
    JMC 43-71 figure 2                 triazolam: missing chlorine atom

  • JMC Errors… 3

    [Published and corrected structure drawings not reproduced]

    Reference                          Comment
    JMC 43-1793                        argatroban: missing double-bonded oxygen atom; missing chirality
    JMC 41-1943 chart 1                LY-297524: completely different structures
    JMC 41-4196                        SB-203580: missing S=O double bond
    JMC 38-3645 figure 1               tacrine: missing two double bonds
    JMC 38-3094 figure 1               levonantradol: methyl instead of hydroxy, methyl; plus an extra double bond

  • JMC Errors… 4

    [Published and corrected structure drawings not reproduced]

    Reference                          Comment
    JMC 35-4509 table II               40, 41: extra THP group
    JMC 35-3858 table IV               53/R2: imidazoyl instead of imidazolyl
    JMC 43-236 table 1                 6b: incorrect substituent
    (as above)                         9: extra double-bonded O atom
    JMC 39-3636 table 1                28xiii/R3: pent-4-yl instead of hept-4-yl; confirmed from chemical name
    JMC 40-1049 table 6                69/R: wrong substituent; confirmed from chemical name

  • Other Errors… SciFinder

    [Structure drawings not reproduced]

    WOMBAT: RB-370

    Registry Number: 187454-94-0

    The correct structure has a 13-membered ring

  • Other Errors… Merck Index

    [Structure drawings not reproduced]

    "Carisoprodol", Merck Index 13th ed #1854

    Carisoprodol - correct structure; Merck Index 13th ed has correct name

  • F E D O R A

    http://www.metaphorics.com

  • WOMBAT@FEDORA

  • WOMBAT Patterns

    • Dave Weininger wrote a SMARTS generator that starts from a SMILES hand-picked by Vera Povolna to match a specific (not the maximum common) substructure for each WOMBAT series

    • These SMARTS are intended to capture the unique biological profile of each series; on occasion, two such SMARTS were defined. Note that hydrogens are matched exactly as defined in the series

    [CH3]-[OH0]-[cH0]:1:[cH1,cH0]:[cH0]:2-[CH2]-[NH0](-[NH0]=[CH0](-[cH0]:2:[cH1]:[cH1]:1)-[CH2]-[cH0]:3:[cH0](:[cH1]:[nH0]:[cH1]:[cH0]:3-[ClH0])-[ClH0])-[CH0,SH0,CH1]=[OH0]

    [CH2]-[CH2]-[NH0](-[CH2]-[CH2])-[CH2]-[CH2]-[OH0,SH0]-[cH0]:1:[cH1]:[cH1]:[cH0](:[cH1]:[cH1]:1)-[CH1]-2-[CH1](-[CH0,CH2]-[OH0]-[cH0]:3:[cH1]:[cH0](:[cH1]:[cH1]:[cH0]-2:3)-[OH0,OH1])-[cH0]:4:[cH1]:[cH1]:[cH1]:[cH1]:[cH1]:4

    [OH1]-[CH0](=[OH0])-[CH2]-[CH1,CH2]-[NH1]-[CH0](=[OH0])-[CH2]-[NH1,NH0]-[CH0](=[OH0])-[CH2,CH1,NH0]-[CH2]-[CH2]-[cH0]:1:[nH0]:[cH0]:2-[NH1]-[CH2]-[CH2]-[CH2]-[cH0]:2:[cH1]:[cH1]:1

    [OH1]-[CH0](=[OH0])-[CH2]-[CH1,CH2]-[NH1]-[CH0](=[OH0])-[CH2]-[NH0]-1-[CH0](-[CH1](-[CH2]-[CH2]-1)-[CH2]-[CH2]-[cH0]:2:[nH0]:[cH0]:3-[NH1]-[CH2]-[CH2]-[CH2]-[cH0]:3:[cH1]:[cH1]:2)=[OH0]

    [NH2]-[CH2]-[CH2]-[CH2]-[NH1]-[CH2]-[CH2]-[CH2]-[CH2]-[NH1]-[CH2]-[CH2]-[CH2]-[NH1]-[CH0,SH0]=[OH0]

    • Provides interesting associations in FEDORA
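
    The notation in the patterns above writes every atom with an explicit hydrogen count, lists alternatives inside one bracket separated by commas, and spells out every bond. The sketch below only illustrates that notation by rebuilding the last (linear polyamine) pattern from its parts. It is not the generator described on this slide, whose implementation is not shown; the helper names `atom` and `chain` are hypothetical.

```python
def atom(*alternatives):
    """Format one SMARTS atom from (symbol, hydrogen_count) pairs.

    Several pairs are joined with commas, meaning "any of these".
    """
    return "[" + ",".join(f"{symbol}H{h}" for symbol, h in alternatives) + "]"


def chain(atoms, bonds):
    """Join atoms in a linear chain with explicit bond symbols."""
    if len(bonds) != len(atoms) - 1:
        raise ValueError("need exactly one bond between each pair of atoms")
    pattern = atoms[0]
    for bond, next_atom in zip(bonds, atoms[1:]):
        pattern += bond + next_atom
    return pattern


nh2 = atom(("N", 2))   # terminal amine
nh = atom(("N", 1))    # secondary amine
ch2 = atom(("C", 2))   # methylene

atoms = (
    [nh2] + [ch2] * 3 + [nh] + [ch2] * 4 + [nh] + [ch2] * 3
    + [nh, atom(("C", 0), ("S", 0)), atom(("O", 0))]
)
bonds = ["-"] * (len(atoms) - 2) + ["="]  # all single bonds, then one double bond

polyamine = chain(atoms, bonds)
print(polyamine)
```

    Branches and ring closures, which the other patterns use, are outside the scope of this linear-chain sketch.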

  • Increased MW does not warrant higher activity

    [Plot not reproduced; axis label: MW. 67210 structures, 138401 activities]

  • Increased LogP does not warrant higher activity

    [Plot not reproduced; axis label: ELogP. 66824 structures, 137766 activities]

  • How Small Can Active Compounds Be?

    [Histogram not reproduced; axis: binned ELogP (Less, -3, -1, 1, 3, More). Structure drawings of the compounds below not reproduced]

    Compound        MW    LogP   Activity (target)
    Carbachol       143   -3.8   IC50 = 8.2 (m)
    Dopamine        153   -1.0   IC50 = 8.7 (D2)
    LY-379268       187   -4.6   EC50 = 8.6 (mGLU2)
    Nicotine        162   1.2    Ki = 9.0 (nACh)
    Medetomidine    200   3.8    EC50 = 8.5 (α2)
    Histamine       111   -0.7   Ki = 8.2 (H3)
    CGP-27492       123   -1.7   IC50 = 8.6 (GABA-B)
    L-670548        179   0.77   Ki = 9.7 (m1)
    Tacrine         198   2.7    IC50 = 8.2 (BChE)

    192 unique structures, 46 targets, 252 activities ≤ 10 nM

    MW ≤ 200 amu

    176 are likely to be charged at pH 7.4
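
    The selection criteria on this slide (MW ≤ 200 amu, activity ≤ 10 nM) can be sketched as a simple filter over the nine compounds listed. This assumes the listed activity values are -log10 of the molar activity, so that 10 nM corresponds to a value of 8; the slide does not state this, although every listed value is above 8. The names `p_activity`, `compounds` and `small_potent` are illustrative.

```python
import math


def p_activity(molar):
    """Convert a molar activity to its negative decimal logarithm."""
    return -math.log10(molar)


# (name, MW, LogP, reported activity value) as listed on the slide.
compounds = [
    ("Carbachol", 143, -3.8, 8.2),
    ("Dopamine", 153, -1.0, 8.7),
    ("LY-379268", 187, -4.6, 8.6),
    ("Nicotine", 162, 1.2, 9.0),
    ("Medetomidine", 200, 3.8, 8.5),
    ("Histamine", 111, -0.7, 8.2),
    ("CGP-27492", 123, -1.7, 8.6),
    ("L-670548", 179, 0.77, 9.7),
    ("Tacrine", 198, 2.7, 8.2),
]

threshold = p_activity(10e-9)  # 10 nM, i.e. about 8 on the -log10 scale

small_potent = [
    name
    for name, mw, logp, value in compounds
    if mw <= 200 and value >= threshold
]
print(small_potent)
```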

  • Acknowledgments

    • Maria Mracec, Liliana Ostopovici, Ramona Rad, Alina Bora, Ionela Olah, Marius Olah, Magdalena Banda (Timisoara Institute of Chemistry of the Romanian Academy) and TIO entered data into WOMBAT
    • Marius Olah wrote the database interfaces
    • Maria Mracec, Marius Olah and TIO did the keyword characterization
    • Marius Olah, Maria Mracec, Cristian Bologa and TIO performed structural error checking
    • Vera Povolna and David Weininger (Metaphorics) implemented WOMBAT@FEDORA

    The contents of this talk are copyright © Tudor I. Oprea 2004. All rights reserved

  • http://www.euro-qsar2004.org

    15th European Symposium on Quantitative Structure-Activity Relationships
    Istanbul / Turkey, 05-10 September 2004

    EuroQSAR 2004

    Chair: Prof. Dr. Esin AKI ŞENER, [email protected]
    Co-Chair: Prof. Dr. İsmail YALÇIN, [email protected]
    Address for Correspondence: Armoria Congress, [email protected]
