Representation of Markush structures — from molecules ...bulletin.acscinf.org/PDFs/240nm76.pdf ·...

August 2010, ACS National meeting, Boston Representation of Markush structures — from molecules towards patents Szabolcs Csepregi Solutions for Cheminformatics


August 2010, ACS National meeting, Boston

Representation of Markush structures — from molecules towards patents

Szabolcs Csepregi

Solutions for Cheminformatics

Contents

•  ChemAxon

•  What are Markush structures?

•  How to get them?

•  What can be done with them?
   –  Enumeration
   –  Storage, search

•  Challenges in chemical representation

•  Under development

ChemAxon

•  Cheminformatics toolkits and applications

•  HQ: Budapest, Hungary

•  Founded: 1998

•  Main customers: pharma, biotech, publishing

•  3rd party applications and web sites (e.g. Integrity, Reaxys, PDB ligand search, ELNs, registration systems, etc.)

ChemAxon

Main products:
–  Structure drawing & visualization (Marvin family)
–  Chemical DB tools (JChem family)
–  Property predictions (Calculator plugins)
–  Drug discovery tools (Reactor, JKlustor, etc.)

Development strategy: customer-driven

What are Markush structures and how to get them?

Markush structures

Generic notation for describing many molecules (= Markush library) in a compact form.

Main usage:
–  Combinatorial chemistry
–  Chemistry-related patents

Markush structures

•  Current features handled:
   –  R-groups
   –  Atom lists, bond lists
   –  Position variation bond
   –  Link nodes
   –  Repeating units
   –  Homology groups (aryl, alkyl, etc.)

ChemAxon Markush project

Goals:
–  Extend structural search capabilities to combinatorial Markush structures
–  Markush enumeration

Complications:
–  Practical examples may be very complex; methods using explicit enumeration may be impossible
–  Extension of current molecular formats (generic features)

Timeline:
–  Pilot study started in 2005 Q4
–  First prototype shown at UGM, June 2006
–  Released in JChem 5.0, 2008
–  Markush DARC format support in 5.3.0, 2010

How to get Markush structures?

•  Drawing – Marvin Sketch

How to get Markush structures?

•  Patent literature – Markush DARC format (*.vmn)

•  Compatible with Thomson Reuters MMS patent Markush database (Test set available.)

How to get Markush structures?

Combinatorial chemistry – Reagent clipping
1.  Replace reacting group with attachment point (Reactor tool)
2.  Turn fragments into R-group definitions (Molconvert tool)
3.  Add a scaffold (Molconvert tool)
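
The three clipping steps can be sketched in a few lines of Python. This is a toy illustration only, not the Reactor or Molconvert workflow: it assumes every reagent is a carboxylic acid written as a SMILES string ending in `C(=O)O`, uses `*` as the attachment point, and the function names and the scaffold label `R1-N1CCCCC1` are hypothetical.

```python
# Toy illustration only: real reagent clipping is reaction-aware; here a
# plain string suffix stands in for the reacting group.

def clip_reagent(smiles, reacting_group="C(=O)O", clipped="C(=O)*"):
    """Step 1: replace the reacting group with an attachment point ('*')."""
    if not smiles.endswith(reacting_group):
        raise ValueError(f"{smiles!r} does not end with {reacting_group!r}")
    return smiles[: -len(reacting_group)] + clipped


def build_markush(scaffold, r_name, reagents):
    """Steps 2-3: collect clipped fragments as one R-group, add a scaffold."""
    fragments = []
    for smiles in reagents:
        fragment = clip_reagent(smiles)
        if fragment not in fragments:  # keep each definition once
            fragments.append(fragment)
    return {"scaffold": scaffold, "r_groups": {r_name: fragments}}


# Hypothetical reagent list and scaffold label.
acids = ["CC(=O)O", "CCC(=O)O", "c1ccccc1C(=O)O"]
markush = build_markush("R1-N1CCCCC1", "R1", acids)
print(markush)
```

A suffix match is fragile (it depends on how the SMILES happens to be written), which is why a reaction-based tool is the realistic choice for step 1.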

How to get Markush structures?

Combinatorial chemistry – R-group decomposition
1.  Filter and identify ligands in chemical library
2.  Create Markush structure from R-table (R-group decomposition tool)
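
Step 2 amounts to collapsing the columns of an R-table into R-group definitions. A minimal sketch, assuming the decomposition has already produced one row per molecule with a substituent for each R position (the row format and function name are hypothetical, not the R-group decomposition tool's output format):

```python
def markush_from_r_table(rows):
    """Collect the distinct substituents seen at each R position."""
    definitions = {}
    for row in rows:
        for r_name, fragment in row.items():
            members = definitions.setdefault(r_name, [])
            if fragment not in members:
                members.append(fragment)
    return definitions


# Hypothetical R-table: three library molecules on a common scaffold.
r_table = [
    {"R1": "F", "R2": "C"},
    {"R1": "Cl", "R2": "C"},
    {"R1": "F", "R2": "OC"},
]
definitions = markush_from_r_table(r_table)
print(definitions)
```

Note that the resulting Markush structure is a generalization: it covers every R1/R2 combination (four here), not only the three molecules the table was built from.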

What to do with them?

Markush Enumeration

•  Markush enumeration plugin
   –  Full enumeration
   –  Selected parts only
   –  Random enumeration
   –  Calculate library size
   –  Scaffold alignment and coloring
   –  Markush code
   –  Optional example homology group enumeration
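
Full enumeration, random enumeration and library size can be illustrated on a toy R-group-only Markush structure. The sketch below is not the plugin's API: the scaffold is a hypothetical SMILES template with `{R1}`/`{R2}` placeholders, and all names are made up for illustration.

```python
import itertools
import math
import random

# Hypothetical toy Markush structure: SMILES template plus R-group lists.
scaffold = "c1ccc({R1})cc1{R2}"
r_groups = {"R1": ["F", "Cl", "Br"], "R2": ["C", "CC", "OC"]}


def library_size(r_groups):
    """Library size is the product of the R-group list sizes (no enumeration)."""
    return math.prod(len(members) for members in r_groups.values())


def enumerate_all(scaffold, r_groups):
    """Full enumeration: one structure per combination of definitions."""
    names = list(r_groups)
    for combo in itertools.product(*(r_groups[n] for n in names)):
        yield scaffold.format(**dict(zip(names, combo)))


def random_member(scaffold, r_groups, rng):
    """Random enumeration: pick one definition per R-group independently."""
    return scaffold.format(**{n: rng.choice(m) for n, m in r_groups.items()})


print(library_size(r_groups))
print(list(enumerate_all(scaffold, r_groups))[:3])
print(random_member(scaffold, r_groups, random.Random(0)))
```

The size calculation and the random pick never build the full list, which is what keeps them usable when the library is too large to enumerate explicitly.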

Markush storage & search

•  JChem Base and Instant JChem

•  No enumeration involved

•  Can handle complex Markush structures (10^40 or more)

•  Substructure and Full structure search

•  Broad translation of homology groups is supported. (Homology in DB, specific in query.)
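
Why search can avoid enumeration is easiest to see on a toy model: a specific query matches if each R position has at least one compatible definition, so the work grows with the sum of the list sizes rather than their product. The sketch below is an assumption-laden simplification (fragment names instead of graphs, a hypothetical `HOMOLOGY` table), not JChem's search algorithm. It also shows the direction of broad translation named on the slide: homology group in the database, specific group in the query.

```python
# Hypothetical homology table: group name -> specific fragments it covers.
HOMOLOGY = {
    "alkyl": {"methyl", "ethyl", "propyl"},
    "aryl": {"phenyl", "naphthyl"},
}


def member_matches(db_member, query_fragment):
    """A DB homology group matches any specific fragment it covers."""
    if db_member in HOMOLOGY:
        return query_fragment in HOMOLOGY[db_member]
    return db_member == query_fragment


def markush_contains(definitions, query):
    """Full-structure test: every R position needs one matching definition."""
    return all(
        name in query
        and any(member_matches(m, query[name]) for m in members)
        for name, members in definitions.items()
    )


db_markush = {"R1": ["alkyl", "chloro"], "R2": ["aryl"]}
print(markush_contains(db_markush, {"R1": "ethyl", "R2": "phenyl"}))
print(markush_contains(db_markush, {"R1": "fluoro", "R2": "phenyl"}))
```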

Markush storage & search

Substructure hit visualization

Query

Result in original Markush

Markush storage & search

Substructure hit visualization: "Markush structure reduction"

Query

Result in original Markush

Reduced result
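
One plausible reading of "reduction", sketched on a toy model: keep only the R-group definitions that are compatible with the query hit, and leave unconstrained positions as they are. The slide shows this only as a picture, so the logic and names below are assumptions rather than the actual implementation.

```python
def reduce_markush(definitions, query):
    """Keep only definitions compatible with the query.

    `query` maps some R positions to a required fragment; positions it does
    not mention stay unchanged. Returns None when there is no hit.
    """
    reduced = {}
    for name, members in definitions.items():
        wanted = query.get(name)
        if wanted is None:
            reduced[name] = list(members)  # unconstrained position
            continue
        kept = [m for m in members if m == wanted]
        if not kept:
            return None  # query cannot match this Markush structure
        reduced[name] = kept
    return reduced


# Hypothetical Markush structure and partial query.
original = {"R1": ["F", "Cl", "Br"], "R2": ["C", "OC"], "R3": ["N", "O"]}
print(reduce_markush(original, {"R1": "Cl"}))
```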

Main use cases

•  Patent search hits refining / visualization,

•  White space analysis,

•  Patent busting,

•  Markush structure curation,

•  In-house storage of small Markush DB,

•  etc...

MMS evaluation Instant JChem project

Challenges in chemical representation (solved)

Representation - What we already had

Generic notation in queries:

•  Atom lists, bond lists

•  R-group queries (Problem: RGFile R-logic and patent R-logic are different! - Solution: Just ignore R-logic.)

•  Link nodes

•  Some generic atoms (X) – represented as pseudo atoms.

Single or double

Challenge 1: Attachment point

•  Multiple – ligand order and attachment order
   Heavily used in Markush DARC (up to 8 attachments!)

•  Represented as atom property

Parent group (root)

R-group definitions

Order of ligands for G15 (R15)

Attachment points for definitions

Challenge 1: Attachment point

•  Embedded R-groups: Grandparent relations may be needed between attachment points:

G3's attachment point "1" is mapped to G4's attachment point "1"

Challenge 1: Attachment point

•  Temporary representation: attached data
   –  ligand order
   –  attachment point in R-group definition
   –  still an atom property
   –  ligand order sometimes in parent group (grandparent relation)

Order of ligands for R2

Attachment points for definitions

Challenge 1: Attachment point

•  Real attachment object with bond (under development)

–  eliminates need for grandparent relations table:

Order of ligands for R4

Attachment point for R3

Order of ligands for R2

Attachment points for definitions

Challenge 2: Abbreviations

•  Superatom S-groups were originally in Marvin (~700 built-in shortcuts)
   –  Expand / Contract
   –  Search code already handled them in specific structures.

•  Markush DARC had 21 shortcuts + 31 peptides.

•  Attachment point next to abbreviations
   –  Needed to be visible "outside" and handled correctly "inside".
   –  New attachment point solves this also:

Challenge 3: Homology groups (generics)

•  Pseudoatom representation

•  Naming (Still looking for the most descriptive "long" names.)

•  Extra conditions: general atom property framework (under development)

Markush DARC name    "Long name"
CHK                  alkyl
CYC                  carboAlicyclyl
ARY                  carboAryl
HEA                  heteroMonoAryl
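
The naming table is a straightforward code-to-name mapping. A minimal sketch using only the four pairs listed on the slide (the dictionary and function names are hypothetical, and the real list of homology groups is longer):

```python
# The four Markush DARC codes shown on the slide and their "long" names.
DARC_LONG_NAMES = {
    "CHK": "alkyl",
    "CYC": "carboAlicyclyl",
    "ARY": "carboAryl",
    "HEA": "heteroMonoAryl",
}


def long_name(darc_code):
    """Translate a Markush DARC homology code; unknown codes pass through."""
    return DARC_LONG_NAMES.get(darc_code.upper(), darc_code)


print(long_name("CHK"))
print(long_name("hea"))
```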

Challenge 4: Frequency variation

•  Link nodes

•  Repeating units: modified SRU

•  Multipliers:
   –  special SRU, 1 outer bond
   –  (Currently visualization only.)

•  Moieties:
   –  special SRU, 0 outer bonds
   –  to describe (variable) stoichiometry
   –  (Currently visualization only.)
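
Frequency variation through a repeating unit can be pictured as one structure per allowed repeat count. The sketch below works on linear SMILES strings and is only an illustration under that assumption; the function name and the example chain are hypothetical, and it does not model the modified SRU S-group itself.

```python
def expand_repeating_unit(prefix, unit, suffix, min_count, max_count):
    """Return one linear SMILES per allowed repetition count of `unit`."""
    if min_count < 0 or max_count < min_count:
        raise ValueError("need 0 <= min_count <= max_count")
    return [prefix + unit * n + suffix for n in range(min_count, max_count + 1)]


# Hypothetical example: O-(CH2)n-N with n = 1..3.
chains = expand_repeating_unit("O", "C", "N", 1, 3)
print(chains)
```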

Challenge 5: Position variation bond

•  New special S-group type

•  Relocatable multicenter atom represents group for bonds

•  Also useful to represent multicenter charge and coordination compounds:
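
A position variation bond says that a substituent may sit on any one of a given set of atoms. Enumerating it means producing one structure per candidate atom, as in this sketch. It assumes the ring is supplied as a list of SMILES atom tokens and does not model the multicenter S-group; all names are hypothetical.

```python
def enumerate_positions(ring_tokens, candidate_positions, substituent):
    """Attach `substituent` as a branch to each candidate ring atom in turn."""
    structures = []
    for position in candidate_positions:
        tokens = list(ring_tokens)
        tokens[position] = f"{tokens[position]}({substituent})"
        structures.append("".join(tokens))
    return structures


# Hypothetical example: chlorine on any of three pyridine carbons.
pyridine = ["n1", "c", "c", "c", "c", "c1"]
isomers = enumerate_positions(pyridine, [1, 2, 3], "Cl")
print(isomers)
```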

What (else) keeps us busy

Under development

•  Further improvements in Markush DARC support:
   –  Ring segment groups (XX form a ring)
   –  New, more robust representation for attachment points
   –  Homology properties (low alkyl, fused aryl, C1-3, N2-5, etc.)

•  Ranking of results

•  New ways to navigate/zoom Markush structures

•  Maximum common substructure search

•  Biased enumeration and covering Markush – based on examples in patent.

•  Improve search speed to handle larger Markush sets.

•  Other Markush formats – Markush InChI standard committee

•  Overlap analysis of Markush structures

•  Conditions for Markush variables

Summary

•  Markush structure storage, search and enumeration at ChemAxon, now with patent coverage

•  Compatible patent data is available from Thomson Reuters

•  Well thought out chemical representation

•  Continuous development, improvements in the pipeline

Acknowledgements

•  Development team: Nóra Máté, Róbert Wágner, Szilárd Dóránt, Tamás Csizmazia, Tim Dudgeon, Erika Bíró, Ali Baharev, Ferenc Csizmadia, et al.

•  Tim Miller, Steve Hajkowski, Gez Cross and Linda Clark at Thomson Reuters for useful discussions, help and example Markush DARC files

•  Many early adopters and colleagues within the field for suggestions and feedback