Post on 11-Aug-2020

The Derwent Markush Resource (DWPIM) on STNext®

Webex, 30.01.2020

Thomas Stengel (Product Manager Chemistry & Patents)

Agenda

• DWPIM Database information, content & Indexing

• Basic Markush searching techniques

• Advanced Markush searching techniques (descriptors, roles)

• DWPIM compared to MMS and MARPAT

• Latest DWPIM Release: Enhancements & Outstanding Issues

• Special Topics

DWPIM Database information, content & Indexing

Information about DWPIM on STNext

Webinars about DWPIM on STNext

Latest DWPIM seminars

• DWPIM on STNext: Introduction and New Functionality

• Multifile Structure Searching on STNext: Why is it Important? How is it Done?

• In-depth Structure Searching in DWPIM on STNext: Best Practices

• The Derwent Markush Resource (aka DWPIM) on STNext

What is the Derwent Markush Resource (DWPIM)?

DWPIM record: AN 2121-56402...

Crossover to DWPI (step 1): => TRA MCN /AN

DWPI record:

TI New pyrazolopyrimidine derivative are Bruton's tyrosine kinase inhibitors used to treat cancer comprises solid cancer e.g. brain tumor, malignant astrocytoma and blood cancer e.g. leukemia, autoimmune disease e.g. rheumatoid arthritis

CMC UPB 20181206 ... RIN: 01174 01732 MCN: 2121-56402-N 2121-56402-P

Callouts: Accession Number (AN), Markush Compound Number (MCN), Chemical Code (CMC)

Indexing guidelines for DWPIM

• Markush structures are indexed from:

‒ the patent claims

‒ the embodiment if a 'wider disclosure' is indicated

• The maximum number of Markush and DCR structures indexed per DWPI basic patent is 99

‒ Specific structures which cannot be covered using DCR indexing are covered within a DWPIM Markush structure

‒ Specific structures prior to DCR implementation (1999) can be found in DWPIM

Example: Markush of specific compounds (9213-F8101)

In DWPI (US 5095152 A):

TI New 5-formyl-1,1,2,3,3,4,6-hepta:methyl:indane - which is organoleptic agent with fragrant musk-like aroma

CMC ... M3 *01* G023 G034 G036 G038 G039 G212 J011 J241 J431 K0 K840 L143 M210 M211 M212 M240 M283 M320 M414 M510 M520 M531 M540 M710 Q253 Q254 R021 R022 R023 M903 M904

MCN: 9213-F8101-N 9317-G7301-M 9317-G7301-N 9317-G7301-Q 9319-A0801-N

In DWPIM: (Markush structure image)

Example: Markush of Series of specific compounds (9830-C8901)

Differences DWPIM Structure vs Patent Claim

• Indexing conventions

‒ Keto-enol tautomerism (keto form is the preferred one in DWPIM)

• Use of Markush terminology and shortcuts

‒ DWPIM: Use of superatom terms (CHK, ARY etc.) & shortcuts (CO2, SO3 etc.)

• Allowing for variable attachments

‒ All parts of the structure where the attachment can be made by a variable group are assigned G group elements

• Allowing for exceptions mentioned in the patent (provisos)

‒ E.g. if A = value x, then B can take on a subset of its possible values

• Allowing for system limits

‒ Means sometimes one structure is split into 2 or more structures

Comparison of patent claim with DWPIM structure is helpful for optimizing the query

Basic Markush searching techniques

Ways to generate structures

• Structure Editor

• CAS REGISTRY Number

• InChI Strings

• SMILES Strings

• Import structure

• .cxf format fully supported

• .mol, .str formats supported for specific structures

• Command line

Structure Editor Preferences

Derwent Markush Attributes for nodes and bonds

When the structure is uploaded for the session, all Derwent attributes are displayed in the transcript (ring lock indicated with bold bonds)

DWPIM Structure Searching and Crossover to DWPI is Similar to REGISTRY/CAplus

1. DWPIM structure search

=> FILE DWPIM

=> Uploading structure file: 2018_0006_Structure
L1 STRUCTURE UPLOADED

=> S L1 SSS FULL

100.0% PROCESSED 0 ITERATIONS 1443 ANSWERS
L2 1443 SEA SSS FUL L3

2. Crossover to DWPI

=> FIL WPINDEX

=> S L2
L3 955 L5

3. DWPI display with HIT structures

=> D FULLG AHITSTR

Three Markush Displays

DWPI:

AHITSTR assembled

BHITSTR brief

FHITSTR full

DWPIM:

ASB assembled

BRIEF brief

FULL full

Types of Searches

• Sample (default)

• Subset

• Batch

• Substructure SSS and Closed Substructure CSS

Subset Search in DWPIM

• All valid search types (CSS, SSS) and search scopes (SAMPLE, FULL) may be used in the subset search. The search syntax follows STN conventions, e.g.:

Batch Search in DWPIM

• Increased search time per structure may reduce number of iteration incompletes

• Extended overall search time of 90 minutes may increase number of completed searches

Substructure Search in DWPIM

Separating out the records retrieved with incomplete designation. Preserves hit structures

Meaning of „Iterations“

There is no full file projection available for online and batch searches.

Information on iterations: no meaning, to be ignored!

# of iteration incompletes to give 50 sample records

Best Practice: How to deal with Iteration Incompletes

• Most often very generic Markush structures are of no relevance

• Most often the core structure is not related to the query structure

• If the set needs to be evaluated:

‒ do not analyze structures in DWPIM but cross over the incomplete structures to DWPI

• If possible, narrow the result to a technical area like pharma (B/DC), e.g. by using roles

• Display WPINDEX records with graphic images (incl. structures from the claims), e.g.

=> D AN TI GI

=> D FULLG

• Split up complicated query structures with many possible variations into two or more separate, less complicated structures

Advanced Markush searching techniques (descriptors, roles)

Applying Roles in DWPIM

• SDM: 26 Substance Descriptors (mainly tech- and structure-related)

• MDE: 3 Markush Descriptors (specific generic)

Applying Roles in DWPI (cross-over from DWPIM)

• DCR, DCN: 30 Roles (compound provenance, type, analytics)

Syntax: S L-num(T)(role(s))/MCN

Example: S L-num(T)(CL)/MCN

D HIT

Applying Roles in DWPI (cross-over from DWPIM)

• Frag-codes: > 100 Roles (Pharmaceutical and Agricultural activities, properties and uses)

Syntax: S L-num(P)(role(s))/M0,M2,M3,M4

Example: S L-num(P)(Q25)/M0,M2,M3,M4

D HIT
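When many role-qualified searches have to be prepared, the two command shapes on these slides can be assembled programmatically. The following is only a minimal sketch: the function name `role_query` and its parameters are hypothetical helpers, and the role text is inserted verbatim because the slides do not say how several roles are separated. Only the two command patterns themselves come from the slides.

```python
def role_query(l_number: int, roles: str, fragment_codes: bool = False) -> str:
    """Return an STN search line that qualifies answer set L<l_number> with roles.

    roles is inserted verbatim (for example "CL" or "Q25").
    fragment_codes=False uses the DCR/DCN pattern  S L-num(T)(role(s))/MCN
    fragment_codes=True  uses the Frag-code pattern S L-num(P)(role(s))/M0,M2,M3,M4
    """
    if l_number < 1:
        raise ValueError("L-number must be a positive integer")
    if not roles.strip():
        raise ValueError("at least one role is required")
    if fragment_codes:
        return f"S L{l_number}(P)({roles})/M0,M2,M3,M4"
    return f"S L{l_number}(T)({roles})/MCN"


if __name__ == "__main__":
    print(role_query(3, "CL"))
    print(role_query(3, "Q25", fragment_codes=True))
```

The helper only formats text; it does not validate that a role exists in DWPI.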

DWPIM compared to MMS and MARPAT

DWPIM vs. MMS: content and database structure

MMS: DWPI basic patents + PHARM patents → MMS (Markush + Specific)

STN: DWPI basic patents → DWPIM (Markush) + DCR (Specific)

Summary of key differences DWPIM vs MMS

• Structure displays (hit structures only available in DWPIM)

• Free sites (MMS “closed”, DWPIM “open”)

• Match Level (STN) vs. Translation (Questel)

• Differences in search functions

• Bond values

• VPA Function

• Exclusion of certain elements / fragments

MARPAT℠ and DWPIM complement each other as regards

• Authority coverage

• Compound class coverage

• Indexing policies

‒ Chains / bonds

‒ Generic Nodes

‒ Match Level

‒ G-group numbering

‒ Tautomers

• Time periods covered

Generic nodes (=superatoms) in DWPIM

X preserved

M preserved

DWPIM superatoms defined by properties

till 1990s

Searching POL

(Figure labels: US20060051824, 0323-31403, Query)

Mixed Match Level Rings can be searched

• Only the combination of ML ATOM and ML ANY is allowed (Clarivate indexing rules)

• Hybrid ring systems can only contain the generic node XX.

These records can only be found with ML ANY

Mixed Match Level Rings can be searched

• ML combinations ATOM + CLASS, CLASS + ANY and ATOM + CLASS + ANY are not allowed. The following rules apply:

• The most common type of ML is determined and applied to all ring atoms

• If it is ML ANY, the second most common will be applied to all ring atoms

• For equal numbers of assigned MLs the lower ML is assigned, e.g. for a ring with 3 atoms ML ATOM and 3 atoms ML CLASS the overall ML ATOM will be assigned.
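The three rules above can be read as a small decision procedure. The sketch below is one possible interpretation, not the actual STNext implementation: the function name `effective_ring_levels` is hypothetical, and the ordering ATOM < CLASS < ANY for "lower ML" is an assumption inferred from the slide's 3 + 3 example.

```python
from collections import Counter

# Assumed ordering from "lower" to "higher" match level, inferred from the
# slide's example (3 x ATOM + 3 x CLASS -> ATOM).
ML_ORDER = {"ATOM": 0, "CLASS": 1, "ANY": 2}


def effective_ring_levels(levels):
    """Return the match levels applied to the ring atoms after the rules.

    levels: one entry per ring atom, each "ATOM", "CLASS" or "ANY".
    Uniform rings and the allowed ATOM + ANY mix are returned unchanged.
    """
    counts = Counter(levels)
    unknown = set(counts) - set(ML_ORDER)
    if unknown:
        raise ValueError(f"unknown match level(s): {sorted(unknown)}")

    kinds = set(counts)
    if len(kinds) <= 1 or kinds == {"ATOM", "ANY"}:
        return list(levels)

    # Most common first; ties are broken in favour of the lower match level.
    ranked = sorted(counts, key=lambda ml: (-counts[ml], ML_ORDER[ml]))
    # If the most common level is ANY, fall back to the second most common.
    winner = ranked[0] if ranked[0] != "ANY" else ranked[1]
    return [winner] * len(levels)


if __name__ == "__main__":
    print(effective_ring_levels(["ATOM"] * 3 + ["CLASS"] * 3))
    print(effective_ring_levels(["CLASS"] * 2 + ["ANY"] * 4))
```

The tie-break is encoded in the sort key, so the "second most common" fallback follows the same lower-level preference.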

Indexing inconsistencies Rare example for ML Atom-Class combination within same ring

MARPAT: ML Atom-Class within rings allowed

Carbons: ML Class unlimited ML Atom

Mixed Match Level Rings (ATOM-ANY)

Ring contraction does not take place: ring size of query structure is preserved.

Search in DWPIM (Pyrrolidine ML ANY): ring contraction not allowed

Search in MARPAT (Pyrrolidine ML CLASS): ring contraction allowed

(Figure: DWPIM hit structure and MARPAT hit structure)

Mixed Match Level Rings

ML ANY atom#        completes
None (= all atom)   522
4                   529
3,4                 532
3,4,5               532
3,4,5,6             536
2,3,4,5,6           536
All Class           4858
All ANY             7284

(Figure: query ring with atoms numbered 1 to 6)

Mixed Match Level Rings

Query 1 Hit record Query 2

ML ANY ML ANY

Mixed Match Level Rings

• ML ATOM: Pyridines

• ML CLASS: Pyridines, HEA, HEF

• ML ANY: HEA, HEF, Pyridines, XX, ring containing XX

Tautomers – Carboxylic acids and amides

Case 1: X and Y are different (chains)

Example:

• MARPAT:

− Bonds normalized

− Indexing: double bond at O,S

• DWPIM:

− Bonds as single and double bonds

− Indexing: double bond at O,S

DWPIM MARPAT

3598 3970 229 (indexing inconsistencies)

3970 STR 14 STR 13

Tautomers – Lactams

Case 2: X and Y are different (rings)

Example:

• MARPAT: bonds are normalized

• DWPIM: located double and single bonds

DWPIM MARPAT

278 (indexing inconsistencies)

5248

5249 5251

STR 25 STR 26

Tautomers – Pyridinone type

• MARPAT: N-C-O bonds normalized.

Preferred indexing as 2-pyridinone tautomer (Oxo Rule)

• DWPIM: Aromatization takes priority over tautomerization,

i.e. 2-pyridinol tautomer indexed

Tautomers – Pyridinone type

Bond normalization for pyridinone-type in MARPAT but not in DWPIM

Hits for the two tautomer queries and combined:

DWPIM: 83 + 4 = 87

MARPAT: 52 + 52 = 52

Aromatization rule in DWPIM but some hits are „keto“

Example: Valaciclovir

STR 1 STR 2

DWPIM, MARPAT: N-C-N bonds are normalized in both databases

Tautomers – Imidazoles and Guanidines

Case 3: X and Y are N

DWPIM: Marpat:

Tautomers – Keto-Enol

No bond normalization for keto-enol bonds in DWPIM or MARPAT

Hits for the two tautomer queries and combined:

DWPIM: 4 + 196 = 200

MARPAT: 19 + 20 = 39

Keto rule in DWPIM but some hits are „hydroxy“

Latest DWPIM Release: Enhancements & Outstanding Issues

Nov 2019 Release – Resolved Issues / Enhancements


• Repeating groups enhancements:

• In combination with ring lock

• Starting with 0 (e.g. [0-3])

• Without upper or lower limit (e.g. [2-] or [-2])

• Atom-Atom match with A- and Q-node issue resolved

• Ring-lock function issues resolved
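The repeating-group notations listed above ([0-3], [2-], [-2]) can be illustrated with a small parser. This is a sketch of the notation only, under stated assumptions: the names `parse_repeat_range` and `allows` are hypothetical, an open bound is represented as `None`, and a range with neither bound ("[-]") is rejected because the slide does not mention it. It says nothing about how STNext evaluates repeating groups internally.

```python
import re
from typing import Optional, Tuple

# "[lower-upper]" where either bound may be omitted, e.g. [0-3], [2-], [-2]
_RANGE = re.compile(r"^\[(\d*)-(\d*)\]$")


def parse_repeat_range(text: str) -> Tuple[Optional[int], Optional[int]]:
    """Parse a repeating-group range into (lower, upper); None means open."""
    match = _RANGE.match(text.strip())
    if match is None:
        raise ValueError(f"not a repeat range: {text!r}")
    lower_txt, upper_txt = match.groups()
    if not lower_txt and not upper_txt:
        raise ValueError("at least one bound is required (assumption)")
    lower = int(lower_txt) if lower_txt else None
    upper = int(upper_txt) if upper_txt else None
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return lower, upper


def allows(text: str, count: int) -> bool:
    """True if `count` repetitions fall inside the range described by text."""
    lower, upper = parse_repeat_range(text)
    return (lower is None or count >= lower) and (upper is None or count <= upper)


if __name__ == "__main__":
    for notation in ("[0-3]", "[2-]", "[-2]"):
        print(notation, parse_repeat_range(notation))
```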

Ring and Chain Expansion via free sites

• Chain nodes have at least 1 free site and can therefore match to other chain nodes:

CHK → CHE, CHY

CHE → CHY

• Ring nodes have at least 2 free sites and can therefore match to other ring nodes:

Cb → HEF

CYC → ARY, HEF

HET → HEF

HEA → HEF

ARY → HEF

• Currently the ring expansion cannot be avoided (attribute „monocyclic“ not applicable).
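The node pairs on this slide can be captured as a lookup table. The sketch below reads each pair as "query node → nodes it can additionally match"; that reading is an interpretation, since the arrows are not preserved in the transcript. The names `CHAIN_EXPANSION`, `RING_EXPANSION` and `can_match` are hypothetical, and the `ring_expansion` switch is an illustrative device that mimics the planned behaviour (no ring expansion) described later in the deck.

```python
# Query node -> generic nodes it can additionally match via free sites
# (interpretation of the slide; arrows reconstructed).
CHAIN_EXPANSION = {
    "CHK": {"CHE", "CHY"},
    "CHE": {"CHY"},
}
RING_EXPANSION = {
    "Cb": {"HEF"},
    "CYC": {"ARY", "HEF"},
    "HET": {"HEF"},
    "HEA": {"HEF"},
    "ARY": {"HEF"},
}


def can_match(query_node: str, target_node: str, ring_expansion: bool = True) -> bool:
    """True if query_node matches target_node directly or via expansion."""
    if query_node == target_node:
        return True
    if target_node in CHAIN_EXPANSION.get(query_node, set()):
        return True
    if ring_expansion and target_node in RING_EXPANSION.get(query_node, set()):
        return True
    return False


if __name__ == "__main__":
    print(can_match("CYC", "ARY"))
    print(can_match("CYC", "ARY", ring_expansion=False))
```

Keeping chain and ring tables separate makes it easy to switch off only the ring part, which is the change announced for the later release.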

Ring and Chain Expansion via free sites – Example 1

Query:

Pyridine ring ML Class

Without ring expansion: 6113 hits

With ring expansion: 8639 hits

Result:

AN:2149-71702

Pyridine HEA HEF

Ring expansion

Current implementation ! Will be reversed (Q1 2020)

Ring and Chain Expansion via free sites – Example 2


ML Atom

SSS Full

Current implementation ! Will be reversed (Q1 2020)

Ring and Chain Expansion via free sites – Planned Changes

• Chain nodes have at least 1 free site and can therefore match to other chain nodes:

CHK → CHE, CHY

CHE → CHY

• No ring expansion takes place for CYC, ARY, Cb, HEA, HET.

• Attribute „monocyclic/polycyclic“ for CYC and ARY.

Queries with Fragments do not work correctly


• Hit records may be missed due to a matching problem.

Example: STR2 does not include all hits retrieved with STR1

STR1 STR2

Adjust bond value for Carboxylic acid derivatives to e/n

RR

Switch from n (default) to e/n

• On STN, carboxylic acid derivatives as well as the corresponding phosphoric, sulfonic and selenic acid derivatives are defined with normalized bonds

• Affected groups: -COOH, -CO2H, -COSH, -CSSH, -CS2H, -OPO3H2, -PO3H2, -PO2H, -OSO3H, -SO3H, -SO2H, -SeO3H, -SeO2H

• How to handle this issue for query structures:

e/n (default)

Avoid shortcuts since their bond values can’t be changed !

DWPIM WPIX Cross-Over: Hit Structures

• Recommendation: hit structure information is preserved within DWPIM. Therefore the answer set should always be saved in DWPIM as well.

• Records from DWPIM and DWPI can be mutually allocated by Markush number (AN in DWPIM corresponds to MCN in DWPI).
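The AN/MCN correspondence can be used to line up saved DWPIM answers with DWPI records. The sketch below assumes, based only on the example records shown earlier in the deck (AN 2121-56402 and MCN 2121-56402-N, 2121-56402-P), that an MCN is the DWPIM accession number followed by a hyphen and a one-letter suffix. The function names and the regular expression are hypothetical.

```python
import re
from collections import defaultdict

# Assumed MCN shape, taken from the examples in the deck:
# four digits, hyphen, five characters, hyphen, one capital letter.
_MCN = re.compile(r"^(\d{4}-[0-9A-Z]{5})-([A-Z])$")


def mcn_to_an(mcn: str) -> str:
    """Return the DWPIM accession number contained in a DWPI MCN."""
    match = _MCN.match(mcn.strip())
    if match is None:
        raise ValueError(f"unexpected MCN format: {mcn!r}")
    return match.group(1)


def group_by_an(mcns):
    """Map each DWPIM accession number to the MCN suffixes seen for it."""
    grouped = defaultdict(list)
    for mcn in mcns:
        match = _MCN.match(mcn.strip())
        if match is None:
            raise ValueError(f"unexpected MCN format: {mcn!r}")
        grouped[match.group(1)].append(match.group(2))
    return dict(grouped)


if __name__ == "__main__":
    print(mcn_to_an("2121-56402-N"))
    print(group_by_an(["9213-F8101-N", "9317-G7301-M", "9317-G7301-N"]))
```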

Reporting Function (including hit structures)

• Reporting as „Substance Report“ does not work (fix planned in Q1/2020)

• Workaround: Use the patent template and drag the „substance descriptor“ field for DWPIM and „manual code“ field for DWPI reporting.

Special Topics

XX as Query Node

Possible hit records

XX node for „linker“ searches

Complexity ↑, # incompletes ↑, search time ↑

Nested G-Group search – Example 1

R1

R2

Search time: „seconds“

Nested G-Group search – Example 2

Cy, Q

R4: X, Ak, Cy

R1

R2

R3

Search time: „minutes“

Nested G-Group search – Example 3

Example 3 search: limit of 1024 possible variations reached. Error message: SYSTEM ERROR

(Figure: query with G-groups R1, R2, R3 and R4: X, Ak, Cy)
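Why nesting drives a query towards the 1024-variation limit can be illustrated by counting combinations. The sketch below is a simplified model and not the STNext algorithm: it assumes a variation is one choice of alternative per G-group, with nested groups multiplying. The data layout, the function names and the example groups are hypothetical; only the limit of 1024 comes from the slide, and whether the boundary itself is allowed is not stated there.

```python
from math import prod

VARIATION_LIMIT = 1024  # limit quoted on the slide


def count_variations(groups, root):
    """Count variations of G-group `root`.

    groups maps a G-group name to its alternatives; each alternative is a
    (label, nested_group_names) pair. Alternatives add up, nested groups
    inside one alternative multiply.
    """
    def count(name, stack):
        if name in stack:
            raise ValueError(f"cyclic G-group definition at {name}")
        total = 0
        for _label, nested in groups[name]:
            total += prod(count(child, stack + (name,)) for child in nested)
        return total

    return count(root, ())


def within_limit(groups, root):
    """Assumed check: the query is acceptable up to and including the limit."""
    return count_variations(groups, root) <= VARIATION_LIMIT


if __name__ == "__main__":
    # Hypothetical query: R1 is either H or a ring carrying R2 and R3.
    example = {
        "R1": [("H", []), ("ring", ["R2", "R3"])],
        "R2": [("X", []), ("Ak", []), ("Cy", [])],
        "R3": [("X", []), ("Ak", [])],
    }
    print(count_variations(example, "R1"))
```

In this model each additional nested group with several alternatives multiplies the total, which is consistent with the jump from "seconds" to "minutes" to a system error across the three examples.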

Matching of carbonyls adjacent to carbon chains

General Spin-off Rules:

Spin-offs are always generated from real nodes but never from generic nodes.

• E.g. CO1-CHK group

• Starting from CO1:

‒ a spin-off for the CO1 group is generated (CHK*), adjacent CHK included

‒ Chain contraction takes place

• Starting from CHK:

‒ No spin-off is generated for the generic node CHK

‒ No chain contraction with adjacent CO1 group

Dependency on Search Direction

Example Query –CO-CH2-CH3

1) Search direction from left to right: Match!

Query: -CO-CH2-CH3 Spin-off: -CO-CHK(C=2)

Target: -CO-CHK -CO-CHK

2) Search direction from right to left: No match!

Query: -CO-CH2-CH3 Spin-off: CHK(C=3) O

Target: -CO-CHK CHK-CO-

Example 1

(Figure: query and index structures with CO1 and CHK* spin-offs; one pairing is marked „match“, the other „no match“)

• Search direction starting at imidazolinone N

• No hit because there is no match for CO1 moiety

Example 2

(Figure: query and index structures with CO1 and CHK* spin-offs, marked „match“)

• Search direction starting at NH2

• Hit (matching is complete)

For more information …

CAS: help@cas.org, Support: www.cas.org

FIZ Karlsruhe: helpdesk@fiz-karlsruhe.de, Support: www.stn-international.de