Substance and Chemical Structure Searching in CAS REGISTRY℠ and DCR on new STN®
Agenda
• Settings and Cross File Search operators
• Substance search fields in CAS REGISTRY℠
• Structure examples
• DCR/DWPI substance and structure search
• Multifile search example
Search settings for substance and structure searches
The default scope for structure searches is FULL.
Automatic Cross File Search can be toggled on or off under General Search.
Display settings for substance and structure searches
• Hit structure default displays in bibliographic records can be set on the Display tab in Settings.
• DWPI℠ hit structures from DCR display only in FULL format.
• The CAplus℠ default format can be set with Show hit structures.
• Both can be toggled in the record display.
Cross File Search with REFX to find bibliographic references indexed to substances
Query: (REFX L1), where L1 is a substance search in REGISTRY or DCR
Cross File Search results:
• CAplus: retrieves bibliographic references indexed to the Registry Numbers of the substances in L1
• DWPI/DCR: retrieves bibliographic references indexed to the DCR numbers of the substances in L1
Alternatively you can enter any substance query instead of L1, e.g. (REFX CAFFEINE/CN).
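Conceptually, REFX follows an index from substance identifiers to the references those substances are indexed in. The Python sketch below models only that set logic, using a small hypothetical in-memory index; the identifiers `RN-A`, `ref-1` and so on are placeholders, not real Registry Numbers or accession numbers, and nothing here reflects how STN actually implements REFX.

```python
# Hypothetical toy index: substance identifier -> references it is indexed in.
# Placeholder IDs only. In CAplus these would be CAS Registry Numbers and
# bibliographic records; in DWPI/DCR, DCR numbers and patent records.
SUBSTANCE_TO_REFS: dict[str, set[str]] = {
    "RN-A": {"ref-1", "ref-2"},
    "RN-B": {"ref-2", "ref-3"},
    "RN-C": {"ref-4"},
}


def refx(substances: set[str], index: dict[str, set[str]]) -> set[str]:
    """Return every reference indexed to at least one substance in the answer set."""
    references: set[str] = set()
    for substance in substances:
        # Substances with no indexed references simply contribute nothing.
        references |= index.get(substance, set())
    return references


if __name__ == "__main__":
    l1 = {"RN-A", "RN-B"}  # stands in for a REGISTRY or DCR answer set L1
    print(sorted(refx(l1, SUBSTANCE_TO_REFS)))  # ['ref-1', 'ref-2', 'ref-3']
```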
Cross File Search with SUBX to find all substance records indexed to references
Query: (SUBX L1), where L1 is a search in CAplus or DWPI
Cross File Search results:
• REGISTRY: retrieves all REGISTRY records indexed to the bibliographic records in L1 from CAplus
• DWPI/DCR: retrieves all DCR substance records indexed to the bibliographic records in L1 from DWPI
Alternatively you can enter any bibliographic query instead of L1, e.g. (SUBX (L’OREAL/PA AND A61K/IPC,CPC)).
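SUBX runs the same index in the opposite direction. A minimal sketch, again with a hypothetical toy index and placeholder IDs, inverts the substance-to-reference mapping and then collects every substance indexed to the references in the answer set:

```python
from collections import defaultdict

# Hypothetical toy index (placeholder IDs): substance -> references.
SUBSTANCE_TO_REFS: dict[str, set[str]] = {
    "RN-A": {"ref-1", "ref-2"},
    "RN-B": {"ref-2", "ref-3"},
    "RN-C": {"ref-2"},
    "RN-D": {"ref-4"},
}


def invert(index: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build the reverse index: reference -> substances indexed in it."""
    reverse: defaultdict[str, set[str]] = defaultdict(set)
    for substance, references in index.items():
        for reference in references:
            reverse[reference].add(substance)
    return dict(reverse)


def subx(references: set[str], index: dict[str, set[str]]) -> set[str]:
    """Return all substances indexed to any reference in the answer set."""
    reverse = invert(index)
    substances: set[str] = set()
    for reference in references:
        substances |= reverse.get(reference, set())
    return substances


if __name__ == "__main__":
    l1 = {"ref-2"}  # stands in for a CAplus or DWPI answer set L1
    print(sorted(subx(l1, SUBSTANCE_TO_REFS)))  # ['RN-A', 'RN-B', 'RN-C']
```

Because SUBX returns every substance indexed to a reference, not only the one that led you to it, a single record can yield a large substance answer set.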
CAS REGISTRY℠ and CAplus℠ on new STN
• CAS REGISTRY℠: 94+ million substances
• CAplus℠: 41+ million references
• CAS Registry Numbers® link the two databases: REFX crosses from REGISTRY to CAplus, SUBX from CAplus to REGISTRY
Substance search fields in CAS REGISTRY
• Chemical Names (CN)
• Chemical Name Segments (CNS)
• Molecular Formula (MF)
• Component Molecular Formula (CMF)
• Element Symbol (ELS)
Look up Chemical Names in the Term Explorer
Chemical Name Segments are parsed at punctuation and spaces
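As a rough model of segment parsing, the sketch below splits a CA-index-style name at every character that is not a letter or digit and lowercases the pieces. Treating all such characters as boundaries is an assumption made for illustration; STN's own parsing rules may differ in detail.

```python
import re


def name_segments(name: str) -> list[str]:
    """Split a chemical name into lowercase segments at spaces and punctuation.

    Approximation only: assumes every character other than a letter or digit
    acts as a segment boundary.
    """
    return [segment.lower() for segment in re.split(r"[^0-9A-Za-z]+", name) if segment]


if __name__ == "__main__":
    print(name_segments("Benzoic acid, 4-hydroxy-, methyl ester"))
    # ['benzoic', 'acid', '4', 'hydroxy', 'methyl', 'ester']
```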
Chemical Name Segments can have left and right truncation
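Left and right truncation can be pictured as prefix, suffix, or substring matching on individual segments. This sketch uses boolean flags rather than STN's truncation symbols (check STN Help for those); the function names and the example segments are hypothetical.

```python
def segment_matches(segment: str, stem: str, left: bool = False, right: bool = False) -> bool:
    """Match one name segment against a stem with optional left/right truncation."""
    segment, stem = segment.lower(), stem.lower()
    if left and right:
        return stem in segment           # any characters allowed on both sides
    if left:
        return segment.endswith(stem)    # any characters allowed before the stem
    if right:
        return segment.startswith(stem)  # any characters allowed after the stem
    return segment == stem               # no truncation: whole segment must match


def any_segment_matches(segments: list[str], stem: str,
                        left: bool = False, right: bool = False) -> bool:
    """True if at least one segment of a name matches the (truncated) stem."""
    return any(segment_matches(s, stem, left, right) for s in segments)


if __name__ == "__main__":
    # Segments of "2,4-Dihydroxybenzoic acid" split at punctuation and spaces.
    segments = ["2", "4", "dihydroxybenzoic", "acid"]
    print(any_segment_matches(segments, "hydroxy"))                         # False
    print(any_segment_matches(segments, "hydroxy", left=True, right=True))  # True
```

Without truncation, "hydroxy" misses because it is embedded in the longer segment "dihydroxybenzoic"; truncating on both sides finds it.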
Molecular formula searches and Component Molecular Formula searches
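One way to picture the MF/CMF distinction uses the multi-component notation seen in REGISTRY answers such as "C54 H104 O18 S3 . 3 H3 N": components are separated by " . " and may carry a multiplier such as "3" or "x". An MF search targets the formula of the whole substance, while a CMF search asks whether a single component has the given formula. The sketch below assumes space-separated element tokens and is only a model of that idea, not STN's matching logic.

```python
import re


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a space-separated formula such as 'C54 H104 O18 S3' into element counts."""
    counts: dict[str, int] = {}
    for token in formula.split():
        match = re.fullmatch(r"([A-Z][a-z]?)(\d*)", token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        element, n = match.group(1), int(match.group(2) or 1)
        counts[element] = counts.get(element, 0) + n
    return counts


def components(mf: str) -> list[dict[str, int]]:
    """Split a multi-component MF at ' . ' and drop a leading multiplier ('3', 'x')."""
    parsed = []
    for part in mf.split(" . "):
        tokens = part.split()
        if tokens and re.fullmatch(r"\d+|x", tokens[0]):
            tokens = tokens[1:]
        parsed.append(parse_formula(" ".join(tokens)))
    return parsed


def cmf_match(mf: str, query: str) -> bool:
    """CMF-style check: does any single component have exactly the query formula?"""
    target = parse_formula(query)
    return any(component == target for component in components(mf))


if __name__ == "__main__":
    print(cmf_match("C6 H14 O6 . x H2 O4 S", "H2 O4 S"))  # True
```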
Searching Element Symbol (ELS) and counts
New STN Structure Editor
Create a structure in several ways
• Draw in the structure editor (recommended)
• Convert SMILES or InChI strings to structures using Add to Editor by External Identifier
• Copy/paste from another software program such as ChemDraw® or ISIS/Draw™
• Import a structure saved in .cxf or .mol file format
‒ Attributes of the original drawing program may be retained or changed
STN Help has details on all the drawing tools
Click Demo to watch a 10-second video of how to use the tool.
Develop a structure search query
Find compounds which meet the following criteria:
• The ring system shown is mono- or bicyclic
• R1 = alkyl, alkylene or alkenylene, 1-10 C
• R2 = O, N, S, or a bond
• R3 is a substituted heterocyclic ring containing exactly 1 N and up to 1 O or S atom
Identify the structure pieces
[Figure: structure pieces annotated with carbon counts C = 0 and C = 1-6]
Isolate rings for REGISTRY searches using the lock rings tool
Isolated rings have thick, bold bonds.
Set Bond Attributes using right-click
Set Node Attributes using right-click
Define R-groups
Set generic ring attributes
Set Node attributes
Nodes which have attributes applied display with an asterisk.
Show/Hide Attribute Values Panel to verify query
Position your mouse cursor over the attribute value, and portions of the query that have the attribute will be highlighted.
Continue to verify the query
When multiple element counts are applied, the panel does not highlight, but the nodes do.
Click OK when ready to submit.
The system automatically places the query in the Query Builder panel
*Automatic Cross File Search was on for this search.
View Counts to see how system interpreted the query
Modify structures easily by clicking on the structure under the structure tab
Simply modify the structure by clicking to open it, make your modifications, and click OK.
All structure queries are saved under new STR numbers, unless you click Cancel.
History panel to this point
*Automatic Cross File Search was on for these searches.
Click on tabs to view results from each database
Refine with CAS Roles with Cross File Search ON
See STN Help for CAS roles and definitions.
Use parentheses with REFX for best results
1. ((REFX L1) OR (REFX L2)) (U) (THU OR PKT OR PAC)/RL
2. (REFX L1 OR L2) (U) (THU OR PKT OR PAC)/RL   X
3. L1 OR L2 (L3 = REGISTRY answers), then (REFX L3) (U) (THU OR PKT OR PAC)/RL
4. REFX L3 (U) (THU OR PKT OR PAC)/RL   X
5. (REFX (L1 OR L2)) (L6 = CAplus answers), then L6 (U) (THU OR PKT OR PAC)/RL
(X = not recommended)
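Rows 1 and 5 should retrieve the same references: following an index from a union of substance sets gives the union of the individual results. The sketch below checks this with hypothetical indexing entries (placeholder IDs; THU, PKT, PAC and PREP are CAS role codes). It rests on an assumption labeled in the code: that (U) requires the role to be attached to the same substance entry in the IT field. It is a model of the set logic, not STN's query engine.

```python
# Hypothetical indexing entries: (substance, reference, CAS role code).
# Assumption for this sketch: (U) ties a role to the same substance entry
# in a reference's index (IT) field.
ENTRIES: list[tuple[str, str, str]] = [
    ("RN-A", "ref-1", "THU"),
    ("RN-A", "ref-2", "PREP"),
    ("RN-B", "ref-2", "PAC"),
    ("RN-B", "ref-3", "PKT"),
    ("RN-C", "ref-4", "THU"),
]

ROLES = {"THU", "PKT", "PAC"}


def refx_with_roles(substances: set[str], roles: set[str],
                    entries: list[tuple[str, str, str]]) -> set[str]:
    """References where a substance from the set is indexed with one of the roles."""
    return {ref for substance, ref, role in entries
            if substance in substances and role in roles}


if __name__ == "__main__":
    l1, l2 = {"RN-A"}, {"RN-B"}
    row1 = refx_with_roles(l1, ROLES, ENTRIES) | refx_with_roles(l2, ROLES, ENTRIES)
    row5 = refx_with_roles(l1 | l2, ROLES, ENTRIES)
    print(row1 == row5, sorted(row5))  # True ['ref-1', 'ref-2', 'ref-3']
```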
Refine a structure or substance search with modifying text in the IT field
The (U) relational operator is defined between a REGISTRY Cross File Search, CAS roles, and modifying text, which must be searched in the IT field.
Comparison search with automatic Cross File Search OFF
Optionally choose a broad CAS role
The system automatically switches to CAplus and searches “refx l5.”
Refine the CAplus L-number with roles and text
Complex R-groups
Two ways to search with disconnected fragments in REGISTRY
• Draw structure fragments in separate windows and combine the structures with the AND operator
‒ Finds single and multi-component substances
‒ Fragments may be in the same or different components (SSS)
‒ There may be overlap between the fragments (SSS)
• Draw structure fragments in the same window
‒ Finds single and multi-component substances
‒ Fragments may be in the same or different components (SSS)
‒ There will be no overlap between the fragments (SSS)
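The overlap rule can be expressed in terms of atom sets. This sketch assumes that the matches of each fragment in a candidate substance are already known, as sets of atom indices produced by some substructure matcher (out of scope here); the function names and data are hypothetical. Combining separate queries with AND only needs each fragment to match somewhere, while drawing both fragments in one window needs a pair of matches that share no atoms.

```python
from itertools import product


def matches_and_query(hits_a: list[frozenset[int]], hits_b: list[frozenset[int]]) -> bool:
    """Separate windows combined with AND: each fragment needs some match,
    and the two matches may share atoms."""
    return bool(hits_a) and bool(hits_b)


def matches_same_window(hits_a: list[frozenset[int]], hits_b: list[frozenset[int]]) -> bool:
    """Same window: there must be one match per fragment with no shared atoms."""
    return any(a.isdisjoint(b) for a, b in product(hits_a, hits_b))


if __name__ == "__main__":
    # Hypothetical candidate: fragment A matches atoms {0, 1, 2};
    # fragment B only matches atoms {2, 3}, which overlaps A's match.
    hits_a = [frozenset({0, 1, 2})]
    hits_b = [frozenset({2, 3})]
    print(matches_and_query(hits_a, hits_b))    # True
    print(matches_same_window(hits_a, hits_b))  # False
```

If fragment B also had a separate match such as atoms {5, 6}, the same-window check would succeed even though an overlapping match exists, which corresponds to "may have overlap, but a disconnected fragment must be present."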
Search two separate fragments with AND
Results from combining two separate structure queries with the AND operator
Example answer types: overlapping; 2 separate components; disconnected
Search two separate fragments drawn in the same window
Results from searching two separate fragments drawn in the same window
Example answer types: 2 separate components*; disconnected
May have overlap, but a disconnected fragment must be present.
* Different from classic STN
FAMILY searches with fragments
A FAMILY search finds the same answers with 2 separate structure queries, or when 2 fragments are drawn in the same window.
Example answers from FAMILY search
114205-82-2 C54 H104 O18 S3 . 3 H3 N Incompletely Defined Substance (IDS)
1644285-27-7 C6 H14 O6 . x H2 O4 S
FAMILY searches find answers with two or more components. This can be useful in polymer searches when specific monomers must be present but other monomers may also be present, or when you are interested in Incompletely Defined Substances.
EXACT searches with fragments drawn in a single window
Find substances from references with SUBX
Asterisks in the patent family indicate publication numbers which have been indexed as basics in CAplus.
Click Get Substances to retrieve substances from REGISTRY
SUBX finds substances in REGISTRY
The system enters REGISTRY and uses SUBX with the accession number of the CAplus record to Cross File Search the Registry Numbers associated with that record, resulting in 125 substances in L2.
Compare answers from each record
What is the DWPI Chemistry Resource (DCR)?
• DCR is a chemical structure database covering specific chemical structures indexed in Derwent World Patents Index® (DWPI℠) patent records
• Fully integrated with DWPI on new STN
• DCR: 2,400,000+ substance records
• DWPI: 28,000,000+ patent records
DWPI Chemistry Resource (DCR)
• For each specific chemical substance, a DCR record is created with a unique DCR number
‒ Basic compound
‒ Salts, isotopes, mixtures, isomers
• Substance records include structure diagrams and substance data, e.g.
‒ IUPAC name, synonyms
‒ Molecular formula, molecular weight
• DCR numbers (/DCR) form the connection to DWPI patent records
DCR substance record detailed display
Chemical structures are searchable in the standard new STN format.
DCR can often be a useful source of synonym chemical names (/CN).
DCR numbers connect DCR substance records to DWPI patent records.
A note on DCR chemical name fields
• A Preferred Name (/CN.P) may be chosen by Thomson Reuters, e.g. a generic drug name
• Synonyms (/SY) may be selected for inclusion by Thomson Reuters, e.g. trivial names, trade names
• Chemical Name (/CN) field provides a one step search of all Preferred (/CN.P) and Synonym (/SY) names
• A Systematic Chemical Name (/CN.S) may also be available, generated using AutoNom software
• Chemical name segment (/CNS) field provides name fragment searching for all CN and CN.S names
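The relationships among these name fields can be summarized as a small data model: /CN is the union of /CN.P and /SY, and /CNS draws segments from /CN plus /CN.S. The class below is hypothetical, the segment split at non-alphanumeric characters is an approximation, and the cetirizine names are illustrative values chosen for this sketch, not copied from a DCR record.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DcrNames:
    """Hypothetical model of the DCR name fields."""
    preferred: list[str] = field(default_factory=list)   # /CN.P
    synonyms: list[str] = field(default_factory=list)    # /SY
    systematic: list[str] = field(default_factory=list)  # /CN.S (AutoNom-generated)

    def cn(self) -> list[str]:
        """/CN: one-step search across preferred names and synonyms."""
        return self.preferred + self.synonyms

    def cns(self) -> set[str]:
        """/CNS: lowercase segments from all CN and CN.S names."""
        segments: set[str] = set()
        for name in self.cn() + self.systematic:
            segments.update(s.lower() for s in re.split(r"[^0-9A-Za-z]+", name) if s)
        return segments


if __name__ == "__main__":
    names = DcrNames(
        preferred=["Cetirizine"],
        synonyms=["Zyrtec"],
        systematic=["2-[2-[4-[(4-chlorophenyl)phenylmethyl]piperazin-1-yl]ethoxy]acetic acid"],
    )
    print(names.cn())                  # ['Cetirizine', 'Zyrtec']
    print("piperazin" in names.cns())  # True
```

In this model a segment from the systematic name is reachable through /CNS even though the systematic name itself is not part of /CN.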
Searching DCR numbers
• Use the /AN.S field to retrieve substance records, e.g. DCR-368/AN.S
• Use the /DCR field to retrieve patent records
DCR-90453 is the DCR number for cetirizine.
DWPI patent record detailed display full view
DCR numbers connect DCR substance records to DWPI patent records.
DCR hit structures are automatically displayed in detailed display full view.
DCR coverage
• Specific chemical compounds indexed by Thomson Reuters from basic patents in DWPI
• DWPI patents classified in pharmaceutical (B), agrochemical (C) and/or general chemical (E)
• Comprehensive coverage began in 4/1999
• Selective coverage for approximately:
‒ 20,000 substances from 1/1987 to date
‒ 2,100 substances from 7/1981 to date
See also: DWPI CPI Chemical Indexing Guidelines: http://ip-science.thomsonreuters.com/m/pdfs/mgr/chemical_index_guidelines.pdf
Multi-database chemical structure search example
Search Question: A group of compounds is described as having analgesic properties as kappa opioid agonists. Find similar substances.
Identify a common core structure
Include query details
• The 6-membered ring is monocyclic or polycyclic, saturated or normalized
• R1-N-R2 describe rings of various sizes, but R1 and R2 are always C
• The ring N is attached directly to the 6-membered ring, or via a 2 C-chain linker
• The amide chain is connected to either the linker or the ring
• Ar is defined as an aryl or heteroaryl ring
• n is defined as 1-3 C
Multi-database chemical structure search steps
1. Create a new project and select databases
2. Prepare the structure queries
3. Run the structure searches in REGISTRY and DCR
4. Cross over the results to CAplus and DWPI with REFX
5. Use Create Term List to identify unique hits in DWPI via the CAplus Patent Number/Kind (PNK)
Create a new project and select databases
Prepare the structure queries
• Right click on a node or bond to change Attribute Values
• Mouse over the query or attribute panel to verify Attribute Values
• Click OK to add the query to the structures tab of the history panel
Prepare the structure queries (cont.)
Notes:
• Manually assigned ring bonds are represented with a circle symbol
• Unspecified bonds are represented by a dashed line
Run the structure searches in REGISTRY and DCR
• Structure queries are assigned STR numbers for searching
• Use REFX to retrieve references (L3)
Use Create Term List to identify unique hits
• Create Term List is used to extract data and transfer terms to other databases for searching
• Main focus is on patent information
‒ PN, PNK, PRN, AP available in all patent databases
‒ Basic versions (.B) available in patent family databases
‒ RN, CN and DOI are also available
• Term Lists are identified by Q-numbers
‒ Permanent asset, project independent
‒ Can be searched in combination with other terms
‒ Can be re-qualified with one or more field codes
Use Create Term List to identify unique hits (cont.)
Search Term Lists via their assigned Q-numbers
Q18 = patent number/kind taken from CAplus (L3).
L3 = CAplus and DWPI combined search results.
Patent records only found in DWPI (L4).
Manage Term lists.
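Finding the records only present in DWPI is essentially a set difference on patent numbers/kinds. This minimal sketch uses hypothetical publication numbers and two labeled assumptions: PN/kind strings are compared ignoring spaces and case, and a DWPI family record counts as already covered when any of its members appears in the CAplus term list. It mirrors the logic of the DWPI-only answer set, not STN's actual matching rules.

```python
# Hypothetical PNK term list taken from CAplus (placeholder numbers).
CAPLUS_PNK: set[str] = {"XX1000001A1", "XX1000002B2"}

# Hypothetical DWPI family records: accession -> publication numbers/kinds.
DWPI_RECORDS: dict[str, set[str]] = {
    "dwpi-1": {"XX1000001A1", "YY2000001A"},
    "dwpi-2": {"ZZ3000001A2"},
    "dwpi-3": {"XX1000002B2"},
}


def normalize(pnk: str) -> str:
    """Assumed normalization: ignore spaces and letter case in PN/kind strings."""
    return pnk.replace(" ", "").upper()


def dwpi_only(records: dict[str, set[str]], caplus_pnk: set[str]) -> set[str]:
    """Keep DWPI records with no family member in the CAplus term list."""
    known = {normalize(p) for p in caplus_pnk}
    return {acc for acc, family in records.items()
            if known.isdisjoint(normalize(p) for p in family)}


if __name__ == "__main__":
    print(sorted(dwpi_only(DWPI_RECORDS, CAPLUS_PNK)))  # ['dwpi-2']
```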
Additional records found in DWPI
Including DCR/DWPI is an essential part of completing a comprehensive chemical structure prior art search.
Summary
• Set default hit structure display for CAplus and DWPI in Settings
• Use STN Help to find details on structure drawing tools
• Strategies can be as broad or narrow as you want
• Create Term Lists to search patent publications in different databases
For more information …
CAS: Support and Training: www.cas.org
FIZ Karlsruhe: Support and Training: www.stn-international.de