pistachio - NextMove Software

CINF 13, ACS Fall 2017, Washington, D.C. pistachio Search and Faceting of Large Reaction Databases John Mayfield, Daniel Lowe, Roger Sayle

Page 1:

pistachio: Search and Faceting of Large Reaction Databases

John Mayfield, Daniel Lowe, Roger Sayle

Page 2:

What do Synthetic Chemists Want from Their Reaction Systems?

Data, Classification, Diagrams, Search

Page 3:

What do Synthetic Chemists Want from Their Reaction Systems?

Data, Classification, Diagrams, Search

Page 4:

HazELNut Filbert NameRXN Cobnut

Accelrys Pipeline Pilot (AstraZeneca, AbbVie & Hoffmann-La Roche)

ChemAxon JChem Cartridge (GlaxoSmithKline & Novartis)

Elsevier Reaxys (Hoffmann-La Roche, AstraZeneca, Merck)

Perkin Elmer Informatics (formerly CambridgeSoft) eNotebook v9, v11 or v13 or Symyx ELN v5.x or v6.x

Oracle Server version 10, 11 or

Microsoft Windows, Linux or Mac OS

Infrastructure for liberating and processing reactions from Electronic Lab Notebooks (ELNs)

Page 5:

To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate (435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours. The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid.

[0517]

US 2016/16966 A1

Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012

Page 6:

Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012

To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate (435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours. The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid.

[0517]

Unstructured text to a structured reaction table

Product Properties
• 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 25 mg, 7% yield, yellow solid

Reactant Properties
• 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 220 mg, 1.025 mmol
• (3,4-dimethoxyphenyl)boronic acid: 187 mg, 1.025 mmol

Agent Properties
• 1,4-dioxane: 3 mL
• water: 1.5 mL
• sodium carbonate: 435 mg, 4.10 mmol
• tetrakis(triphenylphosphine)palladium(0): 110 mg, 0.095 mmol
• DMSO

US 2016/16966 A1

LeadMine + ChemicalTagger
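As a rough illustration of the text-to-table step, the sketch below pulls the "(mass mg, amount mmol)" pairs out of the first sentence of paragraph [0517] with a regular expression. It is not how LeadMine or ChemicalTagger work; the names `QUANTITY`, `extract_quantities` and `implied_molar_mass` are invented for this example. Dividing mg by mmol gives an implied molar mass in g/mol, which is a cheap sanity check on extracted amounts.

```python
import re

# First sentence of paragraph [0517] of US 2016/16966 A1, as quoted on the slide.
PROCEDURE = (
    "To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) "
    "(220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) "
    "in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate(435 mg, 4.10 mmol) "
    "and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol)."
)

# Hypothetical pattern: only handles the "(x mg, y mmol)" form used in this paragraph.
QUANTITY = re.compile(r"\((\d+(?:\.\d+)?) mg, (\d+(?:\.\d+)?) mmol\)")


def extract_quantities(text):
    """Return (mass in mg, amount in mmol) pairs in order of appearance."""
    return [(float(mg), float(mmol)) for mg, mmol in QUANTITY.findall(text)]


def implied_molar_mass(mg, mmol):
    """mg divided by mmol is numerically equal to g/mol."""
    return mg / mmol


for mg, mmol in extract_quantities(PROCEDURE):
    print(f"{mg} mg, {mmol} mmol, implied molar mass {implied_molar_mass(mg, mmol):.1f} g/mol")
```

Associating each quantity with the right chemical name and role is the hard part and is deliberately left out of this sketch.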

Page 7:

Data impact

Public subset released in 2014 as CC-Zero

Christos Nicolaou et al. The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space. J. Chem. Inf. Model., 2016, 56 (7), pp 1253–1266

Nadine Schneider et al. Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists’ Bread and Butter. J. Med. Chem., 2016, 59 (9), pp 4385–4402

Nadine Schneider et al. Development of a Novel Fingerprint for Chemical Reactions and Its Application to Large-Scale Reaction Classification and Similarity. J. Chem. Inf. Model., 2015, 55 (1), pp 39–53

Nadine Schneider et al. What’s What: The (Nearly) Definitive Guide to Reaction Role Assignment. J. Chem. Inf. Model., 2016, 56 (12), pp 2336–2346

Connor Coley et al. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci., 2017, 3 (5), pp 434–443

Pistachio expands the scope of the data and uses Atom-Atom Maps from NameRxn

Page 8:

sketch extraction

Example 26, US 09718816 B2

Example 26. Epizyme Inc. 1-phenoxy-3-(alkylamino)-propan-2-ol derivatives as CARM1 inhibitors and uses thereof (US09718816B2) Aug. 1, 2017

[Figure: reaction sketch with labels Step 1, Step 2, Step 3, Step 4, etc.]

NextMove’s Praline

John May, et al. Sketchy Sketches: Hiding Chemistry in Plain Sight. Seventh Joint Sheffield Conference on Cheminformatics. 2016

Page 9:

total reactions over time

[Figure: Reaction Details (cumulative), 0 to 3.5M, by year from 1997 to 2018. Series: EPO Applications, EPO Grants, USPTO Applications, USPTO Grants]

Page 10:

What do Synthetic Chemists Want from Their Reaction Systems?

Data, Classification, Diagrams, Search

Page 11:

reaction diagrams

Good reaction diagrams are essential in communicating synthetic chemistry

Layout can be stored or generated
• When extracting from text, layout must be generated
• Generated diagrams can be unsatisfactory for display

Page 12:

[Figure: the reaction as drawn by ChemDraw and by OEChem]

Generated from SMILES for US 2016/16966 A1 [0517]

Page 13:

[Figure: the reaction as drawn by ChemAxon and by BIOVIA]

Generated from SMILES for US 2016/16966 A1 [0517]

Page 14:

diagram improvements

Typical workarounds:
• Separately render molecules
• Hide agents and list separately

What humans do:
• Wrap products below
• Abbreviate functional groups and agents
• Orientate reactants to products and vice versa
• Hide agents and list as text
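Two of the human habits listed here, abbreviating agents and listing them as text, can be shown without any drawing code. The sketch below is only a text-level illustration, not Pistachio's or CDK's depiction logic; the `ABBREVIATIONS` table and the `reaction_line` function are made up for this example, and the abbreviations are common chemist shorthand rather than anything taken from the slides.

```python
# Hypothetical abbreviation table (illustrative entries only).
ABBREVIATIONS = {
    "tetrakis(triphenylphosphine)palladium(0)": "Pd(PPh3)4",
    "sodium carbonate": "Na2CO3",
    "water": "H2O",
}


def reaction_line(reactants, agents, products):
    """Render a one-line reaction with agents abbreviated and listed over the arrow."""
    agent_text = ", ".join(ABBREVIATIONS.get(agent, agent) for agent in agents)
    left = " + ".join(reactants)
    right = " + ".join(products)
    return f"{left} --[{agent_text}]--> {right}"


print(reaction_line(
    ["7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid",
     "(3,4-dimethoxyphenyl)boronic acid"],
    ["1,4-dioxane", "water", "sodium carbonate",
     "tetrakis(triphenylphosphine)palladium(0)"],
    ["7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid"],
))
```

Agents without an entry in the table are passed through unchanged, so the line stays readable even with a sparse table.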

Page 15:

[Figure: the reaction as drawn by Pistachio+CDK (Abbreviated+Aligned) and by Pistachio+CDK (Abbreviated)]

Generated from SMILES for US 2016/16966 A1 [0517]

Page 16:

reaction detail view

Page 17:

What do Synthetic Chemists Want from Their Reaction Systems?

Data, Classification, Diagrams, Search

Page 18:

namerxn

Assigns names to 900+ reactions using transformations

Can guarantee perfect Atom-Atom Mapping
• Atom-Atom Mapping is an output not an input
• MCS mappers struggle with rearrangements:

[Figure: 4.1.6 Cyclic Beckmann rearrangement]

Page 19:

concepts and rxno

1 Heteroatom alkylation and arylation
  .7 O-substitution
    .1 Chan-Lam ether coupling
    .2 Diazomethane esterification
    .3 Ethyl esterification
    .4 Hydroxy to methoxy
    .5 Hydroxy to triflyloxy
    .6 Methyl esterification
    .n
2 Acylation and related processes
  .6 O-acylation to ester
    .1 Ester Schotten-Baumann
    .2 Esterification (generic)
    .3 Fischer-Speier esterification
    .4 Baeyer-Villiger oxidation
    .5 Yamaguchi esterification
    .6 Hydroxy to imidazolecarbonyloxy
    .7 Imidazolecarbonyl to ester
    .8 Hydroxy to acetoxy
    .9 Steglich esterification
    .n

Page 20:

concepts and rxno

1 Heteroatom alkylation and arylation
  .7 O-substitution
    .1 Chan-Lam ether coupling
    .2 Diazomethane esterification
    .3 Ethyl esterification
    .4 Hydroxy to methoxy
    .5 Hydroxy to triflyloxy
    .6 Methyl esterification
    .n
2 Acylation and related processes
  .6 O-acylation to ester
    .1 Ester Schotten-Baumann
    .2 Esterification (generic)
    .3 Fischer-Speier esterification
    .4 Baeyer-Villiger oxidation
    .5 Yamaguchi esterification
    .6 Hydroxy to imidazolecarbonyloxy
    .7 Imidazolecarbonyl to ester
    .8 Hydroxy to acetoxy
    .9 Steglich esterification
    .n

Esterification (7)

Chan-Lam coupling (3)

Schotten-Baumann Reaction (9)

RXNO: http://github.com/rsc-ontologies/rxno
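The dotted outline reads naturally as a tree whose path gives a class code, in the same style as "4.1.6 Cyclic Beckmann rearrangement" on the namerxn slide. The sketch below assumes that reading and flattens a small part of the outline into code-to-name pairs; `OUTLINE`, `flatten` and `ancestors` are names made up for this example, and only a few of the listed leaves are included.

```python
# A few entries from the slide, as (number, name, children) tuples.
OUTLINE = [
    ("1", "Heteroatom alkylation and arylation", [
        ("7", "O-substitution", [
            ("1", "Chan-Lam ether coupling", []),
            ("6", "Methyl esterification", []),
        ]),
    ]),
    ("2", "Acylation and related processes", [
        ("6", "O-acylation to ester", [
            ("1", "Ester Schotten-Baumann", []),
            ("3", "Fischer-Speier esterification", []),
            ("9", "Steglich esterification", []),
        ]),
    ]),
]


def flatten(nodes, prefix=""):
    """Yield (dotted code, name) for every node, parents before children."""
    for number, name, children in nodes:
        code = f"{prefix}.{number}" if prefix else number
        yield code, name
        yield from flatten(children, code)


def ancestors(code):
    """Return the parent codes of a dotted code, outermost first."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


CODES = dict(flatten(OUTLINE))
print(CODES["2.6.3"], [CODES[c] for c in ancestors("2.6.3")])
```

A tree like this makes roll-up counts cheap: every leaf hit also counts towards each code returned by `ancestors`.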

Page 21:

result facets

Provides a summary over the key concepts of results

Cut through information deluge and refine search

• Reaction Types (NextMove ontology tree)
• Drug Targets (ChEMBL ontology tree)
• Disease Targets (MeSH ontology tree)
• Yields
• Affiliation (NextMove ontology tree)
• Publication Date, Documents, Authors

Page 22:

facet calculation

Resource expensive: O(n) in the size of the result set
• Client, server, or database?
• Overhead copying and transferring data that is not needed
• Calculate when requested or up-front?

Custom cartridge:

2.9 seconds to summarise all 6.6 million rows

Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz
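The O(n) cost is easy to see in a naive single-pass facet count, sketched below. This is not the custom cartridge from the slide; the row layout, the field names and the `facet_counts` function are assumptions made for the example.

```python
from collections import Counter


def facet_counts(rows, fields):
    """Count the values of each facet field in one pass over the result rows."""
    counts = {field: Counter() for field in fields}
    for row in rows:  # every row in the result set is visited once: O(n)
        for field in fields:
            value = row.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts


# Hypothetical result rows.
rows = [
    {"reaction_type": "Suzuki coupling", "yield_band": "0-20%"},
    {"reaction_type": "Suzuki coupling", "yield_band": "80-100%"},
    {"reaction_type": "Esterification", "yield_band": "80-100%"},
]
facets = facet_counts(rows, ["reaction_type", "yield_band"])
print(facets["reaction_type"].most_common())
```

Running a loop like this on the client means shipping every row across the network first, which is the copying overhead the slide points at; doing the count next to the data avoids it.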

Page 23:

What do Synthetic Chemists Want from Their Reaction Systems?

Data, Classification, Diagrams, Search

Page 24:

one entry point

Systematic Name, Date Range, Trivial Name, Yield Range, Affiliation, Reaction SMARTS, Disease Target, Document, Line Formula, SMILES, InChI, Author, Protein Target, Collection, Reaction Type (NameRxn), SMARTS, Source

…and logical combinations thereof

Page 25:

suggestions

Based on global frequency

Based on context frequency
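The difference between the two rankings can be shown with a simple prefix completer: the same function is fed either the terms from the whole database (global) or only the terms from the current result set (context). This is a sketch under that assumption, not Pistachio's suggestion engine; `suggest` and the sample term lists are made up.

```python
from collections import Counter


def suggest(prefix, terms, limit=3):
    """Return up to `limit` terms starting with `prefix`, most frequent first."""
    counts = Counter(term for term in terms if term.startswith(prefix))
    return [term for term, _ in counts.most_common(limit)]


# Hypothetical term occurrences: whole database versus the current result set.
global_terms = ["esterification"] * 5 + ["ester hydrolysis"] * 3
context_terms = ["ester hydrolysis"] * 2 + ["esterification"]

print(suggest("es", global_terms))   # ranked by global frequency
print(suggest("es", context_terms))  # ranked by frequency within the current context
```

With these sample counts the two calls return the same terms in opposite order, which is the effect a context-aware ranking is meant to produce.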

Page 26:

structure search technology

NextMove’s Arthor Technology

Up to 100x faster than state-of-the-art

Combination of SMARTS compilation and efficient storage

Preliminary PostgreSQL integration

Benchmark: ~3.5K queries against ~7M structures (eMolecules 2014), all on the same hardware.

36s        Arthor
56m        BIOVIA Direct (Oracle)
1h         Bingo (NoSQL)
1h54m      Bingo (PostgreSQL)
2h6m       Bingo (Oracle)
2h41m      JChem (Oracle)
5h9m       RDCart (PostgreSQL)
13h54m     pgchem (PostgreSQL)
1d1h52m    mychem (MySQL)
3d1h13m    orchem (Oracle)

John May and Roger Sayle, Substructure Search Face-off, May 2015
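The benchmark timings are easier to compare once converted to seconds. The sketch below does that conversion and computes each system's time relative to Arthor; `to_seconds`, `TIMINGS` and `relative_to` are names introduced here, and the figures are copied from the slide.

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def to_seconds(text):
    """Convert a duration such as '1d1h52m' or '36s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[unit] for n, unit in re.findall(r"(\d+)([dhms])", text))


# Timings as listed on the slide.
TIMINGS = {
    "Arthor": "36s",
    "BIOVIA Direct (Oracle)": "56m",
    "Bingo (NoSQL)": "1h",
    "Bingo (PostgreSQL)": "1h54m",
    "Bingo (Oracle)": "2h6m",
    "JChem (Oracle)": "2h41m",
    "RDCart (PostgreSQL)": "5h9m",
    "pgchem (PostgreSQL)": "13h54m",
    "mychem (MySQL)": "1d1h52m",
    "orchem (Oracle)": "3d1h13m",
}


def relative_to(baseline, timings):
    """Return each system's time as a multiple of the baseline system's time."""
    base = to_seconds(timings[baseline])
    return {name: to_seconds(t) / base for name, t in timings.items()}


for name, ratio in relative_to("Arthor", TIMINGS).items():
    print(f"{name}: {ratio:.1f}x")
```

On these figures the closest competitor, BIOVIA Direct (Oracle), takes about 93 times as long as Arthor.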

Page 27:

refining structure search

Intention can be refined by qualifiers

Role
• {structure} product

Substructure
• {structure} substructure
• {structure} substructure product

Make/Break
• Synthesis of {structure}

Combined with other terms
• {structure} substructure product and yield of 80%
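To make the qualifier idea concrete, the sketch below turns the example phrases from this slide into a small dictionary. It is a toy parser written for this transcript, not Pistachio's query grammar: the `ROLES` set (only "product" appears on the slide), the key names and the handling of "and" are all assumptions, and a structure name that itself contains " and " would break it.

```python
import re

# Only "product" appears on the slide; the other roles are assumed for illustration.
ROLES = {"product", "reactant", "agent"}


def parse_query(query):
    """Split a query into structure, substructure flag, role, synthesis flag and yield."""
    parsed = {"structure": None, "substructure": False, "role": None,
              "synthesis": False, "yield_percent": None}
    for part in re.split(r"\s+and\s+", query.strip()):
        yield_match = re.fullmatch(r"yield of (\d+)%", part)
        if yield_match:
            parsed["yield_percent"] = int(yield_match.group(1))
            continue
        if part.lower().startswith("synthesis of "):
            parsed["synthesis"] = True
            parsed["structure"] = part[len("synthesis of "):]
            continue
        words = part.split()
        if words and words[-1] in ROLES:          # trailing role qualifier
            parsed["role"] = words.pop()
        if words and words[-1] == "substructure":  # trailing substructure qualifier
            parsed["substructure"] = True
            words.pop()
        parsed["structure"] = " ".join(words)
    return parsed


print(parse_query("7H-purine substructure product and yield of 80%"))
print(parse_query("Synthesis of 7H-purine"))
```

The slide does not say whether "yield of 80%" means an exact value or a threshold, so the sketch only records the number.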

Page 28:

make/break example

Find: 7H-purine substructure product

Find: Synthesis of 7H-purine

Page 29:

Namerxn example

Find: 7H-purine-8-one substructure chlorination

Find: [*:1][CH2:2]Cl>>[*:1][CH2:2]F
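The reaction SMARTS query on this slide can be taken apart with plain string handling to show what the atom-map numbers say: atoms 1 and 2 appear on both sides, while the unmapped Cl is replaced by F. The sketch below uses only the standard library and does no chemistry-aware parsing; `split_reaction` and `atom_maps` are names made up for this example.

```python
import re

QUERY = "[*:1][CH2:2]Cl>>[*:1][CH2:2]F"

# Matches a bracket atom carrying an atom-map number, e.g. "[CH2:2]".
ATOM_MAP = re.compile(r"\[[^\]]*:(\d+)\]")


def split_reaction(smarts):
    """Split 'reactants>agents>products' into its three parts."""
    reactants, agents, products = smarts.split(">")
    return reactants, agents, products


def atom_maps(part):
    """Return the set of atom-map numbers used in one side of the reaction."""
    return {int(n) for n in ATOM_MAP.findall(part)}


reactants, agents, products = split_reaction(QUERY)
print(atom_maps(reactants), atom_maps(products))
```

A real search would hand the pattern to a cheminformatics toolkit; this only illustrates how the mapped atoms tie the two sides of the arrow together.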

Page 30:

Acknowledgements
• Noel O’Boyle (NextMove Software)
• Egon Willighagen (CDK)
• James Davison, Matt Swain (Vernalis)

What do Synthetic Chemists Want from Their Reaction Systems?

Data, Classification, Diagrams, Search

pistachio: http://www.nextmovesoftware.com/pistachio.html

Come find me around ACS for a demo!

See also: CINF 90