Going a mile InChI by InChI : Enabling online chemistry at ChemSpider

Antony Williams

Description

The task of finding chemical information online can be daunting, since even the most rudimentary query on Google can return tens to hundreds of thousands of links to peruse. While the number of online chemical structure databases has increased, there has not been a central online resource allowing integrated chemical structure searching of chemistry databases, chemistry articles, patents and web pages such as blogs and wikis, until now. ChemSpider provides a significant knowledge base and resource for chemists working in different domains. From the perspective of the InChI identifiers, this project can be considered a success story, since ChemSpider has used them both in developing the database and in providing fast search routines. ChemSpider has provided web services for both InChI generation and searching, leading to a proliferation of InChI in the web-based domain of chemistry. This talk will provide an update on ChemSpider's functionality.

Transcript of Going a mile InChI by InChI : Enabling online chemistry at ChemSpider

Page 1: Going a mile InChI by InChI : Enabling online chemistry at ChemSpider

Going a mile InChI by InChI

Enabling online chemistry at ChemSpider

Antony Williams

Building a Structure Centric Community for Chemists

ChemSpider 2009

“Building a Structure Centric Community for Chemists”

Hosting structures, spectra, images, documents and outlinks

Many web services for retrieval of data, conversion of files and generation of properties

Now a platform for:
data deposition, curation and annotation (remove the junk)
supporting Open Notebook Science efforts
chemistry document mark-up with ChemMantis
the online ChemSpider Journal of Chemistry

Statistics and Connections

>6000 unique users per day on average
>40,000 transactions per day
>21.4 million compounds and growing daily

Advocate of InChIs for searching and integration

Search Cholesterol

Searching

Structure searching based on SMILES, InChIString, InChIKey, StdInChI, StdInChIKey, molfile uploads and structures drawn in the applet

Search across Google (up to the string-length limit for InChIString)
Skeleton search
Full structure search
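A search box that accepts all of these identifier types first has to work out what it was given. The sketch below is a hypothetical router. The function name and the order of checks are our assumptions, not ChemSpider's code. It relies on the `InChI=1S/` prefix of Standard InChI and on the 27-character InChIKey layout: 14 letters, a hyphen, 8 letters plus an S/N flag and a version letter, a hyphen, and 1 protonation letter. Anything it does not recognise falls through to SMILES.

```python
import re

# Hypothetical router for a multi-identifier search box (a sketch, not ChemSpider's code).
# Assumed InChIKey layout: 14 letters, '-', 8 letters + S/N flag + version letter, '-', 1 letter.
STD_INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{8}S[A-Z]-[A-Z]$")
NONSTD_INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{8}N[A-Z]-[A-Z]$")


def classify_query(text: str) -> str:
    """Guess which identifier type a search string is."""
    q = text.strip()
    if q.startswith("InChI=1S/"):
        return "StdInChI"          # Standard InChI carries the '1S' prefix
    if q.startswith("InChI="):
        return "InChIString"       # any other InChI, e.g. 'InChI=1/...'
    if STD_INCHIKEY.match(q):
        return "StdInChIKey"       # 'S' flag marks a standard key
    if NONSTD_INCHIKEY.match(q):
        return "InChIKey"          # 'N' flag marks a non-standard key
    if "M  END" in q:
        return "molfile"           # MDL molfiles close the connection table with 'M  END'
    return "SMILES"                # fallback: treat anything else as SMILES


for query in ["InChI=1S/CH4/h1H4", "HVYWMOMLDIMFJA-DPAQBDIFSA-N", "CCO"]:
    print(query, "->", classify_query(query))
```

Keys also matter for web search. An InChIKey is always 27 characters, whereas an InChIString grows with the molecule, which is presumably why the slide notes a string limit for searching InChIStrings on Google.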

InChIKey Searches Work
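InChIKeys support skeleton search because the first 14-character block hashes only the main (connectivity) layer. Stereochemistry, isotopes and the other remaining layers go into the second block. The sketch below assumes a simple in-memory dictionary mapping InChIKey to a record name. This dictionary is hypothetical; a real service would query a database index. The sketch compares an exact-key lookup with a skeleton match on the first block.

```python
from typing import Dict, List


def skeleton_block(inchikey: str) -> str:
    """First block of an InChIKey: a hash of the main (connectivity) layer only."""
    return inchikey.split("-")[0]


def search_by_key(index: Dict[str, str], query_key: str, skeleton: bool = False) -> List[str]:
    """Exact InChIKey lookup, or a skeleton search on the first block.

    `index` is a hypothetical in-memory map from InChIKey to a record name.
    """
    if not skeleton:
        return [index[query_key]] if query_key in index else []
    block = skeleton_block(query_key)
    return sorted(name for key, name in index.items() if skeleton_block(key) == block)


# Illustrative entries; 'UHFFFAOYSA' is the second block seen when no layers exist
# beyond the main one (no stereo, no isotopes).
index = {
    "HVYWMOMLDIMFJA-DPAQBDIFSA-N": "cholesterol (stereo defined)",
    "HVYWMOMLDIMFJA-UHFFFAOYSA-N": "cholesterol skeleton (no stereo)",
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N": "ethanol",
}
print(search_by_key(index, "HVYWMOMLDIMFJA-DPAQBDIFSA-N"))
print(search_by_key(index, "HVYWMOMLDIMFJA-DPAQBDIFSA-N", skeleton=True))
```

A skeleton match returns every entry that shares the connectivity. A full-key match returns only the exact stereo form. Keys are hashes, so a match is strong evidence of identity but not proof: collisions are possible, though rare.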

Depositions

Depositions from users: single structures and SDFs

Depositions from databases/vendors: SDF files

And then came InChIs…
InChIs and InChIKeys are available on blogs for harvesting
Publishers are making their structures available as InChIs for harvesting
InChIs are NOT ideal for building a database…some lessons
We want to link to publications especially…
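With SDF depositions, records that describe the same compound can be spotted before merging by grouping them on InChIKey. The sketch below is a minimal, dependency-free SD file reader. It assumes each record already carries a data item named `InChIKey`, which is an assumption on our part; in practice the keys would be computed from the connection tables with the InChI software. For each key, it returns the positions of the records that share it.

```python
from typing import Dict, List, Optional


def split_sdf(text: str) -> List[str]:
    """Split SD file text into records; each record ends with a '$$$$' line."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # tolerate a missing final '$$$$'
        records.append("\n".join(current))
    return records


def data_item(record: str, name: str) -> Optional[str]:
    """Return the first value line of a '> <name>' data item, if present."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(">") and f"<{name}>" in line and i + 1 < len(lines):
            return lines[i + 1].strip()
    return None


def group_by_inchikey(text: str, field: str = "InChIKey") -> Dict[str, List[int]]:
    """Map each InChIKey (or '<missing>') to the indices of the records carrying it."""
    groups: Dict[str, List[int]] = {}
    for idx, record in enumerate(split_sdf(text)):
        key = data_item(record, field) or "<missing>"
        groups.setdefault(key, []).append(idx)
    return groups


# Invented three-record deposition with empty connection tables.
ctab = "  sketch\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
sample = (
    "vendor compound 1\n" + ctab + "> <InChIKey>\nHVYWMOMLDIMFJA-DPAQBDIFSA-N\n\n$$$$\n"
    "vendor compound 2\n" + ctab + "> <InChIKey>\nHVYWMOMLDIMFJA-DPAQBDIFSA-N\n\n$$$$\n"
    "vendor compound 3\n" + ctab + "\n$$$$\n"
)
print(group_by_inchikey(sample))
```

Records grouped under `<missing>` are the ones that need a key generated, or a curator's attention, before they can be merged.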

Chemistry Papers

Cultivation of a rare Verrucosispora strain (sediment, Sea of Japan) gave three polyketides, atrop-abyssomicin C 35, abyssomicin G 36 and abyssomicin H 37. Atrop-abyssomicin C 35 has previously been reported as a synthetic compound, but ready conversion to abyssomicin D suggests that it was probably naturally produced. Atrop-abyssomicin C was an inhibitor of S. aureus N315 (MRSA) and 4-amino-4-deoxychorismate (ADC) synthase. The tenacibactins A–D 38–41, hydroxamate siderophores isolated from culture of the filamentous bacterium Tenacibaculum sp. (Chondrus ocellatus, Awajishima Island, Japan), all possessed iron-chelating activity with tenacibactins C 40 and D 41 being considerably more effective than tenacibactins A 38 and B 39.

Structures in Chemistry Papers

Aesthetics vs Machine Readable

Beautiful chemical structures submitted by authors can be beasts for machines

InChI Representation

InChI’fication of Articles

InChIs from publishers: it is a lot of work for a publisher to provide exact structures for articles. Applause to RSC for Project Prospect and now Nature Chemistry

An enormous editorial task with a massive benefit to the community

If the structures were correct…imagine a centralized DOI:InChI database
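A DOI:InChI database is essentially a many-to-many index: one article mentions many compounds, and one compound appears in many articles. The class below is a hypothetical in-memory sketch. It is keyed on InChIKey because keys are fixed-length; storing full InChIs would work the same way. DOI names are case-insensitive, so they are lower-cased on the way in. The DOIs in the usage lines are invented placeholders.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class DoiInchiIndex:
    """Hypothetical many-to-many index between article DOIs and InChIKeys."""

    def __init__(self) -> None:
        self._keys_by_doi: DefaultDict[str, Set[str]] = defaultdict(set)
        self._dois_by_key: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, doi: str, inchikey: str) -> None:
        doi = doi.lower()  # DOI names are case-insensitive, so normalise them
        self._keys_by_doi[doi].add(inchikey)
        self._dois_by_key[inchikey].add(doi)

    def compounds_in(self, doi: str) -> Set[str]:
        """All InChIKeys recorded for one article (a copy, safe to modify)."""
        return set(self._keys_by_doi.get(doi.lower(), set()))

    def articles_mentioning(self, inchikey: str) -> Set[str]:
        """All DOIs in which one compound was recorded."""
        return set(self._dois_by_key.get(inchikey, set()))


index = DoiInchiIndex()
index.add("10.1000/EXAMPLE.1", "HVYWMOMLDIMFJA-DPAQBDIFSA-N")  # placeholder DOIs
index.add("10.1000/example.2", "HVYWMOMLDIMFJA-DPAQBDIFSA-N")
index.add("10.1000/example.2", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N")
print(sorted(index.articles_mentioning("HVYWMOMLDIMFJA-DPAQBDIFSA-N")))
```

Keeping both directions makes "which papers mention this compound?" and "which compounds are in this paper?" equally cheap. The trade-off is that every pair is stored twice.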

Cleaning Structures

Converting InChIs to Structures

Bacitracin A

InChI=1/C66H103N17O16S/......./t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1

InChI=1/C66H103N17O16S/.......)/t35?,36?,37?,40-,41+,42+,43-,44+,45-,46-,47+,48?,52-,53-,54-/m0/s1
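In the /t layer of an InChI, each tetrahedral stereocentre is written as a canonical atom number followed by a parity mark: `+` or `-` for a defined centre and `?` for an undefined one. The short parser below tallies those marks, which shows how much stereochemistry a structure actually carries. It is a sketch with three limitations: it reads only the first /t layer it finds, ignores isotopic sublayers, and does not expand multiplier prefixes such as `2*` used for repeated components. Applied to the Bacitracin A string above, with its elided layers left as they are, it finds four undefined centres among fifteen.

```python
from typing import Dict


def tetrahedral_stereo_summary(inchi: str) -> Dict[str, int]:
    """Tally parity marks in the first /t layer: '+'/'-' defined, '?' undefined."""
    counts = {"+": 0, "-": 0, "?": 0}
    for layer in inchi.split("/")[1:]:
        if layer.startswith("t"):
            for component in layer[1:].split(";"):   # ';' separates components
                for token in component.split(","):  # e.g. '35?' or '40-'
                    if token and token[-1] in counts:
                        counts[token[-1]] += 1
            break  # stop after the first /t layer found
    return counts


# The Bacitracin A string from the slide, elided layers ('.......)') left untouched.
bacitracin = ("InChI=1/C66H103N17O16S/.......)/t35?,36?,37?,40-,41+,42+,43-,44+,"
              "45-,46-,47+,48?,52-,53-,54-/m0/s1")
print(tetrahedral_stereo_summary(bacitracin))  # {'+': 4, '-': 7, '?': 4}
```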

Converting InChIs to Structures

What we want is a good layout, retention of stereochemistry labels and tautomers as drawn

AuxInfo: Who Uses It? Who Converts It?

AuxInfo=1/1/N:52,24,90,40,41,50,100,60,51,23,61,68,66,67,17,16,83,65,64,15,81,30,28,84,92,37,62,99,74,71,13,45,11,39,49,22,59,63,9,88,79,31,35,95,98,75,70,42,76,25,5,47,19,56,77,86,78,26,34,93,7,6,53,18,55,44,85,2,48,12,91,10,80,33,14,38,96,73,69,94,43,58,21,1,27,32,3,4,89,87,82,29,36,97,8,72,54,20,57,46/E:(4,5)(13,14)(18,19)(85,86)(87,88)/it:im/rA:100cONOOCCCOCNCNCNCCCCCONCCCCCOCOCCONCCOCNCCCCNCCSCNCCCCCOCCONCCCCCCCCCCNCCONCCCCCCNCOCCNCOCOCNCCNCNOCCC/rB:;;;s3d4;;;d7;;s9;s10;d11;d9s12;;;;s15s16;s14;s18;d18;N19;s19;s22;s23;;s21;d25;s25;d26;s28;s26s30;s25;N31;s33;s34;d34;s35;N35;s37;s39;s39;;s42;d43;s42;s44s45;s44;s47;s47;s49;s49;s51;s38s42;d53;;s55;d55;P56;s56;s59;s59;;s62;d63;s63;d65;s64;s66d67;s7;s6p69;s5s70;d6;s6;;n73s74;d1s2s74;s75;s58;s78;N79;s79;d78;s81;s83;s84;s80;d86;p14s15s86;d77;s61;s77;s16s91;;s55;s62s93p94;s93;d93;s7n96;s9s98;s22;/rC:12.1656,-8.504,0;12.1656,-6.2884,0;16.5968,-3.1336,0;15.3445,-5.2769,0;16.5968,-4.5785,0;16.7654,-7.5166,0;20.0406,-7.5166,0;20.0406,-6.2402,0;22.208,-6.2161,0;23.2436,-5.4937,0;22.8582,-4.2895,0;21.6059,-4.2895,0;21.1725,-5.4937,0;19.294,-19.028,0;16.067,-19.6542,0;11.2264,-17.9202,0;13.1048,-19.6542,0;20.45,-18.426,0;21.5337,-19.028,0;20.45,-17.1255,0;21.5578,-21.1954,0;22.6174,-18.4019,0;22.6174,-17.1255,0;23.7252,-16.4512,0;19.2459,-23.9168,0;22.7137,-21.8698,0;19.2699,-25.2654,0;20.4018,-23.2425,0;23.8697,-21.1714,0;21.5819,-23.8927,0;22.7378,-23.1943,0;18.0899,-23.2665,0;23.9179,-23.8686,0;25.0497,-23.1702,0;26.2298,-23.8686,0;25.0497,-21.8457,0;27.3617,-23.1702,0;26.2298,-25.2172,0;27.3617,-21.8457,0;28.5417,-21.1714,0;26.2298,-21.1714,0;28.5417,-25.2172,0;29.8181,-25.6025,0;30.6128,-24.5188,0;28.5417,-23.8686,0;29.8181,-23.411,0;31.9614,-24.5188,0;32.6357,-25.6989,0;32.6357,-23.3629,0;31.9614,-22.2069,0;33.9603,-23.3629,0;34.6346,-24.5188,0;27.3617,-25.8675,0;27.3617,-27.2161,0;20.0165,-11.3457,0;18.9328,-11.9959,0;20.0165,-10.0452,0;18.9087,-13.3686,0;17.825,-11.3457,0;16.7172,-11.9959,0;17.825,-10.0693,0;23.5807,-11.7551,0;23.5807,-13.0315,0;24.7367,-13.6817,0;22.5211,-13.6817,0;22.5211,-14.9822,0;24.7367,-14.934,0;23.5807,-15.5842,0;18.9328,-8.1427,0;17.8009,-7.5166,0;17.8009,-5.2769,0;16.091,-6.3365,0;16.1633,-8.6003,0;14.0681,-7.3962,0;14.7906,-8.6003,0;12.8158,-7.3962,0;14.0681,-9.8044,0;17.5119,-13.3686,0;16.8135,-14.5728,0;17.5119,-15.7528,0;15.4408,-14.5728,0;17.0699,-12.6613,0;14.7665,-15.7528,0;13.3697,-15.7528,0;12.6713,-16.9569,0;16.8135,-16.9569,0;15.4408,-16.9569,0;17.5119,-18.137,0;14.7906,-10.9845,0;19.0291,-9.395,0;12.6954,-9.8044,0;11.2264,-11.1049,0;22.2321,-10.0452,0;21.1243,-11.9478,0;22.2321,-11.3457,0;21.1243,-9.4191,0;23.3399,-9.4191,0;21.1243,-8.1186,0;22.2321,-7.5166,0;23.7252,-19.028,0;
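AuxInfo is where the drawn layout survives. As we read the format, the /N: layer gives, for each canonical InChI atom number in turn, that atom's number in the original input. The /rC: layer holds x,y,z coordinates for the original atoms. A converter that wants to redraw a structure as the author drew it has to read these layers. The sketch below extracts just those two layers. It assumes a single-component structure for /N: and ignores the other layers (/E:, /it:, /rA:, /rB:). The tiny AuxInfo string in the usage lines is made up for illustration and is not taken from the slide.

```python
from typing import List, Tuple


def parse_auxinfo(aux: str) -> Tuple[List[int], List[Tuple[float, float, float]]]:
    """Return (/N: original atom numbers in canonical order, /rC: coordinates)."""
    if not aux.startswith("AuxInfo="):
        raise ValueError("not an AuxInfo string")
    numbering: List[int] = []
    coords: List[Tuple[float, float, float]] = []
    for layer in aux.split("/")[1:]:
        if layer.startswith("N:"):
            # ';' would separate components; flattened here (single component assumed)
            numbering = [int(n) for n in layer[2:].replace(";", ",").split(",") if n]
        elif layer.startswith("rC:"):
            for triple in layer[3:].split(";"):
                if triple:
                    x, y, z = (float(v) for v in triple.split(","))
                    coords.append((x, y, z))
    return numbering, coords


# Made-up three-atom example, not taken from the slide.
aux = "AuxInfo=1/1/N:3,1,2/E:(1,2)/rA:3CCO/rB:s1;s2;/rC:0,0,0;1.5,0,0;2.25,1.3,0;"
numbering, coords = parse_auxinfo(aux)
print(numbering)   # [3, 1, 2]
print(coords[2])   # (2.25, 1.3, 0.0)
```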

Who Has Responsibility?

Who will take responsibility for drawing/enumerating the structures?

Where can software contribute?
What quality is “good enough”?
We MUST reduce rework!!!

Does one stereocenter matter?

Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn and Softenon (all trade names for thalidomide)

The InChI Resolver

HVYWMOMLDIMFJA-DPAQBDIFSA-N
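The key on this slide, HVYWMOMLDIMFJA-DPAQBDIFSA-N, is to our knowledge the Standard InChIKey of cholesterol. It shows the fixed layout of an InChIKey:
- a 14-character connectivity block
- an 8-character block for the remaining layers
- a standard/non-standard flag
- a version letter
- a protonation flag

The parser below is a sketch of that layout; the type and field names are our own. It also shows why a resolver is needed at all. The key is a one-way hash, so the structure can only be recovered by looking the key up in a database such as ChemSpider.

```python
import re
from typing import NamedTuple


class InChIKeyParts(NamedTuple):
    skeleton: str       # 14 letters: hash of the main (connectivity) layer
    other_layers: str   # 8 letters: hash of the remaining layers (stereo, isotopes, ...)
    standard: bool      # flag letter: 'S' = standard key, 'N' = non-standard
    version: str        # 'A' = InChI version 1
    protonation: str    # 'N' = neutral, i.e. no (de)protonation recorded


_INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def parse_inchikey(key: str) -> InChIKeyParts:
    """Split a 27-character InChIKey into its blocks (a sketch of the layout)."""
    m = _INCHIKEY.match(key.strip())
    if m is None:
        raise ValueError(f"not a 27-character InChIKey: {key!r}")
    skeleton, other, flag, version, protonation = m.groups()
    return InChIKeyParts(skeleton, other, flag == "S", version, protonation)


parts = parse_inchikey("HVYWMOMLDIMFJA-DPAQBDIFSA-N")
print(parts.skeleton, parts.other_layers, parts.standard)
```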

Resolve-It

Pretty-It

JMol-It, Download-It and Zoom-It

Kind-of-Resolve-It

Generate-It

Draw-It: Thanks Symyx (Beta release)

Generate-It

All Flavors

Serve Up Services

And Once It’s Resolved…

Out to ChemSpider…and its resources

COMING: InChI Resolver to DOIs

Full Text-Based Literature Searching to DOIs

Including Citations Now

When Structures are “Connected”

ChemSpider Everywhere

Linked from Wikipedia
Linked from Open Notebook Science sites using EMBED
Linked from blogs using Structure/Spectra EMBED
Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw and Open Source applets
Integrated into software offerings from Thermo, Waters, Agilent and Bruker

ChemSpider Everywhere: Embed Functionality (like YouTube)

ChemSpider Everywhere: www.spectralgame.com

ChemSpider Everywhere: Crowdsourced Curation of Spectra

ChemSpider Everywhere: RSC Compounds

ChemSpider Everywhere: Nature Chemistry

Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text. Users can choose to view the article with all of the compounds highlighted, and find out more about those compounds by linking out to other information resources including PubChem and ChemSpider.

ChemSpider Everywhere: ChemMobi

Structure RSS Feeds with InChIs
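Harvesting InChIs and InChIKeys from RSS feeds, blogs or publisher pages can start with plain pattern matching. The sketch below rests on several assumptions:
- An InChI runs until whitespace, a quote or `<`, so trailing punctuation such as a closing parenthesis may be captured.
- An InChIKey follows the 14-10-1 uppercase layout.
- Identifiers that have been percent-encoded inside URLs will be missed.

The feed snippet is invented; the methane identifiers in it are, to our knowledge, real.

```python
import re
from typing import List, Tuple

# Assumed patterns (a sketch): an InChI runs until whitespace, a quote or '<';
# an InChIKey is 14 + 10 + 1 uppercase letters separated by hyphens.
INCHI_RE = re.compile(r"InChI=1S?/[^\s\"'<]+")
INCHIKEY_RE = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")


def harvest(text: str) -> Tuple[List[str], List[str]]:
    """Return unique InChIs and InChIKeys found in text, in first-seen order."""
    inchis = list(dict.fromkeys(INCHI_RE.findall(text)))
    keys = list(dict.fromkeys(INCHIKEY_RE.findall(text)))
    return inchis, keys


# Invented feed item; the methane identifiers are real to our knowledge.
item = (
    "<item><title>Methane</title>"
    "<description>InChI=1S/CH4/h1H4 (key VNWKTOKETHGBQD-UHFFFAOYSA-N)</description>"
    "<comments>InChI=1S/CH4/h1H4</comments></item>"
)
print(harvest(item))
```

Harvested strings would still need validating, for example by regenerating the key from the InChI, before being trusted for deposition.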

InChIs are Incomplete

What is NOT supported, yet: polymers, organometallics, Markush structures, 3-D structures, excited states, interlocking structures (e.g. rotaxanes) and host-guest complexes

Progressing InChI

The highest priority for the InChI Team is communication with structure drawing package vendors: THE interfaces to the users

For the InChI Resolver: delivery of services to allow publishers to deposit their structure collections, with associated DOIs, to ChemSpider

Not every structure is important…discussions with publishers to discern primary compounds

Conclusions

InChIs and Internet Chemistry

http://inchis.chemspider.com

Acknowledgments

Richard Kidd, Royal Society of Chemistry
Keith Taylor, Symyx
Chris Singleton, Steven Bachrach and Alan McNaught for feedback
“The InChI team”