Going a mile InChI by InChI : Enabling online chemistry at ChemSpider
Uploaded by orcid-0000-0002-2668-4821
Transcript of Going a mile InChI by InChI : Enabling online chemistry at ChemSpider
Going a mile InChI by InChI
Enabling online chemistry at ChemSpider
Antony Williams
Building a Structure Centric Community for Chemists
ChemSpider 2009
“Building a Structure Centric Community for Chemists”
Hosting structures, spectra, images, documents, outlinks
Many web services for retrieval of data, conversion of files and generation of properties
Now a platform for:
data deposition, curation and annotation – remove the junk
supporting Open Notebook Science efforts
chemistry document mark-up with ChemMantis
the online ChemSpider Journal of Chemistry
Statistics and Connections
>6000 unique users per day on average
>40,000 transactions per day
>21.4 million compounds and growing daily
Advocate of InChIs for searching and integration
Search Cholesterol
Searching
Structure searching based on:
SMILES
InChIString
InChIKey
StdInChI
StdInChIKey
molfile uploads
structures drawn in an applet
Search across Google (subject to the query-length limit for InChIStrings)
Skeleton search
Full structure search
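The identifier types listed above differ mainly in prefix and shape, so a search box can route a query before any chemistry happens. The sketch below is a hypothetical helper (`classify_query` is not a ChemSpider API); it relies only on the `InChI=` / `InChI=1S/` prefixes and the 14-10-1 block layout of an InChIKey, and leaves SMILES, names and molfiles to a real parser.

```python
import re

# A standard or non-standard InChIKey: 14 + 10 + 1 uppercase letters.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def classify_query(query: str) -> str:
    """Guess which identifier type a search string is (hypothetical helper)."""
    q = query.strip()
    if INCHIKEY_RE.match(q):
        # Index 23 is the 9th character of the second block:
        # 'S' marks a standard InChIKey, 'N' a non-standard one.
        return "StdInChIKey" if q[23] == "S" else "InChIKey"
    if q.startswith("InChI=1S/"):
        return "StdInChI"
    if q.startswith("InChI="):
        return "InChIString"
    # Everything else (SMILES, names, molfile text) needs a real parser.
    return "other"
```

Because every InChIKey is 27 characters long whatever the molecule, it stays within search-engine query limits where a long InChI string may be cut off.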
InChIKey Searches Work
Depositions
Depositions from users – single structures and SDFs
Depositions from databases/vendors – SDF files
And then came InChIs…
InChIs and InChIKeys are available on blogs for harvesting
Publishers are making their structures available as InChIs for harvesting
InChIs are NOT ideal for building a database… some lessons
We want to link to publications especially…
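Harvesting identifiers from blogs and publisher pages can start with pattern matching on the page text. A minimal sketch, assuming plain text or HTML as input; the regular expressions are our own approximation of InChI and InChIKey syntax, not a validator, so a harvested InChI should still be checked with the InChI software before deposition.

```python
import re

# "InChI=1" or "InChI=1S" followed by layers; whitespace, quotes and angle
# brackets never occur inside an InChI, so they end a match.
INCHI_RE = re.compile(r"InChI=1S?/[^\s\"'<>]+")
# 14-10-1 InChIKey blocks, not embedded in a longer run of capitals.
INCHIKEY_RE = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")


def harvest(text: str) -> tuple[list[str], list[str]]:
    """Return unique InChIs and InChIKeys in order of first appearance."""
    # A sentence-final period is not part of the identifier, so drop it.
    inchis = [m.rstrip(".") for m in INCHI_RE.findall(text)]
    keys = INCHIKEY_RE.findall(text)
    return list(dict.fromkeys(inchis)), list(dict.fromkeys(keys))
```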
Chemistry Papers
Cultivation of a rare Verrucosispora strain (sediment, Sea of Japan) gave three polyketides, atrop-abyssomicin C 35, abyssomicin G 36 and abyssomicin H 37. Atrop-abyssomicin C 35 has previously been reported as a synthetic compound, but ready conversion to abyssomicin D suggests that it was probably naturally produced. Atrop-abyssomicin C was an inhibitor of S. aureus N315 (MRSA) and 4-amino-4-deoxychorismate (ADC) synthase. The tenacibactins A–D 38–41, hydroxamate siderophores isolated from culture of the filamentous bacterium Tenacibaculum sp. (Chondrus ocellatus, Awajishima Island, Japan), all possessed iron-chelating activity with tenacibactins C 40 and D 41 being considerably more effective than tenacibactins A 38 and B 39.
Structures in Chemistry Papers
Aesthetics vs Machine Readable
Beautiful chemical structures submitted by authors can be beasts for machines
InChI Representation
InChI’fication of Articles
InChIs from publishers: it is a lot of work for a publisher to provide exact structures for articles. Applause to RSC for Project Prospect and now Nature Chemistry
An enormous editorial task with a massive benefit to the community
If the structures were correct… imagine a centralized DOI:InChI database
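A centralized DOI:InChI database is, at heart, a two-way index. The sketch below is hypothetical: the class name is ours, it keys structures on InChIKeys for fixed-length lookups (a real system would also keep the full InChI, since the key is a hash), and any DOIs used with it here are placeholders rather than real articles.

```python
from collections import defaultdict


class DoiInchiIndex:
    """Hypothetical two-way index between article DOIs and InChIKeys."""

    def __init__(self) -> None:
        self._keys_by_doi: dict[str, set[str]] = defaultdict(set)
        self._dois_by_key: dict[str, set[str]] = defaultdict(set)

    def deposit(self, doi: str, inchikeys: list[str]) -> None:
        """Record that the article with this DOI contains these structures."""
        for key in inchikeys:
            self._keys_by_doi[doi].add(key)
            self._dois_by_key[key].add(doi)

    def structures_in(self, doi: str) -> set[str]:
        """All InChIKeys deposited for one article (empty set if unknown)."""
        return set(self._keys_by_doi.get(doi, set()))

    def articles_mentioning(self, inchikey: str) -> set[str]:
        """All DOIs whose deposited structures include this InChIKey."""
        return set(self._dois_by_key.get(inchikey, set()))
```

Depositing per article and querying per structure is exactly the "link to publications" use case: a resolved InChIKey becomes a list of DOIs.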
Cleaning Structures
Converting InChIs to Structures
Bacitracin A
InChI=1/C66H103N17O16S/......./t35u,36u,37u,40-,41+,42+,43-,44+,45-,46-,47+,48u,52-,53-,54-/m0/s1
InChI=1/C66H103N17O16S/.......)/t35?,36?,37?,40-,41+,42+,43-,44+,45-,46-,47+,48?,52-,53-,54-/m0/s1
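The two Bacitracin A InChIs above differ only in how four stereocentres are flagged. A minimal sketch that reads the /t layer and separates defined centres ('+' or '-') from those marked '?' or 'u' (both flags denote a centre whose configuration is not defined); it assumes a single-component /t layer and treats the elided '.......' layers of the slide as opaque text.

```python
def stereo_parities(inchi: str) -> dict[int, str]:
    """Map atom number -> parity flag from the first /t (tetrahedral) layer.

    Simplification: single-component /t layers only (no ';' or '*').
    """
    for layer in inchi.split("/")[1:]:
        if layer.startswith("t"):
            if ";" in layer or "*" in layer:
                raise ValueError("multi-component /t layers are not handled")
            # Each entry is an atom number followed by one flag character.
            return {int(entry[:-1]): entry[-1] for entry in layer[1:].split(",")}
    return {}


def undefined_centres(inchi: str) -> list[int]:
    """Atoms whose stereo flag is '?' or 'u'."""
    return sorted(a for a, flag in stereo_parities(inchi).items() if flag in "?u")


def defined_centres(inchi: str) -> dict[int, str]:
    """Atoms with an explicit '+' or '-' parity."""
    return {a: flag for a, flag in stereo_parities(inchi).items() if flag in "+-"}
```

Applied to both strings, the defined centres agree and only atoms 35, 36, 37 and 48 differ in how they are marked, which is the kind of detail that is lost or changed when an InChI is converted back into a drawn structure.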
Converting InChIs to Structures
What we want is a good layout, plus retention of stereochemistry labels and tautomers as drawn
AuxInfo – Who Uses It? Who Converts It?
AuxInfo=1/1/N:52,24,90,40,41,50,100,60,51,23,61,68,66,67,17,16,83,65,64,15,81,30,28,84,92,37,62,99,74,71,13,45,11,39,49,22,59,63,9,88,79,31,35,95,98,75,70,42,76,25,5,47,19,56,77,86,78,26,34,93,7,6,53,18,55,44,85,2,48,12,91,10,80,33,14,38,96,73,69,94,43,58,21,1,27,32,3,4,89,87,82,29,36,97,8,72,54,20,57,46/E:(4,5)(13,14)(18,19)(85,86)(87,88)/it:im/rA:100cONOOCCCOCNCNCNCCCCCONCCCCCOCOCCONCCOCNCCCCNCCSCNCCCCCOCCONCCCCCCCCCCNCCONCCCCCCNCOCCNCOCOCNCCNCNOCCC/rB:;;;s3d4;;;d7;;s9;s10;d11;d9s12;;;;s15s16;s14;s18;d18;N19;s19;s22;s23;;s21;d25;s25;d26;s28;s26s30;s25;N31;s33;s34;d34;s35;N35;s37;s39;s39;;s42;d43;s42;s44s45;s44;s47;s47;s49;s49;s51;s38s42;d53;;s55;d55;P56;s56;s59;s59;;s62;d63;s63;d65;s64;s66d67;s7;s6p69;s5s70;d6;s6;;n73s74;d1s2s74;s75;s58;s78;N79;s79;d78;s81;s83;s84;s80;d86;p14s15s86;d77;s61;s77;s16s91;;s55;s62s93p94;s93;d93;s7n96;s9s98;s22;/rC:12.1656,-8.504,0;12.1656,-6.2884,0;16.5968,-3.1336,0;15.3445,-5.2769,0;16.5968,-4.5785,0;16.7654,-7.5166,0;20.0406,-7.5166,0;20.0406,-6.2402,0;22.208,-6.2161,0;23.2436,-5.4937,0;22.8582,-4.2895,0;21.6059,-4.2895,0;21.1725,-5.4937,0;19.294,-19.028,0;16.067,-19.6542,0;11.2264,-17.9202,0;13.1048,-19.6542,0;20.45,-18.426,0;21.5337,-19.028,0;20.45,-17.1255,0;21.5578,-21.1954,0;22.6174,-18.4019,0;22.6174,-17.1255,0;23.7252,-16.4512,0;19.2459,-23.9168,0;22.7137,-21.8698,0;19.2699,-25.2654,0;20.4018,-23.2425,0;23.8697,-21.1714,0;21.5819,-23.8927,0;22.7378,-23.1943,0;18.0899,-23.2665,0;23.9179,-23.8686,0;25.0497,-23.1702,0;26.2298,-23.8686,0;25.0497,-21.8457,0;27.3617,-23.1702,0;26.2298,-25.2172,0;27.3617,-21.8457,0;28.5417,-21.1714,0;26.2298,-21.1714,0;28.5417,-25.2172,0;29.8181,-25.6025,0;30.6128,-24.5188,0;28.5417,-23.8686,0;29.8181,-23.411,0;31.9614,-24.5188,0;32.6357,-25.6989,0;32.6357,-23.3629,0;31.9614,-22.2069,0;33.9603,-23.3629,0;34.6346,-24.5188,0;27.3617,-25.8675,0;27.3617,-27.2161,0;20.0165,-11.3457,0;18.9328,-11.9959,0;20.0165,-10.0452,0;18.9087,-13.3686,0;17.825,-11.3457,0;16.7172,-11.9959,0;17.825,-10.0693,0;23.5807,-11.7551,0;23.5807,-13.0315,0;24.7367,-13.6817,0;22.5211,-13.6817,0;22.5211,-14.9822,0;24.7367,-14.934,0;23.5807,-15.5842,0;18.9328,-8.1427,0;17.8009,-7.5166,0;17.8009,-5.2769,0;16.091,-6.3365,0;16.1633,-8.6003,0;14.0681,-7.3962,0;14.7906,-8.6003,0;12.8158,-7.3962,0;14.0681,-9.8044,0;17.5119,-13.3686,0;16.8135,-14.5728,0;17.5119,-15.7528,0;15.4408,-14.5728,0;17.0699,-12.6613,0;14.7665,-15.7528,0;13.3697,-15.7528,0;12.6713,-16.9569,0;16.8135,-16.9569,0;15.4408,-16.9569,0;17.5119,-18.137,0;14.7906,-10.9845,0;19.0291,-9.395,0;12.6954,-9.8044,0;11.2264,-11.1049,0;22.2321,-10.0452,0;21.1243,-11.9478,0;22.2321,-11.3457,0;21.1243,-9.4191,0;23.3399,-9.4191,0;21.1243,-8.1186,0;22.2321,-7.5166,0;23.7252,-19.028,0;
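AuxInfo is where InChI keeps the input layout that the previous slide asks to retain. A hedged sketch of reading two of its layers, based on our reading of the AuxInfo format: /N: is taken to list the original atom numbers in canonical order, and /rC: to hold one x,y,z triple per input atom, in input order. Multi-component (';'-separated) /N: lists are not handled, and the three-atom test string is a hypothetical fragment built from the first coordinates above, not a real AuxInfo.

```python
def parse_auxinfo(aux: str) -> tuple[list[int], list[tuple[float, float, float]]]:
    """Extract the /N: numbering and /rC: coordinates from an AuxInfo string."""
    numbering: list[int] = []
    coords: list[tuple[float, float, float]] = []
    for layer in aux.split("/"):
        if layer.startswith("N:"):
            # Original (input) atom numbers listed in canonical InChI order.
            numbering = [int(n) for n in layer[2:].split(",")]
        elif layer.startswith("rC:"):
            # One ';'-terminated x,y,z triple per input atom.
            for triple in filter(None, layer[3:].split(";")):
                x, y, z = (float(v) for v in triple.split(","))
                coords.append((x, y, z))
    return numbering, coords


def coords_of_canonical_atom(aux: str, canonical: int) -> tuple[float, float, float]:
    """Drawn coordinates of the atom with the given 1-based canonical number."""
    numbering, coords = parse_auxinfo(aux)
    return coords[numbering[canonical - 1] - 1]
```

Other layers (/E:, /it:, /rA:, /rB:) are skipped here; a converter that wants to restore the drawing exactly would need them too.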
Who Has Responsibility?
Who will take responsibility for drawing/enumerating the structures?
Where can software contribute?
What quality is “good enough”?
We MUST reduce rework!!!
Does one stereocenter matter?
Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn and Softenon: all trade names for thalidomide
The InChI Resolver
HVYWMOMLDIMFJA-DPAQBDIFSA-N
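The key above splits into three hyphen-separated blocks. A minimal sketch of that split, following the published InChIKey layout (a 14-character skeleton hash; an 8-character hash of the remaining layers plus a standard/non-standard flag and a version letter; then a protonation letter). `split_inchikey` and the field names are our own.

```python
from typing import NamedTuple


class InChIKeyParts(NamedTuple):
    skeleton: str       # block 1: hash of the molecular skeleton (connectivity)
    other_layers: str   # block 2, chars 1-8: hash of stereo, isotopic and other layers
    standard: bool      # block 2, char 9: 'S' standard, 'N' non-standard
    version: str        # block 2, char 10: 'A' for InChI version 1
    protonation: str    # block 3: 'N' means no protons added or removed


def split_inchikey(key: str) -> InChIKeyParts:
    """Split a 27-character InChIKey into its parts."""
    blocks = key.split("-")
    if [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError(f"not an InChIKey: {key!r}")
    first, second, third = blocks
    return InChIKeyParts(first, second[:8], second[8] == "S", second[9], third)
```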
Resolve-It
Pretty-It
JMol-It, Download-It and Zoom-It
Kind-of-Resolve-It
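The slide does not say how Kind-of-Resolve-It works. One plausible reading, given the skeleton search mentioned earlier, is a fallback from an exact InChIKey match to a match on the 14-character first block, which ignores the stereo and isotopic information hashed into the second block. The sketch below assumes exactly that; `resolve` and its in-memory index are hypothetical.

```python
def resolve(query: str, index: dict[str, str]) -> tuple[str, list[str]]:
    """Resolve an InChIKey exactly, else "kind of" resolve it by skeleton.

    `index` maps full InChIKeys to records (hypothetical in-memory store).
    """
    if query in index:
        return "exact", [index[query]]
    skeleton = query[:14]  # first block: skeleton (connectivity) hash only
    partial = [record for key, record in index.items() if key[:14] == skeleton]
    return ("skeleton", partial) if partial else ("none", [])
```

A skeleton hit is a hint rather than an answer: it says the same framework exists in the database, possibly with different stereochemistry.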
Generate-It
Draw-It: Thanks Symyx (Beta release)
Generate-It
All Flavors
Serve Up Services
And Once It’s Resolved…
Out to ChemSpider… and its resources
COMING: InChI Resolver to DOIs
Full Text-Based Literature Searching to DOIs
Including Citations Now
When Structures are “Connected”
ChemSpider Everywhere
Linked from Wikipedia
Linked from Open Notebook Science sites using EMBED
Linked from blogs using Structure/Spectra EMBED
Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw and Open Source applets
Integrated into software offerings from Thermo, Waters, Agilent and Bruker
ChemSpider Everywhere
Embed Functionality (like YouTube)
ChemSpider Everywhere
www.spectralgame.com
ChemSpider Everywhere
Crowdsourced Curation of Spectra
ChemSpider Everywhere
RSC Compounds
ChemSpider Everywhere
Nature Chemistry
Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text. Users can choose to view the article with all of the compounds highlighted, and find out more about those compounds by linking out to other information resources including PubChem and ChemSpider.
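The highlighting described above can be approximated with a dictionary-based tagger. This is not how Nature Chemistry annotates its articles (the source does not say); it is a sketch that assumes a hypothetical lexicon of lower-case compound names mapped to InChIKeys, and hypothetical `<span>` markup that a page script could turn into links.

```python
import re


def highlight(text: str, lexicon: dict[str, str]) -> str:
    """Wrap known compound names in <span> tags carrying their InChIKey.

    `lexicon` maps lower-case compound names to InChIKeys (hypothetical data).
    """
    if not lexicon:
        return text
    # Longest names first, so a longer name wins over a shorter one it contains.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)

    def tag(match: re.Match) -> str:
        name = match.group(1)
        key = lexicon[name.lower()]
        return f'<span class="compound" data-inchikey="{key}">{name}</span>'

    return pattern.sub(tag, text)
```

Carrying the InChIKey rather than the name in the markup means the link-out to PubChem or ChemSpider does not depend on how the author spelled the compound.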
ChemSpider Everywhere
ChemMobi
Structure RSS Feeds with InChIs
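One way to put InChIs into an RSS feed without inventing new elements is the optional `domain` attribute of the RSS 2.0 `<category>` element. The sketch below assumes that convention (it is ours, not necessarily ChemSpider's feed format), and the URLs used with it are placeholders.

```python
import xml.etree.ElementTree as ET


def build_feed(title: str, link: str, entries: list[tuple[str, str, str, str]]) -> str:
    """Build an RSS 2.0 feed whose items carry InChI and InChIKey categories.

    entries: (compound name, compound URL, InChI, InChIKey) tuples.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Recently added structures"
    for name, url, inchi, inchikey in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = name
        ET.SubElement(item, "link").text = url
        # Assumed convention: the category "domain" names the identifier type.
        ET.SubElement(item, "category", domain="InChI").text = inchi
        ET.SubElement(item, "category", domain="InChIKey").text = inchikey
    return ET.tostring(rss, encoding="unicode")
```

A feed reader that knows nothing about chemistry still shows the items; a harvester can pick out the identifiers by their `domain`.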
InChIs are Incomplete
What is NOT supported, yet:
polymers
organometallics
Markush structures
3-D structures
excited states
interlocking structures (e.g. rotaxanes)
host-guest complexes
Progressing InChI
Highest priority for the InChI Team: communication with structure drawing package vendors, whose packages are THE interfaces to the users
For the InChI Resolver: delivery of services that allow publishers to deposit their structure collections, with associated DOIs, to ChemSpider
Not every structure is important… discussions with publishers to discern the primary compounds
Conclusions
InChIs and Internet Chemistry
http://inchis.chemspider.com
Acknowledgments
Richard Kidd, Royal Society of Chemistry
Keith Taylor, Symyx
Chris Singleton, Steven Bachrach and Alan McNaught for feedback
“The InChI team”