Presentation of ChemSpider at PubChem Public Meeting
ChemSpider
Creating a Structure Centric Community for Chemists
Antony Williams
[email protected]@chemspider.com
Building a Structure Centric Community for Chemists

The ChemSpider Mission
Build a structure centric community for chemists by:
  Providing an environment for structure drawing, manipulation, visualization, modeling, databasing and searching
  Providing methods by which to deposit, curate and enhance data associated with chemical structures
  Providing structure-based access to federated Chemistry databases representing chemical vendors, literature, online data, patents and other forms of Chemistry data

Execution of the Mission, September 2007
An online database of nearly 20 million structures (should be >21 million following the latest depositions)
Systems in place for:
  Single structure and data collection depositions (in beta testing)
  Association of analytical data with structures
  Ability to curate data for each individual record
Indexing of and integration to:
  More than 80 individual databases
  Patents from the US and European Patent offices (SureChem)

Execution of the Mission, September 2007
Text-based searching of over 50,000 Open Access articles (110,000 have been indexed but are not online yet; structure searching is coming)
Over 100,000 identifiers curated
Average of 1200 unique users per day
A series of web services for people to access a number of our capabilities
Multiple collaborations now in place

Flexible Boolean Searching
Search result: 49 hits in 0.8 seconds

Integrated Visualization Tools

Integrated Analytical Data Management for Public Domain Data

Integrated Access to Open Access Literature
Text-based searching of over 50,000 Open Access Chemistry Articles

External Integrations - Google
Search Across Google Using InChI string
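
Searching Google by InChI string amounts to submitting the identifier as an exact-phrase query. The sketch below shows one way such a link could be built. The function name, the Google URL pattern and the Standard InChI notation for 1-butanol are assumptions chosen for illustration, not ChemSpider's actual integration.

```python
from urllib.parse import urlencode


def inchi_search_url(inchi: str) -> str:
    """Build a web-search URL that looks for an InChI string as an exact phrase.

    Hypothetical helper: the URL pattern is an assumption, not ChemSpider code.
    """
    # Wrapping the identifier in double quotes asks for an exact-phrase match;
    # urlencode percent-escapes '=', '/' and ',' so the query survives intact.
    return "https://www.google.com/search?" + urlencode({"q": f'"{inchi}"'})


# 1-butanol written as a Standard InChI (assumed notation for this example)
butanol = "InChI=1S/C4H10O/c1-2-3-4-5/h5H,2-4H2,1H3"
print(inchi_search_url(butanol))
```

Whether the search engine finds the page still depends on how it tokenizes the string, which is the limitation the later InChIKey slide addresses.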

External Integrations - Patents
SureChem Portal

How do people generally use ChemSpider?
Searching for chemical structures, in rank order, via:
  Trade names, synonyms and registry numbers
  Structure identifiers such as SMILES or InChI
  Intrinsic properties: commonly mass-based searches executed by mass spectrometrists
  Systematic names: IUPAC or CAS Index name
Structure-based searching of Patents
Text-based searching of Open Access articles
Generation of physicochemical properties
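
The mass-based searches mentioned above can be pictured as follows: compute a monoisotopic mass from each molecular formula and keep the records that fall within a tolerance of the measured value. This is a minimal sketch under stated assumptions. The record list, function names and the 10 ppm default are hypothetical, only C, H, N and O are handled, and it is not ChemSpider's search code.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of each element.
# Only four elements are included to keep the sketch short.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def monoisotopic_mass(formula: str) -> float:
    """Sum isotope masses for a simple formula such as 'C4H10O' (no brackets)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[symbol] * (int(count) if count else 1)
    return total


def search_by_mass(records, target, ppm=10.0):
    """Return names whose monoisotopic mass lies within +/- ppm of target."""
    tolerance = target * ppm / 1e6
    return [name for name, formula in records
            if abs(monoisotopic_mass(formula) - target) <= tolerance]


# Hypothetical mini-database of (name, molecular formula) pairs
records = [
    ("1-butanol", "C4H10O"),
    ("diethyl ether", "C4H10O"),
    ("ethanol", "C2H6O"),
    ("caffeine", "C8H10N4O2"),
]
print(search_by_mass(records, 74.0732))
```

Isomers such as 1-butanol and diethyl ether share a formula and therefore a mass, so a mass search returns candidates that still need to be told apart by other data.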

Curators - An Active Community
Active curation is happening every day now
Roboticized curation is underway – scripting to strip obvious errors
Visit the blog posts for detail (www.chemspider.com/blog)

Quality is a Major Issue
PubChem structure-identifier pairs are proliferating
Care is needed, or at least cleansing of the data

Quality is a Major Issue
Other Databases…
1-Butyl alcohol, 1-Hydroxybutane, 1-butanol, Alcool butylique, Butan-1-ol, Butanol-1, Butanolen, Butanolo, Butyl alcohol, Butyl hydroxide, Butyl orthotitanate, Butyl titanate, Butyl titanate (IV), Butyl zirconate, Butylowy alkohol, Butyric alcohol, Butyric or normal primary butyl alcohol, Hemostyp, Methylolpropane, Propylcarbinol, Propylmethanol, Tetrabutoxytitanium, Tetrabutoxyzirconium, Tetrabutyl orthotitanate, Tetrabutyl titanate, Tetrabutyl zirconate, Titanium butoxide (Ti), Titanium tetrabutoxide, Titanium tetrabutylate, Zirconic acid butyl ester, Zirconium tetrabutoxide, n-Butan-1-ol, n-Butanol, n-Butanolbutanolen, n-Butyl alcohol, n-Butylalkohol, propyl carbinol
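
The synonym list above attaches titanium and zirconium butoxide names to 1-butanol (C4H10O), which contains neither metal. One kind of "obvious error" a curation script could strip is a synonym naming an element that the molecular formula lacks. The sketch below illustrates that idea only. The name-fragment table and function names are hypothetical, and this is not ChemSpider's curation code.

```python
import re

# Hypothetical name fragments mapped to the element each one implies.
ELEMENT_HINTS = {"titan": "Ti", "zircon": "Zr"}


def elements_in(formula: str) -> set:
    """Element symbols present in a simple molecular formula."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def flag_suspect_synonyms(formula: str, synonyms: list) -> list:
    """Return synonyms that name an element absent from the formula."""
    present = elements_in(formula)
    suspects = []
    for name in synonyms:
        lowered = name.lower()
        for fragment, element in ELEMENT_HINTS.items():
            if fragment in lowered and element not in present:
                suspects.append(name)
                break  # one mismatch is enough to flag the name
    return suspects


# A few entries taken from the synonym list on the slide
synonyms = [
    "1-Butanol",
    "Butyl alcohol",
    "Butyl titanate",
    "Tetrabutoxyzirconium",
    "Titanium tetrabutoxide",
    "Propylcarbinol",
]
print(flag_suspect_synonyms("C4H10O", synonyms))
```

A rule this simple only catches names that spell out the foreign element. Trade names and subtler mismatches still need a human curator.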

Curating on ChemSpider

Curating PubChem Data
The PubChem team is not resourced to curate the data
The data should be curated
ChemSpider has created an environment to validate and curate the data
Curation is underway
We will feed back curated data to PubChem on an ongoing basis

ChemSpider and PubChem
ChemSpider will deposit our entire database of structures to PubChem following our latest deposition and deduplication cycle (within a month we hope)
ChemSpider is curating data and will submit back to PubChem
At 9:13am today:

Online Deposition System in Beta

Provide Tools for Developers

Targets for 2007
End of year intentions for ChemSpider include:
  Adding more databases to the index
  Enhancing integrations to other structure drawing packages
  Adding property prediction algorithms from partners. More predicted properties to go online shortly. Calculations for >20 million structures are time-consuming!
  Expanding analytical data handling – presently working with a publisher regarding hosting the data for their publications
  Enhancing the Patent integration
  Expanding the Open Access article index to >250,000 articles
  Making Medline structure searchable by text mining

Targets for End of 2007
Source funding to continue the ChemSpider project
Deliver on projects with collaborators:
  ChemModLab with NCSU and NISS for QSAR-based virtual screening. ZINC contains 4.6 million commercially available compounds. ChemSpider has about 10 million commercially available compounds – 3D optimized structures will be generated shortly
  Simbiosys has developed groundbreaking technologies in terms of the speed of virtual screening by docking against targets. ChemSpider ligands will be used in virtual screens
  Connectivities between ChemSpider and Chembench (Alex Tropsha at UNC Chapel Hill) will be enabled

Making the Web Structure Searchable
The InChIString and InChIKey will help make the web structure searchable
InChIStrings are not indexed correctly by web search engines, and the shift is to the InChIKey
“Someone” must host the InChIKey lookup table relating to InChIStrings
“Someone” must provide scalable online tools for the capture, databasing and searching of InChIs
InChIs do NOT make the web searchable by substructure or structural similarity. An index will.
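
The lookup table mentioned above is needed because an InChIKey is a hash and cannot be converted back into its InChIString, so someone has to store the pairs. A minimal sketch of such a table follows. The class and method names are hypothetical, the key pattern assumes the modern 27-character Standard InChIKey layout, and the pairs are supplied by the caller (a real service would generate keys with the InChI software).

```python
import re

# Assumed layout of a Standard InChIKey: 14 letters, 10 letters, 1 letter.
KEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


class InChIKeyTable:
    """Hypothetical in-memory lookup table from InChIKey to InChIString."""

    def __init__(self):
        self._by_key = {}

    def register(self, key: str, inchi: str) -> None:
        """Store one pair after basic format checks."""
        if not KEY_PATTERN.match(key):
            raise ValueError(f"not an InChIKey: {key!r}")
        if not inchi.startswith("InChI="):
            raise ValueError(f"not an InChIString: {inchi!r}")
        self._by_key[key] = inchi

    def resolve(self, key: str):
        """Return the InChIString for a key, or None if it was never deposited."""
        return self._by_key.get(key)


table = InChIKeyTable()
table.register("XLYOFNOQVPJJNP-UHFFFAOYSA-N", "InChI=1S/H2O/h1H2")  # water
table.register("LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
               "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")  # ethanol
print(table.resolve("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))
```

The table answers exact-match questions only: a key either resolves or it does not. Substructure and similarity queries need a separate structure index, as the last point on the slide says.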

Conclusion
ChemSpider is successfully building a structure centric community for chemists
Over 1200 chemists per day utilize ChemSpider to help answer questions and solve their problems
A path forward to enhance the service has been defined

Acknowledgments
Thousands of users for their feedback and ongoing encouragement
The “naysayers” – criticism, when taken constructively, can drive creative actions
Our advisory group of scientists, specialists and friends
The bloggers coming to the ChemSpider Blog and ChemSpider News
www.chemspider.com/blog
www.chemspider.com/news