How an Online Resource for Chemistry Can Change Our World
How an Online Chemistry Resource Could Change Our World
Antony Williams
Triangle Chromatography Discussion Group, Raleigh, NC, May 2009
Building a Structure Centric Community for Chemists
Imagine a time when…
  The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar)
  There is an online database of NMR, IR and MS spectra and chromatography methods, built by and available to the community
  Chemistry articles are indexed and searchable by “chemistry”
  The web is linked together through the “language of chemistry”
  Publicly funded research data can be shared and discussed in the open, maybe as Open Notebook Science
  Cheminformatics has as much of a public face and success as bioinformatics (Protein Data Bank, GenBank, etc.)
The Language of Chemistry
  My language…
And its dialects…
As a chemist…
  I look for information about chemicals/chemistry
    What is a particular structure?
    What alternative names/identifiers?
    Reaction synthesis?
    Physical properties?
    Analytical data?
    Purchase?
    Tell me more?
    Similar stuff: what other compounds are “like” mine?
Linked Data Cloud
Chemistry on the Internet
  Much of the information online is User Beware!
  The quality of information is “diverse”
  Technologies can “link and connect” information, but validation and curation are key to providing quality
  The Linked Data web is of less value when the data linked are “wrong”
“Good Stuff”: TotallySynthetic.com
PubChem
Questions a chemist might ask…
  What is the melting point of n-butanol?
  What is the chemical structure of Xanax?
  Chemically, what is phenolphthalein?
  What are the stereocenters of cholesterol?
  Where can I find publications about xylene?
  What are the different trade names for Ketoconazole?
  What is the NMR spectrum of Aspirin?
  What are the safety handling issues for Thymol Blue?
Search Cholesterol
Link outs
Complex Data and Information
Online Analytical Data
Various Searches
  Structure searching
  Substructure searching
  Subset searching: choose from 200 data sources
  Property searching
Value for Mass Spectrometrists and Chromatographers?
ChemSpider for Mass Spectrometrists
  What would a mass spectrometrist want to do?
    Search the database based on mass (various forms)
    Search selected subsets of the database based on mass
    Search based on mass and substructure(s)
    Search for structure based on name(s) or database IDs
    Search for structures based on elements/not elements
    Download the structure/structures in standard format
    Search literature for information
    Identify related data sources: chemical vendors, pathway databases, etc.
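The "various forms" of mass in the list above usually mean monoisotopic, average and nominal mass. The sketch below is only an illustration of how the three differ for one formula; it is not ChemSpider code. The small element tables and the helper names `parse_formula` and `mass` are assumptions made for this example, and only C, H, N and O are covered.

```python
import re

# Atomic masses in Da. MONOISOTOPIC uses the most abundant isotope of each
# element; AVERAGE uses standard atomic weights. Only C, H, N, O are listed,
# so any other element raises KeyError in this sketch.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Turn a simple formula such as 'C9H8O4' into {'C': 9, 'H': 8, 'O': 4}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mass(formula, table):
    """Sum the per-element masses from the chosen table."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())


aspirin = "C9H8O4"
mono = mass(aspirin, MONOISOTOPIC)   # what a high-resolution MS measures
avg = mass(aspirin, AVERAGE)         # molecular weight used for weighing
nominal = sum(round(MONOISOTOPIC[el]) * n
              for el, n in parse_formula(aspirin).items())  # integer mass

print(f"monoisotopic {mono:.5f}, average {avg:.3f}, nominal {nominal}")
```

Which form to search on depends on the instrument: accurate-mass data calls for the monoisotopic value, while unit-resolution data is closer to the nominal one.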
Search Database Based on Mass
Mass Based Searches?
  What compounds have a mass of 300 +/- 0.001?
59 hits in 1.3 seconds from 21.5 MILLION
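A mass-window query like the one above can stay fast on a large table if the masses are kept sorted. The following is a minimal sketch of that idea using binary search; it assumes nothing about how ChemSpider actually indexes its data. The five records and the function name `search_by_mass` are made up for this example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (monoisotopic mass in Da, name) records, kept sorted by mass.
records = sorted([
    (180.04226, "aspirin"),
    (180.06339, "glucose"),
    (194.08038, "caffeine"),
    (151.06333, "paracetamol"),
    (386.35487, "cholesterol"),
])
masses = [m for m, _ in records]


def search_by_mass(target, tolerance):
    """Return names whose mass lies within target +/- tolerance."""
    lo = bisect_left(masses, target - tolerance)    # first mass >= lower bound
    hi = bisect_right(masses, target + tolerance)   # one past last mass <= upper bound
    return [name for _, name in records[lo:hi]]


print(search_by_mass(180.04226, 0.001))  # tight window, accurate-mass style
print(search_by_mass(180.05, 0.05))      # wider window catches near neighbours
```

Each lookup costs two binary searches plus the size of the hit list, so the window width, not the table size, dominates the work.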
Substructure and Property
Elemental Constraints
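An elemental constraint ("elements / not elements") can be pictured as a filter on molecular formulas. This is a rough sketch under the assumption that each candidate carries a simple formula string; the names `elements_in` and `matches` and the three candidates are hypothetical and not part of any ChemSpider interface.

```python
import re


def elements_in(formula):
    """Return the set of element symbols found in a simple formula string."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def matches(formula, required=(), excluded=()):
    """True if every required element is present and no excluded one is."""
    present = elements_in(formula)
    return set(required) <= present and not (set(excluded) & present)


# Hypothetical candidate list, e.g. the hits from a mass search.
candidates = {
    "caffeine": "C8H10N4O2",
    "aspirin": "C9H8O4",
    "chloroform": "CHCl3",
}

# Keep compounds that contain C and O but no N and no Cl.
hits = [name for name, formula in candidates.items()
        if matches(formula, required={"C", "O"}, excluded={"N", "Cl"})]
print(hits)
```

Applied after a mass search, a filter like this trims the hit list using what the isotope pattern already tells the analyst about the elements present.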
Search based on Data Sources
Outlinks to vendors and other databases
  Example databases of interest to mass spectrometrists:
    HMDB: Human Metabolome Database
    KEGG: Kyoto Encyclopedia of Genes and Genomes
    BioCyc: collection of Pathway/Genome Databases
    University of Minnesota Biodegradation DB: information on microbial biocatalytic reactions and biodegradation pathways for primarily xenobiotic chemical compounds
    WikiPathways: new initiative to build crowdsourced pathway data management
Links out to KEGG: Kyoto Encyclopedia of Genes and Genomes
WikiPathways Link
Download Structure(s)
  Download individual record: molfile
  Download SDF file (group of structures)
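An SDF file is a series of molfile records, each terminated by a line containing `$$$$`, with the record title on the first line of each molfile. The sketch below splits a downloaded SDF text back into individual records on that delimiter. The sample text is a placeholder with the atom and bond blocks left out, and the helper names `split_sdf` and `record_title` are assumptions for this example.

```python
def split_sdf(text):
    """Split SDF text into records using the '$$$$' terminator line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def record_title(record):
    """The first line of a molfile record holds its name/title."""
    return record.splitlines()[0].strip() if record else ""


# Placeholder SDF text: the real atom and bond blocks are omitted here.
sample = "\n".join([
    "aspirin",
    "  (header and atom/bond block omitted in this sketch)",
    "M  END",
    "$$$$",
    "caffeine",
    "  (header and atom/bond block omitted in this sketch)",
    "M  END",
    "$$$$",
])

records = split_sdf(sample)
print([record_title(r) for r in records])
```

For real work a cheminformatics toolkit would parse the atom and bond blocks as well; the point here is only that the group download is a concatenation of the single-record format.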
Web Service Integration
  ChemSpider is presently integrated with Bruker, Waters and Thermo; more vendors coming…
  Direct integration to vendor data processing tools
MassSpec API Web Services
  http://www.chemspider.com/MassSpecAPI.asmx
Web Services
Test Web Services for MassSpec
  http://www.chemspider.com/WebServices/WSMassSpecAPIDemo.aspx
Test results
Waters Integration
Outlinks from Table
For Chromatographers?
  “Structure-based methods” being linked
  Structure-centric searching of methods
  We can host chromatograms for display
  LogPs and LogDs (pH 5.5 and 7.4) calculated for >21 million compounds using ACD/Labs software
  We’d love to host collections from the column vendors!
  [email protected] (an @chemspider.com address)
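For readers wondering why LogD is quoted at specific pH values while LogP is not: LogD accounts for ionisation. The sketch below uses the common textbook approximation for a monoprotic compound, which assumes only the neutral form partitions into the organic phase. It is not the ACD/Labs algorithm mentioned above, and the logP and pKa values are invented for the example.

```python
import math


def logd_acid(logp, pka, ph):
    """LogD of a monoprotic acid, assuming only the neutral form partitions."""
    return logp - math.log10(1 + 10 ** (ph - pka))


def logd_base(logp, pka, ph):
    """LogD of a monoprotic base (pka is that of the conjugate acid)."""
    return logp - math.log10(1 + 10 ** (pka - ph))


# Hypothetical acid: logP 3.0, pKa 4.5 (values invented for illustration).
for ph in (5.5, 7.4):
    print(f"pH {ph}: logD = {logd_acid(3.0, 4.5, ph):.2f}")
```

The acid becomes more ionised, and therefore less retained on a reversed-phase column, as the pH rises above its pKa, which is why the two pH values give different LogD numbers for the same compound.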
From 21.5 MILLION molecules…
  Data are gathered/deposited from >200 data sources
    Government databases
    Chemical vendors
    Wikipedia
  There are “imperfections” in all online data sources
  How bad can it get?
What is “wrong”?
Quality is a Major Issue: Search Butanol
  OLD EXAMPLE, now fixed
Vancomycin
  Who will curate?
  PubChem is not resourced to clean these errors
  How would you clean such a large dataset?
Wikipedia, C&E News, PubChem
  C&E News (from ACS)
Does one stereocenter matter? Thalidomide
Question Everything: www.dhmo.org
DailyMed
  “DailyMed provides high quality information about marketed drugs. This information includes FDA approved labels (package inserts).”
The FDA’s DailyMed
Structures on DailyMed: Poor Representations
Incorrect Structures: Scanning (?) Issues
Incorrect Structures
Wikis for Science
  Who in the room hasn’t used Wikipedia?
  Is it trustworthy?
  What are the advantages and disadvantages of the Wiki environment?
  How suitable is it for Chemistry?
Collaborative Knowledge Management for Chemists
Wikipedia Curation
  Looking for self-consistency across a Wikipedia page
  Primary key is the article TITLE
  The chemical shown needs to match the title
  Cyclic self-consistency, and decisions must get made
Taxol on PubChem
When are things “wrong”?
  Structures have a timeline…
Creating a trusted source…
  Small databases can be curated by the hosts: EPA’s DSSTox, Wikipedia, etc.
  Who will curate an enormous database?
Crowdsourcing
Curating ChemSpider
  Anyone can “Post Comments” associated with a structure. To curate data we require login to track
Multi-level Curation and Approval
ChemMantis
  Chemical Markup And Nomenclature Transformation Integrated System
On the fly conversion
Nature Publications
Integrations Out to Other Sources
Reactions
ChemSpider Everywhere: RSC Compounds
ChemSpider Everywhere: Nature Chemistry
  Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text.
  Those compounds are linked out to other information resources including PubChem and ChemSpider.
ChemSpider Everywhere: ChemMobi
It Happened in a Basement!!
  Homebuilt servers
  Cable internet
  Software donations
  Lots of hard work
  >8000 users per day
  >80,000 transactions per day
And now…
  The Royal Society of Chemistry announced on May 11th that it has acquired ChemSpider, heralding a breakthrough investment for the organisation and for the Chemistry Community. This acquisition reflects RSC's commitment to providing access to rich resources of chemistry data and information.