Oops and Downs of Resolving InChIs For the Chemistry Community
ORCID: 0000-0002-2668-4821
Oops and downs of resolving InChIs for the chemistry community
The InChI Has Arrived
My opinions:
The InChI is a crucial part of the future of structure-based relationships on the web
The semantic web of chemistry will sit on the shoulders of InChI until there is something better
InChIs and publishers are already in a relationship – publishers who have not yet adopted InChI will follow
PPP – Perfection vs Productive vs Prolific
The InChI is not perfect
There are limitations but they are acknowledged and in discussion
The InChI is very “productive”
InChIs are showing up in databases, manuscripts, spreadsheets, on publications, in software
A Lot of Variability in InChIs
Source: Unofficial InChI FAQ page
InChIStrings Hash to InChIKeys
HVYWMOMLDIMFJA-DPAQBDIFSA-N
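The key on this slide can be taken apart to show what a resolver has to work with. The following is a minimal sketch, assuming the usual InChIKey layout of a 14-letter connectivity block, an 8-letter block for the remaining layers followed by a standard/non-standard flag letter and a version letter, and a final protonation letter. The function name `split_inchikey` and the dictionary field names are hypothetical, chosen for illustration only.

```python
import re

# Assumed layout: 14 letters, hyphen, 8 letters + flag letter + version letter,
# hyphen, 1 protonation letter. Names below are illustrative, not an official API.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


def split_inchikey(key):
    """Split an InChIKey into its blocks; raise ValueError if the shape is wrong."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError("not a well-formed InChIKey: %r" % key)
    connectivity, other_layers, flag, version, protonation = match.groups()
    return {
        "connectivity": connectivity,   # hash of the skeleton (connectivity) layers
        "other_layers": other_layers,   # hash of stereo, isotope and other layers
        "is_standard": flag == "S",     # "S" marks a Standard InChIKey
        "version": version,
        "protonation": protonation,
    }


if __name__ == "__main__":
    print(split_inchikey("HVYWMOMLDIMFJA-DPAQBDIFSA-N"))
```

Because the key is a hash, none of these blocks can be turned back into a structure; they can only be compared against keys that are already stored somewhere.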
The InChI Resolver
inchis.chemspider.com
Resolve an InChI or InChIKey
Resolved
Connection Only Resolving
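Connectivity-only resolving can be sketched as matching on the first block of the InChIKey and ignoring the rest. This is a minimal illustration, not the ChemSpider implementation: the helper names are hypothetical, the index is an in-memory dictionary, and the second and third keys in the usage example are included purely as sample data.

```python
from collections import defaultdict


def build_connectivity_index(inchikeys):
    """Group full InChIKeys by their first (connectivity) block."""
    index = defaultdict(list)
    for key in inchikeys:
        first_block = key.split("-")[0]
        index[first_block].append(key)
    return index


def resolve_connectivity_only(index, query_key):
    """Return every stored key that shares the query's connectivity block."""
    first_block = query_key.split("-")[0]
    return list(index.get(first_block, []))


if __name__ == "__main__":
    # Sample data for illustration only.
    stored = [
        "HVYWMOMLDIMFJA-DPAQBDIFSA-N",
        "HVYWMOMLDIMFJA-UHFFFAOYSA-N",
        "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    ]
    index = build_connectivity_index(stored)
    print(resolve_connectivity_only(index, "HVYWMOMLDIMFJA-DPAQBDIFSA-N"))
```

A query with full stereochemistry then returns every record that shares the same skeleton, which is useful when the stored structure was drawn without stereo or with the wrong stereo.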
InChIs and Big Databases
There appears to be a “bigger is better” mentality with online databases
InChI has shown a lot of “overlap” in the ChemSpider database
Distinction: a unique chemical entity versus what it’s meant to be
Some simple examples …
Spot The Difference
Standard InChIKeys
Spot The Difference
55 Hits in 0.08 Seconds
Large Databases Contain Junk
InChI Resolvers will get us back to results, but it’s only a lookup
There is an enormous need for curation and linking resolved structures to “correct” structures – a manual task
Generate-It
Draw and generate
Generate
All Flavors
Historical and Future InChIs
The Standard InChI removed variability
There will be new variants in the future
There are already millions of historical InChIs “out there”
Resolvers should accommodate historical and future InChIs
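One way a resolver might tell these variants apart is by the identifier prefix. The sketch below assumes that Standard InChIs start with `InChI=1S/` and non-standard version 1 InChIs start with `InChI=1/`; anything else beginning with `InChI=` is treated as an unknown (older or future) variant. The function name `classify_inchi` and the returned labels are hypothetical.

```python
def classify_inchi(text):
    """Label an identifier string by its InChI prefix (illustrative labels only)."""
    value = text.strip()
    if value.startswith("InChI=1S/"):
        return "standard"
    if value.startswith("InChI=1/"):
        return "non-standard"
    if value.startswith("InChI="):
        # Older beta formats or future versions end up here in this sketch.
        return "unknown-variant"
    return "not-an-inchi"


if __name__ == "__main__":
    for example in ("InChI=1S/CH4/h1H4", "InChI=1/CH4/h1H4", "CH4"):
        print(example, "->", classify_inchi(example))
```

A resolver built this way could route each class to a different index instead of rejecting everything that is not a Standard InChI.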
In Our Resolver…
On to ChemSpider…
NEW Patents and Pubmed on ChemSpider
InChIs to Patents and Pubmed Articles
But there will be multiple resolvers…
Each publisher, database, or scientist can choose not to publish their structures into a centralized database
There are many large online databases. There is no need to merge/mirror them – each can be a resolver
They need to be federated
Many ways to address resolving
Our approach is simple – lookup. We look up the structure. SIMPLE.
NCI/CADD resolver: 69 million structures
Differences
The NCI and ChemSpider Resolvers are “different”
Different databases behind the resolver – Feedback from NCI: “Preliminary results indicate that inchis.chemspider.com can resolve approx. 28% of our structures.”
Our approaches for resolving differ
Some features are different
The InChI Resolver Protocol
There will not be only one InChI Resolver – there will be many:
Publishers
Commercial databases
Free services and resources: PubChem, ChemSpider, NCI Database, ChEBI
Resolvers will not be mirrors of each other
There is no need to mirror when a protocol is in place
InChI Resolver Protocol
InChI resolving needs to be federated
A common protocol can connect resolvers so that a user gets a complete results set
Individual resolvers can have different capabilities but an agreed common protocol for resolving InChIs
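The federation idea above can be sketched as a set of independent resolvers that share one agreed method, with a thin layer that asks each of them and merges the answers. No protocol had been drafted at the time of this talk, so everything here is an assumption made for illustration: the `resolve` method name, the `DictResolver` class, the resolver names, and the record identifiers are all hypothetical, and the resolvers are in-memory dictionaries rather than network services.

```python
class DictResolver:
    """A stand-in resolver backed by a dictionary of InChIKey -> record ids."""

    def __init__(self, name, records):
        self.name = name
        self._records = records

    def resolve(self, inchikey):
        # The one operation every resolver is assumed to support.
        return list(self._records.get(inchikey, []))


def federated_resolve(resolvers, inchikey):
    """Ask every resolver in turn and collect the non-empty answers by name."""
    results = {}
    for resolver in resolvers:
        hits = resolver.resolve(inchikey)
        if hits:
            results[resolver.name] = hits
    return results


if __name__ == "__main__":
    key = "HVYWMOMLDIMFJA-DPAQBDIFSA-N"
    # Made-up resolver names and record ids, for illustration only.
    resolvers = [
        DictResolver("resolver-a", {key: ["a:001"]}),
        DictResolver("resolver-b", {}),
        DictResolver("resolver-c", {key: ["c:17", "c:42"]}),
    ]
    print(federated_resolve(resolvers, key))
```

Keeping the shared surface to a single operation leaves each resolver free to offer extra features while the user still sees one combined result set.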
Discuss with us on Google Groups
Draft protocol for ACS Spring 2010 from RSC ChemSpider, NCI/CADD, PubChem, Symyx
Proof of concept hopefully by end of this year for initial feedback (NCI and ChemSpider)
Join us at http://tinyurl.com/r7q9zc or http://groups.google.com/group/inchiresolverprotocol
InChI trust
The founder members of the Trust: Elsevier, Thomson Reuters, Wiley, Nature Publishing Group, Royal Society of Chemistry, Symyx, FIZ-Chemie, Taylor & Francis and OpenEye
In InChIs We Trust
It was said: “There is a finite, but very small probability of finding two structures with the same InChIKey.”
The first collision was announced on Sunday by Jonathan Goodman
Spongistatin
Probabilities are what they are…
“The molecule for which a collision has been reported … gives rise to 2^26 = 67,108,864 possible stereoisomers”
The probability of a clash is low but finite…and it happened.
OR…there may be a bug…work underway
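A back-of-the-envelope estimate shows why a large family of stereoisomers is where a clash would be expected first. The sketch below uses the standard birthday approximation for the expected number of colliding pairs, n(n-1)/2 divided by the size of the hash space. It rests on two assumptions that are not from the slides: that the second InChIKey block carries about 37 bits of hash (stereoisomers share the first block, so only the second block separates them), and that the hash behaves as uniformly random. The function name is hypothetical.

```python
def expected_colliding_pairs(n_items, hash_bits):
    """Birthday approximation: expected number of pairs sharing a hash value."""
    pairs = n_items * (n_items - 1) / 2
    return pairs / float(2 ** hash_bits)


if __name__ == "__main__":
    stereoisomers = 2 ** 26          # figure quoted on the slide
    second_block_bits = 37           # ASSUMPTION: hash bits in the second block
    estimate = expected_colliding_pairs(stereoisomers, second_block_bits)
    print("stereoisomers:", stereoisomers)
    print("expected colliding pairs (all isomers enumerated):", estimate)
```

Under those assumptions the estimate applies only if every possible stereoisomer were generated and hashed; for the handful of isomers that actually appear in databases, the chance of a clash remains very small.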
The Future
InChI is here
InChIKeys are proliferating
The need for lookup is inevitable – the need for federated resolvers is obvious
Intention to provide draft resolver protocol by end of year
ACS Spring – unveil proof of concept
Acknowledgments
The InChI “Team” – leadership team, developers, advisors, funders and the community providing feedback
Royal Society of Chemistry