Opportunities in chemical structure standardization
Opportunities in Chemical Structure Standardization
Valery Tkachenko
Science Data Software, Rockville, USA
Expanding IUPAC Standards for Chemical Information
EMBL-EBI Workshop, March 20-21, 2017
DIKW workflow
[Figure: workflow diagram. Recoverable labels: Literature data; Electronic databases; Text mining; Processing; Analysis; Experimental data; Data collection, curation, integration, and structuring (ontology); Structured nanomaterials data repository; Data analysis and modeling; Predictive data models & tools; Experimental design; Experimental validation; Decision support; Feedback, new data; Disease; Effect]
Karmann Mills and Anthony Hickey, RTI International, RTP, NC 27709, and Alex Tropsha, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, NC 27599
Standards and authorities
We live in a hyperconnected world
Data repositories
Fourches, Muratov, Tropsha. Nat Chem Biol. 2015,11(8):535.
How the problem is being solved now
[Very incomplete] list of common problems
• Violation of chemical and common sense
• Violations of valence bond theory
• Unsupported format and chemical model features
• Information loss during conversion
• Tautomers
• Stereochemical issues
• Mixtures
• Other classes of chemicals (materials, formulations, biologicals, structurally diverse, etc.)
• Equivalence/mapping issues
• Identifiers/names issues
• Etc, etc, etc…
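The "violations of valence bond theory" item above can be illustrated with a very small check. This is only a sketch, not how any particular toolkit does it: the valence table, the function name find_valence_violations, and the atom/bond list representation are all assumptions made for illustration, and charges, aromaticity, and hypervalent atoms are ignored.

```python
from collections import defaultdict

# Assumed, deliberately small table of the usual maximum valence of neutral atoms.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def find_valence_violations(atoms, bonds):
    """Return (atom_index, element, bond_order_sum, limit) for over-bonded atoms.

    atoms: list of element symbols, e.g. ["C", "O"]
    bonds: list of (index_a, index_b, bond_order) tuples
    """
    order_sum = defaultdict(int)
    for a, b, order in bonds:
        order_sum[a] += order
        order_sum[b] += order

    violations = []
    for index, element in enumerate(atoms):
        limit = MAX_VALENCE.get(element)
        if limit is None:
            continue  # element not in the table: this sketch does not judge it
        if order_sum[index] > limit:
            violations.append((index, element, order_sum[index], limit))
    return violations


# Methane with explicit hydrogens: four single bonds on carbon.
methane = (["C", "H", "H", "H", "H"], [(0, i, 1) for i in range(1, 5)])
# A carbon drawn with five single bonds, a typical drawing error.
bad_carbon = (["C", "H", "H", "H", "H", "H"], [(0, i, 1) for i in range(1, 6)])

print(find_valence_violations(*methane))
print(find_valence_violations(*bad_carbon))
```

A production validator would rely on a cheminformatics toolkit's own valence model rather than a hand-written table like this one.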
…problems (continued)
• Multiple [historical, proprietary, shortcoming] formats
  • ChemDraw, ChemSketch, AccelrysDraw
  • MOL, SDF
  • SMILES
  • Identifiers
  • Names and Synonyms
• Multiple toolkits/models
  • Open Source (alphabetical)
    • CDK
    • RDKit
    • Indigo
    • OpenBabel
    • Etc…
  • Commercial (alphabetical)
    • CACTVS
    • ChemAxon
    • OpenEye
    • Etc…
• Historical Hysterical software
• No [machine-readable] standards
• No authorities
No coordinated efforts!!!
Solution
• Agreed and machine-readable (digital) standards
• Open-source (transparent) solution
• Organizations AND community support and involvement
• Accessible solution
• Data triaging at data repositories level
• Real-time validation/standardization (API, library, “docker”, etc)
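As one possible reading of "machine-readable standards" combined with "data triaging", the sketch below keeps the rules as data (JSON) and lets a small engine apply them. Everything named here is hypothetical: the rule ids, the record fields (atom_count, undefined_stereocenters, component_count), the severity levels, and the accept/review/reject outcomes are assumptions for illustration, not part of any published standard. The records are assumed to carry descriptors already computed by some toolkit.

```python
import json

# Hypothetical rule set; a real one would be published by an agreed authority.
RULES_JSON = """
[
  {"id": "empty-structure", "field": "atom_count",
   "op": "eq", "value": 0, "severity": "error"},
  {"id": "undefined-stereo", "field": "undefined_stereocenters",
   "op": "gt", "value": 0, "severity": "warning"},
  {"id": "multi-component", "field": "component_count",
   "op": "gt", "value": 1, "severity": "info"}
]
"""

OPS = {"eq": lambda a, b: a == b, "gt": lambda a, b: a > b}
SEVERITY_ORDER = ["info", "warning", "error"]
DECISION = {"info": "accept", "warning": "review", "error": "reject"}


def validate(record, rules):
    """Return the list of rules that fire for one record."""
    issues = []
    for rule in rules:
        value = record.get(rule["field"])
        if value is not None and OPS[rule["op"]](value, rule["value"]):
            issues.append({"rule": rule["id"], "severity": rule["severity"]})
    return issues


def triage(record, rules):
    """Map the worst severity found to accept / review / reject."""
    issues = validate(record, rules)
    if not issues:
        return "accept"
    worst = max(issues, key=lambda i: SEVERITY_ORDER.index(i["severity"]))
    return DECISION[worst["severity"]]


rules = json.loads(RULES_JSON)
clean = {"atom_count": 9, "undefined_stereocenters": 0, "component_count": 1}
ambiguous = {"atom_count": 9, "undefined_stereocenters": 1, "component_count": 1}
empty = {"atom_count": 0, "undefined_stereocenters": 0, "component_count": 0}

for record in (clean, ambiguous, empty):
    print(triage(record, rules), validate(record, rules))
```

Keeping the rules outside the code is the point of the design: the same JSON could be shared between repositories and toolkits, while each party supplies its own engine.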
@gray_alasdair, Big Data Integration
OpenPHACTS
OpenPHACTS Chemistry Registry System (CRS)
OpenPHACTS CRS shortcomings…
• Platform-dependent
• Toolkit-dependent (potential licensing issues)
• No deployable library
• No [convenient] API
…OpenPHACTS CRS [1] - ongoing work
• Independent of the Microsoft platform
  • .NET Core, Python
  • Linux
  • NoSQL
• Toolkit independent
  • Indigo
  • RDKit (in progress)
  • CDK (planned)
• Docker image
• RESTful API
[1] Was open-sourced and is now supported by the OpenPHACTS Foundation
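The combination of "toolkit independent" and "RESTful API" above suggests a thin adapter layer between the request handler and whichever toolkit is installed. The sketch below shows one way that could look; it is not the CRS code. The Backend interface, the two toy backends, the BACKENDS registry, and the JSON request shape are all hypothetical, and the toy backends only manipulate text where a real adapter would call Indigo, RDKit, or CDK.

```python
import json
from typing import Protocol


class Backend(Protocol):
    """Hypothetical interface that every toolkit adapter would implement."""

    def standardize(self, structure: str) -> str: ...


class TrimBackend:
    """Toy stand-in for a toolkit adapter: only trims surrounding whitespace."""

    def standardize(self, structure: str) -> str:
        return structure.strip()


class FirstTokenBackend:
    """Toy stand-in: keeps the first token, dropping a trailing name field."""

    def standardize(self, structure: str) -> str:
        parts = structure.split()
        return parts[0] if parts else ""


# Registry of available backends; real adapters would be registered here.
BACKENDS = {"trim": TrimBackend(), "first-token": FirstTokenBackend()}


def handle_request(body: str) -> str:
    """Handle one JSON request body and return a JSON response body.

    This is the part a web framework route would call; no server is started.
    """
    request = json.loads(body)
    backend = BACKENDS.get(request.get("backend", "trim"))
    if backend is None:
        return json.dumps({"error": "unknown backend"})
    return json.dumps({"result": backend.standardize(request["structure"])})


print(handle_request(json.dumps({"structure": " CCO ethanol ", "backend": "first-token"})))
print(handle_request(json.dumps({"structure": " CCO ethanol "})))
print(handle_request(json.dumps({"structure": "CCO", "backend": "missing"})))
```

Because the handler only depends on the interface, swapping one toolkit for another would mean registering a different adapter, with no change to the API surface.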
CVSP on Jupyter
Meet the Team
Alexandru Korotcov, Data Science
Rick Zakharov, Technology
Valery Tkachenko, Support
Boris Sattarov, Cheminformatics
Slides: https://www.slideshare.net/valerytkachenko16