BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe
OpenPHACTS - Chemistry Platform Update and Learnings
![Page 1: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/1.jpg)
Open PHACTS - Chemistry Platform Update and learnings
Antony Williams and Valery Tkachenko
ORCID ID:0000-0002-2668-4821
![Page 2: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/2.jpg)
@gray_alasdair Big Data Integration 2
OpenPHACTS and CRS Diagram
![Page 3: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/3.jpg)
The Chemical Registration Service
Chemistry processing
• Validation
• Standardization
• Properties generation
• Properties retrieval
Export
• RDF
• SDF
API
• Domain-specific searches
• Chemical visualization
• Properties
• Conversions
![Page 4: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/4.jpg)
![Page 5: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/5.jpg)
Subsystems
• “CVSP” (frontend, backend, database)
• Compounds (frontend, database)
• OpenPHACTS API (frontend, database)
• Datasources registry (frontend, database)
• Processing farm (optional)
![Page 6: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/6.jpg)
Structure-Based Database linking
• Open PHACTS, and many other projects requiring the linking of structure databases, depend on mappings
• Different databases apply different standardization processes prior to deposition
• Examples: PubChem, EBI databases, ChemSpider, etc.
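The dependence on mappings can be illustrated with a small sketch. It assumes every source has already been reduced to a shared structure key (something like an InChIKey would be one option). The database names, record IDs, keys and the `link_by_structure_key` helper are hypothetical placeholders and are not part of Open PHACTS.

```python
from collections import defaultdict


def link_by_structure_key(sources):
    """Group record IDs from several databases by a shared structure key.

    `sources` maps a database name to {record_id: structure_key}.
    Returns {structure_key: {database: [record_ids]}} for keys that occur
    in at least two databases.
    """
    index = defaultdict(lambda: defaultdict(list))
    for database, records in sources.items():
        for record_id, key in records.items():
            index[key][database].append(record_id)
    return {
        key: {db: sorted(ids) for db, ids in by_db.items()}
        for key, by_db in index.items()
        if len(by_db) >= 2
    }


# Hypothetical records: the IDs and keys are placeholders, not real entries.
sources = {
    "db_a": {"A1": "KEY-ONE", "A2": "KEY-TWO"},
    "db_b": {"B7": "KEY-ONE", "B9": "KEY-THREE"},
}
links = link_by_structure_key(sources)
print(links)
```

If two databases standardize the same compound differently, they produce different keys and the link is silently missed, which is why the standardization step matters for linking.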
![Page 7: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/7.jpg)
DrugBank
• ~60 records can’t be dearomatized unambiguously
• ~40 records where InChIs did not match structure
• 2 records where SMILES, InChI and name did not match the structure
• 7 records with 2 stereo bonds at chiral atoms
DB04283 DB04462
![Page 8: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/8.jpg)
Standardizers
• EBI Standardizer: https://wwwdev.ebi.ac.uk/chembl/extra/francis/standardiser/
• PubChem Standardizer: https://pubchem.ncbi.nlm.nih.gov/standardize/standardize.cgi
• NCGC Standardizer: https://tripod.nih.gov/?p=61
• The CVSP Standardizer work in Open PHACTS: http://cvsp.chemspider.com/
![Page 9: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/9.jpg)
![Page 10: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/10.jpg)
Standardization Rules
• Available from: http://tinyurl.com/hwapem3
• Use the SRS as guidance for standardization
• Adjust as necessary to our needs
![Page 11: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/11.jpg)
Nitro groups
![Page 12: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/12.jpg)
Salt and Ionic Bonds
![Page 13: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/13.jpg)
The CVSP System
http://cvsp.chemspider.com
![Page 14: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/14.jpg)
Supports various file formats
![Page 15: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/15.jpg)
Comptox Chemistry Dashboard
Prior to deposition check a deposition…
![Page 16: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/16.jpg)
>3450 compounds in one SDF
![Page 17: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/17.jpg)
98 Errors, 1571 Warnings
![Page 18: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/18.jpg)
Review Errors
![Page 19: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/19.jpg)
Validation Rule Set
![Page 20: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/20.jpg)
Various Rule Sets Available
![Page 21: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/21.jpg)
CVSP – My own custom rules
![Page 22: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/22.jpg)
ChEMBL Validation Review (of 1.3 million records)
• 11,020 records with 4 bonds and zero charge, e.g. CHEMBL501101 or CHEMBL501973
• 271 records with hypervalent oxygen (e.g. CHEMBL2219679), carbon (e.g. 1005895), boron, chlorine, iodine or phosphine
• 6,177 records where direction of bond makes no sense, e.g. CHEMBL12760 and CHEMBL34704
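A check of the "hypervalent atom" kind can be sketched in a few lines. This is a minimal illustration, not CVSP's rule set: the valence table is a deliberately simplified assumption covering only a few elements, and the atom and bond tuples are a hypothetical in-memory representation rather than a real file format.

```python
# Simplified maximum bond-order sums for neutral atoms. This table is an
# assumption made for the sketch; real validators use fuller valence models.
MAX_NEUTRAL_VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2}


def find_hypervalent_atoms(atoms, bonds):
    """Return indices of neutral atoms whose bond-order sum exceeds the table.

    atoms: list of (element, formal_charge)
    bonds: list of (atom_index_a, atom_index_b, bond_order)
    """
    valence = [0] * len(atoms)
    for a, b, order in bonds:
        valence[a] += order
        valence[b] += order
    flagged = []
    for i, (element, charge) in enumerate(atoms):
        limit = MAX_NEUTRAL_VALENCE.get(element)
        if limit is not None and charge == 0 and valence[i] > limit:
            flagged.append(i)
    return flagged


# A nitrogen drawn with four single bonds but no formal charge.
atoms = [("N", 0), ("C", 0), ("C", 0), ("C", 0), ("C", 0)]
bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)]
print(find_hypervalent_atoms(atoms, bonds))

# The same skeleton with the charge recorded is not flagged by this check.
charged = [("N", 1)] + atoms[1:]
print(find_hypervalent_atoms(charged, bonds))
```

Charged atoms are skipped here on purpose; a fuller rule would adjust the limit by the formal charge instead of ignoring them.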
![Page 23: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/23.jpg)
Chemical Validation first… Standardization Second
• Chemical Validation detects errors – Standardization FIXES them according to rules
• SMIRKS transformations are based on both InChI Normalization and FDA SRS rules
![Page 24: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/24.jpg)
Standardization SMIRKS
Examples of InChI normalization
[*;H+:1]>>[*;H:1]
[O,S,Se,Te:1]=[O+,S+,Se+,Te+:2][C-;v3:3]>>[O,S,Se,Te:1]=[O,S,Se,Te:2]=[C:3]
[N-,P-,As-,Sb-:1]=[C+;v3:2]>>[N,P,As,Sb:1]#[C:2]
Examples of FDA SRS rules
[n:1]=[O:2]>>[n+:1][O-:2]
[*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3]
[N+0;H3:1].[C:3](=[O:4])[O:5][H:6]>>[N+1;H4:1].[C:3](=[O:4])[O-:5]
Thiopurine
[H:1][S:2][c:3]1[n:8][c:7]([H,*:13])[n:6][c:5]2[c:4]1[n:11][c:10]([H,*:12])[n:9]2>>[H:1][N:8]1[C:7]([H,*:13])=[N:6][C:5]2=[C:4]([N:11]=[C:10]([H,*:12])[N:9]2)[C:3]1=[S:2]
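Each SMIRKS rule has a reactant pattern and a product pattern separated by `>>`, with atom-map numbers (`:1`, `:2`, ...) tying atoms on one side to atoms on the other. The sketch below only inspects those map numbers with a regular expression; it does not apply the transformation, which would need a cheminformatics toolkit. The helper name `atom_map_summary` is hypothetical.

```python
import re

# An atom-map number is the ":<digits>" immediately before a closing bracket.
ATOM_MAP = re.compile(r":(\d+)\]")


def atom_map_summary(smirks):
    """Split a SMIRKS string on '>>' and compare atom-map numbers per side."""
    if smirks.count(">>") != 1:
        raise ValueError("expected exactly one '>>' separator")
    reactant, product = smirks.split(">>")
    reactant_maps = {int(n) for n in ATOM_MAP.findall(reactant)}
    product_maps = {int(n) for n in ATOM_MAP.findall(product)}
    return {
        "kept": sorted(reactant_maps & product_maps),
        "removed": sorted(reactant_maps - product_maps),
        "added": sorted(product_maps - reactant_maps),
    }


# Rule taken from the FDA SRS examples on the slide.
rule = "[N+0;H3:1].[C:3](=[O:4])[O:5][H:6]>>[N+1;H4:1].[C:3](=[O:4])[O-:5]"
summary = atom_map_summary(rule)
print(summary)
```

For this rule the mapped hydrogen (map 6) appears only on the reactant side, consistent with the proton leaving the acid as the amine gains a hydrogen and a positive charge.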
![Page 25: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/25.jpg)
Examples of Standardization
Double bond with adjacent wiggly single bond
Collapse hydrogen atoms with no stereo bonds
[Figure: structure diagrams illustrating the two cases; atom labels not recoverable from the transcript]
![Page 26: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/26.jpg)
Examples of Standardization
Remove symmetric stereocenters
Turn off chiral flag if no up or down bonds
[Figure: example structures, one labelled “Chiral flag is set”]
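The chiral-flag rule can be sketched directly against a V2000 molfile, where the flag sits in columns 13-15 of the counts line and each bond line carries a stereo code in columns 10-12 (1 = up, 6 = down). This is a minimal sketch that assumes a well-formed V2000 block; the function name and the two-atom example are illustrative and are not taken from CVSP.

```python
def clear_unused_chiral_flag(molblock):
    """Set the V2000 chiral flag to 0 when no bond is marked up or down."""
    lines = molblock.split("\n")
    counts = lines[3]  # counts line follows the three header lines
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    first_bond = 4 + n_atoms
    bond_lines = lines[first_bond:first_bond + n_bonds]
    # Bond stereo field: 1 = up, 6 = down (4 = either is not a wedge).
    has_wedge = any(int(b[9:12].strip() or 0) in (1, 6) for b in bond_lines)
    chiral_flag = int(counts[12:15].strip() or 0)
    if chiral_flag == 1 and not has_wedge:
        lines[3] = counts[:12] + "  0" + counts[15:]
    return "\n".join(lines)


# Illustrative two-atom block with the chiral flag set but no wedge bonds.
MOLBLOCK = "\n".join([
    "example",
    "  sketch",
    "",
    "  2  1  0  0  1  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])
cleaned = clear_unused_chiral_flag(MOLBLOCK)
print(cleaned.split("\n")[3])
```

Working on fixed columns rather than whitespace-split tokens matters here, because V2000 fields can run together when values use all three characters.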
![Page 27: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/27.jpg)
Defining a Community Rule Set
• There are multiple standardizers, each with its own rule set
• Can we decide on a default community rule set, like Standard InChI, that could be used by ALL standardizers?
• A joint meeting between the Research Data Alliance (RDA), IUPAC and ACS Division of Chemical Information discussed the value and possibilities of this approach (July 2016)
![Page 28: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/28.jpg)
EPA is investigating CVSP
• EPA is investigating CVSP as a validation and standardization platform
• Considering the API aspects of CVSP for integration with our registration system
• CVSP is a reference implementation and “starting point” for a community rule set
![Page 29: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/29.jpg)
CVSP code is now Open Source
• Open Source CVSP code now released
• Code is hosted on Open PHACTS GitHub: https://github.com/openphacts/ops-crs
• Valery Tkachenko will offer future support
• Hoping for additional community engagement and support
• Some details of availability….
![Page 30: OpenPHACTS - Chemistry Platform Update and Learnings](https://reader034.fdocuments.in/reader034/viewer/2022052606/588aae6b1a28ab4c308b6b9b/html5/thumbnails/30.jpg)
Virtual Machines
• OPS_FRONT (all websites and API)
• OPS_BACK (all heavy-lifting)
• OPS_DB (databases)
• VMs are VMware images
• Can be converted to other hypervisors