An architecture for an Open Science molecular compound database
![Page 1: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/1.jpg)
Department of Bioinformatics - BiGCaT 1
An architecture for an Open Science molecular compound database
Egon Willighagen, @egonwillighagen
Dept. of Bioinformatics - BiGCaT - Maastricht University
orcid.org/0000-0001-7542-0286
ACS New Orleans, 9 April 2013, #ACSNola
![Page 2: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/2.jpg)
This session: Public Databases ...
• Public: what's that?
  – free access?
  – redistribute?
  – modify?
• BTW, what is “Open Access”?
![Page 3: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/3.jpg)
This session: Serving the community...
• Service
  – What do people want?
  – Do they know what is possible?
• Community
  – Who are they? → Personas!
  – Usability must include learnability
![Page 4: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/4.jpg)
Personas
• Not every scientist is alike
• You cannot and must not target one user
• Instead, target at least 2 different users, particularly:
  – The hacker doing all the actual bioinformatics in the lab
  – The professor who has too little time to understand things outside his narrow field
![Page 5: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/5.jpg)
Reason #1: Bioclipse decision support
Spjuth, O. et al. JCIM 2011 51(8):1840-1847.
![Page 6: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/6.jpg)
Data #1: Linked Open Drug Data
Samwald, M. et al. Linked open drug data for pharmaceutical research and development. JChemInf 2011.
![Page 7: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/7.jpg)
Data^2: Linked Open Data
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ Sept 2011, CC-BY-SA.
![Page 8: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/8.jpg)
Linked Open Data in the Life Sciences
![Page 9: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/9.jpg)
WikiPathways
Pico, AR et al. PLoS biology 6.7 (2008): e184.
![Page 10: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/10.jpg)
PathVisio: Pathway Analysis
Van Iersel, M. et al. BMC Bioinfo. 2008 9(1):399.
![Page 11: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/11.jpg)
Reason #2: Publishing
• Journals will increasingly require data deposition
  – e.g. BioMed Central:
![Page 12: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/12.jpg)
Needs
• We must propagate rights
  – whether open or not!
• We must make things explicit
  – e.g. by using semantics
  – e.g. by using the InChI
![Page 13: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/13.jpg)
Tool #1: licensing
![Page 14: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/14.jpg)
Open Data #1: crystallography
![Page 15: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/15.jpg)
Open Data #2: Open Notebook Science
![Page 16: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/16.jpg)
Open Data #3: CrystalEye
![Page 17: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/17.jpg)
Licensing → Open not Required
• But not providing info is a killer
  – no, not really, because no scientist seems to care
  – yes, because how will a machine cope? Think scalability and massive data integration efforts
![Page 18: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/18.jpg)
Why does explicit licensing matter?
Because when there is a fire, you want immediate access to the fire hose. You do not want to wait for permission from the mayor.
Because when you want to validate your scientific results, you want immediate access to related data. You do not want to wait for permission from that professor who is on a conference tour for the next 4 weeks. You must have an immediate answer, whatever it is.
![Page 19: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/19.jpg)
Tool #2: Semantic Web to the rescue
• Allows provenance
  – states where data came from
  – tells us our rights
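A minimal sketch of what graph-level provenance could look like, assuming the data is serialized as N-Quads and that the Dublin Core terms `dcterms:source` and `dcterms:license` carry the origin and the rights. The `example.org` IRIs and the helpers `quad` and `literal` are hypothetical and only serve the illustration; the talk does not prescribe this vocabulary.

```python
# Hypothetical sketch: one fact plus graph-level provenance, written as N-Quads.
DCTERMS = "http://purl.org/dc/terms/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def quad(subject, predicate, obj, graph):
    """Format one N-Quads line; obj must already be serialized."""
    return f"<{subject}> <{predicate}> {obj} <{graph}> ."


def literal(text):
    """Serialize a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


graph = "http://example.org/graph/ons-solubility"  # assumed graph name
compound = "http://example.org/compound/1"         # assumed compound IRI

lines = [
    # The fact itself, stored inside the named graph.
    quad(compound, RDFS_LABEL, literal("methane"), graph),
    # Statements about the graph: where the data came from ...
    quad(graph, DCTERMS + "source", "<http://example.org/dataset/ons>", graph),
    # ... and under which terms it may be reused.
    quad(graph, DCTERMS + "license",
         "<http://creativecommons.org/publicdomain/zero/1.0/>", graph),
]

print("\n".join(lines))
```

Because origin and license are ordinary statements about the graph, a consumer can read the rights with the same tooling it uses for the data.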
![Page 20: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/20.jpg)
App #1: Spidering the semantic web
Spjuth, O. et al. JChemInf 2013 5:14.
![Page 21: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/21.jpg)
App #2: Making a web
http://rdf.openmolecules.net/
![Page 22: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/22.jpg)
App #3: Open PHACTS Explorer
http://www.openphacts.org/ → room 349, 2:20pm
![Page 23: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/23.jpg)
How #1: RDF Graphs
![Page 24: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/24.jpg)
How #1: RDF Graphs
PREFIX cheminf: <http://semanticscience.org/resource/>

SELECT ?graph ?p ?o WHERE {
  GRAPH ?graph {
    ?mol cheminf:CHEMINF_000200 [
           a cheminf:CHEMINF_000059 ;
           cheminf:SIO_000300 "$inchikey"
         ] ;
         ?p ?o .
  }
}
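The `$inchikey` placeholder in the query above has to be filled in before the query is sent. The following is a minimal sketch of that step, assuming a SPARQL endpoint that accepts the query in a `query` URL parameter; the endpoint URL, the helper names `build_query` and `build_request_url`, and the choice to validate the key with a regular expression are assumptions for illustration, not taken from the slides.

```python
import re
from string import Template
from urllib.parse import urlencode

# The query from the slide; $inchikey is the only template placeholder.
QUERY = Template("""\
PREFIX cheminf: <http://semanticscience.org/resource/>
SELECT ?graph ?p ?o WHERE {
  GRAPH ?graph {
    ?mol cheminf:CHEMINF_000200 [
      a cheminf:CHEMINF_000059 ;
      cheminf:SIO_000300 "$inchikey"
    ] ;
    ?p ?o .
  }
}
""")

# InChIKey layout: 14 letters, 10 letters, 1 letter, separated by hyphens.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def build_query(inchikey):
    """Fill in the placeholder, rejecting anything that is not an InChIKey."""
    if not INCHIKEY_PATTERN.match(inchikey):
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    return QUERY.substitute(inchikey=inchikey)


def build_request_url(endpoint, inchikey):
    """Build a GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": build_query(inchikey)})


# Hypothetical endpoint; the key is the standard InChIKey of methane.
url = build_request_url("http://example.org/sparql", "VNWKTOKETHGBQD-UHFFFAOYSA-N")
print(url)
```

Checking the key before substitution also keeps arbitrary text out of the query string, which matters once the value comes from a web form.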
![Page 25: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/25.jpg)
NanoPub.org
![Page 26: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/26.jpg)
Graph output
![Page 27: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/27.jpg)
Is that it?!? Just an architecture??
Yes, but a simple and flexible one. Keep an eye on my blog. This will happen in the next few months:
1. Aggregate all CCZero/PDDL data around chemical properties
   1. Open Notebook Science (solubility, melting point)
   2. ChemPedia
   3. Crystallography (COD, CrystalEye)
   4. ...
2. Calculate molecular properties with the CDK (and release as CCZero)
3. Host on http://linkedchemistry.info/chembox
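Step 1 of this plan can be sketched as a filter-and-merge over harvested records. This is only an illustration under stated assumptions: the record layout, the source names, the numeric values and the `aggregate` helper are all made up, and step 2 (property calculation with the CDK) is not covered.

```python
from collections import defaultdict

# License labels treated as waivers in this sketch (assumption).
WAIVERS = {"CCZero", "PDDL"}

# Made-up records: sources and values are placeholders, not measured data.
records = [
    {"inchikey": "VNWKTOKETHGBQD-UHFFFAOYSA-N", "property": "melting point",
     "value": 1.0, "source": "source-a", "license": "CCZero"},
    {"inchikey": "VNWKTOKETHGBQD-UHFFFAOYSA-N", "property": "solubility",
     "value": 2.0, "source": "source-b", "license": "PDDL"},
    {"inchikey": "VNWKTOKETHGBQD-UHFFFAOYSA-N", "property": "melting point",
     "value": 3.0, "source": "source-c", "license": "unknown"},
]


def aggregate(rows):
    """Group waiver-licensed facts per compound, keeping each fact's source."""
    merged = defaultdict(list)
    for row in rows:
        if row.get("license") not in WAIVERS:
            continue  # skip anything without an explicit waiver
        merged[row["inchikey"]].append(
            (row["property"], row["value"], row["source"])
        )
    return dict(merged)


merged = aggregate(records)
for key, facts in merged.items():
    print(key, facts)
```

Keeping the source next to every value means the aggregate can still say where each number came from after the merge.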
![Page 28: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/28.jpg)
CHEMINF ontology
Hastings, J. et al. PLoS ONE 2011 6(10):e25513.
![Page 29: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/29.jpg)
Architecture
Triple Store (e.g. Virtuoso)
Web server (HTML / RDF)
• Graphs
• Explicit license info
• InChI/FixedH
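The three ingredients on this slide (graphs, explicit license info, an InChI-based key) can be mimicked with a toy in-memory store. This is a sketch only: `NamedGraph` and `TinyStore` are hypothetical names, the store is not Virtuoso's API, and using the InChIKey string directly as the triple subject is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class NamedGraph:
    license: str                                  # explicit license, one per graph
    triples: list = field(default_factory=list)   # (subject, predicate, object)


class TinyStore:
    """Toy stand-in for a triple store with named graphs."""

    def __init__(self):
        self.graphs = {}

    def add(self, graph_iri, license_iri, triple):
        """Add a triple to a graph, creating the graph on first use."""
        graph = self.graphs.setdefault(graph_iri, NamedGraph(license_iri))
        if graph.license != license_iri:
            raise ValueError("a graph carries exactly one license")
        graph.triples.append(triple)

    def about(self, inchikey):
        """Yield (graph, license, predicate, object) for one compound key."""
        for graph_iri, graph in self.graphs.items():
            for subject, predicate, obj in graph.triples:
                if subject == inchikey:
                    yield graph_iri, graph.license, predicate, obj


KEY = "VNWKTOKETHGBQD-UHFFFAOYSA-N"  # standard InChIKey of methane
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"

store = TinyStore()
store.add("http://example.org/graph/a", CC0, (KEY, "label", "methane"))
store.add("http://example.org/graph/b", "http://example.org/terms",
          (KEY, "formula", "CH4"))

for row in store.about(KEY):
    print(row)
```

Every answer comes back with the graph it was found in and that graph's license, so the rights travel with the data instead of being looked up separately.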
![Page 30: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/30.jpg)
/FixedH ?!?!
![Page 31: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/31.jpg)
Conclusions & Outlook
• We must propagate rights
  – whether open or not!
• We must make things explicit
  – e.g. by using semantics
  – e.g. by using the InChI with FixedH
![Page 32: An architecture for an Open Science molecular compound database](https://reader033.fdocuments.in/reader033/viewer/2022051515/54c673c74a7959d4168b457f/html5/thumbnails/32.jpg)
More information
• @egonwillighagen
• http://chem-bla-ics.blogspot.com/
• http://egonw.github.com/
• http://orcid.org/0000-0001-7542-0286