Liberating Laboratory Data - Eureka

Eureka Research Workbench: Semantic Capture of the Scientific Process Stuart J. Chalk Department of Chemistry University of North Florida Jacksonville, FL USA [email protected] Liberating Laboratory Data – Day

description

Presentation on the use of the Eureka Research Workbench to store data and scientific workflow information. Presented online as part of the Dial-a-molecule 'Liberating Laboratory Data' event (http://www.dial-a-molecule.org/wp/events-listing/liberating-laboratory-data/)

Transcript of Liberating Laboratory Data - Eureka

Page 1: Liberating Laboratory Data - Eureka

Eureka Research Workbench: Semantic Capture of the Scientific Process

Stuart J. Chalk
Department of Chemistry
University of North Florida
Jacksonville, FL USA
[email protected]

Liberating Laboratory Data – Day 2

Page 2: Liberating Laboratory Data - Eureka

Capturing Science Data

- Data is a fundamental output of science, but…
  - Data is not useful if it does not have context
  - Big data analytics needs detailed, well-structured metadata and relationships to assemble aggregated datasets for useful interpretation
- Options
  - LabArchives: http://www.labarchives.com
  - eCAT: http://www.researchspace.com/electronic-lab-notebook/
  - LabTrove: http://www.labtrove.org/
  - Dryad data publishing: http://datadryad.org/
  - or …

Page 3: Liberating Laboratory Data - Eureka

Eureka Research Workbench

- Started in 2006 as an offshoot of getting involved in the Analytical Information Markup Language (AnIML) project
  - No way to store all research notes in a digital format
  - No way to capture the workflow of scientists
- Realized writing in a lab notebook is equivalent to “multi-type” blogging in the digital world
- How to capture information? Many datatypes -> ExptML
- How to store files and make them available through web interface? (Fedora-Commons)
- How to link data together? RDF (in Fedora-Commons)

Page 4: Liberating Laboratory Data - Eureka

Experiment Markup Language (ExptML)

- A specification (written in XML) that describes different types of information recorded during the scientific process (http://exptml.sourceforge.net)
- Many datatypes (will expand…)
  - Annotation, Api, Calculation, Chemical, Citation, Communication, Customer, Data, Dataset, Definition
  - Element, Equipment, Event, Experiment, Group, Project, Protocol, Quote, Report, Result
  - Sample, Solution, Space, Specimen, Substance, Task, Template, Timeline, User, Vendor

Page 5: Liberating Laboratory Data - Eureka

ExptML Chemical Schema

Page 6: Liberating Laboratory Data - Eureka

ExptML Chemical Schema

Page 7: Liberating Laboratory Data - Eureka

ExptML Chemical Instance

Page 8: Liberating Laboratory Data - Eureka

Related Data - ExptML Ontology

- In computer science, an ontology “formally represents knowledge as a set of concepts within a domain, and the relationships between those concepts. It can be used to model a domain and support reasoning about concepts.”*
- In essence, an ontology allows us to define the relationships and assertions about concepts
- For substances represented in ExptML we define
  - isSubstance (assertion)
  - hasSubstance
  - isSubstanceOf

*https://en.wikipedia.org/wiki/Ontology_(information_science)
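
To make the substance relationships above concrete, here is a minimal Python sketch that records them as subject-predicate-object triples and prints them in an N-Triples-like form. Only the predicate names hasSubstance and isSubstanceOf come from the slide. The namespace URI, the object names `sol1` and `chm1`, and the helper functions are assumptions made for this illustration, and treating isSubstanceOf as the inverse of hasSubstance is one plausible reading of the slide, not a statement of how the ExptML ontology defines it. The isSubstance assertion is not modeled here.

```python
# Hypothetical namespace; the real ExptML ontology URI is not given in the slides.
EXPTML_NS = "http://example.org/exptml#"


def uri(local_name):
    """Expand a local name into a full URI in the assumed namespace."""
    return f"<{EXPTML_NS}{local_name}>"


def inverse_triples(triples, predicate, inverse):
    """For every (s, predicate, o) return the inverse statement (o, inverse, s)."""
    return [(o, inverse, s) for (s, p, o) in triples if p == predicate]


def to_ntriples(triples):
    """Serialize triples one per line in N-Triples style."""
    return "\n".join(f"{uri(s)} {uri(p)} {uri(o)} ." for s, p, o in triples)


# Assumed example: a solution object that contains a chemical object.
triples = [("sol1", "hasSubstance", "chm1")]
# Assumption: isSubstanceOf is read as the inverse of hasSubstance.
triples += inverse_triples(triples, "hasSubstance", "isSubstanceOf")

print(to_ntriples(triples))
```

Storing both directions explicitly keeps lookups simple in either direction, at the cost of keeping the two statements in sync.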

Page 9: Liberating Laboratory Data - Eureka

ExptML Ontology

Page 10: Liberating Laboratory Data - Eureka

Fedora Commons

- Digital repository software for creating and managing online digital libraries
  - Stores the ExptML files
  - Stores any other files (PDFs, images, Word etc.)
  - Stores relationships as RDF
- Version control
- Checksumming
- Built-in search of content and relationships
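
The idea behind checksumming can be shown with a short, generic Python sketch: a digest is recorded when content is stored and recomputed later to detect any change. This is not Fedora's own code or API. The choice of SHA-256 and the function names are assumptions for illustration, since the slides do not say which algorithm the repository was configured to use, and the byte string is a placeholder rather than real ExptML.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def is_intact(content: bytes, recorded: str) -> bool:
    """Compare a freshly computed digest with the one recorded at storage time."""
    return checksum(content) == recorded


# Placeholder bytes standing in for a stored file.
original = b"placeholder content standing in for an ExptML file"
recorded = checksum(original)

print(is_intact(original, recorded))         # True
print(is_intact(original + b" ", recorded))  # False: one extra byte changes the digest
```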

Page 11: Liberating Laboratory Data - Eureka

File Storage

- Fedora-Commons treats each ExptML file as an object
- In the definition of a Fedora object the file is just one stream of many. By default each object also has a “DC” stream of metadata and a “RELS-EXT” stream of relationships
- Each Fedora object can have any number of additional streams for
  - Paper PDFs, product/sample pictures, original file formats (if a conversion has been done)
  - Video, audio, anything
- You can export individual streams or the whole Fedora object with streams binary encoded (sharing/archiving)
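
The object-with-streams idea described on this slide can be modeled in a few lines of Python. This is a toy data structure, not Fedora's API or its export format. The class names, the stream identifiers `EXPTML` and `PAPER`, the MIME types, and the dictionary layout of the export are all assumptions for illustration. Only the default stream names "DC" and "RELS-EXT" and the identifier `exptml:chm1` appear in the slides.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class Stream:
    """One stream of an object: an identifier, a MIME type, and raw bytes."""
    stream_id: str
    mime_type: str
    content: bytes = b""


@dataclass
class RepositoryObject:
    """Toy model of a repository object that carries any number of streams."""
    pid: str
    streams: dict = field(default_factory=dict)

    def __post_init__(self):
        # Every object starts with the two default streams named on the slide.
        # The MIME types here are assumed.
        defaults = (("DC", "text/xml"), ("RELS-EXT", "application/rdf+xml"))
        for stream_id, mime_type in defaults:
            self.streams.setdefault(stream_id, Stream(stream_id, mime_type))

    def add_stream(self, stream):
        """Attach an additional stream (PDF, picture, original file, ...)."""
        self.streams[stream.stream_id] = stream

    def export_stream(self, stream_id):
        """Export a single stream as raw bytes."""
        return self.streams[stream_id].content

    def export_object(self):
        """Export the whole object with every stream base64-encoded."""
        return {
            "pid": self.pid,
            "streams": {
                sid: {
                    "mime_type": s.mime_type,
                    "content_base64": base64.b64encode(s.content).decode("ascii"),
                }
                for sid, s in self.streams.items()
            },
        }


obj = RepositoryObject("exptml:chm1")
obj.add_stream(Stream("EXPTML", "text/xml", b"placeholder for the ExptML file"))
obj.add_stream(Stream("PAPER", "application/pdf", b"placeholder for a paper PDF"))
print(sorted(obj.streams))  # ['DC', 'EXPTML', 'PAPER', 'RELS-EXT']
```

Base64-encoding the stream content lets binary files travel inside a text document, which is what makes a single-file export suitable for sharing or archiving.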

Page 12: Liberating Laboratory Data - Eureka

File Storage

Page 13: Liberating Laboratory Data - Eureka

Eureka Interface

- So, finally to the Eureka Research Workbench!
- Web interface written in PHP using the CakePHP Framework
- Communicates with Fedora-Commons API to create, retrieve, update and delete (CRUD) ExptML and other files
- Representational State Transfer (REST) format for URLs
  - E.g. http://web.server/chemicals/view/exptml:chm1
- Allows for searching of all files in Fedora
- Can also search based on relationships
- Can extract data out of XML files
- Can gather data from other websites (via API controller) and add it to ExptML files
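
The example URL on this slide appears to follow a controller/action/identifier pattern. The Python sketch below splits and rebuilds such URLs. Reading the three path segments as controller, action and object identifier is an assumption based on CakePHP's conventional routing, not something the slides state. The function names and the `edit` action are hypothetical, and the sketch is in Python for illustration even though Eureka itself is written in PHP.

```python
from urllib.parse import urlsplit


def parse_route(url):
    """Split a /controller/action/id style URL into its three parts."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) != 3:
        raise ValueError(f"expected /controller/action/id, got {url!r}")
    controller, action, object_id = parts
    return {"controller": controller, "action": action, "id": object_id}


def build_url(base, controller, action, object_id):
    """Assemble a URL in the same three-segment form."""
    return f"{base.rstrip('/')}/{controller}/{action}/{object_id}"


route = parse_route("http://web.server/chemicals/view/exptml:chm1")
print(route)
# {'controller': 'chemicals', 'action': 'view', 'id': 'exptml:chm1'}

# Hypothetical 'edit' action for the same object.
print(build_url("http://web.server", route["controller"], "edit", route["id"]))
# http://web.server/chemicals/edit/exptml:chm1
```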

Page 14: Liberating Laboratory Data - Eureka

Eureka Website – Notebook

Typical things we record in our notebook

Page 15: Liberating Laboratory Data - Eureka

Conclusion

- Eureka uses ExptML for representing science data
- Reliable storage system for ExptML files (Fedora)
- Method for storage of relationships (RDF in Fedora)
- Web application to create ExptML files (Eureka)
- TODO
  - Provide web functionality to process data
  - Provide mechanism for sharing of data (authenticated)
  - Integration into the RDA model for sharing research data
  - Integrate with many other websites, e.g. ChemSpider
  - Support enlItemManifest and future RDA specifications

Page 16: Liberating Laboratory Data - Eureka

References

Eureka – http://sourceforge.net/projects/eureka

Fedora-Commons – http://fedora-commons.org

XML – http://www.w3.org/standards/xml

ExptML – http://exptml.sourceforge.net/

JSON – http://www.json.org/

UnitsML – http://unitsml.nist.gov/

RDF – http://www.w3.org/RDF/

CIR – http://cactus.nci.nih.gov/chemical/structure

RDA – http://rd-alliance.org