Liberating Laboratory Data - AnIML

Post on 02-Jul-2015


Presentation on the use of the Analytical Information Markup Language (AnIML) to store and access scientific data. Presented online as part of the Dial-a-molecule 'Liberating Laboratory Data' event (http://www.dial-a-molecule.org/wp/events-listing/liberating-laboratory-data/)

The Analytical Information Markup Language (AnIML) for instrument data storage and access

Stuart J. Chalk

Department of Chemistry

University of North Florida

Jacksonville, FL USA

schalk@unf.edu

Liberating Laboratory Data – Day 1

What does this mean?

In a vendor, platform, and language independent format

Archivable, Authenticated, Provenanced

Datatyped, Qualified (accuracy/precision for numeric data)

Contextualized – annotated with descriptive metadata

Uniquely Referenceable – URI, DOI

Shareable, Searchable, Readable by computers and humans

Liberating Laboratory Data

AnIML is an activity under ASTM subcommittee E13.15 on Analytical Data (http://animl.sourceforge.net/)

Work on AnIML began in 2003

Designed as a replacement for JCAMP-DX (backwards compatible).

Charter: "Develop an analytical data standard that can be used to store data from any analytical instrument"

Task group holds virtual meetings on a monthly basis to develop the specification

Targeted to go through ASTM balloting in 2014

AnIML History

AnIML Schema Structure

AnIML Structure

The “Series” element is used to store arrays of data

Can contain many x/y spectra in one data file (good for LC-UV/MS data, for instance)

Also used for the chromatogram (time slice) data

Autoincrement Value Set

Typically used for evenly distributed data (e.g. x-axis)

Individual Value Set

Typically used for y-axis data

Encoded Value Set

Base64 encoded binary data (per XML specification)

AnIML Data Structures
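The three value sets above can be read with a few lines of code. What follows is a minimal sketch in Python, not part of the original presentation. The XML fragment is a hypothetical, simplified example: the element names follow the AnIML core schema as best understood here, the namespace is omitted, and the helper names `read_series` and `read_series_set` are made up for illustration. The little-endian 32-bit float layout for the encoded value set is also an assumption.

```python
import base64
import struct
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: real AnIML files carry a namespace
# and considerably more metadata than shown here.
SAMPLE = """
<SeriesSet name="Spectrum" length="4">
  <Series name="Wavelength" seriesType="Float32">
    <AutoIncrementedValueSet>
      <StartValue><F>200.0</F></StartValue>
      <Increment><F>0.5</F></Increment>
    </AutoIncrementedValueSet>
  </Series>
  <Series name="Absorbance" seriesType="Float32">
    <IndividualValueSet>
      <F>0.3720</F><F>0.3503</F><F>0.5042</F><F>0.0130</F>
    </IndividualValueSet>
  </Series>
</SeriesSet>
"""


def read_series(series, length):
    """Return the values of one Series element as a list of floats."""
    auto = series.find("AutoIncrementedValueSet")
    if auto is not None:
        # Evenly spaced data: only the start value and the step are stored.
        start = float(auto.findtext("StartValue/F"))
        step = float(auto.findtext("Increment/F"))
        return [start + i * step for i in range(length)]

    individual = series.find("IndividualValueSet")
    if individual is not None:
        # Every value is stored explicitly.
        return [float(f.text) for f in individual.findall("F")]

    encoded = series.find("EncodedValueSet")
    if encoded is not None:
        # Assumption: Base64 text wrapping little-endian 32-bit floats.
        raw = base64.b64decode(encoded.text.strip())
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))

    raise ValueError("no value set found in Series")


def read_series_set(xml_text):
    """Map each Series name in a SeriesSet to its list of values."""
    root = ET.fromstring(xml_text)
    length = int(root.get("length"))
    return {s.get("name"): read_series(s, length) for s in root.findall("Series")}


if __name__ == "__main__":
    for name, values in read_series_set(SAMPLE).items():
        print(name, values)
```

The auto-incremented set shows why it suits an x-axis: four wavelengths are reconstructed from just two stored numbers.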

AnIML Data Example

Embedding AnIML in Other XML Specifications

Because AnIML is XML, it can leverage a variety of existing tools and technologies

Data in AnIML files can be made accessible by using

eXtensible Stylesheet Language (XSL) transformations -> to convert data into different formats -> to process data into results

XPath -> to provide unique identifiers/references to data points or data sets

XQuery -> to search for particular data within a dataset

Publishing AnIML Stored Data

eXtensible Stylesheet Language (XSL) is an XML standard for converting XML-encoded data to other formats

E.g. HTML, PDF, JavaScript Object Notation (JSON), or even graphics

Scalable Vector Graphics (SVG) is (another!) XML specification for vector graphics

So we can use an XSL Transformation (XSLT) processor (e.g. Saxon) to convert data stored in AnIML to a graphic representation of the data

XSLT

An XML file that extracts data from another XML document and formats it based on specifications

Returning data in JSON format

{"data":[200.0:.3720,200.5:.3503,201.0:.5042,201.5:.0130, …]}

XSLT: AnIML -> SVG
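To give a feel for what an AnIML-to-SVG conversion has to do, here is a minimal sketch in Python rather than XSLT. It is not the stylesheet from the presentation. The function name `series_to_svg`, the canvas size, and the styling are assumptions chosen for illustration. It scales the x/y values to the canvas and writes them out as a single SVG polyline.

```python
import xml.etree.ElementTree as ET


def series_to_svg(xs, ys, width=400, height=200):
    """Render paired x/y values as an SVG polyline and return the markup."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero span when all values on an axis are identical.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0

    points = []
    for x, y in zip(xs, ys):
        px = (x - x_min) / x_span * width
        # SVG's y-axis grows downward, so flip it to put larger values higher.
        py = height - (y - y_min) / y_span * height
        points.append(f"{px:.1f},{py:.1f}")

    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(width),
        "height": str(height),
    })
    ET.SubElement(svg, "polyline", {
        "points": " ".join(points),
        "fill": "none",
        "stroke": "black",
    })
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    print(series_to_svg([200.0, 200.5, 201.0, 201.5],
                        [0.3720, 0.3503, 0.5042, 0.0130]))
```

An XSLT stylesheet would express the same mapping declaratively, with templates matching the series elements and emitting the SVG elements.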

XPath

https://eureka.coas.unf.edu/data/source/exptml:dat1/data/xml

XPath

https://eureka.coas.unf.edu/data/source/exptml:dat1/a:AnIML/xml

XPath

https://eureka.coas.unf.edu/data/source/exptml:dat1/a:Result[@name=‘Spectrum’]/xml

XPath

https://eureka.coas.unf.edu/data/source/exptml:dat1/a:Series[@name=‘Absorbance’]_a:IndividualValueSet_a:F[3]/xml
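The path expressions embedded in the URLs above can be tried locally. The sketch below uses the limited XPath subset in Python's standard library and is an illustration only. The document is a hypothetical, much-reduced AnIML-like file, the namespace URI bound to the `a:` prefix is an assumption, and the underscores in the last URL are taken to stand in for path separators.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the AnIML core schema, bound to the "a" prefix.
NS = {"a": "urn:org:astm:animl:schema:core:draft:0.90"}

# Hypothetical, much-reduced document; real files nest results more deeply.
DOC = """
<AnIML xmlns="urn:org:astm:animl:schema:core:draft:0.90">
  <Result name="Spectrum">
    <SeriesSet name="Spectrum" length="4">
      <Series name="Absorbance" seriesType="Float32">
        <IndividualValueSet>
          <F>0.3720</F><F>0.3503</F><F>0.5042</F><F>0.0130</F>
        </IndividualValueSet>
      </Series>
    </SeriesSet>
  </Result>
</AnIML>
"""

root = ET.fromstring(DOC)

# Select a whole result by name, as in the third URL.
result = root.find(".//a:Result[@name='Spectrum']", NS)

# Select the third data point of the Absorbance series, as in the fourth URL.
third = root.find(
    ".//a:Series[@name='Absorbance']/a:IndividualValueSet/a:F[3]", NS
)

if __name__ == "__main__":
    print(result.get("name"), third.text)
```

Because each expression resolves to one specific node, the same string can serve as a stable reference to a data set or to a single data point.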

XQuery

AnIML being an XML specification makes it easily readable, archivable, and searchable

The data within an AnIML file can easily be extracted, manipulated and repurposed

With the development of additional XML technologies the options for using and sharing AnIML data will only increase over time

Conclusion