Liberating Laboratory Data - AnIML

The Analytical Information Markup Language (AnIML) for instrument data storage and access Stuart J. Chalk Department of Chemistry University of North Florida Jacksonville, FL USA [email protected] Liberating Laboratory Data Day 1

description

Presentation on the use of the Analytical Information Markup Language (AnIML) to store and access scientific data. Presented online as part of the Dial-a-molecule 'Liberating Laboratory Data' event (http://www.dial-a-molecule.org/wp/events-listing/liberating-laboratory-data/)

Transcript of Liberating Laboratory Data - AnIML

Page 1: Liberating Laboratory Data - AnIML

The Analytical Information Markup Language (AnIML) for

instrument data storage and access

Stuart J. Chalk

Department of Chemistry

University of North Florida

Jacksonville, FL USA

[email protected]

Liberating Laboratory Data – Day 1

Page 2: Liberating Laboratory Data - AnIML

What does this mean?

In a vendor-, platform-, and language-independent format

Archivable, Authenticated, Provenanced

Datatyped, Qualified (accuracy/precision for numeric data)

Contextualized – annotated with descriptive metadata

Uniquely Referenceable – URI, DOI

Shareable, Searchable, Readable by computers and humans

Liberating Laboratory Data

Page 3: Liberating Laboratory Data - AnIML

AnIML is an activity under ASTM subcommittee E13.15 on Analytical Data (http://animl.sourceforge.net/)

Work on AnIML began in 2003

Designed as a replacement for JCAMP-DX (backwards compatible).

Charter: "Develop an analytical data standard that can be used to store data from any analytical instrument"

Task group holds virtual meetings on a monthly basis to develop the specification

Targeted to go through ASTM balloting in 2014

AnIML History

Page 4: Liberating Laboratory Data - AnIML

AnIML Schema Structure

Page 5: Liberating Laboratory Data - AnIML

AnIML Structure

Page 6: Liberating Laboratory Data - AnIML

The “Series” element is used to store arrays of data

Can contain many x/y spectra in one data file (good for LC-UV/MS data, for instance)

Also used for the chromatogram (time slice) data

Autoincrement Value Set

Typically used for evenly distributed data (e.g. x-axis)

Individual Value Set

Typically used for y-axis data

Encoded Value Set

Base64 encoded binary data (per XML specification)

AnIML Data Structures
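As a rough illustration of the three value-set styles, the sketch below reads each one back into a plain list of numbers. It is a minimal sketch under assumptions, not the AnIML reference implementation: the element names (SeriesSet, AutoIncrementedValueSet, StartValue, Increment, EncodedValueSet) follow my reading of the AnIML core schema, the namespace and most attributes are omitted, the numbers are made up, and the encoded values are assumed to be little-endian 32-bit floats.

```python
import base64
import struct
import xml.etree.ElementTree as ET

# Assumption: encoded values are little-endian 32-bit floats.
encoded = base64.b64encode(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)).decode("ascii")

# Hypothetical, simplified fragment (no namespace, made-up values).
FRAGMENT = """
<SeriesSet name="Spectrum" length="4">
  <Series name="Wavelength">
    <AutoIncrementedValueSet>
      <StartValue><F>200.0</F></StartValue>
      <Increment><F>0.5</F></Increment>
    </AutoIncrementedValueSet>
  </Series>
  <Series name="Absorbance">
    <IndividualValueSet>
      <F>0.25</F><F>0.5</F><F>0.75</F><F>1.0</F>
    </IndividualValueSet>
  </Series>
  <Series name="Intensity">
    <EncodedValueSet>{encoded}</EncodedValueSet>
  </Series>
</SeriesSet>
""".format(encoded=encoded)


def read_series(series, length):
    """Return the values of one Series element as a list of floats."""
    auto = series.find("AutoIncrementedValueSet")
    if auto is not None:
        # Evenly spaced data: only the start value and the step are stored.
        start = float(auto.find("StartValue")[0].text)
        step = float(auto.find("Increment")[0].text)
        return [start + i * step for i in range(length)]
    individual = series.find("IndividualValueSet")
    if individual is not None:
        # One child element per stored value.
        return [float(value.text) for value in individual]
    # Otherwise decode the Base64 payload into 4-byte floats.
    raw = base64.b64decode(series.find("EncodedValueSet").text.strip())
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


root = ET.fromstring(FRAGMENT)
length = int(root.get("length"))
data = {s.get("name"): read_series(s, length) for s in root.iter("Series")}
```

Here `data` maps each series name to its decoded values, so the x-axis never has to be written out point by point.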

Page 7: Liberating Laboratory Data - AnIML

AnIML Data Example

Page 8: Liberating Laboratory Data - AnIML

AnIML Data Example

Page 9: Liberating Laboratory Data - AnIML

AnIML Data Example

Page 10: Liberating Laboratory Data - AnIML

Embedding AnIML in Other XML Specifications

Page 11: Liberating Laboratory Data - AnIML

AnIML being XML leverages a variety of tools and technologies

Making data in AnIML files accessible can be achieved by using

eXtensible Stylesheet Language (XSL) transformations -> convert data into different formats, process data into results

XPath -> provide unique identifiers/references to data points or data sets

XQuery -> search for particular data within a dataset

Publishing AnIML Stored Data

Page 12: Liberating Laboratory Data - AnIML

eXtensible Stylesheet Language (XSL) is an XML standard for conversion of XML-encoded data to other formats

E.g. HTML, PDF, JavaScript Object Notation (JSON), or even graphics

Scalable Vector Graphics (SVG) is (another!) XML specification for vector graphics

So we can use an XSL Transformation (XSLT) processor (e.g. Saxon) to convert data stored in AnIML to a graphic representation of the data

XSLT

Page 13: Liberating Laboratory Data - AnIML

XSLT

An XML file that extracts data from another XML document and formats it based on specifications

Returning data in JSON format

{"data":[200.0:.3720,200.5:.3503,201.0:.5042,201.5:.0130, …]}
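The same kind of extraction can be sketched outside XSLT. The example below is an assumption-laden illustration in Python's standard library rather than the stylesheet used in the talk: the AnIML fragment is simplified (no namespace, element names per my reading of the core schema), the four absorbance values are taken from the sample output above, and the helper name `series_to_json` is hypothetical. Strict JSON parsers would not accept `x:y` pairs inside an array, so this sketch emits `[x, y]` pairs instead.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified AnIML fragment (namespace omitted).
ANIML = """
<SeriesSet name="Spectrum" length="4">
  <Series name="Wavelength">
    <AutoIncrementedValueSet>
      <StartValue><F>200.0</F></StartValue>
      <Increment><F>0.5</F></Increment>
    </AutoIncrementedValueSet>
  </Series>
  <Series name="Absorbance">
    <IndividualValueSet>
      <F>.3720</F><F>.3503</F><F>.5042</F><F>.0130</F>
    </IndividualValueSet>
  </Series>
</SeriesSet>
"""


def series_to_json(xml_text):
    """Pair each generated x value with its stored y value and dump as JSON."""
    root = ET.fromstring(xml_text)
    count = int(root.get("length"))
    x_set = root.find("Series[@name='Wavelength']/AutoIncrementedValueSet")
    start = float(x_set.find("StartValue/F").text)
    step = float(x_set.find("Increment/F").text)
    y_path = "Series[@name='Absorbance']/IndividualValueSet/F"
    ys = [float(f.text) for f in root.findall(y_path)]
    # zip stops at the shorter sequence if length and value count disagree.
    pairs = [[start + i * step, y] for i, y in zip(range(count), ys)]
    return json.dumps({"data": pairs})


json_text = series_to_json(ANIML)
```

An XSLT stylesheet would express the same selection declaratively; the Python version simply makes the pairing of x and y values explicit.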

Page 14: Liberating Laboratory Data - AnIML

XSLT

An XML file that extracts data from another XML document and formats it based on specifications

Returning data in JSON format

Page 15: Liberating Laboratory Data - AnIML

XSLT: AnIML -> SVG
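To make the AnIML-to-SVG idea concrete, here is a minimal sketch of the scaling and drawing step. It is written in Python rather than XSLT, so it is a stand-in for the transformation shown in the talk, not a copy of it. The function name `spectrum_to_svg`, the canvas size, and the styling are assumptions; the sample x/y values are the four points from the JSON sample earlier in the deck.

```python
import xml.etree.ElementTree as ET


def spectrum_to_svg(xs, ys, width=400, height=200):
    """Scale x/y data to the canvas and return an SVG polyline as a string."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1.0  # avoid division by zero for flat data
    y_span = (y_max - y_min) or 1.0

    def sx(x):
        return (x - x_min) / x_span * width

    def sy(y):
        # SVG y coordinates grow downwards, so flip the axis.
        return height - (y - y_min) / y_span * height

    points = " ".join(f"{sx(x):.1f},{sy(y):.1f}" for x, y in zip(xs, ys))
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(width),
        "height": str(height),
    })
    ET.SubElement(svg, "polyline",
                  {"points": points, "fill": "none", "stroke": "black"})
    return ET.tostring(svg, encoding="unicode")


svg_text = spectrum_to_svg([200.0, 200.5, 201.0, 201.5],
                           [0.3720, 0.3503, 0.5042, 0.0130])
```

Because SVG is itself XML, the output can be written to a `.svg` file or embedded directly in an HTML page.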

Page 16: Liberating Laboratory Data - AnIML

XPath

https://eureka.coas.unf.edu/data/source/exptml:dat1/data/xml

Page 17: Liberating Laboratory Data - AnIML

XPath

https://eureka.coas.unf.edu/data/source/exptml:dat1/a:AnIML/xml

Page 18: Liberating Laboratory Data - AnIML

XPath

https://eureka.coas.unf.edu/data/source/exptml:dat1/a:Result[@name='Spectrum']/xml

Page 19: Liberating Laboratory Data - AnIML

XPath

https://eureka.coas.unf.edu/data/source/exptml:dat1/a:Series[@name='Absorbance']_a:IndividualValueSet_a:F[3]/xml
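The XPath expressions carried in these URLs can be tried locally. The sketch below uses the XPath subset in Python's standard library against a hypothetical, heavily trimmed AnIML document; it does not call the server above. Several things are assumptions on my part: the namespace URI bound to the `a:` prefix, the surrounding element nesting, the reading of `_` in the URL as a stand-in for the XPath `/` separator, and the helper name `url_tail_to_xpath`.

```python
import xml.etree.ElementTree as ET

# Assumed AnIML core namespace, bound to the "a" prefix used in the URLs.
NS = {"a": "urn:org:astm:animl:schema:core:draft:0.90"}

# Hypothetical, heavily trimmed document; values reuse the JSON sample.
DOC = """
<AnIML xmlns="urn:org:astm:animl:schema:core:draft:0.90">
  <ExperimentStepSet>
    <ExperimentStep name="UV-Vis scan">
      <Result name="Spectrum">
        <SeriesSet name="Spectrum" length="4">
          <Series name="Absorbance">
            <IndividualValueSet>
              <F>.3720</F><F>.3503</F><F>.5042</F><F>.0130</F>
            </IndividualValueSet>
          </Series>
        </SeriesSet>
      </Result>
    </ExperimentStep>
  </ExperimentStepSet>
</AnIML>
"""


def url_tail_to_xpath(tail):
    """Turn the URL form (steps joined by '_') into a descendant XPath.

    Assumption: '_' only ever separates steps, so names or attribute
    values that contain an underscore would break this simple mapping.
    """
    return ".//" + tail.replace("_", "/")


root = ET.fromstring(DOC)

# A whole result block, addressed by its name attribute.
result = root.find(".//a:Result[@name='Spectrum']", NS)

# A single data point: the third F value of the Absorbance series.
tail = "a:Series[@name='Absorbance']_a:IndividualValueSet_a:F[3]"
third = root.find(url_tail_to_xpath(tail), NS)
```

XPath positions are 1-based, so `a:F[3]` selects the third stored value, which gives every data point in the file a stable address.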

Page 20: Liberating Laboratory Data - AnIML

XQuery

Page 21: Liberating Laboratory Data - AnIML

AnIML being an XML specification makes it easily readable, archivable, and searchable

The data within an AnIML file can easily be extracted, manipulated and repurposed

With the development of additional XML technologies the options for using and sharing AnIML data will only increase over time

Conclusion