Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – the Allotrope...

Page 1: Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – the Allotrope Perspective

SEMANTiCS 2015, Industry, Vienna, 16-17 September

Semantics for Integrated Analytical Laboratory Processes

The Allotrope Perspective

Heiner Oberkampf

slide 2

Agenda

Introduction

Approach and IT-Solution

Allotrope Data Format

Domain Taxonomies

Data Cube Ontology

Integration Projects

slide 3

Laboratory Analytical Processes

[Diagram: sample, analytical process, data]

slide 4

High Variability of Result Data

Example techniques: chromatography, pH, thermogravimetry, HPLC-MS-MS, mass spectrometry, HPLC-MS, cell counter, NMR

slide 5

Laboratory Analytical Processes

[Diagram: application 1, application 2, application 3; result data and process metadata]

slide 6

Common Problems

• It’s hard to find data based on intuitive starting points (e.g. study, project, analyst, technique).

• It’s hard to integrate data from different labs, instruments, or online/offline sources because the file formats differ.

• It’s hard to mine a collection of data because the details and the context of the experiment are stored somewhere else.

• Data can’t be interpreted later because the context is incomplete, inconsistent, and often free text.

• Instrument and software interoperability is limited… at best.

slide 7

Landscape of Existing Standards

"The nice thing about standards is that

there are so many to choose from."

Andrew S. Tanenbaum

DISCLAIMER

This is work in progress.

It is not a complete list of standards but a tool for researching them.

Allotrope is investigating numerous standards, but this graphic is not intended to represent the standards Allotrope is committing to include in the framework.

[Figure: landscape of existing standards, with a legend of standards bodies: ISO, W3C, IETF, OASIS, OMG, LC, CDISC, NISO, OAI. The standards shown, with version and year where the graphic gives them, are listed below.]

ISO and IEC standards: ISO 11179 (Metadata Registry), 1999; ISO 19763 (Metamodel Interoperability), 2013; ISO 25964 (Thesauri); ISO 12200 (MARTIF); ISO 19773 (Metadata Registry Modules); ISO 1087 (Terminology Vocabulary), 2000; ISO 11404 (General Purpose Datatypes), 2007; ISO 20944 (MDRIB), 2013; ISO 2382 (IT Vocabulary), 1996-2000; ISO 9899 (Programming Languages C), 1999; ISO 9945 (Filenames); ISO 10646 (Unicode); ISO 646 (IA5 character code); ISO 19107 (Geographic Information); ISO 16684-1 (XMP), 2012; ISO 639 (Language Codes); ISO 3166 (Country Codes); ISO 15000-3 (ebRIM), 2004; ISO 19115 (Geographic Information Metadata); ISO 19119 (Geographic Information Services), 2005; ISO 25577 (Information and Documentation - MarcXchange); ISO 20775 (Information and Documentation - Schema for Holdings Information); ISO 704 (Terminology - Principles and methods), 2000; ISO 19504 (Common Warehouse Metamodel); ISO 6093 (Number Namespace); ISO 15000-4 (ebRS), 2004; ISO 15489 (Records Management), 2001; ISO 23081 (Metadata for records), 2006; ISO 16363 (Audit and Certification of Trustworthy Digital Repositories), 2011; ISO 14721 (OAIS), 2012; ISO 15836 (DCMES); ISO 26324 (Digital Object Identifier), 2012; ISO 13120 (ClaML), 2013; ISO 27951 (CTS1), 2009; ISO 27527 (Provider Identification), 2010; ISO 27932 (HL7 Clinical Document Architecture), 2009; ISO 27931 (HL7), 2009; ISO 17115 (Vocabulary for terminological systems), 2007; ISO 20652 (PAIMAS), 2006; ISO 2709 (Format for information exchange), 2008; ISO 14533-2 (XAdES), 2012; ISO 14662 (Open-edi), 2010; ISO 15000-5 (CCTS), 2005; ISO 8601 (Dates and Times), 2000; IEC 62264 (B2MML), 2003-2008; ISO 21000-2 (MPEG-21 DID), 2005; ISO 21000-6 (MPEG-21 RDD), 2004; ISO 21000-7 (MPEG-21 DIA), 2007; ISO 21000-9 (MPEG-21 File format), 2005; ISO 21000-18 (MPEG-21 Streaming), 2007; ISO 14496-12 (base media file format), 2012; ISO 21000-3 (MPEG-21 DII), 2003; ISO 15444-1 (JPEG2000), 2004; ISO 21090 (Health informatics data types); ISO 19005-1b (PDF/A); ISO 19510 (BPMN 2.0), 2013; ISO 28500 (WARC), 2009; ISO 17025 (Competence of laboratories), 2005.

IETF RFCs: RFC 2421 (Voice Profile) v2, 1998; RFC 3986 (URI), 2005; RFC 2046 (MIME Types); RFC 3066 (Language Codes); RFC 2119 (Requirement Keywords), 1997; RFC 2616 (HTTP) v1.1, 1999; RFC 3023 (XML Media Types), 2001; RFC 2045 (MIME Format); RFC 4287 (Atom Syndication); RFC 5023 (Atom Publishing); RFC 4918 (WebDAV); RFC 3652 (Handle System Protocol) v2.1, 2003; RFC 3650 (Handle System Overview), 2003; RFC 3651 (Handle System Namespace and Service Definition), 2003; RFC 2141 (URN Syntax), 1997; RFC 1737 (URN Requirements), 1994; RFC 4122 (UUID URN Namespace), 2005; RFC 2234 (ABNF), 1997; RFC 3987 (IRI), 2005; RFC 3305 (URI, URL, URN Clarifications), 2002; RFC 2396 (URI), 1998; RFC 6381 (Codecs), 2011; RFC 3629 (UTF-8), 2003.

Other specifications: UN/CEFACT Core Components Technical Specification v3.0; Batch ML; OWL v2.0; RDF v1.0; SKOS, 2012; Common Warehouse Metamodel v1.1, 2003; Common Terminology Services 2 v1.1, 2013; Unified Modeling Language v2.4.1, 2012; AnIML v2.0; HL7; UPU S42-1 (Postal address components), 2003; ebXML Registry Information Model v3.0, 2005; ebXML Registry Services Specification v2.0, 2001; genericode v1.0, 2007; CMIS v1.1, 2012; XML Schema Datatypes, 2004; OData v4.0; ebXML RegRep v4.0, 2012; XPath 2.0, 2007; XMLDSig, 2001; XLink 1.1, 1999; SOAP 1.2, 2003; MARC 21 XML Schema v1.2, 2009; MIX v2.0, 2006; PREMIS v2.2, 2012; Metadata Object Description Standard v3.5, 2013; Metadata Authority Description Standard v2.0, 2012; searchRetrieve v1.0, 2013; Search/Retrieval via URL v2.0; Contextual Query Language v1.2; Dublin Core Metadata Element Set v1.1; Encoded Archival Description (EAD), 2002; Text Encoding Initiative; DDI Codebook v2.5; OAI Protocol for Metadata Harvesting v2.0, 2002; OAI Object Reuse and Exchange v1.0, 2008; SPARQL v1.1, 2013; Statistical Data and Metadata Exchange v2.1, 2011; Common Metadata Framework; DDI Lifecycle v3.1; EDIFACT; Meta Object Facility v1.4.1, 2005; Ontology Definition Metamodel v1.0, 2009; Information Management Metamodel; UML Profile & Metamodel for Services v1.0.1, 2012; Semantics of Business Vocabulary and Business Rules v1.2, 2013; Metadata Encoding & Transmission Standard v1.10, 2013; SWORD v2.0, 2008; BagIt; ARK Identifiers; LMER v1.2; IMS Content Packaging v1.2; Z39.50 (Information Retrieval) v4, 2003; MARC 21; FOAF Vocabulary v0.99, 2014; RDF Best Practices; Cool URIs; RDF Vocabulary Description Language v1.0, 2004; Extensible Resource Identifier v2.0, 2005; XRI Data Interchange v2.0, 2005; Canonical XML v1.0, 2001; Universal Business Language v2.1, 2013; Z39.88 (OpenURL) v1, 2004; Z39.85 (DCMES) v1, 2001; ISA 95, 2001-2005; ISA 88; TIFF v6.0, 1992; JPEG; UnitsML v1.0, 2011; hData v1.0, 2013; RLUS v1.0.1, 2011; LECIS v1.0, 2003; XDS; SVS; XUA; SAML v2.0, 2008; XACML v3.0, 2013; ASTM E1986 (Access Privileges to Health Info), 2013; ASTM E1869 (Confidentiality, Privacy, Access and Data Security), 2010; CDA v2, 2008; BPMN v2.0.1, 2011; BRIDG v3.2; Define-XML v2.0, 2013; ADaM v2.1; SDM-XML v1.0; CDISC-ODM v1.3.2; SEND v3.0; LAB v1.0.1.

Other organizations shown: Allotrope Foundation, ASTM, HL7, MESA, UPU, Adobe, UKOLN, UNECE, DDI Alliance, UNSC, Dublin Core Metadata Initiative, JISC, DNB, IMS Global, FOAF Project, ANSI, NIST, IHE, SAA, JPEG.

slide 8

Allotrope Data Format

slide 9

Allotrope Foundation

Member Companies

• Subject Matter Experts

• Project Funding

Secretariat

• Project Management

• Legal & Logistical Support

Professional Software Firm

• Framework Development

• Technical Leadership

Partner Network

• Requirements & Specifications

• Contributions, PoC Applications

slide 10

Allotrope Foundation

Member Companies: AbbVie, Amgen, Baxter, Bayer, Biogen, Boehringer Ingelheim, Bristol-Myers Squibb, Eli Lilly, Genentech/Roche, GlaxoSmithKline, Merck & Co, Pfizer

Partner Network: ACD/Labs, Agilent, BIOVIA, BSSN, Erasmus MC, IDBS, Mestrelab Research, Mettler Toledo, Persistent, Riffyn, Sartorius, Shimadzu, Thermo Scientific, Univ. Southampton, Waters

slide 11

Allotrope Data Format (ADF)

Allotrope Data Format, built on HDF5:

• Data Description (RDF model), containing: method, instrument, sample, process, result, etc.; data cube metadata; binary file metadata; …

• Data Cubes (universal data container): analytical data represented by one- or multidimensional arrays.

• Data Package (virtual file system; use is optional): analytical data represented by arbitrary formats, incl. native instrument formats, images, PDF, video, etc.

• HDF5 (platform-independent file format): specifically designed to store and organize large amounts of numerical data.
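To make the three-part layout concrete, here is a minimal in-memory sketch. It assumes nothing about the real ADF API: the class `AdfLikeContainer`, its methods and the `ex:` identifiers are hypothetical, and everything ADF would write into one HDF5 file is kept in plain Python collections.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the three logical parts of an ADF file.
# A real ADF file keeps all of this inside a single HDF5 container.
Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class AdfLikeContainer:
    # Data Description: RDF-style statements about method, sample, result, ...
    data_description: set[Triple] = field(default_factory=set)
    # Data Cubes: named one- or multidimensional numeric arrays.
    data_cubes: dict[str, list] = field(default_factory=dict)
    # Data Package: optional virtual file system, path -> raw bytes.
    data_package: dict[str, bytes] = field(default_factory=dict)

    def describe(self, subject: str, predicate: str, obj: str) -> None:
        self.data_description.add((subject, predicate, obj))

    def add_cube(self, name: str, values: list) -> None:
        self.data_cubes[name] = values

    def add_file(self, path: str, content: bytes) -> None:
        self.data_package[path] = content


adf = AdfLikeContainer()
adf.describe("ex:run1", "ex:usedInstrument", "ex:hplc1")
adf.add_cube("ex:chromatogram1", [[0.5, 120.0], [1.0, 340.0]])  # made-up values
adf.add_file("/native/run1.raw", b"\x00\x01")  # e.g. a native instrument file
```

Keeping the parts separate mirrors the slide: descriptions are handled as a graph, cubes as arrays, and the package simply carries files.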

slide 12

API Stack

The Allotrope Framework provides APIs to read and write data contained in ADF.

Developers do not have to concern themselves with RDF, SPARQL, semantics or complex graph patterns.

[Diagram: API stack comprising the Analytical Data API; the Data Package API, Data Cube API and Data Description API (Apache Jena); the Triple Store API; Taxonomies; and the platform-independent file format (HDF5)]
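The point that developers need not write RDF or SPARQL can be illustrated with a small facade. This is a hypothetical sketch, not the Allotrope Framework API: `SampleApi` stores RDF-style triples internally, and its lookup method does by hand what a SPARQL basic graph pattern would do.

```python
class SampleApi:
    """Hypothetical facade: callers pass plain strings, triples stay internal."""

    def __init__(self) -> None:
        self._triples: set[tuple[str, str, str]] = set()

    def add_sample(self, sample_id: str, label: str) -> None:
        iri = f"ex:sample/{sample_id}"  # hypothetical IRI scheme
        self._triples.add((iri, "rdf:type", "ex:Sample"))
        self._triples.add((iri, "rdfs:label", label))

    def sample_labels(self) -> dict[str, str]:
        # Hand-written equivalent of the graph pattern
        #   ?s rdf:type ex:Sample . ?s rdfs:label ?label .
        samples = {s for s, p, o in self._triples if p == "rdf:type" and o == "ex:Sample"}
        return {s: o for s, p, o in self._triples if s in samples and p == "rdfs:label"}


api = SampleApi()
api.add_sample("S-001", "reference standard")
print(api.sample_labels())  # {'ex:sample/S-001': 'reference standard'}
```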

slide 13

Allotrope Foundation Taxonomies (AFT)

slide 14

Scope and Current Status

Implemented analytical techniques:

Small molecules: gas chromatography, Karl Fischer, liquid chromatography, mass spectrometry, nuclear magnetic resonance spectroscopy, thermogravimetric analysis, ultraviolet spectroscopy

Large molecules: capillary electrophoresis, cell counter, cell culture analyzer, blood gas analysis

Both: balance, pH

[Chart: number of classes; values shown: 562, 168, 2272, 283]

slide 15

Reused Vocabularies and Ontologies

Used:

RDFS, OWL, SKOS

Shapes Constraint Language (SHACL)

Directly imported:

Quantities, Units, Dimensions and Data Types Ontologies (QUDT)

The W3C RDF Data Cube Vocabulary (QB)

Partly reused definitions:

Chemical Methods Ontology (CHMO)

Proteomics Standards Initiative – Mass Spectrometry (PSI-MS)

International Union of Pure and Applied Chemistry (IUPAC)

slide 16

Analytical Workflow

slide 17

Analytical Workflow

The basic analytical workflow and data flow are being standardized.

slide 18

Process

slide 19

Result

n-dimensional result data is represented through a qb:DataSet.

slide 20

Example: Mass Spectrum

Data set of rank 2.

Additional dimensions:

• sample

• retention time

• device

• …

Metadata is expressed in RDF.

Numeric data is natively represented in HDF5.

[Figure: mass spectrum with axes mass and intensity, annotated with af-m:AFM_0000350 and af-r:AFR_0000495]
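The split between RDF metadata and HDF5 numbers can be sketched as follows, assuming hypothetical `ex:` IRIs in place of the Allotrope taxonomy terms. The Turtle text uses real W3C RDF Data Cube terms (`qb:DataSet`, `qb:DataStructureDefinition`, `qb:dimension`, `qb:measure`) to describe a spectrum with a mass dimension and an intensity measure; the peak values are made up and stay in an ordinary Python list, standing in for an HDF5 array.

```python
# Hypothetical ex: IRIs stand in for Allotrope taxonomy terms; qb: terms are
# from the W3C RDF Data Cube Vocabulary.
PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix ex: <http://example.org/> .\n"
)


def spectrum_description(dataset: str) -> str:
    """Return Turtle describing the structure (not the values) of a spectrum."""
    return PREFIXES + f"""
ex:mass a qb:DimensionProperty .
ex:intensity a qb:MeasureProperty .

ex:spectrumStructure a qb:DataStructureDefinition ;
    qb:component [ qb:dimension ex:mass ] ,
                 [ qb:measure ex:intensity ] .

ex:{dataset} a qb:DataSet ;
    qb:structure ex:spectrumStructure .
"""


# Made-up (mass, intensity) pairs; in ADF the numbers would live in HDF5.
peaks = [(101.1, 2300.0), (149.0, 870.5), (205.2, 15400.0)]

turtle = spectrum_description("spectrum1")
print(turtle)
print(f"{len(peaks)} peaks kept as an array, outside the RDF")
```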

slide 21

ADF Data Cube Ontology

• ADF Data Cube API: create and access data cubes.

• ADF Data Cube Ontology: extends the RDF Data Cube Vocabulary with scales, slabs, order functions and complex data types.

• RDF Data Cube Vocabulary

• ADF-HDF5 Mapping: maps between the RDF metadata descriptions and the description of the physical storage in HDF5.

• HDF5 Ontology: vocabulary of HDF5 entities and data types.

• HDF5: platform-independent file format.

slide 22

ADF Data Cube Ontology

[Diagram: W3C RDF Data Cube Vocabulary; HDF5 Ontology; W3C RDF, OWL, SHACL; ADF Data Cube Ontology; ADF-HDF5 Mapping]

slide 23

ADF Data Cube Ontology

Data Slabs:

Selections on Components
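A slab can be pictured as one index range per dimension, much like an HDF5 hyperslab. The sketch below assumes a cube held as nested Python lists and uses a hypothetical `select_slab` helper; it is not the ADF Data Cube API.

```python
def select_slab(cube: list, ranges: list[tuple[int, int]]) -> list:
    """Select a sub-array from a nested-list cube.

    ranges[i] is a half-open (start, stop) index pair for dimension i,
    similar in spirit to an HDF5 hyperslab with stride 1.
    """
    (start, stop), *rest = ranges
    part = cube[start:stop]
    if not rest:
        return part
    return [select_slab(row, rest) for row in part]


# 3 retention-time steps x 4 mass channels (illustrative values only)
cube = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
print(select_slab(cube, [(1, 3), (0, 2)]))  # [[5, 6], [9, 10]]
```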

slide 24

ADF Data Cube Ontology

Nominal Scale: sample, run …

Ordinal Scale: sample index, quality (++,+,o,-,--) ...

Interval Scale: temperature, date time …

Ratio Scale: mass, duration …
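The four scale types follow the classic levels of measurement, where each level permits more operations than the one before. The sketch encodes that rule as an assumption; the `Scale` enum and `allows` function are hypothetical and not taken from the ADF ontology.

```python
from enum import IntEnum


class Scale(IntEnum):
    NOMINAL = 1   # categories only, e.g. sample, run
    ORDINAL = 2   # ordered categories, e.g. sample index, quality grade
    INTERVAL = 3  # meaningful differences, arbitrary zero, e.g. temperature in deg C
    RATIO = 4     # true zero, ratios meaningful, e.g. mass, duration


# Lowest scale at which each operation is meaningful (levels-of-measurement rule).
REQUIRED_LEVEL = {
    "equality": Scale.NOMINAL,
    "ordering": Scale.ORDINAL,
    "difference": Scale.INTERVAL,
    "ratio": Scale.RATIO,
}


def allows(scale: Scale, operation: str) -> bool:
    return scale >= REQUIRED_LEVEL[operation]


print(allows(Scale.ORDINAL, "ordering"))  # True: "++" ranks above "+"
print(allows(Scale.INTERVAL, "ratio"))    # False: 20 deg C is not twice 10 deg C
print(allows(Scale.RATIO, "ratio"))       # True: 20 mg is twice 10 mg
```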

slide 25

ADF Data Cube Ontology

Order Functions:

Required for range selections
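Why an order function is needed shows up as soon as a dimension is ordinal: the quality labels from the scales slide do not sort correctly as strings. This sketch, with hypothetical names, passes an explicit order function to a range selection and falls back to the natural order for numeric values such as retention time.

```python
QUALITY_ORDER = ["--", "-", "o", "+", "++"]  # ascending quality


def quality_rank(value: str) -> int:
    """Order function for the ordinal quality scale."""
    return QUALITY_ORDER.index(value)


def select_range(coordinates, low, high, order=lambda v: v):
    """Return the indices whose coordinate lies in [low, high] under `order`."""
    lo, hi = order(low), order(high)
    return [i for i, v in enumerate(coordinates) if lo <= order(v) <= hi]


qualities = ["+", "--", "++", "o", "-"]
print(select_range(qualities, "o", "++", order=quality_rank))  # [0, 2, 3]

# Without the order function, string comparison is used: "o" sorts after "++",
# so the range is empty.
print(select_range(qualities, "o", "++"))  # []

retention_times = [0.5, 1.2, 2.8, 3.1]  # numeric values: natural order suffices
print(select_range(retention_times, 1.0, 3.0))  # [1, 2]
```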

slide 26

ADF Data Cube Ontology

HDF Mapping:

Required to map the data structure from the functional to the physical perspective.
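One way to picture the mapping is a lookup table from each logical component of a cube to where its values sit in the file. The component IRIs, HDF5 dataset paths and axis numbers below are all hypothetical; the real ADF-HDF5 mapping is expressed in RDF rather than in a Python dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhysicalLocation:
    dataset_path: str     # hypothetical HDF5 dataset holding the values
    axis: Optional[int]   # axis of the measure array a dimension indexes; None for a measure


# Functional view (cube components) -> physical view (HDF5 datasets).
MAPPING = {
    "ex:retentionTime": PhysicalLocation("/cubes/cube1/scales/retention_time", 0),
    "ex:mass": PhysicalLocation("/cubes/cube1/scales/mass", 1),
    "ex:intensity": PhysicalLocation("/cubes/cube1/measures/intensity", None),
}


def locate(component_iri: str) -> PhysicalLocation:
    """Resolve a logical component to its physical storage location."""
    if component_iri not in MAPPING:
        raise KeyError(f"no physical mapping for {component_iri}")
    return MAPPING[component_iri]


print(locate("ex:mass"))
```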

slide 27

ADF Data Cube Ontology

Complex Data Types:

Required mainly for measurements

slide 28

Complex Data Types

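Example values for a weight column, from simple to complex:

• weight (mg): 1020; 655 (unit in the column header)

• weight: 1.020 g; 655 mg (unit attached to each value)

• weight (mg): 1020 +/- 15; 655 +/- 12 (values with uncertainty)

• weight: tare: 25.3332 +/- 0.2 g; net: 20.219 +/- 0.2 g (structured value with tare and net parts)

Complex data types are expressed using the Shapes Constraint Language (SHACL).

https://w3c.github.io/data-shapes/shacl/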
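The weight examples can be modelled as a quantity type that carries its own unit and optional uncertainty, plus a structured tare/net weighing. This is a sketch under stated assumptions: the two-unit conversion table, the class names and the `conforms` check are hypothetical, and `conforms` only imitates the kind of required-field and datatype constraints a SHACL shape would express.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Assumed minimal unit table: factor to convert each unit into milligrams.
TO_MG = {"mg": 1.0, "g": 1000.0}


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str
    uncertainty: Optional[float] = None  # expressed in the same unit as value

    def to(self, unit: str) -> "Quantity":
        factor = TO_MG[self.unit] / TO_MG[unit]
        unc = None if self.uncertainty is None else self.uncertainty * factor
        return Quantity(self.value * factor, unit, unc)


@dataclass(frozen=True)
class Weighing:
    tare: Quantity
    net: Quantity


# Simplified stand-in for a SHACL shape: required keys and allowed Python types.
QUANTITY_SHAPE = {"value": (int, float), "unit": str}


def conforms(record: dict, shape: dict) -> bool:
    return all(isinstance(record.get(key), types) for key, types in shape.items())


w = Quantity(1.020, "g").to("mg")
print(math.isclose(w.value, 1020.0), w.unit)  # True mg
weighing = Weighing(tare=Quantity(25.3332, "g", 0.2), net=Quantity(20.219, "g", 0.2))
print(weighing.net.to("mg"))
print(conforms({"value": 655, "unit": "mg", "uncertainty": 12}, QUANTITY_SHAPE))  # True
print(conforms({"value": 655}, QUANTITY_SHAPE))  # False: unit missing
```

Converting both 1.020 g and 655 mg to one unit is what lets the second example column be compared with the first.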

slide 29

Integration Projects

slide 30

Company 1

Reference Data Project

Data Lake Project

[Diagram: Lab Execution System, Instruments (multiple), ADF (multiple), Data Lake (Hadoop), AF Taxonomies]

slide 31

Company 2

Analytical Chemistry in Discovery

[Diagram: Sample Queue, Analytical Data Review, ADF HPLC-MS, ADF Methods, MS, HPLC]

slide 32

Company 3

Stability and Release Testing

Manufacturing Domain

[Diagram: ADF HPLC-UV, HPLC-UV, Balance, Electronic Lab Notebook, ADF Methods]

slide 33

Conclusion

Why Semantics?

Semantics provide a good framework for standardized yet extensible data descriptions, which are needed to realize the potential of the available data.

Linked Data makes it possible to relate information stored in ADF to additional context, e.g. materials, devices, chemicals, processes, locations, etc.

Initially: experiments for drug approval.

Today: experiments generate data that can be used in many different contexts.

slide 34

Questions?

Heiner Oberkampf

[email protected]

www.osthus.com

Allotrope Foundation:

www.allotrope.org