Use of Data Standards and Metadata in Information Exchange Rachel Uphill From Big Data to Chemical...


Page 1

Use of Data Standards and Metadata in Information Exchange

Rachel Uphill

From Big Data to Chemical Information Meeting

Page 2

Use of Data Standards and Metadata in Information Exchange

• Understanding how the plethora of new data and improved analytical techniques can enhance future innovation and feed the drug-development pipeline

• Implementing ontologies, standards, strategies and collaborations to enhance products and provide wider value

• Discovering what your peers are prioritizing as part of their own Big Data strategies


Page 3

Big Data in Drug Discovery

25 Million Citations

Over 16,000 Organisms

20,000² Interactions

>1 million gene expression profiles

HTS: 1 million reactions/day

Over 40,000 metabolites

186,000 trials

Page 4

Complex and Disparate

Increasing Data Size + Increasing Data Dimensionality = Increasing Complexity

• Data points are growing rapidly and endpoints are unclear

• Disparate datasets

• Data integration through increased use of Contract Research Organisations (CROs)

• Complex analytics required on ever-increasing data

Page 5

Master Data Management (MDM)

Vision → Strategy → Plan → Execute → Summary → Endpoint

MDM delivers one version of the truth across each of the integrated solutions, made easier through the use of standards and ontologies/metadata.

[Diagram: Project, Experiment Protocol, Substance, Results and Metadata]

Integrated systems and re-use of information:

• Planning and designing our projects and study protocols will be simpler, leaving more time to focus on scientific aspects

• Entering information once and making it more accessible across the organisation

• Facilitating enterprise working and decision-making

• Bringing the regions “closer” and collaborating further with our scientists
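The idea of MDM delivering “one version of the truth” can be illustrated with a minimal golden-record sketch. It assumes hypothetical source systems ("ELN", "Registry") that each hold a partial record for the same substance under a shared identifier, and a simple survivorship rule (the most recently updated non-empty value wins). Real MDM platforms use much richer matching and survivorship logic; this is not how any particular product works.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceRecord:
    """A partial view of one substance held by one (hypothetical) source system."""
    system: str
    substance_id: str
    updated: date
    attributes: dict[str, str]


def golden_record(records: list[SourceRecord]) -> dict[str, dict[str, str]]:
    """Consolidate source records into a single record per substance_id.

    Survivorship rule assumed for this sketch: for each attribute, the
    non-empty value from the most recently updated source wins.
    """
    merged: dict[str, dict[str, str]] = {}
    # Oldest first, so later (newer) non-empty values overwrite earlier ones.
    for rec in sorted(records, key=lambda r: r.updated):
        target = merged.setdefault(rec.substance_id, {})
        for name, value in rec.attributes.items():
            if value:  # an empty value never overwrites a known one
                target[name] = value
    return merged


records = [
    SourceRecord("ELN", "SUB-001", date(2013, 1, 5), {"name": "Compound A", "salt_form": ""}),
    SourceRecord("Registry", "SUB-001", date(2013, 6, 1), {"salt_form": "HCl", "name": ""}),
]
print(golden_record(records))  # {'SUB-001': {'name': 'Compound A', 'salt_form': 'HCl'}}
```

Because consumers read the consolidated record rather than each source, information entered once in any system becomes available everywhere it is needed.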

Page 6

Starting out on the journey …

• To maximise the value of pharma R&D data, there are some key disciplines that IT has been investing in:

– Blueprinting: understand what you’ve got and what you need
  • If you were building a house, you would first start with the plans

– Stewardship & Governance: who in the business it matters to if it’s wrong
  • You’d consult the planning authorities to make sure everything was acceptable; if it wasn’t, you’d get it corrected

– Quality: how you know it’s wrong
  • You wouldn’t use an industrial boiler to heat your house

– Master & Reference Data Management: a single source of the truth
  • You’d check with the land registry to make sure the property boundaries were accurate before building an extension

– Standards: understand what you have been asked for
  • You’d consult the building regulations and make sure all documents created contain the required data

– Search and Analytics: you’d only trust your search results and analytical answers once you knew all of the above were correct
  • You’d build the chimney once you’d completed the foundations and structure of the building
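The Quality discipline (“how you know it’s wrong”) becomes concrete when data-quality expectations are written as explicit, testable rules. The sketch below is illustrative only: the record fields (compound_id, value, units) and the rules are hypothetical stand-ins for rules that data stewards would define and own.

```python
from typing import Callable

# A data-quality rule: a human-readable description plus a predicate that
# returns True when a record satisfies the rule.
Rule = tuple[str, Callable[[dict], bool]]

# Hypothetical rules for an assay-result record; in practice these would be
# defined and owned by the relevant data stewards.
RULES: list[Rule] = [
    ("compound_id is present", lambda r: bool(r.get("compound_id"))),
    ("value is a positive number",
     lambda r: isinstance(r.get("value"), (int, float)) and r["value"] > 0),
    ("units come from the controlled list", lambda r: r.get("units") in {"nM", "uM"}),
]


def check(record: dict, rules: list[Rule] = RULES) -> list[str]:
    """Return the description of every rule the record fails (empty list = passes)."""
    return [description for description, passes in rules if not passes(record)]


good = {"compound_id": "CMPD-1", "value": 12.5, "units": "nM"}
bad = {"compound_id": "", "value": -3, "units": "mg"}
print(check(good))  # []
print(check(bad))   # all three rule descriptions
```

Keeping each rule as named data rather than burying it in application code lets the business see, and agree, exactly what “wrong” means.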

Page 7

Information Blueprinting

• What is an Information Blueprint?

– Documentation of the information landscape, illustrating the flow and structure of information at a level of detail appropriate for the audience.

• Modelling the key business processes, understanding the inputs and outputs of those processes, and identifying the upstream suppliers and downstream consumers of data.

• Modelling the high-level information concepts for a broad audience and the detailed data structures for a technical development and data governance audience.

• Identifying transactional use of data as well as secondary reporting and analytical uses of data.

– Information Blueprints are collections of models, created using a number of modelling techniques, that document the business processes, data structures and systems landscape.

• Major benefits of an Information Blueprint:

– Provides a business-process-based view of data usage and information requirements.

– Highlights gaps in information provision and enables impact analysis of changes to the information supply chain.

– Provides the framework for assessing and influencing business data quality.

– Facilitates communication within the business and between the business and IT.

– Encourages the development and use of common terminology by clarifying definitions & synonyms.

– Provides the common business understanding for use and re-use across IT systems.

Information Blueprinting as a Service
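Recording the upstream suppliers and downstream consumers of data is what makes impact analysis possible. As an illustration only, the sketch below models a hypothetical information supply chain as a directed graph (all system names are invented) and uses a breadth-first walk to list every downstream consumer affected by a change to one system.

```python
from collections import deque

# Hypothetical information supply chain: each system maps to the systems
# that consume its data.
FLOWS: dict[str, list[str]] = {
    "Compound Registry": ["ELN", "Assay Database"],
    "ELN": ["Data Warehouse"],
    "Assay Database": ["Data Warehouse", "SAR Reports"],
    "Data Warehouse": ["Search Index"],
}


def downstream_impact(source: str, flows: dict[str, list[str]] = FLOWS) -> list[str]:
    """Breadth-first walk listing every consumer affected by a change at `source`."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for consumer in flows.get(node, []):
            # Visit each consumer once, even if it is reachable by several paths.
            if consumer not in seen:
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


print(downstream_impact("Assay Database"))  # ['Data Warehouse', 'SAR Reports', 'Search Index']
```

In a real blueprint the graph would be derived from the documented process and systems models rather than typed in by hand.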

Page 8

What Do We Mean By Stewardship And Governance?

• Stewardship … “formalizing accountability for the management of data resources for a subset of enterprise data.” (1)

• Data Governance … “the execution and enforcement of authority over the management of data assets and the performance of data functions.” (1)

• Master Data Management … the people, processes and technology that ensure a single managed view of critical enterprise information.

(1) “Non-Invasive Data Governance”, Copyright © 2008 Robert S. Seiner, KIK Consulting & Educational Services/TDAN.com

Information has value and purpose beyond a single use, system or report. We need to support this with elements of good Stewardship and Governance.

Page 9

Data Governance and Stewardship

Good Data Stewardship and Governance encompasses:

• Capability to effectively find, understand and use data and documents

• Data standards – the nature of our standards and how we wish to apply them in GSK

• Business rules and processes to support the creation and management of data and documents

• Data disposition – business rules for versions retained, storage, labelling and tracking of data and documents, and compliance with these rules

• Business model to support data stewardship and increase the capability for data stewardship within the business lines – responsibilities, resource, utilities
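Disposition rules such as “versions retained” are easiest to govern when they are stated precisely enough to execute. As a hedged illustration, the sketch below encodes one invented retention rule (keep the latest N versions of each document, plus anything labelled "legal-hold"); neither the rule nor the labelling scheme comes from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    doc_id: str
    number: int
    label: str = ""  # e.g. "legal-hold"; hypothetical labelling scheme


def apply_retention(versions: list[Version], keep_latest: int) -> tuple[list[Version], list[Version]]:
    """Split versions into (retained, eligible_for_disposal).

    Hypothetical rule: keep the `keep_latest` highest-numbered versions of each
    document, and always keep versions labelled "legal-hold".
    """
    by_doc: dict[str, list[Version]] = {}
    for v in versions:
        by_doc.setdefault(v.doc_id, []).append(v)

    retained: list[Version] = []
    disposable: list[Version] = []
    for doc_versions in by_doc.values():
        # Newest first, so rank 0 is the latest version of this document.
        doc_versions.sort(key=lambda v: v.number, reverse=True)
        for rank, v in enumerate(doc_versions):
            if rank < keep_latest or v.label == "legal-hold":
                retained.append(v)
            else:
                disposable.append(v)
    return retained, disposable


versions = [Version("SOP-1", n) for n in (1, 2, 3, 4)] + [
    Version("SOP-2", 1, "legal-hold"),
    Version("SOP-2", 2),
]
kept, to_dispose = apply_retention(versions, keep_latest=1)
print([(v.doc_id, v.number) for v in to_dispose])  # [('SOP-1', 3), ('SOP-1', 2), ('SOP-1', 1)]
```

Returning the disposable versions rather than deleting them leaves the final decision, and the compliance record, with the data steward.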

Page 10

Data Quality

Page 11

Master Data Management

Page 12

Reference Data Management

• An ontology is a set of related vocabularies/taxonomies specific to a domain; it enables knowledge management by allowing us to create a hierarchy of related terms

• When implementing ontologies, we need to understand the model, the governance/best practice and the quality of entries

• Ontologies allow us to link disparate datasets together through related terminologies

• They are easily extensible and can be embedded within standards and frameworks

– This allows for organisational mappings

• Relevant pharma ontology usages:

– Reactome, available in RDF format, allowing organisations to link pathway information used in metabolomics and to relate disease, target and chemical information

– Allotrope, a set of ontologies associated with the analytical chemistry space, for example equipment

Vocabularies, Taxonomies and Ontologies
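To illustrate how related terminology lets disparate datasets be linked, here is a minimal sketch using a hand-written taxonomy fragment with synonyms. The terms, identifiers and dictionary-based structures are hypothetical; a real implementation would load a governed ontology (for example one published in RDF, as Reactome is) rather than Python dictionaries.

```python
# Hypothetical fragment of a disease taxonomy: narrower term -> broader term.
PARENT = {
    "type 1 diabetes": "diabetes mellitus",
    "type 2 diabetes": "diabetes mellitus",
    "diabetes mellitus": "metabolic disease",
}

# Hypothetical synonyms mapping alternative labels onto a preferred term.
SYNONYMS = {"t2dm": "type 2 diabetes", "niddm": "type 2 diabetes"}


def preferred(term: str) -> str:
    """Normalise a label to its preferred term (case-insensitive)."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def ancestors(term: str) -> list[str]:
    """Return the preferred term followed by its broader terms, nearest first.

    Assumes the hierarchy contains no cycles.
    """
    chain = [preferred(term)]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def link(dataset_a: dict[str, str], dataset_b: dict[str, str], level: str) -> list[tuple[str, str]]:
    """Pair record ids from two datasets whose annotations share the broader term `level`."""
    level = preferred(level)

    def under(annotation: str) -> bool:
        return level in ancestors(annotation)

    return [
        (id_a, id_b)
        for id_a, term_a in dataset_a.items()
        for id_b, term_b in dataset_b.items()
        if under(term_a) and under(term_b)
    ]


targets = {"TGT-1": "T2DM", "TGT-2": "asthma"}
trials = {"TRIAL-9": "type 1 diabetes"}
print(link(targets, trials, "diabetes mellitus"))  # [('TGT-1', 'TRIAL-9')]
```

Neither dataset uses the same label, yet the hierarchy and synonyms connect them at the broader "diabetes mellitus" level.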

Page 13

Standards and Frameworks

• For organisations to be able to exchange data, we need to agree not only on the structure but also on what is meant by the structure and on the business rules associated with the data

• To support this, we can use the concepts already discussed: Information Blueprints, Stewardship & Governance and Data Quality

• The additional, and probably most important, aspect is our Master and Reference Data, where our metadata will be held, for example the drop-down lists in our everyday applications

• To make data exchange easier, we need to agree on terminology and, where we can’t, support mappings/synonyms embedded within the use of the standard, so that datasets can be integrated at a later point

• In the growing world of information exchange and data types, there is an ever-increasing number of standards, and of frameworks to implement them:

– ISA-88 and ISA-95, with their associated .xsd formats BatchML and B2MML respectively

– Pistoia Alliance: HELM, Standard Metadata, Ontology and Standard Data Warehouse projects

– Allotrope: Analytical Chemistry Framework
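Where organisations cannot agree on a single terminology, the mappings/synonyms approach can be sketched as a lookup from local drop-down values onto the terms of an agreed standard, with anything unmapped routed to data stewards for review. The vocabulary and mappings below are hypothetical and are not taken from Allotrope, ISA-88/95 or any Pistoia Alliance specification.

```python
# Hypothetical controlled vocabulary for an instrument-type field, as agreed in a standard.
STANDARD_TERMS = {"HPLC", "LC-MS", "NMR"}

# Hypothetical organisation-specific drop-down values mapped onto the standard terms.
LOCAL_TO_STANDARD = {
    "hplc-uv": "HPLC",
    "liquid chromatography": "HPLC",
    "lcms": "LC-MS",
    "nmr 400mhz": "NMR",
}


def to_standard(values: list[str]) -> tuple[list[str], list[str]]:
    """Map local values to standard terms; return (mapped, unmapped) for steward review."""
    mapped: list[str] = []
    unmapped: list[str] = []
    for raw in values:
        cleaned = raw.strip()
        term = LOCAL_TO_STANDARD.get(cleaned.lower())
        if term is None and cleaned.upper() in STANDARD_TERMS:
            term = cleaned.upper()  # already a standard term, just differently cased
        if term is None:
            unmapped.append(raw)
        else:
            mapped.append(term)
    return mapped, unmapped


print(to_standard(["HPLC-UV", "LCMS", "nmr", "GC"]))  # (['HPLC', 'LC-MS', 'NMR'], ['GC'])
```

Keeping the mapping table separate from the standard itself means new local values can be added by stewards without changing the agreed vocabulary.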

Page 14

Allotrope: Advanced Data Design for Chemical R&D

© 2014 Allotrope Foundation

• Root Cause, the current software environment: Incomplete, Incompatible Software; No Standard File Formats; Inconsistent Metadata; Gaps, Complexity, Incompatible Software

• Effect: Document Preparation, Data Errors, Data Exchange, Data Management, Regulatory Compliance, Data Silos, Innovation Constrained

• Fix the Root Cause with the Allotrope Foundation Framework: Reusable Software Components, Open Document Standard, Open Metadata Repository → Efficient, Innovative, Powerful Software

• Outcomes: Automated Documents, Eliminate Data Errors, Fast Data Exchange, Better Data Management, Innovative Ecosystem, Regulatory Compliance, Eliminate Data Silos

Scientists focus on science

Page 15

Drug Development

[Diagram labels: Doctor/Patient; Clinical Data; Clinical Forms; Clinical Outcome; HL7; HL7: eStability; eCTD; CDISC; Analytical CMC; CDER Data Standard]

Allotrope Framework: Class Libraries, Metadata Repository, Workflow Automation, Information Access, Data Standards, Archiving, Service Standards

Allotrope Framework addresses the gap in standards for CMC analytical data.

Analytical Testing → Data Entry → Data Review → Data prepared for submission in standard format

Page 16

Organisations: UN/CEFACT, W3C, ISO, OMG, Allotrope Foundation, ASTM, HL7, MESA, IETF, UPU, Adobe, OASIS, LC, NISO, UKOLN, OAI, UNECE, DDI Alliance, UNSC, Dublin Core Metadata Initiative, JISC, DNB, IMS Global, FOAF Project, ANSI, JPEG, NIST, IHE, SAA, CDISC

The landscape of standards potentially useful for the Framework

Page 17

"The nice thing about standards is that there are so many to choose from."

Andrew S. Tanenbaum

DISCLAIMER

This is work in progress.

It is not a complete list of standards but a tool for researching the standards.

Allotrope is investigating numerous standards, but this graphic is not intended to represent the standards Allotrope is committing to include in the framework.

[Figure: map of standards grouped by issuing organisation]

Standards shown: UN/CEFACT Core Components Technical Specification 3.0; BatchML; OWL 2.0; ISO 11179 (Metadata Registry); ISO 19763 (Metamodel Interoperability); RDF 1.0; SKOS; Common Warehouse Metamodel 1.1; Common Terminology Services 2 (CTS2) 1.1; ISO 25964 (Thesauri); Unified Modeling Language 2.4.1; AnIML 2.0; HL7; ISO 12200 (MARTIF); ISO 19773 (Metadata Registry Modules); RFC 2421 (Voice Profile v2); ISO 1087 (Terminology Vocabulary); ISO 11404 (General Purpose Datatypes); ISO 20944 (MDRIB); UPU S42-1 (Postal address components); ISO 2382 (IT Vocabulary); ISO 9899 (Programming Languages C); ISO 9945 (Filenames); RFC 3986 (URI); ISO 10646 (Unicode); ISO 646 (IA5 character code); ISO 19107 (Geographic Information); ISO 16684-1 (XMP); ISO 639 (Language Codes); ISO 3166 (Country Codes); RFC 2046 (MIME Types); RFC 3066 (Language Codes); ebXML Registry Information Model 3.0; ebXML Registry Services Specification 2.0; genericode 1.0; RFC 2119 (Requirement Keywords); CMIS 1.1; RFC 2616 (HTTP 1.1); RFC 3023 (XML Media Types); RFC 2045 (MIME Format); RFC 4287 (Atom Syndication); RFC 5023 (Atom Publishing); RFC 4918 (WebDAV); XML Schema Datatypes; OData 4.0; ebXML RegRep 4.0; ISO 15000-3 (ebRIM); XPath 2.0; XMLDSig; XLink 1.1; SOAP 1.2; ISO 19115 (Geographic Information Metadata); ISO 19119 (Geographic Information Services); MARC 21 XML Schema 1.2; MIX 2.0; PREMIS 2.2; Metadata Object Description Standard 3.5; Metadata Authority Description Standard 2.0; ISO 25577 (Information and Documentation - MarcXchange); ISO 20775 (Information and Documentation - Schema for Holdings Information); searchRetrieve 1.0; Search/Retrieval via URL 2.0; Contextual Query Language 1.2; Dublin Core Metadata Element Set 1.1; Encoded Archival Description 2002; Text Encoding Initiative; DDI Codebook 2.5; OAI Protocol for Metadata Harvesting 2.0; OAI Object Reuse and Exchange 1.0; SPARQL 1.1; ISO 704 (Terminology - Principles and methods); ISO 19504 (Common Warehouse Metamodel); Statistical Data and Metadata Exchange 2.1; Common Metadata Framework; DDI Lifecycle 3.1; EDIFACT; Meta Object Facility 1.4.1; Ontology Definition Metamodel 1.0; Information Management Metamodel; UML Profile & Metamodel for Services 1.0.1; Semantics of Business Vocabulary and Business Rules 1.2; ISO 6093 (Number Namespace); Metadata Encoding & Transmission Standard 1.10; ISO 15000-4 (ebRS); ISO 15489 (Records Management); ISO 23081 (Metadata for records); ISO 16363 (Audit and Certification of Trustworthy Digital Repositories); ISO 14721 (OAIS); ISO 15836 (DCMES); SWORD 2.0; BagIt; ARK Identifiers; ISO 26324 (Digital Object Identifier); RFC 3652 (Handle System Protocol 2.1); RFC 3650 (Handle System Overview); RFC 3651 (Handle System Namespace and Service Definition); ISO 13120 (ClaML); ISO 27951 (CTS1); ISO 27527 (Provider Identification); ISO 27932 (HL7 Clinical Document Architecture); ISO 27931 (HL7); ISO 17115 (Vocabulary for terminological systems); LMER 1.2; RFC 2141 (URN Syntax); RFC 1737 (URN Requirements); RFC 4122 (UUID URN Namespace); ISO 20652 (PAIMAS); IMS Content Packaging 1.2; Z39.50 (Information Retrieval); ISO 2709 (Format for information exchange); MARC 21; FOAF Vocabulary 0.99; RDF Best Practices; Cool URIs; RDF Vocabulary Description Language 1.0; Extensible Resource Identifier 2.0; RFC 2234 (ABNF); RFC 3987 (IRI); RFC 3305 (URI, URL, URN Clarifications); RFC 2396 (URI); XRI Data Interchange 2.0; ISO 14533-2 (XAdES); Canonical XML 1.0; Universal Business Language 2.1; ISO 14662 (Open-edi); ISO 15000-5 (CCTS); Z39.88 (OpenURL); Z39.85 (DCMES); ISO 8601 (Dates and Times); ISO 62264 (B2MML); ISA 95; ISA 88; ISO 21000-2 (MPEG-21 DID); ISO 21000-6 (MPEG-21 RDD); ISO 21000-7 (MPEG-21 DIA); ISO 21000-9 (MPEG-21 File Format); ISO 21000-18 (MPEG-21 Streaming); ISO 14496-12 (Base Media File Format); RFC 6381 (Codecs); ISO 21000-3 (MPEG-21 DII); TIFF 6.0; ISO 15444-1 (JPEG 2000); UnitsML 1.0; hData 1.0; RLUS 1.0.1; LECIS 1.0; ISO 21090 (Health informatics data types); IHE XDS, SVS and XUA; SAML 2.0; XACML 3.0; ASTM E1986 (Access Privileges to Health Info); ASTM E1869 (Confidentiality, Privacy, Access and Data Security); ISO 19005-1b (PDF/A); CDA 2; ISO 19510 (BPMN 2.0); BPMN 2.0.1; BRIDG 3.2; Define-XML 2.0; ADaM 2.1; SDM-XML 1.0; CDISC ODM 1.3.2; SEND 3.0; LAB 1.0.1; ISO 28500 (WARC); RFC 3629 (UTF-8); ISO 17025 (Competence of laboratories)

The landscape of standards potentially useful for the Framework

Page 18

Analytics Goal

– Implement a strategy that takes R&D from Big Data, through informatics, to better decision-making, in order to Simplify the Operating Model, ensure efficiencies and Deliver More Products of Value

• Goal: Maximize insight from the minimum necessary data

– Standards, MDM, Data Quality, Data Governance & Stewardship and Blue Printing are some of the foundational capabilities to do this

Big Data Informatics

• High Value Questions

• IT & Informatics Technology: Search. Analytics. Visualization.

• GSK Data: Clinical Data, Discovery Data, Non-R&D Data (e.g. competitive intelligence)

• External Data: Electronic Health Records; Academic Institutes (EBI, Broad, etc.); Public/Private Partnerships (IMI, etc.); Publications/Public Standards

• R&D Culture: Knowledge-centered organization. Increased re-use of knowledge. Knowledge as our competitive asset.

Page 19

SOCRATES SEARCH

Scientific searching for R&D

Text and chemical searches

Internal and External data sources linked through ontologies

More access to R&D content
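The slide does not describe how Socrates uses ontologies internally, so the following is only a generic sketch of one common technique: ontology-driven query expansion across several sources. The synonym set, source names and documents are hypothetical, and matching is naive case-insensitive substring search rather than a real search engine.

```python
# Hypothetical synonym set, as an ontology might provide for one target.
ONTOLOGY_SYNONYMS = {
    "egfr": {"egfr", "erbb1", "her1", "epidermal growth factor receptor"},
}

# Two hypothetical sources (internal and external): document id -> text.
SOURCES = {
    "internal": {"ELN-17": "Assay of ErbB1 inhibitors", "ELN-18": "Solubility study"},
    "external": {"PMID-1": "EGFR mutations in NSCLC", "PMID-2": "HER1 signalling review"},
}


def expand(term: str) -> set[str]:
    """Expand a query term to its synonym set, or just itself if the ontology lacks it."""
    key = term.lower()
    for synonyms in ONTOLOGY_SYNONYMS.values():
        if key in synonyms:
            return synonyms
    return {key}


def federated_search(term: str) -> list[tuple[str, str]]:
    """Search every source for any synonym of `term`; return (source, doc_id) hits."""
    terms = expand(term)
    hits: list[tuple[str, str]] = []
    for source, docs in SOURCES.items():
        for doc_id, text in docs.items():
            lowered = text.lower()
            if any(t in lowered for t in terms):
                hits.append((source, doc_id))
    return hits


print(federated_search("HER1"))  # [('internal', 'ELN-17'), ('external', 'PMID-1'), ('external', 'PMID-2')]
```

A query for "HER1" finds internal and external documents that only mention "ErbB1" or "EGFR", which is the benefit of linking sources through shared terminology.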

Page 20

Phase II – Summarise GSK Experience of Compounds and Targets

• Provide an overview of compounds using the traditional drug Discovery-Development chevrons, allowing drill-down into the raw data.

• Integrated into Socrates Search, shown whenever a compound or medicine is ‘detected’ in search results

• Targetpedia: existing system integrated and extended into Socrates

[Diagram: Socrates Search drawing on early systems, preclinical data systems, clinical systems and external datasets through federated data aggregation (supported with ontologies)]

Target ID → Lead Discovery → Lead Optimisation → Preclinical Development → POC → Full Development

Milestones: Target Selection, Candidate Selection, Commit to Medicine Dev
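The chevron overview described in Phase II amounts to grouping a compound's aggregated records by development stage, in chevron order, with links back to the raw data. A minimal sketch follows, assuming (hypothetically) that federated aggregation returns records as dictionaries with "stage" and "id" fields; the stage names follow the slide's chevrons, and everything else is invented for illustration.

```python
# Chevron stages, in the order shown on the slide.
CHEVRONS = ["Target ID", "Lead Discovery", "Lead Optimisation",
            "Preclinical Development", "POC", "Full Development"]


def chevron_summary(records: list[dict]) -> dict[str, list[str]]:
    """Group record ids by chevron stage, preserving chevron order.

    Records are assumed (hypothetically) to carry 'stage' and 'id' fields;
    records with unrecognised stages are skipped.
    """
    summary: dict[str, list[str]] = {stage: [] for stage in CHEVRONS}
    for rec in records:
        if rec.get("stage") in summary:
            summary[rec["stage"]].append(rec["id"])
    # Keep only stages that have data; the ids support drill-down into raw records.
    return {stage: ids for stage, ids in summary.items() if ids}


records = [
    {"stage": "POC", "id": "CLIN-3"},
    {"stage": "Lead Optimisation", "id": "ASSAY-7"},
    {"stage": "Lead Optimisation", "id": "ASSAY-9"},
]
print(chevron_summary(records))  # {'Lead Optimisation': ['ASSAY-7', 'ASSAY-9'], 'POC': ['CLIN-3']}
```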

Page 21

Socrates Targetpedia and Compoundpedia

Targetpedia

Compoundpedia

Exporting to Excel

Page 22

[Roadmap diagram, 2012 to 2015; axes: Delivery, Value. Stages: GSK Search; Improved GSK Search; Socrates Scientific Search; Socrates Target and Compoundpedias; Structured public data; Advanced “Expert Systems”; Integrating Predictive Systems; Integrating Knowledge. Capabilities: Data Integration, Advanced Analytics, Simple Analytics, Text Analytics, NLP, Data Quality, Ontologies.]

What’s next? …