Applying Semantic Web Standards to Drug Discovery and Development Eric Neumann W3C HCLS co-chair.

Post on 27-Mar-2015


Applying Semantic Web Standards to Drug Discovery and Development

Eric Neumann, W3C HCLS co-chair

2

Knowledge

“--is the human acquired capacity (both potential and actual) to take effective action in varied and uncertain situations.”

How does this translate into using Information Systems better in support of Innovation?

3

Knowledge Predictiveness

• Knowledge of Target Mechanisms
• Knowledge of Toxicity
• Knowledge of Patient-Drug Profiles

4

Where Information Advances are Most Needed

• Supporting Innovative Applications in R&D
– Mol Diagnostics (Biomarkers)
– Molecular Mechanisms (Systems)
– Data Provenance, Rich Annotation
• Clinical Information
– eHealth Records + EDC
– Clinical Submission Documents
– Safety Information, Pharmacovigilance, Adverse Events
– Handling Biomarker evidence
• Standards
– Central Data Sources
  • Genomics, Diseases, Chemistry, Toxicology
– MetaData
  • Ontologies
  • Vocabularies

5

[Figure: "Semantic Bridge" diagram. Labels: Decision Support, Translational Research, Tox, New Applications, Safety, Target Validation, Biomarker Qualification, GO, BioPAX, ICH, Raw Data, MAGE ML, ASN.1, XLS, PSI XML, CSV, SAS Tables, CDISC]

6

Losing Connectedness in Tables

Genes

Tissues

?

Fast Uptake and ease of use, but loose binding to entities and terms

7

Data Integration?

• Querying Databases is not sufficient

• Data needs to include the Context of Local Scientists

• Concepts and Vocabulary need to be associated

• More about Sociology than Technology

Information Knowledge

8

Data Integration: Biology Requirements

Disease, Proteins, Genes, Papers

Retention Policy

Audit Trail

Curation Tools Ontology Experiment

Samples

Compounds

9

Standards- Why Not?

• Good when there’s a majority of agreement
• By vendors, for vendors?
• Mainly about Data Packing; should be more about Semantics (user-defined)
• Ease and Expressivity
• Too often they’re Brittle and Slow to develop
• “They’re great, that’s why there are so many of them”

10

Data Integration Enables Business Integration: Efficiency and Innovation

• Searching

• Visualization

• Analysis

• Reporting

• Notification

• Navigation

11

Searching…

#1 way for finding information in companies…

13

Semantic Web Data Integration

R&D Scientist

Bioinformatics, Cheminformatics, LIMS, Public Data Sources

Dynamic, Linked, Searchable

14

The Current Web

What the computer sees: “Dumb” links

No semantics - <a href> treated just like <bold>

Minimal machine-processable information

15

The Semantic Web

Machine-processable semantic information

Semantic context published – making the data more informative to both humans and machines

16

The Web of Data

• URIs are universal IDs
• Distributed data references
• Non-locality of data
• NamedGraphs can help segment external references
• New meaning for Annotation

target target

gene

pathway

17

Case Study: Omics

ApoA1 …

… is produced by the Liver

… is expressed less in Atherosclerotic Liver

… is correlated with DKK1

… is cited regarding Tangier’s disease

… has Tx Reg elements like HNFR1

Subject Verb Object
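The Subject-Verb-Object pattern on this slide can be sketched with plain tuples. This is only an illustration: the verb names (`producedBy`, `correlatedWith`, and so on) are invented here and do not come from any published vocabulary.

```python
# Each statement about ApoA1 from the slide as a (subject, verb, object) triple.
# Verb names are illustrative assumptions, not a real vocabulary.
TRIPLES = [
    ("ApoA1", "producedBy", "Liver"),
    ("ApoA1", "expressedLessIn", "Atherosclerotic Liver"),
    ("ApoA1", "correlatedWith", "DKK1"),
    ("ApoA1", "citedRegarding", "Tangier's disease"),
    ("ApoA1", "hasTxRegElement", "HNFR1"),
]


def objects_of(triples, subject, verb):
    """Return every object linked to `subject` by `verb`."""
    return [o for s, v, o in triples if s == subject and v == verb]


def about(triples, node):
    """Return every triple that mentions `node` as subject or object."""
    return [t for t in triples if node in (t[0], t[2])]
```

Because every fact has the same three-part shape, a new kind of statement is just another tuple; no table schema has to change.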

18

Courtesy of BG-Medicine

Example: Knowledge Aggregation

20

Tim Berners-Lee’s App View

21

Semantic Web Drug DD Application Space

Genomics

Therapeutics

Biology

HTS

NDA

Compound Opt

safety

eADME

DMPK

informatics

manufacturing

genes

Clinical Studies

Patent

Chem Lib

Production

22

W3C Launches Semantic Web for HealthCare and Life Sciences Interest Group

• Interest Group formally launched Nov 2005: http://www.w3.org/2001/sw/hcls

• First Domain Group for W3C - “…take SW through its paces”

• An Open Scientific Forum for Discussing, Capturing, and Showcasing Best Practices

• Recent life science members: Pfizer, Merck, Partners HealthCare, Teranode, Cerebra, NIST, U Manchester, Stanford U, AlzForum

• SW Supporting Vendors: Oracle, IBM, HP, Siemens, Agfa

• Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode)

23

HCLS Objectives

• Share use cases, applications, demonstrations, experiences

• Exposing collections

• Developing vocabularies

• Building / extending (where appropriate) core vocabularies for data integration

24

HCLS Activities

• BioRDF - data as RDF
• BioNLP - unstructured data
• BioONT - ontology coordination
• Clinical Trials - CDISC/HL7
• Scientific Publishing - evidence management
• Adaptive Healthcare Protocols

25

Reporting on Progression
Notify Others of Decisions

Progression Manager

Found Determinations
Noted Alternatives

Scientist
Toxicogenomicist

Shared Annotations
Notified of Alternatives

Semantic Web in R&D

A Single Compound

Open Data Format and Flexible Linking Enabled Data Integration and Collaboration

26

Progression Manager: Project Dashboard

Scientist: R&D Commons

Toxicogenomicist: Experiment Manager

A Single Compound

R&D Applications in the Semantic Web

27

Other Benefits of Semantic Web

• Enterprise Distributed Connectivity
– Uniform Resource Identifiers (URI)
• Authenticity
– Auditability (Sarbanes-Oxley)
– Authorship Non-repudiability
• Privacy
– Encryptibility and Trust Networks
• Security
– At any level of granularity

28

What is the Semantic Web ?

• http://www.w3.org/2006/Talks/0125-hclsig-em/

It’s AI

It’s Web 2.0

It’s Ontologies

It’s Data Tracking

It’s a Global Conspiracy

It’s Semantic Webs

It’s Text Extraction

29

W3C Roadmap

• Semantic Web foundation specifications
– RDF, RDF Schema and OWL are W3C Recommendations as of Feb 2004
• Standardization work is underway in Query, Best Practices and Rules
• Goal of moving from a Web of Documents to a Web of Data

The Only Open and Web-based Data Integration Model Game in Town

30

Leveraging with Semantic Web

• Free Data from Applications…
– Data uniquely defined by URIs, even across multiple databases
– Mapped through a common graph semantic model
– Data can be distributed (not in one location)
– New relations and attributes dynamically added

• As easy as spreadsheets, but with semantics and web locations

Benefit #1

31

Leveraging with Semantic Web

• All things on the Web can have semantics added to them
– Ability to define and link in ontologies
– Document Management through Links
– Changed data and semantics can be managed as versions
– Semantics can be used to define and apply policies
– No Need for complex Middleware

Benefit #2

32

Leveraging with Semantic Web

• Supporting the Management of Knowledge
– All data nodes and doc resources can be linked
– Ability to represent Assertions and Hypotheses
  • Include authorship and assumptions
  • Use of KD45 logic
– Both Local and Global Knowledge
  • Scientists can upload partially validated facts
– View Data and Interpretations through Points-of-View (Semantic Lenses)
  • Share views with others

Benefit #3

33

The Technologies: RDF

• Resource Description Framework
• Think: "Relational Data Format"
• W3C standard for making statements of fact or belief about data or concepts
• Descriptive statements are expressed as triples: (Subject, Verb, Object)
– We call verb a “predicate” or a “property”

Subject, Property, Object

<Patient HB2122> <shows_sign> <Disease Pneumococcal_Meningitis>
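The patient statement on this slide can be written out in N-Triples form once each name has a full URI. A minimal sketch: the `http://example.org/clinical/` namespace is a placeholder assumed for illustration, since the slide gives only local names.

```python
# Hypothetical namespace for this sketch; the slide gives only local names.
EX = "http://example.org/clinical/"


def uri(local_name):
    """Wrap a local name as an absolute URI in N-Triples angle-bracket form."""
    return "<" + EX + local_name + ">"


def to_ntriple(subject, prop, obj):
    """Serialize one (subject, property, object) statement as an N-Triples line."""
    return " ".join([uri(subject), uri(prop), uri(obj), "."])


line = to_ntriple("Patient_HB2122", "shows_sign", "Pneumococcal_Meningitis")
print(line)
```

Each line of such a file is a complete statement, so files from different sources can simply be concatenated.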

34

Universal, semantic connectivity supports the construction of elaborate structures.

What RDF Gets You

35

What does RDF get you?

• Structure is not format-rigid (i.e. tree)
– Semantics not implicit in Syntax
– No new parsers need to be defined for new data
• Entities can be anywhere on the web (URI)
• Define semantics into graph structures (ontologies)
– Use rules to test data consistency and extract important relations
• Data can be merged into complete graphs
• Multiple ontologies supported
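The merging point above can be sketched as a set union over triples (ignoring blank nodes, which need renaming in a real merge). The two "databases", the `example.org` predicates, and their values are assumptions made up for this example; only the UniProt LSID appears elsewhere in the deck.

```python
# Two sources describing the same protein under the same URI.
# The example.org predicates and all values are illustrative assumptions.
GSK3B = "urn:lsid:uniprot.org:uniprot:P49841"

genomics_db = {
    (GSK3B, "http://example.org/encodedBy", "http://example.org/gene/GSK3B"),
}
chemistry_db = {
    (GSK3B, "http://example.org/inhibitedBy", "http://example.org/cmpd/kenpaullone"),
    (GSK3B, "http://example.org/encodedBy", "http://example.org/gene/GSK3B"),
}

# Merging is a set union of triples: shared URIs line up automatically
# and duplicate statements collapse into one.
merged = genomics_db | chemistry_db


def describe(graph, node):
    """Collect property -> sorted values for one node in the merged graph."""
    out = {}
    for s, p, o in graph:
        if s == node:
            out.setdefault(p, []).append(o)
    return {p: sorted(v) for p, v in out.items()}
```

No join keys or schema mapping are needed here; agreement on the URI is what connects the two sources.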

36

RDF vs. XML example


Wang et al., Nature Biotechnology, Sept 2005

AGML

HUPML

37

RDF Stripe Mode

Node>Edge>Node>Edge….

38

RDF Graph

40

gsk:KENPAL rdf:type :Compound ;
  dc:source <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14698171> ;
  :chemID "3820" ;
  :clogP "2.4" ;
  :kA "e-8" ;
  :mw "327.17" ;
  :ic50 [ rdf:type :IC50 ; :value "23" ; :units :nM ; :forTarget gsk:GSK3beta ] ;
  :chemStructure "C16H11BrN2O" ;
  rdfs:label "kenpaullone" ;
  :synonym "bromo-paullone" ;
  :smiles "C1C2=C(C3=CC=CC=C3NC1=O)NC4=C2C=C(C=C4)B" ;
  :inChI "1/C16H11BrN2O/c17-9-5-6-14-11(7-9)12-8-15(20)18-13-4-2-1-3-10(13)16(12)19-14/h1-7,19H,8H2,(H,18,20)/f/h18H" ;
  :xref <http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=3820> .

41

DB

Mapping from Current Formats

42

Excel => RDF

ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:CASP2 ; ls:GE_Expected_Ratio "0.2726" ; ls:conditionHub gl:BREAST_MALIGNANT } ;

ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:TNFRS ; ls:GE_Expected_Ratio "0.0138" ; ls:conditionHub gl:BREAST_MALIGNANT } ;

ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:CASP2 ; ls:GE_Expected_Ratio "0.1275" ; ls:conditionHub gl:BREAST_NORMAL } ;

Casp2

TNFRS

BreastMalig
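The Excel-to-RDF mapping on this slide can be approximated as follows. This is a sketch, not the tool used in the talk: the CSV column names and the `_:cellN` blank-node labels are assumptions, while the `ls:` and `gl:` terms and the three ratios are taken from the slide.

```python
import csv
import io

# Three spreadsheet cells from the slide, flattened to CSV for this sketch.
# Column names are assumptions chosen for illustration.
SHEET = """probe,condition,expected_ratio
CASP2,BREAST_MALIGNANT,0.2726
TNFRS,BREAST_MALIGNANT,0.0138
CASP2,BREAST_NORMAL,0.1275
"""


def rows_to_triples(text):
    """Turn each row into one cell node with type, probe, condition and ratio."""
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        cell = f"_:cell{i}"  # blank-node label, one per spreadsheet cell
        triples += [
            (cell, "rdf:type", "ls:GE_Cell"),
            (cell, "ls:probeHub", "gl:" + row["probe"]),
            (cell, "ls:conditionHub", "gl:" + row["condition"]),
            (cell, "ls:GE_Expected_Ratio", row["expected_ratio"]),
        ]
    return triples


triples = rows_to_triples(SHEET)
```

Each cell keeps explicit links to its probe and its condition, which is the connectedness that a bare table loses.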

43

W3C Launches Semantic Web for HealthCare and Life Sciences Interest Group

• Interest Group formally launched Nov 2005: http://www.w3.org/2001/sw/hcls

• First Domain Group for W3C - “…take SW through its paces”

– Not a standards group, but a group to identify the best implementations of current SW Standards!

• An Open Scientific Forum for Discussing, Capturing, and Showcasing Best Practices

• Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode)

44

W3C Launches Semantic Web for HealthCare and Life Sciences Interest Group

• First formal meeting: Jan 25-26, 2006 Cambridge, MA

• SW Supporting Vendors: Oracle, IBM, HP, Siemens, Agfa

• Recent life science members: Pfizer, Merck, Partners HealthCare, Teranode, Cerebra, NIST, U Manchester, Stanford U, U Bolzano, AlzForum

• Joining W3C gets you in as a group member

– Early access to technology and discussions

– Interaction with potential partners and clients

45

Multiple Ontologies Used Together

Drug target ontology

FOAF

Patent ontology

OMIM

Person

Group

Chemical entity

Disease

SNP

BioPAX

UniProt

Extant ontologies

Protein

Under development

Bridge concept

UMLS

Disease Polymorphisms

PubChem

46

Potential Linked Clinical Ontologies

Clinical Trials ontology

RCRIM (HL7)

Genomics

CDISC

IRB

Applications

Molecules

Clinical Obs

ICD10

Pathways (BioPAX)

Disease Models

Extant ontologies

Mechanisms

Under development

Bridge concept

SNOMED

Disease Descriptions

Tox

47

Case Studies

48

Case Study: NeuroCommons.org

• Public Data & Knowledge for CNS

• R&D Forum

• Available for industry and academia

• All based on Semantic Web Standards

49

NeuroCommons

The Recontribution of Knowledge

Publications are usually copyrighted… Knowledge of Nature should be openly shareable!

50

NeuroCommons.org

The Neurocommons project, a collaboration between Science Commons and the Teranode Corporation, is creating a free, public Semantic Web for neurological research. The project has three distinct goals:

1. To demonstrate that scientific impact and innovation are directly related to the freedom to legally reuse and technically transform scientific information.

2. To establish a legal and technical framework that increases the impact of investment in neurological research in a public and clearly measurable manner.

3. To develop an open community of neuroscientists, funders of neurological research, technologists, physicians, and patients to extend the Neurocommons work in an open, collaborative, distributed manner.

52

NeuroCommons First Steps

The first stage is underway:

• Using NLP and other automated technologies, extract machine-readable representations of neuroscience-related knowledge as contained in free text and databases

• Assemble those representations into a graph
• Publish the graph with no intellectual property rights or contractual restrictions on reuse

53

HCLS Neuro Tasks

• Aggregate facts and models around Parkinson’s Disease

• SWAN: scientific annotations and evidence
• Use RDF and OWL to describe
– Brain scans in The Whole Brain Atlas
– Neural entries in NCBI’s Entrez Gene Database
– ‘Brain Connectivity’
– Neuronal data in SenseLab
– Neurological Disease entries in OMIM

54

<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>

Case Study: BioPAX (Pathways)

55

<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <drug:affectedBy rdf:resource="http://pharma.com/cmpd/CHIR99102"/>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>

Case Study: BioPAX (Pathways)

Modulation

CHIR99102

affectedBy
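The extra `drug:affectedBy` statement can be read back without disturbing the pathway terms around it. A rough sketch using Python's standard XML parser on a trimmed copy of the snippet: the `bp:` and `drug:` namespace URIs below are placeholders assumed for illustration, not the official ones.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://example.org/biopax#"   # placeholder, not the official BioPAX URI
DRUG = "http://example.org/drug#"   # placeholder for the slide's drug: prefix

# Trimmed copy of the slide's modulation, wrapped in an rdf:RDF root.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}" xmlns:drug="{DRUG}">
  <bp:MODULATION rdf:ID="xDshToXGSK3b">
    <bp:left rdf:resource="#xDsh"/>
    <bp:right rdf:resource="#xGSK-3beta"/>
    <bp:control-type>INHIBITION</bp:control-type>
    <drug:affectedBy rdf:resource="http://pharma.com/cmpd/CHIR99102"/>
  </bp:MODULATION>
</rdf:RDF>"""


def affected_by(xml_text):
    """Map each modulation's rdf:ID to the compound URIs that affect it."""
    root = ET.fromstring(xml_text)
    result = {}
    for mod in root.iter(f"{{{BP}}}MODULATION"):
        links = [
            e.get(f"{{{RDF}}}resource")
            for e in mod.findall(f"{{{DRUG}}}affectedBy")
        ]
        result[mod.get(f"{{{RDF}}}ID")] = links
    return result
```

A consumer that only knows the `bp:` terms can skip the `drug:` element entirely, which is why the annotation can be added without revising the pathway format.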

56

Case Study: Drug Discovery Dashboards

• Dashboards and Project Reports
• Next generation browsers for semantic information via Semantic Lenses
• Renders OWL-RDF, XML, and HTML documents
• Lenses act as information aggregators and logic style-sheets

add { ls:TheraTopic hs:classView:TopicView}

58

Drug Discovery Dashboard
http://www.w3.org/2005/04/swls/BioDash

Topic: GSK3beta Topic

Target: GSK3beta

Disease: DiabetesT2

Alt Dis: Alzheimers

Cmpd: SB44121

CE: DBP

Team: GSK3 Team

Person: John

Related Set

Path: WNT

59

Bridging Chemistry and Molecular Biology

urn:lsid:uniprot.org:uniprot:P49841

Semantic Lenses: Different Views of the same data

Apply Correspondence Rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot

BioPax Components

Target Model
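The correspondence rule on this slide amounts to a join on the shared LSID cross-reference. A small sketch under stated assumptions: the dictionaries stand in for the two models, and every identifier except the UniProt LSID is hypothetical.

```python
def apply_correspondence(targets, proteins):
    """Link each target to every protein that shares its xref LSID."""
    by_lsid = {}
    for prot_id, lsid in proteins.items():
        by_lsid.setdefault(lsid, []).append(prot_id)
    return [
        (target_id, "correspondsTo", prot_id)
        for target_id, lsid in targets.items()
        for prot_id in by_lsid.get(lsid, [])
    ]


# Identifiers other than the UniProt LSID are hypothetical.
targets = {"target:GSK3beta": "urn:lsid:uniprot.org:uniprot:P49841"}
proteins = {
    "bpx:xGSK-3beta": "urn:lsid:uniprot.org:uniprot:P49841",
    "bpx:xDsh": "urn:lsid:example.org:protein:Dsh",
}
links = apply_correspondence(targets, proteins)
```

The rule asserts only that the two objects point at the same reference; it does not claim they are the same descriptive object.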

60

• Lenses can aggregate, accentuate, or even analyze new result sets

• Behind the lens, the data can be persistently stored as RDF-OWL

• Correspondence does not need to mean “same descriptive object”, but may mean objects with identical references

Bridging Chemistry and Molecular Biology

61

Case Study: Drug Safety ‘Safety Lenses’

• Lenses can ‘focus’ data in specific ways
– Hepatotoxicity, genotoxicity, hERG, metabolites
• Can be “wrapped” around statistical tools
• Aggregate other papers and findings (knowledge) in context with a particular project
• Align animal studies with clinical results
• Support special “Alert-channels” by regulators for each different toxicity issue
• Integrate JIT information on newly published mechanisms of action

62

GeneLogic GeneExpress Data

• Additional relations and aspects can be defined

Diseased Tissue

Links to OMIM (RDF)

63

ClinDash: Clinical Trials Browser

Clinical Obs

Expression Data

Subjects

• Values can be normalized across all measurables (rows)

• Samples can be aligned to their subjects using RDF rules

• Clustering can now be done over all measurables (rows)
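One way to read "normalized across all measurables" is per-row z-scoring, so that clinical observations and expression ratios share a scale before clustering. The slide does not name a method, so z-scoring is an assumption here, and the row names and values are made up for illustration.

```python
from statistics import mean, pstdev


def zscore_rows(table):
    """Scale each measurable (row) to mean 0 and unit variance."""
    out = {}
    for name, values in table.items():
        mu, sigma = mean(values), pstdev(values)
        # A constant row has no spread; map it to zeros instead of dividing by 0.
        out[name] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return out


# Made-up rows: one clinical observation and one expression ratio,
# each measured on the same three subjects.
table = {
    "ALT (U/L)": [20.0, 40.0, 60.0],
    "CASP2 ratio": [0.1, 0.2, 0.3],
}
normalized = zscore_rows(table)
```

After scaling, the two rows have the same profile across subjects even though their raw units differ, which is what makes joint clustering meaningful.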

64

Case Study: Nokia

• Developer’s Forum Portal

65

Case Study: TERANODE Design Suite Supports Laboratory Data and Workflow

• Protocol Modeler
– Accelerates workflow development
– Eliminates database programming
• Protocol Player
– Guides users through workflow
– Automates data capture

– Automates complex data flow

plates

– Integrates lab data with project and enterprise data

66

Conclusions: Key Semantic Web Principles

• Plan for change
• Free data from the application that created it
• Lower reliance on overly complex Middleware
• The value in "as needed" data integration
• Big wins come from many little ones
• The power of links - network effect
• Open-world, open solutions are cost effective
• Importance of "Partial Understanding"

Efficiency and Innovation: Semantic Web Applications Roadmap