The iPlant Collaborative Semantic Web Platform · iPlant Semantic Web Program Foundational...

Post on 20-Aug-2020

0 views 0 download

Transcript of The iPlant Collaborative Semantic Web Platform · iPlant Semantic Web Program Foundational...

W O R K S H O P O N S E M A N T I C S I N G E O S PAT I A L A R C H I T E C T U R E S : A P P L I C AT I O N S A N D I M P L E M E N TAT I O N

O c t o b e r 2 0 1 3

D A M I A N G E S S L E R , P h . D .

S E M A N T I C W E B A R C H I T E C T

U N I V E R S I T Y O F A R I Z O N A

d g e s s l e r ( a t ) i p l a n t c o l l a b o r a t i v e ( d o t ) o r g

The iPlant Collaborative Semantic Web Platform

Classicism and Scholasticism

w w w . i p l a n t c o l l a b o r a t i v e . o r g 2

St. Thomas Aquinas Fra Bartolommeo (1472–1517)

Source: http://en.wikipedia.org/wiki/File:Thomas_Aquinas_by_Fra_Bartolommeo.jpg

Think, then look Look, then think

Empiricism

w w w . i p l a n t c o l l a b o r a t i v e . o r g 3

Source: http://en.wikipedia.org/wiki/File:Pourbus_Francis_Bacon.jpg

Look, then think Think, then look

Sir Francis Bacon Frans Pourbus the younger (1569–1622)

Scientific Method

w w w . i p l a n t c o l l a b o r a t i v e . o r g 4

Source: http://en.wikipedia.org/wiki/File:Pourbus_Francis_Bacon.jpg

Hypothesis

Prediction

Experiment Analysis

Conclusion

[ support or refutation ]

Sir Francis Bacon Frans Pourbus the younger (1569–1622)

The Experiment as the Rate Limiting Step

w w w . i p l a n t c o l l a b o r a t i v e . o r g 5

Charles Darwin Leonard Darwin, 1874

Source: http://en.wikipedia.org/wiki/File:1878_Darwin_photo_by_Leonard_from_Woodall_1884_-_cropped_grayed_partially_cleaned.jpg

Hypothesis

Prediction

Experiment Analysis

Conclusion

[ support or refutation ]

Bio(Geo)logy

w w w . i p l a n t c o l l a b o r a t i v e . o r g 6

Not too long ago: “If you love science and hate math, do Bio(Geo)logy”

All-in-one Analysis + Manuscript Prep + Data Management Plan

w w w . i p l a n t c o l l a b o r a t i v e . o r g 7

Not too long ago: “What? Me worry? My backups are safe”

Source: http://en.wikipedia.org/wiki/File:IBM_PC_5150.jpg

Quantity has a Quality all of its own

w w w . i p l a n t c o l l a b o r a t i v e . o r g 8

Hypothesis driven

Data Driven

Evidence-based Decision-making

w w w . i p l a n t c o l l a b o r a t i v e . o r g 9

http://blog.lib.umn.edu/ellis271/arch1701/bigstockphoto_Global_Warming_217540%203.jpg http://www.smartpower.org/blog/wp-content/photos/field_turbines.jpg

Decisions have downstream and unintended consequences; analyses and decisions about our Natural world that utilize a scientific approach bias our odds towards viable solutions.

The Analysis as the Rate Limiting Step

w w w . i p l a n t c o l l a b o r a t i v e . o r g 10

Source: http://en.wikipedia.org/wiki/File:Mapping_Reads.png

Hypothesis

Prediction

Experiment Analysis

Conclusion

[ support or refutation ]

high throughput sequencing

from climate change to

Re-revolutionizing the Revolution

w w w . i p l a n t c o l l a b o r a t i v e . o r g 11

13th century 20th century

Think, then look Look, then think

Hypothesis driven Data Driven

Re-revolutionizing the Revolution

w w w . i p l a n t c o l l a b o r a t i v e . o r g 12

16th century 21st century

Look, then think Think, then look

Data driven Hypothesis Driven

Hypoth

Predict

Exprmnt

Analysis

Conclsn

w w w . i p l a n t c o l l a b o r a t i v e . o r g 13

Source: http://en.wikipedia.org/wiki/File:Pourbus_Francis_Bacon.jpg

Hypoth

Predict

Exprmnt

Analysis

Conclsn

Hypoth

Predict

Exprmnt

Analysis

Conclsn

Hypoth

Predict

Exprmnt

Analysis

Conclsn

Analysis

Scientific Method, on hyper-cycles

w w w . i p l a n t c o l l a b o r a t i v e . o r g 14

Bridging HPC, Enterprise, and Web assets

High Performance Computing 500K core, 10 PetaFLOPS*

Petabyte scale storage

PetaFLOP: 1015 (million billion) floating point operations per second

Foundational Infrastructure iPlant Data Store, HPC, etc.

The iPlant Collaborative

Mission: To build a cyberinfrastructure for the nation’s plant scientists.

iPlant is a Service/Infrastructure project

w w w . i p l a n t c o l l a b o r a t i v e . o r g 15

Bridging HPC, Enterprise, and Web assets

Foundational Infrastructure iPlant Data Store, HPC, etc.

Enterprise Class Discovery Environment, Atmosphere

Enterprise Class Virtual Work desk,

Cloud, Virtual machines

Discovery Environment: world class bioinformatics’ work station at your browser

Atmosphere: “instant” dedicated work station on the cloud: load, use, discard, repeat

breadth

dep

th

High Performance Computing 500K core, 10 PetaFLOPS*

Petabyte scale storage

PetaFLOP: 1015 (million billion) floating point operations per second

w w w . i p l a n t c o l l a b o r a t i v e . o r g 16

The Greatest Informatic Asset of all Time

Web

Just a portal and browser ... ... or an infrastructural asset?

w w w . i p l a n t c o l l a b o r a t i v e . o r g 17

Bridging HPC, Enterprise, and Web assets

Enterprise Class Discovery Environment, Atmosphere

Web

What is the infrastructural role for MODS, CODS, and trillions $$$ in web assets? How does iPlant engage, leverage, and enhance the Gramene’s, the TAIR’s, Soybase’s, SGN’s, MazieGDB’s, PlexDB’s, ..., of the world? How does iPlant engage anything that is not a downloadable, installable Linux/MS program?

MODS: Model Organism Databases CODS: Clade-Oriented Databases

Foundational Infrastructure iPlant Data Store, HPC, etc.

w w w . i p l a n t c o l l a b o r a t i v e . o r g 18

iPlant Semantic Web Program

Foundational Infrastructure iPlant Data Store, HPC, etc.

Enterprise Class Discovery Environment, Atmosphere

Web

SSWAP Semantic Integration

Semantic Pipelining

Enterprise Class Virtual Work desk

Cloud, Virtual machines

Distributed Semantic Web Services Logic-driven semantics

High Performance Computing 500K core, 10 PetaFLOPS*

Petabyte scale storage

PetaFLOP: 1015 (million billion) floating point operations per second

w w w . i p l a n t c o l l a b o r a t i v e . o r g 19

iPlant Semantic Web Program

Web

SSWAP Semantic Integration

Semantic Pipelining

Distributed Semantic Web Services Logic-driven semantics

It is the Semantic aspect of the Semantic Web that allows us to leverage the Web from being an external resource into an integrated infrastructural asset.

The Actors

You

Community MODS (Model Organism Databases)

and CODS (Clade Oriented Databases)

iPlant Computational Resources (e.g., TACC)

The World Your lab

The World

Semantic Mediation Layer

w w w . i p l a n t c o l l a b o r a t i v e . o r g 20

Semantic Mediation

Sidney Harris © 2006

w w w . i p l a n t c o l l a b o r a t i v e . o r g 21

The Antibody Analogy as the Mediation Layer

w w w . i p l a n t c o l l a b o r a t i v e . o r g 22

Simple Semantic Web Architecture and Protocol W3C OWL RDF/XML

• Establish the framework for Web resources to describe themselves and their offerings

• Establish the framework for ontological integration

• Engage first-order, description logic reasoning

• Provide a semantically enabled Discovery Server for service and pipeline coordination

http://sswap.info/protocol

w w w . i p l a n t c o l l a b o r a t i v e . o r g 23

SSWAP Enables Reasoner-assisted Workflows

• A protocol allows a reasoner to connect chains of resources (services) based on logical (not just lexical) matching of what various resources consume and produce.

• Web Discovery: from (any) Web site -> semantic pipeline

• Semantic Pipeline: single chain workflows of distributed semantic Web services hosted anywhere on the Web

• Workflows constructed via reasoner-assisted, first-order subsumption matching of the output of one service into the input of another

w w w . i p l a n t c o l l a b o r a t i v e . o r g 24

rdfs:subClassOf

Resource1 Subject Object

Resource2 Subject Object

Resource3 Subject Object

rdfs:subClassOf

iPlant Semantic Architecture

Data and Service Providers

Web Resource

Data

RDG RIG RRG

Algorithm

Web

Semantic Broker / Discovery Server

INTERPRETER

IND

IREC

TIO

N L

AYE

R

KB

BROKER

RDG RQG RRG EXPLICIT INTERFACE

IND

IREC

TIO

N L

AYE

R

Clients

RQG

Client

RIG RRG

Ontologies

Ontology Servers

Protocol Ontology

OWL RDF/XML

OWL RDF/XML

REPOSITORY

RESTful API

1

2

4 3

EXPLICIT INTERFACE

Semantic documents described in this talk: PDG: Provider Description Graph

RDG: Resource Description Graph RIG: Resource Invocation Graph RRG: Resource Response Graph

RQG: Resource Query Graph

PDG

w w w . i p l a n t c o l l a b o r a t i v e . o r g 25

w w w . i p l a n t c o l l a b o r a t i v e . o r g 26

sswap.info/example

w w w . i p l a n t c o l l a b o r a t i v e . o r g 27

From TreeGenes to High Performance Computing

w w w . i p l a n t c o l l a b o r a t i v e . o r g 28

Semantic Integration from Third-party Web sites DiversiTree

Javascript snippet to launch

data for Web Discovery

with the press of a button

HTTP API: sswap.info/api

w w w . i p l a n t c o l l a b o r a t i v e . o r g 29

JSON -> OWL RDF/XML

transformation

transparent to the user

(via the SSWAP HTTP API)

w w w . i p l a n t c o l l a b o r a t i v e . o r g 30

Web Discovery into Semantic Pipelines Reasoner-assisted Web workflows

Reasoner uses first-

order, description logic

to present services

and pipelines that can

operate on the data at

any given step

Just-In-Time Ontology Hosting sswap.info/jit

w w w . i p l a n t c o l l a b o r a t i v e . o r g 31

w w w . i p l a n t c o l l a b o r a t i v e . o r g 32

Web Discovery into Semantic Pipelines Reasoner-assisted Web workflows

Reasoner uses first-

order, description logic

to present services

and pipelines that can

operate on the data at

any given step

w w w . i p l a n t c o l l a b o r a t i v e . o r g 33

RESTful Pipeline Execution

w w w . i p l a n t c o l l a b o r a t i v e . o r g 34

Data Tree View “Ontologized” data and metadata

•on-the-fly Data Tree views

•pre-defined renderers

w w w . i p l a n t c o l l a b o r a t i v e . o r g 35

Semantic Integration into Third-party Web sites TreeGenes’ CartograTree

Third-party web sites can

engage as renderers on

result sets

w w w . i p l a n t c o l l a b o r a t i v e . o r g 36

TreeGenes’ Sequenced, Genotyped, and Phenotyped

Geographical browsing into

the TreeGenes database

• 1 265 tree species

• 901 113 sequences

• 24 142 786 genotypes

• 19 441 phenotypes

http://dendrome.ucdavis.edu/treegenes

w w w . i p l a n t c o l l a b o r a t i v e . o r g 37

AmeriFlux Sites CO2, Water, Energy

http://public.ornl.gov/ameriflux

w w w . i p l a n t c o l l a b o r a t i v e . o r g 38

WorldClim 1 Km2 climate grids

http://www.worldclim.org

w w w . i p l a n t c o l l a b o r a t i v e . o r g 39

ArcGIS Layers

Soil layers

http://maps2.arcgislonline.com

w w w . i p l a n t c o l l a b o r a t i v e . o r g 40

TRY-DB Phenotypes

http://www.try-db.org

w w w . i p l a n t c o l l a b o r a t i v e . o r g 41

CartograTree Custom user interface data selection and analysis

• Select

• Analyze

• Web Discovery

w w w . i p l a n t c o l l a b o r a t i v e . o r g 42

Semantic Integration from Third-party Web sites Data slicing and contextual augmentation

w w w . i p l a n t c o l l a b o r a t i v e . o r g 43

Direct and Indirect Data Referencing URI dereferencing of arbitrarily large data sets

Serialize the data itself,

or a URI to where the

data is located

w w w . i p l a n t c o l l a b o r a t i v e . o r g 44

High Performance Computing Services engage like any other Web services

Custom Service Parameterization

w w w . i p l a n t c o l l a b o r a t i v e . o r g 45

w w w . i p l a n t c o l l a b o r a t i v e . o r g 46

Publishing Pipelines Private data, shared service parameterization

Manage and Publish your Pipelines

w w w . i p l a n t c o l l a b o r a t i v e . o r g 48

Phylogenetics Pipeline runs are persisted in OWL and can start new pipelines

w w w . i p l a n t c o l l a b o r a t i v e . o r g 49

TreeViz Multi-location, multi-institution, Web/HPC run

Quick Vitals

• 185,000 lines of code

• 100+ libraries

• Open source

• Free SDK (Software Development Kit) for semantics and reasoning on your servers; run pipeline manager on our servers

• More info: sswap.info/wiki

w w w . i p l a n t c o l l a b o r a t i v e . o r g 50

Acknowledgements

Special thanks to:

• iPlant Collaborative

• UC Davis Dendrome / TreeGenes

• Semantic Web engineering by Clark and Parsia, LLC

• NSF grants #0943879 and #EF-0735191

w w w . i p l a n t c o l l a b o r a t i v e . o r g 51