The iPlant Collaborative Semantic Web Platform · iPlant Semantic Web Program Foundational...
Workshop on Semantics in Geospatial Architectures: Applications and Implementation
October 2013
Damian Gessler, Ph.D.
Semantic Web Architect
University of Arizona
dgessler (at) iplantcollaborative (dot) org
The iPlant Collaborative Semantic Web Platform
Classicism and Scholasticism
www.iplantcollaborative.org
St. Thomas Aquinas, by Fra Bartolommeo (1472–1517)
Source: http://en.wikipedia.org/wiki/File:Thomas_Aquinas_by_Fra_Bartolommeo.jpg
Think, then look Look, then think
Empiricism
Source: http://en.wikipedia.org/wiki/File:Pourbus_Francis_Bacon.jpg
Look, then think Think, then look
Sir Francis Bacon, by Frans Pourbus the Younger (1569–1622)
Scientific Method
Source: http://en.wikipedia.org/wiki/File:Pourbus_Francis_Bacon.jpg
Hypothesis -> Prediction -> Experiment -> Analysis -> Conclusion [support or refutation]
Sir Francis Bacon, by Frans Pourbus the Younger (1569–1622)
The Experiment as the Rate Limiting Step
Charles Darwin, by Leonard Darwin, 1874
Source: http://en.wikipedia.org/wiki/File:1878_Darwin_photo_by_Leonard_from_Woodall_1884_-_cropped_grayed_partially_cleaned.jpg
Hypothesis -> Prediction -> Experiment -> Analysis -> Conclusion [support or refutation]
Bio(Geo)logy
Not too long ago: “If you love science and hate math, do Bio(Geo)logy”
All-in-one Analysis + Manuscript Prep + Data Management Plan
Not too long ago: “What? Me worry? My backups are safe”
Source: http://en.wikipedia.org/wiki/File:IBM_PC_5150.jpg
Quantity has a Quality all of its own
Hypothesis driven
Data Driven
Evidence-based Decision-making
http://blog.lib.umn.edu/ellis271/arch1701/bigstockphoto_Global_Warming_217540%203.jpg http://www.smartpower.org/blog/wp-content/photos/field_turbines.jpg
Decisions have downstream and unintended consequences; analyses and decisions about our natural world that use a scientific approach bias our odds towards viable solutions.
The Analysis as the Rate Limiting Step
Source: http://en.wikipedia.org/wiki/File:Mapping_Reads.png
Hypothesis -> Prediction -> Experiment -> Analysis -> Conclusion [support or refutation]
From climate change to high-throughput sequencing
Re-revolutionizing the Revolution
13th century 20th century
Think, then look Look, then think
Hypothesis driven Data Driven
Re-revolutionizing the Revolution
16th century 21st century
Look, then think Think, then look
Data driven Hypothesis Driven
Scientific Method, on hyper-cycles
Source: http://en.wikipedia.org/wiki/File:Pourbus_Francis_Bacon.jpg
[Figure: the Hypothesis, Prediction, Experiment, Analysis, Conclusion cycle repeated four times, with Analysis shown once more on its own]
Bridging HPC, Enterprise, and Web assets
High Performance Computing 500K core, 10 PetaFLOPS*
Petabyte scale storage
* PetaFLOP: 10^15 (a million billion) floating point operations per second
Foundational Infrastructure iPlant Data Store, HPC, etc.
The iPlant Collaborative
Mission: To build a cyberinfrastructure for the nation’s plant scientists.
iPlant is a Service/Infrastructure project
Bridging HPC, Enterprise, and Web assets
Foundational Infrastructure iPlant Data Store, HPC, etc.
Enterprise Class Discovery Environment, Atmosphere
Enterprise Class Virtual Work desk,
Cloud, Virtual machines
Discovery Environment: world-class bioinformatics workstation in your browser
Atmosphere: “instant” dedicated workstation in the cloud: load, use, discard, repeat
breadth
depth
High Performance Computing 500K core, 10 PetaFLOPS*
Petabyte scale storage
* PetaFLOP: 10^15 (a million billion) floating point operations per second
The Greatest Informatic Asset of all Time
Web
Just a portal and browser ... ... or an infrastructural asset?
Bridging HPC, Enterprise, and Web assets
Enterprise Class Discovery Environment, Atmosphere
Web
What is the infrastructural role for MODS, CODS, and the trillions of dollars in Web assets?
How does iPlant engage, leverage, and enhance the Gramenes, TAIRs, Soybases, SGNs, MaizeGDBs, PlexDBs, ..., of the world?
How does iPlant engage anything that is not a downloadable, installable Linux/MS program?
MODS: Model Organism Databases
CODS: Clade-Oriented Databases
Foundational Infrastructure iPlant Data Store, HPC, etc.
iPlant Semantic Web Program
Foundational Infrastructure iPlant Data Store, HPC, etc.
Enterprise Class Discovery Environment, Atmosphere
Web
SSWAP Semantic Integration
Semantic Pipelining
Enterprise Class Virtual Work desk
Cloud, Virtual machines
Distributed Semantic Web Services Logic-driven semantics
High Performance Computing 500K core, 10 PetaFLOPS*
Petabyte scale storage
* PetaFLOP: 10^15 (a million billion) floating point operations per second
iPlant Semantic Web Program
Web
SSWAP Semantic Integration
Semantic Pipelining
Distributed Semantic Web Services Logic-driven semantics
It is the Semantic aspect of the Semantic Web that allows us to turn the Web from an external resource into an integrated infrastructural asset.
The Actors
You
Community MODS (Model Organism Databases)
and CODS (Clade Oriented Databases)
iPlant Computational Resources (e.g., TACC)
The World Your lab
The World
Semantic Mediation Layer
Semantic Mediation
Sidney Harris © 2006
The Antibody Analogy as the Mediation Layer
SSWAP: Simple Semantic Web Architecture and Protocol (W3C OWL RDF/XML)
• Establish the framework for Web resources to describe themselves and their offerings
• Establish the framework for ontological integration
• Engage first-order, description logic reasoning
• Provide a semantically enabled Discovery Server for service and pipeline coordination
http://sswap.info/protocol
SSWAP Enables Reasoner-assisted Workflows
• A protocol allows a reasoner to connect chains of resources (services) based on logical (not just lexical) matching of what various resources consume and produce.
• Web Discovery: from (any) Web site -> semantic pipeline
• Semantic Pipeline: single chain workflows of distributed semantic Web services hosted anywhere on the Web
• Workflows constructed via reasoner-assisted, first-order subsumption matching of the output of one service into the input of another
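The difference between logical and lexical matching in the bullets above can be illustrated with a toy sketch. This is not the SSWAP reasoner or its API: the class names, the `SUBCLASS_OF` table, the `Service` type, and the helper functions are all hypothetical, and a real deployment would delegate subsumption to an OWL reasoner rather than a hand-written graph walk.

```python
from typing import Dict, List, NamedTuple, Set

# Hypothetical toy ontology: each class maps to its direct superclasses
# (its rdfs:subClassOf edges). None of these terms come from SSWAP.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "DNASequence": {"Sequence"},
    "ProteinSequence": {"Sequence"},
    "Sequence": {"Data"},
    "Alignment": {"Data"},
    "PhylogeneticTree": {"Data"},
    "Data": set(),
}


class Service(NamedTuple):
    name: str
    consumes: str  # class of input the service accepts
    produces: str  # class of output the service returns


def is_subsumed_by(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or reaches it through subClassOf edges."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(SUBCLASS_OF.get(current, ()))
    return False


def can_follow(upstream: Service, downstream: Service) -> bool:
    """The downstream service accepts whatever the upstream service produces."""
    return is_subsumed_by(upstream.produces, downstream.consumes)


def services_for(data_class: str, services: List[Service]) -> List[Service]:
    """Services whose declared input subsumes the class of the data in hand."""
    return [s for s in services if is_subsumed_by(data_class, s.consumes)]


# Hypothetical services, for illustration only.
SERVICES = [
    Service("align", consumes="Sequence", produces="Alignment"),
    Service("build_tree", consumes="Alignment", produces="PhylogeneticTree"),
    Service("render", consumes="Data", produces="Data"),
]
```

Here `services_for("DNASequence", SERVICES)` offers `align` and `render` even though neither declares `DNASequence` by name, because the data's class is subsumed by what those services consume; a purely lexical match on the class name would find nothing.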
[Figure: Resource1, Resource2, and Resource3, each with a Subject and an Object, connected by two rdfs:subClassOf links]
iPlant Semantic Architecture
[Figure: architecture diagram. Labels: Data and Service Providers; Web Resource; Data; Algorithm; Web; Semantic Broker / Discovery Server; INTERPRETER; INDIRECTION LAYER; KB; BROKER; EXPLICIT INTERFACE; REPOSITORY; RESTful API; Clients; Client; Ontologies; Ontology Servers; Protocol Ontology; OWL RDF/XML; graphs PDG, RDG, RIG, RRG, RQG; steps numbered 1 to 4]
Semantic documents described in this talk:
PDG: Provider Description Graph
RDG: Resource Description Graph
RIG: Resource Invocation Graph
RRG: Resource Response Graph
RQG: Resource Query Graph
sswap.info/example
From TreeGenes to High Performance Computing
Semantic Integration from Third-party Web sites DiversiTree
JavaScript snippet to launch data for Web Discovery with the press of a button
HTTP API: sswap.info/api
JSON -> OWL RDF/XML transformation, transparent to the user (via the SSWAP HTTP API)
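To give a feel for what a JSON-to-RDF/XML transformation involves, here is a minimal sketch using only the Python standard library. It is not the SSWAP HTTP API and does not reproduce SSWAP's JSON syntax: the input shape (a flat object with an `id` key), the `EX_NS` vocabulary, and the function name `json_to_rdfxml` are assumptions made for illustration, and the output is plain RDF/XML with literal values rather than a full OWL document.

```python
import json
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/terms#"  # hypothetical vocabulary, not a SSWAP ontology

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)


def json_to_rdfxml(json_text: str) -> str:
    """Turn a flat JSON object into one rdf:Description element.

    Assumes the object has an "id" key holding the subject URI and that
    every other key is a valid XML name; each becomes a literal property.
    """
    record = json.loads(json_text)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    node = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": record["id"]},
    )
    for key, value in record.items():
        if key == "id":
            continue
        prop = ET.SubElement(node, f"{{{EX_NS}}}{key}")
        prop.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(json_to_rdfxml(
        '{"id": "http://example.org/tree/42", "species": "Pinus taeda"}'
    ))
```

The point of the slide is that a Web developer only writes the JSON side; the semantic serialization is produced on their behalf.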
Web Discovery into Semantic Pipelines Reasoner-assisted Web workflows
Reasoner uses first-order, description logic to present services and pipelines that can operate on the data at any given step
Just-In-Time Ontology Hosting sswap.info/jit
RESTful Pipeline Execution
Data Tree View “Ontologized” data and metadata
• on-the-fly Data Tree views
• pre-defined renderers
Semantic Integration into Third-party Web sites TreeGenes’ CartograTree
Third-party Web sites can engage as renderers on result sets
TreeGenes’ Sequenced, Genotyped, and Phenotyped
Geographical browsing into the TreeGenes database
• 1,265 tree species
• 901,113 sequences
• 24,142,786 genotypes
• 19,441 phenotypes
http://dendrome.ucdavis.edu/treegenes
AmeriFlux Sites CO2, Water, Energy
http://public.ornl.gov/ameriflux
WorldClim 1 km² climate grids
http://www.worldclim.org
ArcGIS Layers
Soil layers
http://maps2.arcgisonline.com
TRY-DB Phenotypes
http://www.try-db.org
CartograTree Custom user interface data selection and analysis
• Select
• Analyze
• Web Discovery
Semantic Integration from Third-party Web sites Data slicing and contextual augmentation
Direct and Indirect Data Referencing URI dereferencing of arbitrarily large data sets
Serialize the data itself, or a URI to where the data is located
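The choice between direct and indirect referencing can be sketched as follows. This is an illustration of the idea only, not how SSWAP encodes it: the size threshold, the dictionary shape, and the names `make_reference` and `resolve` are hypothetical, and the `fetch` callable stands in for whatever HTTP client would dereference the URI.

```python
from typing import Callable, Dict

INLINE_LIMIT = 1024  # bytes; hypothetical cut-off for embedding data directly


def make_reference(data: bytes, uri: str, inline_limit: int = INLINE_LIMIT) -> Dict[str, str]:
    """Embed small payloads directly; point to large ones by URI."""
    if len(data) <= inline_limit:
        return {"kind": "literal", "value": data.decode("utf-8")}
    return {"kind": "uri", "value": uri}


def resolve(ref: Dict[str, str], fetch: Callable[[str], bytes]) -> bytes:
    """Return the payload, dereferencing the URI only when needed."""
    if ref["kind"] == "literal":
        return ref["value"].encode("utf-8")
    return fetch(ref["value"])


if __name__ == "__main__":
    # A dictionary plays the role of a remote data store in this sketch.
    store = {"http://example.org/data/big": b"A" * 5000}
    small = make_reference(b"ACGT", "http://example.org/data/small")
    large = make_reference(store["http://example.org/data/big"], "http://example.org/data/big")
    print(small["kind"], large["kind"])
    print(len(resolve(large, store.__getitem__)))
```

Passing a URI keeps the semantic document small no matter how large the data set is, at the cost of a later dereferencing step by whichever service consumes it.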
High Performance Computing Services engage like any other Web services
Custom Service Parameterization
Publishing Pipelines Private data, shared service parameterization
Manage and Publish your Pipelines
Phylogenetics Pipeline runs are persisted in OWL and can start new pipelines
TreeViz Multi-location, multi-institution, Web/HPC run
Quick Vitals
• 185,000 lines of code
• 100+ libraries
• Open source
• Free SDK (Software Development Kit) for semantics and reasoning on your servers; run pipeline manager on our servers
• More info: sswap.info/wiki
Acknowledgements
Special thanks to:
• iPlant Collaborative
• UC Davis Dendrome / TreeGenes
• Semantic Web engineering by Clark and Parsia, LLC
• NSF grants #0943879 and #EF-0735191