Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher...
-
Upload
julius-bradford -
Category
Documents
-
view
215 -
download
0
Transcript of Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher...
![Page 1: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/1.jpg)
Semantic Technologies: Towards Semantic Technologies: Towards
Making a Difference in Scientific Data Making a Difference in Scientific Data
Management Management
Bertram Bertram LudLudääscherscher
([email protected])([email protected])
San Diego Supercomputer Center
Associate ProfessorAssociate ProfessorDept. of Computer Science & Genome Dept. of Computer Science & Genome
CenterCenterUniversity of California, DavisUniversity of California, Davis
FellowFellow
San Diego Supercomputer CenterSan Diego Supercomputer CenterUniversity of California, San DiegoUniversity of California, San Diego
UC DAVISDepartment ofComputer Science
![Page 2: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/2.jpg)
2DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
OutlineOutline
• Semantics & Scientific Data IntegrationSemantics & Scientific Data Integration
• Semantics & Scientific Workflow ManagementSemantics & Scientific Workflow Management
• ConclusionsConclusions
![Page 3: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/3.jpg)
3DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Anatomy of the Science Anatomy of the Science Environment for Ecological Environment for Ecological
Knowledge (SEEK) Collaboratory Knowledge (SEEK) Collaboratory • Domain Science DriverDomain Science Driver
– Ecology (LTER), biodiversity, …
• Analysis & Modeling SystemAnalysis & Modeling System– Design & execution of ecological
models & analysis (“scientific workflows”)
– {application,upper}-ware Kepler system
• Semantic Mediation SystemSemantic Mediation System– Data Integration of hard-to-relate
sources and processes– Semantic Types and Ontologies– upper middleware Sparrow Toolkit
• EcoGridEcoGrid – Access to ecology data and tools– {middle,under}-ware unified API to SRB/MCAT,
MetaCat, DiGIR, … datasetssample CS problem [DILS’04]
![Page 4: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/4.jpg)
4DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Common Collaboratories / Distributed Common Collaboratories / Distributed Science / Cyberinfrastructure PiecesScience / Cyberinfrastructure Pieces
• Seamless and uniform data access (“Data-Grid”)Seamless and uniform data access (“Data-Grid”)– data & metadata registry
• distributed and high performance computing distributed and high performance computing platform (“Compute-Grid”)platform (“Compute-Grid”)– service registry
• User-friendly workbench / problem-solving User-friendly workbench / problem-solving environmentenvironment– scientific workflow system
• A common problem:A common problem:– integrating (or at least linking) data from multiple sites,
investigators, communities, …, scales, …, species, …
Federated, integrated, mediated databasesFederated, integrated, mediated databases– often use of semantic extensions (e.g. ontologies)
![Page 5: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/5.jpg)
5DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Interoperability & Integration Interoperability & Integration ChallengesChallenges
• System aspects: “Grid” Middleware• distributed data & computing, SOA• web services, WSDL/SOAP, WSRF, OGSA, …• sources = functions, files, data sets …
• Syntax & Structure: (XML-Based) Data Mediators
• wrapping, restructuring • (XML) queries and views• sources = (XML) databases
• Semantics: Model-Based/Semantic Mediators
• conceptual models and declarative views • Knowledge Representation: ontologies,
description logics (RDF(S),OWL ...)• sources = knowledge bases (DB+CMs+ICs)
• Synthesis: Scientific Workflow Design & Execution
• Composition of declarative and procedural components into larger workflows
• (re)sources = services, processes, actors, … • Semantic extensions needed here as well!
reconciling reconciling SS55 heterogeneitiesheterogeneities
““gluing” together gluing” together resources resources
bridging information and bridging information and knowledge gaps knowledge gaps computationallycomputationally
![Page 6: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/6.jpg)
6DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Information Integration Challenges: Information Integration Challenges: SS44 Heterogeneities Heterogeneities• SSystemystem aspects aspects
– platforms, devices, data & service distribution, APIs, protocols, … Grid middleware technologies + e.g. single sign-on, platform independence, transparent use of remote
resources, …
• SSyntaxyntax & & SStructuretructure– heterogeneous data formats (one for each tool ...)– heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files,
…) – heterogeneous schemas (one for each DB ...) Database mediation and warehousing technologies+ XML-based data exchange, integrated views, transparent query
rewriting, …
• SSemanticsemantics– descriptive metadata, different terminologies, implicit assumptions &
hidden semantics (“context”) of experiments, simulations, observation, …
Knowledge representation & semantic mediation technologies+ “smart” data discovery & integration+ e.g. ask about X (‘mafic’); find data about Y (‘diorite’); be happy
anyways!
![Page 7: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/7.jpg)
7DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Information Integration Challenges: Information Integration Challenges: SS55 Heterogeneities Heterogeneities
• SSynthesisynthesis of applications, analysis tools, data & of applications, analysis tools, data & query components, … into “query components, … into “scientific workflowsscientific workflows” ” – How to make use of these wonderful things & put them
together to solve a scientist’s problem? Scientific Problem Solving Environments (PSEs)Scientific Problem Solving Environments (PSEs)
Portals,Workbench (“scientist’s view”, end user)+ ontology-enhanced data registration, discovery, manipulation+ creation and registration of new data products from existing
ones, … Scientific Workflow System (“engineer’s view”, tool
maker)+ for designing, re-engineering, deploying analysis pipelines
and scientific workflows; a tool to make new tools … + e.g., creation of new datasets from existing ones, dataset
registration, …
Not discussed here: the “6th S”: Social challenges …
![Page 8: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/8.jpg)
8DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Our FocusOur Focus• Scientific Data Integration: Scientific Data Integration:
– need DB/DI + KR (“semantic mediation”)
• Automation of Scientific Data Analysis, Automation of Scientific Data Analysis, Process & Application IntegrationProcess & Application Integration– need for scientific workflow systems– need for semantic extensions
• But first:But first:– Some data & information integration
problems
![Page 9: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/9.jpg)
9DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
An Online Shopper’s Information Integration An Online Shopper’s Information Integration ProblemProblem
El Cheapo: “Where can I get the cheapest copy (including shipping cost) of El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logicus-Philosophicus within a week?” Wittgenstein’s Tractatus Logicus-Philosophicus within a week?”
??Information Information IntegrationIntegration
addall.comaddall.com
“One-World”Mediation
amazon.comamazon.comamazon.comamazon.com A1books.comA1books.comA1books.comA1books.comhalf.comhalf.comhalf.comhalf.combarnes&noble.combarnes&noble.combarnes&noble.combarnes&noble.com
Mediator (virtual DB)Mediator (virtual DB)(vs. Datawarehouse)(vs. Datawarehouse)NOTE: non-trivial NOTE: non-trivial
data engineering challenges!data engineering challenges!
![Page 10: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/10.jpg)
10DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
A Home Buyer’s Information Integration A Home Buyer’s Information Integration ProblemProblem
What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood
with below-average crime rate and diverse population?
?Information Integration
RealtorRealtor DemographicsDemographicsSchool RankingsSchool RankingsCrime StatsCrime Stats
“Multiple-Worlds”Mediation
![Page 11: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/11.jpg)
11DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Information Integration from a Information Integration from a Database Perspective Database Perspective
• Information Integration ProblemInformation Integration Problem– Given: data sources S1, ..., Sk (databases, web sites, ...)
and user questions Q1,..., Qn that can –in principle– be answered using the information in the Si
– Find: the answers to Q1, ..., Qn
• The Database Perspective: The Database Perspective: source = source = “database”“database” Si has a schema (relational, XML, OO, ...) Si can be queried define virtual (or materialized) integrated (or
global) view G over local sources S1 ,..., Sk using database query languages (SQL, XQuery,...)
questions become queries Qi against G(S1,..., Sk)
![Page 12: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/12.jpg)
12DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
5. Post processing5. Post processing2. Query rewriting2. Query rewriting
Standard Mediator ArchitectureStandard Mediator Architecture
MEDIATORMEDIATOR
Integrated Global(XML) View G
Integrated ViewDefinition
G(..) S1(..)…Sk(..)
USER/ClientUSER/Client
1. Query Q ( G (S1. Query Q ( G (S11,..., S,..., Skk) )) )
S1
Wrapper
(XML) View
S2
Wrapper
(XML) View
Sk
Wrapper
(XML) Viewweb services as wrapper APIs
3. Q1 Q2 Q33. Q1 Q2 Q3
4. {answers(Q1)} {answers(Q2)} {answers(Q3)}4. {answers(Q1)} {answers(Q2)} {answers(Q3)}
6. {answers(Q)}6. {answers(Q)}
![Page 13: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/13.jpg)
13DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Query Planning in Data Query Planning in Data IntegrationIntegration
• GivenGiven: : – Declarative user query Q: answer(…) …G ...– … & { G … S … } global-as-view (GAV)– … & { S … G … } local-as-view (LAV)– … & { ic(…) … S … G… } integrity constraints (ICs)
• FindFind: : – equivalent (or minimal containing, maximal contained) query plan Q’: answer(…) … S …
query rewriting (logical/calculus, algebraic, physical levels)
• ResultsResults::– A variety of results/algorithms; depending on classes of
queries, views, and ICs: P, NP, … , undecidable– hot research area in core CS (database community)
![Page 14: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/14.jpg)
14DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
A Neuroscientist’s A Neuroscientist’s Information Integration Information Integration
ProblemProblemWhat is the cerebellar distribution of rat proteins with more than
70% homology with human NCS-1? Any structure specificity?How about other rodents?
?Information Integration
protein localization(NCMIR)
neurotransmission(SENSELAB)
sequence info(CaPROT)
morphometry(SYNAPSE)
“Complex Multiple-Worlds”
Mediation
Biomedical InformaticsBiomedical InformaticsResearch NetworkResearch Networkhttp://nbirn.nethttp://nbirn.net
Inter-source links:• unclear for the non-scientists• hard for the scientist
![Page 15: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/15.jpg)
15DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
![Page 16: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/16.jpg)
16DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Scientific Data Integration Scientific Data Integration using using
Semantic ExtensionsSemantic Extensions
![Page 17: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/17.jpg)
17DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Example: Geologic Map Example: Geologic Map IntegrationIntegration
• GivenGiven: : – Geologic maps from different state geological
surveys (shapefiles w/ different data schemas)– Different ontologies:
• Geologic age ontology (e.g. USGS)• Rock classification ontologies:
– Multiple hierarchies (chemical, fabric, texture, genesis) from Geological Survey of Canada (GSC)
– Single hierarchy from British Geological Survey (BGS)
• ProblemProblem::– Support uniform queries across all maps – … possibly using different ontologies– Support registration w/ ontology A, querying
w/ ontology B
![Page 18: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/18.jpg)
18DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
IntegrationSchema
Schema Integration Schema Integration (“registering” (“registering” local schemas to the global schema)local schemas to the global schema)
Arizona
Colorado
Utah
Nevada
Wyoming
New Mexico
Montana E.
Idaho
Montana West
FormationFormation ……
AgeAge ……
FormationFormation ……
AgeAge ……
FormationFormation ……
AgeAge ……
FormationFormation ……
AgeAge ……
FormationFormation ……
AgeAge ……
FormationFormation ……
AgeAge ……
FormationFormation ……
AgeAge ……
…… FormationFormation
…… AgeAge
…… CompositionComposition
…… FabricFabric
…… TextureTexture
…… FormationFormation
…… AgeAge
…… CompositionComposition
…… FabricFabric
…… TextureTexture
ABBREV
PERIOD
PERIOD
NAME
PERIOD
TYPE
TIME_UNIT
FMATN
PERIOD
NAME
PERIOD
NAME
FORMATION
PERIOD
FORMATION
FORMATION
LITHOLOGY
LITHOLOGY
AGE
AGE
andesitic sandstone
Livingston formation
Tertiary-Cretaceous
Sources
Sources
![Page 19: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/19.jpg)
19DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Multihierarchical Rock Classification Multihierarchical Rock Classification “Ontology” (Taxonomies) for “Thematic “Ontology” (Taxonomies) for “Thematic
Queries” (GSC)Queries” (GSC)
Composition
Genesis
Fabric
Texture
![Page 20: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/20.jpg)
20DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Ontology-Enabled Application Example:Ontology-Enabled Application Example:Geologic Map IntegrationGeologic Map Integration
Show formations where AGE = ‘Paleozic’
(without age ontology)
Show formations where AGE = ‘Paleozic’
(without age ontology)
Show formations where AGE = ‘Paleozic’
(with age ontology)
Show formations where AGE = ‘Paleozic’
(with age ontology)
+/- a few hundred million years
domainknowledge
domainknowledge
Knowledge r
epresentatio
n
Geologic Age
ONTOLOGY
NevadaNevada
![Page 21: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/21.jpg)
21DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Querying by Geologic Age … Querying by Geologic Age …
![Page 22: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/22.jpg)
22DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Querying by Geologic Age: Querying by Geologic Age: ResultsResults
![Page 23: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/23.jpg)
23DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Semantic Mediation Semantic Mediation (via “semantic (via “semantic registration” of schemas and ontology registration” of schemas and ontology
articulations)articulations)
• Schema elements and/or data values are Schema elements and/or data values are associated with concept expressions from associated with concept expressions from the target ontologythe target ontology conceptual queries “through” the ontology
• Articulation ontology Articulation ontology source registration to A, querying through B
• Semantic mediation: query rewriting w/ ontologiesSemantic mediation: query rewriting w/ ontologies
Database1
Database2
Ontology A
Ontology B
semanticregistration
ontology articulations
Concept-based(“semantic”)
queries
semanticregistration
![Page 24: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/24.jpg)
24DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Different views on State Different views on State Geological MapsGeological Maps
![Page 25: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/25.jpg)
25DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Sedimentary Rocks: BGS Sedimentary Rocks: BGS OntologyOntology
![Page 26: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/26.jpg)
26DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Sedimentary Rocks: GSC Sedimentary Rocks: GSC OntologyOntology
![Page 27: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/27.jpg)
27DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Some Thoughts … Some Thoughts … • Translate this idea of multiple conceptual (ontology) Translate this idea of multiple conceptual (ontology)
views to views to your your domain!domain!– e.g. datasets biological pathways registration
• Your data is valuable (time & $$$ spent in producing it)Your data is valuable (time & $$$ spent in producing it)data (re-)usabilitydata (re-)usability
• Metadata helps to discover, localize, assess relevant data sets, Metadata helps to discover, localize, assess relevant data sets, given particular scientific questions & queriesgiven particular scientific questions & queries
• Does your system “understand” what to do with the metadata? Does your system “understand” what to do with the metadata? • Capturing more semanticsCapturing more semantics of a data set in a way that humans of a data set in a way that humans
and systems can exploit it is an and systems can exploit it is an investment in reusabilityinvestment in reusability– “We are producing more and more data”– Today “we can store everything!”– But can we use anything? (i.e., is anyone looking at the data after
the initial creation?)• Design system, interfaces, data and metadata models with Design system, interfaces, data and metadata models with
reusability in mind (think archives and “time capsules”)reusability in mind (think archives and “time capsules”)• This may even be pushed to the experiment/simulation/workflow This may even be pushed to the experiment/simulation/workflow
design…design…
![Page 28: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/28.jpg)
28DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Data Semantics and Ontologies should be Data Semantics and Ontologies should be useful for Humans and “The Machine” useful for Humans and “The Machine”
![Page 29: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/29.jpg)
29DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscherformalized as domain map/ontology
Purkinje cells and Pyramidal cells have dendritesthat have higher-order branches that contain spines.Dendritic spines are ion (calcium) regulating components.Spines have ion binding proteins. Neurotransmissioninvolves ionic activity (release). Ion-binding proteinscontrol ion activity (propagation) in a cell. Ion-regulatingcomponents of cells affect ionic activity (release).
domain expert knowledge
Made usable for the system using Description Logic
Example: Domain Example: Domain KnowledgeKnowledge to “glue” to “glue”
SYNAPSESYNAPSE & & NCMIR DataNCMIR Data
![Page 30: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/30.jpg)
30DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
““Semantic Source Browsing”: Semantic Source Browsing”: Domain Maps/Ontologies (left) & conceptually linked data (right)Domain Maps/Ontologies (left) & conceptually linked data (right)
![Page 31: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/31.jpg)
31DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
A Semantic Mediation A Semantic Mediation Result ViewResult View
![Page 32: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/32.jpg)
32DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Source Contextualization Source Contextualization through Ontology Refinementthrough Ontology Refinement
In addition to registering (“hanging off”) data relative toexisting concepts, a source may also refine the mediator’s domain map...
sources can register new concepts at the mediator ...
increase your data usability
![Page 33: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/33.jpg)
33DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
OutlineOutline
• Semantics & Scientific Data IntegrationSemantics & Scientific Data Integration
• Semantics & Scientific Workflow ManagementSemantics & Scientific Workflow Management
• ConclusionsConclusions
![Page 34: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/34.jpg)
34DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
What is a Scientific Workflow What is a Scientific Workflow (SWF)?(SWF)?
• GoalsGoals::– automate a scientist’s repetitive data management
and analysis tasks – typical phases:
• data access, scheduling, generation, transformation, aggregation, analysis, visualization
design, test, share, deploy, execute, reuse, … SWFs
![Page 35: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/35.jpg)
35DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Promoter Identification Promoter Identification WorkflowWorkflow
Source: Matt Coleman (LLNL)Source: Matt Coleman (LLNL)
![Page 36: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/36.jpg)
36DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Source: NIH BIRN (Jeffrey Grethe, UCSD)Source: NIH BIRN (Jeffrey Grethe, UCSD)
![Page 37: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/37.jpg)
37DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Ecology: GARP Analysis Ecology: GARP Analysis Pipeline for Invasive Species Pipeline for Invasive Species
PredictionPrediction
Training sample
(d)
GARPrule set
(e)
Test sample (d)
Integrated layers
(native range) (c)
Speciespresence &
absence points(native range)
(a)EcoGridQuery
EcoGridQuery
LayerIntegration
LayerIntegration
SampleData
+A3+A2
+A1
DataCalculation
MapGeneration
Validation
User
Validation
MapGeneration
Integrated layers (invasion area) (c)
Species presence &absence points
(invasion area) (a)
Native range
predictionmap (f)
Model qualityparameter (g)
Environmental layers (native
range) (b)
GenerateMetadata
ArchiveTo Ecogrid
RegisteredEcogrid
Database
RegisteredEcogrid
Database
RegisteredEcogrid
Database
RegisteredEcogrid
Database
Environmental layers (invasion
area) (b)
Invasionarea prediction
map (f)
Model qualityparameter (g)
Selectedpredictionmaps (h)
Source: NSF SEEK (Deana Pennington et. al, UNM)Source: NSF SEEK (Deana Pennington et. al, UNM)
![Page 38: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/38.jpg)
38DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
![Page 39: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/39.jpg)
39DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Commercial & Open Source Commercial & Open Source Scientific “Workflow” (often Scientific “Workflow” (often DataflowDataflow) Systems) Systems
Kensington Discovery Edition from InforSense
Taverna
Triana
![Page 40: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/40.jpg)
40DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
SCIRun: Problem Solving Environments SCIRun: Problem Solving Environments for Large-Scale Scientific Computingfor Large-Scale Scientific Computing
• SCIRun: PSE for interactive construction, debugging, and SCIRun: PSE for interactive construction, debugging, and steering of large-scale scientific computations and visualizationssteering of large-scale scientific computations and visualizations
• Component model, based on generalized dataflow programmingComponent model, based on generalized dataflow programming
Steve Parker (cs.utah.edu)Steve Parker (cs.utah.edu)
![Page 41: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/41.jpg)
41DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Ptolemy II
see!see!see!see!
try!try!try!try!
read!read!read!read!
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/
![Page 42: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/42.jpg)
42DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Why Ptolemy II (and thus KEPLER)?Why Ptolemy II (and thus KEPLER)?• Ptolemy II Objective:Ptolemy II Objective:
– “The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation.”
• Dataflow Process Networks w/ natural support for Dataflow Process Networks w/ natural support for abstractionabstraction, , pipeliningpipelining (streaming) (streaming) actor-orientation, actor-orientation, actor actor reusereuse
• User-OrientationUser-Orientation– Workflow design & exec console (Vergil GUI)– “Application/Glue-Ware”
• excellent modeling and design support• run-time support, monitoring, …• not a middle-/underware (we use someone else’s, e.g. Globus, SRB, …)• but middle-/underware is conveniently accessible through actors!
• PRAGMATICSPRAGMATICS– Ptolemy II is mature, continuously extended & improved, well-documented
(500+pp) – open source system– Ptolemy II folks actively participate in KEPLER
![Page 43: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/43.jpg)
43DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
KEPLER/KEPLER/CSPCSP: : CContributors, ontributors, SSponsors, ponsors, PProjectsrojects(or loosely coupled (or loosely coupled CCommunicating ommunicating SSequential equential PPersons ;-)ersons ;-)
Ilkay Altintas Ilkay Altintas SDM, ResurgenceSDM, Resurgence
Kim Baldridge Kim Baldridge Resurgence, NMIResurgence, NMI
Chad Berkley Chad Berkley SEEKSEEK
Shawn Bowers Shawn Bowers SEEKSEEK
Terence Critchlow Terence Critchlow SDMSDM
Tobin Fricke Tobin Fricke ROADNetROADNet
Jeffrey Grethe Jeffrey Grethe BIRNBIRN
Christopher H. Brooks Christopher H. Brooks Ptolemy IIPtolemy II
Zhengang Cheng Zhengang Cheng SDMSDM
Dan Higgins Dan Higgins SEEKSEEK
Efrat Jaeger Efrat Jaeger GEONGEON
Matt Jones Matt Jones SEEKSEEK
Werner Krebs, Werner Krebs, EOLEOL
Edward A. Lee Edward A. Lee Ptolemy IIPtolemy II
Kai Lin Kai Lin GEONGEON
Bertram Ludaescher Bertram Ludaescher SDM, SEEKSDM, SEEK, , GEONGEON, , BIRN,BIRN, ROADNetROADNet
Mark Miller Mark Miller EOLEOL
Steve Mock Steve Mock NMINMI
Steve Neuendorffer Steve Neuendorffer Ptolemy IIPtolemy II
Jing Tao Jing Tao SEEKSEEK
Mladen Vouk Mladen Vouk SDMSDM
Xiaowen Xin Xiaowen Xin SDMSDM
Yang Zhao Yang Zhao Ptolemy IIPtolemy II
Bing Zhu Bing Zhu SEEKSEEK
••••••
Ptolemy IIPtolemy II
www.kepler-project.orgwww.kepler-project.org
![Page 44: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/44.jpg)
44DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
KEPLER: An Open CollaborationKEPLER: An Open Collaboration• Initiated by members from DOE SDM/SPA and NSF SEEK; now several Initiated by members from DOE SDM/SPA and NSF SEEK; now several
other projects (GEON, Ptolemy II, EOL, Resurgence/NMI, …)other projects (GEON, Ptolemy II, EOL, Resurgence/NMI, …)• Open Source (BSD-style license)Open Source (BSD-style license)• Intensive Communications: Intensive Communications:
– Web-archived mailing lists– IRC (!)– Meetings, Hackathons
• Co-development: Co-development: – via shared CVS repository– joining as a new co-developer (currently):
• get a CVS account (read-only)• local development + contribution via existing KEPLER member• be voted “in” as a member/co-developer
• Software & social engineeringSoftware & social engineering– How to better accommodate new groups/communities?– How to better accommodate different usage/contribution models (core dev …
special purpose extender … user)?
![Page 45: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/45.jpg)
45DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Ptolemy II/KEPLER GUI (Vergil)Ptolemy II/KEPLER GUI (Vergil)“Directors” define the component interaction & execution semantics
Large, polymorphic component (“Actors”) and Directors libraries (drag & drop)
![Page 46: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/46.jpg)
46DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Web Services Web Services Actors Actors (WS Harvester)(WS Harvester)
12
3
4
“Minute-made” (MM) WS-based application integration• Similarly: MM workflow design & sharing w/o implemented
components
![Page 47: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/47.jpg)
47DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Rapid Web Service-based Prototyping Rapid Web Service-based Prototyping
(Here: ROADNet Command & Control Services for LOOKING Kick-Off Mtg)(Here: ROADNet Command & Control Services for LOOKING Kick-Off Mtg)
Source: Ilkay Altintas, SDM, NLADRROADNet: Vernon, Orcutt et al
Web services: Tony Fountain et al
Source: Ilkay Altintas, SDM, NLADRROADNet: Vernon, Orcutt et al
Web services: Tony Fountain et al
![Page 48: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/48.jpg)
48DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
An “early” example: An “early” example: Promoter Identification Promoter Identification
SSDBM, AD 2003SSDBM, AD 2003
• Scientist models application as a “workflow” of connected components (“actors”)
• If all components exist, the workflow can be automated/ executed
• Different directors can be used to pick appropriate execution model (often “pipelined” execution: PN director)
![Page 49: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/49.jpg)
PIW Workflow Today
![Page 50: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/50.jpg)
50DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Enter initial inputs,
Run
and
Display results
“Run Window”
![Page 51: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/51.jpg)
51DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Custom Output Visualizer
![Page 52: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/52.jpg)
52DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Job Management (here: Job Management (here: NIMROD)NIMROD)
• Job management infrastructure in place• Results database: under development• Goal: 1000’s of GAMESS jobs (quantum mechanics)
![Page 53: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/53.jpg)
53DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Some Recent Actor AdditionsSome Recent Actor Additions
![Page 54: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/54.jpg)
54DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
in KEPLER in KEPLER (w/ editable (w/ editable script)script)
Source: Dan Higgins, Kepler/SEEKSource: Dan Higgins, Kepler/SEEK
![Page 55: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/55.jpg)
55DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
in KEPLER in KEPLER (interactive session)(interactive session)
Source: Dan Higgins, Kepler/SEEKSource: Dan Higgins, Kepler/SEEK
![Page 56: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/56.jpg)
56DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Blurring Blurring Design (ToDo)Design (ToDo) and Execution and Execution
![Page 57: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/57.jpg)
57DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Some Scientific Workflow Some Scientific Workflow ChallengesChallenges
• Typical FeaturesTypical Features– data-intensive and/or compute-intensive– plumbing-intensive (consecutive web services won’t
fit)– dataflow-oriented– distributed (remote data, remote processing)– user-interaction “in the middle”, …– … vs. (C-z; bg; fg)-ing (“detach” and reconnect)– advanced programming constructs (map(f), zip,
takewhile, …)– logging, provenance, “registering back”
(intermediate) products…
![Page 58: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/58.jpg)
58DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Scientific Workflows & Scientific Workflows & SemanticsSemantics• Registering data to ontologies: semantic Registering data to ontologies: semantic
types (in addition to structural data types)types (in addition to structural data types)• Smarter data set discovery & integrationSmarter data set discovery & integration• Now also: Now also:
– Smarter workflow design– More “intelligent” (semantics-aware)
component composition– Improved (re-)usability of data, services
(actors), and workflows– Given semantic type of my input ports, what
other data sets / actors produce such input
![Page 59: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/59.jpg)
59DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Reengineering a Geoscientist’s Mineral Classification Workflow
Add semantic types to ports!!
![Page 60: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/60.jpg)
60DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Beginnings: Ontology-based Beginnings: Ontology-based Actor/Service DiscoveryActor/Service Discovery
Ontology based actor (service) and dataset
search
Result Display
![Page 61: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/61.jpg)
61DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Semantics & Scientific WorkflowsSemantics & Scientific Workflows
Data comes from Data comes from heterogeneousheterogeneous sources sources– Real-world observations– Spatial-temporal contexts– Collection/measurement protocols and procedures– Many representations for the same information (count,
area, density)– Schematically heterogeneous
Data discovered and “synthesized” Data discovered and “synthesized” manuallymanually
Hard to reuse/repurpose existing analytical Hard to reuse/repurpose existing analytical steps (another form of heterogeneity)steps (another form of heterogeneity)
![Page 62: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/62.jpg)
62DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
The KR/SMS “Waterfall” … The KR/SMS “Waterfall” …
Ontologies
SemanticAnnotation
IterativeDevelopment
Resource Discovery
ResourceIntegration
WorkflowPlanning
WorkflowAnalysis
Source: [Bowers-SEEK-AHM-04]Source: [Bowers-SEEK-AHM-04]
![Page 63: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/63.jpg)
63DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
A KR+DI+Scientific Workflow A KR+DI+Scientific Workflow ProblemProblem
• Services can be Services can be semantically compatiblesemantically compatible, , but but structurally incompatiblestructurally incompatible
SourceServiceSourceService
TargetServiceTargetService
Ps Pt
SemanticType Ps
SemanticType Ps
SemanticType Pt
SemanticType Pt
StructuralType Pt
StructuralType Pt
StructuralType Ps
StructuralType Ps
Desired Connection
Incompatible
Compatible
(⋠)
(⊑)
(Ps)(Ps) (≺)
Ontologies (OWL)Ontologies (OWL)
Source: [Bowers-Ludaescher, DILS’04]Source: [Bowers-Ludaescher, DILS’04]
![Page 64: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/64.jpg)
64DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Ontology-Informed Data Ontology-Informed Data Transformation Transformation (“Structure-Shim”)(“Structure-Shim”)
SourceServiceSourceService
TargetServiceTargetService
Ps Pt
SemanticType Ps
SemanticType Ps
SemanticType Pt
SemanticType Pt
StructuralType Pt
StructuralType Pt
StructuralType Ps
StructuralType Ps
Desired Connection
Compatible ( )⊑
RegistrationMapping (Output)
RegistrationMapping (Input)
CorrespondenceCorrespondence
Generate (Ps)(Ps)
Ontologies (OWL)Ontologies (OWL)
Transformation
Source: [Bowers-Ludaescher, DILS’04]Source: [Bowers-Ludaescher, DILS’04]
![Page 65: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/65.jpg)
65DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
OutlineOutline
• Scientific Data IntegrationScientific Data Integration
• Scientific Workflow ManagementScientific Workflow Management
• Musings & Conclusions Musings & Conclusions
![Page 66: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/66.jpg)
66DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Some Thoughts … Some Thoughts … • Translate this idea of multiple conceptual (ontology) Translate this idea of multiple conceptual (ontology)
views to views to your your domain!domain!– e.g. datasets biological pathways registration
• Your data is valuable (time & $$$ spent in producing it)Your data is valuable (time & $$$ spent in producing it)data (re-)usabilitydata (re-)usability
• Metadata helps to discover, localize, assess relevant data sets, Metadata helps to discover, localize, assess relevant data sets, given particular scientific questions & queriesgiven particular scientific questions & queries
• Does your system “understand” what to do with the metadata? Does your system “understand” what to do with the metadata? • Capturing more semanticsCapturing more semantics of a data set in a way that humans of a data set in a way that humans
and systems can exploit it is an and systems can exploit it is an investment in reusabilityinvestment in reusability– “We are producing more and more data”– Today “we can store everything!”– But can we use anything? (i.e., is anyone looking at the data after
the initial creation?)• Design system, interfaces, data and metadata models with Design system, interfaces, data and metadata models with
reusability in mind (think archives and “time capsules”)reusability in mind (think archives and “time capsules”)• This may even be pushed to the experiment/simulation/workflow This may even be pushed to the experiment/simulation/workflow
design…design…
![Page 67: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/67.jpg)
67DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
The FutureThe Future• We start to see the benefits of We start to see the benefits of semantic semantic
technologiestechnologies in scientific data in scientific data managementmanagement– BTW: semantic technologies have been
there for a while!• think conceptual models, ER diagrams, …• or Gottlob Frege (German mathematician,
logician, philosopher; 1848-1925) • Today: momentum through “Semantic Web”,
“Semantic Grid”
• Where will semantics lead us in 10, 20, Where will semantics lead us in 10, 20, 50 years?50 years?
![Page 68: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/68.jpg)
68DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
A 50 year forecast in retrospectiveA 50 year forecast in retrospective(even if a hoax … you get the idea…)(even if a hoax … you get the idea…)
![Page 69: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/69.jpg)
69DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
KEPLER – a Collaboration KEPLER – a Collaboration ExampleExample
• A A grass-rootsgrass-roots project project– Needed a coalition of the (really!) willing – People matter!
• Intra-project Intra-project linkslinks– e.g. in SEEK: AMS SMS EcoGrid
• Inter-projectInter-project links links– SEEK ITR, GEON ITR, ROADNet ITRs, DOE SciDAC SDM,
Ptolemy II, NIH BIRN (coming we hope …), UK eScience myGrid, …
• Inter-technologyInter-technology links links– Globus, SRB, JDBC, web services, soaplab services, command
line tools, R, GRASS, XSLT, …
• InterdisciplinaryInterdisciplinary links links– CS, IT, domain sciences, … (recently: usability engineer)
![Page 70: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/70.jpg)
70DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
GEON Dataset Generation & GEON Dataset Generation & RegistrationRegistration
(a co-development in KEPLER)(a co-development in KEPLER)
Xiaowen (SDM)
Edward et al.(Ptolemy)
Yang (Ptolemy)
Efrat(GEON)
Ilkay(SDM)
SQL database access (JDBC)Matt,Chad,
Dan et al. (SEEK)
% Makefile$> ant run
% Makefile$> ant run
![Page 71: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/71.jpg)
71DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Summary/Lessons Learned Summary/Lessons Learned • Semantics mattersSemantics matters• Collaboration tools neededCollaboration tools needed
– CVS repositories (+cvsview, webcvs)– Mailing lists (e.g. mailman googlified) – Bugzilla (detailed tracking of tech. issues & bugs)– WIKI (community authored web resource, e.g. high-level tech.
issues)• People matterPeople matter• Repositories matterRepositories matter
– EcoGrid (SEEK) registry, GEON registry, BIRN registry KEPLER actor & datasets repository, …– UDDI what?
• “ “Melting Pots”: Melting Pots”: – Places, projects, organizations (GGF), tools:
• National Labs, …, SDSC, NCEAS, LTER, NLADR (w/ NCSA), KU Specify, …, new Genome Center@UC Davis (moving in …), …
• SDM, BIRN, GEON, SEEK, … • Kepler, …
![Page 72: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/72.jpg)
72DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Q & AQ & A
![Page 73: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/73.jpg)
73DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Further ReadingFurther Reading
under review – available upon request from [email protected]
![Page 74: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/74.jpg)
74DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Related PublicationsRelated Publications• Semantic Data Registration and IntegrationSemantic Data Registration and Integration• On Integrating Scientific Resources through Semantic RegistrationOn Integrating Scientific Resources through Semantic Registration, S. Bowers, K. Lin, , S. Bowers, K. Lin,
and B. Ludäscher, and B. Ludäscher, 16th International Conference on Scientific and Statistical Database 16th International Conference on Scientific and Statistical Database ManagementManagement ( (SSDBM'04SSDBM'04), 21-23 June 2004, Santorini Island, Greece. ), 21-23 June 2004, Santorini Island, Greece.
• A System for Semantic Integration of Geologic Maps via A System for Semantic Integration of Geologic Maps via OntologiesOntologies, K. Lin and B. , K. Lin and B. Ludäscher. In Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific DataSemantic Web Technologies for Searching and Retrieving Scientific Data ( (SCISWSCISW), Sanibel Island, Florida, 2003. ), Sanibel Island, Florida, 2003.
• Towards a Generic Framework for Semantic Registration of Scientific DataTowards a Generic Framework for Semantic Registration of Scientific Data, S. Bowers , S. Bowers and B. Ludäscher. In and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Semantic Web Technologies for Searching and Retrieving Scientific DataScientific Data ( (SCISWSCISW), Sanibel Island, Florida, 2003. ), Sanibel Island, Florida, 2003.
• The Role of XML in Mediated Data Integration Systems with Examples from Geological (MThe Role of XML in Mediated Data Integration Systems with Examples from Geological (Map) Data Interoperabilityap) Data Interoperability, B. Brodaric, B. Ludäscher, and K. Lin. In , B. Brodaric, B. Ludäscher, and K. Lin. In Geological Society of America (GSA) Geological Society of America (GSA) Annual MeetingAnnual Meeting, volume 35(6), November 2003. , volume 35(6), November 2003.
• Semantic Mediation Services in Geologic Data Integration: A Case Study from the GEON GSemantic Mediation Services in Geologic Data Integration: A Case Study from the GEON Gridrid, K. Lin, B. Ludäscher, B. Brodaric, D. Seber, C. Baru, and K. A. Sinha. In , K. Lin, B. Ludäscher, B. Brodaric, D. Seber, C. Baru, and K. A. Sinha. In Geological Society of America (GSA) Annual MeetingGeological Society of America (GSA) Annual Meeting, volume 35(6), November 2003. , volume 35(6), November 2003.
• Query Planning and RewritingQuery Planning and Rewriting• Processing First-Order Queries under Limited Access PatternsProcessing First-Order Queries under Limited Access Patterns, Alan Nash and B. , Alan Nash and B.
Ludäscher, Ludäscher, Proc. 23rd ACM Symposium on Principles of Database SystemsProc. 23rd ACM Symposium on Principles of Database Systems ( (PODS'04PODS'04) ) Paris, France, June 2004. Paris, France, June 2004.
• Processing Unions of Conjunctive Queries with Negation under Limited Access PatternsProcessing Unions of Conjunctive Queries with Negation under Limited Access Patterns, , Alan Nash and B. Ludäscher., Alan Nash and B. Ludäscher., 9th Intl. Conference on Extending Database Technology9th Intl. Conference on Extending Database Technology ((EDBT'04EDBT'04) Heraklion, Crete, Greece, March 2004, LNCS 2992. ) Heraklion, Crete, Greece, March 2004, LNCS 2992.
• Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negationwith Union and Negation, B. Ludäscher and Alan Nash. Research abstract (poster), , B. Ludäscher and Alan Nash. Research abstract (poster), 20th Intl. Conference on 20th Intl. Conference on Data EngineeringData Engineering ( (ICDE'04ICDE'04) Boston, IEEE Computer Society, April 2004.) Boston, IEEE Computer Society, April 2004.
![Page 75: Semantic Technologies: Towards Making a Difference in Scientific Data Management Bertram Ludäscher (ludaesch@sdsc.edu) San Diego Supercomputer Center Associate.](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649e605503460f94b5a807/html5/thumbnails/75.jpg)
75DOE NC Meeting, Boulder, Dec 1st 2004 Semantic Technologies in SDM, B.Ludäscher
Related PublicationsRelated Publications• Scientific WorkflowsScientific Workflows• KeplerKepler: An Extensible System for Design and Execution of Scientific Workflows: An Extensible System for Design and Execution of Scientific Workflows, I. Altintas, C. , I. Altintas, C.
Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, 16th International Conference on 16th International Conference on Scientific and Statistical Database ManagementScientific and Statistical Database Management ( (SSDBM'04SSDBM'04), 21-23 June 2004, Santorini ), 21-23 June 2004, Santorini Island, Greece. Island, Greece.
• KeplerKepler: Towards a Grid-Enabled System for Scientific Workflows: Towards a Grid-Enabled System for Scientific Workflows, Ilkay Altintas, Chad , Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher, Steve Mock, Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher, Steve Mock, Workflow in Grid Systems (GGF10)Workflow in Grid Systems (GGF10), Berlin, March 9th, 2004., Berlin, March 9th, 2004.
• An Ontology-Driven Framework for Data Transformation in Scientific WorkflowsAn Ontology-Driven Framework for Data Transformation in Scientific Workflows, S. Bowers , S. Bowers and B. Ludäscher, and B. Ludäscher, Intl. Workshop on Data Integration in the Life SciencesIntl. Workshop on Data Integration in the Life Sciences ( (DILS'04DILS'04), March ), March 25-26, 2004 Leipzig, Germany, LNCS 2994. 25-26, 2004 Leipzig, Germany, LNCS 2994.
• A Web Service Composition and Deployment Framework for Scientific Workflows, I. A Web Service Composition and Deployment Framework for Scientific Workflows, I. Altintas, E. Jaeger, K. Lin, B. Ludaescher, A. Memon, In the Altintas, E. Jaeger, K. Lin, B. Ludaescher, A. Memon, In the 2nd Intl. Conference on 2nd Intl. Conference on Web ServicesWeb Services ( (ICWSICWS), San Diego, California, July 2004.), San Diego, California, July 2004.