GEON IT Advances: ⁃ Data Integration ⁃ GEON Workbench ⁃ Scientific Workflows
Transcript of GEON IT Advances: ⁃ Data Integration ⁃ GEON Workbench ⁃ Scientific Workflows
www.geongrid.org CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
GEON IT Advances:
⁃ Data Integration
⁃ GEON Workbench
⁃ Scientific Workflows
Bertram Ludäscher
Kai Lin
Efrat Jaeger
San Diego Supercomputer Center
University of California, San Diego
The Problem: Scientific Data Integration
or: … from Questions to Queries …
Information Integration Challenges: S4 Heterogeneities
• Systems Integration
– platforms, devices, data & service distribution, APIs, protocols, …
Grid middleware technologies
+ e.g. single sign-on, platform independence, transparent use of remote resources, …
• Syntax & Structure
– heterogeneous data formats (one for each tool ...)
– heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, …)
– heterogeneous schemas (one for each DB ...)
Database mediation technologies
+ XML-based data exchange, integrated views, transparent query rewriting, …
• Semantics
– fuzzy metadata, terminology, “hidden” semantics, implicit assumptions, …
Knowledge representation & semantic mediation technologies
+ “smart” data discovery & integration
+ e.g. ask about X (‘mafic’); find data about Y (‘diorite’); be happy anyways!
Information Integration Challenges: S5 Heterogeneities
• Synthesis of analysis pipelines, integrated apps & data products, …
– How to make use of these wonderful things & put them together to solve a scientist’s problem?
Scientific Problem Solving Environments
GEON Portal and Workbench (“scientist’s view”)
+ ontology-enhanced data registration, discovery, manipulation
+ creation and registration of new data products from existing ones, …
GEON Scientific Workflow System (“engineer’s view”)
+ for designing, re-engineering, deploying analysis pipelines and scientific workflows
+ e.g., creation of new datasets from existing ones, dataset registration, …
Ontology-Enabled Application Example: Geologic Map Integration
[Figure: map panels labeled Nevada for the query Show formations where AGE = ‘Paleozoic’, shown once without and once with the age ontology (+/- a few hundred million years). Domain knowledge enters through knowledge representation as the AGE ONTOLOGY.]
Querying by Geologic Age …
Querying by Geologic Age: Result
Querying by Chemical Composition … (GSC)
Querying by Chemical Composition: Results
Querying w/ British Rock Classification (BRC)
Uses a GSC/BRC inter-ontology articulation mapping
British Rock Classification Query: Results
Uses a GSC/BRC inter-ontology articulation mapping
Need for Knowledge-enabled Integration
• A geologist analyzing chemical data from a pluton finds no recognizable correlation between variables.
– What possible scenarios can he examine to understand this heterogeneity?
• Measured ages also show a scatter.
– What is the significance of the observed spread in measured time?
[Figure labels: GeolAgeDB, GeoChemDB, DataTables]
Knowledge Representation Research:
• concept maps & ontologies
• process maps & ontologies
• semantic types
• … to facilitate (even) “smarter” tools
A Prerequisite: Resource Registration
(1a) Register ontologies
– geologic age; rock classifications (GSC, BGS), seismology; …
(1b) optionally: register inter-ontology articulations
– e.g. between the GSC ontology and the BGS ontology
(2a) Item-level dataset registration
– ADN metadata; other controlled vocabularies & ontologies (e.g. geologic age timescale (USGS), SWEET (NASA), …)
(2b) Item-detail registration
– e.g. associate values in a column with a concept
(3) Use ontology-based query UI / application
– e.g. query by geologic age and chemical composition
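The registration steps above can be sketched as a toy in-memory catalog. This is only an illustration under stated assumptions, not the GEON Catalog API: the class `Catalog`, its method names, and the placeholder concepts `gsc_term` and `bgs_term` are hypothetical. The ontology and file names `geologicAge` and `myShapeFiles.zip` echo names used in the slides.

```python
class Catalog:
    """Toy in-memory catalog (hypothetical; not the GEON Catalog API)."""

    def __init__(self):
        self.ontologies = {}     # ontology name -> set of concept names
        self.articulations = {}  # (ontology, concept) -> (ontology, concept)
        self.annotations = {}    # dataset name -> set of (ontology, concept)

    def _check(self, ontology, concept):
        # A concept can only be used after its ontology has been registered.
        if concept not in self.ontologies.get(ontology, set()):
            raise KeyError(f"{concept!r} is not registered in {ontology!r}")

    def register_ontology(self, name, concepts):
        # Step (1a): register an ontology as a set of concepts.
        self.ontologies[name] = set(concepts)

    def register_articulation(self, src, src_concept, dst, dst_concept):
        # Step (1b): map a concept of one ontology to a concept of another.
        self._check(src, src_concept)
        self._check(dst, dst_concept)
        self.articulations[(src, src_concept)] = (dst, dst_concept)

    def register_dataset(self, dataset, ontology, concept):
        # Steps (2a)/(2b): associate a dataset (or one of its columns) with a concept.
        self._check(ontology, concept)
        self.annotations.setdefault(dataset, set()).add((ontology, concept))

    def search(self, ontology, concept):
        # Step (3): find datasets annotated with the concept, either directly
        # or through a registered articulation.
        wanted = {(ontology, concept)}
        mapped = self.articulations.get((ontology, concept))
        if mapped is not None:
            wanted.add(mapped)
        return sorted(d for d, tags in self.annotations.items() if tags & wanted)


catalog = Catalog()
catalog.register_ontology("geologicAge", ["Cenozoic", "Tertiary", "Quaternary"])
catalog.register_ontology("GSC", ["gsc_term"])   # placeholder concept
catalog.register_ontology("BGS", ["bgs_term"])   # placeholder concept
catalog.register_articulation("BGS", "bgs_term", "GSC", "gsc_term")
catalog.register_dataset("myShapeFiles.zip", "GSC", "gsc_term")

# A query phrased in BGS terms finds a dataset registered against GSC.
bgs_hits = catalog.search("BGS", "bgs_term")
age_hits = catalog.search("geologicAge", "Tertiary")
print(bgs_hits, age_hits)
```

The point of the sketch is the ordering constraint: ontologies are registered first, so that articulations and dataset annotations can be validated against them before any query runs.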
Demonstration Preview
NOTE: A technology demonstration, not a content demonstration (vocabulary, ontology, maps, …)
1. Ontology Registration (geologicAge.owl)
2. Dataset Registration (myShapeFiles.zip)
3. Item-Level Association (of 1 and 2)
4. GEONsearch
• metadata, spatial, temporal, concept-based
5. GEONworkbench
• use of workspace, e.g. composing new maps from existing ones
… resume with GEON workflow overview
Demonstration Preview
[Architecture diagram: GEON middleware. Labels recovered from the figure:]
• Example resources: myOntology.owl, myDataset.foo, metadata
• Resource Registration; GEON Catalog; SRB
• User Access (via Portal)
– GEONsearch: search condition(s) spatial, temporal, concept
– GEONworkbench: GEON Workspace (user); user actions add, delete, manipulate
• Log
• External services: Gazetteer, DLESE, …; Geologic Age, Chronos, …
• Client Access (via web services): other distributed apps (Kepler, DLESE, …)
Dataset to Ontology Registration (Item-level)
[Figure labels: Domain Knowledge Ontologies; Arizona]
GEON Search: Concept-based Querying
Portal Demonstration
Scientific Problem Solving Environments
• GEON Portal and Workbench (“scientist’s view”)
– previous demonstration
• Kepler Workflow System (“engineer’s view”)
– for (semi-)automating “scientific workflows” and “analysis pipelines”
– some features:
• … low-level plumbing to high-level conceptual flows …
• connect reusable components (“actors”, “boxes”) to form apps
• abstraction via nesting of subworkflows into composite actors
• deploy automated workflows on the Grid and/or with custom UIs
– demonstrations available (“Kepler2Go-1.” CD for Summer Institute)
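The idea of connecting reusable actors and nesting subworkflows into composite actors can be illustrated with a few lines of Python. This is a conceptual sketch only: Kepler itself is a graphical, Java-based system, and the classes `Actor` and `CompositeActor` below, along with the example actors, are hypothetical stand-ins rather than Kepler's API.

```python
class Actor:
    """A reusable component that consumes one token and produces one token."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def fire(self, token):
        return self.func(token)


class CompositeActor(Actor):
    """A subworkflow: a chain of actors that is itself usable as an actor."""

    def __init__(self, name, actors):
        self.name = name
        self.actors = list(actors)

    def fire(self, token):
        # Pass the token through each nested actor in order.
        for actor in self.actors:
            token = actor.fire(token)
        return token


# Low-level plumbing actors (hypothetical examples).
parse = Actor("parse", lambda text: [float(x) for x in text.split(",")])
drop_negatives = Actor("drop_negatives", lambda xs: [x for x in xs if x >= 0])
double = Actor("double", lambda xs: [2 * x for x in xs])
total = Actor("total", sum)

# Nesting: "clean" hides two actors behind one higher-level box.
clean = CompositeActor("clean", [drop_negatives, double])
workflow = CompositeActor("workflow", [parse, clean, total])

result = workflow.fire("1,-2,3")
print(result)  # 8.0
```

Because a composite actor exposes the same `fire` interface as a simple one, a workflow can be read at the conceptual level (`parse`, `clean`, `total`) while the plumbing stays inside the nested box.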
A Kepler Scientific Workflow
[Screenshot callouts: component (actor) libraries; canvas for design and execution monitoring; inline comments]
GEON Dataset Extraction & Processing
[Workflow screenshot callouts: “Translating query XML response to web service XML input format.”; worldImage; XML SOAP response; Look Inside; Sample]
GEON Dataset Registration
[Screenshot callout: Annotation form]
GEON Dataset Registration
[Screenshot callouts: Registering; validation; ADN metadata; Metadata display]
Putting it all together …
GEON Workflows & KEPLER
http://kepler-project.org (next week)
http://kepler.ecoinformatics.org (now)
HPC workflow
Related Publications
• Semantic Data Registration and Integration
– On Integrating Scientific Resources through Semantic Registration, S. Bowers, K. Lin, and B. Ludäscher, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
– A System for Semantic Integration of Geologic Maps via Ontologies, K. Lin and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.
– Towards a Generic Framework for Semantic Registration of Scientific Data, S. Bowers and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.
– The Role of XML in Mediated Data Integration Systems with Examples from Geological (Map) Data Interoperability, B. Brodaric, B. Ludäscher, and K. Lin. In Geological Society of America (GSA) Annual Meeting, volume 35(6), November 2003.
– Semantic Mediation Services in Geologic Data Integration: A Case Study from the GEON Grid, K. Lin, B. Ludäscher, B. Brodaric, D. Seber, C. Baru, and K. A. Sinha. In Geological Society of America (GSA) Annual Meeting, volume 35(6), November 2003.
• Query Planning and Rewriting
– Processing First-Order Queries under Limited Access Patterns, Alan Nash and B. Ludäscher, Proc. 23rd ACM Symposium on Principles of Database Systems (PODS'04), Paris, France, June 2004.
– Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns, Alan Nash and B. Ludäscher, 9th Intl. Conference on Extending Database Technology (EDBT'04), Heraklion, Crete, Greece, March 2004, LNCS 2992.
– Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negation, B. Ludäscher and Alan Nash. Research abstract (poster), 20th Intl. Conference on Data Engineering (ICDE'04), Boston, IEEE Computer Society, April 2004.
• Scientific Workflows
– Kepler: An Extensible System for Design and Execution of Scientific Workflows, I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
– Kepler: Towards a Grid-Enabled System for Scientific Workflows, Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher, Steve Mock, Workflow in Grid Systems (GGF10), Berlin, March 9th, 2004.
– An Ontology-Driven Framework for Data Transformation in Scientific Workflows, S. Bowers and B. Ludäscher, Intl. Workshop on Data Integration in the Life Sciences (DILS'04), March 25-26, 2004, Leipzig, Germany, LNCS 2994.
– A Web Service Composition and Deployment Framework for Scientific Workflows, I. Altintas, E. Jaeger, K. Lin, B. Ludäscher, A. Memon. In the 2nd Intl. Conference on Web Services (ICWS), San Diego, California, July 2004.
Additional Material (for questions etc.)
Multi-Hierarchical Rock Classification System (GSC)
… a target ontology (after conversion to OWL) for geologic map registration …
[Figure: classification hierarchies labeled Composition, Genesis, Fabric, Texture]
Inside Ontology-Enabled Map Integration
User: “Show formations from Cenozoic!”
Query Rewriting (via Age Ontology): Cenozoic is expanded to Quaternary and Tertiary
select FORMATION where AGE=“Tertiary” or AGE=“Quaternary”
[Figure: the rewritten query applied to the Arizona and Montana West sources. Column labels shown: ABBREV, PERIOD, FORMATION, LITHOLOGY. Sample rows: Tkgm (Tertiary), Q (Quaternary), Qg (Quaternary), Twp (Tertiary), Twl (Tertiary), … The resulting formations Tkgm, Qg, Twp, Twl, … feed Color Definition and Map Rendering.]
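The rewriting step on this slide can be sketched in a few lines. The sketch assumes the age ontology is available as a simple parent-to-children mapping; the function names `expand` and `rewrite` and the dictionary `AGE_ONTOLOGY` are hypothetical, and the final sample row (`Kx`, Cretaceous) is invented to show a non-matching case. The other rows and the query text come from the slide.

```python
# Fragment of the age ontology from the slide: Cenozoic covers Tertiary and Quaternary.
AGE_ONTOLOGY = {"Cenozoic": ["Tertiary", "Quaternary"]}


def expand(concept, ontology):
    """Return the most specific terms covered by a concept."""
    children = ontology.get(concept, [])
    if not children:
        return [concept]
    terms = []
    for child in children:
        terms.extend(expand(child, ontology))
    return terms


def rewrite(concept, ontology):
    """Rewrite a question about a concept into a query over source terms."""
    condition = " or ".join(f'AGE="{term}"' for term in expand(concept, ontology))
    return f"select FORMATION where {condition}"


def select_formations(rows, concept, ontology):
    """Evaluate the rewritten condition against (formation, period) rows."""
    wanted = set(expand(concept, ontology))
    return [formation for formation, period in rows if period in wanted]


rows = [
    ("Tkgm", "Tertiary"),
    ("Qg", "Quaternary"),
    ("Twp", "Tertiary"),
    ("Twl", "Tertiary"),
    ("Kx", "Cretaceous"),  # hypothetical row that should not match
]

query = rewrite("Cenozoic", AGE_ONTOLOGY)
matches = select_formations(rows, "Cenozoic", AGE_ONTOLOGY)
print(query)    # select FORMATION where AGE="Tertiary" or AGE="Quaternary"
print(matches)  # ['Tkgm', 'Qg', 'Twp', 'Twl']
```

Note that this sketch expands only to the most specific terms, matching the query shown on the slide; a source that labels a formation directly as Cenozoic would need the parent term added to the condition as well.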
Data Source Wrapping and Integration
[Figure: nine geologic map sources (Arizona, Colorado, Utah, Nevada, Wyoming, New Mexico, Montana East, Idaho, Montana West) and their wrapped views. Seven views list Formation and Age; two list Formation, Age, Composition, Fabric, and Texture. Source column names differ across sources: ABBREV, PERIOD, NAME, TYPE, TIME_UNIT, FMATN, FORMATION, LITHOLOGY, AGE. Example values: andesitic sandstone; Livingston formation; Tertiary-Cretaceous.]
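Wrapping can be pictured as a per-source column mapping onto a shared schema. The sketch below is an assumption-laden illustration: the slide does not say which source uses which column names, so the source names `source_a`, `source_b`, `source_c` and their layouts are hypothetical, as are the function names. The column names and the example values are taken from the figure.

```python
# Hypothetical per-source wrappers: source column name -> unified attribute.
WRAPPERS = {
    "source_a": {"ABBREV": "Formation", "PERIOD": "Age"},
    "source_b": {"FMATN": "Formation", "TIME_UNIT": "Age"},
    "source_c": {"FORMATION": "Formation", "AGE": "Age"},
}


def wrap(source, row):
    """Translate one source row into the unified (Formation, Age) schema."""
    mapping = WRAPPERS[source]
    return {mapping[col]: value for col, value in row.items() if col in mapping}


def integrated_view(tables):
    """Union the wrapped rows of all sources, tagging each with its origin."""
    view = []
    for source, rows in tables.items():
        for row in rows:
            unified = wrap(source, row)
            unified["Source"] = source
            view.append(unified)
    return view


tables = {
    "source_a": [{"ABBREV": "Tkgm", "PERIOD": "Tertiary"}],
    "source_b": [{"FMATN": "Qg", "TIME_UNIT": "Quaternary", "TYPE": "unused"}],
    "source_c": [{"FORMATION": "Livingston formation", "AGE": "Tertiary-Cretaceous"}],
}

view = integrated_view(tables)
for record in view:
    print(record)
```

Once every source answers in the same attribute names, a single query over `Formation` and `Age` can be sent to all of them, which is what makes the ontology-based rewriting on the previous slide applicable across states.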
Gravity Modeling Design Workflow
• Idea: Comparing observed & synthetic gravity models
• Steps:
– Extracting and merging gravity depths from heterogeneous data sources for a Lat/Lon bounding box (databases, web services).
– Projecting and interpolating data sources into the same coordinate system.
– Differencing observed and synthetic models.
– Displaying the differential raster image.
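Two of the steps above, merging points inside a bounding box and differencing two models, can be sketched as follows. This is a minimal illustration and not the GEON workflow itself: the function names, the point layout `(lon, lat, value)`, and all numbers are hypothetical, and the grids are assumed to be already projected and interpolated onto the same cells.

```python
def merge_in_bbox(sources, bbox):
    """Merge (lon, lat, value) points from several sources inside a bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    merged = set()  # a set drops points reported identically by two sources
    for points in sources:
        for lon, lat, value in points:
            if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                merged.add((lon, lat, value))
    return sorted(merged)


def difference(observed, synthetic):
    """Cell-by-cell difference of two grids defined on the same cells."""
    if len(observed) != len(synthetic) or any(
        len(o_row) != len(s_row) for o_row, s_row in zip(observed, synthetic)
    ):
        raise ValueError("grids must have the same shape")
    return [
        [o - s for o, s in zip(o_row, s_row)]
        for o_row, s_row in zip(observed, synthetic)
    ]


# Hypothetical inputs: one database extract and one web service response.
database_points = [(-115.0, 36.0, 1.0), (-100.0, 36.0, 2.0)]
service_points = [(-115.0, 36.0, 1.0), (-114.0, 37.0, 3.0)]
merged = merge_in_bbox([database_points, service_points], (-120.0, 35.0, -110.0, 40.0))

observed = [[10.0, 12.0], [11.0, 9.0]]
synthetic = [[9.5, 12.0], [11.5, 8.0]]
residual = difference(observed, synthetic)

print(merged)    # [(-115.0, 36.0, 1.0), (-114.0, 37.0, 3.0)]
print(residual)  # [[0.5, 0.0], [-0.5, 1.0]]
```

The shape check in `difference` stands in for the projection and interpolation step: differencing is only meaningful once both models live on the same grid.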
Grid Interpolation
• Interpolating queried gravity data on the grid and displaying it using a color scheme.
• Currently, the IDW interpolation algorithm is supported. Future plans: Minimum Curvature, TIN, Kriging and Spline.
• Output: either ASCII x,y,z,p or ESRI ASCII grid format.
• Display: using global mapper service.
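For readers unfamiliar with IDW (inverse distance weighting), the sketch below interpolates scattered points onto a regular grid and serializes it in the ESRI ASCII grid layout. It is a generic textbook version, not the GEON service's implementation: the function names, the power parameter of 2, the cell-center sampling, and the three-decimal formatting are all assumptions.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, value) points."""
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return value  # exactly on a sample point: return its value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def idw_grid(points, xll, yll, cellsize, ncols, nrows, power=2.0):
    """Interpolate at cell centers; rows run north to south as in ESRI ASCII grids."""
    grid = []
    for r in range(nrows):
        y = yll + (nrows - r - 0.5) * cellsize
        grid.append(
            [idw(points, xll + (c + 0.5) * cellsize, y, power) for c in range(ncols)]
        )
    return grid


def to_esri_ascii(grid, xll, yll, cellsize, nodata=-9999):
    """Serialize a grid with the six-line ESRI ASCII grid header."""
    header = [
        f"ncols {len(grid[0])}",
        f"nrows {len(grid)}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    rows = [" ".join(f"{value:.3f}" for value in row) for row in grid]
    return "\n".join(header + rows) + "\n"


# Hypothetical gravity samples as (x, y, value).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint = idw(samples, 1.0, 0.0)
grid = idw_grid(samples, 0.0, 0.0, 1.0, ncols=2, nrows=1)
text = to_esri_ascii(grid, 0.0, 0.0, 1.0)
print(midpoint)  # 15.0
print(text)
```

Each estimate is a weighted mean of the samples, so interpolated values never leave the range of the input data; that property is one reason IDW is a common first choice before methods such as kriging or splines.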
Gravity Modeling Design Workflow