A Cyberinfrastructure Framework for Discovery, Integration, and Analysis of Earth Science Data: A Prototype System

A. K. Sinha*, Z. Malik*, A. Rezgui*, A. Dalton*, K. Lin**

* Virginia Tech

** San Diego Supercomputer Center


Hypothesis Evaluation: Are A-Type Rocks in Virginia Related to a Hot Spot Trace?

[Figure: spatio-temporal distribution of igneous rocks. Labels: Laurentian Crust and Lithosphere; Plume Head; Hot Spot Trace?]


GEON’s DIA Engine

Evaluating a hypothesis requires:

- Discovery: access to data
- Integration of data: provide data products
- Analysis of data: verify the hypothesis


Data Discovery

Registration of data is a prerequisite for data discovery:

- Level 1 registration: keywords
- Level 2 registration: ontologic classes
- Level 3 registration: item detail level


Registration of Data: Key to Discovery, Integration, and Analysis

- Level 1: Discovery of data resources (e.g., gravity, geologic maps) requires registration through the use of high-level index terms. GEON has deployed an extension of the AGI index terms, which will be cross-indexed to others such as GCMD and AGU.
- Level 2: Discovering item-level databases requires registration to data-level ontologies (e.g., bulk rock geochemistry, gravity database).
- Level 3: Item-detail-level registration (e.g., the column in a geochemical database that represents the SiO2 measurement). This level of registration is a requirement for semantic integration.
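The three registration levels can be pictured as progressively richer metadata attached to one resource. The following is a minimal sketch only: the `Registration` class, its field names, and the example resources are assumptions made for illustration and are not part of GEON.

```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    """Hypothetical registration record for one data resource."""
    resource: str
    index_terms: set = field(default_factory=set)       # Level 1: high-level index terms
    ontology_classes: set = field(default_factory=set)  # Level 2: data-level ontology classes
    item_map: dict = field(default_factory=dict)        # Level 3: source column -> concept

    def level(self) -> int:
        """Return the deepest registration level this record reaches."""
        if self.item_map:
            return 3
        if self.ontology_classes:
            return 2
        return 1 if self.index_terms else 0


def find_by_concept(registrations, concept):
    """Only Level 3 records can be found by an item-level concept."""
    return [r.resource for r in registrations if concept in r.item_map.values()]


# Made-up example resources.
registrations = [
    Registration("gravity_grid", index_terms={"gravity"}),
    Registration(
        "igneous_rock_db",
        index_terms={"geochemistry"},
        ontology_classes={"BulkRockGeochemistry"},
        item_map={"sio2_pct": "SiO2"},
    ),
]
levels = [r.level() for r in registrations]
silica_sources = find_by_concept(registrations, "SiO2")
```

In this sketch a resource registered only by keyword is discoverable but cannot take part in item-level integration, which mirrors the role the slide assigns to Level 3.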


Level 1 Registration

[Figure labels: AGI Index Terms; GEON Index Ontology; http://www.geoscienceworld.org/]


Ontological Look at Virginia Tech Igneous Rock Database

Level 2 Registration: Registration at the Item Level

[Figure: ontology diagram of the database. Classes listed: Mineral, Rock, Element, Isotope, Structure, Location. Other node labels: Geologic Images, Methods & References, MapReference, References, FeTreatment, Minerals, BulkRockGeochemMethods, AnalyticalMethods, BodyShapes, Fractures, Fabric, RockGeoChemistry, ModelComposition, Images, GeologicLocation, MineralChemistry, Rb_Sr_Isotope_Whole_Rock, Sm_Nd_Isotope_Whole_Rock, U_Th_Pb_Isotope_Whole_Rock, Rb_Sr_Isotope_Mineral, Sm_Nd_Isotope_Mineral, U_Th_Pb_Isotope_Mineral]


Level 3 Registration

A Section from Planetary Material Ontology

[Figure: class diagram with a 1 to 0..n association]

AnalyticalOxideConcentration

- analyticalOxide : AnalyticalOxide
- concentration : ValueWithUnit
- errorOfConcentration : ValueWithUnit

The GEON approach of registering data to concepts removes structural (format) and semantic heterogeneity.
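To illustrate how registering columns to a concept can hide format differences, here is a hedged sketch. The class and attribute names follow the ontology fragment on this slide, but the registration tables, column names, and sample values are invented for illustration, and `analytical_oxide` is simplified to a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueWithUnit:
    value: float
    unit: str


@dataclass(frozen=True)
class AnalyticalOxideConcentration:
    analytical_oxide: str  # simplified: the ontology uses an AnalyticalOxide class
    concentration: ValueWithUnit
    error_of_concentration: ValueWithUnit


# Hypothetical Level 3 registrations: source column -> (oxide, unit, error column).
REGISTRATION_A = {"SiO2_wt": ("SiO2", "wt%", "SiO2_err")}
REGISTRATION_B = {"silica": ("SiO2", "wt%", "silica_sigma")}


def to_concepts(row, registration):
    """Translate one source row into ontology-level records."""
    records = []
    for column, (oxide, unit, error_column) in registration.items():
        records.append(
            AnalyticalOxideConcentration(
                oxide,
                ValueWithUnit(float(row[column]), unit),
                ValueWithUnit(float(row[error_column]), unit),
            )
        )
    return records


# Two sources with different column names and value types (made-up values).
row_a = {"SiO2_wt": "72.4", "SiO2_err": "0.3"}
row_b = {"silica": 68.1, "silica_sigma": 0.5}
merged = to_concepts(row_a, REGISTRATION_A) + to_concepts(row_b, REGISTRATION_B)
```

After translation both rows are instances of the same concept, so a downstream tool no longer needs to know either source schema.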


DIA Engine (1): How Does GEON Discover Data?

- Keywords, resource type, temporal, spatial
- Invoke the GEON protocol for discovering databases: the Discovery, Integration and Analysis (DIA) Engine
- Retrieve the discovered data from registered databases
- Emphasize geospatial and aspatial discoveries (not everything has to be done through a map-based browser)
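The four discovery criteria listed above can be sketched as optional filters over a registry. This is an assumption-laden illustration, not the GEON protocol: the `Resource` fields, the `discover` function, and the registry entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    keywords: frozenset
    resource_type: str
    time_range: tuple  # (start, end), units left to the registry
    bbox: tuple        # (west, south, east, north) in degrees


def overlaps(a_min, a_max, b_min, b_max):
    """True if the closed intervals [a_min, a_max] and [b_min, b_max] intersect."""
    return a_min <= b_max and b_min <= a_max


def discover(registry, keywords=None, resource_type=None, time_range=None, bbox=None):
    """Return names of resources matching every criterion that was supplied."""
    hits = []
    for r in registry:
        if keywords and not set(keywords) <= r.keywords:
            continue
        if resource_type and r.resource_type != resource_type:
            continue
        if time_range and not overlaps(*r.time_range, *time_range):
            continue
        if bbox and not (
            overlaps(r.bbox[0], r.bbox[2], bbox[0], bbox[2])
            and overlaps(r.bbox[1], r.bbox[3], bbox[1], bbox[3])
        ):
            continue
        hits.append(r.name)
    return hits


# Made-up registry entries.
registry = [
    Resource("va_geochem", frozenset({"geochemistry", "igneous"}), "database",
             (1990, 2005), (-83.7, 36.5, -75.2, 39.5)),
    Resource("wy_geochem", frozenset({"geochemistry"}), "database",
             (1980, 1995), (-111.1, 41.0, -104.0, 45.0)),
    Resource("us_geologic_map", frozenset({"geologic map"}), "map",
             (2000, 2005), (-125.0, 24.0, -66.0, 50.0)),
]
aspatial_hits = discover(registry, keywords=["geochemistry"])
geospatial_hits = discover(registry, bbox=(-80.0, 37.0, -77.0, 39.0))
```

Because the bounding box is optional, the same function serves both aspatial and geospatial discovery, in line with the slide's point that not every query needs a map.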


DIA Engine (2)

[Figure labels: Geospatial Engine; Aspatial Engine]

Geoscience templates:

- Geologic Map (USA)
- Geologic Map (States)
- Terrane Map
- Geologic Provinces
- Geophysical Map
- Experimental Databases
- Tools


High-Level View of the DIA Engine

- The user specifies a class of data for analysis
- The DIA Engine derives and retrieves the different data sets needed for the requested analysis
- The DIA Engine applies processing and filtering techniques to generate the requested data product
- Data products and query steps can be saved

[Figure labels: Raw Data; Query Tool; Data Product; Modeling; Computation]
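The flow from raw data through query steps to a saved data product might look like the sketch below. The function `run_pipeline`, the step labels, and the sample rows are hypothetical and only illustrate the idea of recording query steps alongside the product.

```python
import json


def run_pipeline(raw_rows, steps):
    """Apply each (label, function) step in order.

    Returns the resulting data product and the list of step labels applied.
    """
    rows = list(raw_rows)
    applied = []
    for label, step in steps:
        rows = step(rows)
        applied.append(label)
    return rows, applied


# Made-up raw data.
raw = [
    {"body": "B1", "rock_class": "granite", "age_ma": 700},
    {"body": "B2", "rock_class": "basalt", "age_ma": 200},
    {"body": "B3", "rock_class": "granite", "age_ma": 650},
]
steps = [
    ("filter rock_class == granite",
     lambda rows: [r for r in rows if r["rock_class"] == "granite"]),
    ("sort by age_ma ascending",
     lambda rows: sorted(rows, key=lambda r: r["age_ma"])),
]
product, query_steps = run_pipeline(raw, steps)

# Saving both the product and the steps lets the query be re-run later.
saved = json.dumps({"steps": query_steps, "product": product})
```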


Data products (1)

Data products can be in the form of interactive maps, interactive filtering diagrams, or Excel data files.

Examples:

- A map showing the A-Type bodies in the Mid-Atlantic region
- An Excel file giving the ages of those A-Type bodies
- A gravity database table spatially related to A-Type bodies, saved as a contoured gravity map


Data products (2)

Data products can be:

- Pre-packaged: quickly queried, but not flexible, and provide little support for complex scientific discovery
- Created dynamically: may require extensive on-the-fly query processing, but enables far richer possibilities for scientific discovery
  - Requires semantic integration


Data Integration (1)

Semantic integration of data products requires:

- Ontologies: a common language to interpret data from different sources
- Data sharing: requires data registration

Fine-grained (i.e., item-level) registration is necessary to enable the automatic processing (by tools) of shared data.


Data Integration (2)

[Figure: data owners register raw data to the geochemistry and geophysics ontologies. Labels: Query Tool (QT 1, QT 2); Data Product (DP 1, DP 2); Integration within an ontological class (Ontologically Registered Data 1, Ontologically Registered Data 2); Integration across ontological classes (Ontologically Registered Data: Geo-chemistry; Ontologically Registered Data: Geo-physics); Integration Class: Location]
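Integration across ontological classes through a shared Location class can be sketched as a join. This is an illustration under stated assumptions: the record layout, the site identifiers used as locations, and the function `integrate_by_location` are hypothetical, and a real system would likely match on coordinates or spatial proximity instead of exact keys.

```python
def integrate_by_location(geochem, geophys):
    """Join geochemistry and geophysics records that share a location key."""
    by_location = {}
    for record in geophys:
        by_location.setdefault(record["location"], []).append(record)

    product = []
    for chem in geochem:
        for phys in by_location.get(chem["location"], []):
            product.append(
                {
                    "location": chem["location"],
                    "SiO2": chem["SiO2"],
                    "gravity": phys["gravity"],
                }
            )
    return product


# Made-up registered records; "location" stands in for the Location class.
geochem = [
    {"location": "site-1", "SiO2": 72.4},
    {"location": "site-2", "SiO2": 55.0},
]
geophys = [
    {"location": "site-1", "gravity": -35.2},
    {"location": "site-3", "gravity": 12.0},
]
data_product = integrate_by_location(geochem, geophys)
```

Only locations present in both classes appear in the product, so the integration class determines what can be combined.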


Limitations of Current Data Sharing Approaches

- Each research group adopts its own acronyms, notations, conventions, units, etc.
- Data sharing is of limited scope
  - Data discovery is ad hoc
  - Only a small community of scientists may be aware of and share a given data set
- Integration is difficult
  - Extensive conversion efforts may be needed
  - Absence of streamlined integration leads to poor ability to answer complex scientific questions

Solution: Ontology-based Data Registration


Query Building

Menu-based (used in the demo):

- The GUI lets the user select only specific items, which in turn queries only a subset of the data
- A robust system informs the user of any incorrect input and guides the user in the right direction
- Results are guaranteed, as the query is definitely answered

Text-based:

- The entire database can be queried
- Result sets may be empty
- Even a small mistake in the query can return incorrect results, without the user being able to identify the error


Menu-based Query Building

- In a selected “region of interest”, the user is provided with a number of options (the menu)
- The user clicks through the different menus to build an exact query
- Click history is maintained to enable future referencing

[Figure: Menu # 1 through Menu # 5]
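A menu-driven builder of this kind can be sketched as follows. The `MenuQueryBuilder` class and the menu contents are assumptions for illustration and do not reproduce the demo's actual menus.

```python
class MenuQueryBuilder:
    """Builds a query one menu at a time and records the click history."""

    def __init__(self, menus):
        self.menus = menus  # ordered list of (field, allowed options)
        self.history = []   # (field, choice) pairs in click order

    def options(self):
        """Return the current (field, allowed options), or None when finished."""
        if len(self.history) == len(self.menus):
            return None
        return self.menus[len(self.history)]

    def click(self, choice):
        current = self.options()
        if current is None:
            raise ValueError("query is already complete")
        field, allowed = current
        if choice not in allowed:
            # Reject bad input and tell the user which options are valid.
            raise ValueError(f"{choice!r} is not an option for {field}; choose one of {allowed}")
        self.history.append((field, choice))

    def query(self):
        return dict(self.history)


# Hypothetical menus.
menus = [
    ("region", ["Mid-Atlantic", "Wyoming", "Texas"]),
    ("rock_class", ["A-Type", "I-Type", "S-Type"]),
    ("diagram", ["Y vs. Nb", "10000*Ga/Al vs. Zr"]),
]
builder = MenuQueryBuilder(menus)
for choice in ["Mid-Atlantic", "A-Type", "Y vs. Nb"]:
    builder.click(choice)
built_query = builder.query()
```

Since every choice comes from a fixed list, a malformed query cannot be constructed, which is the trade-off against free-text querying described on the previous slide.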


Query Tool Selection

- Tools provided by GEON can be used to answer a query
- OR other geologic tools can be incorporated (invocation interfaces need to be defined)
  - Example: GCD-Kit can be used for classification, geotectonic, and normative calculations for igneous rocks
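One way to picture an invocation interface is a small common contract that both built-in and external tools satisfy. The sketch below is hypothetical: `QueryTool`, `SilicaFilter`, `ExternalToolAdapter`, and the registry functions are invented names, and the adapter wraps a plain Python callable in place of a real external package.

```python
from typing import Callable, Protocol


class QueryTool(Protocol):
    """Assumed minimal contract: a name and a run method over rows."""
    name: str

    def run(self, rows: list) -> list: ...


class SilicaFilter:
    """Stand-in for a built-in tool."""
    name = "silica-filter"

    def __init__(self, min_sio2: float):
        self.min_sio2 = min_sio2

    def run(self, rows: list) -> list:
        return [r for r in rows if r["SiO2"] >= self.min_sio2]


class ExternalToolAdapter:
    """Wraps any callable so an external tool fits the same contract."""

    def __init__(self, name: str, func: Callable[[list], list]):
        self.name = name
        self.func = func

    def run(self, rows: list) -> list:
        return self.func(rows)


TOOLS = {}


def register(tool: QueryTool) -> None:
    TOOLS[tool.name] = tool


def answer(tool_name: str, rows: list) -> list:
    return TOOLS[tool_name].run(rows)


register(SilicaFilter(70.0))
register(ExternalToolAdapter("count", lambda rows: [{"count": len(rows)}]))

sample_rows = [{"SiO2": 72.4}, {"SiO2": 55.0}]
filtered = answer("silica-filter", sample_rows)
counted = answer("count", sample_rows)
```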


Analysis

Data product(s) generated can be analyzed using various techniques:

- Modeling
- Computation


Workflow Associated with the Demo

[Figure: workflow diagram.

- Query: A-Type polygons in a region R using discrimination diagram D?
- Components: User; Web Server (SDSC); GEON Server (Virginia Tech); Rock Classification Ontology; Discrimination Functions; US National Gazetteer; Geo-Spatial Data Server (Geo-Spatial Data); Geo-Chemical Data Server 1, Virginia Tech (Mid-Atlantic); Geo-Chemical Data Server 2 (Wyoming); Geo-Chemical Data Server 3 (Texas)
- Discrimination diagrams: 10000*Ga/Al vs. Zr; FeO*/MgO vs. Zr+Nb+Ce+Y; Y vs. Nb
- Technologies labeled: Java/VB Script-enabled Web browser; Java/VB Script, ASP.net, VB.net; Visual Basic; ESRI ArcSDE; ESRI ArcGIS Server; MS SQL Server (three instances)]
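A discrimination filter over the quantities named in the diagrams (10000*Ga/Al and Zr+Nb+Ce+Y) could be sketched as below. The slides do not give cutoff values or the demo's Visual Basic logic, so the function names, the `Sample` layout, the default cutoffs (values often quoted in the granite-discrimination literature), and the sample analyses are all assumptions to be replaced with the boundaries actually used.

```python
from dataclasses import dataclass

# Mass fraction of Al in Al2O3: 2 * 26.98 / 101.96, about 0.5293.
AL_FRACTION_IN_AL2O3 = 0.5293


@dataclass
class Sample:
    name: str
    al2o3_wt_pct: float
    ga_ppm: float
    zr_ppm: float
    nb_ppm: float
    ce_ppm: float
    y_ppm: float


def ga_al_ratio(s: Sample) -> float:
    """Return 10000 * Ga/Al with both elements expressed in ppm."""
    al_ppm = s.al2o3_wt_pct * AL_FRACTION_IN_AL2O3 * 10_000  # wt% oxide -> ppm element
    return 10_000 * s.ga_ppm / al_ppm


def is_a_type(s: Sample, ga_al_cutoff: float = 2.6, hfse_cutoff: float = 350.0) -> bool:
    """Flag a sample as A-Type when both indices exceed the assumed cutoffs."""
    hfse_sum = s.zr_ppm + s.nb_ppm + s.ce_ppm + s.y_ppm
    return ga_al_ratio(s) > ga_al_cutoff and hfse_sum > hfse_cutoff


# Made-up analyses, not measured data.
sample_a = Sample("A", al2o3_wt_pct=12.0, ga_ppm=25.0,
                  zr_ppm=400.0, nb_ppm=30.0, ce_ppm=100.0, y_ppm=60.0)
sample_b = Sample("B", al2o3_wt_pct=15.0, ga_ppm=15.0,
                  zr_ppm=150.0, nb_ppm=10.0, ce_ppm=50.0, y_ppm=20.0)
a_type_names = [s.name for s in (sample_a, sample_b) if is_a_type(s)]
```

Keeping the cutoffs as parameters lets the same function be applied with whichever diagram boundaries a given study adopts.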


Used Technologies

- User interface:
  - Java / VB Script
  - ASP.net
  - VB.net
- Back end:
  - ESRI ArcGIS Server 9.1
  - ESRI ArcSDE 9.1 (spatial database)
  - Microsoft SQL Server (geo-chemical database)
- Functionality coding:
  - Visual Basic (to code the discrimination filters)


Demo Starts Here


Current Tool Sharing Approaches

- Each research group develops its own tools
  - Tools developed by a research group are rarely used by other groups
  - Redundancy of development efforts
- Little interoperability amongst tools
  - Interaction amongst different tools is often not possible or requires extensive (re)coding

Solution: wrap tools as Web services accessible to the scientific community worldwide


The Future: Integration through Ontologies and Web Services

Benefits of Web services:

- Facilitate integration
  - Tools developed independently may easily be integrated into new applications
  - Example: discrimination tools may be made available as Web services
- Provide high reusability
  - More tools available to the research community
- Reduce development time, effort, and cost


Web Services Explained (1)

[Figure: Web services architecture. Service Providers 1 to 3 expose Functions 1 to 3 as Web services; Application Providers 1 and 2 serve users over the Web. Numbered steps: (1) Publish Web Service, (2) Discover Web Service, (3) Invoke Web Service. Other labels: UDDI Registry; WSDL Service Descriptions; SOAP Messages]

WS standards:

- WSDL: Web Services Description Language
- UDDI: Universal Description, Discovery, and Integration
- SOAP: Simple Object Access Protocol


Web Services Explained (2)

- WSDL (the service provider describes the service using WSDL)
  - An XML-based language to describe the capabilities of Web services
  - The capabilities of a Web service are described as a set of endpoints that can exchange messages
  - WSDL is a separate specification from UDDI; UDDI registry entries can reference WSDL descriptions
- UDDI (the service provider publishes the service using UDDI)
  - A Web-based directory where service providers may list their services and where service consumers may retrieve the services published by the providers (like yellow pages)
- SOAP (clients and services communicate using SOAP)
  - An XML-based protocol used to encode the messages (requests and responses) exchanged between a Web service and its clients
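To make the SOAP bullet concrete, the sketch below encodes and decodes a request envelope using only the Python standard library. The SOAP 1.1 envelope namespace is the standard one; the service namespace `urn:example:discrimination`, the operation `ClassifyRock`, and its parameters are hypothetical and do not describe a GEON service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
TOOL_NS = "urn:example:discrimination"                 # hypothetical service namespace


def build_request(operation, params):
    """Encode an operation call as a SOAP envelope string."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{TOOL_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{TOOL_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Decode the operation name and parameters from a SOAP envelope string."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]  # first child of Body is the operation element
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_request("ClassifyRock", {"Ga": 25, "Al2O3": 12.0})
operation, params = parse_request(message)
```

In practice a client library generated from the service's WSDL would usually build these envelopes; the manual version here only shows what travels on the wire.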


[Figure: summary of the DIA workflow. Labels: Discovery (Geospatial Query, Aspatial Query); Integration within the same ontologic class (VA, WY, and TX Ontologically Registered Data; Geochemical; A-Type Identification; Data Product); Integration between different ontologic classes (Geochemical, Geophysics, Geologic Time; Ontologically Registered Data; Data Product); Analysis (Hypothesis Evaluation: Are A-Type Rocks in Virginia related to a Hot Spot Trace?)]