Data Grid Research Group Dept. of Computer Science and Engineering The Ohio State University...
Data Grid Research Group
Dept. of Computer Science and Engineering
The Ohio State University
Columbus, Ohio 43210, USA
David Chiu & Gagan Agrawal
Enabling Ad Hoc Queries over Low-Level Scientific Data Sets
D. Chiu & G. Agrawal. Enabling Ad Hoc Queries over Low-Level Scientific Data Sets
SSDBM ’09
Presentation Outline
• Motivation
  ‣ Current Trends in Scientific Data Management
  ‣ Problem Discussion
• Data Registration Indexing
  ‣ Metadata Extraction
  ‣ Transformation
• Service Composition
• Conclusion
Increased tremendously over the years
Scientific Data Sets
• The collection of scientific data has increased over the years with new instruments, simulations, etc.
• Data sets are stored in repositories around the globe
• Just within U.S. entities in the geospatial domain
  ‣ NOAA: oceanic, climate, water quality, ...
  ‣ NASA: ozone, air quality, tropical, ...
  ‣ NRCS: land quality, watershed, ...
Data Repositories
Web or Data Grid Infrastructure
Mass Storage Systems (MSS)
Scientific Data Sets
• Data sets are typically low level, i.e.,
  ‣ Unstructured or semi-structured
    0101071895 0.34 -2.45 0.50 -0.65 -0.62 -0.71 0.00 -0.96
    0101071896 -1.71 0.49 0.27 -0.79 -1.53 0.60 0.09 -2.21
    0101071897 -0.53 0.14 4.32 1.95 -1.55 -1.68 -1.32 -0.69
    0101071898 1.90 -2.64 -1.70 1.11 -2.18 -1.08 -0.53 -0.25
    0101071899 0.44 0.97 1.65 -0.71 -2.02 -2.10 -0.50 -2.03
    0101071900 -1.65 1.19 -1.34 0.57 -1.37 7.00 -0.48 -1.77
    . . .
• However, data is well-documented
  ‣ Accompanying XML-based metadata describing data sets is typically required in today’s repositories
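To illustrate why the accompanying metadata matters, the sketch below parses records shaped like the sample above. The attribute names, and the idea that a metadata file would supply them, are assumptions made for illustration; the slides do not say what the columns mean.

```python
# Hypothetical sketch: giving names to low-level, whitespace-delimited records.
# The attribute names are invented; the slides do not state what the columns mean.

SAMPLE = """\
0101071895 0.34 -2.45 0.50 -0.65 -0.62 -0.71 0.00 -0.96
0101071896 -1.71 0.49 0.27 -0.79 -1.53 0.60 0.09 -2.21
"""

# Stand-in for the column description an XML metadata file might provide.
ATTRIBUTES = ["record_id"] + [f"value_{i}" for i in range(1, 9)]


def parse_records(text, attributes):
    """Turn whitespace-delimited lines into dicts keyed by attribute name."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(attributes):
            raise ValueError(f"expected {len(attributes)} fields, got {len(fields)}")
        # Keep the identifier as a string so its leading zero survives.
        record = {attributes[0]: fields[0]}
        record.update(
            {name: float(raw) for name, raw in zip(attributes[1:], fields[1:])}
        )
        records.append(record)
    return records


records = parse_records(SAMPLE, ATTRIBUTES)
```

Without such a description the file is only a grid of numbers; with it, each value can be tied to a named attribute that a query can refer to.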
Data Repositories
Mass Storage Systems (MSS)
Grid/Web Services & portals
Web or Data Grid Infrastructure
Data Repositories on a Global Scale
US EU
AU ...
What Do the Users Want?
US
EU
AU
...
I don’t care where data is located.
I also want to share my own data with others!
Don’t just give me the data, but...
- Transform it
- Manipulate it
- Compose it with other processes and data sets
And do this with the least amount of work required from me!
System Goals
• To enable queries over low level data sets, which involves:
  ‣ identification of relevant data sets
  ‣ automatic planning for the composition of dependent services (processes) for derivation
• ... while being non-intrusive to existing schemes, i.e.,
  ‣ avoids a standardized format for storing data sets
  ‣ accommodates heterogeneous metadata
  ‣ this system should fit into existing MSS and scientific computing infrastructures (Data Grid & the Web)
That’s good and all, but...
Challenges
• Not without challenges...
  ‣ dealing with metadata from multiple entities
  ‣ efficiently identifying relevant data sets
  ‣ planning and executing accurate service compositions on the spot
• And without question, the need for DOMAIN KNOWLEDGE & SEMANTICS
The AUSPICE System
AUSPICE: Automatic Service Planning and Execution in Cloud/Grid Environments
The Semantics Layer
A Need for Domain Level Knowledge
• Assume the following service retrieves a satellite image at location (x, y), with the resolution determined by r
    get_sat_image(double x, double y, double r)
• Questions to ask the system:
  ‣ How to deduce that this service can be used?
  ‣ How to determine what information is needed for input?
  ‣ Did the user provide enough information to invoke this service?
[Figure: the domain concepts longitude, latitude, and grid_size are linked to get_sat_image by inputsTo edges; get_sat_image is linked to the concept satellite image by an outputsTo edge]
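One way to picture how such annotations answer the three questions is the minimal sketch below. The dictionary layout and function names are hypothetical, not the AUSPICE interface, and the pairing of x, y, r with longitude, latitude, grid_size is assumed from the order in which they appear on the slide.

```python
# Hypothetical sketch of semantic annotations for get_sat_image.
# Data layout and function names are illustrative, not the AUSPICE interface.

# Each service parameter is annotated with the domain concept it represents
# (pairing assumed from the order shown on the slide).
SERVICE_INPUTS = {
    "get_sat_image": {"x": "longitude", "y": "latitude", "r": "grid_size"},
}
# Each service is annotated with the domain concept it produces.
SERVICE_OUTPUTS = {"get_sat_image": "satellite image"}


def services_producing(concept):
    """Which services can be used to derive the requested concept?"""
    return [name for name, out in SERVICE_OUTPUTS.items() if out == concept]


def required_concepts(service):
    """What information is needed to invoke the service?"""
    return sorted(SERVICE_INPUTS[service].values())


def missing_inputs(service, provided):
    """Which required concepts did the user not supply?"""
    return [c for c in required_concepts(service) if c not in provided]


candidates = services_producing("satellite image")
needed = required_concepts("get_sat_image")
missing = missing_inputs("get_sat_image", {"longitude": -83.0, "latitude": 40.0})
```

Here a query that supplies only a longitude and latitude would be flagged as lacking a grid_size before the service is ever invoked.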
In the Semantics Layer
Applying Domain Information
Domain concepts can be derived from executing a service
Domain concepts can also be derived from retrieving an existing data set
Service parameters represent different domain concepts
Data Registration Service
Indexing Data Sets
• Handling heterogeneous metadata
• For instance, just within the geospatial domain,
Country    Metadata Standards
US         CSDGM
AU, NZ     ANZLIC
EU         ???
CDN        ???
...        ...
• Metadata Transformation (transform to spatial index)
• Metadata to DB transformations (insert)
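The registration step can be sketched roughly as follows: pull a few attributes out of an XML metadata document and insert them into a database table that can be searched by location. This is only an illustration under stated assumptions. The element names are believed to follow CSDGM short names but should be verified, the table schema and file location are invented, and a plain SQLite table with range predicates stands in for a real spatial index.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical metadata document. Element names are believed to follow
# CSDGM short names; treat them as an assumption.
METADATA_XML = """
<metadata>
  <idinfo>
    <citation><citeinfo><title>Example coastline data</title></citeinfo></citation>
    <spdom><bounding>
      <westbc>-84.0</westbc><eastbc>-82.0</eastbc>
      <northbc>42.0</northbc><southbc>41.0</southbc>
    </bounding></spdom>
  </idinfo>
</metadata>
"""


def extract_bounds(xml_text):
    """Pull the title and bounding box out of one metadata document."""
    root = ET.fromstring(xml_text)
    title = root.findtext("./idinfo/citation/citeinfo/title")
    box = root.find("./idinfo/spdom/bounding")
    west, east, south, north = (
        float(box.findtext(tag)) for tag in ("westbc", "eastbc", "southbc", "northbc")
    )
    return title, west, east, south, north


def register(conn, location, xml_text):
    """Insert the extracted attributes so the data set can be found later."""
    conn.execute(
        "INSERT INTO datasets VALUES (?, ?, ?, ?, ?, ?)",
        (location, *extract_bounds(xml_text)),
    )


def find_covering(conn, lon, lat):
    """Return locations of data sets whose bounding box contains the point."""
    rows = conn.execute(
        "SELECT location FROM datasets "
        "WHERE west <= ? AND east >= ? AND south <= ? AND north >= ?",
        (lon, lon, lat, lat),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasets "
    "(location TEXT, title TEXT, west REAL, east REAL, south REAL, north REAL)"
)
register(conn, "file:///data/example.dat", METADATA_XML)  # invented location
```

A second metadata standard would need only its own `extract_bounds` variant; the table and the lookup stay the same, which is one way to read "unifying" heterogeneous metadata.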
In the Semantics Layer
Applying Domain Information
Data registration simplifies identification process within
Indexing Services
• Services (inputs, outputs) are also registered in much the same way
The Planning Layer
Service Composition: An Example
A subset of the ontology (unrolled)
The Planning Layer
Service Composition
begin compSrvc(concept, Q[...])
    W := ()
    // perform DFS starting from concept
    let v := concept be the currently visited node
    if v is a data type then
        W := (W, index.getData(v, Q))
    else // v is a service
        let (p1, .., pn) be v’s params
        // recursive call on each pi
        W := (W, (v, compSrvc(p1, Q), ..., compSrvc(pn, Q)))
    end if
    return W
end
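The pseudocode above can be rendered as a small runnable sketch. The ontology contents, the `Index` class, and all service and file names below are invented for illustration; the function returns the nested plan directly rather than accumulating it in W, which is a simplification of the slide's notation.

```python
# Hypothetical ontology: a node is either a service (listed here with the
# concepts its parameters need) or a data type resolved through the index.
SERVICES = {
    "extract_shoreline": ["coastal_image", "interpolate_water_level"],
    "interpolate_water_level": ["gauge_readings"],
}


class Index:
    """Stand-in for the data registration index; contents are made up."""

    def __init__(self, entries):
        self.entries = entries

    def get_data(self, data_type, query):
        # Return registered data sets of this type that match the query region.
        return [
            location
            for location, region in self.entries.get(data_type, [])
            if region == query["region"]
        ]


def comp_srvc(concept, query, index):
    """Depth-first composition, mirroring compSrvc from the slide."""
    if concept not in SERVICES:  # the visited node is a data type
        return index.get_data(concept, query)
    # The visited node is a service: recursively derive each parameter.
    return (concept, *(comp_srvc(p, query, index) for p in SERVICES[concept]))


index = Index({
    "coastal_image": [("img_erie.dat", "lake_erie")],
    "gauge_readings": [("gauge_erie.dat", "lake_erie"), ("gauge_huron.dat", "lake_huron")],
})
plan = comp_srvc("extract_shoreline", {"region": "lake_erie"}, index)
```

The resulting tuple reads as an execution plan: each service name is followed by the plans or data sets that supply its parameters, so it can be evaluated bottom-up.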
The Planning Layer
Service Composition: An Example
Ontology (unrolled)
A Derived Execution Plan
This is what data registration provides
Planning Times
Conclusion
• The AUSPICE System...
  ‣ unifies heterogeneous metadata
  ‣ extracts certain metadata attributes and indexes low level data sets and services for fast access from distributed repositories
  ‣ automatically composes these services and data sets to answer user queries
• Questions - Comments?
  ‣ David Chiu [email protected]
  ‣ Gagan Agrawal [email protected]