Grid-enabled Collaborative Research Applications
Internet2 Member Meeting, Spring 2003
Sara J. Graves
Director, Information Technology and Systems Center
University Professor, Computer Science Department
University of Alabama in Huntsville
Director, Information Technology Research Center
National Space Science and Technology Center
256-824-6064
http://www.itsc.uah.edu
“…drowning in data but starving for knowledge”
[Diagram: User Community, Information]
Data glut affects business, medicine, military, and science.
How do we leverage data to make BETTER decisions?
Collaborative Research Applications
• Enabling Technologies for Collaborative Research
  – Grid-Enabled Data Mining Services
  – Interchange Technology Mark-ups
  – Collaboration Tools
• Collaborative Research Applications on the Grid
  – TeraGrid Expeditions
  – Linked Environments for Atmospheric Discovery
  – Propulsion Research: Rocket Engine Advancement Program 2
Data Mining
• Automated discovery of patterns and anomalies in vast observational data sets
• Derived knowledge for decision making, predictions and disaster response
• ADaM – Algorithm Development and Mining System
http://datamining.itsc.uah.edu
Mining Environment: When, Where, Who and Why?
[Diagram center: Data Mining]
WHEN
• Real Time
• On-Ingest
• On-Demand
• Repeatedly
WHERE
• User Workstation
• Data Mining Center
• GRID
WHO
• End Users
• Domain Experts
• Mining Experts
WHY
• Event
• Relationship
• Association
• Corroboration
• Collaboration
Iterative Nature of the Data Mining Process
[Diagram: Data → Preprocessing (Cleaning and Integration; Selection and Transformation) → Mining → Discovery (Evaluation and Presentation) → Knowledge]
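The loop implied by the diagram can be sketched in Python. This is a minimal illustration, not ADaM code: the stage functions, the z-score anomaly rule, the acceptance criterion, and the parameter step are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def clean_and_integrate(*sources):
    """Cleaning and integration: merge sources and drop missing readings (None)."""
    return [x for src in sources for x in src if x is not None]


def select_and_transform(values, lo, hi):
    """Selection and transformation: keep plausible values, then standardize."""
    kept = [v for v in values if lo <= v <= hi]
    if not kept:
        return []
    mu, sigma = mean(kept), pstdev(kept)
    return [(v - mu) / sigma if sigma else 0.0 for v in kept]


def mine(zscores, k):
    """Mining: flag positions whose |z-score| exceeds k (a simple anomaly rule)."""
    return [i for i, z in enumerate(zscores) if abs(z) > k]


def evaluate(anomalies, n, max_fraction):
    """Evaluation: accept only if anomalies are rare enough to be interesting."""
    return len(anomalies) <= max_fraction * n


def run(sources, lo, hi, k=1.0, max_fraction=0.2, step=0.5, max_iter=10):
    """Run the stages, looping back to mining with a stricter k until accepted."""
    z = select_and_transform(clean_and_integrate(*sources), lo, hi)
    for attempt in range(max_iter):
        anomalies = mine(z, k)
        if evaluate(anomalies, len(z), max_fraction) or attempt == max_iter - 1:
            return k, anomalies
        k += step  # iterate: adjust the mining parameter and try again
```

Tightening the selection range (for example `hi=40` instead of `hi=100`) changes what the mining stage sees, which is the kind of loop back through earlier stages the diagram describes.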
ADaM Engine Architecture
Data flow: Data → Input → Translated Data → Processing (Preprocessing → Preprocessed Data → Analysis) → Patterns/Models → Output → Results
Input: HDF, HDF-EOS, GIF, PIP-2, SSM/I Pathfinder, SSM/I TDR, SSM/I NESDIS Lvl 1B, SSM/I MSFC Brightness Temp, US Rain, Landsat, ASCII Grass, Vectors (ASCII Text), Intergraph Raster, Others...
Processing
• Preprocessing
  – Selection and Sampling: Subsetting, Subsampling, Select by Value, Coincidence Search
  – Grid Manipulation: Grid Creation, Bin Aggregate, Bin Select, Grid Aggregate, Grid Select, Find Holes
  – Image Processing: Cropping, Inversion, Thresholding
  – Others...
• Analysis
  – Clustering: K Means, Isodata, Maximum
  – Pattern Recognition: Bayes Classifier, Min. Dist. Classifier
  – Image Analysis: Boundary Detection, Cooccurrence Matrix, Dilation and Erosion, Histogram Operations, Polygon Circumscript, Spatial Filtering, Texture Operations
  – Genetic Algorithms
  – Neural Networks
  – Others...
Output: GIF Images, HDF-EOS, HDF Raster Images, HDF SDS, Polygons (ASCII, DXF), SSM/I MSFC Brightness Temp, TIFF Images, Others...
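K Means is one of the clustering modules listed above. The sketch below is a generic Lloyd's-algorithm k-means in plain Python, shown only to illustrate what such a module computes. It is not ADaM's implementation, and seeding with the first k points is a simplifying assumption.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Seeding with the first k points is a simplifying assumption; practical
    code would use random restarts or k-means++ seeding.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels
```

Each iteration alternates an assignment step and an update step until no point changes cluster.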
Mining Environments
• Multilevel Mining (ADaM)
  – Complete System (Client and Engine)
  – Mining Engine (user provides their own client)
  – Application Specific Mining Systems
  – Operations Tool Kit
  – Stand Alone Mining Algorithms
  – Data Fusion
• Distributed/Federated Mining
  – Distributed services
  – Distributed data
  – Chaining using Interchange Technologies
• On-board Mining (EVE)
  – Real time and distributed mining
  – Processing environment constraints
Grid-Enabled Data Mining Services
• Distributed researchers, data sources, storage and computational resources in a secure environment
• ADaM data mining modules as Open Grid Services Architecture (OGSA) services
Data Mining / Earth Science Collaboration: Tropical Cyclone Detection
Advanced Microwave Sounding Unit (AMSU-A) Data
Calibration / Limb Correction / Conversion to Brightness Temperature (Tb)
Mining Environment
Data Archive
Result
Results are placed on the web, made available to the National Hurricane Center and the Joint Typhoon Warning Center, and stored for further analysis.
Mining Plan:
• Water cover mask to eliminate land
• Laplacian filter to compute temperature gradients
• Science algorithm to estimate wind speed
• Contiguous regions with wind speeds above a desired threshold identified
• Additional test to eliminate false positives
• Maximum wind speed and location produced
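The mining plan above can be sketched as one pass over a gridded brightness-temperature field. Everything specific here is an assumption: the 4-neighbour Laplacian, the linear `a * |Laplacian| + b` wind-speed model (a stand-in for the science algorithm, which the slide does not specify), and the minimum-region-size rule used as the false-positive test are placeholders chosen for illustration.

```python
from collections import deque


def laplacian(tb, r, c):
    """Discrete 4-neighbour Laplacian of brightness temperature at (r, c)."""
    return tb[r - 1][c] + tb[r + 1][c] + tb[r][c - 1] + tb[r][c + 1] - 4 * tb[r][c]


def detect_cyclones(tb, water, threshold, a=0.5, b=0.0, min_cells=2):
    """Return (max_wind, (row, col)) for each accepted region, strongest first."""
    rows, cols = len(tb), len(tb[0])
    wind = {}
    # Water mask + Laplacian filter + HYPOTHETICAL linear wind-speed model.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if water[r][c]:
                wind[(r, c)] = a * abs(laplacian(tb, r, c)) + b
    # Contiguous (4-connected) regions with wind speed above the threshold.
    strong = {cell for cell, w in wind.items() if w > threshold}
    seen, detections = set(), []
    for start in sorted(strong):
        if start in seen:
            continue
        seen.add(start)
        region, queue = [], deque([start])
        while queue:
            r, c = queue.popleft()
            region.append((r, c))
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in strong and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        # PLACEHOLDER false-positive test: discard tiny regions.
        if len(region) < min_cells:
            continue
        # Maximum wind speed and its location.
        peak = max(region, key=wind.get)
        detections.append((wind[peak], peak))
    return sorted(detections, reverse=True)
```

Masking a single land cell at the centre of a feature can split it into isolated fragments that the size test then rejects, which shows why the mask and the false-positive test interact.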
Hurricane Floyd
Further Analysis
http://pm-esip.msfc.nasa.gov/cyclone
Knowledge Base
Data Mining / Earth Science Collaboration: Classification Based on Texture Features
Cumulus cloud fields have a very characteristic texture signature in the GOES visible imagery
Science Rationale: Man-made changes to land use cause changes in weather patterns, especially cumulus clouds
Comparison based on
  – Accuracy of detection
  – Amount of time required to classify
Parallel Version of Cloud Extraction
[Diagram: GOES Image → Laplacian Filter, Sobel Horizontal Filter and Sobel Vertical Filter (applied in parallel) → Energy Computation (four parallel stages) → Classifier → Cloud Image]
• GOES images can be used to recognize cumulus cloud fields
• Cumulus clouds are small and do not show up well in the 4 km resolution IR channels
• Detection of cumulus cloud fields in GOES can be accomplished by using texture features or edge detectors
• Three edge-detection filters are used together to detect cumulus clouds; because each filter can be applied independently, the method lends itself to implementation on a parallel cluster
[Images: GOES Image; Cumulus Cloud Mask]
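The parallel decomposition can be sketched with Python's `concurrent.futures`. Assumptions: texture energy is taken as the mean squared filter response over an image tile, and the classifier is a hypothetical rule that labels a tile cumulus when the mean of the three energies exceeds a threshold. Threads stand in for cluster nodes; because of CPython's global interpreter lock they show the task structure rather than a real speedup, and `ProcessPoolExecutor` or a cluster scheduler would be the natural substitute.

```python
from concurrent.futures import ThreadPoolExecutor

# Standard 3x3 edge-detection kernels.
LAPLACIAN = ((0, 1, 0), (1, -4, 1), (0, 1, 0))
SOBEL_HORIZONTAL = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))
SOBEL_VERTICAL = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))


def filter3x3(img, kernel):
    """Apply a 3x3 kernel to every interior pixel (correlation form)."""
    h, w = len(img), len(img[0])
    return [[sum(kernel[i][j] * img[r + i - 1][c + j - 1]
                 for i in range(3) for j in range(3))
             for c in range(1, w - 1)]
            for r in range(1, h - 1)]


def energy(response):
    """Texture energy of a filter response: mean of squared values."""
    values = [v for row in response for v in row]
    return sum(v * v for v in values) / len(values)


def classify_tile(tile, threshold=100.0):
    """Filter a tile three ways in parallel, then apply a HYPOTHETICAL rule."""
    kernels = (LAPLACIAN, SOBEL_HORIZONTAL, SOBEL_VERTICAL)
    with ThreadPoolExecutor(max_workers=3) as pool:
        responses = list(pool.map(filter3x3, [tile] * 3, kernels))
        energies = list(pool.map(energy, responses))
    is_cumulus = sum(energies) / len(energies) > threshold
    return is_cumulus, energies
```

A speckled tile such as a checkerboard produces high Laplacian energy and is labelled cumulus, while a smooth brightness ramp produces only a modest Sobel response and is not.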
Data Mining / Earth Science Collaboration: Detecting Signatures
• Detecting mesocyclone signatures from radar data
• Science Rationale: A mesocyclone is an indicator of tornadic activity
• Developing an algorithm based on wind velocity shear signatures
  – Improve accuracy and reduce false alarm rates
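One common way to measure a rotational shear signature in Doppler radar data is the azimuthal derivative of radial velocity. The sketch below computes it with a centred difference on a polar grid. The grid layout, the wrap-around of a full 360-degree sweep, and the 0.01 1/s default threshold are assumptions for illustration, not the algorithm under development.

```python
import math


def azimuthal_shear(vel, ranges_m, dtheta_deg):
    """Centred-difference azimuthal shear (1/s) of Doppler radial velocity.

    vel[i][j] is radial velocity (m/s) at azimuth index i and range gate j,
    ranges_m[j] is the gate's distance from the radar (m), and dtheta_deg is
    the azimuthal spacing. A full 360-degree sweep is assumed, so azimuth
    indices wrap around.
    """
    n_az = len(vel)
    dtheta = math.radians(dtheta_deg)
    return [[(vel[(i + 1) % n_az][j] - vel[(i - 1) % n_az][j])
             / (2 * ranges_m[j] * dtheta)
             for j in range(len(ranges_m))]
            for i in range(n_az)]


def shear_signatures(vel, ranges_m, dtheta_deg, threshold=0.01):
    """List (azimuth_index, gate_index, shear) where |shear| >= threshold."""
    shear = azimuthal_shear(vel, ranges_m, dtheta_deg)
    return [(i, j, s)
            for i, row in enumerate(shear)
            for j, s in enumerate(row)
            if abs(s) >= threshold]
```

An inbound/outbound velocity couplet gives the strongest shear at the azimuth between the two lobes, plus weaker opposite-sign side responses; raising the threshold is one crude way to trade detections against false alarms.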
Data Mining / Space Science Collaboration:
Boundary Detection and Quantification
• Analysis of polar cap auroras in large volumes of spacecraft UV images
• Scientific Rationale:
  – Indicators to predict geomagnetic storms, which can
    • Damage satellites
    • Disrupt radio connections
• Developing different mining algorithms to detect and quantify polar cap boundary
[Figure: Polar Cap Boundary, panels A, B, C and D]
Data Mining / BioInformatics Collaboration:
Genome Patterns
[Diagram: Genome DB → Mining Engine (Input Modules, Analysis Modules, Output Modules) → Mining Results: MCSs]
Text Pattern Recognition: Used to search for text patterns in bioscience data as well as other text documents.
[Diagram: Scientists, Event/Relationship Search System, Knowledge Base]
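Text pattern searches of the kind described for bioscience data can be sketched with Python's `re` module. The example below is an illustration, not the search system shown above: it expands a DNA motif written with IUPAC ambiguity codes into a regular expression and uses a zero-width lookahead so that overlapping matches are reported.

```python
import re

# IUPAC nucleotide codes: the four bases, the two-base ambiguity codes, and N.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_pattern(motif):
    """Compile a motif into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def find_motif(sequence, motif):
    """Return 0-based start positions of every match of motif in sequence."""
    return [m.start() for m in motif_pattern(motif).finditer(sequence.upper())]
```

For example, `find_motif("ATATAT", "ATA")` reports both overlapping occurrences, at positions 0 and 2.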
Sensor Data Characteristics
• Many different formats, types and structures
• Different states of processing (raw, calibrated, derived, modeled or interpreted)
• Enormous volumes
• Heterogeneity leads to data usability problems
• One approach: Standard data formats
  – Difficult to implement and enforce
  – Can’t anticipate all needs
  – Some data can’t be modeled or is lost in translation
  – The cost of converting legacy data
• A better approach: Interchange Technologies
  – Earth Science Markup Language
The Problem
[Diagram: Data Formats 1, 2 and 3 each need their own reader (Reader 1, Reader 2) or a format converter before the application can use them]
The Solution
[Diagram: Data Formats 1, 2 and 3 are each described by an ESML file; the application reads all of them through the ESML library]
Interchange Technologies: Accessing Heterogeneous Data
What is ESML? It is a specialized markup language for Earth Science metadata based on XML, NOT another data format.
It is a machine-readable and -interpretable representation of the structure, semantics and content of any data file, regardless of data format
ESML description files contain external metadata that can be generated by either the data producer or the data consumer (at the collection, data set, and/or granule level)
ESML provides the benefits of a standard, self-describing data format (like HDF, HDF-EOS, netCDF, geoTIFF, …) without the cost of data conversion
ESML is the basis for core Interchange Technology that allows data/application interoperability
ESML complements and extends data catalogs such as FGDC and GCMD by providing the use/access information those directories lack.
http://esml.itsc.uah.edu
Components of the ESML Interchange Technology
ESML CONSISTS OF:
(1) MARKUPS (ESML file): external description file for datasets or formats
(2) RULES FOR THE MARKUPS (ESML schema): rules that govern the description of the data files
(3) MIDDLEWARE FOR AUTOMATION (ESML library): the library parses and interprets the description file and figures out how to read the data
[Diagram: Data Formats 1, 2, 3 and other formats, each paired with an ESML file that follows the ESML schema; the ESML library serves the ESML Editor, the ESML Data Browser, the ADaM Data Mining System and other applications]
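The division of labour above (a description file, rules for it, and a library that interprets it) can be illustrated with a toy reader in Python. The XML vocabulary below (`dataset`, `field`, `byteOrder`, `type`) is invented for this sketch and is not the real ESML schema; the point is only that the reading logic is driven by an external description rather than hard-coded per format.

```python
import struct
import xml.etree.ElementTree as ET

# HYPOTHETICAL description file: element and attribute names are invented
# for this sketch and do NOT follow the real ESML schema.
DESCRIPTION = """
<dataset name="example_granule" byteOrder="big">
  <field name="time" type="int32"/>
  <field name="lat" type="float32" units="degrees_north"/>
  <field name="lon" type="float32" units="degrees_east"/>
  <field name="tb" type="float32" units="K"/>
</dataset>
"""

STRUCT_CODES = {"int16": "h", "int32": "i", "float32": "f", "float64": "d"}
BYTE_ORDER = {"big": ">", "little": "<"}


def reader_from_description(xml_text):
    """Build a record reader from the description, not from format-specific code."""
    root = ET.fromstring(xml_text.strip())
    fields = list(root.iter("field"))
    names = [f.get("name") for f in fields]
    layout = BYTE_ORDER[root.get("byteOrder")] + "".join(
        STRUCT_CODES[f.get("type")] for f in fields)
    record = struct.Struct(layout)

    def read(raw):
        """Decode fixed-size binary records into dictionaries keyed by field name."""
        return [dict(zip(names, values)) for values in record.iter_unpack(raw)]

    return read
```

Describing a little-endian file would only require changing `byteOrder` in the description; the reader code stays the same.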
ESML in Numerical Modeling
[Diagram: GOES Skin Temp, Insolation Products, Soundings and other observational data, each described by an ESML file, are accessed across the network through the ESML Library and fed to the numerical models for prediction]
[Plot: Ch 5 Temperature (AMSU), Degree Kelvin, vs. Sea Surface Temperature (TMI), Degree Kelvin]
Scientists can:
• Select remote files across the network
• Select different observational data to increase the model prediction accuracy
Purpose:
• Use ESML to incorporate observational data into the numerical models for simulation
NUMERICAL WEATHER MODELS (MM5, ETA, RAMS)
Collaboration Tools
CAMEX-4 campaign
• Data acquisition and integration from multiple platforms and instruments for quick exploitation
• Intra-project communications before, during, and after CAMEX campaigns
• Collaborators included NASA, NOAA, USAF, and multiple universities
Technologies to coordinate complex projects
http://camex.msfc.nasa.gov
[Diagram: CAMEX-4 Distributed Mission Coordination. A Coordination Clearinghouse (RDBMS, data management, web-based interface) connects forecasters, radars, mission managers, the experiment PI, NASA managers (review status), NASA, NOAA and USAF aircraft, and aircraft crews (maintenance and status reports)]
Modeling Environment for Atmospheric Discovery (MEAD): Use of the TeraGrid Infrastructure
• Will develop/adapt a cyberinfrastructure that will enable simulation, data mining, and visualization of hurricanes and storms
• Will integrate model and grid workflow management, data management, model coupling, and analysis/mining of large ensemble datasets
• Argonne National Lab
• Georgia Tech University
• Indiana University
• Lawrence Berkeley National Lab
• NCSA
• NOAA/FSL
• NOAA/NSSL
• Northwestern University
• Ohio State University
• Oklahoma University
• Portland State University
• Rice University
• Rutgers
• UAH
• UCAR
• University of Wisconsin
• University of Minnesota
Primary MEAD Software Components
• WRF Model (Weather Research and Forecasting)
• ROMS Model (Regional Ocean Modeling System)
• Coupled WRF/ROMS Model
• D2K (Data to Knowledge)
• ADaM (Algorithm Development and Mining System)
• Visualization Engines (NCAR Graphics, Vis5D, IDV-VisAD, HVR, VTK)
• netCDF, HDF5, ESML
• Middleware (Globus, JavaCog, GridFTP)
• Metadata Catalogue Service
Example MEAD Workflow
[Diagram: Initial Setup (Initial Data and Parameters) → Model Execution (Multiple WRF Models (Weather) and Multiple ROMS Models (Ocean), linked by inter-model communications) → Model Results → Post Run Analysis (Data Mining (ADaM), Visualization)]
The Grid is needed to support the huge computational, data storage, and post-run analysis requirements.
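The three phases can be expressed as a dependency graph and scheduled stage by stage with the standard-library `graphlib` module (Python 3.9+). The task names and the choice to schedule each coupled WRF/ROMS pair as one task are assumptions made for this sketch; the actual MEAD workflow tooling is not described in these slides.

```python
from graphlib import TopologicalSorter


def mead_like_workflow(n_members):
    """HYPOTHETICAL task graph: {task: set of tasks it depends on}."""
    graph = {"initial_setup": set()}
    members = [f"coupled_wrf_roms_{i}" for i in range(1, n_members + 1)]
    for task in members:
        # A coupled WRF/ROMS pair exchanges data while it runs, so the pair
        # is scheduled as a single task that needs only the initial setup.
        graph[task] = {"initial_setup"}
    # Post-run analysis needs every ensemble member's model results.
    graph["data_mining_adam"] = set(members)
    graph["visualization"] = set(members)
    return graph


def stages(graph):
    """Group tasks into stages; tasks within a stage can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result
```

With two ensemble members the stages come out as setup, then both coupled runs side by side, then mining and visualization side by side; each stage is a natural unit to hand to Grid resources.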
Linked Environments for Atmospheric Discovery (LEAD)
Create for the university community an integrated, scalable framework for use in accessing, preparing, assimilating, predicting, managing, mining/analyzing, and displaying a broad array of meteorological and related information independent of format and physical location.
Collaborators:
– University of Oklahoma
– University of Alabama in Huntsville
– UCAR/Unidata
– Indiana University
– University of Illinois/NCSA
– Millersville University
– Howard University
– Colorado State University
LEAD Architecture
Layers: MyLEAD Portal, Application Services, Middleware, Grid and Web infrastructure
Components: Data Management, Workflow Management, Monitoring, Data Mining, Visualization tools, Models, MyLEAD Virtual Environment, Interchange Technologies, Workflow Orchestration, Semantics for data and services, Personal Data Space, Resource Allocation, Scheduling, Security, Others…
Distributed Resources: pools of workstations, clusters, national supercomputer facilities, tertiary storage, scientific instruments
Collaborative Environment for Propulsion Research:
Rocket Engine Advancement Program 2
• Consortium of propulsion research centers:
  – Auburn University
  – Purdue University
  – Pennsylvania State University
  – Tuskegee University
  – University of Alabama in Huntsville
  – University of Tennessee
  – NASA Marshall Space Flight Center
  – NASA Glenn Research Center
• Grid configuration will make distributed computational and data resources available to researchers without having to negotiate separate access to each resource.
• Linking or integration of multiple distributed experiment steps into a single investigation for more timely results and analysis.
• Will rely on the security capabilities of the Grid due to the sensitive nature of the propulsion research.
Collaborative Environment for Propulsion Research
Rocket Engine Advancement Program 2
[Diagram: REAP2 User Portal and REAP2 Grid Portal connecting Supercomputer Cluster(s), Test Equipment, and Data and Results]
Evolution of Frameworks for Advanced Applications
• Changing Computational Landscape
  – GRIDS
  – Clusters
  – Web Services
  – Pervasive Computing
  – On-Board Processing
• Middleware for applications on GRID/Clusters
  – Automate parallelization of mining tasks
  – Estimate resource requirements using the computational complexity of the algorithms
• Federated Model for Mining
  – Individual components that can be distributed and can execute across different platforms
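Estimating resource requirements from computational complexity can be sketched as follows. The operation-count models, the calibrated seconds-per-operation constant, and the assumption of ideal linear speedup with no communication cost are all hypothetical simplifications; real middleware would also have to account for data movement and load imbalance.

```python
import math

# HYPOTHETICAL operation-count models: big-O growth with constants dropped.
OPERATION_COUNT = {
    "kmeans": lambda n, p: n * p["k"] * p["iterations"],
    "pairwise_distances": lambda n, p: n * n,
}


def nodes_needed(algorithm, n_records, params, seconds_per_op, deadline_s, max_nodes):
    """Estimate nodes required to meet a deadline, assuming ideal linear speedup."""
    serial_seconds = OPERATION_COUNT[algorithm](n_records, params) * seconds_per_op
    nodes = max(1, math.ceil(serial_seconds / deadline_s))
    if nodes > max_nodes:
        raise ValueError(f"{algorithm}: needs {nodes} nodes, only {max_nodes} available")
    return nodes


def partition(n_records, nodes):
    """Split record indices into contiguous, nearly equal [start, end) chunks."""
    base, extra = divmod(n_records, nodes)
    chunks, start = [], 0
    for i in range(nodes):
        end = start + base + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks
```

The same data volume can be cheap for a linear-time algorithm and infeasible for a quadratic one, which is why complexity is a useful first input when deciding how to parallelize a mining task.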