Peter Bajcsy, Rob Kooper, Luigi Marini, Barbara Minsker and Jim Myers
National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign (UIUC)
POC: Peter Bajcsy, email: [email protected]
CyberIntegrator: A Meta-Workflow System Designed for Solving Complex Scientific Problems using Heterogeneous Tools
Outline
• Problem Formulation
  – Meta-Workflow Definitions
  – Past Work
• Design
  – Workflow Requirements Driven by Environmental Observatories
  – Architecture of NCSA Meta-workflow Prototype Called CyberIntegrator
• Implementation
  – Key Capabilities of CyberIntegrator
• Use Cases
  – Environmental and Hydrological Engineering
• Summary
Problem Formulation
Science Problem Formulation
System Problem Formulation
Workflow Problem Formulation
Meta-Workflow Definition
• Meta-workflow (MWF) definitions in the past:
  – (1) Workflow aspect: a workflow is an aggregation of tasks; a meta-workflow is an aggregation of workflows or a hierarchy of workflows
  – (2) Process management aspect: large activities have to be integrated, executed and evaluated in the process of conducting electronic commerce
• Our meta-workflow definition spans multiple dimensions:
  – (1) hierarchical structure and organization of software
    • combinatorial explosion of module connections
  – (2) heterogeneity of software tools and computational resources
    • the many different engines and software applications that people use, each for a reason
  – (3) usability of tool and workflow interfaces
  – (4) community sharing of fragments and user-friendly security
  – (5) community knowledge and provenance
  – (6) execution and built-in fault tolerance, etc.
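The workflow aspect of these definitions (a workflow aggregates tasks; a meta-workflow aggregates or nests workflows) can be illustrated with a small recursive data structure. This is a sketch for illustration, not CyberIntegrator's actual data model: the `Task` and `Workflow` classes and their `run` and `depth` methods are hypothetical names, and each step is assumed to pass a single value on to the next.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Task:
    """A single step: a named function applied to one value (hypothetical)."""
    name: str
    fn: Callable[[object], object]

    def run(self, value: object) -> object:
        return self.fn(value)


@dataclass
class Workflow:
    """An ordered aggregation of tasks and/or nested workflows.

    A Workflow whose steps include other Workflows plays the role of a
    meta-workflow, i.e. a hierarchy of workflows.
    """
    name: str
    steps: List[Union["Task", "Workflow"]] = field(default_factory=list)

    def run(self, value: object) -> object:
        # Execute steps in order, feeding each output into the next step.
        for step in self.steps:
            value = step.run(value)
        return value

    def depth(self) -> int:
        # 1 for a flat workflow, plus one for each level of nested workflows.
        nested = [s.depth() for s in self.steps if isinstance(s, Workflow)]
        return 1 + max(nested, default=0)


if __name__ == "__main__":
    preprocess = Workflow("preprocess", [Task("scale", lambda x: x * 2),
                                         Task("shift", lambda x: x + 1)])
    analyze = Workflow("analyze", [Task("square", lambda x: x ** 2)])
    meta = Workflow("meta", [preprocess, analyze])
    print(meta.run(3), meta.depth())  # 49 2
```

Because a `Workflow` can appear wherever a `Task` can, hierarchies of any depth compose without special cases, which is also where the combinatorial growth in possible module connections comes from.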
Previous Work
• Other efforts:
  – Business process workflow architectures (FlowMark, WSFL and BPEL), serving the business community
  – Scientific workflow architectures: DAGMan, Taverna, SciFlo, Kepler, D2K, OGRE, CCA, Pegasus, GridFlow and Grid Ant, Triana, and GSFL
• Comparison:
  – Our work focuses on simple end-user interactions with information technologies while using all execution mechanisms transparently (workflow by example).
  – Our work connects provenance to recommendation pipelines for the benefit of a community (recommendations based on provenance information).
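One simple way to turn provenance into recommendations is to count, across recorded user sessions, which tool most often follows the current one. This is a minimal first-order sketch assuming provenance is available as ordered lists of tool names; it is not the recommendation method used in CyberIntegrator, and the function names and sample traces are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def build_transition_counts(traces: Sequence[Sequence[str]]) -> Dict[str, Counter]:
    """Count how often tool B directly followed tool A across provenance traces."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for trace in traces:
        for current_tool, next_tool in zip(trace, trace[1:]):
            counts[current_tool][next_tool] += 1
    return counts


def recommend_next(counts: Dict[str, Counter], tool: str, k: int = 2) -> List[str]:
    """Return up to k tools most often run right after `tool` (ties keep first-seen order)."""
    if tool not in counts:  # membership test does not create a defaultdict entry
        return []
    return [name for name, _ in counts[tool].most_common(k)]


if __name__ == "__main__":
    # Hypothetical provenance traces: each is the ordered list of tools one user ran.
    traces = [
        ["load_csv", "clean", "regress", "plot"],
        ["load_csv", "clean", "plot"],
        ["load_raster", "clean", "regress"],
    ]
    counts = build_transition_counts(traces)
    print(recommend_next(counts, "clean"))  # ['regress', 'plot']
```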
Research Topics
• Data Translations: Semantic and syntactic mapping of data structures
• Provenance Information: Granularity of gathered provenance information for recommendations, auditing and re-construction
• HCI: User interface design issues and community dependencies
• Meta-Data: Federation of distributed (data, tool, computational resource) registries
• Execution: Just-in-time data delivery with respect to remote computing; cost-benefit analysis of data transfer vs. CPU requirements; execution triggered by streaming data
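The cost-benefit question in the Execution topic can be framed as comparing estimated completion times: running locally costs only compute time, while running remotely adds the time to move input and output data. The sketch assumes a deliberately simple model (fixed throughput in GFLOP/s, fixed bandwidth in Gbit/s, no queue wait or latency); the `Job` fields and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    data_gb: float     # input that must be moved before remote execution
    result_gb: float   # output that must be moved back afterwards
    work_gflop: float  # total computation required


def local_time_s(job: Job, local_gflops: float) -> float:
    # Data is assumed to be local already, so only compute time counts.
    return job.work_gflop / local_gflops


def remote_time_s(job: Job, remote_gflops: float, bandwidth_gbit_s: float) -> float:
    # Transfer time (GB * 8 = gigabits) plus remote compute time.
    transfer_s = (job.data_gb + job.result_gb) * 8 / bandwidth_gbit_s
    return transfer_s + job.work_gflop / remote_gflops


def choose_site(job: Job, local_gflops: float, remote_gflops: float,
                bandwidth_gbit_s: float) -> str:
    """Return 'remote' only if it is estimated to finish strictly sooner."""
    local = local_time_s(job, local_gflops)
    remote = remote_time_s(job, remote_gflops, bandwidth_gbit_s)
    return "remote" if remote < local else "local"


if __name__ == "__main__":
    data_heavy = Job(data_gb=100, result_gb=1, work_gflop=1_000)
    cpu_heavy = Job(data_gb=1, result_gb=0.1, work_gflop=1_000_000)
    print(choose_site(data_heavy, 10, 1_000, 1))  # local
    print(choose_site(cpu_heavy, 10, 1_000, 1))   # remote
```

With these illustrative numbers, the data-heavy job stays local (100 s versus 809 s, almost all of it transfer), while the compute-heavy job goes to the faster remote resource (about 1,009 s versus 100,000 s).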
Design
Design Goals
• Make scientific discoveries easier
  – Workflow by example (step-by-step experimentation)
  – Design friendly user interfaces
  – Build seamless access to heterogeneous data/tools/resources
  – Provide data and process provenance information
  – Recommend data, tools and computational resources
  – Derive higher-level semantic tools
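Workflow by example means the user experiments step by step and the system captures those steps so they can be rerun as a workflow. A minimal sketch of that idea, assuming tools are Python callables that take the data as their first argument plus keyword parameters; `ExampleRecorder`, `run_step` and `replay` are hypothetical names, not CyberIntegrator APIs.

```python
from typing import Any, Callable, Dict, List, Tuple


class ExampleRecorder:
    """Captures interactively executed steps so they can be replayed (hypothetical)."""

    def __init__(self, tools: Dict[str, Callable[..., Any]]) -> None:
        self.tools = tools
        self.steps: List[Tuple[str, Dict[str, Any]]] = []  # (tool name, parameters)

    def run_step(self, tool_name: str, data: Any, **params: Any) -> Any:
        # Run the tool immediately so the user sees the result, then record the step.
        result = self.tools[tool_name](data, **params)
        self.steps.append((tool_name, dict(params)))
        return result

    def replay(self, data: Any) -> Any:
        # Apply the recorded steps, in order, to new input data.
        for tool_name, params in self.steps:
            data = self.tools[tool_name](data, **params)
        return data


if __name__ == "__main__":
    tools = {
        "threshold": lambda xs, cutoff: [x for x in xs if x >= cutoff],
        "scale": lambda xs, factor: [x * factor for x in xs],
    }
    rec = ExampleRecorder(tools)
    kept = rec.run_step("threshold", [1, 5, 10], cutoff=5)  # [5, 10]
    scaled = rec.run_step("scale", kept, factor=2)          # [10, 20]
    print(rec.replay([2, 7, 3]))                            # [14]
```

A step that raises an error is never appended, so the recorded list only contains steps that actually succeeded during experimentation.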
Meta-workflow Architecture
Implementation
Meta-Workflow Features
• Workflow by example
• Support for heterogeneous executors
  – Workflows: GeoLearn, D2K, Kepler/Ptolemy
  – Applications: MS Excel, Im2Learn, ArcGIS
  – Web services: D2KWS
• Provenance
  – Gathering & meta-data repositories
• Recommendations
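Support for heterogeneous executors suggests a common interface that each engine adapter implements, plus a dispatcher that routes each step and records provenance as a side effect. In this sketch the only adapter runs in-process Python callables; adapters for engines such as MS Excel, D2K or ArcGIS would be separate implementations of the same interface and are not shown. `Executor`, `PythonFunctionExecutor` and `Dispatcher` are hypothetical names.

```python
import time
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class Executor(ABC):
    """Common interface every engine adapter implements (hypothetical)."""

    @abstractmethod
    def execute(self, tool: str, inputs: Dict[str, Any]) -> Any:
        ...


class PythonFunctionExecutor(Executor):
    """Stand-in adapter that runs tools implemented as Python callables."""

    def __init__(self, functions: Dict[str, Callable[..., Any]]) -> None:
        self.functions = functions

    def execute(self, tool: str, inputs: Dict[str, Any]) -> Any:
        return self.functions[tool](**inputs)


class Dispatcher:
    """Routes each step to the executor registered for its engine and logs provenance."""

    def __init__(self) -> None:
        self.executors: Dict[str, Executor] = {}
        self.provenance: List[Dict[str, Any]] = []

    def register(self, engine: str, executor: Executor) -> None:
        self.executors[engine] = executor

    def run(self, engine: str, tool: str, **inputs: Any) -> Any:
        if engine not in self.executors:
            raise KeyError(f"no executor registered for engine {engine!r}")
        started = time.time()
        output = self.executors[engine].execute(tool, inputs)
        # Minimal provenance record: what ran, on which engine, with what inputs and output.
        self.provenance.append({"engine": engine, "tool": tool, "inputs": dict(inputs),
                                "output": output, "started": started})
        return output


if __name__ == "__main__":
    d = Dispatcher()
    d.register("stats", PythonFunctionExecutor({"mean": lambda values: sum(values) / len(values)}))
    d.register("text", PythonFunctionExecutor({"upper": lambda s: s.upper()}))
    print(d.run("stats", "mean", values=[1, 2, 3]))      # 2.0
    print(d.run("text", "upper", s="ok"))                # OK
    print([record["tool"] for record in d.provenance])  # ['mean', 'upper']
```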
Meta-workflow Editor
Use Cases
Meta-Workflow R&D Drivers
• Community drivers:
  – Environmental Science: CLEANER
  – Hydrological Science: CUAHSI
• Science drivers:
  – Environmental Modeling of Nutrient Distribution
    • Monte Carlo simulations of the maximum amount of pollution that a water body can receive each day and still retain its uses
  – Understanding the Dynamic Evolution of Land-Surface Variables in the Illinois River Basin
    • Data-driven analyses of multi-variable relationships from remote sensing data
• Technology drivers:
  – Collaboratory Cyberenvironments
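The nutrient-distribution driver estimates the maximum daily pollutant load a water body can receive. A much-simplified Monte Carlo sketch: treat the allowable load as the concentration limit times streamflow, sample streamflow from an assumed lognormal distribution, and report a low percentile, because low flows dilute less and therefore allow the smallest load. The distribution, its parameters, the 10th-percentile choice and all function names are assumptions for illustration; the models used in the use case are not described in this section.

```python
import math
import random
from typing import List


def allowable_load_kg_per_day(conc_limit_mg_per_l: float, flow_m3_per_s: float) -> float:
    # 1 mg/L = 1 g/m^3, so limit * flow is g/s; * 86,400 s/day / 1,000 g/kg = * 86.4.
    return conc_limit_mg_per_l * flow_m3_per_s * 86.4


def monte_carlo_allowable_load(conc_limit_mg_per_l: float, flow_median_m3_per_s: float,
                               flow_log_sigma: float, n: int = 10_000,
                               percentile: float = 0.10, seed: int = 0) -> float:
    """Low-percentile (conservative) daily load estimate under uncertain streamflow.

    Flow is sampled from a lognormal distribution whose median is
    flow_median_m3_per_s (an assumption made only for this sketch).
    """
    rng = random.Random(seed)  # seeded for reproducible results
    mu = math.log(flow_median_m3_per_s)  # lognormal median = exp(mu)
    loads: List[float] = sorted(
        allowable_load_kg_per_day(conc_limit_mg_per_l, rng.lognormvariate(mu, flow_log_sigma))
        for _ in range(n)
    )
    return loads[int(percentile * (n - 1))]


if __name__ == "__main__":
    print(allowable_load_kg_per_day(10.0, 5.0))         # load at the median flow
    print(monte_carlo_allowable_load(10.0, 5.0, 0.5))   # conservative estimate
```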
Summary
• The problem of designing a highly interactive scientific meta-workflow system is very complex
• Key capabilities of our meta-workflow prototype implementation called CyberIntegrator were demonstrated with two use cases.
• We plan to build and deploy a practical tool for multiple communities.
• Publications:
  – Image Spatial Data Analysis Group at NCSA
  – URL: http://isda.ncsa.uiuc.edu
• Questions:
  – Peter Bajcsy; Email: [email protected]
Hydro-informatics
Backup
Meta-workflow System Information
Terminology
• Engines are stand-alone environments and applications that are used by many tools
  – Examples: Matlab, MS Excel, D2K, Im2Learn, ArcGIS, Kepler
• Tools are solutions specific to a problem and consist of several algorithms
  – Examples: Image Calculator in Im2Learn, pie chart visualization in MS Excel, …
• Algorithms are code fragments that perform a specific operation in a tool
  – Examples: image addition operation in Image Calculator
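The engine, tool and algorithm terminology forms a three-level containment hierarchy. The sketch below models it with plain data classes, reusing the Im2Learn / Image Calculator / image addition example from the list; the class names, the `locate` helper and the nested-list image representation are hypothetical and do not reflect how Im2Learn is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Image = List[List[float]]  # grayscale image as rows of pixel values (assumption)


@dataclass
class Algorithm:
    """A code fragment performing one specific operation."""
    name: str
    fn: Callable[..., object]


@dataclass
class Tool:
    """A problem-specific solution built from several algorithms."""
    name: str
    algorithms: Dict[str, Algorithm] = field(default_factory=dict)


@dataclass
class Engine:
    """A stand-alone environment or application that hosts many tools."""
    name: str
    tools: Dict[str, Tool] = field(default_factory=dict)


def locate(engines: List[Engine], algorithm: str) -> Optional[str]:
    """Return 'engine/tool/algorithm' for the first match, or None if absent."""
    for engine in engines:
        for tool in engine.tools.values():
            if algorithm in tool.algorithms:
                return f"{engine.name}/{tool.name}/{algorithm}"
    return None


def add_images(a: Image, b: Image) -> Image:
    # Pixel-wise addition of two equally sized images.
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


im2learn = Engine("Im2Learn", {
    "Image Calculator": Tool("Image Calculator", {"add": Algorithm("add", add_images)}),
})

if __name__ == "__main__":
    print(locate([im2learn], "add"))  # Im2Learn/Image Calculator/add
```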
Environmental Science
Hydrological Science