Scientific Workflow Systems

Workflow Systems for Science: Concepts & Tools. Domenico Talia, ICAR-CNR and University of Calabria, 87036 Rende, Italy. By: Seyed Ziae Mousavi Mojab, Wayne State University, 2013

description

An introduction to scientific workflows and workflow management systems.

Transcript of Scientific Workflow Systems

Page 1: Scientific Workflow Systems

Workflow Systems for Science: Concepts & Tools

Domenico Talia, ICAR-CNR and University of Calabria, 87036 Rende, Italy

by: Seyed Ziae Mousavi Mojab, Wayne State University - 2013

Page 2: Scientific Workflow Systems

Agenda

● Introduction

● Main programming issues in the area of scientific workflows

● Significant WMSs

● Issues that are still open in the area of scientific workflows

● Conclusion

Page 3: Scientific Workflow Systems

Introduction

"workflows provide a declarative way of specifying the high-level logic of an application, hiding the low-level details that are not fundamental for application design."

Page 4: Scientific Workflow Systems

Introduction

● Scientific Workflow:

"A workflow is a well-defined, and possibly repeatable, pattern or systematic organization of activities designed to achieve a certain transformation of data."

Page 5: Scientific Workflow Systems

Introduction

Page 6: Scientific Workflow Systems

Introduction

Taverna, Pegasus, Triana, Askalon, Kepler, GWES, and Karajan, ...

● Scientific Workflows & open research issues

Page 7: Scientific Workflow Systems

Workflow Programming

● Textual programming interface

● Visual programming interface

Page 8: Scientific Workflow Systems

Workflow Programming

● Programming Structure:

- Directed Acyclic Graph (DAG)

- Directed Cyclic Graph (DCG)
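The difference between the two graph structures can be illustrated with a small sketch. It assumes a workflow is stored as a plain dictionary mapping each task to its successors; the task names and the `has_cycle` helper are hypothetical and are not taken from any of the systems discussed in this talk.

```python
def has_cycle(graph):
    """Return True if the directed graph (task -> list of successors) has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    nodes = set(graph) | {s for succs in graph.values() for s in succs}
    colour = {n: WHITE for n in nodes}

    def visit(node):
        colour[node] = GREY
        for succ in graph.get(node, []):
            if colour[succ] == GREY:  # back edge: we returned to the current path
                return True
            if colour[succ] == WHITE and visit(succ):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)


# A DAG: data flows strictly forward through the tasks.
pipeline = {"fetch": ["clean"], "clean": ["analyze"], "analyze": []}

# A DCG: the last task feeds back into the first (a loop).
iterative = {"guess": ["refine"], "refine": ["check"], "check": ["guess"]}

print(has_cycle(pipeline), has_cycle(iterative))
```

A DAG-only system can reject the second workflow at design time, while a system that supports DCGs has to give loops an explicit termination condition.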

Page 9: Scientific Workflow Systems

Workflow Programming

● Workflow Enactment:

- Efficiency: bind tasks to appropriate computing resources

- Robustness: detecting and recovering from failures

-- monitoring

-- checkpoints

-- step-by-step execution
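As a rough illustration of the robustness items on this slide, the sketch below retries a failing task and records finished tasks in a checkpoint file so that a restarted run can skip them. The function name, the JSON checkpoint format, and the `max_retries` parameter are assumptions made for this example; real workflow engines use their own mechanisms.

```python
import json
import os


def run_with_checkpoints(tasks, checkpoint_path, max_retries=2):
    """Run (name, callable) pairs in order, skipping tasks already checkpointed.

    Task results must be JSON-serializable in this simplified sketch.
    """
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)  # results saved by an earlier run

    for name, func in tasks:
        if name in done:
            continue  # recovered from the checkpoint, no need to rerun
        for attempt in range(max_retries + 1):
            try:
                done[name] = func()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)  # checkpoint after every completed task
    return done
```

Writing the checkpoint after every task keeps the recovery point fine-grained at the cost of extra I/O; a coarser policy would checkpoint only after expensive tasks.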

Page 10: Scientific Workflow Systems

Workflow Programming

● Workflow Design:

- Abstract Level: what has to be done at each task along with information about how tasks are interconnected

- Concrete Level: the implementation and/or resources to be used
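One way to picture the two design levels is a small mapping step. In this sketch the abstract workflow only names tasks and their connections, and a separate catalog supplies the implementation and resource for each task. The dictionary layout, the executable paths, and the site names are all hypothetical.

```python
# Abstract level: what each task does and how tasks are connected.
abstract_workflow = {
    "tasks": {"align": "sequence alignment", "plot": "render figure"},
    "edges": [("align", "plot")],
}

# Hypothetical catalog: which executable and resource implement each task.
catalog = {
    "align": {"executable": "/opt/tools/align", "site": "cluster-a"},
    "plot": {"executable": "/opt/tools/plot", "site": "local"},
}


def make_concrete(abstract, catalog):
    """Bind every abstract task to an implementation and a resource."""
    missing = [t for t in abstract["tasks"] if t not in catalog]
    if missing:
        raise KeyError(f"no implementation for: {missing}")
    return {
        "tasks": {
            name: dict(catalog[name], description=description)
            for name, description in abstract["tasks"].items()
        },
        "edges": list(abstract["edges"]),  # the structure is unchanged
    }


concrete_workflow = make_concrete(abstract_workflow, catalog)
print(concrete_workflow["tasks"]["align"])
```

Keeping the catalog separate means the same abstract workflow can be bound to different resources without being redesigned.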

Page 11: Scientific Workflow Systems

Scientific Workflow Management Systems

Workflow Management Systems (WMSs) are software environments that provide tools to define, compose, map, and execute workflows.

- Programming languages: BPEL, UML, Petri nets, XML-based, …

Page 12: Scientific Workflow Systems

Scientific Workflow Management Systems

● Workflow Design:

- Script-like Systems: GridAnt, Karajan, ...

- Graphical-based Systems

Page 13: Scientific Workflow Systems

Scientific Workflow Management Systems

● Taverna:

- Java based, open source, developed at the University of Manchester

- Supports the life sciences (design and execution of scientific workflows)

- Can invoke any web service through WSDL (code reusability)

- Can also invoke local Java services, APIs, … and import data from CSV files or Excel spreadsheets

Page 14: Scientific Workflow Systems

Scientific Workflow Management Systems

● Taverna Tools:

i. Taverna Engine

ii. Taverna Workbench

iii. Taverna Server

iv. A command line tool

Page 15: Scientific Workflow Systems

Scientific Workflow Management Systems

Page 16: Scientific Workflow Systems

Scientific Workflow Management Systems

● Features of Taverna Workflow:

i. Pipelining

ii. Implicit iteration of service calls

iii. Conditional calling of services

iv. Customizable looping over a service

v. Failover and retry of service calling

vi. Parallel execution

vii. Managing previous runs and workflow results
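The idea behind implicit iteration can be sketched in a few lines: a service written for a single item is applied element by element when it receives a list. This is only a conceptual illustration in plain Python with a hypothetical helper name; it is not Taverna's API or its actual implementation.

```python
def call_with_implicit_iteration(service, value):
    """Apply a single-item service to a value, iterating over (nested) lists."""
    if isinstance(value, list):
        # The input is deeper than the service expects: iterate over it.
        return [call_with_implicit_iteration(service, item) for item in value]
    return service(value)


# The "service" here is just len(), which expects one string.
single = call_with_implicit_iteration(len, "ACGT")
batch = call_with_implicit_iteration(len, ["ACGT", "AC"])
nested = call_with_implicit_iteration(len, [["A"], ["AC", "ACG"]])
print(single, batch, nested)
```

The workflow author connects a list output to a single-item input and does not write the loop explicitly.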

Page 17: Scientific Workflow Systems

Scientific Workflow Management Systems

● Triana:

- Java based, open source, developed at Cardiff University

- Modularized architecture

- Combines a visual interface + data analysis tools

- Can connect heterogeneous tools (Web services, Java units, ...)

- Uses its own custom workflow language (+BPEL)

- Uses several workflow patterns including loops and branches

Page 18: Scientific Workflow Systems

Scientific Workflow Management Systems

● Triana Tools:

- Signal analysis

- Image manipulation

- Desktop publishing

- Also lets you integrate your own tools

Page 19: Scientific Workflow Systems

Scientific Workflow Management Systems

Page 20: Scientific Workflow Systems

Scientific Workflow Management Systems

● Pegasus:

- Developed at the University of Southern California

- Runs on desktops, clusters, grids, clouds

- Used in several scientific areas including bioinformatics, astronomy, earthquake science, gravitational wave physics, and ocean science

- Executes the workflow tasks in the order of their dependencies

- Includes a sophisticated error recovery system
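Running tasks "in the order of their dependencies" amounts to a topological ordering of the workflow graph. The sketch below shows the idea with Python's standard `graphlib` module (Python 3.9+). The task names are made up, and this is not how Pegasus itself is invoked or implemented.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return the tasks in an order that respects all dependencies.

    `dependencies` maps each task to the set of tasks it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical workflow: each task lists the tasks that must finish first.
dependencies = {
    "stage_in": set(),
    "simulate": {"stage_in"},
    "analyze": {"simulate"},
    "plot": {"analyze"},
    "archive": {"analyze"},
}

order = execution_order(dependencies)
print(order)
```

Tasks with no dependency between them, such as `plot` and `archive` here, may appear in either order and could run in parallel.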

Page 21: Scientific Workflow Systems

Scientific Workflow Management Systems

● Pegasus Components:

i. The Mapper:

- builds an executable workflow based on an abstract workflow provided by the user

- can also restructure the workflow for optimization purposes

ii. The Execution Engine:

- executes the tasks in the appropriate order

iii. The Task Manager:

- manages and supervises workflow tasks on local or remote resources

Page 22: Scientific Workflow Systems

Scientific Workflow Management Systems

Page 23: Scientific Workflow Systems

Scientific Workflow Management Systems

● Kepler:

- Java based, open source, developed at the University of California

- Can execute workflows from a graphical interface or the command line

- Based on the concept of directors

- Runs on local machines and Grids

- Supports foreign language interfaces through JNI (Matlab actor, Python actor, ...)

- Supports distributed computational resources through Web and Grid service actors

- Used to design and execute various workflows in biology, ecology, geology, chemistry, and astrophysics

Page 24: Scientific Workflow Systems

Scientific Workflow Management Systems

Page 25: Scientific Workflow Systems

Scientific Workflow Management Systems

● Askalon:

- Developed at the University of Innsbruck

- Allows the execution of distributed workflow applications in service-oriented Grids

- Uses Globus Toolkit as Grid middleware

- Uses a custom XML-based language (AGWL)

Page 26: Scientific Workflow Systems

Scientific Workflow Management Systems

● Askalon Architecture:

i. Resource Broker

ii. Resource Monitoring

iii. Information Service

iv. Workflow Executor

v. Metascheduler

vi. Performance Prediction

vii. Performance Analysis

Page 27: Scientific Workflow Systems

Scientific Workflow Management Systems

Page 28: Scientific Workflow Systems

Scientific Workflow Management Systems

● Weka4WS:

- Java based, open source data mining system

- Offers an easy-to-use GUI

- Includes the Knowledge Flow tool

- Data mining algorithms are wrapped as web services

- Executes a whole workflow only on a single computer

- Can use Gridlab to exploit Grid resources

- Provides data & task parallelism

Page 29: Scientific Workflow Systems

Scientific Workflow Management Systems

Page 30: Scientific Workflow Systems

Scientific Workflow Management Systems

● GWES (Generic Workflow Execution Service):

- Multi-level abstraction

- Plugin concept (DB2 activity, Grid activity, ...)

- Can be used on clusters, grids, clouds

- Uses GWorkflowDL, based on Petri nets

- Supports exception handling
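Because the workflow language mentioned on this slide is based on Petri nets, a minimal sketch of the firing rule may help. Places hold tokens, and a transition (a task) can fire only when every input place holds at least one token. The dictionary representation and the place names are assumptions for illustration and do not reflect the actual GWorkflowDL syntax.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds at least one token."""
    return all(marking.get(place, 0) >= 1 for place in transition["inputs"])


def fire(marking, transition):
    """Consume one token per input place and produce one per output place."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)  # leave the original marking untouched
    for place in transition["inputs"]:
        new_marking[place] -= 1
    for place in transition["outputs"]:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


# Hypothetical task that turns available input data into a result.
analyze = {"inputs": ["input_ready"], "outputs": ["result"]}
start = {"input_ready": 1}
after = fire(start, analyze)
print(after)
```

In this model the data dependencies of a workflow are expressed through token availability, so a task becomes runnable as soon as all of its inputs exist.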

Page 31: Scientific Workflow Systems

Scientific Workflow Management Systems

Page 32: Scientific Workflow Systems

Scientific Workflow Management Systems

● DVega:

- Utilizes reference nets for composing workflow tasks in a hierarchical way

- Has forwarder-receiver components

- Maps between tasks and resources

Page 33: Scientific Workflow Systems

Scientific Workflow Management Systems

Page 34: Scientific Workflow Systems

Scientific Workflow Management Systems

● Karajan:

- Java based

- Allows users to compose workflows through an XML scripting language & K

- Supports linear and parallel execution

- Supports hierarchical workflows

- Allows monitoring of the execution (checkpointing subsystem)

- Workflows can be modified at runtime

Page 35: Scientific Workflow Systems

Scientific Workflow Management Systems

Page 36: Scientific Workflow Systems

Scientific Workflow Management Systems

● DIS3GNO:

- Allows users to compose distributed data mining workflows

- Executes workflows on the Knowledge Grid

- Visualizes the results

Functionalities:

i. Metadata management

ii. Design and execution management

Page 37: Scientific Workflow Systems

Scientific Workflow Management Systems

Page 38: Scientific Workflow Systems

Discussion and Research issues

● Workflow formalisms:

- Abstractions for data representation

- Abstractions for concurrent processing orchestration

- Annotating, storing and retrieving workflow results

Page 39: Scientific Workflow Systems

Discussion and Research issues

● Workflow Lifecycle:

i. Textual or graphical composition

ii. Mapping of the abstract workflow description onto the available resources

iii. Scheduling, monitoring, and debugging of subsequent execution

Page 40: Scientific Workflow Systems

Discussion and Research issues

● Topics to investigate:

i. Adaptive workflow execution models

ii. High-level tools and languages for workflow composition

iii. Scientific workflow interoperability and openness

iv. Big Data management and knowledge discovery workflows

v. Internet-wide distributed workflow execution

vi. Service-oriented workflows on Cloud infrastructures

vii. Workflow composition and execution in Exascale computing systems

viii. Fault-tolerance and recovery strategies for scientific workflows

ix. Workflow provenance and annotation mechanisms and systems

Page 41: Scientific Workflow Systems

Conclusion

● Workflow Systems:

- Support scientific processes

- Integrate programs, methods, agents and services

- Help knowledge discovery from Big Data

- Need to deal with failures

Page 42: Scientific Workflow Systems

Thank You!