Zen and the Art of SWF Maintenance Kinds of Scientific Workflows Why not just Python scripts?...


Zen and the Art of SWF Maintenance

• Kinds of Scientific Workflows

• Why not just Python scripts?

• Business workflows born again?

• Zen and the art of workflow design
  – … and other research issues


What is a Scientific Workflow (SWF)?

• Model the way scientists work with their data and tools
  – Mentally coordinate data export, import, and analysis via software systems

• Scientific workflows emphasize data flow (≠ business workflows)

• Metadata (incl. provenance info, semantic types etc.) is crucial for automated data ingestion, data analysis, …

• Goals:
  – SWF automation
  – SWF and component reuse
  – SWF design & documentation

… making scientists’ data analysis and management easier!


What we use SWF for …

• Short answer: Everything – including making coffee (tea ceremonies are harder)

• Kinds of workflows (not disjoint):
  – Plumbing: stage files, submit batch jobs, monitor progress, move files off the XT3 to the analysis and viz cluster, archive, steer computation, …
    • Ex: fusion simulation, astrophysics (supernova simulation), … your laptop backup???
  – Knowledge discovery workflows: automate repetitive data access and retrieval, custom analysis (e.g., BLAST), generic steps (PCA, cluster analysis, …)
    • Do this in ways that are meaningful to the scientist
    • Ex: PIW, motif analysis, NDDP, …
  – Conceptual modeling workflows: what the heck is XYZ doing? Reverse-engineer processes and information flows at all levels: to optimize, we first need to understand
    • Ex: napkin-drawing workflows to get an overview; refine a design from abstract to executable (top-down) or generalize from the concrete/legacy to the abstract (bottom-up); data-driven, task-driven, …


Why not just a Python script?

• Users who might be able to define, reuse, modify, and specialize WFs might not be able to do the same for Python scripts

• But wait, there’s more:
  – Modular reuse
  – Debugging and monitoring of WF execution
    • easy to “tee” (“man tee” for you Windows guys ;-)
  – Automated provenance management
  – Semantic types
  – From integrated WF modeling (ER + dataflow + co-registrations) to execution, optimization, archival, …


Business workflows born-again?

• Yes, there are similarities
  – And we can learn from BWF! E.g., transactions!

• But also big differences:
  – SWF:
    • data-flow oriented
    • streaming/pipelined execution
    • cf. signal processing (see also COM later)
    • popular MoC: PN (process networks)
  – BWF:
    • task- and control-flow oriented
    • popular MoC: Petri nets? CSP?
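To make the PN model of computation concrete, here is a minimal Python sketch of a linear process network. It assumes threads as processes and unbounded `queue.Queue` FIFOs as channels; `EOS` and all function names are hypothetical. Each actor blocks on its single input channel and only appends to its output channel, so the output stream does not depend on how the threads happen to be scheduled.

```python
import queue
import threading

EOS = object()  # hypothetical end-of-stream marker


def source(out_ch, items):
    # Emit a finite input stream, then signal end-of-stream.
    for item in items:
        out_ch.put(item)
    out_ch.put(EOS)


def actor(fn, in_ch, out_ch):
    # Blocking read: a PN process waits on its input; it never tests for emptiness.
    while True:
        token = in_ch.get()
        if token is EOS:
            out_ch.put(EOS)
            return
        out_ch.put(fn(token))


def sink(in_ch, results):
    # Collect the output stream until end-of-stream.
    while True:
        token = in_ch.get()
        if token is EOS:
            return
        results.append(token)


def run_pipeline(items, fns):
    # One unbounded FIFO channel between each pair of neighboring processes.
    channels = [queue.Queue() for _ in range(len(fns) + 1)]
    results = []
    procs = [threading.Thread(target=source, args=(channels[0], items))]
    for i, fn in enumerate(fns):
        procs.append(threading.Thread(target=actor, args=(fn, channels[i], channels[i + 1])))
    procs.append(threading.Thread(target=sink, args=(channels[-1], results)))
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results
```

For example, `run_pipeline([1, 2, 3], [lambda x: x * 10, lambda x: x + 1])` returns `[11, 21, 31]` regardless of thread interleaving; all stages can work on different tokens at the same time (pipelined execution).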


Sample BWFs

• Focus is on …
  – Tasks
  – Control-flow
  – Work items

• Useful stuff:
  – Transactions!
  – How to handle complex control-flow …


Pop Quiz! BWF? SWF?


And the answer is …


Click here for “Oracle” (or another one)


Dataflow it is!


The Dataflow Difference


Data/Process/Provenance Central


BUY ME!!


A Signal Processing Pipeline


Some Terminology (tentative)

• Workflow definition W (the WF graph we see)
  – partial specification of a workflow (cf. program)
  – parameters P need to be instantiated
  – data bindings D can be viewed as special parameters

• Model of Computation (MoC)
  – Looking at W, P, and D, we still do not know how to execute W(P, D) to compute the result R
  – A MoC is an algorithm telling us how to apply W to P and D to obtain R
  – Examples:
    • MoC TM (Turing Machine): given program P and input I, we know what to do
    • MoC PN (Process Network): network of independent processes communicating through (infinite) unidirectional buffers (queues), with prefix-monotonic behavior; given a PN, an input stream, and prefix-monotonic, deterministic actors, the output stream is determined! (lots of flexibility for execution!)
    • MoC SDF (Synchronous Dataflow): similar to PN, but actors must statically declare their token production/consumption rates; solving the balance equations (“LGS”, i.e., a linear system of equations) for positive integer solutions yields a static schedule that guarantees fixed buffer sizes
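The SDF balance equations can be solved mechanically. The Python sketch below assumes a connected graph given as `(src, dst, produced, consumed)` edges (a hypothetical input format). Each edge contributes one equation q[src] · produced = q[dst] · consumed, and the smallest positive integer solution q (the repetition vector) says how often each actor fires per schedule iteration. If the equations have no such solution, the rates are inconsistent and no bounded-buffer periodic schedule exists; a solution is necessary but not sufficient, since the graph may still deadlock.

```python
from collections import deque
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Smallest positive integer solution of the SDF balance equations.

    edges: (src, dst, produced, consumed) with per-firing token rates.
    """
    neighbors = {a: [] for a in actors}
    for src, dst, produced, consumed in edges:
        # Balance equation: q[src] * produced == q[dst] * consumed
        neighbors[src].append((dst, Fraction(produced, consumed)))
        neighbors[dst].append((src, Fraction(consumed, produced)))

    q = {actors[0]: Fraction(1)}  # fix one actor's rate, propagate the rest
    todo = deque([actors[0]])
    while todo:
        a = todo.popleft()
        for b, ratio in neighbors[a]:
            expected = q[a] * ratio
            if b not in q:
                q[b] = expected
                todo.append(b)
            elif q[b] != expected:
                raise ValueError("inconsistent rates: no bounded-buffer schedule")
    if len(q) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the rational rates to the smallest positive integers.
    scale = reduce(lambda x, y: x * y // gcd(x, y), (r.denominator for r in q.values()))
    counts = {a: int(r * scale) for a, r in q.items()}
    common = reduce(gcd, counts.values())
    return {a: n // common for a, n in counts.items()}
```

For a chain A → B (A produces 2, B consumes 3) and B → C (B produces 1, C consumes 2), the result is `{"A": 3, "B": 2, "C": 1}`: three firings of A produce the six tokens that two firings of B consume.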


Some Terminology (tentative)

• Model of Computation (MoC)

• WF run: completed computation

• WF execution: ongoing computation

• Computation graph: graph data structure keeping track of which token has been computed from which other one(s)
  – Simple examples: evaluating an arithmetic expression; running a “job DAG”
  – But keeping track of “real dependencies” can be tricky
    • Ex: output tuples of an SQL query have “witness tuples” in multiple relations; this is clear for positive existential queries, but what are the witnesses for universal and negated queries? R = A \ B; witnesses, anybody?
    • Similar to the notion of a “proof tree” in logic (and LP); negation-as-failure rears its ugly (beautiful?) head!
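A computation graph in this sense can be kept as a map from each token to the tokens it was computed from. The Python sketch below (class and method names are hypothetical) records the dependencies while evaluating the arithmetic expression (2 + 3) * 4 and answers a lineage query. It only records which inputs were passed to each step; as noted above, “real” dependencies (e.g., the witnesses for R = A \ B) need more than this.

```python
import itertools
import operator


class ComputationGraph:
    """Hypothetical sketch: records, for each token, the tokens it was computed from."""

    def __init__(self):
        self._ids = itertools.count()
        self.value = {}    # token id -> value
        self.parents = {}  # token id -> tuple of input token ids

    def source(self, v):
        # An input token has no parents.
        tid = next(self._ids)
        self.value[tid] = v
        self.parents[tid] = ()
        return tid

    def apply(self, fn, *inputs):
        # A derived token depends on exactly the tokens passed to fn.
        tid = next(self._ids)
        self.value[tid] = fn(*(self.value[i] for i in inputs))
        self.parents[tid] = inputs
        return tid

    def lineage(self, tid):
        # All tokens that tid was (transitively) computed from.
        seen, stack = set(), [tid]
        while stack:
            for p in self.parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen


# (2 + 3) * 4
g = ComputationGraph()
a, b, c = g.source(2), g.source(3), g.source(4)
s = g.apply(operator.add, a, b)
r = g.apply(operator.mul, s, c)
```

Here `g.value[r]` is 20, `g.lineage(r)` is `{a, b, c, s}`, and `g.lineage(s)` is `{a, b}`.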


Research Area: Provenance

• (Abstract) use cases
  – “Total Recall”: capture everything the MoC can observe
    • … and more: MoC-inherent plus additional observables
  – Example: time-stamp token-in and token-out events to benchmark actor execution time, data movement time, …
  – The 7 W’s: Who, What, Where, Why, When, Which, (W)how (C. Goble)
  – Smart re-run: after a Pause or Stop followed by parameter changes, re-run only the relevant parts
  – Fault tolerance, crash recovery (cf. checkpointing)
  – Result interpretation and post-mortem analysis

• Research question:
  – Given a use case (as a query U) and a provenance schema PS, can U be answered using PS? (related to query answering using views – a reasoning problem!)
  – Ultimately: design PS with U in mind! Also: optimize/specialize PS if U is known/limited
  – Note: the MoC can make a difference! For example, some MoCs have an explicit notion of “firing” or might exploit actor declarations (“I’m a function! I have no state!”). This is relevant, e.g., for checkpointing (Do we need to save state? When?)
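As an illustration of the “time-stamp token-in and token-out events” use case, here is a minimal Python sketch of a recorder that wraps actor functions. All names are hypothetical, and it assumes actors fire one at a time with one token in and one token out. The clock is injectable only so the sketch can be tested deterministically.

```python
import time
from collections import defaultdict


class ProvenanceRecorder:
    """Hypothetical recorder: logs (timestamp, actor, 'in'/'out', token) events."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.events = []

    def wrap(self, actor_name, fn):
        # Return a version of fn that time-stamps its token-in and token-out events.
        def recorded(token):
            self.events.append((self.clock(), actor_name, "in", token))
            result = fn(token)
            self.events.append((self.clock(), actor_name, "out", result))
            return result
        return recorded

    def exec_time(self):
        # Benchmark: total time between each actor's token-in and matching token-out.
        totals, started = defaultdict(float), {}
        for ts, actor_name, kind, _ in self.events:
            if kind == "in":
                started[actor_name] = ts
            else:
                totals[actor_name] += ts - started.pop(actor_name)
        return dict(totals)
```

Because the event log keeps the tokens themselves, the same data also supports simple lineage and post-mortem queries, not just benchmarking.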


Research Area: WF/Dataflow Design

• Collection-Oriented Modeling (COM)
  – Assembly-line metaphor + signal processing + XML + …
    • Streams are nested collections (cf. XML)
    • The stream data schema is “registered” to a WF data model (we really need this)
    • An actor “picks up” only certain parts of the stream: its scope
    • An actor declares how the data within its scope is changed: its delta
    • Gives rise to new notions of type and new problems of type inference (using scope, delta, workflow structure, etc.)
  – Advantages:
    • Less “messy” WFs (more linear, less branching)
    • “Add-only” mode (inject new derived information): augmentation instead of transformation
    • Tagging data for downstream processing (instead of “bombing”, pass on “dirty”/faulty/strange data with a relevant tag)
    • Pipelined parallelism (can stream an array)
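A minimal Python sketch of scope, delta, and tagging follows. It assumes a stream encoded as flat `(kind, payload)` tokens in which `open`/`close` tokens delimit nested collections; this encoding and the function name are illustrative, not Kepler's COM API. The actor only looks inside collections labelled with its scope, passes every token through unchanged (add-only), inserts one derived token before each scoped collection closes, and tags failures instead of raising. It assumes scoped collections are not nested inside each other.

```python
def com_actor(stream, scope, fn):
    """Hypothetical COM-style actor over (kind, payload) tokens.

    Scope: only 'data' tokens inside collections labelled `scope` are read.
    Delta: add-only; one derived token is inserted before each such collection closes.
    """
    inside, values = False, []
    for kind, payload in stream:
        if kind == "open" and payload == scope:
            inside, values = True, []
        elif kind == "data" and inside:
            values.append(payload)
        elif kind == "close" and payload == scope:
            try:
                derived = ("data", fn(values))
            except Exception as exc:
                # Tag the problem and keep streaming instead of "bombing".
                derived = ("tag", f"error: {exc}")
            yield derived
            inside = False
        yield (kind, payload)  # every original token passes through unchanged
```

With `fn = lambda xs: sum(xs) / len(xs)`, a `sample` collection holding 2 and 4 gains a derived token `("data", 3.0)`, an empty `sample` collection gains an error tag, and collections with other labels stream through untouched.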


Research: WF Design

• ER model primitives:
  – Entity (-type), attribute, relationship (-type)

• SWF model primitives??
  – Actors, directors (MoC), …
  – Lots of new “types”:
    • Conventional data types (Java style)
    • Polymorphic types w/ type variables (Haskell style)
    • Semantic types (formal annotations in logic relative to a controlled vocabulary or knowledge base)
    • Hybrids
    • A “theory of adapters”!?


(Figure: PIW workflow, annotated with: hand-crafted control solution, which also forces sequential execution!; components “designed to fit”; hand-crafted Web-service actor; complex backward control-flow; no data transformations available)

Source: [Altintas-et-al-PIW-SSDBM’03]


A Scientific Workflow Problem: More Solved (Computer Scientist’s view)

• Solution based on a declarative, functional dataflow process network (= also a data streaming model!)

• Higher-order constructs: map(f)
  – no control-flow spaghetti
  – data-intensive apps
  – free concurrent execution
  – free type checking
  – automatic support to go from piw(GeneId) to PIW := map(piw) over [GeneId]

(Figure callouts: map(f)-style iterators; powerful type checking; generic, declarative “programming” constructs; generic data transformation actors; forward-only, abstractable sub-workflow piw(GeneId))
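The “free concurrent execution” point follows from map: when piw(GeneId) is a pure, forward-only sub-workflow, its applications to different gene IDs are independent of each other. A Python sketch using the standard `concurrent.futures` thread pool; `piw` here is a hypothetical stand-in for the real sub-workflow, and `piw_over` is an illustrative name for PIW.

```python
from concurrent.futures import ThreadPoolExecutor


def piw(gene_id):
    # Hypothetical stand-in for the forward-only sub-workflow piw(GeneId).
    return f"result-for-{gene_id}"


def piw_over(gene_ids, max_workers=4):
    # PIW := map(piw) over [GeneId]; elements are independent, so they may run concurrently.
    # Executor.map yields results in input order, so outputs line up with gene_ids.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(piw, gene_ids))
```

The caller never writes a loop or a join: scheduling, ordering, and concurrency all come from the map construct.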


A Scientific Workflow Problem: Even More Solved (domain&CS coming together!)

map(GenbankWS)
  Input: {“NM_001924”, “NM020375”}
  Output: {“CAGT…AATATGAC”, “GGGGA…CAAAGA”}


Research Problem: Optimization by Rewriting

• Example: PIW as a declarative, referentially transparent functional process → optimization via functional rewriting is possible, e.g., map(f o g) = map(f) o map(g)

• Technical report & PIW specification in Haskell

(Figure callouts: map(f o g) instead of map(f) o map(g); combination of map and zip)

http://kbis.sdsc.edu/SciDAC-SDM/scidac-tn-map-constructs.pdf
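The rewrite rule can be checked on a small example. This Python sketch (on Python lists, not the Haskell specification from the technical report; helper names are hypothetical) defines function composition and a list-lifting `fmap`, then compares the fused form map(f o g), which makes one pass, with map(f) o map(g), which builds an intermediate list. The equality relies on f and g being pure: with side effects, the two forms would interleave them in a different order.

```python
from functools import reduce


def compose(*fns):
    """compose(f, g)(x) == f(g(x)), i.e. f o g."""
    return reduce(lambda f, g: lambda x: f(g(x)), fns)


def fmap(f):
    """Lift f to lists: map(f)."""
    return lambda xs: [f(x) for x in xs]


def f(x):  # illustrative pure function
    return x + 1


def g(x):  # illustrative pure function
    return 2 * x


xs = list(range(5))
fused = fmap(compose(f, g))          # map(f o g): one pass, no intermediate list
unfused = compose(fmap(f), fmap(g))  # map(f) o map(g): two passes
```

Both `fused(xs)` and `unfused(xs)` evaluate to `[1, 3, 5, 7, 9]`; an optimizer may rewrite one form into the other in either direction.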


Job Management (here: NIMROD)

• Job management infrastructure in place

• Results database: under development

• Goal: 1000s of GAMESS jobs (quantum mechanics)


Kepler Coupling Components & Codes

• Types of coupling …
  – Loosely coupled (“1st phase”)
    • Web services (SPA, GEON, SEEK, …), ssh actors, …
    + reusability (behavioral polymorphism)
    + scalability (# components)
    − efficiency
  – Tight(er) coupling (“2nd phase”)
    • Via CCA (SCIRun-2, Ccaffeine, …) (Cipres uses CORBA)
    • HPC needs: code coupling as efficient & flexible as possible (e.g., Scott’s challenges …)
      – memory-to-memory (single node or shared memory)
      – MPI (multiple nodes)
      – optimizations for transfer of data & control (streaming, socket-based connections)


Accord-CCA: Ccaffeine w/ Self-Managed Behavior

Source: Hua Liu and Manish Parashar

cf. mobile models and reconfiguration in Ptolemy II

… begging for a Kepler design and implementation …


Fault Tolerance & Maintenance Challenges


Workflow Templates and Patterns

(Figure: new ingredients; proposed layered architecture)

work w/ Anne Ngu, Shawn Bowers, Terence Critchlow


Use Ideas from Fault Tolerant Shell

Source: Douglas Thain, Miron Livny, “The Ethernet Approach to Grid Computing”

Good ideas in ftsh; some might be (semi-)low-hanging fruit for Kepler …
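One of these ideas, retrying a failed step after a randomized, exponentially growing delay (the “Ethernet” analogy), is easy to sketch. This is a Python illustration with hypothetical names and default values, not ftsh syntax or semantics; `sleep` is injectable so the sketch can be tested without waiting.

```python
import random
import time


def try_with_backoff(action, attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call action(); on failure, wait a random time below an exponentially growing bound and retry."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: hand the failure to an outer handler
            # Random jitter keeps many failing jobs from retrying in lockstep.
            sleep(random.uniform(0, delay))
            delay = min(2 * delay, max_delay)
```

Wrapping, e.g., a flaky file transfer as `try_with_backoff(lambda: transfer(path))` (with `transfer` a hypothetical actor call) keeps the retry policy out of the actor itself.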


Use of Semantics in SWF…

• “Smart” search
  – Concept-based, e.g., “find all datasets containing biomass measurements”

• Improved linking, merging, integration
  – Establishing links between data through semantic annotations & ontologies
  – Combining heterogeneous sources based on annotations
  – Concatenate, union (merge), join, etc.

• Transforming
  – Construct mappings from schema S1 to S2 based on annotations

• Semantic propagation
  – “Pushing” semantic annotations through transformations/queries
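Concept-based search can be approximated by expanding the query concept to all of its subconcepts before matching annotations. A minimal Python sketch, assuming a toy ontology given as a dict from each class to its direct subclasses (a plain class hierarchy, not an OWL-DL reasoner); all class and dataset names are made up for illustration.

```python
def subconcepts(ontology, concept):
    """concept plus everything below it; `ontology` maps a class to its direct subclasses."""
    found, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in found:
            found.add(c)
            stack.extend(ontology.get(c, ()))
    return found


def smart_search(datasets, ontology, concept):
    """Names of datasets annotated with `concept` or any of its subconcepts."""
    wanted = subconcepts(ontology, concept)
    return sorted(name for name, annotations in datasets.items() if annotations & wanted)


# Toy ontology and annotations (made-up names).
ontology = {
    "Measurement": ["Biomass", "Temperature"],
    "Biomass": ["AboveGroundBiomass", "BelowGroundBiomass"],
}
datasets = {
    "plot_a": {"AboveGroundBiomass"},
    "plot_b": {"Temperature"},
    "plot_c": {"BelowGroundBiomass", "Temperature"},
}
```

Searching for `"Biomass"` returns `["plot_a", "plot_c"]` even though neither dataset is annotated with the word “Biomass” itself.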


Typing Workflow Components

The Semantic Type Editor is used to assign one or more semantic types to a component or to the component’s input and output ports. In the simplest case, a semantic type is a class taken from an OWL-DL ontology. Multiple types define a conjoined concept expression.

A simple ontology browser is provided in Kepler to navigate a classified OWL-DL ontology. Classes can be searched for and selected as a semantic type.


More on Semantic Annotation

Initial Version Supports:

• Actor-level and port-level annotations

• Annotations are stored in actor’s MoML definition (as new “semantic type” properties)

• Creation of composite ports (i.e., “virtual” ports grouping a set of underlying ports)

• Regular and composite ports may have multiple annotations (conjunction)

• Annotations can be drawn from multiple ontologies

(Figure: an annotated composite port)


More on Semantic Annotation

Currently adding:

• “Semantic link” annotations: annotating ports via ontology properties

  – E.g., hasLat(point1, lat1) – supported in MoML, not yet in the tool

• Simple condition “filters” in port semantic annotations

– E.g., if attribute height > 0 then biomass is annotated as AboveGroundBiomass

• Incorporating instances/values in semantic links

– E.g., hasUnit(biomass, celsius)

• Suggesting additional annotations based on given ones

– suggesting/guessing ways to “fill in” given annotations

– E.g., possible semantic links

• Templates and ontology “views”
  – To help specify common annotation patterns

(Figure: Semantic Links)


Checking Type Constraints

Kepler can statically perform semantic and structural type checking of connections. A type checker allows the user to see potentially mismatched port connections as well as known type conflicts before workflow execution.

The user can navigate the unsafe and potentially unsafe channels using the Kepler Type Checker dialog. When a channel is selected: (a) it is highlighted on the canvas, (b) its structural type and status are shown (here, the channel is structurally well typed), and (c) its semantic type and status are shown (here, the connection produces a semantic type error).
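The semantic half of such a check reduces to subsumption: a connection is safe when every concept required by the input port is implied by the output port's annotations. The Python sketch below assumes a named class hierarchy (each class mapped to its direct superclasses) and treats multiple annotations as a conjunction. It is a simple sufficient-condition approximation, not Kepler's type checker and not full OWL-DL reasoning; the class names are made up.

```python
def superclasses(parents, cls):
    """cls together with all of its (transitive) superclasses."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def semantically_safe(parents, out_types, in_types):
    # Output annotations form a conjunction; it entails a required input concept
    # if at least one conjunct is subsumed by that concept in the hierarchy.
    return all(
        any(required in superclasses(parents, produced) for produced in out_types)
        for required in in_types
    )


# Toy hierarchy (made-up names): class -> direct superclasses.
parents = {
    "AboveGroundBiomass": ["Biomass"],
    "Biomass": ["Measurement"],
    "Temperature": ["Measurement"],
}
```

An output typed `AboveGroundBiomass` may feed an input expecting `Biomass`, but not the other way round, and a `Temperature` output is flagged when connected to a `Biomass` input.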


Kepler Actor-Library

• Ontology-based actor organization / browsing

• Customizable libraries based on ontologies

• Text search with concept-based expansion

Users can discover ImageJ using various search terms. Here, ImageJ shows up in multiple tree locations based on its given annotations. The library search permits text-based matching against the component’s metadata (its given name and certain properties), expanded with concept matches.


Semantic Searching

Kepler provides a more advanced ontology-based search mechanism. Users can open the Semantic Search dialog, where components can be searched for based on their semantic types.

The Semantic Search dialog allows a user to search components by any combination of actor, input, and output semantic types.


Structural Type (XML DTD) Annotations

(Figure: workflow with ports P1–P5; semantic annotations S1 (life stage property) and S2 (mortality rate for period))

structType(P2):
  root population = (sample)*
  elem sample = (meas, lsp)
  elem meas = (cnt, acc)
  elem cnt = xsd:integer
  elem acc = xsd:double
  elem lsp = xsd:string

  <population> <sample> <meas> <cnt>44,000</cnt> <acc>0.95</acc> </meas> <lsp>Eggs</lsp> </sample> … </population>

structType(P3):
  root cohortTable = (measurement)*
  elem measurement = (phase, obs)
  elem phase = xsd:string
  elem obs = xsd:integer

  <cohortTable> <measurement> <phase>Eggs</phase> <obs>44,000</obs> </measurement> … </cohortTable>

Source: [Bowers-Ludaescher, DILS’04]
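For these two structures, a hand-written transformation from the population format to the cohortTable format maps each sample's life stage (lsp) to phase and its count (meas/cnt) to obs, dropping acc. A Python sketch with the standard `xml.etree.ElementTree` module; the function name is hypothetical, and the field correspondence is read off the two example instances above rather than derived from the ontologies.

```python
import xml.etree.ElementTree as ET


def population_to_cohort_table(population_xml):
    """Hypothetical transformation from the population structure to the cohortTable structure."""
    src = ET.fromstring(population_xml)
    dst = ET.Element("cohortTable")
    for sample in src.findall("sample"):
        measurement = ET.SubElement(dst, "measurement")
        # lsp (life stage) -> phase; meas/cnt (count) -> obs; acc is dropped.
        ET.SubElement(measurement, "phase").text = sample.findtext("lsp")
        ET.SubElement(measurement, "obs").text = sample.findtext("meas/cnt")
    return ET.tostring(dst, encoding="unicode")
```

Writing such adapters by hand for every pair of structures does not scale, which is what motivates generating them from structural/semantic associations.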


Ontology-Guided Data Transformation

(Figure: a desired connection from port Ps of SourceService to port Pt of TargetService. Each port has a SemanticType and a StructuralType, linked by a structural/semantic association; the semantic types must be compatible (⊑) with respect to the ontologies (OWL); a correspondence between the structural types is then used to generate a transformation of the data at Ps.)

Source: [Bowers-Ludaescher, DILS’04]


WF-Design: Adapters for Semantic & Structural Incompatibility

Adapters may:

– be abstract (no impl.)

– be concrete

– bridge a semantic gap

– fix a structural mismatch

– be generated automatically (e.g., Taverna’s “list mismatch”)

– be reused components (based on signatures)

(Figure: semantic adapters between types C, C1, C2 and D, D1, D2; structural adapters between actors f1 and f2 built with map over [S] and, applied twice, over [[S]])

Source: [Bowers-Ludaescher, ER’05]
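A structural adapter for a list mismatch can be built generically by lifting with map. In this Python sketch (hypothetical names), `lift(f, 1)` turns f : S → T into [S] → [T], and `lift(f, 2)` into [[S]] → [[T]]; choosing the depth automatically from the port types is left out.

```python
def map_adapter(f):
    """Lift f : S -> T to [S] -> [T]."""
    def adapted(xs):
        return [f(x) for x in xs]
    return adapted


def lift(f, depth):
    """Apply map_adapter `depth` times: depth 2 gives [[S]] -> [[T]]."""
    for _ in range(depth):
        f = map_adapter(f)
    return f
```

For example, `lift(len, 2)([["a", "bb"], ["ccc"]])` gives `[[1, 2], [3]]`: the actor `len` is reused unchanged, and only the adapter knows about the nesting.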


Additional Design Primitives for Semantic Types

Extended transformations (starting workflow → resulting workflow):
• t9: actor semantic type refinement
• t10: port semantic type refinement
• t11: annotation constraint refinement
• t12: I/O constraint strengthening
• t13: data connection refinement
• t14: adapter insertion
• t15: actor replacement
• t16: workflow combination (map)

Source: [Bowers-Ludaescher, ER’05]


Scientific Workflow Design

• Support SWF design & reuse via:
  – Structural data types
  – Semantic types
  – Associations (= constraints) between them
  – Type checking, inference, propagation

• Separation of concerns:
  – structure, semantics, WF orchestration, etc.

Source: [Bowers-Ludaescher, ER’05]


Semantic Annotation Propagation


Forward and Backward Propagation Rules


GEON Dataset Generation & Registration (and co-development in KEPLER)

(Figure callouts, contributors: Xiaowen (SDM); Edward et al. (Ptolemy); Yang (Ptolemy); Efrat (GEON); Ilkay (SDM); SQL database access (JDBC): Matt et al. (SEEK); build: % Makefile … $> ant run)


Web Services Actors (WS Harvester)

• “Minute-made” (MM) WS-based application integration

• Similarly: MM workflow design & sharing w/o implemented components


Some KEPLER Actors (out of 160+ … and counting…)


Different “Directors” for Different Concerns

• Example:
  – Ptolemy directors: “factoring out” the concern of workflow “orchestration” (MoC)
  – common aspects of overall execution are not left to the actors

• Similarly:
  – “Black Box” (“flight recorder”)
    • a kind of “recording central” to avoid wiring 100s of components to recording actor(s)
  – “Red Box” (error handling, fault tolerance)
    • use ftsh ideas; templates
  – “Yellow Box” (type checking)
    • for workflow design
  – “Blue Box” (shipping-and-handling)
    • central handling of data transport (by value, by reference, by scp, SRB, GridFTP, …)
  – “CCA++ Boxes”
    • Change behavior (e.g., algorithm) of a component
    • Change behavior (i.e., wiring) of a workflow in-flight

(Figure boxes: SDF/PN/DE/… director; Provenance Recorder; SHA @; Static Analysis; On Error; Component Mgr; Composition Mgr)
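The “factoring out” idea can be sketched as a director that owns cross-cutting concerns, so that recording and error handling need no extra wiring between actors. A minimal Python sketch of a sequential director for a linear pipeline, with a “Black Box” recorder hook and a “Red Box” on-error hook; all names are hypothetical and this is not the Ptolemy II director API.

```python
class Director:
    """Hypothetical sequential director for a linear pipeline with pluggable 'boxes'."""

    def __init__(self, actors):
        self.actors = actors  # list of (name, fn) pairs, fired in order
        self.recorders = []   # "Black Box": called on every firing, no extra wiring
        self.on_error = None  # "Red Box": decides what to do when an actor fails

    def run(self, tokens):
        out = []
        for token in tokens:
            for name, fn in self.actors:
                try:
                    result = fn(token)
                except Exception as exc:
                    if self.on_error is None:
                        raise
                    result = self.on_error(name, token, exc)
                for record in self.recorders:
                    record(name, token, result)
                token = result
            out.append(token)
        return out
```

Swapping the error policy or adding a second recorder changes only the director's configuration; none of the actors are touched.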


Separation of Concerns: Port Types

• Token consumption (& production) “type”
  – a director’s concern

• More generally: resource consumption “type”
  – other scheduling problems

• Token “transport type”
  – by value, by reference (which one), protocol (SOAP, scp, GridFTP, SRB, …)
  – a SHA (shipping-and-handling) concern

• Structural and semantic types
  – a SAT (static analysis & typing) concern
  – modeled after the static unit type system …

• static unit type system as a special case!?


Other Research Problems

• Making the system more X-aware:
  – MoC-aware: ok (directors)
  – Provenance-aware: …
  – DS (data schema)-aware: …
  – Semantics-aware: upcoming (should be hybrid w/ DS)
  – Host-aware: allow distributed scheduling of actors
  – Data-transport-aware: choose a suitable data transport protocol (scp, bbcp, http, (Grid-)ftp, SRB, SRM, …)

  – Think of new “folks” on the movie set:
    • Actors, director
    • Cameraman (provenance recorder?)
    • Editor (FF/REW/Play/Pause/Stop → provenance re-run)
    • Caterer/Stager (feeding actors with yummy tokens!)
    • Managers for “Process Central” and “Data Central”
    • Semantic/Hybrid Type Manager


More Research Topics

• What if we know something about bandwidths, processor loads, data sizes? → workflow optimization!

• What if we have more semantics for actors?
  – Black box: token in/out
  – Grey box: data types, semantic types
  – White box: exact functional behavior is known!
  – Example: actor implements a (stream-?) query! → Query Process Network
  – New optimization opportunities!


A User’s Wish List

• Usability
• Closing the “lid” (cf. VNC)
• Dynamic plug-in of actors (cf. actor & data registries/repositories)
• Distributed WF execution
• Collection-based programming
• Grid awareness
• Semantics awareness
• WF deployment (as a web site, as a web service, …)
• “Power apps” (cf. SCIRun)
• …