Trident Scientific Workflow Workbench eScience08 Tutorial Nelson Araujo, Roger Barga, Dean Guo,...

Trident Scientific Workflow Workbench eScience’08 Tutorial Nelson Araujo, Roger Barga, Dean Guo, Jared Jackson Yogesh Simmhan, Catharine van Ingen, Nitin Gautam Microsoft Research Joby Thomas and the development team Aditi Technologies


Page 1:

Trident Scientific Workflow Workbench

eScience’08 Tutorial

Nelson Araujo, Roger Barga, Dean Guo, Jared Jackson, Yogesh Simmhan, Catharine van Ingen, Nitin Gautam

Microsoft Research

Joby Thomas and the development team

Aditi Technologies

Page 2:

MSR (Trident) Summer ‘09 Interns

Satya Sahoo, Wright State University

David Koop, University of Utah

Matt Valerio, Ohio State University

Eran Chinthaka, Indiana University

Page 3:

Overview of our presentation today

Technical Content
• Introduction
• Feature Overview and Logical Architecture
• Deep(er) dive into select features with demos
• Roadmap to delivery

Design Philosophy and Exit Strategy
• Leverage COTS WFMS, build only what is required
• Extensible and open, integrate with community tools
• Drive development from actual eScience requirements
• Deliver as open source accelerator to the community

Page 4:

Workflow for Ocean Observatories, part of an “oceanographer’s workbench” (Jim Gray)

Ocean Observatories Initiative (OOI)
Formerly the NEPTUNE project

Collaboration with Univ. of Wash. & MBARI

Page 6:

Pan-STARRS (Astronomy)

• One of the largest visible light telescopes
• Four unit telescopes acting as one
• One Gigapixel per telescope
• Survey entire visible universe in 1 week
• Catalog solar system, moving objects/asteroids
• ps1sc.org: Univ. Hawaii, Johns Hopkins, …

Workflow Requirements
• Load/Merge Databases
• Execute on Clusters
• Monitor workflow execution
• Logging, Provenance, Faults

Page 7:

Pan-STARRS Load & Merge Workflows

[Figure: two workflow flowcharts; the step labels recovered from the diagram are listed below.]

Load workflow
• Start
• Sanity Check of Network Files, Manifest, Checksum
• Create, Register empty LoadDB from template
• For Each CSV File in Batch
– Validate CSV File & Table Schema
– BULK LOAD CSV File into Table
– Perform CSV File/Table Validation
• Perform LoadDB/Batch Validation
• End
• On fault: Detect Load Fault. Launch Recovery Operations. Notify Admin.

Merge workflow
• Start
• Determine ‘Merge Worthy’ Load DBs & Slice Cold DBs
• Determine affine Slice Cold DB for CSV Batch
• For Each Partition in Slice Cold DB
– Switch OUT Slice partition to temp
– UNION ALL over Slice & Load DBs into temp. Filter on partition bound.
– Post Partition Load Validation
– Switch IN temp to Slice partition
• Slice Column Recalculations & Updates
• Post Slice Load Validation
• End
• On fault: Detect Merge Fault. Launch Recovery Operations. Notify Admin.
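The load workflow labelled on this slide (sanity check against a manifest and checksums, per-file CSV validation, bulk load, batch validation, and a fault path that notifies an administrator) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the SHA-256 checksum, the single table named load_table, and the use of an in-memory SQLite database in place of the real LoadDB are all hypothetical and do not come from the Pan-STARRS or Trident code.

```python
import csv
import hashlib
import io
import sqlite3


def sha256_hex(data):
    """Checksum used for the manifest comparison (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def load_batch(files, manifest, columns, notify_admin=print):
    """Return a populated in-memory load DB, or None if a load fault occurs.

    files:    {file name: raw CSV bytes}
    manifest: {file name: expected SHA-256 hex digest}
    columns:  expected CSV header, also used as the table schema
    """
    db = sqlite3.connect(":memory:")  # stand-in for "create LoadDB from template"
    try:
        # Sanity check: each manifest entry must exist and match its checksum.
        for name, expected in manifest.items():
            if name not in files:
                raise ValueError(f"missing file: {name}")
            if sha256_hex(files[name]) != expected:
                raise ValueError(f"checksum mismatch: {name}")

        col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
        db.execute(f"CREATE TABLE load_table ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)

        expected_rows = 0
        for name in manifest:
            rows = list(csv.reader(io.StringIO(files[name].decode("utf-8"))))
            # Validate the CSV header against the table schema.
            if not rows or rows[0] != list(columns):
                raise ValueError(f"schema mismatch: {name}")
            # Bulk load the data rows.
            db.executemany(f"INSERT INTO load_table VALUES ({placeholders})", rows[1:])
            expected_rows += len(rows) - 1

        # Batch validation: the table must hold every row that was read.
        loaded = db.execute("SELECT COUNT(*) FROM load_table").fetchone()[0]
        if loaded != expected_rows:
            raise ValueError("row count mismatch after load")
        db.commit()
        return db
    except (ValueError, sqlite3.Error) as fault:
        # Fault path: discard the partial load and notify the administrator.
        db.close()
        notify_admin(f"load fault: {fault}")
        return None
```

Keeping every validation failure on one exception path mirrors the single "detect fault, recover, notify" box in the diagram; a real deployment would presumably distinguish recoverable from fatal faults.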

Page 8:

http://research.microsoft.com/en-us/collaboration/tools/trident.aspx

http://beta.research.microsoft.com/en-us/collaboration/tools/trident.aspx

Trident Public Website Accessible today

From January ‘09

Page 9:

Logical Architecture

Features

Building on Windows Workflow

Page 10:

Trident Logical Architecture

[Figure: layered architecture diagram; labels recovered from the diagram:]
• Visualization, Design, Workflow Packages, Management Studio, Community
• Workbench, Desktop, Browser
• Windows Workflow Foundation, Scientific Workflows
• Monitor, Administration, Web Portal (myExperiment), Archiving
• Trident Registry: Data Model (Data Agnostic Abstraction); Data Access (SQL Server, SSDS, S3, Others); Registry Management
• Trident Runtime Services: Provenance, Publish-Subscribe Blackboard, WF Execution Hosts, Fault Tolerance, HPC Scheduling, Others

Page 11:

Trident Features

Libraries of activities, services, and workflows
– Prepackaged activities and workflows out of the box and custom libraries
– Registry with rich sets of workflow metadata
– Versions
– Workflow packages
– Social annotations (myExperiment)

Page 12:

Trident Features

Two programming interfaces to Trident
• Use Visual Studio to develop custom activities and workflows and import them to Trident
• Visually Compose Workflows
– No programming or scripting is required
– Drag and drop a workflow or an activity
– Subsections

Page 13:

Execution Service
• Local or distributed execution of workflows
– HPCS cluster
– Cloud services
• Interactive and non-interactive execution service
• Publishes events to subscriber services, such as tracking, provenance, and monitoring.

Page 14:

Workflow Monitoring
• Remote and local monitoring
– Workflow processing status
– Input and output parameters
– Data products
– Performance

Page 15:

Management Studio

• Administration of workflows and workflow scheduling
• Registry management
• Monitoring

Page 16:

What is Windows Workflow?

• Part of Microsoft’s .NET Framework 3.0, 3.5, and upcoming 4.0
• Activities
• Runtime
• Tooling

[Figure: Windows Workflow components. Labels: Host Process (.exe, IIS, …); WF Runtime Extensions; Tracking; Persistence; Workflow; Activity Library; Tooling; VS Designer; VS Debugger; Rehosted Designer]

Page 17:

Windows Workflow Base Activity Library

Basic Composite

Page 18:

Workflow Authoring

Page 19:

Trident Workflow Composer

An End User Application for Editing, Executing, and Monitoring Scientific Workflows

Page 20:

What Differentiates Scientific Workflow?

• Composition goes through many iterations
• Data flow is a first class citizen
• Need an easy way to publish and share
• Provenance
– Runtime
– Evolutionary
• Adaptable to different computing environments

Page 21:

Trident Workflow Composer

[Figure: Composer screenshot. Labeled regions: Composition Space, Activity Library, Workflow Library, Data Options & Sharing]

Page 22:

Composer Demo

Page 23:

Trident Registry

Flexible Data Store And Some More

Page 24:

Trident Registry
Motivation: Why a new registry system?

• Single “point of truth” of the system
– Facilitates state synchronization actions
– Catalog keeps track of computing resources and state
• Flexible Storage
– What is it?
  • Flexible store mechanism
  • Supports Microsoft and non-Microsoft store providers
  • Supports local, client-server and cloud architectures
– Non goals
  • Replacement for LINQ or ER Framework
• Reference Catalog
– Unified view of the resources
– Stores references to internal and external resources
– Flexible provider mechanism to abstract access to external resources

Page 25:

Trident Registry
Registry Connections

Page 26:

Trident Registry
Registry Management

Page 27:

Trident Registry
Data Providers: Abstracting “What’s out there”

• Storage providers
– Provides abstraction to data structures stored in the backend
– No assumptions on how data was stored and related
– Implemented using “verbs” and “subjects” actions
  • “Store object user with these properties”
  • “Relate this user object with this service as its owner”
  • “Delete namespace object”
• Data abstraction layer and code generation
– C# generated code provides shield and programming API
– C# code generator generates SQL catalog for perfect data-code match
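The "verbs" and "subjects" idea on this slide can be illustrated with a small sketch in which a backend exposes only store, relate, and delete operations and callers never see how the data is laid out. Everything here is hypothetical: the class name, method names, and the in-memory dictionary backend are assumptions for illustration and are not the Trident Registry API, which is a .NET library.

```python
class InMemoryStorageProvider:
    """Hypothetical backend that exposes only verb-style operations."""

    def __init__(self):
        self._objects = {}       # (object type, key) -> properties
        self._relations = set()  # (source ref, relation name, target ref)

    def store(self, obj_type, key, **properties):
        """Verb: store an object of a given type with these properties."""
        ref = (obj_type, key)
        self._objects[ref] = dict(properties)
        return ref

    def relate(self, source, relation, target):
        """Verb: relate two stored objects under a named relation."""
        if source not in self._objects or target not in self._objects:
            raise KeyError("both ends of a relation must be stored first")
        self._relations.add((source, relation, target))

    def delete(self, ref):
        """Verb: delete an object and every relation that mentions it."""
        self._objects.pop(ref, None)
        self._relations = {r for r in self._relations if ref not in (r[0], r[2])}

    def related(self, source, relation):
        """Query helper: targets related to source under the given relation."""
        return sorted(t for s, r, t in self._relations if s == source and r == relation)


# The three example sentences from the slide, expressed as calls.
registry = InMemoryStorageProvider()
user = registry.store("user", "alice", full_name="Alice Example")
service = registry.store("service", "loader")
registry.relate(user, "owner_of", service)
namespace = registry.store("namespace", "scratch")
registry.delete(namespace)
```

Because callers only issue verbs against object references, the same calls could be served by a relational, file-based, or cloud backend without changing client code, which is the point the slide makes about not assuming how data is stored.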

Page 28:

Trident Registry
Data Providers: Abstracting “What’s out there”

• Creating new providers
– Why would I create a new storage provider?
  • Enable Trident to store / retrieve state from other platforms
  • Enable Trident to store / retrieve state on other systems
  • Enhance existing providers with new features and abstractions
– What it takes to create a new provider
  • Create a new assembly (or add to an existing provider assembly)
  • Create a new class derived from Microsoft.Research.eResearch.Connection
  • Drop the new DLL into the Trident folder
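The provider steps on this slide follow a common plug-in pattern: derive from a base class and make the new class discoverable by the host. The sketch below shows that pattern in Python purely as an analogy. The real base class is the .NET type Microsoft.Research.eResearch.Connection, whose members are not shown in the slides, so the methods open, save, and load, the provider_name keyword, and the connect helper are all assumptions made for this example.

```python
class Connection:
    """Hypothetical stand-in for the provider base class named on the slide."""

    providers = {}  # provider name -> provider class, filled as subclasses are defined

    def __init_subclass__(cls, provider_name=None, **kwargs):
        super().__init_subclass__(**kwargs)
        # Registering at class-definition time plays the role of dropping a DLL
        # into a folder that the host scans for providers.
        if provider_name is not None:
            Connection.providers[provider_name] = cls

    def open(self):
        raise NotImplementedError

    def save(self, key, value):
        raise NotImplementedError

    def load(self, key):
        raise NotImplementedError


class DictConnection(Connection, provider_name="dict"):
    """A new provider: keeps registry state in a plain dictionary."""

    def __init__(self):
        self._data = None

    def open(self):
        self._data = {}

    def save(self, key, value):
        self._data[key] = value

    def load(self, key):
        return self._data[key]


def connect(provider_name):
    """Look up a registered provider by name, instantiate it, and open it."""
    connection = Connection.providers[provider_name]()
    connection.open()
    return connection
```

A caller would write `connect("dict")` and use the returned object without knowing which backend it talks to.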

Page 29:

Creating a new Registry Provider

DEMO

Page 30:

Trident Registry
Storage vs References

• Use Cases
– Object Tracking
– Data and Process Discovery
  • All workflow aspects are exposed in the storage schema
  • Allows rich query of data, activities, parameters, etc.
• Data Providers
– Abstraction layer to external references (similar to registry data storage)
  • Enables user applications to benefit from unified model
  • Simplifies development
  • Enables fault tolerance for external resource sources
  • Not every workflow needs to worry about these details
– All data provider knowledge resides in the registry
– Pluggable and flexible

Page 31:

Trident Registry
Provider API

[Figure: API diagram with three entry points: Managed, Native, Web Services]

Managed (.NET) API
– Library of choice for interacting with Trident Registry
– Simplifies lots of data complexity
– Abstracts verbs and actions into an object model
– Access to all Trident Registry objects and relations
– No need for servers and services to operate (access the data backend directly)
– Faster, no extra hops. Direct data access.

Native API
– Useful for non-managed applications and systems integration
– Similar to Managed (.NET) API in terms of performance and requirements
– But more limited (not a 100% feature match right now)

Web Services API
– Recommended for non-Microsoft platform integration, e.g. Linux and Mac OS
– Requires an IIS web server and service configured
– Greater control over data and process, higher data security
– Only core objects and relationships are exposed right now
– Extra parsing and processing hop. Need to consider clustering and load-balancing solutions for high-performance scenarios

Page 32:

Trident Blackboard

A Distributed Eventing Model For Workflow

Page 33:

The Workflow Runtime and Tracking Services

• WF workflows launch in a runtime context
– Runtime thread controls WF related threads
  • Execution thread
  • Built-in services
  • Custom services
• Built-in services track workflow execution
– Workflow events
– Individual activity events
– Data updates

Page 34:

Trident Blackboard

• A distributed Pub/Sub model for workflow eventing
• Why?
– Tracking information needs to be shared across compute nodes
– Workflows are evolutionary and thus messengers require a pluggable interface
– Large message volume means that the message broker needs to be light-weight and fast

Page 35:

The Blackboard Message

• Titled name/value pair collection
– All values are strings
– Title and names can resolve against an ontology

Structure
Title: ‘Collection Title’
Names: ‘name 1’ ‘name 2’ ‘name 3’
Values: ‘value 1’ ‘value 2’ ‘value 3’

Example
Title: ‘WF Runtime Event’
Names: ‘Type’ ‘Job ID’ ‘Activity ID’ ‘Event Order’
Values: ‘Activity Started’ ‘{ GUID }’ ‘NetCDF Reader’ ‘5’
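The message shape described on this slide, a title plus a collection of name/value pairs whose values are all strings, maps naturally onto a small record type. The sketch below is an assumption-laden illustration: the class name BlackboardMessage and its fields are invented here, and the actual Trident message type is a .NET class not shown in the slides.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BlackboardMessage:
    """Hypothetical titled collection of name/value pairs."""

    title: str
    pairs: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # The slide states that all values are strings, so coerce on construction.
        self.pairs = {str(name): str(value) for name, value in self.pairs.items()}


# The example from the slide; "{ GUID }" is kept as the slide's placeholder.
example = BlackboardMessage(
    "WF Runtime Event",
    {
        "Type": "Activity Started",
        "Job ID": "{ GUID }",
        "Activity ID": "NetCDF Reader",
        "Event Order": 5,
    },
)
```

Restricting values to strings keeps the message trivially serializable across process and machine boundaries, at the cost of pushing type interpretation onto subscribers.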

Page 36:

The Blackboard Message

[Figure: the message structure and example from the previous slide, shown flowing from a publisher to subscribers.]

Publisher: Workflow Tracker

Subscribers: Database Logging, Provenance Store

Page 37:

Blackboard Architecture

[Figure: Blackboard architecture. Labels: Trident Workflow Executor; WF Runtime Services; Publisher (×3); Publisher Interface; Blackboard; Message Subscription Information; Lightweight Message Queue; Subscriber Interface; Subscriber (×3)]
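The roles named in the architecture diagram (publishers, subscribers, subscription information, and a lightweight message queue) can be illustrated with a minimal in-process sketch. It is not the Trident implementation, which the slides describe as distributed: the class name, the method names, and the choice to key subscriptions by message title are assumptions made for this example.

```python
from collections import defaultdict, deque


class Blackboard:
    """Hypothetical in-process broker: a queue plus subscription information."""

    def __init__(self):
        self._subscriptions = defaultdict(list)  # message title -> callbacks
        self._queue = deque()                    # pending (title, pairs) messages

    def subscribe(self, title, callback):
        """Subscriber side: register interest in messages with this title."""
        self._subscriptions[title].append(callback)

    def publish(self, title, pairs):
        """Publisher side: enqueue a message and return immediately."""
        self._queue.append((title, dict(pairs)))

    def dispatch(self):
        """Route every queued message to its subscribers; return deliveries made."""
        delivered = 0
        while self._queue:
            title, pairs = self._queue.popleft()
            for callback in self._subscriptions.get(title, []):
                callback(title, pairs)
                delivered += 1
        return delivered


# Usage: one publisher, two subscribers, as in the tracker/logging/provenance slide.
board = Blackboard()
log, provenance = [], []
board.subscribe("WF Runtime Event", lambda title, pairs: log.append(pairs["Type"]))
board.subscribe("WF Runtime Event", lambda title, pairs: provenance.append(pairs))
board.publish("WF Runtime Event", {"Type": "Activity Started", "Event Order": "5"})
deliveries = board.dispatch()
```

Separating publish from dispatch keeps publishers from blocking on slow subscribers, which fits the slide's requirement that the broker stay lightweight under large message volume.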

Page 38:

Blackboard Architecture

Message Routing

[Figure: the architecture diagram from the previous slide, annotated with the flow of Messages between the Publisher Interface, the Blackboard, and the Subscriber Interface.]

Blackboard:
• Message Rerouting
• Subscription Information Management
• Recovery Logic

Page 39:

Blackboard Architecture

Subscription Information Routing

[Figure: the same architecture diagram, annotated with two flows: Messages and Subscription Information.]

Blackboard:
• Message Rerouting
• Subscription Information Management
• Recovery Logic

Page 40:

Blackboard Architecture

Internal Technologies

[Figure: the same architecture diagram, annotated with the technologies used.]

• Windows Workflow (WF)
• Windows Communication Foundation (WCF)

Page 41:

Blackboard Architecture

Logging and Monitoring Example

[Figure: the architecture diagram with concrete components. Labels: Trident Workflow Executor; WF Runtime Services; Tracking; Publisher Interface; Blackboard; Message Subscription Information; Lightweight Message Queue; Subscriber Interface; File Writer; Composer; Messages; Resources; Registry; Config File]

Blackboard:
• Message Rerouting
• Subscription Information Management
• Recovery Logic

Example message
Title: ‘WF Runtime Event’
Names: ‘Type’ ‘Job ID’ ‘Activity ID’ ‘Event Order’
Values: ‘Activity Started’ ‘{ GUID }’ ‘NetCDF Reader’ ‘5’
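The logging example on this slide pairs tracking events with a file-writing subscriber. A subscriber of that kind can be sketched as a callable that flattens each message into one log line. The line format, the function format_event, and the class FileWriterSubscriber are hypothetical choices for this illustration; the slides do not show the actual log format or subscriber interface.

```python
import io


def format_event(title, pairs):
    """Render a titled name/value collection as a single log line."""
    fields = "; ".join(f"{name}={value}" for name, value in pairs.items())
    return f"[{title}] {fields}"


class FileWriterSubscriber:
    """Hypothetical subscriber that appends each received message to a stream."""

    def __init__(self, stream):
        self._stream = stream  # any object with a write() method, e.g. an open file

    def __call__(self, title, pairs):
        self._stream.write(format_event(title, pairs) + "\n")


# Usage with an in-memory stream; a real log would pass open(path, "a") instead.
buffer = io.StringIO()
writer = FileWriterSubscriber(buffer)
writer("WF Runtime Event", {"Type": "Activity Started", "Activity ID": "NetCDF Reader"})
```

Writing to any object with a write method keeps the subscriber testable and lets the same code target a file, a socket wrapper, or a buffer.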

Page 42:

Blackboard Demo

Page 43:

Trident Tips and Tricks

Page 44:

Interoperability Story

• Silverlight execution environment
– Web frontend for management and execution
– Allows non-Microsoft operating systems to use and administer Trident
• Interface with other systems
– COVE
– myExperiment

Page 45:

Interface Trident with Other Systems

Integration with UW COVE system

DEMO

Page 47:

Trident Tips and Tricks

• Productivity Tools
– Database ready activities
  • Simplifies development of database aware workflows
  • Code generator improves development productivity
– Data visualization and charting activities
– Web Service ready activities
  • Simplifies development of web service aware workflows
  • Code generator improves development productivity

Page 48:

Trident Roadmap to Release

Page 49:

Trident Road Map

Sprint 1
• Composer framework
• Registry
• Distributed execution service

Sprint 2
• Service and Tray Icon (run workflows locally and remotely)
• Workflow model
• Open and Save workflows with Workflow Model
• Subsections
• Intermediate results
• IFELSE
• Workflow over workflow

Sprint 3
• FOR-LOOP and Replicator
• Property Sheets for workflows and activities
• Monitoring (WF events, input & output parameters, performance)
• Data products (input and output)
• Blackboard
• Logging
• Pan-STARRS workflow support