Trident Scientific Workflow Workbench eScience08 Tutorial Nelson Araujo, Roger Barga, Dean Guo,...
Trident Scientific Workflow Workbench
eScience’08 Tutorial
Nelson Araujo, Roger Barga, Dean Guo, Jared Jackson, Yogesh Simmhan, Catharine van Ingen, Nitin Gautam
Microsoft Research
Joby Thomas and the development team, Aditi Technologies
Satya Sahoo, Wright State University
David Koop, University of Utah
Matt Valerio, Ohio State University
Eran Chinthaka, Indiana University
MSR (Trident) Summer ‘09 Interns
Technical Content
• Introduction
• Feature Overview and Logical Architecture
• Deep(er) dive into select features with demos
• Roadmap to delivery
Overview of our presentation today
Design Philosophy and Exit Strategy
• Leverage COTS WFMS, build only what is required
• Extensible and open, integrate with community tools
• Drive development from actual eScience requirements
• Deliver as open source accelerator to the community
Workflow for Ocean Observatories, part of an “oceanographer’s workbench” Jim Gray
Ocean Observatories Initiative (OOI), formerly the NEPTUNE project
Collaboration with Univ. of Wash & MBARI
Pan-STARRS (Astronomy)
Workflow Requirements
• Load/Merge Databases
• Execute on Clusters
• Monitor workflow execution
• Logging, Provenance, Faults
One of the largest visible light telescopes
Four unit telescopes acting as one
One Gigapixel per telescope
Survey entire visible universe in 1 week
Catalog solar system, moving objects/asteroids
ps1sc.org: Univ. Hawaii, Johns Hopkins, …
Pan-STARRS Load & Merge Workflows
Load workflow
• Start
• Sanity Check of Network Files, Manifest, Checksum
• Validate CSV File & Table Schema
• Create, Register empty LoadDB from template
• For Each CSV File in Batch
  – BULK LOAD CSV File into Table
  – Perform CSV File/Table Validation
• Perform LoadDB/Batch Validation
• End
• Detect Load Fault. Launch Recovery Operations. Notify Admin.
Merge workflow
• Start
• Determine ‘Merge Worthy’ Load DBs & Slice Cold DBs
• Determine affine Slice Cold DB for CSV Batch
• For Each Partition in Slice Cold DB
  – Switch OUT Slice partition to temp
  – UNION ALL over Slice & Load DBs into temp. Filter on partition bound.
  – Post Partition Load Validation
  – Switch IN temp to Slice partition
• Slice Column Recalculations & Updates
• Post Slice Load Validation
• End
• Detect Merge Fault. Launch Recovery Operations. Notify Admin.
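The load half of the diagram (checksum check, CSV/schema validation, bulk load, post-load validation, fault notification) can be illustrated with a small Python sketch. This is not the Pan-STARRS or Trident implementation: the table name, the three columns, the use of MD5, and the in-memory SQLite database are all assumptions made only so the example runs on its own.

```python
import csv
import hashlib
import io
import sqlite3

# Hypothetical schema; the real Pan-STARRS tables are not given in the slides.
EXPECTED_COLUMNS = ["object_id", "ra", "decl"]


def checksum(data):
    """Return the hex digest used for the manifest check in this sketch (MD5 is an assumption)."""
    return hashlib.md5(data).hexdigest()


def load_batch(files, manifest):
    """Validate every CSV file in a batch, then bulk load it into a fresh load DB."""
    # Sanity check: each file must match the checksum recorded in the manifest.
    for name, data in files.items():
        if manifest.get(name) != checksum(data):
            raise ValueError(f"checksum mismatch for {name}")

    # Create an empty load database (in memory here; the slides use a template).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE detections (object_id TEXT, ra REAL, decl REAL)")

    expected_rows = 0
    for name, data in files.items():
        reader = csv.reader(io.StringIO(data.decode("utf-8")))
        header = next(reader, None)
        # Validate the CSV header against the table schema before loading.
        if header != EXPECTED_COLUMNS:
            raise ValueError(f"unexpected columns in {name}: {header}")
        rows = list(reader)
        db.executemany("INSERT INTO detections VALUES (?, ?, ?)", rows)
        expected_rows += len(rows)

    # Post-load validation: the table must hold exactly the rows that were read.
    (loaded,) = db.execute("SELECT COUNT(*) FROM detections").fetchone()
    if loaded != expected_rows:
        raise ValueError("row count mismatch after load")
    db.commit()
    return db


def run_load_workflow(files, manifest, notify_admin=print):
    """Run the load; on any fault, notify the admin and return None."""
    try:
        return load_batch(files, manifest)
    except (ValueError, sqlite3.Error) as fault:
        notify_admin(f"load fault: {fault}")
        return None
```

A recovery step would normally sit where `notify_admin` is called; it is left out because the slides do not say what the recovery operations are.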
http://research.microsoft.com/en-us/collaboration/tools/trident.aspx
http://beta.research.microsoft.com/en-us/collaboration/tools/trident.aspx
Trident Public Website Accessible today
From January ‘09
Logical Architecture
Features
Building on Windows Workflow
Trident Logical Architecture
• Visualization
• Design
• Workflow Packages
• Management Studio
• Community
• Workbench
• Desktop
• Browser
• Windows Workflow Foundation
• Scientific Workflows
• Monitor
• Administration
• Web Portal (myExperiment)
• Archiving
• Registry Management
• Trident Registry
• Data Model (Data Agnostic Abstraction)
• Data Access: SQL Server, SSDS, S3, Others
• Trident Runtime Services: Provenance, Publish-Subscribe Blackboard, WF Execution Hosts, Fault Tolerance, HPC Scheduling, Others
Trident Features
Libraries of activities, services, and workflows
– Prepackaged activities and workflows out of the box and custom libraries
– Registry with rich sets of workflow metadata
– Versions
– Workflow packages
– Social annotations (myExperiment)
Trident Features
Two programming interfaces to Trident
• Use Visual Studio to develop custom activities and workflows and import them to Trident
• Visually Compose Workflows
  – No programming or scripting is required
  – Drag and drop a workflow or an activity
  – Subsections
Execution Service
• Local or distributed execution of workflows
  – HPCS cluster
  – Cloud services
• Interactive and non-interactive execution service
• Publishes events to subscriber services, such as tracking, provenance, and monitoring.
Workflow Monitoring
• Remote and local monitoring
  – Workflow processing status
  – Input and output parameters
  – Data products
  – Performance
Management Studio
• Administration of workflows and workflow scheduling
• Registry management
• Monitoring
What is Windows Workflow?
• Part of Microsoft’s .NET Framework 3.0, 3.5, and upcoming 4.0
• Activities
• Runtime
• Tooling
Host Process (.exe, IIS, …)
• WF Runtime
• Extensions: Tracking, Persistence, …
• Workflow
• Activity Library
• Tooling: VS Designer, VS Debugger, Rehosted Designer
• Windows Workflow Base Activity Library: Basic, Composite
Workflow Authoring
Trident Workflow Composer
An End User Application for Editing, Executing, and Monitoring Scientific Workflows
What Differentiates Scientific Workflow?
• Composition goes through many iterations
• Data flow is a first class citizen
• Need an easy way to publish and share
• Provenance
  – Runtime
  – Evolutionary
• Adaptable to different computing environments
Trident Workflow Composer
Composition Space
Activity Library
Workflow Library
Data Options & Sharing
Composer Demo
Trident Registry
Flexible Data Store And Some More
Trident Registry
Motivation: Why a new registry system?
• Single “point of truth” of the system
  – Facilitates state synchronization actions
  – Catalog keeps track of computing resources and state
• Flexible Storage
  – What is it?
    • Flexible store mechanism
    • Supports Microsoft and non-Microsoft store providers
    • Supports local, client-server and cloud architectures
  – Non goals
    • Replacement for LINQ or ER Framework
• Reference Catalog
  – Unified view of the resources
  – Stores references to internal and external resources
  – Flexible provider mechanism to abstract access to external resources
Trident Registry
Registry Connections
Trident Registry
Registry Management
Trident Registry
Data Providers: Abstracting “What’s out there”
• Storage providers
  – Provides abstraction to data structures stored in the backend
  – No assumptions on how data was stored and related
  – Implemented using “verbs” and “subjects” actions
    • “Store object user with these properties”
    • “Relate this user object with this service as its owner”
    • “Delete namespace object”
• Data abstraction layer and code generation
  – C# generated code provides shield and programming API
  – C# code generator generates SQL catalog for perfect data-code match
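The three quoted "verb and subject" actions (store, relate, delete) can be pictured with a toy in-memory provider. This is a Python sketch only; the class name `InMemoryProvider`, its method names, and the tuple keys are hypothetical and do not come from the Trident Registry API.

```python
class InMemoryProvider:
    """Toy storage provider exposing verb-style operations on typed objects."""

    def __init__(self):
        self._objects = {}       # (kind, name) -> properties
        self._relations = set()  # (subject_key, relation, target_key)

    def store(self, kind, name, **properties):
        """Verb: store an object of the given kind with these properties."""
        key = (kind, name)
        self._objects[key] = dict(properties)
        return key

    def relate(self, subject, relation, target):
        """Verb: relate one stored object to another, e.g. a user as owner of a service."""
        if subject not in self._objects or target not in self._objects:
            raise KeyError("both objects must be stored before they are related")
        self._relations.add((subject, relation, target))

    def delete(self, key):
        """Verb: delete an object and every relation that mentions it."""
        del self._objects[key]
        self._relations = {r for r in self._relations if key not in (r[0], r[2])}

    def related(self, relation, target):
        """Return the keys of all objects that hold `relation` to `target`."""
        return sorted(s for s, rel, t in self._relations
                      if rel == relation and t == target)


provider = InMemoryProvider()
user = provider.store("user", "alice", email="alice@example.org")
service = provider.store("service", "executor")
provider.relate(user, "owner", service)
owners = provider.related("owner", service)
```

Because callers only use the verbs, the dictionary and set could be swapped for any backend without changing the calling code, which is the point the slide makes about not assuming how data is stored.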
Trident Registry
Data Providers: Abstracting “What’s out there”
• Creating new providers
  – Why would I create a new storage provider?
    • Enable Trident to store / retrieve state from other platforms
    • Enable Trident to store / retrieve state on other systems
    • Enhance existing providers with new features and abstractions
  – What it takes to create a new provider
    • Create a new assembly (or add to an existing provider assembly)
    • Create a new class derived from Microsoft.Research.eResearch.Connection
    • Drop the new DLL into the Trident folder
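The "derive from a base connection class and drop it in" pattern can be sketched in Python as a base class that records its subclasses. This is an analogy, not the Trident mechanism: the real base type is the .NET class `Microsoft.Research.eResearch.Connection`, and the `provider_name` attribute, the `open` factory, and the `save`/`load` methods below are invented for illustration.

```python
class Connection:
    """Stand-in for a provider base class; not the real Trident type."""

    _providers = {}       # provider name -> provider class
    provider_name = None  # subclasses set this to become discoverable

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Defining a subclass is enough to register it, much like dropping in a DLL.
        if cls.provider_name:
            Connection._providers[cls.provider_name] = cls

    @classmethod
    def open(cls, provider_name, **settings):
        """Create a connection using whichever provider was registered under this name."""
        try:
            provider = cls._providers[provider_name]
        except KeyError:
            raise LookupError(f"no provider named {provider_name!r}") from None
        return provider(**settings)

    def save(self, key, value):
        raise NotImplementedError

    def load(self, key):
        raise NotImplementedError


class DictConnection(Connection):
    """A new provider: keeps registry state in a plain dictionary."""

    provider_name = "dict"

    def __init__(self):
        self._data = {}

    def save(self, key, value):
        self._data[key] = value

    def load(self, key):
        return self._data[key]


conn = Connection.open("dict")
conn.save("workflow:demo", {"version": 1})
```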
Creating a new Registry Provider
DEMO
Trident Registry
Storage vs References
• Use Cases
  – Object Tracking
  – Data and Process Discovery
    • All workflow aspects are exposed in the storage schema
    • Allows rich query of data, activities, parameters, etc.
• Data Providers
  – Abstraction layer to external references (similar to registry data storage)
    • Enables user applications to benefit from unified model
    • Simplifies development
    • Enables fault tolerance for external resource sources
    • Not every workflow needs to worry about these details
  – All data provider knowledge resides in the registry
  – Pluggable and flexible
API: Managed, Native, Web Services
Trident Registry
Provider API
Managed (.NET) API
– Library of choice for interacting with Trident Registry
– Simplifies lots of data complexity
– Abstracts verbs and actions into an object model
– Access to all Trident Registry objects and relations
– No need for servers and services to operate (accesses the data backend directly)
– Faster, no extra hops. Direct data access.
Native API
– Useful for non-managed applications and systems integration
– Similar to Managed (.NET) API in terms of performance and requirements
– But more limited (not a 100% feature match right now)
Web Services API
– Recommended for non-Microsoft platform integration, e.g. Linux and Mac OS
– Requires an IIS web server and service configured
– Greater control over data and process, higher data security
– Only core objects and relationships are exposed right now
– Extra parsing and processing hop. Need to consider clustering and load-balancing solutions for high-performance scenarios
Trident Blackboard
A Distributed Eventing Model for Workflow
The Workflow Runtime and Tracking Services
• WF workflows launch in a runtime context
  – Runtime thread controls WF related threads
    • Execution thread
    • Built-in services
    • Custom services
• Built-in services track workflow execution
  – Workflow events
  – Individual activity events
  – Data updates
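How a runtime can report workflow events, activity events, and data updates to attached services is sketched below in Python. It does not use or mirror the Windows Workflow Foundation API; the `Runtime` and `Activity` classes and the event names are assumptions, and everything runs on a single thread for simplicity.

```python
class Activity:
    """One workflow step: a name plus a function applied to the data."""

    def __init__(self, name, func):
        self.name = name
        self.func = func


class Runtime:
    """Toy runtime that runs activities in order and reports to tracking services."""

    def __init__(self):
        self._services = []

    def add_service(self, service):
        # A service is any callable taking (event_type, details).
        self._services.append(service)

    def _track(self, event_type, **details):
        for service in self._services:
            service(event_type, details)

    def run(self, workflow_name, activities, data):
        self._track("WorkflowStarted", workflow=workflow_name)
        for activity in activities:
            self._track("ActivityStarted", activity=activity.name)
            data = activity.func(data)
            # Data update: report the value produced by this activity.
            self._track("ActivityCompleted", activity=activity.name, output=data)
        self._track("WorkflowCompleted", workflow=workflow_name)
        return data


events = []
runtime = Runtime()
runtime.add_service(lambda event_type, details: events.append(event_type))
result = runtime.run("demo", [Activity("double", lambda x: x * 2),
                              Activity("increment", lambda x: x + 1)], 5)
```

A custom service is added the same way as the list-appending one here, which is the hook a tracking or provenance component would use.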
Trident Blackboard
• A distributed Pub/Sub model for workflow eventing
• Why?
  – Tracking information needs to be shared across compute nodes
  – Workflows are evolutionary and thus messengers require a pluggable interface
  – Large message volume means that the message broker needs to be light-weight and fast
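The publish/subscribe idea can be shown with a minimal broker that queues messages and hands each one to the callbacks subscribed to its title. This Python sketch runs in one process, whereas the Trident Blackboard is described as distributed; the `Blackboard` class, its method names, and the choice to route by message title are assumptions.

```python
from collections import defaultdict, deque


class Blackboard:
    """Minimal in-process broker: a message queue plus subscription information."""

    def __init__(self):
        self._subscriptions = defaultdict(list)  # title -> subscriber callbacks
        self._queue = deque()                    # pending (title, pairs) messages

    def subscribe(self, title, callback):
        """Register a callback for every message published under this title."""
        self._subscriptions[title].append(callback)

    def publish(self, title, pairs):
        """Queue a message; publishers never call subscribers directly."""
        self._queue.append((title, dict(pairs)))

    def dispatch(self):
        """Deliver all queued messages and return the number of deliveries made."""
        delivered = 0
        while self._queue:
            title, pairs = self._queue.popleft()
            for callback in self._subscriptions.get(title, ()):
                callback(title, pairs)
                delivered += 1
        return delivered


log, provenance = [], []
board = Blackboard()
board.subscribe("WF Runtime Event", lambda title, pairs: log.append(pairs["Type"]))
board.subscribe("WF Runtime Event", lambda title, pairs: provenance.append(pairs))
board.publish("WF Runtime Event", {"Type": "Activity Started", "Event Order": "5"})
board.publish("Unwatched Title", {"Type": "Ignored"})
delivered = board.dispatch()
```

Messages with no subscriber are dropped here; a real broker would need a policy for them, along with the recovery logic that this sketch omits.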
The Blackboard Message
• Titled name/value pair collection
  – All values are strings
  – Title and names can resolve against an ontology
Structure
Title: ‘Collection Title’
‘name 1’: ‘value 1’
‘name 2’: ‘value 2’
‘name 3’: ‘value 3’
Example
Title: ‘WF Runtime Event’
‘Type’: ‘Activity Started’
‘Job ID’: ‘{ GUID }’
‘Activity ID’: ‘NetCDF Reader’
‘Event Order’: ‘5’
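A titled name/value collection whose values are always strings maps naturally onto a small data class. The Python sketch below rebuilds the example message from the slide; the class name `BlackboardMessage` and its methods are hypothetical, and the `{ GUID }` placeholder is kept exactly as the slide shows it.

```python
from dataclasses import dataclass, field


@dataclass
class BlackboardMessage:
    """A titled collection of name/value pairs in which every value is a string."""

    title: str
    pairs: dict = field(default_factory=dict)

    def add(self, name, value):
        # Values are converted to strings, so 5 and "5" are stored identically.
        self.pairs[str(name)] = str(value)
        return self

    def names(self):
        return list(self.pairs)

    def values(self):
        return list(self.pairs.values())


message = (BlackboardMessage("WF Runtime Event")
           .add("Type", "Activity Started")
           .add("Job ID", "{ GUID }")
           .add("Activity ID", "NetCDF Reader")
           .add("Event Order", 5))
```

Keeping every value as a string keeps the message format simple, at the cost of leaving any parsing to the subscriber.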
Publisher: Workflow Tracker
Subscribers: Database Logging, Provenance Store
Blackboard Architecture
• Trident Workflow Executor
  – WF Runtime Services
  – Publishers
• Blackboard
  – Publisher Interface
  – Subscriber Interface
  – Message Subscription Information
  – Lightweight Message Queue
  – Message Rerouting
  – Subscription Information Management
  – Recovery Logic
• Subscribers
Message Routing
• Messages
Subscription Information Routing
• Messages
• Subscription Information
Internal Technologies
• Windows Workflow (WF)
• Windows Communication Foundation (WCF)
Blackboard Architecture
Logging and Monitoring Example
• Trident Workflow Executor: WF Runtime Services, Tracking
• Blackboard: Publisher Interface, Subscriber Interface, Message Subscription Information, Lightweight Message Queue (Message Rerouting, Subscription Information Management, Recovery Logic)
• Subscribers: File Writer, Composer
• Resources: Registry, Config File
• Messages
Title: ‘WF Runtime Event’
‘Type’: ‘Activity Started’
‘Job ID’: ‘{ GUID }’
‘Activity ID’: ‘NetCDF Reader’
‘Event Order’: ‘5’
Blackboard Demo
Trident Tips and Tricks
Interoperability Story
• Silverlight execution environment
  – Web frontend for management and execution
  – Allows non-Microsoft operating systems to use and administer Trident
• Interface with other systems
  – COVE
  – myExperiment
Interface Trident with Other Systems
Integration with UW COVE system
DEMO
Trident Tips and Tricks
• Productivity Tools
  – Database ready activities
    • Simplifies development of database aware workflows
    • Code generator improves development productivity
  – Data visualization and charting activities
  – Web Service ready activities
    • Simplifies development of web service aware workflows
    • Code generator improves development productivity
Trident Roadmap to Release
Sprint 1
• Composer framework
• Registry
• Distributed execution service
Sprint 2
• Service and Tray Icon (run workflows locally and remotely)
• Workflow model
• Open and Save workflows with Workflow Model
• Subsections
• Intermediate results
• IFELSE
• Workflow over workflow
Sprint 3
• FOR-LOOP and Replicator
• Property Sheets for workflows and activities
• Monitoring (WF events, input & output parameters, performance)
• Data products (input and output)
• Blackboard
• Logging
• Pan-STARRS workflow support
Trident Road Map