pomsets Workflow management for your...
![Page 1: pomsets Workflow management for your cloudconference.scipy.org/scipy2010/slides/michael_pan_pomsets.pdf · database, workflow management, visualization, and cloud computing technologies.](https://reader035.fdocuments.in/reader035/viewer/2022081620/6106aabbe3539067525269c5/html5/thumbnails/1.jpg)
Michael Pan, nephosity
pomsets: Workflow management for your cloud
In the future, the rapidity with which any given discipline advances is likely to depend on how well the community acquires the necessary expertise in database, workflow management, visualization, and cloud computing technologies.
“Beyond the Data Deluge”, Science, Vol. 323, no. 5919, pp. 1297-1298, 2009.
Workflow management is…
the design, specification, and coordination of the execution of tasks and task dependencies.
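As a minimal sketch of what "tasks and task dependencies" can mean in practice, the snippet below specifies a small workflow as a mapping from each task to the tasks it depends on, then derives one valid execution order. The task names are hypothetical and this is not the pomsets API; it only uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dependencies = {
    "download": set(),
    "clean": {"download"},
    "analyze": {"clean"},
    "plot": {"clean"},
    "report": {"analyze", "plot"},
}

def execution_order(deps):
    """Return one order in which every task runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
print(order)
```

"analyze" and "plot" do not depend on each other, so either may come first; a workflow management system is free to run them at the same time.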
Why workflow management + cloud computing?
• Cloud computing provides the ability to scale compute resources with the work that needs to be done
• Better than what has been available, i.e. WFM+grid
• WFM is critical to a successful long-term cloud computing strategy
• A critical component of the cloud computing software stack
• Growing recognition of the need for workflow management
Issues with WFM+grid
• Jobs submitted to grids queue up behind jobs of other users, reducing the operational efficiencies provided by WFMS
• Heterogeneous compute environments may result in different task results
• Grids are not easily federated, limiting burst computing
• Available only to institutions with the resources to deploy their own grid and implement their own WFMS
Components of a cloud computing software stack
• Virtual machines (VMware, Xen, Virtuozzo, KVM)
• Dynamic provisioning (Amazon EC2, Eucalyptus)
• Task partitioning (MapReduce, Hadoop, Disco, Sphere)
• Data distribution (GFS, HDFS, Ceph, Sector, MongoDB, CouchDB)
• Unified messaging (Qpid, RabbitMQ, ZeroMQ)
• Workflow management (Azkaban, Kepler, Oozie, Pipeline, Pegasus, Taverna, Triana, pomsets)
• Analytics (RightScale, Nagios, Ganglia, Graphite)
Growing recognition of the need for workflow management
(screencap 2009-12-04, currently 59 watchers)
Why pomsets?
• Other existing workflow management systems are made for programmers
• Non-programmers in enterprises need an easier way to manage their data-intensive computational workflows
Oozie
Cascading
Pig
Shell script
pomsets is …
• A mathematical model, first used in 1985 by Vaughan Pratt, to describe concurrent processes
• An application that implements the mathematical model as the data structures that represent workflow components, facilitates the design and specification of workflows, and coordinates the execution of workflow tasks on cloud deployments
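To make the model concrete: a pomset (partially ordered multiset) can be read as a set of events, a label on each event (labels may repeat, hence "multiset"), and a partial order saying which events must precede which. The sketch below is an illustrative data structure under that reading, not the data structures used by pomsets-core; the class name `Pomset`, its methods, and the example events are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Pomset:
    """Illustrative labeled partial order: events, labels, precedence pairs."""
    labels: dict                                # event id -> action label (labels may repeat)
    precedes: set = field(default_factory=set)  # (a, b) means a must finish before b

    def _reachable(self, start):
        """All events that must come after `start` (transitive closure)."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for a, b in self.precedes:
                if a == node and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    def ordered(self, a, b):
        """True if the partial order relates a and b in either direction."""
        return b in self._reachable(a) or a in self._reachable(b)

    def concurrent_pairs(self):
        """Pairs of events the order leaves unrelated, so they may run in parallel."""
        return [(a, b) for a, b in combinations(sorted(self.labels), 2)
                if not self.ordered(a, b)]

# Two events share the label "count": the same action occurs twice.
workflow = Pomset(
    labels={"e1": "split", "e2": "count", "e3": "count", "e4": "merge"},
    precedes={("e1", "e2"), ("e1", "e3"), ("e2", "e4"), ("e3", "e4")},
)
print(workflow.concurrent_pairs())  # [('e2', 'e3')]
```

The two "count" events are unordered with respect to each other, which is exactly the information a scheduler needs to run them concurrently.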
The mathematical definition
The workflow management system
• 2 components
• pomsets-core is the backend and provides an API
• pomsets-gui is the front end and interacts with the user
Features
• Parallel computing
• Data flow
• Flow control
• Workflow reusability
• Compute cloud agnosticism
• Execute environment agnosticism
• Task partitioning
• Shell commands, Hadoop, Python functions, etc.
• Intuitive GUI
• Simple API
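The first two features, parallel computing and data flow, can be sketched with the Python standard library alone. The example below is an assumption-laden illustration, not the pomsets API: the task names, the `run_workflow` helper, and the convention that each task receives a dict of upstream results are all hypothetical. Tasks whose dependencies are satisfied are submitted to a thread pool together, and each task's output flows to the tasks that depend on it.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical tasks: each takes a dict of upstream results and returns a value.
tasks = {
    "load": lambda inputs: [3, 1, 2],
    "total": lambda inputs: sum(inputs["load"]),
    "largest": lambda inputs: max(inputs["load"]),
    "summary": lambda inputs: (inputs["total"], inputs["largest"]),
}
dependencies = {
    "load": set(),
    "total": {"load"},
    "largest": {"load"},
    "summary": {"total", "largest"},
}

def run_workflow(tasks, dependencies, max_workers=4):
    """Run tasks in dependency order, submitting each ready batch in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every task returned here has all of its dependencies completed.
            futures = {
                name: pool.submit(
                    tasks[name], {dep: results[dep] for dep in dependencies[name]}
                )
                for name in sorter.get_ready()
            }
            # Wait for the whole batch, record outputs, unlock downstream tasks.
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results

results = run_workflow(tasks, dependencies)
print(results["summary"])  # (6, 3)
```

Here "total" and "largest" are submitted in the same batch because both depend only on "load". Waiting for a whole batch before starting the next is a simplification chosen for brevity.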
Demo
How to create the following script in pomsets
Demo
Growing recognition
• nephosity was showcased at Structure 2010 as one of the 11 most promising startups, due to its focus on workflow management in the cloud for non-programmers