Tigres: Template Interfaces for Agile Parallel Data-Intensive Science
Lavanya Ramakrishnan, Deb Agarwal
Lawrence Berkeley National Lab
Tigres Team
• Core Team
  – Deb Agarwal (PI), Lavanya Ramakrishnan, Dan Gunter
  – Valerie Hendrix, Gilberto Pastorello, Sarah Poon
  – Ryan Rodriguez, James Fox
• CS Research groups
  – John Shalf, Shane Canon, Nicholas Wright
• Science research groups
  – Cosmology: Alex Kim, Rollin Thomas, Stephen Bailey
  – Gamma Ray: Dan Chivers
  – Advanced Light Source: Dula Parkinson
  – HEP: Paolo Calafiura
  – Materials: Kristin Persson
Tigres: Design templates for common scientific workflow patterns
[Figure: base Tigres templates and "LightSrc" domain templates used to build applications "LightSrc-1" and "LightSrc-2"; stages labeled Create and Debug, Share, and Scale up]
• Workflow Library: implement templates as a library in an existing language
• Basic Templates: Sequence, Parallel, Split, Merge
An early Python release is now available: http://tigres.lbl.gov
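To make the four basic templates concrete, here is a minimal plain-Python sketch of how they compose. It assumes each task is a one-argument Python callable and uses these semantics: Sequence chains outputs to inputs, Parallel runs independent tasks concurrently, Split runs one task and fans its output out to parallel tasks, and Merge runs parallel tasks and feeds all their outputs to one task. The function names are hypothetical illustrations, not the Tigres API, which defines its own task and input types.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence

# Hypothetical task type for this sketch: any one-argument callable.
Task = Callable[[Any], Any]


def sequence(tasks: Sequence[Task], value: Any) -> Any:
    """Run tasks one after another; each task's output is the next task's input."""
    for task in tasks:
        value = task(value)
    return value


def parallel(tasks: Sequence[Task], inputs: Sequence[Any]) -> List[Any]:
    """Run independent tasks concurrently, pairing task i with inputs[i].

    Results are returned in task order, regardless of completion order.
    """
    if len(tasks) != len(inputs):
        raise ValueError("parallel() needs exactly one input per task")
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task, arg) for task, arg in zip(tasks, inputs)]
        return [f.result() for f in futures]


def split(split_task: Task, tasks: Sequence[Task], value: Any) -> List[Any]:
    """Run one task, then fan its single output out to parallel tasks."""
    shared = split_task(value)
    return parallel(tasks, [shared] * len(tasks))


def merge(tasks: Sequence[Task], inputs: Sequence[Any],
          merge_task: Callable[[List[Any]], Any]) -> Any:
    """Run tasks in parallel, then pass the list of all their outputs to one task."""
    return merge_task(parallel(tasks, inputs))


if __name__ == "__main__":
    print(sequence([lambda x: x + 1, lambda x: x * 3], 2))                # 9
    print(split(lambda x: x * 2, [lambda y: y + 1, lambda y: y + 2], 5))  # [11, 12]
    print(merge([lambda a: a * a, lambda a: a * a], [3, 4], sum))         # 25
```

Because each template takes and returns ordinary Python values, templates nest by passing one template's result into another, which is the "library in an existing language" idea in miniature.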
Key Aspects of Tigres
• Targeted at large-scale data-intensive workflows
  – Motivated by the “MapReduce” model
• Library model embedded in existing languages such as Python and C
  – “Extend current scripting/programming tools”
  – API-based, embedded in code
• Lightweight execution framework
  – “As easy to run as an MPI program on an HPC resource”
  – No persistent services
• Scientist-Centered Design Process
  – Get continuous feedback from users
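The MapReduce motivation maps onto a split/merge shape: independent map tasks run in parallel over chunks of the data, and a single reduce step combines their outputs. The sketch below is a generic word count in standard-library Python, chosen only to illustrate that pattern; it does not use Tigres, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List


def map_chunk(lines: List[str]) -> Counter:
    """Map step: count words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials: List[Counter]) -> Counter:
    """Reduce step: combine the per-chunk counts into one total."""
    return reduce(lambda a, b: a + b, partials, Counter())


def word_count(lines: List[str], n_chunks: int = 2) -> Counter:
    # Partition the input into interleaved chunks (the "split" side).
    chunks = [lines[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # One task consumes every map output (the "merge" side).
    return reduce_counts(partials)


if __name__ == "__main__":
    data = ["light source data", "source data", "data"]
    print(word_count(data).most_common())  # [('data', 3), ('source', 2), ('light', 1)]
```

Threads keep the example simple; for CPU-bound map steps in Python, swapping in `ProcessPoolExecutor` (or an HPC job launcher) is the usual choice, and the split/merge structure stays the same.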
Tigres: Current Status (in Release)
• Iterative workflow development
  – Simple data model
  – Python API to compose and execute workflows
  – Use programming-language constructs for complex logic flows
• Execution
  – Existing application binaries and functions
  – Seamlessly run on desktops, clusters and HPC
• Monitoring, provenance
  – Visual representation of the graph that ran
  – Extensive monitoring of workflow execution
  – Support for adding user-level provenance
• Extensive documentation, examples and tutorials
[Figure: Tigres data model]
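Monitoring and provenance of this kind come down to emitting a structured event whenever a task starts or finishes, then rebuilding what ran from those events. The sketch below assumes a hypothetical JSON-lines event format; the field names `event`, `task`, `inputs_from` and `time`, and the helpers `EventLog`, `run_task` and `executed_graph`, are illustrative only and are not the Tigres log format or API.

```python
import json
import time
from typing import Any, Callable, Dict, List


class EventLog:
    """Hypothetical in-memory monitoring log; each entry is one JSON line."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def emit(self, event: str, task: str, **fields: Any) -> None:
        # Extra keyword fields allow user-level provenance annotations.
        record = {"event": event, "task": task, "time": time.time(), **fields}
        self.lines.append(json.dumps(record))

    def records(self) -> List[Dict[str, Any]]:
        return [json.loads(line) for line in self.lines]


def run_task(log: EventLog, name: str, fn: Callable[..., Any],
             inputs_from: List[str], *args: Any) -> Any:
    """Run fn(*args), logging start/end events and the upstream tasks it consumed."""
    log.emit("task_start", name, inputs_from=inputs_from)
    result = fn(*args)
    log.emit("task_end", name)
    return result


def executed_graph(log: EventLog) -> Dict[str, List[str]]:
    """Rebuild the dependency graph that actually ran: task -> upstream tasks."""
    return {r["task"]: r["inputs_from"]
            for r in log.records() if r["event"] == "task_start"}


if __name__ == "__main__":
    log = EventLog()
    raw = run_task(log, "load", lambda: [1, 2, 3], [])
    total = run_task(log, "sum", sum, ["load"], raw)
    log.emit("user_provenance", "sum", note="calibration v2")  # user-level annotation
    print(total, executed_graph(log))  # 6 {'load': [], 'sum': ['load']}
```

Writing the log as JSON lines keeps it readable by other processes both during and after a run, which is what makes a "graph that ran" view possible without a persistent service.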
Tigres: Current Status
• Recover failed workflows from logs (testing)
• C API in development (90% done)
• Active code generation (prototype)
• Fault tolerance and failure recovery API (design)
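Recovering a failed workflow from its logs generally means replaying the log to find which tasks already completed and re-running only the rest. The sketch below assumes a hypothetical JSON-lines log that records each completed task's name and a JSON-serializable result, and it handles only a linear sequence of named steps. It illustrates the general idea and is not the Tigres recovery mechanism, which the slide lists as still in testing.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


def completed_from_log(log_lines: List[str]) -> Dict[str, Any]:
    """Replay a JSON-lines log and return {task name: recorded result}."""
    done: Dict[str, Any] = {}
    for line in log_lines:
        record = json.loads(line)
        if record.get("event") == "task_end":
            done[record["task"]] = record["result"]
    return done


def run_with_recovery(steps: List[Tuple[str, Callable[[Any], Any]]], value: Any,
                      log_lines: List[str]) -> Tuple[Any, List[str]]:
    """Run named steps in order, skipping any step already completed in the log.

    Returns the final value and the names of the steps that actually executed.
    New completions are appended to log_lines so a later run can resume as well.
    Assumes results survive a JSON round trip (e.g. tuples come back as lists).
    """
    done = completed_from_log(log_lines)
    executed: List[str] = []
    for name, fn in steps:
        if name in done:
            value = done[name]  # reuse the logged result instead of recomputing
            continue
        value = fn(value)
        executed.append(name)
        log_lines.append(json.dumps({"event": "task_end", "task": name, "result": value}))
    return value, executed


if __name__ == "__main__":
    def fail(_: Any) -> Any:
        raise RuntimeError("node lost")

    log: List[str] = []
    try:
        run_with_recovery([("a", lambda x: x + 1), ("b", lambda x: x * 10), ("c", fail)], 1, log)
    except RuntimeError:
        pass
    # Partial recovery run: "a" and "b" are replayed from the log; only "c" runs.
    fixed = [("a", lambda x: x + 1), ("b", lambda x: x * 10), ("c", lambda x: x - 5)]
    print(run_with_recovery(fixed, 1, log))  # (15, ['c'])
```

A real workflow graph with parallel branches would need the log to capture dependencies too, so that only the failed branch and its downstream tasks are re-run.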
Scientist-Centered Design Process
• Usability studies provide semi-structured feedback from end users
  – Not the same as requirements gathering
  – Limited literature on usability studies for APIs
• Round 1: Paper API & Google Docs coding session
  – Goal: nomenclature and desired features
  – Priorities: nomenclature, monitoring, dependency syntax, ...
• Round 2: Initial prototype with documentation
  – Goal: effectiveness of using the API for specific problems
  – Understanding experience relative to programming work styles: opportunistic, pragmatic, systematic
  – Questionnaire and interview, plus a 3/6-month follow-up
Experiences with User-Centered Design for the Tigres Workflow API, IEEE eScience 2014
Iterative Scientific Workflow Process
[Figure: stages labeled Design, Develop, Run and Feedback. Models and existing codes are translated into a Tigres program, e.g. start(name="MyWorkflow") ... split(name="Split", ...) merge(name="Merge", ...) ... end(), which runs on a desktop or HPC resource and writes a log. A run may be a partial recovery run, and program state is available during and after runs.]
Tigres “Library” Model
[Figure: the iterative scientific workflow process diagram, with the Tigres library shown as layers labeled User API, State Management, Execution Management and Core API, plus Monitoring]
Next Steps: Nested Templates, Fault Tolerance API, Efficient Decentralized Execution
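The layered model can be illustrated with a small sketch: a user-facing API that delegates execution to a pluggable backend and records outputs in a state manager, so the same workflow definition runs unchanged on a serial backend (a desktop) or a parallel one (a local stand-in for a cluster or HPC allocation). All class names here are hypothetical; the slides do not specify how Tigres's User API, State Management, Execution Management and Core API layers are implemented, and the Core API and Monitoring layers are omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List, Protocol


class ExecutionBackend(Protocol):
    """Execution-management layer: decides how a batch of independent calls runs."""

    def run_all(self, fn: Callable[[Any], Any], items: Iterable[Any]) -> List[Any]: ...


class SerialBackend:
    """Runs calls one at a time, e.g. while debugging on a desktop."""

    def run_all(self, fn: Callable[[Any], Any], items: Iterable[Any]) -> List[Any]:
        return [fn(item) for item in items]


class ThreadBackend:
    """Runs calls on a local thread pool; a stand-in for a parallel resource."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run_all(self, fn: Callable[[Any], Any], items: Iterable[Any]) -> List[Any]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, items))


class StateManager:
    """State-management layer: keeps each named step's output for later inspection."""

    def __init__(self) -> None:
        self.outputs: Dict[str, Any] = {}

    def record(self, name: str, value: Any) -> None:
        self.outputs[name] = value


class Workflow:
    """User-API layer: scientists call this and never manage threads directly."""

    def __init__(self, backend: ExecutionBackend) -> None:
        self.backend = backend
        self.state = StateManager()

    def parallel(self, name: str, fn: Callable[[Any], Any], items: Iterable[Any]) -> List[Any]:
        result = self.backend.run_all(fn, items)
        self.state.record(name, result)
        return result


def analysis(wf: Workflow) -> List[int]:
    # The same workflow definition runs on either backend.
    return wf.parallel("square", lambda x: x * x, range(4))


if __name__ == "__main__":
    for backend in (SerialBackend(), ThreadBackend()):
        wf = Workflow(backend)
        print(type(backend).__name__, analysis(wf), wf.state.outputs)
```

Keeping the backend behind a narrow interface is what lets execution targets change without touching user code; a decentralized or batch-scheduler backend would implement the same `run_all` method.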
Other Collaborations
• DALHIS: INRIA Associated Team
  – Building a data analysis environment using shared-space execution and cloud models
  – Paper: Combining Workflow Templates with a Shared Space-based Execution Model, WORKS 2014
• NERSC
  – Identifying next-generation workflows and the supporting service needs at HPC centers
• ARES
  – Use of Tigres for managing shared data-analysis workflows
• Additional communities
  – Climate CASCADE SFA, Berkeley Institute for Data Science (BIDS), ...
Open Research Topics
• How does a “computational/data” workflow tie in with the larger scientific process and scientists’ development environments?
• How do we balance dynamic, interactive and iterative needs with the global optimizations needed for exascale?
• How do we provide a framework that allows for data fusion from multiple diverse sources that can be used to derive knowledge?