Kepler Project Overview, Status, Future Directions
![Page 1: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/1.jpg)
8th Biennial Ptolemy Miniconference
Berkeley, CA
April 16, 2009
Kepler Project Overview, Status, Future Directions
Bertram Ludäscher
University of California, Davis
![Page 2: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/2.jpg)
Ptolemy Miniconference, April 16, 2009
Overview
• History
– Origins, Diversity, Challenges
• Kepler, Kepler/CORE:
– Issues, Status
– Next steps
• Research: Scientific Workflows …
– Business (workflows) as usual?
– Or … ?
![Page 3: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/3.jpg)
Kepler: Some History
• The Origins:
– AD 2002: NSF/SEEK, DOE/SDM
  • Similar requirements for “scientific workflows”
  • Can we avoid reinventing the wheel (twice…)!?
– Grass-roots effort, open source collaboration
• The Head-start:
– Adopting, extending Ptolemy II from Berkeley
– Common software platform facilitates grass-roots collaboration
– More than software: Research Results
  • Heterogeneous Modeling & Design, dataflow-, actor-oriented MoCs
![Page 4: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/4.jpg)
Scientific Workflows: Cyberinfrastructure “Upperware”
Layers (figure labels): Underware, Middleware, Upper Middleware, Upperware
NSF/SEEK ITR, 5-year collaboration: SDSC, UCSB, UCD, UNM, UK, …
![Page 5: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/5.jpg)
Scientific Workflow
Capture how a scientist works with data and analytical tools
– data access, transformation, analysis, visualization
– possible worldview: dataflow-oriented (cf. signal-processing)
Scientific workflow (wf) benefits (compare w/ script-based approaches):
– wf automation
– wf & component reuse
– wf design, documentation
– wf archival, sharing
– built-in concurrency (task-, pipeline-parallelism)
– built-in provenance support
– distributed & parallel exec: Grid & cluster support
– …
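The dataflow worldview and the "built-in concurrency" benefit listed above can be illustrated with a minimal sketch. This is not Kepler or Ptolemy II code: `actor`, `run_pipeline`, and the three-step example are hypothetical names chosen for illustration. Each step runs in its own thread and the steps are connected by FIFO queues, so different tokens can be in different steps at the same time (pipeline parallelism).

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a token stream


def actor(func, inbox, outbox):
    """Run one workflow step: read tokens, apply func, write results."""
    while True:
        token = inbox.get()
        if token is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        outbox.put(func(token))


def run_pipeline(steps, inputs):
    """Connect steps with FIFO queues and run each step in its own thread."""
    queues = [queue.Queue() for _ in range(len(steps) + 1)]
    threads = [
        threading.Thread(target=actor, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(steps)
    ]
    for thread in threads:
        thread.start()
    for item in inputs:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        token = queues[-1].get()
        if token is _DONE:
            break
        results.append(token)
    for thread in threads:
        thread.join()
    return results


# Hypothetical three-step analysis: access -> transform -> analyze.
readings = run_pipeline(
    [lambda x: x * 1.0, lambda x: x - 32.0, lambda x: round(x * 5.0 / 9.0, 1)],
    [32, 212, 50],
)
```

Because every queue is FIFO and each step has a single worker, results come out in input order. A script-based version would have to add this threading by hand, which is the point of the comparison on the slide.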
![Page 6: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/6.jpg)
Simple Kepler analysis workflow using R …
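Data source from EcoGrid (metadata-driven ingestion)
R processing script:
res <- lm(BARO ~ T_AIR)   # fit a linear regression of BARO on T_AIR
res                       # print the fitted model
plot(T_AIR, BARO)         # scatter plot of the two variables
abline(res)               # overlay the fitted regression line
Dan Higgins, NCEAS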
![Page 7: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/7.jpg)
… vs. “Plumbing” workflow
• to monitor, control remote supercomputer simulations …
– 50+ composite actors (subworkflows)
– 4 levels of hierarchy
– 1000+ atomic (Java) actors
Subworkflow sizes (figure labels): 43 actors, 3 levels; 196 actors, 4 levels; 30 actors; 206 actors, 4 levels; 137 actors; 33 actors; 150123 actors; 66 actors; 12 actors; 243 actors, 4 levels
Norbert Podhorszki (then: UC Davis, now: ORNL …)
![Page 8: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/8.jpg)
Kepler: Open Source + Open Community
1. Huge diversity of domains => needs
– Astrophysics, nuclear fusion research, geoinformatics, ecology, systematics, bioinformatics, genomics, environmental monitoring, simulation, …
– Not just bioinformatics and cheminformatics …
2. A broad range of technical problems
– Workflow design with a graphical UI
– Sharing actors, workflows across communities
– Distributed workflow execution
– Data movement on the network
– Integrate local apps, web services, native actors
– Support a variety of computational models
– Not just web service orchestration or Grid deployment
![Page 9: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/9.jpg)
Kepler: Open Source + Open Community
3. Many kinds of users with different backgrounds and responsibilities:
• Scientists automating and sharing their analyses of their own data or performing meta-analyses on others’ data
• Software engineers developing their own systems around Kepler
• Computer scientists doing basic research in scientific workflows, data and provenance management, distributed and collaborative computing
• Not just biologists and chemists …
4. Kepler used in many different deployment contexts
• Standalone application on a scientist’s desktop computer or laptop
• Backend for web-based scientific applications
• Embedded workflow engine in larger systems
• One size (of deployment) does not fit all!
5. Kepler open to contribution and extension by anyone:
• Anyone can contribute to Kepler!
• Anyone can use Kepler in their own applications
• Developing with Kepler doesn’t require collaboration with the “owners” …
![Page 10: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/10.jpg)
Kepler/CORE and application domains (figure labels)
Domains: Ecology, Chemistry, Geosciences, Oceanography, Molecular Biology, Phylogenetics, Conservation Biology, Library Science, Particle Physics, Astronomy
Projects: COMET, ChIP-chip
![Page 11: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/11.jpg)
Kepler-CORE Mission
In collaboration with current and future contributors to Kepler, the Kepler/CORE team will …
– Develop and maintain the essential, interdisciplinary software components of Kepler
– Coordinate the contributions of the greater Kepler collaboration to the core (“kernel”) of the system
– Increase the role of the current and future user community in specifying requirements and priorities
![Page 12: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/12.jpg)
The Kepler-CORE Project and Team
• Kepler-CORE (sensu stricto)
– 3-year, $1.7M NSF-OCI funded project
• Kepler-CORE team @ UCD, UCSB, UCSD:
– UC Davis
  • Bertram Ludäscher (PI@UCD), Shawn Bowers (co-PI), Tim McPhillips (co-PI & software architect), David Welker (software engineer), Sean Riddle (software engineer)
– UC Santa Barbara
  • Matthew Jones (PI@UCSB), Mark Schildhauer (co-PI), Aaron Schultz, Chad Berkley (software engineer)
– UC San Diego
  • Ilkay Altintas (PI@SDSC), Jianwu Wang (postdoc)
• Kepler/CORE (sensu lato)
– Goal: sustain long-term, beyond initial funding period
– KEPLER = Kepler/Core + Kepler/X + Kepler/Y + …
  • Core, X, Y, … = open community of stakeholders, contributors, users, etc.
![Page 13: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/13.jpg)
Kepler-CORE Vision
In the future we foresee Kepler…
• Satisfying the scientific workflow automation needs of
– Collaborative government-funded projects
– Academic research groups
– Individual researchers in diverse scientific disciplines
• Enhancing the productivity of researchers by
– Facilitating discovery and collaboration within and across disciplines
– Being the best way for scientists to leverage developments and expertise in other domains
• Leading to further breakthroughs and innovations in the fields of
– Scientific data management
– Data provenance
– Collaborative scientific computing
• Shepherded by a self-sustaining effort that thrives well beyond the lifetimes of the grants that have contributed to Kepler’s development.
Kepler-CORE Duo
![Page 14: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/14.jpg)
• Kepler cannot solve everyone’s problems right out of the box
– Kepler must be adaptable to different domain sciences
– Adaptation requires more than developing new actors
– Kepler is as much a development platform as an “end-user” tool
– No one group can take responsibility for supporting all the ways Kepler will be used …
• Kepler is open-source but more complex than other open source projects
– Diversity of domains, users, and deployment contexts means there can be conflicts between the needs or priorities of contributors
– Need a way of developing and adding extensions without breaking others’ systems
– Software engineers developing code for Kepler often are not expert scientists, and cannot be the final authority on what the system should do (unlike projects such as Apache, Linux, etc., where the engineers are the expert users themselves and can add what they need)
– PIs and project managers on projects extending Kepler must take responsibility for knowing what needs to be done
– It is essential that for each project employing Kepler, representatives authoritative on the scientific and technical needs of their projects participate in driving the future development of Kepler!
These differences mean that the Kepler collaboration will be unique, too
![Page 15: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/15.jpg)
Stakeholders: Essential to Success of Kepler
Kepler stakeholders …
– Are projects and individuals whose work depends critically on the success of Kepler.
– Are funded by a variety of sources and work in diverse fields of scientific research.
– Are more likely to greatly extend Kepler and use Kepler within their own systems than simply develop packages of actors and workflows for use with a standard distribution of Kepler.
– Need to deliver the software systems they develop to their own community of users.
– Must deliver their software systems according to their own (e.g. release) schedules as determined by their research and funding programs.
– Have different requirements that will conflict in the absence of mechanisms for enabling independent extension and deployment of Kepler-based systems.
– Require recognition for the contributions they make to Kepler as well as for their own systems based on Kepler.
– Know better than us what they need from Kepler.
![Page 16: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/16.jpg)
Kepler(-CORE) Management
• Leadership Team
– 3-year terms (current members from UC + [S] + {B, D})
– Focus on
  • long-term viability of Kepler
  • strategic decisions on behalf of user community, Kepler project
• Interest Groups
– Communicate, collaborate on specialized capabilities
• Development Teams
– Design, develop, test specific software deliverables
• Infrastructure Teams
– Identify, discuss, design, implement Kepler Framework
![Page 17: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/17.jpg)
New Kepler Build System
• Modules and suites
– Develop against trunk or specific version, release
– Tag, branch Kepler extensions independently of the kernel
– Easier to develop, share extensions
– svn repository (https://code.kepler-project.org/code/kepler/)
• Module Manager
– New component to simplify working with modules
David Welker et al.
![Page 18: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/18.jpg)
Kepler Release Roadmap
• https://kepler-project.org/developers/teams/build/kepler-release-roadmap
• Kepler releases
– based on a “standard set” of modules
• Individual module releases
– Provenance
– Workflow reporting
– COMAD
– Distributed: Master/Slave
– …
• Kepler 2.0
– Add modules, extensions to installed Kepler dynamically
– Targeted for Summer 2009
![Page 19: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/19.jpg)
New Plone-based Web site, Forums
![Page 20: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/20.jpg)
Kepler/REAP: Workflow Run Manager
• Use case “Publication-Ready Archive”
• Archive workflow with inputs, outputs
• Tagging
• Also: Outline view
– to manage browsing of large/deeply nested workflows
Derik Barseghian et al.
![Page 21: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/21.jpg)
Workflow Reports (Provenance Interest Group)
Derik Barseghian et al.
![Page 22: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/22.jpg)
Kepler and Scientific Workflow Research
• Scientific Workflows:
– Business (workflows) as usual?
  • Data-oriented, data-centric …
  • … as opposed to control-, task-centric
– Signal processing?
– Or else …?
• Modeling scientific processes, analysis methods
– Understanding is in the Mind of the Beholder!
• Example areas:
– Workflow Modeling & Design
– Provenance
– Optimization
![Page 23: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/23.jpg)
Modeling Example (ChIP-chip workflow)
Tim McPhillips et al.
![Page 24: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/24.jpg)
Modeling & Design: The limits of my language mean the limits of my world …
• Vanilla Process Network
• Functional Programming Dataflow Network
• XML Transformation Network
• Collection-oriented Modeling & Design framework (COMAD)
– “Look Ma: No Shims!”
![Page 25: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/25.jpg)
Two different workflow designs
![Page 26: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/26.jpg)
Data Provenance
• Keep track of data dependencies, processing history
• Support interpretation, validation, reproducibility
Figure: workflow with steps AlignWarp, Reslice, Softmean, Slicer, Convert, shown with its inputs and outputs
Figure: dependency graph with invocations alignWarp:1–4, reslice:1–4, softmean:1, slicer:1–3, convert:1–3 and data items labeled AI, AH, RI, RH, WP, AXS, AYS, AZS, AXG, AYG, AZG
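Keeping track of data dependencies and processing history, as described on this slide, can be sketched roughly as follows. This is an illustrative toy, not Kepler's provenance recorder: `ProvenanceLog`, `record`, and `lineage` are hypothetical names, and the wiring between the data labels and invocations below is assumed for the example rather than read off the figure.

```python
from collections import defaultdict


class ProvenanceLog:
    """Record which invocation produced each data item from which inputs."""

    def __init__(self):
        self.produced_by = {}          # data item -> invocation that wrote it
        self.used = defaultdict(list)  # invocation -> data items it read

    def record(self, invocation, inputs, outputs):
        """Log one actor invocation with the items it read and wrote."""
        self.used[invocation].extend(inputs)
        for item in outputs:
            self.produced_by[item] = invocation

    def lineage(self, item):
        """Return all data items and invocations that `item` depends on."""
        items, invocations = set(), set()
        stack = [item]
        while stack:
            current = stack.pop()
            invocation = self.produced_by.get(current)
            if invocation is None or invocation in invocations:
                continue  # a workflow input, or an invocation already expanded
            invocations.add(invocation)
            for source in self.used[invocation]:
                items.add(source)
                stack.append(source)
        return items, invocations


# Assumed wiring, for illustration only.
log = ProvenanceLog()
log.record("alignWarp:1", ["AI1", "AH1", "RI", "RH"], ["WP1"])
log.record("alignWarp:2", ["AI2", "AH2", "RI", "RH"], ["WP2"])
log.record("reslice:1", ["WP1"], ["RI1", "RH1"])
log.record("reslice:2", ["WP2"], ["RI2", "RH2"])
log.record("softmean:1", ["RI1", "RH1", "RI2", "RH2"], ["AI", "AH"])

atlas_items, atlas_invocations = log.lineage("AI")
```

Walking the dependencies backward from an output answers the validation and reproducibility questions on the slide: which inputs and which processing steps contributed to a given result.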
![Page 27: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/27.jpg)
Kepler/pPOD: Provenance Browser
• For conventional data provenance and fine-grained dependencies (COMAD style)
• Navigate forward and backward in time (VCR style) in different views (collections, processes, combined)
Shawn Bowers et al.
![Page 28: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/28.jpg)
Collection History
• Collection and invocation view
• Incrementally step through execution history
![Page 29: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/29.jpg)
From MoCs to Models of Provenance (MoPs)
![Page 30: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/30.jpg)
Fine-grained, Data & MoC-aware MoP
Manish Anand, Shawn Bowers, et al.
![Page 31: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/31.jpg)
Optimization: Multi-level Workflows
Kepler
LPPN
Daniel Zinn et al.
![Page 32: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/32.jpg)
Modeling + Optimization: Virtual Assembly Lines (COMAD)
![Page 33: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/33.jpg)
Layers in COMAD / ∆-XML Pipelines
WF Graph
Configurations (white-box)
Scientific Functions (black-boxes)
CipresRAxML
In: DNASeq+
Thres: Float
Method: String
Out: (t:Tree, s:score)+
• Access data in XML stream
• Call Scientific Functions (Services)
• Put results back into stream
Daniel Zinn (UC Davis)
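The three steps on this slide (access data in the stream, call a scientific function, put results back into the stream) can be sketched as a stream-processing actor. This is a simplified illustration under stated assumptions, not the COMAD or ∆-XML implementation: the token encoding (`"open"`/`"close"` delimiters around typed data tokens), `comad_style_actor`, and `toy_tree_inference` are all hypothetical, the stream is assumed to be well nested, and the `Thres` and `Method` parameters from the signature are left out.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Token = Tuple[str, object]  # (kind, payload), e.g. ("DNASeq", "ACGT")


def comad_style_actor(
    stream: Iterable[Token],
    read_kind: str,
    write_kind: str,
    function: Callable[[List[object]], List[object]],
) -> Iterator[Token]:
    """Pass every token through; at each collection end, add function results."""
    buffers: List[List[object]] = []  # one buffer per currently open collection
    for kind, payload in stream:
        if kind == "open":
            buffers.append([])
        elif kind == read_kind and buffers:
            buffers[-1].append(payload)  # access data in the stream
        elif kind == "close":
            matched = buffers.pop()
            if matched:
                for result in function(matched):  # call the black-box function
                    yield (write_kind, result)  # put results back into the stream
        yield (kind, payload)  # everything else flows through unchanged


def toy_tree_inference(sequences):
    """Stand-in for a black-box service; not a real phylogenetic method."""
    tree = "(" + ",".join(sorted(sequences)) + ")"
    return [(tree, float(len(sequences)))]


stream = [
    ("open", "project"),
    ("Label", "demo"),
    ("open", "alignment"),
    ("DNASeq", "ACGT"),
    ("DNASeq", "ACGA"),
    ("close", "alignment"),
    ("close", "project"),
]
out = list(comad_style_actor(stream, "DNASeq", "Tree", toy_tree_inference))
```

The actor never needs to know about the `Label` token or the enclosing `project` collection; unrelated data simply passes through, which is the property behind the "No Shims" remark earlier in the talk.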
![Page 34: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/34.jpg)
Conventional vs Assembly Line / COMAD Thinking
Daniel Zinn (UC Davis)
![Page 35: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/35.jpg)
More secret sauce: User vs. Optimized Dataflow
![Page 36: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/36.jpg)
Conceptual Pipeline w/ Scopes & Types
Daniel Zinn (UC Davis)
![Page 37: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/37.jpg)
X-CSR (“XML Scissor”): Cut-Ship-Reassemble
Daniel Zinn (UC Davis)
![Page 38: Kepler Project Overview, Status, Future Directions](https://reader036.fdocuments.in/reader036/viewer/2022062315/56814c47550346895db94b29/html5/thumbnails/38.jpg)
Acknowledgments
• Kepler contributors
– Many individuals: https://kepler-project.org/developers/kepler-contributors
– Projects: Ptolemy, SEEK, SDM, CPES, GEON, REAP, CIPRes, ChIP2, pPOD, COMET, BAP, LTER, RAPR, ITER, …
• Funding agencies: NSF, DOE, …
• DAKS @ UC Davis Members
– Research Staff:
  • Drs. Shawn Bowers, Timothy McPhillips, Lei Dou, Ustun Yildiz
– Developers:
  • Sean Riddle, David Welker, Gongjing Cao
– Students:
  • Manish Anand, Dave Thau, Daniel Zinn, Sven Koehler, Saumen Dey, Supriya Gulati, Faraaz Sareshwala, Xuan Li
UC Davis, Department of Computer Science