Ilkay ALTINTAS
Lab Director, Scientific Workflow Automation Technologies

San Diego Supercomputer Center, UCSD

Kepler Scientific Workflows: Current and Future Development

October, 2007

Scientific Workflow Systems

• Combination of
  – data integration, analysis, and visualization steps
  – automated “scientific process”
• Mission of scientific workflow systems
  – Promote “scientific discovery” by providing tools and methods to generate scientific workflows
  – Create an extensible and customizable graphical user interface for scientists from different scientific domains
  – Support computational experiment creation, execution, sharing, reuse and provenance
  – Design frameworks which define efficient ways to connect to the existing data and integrate heterogeneous data from multiple resources

Ptolemy II: A laboratory for investigating design

KEPLER: A problem-solving environment for Scientific Workflow

KEPLER = “Ptolemy II + X” for Scientific Workflows

Kepler is a Scientific Workflow System

• … and a cross-project collaboration

• 3rd Beta release (Jan 8, 2007)

www.kepler-project.org

• Builds upon the open-source Ptolemy II framework

Kepler use cases represent many science domains!

• Ecology
  – SEEK: Ecological Niche Modeling and climate change
  – REAP: Modeling parasite invasions in grasslands using sensor networks
  – NEON: Ecological sensor networks
  – COMET: Environmental science
• Geosciences
  – GEON: LiDAR data processing, Geological data integration
  – NEESit: Earthquake engineering
• Molecular biology
  – SDM: Gene promoter identification and ScalaBLAST
  – ChIP-chip: Genome-scale research
  – CAMERA: Metagenomics
• Oceanography
  – REAP: SST data processing
  – LOOKING/OOI CI: ocean observing CI
  – ROADNet: real-time data modeling and analysis
  – Ocean Life project
• Phylogenetics
  – ATOL: Processing Phylodata
  – CiPRES: Phylogenetic tools
• Chemistry
  – Resurgence: Computational chemistry
  – DART/ARCHER: X-Ray crystallography
• Library science
  – DIGARCH: Digital preservation
  – UK Text Mining Center: Cheshire feature and archival
• Conservation biology
  – SanParks: Thresholds of Potential Concerns
• Physics
  – SDM: astrophysics TSI-1 and TSI-2
  – CPES: Plasma fusion simulation
  – ITER-EU: ITM fusion workflows

Kepler today is a research prototype and a production workflow tool!

• Some of the current R&D
  – Distributed execution of workflow parts (peer to peer)
  – Efficient data transfer
  – Provenance tracking of data and processes
  – Tracking workflow evolution
  – Streaming data analysis
  – Easy-to-deploy batch interfaces
  – Intuitive workflow design
  – Customizable semantic typing
  – Interoperability with other workflow and analytical environments (at exec level)
• Production workflow examples:
  – GEON LiDAR workflow (GLW)
    • 116 registered, 106 active users
    • 2076 submitted jobs to date
  – Center for Plasma Edge Simulation Code-Coupling Workflow (CPES-CCW)
    • 2000 actors, 5 levels of model hierarchy
    • Longest run duration: 3 hours
  – PtII AirForce Lab Model
    • 12920 actors, 65331 attributes
    • Longest run duration: 10 minutes
  – Longest running real-time simple monitoring model in PtII: months at a time
• All generated using the GUI and executed in batch mode…
  – No coding and text manipulation

REAP: Realtime Environment for Analytical Processing

• Funded 2006-2009
  – NSF CEO:P
    • Jones (PI), Altintas, Baru, Ludaescher, Schildhauer
  – Partners:
    • NCEAS/UCSB (Lead), SDSC/UCSD, UCDavis, CENS/UCLA, OpenDAP, OSU
• Management and Analysis of Observatory Data using Kepler Scientific Workflows
• The vision:
  – An integrated environment for analyzing data from observatories
• Two scientific use cases:
  – Terrestrial ecology
  – Oceanography

reap.ecoinformatics.org

REAP Views

• For data-grid engineers
  – monitoring and management capabilities of underlying sensor networks
• For outside users
  – access to observatory data and results of models, approachable to non-scientists
• For scientists
  – capabilities for designing and executing complex analytical models over near real-time and archived data sources

REAP: Terrestrial Ecology Use Case

Workflows to develop and test models exploring the impacts of abiotic factors (real-time light, temperature, and rainfall measurements) on the dynamics of plant host populations and their susceptibility to viral pathogens.

REAP: RBNB Streaming Data Actor

Example data from Terrestrial Use Case

Hardware: a Campbell Scientific CR800 datalogger with eight attached sensors, operating on a workbench.
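RBNB stands for Ring Buffered Network Bus, so a streaming data actor reads from a bounded buffer in which the newest sensor frames displace the oldest. The sketch below illustrates only that ring-buffer behaviour in plain Python. It does not use the RBNB API or any Kepler class; `RingBufferChannel`, `streaming_source`, the capacity, and the sample temperature readings are all hypothetical names and values chosen for illustration.

```python
from collections import deque


class RingBufferChannel:
    """Hypothetical fixed-capacity channel: oldest frames are dropped when full."""

    def __init__(self, capacity):
        # deque(maxlen=...) discards items from the left once capacity is reached.
        self.frames = deque(maxlen=capacity)

    def put(self, timestamp, value):
        self.frames.append((timestamp, value))

    def fetch_newest(self, count):
        """Return up to `count` of the most recent (timestamp, value) frames."""
        if count <= 0:
            return []
        return list(self.frames)[-count:]


def streaming_source(channel, window):
    """Hypothetical source actor: emits the mean of the newest `window` readings."""
    values = [value for _, value in channel.fetch_newest(window)]
    return sum(values) / len(values)


# Illustrative readings only (not data from the REAP deployment).
channel = RingBufferChannel(capacity=4)
for t, temperature in enumerate([20.0, 20.5, 21.0, 21.5, 22.0, 22.5]):
    channel.put(t, temperature)

buffered = list(channel.frames)          # only the 4 newest frames survive
latest_mean = streaming_source(channel, window=2)
print(buffered)
print(latest_mean)
```

A downstream analysis step would call the source repeatedly and always see a recent window, regardless of how long the datalogger has been running.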

REAP: Oceanographic Use Case

Facilitate the quantitative evaluation of sea surface temperature (SST) data sets.

Kepler/C.O.R.E.

• Funded 2007-2010
  – NSF SDCI
    • Ludaescher (PI), Altintas, Bowers, Jones, McPhillips, Schildhauer
  – Partners:
    • Genome Center/UCDavis (Lead), SDSC/UCSD, NCEAS/UCSB
• SDCI NMI Improvement: Development of Kepler/CORE – A Comprehensive, Open, Reliable, and Extensible Scientific Workflow Infrastructure
• The vision:
  – Coordinate development of a comprehensive, open, reliable and extensible Kepler scientific workflow infrastructure

kepler-project.org

Builds on community participation as a driving force for Kepler.

Kepler/C.O.R.E.

• Comprehensive
  – First-class support for technical features
• Open
  – Well-designed and clearly articulated mechanisms and interfaces provided to facilitate developing extensions
• Reliable
  – Both as a development platform and as a run-time environment for the user
• Extensible
  – Independently extensible by groups not directly collaborating with the team

Directors in Kepler

• Means to execute networks of components under multiple execution models
  – Dataflow (SDF, PN, DDF) vs. time-based (CT) vs. event-based (DE) vs. all combined
• Makes use of separation of concerns principle
  – e.g., component execution, workflow execution and provenance tracking
• The manager acts like a “common execution environment”
  – governing different concerns related to execution of the network and services

Ptolemy and Kepler are unique in combining different execution models in heterogeneous models!

Execution models shown: Process Networks, Rendezvous, Publish and Subscribe, Continuous Time, Finite State Machines, Dataflow, Time Triggered, Synchronous/reactive model, Discrete Event, Wireless
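The separation of concerns described on this slide can be sketched in a few lines of Python. This is a toy model, not the Ptolemy II or Kepler API: `Actor`, `StaticScheduleDirector`, `DataDrivenDirector`, and `ProvenanceRecorder` are hypothetical names. The first director fires actors in a fixed, precomputed order (loosely in the spirit of SDF), the second fires whichever actor has input available (loosely in the spirit of a dynamic dataflow model), and provenance is recorded by a separate object that neither the actors nor the network know about.

```python
from collections import deque


class Actor:
    """A workflow component: consumes one token per firing, emits one token."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.inbox = deque()
        self.downstream = []  # actors that receive this actor's output

    def fire(self):
        consumed = self.inbox.popleft()
        produced = self.func(consumed)
        for actor in self.downstream:
            actor.inbox.append(produced)
        return consumed, produced


class ProvenanceRecorder:
    """Provenance as a separate concern: observes firings, never alters them."""

    def __init__(self):
        self.records = []

    def record(self, actor, consumed, produced):
        self.records.append((actor.name, consumed, produced))


class StaticScheduleDirector:
    """SDF-like sketch: fires actors in a fixed, precomputed order."""

    def __init__(self, schedule, recorder):
        self.schedule = schedule
        self.recorder = recorder

    def run(self, iterations):
        for _ in range(iterations):
            for actor in self.schedule:
                self.recorder.record(actor, *actor.fire())


class DataDrivenDirector:
    """Data-driven sketch: fires any actor that currently has input."""

    def __init__(self, actors, recorder):
        self.actors = actors
        self.recorder = recorder

    def run(self):
        fired = True
        while fired:  # stop once a full pass fires nothing
            fired = False
            for actor in self.actors:
                if actor.inbox:
                    self.recorder.record(actor, *actor.fire())
                    fired = True


def build_network(inputs):
    """Build scale -> offset -> sink and preload the input tokens."""
    results = []

    def collect(token):
        results.append(token)
        return token

    scale = Actor("scale", lambda x: x * 2)
    offset = Actor("offset", lambda x: x + 1)
    sink = Actor("sink", collect)
    scale.downstream.append(offset)
    offset.downstream.append(sink)
    scale.inbox.extend(inputs)
    return [scale, offset, sink], results


# The same network definition is executed under two different directors.
static_actors, static_results = build_network([1, 2, 3])
static_log = ProvenanceRecorder()
StaticScheduleDirector(static_actors, static_log).run(iterations=3)

driven_actors, driven_results = build_network([1, 2, 3])
driven_log = ProvenanceRecorder()
# Listed sink-first on purpose: firing is decided by data availability.
DataDrivenDirector(list(reversed(driven_actors)), driven_log).run()

print(static_results, driven_results)
```

Both runs produce the same outputs and the same set of provenance records, but in a different firing order, which is the point of keeping the execution model in the director rather than in the components.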

Credits

• Kepler community and colleagues
• On REAP and Kepler/CORE:
  – Shawn Bowers, Bertram Ludaescher, Timothy McPhillips, Genome Center, UCD
  – Matt Jones, Derik Barseghian, Mark Schildhauer, NCEAS, UCSB
  – Eric Seabloom, OSU
  – Peter Cornillon, OpenDAP

Ilkay ALTINTAS
[email protected]
+1 (858) 822-5453

http://www.sdsc.edu

Questions…