VECPAR’10
A P2P Approach to Many Tasks Computing for Scientific Workflows
Eduardo Ogasawara, Jonas Dias, Daniel Oliveira, Carla Rodrigues, Carlos Pivotto, Rafael Antas, Vanessa Braganholo, Patrick Valduriez, Marta Mattoso
COPPE/UFRJ, PPGI/UFRJ, Université Montpellier/INRIA
Our Scenario
• Scientific experiments
  – Large-scale simulations
  – Chains of programs and activities
  – Scientific workflows
    • SWfMSs (Kepler, VisTrails)
• Exploration with different methods, parameters or data
  – Many-Task Computing (MTC)
6/24/2010 | A P2P Approach to Many Tasks Computing for Scientific Workflows
What we have
• Workflow parallelization
  – Homogeneous environments
    • Multiprocessor or cluster systems
    • Centralized control
    • High throughput, low latency, high performance
  – Heterogeneous environments
    • Grids, desktop grids, volunteer computing and hybrid clouds
    • Different efforts to parallelize the workflow
The problems
• Parallelizing in the context of scientific workflows
  – Independently of the target environment
  – With better control of the experiment
• Provenance gathering
  – On heterogeneous distributed environments
  – Experiments must be reproducible
• Control vs. performance
What we propose
• Peer-to-peer (P2P) approach
  – Decentralized control
  – Scalability
  – Dynamic behavior of nodes
• P2P to support MTC
  – Through scientific workflows
• SciMule
  – A tool to parallelize and distribute scientific workflow activities through P2P computing
Agenda
1. Background on P2P networks
   – Overview and comparison of P2P approaches
2. SciMule architecture
   – Features and strategies
3. Experimental results for SciMule
   – Does it work?
4. Conclusions
P2P types
[Diagram: P2P network types, including hierarchical DHT and super-peer architectures, with example systems Napster, ed2k, Kademlia, Gnutella, KaZaA, Canon and Torrent]
Factors that impact P2P architectures
• Load balancing
  – An activity can generate thousands of tasks
• Scalability
  – Large-scale networks
• Churn risk
  – How much impact a churn event (a peer joining or leaving the network) can have
• Maintenance cost
  – Affects the scheduling process
Comparing
                    Network type
Factor              Centralized   Decentralized   Hierarchical
Load balancing      Low           High            Moderate
Scalability         Low           Moderate        High
Churn risk          High          Low             Moderate
Maintenance cost    Low           High            Moderate
Design of SciMule
• Middleware
  – Distributes, controls and monitors workflow activities
• Promotes workflow/activity parallelization
• Hierarchical approach
• Three-layer architecture
  – Submission layer
  – Execution layer
  – Overlay layer
[Figure: two peers, each with an SWfMS and SciMule]
SciMule Peer Roles
• Client peer
  – Submits activities
• Executor peer
  – Executes tasks
• Gate peer
  – Keeps the list of nearby nodes and their subjects
• Peers may play any role during SciMule's lifetime
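Because any peer may take on any of the three roles over its lifetime, a peer's role is best modeled as mutable state. The Python sketch below illustrates this under stated assumptions: a peer holds exactly one role at a time, new peers start as executors, and a gate peer hands its neighbor list over when it changes role. `Role`, `Peer` and `switch_role` are hypothetical names, not SciMule's API.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Role(Enum):
    """The three peer roles named on the slide."""
    CLIENT = auto()    # submits activities
    EXECUTOR = auto()  # executes tasks
    GATE = auto()      # keeps the list of nearby nodes and their subjects


@dataclass
class Peer:
    """Hypothetical peer holding one role at a time (assumed default: executor)."""
    peer_id: str
    role: Role = Role.EXECUTOR
    # Only used while the peer is a gate peer: node id -> subject.
    nearby_nodes: dict = field(default_factory=dict)

    def switch_role(self, new_role: Role) -> dict:
        """Change role; returns the neighbor list a departing gate peer hands over."""
        handed_over = {}
        if self.role is Role.GATE and new_role is not Role.GATE:
            handed_over, self.nearby_nodes = self.nearby_nodes, {}
        self.role = new_role
        return handed_over


peer = Peer("p1")
peer.switch_role(Role.GATE)
peer.nearby_nodes["p7"] = "astronomy"  # hypothetical subject label
handed = peer.switch_role(Role.CLIENT)
print(peer.role.name, handed)  # CLIENT {'p7': 'astronomy'}
```

Returning the handed-over list, rather than silently discarding it, leaves room for a backup node to take over the gate peer's bookkeeping.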
SciMule Purpose
• To make it easier to distribute workflow activities
  – To make scientists happier!
• Using the MTC paradigm
  – MTC without control is hard to maintain
• Activities that demand heavy computation
• Adaptation of Hydra
  – A real system
  – Our approach for clusters
SciMule Architectural Features
• Two types of parallelization
  – Data parallelism
  – Parameter sweep
• Distributed provenance gathering
  – Over a heterogeneous network
• Simple to deploy
• Linked to an SWfMS
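Both parallelization types turn a single workflow activity into many independent tasks, which is what makes the activity an MTC workload. The sketch below contrasts the two ways of generating tasks. It assumes an activity is just a program name plus parameters or input records, and uses round-robin fragmentation for data parallelism. `sweep_tasks`, `data_parallel_tasks`, the task dictionary layout and the `align` program are all hypothetical.

```python
from itertools import product


def sweep_tasks(program, param_space):
    """Parameter sweep: one task per combination of parameter values."""
    names = sorted(param_space)
    return [
        {"program": program, "params": dict(zip(names, values))}
        for values in product(*(param_space[n] for n in names))
    ]


def data_parallel_tasks(program, records, n_fragments):
    """Data parallelism: same program, one task per input fragment (round-robin split)."""
    fragments = [records[i::n_fragments] for i in range(n_fragments)]
    return [{"program": program, "input": frag} for frag in fragments if frag]


sweep = sweep_tasks("align", {"gap_penalty": [1, 2, 4], "matrix": ["A", "B"]})
print(len(sweep))  # 6 tasks: 3 gap penalties x 2 matrices

chunks = data_parallel_tasks("align", list(range(10)), n_fragments=4)
print([len(t["input"]) for t in chunks])  # [3, 3, 2, 2]
```

In the sweep, the task count is the product of the number of values per parameter. With data parallelism, the task count is set by the fragmentation.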
SciMule Overlay
• The overlay affects performance
• Balancing peers
  – Locality principles
  – Subjects
• Gate peers
  – Keep a list of nearby nodes
  – Control subjects
  – Have low churn frequency and high reputation
  – Keep a backup node
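Gate peers are expected to have low churn frequency and high reputation, and to keep a backup node. One way to express that selection rule is sketched below. It assumes each candidate reports a churn frequency and a reputation score in [0, 1], ranks by lowest churn with ties broken by higher reputation, and takes the runner-up as the backup. The ranking key, the backup choice and all names are assumptions for illustration, not SciMule's documented algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    peer_id: str
    churn_frequency: float  # assumed metric, e.g. disconnections per simulated day
    reputation: float       # assumed to lie in [0, 1]


def elect_gate(candidates):
    """Return (gate, backup): lowest churn first, ties broken by higher reputation."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to keep a backup node")
    ranked = sorted(candidates, key=lambda c: (c.churn_frequency, -c.reputation))
    return ranked[0], ranked[1]


peers = [
    Candidate("a", churn_frequency=0.10, reputation=0.9),
    Candidate("b", churn_frequency=0.01, reputation=0.7),
    Candidate("c", churn_frequency=0.01, reputation=0.8),
]
gate, backup = elect_gate(peers)
print(gate.peer_id, backup.peer_id)  # c b
```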
Overlay characteristics
• New peers connect to few neighbors
  – Half the average neighborhood size of the existing peers
  – Avoids free riders
• The neighborhood grows over time
• Controlled connectivity
  – Based on the number of gate peers in the network
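A new peer starts with half the average neighborhood size of the existing peers. Its neighborhood then grows over time up to a connectivity limit. The arithmetic is sketched below, assuming integer neighbor counts rounded down, at least one neighbor once the network is non-empty, and growth of one neighbor per step. The cap is passed in rather than derived. `initial_degree` and `grow` are hypothetical names.

```python
def initial_degree(existing_degrees):
    """Neighbors a new peer starts with: half the average of the existing peers."""
    if not existing_degrees:
        return 0  # first peer in an empty network has nobody to connect to
    average = sum(existing_degrees) / len(existing_degrees)
    return max(1, int(average // 2))


def grow(degree, cap, step=1):
    """The neighborhood grows over time but never beyond the connectivity cap."""
    return min(cap, degree + step)


degrees = [10, 14, 12, 16]   # hypothetical neighborhood sizes of existing peers
d = initial_degree(degrees)  # average 13 -> starts with 6 neighbors
for _ in range(5):
    d = grow(d, cap=9)
print(d)  # 9
```

Starting low and growing gradually means a newcomer must stay in the network for a while before it gets as many connections as established peers.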
Overlay behaviour
• Network with n nodes and g gate peers
[Diagram: the gate peers (…), each with a group of n/g nodes]
Overlay
[Diagram: gate peers (…); neighborhood n/g + n/g = 2n/g]
Overlay
[Diagram: gate peers (…); adding another n/g]
Maximum number of neighbors: 3n/g
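The overlay slides build the bound step by step: n/g, then n/g + n/g = 2n/g, then one more n/g, for a maximum of 3n/g neighbors. The sketch below plugs in the n = 4096 nodes of the evaluation study and a purely hypothetical g = 64 gate peers (the slides do not give g). It also assumes g divides n evenly.

```python
def neighbor_bounds(n, g):
    """Neighborhood sizes from the overlay slides: n/g, 2n/g and the 3n/g maximum."""
    if g <= 0 or n % g:
        raise ValueError("assumes g > 0 gate peers that divide the n nodes evenly")
    share = n // g  # nodes per gate peer
    return share, 2 * share, 3 * share


print(neighbor_bounds(4096, 64))  # (64, 128, 192)
```

Doubling g to 128 halves the maximum to 96 neighbors. This is the sense in which connectivity is controlled by the number of gate peers.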
SciMule Evaluation
• Evaluate the proposed architecture
  – Does SciMule parallelization scale?
• Simulation environment
  – Experimentation on real large-scale P2P networks is expensive and difficult
  – Flexibility to evaluate hybrid networks
• PeerSim extension
  – SciMule peer, link, data package, workflow activity and task components
  – Scheduling, transfer and execution systems
  – Overlay
The study
• Five-factor study with up to four treatments per factor
• 14,400 simulation cycles (5 simulated days)
• 4,096 nodes in the network
• Poisson activity submission frequency of 0.01
Factors and independent variables
Parameter                  Values
cycles                     14400
n (nodes)                  4096
f (submission frequency)   0.01
k                          32, 64, 128, 256
tasks                      128, 512
cost                       4000, 8000, 16000, 32000
size                       12000, 24000, 48000, 96000
churn                      0.00, 0.05, 0.10
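The 384 simulation instances are consistent with crossing the five independent variables in the factors table: 4 × 2 × 4 × 4 × 3 = 384. Likewise, 14,400 cycles covering 5 simulated days means each cycle stands for 30 seconds. The check below restates the table as a dictionary and verifies both numbers.

```python
from itertools import product
from math import prod

# Treatments per independent variable, restated from the factors table.
factors = {
    "k": [32, 64, 128, 256],
    "tasks": [128, 512],
    "cost": [4000, 8000, 16000, 32000],
    "size": [12000, 24000, 48000, 96000],
    "churn": [0.00, 0.05, 0.10],
}

# Full factorial design: one simulation instance per combination of treatments.
instances = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(instances), prod(len(v) for v in factors.values()))  # 384 384

cycles, simulated_days = 14400, 5
seconds_per_cycle = simulated_days * 24 * 3600 / cycles
print(seconds_per_cycle)  # 30.0
```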
Presented Scenarios
• 384 simulation instances
• Representative activity cases
  – Churn Poisson frequency: 0%, 5% and 10%
Activity name            Tasks   Task cost (p.u.)   Task size (MB)
Low Cost Medium Size     512     4,000              6
Low Cost Big Size        128     4,000              12
Medium Cost Small Size   128     16,000             1.5
High Cost Small Size     512     32,000             1.5
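Multiplying the number of tasks by the per-task cost and size gives the total work and total data volume each representative activity places on the network. This makes the compute-versus-data trade-off between the four cases explicit. The computation below uses only the table values. It reports totals only, since the conversion from p.u. to time is not stated here.

```python
# (tasks, task cost in p.u., task size in MB), restated from the activity table.
activities = {
    "Low Cost Medium Size": (512, 4_000, 6),
    "Low Cost Big Size": (128, 4_000, 12),
    "Medium Cost Small Size": (128, 16_000, 1.5),
    "High Cost Small Size": (512, 32_000, 1.5),
}


def totals(tasks, cost, size_mb):
    """Total work (p.u.) and total data to transfer (MB) for one activity."""
    return tasks * cost, tasks * size_mb


for name, spec in activities.items():
    total_cost, total_mb = totals(*spec)
    ratio = total_cost / total_mb  # p.u. of work per MB moved
    print(f"{name:24} {total_cost:>11,} p.u. {total_mb:>7,.0f} MB {ratio:>8,.0f} p.u./MB")
```

The work-per-MB ratio equals the per-task cost divided by the per-task size, because the task count cancels. By this measure, High Cost Small Size is the most compute-bound case and Low Cost Big Size is the most data-bound.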
Parameter Sweep results
[Figure: parameter sweep results for the Low Cost Medium Size, Low Cost Big Size and Medium Cost Small Size activities]
Data Parallelism results
[Figure: data parallelism results for the Low Cost Medium Size, Low Cost Big Size and Medium Cost Small Size activities]
Conclusions
• SciMule scales
  – Distributing scientific workflow activities over a P2P network
  – A promising approach to MTC in heterogeneous environments
    • P2P
    • Hybrid clouds
• Distribution approaches should vary
• Impact of churn events
Future Work
• Improve the scheduling mechanism
  – Considering data transfer minimization
• Minimize the impact of data transfer
  – Improve data discovery
  – Improve data distribution
    • Torrent
    • Compression
    • Replication
Acknowledgements
Thanks!