EECE 571R: Data-intensive computing systems Matei Ripeanu matei at ece.ubc.ca.
Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang...
-
date post
22-Dec-2015 -
Category
Documents
-
view
217 -
download
1
Transcript of Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang...
![Page 1: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/1.jpg)
Cactus-G:Experiments with a Grid-Enabled
Computational Framework
Dave Angulo, Ian Foster
Chuang Liu, Matei Ripeanu, Michael RussellDistributed Systems Laboratory
University of Chicago & Argonne National Laboratory
Gabrielle Allen, Thomas Dramlitsch,
Ed Seidel, Thomas RadkeMax-Planck-Institut für Gravitationsphysik
UCSD, UIUC, U.Tenn GrADS Groups
![Page 2: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/2.jpg)
Grid Application Development Software Project
Overview
Research goals: Why Cactus-G Context: Numerical relativity, Cactus, dynamic
Grid computing The Cactus-G Grid-enabled framework Cactus and GrADS
– The Cactus Worm model problem
– Dynamic resource selection & code migration
– Experimental results Future directions & lessons learned
![Page 3: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/3.jpg)
Grid Application Development Software Project
Research Goals
Investigate methods and structures for efficient Grid execution via in-depth study of a demanding application, including– Constructs for adapting to heterogeneity
– Constructs for dynamic resource acquisition Create testbed for GrADSoft components,
as they emerge Investigate utility of computational
frameworks as facilitator of Grid computing
![Page 4: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/4.jpg)
Grid Application Development Software Project
Context (1):Numerical Relativity
Numerical simulation of extreme astrophysical events: colliding black holes, neutron stars, etc.– Understand physics
– Predict gravitational wave forms Relativistic effects => Einstein eqns
– Computationally intensive (can be 1000s flops/grid point)
– 3-D simulations only recently possible: demanding users
LIGO gravitational wave observatory
Colliding black holes
![Page 5: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/5.jpg)
Grid Application Development Software Project
Context (2): Cactus(Allen, Dramlitsch, Seidel, Shalf, Radke)
Modular, portable framework for parallel, multidimensional simulations
Construct codes by linking– Small core (flesh): mgmt services
– Selected modules (thorns): Numerical methods, grids & domain decomps, visualization and steering, etc.
Custom linking/configuration tools Developed for astrophysics, but not
astrophysics-specific
Cactus “flesh”
Thorns
![Page 6: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/6.jpg)
Grid Application Development Software Project
Context (3):Dynamic Grid Computing
Application behaviors in a Grid environment:– Identify fastest/cheapest/biggest resources
– Configure for efficient execution
– Detect need for new resources or behaviors (e.g., due to resource slowdown, new subtasks, new appln regime, user steering, new resource available)
– Adapt, and/or discover new resources; invoke subtasks on new resources and/or migrate
We have users who want these behaviors; we also have the enabling machinery
![Page 7: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/7.jpg)
Grid Application Development Software Project
Cactus-G: An ApplicationFramework for Dynamic Grid Computing
Cactus thorns for active management of application behavior and resource use
Heterogeneous resources, e.g.:– Irregular decompositions
– Variable halo for managing message size
– Msg compression (comp/comm tradeoff)
– Comms scheduling for comp/comm overlap Dynamic resource behaviors/demands, e.g.:
– Perf monitoring, contract violation detection
– Dynamic resource discovery & migration
– User notification and steering
![Page 8: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/8.jpg)
Grid Application Development Software Project
Cactus-G Example:Terascale Computing
Gig-E100MB/sec
SDSC IBM SP1024 procs5x12x17 =1020
NCSA Origin Array256+128+1285x12x(4+2+2) =480
OC-12 lineBut only 2.5MB/sec)
17
12
512
5
4 2 2
Solved EEs for gravitational waves (real code)– Tightly coupled, communications required through derivatives– Must communicate 30MB/step between machines– Time step take 1.6 sec
Used 10 ghost zones along direction of machines: communicate every 10 steps
Compression/decomp. on all data passed in this direction Achieved 70-80% scaling, ~200GF (only 14% scaling without tricks)
![Page 9: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/9.jpg)
Grid Application Development Software Project
Cactus-G Model Problem:The Cactus Worm
Migrate to “faster/ cheaper” system– When better system
discovered
– When requirements change
– When characteristics change (e.g., competition)
Tests most elements of Cactus-G & GrADS
![Page 10: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/10.jpg)
Grid Application Development Software Project
Cactus Worm Architecture
Cactus “flesh”
“Tequila” Thorn
Appln& otherthorns
Globus Toolkit substrate: resourcediscovery, allocation, management
GrADS Mechanisms
Resourceselector
Applicationmanager
![Page 11: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/11.jpg)
Grid Application Development Software Project
Tequila Thorn Functions
Initiate adaptation on any one of– User request (e.g., HTTP thorn)
– Notification of new resources
– Application monitoring: contract violation Request resources (ClassAd protocol)
– E.g., GrADS ResourceSelector Checkpoint application Contact App Manager to request restart
– Security, robustness advantages vs. direct restart
![Page 12: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/12.jpg)
Grid Application Development Software Project
Cactus WormDetailed Architecture & Operation
Cactus “flesh”
“Tequila” Thorn
Computeresource
Computeresource
…
Coderepository…
Coderepository
Storageresource
Storageresource
…
GridInformation
Service
GrADSResourceSelector
ApplicationManager
Appln& otherthorns
(1) Adapt.request (2)
Resourcerequest
(3) Writecheckpoint
(4) Migrationrequest
(5) Cactusstartup
(7) Readcheckpoint
(0) Possibleuser input
(6) Loadcode
(1’) Resourcenotification
Storemodels, etc.
Query
![Page 13: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/13.jpg)
Grid Application Development Software Project
Contract Monitor
Driven by three user-controllable parameters– Time quantum for “time per iteration”
– % degradation in time per iteration (relative to prior average) before noting violation
– Number of violations before migration Potential causes of violation
– Competing load on CPU
– Computation requires more processing power: e.g., mesh refinement, new subcomputation
– Hardware problems
![Page 14: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/14.jpg)
Grid Application Development Software Project
Current Status
We have developed– Tequila thorn: monitoring, selection, control
– ResourceSelector (via ClassAd protocol)
– Cactus performance model We have demonstrated on GrADS Macro Grid
– Contract monitoring for multiprocessor runs
– Dynamic resource selection
– Migration
![Page 15: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/15.jpg)
Grid Application Development Software Project
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1 3 5 7 9
11
13
15
17
19
21
23
25
27
29
31
33
Clock Time
Ite
rati
on
s/S
ec
on
d
Migration in Action
Loadapplied
3 successivecontract
violations
RunningAt UIUC
(migrationtime not to scale)
Resourcediscovery
& migration
RunningAt UC
![Page 16: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/16.jpg)
Grid Application Development Software Project
Ongoing and Future Work:New and Improved Capabilities
Optimize migration process Use performance models during selection
– And include cost of migration & information about future computation in the model
Matchmaker-based ResourceSelector– Separation of concerns between resource
characterization and selection– Study resource characterization process, use
NWS-based prediction techniques Dynamic notification of availability of “better”
resources
![Page 17: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/17.jpg)
Grid Application Development Software Project
Ongoing and Future Work:Further Integration with GrADSoft
Contract monitoring– Pablo– Issues: determining which thorns are
monitored, or if flesh is monitored Program Preparation System Configurable Object Program and
Application Launcher – Cactus has its own launcher and compiles
its own code
![Page 18: Cactus-G: Experiments with a Grid-Enabled Computational Framework Dave Angulo, Ian Foster Chuang Liu, Matei Ripeanu, Michael Russell Distributed Systems.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d7e5503460f94a60cdd/html5/thumbnails/18.jpg)
Grid Application Development Software Project
Lessons Learned and Outcomes
Lessons learned– A real & demanding application can exploit
adaptive techniques to execute efficiently in Grid environments
– Even a relatively regular application can incorporate a range of useful mechanisms for adaptive behaviors & resource demands
Outcomes– Prototype Cactus-G framework: wonderful
experimental platform