From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research...
![Page 1: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/1.jpg)
From Grid to Global Computing: Deploying Parameter Sweep Applications
Henri Casanova
Grid Research And Innovation Laboratory (GRAIL)
http://grail.sdsc.edu/
San Diego Supercomputer Center (SDSC)
Computer Science and Engineering Dept. (CSE)
University of California, San Diego (UCSD)
![Page 2: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/2.jpg)
Parameter Sweep Applications
- Many compute tasks
- No or simple dependencies
- Several output post-processing stages
- Potentially large datasets
[Figure: parameter sweep pipeline. Labels: Input data, Tasks, Raw Output, Post-processing, Final Output]
![Page 3: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/3.jpg)
Relevance
- Arise in virtually every field of science and engineering
  - Monte Carlo, Parameter Space Searches, Parameter Studies, etc.
  - Biology, Astrophysics, Physics, Bioinformatics, Economics, etc.
- Primary candidate for Grid computing
  - Latency-tolerant, amenable to simple fault-tolerance
  - Need huge amount of resources
![Page 4: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/4.jpg)
Outline of the Presentation
Parameter Sweep Applications (PSAs)
APST
The Virtual Instrument
BIO@Home
![Page 5: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/5.jpg)
Scheduling of PSAs
?
![Page 6: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/6.jpg)
Grid Scheduling Practice
- Ad-hoc solutions:
  - specific to one application
  - hand-tuned to the environment (e.g. SF-Express demo)
- Large body of work on Scheduling
- What can we re-use on the Grid?
  - Heterogeneous resources
  - Dynamic performance characteristics
  - Resource downtimes
  - Complex network topologies
  - Performance prediction errors
![Page 7: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/7.jpg)
“DataGrid” Scheduling
- Goal: Co-locate/replicate data and computation
- Dynamic Priority List-Scheduling
  - Built on heuristics described in [Ibarra77, Siegel99]
  - Added adaptivity
- Simulation results
  - List-scheduling works, adaptivity should make it practical
- Experimental results (Demo at SC’00 and SC’01)

[HCW’00] H. Casanova, A. Legrand, et al.
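To make "list-scheduling with dynamic priorities" concrete, here is a minimal sketch of a min-min style heuristic. It is an illustration under stated assumptions, not the scheduler described in the talk: the function name, the cost table, and the host names are hypothetical, and the per-host cost estimates are assumed to be given (they could also fold in data transfer time).

```python
from typing import Dict, List, Tuple


def min_min_schedule(
    task_costs: Dict[str, Dict[str, float]],
    ready_time: Dict[str, float],
) -> List[Tuple[str, str, float]]:
    """Assign every task to a host with a min-min style heuristic.

    task_costs[t][h] is the (hypothetical) estimated run time of task t on host h.
    ready_time[h] is the time at which host h becomes free.
    Returns (task, host, completion_time) tuples in scheduling order.
    """
    ready = dict(ready_time)  # copy so the caller's dict is not modified
    unscheduled = set(task_costs)
    schedule: List[Tuple[str, str, float]] = []
    while unscheduled:
        # Priorities are recomputed every round (this is the "dynamic" part):
        # a task's priority is its best completion time given current host load.
        best = None
        for task in sorted(unscheduled):
            for host, cost in sorted(task_costs[task].items()):
                completion = ready[host] + cost
                if best is None or completion < best[2]:
                    best = (task, host, completion)
        task, host, completion = best
        ready[host] = completion  # the chosen host is busy until then
        unscheduled.remove(task)
        schedule.append(best)
    return schedule


costs = {
    "t1": {"a": 2, "b": 4},
    "t2": {"a": 3, "b": 3},
    "t3": {"a": 10, "b": 5},
}
print(min_min_schedule(costs, {"a": 0, "b": 0}))
```

Adaptivity, in this sketch, would amount to re-running the loop whenever the cost estimates or host availability change.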
![Page 8: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/8.jpg)
Lessons
- Much scheduling work to re-use
- List-scheduling with Dynamic Priorities seems effective
  - Simulation
  - Experimental
- Let’s build software that uses it
- Let’s target scientific communities
![Page 9: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/9.jpg)
Motivation for APST
- Started as scheduling research
- Evolved into a tool that provides
  - Transparency of Grid execution
    - Data movements
    - Remote job management
    - Multiple Grid middleware back-ends
  - Scheduling
    - Self-scheduling
    - List scheduling w/ dynamic priorities
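Self-scheduling can be sketched as a work queue in which whichever worker becomes idle first takes the next task. The simulation below is illustrative only: the function name, the worker names, and the speed values are assumptions, and it is not APST code.

```python
import heapq
from typing import Dict, List


def self_schedule(task_work: List[float], speeds: Dict[str, float]) -> Dict[str, List[int]]:
    """Simulate self-scheduling: the first idle worker takes the next task.

    task_work[i] is the amount of work in task i; speeds[w] is the
    (hypothetical) work per time unit that worker w can process.
    Returns the task indices each worker ends up running.
    """
    # Heap of (time at which the worker becomes idle, worker name).
    idle = [(0.0, name) for name in sorted(speeds)]
    heapq.heapify(idle)
    assignment: Dict[str, List[int]] = {name: [] for name in speeds}
    for index, work in enumerate(task_work):
        now, name = heapq.heappop(idle)  # earliest idle worker
        assignment[name].append(index)
        heapq.heappush(idle, (now + work / speeds[name], name))
    return assignment


print(self_schedule([4, 4, 4, 4, 4, 4], {"fast": 2.0, "slow": 1.0}))
```

No performance prediction is needed here: faster workers simply come back for work more often, which is the appeal of self-scheduling as a baseline.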
![Page 10: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/10.jpg)
APST Designs
The AppLeS Parameter Sweep Template: An Application Execution Environment
XML application and resource descriptions
[Figure: APST architecture. Labels: APST client, Grid, Grid Services, Scheduler, Transport, Compute, Metadata Bookkeeper, Decisions, Actions, Information, APST]
![Page 11: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/11.jpg)
APST: Lessons
- The Grid is difficult to use
- APST provides a simple software layer that does one thing well
  - Minimal user interface (XML, command-line)
  - Used as a building block for domain-specific applications
    - E.g. multi-cluster bioinformatics (Singapore)
- Ssh?
  - Default mechanism
  - Critical for gaining user buy-in
  - Natural way to lead to using the Grid
![Page 12: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/12.jpg)
APST Status
- Version 1.1 released 2 weeks ago
- Available for public download
- Used for 10+ applications
  - Bioinformatics (BLAST, HMM, …)
  - Computational Neuroscience
- Globus, NetSolve, Ssh, Condor
- GASS, IBP, Scp, GridFTP, SRB, NWS, MDS, Ganglia, …
http://grail.sdsc.edu/projects/apst
![Page 13: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/13.jpg)
APST Research Directions
- APST is a research platform
  - Maintained by one staff member
  - Several graduate student contributors
- Partitionable Workload
  - Bioinformatics (database splitting)
  - Factoring: Decrease chunk size
  - Pipelining: Increase chunk size
  - Combined?
  - Create APST-BLAST (Mario Lauria, OSU; Yang Yang, UCSD)
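As an illustration of "Factoring: Decrease chunk size", the sketch below uses the classic factoring rule, in which each batch hands out about half of the remaining work split evenly across the workers. Whether APST uses exactly this rule is an assumption; the function name and parameters are hypothetical.

```python
import math
from typing import List


def factoring_chunks(total_units: int, workers: int, min_chunk: int = 1) -> List[int]:
    """Return chunk sizes that shrink batch by batch (factoring-style).

    total_units is the size of the divisible workload (for example, the
    number of database fragments); workers is the number of hosts.
    """
    chunks: List[int] = []
    remaining = total_units
    while remaining > 0:
        # Each batch covers about half of the remaining work,
        # divided evenly among the workers.
        size = max(min_chunk, math.ceil(remaining / (2 * workers)))
        for _ in range(workers):
            if remaining == 0:
                break
            take = min(size, remaining)
            chunks.append(take)
            remaining -= take
    return chunks


print(factoring_chunks(100, 2))
```

Large early chunks keep per-chunk overhead low, while small late chunks limit the damage if a host turns out slower than expected. Pipelining would go the other way, starting small so that computation can begin quickly and growing the chunks afterwards.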
![Page 14: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/14.jpg)
Outline of the Presentation
Parameter Sweep Applications (PSAs)
APST
Virtual Instrument
BIO@home
![Page 15: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/15.jpg)
Computational Neuroscience
- MCell: Monte Carlo Cell simulator
- Developed at Salk and PSC
- Gain knowledge about neuro-transmission mechanisms
- Fundamental for drug design (psychiatry)
- Large user base (yearly MCell workshop)
- Parallel MC simulations at the molecular level
![Page 16: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/16.jpg)
Traditional MCell usage
- “By hand”
  - No automatic project management
  - No transparent resource access
  - No automated data management
- Consequences
  - No interactive simulations
  - No fault-tolerance, scheduling, …
  - MCell limited to resources in the lab
![Page 17: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/17.jpg)
MCell and APST
- APST alleviates some of the limitations
  - Large-scale simulations
  - Fault-tolerance and scheduling
  - Data retrieval from distributed storage
  - XML application descriptions
- No interactivity
  - MCell is exploratory
  - User interaction is fundamental for many users
![Page 18: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/18.jpg)
The Virtual Instrument
- $2.5M funding from the NSF
  - Salk, PSC, UCSB, UTK, UCSD
- A running MCell simulation should behave as a lab instrument
- Computational steering for MCell
  - User interface
  - Grid software
  - Application software
  - Scheduling research (how does one schedule an application that’s being steered interactively?)
![Page 19: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/19.jpg)
[Figure: Virtual Instrument architecture. Labels: VI User, VI Interface, VI Daemon, VI Database, VI Software, OpenDX, Grid Services, Grid Storage and Compute Resources (storage, compute); arrows labeled control, data, control+data, process]
![Page 20: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/20.jpg)
Scheduling Goals
- Reduce the “search” time
  - Let user assign levels of importance to regions of the parameter space
  - Assign fraction of resources with respect to the importance levels
  - Assign priorities to tasks
- Interesting questions
  - Job control limited on Grid resources
  - Cannot assign exact fractions
  - Interesting trade-offs between control overhead and accuracy of priorities
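The "cannot assign exact fractions" point can be seen with a small sketch that splits a whole number of hosts across parameter-space regions in proportion to user-assigned importance. The largest-remainder rounding rule, the function name, and the region names are assumptions made for illustration, not the Virtual Instrument's actual policy.

```python
from typing import Dict


def allocate_hosts(importance: Dict[str, float], hosts: int) -> Dict[str, int]:
    """Split `hosts` across regions in proportion to importance levels.

    Assumes at least one region has a positive importance level.
    """
    total = sum(importance.values())
    exact = {region: hosts * level / total for region, level in importance.items()}
    # Start from the integer part of each exact share.
    alloc = {region: int(share) for region, share in exact.items()}
    leftover = hosts - sum(alloc.values())
    # Give the leftover hosts to the regions with the largest fractional
    # part (ties broken by region name).
    by_remainder = sorted(exact, key=lambda region: (alloc[region] - exact[region], region))
    for region in by_remainder[:leftover]:
        alloc[region] += 1
    return alloc


print(allocate_hosts({"A": 3, "B": 2, "C": 1}, 10))
```

With 10 hosts and importance levels 3:2:1, the exact shares are 5, 3.33, and 1.67, so some rounding error in the achieved fractions is unavoidable.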
![Page 21: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/21.jpg)
Current Status
- First software prototype released in Feb 2002
  - Globus and Ssh
  - MySQL
  - OpenDX
  - Priority-based scheduling
  - 20,000 lines of C++
- Upcoming papers
  - JPDC submission
  - Scheduling paper (SC submission)
![Page 22: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/22.jpg)
Outline of the Presentation
Parameter Sweep Applications (PSAs)
PSAs on the Grid with APST
MCell Virtual Instrument
Global Computing
![Page 23: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/23.jpg)
SETI@home
- Over 500,000 active participants, most of whom run a screensaver on a home PC
- Over a cumulative 20 TeraFlop/sec
  - Versus 12.3 TeraFlop/sec of IBM’s ASCI White
- Cost: $500,000 + $200,000 in donated hardware
  - Less than 1% of the $110 million required for ASCI White
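The "less than 1%" claim can be checked directly from the figures on the slide. The variable names are illustrative, and 20 TeraFlop/sec is used as a lower bound because the slide says "over".

```python
seti_cost = 500_000 + 200_000   # dollars, including donated hardware (from the slide)
asci_white_cost = 110_000_000   # dollars (from the slide)
cost_ratio = seti_cost / asci_white_cost

# 20 TeraFlop/sec is a lower bound for SETI@home; 12.3 is the ASCI White figure.
throughput_ratio = 20 / 12.3

print(f"cost ratio: {cost_ratio:.2%}")
print(f"throughput ratio: at least {throughput_ratio:.2f}x")
```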
![Page 24: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/24.jpg)
Global vs. Grid Computing
- Nature of resources
  - Home desktops run Windows and are completely autonomous
  - Machines powered on and off by user
  - Behind firewalls, dynamic IP, transient network connections
- Programming model
  - Server cannot “push” tasks to clients
  - Server has little or no means for remote job control
  - Server has incomplete information about resources and availability
![Page 25: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/25.jpg)
Goal
- SETI@home limitations:
  - Embarrassingly parallel
  - Infinite amount of input data
  - Pure throughput
- Can we do something more?
  - Short-lived applications?
  - Parallel applications?
  - Compute service?
- BIO@Home
  - Smith-Waterman for short/long sequences
  - No real software yet (build on XtremWeb?)
![Page 26: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/26.jpg)
Scheduling?
- Sophisticated scheduling algorithms need information and control
- At the moment: Simple mechanisms

1. Work unit duplication
   - Specifies max number of times a work unit can be resent
2. Timeouts
   - Time that must elapse before work unit is resent
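The two mechanisms can be sketched as server-side bookkeeping in a pull model, where clients ask for work and the server never pushes it. The class names, method names, and the linear scan are all hypothetical choices made for illustration; this is not SETI@home or XtremWeb code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkUnit:
    sends: List[float] = field(default_factory=list)  # times the unit was handed out
    done: bool = False


class WorkServer:
    """Pull-model server with a resend cap and a resend timeout."""

    def __init__(self, unit_ids: List[str], max_resends: int, timeout: float) -> None:
        self.units: Dict[str, WorkUnit] = {u: WorkUnit() for u in unit_ids}
        self.max_resends = max_resends  # knob 1: work unit duplication
        self.timeout = timeout          # knob 2: time before a unit may be resent

    def request_work(self, now: float) -> Optional[str]:
        """Called when a client asks for work; returns a unit id or None."""
        for unit_id, unit in self.units.items():
            # Skip finished units and units already sent 1 + max_resends times.
            if unit.done or len(unit.sends) > self.max_resends:
                continue
            # Eligible if never sent, or if the last send has timed out.
            if not unit.sends or now - unit.sends[-1] >= self.timeout:
                unit.sends.append(now)
                return unit_id
        return None

    def report_result(self, unit_id: str) -> None:
        """Called when a client returns a result for a unit."""
        self.units[unit_id].done = True


server = WorkServer(["u1", "u2"], max_resends=1, timeout=10.0)
print(server.request_work(0.0), server.request_work(1.0), server.request_work(5.0))
```

A small timeout or a large resend cap sends more duplicates, which tends to finish stragglers sooner at the cost of redundant computation.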
![Page 27: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/27.jpg)
Simulation
- Built a simulation model
  - Using statistics/surveys/extrapolations
  - Next: logs from real systems (XtremWeb?, Entropia?)
- Evaluated the impact of both mechanisms on performance and throughput
![Page 28: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/28.jpg)
Early Lessons
- Trade-off between throughput and turn-around time
- Duplication:
  - aggressively decreases turn-around time
  - wastes resources
  - there is an optimal value
- Timeouts:
  - moderately lower turn-around time
  - preserve good throughput
  - infinite timeouts are of course not a good idea
![Page 29: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/29.jpg)
Future work
- Two knobs
- Question: A compute service?
  - Mix of applications (SETI, short-lived, …)
  - Singapore Bio-informatics institute
  - Notion of fairness?
  - How do we implement policy with many volatile resources?
- Software
  - Re-use existing platforms: XtremWeb, Entropia
![Page 30: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/30.jpg)
Conclusion
- APST, Virtual Instrument, BIO@Home
- Other GRAIL activities I didn’t talk about
  - Scientific Computing
  - Simulation
  - Adaptive Scheduling
  - Networking
http://grail.sdsc.edu
![Page 31: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/31.jpg)
![Page 32: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/32.jpg)
![Page 33: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d1f5503460f949f3d8d/html5/thumbnails/33.jpg)
Experimental Results
[Figure: experimental results. Site labels: UTK, UCSD, TITECH, Tokyo. Schedulers shown: Self-scheduling, XSufferage]