GRAM5 - A sustainable, scalable, reliable GRAM service Stuart Martin - UC/ANL.
GRAM5 - A sustainable, scalable, reliable GRAM service
Stuart Martin - UC/ANL
GRAM5, SC 2009
What is GRAM?
- GRAM is a Globus Toolkit component
  - For Grid job management
- GRAM is a unifying remote interface to Resource Managers
  - Yet preserves local site security/control
- GRAM is for stateful job control
  - Reliable create operation
  - Asynchronous monitoring and control
  - Remote credential management
  - Remote file staging and file cleanup
Grid Job Management Goals
- Provide a service to securely:
  - Create an environment for a job
  - Stage files to/from environment
  - Cause execution of job process(es)
    - Via various local resource managers
  - Monitor execution
  - Signal important state changes to client
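The goals above read as a job lifecycle with state-change notifications. A minimal Python sketch of that idea follows. The state names, the transition table, and the `Job` class are illustrative assumptions, not taken from the slides or from the GRAM API.

```python
from enum import Enum
from typing import Callable, Dict, List, Set, Tuple


class JobState(Enum):
    # Illustrative state names (an assumption for this sketch).
    UNSUBMITTED = "unsubmitted"
    STAGE_IN = "stage_in"
    PENDING = "pending"
    ACTIVE = "active"
    STAGE_OUT = "stage_out"
    DONE = "done"
    FAILED = "failed"


# Normal forward transitions; terminal states have no successors.
_NEXT: Dict[JobState, Set[JobState]] = {
    JobState.UNSUBMITTED: {JobState.STAGE_IN},
    JobState.STAGE_IN: {JobState.PENDING},
    JobState.PENDING: {JobState.ACTIVE},
    JobState.ACTIVE: {JobState.STAGE_OUT},
    JobState.STAGE_OUT: {JobState.DONE},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}

StateCallback = Callable[[str, JobState, JobState], None]


class Job:
    """Hypothetical stateful job that signals every state change to a client."""

    def __init__(self, job_id: str, on_change: StateCallback) -> None:
        self.job_id = job_id
        self.state = JobState.UNSUBMITTED
        self._on_change = on_change

    def advance(self, new_state: JobState) -> None:
        allowed = set(_NEXT[self.state])
        if allowed:
            # Any non-terminal state may also fail.
            allowed.add(JobState.FAILED)
        if new_state not in allowed:
            raise ValueError(f"{self.state.name} -> {new_state.name} is not allowed")
        old, self.state = self.state, new_state
        self._on_change(self.job_id, old, new_state)


# Usage: the "client" here is just a list that records notifications.
events: List[Tuple[str, str, str]] = []
job = Job("job-1", lambda jid, old, new: events.append((jid, old.name, new.name)))
for state in (JobState.STAGE_IN, JobState.PENDING, JobState.ACTIVE,
              JobState.STAGE_OUT, JobState.DONE):
    job.advance(state)
```

The callback stands in for asynchronous notification: the client registers interest once and is told about changes, rather than asking repeatedly.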
Traditional Interaction
[Diagram: Local Jobs are submitted to Resource A, where a Scheduler (e.g., PBS) runs them on the Compute Nodes]
- Satisfies many use cases
- TACC’s Ranger (62976 cores!) is the Costco of HTC ;-), one-stop shopping, so why do we need more?
GRAM Benefit
[Diagram: remote GRAM jobs reach Resource A through the GRAM API and the GRAM Service, alongside Local Jobs; the Scheduler (e.g., PBS) runs them on the Compute Nodes]
- Add remote execution capability
- Enable clients/devices to manage jobs without logging into the cluster
GRAM Benefit
[Diagram: GRAM jobs go through the GRAM API to a GRAM Service on Resource A (Scheduler, e.g., PBS) and on Resource B (Scheduler, e.g., LSF); each resource also accepts Local Jobs and runs work on its Compute Nodes]
- Provides scheduler abstraction
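One way to picture scheduler abstraction is that a single job description is reused unchanged while only the target changes. The Python sketch below renders a job as an RSL-style string and pairs it with two contact strings. The helper names `to_rsl` and `contact` and the hostnames are hypothetical, and the `host/jobmanager-<scheduler>` form is assumed here as the contact convention.

```python
from typing import List, Mapping


def to_rsl(attrs: Mapping[str, object]) -> str:
    """Render attributes as an RSL-style conjunction: &(key="value")(key="value")."""
    parts = []
    for key, value in attrs.items():
        text = str(value)
        if '"' in text:
            # Quote escaping is out of scope for this sketch.
            raise ValueError(f"embedded quote in value for {key!r}")
        parts.append(f'({key}="{text}")')
    return "&" + "".join(parts)


def contact(host: str, scheduler: str) -> str:
    """Build a contact string of the assumed form host/jobmanager-<scheduler>."""
    return f"{host}/jobmanager-{scheduler}"


# The same job description is used for both resources.
job = {"executable": "/bin/hostname", "count": 4}
rsl = to_rsl(job)

# Hypothetical hosts: Resource A runs PBS, Resource B runs LSF.
targets: List[str] = [
    contact("resource-a.example.org", "pbs"),
    contact("resource-b.example.org", "lsf"),
]

for target in targets:
    print(target, rsl)
```

Only the contact string differs between the two submissions; nothing in the job description names PBS or LSF.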
GRAM Benefit
[Diagram: GRAM jobs go through the GRAM API to many resources, each with its own GRAM service, scheduler (Sched), and Compute Nodes]
- Scalable job management
- Interoperability
[Diagram: GRAM sits between users/applications and resource managers]
- Users/Applications: Science Gateways, Portals, CLI scripts, App Specific Web Service, etc.
- GRAM
- Resource Managers: PBS, Condor, LSF, SGE, LoadLeveler, Fork
Higher-level Clients and User Examples
Condor-G Architecture
[Diagram labels: Personal Condor, Remote Resource, Schedd, Collector & Negotiator, Grid Manager, Shadow, Master, GRAM, LSF, Startd, Starter, User Job, Condor jobs, GlideIn jobs]
GridWay Components
[Diagram labels, grouped]
- Interfaces: DRMAA library, CLI
- GridWay Core: Request Manager, Dispatch Manager, Scheduler, Execution Manager, Transfer Manager, Information Manager, Job Pool, Host Pool
- Execution Services: pre-WS GRAM, WS GRAM
  - Job Submission, Job Monitoring, Job Control, Job Migration
- File Transfer Services: GridFTP, RFT
  - Job Preparation, Job Termination, Job Migration
- Information Services: MDS2, MDS2 GLUE, MDS4
  - Resource Discovery, Resource Monitoring
GridWay / Condor-G Benefit
[Diagram: GridWay jobs go through the GRAM API to many resources, each with GRAM, a scheduler (Sched), and Compute Nodes]
- Scalable job management
- Throttling
- Metascheduling
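Throttling and metascheduling on the client side can be pictured as a dispatcher that caps in-flight jobs per resource and picks a resource for each waiting job. The Python sketch below is a simplified illustration under assumed names (`Throttle`, `dispatch`, `complete`) and an assumed least-loaded policy. It does not describe how GridWay or Condor-G are actually implemented.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Throttle:
    """Hypothetical dispatcher with a cap on in-flight jobs per resource."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self.limits = dict(limits)  # max in-flight jobs per resource
        self.in_flight = {name: 0 for name in limits}
        self.waiting: Deque[str] = deque()

    def enqueue(self, job_id: str) -> None:
        self.waiting.append(job_id)

    def dispatch(self) -> List[Tuple[str, str]]:
        """Assign waiting jobs to resources that still have free slots."""
        assigned: List[Tuple[str, str]] = []
        while self.waiting:
            free = [r for r in self.limits if self.in_flight[r] < self.limits[r]]
            if not free:
                break  # every resource is at its cap; the rest keep waiting
            # Assumed policy: choose the resource with the lowest load fraction.
            target = min(free, key=lambda r: self.in_flight[r] / self.limits[r])
            job_id = self.waiting.popleft()
            self.in_flight[target] += 1
            assigned.append((job_id, target))
        return assigned

    def complete(self, resource: str) -> None:
        """Release one slot on a resource when a job finishes."""
        if self.in_flight[resource] == 0:
            raise ValueError(f"no job in flight on {resource}")
        self.in_flight[resource] -= 1


# Usage: five jobs, but only three slots in total across two resources.
throttle = Throttle({"resource-a": 2, "resource-b": 1})
for n in range(1, 6):
    throttle.enqueue(f"job-{n}")
first_round = throttle.dispatch()
throttle.complete("resource-b")
second_round = throttle.dispatch()
```

Jobs beyond the caps stay in the client's queue instead of piling up on the remote service, which is the point of throttling.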
Architecture of Ninf-G
[Diagram labels]
- Client side: Client
- Server side: IDL file, Numerical Library, IDL Compiler, Ninf-G Executable, Invoke Server, LDIF File, MDS4 / NAREGI IS
- Steps: Generate; Interface Information retrieve; Interface Request; Interface Reply; Invoke Executable (GRAM / NAREGI / Condor / SSH); Connect back (Globus-IO / ssh / TCP)
caBIG and Globus
- caGrid is built on top of Globus 4 WSRF Java Core and Security
caBIG - TeraGrid Integration
- Leave caGrid service infrastructure as is, with the exception of the analytical services.
Hierarchical Clustering Results
GRAM2 Architecture Diagram
[Diagram, Job Submission: Client -> Gatekeeper -> Job Manager -> RM adapter (submit) -> Resource Manager -> User Job(s)]
[Diagram, Job Monitoring: Job Manager -> RM adapter (poll) -> Resource Manager -> User Job(s)]
GRAM2 Architecture
[Diagram, Job Submission: Client -> Gatekeeper -> multiple Job Manager and RM adapter (submit) pairs -> Resource Manager -> User Job(s)]
[Diagram, Job Monitoring: multiple Job Manager and RM adapter (poll) pairs -> Resource Manager -> User Job(s)]
[Diagram labels: "Unlimited" appears four times, next to the Job Manager and RM adapter instances]
GRAM5 Architecture
[Diagram, Job Submission: Client -> Gatekeeper -> Job Manager -> RM adapter (submit), throttled (default 6) -> Resource Manager -> User Job(s)]
[Diagram, Job Monitoring: Resource Manager -> RM log -> SEG -> SEG log -> Job Manager]
[Diagram labels: "1 process" appears three times]
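The GRAM5 diagram replaces the per-job "poll" arrows of GRAM2 with log labels (RM log, SEG log) and a SEG component. One reading is that job state is learned by following an event log rather than by polling the resource manager per job. The Python sketch below illustrates that general technique only. The `timestamp;job_id;state` line format and the `EventLogMonitor` class are hypothetical and are not the actual SEG format or implementation.

```python
import os
import tempfile
from typing import Dict


class EventLogMonitor:
    """Hypothetical single reader that tracks many jobs from one event log."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0  # bytes of the log already consumed
        self.states: Dict[str, str] = {}

    def catch_up(self) -> int:
        """Apply events appended since the last call; return how many were applied."""
        with open(self.path, "rb") as log:
            log.seek(self.offset)
            data = log.read()
        # Consume complete lines only; a partial trailing line waits for next time.
        end = data.rfind(b"\n") + 1
        self.offset += end
        applied = 0
        for raw in data[:end].splitlines():
            if not raw.strip():
                continue
            _stamp, job_id, state = raw.decode("utf-8").split(";")
            self.states[job_id] = state
            applied += 1
        return applied


# Usage: simulate a writer appending to the log between two catch-up calls.
with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "events.log")
    with open(log_path, "w") as writer:
        # The last line is deliberately incomplete.
        writer.write("100;job-1;PENDING\n101;job-2;PENDING\n102;job-1;ACT")
    monitor = EventLogMonitor(log_path)
    first = monitor.catch_up()
    after_first = dict(monitor.states)
    with open(log_path, "a") as writer:
        writer.write("IVE\n")
    second = monitor.catch_up()
    after_second = dict(monitor.states)
```

One reader serves every job, so the monitoring cost follows the number of events rather than the number of jobs multiplied by the polling rate.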
Changes Made to Improve Scalability
- Removed extra listening port per job for MPIg jobs
  - Functionality can be re-implemented around GRAM
- Removed active monitoring of stdout/err files for streaming during job execution
  - Instead transfer stdout/err at the end of job execution
Improvements
- New Job Manager Logging implementation
- Added job exit code support
- Added GRAM service version detection
- Added usage statistics support
- Added support for auditing of TG gateway user attribute
- Updated admin, user, developer guides
- Many bugs fixed
Releases and Testing
- 3 Alpha releases and 1 Beta
- 2 deployments on TeraGrid
- Significant scalability testing of Condor-G
  - Jaime Frey
  - Igor Sfiligoi
  - Gaurang Mehta
- Included in GT 5.0.0 RCs
- Internal functional and performance testing
  - http://cvs.globus.org/toolkit/docs/5.0/5.0.0/execution/gram5/qp/#id2557011
Next Improvement
Add support for Sun Grid Engine (SGE) adapter
Improve support for native packaging
Thanks to the GRAM developers!
- Joe Bester - ANL
- Mike Link - ANL