GRAM5 - A sustainable, scalable, reliable GRAM service
Stuart Martin - UC/ANL
GRAM5 2SC 2009
What is GRAM?
GRAM is a Globus Toolkit component
  For Grid job management
GRAM is a unifying remote interface to Resource Managers
  Yet preserves local site security/control
GRAM is for stateful job control
  Reliable create operation
  Asynchronous monitoring and control
  Remote credential management
  Remote file staging and file cleanup
Grid Job Management Goals
Provide a service to securely:
  Create an environment for a job
  Stage files to/from environment
  Cause execution of job process(es)
    Via various local resource managers
  Monitor execution
  Signal important state changes to client
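The goals above (create, stage, execute, monitor, signal state changes) can be read as a small job state machine. The following Python sketch illustrates that idea only. The state names are modeled on GRAM's job states, but the transition table, the `Job` class, and the callback signature are assumptions made for illustration, not GRAM's implementation.

```python
from enum import Enum
from typing import Callable, Dict, List, Set, Tuple


class JobState(Enum):
    UNSUBMITTED = "Unsubmitted"
    STAGE_IN = "StageIn"
    PENDING = "Pending"
    ACTIVE = "Active"
    STAGE_OUT = "StageOut"
    DONE = "Done"
    FAILED = "Failed"


# Assumed (simplified) transition table: any non-terminal state may fail.
ALLOWED: Dict[JobState, Set[JobState]] = {
    JobState.UNSUBMITTED: {JobState.STAGE_IN, JobState.FAILED},
    JobState.STAGE_IN: {JobState.PENDING, JobState.FAILED},
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.STAGE_OUT, JobState.FAILED},
    JobState.STAGE_OUT: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    """Hypothetical job record that signals each state change to a client callback."""

    def __init__(self, on_change: Callable[[JobState, JobState], None]) -> None:
        self.state = JobState.UNSUBMITTED
        self._on_change = on_change

    def advance(self, new_state: JobState) -> None:
        # Reject transitions the table does not allow.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        old, self.state = self.state, new_state
        self._on_change(old, new_state)


# Usage: the "client" simply records the notifications it receives.
events: List[Tuple[str, str]] = []
job = Job(lambda old, new: events.append((old.value, new.value)))
for step in (JobState.STAGE_IN, JobState.PENDING, JobState.ACTIVE,
             JobState.STAGE_OUT, JobState.DONE):
    job.advance(step)
```

The callback stands in for the asynchronous notifications a client would receive; a real client would get them over the network rather than through an in-process function call.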
Traditional Interaction
[Diagram: Local Jobs are submitted to Resource A, where a Scheduler (e.g., PBS) runs them on Compute Nodes]
Satisfies many use cases
  TACC’s Ranger (62976 cores!) is the Costco of HTC ;-), one-stop shopping, why do we need more?
GRAM Benefit
[Diagram: Resource A runs a GRAM Service in front of its Scheduler (e.g., PBS) and Compute Nodes; remote GRAM jobs arrive through the GRAM API alongside Local Jobs]
Add remote execution capability
  Enable clients/devices to manage jobs without logging into the cluster
GRAM Benefit
[Diagram: Resource A (GRAM Service, Scheduler e.g. PBS, Compute Nodes) and Resource B (GRAM Service, Scheduler e.g. LSF, Compute Nodes) each accept Local Jobs; GRAM jobs reach both resources through the same GRAM API]
Provides scheduler abstraction
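As a rough illustration of what a scheduler abstraction means, the Python sketch below renders one scheduler-neutral job request into a PBS-style or LSF-style job script. The `JobRequest`, `PBSAdapter`, `LSFAdapter`, and `render` names are hypothetical and are not GRAM's adapter interface; the `#PBS -l nodes=N` and `#BSUB -n N` directives are used only as simplified stand-ins for each scheduler's resource request.

```python
import shlex
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobRequest:
    """Scheduler-neutral job description (hypothetical, for illustration)."""
    executable: str
    arguments: List[str] = field(default_factory=list)
    count: int = 1


class SchedulerAdapter(ABC):
    """Translates a JobRequest into a scheduler-specific job script."""

    @abstractmethod
    def directive(self, job: JobRequest) -> str:
        ...

    def render_script(self, job: JobRequest) -> str:
        command = shlex.join([job.executable, *job.arguments])
        return "\n".join(["#!/bin/sh", self.directive(job), command]) + "\n"


class PBSAdapter(SchedulerAdapter):
    def directive(self, job: JobRequest) -> str:
        # Simplification: maps the requested count onto PBS nodes.
        return f"#PBS -l nodes={job.count}"


class LSFAdapter(SchedulerAdapter):
    def directive(self, job: JobRequest) -> str:
        # Simplification: maps the requested count onto LSF job slots.
        return f"#BSUB -n {job.count}"


ADAPTERS: Dict[str, SchedulerAdapter] = {"pbs": PBSAdapter(), "lsf": LSFAdapter()}


def render(scheduler: str, job: JobRequest) -> str:
    """Client code names a scheduler type; the adapter hides its syntax."""
    return ADAPTERS[scheduler].render_script(job)


job = JobRequest(executable="/bin/hostname", arguments=["-f"], count=2)
pbs_script = render("pbs", job)
lsf_script = render("lsf", job)
```

The client-side request is identical for both resources; only the adapter chosen on the resource side differs, which is the point of the abstraction.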
GRAM Benefit
Scalable job management
Interoperability
[Diagram: GRAM jobs flow through the GRAM API to many resources, each with its own GRAM service, scheduler, and compute nodes]
Users/Applications: Science Gateways, Portals, CLI scripts, App Specific Web Service, etc.
GRAM
Resource Managers: PBS, Condor, LSF, SGE, LoadLeveler, Fork
Higher-level Clients and User Examples
Condor-G Architecture
[Diagram: Personal Condor (Schedd, Collector & Negotiator, Grid Manager, Shadow) and Remote Resource (GRAM, LSF, Master, Startd, Starter, User Job), connected by Condor jobs and GlideIn jobs]
GridWay Components
[Diagram: GridWay components]
GridWay Core: Request Manager, Dispatch Manager, Execution Manager, Transfer Manager, Information Manager, Scheduler, Job Pool, Host Pool
Interfaces: DRMAA library, CLI
Execution Services (pre-WS GRAM, WS GRAM): Job Submission, Job Monitoring, Job Control, Job Migration
File Transfer Services (GridFTP, RFT): Job Preparation, Job Termination, Job Migration
Information Services (MDS2, MDS2 GLUE, MDS4): Resource Discovery, Resource Monitoring
GridWay / Condor-G Benefit
Scalable job management
Throttling
Metascheduling
[Diagram: GridWay jobs flow through the GRAM API to many resources, each with its own GRAM service, scheduler, and compute nodes]
Architecture of Ninf-G
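[Diagram: Ninf-G architecture]
Server side: IDL file and Numerical Library -> IDL Compiler -> generates Ninf-G Executable and Interface Information (LDIF File)
Client side: Client retrieves Interface Information via MDS4 / NAREGI IS (Interface Request, Interface Reply)
Invoke Executable: Client -> Invoke Server -> GRAM / NAREGI / Condor / SSH -> Ninf-G Executable
Connect back: Ninf-G Executable -> Client over Globus-IO / ssh / TCP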
caBIG and Globus
caGrid is built on top of Globus 4 WSRF Java Core and Security
caBIG - TeraGrid Integration
Leave caGrid service infrastructure as is with the exception of the analytical services.
Hierarchical Clustering Results
GRAM2 Architecture Diagram
Job Submission: Client -> Gatekeeper -> Job Manager -> RM adapter (submit) -> Resource Manager -> User Job(s)
Job Monitoring: Job Manager -> RM adapter (poll) -> Resource Manager -> User Job(s)
GRAM2 Architecture
Job Submission: Client -> Gatekeeper -> Job Manager -> RM adapter (submit) -> Resource Manager -> User Job(s)
  Job Manager processes: Unlimited
  RM adapter (submit) processes: Unlimited
Job Monitoring: Job Manager -> RM adapter (poll) -> Resource Manager -> User Job(s)
  Job Manager processes: Unlimited
  RM adapter (poll) processes: Unlimited
GRAM5 Architecture
Job Submission: Client -> Gatekeeper -> Job Manager (1 process) -> RM adapter (submit), throttled (default 6) -> Resource Manager -> User Job(s)
Job Monitoring: Resource Manager -> RM log -> SEG (1 process) -> SEG log -> Job Manager (1 process)
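The GRAM5 diagram marks the submit path as throttled (default 6). One common way to cap concurrent work like this is a counting semaphore. The Python sketch below shows that technique only: the `ThrottledSubmitter` class, the simulated `submit` body, and the thread pool are assumptions for illustration and do not reflect how the GRAM5 job manager is written.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_SUBMITS = 6  # mirrors the "default 6" on the slide


class ThrottledSubmitter:
    """Allows at most `limit` submissions to run at the same time."""

    def __init__(self, limit: int = MAX_CONCURRENT_SUBMITS) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0  # highest concurrency observed, for inspection

    def submit(self, job_id: int) -> str:
        with self._slots:  # blocks while all slots are in use
            with self._lock:
                self._running += 1
                self.peak = max(self.peak, self._running)
            try:
                time.sleep(0.01)  # stand-in for invoking a scheduler adapter
                return f"submitted:{job_id}"
            finally:
                with self._lock:
                    self._running -= 1


# Usage: 20 worker threads compete, but no more than 6 submits overlap.
submitter = ThrottledSubmitter()
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(submitter.submit, range(40)))
```

Callers beyond the limit wait for a free slot rather than failing, so a burst of job requests turns into a bounded, steady load on the resource manager.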
Changes Made to Improve Scalability
Removed extra listening port per job for MPIg jobs
  Functionality can be re-implemented around GRAM
Removed active monitoring of stdout/err files for streaming during job execution
  Instead transfer stdout/err at the end of job execution
Improvements
New Job Manager Logging implementation
Added job exit code support
Added GRAM service version detection
Added usage statistics support
Added support for auditing of TG gateway user attribute
Updated admin, user, developer guides
Many bugs fixed
Releases and Testing
3 Alpha releases and 1 Beta
  2 deployments on TeraGrid
Significant scalability testing of Condor-G
  Jaime Frey
  Igor Sfiligoi
  Gaurang Mehta
Included in GT 5.0.0 RCs
  Internal functional and performance testing
http://cvs.globus.org/toolkit/docs/5.0/5.0.0/execution/gram5/qp/#id2557011
Next Improvement
Add support for Sun Grid Engine (SGE) adapter
Improve support for native packaging
Thanks to the GRAM developers!
Joe Bester - ANL
Mike Link - ANL