Moab: Intelligent Workload Management
Transcript of Moab: Intelligent Workload Management
© 2012 ADAPTIVE COMPUTING, INC.
Moab: Intelligent Workload Management
Colin Whitbread
[email protected] 10/05/2012
Intelligent Workload Management with Moab
Moab optimizes and automates by:
Unifying resource and workload information
Automating intelligent decisions through policies
Making and considering future commitments
Optimizing workload placement over time and resources
Modifying workload and resources for optimal performance
Query Workload, Resources, and State
Apply Policies to Control
Workloads
Resource Managers
Moab HPC Suite
Resources
Moab HPC Suite - Enterprise Edition Architecture
Moab Intelligence Engine: decisions, policies, resource scheduling, allocation, orchestration
Portal
Accounting
Moab HPC Suite™ - Enterprise Edition
Management Integration (Moab Services Manager, Web Services)
User Self-Service Admin Dashboard
Users Administrators
Resources
Multi-Vendor Resource Managers: Storage Manager, Provisioning Manager, Job Manager (TORQUE), Network Manager, Health Monitor
Non-traditional Resource Managers
Other Resources: Licenses, Chillers, NFS Servers, Database Servers, Equipment, Etc.
Heterogeneous Resources
CLI
Workload Job Queue
TORQUE is a descendant of PBS (Portable Batch System), developed in 1993–1994
▪ What was happening with computers in 1993 and 1994
1993 - Commercial providers allowed to sell Internet connections to individuals
1993 - Intel releases P5-based Pentium processors with 60 MHz and 66 MHz versions
1993 - Novell purchased Digital Research, DR-DOS
1993 - Windows NT 3.1 released which supported 32-bit programs
1993 – MS-DOS 6.0 released
1994 – Intel releases 90 MHz and 100 MHz Pentium Processors
1994 – Motorola releases the 68060 processor
1994 - Linus Torvalds releases version 1.0 of the Linux kernel
1994 – IBM releases PC-DOS 6.3
TORQUE 4.0 history lesson
Programming practices of 1993–1994
Memory limited
Keep code size small
Slow and unreliable network connections
No threads for Unix or Linux
Disk storage slow
TORQUE 4.0 history lesson
▪ Scalability
▪ Support 10s of thousands of hosts in a cluster
▪ Support jobs using 10s of thousands of hosts
▪ Improved Communications
▪ Higher Throughput and Response
▪ Improved Reliability
▪ Enhanced Security
TORQUE 4.0 Goals
Scalable
Reliable
Secure
Responsive
Enterprise-Ready TORQUE
▪ Problem: all sister nodes communicate directly with Mother Superior
▪ Causes communication failures when a node becomes saturated
▪ Jobs are needlessly lost
▪ Communication bottlenecks at single point
Job Radix
[Diagram: Moab and pbs_server at the top, with every sister MOM communicating directly with the Mother Superior MOM — a single point of congestion and failure]
Job Radix Solution:
$ qsub script.sh -W job_radix=3
States that each MOM should communicate directly with at most 3 other MOMs.
No longer a single stress point: each node communicates with a maximum of 3 other MOMs.
Scalability – Job Radix
Scalability – Job Radix=3
[Diagram: the Mother Superior communicates with 3 sisters, each of which relays to 3 more sisters — a radix-3 tree covering all nodes in the job]
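The radix-3 tree above can be sketched numerically. This is an illustrative sketch (not TORQUE source): with job_radix=k, each MOM relays to at most k others, so messages fan out through a k-ary tree rather than one saturated hub. The `tree_depth` helper is a hypothetical name introduced here for illustration.

```python
def tree_depth(num_nodes: int, radix: int) -> int:
    """Number of relay hops needed to reach every MOM in an
    num_nodes-node job when each MOM forwards to `radix` others."""
    depth, covered, frontier = 0, 1, 1
    while covered < num_nodes:
        frontier *= radix       # each MOM on the frontier adds `radix` more
        covered += frontier
        depth += 1
    return depth

# A 10,000-node job with radix 3: the Mother Superior talks directly to
# only 3 MOMs, and messages reach every sister in 9 hops, instead of one
# node handling 9,999 direct connections.
print(tree_depth(10_000, 3))  # → 9
```

The trade-off is latency (a few extra hops) for robustness: no single MOM ever holds more than `radix` connections for the job.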
MOM Hierarchy
Goals
Improve efficiency of MOM-to-server update communications
Reduce network congestion
Improve cluster status reliability
No more MOMs being marked down when nothing is wrong
MOM Hierarchy
<TORQUE_HOME>/server_priv/mom_hierarchy specifies where each MOM should send status updates. The updates are propagated from there up to pbs_server.
Self-healing – Describes how retries should happen when the preferred connection can't be obtained.
Requires the server to process larger amounts of data, but over fewer connections, resulting in better performance.
Administrator can set up hierarchy to mirror actual network topology, resulting in optimum performance.
MOM Hierarchy
Sample file:
<path>
<level>node0,node1,node2</level>
<level>node3,node4,node5</level>
…
</path>
<path>
...
Note: You can specify multiple paths. Not all nodes need to be in all paths but nodes can be in multiple paths if desired.
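The multi-path structure above is simple enough to read mechanically. The sketch below is hypothetical (it is not TORQUE's actual parser) and shows one way such a file could be interpreted: each `<path>` becomes an ordered list of levels, and each `<level>` a list of node names.

```python
import re

def parse_mom_hierarchy(text: str) -> list[list[list[str]]]:
    """Parse mom_hierarchy-style text into paths -> levels -> node names.
    Illustrative only; TORQUE's own parser may differ."""
    paths = []
    for path_body in re.findall(r"<path>(.*?)</path>", text, re.S):
        levels = [level.strip().split(",")
                  for level in re.findall(r"<level>(.*?)</level>", path_body, re.S)]
        paths.append(levels)
    return paths

sample = """<path>
<level>node0,node1,node2</level>
<level>node3,node4,node5</level>
</path>"""

print(parse_mom_hierarchy(sample))
# → [[['node0', 'node1', 'node2'], ['node3', 'node4', 'node5']]]
```

Nodes in the first level report directly to pbs_server; each subsequent level reports to the level above it, which is why mirroring the physical network topology in this file pays off.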
MOM Hierarchy
[Diagram: pbs_server at the top; MOM0–MOM2 form the first level, with MOM3–MOM5 and MOM6–MOM8 on the levels below, each level reporting upward]
TORQUE Throughput
Refactor internal structures: pass only used attributes.
• Reduces memory footprint
Remove linked lists; replace with hash maps and arrays.
Replace DIS (Data Is String).
• 45% of time spent encoding and decoding strings
• Requires several function calls to send 1 character
Remove redundant code: 7 calls to getenv() in qsub
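The linked-list replacement above is the classic lookup trade. As an illustrative sketch (not TORQUE source): finding a job by id with a list scan costs O(n) per call, while a hash map gives O(1) average-case lookup, which matters when the server tracks tens of thousands of jobs. All names here (`find_job_scan`, `jobs_by_id`) are hypothetical.

```python
# Build a large job table, as a server tracking many jobs would hold.
jobs = [{"id": f"job{i}", "state": "queued"} for i in range(50_000)]

def find_job_scan(job_list, job_id):
    """O(n) linear scan per lookup, as with linked-list structures."""
    for job in job_list:
        if job["id"] == job_id:
            return job
    return None

# O(1) average-case lookup, as with hash-map structures.
jobs_by_id = {job["id"]: job for job in jobs}

# Both find the same object; only the cost per lookup differs.
assert find_job_scan(jobs, "job49999") is jobs_by_id["job49999"]
```

At 50,000 jobs, a worst-case scan touches every entry on each query; the hash map keeps per-lookup cost flat as the cluster grows.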
Moab HPC Suite - Enterprise Edition Architecture
Moab - Scalability
Objects cached within Moab:
- Jobs
- Nodes
- Reservations
- Triggers
Real-time status
Moab - Scalability
Threaded client commands
More dynamic limits, fewer static limits
Per-partition scheduling
Reduced memory footprint
Moab
Moab® HPC Suite – Grid Option
Extends the Moab HPC Suite with grid functionality
▪ Flexibility in grid structure, management and control
▪ Unified grid management
▪ Maintains sovereignty while enabling sharing
▪ Improved user collaboration and productivity
▪ Increased system throughput and utilization
Flexibility: Grid Management Options
▪ Centralized and/or local management
▪ Local management – “peer-to-peer”
[Diagram: a Moab head node holding all rules for the grid environment (or the shared grid rules), coordinating multiple local Moab instances, each with its own local rules]
▪ New authorization daemon (trqauthd) gives administrators better control over users
▪ Eliminated security loophole
▪ Users cannot run jobs as another user
▪ trqauthd runs as “root” (replaces pbs_iff and its “sticky” bit)
Enhanced Client and Security Management