xCAT+Moab Cloud
© 2008 IBM Corporation
xCAT+Moab Cloud
Egan Ford
Team Lead, ATS STaCC (Scientific, Technical, and Cloud Computing)
Project Leader, xCAT
© 2009 IBM Corporation
Agenda
Objectives
Stateless/Statelite
xCAT
Moab
xCAT+Moab
Objectives
Increase ROI
– Increase utilization.
– Reduce management overhead.
– Reduce downtime/Increase availability.
• Installation
• Maintenance
– Rapid data & application-based provisioning.
– Better cross-departmental use of computing resources.
• Grid, On-Demand, Utility Computing, Cloud
Reduce Investment
Reduce Footprint
Reduce Power Usage
New Enterprise Data Center
Make Environment Changeable with Stateless, Traditional or Virtual Provisioning
[Chart, 2008 / 2010 / 2012. Categories: Stateless/Statelite or Traditional Full Provisioning; Non-virtualizable Environments; Virtual Machines; Multi-purpose Scale-out Computing. Values shown: *12%, 60%.]
*Virtualization of workloads by virtual machine growth data—Gartner Data Center Event, Las Vegas, Dec. 2008.
What is stateless?
Stateless is not a new concept.
– The processors and memory (RAM) subsystems in modern machines do not maintain any state between reboots, i.e., they have no information about what occurred previously.
Stateless provisioning takes this concept to the next level and removes the need to store the operating system and the operating system state locally.
– Bproc/Beoproc
– BlueGene/L
SAN/iSCSI provisioning and NFS-root-RW are not stateless.
– State is maintained remotely (disk-elsewhere).
What is stateless provisioning?
Stateless provisioning loads the OS over the network into memory without the need to install to direct-attached disk.
OS state is not maintained locally or remotely after a reboot, much like booting an OS from a CD (e.g. Live CD).
– The initial start state will always be the same for any nodes using the same stateless image, as if reinstalling between reboots.
– SAN-boot, iSCSI-boot, and NFS-root-RW are not stateless.
Think of your nodes/servers as preprogrammed appliances that serve a fixed or limited purpose.
– E.g. DVD player, toaster, etc.
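The difference between a stateless node and a stateful one can be illustrated with a toy model. This is only a sketch: the class names, the dictionary standing in for a file system, and the sample path are all hypothetical and have nothing to do with how xCAT itself is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class StatelessNode:
    """Toy model: the root file system lives in RAM and is rebuilt from the image on every boot."""
    image: dict                                  # shared OS image (path -> content), hypothetical
    root: dict = field(default_factory=dict)     # the node's live root file system

    def boot(self) -> None:
        # Every boot starts from a fresh copy of the image; nothing survives.
        self.root = dict(self.image)

    def write(self, path: str, content: str) -> None:
        self.root[path] = content


@dataclass
class StatefulNode(StatelessNode):
    """Toy model: the root persists (local disk, SAN, iSCSI, NFS-root-RW)."""
    installed: bool = False

    def boot(self) -> None:
        # The image is laid down once; later boots keep whatever was written.
        if not self.installed:
            self.root = dict(self.image)
            self.installed = True


image = {"/etc/motd": "compute image v1"}
stateless, stateful = StatelessNode(image), StatefulNode(image)
for node in (stateless, stateful):
    node.boot()
    node.write("/etc/motd", "locally modified")
    node.boot()  # reboot

print(stateless.root["/etc/motd"])  # compute image v1
print(stateful.root["/etc/motd"])   # locally modified
```

After the reboot the stateless node is back at the image's start state, while the stateful node has drifted, which is the inconsistency that stateless provisioning is meant to avoid.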
Stateless is not diskless, diskfree, or disk-elsewhere
Stateless provisioning can leverage local disk, SAN, or iSCSI for /tmp, /var/tmp, scratch, application data, and swap.
If possible, diskfree is recommended.
– Reduced power
– Reduced cooling
– Reduced downtime (increased system MTBF)
– Reduced space
• Future diskfree-only nodes, e.g. BlueGene.
Stateless does not change the way applications access data.
– NFS, SAN, GPFS, local disk, etc. are supported.
Why stateless provisioning?
Less (horizontal) software to maintain.
– No inconsistencies over time.
Less (vertical) software to maintain.
– Small fixed-purpose images vs. large general-purpose images.
• Less risk of a software component having a security hole.
• Reduced complexity.
Greater security.
– No locally stored authentication data.
Initial large installations and upgrades can be reduced to minutes of boot time versus hours or days of operating system installation time.
– Reprovisioning/repurposing a large number of machines can be accomplished in a few minutes.
Why stateless provisioning?
Provides a framework for automated per application provisioning using intelligent application schedulers.
– Change server function as needed (On-Demand/Utility/Cloud computing).
– Increase node utilization.
A stateless image can be easily shared across the enterprise, enabling the promise of grid computing.
– This can be automated with grid scheduling solutions.
Migrating a running virtual machine between physical machines involves no large OS image to move.
Stateless Customer Set Limitations
Operating System must boot and operate as OS vendor intended and must not require modification.
– Support
• Vendor
• Application
• Administrator
• Community
Images must be easy to create and maintain.
– RPM/YUM/YAST
– A real file system layout for manual configuration.
Must support a system or method of per node unique configuration.
– IP Addresses, NFS mounts, etc.
– License Files
– Authentication configuration and credentials (must not be in image).
Avoid reengineering existing or adding new networks.
Predictive Performance
Untethered (e.g. No SAN)
– May be unavoidable
What is xCAT?
Extreme Cluster(Cloud) Administration Toolkit
– Open Source Linux/AIX/Windows Scale-out Cluster Management Solution
Design Principles
– Build upon the work of others
• Leverage best practices
– Scripts only (no compiled code)
• Portable
• Source
– Vox Populi -- Voice of the People
• Community requirements driven
• Do not assume anything
What does xCAT do?
Remote Hardware Control
– Power, Reset, Vitals, Inventory, Event Logs, SNMP alert processing
– xCAT can even tell you which light path LEDs are lit up remotely
Remote Console Management
– Serial Console, SOL, Logging / Video Console (no logging)
Remote Destiny Control
– Local/SAN Boot, Network Boot, iSCSI Boot
Remote Automated Unattended Network Installation
– Auto-Discovery
• MAC Address Collection
• Service Processor Programming
• Remote Flashing
– Kickstart, Autoyast, Imaging, Stateless/Diskless, iSCSI
Scales! Think 100,000 nodes.
xCAT will make you lazy. No need to walk to datacenter again.
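As a rough illustration of remote hardware control, the sketch below only builds (and prints) command lines for three xCAT commands, rpower, rvitals, and rinv, in the usual "command noderange option" form. The node range node001-node004 and the helper name xcat_cmd are made up for this example, and the option words should be checked against the man pages of the installed xCAT release before use.

```python
import shlex
from typing import List


def xcat_cmd(command: str, noderange: str, *args: str) -> List[str]:
    """Build an argv list of the form: <command> <noderange> [args...]."""
    return [command, noderange, *args]


noderange = "node001-node004"  # hypothetical node range
checks = [
    xcat_cmd("rpower", noderange, "stat"),   # power state
    xcat_cmd("rvitals", noderange, "temp"),  # temperature readings
    xcat_cmd("rinv", noderange, "all"),      # hardware inventory
]

# Nothing is executed here; on a management node each list could be
# handed to subprocess.run(argv, check=True).
for argv in checks:
    print(shlex.join(argv))
```

Building argument lists rather than shell strings keeps the node range from being reinterpreted by a shell.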
xCAT Past, Present, Future
October 1999
– xCAT Zero created for Web 1.0
January 2000 – Present
– xCAT used WW for scale-out Linux and Windows clusters
– xCAT Community: 342 members from at least 29 countries
October 2007
– xCAT 1.3.0 released
– xCAT 2.0-alpha
• Linux Only
2008-2010
– xCAT 2.0, 2.1, 2.2, 2.3, 2.4 released
• xSeries, pSeries, zSeries
• Linux, Windows, and AIX
• Open Source
• CLI and Web
[Diagram labels: xCAT, PSSP, CSM, xCAT 2.0, pSeries, xSeries]
[Chart: Total Number of Linux Clusters (axis 0 to 100) by vendor: IBM, Other, HPQ, Dell, Sun, SGI]
Add CSM Value into xCAT & Director
[Diagram, 2007 to 2009: CSM, xCAT, xCAT 2, Director, and Director/CM across three segments: large HPC, parallel batch jobs; commercial clusters; SMB, departmental, or heterogeneous clusters]
xCAT 2
– Open source
– Flexible, scalable
– Full IBM support available
– Expertise required
– IBM fully involved in open source development
Director/CM
– Full IBM product & support
– GUI & CLI
– Easier learning curve
– All IBM platforms
Where is xCAT in use today?
NSF Teragrid (teragrid.org)
– ~1500 IA64 nodes (2x proc/node), 4 sites, Myrinet
A Bank in America
– n clouds @ 252 – 1008 iDPX nodes each, multi-site, rollout on-going, 10 GE
University of Toronto (SciNet)
– Hybrid 3864 iDPX/Linux (30,912 cores) and 104 P6/AIX (3,328 cores)
Weta Digital (xCAT -- one tool to rule them all)
– 1200 Xeon blades (2x proc/node), Gigabit Ethernet
LANL Roadrunner
– Phase 1: 2016 Opteron Nodes (8 core/node), IB, Stateless
– Phase 3: 3240 LS21, 6480 QS22, IB, Stateless
Anonymous Petroleum Customer
– 30,000 nodes, 20,000 in largest single cluster. Was Windows, now Linux.
IBM On-Demand Center
IBM GTS
– "They can have my xCAT when they pry it from my cold dead hands." -- Douglas Myers, IBM GS Special Events
xCAT 2 Support Requirements
Attributes of support offering
24x7 support
Worldwide
Close to traditional L1/L2/L3 model
Identical support mechanism for system x and p
Begins with xCAT 2.0 (9/2008)
xCAT 2 Team Members & Responsibilities
Egan Ford (architecture, scaling expert, customer input, marketing, cloud, etc.)
Jarrod Johnson (architecture & development, system x HW control, 1350 test, ESX, Xen, KVM, Linux Containers)
Bruce Potter (architecture, GUI)
Linda Mellor (development, HPC integration)
Lissa Valletta (documentation, general management functions)
Norm Nott (AIX deployment, AIX porting & open source)
Ling Gao (PERCS, monitoring, scaling)
Scot Sakolish (system p HW control)
Shujun Zhou (RoadRunner cluster setup/admin)
Jay Urbanski (Open source approval process)
Adaptive Computing (Hyper V, Moab, cloud)
Sumavi
Other IBMers, BPs, and customers
xCAT Tech
xCAT 2.x Architecture
Everything is a node
– Physical Nodes
– Virtual Machines/LPARs/zVM
• Xen, KVM, ESXi, ScaleMP
– rpower, live migration, console logging, Linux and Windows guests.
– Infrastructure
• Terminal Servers
• Switches
• Storage
• HMC
xCAT 2.x Architecture
[Diagram labels: CLI, Web/GUI, Client, Server, XML/SSL, xcatd, ACL, DB, action, Management Node, Service Node, SNMP R/W, Trusted SSL, Fork PID]
xCAT 2.x Scale Infrastructure
A single xCAT management node with multiple service nodes providing boot services to increase scaling.
Can scale to 1000s and 10000s of nodes.
xCAT already provides this support for large diskful clusters and it can be applied to stateless as well.
The number of nodes and network infrastructure will determine the number of DHCP/TFTP/HTTP servers required for a parallel reboot with no DHCP/TFTP/HTTP timeouts.
The number of DHCP servers does not need to equal the number of TFTP or HTTP servers.
TFTP servers NFS mount read-only the /tftpboot and image directories from the management node to provide a consistent set of kernel, initrd, and file system images.
[Diagram: mgmt node connected to service node01, service node02, ... service nodeNN; each service node provides DHCP, TFTP, HTTP, and NFS (hybrid) to its own group of nodes (node001, node002, ... nodennn; nodennn + 1, nodennn + 2, ... nodennn + m; ...). Other label: IMN.]
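The sizing question behind the service-node layout can be sketched as simple arithmetic. The per-service-node limit used here (250, and 4 in the small example) is purely an assumed figure for illustration; the real limit depends on the node count and network infrastructure, and would have to be found by testing parallel reboots for timeouts.

```python
import math


def service_nodes_needed(compute_nodes: int, nodes_per_service_node: int) -> int:
    """Smallest number of service nodes so that none serves more than the limit."""
    if compute_nodes < 0 or nodes_per_service_node <= 0:
        raise ValueError("node count must be >= 0 and the per-service-node limit > 0")
    return math.ceil(compute_nodes / nodes_per_service_node)


def assign_nodes(nodes, nodes_per_service_node):
    """Split the node list into consecutive groups, one group per service node."""
    count = service_nodes_needed(len(nodes), nodes_per_service_node)
    return {
        f"service node{i + 1:02d}": nodes[i * nodes_per_service_node:(i + 1) * nodes_per_service_node]
        for i in range(count)
    }


nodes = [f"node{i:03d}" for i in range(1, 11)]  # node001 ... node010
layout = assign_nodes(nodes, 4)                 # assumed limit of 4 for the demo
for service_node, group in layout.items():
    print(service_node, group)

print(service_nodes_needed(10000, 250))  # 40
```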
xCAT 2.x Provisioning
Supported Architectures
– x86/x86_64
– Power/PPC/Cell
Supported OSes
– Linux
• Redhat (CentOS, Fedora, Scientific Linux)
• SuSE
– AIX
– Windows
Provisioning Methods
– Local Disk/SAN/Solid State
– Stateless (RAM Root, Linux and AIX (xCAT 2.5))
– Statelite (NFS Root, Linux and AIX (xCAT 2.5))
– VM (data dedupe (copy-on-write), VM copy)
– iSCSI (Windows and Linux)
• x86/x86_64 does not require firmware-based iSCSI initiator. xCAT simulates with netboot.
xCAT Provisioning Methods
[Diagram: xCAT provisioning a node (HD, Memory) in three ways, plus Statelite]
Stateful – Diskful
– Local: HD, Flash
– OS Installer
Stateful – Disk-Elsewhere
– SAN, iSCSI, NAS
– OS Installer
Stateless – Disk Optional
– Memory: RAM, CRAM, NFS
– Image Push
Statelite
Automagic Discovery
One-Button Provisioning
Simplified Service
– No need for skilled staff in datacenter.
Simplified Cluster Expansion
– Nodes must be predefined!
Idiot Proof
– Even managers can do it.
Complements IBM’s Hot Swap/Add Initiatives.
[Diagram labels: IBM-HW, xCAT]
Provisioning Methods
[Flow diagram labels: netboot (PXE, xcpu, embedded, TFTP, vendor specific code, Other?); 2nd stage (pxelinux.0/elilo/openfw); xcatk; diskless; installer; discovery (xcatd); next mod (xcatd); reboot to installer; provision (xcatd); reboot to hd/iSCSI; Stateless; DIM; DOS; flash; reboot to next; reboot to diskless]
xCAT 2.x Virtualization Support
KVM and Xen (Paravirtualization and Classic) (libvirt driven)
– Allocate on Demand, Provision Linux/Windows Guest
– Live Migration
– Serial and VGA console
ESXi (VMware API driven) (xCAT 2.4)
– Allocate on Demand, Provision Linux/Windows Guest
– Live Migration (vCenter required)
– No console access (WIP, xCAT 2.5)
Microsoft HyperV (xCAT 2.5)
ScaleMP (xCAT 2.4)
Power LPAR
zVM
Linux Containers (Think Solaris Zones) (xCAT 2.5 – 2.6) (libvirt driven)
– RH6, SLES11
– Allocate on Demand
– Provision/Boot Linux Guest Only
– Suspend/Resume (WIP)
– Live Migration (WIP)
– http://lxc.sf.net
xCAT 2.x Monitoring
What is Moab?
The Brain: Intelligent Management and Automation Engine.
Policy and Service Level Enforcer.
Provides simple Web-based job management, graphical cluster administration, and management reporting tools.
xCAT and Moab Synergistically
• Provision OS images
– provisioning and reconfiguration of operating system to match workload needs based on policy
• Start and stop compute resources
– reduce power consumption through workload-aware power and temperature management
• Monitor and balance workload to fulfill SLAs
– dynamically and automatically reallocate resources according to business priorities
– keep system failures invisible to employees, partners, and customers and assure continuing smooth operations
• Manage and trigger virtualization
– increase resource utilization
– assure high availability
– provision new virtual servers in minutes to handle shifting workloads
– enable live migration of virtual environments
xCAT + Moab Suite
Moab Cloud Manager/Visual Cloud
– GUI
Moab Workload Manager
– Scheduler
• The Brain
Moab Service Manager
– Queue
– Lock Management
– Universal Translator
xCAT
– Actions
• The Muscle
– The Senses
[Diagram labels: IBM-HW, MSM, xCAT, VMs, MWM, MCM/VC]
Thermal Balancing
[Diagram: MOAB, xCAT, Upcoming Workload (Intense, Less Intense)]
Moab:
• Job Impact
• Node Information
• Temperature and Policies
• Sweet Spot
• Node Consolidation
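One simple way to picture thermal balancing is to pair the most intense upcoming jobs with the coolest nodes. The sketch below is an illustration only, not Moab's actual policy engine; the job names, intensity scores, and temperature readings are invented for the example.

```python
def place_by_temperature(job_intensity, node_temps):
    """Pair jobs with nodes: most intense job on the coolest node, and so on.

    job_intensity: job name -> relative intensity (higher = more heat)
    node_temps:    node name -> current temperature in degrees C
    Jobs beyond the number of nodes are left unplaced.
    """
    jobs_hot_first = sorted(job_intensity, key=job_intensity.get, reverse=True)
    nodes_cool_first = sorted(node_temps, key=node_temps.get)
    return dict(zip(jobs_hot_first, nodes_cool_first))


# Hypothetical upcoming workload and node readings.
upcoming = {"render": 0.9, "backup": 0.2, "compile": 0.6}
temps = {"n1": 31.0, "n2": 24.5, "n3": 27.0}

placement = place_by_temperature(upcoming, temps)
print(placement)  # {'render': 'n2', 'compile': 'n3', 'backup': 'n1'}
```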
Idle Node Management
[Diagram: MOAB, xCAT]
Moab:
• Workload Prediction
• Power Control
• Energy Savings
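Idle node management combines a workload prediction with power control. A minimal sketch, assuming the prediction has already been reduced to a number of idle nodes to keep in reserve: the function name, the 30-minute threshold, and the sample data are all hypothetical, and the resulting list is the kind of node set that could be passed to xCAT's rpower to power off.

```python
def nodes_to_power_off(idle_minutes, reserve, idle_threshold=30):
    """Pick idle nodes to power down, longest-idle first.

    idle_minutes:   node name -> minutes idle (0 means busy)
    reserve:        idle nodes to keep powered on for predicted workload
    idle_threshold: minimum idle time before a node is a candidate
    """
    candidates = sorted(
        (n for n, m in idle_minutes.items() if m >= idle_threshold),
        key=lambda n: idle_minutes[n],
        reverse=True,
    )
    idle_total = sum(1 for m in idle_minutes.values() if m > 0)
    allowed = max(0, idle_total - reserve)
    return candidates[:allowed]


# Hypothetical cluster state: n1 is busy, the rest are idle.
idle = {"n1": 0, "n2": 45, "n3": 120, "n4": 10, "n5": 60}
print(nodes_to_power_off(idle, reserve=2))  # ['n3', 'n5']
```

Keeping a reserve avoids powering nodes down only to boot them again when the predicted workload arrives.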
Obstacle Avoidance
[Diagram: MOAB, xCAT]
Moab:
• Workload Prediction
• LED Reporting
• Higher Job Throughput
• Energy Savings
Green with xCAT+Moab
Reduce your overall power and cooling costs and decrease your organization’s carbon footprint
Moab automatically and seamlessly . . .
• Places idle servers in power-saving modes (power capping or power down) via xCAT.
• Schedules nodes based on node temperatures and cost per watt.
• Maximizes utilization of all CPU cores by orchestrating virtual environments (such as Xen, KVM, ESXi).
• Reports on energy used by user, project, or resource to give you greater control of energy consumption and accountability.
Historical Resource Management
[Diagram: separate Linux and Windows resources]
• Multiple Silos
• Additional Management
• Complexity
• Under Utilization
Moab Dynamic Hybrid
[Diagram: Linux and Windows resources with Moab and xCAT]
• Unified System
• Dynamic Resources
• Management
• Ease of Use
• Increased Utilization
Dynamic Service Nodes
[Diagram labels: xCAT, Moab, SN, SN, SN, etc., Image Push, Provisioning, Policy]
Virtual Machine Automation
Create/Delete VMs
Dynamic Add/Remove
Live Migration (KVM, Xen, ESXi)
Stateless Hypervisor (KVM, Xen, ESXi, ScaleMP)– Multiple HV Support
Balancing
Consolidation
Route Around Current/Future Problems
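Consolidation can be pictured as a bin-packing problem: place VMs on as few hypervisors as possible so the rest can be repurposed or powered down. The sketch below uses the first-fit decreasing heuristic, which is a swapped-in textbook technique chosen for illustration and not a description of how Moab decides; the VM names, loads, and host capacity are invented.

```python
def consolidate(vm_load, host_capacity):
    """Pack VMs onto hosts using first-fit decreasing.

    vm_load:       VM name -> load units (e.g. cores)
    host_capacity: load units available on each (identical) host
    Returns a list of hosts, each a list of VM names.
    """
    hosts = []  # each entry: {"free": remaining capacity, "vms": [names]}
    for vm in sorted(vm_load, key=vm_load.get, reverse=True):
        load = vm_load[vm]
        if load > host_capacity:
            raise ValueError(f"{vm} does not fit on any host")
        for host in hosts:
            if host["free"] >= load:
                host["free"] -= load
                host["vms"].append(vm)
                break
        else:
            # No existing host has room, so open a new one.
            hosts.append({"free": host_capacity - load, "vms": [vm]})
    return [host["vms"] for host in hosts]


# Hypothetical VM loads in cores, on 8-core hosts.
vms = {"vm1": 4, "vm2": 2, "vm3": 6, "vm4": 3, "vm5": 1}
print(consolidate(vms, host_capacity=8))
# [['vm3', 'vm2'], ['vm1', 'vm4', 'vm5']]
```

Moving the VMs to their target hosts would then rely on live migration, so the running guests are not interrupted.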
Moab and xCAT
Intelligent Management
Green
• Power down idle servers
QoS/SLA Assurance
Automated Fault Avoidance and Recovery
Real-time Policy-driven Resource Allocation
Dynamic Provisioning
• Respond to workload surges and priorities
Mixed Workloads
• Provisioning
• Virtualization
• Power
• Workload
• Apps
• Users
• Projects
Benefits of xCAT + Moab
• 90-99% Server Utilization*
• Simplified Administration & Management – Easy to use interface and intelligent automation
• Guarantee QoS & SLA delivery to applications and projects
• Green Computing Facilities
• Intelligent On-Demand Provisioning
• Automated Failure Recovery
• Dynamic Allocation of Resources to meet high priority needs of jobs
• Rapid Deployment and Scale-out
• Reporting & Charting Facilities
[Diagram labels: xCAT & Moab, Scheduling/Provisioning, Current Workload, Future Needs, Historical Data, Policies]
Case Studies: IBM Systems w/Moab
• SciNet
– Moab provides key adaptive scheduling of xCAT’s on-demand environment provisioning
– Cluster Resources and IBM Canada jointly worked to respond to the RFP, present to the customer, and secure this win
• RoadRunner
– 1-petaFLOP System
– Cell-based processors
• IBM’s VLP
– Moab enables IBM to host resources for software partners to enable testing
SciNet
[Diagram labels: Power, Moab, LoadLeveler, iDataPlex, TORQUE, Jobs, xCAT]
Jobs are orchestrated based on Node Types which are most ideal for the job & on what is available.
Example:
• Shared Memory
• Visualization
• High Memory
• Specialized App Environment
• Etc.
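Choosing a node type that is both ideal for the job and actually available can be sketched as a preference-ordered lookup. The function name, type names, and free-node counts below are hypothetical, and a real scheduler would weigh many more factors.

```python
def pick_node_type(preferred_types, free_nodes):
    """Return the first preferred node type that has free nodes, else None.

    preferred_types: node types in order of preference for the job
    free_nodes:      node type -> number of currently free nodes
    """
    for node_type in preferred_types:
        if free_nodes.get(node_type, 0) > 0:
            return node_type
    return None


# Hypothetical availability snapshot.
free = {"high_memory": 0, "shared_memory": 3, "visualization": 1}

print(pick_node_type(["high_memory", "shared_memory"], free))  # shared_memory
print(pick_node_type(["high_memory"], free))                   # None
```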
Virtual Loaner Program
[Architecture diagram labels: External Users; VPN; SSH; Vendor Portal “WebSphere”; Portal View of Available Resources; Resource Availability Query; Reservation Commitment; Moab; Database “DB2”; Store Reservation Records; Provisioning Manager “Tivoli”; Custom RM “HMC”; Request Resources; Query Resources; Provision Resources; Software Source; Network; Storage; Persistent Storage; External Data Storage; X 86, Series X, Series I, Power 4 32-way, Power 5 16-way; Middleware (DB2, WebSphere, ….); Operating System (AIX, i5OS, SLES, RHEL)]
Key Concept: Initiates, Saves and Restores Reservations, Hardware Set Up, Software Images, Configuration, Etc.
Key Benefits
• Dynamic Set Up
• Dynamic Clean Up
• 3X More Usage
• Fraction of Cost
Summary
Available Today
– xCAT 2.3
– Moab (xSeries Part Number)
Available Soon
– xCAT 2.4 (April 30 2010)
Available not-so-soon
– xCAT 2.5 (October 31 2010)
Who’s responsible for this stuff?
Blame me:
– Egan Ford
– [email protected] (email both for faster service)