
© 2008 IBM Corporation

IBM Power™ Systems Software

IBM Cluster Systems Management for Linux and AIX Version 1.7.0.10

Glen Corneau, Power Systems ATS

Special notices

This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in other countries, and the information is subject to change without notice. Consult your local IBM business contact for information on the IBM offerings available in your area.

Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 USA.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or guarantees either expressed or implied.

All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used and the results that may be achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations and conditions.

IBM Global Financing offerings are provided through IBM Credit Corporation in the United States and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates are based on a client's credit rating, financing terms, offering type, equipment type and options, and may vary by country. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.

IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies.

All prices shown are IBM's United States suggested list prices and are subject to change without notice; reseller prices may vary.

IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and are dependent on many factors including system hardware configuration and software design and configuration. Some measurements quoted in this document may have been made on development-level systems. There is no guarantee these measurements will be the same on generally-available systems. Some measurements quoted in this document may have been estimated through extrapolation. Users of this document should verify the applicable data for their specific environment.

Revised September 26, 2006


Special notices (cont.)

The following terms are registered trademarks of International Business Machines Corporation in the United States and/or other countries: AIX, AIX/L, AIX/L(logo), alphaWorks, AS/400, BladeCenter, Blue Gene, Blue Lightning, C Set++, CICS, CICS/6000, ClusterProven, CT/2, DataHub, DataJoiner, DB2, DEEP BLUE, developerWorks, DirectTalk, Domino, DYNIX, DYNIX/ptx, e business(logo), e(logo)business, e(logo)server, Enterprise Storage Server, ESCON, FlashCopy, GDDM, i5/OS, IBM, IBM(logo), ibm.com, IBM Business Partner (logo), Informix, IntelliStation, IQ-Link, LANStreamer, LoadLeveler, Lotus, Lotus Notes, Lotusphere, Magstar, MediaStreamer, Micro Channel, MQSeries, Net.Data, Netfinity, NetView, Network Station, Notes, NUMA-Q, Operating System/2, Operating System/400, OS/2, OS/390, OS/400, Parallel Sysplex, PartnerLink, PartnerWorld, Passport Advantage, POWERparallel, Power PC 603, Power PC 604, PowerPC, PowerPC(logo), Predictive Failure Analysis, pSeries, PTX, ptx/ADMIN, RETAIN, RISC System/6000, RS/6000, RT Personal Computer, S/390, Scalable POWERparallel Systems, SecureWay, Sequent, ServerProven, SpaceBall, System/390, The Engines of e-business, THINK, Tivoli, Tivoli(logo), Tivoli Management Environment, Tivoli Ready(logo), TME, TotalStorage, TURBOWAYS, VisualAge, WebSphere, xSeries, z/OS, zSeries.

The following terms are trademarks of International Business Machines Corporation in the United States and/or other countries: Advanced Micro-Partitioning, AIX 5L, AIX PVMe, AS/400e, Chiphopper, Chipkill, Cloudscape, DB2 OLAP Server, DB2 Universal Database, DFDSM, DFSORT, DS4000, DS6000, DS8000, e-business(logo), e-business on demand, eServer, Express Middleware, Express Portfolio, Express Servers, Express Servers and Storage, GigaProcessor, HACMP, HACMP/6000, IBM TotalStorage Proven, IBMLink, IMS, Intelligent Miner, iSeries, Micro-Partitioning, NUMACenter, On Demand Business logo, OpenPower, POWER, PowerExecutive, Power Architecture, Power Everywhere, Power Family, Power PC, PowerPC Architecture, PowerPC 603, PowerPC 603e, PowerPC 604, PowerPC 750, POWER2, POWER2 Architecture, POWER3, POWER4, POWER4+, POWER5, POWER5+, POWER6, POWER6+, Redbooks, Sequent (logo), SequentLINK, Server Advantage, ServeRAID, Service Director, SmoothStart, SP, System i, System i5, System p, System p5, System Storage, System z, System z9, S/390 Parallel Enterprise Server, Tivoli Enterprise, TME 10, TotalStorage Proven, Ultramedia, VideoCharger, Virtualization Engine, Visualization Data Explorer, X-Architecture, z/Architecture, z/9.

A full list of U.S. trademarks owned by IBM may be found at: http://www.ibm.com/legal/copytrade.shtml. UNIX is a registered trademark of The Open Group in the United States, other countries or both. Linux is a registered trademark of Linus Torvalds in the United States, other countries or both. Microsoft, Windows, Windows NT and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries or both. Intel, Itanium, Pentium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries or both. AMD Opteron is a trademark of Advanced Micro Devices, Inc. Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries or both. TPC-C and TPC-H are trademarks of the Transaction Performance Processing Council (TPPC). SPECint, SPECfp, SPECjbb, SPECweb, SPECjAppServer, SPEC OMP, SPECviewperf, SPECapc, SPEChpc, SPECjvm, SPECmail, SPECimap and SPECsfs are trademarks of the Standard Performance Evaluation Corp (SPEC). NetBench is a registered trademark of Ziff Davis Media in the United States, other countries or both. AltiVec is a trademark of Freescale Semiconductor, Inc. Cell Broadband Engine is a trademark of Sony Computer Entertainment Inc. Other company, product and service names may be trademarks or service marks of others.

Revised September 28, 2006


Why Cluster?

● Utilize the Power of Multiple Computing Resources
● High Availability
● Access to Data and Applications
● Low Cost, Effective Infrastructure Management
● Server Consolidation, Dynamic Provisioning and Workload Consolidation

[Diagram: CSM at the center of these drivers.]


However...

There are challenges facing an administrator who has to manage a large number of machines! For instance, how does one:

➢ handle installing and updating of software on all machines in the cluster
➢ find out if any of the machines in the cluster have a problem before something drastic occurs
➢ handle a problem that occurs overnight while no one is around
➢ reboot servers without having to go out to each and every one, wherever they may be
➢ diagnose problems or failures in the cluster

These challenges can lead to many additional costs in operating clusters!

CSM Can Help Solve These Problems

Cluster Systems Management (CSM) is designed to provide many solutions for managing a cluster, all from one single point-of-control. Some examples of CSM functions:

● Install and update machines remotely

● Remote powering on, off and reboot of nodes in the cluster

● Continuous monitoring across all machines in the cluster

● Automated responses that can be run any time a problem occurs, providing notification or taking corrective action

● Probes can be run across nodes in the cluster to diagnose problems

● Files can be changed in one spot and distributed to all machines or a set of machines in the cluster


So What is CSM?

IBM Cluster Systems Management can provide a robust, powerful and centralized way to manage large numbers of IBM eServer®, xSeries®, System x®, BladeCenter®, pSeries®, System p5® and Power Systems® servers.

CSM can lower the overall cost of IT ownership by simplifying the tasks of installing, operating and maintaining clusters of servers.

CSM has a modular architecture: it integrates both IBM software and certain Open Source software into a complete systems management solution, while allowing administrators to decide which parts of CSM to use.

CSM leverages the rich heritage and proven technology of the IBM RS/6000® SP™ by utilizing and deriving various software from the Parallel Systems Support Programs for AIX® (PSSP) systems management software product (such as dsh and RSCT).


CSM High-Level View

Management Server - the entire cluster of machines can be operated, maintained and monitored from a single point-of-control

Managed Nodes are the operating system instances (OSIs) that are defined in the cluster

● Install and update software on nodes
● Automatic security configuration
● Run distributed commands
● Synchronize files across the cluster
● Monitoring and automated responses
● Hardware control
● Management of node groups
● Diagnostic tools
● Management of high-speed interconnects
● Implement HPC applications

The nodes in a CSM cluster can be:
● Linux on System x, eServer or BladeCenter™
● AIX™ on POWER™
● Linux® on POWER and OpenPower
● Linux and AIX on iSeries LPARs
● A mix of the above

Hardware Control Points are the devices that provide direct hardware functions to managed nodes (HMC, MM, RSA, FSP, etc)

CSM Value and Benefits

➔ One consistent interface to control and monitor across a variety of hardware platforms, such as Power (POWER6), System p (POWER5) and pSeries machines, BladeCenter blades, System x and xSeries servers, eServer servers and OpenPower. Consistent interface between Linux and AIX.

➔ CSM can manage across multiple switch topologies and across different types of clusters such as GPFS clusters and HA clusters.

➔ Continuous monitoring under administrator control provides for notification or automatic responses to be run, which allows problems to be handled any time of the day or night, leading to problem avoidance, rapid resolution and recovery.

➔ Distributed command execution allows the administrator to run any needed commands across the cluster (or to a distinct set of nodes) from one single point-of-control.

➔ Provides command line access to all cluster operations and data, to enable administrators to utilize scripting for automation and reuse of cluster administration.

➔ CSM is designed to scale to large clusters and can perform distributed operations in parallel, such as remote command execution, monitoring and automated responses, and installing and updating of nodes (see the scripting sketch below).
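As an illustration of that scripting, the sketch below runs a nightly cluster check from the management server. It is a minimal sketch only, assuming the CSM lsnode, rpower, dsh and dshbak commands are in the PATH (typically under /opt/csm/bin); the log path is hypothetical.

    #!/bin/sh
    # Nightly cluster check run from the CSM management server (illustrative sketch).
    LOG=/var/log/cluster-nightly.log      # hypothetical log location

    date >> "$LOG"
    lsnode >> "$LOG"                      # record the current list of defined nodes
    rpower -a query >> "$LOG"             # query the power state of every node
    # Run a filesystem check on all nodes in parallel; dshbak -c collapses
    # identical output so it is reported only once.
    dsh -a "df -k /tmp" | dshbak -c >> "$LOG"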

CSM Value and Benefits (cont'd)

➔ Installing and updating of the nodes in the cluster can be done from one single point-of-control, and security and remote shell setup will be done while adding nodes to the cluster, which provides ease of installation as well as ease of maintenance.

➔Sets of nodes can be identified as node groups and managed and monitored in the cluster as distinct entities. Nodes inherit monitoring as they move into node groups, removing the need for manual enablement.

➔Using Configuration File Management, the administrator can update files in one spot and have them synchronized across nodes and node groups in the cluster. Services in the cluster such as NTP can be set up, and basic user management can be done. If a node is down during these updates, the update will automatically be done when the node comes back online.

➔Commands and diagnostic probes are provided, allowing concise ways to see the entire state of the cluster and diagnose problems all from one single point-of-control.

➔Monitoring can be extended to allow customization of monitoring and use of the CSM event infrastructure to fit the administrator’s needs.


CSM Management of Multiple Domains

CSM can manage machines across different hardware domains, different switch topologies and across high availability clusters.

[Diagram: the CSM management server connected over the cluster LAN(s) and a switch to HA Cluster 1, HA Cluster 2 and a separate switch/interconnect domain.]

Hierarchical Management Servers

EMS: Executive Management Server (the top dog)

FMS: First Line Management Server (the “local” MS)

[Diagram: one EMS managing several FMSs, each FMS managing its own set of nodes.]


Current Software Distributions / Hardware

Platform: System x, xSeries, eServer, BladeCenter
● Supported servers: selected System x/xSeries servers (e325, e326, e326m); BladeCenter HS2x, HS4x, JS2x, LS2x
● Supported interconnects: Ethernet, Myrinet™
● Supported operating systems: RHEL 4, 5; Novell SLES 9, 10

Platform: System p5, pSeries, Power Systems, OpenPower, System i5, iSeries
● Supported servers: selected POWER4 and OpenPower servers; all p5 servers; selected i5 servers with AIX or Linux partitions; selected RS/6000 SP nodes and servers; POWER6 servers
● Supported interconnects: Ethernet, HPS (AIX), Myrinet (Linux), InfiniBand
● Supported operating systems: AIX 5.3, 6; RHEL 4, 5; Novell SLES 9, 10


Cluster Management

The cluster management capability allows an administrator to:

Add nodes to the cluster (installation of the operating system and/or CSM can be done at that time) and remove nodes from the cluster

View and/or change information for one or more nodes in the cluster

The CSM software is packaged into a server side that runs on the management server and a client side that runs on each managed node. The CSM client packages are small, putting a minimum of overhead on the nodes in the cluster.

An included, optional CSM High Availability Management Server (HA MS) feature is designed to allow automated failover of the CSM management server to a backup management server.
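As a sketch of the commands involved in viewing and changing node definitions (the hostname is hypothetical, and the chnode flags and attribute names vary by release, so treat them as assumptions):

    lsnode                              # list the hostnames of all defined nodes
    lsnode -l node07.example.com        # long listing of one node's attributes
    # chnode changes node attributes; the flag and attribute below are illustrative.
    chnode -n node07.example.com InstallCSMVersion=1.7.0.10
    rmnode node07.example.com           # remove the node definition from the cluster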


Hierarchical Management Servers

Why use this function?
● Manage more nodes than allowed by the current CSM scaling limit.
● Divide the nodes into smaller sets that can be managed individually, sometimes by different administrators.
● Bring a management server closer to the nodes in a geographically dispersed environment.
● Manage and monitor the FMSs from the EMS.

Two main scenarios:
● Building a hierarchical set of clusters from scratch, from the top down.
● Bringing existing CSM 1.7 clusters together to form a hierarchical structure.

Scaling:
● Normal CSM scaling in each FMS cluster (1024 nodes for System x, 512 for System p)
● 100 FMSs
● 4000 total FMS nodes

See the documentation for limitations, for example homogeneous FMS clusters and no HA MS.


Node Groups

Node groups can be formed inside the CSM cluster and managed and monitored as distinct entities.

Node groups can be defined as a static list of nodes or they can be defined to be dynamic and have nodes inside them that correspond to one or more characteristics.

For example:
● There is a predefined dynamic node group for Linux nodes and a predefined node group for AIX nodes. As nodes are added to the cluster, they will automatically become part of one of those node groups.
● An administrator can create a node group for a particular type of hardware and monitor that hardware. If new machines of that hardware type are added into the cluster, monitoring will automatically start for them.
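A sketch of creating node groups from the command line, assuming the CSM nodegrp command; the group names, hostnames and select-string syntax are illustrative:

    # Static group: explicitly add two nodes to a group named "web".
    nodegrp -a node01,node02 web

    # Dynamic group: membership is driven by a select (where) clause, so nodes that
    # match it join the group automatically as they are added to the cluster.
    nodegrp -w "Hostname like 'blade%'" blades

    lsnode -N blades                    # list the nodes currently in the group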


Installation and Setup


The administrator installs the management server, defines the nodes to be in the cluster and then CSM can remotely do a parallel network operating system install of the nodes. CSM uses the native install mechanisms under the covers - such as Kickstart for Red Hat, AutoYaST for Novell SLES, or NIM for AIX.

Separate network installation servers, located off the management server, are supported to handle the various operating systems and distributions in a mixed (i.e. heterogeneous) cluster environment. CSM provides the flexibility of having any type of management server with any type of nodes.

CSM will update software on the nodes, such as for new CSM versions and new Open Source updates. If a node is down during an update, CSM will automatically perform the update when the node comes back up.
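A sketch of the install flow from the management server. The hostname, attributes and flags are illustrative, and the distribution-specific setup step differs by platform (csmsetupks for Kickstart, csmsetupnim for NIM), so treat the exact syntax as an assumption:

    # Define the node to CSM, including its hardware control point (attributes illustrative).
    definenode node05.example.com HWControlPoint=hmc01 PowerMethod=hmc

    # Prepare the distro-specific install resources (e.g. csmsetupks or csmsetupnim),
    # then drive the parallel network install of the OS and CSM client:
    installnode node05.example.com

    # Later, push CSM and Open Source updates out to every node:
    updatenode -a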


Installation and Setup (cont'd)


CSM automatically sets up the security configuration for the underlying cluster infrastructure. It can also do the necessary setup for rsh or ssh (exchange of ssh keys).

The definition and configuration of secondary (i.e. non-install) adapters is integrated into CSM. For Linux nodes this applies to Ethernet. AIX nodes can define Ethernet, Infiniband, HPS and Multi-link (ml0) adapters.

HPC stack applications can be installed and configured as part of the operating system installation customization process. These applications include LoadLeveler, GPFS, Parallel ESSL and Parallel Environment both on Linux and AIX.


Distributed Command Execution


An administrator can run commands in parallel across nodes or node groups in the cluster and gather the output using the dsh (distributed shell) command.

The dsh command can use rsh (UNIX® basic remote shell) or ssh (secure shell). Either of these can utilize Kerberos V5. The administrator decides which one to use.

The dshbak command can format the output returned from dsh if desired - for example collapsing identical output from more than one node so that it is displayed only once. Optionally, the output can be sorted alphabetically.

The dcp command can be used to copy a file (or files) from the management server to multiple target nodes with one operation.

Remote shell capabilities can also be enabled for non-node devices (like HMCs, Virtual I/O Servers and IVM partitions via ssh).

Targets can be a single node, node groups, node ranges, devices or device groups.
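A few illustrative invocations (node names, group names and the queried package are hypothetical):

    # Run a command on two specific nodes.
    dsh -n node01,node02 "uptime"

    # Run on every node in the cluster; dshbak -c collapses identical output.
    dsh -a "rpm -q openssh" | dshbak -c

    # Run on a named node group.
    dsh -N AIXNodes "oslevel -r"

    # Copy a file from the management server to all nodes in one operation.
    dcp -a /etc/ntp.conf /etc/ntp.conf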


Configuration File Management


A configuration file manager (CFM) is provided to synchronize and maintain the consistency in files across nodes in the cluster. This prevents the administrator from having to copy files manually across the nodes in the cluster. The particular files can be changed once on the management server and then distributed to all the nodes or node groups in the cluster.

CFM can make use of meta variables for IP address or hostname substitution in files being transferred. Scripts can also be run for processing before and after a file is copied (e.g., to stop and restart daemons).

CFM makes use of rdist or rsync for file transfer. Rdist/rsync can use rsh (UNIX basic remote shell) or ssh (secure shell) - the administrator can decide which one to use.
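A sketch of the typical CFM flow, assuming the /cfmroot staging directory and the cfmupdatenode command; check the CSM documentation for the exact group-suffix and pre/post-script conventions at your release:

    # Files placed under /cfmroot on the management server mirror their destination
    # path on the nodes, e.g. /cfmroot/etc/hosts is distributed as /etc/hosts.
    cp /etc/hosts /cfmroot/etc/hosts

    # Push changed files to every node now; a node that is down is synchronized
    # automatically when it comes back online.
    cfmupdatenode -a

    # Push only to a particular node group (group name is illustrative).
    cfmupdatenode -N web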


Monitoring and Automated Responses


An administrator can set up monitoring for various conditions across nodes or node groups in the cluster and have actions run in response to events that occur in the cluster.

Conditions that can be monitored include network reachability, power status, whether applications or daemons that are running on the node are up or down, and CPU, memory and filesystem utilization among others.

Actions that can be run in response to one of these conditions include commands run on the management server or on any node of the cluster, as well as notification actions such as logging, e-mailing or paging.

SNMP traps can also be generated in response to events in the cluster.
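Monitoring is managed with the RSCT condition/response commands on the management server. A minimal sketch follows; the condition and response names are illustrative, and lscondition/lsresponse show the exact names shipped with a given release:

    lscondition                         # list the predefined conditions
    lsresponse                          # list the predefined responses

    # Start monitoring: notify root whenever any node's /var filesystem fills up
    # (condition and response names are illustrative).
    startcondresp "AnyNodeVarSpaceUsed" "E-mail root anytime"

    lscondresp                          # show which condition/response pairs are active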


Monitoring / Automated Responses (cont'd)

Predefined "Conditions" are shipped with CSM (over 100) for many types of information that can be monitored so that monitoring can be started "out of the box".

Predefined "Responses" for e-mail notification, SNMP traps, logging and displaying a message to a console are provided.

A user can take a condition, quickly associate it with a response, and start monitoring. The administrator can easily customize these conditions and responses or create new conditions tailored to fit their own needs.

In addition to notification actions, administrator-defined recovery actions can also occur. These can include cleaning up filesystems that are filling up, taking actions to help restart a critical application that went down, etc.

The types of resources that can be monitored or controlled can be extended by an administrator via a mechanism provided by the underlying cluster infrastructure.

CSM can also be used to consolidate error reports from across the nodes: syslog for Linux nodes and the error report (errpt) for AIX nodes. The AIX error report sensor uses errnotify ODM entries rather than polling.


Hardware Control

CSM provides hardware control capability to power on, off, reboot, bring up a remote hardware console and query nodes in the cluster. This support is available for IBM systems with out-of-band management capabilities (RSA adapters, integrated BMC controllers, Management Modules, HMCs, Flexible Service Processors, etc).

The hardware control capability is designed so that there is one layer of code for the hardware functions, which then references a library based on the hardware type of the node. This allows libraries to be "plugged in" for the specific hardware to be supported. CSM was specifically designed so that additional hardware support can be easily added.
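Illustrative hardware control invocations, using the common -a/-n/-N node-targeting convention of the CSM commands (node and group names are hypothetical):

    rpower -a query                     # query the power state of every node
    rpower -n node01,node02 reboot      # reboot two nodes via their hardware control points
    rpower -N blades off                # power off an entire node group
    rconsole -n node01                  # open a remote hardware console to one node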

[Diagram: the management server controlling System x nodes in a rack with hardware control service processors or integrated BMCs, System p5 servers through an HMC, e325/e326 servers with BMC processors, BladeCenter with a Management Module, and p5 systems with an FSP (no HMC).]

Hardware Control (cont'd)

Clusters with a mix of AIX and Linux nodes can use a management server with either OS, but require an install server with the same OS and distribution as the nodes being installed.

CSM supports the representation of additional parts of the cluster (such as RSAs, terminal servers for System x, HMCs and Virtual I/O Servers for Power Systems) as non-node devices in the cluster, allowing the CSM management server to show the status of non-node devices (network reachability), control power state (if remote power capability exists) and run remote commands.

System p5 and POWER6 servers without an HMC are supported with hardware control, installation and remote console for LPARs managed by the Integrated Virtualization Manager (IVM).


Hardware Control (cont'd)

Cluster-Ready Hardware Server (CRHS) function: enables hardware discovery, HMC-to-server assignment, and service processor password management from the CSM management server for POWER5™ hardware, including High Performance Switch (HPS) and InfiniBand implementations. Not supported with AIX 6 or POWER6™ systems.

IBM Power Systems enterprise systems and frames (575, 590, 595) can have their CEC and frame firmware updated from the CSM management server. There is also support for selected System x systems and BladeCenter firmware upgrades.

Support is provided for Blue Gene/L, the ultimate in supercomputer performance.


Diagnostic Tools


The administrator can run diagnostic probes provided by CSM to automatically perform "health checks" of particular software functions if a problem is suspected.

These probes can also be run periodically or automatically as a response to a condition occurring in the system.

Current probes shipped with CSM include probes to diagnose network connectivity, NFS health, and the status of daemons that CSM runs.

Administrators can also write their own probes to add into the existing CSM probe infrastructure.
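A minimal sketch of running a probe from the management server; the probe name is illustrative, and the flag should be verified against the probemgr documentation for your release:

    # Run a single probe by name, e.g. one that checks network connectivity.
    probemgr -p network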


Security


There are three main pieces to CSM security:

Distributed command execution (dsh) and CFM make use of either rsh or OpenSSH. CFM uses a push mechanism to distribute updates to the nodes.

The underlying cluster infrastructure uses an "out of the box" host based authentication mechanism. The security is designed to be pluggable to more easily allow other security mechanisms to be used in the future.

Hardware control uses an encrypted ID and password to communicate with the hardware control points. It is recommended to put the communication with the hardware control points and terminal servers on a separate VLAN from the one used to install and manage the nodes.
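The hardware control credentials are stored with the systemid command, which keeps an encrypted user ID and password for each hardware control point; a minimal sketch (hostname and user ID are illustrative):

    # Store the user ID and password (entered interactively) that CSM will use
    # when communicating with this hardware control point.
    systemid hmc01.example.com hscroot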


Security (cont'd)


The security infrastructure allows for pluggable security mechanisms - Kerberos V5 now supported with both AIX and Linux nodes.

CSM will automatically set up and exchange public keys between the management server and the managed nodes during a full install.

Access to particular resources in the cluster is determined by ACL files that CSM assists with setting up. The managed nodes are not given access to any resources on the management server, thus preventing the management server from having to trust the managed nodes.

Least Privilege Resource Manager (LPRM) now allows non-root users to execute a subset of CSM commands on the management server that normally require root access.

Documentation for implementation of firewalls in a CSM environment is provided.


Additional CSM Information

Cluster Software www.ibm.com/servers/eserver/clusters/software/

Cluster Resource Center publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp

CSM FAQ www.ibm.com/developerworks/forums/dw_thread.jsp?forum=907&thread=128386&cat=53

CSM Technical Support Website www14.software.ibm.com/webapp/set2/sas/f/csm/home.html

CSM Mailing List www.ibm.com/developerworks/forums/dw_forum.jsp?forum=907&cat=53

xCSM Toolkit www14.software.ibm.com/webapp/set2/sas/f/csm/utilities/xCSMfixhome.html


CSM Value Summary

● Performance – efficient monitoring, reduced network traffic in the cluster
● Automated error detection – can lead to problem avoidance, rapid resolution and recovery
● Use of Open Source products and focus on usability – can lead to less time to get administrators up to speed
● Automated setup of security and predefined node groups – leads to quick setup of the administrator's cluster
● Rich set of remote management tools, all from one single point-of-control – can lead to much more efficient utilization of admin time and reduced cost of ownership
● Flexibility – ability to manage machines across multiple hardware domains, switch topologies and HA clusters

Backup Charts – CSM Product History


Highlights for CSM V1.7.0.10 (Apr 2008)

New Support for AIX 6 TL1 and AIX 5.3 TL8

New Hardware support

Power Systems: 575, 595
BladeCenter JS22 and JS12
InfiniBand QLogic switch

HA MS for AIX 6 TL1 using TSA 2.3 FixPack 3

Highlights for CSM V1.7 (Nov 2007)

New Support for AIX 6 (AIX 5.3 still supported, AIX 5.2 is not)

New Hardware support

Power Systems: 570, 520 and 55
System x3550, x3950(m2), x3455, x3755 servers
New Blue Gene/P Solution

DVD support and enhanced support for Linux diskless configuration

Performance and consistency enhancements for dsh and CFM utilities

Enhancements to the RMC Resource Monitors for AIX error log, AIX syslog and Linux syslog.

Pre-defined (10 min) shutdown of inactive RMC Resource Monitors


Highlights for CSM V1.6 (Oct 2006)

Hierarchical management servers support is added.

New Hardware support

System x3455, x3550, x3650, x3655, x3755 and x3850 servers
xSeries® 366 server
AMD Opteron LS21 and LS41 for IBM BladeCenter®
Intel® HS21 for IBM BladeCenter

Support for SLES 10 on both System p™ and System x™ nodes with CSM for Linux

Support for HMC-less configurations with IVM

Install support for diskless Linux nodes

Support for Linux OS upgrades

Last release of CSM to support AIX 5.2


Highlights for CSM V1.5.1 (May 2006)

Secondary adapter support enhanced to include Infiniband for AIX nodes.

Additional hardware support: BladeCenter JS21 and IBM System p5 510, 510Q, 560Q, 570 and 575.

A CSM 1.5.1 management server in a Blue Gene cluster can now be either AIX or Linux.

New commands for hardware monitoring of FSP connected p5 systems and Cluster Ready Hardware Server HMC domains.

Enhanced scalability:

For systems with POWER-based architecture, the CSM Scaling limit is 768 OS images.


Highlights for CSM V1.5.0 (Nov 2005)

Secondary adapter support integrated into CSM (for Linux nodes: Ethernet; for AIX nodes: Ethernet, Multi-link and HPS).

Blue Gene/L support.

Direct FSP (i.e. no HMC) hardware control for p5 systems.

The following functions still require an HMC: Service Focal Point, DLPAR, LPAR creation, virtual I/O

Support for p5 microcode/firmware updates via CSM

HPC stack installation and configuration (LoadLeveler, GPFS, Parallel ESSL, Parallel Environment).

Enhanced scalability:

For systems with POWER-based architecture, the CSM Scaling limit is 512 OS images.


Highlights for CSM V1.4.1.10 (Aug 2005)

Added support for distributed shell (dsh) pluggable target types.

Full interoperability between AIX or Linux management servers and all supported operating system levels on managed nodes, including SLES 8, SLES 9, Red Hat Enterprise Linux 3 and 4, and AIX 5L.

Enhanced scalability:

For supported systems with x86-based architecture the CSM Scaling limit is 1024 operating system (OS) images.

For systems with POWER-based architecture, the CSM Scaling limit is 256 OS images.

For mixed environments, the limit is 1024 OS images with no more than 256 of those OS images on systems with POWER-based architecture.

Highlights for CSM V1.4.1 (May 2005)

Cluster-Ready Hardware Server (CRHS) function has been added, enabling hardware discovery, HMC-to-server assignment, and service processor password management from the CSM management server for POWER5™ hardware.

Separate installation servers located off the management server for dramatic improvement in installation and configuration capabilities.

Enhanced support for mixed cluster environments, providing more flexibility with multiple operating systems and Linux distributions in one management domain.

Improved handling of CSM event types such as consolidating error reports across platforms.

Implementation documentation for Firewalls.

Merged AIX and Linux documentation.

Support for eServer p5 Virtual I/O Servers and I/O Server Partitions as non-node devices.

Least Privilege Resource Manager (LPRM) now allows non-root users to execute a subset of CSM commands on the management server that normally require root access.


Highlights for CSM V1.4.0.11 (Dec 2004)

Optional CSM High Availability Management Server (HA MS) feature designed to allow automated failover of the CSM management server to a backup management server.

Support for new POWER5 servers – p5 510, 520, 550, 570, 575, 590, 595 running either AIX or Linux.

Greater cluster scaling with each POWER4 HMC up to 32 physical systems and 64 operating system images, and on each POWER5 HMC up to 16 physical systems and 64 operating system images.

Support for up to 1024 nodes on xSeries.

SSH to HMCs for remote commands run via dsh.


Highlights for CSM V1.3.3 (Apr 2004)

New command called dcp to easily copy files across nodes or node groups in the cluster.

Support for representing additional parts of the cluster (such as RSAs, terminal servers for xSeries and HMCs for pSeries) as non-node devices in the cluster, allowing the CSM management server to show the status of non-node devices (network reachability) and be able to power on/off the devices if there is remote power capability.

Support for Serial Over LAN for the HS20 blades for remote console support.

Ability to have Kerberos V5 authentication enabled when running remote commands such as dsh and dcp.

Sample scripts for setting up NTP, automounter and tuning configurations in the cluster. These scripts can be used as-is, or can be modified and extended to meet the needs of the administrator’s environment.

Improved usability on install, including improved feedback, a timeout for installs, and support for installs over non-eth0 interfaces for greater flexibility.