Installing Linux High-Performance Computing Clusters

The first challenge in moving a newly deployed cluster framework to a usable high-performance computing cluster is installation of the operating system as well as third-party software packages. In four- to eight-node clusters, each node can be installed manually. Large, industrial-strength clusters require a more efficient method. This article describes different types of cluster configurations, efficient Linux installation methods, and the benefits of each.

By Christopher Stanton, Rizwan Ali, Yung-Chin Fang, and Munira A. Hussain

High-performance computing (HPC) clusters use three main types of master and compute node configurations: loosely, moderately, and tightly coupled. Each configuration describes the compute nodes' dependency on the master node (see Figure 1). Although all three require the master to be available for a job to run, the master's status does not necessarily affect the compute nodes' availability.

From an operating system viewpoint, the compute nodes in a loosely coupled cluster are fully autonomous machines. Each node has a full copy of the operating system (OS), which allows someone to boot up and log into the node without contacting the master node, unless the network uses dynamic Internet Protocol (IP) addresses. Failure to retrieve a dynamic IP address from the master node will not inhibit a node from successfully starting, but it will be accessible only through a local console.

A moderately coupled cluster binds the compute nodes more closely to the master node. In this configuration, the compute node's boot process requires the master node because, at minimum, the programs and information needed during boot are located on the master. Once the compute node has retrieved all needed file systems from the master, it will act like a stand-alone machine and can be logged into as though all file systems were local.

Tightly coupled systems push the dependence on the master node one step further. The compute node must load its operating system over the network from the master node. Compute nodes in a tightly coupled cluster do not store file systems locally, aside from possibly swap or tmp. From an OS standpoint, few differences exist between the master node and the compute nodes. The ability to log into the compute nodes individually does not exist. The process space is leveled so that the cluster looks more like one large monolithic machine than a cluster of smaller machines.

The following sections explain the utilities or methods available that enable setup and installation of the desired cluster type. Each configuration has inherent advantages and disadvantages, and the discussion explores which configuration best matches particular needs.

    Installing loosely coupled clusters

In a loosely coupled cluster, each compute node has a local copy of the operating system. The most tedious and cumbersome way to install such a cluster is one at a time using a CD. Some automated methods to install a loosely coupled cluster include the following.

The Kickstart file

The Red Hat Kickstart installation method lets a user create a single, simple text file to automate most of a Red Hat Linux installation, such as language selection, network configuration, keyboard selection, boot loader installation (such as the Linux Loader (LILO) or the GRand Unified Bootloader (GRUB)), disk partitioning, mouse selection, and X Window System configuration. The Kickstart file consists of three sections: commands, package list, and scripts.

Commands. The commands section lists all the installation options such as language and partition specification, network configuration, and installation method. For example, administrators can use the network configuration option to specify the node's IP address, host name, and gateway.
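
As an illustration, the commands section for one compute node might contain entries along the following lines. This is a sketch only: the addresses, host name, password, and partition sizes are hypothetical, and the exact set of directives varies with the Red Hat release.

    lang en_US
    keyboard us
    network --bootproto static --ip 10.180.0.101 --netmask 255.255.0.0 --gateway 10.180.0.1 --hostname node1
    rootpw changeme
    zerombr yes
    clearpart --all
    part / --size 4096
    part swap --size 512
    lilo --location mbr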

Packages. The %packages command starts the Kickstart file section that lists the packages to be installed. Packages can be specified by a component name (for a group of related packages) or by an individual package name.

A comps file on the Red Hat Linux CD-ROM (RedHat/base/comps) lists several predefined components. Users also can create their own component and list the packages needed. (Note: To create a component, users must create a new International Organization for Standardization (ISO) image of the CD-ROM with their modified comps file.) The first component in the file is the Base component, which lists the set of packages necessary for Linux to run.
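
A %packages section that installs predefined components plus an individual package could look roughly like the sketch below; the component and package names are illustrative and must match entries in the comps file or the distribution.

    %packages
    @ Base
    @ Networked Workstation
    openssh-server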

Scripts. Administrators can use the post-installation command in the Kickstart file to install packages that are not on the CD-ROM or to further tune the installation, such as customizing the host files or enabling SSH (secure shell).

The post section is usually at the end of the Kickstart file and starts with the %post command. The additional packages must be available from a server on the network, typically the master node.

The %post section would look like Figure 2. This sample command would install the rpm package my_driver.rpm from the server with the IP address 10.180.0.2.

Red Hat 7.1 includes a Kickstart Configurator, a graphical user interface (GUI) to create a Kickstart file (instead of typing). After selecting Kickstart options, a user can click on the Save File button to generate the Kickstart file. The Configurator enables users to select most of the options required for a Kickstart file and provides a good starting point for expert users who may alter the generated file to suit their needs.

Kickstart installation methods

The Installation Method command in the Kickstart file lets administrators specify the installation method: using a local CD-ROM or a local hard drive, or via Network File System (NFS), File Transfer Protocol (FTP), or Hypertext Transfer Protocol (HTTP).
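
For example, the method is selected with a single directive in the commands section. The directive cdrom installs from the local CD-ROM drive, whereas a network installation points at a server; the addresses and paths below are hypothetical:

    nfs --server 10.180.0.2 --dir /opt/nfs_export/RedHat
    url --url ftp://10.180.0.2/pub/RedHat
    url --url http://10.180.0.2/pub/RedHat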

The most cumbersome installation is to create a Kickstart file for each node and save the file to a Red Hat installation boot floppy. When the system is booted from the floppy (the Red Hat Linux CD must be in the CD-ROM and the Kickstart file set to install from CD-ROM), the installation process automatically starts based on the options specified in the Kickstart file on the floppy. Each node has different network settings (IP address and host name) and therefore requires a separate floppy. This method is tedious for large cluster installations: It requires manual intervention to move the floppy and CD from node to node, unless a large number of floppies or CDs are available to simultaneously install all the nodes.

Figure 1. Cluster-level view of master and compute node configurations: loosely coupled (independent compute nodes), moderately coupled (dependent compute nodes), and tightly coupled (integrated compute nodes), each served by a master node.

Figure 2. Post-installation command in the Kickstart file:

# Post-installation commands
%post
rpm -ivh 10.180.0.2:/opt/nfs_export/Beowulf/drivers/my_driver.rpm

A more efficient method uses the network to perform the installation. Here again, each node must have a floppy, but the CD is no longer required. The Installation Method section in the Kickstart file can be changed to support either FTP or NFS installation. Once the Red Hat installation with the Kickstart file has booted, it will retrieve the installation image from a dedicated server (usually the master node) on the network.
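
At the installer boot prompt, the ks option tells the Red Hat installer where to find the Kickstart file. The forms below were typical of Red Hat releases of this era; the server address and file name are hypothetical, and exact spellings can vary by release:

    boot: linux ks=floppy
    boot: linux ks=nfs:10.180.0.2:/kickstart/node1-ks.cfg
    boot: linux ks

The last form, with no argument, causes the installer to request its configuration over BOOTP/DHCP, as described next.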

In the most commonly used installation approach, administrators place the Kickstart file as well as the CD image on the network. A Boot Protocol/Dynamic Host Configuration Protocol (BOOTP/DHCP) server and an NFS server must be on the local network, usually on the cluster master node. The BOOTP/DHCP server must include configuration information for all the machines to be installed in the cluster. The BOOTP/DHCP server provides the client its networking information as well as the location of the installation boot kernel and ramdisk and possibly the location of the Kickstart file. If the location of the Kickstart file is not provided, the installation program will try to read the file /kickstart/1.2.3.4-kickstart, where 1.2.3.4 is the numeric IP address of the machine being installed, on the DHCP server. Finally the client NFS mounts the file's path, copies the specified file to its local disk, and begins installing the machine as described by the Kickstart file.
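
A per-node entry in the master's DHCP configuration (for the ISC dhcpd commonly used on Linux) might look roughly like the following sketch; the MAC address, IP address, and file path are hypothetical:

    host node1 {
        hardware ethernet 00:06:5B:AA:BB:01;      # compute node's MAC address
        fixed-address 10.180.0.101;               # IP address assigned to node1
        option host-name "node1";
        next-server 10.180.0.2;                   # server holding the Kickstart file
        filename "/kickstart/10.180.0.101-kickstart";
    }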

Installing the cluster using SystemImager

SystemImager is a remote OS duplication and maintenance system that reduces the repetitive steps needed to create a cluster of autonomous machines. SystemImager requires the administrator to install and configure an example compute node before cloning the remaining compute nodes. One advantage of this approach is that during installation, the administrator is not required to write specialized scripts to install additional software packages or configure system settings.

In the SystemImager approach, the compute node that will be used as the source or example system is called the golden client. The administrator must first install and configure the machine using traditional methods so that it is representative of all compute nodes in the cluster.

SystemImager, which is installed on the master node, then creates a file system image of the entire golden client machine by using the getimage command. This image only contains files on the remote machine rather than images of the entire partition, which saves space. The prepareclient command creates a partition information table and a list of mounted file systems. This allows partitions to be created with the same mount points and size.
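
In practice, the sequence is roughly the following sketch; the host and image names are hypothetical, and option spellings vary between SystemImager versions:

    # On the golden client: record partition and file system layout
    prepareclient

    # On the master node: pull the golden client's files into a named image
    getimage -golden-client node1 -image compute_v1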

The master node now contains the information to create a duplicate of the golden client (see Figure 3). During the installation of compute nodes, the addclients command allows administrators to adjust system-specific configuration information on each node. The addclients command prompts for a host-name base and range, client image, and IP address. The base represents the static part of the host name, and the range represents a starting and ending index to append to the host name. For example, if we chose node as the base and 1-3 as the range, installation routines would be created for node1, node2, and node3.

When the naming convention has been finalized, the administrator will be prompted to assign an install image to these machines and then an IP address to each node. The host name and associated IP address are then added to a host-name file that will be used during both the boot and installation process.
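
For the node/1-3 example above, the resulting host-name mappings amount to entries of this form (the subnet is hypothetical):

    10.180.0.101    node1
    10.180.0.102    node2
    10.180.0.103    node3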

Upon completion of these steps on the master node, the compute-node boot method must be chosen. The SystemImager kernel and ramdisk can be booted from portable media such as a floppy or CD-ROM (created by either the makeautoinstallfloppy or makeautoinstallcd commands, respectively). Alternatively, the kernel and ramdisk can be booted over the network via Preboot Execution Environment (PXE).

SystemImager contains prebuilt configuration files for the Linux PXE server (PXELinux) that must be running on the master node. PXE is a lightweight protocol that enables the compute node to contact a BOOTP/DHCP server. BOOTP (and DHCP, which is an extension of BOOTP) allows a server to provide the client, identified by its hardware Media Access Control (MAC) address, with much of its initial configuration information, such as IP address, subnet mask, broadcast address, network address, gateway address, host name, and kernel and ramdisk download path.
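
A minimal PXELinux configuration for network-booting an autoinstall kernel and ramdisk would look roughly like the sketch below; the file names are hypothetical, and SystemImager ships its own prebuilt versions of these files:

    # /tftpboot/pxelinux.cfg/default (sketch)
    DEFAULT systemimager
    LABEL systemimager
        KERNEL kernel                 # autoinstall kernel served over TFTP
        APPEND initrd=initrd.img      # autoinstall ramdisk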

Figure 3. SystemImager installation method: the master node grabs the golden client image, and the remaining compute nodes pull the image from the master node.


Once the node has booted, it must retrieve its IP address and host name. This is accomplished by a DHCP server on the master node that assigns both values or by placing both values on the floppy disk used to boot the node. SystemImager provides a DHCP configuration-building tool, makedhcpserver, that will construct a DHCP configuration file that maps a host name and IP address. The makedhcpstatic command can create static mappings between a specific machine and a host name/IP address pair.

Maintaining the cluster with SystemImager

An administrator also can use the golden client image as a change log and a single point of administration for cluster-wide modifications, from a single file to an entire package. First the cluster administrator makes the desired modifications to the golden client. Next, the administrator either updates the currently used image or creates a new image from which to base the cluster.

This approach enables the administrator to create a version history in case a change breaks the cluster. Once the new image has been created, the remaining compute nodes are synced with the changes. This generally requires minimal time because only the modified files are copied to each node. If a change does disrupt the cluster, it can be re-synced from an earlier image that is known to work.
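
SystemImager performs this synchronization with its updateclient command, run on each compute node (or pushed out with a remote shell). A sketch, assuming a new image named compute_v2 and a master host named master; option spellings vary between versions:

    # On the master: capture the modified golden client as a new image version
    getimage -golden-client node1 -image compute_v2

    # On each compute node: sync the running system against the new image
    updateclient -server master -image compute_v2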

Installing moderately coupled clusters

In a moderately coupled cluster, each compute node can be accessed as an individual machine, but each node does not have its own local copy of the OS. Administrators can use many different methods to install a moderately coupled system. The following section describes two common methods: the hybrid model (temporary data is stored locally) and the fully diskless model (the compute nodes do not have hard drives). Both methods use a central server to store and load the OS and other system information.

Booting the compute node from the network

The compute node will need to retrieve many essential OS components over the network. First it must be able to network boot, which requires the compute node to support a network booting protocol, such as PXE, so the node can contact a BOOTP/DHCP server for configuration information.

Each time a node boots, it is assigned its network information and given a path from which to download a Linux kernel and ramdisk via Trivial FTP (TFTP). Although the kernel and ramdisk can be located on the same server as the BOOTP/DHCP server, it is not a requirement. The kernel must be built with support for initial ramdisk enabled because the entire root (/) file system will be located inside the ramdisk. This allows the node to boot without an NFS mount.
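
In the kernel configuration of that era (Linux 2.2/2.4), the relevant build options are the RAM disk and initial ramdisk settings, roughly:

    # RAM disk support
    CONFIG_BLK_DEV_RAM=y
    # Initial ramdisk (initrd) support
    CONFIG_BLK_DEV_INITRD=y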

To use the ramdisk as a local root file system, some modifications are required. When the kernel and ramdisk are loaded into memory, the kernel mounts the ramdisk as a read-write file system. Next it looks for a /linuxrc file (a binary executable or a script beginning with #!). After this file has finished running, the kernel unmounts the ramdisk and mounts a traditional root file system from disk. Because the root file system does not exist locally on disk, the /linuxrc file must be linked to /sbin/init so the OS will run directly from the ramdisk.
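
Inside the ramdisk image, this link can be created with a single command, assuming the ramdisk's contents are staged under a working directory (the directory name here is hypothetical) before the image is built:

    cd /tmp/initrd            # staging directory holding the ramdisk's root file system
    ln -s sbin/init linuxrc   # make /linuxrc run /sbin/init at boot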

At this point, the implementation of the two methods splits.

Hybrid model. In the hybrid model, the newly booted kernel checks for the existence of swap, var, and tmp partitions for local storage and logging. If correctly sized partitions do not exist on the local media, they are created and mounted. The hybrid model reduces the network load, stores logs on a static storage device, and provides swap space for memory swapping. Storing logs statically allows the log to be saved across reboots.
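
The check-and-mount logic could be sketched in /linuxrc (or a boot script it calls) roughly as follows; the device names are hypothetical, and the sketch omits the partition-creation and size-check steps:

    # Activate a local swap partition if one is present
    if [ -b /dev/hda2 ]; then
        mkswap /dev/hda2
        swapon /dev/hda2
    fi

    # Mount local var and tmp partitions when they exist
    [ -b /dev/hda3 ] && mount /dev/hda3 /var
    [ -b /dev/hda5 ] && mount /dev/hda5 /tmp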

Diskless model. The diskless model uses var and tmp directories located on either the initial ramdisk, another ramdisk downloaded later, or NFS-mounted directories. Using an NFS-mounted var directory provides a single point of access to cluster-wide log files, and the logs will not be lost because of rebooting. This benefit can ease administration: running utilities locally on the NFS-exporting machine allows log monitoring as well as condensed log reporting. Because no local disk exists, memory swapping must occur over NFS or not at all.
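
The NFS-mounted directories amount to per-node entries of the following form in the node's file system table; the export paths and master host name are hypothetical:

    master:/export/nodes/node1/var   /var   nfs   defaults   0 0
    master:/export/nodes/node1/tmp   /tmp   nfs   defaults   0 0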

Memory swapping is important because if an application's needs exceed the amount of available physical memory, the program could crash or the operating system might hang. Although memory can be swapped across the network to nonlocal drives, this action will drastically degrade the cluster's performance. For these reasons, if jobs with unknown memory requirements will run on the cluster, diskless nodes are not recommended.

Moderately coupled clusters offer administrative benefits

Under both methods, after the final directories have been mounted over the network or the local drive, the compute node will be fully booted and ready to accept jobs. From an external as well as a cluster viewpoint, each compute node acts as an individual machine.

From an administrative viewpoint, upgrading each compute node is vastly simplified. Because only one compute node image exists, administrators need to upgrade only one image rather than all the compute nodes. Modifications or updates made to an NFS shared directory take effect immediately. If the changes are made to the compute node kernel, ramdisk, or any other piece of the OS that has been downloaded, administrators must reboot each machine for the changes to take effect.
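
That reboot can be scripted from the master node with whatever remote shell the cluster uses; a sketch assuming rsh access and the node1-node3 naming from earlier:

    for n in node1 node2 node3; do
        rsh $n /sbin/reboot     # each node reloads the updated kernel/ramdisk on its next boot
    done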

Tightly coupled clusters remove distinctions

Tightly coupled clusters try to remove the distinction between the compute nodes and the cluster. Users see only the master node, which looks like a massively parallel computer. The compute nodes are simply an extension of the master node and are pure processing platforms. The Scyld Beowulf system creates such a cluster.

Implementing Scyld Beowulf

Scyld is a commercial package that provides a simple and convenient way to install, administer, and maintain a Beowulf cluster. A set of multiple compute nodes is managed and controlled from a master node. This cluster differs from a traditional Beowulf deployment because the cluster behaves much like a single, large-scale parallel computer. The compute nodes are pure compute nodes that do not offer a login shell. System tools have been modified so administrators can view the entire process space from the master node, similar to how the local process space is viewed on a stand-alone machine.

Installation of the master node begins with the Scyld Beowulf CD, which is a modified version of Red Hat Linux 6.2. These modifications include additional cluster management and control applications, message passing libraries, and a modified kernel for the compute nodes. Once the master node has been installed from the CD, administrators can choose from a few compute node installation methods. A compute node installation transfers the necessary kernel, ramdisk, and libraries, and occurs in two stages.

Stage 1. Several media can transfer the stage-1 kernel and ramdisk to the compute nodes: floppy disk, CD-ROM, basic input/output system (BIOS), hard drive, or over the network using a network booting protocol such as PXE. The first four only require that a bootable kernel be located on the chosen device. Booting the node from the network requires modifications to the master node. The master node will need a TFTP server and a DHCP server installed and configured. When a compute node first boots, it will receive an IP address and the location from which to download the stage-1 kernel and ramdisk via TFTP. Once a node has booted into stage 1, it is ready to be added to the cluster.

Stage 1 places the compute node in a loop where it will repeatedly broadcast its MAC address via Reverse Address Resolution Protocol (RARP) to indicate that it is available. On the master node, daemons that are part of the Scyld system will detect this activity and add the MAC to a list of unknown machines. A graphical utility called beosetup allows the administrator to migrate machines from the list of unknown machines to the compute node list. Compute nodes are assigned their node identifier based on the order in which they are added to the compute node list.

Stage 2. This stage begins when the machine has been placed on the compute node list. The master node first transfers the stage-2 kernel to the selected compute node over a TCP/IP connection. The currently running stage-1 kernel then switches to the stage-2 kernel by a technique known as two kernel monte.

After this kernel finishes booting, it downloads all required shared libraries, mounts exported file systems, and downloads and starts two user daemons, beostats and beoslave. These user daemons enable the transfer of node statistics (such as load and memory pressure) to the master node and the migration of processes from the master node, respectively. A login console is never available on the compute node, so all control and monitoring is centralized at the master node and communicated through the two user daemons.

Advantages and disadvantages of tightly coupled clusters

It should be noted that no information is permanently stored on the compute nodes. The Scyld system allows the creation of completely diskless compute nodes if desired but also provides the ability to locally install parts of the system, such as the stage 1 and 2 pieces and swap space (much like the moderately coupled method). From a user and administrator standpoint, the cluster is one large machine, which is the major advantage of this style of Beowulf cluster. The tightly integrated suite of cluster management and clustering utilities offers the administrator a uniform set of tools, which reduces time requirements and interoperability problems.

Unfortunately, this clustering style has some of the disadvantages of the moderately coupled clustering style and also ties the cluster to a single software vendor for the entire solution stack.
