Cluster Server

39
Step-by-Step Guide to Installing, Configuring, and Tuning a High- Performance Compute Cluster White Paper Published: June 2007 For the latest information, please see http://www.microsoft.com/windowsserver2003/ccs

Transcript of Cluster Server

Page 1: Cluster Server

Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute Cluster White Paper

Published: June 2007

For the latest information, please see http://www.microsoft.com/windowsserver2003/ccs

Page 2: Cluster Server

1

Contents

Introduction .................................................................................................................................1

Before You Begin ........................................................................................................................4

Plan Your Cluster ...................................................................................................................4

Install Your Cluster Hardware ................................................................................................5

Configure Your Cluster Hardware ..........................................................................................6

Obtain Required Software ......................................................................................................6

Installation, Configuration, and Tuning Steps .......................................................................... 11

Step 1: Install and Configure the Service Node .................................................................. 11

Step 2: Install and Configure ADS on the Service Node ..................................................... 15

Step 3: Install and Configure the Head Node ...................................................................... 19

Step 4: Install the Compute Cluster Pack............................................................................ 21

Step 5: Define the Cluster Topology ................................................................................... 22

Step 6: Create the Compute Node Image ........................................................................... 22

Step 7: Capture and Deploy Image to Compute Nodes ...................................................... 25

Step 8: Configure and Manage the Cluster ......................................................................... 26

Step 9: Deploy the Client Utilities to Cluster Users ............................................................. 28

Appendix A: Tuning your Cluster ............................................................................................. 30

Appendix B: Troubleshooting Your Cluster ............................................................................. 33

Appendix C: Cluster Configuration and Deployment Scripts ................................................... 36

Related Links ........................................................................................................................... 37

Page 3: Cluster Server

1 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 1

Introduction

High-performance computing is now within reach for many businesses by clustering industry-standard servers. These clusters can range from a few nodes to hundreds of nodes. In the past, wiring, provisioning, configuring, monitoring, and managing these nodes and providing appropriate, secure user access was a complex undertaking, often requiring dedicated support and administration resources. However, Microsoft

® Windows

® Compute Cluster

Server 2003 simplifies installation, configuration, and management, reducing the cost of compute clusters and making them accessible to a broader audience.

Windows Compute Cluster Server 2003 is a high-performance computing solution that uses

clustered commodity x64 servers that are built with a combination of the Microsoft Windows

Server® 2003 Compute Cluster Edition operating system and the Microsoft Compute Cluster

Pack. The base operating system incorporates traditional Windows system management

features for remote deployment and cluster management. The Compute Cluster Pack

contains the services, interfaces, and supporting software needed to create and configure the

cluster nodes, as well as the utilities and management infrastructure. Individuals tasked with

Windows Compute Cluster Server 2003 administration and management have the advantage

of working within a familiar Windows environment, which helps enable users to quickly and

easily adapt to the management interface.

Windows Compute Cluster Server 2003 is a significant step forward in reducing the barriers to deployment for organizations and individuals who want to take advantage of the power of a compute clustering solution.

- Integrated software stack. Windows Compute Cluster Server 2003 provides an

integrated software stack that includes operating system, job scheduler, message passing

interface (MPI) layer, and the leading applications for each target vertical.

- Better integration with IT infrastructure. Windows Compute Cluster Server 2003

integrates seamlessly with your current network infrastructure (for example, Active

Directory®), enabling you to leverage existing organizational skills and technology.

- Familiar development environment. Developers can leverage existing Windows-based

skills and experience to develop applications for Windows Compute Cluster Server 2003.

Microsoft Visual Studio® is the most widely used integrated development environment

(IDE) in the industry, and Visual Studio 2005 includes support for developing HPC

applications, such as parallel compiling and debugging. Third-party hardware and

software vendors provide additional compiler and math library options for developers

seeking an optimized solution for existing hardware. Windows Compute Cluster Server

2003 supports the use of MPI with Microsoft‘s MPI stack, or the use of stacks from other

vendors.

This step-by-step guide is based on the highly successful cluster deployment at National Center for Supercomputing Applications (NCSA) at the University of Illinois at Champaign-Urbana. The cluster was built as a joint effort between NCSA and Microsoft, using commonly available hardware and Microsoft software. The cluster was composed of 450 x64 servers, achieving 4.1 teraflops (TFLOPs) on 896 processors using the widely accepted LINPACK benchmark. Figure 1 shows the cluster topology used for the NCSA deployment, including the public, private, and MPI networks.

Page 4: Cluster Server

2 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 2

Internet

· Windows Server 2003 R2

Standard x64 Edition

· Microsoft Compute Cluster

Pack

· SQL Server 2005 Standard

Edition x64

· Windows Server 2003

Compute Cluster Edition

· Microsoft Compute

Cluster Pack

· All other required apps

and drivers installed

· Windows Server 2003

Enterprise Edition x86

· Automated

Deployment Services

· Compute node images

· DNS & DHCP

· Active Directory

· Windows Server 2003

Compute Cluster

Edition

· Compute Cluster Pack

Public Network

(GigE)

Private Network

(GigE)

MPI Network (IB)

Compute Node Image

Head Node

Service Node

(ADS Server)

Compute Nodes

Figure 1 Supported cluster topology similar to NCSA-deployment topology

Although every IT environment is different, this guide can serve as a basis for setting up your large-scale compute cluster. If you need additional guidance, see the Related Links section at the end of this guide for more resources.

Note

The intended audience for this document is network administrators who have at least two years‘ experience with network infrastructure, management, and configuration. The example deployment outlined in this document is targeted at clusters in excess of 100 nodes. Although the steps discussed here will work for smaller clusters, they represent steps modeled on large deployments for enterprise-scale and research-scale clusters.

Page 5: Cluster Server

3 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 3

Note

The skill level that is required to complete the steps in this document assumes knowledge of how to install, configure, and manage Microsoft Windows Server 2003 in an Active Directory environment, and experience in adding and managing computers and users within a domain.

Note

This is Version 1 of this document. To download the latest updated version, visit the Microsoft Web site (http://www.microsoft.com/hpc/). The update may contain critical information that was not available when this document was published.

Page 6: Cluster Server

4 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 4

Before You Begin

Setting up a compute cluster with Windows Server 2003 Compute Cluster Edition begins with the following tasks:

1. Plan your cluster.

2. Install your cluster hardware.

3. Configure your cluster hardware.

4. Obtain required software.

When you have completed these tasks, use the steps in the Installation, Configuration, and Tuning Steps section to help you install, configure, and tune your cluster.

Plan Your Cluster

This step-by-step guide provides basic instructions on how to deploy a Windows compute cluster. Your cluster planning should cover the types of nodes that are required for a cluster, and the networks that you will use to connect the nodes. Although the instructions in this guide are based on one specific deployment, you should also consider your environment and the number and types of hardware you have available.

Your cluster requires three types of nodes:

- Head node. A head node mediates all access to the cluster resources and acts as a

single point for cluster deployment, management, and job scheduling. There is only one

head node per cluster.

- Service node. A service node provides standard network services, such as directory and

DNS and DHCP services, and also maintains and deploys compute node images to new

hardware in the cluster. Only one service node is needed for the cluster, although you can

have more than one service node for different roles in the cluster—for example, moving

the image deployment service to a separate node.

- Compute node. A compute node provides computational resources for the cluster.

Compute nodes are provided jobs and are managed by the head node.

Additional node types that can be used but are not required are remote administration nodes and application development nodes. For an overview of device roles in the cluster, see the Windows Compute Cluster Server 2003 Reviewers Guide (http://www.microsoft.com/windowsserver2003/ccs/reviewersguide.mspx).

Your cluster also depends on the number and types of networks used to connect the nodes. The Reviewer‘s Guide discusses the topologies that you can use to connect your nodes, by using combinations of private and public adapters for message passing between the nodes and system traffic among all of the nodes. For the cluster detailed in this guide, the head node and service node have public and private adapters for system traffic, and the compute nodes have private and message passing interface (MPI) adapters. (Note: This is not a supported topology but is very similar to one that is.) Consult the Reviewer‘s Guide for the advantages of each network topology.

Lastly, you should consider the level of cluster expertise, networking knowledge, and amount of management time available on your staff to dedicate to your cluster. Although deployment and management is simplified with Windows Compute Cluster Server 2003, keep in mind that no matter what the circumstances, a large-scale compute cluster deployment should not be taken lightly. It is important to understand how management and deployment work when

Page 7: Cluster Server

5 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 5

planning for the appropriate resources. Compute Cluster Server uses robust, enterprise-grade technologies for all aspects of network and device management. Its management tools and programs allow granular, role-based management of security for cluster administration and cluster users, and its network and system management tools can easily and quickly deploy applications and jobs using familiar, wizard-based interfaces. Additional compute nodes can be added automatically to the compute cluster by simply plugging the nodes in and connecting them to the cluster. Extensive (and expensive) daily hands-on tweaking, configuration, and management are not needed when using commodity hardware and a standards-based infrastructure.

Install Your Cluster Hardware

For ease of management and configuration, all nodes in the deployment in this guide will use the same basic hardware platform. Hardware requirements for computers running Windows Compute Cluster Server 2003 are similar to those for Windows Server 2003, Standard x64 Edition. You can find the system requirements for your cluster at http://www.microsoft.com/windowsserver2003/ccs/sysreqs.mspx. Table 1 shows a list of hardware for all nodes. This list is based on the hardware used in the NCSA deployment.

Table 1: Hardware for All Nodes

Component Recommended Hardware

CPU Blade servers - Each blade has two single-core

3.2 GHz processors with 2 MB cache and an

800MHz front-side bus. Motherboard includes 4x

PCI Express slots.

RAM 4 x 1GB 400 MHz DIMMs. For compute nodes,

you should plan on having 2 GB RAM per core.

Storage SCSI adapter, 73GB 10K RPM Ultra320 SCSI

disk. RAID may be used on any node, but was

not used in this deployment. For the head node,

you should plan on having three disks: one for

the OS, one for the database, and one for the

transaction logs. This will provide improved

performance and throughput.

Network Interface Cards 1000 Mb Gigabit Ethernet adapter

1x InfiniBand 4x PCE Express adapter

Gigabit Network Hardware 48-port Gigabit switch per rack: 40 ports for

blades, 4 for uplink to ring

48-port Layer 2 Gigabit switches in ring

configuration

InfiniBand Network Hardware 5x 24-port InfiniBand switches per rack

2x 96-port InfiniBand switches for cross-rack

connectivity

Note

The head node and the network services node each use two Gigabit Ethernet network adapters; both the compute nodes and the head nodes use the private MPI network, though the head node‘s MPI interface was disabled for this specific deployment. Also, the service node requires a 32-bit operating system, since ADS will only work with 32-bit, but you can run the operating system on 32-bit or 64-bit hardware. (This is a custom configuration used on the

Page 8: Cluster Server

6 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 6

cluster deployment at NCSA and is not supported for general use. However, it is very similar to a supported cluster topology. For more information on supported cluster topologies, please refer to the Windows Compute Cluster Server 2003 Reviewers Guide.)

Configure Your Cluster Hardware

When you have added your switches and blades to the rack, you must configure the network connections and network hardware prior to installing the network software. To configure your hardware, follow the checklist in Table 2.

Table 2: Hardware Configuration Checklist

Check when completed

Configuration Item

Connect all high-speed interconnect connections from the pass-through

module on the chassis to the rack‘s high-speed interconnect switches.

Connect all Gigabit Ethernet connections from the pass-through

module on the chassis to the rack‘s 48-port Gigabit Ethernet switch.

Connect all Infiniband switches to the Layer 2 switches.

Connect all Gigabit Ethernet switches to the Gigabit Ethernet Layer 2

switches.

Disable the built-in subnet manager on all switches. The built-in subnet

manager doesn‘t support OpenIB clients, and conflicts with the subnet

manager that does support such clients.

Change the BIOS boot sequence on all nodes to Network Pre-boot

Execution Environment (PXE) first, CD ROM second, and Hard Drive

third. For platforms that dynamically remove missing devices at power-

up, an efficient way to set the hard drives last in the boot order is to pull

the hard drives, power up the devices once, power off the devices, put

the drives back in, and then power up again. The boot order will be set

correctly thereafter.

Disable hyperthreading on all nodes and set the node‘s system clock to

the correct time zone, if required.

Obtain a list of all private Gigabit Ethernet adapter MAC addresses for

the compute nodes. These addresses are used as input with a

configuration script to identify your nodes and configure them with the

proper image. In some cases you can use the blade chassis telnet

interface to collect the MAC addresses. See Appendix C for a

description of the input file and the file format.

Obtain Required Software

In addition to Windows Compute Cluster Server 2003, you will need to obtain operating systems, administration utilities, drivers, and Quick Fix files to bring your systems up-to-date. Table 3 lists the software required for each node type, and the notes following the chart show you where to obtain the necessary software. The following list is based on the software used in the NCSA deployment.

Page 9: Cluster Server

7 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 7

Table 3: Software Required by Node Type

Software Required by Node Type Head Node

Service Node

Compute Node

Windows Server 2003 R2 Standard Edition x64

Windows Server 2003 R2 Enterprise Edition x86

Windows Server 2003 Compute Cluster Edition x64

Microsoft Compute Cluster Pack

SQL Server 2005 Standard Edition x64

Automated Deployment Services (ADS) version 1.1

Microsoft Management Console (MMC) 3.0

.NET Framework 2.0

Windows Preinstall Environment (WinPE)

QFE KB910481

QFE KB914784

Microsoft System Preparation tool (sysprep.exe)

Cluster configuration and deployment scripts

Latest network adapter drivers

Notes on the software required the deployment described in this paper:

Microsoft SQL Server™ 2005 Standard Edition x64: By default, the Compute Cluster Pack will install MSDE on the head node for data and node tracking purposes. Because MSDE is limited to eight concurrent connections, SQL Server Standard Edition 2005 is recommended for clusters with more than 64 compute nodes.

ADS version 1.1: ADS requires 32-bit versions of Windows Server 2003 Enterprise Edition for image management and deployment. Future Microsoft imaging technology (Windows Deployment Services, available in the next release of Windows Server, code name ―Longhorn‖) will support 64-bit software. You can download the latest version of ADS from the Microsoft Web site (http://www.microsoft.com/windowsserver2003/technologies/management/ads/default.mspx).

Because this paper is based on a previous large-scale compute cluster deployment at NCSA, it details using ADS to deploy compute node images as opposed to using Microsoft Windows Deployment Services (WDS). However, future updates to this paper will explain how to use WDS to deploy compute node images to your cluster.

MMC 3.0: MMC 3.0 is required for the administration node, which may or may not be the head node. It is automatically installed by the Compute Cluster Pack on the computer that is used to administer the cluster. You can also download and install the latest versions for Windows Server 2003 and Windows XP x86 and x64 versions at the Microsoft Web site (http://support.microsoft.com/?kbid=907265).

.NET Framework 2.0: The .NET Framework is automatically installed by the Compute Cluster Pack. You can also download the latest version at the Microsoft Web site (http://msdn2.microsoft.com/en-us/netframework/aa731542.aspx).

Page 10: Cluster Server

8 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 8

WinPE: You will need a copy of Windows Preinstallation Environment for Windows Server 2003 SP1. If you need to add your Gigabit Ethernet drivers to the WinPE image, you will need to obtain a copy of the Windows Server 2003 SP1 OEM Preinstallation Kit (OPK), which contains the programs needed to update the WinPE image for your hardware. WinPE and the OPK are available only to customers with enterprise or volume license agreements; contact your Microsoft representative for more information.

QFE KB910481: This Quick Fix is for potential problems when deploying Winsock Direct in a fast Storage Area Network (SAN) environment. You can download the quick fix at the Microsoft Web site (http://support.microsoft.com/?kbid=910481).

QFE KB914784: This Quick Fix is in response to a Security Advisory and provides additional kernel protection in some environments. You can download the quick fix at the Microsoft Web site (http://support.microsoft.com/?kbid=914784).

Sysprep.exe: Sysprep.exe is used to help prepare the compute node image prior to deployment. Sysprep is included as part of Windows Server 2003 Compute Cluster Edition. Note: You must use the x64 bit version of Sysprep in order to capture and deploy your images.

Cluster configuration and deployment scripts: These scripts are available to download at http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/default.mspx. They include hard-coded paths and require you to follow the installation and usage instructions exactly as described in this guide. If you must modify the scripts for your deployment, make sure that you verify that the scripts work in your environment before using them to deploy your cluster.

For the scripts to run properly, you will also need specific information about your cluster and its hardware. Appendix C contains a sample input file (AddComputeNodes.csv) that is used to automatically configure the compute cluster nodes and populate Active Directory with node information. Table 4 lists the specific items needed, with room for you to write down the values for your deployment. You can then use this information when building your cluster and when creating your compute node images. Follow the instructions in Appendix C for creating your own sample input file.

Note

Every item in Table 4 must have an entry or the input file will not work properly. If you do not have a value for a field, use a hyphen ‗-‗ for the field instead.

Latest network adapter drivers: Contact the manufacturer of your network adapters for the most recent drivers. You will need to install these drivers on your cluster nodes.

Table 4: Cluster Information Needed for Script Input File

Input Value Your Value Description

FullName Populates the cluster node registry with

the Registered Owner name.

Organisation name Populates the cluster node registry with

the Registered Organization name.

ProductKey 25-digit alphanumeric product key used

for all compute cluster nodes. Contact

your Microsoft representative for your

volume license key.

Server Name Populates Active Directory with a

Compute Cluster node name.

Page 11: Cluster Server

9 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 9

Input Value Your Value Description

Srv Description Populates the ADS Management

console with a text description of the

node. Can be used to list rack placement

or other helpful information.

Server MAC Gigabit Ethernet MAC address for each

compute cluster node.

Machine Name Used to configure the cluster node with a

machine name. Must match the value in

the Server Name field.

Admin Password Local administrator password.

Domain The cluster domain name (for example,

―HPCCluster.local‖).

Domain Username Account name with permission to add

computers to a domain.

Domain Password Password for the account with

permission to add computers to a

domain.

ImageName The image name to be installed on the

cluster node (for example, CCSImage).

HPC Cluster Name The head node name must be used for

the cluster name.

NetworkTopology Must be ―Single‖.

PartitionSize Not used.

PublicIP Not used.

PublicSubnet Not used.

PublicGateway Not used.

PublicDNS Not used.

PublicNICName Not used.

PublicMAC Not used.

PrivateIP Not used.

PrivateSubnet Not used.

PrivateGateway Not used.

PrivateDNS Not used.

PrivateNICName Not used.

PrivateMAC Not used.

MPIIP Assigns a static address to the MPI

adapter (for example, 11.0.0.1).

MPISubnet Assigns a subnet mask to the MPI

adapter (for example, 255.255.0.0).

MPIGateway Not used.

MPIDNS Not used.

Page 12: Cluster Server

10 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 10

Input Value Your Value Description

MPINICName Not used.

MPIMAC Not used.

MachineOU Populates Active Directory with Machine

OU information (for example,

OU=Cluster

Servers,DC=HPCCluster,DC=local).

Page 13: Cluster Server

11 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 11

Installation, Configuration, and Tuning Steps

To install, configure, and tune a high-performance compute cluster, complete the following steps:

1. Install and configure the service node.

2. Install and configure ADS on the service node.

3. Install and configure the head node.

4. Install the Compute Cluster Pack.

5. Define the cluster topology.

6. Create the compute node image.

7. Capture and deploy image to compute nodes.

8. Configure and manage the cluster.

9. Deploy the client utilities to cluster users.

Step 1: Install and Configure the Service Node

The service node provides all the back-end network services for the cluster, including

authentication, name services, and image deployment. It uses standard Windows technology

and services to manage your network infrastructure. The service node has two Gigabit

Ethernet network adapters and no MPI adapters. One adapter connects to the public network;

the other connects to the private network dedicated to the cluster.

There are five tasks that are required for installation and configuration:

1. Install and configure the base operating system.

2. Install Active Directory, Domain Name Services (DNS), and DHCP.

3. Configure DNS.

4. Configure DHCP.

5. Enable Remote Desktop for the cluster.

Install and configure the base operating system. Follow the normal setup procedure for Windows Server 2003 R2 Enterprise Edition, with the exceptions as noted in the following procedure.

To install and configure the base operating system

Boot the computer to the Windows Server 2003 R2 Enterprise Edition CD.

1. Accept the license agreement.

2. On the Partition List screen, create two partitions: one partition of 30 GB, and a

second using the remainder of the space on the hard drive. Select the 30 GB partition

as the install partition, and then press ENTER.

3. On the Format Partition screen, accept the default of NTFS, and then press ENTER.

Proceed with the remainder of the text-mode setup. The computer then reboots into

graphical setup mode.

Page 14: Cluster Server

12 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 12

4. On the Licensing Modes page, select the option for which you are licensed, and

then configure the number of concurrent connections if needed. Click Next.

5. On the Computer Name and Administrator Password page, type a name for the

service node (for example, SERVICENODE). Type your local administrator password

twice, and then press ENTER.

6. On the Networking Settings page, select Custom settings, and then click Next.

7. On the Networking Components page for your private adapter, select Internet

Protocol (TCP/IP), and then click Properties. On the Internet Protocol (TCP/IP)

Properties page, select Use the following IP address. Configure the adapter with a

static nonroutable address, such as 10.0.0.1, and a 24-bit subnet mask (255.0.0.0).

Select Use the following DNS server addresses, and then configure the adapter to

use 127.0.0.1. Click OK, and then click Next.

Note: If this computer has a 1394 Net Adapter, it will ask you to set the IP for that

adapter first (before setting setting TCP/IP properties). Click Next to skip this page

(unnecessary to the cluster deployment) and move on to setting the TCP/IP

properties.

8. Repeat the previous step for the public adapter. Configure the adapter to acquire its

address by using DHCP from the public network. If you prefer, you can assign it a

static address if you have one already reserved. Configure the public adapter to use

127.0.0.1 for DNS queries. Click OK, and then click Next.

9. On the Workgroup or Computer Domain page, accept the default of No and the

default of WORKGROUP, and then click Next. The computer will copy files, and then

reboot.

10. Log in to the server as administrator. Click Start, click Run, type diskmgmt.msc, and

then click OK. The Disk Management console starts.

11. Right-click the second partition on your drive, and then click Format. In the Format

dialog box, select Quick Format, and then click OK. When the format process is

finished, close the Disk Management console.

Install Active Directory, DNS, and DHCP. Windows Server 2003 provides a wizard to

configure your server as a typical first server in a domain. The wizard configures your server

as a root domain controller, installs and configures DNS, and then installs and configures

DHCP.

To install Active Directory, DNS, and DHCP

1. Log in to your service node as Administrator. If the Manage Your Server page is not

visible, click Start, and then click Manage Your Server.

2. Click Add or remove a role. The Configure Your Server Wizard starts. Click Next.

3. On the Configuration Options page, select Typical configuration for a first server,

and then click Next.

4. On the Active Directory Domain Name page, type the domain name that will be

used for your cluster and append the ―.local‖ suffix (for example, HPCCluster.local).

Click Next.

Page 15: Cluster Server

13 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 13

5. On the NetBIOS Domain Name page, accept the default NetBIOS name (for

example, HPCCLUSTER) and click Next. At the Summary of Selections page, click

Next. If the Configure Your Server Wizard prompts you to close any open

programs, click OK.

6. On the NAT Internet Connection page, make sure the public adapter is selected.

Deselect Enable security on the selected interface, and then click Next. If you

have more than two network adapters in your computer, the Network Selection page

appears. Select the private LAN adapter and then click Next. Click Finish. After the

files are copied, the server reboots.

7. After the server reboots, log on as Administrator. Review the actions listed in the

Configure Your Server Wizard, and then click Next. Click Finish.

Configure DNS. DNS is required for the cluster and will be used by people who want to use

the cluster. It is linked to Active Directory and manages the node names that are in use. DNS

must be configured so that name resolution will function properly on your cluster. The

following task helps to configure your DNS settings for your private and public networks.

To configure DNS

1. Click Start, and then click Manage Your Server. In the DNS Server section, click

Manage this DNS server. You can also start the DNS Management console by

clicking Start, Administrative Tools, and then DNS.

2. Right-click your server, and then click Properties.

3. Click the Interfaces tab. Select Only the following IP addresses. Select the public

interface, and then click Remove. Only the private interface should be listed. If it is

not, type the IP address of the private interface, and then click Add. This ensures that

your services node will provide DNS services only to the private network and not to

addresses on the rest of your network. Click Apply.

4. Click the Forwarders tab. If the public interface is using DHCP, confirm that the

forwarder IP list has the IP address for a DNS server in your domain. If not, or if you

are using a static IP address, type the IP address for a DNS server on your public

network, and then click Add. This ensures that if the service node cannot resolve

name queries, the request will be forwarded to another name server on your network.

Click OK.

5. In the DNS Management console, select Reverse Lookup Zones. Right-click

Reverse Lookup Zones, and then click New Zone. The New Zone Wizard starts.

Click Next.

6. On the Zone Type page, select Primary zone, and then select Store the zone in

Active Directory. Click Next.

7. On the Active Directory Zone Replication Scope page, select To all domain

controllers in the Active Directory domain. Click Next.

8. On the Reverse Lookup Zone Name page, select Network ID, and then type the

first three octets of your private network‘s IP address (for example, 10.0.0). A reverse

name lookup is automatically created for you. Click Next.

Page 16: Cluster Server

14 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 14

9. On the Dynamic Update page, select Allow only secure dynamic updates. Click

Next.

10. On the Completing the New Zone Wizard page, click Finish. The new reverse

lookup zone is added to the DNS Management console. Close the DNS Management

console.

Configure DHCP. Your cluster requires automated IP addressing services to keep node

traffic to a minimum. Active Directory and DHCP work together so that network addressing

and resource allocation will function smoothly on your cluster. DHCP has already been

configured for your cluster network. However, if you want finer control over the number of IP

addresses available and the information provided to DHCP clients, you must delete the

current DHCP scope and create a new one, using settings that reflect your cluster

deployment.

To configure DHCP

1. Click Start, and then click Manage Your Server. In the DHCP Server section, click

Manage this DHCP server. You can also start the DHCP Management console by

clicking Start, clicking Administrative Tools, and then clicking DHCP.

2. Right-click the scope name (for example, Scope [10.0.0.0] Scope1), and then click

Deactivate. When prompted, click Yes. Right-click the scope again, and then click

Delete. When prompted, click Yes. The old scope is deleted.

3. Right-click your server name and then click New Scope. The New Scope Wizard

starts. Click Next.

4. On the Scope Name page, type a name for your scope (for example, ―HPC Cluster‖)

and a description for your scope. Click Next.

5. On the IP Address Range page, type the start and end ranges for your cluster. For

example, the start address would be the same address used for the private adapter:

10.0.0.1. The end address depends on how many nodes you plan to have in your

cluster. For up to 250 nodes, the end address would be 10.0.0.254. For 250 to 500

nodes, the end address would be 10.0.1.254. For the subnet mask, you can either

increase the length to 16, or type in a subnet mask of 255.255.0.0. Click Next.

6. On the Add Exclusions page, you define a range of addresses that will not be

handed to computers at boot time. The exclusion range should be large enough to

include all devices that use static IP addresses. For this example, type the start

address of 10.0.0.1 and an end address of 10.0.0.9. Click Add, and then click Next.

7. On the Lease Duration page, accept the defaults, and then click Next.

8. On the Configure DHCP Options page, select Yes, I want to configure these

options now, and then click Next.

9. On the Router (Default Gateway) page, type the private network adapter address

(for example, 10.0.0.1), and then click Add. Click Next.

10. On the Domain Name and DNS Servers page, in the Parent domain text box, type

your domain name (for example, HPCCluster.local). In the Server name text box,

type the server name (for example, SERVICENODE). In the IP ADDRESS fields, type

Page 17: Cluster Server

15 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 15

the private network adapter address (for example, 10.0.0.1). Click Add, and then click

Next.

11. On the WINS Servers page, click Next.

12. On the Activate Scope page, select Yes, I want to activate this scope now, and

then click Next.

13. On the Completing the New Scope Wizard page, click Finish. Close the DHCP

Management console.

Enable Remote Desktop for the cluster. You can enable Remote Desktop for nodes on

your cluster so that you can log on remotely and manage services by using the node‘s

desktop.

To disable Windows Firewall and enable Remote Management for the domain

1. Click Start, click Administrative Tools, and then click Active Directory Users and

Computers.

2. Right-click your domain (for example, hpccluster.local), click New, and then click

Organizational Unit.

3. Type the name of your new OU (for example, Cluster Servers) and then click OK. A new

OU is created in your domain.

4. Right-click your OU and then click Properties. The OU Properties dialog appears. Click

the Group Policy tab. Click New. Type the name for your new Group Policy (for example,

Enable Remote Desktop) and then press ENTER.

5. Click Edit. The Group Policy Object Editor opens. Browse to Computer Configuration \

Administrative Templates \ Windows Components \ Terminal Services.

6. Double-click Allow users to connect remotely using Terminal Services. Click Enabled

and then click OK. Close the Group Policy Object Editor.

7. On the OU Properties page, on the Group Policy tab, select your new Group Policy and

then click Options. Click No Override, click OK. You have created a new Group Policy

for your OU that enables Remote Desktop. Click OK.

Step 2: Install and Configure ADS on the Service Node

ADS is used to install compute node images on new hardware with little or no input from the

cluster administrator. This automated procedure makes it easy to set up and install new

nodes on the cluster, or to replace failed nodes with new ones. To install and configure ADS,

perform the following procedures:

1. Copy and update the WinPE binaries.

2. Copy and edit the script files.

3. Install and configure ADS.

4. Share the ADS certificate.

5. Import ADS templates.

6. Add devices to ADS.

Page 18: Cluster Server

16 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 16

Copy and update the WinPE binaries. The WinPE binaries provide a simple operating system for the ADS Deployment Agent and the scripting engine to run scripts against the node. Because the WinPE binaries are based on the installation files that are found on the Windows Server 2003 CD, the driver cabinet files may not include the drivers for your Gigabit Ethernet adapters. If your adapter is not recognized during installation and configuration of your compute node image, you will need to update the WinPE binaries with the necessary adapter drivers and information files.

Note: You can also wait to create the WinPE binaries until after you have installed and configured ADS on the service node.

Copy and update the WinPE binaries

1. Create a C:\WinPE folder on your service node. Copy the WinPE binaries to C:\WinPE.

2. To update your WinPE binaries with the drivers and information files for your adapter,

create a C:\Drivers folder on your service node. Copy the .sys, .inf, and .cat files for your

driver to C:\Drivers.

3. Click Start, click Run, type cmd, and then click OK. A command prompt window opens.

4. Change directories to C:\WinPE\.

5. Type drvinst.exe /inf:c:\drivers\<filename>.inf c:\WinPE, where <filename> is the file name

for your driver‘s .inf file, and then press ENTER. Your WinPE binaries are now updated

with the drivers for your Gigabit Ethernet Adapter.

Copy and edit the script files. Follow the normal setup procedure for Windows Server 2003 R2 Enterprise Edition, with the exceptions noted later.

Copy and edit the script files

1. Create the C:\HPC-CCS. Create three new folders within the HPC-CCS folder: C:\HPC-

CCS\Scripts, C:\HPC-CCS\Sequences, and C:\HPC-CCS\Sysprep. Create the folder

C:\HPC-CCS\Sysprep\I386.

2. Copy the files AddADSDevices.vbs, ChangeIPforIB.vbs, and AddComputeNodes.csv (or

the name of your input file) into C:\HPC-CCS\Scripts. Copy Capture-CCS-image-with-

winpe.xml and Deploy-CCS-image-with-winpe.xml into C:\HPC-CCS\Sequences. Copy

sysprep.inf into C:\HPC-CCS\Sysprep.

3. Insert the Windows Server 2003 Compute Cluster Edition CD into the CD drive. Browse to

the CD folder \Support\Tools. Double-click Deploy.cab. Copy the files sysprep.exe and

setupcl.exe to the C:\HPC-CCS\Sysprep\I386 folder. You must use the 64-bit versions of

these files or the image capture script will not work.

4. Use the chart in Table 4 to edit the file AddComputeNodes.csv (or the name of your input

file) and use the values for your company, your administrator password information, your

product key, MAC addresses, and MachineOU values. The easiest way to work with this

file, especially for entering the MAC addresses, is to import it into Excel as a comma-

delimited file, add the necessary values, and then export the data as a comma-separated

value file.

Install and configure ADS. You can download the ADS binaries from Microsoft, and then either copy them to your service node or burn them onto a CD.

Page 19: Cluster Server

17 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 17

To install and configure ADS

1. Browse to the CD or the folder containing the ADS binaries and then run ADSSetup.exe.

2. A Welcome page appears. Click Install Microsoft SQL Server Desktop Engine SP4

(Windows). The setup program automatically installs the MSDE software.

3. On the Welcome page, click Install Automated Deployment Services. The Automated

Deployment Services Setup Wizard starts. Click Next.

4. On the License Agreement page, select I accept the terms of the license agreement,

and then click Next.

5. On the Setup Type page, select Full installation, and then click Next.

6. The Installing PXE warning dialog appears. Click OK, and then click Next.

7. On the Configure the ADS Controller page, make sure that Use Microsoft SQL Server

Desktop Engine (Windows) is selected, and that Create a new ADS database is

selected. Click Next.

8. On the Network Boot Service Settings page, make sure that Use this path is selected.

Insert the Windows Server 2003 R2 Enterprise Edition x86 CD into the drive. Browse to

the CD drive, or type the drive containing the CD, and then click Next.

9. On the Windows PE Repository page, select Location of Windows PE. Browse to the

folder containing the WinPE binaries (for example, C:\WinPE). In the Repository name

text box, type a name for your repository (for example, NodeImages). Click Next.

10. On the Image Location page, type the path to the folder where the images will be stored.

These must be on the second partition that you created on your server (for example,

E:\Images). The folder will be created and shared automatically. Click Next.

11. If ADS Setup Wizard detects more than one network adapter in your computer, the

Network Settings for ADS Services page is displayed. In the Bind to this IP address

drop-down list, select the IP address that the ADS services will use to distribute images

on the private network, and then click Next.

12. On the Installation Confirmation page, click Install.

13. On the Completing the Automated Deployment Services Setup Wizard page, click

Finish. Close the Automated Deployment Services Welcome dialog box.

14. To open the ADS Management console, click Start, click All Programs, click Microsoft

ADS, and then click ADS Management.

15. Expand the Automated Deployment Services node, and then select Services. In the

center pane, right-click Controller Services, and then click Properties. On the Controller

Service Properties page, select the Service tab, and then change Global job template to

boot-to-winpe. For the Device Identifier, select MAC Address. For the WinPE

Repository Name, type NodeImages or the repository name that you created earlier.

Click Apply, and then click OK.

16. In the ADS Management console, right-click Image Distribution Service, and then click

Properties. Select the Service tab, and ensure that Multicast image deployment is

selected. Click OK.

Share the ADS certificate. ADS creates a computer certificate when it is installed. This certificate is used to identify all computers in the cluster. The certificate must be shared so

Page 20: Cluster Server

18 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 18

that the compute node image can import the certificate and then use it during the configuration process.

To share the ADS certificate

1. Click Start, click Administrative Tools, and then click Server Management. The Server

Management console opens.

2. Click Shared Folders, and then click New File Share. The Share a Folder Wizard

starts. Click Next.

3. On the Folder Path page, click Browse, and then browse to C:\ Program Files\ Microsoft

ADS\ Certificate. Click Next.

4. On the Name, Description, and Settings page, accept the defaults, and then click Next.

5. On the Permissions page, accept the defaults, and then click Finish. Click Close, and

then close the Server Management console. The ADS certificate is shared on your

network.

Import ADS templates. ADS includes several templates that are useful when managing your nodes, including reboot-to-winpe and reboot-to-hd. The templates are not installed by default; you must add them to ADS using a batch file. You also need to add the compute cluster templates to ADS so that you can capture and deploy the compute node image on your network.

To import ADS templates

1. Open Windows Explorer and browse to C:\ Program Files\ Microsoft ADS\ Samples\

Sequences.

2. Double-click create-templates.bat. The script file automatically installs the templates in

ADS. Close Windows Explorer.

3. Click Start, click All Programs, click Microsoft ADS, and then click ADS Management.

The ADS Management console opens.

4. Browse to Job Templates. Right-click Job Templates, and then click New Job

Template. The New Job Template Wizard starts. Click Next.

5. On the Template Type page, select An entirely new template, and then click Next.

6. On the Name and Description page, type a name for the compute node capture

template (for example, Capture Compute Node). Type a description (for example, Run

within Windows Server CCE), and then click Next.

7. On the Command Type page, select Task sequence, and then click Next.

8. On the Script or Executable Program page, browse to C:\hpc-ccs\sequences. Select All

files from the Files of type drop-down list. Select Capture-CCS-image-with-winpe.xml,

and then click Open. Click Next.

9. On the Device Destination page, select None, and then click Next. Click Finish. Your

capture template is added to ADS.

10. Repeat steps 4 through 9. In step 6, use Deploy Compute Node and Run from WinPE as

the name and description. In step 8, select the file Deploy-CCS-image-with-winpe.xml.

When finished, you have added the deployment template to ADS.

Page 21: Cluster Server

19 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 19

Add devices to ADS. Follow the normal setup procedure for Windows Server 2003 R2 Enterprise Edition, with the exceptions noted later.

To add devices to ADS

1. Populate the ADS server with ADS devices. Click Start, click Run, type cmd.exe, and

then click OK. Change the directory to C:\HPC-CCS\Scripts.

2. Type AddADSDevices.vbs AddComputeNodes-Sample.csv (use the name of your input

file instead of the sample file name). The script will echo the nodes as they are added to

the ADS server. When the script is finished, close the command window.

If your company uses a proxy server to connect to the Internet, you should configure your server so that it can receive system and application updates from Microsoft.

1. To configure your proxy server settings, open Internet Explorer®. Click Tools, and

then click Internet Options.

2. Click the Connections tab, and then click LAN Settings.

3. On the Local Area Network (LAN) Settings page, select Use a proxy server for your LAN. Enter the URL or IP address for your proxy server.

4. If you need to configure secure HTTP settings, click Advanced, and then enter the URL and port information as needed.

5. Click OK three times, and then close Internet Explorer.

When you have finished configuring your server, click Start, click All Programs, and then click Windows Update. This will ensure that your server is up-to-date with service packs and software updates that may be needed to improve performance and security.

Step 3: Install and Configure the Head Node

The head node is responsible for managing the compute cluster nodes, performing job control, and acting as the gateway for submitted and completed jobs. It requires SQL Server 2005 Standard Edition as part of the underlying service and support structure. You should consider using three hard drives for your head node: one for the operating system, one for the SQL Server database, and one for the SQL Server transaction logs. This will provide reduced drive contention, better overall throughput, and some transactional redundancy should the database drive fail.

In some cases, enabling hyperthreading on the head node will also result in improved performance for heavily-loaded SQL Server applications.

There are two tasks that are required for installing and configuring your head node:

1. Install and configure the base operating system.

2. Install and configure SQL Server 2005 Standard Edition.

To install and configure the base operating system

1. On the head node computer, boot to the Windows Server 2003 R2 Standard Edition

x64 CD.

2. Accept the license agreement.

Page 22: Cluster Server

20 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 20

3. On the Partition List screen, create two partitions: one partition of 30 GB, and a

second that uses the remainder of the space on the hard drive. Select the 30 GB

partition as the install partition, and then press ENTER.

4. On the Format Partition screen, accept the default of NTFS, and then press ENTER.

Proceed with the remainder of the text-mode setup. The computer then reboots into

graphical setup mode.

5. On the Licensing Modes page, select the option for which you are licensed, and

then configure the number of concurrent connections, if needed. Click Next.

6. On the Computer Name and Administrator Password page, type a name for the

head node (for example, HEADNODE). Type the account with permission to join a

computer to the domain (for example, hpccluster\administrator), type the password

twice, and then press ENTER.

7. On the Networking Settings page, select Typical settings, and then click Next. This

will automatically assign addresses to your public and private adapters. If you want to

use static IP addresses for either interface, select Custom Settings, and then click

Next. Follow the steps that you used to configure your service node adapter settings.

8. On the Workgroup or Computer Domain page, select Yes, make this computer a

member of a domain. Type the name of your cluster domain (for example,

HPCCluster.local), and then click Next. When prompted, type the name and the

password for an account that has permission to add computers to the domain

(typically, the Administrator account), and then click OK. Note: If your network

adapter drivers are not included on the Windows Server 2003 CD, then you will not

be able to join a domain at this time. Instead, make the computer a member of a

workgroup, complete the rest of setup, install your network adapters, and then join

your head node to the domain.

When you have configured the base operating system, you can install SQL Server 2005 Standard Edition on your head node.

To install and configure SQL Server 2005 Standard Edition

1. Log on to your server as Administrator. Insert the SQL Server 2005 Standard Edition x64

CD into the head node. If setup does not start automatically, browse to the CD drive and

then run setup.exe.

2. On the End User License Agreement page, select I accept the licensing terms and

conditions, and then click Next.

3. On the Installing Prerequisites page, click Install. When the installations are complete,

click Next. The Welcome to the Microsoft SQL Server Installation Wizard starts. Click

Next.

4. On the System Configuration Check page, the installation program displays a report

with potential installation problems. You do not need to install IIS or address any IIS-

related warnings because IIS is not used in this deployment. Click Next.

5. On the Registration Information page, complete the Name and Company fields with the

appropriate information, and then click Next.

6. On the Components to Install page, select all check boxes, and then click Next.

Page 23: Cluster Server

21 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 21

7. On the Instance Name page, select Named instance, and then type

COMPUTECLUSTER in the text box. Your cluster must have this name, or Windows

Compute Cluster will not work. Click Next.

8. On the Service Account page, select Use the built-in System account, and then select

Local system in the drop-down list. In the Start services at the end of setup section,

select all options except SQL Server Agent, and then click Next.

9. On the Authentication Mode page, select Windows Authentication Mode. Click Next.

10. On the Collation Settings page, select SQL collations, and then select Dictionary

order case-insensitive for use with 1252 Character Set from the drop-down list. Click

Next.

11. On the Error and Usage Report Settings page, click Next.

12. On the Ready to Install page, click Install. When the Setup Progress page appears,

click Next.

13. On the Completing Microsoft SQL Server 2005 Setup page, click Finish.

14. Open the Disk Management console. Click Start, click Run, type diskmgmt.msc, and then

click OK.

15. Right-click the second partition on your drive, and then click Format. In the Format dialog

box, select Quick Format, and then click OK. When the format process finishes, close

the Disk Management console.

If your company uses a proxy server to connect to the Internet, you should configure your head node so that it can receive system and application updates from Microsoft.

1. To configure your proxy server settings, open Internet Explorer. Click Tools, and then click Internet Options.

2. Click the Connections tab, and then click LAN Settings.

3. On the Local Area Network (LAN) Settings page, select Use a proxy server for your LAN. Enter the URL or IP address for your proxy server.

4. If you need to configure secure HTTP settings, click Advanced, and then enter the URL and port information as needed.

5. Click OK three times, and then close Internet Explorer.

When you have finished configuring your server, click Start, click All Programs, and then click Windows Update. This will ensure that your server is up-to-date with service packs and software updates that may be needed to improve performance and security. You should elect to install Microsoft Update from the Windows Update page. This service provides service packs and updates for all Microsoft applications, including SQL Server. Follow the instructions on the Windows Update page to install the Microsoft Update service.

Step 4: Install the Compute Cluster Pack

When the head node has been configured, you can install the Compute Cluster Pack that contains services, interfaces, and supporting software that is needed to create and configure cluster nodes. It also includes utilities and management infrastructure for your cluster.

Page 24: Cluster Server

22 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 22

To install the Compute Cluster Pack

1. Insert the Compute Cluster Pack CD into the head node. The Microsoft Compute

Cluster Pack Installation Wizard appears. Click Next.

2. On the Microsoft Software License Terms page, select I accept the terms in the

license agreement, and then click Next.

3. On the Select Installation Type page, select Create a new compute cluster with this

server as the head node. Do not use the head node as a compute node. Click Next.

4. On the Select Installation Location page, accept the default. Click Next.

5. On the Install Required Components page, a list of required components for the

installation appears. Each component that has been installed will appear with a check

next to it. Select a component without a check, and then click Install.

6. Repeat the previous step for all uninstalled components. When all of the required

components have been installed, click Next. The Microsoft Compute Cluster Pack

Installation Wizard completes. Click Finish.

Step 5: Define the Cluster Topology

After the Compute Cluster Pack installation for the head node is complete, a Cluster Deployment Tasks window appears with a To Do List. In this procedure, you will configure the cluster to use a network topology that consists of a single private network for the compute nodes and a public interface from the head node to the rest of the network.

To define the cluster topology

1. On the To Do List page, in the Networking section, click Configure Cluster Network

Topology. The Configure Cluster Network Topology Wizard starts. Click Next.

2. On the Select Setup Type page, select Compute nodes isolated on private network

from the drop-down list. A graphic appears that shows you a representation of your

network. You can learn more about the different network topologies by clicking the Learn

more about this setup link. When you have reviewed the information, click Next.

3. On the Configure Public Network page, select the correct public (external) network

adaptor from the drop-down list. This network will be used for communicating between the

cluster and the rest of your network. Click Next.

4. On the Configure Private Network page, select the correct private (internal) adaptor

from the drop-down list. This network will be used for cluster management and node

deployment. Click Next.

5. On the Enable NAT Using ICS page, select Disable Internet Connection Sharing for

this cluster. Click Next.

6. Review the summary page to ensure that you have chosen an appropriate network

configuration, and then click Finish. Click Close.

Step 6: Create the Compute Node Image

You can now create a compute node image. This is the compute node image that will be captured and deployed to each of the compute nodes. There are three tasks that are required to create the compute node image:

Page 25: Cluster Server

23 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 23

1. Install and configure the base operating system.

2. Install and configure the ADS agent and Compute Cluster Pack.

3. Update the image and prepare it for deployment.

To install and configure the base operating system

1. Start the node that you want to use to create your compute node image. Insert the

Microsoft Windows Server 2003 Compute Cluster Edition CD into the CD drive. Text-

mode setup launches automatically.

2. Accept the license agreement.

3. On the Partition List screen, create one partition of 16 GB. Select the 16 GB partition as

the install partition, and then press ENTER.

4. On the Format Partition screen, accept the default of NTFS, and then press ENTER.

Proceed with the remainder of the text-mode setup. The computer then reboots into

graphical setup mode.

5. On the Licensing Modes page, select the option for which you are licensed, and then

configure the number of concurrent connections, if needed. Click Next.

6. On the Computer Name and Administrator Password page, type a name for the

compute node that has not been added to ADS (for example, NODE000). Type your local

administrator password twice, and then press ENTER.

7. On the Networking Settings page, select Typical settings, and then click Next. This will

automatically assign addresses to your public and private adapters. The adapter

information for the deployed nodes will be automatically created when the image is

deployed to a node.

8. On the Workgroup or Computer Domain page, select Yes, make this computer a

member of a domain. Type the name of your cluster domain (for example, HPCCluster),

and then click Next. When prompted, type the name and the password for an account

that has permission to add computers to the domain (for example,

hpccluster\administrator), and then click OK. The computer will copy files, and then

reboot. Note: If your network adapter drivers are not included on the Windows Server

2003 Compute Cluster Edition CD, then you will not be able to join a domain at this time.

Instead, make the computer a member of a workgroup, complete the rest of setup, install

your network adapters, and then join your compute node to the domain.

9. Log on to the node as administrator.

10. Copy the QFE files to your compute node. Run each executable and follow the

instructions for installing the quick fix files on your server.

11. Open Regedit. Click Start, click Run, type regedit, and then click OK.

12. Browse to HKEY_LOCAL_MACHINE\ SYSTEM\ CurrentControlSet\ Services\ Tcpip\

Parameters. Right-click in the right pane. Click New, and then click DWORD value. Type

SynAttackProtect (case sensitive), and then press ENTER.

13. Double-click the new key that you just created. Confirm that the value data is zero, and

then click OK.

Page 26: Cluster Server

24 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 24

14. Right-click in the right pane. Click New, and then click DWORD value. Type

TcpMaxDataRetransmissions (case sensitive), and then press ENTER.

15. Double-click the new key that you just created. In the Value data text box, type 20.

Ensure that Base is set to Hexadecimal, and then click OK.

16. Close Regedit.

17. Disable any network interfaces that will not be used by the cluster, or that do not have

physical network connectivity.

When you have configured the base operating system, you can then install and configure the ADS Agent and the Compute Cluster Pack on your image.

To install and configure the ADS Agent and Compute Cluster Pack

1. Copy the ADS binaries to a folder on the compute node. Browse to the folder, and then

run ADSSetup.exe.

2. A Welcome page appears. Click Install ADS Administration Agent. The

Administration Agent Setup Wizard starts. Click Next.

3. On the License Agreement page, select I accept the terms of the license agreement,

and then click Next.

4. On the Configure Certificates page, select Now. Type the fully-qualified path to the

certificate share on the service node (for example, \\servicenode \Certificate\ adsroot.cer).

Click Next.

5. On the Configure the Agent Logon Settings page, select None, and then click Next.

6. On the Installation Confirmation page, click Install.

7. On the Completing the Administration Agent Setup Wizard page, click Finish. Close

the Automated Deployment Services Welcome page.

8. Insert the Compute Cluster Pack CD into the head node. The Microsoft Compute

Cluster Pack Installation Wizard appears. Click Next.

9. On the Microsoft Software License Terms page, select I accept the terms in the

license agreement, and then click Next.

10. On the Select Installation Type page, select Join this server to an existing compute

cluster as a compute node. Type the name of the head node in the text box (for

example, HEADNODE). Click Next.

11. On the Select Installation Location page, accept the default. Click Next.

12. On the Install Required Components page, a list of required components for the

installation appears. Each component that has been installed will appear with a check

next to it. Select a component without a check, and then click Install.

13. Repeat the previous step for all uninstalled components. When all of the required

components have been installed, click Next. When the Microsoft Compute Cluster Pack

completes, click Finish.

When you have installed and configured the ADS Agent and Compute Cluster pack, you can update your image with the latest service packs, and then prepare your image for deployment.

Page 27: Cluster Server

25 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 25

To update the image and prepare it for deployment

1. Run the Windows Update service on your compute node. If your cluster lies behind a

proxy server, configure Internet Explorer with your proxy server settings. For information

on how to do this, see Step 1: Install and Configure the Service Node, earlier in this

guide.

2. Run the Disk Cleanup utility. Click Start, click All Programs, click Accessories, click

System Tools, and then click Disk Cleanup. Select the C: drive, and then click OK.

Select all of the check boxes, and then click OK. When the cleanup utility is finished,

close the utility.

3. Run the Disk Defragmenter utility. Click Start, click All Programs, click Accessories,

click System Tools, and then click Disk Defragmenter. Select the C: drive, and then

click Defragment. When the defragmentation utility is finished, close the utility.

Step 7: Capture and Deploy Image to Compute Nodes

You can now capture the compute node image that you just created. You can then deploy the image to compute nodes on your cluster.

To capture the compute node image

1. If the compute node is not running, turn on the computer and wait for the node to boot into

Windows Server 2003 Compute Cluster Edition.

2. Log on to the service node as administrator. Click Start, and then click ADS

Management. Right-click Devices, and then click Add Device.

3. In the Add Device dialog box, type a name in the Name text box (for example, Node000),

a description for your node (for example, Compute Node Image), and then type the MAC

address for the node that is running the compute node image. Click OK. The status pane

will indicate that the node was created successfully. Click Cancel to close the dialog box.

4. Right-click your compute node name. Click Properties, and then click the User Variables

tab.

5. Click Add. In the Variables dialog box, in the Name text box, type Imagename. In the

Value text box, type a name for your image (for example, CCSImage). Click OK twice.

6. Right-click the compute node device again, and then click Properties. In the WinPE

repository name text box, type the name for your repository that you defined when you

installed ADS (for example, NodeImages). Click Apply, and then click OK.

7. Right-click the compute node that you just added, and then click Take Control.

8. Right-click the compute node device again, and then click Run job. The Run Job Wizard

starts. Click Next.

9. On the Job Type page, select Use an existing job template, and then click Next.

10. On the Template Selection page, select Capture Compute Node. Click Next.

11. On the Completing the Run Job Wizard page, click Finish. A Created Jobs dialog box

appears. Click OK. The ADS Agent on your compute node runs the job, using Sysprep to

prepare and configure the node image, and then using the ADS image capture functions

to create and copy the image to ADS. When the image capture is complete, the node

boots into WinPE.

Page 28: Cluster Server

26 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 26

Deploy the image to nodes on the cluster. When you have captured the compute node image to the service node, you can deploy the image to compute nodes on the cluster.

To deploy the image to nodes on the cluster

1. Log on to the service node as administrator. Click Start, click All Programs, click

Microsoft ADS, and then click ADS Management.

2. Expand the Automated Deployment Services node, and then select Devices.

3. Select all devices that appear in the right pane, right-click on the selected devices,

and then select Take Control. The Control Status changes to Yes.

4. Right-click on the devices, and then click Run job.

5. The Run Job Wizard appears. Click Next.

6. On the Job Type page, select Use an existing job template. Click Next.

7. On the Template Selection page, select boot-to-winpe. Click Next.

8. On the Completing the Run Job Wizard page, click Finish.

9. Boot the computer nodes. The network adapters should already be configured to use

PXE and obtain the WinPE image from the service node. To avoid overwhelming the

ADS server during unicast deployment of WinPE image, it is recommended that you

boot only four nodes at a time. Subsequent sets of four nodes should be booted up

only after all of the previous sets of four nodes are showing Connected to WinPE

status in the ADS Management window on the head node.

10. After all the nodes are connected to WinPE, you can deploy the compute node image

to those nodes. Right-click the devices, and then click Run job.

11. The Run Job Wizard appears. Click Next.

12. On the Job Type page, select Use an existing job template. Click Next.

13. On the Template Selection page, select Deploy CCS Image. Click Next.

14. On the Completing the Run Job Wizard page, click Finish. The nodes automatically

download and run the image. This task will take a significant amount of time,

especially when you are installing hundreds of nodes. Depending on your available

staff, you may want to run this as an overnight task. When finished, your nodes are

joined to the domain and ready to be managed by the head node.

Step 8: Configure and Manage the Cluster

The head node is used to manage and maintain your cluster once the node images have been deployed. The Compute Cluster Pack includes a Compute Cluster Administrator console that simplifies management tasks, including approving nodes on the cluster and adding users and administrators to the cluster. The console includes a To Do List that shows you which tasks have been completed. Follow these steps to configure and manage your cluster:

1. Disable Windows Firewall on all nodes in the cluster.

2. Approve nodes that have joined the cluster.

3. Add users and administrators to the cluster.

Page 29: Cluster Server

27 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 27

Disable Windows Firewall on all nodes on the cluster. The Compute Cluster Administrator console enables you to define how the firewall is configured on all cluster node network adapters. For best performance on large-scale deployments, it is recommended that you disable Windows Firewall on all interfaces.

To disable Windows Firewall on all nodes on the cluster

1. Click Start, click All Programs, click Microsoft Compute Cluster Pack, and then click

Compute Cluster Administrator.

2. Click the To Do List. In the Networking section in the results pane, click Manage

Windows Firewall Settings. The Manage Windows Firewall Wizard starts. Click Next.

3. On the Configure Firewall page, select Disable Windows Firewall, and then click Next.

4. On the View Summary page, click Finish. On the Result page, click Close. When

compute nodes are approved to join the cluster, the firewall will be disabled.

Approve nodes that have joined the cluster. When you deploy Compute Cluster Edition nodes, they have joined the cluster but have not been approved to participate or process any jobs. You must approve them before they can receive and process jobs from your users.

To approve nodes that have joined the cluster

1. Open the Compute Cluster Administrator console. Click Node Management.

2. In the results pane, select one or more nodes that display a status of Pending Approval.

3. In the task pane, click Approve. You can also right-click the selected nodes and then

click Approve.

4. When the nodes are approved, the status changes to Paused. You can leave the nodes in

Paused status, or in the task pane you can click Resume to enable the node to receive

jobs from your users.

Add users and administrators to your cluster. In order to use and maintain the cluster, you must add cluster users and administrators to your cluster domain. This will make it possible for others to submit jobs to the cluster, and to perform routine administration and maintenance on the cluster. If your organization uses Active Directory, you will need to create a trust relationship between your cluster domain and other domains in your organization. You will also need to create organizational units (OUs) in your cluster domain that will act as containers for other OUs or users from your organization. You may need to work with other groups in your company to create the necessary security groups so that you can add users from other domains to your compute cluster domain. Because each organization is unique, it is not possible to provide step-by-step instructions on how to add users and administrators to the cluster domain. For help and information on how best to add users and administrators to your cluster, see Windows Server Help.

To add users and administrators to your cluster

1. In the Compute Cluster Administrator, click the To Do List. In the results pane, click

Manage Cluster Users and Administrators. The Manage Cluster Users Wizard starts.

Click Next.

2. On the Cluster Users page, the default group of HPCCLUSTER\Domain Users has been

added for you. Type a user or group by using the format domain\user or domain\group,

and then click Add. You can add or remove users and groups using the Add and

Page 30: Cluster Server

28 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 28

Remove buttons. When you have finished adding or removing users and groups, click

Next.

3. On the Cluster Administrators page, the default group of HPCCLUSTER\Domain

Admins has been added for you. Type a user or group using the format domain\user or

domain\group, and then click Add. You can add or remove users and groups by using the

Add and Remove buttons. When you have finished adding or removing users and

groups, click Next.

4. On the View Summary page, click Next.

5. On the Result page, click Close.

Step 9: Deploy the Client Utilities to Cluster Users

The Compute Cluster Administrator and the Compute Cluster Job Manager are installed on the head node by default. If you install the client utilities on a remote workstation, an administrator can manage clusters from that workstation. If you install the Compute Cluster Administrator or Job Manager on a remote computer, the computer must have one of the following operating systems installed:

Windows Server 2003, Compute Cluster Edition

Windows Server 2003, Standard x64 Edition

Windows Server 2003, Enterprise x64 Edition

Windows XP Professional x64 Edition

Windows Server 2003 R2 Standard x64 Edition

Windows Server 2003 R2 Enterprise x64 Edition

In addition, Windows Compute Cluster Server 2003 requires the following:

Microsoft .NET Framework 2.0

Microsoft Management Console (MMC) 3.0 to run the Compute Cluster Administrator snap-in

SQL Server 2000 Desktop Engine (MSDE) to store all job information

The last step in the Windows Compute Cluster Server 2003 deployment process is to create an administrator or operator console.

To deploy the client utilities

1. On the workstation that is running the appropriate operating system, insert the Compute Cluster Pack CD. The Microsoft Compute Cluster Pack Installation Wizard is automatically launched. Click Next.

2. On the Microsoft Software License Terms page, select I accept the terms in the license agreement, and click Next.

3. On the Select Installation Type page, select Install only the Microsoft Compute Cluster Pack Client Utilities for the cluster users and administrators, and then click Next.

4. On the Select Installation Location page, accept the default location, and then click Next.

5. On the Install Required Components page, highlight any components that are not installed, and then click Install.

Page 31: Cluster Server

29 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 29

6. When the installation is finished, a window appears that says Microsoft Compute Cluster Pack has been successfully installed. Click Finish.

Please note that for an administration console, you should install only the client utilities. For a development workstation, you should install both the software development kit (SDK) utilities and the client utilities.

Page 32: Cluster Server

30 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 30

Appendix A: Tuning your Cluster

Each cluster is created with a different goal in mind; therefore, there is a different way to tune each cluster for optimal performance. However, some basic guidelines can be established. To achieve performance improvements, you can do some planning, but testing will also be crucial. For testing, it is important to use applications and data that are as close as possible to the ones that the cluster will ultimately use. In addition to the specific use of the cluster, its projected size will be another basis for making decisions. After you deploy the applications, you can work on tuning the cluster appropriately.

The best networking solution will depend on the nature of your application. Although there are many different types of applications, they can be broadly categorized as message-intensive and embarrassingly parallel. In message-intensive applications, each node‘s job is dependent on other nodes. In some situations, data is passed between nodes in many small messages, meaning that latency is the limiting factor. With latency-sensitive applications, high-performance networking interfaces, such as Winsock Direct, are crucial. In addition, the use of high-quality routers and switches can improve performance with these applications.

In some messaging situations, large messages are passed infrequently, meaning that bandwidth is the limiting factor. A specialty network, such as InfiniBand or Myrinet, will meet these high-bandwidth requirements. If network latency is not an issue, a gigabit Ethernet network adapter might be the best choice.

In embarrassingly parallel applications, each node processes data independently with little message passing. In this case, the total number of nodes and the efficiency of each node is the limiting factor. It is important to be able to fit the entire dataset into RAM. This will result in much faster performance, as the data will not have to be paged in and out from the disk during processing. The speed of the processors and the type and number of nodes is a prime concern. If the processors are dual-core or quad-core, this may not be as efficient as having separate processors, each with their own memory bus. In addition, if hyper-threading is available, it may be advantageous to turn this feature off. Hyper-threading is used when applications are not using all CPU cycles, so we have them run on a single processor. Hyperthreading is generally bad for high-performance computing applications, but not necessarily all of them. So long as the operating system kernel is hyperthread-aware, the floating point intensive processes will be balanced across physical cores. For multi-threaded applications that may have I/O intensive threads and floating point intensive threads, hyperthreading could be a benefit. Hyperthreading was disabled at NCSA because none of the applications were floating-point intensive, and no specific thread-balancing or kernel-tuning was performed. This works for regular scenarios, but in high-performance computing, all CPU resources are used, so having all processes on a single processor has the opposite effects: they have to wait to get resources. If the application were actually perfectly parallel, each extra node would increase performance time linearly.

For each application, there are a maximum number of processors that will increase performance. Above that number, each processor adds no value, and could even decrease performance. This is referred to as application scaling. Depending on the system architecture, all cores sometimes divide available bandwidth to memory, and they certainly always divide the network bandwidth. One of these three (CPU, network, or memory bandwidth) is the performance bottleneck with any application. If the nature of the application(s) is known, you can determine in advance the optimal cluster specifications that will match the application. You should work with your application vendor to ensure that you have the optimal number of processors.

In some applications, many jobs are run, each of short duration. With this scenario, the performance of the job scheduler is crucial. The CCS job scheduler was designed to handle this situation.

Page 33: Cluster Server

31 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 31

When evaluating cluster performance, it is important to be aware that benchmarks don‘t always tell the whole story. You must evaluate the performance based on your own needs and expectations. Evaluation should take place by using the application along with the data that will be running on the cluster. This will help to ensure a more accurate evaluation that will result in a system that better meets your needs.

For more information on cluster tuning, you can download the ―Performance Tuning a Compute Cluster‖ white paper from the Microsoft Web site at http://go.microsoft.com/fwlink/?LinkId=87828

You can also find additional tips and new information on performance tuning at the HPC Team blog: http://windowshpc.net/blogs.

Table 5 deals with scalability and will help you make decisions based on the intended size of your cluster. The first part focuses on management scenarios, while the second part focuses on networking scenarios. For each scenario, there are an estimated number of nodes, above which the scenario will manifest itself. If your cluster exceeds the specified number of nodes, you may need to use the Note column to plan accordingly, or to troubleshoot.

Table 5: Scalability Considerations

Management Scenario Nodes Note

MSDE on Head Node

supports 8 or fewer concurrent connections.

64+ Use SQL Server 2005 on the head node (hard coded)

5-7 tables for scheduler. Use 8 tables for SDM.

RIS on Service Node

supports only 80 machines simultaneously.

64+ Use ADS for CCS 2003 (ADS requires 32-bit).

ICS/NAT has an address

range limit of 192.168.0.* 250+ Use DHCP Server instead.

The File server on Head Node only supports a limited

number of simultaneous connections to SMB/NTFS.

24 Executable on compute nodes

Increase the number of connections that the file server on the head node can support (see KB 317249)

The DC/DNS server on Head Node is not optimal. It doesn‘t

handle well with several NICs.

64+ It is best to leverage corporate IT DC.

Put DC/DNS on a separate management node.

ADS loses contact with

compute nodes after WinsockDirect has been enabled.

N/A Use clusrun or jobs to control the machine. If IPMI is available, use IPMI to reboot the machine into winPE.

WDS for next version of CCS works with WinsockDirect.

Cisco IB switch subnet manager is incompatible with openIB drivers.

N/A Use openSM:

Disable Cisco IB switch subnet manager

Enable openSM

A SDM update bottleneck

exists. 64+ CCS V1 SP1

Job Scheduling bottlenecks

exist. 64+ CCS V1 SP1

WinsockDirect (large scale

only) 64+ CCS V1 SP1

Winsock Direct hotfixes 910481 , 927620, 924286

Infiniband drivers (large

scale only) 64+ This is fixed in openFabrics build 459, found at:

http://windows.openib.org/downloads/binaries/

Page 34: Cluster Server

32 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 32

Management Scenario Nodes Note

There is a bottleneck in the number of possible simultaneous connections with code path used when SYN attack protection is on.

64+ Disable SYN attack protection registry value

HKLM\System\CurrentControlSet\Services\Tcpip\Parameters

SynAttackProtect =0

There are TCP timeouts on

calling nodes when network is jammed (delay at switch). For example, mpi all reduce.

64+ Set TCP retransmission count to 0x20. Please note that this is hard to diagnose as one-to-all makes different nodes fail.

Latency is too high. N/A Use mpiexec –env IBWSD_POLL 500 linpack.

Bandwidth is too low. N/A Use mpiexec –env MPICH_SOCKET_SBUFFER_SIZE 0 to avoid copy on send to improve bandwidth. Only use this when Winsock Direct is enabled – it can cause lockup with GigE and IPoIB.

A Winsock Direct connection

timeout exists. N/A Use mpiexec –env IBWSD_SA_TIMEOUT 1000 to set

the subnet manager timeout to a higher value during Winsock Direct connection establishment.

Page 35: Cluster Server

33 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 33

Appendix B: Troubleshooting Your Cluster

In addition, Table 6 can help you troubleshoot problems with your cluster.

Table 6: Troubleshooting

Issue Mitigation Details

Application Hangs

N/A

For ease of debugging, switch off shared memory communication using environment variable MPICH_DISABLE_SHM=1

Note: This is done from the

command line with the command:

mpiexec –env VARIABLE

SETTING -env OTHERVARIABLE

OTHERSETTING

Note: You can also set up

WinDbg for just-in-time debugging with the command Windbg –I

MPI environment variable MPICH_DISABLE_SHM

If you disable the shared memory, MSMPI stops looking at communication between processors and focuses on network queues (node to node) communication.

Application Fails

Network connectivity issue

The last line of the MPI output file gives you information on network errors

Note: The stdout output is located

where you route it in your job; i.e., it is specified by the /stdout: switch to ―job submit‖

Output file

SYN protection interferes with connectivity under heavy load

Turn SYN protection off completely on all CNs (leave SYN protection active on the head node to avoid denial-of-service)

Registry setting for SYN attack protection:

KEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\SynAttackProtect=0

To deploy this setting to all nodes:

clusrun /all reg add

HKLM\SYSTEM\CurrentControlSet\Serv

ices\Tcpip\Parameters /v

SynAttackProtect /t REG_DWORD /d 0

/f

clusrun /all shutdown -t 10 -r -f

Page 36: Cluster Server

34 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 34

Issue Mitigation Details

Network connectivity failure

Identify node with defect

Use Pallas ping pong, one-to-all, all-to-all

A good set of tools for this are the Linux-based Intel MPI benchmarks (based on the Pallas test suite). These are available for download from http://windowshpc.net/files/4/porting_unix_code/entry373.aspx

Note: Because these tests are Linux-

based, you will have to port them to CCS using the Subsystem for UNIX Applications (SUA). Instructions how to do this are included with the download.

Winsock Direct issues

Disable Winsock Direct (WSD) and use the IPoIB path instead of RDMA:

Clusrun /all

\\HEADNODE\IBDriverInstallPa

th\net\amd64\installsp -r

If it works when disabled, then try to ‗Repair‘ IB connections clusrun netsh interface set interface name=MPI admin=DISABLE/ENABLE

Validate that your cluster has the latest Winsock Direct patches

IB driver and Winsock Direct installation utility

Application Performance Not Optimal

Application not optimized for memory or CPU utilization

Check whether nodes are paging instead of using RAM

Check CPU utilization

Use perfmon counters

http://go.microsoft.com/fwlink/?LinkId=86619

Application doesn‘t scale to large number of nodes

Decrease the number of nodes used by the application until application performance comes back to expected level

MSMPI does not balance well between same node processors communication and node to node communication

Experiment with disabling the shared memory setting and see whether application performance improves

Especially relevant for message-intensive applications

MPICH_DISABLE_SHM

Messages are not coming in fast enough

GigE: Experiment with turning off the interrupt modulation to free up CPU usage

IB: Experiment with increasing polling of messages.

Polling causes high CPU usage, so if usage is too high, it will be detrimental to the application computing CPU needs.

openIB driver IBWSD_POLL

Page 37: Cluster Server

35 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 35

Issue Mitigation Details

Connectivity to one or more nodes on the cluster is lost

Divide the cluster into subsets of nodes.

Run Pallas ping pong or Pallas one-to-all or Pallas all-to-all on those subsets.

Intel MPI benchmarks.

This strategy breaks the cluster into subclusters to try to find where the issue is. In each sublcluster run sanity tests like the Pallas series in order to discover which subcluster contains the ―bad‖ node.

Switches oversubscription not optimal

Try a higher number of uplinks This strategy involves checking the number of uplinks/downlinks per switch to check to see this is the cause of poor application performance.

Send operation Experiment with having no extra copy on the Send operation

MSMPI setting

Set MPICH_SOCKET_SBUFFER_SIZE to 0

Note: This is done on the command line

with the command:

mpiexec –env VARIABLE SETTING -env

OTHERVARIABLE OTHERSETTING

Note: This will lead to higher bandwidth but also to higher CPU utilization.

Note: Use this only when compute

nodes are fitted with a WSD-enabled driver. Using a setting of 0 will cause the compute nodes on non-WSD networks to stop responding.

Memory bus bottleneck

Experiment with setting the processor affinity (assign an MPI process to a specific CPU or CPU core)

An example of doing this from the command line:

job submit /numprocessors:12

mpiexec /cmd /c setAffinity.bat

myapp.exe

where setAffinity.bat consists of:

@echo off

set /a AFFINITY="1 << (%PMI_RANK%

%% %NUMBER_OF_PROCESSORS%)"

echo affinity is %AFFINITY%

start /wait /b /affinity

%AFFINITY% %*

Page 38: Cluster Server

36 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 36

Appendix C: Cluster Configuration and Deployment

Scripts

These scripts are used to automatically add nodes to the cluster and to deploy images to nodes automatically without administrator intervention.

AddADSDevices.vbs. Parses an input file and uses the data to automatically populate ADS with the correct compute node information, including node names and MAC address values. These values are later used by Sysprep.exe to configure the node images during deployment. http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb01.mspx

AddComputeNodes.csv. Sample input file that shows configuration information needed for adding nodes to the cluster. The easiest way to work with this file is to import it into Excel as a comma-delimited file, add the necessary values, including compute node MAC addresses, and then export the data as a comma-separated value file. Every item must have an entry or the input file will not work properly. If you do not have a value for a field, use a hyphen ‗-‗ for the field instead. http://www.microsoft.com/technet/scriptcenter/scripts/ccs/node/ccnovb11.mspx

Capture-CCS-image-with-winpe.xml. ADS job template that captures a compute node image for later deployment to nodes on the cluster. http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb04.mspx

Deploy-CCS-image-with-winpe.xml. ADS job template that deploys a compute node image to compute nodes on the cluster. http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb02.mspx

Sysprep.inf. Generic configuration file for use with Sysprep.exe. Variable values are retrieved from ADS during the image deployment process. http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb05.mspx

The original high-performance compute cluster used additional scripts specific to its environment, including configuring InfiniBand networking. If you have similar needs, you can use these examples as a foundation for creating your own scripts and job templates.

ChangeIPforIB.vbs. Original script to configure IP over InfiniBand networking. http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb03.mspx

Capture-image-with-winpe.xml. Original job template to capture compute node image. http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb06.mspx

Deploy-image-on-16GB-with-winpe.xml. Original job template to deploy a compute node image to the compute nodes. http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb07.mspx

Page 39: Cluster Server

37 Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster 37

Related Links

For more information about Windows Compute Cluster Server 2003 and high-performance computing, visit the Windows high-performance computing Web site at http://www.microsoft.com/hpc

For more information on scripts, visit Scripting for Compute Cluster Server at http://www.microsoft.com/technet/scriptcenter/hubs/ccs.mspx.

For more information on the Windows Server 2003 family, visit the Windows Server 2003 Web site at http://www.microsoft.com/windowsserver2003

For information on obtaining professional assistance with planning and deploying a cluster, visit Microsoft Partner Solutions Center at http://www.microsoft.com/serviceproviders/busresources/mpsc.mspx

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in, or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2007 Microsoft Corporation. All rights reserved.

Microsoft, Active Directory, Internet Explorer, Virtual Studio, Windows, the Windows logo, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

All other trademarks are property of their respective owners.