© 2007 IBM Corporation IBM Global Engineering Solutions IBM Blue Gene/P Blue Gene Bring Up.


© 2007 IBM Corporation

IBM Global Engineering Solutions

IBM Blue Gene/P

Blue Gene Bring Up


IBM Blue Gene/P System Administration

Linux on Service Node

SuSE SLES 10

A RAID array is recommended, typically either RAID1 or RAID5 depending on the number of disks available.

Either 1 or 2 volume groups, depending on the disk configuration (rootvg and datavg).


Linux on Service Node

Partitions

/ - 1 GB - rootlv
/usr - 3 GB - usrlv
/var - 2 GB - varlv
/opt - 10 GB - optlv
/tmp - 10 GB - tmplv
swap - 4 GB - swap - swaplv
/dbhome - 20 GB - dbhomelv
/bgsys - 10 GB - bgsyslv
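As a quick sanity check when sizing the volume group(s), the sizes listed above can be added up. The sketch below uses Python purely for illustration; the dictionary and function names are hypothetical and not part of any Blue Gene tool, and the figures are copied from the partition list.

```python
# Logical volume sizes in GB, copied from the partition list above.
# PARTITIONS_GB and total_size_gb are illustrative names only.
PARTITIONS_GB = {
    "rootlv": 1,
    "usrlv": 3,
    "varlv": 2,
    "optlv": 10,
    "tmplv": 10,
    "swaplv": 4,
    "dbhomelv": 20,
    "bgsyslv": 10,
}


def total_size_gb(partitions):
    """Return the combined size of all logical volumes in GB."""
    return sum(partitions.values())


print(total_size_gb(PARTITIONS_GB))  # 60
```

The listed logical volumes therefore need about 60 GB in total, before any RAID or volume group overhead.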


Linux on Service Node

RPMs

cpp, gcc, libgcc, gcc-c++, gcc-64bit, glibc-devel, libgcc-64bit, bison, texinfo, flex, termcap, termcap-64bit, gcc-fortran, gmp, gmp-64bit, gmp-devel, gmp-devel-64bit, ncurses-devel, ncurses-devel-64bit

vacpp.rte-8.0.1-2.ppc64.rpm, xlsmp.rte-1.6.1-3.ppc64.rpm, xlsmp.msg.rte-1.6.1-3.ppc64.rpm

bgp_os, bgp_base, bgptoolchain

Interfaces

Functional network
Service network
Public network


Linux on Service Node

Groups

db2rasdb
db2iadm1
db2fadm1
db2asgrp
bgpuser
bgpdeveloper
bgpadmin
bgpservice

Users

bgpsysdb
bgpdb2c
bgpadmin
mpirun

NFS

I/O nodes mount /bgsys to finish their boot process, so /bgsys is exported on the functional network via NFS.


Front End Node

Groups

bgpadmin
bgpservice
bgpdeveloper
bgpuser

Users

mpirun

Profile

/etc/profile.d/bgp.sh


Group Roles (set using bguser.pl)

Role: Capability

user:
  Submit jobs via mpirun
  Read access to a small amount of data (job/block status) on the Service Node, via Navigator
  Access to the Front End Nodes
  Complete access to compilers, tool chain, etc. for development on the Front End Nodes

developer:
  Submit jobs via mpirun
  Read access to some data (job/block status) on the Service Node, via Navigator
  Controlled and limited access to the Service Node (requires a userid on the SN; no root access, but elevated privileges)
  Complete access to compilers, tool chain, etc. for development on the Front End Nodes
  Debugs with coreprocessor

admin:
  Complete access to Blue Gene/P functions on the Service Node and Front End Node(s)

service:
  Access to required debug tools, system logs, read access to database


DB2 Structure


DB2 - Why use a Database?

Need a software representation of the hardware

A machine of such large scale requires a persistent means of storing errors (RAS events), job history, block definitions, environmental readings, etc.

Operational state of the machine can be obtained without touching the hardware


Other Benefits of a Database

Setting values in the database can trigger actions in other components

Can simplify the design by having policy stored in the database itself via procedures, triggers, and constraints instead of the code

Information can be obtained using existing tools or SQL
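To make the idea of keeping policy in the database more concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than DB2 so that it runs anywhere, and DB2's trigger syntax differs from SQLite's. The tables (demo_card, demo_card_history), the status codes, and the location value are made up for the example and are not the Blue Gene/P schema.

```python
import sqlite3

# In-memory toy database; NOT the Blue Gene/P schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo_card (
        location TEXT PRIMARY KEY,
        -- Constraint as policy: only known status codes are accepted.
        status   TEXT NOT NULL CHECK (status IN ('A', 'M', 'E'))
    );
    CREATE TABLE demo_card_history (
        location   TEXT,
        old_status TEXT,
        new_status TEXT
    );
    -- Trigger as policy: every status change is recorded automatically.
    CREATE TRIGGER demo_card_status_change
    AFTER UPDATE OF status ON demo_card
    WHEN OLD.status <> NEW.status
    BEGIN
        INSERT INTO demo_card_history
        VALUES (OLD.location, OLD.status, NEW.status);
    END;
""")

conn.execute("INSERT INTO demo_card VALUES ('R00-M0-N00', 'A')")
conn.execute("UPDATE demo_card SET status = 'M' WHERE location = 'R00-M0-N00'")

# The application code never wrote to the history table; the trigger did.
history = conn.execute("SELECT * FROM demo_card_history").fetchall()
print(history)

# An invalid status is rejected by the constraint, not by application code.
try:
    conn.execute("UPDATE demo_card SET status = 'X' WHERE location = 'R00-M0-N00'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The point of the sketch is only that history keeping and validation live in the database definition, so every client that updates the table gets the same behavior.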


DB2

Product Description

Restricted license
Enterprise Server Edition (ESE)
Client

Database Location

/dbhome/bgpsysdb

Instances

bgpsysdb (server)
bgpdb2c (client)


DB2 concepts

Schema: The collection of database objects such as tables, views, indexes, and triggers that define the database.

Table: A named data object that consists of a specific number of columns and some unordered rows.

View: A logical table that consists of data that a query generates.


DB2 Naming Guidelines for BG/P

Tables always start with TBGP, such as TBGPNodeCard or TBGPLinkCard.

Names are NOT case sensitive in SQL.

For each of the tables, there is a view that has the more user-friendly columns, such as location, and without VPD. These are named without the T, such as BGPNodeCard.

In cases where some information is omitted from the view, there is also an extra view for diags, such as BGPNodeCardAll.

If there is no need for any derived columns in the view, or omitted columns, then an alias is created, e.g. BGPClockCard.

The net effect is that almost all the time, using the "BGP" name will show you what you want.

If there is a history being kept, then _history is added to the end.
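The naming pattern can be summarized in a few lines of code. This is only a sketch of the convention as described on this slide: the helper names are hypothetical, and not every table necessarily has an "All" view or a history table, so the derived names are candidates rather than guaranteed objects.

```python
def related_names(table_name):
    """Derive the conventional view and history names from a TBGP table name.

    Illustrative only: it applies the naming pattern, it does not check
    whether the derived objects actually exist in the database.
    """
    if not table_name.upper().startswith("TBGP"):
        raise ValueError("expected a table name starting with TBGP")
    view = table_name[1:]  # the view is named without the leading T
    return {
        "table": table_name,
        "view": view,
        "diag_view": view + "All",           # extra view for diags
        "history": table_name + "_history",  # only if a history is kept
    }


def same_object(name_a, name_b):
    """Names are not case sensitive in SQL, so compare them case-insensitively."""
    return name_a.upper() == name_b.upper()


names = related_names("TBGPNodeCard")
print(names["view"])       # BGPNodeCard
print(names["diag_view"])  # BGPNodeCardAll
print(names["history"])    # TBGPNodeCard_history
print(same_object("tbgpnodecard", "TBGPNodeCard"))  # True
```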


BG/P Tables

TBGPBlock TBGPBPBlockMap TBGPSmallBlock TBGPLinkBlockMap TBGPProductType TBGPMachine TBGPMachineSubnet TBGPMidplane TBGPNodeCard TBGPNode TBGPServiceCard TBGPLinkCard TBGPClockCard TBGPBulkPowerSupply TBGPSwitch TBGPCable TBGPClockCable TBGPLinkChip TBGPICON TBGPFanModule TBGPJob TBGPEthGateway TBGPEGWMachineMap TBGPPortBlockMap TBGPBlockUsers TBGPMidplaneSubnet TBGPNodeSubnet TBGPServiceAction TBGPUserPrefs

TBGPReplacement_history TBGPMachine_history TBGPMidplane_history TBGPNodeCard_history TBGPNode_history TBGPServiceCard_history TBGPLinkCard_history TBGPClockCard_history TBGPLinkChip_history TBGPIcon_history TBGPFanModule_history TBGPJob_history TBGPServiceCardEnvironment TBGPFanEnvironment TBGPClockCardEnvironment TBGPBULKPOWEREnvironment TBGPNodeCardPOWEREnvironment TBGPLinkCardPOWEREnvironment TBGPSrvcCardPOWEREnvironment TBGPLinkChipEnvironment TBGPLinkCardEnvironment TBGPNodeEnvironment TBGPNodeCardEnvironment TBGPEventLog TBGPERRCodes TBGPDiagRuns TBGPDiagBlocks TBGPDiagResults TBGPDiagTests


BG/P Views

BGPMidplane BGPMidplaneAll BGPNodeCard BGPNodeCardAll BGPNode BGPNodeAll BGPServiceCard BGPServiceCardAll BGPLinkCard BGPLinkCardAll BGPClockCardAll BGPBulkPowerSupplyAll BGPLinkChip BGPLinkChipAll BGPFanModule BGPFanModuleAll BGPLink BGPClockCardEnvironment BGPDiagTests

BGPNodeCardCount BGPLinkCardCount BGPServiceCardCount BGPNodeCount BGPBasePartition BGPBPBlockStatus BGPSwitchLinks BGPLinkBlockStatus BGPSwitchPort BGPPortBlockStatus BGPBlockSize


Database setup

Database Populate: A Perl script that populates the database with the expected configuration for the Blue Gene system.

InstallServiceAction: Verifies that the predefined structure matches the actual configuration.

VerifyCables: Confirms that the torus network cabling is correct.

VerifyIpAddresses: Confirms that the IO card IP addresses are correct.


DB2/SQL examples

List all tables/views

  list tables

Describe table/view

  describe table TBGPmidplane

Extracting data

  select * from TBGPmidplane

More complex

  select a.position, count(isionode), a.status, a.seqid
  from tbgpnodecard a left outer join bgpnode b
    on b.midplanepos = a.midplanepos and b.nodecardpos = a.position and b.isionode = 'T' and b.status <> 'M'
  where a.midplanepos = 'R00-M0'
  group by a.position, a.status, a.seqid
  order by 1
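To see what the more complex query returns, it can be tried against a toy data set. The sketch below runs the same SQL in Python's built-in sqlite3 module instead of DB2. The two tables are reduced to the columns the query touches, bgpnode is created as a plain table rather than a view, and all rows (positions such as N00, the seqid values, the status code 'A') are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-ins with only the columns used by the query; not the real schema.
conn.executescript("""
    CREATE TABLE tbgpnodecard (midplanepos TEXT, position TEXT, status TEXT, seqid INTEGER);
    CREATE TABLE bgpnode (midplanepos TEXT, nodecardpos TEXT, isionode TEXT, status TEXT);
""")
conn.executemany("INSERT INTO tbgpnodecard VALUES (?, ?, ?, ?)", [
    ("R00-M0", "N00", "A", 1),
    ("R00-M0", "N01", "A", 2),
    ("R00-M1", "N00", "A", 3),  # other midplane, filtered out by WHERE
])
conn.executemany("INSERT INTO bgpnode VALUES (?, ?, ?, ?)", [
    ("R00-M0", "N00", "T", "A"),  # I/O node, counted
    ("R00-M0", "N00", "T", "A"),  # I/O node, counted
    ("R00-M0", "N00", "F", "A"),  # compute node, not counted
    ("R00-M0", "N01", "T", "M"),  # I/O node with status 'M', not counted
    ("R00-M0", "N01", "F", "A"),  # compute node, not counted
    ("R00-M1", "N00", "T", "A"),  # other midplane
])

rows = conn.execute("""
    select a.position, count(isionode), a.status, a.seqid
    from tbgpnodecard a left outer join bgpnode b
      on b.midplanepos = a.midplanepos and b.nodecardpos = a.position
     and b.isionode = 'T' and b.status <> 'M'
    where a.midplanepos = 'R00-M0'
    group by a.position, a.status, a.seqid
    order by 1
""").fetchall()
print(rows)  # [('N00', 2, 'A', 1), ('N01', 0, 'A', 2)]
```

Because the join is a left outer join and the I/O node conditions sit in the ON clause, a node card with no qualifying I/O nodes still appears in the result with a count of 0 instead of being dropped.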


Exercise

Log on to the service node as bgpadmin
db2 connect to bgdb0 user bgpsysdb
List tables in the database
List the serial numbers of the nodecards
List only the compute cards


BGP RPMs

RPMs

bgp_os
bgpbase
bgptoolchain

Directory tree

/bgsys
/bgsys/drivers/ppcfloor - symbolic link to the current driver software
/bgsys/drivers/ppcfloor/bin - binaries
/bgsys/drivers/ppcfloor/bareMetal - service action scripts


Site Specific Configuration

Templates are located in /bgsys/local/etc

rc scripts
UIDs and GIDs
Profiles: /etc/profile.d/bgp.sh


Shutdown

Run a service action on the clock cards in each rack: tertiary, secondary, primary clock cards
'bgpmaster stop'
Stop db2
Power down rack(s)
Shut down FEN
Shut down service node


Startup

Service node
Front end node
Power up racks
'bgpmaster start'
End service actions on clock cards (primary, secondary, tertiary)
Verify all hardware is seen
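The ordering matters in both procedures, in particular for the clock cards, which are handled tertiary-first on shutdown and primary-first on startup. One way to keep the two sequences straight is to record them as data and print numbered checklists. The sketch below does only that: it runs no commands, the Python names are hypothetical, and the step text is taken from the Shutdown and Startup slides.

```python
# Clock card ordering as given on the Shutdown and Startup slides.
SHUTDOWN_CLOCK_ORDER = ["tertiary", "secondary", "primary"]
STARTUP_CLOCK_ORDER = ["primary", "secondary", "tertiary"]

SHUTDOWN_STEPS = [
    "Run a service action on the clock cards: " + ", ".join(SHUTDOWN_CLOCK_ORDER),
    "bgpmaster stop",
    "Stop db2",
    "Power down rack(s)",
    "Shut down FEN",
    "Shut down service node",
]

STARTUP_STEPS = [
    "Boot service node",
    "Boot front end node",
    "Power up racks",
    "bgpmaster start",
    "End service actions on clock cards: " + ", ".join(STARTUP_CLOCK_ORDER),
    "Verify all hardware is seen",
]


def checklist(title, steps):
    """Format a titled, numbered checklist as a single string."""
    lines = [title]
    lines += [f"{number}. {step}" for number, step in enumerate(steps, start=1)]
    return "\n".join(lines)


print(checklist("Shutdown", SHUTDOWN_STEPS))
print(checklist("Startup", STARTUP_STEPS))
```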


Unexpected Power Outage

Power off all systems
Power up and boot service node
Power up and boot FEN
Power up rack(s)
'bgpmaster start'
Run install service action


Exercise

Shut down and start up the system
Verify all is well