www.eu-eela.org
E-science grid facility for Europe and Latin America
WORKER NODE
GIUSEPPE PLATANIA
INFN Catania
30 June - 4 July, 2008
www.eu-eela.eu Catania (Italy), Joint EELA/EGEEIII Tutorial for Trainers, 30.06.2008 – 04.07.2008
OUTLINE
• OVERVIEW
• INSTALLATION & CONFIGURATION
• TESTING
• FIREWALL SETUP
• TROUBLESHOOTING
OVERVIEW
• The Worker Node is the service where the jobs run.
• Its main functions are:
– execute the jobs
– report the status of the jobs to the Computing Element
• It can run several kinds of batch system client:
– Torque
– LSF
– SGE
– Condor
TORQUE client
• The Torque client consists of:
– pbs_mom, which places the job into execution. It is also responsible for returning the job's output to the user
Worker Node installation & configuration using YAIM
WHAT KIND OF WN?
There are several metapackages to choose from:
ig_WN – "Generic" WorkerNode.
ig_WN_noafs – Like ig_WN but without AFS.
ig_WN_LSF – LSF WorkerNode. IMPORTANT: provided for consistency; it does not install the LSF software, but it applies some fixes via ig_configure_node.
ig_WN_LSF_noafs – Like ig_WN_LSF but without AFS.
ig_WN_torque – Torque WorkerNode.
ig_WN_torque_noafs – Like ig_WN_torque but without AFS.
Repository settings
• REPOS="ca dag ig jpackage gilda glite-wn_torque"
Download and store the repo files:
• for name in $REPOS; do wget \
http://grid018.ct.infn.it/mrepo/repos/$name.repo -O \
/etc/yum.repos.d/$name.repo; done
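The loop above writes straight into /etc/yum.repos.d, so a failed download can leave an empty or partial .repo file behind. A more cautious variant might look like the sketch below. The names fetch_repos, download, REPO_BASE_URL and REPO_DIR are illustrative and not part of the tutorial; only the URL and the target directory come from the slide.

```shell
#!/bin/sh
# Sketch only: fetch_repos, download, REPO_BASE_URL and REPO_DIR are
# illustrative names, not part of YAIM or of the original slides.
REPO_BASE_URL="${REPO_BASE_URL:-http://grid018.ct.infn.it/mrepo/repos}"
REPO_DIR="${REPO_DIR:-/etc/yum.repos.d}"

# Download one URL ($1) to a file ($2). Kept as a separate function so it
# can be swapped for another tool without touching the loop.
download() {
    wget -q "$1" -O "$2"
}

# fetch_repos NAME...: fetch NAME.repo for every argument and install it
# only if the download succeeded and produced a non-empty file.
fetch_repos() {
    rc=0
    for name in "$@"; do
        tmp="$REPO_DIR/.$name.repo.tmp"
        if download "$REPO_BASE_URL/$name.repo" "$tmp" && [ -s "$tmp" ]; then
            mv "$tmp" "$REPO_DIR/$name.repo"
            echo "installed $name.repo"
        else
            rm -f "$tmp"
            echo "FAILED    $name.repo" >&2
            rc=1
        fi
    done
    return $rc
}
```

It would be called with the same list as the slide, for example: fetch_repos ca dag ig jpackage gilda glite-wn_torque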
INSTALLATION
• yum install jdk java-1.5.0-sun-compat
• yum install lcg-CA
• yum install ig_WN_torque_noafs
If you want AFS installed instead:
• yum install openafs openafs-client kernel-module-openafs-`uname -r`
• yum install ig_WN_torque
Gilda rpms:
• yum install gilda_utils gilda_applications
Customize ig-site-info.def
• Copy the users and groups example files to /opt/glite/yaim/etc/gilda/:
cp /opt/glite/yaim/examples/ig-groups.conf /opt/glite/yaim/etc/gilda/
cp /opt/glite/yaim/examples/ig-users.conf /opt/glite/yaim/etc/gilda/
• Append the gilda users and groups definitions to ig-users.conf and ig-groups.conf in /opt/glite/yaim/etc/gilda/:
cat /opt/glite/yaim/etc/gilda/gilda_ig-users.conf >> /opt/glite/yaim/etc/gilda/ig-users.conf
cat /opt/glite/yaim/etc/gilda/gilda_ig-groups.conf >> /opt/glite/yaim/etc/gilda/ig-groups.conf
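Running the two cat commands a second time would append the gilda definitions again. If the step may be repeated, a small helper such as the following could keep the files free of duplicates. This is only a sketch: append_missing is a hypothetical name, and it assumes that comparing whole lines is enough to recognise an entry that is already present.

```shell
#!/bin/sh
# append_missing SRC DST: append to DST every line of SRC that DST does not
# already contain verbatim. Hypothetical helper, not a YAIM tool.
append_missing() {
    src=$1
    dst=$2
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    touch "$dst" || return 1
    # -F fixed strings, -x whole-line match, -v keep the lines of SRC that
    # match none of the patterns, -f take the patterns from DST.
    new=$(grep -Fxv -f "$dst" "$src")
    if [ -n "$new" ]; then
        printf '%s\n' "$new" >> "$dst"
    fi
    return 0
}
```

For example: append_missing /opt/glite/yaim/etc/gilda/gilda_ig-users.conf /opt/glite/yaim/etc/gilda/ig-users.conf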
Customize ig-site-info.def
• Copy the ig-site-info.def template file provided by ig_yaim into the gilda directory and customize it:
cp /opt/glite/yaim/examples/siteinfo/ig-site-info.def /opt/glite/yaim/etc/gilda/<your_site-info.def>
• Open /opt/glite/yaim/etc/gilda/<your_site-info.def> in a text editor and set the following values according to your grid environment:
CE_HOST=<write the CE hostname you are installing>
TORQUE_SERVER=$CE_HOST
Customize ig-site-info.def
WN_LIST=/opt/glite/yaim/etc/gilda/wn-list.conf
The file specified in WN_LIST must list the hostnames of all your WNs.
WARNING: It is important to set it up before running the configure command.
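Since the WN list has to be in place before configuration, a quick sanity check can catch typos early. The sketch below assumes the file holds one hostname per line with no blank lines or comments; that layout, and the helper name check_wn_list, are assumptions rather than something stated in the slides.

```shell
#!/bin/sh
# check_wn_list FILE: print the problems found in a WN list file and return
# non-zero if there are any. Hypothetical helper; assumes one hostname per line.
check_wn_list() {
    file=$1
    [ -s "$file" ] || { echo "$file is missing or empty"; return 1; }
    rc=0
    # Lines with whitespace or characters that cannot appear in a hostname.
    bad=$(grep -n -v -E '^[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?$' "$file")
    if [ -n "$bad" ]; then
        echo "invalid lines:"
        echo "$bad"
        rc=1
    fi
    # Hostnames listed more than once.
    dup=$(sort "$file" | uniq -d)
    if [ -n "$dup" ]; then
        echo "duplicate hosts:"
        echo "$dup"
        rc=1
    fi
    return $rc
}
```

For example: check_wn_list /opt/glite/yaim/etc/gilda/wn-list.conf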
Customize ig-site-info.def
GROUPS_CONF=/opt/glite/yaim/etc/gilda/ig-groups.conf
USERS_CONF=/opt/glite/yaim/etc/gilda/ig-users.conf
JAVA_LOCATION="/usr/java/j2sdk1.4.2_12"
JOB_MANAGER=lcgpbs
BATCH_BIN_DIR=/usr/bin
BATCH_VERSION=torque-2.1.9-4
VOS="gilda"
ALL_VOMS="gilda"
Customize ig-site-info.def
QUEUES="short long infinite"
SHORT_GROUP_ENABLE=$VOS
LONG_GROUP_ENABLE=$VOS
INFINITE_GROUP_ENABLE=$VOS
To configure a queue for a single VO:
QUEUES="short long infinite gilda"
SHORT_GROUP_ENABLE=$VOS
LONG_GROUP_ENABLE=$VOS
INFINITE_GROUP_ENABLE=$VOS
GILDA_GROUP_ENABLE="gilda"
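In the settings above, each queue named in QUEUES has a matching variable whose name is the queue name in upper case followed by _GROUP_ENABLE. A short check like the following can confirm that no queue was left without one. It is a sketch under that naming assumption; check_queue_vars is a hypothetical helper, and it expects queue names made only of letters, digits and underscores.

```shell
#!/bin/sh
# check_queue_vars: for every queue in $QUEUES, verify that the matching
# <QUEUE>_GROUP_ENABLE variable is set and non-empty. Illustrative helper;
# queue names are assumed to contain only letters, digits and underscores.
check_queue_vars() {
    rc=0
    for q in $QUEUES; do
        # Upper-case the queue name (short -> SHORT) and build the variable name.
        var="$(printf '%s' "$q" | tr '[:lower:]' '[:upper:]')_GROUP_ENABLE"
        # Indirect lookup of the variable whose name is stored in $var.
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            echo "$q: $var=\"$val\""
        else
            echo "$q: $var is not set" >&2
            rc=1
        fi
    done
    return $rc
}
```

One way to use it is to source the site-info file in a subshell first: ( . /opt/glite/yaim/etc/gilda/<your_site-info.def> && check_queue_vars )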
WN Torque CONFIGURATION
• Now we can configure the node:
/opt/glite/yaim/bin/ig_yaim -c -s /opt/glite/yaim/etc/gilda/<your_site-info.def> -n ig_WN_torque_noafs
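Before running the configure command, it can help to confirm that the site-info file actually defines the variables edited in the previous slides. The sketch below sources the file in a subshell and lists whatever is missing. check_site_info is a hypothetical helper, and the variable list is only the set shown in this tutorial, not the full set YAIM may require.

```shell
#!/bin/sh
# Variables taken from the slides above; YAIM may need more than these.
SITE_INFO_VARS="CE_HOST TORQUE_SERVER WN_LIST GROUPS_CONF USERS_CONF JOB_MANAGER BATCH_BIN_DIR VOS QUEUES"

# check_site_info FILE: source FILE in a subshell and print "missing: NAME"
# for each expected variable that is unset or empty. Hypothetical helper.
check_site_info() {
    (
        # Start from a clean slate so values inherited from the caller's
        # environment are not mistaken for values set in the file.
        unset $SITE_INFO_VARS
        . "$1" || exit 2
        rc=0
        for var in $SITE_INFO_VARS; do
            eval "val=\${$var:-}"
            if [ -z "$val" ]; then
                echo "missing: $var"
                rc=1
            fi
        done
        exit $rc
    )
}
```

For example: check_site_info /opt/glite/yaim/etc/gilda/<your_site-info.def> (pass a path containing a slash so the file is not searched for in PATH).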
Worker Node testing
Testing
• Verify that pbs_mom is active and that the node state is free:
[root@wn root]# /etc/init.d/pbs_mom status
pbs_mom (pid 3692) is running...
[root@wn root]# pbsnodes -a
wn.localdomain
     state = free
     np = 2
     properties = lcgpro
     ntype = cluster
     status = arch=linux,uname=Linux wn.localdomain 2.4.21-37.EL.cern 1 Tue Oct 4 16:45:05 CEST 2005 i686,sessions=5892 5910 563 1703 2649,3584,nsessions=6,nusers=1,idletime=1569,totmem=254024kb,availmem=69852kb,physmem=254024kb,ncpus=1,loadave=0.30,rectime=1159016111
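With more than a handful of nodes, reading the full pbsnodes listing by eye gets tedious. The sketch below reduces it to one "node state" pair per line. node_states is a hypothetical helper, and it assumes the layout of the pbsnodes -a output shown above: the node name starts at the first column and the state line is indented beneath it.

```shell
#!/bin/sh
# node_states: read `pbsnodes -a` style text on stdin and print
# "<node> <state>" for each node. Illustrative helper that relies on the
# layout shown above (unindented node name, indented "state = ..." line).
node_states() {
    awk '
        /^[^ \t]/             { node = $1 }        # unindented line: node name
        /^[ \t]+state[ \t]*=/ { print node, $3 }   # indented state line
    '
}
```

For example: pbsnodes -a | node_states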
Testing
• First of all, check that a generic user on the WN can ssh to the CE without typing a password:
[root@wn root] su - gilda001
[gilda001@wn gilda001] ssh ce
[gilda001@ce gilda001]
• The same test has to be run between the WNs in order to run MPI jobs:
[gilda001@wn gilda001] ssh wn1
[gilda001@wn1 gilda001]
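The manual ssh test above can be repeated over several hosts without getting stuck at a password prompt. The sketch below uses the OpenSSH option BatchMode=yes, which makes ssh fail instead of asking for a password. check_ssh and SSH_PROBE are illustrative names, and a failure may also mean a network problem rather than an authentication one.

```shell
#!/bin/sh
# Command used to probe a host. BatchMode=yes disables password prompts,
# ConnectTimeout bounds the wait. SSH_PROBE is an illustrative variable name.
SSH_PROBE="${SSH_PROBE:-ssh -o BatchMode=yes -o ConnectTimeout=5}"

# check_ssh HOST...: report for each host whether a non-interactive login
# works for the current user. Returns non-zero if any host failed.
check_ssh() {
    rc=0
    for host in "$@"; do
        if $SSH_PROBE "$host" true </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            rc=1
        fi
    done
    return $rc
}
```

Run it as the generic user on the WN, for example: check_ssh ce wn1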
FIREWALL setup
/etc/sysconfig/iptables
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -s <ip_you_want> --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p all -s <your CE ip address> -j ACCEPT
-A RH-Firewall-1-INPUT -p all -s <your WN ip address> -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --syn -j REJECT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
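With several WNs, the rule that accepts traffic from a WN address has to be repeated once per node. One way to avoid editing the file by hand is to generate it, as in the sketch below. render_wn_rules is a hypothetical helper that prints the same rule set as the slide with the placeholders filled in; the output should be reviewed before it replaces /etc/sysconfig/iptables.

```shell
#!/bin/sh
# render_wn_rules SSH_SRC CE_IP WN_IP...: print the rule set from the slide
# with one ACCEPT rule per WN address. Illustrative helper only.
render_wn_rules() {
    ssh_src=$1
    ce_ip=$2
    shift 2
    cat <<EOF
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -s $ssh_src --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p all -s $ce_ip -j ACCEPT
EOF
    # One rule per remaining argument (the WN addresses).
    for wn_ip in "$@"; do
        echo "-A RH-Firewall-1-INPUT -p all -s $wn_ip -j ACCEPT"
    done
    cat <<EOF
-A RH-Firewall-1-INPUT -p tcp -m tcp --syn -j REJECT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
EOF
}
```

For example, with placeholder addresses: render_wn_rules 192.0.2.1 192.0.2.10 192.0.2.21 192.0.2.22 > /tmp/iptables.new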
IPTABLES STARTUP
/sbin/chkconfig iptables on
/etc/init.d/iptables start
Troubleshooting
Troubleshooting
[root@wn root]# su - gilda001
[gilda001@wn gilda001] ssh ce
gilda001@ce's password:
Probably this WN's hostname is not in /etc/ssh/shosts.equiv, or its ssh keys were not created and stored in /etc/ssh/ssh_known_hosts on the CE.
Solution (to run on the CE):
• Ensure that the WN is in the pbs node list:
[root@ce root]# pbsnodes -a
• And then:
[root@ce root]# /opt/edg/sbin/edg-pbs-shostsequiv
[root@ce root]# /opt/edg/sbin/edg-pbs-known-hosts
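To confirm the first half of that diagnosis before regenerating anything, one can check on the CE whether the WN hostname appears in shosts.equiv. The sketch below is a hypothetical helper (in_shosts_equiv) that assumes the hostname is the first field of a line; it does not inspect ssh_known_hosts.

```shell
#!/bin/sh
# in_shosts_equiv HOST [FILE]: succeed if HOST is the first field of some
# line in FILE (default /etc/ssh/shosts.equiv). Illustrative helper only.
in_shosts_equiv() {
    host=$1
    file=${2:-/etc/ssh/shosts.equiv}
    # Exact comparison on the first field, so "wn" does not match "wn1".
    awk -v h="$host" '$1 == h { found = 1 } END { exit !found }' "$file"
}
```

For example, on the CE: in_shosts_equiv wn.localdomain && echo listed || echo "not listed"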
Troubleshooting
[root@wn root]# pbsnodes -a
wn.localdomain
     state = down
     np = 2
     properties = lcgpro
     ntype = cluster
Solution:
[root@wn root]# /etc/init.d/pbs_mom restart