Download - GXP in nutshell You can send jobs (Unix shell command line) to many machines, very fast Very small prerequisites –Each node has python (ver 2.3.4 or later)

Transcript
Page 1: GXP in nutshell You can send jobs (Unix shell command line) to many machines, very fast Very small prerequisites –Each node has python (ver 2.3.4 or later)

GXP in nutshell

• You can send jobs (Unix shell command line) to many machines, very fast

• Very small prerequisites
– Each node has python (ver 2.3.4 or later)
– You have ssh access to it without being asked to enter passphrases (e.g., use ssh-agent for ssh)
– Install GXP (only) to your home node. GXP multiplies itself to nodes you want to use


What is it useful for?

• With GXP, you can comfortably
– operate many nodes, interactively or non-interactively
– use nodes across multiple clusters
– reach nodes behind firewalls/NATs
– deal with many nodes, some of which are dead or unavailable on any given day
– coordinate multiple clusters as a parallel processing resource without installing any job scheduling software (PBS/Condor etc.)


Things made simple by GXP (1)

• Launch a parallel program on many nodes across multiple clusters

• Kill them with a single stroke of Ctrl-C

• Simple PBS/Condor-like job scheduling

• Monitor specific programs (ps … on all nodes)

• Kill specific programs (killall … on all nodes)

• Clean up all processes as a last resort (bomb)


Things made simple by GXP (2)

• Copy a file to many nodes, some behind firewalls/NATs

• Elect a single node from each file system

• Get load-average of all nodes and drop highly-loaded nodes

• Check installation of a command and drop nodes that don’t have it

• List processes consuming significant amount of CPU time
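The node-filtering tasks listed above can be illustrated with a small Python sketch. It is not GXP code: the node names, load values, file-system labels, the threshold, and the helper names `drop_loaded` and `elect_per_filesystem` are all hypothetical, and the data that GXP would collect from real nodes is simply hard-coded here.

```python
def drop_loaded(load_by_node, threshold):
    """Return the nodes whose load average is below the threshold, sorted by name."""
    return sorted(n for n, load in load_by_node.items() if load < threshold)


def elect_per_filesystem(fs_by_node):
    """Pick one representative node (the first by name) for each file system."""
    elected = {}
    for node in sorted(fs_by_node):
        # setdefault keeps the first node seen for a given file system.
        elected.setdefault(fs_by_node[node], node)
    return elected


# Hypothetical sample data standing in for what each node would report.
loads = {"node01": 0.10, "node02": 3.75, "node03": 0.42}
filesystems = {"node01": "nfs-a", "node02": "nfs-a", "node03": "nfs-b"}

kept = drop_loaded(loads, threshold=1.0)
elected = elect_per_filesystem({n: filesystems[n] for n in kept})
print(kept)
print(elected)
```

Electing one node per file system is useful when a file only needs to be written once per shared file system rather than once per node.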


Our Experience

• A fairly large natural language processing task
– parse > 100M web documents collected and archived by our web crawler
– resource: 350 CPUs across two clusters
– GXP integrates them without any special effort

• Environments
– No Globus/PBS installed (Globus/rsh ports are blocked across clusters)
– Documents must be staged on demand due to disk capacity
– Nodes in one of the two clusters cannot connect to hosts outside the cluster. We used GXP to stage a file through multiple relaying hosts


Basic Usage

• `explore’ command: login & authenticate yourself to many nodes

• `e’ command: send and execute a command line (very fast)


Feature (1) multihop logins

• You can reach nodes through other nodes

• Two typical scenarios where this is important
– Home → a cluster gateway → cluster nodes
– Very large clusters for which trees are mandatory

• Subsequent command submissions transparently reach all nodes

(Figure labels: home, cluster gateways)
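One way to picture multihop logins is as a tree rooted at the home node, where every node is reached through the node it was logged in from. The Python sketch below only illustrates that idea and is not how GXP is implemented; the host names and the `route` helper are hypothetical.

```python
def route(parent, node):
    """Return the hops from the root (home) to `node`, following parent links."""
    path = [node]
    # Walk upward until we reach a node with no parent (the root).
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    path.reverse()
    return path


# Hypothetical login tree: each node maps to the node it was logged in from.
parent = {
    "home": None,
    "gw-a": "home",
    "a001": "gw-a",
    "a002": "gw-a",
    "gw-b": "home",
    "b001": "gw-b",
}

for target in ("a002", "b001"):
    print(" -> ".join(route(parent, target)))
```

A command sent from home to `a002` would travel along the printed hops, which is why nodes that only the gateway can see remain reachable.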


Feature (2) node selection

• You don’t always want to send commands to all nodes

• After logging in to many nodes, you can interactively select some of them

• `smask’ command selects the nodes on which the last command succeeded
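The selection rule can be modeled in a few lines of Python. This is a toy simulation under stated assumptions, not GXP's code: the `Session` class, its method names, the node names, and the stand-in command `has_tool` are hypothetical, and "success" is taken to mean exit status 0.

```python
class Session:
    """Toy model of a set of selected nodes and the status of the last command."""

    def __init__(self, nodes):
        self.selected = list(nodes)
        self.last_status = {}

    def e(self, run):
        """Run `run(node)` on each selected node and record its exit status."""
        self.last_status = {n: run(n) for n in self.selected}
        return self.last_status

    def smask(self):
        """Keep only the nodes on which the last command exited with status 0."""
        self.selected = [n for n in self.selected if self.last_status.get(n) == 0]
        return self.selected


# Hypothetical stand-in for a remote command: succeeds only where a tool exists.
nodes_with_tool = {"node01", "node03"}

def has_tool(node):
    return 0 if node in nodes_with_tool else 1


session = Session(["node01", "node02", "node03"])
session.e(has_tool)
print(session.smask())
```

After the selection step, later commands in this model go only to `node01` and `node03`, which mirrors dropping nodes that lack a required command.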


Feature (3) coordination syntax

• `e’ command takes an extended shell syntax

• e {{ S }} M
– Run S on all selected nodes
– Run M on home node
– Merge all S’s standard out and feed it to M

• e B {{ S }}
– Feed B’s standard out to all S’s

• e S is an abbreviation of e {{ S }}
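The data flow of the two forms can be modeled locally in Python. This sketch is an illustration only and makes several assumptions: every "node" is just a local shell command, the commands run one after another rather than in parallel, outputs are merged in list order (the slides do not say what order GXP uses), and the helper names `run`, `scatter_then_merge`, and `broadcast` are hypothetical. It requires a POSIX shell and Python 3.7 or later.

```python
import subprocess


def run(cmd, stdin_text=""):
    """Run a shell command line locally and return its standard output."""
    done = subprocess.run(cmd, shell=True, input=stdin_text,
                          capture_output=True, text=True, check=True)
    return done.stdout


def scatter_then_merge(node_cmds, merge_cmd):
    """Model of `e {{ S }} M`: run each S, concatenate the outputs, feed them to M."""
    merged = "".join(run(cmd) for cmd in node_cmds)
    return run(merge_cmd, merged)


def broadcast(source_cmd, node_cmds):
    """Model of `e B {{ S }}`: run B once and feed its output to every S."""
    data = run(source_cmd)
    return [run(cmd, data) for cmd in node_cmds]


# Each list entry stands in for the command S running on a different node.
print(scatter_then_merge(["echo node03", "echo node01", "echo node02"], "sort"))
print(broadcast("echo hello", ["tr a-z A-Z", "cat"]))
```

In the first call, `sort` plays the role of M on the home node; in the second, `echo hello` plays the role of B and each list entry is one node's copy of S.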