Scaling, Grid Engine and Running UIMA on the Cluster
Chris Roeder 11/2010
The Scaling Problem
“Does the solution scale?” asks whether larger versions of the problem (often more data) can be handled by a given piece of software.
“Scaling” is a loose collection of techniques to improve or implement a solution’s scalability.
The choice of technique depends on the critical resource (CPU, memory, or I/O) and on how easily the task can be broken into pieces.
This talk focuses on scaling as it applies to UIMA NLP processing (notwithstanding OpenDMAPv2).
It is a work in progress.
Scaling NLP
Processing one file is independent of processing another file: text in, annotations out.
• Multi-threaded
– More than one thread of execution in one process
– Pipelines share memory and can step on each other.
– Ex.: Stanford crashes because of concurrency issues (“was not an issue in 2001”)
– <casProcessors casPoolSize="4" processingUnitThreadCount="2">
• Multi-process
– Separate JVMs, each with a single thread
– Memory is not shared: no crushed toes
– <casProcessors casPoolSize="3" processingUnitThreadCount="1">
– The overhead of the repeated JVM and pipeline has a cost, but it works.
• Many machines
– More memory, more cores
– Independence means the pipelines won’t miss being on the same machine
– Independent machines (cluster) are cheaper than integrated (Enki)
Hardware
• Local cluster (Colfax)
– A rack of machines with software (SGE) to integrate them
• Integrated CPUs (Enki)
– Much like a rack, but the motherboards are tied together and can share memory
– Gigabit Ethernet delivers on the order of 300 Mb/sec; a motherboard runs up to 4.8 GB/sec
• Virtual cluster
– Virtualization software allows a single machine to appear as many; offers flexibility and security
• Cloud
– A virtual cluster on the net: Amazon EC2
Hardware: CCP’s Colfax Cluster
• Runs Linux (Fedora/Red Hat)
• 6 machines (amc-colfax, amc-colfaxnd[1-5])
• 2 CPUs (Intel) per machine, 4 cores each: 48 cores total
• Intel motherboards
• 16 GB memory each, 96 GB total
• 5 TB shared (over NFS) disk array, RAID 5
• Named after the assembler: Colfax International
(Sun|Oracle) Grid Engine (SGE)
• Manages a queue of jobs, optimizing resource utilization
• Starts individual processes for a job
• Often used with the Message Passing Interface (MPI) for processes that cooperate
• Used here to start “array jobs”: each job processes a portion of a large array of work to be done.
SGE Job
– An SGE job is a script and a command line
– The command line specifies resources for scheduling: memory, and others
– The script is run once for each process started; it is not pure shell, but more or less a shell script (next slide)
– The job is assigned an ID number
More or less a shell script?
• Put these lines at the top for SGE:
– #$ -N stanford_out : standard out goes to a file with this prefix
– #$ -S /bin/bash : the shell to use (there is no “she-bang” line like #!/bin/sh)
– #$ -cwd : run from the current directory
– #$ -j y : merge stdout and stderr into one file
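Putting the four directives together, a minimal SGE job script might look like the sketch below; the job name matches the talk's example, but the body is illustrative only.

```shell
#!/bin/bash
# Minimal SGE array-job script sketch (the echo body is illustrative).
#$ -N stanford_out   # stdout goes to a file with this prefix
#$ -S /bin/bash      # shell SGE uses to run the script
#$ -cwd              # run from the directory qsub was invoked in
#$ -j y              # merge stderr into the stdout file

msg="running on $(hostname)"
echo "$msg"
```

SGE reads the `#$` lines itself; to the shell they are ordinary comments, so the same file runs unchanged outside the scheduler.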
Submit a Job: qsub
• qsub -t 1-200000:20000 sge_stanford_out.sh
– -t gives the index range: do array items from 1 to 200,000, stepping by 20,000: 10 processes
– Run them with the sge_stanford_out.sh script
• How does the script know what files to process?
– $SGE_TASK_ID (first file number for this task to run)
– $SGE_TASK_STEPSIZE
• For example, a task might see $SGE_TASK_ID=1 and $SGE_TASK_STEPSIZE=20000, covering items 1 through 20,000.
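The two environment variables above are enough for each task to compute its own slice of the work. A sketch, with defaults standing in for the values SGE would set (so it also runs outside the scheduler):

```shell
#!/bin/bash
# Turn $SGE_TASK_ID and $SGE_TASK_STEPSIZE into a document range.
# The defaults below imitate one task of "qsub -t 1-200000:20000";
# under SGE these variables are set by the scheduler.
: "${SGE_TASK_ID:=20001}"
: "${SGE_TASK_STEPSIZE:=20000}"

START=$SGE_TASK_ID
END=$(( SGE_TASK_ID + SGE_TASK_STEPSIZE - 1 ))
echo "task covers documents $START through $END"
```

Each of the 10 tasks gets a distinct `$SGE_TASK_ID` (1, 20001, 40001, ...), so the ranges tile the full 1-200,000 span without overlap.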
Sge_stanford_out.sh
• Will evolve into a generic UIMA job-submission script
• The script modifies a template CPE file, creating a CPE for each process
• Each CPE specifies the starting document number and the number of documents to process
• http://wikis.sun.com/display/gridengine62u2/How+to+Submit+an+Array+Job+From+the+Command+Line
[roederc@amc-colfax sge_scripts]$ qsub -t 1-50:3 sge_stanford_out.sh
Your job-array 130.1-50:3 ("stanford_out") has been submitted

[roederc@amc-colfax sge_scripts]$ qstat
job-ID prior   name       user    state submit/start at     queue                    slots ja-task-ID
-----------------------------------------------------------------------------------------------------
   130 0.00000 stanford_o roederc qw    11/02/2010 12:39:01                              1 1-49:3

[roederc@amc-colfax sge_scripts]$ qmon
[roederc@amc-colfax sge_scripts]$ qstat
job-ID prior   name       user    state submit/start at     queue                    slots ja-task-ID
-----------------------------------------------------------------------------------------------------
   130 0.55500 stanford_o roederc r     11/02/2010 12:39:10 [email protected]      1 4
   130 0.55500 stanford_o roederc r     11/02/2010 12:39:10 [email protected]      1 7
   130 0.55500 stanford_o roederc r     11/02/2010 12:39:10 [email protected]      1 10
   130 0.55500 stanford_o roederc r     11/02/2010 12:39:10 [email protected]      1 13
   130 0.55500 stanford_o roederc r     11/02/2010 12:39:10 [email protected]      1 16
   130 0.55500 stanford_o roederc r     11/02/2010 12:39:10 [email protected]      1 19
   130 0.55500 stanford_o roederc r     11/02/2010 12:39:10 [email protected]      1 22
   130 0.55500 stanford_o roederc r     11/02/2010 12:39:10 [email protected]      1 25
   130 0.55500 stanford_o roederc r     11/02/2010 12:39:10 [email protected]      1 28
   130 0.55500 stanford_o roederc r     11/02/2010 12:39:10 [email protected]      1 31
qdel command
• Use it to kill a job
• qdel <job num>
Failures?
• Q: What if a job fails? (A: it stops.)
• Open problem
– For now, that process dies, leaving unprocessed jobs
– Need to cull the unprocessed files and try again
– The usual cause is not enough memory
• Future: a db-driven collection reader with a CAS consumer that reports completion
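One way to "cull unprocessed files and try again" is to compare the input set against the outputs produced so far. A hedged sketch: the input/output directory layout and the .txt/.out naming are assumptions, not something the talk specifies.

```shell
#!/bin/bash
# Find inputs that have no corresponding output file, so a follow-up
# job can retry just those. The layout here (input/*.txt producing
# output/*.out) is an assumed convention for illustration.
tmp=$(mktemp -d)
mkdir -p "$tmp/input" "$tmp/output"
touch "$tmp/input/doc1.txt" "$tmp/input/doc2.txt"
touch "$tmp/output/doc1.out"          # doc2 was never processed

unprocessed=()
for f in "$tmp"/input/*.txt; do
  base=$(basename "$f" .txt)
  [ -e "$tmp/output/$base.out" ] || unprocessed+=("$f")
done
echo "retry: ${unprocessed[*]}"
```

The list of stragglers can then be fed to a second, smaller qsub run.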
Example 1: Distribute a Simple Script
• Distribute a simple script on the cluster: test_sge.sh
– qsub test_sge.sh : runs it once
– qsub -t 1-5:1 test_sge.sh : runs it five times
– qsub -t 100-500:100 test_sge.sh : also runs it five times, with index starts spaced by 100
Example 2: Run UIMA on the Cluster
• sge_stanford_out.sh calls a script with a template CPE and an index range: run_cpe_cluster_stanford_out.sh
– Modifies the CPE template, creating a CPE for each sub-range
– Sets up the environment, calls SimpleRunCPE (Java)
• Note temp_cpe_<n>.xml in ../desc/cpe
• Start a number of terminals and run “top” in each to see CPU and memory usage.
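The template-stamping step might look like the sketch below. The placeholder tokens (@START@, @COUNT@) and the parameter names inside the template are assumptions for illustration; the real CPE descriptor fields depend on the collection reader in use.

```shell
#!/bin/bash
# Sketch: stamp a per-task CPE from a template, as
# run_cpe_cluster_stanford_out.sh might. Placeholder names and
# template contents are hypothetical.
tmp=$(mktemp -d)
cat > "$tmp/cpe_template.xml" <<'EOF'
<casProcessors casPoolSize="3" processingUnitThreadCount="1">
  <param name="startDocument" value="@START@"/>
  <param name="numToProcess" value="@COUNT@"/>
</casProcessors>
EOF

START=20001      # would come from $SGE_TASK_ID
COUNT=20000      # would come from $SGE_TASK_STEPSIZE
sed -e "s/@START@/$START/" -e "s/@COUNT@/$COUNT/" \
    "$tmp/cpe_template.xml" > "$tmp/temp_cpe_${START}.xml"
```

Each task writes its own temp_cpe_<n>.xml, then hands it to SimpleRunCPE.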
Hadoop
• Inspired by Lisp’s map/reduce
• Map: apply a function to each element of a hash
• Reduce: combine hashes into one
• Known for optimizing by moving processing rather than data
• Similar code is used by Google; Hadoop is open source, used by Yahoo and Amazon
• Specialized interfaces make it more suited to greenfield development
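The map/reduce idea can be illustrated with a classic Unix pipeline word count, which is not Hadoop but shows the same shape: a per-element map, a shuffle that groups equal keys, and a reduce that combines each group.

```shell
# "Map": emit one word per line; "shuffle": sort groups equal keys
# together; "reduce": uniq -c combines each group into a count.
printf 'to be or not to be\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c
```

Hadoop runs the same three phases, but partitions the map and reduce work across machines and, where possible, moves the computation to the node that already holds the data.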
What about “The Cloud”?
• Amazon’s Elastic Compute Cloud (EC2) is a cluster on the internet that can be rented by the hour
• Very dynamic
– Set up nodes when you start using them
– Expect them to disappear when you stop
– Must have machine-configuration management sussed: you have to re-install everything
• Use S3 for long-term storage
• Starts at $0.10/hour
[Diagram: the Colfax Cluster (6 CPUs, 5TB disk array) compared with Enki (CPU, 8TB RAID).]