High Performance Computing: Concepts, Methods & Means
High Capacity (Throughput) Computing
Prof. Thomas Sterling, Department of Computer Science, Louisiana State University. January 25th, 2007
Topics
• Key terms and concepts
• Basic definitions
• Models of parallelism
• Speedup and overhead
• Capability computing & Unix utilities
• Condor: overview
• Condor: useful commands
• Performance issues in capacity computing
• Material for test
Key Terms and Concepts

[Figure: serial vs. parallel execution. Left: a problem represented as a series of instructions executed by a single CPU. Right: a problem partitioned into tasks whose instruction streams execute on multiple CPUs.]

• Conventional serial execution: the problem is represented as a series of instructions that are executed by one CPU.
• Parallel execution: the problem is partitioned into multiple executable parts that are mutually exclusive and collectively exhaustive, represented as a partially ordered set exhibiting concurrency.
• Parallel computing takes advantage of concurrency to:
  – solve larger problems within bounded time
  – save on wall-clock time
  – overcome memory constraints
  – utilize non-local resources
Key Terms and Concepts
• Speedup: relative reduction of execution time of a fixed-size workload through parallel execution.

  Speedup = (execution time on one processor) / (execution time on N processors)

• Efficiency: ratio of the actual performance to the best possible performance.

  Efficiency = (execution time on one processor) / (N × (execution time on N processors)) = Speedup / N
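The two definitions above can be checked with shell arithmetic. This is a minimal sketch; the numbers (2^20 steps on one processor, 2^12 steps on 256 processors) are taken from the ideal-speedup example later in these slides.

```shell
# Speedup = T(1)/T(N); Efficiency = Speedup/N.
# T(1) = 2^20 steps, T(N) = 2^12 steps, N = 2^8 processors.
T1=$(( 1 << 20 ))
TN=$(( 1 << 12 ))
N=$(( 1 << 8 ))
SPEEDUP=$(( T1 / TN ))
EFFICIENCY=$(( SPEEDUP / N ))
echo "speedup = $SPEEDUP, efficiency = $EFFICIENCY"   # speedup = 256, efficiency = 1
```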
Defining the 3 C’s …
• Main classes of computing:
  – High capacity parallel computing: a strategy for employing distributed computing resources to achieve high-throughput processing among decoupled tasks. Aggregate performance of the total system is high if sufficient tasks are available to be carried out concurrently on all separate processing elements. No single task is accelerated.
  – High capability parallel computing: a strategy for employing tightly coupled structures of computing resources to achieve reduced execution time of a given application through partitioning into concurrently executable tasks.
  – Cooperative computing: a strategy for employing a moderately coupled ensemble of computing resources to increase the size of the data set of a user application while limiting its execution time.
Defining the 3 C’s …
• High capacity computing systems emphasize the overall work performed over a fixed time period. Work is defined as the aggregate amount of computation performed across all functional units, all threads, all cores, all chips, all coprocessors and network interface cards in the system.
• High capability computing systems emphasize improvement (reduction) in execution time of a single user application program of fixed data set size.

Adapted from: S. Chaudhry, P. Caprioli, S. Yip, M. Tremblay, "High-Performance Throughput Computing," IEEE Micro, 2005. doi.ieeecomputersociety.org
Models of Parallel Processing
• Conventional models of parallel processing:
  – Decoupled work queue (covered in Segment 1)
  – Shared memory multiple thread (covered in Segment 2)
  – Communicating Sequential Processes (CSP, message passing; covered in Segment 3)
• Alternative models of parallel processing:
  – SIMD: single instruction stream, multiple data stream processor array
  – Vector machines: hardware execution of value sequences to exploit pipelining
  – Systolic: an interconnection of basic arithmetic units matched to the algorithm
  – Dataflow: data-precedence-constrained, self-synchronizing, fine-grain execution units supporting functional (single assignment) execution
Decoupled Work Queue Model
• Concurrent disjoint tasks
• Parametric studies: SPMD (single program multiple data)
• Very coarse grained
• Example software package: Condor
• Processor farms and clusters
• The last part of Segment 1 covers this model of parallelism
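The decoupled work-queue pattern can be sketched with standard Unix tools before reaching for Condor: `xargs -P` farms out independent tasks to a bounded number of concurrent workers. The task command below is a placeholder for a real application binary.

```shell
# Eight independent "tasks" (parameter values 1..8) processed by at
# most 4 concurrent workers. Each task is disjoint: no communication.
printf '%s\n' 1 2 3 4 5 6 7 8 |
  xargs -n 1 -P 4 sh -c 'echo "task $0 done"'
```

Completion order is nondeterministic, which is the point of the model: no task depends on another, so aggregate throughput is what matters.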
Shared Memory Multiple Thread
• Static or dynamic
• Fine grained
• OpenMP
• Distributed shared memory systems
• Covered in Segment 2

[Figure: two diagrams, each showing CPU 1, CPU 2, CPU 3 and their memories joined by a network: a Symmetric Multiprocessor (SMP, usually cache coherent) and a Distributed Shared Memory system (DSM, often not cache coherent).]
Caches and Cache Coherence (sidebar discussion)
• Caches are part of the memory hierarchy
• They match high processor demand to high-capacity, long-access-time main memory
• Caches are low capacity but have (relatively) short access times
• Caches hold temporary copies of data in memory, but they can’t hold all of it at any one time
• Processors may share memory, but caches are private
• Cache coherence keeps copies of the same data consistent across the caches of different processors
• This is tough to do, slows things down, and doesn’t scale very well
Shared Memory Multiple Thread Model
• Hardware view: all system memory is directly accessible from all processing elements, possibly with cache coherence. Instruction streams are performed concurrently, possibly switching contexts for shared resources.
• Software view: a flow of control in a process that can have concurrent execution paths. Threads share the same address space and have self-contained state information. Synchronization is achieved through shared variables in memory.
• Advantages:
  – Threads are inexpensive to create, represent and destroy.
  – It is relatively faster to switch context between threads than between processes.
  – Better resource management due to shared address space utilization.
• Disadvantages:
  – Cache-coherent symmetric multiprocessor systems are not scalable.
  – Scalable systems are non-uniform memory access (NUMA).
  – A system call made through a thread can potentially block the whole process, degrading CPU utilization.
• More in-depth coverage of this topic in Segment 2 of the course.
Communicating Sequential Processes
• One process is assigned to each processor
• Work done by a processor is performed on its local data
• Data values are exchanged by messages
• Synchronization constructs for interprocess coordination
• Distributed memory
• Coarse grained
• MPI
• Clusters and MPPs (MPP is an acronym for “Massively Parallel Processor”)
• Covered in Segment 3

[Figure: CPU 1, CPU 2, CPU 3, each with private memory, connected by a network: Distributed Memory (DM, often not cache coherent).]
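The message-exchange idea can be sketched with plain Unix processes and a named pipe standing in for an MPI-style send/receive. This is illustrative only: the pipe pathname and the computed value are made up, and real CSP programs would use MPI between ranks.

```shell
# Two processes exchange a data value through a named pipe.
# The background "worker" computes on its local data and sends the
# result; the foreground "master" blocks until the message arrives.
fifo=$(mktemp -u)                  # pathname for the pipe (hypothetical)
mkfifo "$fifo"
( echo $(( 6 * 7 )) > "$fifo" ) &  # worker: local computation, then send
read -r result < "$fifo"           # master: receive (blocks until sent)
rm -f "$fifo"
echo "received: $result"           # prints "received: 42"
```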
Ideal Speedup Example

[Figure: a workload of W = 2^20 steps is partitioned into 2^10 tasks w_1 … w_1024 of 2^10 steps each; P = 2^8 processors each execute 2^12 steps.]

W = Σ_i w_i  (units: steps)
T(1) = 2^20
T(2^8) = 2^20 / 2^8 = 2^12
Speedup = T(1) / T(2^8) = 2^20 / 2^12 = 2^8 = 256
Efficiency = Speedup / 2^8 = 2^8 / 2^8 = 1
Ideal Speedup Issues
• W is the total workload measured in elemental pieces of work (e.g. operations, instructions, etc.)
• T(p) is the total execution time measured in elemental time steps (e.g. clock cycles), where p is the number of execution sites (e.g. processors, threads)
• w_i is the work for a given task i
• Example: here we divide a million (really mega, 2^20) operation workload W into a thousand (2^10) tasks, w_1 to w_1024, each of 1 K (2^10) operations
• Assume 256 processors performing the workload in parallel
• T(256) = 4096 steps, speedup = 256, efficiency = 1
Granularities in Parallelism

Overhead
• The additional work that needs to be performed in order to manage the parallel resources and the concurrent abstract tasks, and that is on the critical time path.

Coarse grained
• Decompose the problem into large independent tasks; usually there is no communication between the tasks. Also defined as a class of parallelism where “relatively large amounts of computational work are done between communication events”.

Fine grained
• Decompose the problem into smaller interdependent tasks; these tasks are usually communication intensive. Also defined as a class of parallelism where “relatively small amounts of computational work are done between communication events”. (www.llnl.gov/computing/tutorials/parallel_comp)

[Figure: execution timelines contrasting coarse-grained parallelism (long computation phases separated by little overhead) with fine-grained parallelism (short computation phases separated by frequent overhead).]

Images adapted from: http://www.mhpcc.edu/training/workshop/parallel_intro/
Overhead

[Figure: a workload split into equal tasks, each incurring overhead v before its work w.]

Assumption: the workload is infinitely divisible.

W = Σ_{i=1}^{P} w_i
w_i = W / P  (each of the P processors receives an equal share of the work)
T_1 = W
T_P = v + W/P  (each task takes v + w time steps)
S = T_1 / T_P = W / (v + W/P) = P / (1 + P·v/W)

v = overhead
w = work unit
W = total work
T_i = execution time with i processors
P = number of processors
Overhead
• Overhead: additional critical-path (in time) work required to manage parallel resources and concurrent tasks that would not be necessary for purely sequential execution
• V is the total overhead of the workload execution
• v_i is the overhead for individual task w_i
• Each task takes v + w time steps to complete
• Overhead imposes an upper bound on scalability
Scalability & Overhead

J = # tasks = W / w_g
T_1 = W
T_P = (J / P)(w_g + v)
S = T_1 / T_P = W / ((J/P)(w_g + v)) = P / (1 + v/w_g)
S ≈ P when w_g ≫ v

v = overhead
w_g = work unit (task grain size)
W = total work
T_i = execution time with i processors
P = number of processors
J = number of tasks
Scalability and Overhead for Fixed-Size Work Tasks
• W is divided into J tasks of size w_g
• Each task requires v overhead work to manage
• For P processors there are approximately J/P tasks to be performed in sequence, so
• T_P = J(w_g + v)/P
• Note that S = T_1 / T_P
• So S = P / (1 + v/w_g)
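Plugging numbers into S = P / (1 + v/w_g) shows how the overhead-to-grain ratio bounds speedup. The P, v and w_g values below are made up for illustration.

```shell
# S = P / (1 + v/w_g) for P = 256 processors.
# Small relative overhead (v = 10, w_g = 1000): S stays near P.
awk -v P=256 -v v=10 -v wg=1000 'BEGIN { printf "S = %.1f\n", P / (1 + v/wg) }'
# Overhead equal to the grain size (v = w_g = 100): S is halved.
awk -v P=256 -v v=100 -v wg=100 'BEGIN { printf "S = %.1f\n", P / (1 + v/wg) }'
```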
Capacity Computing with Basic Unix Tools
• Combinations of common Unix utilities such as ssh, scp, rsh and rcp can be used to create jobs remotely. (To get more information about these commands, try man ssh, man scp, man rsh, man rcp in any Unix shell.)
• For small workloads it can be convenient to translate the execution of the program into a simple shell script.
• Relying on simple Unix utilities poses several application-management constraints for cases such as:
  – aborting started jobs
  – querying for free machines
  – querying for job status
  – retrieving job results
  – etc.
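A minimal sketch of this approach, assuming passwordless ssh to hypothetical hosts node01..node03 and a hypothetical application ./myapp; DRY_RUN=1 prints the commands instead of executing them.

```shell
# Launch one task per remote host with plain ssh, in parallel.
# Host names and the application command are illustrative.
DRY_RUN=1
for host in node01 node02 node03; do
  cmd="ssh $host ./myapp --input data.$host"
  if [ "$DRY_RUN" = 1 ]; then
    echo "$cmd"                    # show what would be run
  else
    $cmd &                         # launch the remote task in the background
  fi
done
wait                               # block until all launched tasks finish
```

Note everything the script does not do: no queue, no status query, no retry, no result collection. That is exactly the management gap listed above, and it is what a workload manager such as Condor fills.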
Demo 1
Using Unix utilities to execute capacity computing
BOINC, SETI@Home
• BOINC (Berkeley Open Infrastructure for Network Computing)
• Open-source software that enables distributed coarse-grained computations over the Internet
• Follows the master-worker model; in BOINC, no communication takes place among the worker nodes
• Projects: SETI@Home, Einstein@Home, climate prediction, and many more…
Management Middleware: Condor
• Designed, developed and maintained at the University of Wisconsin-Madison by a team led by Miron Livny
• Condor is a versatile workload management system for managing a pool of distributed computing resources to provide high capacity computing.
• Assists job management by providing mechanisms for job queuing, scheduling and priority management, and tools that facilitate utilization of resources across Condor pools
• Condor also enables resource management by providing monitoring utilities, authentication & authorization mechanisms, Condor pool management utilities, and support for Grid computing middleware such as Globus.
• Condor components: ClassAds, Matchmaker, Problem Solvers
Condor Components: ClassAds
• The ClassAds (Classified Advertisements) concept is very similar to newspaper classifieds, where buyers and sellers advertise their products using abstract yet uniquely defining named expressions (example: used car sales).
• The ClassAds language in Condor provides a well-defined means of describing the user job and the end resources (storage / computational) so that the Condor MatchMaker can match the job with the appropriate pool of resources.

Source: Douglas Thain, Todd Tannenbaum, and Miron Livny, "Distributed Computing in Practice: The Condor Experience," Concurrency and Computation: Practice and Experience, Vol. 17, No. 2-4, pp. 323-356, February-April 2005. http://www.cs.wisc.edu/condor/doc/condor-practice.pdf
Job ClassAd & Machine ClassAd

[Figure: side-by-side example listings of a job ClassAd and a machine ClassAd.]
Condor MatchMaker
• The MatchMaker, a crucial part of the Condor architecture, uses the job-description ClassAd provided by the user and matches the job to the best resource based on the machine-description ClassAd.
• Matchmaking in Condor is performed in 4 steps:
  1. The job agent (A) and resources (R) advertise themselves.
  2. The matchmaker (M) processes the known ClassAds and generates pairs that best match resources and jobs.
  3. The matchmaker informs each party of the job-resource pair of their prospective match.
  4. The job agent and the resource establish a connection for further processing. (The matchmaker plays no role in this step, thus ensuring separation between selection of resources and subsequent activities.)
Condor Problem Solvers
• Master-Worker (MW) is a problem-solving system that is useful for solving coarse-grained problems of indeterminate size, such as parameter sweeps. The MW solver in Condor consists of 3 main components: a work-list, a tracking module, and a steering module. The work-list keeps track of all pending work the master needs done. The tracking module monitors the progress of work currently in progress on the worker nodes. The steering module directs the computation based on the results gathered and the pending work-list, and communicates with the matchmaker to obtain additional worker processes.
• DAGMan is used to execute multiple jobs that have dependencies represented as a Directed Acyclic Graph, where nodes correspond to jobs and edges correspond to dependencies between the jobs. DAGMan provides various functionalities for job monitoring and fault tolerance via the creation of rescue DAGs.

[Figure: a master process coordinating worker processes w_1 … w_N.]
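A DAGMan input file is a plain-text description of the jobs and their dependencies. The sketch below is a hypothetical "diamond" DAG; the job names and submit-file names are illustrative, not from the slides.

```
# diamond.dag: B and C depend on A; D depends on both B and C.
JOB A a.submit
JOB B b.submit
JOB C c.submit
JOB D d.submit
PARENT A CHILD B C
PARENT B C CHILD D
```

It would be run with condor_submit_dag diamond.dag; if a node fails, DAGMan writes a rescue DAG that can be resubmitted to resume from the point of failure.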
Core Components of Condor
• condor_master: this program runs constantly and ensures that all other parts of Condor are running. If they hang or crash, it restarts them.
• condor_collector: part of the Condor central manager. It collects information about all computers in the pool as well as which users want to run jobs. It is what normally responds to the condor_status command. It is not running on your computer, but on the main Condor pool host (the Celeritas head node).
• condor_negotiator: part of the Condor central manager. It decides what jobs should be run where. It is not running on your computer, but on the main Condor pool host (the Celeritas head node).
• condor_startd: if this program is running, it allows jobs to be started up on this computer; that is, the machine is an "execute machine". It advertises the machine to the central manager so that the manager knows about it. It starts up the jobs that run.
• condor_schedd: if this program is running, it allows jobs to be submitted from this computer; that is, the machine is a "submit machine". It advertises jobs to the central manager so that the manager knows about them. It contacts a condor_startd on other execute machines for each job that needs to be started.
• condor_shadow: for each job that has been submitted from this computer, there is one condor_shadow running. It watches over the job as it runs remotely and in some cases provides assistance. You may or may not see any condor_shadow processes running, depending on what is happening on the computer when you try it out.

Source: http://www.cs.wisc.edu/condor/tutorials/cw2005-condor/intro.html
Condor: A Walkthrough of Condor Commands
• condor_status: provides current pool status
• condor_q: provides the current job queue
• condor_submit: submits a job to the Condor pool
• condor_rm: deletes a job from the job queue
What machines are available? (condor_status)

condor_status queries resource information sources and provides the current status of the Condor pool of resources.

Some common condor_status command-line options:
  -help: displays usage information
  -avail: queries condor_startd ads and prints information about available resources
  -claimed: queries condor_startd ads and prints information about claimed resources
  -ckptsrvr: queries condor_ckpt_server ads and displays checkpoint server attributes
  -pool hostname: queries the specified central manager (by default queries $COLLECTOR_HOST)
  -verbose: displays entire ClassAds
For more options and what they do, run “condor_status -help”.
condor_status: Resource States
• Owner: the machine is currently being utilized by a user. The machine is unavailable for jobs submitted by Condor until the current user job completes.
• Claimed: Condor has selected the machine for use by other users.
• Unclaimed: the machine is unused and is available for selection by Condor.
• Matched: the machine is in a transition state between unclaimed and claimed.
• Preempting: the machine is currently vacating the resource to make it available to Condor.
Example: condor_status

```
[cdekate@celeritas ~]$ condor_status

Name          OpSys      Arch   State     Activity  LoadAv  Mem   ActvtyTime

vm1@compute-0 LINUX      X86_64 Unclaimed Idle      0.000   1964  3+13:42:23
vm2@compute-0 LINUX      X86_64 Unclaimed Idle      0.000   1964  3+13:42:24
vm3@compute-0 LINUX      X86_64 Unclaimed Idle      0.010   1964  0+00:45:06
vm4@compute-0 LINUX      X86_64 Owner     Idle      1.000   1964  0+00:00:07
vm1@compute-0 LINUX      X86_64 Unclaimed Idle      0.000   1964  3+13:42:25
vm2@compute-0 LINUX      X86_64 Unclaimed Idle      0.000   1964  1+09:05:58
vm3@compute-0 LINUX      X86_64 Unclaimed Idle      0.000   1964  3+13:37:27
vm4@compute-0 LINUX      X86_64 Unclaimed Idle      0.000   1964  0+00:05:07
...
vm3@compute-0 LINUX      X86_64 Unclaimed Idle      0.000   1964  3+13:42:33
vm4@compute-0 LINUX      X86_64 Unclaimed Idle      0.000   1964  3+13:42:34

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    32     3       0        29       0          0        0

               Total    32     3       0        29       0          0        0
```
What jobs are currently in the queue? (condor_q)
• condor_q provides a list of jobs that have been submitted to the Condor pool.
• It provides details about jobs, including which cluster the job is running on, the owner of the job, memory consumption, the name of the executable being processed, the current state of the job, when the job was submitted, and how long the job has been running.

Some common condor_q command-line options:
  -global: queries all job queues in the pool
  -name: queries based on the schedd name; provides a queue listing of the named schedd
  -claimed: queries condor_startd ads and prints information about claimed resources
  -goodput: displays job goodput statistics (“goodput is the allocation time when an application uses a remote workstation to make forward progress.” - Condor Manual)
  -cputime: displays the remote CPU time accumulated by the job to date
For more options run “condor_q -help”.
Example: condor_q

```
[cdekate@celeritas ~]$ condor_q

-- Submitter: celeritas.cct.lsu.edu : <130.39.128.68:40472> : celeritas.cct.lsu.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  30.0   cdekate         1/23 07:52   0+00:01:13 R  0   9.8  fib 100
  30.1   cdekate         1/23 07:52   0+00:01:09 R  0   9.8  fib 100
  30.2   cdekate         1/23 07:52   0+00:01:07 R  0   9.8  fib 100
  30.3   cdekate         1/23 07:52   0+00:01:11 R  0   9.8  fib 100
  30.4   cdekate         1/23 07:52   0+00:01:05 R  0   9.8  fib 100

5 jobs; 0 idle, 5 running, 0 held
[cdekate@celeritas ~]$
```
How to submit your job? (condor_submit)
• Create a job ClassAd (Condor submit file) that contains Condor keywords and user-configured values for the keywords.
• Submit the job ClassAd using condor_submit. Example:
  condor_submit matrix.submit
• condor_submit -h lists additional flags:

```
[cdekate@celeritas NPB3.2-MPI]$ condor_submit -h
Usage: condor_submit [options] [cmdfile]
  Valid options:
  -verbose              verbose output
  -name <name>          submit to the specified schedd
  -remote <name>        submit to the specified remote schedd (implies -spool)
  -append <line>        add line to submit file before processing
                        (overrides submit file; multiple -a lines ok)
  -disable              disable file permission checks
  -spool                spool all files to the schedd
  -password <password>  specify password to MyProxy server
  -pool <host>          Use host as the central manager to query
  If [cmdfile] is omitted, input is read from stdin
```
condor_submit: Example

```
[cdekate@celeritas ~]$ condor_submit fib.submit
Submitting job(s).....
Logging submit event(s).....
5 job(s) submitted to cluster 35.
[cdekate@celeritas ~]$ condor_q

-- Submitter: celeritas.cct.lsu.edu : <130.39.128.68:51675> : celeritas.cct.lsu.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  35.0   cdekate         1/24 15:06   0+00:00:00 I  0   9.8  fib 10
  35.1   cdekate         1/24 15:06   0+00:00:00 I  0   9.8  fib 15
  35.2   cdekate         1/24 15:06   0+00:00:00 I  0   9.8  fib 20
  35.3   cdekate         1/24 15:06   0+00:00:00 I  0   9.8  fib 25
  35.4   cdekate         1/24 15:06   0+00:00:00 I  0   9.8  fib 30

5 jobs; 5 idle, 0 running, 0 held
[cdekate@celeritas ~]$
```
How to delete a submitted job? (condor_rm)
• condor_rm deletes one or more jobs from the Condor job pool. If a particular Condor pool is specified as one of the arguments, then the condor_schedd matching the specification is contacted for job deletion; otherwise the local condor_schedd is contacted.

```
[cdekate@celeritas ~]$ condor_rm -h
Usage: condor_rm [options] [constraints]
 where [options] is zero or more of:
  -help               Display this message and exit
  -version            Display version information and exit
  -name schedd_name   Connect to the given schedd
  -pool hostname      Use the given central manager to find daemons
  -addr <ip:port>     Connect directly to the given "sinful string"
  -reason reason      Use the given RemoveReason
  -forcex             Force the immediate local removal of jobs in the X state
                      (only affects jobs already being removed)
 and where [constraints] is one or more of:
  cluster.proc        Remove the given job
  cluster             Remove the given cluster of jobs
  user                Remove all jobs owned by user
  -constraint expr    Remove all jobs matching the boolean expression
  -all                Remove all jobs (cannot be used with other constraints)
[cdekate@celeritas ~]$
```
![Page 45: High Performance Computing: Concepts, Methods & Means High Capacity (Throughput) Computing](https://reader036.fdocuments.in/reader036/viewer/2022062521/5681681b550346895ddda9b6/html5/thumbnails/45.jpg)
condor_rm : Example

```
[cdekate@celeritas ~]$ condor_q

-- Submitter: celeritas.cct.lsu.edu : <130.39.128.68:51675> : celeritas.cct.lsu.edu
 ID      OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
  41.0   cdekate  1/24 15:43   0+00:00:03 R  0   9.8  fib 100
  41.1   cdekate  1/24 15:43   0+00:00:01 R  0   9.8  fib 150
  41.2   cdekate  1/24 15:43   0+00:00:00 R  0   9.8  fib 200
  41.3   cdekate  1/24 15:43   0+00:00:00 R  0   9.8  fib 250
  41.4   cdekate  1/24 15:43   0+00:00:00 R  0   9.8  fib 300

5 jobs; 0 idle, 5 running, 0 held
[cdekate@celeritas ~]$ condor_rm 41.4
Job 41.4 marked for removal
[cdekate@celeritas ~]$ condor_rm 41
Cluster 41 has been marked for removal.
[cdekate@celeritas ~]$
```
![Page 46: High Performance Computing: Concepts, Methods & Means High Capacity (Throughput) Computing](https://reader036.fdocuments.in/reader036/viewer/2022062521/5681681b550346895ddda9b6/html5/thumbnails/46.jpg)
Creating a Condor submit file (a job ClassAd)

• A Condor submit file contains key-value pairs that describe the application to Condor.
• Condor submit files are job ClassAds.
• Some of the common descriptions found in job ClassAds are:

```
executable = (path to the executable to run under Condor)
input      = (file providing standard input)
output     = (file in which standard output is stored)
log        = (file to which the job log is written)
arguments  = (arguments to be supplied to the executable)
queue
```
![Page 47: High Performance Computing: Concepts, Methods & Means High Capacity (Throughput) Computing](https://reader036.fdocuments.in/reader036/viewer/2022062521/5681681b550346895ddda9b6/html5/thumbnails/47.jpg)
DEMO 2 : Steps involved in running a job on Condor

1. Creating a Condor submit file
2. Submitting the Condor submit file to a Condor pool
3. Checking the current state of a submitted job
4. Job status notification
![Page 48: High Performance Computing: Concepts, Methods & Means High Capacity (Throughput) Computing](https://reader036.fdocuments.in/reader036/viewer/2022062521/5681681b550346895ddda9b6/html5/thumbnails/48.jpg)
Condor Usage Statistics
![Page 49: High Performance Computing: Concepts, Methods & Means High Capacity (Throughput) Computing](https://reader036.fdocuments.in/reader036/viewer/2022062521/5681681b550346895ddda9b6/html5/thumbnails/49.jpg)
Montage workload implemented and executed using Condor (Source : Dr. Dan Katz)

• Mosaicking astronomical images:
  – Powerful telescopes taking high-resolution (and highest-zoom) pictures of the sky can cover only a small region over time.
  – The problem solved in this project is "stitching" these images together to make a high-resolution, zoomed-in snapshot of the sky.
  – Aggregate requirements of 140,000 CPU-hours (~16 years on a single machine), with output on the order of 6 terabytes.
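The "~16 years on a single machine" figure follows directly from the 140,000 CPU-hour total; a quick check of the arithmetic:

```python
# 140,000 CPU-hours executed on a single CPU, expressed in years
cpu_hours = 140_000
hours_per_year = 24 * 365              # 8,760 hours in a (non-leap) year
years_single_machine = cpu_hours / hours_per_year
print(round(years_single_machine, 1))  # prints 16.0
```

Spread across a large Condor pool, the same workload completes in days, which is exactly the capacity-computing trade the lecture describes.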
Example DAG for 10 input files

[Figure: the Montage DAG combines Montage compute nodes (mProject, mDiff, mFitPlane, mConcatFit, mBgModel, mBackground, mAdd) with data stage-in nodes, data stage-out nodes, and registration nodes. Pegasus maps the abstract workflow to an executable form, using Grid Information Systems (information about available resources and data location) and MyProxy (the user's grid credentials); Condor DAGMan then executes the workflow on the Grid. http://pegasus.isi.edu/]
![Page 50: High Performance Computing: Concepts, Methods & Means High Capacity (Throughput) Computing](https://reader036.fdocuments.in/reader036/viewer/2022062521/5681681b550346895ddda9b6/html5/thumbnails/50.jpg)
Montage Use By IPHAS: The INT/WFC Photometric H-alpha Survey of the Northern Galactic Plane (Source : Dr. Dan Katz)

[Figure: IPHAS mosaics of the supernova remnant S147, nebulosity in the vicinity of HII region IC 1396B in Cepheus, and the Crescent Nebula NGC 6888, used to study extreme phases of stellar evolution that involve very large mass loss.]
![Page 51: High Performance Computing: Concepts, Methods & Means High Capacity (Throughput) Computing](https://reader036.fdocuments.in/reader036/viewer/2022062521/5681681b550346895ddda9b6/html5/thumbnails/51.jpg)
Topics
• Key terms and concepts
• Basic definitions
• Models of parallelism
• Speedup and Overhead
• Capability Computing & Unix utilities
• Condor : Overview
• Condor : Useful commands
• Performance Issues in Capacity Computing
• Material for Test
![Page 52: High Performance Computing: Concepts, Methods & Means High Capacity (Throughput) Computing](https://reader036.fdocuments.in/reader036/viewer/2022062521/5681681b550346895ddda9b6/html5/thumbnails/52.jpg)
Capacity Computing Performance Issues

• Throughput computing
  – Performance measured as total workload performed over time to complete
• Overhead factors
  – Start-up time
  – Input data distribution
  – Output result data collection
  – Termination time
  – No task coupling or inter-task coordination overhead
• Starvation
  – Insufficient work to keep all processors busy
  – Inadequate parallelism of coarse-grained task parallelism
  – Poor or uneven load distribution
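These factors can be illustrated with a toy model (the numbers are hypothetical, not from the lecture): total time is start-up overhead plus the busiest worker's share of the tasks plus result collection, so uneven load distribution starves some processors and caps the speedup.

```python
def capacity_time(task_times, workers, startup=2.0, collect=1.0):
    """Toy model: deal tasks round-robin to workers; total time is
    startup overhead + the busiest worker's load + result collection."""
    loads = [0.0] * workers
    for i, t in enumerate(task_times):
        loads[i % workers] += t
    return startup + max(loads) + collect

tasks = [4.0] * 16                               # 16 equal tasks, 4 s each
serial = sum(tasks)                              # 64 s with no overhead
even = capacity_time(tasks, workers=4)           # 2 + 16 + 1 = 19 s
skewed = capacity_time([4.0] * 12 + [16.0] * 4, workers=4)  # busiest: 28 s
print(serial / even)                             # 64/19 ~ 3.37, below the ideal 4x
```

Even with perfectly balanced tasks the overhead terms keep the speedup below 4; with the skewed workload the finish time stretches to 31 s while three workers sit idle, which is exactly the starvation effect listed above.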
![Page 53: High Performance Computing: Concepts, Methods & Means High Capacity (Throughput) Computing](https://reader036.fdocuments.in/reader036/viewer/2022062521/5681681b550346895ddda9b6/html5/thumbnails/53.jpg)
Topics
• Key terms and concepts
• Basic definitions
• Models of parallelism
• Speedup and Overhead
• Capability Computing & Unix utilities
• Condor : Overview
• Condor : Useful commands
• Performance Issues in Capacity Computing
• Material for Test
![Page 54: High Performance Computing: Concepts, Methods & Means High Capacity (Throughput) Computing](https://reader036.fdocuments.in/reader036/viewer/2022062521/5681681b550346895ddda9b6/html5/thumbnails/54.jpg)
Summary : Material for the Test

• Understand material on slides (4, 5), (7, 8)
• Understand the example detailed in slides 17, 18
• Understand (19) and be able to derive (20, 21), (22, 23)
• Understand Condor concepts detailed in slides 30, 31, 32
• Condor commands (37-47): know what the basic commands are, what they do, and how to interpret the output they present (no need to memorize command-line options)
• Understand issues listed on slide 53
• Required reading materials:
  – http://www.cct.lsu.edu/~cdekate/7600/beowulf-chapter-rev1.pdf
  – Specific pages to focus on: 3-16