Introduction to Supercomputing at ARSC
Transcript of Introduction to Supercomputing at ARSC
Introduction to Supercomputing at ARSC
Kate Hedstrom,
Arctic Region Supercomputing Center (ARSC)
January 2004
Topics
• Introduction to Supercomputers at ARSC
– Computers
• Accounts
– Getting an account
– Kerberos
– Getting help
• Architectures of parallel computers
– Programming models
• Running Jobs
– Compilers
– Storage
– Interactive and batch
Introduction to ARSC Supercomputers
• They’re all Parallel Computers
• Three Classes:
– Shared Memory
– Distributed Memory
– Distributed & Shared Memory
Cray X1: klondike
• 128 MSPs
• 4 MSPs/node
• 4 Vector CPUs/MSP, 800 MHz
• 512 GB Total
• 21 TB Disk
• 1600 GFLOPS peak
• NAC required
Cray SX-6: rime
• 8 500 MHz NEC Vector CPUs
• 64 GB of shared memory
• 1 TB RAID-5 Disk
• 64 GFLOPS peak
• Only one in the USA
• On loan from Cray
• Non-NAC
Cray SV1ex: chilkoot
• 32 Vector CPUs, 500 MHz
• 32 GB Shared memory
• 2 TB Disk
• 64 GFLOPS peak
• NAC required
Cray T3E: yukon
• 272 CPUs, 450 MHz
• 256 MB per processor
• 69.6 GB total distributed memory
• 230 GFLOPS peak
• NAC required
IBM Power4: iceberg
• 2 nodes of 32 p690+s, 1.7 GHz (2 cabinets), 256 GB each
• 92 nodes of 8 p655+s, 1.5 GHz (6 cabinets)
• 6 nodes of 8 p655s, 1.1 GHz (1 cabinet)
• 16 GB Memory/Node
• 22 TB Disk
• 5000 GFLOPS
• NAC required
IBM Regatta: iceflyer
• 8-way, 16 GB front end coming soon
• 32 1.7 GHz Power4 CPUs in
– 24-way SMP node
– 7-way interactive node
– 1 test node
– 32-way SMP node soon
• 256 GB Memory
• 217 GFLOPS
• Non-NAC
IBM SP Power3: icehawk
• 50 4-Way SMP Nodes => 200 CPUs, 375 MHz
• 2 GB Memory/Node
• 36 GB Disk/Node
• 264 GFLOPS peak for 176 CPUs (max per job)
• Leaving soon
• NAC required
Storing Files
• Robotic tape silos
• Two Sun storage servers
• Nanook
– Non-NAC systems
• Seawolf
– NAC systems
Accounts, Logging In
• Getting an Account/Project
• Doing a NAC
• Logging in with Kerberos
Getting an Account/Project
• Academic: Applicant for resources is a PI
– Full-time faculty or staff research person
– Non-commercial work, must reside in USA
– PI may add users to their project
– http://www.arsc.edu/support/accounts/acquire.html
• DoD Applicant
– http://www.hpcmo.hpc.mil/Htdocs/SAAA
• Commercial, Federal, State
– Contact the User Services Director
– Barbara Horner-Miller, [email protected]
– Academic guidelines apply
Doing a National Agency Check (NAC)
• Required for HPCMO Resources only
– Not required for workstations, Cray SX-6, or IBM Regatta
• Not a security clearance
– But there are detailed questions covering the last 5-7 years
• Electronic Personnel Security Questionnaire (EPSQ)
– Windows-only software
• Fill out the EPSQ cover sheet
– http://www.arsc.edu/support/policy/pdf/OPM_Cover.pdf
• Fingerprinting, Proof of Citizenship (passport, visa, etc.)
– See http://www.arsc.edu/support/policy/accesspolicy.html
Logging in with Kerberos
• On non-ARSC systems, download the Kerberos 5 client
– http://www.arsc.edu/support/howtos/krbclients.html
• Used with SecureID
– Uses a PIN to generate a key at login time
• Login requires user name, pass phrase, & key
– Don’t share your PIN or SecureID with anyone
• Foreign Nationals or others with problems
– Contact ARSC to use ssh to connect to an ARSC gateway
– Still need Kerberos & SecureID after connecting
SecureID
From ARSC System
• Enter username
• Enter <return> for principal
• Enter pass phrase
• Enter SecureID passcode
• From that system: ssh iceflyer
• ssh handles X11 handshaking
From Your System
• Get the Kerberos clients installed
• Get a ticket:
  kinit [email protected]
• See your tickets:
  klist
• Log into an ARSC system:
  krlogin -l username iceflyer
  ssh -l username iceflyer
  ktelnet -l username iceflyer
Rime and Rimegate
• Log into rimegate as usual, with your rimegate username (arscxxx):
  ssh -l arscxxx rimegate
• Compile on rimegate (sxf90, sxc++)
• Log into rime from rimegate:
  ssh rime
• Rimegate $HOME is /rimegate/users/username on rime
Supercomputer Architectures
• They’re all Parallel Computers
• Three Classes:
– Shared Memory
– Distributed Memory
– Distributed & Shared Memory
Shared Memory Architecture: Cray SV1, SX-6, IBM Regatta
Distributed Memory Architecture: Cray T3E
Cluster Architecture: IBM iceberg, icehawk, Cray X1
• Scalable, distributed, shared-memory parallel processor
Programming Models
• Vector Processing
– compiler detection or manual directives
• Threaded Processing (SMP)
– OpenMP, Pthreads, Java threads
– shared memory only
• Distributed Processing (MPP)
– message passing with MPI
– shared or distributed memory
Vector Programming
• Vector CPUs are specialized for array/matrix operations
– 64-element (SV1, X1), 256-element (SX-6) Vector Registers
– Operations proceed assembly-line fashion
– High memory-to-CPU bandwidth
• Less CPU time wasted waiting for data from memory
– Once loaded, produces one result per clock cycle
• Compiler does a lot of the work
Vector Programming
• Codes will run without modification.
• Cray compilers automatically detect loops which are safe to vectorize.
• Request a listing file to find out what vectorized.
• Programmer can assist the compiler:
– Directives and pragmas can force vectorization
– Eliminate conditions which inhibit vectorization (e.g., subroutine calls and data dependencies in loops)
Threaded Programming on Shared-Memory Systems
• OpenMP
– Directives/pragmas added to serial programs
– A portable standard implemented on Cray (one node), SGI, IBM (one node), etc.
• Other Threaded Paradigms
– Java Threads
– Pthreads
OpenMP Fortran Example

!$omp parallel do
do n = 1,10000
   A(n) = x * B(n) + c
end do

On 2 CPUs, this directive divides the work as follows:

CPU 1:
do n = 1,5000
   A(n) = x * B(n) + c
end do

CPU 2:
do n = 5001,10000
   A(n) = x * B(n) + c
end do
OpenMP C Example
#pragma omp parallel for
for (n = 0; n < 10000; n++)
   A[n] = x * B[n] + c;

On 2 CPUs, this pragma divides the work as follows:

CPU 1:
for (n = 0; n < 5000; n++)
   A[n] = x * B[n] + c;

CPU 2:
for (n = 5000; n < 10000; n++)
   A[n] = x * B[n] + c;
Threads Dynamically Appear and Disappear
Number set by Environment
Distributed Processing
Concept:
1) Divide the problem explicitly
2) CPUs perform tasks concurrently
3) Recombine results
4) All processors may or may not be doing the same thing
Distributed Processing
• Data needed by a given CPU must be stored in the memory associated with that CPU
• Performed on distributed- or shared-memory computers
• Multiple copies of the code are running
• Messages/data are passed between CPUs
• Multi-level: can be combined with vector and/or OpenMP
Distributed Processing using MPI (Fortran)

• Initialization

  call mpi_init(ierror)
  call mpi_comm_size(MPI_COMM_WORLD, npes, ierror)
  call mpi_comm_rank(MPI_COMM_WORLD, my_rank, ierror)

• Simple send/receive

  ! Processor 0 sends individual messages to others
  if (my_rank == 0) then
    do dest = 1, npes-1
      call mpi_send(x, max_size, MPI_FLOAT, dest, 0, comm, ierr)
    end do
  else
    call mpi_recv(x, max_size, MPI_FLOAT, 0, 0, comm, status, ierr)
  end if
Distributed Processing using MPI (C)

• Initialization

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &npes);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

• Simple send/receive

  /* Processor 0 sends individual messages to others */
  if (my_rank == 0) {
    for (dest = 1; dest < npes; dest++) {
      MPI_Send(x, max_size, MPI_FLOAT, dest, 0, comm);
    }
  } else {
    MPI_Recv(x, max_size, MPI_FLOAT, 0, 0, comm, &status);
  }
Number of Processes Constant
Number set by Environment
Message Passing Activity Example
Cluster Programming
• Shared-memory methods between processors on one node:
– OpenMP, threads, or MPI
• Distributed-memory methods between processors on multiple nodes:
– MPI
• Mixed mode:
– MPI distributes to nodes, OpenMP within a node
Programming Environments
• Compilers
• File Systems
• Running jobs
– Interactive
– Batch
• See individual machine documentation
– http://www.arsc.edu/support/resources/hardware.html
Cray Compilers
• SV1, T3E
– f90, cc, CC
• X1
– ftn, cc, CC
• SX-6 front end (rimegate)
– sxf90, sxc++
• SX-6 (rime)
– f90, cc, c++
• No extra flags for MPI, OpenMP
IBM Compilers
• Serial
– xlf, xlf90, xlf95, xlc, xlC
• OpenMP
– Add -qsmp=omp; _r extension for thread-safe libraries, e.g. xlf_r
• MPI
– mpxlf, mpxlf90, mpxlf95, mpcc, mpCC
• Might be best to always use the _r extension (mpxlf90_r)
File Systems
• Local storage
– $HOME
– /tmp or /wrktmp or /wrkdir -> $WRKDIR
– /scratch -> $SCRATCH
• Permanent storage
– $ARCHIVE
• Quotas
– quota -v on Cray
– qcheck on IBM
Running a job
• Get files from $ARCHIVE to the system’s disk
• Keep source in $HOME, but run in $WRKDIR
• Use $SCRATCH for local-to-node temporary files; clean up before the job ends
• Put results out to $ARCHIVE
• $WRKDIR is purged
Iceflyer Filesystems
• Smallish $HOME
• Larger /wrkdir/username
• $ARCHIVE for long-term storage, especially larger files
• qcheck to check quotas
SX-6 Filesystems
• Separate from the rest of the ARSC systems
• Rimegate has /home, /scratch
• Rime mounts them as /rimegate/home, /rimegate/scratch
• Rime has its own home, /tmp, /atmp, etc.
Interactive
• Works on the command line
• Limits exist on resources (time, # CPUs, memory)
• Good for debugging
• Larger jobs must be submitted to the batch system
Batch Schedulers
• Cray: NQS
– Commands: qsub, qstat, qdel
• IBM: LoadLeveler
– Commands: llclass, llq, llsubmit, llcancel, llmap, xloadl
NQS Script (rime)
#@$-q batch      # job queue class
#@$-s /bin/ksh   # which shell
#@$-eo           # stdout and stderr together
#@$-lM 100 MW    # memory limit
#@$-lT 30:00     # time requested h:m:s
#@$-c 8          # 8 cpus
#@$              # required last command

# beginning of shell script
cd $QSUB_WORKDIR # cd to submission directory
export F_PROGINF=DETAIL
export OMP_NUM_THREADS=8
./my_job
NQS Commands
• qstat to find out job status and list the queues
• qsub to submit a job
• qdel to delete a job from the queue
LoadLeveler Script (iceflyer)
#!/bin/ksh
#@ total_tasks = 4
#@ node_usage = shared
#@ wall_clock_limit = 1:00:00
#@ job_type = parallel
#@ output = out.$(jobid)
#@ error = err.$(jobid)
#@ class = large
#@ notification = error
#@ queue

poe ./my_job
LoadLeveler Commands
• llclass to list the classes
• llq to see the list of jobs in the queue
• llsubmit to submit a job
• llcancel to delete a job from the queue
• llmap is a local program to see the load on the machine
• xloadl is an X11 interface to LoadLeveler
Getting Help
• Consultants and Specialists are here to serve YOU
– 907-474-5102
– http://www.arsc.edu/support/support.html
Homework
• Make sure you can log into
– iceflyer
– rimegate
– rime
• Ask consultants for help if necessary