Profiling Tools on the NERSC Crays and IBM/SP
NERSC User Services
NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER
Outline
• Profiling Tools on NERSC platforms
– Cray PVP (killeen, seymour)
– Cray T3E (mcurie)
– IBM/SP (gseaborg)
• UNIX profiling/performance analysis tools
• References
Why Profile?
• Characterise application :
– Is code CPU bound?
– Is code I/O bound?
– Is code memory bound?
– Analyse communication patterns (distributed-memory codes)
• Focus optimisation effort ... and ultimately
• Improve performance and resource utilisation
Cray PVP/T3E - Application Characterization
• Job accounting (ja)
– ja
– ./a.out
– ja -st -n a.out (see next slide for sample output)
• Look out for :
– Maximum Memory Used > available memory
– Total I/O wait time (locked + unlocked) > 50% of User CPU time
– Multitasking breakdown for parallel codes
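The first two "look out for" rules can be written as a small check. This is only a sketch in Python, assuming the figures are copied by hand from a ja summary report; the function name `ja_warnings`, its parameters, and the 8 MWords available-memory figure are hypothetical and not part of ja.

```python
def ja_warnings(user_cpu_s, io_wait_locked_s, io_wait_unlocked_s,
                max_memory_mwords, available_memory_mwords):
    """Apply the slide's rules of thumb to figures read off a ja summary report."""
    warnings = []
    # Rule 1: maximum memory used should not exceed the memory available.
    if max_memory_mwords > available_memory_mwords:
        warnings.append("memory: maximum memory used exceeds available memory")
    # Rule 2: locked + unlocked I/O wait above 50% of user CPU time suggests I/O bound.
    total_io_wait = io_wait_locked_s + io_wait_unlocked_s
    if total_io_wait > 0.5 * user_cpu_s:
        warnings.append("I/O: total I/O wait time exceeds 50% of user CPU time")
    return warnings


# User CPU, I/O wait and memory figures from the sample summary report;
# the available-memory value (8 MWords) is an assumed placeholder.
print(ja_warnings(35.5939, 0.0, 0.0, 0.4746, 8.0))
```

An empty list means neither rule fired for that run.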
Job accounting : summary report
Elapsed Time : 8 Seconds
User CPU Time : 35.5939 Seconds
Multitasking/Multistreaming Breakdown
(Concurrent CPUs * Connect seconds = CPU seconds)
1 * 0.0100 = 0.0100
2 * 0.0100 = 0.0200
3 * 0.0600 = 0.1800
4 * 8.8500 = 35.4000
(Avg.) (total) (total)
3.99 * 8.9300 = 35.6100
System CPU Time : 0.1226 Seconds
I/O Wait Time (Locked) : 0.0000
I/O Wait Time (Unlocked) : 0.0000
CPU Time Memory Integral : 5.3854 Mword-seconds
Data Transferred : 0.0001 MWords
Maximum memory used : 0.4746 MWords
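The totals row of the multitasking breakdown can be reproduced from the four rows above it. A short Python sketch, using only the numbers in the sample report (the variable names are illustrative):

```python
# (concurrent CPUs, connect seconds) rows from the sample ja report
breakdown = [(1, 0.01), (2, 0.01), (3, 0.06), (4, 8.85)]

# Total connect time is the sum of the connect-seconds column.
connect_total = sum(secs for _, secs in breakdown)
# Each row contributes CPUs * connect seconds to the CPU-seconds total.
cpu_total = sum(cpus * secs for cpus, secs in breakdown)
# Average concurrency is total CPU seconds per connect second.
avg_concurrency = cpu_total / connect_total

print(f"connect seconds: {connect_total:.4f}")
print(f"CPU seconds:     {cpu_total:.4f}")
print(f"average CPUs:    {avg_concurrency:.2f}")
```

An average close to the number of CPUs requested (here 3.99 of 4) indicates that the code ran multitasked for almost all of its connect time.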
HPM - Hardware Performance Monitor
• Helps locate CPU-related code bottlenecks
– reports use of vector registers, instruction buffers, memory ports
• hpm {options} ./a.out {prog_arguments}
– options = -g2 -> memory access information
– options = -g3 -> vector register information
• Look for :
– Ratio of Floating ops/CPU second to CPU mem. references/sec should reflect the floating-point operations in the code
Sample hpm output : (hpm -g0 ./a.out)
Million inst/sec (MIPS) : 7.67 | Instructions : 274017290
Avg. clock periods/inst : 26.06
% CP holding issue : 94.02 | CP holding issue : 6714667737
Inst.buffer fetches/sec : 0.04M | Inst.buf. fetches : 1420802
Floating adds/sec : 15.40M | F.P. adds : 550002417
Floating multiplies/sec : 24.36M | F.P. multiplies : 870004996
Floating reciprocal/sec : 0.28M | F.P. reciprocals : 10000042
Cache hits/sec : 0.00M | Cache hits : 45893
CPU mem. references/sec : 34.64M | CPU references : 1236978495
Floating ops/CPU second : 40.5M
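The ratio that the HPM slide says to look for can be computed directly from two lines of this output. A minimal Python sketch; the variable names are illustrative and the two rates are copied from the sample above:

```python
# Rates copied from the sample hpm -g0 output (per second)
flops_per_cpu_sec = 40.5e6   # "Floating ops/CPU second"
mem_refs_per_sec = 34.64e6   # "CPU mem. references/sec"

# Floating-point operations performed per CPU memory reference
ratio = flops_per_cpu_sec / mem_refs_per_sec
print(f"floating-point ops per memory reference: {ratio:.2f}")
```

This figure can then be compared with the ratio of floating-point operations to memory references expected from the source code.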
Cray PVP : CPU Bound Codes : prof/profview
• Instruments code to provide % CPU time in function calls
– f90 -lprof prog.f90
– ./a.out -> generates prof.data
– prof -st ./a.out > prof.report
• Chart (over) indicates relative distribution of CPU execution time by function call
– prof -x a.out > pgm.prof
– profview pgm.prof
Profview - Sample Output
I/O and Memory Bound Codes : procstat/procview
• procstat -m -i -R a.raw a.out
• procview a.raw
– I/O Analysis :
• Reports -> Files -> All User Files (Long Report)
• Bytes Processed or I/O Wait Time
– Memory Analysis :
• Reports -> Processes -> Maximum Memory Used (Long Format)
I/O Bound Codes : procview
• procview indicates which files consume most real time for I/O processing
Memory Bound Codes : procview
– “High” (> 10% of Elapsed Time) time to complete memory requests may indicate memory bound code
– Use Graphs option to produce plot of memory use over elapsed time of application
ATExpert - Autotasking Prediction
• Analysis of source code to predict autotasking performance on dedicated Cray PVP
• f90 -eX -O3 -r4 -o {prog_name} prog.f90
– ./a.out
– atexpert -> shows predicted speed-up
ATExpert Sample output
Indicates predicted speed-up of 4.3 on dedicated 8-processor PVP when source code is autotasked
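One way to read a predicted speed-up of 4.3 on 8 processors is through parallel efficiency and Amdahl's law. The Python sketch below is an interpretation aid only: it assumes Amdahl's law as a simple model, which is not necessarily how ATExpert derives its prediction, and the function name is hypothetical.

```python
def amdahl_parallel_fraction(speedup, processors):
    """Parallel fraction p implied by Amdahl's law, S = 1 / ((1 - p) + p / N)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / processors)


speedup, processors = 4.3, 8          # figures quoted on the slide
efficiency = speedup / processors     # fraction of ideal (linear) speed-up
p = amdahl_parallel_fraction(speedup, processors)

print(f"parallel efficiency: {efficiency:.2f}")
print(f"implied parallel fraction (Amdahl): {p:.2f}")
```

Under that assumption, roughly 88% of the run time would be parallelisable, with the remainder limiting the speed-up on 8 processors.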
Also available on Cray PVP
• flowtrace/flowview
– times subroutines and functions during program execution (using operating system timers)
• jumptrace/jumpview
– provides exact timing in functions/subroutines by analysis of machine instructions in program
• perftrace/perfview
– times subroutines/functions based on statistics gathered from HPM tool
Cray T3E - Apprentice
• Locate performance problems/inefficiencies
– MPI and shared memory performance, load balance and communication, memory use
• Provides hardware performance information and tuning recommendations (Displays -> Observations)
• Compile/link
– f90 -o {prog} -eA {prog_name.f90} -lapp
– cc -o {prog} -happrentice {prog_name.c} -lapp
• Run code to generate app.rif
Output from :
apprentice app.rif
Cray T3E - PAT
• Generates profile of CPU time in functions; load balance across PEs; h/w counter info.
• Compile and link with PAT library
– f90 -o exe -lpat {source.f} pat.cld
• Run program as normal
– mpprun -n {procs} {exe} -> generates exe.pif
• pat executable exe.pif
Profile based on relative CPU time in function calls
Load Balance Histogram for routine “COLL”
Cray T3E - ACTS/TAU
• Performance analysis of distributed/shared memory applications (C++ in particular)
– module load tau
– instrument programs with TAU macros
– add $(TAU_DEFS), $(TAULIBS) to compile/link
– run application; view tracefile with pprof, VAMPIR
• Reference
– http://acts.nersc.gov/tau
– http://hpcf.nersc.gov/training/classes/Teleconf/1999july/Wu
Cray T3E - Vampir
• Analysis of message passing characteristics - generates display of MPI activity over instrumented time period (e.g. sender, receiver, message size, elapsed time)
– module load VAMPIR; module load vampirtrace
– Facility to instrument with VAMPIRtrace calls
– Generate trace file using TAU or VAMPIRtrace
• Reference :
– http://hpcf.nersc.gov/software/tools/vampir.html
IBM/SP - Xprofiler
• Graphical interface for gprof profiles of parallel applications
– Compile and link code with “-g -pg”
– poe ./a.out -procs {n}
• generates gmon.out.{n} file for each process
• may introduce significant (up to a factor of 2) overhead
– (In $TMPDIR) xprofiler ./a.out gmon.out.*
• Report menu provides (gprof) text profile
• Source statement profiling shown
Statement-level profile available by clicking on the relevant function in the graphical output - use Show Source Code option
IBM/SP - Visualization Tool (VT)
• Message passing trace visualization
• Realtime system activity monitor (limited)
• MPI load balance overview :
– poe ./a.out -procs {n} -tlevel=3
– copy a.out.trc to $TMPDIR
– (In $TMPDIR) Invoke vt
– In trace visualization mode, “Play” a.out.trc
– see next slide for sample of interprocessor communication during program execution
IBM/SP : system_stats
• IBM internal tool
– module load sptools
– instrument code with system_stats() call
– link with $(SPTOOLS), run code as normal
• Sample output
Summary of the utilization of system resources:
node hostname wall(s) user(s) sys(s) size(KB) pswitches
0 gs01015 16.80 13.18 0.04 2748 2138
1 gs01015 16.80 16.07 0.04 2744 1868
2 gs01003 16.80 16.62 0.04 2740 1870
3 gs01003 16.80 16.56 0.03 2732 1841
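The per-task rows of this report can be turned into a CPU utilisation figure, (user + sys) / wall, to spot tasks that spend time waiting. A Python sketch that parses the four sample rows above; the function name `cpu_utilization` is hypothetical and the column layout is assumed to match the sample:

```python
sample = """\
0 gs01015 16.80 13.18 0.04 2748 2138
1 gs01015 16.80 16.07 0.04 2744 1868
2 gs01003 16.80 16.62 0.04 2740 1870
3 gs01003 16.80 16.56 0.03 2732 1841
"""


def cpu_utilization(text):
    """Return {task: (user + sys) / wall} for each data row of the report."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 7 columns and start with the task ("node") number.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        wall, user, sys_time = (float(x) for x in fields[2:5])
        result[int(fields[0])] = (user + sys_time) / wall
    return result


util = cpu_utilization(sample)
for task, frac in sorted(util.items()):
    print(f"task {task}: {frac:.1%} of wall time on CPU")
```

In the sample, task 0 is on the CPU for a noticeably smaller share of the wall time than tasks 1 to 3, which is the kind of imbalance worth investigating.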
IBM/SP - trace-mpi
• IBM internal tool - quantitative information on MPI calls
– module load USG ; module load trace-mpi
– Fortran - add $(TRACE_MPIF) to build
– C - add $(TRACE_MPI) to build
– poe ./a.out -procs {n} - generates mpi.trace_file for each process (executable must call MPI_Finalize)
– summary mpi.trace_file.{n} (see over)
• Useful check for load balance :
– grep "Total Communication" mpi.trace_file.*
MPI message-passing summary for mpi.trace_file.3
MPI Function #calls Avg Bytes Time (sec)
-------------------------------------------------------------
MPI_Allreduce: 9355 8.0 3.596
MPI_Barrier: 3 0.0 0.017
MPI_Bcast: 66 5.8 0.013
MPI_Scatter: 31 1008.0 0.088
MPI_Comm_rank: 1 0.0 0.000
MPI_Comm_size: 1 0.0 0.000
MPI_Isend: 43023 2003.7 0.893
MPI_Recv: 43023 2003.7 7.481
MPI_Wait: 43023 2003.7 3.739
Total Communication Information: WALL = 15.8277, CPU = 15.53, MBYTES = 258.72
The total amount of wall time = 26.229613
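The summary can be reduced to two numbers: which MPI call dominates, and what fraction of the wall time is communication. A Python sketch over the sample output above; the regular expression assumes the row layout shown here, and the variable names are illustrative:

```python
import re

rows = """\
MPI_Allreduce: 9355 8.0 3.596
MPI_Barrier: 3 0.0 0.017
MPI_Bcast: 66 5.8 0.013
MPI_Scatter: 31 1008.0 0.088
MPI_Comm_rank: 1 0.0 0.000
MPI_Comm_size: 1 0.0 0.000
MPI_Isend: 43023 2003.7 0.893
MPI_Recv: 43023 2003.7 7.481
MPI_Wait: 43023 2003.7 3.739
"""
comm_wall = 15.8277      # "Total Communication Information: WALL"
total_wall = 26.229613   # "The total amount of wall time"

# Collect the time column for each MPI function.
pattern = re.compile(r"^(MPI_\w+):\s+(\d+)\s+([\d.]+)\s+([\d.]+)$")
times = {}
for line in rows.splitlines():
    match = pattern.match(line)
    if match:
        times[match.group(1)] = float(match.group(4))

summed = sum(times.values())            # should agree with comm_wall
slowest = max(times, key=times.get)     # call with the largest time
comm_fraction = comm_wall / total_wall  # share of wall time in MPI

print(f"sum of per-call times: {summed:.3f} s")
print(f"dominant call: {slowest} ({times[slowest]:.3f} s)")
print(f"communication fraction of wall time: {comm_fraction:.1%}")
```

For this process, about 60% of the wall time is spent in MPI, most of it in MPI_Recv; comparing the same figures across the trace files of all processes gives a quick view of load balance.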
Upcoming on the SP
• ACTS/TAU (C/C++)
– currently being ported to the IBM/SP
• VAMPIR
– has been ordered, awaiting delivery
• Performance Monitor Toolkit (HPM)
– should be available with Phase II system (requires AIX 4.3.4)
• Also, see Performance API project:
– http://icl.cs.utk.edu/projects/papi
General/UNIX Profiling Tools
• Command line profilers and system analysis
– prof/gprof (enabled for MPI on IBM/SP)
– csh time command : time ./a.out
– vmstat -> look for high paging over extended time period (application may require more memory)
• Fortran/C function timers
– getrusage
– rtc, irtc
– etime, dtime, mclock
– MPI_Wtime
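The timer calls listed above are used by bracketing the region of interest with two readings and taking the difference. The sketch below shows the pattern in Python rather than Fortran/C, as an illustration only: `resource.getrusage` wraps the same getrusage call, and `time.perf_counter` stands in for a wall-clock timer such as MPI_Wtime. The helper names `timed` and `work` are hypothetical, and the `resource` module is available on UNIX systems only.

```python
import resource
import time


def timed(func, *args):
    """Run func(*args); return its result plus wall, user and system seconds."""
    wall_start = time.perf_counter()
    usage_start = resource.getrusage(resource.RUSAGE_SELF)
    result = func(*args)
    usage_end = resource.getrusage(resource.RUSAGE_SELF)
    wall = time.perf_counter() - wall_start
    user = usage_end.ru_utime - usage_start.ru_utime
    system = usage_end.ru_stime - usage_start.ru_stime
    return result, wall, user, system


def work(n):
    """Stand-in for the code region being timed."""
    return sum(i * i for i in range(n))


result, wall, user, system = timed(work, 200_000)
print(f"wall {wall:.4f} s, user {user:.4f} s, system {system:.4f} s")
```

A wall time much larger than user + system time suggests the region is waiting (on I/O, paging or communication) rather than computing.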
Reference Material
• NERSC web pages
– http://hpcf.nersc.gov/software/tools
• Cray PVP/Cray T3E
– http://www.cray.com/swpubs
– Optimizing Code on Cray PVP Systems
– Cray T3E C, Fortran Optimization Guides
• IBM/SP
– LLNL Workshop on Performance Tools