Transcript of Debugging Cluster Programs using symbolic debuggers.
- Slide 1
- Debugging Cluster Programs using symbolic debuggers
- Slide 2
- Debugging Code: careful review of your code; add debugging code
to your code (print statements at strategic locations in the code,
removed later); use a symbolic debugger.
- Slide 3
- Careful review of your code: Rereading your code is often
helpful. Most parallel code errors are serial errors. Compare your
code to the specs. Take a break and review your code with a fresh
brain. Have someone else help you review your code.
- Slide 4
- Common sources of errors beyond what the compiler catches,
usually run-time errors: incorrect use of pointers; pointing out of
memory; a reference that should have used a pointer; referencing the
wrong variable; index initialized wrong; wrong exit condition.
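As a small illustration of the index and exit-condition errors listed on this slide, here is a minimal sketch in C. The function name max_element is hypothetical, chosen only for this example. The comment shows a loop condition that compiles cleanly but reads one element past the end of the array at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: returns the largest element of a[0..n-1].
   The caller must pass n >= 1. */
int max_element(const int *a, size_t n)
{
    int best;
    size_t i;

    assert(a != NULL && n > 0);
    best = a[0];

    /* Wrong exit condition: for (i = 1; i <= n; i++) would read a[n],
       one past the end. The compiler accepts it; the error only shows
       up at run time, if it shows up at all. */
    for (i = 1; i < n; i++) {
        if (a[i] > best) {
            best = a[i];
        }
    }
    return best;
}
```

Errors of this kind are serial errors, so they can usually be found by testing the routine on a single process before running it in parallel.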
- Slide 5
- Common parallel errors. Deadlock errors: receive before send;
receive, but no send. Incorrect arguments in MPI calls: mismatch on
tags; mismatch of source/destination; misunderstanding of the use
of an argument.
- Slide 6
- Add Debugging Code: Add strategically placed code to your program
to display critical information. Watch the values of variables as
the program progresses. You can create data-dump functions and call
them when you need them. Have a way to remove them in production code.
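One common way to follow the advice on this slide in C is a print macro that compiles away unless a flag is defined, plus a small data-dump function. The sketch below is only an illustration: the names DEBUG, DEBUG_PRINT, and dump_array are assumptions made for this example, not something taken from the slides.

```c
#include <assert.h>
#include <stdio.h>

/* Compile with -DDEBUG to enable the debugging output. Without it,
   DEBUG_PRINT expands to nothing, so production builds are unaffected. */
#ifdef DEBUG
#define DEBUG_PRINT(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_PRINT(...) ((void)0)
#endif

/* Hypothetical data-dump helper: writes the first n values of data to
   the stream out, prefixed with the process rank and a label.
   Returns the number of values written. */
static int dump_array(FILE *out, const char *label, int rank,
                      const double *data, int n)
{
    int i;

    assert(out != NULL && (data != NULL || n == 0));
    fprintf(out, "[rank %d] %s:", rank, label);
    for (i = 0; i < n; i++) {
        fprintf(out, " %g", data[i]);
    }
    fprintf(out, "\n");
    return n;
}
```

Prefixing each line with the rank makes interleaved output from several processes easier to sort out, although it does not remove the scaling problem with print-style debugging.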
- Slide 7
- Add Debugging Code: It can be difficult to get the right debugging
code in the right place. It does not scale well in a parallel
environment. It can produce unmanageable or unintelligible output.
- Slide 8
- Symbolic Debuggers allow you to inspect your code, monitor its
behavior, and modify data values on the fly as your code executes.
- Slide 9
- gdb: the GNU debugger
- Slide 10
- Frequently used GDB commands:
  - break [file:]function - Set a breakpoint at function (in file).
  - run [arglist] - Start your program (with arglist, if specified).
  - bt - Backtrace: display the program stack.
  - print expr - Display the value of an expression.
  - c - Continue running your program (after stopping, e.g. at a breakpoint).
  - next - Execute next program line (after stopping); step over any function calls in the line.
  - step - Execute next program line (after stopping); step into any function calls in the line.
  - help [name] - Show information about GDB command name, or general information about using GDB.
  - quit - Exit from GDB.
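To try the commands above, it helps to have a small target with one function calling another. The following is a minimal sketch. The function names sum_values and average_values are hypothetical, and you would call them from a main of your own and compile with the -g flag so that gdb can see source lines and variable names.

```c
#include <assert.h>
#include <stddef.h>

/* Adds up the first n elements of values. A convenient target for
   "break sum_values"; once stopped, "next" walks through the loop and
   "print total" shows the running sum. */
long sum_values(const int *values, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}

/* Calls sum_values, so that "step" has a function to step into and
   "bt" shows average_values below sum_values on the stack.
   Integer division: the result is truncated toward zero. */
long average_values(const int *values, size_t n)
{
    assert(n > 0);
    return sum_values(values, n) / (long)n;
}
```

A possible session: set "break sum_values", type "run", use "bt" to see who called the function, "next" and "print total" to watch the sum grow, then "c" to continue to the end.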
- Slide 11
- gdb
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Running in X-windows. Linux (Unix) to Linux: ssh to the host, log
in, and start the X application. Other platforms (Windows, Mac): use
an X-windows server application. VNC on most platforms: in Linux, VNC
operates as a remote control application; for Windows, Macintosh, and
Solaris, VNC operates as an X-windows server viewer.
- Slide 16
- Running in X-windows using VNC: ssh to the host and log in; start
vncserver and pay attention to the display id (:n); from your desktop,
run VNCViewer and select the host with the correct display id; after
the session, kill vncserver with vncserver -kill :n (n is the display
id number).
- Slide 17
- Using VNC
- Slide 18
- Slide 19
- X desktop with VNC
- Slide 20
- ddd: a graphical front end to gdb
- Slide 21
- pgdbg: debugger from the Portland Group (PGI). Can be used with PG
compilers and with GNU compilers.
- Slide 22
- pgdbg common commands (back to text mode for a bit)
  - lis[t] [count | low:high | routine | line,count] - display lines from the source code file or routine
  - att[ach] [ | ] - attach to a running process or start a local executable and attach to it, or start an executable on
  - c[ont] - continue executing from the current location
- Slide 23
- pgdbg common commands
  - det[ach] - detach from the currently attached process
  - halt - halt the executing process or thread
  - n[ext] [count] - continue executing and stop after count lines of source code
  - nexti [count] - continue executing and stop after count instructions
- Slide 24
- pgdbg common commands
  - q[uit] - terminate pgdbg and exit
  - ru[n] [arg0 arg1 argn] - run program from beginning with arguments arg0, arg1
  - s[tep] [count] - execute next count lines of source code and stop; steps into called routines
  - s[tep] up - steps out of current routine
  - stepi [count] - execute next count instructions and stop; steps into called routines
- Slide 25
- pgdbg common commands
  - stepi up - steps out of current routine and stops
  - Event commands
    - break line | function - sets a breakpoint at the specified line or function. If no line or function is specified, lists existing breakpoints. A breakpoint stops execution at the specified point.
    - clear [all | line | func] - clears all breakpoints, or a breakpoint at line line or at function func
- Slide 26
- pgdbg common commands
  - stop var - break when the value of var changes at a location
  - watch expr - stops and displays the value of expr when it changes
  - track expr - like watch, except that it does not stop execution
  - trace var - displays a trace of source line execution when the value of var changes
- Slide 27
- pgdbg common commands
  - p[rint] var - displays the value of a variable
  - edit filename - invokes an editor to edit file filename. If no filename is given, edits the current file.
  - decl[aration] name - displays the type declaration for the object name
  - as[sign] var = expr - assigns the value expr to the variable var
  - proc [number] - sets the current process to process number number
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Resources
  - gdb: man gdb; info gdb; Using GDB: A Guide to the GNU Source-Level Debugger, Richard M. Stallman and Roland H. Pesch, July 1991
  - ddd: man ddd
  - VNC: http://www.uk.research.att.com/vnc/ and http://www.realvnc.com
- Slide 33
- Resources
  - PGI Debugger Users Guide: http://www.pgroup.com/ppro_docs/pgdbg_ug/PGDBG4.htm
  - PGI Users Guide, PGI 4.1 Release Notes, FAQ, Tutorials: http://www.pgroup.com/docs.htm
  - MPI-CH: http://www.netlib.org/
  - OpenMP: http://www.openmp.org/
  - HPDF (High Performance Debugging Forum) Standard: http://www.ptools.org/hpdf/draft/intro.html
- Slide 34