Using Dyninst for Simulation Tracking and Code Coverage on Large Scientific Applications
U N C L A S S I F I E D LA-UR-06-1506
U N C L A S S I F I E D
Using Dyninst for Simulation Tracking and Code Coverage on Large Scientific
Applications
David R. “Chip” Kent IV
High Performance Computing Environments Group
Los Alamos National Laboratory
March 21, 2006
Outline
• Overview of LANL and computing at LANL
• Code coverage in scientific applications
• Tracking scientific simulations
• Dyninst challenges at LANL
Los Alamos National Laboratory
History
• Birthplace of the atomic bomb
Current mission
• Ensure the safety and reliability of US nuclear weapons
• Prevent the spread of weapons of mass destruction
• Protect the homeland from attack
Computing at LANL
• No nuclear testing by the US since 1992
• Nuclear arsenal is now well over a decade old
• Simulations and laboratory experiments are now used in place of nuclear tests
• Software correctness is extremely important
• Simulation repeatability is extremely important
• Simulation results must reproduce laboratory experiments and old nuclear tests
• Requires huge computing resources
• Application performance is important
• Requires research into computing areas ranging from hardware to OS to physics simulations
Simulation software at LANL
• Applications are developed over decades
• O(1M) source lines for large applications
• Large applications contain a mixture of programming languages
  – Fortran 77/9x
  – C/C++
  – Preprocessed variants of Fortran
• Compilation done with multiple compilers
  – pgf90, pgcc
  – gcc, g++
• Some teams provide single-physics libraries and other teams merge the libraries into multi-physics simulations
• Libraries are typically linked in statically (not always)
• “100MB Binary of Death” -- Drew
• Binaries are often at least 100MB
• MPI is used for parallel simulations
• Simulations can run for months
Code coverage in scientific applications
What is Javelina?
An advanced code coverage tool (what code got executed)
• Can portably acquire data (any platform Dyninst supports)
  – x86/Linux
  – ia64/Linux*
  – x86_64/Linux* (any day now)
  – PowerPC/AIX 5.1*
  – MIPS/IRIX 6.5*
  – Alpha/Tru64
  – x86/Windows 2000/XP*
• Operates on the binary with no source or build changes
• Acquires data with minimal overhead
  – Dynamic instrumentation (Dyninst) is used
  – Coverage instrumentation can be removed once it is executed
• Coverage data can be analyzed using arbitrarily complex logic
  – Can find code executed by end users but not executed by tests
  – Can be incorporated into Python scripts
*untested
Using Javelina: Linux, etc.
1. Build your program
   – make flag
2. Run the program
   – mpirun javelina flag include inputs
3. Perform logic on code coverage data
   – python mylogic.py
4. View the resulting data
   – javelinagui mydata.xml
No Code/Build Modifications
• Javelina analyzes and instruments binaries (no source or build modifications)
  – Binary instrumentation is used on Tru64 systems (Atom)
    – 2-3x uninstrumented runtime
    – A new binary is created which contains the coverage instrumentation
  – Dynamic instrumentation is used on Linux and other supported systems (Dyninst)
    – 1.06-3x uninstrumented runtime (working to improve this range)
    – Binary is instrumented when execution starts
    – Once a block is executed, its instrumentation is removed
• Coverage is measured at the instruction block level.
• Instruction blocks are mapped to source lines using debugging information.
• Supports C/C++, Fortran 77/90/95, and mixtures of these (Anything the compilers support)
• Supports parallel applications
• Working to reduce the Dyninst overhead so that end-user runs can regularly be analyzed
Binary Analysis, Instrumentation, & Coverage Data Generation
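The remove-after-first-execution behavior described above can be illustrated with a toy model. This is not Dyninst or Javelina code: it is a plain Python sketch in which the class name OneShotCoverage and the block identifiers are hypothetical. Every block starts with a probe; the first time a block runs, its probe records the hit and is removed, so later executions of that block carry no instrumentation cost.

```python
class OneShotCoverage:
    """Toy model of coverage probes that are removed after they first fire."""

    def __init__(self, block_ids):
        # Every block starts out instrumented and unexecuted.
        self.probes = set(block_ids)
        self.executed = set()
        self.probe_hits = 0  # how many times a probe actually ran

    def run_block(self, block_id):
        # Only blocks that still carry a probe pay the instrumentation cost.
        if block_id in self.probes:
            self.probe_hits += 1
            self.executed.add(block_id)
            self.probes.remove(block_id)  # probe removed after first execution


cov = OneShotCoverage(["b0", "b1", "b2"])

# Simulate a loop that executes b0 and b1 many times and never reaches b2.
for _ in range(1000):
    cov.run_block("b0")
    cov.run_block("b1")

print(sorted(cov.executed))  # blocks that ran at least once
print(cov.probe_hits)        # total probe executions across the whole run
```

In this model the probe cost is paid once per block rather than once per execution, which is the intuition behind the low end of the overhead range quoted for dynamic instrumentation.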
Dynamic Instrumentation: Linux, etc.
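[Diagram: source.{f,c,cpp} is compiled by {f90,cc,c++} into myexe, which carries debug info. Running "javelina myexe" loads myexe into RAM.]
• Instrumentation is inserted into and removed from instructions in memory
• Debug info provides the map between source lines and instrumentation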
Logical Operations
• AND(self, other)
  – Performs a logical AND operation on the data in two objects and returns the result. A line will be marked as executed if both objects mark the line as having been executed.
• NOT(self)
  – Performs a logical NOT operation on the data in this object and returns the result. A line will be marked as executed if it was not executed and vice versa.
• OR(self, other)
  – Performs a logical OR operation on the data in two objects and returns the result. A line will be marked as executed if either object marks the line as having been executed.
• SUBTRACT(self, other)
  – Extracts the lines of this object which have been executed, marks these lines as executed if they are executed in the other object, and returns the result. This operator is useful in determining which lines executed by a user were tested.
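The four operators can be sketched in a few lines of Python. This is an illustration of the semantics described above, not Javelina's actual API: the Coverage class, its constructor, and the (file, line) keys are assumptions made for the example.

```python
class Coverage:
    """Hypothetical coverage record: maps (file, line) to True if executed."""

    def __init__(self, lines):
        self.lines = dict(lines)

    def AND(self, other):
        # Executed only if both objects mark the line as executed.
        keys = set(self.lines) | set(other.lines)
        return Coverage({k: self.lines.get(k, False) and other.lines.get(k, False)
                         for k in keys})

    def OR(self, other):
        # Executed if either object marks the line as executed.
        keys = set(self.lines) | set(other.lines)
        return Coverage({k: self.lines.get(k, False) or other.lines.get(k, False)
                         for k in keys})

    def NOT(self):
        # Executed becomes not executed and vice versa.
        return Coverage({k: not v for k, v in self.lines.items()})

    def SUBTRACT(self, other):
        # Keep only the lines executed here; mark each one as executed
        # if the other object also executed it.
        return Coverage({k: other.lines.get(k, False)
                         for k, v in self.lines.items() if v})


# Made-up data: lines of Source.f executed by applications and by tests.
apps = Coverage({("Source.f", 1): True, ("Source.f", 2): True, ("Source.f", 3): False})
tests = Coverage({("Source.f", 1): True, ("Source.f", 2): False, ("Source.f", 3): True})

used = apps.SUBTRACT(tests)
untested = sorted(k for k, v in used.lines.items() if not v)
print(untested)  # lines used by applications but not tested
```

With this data, line 2 is the only line that the applications executed and the tests did not, so it is the line a report would highlight.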
Logical Operations: OR
[Figure: Source.f with the lines executed by test 1, OR Source.f with the lines executed by test 2, yields Source.f with the lines executed by either.]
Logical Operations: SUBTRACT
[Figure: Source.f with the lines executed by tests and Source.f with the lines executed by apps. are combined by SUBTRACT, yielding Source.f restricted to the lines executed by apps.]
Highlighted lines used by applications, but not tested.
GUI: Large Application
Used by applications, but not tested.
Files ranked by worst offenders.
Tracking Scientific Applications
Multiphysics simulations are complex
• 10^5+ lines of constantly changing code
• Constantly changing libraries
• Complex input files
• Simulations and libraries read environment variables
• Simulations use variable numbers of processors
• HPC system changes
  – Compilers
  – Libraries
  – Operating system
  – Hardware (upgrades, repairs, new machines)
• Etc.
Example Physics Package
[Figure: provenance diagram for a FLAG simulation. Every file carries a signature label; each process logs the signatures of its inputs ("Log") and returns the signature of what it produced ("Rtn").]
• Files and their labels
  – Old input .flg file: A1
  – FLAG CVS repository: B1
  – EOSPAC library: B2
  – Compiler: B3
  – UNIX environment: B4
  – Input .flg file: C1
  – EOS Library: C2
  – FLAG executable: C3
  – Grid: C4
  – FLAG simulation: D1
  – Older grid: E1
  – Script: E2
  – Script: F
  – Ensight dumps: F1 ... Fn
  – Restart dumps: G1 ... Gn
  – Ensight pictures: H1 ... Hn
  – Presentation
• Processes
  – Text editor
  – Build script
  – Grid generator
  – FLAG startup subroutine
  – FLAG dump subroutine
  – FLAG Ensight dump subroutine
  – Ensight code
  – Powerpoint
• Log and return records
  – Log: E1,E2 Rtn: “C4”
  – Log: B1 B2 B3 B4 Rtn: “C3”
  – Log: C1 C2 C3 C4 Rtn: “D1”
  – Log: D1 Rtn: “Fn”
  – Log: D1 Rtn: “Gn”
  – Log: F Fn Rtn: “Hn”
• Notes
  – “Hn” is in the graphic itself
  – FLAG may have to embed “C1” in the file
Motivation
• It is practically impossible for a human to precisely record everything that went into or came out of a simulation
– E.g. shared libraries
• Ability to reproduce simulations decreases with time since the simulation was run
  – Systems change
  – Humans didn’t precisely specify all aspects of a simulation
  – Etc.
• Currently cannot specify all outputs impacted by a bug
  – Especially difficult if the bug was discovered long after the simulation
• Currently, in many cases, cannot easily determine exactly how two simulations differ
• These are critical V&V issues
Alexandria In A Sentence
Alexandria tracks the history and relationships of files and processes to each other
Example Information Flow Graph
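[Figure: information flow graph. Files F0 through F6 are connected through three application executions: genmesh (mesh generation), myphysics (simulation), and ensight (visualization).]
Legend
• File
• Application execution (e.g. build, simulation, etc.)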
File Signatures As Fundamental Identification
Why use a signature
• It is a short-hand unique identifier for the file content.
• It ensures the integrity of the file content through time.
• The whole file does not have to be stored.
How a signature is generated
• Many algorithms exist; the example uses the 160-bit SHA-1 algorithm.
• Takes as input a file of arbitrary length and produces as output a 160-bit "fingerprint" or "message digest" of the input.
Example: wrapper around the mv command that generates signatures and tracks actions
drkent% ./logging_mv file1 file2
IN: /Users/drkent/code/test/file1 41d7b77c8fe2634cfab042f54f5b6ae6c24d3a17
IN: /sw/bin/mv 389df9ea4ba8c266659165dd434d7ce33e97a936
ACTION: mv /Users/drkent/code/test/file1 /Users/drkent/code/test/file2
OUT: /Users/drkent/code/test/file2 41d7b77c8fe2634cfab042f54f5b6ae6c24d3a17
• Our signatures are really cryptographic hashes of the file content
• Checksums are simple examples of verifying file content
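A signature-generating move wrapper along these lines can be sketched with Python's standard hashlib module. This is not the logging_mv tool from the slide: the function names file_signature and logging_mv and the record layout are assumptions, and unlike the slide's wrapper this sketch does not record the signature of the mv binary itself.

```python
import hashlib
import os
import shutil
import tempfile


def file_signature(path, chunk_size=1 << 20):
    """Return the SHA-1 digest of a file as 40 hex characters (160 bits)."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so arbitrarily large files fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def logging_mv(src, dst):
    """Move src to dst and return IN/ACTION/OUT records (hypothetical layout)."""
    src, dst = os.path.abspath(src), os.path.abspath(dst)
    records = [("IN", src, file_signature(src))]
    shutil.move(src, dst)
    records.append(("ACTION", f"mv {src} {dst}", None))
    records.append(("OUT", dst, file_signature(dst)))
    return records


# Demonstration in a scratch directory with made-up content.
workdir = tempfile.mkdtemp()
file1 = os.path.join(workdir, "file1")
file2 = os.path.join(workdir, "file2")
with open(file1, "wb") as f:
    f.write(b"example simulation input\n")

records = logging_mv(file1, file2)
for kind, what, signature in records:
    print(kind, what, signature or "")
```

Because a move does not change the content, the IN and OUT signatures match, just as file1 and file2 share one signature in the slide's output.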
User Interface: HPC System Side
• Data will be acquired by intercepting system calls (e.g. “open”)
  – int x = open("/etc/hosts", O_RDONLY);
    – File: /etc/hosts
    – I/O: Input (O_RDONLY)
  – int x = open("/tmp/scratch.file", O_WRONLY, 00640);
    – File: /tmp/scratch.file
    – I/O: Output (O_WRONLY)
• A few possible methods for intercepting system calls
  – Currently using Dyninst
• Does not involve modifying user code
• Use on standard systems:
  – alexandria myexe inputs
• On lightweight-kernel systems may involve relinking
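The classification step (turning an intercepted open call into a file name plus an input or output tag) can be sketched as follows. This shows only the flag decoding, not the interception itself, which the slide attributes to Dyninst. The function name classify_open, the returned dictionary, and the "Input/Output" label for read-write opens are assumptions.

```python
import os

# Bits of the flags argument that select the access mode.
ACCESS_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def classify_open(path, flags):
    """Tag an intercepted open(path, flags) call as Input or Output."""
    mode = flags & ACCESS_MASK
    if mode == os.O_RDONLY:
        io = "Input"
    elif mode == os.O_WRONLY:
        io = "Output"
    else:
        # Assumption: a read-write open counts as both an input and an output.
        io = "Input/Output"
    return {"file": path, "io": io}


print(classify_open("/etc/hosts", os.O_RDONLY))
print(classify_open("/tmp/scratch.file", os.O_WRONLY))
```

Masking with the access-mode bits first means extra flags such as os.O_CREAT or os.O_TRUNC do not change the classification.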
Why System Call Interception?: Minimal Effort
Build
• untracked
  FC=f95
  CC=cc
  all: myexe…
• tracked
  FC=alexandria f95
  CC=alexandria cc
  all: myexe…
Simulation Run
• untracked
  mpirun myexe input
• tracked
  mpirun alexandria myexe input
Alexandria Object Database
[Figure: the example information flow graph (genmesh, myphysics, ensight; files F0 through F6).]
• Storing everything necessary to exactly describe our simulations will generate a lot of data over time (think terabytes or more)
• The data is highly interconnected
  – M inputs and N outputs for every process
  – Each input/output can be an input/output for other processes
• Data querying must be fast enough for a user to perform interactive analysis
• Database must:
  – Be a robust commercial product
    – Data persists for decades
    – Need protection against corruption, etc.
  – Scale to very large datasets
  – Perform well with highly interconnected data
  – Require minimal administration costs
  – Minimize development time and effort
• To meet these requirements, we are using the Objectivity/DB Object Database.
What Outputs Are Impacted By buggyfile.f?
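[Figure: flow of information, left to right.]
• buggyfile.f, otherfile1.f, otherfile2.f, and f95 feed the process "f95 *.f -o myexe", which produces myexe
• myexe and myinput1 feed the process "myexe myinput1", which produces myoutput1
• myexe and myinput2 feed the process "myexe myinput2", which produces myoutput2
Inputs are to the left and outputs are to the right of a process (information flows left to right)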
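The question in this slide's title amounts to a forward traversal of the information flow graph: start at the buggy file and follow every process that read it, then every process that read those outputs, and so on. The sketch below uses the file and command names from the slide; the record layout and the function name impacted_files are assumptions, and file names stand in for the signatures a real system would use.

```python
from collections import deque

# Each process record lists the files it read and the files it wrote.
processes = [
    {"cmd": "f95 *.f -o myexe",
     "inputs": ["buggyfile.f", "otherfile1.f", "otherfile2.f", "f95"],
     "outputs": ["myexe"]},
    {"cmd": "myexe myinput1",
     "inputs": ["myexe", "myinput1"],
     "outputs": ["myoutput1"]},
    {"cmd": "myexe myinput2",
     "inputs": ["myexe", "myinput2"],
     "outputs": ["myoutput2"]},
]


def impacted_files(start, processes):
    """Return every file downstream of `start` (breadth-first, left to right)."""
    impacted = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for proc in processes:
            if current in proc["inputs"]:
                for out in proc["outputs"]:
                    if out not in impacted:
                        impacted.add(out)
                        queue.append(out)
    return impacted


print(sorted(impacted_files("buggyfile.f", processes)))
print(sorted(impacted_files("myinput1", processes)))
```

A bug in buggyfile.f taints the executable and both outputs, while a problem in myinput1 reaches only myoutput1.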
How Did I Create bigexplosion.gif?
Inputs are to the left and outputs are to the right of a process (information flows left to right)
[Figure: flow of information, left to right.]
• Files: myexe, mesh, input, libc.so, liblapack.so, output1, output2, output3, makeplot.gnp, bigexplosion.gif
• Processes: "myexe mesh input" and "gnuplot makeplot.gnp"
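Answering the question in this slide's title is the reverse traversal: start at the plot and walk right to left through every process that produced it and every file those processes read. In the sketch below the exact edges are assumptions, since the transcript preserves the node names but not the arrows: it assumes the simulation read both shared libraries and that gnuplot read only output1. The record layout and the function name genealogy are likewise hypothetical.

```python
# Assumed edges; only the node and command names come from the slide.
processes = [
    {"cmd": "myexe mesh input",
     "inputs": ["myexe", "mesh", "input", "libc.so", "liblapack.so"],
     "outputs": ["output1", "output2", "output3"]},
    {"cmd": "gnuplot makeplot.gnp",
     "inputs": ["gnuplot", "makeplot.gnp", "output1"],
     "outputs": ["bigexplosion.gif"]},
]


def genealogy(target, processes):
    """Return (ancestor files, commands) that contributed to `target`."""
    ancestors = set()
    commands = []
    seen = {target}
    stack = [target]
    while stack:
        current = stack.pop()
        for proc in processes:
            if current in proc["outputs"]:
                if proc["cmd"] not in commands:
                    commands.append(proc["cmd"])
                for inp in proc["inputs"]:
                    if inp not in seen:
                        seen.add(inp)
                        ancestors.add(inp)
                        stack.append(inp)
    return ancestors, commands


files, commands = genealogy("bigexplosion.gif", processes)
print(sorted(files))
print(commands)
```

Under these assumed edges, output2 and output3 do not appear in the genealogy because nothing that led to the plot read them.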
User Interface: Analysis & Query
• User interface to perform queries like:
  – Find the executable and all inputs used to generate a plot
  – Compare two simulations and identify differences
  – Locate a file with a given signature (e.g. in HPSS at location)
  – Determine the impact of problems in source files or libraries
  – Determine the genealogy of a given file
  – Find all simulations where a given input was used
  – Find all jobs run by a user during a time window
  – Etc.
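As one illustration of these queries, comparing two simulations reduces to comparing the signatures of their inputs once every file has one. The sketch below is not Alexandria's interface: the function name compare_jobs, the dictionary layout, and the short signature strings are made up for the example.

```python
def compare_jobs(job_a, job_b):
    """Return files whose signatures differ or that appear in only one job."""
    differences = {}
    for path in sorted(set(job_a) | set(job_b)):
        sig_a, sig_b = job_a.get(path), job_b.get(path)
        if sig_a != sig_b:
            differences[path] = (sig_a, sig_b)
    return differences


# Made-up signatures, shortened for readability.
run1 = {"myexe": "aaa1", "input": "bbb2", "libc.so": "ccc3"}
run2 = {"myexe": "aaa1", "input": "bbb2", "libc.so": "ddd4", "extra.cfg": "eee5"}

for path, (before, after) in compare_jobs(run1, run2).items():
    print(path, before, after)
```

Here the two runs used the same executable and input but a different libc.so, and the second run read one extra file; a missing signature is reported as None.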
Alexandria CLI Example: Job Setup
Set up a new job
Print the unique job id
Print the job’s current state
Run the calculation under the Alexandria interceptor
Alexandria CLI Example: Printing A Job
Unique Job ID
Process Timing Info
Input/Output File Info
Alexandria CGI Example: Where was this file used/created?
Alexandria + Code Usage/Coverage
• Considering tracking code usage in Alexandria
• Based on LANL code usage/coverage work (Javelina)
• Can be done with little overhead using Dyninst
• Alexandria would:
  – Record which functions executed during a simulation
  – Record which function a bug is in (in a particular source file)
  – Allow you to identify which simulations using a buggy source file executed the buggy function!
  – Allow you to identify which functions have not been executed over the last N years!
Dyninst Challenges at LANL: Part 1
• We can’t give out any of our important binaries which break Dyninst
  – Dyninst is very difficult to debug
• Dyninst startup overhead
  – Improved by parsing only a subset of the binary
  – Can take >30min on a 100MB binary
  – Some binaries take longer to parse than others (PGI takes ~10x longer than GCC)
  – Still slow
• Dyninst runtime overhead
  – Traps are used too often on x86
    – Getting better
    – Performance has been improved by ~1000x for Javelina
  – “read” and “write” seem to run slowly when instrumented at exit
• MPI + Dyninst can lead to problems
  – “mpirun mydyninstprog myexe arg1 arg2 …” does not work with all MPI implementations
  – Seems to be a conflict with MPI startup and Dyninst (e.g. problems with signals)
  – Open-MPI seems to work fine (Yea!)
Dyninst Challenges at LANL: Part 2
• Dyninst is still brittle
  – A 100MB binary has stuff in it that Dyninst has never been tested against
    – Specific instruction sequences
    – Debug information
  – Robustness depends on the compiler/language
    – GCC compiled applications have fewer problems than PGI compiled applications
    – C/C++ applications have fewer problems than Fortran 9x applications
  – Robustness depends on the architecture/OS
    – Often have to debug Dyninst on each platform you intend your application to run on
• Supercomputers are “flavor of the week”
  – Systems have a lifetime of 3-5 years
  – Poorly supported platforms (Alpha/Tru64) are bought for performance (price) reasons
  – Our Linux clusters are significantly modified from standard distributions
  – Makes Dyninst support difficult
  – LANL, LLNL, and SNL are working to improve the situation
Final Note
LANL is involved in the Open|SpeedShop effort, and Dyninst will soon be used to obtain performance data at LANL.
Abstract
This talk discusses LANL’s use of Dyninst in Alexandria and Javelina, gives an overview of both projects, and lists the problems LANL has encountered with Dyninst.