Using Dyninst for Simulation Tracking and Code Coverage on Large Scientific Applications

UNCLASSIFIED LA-UR-06-1506


David R. “Chip” Kent IV

High Performance Computing Environments Group
Los Alamos National Laboratory

March 21, 2006



Outline

• Overview of LANL and computing at LANL
• Code coverage in scientific applications
• Tracking scientific simulations
• Dyninst challenges at LANL



Los Alamos National Laboratory

History
• Birthplace of the atomic bomb

Current mission
• Ensure the safety and reliability of US nuclear weapons
• Prevent the spread of weapons of mass destruction
• Protect the homeland from attack



Computing at LANL

• No nuclear testing by the US since 1992
• Nuclear arsenal is now well over a decade old
• Simulations and laboratory experiments are now used in place of nuclear tests
  – Software correctness is extremely important
  – Simulation repeatability is extremely important
  – Simulation results must reproduce laboratory experiments and old nuclear tests
  – Requires huge computing resources
  – Application performance is important
  – Requires research into computing areas ranging from hardware to OS to physics simulations



Simulation software at LANL

• Applications are developed over decades
• O(1M) source lines for large applications
• Large applications contain a mixture of programming languages
  – Fortran 77/9x
  – C/C++
  – Preprocessed variants of Fortran
• Compilation done with multiple compilers
  – pgf90, pgcc
  – gcc, g++
• Some teams provide single-physics libraries and other teams merge the libraries into multi-physics simulations
  – Libraries are typically (but not always) linked in statically
• “100MB Binary of Death” -- Drew
  – Binaries are often at least 100MB
• MPI is used for parallel simulations
• Simulations can run for months



Code coverage in scientific applications



What is Javelina?

An advanced code coverage tool (what code got executed)

• Can portably acquire data (any platform Dyninst supports)
  – x86/Linux
  – ia64/Linux*
  – x86_64/Linux* (any day now)
  – PowerPC/AIX 5.1*
  – MIPS/IRIX 6.5*
  – Alpha/Tru64
  – x86/Windows 2000/XP*
• Operates on the binary with no source or build changes
• Acquires data with minimal overhead
  – Dynamic instrumentation (Dyninst) is used
  – Coverage instrumentation can be removed once it is executed
• Coverage data can be analyzed using arbitrarily complex logic
  – Can find code executed by end users but not executed by tests
  – Can be incorporated into python scripts

*untested



Using Javelina: Linux, etc.

1. Build your program
   – make flag

2. Run the program
   – mpirun javelina flag include inputs

3. Perform logic on code coverage data
   – python mylogic.py

4. View the resulting data
   – javelinagui mydata.xml

No Code/Build Modifications




Binary Analysis, Instrumentation, & Coverage Data Generation

• Javelina analyzes and instruments binaries (no source or build modifications)
  – Binary instrumentation is used on Tru64 systems (Atom)
    – 2-3x uninstrumented runtime
    – A new binary is created which contains the coverage instrumentation
  – Dynamic instrumentation is used on Linux and other supported systems (Dyninst)
    – 1.06-3x uninstrumented runtime (working to improve this range)
    – Binary is instrumented when execution starts
    – Once a block is executed, its instrumentation is removed
• Coverage is measured at the instruction block level.
• Instruction blocks are mapped to source lines using debugging information.
• Supports C/C++, Fortran 77/90/95, and mixtures of these (anything the compilers support)
• Supports parallel applications
• Working to reduce the Dyninst overhead so that end-user runs can regularly be analyzed
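The idea of removing coverage instrumentation after its first execution can be illustrated with a toy model in plain Python. This is only a sketch of the principle, not Dyninst or Javelina code: the `CoverageTracer` class and the block names are hypothetical, and real instrumentation operates on machine instructions in memory rather than on Python sets.

```python
class CoverageTracer:
    """Toy model of one-shot coverage probes (hypothetical, not the Dyninst API)."""

    def __init__(self, block_ids):
        # Every instruction block starts with a probe installed.
        self.probed = set(block_ids)
        self.covered = set()
        self.probe_hits = 0  # number of times any probe actually fired

    def execute(self, block_id):
        # A probe fires only while it is still installed.
        if block_id in self.probed:
            self.probe_hits += 1
            self.covered.add(block_id)
            self.probed.remove(block_id)  # remove the probe after its first hit


tracer = CoverageTracer(["entry", "loop_body", "error_path", "exit"])

# Simulate a run: the loop body executes many times, the error path never runs.
trace = ["entry"] + ["loop_body"] * 1000 + ["exit"]
for block in trace:
    tracer.execute(block)

print("covered:", sorted(tracer.covered))
print("probe firings:", tracer.probe_hits)
print("never executed:", sorted(tracer.probed))
```

In this model the loop body runs 1000 times but pays the probe cost once, which is the reason removal keeps the overhead of long runs low.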



Dynamic Instrumentation: Linux, etc.

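[Figure: source.{f,c,cpp} is compiled with {f90,cc,c++} into myexe, which carries debug info. Running "javelina myexe" loads the executable into RAM, where instrumentation is inserted into and removed from instructions in memory. The debug info provides the map between source lines and instrumentation.]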



Logical Operations

• AND(self, other)
  – Performs a logical AND operation on the data in two objects and returns the result. A line will be marked as executed if both objects mark the line as having been executed.

• NOT(self)
  – Performs a logical NOT operation on the data in this object and returns the result. A line will be marked as executed if it was not executed and vice versa.

• OR(self, other)
  – Performs a logical OR operation on the data in two objects and returns the result. A line will be marked as executed if either object marks the line as having been executed.

• SUBTRACT(self, other)
  – Extracts the lines of this object which have been executed, marks these lines as executed if they are executed in the other object, and returns the result. This operator is useful in determining which lines executed by a user were tested.
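The four operators can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Javelina's actual implementation: the `Coverage` class, its `(file, line) -> bool` representation, and the `unexecuted` helper are hypothetical; only the operator names and their described semantics come from the slide.

```python
class Coverage:
    """Hypothetical coverage record mapping (file, line) to an executed flag."""

    def __init__(self, executed):
        self.executed = dict(executed)

    def _get(self, key):
        # Lines unknown to this object are treated as not executed (assumption).
        return self.executed.get(key, False)

    def AND(self, other):
        keys = set(self.executed) | set(other.executed)
        return Coverage({k: self._get(k) and other._get(k) for k in keys})

    def OR(self, other):
        keys = set(self.executed) | set(other.executed)
        return Coverage({k: self._get(k) or other._get(k) for k in keys})

    def NOT(self):
        return Coverage({k: not v for k, v in self.executed.items()})

    def SUBTRACT(self, other):
        # Keep only lines executed here; mark each one by whether `other` ran it.
        return Coverage({k: other._get(k) for k, v in self.executed.items() if v})

    def unexecuted(self):
        return sorted(k for k, v in self.executed.items() if not v)


tests = Coverage({("Source.f", 1): True, ("Source.f", 2): False,
                  ("Source.f", 3): True, ("Source.f", 4): False})
apps = Coverage({("Source.f", 1): True, ("Source.f", 2): True,
                 ("Source.f", 3): False, ("Source.f", 4): False})

# Lines used by applications but not exercised by any test.
untested = apps.SUBTRACT(tests).unexecuted()
print(untested)
```

Here `apps.SUBTRACT(tests)` restricts attention to the lines applications ran, so the lines left unmarked in the result are exactly the ones no test covered.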



Logical Operations: OR

[Figure: three views of Source.f. The lines executed by test 1 and the lines executed by test 2 are combined with OR to give the lines executed by either test.]



Logical Operations: SUBTRACT

[Figure: three views of Source.f. The lines executed by tests are SUBTRACTed from the lines executed by applications; the result shows the lines executed by applications.]

Highlighted lines are used by applications, but not tested.



GUI: Large Application

Used by applications, but not tested.


Files ranked by worst offenders.




Tracking Scientific Applications



Multiphysics simulations are complex

• 10^5+ lines of constantly changing code

• Constantly changing libraries

• Complex input files

• Simulations and libraries read environment variables

• Simulations use variable numbers of processors

• HPC system changes
  – Compilers
  – Libraries
  – Operating system
  – Hardware (upgrades, repairs, new machines)

• Etc.



Example Physics Package

[Figure: provenance diagram for an example FLAG simulation. The labels recoverable from the figure are listed below.]

Files and other tracked items:
• Old input .flg file: A1
• FLAG CVS repository: B1
• EOSPAC library: B2
• Compiler: B3
• UNIX environment: B4
• Input .flg file: C1
• EOS library: C2
• FLAG executable: C3
• Grid: C4
• FLAG simulation: D1
• Older grid: E1
• Script: E2
• Ensight dumps: F1 ... Fn
• Restart dumps: G1 ... Gn
• Ensight pictures: H1 ... Hn
• Script: F

Processes: text editor, build script, grid generator, FLAG startup subroutine, FLAG dump subroutine, FLAG Ensight dump subroutine, Ensight code, Powerpoint

Log records:
• Log: E1, E2; Rtn: “C4”
• Log: B1 B2 B3 B4; Rtn: “C3”
• Log: C1 C2 C3 C4; Rtn: “D1”
• Log: D1; Rtn: “Fn”
• Log: D1; Rtn: “Gn”
• Log: F Fn; Rtn: “Hn”

Notes:
• Presentation note: “Hn” is in the graphic itself
• FLAG may have to embed “C1” in the file



Motivation

• It is practically impossible for a human to precisely record everything that went into or came out of a simulation

– E.g. shared libraries

• Ability to reproduce simulations decreases with time since the simulation was run

– Systems change
– Humans didn’t precisely specify all aspects of a simulation
– Etc.

• Currently cannot specify all outputs impacted by a bug
– Especially difficult if the bug was discovered long after the simulation

• Currently, in many cases, cannot easily determine exactly how two simulations differ

• These are critical V&V issues



Alexandria In A Sentence

Alexandria tracks the history and relationships of files and processes to each other



Example Information Flow Graph

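[Figure: example information flow graph. Files F0 through F6 are connected through three application executions: genmesh (mesh generation), myphysics (simulation), and ensight (visualization). Legend: one symbol denotes a file, the other an application execution (e.g. build, simulation, etc.).]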



File Signatures As Fundamental Identification

Why use a signature

• It is a short-hand unique identifier for the file content.

• It ensures the integrity of the file content through time.

• The whole file does not have to be stored

How a signature is generated

• Many algorithms exist; the example uses the 160-bit SHA-1 algorithm.

• Takes as input a file of arbitrary length and produces as output a 160-bit "fingerprint" or "message digest" of the input.

Example: wrapper around the mv command that generates signatures and tracks actions

drkent% ./logging_mv file1 file2
IN: /Users/drkent/code/test/file1 41d7b77c8fe2634cfab042f54f5b6ae6c24d3a17
IN: /sw/bin/mv 389df9ea4ba8c266659165dd434d7ce33e97a936
ACTION: mv /Users/drkent/code/test/file1 /Users/drkent/code/test/file2
OUT: /Users/drkent/code/test/file2 41d7b77c8fe2634cfab042f54f5b6ae6c24d3a17

• Our signatures are really cryptographic hash digests

• Checksums are simple examples of verifying file content
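A file signature of this kind can be computed with Python's standard `hashlib` module. The sketch below is an assumption-laden illustration, not the `logging_mv` tool shown on the slide: the function names, the record layout, and the temporary files are hypothetical, and unlike the slide's example it does not sign the `mv` binary itself.

```python
import hashlib
import os
import shutil
import tempfile


def file_signature(path, chunk_size=1 << 20):
    """Return the 160-bit SHA-1 digest of a file as 40 hexadecimal characters."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so that arbitrarily large files fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def logging_mv(src, dst):
    """Move src to dst and return records of the input, the action, and the output."""
    src, dst = os.path.abspath(src), os.path.abspath(dst)
    records = [("IN", src, file_signature(src))]
    shutil.move(src, dst)
    records.append(("ACTION", f"mv {src} {dst}", None))
    records.append(("OUT", dst, file_signature(dst)))
    return records


# Demonstration on a throwaway file (hypothetical content).
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "file1")
target = os.path.join(workdir, "file2")
with open(source, "wb") as handle:
    handle.write(b"abc")

log = logging_mv(source, target)
for record in log:
    print(*[field for field in record if field is not None])
```

Because a move does not change content, the IN and OUT signatures match, which is how the same file can be recognized under a new name.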



User Interface: HPC System Side

• Data will be acquired by intercepting system calls (e.g. “open”)
  – int x = open("/etc/hosts", O_RDONLY);
    – File: /etc/hosts
    – I/O: Input (O_RDONLY)
  – int x = open("/tmp/scratch.file", O_WRONLY, 00640);
    – File: /tmp/scratch.file
    – I/O: Output (O_WRONLY)

• A few possible methods for intercepting system calls
  – Currently using Dyninst

• Does not involve modifying user code

• Use on standard systems:– alexandria myexe inputs

• On lightweight-kernel systems may involve relinking
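Once an `open` call has been intercepted, deciding whether the file is an input or an output comes down to inspecting the access mode in the flags. The sketch below shows only that classification step, using the flag constants from Python's `os` module; it does not perform any interception and is not Alexandria or Dyninst code. The `classify_open` function, the record layout, and the "Input/Output" label for read-write opens are assumptions.

```python
import os

# Mask covering the three access modes a flags word can carry.
ACCESS_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def classify_open(path, flags):
    """Classify an intercepted open(path, flags) call as input or output."""
    mode = flags & ACCESS_MASK
    if mode == os.O_RDONLY:
        direction = "Input"
    elif mode == os.O_WRONLY:
        direction = "Output"
    else:
        # O_RDWR: the file may be both read and written (assumed label).
        direction = "Input/Output"
    return {"file": path, "io": direction}


print(classify_open("/etc/hosts", os.O_RDONLY))
print(classify_open("/tmp/scratch.file", os.O_WRONLY | os.O_CREAT))
```

Masking first matters because flags such as `O_CREAT` are OR-ed into the same word and would otherwise hide the access mode.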



Why System Call Interception?: Minimal Effort

Build
  untracked:
    FC=f95
    CC=cc
    all: myexe…
  tracked:
    FC=alexandria f95
    CC=alexandria cc
    all: myexe…

Simulation Run
  untracked: mpirun myexe input
  tracked: mpirun alexandria myexe input



Alexandria Object Database

[Figure: the example information flow graph, with files F0 through F6 and the application executions genmesh, myphysics, and ensight.]

• Storing everything necessary to exactly describe our simulations will generate a lot of data over time (think terabytes or more)

• The data is highly interconnected
  – M inputs and N outputs for every process
  – Each input/output can be an input/output for other processes

• Data querying must be fast enough for a user to perform interactive analysis

• Database must:
  – Be a robust commercial product
    – Data persists for decades
    – Need protection against corruption, etc.
  – Scale to very large datasets
  – Perform well with highly interconnected data
  – Require minimal administration costs
  – Minimize development time and effort

• To meet these requirements, we are using the Objectivity/DB Object Database.



What Outputs Are Impacted By buggyfile.f?

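[Figure: flow of information. buggyfile.f, otherfile1.f, and otherfile2.f, together with f95, feed the process "f95 *.f -o myexe", which produces myexe. myexe then appears as an input, alongside myinput1 and myinput2, to the processes "myexe myinput1" and "myexe myinput2". The outputs shown are myoutput1 and myoutput2.]

Inputs are to the left and outputs are to the right of a process (information flows left to right).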
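The question in this slide's title amounts to a forward traversal of the information flow graph. The following is a minimal sketch under stated assumptions, not Alexandria's query engine: the `PROCESSES` record format is hypothetical, and the pairing of each run with one output (myoutput1 with myinput1, myoutput2 with myinput2) is assumed.

```python
from collections import deque

# Hypothetical provenance records: (process, inputs, outputs).
PROCESSES = [
    ("f95 *.f -o myexe",
     ["buggyfile.f", "otherfile1.f", "otherfile2.f", "f95"], ["myexe"]),
    ("myexe myinput1", ["myexe", "myinput1"], ["myoutput1"]),
    ("myexe myinput2", ["myexe", "myinput2"], ["myoutput2"]),
]


def impacted_files(start, processes):
    """Return every file that lies downstream of `start` in the flow graph."""
    # Index: input file -> files produced by processes that consumed it.
    produced_from = {}
    for _name, inputs, outputs in processes:
        for item in inputs:
            produced_from.setdefault(item, []).extend(outputs)

    seen, queue = set(), deque([start])
    while queue:  # breadth-first walk, left to right through the graph
        current = queue.popleft()
        for produced in produced_from.get(current, []):
            if produced not in seen:
                seen.add(produced)
                queue.append(produced)
    return seen


print(sorted(impacted_files("buggyfile.f", PROCESSES)))
print(sorted(impacted_files("myinput1", PROCESSES)))
```

A bug in buggyfile.f taints the executable and everything it produced, while a problem in myinput1 reaches only the run that read it.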



How Did I Create bigexplosion.gif?

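[Figure: flow of information. Labels recoverable from the figure: the process "myexe mesh input" with the files myexe, mesh, input, libc.so, and liblapack.so; the files output1, output2, and output3; the process "gnuplot makeplot.gnp" with the file makeplot.gnp; and the final file bigexplosion.gif.]

Inputs are to the left and outputs are to the right of a process (information flows left to right).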



User Interface: Analysis & Query

• User interface to perform queries like:
  – Find the executable and all inputs used to generate a plot
  – Compare two simulations and identify differences
  – Locate a file with a given signature (e.g. in HPSS at location)
  – Determine the impact of problems in source files or libraries
  – Determine the genealogy of a given file
  – Find all simulations where a given input was used
  – Find all jobs run by a user during a time window
  – Etc.
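As one illustration of these queries, comparing two simulations can reduce to comparing the signatures of what each one read. This is a sketch under stated assumptions rather than Alexandria's interface: the `compare_jobs` function, the role-keyed job records, and the sample contents are all hypothetical.

```python
import hashlib


def sig(content):
    """SHA-1 signature of some bytes, standing in for a file signature."""
    return hashlib.sha1(content).hexdigest()


def compare_jobs(job_a, job_b):
    """Return {name: (signature_a, signature_b)} for every item that differs."""
    differences = {}
    for name in sorted(set(job_a) | set(job_b)):
        sig_a, sig_b = job_a.get(name), job_b.get(name)
        if sig_a != sig_b:  # also catches items present in only one job
            differences[name] = (sig_a, sig_b)
    return differences


# Two hypothetical runs that differ only in the shared library they loaded.
run1 = {"myexe": sig(b"executable build 1"), "input": sig(b"input deck"),
        "libc.so": sig(b"libc as installed in January")}
run2 = {"myexe": sig(b"executable build 1"), "input": sig(b"input deck"),
        "libc.so": sig(b"libc after a system upgrade")}

diff = compare_jobs(run1, run2)
print(list(diff))
```

The user supplied identical inputs to both runs, yet the comparison still surfaces the changed library, the kind of difference a person would not have recorded.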



Alexandria CLI Example: Job Setup

Set up a new job

Print the unique job id

Print the job’s current state

Run the calculation under the Alexandria interceptor



Alexandria CLI Example: Printing A Job

Unique Job ID

Process Timing Info

Input/Output File Info



Alexandria CGI Example: Where was this file used/created?



Alexandria + Code Usage/Coverage

• Considering tracking code usage in Alexandria

• Based on LANL code usage/coverage work (Javelina)

• Can be done with little overhead using Dyninst

• Alexandria would:
  – Record which functions executed during a simulation
  – Record which function a bug is in (in a particular source file)
  – Allow you to identify which simulations using a buggy source file executed the buggy function!
  – Allow you to identify which functions have not been executed over the last N years!
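If per-simulation function usage were recorded, the last two capabilities become simple set queries. The sketch below assumes a hypothetical record format (a set of source files and a set of executed functions per job); the job ids, function names, and helper functions are illustrative only, and the time window ("last N years") is not modeled.

```python
def runs_hitting_bug(runs, source_file, function):
    """Return ids of runs built from `source_file` that executed `function`."""
    return sorted(
        run_id for run_id, record in runs.items()
        if source_file in record["sources"] and function in record["executed"]
    )


def never_executed(runs, all_functions):
    """Return the functions that no recorded run has executed."""
    executed = set().union(*(record["executed"] for record in runs.values()))
    return sorted(set(all_functions) - executed)


# Hypothetical usage records for three simulations.
runs = {
    "job-001": {"sources": {"buggyfile.f", "otherfile1.f"},
                "executed": {"main", "solve"}},
    "job-002": {"sources": {"buggyfile.f", "otherfile1.f"},
                "executed": {"main", "buggy_routine"}},
    "job-003": {"sources": {"otherfile1.f"},
                "executed": {"main"}},
}

print(runs_hitting_bug(runs, "buggyfile.f", "buggy_routine"))
print(never_executed(runs, ["main", "solve", "buggy_routine", "legacy_eos"]))
```

Two jobs were built from the buggy file, but only one actually ran the buggy function, so only that job's results need to be revisited.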



Dyninst Challenges at LANL: Part 1

• We can’t give out any of our important binaries which break Dyninst
  – Dyninst is very difficult to debug

• Dyninst startup overhead
  – Improved by parsing only a subset of the binary
  – Can take >30min on a 100MB binary
  – Some binaries take longer to parse than others (PGI takes ~10x longer than GCC)
  – Still slow

• Dyninst runtime overhead
  – Traps are used too often on x86
    – Getting better
    – Performance has been improved by ~1000x for Javelina
  – “read” and “write” seem to run slowly when instrumented at exit

• MPI + Dyninst can lead to problems
  – “mpirun mydyninstprog myexe arg1 arg2 …” does not work with all MPI implementations
  – Seems to be a conflict between MPI startup and Dyninst (e.g. problems with signals)
  – Open-MPI seems to work fine (Yea!)



Dyninst Challenges at LANL: Part 2

• Dyninst is still brittle
  – A 100MB binary has stuff in it that Dyninst has never been tested against
    – Specific instruction sequences
    – Debug information
  – Robustness depends on the compiler/language
    – GCC-compiled applications have fewer problems than PGI-compiled applications
    – C/C++ applications have fewer problems than Fortran 9x applications
  – Robustness depends on the architecture/OS
    – Often have to debug Dyninst on each platform you intend your application to run on

• Supercomputers are “flavor of the week”
  – Systems have a lifetime of 3-5 years
  – Poorly supported platforms (Alpha/Tru64) are bought for performance (price) reasons
  – Our Linux clusters are significantly modified from standard distributions
  – Makes Dyninst support difficult
  – LANL, LLNL, and SNL are working to improve the situation



Final Note

LANL is involved in the Open|SpeedShop effort, and Dyninst will soon be used to obtain performance data at LANL.






Abstract

This talk discusses LANL’s use of Dyninst in Alexandria and Javelina. It gives an overview of both projects and lists the problems LANL has encountered with Dyninst.
