1 Vampir Overview

Post on 20-Jan-2015

Zellescher Weg 12

Willers-Bau A114

Tel. +49 351 - 463 - 38323

Andreas Knüpfer (andreas.knuepfer@tu-dresden.de)

Event Tracing with VampirTrace and Vampir

2

Introduction

Event Tracing Overview

Instrumentation

Run-Time Measurement

Conclusions

Overview

Introduction

4

Moore's Law still in charge, so what?

increasingly difficult to get close to peak performance

– for sequential computation

  • memory wall

  • optimum pipelining, ...

– for parallel interaction

  • Amdahl's law

  • synchronization with single late-comer, ...

efficiency is important because of limited resources

scalability is important to cope with next bigger simulation

Why bother with performance analysis?

5

Profile Recording

of aggregated information (Time, Counts, …)

about program and system entities

– functions, loops, basic blocks

– application, processes, threads, …

Methods of Profile Creation

sampling (statistical approach)

direct measurement (deterministic approach)

Profiling and Tracing

6

Trace Recording

run-time events (points of interest)

during program execution

saved as event record

– timestamp, process, thread, event type

– event specific information

via instrumentation & trace library

Event Trace

collection of all events of a process / program

sorted by time stamp

Profiling and Tracing

7

Tracing Advantages

preserve temporal and spatial relationships (context)

allow reconstruction of dynamic behavior

profiles can be calculated from traces

Tracing Disadvantages

traces can become very large

may cause perturbation

instrumentation and tracing is complicated

– event buffering, clock synchronization, …

Profiling and Tracing
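
The point that profiles can be calculated from traces can be made concrete with a small sketch. It assumes a simplified event tuple (timestamp, process, kind, function) invented for this illustration; it is not the VampirTrace or OTF record layout. The final LEAVE of main is also made up so that the example is balanced.

```python
from collections import defaultdict

def profile_from_trace(events):
    """Aggregate call counts and inclusive time per (process, function).

    `events` holds (timestamp, process, kind, function) tuples with kind
    "ENTER" or "LEAVE". This layout is a hypothetical simplification.
    """
    stacks = defaultdict(list)  # one call stack per process
    profile = defaultdict(lambda: {"calls": 0, "inclusive": 0})
    # Stable sort by time stamp only, so same-time events keep their order.
    for timestamp, process, kind, function in sorted(events, key=lambda e: e[0]):
        if kind == "ENTER":
            stacks[process].append((function, timestamp))
        elif kind == "LEAVE":
            entered_function, entered_at = stacks[process].pop()
            if entered_function != function:
                raise ValueError("unbalanced ENTER/LEAVE in trace")
            entry = profile[(process, function)]
            entry["calls"] += 1
            entry["inclusive"] += timestamp - entered_at
    return dict(profile)

events = [
    (10010, 1, "ENTER", "main"),
    (10090, 1, "ENTER", "foo"),
    (10400, 1, "LEAVE", "foo"),
    (10520, 1, "ENTER", "bar"),
    (10550, 1, "LEAVE", "bar"),
    (10700, 1, "LEAVE", "main"),
]
result = profile_from_trace(events)
print(result[(1, "foo")])
```

The reverse direction does not work: once events are aggregated into counts and sums, their order and timing can no longer be recovered.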

Event Tracing Overview

9

Event Tracing from A to Z

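Instrumentation → Run-Time Measurement → Visualization / Analysis

[Figure: source code (src) is instrumented and built into an instrumented executable (exec.); running it produces trace file(s) for visualization and analysis.]

see more below; see following presentation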

10

Which events to monitor?

enter/leave of function/routine/region

– time stamp, process/thread, function ID

send/receive of P2P message (MPI)

– time stamp, sender, receiver, length, tag, communicator

collective communication (MPI)

– time stamp, process, root, communicator, # bytes

hardware performance counter value

– time stamp, process, counter ID, value

corresponding “record types” in trace file format

Most common event types

11

10010 P 1 ENTER 5

10090 P 1 ENTER 6

10110 P 1 ENTER 12

10110 P 1 SEND TO 3 LEN 1024 ...

10330 P 1 LEAVE 12

10400 P 1 LEAVE 6

10520 P 1 ENTER 9

10550 P 1 LEAVE 9

...

10020 P 2 ENTER 5

10095 P 2 ENTER 6

10120 P 2 ENTER 13

10300 P 2 RECV FROM 3 LEN 1024 ...

10350 P 2 LEAVE 13

10450 P 2 LEAVE 6

10620 P 2 ENTER 9

10650 P 2 LEAVE 9

...

DEF TIMERRES 1000000000

DEF PROCESS 1 `Master`

DEF PROCESS 2 `Slave`

DEF FUNCTION 5 `main`

DEF FUNCTION 6 `foo`

DEF FUNCTION 9 `bar`

DEF FUNCTION 12 `MPI_Send`

DEF FUNCTION 13 `MPI_Recv`

Parallel Trace Files

Trace Format Schematics
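
To show how definition records and event records fit together, here is a minimal sketch that parses the schematic text form used on this slide. That text form is only an illustration, not a real on-disk trace format. The function name `parse_schematic` is hypothetical, and treating TIMERRES as ticks per second is an assumption.

```python
def parse_schematic(lines):
    """Split schematic records into definitions and events."""
    functions, processes, events = {}, {}, []
    timer_resolution = None
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "...":
            continue
        if fields[0] == "DEF":
            if fields[1] == "TIMERRES":
                timer_resolution = int(fields[2])
            elif fields[1] == "FUNCTION":
                functions[int(fields[2])] = fields[3].strip("`")
            elif fields[1] == "PROCESS":
                processes[int(fields[2])] = fields[3].strip("`")
        else:
            # Event record: <timestamp> P <process> <type> <details...>
            events.append((int(fields[0]), int(fields[2]), fields[3], fields[4:]))
    return timer_resolution, processes, functions, events

sample = """\
DEF TIMERRES 1000000000
DEF PROCESS 1 `Master`
DEF FUNCTION 5 `main`
DEF FUNCTION 12 `MPI_Send`
10010 P 1 ENTER 5
10110 P 1 ENTER 12
10110 P 1 SEND TO 3 LEN 1024
10330 P 1 LEAVE 12
""".splitlines()

resolution, processes, functions, events = parse_schematic(sample)
for timestamp, process, kind, details in events:
    if kind in ("ENTER", "LEAVE"):
        # Resolve the numeric function ID through the definition records.
        name = functions[int(details[0])]
        print(f"{timestamp / resolution:.9f} s  P{process}  {kind} {name}")
```

Keeping names in definition records and only numeric IDs in event records keeps the frequent records small.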

12

Trace Visualization: Timeline Display

13

Trace Visualization: Process Timeline Display

14

Trace Visualization: Statistic Summary Display

15

Trace Visualization: Message Statistics Display

16

The Vampir Tool Family

VampirTrace

convenient instrumentation and measurement

hides away complicated details

provides many options and switches for experts

VampirTrace is part of Open MPI 1.3

Vampir/VampirServer

interactive trace visualization and analysis

intuitive browsing and zooming

scalable to large trace data sizes (100GB)

scalable to high parallelism (2000 processes)

Vampir for Windows in progress, beta version available

17

Open Trace Format (OTF)

Open source trace file format

Includes powerful libotf for use in custom applications

High level interface for tools + low level interface for trace libraries

Other Formats

TAU Trace Format (Univ. of Oregon)

Epilog (ZAM, FZ Jülich)

STF (Pallas, now Intel)

Trace File Formats

18

Other Event Tracing Tools

TAU profiling (University of Oregon, USA)

– profiling and tracing for parallel applications

– http://www.cs.uoregon.edu/research/tau/

Paraver (CEPBA, Barcelona, Spain)

– trace based parallel performance analysis and visualization

– http://www.cepba.upc.edu/paraver/

Scalasca (FZ Jülich)

– tracing and automatic detection of performance problems

– http://www.scalasca.org/

Intel Trace Collector & Analyzer

– Very similar to Vampir

Other Tools

Instrumentation

20

Instrumentation: Process of modifying programs to detect and report events by calling instrumentation functions.

instrumentation functions provided by trace library

notification about run-time event

there are various ways of instrumentation

Instrumentation

21

Edit – Compile – Run Cycle

Source Code → Compiler → Binary → Run → Results

Edit – Compile – Run Cycle with VampirTrace

Source Code → VT Wrapper (Compiler) → Binary → Run → Results and Traces

Instrumentation

22

Source code instrumentation

– manually

– automatically

Instrumentation with wrapper functions

Library pre-load instrumentation

Compiler Instrumentation

Binary instrumentation

VampirTrace supports different methods of instrumentation

Hidden in compiler wrappers

Instrumentation Types

23

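Original function:

int foo(void* arg) {
    if (cond) {
        return 1;
    }
    return 0;
}

Instrumented function:

int foo(void* arg) {
    enter(7);           /* report ENTER for function ID 7 */
    if (cond) {
        leave(7);       /* every return path must report LEAVE */
        return 1;
    }
    leave(7);
    return 0;
}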

manually or automatically

Source Code Instrumentation

24

manually

large effort

error prone

difficult to manage

automatically

via source to source translation

Program Database Toolkit (PDT)

http://www.cs.uoregon.edu/research/pdt/

OpenMP Pragma And Region Instrumentor (Opari)

http://www.fz-juelich.de/zam/kojak/opari/

Source Code Instrumentation

25

provide wrapper functions

– call instrumentation function for notification

– call original target for actual functionality

implement via library pre-load

or via preprocessor directives

suitable for standard libraries (e.g. MPI, glibc)

can evaluate function call semantics (function signature, arguments)

#define fread WRAPPER_glibc_fread

#define fwrite WRAPPER_glibc_fwrite

Instrumentation with Wrapper Functions
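
The wrapper pattern itself is language independent. As an illustration only (this is Python, not the way VampirTrace wraps glibc or MPI), the sketch below shows the two steps: notify the trace library, then call the original target. The names `traced`, `trace` and `write_block` are hypothetical.

```python
import functools
import time

trace = []  # in-memory event list standing in for a trace library

def traced(function):
    """Return a wrapper that records ENTER/LEAVE around `function`."""
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        # The wrapper sees the call arguments, so it can record call semantics.
        trace.append((time.perf_counter_ns(), "ENTER", function.__name__, args))
        try:
            return function(*args, **kwargs)  # original target does the work
        finally:
            # LEAVE is recorded on every exit path, including exceptions.
            trace.append((time.perf_counter_ns(), "LEAVE", function.__name__, ()))
    return wrapper

@traced
def write_block(data):
    return len(data)

written = write_block(b"abcd")
print(written, [(kind, name) for _, kind, name, _ in trace])
```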

26

Instrumentation via library pre-load, e.g. for MPI

Each MPI function has two names:

– MPI_xxx and PMPI_xxx

Selective replacement of MPI routines at link time

[Figure: the user program calls MPI_Send; a wrapper library provides its own MPI_Send, which calls PMPI_Send in the MPI library.]

The MPI Profiling Interface

27

gcc -finstrument-functions -c foo.c

many compilers support instrumentation:

(GCC, Intel, IBM, PGI, NEC, Hitachi, Sun Fortran, …)

no common API, different command line switches, different behavior

no source modification necessary

managed by VampirTrace

void __cyg_profile_func_enter( <args> );

void __cyg_profile_func_exit( <args> );

Compiler Instrumentation

28

modify binary executable in main memory (or in a file)

insert instrumentation calls

very platform/machine dependent

expensive

Using the DynInst project

provides common interface to binary instrumentation

available for Alpha/Tru64, MIPS/IRIX, PowerPC/AIX, Sparc/Solaris, x86/Linux+Windows, ia64/Linux

see http://www.dyninst.org

Dynamic Instrumentation

29

Use VampirTrace compiler wrappers

Internals and platform specifics hidden

Select appropriate way(s) of instrumentation

Substitute calls to the regular compiler with calls to compiler wrappers, for example:

before: CC=mpicc

after: CC=vtcc

Practical Instrumentation

Run Time Measurement

31

What does the trace library do?

provide instrumentation functions

receive events of various types

collect event properties

– time stamp

– location (thread, process, cluster node, MPI rank)

– event specific properties

– perhaps hardware performance counter values

record to memory buffer, flush eventually

try to be fast, minimize overhead

Trace Library
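
The "record to memory buffer, flush eventually" step can be sketched as follows. This is a minimal illustration and not VampirTrace code. The class name `TraceBuffer`, the record layout and the capacity measured in records are all assumptions.

```python
import time

class TraceBuffer:
    """Collect event records in memory and hand them on in batches."""

    def __init__(self, sink, capacity=4):
        self.sink = sink          # callable that receives a list of records
        self.capacity = capacity  # hypothetical buffer size, in records
        self.records = []

    def record(self, process, kind, detail):
        # Take the time stamp first so bookkeeping adds as little skew as possible.
        timestamp = time.perf_counter_ns()
        self.records.append((timestamp, process, kind, detail))
        if len(self.records) >= self.capacity:
            self.flush()

    def flush(self):
        if self.records:
            self.sink(self.records)
            self.records = []

flushed = []
buffer = TraceBuffer(flushed.append, capacity=2)
buffer.record(1, "ENTER", 5)
buffer.record(1, "ENTER", 6)  # buffer is full, so the first flush happens here
buffer.record(1, "LEAVE", 6)
buffer.flush()                # final flush at program end
print([len(batch) for batch in flushed])
```

Appending to memory keeps the cost per event low; the expensive write happens only when the buffer fills up, which is also where most of the measurement perturbation comes from.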

32

There are a number of run-time options

Controlled by environment variables

PAPI hardware performance counters

Memory allocation counters

Application I/O calls

Filtering

Grouping

more ...

see more in the following presentations and hands-on parts

Run-Time Options

33

Include hardware performance counters in traces

– via PAPI library

– or Sun Solaris CPC counters

– or NEC SX counters

VT_METRICS can be used to specify a colon-separated list of counters

see papi_avail and papi_command_line tools etc.

see VampirTrace Documentation for CPC and NEC counters

set VT_METRICS environment variable

export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM

Performance Counters

34

monitor memory allocation behavior

record memory volume as counter

record glibc calls like “malloc” and “free” as function calls

via environment variable VT_MEMTRACE

export VT_MEMTRACE=yes

Memory Allocation Tracing

35

monitor POSIX I/O behavior

record read/write rates as counters

record standard I/O calls like “open” and “read”

via environment variable VT_IOTRACE

mmap I/O not supported

export VT_IOTRACE=yes

I/O Tracing

36

selective tracing of certain functions/subroutines

one way to reduce trace file size!

via environment variable VT_FILTER_SPEC

run-time filtering, no re-compilation or re-linking

see also the vtfilter tool

– can create a filter file with rough target size estimate

– can apply a filter to an existing trace file as post processing

export VT_FILTER_SPEC=/home/user/filter.spec

my*;test -- 1000
calculate -- -1
* -- 1000000

Function Filtering
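
How such a filter file could be evaluated at run time is sketched below. The sketch rests on assumptions about the semantics: each line lists wildcard patterns followed by a call limit, the first matching line wins, a limit of -1 means unlimited, and calls beyond the limit are not recorded. Check the VampirTrace documentation for the authoritative rules. Python's `fnmatchcase` also accepts more wildcard forms than a real filter file may allow.

```python
from fnmatch import fnmatchcase

def parse_filter(text):
    """Parse lines of the form 'pattern;pattern -- limit' into rules."""
    rules = []
    for line in text.splitlines():
        patterns, separator, limit = line.strip().partition(" -- ")
        if not separator:
            continue  # skip blank or malformed lines
        rules.append(([p.strip() for p in patterns.split(";")], int(limit)))
    return rules

def call_limit(rules, function_name):
    """Return the limit of the first matching rule, or -1 if none matches."""
    for patterns, limit in rules:
        if any(fnmatchcase(function_name, p) for p in patterns):
            return limit
    return -1

class CallFilter:
    """Decide per call whether an event should still be recorded."""

    def __init__(self, rules):
        self.rules = rules
        self.seen = {}  # calls observed so far, per function

    def should_record(self, function_name):
        limit = call_limit(self.rules, function_name)
        count = self.seen.get(function_name, 0) + 1
        self.seen[function_name] = count
        return limit < 0 or count <= limit

rules = parse_filter("my*;test -- 1000\ncalculate -- -1\n* -- 1000000")
print(call_limit(rules, "my_solver"), call_limit(rules, "calculate"),
      call_limit(rules, "other"))
```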

37

define user-specified groups

highlighting application behavior, different activities, program phases

– communication, computation, initialization, different libraries, ...

groups are assigned to colors in Vampir displays

run-time grouping, no re-compilation or re-linking

via environment variable VT_GROUPS_SPEC

contains a list of groups of associated functions, wildcards allowed

export VT_GROUPS_SPEC=/home/<user>/groups.spec

CALC=calculate
MISC=my*;test
UNKNOWN=*

Function Grouping
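
Assigning functions to groups from such a file could look like the sketch below. It assumes that lines are checked in order and that the first matching group wins, which makes a trailing `*` line act as a catch-all. The helper names `parse_groups` and `group_of` are hypothetical.

```python
from fnmatch import fnmatchcase

def parse_groups(text):
    """Parse 'GROUP=pattern;pattern' lines into an ordered list."""
    groups = []
    for line in text.splitlines():
        name, separator, patterns = line.strip().partition("=")
        if not separator:
            continue  # skip blank or malformed lines
        groups.append((name.strip(), [p.strip() for p in patterns.split(";")]))
    return groups

def group_of(groups, function_name, default=None):
    """Return the first group whose patterns match the function name."""
    for name, patterns in groups:
        if any(fnmatchcase(function_name, p) for p in patterns):
            return name
    return default

groups = parse_groups("CALC=calculate\nMISC=my*;test\nUNKNOWN=*")
for function_name in ("calculate", "my_init", "test", "solve"):
    print(function_name, "->", group_of(groups, function_name))
```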

38

Further activities of the trace library:

Data management

– Trace data is written to a buffer in memory first

– When this buffer is full, data is flushed to files

– Data compression, etc

Timer selection and time synchronization between local clocks

– use highly accurate clocks

Unification of local process/thread traces (post processing)

– trace processes/threads separately

– collect all traces of all parallel processes/threads at the end

– add global information about all participants

Behind the Scenes
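
The unification step can be pictured as a merge of per-process event streams onto one global time axis. The sketch below assumes that each local trace is already sorted and that clock synchronization has produced one constant offset per process. Real synchronization schemes may also correct for clock drift, so this is the simplest possible model. The function name `unify` is hypothetical.

```python
import heapq

def unify(local_traces, clock_offsets):
    """Merge per-process event lists into one stream ordered by global time.

    local_traces:  {process: [(local_timestamp, kind, detail), ...]}, each sorted
    clock_offsets: {process: ticks to add to reach the global time base}
    """
    def shifted(process, events):
        offset = clock_offsets.get(process, 0)
        for timestamp, kind, detail in events:
            yield (timestamp + offset, process, kind, detail)

    streams = [shifted(process, events) for process, events in local_traces.items()]
    # Order by global time stamp, breaking ties by process number.
    return list(heapq.merge(*streams, key=lambda event: (event[0], event[1])))

local_traces = {
    1: [(10010, "ENTER", 5), (10400, "LEAVE", 5)],
    2: [(9020, "ENTER", 5), (9450, "LEAVE", 5)],
}
merged = unify(local_traces, {1: 0, 2: 1000})
for event in merged:
    print(event)
```

A k-way merge like this never needs all events in memory at once, which matters when the local traces are large.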

Conclusions

40

performance analysis is very important in HPC

use performance analysis tools for profiling and tracing

do not spend effort on DIY solutions such as printf debugging

use tracing tools with some precautions

– overhead

– data volume

let us know about problems and feature wishes via vampirsupport@zih.tu-dresden.de

Conclusion

41

available via http://www.vampir.eu/ and http://www.tu-dresden.de/zih/vampirtrace/

Thank you!