Software Instrumentation and Hardware Profiling for Intel® Itanium® Linux*

CGO’04 Tutorial, 3/21/04

Robert Cohn, Intel
Stéphane Eranian, HP
CK Luk, Intel
Vijay Janapa Reddi, University of Colorado

* Other names and brands may be claimed as the property of others


Agenda

• Instrumentation – Robert, Vijay
• Itanium Performance Monitoring Unit – CK
• Break
• Linux/IA64 Support for Performance Monitoring – Stéphane
• Guiding Ispike with Instrumentation and Hardware (PMU) Profiles – CK

Instrumentation of Intel® Itanium® Linux* Programs with Pin

Download: http://systems.cs.colorado.edu/Pin/

Robert Cohn, Intel

* Other names and brands may be claimed as the property of others

What Does Pin Stand For?

• Pin Is Not an acronym

• Pin is based on the post-link optimizer Spike
  – Use dynamic code generation to make a less intrusive profile-guided optimization and instrumentation system
  – Pin is a small Spike

Instrumentation

    max = 0;
    for (p = head; p; p = p->next)
    {
        if (p->value > max)
        {
            max = p->value;
        }
    }

Instrumentation added to this code (user defined, dynamic):

    count[0]++;
    count[1]++;
    printf("In Loop\n");
    printf("In max\n");

What Can You Do With Instrumentation?

• Profiler for optimization:
  – Basic block count
  – Value profile
• Micro architectural study
  – Instrument branches to simulate branch predictor
  – Generate traces
• Bug checking
  – Find references to uninitialized, unallocated data
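
The branch-predictor study listed above needs very little on the analysis side: each executed branch reports its address and whether it was taken. The following is a minimal sketch of such a model, assuming a table of 2-bit saturating counters indexed by the low bits of the branch address. The class `BranchPredictor` and its methods are hypothetical names, not part of Pin; in a real tool the two arguments would presumably come from arguments such as IARG_IP_SLOT and IARG_BRANCH_TAKEN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical analysis-side model: a table of 2-bit saturating counters.
// Counter values 0 and 1 predict not-taken; 2 and 3 predict taken.
class BranchPredictor {
public:
    explicit BranchPredictor(std::size_t entries) : table_(entries, 1) {
        // The index mask below requires a power-of-two table size.
        assert(entries > 0 && (entries & (entries - 1)) == 0);
    }

    // Would be called from an analysis routine for every executed branch.
    void OnBranch(std::uint64_t ip, bool taken) {
        std::uint8_t &ctr = table_[ip & (table_.size() - 1)];
        const bool predictedTaken = ctr >= 2;
        ++branches_;
        if (predictedTaken != taken) ++mispredicts_;
        if (taken && ctr < 3) ++ctr;    // saturate at 3
        if (!taken && ctr > 0) --ctr;   // saturate at 0
    }

    std::uint64_t Branches() const { return branches_; }
    std::uint64_t Mispredicts() const { return mispredicts_; }

private:
    std::vector<std::uint8_t> table_;
    std::uint64_t branches_ = 0;
    std::uint64_t mispredicts_ = 0;
};
```

Because the model is plain C++ with no dependency on the instrumentation system, it can be tested on its own before being driven by real branch events.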

Classification of Instrumentation Tools

Source
  src to src compiler: CIL
  cc
Binary
  static: Atom
  dynamic
    JIT: Pin
    Editing: Dyninst

See last slide for references and more examples

JIT-Based: Execution Drives Instrumentation

[Figure: control-flow graph of the original code, blocks 1 to 7. The compiler has placed translated copies 1’, 2’, and 7’ in the code cache.]

Execution Drives Instrumentation

[Figure: control-flow graph of the original code, blocks 1 to 7. The compiler's code cache now holds translated copies 1’, 2’, 3’, 5’, 6’, and 7’.]

Inserting Instrumentation

Relative to an instruction:
1. Before
2. After
3. Taken edge of branch

[Figure: the calls count(3), count(100), and count(105) inserted at these three kinds of points in the following code]

    mov r4 = 2
    (p1) br.cond L2
    add r3 = 8, r9

L2: mov r9 = 4
    br.ret

Analysis Routines

• Instead of inserting IPF instructions, user inserts calls to analysis routine
  – User specified arguments
  – E.g. increment counter, record memory address, …
• Written in C, ASM, etc.
• Optimizations like inlining, register allocation, and scheduling make it efficient

Instrumentation Routines

• Instrumentation routine walks list of instructions, and inserts calls to analysis routines
• User writes instrumentation routine
• Pin invokes instrumentation routine when placing new instructions in code cache
• Repeated execution uses already instrumented code in code cache
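
The split between the two kinds of routine can be illustrated with a toy model that does not use Pin at all. This is only a sketch under stated assumptions: `ToyJit`, `Analysis`, and `RunToyLoop` are hypothetical names, an "instruction" is just an address, and the "code cache" is a map from address to the analysis calls attached to it. The instrumentation routine runs the first time an address is executed; the attached analysis routines run on every execution.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Toy stand-in for a JIT-based instrumentation system (not the Pin API).
using Analysis = std::function<void()>;

struct ToyJit {
    // User-supplied instrumentation routine: given an instruction address,
    // returns the analysis calls to attach to that instruction.
    std::function<std::vector<Analysis>(std::uint64_t)> instrument;

    // "Code cache": the instrumented form of every instruction seen so far.
    std::map<std::uint64_t, std::vector<Analysis>> codeCache;
    int instrumentCalls = 0;

    // Execute one instruction at addr.
    void Execute(std::uint64_t addr) {
        assert(static_cast<bool>(instrument));  // a routine must be registered
        auto it = codeCache.find(addr);
        if (it == codeCache.end()) {
            // First execution: run the instrumentation routine once.
            ++instrumentCalls;
            it = codeCache.emplace(addr, instrument(addr)).first;
        }
        // Every execution: run the attached analysis routines.
        for (const Analysis &a : it->second) a();
    }
};

// Runs a three-instruction loop body the given number of times and returns
// {instrumentation calls, analysis calls}.
inline std::pair<int, std::uint64_t> RunToyLoop(int iterations) {
    std::uint64_t icount = 0;
    ToyJit jit;
    jit.instrument = [&icount](std::uint64_t) {
        return std::vector<Analysis>{[&icount] { ++icount; }};
    };
    const std::uint64_t body[] = {0x4000, 0x4001, 0x4002};
    for (int i = 0; i < iterations; ++i)
        for (std::uint64_t addr : body) jit.Execute(addr);
    return {jit.instrumentCalls, icount};
}
```

Calling `RunToyLoop(100)` in this model invokes the instrumentation routine 3 times and the analysis routine 300 times, which is the asymmetry the bullets describe.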

Example: Instruction Count

[rscohn1@shli0005 Tests]$ hello

Hello world

[rscohn1@shli0005 Tests]$ icount -- hello

Hello world

ICount 496890

[rscohn1@shli0005 Tests]$

Example: Instruction Count

[Figure: counter++ inserted before each instruction]

    counter++;
    mov r2 = 2
    counter++;
    add r3 = 4, r3
    counter++;
    (p2) br.cond L1
    counter++;
    add r4 = 8, r4
    counter++;
    br.cond L2

    #include <stdio.h>
    #include "pinstr.H"

    UINT64 icount = 0;

    // Analysis routine: runs every time an instrumented instruction executes
    void docount() { icount++; }

    // Instrumentation routine: runs when Pin places an instruction in the code cache
    void Instruction(INS ins)
    {
        PIN_InsertCall(IPOINT_BEFORE, ins, (AFUNPTR)docount, IARG_END);
    }

    // Runs when the application exits
    VOID Fini()
    {
        fprintf(stderr, "ICount %llu\n", (unsigned long long)icount);
    }

    int main(int argc, char *argv[])
    {
        PIN_AddInstrumentInstructionFunction(Instruction);
        PIN_AddFiniFunction(Fini);
        PIN_StartProgram();
    }

Example: Instruction Trace

[rscohn1@shli0005 Trace]$ itrace -- hello
Hello world
[rscohn1@shli0005 Trace]$ head prog.trace
0x20000000000045c0
0x20000000000045c1
0x20000000000045c2
0x20000000000045d0
0x20000000000045d2
0x20000000000045e0
0x20000000000045e1
0x20000000000045e2
[rscohn1@shli0005 Trace]$

Example: Instruction Trace

[Figure: traceInst(ip) inserted before each instruction]

    traceInst(ip);
    mov r2 = 2
    traceInst(ip);
    add r3 = 4, r3
    traceInst(ip);
    (p2) br.cond L1
    traceInst(ip);
    add r4 = 8, r4
    traceInst(ip);
    br.cond L2

    #include <stdio.h>
    #include "pinstr.H"

    FILE *traceFile;

    // Analysis routine: write the address (IP plus slot) of each executed instruction
    void traceInst(long *ipsyll)
    {
        fprintf(traceFile, "%p\n", (void *)ipsyll);
    }

    // Instrumentation routine: pass each instruction's IP and slot to traceInst
    void Instruction(INS ins)
    {
        PIN_InsertCall(IPOINT_BEFORE, ins, (AFUNPTR)traceInst, IARG_IP_SLOT, IARG_END);
    }

    int main(int argc, char *argv[])
    {
        PIN_AddInstrumentInstructionFunction(Instruction);
        traceFile = fopen("prog.trace", "w");
        PIN_StartProgram();
    }

Arguments to Analysis Routine

• IARG_UINT8, …, IARG_UINT64

• IARG_REG_VALUE <register name>

• IARG_IP_SLOT

• IARG_BRANCH_TAKEN

• IARG_BRANCH_TARGET_ADDRESS

• IARG_THREAD_ID

• IARG_IN_SIGNAL

More Advanced Tools

• Instruction cache simulation: replace itrace analysis function

• Data cache: like icache, but instrument loads/stores and pass effective address

• Malloc/Free trace: instrument entry/exit points
• Detect out of bound stack references
  – Instrument instructions that move stack pointer
  – Instrument loads/stores to check in bound
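
As an illustration of the first bullet, the analysis function of the trace tool could update a cache model instead of writing each address to a file. The sketch below assumes a direct-mapped cache; `DirectMappedCache` and its parameters are hypothetical names chosen for this example, and the geometry is supplied by the caller rather than taken from any real Itanium processor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical direct-mapped cache model driven by instruction addresses.
class DirectMappedCache {
public:
    DirectMappedCache(std::size_t numLines, std::size_t lineBytes)
        : lineBytes_(lineBytes), tags_(numLines, 0), valid_(numLines, false) {
        assert(numLines > 0 && lineBytes > 0);
    }

    // Plays the role of traceInst: called once per executed instruction.
    // Returns true on a hit, false on a miss (and fills the line).
    bool Access(std::uint64_t addr) {
        const std::uint64_t line = addr / lineBytes_;
        const std::size_t index = static_cast<std::size_t>(line % tags_.size());
        if (valid_[index] && tags_[index] == line) {
            ++hits_;
            return true;
        }
        valid_[index] = true;
        tags_[index] = line;   // the full line number serves as the tag
        ++misses_;
        return false;
    }

    std::uint64_t Hits() const { return hits_; }
    std::uint64_t Misses() const { return misses_; }

private:
    std::size_t lineBytes_;
    std::vector<std::uint64_t> tags_;
    std::vector<bool> valid_;
    std::uint64_t hits_ = 0;
    std::uint64_t misses_ = 0;
};
```

The same class would serve as a data cache model if the instrumentation passed the effective address of each load and store instead of the instruction address.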

Example: Faster Instruction Count

[Figure: the five per-instruction counter++ calls are replaced by one call for each group of instructions that ends in a branch]

    counter += 3;
    mov r2 = 2
    add r3 = 4, r3
    (p2) br.cond L1

    counter += 2;
    add r4 = 8, r4
    br.cond L2

Sequences

• List of instructions that is only entered from top

Program:
    mov r2 = 2
L2:
    add r3 = 4, r3
    add r4 = 8, r4
    br.cond L2

Sequence 1:
    mov r2 = 2
    add r3 = 4, r3
    add r4 = 8, r4
    br.cond L2

Sequence 2:
    add r3 = 4, r3
    add r4 = 8, r4
    br.cond L2

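    // Analysis routine: add the size of a whole group in one call
    void docount(UINT64 c) { icount += c; }

    // Instrumentation routine for a sequence: one docount call per group of
    // instructions ending in a control transfer, instead of one per instruction
    void Sequence(INS head)
    {
        INS ins;
        INS last = INS_INVALID();
        UINT64 count = 0;

        for (ins = head; ins != INS_INVALID(); ins = INS_Next(ins))
        {
            count++;
            switch (INS_Category(ins))
            {
              case TYPE_CAT_BRANCH:
              case TYPE_CAT_CBRANCH:
              case TYPE_CAT_JUMP:
              case TYPE_CAT_CJUMP:
              case TYPE_CAT_CHECK:
              case TYPE_CAT_BREAK:
                // Control may leave the sequence here: count everything so far
                PIN_InsertCall(IPOINT_BEFORE, ins, (AFUNPTR)docount, IARG_UINT64, count, IARG_END);
                count = 0;
                break;
            }
            last = ins;
        }

        // Count any instructions after the last control transfer
        PIN_InsertCall(IPOINT_AFTER, last, (AFUNPTR)docount, IARG_UINT64, count, IARG_END);
    }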
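
The grouping logic of the Sequence routine can be checked in isolation. The sketch below is a stand-alone model, not Pin code: `Kind` and `PlanCounts` are hypothetical names, an instruction is reduced to whether it ends a group, and the result lists where a counting call would go and with what argument. Unlike the routine above, this sketch omits the trailing call when its count would be zero.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical instruction kinds; only "ends a group" matters here.
enum class Kind { Other, Branch };

// Returns (instruction index, count) pairs. One counting call is planned
// before each branch, covering the branch and the instructions since the
// previous call. A trailing run without a branch gets one final call,
// reported at index seq.size() to stand for "after the last instruction".
inline std::vector<std::pair<std::size_t, unsigned>>
PlanCounts(const std::vector<Kind> &seq) {
    std::vector<std::pair<std::size_t, unsigned>> plan;
    unsigned count = 0;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        ++count;
        if (seq[i] == Kind::Branch) {
            plan.emplace_back(i, count);
            count = 0;
        }
    }
    if (count > 0) plan.emplace_back(seq.size(), count);
    return plan;
}
```

For the five-instruction example used earlier (two ordinary instructions, a branch, one ordinary instruction, a branch), the plan contains two calls with counts 3 and 2, matching the "counter += 3" and "counter += 2" of the faster instruction count.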

Instruction Information Accessed at Instrumentation Time

1. INS_Category(INS)

2. INS_Address(INS)

3. INS_Regr1, INS_Regr2, INS_Regr3, …

4. INS_Next(INS), INS_Prev(INS)

5. INS_BraType(INS)

6. INS_SizeType(INS)

7. INS_Stop(INS)

Callbacks

• Callbacks for instrumentation
  – PIN_AddInstrumentInstructionFunction
  – PIN_AddInstrumentSequenceFunction
• Other callbacks
  – PIN_AddImageLoadFunction
  – PIN_AddImageUnloadFunction
  – PIN_AddThreadBeforeFunction
  – PIN_AddThreadAfterFunction
  – PIN_AddFiniFunction: last thread exits

Instrumentation is Transparent

• When application looks at itself, sees same:
  – Code addresses
  – Data addresses
  – Memory contents
  Don’t want to change behavior, expose latent bugs
• When instrumentation looks at application, sees original application:
  – Code addresses
  – Data addresses
  – Memory contents
  Observe original behavior

Pin Instruments All Code

• Execution driven instrumentation:
  – Shared libraries
  – Dynamically generated code
• Self modifying code
  – Instrumented first time executed
  – Pin does not detect code has been modified

Dynamic Instrumentation

• While program is running:
  – Instrumentation can be turned on/off
  – Code cache can be invalidated
  – Reinstrumented the next time it is executed
  – Pin can detach and run application native
• Use this for fast skip

Making Instrumentation Fast

• Use sequences to reduce analysis calls
• Do work at instrumentation-time
  – Instrumentation functions executed once
  – Analysis function executed every time instruction is executed

    PIN_InsertCall(IPOINT_BEFORE, ins, Fun, Lookup(INS_Address(ins)), IARG_END);
    Fun(BUCKET *){…

Better than:

    PIN_InsertCall(IPOINT_BEFORE, ins, Fun, IARG_IP, IARG_END);
    Fun(UINT64 ip){ Lookup(ip); …
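
The cost difference between the two variants can be made concrete with a small stand-alone model. This is a sketch under assumptions: `Profile`, `Bucket`, `FunSlow`, `FunFast`, and `CompareLookups` are hypothetical names, and the table is an ordinary `std::map` rather than whatever structure a real tool would use. The model simply counts how often the lookup runs in each variant.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

struct Bucket {
    std::uint64_t hits = 0;
};

// Hypothetical profile table keyed by instruction address.
struct Profile {
    std::map<std::uint64_t, Bucket> buckets;
    std::uint64_t lookups = 0;  // how many times Lookup ran

    Bucket *Lookup(std::uint64_t ip) {
        ++lookups;
        return &buckets[ip];    // std::map keeps element addresses stable
    }
};

// Slow variant: the analysis routine receives the address and performs the
// lookup on every execution.
inline void FunSlow(Profile &p, std::uint64_t ip) { p.Lookup(ip)->hits++; }

// Fast variant: the lookup was done at instrumentation time, so the
// analysis routine only touches the bucket it was handed.
inline void FunFast(Bucket *b) { b->hits++; }

// Executes one instruction n times each way and returns
// {lookups in the slow variant, lookups in the fast variant}.
inline std::pair<std::uint64_t, std::uint64_t>
CompareLookups(std::uint64_t ip, int n) {
    Profile slow, fast;
    for (int i = 0; i < n; ++i) FunSlow(slow, ip);
    Bucket *b = fast.Lookup(ip);  // once, at "instrumentation time"
    for (int i = 0; i < n; ++i) FunFast(b);
    assert(slow.buckets[ip].hits == fast.buckets[ip].hits);
    return {slow.lookups, fast.lookups};
}
```

Both variants record the same profile; only the number of lookups differs, growing with the execution count in the slow variant and staying at one per instrumented instruction in the fast one.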

Future

• Support for other ISAs
  – x86
  – Arm

• Attach to running process

Advanced Topics

• Symbol table

• Altering program behavior

• Threads

• Debugging

• Jitting

Symbol Table/Image

• Query:
  – Address → symbol name
  – Address → image name (e.g. libc.so)
  – Address → source file, line number
• Instrumentation:
  – Procedure before/after
    • PIN_InsertCall(IPOINT_BEFORE, Sym, Afun, IARG_REG_VALUE, REG_REG_GP_ARG0, IARG_END)
    • Before: at entry point
    • After: immediately before return is executed
  – Catch image load/unload

Alter Program Behavior

• Analysis routines can write application memory

• Replace one procedure with another
  – E.g. replace library calls
  – PIN_ReplaceProcedureByAddress(address, funptr);
  – Replaces function at address with funptr

Alter Program Behavior

• Change values of registers:
  – PIN_InsertCall(IPOINT_BEFORE, ins, zap, IARG_RETURN_VALUES, IARG_REG_VALUE, REG_G03, IARG_END);
  – Return value of function zap is written to r3

Alter Program Behavior

• Change instructions:
    ld8 r3=[r4]
  Becomes:
    ld8 r3=[r9]
  – INS_RegR1Set(ins, REG_G09)
• Pin provides virtual registers for instrumentation:
  – REG_INST_GP0 through REG_INST_GP9

Threads

• Pin is thread safe
• Pin assumes your tool is not thread safe
  – Tell Pin how many threads your tool can handle
  – PIN_SetMaxThreads(INT)
• Make your tool thread safe
  – Instrumentation code: guarded by single lock
  – Analysis code: protect global data structures
• Lock
  – Use Pin-provided routines, not pthreads
• Thread local storage
  – Use IARG_THREAD_ID to index into array
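
The thread-local-storage bullet can be sketched as an array of per-thread slots indexed by the thread id passed to the analysis routine. The code below is an illustration, not Pin code: `ThreadCounters`, `PerThread`, and `kMaxThreads` are hypothetical names, thread ids are assumed to be small integers starting at 0, and the 64-byte alignment is an assumed cache-line size used to keep slots from sharing a line.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical limit; plays the role of the value given to PIN_SetMaxThreads.
constexpr int kMaxThreads = 8;

// One slot per thread, aligned so that slots do not share a cache line
// (64 bytes is an assumption, not a documented Itanium figure).
struct alignas(64) PerThread {
    std::uint64_t icount = 0;
};

struct ThreadCounters {
    std::array<PerThread, kMaxThreads> slots{};

    // Analysis routine body: each thread only writes its own slot, so no
    // lock is needed for the increment.
    void docount(int threadId) {
        assert(threadId >= 0 && threadId < kMaxThreads);
        slots[static_cast<std::size_t>(threadId)].icount++;
    }

    // Summed once at the end, e.g. from a Fini-style callback.
    std::uint64_t Total() const {
        std::uint64_t sum = 0;
        for (const PerThread &s : slots) sum += s.icount;
        return sum;
    }
};
```

The design choice is to trade a little memory for lock-free analysis code: the only cross-thread read happens in `Total`, after the threads have finished.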

Debugging

• Instrumentation code
  – Use gdb
• Analysis code
  – Pin dynamically optimizes analysis code to reduce cost of instrumentation (up to 10x)
  – Disable optimization to use gdb
    • icount -p pin -O0 -- /bin/ls
  – Otherwise, use printf

Jitting

• Fetch

• Instrument

• Translate

• Stub

• Allocate registers

• Generate code

Fetch

[Figure: original code, blocks 1 to 7, and the code cache holding translated copies 1’, 2’, and 7’.]

Instrument

[Figure: code for three stages labeled Application, Bridge, and Instrumentation. Step labels: Save scratch; Setup call frame; Setup arguments; Call; Undo call frame; Restore scratch; return. Code shown in the figure:]

    ld8 r36=[r14]
    mov r32=r1
    nop.m 0x0
    mov r38 = r2
    br.call b0=print

    alloc r34=ar.pfs,6,4,0
    mov r35=r12
    adds r2=-16,r1
    …..
    br.ret.sptk.many b0;;

Translate

• Jitting does not change architectural behavior
  – Registers/memory
  – Timing is not architectural
• Change instructions to preserve old behavior

Translate

• New code is not at same address
    0x4000: mov b1 = ip
  Becomes:
    movl rscr1 = 0x4000; mov b1 = rscr1
• Branches always target code cache
    br.cond 0x4010
  Becomes:
    br.cond 0x5030
• New code is not at same address / branches always target code cache
    0x4000: br.call.many b0 = 0x4100
  Becomes:
    movl rscr1 = 0x4010; mov b0 = rscr1; br.call bscr1 = 0x5010
• Indirect targets resolved at runtime
    br.ret b0
  Becomes:
    movl rscr1 = 0x4010; mov rscr2 = b0; cmpne p1,p0=rscr1, rscr2; br.cond (p1)
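
Because registers such as b0 keep holding original addresses, an indirect branch in translated code needs some way to turn an original address into a code-cache address at run time. The slide shows only the comparison against one expected target; what happens when that comparison fails is not shown. A table lookup like the one below is one plausible fallback and is purely an assumption. `TranslationMap` is a hypothetical name, and the code-cache addresses it hands out are arbitrary stand-ins for invoking the JIT.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Toy model (not Pin's implementation): maps original code addresses to
// the addresses of their translated copies in the code cache.
class TranslationMap {
public:
    // Returns the code-cache address for an original address, "translating"
    // the target first if it has not been seen before.
    std::uint64_t Resolve(std::uint64_t original) {
        auto it = map_.find(original);
        if (it != map_.end()) return it->second;
        const std::uint64_t translated = nextFree_;  // stand-in for the JIT
        nextFree_ += 0x10;
        ++translations_;
        map_.emplace(original, translated);
        return translated;
    }

    int Translations() const { return translations_; }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> map_;
    std::uint64_t nextFree_ = 0x5000;  // arbitrary start of the toy code cache
    int translations_ = 0;
};
```

In this model the first return to original address 0x4010 triggers a translation, and every later return to the same address is resolved from the table without translating again.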

Stub

[Figure: original code, blocks 1 to 7, and the compiler's code cache holding translated copies 1’, 2’, 3’, 5’, 6’, and 7’ together with three stubs.]

Instrumentation/Hardware Counters

• Instrumentation is better when:
  – Exact profile needed
    • Code coverage must distinguish 0/1 execution
  – No hardware to measure event
    • Modeled cache does not match current hardware
    • Store followed by load to the same address
• But instrumentation may:
  – Be too slow compared to sampling/counting
  – Change the behavior of program by slowing it down
  – Not be able to observe microarchitectural events

More Reading on Instrumentation Systems

Static:
  Source:
    CIL: http://manju.cs.berkeley.edu/cil/
  Binary:
    Atom: static instrumentation for Alpha
      Papers: http://research.compaq.com/wrl/projects/om/wrlpapers.html
      Manual: http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51B_HTML/ARH9VDTE/TITLE.HTM
    Etch: static instrumentation of x86 Windows, http://citeseer.ist.psu.edu/romer97instrumentation.html
    EEL: machine independent executable editing, http://citeseer.ist.psu.edu/context/10359/0
    Vulcan: ftp://ftp.research.microsoft.com/pub/tr/tr-2001-50.pdf

Dynamic:
  JIT based:
    Pin: dynamic instrumentation of IPF Linux, http://systems.cs.colorado.edu/Pin/
    DynamoRIO: x86 Linux and Windows, http://www.cag.lcs.mit.edu/rio/
    Diota: dynamic instrumentation of x86 Linux, http://www.elis.rug.ac.be/~ronsse/diota/
  Editing based:
    Dyninst: dynamic code generation API for multiple architectures/OS, http://www.dyninst.org/
    Caliper: Itanium HP-UX, www.hp.com/go/caliper
    Vulcan: ftp://ftp.research.microsoft.com/pub/tr/tr-2001-50.pdf