Process Virtualization and Symbiotic Optimization Kim Hazelwood ACACES Summer School July 2009.


Process Virtualization and Symbiotic Optimization

Kim Hazelwood

ACACES Summer School

July 2009


ACACES 2009 – Process Virtualization

About Your Instructor

Currently
– Assistant Professor at University of Virginia
– Faculty Consultant at Intel

Previously
– PostDoc at Intel (2004-2005)
– PhD from Harvard (2004)
– Four summer internships (HP & IBM)
– Worked with Dynamo, Jikes RVM, …

Other Interests
– Marathons (Boston, NYC, Disney)
– Reality TV Shows
– Family (8 month old at home!)


About the Course

• Day 1 – What is Process Virtualization?

• Day 2 – Building Process Virtualization Systems

• Day 3 – Using Process Virtualization Systems

• Day 4 – Symbiotic Optimization

• We’ll use Pin as a case study: www.pintool.org

• You’ll have homework!



What is Process Virtualization?

System virtualization – allows multiple OSes to share the same hardware

Process virtualization – runs as a normal application (on top of an OS) and supports a single process

[Figure: two software stacks side by side. System Virtualization: HW, VMM, OS1 and OS2, App1 and App2. Process Virtualization: HW, OS, then DBT hosting App1 and DBI hosting App2.]

Classifying Virtualization

Dynamic binary optimization (x86 → x86--)
• Complement the static compiler
  – User inputs, phases, DLLs, hardware features
  – Examples: DynamoRIO, Mojo, Strata

Dynamic translation (x86 → PPC)
• Convert applications to run on a new architecture
  – Examples: Rosetta, Transmeta CMS, DAISY

Dynamic instrumentation (x86 → x86++)
• Inspect/add features to existing applications
  – Examples: Pin, Valgrind

A Simple Example of Instrumentation

Inserting extra code into a program to collect runtime information

    counter++;
    sub $0xff, %edx
    counter++;
    cmp %esi, %edx
    counter++;
    jle <L1>
    counter++;
    mov $0x1, %edi
    counter++;
    add $0x10, %eax

Instruction Count Output

    $ /bin/ls
    Makefile imageload.out itrace proccount imageload inscount atrace itrace.out

    $ pin -t inscount.so -- /bin/ls
    Makefile imageload.out itrace proccount imageload inscount atrace itrace.out
    Count 422838

A Simple Example of Optimization

On Pentium 3, inc is faster than add:

    sub $0xff, %edx
    cmp %esi, %edx
    jle <L1>
    mov $0x1, %edi
    inc %eax

On Pentium 4, add is faster than inc:

    sub $0xff, %edx
    cmp %esi, %edx
    jle <L1>
    mov $0x1, %edi
    add $0x1, %eax
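The processor-specific choice above can be sketched as a tiny rewriting pass over textual instructions. This is only an illustration: the `Cpu` enum and the functions `increment_for` and `retarget` are hypothetical names, not part of any real optimizer, and the sketch ignores the fact that inc and add differ in how they update the carry flag, which a real optimizer would have to check before substituting one for the other.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical target list; the slide only distinguishes these two processors.
enum class Cpu { Pentium3, Pentium4 };

// Pick the increment form the slide says is faster on the given processor.
std::string increment_for(Cpu cpu, const std::string& reg) {
    return cpu == Cpu::Pentium3 ? "inc " + reg : "add $0x1, " + reg;
}

// Rewrite every "inc R" or "add $0x1, R" into the preferred form for `cpu`.
// All other instructions are copied unchanged.
std::vector<std::string> retarget(const std::vector<std::string>& code, Cpu cpu) {
    const std::string inc = "inc ";
    const std::string add1 = "add $0x1, ";
    std::vector<std::string> out;
    for (const std::string& ins : code) {
        if (ins.compare(0, inc.size(), inc) == 0) {
            out.push_back(increment_for(cpu, ins.substr(inc.size())));
        } else if (ins.compare(0, add1.size(), add1) == 0) {
            out.push_back(increment_for(cpu, ins.substr(add1.size())));
        } else {
            out.push_back(ins);
        }
    }
    return out;
}
```

Calling `retarget` on the first listing with `Cpu::Pentium4` yields the second listing, and vice versa with `Cpu::Pentium3`. Because the decision depends on the machine the program actually runs on, it is a natural fit for a dynamic optimizer rather than a static compiler.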


Research Applications

Computer Architecture

• Trace Generation

• Fault Tolerance Studies

• Emulating New Instructions

Program Analysis

• Code coverage

• Call-graph generation

• Memory-leak detection

• Instruction profiling

Multicore

• Thread analysis
  – Thread profiling
  – Race detection

• Cache simulations

Compilers

• Compare programs from competing compilers

Security

• Add security checks and features


Approaches

• Source modification:
  – Modify source programs

• Binary modification:
  – Modify executables directly

Advantages for binary modification
• Language independent
• Machine-level view
• Modify legacy/proprietary software

Static vs Dynamic Approaches

Dynamic approaches are more robust
• No need to recompile or relink
• Discover code at runtime
• Handle dynamically-generated code
• Attach to running processes

The Code Discovery Problem on x86

[Figure: a code region laid out as Instr 1, Instr 2, Instr 3, JumpReg, DATA, Instr 5, Instr 6, Uncond Branch, PADDING, Instr 8. Annotations: indirect jump to ??; data interspersed with code; pad for alignment.]

Dynamic Modification: Approaches

JIT Mode
• Create a modified copy of the application on-the-fly
• Original code never executes
More flexible, more common approach

Probe Mode
• Modifies the original application instructions
• Inserts jumps to modified code (trampolines)
Lower overhead (less flexible) approach

JIT-Mode Binary Modification

Generate and cache modified copies of instructions

Modified (cached) instructions are executed in lieu of original instructions

[Figure: JIT-mode system with the components EXE, Transform, Code Cache, Execute, and Profile.]
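The generate-and-cache idea can be sketched as a small dispatch loop. This is a toy model, not how Pin or any real system is implemented: blocks are modeled as callables that return the next address, and `ToyJit`, `transform`, and `run_loop_demo` are hypothetical names. The point it illustrates is that each block is transformed once, on first use, and every later execution runs the cached copy.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <unordered_map>

using Addr = std::uint32_t;
using Block = std::function<Addr()>;  // runs a block, returns next address (0 = exit)

struct ToyJit {
    const std::map<Addr, Block>& original;            // the "EXE": never run directly
    std::unordered_map<Addr, Block> code_cache;       // modified copies, keyed by original address
    std::unordered_map<Addr, std::uint64_t> profile;  // executions per block
    std::uint64_t translations = 0;

    explicit ToyJit(const std::map<Addr, Block>& program) : original(program) {}

    // "Transform": build a modified copy that profiles itself, then does the original work.
    Block transform(Addr pc) {
        ++translations;
        Block body = original.at(pc);
        return [this, pc, body]() { ++profile[pc]; return body(); };
    }

    // Dispatch loop: look up the cache, transform on a miss, execute the cached copy.
    void run(Addr entry) {
        for (Addr pc = entry; pc != 0;) {
            auto it = code_cache.find(pc);
            if (it == code_cache.end()) {
                it = code_cache.emplace(pc, transform(pc)).first;
            }
            pc = it->second();
        }
    }
};

struct Stats { std::uint64_t translations; std::uint64_t loop_executions; };

// Demo program: block 0x10 loops `iterations` times (iterations >= 1), then block 0x20 exits.
Stats run_loop_demo(int iterations) {
    int remaining = iterations;
    std::map<Addr, Block> program;
    program[0x10] = [&remaining]() -> Addr { return --remaining > 0 ? 0x10 : 0x20; };
    program[0x20] = []() -> Addr { return 0; };
    ToyJit jit(program);
    jit.run(0x10);
    return {jit.translations, jit.profile[0x10]};
}
```

With `run_loop_demo(5)`, the loop block executes five times but only two transformations happen in total (one per block), which is why the cost of transformation is amortized over hot code.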

JIT-Mode Instrumentation

Fetch trace starting at block 1 and start instrumentation

[Figure: original code (blocks 1 to 7) on the left; Pin's code cache on the right, holding the trace 1’, 2’, 7’. Exits point back to the VMM.]

JIT-Mode Instrumentation

Transfer control into code cache (block 1)

[Figure: original code (blocks 1 to 7) on the left; Pin's code cache on the right, holding the trace 1’, 2’, 7’.]

JIT-Mode Instrumentation

Fetch and instrument a new trace

[Figure: original code (blocks 1 to 7) on the left; Pin's code cache on the right, holding the first trace 1’, 2’, 7’ and a new trace 3’, 5’, 6’, connected by trace linking.]

Instrumentation Approaches

JIT Mode
• Create a modified copy of the application on-the-fly
• Original code never executes
More flexible, more common approach

Probe Mode
• Modify the original application instructions
• Insert jumps to instrumentation code (trampolines)
Lower overhead (less flexible) approach

A Sample Probe

• A probe is a jump instruction that overwrites original instruction(s) in the application
  – Copy/translate original bytes so probed functions can be called

Original function entry point:
    0x400113d4: push %ebp
    0x400113d5: mov %esp,%ebp
    0x400113d7: push %edi
    0x400113d8: push %esi
    0x400113d9: push %ebx

Entry point overwritten with probe:
    0x400113d4: jmp 0x41481064
    0x400113d9: push %ebx

Copy of entry point w/ original bytes:
    0x50000004: push %ebp
    0x50000005: mov %esp,%ebp
    0x50000007: push %edi
    0x50000008: push %esi
    0x50000009: jmp 0x400113d9
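The byte-level mechanics of the probe shown above can be sketched as follows. This is an illustration under stated assumptions, not Pin's implementation: it assumes a 5-byte `jmp rel32` (opcode 0xE9), which is what the addresses on the slide imply, and it assumes the displaced bytes form whole, position-independent instructions so they can be copied verbatim. The names `encode_jmp_rel32` and `install_probe` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kJmpLen = 5;  // opcode 0xE9 plus a 32-bit displacement

// Encode `jmp rel32` located at address `from` so that it lands on `to`.
// The displacement is relative to the address of the next instruction.
std::array<std::uint8_t, kJmpLen> encode_jmp_rel32(std::uint32_t from, std::uint32_t to) {
    const std::uint32_t rel = to - (from + kJmpLen);  // wraps modulo 2^32
    std::array<std::uint8_t, kJmpLen> bytes{};
    bytes[0] = 0xE9;
    for (std::size_t i = 0; i < 4; ++i) {
        bytes[1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));  // little-endian
    }
    return bytes;
}

// Overwrite the first kJmpLen bytes of a function with a jump to `replacement`.
// Returns the trampoline contents: the displaced original bytes followed by a
// jump back to the first untouched instruction of the original function.
std::vector<std::uint8_t> install_probe(std::vector<std::uint8_t>& function_bytes,
                                        std::uint32_t entry,
                                        std::uint32_t replacement,
                                        std::uint32_t trampoline) {
    assert(function_bytes.size() >= kJmpLen);
    std::vector<std::uint8_t> tramp(function_bytes.begin(),
                                    function_bytes.begin() + kJmpLen);
    const auto back = encode_jmp_rel32(trampoline + kJmpLen, entry + kJmpLen);
    tramp.insert(tramp.end(), back.begin(), back.end());

    const auto probe = encode_jmp_rel32(entry, replacement);
    std::copy(probe.begin(), probe.end(), function_bytes.begin());
    return tramp;
}
```

Using the slide's addresses (entry 0x400113d4, replacement 0x41481064, trampoline 0x50000004), the probe's displacement is 0x41481064 minus 0x400113d9, and the trampoline ends with a jump back to 0x400113d9, the `push %ebx` that the probe left intact.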


Probe Instrumentation

Advantages:

• Low overhead – few percent

• Less intrusive – execute original code

Disadvantages:

• More tool writer responsibility

• Restrictions on where to modify (routine-level)

Probe Tool Writer Responsibilities

No control flow into the instruction space where probe is placed
• 6 bytes on IA32, 7 bytes on Intel64, bundle on IA64
• Branch into “replaced” instructions will fail
• Probes at function entry point only

Thread safety for insertion/deletion of probes
• During image load callback is safe
• Only loading thread has a handle to the image

Replacement function has same behavior as original

Probe vs. JIT Summary

                      Probes                                JIT
Overhead              Few percent                           50% or higher
Intrusive             Low                                   High
Granularity           Function boundary                     Instruction
Safety & Isolation    More responsibility for tool writer   High


Process Virtualization Systems

Readily Available

• DynamoRIO

• Valgrind

• Pin

Available By Request

• Strata

• Adore

Unavailable

• Transmeta CMS

• Dynamo


DynamoRIO

Valgrind

Pin

Intel Pin

Dynamic Instrumentation:
• Do not need source code, recompilation, post-linking

Programmable Instrumentation:
• Provides rich APIs to write your own instrumentation tools (called Pintools) in C/C++

Multiplatform:
• Supports x86, x86-64, Itanium, Xscale
• Supports Linux, Windows, MacOS

Robust:
• Instruments real-life applications: Database, web browsers, …
• Instruments multithreaded applications
• Supports signals

Efficient:
• Applies compiler optimizations on instrumentation code

Using Pin

Launch and instrument an application:
    $ pin -t pintool.so -- application

• pin: instrumentation engine (provided in the kit)
• pintool.so: instrumentation tool (write your own, or use one provided in the kit)

Attach to and instrument an application:
    $ pin -t pintool.so -pid 1234

Pin Instrumentation APIs

Basic APIs are architecture independent:
• Provide common functionalities like determining:
  – Control-flow changes
  – Memory accesses

Architecture-specific APIs
• e.g., Info about opcodes and operands

Call-based APIs:
• Instrumentation routines
• Analysis routines

Instrumentation vs. Analysis

Concepts borrowed from the ATOM tool:

Instrumentation routines define where instrumentation is inserted
• e.g., before instruction
• Occurs first time an instruction is executed

Analysis routines define what to do when instrumentation is activated
• e.g., increment counter
• Occurs every time an instruction is executed
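The difference in how often the two kinds of routine run can be shown with a toy model that does not use the Pin API at all. Here an "engine" walks a list of executed instruction addresses: the instrumentation routine is invoked only the first time an address is seen and returns the analysis routine to attach, while the analysis routine runs on every execution. `CallCounts` and `run_toy_engine` are hypothetical names introduced only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

struct CallCounts {
    std::size_t instrumentation_calls = 0;  // once per distinct instruction
    std::size_t analysis_calls = 0;         // once per executed instruction
};

// Simulate running a program given the sequence of instruction addresses it executes.
CallCounts run_toy_engine(const std::vector<int>& executed_addresses) {
    CallCounts counts;
    using Analysis = std::function<void()>;

    // Instrumentation routine: decides what to insert; here, a counter increment.
    auto instrument = [&counts](int /*address*/) -> Analysis {
        ++counts.instrumentation_calls;
        return [&counts]() { ++counts.analysis_calls; };  // analysis routine
    };

    std::unordered_map<int, Analysis> instrumented;  // stands in for the code cache
    for (int addr : executed_addresses) {
        auto it = instrumented.find(addr);
        if (it == instrumented.end()) {
            it = instrumented.emplace(addr, instrument(addr)).first;
        }
        it->second();  // the inserted analysis call runs on every execution
    }
    return counts;
}
```

For a three-instruction loop executed three times followed by one exit instruction (addresses 0, 1, 2, 0, 1, 2, 0, 1, 2, 3), the instrumentation routine runs 4 times and the analysis routine runs 10 times. This is why analysis routines dominate overhead and are the ones worth keeping short.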

Pintool 1: Instruction Count

    counter++;
    sub $0xff, %edx
    counter++;
    cmp %esi, %edx
    counter++;
    jle <L1>
    counter++;
    mov $0x1, %edi
    counter++;
    add $0x10, %eax

Pintool 1: Instruction Count Output

    $ /bin/ls
    Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out

    $ pin -t inscount0.so -- /bin/ls
    Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out
    Count 422838

ManualExamples/inscount0.cpp

    #include <iostream>
    #include "pin.H"

    UINT64 icount = 0;

    // Analysis routine: runs every time an instruction executes.
    void docount() { icount++; }

    // Instrumentation routine: runs when Pin first encounters an instruction
    // and inserts a call to docount() before it.
    void Instruction(INS ins, void *v) {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
    }

    // Called when the application exits.
    void Fini(INT32 code, void *v) {
        std::cerr << "Count " << icount << std::endl;
    }

    int main(int argc, char *argv[]) {
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }

Pintool 2: Instruction Trace

    Print(ip);
    sub $0xff, %edx
    Print(ip);
    cmp %esi, %edx
    Print(ip);
    jle <L1>
    Print(ip);
    mov $0x1, %edi
    Print(ip);
    add $0x10, %eax

Need to pass the ip argument to the analysis routine (printip())

Pintool 2: Instruction Trace Output

    $ pin -t itrace.so -- /bin/ls
    Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out

    $ head -4 itrace.out
    0x40001e90
    0x40001e91
    0x40001ee4
    0x40001ee5

ManualExamples/itrace.cpp

    #include <stdio.h>
    #include "pin.H"

    FILE *trace;

    // Analysis routine: receives the instruction pointer as its argument.
    void printip(void *ip) { fprintf(trace, "%p\n", ip); }

    // Instrumentation routine: IARG_INST_PTR is the argument passed to the
    // analysis routine.
    void Instruction(INS ins, void *v) {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip, IARG_INST_PTR, IARG_END);
    }

    void Fini(INT32 code, void *v) { fclose(trace); }

    int main(int argc, char *argv[]) {
        trace = fopen("itrace.out", "w");
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }

Examples of Arguments to Analysis Routine

IARG_INST_PTR
– Instruction pointer (program counter) value

IARG_UINT32 <value>
– An integer value

IARG_REG_VALUE <register name>
– Value of the register specified

IARG_BRANCH_TARGET_ADDR
– Target address of the branch instrumented

IARG_MEMORY_READ_EA
– Effective address of a memory read

And many more … (refer to the manual for details)

Instrumentation Points

Instrument points relative to an instruction:
• Before: IPOINT_BEFORE
• After:
  – Fall-through edge: IPOINT_AFTER
  – Taken edge: IPOINT_TAKEN_BRANCH

          cmp %esi, %edx
          jle <L1>
          mov $0x1, %edi
    <L1>: mov $0x8,%edi

[Figure: three count() calls mark the instrumentation points around the jle <L1> instruction.]

Instrumentation Granularity

Instrumentation can be done at three different granularities:
• Instruction
• Basic block
  – A sequence of instructions terminated at a control-flow changing instruction
  – Single entry, single exit
• Trace
  – A sequence of basic blocks terminated at an unconditional control-flow changing instruction
  – Single entry, multiple exits

    sub $0xff, %edx
    cmp %esi, %edx
    jle <L1>

    mov $0x1, %edi
    add $0x10, %eax
    jmp <L2>

1 Trace, 2 BBs, 6 insts

Pintool 3: Faster Instruction Count

    counter += 3
    sub $0xff, %edx
    cmp %esi, %edx
    jle <L1>

    counter += 2
    mov $0x1, %edi
    add $0x10, %eax

basic blocks (bbl)
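Why per-block counting is cheaper can be checked with a small model that is independent of Pin. The sketch below splits a straight-line instruction list into basic blocks at control-flow instructions and compares the number of analysis calls needed by the two schemes. The `Ins` struct, its `ends_block` flag, and the function names are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Ins {
    std::string text;
    bool ends_block;  // true for control-flow changing instructions
};

// Sizes of the basic blocks in `code`, in program order.
std::vector<std::size_t> block_sizes(const std::vector<Ins>& code) {
    std::vector<std::size_t> sizes;
    std::size_t current = 0;
    for (const Ins& ins : code) {
        ++current;
        if (ins.ends_block) {
            sizes.push_back(current);
            current = 0;
        }
    }
    if (current > 0) sizes.push_back(current);  // trailing block without a branch
    return sizes;
}

struct CountResult {
    std::size_t count;           // instructions counted
    std::size_t analysis_calls;  // calls made to the counting routine
};

// One analysis call per instruction (counter++ before each one).
CountResult count_per_instruction(const std::vector<Ins>& code) {
    return {code.size(), code.size()};
}

// One analysis call per basic block (counter += block size).
CountResult count_per_block(const std::vector<Ins>& code) {
    CountResult result{0, 0};
    for (std::size_t size : block_sizes(code)) {
        result.count += size;
        ++result.analysis_calls;
    }
    return result;
}

// The five instructions shown on the slide, executed once in order.
std::vector<Ins> slide_example() {
    return {{"sub $0xff, %edx", false},
            {"cmp %esi, %edx", false},
            {"jle <L1>", true},
            {"mov $0x1, %edi", false},
            {"add $0x10, %eax", false}};
}
```

For the slide's five instructions, both schemes report a count of 5, but per-block counting makes 2 analysis calls (adding 3 and then 2) instead of 5.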

ManualExamples/inscount1.cpp

    #include <stdio.h>
    #include "pin.H"

    UINT64 icount = 0;

    // Analysis routine: adds the number of instructions in the basic block.
    void docount(UINT32 c) { icount += c; }

    // Instrumentation routine: inserts one call per basic block of the trace.
    void Trace(TRACE trace, void *v) {
        for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
            BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount,
                           IARG_UINT32, BBL_NumIns(bbl), IARG_END);
        }
    }

    void Fini(INT32 code, void *v) {
        fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
    }

    int main(int argc, char *argv[]) {
        PIN_Init(argc, argv);
        TRACE_AddInstrumentFunction(Trace, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }


What Did We Learn Today?

• Overview of Process Virtualization

• Approaches
  – Source vs. Binary
  – Static vs. Dynamic
  – JIT vs. Probes

• Three Available Systems

• Three Simple Examples



Want More Info?

• Read Jim Smith’s book: Virtual Machines

• Download one (or more) of them!

Pin www.pintool.org

DynamoRIO code.google.com/p/dynamorio

Valgrind www.valgrind.org

Day 1 – What is Process Virtualization?
Day 2 – Building Process Virtualization Systems
Day 3 – Using Process Virtualization Systems
Day 4 – Symbiotic Optimization