LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability:...

23
LLDSAL A L ow-L evel D omain-S pecific A spect L anguage for Dynamic Code-Generation and Program Modification Mathias Payer, Boris Bluntschli, & Thomas R. Gross Department of Computer Science ETH Zürich, Switzerland

Transcript of LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability:...

Page 1: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

LLDSALA Low-Level Domain-Specific Aspect Language

for Dynamic Code-Generation and Program Modification

Mathias Payer, Boris Bluntschli, & Thomas R. GrossDepartment of Computer Science

ETH Zürich, Switzerland

Page 2: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 2

1

2

34

Motivation: program instrumentation

LLDSAL enables runtime code generation in a Dynamic Binary Translator (DBT)

Page 3: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 3

Motivation: program instrumentation

LLDSAL enables runtime code generation in a Dynamic Binary Translator (DBT)

● External aspects extend program functionality

● Internal aspects to implement the instrumentation framework

1

2

34

1'

2'

3' 4'

DynamicBinary

Translation(DBT)

Page 4: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 4

Problem: code generation in DBT

DBT needs aspects that bridge between (translated) application and DBT world

● No calling conventions, must store everything

● Dynamic environment, no static addresses or locations

● Code must be fast (JIT-able)

char *code = ...;

BEGIN_ASM(code) addl $5, %eax movl %eax, %edx incl {&my_var}END_ASM

Page 5: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 5

Solution: LLDSAL

Low-level Domain Specific Aspect/Assembly Language● Aspects have access to high-level language constructs

● Aspects adhere to low-level conventions

DBT and LLDSAL enable AOP without any hooks● JIT binary rewriting adds aspects on the fly

LLDSAL status: implemented and in use● LLDSAL used for internal aspects of a BT (fastBT)

● LLDSAL guarantees security properties (libdetox security framework)

Page 6: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 6

Outline

Motivation

Background: Binary Translation (BT)

Language design

Implementation

Related work

Conclusion

Page 7: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 7

Binary Translation in a nutshell

● Translates individual basic blocks● Checks branch targets and origins● Weaves aspects into translated code

1 1'2 2'3 3'… ...

Original code Code cacheMapping table

Translator

1

2

43

1'

2'

3'

R RX

Indirect control flow transfers use a dynamic check to verify target and origin

Page 8: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 8

Outline

Motivation

Binary Translation (BT)

Language design● Dynamic assembly language

● Data (variable) access

● Example: Dynamic code generation

Implementation

Related work

Conclusion

Page 9: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 9

Language design

Usability: low-level / high-level trade-off● Mix assembly code plus access to high-level language constructs

Integration into host language● DSL integrates naturally into the host language

No runtime dependencies● Source-to-source translation (LLDSAL to C code)

LLDSAL defines a dynamic assembly language● Enables dynamic low-level code generation at runtime

Page 10: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 10

Dynamic assembly language

LLDSAL combines assembly code with access to high-level data structures

● Expressiveness and syntax comparable to inline assembler

● JIT code generation at runtime, optimization for data-accesses

● Parameters encoded (inlined) into instructions

char *code = ...;

BEGIN_ASM(code) addl $5, %eax movl %eax, %edx incl {&my_var}END_ASM

Pointer to code

Variable access

Assembly block

Page 11: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 11

Comparison LLDSAL vs. inline asm

Code generation● Inline asm executes code inline

● LLDSAL generates code inline

Access to dynamic or thread local data● Inline asm uses indirect memory references (pointer chasing)

● LLDSAL embeds direct pointers in generated code

asm ("incl %0\n" : "=a"(myvar) : "0"(myvar));

char *code = ...;

BEGIN_ASM(code) addl $5, %eax movl %eax, %edx incl {&my_var}END_ASM

Page 12: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 12

Data (variable) access

JIT-compiled code enables new data access patterns● LLDSAL enables variable access in host space using {variable}

Variable addresses directly encoded in emitted code● No parameters are passed

● No indirection or pointer chasing

// inside indirect_call actionBEGIN_ASM(code) incl {&tld->stat->nr_ind_calls}END_ASM

Page 13: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 13

Dynamic code generation

long result = 5;char *target = ...;void_func f = (void_func)target; { BEGIN_ASM(target) pushl ${result} call_abs {my_func} movl %eax, {&result} addl $4, %esp ret END_ASM}

f(); // result == 25

typedef void (*void_func)();long my_func(long a) { return a * a; }

pushes $5 to the stack

my_func(5)

result = my_func(5)

Clean-up and returnExecute dynamic code

Page 14: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 14

Outline

Motivation

Binary Translation (BT)

Language design

Implementation

Related work

Conclusion

Page 15: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 15

LLDSAL implementationSource file

*.dslGNU C

preprocessor

LLDSALProcessing

GNUassembler

objdump

C output*.c

GNU Ccompiler

Compiled object*.o

Translate LLDSALto C code

Page 16: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 16

LLDSAL alternatives

Macro-based approach

● No additional compilation pass needed

● Error prone, manual encoding

JIT code generation (GNU lightning, asmjit)● Very flexible, dynamic register allocation

● High overhead, library dependencies

#define PUSHL_IMM32(dst, imm) \\*dst++=0x68; *((int_32_t*)dst)=imm; dst+=4

...PUSHL_IMM32(code, 0xdeadbeef);

Page 17: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 17

Outline

Motivation

Binary Translation (BT)

Language design

Implementation

Related work

Conclusion

Page 18: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 18

Related work

Compile-time DSL parsing [Porkolab et al., GPCE'10]● LLDSAL first dynamic low-level DSAL for BT

Guyer and Lin describe an approach to optimize libraries for different environments [DSL'99]

● Annotation based, LLDSAL uses assembly code with high-level data access

Khepora is an approach to s2s DSLs [Faith et al., DSL'97]● Full DSL parsing using syntax trees, too heavy-weight for LLDSAL

Page 19: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 19

ConclusionLLDSAL enables dynamic code generation for DBTs

● Direct access to host variables and data structures

● Low-overhead (no arguments passed, low-level encoding)

● No library dependencies

LLDSAL raises level of interaction between developer and BT framework

● Increased readability of code

● Better maintainability due to automatic translation

Page 20: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 20

Thank you for your attention

?

Page 21: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 21

Data (variable) access

Use the address of the variable ${&foo}● Instruction stores current address as immediate

Encode the (static) value of the variable ${foo}● Instruction stores current value as immediate

Use dynamic value of variable {&foo}● Instruction stores address of variable and encodes memory

dereference

Use dynamic value of the address of the variable {foo}● Instruction stores value as immediate and encodes memory

dereference

Page 22: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 22

Data (variable) access

pushl ${tld}● Push current value of tld onto stack

movl {tld->stack-1}, %esp● Read value from *(tld->stack-1) and store it in %esp

movl ${tld->stack-1}, %esp● Store address of (tls->stack-1) in %esp

movl %eax, {&tld->saved_eax}● Store %eax at &tld->saved_eax

Page 23: LLDSAL - nebelwelt.net · 2012-03-27 Mathias Payer, ETH Zürich 9 Language design Usability: low-level / high-level trade-off Mix assembly code plus access to high-level language

2012-03-27 Mathias Payer, ETH Zürich 23

Example (indirect lookup, inside BT)BEGIN_ASM(transl_instr) pushfl pushl %ebx pushl %ecx movl 12(%esp), %ebx // Load target address movl %ebx, %ecx // Duplicate RIP

/* Load hashline (eip element) */ andl ${MAPPING_PATTERN >> 3}, %ebx; cmpl {tld->mappingtable}(, %ebx, 8), %ecx; jne nohit

hit: // Load target movl {tld->mappingtable+4}(, %ebx, 8), %ebx

movl %ebx, {&tld->ind_target} popl %ecx popl %ebx popfl leal 4(%esp), %esp jmp *{&tld->ind_target}

nohit: // recover mode - there was no hit! ...END_ASM