Compiler optimizations based on call-graph flattening
![Page 1: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/1.jpg)
Compiler optimizations based on call-graph flattening
Carlo Alberto Ferraris
professor Silvano Rivoira
Master of Science in Telecommunication Engineering
Third School of Engineering: Information Technology
Politecnico di Torino
July 6th, 2011
![Page 2: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/2.jpg)
Increasing complexities
Everyday objects are becoming multi-purpose, networked, interoperable, customizable, reusable, upgradeable
![Page 3: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/3.jpg)
Increasing complexities
Everyday objects are becoming more and more complex
![Page 4: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/4.jpg)
Increasing complexities
Software that runs smart objects is becoming more and more complex
![Page 5: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/5.jpg)
Diminishing resources
Systems have to be resource-efficient
Resources come in many different, non-orthogonal flavours
Power: especially valuable in battery-powered scenarios such as mobile, sensor, 3rd world applications
Density: critical factor in data-center and product design
Computational: CPU, RAM, storage, etc. are often growing slower than the potential applications
Development: development time and costs should be as low as possible for low TTM and profitability
![Page 12: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/12.jpg)
Do more with less
![Page 14: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/14.jpg)
Abstractions
We need to modularize and hide the complexity
Operating systems, frameworks, libraries, managed languages, virtual machines, …
All of this comes with a cost: generic solutions are generally less efficient than ad-hoc ones
![Page 15: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/15.jpg)
Abstractions
We need to modularize and hide the complexity
Palm webOS: user interface running on HTML+CSS+JavaScript
![Page 16: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/16.jpg)
Abstractions
We need to modularize and hide the complexity
JavaScript PC emulator: running Linux inside a browser
![Page 18: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/18.jpg)
Optimizations
We need to modularize and hide the complexity without sacrificing performance
Compiler optimizations trade compilation time for development and execution time
![Page 19: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/19.jpg)
Vestigial abstractions
The natural subdivision of code into functions is maintained in the compiler and all the way down to the processor
Each function is self-contained, with strict conventions regulating how it relates to other functions
![Page 20: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/20.jpg)
Vestigial abstractions
Processors don’t care about functions; respecting the conventions is just additional work
Push the contents of the registers and return address on the stack, jump to the callee; execute the callee, jump to the return address; restore the registers from the stack
![Page 21: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/21.jpg)
Vestigial abstractions
Many optimizations are simply not feasible when functions are present

    int replace(int* ptr, int value) {
        int tmp = *ptr;
        *ptr = value;
        return tmp;
    }

    int A(int* ptr, int value) {
        return replace(ptr, value);
    }

    int B(int* ptr, int value) {
        replace(ptr, value);
        return value;
    }

    void *malloc(size_t size) {
        void *ret;
        // [various checks]
        ret = imalloc(size);
        if (ret == NULL)
            errno = ENOMEM;
        return ret;
    }

    // ...
    type *ptr = malloc(size);
    if (ptr == NULL)
        return NOT_ENOUGH_MEMORY;
    // ...
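As a rough illustration of the `replace`/`B` example (not taken from the slides), this is what `B` could look like if the compiler were free to look through the function boundary: `B` ignores the value returned by `replace`, so the load of the old value is dead and only the store needs to remain. The names `B_merged` and `check_B_equivalence` are hypothetical.

```c
#include <assert.h>

/* Helper from the slide: stores a new value, returns the old one. */
int replace(int *ptr, int value) {
    int tmp = *ptr;
    *ptr = value;
    return tmp;
}

/* B as written on the slide: the result of replace() is discarded. */
int B(int *ptr, int value) {
    replace(ptr, value);
    return value;
}

/* Hypothetical result once the call boundary is gone: the load of *ptr
 * into tmp is never used, so only the store is kept. */
int B_merged(int *ptr, int value) {
    *ptr = value;
    return value;
}

/* Both versions return the same value and leave memory in the same state. */
void check_B_equivalence(void) {
    int x = 1, y = 1;
    assert(B(&x, 7) == B_merged(&y, 7));
    assert(x == y);
}
```

Calling `check_B_equivalence()` from a test harness confirms that the two versions agree on this input.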
![Page 22: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/22.jpg)
Vestigial abstractions
Many optimizations are simply not feasible when functions are present

    interpreter_setup();
    while (opcode = get_next_instruction())
        interpreter_step(opcode);
    interpreter_shutdown();

    function interpreter_step(opcode) {
        switch (opcode) {
            case opcode_instruction_A:
                execute_instruction_A();
                break;
            case opcode_instruction_B:
                execute_instruction_B();
                break;
            // ...
            default:
                abort("illegal opcode!");
        }
    }
![Page 23: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/23.jpg)
Vestigial abstractions
Many optimization efforts are directed at working around the overhead caused by functions
Inlining clones the body of the callee into the caller; optimal solution w.r.t. calling overhead but causes code size increase and cache pollution; useful only on small, hot functions
![Page 24: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/24.jpg)
Call-graph flattening
![Page 26: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/26.jpg)
Call-graph flattening
What if we dismiss functions during early compilation and track the control flow explicitly instead?
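To make the idea concrete, here is a minimal hand-written sketch in C of what "tracking the control flow explicitly" can mean. It is not the output of the pilot implementation: the source functions `inc` and `twice`, the block names and the `switch`-based dispatch are all assumptions chosen for readability. The body of `inc` exists once, both callsites jump to it, and the "return address" is an ordinary variable.

```c
#include <assert.h>

/* Hypothetical source program being flattened:
 *   int inc(int n)   { return n + 1; }
 *   int twice(int n) { n = inc(n); n = inc(n); return n; }
 */

/* Every basic block of the flattened program gets an identifier. */
enum block { INC_BODY, AFTER_CALL_1, AFTER_CALL_2, DONE };

/* Flattened twice(): one function, no calls, explicit return target. */
int twice_flattened(int n) {
    enum block next = INC_BODY;           /* first call: jump into inc   */
    enum block inc_return = AFTER_CALL_1; /* where inc has to jump back  */

    for (;;) {
        switch (next) {
        case INC_BODY:         /* body of inc, shared by both callsites */
            n = n + 1;
            next = inc_return; /* a "return" is just a jump */
            break;
        case AFTER_CALL_1:     /* second call: set return target, jump */
            inc_return = AFTER_CALL_2;
            next = INC_BODY;
            break;
        case AFTER_CALL_2:
            next = DONE;
            break;
        case DONE:
            return n;
        }
    }
}

/* Small self-check against the behaviour of the source program. */
void demo_twice(void) {
    assert(twice_flattened(0) == 2);
    assert(twice_flattened(-2) == 0);
}
```

A real compiler would use direct and indirect jumps rather than a `switch` in a loop; the loop is only a portable way of writing the same control flow in C.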
![Page 30: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/30.jpg)
Call-graph flattening
We get most of the benefits of inlining, including the ability to perform contextual code optimizations, without duplicating code and therefore without the code size issues
Where’s the catch?
![Page 31: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/31.jpg)
Call-graph flattening
The load on the compiler increases greatly, both directly due to CGF itself and indirectly due to subsequent optimizations
Worst-case complexity (number of edges) is quadratic w.r.t. the number of callsites being transformed (heuristics may help)
![Page 32: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/32.jpg)
Call-graph flattening
During CGF we need to statically keep track of all live values across all callsites in all functions
A value is live if it will be needed by subsequent instructions

    A = 5, B = 9, C = 0;
    // live: A, B
    C = sqrt(B);
    // live: A, C
    return A + C;
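The live sets in the example above can be computed with the usual backward pass: a variable is live before an instruction if the instruction reads it, or if it is live afterwards and the instruction does not overwrite it. The following is a minimal sketch for straight-line code only; the bitmask encoding and the names `struct instr`, `liveness` and `demo_liveness` are assumptions made for this illustration.

```c
#include <assert.h>

/* One bit per variable. */
enum { VAR_A = 1 << 0, VAR_B = 1 << 1, VAR_C = 1 << 2 };

/* An instruction is summarized by the variables it reads and writes. */
struct instr {
    unsigned uses;
    unsigned defs;
};

/* Backward pass over straight-line code:
 *   live_before[i] = uses[i] | (live_after[i] & ~defs[i])
 * live_before must have room for n entries; live_out is the set of
 * variables still needed after the last instruction. */
void liveness(const struct instr *code, int n, unsigned live_out,
              unsigned *live_before) {
    unsigned live = live_out;
    for (int i = n - 1; i >= 0; i--) {
        live = code[i].uses | (live & ~code[i].defs);
        live_before[i] = live;
    }
}

/* The three statements from the slide. */
void demo_liveness(void) {
    const struct instr code[] = {
        { 0,             VAR_A | VAR_B | VAR_C }, /* A = 5, B = 9, C = 0; */
        { VAR_B,         VAR_C },                 /* C = sqrt(B);         */
        { VAR_A | VAR_C, 0 },                     /* return A + C;        */
    };
    unsigned live_before[3];
    liveness(code, 3, 0, live_before);
    assert(live_before[0] == 0);               /* nothing live on entry */
    assert(live_before[1] == (VAR_A | VAR_B)); /* "live: A, B"          */
    assert(live_before[2] == (VAR_A | VAR_C)); /* "live: A, C"          */
}
```

Code with branches needs the same equations iterated to a fixed point over the control-flow graph, which this sketch does not attempt.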
![Page 33: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/33.jpg)
Call-graph flattening
Basically the compiler has to statically emulate ahead-of-time all the possible stack usages of the program
This has already been done on microcontrollers and resulted in a 23% decrease of stack usage (and 5% performance increase)
![Page 34: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/34.jpg)
Call-graph flattening
The indirect cause of increased compiler load comes from standard optimizations that are run after CGF
CGF does not create new branches (each call and return instruction is turned into a jump) but other optimizations can
Most optimizations are designed to operate on small functions with a limited number of branches
![Page 36: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/36.jpg)
Call-graph flattening
Many possible application scenarios besides inlining
Code motion: move instructions across function boundaries; avoid unneeded computations, alleviate register pressure, improve cache locality
Macro compression: find similar code sequences in different parts of the code and merge them; reduce code size and cache pollution
Nonlinear CF: CGF natively supports nonlinear control flows; almost-zero-cost EH and coroutines
Stackless execution: no runtime stack needed in fully-flattened programs
Stack protection: effective stack poisoning attacks are much harder or even impossible
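The coroutine case can be pictured with a small hand-written C sketch. Once control flow is explicit, resuming in the middle of a routine is just a jump to a stored block identifier, and the values that must survive are kept in a plain structure instead of a stack frame. This is only an analogy written by hand, not something produced by CGF; `struct counter`, `counter_init` and `counter_next` are hypothetical names.

```c
#include <assert.h>

/* State of a hypothetical generator that yields 0, 1, ..., limit-1.
 * Everything that would normally live in a stack frame is stored here. */
struct counter {
    int resume_point; /* which block to jump to on the next resume */
    int i;            /* live value kept across yields */
    int limit;
};

void counter_init(struct counter *c, int limit) {
    c->resume_point = 0;
    c->i = 0;
    c->limit = limit;
}

/* Resumes the generator. Returns 1 and stores the next value in *out,
 * or returns 0 once the sequence is exhausted. */
int counter_next(struct counter *c, int *out) {
    switch (c->resume_point) {
    case 0:                   /* entry block */
        c->i = 0;
        c->resume_point = 1;
        /* fall through */
    case 1:                   /* loop header, also the resume target */
        if (c->i < c->limit) {
            *out = c->i++;
            return 1;         /* "yield": leave and remember the state */
        }
        c->resume_point = 2;
        /* fall through */
    default:                  /* finished */
        return 0;
    }
}

/* Drains a generator with limit 3 and checks the yielded values. */
void demo_counter(void) {
    struct counter c;
    int value = -1, sum = 0, count = 0;
    counter_init(&c, 3);
    while (counter_next(&c, &value)) {
        sum += value;
        count++;
    }
    assert(count == 3);
    assert(sum == 0 + 1 + 2);
}
```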
![Page 42: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/42.jpg)
Implementation
To test whether CGF is also applicable to complex architectures, and to validate some of the ideas presented in the thesis, a pilot implementation was written against the open-source LLVM compiler framework
![Page 43: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/43.jpg)
Implementation
Operates on LLVM-IR; host and target architecture agnostic; roughly 800 lines of C++ code in 4 classes
The pilot implementation cannot flatten recursive, indirect or variadic callsites; code containing them can still be used, with those callsites left untransformed
![Page 44: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/44.jpg)
Implementation
Enumerate suitable functions
Enumerate suitable callsites (and their live values)
Create dispatch function, populate with code
Transform callsites
Propagate live values
Remove original functions or create wrappers
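The "enumerate suitable callsites" step can be pictured as a simple filter. The sketch below is an assumption-laden illustration in C. The pilot implementation is C++ working on LLVM-IR and its data structures are not shown in the slides, so `struct callsite`, `is_flattenable` and `enumerate_suitable` are hypothetical. The filter criteria are the three kinds of callsite the pilot implementation is stated not to flatten: recursive, indirect and variadic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a callsite. */
struct callsite {
    const char *callee;
    bool is_indirect;
    bool is_variadic;
    bool is_recursive;
};

/* A callsite is considered suitable if none of the unsupported
 * properties applies. */
bool is_flattenable(const struct callsite *cs) {
    return !cs->is_indirect && !cs->is_variadic && !cs->is_recursive;
}

/* Stores pointers to the suitable callsites in out (room for n entries)
 * and returns how many were found. */
size_t enumerate_suitable(const struct callsite *all, size_t n,
                          const struct callsite **out) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (is_flattenable(&all[i]))
            out[count++] = &all[i];
    return count;
}

/* Three made-up callsites: only the plain direct call is kept. */
void demo_enumerate(void) {
    const struct callsite sites[] = {
        { "a",      false, false, false },
        { "printf", false, true,  false },
        { "fact",   false, false, true  },
    };
    const struct callsite *suitable[3];
    size_t n = enumerate_suitable(sites, 3, suitable);
    assert(n == 1);
    assert(suitable[0] == &sites[0]);
}
```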
![Page 45: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/45.jpg)
Examples

    int a(int n) {
        return n+1;
    }

    int b(int n) {
        int i;
        for (i=0; i<10000; i++)
            n = a(n);
        return n;
    }
![Page 48: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/48.jpg)
Examples

    int a(int n) {
        return n+1;
    }

    int b(int n) {
        n = a(n);
        n = a(n);
        n = a(n);
        n = a(n);
        return n;
    }
![Page 50: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/50.jpg)
        .type .Ldispatch,@function
    .Ldispatch:
        movl $.Ltmp4, %eax     # store the return dispatcher of a in rax
        jmpq *%rdi             # jump to the requested outer disp.
    .Ltmp2:                    # outer dispatcher of b
        movl $.LBB2_4, %eax    # store the address of %10
    .Ltmp0:                    # outer dispatcher of a
        movl (%rsi), %ecx      # load the argument n in ecx
        jmp .LBB2_4
    .Ltmp8:                    # block %17
        movl $.Ltmp6, %eax
        jmp .LBB2_4
    .Ltmp6:                    # block %18
        movl $.Ltmp7, %eax
    .LBB2_4:                   # block %10
        movq %rax, %rsi
        incl %ecx              # n = n + 1
        movl $.Ltmp8, %eax
        jmpq *%rsi             # indirectbr
    .Ltmp4:                    # return dispatcher of a
        movl %ecx, (%rdx)      # store in pointer rdx the return value
        ret                    # in ecx and return to the wrapper
    .Ltmp7:                    # return dispatcher of b
        movl %ecx, (%rdx)
        ret
![Page 51: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/51.jpg)
Fuzzing
To stress test the pilot implementation and to perform benchmarks, a tunable fuzzer has been written

    int f_1_2(int a) {
        a += 1;
        switch (a%3) {
            case 0: a += f_0_2(a); break;
            case 1: a += f_0_4(a); break;
            case 2: a += f_0_6(a); break;
        }
        return a;
    }
![Page 52: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/52.jpg)
![Page 53: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/53.jpg)
Benchmarks
Due to shortcomings in the optimizations currently available in LLVM, the only meaningful benchmarks that can be done are those concerning code size and stack usage
In the literature, average code size increases of 13% were reported due to CGF
![Page 54: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/54.jpg)
Benchmarks
Using our tunable fuzzer, different programs were generated and key statistics of the compiled code were gathered
![Page 57: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/57.jpg)
Benchmarks
In short, when the optimizations work, the resulting code size is better than that reported in the literature
When they don’t, the register spiller and allocator perform so badly that most instructions simply shuffle data around on the stack
![Page 58: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/58.jpg)
Benchmarks
![Page 59: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/59.jpg)
Next steps
Reduce live value verbosity
Alternative indirection schemes
Tune available optimizations for CGF constructs
Better register spiller and allocator
Ad-hoc optimizations (code threader, adaptive fl.)
Support recursion, indirect calls; better wrappers
![Page 60: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/60.jpg)
Conclusions
“Do more with less”; optimizations are required
CGF removes unneeded overhead due to low-level abstractions and enables powerful global optimizations
Benchmark results of the pilot implementation are better than those in the literature when the available LLVM optimizations can cope
![Page 62: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/62.jpg)
![Page 63: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/63.jpg)
![Page 64: Compiler optimizations based on call-graph flattening](https://reader033.fdocuments.in/reader033/viewer/2022061201/547933a4b379597b2b8b46c6/html5/thumbnails/64.jpg)
        .type wrapper,@function
        subq $24, %rsp         # allocate space on the stack
        movl %edi, 16(%rsp)    # store the argument n on the stack
        movl $.Ltmp0, %edi     # address of the outer dispatcher
        leaq 16(%rsp), %rsi    # address of the incoming argument(s)
        leaq 12(%rsp), %rdx    # address of the return value(s)
        callq .Ldispatch       # call to the dispatch function
        movl 12(%rsp), %eax    # load the ret value from the stack
        addq $24, %rsp         # deallocate space on the stack
        ret                    # return
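Read as C, the wrapper above does roughly the following: it keeps the original signature of the function, places the argument and a slot for the return value on its own stack, and passes their addresses to the dispatch function together with the requested entry point. The sketch below is a hand-written approximation. The `enum entry` stands in for the address of the outer dispatcher, and the body of `dispatch` is a placeholder that only models `a(n) = n + 1`.

```c
#include <assert.h>

/* Stand-in for "address of the outer dispatcher": an entry identifier. */
enum entry { ENTRY_A };

/* Placeholder for the flattened code: reads arguments through args and
 * writes results through ret. Only a(n) = n + 1 is modelled here. */
void dispatch(enum entry e, const int *args, int *ret) {
    switch (e) {
    case ENTRY_A:
        *ret = args[0] + 1;
        break;
    }
}

/* Wrapper keeping the original signature of a(). */
int a(int n) {
    int arg = n;     /* store the argument on the stack             */
    int result = 0;  /* space for the return value                  */
    dispatch(ENTRY_A, &arg, &result); /* call the dispatch function */
    return result;   /* load the return value from the stack        */
}

/* External callers keep using a() as if nothing had changed. */
void demo_wrapper(void) {
    assert(a(0) == 1);
    assert(a(41) == 42);
}
```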