Static Analysis and Compiler Design for Idempotent Processing


Transcript of Static Analysis and Compiler Design for Idempotent Processing

idempotent (ī-dəm-pō-tənt) adj. 1 of, relating to, or being a mathematical quantity which when applied to itself equals itself; 2 of, relating to, or being an operation under which a mathematical quantity is idempotent.

idempotent processing (ī-dəm-pō-tənt prə-ses-iŋ) n. the application of only idempotent operations in sequence; said of the execution of computer programs in units of only idempotent computations, typically, to achieve restartable behavior.

Static Analysis and Compiler Design for Idempotent Processing

Marc de Kruijf, Karthikeyan Sankaralingam

Somesh Jha

PLDI 2012, Beijing

Example

3

int sum(int *array, int len) {
  int x = 0;
  for (int i = 0; i < len; ++i)
    x += array[i];
  return x;
}

source code

Example

4

      R2 = load [R1]
      R3 = 0
LOOP: R4 = load [R0 + R2]
      R3 = add R3, R4
      R2 = sub R2, 1
      bnez R2, LOOP
EXIT: return R3

assembly code

[Figure: faults, exceptions, and mis-speculations can interrupt execution mid-region (e.g. a faulting load)]

Example

5

      R2 = load [R1]
      R3 = 0
LOOP: R4 = load [R0 + R2]
      R3 = add R3, R4
      R2 = sub R2, 1
      bnez R2, LOOP
EXIT: return R3

assembly code

BAD STUFF HAPPENS!

R0 and R1 are unmodified

      R2 = load [R1]
      R3 = 0
LOOP: R4 = load [R0 + R2]
      R3 = add R3, R4
      R2 = sub R2, 1
      bnez R2, LOOP
EXIT: return R3

Example

6

assembly code

just re-execute!

convention: use checkpoints/buffers

It’s Idempotent!

7

idempoh… what…?

int sum(int *data, int len) {
  int x = 0;
  for (int i = 0; i < len; ++i)
    x += data[i];
  return x;
}

[Figure: executing the code again produces the same result]

Idempotent Processing

idempotent regions, ALL THE TIME

8

Idempotent Processing

executive summary

[Figure: normal compiler vs. custom compiler region partitioning]

how? idempotence is inhibited by clobber antidependences; cut the semantic clobber antidependences

low runtime overhead (typically 2-12%)

9

Presentation Overview

❶ Idempotence

❷ Algorithm

❸ Results


What is Idempotence?

11

is this idempotent? Yes

What is Idempotence?

12

how about this? No

What is Idempotence?

13

maybe this? Yes

What is Idempotence?

14

operation sequence      idempotent?
write                   Yes
read, write             No
write, read, write      Yes

it’s all about the data dependences

What is Idempotence?

15

operation sequence      idempotent?
write, read             Yes
read, write             No
write, read, write      Yes

it’s all about the data dependences

CLOBBER ANTIDEPENDENCE: an antidependence with an exposed read

Semantic Idempotence

two types of program state:

(1) local (“pseudoregister”) state: can be renamed to remove clobber antidependences*; does not semantically constrain idempotence

(2) non-local (“memory”) state: cannot be “renamed” to avoid clobber antidependences; semantically constrains idempotence

semantic idempotence = no non-local clobber antidependences

preserve local state by renaming and careful allocation

16

Presentation Overview

❶ Idempotence

❷ Algorithm

❸ Results


Region Construction Algorithm

18

steps one, two, and three

Step 1: transform function (remove artificial dependences, remove non-clobbers)

Step 2: construct regions around antidependences (cut all non-local antidependences in the CFG)

Step 3: refine for correctness & performance (account for loops, optimize for dynamic behavior)

Step 1: Transform

Transformation 1: SSA for pseudoregister antidependences

not one, but two transformations

But we still have a problem:

[Figure: region identification depends on region boundaries, which depend on clobber antidependences]

19

Step 1: Transform

Transformation 1: SSA for pseudoregister antidependences

not one, but two transformations

Transformation 2: Scalar replacement of memory variables

[x] = a;
b = [x];
[x] = c;

→

[x] = a;
b = a;
[x] = c;

non-clobber antidependences… GONE!

20

Step 1: Transform

Transformation 1: SSA for pseudoregister antidependences

not one, but two transformations

Transformation 2: Scalar replacement of memory variables

[Figure: region identification depends on region boundaries, which depend on clobber antidependences]

Region Construction Algorithm

22

steps one, two, and three

Step 1: transform function (remove artificial dependences, remove non-clobbers)

Step 2: construct regions around antidependences (cut all non-local antidependences in the CFG)

Step 3: refine for correctness & performance (account for loops, optimize for dynamic behavior)

Step 2: Cut the CFG (cut, cut, cut…)

construct regions by “cutting” non-local antidependences

antidependence

23

larger is (generally) better: large regions amortize the cost of input preservation

[Figure: rough sketch of overhead vs. region size; sources of overhead; optimal region size?]

Step 2: Cut the CFG (but where to cut…?)

24

Step 2: Cut the CFG (but where to cut…?)

goal: the minimum set of cuts that cuts all antidependence paths

intuition: minimum cuts → fewest regions → large regions

approach: a series of reductions:
minimum vertex multi-cut (NP-complete) →
minimum hitting set among paths →
minimum hitting set among “dominating nodes”

details in paper…

Region Construction Algorithm

26

steps one, two, and three

Step 1: transform function (remove artificial dependences, remove non-clobbers)

Step 2: construct regions around antidependences (cut all non-local antidependences in the CFG)

Step 3: refine for correctness & performance (account for loops, optimize for dynamic behavior)

Step 3: Loop-Related Refinements

loops affect correctness and performance

correctness: not all local antidependences are removed by SSA…
loop-carried antidependences may clobber; depends on boundary placement; handled as a post-pass

performance: loops tend to execute multiple times…
to maximize region size, place cuts outside of loops; the algorithm is modified to prefer cuts outside of loops

details in paper…

27

Presentation Overview

❶ Idempotence

❷ Algorithm

❸ Results


Results

compiler implementation
– paper compiler implementation in LLVM v2.9
– LLVM v3.1 source code release in the July timeframe

experimental data
(1) runtime overhead
(2) region size
(3) use case

29


Runtime Overhead

[Figure: percent overhead (instruction count and execution time) across SPEC INT, SPEC FP, PARSEC benchmark suites; overall (gmean): 7.6% instruction count, 7.7% execution time]

30

Region Size

[Figure: average dynamic region size in instructions (log scale) for compiler-generated regions across SPEC INT, SPEC FP, PARSEC benchmark suites; overall (gmean): 28 instructions]

31

Use Case

hardware fault recovery

[Figure: percent overhead (gmean) across SPEC INT, SPEC FP, PARSEC benchmark suites; overall: idempotence 8.2%, checkpoint/log 24.0%, instruction TMR 30.5%]

32

Presentation Overview

❶ Idempotence

❷ Algorithm

❸ Results

33

Summary & Conclusions

idempotent processing – large (low-overhead) idempotent regions all the time

static analysis, compiler algorithm – (a) remove artifacts (b) partition (c) compile

summary

low overhead – 2-12% runtime overhead typical

34

Summary & Conclusions

several applications already demonstrated – CPU hardware simplification (MICRO ’11) – GPU exceptions and speculation (ISCA ’12) – hardware fault recovery (this paper)

conclusions

future work – more applications, hybrid techniques – optimal region size? – enabling even larger region sizes

35

Back-up Slides

36

Error recovery

dealing with side-effects

mis-speculation (e.g. branch misprediction)
– compiler handles for pseudoregister state
– for non-local memory, a store buffer is assumed

arbitrary failure (e.g. hardware fault)
– ECC and other verification assumed
– variety of existing techniques; details in paper

exceptions
– generally no side-effects beyond out-of-order-ness
– fairly easy to handle

37

Optimal Region Size?

it depends… [Figure: rough sketch, not to scale: overhead vs. region size, trading off detection latency, register pressure, and re-execution time]

38

Prior Work (relating to idempotence)

Technique Year Domain

Sentinel Scheduling 1992 Speculative memory re-ordering

Fast Mutual Exclusion 1992 Uniprocessor mutual exclusion

Multi-Instruction Retry 1995 Branch and hardware fault recovery

Atomic Heap Transactions 1999 Atomic memory allocation

Reference Idempotency 2006 Reducing speculative storage

Restart Markers 2006 Virtual memory in vector machines

Data-Triggered Threads 2011 Data-triggered multi-threading

Idempotent Processors 2011 Hardware simplification for exceptions

Encore 2011 Hardware fault recovery

iGPU 2012 GPU exception/speculation support

39

Detailed Runtime Overhead

[Figure: percent overhead (instruction count and execution time) by suite (gmean) plus outliers gcc and namd; overall: 7.6% / 7.7%; outliers due to non-idempotent inner loops + high register pressure]

40

Detailed Region Size

[Figure: average dynamic region size (log scale) by suite (gmean) plus outliers hmmer and lbm, comparing compiler-generated (28 instructions) vs. ideal (11,645 instructions) vs. ideal without outliers (>1,000,000); gap attributed to limited aliasing information]