Introduction to Compiler Development

99
INTRO TO COMPILER DEVELOPMENT LOGAN CHIEN http://slide.logan.tw/compiler-intro/

Transcript of Introduction to Compiler Development

Page 1: Introduction to Compiler Development

INTRO TO COMPILERDEVELOPMENT

LOGAN CHIENhttp://slide.logan.tw/compiler-intro/

Page 2: Introduction to Compiler Development

LOGAN CHIENSo�ware engineer at MediaTek

Ametuar LLVM/Clang developer

Integrated LLVM/Clang into Android NDK

Page 3: Introduction to Compiler Development

WHY I LOVE COMPILERS?I have been a faithful reader of Jserv's blog for ten years.

I was inspired by the compiler and the virtual machinetechnologies mentioned in his blog.

I have decided to choose compiler technologies as myresearch topic since then.

Page 4: Introduction to Compiler Development

UNDERGRADUATE COMPILERCOURSE

I took the undergraduate compiler course when I was asophomore.

Page 5: Introduction to Compiler Development

MY PROFESSOR SAID ...“We took many lectures to discuss about the parser.

However, when people say they are doing compilerresearch, with large possibility, they are not referring to the

parsing technique.”

Page 6: Introduction to Compiler Development

I AM HERE TO ...Re-introduce the compiler technologies,

Give a lightening talk on industrial-strength compilerdesign,

Explain the connection between compiler technologies andthe industry.

Page 7: Introduction to Compiler Development

AGENDARe-introduction to Compiler (30min)

Industrial-strength Compiler Design (90min)

Compiler and ICT Industry (20min)

Page 8: Introduction to Compiler Development

RE-INTRODUCTION TOCOMPILER

Page 9: Introduction to Compiler Development

WHAT IS A COMPILER?Compiler are tools for programmers to translate

programmer's thought into computer runnable programs.

ANALOGY — Translators who turn from one language toanother, such as those who translate Chinese to English.

Page 10: Introduction to Compiler Development

WHAT HAVE WE LEARNED INUNDERGRADUATE COMPILER

COURSE?

Page 11: Introduction to Compiler Development

LEXERReads the input source code (as a sequence of bytes) and

converts them into a stream of tokens.u n s i g n e d b a c k g r o u n d ( u n s i g n e d f o r e g r o u n d ) { i f ( ( f o r e g r o u n d > > 1 6 ) > 0 x 8 0 ) { r e t u r n 0 ; } e l s e { r e t u r n 0 x f f f f f f ; } }

unsigned background ( unsigned foreground ) { if ( ( foreground >>16 ) > 128 ) { return 0 ; } else { return 16777215 ; } }

Page 12: Introduction to Compiler Development

PARSERReads the tokens and build an AST according to the syntax.

unsigned background ( unsigned foreground ) { if ( ( foreground >>16 ) > 128 ) { return 0 ; } else { return 16777215 ; } }( p r o c e d u r e b a c k g r o u n d ( a r g s ' ( f o r e g r o u n d ) ) ( c o m p o u n d - s t m t ( i f - s t m t ( b i n - e x p r G E ( b i n - e x p r R S H I F T f o r e g r o u n d 1 6 ) 1 2 8 ) ( r e t u r n - s t m t 0 ) ( r e t u r n - s t m t 1 6 7 7 7 2 1 5 ) ) ) )

Page 13: Introduction to Compiler Development

CODE GENERATORGenerate the machine code or (assembly) according to the

AST. In the undergraduate course, we usually simply dosyntax-directed translation.

( p r o c e d u r e b a c k g r o u n d ( a r g s ' ( f o r e g r o u n d ) ) ( c o m p o u n d - s t m t ( i f - s t m t ( b i n - e x p r G E ( b i n - e x p r R S H I F T f o r e g r o u n d 1 6 ) 1 2 8 ) ( r e t u r n - s t m t 0 ) ( r e t u r n - s t m t 1 6 7 7 7 2 1 5 ) ) ) )

l s r w 8 , w 0 , # 1 6 c m p w 8 , # 1 2 8 b . l o . L e l s e m o v w 0 , w z r r e t . L e l s e : o r r w 0 , w z r , # 0 x f f f f f f r e t

Page 14: Introduction to Compiler Development

WHAT'S MISSING?Can a person who can only lex and parse sentences translate

articles well?

Page 15: Introduction to Compiler Development

COMPILER REQUIREMENTSA compiler should translate the source code precisely.

A compiler should utilize the device efficiently.

Page 16: Introduction to Compiler Development

THREE RELATED FIELDSProgramming Language

Computer Architecture

Compiler

Page 17: Introduction to Compiler Development

PROGRAMMING LANGUAGE THEORYEssential component of a programming language: type

theory, variable scoping, language semantics, etc.

How do people reason and compose a program?

Create an abstraction that is understandable to human andtracable to computers.

Page 18: Introduction to Compiler Development

EXAMPLE: SUBTYPE AND MUTABLERECORDS

Why you can't perform following conversion in C++?v o i d t e s t ( i n t * p t r ) { i n t * * p = & p t r ; c o n s t i n t * * a = p ; / / C o m p i l e r g i v e s w a r n i n g / / . . . }

This is related to covariant type and contravarience type. With PLT, we know that we can only choosetwo of (a) covariant type, (b) mutable records, and (c) type consistency.

v o i d t e s t ( i n t * p t r ) { c o n s t i n t c = 0 ; i n t * * p = & p t r ; c o n s t i n t * * a = p ; / / I f i t i s a l l o w e d , b a d p r o g r a m s w i l l p a s s . * a = & c ; * p = 5 ; / / N o w a r n i n g h e r e .}

Page 19: Introduction to Compiler Development

EXAMPLE: DYNAMIC SCOPING INBASH

# ! / b i n / s h v = 1 # I n i t i a l i z e v w i t h 1 f o o ( ) { e c h o " f o o : v = $ { v } " # W h i c h v i s r e f e r r e d ? v = 2 # W h i c h v i s a s s i g n e d ? } b a r ( ) { l o c a l v = 3 f o o e c h o " b a r : v = $ { v } " # W h a t w i l l b e p r i n t e d ? } v = 4 # A s s i g n 4 t o v b a r e c h o " v = $ { v } " # W h a t w i l l b e p r i n t e d ?

Ans: foo:v=3 , bar:v=2 , v=4 . Surprisingly, foo isaccessing local v in bar instead of the global v.

Page 20: Introduction to Compiler Development

EXAMPLE: NON-LEXICAL SCOPING INJAVASCRIPT

/ / J a v a s c r i p t , t h e b a d p a r t f u n c t i o n b a d ( v ) { v a r s u m ; w i t h ( v ) { s u m = a + b ; } r e t u r n s u m ;}

c o n s o l e . l o g ( b a d ( { a : 5 , b : 1 0 } ) ) ; c o n s o l e . l o g ( b a d ( { a : 5 , b : 1 0 , s u m : 1 0 0 } ) ) ;

Ans: The second console.log() prints undefined.

Page 21: Introduction to Compiler Development

COMPUTER ARCHITECTUREInstruction set architecture: CISC vs. RISC.

Out-of-Order Execution vs. Instruction Scheduling.

Memory hierarchy

Memory model

Page 22: Introduction to Compiler Development

QUIZ: DO YOU REALLY KNOW C?Is it guaranteed that v will always be loaded a�er pred?i n t p r e d ; i n t v ;

i n t g e t ( i n t a , i n t b ) { i n t r e s ; i f ( p r e d > 0 ) { r e s = v * a - v / b ; } e l s e { r e s = v * a + v / b ; } r e t u r n r e s ;}

Ans: No. Independent reads/writes can be reordered. The standard only requires the result shouldbe the same as running from top to bottom (in a single thread.)

Page 23: Introduction to Compiler Development

COMPILER ANALYSISData-flow analysis — Analyze value ranges, check theconditions or contraints, figure out modifications tovariables, etc.

Control-flow analysis — Analyze the structure of theprogram, such as control dependency and loop structure.

Memory dependency analysis — Analyze the memoryaccess pattern of the access to array elements or pointerdereferences, e.g. alias analysis.

Page 24: Introduction to Compiler Development

ALIAS ANALYSISDetermine whether two pointers can refers to the same

object or not.v o i d m o v e ( c h a r * d s t , c o n s t c h a r * s r c , i n t n ) { f o r ( i n t i = 0 ; i < n ; + + i ) { d s t [ i ] = s r c [ i ] ; } }

i n t s u m ( i n t * p t r , c o n s t i n t * v a l , i n t n ) { i n t r e s = 0 ; f o r ( i n t i = 0 ; i < n ; + + i ) { r e s + = * v a l ; * p t r + + = 1 0 ; } r e t u r n r e s ;}

Page 25: Introduction to Compiler Development

QUIZ: DO YOU KNOW C++?c l a s s Q M u t e x L o c k e r { p u b l i c : u n i o n { Q M u t e x * m t x _ ; u i n t p t r _ t v a l _ ; } ; v o i d u n l o c k ( ) { i f ( v a l _ ) { i f ( ( v a l _ & ( u i n t p t r _ t ) 1 ) = = ( u i n t p t r _ t ) 1 ) { v a l _ & = ~ ( u i n t p t r _ t ) 1 ; m t x _ - > u n l o c k ( ) ; } } } } ;

Pitfall: Reading from union fields that were not writtenpreviously results in undefined behavior. Type-Based Alias

Analysis (TBAA) exploits this rule.

Page 26: Introduction to Compiler Development

COMPILER OPTIMIZATIONScalar optimization — Fold the constants, remove theredundancies, change expressions with identities, etc.

Vector optimization — Convert several scalar operationsinto one vector operation, e.g. combining for add instructioninto one vector add.

Interprocedural optimization — function inlining,devirtualization, cross-function analysis, etc.

Page 27: Introduction to Compiler Development

OTHER COMPILER-RELATEDTECHNOLOGYJust-in-time compilers

Binary translators

Program profiling and performance measurement

Facilities to run compiled executables, e.g. garbagecollectors

Page 28: Introduction to Compiler Development

INDUSTRIAL-STRENGTH COMPILER

DESIGN

Page 29: Introduction to Compiler Development

What's the difference between yourfinal project and the industrial-

strength compiler?

Page 30: Introduction to Compiler Development

KEY DIFFERENCEAnalysis — Reasons program structures and changes ofvalues.

Optimization — Applies several provably correcttransformation which should make program run faster.

Intermediate Representation — Data structure on whichanalyses and optimizations are based.

Page 31: Introduction to Compiler Development

INTERMEDIATE REPRESENTATION1/4

A data structure for program analyses and optimizations.

High-level enough to capture important properties andencapsulates hardware limitation.

Low-level enough to be analyzed by analyses andmanipulated by transformations.

An abstraction layer for multiple front-ends and back-ends.

Page 32: Introduction to Compiler Development

INTERMEDIATE REPRESENTATION2/4

C/C++

Ada

Fortran

Go

arm

aarch64

mips

nds

x86

GENERIC RTLGIMPLE

Tree SSA

GCC Compiler Pipeline

Page 33: Introduction to Compiler Development

INTERMEDIATE REPRESENTATION3/4

C/C++

Swift

Rust

Obj‑C

arm

aarch64

mips

sparc

x86

MachInstrSelDAGLLVM IR

LLVM Compiler Pipeline

Page 34: Introduction to Compiler Development

INTERMEDIATE REPRESENTATION4/4

GCC — GENERIC, GIMPLE, Tree SSA, and RTL.

LLVM — LLVM IR, Selection DAG, and Machine Instructions.

Java HotSpot — HIR, LIR, and MIR.

Page 35: Introduction to Compiler Development

CONTROL FLOW GRAPH1/3

Basic block — A sequence of instructions that will only beentered from the top and exited from the end.

Edge — If the basic block s may branch to t, then we add adirected edge (s, t).

Predecessor/Successor — If there is an edge (s, t), then s is apredecessor of t and t is a successor of s.

Page 36: Introduction to Compiler Development

CONTROL FLOW GRAPH2/3

/ / y = a * x + b ; v o i d m a t m u l ( d o u b l e * r e s t r i c t y , u n s i g n e d l o n g m , u n s i g n e d l o n g n , c o n s t d o u b l e * r e s t r i c t a , c o n s t d o u b l e * r e s t r i c t x , c o n s t d o u b l e * r e s t r i c t b ) { f o r ( u n s i g n e d l o n g r = 0 ; r < m ; + + r ) { d o u b l e t = b [ r ] ; f o r ( u n s i g n e d l o n g i = 0 ; i < n ; + + i ) { t + = a [ r * n + i ] * x [ i ] ; } y [ r ] = t ; } }

Input source program

Page 37: Introduction to Compiler Development

CONTROL FLOW GRAPH3/3

bp = getelementptr b, rt = load bpi = 0cmp = icmp lt, i, nbr cmp, B2, B3

B1

B2rn = mul r, nai = add rn, iap = getelementptr a, aiad = load apxp = getelementptr x, ixd = load xpax = mul ad, xdt = add t, axi = add i, 1cmp = icmp lt, i, nbr cmp, B2, B3

B3yp = getelementptr y, rstore t, ypr = add r, 1cmp = icmp lt, r, nbr cmp, B1, B4B4

ret void

B0 (ENTRY)

r = 0cmp = icmp lt r, mbr cmp, B1, B4

Page 38: Introduction to Compiler Development

VARIABLE DEFINITION

bp = getelementptr b, rt = load bpi = 0cmp = icmp lt, i, nbr cmp, B2, B3

B1

B2rn = mul r, nai = add rn, iap = getelementptr a, aiad = load apxp = getelementptr x, ixd = load xpax = mul ad, xdt = add t, axi = add i, 1cmp = icmp lt, i, nbr cmp, B2, B3

B3yp = getelementptr y, rstore t, ypr = add r, 1cmp = icmp lt, r, nbr cmp, B1, B4B4

ret void

B0 (ENTRY)

r = 0cmp = icmp lt r, mbr cmp, B1, B4

definitions of r

The place where a variable is assigned or defined.

Page 39: Introduction to Compiler Development

VARIABLE USE

bp = getelementptr b, rt = load bpi = 0cmp = icmp lt, i, nbr cmp, B2, B3

B1

B2rn = mul r, nai = add rn, iap = getelementptr a, aiad = load apxp = getelementptr x, ixd = load xpax = mul ad, xdt = add t, axi = add i, 1cmp = icmp lt, i, nbr cmp, B2, B3

B3yp = getelementptr y, rstore t, ypr = add r, 1cmp = icmp lt, r, nbr cmp, B1, B4B4

ret void

B0 (ENTRY)

r = 0cmp = icmp lt r, mbr cmp, B1, B4

definitions of ruses of r

The places where a variable is referred or used.

Page 40: Introduction to Compiler Development

REACHING DEFINITION1/2

bp = getelementptr b, rt = load bpi = 0cmp = icmp lt, i, nbr cmp, B2, B3

B1

B2rn = mul r, nai = add rn, iap = getelementptr a, aiad = load apxp = getelementptr x, ixd = load xpax = mul ad, xdt = add t, axi = add i, 1cmp = icmp lt, i, nbr cmp, B2, B3

B3yp = getelementptr y, rstore t, ypr = add r, 1cmp = icmp lt, r, nbr cmp, B1, B4B4

ret void

B0 (ENTRY)

r = 0cmp = icmp lt r, mbr cmp, B1, B4

{d0} {d0, d1}

{d0, d1}

d0:

d1:

{d1}

{d0, d1}

{d0, d1}

Definitions that reaches a use.

Page 41: Introduction to Compiler Development

REACHING DEFINITION2/2

Constant propagation is a good example to show theusefulness of reaching definition.

v o i d t e s t ( i n t c o n d ) { i n t a = 1 ; / / d 0 i n t b = 2 ; / / d 1 i f ( c o n d ) { c = 3 ; / / d 2 } e l s e { / / R e a c h D e f [ a ] = { d 0 } / / R e a c h D e f [ b ] = { d 1 } c = a + b ; / / d 3 }

/ / R e a c h D e f [ c ] = { d 2 , d 3 } u s e ( c ) ; }

Page 42: Introduction to Compiler Development

DOMINANCE RELATION1/2

A basic block s dominates t iff every paths that goes fromentry to t will pass through s.

Every basic block in a CFG has an immediate dominator andforms a dominator tree.

Page 43: Introduction to Compiler Development

DOMINANCE RELATION2/2

B0

B1

B2

B3

B4

B0

B1

B2B3

B4

Dominator Tree

Page 44: Introduction to Compiler Development

DOMINANCE FRONTIERA basic block t is a dominance frontier of a basic block s, if

one of predecessor of t is dominated by s but t is not strictlydominated by s.

B0

B1 B2

B3

B6

B4B5

DF[B2] = {B3, B5}

DF[B4] = {B4}

DF[B1] = {B3, B5}

Page 45: Introduction to Compiler Development

STATIC SINGLE-ASSIGNMENT FORMstatic — A static analysis to the program (not the execution.)

single-assignment — Every variable can only be assignedonce.

SSA form is the most popular intermediate representationrecently.

It is adopted by a wide range of compilers, such as GCC,LLVM, Java HotSpot, Android ART, etc.

Page 46: Introduction to Compiler Development

SSA PROPERTIESEvery variables can only be defined once.

Every uses can only refer to one definition.

Use phi function to handle the merged control-flow.

Page 47: Introduction to Compiler Development

PHI FUNCTIONS

d e f i n e v o i d @ f o o ( i 1 c o n d , i 3 2 a , i 3 2 b ) { e n t : b r c o n d , b 1 , b 2 b 1 : t 0 = m u l a , 4 b r b 3 b 2 : t 1 = m u l b , 5 b r b 3 b 3 : t 2 = p h i ( t 0 ) , ( t 1 ) u s e ( t 2 ) r e t }

d e f i n e v o i d @ f o o ( i 3 2 n ) { e n t : b r l o o p l o o p : i 0 = p h i ( 0 ) , ( i 1 ) c m p = i c m p g e i 0 , n b r c m p , e n d , c o n t c o n t : u s e ( i 0 ) i 1 = a d d i 0 , 1 b r l o o p e n d : r e t }

Page 48: Introduction to Compiler Development

ADVANTAGE OF SSA FORMCompact — Reduce the def-use chain.

Referential transparency — The properties associated with avariable will not be changed, aka. context-free.

Page 49: Introduction to Compiler Development

REDUCED DEF-USE CHAINv o i d f o o ( i n t c o n d 1 , i n t c o n d 2 , i n t a , i n t b ) { i n t t ; i f ( c o n d 1 ) { t = a * 4 ; / / d 0 } e l s e { t = b * 5 ; / / d 1 } i f ( c o n d 2 ) { / / r e a c h - d e f : { d 0 , d 1 } u s e ( t ) ; } e l s e { / / r e a c h - d e f : { d 0 , d 1 } u s e ( t ) ; } }

v o i d f o o ( i n t c o n d 1 , i n t c o n d 2 , i n t a , i n t b ) { i f ( c o n d 1 ) { t . 0 = a * 4 ; } e l s e { t . 1 = b * 5 ; } t . 2 = p h i ( t . 0 , t . 1 ) ; i f ( c o n d 2 ) { u s e ( t . 2 ) ; } e l s e { u s e ( t . 2 ) ; } }

Page 50: Introduction to Compiler Development

REFERENTIAL TRANSPARENCY1/2

v o i d f o o ( ) { i n t r = 5 ; / / d 0

/ / . . . S O M E C O D E . . .

/ / W e c a n o n l y a s s u m e / / " r = = 5 " i f d 0 i s t h e / / o n l y r e a c h i n g d e f i n i t i o n . u s e ( r ) ; }

v o i d f o o ( ) { r . 0 = 5 ;

/ / . . . S O M E C O D E . . .

/ / N o m a t t e r w h a t c o d e a r e / / s k i p p e d a b o v e , i t i s s a f e / / t o r e p l a c e f o l l o w i n g r . 0 / / w i t h 5 . u s e ( r . 0 ) ; }

Page 51: Introduction to Compiler Development

REFERENTIAL TRANSPARENCY2/2

v o i d f o o ( i n t a , i n t b ) { i n t c = a + b ; i n t r = a + b ; / / C a n w e s i m p l y r e p l a c e / / a l l o c c u r r e n c e o f r w i t h / / c ? ( N O )

/ / . . . S O M E C O D E . . .

u s e ( r ) ; }

v o i d f o o ( i n t a , i n t b ) { c . 0 = a + b ; r . 0 = a + b ;

/ / . . . S O M E C O D E . . .

/ / N o m a t t e r w h a t c o d e a r e / / s k i p p e d a b o v e , i t i s s a f e / / t o r e p l a c e f o l l o w i n g r . 0 / / w i t h c . 0 . u s e ( r . 0 ) ; }

Page 52: Introduction to Compiler Development

BUILDING SSA FORM1. Compute domination relationships between basic blocks

and build the dominator tree.2. Compute dominator frontiers.3. Insert phi functions at dominator frontiers.4. Traverse the dominator tree and rename the variables.

Page 53: Introduction to Compiler Development

BEFORE SSA CONSTRUCTION

bp = getelementptr b, rt = load bpi = 0cmp = icmp lt, i, nbr cmp, B2, B3

B1

B2rn = mul r, nai = add rn, iap = getelementptr a, aiad = load apxp = getelementptr x, ixd = load xpax = mul ad, xdt = add t, axi = add i, 1cmp = icmp lt, i, nbr cmp, B2, B3

B3yp = getelementptr y, rstore t, ypr = add r, 1cmp = icmp lt, r, nbr cmp, B1, B4B4

ret void

B0 (ENTRY)

r = 0cmp = icmp lt r, mbr cmp, B1, B4

Page 54: Introduction to Compiler Development

AFTER SSA CONSTRUCTION

r.0 = phi (0, B0) (r.1 B3)bp = getelementptr b, r.0t.0 = load bpcmp = icmp lt, 0, nbr cmp, B2, B3

B1

B2t.1 = phi (t.0, B1) (t.2 B2)i.0 = phi (0, B1) (i.1, B2)rn = mul r.0, nai = add rn, i.1ap = getelementptr a, aiad = load apxp = getelementptr x, i.1xd = load xpax = mul ad, xdt.2 = add t.1, axi.1 = add i.0, 1cmp = icmp lt, i.1, nbr cmp, B2, B3

B3t.3 = phi (t0, B1) (t.2, B2)yp = getelementptr y, r.0store t.3, ypr.1 = add r.0, 1cmp = icmp lt, r.1, nbr cmp, B1, B4

B4ret void

B0 (ENTRY)

cmp = icmp lt 0, mbr cmp, B1, B4

Page 55: Introduction to Compiler Development

OPTIMIZATIONS

Page 56: Introduction to Compiler Development

CONSTANT PROPAGATION1/3

Constant propagation, sometimes known as constantfolding, will evaluate the instructions with constant

operands and propagate the constant result.a = a d d 2 , 3 b = a c = m u l a , b

a = 5 b = 5 c = 2 5

Page 57: Introduction to Compiler Development

CONSTANT PROPAGATION2/3

Why do we need constant propagation?s t r u c t q u e u e * c r e a t e _ q u e u e ( ) { r e t u r n ( s t r u c t q u e u e * ) m a l l o c ( s i z e o f ( s t r u c t q u e u e * ) * 1 6 ) ; }

i n t p r o c e s s _ d a t a ( i n t a , i n t b , i n t c ) { i n t k [ ] = { 0 x 1 3 , 0 x 1 7 , 0 x 1 9 } ; i f ( D E B U G ) { v e r i f y _ d a t a ( k , d a t a ) ; } r e t u r n ( a * k [ 1 ] * k [ 2 ] + b * k [ 0 ] * k [ 2 ] + c * k [ 0 ] * k [ 1 ] ) ; }

Page 58: Introduction to Compiler Development

CONSTANT PROPAGATION3/3

For each basic block b from CFG in reversed postorder:

For each instruction i in basic block b from top to bottom:

If all of its operands are constants, the operation has noside-effect, and we know how to evaluate it at compile time,then evaluate the result, remove the instruction, and replaceall usages with the result.

Page 59: Introduction to Compiler Development

GLOBAL VALUE NUMBERING1/2

Global value numbering tries to give numbers to thecomputed expression and maps the newly visited expression

to the visited ones.

a = c * d ; / / [ ( * , c , d ) - > a ] e = c ; / / [ ( * , c , d ) - > a ] f = e * d ; / / q u e r y : i s ( * , c , d ) a v a i l a b l e ? u s e ( a , e , f ) ;

a = c * d ; u s e ( a , c , a ) ;

Page 60: Introduction to Compiler Development

GLOBAL VALUE NUMBERING2/2

Traverse the basic blocks in depth-first order on dominatortree.

Maintain a stack of hash table. Once we have returned froma child node on the dominator tree, then we have to pop thestack top.

Visit the instructions in the basic block with tᅠ=ᅠopᅠa,ᅠbform and compute the hash for (op, a, b).

If (op, a, b) is already in the hash table, then change tᅠ=ᅠopᅠa,ᅠb with tᅠ=ᅠhash_tab[(op,ᅠa,ᅠb)]

Otherwise, insert (op, a, b) -> t to the hash table.

Page 61: Introduction to Compiler Development

Otherwise, insert to the hash table.

DEAD CODE ELIMINATION1/6

Dead code elimination (DCE) removes unreachableinstructions or ignored results.

Constant propagation might reveal more dead code sincethe branch conditions become constant value.

On the other hand, DCE can exploit more constant forconstant propagation because several definitions are

removed from the program.

Page 62: Introduction to Compiler Development

DEAD CODE ELIMINATION2/6

Conditional statements with constant conditioni f ( k I s D e b u g B u i l d ) { / / D e a d c o d e c h e c k _ i n v a r i a n t ( a , b , c , d ) ; / / D e a d c o d e }

Page 63: Introduction to Compiler Development

DEAD CODE ELIMINATION3/6

Platform-specific constantv o i d h a s h _ e n t _ s e t _ v a l ( s t r u c t h a s h _ e n t * h , i n t v ) { i f ( s i z e o f ( i n t ) < = s i z e o f ( v o i d * ) ) { h - > p = ( v o i d * ) ( u i n t p t r _ t ) ( v ) ; } e l s e { h - > p = m a l l o c ( s i z e o f ( i n t ) ) ; / / D e a d c o d e * ( h - > p ) = v ; / / D e a d c o d e } }

Page 64: Introduction to Compiler Development

DEAD CODE ELIMINATION4/6

Computed result ignoredi n t c o m p u t e _ s u m ( i n t a , i n t b ) { i n t s u m = ( a + b ) * ( a - b + 1 ) / 2 ; / / D e a d c o d e : N o t u s e d r e t u r n 0 ; }

Dead code a�er dead store optimizationv o i d t e s t ( i n t * p , b o o l c o n d , i n t a , i n t b , i n t c ) { i f ( c o n d ) { t = a + b ; / / D e a d c o d e * p = t ; / / W i l l b e r e m o v e d b y D S E } * p = c ; }

Page 65: Introduction to Compiler Development

DEAD CODE ELIMINATION5/6

Dead code a�er code specializationv o i d m a t m u l ( d o u b l e * r e s t r i c t y , u n s i g n e d l o n g m , u n s i g n e d l o n g n , c o n s t d o u b l e * r e s t r i c t a , c o n s t d o u b l e * r e s t r i c t x , c o n s t d o u b l e * r e s t r i c t b ) { i f ( n = = 0 ) { f o r ( u n s i g n e d l o n g r = 0 ; r < m ; + + r ) { d o u b l e t = b [ r ] ; f o r ( u n s i g n e d l o n g i = 0 ; i < n ; + + i ) { / / D e a d c o d e t + = a [ r * n + i ] * x [ i ] ; / / D e a d c o d e } / / D e a d c o d e y [ r ] = t ; } } e l s e { / / . . . s k i p p e d . . . } }

Page 66: Introduction to Compiler Development

DEAD CODE ELIMINATION6/6

Traverse the CFG starting from entry in reversed post order.

Only traverse the successor that may be visited, i.e. if thebranch condition is a constant, then ignore the other side.

While traversing the basic block, clear the dead instructionswithin the basic block.

A�er the traversal, remove the unvisited basic blocks andremove the variable uses that refers to the variables that aredefined in the unvisited basic blocks.

Page 67: Introduction to Compiler Development

LOOP OPTIMIZATIONSIt is reasonable to assume a program spends more time in

the loop body. Thus, loop optimization is an important issuein the compiler.

Page 68: Introduction to Compiler Development

LOOP INVARIANT CODE MOTION1/3

Loop invariant code motion (LICM) is an optimization whichmoves loop invariants or loop constants out of the loop.i n t t e s t ( i n t n , i n t a , i n t b , i n t c ) { i n t s u m = 0 ; f o r ( i n t i = 0 ; i < n ; + + i ) { s u m + = i * a * b * c ; / / a * b * c i s l o o p i n v a r i a n t } r e t u r n s u m ;}

Page 69: Introduction to Compiler Development

LOOP INVARIANT CODE MOTION2/3

f o r ( u n s i g n e d l o n g r = 0 ; r < m ; + + r ) { d o u b l e t = b [ r ] ; f o r ( u n s i g n e d l o n g i = 0 ; i < n ; + + i ) { t + = a [ ( r * n ) + i ] * x [ i ] ; / / " r * n " i s a i n n e r l o o p i n v a r i a n t } y [ r ] = t ; }

f o r ( u n s i g n e d l o n g r = 0 ; r < m ; + + r ) { d o u b l e t = b [ r ] ; u n s i g n e d l o n g k = r * n ; / / " r * n " m o v e d o u t o f t h e i n n e r l o o p f o r ( u n s i g n e d l o n g i = 0 ; i < n ; + + i ) { t + = a [ k + i ] * x [ i ] ; } y [ r ] = t ; }

Page 70: Introduction to Compiler Development

LOOP INVARIANT CODE MOTION3/3

How do we know whether a variable is a loop invariant?

If the computation of a value is not (transitively) dependingon following black lists, we can assume such value is a loopinvariant:

Phi instructions associating with the loop beingconsideredInstructions with side-effects or non-pure instructions,e.g. function calls

Page 71: Introduction to Compiler Development

INDUCTION VARIABLEIf x represents the trip count (iteration count), then we call

a*x+b induction variables.f o r ( u n s i g n e d l o n g r = 0 ; r < m ; + + r ) { d o u b l e t = b [ r ] ; u n s i g n e d l o n g k = r * n ; / / o u t e r l o o p I V f o r ( u n s i g n e d l o n g i = 0 ; i < n ; + + i ) { u n s i g n e d l o n g m = k + i ; / / i n n e r l o o p I V t + = a [ m ] * x [ i ] ; } y [ r ] = t ; }

k and r are induction variables of outer loop.

i and m are induction variables of inner loop.

Page 72: Introduction to Compiler Development

LOOP STRENGTH REDUCTION1/2

Strength reduction is an optimization to replace themultiplication in induction variables with an addition.f o r ( u n s i g n e d l o n g r = 0 ; r < m ; + + r ) { d o u b l e t = b [ r ] ; u n s i g n e d l o n g k = r * n ; / / o u t e r l o o p I V f o r ( u n s i g n e d l o n g i = 0 ; i < n ; + + i ) { u n s i g n e d l o n g m = k + i ; / / i n n e r l o o p I V t + = a [ m ] * x [ i ] ; } y [ r ] = t ; }

f o r ( u n s i g n e d l o n g r = 0 , k = 0 ; r < m ; + + r , k + = n ) { / / k d o u b l e t = b [ r ] ; f o r ( u n s i g n e d l o n g i = 0 , m = k ; i < n ; + + i , + + m ) { / / m t + = a [ m ] * x [ i ] ; } y [ r ] = t ; }

Page 73: Introduction to Compiler Development

LOOP STRENGTH REDUCTION2/2

We can even rewrite the range to eliminate the indexcomputation.

i n t s u m ( c o n s t i n t * a , i n t n ) { i n t r e s = 0 ; f o r ( i n t i = 0 ; i < n ; + + i ) { r e s + = a [ i ] ; / / i m p l i c i t * ( a + s i z e o f ( i n t ) * i ) } r e t u r n r e s ;}

i n t s u m ( c o n s t i n t * a , i n t n ) { i n t r e s = 0 ; c o n s t i n t * e n d = a + n ; / / i m p l i c i t a + s i z e o f ( i n t ) * n f o r ( c o n s t i n t * p = a ; p ! = e n d ; + + p ) { / / r a n g e u p d a t e d r e s + = * p ; } r e t u r n r e s ;}

Page 74: Introduction to Compiler Development

LOOP UNROLLING1/2

Unroll the loop body multiple times.

Purpose: Reduce amortized loop iteration overhead.

Purpose: Reduce load/store stall and exploit instruction-level parallelism, e.g. so�ware pipelining.

Purpose: Prepare for vectorization, e.g. SIMD.

Page 75: Introduction to Compiler Development

LOOP UNROLLING2/2

f o r ( u n s i g n e d l o n g r = 0 , k = 0 ; r < m ; + + r , k + = n ) { d o u b l e t = b [ r ] ; s w i t c h ( n & 0 x 3 ) { / / D u f f ' s d e v i c e c a s e 3 : t + = a [ k + 2 ] * x [ 2 ] ; c a s e 2 : t + = a [ k + 1 ] * x [ 1 ] ; c a s e 1 : t + = a [ k ] * x [ 0 ] ; } f o r ( u n s i g n e d l o n g i = n & 0 x 3 ; i < n ; i + = 4 ) { t + = a [ k + i ] * x [ i ] ; t + = a [ k + i + 1 ] * x [ i + 1 ] ; t + = a [ k + i + 2 ] * x [ i + 2 ] ; t + = a [ k + i + 3 ] * x [ i + 3 ] ; } y [ r ] = t ; }

Page 76: Introduction to Compiler Development

INSTRUCTION SELECTION1/5

Instruction selection is a process to map IR into machineinstructions.

Compiler back-ends will perform pattern matching to thebest select instructions (according to the heuristic.)

Page 77: Introduction to Compiler Development

INSTRUCTION SELECTION2/5

Complex operations, e.g. shi�-and-add or multiply-accumulate.

Array load/store instructions, which can be translated to oneshi�-add-and-load on some architectures.

IR instructions are which not natively supported by the targetmachine.

Page 78: Introduction to Compiler Development

INSTRUCTION SELECTION3/5

d e f i n e i 6 4 @ m a c ( i 6 4 % a , i 6 4 % b , i 6 4 % c ) { e n t : % 0 = m u l i 6 4 % a , % b % 1 = a d d i 6 4 % 0 , % c r e t i 6 4 % 1 }

m a c : m a d d x 0 , x 1 , x 0 , x 2 r e t

Page 79: Introduction to Compiler Development

INSTRUCTION SELECTION4/5

d e f i n e i 6 4 @ l o a d _ s h i f t ( i 6 4 * % a , i 6 4 % i ) { e n t : % 0 = g e t e l e m e n t p t r i 6 4 , i 6 4 * % a , i 6 4 % i % 1 = l o a d i 6 4 , i 6 4 * % 0 r e t i 6 4 % 1 }

l o a d _ s h i f t : l d r x 0 , [ x 0 , x 1 , l s l # 3 ] r e t

Page 80: Introduction to Compiler Development

INSTRUCTION SELECTION5/5

% s t r u c t . A = t y p e { i 6 4 * , [ 1 6 x i 6 4 ] }

d e f i n e i 6 4 @ g e t _ v a l ( % s t r u c t . A * % p , i 6 4 % i , i 6 4 % j ) { e n t : % 0 = g e t e l e m e n t p t r % s t r u c t . A , % s t r u c t . A * % p , i 6 4 % i , i 3 2 1 , i 6 4 % j % 1 = l o a d i 6 4 , i 6 4 * % 0 r e t i 6 4 % 1 }

g e t _ v a l : m o v z w 8 , # 0 x 8 8 m a d d x 8 , x 1 , x 8 , x 0 a d d x 8 , x 8 , x 2 , l s l # 3 l d r x 0 , [ x 8 , # 8 ] r e t

Page 81: Introduction to Compiler Development

SSA DESTRUCTION1/5

How do we deal with the phi functions?

Copy the assignment operator to the end of the predecessor.

(Must be done carefully)

Page 82: Introduction to Compiler Development

SSA DESTRUCTION2/3

cmp = icmp eq, cond, 0br cmp, B1, B2

B0

br B3B1

br B3B2

a = phi (3, B1) (7, B2)print(a)

B3

cmp = icmp eq, cond, 0br cmp, B1, B2

B0

a = 3br B3

B1a = 7br B3

B2

print(a)

B3converts to

Example to show the assignment copying

Page 83: Introduction to Compiler Development

SSA DESTRUCTION3/5

i.0 = phi (0, B0) (i.1, B2)cmp = icmp lt, i.0, 100br cmp, B2, B3

B1retB3

print(i.0)i.1 = add i.0, 1br B1

B2

br B1B0

cmp = icmp lt, i.0, 100br cmp, B2, B3

B1 B3

print(i.0)i.1 = add i.0, 1i.0 = i.1br B1

B2

i.0 = 0br B1

B0

ret

Converts to

Converts to

Example to show the assignment copying with loop

Page 84: Introduction to Compiler Development

SSA DESTRUCTION4/5

i.0 = phi (0, B0) (i.1, B2)cmp = icmp lt, i.0, 100br cmp, B2, B3

B1print(i.0)ret

B3

print(i.0)i.1 = add i.0, 1br B1

B2

br B1B0

cmp = icmp lt, i.0, 100br cmp, B2, B3

B1 B3

print(i.0)i.1 = add i.0, 1i.0 = i.1br B1

B2

i.0 = 0br B1

B0

print(i.0)ret

Doesn'twork!

Lost copy problem — Naive de-SSA algorithm doesn't workdue to live range conflicts. Need extra assignments and

renaming to avoid conflicts.

Page 85: Introduction to Compiler Development

SSA DESTRUCTION5/5

t = xx = yy = tbr cmp, B2, B1

B1

print(x)print(y)

B2

x = 1y = 2br B1

B0

x1 = phi (x0, y1)y1 = phi (y0, x1)br cmp, B2, B1

B1

print(x1)print(y1)

B2

x0 = 1y0 = 2br B1

B0

x1 = y1y1 = x1br cmp, B2, B1

B1

print(x1)print(y1)

B2

x0 = 1y0 = 2x1 = x0y1= y0br B1

B0

Swap problem — Conflicts due to parallel assignmentsemantics of phi instructions. A correct algorithm shoulddetect the cycle and implement parallel assignment with

swap instructions.

Page 86: Introduction to Compiler Development

REGISTER ALLOCATION1/2

We have to replace infinite virtual registers with finitemachine registers.

Register Allocation — Make sure that the maximumsimultaneous register usages are less then k.

Register Assignment — Assign a register given the fact thatregisters are always sufficient.

The classical solution is to compute the life time of eachvariables, build the inference graph, spill the variable to

stack if the inference graph is not k-colorable.

Page 87: Introduction to Compiler Development

REGISTER ALLOCATION2/2

ret yadd y, c, gadd g a fadd f, d, eadd e, a, cadd d, a, bmov c, #2mov b, #1mov a, #1

abcdefgy

a

g

bc

ed

fy

Page 88: Introduction to Compiler Development

INSTRUCTION SCHEDULING1/2

Sort the instruction according the number of cycles in orderto reduce execution time of the critical path.

Constraints: (a) Data dependence, (b) Functional units, (c)Issue width, (d) Datapath forwarding, (e) Instruction cycle

time, etc.

Page 89: Introduction to Compiler Development

INSTRUCTION SCHEDULING2/2

0 : a d d r 0 , r 0 , r 1 1 : a d d r 1 , r 2 , r 2 2 : l d r r 4 , [ r 3 ] 3 : s u b r 4 , r 4 , r 0 4 : s t r r 4 , [ r 1 ]

2 : l d r r 4 , [ r 3 ] # l o a d i n s t r u c t i o n n e e d s m o r e c y c l e s 0 : a d d r 0 , r 0 , r 1 1 : a d d r 1 , r 2 , r 2 3 : s u b r 4 , r 4 , r 0 4 : s t r r 4 , [ r 1 ]

Page 90: Introduction to Compiler Development

COMPILER AND ICTINDUSTRY

Page 91: Introduction to Compiler Development

WHERE CAN WE FIND COMPILER?

Page 92: Introduction to Compiler Development

SMART PHONESAndroid ART (Java VM)OpenGL|ES GLSL Compiler (Graphics)

Page 93: Introduction to Compiler Development

BROWSERSJavascript-related

Javascript Engine, e.g. V8, IonMonkey, JavaScriptCore,ChakraRegular Expression EngineWebAssembly

WebGL — OpenGL binding for the Web (Graphics)WebCL — OpenCL binding for the Web (Computing)

Page 94: Introduction to Compiler Development

DESKTOP APPLICATION RUN-TIMEENVIRONMENTS

Java run-time (Java, Scala, Clojure, Groovy).NET run-time (C#, F#)RubyPythonPHP

Page 95: Introduction to Compiler Development

VIRTUAL MACHINES AND EMULATORSSimulate different architecture (using dynamicrecompilation), e.g. QEMUSimulate special instructions that are not supported byhypervisor.

Page 96: Introduction to Compiler Development

DEVELOPMENT ENVIRONMENTC/C++ compiler, e.g. MSVC, GCC, LLVMJava compiler, e.g. OpenJDKDSP compilers for digital signal processors.VHDL compilers for IC design team.Profiling and/or intrusmentation tools, e.g. Valgrind, Dr.Memory, Address Sanitizer, Memory Sanitizer

Page 97: Introduction to Compiler Development

WHAT DOES A COMPILER TEAM DO?Develop the compiler toolchain for the in-house processors,

e.g. DSP, GPU, CPU, and etc.

Tune the performance of the existing compiler or virtualmachines.

Develop a new programming language or model (e.g. Cudafrom NVIDIA.)

Page 98: Introduction to Compiler Development

SOME RESEARCH TOPICS ...Memory model for concurrency — Designing a good

memory model at programming language level, whichshould be intuitive to human while open for

compiler/architecture optimization, is still a challengingproblem.

Both Java and C++ took efforts to standardize. However,there are still some unintuitive cases that are allowed and

some desirable cases being ruled out.

Page 99: Introduction to Compiler Development

THE ENDQ & A