Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual...

35
Design and Implementation of a TriCore Backend for the LLVM Compiler Framework Studienarbeit Christoph Erhardt Friedrich-Alexander-Universit¨ at Erlangen-N¨ urnberg November 20, 2009 A TriCore Backend for LLVM (November 20, 2009) 1 – 25

Transcript of Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual...

Page 1: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Design and Implementation of a TriCore Backendfor the LLVM Compiler Framework

Studienarbeit

Christoph Erhardt

Friedrich-Alexander-Universitat Erlangen-Nurnberg

November 20, 2009

A TriCore Backend for LLVM (November 20, 2009) 1 – 25

Page 2: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Overview

Overview

The TriCore Processor Architecture

The LLVM Compiler Infrastructure

Design and Implementation of the Backend

Evaluation & Conclusion

A TriCore Backend for LLVM (November 20, 2009) Overview 2 – 25

Page 3: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Motivation What do we need it for?

TriCore chips are omnipresent around here:

QuadcopterHigh strikerCarolo Cup...

The RTSC (Real-Time Systems Compiler) project:

Operating system aware compiler for real-time applicationsProcesses atomic basic blocksBased on LLVM

RTSC should be able to generate TriCore machine code

A TriCore Backend for LLVM (November 20, 2009) Overview 3 – 25

Page 4: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Motivation What do we need it for?

TriCore chips are omnipresent around here:

QuadcopterHigh strikerCarolo Cup...

The RTSC (Real-Time Systems Compiler) project:

Operating system aware compiler for real-time applicationsProcesses atomic basic blocksBased on LLVM

RTSC should be able to generate TriCore machine code

A TriCore Backend for LLVM (November 20, 2009) Overview 3 – 25

Page 5: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

The TriCore Processor Architecture Overview

Three-in-one architecture

Real-time microcontroller unit

DSP

Superscalar RISC processor

Basic features

Load/store architecture

32-bit data, address, and instruction words

Some special 16-bit instruction words for higher code density

Little-endian byte order

16 data + 16 address registers

A TriCore Backend for LLVM (November 20, 2009) The TriCore Processor Architecture 4 – 25

Page 6: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

The TriCore Processor Architecture Overview

Three-in-one architecture

Real-time microcontroller unit

DSP

Superscalar RISC processor

Basic features

Load/store architecture

32-bit data, address, and instruction words

Some special 16-bit instruction words for higher code density

Little-endian byte order

16 data + 16 address registers

A TriCore Backend for LLVM (November 20, 2009) The TriCore Processor Architecture 4 – 25

Page 7: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Peculiarities Some things that TriCore handles in an unusual way

Strict distinction between data and address registers:

Also reflected in the calling conventionsSerious problem for the compiler!

Data registers are also used for floating-point operands

Special DSP-oriented instructions and addressing modes

Task/context model:

Automatic context save/restore upon call/returnContext save areas (linked lists managed by hardware)

A TriCore Backend for LLVM (November 20, 2009) The TriCore Processor Architecture 5 – 25

Page 8: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

The LLVM Compiler Infrastructure Overview

Open-source compiler infrastructure project started in 2000

Main sponsor: Apple Inc.

Written in C++

A TriCore Backend for LLVM (November 20, 2009) The LLVM Compiler Infrastructure 6 – 25

Page 9: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Basic Architecture The classical three tiers of a compiler

x86assembly

SPARCassembly

C source

Fortransource

LLVM-GCCfrontend Optimizer

x86 codegenerator

SPARC codegenerator

Clangfrontend

... ...

LLVMassembly/

bitcode

LLVMassembly/

bitcode

... ...

Language-specific frontends

Optimizer: generic IR, analysis/transformation passes

Several backends for machine code generation

A TriCore Backend for LLVM (November 20, 2009) The LLVM Compiler Infrastructure 7 – 25

Page 10: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Unique Characteristics What does LLVM have that others don’t?

Not merely a compiler, but a compiler infrastructure:

Static compilationJust-in-time compilation

Strictly modular, library-based architecture:

Easily extendiblePossibility to incorporate parts of LLVM in other projects

BSD-style licence

Produces highly optimized machine code in an efficient way:

Memory-efficientTime-efficient

A TriCore Backend for LLVM (November 20, 2009) The LLVM Compiler Infrastructure 8 – 25

Page 11: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Design and Implementation of the Backend Overview

Extensive generic code generation framework:

Makes work a lot easier... but also imposes some problems in specific cases

Fixed class hierarchy

Many target-independent algorithms:

Instruction schedulingRegister colouring...

Code generation process executed by a series of passes

A TriCore Backend for LLVM (November 20, 2009) Design and Implementation of the Backend 9 – 25

Page 12: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Code Generation Process

DAGLowering

DAGLegalization

InstructionSelection

SchedulingSSA-BasedOptimization

RegisterAllocation

Pro-/EpilogueInsertion

PeepholeOptimization

AssemblyPrinting

ListLLVM code(SSA form)

DAGsnot legalized

DAGslegalized

DAGsnative

instructions

ListSSA form

ListSSA form

Listwith physical

registers

Listwith resolved

stackreferences

Listwith resolved

stackreferences

Textassembly

code

TriCoreTargetLowering TriCoreDAGToDAGISelTriCoreInstrInfo

TriCoreRegisterInfoTriCoreInstrInfoTriCoreAsmPrinter

TriCoreInstrInfo

TriCoreInstrInfoTriCoreLoadStoreOpt

Post-AllocationPasses

Listwith physical

registers

TriCoreVirtInstrResolver

A TriCore Backend for LLVM (November 20, 2009) Design and Implementation of the Backend 10 – 25

Page 13: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

TableGen One tool to rule them all...

Problem

Backend contains large portions of descriptive data

C++ obviously not suitable

TableGen

Language for domain-specific modelling

Similar to object-oriented approach:

Classes, records (objects), attributesInheritance

Definition files (.td) preprocessed by tblgen tool→ Auto-generation of C++ code

Used for description of:

Subtargets, registersCalling conventionsInstruction set

A TriCore Backend for LLVM (November 20, 2009) Design and Implementation of the Backend 11 – 25

Page 14: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

TableGen One tool to rule them all...

Problem

Backend contains large portions of descriptive data

C++ obviously not suitable

TableGen

Language for domain-specific modelling

Similar to object-oriented approach:

Classes, records (objects), attributesInheritance

Definition files (.td) preprocessed by tblgen tool→ Auto-generation of C++ code

Used for description of:

Subtargets, registersCalling conventionsInstruction set

A TriCore Backend for LLVM (November 20, 2009) Design and Implementation of the Backend 11 – 25

Page 15: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

SelectionDAG Construction Largely automated

Directed acyclic graph

Per basic block

Nodes: instructions

Edges:

Data dependenciesControl flow dependencies

Example

%mul = mul i32 %a, %a

%mul4 = mul i32 %b, %b

%add = add nsw i32 %mul4, %mul

ret i32 %add

isel input for euclidSquare:entry

EntryToken

0xa8321e8

ch

Register %reg1024

0xa832900

i32

Register %reg1025

0xa832988

i32

Register %D2

0xa8327f0

i32

   

CopyFromReg

0xa832a10

i32 ch

   

CopyFromReg

0xa832878

i32 ch

   

mul

0xa8325d0

i32

   

mul

0xa832548

i32

   

add

0xa832658

i32

     

CopyToReg

0xa8324c0

ch flag

   

TriCoreISD::RET_FLAG

0xa8326e0

ch

GraphRoot

Page 16: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

SelectionDAG Construction Largely automated

Directed acyclic graph

Per basic block

Nodes: instructions

Edges:

Data dependenciesControl flow dependencies

Example

%mul = mul i32 %a, %a

%mul4 = mul i32 %b, %b

%add = add nsw i32 %mul4, %mul

ret i32 %add

isel input for euclidSquare:entry

EntryToken

0xa8321e8

ch

Register %reg1024

0xa832900

i32

Register %reg1025

0xa832988

i32

Register %D2

0xa8327f0

i32

   

CopyFromReg

0xa832a10

i32 ch

   

CopyFromReg

0xa832878

i32 ch

   

mul

0xa8325d0

i32

   

mul

0xa832548

i32

   

add

0xa832658

i32

     

CopyToReg

0xa8324c0

ch flag

   

TriCoreISD::RET_FLAG

0xa8326e0

ch

GraphRoot

Page 17: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Troubles The integer vs. pointer problem

Problem

TriCore strictly distinguishes between addresses and data integers

Have to be put into separate register files → calling conventions!

LLVM’s backend framework treats pointers just like integers...

Solution

Annotation of “pointer / no pointer” flag in value type class

Promotion of this flag throughout the DAG construction phase(required some hacks...)

Case differentiations in all relevant situations

A TriCore Backend for LLVM (November 20, 2009) Design and Implementation of the Backend 13 – 25

Page 18: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Troubles The integer vs. pointer problem

Problem

TriCore strictly distinguishes between addresses and data integers

Have to be put into separate register files → calling conventions!

LLVM’s backend framework treats pointers just like integers...

Solution

Annotation of “pointer / no pointer” flag in value type class

Promotion of this flag throughout the DAG construction phase(required some hacks...)

Case differentiations in all relevant situations

A TriCore Backend for LLVM (November 20, 2009) Design and Implementation of the Backend 13 – 25

Page 19: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Instruction Selection Largely auto-generated

isel input for euclidSquare:entry

EntryToken

0xa8321e8

ch

Register %reg1024

0xa832900

i32

Register %reg1025

0xa832988

i32

Register %D2

0xa8327f0

i32

   

CopyFromReg

0xa832a10

i32 ch

   

CopyFromReg

0xa832878

i32 ch

   

mul

0xa8325d0

i32

   

mul

0xa832548

i32

   

add

0xa832658

i32

     

CopyToReg

0xa8324c0

ch flag

   

TriCoreISD::RET_FLAG

0xa8326e0

ch

GraphRoot

Pattern matching→

def MULrr2 : Rr2Instr<0x0a,

(outs DR:$c), (ins DR:$a, DR:$b),

"mul\t$c, $a, $b",

[(set DR:$c, (mul DR:$a, DR:$b))]>;

scheduler input for euclidSquare:entry

EntryToken

0xa8321e8

ch

Register %reg1024

0xa832900

i32

Register %reg1025

0xa832988

i32

Register %D2

0xa8327f0

i32

   

CopyFromReg

0xa832a10

i32 ch

   

CopyFromReg

0xa832878

i32 ch

   

MULrr2

0xa832548

i32

     

MADDrrr2

0xa832658

i32

     

CopyToReg

0xa8324c0

ch flag

   

RETsys

0xa8326e0

ch

GraphRoot

Page 20: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Instruction Selection Largely auto-generated

isel input for euclidSquare:entry

EntryToken

0xa8321e8

ch

Register %reg1024

0xa832900

i32

Register %reg1025

0xa832988

i32

Register %D2

0xa8327f0

i32

   

CopyFromReg

0xa832a10

i32 ch

   

CopyFromReg

0xa832878

i32 ch

   

mul

0xa8325d0

i32

   

mul

0xa832548

i32

   

add

0xa832658

i32

     

CopyToReg

0xa8324c0

ch flag

   

TriCoreISD::RET_FLAG

0xa8326e0

ch

GraphRoot

Pattern matching→

def MULrr2 : Rr2Instr<0x0a,

(outs DR:$c), (ins DR:$a, DR:$b),

"mul\t$c, $a, $b",

[(set DR:$c, (mul DR:$a, DR:$b))]>;

scheduler input for euclidSquare:entry

EntryToken

0xa8321e8

ch

Register %reg1024

0xa832900

i32

Register %reg1025

0xa832988

i32

Register %D2

0xa8327f0

i32

   

CopyFromReg

0xa832a10

i32 ch

   

CopyFromReg

0xa832878

i32 ch

   

MULrr2

0xa832548

i32

     

MADDrrr2

0xa832658

i32

     

CopyToReg

0xa8324c0

ch flag

   

RETsys

0xa8326e0

ch

GraphRoot

Page 21: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Scheduling & Register Allocation Target-independent algorithms

Scheduling

DAGs → list (SSA form)

Target-independent algorithm using data from the instructiondescription table

Register Allocation

Virtual registers → physical registers

SSA deconstruction

Target-independent colouring algorithm using the registerinformation table

A TriCore Backend for LLVM (November 20, 2009) Design and Implementation of the Backend 15 – 25

Page 22: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Scheduling & Register Allocation Target-independent algorithms

Scheduling

DAGs → list (SSA form)

Target-independent algorithm using data from the instructiondescription table

Register Allocation

Virtual registers → physical registers

SSA deconstruction

Target-independent colouring algorithm using the registerinformation table

A TriCore Backend for LLVM (November 20, 2009) Design and Implementation of the Backend 15 – 25

Page 23: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Final passes Handwork

Virtual Instruction Resolution

Some instructions (e. g., moves) had operands of unknownregister classes at the time of their creation

Now that physical registers have been assigned, these instructionscan be resolved

Pre-/Epilogue Insertion

Insertion of pre-/epilogue code to entry/exits of all functions

Virtual stack slots → physical stack frame references

A TriCore Backend for LLVM (November 20, 2009) Design and Implementation of the Backend 16 – 25

Page 24: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Final passes Handwork

Virtual Instruction Resolution

Some instructions (e. g., moves) had operands of unknownregister classes at the time of their creation

Now that physical registers have been assigned, these instructionscan be resolved

Pre-/Epilogue Insertion

Insertion of pre-/epilogue code to entry/exits of all functions

Virtual stack slots → physical stack frame references

A TriCore Backend for LLVM (November 20, 2009) Design and Implementation of the Backend 16 – 25

Page 25: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Code Emission Partly auto-generated

Peephole Optimization

Merging of two subsequent 32-bit loads/stores into a single 64-bitload/store

# Before:st.w [%a10]4, %d9st.w [%a10]0, %d8

# After:st.d [%a10]0, %e8

Assembly Printing

Output of assembly code in text form

Large parts auto-generated from the instruction description table

A TriCore Backend for LLVM (November 20, 2009) Design and Implementation of the Backend 17 – 25

Page 26: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Code Emission Partly auto-generated

Peephole Optimization

Merging of two subsequent 32-bit loads/stores into a single 64-bitload/store

# Before:st.w [%a10]4, %d9st.w [%a10]0, %d8

# After:st.d [%a10]0, %e8

Assembly Printing

Output of assembly code in text form

Large parts auto-generated from the instruction description table

A TriCore Backend for LLVM (November 20, 2009) Design and Implementation of the Backend 17 – 25

Page 27: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

In practice How do I use it?

Required software

Clang compiler frontend (support has been integrated)

GNU Binutils for TriCore:

AssemblerLinker

Headers and libraries from TriCore-GCC

Small Perl wrapper script

A TriCore Backend for LLVM (November 20, 2009) Evaluation & Conclusion 18 – 25

Page 28: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Comparison to GCC

Criteria

1. Compilation speed

2. Code size

3. Code performance

Testing system

Benchmark application: CoreMark

Compilation PC: Core 2 Quad Q6600 (2.40 GHz)

Runtime system: TC1796 board (40 MHz)

A TriCore Backend for LLVM (November 20, 2009) Evaluation & Conclusion 19 – 25

Page 29: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Comparison to GCC

Criteria

1. Compilation speed

2. Code size

3. Code performance

Testing system

Benchmark application: CoreMark

Compilation PC: Core 2 Quad Q6600 (2.40 GHz)

Runtime system: TC1796 board (40 MHz)

A TriCore Backend for LLVM (November 20, 2009) Evaluation & Conclusion 19 – 25

Page 30: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

The CoreMark Benchmark

Alternative to the well-known Dhrystone benchmark

C source code publicly available (albeit not open source software)

Easily portable

Prevents compilers from “cheating” by optimizing away unusedcomputation results

Operations:

Linked list processingMatrix manipulationState machine operationsCRC computation

Results can be validated

A TriCore Backend for LLVM (November 20, 2009) Evaluation & Conclusion 20 – 25

Page 31: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Compilation Time

0

50

100

150

200

250

300

350

400

450

500

-O0 -O1 -O2 -O3 -Os

Com

pila

tion

time

(in m

illise

cond

s)

Optimization level

GCCLLVM

Takes about 10 % less time than GCC

Even faster when compiling at -O0

A TriCore Backend for LLVM (November 20, 2009) Evaluation & Conclusion 21 – 25

Page 32: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Code Size

0

5

10

15

20

25

30

-O0 -O1 -O2 -O3 -Os

Size

of t

he te

xt s

egm

ent (

in K

iB)

Optimization level

GCCLLVM

Generates slightly smaller code

A TriCore Backend for LLVM (November 20, 2009) Evaluation & Conclusion 22 – 25

Page 33: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Code Performance

0

5

10

15

20

25

30

35

40

-O0 -O1 -O2 -O3 -Os

Itera

tions

per

sec

ond

Optimization level

GCCLLVM

Code 12–20 % slower than the code generated by GCC

Further work needed to become fully competitive

A TriCore Backend for LLVM (November 20, 2009) Evaluation & Conclusion 23 – 25

Page 34: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

Summary

All of the basic functionality is there and is working reliably

Space for further optimizations and extensions

First TriCore compiler to be released under a BSD-style licence!

End goal: inclusion into LLVM’s repository

A TriCore Backend for LLVM (November 20, 2009) Evaluation & Conclusion 24 – 25

Page 35: Design and Implementation of a TriCore Backend for the LLVM … · Register Allocation Virtual registers ! physical registers SSA deconstruction Target-independent colouring algorithm

The End

Any Questions?

A TriCore Backend for LLVM (November 20, 2009) Evaluation & Conclusion 25 – 25