Christophe HURIAUXpeople.rennes.inria.fr/Christophe.Huriaux/static/huriaux-cst-defense.p… · 24...


Mid-term Evaluation March 19th, 2015

March 19th, 2015 Christophe Huriaux — Mid-term Evaluation - 1

Christophe HURIAUX

Embedded Reconfigurable Hardware Accelerators with Efficient Dynamic Reconfiguration

Accélérateurs matériels reconfigurables embarqués avec reconfiguration dynamique efficace

Outline

§  Introduction
   §  Thesis context: FlexTiles in a nutshell
   §  Relocation: State of the Art
   §  Challenges
§  Contributions
   §  Hardware
   §  Architecture
   §  CAD tools
§  Side Activities
§  Conclusion & Ongoing Work

Context: FlexTiles in a nutshell

§  FlexTiles: Self adaptive heterogeneous manycore based on Flexible Tiles
§  Goal: provide a heterogeneous manycore architecture offering
   §  High flexibility
   §  High performance and energy efficiency
   §  Improved programming efficiency
   §  Self-adaptation through virtualization

Context: FlexTiles in a nutshell

§  3D-stacked heterogeneous manycore
   §  General Purpose Processors (GPP) for flexibility and programming homogeneity
   §  Network-on-Chip
   §  Dedicated hardware accelerators mapped at run-time on a reconfigurable layer
§  Reconfigurable layer with seamless task migration capabilities
§  Virtualization layer to provide an abstraction of the manycore and self-adaptive services
§  Tool-chain for parallelization and compilation

[Figure labels: 3D interface to the NoC, DSP blocks, Memory blocks]

State of the Art: Industry

§  Predefined reconfigurable regions [Altera2010][Xilinx2013]
§  Bit-stream depends on task location
§  LUTs are used as interfaces with the static logic

[Figure: fabric surrounded by I/O blocks; HW Accelerator #1 shown at two locations, with bit-streams BS #1 and BS #2]

State of the Art: Academic

§  Online rewriting of parts of the bit-stream [Horta2001][Kalte2006]
   §  Time consuming, limited flexibility
§  Offline computation of possible differences [Touiza2012][Beckhoff2014]
   §  Memory consuming
§  Online place and route [Lysecky2004]
   §  Time and memory consuming
§  No work on heterogeneous relocation!

Challenges

§  Position-independent tasks
   §  Simple algorithms
   §  No predefined configuration domains
§  Cope with heterogeneity
   §  Ease of resource sharing and distribution
   §  How to move a task around the logic fabric?
§  Dedicated CAD tool-flow
   §  Needed to validate the other contributions

Contributions

[Figure: contributions around the eFPGA, in three areas: Hardware, Architecture, CAD. Labels: Routing, Reconf. Mem., Logic array, Controller, Placement, Routing, Bitstream, RTL generation, Arch. model, Virtual Bit-Stream, Reconf. Algorithm]

Contributions: Hardware

§  Homogeneous case
   §  No constraint on task placement
   §  Regular routing architecture
§  Cope with heterogeneity
   §  RAM, DSP, 3D I/Os
   §  Migration is limited
      §  vertically, within the same column
      §  to the next column containing the same complex blocks

[Figure legend: Task, Configured LE, Logic Element (LE)]
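The migration constraint above can be illustrated with a small sketch. It assumes a column-based fabric in which every column holds a single block type, and it treats a horizontal move as legal only when the task lands on the same sequence of column types. The fabric layout and the function name `legal_column_offsets` are hypothetical; the slides do not give the actual floorplan.

```python
from typing import List

def legal_column_offsets(column_types: List[str], start: int, width: int) -> List[int]:
    """Return every start column where the task's column-type pattern repeats."""
    pattern = column_types[start:start + width]
    return [
        c for c in range(len(column_types) - width + 1)
        if column_types[c:c + width] == pattern
    ]

# Hypothetical fabric: logic elements (LE) with periodic RAM and DSP columns.
fabric = ["LE", "LE", "RAM", "LE", "LE", "DSP",
          "LE", "LE", "RAM", "LE", "LE", "DSP"]

# A task spanning columns 1..3 (LE, RAM, LE) can only move to the next
# place where a RAM column sits between two LE columns.
print(legal_column_offsets(fabric, 1, 3))   # [1, 7]

# A logic-only task spanning two LE columns has more candidate positions.
print(legal_column_offsets(fabric, 0, 2))   # [0, 3, 6, 9]
```

Vertical moves are not modelled here because, under this assumption, they never change the column types a task sees.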

Contributions: Hardware

§  Routing of heterogeneous blocks is abstracted from logic routing
§  Long lines allow a trade-off between placement flexibility and routing complexity
§  A two-level routing is performed at run-time:
   §  Logic routing (as in the homogeneous case)
   §  Heterogeneous block routing through long lines

Contributions: Hardware

§  Increases the flexibility of task placement
§  Implemented in a modified version of Versatile Place & Route (VPR)
§  Evaluation on critical path delay and required routing resources:
   §  Only 2% delay increase on average
   §  1.8x increase in routing resources (a specialized routing algorithm is needed for a fairer use)
§  Dissemination
   §  FPL’14 [Huriaux2014]

Contributions: Architecture

§  A task is synthesized, placed and routed into a Virtual Bit-Stream (VBS)
   §  Independent of the task's physical location in the fabric
   §  No predefined configuration domains
§  A reconfiguration controller generates the final bit-stream (BS) at run-time

Contributions: Architecture

§  Island-style FPGA
   §  Logic grid
   §  Mesh routing lines
   §  Switch boxes
   §  Interconnect
§  The VBS encodes each island separately
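A minimal sketch of what per-island encoding buys: if each island's configuration is stored against task-relative coordinates, placing the task elsewhere only changes the coordinates, not the island payloads. The data layout below (a dictionary of islands, each holding a list of node-to-node connections) and the names `VirtualBitStream` and `place` are assumptions made for illustration, not the format used in the thesis.

```python
from typing import Dict, List, Tuple

Connection = Tuple[int, int]   # (source node, sink node), island-local indices
Island = Tuple[int, int]       # (column, row) of an island
VirtualBitStream = Dict[Island, List[Connection]]

def place(vbs: VirtualBitStream, origin: Island) -> VirtualBitStream:
    """Map task-relative island coordinates to absolute fabric coordinates."""
    ox, oy = origin
    return {(x + ox, y + oy): list(conns) for (x, y), conns in vbs.items()}

# Hypothetical two-island task, described relative to its own corner.
task = {(0, 0): [(20, 8), (1, 9)], (1, 0): [(5, 18)]}

# The same description serves any target location in the fabric.
print(place(task, (4, 2)))
```

The island payloads are copied unchanged, which is the property a position-independent encoding relies on in this sketch.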

Contributions: Architecture

§  Each routing node consists of 6 or 3 transistors
§  The bit-stream is the state of each transistor
   §  123 bits in this example

[Figure: island routing example with nodes numbered 0 to 20]

Contributions: Architecture

§  The VBS abstracts the inner details of the routing
§  The routes are encoded as a list of connections:
   §  (20 ; 8)
   §  (1 ; 9)
   §  (5 ; 18)
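To see why a connection list can be compact, the sketch below packs each pair as two fixed-width node indices. The fixed-width packing and the function name `encode_connections` are assumptions; the slides do not describe the real VBS layout, which presumably also carries headers and counts that are ignored here.

```python
from math import ceil, log2
from typing import List, Tuple

def encode_connections(connections: List[Tuple[int, int]], node_count: int) -> str:
    """Pack each (source, sink) pair as two fixed-width binary node indices."""
    width = ceil(log2(node_count))   # bits needed to name one node
    return "".join(f"{a:0{width}b}{b:0{width}b}" for a, b in connections)

routes = [(20, 8), (1, 9), (5, 18)]               # connection list from the slide
bits = encode_connections(routes, node_count=21)  # nodes 0..20 in the figure
print(len(bits), "bits for the connection list")  # 3 pairs x 2 indices x 5 bits
```

Under these assumptions the three routes take 30 bits, against the 123 bits quoted for the raw configuration of the same example. The comparison is only indicative, since the payload of the actual format is not shown.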

[Figure: island routing example with nodes numbered 0 to 20]

Contributions: Architecture

§  The VBS encoding is position independent
§  The final bit-stream can be calculated from the VBS for a differently routed network
§  The online decoding algorithm is simple since the global routing has been determined offline
§  The resulting VBS is 2.5x smaller than the equivalent raw bit-stream
   §  Up to 10x smaller using clusters of islands
§  Dissemination:
   §  DATE’15 [Huriaux2015]
   §  Patent [Sentieys2014]
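The reduction factors above and the percentages quoted in the DATE’15 backup slides (40% average ratio, 10% with clusters) appear to be the same quantities expressed two ways, assuming the ratio is defined as VBS size over raw bit-stream size. The helper `size_ratio` is a hypothetical name used only for this check.

```python
def size_ratio(reduction_factor: float) -> float:
    """VBS size as a fraction of the raw bit-stream size."""
    return 1.0 / reduction_factor

print(f"{size_ratio(2.5):.0%}")   # 2.5x smaller, i.e. 40% of the raw size
print(f"{size_ratio(10):.0%}")    # 10x smaller, i.e. 10% of the raw size
```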

Contributions: CAD tools

§  Based on the Verilog-To-Routing (VTR) framework
   §  Allows describing any island-style architecture and performing place and route operations
§  Uses Versatile Place and Route (VPR)
   §  Widely used for academic FPGA architecture research
§  A custom backend reads the placement and routing data to generate Virtual Bit-Streams

Contributions: CAD tools

[Figure: CAD flow. Stages: High-level Synthesis, HDL Synthesis, Technology mapping, Placer, Router, Bitstream generation. Data: High-level task description, RTL task description, HDL task description, Flat logic netlist, Mapped logic netlist, Placement data, Routing data, Arch. description, Arch. netlist, Virtual bit-stream]

Side Activities

§  Teaching
   §  64h IUT (analog electronics, computer engineering)
   §  64h+64h ENSSAT (analog electronics, digital systems)
§  Courses
   §  Scientific: 96h
   §  General: 46h
§  3-month mobility at the University of Massachusetts Amherst (USA) with Prof. Russell Tessier (Summer 2014)
   §  Publication on FPGA Trojans [Swierczynski2015]

Conclusion & Ongoing Work

§  Summary
   §  Proposed a routing architecture to provide more flexibility for heterogeneous relocation
   §  Introduced the concept of a position-independent and compressed task bit-stream: the Virtual Bit-Stream (VBS)
   §  Developed the associated tool-flow to generate the VBS
   §  Elaborated an RTL model of the whole architecture
§  Ongoing work
   §  Enhance the configuration method
   §  Dissemination on the CAD tools (ICCAD)
   §  Journal extension(s)

Q&A

Thank you

Questions?

References

[Altera2010] Increasing Design Functionality with Partial and Dynamic Reconfiguration in 28-nm FPGAs, Altera Corporation, 2010.
[Beckhoff2014] C. Beckhoff, D. Koch, and J. Torresen, Portable Module Relocation and Bitstream Compression for Xilinx FPGAs, in the Proceedings of the 24th conference of Field Programmable Logic, pp. 30–30.
[Horta2001] E. Horta and J. W. Lockwood, PARBIT: a tool to transform bitfiles to implement partial reconfiguration of field programmable gate arrays (FPGAs), Tech. Rep. WUCS-01-13, Washington University, 2001.
[Huriaux2014] C. Huriaux, O. Sentieys, and R. Tessier, FPGA Architecture Support for Heterogeneous, Relocatable Partial Bitstreams, in the Proceedings of the 24th conference of Field Programmable Logic, pp. 30–30.
[Huriaux2015] C. Huriaux, A. Courtay, and O. Sentieys, Design Flow and Run-Time Management for Compressed FPGA Configurations, in the Proceedings of the 18th DATE conference, to appear.
[Kalte2006] H. Kalte and M. Porrmann, REPLICA2Pro: Task Relocation by Bitstream Manipulation in Virtex-II/Pro FPGAs, in the Proceedings of the 3rd conference on computing frontiers (CF). ACM, 2006, pp. 403–412.
[Lysecky2004] R. Lysecky, F. Vahid, and S. X.-D. Tan, Dynamic FPGA routing for just-in-time FPGA compilation, in the Proceedings of the 41st Design Automation Conference, 2004, pp. 954–959.
[Sentieys2014] O. Sentieys, A. Courtay, C. Huriaux, and S. Pillement, Method and Device for Programming an FPGA, EU Patent, filed Jan. 2014.
[Swierczynski2015] P. Swierczynski, M. Fyrbiak, C. Paar, C. Huriaux, and R. Tessier, Protecting against Cryptographic Trojans in FPGAs, in the Proceedings of the 23rd IEEE International Symposium on Field-Programmable Custom Computing Machines, 2015, to appear.
[Touiza2012] M. Touiza, G. Ochoa-Ruiz, E.-B. Bourennane, A. Guessoum, and K. Messaoudi, A novel methodology for accelerating bitstream relocation in partially reconfigurable systems, Microprocessors and Microsystems, vol. 37, no. 3, pp. 358–372, 2012.
[Xilinx2013] Partial Reconfiguration User Guide, UG702, Xilinx, Inc., 2013.

FPL’14: Results

§  Architecture based on a simplified Stratix IV with:
   §  Dual-port 144k memories
   §  Fracturable 36x36 multipliers
§  Evaluation on two criteria
   §  Delay of the critical path
   §  Minimum channel width
      §  Number of tracks in the homogeneous routing channels
§  Minimum channel width determined by VPR
   §  Not directly related to silicon area

September 3rd, 2014 C. Huriaux, O. Sentieys and R. Tessier - 25

FPL’14: Results

§  Benchmark set: VTR framework circuits [1]

Circuit             # Mem   # Mult    # LB
bgm                     0       11   2,174
boundtop                1        0   2,977
ch_intrinsics           1        0     272
diffeq1                 0        5      41
diffeq2                 0        5      43
LU8PEEng               45        8      30
mkDelayWorker32B       41        0     497
mkPktMerge             15        0      17
mkSMAdapter4B           5        0     181
or1200                  2        1     273
raygentop               1        7     192
stereovision1           0       38     990

[1] Rose, Jonathan, Luu, Jason, Yu, Chi Wai, et al. The VTR project: architecture and CAD for FPGAs from Verilog to routing. In Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays. ACM, 2012. p. 77-86.

FPL’14: Results: Delay

§  Estimation of the worst-case delay
   §  Impossible to predict where connections to long lines will be made
   §  Some channels crossing fixed-function blocks are longer

FPL’14: Results: Delay

§  Only 2% delay increase on average

[Figure: critical path delay per circuit. Axes: delay in ns (0 to 160) and proposed/classic ratio (0 to 1.2). Series: Crit. Path (classic), Crit. Path (enhanced), Crit. Path (ratio)]

FPL’14: Results: Min. Channel Width

§  1.8x channel width increase on average
§  Need for specific routing algorithms to deal with the heterogeneous interconnection network

[Figure: minimum channel width per circuit. Axes: number of tracks (0 to 160) and proposed/classic ratio (0 to 4.5). Series: min W (classic), min W (enhanced), min W (ratio)]

DATE’15: Results

§  Benchmark
   §  20 biggest MCNC designs
§  Avg. compression ratio: 40%

[Figure: Bit-stream size comparison. x-axis: circuit (apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584.1, seq, spla, tseng). Axes: size in Kbit (ticks at 100, 1000, 10000) and compression ratio (0% to 100%). Series: BS, VBS, Ratio VBS/BS]

DATE’15: Results

§  Compression ratio down to 10% using clusters (VBS up to 10x smaller than the raw bit-stream)

[Figure: VBS size versus cluster size. x-axis: cluster size (1 to 10). Axes: VBS size in Kbit (0 to 1000) and compression ratio (0% to 100%). Series: Size (min/max), Size (avg), Compression (avg)]