Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design...

27
Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski 1 , Krzysztof Kuchcinski 2 , Kevin Martin 3 , Erwan Raffin 3,4 and François Charot 3 1 Rennes University I/IRISA, France 2 Dept. of Computer Science, Lund University, Sweden 3 INRIA, Centre Rennes-Bretagne Atlantique, France 4 Thomson R&D, Rennes, France Wolinski & Kuchcinski 1(19)

Transcript of Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design...

Page 1: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Design of Processor Accelerators with Constraints

Christophe Wolinski1, Krzysztof Kuchcinski2,Kevin Martin3, Erwan Raffin3,4 and François Charot3

1Rennes University I/IRISA, France2Dept. of Computer Science, Lund University, Sweden

3INRIA, Centre Rennes-Bretagne Atlantique, France

4Thomson R&D, Rennes, FranceWolinski & Kuchcinski 1(19)

Page 2: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Problem Definition

Data-path

Processor core

x

in

out

x

in

out

in

+

in

out

+

in

out

in

out

/* Sample C code */ void fir(const int x[], const int h[], int y[]) { int i, j, sum; for (j = 0; j < 100; j=j+1) { sum = 0; for (i = 0; i < 8; i=i+1) sum += x[i + j] * h[i]; sum = sum >> 15; y[j] = sum; } }

Compilation

Wolinski & Kuchcinski 2(19)

Page 3: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

ASIP

Reg1 Reg2 RegN

InterconnectionsData-path

Processor core

Reconfigurablecell

in

x

in

xin

+

in

+

+ x x out

out out out

x

in

out

x

in

out

in

+

in

out

+

in

out

in

out

. . .

ASIP

Reconfigurablecell

m

m

*

+

in

out

+

*

in

+

in

in ctrl 0

ctrl 1

0

10 2

1

application specificinstructionsreconfigurable unitsbetter performanceand lower poweretc.

Wolinski & Kuchcinski 3(19)

Page 4: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Main Problems

Identification of computational patterns for instructionsSelection of a subset of instructions for implementationSequential or parallel execution scenariosPattern merging to build reconfigurable cell

Goal:Get speed-up of an application with minimal hardware cost.

Wolinski & Kuchcinski 4(19)

Page 5: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Main Problems

Identification of computational patterns for instructionsSelection of a subset of instructions for implementationSequential or parallel execution scenariosPattern merging to build reconfigurable cell

Goal:Get speed-up of an application with minimal hardware cost.

Wolinski & Kuchcinski 4(19)

Page 6: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Our Design Flow

Hardware Software

GECOS compiler framework

Pattern generation

Graph coveringScheduling

C Front-end

HCDG builder Dataflow analyser

PolyhedraltransformationsSelection

CDFG

HCDGHCDG

Polyhedric HCDG

HCDG

Architecture model

HCDG

DIPS

Pattern merging

SystemC, VHDL generator

Synthesis GCC NIOS

Program generator

SPS NIS

merged patterns

VHDL

Modified C

Assembly Code

DURASE flow

Extensionbitstream

C program

Target Instruction Set

ExtensionCABA Model

SystemC

CP-based methods:Pattern GenerationGraph covering andSchedulingPattern merging

Wolinski & Kuchcinski 5(19)

Page 7: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

CP Solution

JaCoP.graph constraints(Sub-)graph isomorphism constraintsClique constraintsSimple pathetc.

Other standard constraints (number of inputs/outputs, criticalpath, etc.)

Methods based on constraintsPattern generation- purely based on constraintMatch identificationPattern selection and schedulingPattern merging

Wolinski & Kuchcinski 6(19)

Page 8: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

CP Solution

JaCoP.graph constraints(Sub-)graph isomorphism constraintsClique constraintsSimple pathetc.

Other standard constraints (number of inputs/outputs, criticalpath, etc.)Methods based on constraints

Pattern generation- purely based on constraintMatch identificationPattern selection and schedulingPattern merging

Wolinski & Kuchcinski 6(19)

Page 9: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Generation

n0 n1 n2

nsn3

n4

n5

n6 n7

n8

Pattern 3 inputs2 outputs

allsucc(Ns) = {n4,n5,n6,n7,n8}

Seed node

∀n ∈ Np ∧ n 6= ns ∃path(Pns , n, ns)

∀n ∈ (N − (allsucc(ns) ∪ ns)) : nsel = 1 ⇒X

m∈succ(n)

msel ≥ 1

∀n ∈ (N − (allsucc(ns) ∪ ns)) :X

m∈succ(n)

msel = 0⇒ nsel = 0

∀n ∈ allsucc(ns) : nsel = 1 ⇒X

m∈(pred(n)∩(allsucc(ns )∪ns ))

msel ≥ 1

∀n ∈ allsucc(ns) :X

m∈(pred(n)∩(allsucc(ns )∪ns ))

msel = 0 ⇒ nsel = 0

Wolinski & Kuchcinski 7(19)

Page 10: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Generation

n0 n1 n2

nsn3

n4

n5

n6 n7

n8

Pattern 3 inputs2 outputs

allsucc(Ns) = {n4,n5,n6,n7,n8}

Seed node

∀n ∈ Np ∧ n 6= ns ∃path(Pns , n, ns)

∀n ∈ (N − (allsucc(ns) ∪ ns)) : nsel = 1 ⇒X

m∈succ(n)

msel ≥ 1

∀n ∈ (N − (allsucc(ns) ∪ ns)) :X

m∈succ(n)

msel = 0⇒ nsel = 0

∀n ∈ allsucc(ns) : nsel = 1 ⇒X

m∈(pred(n)∩(allsucc(ns )∪ns ))

msel ≥ 1

∀n ∈ allsucc(ns) :X

m∈(pred(n)∩(allsucc(ns )∪ns ))

msel = 0 ⇒ nsel = 0

Wolinski & Kuchcinski 7(19)

Page 11: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Generation

n0 n1 n2

nsn3

n4

n5

n6 n7

n8

Pattern 3 inputs2 outputs

allsucc(Ns) = {n4,n5,n6,n7,n8}

Seed node

∀n ∈ Np ∧ n 6= ns ∃path(Pns , n, ns)

∀n ∈ (N − (allsucc(ns) ∪ ns)) : nsel = 1 ⇒X

m∈succ(n)

msel ≥ 1

∀n ∈ (N − (allsucc(ns) ∪ ns)) :X

m∈succ(n)

msel = 0⇒ nsel = 0

∀n ∈ allsucc(ns) : nsel = 1 ⇒X

m∈(pred(n)∩(allsucc(ns )∪ns ))

msel ≥ 1

∀n ∈ allsucc(ns) :X

m∈(pred(n)∩(allsucc(ns )∪ns ))

msel = 0 ⇒ nsel = 0

Wolinski & Kuchcinski 7(19)

Page 12: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Generation

n0 n1 n2

nsn3

n4

n5

n6 n7

n8

Pattern 3 inputs2 outputs

allsucc(Ns) = {n4,n5,n6,n7,n8}

Seed node

∀n ∈ Np ∧ n 6= ns ∃path(Pns , n, ns)

∀n ∈ (N − (allsucc(ns) ∪ ns)) : nsel = 1 ⇒X

m∈succ(n)

msel ≥ 1

∀n ∈ (N − (allsucc(ns) ∪ ns)) :X

m∈succ(n)

msel = 0⇒ nsel = 0

∀n ∈ allsucc(ns) : nsel = 1 ⇒X

m∈(pred(n)∩(allsucc(ns )∪ns ))

msel ≥ 1

∀n ∈ allsucc(ns) :X

m∈(pred(n)∩(allsucc(ns )∪ns ))

msel = 0 ⇒ nsel = 0

Wolinski & Kuchcinski 7(19)

Page 13: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Generation (cont’d)

DIPS ← ∅for each ns ∈ N

TPS ← ∅CPS ← FindAllPatterns(G, ns)for each p ∈ CPS

if ∀pattern ∈ TPS : p 6≡ patternTPS ← TPS ∪ {p},NMPp ← | FindAllMatches(G, p) |

NMPns ← | FindAllMatches(G, ns) |for each p ∈ TPS

if coef · NMPn ≤ NMPpDIPS ← DIPS ∪ {p}

return DIPS

Wolinski & Kuchcinski 8(19)

Page 14: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Selection

*0

in

+8

*1

in

*2

in

+9

*3

in

*4

in

+10

*5

in

*6

in

+11

*7

in

+26

+12 +13

+27

*14 *16 *17*15

+18 +19

*22 *20 *21*23

+24+25

outout

+

out

+

in *

in

+

in *

in

+

out

+

*

in

+

in*

in

+

out *

*

out

*

in+

out

in

+

* *

*

in

out

*

in

out +

* out

in *

out

+

out

+

in

*

+

in

out

*

in

*

out

in

+

out

in

+

out

in

+

out

in

// Inputs: G=(N,E)-- application graph,// DIPS-- Definitively Identified Pattern Set// Mp-- set of matches for pattern p,// M-- set of all matches,// matchesn-- set of matches that could cover the node n,

M ← ∅for each p ∈ DIPS

Mp ← FindAllMatches(G, p)M ← M ∪ Mp

for each m ∈ Mfor each n ∈ m

matchesn ← matchesn ∪ {m}

m0 m1 m9n0n

*

all matches

all n

od

es n2

m2m3 m4m5 m6m7m8

n5

n3n4

n1

n6

* * * * * *

***

*** **

*

Wolinski & Kuchcinski 9(19)

Page 15: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Selection

*0

in

+8

*1

in

*2

in

+9

*3

in

*4

in

+10

*5

in

*6

in

+11

*7

in

+26

+12 +13

+27

*14 *16 *17*15

+18 +19

*22 *20 *21*23

+24+25

outout

+

out

+

in *

in

+

in *

in

+

out

+

*

in

+

in*

in

+

out *

*

out

*

in+

out

in

+

* *

*

in

out

*

in

out +

* out

in *

out

+

out

+

in

*

+

in

out

*

in

*

out

in

+

out

in

+

out

in

+

out

in

// Inputs: G=(N,E)-- application graph,// DIPS-- Definitively Identified Pattern Set// Mp-- set of matches for pattern p,// M-- set of all matches,// matchesn-- set of matches that could cover the node n,

M ← ∅for each p ∈ DIPS

Mp ← FindAllMatches(G, p)M ← M ∪ Mp

for each m ∈ Mfor each n ∈ m

matchesn ← matchesn ∪ {m}

m0 m1 m9n0n

*

all matches

all n

od

es n2

m2m3 m4m5 m6m7m8

n5

n3n4

n1

n6

* * * * * *

***

*** **

*

Wolinski & Kuchcinski 9(19)

Page 16: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Selection

*0

in

+8

*1

in

*2

in

+9

*3

in

*4

in

+10

*5

in

*6

in

+11

*7

in

+26

+12 +13

+27

*14 *16 *17*15

+18 +19

*22 *20 *21*23

+24+25

outout

+

out

+

in *

in

+

in *

in

+

out

+

*

in

+

in*

in

+

out *

*

out

*

in+

out

in

+

* *

*

in

out

*

in

out +

* out

in *

out

+

out

+

in

*

+

in

out

*

in

*

out

in

+

out

in

+

out

in

+

out

in

// Inputs: G=(N,E)-- application graph,// DIPS-- Definitively Identified Pattern Set// Mp-- set of matches for pattern p,// M-- set of all matches,// matchesn-- set of matches that could cover the node n,

M ← ∅for each p ∈ DIPS

Mp ← FindAllMatches(G, p)M ← M ∪ Mp

for each m ∈ Mfor each n ∈ m

matchesn ← matchesn ∪ {m}

m0 m1 m9n0n

*

all matches

all n

od

es n2

m2m3 m4m5 m6m7m8

n5

n3n4

n1

n6

* * * * * *

***

*** **

*

Wolinski & Kuchcinski 9(19)

Page 17: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Selection and Scheduling

Match selection- optimize execution time

ExecutionTime =∑m∈M

msel ·mdelay

Match delay defined by constraints

mdelay = δinm + δm + δoutm

Wolinski & Kuchcinski 10(19)

Page 18: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Scheduling Example- FIR filter

Integer mul

in in

add mul

in in

mul

in in

add

add mul

in in

add mul

in in

Integer

add mul

in in

mul

in in

shr

out out

add

add

add

mul

in in

add

in

out

mul

in in

mul

in in

out

Integer mul

in in

add

out

Integer

shr

out out

add

in add

in in

match M_18

match M_7

match M_6

match M_5

match M_4

match M_3

match M_20

Wolinski & Kuchcinski 11(19)

Page 19: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Scheduling Example (cont’d)

!

"

#

$

%

&

'

(

)

*

"!

""

"#

"$

"%

"&

"'

"(

")

"*

#!

+,-

+,-

./")

./")

01

./(

./(

01

./'

./'

01

./&

./&

01

./%

./%

01

./$

./$

01

./#!

01

2,3

01

01

2,3

01

01

2,3

456755

01

01

01

01

01

01

01

2,3

2,3

0182,3

9:3;1<021<

2,3

!"#$%&'(%&

)*+,

)*+,

)*-

)*-

)*.

)*.

)*/

)*/

)*0

)*0

)*1

)*1

'%

)*23

3

+

2

1

0

/

.

-

,

4

+3

++

+2

+1

+0

+/

'%

'%

'%

(5#

(5#

(5#

'%

(5#

'%

(5#

'%

(5#

'%

678977

'%:(5#

;5<

;5<

Wolinski & Kuchcinski 12(19)

Page 20: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Merging

+0

out

+1

*2

in

+3

*4

in

+5

+6

out

*7

in

in

in

+0/+5

outmux3

+1/+6

mux1

out

*2

in

+3

in in

mux0

*4/*7

in

in in

+0/+6

out

+1

*2

in

+3/+5

in in*4/*7

in

a) original strategy b) our method

Wolinski & Kuchcinski 13(19)

Page 21: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Merging

+0

out

+1

*2

in

+3

*4

in

+5

+6

out

*7

in

in

in

+0/+5

outmux3

+1/+6

mux1

out

*2

in

+3

in in

mux0

*4/*7

in

in in

+0/+6

out

+1

*2

in

+3/+5

in in*4/*7

in

a) original strategy b) our method

Wolinski & Kuchcinski 13(19)

Page 22: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Merging- compatibility graph

*2/*7

+3/+5 +3/+6

+1/+5 +1/+6

+0/+5 +0/+6

*4/*7

(*2,+1)/(*7,+6)

(+3,+0)/(+5,+6) (+1,+0)/(+5,+6)

(*4,+1)/(*7,+6) (*2,+1,+0)/(*7,+6) (*4,+1,+0)/(*7, +6)

Wolinski & Kuchcinski 14(19)

Page 23: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Pattern Merging- additional constraints

Critical path constraints

Delayu = Latency(u) · selu∀(u, v) ∈ E : Startu + Delayu ≤ Startv∀u ∈ Out : Startu + Delayu ≤ CPL

Number of multiplexers on critical path

Wolinski & Kuchcinski 15(19)

Page 24: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Results

Results obtained for MediaBench and MiBench benchmark setscompiled for NIOS target processor with DURASE system.

2 in / 1 out 4 in / 2 outmodel A model B model A model B

Benchmarks Nod

es

cycl

es

coef

iden

tified

sele

cted

cove

rage

cycl

es

spee

dup

sele

cted

cove

rage

cycl

es

spee

dup

coef

iden

tified

sele

cted

cove

rage

cycl

es

spee

dup

sele

cted

cove

rage

cycl

es

spee

dup

JPEG Write BMP Header 34 34 0 6 2 82% 14 2.42 2 82% 14 2.42 0 66 2 88% 12 2.83 3 88% 12 2.83JPEG Smooth Downsample 66 78 0 5 2 19% 68 1.14 2 19% 68 1.14 0 49 4 95% 44 1.77 4 100% 35 2.22JPEG IDCT 250 302 0.5 28 10 76% 214 1.41 10 76% 134 2.25 0.5 254 13 83% 141 2.36 15 89% 112 2.69EPIC Collapse 274 287 0 11 8 68% 165 1.74 8 68% 165 1.74 0 111 11 71% 156 1.83 14 71% 159 1.8BLOWFISH encrypt 201 169 0.5 11 3 74% 90 1.87 3 74% 90 1.87 0 153 8 90% 81 2.08 7 92% 73 2.31SHA transform 53 57 0 5 3 64% 28 2.03 3 64% 28 2.03 0 48 8 98% 22 2.59 6 95% 17 3.35MESA invert matrix 152 334 0.5 2 2 10% 320 1.04 2 10% 320 1.04 0.5 53 9 65% 262 1.27 9 65% 243 1.37FIR unrolled 67 131 0 3 2 9% 126 1.04 2 9% 126 1.04 1 10 2 94% 98 1.30 2 97% 67 1.95FFT 10 18 0 0 - - - - - - - - 0 12 2 60% 10 1.80 2 60% 10 1.80

Average 50% 1.5 50% 1.7 83% 2 84% 2.3

Wolinski & Kuchcinski 16(19)

Page 25: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Results (cont’d)

Rijndael and GSM encoders for patterns with 7 nodes limit.

speed-up

Application |V| iden

tified

sele

cted

cove

rage

Seq. Par.Part of Rijndael encryption encoder 106 10 6 75% 1.9 2.9Part of GSM encoder 604 11 7 66% 2.1 3.4

Wolinski & Kuchcinski 17(19)

Page 26: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Conclusions

Constraint makes it possible to explore solutions that is difficultto examine using specific algorithms.Constraints provide flexibility of defining different conditions.(Sub-)graph isomorphism constraints offer easy way to definedesign problems.Experimental results are very encouraging.

Wolinski & Kuchcinski 18(19)

Page 27: Design of Processor Accelerators with Constraints · Design of Accelerators with Constraints Design of Processor Accelerators with Constraints Christophe Wolinski1, Krzysztof Kuchcinski2,

Design of Accelerators with Constraints

Further Reading

Ch. Wolinski and K. Kuchcinski.Automatic selection of application-specific reconfigurable processorextensions.In Proc. Design Automation and Test in Europe, Munich, Germany, March10-14, 2008.

Ch. Wolinski, K. Kuchcinski, K. Martin, E. Raffin, and F. Charot.How constrains programming can help you in the generation of optimizedapplication specific reconfigurable processor extensions.In Proc. of The Intl. Conference on Engineering of Reconfigurable Systemsand Algorithms, Las Vegas, USA, (Invited paper), July 13-16, 2009.

K. Martin, Ch. Wolinski, K. Kuchcinski, A. Floch, and F. Charot.Constraint-driven identification of application specific instructions in theDURASE system.In SAMOS IX: International Workshop on Systems, Architectures, Modelingand Simulation, Samos, Greece, July 20-23, 2009.

Wolinski & Kuchcinski 19(19)