A Novel Algorithm Combining Temporal Partitioning and Sharing of Functional Units

João M. P. Cardoso, Faculty of Sciences and Technology, University of Algarve, Faro, Portugal. IEEE Symposium on Field-Programmable Custom Computing Machines, Rohnert Park, CA, USA, April 30, 2001.


Page 1: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

A Novel Algorithm Combining Temporal Partitioning and Sharing of Functional Units

João M. P. Cardoso

Faculty of Sciences and Technology, University of Algarve, Faro, Portugal

April 30, 2001

IEEE Symposium on Field-Programmable Custom Computing Machines, Rohnert Park, CA, USA

Page 2: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Index

Introduction

Temporal Partitioning

Problem Definition

New vs Previous Approach

Algorithm Working Through an Example

Experimental Results

Related Work

Conclusions

Future Work

Page 3: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Introduction

“Virtual Hardware”: reuse of devices; save silicon area; view of “unlimited resources”; enabled by the dynamically reconfigurable FPGAs

Two concepts: context switching among functionalities; allowing a large “function” to be executed

FPGA devices allowing virtualization: off-chip configurations; on-chip configurations

Several research efforts…

Page 4: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Introduction

Answers: temporal partitioning; sharing of functional units

Goal: combining the two...

[Figure: example dataflow graph; visible labels: x, y, u, dx, x_1, y_1, u_1, +, -, << 1]

Size larger than the available reconfigware area?

Page 5: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Temporal Partitioning

[Figure: one temporal partition of the example graph drawn along a time axis; visible labels: u, x, dx, y, aux1, x_1, y_1, +, << 1]

Page 6: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Temporal Partitioning

[Figure: another temporal partition drawn along the time axis; visible labels: aux1, dx, u, y, u_1, +, -, << 1]

Page 7: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Temporal Partitioning

[Figure: the two temporal partitions shown one after the other along the time axis; aux1 appears in both]

Page 8: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Temporal Partitioning

Create temporal partitions to be executed by time-sharing the device

Netlist level (structural): difficulties when dealing with feedbacks; loss of information; flat structure; intricate for exploiting sharing of functional units

Behavioral level (functional): loops can be explicitly represented; better design decisions; “a must” for compilers for reconfigurable computing

Page 9: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Problem Definition

But what if we decrease the required area by sharing functional units?

Simultaneous temporal partitioning and sharing of functional units

THE PROBLEM:

Given a dataflow graph (representing a behavioral description), a library of components,...

Map the dataflow graph onto the available resources of the FPGA device: considering sharing of functional units; considering temporal partitioning; decreasing the overall execution latency

Page 10: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

New vs Previous Approach

Previous: DFG/CDFG, constraints and component library, then Temporal Partitioning, then High-Level Synthesis, then circuit generation and logic synthesis

New: DFG/CDFG, constraints and component library, then simultaneous Temporal Partitioning and High-Level Synthesis, then circuit generation and logic synthesis

Page 11: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Suppose the following dataflow graph. Consider:

Area(+) = 1 cell; Area(x) = 2 cells; Delay(+) = 1 control step (cs); Delay(x) = 2 cs

Total area of the DFG: 8 cells

Available Area: 3 cells

[Figure: dataflow graph with six nodes, numbered 0 to 5]
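The area figures on this slide can be checked with a few lines of Python. This is only a sketch: the slide gives the component areas, the 8-cell total and the 3-cell budget, but the figure that says which nodes are adders and which are multipliers is lost, so the `NODE_OP` assignment below is an assumption chosen to reproduce the 8-cell total. The helper names are hypothetical.

```python
# Component areas from the slide, in cells.
AREA = {"+": 1, "x": 2}
AVAILABLE_AREA = 3  # cells available on the device

# ASSUMPTION: the operation of each node is not recoverable from the text.
# Four adders and two multipliers reproduce the stated total of 8 cells.
NODE_OP = {0: "+", 1: "+", 2: "x", 3: "x", 4: "+", 5: "+"}


def total_area(node_op, area):
    """Area needed if every node gets its own functional unit."""
    return sum(area[op] for op in node_op.values())


def min_partitions_without_sharing(node_op, area, available):
    """Area-based lower bound on the number of partitions with no sharing."""
    return -(-total_area(node_op, area) // available)  # ceiling division


print(total_area(NODE_OP, AREA))                                      # 8
print(min_partitions_without_sharing(NODE_OP, AREA, AVAILABLE_AREA))  # 3
```

The graph does not fit in 3 cells without sharing, which is the situation the rest of the example works through.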

Page 12: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Calculate ASAP and ALAP values

Node  0  1  2  3  4  5
ASAP  0  0  1  0  2  3
ALAP  1  1  2  0  2  3

[Figure: the six-node dataflow graph (nodes 0 to 5)]
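The ASAP and ALAP values in the table can be reproduced with a short forward and backward pass over the graph. The sketch below is not the author's code. Because the figure is lost, the edge list and per-node delays are assumptions, chosen so that the result matches the table on this slide.

```python
# ASSUMPTION: delays (in cs) and edges are inferred, not given in the text.
DELAY = {0: 1, 1: 1, 2: 2, 3: 2, 4: 1, 5: 1}
EDGES = [(0, 2), (1, 2), (3, 4), (4, 5)]


def asap_alap(delay, edges):
    """Return (asap, alap, latency) for a DAG whose node ids are in topological order."""
    preds = {n: [] for n in delay}
    succs = {n: [] for n in delay}
    for u, v in edges:
        assert u < v, "this sketch relies on node ids being a topological order"
        preds[v].append(u)
        succs[u].append(v)
    order = sorted(delay)
    asap = {}
    for n in order:  # forward pass: start as soon as all predecessors finish
        asap[n] = max((asap[p] + delay[p] for p in preds[n]), default=0)
    latency = max(asap[n] + delay[n] for n in order)
    alap = {}
    for n in reversed(order):  # backward pass: start as late as successors allow
        alap[n] = min((alap[s] for s in succs[n]), default=latency) - delay[n]
    return asap, alap, latency


asap, alap, latency = asap_alap(DELAY, EDGES)
print(asap)     # {0: 0, 1: 0, 2: 1, 3: 0, 4: 2, 5: 3}
print(alap)     # {5: 3, 4: 2, 3: 0, 2: 2, 1: 1, 0: 1}
print(latency)  # 4
```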

Page 13: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Identify the critical path

Node  0  1  2  3  4  5
ASAP  0  0  1  0  2  3
ALAP  1  1  2  0  2  3

[Figure: the six-node dataflow graph (nodes 0 to 5)]
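One common way to identify the critical path from these values is to select the nodes with zero slack (ALAP minus ASAP). The sketch below uses only the table above; the function name is hypothetical. In a graph with several critical paths the zero-slack nodes would span all of them, which is not the case here.

```python
# ASAP and ALAP values copied from the table on this slide.
ASAP = {0: 0, 1: 0, 2: 1, 3: 0, 4: 2, 5: 3}
ALAP = {0: 1, 1: 1, 2: 2, 3: 0, 4: 2, 5: 3}


def critical_nodes(asap, alap):
    """Nodes with zero slack, ordered by their ASAP start step."""
    ordered = sorted(asap, key=lambda n: asap[n])
    return [n for n in ordered if alap[n] - asap[n] == 0]


print(critical_nodes(ASAP, ALAP))  # [3, 4, 5]
```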

Page 14: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Create an initial number of TPs: suppose 3

[Figure: the dataflow graph (nodes 0 to 5) beside three empty temporal partitions numbered 1 to 3; labels MAXCS and Area]

Page 15: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Map each node of the critical path on each temporal partition

[Figure: temporal partitions 1, 2 and 3 now hold nodes 3, 4 and 5, annotated 2 cs, 1 cs and 1 cs; labels MAXCS and Area]
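The seeding step can be sketched as below: each critical-path node opens its own temporal partition. The dictionary layout and the `seed_partitions` name are illustrative only, and the operation types of nodes 3, 4 and 5 are an assumption consistent with the 2 cs, 1 cs and 1 cs annotations.

```python
# Critical-path nodes in order.
CRITICAL_PATH = [3, 4, 5]

# ASSUMPTION: node 3 is a multiplier and nodes 4 and 5 are adders.
NODE_OP = {3: "x", 4: "+", 5: "+"}

# Component library from the example.
AREA = {"+": 1, "x": 2}   # cells
DELAY = {"+": 1, "x": 2}  # control steps (cs)


def seed_partitions(path, node_op):
    """Create one temporal partition per critical-path node."""
    return [
        {"tp": i, "nodes": [n], "cs": DELAY[node_op[n]], "area": AREA[node_op[n]]}
        for i, n in enumerate(path, start=1)
    ]


partitions = seed_partitions(CRITICAL_PATH, NODE_OP)
for p in partitions:
    print(p)
# {'tp': 1, 'nodes': [3], 'cs': 2, 'area': 2}
# {'tp': 2, 'nodes': [4], 'cs': 1, 'area': 1}
# {'tp': 3, 'nodes': [5], 'cs': 1, 'area': 1}
```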

Page 16: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Try to map nodes in each temporal partition (1)

[Figure: same state as the previous slide: partitions 1, 2 and 3 with nodes 3, 4 and 5 (2 cs, 1 cs, 1 cs)]

Page 17: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Try to map nodes in each temporal partition (1)

[Figure: node 0 now appears in the partition diagram next to nodes 3, 4 and 5 (2 cs, 1 cs, 1 cs)]

Page 18: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Try to map nodes in each temporal partition (1)

[Figure: nodes 0 and 1 now appear in the partition diagram next to nodes 3, 4 and 5 (2 cs, 1 cs, 1 cs)]

Page 19: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Try to map nodes in each temporal partition (1)

[Figure: nodes 0 and 1 mapped next to nodes 3, 4 and 5 (2 cs, 1 cs, 1 cs); an extra label 3 is shown]

Page 20: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Try to map nodes in each temporal partition (2)

[Figure: nodes 0 and 1 mapped next to nodes 3, 4 and 5 (2 cs, 1 cs, 1 cs); an extra label 2 is shown, presumably node 2 being tried]

Page 21: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Try to map nodes in each temporal partition (3)

[Figure: nodes 0 and 1 mapped next to nodes 3, 4 and 5 (2 cs, 1 cs, 1 cs); an extra label 2 is shown, presumably node 2 being tried]

Page 22: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Relax: add 1 clock step to MAXCS

[Figure: mapping unchanged: nodes 0 and 1 mapped next to nodes 3, 4 and 5 (2 cs, 1 cs, 1 cs)]
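The relaxation step can be pictured with the sketch below. It is one possible reading of the slides, not the author's implementation: a node is placed at the earliest step where its functional unit is free and it still finishes within the partition's MAXCS, and if no such step exists, MAXCS grows by one control step and the attempt is repeated. Both function names and the `limit` safeguard are hypothetical, and the area check is omitted for brevity.

```python
def earliest_start(busy, delay, ready, maxcs):
    """Earliest start >= ready with the unit free for `delay` steps, ending by maxcs.

    `busy` is the set of control steps in which the unit is already in use.
    Returns None when the node does not fit within maxcs.
    """
    for start in range(ready, maxcs - delay + 1):
        if all(step not in busy for step in range(start, start + delay)):
            return start
    return None


def place_with_relaxation(busy, delay, ready, maxcs, limit=16):
    """Retry placement, adding one control step to maxcs after each failure."""
    while maxcs <= limit:
        start = earliest_start(busy, delay, ready, maxcs)
        if start is not None:
            return start, maxcs
        maxcs += 1  # relax: add 1 control step
    raise ValueError("node could not be placed within the step limit")


# A 2-cs operation does not fit in a partition limited to 1 cs ...
print(earliest_start(set(), delay=2, ready=0, maxcs=1))         # None
# ... but fits after one relaxation step.
print(place_with_relaxation(set(), delay=2, ready=0, maxcs=1))  # (0, 2)
```

Under this reading, the 2-cs node that could not be placed in a 1-cs partition fits once that partition is allowed 2 cs.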

Page 23: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Try to map nodes in each temporal partition (1)

[Figure: nodes 0 and 1 mapped next to nodes 3, 4 and 5 (2 cs, 1 cs, 1 cs); an extra label 3 is shown]

Page 24: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Try to map nodes in each temporal partition (2)

[Figure: nodes 0 and 1 mapped next to nodes 3, 4 and 5 (2 cs, 1 cs, 1 cs); an extra label 2 is shown, presumably node 2 being tried]

Page 25: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Try to map nodes in each temporal partition (2)

[Figure: nodes 0 and 1 mapped next to nodes 3, 4 and 5 (2 cs, 1 cs, 1 cs); the label 2 now appears twice, presumably node 2 placed in a partition]

Page 26: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Merge Operation (1)

[Figure: partitions 1, 2 and 3 annotated 2 cs, 2 cs and 1 cs; nodes 0, 1 and 2 appear alongside nodes 3, 4 and 5]

Page 27: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Merge Operation (1)

[Figure: partitions 1 and 2 merged into a partition labelled 1,2 (4 cs); partition 3 remains (1 cs)]

Page 28: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Merge Operation (2)

[Figure: same state as the previous slide: merged partition 1,2 (4 cs) and partition 3 (1 cs)]

Page 29: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Algorithm Working Through an Example

Merge Operation (2)

[Figure: all partitions merged into a single partition labelled 1,2,3, annotated 4 cs]
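The final state, a single partition of 4 cs, can be sanity-checked with an ordinary resource-constrained list scheduler. This swaps in a standard technique for verification and is not the algorithm of the presentation. The graph edges, delays and operation types are assumptions inferred from the ASAP/ALAP table, and one adder plus one multiplier (1 + 2 = 3 cells) is assumed to be the shared set of functional units.

```python
# ASSUMPTIONS: graph structure and operation types are inferred, not given.
DELAY = {0: 1, 1: 1, 2: 2, 3: 2, 4: 1, 5: 1}
OP = {0: "+", 1: "+", 2: "x", 3: "x", 4: "+", 5: "+"}
EDGES = [(0, 2), (1, 2), (3, 4), (4, 5)]
ALAP = {0: 1, 1: 1, 2: 2, 3: 0, 4: 2, 5: 3}  # from the example table
AREA = {"+": 1, "x": 2}


def list_schedule(units):
    """Schedule all nodes given `units`, a dict mapping operation to unit count."""
    assert all(units.get(op, 0) > 0 for op in set(OP.values()))
    preds = {n: [u for u, v in EDGES if v == n] for n in DELAY}
    free_at = {op: [0] * count for op, count in units.items()}
    start, step = {}, 0
    while len(start) < len(DELAY):
        ready = [
            n for n in DELAY
            if n not in start
            and all(p in start and start[p] + DELAY[p] <= step for p in preds[n])
        ]
        # Most urgent first: smallest ALAP, ties broken by node id.
        for n in sorted(ready, key=lambda n: (ALAP[n], n)):
            slots = free_at[OP[n]]
            for i, free_step in enumerate(slots):
                if free_step <= step:  # this unit is idle at the current step
                    start[n] = step
                    slots[i] = step + DELAY[n]
                    break
        step += 1
    latency = max(start[n] + DELAY[n] for n in DELAY)
    return start, latency


units = {"+": 1, "x": 1}
start, latency = list_schedule(units)
used_area = sum(AREA[op] * count for op, count in units.items())
print(start)      # {3: 0, 0: 0, 1: 1, 2: 2, 4: 2, 5: 3}
print(latency)    # 4
print(used_area)  # 3
```

Under these assumptions, sharing one adder and one multiplier fits the whole graph in 3 cells with a latency of 4 cs, in line with the 4 cs shown for the merged partition.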

Page 30: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Experimental Results

Near-optimal w/o sharing vs sharing

[Chart: benchmarks EX1, SEHWA, HAL and EWF; left axis #TPs (0 to 18); right axis Perf. Improv. (-30% to 30%); legend: #p(SA), #p(Our*), #p(Our*), %(#cs-Our*), %(#cs-Our**)]

Page 31: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Experimental Results

Near-optimal w/o sharing vs sharing

[Chart: benchmarks FIR and MAT4x4; left axis #TPs (0 to 28); right axis Perf. Improv. (-16% to 32%); legend: #p(SA), #p(Our*), #p(Our*), %(#cs-Our*), %(#cs-Our**); data labels 72 and 37]

Page 32: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Experimental Results

Performance vs No. of Temporal Partitions

Mult4x4, RMAX=10 (no sharing of adders)

[Chart: x axis Initial Number of TPs (1 to 25); left axis Final #TPs (0 to 30); right axis Exec. (#cs) (64 to 72); series: TPs, Exec.]

Page 33: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Experimental Results

Is the algorithm good for scheduling?

[Chart: #cs (0 to 35) for the benchmarks EWF and SEHWA; series: known scheduling results, Our]

Comparison to some optimum results

Page 34: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Related Work

List-Scheduling considering dynamic reconfiguration [Vasilko et al., FPL’96]

ASAP [GajjalaPurna et al., IEEE Trans. on Comp., 1999]

Minimize latency taking into account communication costs [Cardoso et al., VLSI’99]: enhanced static-list scheduling; iterative approach (simulated annealing)

ILP formulation [SPARCs, DATE’98; RAW’98]

Enhanced Force-Directed List Scheduling [Pandey et al., SPIE’99]

And others [see the Related Work section]

Page 35: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Conclusions

Novel algorithm that simultaneously performs temporal partitioning and sharing of functional units: low complexity; heuristic approach; based on gradually enlarging the time slots

Makes it possible to exploit the duality between the number of temporal partitions and resource sharing

Close-to-optimum results on some examples

Results show that the algorithm is not weak when used for scheduling

Page 36: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Future Work

Enhancements to the algorithm: consider functional units with pipelining; consider pipelining between execution and reconfiguration

Study the possibility of taking into account communication and reconfiguration costs

Test results with a reconfigurable computing system (commercial board)

Page 37: A Novel Algorithm Combining Temporal Partitioning and  Sharing of Functional Units

Contact Author

João M. P. Cardoso


http://w3.ualg.pt/~jmcardo

THANK YOU!