Center for Embedded Computer Systems University of California, Irvine

Coordinated Coarse Grain and Fine Grain Optimizations for High-Level Synthesis
Sumit Gupta, Center for Embedded Computer Systems, University of California, Irvine
http://www.cecs.uci.edu/~spark
Supported by Semiconductor Research Corporation


Page 1: Center for Embedded Computer Systems University of California, Irvine

Center for Embedded Computer Systems, University of California, Irvine

http://www.cecs.uci.edu/~spark

Coordinated Coarse Grain and Fine Grain Optimizations for High-Level Synthesis

Supported by Semiconductor Research Corporation

Sumit Gupta

Page 2: Center for Embedded Computer Systems University of California, Irvine

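High Level Synthesis

[Figure: target architecture with Memory, Control, ALU, and Data path, next to the CDFG of the code below: a block with x = a + b and c = a < b, an If Node with T/F branches holding d = e - f and g = h + i, and a final block with j = d × g and l = e + x.]

x = a + b;
c = a < b;
if (c) then d = e - f;
else g = h + i;
j = d × g;
l = e + x;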

Transform behavioral descriptions to RTL/gate level

From C to CDFG to Architecture
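As a minimal sketch, the behavioral description on this slide can be written as a compilable C function of the kind a C-based HLS flow takes as input. The function name `behavior`, the `Outputs` struct, and the zero initialization of `d` and `g` (each is assigned on only one branch) are assumptions added for illustration; they are not part of the slide.

```c
#include <stdbool.h>

/* Hypothetical container for every value the slide's code computes. */
typedef struct {
    int x, d, g, j, l;
    bool c;
} Outputs;

Outputs behavior(int a, int b, int e, int f, int h, int i)
{
    Outputs o = {0};      /* assumption: d and g default to 0 */

    /* First basic block: an addition and a comparison. */
    o.x = a + b;
    o.c = a < b;

    /* If node: only one branch executes. */
    if (o.c)
        o.d = e - f;      /* true branch */
    else
        o.g = h + i;      /* false branch */

    /* Join block: uses results from both sides of the conditional. */
    o.j = o.d * o.g;
    o.l = e + o.x;
    return o;
}
```

Each straight-line group of statements corresponds to one basic block of the CDFG, and the `if` corresponds to the If Node with its T and F branches.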

Page 3: Center for Embedded Computer Systems University of California, Irvine

Our Approach to HLS

Optimizing Compiler and Parallelizing Compiler transformations applied at Source-level (Pre-synthesis) and during Scheduling
Source-level code refinement using Pre-synthesis transformations
Code Restructuring by Speculative Code Motions
Operation replication to improve concurrency
Transformations applied dynamically during scheduling to exploit new opportunities due to code motions
Extract a high degree of parallelization using extensive Code Transformations
Improve Resource Utilization and increase Code Compaction
Reduce impact of programming style and control constructs on HLS results
Our approach is particularly suited to descriptions with nested conditionals and loops

[Figure: synthesis flow from C Input to VHDL Output. Labels: Original CDFG, Source-Level Compiler Transformations, Optimized CDFG, Scheduling & Binding, Scheduling Compiler Transformations.]

Page 4: Center for Embedded Computer Systems University of California, Irvine

Hierarchical Intermediate Representation

We use Hierarchical Task Graphs (HTGs)
Maintain structured view of design description
Consists of hierarchy of basic blocks and HTG nodes
3 Types of HTG Nodes:
Single: No sub-nodes
Compound: Contain sub-nodes
Loop: Encapsulate loops
Augmented by data dependency graphs
Enable Coarse-Grain transformations
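The three node types can be sketched as a small recursive C data structure. This is an illustration only, not SPARK's actual representation: the names `HtgNode`, `HtgKind`, and `htg_count_basic_blocks` are hypothetical, and the sketch assumes that a Single node wraps exactly one basic block and that a Loop node keeps its body as sub-nodes.

```c
#include <stddef.h>

/* The three HTG node types named on the slide. */
typedef enum { HTG_SINGLE, HTG_COMPOUND, HTG_LOOP } HtgKind;

typedef struct HtgNode {
    HtgKind kind;
    int basic_block_id;          /* used by HTG_SINGLE only (assumption) */
    struct HtgNode **children;   /* sub-nodes; NULL for HTG_SINGLE */
    size_t num_children;
} HtgNode;

/* Count the basic blocks contained anywhere below a node. */
size_t htg_count_basic_blocks(const HtgNode *n)
{
    if (n == NULL)
        return 0;
    if (n->kind == HTG_SINGLE)
        return 1;                /* a single node has no sub-nodes */

    size_t total = 0;
    for (size_t i = 0; i < n->num_children; i++)
        total += htg_count_basic_blocks(n->children[i]);
    return total;
}
```

Because a compound or loop node is itself one node in its parent, a transformation can treat a whole conditional or loop as a single unit, which is what makes coarse-grain transformations convenient on this structure.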

Page 5: Center for Embedded Computer Systems University of California, Irvine
Page 6: Center for Embedded Computer Systems University of California, Irvine

Trailblazing: Hierarchical Code Motion Technique

Can move operations across large pieces of code without visiting each node in between

Page 7: Center for Embedded Computer Systems University of California, Irvine

Speculative Code Motions

[Figure: an If Node with T/F branches and three operations labeled a, b, c (two + and one −), annotated with the code motions Speculation, Reverse Speculation, Conditional Speculation, and Across Hierarchical Blocks.]

Operation Movement to reduce impact of Programming Style on Quality of HLS Results

Early Condition Execution: Evaluates conditions as soon as possible
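A source-level view of plain speculation, as a hedged sketch: the operation is hoisted above the conditional into a temporary, and the result is committed only if the condition holds. The function names and the use of `unsigned` (so the hoisted addition is well defined even when its result is discarded) are assumptions for illustration.

```c
/* Original: the addition waits until the condition is known. */
unsigned original_branch(int cond, unsigned a, unsigned b, unsigned x)
{
    unsigned r = x;
    if (cond)
        r = a + b;
    return r;
}

/* After speculation: the addition is computed unconditionally,
 * so it can be scheduled in parallel with the condition. */
unsigned speculated_branch(int cond, unsigned a, unsigned b, unsigned x)
{
    unsigned t = a + b;   /* speculative; wasted work if cond is false */
    unsigned r = x;
    if (cond)
        r = t;            /* commit only on the taken path */
    return r;
}
```

Both functions return the same value for every input; the difference is only in when the addition can be scheduled.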

Page 8: Center for Embedded Computer Systems University of California, Irvine

Scheduling Heuristic

Get Available Ops: a, b, c, d
Determine Code Motions Required
Assign Cost to each Operation
Cost is based on data dependency chain
Schedule Op with lowest Cost

[Figure: HTG with basic blocks BB 0 to BB 9 and + operations a, b, c, d; code motion annotations: Speculate, Across HTG.]
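The selection step of this heuristic can be sketched in C as follows. The slide says only that the cost is based on the data dependency chain; the concrete cost function used here (a longer chain gives a lower cost, so the operation is picked earlier) is an assumption, as are the names `Op`, `op_cost`, and `pick_lowest_cost`.

```c
#include <stddef.h>

typedef struct {
    const char *name;
    int chain_length;   /* ops on the longest dependency chain from this op */
    int available;      /* nonzero if the op can be scheduled in this step */
} Op;

/* Assumed cost: ops heading a longer dependency chain cost less. */
static int op_cost(const Op *op)
{
    return -op->chain_length;
}

/* Return the index of the available op with the lowest cost,
 * or -1 if no op is available. Ties go to the earliest op. */
int pick_lowest_cost(const Op *ops, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!ops[i].available)
            continue;
        if (best < 0 || op_cost(&ops[i]) < op_cost(&ops[best]))
            best = (int)i;
    }
    return best;
}
```

A scheduler would call this once per free resource in each scheduling step, apply whatever code motion the chosen operation requires, and then recompute the list of available operations.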

Page 9: Center for Embedded Computer Systems University of California, Irvine

Scheduling Heuristic

[Figure: two views of the same HTG (BB 0 to BB 9) with the + operations a, b, c, d in different positions; code motion annotations: Speculate, Across HTG.]

Page 10: Center for Embedded Computer Systems University of California, Irvine

Increasing the Scope of Code Motions

[Figure: Original Design and Scheduled Design of an unbalanced conditional. HTG with BB 0 to BB 4 and an If Node (T/F); operations + a, + b, − c, − d, − e; scheduling steps S0 to S3; panel labels: Resource Allocation, Unbalanced Conditional.]

Page 11: Center for Embedded Computer Systems University of California, Irvine

Insert New Scheduling Step in Shorter Branch

[Figure: the same conditional (BB 0 to BB 4, If Node with T/F branches, operations + a, + b, − c, − d) with scheduling steps S0 to S2 and a Resource Allocation label; in the second panel the − e operation appears twice.]
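One possible reading of this slide, sketched in C: when an operation has no free resource slot in any existing step of a branch, a new scheduling step is appended to that branch, provided the branch stays no longer than the longest branch of the conditional. That length limit, the `Branch` struct, `MAX_STEPS`, and `place_in_branch` are all assumptions for illustration and are not taken from the slides.

```c
#include <stdbool.h>

#define MAX_STEPS 8

typedef struct {
    int num_steps;         /* scheduling steps currently in this branch */
    int used[MAX_STEPS];   /* functional units busy in each step */
} Branch;

/* Try to place one operation in a branch. If every existing step is
 * full, append a new step, but only while the branch is still shorter
 * than the longest branch (assumed rule). Returns true on success. */
bool place_in_branch(Branch *br, int units_per_step, int longest_branch_steps)
{
    if (units_per_step < 1)
        return false;

    /* First look for an idle unit in an existing step. */
    for (int s = 0; s < br->num_steps; s++) {
        if (br->used[s] < units_per_step) {
            br->used[s]++;
            return true;
        }
    }

    /* Otherwise balance: grow the shorter branch by one step. */
    if (br->num_steps < longest_branch_steps && br->num_steps < MAX_STEPS) {
        br->used[br->num_steps] = 1;
        br->num_steps++;
        return true;
    }
    return false;
}
```

Under this rule the added step never lengthens the longest path through the conditional, because only a branch shorter than the longest one is allowed to grow.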

Page 12: Center for Embedded Computer Systems University of California, Irvine

Common Sub-Expression Elimination

C Description:
a = b + c;
cond = b < c;
if (cond) d = b + c;
else e = g + h;

[Figure: HTG Representation with BB 0 to BB 4 and an If Node (T/F): a = b + c before the conditional, d = b + c in the true branch, e = g + h in the false branch. After CSE: d = b + c is replaced by d = a.]
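The transformation on this slide can be written as a before/after pair of C functions. The names `Result`, `before_cse`, and `after_cse`, the condition variable `cond`, and the zero defaults for values assigned on only one branch are assumptions for illustration.

```c
typedef struct {
    unsigned a, d, e;
} Result;

/* Before CSE: b + c is computed twice on the true path. */
Result before_cse(unsigned b, unsigned c, unsigned g, unsigned h)
{
    Result r = {0, 0, 0};
    int cond = b < c;

    r.a = b + c;
    if (cond)
        r.d = b + c;   /* same expression, operands unchanged */
    else
        r.e = g + h;
    return r;
}

/* After CSE: the second addition is replaced by a copy of a. */
Result after_cse(unsigned b, unsigned c, unsigned g, unsigned h)
{
    Result r = {0, 0, 0};
    int cond = b < c;

    r.a = b + c;
    if (cond)
        r.d = r.a;     /* reuse: a = b + c dominates this branch */
    else
        r.e = g + h;
    return r;
}
```

The replacement is valid because `a = b + c` executes on every path that reaches the conditional and neither `b` nor `c` is reassigned in between.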

Page 13: Center for Embedded Computer Systems University of California, Irvine

New Opportunities for "Dynamic" CSE Due to Speculative Code Motions

[Figure: HTG with BB 0 to BB 8. Before: a = b + c and d = b + c sit in separate conditional blocks. After Speculate: dcse = b + c is placed in BB 0, and the two operations become a = dcse and d = dcse.]
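A source-level sketch of the situation in the figure, under the assumption that the two operations sit in two independent conditionals guarded by hypothetical conditions `p` and `q`. The names `Values`, `before_dynamic_cse`, and `after_dynamic_cse` are illustrative; `dcse` is the temporary named on the slide.

```c
typedef struct {
    unsigned a, d;
} Values;

/* Before: a = b + c runs only when p holds, so it is not available on
 * every path reaching d = b + c, and ordinary CSE cannot reuse it. */
Values before_dynamic_cse(int p, int q, unsigned b, unsigned c)
{
    Values v = {0, 0};
    if (p)
        v.a = b + c;
    if (q)
        v.d = b + c;
    return v;
}

/* After: the addition is speculated to the top (BB 0 in the figure),
 * which makes it available everywhere below; both uses become copies. */
Values after_dynamic_cse(int p, int q, unsigned b, unsigned c)
{
    Values v = {0, 0};
    unsigned dcse = b + c;   /* speculated operation */
    if (p)
        v.a = dcse;
    if (q)
        v.d = dcse;
    return v;
}
```

The second form needs one addition instead of two, an opportunity that exists only after the scheduler has performed the speculation, which is why this CSE is applied dynamically during scheduling.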

Page 14: Center for Embedded Computer Systems University of California, Irvine

SPARK High Level Synthesis Framework

Page 15: Center for Embedded Computer Systems University of California, Irvine

Experimentation

Experiments for several transformations
Pre-synthesis transformations: loop invariant code motions, CSE
Speculative Code Motions
Dynamic CSE
We have used Spark to synthesize designs derived from several industrial designs
MPEG-1, MPEG-2, GIMP Image Processing software
Scheduling Results
Number of States in FSM
Cycles on Longest Path through Design
VHDL: Logic Synthesis
Critical Path Length (ns)
Unit Area

Page 16: Center for Embedded Computer Systems University of California, Irvine

Target Applications

Design             # of Ifs   # of Loops   # Non-Empty Basic Blocks   # of Operations
MPEG-1 pred1       4          2            17                         123
MPEG-1 pred2       11         6            45                         287
MPEG-2 dp_frame    18         4            61                         260
GIMP tiler         11         2            35                         150

Page 17: Center for Embedded Computer Systems University of California, Irvine

Code Motions: Logic Synthesis Results

[Figure: bar charts for the MPEG Pred1 Function and the MPEG Pred2 Function showing Normalized Values (0 to 1.2) of Critical Path (c ns), Total Delay (c*l ns), and Unit Area. Legend: Within Basic Blocks & Across Hierar. Blocks; + Speculation; + Reverse Speculation & Early Condition Execution; Conditional Speculation.]

Page 18: Center for Embedded Computer Systems University of California, Irvine

CSE/Dynamic CSE Results

[Figure: bar charts for the MPEG Pred1 Function and the MPEG Pred2 Function showing Normalized Values (0 to 1) of Critical Path (c ns), Total Delay (c*l ns), and Unit Area. Legend: All Code Motions Enabled; + Only CSE; + Only Dynamic CSE; + CSE & Dynamic CSE.]

Page 19: Center for Embedded Computer Systems University of California, Irvine

Conclusions

Parallelizing code transformations enable a new range of HLS transformations
Can provide the needed improvement in quality of HLS results for them to be competitive against manually designed circuits.
Synthesis approach can dominate SOC embedded systems design
Can enable productivity improvements in microelectronic design
Built a synthesis system with a range of code transformations
Platform for applying Coarse and Fine-grain Optimizations
Code transformations address complex control flow
Tool-box approach where transformations and heuristics can be developed
Enables finding the right synthesis script for different application domains
Performance improvements of 60-70 % across a number of designs
We have also shown its effectiveness on an Intel design

Page 20: Center for Embedded Computer Systems University of California, Irvine

Publications

Dynamic Conditional Branch Balancing during the High-Level Synthesis of Control-Intensive Designs. S. Gupta, N.D. Dutt, R.K. Gupta, A. Nicolau. To appear in DATE, March 2003.
SPARK: A High-Level Synthesis Framework For Applying Parallelizing Compiler Transformations. S. Gupta, N.D. Dutt, R.K. Gupta, A. Nicolau. VLSI Design 2003. Best Paper Award.
Dynamic Common Sub-Expression Elimination during Scheduling in High-Level Synthesis. S. Gupta, M. Reshadi, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2002.
Coordinated Transformations for High-Level Synthesis of High Performance Microprocessor Blocks. S. Gupta, T. Kam, M. Kishinevsky, S. Rotem, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2002.
Conditional Speculation and its Effects on Performance and Area for High-Level Synthesis. S. Gupta, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2001.
Speculation Techniques for High Level synthesis of Control Intensive Designs. S. Gupta, N. Savoiu, S. Kim, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2001.
Analysis of High-level Address Code Transformations for Programmable Processors. S. Gupta, M. Miranda, F. Catthoor, R. K. Gupta. DATE 2000.
Synthesis of Testable RTL Designs using Adaptive Simulated Annealing Algorithm. C.P. Ravikumar, S. Gupta, A. Jajoo. Intl. Conf. on VLSI Design, 1998. Best Student Paper Award.

Book Chapter
ASIC Design, S. Gupta, R. K. Gupta, Chapter 64, The VLSI Handbook, Edited by Wai-Kai Chen