
SPARK

Accelerating ASIC designs through parallelizing high-level synthesis

Sumit Gupta

Rajesh Gupta

rgupta@ucsd.edu


©2003 Spark Team, Confidential 2

Outline

The target
The problem
The technology
The competition
The market opportunity
The people
The status
The plan


A Chip Is a Wonderful Thing!

A typical chip, circa 2006:
– 50 square millimeters
– 50 million transistors
– 1-10 GHz, 100-1000 MOP/sq mm, 10-100 MIPS/mW
– 300 mm wafers, 10,000 units/wafer, 20K wafers/month
– $5 per part

It does not matter what you build: processor, MEMS, networking, wireless, memory.

But it takes $20M to build one today, rising to $50M+.

So there is a strong incentive to port your application, system, or box to the “chip”.


But Design Decisions Matter!


Technical Target

Anyone and everyone with a technology IP to grind (i.e., to build on-chip)
– E.g., WLAN and cellphone chips:
• about 50 GOPS in baseband (BB) processing
– and about 72 other application ‘markets’ enhanced by ASIC/FPGA parts

More technically:
– Behavioral descriptions with complex and nested conditionals and loops.


The Problem

Doing chip design in a system house is an increasingly costly proposition.
– Case study: Conexant's 802.11a chip
• 9 months from PRD to parts
• 7 months from PRD to synthesizable RTL
• The pain is in getting the algorithm right for the chip implementation

Designers would love a “compiler”, but “push-button” tools just do not work.


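Enter High-Level Synthesis

Figure: a system design flow. Task Analysis and HW/SW Partitioning split the system into a Hardware Behavioral Description, handled by High-Level Synthesis, and a Software Behavioral Description, handled by a Software Compiler; the target platform comprises an ASIC, a Processor Core, Memory, an FPGA, and I/O.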


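Poor QoR, Even Poor Controllability

Figure: a target architecture (Memory, ALU, Control, Data path) next to the CDFG of the code below: a basic block computing x = a + b and c = a < b; an If node on c whose T branch computes d = e - f and whose F branch computes g = h + i; and a join block computing j = d × g and l = e + x.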

x = a + b;
c = a < b;
if (c)
    d = e - f;
else
    g = h + i;
j = d * g;
l = e + x;
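A minimal sketch, using Python as a stand-in for the C input, of what a speculative code motion does to this example: both branch operations are hoisted above the `if` and executed unconditionally, and the condition only selects which result is kept. The function names are hypothetical, and because the slide leaves `g` unassigned on the true path (and `d` on the false path), the sketch assumes both variables carry in earlier values as parameters.

```python
def original(a, b, e, f, h, i, d, g):
    # d and g enter with earlier values (assumption: the slide leaves the
    # untaken branch's variable unassigned, so it is passed in here).
    x = a + b
    c = a < b
    if c:
        d = e - f       # executes only on the true path
    else:
        g = h + i       # executes only on the false path
    j = d * g
    l = e + x
    return j, l


def speculated(a, b, e, f, h, i, d, g):
    # Both branch operations are hoisted above the condition and run
    # unconditionally, alongside x and c.
    x = a + b
    c = a < b
    d_spec = e - f
    g_spec = h + i
    # The condition now only steers selects (multiplexers in hardware).
    d = d_spec if c else d
    g = g if c else g_spec
    j = d * g
    l = e + x
    return j, l


print(original(1, 2, 10, 3, 4, 5, 2, 9))    # (63, 13)
print(speculated(2, 1, 10, 3, 4, 5, 2, 9))  # (18, 13)
```

The motion is safe here because the hoisted operations have no side effects and write fresh temporaries; the price is that both operations now occupy functional units on every execution.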


The Technology: Enter SPARK

Figure: the SPARK flow. C Input → Source-Level Compiler Transformations → Original CDFG → Scheduling Compiler & Dynamic Transformations → Optimized CDFG → Scheduling & Binding → VHDL Output.

By the time you get to the CDFG, it is already too late.

Parallelize (judiciously) and submerge it with HLS.
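Why parallelizing before scheduling pays off can be seen in a toy critical-path model of the earlier if-then-else example (x = a + b; c = a < b; if (c) d = e - f; else g = h + i; j = d * g; l = e + x). This is a hedged sketch, not SPARK's scheduler: it assumes unlimited functional units, and the operator delays in `DELAY_NS` are hypothetical values chosen only to make the comparison concrete.

```python
# Hypothetical operator delays in ns; not measured values from SPARK.
DELAY_NS = {"add": 1.0, "sub": 1.0, "lt": 1.0, "mul": 2.0, "mux": 0.25}


def critical_path_ns(ops):
    """Longest path through a dataflow graph, assuming unlimited resources.

    ops maps an operation name to (kind, predecessors) and must list each
    predecessor before its users (dicts preserve insertion order).
    """
    finish = {}
    for name, (kind, preds) in ops.items():
        start = max((finish[p] for p in preds), default=0.0)
        finish[name] = start + DELAY_NS[kind]
    return max(finish.values())


# Before: the branch operations wait on the comparison (control dependence).
BEFORE = {
    "x": ("add", []),
    "c": ("lt", []),
    "d": ("sub", ["c"]),
    "g": ("add", ["c"]),
    "j": ("mul", ["d", "g"]),
    "l": ("add", ["x"]),
}

# After speculation: both branch results are computed up front, and the
# comparison only drives the selecting multiplexers.
AFTER = {
    "x": ("add", []),
    "c": ("lt", []),
    "d_spec": ("sub", []),
    "g_spec": ("add", []),
    "d": ("mux", ["c", "d_spec"]),
    "g": ("mux", ["c", "g_spec"]),
    "j": ("mul", ["d", "g"]),
    "l": ("add", ["x"]),
}

print(critical_path_ns(BEFORE))  # 4.0
print(critical_path_ns(AFTER))   # 3.25
```

The gain holds only while a select is cheaper than the operation it takes off the path and while spare units exist for the hoisted operations; under real resource limits those operations compete for ALUs, which is one reason parallelization has to be applied judiciously.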


Why SPARK, Why Now?

The chip designer is finally
– letting go of the cycle boundary in design
– being replaced by non-chip types

Education and awareness through
– Synopsys Behavioral Compiler
– but it is not ready to dominate…

SPARK changes the landscape
– Parallelizing compilation as the ‘power tool’


SPARK Core Strengths

Focus on
– Transformations that increase the amount of parallelism available in the source description
– Tight integration with parallelizing compiler transformations

Provide an HLS toolbox for the micro-architect
– Fire the circuit designer.


The POC and the Experiments

Intel ILD design
– Produced a design that fundamentally restructures the input description (the way a designer would, and no other tool could)

A number of other media benchmarks
– 40-70% improvement in delay for the same area
– Based on a Synopsys back end

See appendix.


The Market Opportunity

The big picture
– Semi is $140B, fabless semi is $15B
– EDA is currently about $4B

Current EDA market
– $1B synthesis and verification
• $400M synthesis, $400M verification, $200M E.
– $3B in PDA, IP and design services

$400M synthesis
– 90% is RTL and below
– Market movement and ‘structural’ changes


Future ESL and Synthesis Market

Keys to growth
– ASIC focus (including structured ASICs)
– A ‘power tool’ is key to commanding high ASPs

Challenges
– The raid of the FPGAs
• In which case, PHLS will be OEM’d
– ASICs mired in the nano swamp
• Attention shifts to PDA and a stationary semi market


The Competition

The early educator: Synopsys BC
– Classical HLS that just does not work; fundamentally flawed

The improviser: Cadence Get2Chip A2C
– Has done a good job at RTL

The others
– Celoxica, Forte, Synfora, BlueSpec
– “Boutiques” primarily targeted at “somebody else”


The Competition

Synopsys Behavioral Compiler
– Traditional HLS: synthesis from a subset of SystemC and behavioral VHDL
– No parallelizing or beyond-basic-block (BBB) transformations

Cadence/Get2Chip A2C
– Traditional HLS; closely tied to logic synthesis
– No parallelizing or BBB transformations

Celoxica DK Design Suite
– Uses explicitly parallelized input in Handel-C; traditional HLS
– No pure behavioral input such as C or SystemC

Forte DS Cynthesizer
– Traditional HLS from SystemC with design space exploration
– No parallelizing or BBB transformations

Synfora (product: NA)
– Maps applications to a VLIW processor and a pipelined array of processors; uses parallelizing transformations in a VLIW compiler
– Does not do HLS at all; it is more of a mapping tool from C to a processor array

BlueSpec (product: NA)
– Based on term rewriting systems; starts from a description closer to RTL than to behavioral code
– Not HLS: the input is behavioral code already scheduled into states


What Do We Want To Do?

Make it accessible to SystemC and SystemVerilog
– Front-end architecture to port it across

Implement missing compiler passes
– Really standard stuff, but a missing piece now

Work out a design flow
– Build a path to the existing RTL flow, including validation

Industry-strength characterization

Secure IP rights


Synergistic Activities

SPARK release on the web
– Mailing list
– Build the users group
– Expand to the SystemC user community

Kluwer book in preparation
– Announcement at DATE, Feb 2004
– Availability at DAC, June 2004


Exit Strategy

Not yet worked out, but…

Build a stand-alone EDA company
– As a standalone, it would not work unless complemented by verification

Build to be bought
– As an HLS company

License the technology
– Companies that have shown interest in licensing it:
• Poseidon Systems, Cadence


SPARK History

A joint project
– Rajesh Gupta, Nikil Dutt, Alex Nicolau

Kicked off in fall 1999
– First Ph.D.: Sumit Gupta, 2003

Supported by
– Semiconductor Research Corporation (SRC)
– Intel grant as a match to UC Micro
– National Science Foundation


Copyright Sumit Gupta 2003

Case Study: Intel Instruction Length Decoder

Figure: a stream of instructions held in an instruction buffer enters the Instruction Length Decoder, which delimits the first, second, and third instructions.


ILD Synthesis: Resulting Architecture

Figure: speculating operations, fully unrolling the loop, and eliminating the loop index variable turn a multi-cycle sequential architecture into a single-cycle parallel architecture.

Our toolbox approach enables us to develop a script to synthesize applications from different domains.

The final design looks close to the actual implementation done by Intel.
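The restructuring described above can be sketched in Python on a toy instruction set. Everything here is hypothetical: `insn_length` assumes an instruction's length (1 to 4 bytes) is encoded in the low two bits of its first byte, which is nothing like real IA-32 length decoding, and the function names are illustrative only. The point is the shape of the transformation: the sequential form walks the buffer with a loop index that depends on the previous length, while the restructured form speculatively decodes a length at every byte offset and then resolves which offsets are real instruction starts.

```python
def insn_length(byte):
    # Hypothetical toy encoding: length = low two bits of the first byte + 1.
    return (byte & 0b11) + 1


def ild_sequential(buf):
    """Multi-cycle sequential form: one instruction per loop iteration,
    steered by a loop index that depends on the previous length."""
    starts = [False] * len(buf)
    i = 0
    while i < len(buf):
        starts[i] = True
        i += insn_length(buf[i])
    return starts


def ild_parallel(buf):
    """Restructured form: speculate, then resolve."""
    n = len(buf)
    # Speculate: decode a length at every byte offset. These decodes are
    # independent, so in hardware they would be n parallel decoders.
    lengths = [insn_length(b) for b in buf]
    # Resolve: a start at offset k marks offset k + lengths[k] as a start.
    # Fully unrolled, this is a chain of select logic over fixed offsets
    # rather than a stateful loop index.
    starts = [k == 0 for k in range(n)]
    for k in range(n):
        nxt = k + lengths[k]
        if starts[k] and nxt < n:
            starts[nxt] = True
    return starts


print(ild_parallel([0x00, 0x03, 0x10, 0x11, 0x12, 0x02, 0xFF, 0x01]))
# [True, True, False, False, False, True, False, False]
```

The Python loop in the resolve step merely stands in for unrolled logic; the design choice that matters is that no decoder waits on a previously computed length, which is what makes a single-cycle implementation possible at the cost of replicated decoders.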


Target Applications

Design            # of Ifs   # of Loops   # of Non-Empty Basic Blocks   # of Operations
MPEG-1 pred1             4            2                            17               123
MPEG-1 pred2            11            6                            45               287
MPEG-2 dp_frame         18            4                            61               260
GIMP tiler              11            2                            35               150


Scheduling & Logic Synthesis Results

Figure: normalized bar charts for the MPEG-1 Pred1 and MPEG-1 Pred2 functions. Metrics: Longest Path (l, in cycles), Critical Path (c, in ns), Total Delay (c × l), and Unit Area. Configurations, applied cumulatively: Non-speculative CMs (within BBs and across hierarchical blocks); + Speculative Code Motions; + Pre-Synthesis Transforms; + Dynamic CSE. Percentages annotated on the charts: 42%, 10%, 36%, 36%, 8%, 39%.

Overall: 63-66% improvement in delay

Almost constant area
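Since Total Delay is the product of the longest path in cycles (l) and the critical path, i.e. the clock period, in ns (c), a transformation helps only if it shrinks that product. A minimal Python sketch of how a relative delay improvement follows from the two metrics; the function names are hypothetical and the input numbers are illustrative, not read off the charts.

```python
def total_delay_ns(longest_path_cycles, clock_period_ns):
    # Total delay = cycles on the longest path (l) x clock period (c).
    return longest_path_cycles * clock_period_ns


def delay_improvement(base, optimized):
    """Fractional reduction in total delay; each argument is (l_cycles, c_ns)."""
    return 1.0 - total_delay_ns(*optimized) / total_delay_ns(*base)


# Hypothetical numbers: halving the cycle count while also trimming the clock.
print(delay_improvement((32, 8.0), (16, 6.0)))   # 0.625
# Halving the cycle count but doubling the clock period gains nothing.
print(delay_improvement((32, 8.0), (16, 16.0)))  # 0.0
```

The second case is why the charts report the critical path alongside the cycle count: aggressive chaining or speculation can cut cycles while lengthening the clock period.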


Scheduling & Logic Synthesis Results

Figure: normalized bar charts for the MPEG-2 DpFrame and GIMP Tiler functions. Metrics: Longest Path (l, in cycles), Critical Path (c, in ns), Total Delay (c × l), and Unit Area. Configurations, applied cumulatively: Non-speculative CMs (within BBs and across hierarchical blocks); + Speculative Code Motions; + Pre-Synthesis Transforms; + Dynamic CSE. Percentages annotated on the charts: 14%, 20%, 1%, 33%, 41%, 52%.

Overall: 48-76% improvement in delay

Almost constant area