Page 1

Flexible Hardware Design at Low Levels of Abstraction

Emil Axelsson

Hardware Description and Verification

May 2009

Page 2

Why low-level?

gadget a b = case a of
    2 -> thing (b+10)
    3 -> thing (b+20)
    _ -> fixNumber a

Related question: Why is some software written in C? (but the difference between high- and low-level is much greater in hardware)

Ideal: Software-like code → magic compiler → chip masks

Page 5

Why low-level?

Reality: “ASCII schematic” → chain of synthesis tools → chip masks

Reiterate to improve timing/power/area/etc.
    Very costly / time-consuming

Each fabrication costs ≈ $1,000,000

Page 6

Failing abstraction

Realistic flow cannot avoid low-level awareness

Paradox:
    Modern designs require a higher abstraction level...
    but...
    Modern chip technologies make abstraction harder

Main problem: Routing wires dominate signal delays and power consumption

Controlling the wires is key to performance!

Page 7

Gate vs. wire delay under scaling

[Figure: relative delay plotted against process technology node [nm]]

Page 8

Physical design level

Certain high-performance components (e.g. arithmetic) need to be designed at an even lower level

Physical level:
    A set of connected standard cells (implemented gates)
    Absolute or relative positions of cells (placement)
    Shape of connecting wires (routing)

Page 9

Physical design level

Design by interfacing to physical CAD tools
    Call automatic tools for certain tasks (mainly routing)

Often done through scripting code
    Tedious
    Hard to explore design space
    Limited design reuse

Aim of this work: Raise the abstraction level of physical design!

Page 11

Two ways to raise abstraction

Automatic synthesis
    + Powerful abstraction
    – May not be optimal for e.g. high-performance arithmetic
    – Opaque (hard to control the result)
    – Unstable (heuristics-based)

Language-based techniques (higher-order functions, recursion, etc.)
    + Transparent, stable
    – Still quite low-level
    – Somewhat limited to regular circuits

Our approach

Page 12

Lava

Gate-level hardware description in Haskell

Parameterized module generators: Haskell programs that generate circuits

Can be smart, e.g. optimize for speed in a given environment

Basic placement expressed through combinators

Used successfully to generate high-performance FPGA cores

Page 13

Wired: Extension to Lava

Finer control over geometry

More accurate performance models
    Feedback from timing/power analysis enables self-optimizing generators

Wire-awareness (unique to Wired)
    Performance analysis based on wire length estimates
    Control routing through “guides” (experimental)

...

Page 14

Monads in Haskell

Haskell functions are pure

Side-effects can be “simulated” using monads

add a b = do
    as <- get
    put (a:as)
    return (a+b)

prog = do
    a <- add 5 6
    b <- add a 7
    add b 8

The do-notation is syntactic sugar; it expands to a pure program with explicit state passing

*Main> runState prog []
(26, [18,11,5])

In the output pair, 26 is the result and [18,11,5] is the side-effect

Monads can also be used to model e.g. IO, exceptions, non-determinism etc.
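The claim that the monadic code expands to a pure program with explicit state passing can be made concrete. The following is a hand-written sketch of roughly what such an expansion looks like. The names addS and progS are hypothetical, and this is not the literal output of the compiler.

```haskell
-- Explicit state passing: each step takes the old state and returns
-- a pair of (result, new state). No monad is involved.
type Log = [Int]

-- Hypothetical pure counterpart of 'add': records a in the log, returns a+b.
addS :: Int -> Int -> Log -> (Int, Log)
addS a b as = (a + b, a : as)

-- Hypothetical pure counterpart of 'prog': the state is threaded by hand.
progS :: Log -> (Int, Log)
progS s0 =
    let (a, s1) = addS 5 6 s0
        (b, s2) = addS a 7 s1
    in  addS b 8 s2

main :: IO ()
main = print (progS [])  -- prints (26,[18,11,5])
```

Threading s0, s1, s2 by hand is exactly the bookkeeping that the do-notation hides.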

Page 15

Monad combinators

Haskell has a general and well-understood combinator library for monadic programs

*Main> runState (mapM (add 2) [11..13]) []
([13,14,15],[2,2,2])

*Main> runState (mapM (add 2 >=> add 4) [11..13]) []
([17,18,19],[4,2,4,2,4,2])
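To try the two sessions above without extra packages, here is a self-contained sketch. The State type below is a minimal stand-in written for this example (an assumption; the slides presumably use the standard Control.Monad.State), while mapM and >=> are the ordinary combinators from base.

```haskell
import Control.Monad ((>=>))

-- Minimal stand-in for a state monad, defined here only for illustration.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
    fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
    pure a = State $ \s -> (a, s)
    State f <*> State g = State $ \s ->
        let (h, s1) = f s
            (a, s2) = g s1
        in  (h a, s2)

instance Monad (State s) where
    State g >>= f = State $ \s ->
        let (a, s') = g s in runState (f a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- 'add' as on the slides: log the first argument, return the sum.
add :: Int -> Int -> State [Int] Int
add a b = do
    as <- get
    put (a:as)
    return (a+b)

main :: IO ()
main = do
    print (runState (mapM (add 2) [11..13]) [])            -- ([13,14,15],[2,2,2])
    print (runState (mapM (add 2 >=> add 4) [11..13]) [])  -- ([17,18,19],[4,2,4,2,4,2])
```

Nothing in mapM or >=> knows about State; the same combinators work for any monad, which is the point the slide makes.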

Page 16

Example: Parallel prefix

Given inputs

x1, x2, … xn

compute

y1 = x1
y2 = x1 ∘ x2
...
yn = x1 ∘ x2 ∘ … ∘ xn

for ∘, an associative (but not necessarily commutative) operator

Page 20

Parallel prefix

Very central component in microprocessors

Most common use: Computing carries in fast adders

Trying different operators:

Addition: prefix (+) [1,2,3,4]
    = [1, 1+2, 1+2+3, 1+2+3+4] = [1,3,6,10]

Boolean OR: prefix (||) [F,F,F,T,F,T,T,F]
    = [F,F,F,T,T,T,T,T]
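The two examples can be checked against a reference specification. In this sketch, prefix is simply defined as scanl1 from the Haskell Prelude. That is an assumption made for illustration (a serial specification of the result, not the network generators discussed in the talk).

```haskell
-- Reference specification of prefix: y_k = x_1 ∘ ... ∘ x_k.
-- scanl1 computes these serially; the networks compute the same values.
prefix :: (a -> a -> a) -> [a] -> [a]
prefix = scanl1

main :: IO ()
main = do
    print (prefix (+) [1, 2, 3, 4 :: Int])
    -- [1,3,6,10]
    print (prefix (||) [False, False, False, True, False, True, True, False])
    -- [False,False,False,True,True,True,True,True]
```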

Page 21

Parallel prefix

Implementation choices (relying on associativity):

prefix (∘) [x1,x2,x3,x4] = [y1,y2,y3,y4]

Serial: y4 = ((x1 ∘ x2) ∘ x3) ∘ x4

Parallel: y4 = (x1 ∘ x2) ∘ (x3 ∘ x4)

Sharing: y4 = y3 ∘ x4
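A small sketch of why the serial and parallel bracketings are interchangeable only for associative operators. The function names serial and parallel are hypothetical, chosen to match the labels above.

```haskell
-- Serial bracketing: ((x1 ∘ x2) ∘ x3) ∘ x4
serial :: (a -> a -> a) -> [a] -> a
serial = foldl1

-- Parallel bracketing: split in the middle and combine the two halves,
-- e.g. (x1 ∘ x2) ∘ (x3 ∘ x4) for four inputs.
parallel :: (a -> a -> a) -> [a] -> a
parallel _ []  = error "parallel: empty list"
parallel _ [x] = x
parallel f xs  = parallel f ls `f` parallel f rs
  where
    (ls, rs) = splitAt (length xs `div` 2) xs

main :: IO ()
main = do
    -- (++) is associative but not commutative: both bracketings agree.
    print (serial (++) ["x1", "x2", "x3", "x4"])    -- "x1x2x3x4"
    print (parallel (++) ["x1", "x2", "x3", "x4"])  -- "x1x2x3x4"
    -- (-) is not associative: the bracketings give different answers.
    print (serial (-) [10, 3, 2, 1 :: Int])         -- 4
    print (parallel (-) [10, 3, 2, 1 :: Int])       -- 6
```

The serial form has depth n-1, while the balanced form has logarithmic depth; sharing intermediate results between outputs is what distinguishes the different prefix networks.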

Page 22

There are many of them...

Sklansky

Brent-Kung

Ladner-Fischer

Page 23

Parallel prefix: Sklansky

sklansky op [a] = return [a]
sklansky op as = do
    let k       = length as `div` 2
        (ls,rs) = splitAt k as
    ls'  <- sklansky op ls
    rs'  <- sklansky op rs
    rs'' <- sequence [op (last ls', r) | r <- rs']
    return (ls' ++ rs'')

Simplest approach (divide-and-conquer)

Purely structural (no geometry)

Could have been (monadic) Lava
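Because the purely structural version only needs a monad, it can be run outside any hardware library. The sketch below instantiates it in the Identity monad from base and compares it with a serial scan. The type signature, the extra empty-list case, and the helper prefixVia are additions made for this example and are not from the slides.

```haskell
import Data.Functor.Identity (runIdentity)

-- Structural Sklansky network, generic in the monad m.
-- The empty-list case is an addition so the function is total.
sklansky :: Monad m => ((a, a) -> m a) -> [a] -> m [a]
sklansky _  []  = return []
sklansky _  [a] = return [a]
sklansky op as  = do
    let k        = length as `div` 2
        (ls, rs) = splitAt k as
    ls'  <- sklansky op ls
    rs'  <- sklansky op rs
    -- Combine the last output of the left half with every right output.
    rs'' <- sequence [op (last ls', r) | r <- rs']
    return (ls' ++ rs'')

-- Hypothetical helper: run the network on plain values.
prefixVia :: (a -> a -> a) -> [a] -> [a]
prefixVia f = runIdentity . sklansky (\(x, y) -> return (f x y))

main :: IO ()
main = do
    print (prefixVia (+) [1 .. 8 :: Int])        -- [1,3,6,10,15,21,28,36]
    print (prefixVia (++) ["a", "b", "c", "d"])  -- ["a","ab","abc","abcd"]
```

The string example uses a non-commutative operator, which checks that the operands are combined in the right order.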

Page 24

Refinement: Add placement

sklansky op [a] = space cellWidth [a]
sklansky op as = downwards 1 $ do
    let k       = length as `div` 2
        (ls,rs) = splitAt k as
    (ls',rs') <- rightwards 0 $ liftM2 (,)
        (sklansky op ls)
        (sklansky op rs)
    rs'' <- rightwards 0 $
              sequence [op (last ls', r) | r <- rs']
    return (ls' ++ rs'')

Page 25

Sklansky with placement

Simple PostScript output allows interactive development of placement

Page 26

Refinement: Add routing guides

bus = rightwards 0 . mapM bus1
  where
    bus1 = space 2750 >=> guide 3 500 >=> space 1250

sklanskyIO op = downwards 0
      $ inputList 16 "in"
    >>= bus
    >>= space 1000
    >>= sklansky op
    >>= space 1000
    >>= bus
    >>= output "out"

Reusing standard (monadic) Haskell combinators (nothing Wired-specific)

Page 27

Sklansky with guides

Page 28

Refinement: More guides

sklansky op [a] = space cellWidthD [a]
sklansky op as  = downwards 1 $ do
    bus as
    let k       = length as `div` 2
        (ls,rs) = splitAt k as
    (ls',rs') <- rightwards 0 $ liftM2 (,)
        (sklansky op ls)
        (sklansky op rs)
    rs'' <- rightwards 0 $
              sequence [op (last ls', r) | r <- rs']
    bus (ls' ++ rs'')

Page 29

Sklansky with guides

Page 30

Experiment: Compaction

sklansky op [a] = space cellWidthD [a]

sklansky op [a] = return [a]

Buses were compacted separately

Page 31

Export to CAD tool (Cadence SoC Encounter)

Auto-routed in Encounter

Odd rows flipped to share power rails

Simple change in recursive call:

sklansky (flipY.op) ls

Exchanged using DEF file format

Page 32

Fast, low-power prefix networks

Mary Sheeran has developed circuit generators in Lava that search for fast, low-power parallel prefix networks

Initially, crude performance models
    Delay: Logical depth
    Power: Number of operators

Still good results

Now using Wired to improve accuracy
    Static timing/power analysis using models from cell library
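The crude models (delay as logical depth, power as number of operators) can be sketched without any hardware library by running a structural network over depth values while counting operator applications. This is an illustrative assumption about how such a model might be computed, not Sheeran's search code. The names crudeCost and countingOp are hypothetical, and the counter uses the standard writer-style Monad instance for pairs from base.

```haskell
import Data.Monoid (Sum (..))

-- Structural Sklansky network, generic in the monad
-- (empty-list case added for totality).
sklansky :: Monad m => ((a, a) -> m a) -> [a] -> m [a]
sklansky _  []  = return []
sklansky _  [a] = return [a]
sklansky op as  = do
    let k        = length as `div` 2
        (ls, rs) = splitAt k as
    ls'  <- sklansky op ls
    rs'  <- sklansky op rs
    rs'' <- sequence [op (last ls', r) | r <- rs']
    return (ls' ++ rs'')

-- Crude cost of an n-input network (n >= 1):
-- (logical depth, number of operators).
crudeCost :: Int -> (Int, Int)
crudeCost n = (maximum depths, ops)
  where
    -- Every signal carries its depth; inputs have depth 0.
    (Sum ops, depths) = sklansky countingOp (replicate n 0)
    -- Each operator adds one level and contributes one to the count.
    countingOp (x, y) = (Sum 1, 1 + max x y)

main :: IO ()
main = do
    print (crudeCost 8)   -- (3,12)
    print (crudeCost 16)  -- (4,32)
```

Neither number says anything about wire length or fanout, which is the gap the Wired-based analysis is meant to close.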

Page 34

Minimal change to search algorithm

prefix f p = memo pm
  where
    pm ([],w)  = perhaps id' ([],w)
    pm ([i],w) = perhaps id' ([i],w)
    pm (is,w) | 2^(maxd(is,w)) < length is = Fail
    pm (is,w)
      = (bestOn is f . dropFail)
          [ wrpC ds (prefix f p) (prefix p p)
            | ds <- igen ... ]
        where
          wrpC ds p1 p2 =
            wrp ds (perhaps id' c) (p1 c1) (p2 c2)
          ...

Plug in cost functions that analyze the placed network through Wired

Page 35

85 bits, depth 8

Page 36

85 bits, depth 8

Page 37

Design exploration

85 inputs, depth 8, varying allowed fanout

Fanout   Delay [ns]   Power [mW]
7        0.646        15.2
8        0.628        15.7
9        0.624        15.9
10       0.620        16.1

At 128 bits, minimum depth is slower than going one deeper (crude delay model fails)

Accurate model consistent with timing report from Encounter

Page 38

[Figure panels: Fanout 7, Fanout 8, Fanout 9, Fanout 10]

Page 40

Binary multiplication

         101100     (44)
       * 001011     (11)

         101100
        101100
       000000       “Partial products”
      101100
     000000
  + 000000

   000111100100     (484)

1) Generate the partial products (PPs)
2) Sum the partial products
    a) Sum until two terms left
    b) Add the two remaining terms

Not in this talk
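The worked example 44 * 11 = 484 can be reproduced at the level of whole partial products. This is a minimal arithmetic sketch; the function name partialProducts and its width parameter are hypothetical, and no circuit structure is modelled.

```haskell
import Data.Bits (shiftL, testBit)

-- One partial product per bit of the multiplier b:
-- the multiplicand a shifted left by i if bit i of b is set, else 0.
partialProducts :: Int -> Int -> Int -> [Int]
partialProducts width a b =
    [ if testBit b i then a `shiftL` i else 0 | i <- [0 .. width - 1] ]

main :: IO ()
main = do
    let pps = partialProducts 6 44 11
    print pps        -- [44,88,0,352,0,0]
    print (sum pps)  -- 484
```

The six entries correspond to the six rows of the multiplication layout, from top to bottom.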

Page 41

Column compression multipliers

         101100
       * 001011

         101100
        101100
       000000
      101100
     000000
  + 000000

Use full adders to compress the bits in each column until only two bits remain

Each full adder produces a carry which is forwarded to the next column

Different strategies for the order in which the bits are processed yield very different characteristics (e.g. linear vs. logarithmic depth)
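A sketch of column compression on bit level, under the assumption of one arbitrary processing order (oldest bits first, with each sum bit queued at the end of its column). It does not implement any particular published scheme, and all names are hypothetical. It only demonstrates that full adders preserve the represented value while reducing every column to at most two bits.

```haskell
import Data.Bits (testBit)

type Column = [Bool]  -- bits of equal weight; columns are listed LSB first

-- A full adder turns three bits of weight w into a sum bit (weight w)
-- and a carry bit (weight 2w).
fullAdder :: Bool -> Bool -> Bool -> (Bool, Bool)
fullAdder a b c = (s, carry)
  where
    s     = (a /= b) /= c
    carry = (a && b) || (c && (a /= b))

-- Compress one column until at most two bits remain; also return the
-- carries that must be forwarded to the next column.
compressColumn :: Column -> (Column, [Bool])
compressColumn (a : b : c : rest) = (col, carry : carries)
  where
    (s, carry)     = fullAdder a b c
    (col, carries) = compressColumn (rest ++ [s])
compressColumn col = (col, [])

-- Process columns from least to most significant, forwarding carries.
compress :: [Column] -> [Column]
compress = go []
  where
    go []      []           = []
    go carries []           = go [] [carries]
    go carries (col : cols) = col' : go carries' cols
      where (col', carries') = compressColumn (col ++ carries)

-- Partial-product bits a_i && b_j, grouped by weight k = i + j.
ppColumns :: Int -> Int -> Int -> [Column]
ppColumns width a b =
    [ [ testBit a i && testBit b (k - i)
      | i <- [0 .. width - 1], k - i >= 0, k - i < width ]
    | k <- [0 .. 2 * width - 2] ]

-- Number represented by a list of columns.
value :: [Column] -> Integer
value cols =
    sum [ 2 ^ k * fromIntegral (length (filter id col))
        | (k, col) <- zip [0 :: Int ..] cols ]

main :: IO ()
main = do
    let cols = compress (ppColumns 6 44 11)
    print (all ((<= 2) . length) cols)  -- True
    print (value cols)                  -- 484
```

Changing which three bits compressColumn picks first changes the depth of the resulting adder tree but not the value, which is the degree of freedom the slide refers to.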

Page 42

High-performance multiplier (HPM)

Multiplier reduction tree with logarithmic logic depth and regular connectivity.
Eriksson, Sheeran, et al. ISCAS '06.

Simple scheme:
    Process PP signals first
    Process full adder output bits “as late as possible”
    Prioritize carry bits

Page 43

Purely structural version (≈ Lava)

Show code...

Page 44

Refinement 1

Page 45

Refinement 2

Page 46

Refinement 3

Page 47

Rectangular transform

Page 49

Using reduction tree in real design

By Kasyab, Ph.D. student at Computer Engineering

Page 50

Summary

Wire-aware hardware design methods needed

Wired offers flexible hardware design at low levels of abstraction

Sklansky
    At Intel: 1000 lines of scripting code (Perl)
    In Wired: <50 lines (though fewer details)

Layout-/wire-aware design exploration

Page 51

Get Wired

Install Haskell Platform (to get the Cabal tool):
http://hackage.haskell.org/platform/

Install Wired:
> cabal install Wired

Manual download:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Wired