CALTECH CS137 Spring2002 -- DeHon CS137: Electronic Design Automation Day 13: May 20, 2002 Page...


CALTECH CS137 Spring2002 -- DeHon

CS137: Electronic Design Automation

Day 13: May 20, 2002

Page Generation

(Area and IO Constraints)

[working problem with Eylon Caspi]


Today

• Cover/clustering
  – Minimize weight
  – With area and IO constraints
• Motivation: SCORE page generation
  – Also energy minimization
• Techniques
• Current results
• FPGA/hardware implementation?


Abstract Problem

• Given: Graph (V,E) with a single weight (area) on each node and two weights (IO, cost) on the edges.

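• Cluster nodes into subsets $V_i$ such that
  – $\sum_i \mathrm{Cost}(V_i)$ is minimized
  – $\mathrm{IO}(V_i) < \mathrm{IO}_{\text{limit}}$
  – $A(V_i) < \mathrm{Area}_{\text{limit}}$
  – $\mathrm{Cost}(V_i) = \sum \{\, \mathrm{cost}(e) \mid e \in E \text{ s.t. } e_1 \in V_i \text{ and } e_2 \notin V_i \,\}$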
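To make the objective and constraints concrete, here is a minimal Python sketch that scores a given clustering. The data layout (a dict of node areas, a list of `(u, v, io, cost)` edge tuples) and the function names are assumptions made for illustration; the slides do not specify a representation. An edge counts toward a cluster's IO and cost only when exactly one of its endpoints lies in that cluster.

```python
def evaluate_clusters(area, edges, clusters):
    """Return per-cluster area, IO, and cost for a clustering.

    area:     {node: area}
    edges:    list of (u, v, io, cost) tuples (hypothetical layout)
    clusters: list of node sets that together contain every node once
    """
    owner = {v: i for i, members in enumerate(clusters) for v in members}
    stats = [{"area": sum(area[v] for v in members), "io": 0, "cost": 0.0}
             for members in clusters]
    for u, v, io, cost in edges:
        if owner[u] != owner[v]:
            # A cut edge is exposed on both clusters it touches.
            for i in (owner[u], owner[v]):
                stats[i]["io"] += io
                stats[i]["cost"] += cost
    return stats


def is_feasible(stats, io_limit, area_limit):
    # Strict inequalities, as written on the slide.
    return all(s["io"] < io_limit and s["area"] < area_limit for s in stats)


def total_cost(stats):
    # Objective: sum of Cost(V_i) over all clusters.
    return sum(s["cost"] for s in stats)


if __name__ == "__main__":
    area = {"a": 2, "b": 3, "c": 1, "d": 2}
    edges = [("a", "b", 1, 0.9), ("b", "c", 2, 0.1), ("c", "d", 1, 0.8)]
    stats = evaluate_clusters(area, edges, [{"a", "b"}, {"c", "d"}])
    print(stats, is_feasible(stats, io_limit=3, area_limit=6), total_cost(stats))
```

Note that summing Cost(V_i) over clusters counts every cut edge twice, once per side; that scales the objective by a constant and does not change which clustering is best.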


SCORE Compilation

Programming Model
• Graph of TDF FSMD operators
  – unlimited size, # IOs
  – no timing constraints

Execution Model
• Graph of page configs
  – fixed size, # IOs
  – timed, single-cycle firing

[Figure: "Compile" step from the programming model (TDF operators, streams, memory segments) to the execution model (compute pages, streams, memory segments)]


How Big is an Operator?

• Wavelet Decode
• Wavelet Encode
• JPEG Encode
• MPEG Encode

[Figure: Area for 47 Operators (Before Pipeline Extraction). x-axis: operator (sorted by area); y-axis: area (4-LUTs), 0 to 3500; series: FSM area, DF area]

• JPEG Encode
• JPEG Decode
• MPEG (I)
• MPEG (P)
• Wavelet Encode
• IIR


Clustering is Critical

• Inter-page communication latency may be long
• Inter-page feedback loops are slow
• Cluster to:
  – Fit feedback loops within page
  – Fit feedback loops on device


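Pipeline Extraction

• Hoist uncontrolled feed-forward (FF) data-flow out of FSMD
• Benefits:
  – Shrink FSM cyclic core
  – Extracted pipeline has more freedom for scheduling and partitioning

Example (Extract):
  – Before: state foo(i): acc=acc+2*i
  – After: state foo(two_i): acc=acc+two_i, with two_i produced from i by an extracted *2 pipeline

[Figure: FSMD (state, CF, DF) fed by input i through a pipeline; after extraction, a *2 stage turns i into two_i ahead of the FSMD]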
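The `foo` example can be mimicked in a few lines of Python to show that extraction preserves behavior. This is only an illustration of the idea, not TDF: the function names are invented, and Python generators stand in for streams.

```python
def fsm_original(inputs):
    """State foo before extraction: the multiply sits inside the FSM."""
    acc = 0
    for i in inputs:
        acc = acc + 2 * i
    return acc


def times_two_pipeline(inputs):
    """Extracted feed-forward stage: stateless, independent of FSM control."""
    for i in inputs:
        yield 2 * i


def fsm_extracted(two_i_stream):
    """State foo after extraction: only the accumulate stays in the FSM."""
    acc = 0
    for two_i in two_i_stream:
        acc = acc + two_i
    return acc


if __name__ == "__main__":
    data = [1, 2, 3]
    print(fsm_original(data), fsm_extracted(times_two_pipeline(data)))
```

Because the multiply depends on neither `acc` nor the current state, it can run ahead of the FSM as a separately scheduled and partitioned pipeline stage.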


Pipeline Extraction – Extractable Area

[Figure: Extractable Data-Path Area for 47 Operators. x-axis: operator (sorted by data-path area); y-axis: area (4-LUTs), 0 to 3500; series: extracted DF area, residual DF area]

• JPEG Encode
• JPEG Decode
• MPEG (I)
• MPEG (P)
• Wavelet Encode
• IIR


Page Generation

• Pipeline extraction
  – Removes dataflow that can be freely extracted from FSMD control
• Still have to partition potentially large FSMs
  – Approach: turn into a clustering problem


State Clustering

• Start: consider each state to be a unit
• Cluster states into page-size sub-FSMDs
  – Inter-page transitions become streams
• Possible clustering goals:
  – Minimize delay (inter-page latency)
  – Minimize IO (inter-page BW)
  – Minimize area (fragmentation)

[Figure labels: IA, IB, OA, OB]


State Clustering to Minimize Inter-Page State Transfer

• Inter-page state transfer is slow
• Cluster to:
  – Contain feedback loops
  – Minimize frequency of inter-page state transfer
• Previously used in:
  – VLIW trace scheduling [Fisher ‘81]
  – FSM decomposition for low power [Benini/DeMicheli ISCAS ‘98]
  – VM/cache code placement
  – GarpCC code selection [Callahan ‘00]


Clustering Problem

• SCORE page
  – Fixed area (# of LUTs)
  – Fixed IO
• Cost on an edge is the probability of taking that state transition
• Clustering goal is to minimize page-to-page transitions
  – Maximize expected transitions within the same page
  – Find page-count/page-transition tradeoff curve


Abstract Problem

• Given: Graph (V,E) with a single weight (area) on each node and two weights (IO, cost) on the edges.

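• Cluster nodes into subsets $V_i$ such that
  – $\sum_i \mathrm{Cost}(V_i)$ is minimized
  – $\mathrm{IO}(V_i) < \mathrm{IO}_{\text{limit}}$
  – $A(V_i) < \mathrm{Area}_{\text{limit}}$
  – $\mathrm{Cost}(V_i) = \sum \{\, \mathrm{cost}(e) \mid e \in E \text{ s.t. } e_1 \in V_i \text{ and } e_2 \notin V_i \,\}$

[Slide annotations: subsets $V_i$ = pages; cost = inter-page communication frequency]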


DSM

• Possibly relevant for minimizing delay in DSM
• Previously discussed:
  – Larger area means longer wires, slower
  – Want to cluster logic locally
• Maybe:
  – Cluster common computations together
  – Make distant computation transfer uncommon


Island Packing for Energy

• Modern FPGAs pack clusters of LUTs into an endpoint
  – e.g. Altera LAB
• Local wiring costs less energy than long wiring
• Covering for energy:
  – Minimize exposed activity factor
  – Same covering problem


Abstract Problem

• Given: Graph (V,E) with a single weight (area) on each node and two weights (IO, cost) on the edges.

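• Cluster nodes into subsets $V_i$ such that
  – $\sum_i \mathrm{Cost}(V_i)$ is minimized
  – $\mathrm{IO}(V_i) < \mathrm{IO}_{\text{limit}}$
  – $A(V_i) < \mathrm{Area}_{\text{limit}}$
  – $\mathrm{Cost}(V_i) = \sum \{\, \mathrm{cost}(e) \mid e \in E \text{ s.t. } e_1 \in V_i \text{ and } e_2 \notin V_i \,\}$

[Slide annotations: subsets $V_i$ = clusters/islands; cost = switching activity]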


First Try

• Use FBB (flow cut) [Wong/cs137a:day7]
• Pick seed element
• Compute mincut
  – On mix of IO, cost edge weights?
• If too small,
  – Cluster in node and repeat
• Else
  – Cluster out node and repeat
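The loop above can be sketched in Python as follows. This is a simplified illustration in the spirit of flow-based clustering, not the FBB implementation from the cited lecture: the max-flow routine is a plain Edmonds-Karp, the single `weights` dict stands for whatever IO/cost mix is chosen, and the rule for which node to cluster in or out (alphabetically first candidate) is an arbitrary placeholder, since the slides themselves leave that choice open.

```python
from collections import deque

INF = float("inf")


def source_side(weights, src, snk):
    """Nodes on the source side of a min cut separating src from snk.

    weights: {(u, v): w} undirected edges; src, snk: disjoint node sets.
    "S*" and "T*" are a super source and super sink (hypothetical names).
    """
    res = {}  # residual capacities

    def add(u, v, w):
        res.setdefault(u, {})
        res.setdefault(v, {})
        res[u][v] = res[u].get(v, 0) + w
        res[v][u] = res[v].get(u, 0)

    for (u, v), w in weights.items():
        add(u, v, w)
        add(v, u, w)
    for v in src:
        add("S*", v, INF)
    for v in snk:
        add(v, "T*", INF)

    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {"S*": None}
        queue = deque(["S*"])
        while queue:
            u = queue.popleft()
            for v, cap in res[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "T*" not in parent:
            return set(parent) - {"S*"}
        path, v = [], "T*"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= flow
            res[v][u] += flow


def grow_cluster(weights, area, seed, sink, area_lo, area_hi):
    """Repeat mincut around seed until area lands in [area_lo, area_hi]."""
    neighbors = {}
    for u, v in weights:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    src, snk = {seed}, {sink}
    while True:
        side = source_side(weights, src, snk)
        size = sum(area[v] for v in side)
        if size < area_lo:
            # Too small: cluster in one more node adjacent to the cut side.
            frontier = sorted(n for v in side for n in neighbors.get(v, ())
                              if n not in side and n not in snk)
            if not frontier:
                return side
            src = side | {frontier[0]}
        elif size > area_hi:
            # Too big: cluster out one node by forcing it to the sink side.
            movable = sorted(side - src)
            if not movable:
                return side
            snk = snk | {movable[0]}
        else:
            return side


if __name__ == "__main__":
    w = {("a", "b"): 1, ("b", "c"): 3, ("c", "d"): 3}
    print(grow_cluster(w, {n: 1 for n in "abcd"}, "a", "d", 2, 2))
```

Each iteration adds a node to either the forced-source or forced-sink set, so the loop terminates, but nothing in it bounds the IO of the result.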


Mincut lessons

• Couldn’t consistently control IO
  – Non-monotonic results adjusting weight
• Not clear what to cluster in


Idea #2

• If we had an ordering of nodes
  – (wishful thinking)
• Then easy to know how to include more
  – Just pick the next node
• Order: 1D list of nodes
• Cluster: a contiguous sequence of nodes in list
  – Specify start, finish


From Sequence to Clusters

• Easy to know if a contiguous subsequence
  – Meets area constraints
  – Meets IO constraints
• Cover
  – Set of (non-overlapping) subsequences
  – Include all nodes
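As an illustration of why the check is easy, the sketch below enumerates every contiguous subsequence of a given order and records the feasible ones. The representation (`order` list, `area` dict, `(u, v, io, cost)` edge tuples, half-open `(start, end)` intervals) is assumed for the example. Area and exposed IO are updated incrementally as each node is appended, assuming non-negative areas and no self-loops.

```python
def feasible_intervals(order, area, edges, io_limit, area_limit):
    """Map (start, end) -> (area, io, cost) for each feasible order[start:end]."""
    pos = {v: k for k, v in enumerate(order)}
    adj = [[] for _ in order]  # per position: (other position, io, cost)
    for u, v, io, cost in edges:
        adj[pos[u]].append((pos[v], io, cost))
        adj[pos[v]].append((pos[u], io, cost))

    table = {}
    for start in range(len(order)):
        area_sum = io_sum = cost_sum = 0
        for end in range(start + 1, len(order) + 1):
            k = end - 1  # position of the node being appended
            area_sum += area[order[k]]
            if area_sum >= area_limit:
                break  # area only grows as the interval is extended
            for other, io, cost in adj[k]:
                if start <= other < k:
                    # Edge is now internal to the cluster: no longer exposed.
                    io_sum -= io
                    cost_sum -= cost
                else:
                    io_sum += io
                    cost_sum += cost
            if io_sum < io_limit:
                # IO is not monotonic, so keep extending even when it fails.
                table[(start, end)] = (area_sum, io_sum, cost_sum)
    return table


if __name__ == "__main__":
    order = ["a", "b", "c", "d"]
    area = {"a": 2, "b": 3, "c": 1, "d": 2}
    edges = [("a", "b", 1, 0.5), ("b", "c", 2, 0.25), ("c", "d", 1, 0.5)]
    for interval, stats in sorted(
            feasible_intervals(order, area, edges, 3, 6).items()):
        print(interval, stats)
```

In this small example node `b` alone violates the IO limit while `a,b` and `b,c` do not, which shows that adding a node can reduce exposed IO.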


Feasible Clusters (mult16a)


Covering

• Not clear when to put more or less stuff in a cluster…versus leave with next cluster
  – Can’t build clusters greedily
• Like the associative/parenthesization problem we saw earlier [day 5]


Parenthesis Matching

• Similar
• But compute from all breaks across a diagonal
  – Not just nearest neighbor
• Hence extra O(N)

[Figure from Day 5]


Dynamic Programming

• For each subsequence (start, end)
  – Either the area and IO match (the subsequence fits as one cluster)
  – OR want to find a breakpoint between cluster sets
    • Cluster sets start→midpoint, midpoint→end may each be either single or multiple clusters
• Different splits may
  – Minimize number of clusters
  – Minimize cost
  – Keep dominator set [day11]
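The recurrence can be sketched as a memoized interval dynamic program. This is a minimal version under stated assumptions: the input is a hypothetical dict of feasible intervals with their exposed cost (half-open `(start, end)` positions in the linear order), and only total cost is minimized. The full set of non-dominated (cluster count, cost) solutions mentioned on the slide is not kept.

```python
from functools import lru_cache


def best_cover(n, single_cost):
    """Cheapest cover of positions 0..n-1 by feasible contiguous clusters.

    single_cost: {(start, end): cost} for every feasible interval;
                 infeasible intervals are simply absent.
    Returns (total_cost, [(start, end), ...]) or None if no cover exists.
    """
    @lru_cache(maxsize=None)
    def solve(start, end):
        # Case 1: the whole subsequence fits in a single cluster.
        if (start, end) in single_cost:
            return single_cost[(start, end)], ((start, end),)
        # Case 2: try every breakpoint; each half may itself be
        # one cluster or several.
        best = None
        for mid in range(start + 1, end):
            left, right = solve(start, mid), solve(mid, end)
            if left is None or right is None:
                continue
            candidate = (left[0] + right[0], left[1] + right[1])
            if best is None or candidate[0] < best[0]:
                best = candidate
        return best

    result = solve(0, n)
    return None if result is None else (result[0], list(result[1]))


if __name__ == "__main__":
    feasible = {(0, 1): 0.5, (0, 2): 0.25, (1, 3): 1.0,
                (2, 4): 0.25, (3, 4): 0.5}
    print(best_cover(4, feasible))
```

Taking a feasible single cluster without trying splits is safe for the cost objective when edge costs are non-negative, because splitting a cluster can only expose more edges.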


Algorithm

• Compute linear order
• Compute IO, area on each subsequence
  – Think NxN table (but sparse)
• Use dynamic programming to cover


Compute Order?

• Could experiment with various techniques

• Considering: Spectral Ordering
  – [Hall/cs137a:day7]
• How to weight edges?
  – IO, cost, mix?
  – Try linear mix…vary mix weighting
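One way to experiment with a linear mix is sketched below: edges are weighted `mix*io + (1-mix)*cost`, and nodes are sorted by an approximation of the Laplacian eigenvector with the second-smallest eigenvalue (the usual basis for spectral ordering). The pure-Python power iteration, the `mix` parameter name, and the iteration count are illustrative assumptions, and the sketch expects a connected graph with at least two nodes.

```python
def spectral_order(nodes, edges, mix=0.5, iters=500):
    """Order nodes along an approximate Fiedler vector.

    edges: list of (u, v, io, cost); weight = mix*io + (1 - mix)*cost.
    """
    n = len(nodes)
    idx = {v: k for k, v in enumerate(nodes)}
    w = [[0.0] * n for _ in range(n)]
    for u, v, io, cost in edges:
        weight = mix * io + (1.0 - mix) * cost
        w[idx[u]][idx[v]] += weight
        w[idx[v]][idx[u]] += weight
    degree = [sum(row) for row in w]

    # Laplacian eigenvalues are at most 2*max(degree), so shift*I - L is
    # positive definite and its top eigenvectors are L's bottom ones.
    shift = 2.0 * max(degree) + 1.0
    x = [k - (n - 1) / 2.0 for k in range(n)]  # orthogonal to all-ones
    for _ in range(iters):
        y = [(shift - degree[i]) * x[i]
             + sum(w[i][j] * x[j] for j in range(n))
             for i in range(n)]
        mean = sum(y) / n
        y = [value - mean for value in y]       # remove constant component
        norm = sum(value * value for value in y) ** 0.5
        x = [value / norm for value in y]
    return [v for _, v in sorted(zip(x, nodes))]


if __name__ == "__main__":
    nodes = ["c", "a", "d", "b"]
    edges = [("a", "b", 1, 1.0), ("b", "c", 1, 1.0), ("c", "d", 1, 1.0)]
    print(spectral_order(nodes, edges))
```

The direction of the resulting order is arbitrary (an eigenvector and its negation are equally valid), which does not matter for forming contiguous clusters.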


Weight Mix

• Why unclear?
  – IO weight is good for clustering connectivity
    • If IOs are limited, allows use of fewer clusters
    • Pack more stuff into a page, so fewer cases need to transition
  – Cost weight is what we’re minimizing
    • Cluster high-cost edges together
    • Hide in page
  – But, cost ordering may get less stuff in a page if poorly IO clustered…


spp results

• [see HTML]


Versus Weighting (w by 0.01)


Discussion

• Promising results
  – New capability; not clear what to compare to
    • Maybe LUT clustering to validate algorithm
  – Absolutes look promising
• Weighting
  – Not clear how to search for best
  – Maybe should try other ways of weighting?
    • [Michael suggests try taking log(trans)]


Spatial/Hdw Implementation?

• Compute linear order
  – Use 1D FDSA?
• Compute IO, area on each subsequence
  – Parallel prefix sum scan
    • One for each start point?
• Use dynamic programming to cover
  – Like parenthesis
  – Maybe 1D and combine with area/IO scan?


Promising Ideas

• Compute good ordering
  – Easy to vary inclusion when you know what’s next to include/exclude
• Mix weights
• Cluster to minimize exposed (cut) costs