CALTECH CS137 Spring2002 -- DeHon CS137: Electronic Design Automation Day 13: May 20, 2002 Page...

CALTECH CS137 Spring2002 -- DeHon

CS137:Electronic Design Automation

Day 13: May 20, 2002

Page Generation

(Area and IO Constraints)

[working problem with Eylon Caspi]

• Cover/clustering
  – Minimize weight
  – With area and IO constraints
• Motivation: SCORE page generation
  – Also energy minimization
• Techniques
• Current results
• FPGA/hardware implementation?

Abstract Problem

• Given: graph (V,E) with a single weight (area) on each node and two weights (IO, cost) on each edge.
• Cluster nodes into subsets Vi such that
  – Σi Cost(Vi) is minimized
  – IO(Vi) < IO limit
  – A(Vi) < Area limit
  – Cost(Vi) = Σ (cost(e) | e ∈ E s.t. e1 ∈ Vi and e2 ∉ Vi)
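As one concrete reading of the problem statement above, the following minimal Python sketch evaluates a single candidate clustering. The graph encoding (dicts keyed by node and by directed edge), the function names, the choice to count IO on edges crossing a cluster boundary in either direction, and the example numbers are all illustrative assumptions, not part of the lecture.

```python
# Sketch only: data layout and names are assumptions for illustration.

def cluster_metrics(cluster, area, edges):
    """Return (area, io, cost) for one cluster.

    area:  {node: area}
    edges: {(src, dst): (io, cost)}, directed edges
    IO counts every edge crossing the cluster boundary (assumed);
    cost counts edges leaving the cluster (src inside, dst outside).
    """
    members = set(cluster)
    total_area = sum(area[v] for v in members)
    io = 0
    cost = 0.0
    for (src, dst), (e_io, e_cost) in edges.items():
        src_in, dst_in = src in members, dst in members
        if src_in != dst_in:      # edge crosses the boundary
            io += e_io
            if src_in:            # edge leaves the cluster
                cost += e_cost
    return total_area, io, cost


def evaluate(clusters, area, edges, area_limit, io_limit):
    """Return (feasible, total_cost) for a list of disjoint clusters."""
    feasible = True
    total_cost = 0.0
    for c in clusters:
        a, io, cost = cluster_metrics(c, area, edges)
        feasible = feasible and a < area_limit and io < io_limit
        total_cost += cost
    return feasible, total_cost


# Hypothetical four-state graph; costs play the role of edge weights to hide.
area = {"s0": 3, "s1": 2, "s2": 4, "s3": 1}
edges = {
    ("s0", "s1"): (1, 0.9),
    ("s1", "s0"): (1, 0.8),
    ("s1", "s2"): (1, 0.2),
    ("s2", "s3"): (1, 1.0),
    ("s3", "s2"): (1, 0.5),
}
paired = evaluate([["s0", "s1"], ["s2", "s3"]], area, edges,
                  area_limit=6, io_limit=3)
singletons = evaluate([[v] for v in area], area, edges,
                      area_limit=6, io_limit=3)
```

In this made-up example, pairing the states hides every high-cost edge inside a cluster and exposes only the low-cost s1 to s2 edge, while the one-state-per-cluster cover exposes all of them and also breaks the IO limit at s1.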

SCORE Compilation

Programming Model
• Graph of TDF FSMD operators
  – unlimited size, # IOs
  – no timing constraints

Execution Model
• Graph of page configs
  – fixed size, # IOs
  – timed, single-cycle firing

[Figure: Compile step mapping a graph of TDF operators, streams, and memory segments onto compute pages, streams, and memory segments]

How Big is an Operator?

• Wavelet Decode
• Wavelet Encode
• JPEG Encode
• MPEG Encode

[Figure: Area for 47 Operators (Before Pipeline Extraction). x-axis: operator (sorted by area); series: FSM area, DF area. Applications: JPEG Encode, JPEG Decode, MPEG (I), MPEG (P), Wavelet Encode, IIR]

Clustering is Critical

• Inter-page comm. latency may be long
• Inter-page feedback loops are slow
• Cluster to:
  – Fit feedback loops within page
  – Fit feedback loops on device

Pipeline Extraction

• Hoist uncontrolled FF data-flow out of FSMD
• Benefits:
  – Shrink FSM cyclic core
  – Extracted pipeline has more freedom for scheduling and partitioning

[Figure: Extract. Before: state foo(i): acc=acc+2*i. After: a pipeline computes two_i from i and feeds state foo(two_i): acc=acc+two_i]
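The slide's example can be mimicked with a toy Python model, reading "FF" as feed-forward. This is only a sketch to show that the extracted form computes the same result; the function names are hypothetical and this is not SCORE or TDF code.

```python
# Sketch only: a toy model of the slide's example, not SCORE/TDF code.

def fsm_original(inputs):
    """State foo(i): acc = acc + 2*i, with the multiply inside the FSM."""
    acc = 0
    for i in inputs:
        acc = acc + 2 * i
    return acc


def pipeline_two_i(inputs):
    """Extracted stage: depends only on the input stream, not on acc."""
    for i in inputs:
        yield 2 * i


def fsm_residual(two_i_stream):
    """State foo(two_i): acc = acc + two_i, only the cyclic core remains."""
    acc = 0
    for two_i in two_i_stream:
        acc = acc + two_i
    return acc


samples = [1, 2, 3, 4]
before = fsm_original(samples)
after = fsm_residual(pipeline_two_i(samples))
```

Because `2*i` does not depend on `acc`, it can move upstream of the accumulator loop; only the `acc` update has to stay in the feedback path.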

Pipeline Extraction – Extractable Area

[Figure: Extractable Data-Path Area for 47 Operators. x-axis: operator (sorted by data-path area); series: extracted DF area, residual DF area. Applications: JPEG Encode, JPEG Decode, MPEG (I), MPEG (P), Wavelet Encode, IIR]

Page Generation

• Pipeline extraction
  – Removes the dataflow that can be freely extracted from FSMD control
• Still have to partition potentially large FSMs
  – Approach: turn into a clustering problem

State Clustering

• Start: consider each state to be a unit
• Cluster states into page-size sub-FSMDs
  – Inter-page transitions become streams
• Possible clustering goals:
  – Minimize delay (inter-page latency)
  – Minimize IO (inter-page BW)
  – Minimize area (fragmentation)

State Clustering to Minimize Inter-Page State Transfer

• Inter-page state transfer is slow
• Cluster to:
  – Contain feedback loops
  – Minimize frequency of inter-page state transfer
• Previously used in:
  – VLIW trace scheduling [Fisher '81]
  – FSM decomposition for low power [Benini/DeMicheli ISCAS '98]
  – VM/cache code placement
  – GarpCC code selection [Callahan '00]

Clustering Problem

• SCORE page
  – Fixed area (# of LUTs)
  – Fixed IO
• Cost on each edge is the probability of taking that state transition
• Clustering goal is to minimize page-to-page transitions
  – Maximize expected transitions within the same page
  – Find page-count/page-transition tradeoff curve

Abstract Problem

Inter-Page Communication Frequency

• Possibly relevant for minimizing delay in DSM
• Previously discussed:
  – Larger area → longer wires, slower
  – Want to cluster logic locally
• Maybe:
  – Cluster common computations together
  – Make distant computation transfer uncommon

Island Packing for Energy

• Modern FPGAs pack a cluster of LUTs into an endpoint
  – e.g. Altera LAB
• Local wiring costs less energy than long wiring
• Covering for energy:
  – Minimize exposed activity factor
  – Same covering problem

Abstract Problem

Clusters/Islands

Switching Activity

First Try

• Use FBB (flow cut) [Wong/cs137a:day7]
• Pick seed element
• Compute mincut
  – On mix of IO, cost edge weights?
• If too small,
  – Cluster in node and repeat
• Else
  – Cluster out node and repeat

Mincut lessons

• Couldn’t consistently control IO
  – Non-monotonic results when adjusting weight

• Not clear what to cluster in

Idea #2

• If we had an ordering of nodes
  – (wishful thinking)
• Then easy to know how to include more
  – Just pick the next node
• Order: 1D list of nodes
• Cluster: a contiguous sequence of nodes in list
  – Specify start, finish

From Sequence to Clusters

• Easy to know if a contiguous subsequence
  – Meets area constraints
  – Meets IO constraints
• Cover
  – Set of (non-overlapping) subsequences
  – Include all nodes
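To make the feasibility check concrete, here is a minimal Python sketch that tests every contiguous subsequence of a given order against the area and IO limits. The half-open `(start, finish)` convention, the undirected edge dict, the strict inequalities, and the example values are assumptions chosen for illustration.

```python
# Sketch only: names, encoding, and example values are assumptions.

def span_metrics(order, start, finish, area, edges):
    """Area and IO of the cluster order[start:finish] (finish exclusive).

    edges: {(u, v): io_weight}, treated as undirected.
    """
    members = set(order[start:finish])
    total_area = sum(area[v] for v in members)
    io = sum(w for (u, v), w in edges.items()
             if (u in members) != (v in members))
    return total_area, io


def feasible_spans(order, area, edges, area_limit, io_limit):
    """List every (start, finish) whose cluster meets both limits."""
    spans = []
    for start in range(len(order)):
        for finish in range(start + 1, len(order) + 1):
            a, io = span_metrics(order, start, finish, area, edges)
            if a < area_limit and io < io_limit:
                spans.append((start, finish))
    return spans


order = ["a", "b", "c", "d"]
area = {"a": 2, "b": 2, "c": 2, "d": 2}
edges = {("a", "b"): 1, ("b", "c"): 2, ("c", "d"): 1, ("a", "d"): 1}
spans = feasible_spans(order, area, edges, area_limit=5, io_limit=3)
```

In this example `b` and `c` are each infeasible alone (IO of 3) but feasible together, which illustrates that IO, unlike area, does not grow monotonically as a span is extended.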

Feasible Clusters (mult16a)

Covering

• Not clear when to put more or less stuff in a cluster versus leave it for the next cluster
  – Can’t build clusters greedily
• Like the associative/parenthesization problem we saw earlier [day 5]

Parenthesis Matching

• Similar
• But compute from all breaks across a diagonal
  – Not just nearest neighbor
• Hence extra O(N)

Dynamic Programming

• For each subsequence start, end
  – Either the area and IO meet the limits (single cluster)
  – OR want to find a breakpoint between cluster sets
    • Cluster sets start→midpoint, midpoint→end may each be either single or multiple clusters
• Different splits may
  – Minimize number of clusters
  – Minimize cost
  – Keep dominator set [day11]
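The recurrence above can be sketched as a memoized interval dynamic program. This minimal Python version tracks only one objective (total cut cost) rather than a dominator set, and assumes non-negative edge costs so that a feasible single cluster is never worse than splitting it. The function names, graph encoding, and example values are hypothetical.

```python
from functools import lru_cache

# Sketch only: single-objective variant; names and encoding are assumptions.

def min_cost_cover(order, area, edges, area_limit, io_limit):
    """Cover order with contiguous clusters, minimizing total cut cost.

    edges: {(src, dst): (io, cost)}, directed, costs assumed >= 0.
    Returns (cost, clusters) with clusters as (start, end) pairs,
    or None if no feasible cover exists.
    """

    def single(start, end):
        """Cut cost of order[start:end] as one cluster, or None if infeasible."""
        members = set(order[start:end])
        if sum(area[v] for v in members) >= area_limit:
            return None
        io, cost = 0, 0.0
        for (src, dst), (e_io, e_cost) in edges.items():
            if (src in members) != (dst in members):
                io += e_io
                if src in members:
                    cost += e_cost
        return cost if io < io_limit else None

    @lru_cache(maxsize=None)
    def best(start, end):
        cost = single(start, end)
        if cost is not None:          # the whole span fits in one cluster
            return cost, ((start, end),)
        choice = None
        for mid in range(start + 1, end):   # try every breakpoint
            left, right = best(start, mid), best(mid, end)
            if left is None or right is None:
                continue
            cand = (left[0] + right[0], left[1] + right[1])
            if choice is None or cand[0] < choice[0]:
                choice = cand
        return choice

    return best(0, len(order))


order = ["a", "b", "c", "d"]
area = {"a": 2, "b": 2, "c": 2, "d": 2}
edges = {
    ("a", "b"): (1, 0.9),
    ("b", "c"): (1, 0.1),
    ("c", "d"): (1, 0.9),
    ("d", "a"): (1, 0.5),
}
cover = min_cost_cover(order, area, edges, area_limit=5, io_limit=10)
```

Here the cover pairs `a` with `b` and `c` with `d`, hiding both 0.9-cost edges and exposing only the cheaper ones. Trying every midpoint for every (start, end) pair is where the extra O(N) factor over the N×N table comes from.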

Algorithm

• Compute linear order
• Compute IO, area on each subsequence
  – Think NxN table (but sparse)
• Use dynamic programming to cover

Compute Order?

• Could experiment with various techniques
• Considering: spectral ordering
  – [Hall/cs137a:day7]
• How to weight edges?
  – IO, cost, mix?
  – Try linear mix…vary mix weighting
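One way to experiment with the linear mix is sketched below in plain Python: each edge gets weight `mix*io + (1 - mix)*cost`, and nodes are sorted by the eigenvector of the second-smallest eigenvalue of the weighted graph Laplacian, found here by shifted power iteration. The `mix` parameter name, the fixed iteration count, the start vector, and the example graph are assumptions; a real experiment would more likely call a numerical eigensolver.

```python
# Sketch only: pure-Python power iteration, chosen to avoid dependencies.

def spectral_order(nodes, edges, mix, iters=500):
    """Order nodes by the second-smallest Laplacian eigenvector.

    edges: {(u, v): (io, cost)}, undirected.
    Edge weight is the linear mix: mix*io + (1 - mix)*cost.
    Assumes a connected graph with at least one positive weight, and a
    start vector that is not orthogonal to the wanted eigenvector.
    """
    n = len(nodes)
    index = {v: k for k, v in enumerate(nodes)}
    w = [[0.0] * n for _ in range(n)]
    for (u, v), (io, cost) in edges.items():
        weight = mix * io + (1.0 - mix) * cost
        w[index[u]][index[v]] += weight
        w[index[v]][index[u]] += weight
    degree = [sum(row) for row in w]
    shift = 2.0 * max(degree)   # upper bound on the largest eigenvalue of L
    x = [k - (n - 1) / 2.0 for k in range(n)]   # sums to zero
    for _ in range(iters):
        # y = (shift*I - L) x with L = D - W
        y = [(shift - degree[i]) * x[i]
             + sum(w[i][j] * x[j] for j in range(n))
             for i in range(n)]
        mean = sum(y) / n
        y = [val - mean for val in y]   # project out the constant vector
        norm = sum(val * val for val in y) ** 0.5
        x = [val / norm for val in y]
    return [v for _, v in sorted(zip(x, nodes))]


# Two triangles {0, 2, 4} and {1, 3, 5} joined by one low-cost edge 4-1.
nodes = [0, 1, 2, 3, 4, 5]
edges = {
    (0, 2): (1, 0.9), (2, 4): (1, 0.9), (0, 4): (1, 0.9),
    (1, 3): (1, 0.9), (3, 5): (1, 0.9), (1, 5): (1, 0.9),
    (4, 1): (1, 0.1),
}
order = spectral_order(nodes, edges, mix=0.5)
```

On this example each triangle lands in a contiguous run of the order, with the two bridge nodes adjacent in the middle, so a contiguous-subsequence cluster can capture a whole triangle. Rerunning with different `mix` values is one way to vary the weighting.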

Weight Mix

• Why unclear?
  – IO weight good to cluster connectivity
    • If IOs are limited, allows use of fewer clusters
    • Pack more stuff into page → fewer cases where we need to transition
  – Cost weight is what we’re minimizing
    • Cluster high-cost edges together
    • Hide in page
  – But cost ordering may get less stuff in a page if poorly IO clustered…

spp results

• [see HTML]

Versus Weighting (w by 0.01)

Discussion

• Promising results
  – New capability, so not clear what to compare to
    • Maybe LUT clustering to validate algorithm
  – Absolutes look promising
• Weighting
  – Not clear how to search for best
  – Maybe should try other ways of weighting?
    • [Michael suggests try taking log(trans)]

Spatial/Hdw Implementation?

• Compute linear order
  – Use 1D FDSA?
• Compute IO, area on each subsequence
  – Parallel prefix sum scan
    • One for each start point?
• Use dynamic programming to cover
  – Like parenthesis
  – Maybe 1D and combine with area/IO scan?
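As a sequential software model of what such scans would compute (it says nothing about the hardware mapping itself), the sketch below derives span areas from one prefix sum and span IO from one running scan per start point. The function names, the adjacency-dict encoding, and the no-self-loops assumption are illustrative choices.

```python
from itertools import accumulate

# Sketch only: sequential model of the scans; names are assumptions.

def area_prefix(order, area):
    """prefix[k] = area of order[:k]; span area = prefix[end] - prefix[start]."""
    return [0] + list(accumulate(area[v] for v in order))


def io_scan(order, neighbors, start):
    """IO of order[start:end] for each end > start, adding one node per step.

    neighbors: {node: {other: io_weight}}, undirected, no self-loops assumed.
    """
    inside = set()
    io = 0
    result = []
    for v in order[start:]:
        for u, w in neighbors[v].items():
            # An edge to a node already inside stops being exposed;
            # an edge to a node still outside becomes exposed.
            io += -w if u in inside else w
        inside.add(v)
        result.append(io)
    return result


order = ["a", "b", "c", "d"]
area = {"a": 2, "b": 2, "c": 2, "d": 2}
neighbors = {
    "a": {"b": 1, "d": 1},
    "b": {"a": 1, "c": 2},
    "c": {"b": 2, "d": 1},
    "d": {"c": 1, "a": 1},
}
prefix = area_prefix(order, area)
io_from_0 = io_scan(order, neighbors, 0)
io_from_1 = io_scan(order, neighbors, 1)
```

The area of any span is a difference of two prefix entries, so one scan serves every start point. The IO value depends on which nodes are already inside, which is why this sketch runs a separate scan for each start point.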

Promising Ideas

• Compute good ordering
  – Easy to vary inclusion when know what’s next to include/exclude
• Mix weights
• Cluster to minimize exposed (cut) costs
