1
Logic Restructuring for Timing Optimization
Outline:
• Definitions and problem statement
• Overview of techniques (motivated by adders)
  – Tree height reduction (THR)
  – Generalized bypass transform (GBX)
  – Generalized select transform (GST)
  – Partial collapsing (?)
2
Timing Optimization
Factors determining delay of circuit:
• Underlying circuit technology
  – Circuit type (e.g. domino, static CMOS, etc.)
  – Gate type
  – Gate size
• Logical structure of circuit
  – Length of computation paths
  – False paths
  – Buffering
• Parasitics
  – Wire loads
  – Layout
3
Problem Statement
Given:
• Initial circuit function description
• Library of primitive functions
• Performance constraints (arrival/required times)
Generate:
an implementation of the circuit using the primitive functions, such that:
1. performance constraints are met
2. circuit area is minimized
4
Current Design Process
Behavioral description
  -> Behavior optimization (scheduling)
Logic and latches
  -> Partitioning (retiming)
Logic equations
  -> Logic synthesis
     • Technology independent
     • Technology mapping
Gate netlist
  -> Timing driven place and route
Layout
Inputs to the flow:
• Gate library
• Perf. constraints
• Delay models
5
Technology mapping for delay
[Figure: a function tree and a buffer tree]
6
Overview of Solutions for delay
1. Circuit re-structuring
   – Rescheduling operations to reduce time of computation
2. Implementation of function trees (technology mapping)
   – Selection of gates from library
     • Minimum delay (load independent model - Kukimoto)
     • Minimize delay and area (Jongeneel, DAC’00) (combines Lehman-Watanabe and Kukimoto)
3. Implementation of buffer trees
   – Touati (LT-trees)
   – Singh
4. Resizing
Focus here on circuit re-structuring
7
Circuit re-structuring
Approaches:
Local:
• Mimic optimization techniques in adders
  – Carry lookahead (THR, tree height reduction)
  – Conditional sum (GST transformation)
  – Carry bypass (GBX transformation)
Global:
• Reduce depth of entire circuit
  – Partial collapsing
  – Boolean simplification
8
Re-structuring methods
Performance measured by
1. levels,
2. sensitizable paths,
3. technology dependent delays
• Level based optimizations:
  – Tree height reduction (Singh ‘88)
  – Partial collapsing and simplification (Touati ‘91)
  – Generalized select transform (Berman ‘90)
• Sensitizable paths
  – Generalized bypass transform (McGeer ‘91)
9
Re-structuring for delay: tree-height reduction
[Figure: network with inputs a to g and internal nodes h, i, j, k, l, m, n, annotated with arrival times. The critical region is collapsed into a new node n', duplicating logic that is shared with other fanouts. Labels: Critical region; Collapsed critical region; Duplicated logic.]
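To make the idea behind tree-height reduction concrete, here is a minimal sketch for the simplest case: a single associative operator (say, a chain of 2-input ANDs) whose inputs arrive at different times. It uses the standard greedy of always combining the two earliest-arriving signals. The unit gate delay, the function names, and the arrival times are assumptions chosen for illustration; this is not claimed to be the procedure of Singh ‘88, which works on general critical regions.

```python
import heapq

GATE_DELAY = 1  # assumed unit-delay model for every 2-input gate


def chain_delay(arrivals):
    """Output arrival time of a left-to-right chain of 2-input gates."""
    t = arrivals[0]
    for a in arrivals[1:]:
        t = max(t, a) + GATE_DELAY
    return t


def balanced_delay(arrivals):
    """Output arrival time when the two earliest signals are combined first."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        # The new gate's output is ready one gate delay after its later input.
        heapq.heappush(heap, max(first, second) + GATE_DELAY)
    return heap[0]


# Hypothetical arrival times: one late input among seven.
arrivals = [0, 0, 0, 0, 2, 0, 0]
chain = chain_delay(arrivals)
balanced = balanced_delay(arrivals)
print(chain, balanced)
```

With these inputs the chain finishes at time 6 and the rebalanced tree at time 4; the late input is merged near the root instead of in the middle of the chain.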
10
Restructuring for delay: path reduction
[Figure: the network after the critical region has been collapsed into n' (with duplicated logic), and the same network after n' is re-decomposed to shorten the critical path. Labels: Duplicated logic; Collapsed critical region; New delay = 5.]
Singh ‘88
11
Generalized bypass transform (GBX)
• Make critical path false
  – Speed up the circuit
• Bypass logic of critical path(s)
McGeer ‘91
[Figure: a critical path f_m = f, f_{m+1}, …, f_n = g, before and after the transform. After the transform, a multiplexor with data inputs 0 and 1 produces g'; its select input is the Boolean difference dg/df. Label: s-a-0 redundant.]
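The bypass idea can be checked on a small example. The sketch below uses a hypothetical two-stage carry chain (the function and signal names are assumptions, not taken from the slides), computes the Boolean difference of the output with respect to the late input by cofactoring, and confirms by exhaustive enumeration that selecting the late input directly whenever the Boolean difference is 1 leaves the function unchanged. It relies on the chain being positive unate in the late input; it checks functional equivalence only and says nothing about timing.

```python
from itertools import product


def carry_chain(c0, p0, g0, p1, g1):
    """Two ripple-carry stages; c0 is the late-arriving input."""
    c1 = g0 | (p0 & c0)
    return g1 | (p1 & c1)


def boolean_difference(fn, index, args):
    """fn with input `index` set to 1, XORed with fn with it set to 0."""
    hi = list(args)
    lo = list(args)
    hi[index] = 1
    lo[index] = 0
    return fn(*hi) ^ fn(*lo)


def bypassed(c0, p0, g0, p1, g1):
    """Multiplexor: take c0 directly when the output is sensitive to it."""
    args = (c0, p0, g0, p1, g1)
    select = boolean_difference(carry_chain, 0, args)
    slow = carry_chain(*args)
    # When select is 1 the (positive unate) chain output equals c0.
    return c0 if select else slow


equivalent = all(
    bypassed(*v) == carry_chain(*v) for v in product((0, 1), repeat=5)
)
print(equivalent)
```

The select signal does not depend on c0 itself, which is what lets the late input skip the chain.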
12
GBX and KMS transform
GBX gives little area increase, BUT have now created an untestable fault (on control input to multiplexor)
KMS transform: (remove false paths without increasing delay)
1. f_k is last node on false path that fans out.
2. Duplicate false path {f_1, …, f_k} -> {f'_1, …, f'_k}
3. f'_j fans out to every fanout of f_j except f_{j+1}, and f_j just fans out to f_{j+1}
4. Set f_0 input to f_1 to controlling value and propagate constant (can do because path is false and does not fanout)
KMS results
1. Function of every node, except f_1, …, f_k, is unchanged
2. Added k-1 nodes
3. Area added is linear in the length of the false paths; in practice, small area increase.
13
KMS (Keutzer, Malik, Saldanha ‘90)
[Figure: the path f_m, f_{m+1}, …, f_k, f_{k+1}, …, f_n before the transform; after the transform, a duplicated path f'_m, f'_{m+1}, …, f'_k appears alongside it and a constant 0 is applied. Label: Delay is not increased.]
14
End of lecture 20
15
Generalized select transform (GST)
Late signal feeds multiplexor
[Figure: before, logic with inputs a, b, c, d, e, f, g produces out. After, two copies of the logic over b to g, one evaluated with a=0 and one with a=1, feed the 0 and 1 inputs of a multiplexor selected by a, which produces out.]
Berman ‘90
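The select transform is a Shannon expansion on the late signal, which a few lines can illustrate. In the sketch below the function h and its inputs are hypothetical; the two cofactors are computed without looking at a, and a multiplexor controlled by a picks between them. Exhaustive enumeration confirms the result matches the original function.

```python
from itertools import product


def h(a, b, c, d):
    """Hypothetical function; a is the late-arriving input."""
    return (a & b) | ((1 - a) & c) | d


def select_transform(fn):
    """Wrap fn so the first (late) input only drives the final multiplexor."""
    def transformed(a, *rest):
        out_if_0 = fn(0, *rest)  # copy of the logic evaluated with a=0
        out_if_1 = fn(1, *rest)  # copy of the logic evaluated with a=1
        return out_if_1 if a else out_if_0
    return transformed


h_gst = select_transform(h)
equivalent = all(h_gst(*v) == h(*v) for v in product((0, 1), repeat=4))
print(equivalent)
```

Both copies can be evaluated before a arrives, so only the multiplexor remains on the path from a to the output, at the cost of duplicating the logic.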
16
GST vs GBX
[Figure: the same function h, with late input a and other inputs b to g, transformed two ways. GST: copies of the logic evaluated with a=0 and a=1 feed the 0 and 1 inputs of a multiplexor selected by a, producing out. GBX: a multiplexor with inputs 0 and 1 produces g', selected by the Boolean difference dh/da.]
Note: Boolean difference $\frac{\partial h}{\partial a} = h_{a} \oplus h_{\bar{a}}$
17
GST vs GBX
• Select transform appears to be more area efficient
• But Boolean difference generally more efficiently formed in practice
• No delay/speedup advantage for either transform
• Need
  – one MUX per fanout in GST,
  – only one MUX in GBX
[Figure: GST with two outputs, out1 and out2; the a=0 and a=1 copies of the logic feed a separate multiplexor, selected by a, for each output.]
18
Technology independent delay reductions
Generally THR, GBX, GST (critical path based methods) work OK, but not great
Why are technology independent delay reductions hard?
Lack of fast and accurate delay models
1. # levels: fast but crude
2. # levels + correction term (fanout, wires, …): a little better, but still crude (what coefficients to use?)
3. Technology mapped: reasonable, but very slow
4. Place and route: better but extremely slow
5. Silicon: best, but infeasibly slow (except for FPGAs)
(Going down the list, the models get better but slower.)
19
Clustering/partial-collapse
Traditional critical-path based methods require
– Well defined critical path
– Good delay/slack information
Problems:
– Good delay information comes from mapper and layout
– Delay estimates and models are weak
Possible solutions:
– Better delay modeling at technology independent level
– Make speedup insensitive to actual critical paths and mapped delays
20
Clustering/partial-collapse
Two-level circuits are fast
– Collapse circuit to 2-level - but
  • Huge area penalty
  • Huge capacitive loading on inputs (can be much slower)
To avoid huge area penalty
– Identify clusters of nodes
  • Each cluster has some fixed size
– Perform collapse of each cluster
– Simplify each node
Details
– How to choose the clusters?
– How to choose cluster size?
– How to simplify each node?
21
Lawler’s clustering algorithm
• Optimal in delay:
  – For a given clustering size
• May duplicate nodes (hence possible area penalty)
  – Not optimal w.r.t. duplication
  – Use a heuristic
• Fast: O(m x k)
  – m = number of edges in network
  – k = maximum cluster size
22
Clustering algorithm - overview
1. Label phase: (k is cluster size)
   – If node u is an input, label(u) := L := 0
     • Else L := max label of fanin of u
   – If (# nodes in TFI(u) with (label = L) >= k) label(u) := L+1; otherwise label(u) := L
2. Cluster phase: (outputs to inputs)
   – If node u is an output, L := infinity
     • Else L := max label of fanouts of u
   – If (label(u) < L) then create a new cluster with “root” u and with members all the nodes in TFI(u) with label = label(u)
3. Collapse phase: (order independent)
   – Collapse all nodes in a cluster into a single node
   – Note: a node may be in several clusters (causes area increase)
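The label and cluster phases can be sketched in a few lines. The version below is a naive illustration rather than the O(m x k) implementation: it recomputes each transitive fanin from scratch. It assumes that TFI(u) excludes u itself, that only gates (not primary inputs) count toward the cluster size, and that a node with no qualifying fanin keeps the label L. The network representation (a dict from node to its fanin list) and the example chain are hypothetical.

```python
from graphlib import TopologicalSorter


def transitive_fanin(fanins, u):
    """All nodes that reach u, excluding u itself."""
    seen, stack = set(), list(fanins[u])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(fanins[v])
    return seen


def label_phase(fanins, k):
    """Label nodes from inputs to outputs."""
    label = {}
    for u in TopologicalSorter(fanins).static_order():
        if not fanins[u]:  # primary input
            label[u] = 0
            continue
        L = max(label[v] for v in fanins[u])
        same = sum(1 for v in transitive_fanin(fanins, u)
                   if fanins[v] and label[v] == L)  # gates only (assumption)
        label[u] = L + 1 if same >= k else L
    return label


def cluster_phase(fanins, label, outputs):
    """Create a cluster at every output and wherever the label increases."""
    fanouts = {u: [] for u in fanins}
    for u, ins in fanins.items():
        for v in ins:
            fanouts[v].append(u)
    clusters = {}
    for u in fanins:
        if not fanins[u]:
            continue  # primary inputs are not cluster roots (assumption)
        if u in outputs or any(label[w] > label[u] for w in fanouts[u]):
            members = {v for v in transitive_fanin(fanins, u)
                       if fanins[v] and label[v] == label[u]}
            clusters[u] = members | {u}
    return clusters


# Hypothetical network: a chain of five 2-input gates.
network = {
    "a": [], "b": [], "c": [], "d": [], "e": [], "f": [],
    "g1": ["a", "b"], "g2": ["g1", "c"], "g3": ["g2", "d"],
    "g4": ["g3", "e"], "g5": ["g4", "f"],
}
labels = label_phase(network, k=2)
clusters = cluster_phase(network, labels, outputs={"g5"})
print(labels)
print(clusters)
```

For this chain with k = 2 the gates get labels 0, 0, 1, 1, 2 and three clusters are formed, so collapsing each cluster leaves three levels instead of five.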
23
Example of clustering
[Figure: example network with node labels, clustered with k = 3]
Result: Lawler’s algorithm gives minimum depth circuit
Typically,
1. we decompose initial circuit into 2-input NANDs and inverters.
2. then cluster size k reflects # 2-input NANDs to be collapsed together.
24
Choosing k
• I(k): number of levels, given k
• d(k): duplication ratio
  – Number of gates in cluster network divided by number of gates in original network
• Determine k_0 where k_0/d(k_0) ~ 2.0
• For every k from 2 to k_0, compute d(k), I(k)
  – Use exhaustive enumeration: label and cluster (without collapse) for each k.
  – Each iteration is O(|E|k)
• Choose k such that
  – I(k) is minimized
  – Break ties using d(k): minimize d(k)
[Figure: d(k) and I(k) plotted against k, with k running from 1 to k_0]
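The selection rule is a lexicographic minimum: fewest levels first, lowest duplication ratio as the tie-breaker. A minimal sketch follows, assuming I(k) and d(k) have already been measured for each candidate k; the function name and the numbers in the two tables are made up purely for illustration.

```python
def choose_k(levels, duplication):
    """Pick the k with the fewest levels, breaking ties by duplication ratio."""
    return min(levels, key=lambda k: (levels[k], duplication[k]))


# Made-up measurements for k = 2 .. 5 (illustrative values only).
levels = {2: 6, 3: 4, 4: 4, 5: 4}                # I(k)
duplication = {2: 1.1, 3: 1.4, 4: 1.3, 5: 1.9}   # d(k)

best_k = choose_k(levels, duplication)
print(best_k)
```

Here three values of k reach the minimum of 4 levels, and k = 4 is chosen because it has the smallest duplication ratio among them.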
25
Area recovery
Area increase is due to node duplication
– this occurs when node is in multiple clusters
Two solutions:
1. Break clusters into smaller pieces off critical path
2. After cluster and collapse, recover area
26
Relabeling procedure:
Attempt to increase node labels without exceeding cluster size
In reverse topological order
Start: assign $\text{new-label}(O_i) = \max_{j \in PO} \text{label}(O_j)$
Increase label(u) if
1. new-label(u) <= label(v) for each fanout v and
2. new-label(u) = new-label(v) for each fanout v only if label(u) = label(v) before relabeling, and
3. no cluster size is violated
27
Relabeling example
[Figure: node labels of the example network before and after relabeling]
28
Post-collapse area recovery
• Do algebraic factorization, but
  – Undo factorization if depth increases
• Full_simplify
  – Only consider node v as possible fanin of a node (v introduced by using don’t cares) if level of v < level of node.
• Redundancy removal
29
Conclusions
• Variety of methods for delay optimization
  – No single technique dominates (KJ Singh PhD thesis)
• When applied to ripple-carry adder get
  – Carry-lookahead adder (THR)
  – Carry-bypass adder (GBX)
  – Carry-select adder (GST)
  – ? (partial collapse)
• All techniques ignore false paths when assessing the delay and critical regions
  – Can use KMS transform to eliminate false paths without increasing delay (area increase however).