ECE 260B – CSE 241A Static Timing Analysis 1 ECE260B – CSE241A Winter 2005 Logic Synthesis...
-
Upload
imogene-owens -
Category
Documents
-
view
251 -
download
9
Transcript of ECE 260B – CSE 241A Static Timing Analysis 1 ECE260B – CSE241A Winter 2005 Logic Synthesis...
ECE 260B – CSE 241A Static Timing Analysis 1 http://vlsicad.ucsd.edu
ECE260B – CSE241A
Winter 2005
Logic Synthesis
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
Slides courtesy of Dr. Cho Moon
ECE 260B – CSE 241A Static Timing Analysis 2 http://vlsicad.ucsd.edu
Introduction
Why logic synthesis? Ubiquitous – used almost everywhere VLSI is done Body of useful and general techniques – same solutions can be
used for different problems Foundation for many applications such as
- Formal verification
- ATPG
- Timing analysis
- Sequential optimization
ECE 260B – CSE 241A Static Timing Analysis 3 http://vlsicad.ucsd.edu
RTL Design Flow
RTLSynthesis
HDL
netlist
LogicSynthesis
netlist
Library
Physical Synthesis
layout
a
b
s
q0
1
d
clk
a
b
s
q0
1
d
clk
ModuleGenerators
ManualDesign
Slide courtesy of Devadas, et. al
ECE 260B – CSE 241A Static Timing Analysis 4 http://vlsicad.ucsd.edu
Logic Synthesis Problem
Given Initial gate-level netlist Design constraints
- Input arrival times, output required times, power consumption, noise immunity, etc…
Target technology libraries
Produce Smaller, faster or cooler gate-level netlist that meets constraints
Very hard optimization problem!
ECE 260B – CSE 241A Static Timing Analysis 5 http://vlsicad.ucsd.edu
Combinational Logic Synthesis
Logic Synthesis
netlist
netlist
Library
techindependent
techdependent
2-levelLogic opt
multilevelLogic opt
Library
Slide courtesy of Devadas, et. al
ECE 260B – CSE 241A Static Timing Analysis 6 http://vlsicad.ucsd.edu
Outline
Introduction
Two-level Logic Synthesis
Multi-level Logic Synthesis
Timing Optimization in Synthesis
ECE 260B – CSE 241A Static Timing Analysis 7 http://vlsicad.ucsd.edu
Two-level Logic Synthesis Problem
Given an arbitrary logic function in two-level form, produce a smaller representation.
For sum-of-products (SOP) implementation on PLAs, fewer product terms and fewer inputs to each product term mean smaller area.
F = A B + A B C
F = A B
I1
I2
O1
O2
ECE 260B – CSE 241A Static Timing Analysis 8 http://vlsicad.ucsd.edu
Boolean Functions
f(x) : Bn B
B = {0, 1}, x = (x1, x2, …, xn)
x1, x2, … are variables
x1, x1, x2, x2, … are literals
each vertex of Bn is mapped to 0 or 1
the onset of f is a set of input values for which f(x) = 1
the offset of f is a set of input values for which f(x) = 0
ECE 260B – CSE 241A Static Timing Analysis 9 http://vlsicad.ucsd.edu
Logic Functions:
Slide courtesy of Devadas, et. al
ECE 260B – CSE 241A Static Timing Analysis 10 http://vlsicad.ucsd.edu
Cube Representation
Slide courtesy of Devadas, et. al
ECE 260B – CSE 241A Static Timing Analysis 11 http://vlsicad.ucsd.edu
A function can be represented by a sum of cubes (products):
f = ab + ac + bc
Since each cube is a product of literals, this is a “sum of products” representation
A SOP can be thought of as a set of cubes FF = {ab, ac, bc} = C
A set of cubes that represents f is called a cover of f. F={ab, ac, bc} is a cover of f = ab + ac + bc.
Sum-of-products (SOP)
ECE 260B – CSE 241A Static Timing Analysis 12 http://vlsicad.ucsd.edu
Prime Cover
A cube is prime if there is no other cube that contains it(for example, b c is not a prime but b is)
A cover is prime iff all of its cubes are prime
c
ab
ECE 260B – CSE 241A Static Timing Analysis 13 http://vlsicad.ucsd.edu
Irredundant Cube
A cube of a cover C is irredundant if C fails to be a cover if c is dropped from C
A cover is irredundant iff all its cubes are irredudant (for exmaple, F = a b + a c + b c)
c
ba
Not covered
ECE 260B – CSE 241A Static Timing Analysis 14 http://vlsicad.ucsd.edu
Quine-McCluskey Method
We want to find a minimum prime and irredundant cover for a given function.
Prime cover leads to min number of inputs to each product term.
Min irredundant cover leads to min number of product terms.
Quine-McCluskey (QM) method (1960’s) finds a minimum prime and irredundant cover.
Step 1: List all minterms of on-set: O(2^n) n = #inputs Step 2: Find all primes: O(3^n) n = #inputs Step 3: Construct minterms vs primes table Step 4: Find a min set of primes that covers all the minterms:
O(2^m) m = #primes
ECE 260B – CSE 241A Static Timing Analysis 15 http://vlsicad.ucsd.edu
QM Example (Step 1)
F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c
List all on-set minterms
Minterms
a’ b’ c’
a b’ c’
a b’ c
a b c
a’ b c
ECE 260B – CSE 241A Static Timing Analysis 16 http://vlsicad.ucsd.edu
QM Example (Step 2)
F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c
Find all primes.
primes b’ c’ a b’ a c b c
ECE 260B – CSE 241A Static Timing Analysis 17 http://vlsicad.ucsd.edu
QM Example (Step 3)
F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c
Construct minterms vs primes table (prime implicant table) by determining which cube is contained in which prime. X at row i, colum j means that cube in row i is contained by prime in column j.
b’ c’ a b’ a c b c
a’ b’ c’ X
a b’ c’ X X
a b’ c X X
a b c X X
a’ b c X
ECE 260B – CSE 241A Static Timing Analysis 18 http://vlsicad.ucsd.edu
QM Example (Step 4)
F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c
Find a minimum set of primes that covers all the minterms
“Minimum column covering problem”
b’ c’ a b’ a c b c
a’ b’ c’ X
a b’ c’ X X
a b’ c X X
a b c X X
a’ b c X
Essential primes
ECE 260B – CSE 241A Static Timing Analysis 19 http://vlsicad.ucsd.edu
ESPRESSO – Heuristic Minimizer
Quine-McCluskey gives a minimum solution but is only good for functions with small number of inputs (< 10)
ESPRESSO is a heuristic two-level minimizer that finds a “minimal” solution
ESPRESSO(F) {
do {
reduce(F);
expand(F);
irredundant(F);
} while (fewer terms in F);
verfiy(F);
}
ECE 260B – CSE 241A Static Timing Analysis 20 http://vlsicad.ucsd.edu
ESPRESSO ILLUSTRATED
Reduce
Irredundant
Expand
ECE 260B – CSE 241A Static Timing Analysis 21 http://vlsicad.ucsd.edu
Outline
Introduction
Two-level Logic Synthesis
Multi-level Logic Synthesis
Timing optimization in Synthesis
ECE 260B – CSE 241A Static Timing Analysis 22 http://vlsicad.ucsd.edu
Multi-level Logic Synthesis
Two-level logic synthesis is effective and mature
Two-level logic synthesis is directly applicable to PLAs and PLDs
But…
There are many functions that are too expensive to implement in two-level forms (too many product terms!)
Two-level implementation constrains layout (AND-plane, OR-plane)
Rule of thumb: Two-level logic is good for control logic Multi-level logic is good for datapath or random logic
ECE 260B – CSE 241A Static Timing Analysis 23 http://vlsicad.ucsd.edu
Two-Level (PLA) vs. Multi-Level
PLAcontrol logic
constrained layout
highly automatic
technology independent
multi-valued logic
slower?
input, output, state encoding
Multi-levelall logic
general
automatic
partially technology independent
coming
can be high speed
some results
ECE 260B – CSE 241A Static Timing Analysis 24 http://vlsicad.ucsd.edu
Multi-level Logic Synthesis Problem
Given Initial Boolean network Design constraints
- Arrival times, required times, power consumption, noise immunity, etc…
Target technology libraries
Produce a minimum area netlist consisting of the gates from the target
libraries such that design constraints are satisfied
ECE 260B – CSE 241A Static Timing Analysis 25 http://vlsicad.ucsd.edu
Modern Approach to Logic Optimization
Divide logic optimization into two subproblems:
Technology-independent optimization- determine overall logic structure
- estimate costs (mostly) independent of technology
- simplified cost modeling
Technology-dependent optimization (technology mapping)
- binding onto the gates in the library
- detailed technology-specific cost model
Orchestration of various optimization/transformation techniques for each subproblem
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 26 http://vlsicad.ucsd.edu
Optimization Cost Criteria
The accepted optimization criteria for multi-level logic are to minimize some function of:
1. Area occupied by the logic gates and interconnect (approximated by literals = transistors in technology independent optimization)
2. Critical path delay of the longest path through the logic
3. Degree of testability of the circuit, measured in terms of the percentage of faults covered by a specified set of test vectors for an approximate fault model (e.g. single or multiple stuck-at faults)
4. Power consumed by the logic gates
5. Noise Immunity
6. Wireability
while simultaneously satisfying upper or lower bound constraints placed on these physical quantities
ECE 260B – CSE 241A Static Timing Analysis 27 http://vlsicad.ucsd.edu
Representation: Boolean Network
Boolean network:• directed acyclic graph (DAG)• node logic function representation
fj(x,y)
• node variable yj: yj= fj(x,y)
• edge (i,j) if fj depends explicitly on yi
Inputs x = (x1, x2,…,xn )
Outputs z = (z1, z2,…,zp )
Slide courtesy of Brayton
ECE 260B – CSE 241A Static Timing Analysis 28 http://vlsicad.ucsd.edu
Network Representation
Boolean network:
ECE 260B – CSE 241A Static Timing Analysis 29 http://vlsicad.ucsd.edu
Node Representation: Sum of Products (SOP)
Example:
abc’+a’bd+b’d’+b’e’f (sum of cubes)
Advantages:• easy to manipulate and minimize• many algorithms available (e.g. AND, OR, TAUTOLOGY)• two-level theory applies
Disadvantages:• Not representative of logic complexity. For example f=ad+ae+bd+be+cd+ce f’=a’b’c’+d’e’
These differ in their implementation by an inverter.• hence not easy to estimate logic; difficult to estimate progress during logic manipulation
ECE 260B – CSE 241A Static Timing Analysis 30 http://vlsicad.ucsd.edu
Factored Forms
Example: (ad+b’c)(c+d’(e+ac’))+(d+e)fg
Advantages• good representative of logic complexity f=ad+ae+bd+be+cd+ce
f’=a’b’c’+d’e’ f=(a+b+c)(d+e)• in many designs (e.g. complex gate CMOS) the implementation of a function corresponds directly to
its factored form• good estimator of logic implementation complexity• doesn’t blow up easily
Disadvantages• not as many algorithms available for manipulation• hence usually just convert into SOP before manipulation
ECE 260B – CSE 241A Static Timing Analysis 31 http://vlsicad.ucsd.edu
Factored Forms
Note:
literal count transistor count area
(however, area also depends on wiring)
ECE 260B – CSE 241A Static Timing Analysis 32 http://vlsicad.ucsd.edu
Factored Forms
Definition : a factored form can be defined recursively by the following rules. A factored form is either a product or sum where:• a product is either a single literal or product of factored forms• a sum is either a single literal or sum of factored forms
A factored form is a parenthesized algebraic expression.
In effect a factored form is a product of sums of products … or a sum of products of sums …
Any logic function can be represented by a factored form, and any factored form is a representation of some logic function.
ECE 260B – CSE 241A Static Timing Analysis 33 http://vlsicad.ucsd.edu
Factored Forms
When measured in terms of number of inputs, there are functions whose size is exponential in sum of products representation, but polynomial in factored form.
Example: Achilles’ heel function
There are n literals in the factored form and (n/2)2n/2 literals in the SOP form.
2/
1212 )(
ni
iii xx
Factored forms are useful in estimating area and delay in a multi-level synthesis and optimization system.
In many design styles (e.g. complex gate CMOS design) the implementation of a function corresponds directly to its factored form.
ECE 260B – CSE 241A Static Timing Analysis 34 http://vlsicad.ucsd.edu
Factored Forms
Factored forms cam be graphically represented as labeled trees, called factoring trees, in which each internal node including the root is labeled with either + or , and each leaf has a label of either a variable or its complement.
Example: factoring tree of ((a’+b)cd+e)(a+b’)+e’
ECE 260B – CSE 241A Static Timing Analysis 35 http://vlsicad.ucsd.edu
Reduced Ordered BDDs
• like factored form, represents both function and complement
• like network of muxes, but restricted since controlled by primary input variables
- not really a good estimator for implementation complexity
• given an ordering, reduced BDD is canonical, hence a good replacement for truth tables
• for a good ordering, BDDs remain reasonably small for complicated functions (e.g. not multipliers)
• manipulations are well defined and efficient• true support (dependency) is displayed
ECE 260B – CSE 241A Static Timing Analysis 36 http://vlsicad.ucsd.edu
Technology-Independent Optimization
Technology-independent optimization is a bag of tricks: Two-level minimization (also called simplify) Constant propagation (also called sweep)
f = a b + c; b = 1 => f = a + c Decomposition (single function)
f = abc+abd+a’c’d’+b’c’d’ => f = xy + x’y’; x = ab ; y = c+d
Extraction (multiple functions)
f = (az+bz’)cd+e g = (az+bz’)e’ h = cde
f = xy+e g = xe’ h = ye x = az+bz’ y = cd
ECE 260B – CSE 241A Static Timing Analysis 37 http://vlsicad.ucsd.edu
More Technology-Independent Optimization
More technology-independent optimization tricks: Substitution
g = a+b f = a+bc
f = g(a+c)
Collapsing (also called elimination)f = ga+g’b g = c+d
f = ac+ad+bc’d’ g = c+d
Factoring (series-parallel decomposition)
f = ac+ad+bc+bd+e => f = (a+b)(c+d)+e
ECE 260B – CSE 241A Static Timing Analysis 38 http://vlsicad.ucsd.edu
Summary of Typical Recipe for TI Optimization
Propagate constants
Simplify: two-level minimization at Boolean network node
Decomposition
Local “Boolean” optimizations Boolean techniques exploit Boolean identities (e.g., a a’ = 0)
Consider f = a b’ + a c’ + b a’ + b c’ + c a’ + c b’ Algebraic factorization procedures
f = a (b’ + c’) + a’ (b + c) + b c’ + c b’ Boolean factorization procedures
f = (a + b + c) (a’ + b’ + c’)
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 39 http://vlsicad.ucsd.edu
Technology-Dependent Optimization
Technology-dependent optimization consists of
Technology mapping: maps Boolean network to a set of gates from technology libraries
Local transformations Discrete resizing Cloning Fanout optimization (buffering) Logic restructuring
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 40 http://vlsicad.ucsd.edu
Technology Mapping
Input Technology independent, optimized logic network Description of the gates in the library with their cost
Output Netlist of gates (from library) which minimizes total cost
General Approach Construct a subject DAG for the network Represent each gate in the target library by pattern DAG’s Find an optimal-cost covering of subject DAG using the
collection of pattern DAG’s Canonical form: 2-input NAND gates and inverters
ECE 260B – CSE 241A Static Timing Analysis 41 http://vlsicad.ucsd.edu
DAG Covering
DAG covering is an NP-hard problem
Solve the sub-problem optimally Partition DAG into a forest of trees Solve each tree optimally using tree covering Stitch trees back together
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 42 http://vlsicad.ucsd.edu
Tree Covering Algorithm
Transform netlist and libraries into canonical forms 2-input NANDs and inverters
Visit each node in BFS from inputs to outputs Find all candidate matches at each node N
- Match is found by comparing topology only (no need to compare functions)
Find the optimal match at N by computing the new cost- New cost = cost of match at node N + sum of costs for matches at
children of N Store the optimal match at node N with cost
Optimal solution is guaranteed if cost is area
Complexity = O(n) where n is the number of nodes in netlist
ECE 260B – CSE 241A Static Timing Analysis 43 http://vlsicad.ucsd.edu
Tree Covering Example
into the technology library (simple example below)
Find an ``optimal’’ (in area, delay, power) mapping of this circuit
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 44 http://vlsicad.ucsd.edu
Elements of a library - 1
INVERTER 2
NAND2 3
NAND3 4
NAND4 5
Element/Area Cost Tree Representation (normal form)
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 45 http://vlsicad.ucsd.edu
Trivial Covering
subject DAG
7 NAND2 (3) = 215 INV (2) = 10
Area cost 31
Slide courtesy of Keutzer
Can we do better with tree covering?
ECE 260B – CSE 241A Static Timing Analysis 46 http://vlsicad.ucsd.edu
Optimal tree covering - 1
``subject tree’’
3
2
2
3
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 47 http://vlsicad.ucsd.edu
Optimal tree covering - 2
``subject tree’’
5
8
3
2
2
3
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 48 http://vlsicad.ucsd.edu
Optimal tree covering - 3
``subject tree’’
Cover with ND2 or ND3 ?
3
2
2
3
813
5
1 NAND2 3+ subtree 5
1 NAND3 = 4
Area cost 8Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 49 http://vlsicad.ucsd.edu
Optimal tree covering – 3b
``subject tree’’
3
2
2
3
813
5 4
Label the root of the sub-tree with optimal match and cost
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 50 http://vlsicad.ucsd.edu
Optimal tree covering - 4
``subject tree’’
Cover with INV or AO21 ?
54
3
8
2
2
13
2
1 Inverter 2+ subtree 13
Area cost 15
1 AO21 4+ subtree 1 3+ subtree 2 2
Area cost 9
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 51 http://vlsicad.ucsd.edu
Optimal tree covering – 4b
``subject tree’’54
3
8
2
2
13
2
9
Label the root of the sub-tree with optimal match and cost
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 52 http://vlsicad.ucsd.edu
Optimal tree covering - 5
``subject tree’’
Cover with ND2 or ND3 ?
subtree 1 9subtree 2 41 NAND2 3
Area cost 16
NAND2 NAND3
8
4
9
subtree 1 8subtree 2 2subtree 3 41 NAND3 4
Area cost 18
2
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 53 http://vlsicad.ucsd.edu
Optimal tree covering – 5b
``subject tree’’
168
4
9
2
Label the root of the sub-tree with optimal match and cost
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 54 http://vlsicad.ucsd.edu
Optimal tree covering - 6
``subject tree’’
Cover with INV or AOI21 ?
INV AOI21
Area cost 22
5
16
Area cost 18
subtree 1 161 INV 2
subtree 1 13subtree 2 51 AOI21 4
13
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 55 http://vlsicad.ucsd.edu
Optimal tree covering – 6b
``subject tree’’5
16
1813
Label the root of the sub-tree with optimal match and cost
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 56 http://vlsicad.ucsd.edu
Optimal tree covering - 7
``subject tree’’
Cover with ND2 or ND3 or ND4 ?
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 57 http://vlsicad.ucsd.edu
Cover 1 - NAND2
``subject tree’’
Cover with ND2 ?
16
18
subtree 1 18subtree 2 01 NAND2 3
Area cost 21
4
9
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 58 http://vlsicad.ucsd.edu
Cover 2 - NAND3
``subject tree’’
Cover with ND3?
subtree 1 9subtree 2 4subtree 3 01 NAND3 4
Area cost 17
9
4
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 59 http://vlsicad.ucsd.edu
Cover - 3
``subject tree’’
Cover with ND4 ?
Area cost 19
subtree 1 8subtree 2 2subtree 3 4subtree 4 01 NAND4 5
8
4
2
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 60 http://vlsicad.ucsd.edu
Optimal Cover was Cover 2
``subject tree’’
INV 2ND2 32 ND3 8AOI21 4
Area cost 17
AOI21
ND2
INV
ND3
ND3
Slide courtesy of Keutzer
ECE 260B – CSE 241A Static Timing Analysis 61 http://vlsicad.ucsd.edu
Summary of Technology Mapping
DAG covering formulation Separated library issues from mapping algorithm (can’t do this
with rule-based systems)
Tree covering approximation Very efficient (linear time) Applicable to wide range of libraries (std cells, gate arrays) and
technologies (FPGAs, CPLDs)
Weaknesses Problems with DAG patterns (Multiplexors, full adders, …) Large input gates lead to a large number of patterns
ECE 260B – CSE 241A Static Timing Analysis 62 http://vlsicad.ucsd.edu
Outline
Introduction
Two-level Logic Synthesis
Multi-level Logic Synthesis
Timing optimization in Synthesis
ECE 260B – CSE 241A Static Timing Analysis 63 http://vlsicad.ucsd.edu
Timing Optimization in Synthesis
Factors determining delay of circuit:
Underlying circuit technology Circuit type (e.g. domino, static CMOS, etc.) Gate type Gate size
Logical structure of circuit Length of computation paths False paths Buffering
Parasitics Wire loads Layout
ECE 260B – CSE 241A Static Timing Analysis 64 http://vlsicad.ucsd.edu
Problem Statement
Given:
Initial circuit function description
Library of primitive functions
Performance constraints (arrival/required times)
Generate:
an implementation of the circuit using the primitive functions, such that:
1. performance constraints are met
2. circuit area is minimized
ECE 260B – CSE 241A Static Timing Analysis 65 http://vlsicad.ucsd.edu
Current Design Process
BehaviorBehaviorOptiizationOptiization(scheduling)(scheduling)
PartitioningPartitioning(retiming)(retiming)
Logic synthesisLogic synthesis•Technology independentTechnology independent•Technology mappingTechnology mapping
Timing drivenTiming drivenplace and routeplace and route
Behavioral descriptionBehavioral description
Logic and latchesLogic and latches
Logic equationsLogic equations
Gate netlistGate netlist
Layout Layout
•Gate libraryGate library•Perf. ConstraintsPerf. Constraints•Delay modelsDelay models
ECE 260B – CSE 241A Static Timing Analysis 66 http://vlsicad.ucsd.edu
Synthesis delay models
Why are technology independent delay reductions hard?
Lack of fast and accurate delay models
1. # levels, fast but crude
2. # levels + correction term (fanout, wires,… ): a little better, but still crude (what coefficients to use?)
3. Technology mapped: reasonable, but very slow
4. Place and route: better but extremely slow
5. Silicon: best, but infeasibly slow (except for FPGAs)
bbeetttteerr
sslloowweerr
ECE 260B – CSE 241A Static Timing Analysis 67 http://vlsicad.ucsd.edu
Clustering/partial-collapse
Traditional critical-path based methods require Well defined critical path Good delay/slack information
Problems: Good delay information comes from mapper and layout Delay estimates and models are weak
Possible solutions: Better delay modeling at technology independent level
Make speedup, insensitive to actual critical paths and mapped delays
ECE 260B – CSE 241A Static Timing Analysis 68 http://vlsicad.ucsd.edu
Overview of Solutions for delay
1. Circuit re-structuring Rescheduling operations to reduce time of computation
2. Implementation of function trees (technology mapping) Selection of gates from library
- Minimum delay (load independent model)- Minimize delay and area
Implementation of buffer trees
3. Resizing
Focus here on circuit re-structuring
ECE 260B – CSE 241A Static Timing Analysis 69 http://vlsicad.ucsd.edu
Circuit re-structuring
Approaches:
Local:
Mimic optimization techniques in adders Carry lookahead (tree height reduction) Conditional sum (generalized select transformation) Carry bypass (generalized bypass transformation)
Global:
Reduce depth of entire circuit Partial collapsing Boolean simplification
ECE 260B – CSE 241A Static Timing Analysis 70 http://vlsicad.ucsd.edu
Re-structuring methods
Performance measured by 1. levels,
2. sensitizable paths,
3. technology dependent delays
Level based optimizations: Tree height reduction (Singh ‘88) Partial collapsing and simplification (Touati ‘91) Generalized select transform (Berman ‘90)
Sensitizable paths Generalized bypass transform (Mcgeer ‘91)
ECE 260B – CSE 241A Static Timing Analysis 71 http://vlsicad.ucsd.edu
Re-structuring for delay: tree-height reduction
nn
ll mm
ii jj
hh
kk33
66
55 55
11 4411
00 00 00 00 22 00 00aa bb cc dd ee ff gg
ii11
00 00
aa bb
mm
jj
hh
kk3344
11
00 00 22 00 00
cc dd ee ff gg
n’n’DuplicatedDuplicatedlogiclogic
11220000
55CriticalCriticalregionregion
CollapsedCollapsedCritical regionCritical region
ECE 260B – CSE 241A Static Timing Analysis 72 http://vlsicad.ucsd.edu
Restructuring for delay: path reduction
ii11
00 00
aa bb
mm
jj
hh
kk3344
11
00 00 22 00 00
cc dd ee ff gg
n’n’DuplicatedDuplicatedlogiclogic
11220000
55
ii11
00 00
aa bb
mm
jj
hh
kk33
4411
00 00 22 00 00
cc dd ee ff gg
1122
00
3355
n’n’
22
11
00
44
CollapsedCollapsedCritical regionCritical region
New delay = 5New delay = 5
ECE 260B – CSE 241A Static Timing Analysis 73 http://vlsicad.ucsd.edu
Generalized select transform (GST)
Late signal feeds multiplexor
cc dd ee ff gg
aa
bb
outout
cc dd ee ff gg
bb
cc dd ee ff gg
bb
a=0a=0
a=1a=1
outout00
11
aa
ECE 260B – CSE 241A Static Timing Analysis 74 http://vlsicad.ucsd.edu
Generalized bypass transform (GBX)
Make critical path false Speed up the circuit
Bypass logic of critical path(s)
ffmm=f=f ffm+1m+1 ffnn=g=g……
ffm m =f=f ffm+1m+1 ffnn=g=g…… 00
11g’g’
dgdg____dfdf
BooleanBooleandifferencedifference
s-a-0 redundants-a-0 redundant
ECE 260B – CSE 241A Static Timing Analysis 75 http://vlsicad.ucsd.edu
cc dd ee ff ggbb
cc dd ee ff ggbb
a=0a=0
a=1a=1
outout00
11
aa
GSTGST
GST vs GBX
…… 00
11g’g’
dhdh____dada
aa
0/10/1
bb
cc gg
hh
cc dd ee ff ggbb
cc dd ee ff ggbb
a=0a=0
a=1a=1
…… 00
11g’g’
0/10/1 cc gg
bb
GBXGBX
aa
hh
Boolean
diffe
Note:
rence =
a a a
hh h
GBXGBX
ECE 260B – CSE 241A Static Timing Analysis 76 http://vlsicad.ucsd.edu
Thank you!