1 Design Space Exploration for Power-Efficient Mixed-Radix Ling Adders Chung-Kuan Cheng Computer...
-
Upload
candace-collins -
Category
Documents
-
view
218 -
download
0
Transcript of 1 Design Space Exploration for Power-Efficient Mixed-Radix Ling Adders Chung-Kuan Cheng Computer...
1
Design Space Exploration for Power-Efficient Mixed-Radix Ling Adders
Chung-Kuan ChengComputer Science and Engineering
Depart. University of California, San Diego
2
Outline Prefix Adder Problem
Background & Previous Work Extensions: High-radix, Ling
Our Work Area/Timing/Power Models Mixed-Radix (2,3,4) Adders ILP Formulation
Experimental Results Future Work
Prefix Adder – Challenges
Logical Levels
Wire Tracks
Fanouts
Area
Physical placement
Detail routing
New Design Scope
Timing
Gate Cap
Wire Cap
Gate sizingBuffer insertion
Signal slope
Input arrival time
Output require time
Increasing impact of physical design and concern of power.
Power
Static power
Dynamic power
Power gating
Activity Probability
4
Binary Addition Input: two n-bit binary numbers
and , one bit carry-in Output: n-bit sum and one bit
carry out Prefix Addition: Carry generation &
propagation
011... aaan
011... bbbn 0c
011... sssn
nc
)(
:Propagate
:Generate
1
iiii
iiii
iii
iii
bacs
cpgc
bap
bag
5
Prefix Addition – Formulation
iiiiii bapbag Pre-processing:
Post-processing:
Prefix Computation:
iii
iii
cps
cPGc
0]0:[]0:[1
]:1[]:[]:[
]:1[]:[]:[]:[
kjjiki
kjjijiki
PPP
GPGG
6
1234
12:13:14:1
Prefix Adder – Prefix Structure Graph
gpi
pi
G[i:0]
si
biai
GP[i, j] GP[j-1, k]
GP[i, k]
gp generator
sum generator
GP cell
Pre-processing
Post-processing
Prefix Computation
7
Previous Works – Classical prefix adders
12345678
12:13:14:16:17:18:1 5:1
Brent-Kung:Logical levels: 2log2n–1 Max fanouts: 2Wire tracks: 1
12345678
12:13:14:16:17:18:1 5:1
Kogge-Stone:Logical levels: log2nMax fanouts: 2Wire tracks: n/2
12345678
12:13:14:16:17:18:1 5:1
Sklansky:Logical levels: log2nMax fanouts: n/2Wire tracks: 1
8
High-Radix Adders
Each cell has more than two fan-in’s
Pros: less logic levels 6 levels (radix-2) vs. 3 levels (radix-4)
for 64-bit addition Cons: larger delay and power in
each cell
9
Radix-3 Sklansky & Kogge-Stone Adder
David Harris, “Logical Effort of Higher Valency Adders”
10
Ling Adders
iiiiii bapbag Pre-processing:
Post-processing:
Prefix Computation:
iii
iii
cps
cPGc
0]0:1[]0:1[
]:1[]:[]:[
]:1[]:[]:[]:[
kjjiki
kjjijiki
PPP
GPGG
Prefix Ling
* *[ 1:0] [ 1:0] 1( )i i i i i is G t G t p
,i i i i i i
i i i
g a b p a b
t a b
1*
]1:[1*
]1:[ , iiiiiiii ppPggG* * * *[ : ] [ : ] [ 1: 1] [ 1: ]i k i j i j j kG G P G
*]:1[
*]:[
*]:[ kjjiki PPP
*]0:1[1 iii Gpc
11
An 8-bit Ling Adder
H1H2H3H4H5H6H7H8
12345678
0 1
si
ai bipi-1
Hi
gi pi-1 pi-2
G*i P*
i-1
gi-1
12
Area Model
Distinguish physical placement from logical structure, but keep the bit-slice structure.
Logical view Physical view
Bit position
Logical level
Bit position
Physical level
Compact placement
12345678 12345678
13
Timing Model Cell delay calculation:
pfd Effort Delay Intrinsic Delay
hgf
Logical EffortElectrical Effort = Cout/Cin= (fanouts+wirelength) / size
Intrinsic properties of the cell
14
Power Model Total power consumption:
Dynamic power + Static Power Static power: leakage current of device
Psta = *#cells Dynamic power: current switching
capacitancePdyn = Cload
is the switching probability = j (j is the logical level*)
cellsCjPPP loadstadyntotal # * Vanichayobon S, etc, “Power-speed Trade-off in Parallel Prefix Circuits”
15
ILP Formulation OverviewStructure variables: •GP cells•Connections (wires)•Physical positions
Capacitance variables: •Gate cap•Vertical wire cap•Horizontal wire cap
Timing variables: •Input arrival time•Output arrival time
Power Objective
ILPILOG CPLEX
Optimal Solution
16
Integer Linear Programming (ILP) ILP: Linear Programming with integer
variables. Difficulties and techniques:
Constraints are not linear Linearize using pseudo linear constraints
Search Space too large Reduce search space
Search is slow Add redundant constraints to speedup
17
ILP – Integer Linear Programming
Linear Programming: linear constraints, linear objective, fractional variables.
Integer Linear Programming: Linear Programming with integer variables.
LP Optimal
Constraints
ILP Optimal
ILP – Pseudo-Linear Constraint
Minimize: x3
Subject to: x1 300
x2 500
x3 = min(x1, x2)
Minimize: x3
Subject to: x1 300
x2 500
x3 x1
x3 x2
x3 x1 – 1000 b1 (1)
x3 x2 – 1000 (1 – b1) (2)
b1 is binary
Problem: ILP formulation:
LP objective: 0ILP objective: 300
• A constraint is called pseudo-linear if it’s not effective until some integer variables are fixed.
• Pseudo-linear constraints mostly arise from IF/ELSE scenarios• binary decision variables are introduced to indicate true or false.
19
ILP Solver Search Procedure
0Root (all vars are fractional)
b1
b2
b3
b4
2
‘0’ ‘1’infeasible
‘0’ ‘1’
3feasible
(current best)
infeasible
Minimize F(b1, b2, b3, b4, f1,…) bi is binary
It is VERY helpful if ILP objective is close to LP objective
2
‘0’ ‘1’
4
‘0’ ‘1’
3 2
4
55
‘0’ ‘1’
Cut
2 Bound (Smallest candidate)
Interval Adjacency Constraint
H1H2H3H4H5H6H7H8
12345678
(7,3): Interval [7,1]
(3,2): Interval [3,1]
(7,2): Interval [7,4]
Must be adjacent,i.e. 4 = 3 + 1
(column id, logic level)
21
Linearization for Interval Adjacency Constraint
(i, j)
(i, h) (k1, l1) (k2, l2)
wl wr1 wr2
],[ ),(),(R
hiL
hi yy ],[ )1,1()1,1(R
lkL
lk yy ],[ )2,2()2,2(R
lkL
lk yy
],[ ),(),(R
jiL
ji yy
11 if 1),(),( (i,j,k,l) wrwl(i,j,h) yy Llk
Rhi
1 if 1),,,(1),(
),( wl(i,j,h) lkjiwrkylk
Rhi
11 ),,,(1),(
),( wl(i,j,h))(nlkjiwrkylk
Rhi
11 ),,,(1),(
),( wl(i,j,h))(nlkjiwrkylk
Rhi
iyLji ),(
Linearize
Pseudo Linear
Left interval bound equal to column index
22
Search Space Reduction Ling’s adder:
separate odd and even bits
Double the bit-width we are able to search H1H2H3H4H5H6H7H8
12345678
23
Redundant Constraints Cell (i,j) is known to have logic level j before
wire connection Assume load is MinLoad (fanout=1 with
minimum wire length):
MinLoadjP ji ),(
Cell (i,j) has a path of length j-1 Assume each cell along the path has MinLoad
)(),( MinLoadLEPDjT ji
24
Experiments – 16-bit Uniform Timing
Experiments – 16-bit Uniform Timing
25
26
Min-Power Radix-2 Adder (delay= 22, power = 45.5FO4 )
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
27
Min-Power Radix-2&4 Adder (delay=18, power = 29.75FO4 )
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
Radix-2 Cell Radix-4 Cell
28
Min-Power Mixed-Radix Adder (delay=20, power = 28.0FO4)
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
Radix-2 Cell Radix-4 Cell Radix-3 Cell
29
Experiments – 16-bit Non-uniform Time (Mixed Radix)
ILP is able to handle non-uniform timings
Ling adders are most superior in increasing arrival time– faster carries
Increasing Arrival Time (delay=35.5, power = 27.0FO4 )
30
Decreasing Arrival Time (delay=34.5, power = 30.5FO4)
31
Convex Arrival Time (delay=35.9, power = 32.4FO4 )
32
Increasing Required Time (delay=34.5, power = 30.5FO4)
33
Decreasing Required Time (delay=36.5, power = 32.5FO4)
34
Convex Required Time (delay=36.5, power = 32.5FO4)
35
36
Experiments – 64-bit Hierarchical Structure (Mixed-Radix)
Handle high bit-width applications 16x4 and 8x8
ILP Block ILP Block ILP Block ILP Block
ILP Block
a1b1a16b16a17b17a32b32a33b33a48b48a49b49a64b64
…... …... …... …...
Level 1
Level 2
…... …... …... …...
…... …... …... …...
GP*[64:50]GP*[48:34] GP*[32:18] GP*[16:2]
GP*[1:1]GP*[17:17]GP*[33:33]GP*[49:49]
…... …... …...H64 H49 H48 H33 H32 H17 H16 H1
37
Experiments – 64-bit Hierarchical Structure
TSL: a 64-bit high-radix three-stage Ling adder
V. Oklobdzija and B. Zeydel, “Energy-Delay Characteristics of CMOS Adders”,in High-Performance Energy-Efficient Microprocessor Design, pp. 147-170, 2006
38
ASIC Implementation - Results 64-bit hierarchical design (mixed-radix)
by ILP vs. fast carry look-ahead adder by Synopsys Design Compiler
TSMC 90nm standard cell library was used
39
Future Work
ILP formulation improvement Expected to handle 32 or 64 bit
applications without hierarchical scheme
Optimizing other computer arithmetic modules Comparator, Multiplier
40
Q & A
Thank You!