Digital Integrated Circuitshtang/ece4311_doc_F11/LectureSlide/week… · © Digital Integrated...
Transcript of Digital Integrated Circuitshtang/ece4311_doc_F11/LectureSlide/week… · © Digital Integrated...
EE141© Digital Integrated Circuits2ndArithmetic Circuits
1
Digital Integrated
CircuitsA Design Perspective
Arithmetic Circuits
Jan M. Rabaey
Anantha Chandrakasan
Borivoje Nikolic
EE141© Digital Integrated Circuits2ndArithmetic Circuits
2
A Generic Digital Processor
MEMORY
DATAPATH
CONTROL
INP
UT
-OU
TP
UT
EE141© Digital Integrated Circuits2ndArithmetic Circuits
3
Building Blocks for Digital Architectures
Arithmetic unit
- Bit-sliced datapath (adder, multiplier, shifter, comparator, etc.)
Memory
- RAM, ROM, Buffers, Shift registers
Control
- Finite state machine (PLA, random logic.)
- Counters
Interconnect
- Switches
- Arbiters
- Bus
EE141© Digital Integrated Circuits2ndArithmetic Circuits
4
Arithmetic building blocks Speed and power of arithmetic components often
dominates the overall system performance
For each module, multiple topologies and ways of
design exists, with each of them has its own advantages
A global picture is of crucial importance. A designer
focus their attention on gates or transistors that have the
largest impact on their goal function. Non-critical
components can be developed routinely.
Typically two optimization process: logic optimization
(re-arrange Boolean equations so that a faster or small
circuit could be obtained) and circuit optimization
(manipulate circuit topology and transistor sizes to
optimize speed)
EE141© Digital Integrated Circuits2ndArithmetic Circuits
5
Bit-Sliced Design
Bit 3
Bit 2
Bit 1
Bit 0
Regis
ter
Ad
der
Sh
ifte
r
Mu
ltip
lexer
Control
Data
-In
Data
-Ou
t
Tile identical processing elements
Since the same operation has to be performed on each
bit of a data word, the data path can consist of the
number of bit slices (equal to the word length), each
operating on a single bit – hence the term bit-sliced
EE141© Digital Integrated Circuits2ndArithmetic Circuits
7
Full-AdderA B
Cout
Sum
Cin Fulladder
Generate (G) = AB
Propagate (P) = A B
Delete (D) = A B
G,D, ensures a carry bit will be generated or deleted at Co independent of Ci,
While P guarantees that Ci will propagate to Co.
EE141© Digital Integrated Circuits2ndArithmetic Circuits
8
The Binary Adder
S A B Ci
=
A= BCi ABCi ABCi
ABCi
+ + +
Co
AB BCi
ACi
+ +=
A B
Cout
Sum
Cin Fulladder
EE141© Digital Integrated Circuits2ndArithmetic Circuits
9
The Ripple-Carry Adder
Worst case delay linear with the number of bits
Goal: Make the fastest possible carry path circuit
FA FA FA FA
A0 B0
S0
A1 B1
S1
A2 B2
S2
A3 B3
S3
Ci,0 Co,0
(= Ci,1)
Co,1 Co,2 Co,3
td = O(N)
tadder = (N-1)tcarry + tsum
EE141© Digital Integrated Circuits2ndArithmetic Circuits
10
Complimentary Static CMOS Full Adder
28 Transistors
A B
B
A
Ci
Ci A
X
VDD
VDD
A B
Ci BA
B VDD
A
B
Ci
Ci
A
B
A CiB
Co
VDD
S
EE141© Digital Integrated Circuits2ndArithmetic Circuits
11
Complimentary Static CMOS Full Adder
Large PMOS stacks are present in both carry and sum
generation circuits
Intrinsic load capacitance of Co signal is large and
consists of eight capacitance components
There is one more inverter delay for carry and sum
(worse when the load capacitance is large)
Note that critical signal Ci closer to the output node
EE141© Digital Integrated Circuits2ndArithmetic Circuits
12
Express Sum and Carry as a function of
P, G, D
Define 3 new variable which ONLY depend on A, B
Generate (G) = AB
Propagate (P) = A B
Delete (D) = A B
Can also derive expressions for S and Co based on D and P
Propagate (P) = A + B
Note that we will be sometimes using an alternate definition for
EE141© Digital Integrated Circuits2ndArithmetic Circuits
13
Transmission Gate XOR
A
B
F
B
A
B
B
M1
M2
M3/M4
tionimplementaary complementfor rs transisto12 ),( BABAF +
When B=1, M1/M2 inverter, M3/M4 off, so F=AB
When B=0, M1/M2 off, M3/M4 transmission gate, so F=AB
EE141© Digital Integrated Circuits2ndArithmetic Circuits
14
Transmission Gate Full Adder
A
B
P
Ci
VDDA
A A
VDD
Ci
A
P
AB
VDD
VDD
Ci
Ci
Co
S
Ci
P
P
P
P
P
Sum Generation
Carry Generation
SetupPropagate (P) = A B
EE141© Digital Integrated Circuits2ndArithmetic Circuits
15
Manchester Carry Chain
CoC
i
Gi
Di
Pi
Pi
VDD
CoC
i
Gi
Pi
VDD
Generate (G) = AB
Propagate (P) = A B
Delete = A B
Prevent floating Co
EE141© Digital Integrated Circuits2ndArithmetic Circuits
17
Manchester Carry Chain
G2
C3
G3
Ci,0
P0
G1
VDD
G0
P1
P2
P3
C3
C2
C1
C0
EE141© Digital Integrated Circuits2ndArithmetic Circuits
18
Manchester Carry Chain
Pi + 1
Gi + 1
Ci
Inverter/Sum Row
Propagate/Generate Row
Pi
Gi
Ci - 1
Ci + 1
VDD
GND
Stick Diagram
EE141© Digital Integrated Circuits2ndArithmetic Circuits
19
Manchester Carry Chain Delay for the Manchester Carry Chain can be
modeled similar to a linearized RC network as in
transmission-gates
This means the propagation delay is quadratic in the
number of bits N (but does not imply the delay will be
larger than the ripple carry adder)
It might be necessary to insert signal buffering
inverters.
Still a ripple carry adder, typically only good for small
word length (<8/16 bits)
We need faster adders for computer and multimedia
applications with word length 32-128 bits
EE141© Digital Integrated Circuits2ndArithmetic Circuits
20
Carry-Bypass Adder
FA FA FA FA
P0 G1 P0 G1 P2 G2 P3 G3
Co,3Co,2Co,1Co,0Ci ,0
FA FA FA FA
P0 G1 P0 G1 P2 G2 P3 G3
Co,2Co,1Co,0Ci,0
Co,3
Mu
ltip
lexer
BP=PoP1P2P3
Idea: If (P0 and P1 and P2 and P3 = 1)
then Co3 = C0, else “kill” or “generate”.
Also called
Carry-Skip
Break the bit-slice organization
G0
G0
P1
P1
“delete” or “generate”
EE141© Digital Integrated Circuits2ndArithmetic Circuits
21
Carry-Bypass Adder (cont.)
Carrypropagation
Setup
Bit 0–3
Sum
M bits
tsetup
tsum
Carrypropagation
Setup
Bit 4–7
Sum
tbypass
Carrypropagation
Setup
Bit 8–11
Sum
Carrypropagation
Setup
Bit 12–15
Sum
tadder = tsetup + Mtcarry + (N/M-1)tbypass + (M-1)tcarry + tsum
Tsetup: overhead time to create G, P, D signals
(worst case)
EE141© Digital Integrated Circuits2ndArithmetic Circuits
22
Carry Ripple versus Carry Bypass
(both still linear)
N
tp
ripple adder
bypass adder
4..8
EE141© Digital Integrated Circuits2ndArithmetic Circuits
23
Carry-Select Adder
Setup
"0" Carry Propagation
"1" Carry Propagation
Multiplexer
Sum Generation
Co,k-1 Co,k+3
"0"
"1"
P,G
Carry Vector
EE141© Digital Integrated Circuits2ndArithmetic Circuits
24
Carry Select Adder: Critical Path
0
1
Sum Generation
Multiplexer
1-Carry
0-Carry
Setup
Ci,0 Co,3 Co,7 Co,11 Co,15
S0–3
Bit 0–3 Bit 4–7 Bit 8–11 Bit 12–15
0
1
Sum Generation
Multiplexer
1-Carry
0-Carry
Setup
S4–7
0
1
Sum Generation
Multiplexer
1-Carry
0-Carry 0-Carry
Setup
S8–11
0
1
Sum Generation
Multiplexer
1-Carry
Setup
S12–15
EE141© Digital Integrated Circuits2ndArithmetic Circuits
25
Linear Carry Select
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Bit 0-3 Bit 4-7 Bit 8-11 Bit 12-15
S0-3 S4-7 S8-11 S12-15
Ci,0
(1)
(1)
(5)(6) (7) (8)
(9)
(10)
(5) (5) (5)(5)
tadder = tsetup + Mtcarry + (N/M)tmux + tsum
EE141© Digital Integrated Circuits2ndArithmetic Circuits
26
Square Root Carry Select
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Bit 0-1 Bit 2-4 Bit 5-8 Bit 9-13
S0-1 S2-4 S5-8 S9-13
Ci,0
(4) (5) (6) (7)
(1)
(1)
(3) (4) (5) (6)
Mux
Sum
S14-19
(7)
(8)
Bit 14-19
(9)
(3)
M
EE141© Digital Integrated Circuits2ndArithmetic Circuits
27
Adder Delays - Comparison
Square root select
Linear select
Ripple adder
20 40
N
t p(in
un
it d
ela
ys)
600
10
0
20
30
40
50
Bypass
EE141© Digital Integrated Circuits2ndArithmetic Circuits
28
LookAhead - Basic Idea
Co k f Ak Bk Co k 1– Gk PkCo k 1–+= =
AN-1, BN-1A1, B1
P1
S1
• • •
• • • SN-1
PN-1Ci, N-1
S0
P0Ci,0 Ci,1
A0, B0
EE141© Digital Integrated Circuits2ndArithmetic Circuits
29
Look-Ahead: Topology
Co k Gk Pk Gk 1– Pk 1– Co k 2–+ +=
Co k Gk Pk Gk 1– Pk 1– P1 G0 P0Ci 0+ + + +=
Expanding Lookahead equations:
All the way:
Co,3
Ci,0
VDD
P0
P1
P2
P3
G0
G1
G2
G3
EE141© Digital Integrated Circuits2ndArithmetic Circuits
30
Look-Ahead Adder: Logarithmic adder
A7
F
A6A5A4A3A2A1
A0
A0
A1
A2
A3
A4
A5
A6
A7
F
tp log2(N)
tp N
EE141© Digital Integrated Circuits2ndArithmetic Circuits
31
Carry Look-Ahead Trees
Can continue building the tree hierarchically.
=G1:0+P1:0C0
(G1:0=G1+P1G0 P1:0=P1P0)
C0=G0+P0Cin
C1=G1+P1C0
C2=G2+P2C1
C3=G3+P3C2
C0=G0+P0Cin
C1=G1+P1C0 =G1+G0P1+P1P0Cin
C2=G2+P2C1 =G2+G1P2+G0P2P1+P2P1P0Cin =G2:1+P2:1C0
(G2:1=G2+P2G1 P2:1=P2P1)
C3=G3+P3C2 =G3+G2P3+G1P3P2+G0P3P2P1+P3P2P1P0Cin
=G3:2+P3:2C1=G3:2+P3:2(G1:0+P1:0C0)=(G3:2+P3:2G1:0)+P3:2P1:0C0
G3:2=(G3+P3G2) and P3:2=P3P2 are called dot products.
EE141© Digital Integrated Circuits2ndArithmetic Circuits
32
Tree Adders
16-bit radix-2 Kogge-Stone tree (radix 2 means that the tree is
Binary: it combines two dot product or carry words at a time at
Each level of hierarchy)
(A0,
B0)
(A1,
B1)
(A2,
B2)
(A3,
B3)
(A4,
B4)
(A5,
B5)
(A6,
B6)
(A7,
B7)
(A8,
B8)
(A9,
B9)
(A10,
B10)
(A11,
B11)
(A12,
B12)
(A13,
B13)
(A14,
B14)
(A15,
B15)
S0
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14
S15
EE141© Digital Integrated Circuits2ndArithmetic Circuits
33
Tree Adders(a
0,
b0)
(a1,
b1)
(a2,
b2)
(a3,
b3)
(a4,
b4)
(a5,
b5)
(a6,
b6)
(a7,
b7)
(a8,
b8)
(a9,
b9)
(a1
0,
b1
0)
(a1
1,
b1
1)
(a1
2,
b1
2)
(a1
3,
b1
3)
(a1
4,
b1
4)
(a1
5,
b1
5)
S0
S1
S2
S3
S4
S5
S6
S7
S8
S9
S1
0
S1
1
S1
2
S1
3
S1
4
S1
5
16-bit radix-4 Kogge-Stone Tree
EE141© Digital Integrated Circuits2ndArithmetic Circuits
34
Sparse Trees(a
0,
b0)
(a1,
b1)
(a2,
b2)
(a3,
b3)
(a4,
b4)
(a5,
b5)
(a6,
b6)
(a7,
b7)
(a8,
b8)
(a9,
b9)
(a1
0,
b1
0)
(a1
1,
b1
1)
(a1
2,
b1
2)
(a1
3,
b1
3)
(a1
4,
b1
4)
(a1
5,
b1
5)
S1
S3
S5
S7
S9
S1
1
S1
3
S1
5
S0
S2
S4
S6
S8
S1
0
S1
2
S1
4
16-bit radix-2 sparse tree with sparseness of 2
EE141© Digital Integrated Circuits2ndArithmetic Circuits
35
Tree Adders(A
0,
B0)
(A1,
B1)
(A2,
B2)
(A3,
B3)
(A4,
B4)
(A5,
B5)
(A6,
B6)
(A7,
B7)
(A8,
B8)
(A9,
B9)
(A1
0,
B1
0)
(A1
1,
B1
1)
(A1
2,
B1
2)
(A1
3,
B1
3)
(A1
4,
B1
4)
(A1
5,
B1
5)
S0
S1
S2
S3
S4
S5
S6
S7
S8
S9
S1
0
S1
1
S1
2
S1
3
S1
4
S1
5
Brent-Kung Tree
EE141© Digital Integrated Circuits2ndArithmetic Circuits
36
Intel Itanium Microprocessor
9-1
Mu
x9
-1 M
ux
5-1
Mu
x2
-1 M
ux
ck1
CARRYGEN
SUMGEN
+ LU
1000um
b
s0
s1
g64
sum sumb
LU : Logical
Unit
SU
MS
EL
a
to Cache
node1
RE
GItanium has 6 integer execution units like this
EE141© Digital Integrated Circuits2ndArithmetic Circuits
37
Bit-Sliced Design
Bit 3
Bit 2
Bit 1
Bit 0
Regis
ter
Ad
der
Sh
ifte
r
Mu
ltip
lexer
Control
Data
-In
Data
-Ou
t
Tile identical processing elements
EE141© Digital Integrated Circuits2ndArithmetic Circuits
38
Bit-Sliced Datapath
Adder stage 1
Wiring
Adder stage 2
Wiring
Adder stage 3
Bit s
lice
0
Bit s
lice
2
Bit s
lice
1
Bit s
lice
63
Sum Select
Shifter
Multiplexers
Loopback B
us
From register files / Cache / Bypass
To register files / Cache
Loopback B
us
Loopback B
us
The adder is
implemented
as a radix-4
Carry Look-
Ahead adder,
the red lines
are forwarding
the results of
different stages
EE141© Digital Integrated Circuits2ndArithmetic Circuits
39
Itanium Integer Datapath
Courtesy of Intel
EE141© Digital Integrated Circuits2ndArithmetic Circuits
41
The Binary Multiplication
Z X··
Y Zk2k
k 0=
M N 1–+
= =
Xi2i
i 0=
M 1–
Yj2j
j 0=
N 1–
=
XiYj2i j+
j 0=
N 1–
i 0=
M 1–
=
X Xi2i
i 0=
M 1–
=
Y Yj2j
j 0=
N 1–
=
with
EE141© Digital Integrated Circuits2ndArithmetic Circuits
42
The Binary Multiplication
x
+
Partial products
Multiplicand
Multiplier
Result
1 0 1 0 1 0
1 0 1 0 1 0
1 0 1 0 1 0
1 1 1 0 0 1 1 1 0
0 0 0 0 0 0
1 0 1 0 1 0
1 0 1 1
EE141© Digital Integrated Circuits2ndArithmetic Circuits
43
The Array Multiplier (4 by 4)
Y0
Y1
X3 X2 X1 X0
X3
HA
X2
FA
X1
FA
X0
HA
Y2X3
FA
X2
FA
X1
FA
X0
HA
Z1
Z3Z6Z7 Z5 Z4
Y3X3
FA
X2
FA
X1
FA
X0
HA
Z2
Z0
The carryout of the last adder for Yi is forwarded to Yi+1
carry
sum
Half
adder
EE141© Digital Integrated Circuits2ndArithmetic Circuits
44
The MxN Array Multiplier
— Critical Path
HA FA FA HA
HAFAFAFA
FAFA FA HA
Critical Path 1
Critical Path 2
Critical Path 1 & 2
EE141© Digital Integrated Circuits2ndArithmetic Circuits
45
Carry-Save Multiplier
HA HA HA HA
FAFAFAHA
FAHA FA FA
FAHA FA HA
Vector Merging Adder
A more efficient
realization can be obtained
by noticing that the
multiplication results does
not change when the output
carry bits are passed
diagonally downwards
instead of to the right.
But need extra adders
(vector merging adders) that
can use fast carry look
ahead adders (since results
come at the same time)
Critical path is uniquely
defined
EE141© Digital Integrated Circuits2ndArithmetic Circuits
46
Multiplier Floorplan
SCSCSCSC
SCSCSCSC
SCSCSCSC
SC
SC
SC
SC
Z0
Z1
Z2
Z3Z4Z5Z6Z7
X0X1X2X3
Y1
Y2
Y3
Y0
Vector Merging Cell
HA Multiplier Cell
FA Multiplier Cell
X and Y signals are broadcasted
through the complete array.
( )
EE141© Digital Integrated Circuits2ndArithmetic Circuits
47
Wallace-Tree Multiplier
6 5 4 3 2 1 0 6 5 4 3 2 1 0
Partial products First stage
Bit position
6 5 4 3 2 1 0 6 5 4 3 2 1 0
Second stage Final adder
FA HA
(a) (b)
(c) (d)
Save the number of full adders
Increase the complexity of routing
EE141© Digital Integrated Circuits2ndArithmetic Circuits
48
Wallace-Tree Multiplier
HA
Can use carry Look-Ahead adder for the last stage
EE141© Digital Integrated Circuits2ndArithmetic Circuits
49
Wallace-Tree Multiplier
FA
FA
FA
FA
y0 y1 y2
y3
y4
y5
S
Ci-1
Ci-1
Ci-1
Ci
Ci
Ci
FA
y0 y1 y2
FA
y3 y4 y5
FA
FA
CC S
Ci-1
Ci-1
Ci-1
Ci
Ci
Ci
EE141© Digital Integrated Circuits2ndArithmetic Circuits
50
Booth encoding Multiply by 01111110 gives 8 partial products, but two
are all zero. Add these zero is waste of time.
Instead, multiply by 100000010, where 1 stands for -1.
Then you need to only add (actually subtract) partial
products, which improves speed
This kind of transformation is called booth encoding. It
reduces the number of partial product to at most half of
the original multiplier width.
The encoding logic is easily incorporated in the overall
multiplier design.
EE141© Digital Integrated Circuits2ndArithmetic Circuits
51
Multipliers —Summary
• Optimization Goals Different Vs Binary Adder
• Once Again: Identify Critical Path
• Other possible techniques
- Data encoding (Booth)
- Pipelining
FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION
- Logarithmic versus Linear (Wallace Tree Mult)
This is also why algorithmic invention has significant
meaning to VLSI design.
EE141© Digital Integrated Circuits2ndArithmetic Circuits
53
The Binary Shifter
Ai
Ai-1
Bi
Bi-1
Right Leftnop
Bit-Slice i
...
EE141© Digital Integrated Circuits2ndArithmetic Circuits
54
The Barrel Shifter
Sh3Sh2Sh1Sh0
Sh3
Sh2
Sh1
A3
A2
A1
A0
B3
B2
B1
B0
: Control Wire
: Data Wire
Area Dominated by Wiring
Column: maximum shift
Word length
EE141© Digital Integrated Circuits2ndArithmetic Circuits
55
4x4 barrel shifter
BufferSh3Sh2Sh1Sh0
A3
A2
A1
A0
Coder/decoder required to set shift bits
Signal pass through one gate independent of shift
amount (parasitic capacitance may change the picture)
EE141© Digital Integrated Circuits2ndArithmetic Circuits
56
Logarithmic ShifterSh1 Sh1 Sh2 Sh2 Sh4 Sh4
A3
A2
A1
A0
B1
B0
B2
B3
No separate coder/decoder is required
EE141© Digital Integrated Circuits2ndArithmetic Circuits
57
A3
A2
A1
A0
Out3
Out2
Out1
Out0
0-7 bit Logarithmic Shifter
Good for large shift amount (note that cascade pass
transistor slow down the gate and generate weak signals,
buffers may be needed)