Post on 14-Apr-2018
7/27/2019 Shawn Present
Low-Power, High-Speed Multiplier Architectures
Shawn Nicholl
ELEC-5705
March 7, 2005
2005/03/07 Low-Power, High-Speed Multiplier Architectures 2
Agenda/Overview
Design Abstraction
Numbering Systems
Addition and Subtraction
Adder Architectures
Multiplication
Traditional Multiplier Architectures
Advanced Multiplier Architectures
Levels of Abstraction in Digital ICs
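Higher levels of abstraction have a greater effect on overall system performance
Levels of abstraction, from highest to lowest:
Systems
Modules
Logic Gates
Circuits
Devices
Low-power, high-speed techniques can be used at many levels of abstraction
[Figure: arrow labeled "Increasing Abstraction" alongside the stack above; "Multiplier Architectures" is marked on the stack]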
Numbering Systems: A Quick Review
Some common numbering systems:
Decimal: $D = \sum_{i=0}^{n-1} d_i \cdot 10^i$
Range: 0 to $10^n - 1$
Unsigned Binary: $B = \sum_{i=0}^{n-1} b_i \cdot 2^i$
Range: 0 to $2^n - 1$
Two's-Complement
Range: $-2^{n-1}$ to $+(2^{n-1} - 1)$

Decimal          Unsigned Binary      Two's Complement
Sign  Value      Sign  Value          Sign  Value
+     10         +     0000 1010      N/A   0000 1010
-     45         -     0010 1101      N/A   1101 0011

E.g. 45d = 0 + 0 + 2^5 + 0 + 2^3 + 2^2 + 0 + 2^0 = 0010 1101
Invert:  1101 0010
Add 1:   1101 0011 (2's Comp, i.e. -45d)
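The invert-and-add-one conversion can be checked with a few lines of Python. This is a minimal sketch, assuming an 8-bit word as in the example; the function names `to_twos_complement` and `from_twos_complement` are illustrative and not part of the original slides.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the two's-complement bit string of `value` in `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking a negative Python int yields its two's-complement pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Interpret a bit string as a two's-complement number."""
    raw = int(pattern, 2)
    if pattern[0] == "1":          # sign bit set: subtract 2^n
        raw -= 1 << len(pattern)
    return raw


print(to_twos_complement(10))      # 00001010
print(to_twos_complement(-45))     # 11010011
```

The range check mirrors the slide's range of $-2^{n-1}$ to $+(2^{n-1} - 1)$.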
Adding and Subtracting
Two's-complement algorithm is consistent
Addition and subtraction behave the same
Negative numbers are treated the same as positive numbers
Example: Add -45d to 10d
Signed Decimal Method
Step 1) Initialize: 10d + (-45d)
Step 2) Compare so that the augend holds the larger number: -45d + 10d
Step 3) Treat as a subtraction: 45d - 10d
Step 4) Do the subtraction (borrows may be required): 45d - 10d = 35d
Step 5) Negate the result (knowing that the augend was negative): -35d
Two's Complement Method
Step 1) Initialize: 10d = 0000 1010b, -45d = 1101 0011b
Step 2) Add (no special rules): 0000 1010b + 1101 0011b = 1101 1101b
Converting 2's Comp back to decimal: 1101 1101b = -35d
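The two's-complement method can be reproduced with plain integer addition on 8-bit patterns. This is a sketch assuming an 8-bit word; `add8` and `as_signed8` are hypothetical helper names used only for illustration.

```python
MASK8 = 0xFF  # assumed 8-bit word, as in the example


def add8(a: int, b: int) -> int:
    """Add two 8-bit patterns with no special rules; the carry out of bit 7 is dropped."""
    return (a + b) & MASK8


def as_signed8(pattern: int) -> int:
    """Read an 8-bit pattern as a two's-complement value."""
    return pattern - 256 if pattern & 0x80 else pattern


total = add8(0b0000_1010, 0b1101_0011)   # 10d + (-45d)
print(format(total, "08b"), as_signed8(total))
```

No comparison, sign tracking, or negation step is needed, which is the point the slide makes against the signed decimal method.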
Adding and Subtracting (Example 2)
Example 2: Subtract -45d from 10d
Signed Decimal Method
Step 1) Initialize: 10d - (-45d)
Step 2) Subtrahend is negative, so negate it and do an addition: 10d + 45d = 55d
Two's Complement Method
Step 1) Initialize: 10d = 0000 1010b, -45d = 1101 0011b
Step 2) Invert the subtrahend and set CIN = 1: 1b + 0000 1010b + 0010 1100b = 0011 0111b
Converting 2's Comp back to decimal: 0011 0111b = 55d
Subtraction logic can be shared with addition logic!
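The shared add/subtract path can be sketched as one function with a mode flag. This assumes an 8-bit word; the name `add_sub8` and the `subtract` flag are illustrative and do not come from the slides.

```python
def add_sub8(a: int, b: int, subtract: bool) -> int:
    """One 8-bit adder used for both A + B and A - B.

    For subtraction the second operand is inverted bit by bit and the
    carry-in is set to 1, which together form its two's complement.
    """
    operand = (b ^ 0xFF) if subtract else b
    cin = 1 if subtract else 0
    return (a + operand + cin) & 0xFF


# 10d - (-45d): 0000 1010b + 0010 1100b + 1b
result = add_sub8(0b0000_1010, 0b1101_0011, subtract=True)
print(format(result, "08b"))
```

Only a row of inverters (or XOR gates controlled by the mode) and the carry-in differ between the two operations.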
Adder Building Blocks
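Half Adder
$S_n = A_n \oplus B_n$
$CO_n = A_n \cdot B_n$
[Figure: half adder with inputs An, Bn and outputs Sn, COn]
Full Adder
$S_n = A_n \oplus B_n \oplus CIN_n$
$COUT_n = A_n B_n + CIN_n (A_n \oplus B_n)$
[Figure: full adder with inputs An, Bn, CINn and outputs Sn, COUTn]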
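The two building blocks can be written as one-bit Boolean functions. This is a sketch; the carry-out uses the common form A·B + CIN·(A xor B), which is an assumption since the operators did not survive in the extracted slide text.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry_out) for two input bits."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for two input bits plus a carry-in."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


# Print the full-adder truth table.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, full_adder(a, b, cin))
```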
Adder Architectures (CRA)
Carry Ripple Adder (CRA)
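Gate count ∝ N
Area ∝ N
Delay ∝ N
Power ∝ N
Layout friendly (low fan-in/fan-out; regular structure)
[Figure: chain of full adders (FA); stage 0 takes A0, B0 and CIN0 and produces S0, each carry out feeds the next stage, and the last stage produces SN and COUTN]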
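A bit-level model shows why the delay grows with N: each stage must wait for the carry from the stage before it. This is a sketch, assuming an 8-bit default width; `ripple_carry_add` is an illustrative name.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for one bit position."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def ripple_carry_add(a: int, b: int, n: int = 8, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit patterns one stage at a time; return (sum, carry_out)."""
    total = 0
    carry = cin
    for i in range(n):                      # stage i depends on the carry of stage i-1
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(0b0000_1010, 0b1101_0011))   # 10d + (-45d) as 8-bit patterns
```

The loop body is one full adder, so gate count and area scale linearly with the word width as well.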
Adder Architectures (CLA)
Carry Lookahead Adder (CLA)
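Generate: $G_n = A_n \cdot B_n$
Propagate: $P_n = A_n + B_n$
Recursive relationship:
$CIN_n = G_{n-1} + P_{n-1} \cdot CIN_{n-1}$
(the $G_{n-1}$ term generates a carry; the $P_{n-1}$ term propagates a carry-in of 1)
Expanded:
$CIN_n = G_{n-1} + P_{n-1} G_{n-2} + \cdots + P_{n-1} P_{n-2} \cdots P_1 G_0 + P_{n-1} P_{n-2} \cdots P_0 \, CIN_0$
CLA: Delay ∝ $\log_2 N$ (if built right)
Gate count and power are greater than CRA
Not layout friendly (high fan-in; difficult to route)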
[Figure: carry-lookahead logic computing CINN from G0 ... GN-1, P0 ... PN-1 and CIN0, feeding adder stages with inputs A0 B0 ... AN BN and outputs S0 ... SN]
Source: Patterson and Hennessy, Figure A.14
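The expanded carry equation can be evaluated directly, so that every carry depends only on the G and P signals and CIN0, never on another carry. This is a behavioural sketch, assuming P is the OR form and an 8-bit default width; `cla_add` is an illustrative name and the nested loops stand in for the wide AND/OR gates of real hardware.

```python
def cla_add(a: int, b: int, n: int = 8, cin0: int = 0) -> tuple[int, int]:
    """Add two n-bit patterns using fully expanded lookahead carries."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]      # generate
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]    # propagate
    carries = [cin0]
    for i in range(1, n + 1):
        c = 0
        for j in range(i):
            # Term G_j * P_{j+1} * ... * P_{i-1}
            term = g[j]
            for k in range(j + 1, i):
                term &= p[k]
            c |= term
        # Term P_0 * P_1 * ... * P_{i-1} * CIN_0
        term = cin0
        for k in range(i):
            term &= p[k]
        carries.append(c | term)
    total = 0
    for i in range(n):
        total |= (((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i
    return total, carries[n]


print(cla_add(0b0000_1010, 0b1101_0011))
```

The number of product terms per carry grows with the bit position, which reflects the high fan-in and routing cost noted on the slide.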
Adder Architectures (CSA)
Carry Save Adder (CSA)
Adders work independently, so very fast
Pipelined architecture results in flops and control logic, which increase area and latency
[Figure: a row of independent full adders (FA), stage i taking Ai, Bi and CINi and producing Si and COUTi for i = 0 ... N, shown above three stacked rows of four FAs]
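At word level, a carry-save adder turns three operands into a sum word and a carry word with no carry moving between bit positions. This is a sketch for non-negative integers; `carry_save_add` and `sum_with_csa` are illustrative names, and the final `sum` stands in for a conventional carry-propagating adder.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Compress three operands into (sum_word, carry_word); each bit is one full adder."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1   # carries move up one position
    return sum_word, carry_word


def sum_with_csa(values: list[int]) -> int:
    """Reduce a list of operands with CSAs, then do one final ordinary addition."""
    rows = list(values)
    while len(rows) > 2:
        s, c = carry_save_add(rows[0], rows[1], rows[2])
        rows = [s, c] + rows[3:]
    return sum(rows)


# The four non-zero partial products of 118d x 99d.
print(sum_with_csa([118, 236, 3776, 7552]))
```

Only the last addition has to resolve carries across the full word width.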
Unsigned Multiplication
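Shift-and-Add Algorithm
Example: Multiply 118d by 99d
Decimal Method
Step 1) Initialize: multiplicand 118d, multiplier 99d
Step 2) Find partial products: 118d x 9 = 1062d (ones digit), 118d x 9 = 1062d (tens digit)
Step 3) Sum up the shifted partial products: 1062d + 10620d = 11682d
Two's Complement Method
Step 1) Initialize: 118d = 0111 0110b (multiplicand), 99d = 0110 0011b (multiplier)
Step 2) Find partial products (one per multiplier bit, least significant bit first):
01110110 b
01110110 b
00000000 b
00000000 b
00000000 b
01110110 b
01110110 b
00000000 b
Step 3) Sum up the shifted partial products (each row shifted left one place further than the row above):
0010110110100010 b
Convert 2's Comp back to decimal: 0010 1101 1010 0010b = 11682d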
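The three steps translate into a short loop over the multiplier bits. This is a sketch for unsigned operands, assuming 8-bit inputs; `shift_and_add_multiply` is an illustrative name.

```python
def shift_and_add_multiply(multiplicand: int, multiplier: int, n: int = 8) -> tuple[int, list[int]]:
    """Return (product, shifted partial products) for unsigned n-bit operands."""
    partials = []
    product = 0
    for i in range(n):
        bit = (multiplier >> i) & 1
        # Partial product i is the multiplicand (or zero), shifted left i places.
        partial = (multiplicand << i) if bit else 0
        partials.append(partial)
        product += partial
    return product, partials


product, partials = shift_and_add_multiply(0b0111_0110, 0b0110_0011)
print(format(product, "016b"), product)
```

One partial product is produced per multiplier bit, zero or not, which is the cost that Booth recoding later reduces.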
Shift-and-Add Multiplier
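[Figure: shift-and-add datapath with N-bit registers A and B (loaded via Load A / Load B), the An x B selection, an N-bit adder producing S and COUT (N+1 bits), and the 2N-bit product register P with Shift and Add controls]
B: Multiplicand
A: Multiplier
P: Product
Shift-and-Add Multiplier
Takes N cycles to complete: $T_{Lat} = (T_{N\text{-bit ADD}} + T_{shift}) \times N$
Requires minimal logic (most logic is in the adder)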
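A cycle-by-cycle model makes the N-cycle latency concrete. This is a sketch under an assumed register arrangement (the multiplier A starts in the low half of P and is shifted out as the product shifts in), which may differ in detail from the slide's datapath; `sequential_multiply` is an illustrative name.

```python
def sequential_multiply(a: int, b: int, n: int = 8) -> tuple[int, int]:
    """Multiply unsigned n-bit A (multiplier) by B (multiplicand); return (P, cycles)."""
    p = a & ((1 << n) - 1)      # low half of P initially holds the multiplier
    cycles = 0
    for _ in range(n):
        if p & 1:               # Add: low-order multiplier bit selects B or 0
            p += b << n         # the N-bit adder works on the upper half of P
        p >>= 1                 # Shift: move P right by one place
        cycles += 1
    return p, cycles


print(sequential_multiply(99, 118))
```

Each iteration corresponds to one add time plus one shift time, so the total is the slide's latency expression with N iterations.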
Basic Signed Multiplication
Basic Idea
1. Convert to Unsigned
2. Use Shift-and-Add Multiplier
3. Convert to Signed
Extra hardware!
[Figure: N-bit inputs A and B each pass through a "Convert to Unsigned" block into the Shift-and-Add Multiplier; a "Determine Sign of Result" block drives a "Convert to Signed" block that outputs the 2N-bit product P]
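The convert, multiply, convert-back flow can be sketched as follows. This assumes 8-bit two's-complement inputs; both function names are illustrative, and the sign logic (XOR of the operand signs) is the standard rule rather than something spelled out on the slide.

```python
def unsigned_shift_and_add(multiplicand: int, multiplier: int, n: int) -> int:
    """Unsigned shift-and-add product of two magnitudes of at most n bits."""
    product = 0
    for i in range(n):
        if (multiplier >> i) & 1:
            product += multiplicand << i
    return product


def basic_signed_multiply(a: int, b: int, n: int = 8) -> int:
    """Signed multiply: convert to unsigned, multiply, then restore the sign."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operands must fit in n-bit two's complement")
    negative = (a < 0) != (b < 0)                           # determine sign of result
    magnitude = unsigned_shift_and_add(abs(a), abs(b), n)   # unsigned core
    return -magnitude if negative else magnitude            # convert to signed


print(basic_signed_multiply(-118, -99))
```

The two magnitude conversions and the final negation are the extra hardware that Booth recoding avoids.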
Signed Multiplication
Booth Recoding
Reduce the number of partial products by re-coding the multiplier operand
Works for signed numbers
Example: Multiply -118d by -99d
Recall, 99d = 0110 0011b
Invert: 1001 1100b
Add 1: + 1b
-99d = 1001 1101b
Radix-2 Booth Recoding
An = low-order bit, An-1 = last bit shifted out
An  An-1  Partial Product
0   0     0
0   1     +B
1   0     -B
1   1     0
-99d = -1 0 +1 0 0 -1 +1 -1 (recoded digits, most significant first)
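The recoding table reduces to one subtraction per bit: digit = An-1 - An. This is a sketch, assuming an 8-bit two's-complement multiplier; `booth_radix2_digits` is an illustrative name and the digits are returned least significant first.

```python
def booth_radix2_digits(multiplier: int, n: int = 8) -> list[int]:
    """Return the radix-2 Booth digits (-1, 0 or +1), least significant first."""
    pattern = multiplier & ((1 << n) - 1)
    digits = []
    prev = 0                       # A(-1): the "last bit shifted out" starts at 0
    for i in range(n):
        bit = (pattern >> i) & 1   # A(i): the low-order bit being examined
        digits.append(prev - bit)  # 00 -> 0, 01 -> +1, 10 -> -1, 11 -> 0
        prev = bit
    return digits


digits = booth_radix2_digits(-99)
print(digits)
```

Weighting digit i by 2^i and summing recovers the original signed value, which is why the scheme works for negative multipliers without a separate sign step.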
Radix-2 Booth Multiplication
Radix-2 Booth
Example: Multiply -118d by -99d
Step 1) Initialize
B = -118d = 1000 1010b
-B = 118d = 0111 0110b
A = -99d = 1001 1101b
Recoded A = -1 0 +1 0 0 -1 +1 -1 (most significant digit first)
Step 2) Find partial products (least significant digit first; each is sign-extended to 16 bits and shifted left one place further than the one before)
-B   0000000001110110 b
+B   1111111100010100 b
-B   0000000111011000 b
0    0000000000000000 b
0    0000000000000000 b
+B   1111000101000000 b
0    0000000000000000 b
-B   0011101100000000 b
Step 3) Sum up the shifted partial products
     0010110110100010 b
Convert 2's Comp back to decimal: 0010 1101 1010 0010b = 11682d
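The worked example can be replayed in code by adding or subtracting the shifted multiplicand according to each bit pair. This is a sketch, assuming an 8-bit two's-complement multiplier A; `booth_radix2_multiply` is an illustrative name, and Python's unbounded integers take the place of explicit sign extension.

```python
def booth_radix2_multiply(a: int, b: int, n: int = 8) -> int:
    """Multiply signed a (n-bit multiplier) by signed b using radix-2 Booth recoding."""
    pattern = a & ((1 << n) - 1)
    product = 0
    prev = 0                             # last bit shifted out, initially 0
    for i in range(n):
        bit = (pattern >> i) & 1
        if (bit, prev) == (0, 1):
            product += b * (1 << i)      # +B, shifted left i places
        elif (bit, prev) == (1, 0):
            product -= b * (1 << i)      # -B, shifted left i places
        prev = bit
    return product


result = booth_radix2_multiply(-99, -118)
print(format(result & 0xFFFF, "016b"), result)
```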
Array Multiplier
Array Multiplier
Combinatorial, so it is very fast: delay ∝ N
Can be pipelined
Very regular structure
[Figure: the radix-2 Booth partial products of the -118d x -99d example (-B, +B, -B, 0, 0, +B, 0, -B) are summed by successive rows of carry-save adders (CSA), each built from full adders (FA), with a final CPA producing 0010110110100010 b]
Array Multiplier Structure
Source: J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, 1999
Radix-4 Booth Multiplication
Similar to Radix-2, but looks at two low-order bits at a time (instead of 1)
A2n+1, A2n = low-order bits; A2n-1 = last bit shifted out
A2n+1  A2n  A2n-1  Partial Product
0      0    0      0
0      0    1      +B
0      1    0      +B
0      1    1      +2B
1      0    0      -2B
1      0    1      -B
1      1    0      -B
1      1    1      0
Recall, 99d = 0110 0011b
Invert: 1001 1100b
Add 1: + 1b
-99d = 1001 1101b
Radix-4 Booth Recoding
-99d = +1 -1 +2 -2 (recoded digits, least significant first)
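The eight-row table can be used directly as a lookup. This is a sketch, assuming an even bit width and an 8-bit default; `RADIX4_TABLE` and `booth_radix4_digits` are illustrative names, and the digits are returned least significant first.

```python
# (A2n+1, A2n, A2n-1) -> multiple of B, copied from the table above.
RADIX4_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_radix4_digits(multiplier: int, n: int = 8) -> list[int]:
    """Return the radix-4 Booth digits (-2 ... +2), least significant first."""
    if n % 2:
        raise ValueError("n must be even")
    pattern = multiplier & ((1 << n) - 1)
    digits = []
    prev = 0                                  # last bit shifted out, initially 0
    for i in range(0, n, 2):
        lo = (pattern >> i) & 1
        hi = (pattern >> (i + 1)) & 1
        digits.append(RADIX4_TABLE[(hi, lo, prev)])
        prev = hi
    return digits


digits4 = booth_radix4_digits(-99)
print(digits4)
```

An 8-bit multiplier yields four digits instead of eight, each weighted by a power of 4.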
Radix-4 Booth Multiplication
Radix-4 Booth
Example: Multiply -118d by -99d
Step 1) Initialize
B = -118d = 1000 1010b
-B = 118d = 0111 0110b
2B = -236d = 1 0001 0100b
-2B = 236d = 0 1110 1100b
A = -99d = 1001 1101b
Recoded A = +1 -1 +2 -2 (least significant digit first)
Step 2) Find partial products (each is sign-extended to 16 bits and shifted left two places further than the one before)
+B    1111111110001010 b
-B    0000000111011000 b
+2B   1111000101000000 b
-2B   0011101100000000 b
Step 3) Sum up the shifted partial products
      0010110110100010 b
Convert 2's Comp back to decimal: 0010 1101 1010 0010b = 11682d
Reduces number of partial products by half!
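The radix-4 example can be replayed with the arithmetic form of the table, digit = A2n + A2n-1 - 2·A2n+1. This is a sketch, assuming an even bit width and an 8-bit default; `booth_radix4_multiply` is an illustrative name, and Python's signed integers stand in for sign extension.

```python
def booth_radix4_multiply(a: int, b: int, n: int = 8) -> tuple[int, list[int]]:
    """Multiply signed a (n-bit multiplier) by signed b; return (product, partials)."""
    if n % 2:
        raise ValueError("n must be even")
    pattern = a & ((1 << n) - 1)
    prev = 0
    partials = []
    for i in range(0, n, 2):
        lo = (pattern >> i) & 1
        hi = (pattern >> (i + 1)) & 1
        digit = lo + prev - 2 * hi            # one of -2, -1, 0, +1, +2
        partials.append(digit * b * (1 << i)) # selected multiple of B, shifted i places
        prev = hi
    return sum(partials), partials


product4, partials4 = booth_radix4_multiply(-99, -118)
print(product4, partials4)
```

Four partial products are summed here, against eight for the radix-2 version of the same example.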
Tree Multiplier
Wallace Tree
Reduces the total number of full-adders
Uses 3:2 Compressor (aka Full Adder)
Delay ∝ $\log_{3/2} N$
Irregular structure is difficult to lay out
[Figure: "Original Structure" and "Tree Structure" of the partial-product array, with corner terms labeled B7A0, B0A0, B7A8 and B0A8]
Source: J. Kuo et al., Low-Voltage CMOS VLSI Circuits, 1999
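The logarithmic depth can be seen by compressing rows three at a time, level by level. This is a word-level simplification (a real Wallace tree assigns compressors per bit column), written for non-negative rows; `wallace_reduce` is an illustrative name.

```python
def wallace_reduce(rows: list[int]) -> tuple[int, int]:
    """Reduce partial-product rows with 3:2 compressors; return (total, levels)."""
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3         # rows that form complete groups of 3
        next_rows = []
        for i in range(0, full, 3):
            x, y, z = rows[i:i + 3]
            next_rows.append(x ^ y ^ z)                          # sum word
            next_rows.append(((x & y) | (x & z) | (y & z)) << 1) # carry word
        next_rows.extend(rows[full:])            # leftover rows pass straight through
        rows = next_rows
        levels += 1
    return sum(rows), levels                     # final two rows go to an ordinary adder


# Eight shift-and-add partial products of 118d x 99d.
rows = [(118 << i) if (99 >> i) & 1 else 0 for i in range(8)]
print(wallace_reduce(rows))
```

Each level shrinks the row count by roughly a factor of 3/2, so eight rows need four compressor levels before the final addition.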
Twin Pipe Serial-Parallel Multiplier
Features
Low area
High latency
Low power
[Figure: one operand is fed in parallel and the other serially; even data bits are handled on the rising clock edge and odd data bits on the falling clock edge]
Source: S. Shah et al., Comparison of 32-bit Multipliers for Various Performance Measures, 2000.
Cluster Multiplication
Divide circuit into clusters of nibble-wide multiplications
If all bits in a nibble are zeroes, then use clock-gating to gate the multiplication for that nibble
Features
Low power (claims 13% savings)
[Figure: grid of 4-bit cluster multipliers Ai x Bj for i, j = 0 ... N-1, fed by nibbles A0 ... A(N-1) and B0 ... B(N-1)]
Source: A. Fayed, M. Bayoumi, A Novel Architecture for Low-Power Design of Parallel Multipliers, 2001.
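A behavioural model of the idea splits both operands into nibbles and skips any cluster with an all-zero input nibble. This is a sketch for unsigned operands and is not the architecture from the cited paper; treating a zero nibble on either operand as gated is an assumption, and `cluster_multiply` is an illustrative name.

```python
def cluster_multiply(a: int, b: int, n: int = 8) -> tuple[int, int]:
    """Multiply unsigned n-bit operands nibble by nibble; return (product, active clusters)."""
    count = n // 4
    a_nibbles = [(a >> (4 * i)) & 0xF for i in range(count)]
    b_nibbles = [(b >> (4 * j)) & 0xF for j in range(count)]
    product = 0
    active = 0
    for i, an in enumerate(a_nibbles):
        for j, bn in enumerate(b_nibbles):
            if an == 0 or bn == 0:
                continue                      # gated: this cluster does no work
            product += (an * bn) << (4 * (i + j))
            active += 1
    return product, active


print(cluster_multiply(118, 99))      # all four clusters active
print(cluster_multiply(0x0F, 0x30))   # only one cluster active
```

The `active` count is a rough proxy for switching activity: operands with many zero nibbles leave most clusters idle.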
Multiplexer-Based Array Multiplier
Characteristics
Fast (because it is array-based)
Unlike Booth, does not require encoding logic
Processes 1 bit of multiplier and 1 bit of multiplicand at a time, thus it is symmetric
Has a zigzag shape, thus not layout-friendly
Source: K. Pekmestzi, Multiplexer-Based Array Multipliers, 1999.
Area-Efficient Multiplexer-Based Multiplier
Characteristics
Increases each row to have N+1 cells (instead of N)
Depth is cut in half (increases squareness)
Source: Y. Wang, Y. Jiang, E. Sha, On Area-Efficient Low Power Array Multipliers, 2001.
Low Latency Booth-Encoding-based Pipeline Multiplier
Features
Delay ∝ N/4
Needs (N+N/2)-bit addition at end
Uses CLAs instead of CSAs because the longest stage (i.e. the adder at the end) determines the fastest operating frequency
Source: X. Wu, H. Chen, S. Wei, Design of a Low Latency High Speed Pipelining Multiplier, 2001.
Two's Complement Gray-Encoded Array Multiplier
Characteristics
Uses Gray code to reduce the switching activity of the multiplier
Claims that traditional Booth uses 45% more power
Greater area than traditional Booth
Source: E. Costa et al., A New Architecture for 2's Complement Gray Encoded Array Multiplier, 2002.
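The reason Gray code lowers switching activity can be shown by counting bit toggles over a counting sequence. This is only an illustration of the encoding, not of the multiplier in the cited paper; `binary_to_gray` and `toggles` are illustrative names, and an 8-bit counting sequence is an assumed workload.

```python
def binary_to_gray(value: int) -> int:
    """Standard reflected Gray code of a non-negative integer."""
    return value ^ (value >> 1)


def toggles(sequence: list[int]) -> int:
    """Total number of bit flips between consecutive words of a sequence."""
    return sum(bin(x ^ y).count("1") for x, y in zip(sequence, sequence[1:]))


counting = list(range(256))                       # 8-bit values 0 ... 255
binary_toggles = toggles(counting)
gray_toggles = toggles([binary_to_gray(v) for v in counting])
print(binary_toggles, gray_toggles)
```

Consecutive Gray codewords differ in exactly one bit, so the Gray sequence flips far fewer bits than plain binary for this input pattern. Real operand streams are not simple counts, so the saving in practice depends on the data.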
References
S. Shah, A.J. Al-Khalili, D. Al-Khalili, Comparison of 32-bit Multipliers for Various Performance Measures, Proc. 2000 Int'l Conf. Microelectronics, pp. 75-80, 2000.
D. Patterson, J. Hennessy, Computer Architecture: A Quantitative Approach, 2nd ed., San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1996.
X. Wu, H. Chen, S. Wei, Design of a Low Latency High Speed Pipelining Multiplier, Proc. 2001 Int'l Conf. on ASIC, pp. 551-554, 2001.
J. Wakerly, Digital Design: Principles and Practices, 2nd ed., Englewood Cliffs, NJ: Prentice Hall, 1994.
J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, New York, NY: John Wiley & Sons, Inc., 1999.
K. Pekmestzi, Multiplexer-Based Array Multipliers, IEEE Trans. on Computers, vol. 48, pp. 15-23, 1999.
A. Fayed, M. Bayoumi, A Novel Architecture for Low-Power Design of Parallel Multipliers, Proc. 2001 IEEE Computer Society Workshop on VLSI, pp. 149-154, 2001.
Y. Wang, Y. Jiang, E. Sha, On Area-Efficient Low Power Array Multipliers, Proc. 2001 IEEE Int'l Conf. on Electronics, Circuits and Systems, vol. 3, pp. 1429-1432, 2001.