ELEC516/10 Lecture 4 1 ELEC 516 VLSI System Design and Design Automation Spring 2010 Lecture 4 -...

ELEC516/10 Lecture ELEC 516 VLSI System Design and Design Automation Spring 2010 Lecture 4 - Shifter and Multiplier Design Reading Assignment: Weste: Chapter 8 Rabaey: Chapter 11 Note: some of the figures in this slide set are adapted from the slide set of “ Digital Integrated Circuits” by Rabaey. Et. al. 2002


ELEC516/10 Lecture 41

ELEC 516 VLSI System Design and Design Automation Spring 2010

Lecture 4 - Shifter and Multiplier Design

Reading Assignment:

Weste: Chapter 8

Rabaey: Chapter 11

Note: some of the figures in this slide set are adapted from the slide set of “Digital Integrated Circuits” by Rabaey et al., 2002


Shifter Design

• Shifting operations are important and are used extensively for
  – arithmetic shifting, logical shifting, rotation
  – floating-point operations, scaling, and multiplication by a constant
  – data alignment
  – field extraction/combination
  – address generation

• Shifting a data word left or right by a constant amount is a trivial hardware operation. A programmable shifter, however, is more complex.
  – E.g. shift left or right by a variable number of bits

• Design style
  – Two-dimensional arrays
  – Variable size
  – Rotate
  – Padding with zeros/ones


A simple shifter

[Figure: bit-slice i of a simple one-bit shifter, with inputs Ai and Ai-1, outputs Bi and Bi-1, and control signals Right, Left, and nop.]

• This design rapidly becomes complex and slow for larger shift values.

• A more structured approach is advisable. Two commonly used shift structures are the barrel shifter and the logarithmic shifter.


Barrel Shifter

• It consists of an array of transmission gates, where the number of rows equals the word length of the data and the number of columns equals the maximum shift width.

• A major advantage of this shifter is that the signal has to pass through at most one transmission gate, so the delay is theoretically constant and independent of the shift value or shifter size. This is not true in reality, since the capacitance at the input of the buffers rises linearly with the maximum shift width.


Barrel Shifter (2)

[Figure: 4-bit barrel shifter array with data inputs A3..A0, data outputs B3..B0, and shift control lines Sh0..Sh3. The legend distinguishes control wires from data wires.]

Area dominated by wiring
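The single-gate data path described above can be sketched behaviourally. This is a minimal model, not the circuit from the figure: it assumes a right shift with zero fill and a one-hot control vector (one line per shift amount), and the function name `barrel_shift_right` is hypothetical.

```python
def barrel_shift_right(a_bits, sh_onehot):
    """Behavioural model of a barrel shifter (assumed: right shift, zero fill).

    a_bits[i] is data bit i (LSB first); sh_onehot[k] == 1 selects a shift by k.
    Every output bit passes through exactly one "switch", whatever the shift.
    """
    n = len(a_bits)
    assert sum(sh_onehot) == 1, "exactly one control line may be active"
    b_bits = []
    for i in range(n):                 # one row of the array per output bit
        out = 0
        for k, sel in enumerate(sh_onehot):  # one column per shift amount
            src = i + k                # input bit wired to this crosspoint
            if sel and src < n:
                out |= a_bits[src]
        b_bits.append(out)
    return b_bits


# 0b1011 shifted right by 2 gives 0b0010 (bits listed LSB first).
print(barrel_shift_right([1, 1, 0, 1], [0, 0, 1, 0]))
```

The two nested loops mirror the rows and columns of the array, which is why the switch count grows with word length times maximum shift.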


Logarithmic Shifter

• While the barrel shifter implements the whole shifter as a single array of pass transistors, the log. shifter uses a staged approach. It uses stages of multiplexers which decompose the shift into power-of-two stages.

• A shifter with a maximum shift width of M consists of log2(M) stages, where the i-th stage either shifts over 2^i or passes the data unchanged.

• The log. shifter is usually smaller than the barrel shifter. For larger values of M, it is definitely the structure of choice.

• The speed depends on the shift width in a logarithmic way, since an n-bit shifter requires log2(n) stages.

• Other shift options are frequently required, for instance shuffles, bit reversals, and interchanges.


Logarithmic Shifter (2)

• In general, a barrel shifter is appropriate for smaller shifters. For large shift values, the log. shifter becomes more effective in terms of area and speed. The log. shifter is also more regular and hence can easily be generated automatically.

[Figure: logarithmic shifter with inputs A3..A0, outputs B3..B0, and three stages controlled by Sh1, Sh2, and Sh4. Each control label appears twice in the figure, presumably the signal and its complement.]
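The staged decomposition can be sketched as follows. This is an illustrative model under stated assumptions (logical right shift, zero fill, power-of-two word width); `log_shift_right` is a hypothetical name, not something defined in the lecture.

```python
def log_shift_right(value, shift, width=8):
    """Logarithmic shifter model: stage i shifts by 2**i or passes data through.

    Assumes a logical right shift with zero fill and a power-of-two width.
    """
    stages = (width - 1).bit_length()      # log2(width) multiplexer stages
    data = value & ((1 << width) - 1)
    for i in range(stages):
        if (shift >> i) & 1:               # bit i of the shift amount drives stage i
            data >>= (1 << i)              # shift over 2**i
        # otherwise the stage passes the data unchanged
    return data


print(log_shift_right(0b10110000, 5))      # 5 = 1 + 4, so stages 0 and 2 shift
```

The binary encoding of the shift amount drives the stages directly, so no decoder is needed, unlike the one-hot control of the barrel shifter.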


Multiplexer-based shifter


Shifter design - Summary

• The design of a shifter is a trade-off between area and delay.

• Barrel shifter: fastest, but requires more transistors. Speed: O(1), area: n^2 transistors

• Logarithmic shifter: slower, but fewer transistors. Speed: O(log n), area: n log n transistors

• The barrel shifter is a wire-dominated circuit


The Multiplier

• Multiplication is a very important operation. Often the speed of multiplication limits the performance of the digital processor.

• Multiplications are used in many digital signal processing applications:
  – correlation, convolution, filtering, and frequency analysis
  – vector products, matrix multiplication
  – weighted sums required in many DSP applications such as neural networks, filtering, etc.

• Multipliers are in fact complex adder arrays.

• The analysis of the multiplier gives us some further insight into how to optimize the performance (or the area) of complex circuit topologies.


Example

• The multiplication process may be viewed as consisting of two steps:
  – Evaluation of partial products
  – Accumulation of the shifted partial products

• Partial products can be generated using an array of AND gates.

• Example: 10 x 5

    Multiplicand:          1 0 1 0    (10)
    Multiplier:        x   0 1 0 1    (5)
                       -----------
                           1 0 1 0
                         0 0 0 0
                       1 0 1 0
                     0 0 0 0
                     -------------
                     0 1 1 0 0 1 0    (50)

  4 partial products
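The two steps of the example can be reproduced in a few lines. This is a sketch for unsigned operands only; the function name `partial_products` is hypothetical.

```python
def partial_products(multiplicand, multiplier, width=4):
    """Step 1: one shifted partial product per multiplier bit (unsigned operands)."""
    rows = []
    for j in range(width):
        y_j = (multiplier >> j) & 1
        # ANDing every multiplicand bit with y_j gives the multiplicand or zero.
        row = multiplicand if y_j else 0
        rows.append(row << j)              # weight 2**j, i.e. shifted left by j
    return rows


rows = partial_products(0b1010, 0b0101)    # 10 x 5
print([format(r, "07b") for r in rows])
print(sum(rows))                           # step 2: accumulate the rows
```

For 10 x 5 the four rows are 10, 0, 40, and 0, which accumulate to 50 as in the worked example.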


The Multiplier(II)

• Multiplying two single bits is equivalent to a logical AND. Evaluation of the partial products therefore consists of the logical ANDing of the multiplicand with the relevant multiplier bit.

• Different techniques exist. The choice of technique is based on factors such as speed, throughput, numerical accuracy and area.

• An n*n multiplier has a 2n-bit output
  – Integer multiplier: takes the n LSBs
  – Floating-point multiplier (or fixed point with the binary point at the MSB), e.g. FP 1.XXX * 1.XXX: takes the n MSBs


Simple multiplier

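• Generates and adds one partial product per cycle.

• Takes n cycles.

[Figure: sequential multiplier datapath. The multiplicand and multiplier feed a partial product generation block, followed by an adder and a shift register. The result is shifted right every cycle.]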
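A cycle-by-cycle model of such a shift-and-add multiplier might look like this. It is a sketch under assumptions the slide does not spell out: unsigned operands, and a combined register whose lower half initially holds the multiplier. The name `sequential_multiply` is hypothetical.

```python
def sequential_multiply(multiplicand, multiplier, n):
    """Unsigned n-bit shift-and-add multiplier, one partial product per cycle."""
    acc = 0                 # upper half of the product register
    lower = multiplier      # lower half: multiplier bits, replaced by product bits
    cycles = 0
    for _ in range(n):
        if lower & 1:                       # partial product = multiplicand AND y_j
            acc += multiplicand
        # Shift the combined register {acc, lower} right by one position.
        lower = (lower >> 1) | ((acc & 1) << (n - 1))
        acc >>= 1
        cycles += 1
    return (acc << n) | lower, cycles


print(sequential_multiply(10, 5, 4))        # product and cycle count
```

Only one adder is needed, at the cost of n cycles per multiplication.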


Issues in designing a fast multiplier

• Reduce the number of partial products
• Use fast adder cells
• Reduce the number of additions required to sum the partial products
  – e.g. use tree adders


The Array Multiplier

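• Consider two unsigned binary numbers X and Y that are M and N bits wide, respectively:

$$X = \sum_{i=0}^{M-1} X_i 2^i, \qquad Y = \sum_{j=0}^{N-1} Y_j 2^j$$

$$X \times Y = \left(\sum_{i=0}^{M-1} X_i 2^i\right)\left(\sum_{j=0}^{N-1} Y_j 2^j\right) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} X_i Y_j 2^{i+j}$$

• The partial product terms X_i Y_j are called summands. There are M*N summands, which are generated in parallel by a set of M*N AND gates. Collecting the summands by bit weight gives the product bits P_k:

$$X \times Y = \sum_{k=0}^{M+N-1} P_k 2^k$$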


The Array Multiplier (II)

• An n*n multiplier requires n(n-2) full adders, n half adders, and n^2 AND gates. The worst-case delay is (2n+1)g, where g is the worst-case adder delay.


The Array Multiplier (III)

• The following is a basic cell used in array multiplier

[Figure: basic array multiplier cell built around an adder (+), with signal labels X, Y, B, C, CO, and PO.]


A 4*4 array multiplier

[Figure: 4*4 array multiplier with inputs X3..X0 and Y3..Y0 and product outputs Z7..Z0. The three adder rows (for Y1, Y2, Y3) contain HA FA FA HA, FA FA FA HA, and FA FA FA HA.]


The MxN Array Multiplier - Critical Path

[Figure: array of HA and FA cells with two highlighted paths, Critical Path 1 and Critical Path 2.]

$$t_{mult} = [(M-1) + (N-2)]\, t_{carry} + (N-1)\, t_{sum} + t_{and}$$
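To get a feel for how the critical path grows with operand size, the delay expression can be evaluated directly. The unit delays used below are made-up illustrative numbers, not values from the lecture, and `array_multiplier_delay` is a hypothetical helper.

```python
def array_multiplier_delay(m, n, t_carry, t_sum, t_and):
    """Critical-path delay of an M x N array multiplier, per the slide's formula."""
    return ((m - 1) + (n - 2)) * t_carry + (n - 1) * t_sum + t_and


# Hypothetical gate delays (arbitrary units), chosen only for illustration.
for size in (4, 8, 16, 32):
    print(size, array_multiplier_delay(size, size, t_carry=1.0, t_sum=1.0, t_and=1.0))
```

With equal unit delays the result grows linearly in M + N, which is the motivation for the carry-save and tree structures that follow.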


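Carry-Save Adder (old style)

• We don’t need to optimize the carry chain of each of the rows. Postpone the carry to a later stage.

[Figure: N by M carry-save (CSA) multiplier array. Adder rows as listed in the figure: HA HA HA HA; HA FA FA FA; HA FA FA FA; HA FA FA HA; followed by a vector merging stage (HA FA FA HA). Source: [Rab96] p.411]

$$\text{Delay} = N \cdot t_{carry} + t_{and} + t_{merge}$$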


Booth Encoding

• The multipliers we studied before use radix-2 multiplication, i.e. they examine one bit of the multiplier at a time.

• Higher-radix multipliers may be designed to reduce the number of adders and hence the delay required to compute the partial sums.

• Booth encoding performs two’s complement multiplication and performs several steps of the multiplication at once.

• It takes advantage of the fact that an adder/subtractor is nearly as fast and small as a simple adder.

• The most common form of Booth’s algorithm looks at three bits of the multiplier at a time to perform two stages of multiplication.


Booth Multiplier: Example

• 2^a = 2^(a+1) - 2^a, and hence we can recode each 1 in the multiplier as “+2 -1”
  – Converts a sequence of 1’s to 1 0 … 0 (-1)
  – Might reduce the number of 1’s

     0   0   1   1   1   1   1   1   0   0    multiplier
     (each 1 becomes +1 -1; adjacent -1 and +1 terms cancel)
     0  +1   0   0   0   0   0  -1   0   0    recoded multiplier

  Fewer 1’s in this sequence

[© K. Bazaragan]


Booth Recoding: Multiplication Example

                  0  0  1  1  0    (6)
         x        0  1  1  1  0    (14)
                 +1  0  0 -1  0    recoded multiplier
    ---------------------------
                  0  0  0  0  0
      1  1  1  1  1  0  1  0       (-6), with sign extension 1 1 1
            0  0  0  0  0
         0  0  0  0  0
      0  0  1  1  0
    ---------------------------
      0  0  1  0  1  0  1  0  0    (84)

Only two nonzero rows of partial sums

[© K. Bazaragan]
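The recoding used in this example can be written out as a short sketch. It assumes the usual radix-2 Booth rule (digit i = y(i-1) - y(i), with y(-1) = 0), which reproduces the digits shown on the slides; the function names are hypothetical.

```python
def booth_recode(y, width):
    """Radix-2 Booth digits of a width-bit two's complement multiplier, LSB first."""
    digits = []
    prev = 0                                # y(-1) is taken as 0
    for i in range(width):
        bit = (y >> i) & 1
        digits.append(prev - bit)           # each digit is -1, 0, or +1
        prev = bit
    return digits


def booth_multiply(x, y, width):
    """Add or subtract the shifted multiplicand for every nonzero Booth digit."""
    return sum(d * (x << i) for i, d in enumerate(booth_recode(y, width)))


print(booth_recode(0b01110, 5)[::-1])       # MSB first, as written on the slide
print(booth_multiply(6, 0b01110, 5))
```

Because the digits encode the two's complement value of the multiplier, the same routine handles negative multipliers given as a width-bit pattern.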


Booth Recoding: Advantages and Disadvantages

• Major advantage: can reduce the number of 1’s in the multiplier

• So far:
  – We did not improve the speed of the multiplier, as we still have to wait for the critical path, e.g. the shift-add delay in a sequential multiplier.
  – Booth recoding results in increased area, as we need recoding circuitry AND subtraction


Modified Booth Multiplier

• We can reduce the number of partial sums by grouping more bits
• Group pairs, leaving -2, -1, 0, 1, 2
  – Grouping reduces the number of partial products by half
• Booth recoding:
  – Gets rid of 3’s (sequences of 1’s in general)

     0   1   1   0   1   1   1   0   0   0   1   0    multiplier
    +1   0  -1  +1   0   0  -1   0   0  +1  -1   0    each 1 replaced by (+1 -1)
        +2      -1       0      -2      +1      -2    after grouping pairs

[©Hauck]


Modified Booth Encoding (II)

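• Consider the two’s complement representation of the multiplier y:

$$y = -2^{n-1} y_{n-1} + 2^{n-2} y_{n-2} + 2^{n-3} y_{n-3} + \cdots$$

• We can rewrite 2^a = 2^(a+1) - 2^a and hence

$$y = 2^{n-1}(y_{n-2} - y_{n-1}) + 2^{n-2}(y_{n-3} - y_{n-2}) + 2^{n-3}(y_{n-4} - y_{n-3}) + \cdots$$

• Look at the first two terms

$$2^{n-1}(y_{n-2} - y_{n-1}) + 2^{n-2}(y_{n-3} - y_{n-2}) = 2^{n-2}(-2 y_{n-1} + y_{n-2} + y_{n-3})$$

The factor in parentheses depends on three adjacent bits and takes only the values -2, -1, 0, 1, 2.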


Modified Booth Multiplier

• Can encode the digits by looking at three bits at a time (reduces the number of partial sums)

• Booth recoding table:
  – Must be able to add the multiplicand times -2, -1, 0, 1 and 2
  – Since Booth recoding got rid of 3’s, generating partial products is not that hard (shifting and negating)

    i+1   i   i-1    add
     0    0    0     0*M
     0    0    1     1*M
     0    1    0     1*M
     0    1    1     2*M
     1    0    0    -2*M
     1    0    1    -1*M
     1    1    0    -1*M
     1    1    1     0*M

[©Hauck]
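The recoding table translates directly into a lookup. The sketch below assumes an even word width and a two's complement multiplier passed as an integer; `BOOTH_RADIX4`, `radix4_digits`, and `modified_booth_multiply` are hypothetical names.

```python
# Recoding table from the slide: (y[i+1], y[i], y[i-1]) -> multiple of M to add.
BOOTH_RADIX4 = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def radix4_digits(y, width):
    """Radix-4 Booth digits of a width-bit multiplier (width even), LSB first."""
    assert width % 2 == 0

    def bit(k):
        return (y >> k) & 1 if k >= 0 else 0    # y[-1] is taken as 0

    return [BOOTH_RADIX4[(bit(i + 1), bit(i), bit(i - 1))]
            for i in range(0, width, 2)]


def modified_booth_multiply(m, y, width):
    """Sum the selected multiples of M; digit k carries weight 4**k."""
    return sum(d * m * 4 ** k for k, d in enumerate(radix4_digits(y, width)))


print(radix4_digits(-6, 6)[::-1])               # MSB first
print(modified_booth_multiply(13, -6, 6))
```

A 6-bit multiplier yields three digits, so only three partial products are added instead of six.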


Booth Multiplier: Example

                0 0 1 1 0 1    (13)
         x      1 1 1 0 1 0    (-6)
    recoded digits (MSB first): 0, -1, -2
    -----------------------
            1 1 1 0 0 1 1 0    -2*M
        1 1 1 1 0 0 1 1        -1*M
        0 0 0 0 0 0             0*M
    -----------------------
        1 1 1 0 1 1 0 0 1 0    (-78)

• Retire two bits per shift operation
• Addition: signed
  – Sign extend 2 bits if adding two partial products at a time

    i+1   i   i-1    add
     0    0    0     0*M
     0    0    1     1*M
     0    1    0     1*M
     0    1    1     2*M
     1    0    0    -2*M
     1    0    1    -1*M
     1    1    0    -1*M
     1    1    1     0*M

[© K. Bazaragan]


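Booth Multiplier

• The following shows the structure of a Booth multiplier

[Figure: two cascaded stages, stage j and stage j+1. Each stage has a recoding block (code) driven by three multiplier bits (yi, yi+1, yi+2 for stage j; yi+2, yi+3, yi+4 for stage j+1), a multiplexer selecting 0, x, or 2x (mux sel), an adder/subtractor, and a left shift by 2. Partial sums are labelled Pj and Pj+1.]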


Modified Booth Multiplier - Summary

• Uses a high radix to reduce the number of intermediate addition operands
  – Can go higher: radix-8, radix-16
  – Radix-8 should implement *3, *-3, *4, *-4
  – Recoding and partial product generation become more complex

• Can automatically take care of signed multiplication


Wallace-Tree Based Multiplier

• Principle
  – Sum N shifted partial products
  – Do the N-input addition efficiently
  – Reduce the N-input addition in steps
  – Use counters, e.g. a carry-save adder (CSA) (3/2 reduction)

• A CSA is simple; it is just a full adder
  – At the end of the array you need to add the two parts together.
  – This takes a fast adder, but you only need one at the end, not one for each partial product.
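The 3/2 reduction can be illustrated on whole words. This is a behavioural sketch for non-negative integers, with a simple level-by-level grouping that is one possible arrangement and not necessarily the tree drawn in the lecture; the function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 reduction: three operands in, a sum word and a carry word out.

    Each bit position is an independent full adder; no carry ripples sideways.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def reduce_to_two(operands):
    """Apply CSA levels until two operands remain; return them and the level count."""
    ops = list(operands)
    levels = 0
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3          # operands that form complete triples
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[full:])                  # leftovers pass to the next level
        ops = nxt
        levels += 1
    return ops, levels


ops, levels = reduce_to_two([10, 0, 40, 0])     # partial products of 10 x 5
print(ops, levels, sum(ops))                    # the final sum needs one fast adder
```

The last `sum(ops)` stands in for the single carry-propagate adder at the end of the array.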


Reduction by Carry-save adders

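• Example: X(2,1,0)*Y(2,1,0). Let A0 = X(0)*Y(0), A1 = X(1)*Y(0), A2 = X(2)*Y(0), etc.

            A2 A1 A0
         B2 B1 B0
      C2 C1 C0

[Figure: the partial product bits A0..A2, B0..B2, C0..C2 are reduced by CSA blocks, followed by a carry-propagate adder (CPA).]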


Carry-Save Multiplier

[Figure: carry-save multiplier array of HA and FA cells, followed by a vector merging adder.]


Wallace Tree Multiplier

• The Wallace tree multiplier uses logic tricks to speed up the required addition. It is an adder tree built from carry-save adders using 3-to-2 reduction.

    A B C   C S   No. of 1’s
    0 0 0   0 0   0
    0 0 1   0 1   1
    0 1 0   0 1   1
    0 1 1   1 0   2
    1 0 0   0 1   1
    1 0 1   1 0   2
    1 1 0   1 0   2
    1 1 1   1 1   3

A 1-bit adder provides a 3:2 compression in the number of bits. The addition of partial products in a column of an array multiplier may be thought of as totaling up the number of 1’s in that column, with a carry being passed to the next column to the left.


Wallace Tree Multiplier

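[Figure: Wallace tree multiplier block diagram. The multiplicand enters a Partial Product Generator; the partial products go to a Summation Network, which produces two 2n-bit operands; a Carry Propagate Adder produces the final 2n-bit product.]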


Wallace-Tree Multiplier

[Figure: dot diagrams over bit positions 6..0 showing (a) partial products, (b) first stage, (c) second stage, (d) final adder, with FA and HA groupings marked.]


Wallace Tree Example

[© Oxford U Press]

Delay = 4 CSA + 1 CLA

[Par00] p130


Wallace-Tree Multiplier

[Figure: 4x4 Wallace-tree multiplier. The partial product bits x_i y_j are reduced by a first stage and a second stage of FA and HA cells, and a final adder produces z7..z0.]


Wallace-Tree Based Multiplier

[Figure: two networks of four full adders (FA) each, reducing inputs y0..y5 with carry inputs Ci-1 and carry outputs Ci to the outputs C and S.]


The issues of sign extension

• When a partial product is negative, we need to do sign extension.

• If we do it just by copying the sign bit, there is an impact on the delay, since the fanout can be large.

• We can do some tricks
  – Pre-add the triangle of 1’s
  – Then clear out the 1’s by adding 1 to the row

      1 1 1 1 1 1 1 1
      1 1 1 1 1 1
      1 1 1 1
      1 1
    -----------------
      1 0 1 0 1 0 1 1

  Adding the complemented sign bit to a row of 1’s:
      1 1 1 1 1 1 1 1 + 1 = 0 0 0 0 0 0 0 0   (S=0)
      1 1 1 1 1 1 1 1 + 0 = 1 1 1 1 1 1 1 1   (S=1)


The issues of sign extension

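• Now you only need to add a few bits

[Figure: the precomputed constant 1 0 1 0 1 0 1 1 is added together with a few sign-derived bits (S) and 1’s per partial product row.]

• Adding these few bits is equivalent to complete sign extension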
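The equivalence can be checked numerically. The sketch below assumes the common formulation of the trick (complement the sign bit and add a constant string of 1's starting at the sign position); the exact bit placement in the lecture's figure may differ, and the function names are hypothetical.

```python
def sign_extend(x, n, w):
    """Extend the n-bit two's complement pattern x to w bits by copying the sign."""
    sign = (x >> (n - 1)) & 1
    ext = (((1 << (w - n)) - 1) << n) if sign else 0
    return (x | ext) & ((1 << w) - 1)


def extend_with_constant(x, n, w):
    """Same result without fanning out the sign bit.

    Complement the sign bit and add a constant with 1's from bit n-1 up to w-1.
    """
    mask = (1 << w) - 1
    flipped = x ^ (1 << (n - 1))                  # complemented sign bit
    ones = mask & ~((1 << (n - 1)) - 1)           # the row of 1's
    return (flipped + ones) & mask


# The constants of all rows can be pre-added once; for the triangle on the slide:
triangle = (0b11111111 + 0b11111100 + 0b11110000 + 0b11000000) & 0xFF
print(format(triangle, "08b"))
print(all(sign_extend(x, 4, 8) == extend_with_constant(x, 4, 8) for x in range(16)))
```

Only the complemented sign bit depends on the data, so each row contributes one extra bit instead of a high-fanout sign copy.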


Other Multiplier Structures

• Serial multiplier: very compact but very slow. An (M+N)-bit product requires Td = MN clock cycles.

• Serial/parallel multiplier: very modular, good trade-off. Td = M+N cycles.


Multipliers —Summary

• Optimization goals are different from those of the binary adder

• Once again: identify the critical path

• Other possible techniques
  - Logarithmic versus linear (Wallace tree multiplier)
  - Data encoding (Booth)
  - Pipelining

FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION


Floating-point units

• More complex operations, which take more time
• Fewer accesses
• Often designed outside the normal ALU
• Co-processor
• Floating-point representation
• Data = (-1)^sign * 0.1Fraction * 2^exp

• Normalization:
  – 1/2 <= Data < 1 (Exp = 0, Sign = 0)
  – The first digit after the binary point is one
  – No need to represent it

• IEEE standard (double precision): sign – 1 bit, exponent – 11 bits, fraction – 52 bits => total 64 bits
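The 1/11/52 field split can be inspected with Python's standard `struct` module. Note that IEEE 754 stores a biased exponent (bias 1023) and normalizes to 1.Fraction, which differs from the 0.1Fraction convention used on the slide; `decode_double` is a hypothetical helper name.

```python
import struct


def decode_double(x):
    """Split a Python float (IEEE 754 double) into sign, exponent, and fraction fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]   # raw 64-bit pattern
    sign = bits >> 63                                     # 1 bit
    exponent = (bits >> 52) & 0x7FF                       # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)                     # 52 bits, hidden leading 1
    return sign, exponent, fraction


print(decode_double(1.0))
print(decode_double(-2.5))    # -1.25 * 2**1
```

The leading 1 of the significand is never stored, which is the "no need to represent it" point made above.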


Floating Point Addition

• Align operands
  – Check exponents
  – Shift data

• Add fractional bits
  – Integer addition

• Normalization
  – Shift data
  – Increment or decrement exponents

• Rounding data
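The steps above can be traced on a toy format. This is a deliberately simplified sketch, not an IEEE-compliant adder: it assumes positive operands only, a hypothetical 8-bit mantissa `P` with value (mantissa / 2^P) * 2^exponent normalized to [1/2, 1), and rounding by truncation.

```python
P = 8  # hypothetical mantissa width in bits


def fp_add(a, b):
    """Add two positive toy floats given as (exponent, mantissa) pairs."""
    (ea, ma), (eb, mb) = a, b
    if ea < eb:                         # check exponents: keep the larger in a
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)
    mb >>= (ea - eb)                    # align: shift the smaller operand right
    m = ma + mb                         # integer addition of the fractions
    e = ea
    if m >= 1 << P:                     # normalize: overflow past the point
        m >>= 1                         # shift right (truncating = crude rounding)
        e += 1                          # and increment the exponent
    return e, m


def to_float(x):
    e, m = x
    return (m / (1 << P)) * 2.0 ** e


r = fp_add((0, 0b11000000), (-2, 0b10000000))   # 0.75 + 0.125
print(r, to_float(r))
```

Subtraction would add the decrement case (a left-shifting normalizer after cancellation), which is omitted here.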


Floating point adder

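[Figure: floating-point adder block diagram. The sign, exponent, and mantissa of operands A and B feed a sign unit, an exponent difference block, a shift/align block, a mantissa adder (+/-), a normalize/round block, and an exponent update block, producing the sign, exponent, and mantissa of result C.]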


Floating Point Multiplication

• Add exponents
  – 11-bit addition

• Multiply the mantissas
  – Integer multiplication

• Normalization
  – Shift data (at most by one)
  – Decrement exponent

• Rounding data
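The same steps on a toy format show why a single shift suffices. This sketch assumes the slide's 0.1Fraction convention, a hypothetical 8-bit mantissa `P` with value (mantissa / 2^P) * 2^exponent in [1/2, 1), and rounding by truncation; it is not an IEEE-compliant multiplier.

```python
P = 8  # hypothetical mantissa width in bits


def fp_multiply(a, b):
    """Multiply two toy floats given as (sign, exponent, mantissa) triples."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    sign = sa ^ sb                      # sign of the product is an XOR
    exp = ea + eb                       # add exponents
    prod = ma * mb                      # 2P-bit integer product, in [1/4, 1)
    if prod < 1 << (2 * P - 1):         # leading fraction bit is 0
        prod <<= 1                      # normalize: shift left by one
        exp -= 1                        # and decrement the exponent
    mant = prod >> P                    # keep the top P bits (truncation)
    return sign, exp, mant


def to_float(x):
    s, e, m = x
    return (-1) ** s * (m / (1 << P)) * 2.0 ** e


r = fp_multiply((0, 1, 0b11000000), (0, 2, 0b11000000))   # 1.5 * 3.0
print(r, to_float(r))
```

Since both fractions lie in [1/2, 1), their product lies in [1/4, 1), so at most one left shift restores normalization.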


Floating Point Multiplier

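[Figure: floating-point multiplier block diagram. The signs of A and B feed an Ex-or gate, the exponents feed an exponent adder, and the mantissas feed a mantissa multiplier, followed by a normalize/round block and an exponent update block, producing the sign, exponent, and mantissa of result C.]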


Comparator

• A = B, A > B, A < B


High speed comparator

• A single-cycle comparator based on the priority-encoding algorithm and a dynamic circuit design technique [Huang 2002]

• 4 steps:
  1. XOR gates are used to determine whether each corresponding bit of the two numbers is equal or not.
  2. A priority encoder is used to set the most significant unequal bit of the result from step 1 to ‘1’ and reset all other bits to ‘0’.
  3. The result of step 2 is “ANDed” with the two input numbers.
  4. All the bits of the results of step 3 are “ORed” together to determine which number is greater.
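The four steps map onto a few bitwise operations. This is a behavioural sketch for unsigned operands; the function name `compare_priority` and the string results are choices made for illustration, not part of the cited design.

```python
def compare_priority(a, b, width):
    """Compare two unsigned width-bit numbers with the priority-encoder scheme."""
    mask = (1 << width) - 1
    diff = (a ^ b) & mask                       # step 1: XOR marks unequal bits
    if diff == 0:
        return "="
    msb_only = 1 << (diff.bit_length() - 1)     # step 2: keep only the top unequal bit
    a_sel = a & msb_only                        # step 3: AND with both inputs
    b_sel = b & msb_only
    a_greater = a_sel != 0                      # step 4: OR-reduce each result
    b_greater = b_sel != 0
    assert a_greater != b_greater               # exactly one input has a 1 there
    return ">" if a_greater else "<"


print(compare_priority(0b1010, 0b1001, 4))
```

Whichever input holds a 1 at the most significant unequal position is the larger number.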


Dynamic Priority Encoder

Critical path: 7 transistors because of the NAND gate implementation


Wide bit width comparator – 64 bits

• Hierarchical, multi-stage
• Phase pipelining to achieve single-clock operation


New comparator not using Priority encoder

• The new algorithm uses a parallel MSB checking method instead of priority encoding to determine the location of the most significant bit in which the two inputs differ.

• This method facilitates the use of NOR-type logic gates and results in faster speed for a dynamic logic implementation.


New algorithm

• 4 steps
  1. Both AB’ and A’B are computed. Unlike the original PE algorithm, which uses XOR gates to find the bits in which A and B differ, this keeps the information of which number is larger at each bit location. E.g. AB’ = 4’b0010 indicates that at bit 1, A is larger than B.
  2. A data conversion (calculating A* and B*) is done to determine the most significant bit that is a ‘1’ in the result of step 1. Unlike the priority encoder, which sets the most significant 1-bit to ‘1’ and resets all the other bits to ‘0’, we set all the bits preceding the most significant 1-bit (not including the most significant 1-bit itself) to ‘1’ and reset all the other bits to ‘0’. By doing so, the implementation can be done using NOR-type dynamic logic.
  3. We calculate (A*)’B* and A*(B*)’. If A* has a longer run of zeros, A*(B*)’ will be all zero and (A*)’B* will have some bits equal to 1, and vice versa.
  4. We check whether the result of step 3 is an all-zero vector or not by ORing all the bits together. A corresponding zero vector means that the other input is the greater one.
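One way to read these four steps is sketched below. The interpretation is an assumption: "preceding bits" is taken to mean the more significant bits, an all-zero step-1 vector is converted to all ones, and A is declared greater when (A*)'B* is nonzero. The function names are hypothetical.

```python
def leading_mask(x, width):
    """1's in every position above the most significant 1 of x (all 1's if x == 0)."""
    full = (1 << width) - 1
    return full & ~((1 << x.bit_length()) - 1)


def compare_msb_check(a, b, width):
    """Compare unsigned width-bit numbers without a priority encoder."""
    full = (1 << width) - 1
    a_wins = a & ~b & full                  # step 1: AB'  (A has 1, B has 0)
    b_wins = ~a & b & full                  #         A'B
    a_star = leading_mask(a_wins, width)    # step 2: data conversion
    b_star = leading_mask(b_wins, width)
    a_test = ~a_star & b_star & full        # step 3: (A*)'B*
    b_test = a_star & ~b_star & full        #         A*(B*)'
    if a_test:                              # step 4: OR-reduce each vector
        return ">"
    if b_test:
        return "<"
    return "="


print(compare_msb_check(0b0110, 0b0101, 4))
```

The input whose step-1 vector has its top 1 at the higher position gets the shorter run of 1's after conversion, which is what the two AND terms detect.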


Implementation