Chapter 3, Appendix B

ALU for Computers (MIPS)

• design a fast ALU for the MIPS ISA

• requirements

– support the arithmetic/logic operations: add, addi, addiu, sub, subu, and, or, andi, ori, xor, xori, slt, slti, sltu, sltiu

• design a multiplier

• design a divider

Review Digital Logic

Gates:

Combinational Logic

PLA: AND array, OR array

A D latch implemented with NOR gates.

A D flip-flop with a falling-edge trigger.

Value of D is sampled on positive clock edge.

Q outputs sampled value for rest of cycle.

module ff(D, Q, CLK);
  input D, CLK;
  output Q;
  reg Q;
  always @(posedge CLK) Q <= D;
endmodule

Correct ?

module ff(D, Q, CLK);
  input D, CLK;
  output Q;
  always @(CLK) Q <= D;
endmodule

Module code has two bugs.

Where?

Review: Edge-Triggering in Verilog

If Change == 1 on a positive CLK edge, the traffic light changes.

If Rst == 1 on a positive CLK edge, R Y G = 1 0 0.

Inputs: CLK, Change, Rst

[Figure: state diagram with three states, R Y G = 1 0 0, R Y G = 0 0 1 (green), and R Y G = 0 1 0 (yellow). Change == 1 moves to the next state; Rst == 1 returns to R Y G = 1 0 0.]

“One-Hot Encoding”

[Figure: three D flip-flops, one each for R, G, and Y, fed by the next-state combinational logic, which takes Change and Rst as inputs.]

wire next_R, next_Y, next_G;
output R, Y, G;

State Elements: Traffic Light Controller

ff ff_R(.D(next_R), .Q(R), .CLK(CLK));
ff ff_Y(.D(next_Y), .Q(Y), .CLK(CLK));
ff ff_G(.D(next_G), .Q(G), .CLK(CLK));

Next State Logic: Traffic Light Controller

wire next_R, next_Y, next_G;

assign next_R = rst ? 1'b1 : (change ? Y : R);
assign next_Y = rst ? 1'b0 : (change ? G : Y);
assign next_G = rst ? 1'b0 : (change ? R : G);

ff ff_R(.D(next_R), .Q(R), .CLK(CLK));
ff ff_Y(.D(next_Y), .Q(Y), .CLK(CLK));
ff ff_G(.D(next_G), .Q(G), .CLK(CLK));
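The fragments above can be assembled into one simulable design. The following is a minimal sketch rather than the original slide code: the wrapper name traffic_light, its port list, and the capitalized input names Change and Rst (chosen to match the diagram labels) are assumptions. The flip-flops are connected by port name so the port order of ff cannot be mixed up.

```verilog
// D flip-flop from the review slide: Q takes the value of D on each rising CLK edge.
module ff(D, Q, CLK);
  input D, CLK;
  output Q;
  reg Q;
  always @(posedge CLK) Q <= D;
endmodule

// One-hot traffic light controller (hypothetical wrapper; name and ports are assumptions).
module traffic_light(CLK, Change, Rst, R, Y, G);
  input CLK, Change, Rst;
  output R, Y, G;
  wire next_R, next_Y, next_G;

  // Next-state logic: Rst forces R Y G = 1 0 0; Change rotates R -> G -> Y -> R.
  assign next_R = Rst ? 1'b1 : (Change ? Y : R);
  assign next_Y = Rst ? 1'b0 : (Change ? G : Y);
  assign next_G = Rst ? 1'b0 : (Change ? R : G);

  // State elements, one flip-flop per light.
  ff ff_R(.D(next_R), .Q(R), .CLK(CLK));
  ff ff_Y(.D(next_Y), .Q(Y), .CLK(CLK));
  ff ff_G(.D(next_G), .Q(G), .CLK(CLK));
endmodule
```

In simulation the flip-flops start out unknown (x), so Rst should be held at 1 across at least one rising clock edge before Change is used.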

Logic Diagram: Traffic Light Controller

[Figure: logic diagram of the controller, with three D flip-flops holding R, G, and Y and the state diagram repeated for reference.]

ALU for MIPS ISA

• design a 1-bit ALU using AND gate, OR gate, a full adder, and a mux

ALU for MIPS ISA

• design a 32-bit ALU by cascading 32 1-bit ALUs

ALU for MIPS

• a 1-bit ALU performing AND, OR, addition and subtraction

If we set Binvert = Carryin = 1, then we can perform a - b
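The 1-bit slice described above can be sketched as follows. This is an illustrative model, not the slide's figure: the port names follow the slide labels (Binvert, CarryIn), while the module name alu1 and the 2-bit Operation encoding (00 AND, 01 OR, 10 add) are assumptions.

```verilog
// 1-bit ALU slice built from an AND gate, an OR gate, a full adder, and a mux.
// Operation encoding (assumed): 00 = AND, 01 = OR, 10 = adder output.
module alu1(a, b, Binvert, CarryIn, Operation, Result, CarryOut);
  input a, b, Binvert, CarryIn;
  input [1:0] Operation;
  output Result, CarryOut;
  reg Result;

  wire b_sel = b ^ Binvert;              // b, or its complement when Binvert = 1
  wire sum   = a ^ b_sel ^ CarryIn;      // full-adder sum bit
  assign CarryOut = (a & b_sel) | (a & CarryIn) | (b_sel & CarryIn);

  // Output multiplexer selecting among the AND, OR, and adder results.
  always @* begin
    case (Operation)
      2'b00:   Result = a & b_sel;
      2'b01:   Result = a | b_sel;
      2'b10:   Result = sum;
      default: Result = 1'b0;
    endcase
  end
endmodule
```

For a - b, the least significant slice gets Binvert = 1, CarryIn = 1, and Operation = 10, which adds a to the two's complement of b.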

ALU for MIPS

• include a “less” input for set-on-less-than (slt)

ALU for MIPS

• design the most significant bit ALU

• the most significant bit needs to do more work (it detects overflow, and its result is used for slt)

• how to detect an overflow: overflow = carryin{MSB} xor carryout{MSB}

overflow = 1 ; means overflow

overflow = 0 ; means no overflow

• set-on-less-than

slt $1, $2, $3 ; if $2 < $3 then $1 = 1, else $1 = 0

; if the MSB of $2 - $3 is 1, then $1 = 1

; in 2’s complement the MSB of a negative number is 1

ALU for MIPS

• a 1-bit ALU for the MSB

Overflow=Carryin XOR Carryout
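A small behavioral sketch of what the MSB slice computes during a subtraction is given below. The module name slt_overflow and its ports are assumptions for illustration. One refinement goes beyond the slides: the set output is the sign bit XORed with Overflow, because the sign bit alone gives the wrong answer when a - b overflows.

```verilog
// Computes a - b as a + ~b + 1 and derives Overflow and Set the way the MSB slice would.
// Names are illustrative assumptions, not taken from the slides.
module slt_overflow(a, b, diff, Overflow, Set);
  input  [31:0] a, b;
  output [31:0] diff;
  output Overflow, Set;

  wire [31:0] nb = ~b;
  wire [31:0] low;            // low[31] is the carry into bit 31; low[30:0] = diff[30:0]
  wire cin_msb, cout_msb;

  assign low = {1'b0, a[30:0]} + {1'b0, nb[30:0]} + 32'd1;
  assign cin_msb = low[31];
  assign {cout_msb, diff[31]} = a[31] + nb[31] + cin_msb;
  assign diff[30:0] = low[30:0];

  assign Overflow = cin_msb ^ cout_msb;   // carry into MSB differs from carry out of MSB
  assign Set = diff[31] ^ Overflow;       // sign of the true (non-overflowed) difference
endmodule
```

For example, with a = 0x7FFFFFFF and b = 0xFFFFFFFF (that is, -1), the subtraction overflows and diff[31] is 1, but Set is 0, which matches the fact that a is not less than b.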

A 32-bit ALU constructed from 32 1-bit ALUs.

A 32-bit ALU with zero detector.

A Verilog behavioral definition of a MIPS ALU.
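The listing that this caption refers to did not survive in the transcript. As a stand-in, here is a minimal behavioral sketch of such an ALU. The module name mips_alu, the port names, and the 4-bit control encoding (0000 AND, 0001 OR, 0010 add, 0110 subtract, 0111 slt, 1100 NOR) are assumptions; the encoding follows a commonly used textbook convention and is not confirmed by the slides.

```verilog
// Behavioral MIPS-style ALU with a zero detector (sketch; encoding is an assumption).
module mips_alu(ALUctl, A, B, ALUOut, Zero);
  input  [3:0]  ALUctl;
  input  [31:0] A, B;
  output [31:0] ALUOut;
  output        Zero;
  reg    [31:0] ALUOut;

  assign Zero = (ALUOut == 32'd0);      // zero detector: 1 when the result is all zeros

  always @* begin
    case (ALUctl)
      4'b0000: ALUOut = A & B;
      4'b0001: ALUOut = A | B;
      4'b0010: ALUOut = A + B;
      4'b0110: ALUOut = A - B;
      4'b0111: ALUOut = ($signed(A) < $signed(B)) ? 32'd1 : 32'd0;  // slt (signed compare)
      4'b1100: ALUOut = ~(A | B);
      default: ALUOut = 32'd0;
    endcase
  end
endmodule
```

A branch-on-equal can use this ALU by selecting subtract and testing Zero.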

ALU for MIPS

• Critical path of 32-bit ripple carry adder is 32 x carry propagation delay

• How to solve this problem

– design trick: use more hardware

– design trick: look ahead, peek

– carry lookahead adder (CLA)

• CLA

a  b  cout
0  0  0     nothing happens
0  1  cin   propagate cin
1  0  cin   propagate cin
1  1  1     generate

propagate = a + b; generate = ab (here + means OR and ab means a AND b)

ALU for MIPS

• CLA using 4-bit as an example

• two 4-bit numbers: a3a2a1a0, b3b2b1b0

• p0 = a0 + b0; g0 = a0b0

c1 = g0 + p0c0

c2 = g1 + p1c1

c3 = g2 + p2c2

c4 = g3 + p3c3

• larger CLA adders can be constructed by cascading 4-bit CLA adders

• other adders: carry select adder, carry skip adder
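The recursive carry equations above still ripple if implemented literally; the lookahead comes from substituting each carry into the next so that every carry depends only on g, p, and c0. A minimal 4-bit sketch follows; the module name cla4 and the port names are assumptions.

```verilog
// 4-bit carry lookahead adder: every carry is a two-level function of g, p, and c0.
module cla4(a, b, c0, sum, c4);
  input  [3:0] a, b;
  input        c0;
  output [3:0] sum;
  output       c4;

  wire [3:0] p = a | b;    // propagate, as defined on the slide
  wire [3:0] g = a & b;    // generate

  // c1 = g0 + p0c0, and each later carry has the earlier one substituted in.
  wire c1 = g[0] | (p[0] & c0);
  wire c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
  wire c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0);
  assign c4 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
            | (p[3] & p[2] & p[1] & p[0] & c0);

  assign sum = a ^ b ^ {c3, c2, c1, c0};   // sum bit i = a_i xor b_i xor c_i
endmodule
```

The cost is visible in the widening product terms: c4 already needs a 5-input AND, which is why larger adders cascade or group 4-bit blocks.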

Design Process

• Divide and Conquer

– using simple components

– glue simple components together

– work on the things you know how to do. The unknown will become obvious as you make progress

• Successive Refinement

– multiplier design

– divider design

Multiplier

• paper and pencil method

multiplicand      0110
multiplier      x 1001
                  ----
                  0110
                 0000
                0000
               0110
               -------
product        0110110

n bits x m bits = m+n bits

binary: a multiplier bit of 0 places 0; a multiplier bit of 1 places a copy of the multiplicand

Multiply Hardware Version 1

[Figure: 64-bit multiplicand register (shift left), 64-bit ALU, 64-bit product register (write), 32-bit multiplier register (shift right), control.]

32 bits x 32 bits, using a 64-bit multiplicand register, 64-bit ALU, 64-bit product register, and 32-bit multiplier register

Check the rightmost bit of the multiplier to decide whether to add 0 or the multiplicand.

Control provides four control signals.

Multiply Algorithm Version 1

1. test multiplier0 (i.e., bit 0 of the multiplier)

1.a if multiplier0 = 1, add the multiplicand to the product and place the result in the product register

2. shift the multiplicand left 1 bit

3. shift the multiplier right 1 bit

4. 32nd repetition? if yes, done; if no, go to 1.

Multiply Algorithm Version 1 Example

iter.  step     multiplier  multiplicand  product
0      initial  0101        0000 0010     0000 0000
1      1.a      0101        0000 0010     0000 0010
       2        0101        0000 0100     0000 0010
       3        0010        0000 0100     0000 0010
2      2        0010        0000 1000     0000 0010
       3        0001        0000 1000     0000 0010
3      1.a      0001        0000 1000     0000 1010
       2        0001        0001 0000     0000 1010
       3        0000        0001 0000     0000 1010
4      2        0000        0010 0000     0000 1010
       3        0000        0010 0000     0000 1010

0010 x 0101 = 0000 1010
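The version 1 steps can be modeled behaviorally with a loop standing in for the control unit. This is a simulation-oriented sketch, not the slide's datapath: the module name mult_v1, the parameter N, and the port names are assumptions, and all N iterations are unrolled combinationally instead of taking one clock each.

```verilog
// Multiply version 1: 2N-bit multiplicand shifts left, N-bit multiplier shifts right.
module mult_v1 #(parameter N = 4) (
  input  [N-1:0]       multiplicand_in,
  input  [N-1:0]       multiplier_in,
  output reg [2*N-1:0] product
);
  reg [2*N-1:0] multiplicand;   // starts in the right half, left half zero
  reg [N-1:0]   multiplier;
  integer i;

  always @* begin
    multiplicand = {{N{1'b0}}, multiplicand_in};
    multiplier   = multiplier_in;
    product      = 0;
    for (i = 0; i < N; i = i + 1) begin
      if (multiplier[0])                        // step 1 / 1.a
        product = product + multiplicand;
      multiplicand = multiplicand << 1;         // step 2
      multiplier   = multiplier >> 1;           // step 3
    end
  end
endmodule
```

With N = 4, multiplicand_in = 0010 and multiplier_in = 0101, the loop passes through the same register values as the table and ends with product = 0000 1010.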

Multiplier Algorithm Version 1

• observations from version 1

• half of the bits in the multiplicand register are always 0

• a 64-bit adder is wasteful (for 32 bits x 32 bits)

• 0s are inserted into the multiplicand as it shifts left, so the least significant bits of the product do not change once formed

• 3 steps per bit

• shift the product right instead of shifting the multiplicand left? (by adding to the left half of the product register)

[Figure: 32-bit multiplicand register, 32-bit ALU, 64-bit product register (shift right, write), 32-bit multiplier register (shift right), control.]

32-bit multiplicand register, 32-bit ALU, 64-bit product register, 32-bit multiplier register

Check the rightmost bit of the multiplier to decide whether to add 0 or the multiplicand.

Write into the left half of the product register.

1. test multiplier0 (i.e., bit 0 of the multiplier)

1a. if multiplier0 = 1, add the multiplicand to the left half of the product and place the result in the left half of the product register

2. shift the product register right 1 bit

3. shift the multiplier register right 1 bit

4. 32nd repetition? if yes, done; if no, go to 1.

iter.  step     multiplier  multiplicand  product
0      initial  0011        0010          0000 0000
1      1.a      0011        0010          0010 0000
       2        0011        0010          0001 0000
       3        0001        0010          0001 0000
2      1.a      0001        0010          0011 0000
       2        0001        0010          0001 1000
       3        0000        0010          0001 1000
3      2        0000        0010          0000 1100
       3        0000        0010          0000 1100
4      2        0000        0010          0000 0110
       3        0000        0010          0000 0110

Multiply Version 2

• Observations

– the product register wastes space that exactly matches the size of the multiplier

– 3 steps per bit

– combine the multiplier register and the product register

• 32-bit multiplicand register, 32-bit ALU, 64-bit product register; the multiplier register is part of the product register

[Figure: multiplicand register, 32-bit ALU, product register holding the multiplier in its right half (shift right, write into left half), control.]

1. test product0 (the multiplier is in the right half of the product register)

1a. if product0 = 1, add the multiplicand to the left half of the product and place the result in the left half of the product register

2. shift the product register right 1 bit

3. 32nd repetition? if yes, done; if no, go to 1.

iter.  step     multiplicand  product
0      initial  1110          0000 1011
1      1.a      1110          1110 1011
       2        1110          0111 0101
2      1.a      1110          1 0101 0101
       2        1110          1010 1010
3      2        1110          0101 0101
4      1.a      1110          1 0011 0101
       2        1110          1001 1010

1110 x 1011 = 1001 1010 (14 x 11 = 154)

need to save the carry (the leading 1 in the 1.a rows of iterations 2 and 4)
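The combined product/multiplier register, including the saved carry, can be sketched as below. As before this is a behavioral model under assumptions: the name mult_v3, the parameter N, and the ports are illustrative, and the loop replaces the control unit.

```verilog
// Multiply version 3: the multiplier starts in the right half of the product register.
module mult_v3 #(parameter N = 4) (
  input  [N-1:0]       multiplicand,
  input  [N-1:0]       multiplier,
  output reg [2*N-1:0] product
);
  reg [N:0] sum;     // left half of the product plus the carry that must be saved
  integer i;

  always @* begin
    product = {{N{1'b0}}, multiplier};
    for (i = 0; i < N; i = i + 1) begin
      sum = {1'b0, product[2*N-1:N]};
      if (product[0])                              // step 1 / 1a
        sum = sum + {1'b0, multiplicand};
      product = {sum, product[N-1:1]};             // step 2: shift right, carry enters at the top
    end
  end
endmodule
```

With N = 4, multiplicand = 1110 and multiplier = 1011 this ends at 1001 1010, and the intermediate values of sum in iterations 2 and 4 are the 5-bit values 10101 and 10011 from the table.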

• Observations

• 2 steps per bit: with the multiplier and product in one register, a single 1-bit right shift replaces the two shifts of version 1 and version 2

• MIPS registers Hi and Lo correspond to the left and right halves of the product

• MIPS has the instruction multu

• How about signed numbers in multiplication?

– method 1: remember the signs of both numbers, multiply the magnitudes, and after the 32 repetitions give the product the appropriate sign

– method 2: Booth’s algorithm

– Booth’s algorithm is more elegant for signed multiplication

– Booth’s algorithm uses the same hardware as version 3

Booth’s Algorithm

• Motivation for Booth’s Algorithm is speed

example: 2 x 6 = 0010 x 0110

normal approach      Booth’s approach
    0010                 0010
  x 0110               x 0110

Booth’s approach: replace a string of 1s in the multiplier by two actions

action 1: beginning of a string of 1s, subtract the multiplicand

action 2: end of a string of 1s, add the multiplicand

Booth’s Algorithm

end of run | middle of run | beginning of run

011111111111111111110

current bit  bit to the right (previous bit)  explanation               action
1            0                                beginning of a run of 1s  subtract multiplicand from left half of product
1            1                                middle of a run of 1s     no arithmetic operation
0            1                                end of a run of 1s        add multiplicand to left half of product
0            0                                middle of a run of 0s     no arithmetic operation

Booth’s Algorithm Example

iteration  step         multiplicand  product    previous bit
0          initial      1110          0000 0111  0
1          sub.         1110          0010 0111  0
           shift right  1110          0001 0011  1
2          shift right  1110          0000 1001  1
3          shift right  1110          0000 0100  1
4          add          1110          1110 0100  1
           shift right  1110          1111 0010  0

-2 x 7 = -14; in signed binary, 1110 x 0111 = 1111 0010

To begin with, we put the multiplier in the right half of the product register.
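The table-driven procedure above can be sketched behaviorally as follows. The module name booth, the parameter N, and the internal register names are assumptions. The sketch keeps the left half at N bits like the example, so it does not handle the most negative multiplicand (1000 for N = 4), whose negation does not fit.

```verilog
// Booth's algorithm on a combined {left, right, prev} register (behavioral sketch).
module booth #(parameter N = 4) (
  input  signed [N-1:0]       multiplicand,
  input  signed [N-1:0]       multiplier,
  output reg signed [2*N-1:0] product
);
  reg [N-1:0] left;     // left half of the product register
  reg [N-1:0] right;    // right half, initially the multiplier
  reg         prev;     // the extra "previous bit" to the right of bit 0
  integer i;

  always @* begin
    left = 0; right = multiplier; prev = 1'b0;
    for (i = 0; i < N; i = i + 1) begin
      case ({right[0], prev})
        2'b10:   left = left - multiplicand;   // beginning of a run of 1s
        2'b01:   left = left + multiplicand;   // end of a run of 1s
        default: ;                             // middle of a run: no arithmetic
      endcase
      // Arithmetic shift right of {left, right, prev}: the sign bit of left is replicated.
      prev  = right[0];
      right = {left[0], right[N-1:1]};
      left  = {left[N-1], left[N-1:1]};
    end
    product = {left, right};
  end
endmodule
```

With multiplicand = 1110 (-2) and multiplier = 0111 (7) the register passes through the same values as the example and ends at 1111 0010 (-14).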

Divide Algorithm

Paper and pencil

[Figure: long-division layout with the quotient on top, the divisor to the left of the dividend, and the remainder (modulo) at the bottom.]

Divide Hardware Version 1

• 64-bit divisor reg., 64-bit ALU, 32-bit quotient reg. 64-bit remainder register

[Figure: 64-bit divisor register (shift right), 64-bit ALU, 64-bit remainder register, 32-bit quotient register (shift left), control.]

put the dividend in the remainder register initially

Divide Algorithm Version 1

start: place the dividend in the remainder register

1. subtract the divisor from the remainder and place the result in the remainder

2. test the remainder

2a. if remainder >= 0, shift the quotient left, setting the new rightmost bit to 1

2b. if remainder < 0, restore the original value by adding the divisor to the remainder and placing the sum in the remainder; shift the quotient left, setting the new least significant bit to 0

3. shift the divisor right 1 bit

4. n+1 repetitions? if yes, done; if no, go to 1.

Divide Algorithm Version 1 Example

iter.  step     quotient  divisor    remainder
0      initial  0000      0010 0000  0000 0111
1      1        0000      0010 0000  1110 0111
       2b       0000      0010 0000  0000 0111
       3        0000      0001 0000  0000 0111
2      1        0000      0001 0000  1111 0111
       2b       0000      0001 0000  0000 0111
       3        0000      0000 1000  0000 0111
3      1        0000      0000 1000  1111 1111
       2b       0000      0000 1000  0000 0111
       3        0000      0000 0100  0000 0111
4      1        0000      0000 0100  0000 0011
       2a       0001      0000 0100  0000 0011
       3        0001      0000 0010  0000 0011
5      1        0001      0000 0010  0000 0001
       2a       0011      0000 0010  0000 0001
       3        0011      0000 0001  0000 0001
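A behavioral sketch of version 1 follows. The module name div_v1, the parameter N, and the port names are assumptions. One deliberate difference from the 8-bit registers in the example: the remainder and divisor registers carry one extra sign bit so that a negative result is detected correctly even when the divisor has its top bit set. Division by zero is not handled.

```verilog
// Divide version 1 (restoring): the divisor starts in the left half and shifts right.
module div_v1 #(parameter N = 4) (
  input  [N-1:0]     dividend,
  input  [N-1:0]     divisor_in,
  output reg [N-1:0] quotient,
  output reg [N-1:0] rem
);
  reg signed [2*N:0] remainder;   // 2N bits plus one sign bit (assumption, see lead-in)
  reg signed [2*N:0] divisor;
  integer i;

  always @* begin
    remainder = {{(N+1){1'b0}}, dividend};
    divisor   = {1'b0, divisor_in, {N{1'b0}}};
    quotient  = 0;
    for (i = 0; i < N + 1; i = i + 1) begin          // n+1 repetitions
      remainder = remainder - divisor;               // step 1
      if (remainder >= 0)                            // step 2a
        quotient = {quotient[N-2:0], 1'b1};
      else begin                                     // step 2b: restore
        remainder = remainder + divisor;
        quotient  = {quotient[N-2:0], 1'b0};
      end
      divisor = divisor >> 1;                        // step 3
    end
    rem = remainder[N-1:0];
  end
endmodule
```

With N = 4, dividend = 0111 and divisor_in = 0010 the result is quotient = 0011 and rem = 0001, as in the example.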

Divide Algorithm Version 1

Observations

– half of the bits in the divisor register are always 0

– half of the divisor register is wasted

– half of the 64-bit ALU is wasted

Possible improvement

– instead of shifting the divisor right, shift the remainder left?

– the first step cannot produce a 1 in the quotient, so switch the order to shift first and then subtract. This saves one iteration

32-bit divisor register, 32-bit ALU, 32-bit quotient register, 64-bit remainder register

[Figure: divisor register, 32-bit ALU, remainder register (shift left), quotient register (shift left), control.]

start: place the dividend in the remainder register

1. shift the remainder register left 1 bit

2. subtract the divisor from the left half of the remainder and place the result in the left half of the remainder

3. test the remainder

3a. if remainder >= 0, shift the quotient left, setting the new rightmost bit to 1

3b. if remainder < 0, restore the original value by adding the divisor to the left half of the remainder and placing the sum in the left half of the remainder; also shift the quotient left, setting the new least significant bit to 0

4. n repetitions? if yes, done; if no, go to 1.

Divide Algorithm Version 2 Example

iter.  step     quotient  divisor  remainder
0      initial  0000      0011     0000 1111
1      1        0000      0011     0001 1110
       2        0000      0011     1110 1110
       3b       0000      0011     0001 1110
2      1        0000      0011     0011 1100
       2        0000      0011     0000 1100
       3a       0001      0011     0000 1100
3      1        0001      0011     0001 1000
       2        0001      0011     1110 1000
       3b       0010      0011     0001 1000
4      1        0010      0011     0011 0000
       2        0010      0011     0000 0000
       3a       0101      0011     0000 0000

• Observations

– 3 steps (shift remainder left, subtract, shift quotient left)

• Further improvement (version 3)

– eliminate the quotient register by combining it with the remainder register, which is shifted left anyway

– the loop then contains only two steps, because one shift of the remainder register moves the remainder in the left half and the quotient in the right half at the same time

– a consequence of combining the two registers is that the remainder is shifted left one time too many in the last iteration

– final correction step: shift the remainder back in the left half of the remainder register (i.e., shift only the left half right 1 bit)

32-bit divisor register, 32-bit ALU, 64-bit remainder register, no separate quotient register (the quotient bits shift into the remainder register as it shifts left)

[Figure: 32-bit divisor register, 32-bit ALU, 64-bit combined remainder/quotient register (shift left), control.]

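start: place the dividend in the remainder register

1. shift the remainder register left 1 bit

2. subtract the divisor from the left half of the remainder and place the result in the left half of the remainder

3. test the remainder

3a. if remainder >= 0, shift the remainder register left, setting the new rightmost bit to 1

3b. if remainder < 0, restore the original value by adding the divisor to the left half of the remainder and placing the sum in the left half of the remainder; also shift the remainder register left, setting the new least significant bit to 0

4. n repetitions? if yes, done (then shift the left half of the remainder right 1 bit); if no, go to 2.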

Divide Algorithm Version 3 Example

iter.  step        divisor  remainder
0      initial     0101     0000 1110
       1           0101     0001 1100
1      2           0101     1100 1100
       3b          0101     0011 1000
2      2           0101     1110 1000
       3b          0101     0111 0000
3      2           0101     0010 0000
       3a          0101     0100 0001
4      2           0101     1111 0001
       3b          0101     1000 0010
       correction  0101     0100 0010

correction step: shift the left half of the remainder register right 1 bit; the right half holds the quotient (remainder = 0100, quotient = 0010)
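Version 3 can be sketched with a single combined register. The names div_v3, N, r, and diff are assumptions. The register is modeled one bit wider than 2N so that the final left shift cannot lose the top bit of the remainder; the correction step then simply reads the remainder from the top N bits. Division by zero is not handled.

```verilog
// Divide version 3 (restoring): quotient bits shift into the right half of the remainder register.
module div_v3 #(parameter N = 4) (
  input  [N-1:0]     dividend,
  input  [N-1:0]     divisor,
  output reg [N-1:0] quotient,
  output reg [N-1:0] rem
);
  reg [2*N:0] r;       // {extra top bit, left half, right half}
  reg [N:0]   diff;    // left half minus divisor; diff[N] = 1 means the result is negative
  integer i;

  always @* begin
    r = {{(N+1){1'b0}}, dividend};
    r = r << 1;                                    // step 1: initial shift left
    for (i = 0; i < N; i = i + 1) begin
      diff = r[2*N:N] - {1'b0, divisor};           // step 2
      if (!diff[N])
        r = {diff[N-1:0], r[N-1:0], 1'b1};         // 3a: keep difference, shift in a 1
      else
        r = {r[2*N-1:N], r[N-1:0], 1'b0};          // 3b: keep old value (restore), shift in a 0
    end
    rem      = r[2*N:N+1];                         // correction: left half shifted right 1 bit
    quotient = r[N-1:0];
  end
endmodule
```

With N = 4, dividend = 1110 and divisor = 0101 the low eight bits of r follow the table (0001 1100, 0011 1000, 0111 0000, 0100 0001, 1000 0010), giving quotient = 0010 and rem = 0100.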

• Observations

– same hardware as multiply: needs a 32-bit ALU to add and subtract and a 64-bit register that shifts left and right

– divide algorithm version 3 is called the restoring division algorithm for unsigned numbers

• Signed numbers divide

– simplest method

» remember the signs of the dividend and divisor, make both positive, and finally complement the quotient and remainder as necessary

» dividend and remainder must have the same sign

» quotient is negative if the dividend sign and divisor sign disagree

– SRT (named after three persons) method

» an efficient algorithm

Floating Point Numbers

• What can be represented in N bits?

unsigned          $0$ to $2^{N} - 1$
2’s complement    $-2^{N-1}$ to $2^{N-1} - 1$
1’s complement    $-2^{N-1} + 1$ to $2^{N-1} - 1$
BCD               $0$ to $10^{N/4} - 1$

How about

very small numbers, very large numbers

rationals, such as 2/3; irrationals, such as $\sqrt{2}$;

transcendentals, such as $e$, $\pi$.

• Mantissa (aka Significand), Exponent (using radix of 10)

$6.12 \times 10^{23}$

IEEE standard F.P.: $1.M \times 2^{E-127}$

mantissa = sign + magnitude; magnitude is normalized with hidden integer bit: 1.M

exponent = E - 127 (excess 127), 0 < E < 255

a FP number $N = (-1)^{S} \times 2^{E-127} \times (1.M)$

0 = 0 00000000 00000000000000000000000

-1.5 = 1 01111111 10000000000000000000000

single precision: S (1 bit), E (8 bits), M (23 bits)

• Single Precision FP numbers

-0.75 = __________________________________

-5.0 = ___________________________________

7 = ____________________________________

-0.75 = $-0.11_2 = -1.1_2 \times 2^{-1}$, E = 126: 1 01111110 10000000000000000000000

-5.0 = $-101.0_2 = -1.01_2 \times 2^{2}$, E = 129: 1 10000001 01000000000000000000000

7 = $111_2 = 1.11_2 \times 2^{2}$, E = 129: 0 10000001 11000000000000000000000

• Single precision FP number

What is the smallest (normalized) number in magnitude?

$1.0 \times 2^{-126}$

What is the largest number in magnitude?

$(1.11111111111111111111111)_2 \times 2^{127} = (2 - 2^{-23}) \times 2^{127}$

single precision FP numbers

Exponent   Significand  Object represented
0          nonzero      denormalized numbers
1 to 254   anything     floating point numbers
255        0            infinity
255        nonzero      NaN (Not a Number)
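The field layout and the table above translate directly into a few comparisons. The following sketch splits a 32-bit word into S, E, and M and classifies it; the module name fp32_fields and the output names are assumptions, and the is_zero case (E = 0, M = 0) comes from the encoding of 0 shown earlier, not from the table.

```verilog
// Unpack a single precision word and classify it by its exponent and significand fields.
module fp32_fields(x, S, E, M, is_zero, is_denorm, is_normal, is_inf, is_nan);
  input  [31:0] x;
  output        S;
  output [7:0]  E;
  output [22:0] M;
  output        is_zero, is_denorm, is_normal, is_inf, is_nan;

  assign S = x[31];       // sign, 1 bit
  assign E = x[30:23];    // biased exponent, 8 bits (excess 127)
  assign M = x[22:0];     // fraction, 23 bits (hidden leading 1 for normal numbers)

  assign is_zero   = (E == 8'd0)   && (M == 23'd0);
  assign is_denorm = (E == 8'd0)   && (M != 23'd0);
  assign is_normal = (E != 8'd0)   && (E != 8'd255);
  assign is_inf    = (E == 8'd255) && (M == 23'd0);
  assign is_nan    = (E == 8'd255) && (M != 23'd0);
endmodule
```

For x = 32'hBFC00000, the bit pattern given for -1.5, the outputs are S = 1, E = 127, M = 23'h400000, and is_normal = 1.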

other topics in FP numbers

1. extra bits for rounding

2. guard bit, sticky bit

3. algorithms for FP numbers

• Double precision

– 64 bits total

» 52-bit significand

» 11-bit exponent (excess 1023 bias)

– Number is: $(-1)^{S} \times (1.M) \times 2^{E-1023}$

Basic Addition Algorithm

• Steps for Y + X, assuming Y >= X

1. Align binary points (denormalize the smaller number)

a. compute Diff = Exp(Y) - Exp(X); Exp = Exp(Y)

b. Sig(X) = Sig(X) >> Diff

2. Add the aligned components

Sig = Sig(X) + Sig(Y)

3. Normalize the sum

a. shift Sig right/left until the leading bit is 1, incrementing or decrementing Exp accordingly

b. check for overflow in Exp

c. round

d. repeat step 3 if the result is no longer normalized

Addition Example

• 4-bit significand: $1.0110 \times 2^{3} + 1.1000 \times 2^{2}$

• align binary points (denormalize the smaller number)

$1.0110 \times 2^{3}$

$0.1100 \times 2^{3}$

• add the aligned components: $10.0010 \times 2^{3}$

• normalize the sum: $1.0001 \times 2^{4}$

No overflow, no rounding
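The align, add, and normalize steps can be sketched for this toy format. Everything about the interface is an assumption for illustration: the module name fp_add_toy, 5-bit significands in 1.xxxx form, small unbiased exponents, positive operands only, and exp_y >= exp_x. The sketch truncates instead of rounding and does not check for exponent overflow, so it covers only the path this example exercises.

```verilog
// Toy floating point adder: positive operands, 1.xxxx significands, Y has the larger exponent.
module fp_add_toy(exp_y, sig_y, exp_x, sig_x, exp, sig);
  input  [3:0] exp_y, exp_x;   // unbiased, non-negative exponents (assumption)
  input  [4:0] sig_y, sig_x;   // significands: 1 integer bit and 4 fraction bits
  output [3:0] exp;
  output [4:0] sig;

  wire [3:0] diff    = exp_y - exp_x;                    // step 1a
  wire [4:0] aligned = sig_x >> diff;                    // step 1b: shifted-out bits are dropped
  wire [5:0] sum     = {1'b0, sig_y} + {1'b0, aligned};  // step 2

  // Step 3: a sum of two positive aligned values needs at most one right shift.
  assign sig = sum[5] ? sum[5:1] : sum[4:0];
  assign exp = sum[5] ? exp_y + 4'd1 : exp_y;
endmodule
```

With exp_y = 3, sig_y = 10110, exp_x = 2, sig_x = 11000 the intermediate sum is 100010 (10.0010) and the outputs are exp = 4, sig = 10001, matching the worked example.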

Another Addition Example

• $1.0001 \times 2^{3} - 1.1110 \times 2^{1}$

– 4-bit significand; extra bit needed for accuracy

1. Align binary point:

$1.0001 \times 2^{3}$

$-\,0.01111 \times 2^{3}$

2. Subtract the aligned components

$0.10011 \times 2^{3}$

3. Normalize

$1.0011 \times 2^{2} = 4.75$

Without the extra bit, the aligned operand becomes 0.1000 and the result would be $0.1001 \times 2^{3} = 100.1_2 = 4.5$, which is off by 0.25. This is too much!

Accuracy and Rounding

• Want arithmetic to be fully precise

– IEEE 754 keeps two extra digits on the right during intermediate calculations (guard digit, round digit)

• Alignment step can cause data to be discarded (shifted out on right)

$2.56 \times 10^{0} + 2.34 \times 10^{2}$

$2.3400 \times 10^{2}$

$+\,0.0256 \times 10^{2}$

$= 2.3656 \times 10^{2}$

Rounded answer = $2.37 \times 10^{2}$

Without the guard and round digits, the answer would be $2.36 \times 10^{2}$

Pentium Bug

• Pentium FP divide uses the SRT algorithm

– implementation using a PLA; five divisors omitted from the PLA (1.0001, 1.0100, 1.0111, 1.1010, 1.1101)

– Pentium table uses 7 bits of remainder and 4 bits of divisor = 2048 entries

– FP divisors near integers 3, 9,15, 21, 27 are dangerous

– Scientists suspect errors and report on Internet Sep. 1994

– Intel discovers the bug in Pentium in June 1994, takes months to fix (4 to 5 million Pentiums with bug)

– Intel press release Nov 1994, “Most engineers and financial analysts need only 4 or 5 digits. Error occurs at the 9th digit. Only mathematicians should be concerned”

– Intel claims error happens once in 27000 years

– IBM claims error happens once in 24 days, bans Pentium sales

– Intel says “It is just a FLAW, Dammit, not a BUG”

– Well, it takes nearly 300 correct codes to fix a FLAW.

– Jan. 1995: Intel writes down $500M to cover replacement costs
