Arithmetic for Computers

Introduction

Chapter 3, Sections 3.1-3.5 & 3.8; Appendix C.1-C.3, C.5-C.6

Dr. Iyad F. Jafar

Outline
- Addition and Subtraction
- Overflow Detection
- Faster Addition
- The 1-Bit ALU
- The 32-bit MIPS ALU
- Shift Operations
- Multiplication
- Division
- Floating Point Numbers
- Fallacies and Pitfalls

Addition and Subtraction
- Add corresponding bits, including the sign bit, and ignore the carry out of the MSB.
- For subtraction, add the negative.

Examples (4-bit two's complement):

     4 + 3    =  7      0100 + 0011 = 0111
    -4 + 3    = -1      1100 + 0011 = 1111
     4 - 3    =  1      0100 + 1101 = 1 0001   (carry out of the MSB is ignored)
    -4 - (-3) = -1      1100 - 1101 becomes -4 + 3: 1100 + 0011 = 1111

When do we get overflow?
- When we add two positive numbers and get a negative number.
- When we add two negative numbers and get a positive number.
- Investigate the sign bit!
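The sign-bit rule above can be tried out in a few lines of Python. This is only a sketch: the 4-bit width matches the slide examples, and the names `BITS`, `to_signed`, `add` and `sub` are illustrative choices, not part of the slides.

```python
BITS = 4                      # word width of the slide examples (assumption)
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)

def to_signed(pattern):
    """Interpret a BITS-wide bit pattern as a two's-complement integer."""
    pattern &= MASK
    return pattern - (1 << BITS) if pattern & SIGN else pattern

def add(a, b, carry_in=0):
    """Add two bit patterns and drop the carry out of the MSB.

    Returns (result_pattern, overflow). Overflow is flagged when both
    operands have the same sign and the result has the opposite sign.
    """
    result = (a + b + carry_in) & MASK
    overflow = ((a ^ result) & (b ^ result) & SIGN) != 0
    return result, overflow

def sub(a, b):
    """Subtract by adding the negative: a + (one's complement of b) + 1."""
    return add(a, ~b & MASK, carry_in=1)

print(add(0b0100, 0b0011))    # 4 + 3
print(sub(0b1100, 0b1101))    # -4 - (-3)
print(add(0b0100, 0b0100))    # 4 + 4 does not fit in 4 bits
```

Comparing sign bits is one way to detect overflow; it needs only the operands and the result, not the internal carries.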

Detecting Overflow
Sign-bit cases (A and B are the operand sign bits, Cin is the carry into the sign position, Cout is the carry out of it):

    A  B  Cin  Cout  Sum   Signs            Result
    0  0  0    0     0     (+) + (+) = +    No overflow
    0  0  1    0     1     (+) + (+) = -    Overflow
    1  1  0    1     0     (-) + (-) = +    Overflow
    1  1  1    1     1     (-) + (-) = -    No overflow
    1  0  0    0     1     (-) + (+) = -    No overflow
    1  0  1    1     0     (-) + (+) = +    No overflow

Overflow occurs when the carry into the sign bit does not equal the carry out:

    Cin  Cout  Overflow
    0    0     0
    0    1     1
    1    0     1
    1    1     0

Addition and Subtraction
How to perform addition in hardware?
- Design a 32-bit adder (two 32-bit inputs!).
- Cell design: the 1-bit full adder, with inputs A, B and CarryIn and outputs Sum and CarryOut.

    A  B  Cin | Cout  Sum
    0  0  0   | 0     0
    0  0  1   | 0     1
    0  1  0   | 0     1
    0  1  1   | 1     0
    1  0  0   | 0     1
    1  0  1   | 1     0
    1  1  0   | 1     0
    1  1  1   | 1     1

From the Karnaugh maps:

    Sum  = A XOR B XOR Cin
    Cout = (A . B) + (B . Cin) + (A . Cin)

Addition and Subtraction
32-bit ripple-carry adder
- Cascade 32 copies of the full adder and wire them up through Cin and Cout.

How long does it take to get the result ?
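As a rough software model of the full-adder cell and the ripple-carry chain described above, the following sketch may help. The function names and the LSB-first bit-list representation are assumptions made for illustration; the overflow flag uses the "carry into the sign bit differs from the carry out" rule.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout

def to_bits(value, width):
    """Split an integer into a list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Reassemble an LSB-first bit list into an integer."""
    return sum(bit << i for i, bit in enumerate(bits))

def ripple_carry_add(a_bits, b_bits, c0=0):
    """Chain one full adder per bit; the carry ripples from LSB to MSB.

    Returns (sum_bits, carry_out, overflow).
    """
    carry = c0
    carry_into_msb = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        carry_into_msb = carry          # after the loop: carry into the MSB cell
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    overflow = carry_into_msb ^ carry   # Cin of the sign bit != Cout
    return sum_bits, carry, overflow

bits, cout, ovf = ripple_carry_add(to_bits(4, 4), to_bits(3, 4))
print(from_bits(bits), cout, ovf)
```

The loop makes the timing question concrete: each cell cannot finish until the previous cell has produced its carry.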

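[Figure: 32-bit ripple-carry adder. A chain of 32 full adders (FA) with inputs A0..A31 and B0..B31, sum outputs S0..S31, the carry-in of the first cell tied to 0, and the final carry C32.]

Addition and Subtraction
32-bit ripple-carry subtractor
- Subtraction is addition of the negative!
- Compute the 2's complement = 1's complement + 1.

[Figure: 32-bit ripple-carry subtractor. The same chain of full adders with the carry-in of the first cell tied to 1, difference outputs D0..D31, and final carry B32.]

Addition and Subtraction
32-bit ripple-carry adder/subtractor
- Redundancy in hardware! Subtraction is addition of the negative.
- Use one adder and configure the second input.
- Remember: X XOR 1 = X' (the complement of X) and X XOR 0 = X.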
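The XOR trick above can be sketched as follows. The function name `add_sub` and its `sub` control argument are illustrative assumptions; the control bit plays two roles, exactly as in the combined adder/subtractor: it conditionally inverts every bit of B and it supplies the carry-in of the LSB.

```python
def add_sub(a, b, sub, bits=32):
    """Bit-serial model of a ripple-carry adder/subtractor.

    sub = 0 computes a + b; sub = 1 computes a - b as a + (NOT b) + 1.
    Returns (result, carry_out), both for a `bits`-wide datapath.
    """
    carry = sub                          # Cin of the LSB cell is the control bit
    result = 0
    for i in range(bits):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ sub        # XOR gate: B when sub=0, NOT B when sub=1
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result, carry

print(add_sub(7, 3, 0))   # addition
print(add_sub(7, 3, 1))   # subtraction
```

One adder is reused for both operations, at the cost of one XOR gate per bit.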

[Figure: 32-bit adder/subtractor. One chain of full adders with inputs A0..A31 and B0..B31, outputs S0..S31 and C32, and an Add/Sub control line (0 = add, 1 = subtract).]

Faster Addition
- The ripple-carry adder is slow!
- We have to wait until the carry has propagated to the final position before we can read out the result of the addition or subtraction.
- Carry generation takes two levels of gates at each bit position:

      Cout_i = (A_i . B_i) + (A_i . Cin_i) + (B_i . Cin_i)

- Total delay = gate delay x 2 x number of bits.
- Example: a 16-bit adder has a delay of 32 gate delays.
- Can we go faster? What if we generate the carries in parallel?

Faster Addition
- The carries can be expressed exclusively in terms of the adder's inputs and c0!
- Add separate hardware to compute the carries in parallel: the carry-lookahead adder.

[Figure: carry-lookahead logic with inputs A31..A0, B31..B0 and c0, producing the carries c1, c2, c3, c4.]

Faster Addition
In a 4-bit adder, the carry equations are

    c1 = (b0 . c0) + (a0 . c0) + (a0 . b0)
    c2 = (b1 . c1) + (a1 . c1) + (a1 . b1)
    c3 = (b2 . c2) + (a2 . c2) + (a2 . b2)
    c4 = (b3 . c3) + (a3 . c3) + (a3 . b3)

By substitution

    c2 = (a1 . a0 . b0) + (a1 . a0 . c0) + (a1 . b0 . c0)
       + (b1 . a0 . b0) + (b1 . a0 . c0) + (b1 . b0 . c0)
       + (a1 . b1)
    c3 = (b2 . a1 . a0 . b0) + (b2 . a1 . a0 . c0) + (b2 . a1 . b0 . c0)
       + (b2 . b1 . a0 . b0) + (b2 . b1 . a0 . c0) + (b2 . b1 . b0 . c0)
       + (b2 . a1 . b1)
       + (a2 . a1 . a0 . b0) + (a2 . a1 . a0 . c0) + (a2 . a1 . b0 . c0)
       + (a2 . b1 . a0 . b0) + (a2 . b1 . a0 . c0) + (a2 . b1 . b0 . c0)
       + (a2 . a1 . b1)
       + (a2 . b2)
    c4 = ...

- All carries require two gate delays!
- However, imagine the size of the equations, and the cost, if the adder is 32 bits.

Faster Addition
We can reduce the logic cost with a simple simplification:

    c(i+1) = (ai . bi) + (bi . ci) + (ai . ci)
           = (ai . bi) + (ai + bi) . ci
           = gi + pi . ci

    gi = ai . bi : carry generate
    pi = ai + bi : carry propagate

Carry equations for a 4-bit adder:

    c1 = g0 + p0 . c0
    c2 = g1 + p1 . c1 = g1 + (p1 . g0) + (p1 . p0 . c0)
    c3 = g2 + p2 . c2 = g2 + (p2 . g1) + (p2 . p1 . g0) + (p2 . p1 . p0 . c0)
    c4 = g3 + p3 . c3 = g3 + (p3 . g2) + (p3 . p2 . g1) + (p3 . p2 . p1 . g0) + (p3 . p2 . p1 . p0 . c0)

- The delay to generate c4 is 3 gate delays.
- The cost is still high for large adders!
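The generate/propagate equations can be checked with a short sketch. The names `bits4` and `cla_carries` are illustrative assumptions; the point is that every carry is computed directly from g, p and c0, without waiting for the previous carry.

```python
def bits4(value):
    """Return the 4 low bits of value as a list, LSB first."""
    return [(value >> i) & 1 for i in range(4)]

def cla_carries(a, b, c0):
    """Carry-lookahead for a 4-bit adder.

    a, b: lists of 4 bits (LSB first). Returns [c1, c2, c3, c4].
    """
    g = [ai & bi for ai, bi in zip(a, b)]   # carry generate
    p = [ai | bi for ai, bi in zip(a, b)]   # carry propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

# 0111 + 0001: the carry generated at bit 0 propagates through bits 1 and 2.
print(cla_carries(bits4(0b0111), bits4(0b0001), 0))
```

Each expression is a two-level AND-OR form on top of the one-level g and p signals, which is where the 3 gate delays come from.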

Faster Addition
2nd Level of Abstraction
- Example: a 16-bit adder. Assume that we have four 4-bit carry-lookahead adders.
- These 4-bit adders are designed to produce super generate (G) and super propagate (P) signals:
    P: the four bits propagate a carry to the next four bits
    G: the four bits generate a carry to the next four bits
- The super carry signals are fed to a separate carry generation unit.

[Figure: 4-bit CLA block with inputs A3-A0, B3-B0 and c0, and outputs S3-S0, P0 and G0.]

Faster Addition
- We need to generate the carry propagate and generate signals at a higher level.
- Think of each 4-bit adder block as a single unit that can either generate or propagate a carry.

[Figure: 16-bit adder built from four 4-bit CLA blocks (bits 3-0, 7-4, 11-8 and 15-12) producing S3-S0, S7-S4, S11-S8 and S15-S12. Each block sends its Pi and Gi to a carry generation unit, which also takes c0 and produces the block carries C1, C2, C3 and C4.]

Faster Addition
Super propagate signals

    P0 = p3 . p2 . p1 . p0      (how can the first 4-bit adder propagate c0?)
    P1 = p7 . p6 . p5 . p4
    P2 = p11 . p10 . p9 . p8
    P3 = p15 . p14 . p13 . p12

Super generate signals

    G0 = g3 + (p3 . g2) + (p3 . p2 . g1) + (p3 . p2 . p1 . g0)
    G1 = g7 + (p7 . g6) + (p7 . p6 . g5) + (p7 . p6 . p5 . g4)
    G2 = g11 + (p11 . g10) + (p11 . p10 . g9) + (p11 . p10 . p9 . g8)
    G3 = g15 + (p15 . g14) + (p15 . p14 . g13) + (p15 . p14 . p13 . g12)

Carry signals at the higher level are

    C1 = G0 + (P0 . c0)
    C2 = G1 + (P1 . G0) + (P1 . P0 . c0)
    C3 = G2 + (P2 . G1) + (P2 . P1 . G0) + (P2 . P1 . P0 . c0)
    C4 = G3 + (P3 . G2) + (P3 . P2 . G1) + (P3 . P2 . P1 . G0) + (P3 . P2 . P1 . P0 . c0)

Faster Addition
- Each super carry signal is a two-level implementation in terms of Pi and Gi.
- Pi is one level of gates, while Gi is two levels; both are expressed in terms of pi and gi.
- pi and gi are one level of gates.
- Total delay is 2 + 2 + 1 = 5 gate delays.
- The 16-bit CLA is about 6 times faster than the 16-bit ripple-carry adder (32 / 5 = 6.4).
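The two-level scheme can be sketched in Python as below. The helper names (`group_pg`, `block_carries`) are assumptions for illustration; the equations are the super propagate, super generate and block carry equations given above, and the last lines repeat the delay comparison under the same gate-delay model.

```python
def group_pg(a4, b4):
    """Super propagate and generate (P, G) of one 4-bit block.

    a4, b4: lists of 4 bits, LSB first.
    """
    g = [x & y for x, y in zip(a4, b4)]
    p = [x | y for x, y in zip(a4, b4)]
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G

def block_carries(a, b, c0):
    """Carry generation unit of a 16-bit adder: returns [C1, C2, C3, C4]."""
    P, G = [], []
    for k in range(4):
        a4 = [(a >> (4 * k + i)) & 1 for i in range(4)]
        b4 = [(b >> (4 * k + i)) & 1 for i in range(4)]
        Pk, Gk = group_pg(a4, b4)
        P.append(Pk)
        G.append(Gk)
    C1 = G[0] | (P[0] & c0)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    C3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    C4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    return [C1, C2, C3, C4]

# Delay comparison in gate delays, using the slide's model.
RIPPLE_DELAY = 2 * 16          # two gate levels per bit position
CLA_DELAY = 2 + 2 + 1          # block carries + Gi + pi/gi
print(block_carries(0x00FF, 0x0001, 0), RIPPLE_DELAY / CLA_DELAY)
```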

Designing the ALU
We want to design an ALU that
- supports logic operations
- supports arithmetic operations
- supports the set-on-less-than instruction
- supports test for equality
with special handling for
- sign extension
- zero extension
- overflow detection

[Figure: ALU block with 32-bit inputs A and B, a 4-bit operation input m, a 32-bit result, and 1-bit zero and ovf outputs.]

Designing the ALU
- We start with a 1-bit ALU.
- Starting with the logical operations is easier since they map directly to hardware.
- Two operands, two results (A.B and A+B). We need only one result, so use a 2-to-1 MUX.
- The Operation input comes from logic that looks at the opcode.

    Function   Operation
    A and B    0
    A or B     1

[Figure: 1-bit logic unit. An AND gate and an OR gate on A and B feed a 2-to-1 MUX selected by Operation, which drives Result.]

Designing the ALU
How about addition?

- Add an adder.
- Connect Cin (from the previous bit) and Cout (to the next bit).
- Expand the MUX to 3-to-1 (Operation is now 2 bits).

    Function   Operation
    A and B    00
    A or B     01
    A + B      10

[Figure: 1-bit ALU with AND, OR and adder outputs feeding a 3-to-1 MUX; the adder has Cin and Cout.]

Designing the ALU
How about subtraction?
- Use the same adder for subtraction.
- Depending on the operation, choose whether to compute the 2's complement of B or not (MUX or XOR).
- For the 2's complement, define the Binvert signal and set Cin of the LSB to 1.

    Function   Operation   BInvert   Cin
    A and B    00          0         x
    A or B     01          0         x
    A + B      10          0         0
    A - B      10          1         1

[Figure: 1-bit ALU with a 2-to-1 MUX on the B input, selected by BInvert, that passes either B or its complement.]

Designing the ALU
Can we add the NOR instruction?
- No need to add a NOR gate!
- Use De Morgan's theorem, an inverter and a 2-to-1 MUX.
- Define the Ainvert signal.

    Function   Operation   BInvert   Cin   AInvert
    A and B    00          0         x     0
    A or B     01          0         x     0
    A + B      10          0         0     0
    A - B      10          1         1     0
    A nor B    00          1         x     1

[Figure: 1-bit ALU with 2-to-1 MUXes on both the A input (selected by AInvert) and the B input (selected by BInvert).]

Designing the ALU
Building the 32-bit ALU
- Simply wire up 32 copies of the 1-bit ALU designed earlier, with special care for the LSB ALU.
- The Cin and Binvert signals of the LSB are the same; tie them together into one signal, BNegate.

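[Figure: LSB 1-bit ALU with inputs A, B, AInvert, BNegate (which also drives Cin) and a 2-bit Operation, and outputs Result and Cout.]

Designing the ALU
Building the 32-bit ALU

[Figure: 32-bit ALU built from the 1-bit cells ALU0 to ALU31. Cell i takes Ai and Bi and produces Resulti; the Cout of each cell feeds the Cin of the next. Operation and BNegate are shared by all cells, and BNegate also drives the Cin of ALU0.]

Note that Cin and BNegate are the same for the LSB in order to compute the 2's complement in case of subtraction.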

Designing the ALU
Supporting the SLT instruction
- Expand the multiplexer with one more input (Less).
- Subtract the two registers and feed the sign bit (the result of bit 31) back to the Less input of the LSB ALU.
- The Less inputs of the remaining ALUs are 0.

The second version of the 32-bit ALU
- For the SLT instruction, the MSB is fed back to the LSB while the other bits are set to zero. The operation is basically subtraction.

Designing the ALU

[Figure: second version of the 32-bit ALU. Each cell ALU0 to ALU31 has a Less input. The Less inputs of ALU1 to ALU31 are tied to 0; the Set output of ALU31 is fed back to the Less input of ALU0. ALU31 also produces the Overflow output.]

Designing the ALU
Supporting branch instructions
- Basically, subtract the two registers. However, we need to generate a signal that indicates whether the result is zero or not.
- Simply OR the result bits and take the complement.
- This signal is used to select between the branch address and the PC.

[Figure: example of using the Zero signal to select the address for the BEQ instruction.]

Designing the ALU

[Figure: the 32-bit ALU. Cells ALU0 to ALU31 with shared Operation and BNegate signals, Less inputs, the Set feedback from ALU31 to ALU0, and the Overflow output.]
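The ALU assembled over the preceding slides can be modelled behaviourally as follows. This is a sketch under stated assumptions: the function name `alu`, the integer encoding of the Operation select (0 = AND, 1 = OR, 2 = adder, 3 = Less) and the returned tuple are illustrative, and the SLT path simply uses the sign bit of A - B as described above, so it does not correct for overflow.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 1 << 31

def alu(a, b, operation, bnegate=0, ainvert=0):
    """Behavioural model of the 32-bit ALU.

    operation selects the result MUX: 0 = AND, 1 = OR, 2 = adder, 3 = Less.
    bnegate inverts B and also drives the carry-in of bit 0.
    Returns (result, zero, overflow).
    """
    a &= MASK32
    b &= MASK32
    a_in = (~a & MASK32) if ainvert else a
    b_in = (~b & MASK32) if bnegate else b

    adder_out = (a_in + b_in + bnegate) & MASK32
    # Overflow: adder inputs share a sign that differs from the sum's sign.
    overflow = bool((a_in ^ adder_out) & (b_in ^ adder_out) & SIGN32)

    if operation == 0:
        result = a_in & b_in
    elif operation == 1:
        result = a_in | b_in
    elif operation == 2:
        result = adder_out
    elif operation == 3:
        # Set: sign bit of the adder output goes to the Less input of bit 0;
        # all other result bits are 0.
        result = (adder_out >> 31) & 1
    else:
        raise ValueError("operation must be 0..3")

    zero = int(result == 0)          # NOR of all result bits
    return result, zero, overflow

print(alu(5, 3, 2))                       # A + B
print(alu(5, 5, 2, bnegate=1))            # A - B, Zero set (as used by BEQ)
print(alu(3, 5, 3, bnegate=1))            # SLT: 3 < 5
print(alu(0b1100, 0b1010, 0, 1, 1))       # NOR via De Morgan: (NOT A) AND (NOT B)
```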

Designing the ALU
The 32-bit ALU: list of supported operations

    Function   Operation   BNegate   AInvert
    A and B    00          0         0
    A or B     01          0         0
    A + B      10          0         0
    A - B      10          1         0
    A nor B    00          1         1
    SLT        11          1         0
    BEQ        10          1         0
    BNE        10          1         0

Shift Operations
- Shift operations are commonly needed!
- The MIPS ISA specifies three shift instructions.
- Two logical shift instructions:

      sll $rd, $rt, shift_amount    # R[rd] = R[rt] << shift_amount
      srl $rd, $rt, shift_amount    # R[rd] = R[rt] >> shift_amount (zero fill)

- One arithmetic shift instruction:

      sra $rd, $rt, shift_amount    # R[rd] = R[rt] >> shift_amount (sign fill)

- What is the difference? Unlike SRL, the SRA instruction preserves the sign of the number!

Encoding (R-type)

    Field   op   rs   rt   rd   shamt   funct
    Bits    6    5    5    5    5       6

Shift Operations
Example 1.
1. You need to extract the 2nd byte of a 4-byte word in $t1.

      $t1 = 0010 0011 0111 0110 1010 1111 0000 1101
      srl  $t1, $t1, 8
      $t1 = 0000 0000 0010 0011 0111 0110 1010 1111
      andi $t1, $t1, 0x00FF         # mask = 0000 0000 0000 0000 0000 0000 1111 1111
      $t1 = 0000 0000 0000 0000 0000 0000 1010 1111

2. You want to multiply $t3 by 8 (note: 8 equals 2^3).

      $t3 = 0000 0000 0000 0000 0000 0000 0000 0101    (equals 5)
      sll  $t3, $t3, 3              # move 3 places to the left
      $t3 = 0000 0000 0000 0000 0000 0000 0010 1000    (equals 40)

Shift Operations
How are these instructions implemented?
- Outside the ALU

- Shift registers: slow; shifting by one bit requires one cycle!

- Barrel shifters: a digital circuit that can shift a data word by a specified number of bits in one clock cycle, provided the cycle is long enough. Simply a set of multiplexors!

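Shift Operations
Example 2. 4-bit barrel shifter (rotate left by 0, 1, 2 or 3 bits)

[Figure: 4-bit barrel shifter block with 4-bit data input D, 4-bit output Y and select inputs S1 and S0.]

    Shift value    Output
    S1 S0          Y3 Y2 Y1 Y0
    0  0           D3 D2 D1 D0
    0  1           D2 D1 D0 D3
    1  0           D1 D0 D3 D2
    1  1           D0 D3 D2 D1

Each output is a 4-to-1 multiplexor. Inputs for select values 0, 1, 2, 3:

    Y0: D0 D3 D2 D1
    Y1: D1 D0 D3 D2
    Y2: D2 D1 D0 D3
    Y3: D3 D2 D1 D0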
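The multiplexor wiring of the 4-bit barrel shifter can be mimicked with an indexed lookup. The function name and the LSB-first list layout are assumptions made for this sketch; each output Yi selects input D[(i - shift) mod 4], which reproduces the rotate-left table above.

```python
def barrel_rotate_left(d, s1, s0):
    """4-bit barrel shifter (rotate left).

    d: list [D0, D1, D2, D3]; s1, s0: select bits.
    Returns [Y0, Y1, Y2, Y3]. Each Yi acts as a 4-to-1 MUX over D.
    """
    shift = (s1 << 1) | s0
    return [d[(i - shift) % 4] for i in range(4)]

# Symbolic inputs make the wiring visible.
labels = ["D0", "D1", "D2", "D3"]
for s1 in (0, 1):
    for s0 in (0, 1):
        y = barrel_rotate_left(labels, s1, s0)
        # Print in the slide's order: Y3 Y2 Y1 Y0.
        print(s1, s0, " ".join(reversed(y)))
```

All four outputs are produced at once, in contrast to a shift register that needs one cycle per bit position.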

Multiplication
Multiplying two 3-digit numbers A and B
- n partial products, where B is n digits long
- n - 1 additions

In decimal:

          4 2 1      (multiplicand)
        x 1 2 3      (multiplier)
        -------
        1 2 6 3
        8 4 2
    + 4 2 1
    -----------
      5 1 7 8 3

In binary (6 x 5 equals 30):

          1 1 0      (multiplicand)
        x 1 0 1      (multiplier)
        -------
          1 1 0
        0 0 0
    + 1 1 0
    -----------
      1 1 1 1 0

- Each partial product is either 110 (A*1) or 000 (A*0).
- Note: the product may take as many as two times the number of bits!

Multiplication
Multiplication Steps (multiplicand 110, multiplier 101)
Step 1: LSB of multiplier is 1. Add a copy of the multiplicand (partial product 110).
Step 2: Shift the multiplier right to reveal the new LSB; shift the multiplicand left to multiply it by 2 (multiplicand 1100, multiplier 10).
Step 3: LSB of multiplier is 0. Add zero (partial product 0000).
Step 4: Shift multiplier right, multiplicand left (multiplicand 11000, multiplier 1).
Step 5: LSB of multiplier is 1. Add a copy of the multiplicand (partial product 11000).
Step 6: Add the partial products: 110 + 0000 + 11000 = 11110. Done!

Thus, we need hardware to:
1. Hold the multiplier (32 bits) and shift it right
2. Hold the multiplicand (32 bits) and shift it left (requires 64 bits)
3. Hold the product (result) (64 bits)
4. Add the multiplicand to the current result

Multiplication
Multiplication Hardware

[Figure: multiplication hardware. A 64-bit Multiplicand register (shift left), a 32-bit Multiplier register (shift right), a 64-bit ALU, a 64-bit Product register (write), and a Control unit that tests the multiplier's LSB.]

1. Hold the multiplier (32 bits) and shift it right
2. Hold the multiplicand (32 bits) and shift it left (requires 64 bits)
3. Hold the product (result) (64 bits)
4. Add the multiplicand to the current result
5. Control the whole process

Multiplication
Example 3. (4-bit multiplication: 1101 x 0101, with an 8-bit multiplicand register, a 4-bit multiplier register and an 8-bit product register)

    Step                                     Multiplicand   Multiplier   Product
    Initial values                           xxxx1101       0101         00000000
    LSB = 1: add multiplicand to product     xxxx1101       0101         00001101
             shift Mcand left, Mplier right  xxx11010       0010         00001101
    LSB = 0: do nothing                      xxx11010       0010         00001101
             shift Mcand left, Mplier right  xx110100       0001         00001101
    LSB = 1: add multiplicand to product     xx110100       0001         01000001
             shift Mcand left, Mplier right  x1101000       0000         01000001
    LSB = 0: do nothing                      x1101000       0000         01000001
             shift Mcand left, Mplier right  11010000       0000         01000001

Multiplication
A Cheaper Implementation
- Even though we're only adding 32 bits at a time, we need a 64-bit adder.
- Instead, hold the multiplicand still and shift the product register right!
- Now we're only adding 32 bits each time.

[Figure: cheaper multiplier. A 32-bit Multiplicand register, a 32-bit ALU, a 64-bit Product register split into LH Product and RH Product that shifts right (with an extra bit for the carry-out), a 32-bit Multiplier register that shifts right, and the Control unit.]

Multiplication
A Cheaper than the Cheaper Implementation
- Note that we're shifting bits out of the multiplier and into the product.
- Why not put these together into the same register?
- As space opens up in the multiplier, overwrite it with the product bits.
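The first and the final multiplier organisations can be compared in a small sketch. Both functions are illustrative assumptions (names, argument order and the use of unbounded Python integers to stand in for the extra carry bit); they model unsigned multiplication only.

```python
def multiply_shift_left(multiplicand, multiplier, n=32):
    """First design: 2n-bit multiplicand shifts left, multiplier shifts right."""
    product = 0
    mcand = multiplicand                 # held in a 2n-bit register
    for _ in range(n):
        if multiplier & 1:               # test the multiplier's LSB
            product += mcand             # 2n-bit addition
        mcand <<= 1
        multiplier >>= 1
    return product & ((1 << (2 * n)) - 1)

def multiply_shared_register(multiplicand, multiplier, n=32):
    """Final design: multiplier starts in the right half of the product register.

    Only the left half takes part in the n-bit addition; the whole
    register then shifts right, discarding one used multiplier bit.
    """
    reg = multiplier & ((1 << n) - 1)
    for _ in range(n):
        if reg & 1:
            reg += multiplicand << n     # add into the left half (carry kept)
        reg >>= 1
    return reg

print(multiply_shift_left(0b1101, 0b0101, n=4))        # Example 3 operands
print(multiply_shared_register(0b1101, 0b0101, n=4))
```

Both loops run n iterations; the second needs only an n-bit adder and one 2n-bit register.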

[Figure: final multiplier. A 32-bit Multiplicand register, a 32-bit ALU, and a single 64-bit register that shifts right and holds LH Product in its left half and the Multiplier in its right half; the Control unit tests the register's LSB.]

Multiplication
Fast Multiplication
- Use 31 32-bit adders to compute the partial products.
- One input of each adder is the multiplicand ANDed with a multiplier bit, and the other is the partial product from the previous step.

Question: Show the multiplication tree to compute 5 x 3. Assume unsigned numbers represented using 3 bits and that we have 4-bit ALUs.

Multiplication
MIPS Multiplication
Two multiplication instructions:

    mult  $s0, $s1      # hi||lo = $s0 * $s1 (signed)
    multu $s0, $s1      # hi||lo = $s0 * $s1 (unsigned)

The result is 64 bits and is stored in two special registers:
- LO holds the lower 32 bits of the result
- HI holds the upper 32 bits of the result

The contents of these registers can be read using two special instructions:

    mfhi $t5            # move HI to register $t5
    mflo $t6            # move LO to register $t6

Encoding (R-type)

    Field   op   rs   rt   rd   shamt   funct
    Bits    6    5    5    5    5       6

Multiplication
MIPS Multiplication (NOTES)
- Both multiplication instructions ignore overflow!
- It is the responsibility of the software to check whether the result fits into 32 bits.
- For MULTU, there is no overflow if HI is 0.
- For MULT, there is no overflow if HI is the replicated sign of LO.
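The two overflow checks in the notes above can be written out as follows. This is a sketch that models the HI/LO register pair in Python; the function names are illustrative and are not MIPS instructions or library calls.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def multu(rs, rt):
    """Unsigned multiply: returns (hi, lo)."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32

def mult(rs, rt):
    """Signed multiply: returns (hi, lo) of the 64-bit two's-complement product."""
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    return product >> 32, product & MASK32

def multu_fits_32(hi, lo):
    """MULTU result fits in 32 bits if HI is 0."""
    return hi == 0

def mult_fits_32(hi, lo):
    """MULT result fits in 32 bits if HI is the sign of LO replicated."""
    return hi == (MASK32 if lo & 0x80000000 else 0)

hi, lo = mult(-2 & MASK32, 3)
print(hex(hi), hex(lo), mult_fits_32(hi, lo))
```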

Question! Modify the designed multiplier to support signed multiplication.

4141Division42dividendquotientdivisorremainder4832315-4533-3032-3023-15832217314531001001101-0001001-1011000-101110-101101111-000011Dividend = Divisor * Quotient + RemainderIdea: Repeatedly subtract divisor. Shift as appropriate.42Division43010010010101-010100000111001001001-0010100000100001-0001010000001101-0000101000000011-00000101000000111001001101-0001001-1011000-101110-101101111-000011Looking at the alignment a little differentlyMake the dividend 8 bits and the divisor 4 bits by filling in with 0sEach iteration, re-express the entire remainder as 8 bitsNote: At any step, the dividend = divisor * quotient + current remainderTry subtracting the divisor from the current remainder each time if it doesnt fit, restore the remainder43Division44Division Hardware1. Hold divisor (32 bits) and shift it right (requires 64 bits)2. Hold remainder (64 bits)4. Subtract the divisor from the current result3. Hold quotient (result) (32 bits) and shift it left5. Control the whole processControl64-bitRemainder64 bitWriteDivisor64 bitShift RightQuotient32 bitShift LeftAlgorithminitialize registers (divisor in LHS);for (i=0; i
