Computer Organization: Basic Processor Structure

James Gil de Lamadrid

April 17, 2018

Chapter 1: Overview

- Computer Science students start by learning a high-level language. We study what is below the high-level code they write.

- We break our study into two areas:
  - Computer Organization - the study of the implementation of the computer.
  - Computer Architecture - the study of the interface to the computer.


High-Level Languages

- Programming languages are classified by level.

- Low-level languages are closer to the hardware.

- High-level languages manipulate more abstract data structures.

Examples

- Haskell - a functional language.
- C++ - an object-oriented language.


Machine Language

- Machine language is numeric.

- A machine instruction is a collection of fields, or numbers that represent the information given in the instruction.

- Instruction format: op-code, destination, source, constant.

- Machine instructions operate on registers.


Machine Language (cont.)

Examples

Source code: x = 5 + y * 3;

Machine code:
1, 1, 2, 3
14, 1, 1, 5

Meaning (registers R1 and R2 are used to represent the variables x and y, respectively):

R1 = R2 * 3
R1 = R1 + 5


Assembly Language

- Assembly language is a symbolic version of machine language.

- Numbers forming parts of the machine instruction are given symbolic names.

- The programmer is relieved of remembering the meanings of numbers.

Examples

Assembly code:
mult R1, R2, #3
add R1, R1, #5


Compilers & Assembly Language

- High-level source code must be translated into machine code to be able to execute on hardware.

- Translation is done in several stages. In the first stage, source code is often translated into assembly code.

- The translation process:

1. Parse - the source code is translated into an abstract representation, often an abstract syntax tree (AST).

2. Generate code - the AST is traversed, and as each node is visited, assembly code for that node is written.


Compilers & Assembly Language (cont.)

Examples

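Example AST for x = 5 + y * 3: the root = node has the children x and +; the + node has the children 5 and *; the * node has the children y and 3.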
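The code-generation step can be sketched in Python. This is a minimal sketch, not the author's compiler: it assumes the AST is stored as nested tuples `(op, left, right)`, that x and y already live in registers R1 and R2 (as in the machine-language example), and that every operator node has at least one leaf child. The names `gen_expr`, `gen_assign`, and `REGISTERS` are hypothetical.

```python
# Hypothetical register assignment, taken from the machine-language example.
REGISTERS = {"x": "R1", "y": "R2"}
MNEMONICS = {"+": "add", "*": "mult"}


def leaf_operand(node):
    """Assembly operand for a leaf node, or None for an operator node."""
    if isinstance(node, int):
        return f"#{node}"            # constants become immediate operands
    if isinstance(node, str):
        return REGISTERS[node]       # variables already live in registers
    return None


def gen_expr(node, target, code):
    """Post-order traversal: emit code that leaves the node's value in target."""
    op, left, right = node
    mnemonic = MNEMONICS[op]
    left_op, right_op = leaf_operand(left), leaf_operand(right)
    if left_op is not None and right_op is not None:
        code.append(f"{mnemonic} {target}, {left_op}, {right_op}")
    elif right_op is not None:       # left child is a subtree
        gen_expr(left, target, code)
        code.append(f"{mnemonic} {target}, {target}, {right_op}")
    elif left_op is not None:        # right child is a subtree; + and * commute
        gen_expr(right, target, code)
        code.append(f"{mnemonic} {target}, {target}, {left_op}")
    else:
        raise NotImplementedError("both children are subtrees")


def gen_assign(tree):
    """Handle the root '=' node: evaluate the right side into the variable."""
    _, var, expr = tree
    code = []
    gen_expr(expr, REGISTERS[var], code)
    return code


ast = ("=", "x", ("+", 5, ("*", "y", 3)))   # x = 5 + y * 3
for line in gen_assign(ast):
    print(line)
```

The subtree y * 3 is visited before its parent, which is why `mult R1, R2, #3` is emitted first. Because the constant 5 sits on the left of +, the sketch relies on + being commutative to emit `add R1, R1, #5`.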


Assemblers & Object Code

- The assembler translates assembly code to object code.

- Object code is incomplete machine code.

- The assembler has trouble completing the machine code because of external references.

- A module containing a reference to a definition in another module has an external reference.


External References

Module Q contains:

extern int x;
x = 5;

Module Driver contains:

int x;

In assembly language, this would be:

- Q: store x, #5
- Driver (allocate a word in memory for the variable x): x: .word


External References (cont.)

- Translating the store into machine language might yield the following (we assume the op-code for store is 19, and x has been allocated memory location 50):

19, 50, 5

- But the assembler only analyzes one module at a time, and cannot determine what memory location has been allocated to x.

- Instead, the assembler produces the following object code instruction, with a blank left for the address of x, to be filled in when it is eventually calculated.

19, x?, 5


Compiler vs. Assembler

- The compiler’s parsing activity is complex.

- Code generation is complex, often producing several assembly instructions for each high-level statement.

- Assembler translation is little more than looking up symbols in a symbol table.

- The numeric field values are then assembled into a full instruction.


The Linker & Executable Code

- The input to the linker is a set of object code files.

- The output is a single executable code file.

- Linker tasks:
  - Resolve external references.
  - Library search.
  - Relocation of object code modules.


The Linker & Executable Code (cont.)

- Resolving external references: the linker sees both module Q and module Driver. It can calculate the address of x in Driver, and fill in the blank in module Q.

- Library searches: the linker pulls in modules from the library and adds them to the executable code, in order to resolve some external references.

- Relocation: modules are assigned an order in memory. The addresses in each module must be adjusted to reflect the module’s position.


Library Search Example

Examples

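(A0 is the argument register, used to pass an argument to a function, and RV is the return value register, used to pass a value back from a function.)

Source code: z = sqrt(y);

Assembly code:
load A0, y
call sqrt
store z, RV

The definition of sqrt is not in any of the program’s own modules, so the linker must find it in the library.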


Relocation Example

- Module Driver has addresses 0 - 2,999.

- Module Q has addresses 0 - 1,999.

- Module Q is placed after module Driver. The base address of module Q is now 3,000.

- All addresses from module Q must be modified by adding 3,000 to them.
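A toy two-pass linker shows relocation and external-reference resolution together. This is a sketch under stated assumptions, not a real object-file format: each module is a tuple `(name, size, definitions, code)`, and each instruction is `(op_code, address, constant)`, where an integer address is relative to its own module and a string address names a symbol. The op-code 19 for store, the module sizes, and the location 50 for x come from the examples above; the second instruction in Q, with local address 10, is hypothetical.

```python
def link(modules):
    """Lay modules out one after another, relocate module-relative addresses,
    and resolve external references through a global symbol table.

    Each module is (name, size, definitions, code): definitions maps a symbol
    to its offset inside the module, and each instruction is
    (op_code, address, constant).  An int address is relative to the start of
    its own module; a str address names a symbol defined in some module.
    """
    base = {}
    symbol_table = {}
    next_base = 0
    for name, size, definitions, _ in modules:      # pass 1: placement
        base[name] = next_base
        for symbol, offset in definitions.items():
            symbol_table[symbol] = next_base + offset
        next_base += size

    executable = []
    for name, _, _, code in modules:                # pass 2: patch addresses
        for op_code, address, constant in code:
            if isinstance(address, str):
                address = symbol_table[address]      # fill in the blank, e.g. x?
            else:
                address += base[name]                # relocation
            executable.append((op_code, address, constant))
    return base, executable


driver = ("Driver", 3000, {"x": 50}, [])           # x allocated at location 50
q = ("Q", 2000, {}, [(19, "x", 5),                  # store x, #5 -> 19, x?, 5
                     (19, 10, 7)])                  # hypothetical local address
base, executable = link([driver, q])
print(base)         # {'Driver': 0, 'Q': 3000}
print(executable)   # [(19, 50, 5), (19, 3010, 7)]
```

The first pass only needs module sizes and definitions, which is why a linker, unlike the assembler, can fill in x? once it has seen every module.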


The Loader

- The loader:
  - Relocates the executable code.
  - Initializes registers.

- The loader loads an executable program into its own section of memory, called its workspace.

- Several programs (processes) can be active simultaneously.

- The processor executes small pieces of each process (called quanta) in rapid succession, making it appear that all processes in memory are running simultaneously.

- Depending on the location of the program workspace, addresses in the executable code need to be altered yet again.

- Several registers with special uses must be initialized before the program is started.


Initializing the PC Register

The Program Counter (PC) is a register that contains the memory address of the next instruction to be executed. It must be updated each time an instruction is executed. Initially, it must be set to point to the base address of the program workspace.

(Figure: memory containing a program workspace that starts at address 2000. The current instruction is at address 2349, so the PC holds 2350.)


Translation Summary


The Processor

- Levels of abstraction for hardware:
  - The register transfer level (RTL), or behavioral level.
  - The gate level, or structural level.

- Processor behavior: the processor repeatedly executes the machine cycle, which reads a single instruction from memory and executes it.

- Steps in the machine cycle:

1. Fetch an instruction from memory.
2. Decode the instruction. (Split the instruction into fields.)
3. Execute the instruction.
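A toy fetch-decode-execute loop makes the machine cycle concrete. It is a sketch under stated assumptions: instructions are stored as 4-field tuples in the format op-code, destination, source, constant; op-code 1 means multiply and 14 means add (read off the machine-code example for x = 5 + y * 3); the program starts at address 0; and it halts when the PC runs past the last instruction. The function name `run` is hypothetical.

```python
MULT, ADD = 1, 14   # op-codes read off the machine-code example


def run(memory, registers):
    """Repeat the machine cycle until the PC runs off the end of the program."""
    pc = 0                                   # assumed base address of the program
    while pc < len(memory):
        word = memory[pc]                    # 1. fetch
        pc += 1                              #    PC now holds the next address
        opcode, dest, src, constant = word   # 2. decode into fields
        if opcode == MULT:                   # 3. execute
            registers[dest] = registers[src] * constant
        elif opcode == ADD:
            registers[dest] = registers[src] + constant
        else:
            raise ValueError(f"unknown op-code {opcode}")
    return registers


program = [(1, 1, 2, 3),    # R1 = R2 * 3
           (14, 1, 1, 5)]   # R1 = R1 + 5
print(run(program, {1: 0, 2: 4}))   # y = 4, so x = 5 + 4 * 3 = 17
```

Incrementing the PC right after the fetch mirrors the rule that the PC always holds the address of the next instruction while the current one executes.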


Processor Structure

- The processor contains registers for storage. Collectively they are referred to as the register file.

- An arithmetic logic unit (ALU) performs operations on data stored in registers.

- The way the devices in the processor are connected is called the data-path.

- The circuit that controls the data-path and all devices is the control unit.


The Data-Path

Examples (Simple 2-register data-path.)

R1 ← R1 + R2
R2 ← 0

Corresponding circuit: (Figure: an adder computes R1 + R2 and feeds the input of R1; a constant 0 feeds the input of R2.)

Operations performed:

1. Add the contents of R1 and R2, and put the result in register R1.

2. Set register R2 to zero.

The input to each register is calculated by circuitry called a computational unit.


Control Circuitry

Examples (Simple 2-input control.)

S1 : R1 ← R1 + R2
S2 : R2 ← 0

Corresponding circuit: (Figure: the data-path above, with control input S1 driving the load (LD) line of R1, and control input S2 driving the load (LD) line of R2.)

A register is opened for input when its load (LD) line is triggered by the control inputs. These descriptions are at the register transfer level (RTL). RTL shows a collection of connected devices.


Digital Circuitry

- Below the RTL level is the digital circuit level, or gate level.

- Gate-level circuits are composed of gates.

- Digital circuits represent Boolean values as voltages (for example, 0V for false and 5V for true).

- Gates compute Boolean functions from input signals.

Examples (AND gate: computes z = a ∧ b.)

(Figure: AND gate with inputs a and b, and output z.)


Combining Gates into Larger Functions

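Examples (Circuit that computes z = (a ∧ b) ∨ ¬c.)

(Figure: a and b feed an AND gate, c feeds a NOT gate, and the two results feed an OR gate whose output is z.)

(In addition to the AND gate, the circuit uses an OR gate and a NOT gate, or inverter.)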


Chapter 2: Number and Logic Systems

Topics covered:

- Computer systems use the base two (binary) number system.

- This system is cumbersome for people. A system that is less cumbersome for people, but still easily translatable to binary, is hexadecimal.

- The circuitry in computer systems is based on Boolean algebra.


Numbers

- Binary has two digits: 0 and 1.

- A binary digit is called a bit.

- Numbers are stored in a collection of bits of fixed width. The collection of bits is called a processor word. 13 in a 4-bit word would be 1101. In an 8-bit word, it would be 00001101.

- Decimal expansion: $365 = 3 \times 10^2 + 6 \times 10^1 + 5 \times 10^0$.

- Digits: the leftmost digit is referred to as the high-order digit, and the rightmost digit is the low-order digit.

- Decimal is base 10 (the radix in the expansion is ten), and has ten digits: 0 through 9.


Binary Numbers

- Binary expansion:

$00110101 = 0 \times 2^7 + 0 \times 2^6 + 1 \times 2^5 + 1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 2^5 + 2^4 + 2^2 + 2^0$

- Converting from binary to decimal: simply do the calculations in the binary expansion in decimal.

$00110101 = 2^5 + 2^4 + 2^2 + 2^0 = 32 + 16 + 4 + 1 = 53$
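The binary expansion translates directly into a short Python function (the name `binary_to_decimal` is just illustrative). Python’s built-in `int(text, 2)` performs the same conversion and serves as a cross-check.

```python
def binary_to_decimal(bits):
    """Evaluate the binary expansion: each bit times 2 to the power of its position."""
    value = 0
    for position, bit in enumerate(reversed(bits)):   # low-order bit is position 0
        value += int(bit) * 2 ** position
    return value


print(binary_to_decimal("00110101"))   # 53
print(int("00110101", 2))              # 53, Python's built-in conversion agrees
```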


Binary Numbers (cont.)

Converting from decimal to binary.

Examples (Converting 365 to binary using successive division.)

Calculation | Quotient | Remainder
365 ÷ 2     | 182      | 1
182 ÷ 2     | 91       | 0
91 ÷ 2      | 45       | 1
45 ÷ 2      | 22       | 1
22 ÷ 2      | 11       | 0
11 ÷ 2      | 5        | 1
5 ÷ 2       | 2        | 1
2 ÷ 2       | 1        | 0
1 ÷ 2       | 0        | 1


Understanding Successive Division

Successive division in decimal:

Examples

Calculation | Quotient | Remainder
365 ÷ 10    | 36       | 5
36 ÷ 10     | 3        | 6
3 ÷ 10      | 0        | 3

- Each division pulls off one digit of the number.
- Low-order digits are extracted first.
- Division by 10 extracts decimal digits. Division by 2 extracts bits.
- To form a binary number out of the results of successive division, list the remainders from last extracted to first extracted, left to right. For the example, that would be 365 = 101101101.
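Successive division can be sketched as a loop around Python’s `divmod`, which returns the quotient and the remainder together. The helper `to_base` is hypothetical; because it works for any base up to 16, the same loop reproduces both the decimal and the binary tables.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n, base=2):
    """Successive division: each remainder is the next digit, low-order first."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)   # the quotient feeds the next division
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))     # the last extracted digit goes on the left


print(to_base(365, 2))    # 101101101
print(to_base(365, 10))   # 365
```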


Hexadecimal Numbers

- Hexadecimal is base 16, with 16 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. (A - F represent the digits 10 - 15.)

- To convert from hexadecimal to decimal, use the hex expansion.

$A3F = 10 \times 16^2 + 3 \times 16^1 + 15 \times 16^0 = 2560 + 48 + 15 = 2623$


Hexadecimal, & Binary

- Converting hex from/into binary: a single hex digit is four binary digits.

- To convert from hex to binary, replace each hex digit with its corresponding 4-bit representation.

- To convert from binary to hexadecimal, replace each group of four bits, starting from the low-order end, by the corresponding hex digit.

Examples

A3F = 1010 0011 1111

0010111010001011 = 0010 1110 1000 1011 = 2E8B
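Both directions of the hex/binary conversion are table lookups on 4-bit groups, sketched here in Python (function names illustrative). Binary input is padded on the left to a multiple of four bits so that grouping starts at the low-order end.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_binary(hex_string):
    """Replace each hex digit with its 4-bit pattern."""
    return " ".join(format(HEX_DIGITS.index(d), "04b") for d in hex_string.upper())


def binary_to_hex(bits):
    """Pad on the left to a multiple of four bits, then map each group."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


print(hex_to_binary("A3F"))                # 1010 0011 1111
print(binary_to_hex("0010111010001011"))   # 2E8B
print(int("A3F", 16))                      # 2623, matching the hex expansion
```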


Adding Binary Numbers

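Examples (Decimal addition: 365 + 192 = 557.)

Column by column, from the low-order end: 5 + 2 + 0 = 7 (sum digit 7, carry-out 0); 6 + 9 + 0 = 15 (sum digit 5, carry-out 1); 3 + 1 + 1 = 5 (sum digit 5, carry-out 0).

- You add column by column.

- In each column, you add two operand digits and a carry-in digit.

- Each addition results in a sum digit and a carry-out digit.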


Adding Binary Numbers (cont.)

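Examples (Binary addition: 1011 + 0011 = 1110.)

Column by column, from the low-order end: 1 + 1 + 0 = 2 (sum bit 0, carry-out 1); 1 + 1 + 1 = 3 (sum bit 1, carry-out 1); 0 + 0 + 1 = 1 (sum bit 1, carry-out 0); 1 + 0 + 0 = 1 (sum bit 1, carry-out 0).

- The carry-in to the low-order column is 0.

- A carry-out of 1 occurs when the column sum is greater than or equal to 2.

- When a carry-out occurs, the sum digit is the column sum minus 2.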
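The column-by-column procedure, sketched in Python for equal-width bit strings (the function name `add_binary` is illustrative). It follows the rules above literally: a carry-in of 0 to the low-order column, a carry-out of 1 when the column sum is at least 2, and in that case a sum bit equal to the column sum minus 2.

```python
def add_binary(a, b):
    """Add two equal-width bit strings column by column.

    Returns (carry_out, sum_bits)."""
    carry = 0                                   # carry-in to the low-order column
    sum_bits = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column = int(bit_a) + int(bit_b) + carry
        if column >= 2:                         # carry-out of 1
            sum_bits.append(str(column - 2))
            carry = 1
        else:
            sum_bits.append(str(column))
            carry = 0
    return carry, "".join(reversed(sum_bits))


print(add_binary("1011", "0011"))   # (0, '1110')
```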


Representing Negative Numbers

- Computers support two numbering systems:
  - Unsigned integers - all bit configurations of the word are used to represent non-negative integers.
  - Signed integers - half of the processor word bit configurations are used to represent negative integers, and half are used to represent non-negative integers.

- For signed integers, the top bit is the sign bit.
  - A 0 bit indicates a non-negative number.
  - A 1 bit indicates a negative number.


Signed Notations

Notation         | 107       | -107
Sign-magnitude   | 0 1101011 | 1 1101011
One’s complement | 0 1101011 | 1 0010100
Two’s complement | 0 1101011 | 1 0010101

Notations (each describes how the negative value is formed):

- Sign-magnitude - formed by writing the magnitude in binary, and tacking on the correct sign bit.

- One’s complement - formed by inverting every bit of the positive number.

- Two’s complement - formed by adding 1 to the one’s complement.


Signed Notations (cont.)

- Problem: sign-magnitude has two values of 0:
  - +0: 00000000
  - -0: 10000000

- Problem: one’s complement also has two values of 0:
  - +0: 00000000
  - -0: 11111111

Examples (Two’s complement of +107 = 01101011.)

One’s complement: 10010100

Adding 1: 10010100 + 00000001 = 10010101


Desirable Properties of Two’s Complement

1. There is only one representation of 0. (This can be seen by taking the 2’s comp. of 0: take the 1’s comp. of 0 and add 1.)

11111111 + 00000001 = 1 00000000. The carry-out of the top bit is discarded, leaving 00000000.

2. Negating twice gives back the original number: −(−a) = a.
2’s-comp(00011010) = 11100110
2’s-comp(11100110) = 00011010

3. The negative of a number is its additive inverse (a + −a = 0). As an example, we do 26 + −26.

00011010 + 11100110 = 1 00000000. Again the carry-out is discarded, leaving 00000000.
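The three properties can be checked mechanically for 8-bit words. In this sketch (helper names hypothetical), inverting the bits is `~x` masked to 8 bits, and masking again after adding 1 plays the role of discarding the carry-out of the top bit.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1          # 0b11111111 keeps results inside an 8-bit word


def ones_complement(x):
    return ~x & MASK             # invert every bit of the word


def twos_complement(x):
    return (ones_complement(x) + 1) & MASK   # masking discards any carry-out


def bits(x):
    return format(x, "08b")


print(bits(twos_complement(0b01101011)))           # 10010101, i.e. -107
print(bits(twos_complement(0)))                    # 00000000: a single zero
a = 0b00011010                                     # 26
print(bits(twos_complement(twos_complement(a))))   # 00011010: -(-a) = a
print(bits((a + twos_complement(a)) & MASK))       # 00000000: a + -a = 0
```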


Shortcut 2’s Comp. Calculation

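A copy transformation can be used to calculate the 2’s comp. of a binary number: scanning from the low-order end, copy the trailing 0’s and the first 1 as is, and replace the rest of the bits with their 1’s comp.

Example: for 00100110, the trailing 0 and the first 1 (the low-order 10) are copied as is, and the remaining bits 001001 are inverted to 110110, giving 11011010.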
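The copy transformation, sketched in Python on bit strings, can be compared against the invert-and-add-1 definition for every 8-bit value. Both function names are hypothetical.

```python
def twos_complement_shortcut(bits):
    """Copy trailing 0s and the first 1 as is; invert every bit to their left."""
    first_one = bits.rfind("1")          # index of the low-order 1
    if first_one == -1:
        return bits                       # 0 is its own two's complement
    head = bits[:first_one]
    inverted = "".join("1" if b == "0" else "0" for b in head)
    return inverted + bits[first_one:]


def twos_complement_reference(bits):
    """Invert every bit and add 1, keeping the original width."""
    width = len(bits)
    return format((~int(bits, 2) + 1) & ((1 << width) - 1), f"0{width}b")


print(twos_complement_shortcut("00100110"))    # 11011010
print(twos_complement_reference("00100110"))   # 11011010
```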


Boolean Algebra

- Boolean algebra is an algebra, like arithmetic algebra, in which we form expressions from operators and operands.

- In arithmetic algebra, the expressions are used to describe functions that operate on numbers.

- In Boolean algebra, the expressions operate on Boolean values: false, written as 0, and true, written as 1.

Examples

Arithmetic expression: x + 2 · y

Boolean expression:a · b + a · b

("+" is the OR operator, and "·" is the AND operator.)


AND, OR, and NOT Operations

Truth tables:

a b | a · b | a + b
0 0 |   0   |   0
0 1 |   0   |   1
1 0 |   0   |   1
1 1 |   1   |   1

a | $\overline{a}$
0 | 1
1 | 0

- The truth table shows the output of a Boolean function for every possible value of input.

- It is split into an input half and an output half.

- To produce all input values, count in binary, with each row having a different count in the input half. (In the table for AND and OR, this gives the 2-bit counts 00, 01, 10, and 11.)
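Counting in binary over the inputs is exactly the order that `itertools.product((0, 1), repeat=n)` produces, so a truth table generator takes only a few lines of Python. The gate functions and `truth_table` are illustrative names.

```python
from itertools import product


def truth_table(function, n):
    """One row per input combination, generated by counting in binary."""
    return [(*inputs, function(*inputs)) for inputs in product((0, 1), repeat=n)]


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def NOT(a):
    return 1 - a


for row in truth_table(AND, 2):
    print(*row)   # input half, then the output
```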


Other Common Boolean Operators

Operators XOR, XNOR, NAND, and NOR.

a b | a ⊕ b | a ⊙ b | $\overline{a \cdot b}$ | $\overline{a + b}$
0 0 |   0   |   1   |   1   |   1
0 1 |   1   |   0   |   1   |   0
1 0 |   1   |   0   |   1   |   0
1 1 |   0   |   1   |   0   |   0


Operation Summary

- a · b (AND): outputs 1 iff all of its operands are 1.

- a + b (OR): outputs 1 iff any of its operands is 1.

- $\overline{a}$ (NOT): outputs 1 iff its operand is 0.

- a ⊕ b (XOR): outputs 1 iff its operands are not equal.

- $a \odot b = \overline{a \oplus b}$ (XNOR): outputs 1 iff its operands are equal.

- $\overline{a \cdot b}$ (NAND): outputs 1 iff at least one of its operands is 0.

- $\overline{a + b}$ (NOR): outputs 1 iff all of its operands are 0.


Boolean Expressions, & Truth Tables

Examples

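$g = (\overline{ab} + c) \oplus (a\overline{c} + \overline{b})$

a b c | ab | $\overline{ab}$ | $\overline{ab} + c$ | $\overline{c}$ | $a\overline{c}$ | $\overline{b}$ | $a\overline{c} + \overline{b}$ | g
0 0 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0
0 0 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0
0 1 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1
0 1 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1
1 0 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0
1 0 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0
1 1 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1
1 1 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1

- Boolean operators are combined to form Boolean expressions.
- To build a truth table from a Boolean expression, form columns for intermediate subexpressions.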


Boolean Expressions, & Truth Tables (cont.)

Examples (Converting from table to equation.)

a b c | h
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 0
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1

$h = \overline{a}b\overline{c} + a\overline{b}c + ab\overline{c} + abc$


Table to Equation

- h is 1 only if a is 0, b is 1, and c is 0; or a is 1, b is 0, and c is 1; or a is 1, b is 1, and c is 0; or a is 1, b is 1, and c is 1.

- These correspond to the rows in the truth table that have an output of 1.

- The product (AND) terms that contain all input variables are called minterms.

- Minterms correspond to rows in the truth table.

- They are often referred to by their number. Reading the input values of a row as a binary number yields the number. For example, for a = 0, b = 1, and c = 1, we get the minterm number 011, so $\overline{a}bc$ is Minterm 3.
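Copying minterms out of a truth table is mechanical, as this Python sketch shows. The function names are hypothetical, and a complemented variable is printed with a trailing ' because plain text has no overbar; the rows for h are the ones with output 1 in the table for h.

```python
from itertools import product


def minterms(function, names):
    """Return the minterm numbers and the sum-of-minterms expression of a
    function.  A complemented variable is written with a trailing ' here."""
    numbers, terms = [], []
    for number, inputs in enumerate(product((0, 1), repeat=len(names))):
        if function(*inputs):
            numbers.append(number)       # the row's inputs read as a binary number
            terms.append("".join(name if bit else name + "'"
                                 for name, bit in zip(names, inputs)))
    return numbers, " + ".join(terms)


def h(a, b, c):
    """h from its truth table: 1 in rows 010, 101, 110, and 111."""
    return (a, b, c) in {(0, 1, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}


numbers, expression = minterms(h, "abc")
print(numbers)      # [2, 5, 6, 7]
print(expression)   # a'bc' + ab'c + abc' + abc
```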


Don’t Care Conditions

An analogous situation arises with ordinary functions. Consider the recursive definition of B(n, k):

$B(n, k) = \begin{cases} 1, & k = 0 \\ 1, & n = k \\ B(n-1, k) + B(n-1, k-1), & 0 < k < n \end{cases}$

- When k = 0, the value of n doesn’t matter: we don’t care what it is; the function always returns 1.

- In Boolean algebra, we indicate don’t care conditions with the symbol "X".


Don’t Care Conditions (cont.)

Examples

a b c | f g
0 0 X | 0 1
0 1 0 | 1 X
0 1 1 | X 1
1 0 0 | 0 1
1 0 1 | 1 0
1 1 X | 1 0

- When the don’t care is on the output side, we do not care what the output is, and the designer can choose what to output, to optimize a circuit.

- When the don’t care is on the input side, the given output is for both a 0 and a 1 value of that input. (The last line of the table is for both Minterm 110 and Minterm 111.)


Boolean Simplification using Identities

Identities allow us to transform expressions into equivalent expressions.

Examples (Arithmetic expression transformation using the distributive law.)

$(2a + 6) \cdot (2a - 6) = (2a + 6) \cdot 2a - (2a + 6) \cdot 6 = 2^2a^2 + 6 \cdot 2a - (6 \cdot 2a + 6^2)$

(Distributive law: a(b + c) = ab + ac.)

There are other identities that allow further transformation.


Boolean Identities

Simplifying Boolean expressions allows us to build circuits that have fewer components, consume less power, are faster, and take less physical space.

Identities:

1. Double negation: $\overline{\overline{a}} = a$

2. Contradiction: $a \cdot \overline{a} = 0$

3. Tautology: $a + \overline{a} = 1$

4. Commutativity: a + b = b + a, a · b = b · a

5. Associativity: a + (b + c) = (a + b) + c, a · (b · c) = (a · b) · c

6. Identity elements: a + 0 = a, a · 1 = a

7. Zero elements: a + 1 = 1, a · 0 = 0

8. Idempotency: a + a = a, a · a = a


Boolean Identities (cont.)

Identities:

9. Distributive: a · (b + c) = a · b + a · c, a + bc = (a + b) · (a + c)

10. DeMorgan’s: $\overline{a + b} = \overline{a} \cdot \overline{b}$, $\overline{a \cdot b} = \overline{a} + \overline{b}$

11. Definition of XOR: $a \oplus b = a \cdot \overline{b} + \overline{a} \cdot b$

- DeMorgan’s Law specifies how to bring a negation into a group. It also specifies two algebraic forms for the NAND and NOR operators.

- The XOR operator has an algebraic equivalent. So does the XNOR operator:

$a \odot b = ab + \overline{a}\,\overline{b}$


Example Simplification using Identities

Examples

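$(a\overline{b} + c) \oplus \overline{bc}$

$= \overline{a\overline{b} + c} \cdot \overline{bc} + (a\overline{b} + c) \cdot \overline{\overline{bc}}$ (R11)

$= (\overline{a\overline{b}} \cdot \overline{c})(\overline{b} + \overline{c}) + (a\overline{b} + c)\overline{\overline{bc}}$ (R10)

$= (\overline{a\overline{b}} \cdot \overline{c})(\overline{b} + \overline{c}) + (a\overline{b} + c)bc$ (R1)

$= ((\overline{a} + \overline{\overline{b}}) \cdot \overline{c})(\overline{b} + \overline{c}) + (a\overline{b} + c)bc$ (R10)

$= (\overline{a} + b) \cdot \overline{c}(\overline{b} + \overline{c}) + (a\overline{b} + c)bc$ (R1)

$= \overline{c} \cdot (\overline{a} + b)(\overline{b} + \overline{c}) + bc \cdot (a\overline{b} + c)$ (R4)

$= (\overline{c} \cdot \overline{a} + \overline{c}b)\overline{b} + (\overline{c} \cdot \overline{a} + \overline{c}b)\overline{c} + ca\overline{b}b + bc$ (R9, R4, R8)

$= \overline{b}(\overline{c} \cdot \overline{a} + \overline{c}b) + \overline{c}(\overline{c} \cdot \overline{a} + \overline{c}b) + 0 + bc$ (R4, R2, R7)

$= \overline{b}\,\overline{c} \cdot \overline{a} + \overline{b}\,\overline{c}b + \overline{c} \cdot \overline{c} \cdot \overline{a} + \overline{c} \cdot \overline{c}b + bc$ (R6, R9)


Example Simplification Using Identities (cont.)

Examples

$= \overline{b}\,\overline{c} \cdot \overline{a} + 0 + \overline{c} \cdot \overline{a} + \overline{c}b + bc$ (R8, R4, R2, R7)

$= \overline{c} \cdot \overline{a}(\overline{b} + 1) + b(\overline{c} + c)$ (R6, R4, R9)

$= \overline{c} \cdot \overline{a} \cdot 1 + b \cdot 1$ (R3, R7)

$= \overline{c} \cdot \overline{a} + b$ (R6)

- Algebraic simplification is difficult, requiring strategic planning.

- To allow automation of simplification, a more mechanical method is needed.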
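Because algebraic simplification is error-prone, a mechanical cross-check helps: two Boolean functions are equal exactly when they agree on every row of the truth table. This sketch compares the example expression, read as $(a\overline{b} + c) \oplus \overline{bc}$, with the result $\overline{c} \cdot \overline{a} + b$ on all eight input rows. The function names are hypothetical.

```python
from itertools import product


def NOT(x):
    return 1 - x


def original(a, b, c):
    # (a b' + c) XOR (b c)'
    return ((a & NOT(b)) | c) ^ NOT(b & c)


def simplified(a, b, c):
    # c' a' + b
    return (NOT(c) & NOT(a)) | b


def equivalent(f, g, n):
    """Exhaustively compare two Boolean functions on all 2**n inputs."""
    return all(f(*x) == g(*x) for x in product((0, 1), repeat=n))


print(equivalent(original, simplified, 3))   # True
```

Exhaustive checking costs 2^n evaluations, which is fine for small n but is a check, not a simplification method; the K-maps below are the mechanical method for finding the simpler form.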


Boolean Simplification using Karnaugh-Maps

- There are only four Boolean functions with fewer than two parameters:

1. $f_0 = 0$
2. $f_1 = 1$
3. $f_{identity}(x) = x$
4. $f_{inverse}(x) = \overline{x}$

- The smallest interesting functions have two independent variables.

- K-maps come in differing sizes, depending on the number of independent variables.


K-Maps of Two Variables

Examples

$g = \overline{a}\,\overline{b} + a\overline{b} + ab$

a b | g
0 0 | 1
0 1 | 0
1 0 | 1
1 1 | 1

K-map for g (rows: a, columns: b):

      b=0  b=1
a=0    1    0
a=1    1    1

$g = a + \overline{b}$

Combining Cells in the K-Map

The 2-variable K-map is a square with one variable on each axis.

Cell combination:

1. Adjacent cells that contain 1 can be combined.

2. Combined cells must form a rectangular group.

3. The size of a group must be a power of two.

4. The groups copied out must cover all cells that are 1. (Notice, however, that cells may be covered by several groups.)

5. The group names are ORed together to form a simplified equation.

6. A group’s name is the AND of all variables that do not change their value within the group; a variable appears complemented if its value in the group is 0.

7. The covering groups must be as large as possible.


K-Maps for Functions of Three Variables

Examples

a b c | h
0 0 0 | 1
0 0 1 | 0
0 1 0 | 1
0 1 1 | 0
1 0 0 | 1
1 0 1 | 1
1 1 0 | 1
1 1 1 | 0

K-map for h (rows: a, columns: bc):

      bc=00  01  11  10
a=0      1    0   0   1
a=1      1    1   0   1

$h = \overline{c} + a\overline{b}$


K-Maps for Functions of Three Variables (cont.)

- The 3-variable K-map is two 2-variable K-maps stuck together.

- The vertical axis is one of the variables, and the horizontal axis is both of the other two variables.

- The horizontal coordinates are listed in Gray code sequence.

- Between consecutive elements of the Gray code sequence, only one bit changes.

- K-maps wrap around, both vertically and horizontally. This means that the cells on the left are next to the cells on the right of the Karnaugh map.
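One common way to generate a Gray code sequence like the one on the K-map axes is the reflected binary code, where the i-th code is i XOR (i >> 1). This is a sketch of that standard construction, not necessarily how the maps in these notes were produced; the function name is illustrative.

```python
def gray_code(n_bits):
    """Reflected binary Gray code: consecutive codes differ in exactly one bit."""
    return [format(i ^ (i >> 1), f"0{n_bits}b") for i in range(2 ** n_bits)]


print(gray_code(2))   # ['00', '01', '11', '10'], the K-map column order
```

The last and first codes also differ in one bit, which is why it makes sense for the map to wrap around.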


K-Maps for Functions of Four Variables

Examples (Four variable)

a b c d | z
0 0 0 0 | 1
0 0 0 1 | 1
0 0 1 0 | 1
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 0
0 1 1 0 | 1
0 1 1 1 | 0
1 0 0 0 | 1
1 0 0 1 | 1
1 0 1 0 | 1
1 0 1 1 | 1
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 1
1 1 1 1 | 0

K-map for z (rows: ab, columns: cd):

       cd=00  01  11  10
ab=00     1    1   1   1
ab=01     1    0   0   1
ab=11     0    0   0   1
ab=10     1    1   1   1


K-Maps for Functions of Four Variables (cont.)

Examples (Four variable (cont.))

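$z = \overline{b} + c\overline{d} + \overline{a} \cdot \overline{c}\,\overline{d}$

Because the map wraps around, the last group can be enlarged to the 2 × 2 group $\overline{a}\,\overline{d}$ (cells 0000, 0010, 0100, and 0110), giving the simpler $z = \overline{b} + c\overline{d} + \overline{a}\,\overline{d}$.

K-maps of more than four variables become large, and it is best to use a software tool to do simplification, rather than draw a map by hand.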


Don’t Care Conditions in Karnaugh-Maps

Examples

a b c | m
0 0 0 | 1
0 0 1 | X
0 1 0 | 0
0 1 1 | 1
1 0 0 | X
1 0 1 | 1
1 1 0 | X
1 1 1 | 0

K-map for m (rows: a, columns: bc):

      bc=00  01  11  10
a=0      1    X   1   0
a=1      X    1   0   X


Don’t Care Conditions in Karnaugh-Maps (cont.)

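- Don’t cares in the output can be assigned either a value of 0 or 1, whichever yields the best simplification, allowing larger groups to be pulled out of the K-map.

- Without using the don’t cares: $m = \overline{a}\,\overline{b}\,\overline{c} + \overline{a}bc + a\overline{b}c$

- Using the don’t cares (treating 001 and 100 as 1, and 110 as 0): $m = \overline{b} + \overline{a}c$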
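With don’t cares, a simplified expression is correct as long as it matches every row whose output is specified. This Python sketch checks that rule for $m = \overline{b} + \overline{a}c$; `None` stands for a don’t care, and the helper names are hypothetical.

```python
# Truth table for m; None marks a don't care.
M_TABLE = {(0, 0, 0): 1, (0, 0, 1): None, (0, 1, 0): 0, (0, 1, 1): 1,
           (1, 0, 0): None, (1, 0, 1): 1, (1, 1, 0): None, (1, 1, 1): 0}


def m_simplified(a, b, c):
    return (1 - b) | ((1 - a) & c)     # b' + a'c


def agrees_where_specified(f, table):
    """A cover is correct if it matches every row that is not a don't care."""
    return all(f(*row) == out for row, out in table.items() if out is not None)


print(agrees_where_specified(m_simplified, M_TABLE))   # True
```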


Chapter 3: Digital Circuitry

- Processors are digital circuits.
- Digital circuits have wires that carry one of two possible signals:
  - low: a low voltage, like 0V.
  - high: a high voltage, like 5V.

- We are not concerned with the actual voltage, and so we call these signals 0 and 1.

- How 0 and 1 are assigned to voltages is irrelevant to us.
- Types of digital circuits:

  - Combinational circuits: they have no memory. The outputs can change immediately when the inputs are changed.

  - Sequential circuits: they have memory. The outputs may not change when the circuit is "remembering" a previous value.


Combinational Circuits

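Logical gates:

(Figure: gate symbols for AND (output ab), OR (output a + b), NOT or inverter (output $\overline{a}$), XOR (output a ⊕ b), NAND (output $\overline{ab}$), NOR (output $\overline{a + b}$), and XNOR (output a ⊙ b).)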


Using Gates

Examples

Boolean function: f = (a ⊕ b)(b + c)

Schematic: (Figure: a gate-level circuit with inputs a, b, and c, and output f.)


Using Gates (cont.)

Examples (cont.)

Alternate drawing. (Figure: the same circuit for f, drawn a different way.)


Buffers

The triangle on the inverter is a buffer, and the open circle is the inversion element.

Inverter types:

(Figure: inverter (input a, output $\overline{a}$), simple buffer (input a, output a), and tri-state switch (input a, control c, output m).)


Simple Buffer

It boosts power. It is used in fan-out situations, where splitting a signal weakens it.

(Figure: a signal x fanned out to several lines through simple buffers.)


Tri-State Switch

The control line, c, when cleared, turns the flow off (sets the output to a state of high impedance, Z). The output has three states: Z, 0, and 1.

a c | m
0 0 | Z
0 1 | 0
1 0 | Z
1 1 | 1


Common Combinational Circuits

- The decoder - transforms a numeric code into trigger signals.

- The encoder - translates trigger signals into a code.

- The multiplexer - routes one of multiple inputs onto a single output line.

- The adder - adds binary signals that represent numbers.


The Decoder

(Figure: Dec2-4, a decoder with inputs x1 and x0, and outputs p0, p1, p2, and p3.)

- A decoder is a switch. It turns on (sets) one of several output lines, and turns off (clears) the rest.

- The code x gives the index of the line to turn on.

- As an example, if x = 01, p1 would be 1, and all other outputs would be 0.

- Decoder sizes: k-to-$2^k$, where k is the number of inputs and $2^k$ is the number of outputs.


The Decoder (cont.)

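Examples (2-4 decoder)

x1 x0 | p0 p1 p2 p3
 0  0 |  1  0  0  0
 0  1 |  0  1  0  0
 1  0 |  0  0  1  0
 1  1 |  0  0  0  1

$p_0 = \overline{x_1} \cdot \overline{x_0}$

$p_1 = \overline{x_1}x_0$

$p_2 = x_1\overline{x_0}$

$p_3 = x_1x_0$

(Schematic of the 2-4 decoder, implementing the four equations above.)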
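The four decoder equations can be written directly as Python bit operations, using 1 - x for inversion. The function name `decoder` is illustrative; the loop reproduces the decoder table.

```python
def decoder(x1, x0):
    """2-4 decoder: exactly one output line is set, selected by the code x1 x0."""
    p0 = (1 - x1) & (1 - x0)
    p1 = (1 - x1) & x0
    p2 = x1 & (1 - x0)
    p3 = x1 & x0
    return p0, p1, p2, p3


for x1 in (0, 1):
    for x0 in (0, 1):
        print(x1, x0, *decoder(x1, x0))
```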


The Encoder

(Figure: Enc4-2, an encoder with inputs p0, p1, p2, and p3, and outputs x1 and x0.)

- An encoder checks several circuits, with only one circuit on (set), and reports a code indicating which circuit is on.

- The code, x, gives the index of the line that is on.

- As an example, if p0 = 0, p1 = 0, p2 = 1, and p3 = 0, then the output x would be 10.

- Encoder sizes: $2^j$-to-j.


The Encoder (cont.)

Examples

p0 p1 p2 p3 | x1 x0
 1  0  0  0 |  0  0
 0  1  0  0 |  0  1
 0  0  1  0 |  1  0
 0  0  0  1 |  1  1

(Rows that are not shown are don’t cares.)

x1 = p2 + p3

x0 = p1 + p3

(Figure: 4-variable K-maps for x1 and x0, with p0p1 and p2p3 on the axes; every cell other than the four rows above is a don’t care.)

Encoder Schematic

Examples (cont.)

(Schematic of the 4-2 encoder, implementing x1 = p2 + p3 and x0 = p1 + p3.)
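A Python sketch of the 4-2 encoder equations (function name illustrative). It assumes exactly one input line is set, because every other input pattern is a don’t care in the table; note that p0 never appears in either equation.

```python
def encoder(p0, p1, p2, p3):
    """4-2 encoder; assumes exactly one input line is set (other rows are
    don't cares).  p0 is unused: its code, 00, is what remains when p1, p2,
    and p3 are all 0."""
    x1 = p2 | p3
    x0 = p1 | p3
    return x1, x0


print(encoder(0, 0, 1, 0))   # (1, 0): line p2 is on
```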


The Multiplexer (MUX)

(Figure: MUX4-1, a multiplexer with data inputs i0, i1, i2, and i3, selector inputs s1 and s0, and output p.)

- A MUX routes one of several inputs to a single output.

- Only one input is allowed to pass through. The other inputs are stopped.

- The input allowed through is specified by the code s.

- As an example, if s = 11, the output p would be whatever is on the line i3.

- MUX sizes: $2^k$-to-1. Width of the selector line s: k bits.


The Multiplexer (cont.)

Examples

i0 i1 i2 i3 s1 s0 | p
 0  X  X  X  0  0 | 0
 1  X  X  X  0  0 | 1
 X  0  X  X  0  1 | 0
 X  1  X  X  0  1 | 1
 X  X  0  X  1  0 | 0
 X  X  1  X  1  0 | 1
 X  X  X  0  1  1 | 0
 X  X  X  1  1  1 | 1

$p = i_0\overline{s_1}\,\overline{s_0} + i_1\overline{s_1}s_0 + i_2s_1\overline{s_0} + i_3s_1s_0$

(Simplification is either by K-map, or by copying out each minterm, ignoring the don’t care conditions.)
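The sum-of-products equation for p, written in Python (function name illustrative). Each product term lets its data input through only when the selector code matches that input’s index.

```python
def mux4(i0, i1, i2, i3, s1, s0):
    """4-1 multiplexer written as the sum-of-products equation for p."""
    n1, n0 = 1 - s1, 1 - s0               # inverted selector bits
    return (i0 & n1 & n0) | (i1 & n1 & s0) | (i2 & s1 & n0) | (i3 & s1 & s0)


print(mux4(0, 0, 0, 1, 1, 1))   # 1: s = 11 selects i3
```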


MUX Schematic

Examples (cont.)

(Schematic of the 4-1 MUX, implementing the equation for p.)


MUX Composition

Examples (4-1 MUX from 2-1 MUX’s)

(Figure: two first-round MUX2-1s, one choosing between i0 and i1 and one choosing between i2 and i3, both controlled by s0; a final MUX2-1, controlled by s1, chooses between their outputs to produce p.)

- The MUXes are structured into a tournament, in a process called interleaving.

- The low-order bit, s0, is used to choose between the odd and even indexes in the first round.

- The high-order bit, s1, chooses between the two first-round heats in the final round.
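The tournament structure can be sketched in Python by composing a 2-1 MUX three times (function names illustrative). The exhaustive test checks that it behaves the same as a direct 4-1 selection.

```python
def mux2(i0, i1, s):
    """2-1 multiplexer: p = i0 s' + i1 s."""
    return (i0 & (1 - s)) | (i1 & s)


def mux4_tournament(i0, i1, i2, i3, s1, s0):
    """4-1 MUX built from three 2-1 MUXes."""
    low_heat = mux2(i0, i1, s0)    # first round: even index i0 vs odd index i1
    high_heat = mux2(i2, i3, s0)   # first round: even index i2 vs odd index i3
    return mux2(low_heat, high_heat, s1)   # final round


print(mux4_tournament(0, 0, 1, 0, 1, 0))   # 1: s = 10 selects i2
```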


The Adder

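(Figure: two single-column additions, 0 + 0 + 1 giving cout s = 01, and 1 + 1 + 1 giving cout s = 11, next to the adder symbol with inputs a, b, and cin, and outputs s and cout.)

An adder adds three 1-bit numbers, a, b, and cin, to form a sum bit, s, and a carry bit, cout.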


The Adder (cont.)

cin a b | cout s
  0 0 0 |    0 0
  0 0 1 |    0 1
  0 1 0 |    0 1
  0 1 1 |    1 0
  1 0 0 |    0 1
  1 0 1 |    1 0
  1 1 0 |    1 0
  1 1 1 |    1 1

$s = c_{in} \oplus a \oplus b$

$c_{out} = c_{in}a + c_{in}b + ab$

Checkerboard pattern for s:

- XOR - odd parity cell coordinates (the odd function).

- XNOR - even parity cell coordinates (the even function).

K-maps (rows: cin, columns: ab):

cout   ab=00  01  11  10
cin=0     0    0   1   0
cin=1     0    1   1   1

s      ab=00  01  11  10
cin=0     0    1   0   1
cin=1     1    0   1   0
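The adder equations can be checked against ordinary arithmetic. Since the adder counts how many of its three inputs are 1, the 2-bit number cout s must equal cin + a + b. The function name `full_adder` is illustrative.

```python
from itertools import product


def full_adder(a, b, cin):
    """One-bit adder: s = cin XOR a XOR b, cout = cin a + cin b + a b."""
    s = cin ^ a ^ b
    cout = (cin & a) | (cin & b) | (a & b)
    return cout, s


# cout s, read as a 2-bit number, is the count of 1 inputs.
for cin, a, b in product((0, 1), repeat=3):
    cout, s = full_adder(a, b, cin)
    assert 2 * cout + s == cin + a + b
    print(cin, a, b, cout, s)
```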


Adder Schematic

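(Schematic of the adder, with inputs a, b, and cin, and outputs s and cout, implementing the equations for s and cout.)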


The Ripple-Carry Adder

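To add multi-bit numbers, we use several adders, one per column of the long addition problem, each adding the a operand bit, the b operand bit, and the carry-in. The carry-out becomes the carry-in of the next column.

Example: 1011 + 1101 = 11000. Column by column, from the low-order end: 1 + 1 + 0 = 2 (sum bit 0, carry-out 1); 1 + 0 + 1 = 2 (sum bit 0, carry-out 1); 0 + 1 + 1 = 2 (sum bit 0, carry-out 1); 1 + 1 + 1 = 3 (sum bit 1, carry-out 1). The final carry-out of 1 becomes the leading bit of the result.

Notice that the carry "ripples" up from the low-order column to the high-order column. (The calculation of one column has to wait until the calculation of the previous column is complete.)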


The Ripple-Carry Adder (cont.)

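(Figure: a 4-bit ripple-carry adder built from four 1-bit adders, +0 to +3. Adder +i adds ai and bi to produce si; cin enters adder +0, each adder’s carry-out feeds the next adder’s carry-in, and adder +3 produces cout. Its interface diagram, +4-bit, has inputs a and b, output s, and the single lines cin and cout.)

(In the interface diagram, the lines marked 4 are buses, each standing for four lines.)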
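A ripple-carry adder is a chain of 1-bit adders in which each stage’s carry-out feeds the next stage’s carry-in. This Python sketch stores numbers as bit lists, low-order bit first (an assumption of the sketch), and repeats the 1-bit adder equations so that it stands on its own. Function names are illustrative.

```python
def full_adder(a, b, cin):
    s = cin ^ a ^ b
    cout = (cin & a) | (cin & b) | (a & b)
    return cout, s


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-width numbers given as bit lists, low-order bit first.
    Each stage must wait for the carry produced by the stage before it."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        carry, s = full_adder(a, b, carry)
        sum_bits.append(s)
    return carry, sum_bits


# 1011 + 1101, written low-order bit first
cout, s = ripple_carry_add([1, 1, 0, 1], [1, 0, 1, 1])
print(cout, s[::-1])   # 1 [1, 0, 0, 0], i.e. 11000
```

The loop makes the ripple delay visible: stage i cannot start until stage i - 1 has produced its carry.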


Sequential Circuits

- Sequential circuits are called sequential because they flow through a sequence of states.

- Code example:

    sum = 0
    for i = 1 to n do
        sum = sum + i

- The state is the set of variables and their values (here, sum and i).


The Clock

The clock is a device that produces a regular "beat" signal.

(Figure: a clock waveform alternating between 0 and 1 over time t, with one period marked.)

- The signal has a rising edge and a falling edge.
- The time for one cycle is called the period.
- The frequency is the number of cycles per second.

$F = \frac{1}{P}$, where F is the frequency and P is the period.

- The unit of measurement for frequency is the hertz (Hz): 1 Hz = (1 cycle) / (1 second). (50 MHz = 50,000,000 cycles per second, so a 50 MHz clock has a period of 1/50,000,000 s = 20 ns.)


The Clock (cont.)

- The clock is used to synchronize the state changes of sequential circuits.

- One of the two signal edges is designated as the trigger edge.

- All state changes occur on the trigger edge. This simplifies the interaction between circuits.

- In our discussion, we assume that the trigger edge is the rising edge.

- Although most circuitry in the processor is synchronized, there are a small number of asynchronous circuits. Not having to wait for the trigger edge for a state change helps speed up asynchronous circuitry.


Storage Devices

Types:

- The latch - an unclocked device that stores one bit.

- The flip-flop - a 1-bit clocked storage device.

- Device subtypes:
  - D-type
  - J-K-type


The D-latch

(Figure: D-latch symbol with inputs D and C, and outputs Q and $\overline{Q}$.)

The D-latch is controlled by the input C. When C = 1, the latch is loaded with the value D. When C = 0, the latch locks its current value, ignoring D. The output Q is the value stored in the latch.

D-latch excitation table:

D C | Q(1)
X 0 | Q(0)
0 1 | 0
1 1 | 1

(Q(0) - old latch value, Q(1) - new latch value)


The D-latch (cont.)

Examples

Example timing diagram for the D-latch. (Figure: waveforms for D, C, and Q; Q follows D while C = 1, and holds its value while C = 0.)


The D-flip-flop

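(Figure: D-flip-flop symbol with input D, clock input Clk (marked >), and outputs Q and $\overline{Q}$.)

On the D-flip-flop, the control signal is the clock. The flip-flop loads only at the trigger edge.

D-flip-flop excitation table:

D Clk  | Q(1)
X no ↑ | Q(0)
0 ↑    | 0
1 ↑    | 1

(↑ indicates a passing trigger edge; "no ↑" indicates that no trigger edge passes.)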


The D-flip-flop (cont.)

Examples

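Example timing diagram for the D-flip-flop. (Figure: waveforms for D, Clk, and Q; Q changes only at rising edges of Clk, taking the value D has at that edge.)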


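The J-K Storage Devices

(Figure: the J-K-latch, with inputs J and K, and the J-K flip-flop, with inputs J, K, and Clk; both have outputs Q and $\overline{Q}$.)

J-K excitation tables (latch, then flip-flop):

J K | Q(1)
0 0 | Q(0)
0 1 | 0
1 0 | 1
1 1 | $\overline{Q(0)}$

J K Clk  | Q(1)
X X no ↑ | Q(0)
0 0 ↑    | Q(0)
0 1 ↑    | 0
1 0 ↑    | 1
1 1 ↑    | $\overline{Q(0)}$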


The J-K Storage Device (cont.)

Operations of the J-K Device:

- Lock: the device keeps its current value. This operation is specified with J = K = 0.

- Set: the value of the device changes to 1. This operation is specified with J = 1, K = 0.

- Reset: the value of the device changes to 0. This operation is specified with J = 0, K = 1.

- Complement: the value of the device is toggled from 0 to 1, or from 1 to 0. This operation is specified with J = K = 1.
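The four J-K operations, sketched as a Python next-state function applied once per trigger edge. The function name `jk_next` and the input sequence are illustrative.

```python
def jk_next(j, k, q):
    """Value of a J-K flip-flop after a trigger edge."""
    if j == 0 and k == 0:
        return q          # lock
    if j == 1 and k == 0:
        return 1          # set
    if j == 0 and k == 1:
        return 0          # reset
    return 1 - q          # complement (J = K = 1)


q = 0
for j, k in [(1, 0), (0, 0), (1, 1), (1, 1), (0, 1)]:   # one pair per edge
    q = jk_next(j, k, q)
    print(j, k, q)        # q goes 1, 1, 0, 1, 0
```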


Flip-Flops with Extra Pins

(Figure: D-flip-flop with input D, clock Clk, the extra pins ST, CL, and LD, and outputs Q and $\overline{Q}$.)

- The set pin (ST) is asynchronous (changes do not wait for the clock pulse, but occur instantly). It initializes the flip-flop to 1.

- The clear pin (CL) is also asynchronous, and initializes the flip-flop value to 0.

- The load pin (LD), when cleared, disables the clock signal, locking the value of the flip-flop.


Flip-Flops with Extra Pins (cont.)

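It is possible to implement the LD line on flip-flops that do not have a load input, using a feedback loop.

(Figure: a 2-1 MUX, controlled by LD, feeds the D input of the flip-flop. Its 0 input is the flip-flop’s own output Q, and its 1 input is the new data D, so on each clock edge the flip-flop reloads its old value unless LD = 1.)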


Sequential Design using the FSM

The tool for sequential circuit design is the finite state machine (FSM). The state diagram is a graphical representation of an FSM.

Examples (FSA0)

(State diagram: states 00/1, 01/1, 10/1, and 11/0. On input 0: 00 → 01, 01 → 01, 10 → 00, 11 → 00. On input 1: 00 → 00, 01 → 10, 10 → 11, 11 → 11.)

- States are the circles. Their labels are S/P, where S is the state number and P is the output.

- Transitions are the arrows. They are labeled with I, the input. Transitions from a given state must have mutually exclusive labels.


The FSM and State Diagrams

Examples (FSA0 (cont.))

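The interface for FSA0: (Figure: block FSA0 with input i, clock input Clk, and output p.)

The FSM shows the output at each state, and the transition from one state to the next on the clock pulse, based on the input.

Examples (FSA1)

(State diagram: states 0/10 and 1/01, where the output is c1c0. From state 0, inputs ab = 00 or 10 stay in state 0, and ab = 01 or 11 go to state 1. From state 1, ab = 00 or 11 stay in state 1, and ab = 01 or 10 go to state 0. The interface has inputs a, b, and Clk, and outputs c1 and c0.)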


The FSM and the State Transition Table

The state transition table is a tabular representation of the statediagram.

Examples (FSA0)

i $Q(0)_1$ $Q(0)_0$ | $Q(1)_1$ $Q(1)_0$ p
0 0 0 | 0 1 1
0 0 1 | 0 1 1
0 1 0 | 0 0 1
0 1 1 | 0 0 0
1 0 0 | 0 0 1
1 0 1 | 1 0 1
1 1 0 | 1 1 1
1 1 1 | 1 1 0


The FSM and the State Transition Table (cont.)

- The table has an input half and an output half.

- In the input half, you list the circuit inputs and the bits of the current state number, Q(0).

- In the output half, you list the next state, Q(1), and the circuit outputs.

- Each row represents a transition.

- Circuit output is based on the current state.


The FSM and the State Transition Table (cont.)

Examples (FSA1)

The transition table for FSA1.

a b Q(0) | Q(1) c1 c0
0 0 0 | 0 1 0
0 0 1 | 1 0 1
0 1 0 | 1 1 0
0 1 1 | 0 0 1
1 0 0 | 0 1 0
1 0 1 | 0 0 1
1 1 0 | 1 1 0
1 1 1 | 1 0 1


State Diagrams, and Transition Tables; Building One Representation from the Other

From table to diagram.

- Lay down states using numbers from the current state column.

- Fill in outputs from the output columns.

- Draw arrows, one per row in the state table, from the current state to the next state.

- Fill in the input labels on the diagram from the input columns in the table.


State Diagrams, and Transition Tables; Building One Representation from the Other (cont.)

From diagram to table.

I Create the state table heading, listing the input variables, the bits of the current state number, the bits of the next state number, and the output variables.

I Fill in all possible bit configurations on the input half of the table.

I On each row, fill in the output for the current state.

I On each row, fill in the next state, using the arrow in the state diagram that corresponds to the row in the transition table.

Bits in the state number: for $m$ states, you need $\lceil \log_2 m \rceil$ bits.
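As a quick check of the formula, the sketch below computes $\lceil \log_2 m \rceil$ with integer arithmetic, assuming the states are numbered 0 to m - 1. The function name `state_bits` is hypothetical.

```python
def state_bits(m: int) -> int:
    """Return ceil(log2 m): the bits needed to number m states 0 .. m-1."""
    if m < 1:
        raise ValueError("a machine needs at least one state")
    # The largest state number is m - 1; its bit length is the register width.
    # For m >= 2 this equals ceil(log2 m), computed without floating point.
    return (m - 1).bit_length()


for m in (2, 3, 4, 5, 8, 9):
    print(f"{m} states -> {state_bits(m)} bits")
```

Using `bit_length` avoids the rounding surprises that `math.log2` can cause for large m.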


Moore versus Mealy Machines

A Moore machine associates output with the current state only. A Mealy machine associates output with the current state and the input. As a result, in the Mealy diagram the output label is on the transition rather than on the state.

Examples (Mealy machine for FSA1)

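[Figure: Mealy state diagram for FSA1 with states 0 and 1; each transition is labeled ab/c1c0. State 0: self-loop on 00/10, 10/10, and to state 1 on 01/01, 11/01. State 1: self-loop on 00/01, 11/01, and to state 0 on 01/10, 10/10.]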


Implementing a Sequential Design

The Structure of a sequential circuit.

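[Figure: structure of a sequential circuit: a control circuit takes the input and the current state Q(0) from the register, and produces the output and the next state Q(1), which the register stores on the clock edge.]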

I The register is a collection of flip-flops that store the current state number.

I The control circuit is a combinational circuit that calculates the output and the next state.


Implementing a Sequential Design (cont.)

Examples (FSA0)

Equations for the next state and the output are derived using K-maps, in the usual way.

$Q(1)_1 = i\,Q(0)_0 + i\,Q(0)_1 = i\,(Q(0)_0 + Q(0)_1)$

$Q(1)_0 = \bar{i}\,\overline{Q(0)_1} + i\,Q(0)_1 + Q(0)_1\,\overline{Q(0)_0}$

$p = \overline{Q(0)_1} + \overline{Q(0)_0}$

I Use one flip-flop to store each bit of the current state number.

I The input of the flip-flop is the next state, and the output of the flip-flop is the current state.
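The FSA0 equations can be checked mechanically. This minimal Python sketch assumes the overbars lost in extraction read as $Q(1)_1 = i(Q(0)_0 + Q(0)_1)$, $Q(1)_0 = \bar{i}\,\overline{Q(0)_1} + iQ(0)_1 + Q(0)_1\overline{Q(0)_0}$, and $p = \overline{Q(0)_1} + \overline{Q(0)_0}$. It prints the transition table that these equations produce. The helper names are hypothetical.

```python
from itertools import product


def bit_not(x: int) -> int:
    """Complement of a single bit (the overbar in the equations)."""
    return 1 - x


def fsa0_next(i: int, q1: int, q0: int) -> tuple[int, int]:
    """Next state (Q(1)_1, Q(1)_0) of FSA0, from the K-map equations."""
    d1 = i & (q0 | q1)                                   # i (Q0 + Q1)
    d0 = (bit_not(i) & bit_not(q1)) | (i & q1) | (q1 & bit_not(q0))
    return d1, d0                                        # d0 = i'Q1' + iQ1 + Q1Q0'


def fsa0_output(q1: int, q0: int) -> int:
    """Moore output p: depends on the current state only."""
    return bit_not(q1) | bit_not(q0)                     # Q1' + Q0'


# Regenerate the state transition table: i Q1 Q0 | Q1' Q0' | p
for i, q1, q0 in product((0, 1), repeat=3):
    d1, d0 = fsa0_next(i, q1, q0)
    print(i, q1, q0, "|", d1, d0, "|", fsa0_output(q1, q0))
```

Comparing the printed rows with the transition table row by row is a quick way to catch a K-map mistake.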



Examples (FSA0 (cont.))

Schematic of FSA0.

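[Figure: schematic of FSA0: two D flip-flops, D1 and D0, hold Q(0)1 and Q(0)0; gates implementing the equations compute Q(1)1, Q(1)0, and p from the input i and the current state.]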



Examples (FSA1)

Equations.

$Q(1) = ab + \bar{a}\,\overline{Q(0)}\,b + \bar{a}\,Q(0)\,\bar{b} = ab + \bar{a}\,(Q(0) \oplus b)$

$c_1 = \overline{Q(0)}$

$c_0 = Q(0)$

Schematic.

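[Figure: schematic of FSA1: one D flip-flop holds Q(0); gates on the inputs a and b compute the next state Q(1), and the outputs c1 and c0 are taken from the flip-flop.]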
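A Moore machine is easy to simulate: compute the output from the state, then let the clock load the next state. The sketch below does this for FSA1. It assumes the equations read $Q(1) = ab + \bar{a}(Q(0) \oplus b)$, $c_1 = \overline{Q(0)}$, and $c_0 = Q(0)$, and that the machine starts in state 0. The names `fsa1_next`, `fsa1_outputs`, and `run` are hypothetical.

```python
def fsa1_next(a: int, b: int, q: int) -> int:
    """Next state Q(1) = ab + a'(Q(0) xor b)."""
    return (a & b) | ((1 - a) & (q ^ b))


def fsa1_outputs(q: int) -> tuple[int, int]:
    """Moore outputs (c1, c0) = (not Q(0), Q(0))."""
    return 1 - q, q


def run(inputs: list[tuple[int, int]], q: int = 0) -> list[tuple[int, int]]:
    """Apply one (a, b) pair per clock pulse; record (c1, c0) in each state visited."""
    trace = [fsa1_outputs(q)]
    for a, b in inputs:
        q = fsa1_next(a, b, q)        # the flip-flop loads the next state
        trace.append(fsa1_outputs(q))
    return trace


print(run([(0, 1), (0, 0), (1, 0), (1, 1)]))
```

Because the outputs depend only on the state, the trace changes only after a clock pulse, never in response to a, b alone.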


Sequential Circuit Analysis

Going from schematic to FSM: reverse the procedure used in design.

Examples

Schematic:

[Figure: schematic with input z, output p, and two D flip-flops, D1 and D0.]


Sequential Circuit Analysis (cont.)

Examples (cont.)

Equations (by following connections in the schematic):

$p = \overline{Q(0)_0 \oplus Q(0)_1}$

$Q(1)_0 = \bar{z}\,Q(0)_0$

$Q(1)_1 = \overline{Q(0)_0} + z\,Q(0)_1$

Table:

z  Q(0)1  Q(0)0 | Q(1)1  Q(1)0 | p
0    0      0   |   1      0   | 1
0    0      1   |   0      1   | 0
0    1      0   |   1      0   | 0
0    1      1   |   0      1   | 1
1    0      0   |   1      0   | 1
1    0      1   |   0      0   | 0
1    1      0   |   1      0   | 0
1    1      1   |   1      0   | 1
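The analysis step can also be done by evaluating the equations for every combination of input and state. This sketch assumes the garbled symbol in the equation for p is XNOR, and that the dropped overbars are $Q(1)_0 = \bar{z}Q(0)_0$ and $Q(1)_1 = \overline{Q(0)_0} + zQ(0)_1$. These readings agree with the table above. The name `analyze` is hypothetical.

```python
from itertools import product


def analyze() -> list[tuple[int, ...]]:
    """Rows (z, Q1, Q0, Q1', Q0', p) computed from the schematic's equations."""
    rows = []
    for z, q1, q0 in product((0, 1), repeat=3):
        p = 1 - (q0 ^ q1)            # p = (Q0 xor Q1)'  (XNOR)
        d0 = (1 - z) & q0            # Q(1)_0 = z' Q(0)_0
        d1 = (1 - q0) | (z & q1)     # Q(1)_1 = Q(0)_0' + z Q(0)_1
        rows.append((z, q1, q0, d1, d0, p))
    return rows


for row in analyze():
    print(*row)
```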


Sequential Circuit Analysis (cont.)

Examples (cont.)

State Diagram (copy out rows as transitions):

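[Figure: state diagram with states labeled state/p: 00/1, 01/0, 10/0, 11/1. Transitions on z: 00 goes to 10 on 0,1; 01 stays in 01 on 0 and goes to 00 on 1; 10 stays in 10 on 0,1; 11 goes to 01 on 0 and to 10 on 1.]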


Common Sequential Circuits

I Registers are used to store multiple-bit binary numbers.

I They use one flip-flop to store each of the bits.

I Bit numbering: $x = 1100 = x_3x_2x_1x_0$

I Register types:
  I Parallel-load register.
  I Shift register.
  I Counter.


The Parallel-Load Register

It is a multi-bit flip-flop. The LD input causes the MUXes either to feed the current value back, for a lock operation (LD = 0), or to feed in a new value, for a load operation (LD = 1).

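[Figure: 4-bit parallel-load register: four D flip-flops, D3 to D0, each fed by a 2-input MUX selected by LD; MUX input 0 is the flip-flop's own output Q and input 1 is the new data bit d (d3 to d0).]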


The Shift Register

The input SH controls the operation: SH = 0 locks the register, and SH = 1 performs a shift. Input MUXes implement the operations, choosing either a feedback loop or the output of the adjacent bit.

[Figure: shift-left and shift-right registers, each with a carry-in cin and a carry-out cout.]
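The MUX-per-bit behavior of the shift-left register can be modeled one clock edge at a time. In this sketch the register value is a Python int, `shl_reg_step` is a hypothetical name, and cout is assumed to show the register's top bit, the one that leaves on a shift.

```python
def shl_reg_step(q: int, sh: int, cin: int, width: int = 4) -> tuple[int, int]:
    """One clock edge of a shift-left register; returns (new Q, cout).

    SH = 0: every MUX selects its own flip-flop's output, so Q is locked.
    SH = 1: bit k loads bit k-1, cin enters bit 0, and the old top bit
    is presented on cout.
    """
    mask = (1 << width) - 1
    cout = (q >> (width - 1)) & 1      # top bit, the one that leaves on a shift
    if sh == 0:
        return q, cout
    return ((q << 1) | cin) & mask, cout


q = 0b1011
q, cout = shl_reg_step(q, sh=1, cin=0)
print(format(q, "04b"), cout)          # shifted value and the bit shifted out
```

Chaining cout of one register into cin of the next is how wider shift registers are built from 4-bit ones.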


The Shift Register (cont.)

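[Figure: the 4-bit shift-left register symbol (inputs SH, cin, and clock; outputs Q and cout) and its internals: four D flip-flops, D3 to D0, each fed by a 2-input MUX selected by SH. Input 0 of each MUX is the flip-flop's own output; input 1 is the output of the next lower bit, with cin feeding bit 0 and bit 3 driving cout.]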


The Counter

I An input IN chooses an operation: IN = 0 locks the register, and IN = 1 increments it.

I Incrementing takes the register through the sequence 0000, 0001, 0010, ..., 1111, 0000, ..., one value per clock cycle.

I It uses an adder to increment.

I The input MUX now chooses between the feedback and the adder.
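Here is a behavioral sketch of the 4-bit counter, one clock edge per call. The name `counter_step` is hypothetical. Treating cout as the adder's carry only while IN = 1 is an assumption about how the schematic gates it.

```python
def counter_step(q: int, inc: int, width: int = 4) -> tuple[int, int]:
    """One clock edge of a counter; returns (new Q, cout).

    IN = 0: the MUXes select the feedback, so Q is locked.
    IN = 1: the MUXes select the adder output Q + 1, which wraps from
    all ones back to zero; cout is the adder's carry out of the top bit.
    Assumption: cout is reported only when the counter actually increments.
    """
    if not inc:
        return q, 0
    total = q + 1
    return total & ((1 << width) - 1), total >> width


q, values = 0, []
for _ in range(17):                  # seventeen clock pulses with IN = 1
    values.append(q)
    q, carry = counter_step(q, 1)
print(values)
```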


The Counter (cont.)

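[Figure: the 4-bit counter symbol (inputs IN and clock; outputs Q and cout) and its internals: four D flip-flops, D3 to D0, each fed by a 2-input MUX selected by IN that chooses the flip-flop's own output (0) or the corresponding bit of Q + 0001 from a chain of adders (1).]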


The Standard Register

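[Figure: the standard 4-bit register symbol (inputs d, LD, IN, CL, and clock; outputs Q and cout) and its internals: an encoder turns LD, IN, and CL into a 2-bit select code for a 4-input MUX whose inputs include the feedback from Q, the data input d, the adder output Q + 0001, and the constant 0000; the MUX output feeds the D flip-flops.]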


The Standard Register (cont.)

I We combine increment, load, and clear operations to form a register that we use regularly.

I All 4-bit inputs are shown in abbreviated notation, using a bus.

I A MUX chooses among four computation units, each of which calculates the result of one of the operations.

I An encoder turns the three trigger lines into a code that can be used to operate the MUX.
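The standard register's MUX choices map naturally onto one function per clock edge. This is a sketch, not the circuit. The name `std_reg_step` is hypothetical. The slide does not say what happens when two trigger lines are asserted together, so the sketch simply rejects that case.

```python
def std_reg_step(q: int, d: int, ld: int, inc: int, cl: int, width: int = 4) -> int:
    """One clock edge of the standard register: load, increment, clear, or hold.

    Assumption: at most one trigger line (LD, IN, CL) is asserted per cycle,
    as a plain encoder expects; simultaneous triggers are rejected here
    instead of guessing a priority.
    """
    if ld + inc + cl > 1:
        raise ValueError("assert at most one of LD, IN, CL")
    mask = (1 << width) - 1
    if ld:
        return d & mask            # MUX selects the parallel input d
    if inc:
        return (q + 1) & mask      # MUX selects the adder (1111 wraps to 0000)
    if cl:
        return 0                   # MUX selects the constant 0000
    return q                       # no trigger: MUX selects the feedback


q = std_reg_step(0, d=0b1110, ld=1, inc=0, cl=0)   # load 1110
q = std_reg_step(q, d=0, ld=0, inc=1, cl=0)        # 1111
q = std_reg_step(q, d=0, ld=0, inc=1, cl=0)        # wraps to 0000
print(format(q, "04b"))
```

A priority encoder would be the usual way to make simultaneous triggers well defined in hardware.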


Chapter 4: Devices and the Bus

I Devices that interact with the processor are mostly external to the processor, but on the motherboard.

I Device types (collectively known as external devices):
  I Memory devices.
  I Peripheral devices.

I Connection:
  I Direct connection: the processor is connected to each device using dedicated connections.
  I Bus connection: the processor is connected to all devices via a single shared line.

I Comparison:
  I Wiring complexity: bus connection produces simpler wiring.
  I Concurrent communication: direct connection allows several devices to communicate with the processor simultaneously.


Devices and the Bus (cont.)

[Figure: the CPU, a memory unit, and two I/O devices, all connected to a single bus.]


Memory

I Stores multi-bit values.

I Each storage location is called a word.

I The memory unit has a size $l \times w$, where $l$ is the length of the unit (number of words) and $w$ is the width of the unit (number of bits per word).

I Words are given addresses (numbers) to identify them.

[Figure: an 8x4 memory: eight words with addresses 0 to 7.]


Memory (cont.)

Memory operations:

I Read : produce the contents of a particular memory location.

I Write: store a given value in a particular memory location.

Memory types:

I Read Only Memory (ROM). (Allows a read operation only)

I Random Access Memory (RAM). (Allows both read and write operations.)

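[Figure: block symbols. RAM 8x4: 3-bit address A, 4-bit data input Din, 4-bit data output Dout, and control lines W and E. ROM 8x4: 3-bit address A, 4-bit Dout, and control line E.]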


Memory Types

I ROMs are used, for example, to provide manufacturer information to an OS (like the BIOS).

I RAMs are the standard working memory in a computer.

I Ports
  I A: the address of the word.
  I Din: the input data for a write operation.
  I W and E: control the operation on a RAM unit. Assert W for a write operation, and assert E for a read operation.
  I Dout: the output data for a read operation.


Memory Types (cont.)

Performing a read operation.

1. Assert the desired address on the A port.

2. Strobe the E line, and allow time for the data to present itself on the Dout port.

Performing a write operation.

1. Set up the inputs.
  1.1 Assert the desired address on the A port.
  1.2 Assert the desired data on the Din port.

2. Perform the operation by strobing (setting and then resetting) the W line.


Memory Composition

The size of a memory unit is $2^k \times m$, where $2^k$ is its length and $m$ is its width. The unit has a $k$-bit address port, to represent addresses between $0$ and $2^k - 1$.

Examples

An $8 \times 4$ memory has eight 4-bit words. An address is 3 bits ($8 = 2^3$), to specify addresses between 0 and 7 (000 to 111).

Composition types:

I Horizontal - Creating a wider memory unit out of thinner units.

I Vertical - Creating a longer memory unit out of shorter units.


Horizontal Composition

Examples (Building an 8× 4 RAM from two 8× 2 RAMs.)

[Figure: an 8x4 RAM built from two 8x2 RAMs. Both units share the 3-bit address A and the W and E lines; one unit stores data bits 3-2 (Din and Dout bits 3-2), and the other stores bits 1-0.]


Vertical Composition

Examples (Building an 8× 4 ROM from four 2× 4 ROMs.)

[Figure: an 8x4 ROM built from four 2x4 ROMs. Address bits A2 and A1 drive a 2-to-4 decoder that enables one of the four units, address bit A0 is fed to every unit as its internal address, and the units' 4-bit outputs share Dout.]


Vertical Composition (cont.)

I The ROM is split into four sections. Each section is covered by a small ROM unit.

I We number the small units 0 to 3 for our example. The 3-bit address is split into a unit number and an internal address. (The field sizes depend on the composition being performed.)

[Figure: the address A2 A1 A0 split into a unit number (A2 A1) and an internal address (A0).]

I The unit number is used to enable the correct ROM, and the internal address is fed into the ROM as its address signal.

I When the unit number is the high-order part of the address, as here, the scheme is called high-order interleaving.

I When the unit number is the low-order part of the address, the scheme is called low-order interleaving.
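Splitting an address into a unit number and an internal address is just bit slicing. The sketch below shows both interleaving schemes for the 8 x 4 ROM built from four 2 x 4 ROMs (2 unit bits, 1 internal bit). All function names are hypothetical.

```python
def split_high_order(addr: int, internal_bits: int) -> tuple[int, int]:
    """High-order interleaving: (unit number, internal address) = (top bits, low bits)."""
    return addr >> internal_bits, addr & ((1 << internal_bits) - 1)


def split_low_order(addr: int, unit_bits: int) -> tuple[int, int]:
    """Low-order interleaving: (unit number, internal address) = (low bits, top bits)."""
    return addr & ((1 << unit_bits) - 1), addr >> unit_bits


def enable_lines(unit: int, units: int = 4) -> list[int]:
    """The 2-to-4 decoder: exactly one unit's enable line is 1."""
    return [1 if k == unit else 0 for k in range(units)]


# 2 unit bits and 1 internal bit: high-order uses A2 A1 as the unit number,
# low-order uses A1 A0.
for addr in range(8):
    print(format(addr, "03b"), split_high_order(addr, 1), split_low_order(addr, 2))
```

With high-order interleaving, consecutive addresses stay in the same unit (0 and 1 both go to unit 0). With low-order interleaving they rotate through the units (0, 1, 2, 3 go to units 0, 1, 2, 3).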


Internal Memory Structure

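[Figure: internal structure of a 4x2 RAM: a 2-to-4 address decoder on A selects one of the registers Reg0 to Reg3; AND gates combine each decoder line with W to drive that register's LD input, and with E to enable a tri-state switch onto Dout; Din is wired to every register.]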


Internal Memory Structure (cont.)

I Shown is a 4× 2 RAM. Each word is stored in a register.

I An address decoder turns an address into trigger lines.

I AND gates check for the selected row and the correct operation.

I The input, Din, is presented to every register, and enters a register only if its LD input is triggered.

I The output of each row is allowed onto the output bus, Dout, only if its tri-state switch is enabled.

I A ROM has the same output structure, and no input.


RAM Types

RAM units can be classified as follows.

I Dynamic RAM (DRAM). It uses capacitors to store bits. (Charged is a 1, and depleted is a 0.) Capacitors leak over time, so a capacitor memory has to be rewritten (refreshed) periodically to persist.

I Static RAM (SRAM). It uses latches to store Boolean values.

Comparison:

I Access Speed: DRAM units tend to be slower than SRAM, because charging and sensing capacitors takes time.

I Density: Capacitors can be built much smaller than gates, so DRAM can be built more compactly than SRAM.

I Cost: Storage using capacitor technology is cheaper to buildthan storage using the technology used in gates.


ROM Types

I ROM: Standard read-only memory. Contents are burnt in on creation.

I PROM: Programmable ROM. Chips are originally blank. Using a PROM burner, you load its contents. Once burnt, the contents are permanent.

I EPROM: Erasable PROM. The chip contains a window through which you shine UV light, which erases the chip contents. So, the chip can be reprogrammed.

I EEPROM: Electrically Erasable PROM. The chip is erasable, like the EPROM, but electrically, using a special high-voltage input pin.


Word and Byte Addressing

I The same memory is used to store both integers and characters, which have radically different sizes.

I A character requires 8 bits (1 byte) to represent 256 possible keyboard characters.

I Use a combined memory. For a 16-bit integer, each word would be 16 bits. It would be split into 2 bytes, allowing us to store 2 characters in it.

I Each byte has an address.

I Word addresses are multiples of 2 (the number of bytes per word). Byte addresses are multiples of 1.


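Word and Byte Addressing (cont.)

Addressing for an 8 × 16 memory (16 bytes):

[Figure: an 8x16 RAM whose words have byte addresses 0, 2, 4, ..., 14; each word holds two bytes, numbered 0 and 1.]

Instruction to store data into a byte:

movb M[7], R0

or into a word:

movw M[6], R0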


Machine Byte Order

Addressing Schemes

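I little-endian: bytes are numbered starting from the least significant end of the word, so the lowest address holds the least significant byte.

I big-endian: bytes are numbered starting from the most significant end of the word, so the lowest address holds the most significant byte.

[Figure: byte numbering 0 to 3 within a word under little-endian and big-endian ordering.]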
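Python's built-in `int.to_bytes` and `int.from_bytes` take the byte order as an argument, which makes the difference easy to see. This sketch assumes 16-bit words (two bytes each) stored in a byte-addressed memory. The helpers `store_word` and `load_word` are hypothetical.

```python
def store_word(memory: bytearray, addr: int, value: int, order: str) -> None:
    """Store a 16-bit word at byte address addr; order is "little" or "big"."""
    if addr % 2:
        raise ValueError("word addresses are multiples of 2")
    memory[addr:addr + 2] = value.to_bytes(2, order)


def load_word(memory: bytearray, addr: int, order: str) -> int:
    """Read the 16-bit word back using the same byte order."""
    return int.from_bytes(memory[addr:addr + 2], order)


little = bytearray(16)   # an 8 x 16 memory viewed as 16 bytes
big = bytearray(16)
store_word(little, 6, 0x1234, "little")
store_word(big, 6, 0x1234, "big")
print(hex(little[6]), hex(little[7]))   # lowest address holds the low-order byte
print(hex(big[6]), hex(big[7]))         # lowest address holds the high-order byte
```

Reading a word back with the wrong byte order swaps its bytes, which is the classic symptom of an endianness mismatch.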


Peripheral Devices

I Input devices. These are devices from which the processor reads data. The keyboard and pointer devices like the mouse are examples of input devices.

I Output devices. These are devices to which the processor writes data. The monitor and printer are examples of such devices.

I I/O devices. These are devices that combine both an input element and an output element. The processor can write to, and read from, these devices. An example of such a device is a disk drive.


Peripheral Device Types

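[Figure: device interfaces. Output device: a register (Reg) loaded from Din when LD is asserted. Input device: a switch that places the input on Dout when E is asserted. I/O device: both a register with Din and LD, and a switch with Dout and E.]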


Device Interface

I The output device has a register that is loaded with the output value.

I The input device has a switch that lets the input out of the output port.

I The I/O device has both interfaces.


Device Polling

I Problem: There is no way to determine when a device is ready with new input/output.

I Solution: Every device has a READY bit associated with it.

I The READY bit is raised to 1 by the device when the device is free.

I When the read/write operation is performed, the READY bit is lowered to 0 by the processor.

I Before the processor accesses the device, it checks the READY bit to see if it is 1, indicating the device is ready.


Interrupts

I When using device polling, the processor spends a lot of time“busy waiting”. (In a loop where it checks the READY bit,over and over and over.)

I With interrupts, the processor is sent a signal when the READY bit is raised. It no longer needs to busy wait.

I The processor can now work on another process while the I/O is being processed.

I When an interrupt is received, the processor suspends the process it is executing and jumps to an Interrupt Service Routine (ISR). The ISR handles the interrupt request.

I When the ISR is done, the processor jumps back to where it left off in the other process.


Interrupts (cont.)

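Interrupts have many causes. The device uses a CAUSE register to pass the ISR a cause code, so that the ISR knows how to handle the interrupt.

[Figure: memory holding a user program and an ISR; on an interrupt, the PC moves from the user program to the ISR, and back when the ISR finishes.]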


Software Interrupts

I Even a user program can request to be interrupted (a software interrupt). Why?

I System security:
  I User mode: the process in user mode is limited in what operations it can perform.
  I Kernel mode: the process in kernel mode is unlimited.

I To do a kernel operation, the user program (operating in user mode) requests a service of the OS by asking to be interrupted, and passes the ISR information about its request.


The CPU

The processor is a device that executes the machine cycle over and over. Each time the machine cycle is executed, a single machine instruction is executed.

Machine cycle:

1. Fetch. The PC contains the address of the next instruction to be executed. The instruction indicated by the PC is fetched into the CPU from memory, and the PC is updated.

2. Decode. The processor determines the operation to be performed and the location of the operands required.

3. Execute. Any operands are fetched, the operation is performed, and the result is written to the destination.


Bus Communication

Bus structure:

[Figure: the bus: an address bus A, a data bus D, and a control bus Ct carrying the Rd and Wt lines.]


Bus Use

There are three buses:

I The data bus carries data from the processor to a device, and from a device to the processor.

I The address bus carries addresses to memory units and other devices.

I The control bus carries the control signals: read, to input devices, and write, to output devices.

How does a bus device know whether a message is for it or for some other device?


Bus Addressing

Every device has a collection of bus addresses that belong to it. The bus address is split into two fields:

I The unit number – every device on the bus is given a number that identifies it.

I The internal address – memory units are sent addresses for read and write operations.


Bus Addressing Example

I An 8 × 4 RAM unit; addresses range from 0000000 to 0000111.

I A 16 × 4 ROM unit; addresses range from 0010000 to 0011111.

I An input device with address 0100000.

I An output device with address 0110000.

I An I/O device with address 1000000.


Bus Addressing Example (cont.)

Deduce:

I The unit number is 3 bits (there are 5 devices).

I The internal address is 4 bits (the largest memory unit has length 16).

I The bus address is 7 bits. This is the size of the address bus.

I The data bus has width 4 (all units are at most 4 bits wide).

I The RAM should perform a read operation only if the CPU sends a unit number of 000 and a read request.

I The RAM should perform a write operation only if the CPU sends a unit number of 000 and a write request.


Bus Addressing Example (cont.)

I The ROM performs a read when the CPU asks for a read operation on Unit 001.

I The input device performs a read when the request is for a read from Unit 010.

I The output device performs a write operation when the request is for a write operation on Unit 011.

I The I/O device performs a read or write when that operation is requested on Unit 100.

I For each device control input we use 2 gates:
  1. Addressing gate: checks for the proper unit number.
  2. Operation gate: checks for the proper operation (read or write).
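The addressing gate and the operation gate of the bus example can be mimicked in a few lines. This is a behavioral sketch under the example's assumptions (3-bit unit number, 4-bit internal address, five devices). The names `DEVICES` and `respond` are hypothetical.

```python
UNIT_BITS = 3
INTERNAL_BITS = 4
ADDRESS_BITS = UNIT_BITS + INTERNAL_BITS    # width of the address bus

# Unit number -> (device name, operations it answers), as in the example.
DEVICES = {
    0b000: ("RAM", {"read", "write"}),
    0b001: ("ROM", {"read"}),
    0b010: ("input", {"read"}),
    0b011: ("output", {"write"}),
    0b100: ("I/O", {"read", "write"}),
}


def respond(address: int, op: str):
    """Return (device, internal address) for the device that responds, or None."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit on the 7-bit address bus")
    unit = address >> INTERNAL_BITS                  # high field: unit number
    internal = address & ((1 << INTERNAL_BITS) - 1)  # low field: internal address
    entry = DEVICES.get(unit)                        # addressing gate
    if entry is None or op not in entry[1]:          # operation gate
        return None
    return entry[0], internal


print(respond(0b0000101, "read"))    # RAM word 5
print(respond(0b0011111, "read"))    # ROM word 15
print(respond(0b0010000, "write"))   # the ROM ignores writes
```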


Example Memory Connection

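[Figure: memory connection. The 7-bit address bus A, the 4-bit data bus D, and the Rd and Wt lines connect to a RAM 8x4 and a ROM 16x4. The low address bits (A3-0) feed the memory address ports; gates on the unit number bits A6-4, combined with Rd or Wt, drive the RAM's E and W inputs and the ROM's E input.]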


Example Peripheral Connection

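[Figure: peripheral connection. The input device's E, the output device's LD, and the I/O device's LD and E are each driven by an addressing gate on the unit number bits A6-4, combined with an operation gate on Rd or Wt; the device registers and switches connect to the 4-bit data bus D.]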


Chapter 5: The Register Transfer Language Level

I RTL (register transfer language) provides a tool for describing circuitry at a higher level than the FSM or truth table.

I The tools we have developed so far are structural descriptions. They describe the structure of a circuit.

I RTL is a behavioral description. It describes the behavior of a circuit.

I An RTL description is a collection of µ-instructions.

I Each µ-instruction describes a circuit.

I A µ-instruction is composed of one or more µ-operations.


RTL Design

A µ-instruction has two parts (separated by a colon):

I A data-path specification, describing how data flows through the circuit.

I A control part, indicating when the µ-operations are performed.

Examples (RTL implementation)

T : R1← R2

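[Figure: the output of R2 feeds the input of R1, and the control signal T drives R1's LD line; both registers are clocked.]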


RTL Design (cont.)

Examples (Use of trigger gates to generate control.)

ab + c : R1← 0

[Figure: trigger gates combine a, b, and c into the control signal, which drives the CL (clear) line of R1.]


RTL Design (cont.)

Examples

ab : R1← R1 + R2,R2← 3

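[Figure: a trigger gate on a and b drives the LD lines of R1 and R2; R1 loads the adder output R1 + R2, and R2 loads the constant 3.]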


RTL Design (cont.)

Examples (RTL with input choice, using a MUX and OR gate.)

ab : R1← R1 + R2

ab : R1← 3

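[Figure: a 2-input MUX selects R1's input, either the adder output R1 + R2 or the constant 3; an OR of the two trigger conditions on a and b drives R1's LD line.]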


A Larger Example

Examples (Use of decoder instead of trigger gates.)

$\bar{x}\,\bar{y}$ : R1 ← R1 + R3, R2 ← 0

$\bar{x}\,y$ : R3 ← R3 + 1

$x\,\bar{y}$ : R2 ← R1, R0 ← R3

$x\,y$ : R1 ← R0, R3 ← 5

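[Figure: a 2-to-4 decoder on x and y produces trigger lines 0 to 3, which drive the LD, CL, and IN control inputs of registers R0 to R3; an adder, a 2-input MUX, and the constant 5 supply the register inputs.]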


RTL Analysis

To generate an RTL description from a schematic:

1. Write down the control signals, using the decoder values. For the previous example, Option 0 gives us $\bar{x}\,\bar{y}$, Option 1 gives us $\bar{x}\,y$, and so on.

2. Follow the decoder trigger lines to determine the µ-operations performed. For example, for Option 2, the trigger line triggers the LD line on R0 and the IN line on R3. This means that one µ-operation is performed on R0, and another is performed on R3.

3. Follow the data-path lines to determine the exact µ-instruction. In the example, the input port of R0 is connected to R3, giving us the µ-instruction

$x\,\bar{y}$ : R0 ← R3, R3 ← R3 + 1

4. Repeat the procedure for all decoder options.


From Structural to Behavioral Description

The method of transforming from circuit diagram to RTL is not universal. Here is a universal method.

i  Q(0)1  Q(0)0 | Q(1)1  Q(1)0 | p
0    0      0   |   0      1   | 1
0    0      1   |   0      1   | 1
0    1      0   |   0      1   | 1
0    1      1   |   0      0   | 0
1    0      0   |   0      0   | 1
1    0      1   |   1      0   | 1
1    1      0   |   1      1   | 1
1    1      1   |   1      1   | 0


Structural to Behavioral (cont.)

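Copy each row as a µ-instruction.

$\bar{i}\,\bar{Q}_1\bar{Q}_0$ : Q ← 1, p ← 1

$\bar{i}\,\bar{Q}_1 Q_0$ : Q ← 1, p ← 1

$\bar{i}\,Q_1\bar{Q}_0$ : Q ← 1, p ← 1

$\bar{i}\,Q_1 Q_0$ : Q ← 0, p ← 0

$i\,\bar{Q}_1\bar{Q}_0$ : Q ← 0, p ← 1

$i\,\bar{Q}_1 Q_0$ : Q ← 2, p ← 1

$i\,Q_1\bar{Q}_0$ : Q ← 3, p ← 1

$i\,Q_1 Q_0$ : Q ← 3, p ← 0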


Problems with Reverse Engineering

Building a circuit from this µ-program, with data-path and control, yields a poor design compared to the original design. The mechanical translation loses semantic information.

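[Figure: the circuit built mechanically from the µ-program: the 3-bit value i Q1 Q0 drives the select lines of two 8-input MUXes, one supplying the next value of the 2-bit register Q (from the constants 1, 1, 1, 0, 0, 2, 3, 3) and one supplying p (from the constants 1, 1, 1, 0, 1, 1, 1, 0).]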


Common Processor µ-Instructions

RTL is good at describing high-level circuitry, but it can be used at all levels.

Examples (Combinational Circuit: the MUX)

$\bar{s}_1\bar{s}_0$ : p ← $i_0$

$\bar{s}_1 s_0$ : p ← $i_1$

$s_1\bar{s}_0$ : p ← $i_2$

$s_1 s_0$ : p ← $i_3$

Examples (Sequential Circuit: the J-K flip-flop)

$\bar{J}K$ : Q ← 0

$J\bar{K}$ : Q ← 1

$JK$ : Q ← $\overline{Q}$

Examples (Sequential Circuit: the counter)

IN : Q ← Q + 1


Processor µ-Instructions

Arithmetic

1. Addition: X ← X + Y

2. Subtraction: X ← X − Y

3. Increment: X ← X + 1

4. Decrement: X ← X − 1

5. Transfer: X ← Y

6. Clear: X ← 0

Logic

1. AND: X ← X ∧ Y

2. OR: X ← X ∨ Y

3. NOT: X ← $\overline{X}$

4. XOR: X ← X ⊕ Y


Processor µ-Instructions (cont.)

Shift

1. Logic Shift left: X ← shl X

2. Logic Shift right: X ← shr X

3. Circular shift left: X ← cil X

4. Circular shift right: X ← cir X

5. Arithmetic shift left: X ← ashl X

6. Arithmetic shift right: X ← ashr X

Memory

1. Read: X ← M[AR]

2. Write: M[AR]← X


Processor µ-Instructions (cont.)

Logic operations are bitwise (they are done column by column):

$0110 \wedge 0101 = 0100$, since column by column $0 \wedge 0 = 0$, $1 \wedge 1 = 1$, $1 \wedge 0 = 0$, and $0 \wedge 1 = 0$.

Shifts of 1110

1. shl: 1100

2. shr: 0111

3. cil: 1101

4. cir: 0111

5. ashl: 1100

6. ashr: 1111
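The six shifts of 1110 listed above can be reproduced on 4-bit values held in Python ints. This is a sketch that assumes a 4-bit register. The function names mirror the RTL mnemonics, and ashl is modeled exactly like shl, as in the list.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1
TOP = 1 << (WIDTH - 1)          # the sign bit of a 4-bit value


def shl(x: int) -> int:
    return (x << 1) & MASK                          # 0 enters on the right


def shr(x: int) -> int:
    return x >> 1                                   # 0 enters on the left


def cil(x: int) -> int:
    return ((x << 1) & MASK) | (x >> (WIDTH - 1))   # top bit wraps to bit 0


def cir(x: int) -> int:
    return (x >> 1) | ((x & 1) << (WIDTH - 1))      # bit 0 wraps to the top


def ashl(x: int) -> int:
    return (x << 1) & MASK                          # same bit pattern as shl


def ashr(x: int) -> int:
    return (x >> 1) | (x & TOP)                     # sign bit is copied


for op in (shl, shr, cil, cir, ashl, ashr):
    print(op.__name__, format(op(0b1110), "04b"))
print("and", format(0b0110 & 0b0101, "04b"))
```

Masking with `MASK` after every left shift is what keeps the Python int behaving like a fixed-width register.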

Memory addresses are specified using the address register (AR). Fetching an instruction from the location specified by the PC requires two µ-operations:

AR ← PC

X ← M[AR]


Shift types

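[Figure: the shift types, left and right. shl and shr shift in a 0 and send the bit shifted out to cout; cil and cir feed the bit shifted out back in at the other end; ashl shifts in a 0 at the right, and ashr copies the sign bit into the leftmost position.]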


Algorithmic Machines

RTL is typically considered a declarative language: it specifies how a circuit is put together.

We can, however, use it as a procedural language, specifying a sequence of steps, or actions.

Examples (The Teapot Example)

Design a control circuit for a teapot.

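[Figure: a teapot with an on/off switch (sensor S, release X), a heating element H, and a temperature sensor T.]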


Teapot Example

I Inputs
  I S, the switch sensor: S = 0 if the switch is off, and S = 1 if the switch is on.
  I T, the temperature sensor: T = 0 if the liquid is too cool, and T = 1 if the liquid is hot enough.

I Outputs
  I X, turns off the on/off switch: X = 1 to turn the switch off, and X = 0 to leave the switch state unchanged.
  I H, turns on the heating element: H = 0 turns off the element, and H = 1 turns on the element.


Teapot Control Algorithm

stuck = 0
loop forever
    if S and not stuck and not T then
        H = 1
        X = 0
        stuck = 0
    else if S and not stuck and T then
        H = 0
        X = 1
        stuck = 1
    else if S and stuck then
        H = 0
        X = 0
        stuck = 1
    else if not S then
        H = 0
        X = 0
        stuck = 0
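The control algorithm translates almost line for line into a step function. This sketch is a model under assumptions: `teapot_step` is a hypothetical name, and the sequence of sensor readings fed to it is made up for illustration.

```python
def teapot_step(S: int, T: int, stuck: int) -> tuple[int, int, int]:
    """One pass through the control loop; returns (H, X, stuck)."""
    if S and not stuck and not T:
        return 1, 0, 0          # switch on, liquid too cool: heat
    if S and not stuck and T:
        return 0, 1, 1          # hot enough: stop heating, release the switch
    if S and stuck:
        return 0, 0, 1          # switch not yet released: keep waiting
    return 0, 0, 0              # switch off: idle


stuck = 0
trace = []
for S, T in [(1, 0), (1, 0), (1, 1), (1, 1), (0, 1), (0, 0)]:
    H, X, stuck = teapot_step(S, T, stuck)
    trace.append((H, X))
print(trace)
```

The stuck flag is what stops the controller from pulsing X repeatedly while the mechanical switch is still releasing.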


Teapot Flowchart

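[Figure: teapot flowchart. Node T0 sets stuck = 0. Decision nodes T1 to T4 test S·(not stuck)·(not T), S·(not stuck)·T, S·stuck, and (not S) in turn; a 1 branches to the action nodes T5 (H = 1, X = 0, stuck = 0), T6 (H = 0, X = 1, stuck = 1), T7 (H = 0, X = 0, stuck = 1), and T8 (H = 0, X = 0, stuck = 0), respectively, and a 0 falls through to the next test. Every action node, and the 0 branch of T4, returns to T1.]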

I Each node of the chart is given a state name, Ti.

I A sequencer is a circuit that produces timing trigger signals, Ti.

I It consists of a counter and a decoder.

I Each node becomes a µ-instruction in RTL, with the timing signals as control and the node actions as data-path.


Teapot Sequencer

[Figure: the teapot sequencer: a 4-bit counter C drives a 4-to-16 decoder whose outputs 0 to 8 are the timing signals T0 to T8.]


Generating RTL from the Flowchart

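Def: $T_0 \equiv C = 0$, $T_1 \equiv C = 1$, $T_2 \equiv C = 2$, $T_3 \equiv C = 3$, $T_4 \equiv C = 4$, $T_5 \equiv C = 5$, $T_6 \equiv C = 6$, $T_7 \equiv C = 7$, $T_8 \equiv C = 8$

$T_0$ : stuck ← 0, C ← 1

$T_1\,S\,\overline{\mathit{stuck}}\,\bar{T}$ : C ← 5

$T_1\,\overline{S\,\overline{\mathit{stuck}}\,\bar{T}}$ : C ← 2

$T_2\,S\,\overline{\mathit{stuck}}\,T$ : C ← 6

$T_2\,\overline{S\,\overline{\mathit{stuck}}\,T}$ : C ← 3

$T_3\,S\,\mathit{stuck}$ : C ← 7

$T_3\,\overline{S\,\mathit{stuck}}$ : C ← 4

$T_4\,\bar{S}$ : C ← 8

$T_4\,S$ : C ← 1

$T_5$ : H ← 1, X ← 0, stuck ← 0, C ← 1

$T_6$ : H ← 0, X ← 1, stuck ← 1, C ← 1

$T_7$ : H ← 0, X ← 0, stuck ← 1, C ← 1

$T_8$ : H ← 0, X ← 0, stuck ← 0, C ← 1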


RTL and Verilog

RTL can be thought of as pseudo-code for a hardware description language, such as Verilog (used below) or VHDL (VHSIC Hardware Description Language).

// tea pot controller
module teapot(clk, S, T, H, X);
  // input ports
  input clk, S, T;
  // output ports
  output reg H, X;
  // internal registers
  reg stuck;
  reg [3:0] C;

  // define the states (timing signals decoded from the sequencer C)
  assign T0 = C == 4'b0000;
  assign T1 = C == 4'b0001;
  assign T2 = C == 4'b0010;
  assign T3 = C == 4'b0011;
  assign T4 = C == 4'b0100;
  assign T5 = C == 4'b0101;
  assign T6 = C == 4'b0110;
  assign T7 = C == 4'b0111;
  assign T8 = C == 4'b1000;

  // the circuit behavior: nonblocking (<=) assignments, so all
  // registers update together on the clock edge
  always @(posedge clk) begin
    if (T0) begin
      stuck <= 0;
      C <= 4'b0001;
    end
    if (T1) begin
      if (S && !stuck && !T)
        C <= 4'b0101;
      else
        C <= 4'b0010;
    end
    if (T2) begin
      if (S && !stuck && T)
        C <= 4'b0110;
      else
        C <= 4'b0011;
    end
    if (T3) begin
      if (S && stuck)
        C <= 4'b0111;
      else
        C <= 4'b0100;
    end
    if (T4) begin
      if (!S)
        C <= 4'b1000;
      else
        C <= 4'b0001;
    end
    if (T5) begin
      H <= 1;
      X <= 0;
      stuck <= 0;
      C <= 4'b0001;
    end
    if (T6) begin
      H <= 0;
      X <= 1;
      stuck <= 1;
      C <= 4'b0001;
    end
    if (T7) begin
      H <= 0;
      X <= 0;
      stuck <= 1;
      C <= 4'b0001;
    end
    if (T8) begin
      H <= 0;
      X <= 0;
      stuck <= 0;
      C <= 4'b0001;
    end
  end // behavior

  // initialize the state
  initial begin
    C = 4'b0000;
    H = 0;
    X = 0;
  end

endmodule

I The module definition gives the names of the input and output ports. This is followed by declarations that give the port types and sizes.

I Types are either input or output, or reg, a register, with optional bit numbers to specify the size.

I We define the timing signals, Ti, according to the sequencer values.

I An always block specifies that its actions take place on the positive edge of the clock signal.

I µ-instructions are implemented as if statements. The test implements the control, and the body implements the data-path.

I The code contains µ-instructions for all timing signals, T0 to T8.

I The last section initializes the registers.


Chapter 6: Common Computer Architectures

I We examine some common ways of organizing a processor.

I Each organization is currently in use in some processor.

I Topics
  I ISA (instruction set architecture): what instructions are available.
  I Instruction format: how information is coded as a number.
  I Addressing modes: how the location of operands is specified as a number.


Instruction Set Architecture

Instruction types

I Data transfer. Move data from one location to another.

I Data manipulation. Perform arithmetic, logic, or shift operations on data.

I Control. Change the order of execution of machine instructions.


Data Transfer Instructions

Categories based on location of the data.

I Register-to-Register. Movement inside the processor from one register to another.

mov R0, R1 ; R0 <- R1

I Register-to-Memory. Movement from a processor register out to a memory unit.

store 5, R1 ; M[5] <- R1

I Memory-to-Register. Movement from a memory unit into a processor register.

load R0, 5 ; R0 <- M[5]


Data Transfer Instructions (cont.)

I Register-to-Device. Movement from a register out to an output device. (Devices are designated by their channel number.)

out 3, R0 ; D[3] <- R0

I Device-to-Register. Movement from an input device to a register.

in R0, 3 ; R0 <- D[3]

I/O organization

I In special instruction I/O (as above), the processor uses input and output instructions to perform I/O.

I In memory-mapped I/O, devices are mapped to special memory locations. (I/O is just movement to/from memory.)

store 255, R0 ; M[255] <- R0, where location 255 is mapped to a device


Data Manipulation & Data-Types

The processor operates on data.

Common data-types

I Integer data.

I Real data.

I Boolean data.

I Character data.

I Binary coded decimal (BCD) data.


The Integer Data-Type

I Integer data consists of whole numbers.

I Integers are stored in a word.

I Word size must be large enough to store the integer values a user is interested in operating on. Eight bits is not sufficient. A typical word size might be 32 bits.

I Integer types
  I Unsigned integer: all bit configurations of the word are used to represent non-negative numbers. (The range for a 32-bit word is $0$ to $2^{32} - 1 \approx 4.3 \times 10^9$.)
  I Signed integer: half the bit configurations are used for non-negative integers, and half are used for negative integers.


The Real Data-Type

I A real number is a number with a fractional part.

I Real number representation is based on scientific notation.

I There are three important pieces of information in scientific notation: sign, mantissa, and exponent.

I On a computer, this scientific notation representation is called floating-point format.

I There are two sizes of floating-point format: single precision, with a 32-bit FP word, and double precision, with a 64-bit word.

$-45.375 = -4.5375 \times 10^{1}$ (sign: negative; mantissa: 4.5375; exponent: 1)


The Boolean Data-Type

I It only takes a single bit to store the values true, or false.

I Most computer memories can only be accessed by the word or byte, so this is not convenient.

I Boolean values are stored in a byte: 0 for false, and not 0 for true.


The Character Data-Type

I To represent characters on a computer (we can only store numbers on a computer), the characters must be encoded.

I All of the characters on a keyboard can be numbered with codes from 0 to 255. This requires 8 bits.

I Eight bits is called a byte. It is possible to encode all characters on the keyboard with a code that fits in a byte.

I The standard 1-byte code is ASCII (American Standard Code for Information Interchange).

I The ASCII byte is only large enough for the Latin characters. To represent other languages and scripts, a larger code is needed.

I UNICODE was designed as a 16-bit code. ASCII is a subset of UNICODE: to form the UNICODE code for a Latin character, you prefix it with a byte of 0. (Other languages have prefix bytes that are not 0.)


Binary Coded Decimal (BCD)

I Humans work in decimal. When inputting and outputting numbers, numbers typically need to be converted between decimal and binary.

I BCD is a way of representing integers that allows easy conversion to or from binary and decimal. The drawback is that arithmetic is more complex for BCD than it is for positional binary.

I In BCD, an integer is represented as a string of 4-bit binary encodings of its digits.

Examples

Decimal 365 has digits 3 (0011), 6 (0110), and 5 (0101).

In BCD: 0011 0110 0101
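BCD conversion works digit by digit, which is what makes it convenient for input and output. This is a minimal sketch for non-negative integers. The names `to_bcd` and `from_bcd` are hypothetical, and the string format (space-separated 4-bit groups) is chosen to match the example.

```python
def to_bcd(n: int) -> str:
    """Encode a non-negative integer as BCD: one 4-bit group per decimal digit."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return " ".join(format(int(digit), "04b") for digit in str(n))


def from_bcd(groups: str) -> int:
    """Decode space-separated 4-bit BCD groups back into an integer."""
    value = 0
    for group in groups.split():
        digit = int(group, 2)
        if digit > 9:                       # 1010 .. 1111 are not BCD digits
            raise ValueError(f"invalid BCD digit {group}")
        value = value * 10 + digit
    return value


print(to_bcd(365))
print(from_bcd("0011 0110 0101"))
```

Each digit costs 4 bits, so 365 takes 12 bits in BCD but only 9 bits in positional binary (101101101).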


Data Manipulation Operation Types

I Arithmetic operations. The usual arithmetic operations, like addition, subtraction, multiplication, and division.

I Logic operations. Bitwise Boolean operations, like AND, OR, and NOT.

I Shift operations. Shifting integers left or right.


Data Manipulation Operations: Arithmetic

Arithmetic Operations.

I A processor must support arithmetic on all numeric data-types.

I The ISA may contain an instruction

iadd R0, R1

for integer addition, and

fadd F0, F1

for floating-point addition.


Data Manipulation Operations: Logic

Logic Operations.

I Logic operators allow us to manipulate individual bits in an integer.

I As an example,

or R0, #00010000b

sets Bit 4 in R0, ORing R0 with the mask 00010000.

I In the example

and R0, #00010000b

the AND operator clears all bits but Bit 4 in R0.


Data Manipulation Operations: Shift

Shift Operations.

I Shift operations can be to the right or left, and can be logical, circular, or arithmetic.

I As an example,

shl R0

shifts R0 by one bit to the left.

I Shift operators can be used to perform multiplication by a constant. They can do this faster than a full multiplication instruction.

I As an example, to multiply R0 by 5:

mov R1, R0 ; R1 <- R0

shl R0 ; R0 <- R0 * 2

shl R0 ; R0 <- R0 * 2

add R0, R1 ; R0 <- R0 + R1
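The shift-and-add sequence can be checked against ordinary multiplication. This sketch models R0 and R1 as 32-bit registers, which is an assumption because the slide gives no width, so overflow wraps as it would in hardware. The name `times5` is hypothetical.

```python
def times5(r0: int, width: int = 32) -> int:
    """Multiply by 5 with the mov/shl/shl/add sequence on a fixed-width register."""
    mask = (1 << width) - 1          # register width; results wrap on overflow
    r1 = r0                          # mov R1, R0   ; R1 <- R0
    r0 = (r0 << 1) & mask            # shl R0       ; R0 <- R0 * 2
    r0 = (r0 << 1) & mask            # shl R0       ; R0 <- R0 * 4
    return (r0 + r1) & mask          # add R0, R1   ; R0 <- 4x + x = 5x


print([times5(x) for x in range(6)])
```

The same idea works for any constant: write it in binary and add one shifted copy of x for each 1 bit (5 = 101b gives 4x + x).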


Control Operations

Control operations change the flow of control of a program from the next sequential instruction to another instruction.

Types of instructions

I Unconditional branches.

I Conditional branches.

I Machine reset.

I Context manipulation.


Unconditional Branches

...
jump xyz
...
xyz: ...

I The jump causes the execution of the instruction at label xyz. A label is a symbolic address.

I In machine language, the symbolic address is replaced by the memory address of the instruction to be executed next.

I The jump instruction is called unconditional because the branch is always taken. In a conditional branch, the branch is taken only under certain circumstances.


Conditional Branches: Arithmetic

...
beq R0, xyz
...
xyz: ...

I The beq instruction checks whether R0 is equal to zero. If so, it branches to address xyz.

I If R0 ≠ 0, execution continues with the sequentially next instruction.

I It is called an arithmetic branch because the beq instruction performs arithmetic (compares R0 with 0) to determine whether the branch should be taken.


Conditional Branches: Status Flag

...
sub R0, #0
bz R0, xyz
...
xyz: ...

I In status flag branches, there is a processor register (the FLAGS register), which is a collection of status bits.

I When the processor performs arithmetic, it sets the status bits according to the results.

I The bz (branch if zero) instruction checks the Z status flag to determine whether a jump should occur. The Z flag is set if the result of the last arithmetic operation was zero.

I The subtract instruction before the bz instruction is used to set the Z flag to 1 if R0 is 0.


Machine Reset Instruction

halt

I This instruction halts the machine.

I Usually the user does not want to terminate a program by halting the machine.

I Usually, when a program finishes, it should return control to the operating system.

I Only low level OS programs would need to halt the machine.


Context Manipulation Instructions: Subroutines

High-level and low-level subroutine instructions.

High-level:

...
f()
...
function f()
begin
...
return
end

Low-level:

...
call f
...
f: ...
ret


Context Manipulation: Subroutines (cont.)

I A function call is translated into a call instruction.

I A return statement is translated into a ret instruction.

I A call instruction causes a jump to the address specified.

I A ret instruction causes a jump back to the instruction following the call instruction. (This jump is back to what is called the return address.)

I The return address is pushed onto the system stack by the call instruction.

I When the ret instruction needs the return address, it is popped off the stack.



Stack:
  I A LIFO (last in, first out) data structure.
  I It supports a pop operation, which takes the top element off the stack.
  I It supports a push operation, which places a new element on the top of the stack.

[Figure: push(x) then push(y) leaves y on top of x, with the stack pointer SP marking the top; pop() then returns y, leaving x.]



call instruction actions:

1. Push the return address on to the system stack.

2. Jump to the subroutine address.

ret instruction actions:

1. Pop the return address off of the system stack.

2. Load the popped return address into the PC.



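Examples (A function f calls a function g)

[Figure: the system stack during nested calls. call f pushes ra(f); call g, made inside f, pushes ra(g) on top of it. The first ret pops ra(g) and returns into f; the second ret pops ra(f) and returns to f's caller. SP marks the top of the stack at each step.]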
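The two-step call and ret actions can be modeled with a Python list serving as the system stack. This sketch assumes that each instruction occupies one address, so the return address is pc + 1. The class name CallStackMachine is hypothetical. The usage lines replay the f-calls-g example.

```python
class CallStackMachine:
    """Minimal model of call/ret on a system stack (illustrative; one word per instruction)."""

    def __init__(self, pc=0):
        self.pc = pc
        self.stack = []  # system stack; the top is the end of the list

    def call(self, target):
        self.stack.append(self.pc + 1)  # 1. push the return address
        self.pc = target                # 2. jump to the subroutine

    def ret(self):
        self.pc = self.stack.pop()      # pop the return address into the PC


m = CallStackMachine(pc=10)
m.call(100)        # call f: the stack holds ra(f) = 11
m.pc = 105         # ... f runs until it reaches "call g"
m.call(200)        # call g: ra(g) = 106 is pushed on top of ra(f)
print(m.stack)     # [11, 106]
m.ret()            # g returns into f
m.ret()            # f returns to its caller
print(m.pc)        # 11
```

Because the most recent return address is always on top, nested calls unwind in the right order without any extra bookkeeping.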


Context Manipulation Instructions: Interrupts

I Interrupts are like subroutines, except that no call instruction is executed.

I Upon an interrupt, control jumps to the start of the interrupt service routine (ISR).

I The interrupted program must be unaware that it has been interrupted.

I When the interrupt occurs, the state of the processor registers must be saved. (This is called context saving.)

I On return from the interrupt, the processor registers must be restored (context restoration).

I An interrupt may be caused by stack problems. The return address and the context are therefore saved in the ISR area, rather than on the stack.


Context Manipulation: Interrupts (cont.)

I A program can request to be interrupted with a syscall instruction.

I A syscall instruction is issued when the program requests the OS to perform a service which it does not have permission to perform itself (like using a printer).

Examples (A syscall execution.)

...

syscall

...

ISR:

...

iret

I The syscall instruction jumps to the fixed ISR location, and saves the context.

I The iret instruction jumps back to the return address, after restoring the context.
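Context saving for a syscall can be sketched in the same way. In this hypothetical model, the save area is a single slot owned by the ISR, standing in for "the ISR area". As a result it handles one interrupt at a time, with no nesting. ISR_ADDRESS and the class name are illustrative assumptions.

```python
ISR_ADDRESS = 0x80  # hypothetical fixed ISR location


class InterruptibleCPU:
    """Sketch of syscall/iret context saving in an ISR-owned save area (not the stack)."""

    def __init__(self):
        self.pc = 0
        self.regs = [0] * 8
        self.save_area = None  # hypothetical save slot; one level only, no nesting

    def syscall(self):
        # Save the return address and a copy of every register, then enter the ISR.
        self.save_area = (self.pc + 1, list(self.regs))
        self.pc = ISR_ADDRESS

    def iret(self):
        # Restore the registers, then jump back to the return address.
        return_address, saved_regs = self.save_area
        self.regs = list(saved_regs)
        self.pc = return_address


cpu = InterruptibleCPU()
cpu.pc, cpu.regs[0] = 10, 7
cpu.syscall()
cpu.regs[0] = 99            # the ISR is free to use R0 ...
cpu.iret()
print(cpu.pc, cpu.regs[0])  # 11 7: ... and the program never sees the change
```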


Context Manipulation: Interrupts (cont.)

Execution Modes:

I User mode. A program operating in user mode has restrictions. It cannot access certain devices and certain memory sections.

I Kernel mode. A program operating in kernel mode can do anything.

Users' programs typically operate in user mode. Many parts of the OS operate in kernel mode.

To terminate, a program asks the OS to take over. This is done through a syscall.


Instruction Format

Assembly instructions must be represented numerically.

Instruction parts:

add R0, R1

I The operation: the operation being performed. In this instruction it is addition.

I The destination: the operand where the result will be stored. In this instruction this would be R0.

I The source: the other operand. In this case this would be R1.

Machine code equivalent (field widths in bits):

op | dst | src
6 | 5 | 5


Instruction Format (cont.)

Example instruction

op | dst | src
001001 | 00000 | 00001

Full instruction: 0010010000000001

I The machine has 2^6 = 64 instructions in its ISA.

I The machine has 2^5 = 32 registers.
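Packing the three fields into one word takes only shifts and masks. The sketch below assumes the 6/5/5 layout shown, with op in the high bits. encode and decode are illustrative helper names.

```python
OP_BITS, REG_BITS = 6, 5  # field widths from the format above


def encode(op, dst, src):
    """Pack op | dst | src into one 16-bit word."""
    assert 0 <= op < 2**OP_BITS
    assert 0 <= dst < 2**REG_BITS and 0 <= src < 2**REG_BITS
    return (op << 2 * REG_BITS) | (dst << REG_BITS) | src


def decode(word):
    """Split a 16-bit word back into (op, dst, src)."""
    mask = 2**REG_BITS - 1
    return word >> 2 * REG_BITS, (word >> REG_BITS) & mask, word & mask


word = encode(0b001001, 0, 1)   # add R0, R1, with op-code 001001 as in the example
print(format(word, "016b"))     # 0010010000000001
print(decode(word))             # (9, 0, 1)
```

The field widths fix the limits directly: a 6-bit op field allows 64 op-codes, and a 5-bit register field allows 32 registers.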


Addressing Modes

In a less simplistic computer, numbers in operand fields have several interpretations. They could indicate a register number, a memory address, or a constant. Addressing modes indicate where the operand is found.

Addressing Modes:

1. Direct mode.

2. Indirect mode.

3. Register direct mode.

4. Register indirect mode.

5. Immediate mode.

6. Implicit mode.

7. Relative mode.

8. Indexed mode.


Addressing Modes (cont.)

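Direct mode: The operand field gives the address of the effective operand.

load R0, 5

R0 ← M[5]

Indirect mode: The operand field gives the address of a pointer to the effective operand.

load R0, (5)

R0 ← M[M[5]]

[Figure: RAM with M[5] = 9 and M[9] = 3, so load R0, (5) follows the pointer at address 5 to address 9 and loads 3.]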



Register direct mode: The operand field contains the number of a register which contains the effective operand.

mov R0, R5

R0 ← R5

Register indirect mode: The operand field contains the number of a register that contains a pointer to the effective operand.

load R0, @R5

R0 ← M[R5]

Immediate mode: The operand field contains the effective operand.

mov R0, #5

R0 ← 5



Implicit mode: An operand is not explicitly given.

Operand explicitly given (subroutine address):

call 5

Operand not given explicitly (ISR address):

syscall

Relative mode: The address of the effective operand is calculated by adding the contents of the operand field, as an offset, to the contents of the PC.

load R0, $5

R0 ← M[PC + 5]


Addressing Modes: Relative Mode

Relative mode addressing allows a program to be relocated without modifying its addresses.

Examples (A program that addresses location 125 inside its workspace. The PC is 100 at the load, so 100 + 25 = 125.)

I VA: The address is absolute.
load R0, 125

I VR: The address is relative to the PC.
load R0, $25


Addressing Modes: Relative Mode (cont.)

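If the workspace is moved, the load instruction in VA will fetch from outside the workspace. The load in VR will fetch from the same offset inside the workspace.

[Figure: the workspaces of VA and VR before and after relocation. Before the move, both loads access location 125. After the move, VA still accesses 125, now outside the workspace, while VR accesses 50 + 25 = 75, the same position relative to the PC.]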



Indexed mode: The address of the effective operand is calculated by adding together two fields in the instruction: an operand offset field and an operand index register field.

load R0, 5(R1)

R0 ← M[R1 + 5]

This mode allows for the easy implementation of array structures in memory.
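The modes differ only in how the operand field is turned into a value. The Python function below collects them in one place. The mode names, the list-based memory and registers, and the sample values are assumptions for illustration. The sample values come from the indirect-mode figure, M[5] = 9 and M[9] = 3. Implicit mode is left out because it has no operand field to interpret.

```python
def effective_operand(mode, field, mem, regs, pc=0, index_reg=None):
    """Return the operand value an instruction would use (mode names are illustrative)."""
    if mode == "direct":             # load R0, 5      -> M[5]
        return mem[field]
    if mode == "indirect":           # load R0, (5)    -> M[M[5]]
        return mem[mem[field]]
    if mode == "register":           # mov R0, R5      -> R5
        return regs[field]
    if mode == "register_indirect":  # load R0, @R5    -> M[R5]
        return mem[regs[field]]
    if mode == "immediate":          # mov R0, #5      -> 5
        return field
    if mode == "relative":           # load R0, $5     -> M[PC + 5]
        return mem[pc + field]
    if mode == "indexed":            # load R0, 5(R1)  -> M[R1 + 5]
        return mem[regs[index_reg] + field]
    raise ValueError(f"unknown addressing mode: {mode}")


mem = [0] * 16
mem[5], mem[9] = 9, 3          # the RAM contents from the indirect-mode figure
regs = [0] * 8
regs[1], regs[5] = 4, 9

for mode in ("direct", "indirect", "register", "register_indirect", "immediate"):
    print(mode, effective_operand(mode, 5, mem, regs))
print("relative", effective_operand("relative", 5, mem, regs, pc=4))
print("indexed", effective_operand("indexed", 5, mem, regs, index_reg=1))
```

With these values:
- Direct and register mode yield 9.
- Immediate mode yields 5.
- The indirect, register indirect, relative, and indexed examples all end up reading M[9] = 3.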


Addressing Modes: Indexed Mode

Array layout in memory.
I The array has a base address.
I Each element is located at an address which is the base address added to an offset.

[Figure: elements A[0], A[1], A[2], A[3], ... stored at consecutive addresses A + 0, A + 1, A + 2, A + 3, ..., where A is the base address.]


Addressing Modes: Indexed Mode (cont.)

Indexed mode is used to implement common array operations, with the index register storing the offset.

for i = 0 to n-1 do

x = x + A[i]

(Assume that the variable i is stored in R1. Memory addresses are specified as symbolic addresses.)

mov R1, #0

lab1:

load R0, n

sub R0, R1 ; R0 = n − i, which is zero once i = n

bz ext

add x, A(R1)

add R1, #1

jump lab1

ext:
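Mirroring the loop step by step in Python is a useful way to check its bounds. The loop should exit exactly when i reaches n, so that every element A[0] through A[n − 1] is added. In this sketch, memory is a list, and a and x are assumed addresses of the array and of the variable x. For simplicity, n is passed in as a value rather than loaded from memory.

```python
def sum_array(mem, a, n, x):
    """Mirror of the indexed-mode loop: R1 holds i, and A(R1) addresses M[A + R1]."""
    r1 = 0                              # mov R1, #0
    while True:                         # lab1:
        r0 = n - r1                     # load R0, n ; sub R0, R1
        if r0 == 0:                     # bz ext (leave once i reaches n)
            break
        mem[x] = mem[x] + mem[a + r1]   # add x, A(R1)
        r1 += 1                         # add R1, #1 ; jump lab1
    return mem[x]                       # ext:


memory = [0] * 16
memory[4:8] = [1, 2, 3, 4]              # A[0..3] at base address 4; x lives at address 0
print(sum_array(memory, a=4, n=4, x=0))
```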


Addressing in Machine Language

A machine with a 16-bit word, a 6-bit op-code, and two 5-bit operand fields. (Notice that this small operand field is not adequate for a reasonably sized memory unit.) To represent addressing modes, we use one bit of each operand field as an addressing mode bit. The single bit chooses between register direct (0) and direct (1) modes.

op | DM | dst | SM | src
6 | 1 | 4 | 1 | 4   (field widths in bits; DM and SM are the mode bits for dst and src)

Examples

add R0, 14   (R0 ← R0 + M[14])

op | DM | dst | SM | src
100100 | 0 | 0000 | 1 | 1110
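Each mode bit is placed in front of its operand's register or address bits. This sketch assumes the field order op, DM, dst, SM, src shown above and reuses the op-code 100100 from the example. The function name is hypothetical.

```python
REGISTER_DIRECT, DIRECT = 0, 1  # mode-bit values from the text


def encode_with_modes(op, dst, dst_mode, src, src_mode):
    """Pack op(6) | DM(1) | dst(4) | SM(1) | src(4) into a 16-bit word."""
    assert 0 <= op < 64 and 0 <= dst < 16 and 0 <= src < 16
    assert dst_mode in (0, 1) and src_mode in (0, 1)
    return (op << 10) | (dst_mode << 9) | (dst << 5) | (src_mode << 4) | src


# add R0, 14: R0 in register direct mode, M[14] in direct mode
word = encode_with_modes(0b100100, 0, REGISTER_DIRECT, 14, DIRECT)
print(format(word, "016b"))  # 1001000000011110
```

Only four bits remain for the address, so direct mode on this machine can reach just M[0] through M[15]. This shows why such a small operand field cannot serve a reasonably sized memory.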


Alternate Machine Architectures

Machines can be classified by the number of operands in their instructions.

I Register machine (3-operand machine).

I Register implicit machine (2-operand machine).

I Accumulator machine (1-operand machine).

I Stack machine (0-operand machine).

We use a standard machine specification called the RIM machine to illustrate these alternate architectures.

RIM specification:

I Registers R0 through R7.

I A 256 × 16 RAM unit.

I A single I/O device.

I A bus connection.


RIM ISA Specification

I Data transfer.
  I Register-to-register.
  I Memory-to-register.
  I Register-to-memory.

I Arithmetic.
  I Addition.
  I Subtraction.

I Logic.
  I AND.
  I OR.
  I NOT.

I Control.
  I Jump.
  I Branch if zero.
  I Branch if not zero.


The Register Machine

At least the arithmetic instructions have three operands.

add R0, R1, R2

R0 ← R1 + R2

ISA:

Assembly Code | Machine Code | Meaning
load R1, m | 0000 R1 m | R1 ← M[m]
store m, γ | 0001 γ m | M[m] ← γ
add R1, γ2, γ3 | 0010 R1 γ2 γ3 | R1 ← γ2 + γ3, Z ← (γ2 + γ3) = 0
sub R1, γ2, γ3 | 0011 R1 γ2 γ3 | R1 ← γ2 − γ3, Z ← (γ2 − γ3) = 0
and R1, γ2, γ3 | 0100 R1 γ2 γ3 | R1 ← γ2 ∧ γ3, Z ← (γ2 ∧ γ3) = 0
or R1, γ2, γ3 | 0101 R1 γ2 γ3 | R1 ← γ2 ∨ γ3, Z ← (γ2 ∨ γ3) = 0
not R1, γ2 | 0110 R1 γ2 0000 | R1 ← ¬γ2, Z ← (¬γ2) = 0
jump m | 0111 0000 m | PC ← m
bz m | 1000 0000 m | if Z then PC ← m
bnz m | 1001 0000 m | if ¬Z then PC ← m


Register Machine: Instruction Format

The register machine has two formats:

I The register format

I The memory format

The memory format allows for larger address fields.

Interpreting the table.

Examples (Register format)

add R1, γ2, γ3

1. First operand, R1 – indicates a register number.

2. Second operand, γ2 – indicates a register or an immediate value.

3. Third operand, γ3 – has the same meaning as the second operand.


Register Machine: Instruction Format (cont.)

Examples (Register Format (cont.))

add R0, R0, #1

Machine language:

op | 0 | dst | SI1 | src1 | SI2 | src2
4 | 1 | 3 | 1 | 3 | 1 | 3   (field widths in bits)

0010 | 0 | 000 | 0 | 000 | 1 | 001

I op-code – 0010, for the add instruction.

I One unused bit, 0.

I dst – 000, for R0.

I SI1 – 0, for register mode for the first operand.

I src1 – 000, for R0.

I SI2 – 1, for immediate mode for the second operand.

I src2 – 001, for the immediate value #1.



Examples (Memory Format)

load R1, m   (machine code: 0000 R1 m)

1. First operand, R1 – indicates a register number.

2. Second operand, m – indicates a memory address or an immediate value.

load R2, 19

Machine language:

op | 0 | dst | address
4 | 1 | 3 | 8   (field widths in bits)

0000 | 0 | 010 | 00010011



Examples (Memory Format (cont.))

I op-code – 0000 for load.

I one bit of 0.

I dst – 010 for R2.

I address – 00010011 for 19.
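Both register-machine formats can be assembled with a few shifts. In this sketch, the tuple notation ("R", n) and ("#", n) for register and immediate operands is an assumption, as are the function names. The bit layouts are the two formats above. The register format leaves only 3 bits for an immediate, so only #0 through #7 fit.

```python
def _operand(src):
    """Encode ('R', n) as register direct (SI = 0) or ('#', n) as immediate (SI = 1)."""
    kind, value = src
    assert kind in ("R", "#") and 0 <= value < 8
    return ((1 if kind == "#" else 0) << 3) | value


def register_format(op, dst, src1, src2):
    """op(4) | 0 | dst(3) | SI1 | src1(3) | SI2 | src2(3)."""
    assert 0 <= op < 16 and 0 <= dst < 8
    return (op << 12) | (dst << 8) | (_operand(src1) << 4) | _operand(src2)


def memory_format(op, reg, address):
    """op(4) | 0 | reg(3) | address(8)."""
    assert 0 <= op < 16 and 0 <= reg < 8 and 0 <= address < 256
    return (op << 12) | (reg << 8) | address


print(format(register_format(0b0010, 0, ("R", 0), ("#", 1)), "016b"))  # add R0, R0, #1
print(format(memory_format(0b0000, 2, 19), "016b"))                    # load R2, 19
```

The two calls print 0010000000001001 and 0000001000010011, the same bit patterns as the worked examples.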



Instruction types:
I Data Transfer
  I load – memory to register
  I store – register to memory
I Data Manipulation (all set the Z status flag)
  I add
  I sub
  I and
  I or
  I not
I Control
  I jump – unconditional branch
  I bz – conditional (branch if zero; uses the Z status flag)
  I bnz – conditional (branch if not zero)


Register Machine: Programming Example

Examples (Program to store x × y in sum.)

sum = 0

i = 0

while i != x do

sum = sum + y

i = i + 1


Register Machine: Programming Example (cont.)

Assembly version of the multiplication program.

add R0, #0, #0 ; sum = 0

store sum, R0

add R0, #0, #0 ; i = 0

store i, R0

lp: ; while i != x do

load R0, i

load R1, x

sub R0, R0, R1

bz ext

load R0, sum ; sum = sum + y

load R1, y

add R0, R0, R1

store sum, R0

load R0, i ; i = i + 1

add R0, R0, #1

store i, R0

jump lp

ext:
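To check that the assembly really computes x × y, it can be run on a small interpreter. This Python sketch has several simplifications, all of which are assumptions, as are its names:
- It handles only the instructions the program uses: load, store, add, sub, bz, and jump.
- It writes each instruction as a tuple.
- It treats labels as markers in the list rather than as real addresses.
- It wraps arithmetic results to 16 bits.

```python
def run_register_machine(program, mem):
    """Interpret symbolic register-machine code. Operands: 'R0'..'R7', '#n', or a variable name."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    regs = {f"R{i}": 0 for i in range(8)}
    z, pc = False, 0

    def value(operand):  # register direct or immediate operand
        return int(operand[1:]) if operand.startswith("#") else regs[operand]

    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "load":                      # load R, m
            regs[args[0]] = mem[args[1]]
        elif op == "store":                   # store m, R
            mem[args[0]] = regs[args[1]]
        elif op in ("add", "sub"):            # op R, g2, g3 (sets Z)
            a, b = value(args[1]), value(args[2])
            result = (a + b if op == "add" else a - b) & 0xFFFF
            regs[args[0]], z = result, result == 0
        elif op == "bz":
            if z:
                pc = labels[args[0]]
        elif op == "jump":
            pc = labels[args[0]]
        # "label" entries are markers only and do nothing when reached
    return mem


multiply = [
    ("add", "R0", "#0", "#0"), ("store", "sum", "R0"),
    ("add", "R0", "#0", "#0"), ("store", "i", "R0"),
    ("label", "lp"),
    ("load", "R0", "i"), ("load", "R1", "x"), ("sub", "R0", "R0", "R1"), ("bz", "ext"),
    ("load", "R0", "sum"), ("load", "R1", "y"), ("add", "R0", "R0", "R1"), ("store", "sum", "R0"),
    ("load", "R0", "i"), ("add", "R0", "R0", "#1"), ("store", "i", "R0"),
    ("jump", "lp"),
    ("label", "ext"),
]
print(run_register_machine(multiply, {"sum": 0, "i": 0, "x": 3, "y": 4})["sum"])  # 12
```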



Machine language version of multiplier.

Address   Machine Code   Assembly code

00010100 0010 0 000 1 000 1 000 add R0, #0, #0

00010101 0001 0 000 00000000 store sum, R0

00010110 0010 0 000 1 000 1 000 add R0, #0, #0

00010111 0001 0 000 00000001 store i, R0

00011000 0000 0 000 00000001 load R0, i

00011001 0000 0 001 00000010 load R1, x

00011010 0011 0 000 0 000 0 001 sub R0, R0, R1

00011011 1000 0 000 00100100 bz ext

00011100 0000 0 000 00000000 load R0, sum

00011101 0000 0 001 00000011 load R1, y

00011110 0010 0 000 0 000 0 001 add R0, R0, R1

00011111 0001 0 000 00000000 store sum, R0

00100000 0000 0 000 00000001 load R0, i

00100001 0010 0 000 0 000 1 001 add R0, R0, #1

00100010 0001 0 000 00000001 store i, R0

00100011 0111 0 000 00011000 jump lp



Machine language translation.

I A machine language program is split into two segments of memory: a data segment and a code segment. The data segment contains variables. The code segment contains machine instructions.

I Our data segment starts at location 0, and our code segment starts at location 20 (00010100).

I Variable assignments:
  I sum is M[0]
  I i is M[1]
  I x is M[2]
  I y is M[3]

I The table is used to assemble each machine instruction from the assembly instruction.


The Register Implicit Machine

The instructions have two operands.

add R0, R1

ISA:

Assembly Code | Machine Code | Meaning
load R1, m | 0000 1 R1 m | R1 ← M[m]
mov R1, γ2 | 0000 0 R1 γ2 | R1 ← γ2
store m, R1 | 0001 0 R1 m | M[m] ← R1
add R1, γ2 | 0010 0 R1 γ2 | R1 ← R1 + γ2, Z ← (R1 + γ2) = 0
sub R1, γ2 | 0011 0 R1 γ2 | R1 ← R1 − γ2, Z ← (R1 − γ2) = 0
and R1, γ2 | 0100 0 R1 γ2 | R1 ← R1 ∧ γ2, Z ← (R1 ∧ γ2) = 0
or R1, γ2 | 0101 0 R1 γ2 | R1 ← R1 ∨ γ2, Z ← (R1 ∨ γ2) = 0
not R1 | 0110 0 R1 00000000 | R1 ← ¬R1, Z ← (¬R1) = 0
jump m | 0111 0 000 m | PC ← m
bz m | 1000 0 000 m | if Z then PC ← m
bnz m | 1001 0 000 m | if ¬Z then PC ← m


Register Implicit Machine: Instruction Format

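A single instruction format. The src field can take one of three forms.

op | M | dst | src
4 | 1 | 3 | 8   (field widths in bits)

Forms of the 8-bit src field:
I Address: an 8-bit address.
I Register: sRI (1 bit) = 0, sr (3 bits), then 0000 (4 unused bits).
I Immediate: sRI (1 bit) = 1, then a 7-bit immediate value.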


Register Implicit Machine: Instruction Format (cont.)

I The M bit controls the interpretation of the src field:
  I 0 – the src field is in either register direct mode (sRI = 0) or immediate mode (sRI = 1).
  I 1 – the src field is an address.

I The load and mov instructions are actually the same instruction: one with the src operand in direct mode, and the other with the src operand in either register direct or immediate mode. This instruction is referred to as the movR/M instruction.

Examples (Forms of the movR/M instruction.)

load R0, 128 ; move from memory

mov R0, R1 ; move between registers


Register Implicit Machine: Machine Language

Examples

mov R0, R1

Op-code is 0000 for movR/M, M is 0 for register direct mode, dst is 000 for R0, sRI is 0 for register direct mode, and sr is 001 for R1.

0000 0 000 0 001 0000

mov R2, #4

Op-code is 0000, M is 0, dst is 010 for R2, sRI is 1 for immediate mode, and the immediate field is 0000100 for #4.

0000 0 010 1 0000100

load R1, 127

Op-code is 0000, M is 1 for direct mode, dst is 001 for R1, and the address is 01111111 for 127.

0000 1 001 01111111
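The three src forms map directly onto bit fields. This sketch assumes the layout above: op in bits 15 to 12, M in bit 11, dst in bits 10 to 8, and src in bits 7 to 0. rim_encode and its form names are hypothetical. The loop reproduces the three worked examples.

```python
def rim_encode(op, dst, src, form):
    """Pack op(4) | M(1) | dst(3) | src(8); form is 'address', 'register', or 'immediate'."""
    assert 0 <= op < 16 and 0 <= dst < 8
    if form == "address":            # M = 1: src is an 8-bit address
        assert 0 <= src < 256
        m, field = 1, src
    elif form == "register":         # M = 0, sRI = 0: sr(3) followed by 0000
        assert 0 <= src < 8
        m, field = 0, src << 4
    elif form == "immediate":        # M = 0, sRI = 1: 7-bit immediate value
        assert 0 <= src < 128
        m, field = 0, (1 << 7) | src
    else:
        raise ValueError(f"unknown source form: {form}")
    return (op << 12) | (m << 11) | (dst << 8) | field


for text, word in [("mov R0, R1", rim_encode(0, 0, 1, "register")),
                   ("mov R2, #4", rim_encode(0, 2, 4, "immediate")),
                   ("load R1, 127", rim_encode(0, 1, 127, "address"))]:
    print(f"{text:13} {word:016b}")
```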

Register Implicit Machine: Programming Example

mov R0, #0 ; sum = 0

store sum, R0

mov R0, #0 ; i = 0

store i, R0

lp: ; while i != x do
load R0, i

load R1, x

sub R0, R1

bz ext

load R0, sum ; sum = sum + y

load R1, y

add R0, R1

store sum, R0

load R0, i ; i = i + 1

add R0, #1

store i, R0

jump lp

ext:


Register Implicit Machine: Programming example (cont.)

Address   Machine Code   Assembly code

00010100 0000 0 000 1 0000000 mov R0, #0

00010101 0001 0 000 00000000 store sum, R0

00010110 0000 0 000 1 0000000 mov R0, #0

00010111 0001 0 000 00000001 store i, R0

00011000 0000 1 000 00000001 load R0, i

00011001 0000 1 001 00000010 load R1, x

00011010 0011 0 000 0 001 0000 sub R0, R1

00011011 1000 0 000 00100100 bz ext

00011100 0000 1 000 00000000 load R0, sum

00011101 0000 1 001 00000011 load R1, y

00011110 0010 0 000 0 001 0000 add R0, R1

00011111 0001 0 000 00000000 store sum, R0

00100000 0000 1 000 00000001 load R0, i

00100001 0010 0 000 1 0000001 add R0, #1

00100010 0001 0 000 00000001 store i, R0

00100011 0111 0 000 00011000 jump lp


The Accumulator Machine

For the accumulator machine, instructions have only one explicit operand; a special register, the accumulator (AC), is always a second, implicit operand.

Examples

add R2 ; add R2 to AC

load 128 ; put M[128] into the AC


The Accumulator Machine (cont.)

ISA:

Assembly Code | Machine Code | Meaning
load m | 0000 1 000 m | AC ← M[m]
load γ2 | 0000 0 000 γ2 | AC ← γ2
store m | 0001 1 000 m | M[m] ← AC
store R1 | 0001 0 000 0 R1 0000 | R1 ← AC
add γ2 | 0010 0 000 γ2 | AC ← AC + γ2, Z ← (AC + γ2) = 0
sub γ2 | 0011 0 000 γ2 | AC ← AC − γ2, Z ← (AC − γ2) = 0
and γ2 | 0100 0 000 γ2 | AC ← AC ∧ γ2, Z ← (AC ∧ γ2) = 0
or γ2 | 0101 0 000 γ2 | AC ← AC ∨ γ2, Z ← (AC ∨ γ2) = 0
not | 0110 0 000 00000000 | AC ← ¬AC, Z ← (¬AC) = 0
jump m | 0111 0 000 m | PC ← m
bz m | 1000 0 000 m | if Z then PC ← m
bnz m | 1001 0 000 m | if ¬Z then PC ← m


Accumulator Machine: Instruction Format

The accumulator machine uses the same format as the register implicit machine. The destination field is unused (000).

op | M | 000 | src
4 | 1 | 3 | 8   (field widths in bits)

(Notice that the store instruction is capable of storing to memory, using direct mode, or to a register, using register direct mode.)


Accumulator Machine: Programming Example

load #0 ; sum = 0

store sum

load #0 ; i = 0

store i

lp: ; while i != x do
load x

store R0

load i

sub R0

bz ext

load y ; sum = sum + y

store R0

load sum

add R0

store sum

load #1 ; i = i + 1

store R0

load i

add R0

store i

jump lp

ext:
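The accumulator version can be checked with an interpreter in the same style. In this Python sketch, the AC is a local variable and labels are list markers rather than addresses. value() also accepts memory names, so one helper serves both load m and load γ2. These simplifications are assumptions, not features of the machine.

```python
def run_accumulator_machine(program, mem):
    """Interpret symbolic accumulator-machine code (subset: load, store, add, sub, bz, jump)."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    regs = {f"R{i}": 0 for i in range(8)}
    ac, z, pc = 0, False, 0

    def value(operand):  # immediate '#n', register 'Rn', or memory variable name
        if operand.startswith("#"):
            return int(operand[1:])
        return regs[operand] if operand in regs else mem[operand]

    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "load":
            ac = value(args[0])
        elif op == "store":                   # store m  or  store R
            target = regs if args[0] in regs else mem
            target[args[0]] = ac
        elif op in ("add", "sub"):            # the AC is the implicit second operand
            ac = (ac + value(args[0]) if op == "add" else ac - value(args[0])) & 0xFFFF
            z = ac == 0
        elif op == "bz":
            if z:
                pc = labels[args[0]]
        elif op == "jump":
            pc = labels[args[0]]
    return mem


accumulator_multiply = [
    ("load", "#0"), ("store", "sum"),                # sum = 0
    ("load", "#0"), ("store", "i"),                  # i = 0
    ("label", "lp"),                                 # while i != x do
    ("load", "x"), ("store", "R0"), ("load", "i"), ("sub", "R0"), ("bz", "ext"),
    ("load", "y"), ("store", "R0"), ("load", "sum"), ("add", "R0"), ("store", "sum"),
    ("load", "#1"), ("store", "R0"), ("load", "i"), ("add", "R0"), ("store", "i"),
    ("jump", "lp"),
    ("label", "ext"),
]
print(run_accumulator_machine(accumulator_multiply, {"sum": 0, "i": 0, "x": 3, "y": 4})["sum"])
```

Because every arithmetic instruction uses the AC, the program spends many of its instructions moving values in and out of R0. That is the extra instruction count the 1-operand design trades for simpler instructions.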


Accumulator Machine: Programming Example (cont.)

Address   Machine Code   Assembly code

00010100 0000 0 000 1 0000000 load #0

00010101 0001 1 000 00000000 store sum

00010110 0000 0 000 1 0000000 load #0

00010111 0001 1 000 00000001 store i

00011000 0000 1 000 00000010 load x

00011001 0001 0 000 0 000 0000 store R0

00011010 0000 1 000 00000001 load i

00011011 0011 0 000 0 000 0000 sub R0

00011100 1000 0 000 00101000 bz ext

00011101 0000 1 000 00000011 load y

00011110 0001 0 000 0 000 0000 store R0

00011111 0000 1 000 00000000 load sum

00100000 0010 0 000 0 000 0000 add R0

00100001 0001 1 000 00000000 store sum

00100010 0000 0 000 1 0000001 load #1

00100011 0001 0 000 0 000 0000 store R0

00100100 0000 1 000 00000001 load i

00100101 0010 0 000 0 000 0000 add R0

00100110 0001 1 000 00000001 store i

00100111 0111 0 000 00011000 jump lp


The Stack Machine

On the stack machine, instructions have no explicit operands. All operands implicitly come off the arithmetic stack.

I Operands are pushed onto the stack.
I Operators pop their operands off of the stack, and push their results onto the stack.

Examples

3 × (4 + 5)

[Figure: stack contents after each instruction: push #3 → 3; push #4 → 3, 4; push #5 → 3, 4, 5; add → 3, 9; mult → 27.]


The Stack Machine (cont.)

Examples (Arithmetic example)

push #3

push #4

push #5

add

mult

Although some instructions have operands, the arithmetic operations have no operands.
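The stack contents in the example can be reproduced with a Python list. This sketch supports only the three instructions the example uses and prints the stack after each one. The function name is illustrative.

```python
def evaluate(program):
    """Run push/add/mult instructions on an arithmetic stack and return the final stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "mult":
            stack.append(stack.pop() * stack.pop())
        else:
            raise ValueError(f"unsupported instruction: {op}")
        print(f"{op:5} {stack}")
    return stack


evaluate([("push", 3), ("push", 4), ("push", 5), ("add",), ("mult",)])
```

The add and mult tuples carry no operand at all, which is the 0-operand property in action.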


The Stack Machine (cont.)

Assembly Code | Machine Code | Meaning
push m | 0000 0 000 m | push(M[m])
push i | 0000 1 000 i | push(i)
pop m | 0001 0 000 m | M[m] ← pop
add | 0010 000000000000 | push(pop1 + pop2)
sub | 0011 000000000000 | push(pop1 − pop2)
and | 0100 000000000000 | push(pop1 ∧ pop2)
or | 0101 000000000000 | push(pop1 ∨ pop2)
not | 0110 000000000000 | push(¬pop)
jump | 0111 000000000000 | PC ← pop
bz | 1000 000000000000 | if pop1 = 0 then PC ← pop2
bnz | 1001 000000000000 | if pop1 ≠ 0 then PC ← pop2


Stack Machine: Instruction Format

I There are two types of instructions: 0-operand and 1-operand instructions.

I Both types fit into the same format as for the accumulator machine.

I 0-operand instructions leave the operand field blank (all zeros).

I pop and push are used to transfer data from and to the stack, respectively.

I Notation: subscripts on the pop operation in the table indicate order. (pop1 is the first pop, and pop2 is the second pop.)

I The Z flag is no longer used; the bz and bnz instructions now check their first popped operand (an arithmetic branch).


Stack Machine: Programming Example

push #0 ; sum = 0

pop sum

push #0 ; i = 0

pop i

lp: ; while i != x do
push #ext

push x

push i

sub

bz

push y ; sum = sum + y

push sum

add

pop sum

push #1 ; i = i + 1

push i

add

pop i

push #lp

jump

ext:
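An interpreter for the stack program follows the conventions of the stack ISA table:
- pop1 is the top of the stack.
- sub pushes pop1 − pop2.
- bz tests pop1 and jumps to pop2.

In this Python sketch, labels are positions in the instruction list and stand in for addresses, so push #ext pushes the position of ext. These positions and all names are assumptions.

```python
def run_stack_machine(program, mem):
    """Interpret symbolic stack-machine code (subset: push, pop, add, sub, bz, jump)."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    stack, pc = [], 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "push":
            arg = args[0]
            if arg.startswith("#"):                  # push i: immediate or label address
                name = arg[1:]
                stack.append(labels[name] if name in labels else int(name))
            else:                                    # push m: push(M[m])
                stack.append(mem[arg])
        elif op == "pop":                            # pop m: M[m] <- pop
            mem[args[0]] = stack.pop()
        elif op in ("add", "sub"):
            p1, p2 = stack.pop(), stack.pop()        # pop1 is the top of the stack
            stack.append((p1 + p2 if op == "add" else p1 - p2) & 0xFFFF)
        elif op == "bz":                             # if pop1 = 0 then PC <- pop2
            p1, p2 = stack.pop(), stack.pop()
            if p1 == 0:
                pc = p2
        elif op == "jump":                           # PC <- pop
            pc = stack.pop()
    return mem


stack_multiply = [
    ("push", "#0"), ("pop", "sum"),                              # sum = 0
    ("push", "#0"), ("pop", "i"),                                # i = 0
    ("label", "lp"),                                             # while i != x do
    ("push", "#ext"), ("push", "x"), ("push", "i"), ("sub",), ("bz",),
    ("push", "y"), ("push", "sum"), ("add",), ("pop", "sum"),    # sum = sum + y
    ("push", "#1"), ("push", "i"), ("add",), ("pop", "i"),       # i = i + 1
    ("push", "#lp"), ("jump",),
    ("label", "ext"),
]
print(run_stack_machine(stack_multiply, {"sum": 0, "i": 0, "x": 3, "y": 4})["sum"])
```

The branch target travels on the stack just like data, which is why the program pushes #ext before the comparison it guards.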


Stack Machine: Programming Example (cont.)

Address   Machine Code   Assembly code

00010100 0000 1 000 00000000 push #0

00010101 0001 0 000 00000000 pop sum

00010110 0000 1 000 00000000 push #0

00010111 0001 0 000 00000001 pop i

00011000 0000 1 000 00100111 push #ext

00011001 0000 0 000 00000010 push x

00011010 0000 0 000 00000001 push i

00011011 0011 0 000 00000000 sub

00011100 1000 0 000 00000000 bz

00011101 0000 0 000 00000011 push y

00011110 0000 0 000 00000000 push sum

00011111 0010 0 000 00000000 add

00100000 0001 0 000 00000000 pop sum

00100001 0000 1 000 00000001 push #1

00100010 0000 0 000 00000001 push i

00100011 0010 0 000 00000000 add

00100100 0001 0 000 00000001 pop i

00100101 0000 1 000 00011000 push #lp

00100110 0111 0 000 00000000 jump


ISA Design Issues

I Number of registers
  I The more registers, the more operands that can be held in the processor without reading from memory, decreasing operand fetch latency.
  I However, the more registers in the processor, the bigger the processor circuit, making it slower.

I Word size
  I With a large word size, the machine can accommodate large data values, and it is easier to fit all of the fields of a machine instruction into a single word.
  I However, a large word size increases the size of registers and memory, slowing them down. Also, for small data, many of the word bits will be wasted.


ISA Design Issues (cont.)

I Variable or fixed length instructions
  I By using variable length instructions, you can better accommodate a varying number of operands on the machine.
  I However, variable length instructions are harder to fetch, and the circuitry is more complex than that needed if all of the instructions fit in a single word.

I Memory access
  I Allowing all instructions to fetch operands from memory eliminates the need to first load operands with separate direct mode instructions.
  I However, data manipulation instructions must be slowed down to allow time for the memory fetch. Also, it is a problem to fit memory addresses into instructions with several operands.



I Orthogonality – an instruction set is orthogonal if there is only one way to do any operation.
Example: if a machine has the two instructions

inc R2

add R2, #1

these two instructions are not orthogonal, because an increment can also be done with an addition instruction.

I Completeness – an instruction set is complete if every operation the user requires is in the ISA.

I Orthogonality and completeness are often at odds: completeness leads to large ISAs, and orthogonality tends to restrict the size of the ISA. Orthogonality can reduce the size of the processor, speeding up the machine. Completeness produces an instruction set that is easier to use.



I RISC (reduced instruction set computer) – Computers classified as RISC have small instruction sets with simple instructions. They tend to have instruction sets that are orthogonal, but harder to use (incomplete).

I CISC (complex instruction set computer) – Computers classified as CISC have large instruction sets, with complex instructions that combine operations. Their instruction sets are complete, but sacrifice orthogonality.



Architecture

I As we move from 3-operand to 0-operand machines, more instructions are needed to perform a programming task.

I However, instructions have more implicit operands (operands that are in fixed locations), and the fetching hardware becomes simpler. So, each instruction may execute faster.

I However, we need more memory fetches.

I Which architecture is best depends on a complex set of factors.


The BRIM Machine

The BRIM (basic register implicit machine) is the machine we use throughout the book. It is the register implicit machine, extended with a single I/O device and instructions to use it.

Assembly Code | Machine Code | Meaning
in R | 1010 0 R 00000000 | R ← Din
out γ | 1011 0 000 γ | Dout ← γ


Chapter 7: Hardwired CPU Design

I We construct the processor for the BRIM machine, and adapt it to other architectures.

I Processor design types:
  I Hardwired control.
  I Micro-programmed control.

I For hardwired design, the processor is designed as a sequential circuit at the RTL level, building the data-path and the control unit.

I For micro-programmed control, the processor is structured as a smaller processor (the sequencer), executing micro-instructions to perform machine instruction operations.

I We start with hardwired control.


Register Implicit Machine Design

I We use the bus-based architecture.

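I The BRIM machine has several registers, numbered 0–7. These are collected in a register file.

[Figure: the register file (Reg), with a 3-bit address input A, a 16-bit data input Din and output Dout, a load line LD, and an output enable line E.]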
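The sketch below is a behavioral model of the register file. It assumes that LD means "load on the clock edge" and E means "drive the output", as the figure's labels suggest. None stands in for the high-impedance state of a disabled tri-state output. The class and method names are hypothetical.

```python
class RegisterFile:
    """Eight 16-bit registers selected by a 3-bit address A (behavioral sketch)."""

    def __init__(self):
        self.regs = [0] * 8

    def clock(self, a, din, ld):
        """On a clock edge, store Din in register A only when LD is asserted."""
        if ld:
            self.regs[a & 0b111] = din & 0xFFFF

    def dout(self, a, e):
        """Drive register A onto Dout only when E is asserted; None models the tri-state 'off'."""
        return self.regs[a & 0b111] if e else None


rf = RegisterFile()
rf.clock(a=2, din=0x1234, ld=True)
rf.clock(a=3, din=0xBEEF, ld=False)     # LD off: R3 is unchanged
print(hex(rf.dout(2, e=True)), rf.dout(3, e=True), rf.dout(2, e=False))
```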


Bus-Based Architecture

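[Figure: the BRIM data-path. The register file (Reg), I/O device, PC, DR, SR, ALU, RAM, IR, AR, and Z flag all attach to a 16-bit data bus. The register file is addressed by IR10−8 or IR6−4 (chosen by SRF); AR drives the 8-bit RAM address; the mode MUX places a register value or an immediate field from the IR on the bus; and the IR and Z contents are fed back to the control unit as IRX and ZX. Each device has its own load and enable control lines.]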


Bus-Based Architecture (cont.)

I The figure shows the data bus.
  I We have eliminated the address bus and bus addressing. This is done by expanding the control bus to include dedicated read/write control lines for each device.

A bus with a single write line and bus addressing, versus a bus with dedicated write lines.

[Figure: on the left, devices D0 and D1 share one write line (Wt) and are selected by an address (A); on the right, each device has its own write line (Wt0, Wt1).]



I The diagram contains a mix of devices: some internal to the processor, like registers, and some external, like the memory unit.

I The control unit (CU) is not shown. It sends the control signals to the bus.

I Bus data is blocked from entering a register unless the register's load line is on.

I Register data is blocked from entering the bus by a tri-state switch.

I Bus connections are 16-bit, to transfer data, and 8-bit, to transfer addresses. For addresses, only the lower half of the bus connection is used.



The bus diagram contains an I/O device, a memory unit, a register file, and 6 processor registers.

I The Program Counter (PC): contains the address of the next instruction to be executed. (8 bits)

I The Destination Register (DR): the destination operand from the current instruction. (16 bits)

I The Source Register (SR): the source operand from the current instruction. (16 bits)

I The Instruction Register (IR): the current instruction, after it has been fetched from memory. (16 bits)

I The Address Register (AR): used to address the RAM unit. It is hardwired to the RAM address port. (8 bits)

I The Zero status flag (Z): a zero status flag. (1 bit)



Verifying that all instructions can be executed.

I Destination. The destination comes over the bus into the register file. The register file must be addressable using the dst field (IR bits 8 through 10).

I Source. The source comes from a register, an immediate value, or memory. The register file must be addressable by the src field (IR bits 4 through 6). The memory unit must be addressable by IR bits 0 through 7. It must also be possible to put an immediate value, IR bits 0 through 6, onto the bus.



I It must be possible to load the DR and SR registers from the bus, for the arithmetic and logic instructions.

I For the jump instructions, it must be possible to load the PC off of the bus.

I The Z flag is calculated with a NOR gate. The ALU is used to do arithmetic and logic operations.

I Connections must be present to the I/O device.



Bus control lines:
1. Register file load (RLD)
2. Write to memory (MW)
3. I/O load (IOLD)
4. Load program counter (PCLD)
5. Load destination register (DRLD)
6. Load source register (SRLD)
7. Load instruction register (IRLD)
8. Load address register (ARLD)
9. Load Z flag (ZLD)
10. Register file enable (RE)
11. Enable memory (ME)
12. Enable I/O device (IOE)
13. Select program counter (SPC)
14. Select arithmetic logic unit (SALU)
15. Select instruction address (SAD)
16. Select register file address (SRF)
17. Select source operand (SS)
18. Select destination operand (SD)
19. Increment program counter (PCIN)
20. ALU op-code; 3-bit (ALUOP)



Bus status lines:

1. Instruction register contents; 16-bit (IRX)

2. Z flag contents (ZX)

The CU needs access to the Z flag and the IR register. These are supplied by the status lines.

Components we have not covered:

I The ALU.

I The Mode MUX (selects the addressing mode).


The ALU

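A simple add-AND ALU. A MUX chooses which computational unit produces the result.

[Figure: the ALU drawn as a block with inputs A and B, control input ALUOP, and output Z; inside, an adder (+) and an AND unit feed a 2-to-1 MUX selected by ALUOP.]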


The ALU (cont.)

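The BRIM ALU. It has two units: an arithmetic unit (AU) and a logic unit (LU).

[Figure: the AU adds A to either B or its complement (chosen by ALUOP1) with carry-in ALUOP0; the LU is a 4-input MUX (options 3, 2, 1, 0) selected by ALUOP1−0; an output MUX selected by ALUOP2 chooses the AU or LU result as the output Z. All data paths are 16 bits.]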


The ALU (cont.)

The bits of the ALUOP are used to control the output MUX, the AU, and the LU.

ALUOP2 (unit) | ALUOP1 ALUOP0 (option) | Operation
1 – LU | 00 | Identity
1 – LU | 01 | NOT
1 – LU | 10 | OR
1 – LU | 11 | AND
0 – AU | 00 (SelectB = 0, Cin = 0) | Add
0 – AU | 11 (SelectB = 1, Cin = 1) | Sub


The ALU (cont.)

I ALUOP2 specifies the unit number: 0 for AU, 1 for LU.

I If the LU is being used, bits ALUOP1−0 give the operation number: 0 for identity, 1 for complement (NOT), 2 for OR, and 3 for AND.

I If the AU is used, ALUOP1 selects the B operand: 0 for just B, and 1 for the one's complement of B.

I If the AU is used, ALUOP0 gives the carry-in to the adder.

Examples

Addition: ALUOP = 000.
Subtraction (add the two's complement): ALUOP = 011.
OR: ALUOP = 110.
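The ALUOP decoding can be written as a short behavioral model. In this Python sketch, Python's unbounded integers are masked to 16 bits. The single-operand LU options (identity and NOT) are assumed to act on input A, since the text does not say which input they use. The function name alu is illustrative.

```python
MASK = 0xFFFF  # 16-bit data paths


def alu(aluop, a, b):
    """Return (result, z) for a 3-bit ALUOP: bit 2 picks the unit, bits 1-0 the option."""
    if aluop & 0b100:                       # LU: identity, NOT, OR, AND (single-operand ops use A)
        result = (a, ~a, a | b, a & b)[aluop & 0b11]
    else:                                   # AU: A + (B or its one's complement) + Cin
        select_b = (aluop >> 1) & 1
        cin = aluop & 1
        result = a + (~b if select_b else b) + cin
    result &= MASK
    return result, result == 0              # Z is the NOR of the 16 result bits


print(alu(0b000, 5, 3))            # add
print(alu(0b011, 5, 3))            # sub: 5 + ~3 + 1
print(alu(0b011, 3, 5))            # sub with a negative result, in two's complement
print(alu(0b110, 0b1010, 0b0101))  # OR
```

The third call shows subtraction by two's complement: 3 − 5 comes out as 65534 (0xFFFE), the 16-bit pattern for −2, and Z stays 0.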


The Mode MUX

BRIM addressing modes:

I Direct – So few instructions use direct mode that it can be implemented by the control circuits.

I Register direct – handled by the data-path mode MUX.

I Immediate – handled by the data-path mode MUX.

The mode MUX delivers either a register value (register direct) or an immediate value onto the bus for the source operand. It delivers a register value onto the bus for the destination operand.


The Mode MUX: Structure

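[Figure: structure of the mode MUX. Tri-state switches connect three 16-bit values to the bus: the register file output for the source (enabled by SS and the mode bit IR7), the register file output for the destination (enabled by SD), and IR6−0 padded with zeros to 16 bits (enabled by SS and IR7).]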


The Mode MUX: Structure (cont.)

I Inputs:
  I Register file.
  I IR register (the whole machine instruction).

I The output is sent to the bus.
  I Tri-state switches choose an option.
  I Options:
    I A register direct value from the register file. Two switches generate this option: one for the source, and one for the destination.
    I An immediate value from the machine instruction, extended to 16 bits.

I Source register direct is chosen when the SS line is on and the mode bit (IR7) is 0.

I Source immediate value is chosen when the SS line is on and the mode bit (IR7) is 1.

I Destination register direct is chosen when the SD line is on.
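The switch selection can be sketched in Python. It follows the micro-instruction SR ← if IR7 then IR6−0 else R[IR6−4], which is used for source operands. The sketch assumes the immediate is zero-extended and returns None when no switch is enabled. The function name mode_mux is hypothetical.

```python
def mode_mux(ir, regs, ss=False, sd=False):
    """Value the mode MUX drives onto the bus; None when no switch is enabled."""
    if ss:
        if (ir >> 7) & 1:                # IR7 = 1: immediate IR6-0, zero-extended
            return ir & 0x7F
        return regs[(ir >> 4) & 0b111]   # IR7 = 0: register direct R[IR6-4]
    if sd:
        return regs[(ir >> 8) & 0b111]   # destination register R[IR10-8]
    return None                          # all tri-state switches off


regs = [0, 42, 0, 0, 0, 0, 0, 0]
print(mode_mux(0b0000_0_000_0_001_0000, regs, ss=True))   # mov R0, R1 -> R1's value
print(mode_mux(0b0000_0_010_1_0000100, regs, ss=True))    # mov R2, #4 -> 4
```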

The RIM Control Unit

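[Figure: the CU sits between the sequencer and the data-path. It receives 8 timing signals from the sequencer and 5 status signals from the data-path. It sends 22 control signals to the data-path and 2 control signals back to the sequencer.]

Inputs to CU:

I 8 timing signals from the sequencer.

I 5 signals from the data-path: IRX15−12 (the op-code bits) and ZX.

Outputs: 24 bits: 22 bus signals and 2 sequencer control signals.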


The CU: The Machine Cycle

Fetching an Instruction
AR ← PC
IR ← M[AR], PC ← PC + 1

Decoding an Instruction
Decoding is a CU operation, not a data-path operation.

Executing an Instruction
Varies from instruction to instruction.
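The fetch sequence can be traced in Python. This is a sketch: the state dict, the fields helper, and the sample word (load R1, 127 placed at address 20) are assumptions. The control outputs named in the comments come from the control specification of F0 and F1. The PC is masked to 8 bits to match its width.

```python
def fetch(state, mem):
    """One BRIM fetch, two clock cycles (sketch; state is a dict of register values)."""
    # F0: AR <- PC                      (control outputs SPC, ARLD)
    state["AR"] = state["PC"]
    # F1: IR <- M[AR], PC <- PC + 1     (control outputs ME, IRLD, PCIN)
    state["IR"] = mem[state["AR"]]
    state["PC"] = (state["PC"] + 1) & 0xFF   # the PC is 8 bits wide
    return state


def fields(ir):
    """Split the IR into the register implicit machine's fields (the CU decodes these)."""
    return {"op": ir >> 12, "M": (ir >> 11) & 1, "dst": (ir >> 8) & 0b111, "src": ir & 0xFF}


mem = [0] * 256
mem[20] = 0b0000_1_001_01111111          # load R1, 127
state = fetch({"PC": 20}, mem)
print(state["PC"], fields(state["IR"]))
```

The fetch itself is identical for every instruction. Only after F1 does the op-code field select which execute sequence follows.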


The CU: Executing Instructions

Executing the movR/M Instruction

I Direct Mode
AR ← IR7−0
SR ← M[AR]

I Register Direct, or Immediate Mode
SR ← if IR7 then IR6−0 else R[IR6−4]

In both cases, the instruction finishes with R[IR10−8] ← SR.

Executing the store Instruction
AR ← IR7−0
M[AR] ← R[IR10−8]


The CU: Executing Instructions (cont.)

Executing the Arithmetic and Logic Instructions

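Operand Fetch:
SR ← if IR7 then IR6−0 else R[IR6−4]
DR ← R[IR10−8]

I Add.

$R[IR_{10-8}] \leftarrow DR + SR,\quad Z \leftarrow \overline{\bigvee_{i=0}^{15} (DR + SR)_i}$

I Sub.

$R[IR_{10-8}] \leftarrow DR - SR,\quad Z \leftarrow \overline{\bigvee_{i=0}^{15} (DR - SR)_i}$

I AND.

$R[IR_{10-8}] \leftarrow DR \wedge SR,\quad Z \leftarrow \overline{\bigvee_{i=0}^{15} (DR \wedge SR)_i}$

I OR.

$R[IR_{10-8}] \leftarrow DR \vee SR,\quad Z \leftarrow \overline{\bigvee_{i=0}^{15} (DR \vee SR)_i}$

I NOT.

$R[IR_{10-8}] \leftarrow \overline{DR},\quad Z \leftarrow \overline{\bigvee_{i=0}^{15} \overline{DR}_i}$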


The CU: Executing Instructions (cont.)

Executing the Branch Instructions
(They differ only in the condition under which the µ-instruction is performed.)

PC ← IR7−0

Executing the I/O Instructions

I In. R[IR10−8] ← Din

I Out. Dout ← if IR7 then IR6−0 else R[IR6−4]


The CU Behavioral Description

Overall structure of the CU:

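[Figure: overall structure of the CU. The sequencer holds a 3-bit counter C, controlled by the IN (increment) and CL (clear) lines. A time decoder turns C into timing signals T7−0, and an op decoder turns IRX15−12 into OP0−15. The control logic combines these with the Z status line (ZX) and drives the 22-line control bus to the data-path.]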


The CU Behavioral Description (cont.)

Flow-chart for the CU (each step is labeled with its node name; the subscript is the clock cycle within the machine cycle):

F0: AR ← PC
F1: IR ← M[AR], PC ← PC + 1
Dec: branch on the op-code, IR15−12.

I 0 (movR/M): M2: AR ← IR7−0. Then, depending on the mode bit IR11, either MD3: SR ← M[AR] (direct, IR11 = 1) or MRD3: SR ← IR7 ? IR6−0 : R[IR6−4] (IR11 = 0). Finally M4: R[IR10−8] ← SR.
I 1 (store): St2: AR ← IR7−0. St3: M[AR] ← R[IR10−8].
I 2 to 6 (add, sub, and, or, not): in step 2, DR ← R[IR10−8]; in step 3, SR ← IR7 ? IR6−0 : R[IR6−4]; in step 4, the result is written:
  I Ad4: R[IR10−8] ← DR + SR, Z ← (DR + SR) = 0
  I Sb4: R[IR10−8] ← DR − SR, Z ← (DR − SR) = 0
  I An4: R[IR10−8] ← DR & SR, Z ← (DR & SR) = 0
  I Or4: R[IR10−8] ← DR | SR, Z ← (DR | SR) = 0
  I N4: R[IR10−8] ← !DR, Z ← (!DR) = 0
I 7 (bz): Bz2: if Z = 1 then PC ← IR7−0.
I 8 (bnz): Bn2: if Z = 0 then PC ← IR7−0.
I 9 (jump): J2: PC ← IR7−0.
I 10 (in): I2: R[IR10−8] ← Din.
I 11 (out): Ot2: Dout ← IR7 ? IR6−0 : R[IR6−4].

Every branch then returns to F0.


The CU: Flow-Chart

I Each square node contains a µ-instruction. Each diamond contains a control decision.

I Subscripting on node names indicates the clock cycle in the machine cycle on which that µ-instruction is performed.

I Fetch is performed in stages F0 and F1.

I Decode is done in stage Dec. Control branches to one of several nodes, based on the value of the op-code.

I Branch 0 implements the movR/M instruction. A sub-branch handles either direct addressing mode or register/immediate mode.


The CU: Flow-Chart (cont.)

I Branch 1 handles the store instruction.

I Branches 2 – 6 implement the 5 ALU instructions.

I Branches 7 and 8 implement the conditional branches, bz and bnz. Each branch contains a sub-branch that either causes the branch or does nothing, based on the value of the Z flag.

I Branch 9 performs a branch for the jump instruction.

I Branches 10 and 11 perform the µ-code for the in and out instructions.

I After executing a machine language instruction, the flow-chart returns to stage F0 to begin the next instruction.


The CU: Stage Control

I The CU knows what stage it is in from the values of its control inputs: the op-code, indicated by the inputs OP15−0 from the op-decoder; and the timing step, indicated by the inputs T7−0 from the sequencer.

I The stage is updated by changing the value of the sequencer counter.

I For square nodes in the flow-chart, the counter C is either incremented to the next stage in a sequence, or cleared, to send control back to stage F0.


The CU: Stage Control (cont.)

Phase          Control Signals       Sequencer Control
F0             T0                    C ← C + 1
F1             T1                    C ← C + 1
M2             T2 · OP0              C ← C + 1
St2            T2 · OP1              C ← C + 1
Ad2            T2 · OP2              C ← C + 1
Sb2            T2 · OP3              C ← C + 1
An2            T2 · OP4              C ← C + 1
Or2            T2 · OP5              C ← C + 1
N2             T2 · OP6              C ← C + 1
Bz2            T2 · OP7 · ZX
Bn2            T2 · OP8 · ¬ZX
Bz2 ∨ ¬Bz2     T2 · OP7              C ← 0
Bn2 ∨ ¬Bn2     T2 · OP8              C ← 0
J2             T2 · OP9              C ← 0
I2             T2 · OP10             C ← 0
Ot2            T2 · OP11             C ← 0
MD3            T3 · OP0 · IRX11      C ← C + 1
MRD3           T3 · OP0 · ¬IRX11     C ← C + 1
St3            T3 · OP1              C ← 0
Ad3            T3 · OP2              C ← C + 1
Sb3            T3 · OP3              C ← C + 1
An3            T3 · OP4              C ← C + 1
Or3            T3 · OP5              C ← C + 1
N3             T3 · OP6              C ← C + 1
M4             T4 · OP0              C ← 0
Ad4            T4 · OP2              C ← 0
Sb4            T4 · OP3              C ← 0
An4            T4 · OP4              C ← 0
Or4            T4 · OP5              C ← 0
N4             T4 · OP6              C ← 0

(¬ marks a complemented signal. Bz2 ∨ ¬Bz2 and Bn2 ∨ ¬Bn2 indicate actions that are taken as part of a conditional branch, whether or not the branch is taken.)


The CU: Full Control Specification

Phase   Micro-instruction                                 Control Output
F0      T0 : AR ← PC, C ← C + 1                           ARLD, SPC, CIN
F1      T1 : IR ← M[AR], PC ← PC + 1, C ← C + 1           IRLD, ME, PCIN, CIN
M2      T2 · OP0 : AR ← IR7−0, C ← C + 1                  ARLD, SAD, CIN
MD3     T3 · OP0 · IRX11 : SR ← M[AR], C ← C + 1          SRLD, ME, CIN
MRD3    T3 · OP0 · ¬IRX11 : SR ← if IR7 then IR6−0        SRLD, RE, SS, CIN
            else R[IR6−4], C ← C + 1
M4      T4 · OP0 : R[IR10−8] ← SR, C ← 0                  RLD, SRF, SALU, CCL, ALUOP = 100
St2     T2 · OP1 : AR ← IR7−0, C ← C + 1                  ARLD, SAD, CIN
St3     T3 · OP1 : M[AR] ← R[IR10−8], C ← 0               MW, RE, SRF, SD, CCL
Ad2     T2 · OP2 : DR ← R[IR10−8], C ← C + 1              DRLD, RE, SD, SRF, CIN
Ad3     T3 · OP2 : SR ← if IR7 then IR6−0                 SRLD, RE, SS, CIN
            else R[IR6−4], C ← C + 1


The CU: Full Control Specification (cont.)

Phase        Micro-instruction                                 Control Output
Ad4          T4 · OP2 : R[IR10−8] ← SR + DR,                   RLD, SRF, SALU, ZLD, ALUOP = 000, CCL
                 Z ← (SR + DR) = 0, C ← 0
Sb2          T2 · OP3 : DR ← R[IR10−8], C ← C + 1              DRLD, RE, SD, SRF, CIN
Sb3          T3 · OP3 : SR ← if IR7 then IR6−0                 SRLD, RE, SS, CIN
                 else R[IR6−4], C ← C + 1
Sb4          T4 · OP3 : R[IR10−8] ← SR − DR,                   RLD, SRF, SALU, ZLD, ALUOP = 011, CCL
                 Z ← (SR − DR) = 0, C ← 0
An2          T2 · OP4 : DR ← R[IR10−8], C ← C + 1              DRLD, RE, SD, SRF, CIN
An3          T3 · OP4 : SR ← if IR7 then IR6−0                 SRLD, RE, SS, CIN
                 else R[IR6−4], C ← C + 1
An4          T4 · OP4 : R[IR10−8] ← SR ∧ DR,                   RLD, SRF, SALU, ZLD, ALUOP = 111, CCL
                 Z ← (SR ∧ DR) = 0, C ← 0
Or2          T2 · OP5 : DR ← R[IR10−8], C ← C + 1              DRLD, RE, SD, SRF, CIN
Or3          T3 · OP5 : SR ← if IR7 then IR6−0                 SRLD, RE, SS, CIN
                 else R[IR6−4], C ← C + 1
Or4          T4 · OP5 : R[IR10−8] ← SR ∨ DR,                   RLD, SRF, SALU, ZLD, ALUOP = 110, CCL
                 Z ← (SR ∨ DR) = 0, C ← 0
N2           T2 · OP6 : DR ← R[IR10−8], C ← C + 1              DRLD, RE, SD, SRF, CIN
N3           T3 · OP6 : SR ← if IR7 then IR6−0                 SRLD, RE, SS, CIN
                 else R[IR6−4], C ← C + 1
N4           T4 · OP6 : R[IR10−8] ← ¬DR,                       RLD, SRF, SALU, ZLD, ALUOP = 101, CCL
                 Z ← (¬DR) = 0, C ← 0
Bz2          T2 · OP7 · ZX : PC ← IR7−0                        PCLD, SAD
Bz2 ∨ ¬Bz2   T2 · OP7 : C ← 0                                  CCL
Bn2          T2 · OP8 · ¬ZX : PC ← IR7−0                       PCLD, SAD
Bn2 ∨ ¬Bn2   T2 · OP8 : C ← 0                                  CCL
J2           T2 · OP9 : PC ← IR7−0, C ← 0                      PCLD, SAD, CCL
I2           T2 · OP10 : R[IR10−8] ← Din, C ← 0                RLD, SRF, IOE, CCL
Ot2          T2 · OP11 : Dout ← if IR7 then IR6−0              SS, RE, IOLD, CCL
                 else R[IR6−4], C ← 0

The table gives the stage, the µ-instruction performed, and the control signals output by the CU to realize the µ-instruction.

Equations for the outputs can be derived from the table. For each output signal, OR together the input conditions of every row in which that signal occurs.
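The derivation rule is mechanical, so it can be sketched in a few lines of Python. The row list below is a hand-copied subset of the table. Writing ¬ for a complemented condition is a notational assumption. The function name is hypothetical. For a given signal, the function ORs together the conditions of every row that asserts it.

```python
# A subset of the full control specification: (stage, input condition, output signals).
ROWS = [
    ("F0",   "T0",            {"ARLD", "SPC", "CIN"}),
    ("F1",   "T1",            {"IRLD", "ME", "PCIN", "CIN"}),
    ("M2",   "T2·OP0",        {"ARLD", "SAD", "CIN"}),
    ("MD3",  "T3·OP0·IRX11",  {"SRLD", "ME", "CIN"}),
    ("MRD3", "T3·OP0·¬IRX11", {"SRLD", "RE", "SS", "CIN"}),
    ("St2",  "T2·OP1",        {"ARLD", "SAD", "CIN"}),
]

def output_equation(signal, rows):
    """Sum of products for `signal`: OR the conditions of every row that asserts it."""
    return " + ".join(cond for _, cond, outs in rows if signal in outs)

print("ARLD =", output_equation("ARLD", ROWS))   # T0 + T2·OP0 + T2·OP1
print("ME   =", output_equation("ME", ROWS))     # T1 + T3·OP0·IRX11
```

Factoring the raw sum gives the forms used in the circuitry table. For example, T2 · OP0 + T2 · OP1 = T2(OP0 + OP1).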


The Control Circuitry

Output Signal   Formula
RLD      T2 · OP10 + T4(OP0 + OP2 + OP3 + OP4 + OP5 + OP6)
MW       T3 · OP1
IOLD     T2 · OP11
PCLD     T2(OP7 · ZX + OP8 · ¬ZX + OP9)
DRLD     T2(OP2 + OP3 + OP4 + OP5 + OP6)
SRLD     T3(OP0 + OP2 + OP3 + OP4 + OP5 + OP6)
IRLD     T1
ARLD     T0 + T2(OP0 + OP1)
ZLD      T4(OP2 + OP3 + OP4 + OP5 + OP6)
RE       T2(OP2 + OP3 + OP4 + OP5 + OP6 + OP11)
           + T3(OP0 · ¬IRX11 + OP1 + OP2 + OP3 + OP4 + OP5 + OP6)
ME       T1 + T3 · OP0 · IRX11
IOE      T2 · OP10
SPC      T0
SALU     T4(OP0 + OP2 + OP3 + OP4 + OP5 + OP6)
SAD      T2(OP0 + OP1 + OP7 · ZX + OP8 · ¬ZX + OP9)
SRF      T2(OP2 + OP3 + OP4 + OP5 + OP6 + OP10) + T3 · OP1
           + T4(OP0 + OP2 + OP3 + OP4 + OP5 + OP6)
SS       T2 · OP11 + T3(OP0 · ¬IRX11 + OP2 + OP3 + OP4 + OP5 + OP6)
SD       T2(OP2 + OP3 + OP4 + OP5 + OP6) + T3 · OP1
PCIN     T1
ALUOP0   T4(OP3 + OP4 + OP6)
ALUOP1   T4(OP3 + OP4 + OP5)
ALUOP2   T4(OP0 + OP4 + OP5 + OP6)
CIN      T0 + T1 + T2(OP0 + OP1 + OP2 + OP3 + OP4 + OP5 + OP6)
           + T3(OP0 + OP2 + OP3 + OP4 + OP5 + OP6)
CCL      T2(OP7 + OP8 + OP9 + OP10 + OP11) + T3 · OP1
           + T4(OP0 + OP2 + OP3 + OP4 + OP5 + OP6)


The Control Circuitry (cont.)


Examples

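SRLD occurs in stages MD3, MRD3, Ad3, Sb3, An3, Or3, and N3.

This corresponds to the input signals T3 · OP0 · IRX11, T3 · OP0 · ¬IRX11, T3 · OP2, T3 · OP3, T3 · OP4, T3 · OP5, and T3 · OP6. Since IRX11 + ¬IRX11 = 1, the first two terms combine to T3 · OP0, which gives SRLD = T3(OP0 + OP2 + OP3 + OP4 + OP5 + OP6).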


Control for the Register Machine

Data-path modification.

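[Figure: modified data-path. The register-address MUX (select SRF, 2 bits) chooses among IR10-8, IR6-4, and IR2-0. A mode MUX (selects SD, SS1, SS2) drives the 16-bit operand bus. Also shown: the register file (RE, RLD), IR (IRLD), IRX, and SAD.]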


Control for the Register Machine (cont.)

Data-path modifications:

I The register addressing MUX allows register numbers from the dst field, the src1 field, or the src2 field.

I The mode MUX now selects between three inputs: the register direct destination (SD); an immediate value or register direct operand for the first source (SS1); and an immediate value or register direct operand for the second source (SS2).
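The three register fields the addressing MUX chooses between can be extracted from an instruction word with shifts and masks. In this Python sketch the field positions IR10−8, IR6−4 and IR2−0 come from the slides. Two things are assumptions: the op-code position IR15−12 (the decoder has 16 outputs, OP15−0, which suggests a 4-bit op-code) and the function names.

```python
def field(word, hi, lo):
    """Extract bits hi..lo (inclusive) of an instruction word, e.g. IR10-8."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_register_format(ir):
    """Split a 16-bit IR into the fields the register-address MUX selects from.

    The op-code position (IR15-12) is an assumption, not taken from the slides.
    """
    return {
        "op":   field(ir, 15, 12),
        "dst":  field(ir, 10, 8),   # destination register (SRF = 10)
        "src1": field(ir, 6, 4),    # first source register (SRF = 01)
        "src2": field(ir, 2, 0),    # second source register (SRF = 00)
    }

print(decode_register_format(0x2123))   # {'op': 2, 'dst': 1, 'src1': 2, 'src2': 3}
```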




Phase        Micro-instruction                              Control Output
L2           T2 · OP0 : AR ← IR7−0, C ← C + 1               ARLD, SAD, CIN
L3           T3 · OP0 : SR ← M[AR], C ← C + 1               SRLD, ME, CIN
L4           T4 · OP0 : R[IR10−8] ← SR, C ← 0               RLD, SALU, SRF = 10, CCL, ALUOP = 100
St2          T2 · OP1 : AR ← IR7−0, C ← C + 1               ARLD, SAD, CIN
St3          T3 · OP1 : M[AR] ← R[IR10−8], C ← 0            MW, RE, SRF = 10, SD, CCL
Ad2          T2 · OP2 : DR ← R[IR6−4], C ← C + 1            DRLD, RE, SRF = 01, SS1, CIN
Ad3          T3 · OP2 : SR ← R[IR2−0], C ← C + 1            SRLD, RE, SRF = 00, SS2, CIN
Ad4          T4 · OP2 : R[IR10−8] ← SR + DR,                RLD, SRF = 10, SALU, ZLD, CCL, ALUOP = 000
                 Z ← (SR + DR) = 0, C ← 0
Sb2          T2 · OP3 : DR ← R[IR6−4], C ← C + 1            DRLD, RE, SRF = 01, SS1, CIN
Sb3          T3 · OP3 : SR ← R[IR2−0], C ← C + 1            SRLD, RE, SRF = 00, SS2, CIN
Sb4          T4 · OP3 : R[IR10−8] ← SR − DR,                RLD, SRF = 10, SALU, ZLD, CCL, ALUOP = 011
                 Z ← (SR − DR) = 0, C ← 0
An2          T2 · OP4 : DR ← R[IR6−4], C ← C + 1            DRLD, RE, SRF = 01, SS1, CIN
An3          T3 · OP4 : SR ← R[IR2−0], C ← C + 1            SRLD, RE, SRF = 00, SS2, CIN
An4          T4 · OP4 : R[IR10−8] ← SR ∧ DR,                RLD, SRF = 10, SALU, ZLD, CCL, ALUOP = 111
                 Z ← (SR ∧ DR) = 0, C ← 0
Or2          T2 · OP5 : DR ← R[IR6−4], C ← C + 1            DRLD, RE, SRF = 01, SS1, CIN
Or3          T3 · OP5 : SR ← R[IR2−0], C ← C + 1            SRLD, RE, SRF = 00, SS2, CIN
Or4          T4 · OP5 : R[IR10−8] ← SR ∨ DR,                RLD, SRF = 10, SALU, ZLD, CCL, ALUOP = 110
                 Z ← (SR ∨ DR) = 0, C ← 0
N2           T2 · OP6 : DR ← R[IR6−4], C ← C + 1            DRLD, RE, SRF = 01, SS1, CIN
N3           T3 · OP6 : SR ← R[IR2−0], C ← C + 1            SRLD, RE, SRF = 00, SS2, CIN
N4           T4 · OP6 : R[IR10−8] ← ¬DR,                    RLD, SRF = 10, SALU, ZLD, CCL, ALUOP = 101
                 Z ← (¬DR) = 0, C ← 0
Bz2          T2 · OP8 · ZX : PC ← IR7−0                     PCLD, SAD
Bz2 ∨ ¬Bz2   T2 · OP8 : C ← 0                               CCL
Bn2          T2 · OP9 · ¬ZX : PC ← IR7−0                    PCLD, SAD
Bn2 ∨ ¬Bn2   T2 · OP9 : C ← 0                               CCL
J2           T2 · OP7 : PC ← IR7−0, C ← 0                   PCLD, SAD, CCL


Control for the Accumulator Machine

Data-path modification: the accumulator machine uses the same data-path as the register implicit machine, except that the SR register is renamed the AC.


Phase   Micro-instruction                            Control Output
L2      T2 · OP0 : AR ← IR7−0, C ← C + 1             ARLD, SAD, CIN
L3      T3 · OP0 : AC ← M[AR], C ← 0                 ACLD, ME, CCL
St2     T2 · OP1 : AR ← IR7−0, C ← C + 1             ARLD, SAD, CIN
St3     T3 · OP1 : M[AR] ← AC, C ← 0                 MW, SALU, ALUOP = 100, CCL
Ad2     T2 · OP2 : DR ← R[IR2−0], C ← C + 1          DRLD, RE, SS, CIN
Ad3     T3 · OP2 : AC ← AC + DR,                     ACLD, SALU, ZLD, CCL, ALUOP = 000
            Z ← (AC + DR) = 0, C ← 0
Sb2     T2 · OP3 : DR ← R[IR2−0], C ← C + 1          DRLD, RE, SS, CIN
Sb3     T3 · OP3 : AC ← AC − DR,                     ACLD, SALU, ZLD, CCL, ALUOP = 011
            Z ← (AC − DR) = 0, C ← 0


Control for the Accumulator Machine (cont.)

Phase        Micro-instruction                            Control Output
An2          T2 · OP4 : DR ← R[IR2−0], C ← C + 1          DRLD, RE, SS, CIN
An3          T3 · OP4 : AC ← AC ∧ DR,                     ACLD, SALU, ZLD, CCL, ALUOP = 111
                 Z ← (AC ∧ DR) = 0, C ← 0
Or2          T2 · OP5 : DR ← R[IR2−0], C ← C + 1          DRLD, RE, SS, CIN
Or3          T3 · OP5 : AC ← AC ∨ DR,                     ACLD, SALU, ZLD, CCL, ALUOP = 110
                 Z ← (AC ∨ DR) = 0, C ← 0
N2           T2 · OP6 : DR ← R[IR2−0], C ← C + 1          DRLD, RE, SS, CIN
N3           T3 · OP6 : AC ← ¬AC,                         ACLD, SALU, ZLD, CCL, ALUOP = 101
                 Z ← (¬AC) = 0, C ← 0
Bz2          T2 · OP8 · ZX : PC ← IR7−0                   PCLD, SAD
Bz2 ∨ ¬Bz2   T2 · OP8 : C ← 0                             CCL
Bn2          T2 · OP9 · ¬ZX : PC ← IR7−0                  PCLD, SAD
Bn2 ∨ ¬Bn2   T2 · OP9 : C ← 0                             CCL
J2           T2 · OP7 : PC ← IR7−0, C ← 0                 PCLD, SAD, CCL


Control for the Stack Machine

Data-path modifications:

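[Figure: stack-machine data-path. The register file (RE, RLD) is addressed by the 3-bit stack top pointer TP, which is controlled by TPIN and TPDE. Also shown: IR (IRLD), IRX, SAD, and 16-bit buses.]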


Control for the Stack Machine (cont.)

I The register file is now addressed only by the stack top pointer (TP).

I The TP register is an up-down counter, capable of being incremented for the pop operation and decremented for the push operation.




Phase          Micro-instruction                              Control Output
Pu2            T2 · OP0 : AR ← IR7−0, TP ← TP − 1,            ARLD, SAD, TPDE, CIN
                   C ← C + 1
PuM3           T3 · OP0 · IRX11 : R[TP] ← M[AR]               RLD, ME
PuI3           T3 · OP0 · ¬IRX11 : R[TP] ← IR7−0              RLD, SAD
PuM3 ∨ PuI3    T3 · OP0 : C ← 0                               CCL
Po2            T2 · OP1 : AR ← IR7−0, C ← C + 1               ARLD, SAD, CIN
Po3            T3 · OP1 : M[AR] ← R[TP], TP ← TP + 1,         MW, RE, TPIN, CCL
                   C ← 0
Ad2            T2 · OP2 : DR ← R[TP], TP ← TP + 1,            DRLD, RE, TPIN, CIN
                   C ← C + 1
Ad3            T3 · OP2 : SR ← R[TP], C ← C + 1               SRLD, RE, CIN
Ad4            T4 · OP2 : R[TP] ← SR + DR,                    RLD, SALU, ZLD, CCL, ALUOP = 000
                   Z ← (SR + DR) = 0, C ← 0
Sb2            T2 · OP3 : DR ← R[TP], TP ← TP + 1,            DRLD, RE, TPIN, CIN
                   C ← C + 1
Sb3            T3 · OP3 : SR ← R[TP], C ← C + 1               SRLD, RE, CIN
Sb4            T4 · OP3 : R[TP] ← SR − DR,                    RLD, SALU, ZLD, CCL, ALUOP = 011
                   Z ← (SR − DR) = 0, C ← 0
An2            T2 · OP4 : DR ← R[TP], TP ← TP + 1,            DRLD, RE, TPIN, CIN
                   C ← C + 1
An3            T3 · OP4 : SR ← R[TP], C ← C + 1               SRLD, RE, CIN
An4            T4 · OP4 : R[TP] ← SR ∧ DR,                    RLD, SALU, ZLD, CCL, ALUOP = 111
                   Z ← (SR ∧ DR) = 0, C ← 0
Or2            T2 · OP5 : DR ← R[TP], TP ← TP + 1,            DRLD, RE, TPIN, CIN
                   C ← C + 1
Or3            T3 · OP5 : SR ← R[TP], C ← C + 1               SRLD, RE, CIN
Or4            T4 · OP5 : R[TP] ← SR ∨ DR,                    RLD, SALU, ZLD, CCL, ALUOP = 110
                   Z ← (SR ∨ DR) = 0, C ← 0
N2             T2 · OP6 : DR ← R[TP], C ← C + 1               DRLD, RE, CIN
N3             T3 · OP6 : R[TP] ← ¬DR,                        RLD, SALU, ZLD, CCL, ALUOP = 101
                   Z ← (¬DR) = 0, C ← 0
Bz2            T2 · OP8 · ZX : PC ← R[TP], TP ← TP + 1        PCLD, RE, TPIN
Bz2 ∨ ¬Bz2     T2 · OP8 : C ← 0                               CCL
Bn2            T2 · OP9 · ¬ZX : PC ← R[TP], TP ← TP + 1       PCLD, RE, TPIN
Bn2 ∨ ¬Bn2     T2 · OP9 : C ← 0                               CCL
J2             T2 · OP7 : PC ← R[TP], TP ← TP + 1, C ← 0      PCLD, RE, TPIN, CCL

I The movR/M instruction has been replaced by the push instruction, and the store instruction has been replaced by the pop instruction.

I Branch instructions get the target address off the top of the stack, instead of from an immediate operand.
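A toy Python model of the stack data-path makes the TP bookkeeping concrete. It is a sketch under assumptions: the register file has 8 entries (TP appears as a 3-bit counter in the figure), TP starts at 0, words are 16 bits, and the class and method names are hypothetical. The `alu` method follows the micro-instruction order above. It pops the top into DR, reads the new top into SR, and overwrites that entry with SR op DR.

```python
import operator

class StackMachine:
    """Toy stack data-path: an 8-entry register file addressed only by TP."""

    def __init__(self):
        self.R = [0] * 8
        self.TP = 0          # assumed starting value
        self.Z = False

    def push(self, value):               # Pu2: TP <- TP - 1; Pu3: R[TP] <- value
        self.TP = (self.TP - 1) % 8      # 3-bit up-down counter wraps around
        self.R[self.TP] = value

    def pop(self):                       # Po3: value <- R[TP], TP <- TP + 1
        value = self.R[self.TP]
        self.TP = (self.TP + 1) % 8
        return value

    def alu(self, op):
        dr = self.pop()                  # stage 2: DR <- R[TP], TP <- TP + 1
        sr = self.R[self.TP]             # stage 3: SR <- R[TP]
        result = op(sr, dr) & 0xFFFF     # stage 4: R[TP] <- SR op DR (16-bit word)
        self.R[self.TP] = result
        self.Z = result == 0

m = StackMachine()
m.push(5)
m.push(3)
m.alu(operator.sub)                      # computes 5 - 3, since SR is the older value
print(m.R[m.TP], m.TP, m.Z)              # 2 7 False
```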


Chapter 8: Computer Arithmetic

Computer arithmetic is performed by the ALU, or ALSU (arithmetic, logic, and shift unit). Operations:

I Arithmetic.

I Logic (bit-wise Boolean).

I Shift.

Shift operations are performed by shift units, and logic operations by arrays of logic gates.

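[Figure: left, a 4-bit shift-left unit (shl) mapping inputs A3–A0 to outputs Z3–Z0, with a 0 shifted in; right, a 4-bit logic-gate array combining A3–A0 with B3–B0 bit by bit into Z3–Z0.]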
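Both units in the figure are pure wiring, with no carries between bit positions. The Python sketch below models them on LSB-first bit lists, so that `bits[i]` is Ai. The gate type in the array is not recoverable from the figure, so AND is used as an example. The helper names are hypothetical.

```python
def to_bits(x, width=4):
    """LSB-first list of bits, so bits[i] is Ai."""
    return [(x >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

def shl_unit(a_bits):
    """Shift-left wiring: Z0 <- 0 and Zi <- A(i-1); the top bit A3 falls off."""
    return [0] + a_bits[:-1]

def gate_array(a_bits, b_bits, gate):
    """One gate per bit position: Zi <- gate(Ai, Bi)."""
    return [gate(a, b) for a, b in zip(a_bits, b_bits)]

print(from_bits(shl_unit(to_bits(0b1011))))                                  # 6 (0110)
print(from_bits(gate_array(to_bits(0b1100), to_bits(0b1010), lambda x, y: x & y)))  # 8 (1000)
```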


Arithmetic Operations

Arithmetic is performed on elements of numeric data-types. Numeric data-types:

I Integer data
  I Unsigned integer
  I Signed integer

I Floating-point data


Unsigned and Signed Integers

I With unsigned integers, all bit configurations of the word are considered to represent non-negative integers.

I With signed integers, half of the bit configurations represent negative integers, and half represent non-negative integers.

I Unsigned integers are used for representing and manipulating addresses, which are all non-negative.

I Signed integers are used to implement general integer arithmetic.


Unsigned and Signed Integers (cont.)

I Example: the 8-bit configuration 11110100, interpreted as an unsigned value, is the decimal number 244. Interpreted as a signed integer, however, it represents −12.

I The processor treats a bit configuration according to the instruction it is executing.

addu R0, R1 ; treat the values in R0, and R1 as unsigned

addi R0, R1 ; treat the values in R0, and R1 as signed


Unsigned and Signed Integers (cont.)

I Unsigned integers (8-bit): 00000000 to 11111111 (0 to 2^8 − 1 = 255).

I Signed integers (8-bit): the top bit determines the sign.
  I 0 (non-negative): 00000000 to 01111111 (0 to 2^7 − 1 = 127).
  I 1 (negative): 11111111 to 10000000 (−1 to −2^7 = −128).

I Negative numbers proceed towards negative infinity by counting down in binary: 11111111 (−1), 11111110 (−2), 11111101 (−3), 11111100 (−4), ...

I Notice that there is one more negative number than there are positive numbers, since one non-negative configuration is taken by 0.
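Both readings of a bit pattern, and the 8-bit ranges above, can be checked with a few lines of Python. This is a sketch. The function names are hypothetical, and the signed reading assumes two's complement, as in the examples.

```python
def as_signed(bits, width=8):
    """Two's-complement reading of a width-bit pattern (the unsigned reading is just `bits`)."""
    return bits - (1 << width) if bits >> (width - 1) else bits

def unsigned_range(width):
    return 0, 2 ** width - 1

def signed_range(width):
    return -(2 ** (width - 1)), 2 ** (width - 1) - 1

x = 0b11110100
print(x, as_signed(x))                       # 244 -12
print(unsigned_range(8), signed_range(8))    # (0, 255) (-128, 127)
# Counting down from 11111111 walks through -1, -2, -3, -4:
print([as_signed(p) for p in (0b11111111, 0b11111110, 0b11111101, 0b11111100)])
```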


Unsigned Arithmetic

Operations:

I Addition: Z ← A + B

I Subtraction: Z ← A − B

I Multiplication: Z ← A × B

I Division: Z ← A ÷ B

I Remainder: Z ← A mod B


Unsigned Addition

Binary addition requires two operands and a carry-in as input, and produces a sum and a carry-out as output. The carry-out is discarded from the result, but it serves a purpose: detecting overflow.

Examples (Demonstrating overflow.)

52 + 141 = 193

Carries:  0 01111000
            00110100   (52)
          + 10001101   (141)
            --------
            11000001   (193, carry-out 0)


Unsigned Addition (cont.)

Examples (Demonstrating overflow. (cont.))

151 + 116 = 11 (something is wrong: arithmetic overflow)

Carries:  1 11101000
            10010111   (151)
          + 01110100   (116)
            --------
            00001011   (11, carry-out 1)

The processor typically sets a status flag (V) when there is overflow. For unsigned addition, overflow occurs when there is a non-zero carry-out:

V = Cout
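The adder and its overflow rule can be sketched in Python as a chain of full adders, one bit position at a time. The function names are hypothetical. The two calls reproduce the examples above.

```python
def ripple_add(a, b, cin=0, width=8):
    """Add two width-bit values one bit position at a time, like a chain of
    full adders. Returns (sum, carry-out)."""
    s, c = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i                 # full-adder sum bit
        c = (ai & bi) | (ai & c) | (bi & c)     # carry into the next position
    return s, c

def add_unsigned(a, b):
    """Unsigned 8-bit addition: returns (sum, V) with V = Cout."""
    s, cout = ripple_add(a, b)
    return s, cout == 1

print(add_unsigned(52, 141))    # (193, False)
print(add_unsigned(151, 116))   # (11, True)
```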


Unsigned Subtraction

Subtraction is done by the adder circuit. That is,

Z ← A − B

is implemented as

Z ← A + (−B)

(The A operand is added to the two's complement of the B operand, which is the one's complement of B plus a carry-in of 1.)

[Figure: subtraction on the adder, with inputs A and B, carry-in Cin = 1, and outputs S and Cout.]


Unsigned Subtraction (cont.)

Examples (Overflow)

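55 − 14 = 41

Carries:  1 11101111
            00110111   (55)
          + 11110001   (one's complement of 14, with a carry-in of 1)
            --------
            00101001   (41, carry-out 1)

57 − 112 = −55 (something is wrong: we cannot represent negative values as unsigned integers)

Carries:  0 01111111
            00111001   (57)
          + 10001111   (one's complement of 112, with a carry-in of 1)
            --------
            11001001   (201 = 256 − 55, carry-out 0)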

In unsigned subtraction, overflow occurs when the carry-out is zero: V = ¬Cout
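Subtraction reuses the same adder, with B complemented and a carry-in of 1. In this Python sketch (function names hypothetical) only the overflow test changes: V is the complement of the carry-out.

```python
def ripple_add(a, b, cin=0, width=8):
    """Chain of full adders; returns (sum, carry-out)."""
    s, c = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (ai & c) | (bi & c)
    return s, c

def sub_unsigned(a, b, width=8):
    """A - B on the adder: A + (one's complement of B) with a carry-in of 1.
    Returns (difference, V) with V = NOT Cout."""
    ones_complement = ~b & ((1 << width) - 1)
    s, cout = ripple_add(a, ones_complement, cin=1, width=width)
    return s, cout == 0

print(sub_unsigned(55, 14))    # (41, False)
print(sub_unsigned(57, 112))   # (201, True): the pattern 11001001
```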


Unsigned Multiplication

6 × 11 = 66

            0110   (6)
          × 1011   (11)
          ------
            0110
           0110
          0000
       + 0110
       ---------
        01000010   (66)

The intermediate rows are either shifted copies of the multiplicand or rows of zeros.


Unsigned Multiplication (cont.)

A method for the ALSU (4-bit multiplication):

I We use three registers:
  I M: the multiplier, 4 bits wide. For k-bit multiplication, this register would be k bits wide.
  I N: the multiplicand, 8 bits wide. For k-bit multiplication, this register would be 2k bits wide.
  I P: the product, 8 bits wide. For k-bit multiplication, this register would be 2k bits wide.

(The N and P registers must be the same size, since the N register is added to the P register.)

I We shift the product right and hold the multiplicand stationary, rather than shifting the multiplicand left and holding the product stationary.

I Rather than save up intermediate rows, we add the multiplicand to the product immediately.



N          M      P          Action
01100000   1011   00000000   +
                  01100000   →
           0101   00110000   +
                  10010000   →
           0010   01001000   →
           0001   00100100   +
                  10000100   →
           0000   01000010   X

(Trace of 6 × 11. +: add N to P; →: shift M and P right one bit; X: stop, with the product in P.)



I The multiplicand starts in the top half of the N register. It remains constant throughout the algorithm.

I A loop is executed. Each time through the loop, the M and P registers are shifted one bit to the right.

I On iterations where the bottom bit of the multiplier is 1, the N register is added to the P register before the shift.

I The iterations stop when 4 shifts have been performed.



Algorithm:

N7−4 = N3−0
N3−0 = 0
P = 0
I = 0
while I ≠ 4 do
    if M0 then
        P = P + N
    M = M >> 1
    P = P >> 1
    I = I + 1



µ-Program:

Def: I4X ≡ (I = 4), Tk ≡ (C = k)

T0 : N7−4 ← N3−0, N3−0 ← 0, P ← 0, I ← 0, C ← C + 1
T1 · I4X : C ← 4
T1 · ¬I4X : C ← 2
T2 · M0 : P ← P + N
T2 : C ← C + 1
T3 : M ← shr M, P ← shr P, I ← I + 1, C ← 1
T4 : (done; the product is in P)

Notice that while addition can be performed in one clock cycle, multiplication requires several, and is significantly slower than addition.
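The add-shift loop can be simulated in Python to check the trace. This is a sketch following the algorithm above; the function name is hypothetical. One caveat about register widths is recorded in a comment: Python integers are unbounded, so the sum P + N never loses its carry here.

```python
def shift_add_multiply(multiplicand, multiplier, k=4):
    """Add-shift multiplication of two k-bit unsigned values.

    N holds the multiplicand in its top half and stays fixed; P accumulates the
    product and is shifted right together with M, one bit per iteration.
    """
    N = multiplicand << k           # N(2k-1..k) <- multiplicand, N(k-1..0) <- 0
    M, P = multiplier, 0
    for _ in range(k):              # stop after k shifts
        if M & 1:                   # bottom bit of the multiplier is 1
            P = P + N               # can need 2k + 1 bits before the shift
        M >>= 1
        P >>= 1
    return P

print(shift_add_multiply(6, 11))    # 66
```

In hardware, P needs one extra bit to hold the carry out of P + N, or that carry must be shifted back in. For example, for 15 × 15 the second addition produces 360, which does not fit in 8 bits. The 6 × 11 trace never exceeds 144, so it is unaffected.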


Unsigned Division

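79 ÷ 7: quotient 11, remainder 2.

          1001111   (79)
        −  111      (7 × 8 = 56)
          -------
          0010111   (23)
        −    111    (7 × 2 = 14)
          -------
          0001001   (9)
        −     111   (7 × 1 = 7)
          -------
          0000010   (remainder 2)

Quotient: 1011 = 8 + 2 + 1 = 11.

The intermediate rows are either shifted copies of the divisor subtracted from the remaining dividend, or a subtraction of zero (no subtraction).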


Unsigned Division (cont.)

A method for the ALSU (4-bit division):

I We use three registers:
  I Q: the quotient, 4 bits wide. For k-bit division, this register would be k bits wide.
  I D: the divisor, 8 bits wide. For k-bit division, this register would be 2k bits wide.
  I R: the remainder, 8 bits wide. For k-bit division, this register would be 2k bits wide.

(The D and R registers must be the same size, since the D register is subtracted from the R register.)

I We shift the remainder left and hold the divisor stationary, rather than shifting the divisor right and holding the remainder stationary.

I The remainder starts as the dividend, and as we proceed it represents the portion of the dividend remaining. When we finish, the R register contains the remainder after division.



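D          Q      R          Action
01110000   0000   01001111   ←
           0000   10011110   −
           0001   00101110   ←
           0010   01011100   ←
           0100   10111000   −
           0101   01001000   ←
           1010   10010000   −
           1011   00100000   X

(Trace of 79 ÷ 7. ←: shift Q and R left one bit; −: subtract D from R and add 1 to Q; X: stop, with the quotient in Q and the remainder in R7−4.)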



I The divisor starts in the top half of the D register. It remains constant throughout the algorithm.

I A loop is executed. Each time through the loop, the Q and R registers are shifted one bit to the left.

I On iterations where, after the shift, R ≥ D, the D register is subtracted from the R register, and 1 is added to the Q register.

I The iterations stop when 4 shifts have been performed.



Algorithm:

D7−4 = D3−0
D3−0 = 0
Q = 0
I = 0
while I ≠ 4 do
    Q = Q << 1
    R = R << 1
    if R − D ≥ 0 then
        R = R − D
        Q = Q + 1
    I = I + 1
R3−0 = R7−4
R7−4 = 0



µ-Program:

Def: I4X ≡ (I = 4), Tk ≡ (C = k), RDX ≡ (R − D ≥ 0)

T0 : D7−4 ← D3−0, D3−0 ← 0, Q ← 0, I ← 0, C ← C + 1
T1 · I4X : C ← 4
T1 · ¬I4X : C ← 2
T2 : Q ← shl Q, R ← shl R, C ← C + 1
T3 · RDX : R ← R − D, Q ← Q + 1
T3 : I ← I + 1, C ← 1
T4 : R3−0 ← R7−4, R7−4 ← 0, C ← 5
T5 : (done)

The circuit that performs add-shift multiplication can be built so that it can be “reversed” to perform shift-subtract division.
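The shift-subtract loop can be simulated the same way; the function name is hypothetical. This Python sketch assumes the quotient fits in k bits, that is, dividend < divisor × 2^k. It returns the quotient and the remainder, after moving the remainder down from the top half of R.

```python
def shift_subtract_divide(dividend, divisor, k=4):
    """Shift-subtract division of a 2k-bit dividend by a k-bit divisor.

    Assumes dividend < divisor * 2**k, so the quotient fits in k bits.
    Returns (quotient, remainder).
    """
    D = divisor << k                # divisor in the top half of D
    Q, R = 0, dividend              # R starts as the dividend
    for _ in range(k):
        Q <<= 1                     # Q <- shl Q
        R <<= 1                     # R <- shl R
        if R - D >= 0:              # RDX
            R -= D
            Q += 1
    return Q, R >> k                # remainder sits in R(2k-1..k); move it down

print(shift_subtract_divide(79, 7))   # (11, 2)
```

Python integers are unbounded, so the left shift of R never drops a bit here. With a 2k-bit R in hardware, some valid inputs overflow on the shift. For example, in 200 ÷ 15 the first shift gives 400, which exceeds 8 bits. Such inputs need an extra high bit in R.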


Signed Addition and Subtraction

Addition of signed integers is done using the same adder as is used for unsigned addition. The difference is in how overflow is detected.

Examples (Overflow)

118 − 38 = 80 (computed as 118 + (−38))

Carries:  1 11111100
            01110110   (118)
          + 11011010   (−38)
            --------
            01010000   (80)

95 + 51 = 146 (something went wrong: the result of adding two positive numbers was negative)

Carries:  0 11111110
            01011111   (95)
          + 00110011   (51)
            --------
            10010010   (−110 as a signed value)


Signed Addition and Subtraction (cont.)

Overflow occurs when the sum of two operands of the same sign results in a value of the opposite sign. Equivalently, overflow occurs when the carry-in to the sign bit is not equal to the carry-out of the sign bit.

V = Cout ⊕ Cin (where Cin is the carry into the sign bit)

Examples (No overflow with negative operands.)

−40 − 3 = −43 (Cin = Cout, so no overflow.)

Carries:  1 11110000
            11011000   (−40)
          + 11111101   (−3)
            --------
            11010101   (−43)
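Here is a Python sketch of the signed case; the function names are hypothetical. It uses the same full-adder chain as before, but also records the carry arriving at the sign position, so that V = Cout ⊕ Cin can be computed as above.

```python
def ripple_add(a, b, cin=0, width=8):
    """Chain of full adders. Returns (sum, carry-out, carry into the sign bit)."""
    s, c, c_sign = 0, cin, 0
    for i in range(width):
        if i == width - 1:
            c_sign = c                          # carry arriving at the sign position
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (ai & c) | (bi & c)
    return s, c, c_sign

def to_signed(x, width=8):
    return x - (1 << width) if x >> (width - 1) else x

def add_signed(a, b, width=8):
    """Signed addition on the unsigned adder; returns (sum, V) with V = Cout XOR Cin."""
    mask = (1 << width) - 1
    s, cout, cin_sign = ripple_add(a & mask, b & mask, width=width)
    return to_signed(s, width), (cout ^ cin_sign) == 1

print(add_signed(118, -38))   # (80, False)
print(add_signed(95, 51))     # (-110, True)
print(add_signed(-40, -3))    # (-43, False)
```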


Signed Multiplication and Division

We can use the unsigned add-shift multiplier circuit to multiply signed numbers, using the following procedure. (Example: 11110110 × 00000110, that is, −10 × 6.)

1. Calculate their magnitudes: 00001010 and 00000110.

2. Use the unsigned multiplier to multiply the magnitudes: 00001010 × 00000110 = 00111100.

3. Calculate the product sign by taking the XOR of the two operand sign bits: 1 ⊕ 0 = 1.

4. If the product sign is negative, take the two's complement of the product magnitude to produce the actual product: 11000100.
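The four steps translate directly into Python. In this sketch the built-in `*` stands in for the unsigned add-shift multiplier. Like the example, it assumes the product fits in 8 bits. The names are hypothetical.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def twos_complement(x):
    """Negate a WIDTH-bit pattern: one's complement plus 1."""
    return (~x + 1) & MASK

def signed_multiply(a, b):
    """Multiply two's-complement patterns a and b via their magnitudes.

    Assumes the product fits in WIDTH bits, as in the example."""
    sa, sb = a >> (WIDTH - 1), b >> (WIDTH - 1)      # sign bits
    ma = twos_complement(a) if sa else a             # 1. magnitudes
    mb = twos_complement(b) if sb else b
    product = (ma * mb) & MASK                       # 2. unsigned multiply (stand-in)
    if sa ^ sb:                                      # 3. sign = XOR of the signs
        product = twos_complement(product)           # 4. negate if negative
    return product

print(format(signed_multiply(0b11110110, 0b00000110), "08b"))   # 11000100
```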


Signed Multiplication and Division (cont.)

I For division we can use the unsigned shift-subtract circuit on the magnitudes of the operands, as we did for signed multiplication.

I The sign of the quotient is calculated as the XOR of the signs of the operands, as it is for the product in multiplication.

I The remainder represents part of the dividend, so it should have the same sign as the dividend.

Examples

d ÷ D (d is the dividend, and D is the divisor) should yield (Q, R), where d = D × Q + R.

−56 ÷ 9 = (−6, −2), where −56 = −6 · 9 + (−2).
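Here is a Python sketch of the sign rules; the function name is hypothetical. `divmod` on the magnitudes stands in for the unsigned shift-subtract circuit. These rules give truncating division, in which the quotient rounds toward zero, as C's `/` and `%` do. Python's own `//` and `%` floor instead, as the last line shows.

```python
def signed_divide(d, D):
    """Signed division via magnitudes: Q takes the XOR of the operand signs,
    R takes the sign of the dividend, so that d == D * Q + R."""
    q_mag, r_mag = divmod(abs(d), abs(D))   # unsigned shift-subtract stand-in
    Q = -q_mag if (d < 0) != (D < 0) else q_mag
    R = -r_mag if d < 0 else r_mag
    return Q, R

print(signed_divide(-56, 9))   # (-6, -2)
print(divmod(-56, 9))          # (-7, 7): Python's floor-based division differs
```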


Floating-Point Data

I Floating-point (FP) is used to represent “real numbers”.

I We cannot represent all real numbers exactly. For instance, π cannot be represented with a finite number of bits, since it is irrational.

I We represent only rational numbers with finite binary expansions. Arbitrary real numbers must be approximated by the closest such rational number.

I We store rationals with finite expansions in an FP word of fixed width. To do this, we store the scientific notation representation of the number.


Floating-Point Data (cont.)

I The scientific notation for the number has 3 significant parts (for the example −43.8125 = −4.38125 × 10^1):
  I the sign of the number (negative, in this case);
  I the mantissa, the fraction (4.38125);
  I the exponent, the power (1, in this case).

(The base, which in this case is 10, need not be recorded, since 10 is always used.)

I The mantissa is always normalized. (There is exactly one nonzero digit to the left of the decimal point.)

I The decimal point is moved, or floated, to normalize the mantissa.

I Certain numbers, like 0, cannot be normalized. (You cannot write 0 as a mantissa with a nonzero leading digit.) These numbers are given standard notations. (0 = +0.0 × 10^0)



I The computer uses binary scientific notation: −101011.1101 = −1.010111101 × 2^5

I In binary, the leading digit in a normalized mantissa is 1.

I Storing a rational number in a floating-point word (here, a floating-point word is 32 bits):

Field:  sign | exponent | mantissa
Bits:      1 |        8 |       23



Floating-point format.

I sign: behaves the same as an integer sign bit (0, non-negative; 1, negative).

I mantissa: a 23-bit field. Since the leading bit is 1 for normalized numbers, it is not stored. For the example −1.010111101 × 2^5, the mantissa stored is 0101 1110 1000 0000 0000 000 (truncate off the “1.” and pad to 23 bits with zeros).

I exponent: stored in 127-bias notation. To calculate the 127-bias notation, add 127 to the exponent and write the result in 8 bits.



127-bias notation. The top bit of the exponent acts as its sign bit, but it behaves contrary to two's complement: a 0 in the sign bit indicates that the exponent is non-positive, and a 1 indicates that it is positive.

Decimal Number   127-Bias
−127             00000000
−126             00000001
−125             00000010
−124             00000011
...              ...
0                01111111
1                10000000
2                10000001
3                10000010
...              ...
128              11111111



Examples (127-bias)

Actual exponent −5: −5 + 127 = 122 = 01111010 (127-bias exponent).

127-bias exponent 10000111: 10000111 − 01111111 = 135 − 127 = 8 (actual exponent).

Shortcut conversion:

I Positive exponent x: write a 1 sign bit and the 7-bit value x − 1. Actual exponent 12: 1 0001011 = 10001011 (127-bias exponent).

I Non-positive exponent x: write a 0 sign bit and the 7-bit ones' complement of |x|. Actual exponent −12: 0 comp(0001100) = 0 1110011 = 01110011 (127-bias exponent).
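Why the shortcut works: for x > 0, 127 + x = 128 + (x − 1), so the top bit is 1 and the rest is x − 1. For x ≤ 0, the 7-bit ones' complement of |x| is 127 − |x| = 127 + x. The Python sketch below (function names hypothetical) checks the shortcut against the definition for every 8-bit exponent.

```python
def bias127(x):
    """8-bit 127-bias encoding of an exponent x in -127..128."""
    return format(x + 127, "08b")

def bias127_shortcut(x):
    """Shortcut: 1 followed by x - 1 for positive x;
    0 followed by the 7-bit ones' complement of |x| otherwise."""
    if x > 0:
        return "1" + format(x - 1, "07b")
    return "0" + format(~(-x) & 0x7F, "07b")

print(bias127(-5), bias127(12), bias127_shortcut(-12))   # 01111010 10001011 01110011
```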



Examples (−1.010111101 × 2^5)

Sign: 1
Exponent: 10000100
Mantissa: 0101 1110 1000 0000 0000 000

Floating-point word: 1,10000100,0101 1110 1000 0000 0000 000
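The packing can be cross-checked against a real IEEE 754 single, using Python's standard `struct` module. The sketch packs the value as a big-endian 32-bit float, reads the same 4 bytes back as an unsigned integer, and splits the bit string into the three fields. The helper name is hypothetical.

```python
import struct

def float32_fields(x):
    """Return the (sign, exponent, mantissa) bit strings of x as an IEEE 754 single."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(word, "032b")
    return bits[0], bits[1:9], bits[9:]

print(float32_fields(-43.8125))
# ('1', '10000100', '01011110100000000000000')
```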


Converting between Floating-Point and Decimal

1. Convert the decimal number to binary.

2. Write the binary number in scientific notation.

3. Pack the scientific notation number into the floating-point word.

Converting to binary

I The integer part is converted using successive division by 2. The remainders, read in reverse order, are the bits.

I The fraction part is converted using successive multiplication.

I In successive multiplication, we multiply the fraction by 2. For each multiplication, whatever digit pops up above the binary point is recorded as part of the result and deleted.
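Both conversions are short loops. In this Python sketch the function names are hypothetical, and the cap of 23 fraction bits is an assumption matching the single-precision mantissa. The cap matters because fractions such as 0.1 never terminate in binary.

```python
def int_to_binary(n):
    """Successive division by 2: the remainders, read in reverse, are the bits."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, 2)
        digits.append(str(r))
    return "".join(reversed(digits))

def fraction_to_binary(f, max_bits=23):
    """Successive multiplication by 2: record and remove the digit that pops
    above the binary point. Stops at a zero fraction or after max_bits bits."""
    digits = []
    while f != 0 and len(digits) < max_bits:
        f *= 2
        digit = int(f)
        digits.append(str(digit))
        f -= digit
    return "".join(digits)

print(int_to_binary(43), fraction_to_binary(0.8125))   # 101011 1101
```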


Converting between Floating-Point and Decimal (cont.)

Example: −43.8125 (43 = 101011)

Calculation   Integer Part   Fractional Part
.8125 × 2     1              .625
.625 × 2      1              .25
.25 × 2       0              .5
.5 × 2        1              .0

(.8125 = .1101)

−43.8125 = −101011.1101

Scientific notation: −1.010111101 × 2^5

Floating-point format: 1,10000100,0101 1110 1000 0000 0000 000 (calculate the 127-bias exponent; truncate “1.” off the mantissa).



Examples (From floating-point to decimal.)

0,01111110,1010 0000 0000 0000 0000 000:

Sign: +
Exponent: −comp(1111110) = −0000001 = −1
Mantissa: (1.)1010 = 1.1010

Scientific notation: +1.101 × 2^−1

Converting from scientific notation to decimal.

I CM (convert and multiply): convert all parts to decimal, and multiply them, in decimal, to obtain the result.

I MC (multiply and convert): multiply the parts, in binary, and convert the result to decimal.

CM Conversion

In the same way that we can rewrite 2.543 = 2543/10^3, we can write the mantissa 1.101 = 1101/2^3 = 1101/1000 (in binary).

$$+1.101 \times 2^{-1} = +\frac{1101_2}{1000_2} \times \frac{1}{10_2} = +\frac{13}{8} \times \frac{1}{2} \;\text{(convert to decimal)} = +\frac{13}{16} = +0.8125 \;\text{(multiply)}$$

MC Conversion

$$+1.101 \times 2^{-1} = +\frac{1101_2}{2^3} \times \frac{1}{2^1} = +\frac{1101_2}{2^4} = +0.1101_2 \;\text{(multiply)} = +13 \times 2^{-4} = +\frac{13}{16} = +0.8125 \;\text{(convert)}$$
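The CM route maps naturally onto exact rational arithmetic with Python's standard `fractions` module. The sketch undoes the bias, restores the hidden “1.”, and multiplies by the power of 2. It handles normalized numbers only (exponent field neither all 0s nor all 1s), and the function name is hypothetical.

```python
from fractions import Fraction

def decode_float32(sign, exponent, mantissa):
    """Exact value of a normalized single-precision word given as three bit strings."""
    e = int(exponent, 2) - 127                                           # undo the 127 bias
    significand = Fraction(int("1" + mantissa, 2), 2 ** len(mantissa))   # restore the "1."
    value = significand * Fraction(2) ** e
    return -value if sign == "1" else value

v = decode_float32("0", "01111110", "1010" + "0" * 19)
print(v, float(v))   # 13/16 0.8125
```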


Standardization

I Several different formats for the floating-point word used to exist.

I This caused programs to be non-portable. A program would work on one machine, but would result in floating-point overflow on another, due to a smaller exponent field.

I Industry stakeholders got together and agreed on a standard format for the floating-point word. This standardization was facilitated by the IEEE.

I The floating-point standard is referred to as the IEEE 754 standard. Its 32-bit format is the one we have been using.


Standardization (cont.)

I For some scientific applications, an 8-bit exponent and a 23-bit mantissa do not provide enough range and precision.

I IEEE 754 actually has two standard floating-point word widths: a 32-bit word and a 64-bit word.

I The two word sizes are referred to as single precision and double precision. They typically correspond to the C types float and double.

I The double precision exponent uses 1023-bias notation, which is the 11-bit version of 127-bias.

Field:  sign | exponent | mantissa
Bits:      1 |       11 |       52
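Python's `float` is an IEEE 754 double on common platforms, so `struct` can show the wider fields directly. In this sketch (helper name hypothetical), the same number as before keeps its mantissa bits and its actual exponent 5. Its biased exponent becomes 5 + 1023 = 1028.

```python
import struct

def float64_fields(x):
    """(sign, exponent, mantissa) bit strings of x as an IEEE 754 double."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(word, "064b")
    return bits[0], bits[1:12], bits[12:]

sign, exponent, mantissa = float64_fields(-43.8125)
print(sign, exponent, int(exponent, 2) - 1023)   # 1 10000000100 5
print(mantissa[:12], len(mantissa))              # 010111101000 52
```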


Standardization (cont.)

Non-standard numbers include:

I 0: represented as 0,00000000,0000 0000 0000 0000 0000 000.

I +∞

I −∞

I NaN (not a number): a value often used to represent an undefined result of an arithmetic calculation, such as 0/0.


Field Order

I When we write numbers in scientific notation, the order of the fields is sign, mantissa, and exponent.

I In floating-point we do not use this usual order. Instead, we write the number as sign, exponent, and mantissa.

I This order allows us to use integer comparison circuitry to compare floating-point numbers (at least non-negative ones).

Examples (Comparing floating-point bit strings as if they were integers.)

Exponent first:
a = 0, 10000011, 00100000000000000000000 = +1.001 × 2^4
b = 0, 01111000, 00110000000000000000000 = +1.0011 × 2^−7

a > b, as it should be, because the most significant bits (the exponent) are further to the left.
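The same comparison can be tried on real IEEE singles by reinterpreting their 32 bits as unsigned integers with `struct`; the helper name is hypothetical. For non-negative values the integer order matches the numeric order. A negative value has its sign bit set, so as an unsigned integer it compares larger than any positive value. That is why the claim is limited to non-negative numbers.

```python
import struct

def float32_as_uint(x):
    """Reinterpret the 32 bits of an IEEE single as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

a = 1.125 * 2 ** 4       # +1.001 x 2^4
b = 1.1875 * 2 ** -7     # +1.0011 x 2^-7
print(float32_as_uint(a) > float32_as_uint(b))       # True
print(float32_as_uint(-1.0) > float32_as_uint(0.5))  # True, although -1.0 < 0.5
```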


Field Order (cont.)

Examples (Comparing floating-point bit strings as if they were integers. (cont.))

Mantissa first:
a = 0, 00100000000000000000000, 10000011
b = 0, 00110000000000000000000, 01111000

b > a, incorrectly, because the least significant bits (the mantissa) are further to the left.

We use 127-bias for the same reason: the bit string for a negative exponent should appear smaller than the bit string for a positive exponent.


Field Order (cont.)

Examples (Comparing exponents as part of an integer.)

127-bias:
a = 0, 10000011, 00100000000000000000000 = +1.001 × 2^4
b = 0, 01111000, 00110000000000000000000 = +1.0011 × 2^−7

a > b, as it should be, because a positive exponent starts with a 1, and a negative exponent starts with a 0.

Two's complement:
a = 0, 00000100, 00100000000000000000000 = +1.001 × 2^4
b = 0, 11111001, 00110000000000000000000 = +1.0011 × 2^−7

b > a, incorrectly, because a positive exponent starts with a 0, and a negative exponent starts with a 1.


Arithmetic Approximation

Three properties are associated with approximating real numbers with floating-point.

I Precision. The number of digits in the mantissa. For the IEEE single precision format, we would say that we have a precision of 24 bits (23 stored bits plus the implicit leading 1). This works out to be about seven decimal digits of precision.

I Range. This is the interval of numbers that we can represent, from the smallest to the largest. For the single precision format, this is the interval [1,11111110,1111 1111 1111 1111 1111 111 .. 0,11111110,1111 1111 1111 1111 1111 111], from the smallest non-infinite negative number to the largest non-infinite positive number.

I Gap. The largest distance between consecutive representable rational numbers.


Arithmetic Approximation (cont.)

The gap draws attention to the fact that we cannot represent allreal numbers in the range.

Examples

0, 10000011, 01010000111101010000110 = +1.0101, 0000, 1111, 0101, 0000, 110 × 2^4 (approximately 21.059826)

The next largest number, derived by flipping the rightmost bit, is:
0, 10000011, 01010000111101010000111 = +1.0101, 0000, 1111, 0101, 0000, 111 × 2^4 (approximately 21.059828)

The distance between them is 2^−23 × 2^4 = 2^−19 ≈ 0.0000019. No real number strictly between these two can be represented.
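The two neighbours can be built from their bit patterns with `struct` to confirm the gap. This is a sketch; the helper name is hypothetical. The difference is computed exactly, because both values are also exact Python floats.

```python
import struct

def bits_to_float32(word):
    """Interpret a 32-bit integer as an IEEE single."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

x = bits_to_float32(0b0_10000011_01010000111101010000110)
nxt = bits_to_float32(0b0_10000011_01010000111101010000111)   # rightmost bit flipped
print(x, nxt, nxt - x == 2 ** -19)                             # gap is exactly 2^-19
```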


Arithmetic Approximation (cont.)

The range can also be exceeded by an arithmetic calculation.

I Floating-point overflow: the magnitude of the number is larger than the largest representable floating-point number. This is indicated by an exponent that exceeds 11111111 (128).

I Floating-point underflow: the magnitude of the number is too small to be represented. This is indicated by an exponent that is smaller than 00000000 (−127).


Rounding

I When performing an arithmetic calculation, we cannot keep infinite precision.

I We typically keep the allowable precision plus two more bits. These bits are called the round bit and the guard bit. (The high-order bit is the round bit.)

I For example, if we had 5 bits of precision, the result of a calculation might be +1.0110, 10 × 2^6, with round and guard bits.

I When packing this into the floating-point word, the round and guard bits are dropped through the process of rounding.

I IEEE 754 allows 4 methods to be used to round.


Rounding (cont.)

Rounding methods:

I Round Nearest (RN): round the number to the nearest floating-point number.

I Round Zero (RZ): round the number to the closest floating-point number towards zero.

I Round Positive (RP): round the number to the closest floating-point number in the direction of positive infinity.

I Round Minus (RM): round the number to the closest floating-point number in the direction of negative infinity.


Rounding (cont.)

I The round and guard bits are used to determine whether rounding is necessary (it is necessary when they are not both zero).

I The round bit is used to perform RN. If it is 1, you round up. If it is 0, you round down.

I To round up, you add 1 to the magnitude and drop the round and guard bits.

I To round down, or truncate, you simply drop the round and guard bits.

I For RN, you use the round bit to determine whether you round the magnitude up or just truncate it.


Rounding (cont.)

I For RZ you always truncate.

I For RM you truncate positive numbers and round up (increase the magnitude of) negative numbers.

I For RP you truncate negative numbers and round up positive numbers.

Method   +1.0110, 10   −1.0110, 10   +1.0110, 01   −1.0110, 01
RN       +1.0111       −1.0111       +1.0110       −1.0110
RZ       +1.0110       −1.0110       +1.0110       −1.0110
RP       +1.0111       −1.0110       +1.0111       −1.0110
RM       +1.0110       −1.0111       +1.0110       −1.0111
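The table can be regenerated from the four rules. In this Python sketch a magnitude carries its two extra bits as the low bits of an integer, so 1.0110, 10 is 0b1011010. RN follows the rule stated above and rounds up whenever the round bit is 1. IEEE 754's default round-to-nearest instead breaks exact ties toward the even neighbour. Re-normalizing after a carry, as in 1.1111 → 10.0000, is not handled. The names are hypothetical.

```python
def round_magnitude(mag, mode, negative):
    """Drop the two extra (round, guard) bits from an integer magnitude."""
    kept, extra = mag >> 2, mag & 0b11
    if mode == "RN":
        up = (extra >> 1) == 1              # the round bit decides
    elif mode == "RZ":
        up = False                          # always truncate
    elif mode == "RP":
        up = extra != 0 and not negative    # toward +infinity
    elif mode == "RM":
        up = extra != 0 and negative        # toward -infinity
    else:
        raise ValueError(mode)
    return kept + 1 if up else kept

def show(negative, mag):
    b = format(mag, "05b")
    return ("-" if negative else "+") + b[0] + "." + b[1:]

cases = [(False, 0b1011010), (True, 0b1011010), (False, 0b1011001), (True, 0b1011001)]
for mode in ("RN", "RZ", "RP", "RM"):
    print(mode, *(show(neg, round_magnitude(m, mode, neg)) for neg, m in cases))
```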


Floating-Point Addition

Example: +1.0111 × 2^3 − 1.1011 × 2^2

1. Align binary points, adjusting the exponent to the largest: +1.0111, 00 × 2^3 − 0.1101, 10 × 2^3.

2. Add the mantissas. The subtraction is done by adding the two's complement: −000.1101, 10 ⇒ +111.0010, 10.

Carries:  1 110.1100, 00
            001.0111, 00
          + 111.0010, 10
            ------------
            000.1001, 10   (carry-out discarded)

Result: +0.1001, 10 × 2^3.

3. Normalize the mantissa: +1.0011, 00 × 2^2.

4. Round the mantissa (using one of the four rounding methods): +1.0011 × 2^2.

(It may be necessary to re-normalize the mantissa after rounding. Example: 1.1111, 11 rounded up becomes 10.0000.)


Floating-Point Addition (cont.)

Algorithm: FP numbers (sign, exponent, mantissa) (SA, EA, MA) added to (SB, EB, MB), producing (SZ, EZ, MZ)

EZ = EA
if EB > EA then
    EZ = EB
while EA < EZ do
    EA = EA + 1
    MA = MA >> 1
while EB < EZ do
    EB = EB + 1
    MB = MB >> 1
if SA then
    MA = -MA
if SB then
    MB = -MB
MZ = MA + MB
SZ = 0
if MZ < 0 then
    MZ = -MZ
    SZ = 1
if MZ[1] then
    MZ = MZ >> 1
    EZ = EZ + 1
while !MZ[0] do
    MZ = MZ << 1
    EZ = EZ - 1
MZ[-5..-6] = [00]

(MZ[i] is the bit of weight 2^i, so MZ[0] is the units bit and MZ[-5..-6] are the round and guard bits. The last step rounds by truncation. The normalization loop assumes MZ ≠ 0.)
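Here is a Python sketch of the addition algorithm, working on integer mantissas. A mantissa 1.xxxx is stored as a 5-bit integer, so 1.0111 is 0b10111. Two extra bits are appended for the round and guard bits. As in the algorithm, rounding is by truncation. Returning (0, 0, 0) for a zero sum is an added assumption, because the normalization loop would otherwise never stop. The names and the 4-bit fraction width are hypothetical.

```python
FRAC = 4    # mantissa bits after the binary point (1.xxxx)
EXTRA = 2   # round and guard bits kept during the calculation

def fp_add(a, b):
    """Add two (sign, exponent, mantissa) triples; mantissas are 1.xxxx as integers."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    ma <<= EXTRA                                     # append the round and guard bits
    mb <<= EXTRA
    ez = max(ea, eb)                                 # 1. align to the larger exponent
    ma >>= ez - ea
    mb >>= ez - eb
    mz = (-ma if sa else ma) + (-mb if sb else mb)   # 2. add the signed mantissas
    sz = 1 if mz < 0 else 0
    mz = abs(mz)
    if mz == 0:
        return (0, 0, 0)
    one = 1 << (FRAC + EXTRA)                        # weight of the units bit M[0]
    if mz >= 2 * one:                                # 3. normalize: M[1] set, shift right
        mz >>= 1
        ez += 1
    while mz < one:                                  #    or shift left until M[0] is 1
        mz <<= 1
        ez -= 1
    return (sz, ez, mz >> EXTRA)                     # 4. round by truncation

# +1.0111 x 2^3 - 1.1011 x 2^2
print(fp_add((0, 3, 0b10111), (1, 2, 0b11011)))     # (0, 2, 19), i.e. +1.0011 x 2^2
```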


Floating-Point Multiplication

Example: +1.0111 × 2^3 × −1.1011 × 2^2

1. Add the exponents: +1.0111 × −1.1011 × 2^5.

2. Multiply the mantissas: 1.0111 × 1.1011.

            1.0111
          × 1.1011
          --------
             10111
            10111
           00000
          10111
       + 10111
       ----------
        1001101101   = 10.0110, 1101

Keeping the round and guard bits: +1 × −1 × 10.0110, 11 × 2^5 (the sign has not been determined yet).

3. Calculate the sign: −10.0110, 11 × 2^5.


Floating-Point Multiplication (cont.)

Example: +1.0111× 23 ×−1.1011× 22 (cont.)

4. Normalize the result.−1.0110, 01× 26

5. Round the result. We would use one of the four roundingmethods. (For the example, we use RM.)−1.0111× 26


Floating-Point Multiplication (cont.)

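Algorithm: FP numbers (sign, exponent, mantissa) (SA, EA, MA) multiplied by (SB, EB, MB), producing (SZ, EZ, MZ), rounded with RM:

EZ = EA + EB
MZ = MA * MB                  // keep 6 fraction bits of the product
SZ = SA ^ SB
if MZ[1] then                 // product in [2, 4): normalize
    MZ = MZ >> 1
    EZ = EZ + 1
if SZ then                    // RM: round negative results up in magnitude
    if MZ[-5] | MZ[-6] then
        MZ[1..-4] = MZ[1..-4] + [000001]
MZ[-5..-6] = [00]             // drop round and guard bits
if MZ[1] then                 // rounding carried out: re-normalize
    MZ = MZ >> 1
    EZ = EZ + 1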
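A matching C++ sketch of the multiplication algorithm uses a hypothetical `Fp` struct with 6 fraction bits and unbiased exponents. The product of two such integers has 12 fraction bits, so it is shifted right by 6 before normalizing. Rounding follows RM, as in the pseudocode. With the example operands it gives $-1.0100 \times 2^6$, because $1.0111 \times 1.1011 = 10.01101101$.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation: magnitude with 6 fraction bits
// (4 kept + round + guard), so 1.0111,00 is 0b1011100.
struct Fp {
    bool sign;      // true = negative
    int exp;        // exponent (no bias)
    uint32_t mant;  // bit 6 = MZ[0], bit 7 = MZ[1], bits 1-0 = round, guard
};

constexpr int kFrac = 6;
constexpr uint32_t kCarry = 1u << (kFrac + 1);  // MZ[1]

Fp fpMulRM(Fp a, Fp b) {
    assert(a.mant >> kFrac == 1 && b.mant >> kFrac == 1);  // normalized inputs
    int ez = a.exp + b.exp;                    // add the exponents
    uint32_t mz = (a.mant * b.mant) >> kFrac;  // keep 6 fraction bits
    bool sz = a.sign != b.sign;                // SZ = SA ^ SB
    if (mz & kCarry) { mz >>= 1; ++ez; }       // product in [2, 4)
    if (sz && (mz & 0b11u)) mz += 0b100u;      // RM: add 1 at bit -4
    mz &= ~0b11u;                              // drop round and guard bits
    if (mz & kCarry) { mz >>= 1; ++ez; }       // rounding carried out
    return Fp{sz, ez, mz};
}
```

Bits beyond the guard bit are discarded before rounding, just as in the pseudocode. A positive product with the same magnitudes is therefore simply truncated to $+1.0011 \times 2^6$.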


Computer Arithmetic: Increasing Efficiency

Because multiplication is so much slower than addition, efforts to increase arithmetic efficiency have concentrated on multiplication.

I Wallace Trees. Instead of using one adder to sequentially add the rows of shifted multiplicand to the product, use one adder per row. This way the additions can be carried out in parallel. The adders are usually organized as a tree.

I ROM Lookup Table. Instead of computing the product of two small numbers, store all such products in a ROM, and look up the result using the operands as an address.

I Arithmetic Pipeline. Do the additions sequentially, but with multiple adders. Once an adder has done its job on the current operation, it can be used for the next multiplication. In this way the data-path can work on several multiplications simultaneously.
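The ROM lookup idea can be sketched for 4-bit operands, assuming the ROM address is the two operands concatenated. The names `buildProductRom` and `romMultiply` are hypothetical. A 256-entry table replaces the shift-and-add hardware. The table grows as $2^{2b}$ entries for $b$-bit operands, which is why the technique suits small numbers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Fill a 256-word "ROM" with every 4-bit x 4-bit product.
// The address is the two operands concatenated: (a << 4) | b.
std::array<uint8_t, 256> buildProductRom() {
    std::array<uint8_t, 256> rom{};
    for (unsigned a = 0; a < 16; ++a)
        for (unsigned b = 0; b < 16; ++b)
            rom[(a << 4) | b] = static_cast<uint8_t>(a * b);  // at most 225
    return rom;
}

// Multiply two 4-bit numbers with a single lookup.
uint8_t romMultiply(const std::array<uint8_t, 256>& rom, unsigned a, unsigned b) {
    assert(a < 16 && b < 16);
    return rom[(a << 4) | b];
}
```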


Chapter 9: Micro-Programmed CPU Design

µ-instructions look like programming language instructions.

Examples

RTL: R ← R + S

C++: R = R + S;

More complex RTL: c: R ← R + 1, ¬c: R ← 0

More complex C++:

if (c)
    R = R + 1;
else
    R = 0;


Micro-Programmed CPU Design (cont.)

Structure of the µ-controlled processor:

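[Figure: the µ-sequencer. The address selector combines the control input with the sequence field of the current µ-instruction and loads the next address into the µAR; the µAR addresses the µROM; the µDec turns the µop field into the control output.]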


Micro-Programmed CPU Design (cont.)

Elements of the µ-sequencer.

I µ-ROM — contains the µ-instructions that implement the machine cycle.

I µAR — (µ-address-register) contains the address of the current µ-instruction. It is changed each clock cycle.

I Address selector — chooses the next µ-instruction address.

I µDec — (µ-decoder) sends the required control signals to implement the current µ-instruction.


Micro-Instruction Format

We must design a numeric form of RTL instructions, tailored to the BRIM machine.

Fields:

I µ-op field. Specifies the µ-operations to be performed.

I sequence field. Specifies the next µ-instruction to be executed.


The Sequence Field

The µ-program is stored in the µROM, one instruction per word.

Address calculation methods:

1. Increment the current address. The sequencer would use an adder to add one to the current address, producing the address of the next micro-instruction.

2. Unconditional jump to a new address. The new address might be given as a field in the micro-instruction.

3. Conditional branch to a new address. The sequencer would decide whether or not to take the branch, based on a control input. If the branch were taken, the new address would be looked up in a ROM jump table. If the branch were not taken, the current address would be incremented.
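The three address calculation methods can be sketched as one next-address function. Several pieces are assumptions for illustration: the enum `NextAddr`, the function `nextMicroAddress`, and the use of a `std::map` to stand in for the ROM jump table. The parameter `taken` represents the sequencer's decision made from the control input.

```cpp
#include <cassert>
#include <map>

// Hypothetical encoding of the three address calculation methods.
enum class NextAddr { Increment, Jump, CondBranch };

unsigned nextMicroAddress(unsigned current, NextAddr method, unsigned addrField,
                          bool taken, unsigned controlInput,
                          const std::map<unsigned, unsigned>& jumpTable) {
    switch (method) {
        case NextAddr::Increment:
            return current + 1;                            // method 1
        case NextAddr::Jump:
            return addrField;                              // method 2
        case NextAddr::CondBranch:                         // method 3
            return taken ? jumpTable.at(controlInput)      // look up new address
                         : current + 1;                    // not taken: increment
    }
    assert(false && "unknown method");
    return current;
}
```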


The Sequence Field (cont.)

The sequence field contains a code that indicates the choice of addressing method.

Alternate ways of calculating a conditional branch address:

I A hardwired address calculator could be used to calculate the new address.

I The new address might be given as part of the micro-instruction, in the same way as it is for an unconditional branch.

I The new address might be looked up in a ROM table, often called a jump table, using the control input to the sequencer as an index into the table. (This is our choice.)


The Sequence Field (cont.)

A simple jump table, based on the BRIM machine. The following inputs are used as the address into the table (X = don't care):

I IRX15−12 — the op-code.

I IRX11 — the addressing mode bit.

I ZX — the contents of the Z flag.

IRX15−12  IRX11  ZX  Address
0000      X      X   01101
0101      1      X   10011
1010      X      1   01111


The Select and Address Sub-Fields

The sequence field can be split into two subfields:

I select field: selects between address calculation methods 2 and 3. (Method 1 is not implemented for the BRIM machine.)

I address field: the address of an unconditional branch. (Conditional branches use a jump table.)

[Figure: µ-instruction layout: the sequence field (address sub-field, 5 bits; select sub-field, 1 bit) followed by the micro-op field.]


The Select and Address Sub-Fields (cont.)

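I The sequence field is one of the inputs to the address selector.

I The other input is the control word.

I The control word consists of the instruction register contents IRX (which include the op-code) and the value of the Z flag, ZX.

[Figure: the control word, formed from ZX (1 bit) and IRX (16 bits).]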


The Select and Address Sub-Fields (cont.)

Address Selector Structure:

[Figure: the address selector. The control word passes through the mapper into the µCROM; a multiplexer controlled by the select field chooses between the µCROM output and the address field, and its output is loaded into the µAR.]

I The µCROM (micro control ROM) contains the jump table.

I For a conditional branch, the mapper converts the control word into an address, and that jump table entry is used as the address of the next µ-instruction.

I For an unconditional branch, the address field from the µ-instruction is used as the address of the next µ-instruction.


Micro Architectures

I The µDec takes the µ-op field of the µ-instruction and translates it into control signals that are sent to the data-path.

I There are several ways of implementing the µ-op field:

  I Direct control — The µ-op field is simply composed of the control signals.

  I Vertical control — The data-path is split into several units. The µ-op field specifies an op-code for each of the units.

  I Horizontal control — The µ-op field is composed of bits that correspond to individual µ-operations in the processor. This is a compromise between vertical control and direct control.


Micro Architecture (cont.)

Examples (Simple Machine)

A single operation: M[TP + 1] ← M[TP] + M[TP + 1], TP ← TP + 1

The µ-instructions:

AR ← TP
X ← M[AR], AR ← AR + 1
Y ← M[AR], TP ← TP + 1
M[AR] ← X + Y
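A small C++ simulation of the four µ-instructions uses a hypothetical `SimpleMachine` with a 16-word memory. It shows that µ-operations grouped in one µ-instruction happen in the same clock cycle, so each one reads the register values from before that cycle.

```cpp
#include <array>
#include <cassert>

// Hypothetical state of the simple machine; TP is the top-of-stack pointer.
struct SimpleMachine {
    int AR = 0, TP = 0, X = 0, Y = 0;
    std::array<int, 16> M{};
};

// M[TP + 1] <- M[TP] + M[TP + 1], TP <- TP + 1, as four µ-instructions.
void stackAdd(SimpleMachine& m) {
    assert(m.TP >= 0 && m.TP + 1 < static_cast<int>(m.M.size()));
    m.AR = m.TP;                 // AR <- TP
    {                            // X <- M[AR], AR <- AR + 1 (same cycle)
        int ar = m.AR;
        m.X = m.M[ar];
        m.AR = ar + 1;
    }
    {                            // Y <- M[AR], TP <- TP + 1 (same cycle)
        int ar = m.AR, tp = m.TP;
        m.Y = m.M[ar];
        m.TP = tp + 1;
    }
    m.M[m.AR] = m.X + m.Y;       // M[AR] <- X + Y
}
```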


Micro Architecture (cont.)

Names, and control for each µ-operation:

Code name  Micro-inst.     Control
Reg1       AR ← TP         ARLD, STP
Mem1       X ← M[AR]       XLD, ME
Reg2       AR ← AR + 1     INAR
Mem2       Y ← M[AR]       YLD, ME
Reg3       TP ← TP + 1     INTP
Mem3       M[AR] ← X + Y   MW, SADD

We have split the operations into two categories: Reg (register operations) and Mem (memory operations). Notice that no two register operations are performed in the same µ-instruction, and no two memory operations are performed in the same µ-instruction.


Direct Control

In direct control, the µop field contains one bit for each control line. For our sample machine this would result in a 9-bit field.

ARLD STP  XLD  ME   INAR YLD  INTP MW   SADD
0    0    1    1    1    0    0    0    0

(This sample shows the µop field for the µ-instruction X ← M[AR], AR ← AR + 1.) Notice that, for direct control, the µDec is just a direct connection between the µ-instruction and the control lines.


Horizontal Control

In horizontal control, each µ-operation has a bit in the µop field. We start by giving each µ-operation a name. (Notice that this, in general, shrinks the size of the µop field.)

Bit name  Micro-op
TP2AR     AR ← TP
RdX       X ← M[AR]
ARInc     AR ← AR + 1
RdY       Y ← M[AR]
TPInc     TP ← TP + 1
WtXY      M[AR] ← X + Y


Horizontal Control (cont.)

The resulting µop field for our simple machine is 6 bits wide. We show the same sample µ-instruction.

TP2AR RdX   ARInc RdY   TPInc WtXY
0     1     1     0     0     0

The µDec is no longer a simple direct connection. It must translate between µ-operations and control signals.


Horizontal Control (cont.)

The Horizontal control µDec:

Interface:

[Figure: the horizontal µDec, with inputs TP2AR, RdX, ARInc, RdY, TPInc, WtXY and outputs ARLD, XLD, STP, ME, INAR, YLD, INTP, MW, SADD.]

Control (+ denotes OR):

Output  Inputs
ARLD    TP2AR
STP     TP2AR
XLD     RdX
ME      RdX + RdY
INAR    ARInc
YLD     RdY
INTP    TPInc
MW      WtXY
SADD    WtXY
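The horizontal µDec amounts to one OR gate per control signal. The C++ sketch below assumes a hypothetical bit assignment for the six µop bits, with TP2AR in bit 0 through WtXY in bit 5.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit positions of the horizontal µop field.
enum : uint8_t {
    TP2AR = 1u << 0, RdX = 1u << 1, ARInc = 1u << 2,
    RdY = 1u << 3, TPInc = 1u << 4, WtXY = 1u << 5
};

struct ControlSignals {
    bool ARLD, STP, XLD, ME, INAR, YLD, INTP, MW, SADD;
};

// Each control signal is the OR of the µ-operations that need it.
ControlSignals horizontalDecode(uint8_t uop) {
    assert(uop < (1u << 6));  // 6-bit µop field
    auto on = [uop](unsigned mask) { return (uop & mask) != 0; };
    return ControlSignals{
        on(TP2AR),      // ARLD
        on(TP2AR),      // STP
        on(RdX),        // XLD
        on(RdX | RdY),  // ME
        on(ARInc),      // INAR
        on(RdY),        // YLD
        on(TPInc),      // INTP
        on(WtXY),       // MW
        on(WtXY),       // SADD
    };
}
```

For the sample µ-instruction, `horizontalDecode(RdX | ARInc)` asserts XLD, ME, and INAR, the same three lines set in the direct-control field.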


Vertical Control

I The µop field is divided into sub-fields, corresponding to groups of µ-operations.

I We group µ-operations. Typically the grouping is based on data-path devices.

I We assign each µ-operation in a group an op-code. To perform a particular µ-operation, the op-code is stored in the sub-field belonging to that group.

I You must verify that no two µ-operations in a group are performed simultaneously.

I Groups for our example machine:

  I Micro-operations associated with register manipulation.

  I Micro-operations associated with memory manipulation.


Vertical Control (cont.)

The names previously given to the µ-operations, Reg1, Reg2, Reg3, Mem1, Mem2, and Mem3, give the group and code for each µ-operation.

Instruction format, and codes for the sample µ-instruction X ← M[AR], AR ← AR + 1 (Reg2 and Mem1):

Reg  Mem
10   01

(Notice that the width of the µop field decreases compared with horizontal control, from 6 bits to 4.)


Vertical Control (cont.)

The vertical µDec.

I The two sub-fields of the µop field are decoded into the trigger lines of the horizontal machine.

I The trigger lines are fed into a horizontal µDec, which decodes them into control signals for the data-path.

[Figure: the vertical µDec. The 2-bit Reg and Mem sub-fields each feed a decoder; outputs 1, 2, 3 of the Reg decoder drive TP2AR, ARInc, TPInc, and those of the Mem decoder drive RdX, RdY, WtXY. These trigger lines enter the horizontal µDec, which sends 9 control signals to the data-path.]

(Notice that as we move from direct control to vertical control, the size of the µop field, and therefore of the µROM, decreases. However, the complexity of the µDec increases.)
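The vertical decoding step can be sketched as follows. It assumes each 2-bit sub-field uses code 0 for "no µ-operation" and codes 1 to 3 for the group's µ-operations, which is consistent with the names Reg1 through Mem3. The output uses a hypothetical bit assignment for the trigger lines, so it could be passed on to a horizontal µDec.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit positions of the horizontal trigger lines.
enum : uint8_t {
    TP2AR = 1u << 0, RdX = 1u << 1, ARInc = 1u << 2,
    RdY = 1u << 3, TPInc = 1u << 4, WtXY = 1u << 5
};

// Decode the Reg and Mem sub-fields (one 2-to-4 decoder each).
uint8_t verticalDecode(unsigned reg, unsigned mem) {
    assert(reg < 4 && mem < 4);
    static const uint8_t regLines[4] = {0, TP2AR, ARInc, TPInc};  // Reg1..Reg3
    static const uint8_t memLines[4] = {0, RdX, RdY, WtXY};       // Mem1..Mem3
    return static_cast<uint8_t>(regLines[reg] | memLines[mem]);
}
```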


Micro-Control for the BRIM Machine

We will be implementing the horizontal control for the BRIM machine, since it is a good compromise in terms of the size of the µROM and the complexity of the µDec.

Design tasks:

I Specify the contents of the µROM.

I Specify the contents of the µCROM.

I Describe the structure of the mapper.

I Give a structural description of the µDec.


The BRIM Micro-Program

Naming the BRIM µ-operations.

Micro-Operation                          Signal
AR ← PC                                  PC2AR
IR ← M[AR]                               RdIR
PC ← PC + 1                              PCInc
AR ← IR7−0                               IR2AR
SR ← M[AR]                               RdSR
SR ← if IR7 then IR6−0 else R[IR6−4]     SRFtch
R[IR10−8] ← SR                           SR2R
M[AR] ← R[IR10−8]                        WtR
DR ← R[IR10−8]                           R2DR
R[IR10−8] ← SR + DR                      RAdd
Z ← (SR + DR) = 0                        ZAdd
R[IR10−8] ← SR − DR                      RSub
Z ← (SR − DR) = 0                        ZSub


The BRIM Micro-Program (cont.)

Naming the BRIM µ-operations (cont.).

Micro-Operation                          Signal
R[IR10−8] ← SR ∧ DR                      RAnd
Z ← (SR ∧ DR) = 0                        ZAnd
R[IR10−8] ← SR ∨ DR                      ROr
Z ← (SR ∨ DR) = 0                        ZOr
R[IR10−8] ← ¬DR                          RNot
Z ← (¬DR) = 0                            ZNot
PC ← IR7−0                               IR2PC
R[IR10−8] ← Din                          InR
Dout ← if IR7 then IR6−0 else R[IR6−4]   OutR



µOp format (one bit per µ-operation, 22 bits, in this order):

PC2AR RdIR PCInc IR2AR RdSR SRFtch SR2R WtR R2DR RAdd ZAdd RSub ZSub RAnd ZAnd ROr ZOr RNot ZNot IR2PC InR OutR

We now give the layout of the µROM.



(Each row lists the µop bits that are 1; all other bits of the 22-bit µop field are 0.)

Phs   Loc    Add    Sel  µop bits set to 1
F0    00000  00001  0    PC2AR
F1    00001  00000  1    RdIR, PCInc
MD2   00010  00011  0    IR2AR
MD3   00011  00100  0    RdSR
M4    00100  00000  0    SR2R
MRD2  00101  00110  0    IR2AR
MRD3  00110  00100  0    SRFtch
St2   00111  01000  0    IR2AR
St3   01000  00000  0    WtR
Ad2   01001  01010  0    R2DR
Ad3   01010  01011  0    SRFtch
Ad4   01011  00000  0    RAdd, ZAdd
Sb2   01100  01101  0    R2DR
Sb3   01101  01110  0    SRFtch
Sb4   01110  00000  0    RSub, ZSub
An2   01111  10000  0    R2DR
An3   10000  10001  0    SRFtch
An4   10001  00000  0    RAnd, ZAnd
Or2   10010  10011  0    R2DR
Or3   10011  10100  0    SRFtch
Or4   10100  00000  0    ROr, ZOr
N2    10101  10110  0    R2DR
N3    10110  10111  0    SRFtch
N4    10111  00000  0    RNot, ZNot
J2    11000  00000  0    IR2PC
I2    11001  00000  0    InR
Ot2   11010  00000  0    OutR



I The phase, and location of each µ-instruction are given.

I The location must match the jump table.

I All µ-operations corresponding to the sequencer, C, have been eliminated. The µ-controlled processor does not use the hardwired sequencer.

I µ-instructions consist of the address field, the select field, and the horizontal µop bits.



Differences between the µ-controlled machine and the hardwired machine.

1. The phase M2 has been replaced by two identical micro-instructions: MD2 and MRD2.

2. The phases Bz2, Bz2 ∨ Bz2, Bn2, Bn2 ∨ Bn2, and J2 have all been compressed into a single phase, J2.

Difference 1 is to simplify the jump table. Difference 2 is to make the µ-program smaller.



Difference 1 illustrated:

[Figure: flowchart fragments for op-code 0 (IR15-12 = 0), before and after the change. Before: a single phase M2 (AR ← IR7-0) serves both addressing modes (selected by IR11), which continue with MD3 (SR ← M[AR]) or MRD3 and then M4 (R[IR10-8] ← SR). After: M2 is replaced by MD2 and MRD2, one for each addressing mode.]


The BRIM Jump Table and Mapper

The mapper converts the µ-control word into an address into the µCROM. The jump table is used to jump from F1 to the code for the machine instruction.

[Figure: the mapper keeps IRX15-11 (5 bits) and ZX (1 bit) of the control word.]

(This mapper only eliminates bits of IRX.)


The BRIM Jump Table and Mapper (cont.)

The jump table.

IRX15−12  IRX11  ZX  Add    Phase
0000      1      X   00010  MD2
0000      0      X   00101  MRD2
0001      X      X   00111  St2
0010      X      X   01001  Ad2
0011      X      X   01100  Sb2
0100      X      X   01111  An2
0101      X      X   10010  Or2
0110      X      X   10101  N2
0111      X      1   11000  J2
0111      X      0   00000  F0
1000      X      0   11000  J2
1000      X      1   00000  F0
1001      X      X   11000  J2
1010      X      X   11001  I2
1011      X      X   11010  Ot2
11XX      X      X   XXXXX  -
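The mapper, jump table, and address selector can be sketched together in C++. Two assumptions apply. First, the mapper concatenates IRX15-11 with ZX. Second, Sel = 1 selects the jump table; this is inferred from the µROM table, where only F1, the dispatch phase, has Sel = 1. Names such as `buildJumpTable` and `nextAddress` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using JumpTable = std::array<uint8_t, 64>;  // µCROM: 6-bit address, 5-bit data

// Mapper: keep IRX15-11 and append ZX, giving a 6-bit µCROM address.
unsigned mapper(uint16_t irx, bool zx) {
    return (((irx >> 11) & 0x1Fu) << 1) | (zx ? 1u : 0u);
}

// Expand the jump table into all 64 µCROM words (X = don't care).
JumpTable buildJumpTable() {
    JumpTable t{};                      // op-codes 11XX are unused: left as 0
    for (unsigned i = 0; i < t.size(); ++i) {
        unsigned op = i >> 2;           // IRX15-12
        bool mode = (i >> 1) & 1u;      // IRX11
        bool z = i & 1u;                // ZX
        uint8_t add = 0;
        switch (op) {
            case 0x0: add = mode ? 0b00010 : 0b00101; break;  // MD2 / MRD2
            case 0x1: add = 0b00111; break;                   // St2
            case 0x2: add = 0b01001; break;                   // Ad2
            case 0x3: add = 0b01100; break;                   // Sb2
            case 0x4: add = 0b01111; break;                   // An2
            case 0x5: add = 0b10010; break;                   // Or2
            case 0x6: add = 0b10101; break;                   // N2
            case 0x7: add = z ? 0b11000 : 0b00000; break;     // J2 if Z, else F0
            case 0x8: add = z ? 0b00000 : 0b11000; break;     // J2 if not Z, else F0
            case 0x9: add = 0b11000; break;                   // J2
            case 0xA: add = 0b11001; break;                   // I2
            case 0xB: add = 0b11010; break;                   // Ot2
            default: break;                                   // 11XX: unused
        }
        t[i] = add;
    }
    return t;
}

// Address selector: Sel = 1 dispatches through the jump table,
// Sel = 0 uses the 5-bit address field of the µ-instruction.
unsigned nextAddress(bool sel, unsigned addrField, uint16_t irx, bool zx,
                     const JumpTable& table) {
    assert(addrField < 32);
    return sel ? table[mapper(irx, zx)] : addrField;
}
```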


The µDec

Converting from horizontal signals to direct control signals. (Figure out the control for each µ-operation; + denotes OR.)

Output Signal  Formula
RLD            SR2R + RAdd + RSub + RAnd + ROr + RNot + InR
MW             WtR
IOLD           OutR
PCLD           IR2PC
DRLD           R2DR
SRLD           RdSR + SRFtch
IRLD           RdIR
ARLD           PC2AR + IR2AR
ZLD            ZAdd + ZSub + ZAnd + ZOr + ZNot
RE             SRFtch + WtR + R2DR + OutR
ME             RdIR + RdSR
IOE            InR


The µDec (cont.)

Output Signal  Formula
SPC            PC2AR
SALU           SR2R + RAdd + RSub + RAnd + ROr + RNot
SAD            IR2AR + IR2PC
SRF            SR2R + WtR + R2DR + RAdd + RSub + RAnd + ROr + RNot + InR
SS             SRFtch + OutR
SD             WtR + R2DR
PCIN           PCInc
ALUOP0         RAnd + ZAnd + RSub + ZSub + RNot + ZNot
ALUOP1         RSub + ZSub + RAnd + ZAnd + ROr + ZOr
ALUOP2         SR2R + RAnd + ZAnd + ROr + ZOr + RNot + ZNot
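The µDec formulas translate directly into bit masks: each output is the OR of the µ-operation bits listed for it. The C++ sketch assumes the µop bits are numbered in µOp-format order, from PC2AR in bit 0 through OutR in bit 21. The names `uDecFormulas` and `controlSignal` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// One bit per horizontal µ-operation, in µOp-format order (assumed).
enum : uint32_t {
    PC2AR = 1u << 0,  RdIR = 1u << 1,   PCInc = 1u << 2,  IR2AR = 1u << 3,
    RdSR = 1u << 4,   SRFtch = 1u << 5, SR2R = 1u << 6,   WtR = 1u << 7,
    R2DR = 1u << 8,   RAdd = 1u << 9,   ZAdd = 1u << 10,  RSub = 1u << 11,
    ZSub = 1u << 12,  RAnd = 1u << 13,  ZAnd = 1u << 14,  ROr = 1u << 15,
    ZOr = 1u << 16,   RNot = 1u << 17,  ZNot = 1u << 18,  IR2PC = 1u << 19,
    InR = 1u << 20,   OutR = 1u << 21
};

// Output signal -> mask of the µ-operations ORed together to produce it.
const std::map<std::string, uint32_t>& uDecFormulas() {
    static const std::map<std::string, uint32_t> f = {
        {"RLD", SR2R | RAdd | RSub | RAnd | ROr | RNot | InR},
        {"MW", WtR},          {"IOLD", OutR},      {"PCLD", IR2PC},
        {"DRLD", R2DR},       {"SRLD", RdSR | SRFtch},
        {"IRLD", RdIR},       {"ARLD", PC2AR | IR2AR},
        {"ZLD", ZAdd | ZSub | ZAnd | ZOr | ZNot},
        {"RE", SRFtch | WtR | R2DR | OutR},
        {"ME", RdIR | RdSR},  {"IOE", InR},        {"SPC", PC2AR},
        {"SALU", SR2R | RAdd | RSub | RAnd | ROr | RNot},
        {"SAD", IR2AR | IR2PC},
        {"SRF", SR2R | WtR | R2DR | RAdd | RSub | RAnd | ROr | RNot | InR},
        {"SS", SRFtch | OutR}, {"SD", WtR | R2DR}, {"PCIN", PCInc},
        {"ALUOP0", RAnd | ZAnd | RSub | ZSub | RNot | ZNot},
        {"ALUOP1", RSub | ZSub | RAnd | ZAnd | ROr | ZOr},
        {"ALUOP2", SR2R | RAnd | ZAnd | ROr | ZOr | RNot | ZNot},
    };
    return f;
}

// Evaluate one control signal for a horizontal µop word.
bool controlSignal(const std::string& name, uint32_t uop) {
    assert(uDecFormulas().count(name) == 1);
    return (uop & uDecFormulas().at(name)) != 0;
}
```

For example, the F1 word (RdIR and PCInc set) raises IRLD, ME, and PCIN, and leaves every other output low.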


Comparing µ-Control to Hardwired Control

I Ease of Processor Modification. We might want to modify the processor to add capability, fix bugs, or impose a new architecture. The hardwired processor must be redesigned from scratch. With the µ-controlled processor, often all that is necessary is to reprogram it.

I Complexity of the Processor Circuitry. With the hardwired processor, the circuitry must implement the full control flowchart. In the µ-controlled processor, most of the complexity is in the firmware.

I Speed of Machine Instruction Execution. The hardwired processor is an efficient, custom circuit. The µ-controlled processor design is more general, and it does extensive memory access to fetch the µ-instructions from the µROM, so it is generally slower.


Chapter 10: A Few Last Topics

We address two limitations of computers:

I Decreasing execution time. Techniques:

  I Cache memory.

  I Instruction pipelining.

I Increasing memory space. Techniques:

  I Virtual memory.


Cache Memory

I A small, fast memory in the processor.

I The processor keeps data that will probably be used soon in the cache unit.

I The prediction of usefulness for data is based on two principles.

I Temporal Locality. If a particular piece of data has just been accessed, it will probably be accessed again soon.

I Spatial Locality. If a particular piece of data has just been accessed, data close to it will probably be accessed soon.

I We study one caching method.


Direct Mapped Cache

I Data is always placed in a fixed location in cache. Example:

  I DRAM is 64 × 8.

  I Cache is a 4 × 20 SRAM unit.

I Each word of the cache is called a cache entry.

I When a word is fetched into cache, its neighbor is also fetched. (Each entry contains two words of DRAM.)

I Placing a word: drop its lowest address bit (the offset within the entry), and take the remaining address modulo 4. (010110 would be placed at SRAM location 11.)


Direct Mapped Cache (cont.)

I This location might already be in use. (For example, DRAM address 000110 would map to location 11 also.) This is called a cache conflict.

I A cache conflict may occur even if there are unoccupied entries.

I Use: When the processor needs a datum, it first looks to see if it is in cache. If so (a cache hit), it is fetched from cache. If not (a cache miss), it is fetched from DRAM and placed in cache.

I We need a way of determining whether a word is in its cache entry or not.



Details of the cache entry.

Index  V  Tag  Data (offset 0)  Data (offset 1)
00     1  101  10100001         00101111
01     1  011  10000001         00011100
10     1  110  11110111         00001000
11     0  000  01100110         10011001

I The Data field stores two words from DRAM.

I The V and tag fields are bookkeeping fields used to determine what locations are stored in the entry.



I The data field contains two DRAM words.

I Each DRAM word has a different offset: one at offset 0, and one at offset 1.

I When a DRAM word is fetched, so is the other word of its offset pair.

I The V bit indicates that an entry is occupied.



Cache addressing: The DRAM address is split into fields, and used to address the cache unit.

[Figure: DRAM address fields: tag (3 bits), index (2 bits), offset (1 bit).]

I The offset field is used to access the correct data word in the entry.

I The index is the entry number in the SRAM.

I The tag is stored in the entry, to uniquely identify the DRAM words in the cache entry.



Examples

For DRAM address 011100, the offset is 0, the index is 10, and the tag is 011.

To determine if this word is in the cache, the entry 10 is examined. If the entry’s V bit is set, and its tag is 011, then it is a cache hit. Otherwise it is a cache miss.



Procedure for a read operation:

1. Use the index to locate the cache entry corresponding to the address.

2. If the V bit is clear, this read results in a cache miss. The word and its offset pair are read from DRAM into the cache unit.

3. If the tag does not match the tag of the entry, this is also a miss.

4. Otherwise, the read results in a hit. The offset of the DRAM address specifies which of the two data words contains the corresponding data.
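The read procedure for the example cache can be sketched in C++ (64 × 8 DRAM, 4 entries of 2 words, so a 1-bit offset, a 2-bit index, and a 3-bit tag). The hypothetical `DirectMappedCache` counts hits and misses. On a miss it loads the word and its offset pair. It models reads only, so no write-back is needed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct CacheEntry {
    bool valid = false;             // V bit
    uint8_t tag = 0;                // 3-bit tag
    std::array<uint8_t, 2> data{};  // words at offset 0 and offset 1
};

struct DirectMappedCache {
    std::array<CacheEntry, 4> entries{};
    int hits = 0, misses = 0;

    uint8_t read(unsigned addr, const std::array<uint8_t, 64>& dram) {
        assert(addr < 64);
        unsigned offset = addr & 1u;          // bit 0
        unsigned index = (addr >> 1) & 3u;    // bits 2-1
        unsigned tag = addr >> 3;             // bits 5-3
        CacheEntry& e = entries[index];       // step 1: locate the entry
        if (e.valid && e.tag == tag) {
            ++hits;                           // step 4: hit
        } else {
            ++misses;                         // steps 2-3: miss, load the pair
            unsigned base = addr & ~1u;       // even address of the pair
            e.data = {dram[base], dram[base + 1]};
            e.tag = static_cast<uint8_t>(tag);
            e.valid = true;
        }
        return e.data[offset];                // offset selects the word
    }
};
```

Reading 010110 and then 010111 gives one miss followed by a hit. A later read of 000110 misses again, because it conflicts with the same entry.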



Determining address field sizes for a cache.

Examples

I DRAM size: $2^m \times n$ (A DRAM address is m bits.)

I Number of words in the data field: $2^w$

I Cache length: $2^k$

I Offset field: w bits

I Index field: k bits

I Tag field: $m - k - w$ bits



Cache entry width:

I Size of tag: $m - k - w$

I Size of V bit: 1

I Size of data: $n \cdot 2^w$

I Total: $n \cdot 2^w + (m - k - w) + 1$

Examples

I A DRAM that is of size 1K × 8 ($2^{10} \times 8$).

I A cache with eight ($2^3$) data words per entry, and 16 ($2^4$) entries.

Size of tag: $10 - 4 - 3 = 3$

Width of entry: $8 \cdot 8 + (10 - 4 - 3) + 1 = 68$
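The field-size formulas fit in a small C++ function (the name `cacheGeometry` is hypothetical). It reproduces the 68-bit entry of this example. It also gives the 20-bit entry width of the earlier 64 × 8 DRAM with a 4-entry, 2-word cache ($m = 6$, $n = 8$, $w = 1$, $k = 2$).

```cpp
#include <cassert>

// Field widths for a direct-mapped cache with a 2^m x n DRAM,
// 2^w words per entry, and 2^k entries.
struct CacheGeometry {
    int offsetBits;
    int indexBits;
    int tagBits;
    int entryWidth;  // data + tag + V bit
};

CacheGeometry cacheGeometry(int m, int n, int w, int k) {
    assert(m >= k + w && n > 0 && w >= 0 && k >= 0);
    int tag = m - k - w;
    int width = n * (1 << w) + tag + 1;
    return CacheGeometry{w, k, tag, width};
}
```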


Writing to Cache

I Writes are always performed on the cache entry.

I When a cache entry must be replaced because of a conflict, we must ensure that its changes have been written to DRAM.

I A cache entry that has changes is called dirty. A cache entry that has not been changed is called clean.

I Writing dirty entries to DRAM:

  I Write-back — the dirty cache entry is written to DRAM when the cache entry is replaced, due to a conflict.

  I Write-through — the dirty cache entry is written to DRAM immediately after each change.

  I Comparison: Write-back keeps the SRAM and DRAM inconsistent for long periods of time. Write-through requires many DRAM accesses, which decreases the cache performance.


Cache Performance

Measuring performance.

Hit and miss ratio: $R_h = \frac{N_h}{n}$, $R_m = 1 - R_h$

where n is the number of memory accesses, and $N_h$ is the number of hits.

Expected hit and miss rates: $h = E(R_h)$, $m = 1 - h$

Expected memory access time, on a machine with cache, where $T_h$ is the cache access time and $T_m$ the DRAM access time: $T_A = T_h + (1 - h) \cdot T_m$

Expected time to access memory on a machine without cache: $T_B = T_m$


Cache Performance (cont.)

Cache performance (comparing the performance of the two machines):

$P_{A,B} = \frac{T_B}{T_A}$

Examples

For Machine A, $T_h = 2$, $T_m = 8$, and $h = 0.75$. The performance of the cached machine compared to the base machine is as follows:

$P_{A,B} = \frac{8}{2 + (1 - 0.75) \cdot 8} = \frac{8}{4} = 2$
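The performance formulas can be written as C++ functions with hypothetical names. They use the same model: every access pays $T_h$, and a miss also pays $T_m$.

```cpp
#include <cassert>

// T_A = T_h + (1 - h) * T_m
double cachedAccessTime(double th, double tm, double h) {
    assert(h >= 0.0 && h <= 1.0);
    return th + (1.0 - h) * tm;
}

// P_{A,B} = T_B / T_A, with T_B = T_m for the machine without cache.
double cacheSpeedup(double th, double tm, double h) {
    return tm / cachedAccessTime(th, tm, h);
}
```

With the example values, `cacheSpeedup(2, 8, 0.75)` evaluates $8 / 4$.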


Instruction Pipelining

I Pipelining increases throughput: the number of instructions that can be executed in a given time period.

I Pipelining does not change execution speed: the time needed to execute a single instruction.

I Pipelining introduces parallelism into the architecture: the ability to work on several instructions simultaneously.

I Ways of introducing parallelism:

  I Using several processors. This requires dividing the program among the processors, and devising a method of communication between processors.

  I Using multiple cores. Cores are scaled-down processors that are on the same chip, and that share devices, like memory. Multi-core processors have fewer problems with inter-core communication.

  I Using a single data-path and control unit that is designed to process several instructions simultaneously (pipelining).


Instruction Pipelining (cont.)

I Pipelining requires restructuring the normal bus architecture into a stream architecture.

I In a stream architecture, the data-path is structured as several devices, each feeding its output into the input of the next downstream device.

I Each device in the stream is referred to as a stage.

I In a simple scenario, each stage does its work in 1 clock cycle.



An example 5-stage pipeline:

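[Figure: an example 5-stage pipeline, Ftch → Dec → Ex → Mem → WtB, with the memory, the register file (Reg), and the ALSU as the devices used by the stages.]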


Instruction Pipelining

Pipeline Stages:

1. Ftch — Fetch an instruction from memory.

2. Dec — Decode the instruction, and fetch operands.

3. Ex — Execute the instruction.

4. Mem — Read/write from/to memory, if needed.

5. WtB — (Write-back) Write result to the register file, if necessary.

Clocking.

I The clock period must be long enough to accommodate the slowest stage.

I For an s-stage pipeline, an instruction will finish in s cycles.



I We plan on running all stages simultaneously.

  I Resources, like the memory unit, used by the pipeline must be specially designed to handle multiple requests simultaneously, from several stages.

I In the pipelining scheme, all stages work on different instructions.

Example with consecutive instructions, I1, I2, I3, I4, and I5:

Cycle  Ftch  Dec  Ex   Mem  WtB
0      I1    -    -    -    -
1      I2    I1   -    -    -
2      I3    I2   I1   -    -
3      I4    I3   I2   I1   -
4      I5    I4   I3   I2   I1
5      -     I5   I4   I3   I2
6      -     -    I5   I4   I3
7      -     -    -    I5   I4
8      -     -    -    -    I5
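In this table, instruction $I_{i+1}$ enters Ftch in cycle $i$ and moves forward one stage per cycle. The occupant of a stage is therefore fixed by the cycle number minus the stage number. The C++ sketch below assumes one new instruction per cycle and no stalls; the function name is hypothetical.

```cpp
#include <cassert>

// Which instruction (1-based) occupies `stage` (0 = Ftch ... 4 = WtB)
// in `cycle`, for n consecutive instructions; 0 means the stage is empty.
int stageOccupant(int cycle, int stage, int n) {
    assert(cycle >= 0 && stage >= 0 && n >= 0);
    int i = cycle - stage;  // I(i+1) was fetched in cycle i
    return (i >= 0 && i < n) ? i + 1 : 0;
}
```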


Problems with Pipelines

Problems:

I Data hazards. A result of an unfinished machine instruction is needed by a later machine instruction.

I Branch hazard. A result from a conditional branch instruction is needed to determine which instruction to fetch next.

Examples (Data Hazard)

add R0, R1, R2

mult R2, R0, R2

Cycle  Ftch  Dec   Ex   Mem  WtB
0      Add   -     -    -    -
1      Mult  Add   -    -    -
2      -     Mult  Add  -    -

(Notice that when the Mult instruction is fetching its operand R0, the Add instruction has not yet stored its result in R0.)


Problems with Pipelines (cont.)

One solution, although it slows the pipeline, is to inject a 3-cycle stall between the two instructions.

Cycle  Ftch  Dec   Ex   Mem  WtB
0      Add   -     -    -    -
1      -     Add   -    -    -
2      -     -     Add  -    -
3      -     -     -    Add  -
4      Mult  -     -    -    Add
5      -     Mult  -    -    -

(Notice that the Add instruction has completed writing its resultby the time the Mult instruction is fetching operands.)


Problems with Pipelines (cont.)

Examples (Branch Hazard)

beq R0, R1, xyz

Cycle  Ftch  Dec  Ex   Mem  WtB
0      Beq   -    -    -    -
1      I1    Beq  -    -    -
2      I2    I1   Beq  -    -
3      I3    I2   I1   Beq  -
4      I4    I3   I2   I1   Beq

I Notice that by the time the Beq instruction has written its result to the PC, the next 4 instructions have entered the pipeline.

I If the branch is taken, these instructions should not have been started.

I Again, this can be solved by inserting a 4-cycle stall after the Beq instruction.
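The stall counts above follow from the stage positions. The C++ sketch makes two assumptions, both matching the tables. First, a register written in WtB can only be read by a Dec stage in a later cycle. Second, the PC written by a branch in WtB is available for the fetch in the following cycle. The function names are hypothetical.

```cpp
#include <cassert>

// Stage indices in the 5-stage pipeline (Ftch = 0 ... WtB = 4).
constexpr int kDec = 1, kWtB = 4;

// Stall cycles so that an instruction `distance` instructions after the
// producer reaches Dec only after the producer's WtB cycle.
int dataHazardStall(int distance) {
    assert(distance >= 1);
    int stall = kWtB - kDec + 1 - distance;  // 3 for adjacent instructions
    return stall > 0 ? stall : 0;
}

// The branch updates the PC in WtB, so the next fetch waits until after it.
int branchStall() { return kWtB; }           // 4 cycles for this pipeline
```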


Pipeline Performance

We compare a pipelined machine to one that is not.

Expected time (in cycles) to execute a sequence of n instructions on a k-stage pipeline:

$T_k = (n - 1) + k$ (It is assumed that the pipeline is operated at full capacity.)

Expected time to execute n instructions on a single-stage machine, where each instruction takes k cycles: $T_1 = n \cdot k$

Performance: $P_k = \frac{T_1}{T_k}$


Pipeline Performance (cont.)

Examples (A 4-stage machine executing n = 1,000 instructions)

$T_k = (1000 - 1) + 4 = 1003$

$T_1 = 1000 \cdot 4 = 4000$

$P_k = \frac{4000}{1003} \approx 3.988$

(The pipelined machine is about 4 times as fast.)
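The same formulas can be written in C++ (the function names are hypothetical). As n grows, $P_k$ approaches k, because the $k - 1$ cycles needed to fill the pipeline are paid only once.

```cpp
#include <cassert>

// T_k = (n - 1) + k: k cycles for the first instruction, then one per cycle.
long pipelinedCycles(long n, long k) {
    assert(n >= 1 && k >= 1);
    return (n - 1) + k;
}

// T_1 = n * k: each instruction takes k cycles and none overlap.
long unpipelinedCycles(long n, long k) { return n * k; }

// P_k = T_1 / T_k
double pipelineSpeedup(long n, long k) {
    return static_cast<double>(unpipelinedCycles(n, k)) /
           static_cast<double>(pipelinedCycles(n, k));
}
```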


Increasing Memory Space

The memory hierarchy extends logical memory from the cache, all of the way to disk storage.

[Figure: the memory hierarchy: registers (Reg), L1 cache, L2 cache, memory, disk.]


Virtual Memory and Paging

I To execute a program that cannot fit in memory, we split it into pieces called pages.

I Pages are brought into memory as needed, and swapped out when we finish with them. On disk they reside in the swap file.

I Every word in the program has a virtual address: its location in the large virtual memory.

I Every word that is in DRAM has a physical address: its location in the DRAM unit.

I When the processor fetches a word using its virtual address, it must determine whether the word is on disk or in DRAM.

I If the word is in DRAM (a page hit), it is used from DRAM.

I If the word is not in memory (a page fault), the page containing it is read from the swap file and placed in DRAM.


Paging

Examples

I Program size: $16K = 2^4 \times 2^{10}$ (14-bit virtual address)

I Workspace size: $4K = 2^2 \times 2^{10}$ (12-bit physical address)

I Page size: $1K = 2^{10}$ (10-bit page offset)

I The program would be split into 16 pages.

I The workspace is split into frames. Each frame holds 1 page.

I Frames are numbered 0 – 3 (00 – 11)

I We refer to a workspace split up into frames as a frame table.

I The swap file would be 16K words. It would be split into 16 pages.

I Pages would be numbered 0 – 15 (0000 - 1111).


Paging (cont.)

Address translation: for the example machine, the 14-bit virtual address is split into a 4-bit page number and a 10-bit offset.

[Figure: translating virtual address 0100 0101010101. The page number 0100 selects a page-table entry (V bit and frame number); the entry gives frame 10, so the physical address is 10 0101010101. The page table, the workspace (frames 00 to 11), and the swap file are shown.]


Paging (cont.)

I A page table is kept in protected memory. It has an entry for every page of the program. The page table entry for a page number gives its frame number, if the page is in DRAM (as determined by the V bit).

I To assemble the physical address of a page that is in DRAM (the V bit is 1), the page number is looked up in the page table, yielding the frame number.

I The physical address is then completed by appending the offset from the virtual address.

I The required word is then accessed from the frame table.

I If a page is not in the frame table (the V bit is 0), the swap file is accessed using the page number, and the page is loaded into the frame table.
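The translation steps can be sketched in C++ for the example machine, which has a 4-bit page number, a 10-bit offset, and a 2-bit frame number. The hypothetical `translate` returns `kPageFault` when the V bit is 0. At that point a real system would load the page from the swap file.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct PageTableEntry {
    bool valid = false;  // V bit: the page is in the frame table
    uint8_t frame = 0;   // frame number, 0-3
};

using PageTable = std::array<PageTableEntry, 16>;  // one entry per page

constexpr int kPageFault = -1;

// Returns the 12-bit physical address, or kPageFault if the V bit is 0.
int translate(const PageTable& table, uint16_t vaddr) {
    assert(vaddr < (1u << 14));
    unsigned page = vaddr >> 10;       // top 4 bits
    unsigned offset = vaddr & 0x3FFu;  // low 10 bits
    const PageTableEntry& e = table[page];
    if (!e.valid) return kPageFault;   // page must come from the swap file
    assert(e.frame < 4);
    return static_cast<int>((e.frame << 10) | offset);  // frame# + offset
}
```

With page 0100 mapped to frame 10, the virtual address 01000101010101 translates to the physical address 100101010101.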


Page Replacement

If a new page needs to be loaded into the frame table, but the frame table is full, one of the frames must be written out to the swap file, and the emptied frame must be filled with the new page. This process is called a page swap.

Page replacement strategies:

I Random (RAN) replacement. Randomly choose a frame to replace.

I Least recently used (LRU) replacement. Replace the page that has not been accessed for the longest time.

I First in, first out (FIFO) replacement. Replace the frame that has been sitting in the frame table the longest.
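LRU and FIFO can be compared by simulating a page reference string. In this C++ sketch (the names are hypothetical), the resident pages sit in a vector whose front is always the next victim. LRU moves a page to the back on every hit, while FIFO leaves the order alone. Random replacement is not modeled. For the reference string 0, 1, 2, 0, 3, 0, 4 with 3 frames, FIFO causes 6 page faults and LRU causes 5.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Policy { FIFO, LRU };

// Count page faults for a reference string using `frames` frames.
int countFaults(const std::vector<int>& refs, int frames, Policy policy) {
    assert(frames > 0);
    std::vector<int> resident;  // front() = next page to swap out
    int faults = 0;
    for (int page : refs) {
        auto it = std::find(resident.begin(), resident.end(), page);
        if (it != resident.end()) {            // page hit
            if (policy == Policy::LRU) {       // now most recently used
                resident.erase(it);
                resident.push_back(page);
            }
            continue;
        }
        ++faults;                              // page fault
        if (static_cast<int>(resident.size()) == frames)
            resident.erase(resident.begin());  // page swap: evict victim
        resident.push_back(page);
    }
    return faults;
}
```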


Disk Access

I LRU and FIFO both make reasonable predictions about how useful the pages in the frame table are.

I It is, however, possible to create realistic page reference patterns that cause LRU and FIFO to do excessive swaps.

I Random replacement is more immune to this, mostly because of its unpredictability.

I Another problem with paging is that pages are fixed size, and a program is split automatically into pages. Natural programming structures, like loops, might be split between pages. This could cause page swapping as the loop is executed.


Disk Access (cont.)

I In an alternate scheme to paging, segmentation, the user explicitly breaks the program up into variable size blocks, resulting in more natural breaks.

I The problem with segmentation is making full use of the workspace with variable size segments. Unused space in the workspace results in an increase in swapping.

I (It is possible to combine paging and segmentation into a 2-level hybrid compromise system.)


Memory Protection

I In a non-virtual memory system, there is nothing stopping one process from writing or reading to/from another process’s workspace, by specifying an appropriate address.

I In a virtual memory system, any address specified by a program is considered a virtual address, and translated into a location in the program’s own workspace.

I The translation process automatically protects all processes from each other.

I The weakness of the virtual memory system is the page table: a program cannot be allowed to change its own page table.

I The page table is stored in protected memory that can only be accessed by a process running in kernel mode (the OS). Because of this, the page table can only be changed through the interrupt system.


Other Interesting Topics

I/O Structure

I How is data sent to another device: one bit at a time (serial communication), or all at once (parallel communication)?

I The actual circuitry involved in handling interrupts (the interrupt cycle).

I Speeding up memory access. This can be done by out-sourcing transfers to a separate processor, called a direct memory access device (DMA). This allows the CPU to continue work while the DMA works on the transfer asynchronously.


Other Interesting Topics (cont.)

Parallel Architectures

I Systems with several processors

I Adding processors to a task can decrease execution time. However, there are complications:

  I A scheme for interprocess communication must be developed.

  I Processor synchronization must be addressed.

  I Sharing of memory is an issue: is a single memory shared by all processors, is memory divided up among processors (distributed memory), or does each processor have its own unshared memory?