Computer Organization: Basic Processor Structure
James Gil de Lamadrid
April 17, 2018
Chapter 1: Overview
- Computer Science students start by learning a high-level language. We study what lies below the high-level code they write.
- We break our study into two areas:
  - Computer Organization - the study of the implementation of the computer.
  - Computer Architecture - the study of the interface to the computer.
High-Level Languages
- Programming languages are classified by level.
- Low-level languages are closer to the hardware.
- High-level languages manipulate more abstract data structures.
Examples
- Haskell - a functional language.
- C++ - an object-oriented language.
Machine Language
- Machine language is numeric.
- A machine instruction is a collection of fields, or numbers that represent the information given in the instruction.
- Instruction format: op-code, destination, source, constant
- Machine instructions operate on registers.
Machine Language (cont.)
Examples
Source code:
    x = 5 + y * 3;
Machine code:
    1, 1, 2, 3
    14, 1, 1, 5
Meaning: (Registers R1 and R2 are used to represent the variables x and y, respectively.)
    R1 = R2 * 3
    R1 = R1 + 5
Assembly Language
- Assembly language is a symbolic version of machine language.
- The numbers forming the parts of a machine instruction are given symbolic names.
- The programmer is relieved of remembering the meanings of numbers.
Examples
Assembly code:
    mult R1, R2, #3
    add R1, R1, #5
Compilers & Assembly Language
- High-level source code must be translated into machine code to be able to execute on hardware.
- Translation is done in several stages. In the first stage, source code is often translated into assembly code.
- The translation process:
  1. Parse - the source code is translated into an abstract representation, often an abstract syntax tree (AST).
  2. Generate code - the AST is traversed, and as it is, assembly code for each node is written.
Compilers & Assembly Language (cont.)
Examples
Example AST (children are indented under their parent node):
    =
      x
      +
        5
        *
          y
          3
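As a rough illustration of the code-generation step, the sketch below walks the example AST in post-order and emits one assembly line per operator node. The tuple encoding of the tree, the `REGS` mapping, and the function names are assumptions made for this sketch, and it only handles operators that have one constant operand.

```python
# Assumed for this sketch: x lives in R1 and y in R2, as in the earlier example.
REGS = {"x": "R1", "y": "R2"}
MNEMONIC = {"+": "add", "*": "mult"}

def gen_expr(node, dest, out):
    """Emit code that leaves the value of `node` in `dest`; return that register."""
    if isinstance(node, str):            # a variable already lives in a register
        return REGS[node]
    op, left, right = node
    # Pick out the constant operand; the other one is a variable or a subtree.
    const, other = (left, right) if isinstance(left, int) else (right, left)
    src = gen_expr(other, dest, out)     # post-order: generate the operand first
    out.append(f"{MNEMONIC[op]} {dest}, {src}, #{const}")
    return dest

def gen_assign(ast):
    """Generate assembly for an assignment node ('=', target, expression)."""
    _, target, expr = ast
    out = []
    gen_expr(expr, REGS[target], out)
    return out

ast = ("=", "x", ("+", 5, ("*", "y", 3)))
for line in gen_assign(ast):
    print(line)
```

For the example tree this emits the multiply before the add, because the multiplication node sits deeper in the tree.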
Assemblers & Object Code
- The assembler translates assembly code to object code.
- Object code is incomplete machine code.
- The assembler has trouble completing the machine code because of external references.
- A module containing a reference to a definition from another module has an external reference.
External References
Module Q contains:
    extern int x;
    x = 5;
Module Driver contains:
    int x;
In assembly language, this would be:
- Q:
    store x, #5
- Driver (allocates a word in memory for the variable x):
    x: .word
External References (cont.)
- Translating the store into machine language might yield the following (we assume the op-code for store is 19, and x has been allocated memory location 50):
    19, 50, 5
- But the assembler only analyzes one module at a time, and cannot determine what memory location has been allocated to x.
- Instead, the assembler produces the following object code instruction, with a blank left for the address of x, to be filled in when it is eventually calculated:
    19, x?, 5
Compiler vs. Assembler
- The compiler's parsing activity is complex.
- Code generation is complex, often producing several assembly instructions for each high-level statement.
- Assembler translation is little more than looking up symbols in a symbol table.
- The numeric field values are assembled into a full instruction.
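The table-lookup nature of assembly can be sketched in a few lines. The op-codes 1 (mult) and 14 (add) are taken from the earlier machine-code example; the `OPCODES` table, the function name, and the parsing rules (registers written `R<n>`, constants written `#<n>`) are assumptions for this sketch.

```python
# Symbol table for mnemonics; the numbers come from the earlier example.
OPCODES = {"mult": 1, "add": 14}

def assemble(line):
    """Translate one assembly line into a tuple of numeric fields."""
    mnemonic, operands = line.split(None, 1)
    fields = [OPCODES[mnemonic]]               # look up the op-code
    for operand in operands.split(","):
        # Registers (R1) and constants (#3) both reduce to their number.
        fields.append(int(operand.strip().lstrip("R#")))
    return tuple(fields)

print(assemble("mult R1, R2, #3"))
print(assemble("add R1, R1, #5"))
```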
The Linker & Executable Code
- The input to the linker is a set of object code files.
- The output is a single executable code file.
- Linker tasks:
  - Resolve external references.
  - Library search.
  - Relocation of object code modules.
The Linker & Executable Code (cont.)
- Resolving external references: The linker sees both module Q and module Driver. It can calculate the address of x in Driver, and fill in the blank in the Q module.
- Library searches: The linker pulls in modules from the library, and adds them to the executable code, in order to resolve some external references.
- Relocation: Modules are assigned an order in memory. The addresses in each module must be adjusted to reflect the module's position.
Library Search Example
Examples
(A0 is the argument register, used to pass an argument to a function, and RV is the return value register, used to pass a value back from a function.)
Source code:
    z = sqrt(y);
Assembly code:
    load A0, y
    call sqrt
    store z, RV
Relocation Example
- Module Driver has addresses 0 - 2,999.
- Module Q has addresses 0 - 1,999.
- Module Q is placed after Module Driver. The base address of Module Q is now 3,000.
- All addresses from Module Q must be modified by adding 3,000 to them.
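The two linker tasks that touch addresses, resolving external references and relocation, can be sketched together. The module sizes and the store instruction `19, x?, 5` come from the slides; the tuple encoding of modules, the `("extern", name)` and `("local", address)` field markers, and the second instruction of Q are hypothetical, introduced only for this sketch.

```python
def link(modules):
    """modules: list of (name, size, symbols, code) in load order.

    symbols maps a defined name to its module-local address; each field of an
    instruction is a number, ("extern", name), or ("local", address).
    """
    bases, table, base = {}, {}, 0
    for name, size, symbols, _ in modules:       # pass 1: place the modules
        bases[name] = base
        for symbol, local in symbols.items():
            table[symbol] = base + local         # final address of each definition
        base += size

    def fix(field, module_base):
        if isinstance(field, tuple):
            kind, value = field
            if kind == "extern":
                return table[value]              # fill in the blank
            return value + module_base           # relocate a local address
        return field

    executable = [[fix(f, bases[name]) for f in instruction]
                  for name, _, _, code in modules for instruction in code]
    return bases, executable

driver = ("Driver", 3000, {"x": 50}, [])
q = ("Q", 2000, {}, [[19, ("extern", "x"), 5],   # store x, #5
                     [19, ("local", 10), 7]])    # hypothetical store to Q's own data
bases, executable = link([driver, q])
print(bases)
print(executable)
```

With Driver placed first, x keeps address 50, while the local address 10 inside Q becomes 3,010.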
The Loader
- The loader:
  - Relocates the executable code.
  - Initializes registers.
- The loader loads an executable program into its own section of memory, called its workspace.
- Several programs (processes) can be active simultaneously.
- The processor executes small pieces of each process (called quanta) in rapid succession, making it appear that all processes in memory are running simultaneously.
- Depending on the location of the program workspace, addresses in the executable code will need to be altered yet again.
- Several registers with special uses must be initialized before the program is started.
Initializing the PC Register
The Program Counter (PC) is a register that contains the memory address of the next instruction to be executed. It must be updated each time an instruction is executed. Initially, it must be set to point to the base address of the program workspace.
[Figure: memory containing a workspace that starts at address 2000; the current instruction is at address 2349, and the PC holds 2350.]
Translation Summary
The Processor
- Levels of abstraction for hardware:
  - The register transfer level (RTL), or behavioral level.
  - The gate level, or structural level.
- Processor behavior: The processor repeatedly executes the machine cycle, which reads a single instruction from memory and executes it.
- Steps in the machine cycle:
  1. Fetch an instruction from memory.
  2. Decode the instruction. (Split the instruction into fields.)
  3. Execute the instruction.
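The machine cycle can be sketched as a loop around the PC. The op-codes 1 (multiply) and 14 (add) and the four-field instruction format come from the earlier machine-language example; the halt op-code 0, the dictionary used as memory, and the base address 2000 are assumptions for this sketch.

```python
HALT, MULT, ADD = 0, 1, 14   # HALT is assumed; MULT and ADD follow the earlier example

def run(memory, pc, registers):
    """Repeat fetch, decode, execute until a halt instruction is reached."""
    while True:
        instruction = memory[pc]                  # 1. fetch
        pc += 1                                   #    PC now names the next instruction
        opcode, dest, src, const = instruction    # 2. decode into fields
        if opcode == HALT:                        # 3. execute
            return pc, registers
        if opcode == MULT:
            registers[dest] = registers[src] * const
        elif opcode == ADD:
            registers[dest] = registers[src] + const
        else:
            raise ValueError(f"unknown op-code {opcode}")

# x = 5 + y * 3 with y = 4 in R2, loaded at an assumed base address of 2000.
memory = {2000: (1, 1, 2, 3), 2001: (14, 1, 1, 5), 2002: (0, 0, 0, 0)}
pc, registers = run(memory, 2000, {1: 0, 2: 4})
print(pc, registers[1])
```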
Processor Structure
- The processor contains registers for storage. Collectively they are referred to as the register file.
- An arithmetic logic unit (ALU) performs operations on data stored in registers.
- The way the devices in the processor are connected is called the data-path.
- The circuit that controls the data-path and all devices is the control unit.
The Data-Path
Examples (Simple 2-register data-path.)
    R1 ← R1 + R2
    R2 ← 0
Corresponding circuit:
[Figure: registers R1 and R2; an adder (+) supplies the input of R1, and the constant 0 supplies the input of R2.]
Operations performed:
1. Add the contents of R1 and R2, and put the result in register R1.
2. Set register R2 to zero.
The input to the registers is calculated by circuitry called a computational unit.
Control Circuitry
Examples (Simple 2-input control.)
    S1: R1 ← R1 + R2
    S2: R2 ← 0
Corresponding circuit:
[Figure: registers R1 and R2 with an adder (+) and the constant 0 as inputs; control input S1 drives the load (LD) line of R1, and control input S2 drives the load (LD) line of R2.]
Registers are opened for input when the load (LD) line is triggered by the control inputs.
These descriptions are at the register transfer level (RTL). RTL shows a collection of connected devices.
Digital Circuitry
- Below the RTL level is the digital circuit level, or gate level.
- Gate level circuits are composed of gates.
- Digital circuits represent Boolean values as voltages (maybe 0V for false, and 5V for true).
- Gates compute Boolean functions from input signals.
Examples (AND gate: computes z = a ∧ b.)
[Figure: AND gate with inputs a and b, and output z.]
Combining Gates into Larger Functions
Examples (Circuit that computes z = a ∧ b ∨ ¬c)
[Figure: circuit with inputs a, b, and c, and output z.]
(Uses an OR gate, and a NOT gate, or inverter.)
Chapter 2: Number, and Logic Systems
Topics covered:
- Computer systems use the base two (binary) number system.
- This system is cumbersome for people. A system that is less cumbersome for people, but still easily translatable to binary, is hexadecimal.
- The circuitry in computer systems is based on Boolean algebra.
Numbers
- Binary has two digits: 0 and 1.
- A binary digit is called a bit.
- Numbers are stored in a collection of bits of fixed width. The collection of bits is called a processor word. A 13 in a 4-bit word would be 1101. In an 8-bit word, it would be 00001101.
- Decimal expansion: $365 = 3 \times 10^2 + 6 \times 10^1 + 5 \times 10^0$.
- Digits: The leftmost digit is referred to as the high-order digit, and the rightmost digit is the low-order digit.
- Decimal is base 10 (the radix in the expansion is ten), and has ten digits: 0 through 9.
Binary Numbers
- Binary expansion:
  $00110101 = 0 \times 2^7 + 0 \times 2^6 + 1 \times 2^5 + 1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 2^5 + 2^4 + 2^2 + 2^0$
- Converting from binary to decimal: simply do the calculations in the binary expansion in decimal.
  $00110101 = 2^5 + 2^4 + 2^2 + 2^0 = 32 + 16 + 4 + 1 = 53$
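The expansion translates directly into a loop over bit positions. This is a minimal sketch; the function name is an assumption.

```python
def binary_to_decimal(bits):
    """Sum bit * 2**position, with position 0 at the low-order (rightmost) bit."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

print(binary_to_decimal("00110101"))
```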
Binary Numbers (cont.)
Converting from decimal to binary.
Examples (Converting 365 to binary using successive division.)
    Calculation   Quotient   Remainder
    365 ÷ 2       182        1
    182 ÷ 2       91         0
    91 ÷ 2        45         1
    45 ÷ 2        22         1
    22 ÷ 2        11         0
    11 ÷ 2        5          1
    5 ÷ 2         2          1
    2 ÷ 2         1          0
    1 ÷ 2         0          1
Understanding Successive Division
Successive division in decimal:
Examples
    Calculation   Quotient   Remainder
    365 ÷ 10      36         5
    36 ÷ 10       3          6
    3 ÷ 10        0          3
- Each division pulls off one digit of the number.
- Low-order digits are extracted first.
- Division by 10 extracts decimal digits. Division by 2 extracts bits.
- To form a binary number out of the results of successive division, list the remainders from last extracted to first extracted, left to right. For the example, that would be 365 = 101101101.
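Successive division can be sketched as follows; the function name and the `base` parameter are assumptions, and the sketch is limited to bases up to 10 because it writes each remainder as a single decimal character.

```python
def to_base(n, base=2):
    """Convert a non-negative integer by repeated division, collecting remainders."""
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)   # quotient feeds the next division
        digits.append(remainder)         # low-order digits come out first
    # List the remainders from last extracted to first extracted.
    return "".join(str(d) for d in reversed(digits)) or "0"

print(to_base(365, 2))
print(to_base(365, 10))
```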
Hexadecimal Numbers
- Hexadecimal is base 16, with 16 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. (A - F represent the digits 10 - 15.)
- To convert from hexadecimal to decimal, use the hex expansion.
  $\text{A3F} = 10 \times 16^2 + 3 \times 16^1 + 15 \times 16^0 = 2{,}560 + 48 + 15 = 2{,}623$
Hexadecimal, & Binary
- Converting hex from/into binary: a single hex digit is four binary digits.
- To convert from hex to binary, replace each hex digit with its corresponding 4-bit representation.
- To convert from binary to hexadecimal, replace each group of four bits by the corresponding hex digit.
Examples
    A3F = 1010 0011 1111
    0010111010001011 = 0010 1110 1000 1011 = 2E8B
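The digit-by-digit substitution can be sketched with a lookup string; the names below are assumptions, and the binary input is padded on the left to a multiple of four bits.

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_binary(hex_string):
    """Replace each hex digit with its 4-bit group."""
    return " ".join(format(HEX_DIGITS.index(d), "04b") for d in hex_string)

def binary_to_hex(bits):
    """Replace each group of four bits with one hex digit."""
    width = -(-len(bits) // 4) * 4       # round the length up to a multiple of 4
    bits = bits.zfill(width)
    return "".join(HEX_DIGITS[int(bits[i:i + 4], 2)] for i in range(0, width, 4))

print(hex_to_binary("A3F"))
print(binary_to_hex("0010111010001011"))
```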
Adding Binary Numbers
Examples (Decimal addition)
    carry    0 0 1 0 0
               1 3 6 5
             + 1 1 9 2
    sum        2 5 5 7
- You add column by column.
- In each column, you add two operand digits and a carry-in digit.
- Each addition results in a sum digit and a carry-out digit.
Adding Binary Numbers (cont.)
Examples (Binary addition)
    carry    0 0 1 1 0
               1 0 1 1
             + 0 0 1 1
    sum        1 1 1 0
- Carry-in to the low-order column is 0.
- A carry-out of 1 occurs when the column sum is greater than or equal to 2.
- When a carry-out occurs, the sum digit is the sum minus 2.
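The column-by-column rules can be sketched as a loop from the low-order column upward. The function name is an assumption, and both operands are assumed to have the same width.

```python
def add_binary(a, b):
    """Add two equal-width bit strings; return (final carry-out, sum bits)."""
    carry, sum_bits = 0, []              # carry-in to the low-order column is 0
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry
        carry = 1 if total >= 2 else 0   # carry-out when the column sum reaches 2
        sum_bits.append(str(total - 2 * carry))
    return carry, "".join(reversed(sum_bits))

print(add_binary("1011", "0011"))
```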
Representing Negative Numbers
- Computers support two numbering systems:
  - Unsigned integers - all bit configurations of the word are used to represent non-negative integers.
  - Signed integers - half of the processor word bit configurations are used to represent negative integers, and half are used to represent non-negative integers.
- For signed integers, the top bit is the sign bit.
  - A 0 bit indicates a non-negative number.
  - A 1 bit indicates a negative number.
Signed Notations
    Notation           107          -107
    Sign-magnitude     0 1101011    1 1101011
    One's complement   0 1101011    1 0010100
    Two's complement   0 1101011    1 0010101
Notations:
- Sign-magnitude - formed by writing the magnitude in binary, and tacking on the correct sign bit.
- One's complement - formed by inverting every bit in the number.
- Two's complement - formed by adding 1 to the one's complement.
Signed Notations (cont.)
- Problem: sign-magnitude has two values of 0:
  - +0: 00000000
  - -0: 10000000
- Problem: one's complement also has two values of 0:
  - +0: 00000000
  - -0: 11111111
Examples (Two's complement of +107 = 01101011.)
One's complement: 10010100
    carry    0 0 0 0 0 0 0 0 0
               1 0 0 1 0 1 0 0
             +               1
    sum        1 0 0 1 0 1 0 1
Desirable Properties of Two's Complement
1. There is only one representation of 0. (This can be seen by taking the 2's comp. of 0: take the 1's comp. of 0 and add 1. The carry-out of the high-order column is discarded.)
    carry    1 1 1 1 1 1 1 1 0
               1 1 1 1 1 1 1 1
             +               1
    sum        0 0 0 0 0 0 0 0
2. Negation is its own inverse. (−(−a) = a)
    2's-comp(00011010) = 11100110
    2's-comp(11100110) = 00011010
3. The negative of a number is its additive inverse. (a + (−a) = 0) As an example, we do 26 + (−26).
    carry    1 1 1 1 1 1 1 0 0
               0 0 0 1 1 0 1 0
             + 1 1 1 0 0 1 1 0
    sum        0 0 0 0 0 0 0 0
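Shortcut 2's Comp. Calculation
A copy transformation is used to calculate the 2's comp. of a binary number: the trailing 0's and the first 1 (counting from the low-order end) are copied as is, and the rest of the bits are replaced by their 1's comp.
                  rest        1st 1    trail. 0's
    number        11001       1        00
    operation     1's comp.   as is    as is
    2's comp.     00110       1        00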
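Both ways of forming the 2's complement can be sketched and compared. The function names are assumptions; the first follows the definition (invert every bit, then add 1 and drop the carry-out), and the second follows the copy shortcut.

```python
def invert(bits):
    """One's complement: flip every bit."""
    return "".join("1" if b == "0" else "0" for b in bits)

def twos_complement(bits):
    """Invert every bit, add 1, and discard any carry-out of the top bit."""
    value = (int(invert(bits), 2) + 1) % (1 << len(bits))
    return format(value, f"0{len(bits)}b")

def twos_complement_shortcut(bits):
    """Copy the trailing 0's and the first 1 as is; invert the rest."""
    first_one = bits.rfind("1")          # lowest-order 1
    if first_one == -1:                  # all zeros: 0 is its own negative
        return bits
    return invert(bits[:first_one]) + bits[first_one:]

for word in ("01101011", "00011010", "11001100", "00000000"):
    print(word, twos_complement(word), twos_complement_shortcut(word))
```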
Boolean Algebra
- Boolean algebra is an algebra, like arithmetic algebra, in which we form expressions from operators and operands.
- In arithmetic algebra, the expressions are used to describe functions that operate on numbers.
- In Boolean algebra, the expressions operate on Boolean values: false, written as 0, and true, written as 1.
Examples
Arithmetic expression: x + 2 · y
Boolean expression: a · b + a · b
("+" is the OR operator, and "·" is the AND operator.)
AND, OR, and NOT Operations
Truth tables:
    a  b | a · b   a + b
    0  0 |   0       0
    0  1 |   0       1
    1  0 |   0       1
    1  1 |   1       1

    a | $\overline{a}$
    0 | 1
    1 | 0
- The truth table shows the output of a Boolean function for every possible value of input.
- It is split into an input half and an output half.
- To produce all input values, count in binary, with each row having a different count in the input half. (In the table for AND and OR, this would give 2-bit counts of 00, 01, 10, and 11.)
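Counting in binary over the inputs is what `itertools.product` does when given the values (0, 1). The sketch below builds the rows of a truth table for any function of n inputs; the function name is an assumption.

```python
from itertools import product

def truth_table(func, n):
    """Return (inputs, output) rows, with inputs counting up in binary."""
    return [(inputs, func(*inputs)) for inputs in product((0, 1), repeat=n)]

for inputs, output in truth_table(lambda a, b: a & b, 2):   # AND
    print(inputs, output)
```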
Other Common Boolean Operators
Operators XOR ($a \oplus b$), XNOR ($a \odot b$), NAND ($\overline{a \cdot b}$), and NOR ($\overline{a + b}$).
    a  b | XOR   XNOR   NAND   NOR
    0  0 |  0     1      1      1
    0  1 |  1     0      1      0
    1  0 |  1     0      1      0
    1  1 |  0     1      0      0
Operation Summary
- $a \cdot b$ (AND): outputs 1 iff all of its operands are 1.
- $a + b$ (OR): outputs 1 iff any of its operands is 1.
- $\overline{a}$ (NOT): outputs 1 iff its operand is 0.
- $a \oplus b$ (XOR): outputs 1 iff its operands are not equal.
- $a \odot b = \overline{a \oplus b}$ (XNOR): outputs 1 iff its operands are equal.
- $\overline{a \cdot b}$ (NAND): outputs 1 iff at least one of its operands is 0.
- $\overline{a + b}$ (NOR): outputs 1 iff all of its operands are 0.
Boolean Expressions, & Truth Tables
Examples
$g = (\overline{ab} + c) \oplus (a\overline{c} + \overline{b})$
Intermediate columns: T1 = $ab$, T2 = $\overline{ab}$, T3 = $\overline{ab} + c$, T4 = $\overline{c}$, T5 = $a\overline{c}$, T6 = $\overline{b}$, T7 = $a\overline{c} + \overline{b}$.
    a  b  c | T1  T2  T3  T4  T5  T6  T7 | g
    0  0  0 | 0   1   1   1   0   1   1  | 0
    0  0  1 | 0   1   1   0   0   1   1  | 0
    0  1  0 | 0   1   1   1   0   0   0  | 1
    0  1  1 | 0   1   1   0   0   0   0  | 1
    1  0  0 | 0   1   1   1   1   1   1  | 0
    1  0  1 | 0   1   1   0   0   1   1  | 0
    1  1  0 | 1   0   0   1   1   0   1  | 1
    1  1  1 | 1   0   1   0   0   0   0  | 1
- Boolean operators are combined to form Boolean expressions.
- To build a truth table from a Boolean expression, form columns for intermediate subexpressions.
Boolean Expressions, & Truth Tables (cont.)
Examples (Converting from table to equation.)
    a  b  c | h
    0  0  0 | 0
    0  0  1 | 0
    0  1  0 | 1
    0  1  1 | 0
    1  0  0 | 0
    1  0  1 | 1
    1  1  0 | 1
    1  1  1 | 1
$h = \overline{a}b\overline{c} + a\overline{b}c + ab\overline{c} + abc$
Table to Equation
- h is 1 only if a is 0, b is 1, and c is 0, or a is 1, b is 0, and c is 1, or a is 1, b is 1, and c is 0, or a is 1, b is 1, and c is 1.
- These correspond to the rows in the truth table that have an output of 1.
- The multiplicative terms that contain all input variables are called minterms.
- Minterms correspond to rows in the truth table.
- They are often referred to by their number. Reading the input values of a row as a binary number yields the number. For example, for a = 0, b = 1, and c = 1, we get the minterm number 011, so $\overline{a}bc$ is Minterm 3.
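Reading minterms off a truth table can be sketched as below. The output column is that of the function h from the example; the function names are assumptions, and a trailing apostrophe stands in for the overbar (so `a'` means NOT a).

```python
def minterm_numbers(outputs):
    """Row numbers (inputs read as a binary number) whose output is 1."""
    return [row for row, out in enumerate(outputs) if out == 1]

def minterm_expression(number, names):
    """Spell out a minterm: a plain variable for a 1 bit, negated for a 0 bit."""
    bits = format(number, f"0{len(names)}b")
    return "".join(n if bit == "1" else n + "'" for n, bit in zip(names, bits))

h_outputs = [0, 0, 1, 0, 0, 1, 1, 1]     # rows 000 through 111
numbers = minterm_numbers(h_outputs)
print(numbers)
print(" + ".join(minterm_expression(n, "abc") for n in numbers))
```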
Don't Care Conditions
An analogous incomplete function.
$$B(n, k) = \begin{cases} B(n-1, k) + B(n-1, k-1), & 0 < k < n \\ 1, & n = k \\ 1, & k = 0 \end{cases}$$
- When k = 0, the value of n doesn't matter - we don't care what it is; the function always returns 1.
- In Boolean algebra, we indicate don't care conditions with the symbol "X".
Don't Care Conditions (cont.)
Examples
    a  b  c | f  g
    0  0  X | 0  1
    0  1  0 | 1  X
    0  1  1 | X  1
    1  0  0 | 0  1
    1  0  1 | 1  0
    1  1  X | 1  0
- When the don't care is on the output, we do not care what the output is, and the designer can choose what to output, to optimize a circuit.
- When the don't care is on the input side, the given output is for both a 0 and a 1 value of the input. (The last line of the table is for both Minterm 110 and Minterm 111.)
Boolean Simplification using Identities
Identities allow us to transform expressions into equivalent expressions.
Examples (Arithmetic expression transformation using the distributive law.)
$(2a + 6) \cdot (2a - 6)$
$= (2a + 6) \cdot 2a - (2a + 6) \cdot 6$
$= 2^2a^2 + 6 \cdot 2a - (6 \cdot 2a + 6^2)$
(Distributive Law: $a(b + c) = ab + ac$.)
There are other identities that allow further transformation.
Boolean Identities
Simplifying Boolean expressions allows us to build circuits that have fewer components, consume less power, are faster, and take less physical space.
Identities:
1. Double negation: $\overline{\overline{a}} = a$
2. Contradiction: $a \cdot \overline{a} = 0$
3. Tautology: $a + \overline{a} = 1$
4. Commutativity: $a + b = b + a$, $a \cdot b = b \cdot a$
5. Associativity: $a + (b + c) = (a + b) + c$, $a \cdot (b \cdot c) = (a \cdot b) \cdot c$
6. Identity elements: $a + 0 = a$, $a \cdot 1 = a$
7. Zero elements: $a + 1 = 1$, $a \cdot 0 = 0$
8. Idempotency: $a + a = a$, $a \cdot a = a$
Boolean Identities (cont.)
Identities:
9. Distributive: $a \cdot (b + c) = a \cdot b + a \cdot c$, $a + bc = (a + b) \cdot (a + c)$
10. DeMorgan's: $\overline{a + b} = \overline{a} \cdot \overline{b}$, $\overline{a \cdot b} = \overline{a} + \overline{b}$
11. Definition of XOR: $a \oplus b = \overline{a} \cdot b + a \cdot \overline{b}$
- DeMorgan's Law specifies how to bring a negation into a group. It also specifies two algebraic forms for the NAND and NOR operators.
- The XOR operator has an algebraic equivalent. So does the XNOR operator:
  $a \odot b = ab + \overline{a}\,\overline{b}$
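Example Simplification using Identities
Examples
$(a\overline{b} + c) \oplus \overline{bc}$
$= \overline{a\overline{b} + c} \cdot \overline{bc} + (a\overline{b} + c) \cdot \overline{\overline{bc}}$ (R11)
$= (\overline{a\overline{b}} \cdot \overline{c})(\overline{b} + \overline{c}) + (a\overline{b} + c)\overline{\overline{bc}}$ (R10)
$= (\overline{a\overline{b}} \cdot \overline{c})(\overline{b} + \overline{c}) + (a\overline{b} + c)bc$ (R1)
$= ((\overline{a} + \overline{\overline{b}}) \cdot \overline{c})(\overline{b} + \overline{c}) + (a\overline{b} + c)bc$ (R10)
$= (\overline{a} + b) \cdot \overline{c}(\overline{b} + \overline{c}) + (a\overline{b} + c)bc$ (R1)
$= \overline{c} \cdot (\overline{a} + b)(\overline{b} + \overline{c}) + bc \cdot (a\overline{b} + c)$ (R4)
$= (\overline{c} \cdot \overline{a} + \overline{c}b)\overline{b} + (\overline{c} \cdot \overline{a} + \overline{c}b)\overline{c} + ca\overline{b}b + bc$ (R8, R4, R10)
$= \overline{b}(\overline{c} \cdot \overline{a} + \overline{c}b) + \overline{c}(\overline{c} \cdot \overline{a} + \overline{c}b) + 0 + bc$ (R4, R2, R7)
$= \overline{b}\,\overline{c} \cdot \overline{a} + \overline{b}\,\overline{c}b + \overline{c} \cdot \overline{c} \cdot \overline{a} + \overline{c} \cdot \overline{c}b + bc$ (R6, R9)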
Example Simplification Using Identities (cont.)
Examples
$= \overline{b}\,\overline{c} \cdot \overline{a} + 0 + \overline{c} \cdot \overline{a} + \overline{c}b + bc$ (R8, R4, R2, R7)
$= \overline{c} \cdot \overline{a}(\overline{b} + 1) + b(\overline{c} + c)$ (R6, R4, R9)
$= \overline{c} \cdot \overline{a} \cdot 1 + b \cdot 1$ (R3, R7)
$= \overline{c} \cdot \overline{a} + b$ (R6)
- Algebraic simplification is difficult, requiring strategic planning.
- To allow automation of simplification, a more mechanical method is needed.
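One mechanical check that needs no planning at all is to compare two expressions on every input combination. The sketch below does this for the worked example, reading the starting expression as (a AND NOT b OR c) XOR NOT (b AND c) and the result as (NOT c AND NOT a) OR b. That placement of the negations is an assumption, reconstructed from the identities cited at each step; the function names are also assumptions.

```python
from itertools import product

def original(a, b, c):
    """(a AND NOT b OR c) XOR NOT (b AND c), with 1 - v standing for NOT v."""
    return ((a & (1 - b)) | c) ^ (1 - (b & c))

def simplified(a, b, c):
    """(NOT c AND NOT a) OR b."""
    return ((1 - c) & (1 - a)) | b

def equivalent(f, g, n):
    """True when f and g agree on all 2**n input combinations."""
    return all(f(*values) == g(*values) for values in product((0, 1), repeat=n))

print(equivalent(original, simplified, 3))
```

Exhaustive comparison only confirms a simplification; it does not find one, which is what the K-map method is for.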
Boolean Simplification using Karnaugh-Maps
- There are only four Boolean functions with fewer than two parameters.
  1. $f_0 = 0$
  2. $f_1 = 1$
  3. $f_{identity}(x) = x$
  4. $f_{inverse}(x) = \overline{x}$
- The smallest interesting functions have two independent variables.
- K-maps come in differing sizes, depending on the number of independent variables.
K-Maps of Two Variables
Examples
$g = \overline{a}\,\overline{b} + a\overline{b} + ab$
    a  b | g
    0  0 | 1
    0  1 | 0
    1  0 | 1
    1  1 | 1
K-map for g (rows a, columns b):
    a\b   0   1
    0     1   0
    1     1   1
$g = a + \overline{b}$
Combining Cells in the K-Map
The 2-variable K-map is a square with one variable on each axis.
Cell combination:
1. Adjacent cells that contain 1 can be combined.
2. Combined cells must form a rectangular group.
3. The size of a group must be a power of two.
4. The groups copied out must cover all cells that are 1. (Notice, however, that cells may be covered by several groups.)
5. The group names are ORed together to form a simplified equation.
6. Group names are the AND of all variables that do not change their value in the group.
7. The covering groups must be as large as possible.
K-Maps for Functions of Three Variables
Examples
    a  b  c | h
    0  0  0 | 1
    0  0  1 | 0
    0  1  0 | 1
    0  1  1 | 0
    1  0  0 | 1
    1  0  1 | 1
    1  1  0 | 1
    1  1  1 | 0
K-map for h (rows a, columns bc):
    a\bc   00  01  11  10
    0       1   0   0   1
    1       1   1   0   1
$h = \overline{c} + a\overline{b}$
K-Maps for Functions of Three Variables (cont.)
- The 3-variable K-map is two 2-variable K-maps stuck together.
- The vertical axis is one of the variables, and the horizontal axis is both of the other two variables.
- The horizontal coordinates are listed in Gray code sequence.
- Between elements of the Gray code sequence, only one bit changes.
- K-maps wrap around, both vertically and horizontally. This means that the cells on the left are next to the cells on the right of the Karnaugh-map.
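The Gray code ordering of the K-map coordinates can be generated with the standard binary-reflected formula, i XOR (i shifted right by one). This is a minimal sketch; the function name is an assumption.

```python
def gray_code(n_bits):
    """Binary-reflected Gray code sequence as bit strings."""
    return [format(i ^ (i >> 1), f"0{n_bits}b") for i in range(2 ** n_bits)]

print(gray_code(2))
```

The last and first elements also differ in a single bit, which is why the map can wrap around.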
K-Maps for Functions of Four Variables
Examples (Four variable)
    a  b  c  d | z
    0  0  0  0 | 1
    0  0  0  1 | 1
    0  0  1  0 | 1
    0  0  1  1 | 1
    0  1  0  0 | 1
    0  1  0  1 | 0
    0  1  1  0 | 1
    0  1  1  1 | 0
    1  0  0  0 | 1
    1  0  0  1 | 1
    1  0  1  0 | 1
    1  0  1  1 | 1
    1  1  0  0 | 0
    1  1  0  1 | 0
    1  1  1  0 | 1
    1  1  1  1 | 0
K-map for z (rows ab, columns cd):
    ab\cd   00  01  11  10
    00       1   1   1   1
    01       1   0   0   1
    11       0   0   0   1
    10       1   1   1   1
K-Maps for Functions of Four Variables (cont.)
Examples (Four variable (cont.))
$z = \overline{b} + c\overline{d} + \overline{a} \cdot \overline{c}\,\overline{d}$
K-maps of more than 4 variables become large, and it is best to use a software authoring tool to do simplification, rather than draw a map by hand.
Don't Care Conditions in Karnaugh-Maps
Examples
    a  b  c | m
    0  0  0 | 1
    0  0  1 | X
    0  1  0 | 0
    0  1  1 | 1
    1  0  0 | X
    1  0  1 | 1
    1  1  0 | X
    1  1  1 | 0
K-map for m (rows a, columns bc):
    a\bc   00  01  11  10
    0       1   X   1   0
    1       X   1   0   X
Don't Care Conditions in Karnaugh-Maps (cont.)
- Don't cares in the output can be assigned either a value of 0 or 1, to yield the best simplification, allowing larger groups to be pulled out of the K-map.
- Without using the don't cares: $m = \overline{a}\,\overline{b}\,\overline{c} + \overline{a}bc + a\overline{b}c$
- Using the don't cares: $m = \overline{b} + \overline{a}c$
Chapter 3: Digital Circuitry
- Processors are digital circuits.
- Digital circuits have wires that carry one of two possible signals.
  - low: a low voltage, like 0V.
  - high: a high voltage, like 5V.
- We are not concerned with the actual voltage, and so we call these signals 0 and 1.
- How 0 and 1 are assigned to voltages is irrelevant to us.
- Types of digital circuits:
  - Combinational circuits: they have no memory. The outputs can change immediately when the inputs are changed.
  - Sequential circuits: they have memory. The outputs may not change when the circuit is "remembering" a previous value.
Combinational Circuits
Logical gates:
[Figure: gate symbols with inputs a and b (a only for NOT) for AND ($ab$), OR ($a + b$), NOT or inverter ($\overline{a}$), XOR ($a \oplus b$), NAND ($\overline{ab}$), NOR ($\overline{a + b}$), and XNOR ($a \odot b$).]
Using Gates
Examples
Boolean function: f = (a ⊕ b)(b + c)
Schematic:
[Figure: gate schematic with inputs a, b, and c, and output f.]
Using Gates (cont.)
Examples (cont.)
Alternate drawing:
[Figure: alternate drawing of the schematic, with inputs a, b, and c, and output f.]
Buffers
The triangle on the inverter is a buffer, and the open circle is the inversion element.
Inverter types:
[Figure: symbols for the inverter (input a, output $\overline{a}$), the simple buffer (input a, output a), and the tri-state switch (input a, control c, output m).]
Simple Buffer
It boosts power. It is used in fanout situations, where splitting a signal weakens it.
[Figure: a signal x fanned out to several lines, each still carrying x after passing through a buffer.]
Tri-State Switch
The control line, c, when cleared, turns the flow off (sets the output to a state of high impedance, Z). The output has three states: Z, 0, and 1.
    a  c | m
    0  0 | Z
    0  1 | 0
    1  0 | Z
    1  1 | 1
Common Combinational Circuits
- The decoder - transforms a numeric code into trigger signals.
- The encoder - translates trigger signals into a code.
- The multiplexer - routes multiple inputs into a single output line.
- The adder - adds binary signals that represent numbers.
The Decoder
[Figure: block Dec2-4 with inputs x1 and x0, and outputs p0, p1, p2, and p3.]
- A decoder is a switch. It turns on (sets) one of several output lines, and turns off (clears) the rest.
- The code x gives the index of the line to turn on.
- As an example, if x = 01, p1 would be 1, and all other outputs would be 0.
- Decoder sizes: $k$-$2^k$. $k$ is the number of inputs, $2^k$ is the number of outputs.
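The Decoder (cont.)
Examples (2-4 decoder)
    x1  x0 | p0  p1  p2  p3
    0   0  | 1   0   0   0
    0   1  | 0   1   0   0
    1   0  | 0   0   1   0
    1   1  | 0   0   0   1
$p_0 = \overline{x_1} \cdot \overline{x_0}$
$p_1 = \overline{x_1}x_0$
$p_2 = x_1\overline{x_0}$
$p_3 = x_1x_0$
[Figure: gate schematic of the decoder, with inputs x1 and x0, and outputs p0, p1, p2, and p3.]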
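The decoder can be sketched at two levels: gate by gate, one AND per output with the inputs inverted where the truth table has a 0, and behaviorally, for any k. The function names are assumptions, and `1 - v` stands for NOT v.

```python
def decoder_2_to_4(x1, x0):
    """Gate-level 2-4 decoder: each output is the AND of the matching input pattern."""
    n1, n0 = 1 - x1, 1 - x0              # inverted inputs
    return [n1 & n0, n1 & x0, x1 & n0, x1 & x0]   # p0, p1, p2, p3

def decoder(x, k):
    """Behavioral k-to-2**k decoder: set the line whose index is x, clear the rest."""
    return [1 if index == x else 0 for index in range(2 ** k)]

print(decoder_2_to_4(0, 1))
print(decoder(0b01, 2))
```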
The Encoder
[Figure: block Enc4-2 with inputs p0, p1, p2, and p3, and outputs x1 and x0.]
- An encoder checks several circuits, with only one circuit on (set), and reports a code indicating which circuit it is.
- The code, x, gives the index of the line that is on.
- As an example, if p0 = 0, p1 = 0, p2 = 1, and p3 = 0, then the output x would be 10.
- Encoder sizes: $2^j$-$j$.
The Encoder (cont.)
Examples
    p0  p1  p2  p3 | x1  x0
    1   0   0   0  | 0   0
    0   1   0   0  | 0   1
    0   0   1   0  | 1   0
    0   0   0   1  | 1   1
(Rows that are not shown are don't cares.)
$x_1 = p_2 + p_3$
$x_0 = p_1 + p_3$
[Figure: K-maps for x1 and x0, with rows p0p1 and columns p2p3; only the four cells with a single input set hold a 0 or 1, and all other cells are X.]
Encoder Schematic
Examples (cont.)
[Figure: gate schematic of the encoder, with inputs p0, p1, p2, and p3, and outputs x1 and x0.]
The Multiplexer (MUX)
[Figure: block MUX4-1 with data inputs i0, i1, i2, and i3, selector inputs s1 and s0, and output p.]
- A MUX routes one of several inputs to a single output.
- Only one input is allowed to pass through. The other inputs are stopped.
- The input allowed through is specified by the code s.
- As an example, if s = 11, the output p would be whatever is on the line i3.
- MUX sizes: $2^k$-1. Width of the selector line s: $k$ bits.
The Multiplexer (cont.)
Examples
    i0  i1  i2  i3  s1  s0 | p
    0   X   X   X   0   0  | 0
    1   X   X   X   0   0  | 1
    X   0   X   X   0   1  | 0
    X   1   X   X   0   1  | 1
    X   X   0   X   1   0  | 0
    X   X   1   X   1   0  | 1
    X   X   X   0   1   1  | 0
    X   X   X   1   1   1  | 1
$p = i_0\overline{s_1} \cdot \overline{s_0} + i_1\overline{s_1}s_0 + i_2s_1\overline{s_0} + i_3s_1s_0$
(Simplification is either by K-map, or by copying out each minterm, ignoring the don't care conditions.)
MUX Schematic
Examples (cont.)
[Figure: gate schematic of the 4-1 MUX, with selector inputs s1 and s0, data inputs i0, i1, i2, and i3, and output p.]
MUX Composition
Examples (4-1 MUX from 2-1 MUX's)
[Figure: three MUX2-1 blocks; two first-round blocks take i0, i1 and i2, i3, both selected by s0, and feed a final block selected by s1, whose output is p.]
- The MUX's are structured into a tournament, in the process called interleaving.
- The low-order bit, s0, is used to choose between odd and even indexes, in the first round.
- The high-order bit, s1, chooses between the two first-round heats, in the final round.
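The tournament can be sketched next to the sum-of-products form of the 4-1 MUX, and the two can be compared on every input. The function names are assumptions, and `1 - v` stands for NOT v.

```python
from itertools import product

def mux4_equation(i0, i1, i2, i3, s1, s0):
    """4-1 MUX as a sum of four products, one per selector code."""
    n1, n0 = 1 - s1, 1 - s0
    return (i0 & n1 & n0) | (i1 & n1 & s0) | (i2 & s1 & n0) | (i3 & s1 & s0)

def mux2(i0, i1, s):
    """2-1 MUX: pass i1 when s is 1, otherwise i0."""
    return i1 if s else i0

def mux4_tournament(i0, i1, i2, i3, s1, s0):
    """4-1 MUX built from three 2-1 MUX's."""
    low = mux2(i0, i1, s0)       # first round: even versus odd index
    high = mux2(i2, i3, s0)
    return mux2(low, high, s1)   # final round: pick the winning heat

print(all(mux4_equation(*v) == mux4_tournament(*v) for v in product((0, 1), repeat=6)))
```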
The Adder
Examples
    cin          0      1
    a            0      1
    b          + 1    + 1
    cout s      01     11
[Figure: adder block (+) with inputs a, b, and cin, and outputs s and cout.]
An adder adds three 1-bit numbers, a, b, and cin, to form a sum bit, s, and a carry bit, cout.
The Adder (cont.)
    cin  a  b | cout  s
    0    0  0 | 0     0
    0    0  1 | 0     1
    0    1  0 | 0     1
    0    1  1 | 1     0
    1    0  0 | 0     1
    1    0  1 | 1     0
    1    1  0 | 1     0
    1    1  1 | 1     1
$s = c_{in} \oplus a \oplus b$
$c_{out} = c_{in}a + c_{in}b + ab$
Checkerboard pattern for s:
- XOR - odd parity cell coordinates (the odd function).
- XNOR - even parity cell coordinates (the even function).
K-map for cout (rows cin, columns ab):
    cin\ab   00  01  11  10
    0         0   0   1   0
    1         0   1   1   1
K-map for s (rows cin, columns ab):
    cin\ab   00  01  11  10
    0         0   1   0   1
    1         1   0   1   0
Adder Schematic
[Figure: gate schematic of the adder, with inputs a, b, and cin, and outputs s and cout.]
The Ripple-Carry Adder
To add multi-bit numbers we use several adders, one per column of the long addition problem, to add the a operand, b operand, and the carry-in. The carry-out becomes the carry-in of the next column.
    carry    1 1 1 1 0
               1 0 1 1
             + 1 1 0 1
    sum        1 0 0 0
Notice that the carry "ripples" up from the bottom column to the top. (The calculation of one column has to wait until the calculation of the previous column is complete.)
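A ripple-carry adder can be sketched by chaining 1-bit adders, each one using the equations s = cin XOR a XOR b and cout = cin·a + cin·b + a·b. The function names and the low-order-first bit lists are assumptions for this sketch.

```python
def full_adder(a, b, cin):
    """1-bit adder: returns (sum bit, carry-out)."""
    s = cin ^ a ^ b
    cout = (cin & a) | (cin & b) | (a & b)
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two bit lists given low-order bit first; return (sum bits, carry-out)."""
    carry, sum_bits = cin, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry-out feeds the next column
        sum_bits.append(s)
    return sum_bits, carry

# 1011 + 1101, written low-order bit first.
print(ripple_carry_add([1, 1, 0, 1], [1, 0, 1, 1]))
```

The loop makes the dependency visible: column i cannot be computed until the carry from column i - 1 is known.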
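The Ripple-Carry Adder (cont.)
[Figure: four 1-bit adders, +0 through +3, chained through their carry lines; adder +i takes a_i and b_i and produces s_i, cin enters adder +0, and cout leaves adder +3. Beside it, the interface diagram of the 4-bit adder, with 4-bit inputs a and b, input cin, 4-bit output s, and output cout.]
(A 4-bit bus line in the interface diagram indicates an input of four lines.)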
Sequential Circuits
- Sequential circuits are called sequential because they flow through a sequence of states.
- Code example:
    sum = 0
    for i = 1 to n do
        sum = sum + i
- The state is the variables and their values.
The Clock
The clock is a device that produces a regular "beat" type signal.
[Figure: clock signal alternating between 0 and 1 over time t, with one period marked.]
- The signal has a rising edge and a falling edge.
- The time for one cycle is called the period.
- The frequency is the number of cycles per second. $F = \frac{1}{P}$, where F is the frequency, and P is the period.
- The unit of measurement for frequency is the hertz (Hz). 1 Hz = (1 cycle) / (1 second). (50 MHz = 50,000,000 cycles per second.)
The Clock (cont.)
- The clock is used to synchronize the state changes of sequential circuits.
- One of the two signal edges is designated as the trigger edge.
- All state changes occur on the trigger edge. This simplifies the interaction between circuits.
- In our discussion we assume that the trigger edge is the rising edge.
- Although most circuitry in the processor is synchronized, there are a small number of asynchronous circuits. Not having to wait for the trigger edge for a state change helps speed up asynchronous circuitry.
Storage Devices
Types:
- The latch - an unclocked device that stores one bit.
- The flip-flop - a 1-bit clocked storage device.
- Device subtypes:
  - D-type
  - J-K-type
The D-latch
[Figure: D-latch block with inputs D and C, and outputs Q and $\overline{Q}$.]
The D-latch is controlled by the input C. When C = 1, the latch is loaded with the value D. When C = 0, the latch locks its current value, ignoring D. The output Q is the value stored in the latch.
D-latch excitation table:
    D  C | Q(1)
    X  0 | Q(0)
    0  1 | 0
    1  1 | 1
(Q(0) - old latch value, Q(1) - new latch value)
The D-latch (cont.)
Examples
Example timing diagram for the D-latch.
[Figure: timing diagram showing the signals D, C, and Q.]
The D-flip-flop
[Figure: D flip-flop block with inputs D and Clk, and outputs Q and $\overline{Q}$.]
On the D-flip-flop, the control signal is the clock. The flip-flop only loads exactly at the trigger edge.
D-flip-flop excitation table:
    D  Clk    | Q(1)
    X  not ↑  | Q(0)
    0  ↑      | 0
    1  ↑      | 1
(Arrows indicate a passing trigger edge.)
The D-flip-flop (cont.)
Examples
Example timing diagram for the D-flip-flop.
[Figure: timing diagram showing the signals D, Clk, and Q.]
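The difference between the two D-type devices shows up when both are driven by the same sampled signals. The sketch below is a discrete-time approximation, not a timing diagram from the slides; the function names, the sample lists, and the initial stored value of 0 are assumptions.

```python
def d_latch(d_samples, c_samples, q=0):
    """Latch: Q follows D for as long as C is 1, and holds when C is 0."""
    outputs = []
    for d, c in zip(d_samples, c_samples):
        if c == 1:
            q = d
        outputs.append(q)
    return outputs

def d_flip_flop(d_samples, clk_samples, q=0):
    """Flip-flop: Q loads D only on a rising edge of Clk (0 followed by 1)."""
    outputs, previous = [], 0
    for d, clk in zip(d_samples, clk_samples):
        if previous == 0 and clk == 1:   # the trigger edge
            q = d
        previous = clk
        outputs.append(q)
    return outputs

d = [1, 1, 0, 0, 0, 0, 0, 1]
control = [0, 1, 1, 0, 0, 1, 1, 0]
print(d_latch(d, control))
print(d_flip_flop(d, control))
```

With these samples the latch drops back to 0 while the control is still high, whereas the flip-flop keeps the value it captured at the edge.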
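The J-K Storage Devices
[Figure: J-K latch block with inputs J and K, and J-K flip-flop block with inputs J, K, and Clk; each has outputs Q and $\overline{Q}$.]
J-K excitation tables.
J-K latch:
    J  K | Q(1)
    0  0 | Q(0)
    0  1 | 0
    1  0 | 1
    1  1 | $\overline{Q(0)}$
J-K flip-flop:
    J  K  Clk    | Q(1)
    X  X  not ↑  | Q(0)
    0  0  ↑      | Q(0)
    0  1  ↑      | 0
    1  0  ↑      | 1
    1  1  ↑      | $\overline{Q(0)}$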
Computer Organization: Basic Processor Structure
The J-K Storage Device (cont.)
Operations of the J-K Device:
I Lock: The device keeps its current value. This operation is specified with J = K = 0.
I Set: The value of the device changes to 1. This operation is specified with J = 1, K = 0.
I Reset: The value of the device changes to 0. This operation is specified with J = 0, K = 1.
I Complement: The value of the device is toggled from 0 to 1, or from 1 to 0. This operation is specified with J = K = 1.
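The four operations can be summarized in a few lines of Python. This is only a sketch of the next-value rule; the function name `jk_next` is illustrative.

```python
def jk_next(j, k, q):
    """Return the next value of a J-K device, given J, K, and the current value."""
    if (j, k) == (0, 0):
        return q      # lock
    if (j, k) == (1, 0):
        return 1      # set
    if (j, k) == (0, 1):
        return 0      # reset
    return 1 - q      # complement (J = K = 1)


for j, k in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(j, k, [jk_next(j, k, q) for q in (0, 1)])
```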
Flip-Flops with Extra Pins
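(Figure: D flip-flop block symbol with inputs D and Clk, the extra pins ST, CL, and LD, and outputs Q and its complement.)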
I The set pin (ST) is asynchronous (changes do not wait for the clock pulse, but occur instantly). It initializes the flip-flop to 1.
I The clear pin (CL) is also asynchronous, and initializes the flip-flop value to 0.
I The load pin (LD) disables the clock signal, locking the value of the flip-flop.
Flip-Flops with Extra Pins (cont.)
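It is possible to implement the LD line on flip-flops that do not have a load input, using a feedback loop.
(Figure: a D flip-flop whose D input is driven by a 2-to-1 MUX. The MUX is selected by LD, and chooses between the fed-back output Q and the external input D.)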
Sequential Design using the FSM
The tool for sequential circuit design is the finite state machine (FSM). The state diagram is a graphical representation of an FSM.
Examples (FSA0)
(Figure: state diagram of FSA0, with the states 00/1, 01/1, 10/1, and 11/0. Each state has one outgoing transition labeled 0 and one labeled 1.)
I States are the circles. Their labels are S/P, where S is the state number, and P is the output.
I Transitions are the arrows. They are labeled with I, the input. Transitions from a given state must have mutually exclusive labels.
The FSM and State Diagrams
Examples (FSA0 (cont.))
The interface for FSA0.
(Figure: block symbol of FSA0 with inputs i and Clk, and output p.)
The FSM shows the output at each state, and the transition from one state to the next, on the clock pulse, and based on the input.
Examples (FSA1)
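(Figure: state diagram of FSA1, with the states 0/10 and 1/01. Transitions are labeled with the input pair ab. State 0 stays in state 0 on 00 and 10, and moves to state 1 on 01 and 11. State 1 stays in state 1 on 00 and 11, and moves to state 0 on 01 and 10. The block symbol of FSA1 has inputs a, b, and Clk, and outputs c1 and c0.)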
The FSM and the State Transition Table
The state transition table is a tabular representation of the state diagram.
Examples (FSA0)
i   Q(0)1   Q(0)0   Q(1)1   Q(1)0   p
0   0       0       0       1       1
0   0       1       0       1       1
0   1       0       0       1       1
0   1       1       0       0       0
1   0       0       0       0       1
1   0       1       1       0       1
1   1       0       1       1       1
1   1       1       1       1       0
The FSM and the State Transition Table (cont.)
I The table has an input half, and an output half.
I In the input half you list the circuit inputs, and the bits of the current state number, Q(0).
I In the output half you list the next state, Q(1), and the circuit outputs.
I Each row represents a transition.
I Circuit output is based on the current state.
The FSM and the State Transition Table (cont.)
Examples (FSA1)
The transition table for FSA1.
a   b   Q(0)   Q(1)   c1   c0
0   0   0      0      1    0
0   0   1      1      0    1
0   1   0      1      1    0
0   1   1      0      0    1
1   0   0      0      1    0
1   0   1      0      0    1
1   1   0      1      1    0
1   1   1      1      0    1
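A transition table can be executed directly. The sketch below stores the FSA1 table as Python dictionaries and steps the machine once per clock pulse; the names `NEXT_STATE`, `OUTPUT`, and `run_fsa1` are illustrative.

```python
# (a, b, Q(0)) -> Q(1), copied row by row from the FSA1 transition table.
NEXT_STATE = {
    (0, 0, 0): 0, (0, 0, 1): 1,
    (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 0, (1, 0, 1): 0,
    (1, 1, 0): 1, (1, 1, 1): 1,
}

# Current state -> (c1, c0). Output depends on the current state only.
OUTPUT = {0: (1, 0), 1: (0, 1)}


def run_fsa1(inputs, state=0):
    """Feed a list of (a, b) pairs, one per clock pulse.

    Returns the (state, output) pairs seen before each pulse, and the final state.
    """
    trace = []
    for a, b in inputs:
        trace.append((state, OUTPUT[state]))
        state = NEXT_STATE[(a, b, state)]
    return trace, state


trace, final = run_fsa1([(0, 1), (0, 0), (1, 0)])
print(trace, final)
```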
State Diagrams, and Transition Tables; Building One Representation from the Other
From table to diagram.
I Lay down states using numbers from the current state column.
I Fill in outputs from the output columns.
I Draw arrows, one per row in the state table, from the current state to the next state.
I Fill in the input labels on the diagram, from the input columns in the table.
State Diagrams, and Transition Tables; Building One Representation from the Other (cont.)
From diagram to table.
I Create the state table heading, listing out the input variables, the bits of the current state number, the bits of the next state number, and the output variables.
I Fill in all possible bit configurations on the input half of the table.
I On each row, fill in the output for the current state.
I On each row, fill in the next state, using the arrow in the state diagram corresponding to the row in the transition table.
Bits in the state number: for m states, you will have $\lceil \log_2 m \rceil$ bits.
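The bit count can be computed without floating-point logarithms. A small Python sketch follows; the function name `state_bits` is illustrative.

```python
def state_bits(m):
    """Number of bits needed to number m states: the ceiling of log2(m)."""
    if m < 1:
        raise ValueError("a machine needs at least one state")
    # (m - 1).bit_length() equals ceil(log2(m)) for every m >= 1.
    return (m - 1).bit_length()


for m in (2, 4, 5, 9):
    print(m, state_bits(m))
```

For example, the four states of FSA0 need 2 bits, and the two states of FSA1 need 1 bit.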
Moore versus Mealy Machines
A Moore machine associates output with the current state only. A Mealy machine associates output with the current state, and the input. The result, in the Mealy diagram, is that the output label is on the transition, and not the state.
Examples (Mealy machine for FSA1)
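(Figure: Mealy state diagram for FSA1, with the states 0 and 1. Transitions are labeled input/output: 00/10 and 10/10; 01/01 and 11/01; 00/01 and 11/01; 01/10 and 10/10.)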
Implementing a Sequential Design
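The structure of a sequential circuit.
(Figure: the control circuit takes the input and the current state, Q(0), and produces the output and the next state, Q(1). The next state is fed into the clocked register, whose output is the current state.)
I The register is a collection of flip-flops that store the current state number.
I The control circuit is a combinational circuit that calculates the output, and the next state.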
Implementing a Sequential Design (cont.)
Examples (FSA0)
Equations for next state, and output are derived using K-maps, in the usual way.
$Q(1)_1 = i\,Q(0)_0 + i\,Q(0)_1 = i\,(Q(0)_0 + Q(0)_1)$
$Q(1)_0 = \overline{i} \cdot \overline{Q(0)_1} + i\,Q(0)_1 + Q(0)_1\,\overline{Q(0)_0}$
$p = \overline{Q(0)_1} + \overline{Q(0)_0}$
I Use one flip-flop to store each bit of the current state number.
I The input of the flip-flop is the next state, and the output of the flip-flop is the current state.
Implementing a Sequential Design (cont.)
Examples (FSA0 (cont.))
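Schematic of FSA0.
(Figure: input i, output p, and two D flip-flops, D1 and D0. The flip-flop inputs are Q(1)1 and Q(1)0, and the flip-flop outputs are Q(0)1 and Q(0)0.)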
Implementing a Sequential Design (cont.)
Examples (FSA1)
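Equations.
$Q(1) = ab + \overline{a}\,Q(0)\,\overline{b} + \overline{a}\,\overline{Q(0)}\,b = ab + \overline{a}\,(Q(0) \oplus b)$
$c_1 = \overline{Q(0)}$
$c_0 = Q(0)$
Schematic.
(Figure: inputs a and b, one D flip-flop, and outputs c0 and c1.)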
Sequential Circuit Analysis
Going from schematic to FSM. Reverse the procedure used in design.
Examples
Schematic:
(Figure: a circuit with input z, output p, and two D flip-flops, D1 and D0.)
Sequential Circuit Analysis (cont.)
Examples (cont.)
Equations (by following connections in the schematic):
$p = Q(0)_0 \odot Q(0)_1$ (XNOR)
$Q(1)_0 = \overline{z}\,Q(0)_0$
$Q(1)_1 = \overline{Q(0)_0} + z\,Q(0)_1$
Table:
z   Q(0)1   Q(0)0   Q(1)1   Q(1)0   p
0   0       0       1       0       1
0   0       1       0       1       0
0   1       0       1       0       0
0   1       1       0       1       1
1   0       0       1       0       1
1   0       1       0       0       0
1   1       0       1       0       0
1   1       1       1       0       1
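The step from equations to table can be mechanized. The Python sketch below evaluates the three equations for every input row. The complemented terms (written as `1 - x`) and the XNOR for p are read off the table, and are an assumption wherever the slide's overbars are not visible; the function names are illustrative.

```python
from itertools import product


def next_state_and_output(z, q1, q0):
    """Evaluate the analysis equations for one row of the table."""
    p = 1 - (q0 ^ q1)            # p = Q(0)0 XNOR Q(0)1
    n0 = (1 - z) & q0            # Q(1)0 = (not z) and Q(0)0
    n1 = (1 - q0) | (z & q1)     # Q(1)1 = (not Q(0)0) or (z and Q(0)1)
    return n1, n0, p


def build_table():
    """Rows are (z, Q(0)1, Q(0)0, Q(1)1, Q(1)0, p), in counting order."""
    return [(z, q1, q0) + next_state_and_output(z, q1, q0)
            for z, q1, q0 in product((0, 1), repeat=3)]


for row in build_table():
    print(*row)
```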
Sequential Circuit Analysis (cont.)
Examples (cont.)
State Diagram (copy out rows as transitions):
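(Figure: state diagram with the states 00/1, 01/0, 10/0, and 11/1, and transitions labeled with the input z: 0, 1, or 0,1 where both inputs lead to the same next state.)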
Common Sequential Circuits
I Registers are used to store multi-bit binary numbers.
I They use one flip-flop to store each of the bits.
I Bit numbering: x = 1100 = x3x2x1x0
I Register types.
  I Parallel-load register.
  I Shift register.
  I Counter.
The Parallel-Load Register
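It is a multi-bit flip-flop. The LD input causes the MUXes to feed the value back, for a lock operation, or feed in a new value, for a load operation.
(Figure: four D flip-flops, D3 to D0, with outputs Q3 to Q0. Each flip-flop is fed by a 2-to-1 MUX, selected by LD, that chooses between the fed-back output bit and the input bit, d3 to d0.)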
The Shift Register
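The input SH controls the operation: SH = 0, to lock the register, and SH = 1, to perform a shift.
Input MUXes implement the operations with feedback loops, or the output of the adjacent bit.
(Figure: shift-left and shift-right. In each direction a bit enters the register from cin at one end, and the bit leaving the other end goes to cout.)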
The Shift Register (cont.)
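(Figure: the 4-bit shift-left register. Block symbol: inputs SH, cin, and Clk, and outputs Q and cout. Implementation: four D flip-flops, D3 to D0, each fed by a 2-to-1 MUX, selected by SH, that chooses between the fed-back bit and the adjacent bit. cin enters at one end, and cout leaves at the other.)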
The Counter
I An input IN chooses an operation: IN = 0, the register is locked, and IN = 1, the register increments.
I The increment takes the register through the sequence 0000, 0001, 0010, ..., 1111, 0000, ..., one value per clock cycle.
I It uses an adder to increment.
I The input MUX now chooses between a feedback, or the adder.
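The counter's behavior over one clock cycle can be sketched in Python as follows. The sketch assumes cout is the carry out of the adder; the function name `counter_step` and the parameter `inc` (standing for the IN line) are illustrative.

```python
def counter_step(q, inc, width=4):
    """Return (next Q, cout) for one clock cycle of a counter of the given width."""
    if inc == 0:
        return q, 0                    # lock: the MUX feeds Q back
    total = q + 1                      # the adder computes Q + 1
    return total % (1 << width), total >> width


q = 13
for _ in range(4):
    q, cout = counter_step(q, inc=1)
    print(format(q, "04b"), cout)
```

The value wraps from 1111 back to 0000, and that is the only step on which cout is 1.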
The Counter (cont.)
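(Figure: the 4-bit counter. Block symbol: inputs IN and Clk, and outputs Q and cout. Implementation: four D flip-flops, D3 to D0, each fed by a 2-to-1 MUX, selected by IN, that chooses between the fed-back bit and the output of an adder that adds 1 to Q. The adder also produces cout.)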
The Standard Register
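(Figure: the 4-bit standard register. Block symbol: inputs d, LD, IN, CL, and Clk, and outputs Q and cout. Implementation: an encoder turns LD, IN, and CL into a 2-bit select code for a 4-input MUX. The MUX feeds the D input of the register with one of four 4-bit values: the fed-back Q, the input d, the output of an adder that adds 0001 to Q, or the constant 0000.)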
The Standard Register (cont.)
I We combine an increment, a load, and a clear operation to form a register that we use regularly.
I All 4-bit inputs are shown in abbreviated notation, using a bus.
I A MUX chooses between one of four computation units that calculate one of the operations.
I An encoder turns the three trigger lines into a code that can be used to operate the MUX.
Chapter 4: Devices and the Bus
I Devices that interact with the processor are mostly external to the processor, but on the motherboard.
I Device types (collectively known as external devices):
  I Memory devices.
  I Peripheral devices.
I Connection:
  I Direct connection - the processor can be connected to each device using dedicated connections.
  I Bus connection - the processor is connected via a single shared line to all devices.
I Comparison:
  I Wiring complexity - Bus connection produces simpler wiring.
  I Concurrent communication - Direct connection allows several devices to communicate with the processor simultaneously.
Devices and the Bus (cont.)
(Figure: the CPU, a memory unit, and two I/O devices, all connected to a single bus.)
Memory
I Stores multi-bit values.
I Each storage device is called a word.
I The memory unit has a size: l × w, where l is the length of the unit (number of words), and w is the width of the unit (number of bits per word).
I Words are given addresses (numbers) to identify them.
(Figure: an 8×4 memory, with its eight words at addresses 0 to 7.)
Memory (cont.)
Memory operations:
I Read: produce the contents of a particular memory location.
I Write: store a given value in a particular memory location.
Memory types:
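I Read Only Memory (ROM). (Allows a read operation only.)
I Random Access Memory (RAM). (Does both read and write operations.)
(Figure: block symbols for an 8×4 RAM and an 8×4 ROM. The RAM has a 3-bit address input A, a 4-bit data input Din, the control inputs W and E, and a 4-bit data output Dout. The ROM has a 3-bit address input A, the control input E, and a 4-bit data output Dout.)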
Memory Types
I ROMs are used, for example, to provide manufacturer information to an OS (like the BIOS).
I RAMs are the standard working memory in a computer.
I Ports
  I A - the address of the word.
  I Din - the input data for a write operation.
  I W and E - control the operation on a RAM unit. Assert W for a write operation, and assert E for a read operation.
  I Dout - the output data for a read operation.
Memory Types (cont.)
Performing a read operation.
1. Assert the desired address on the A port.
2. Strobe the E line, and allow time for the data to present itself on the Dout port.
Performing a write operation.
1. Set up the inputs.
  1.1 Assert the desired address on the A port.
  1.2 Assert the desired data on the Din port.
2. Perform the operation by strobing (setting and then resetting) the W line.
Memory Composition
The size of a memory unit is $2^k \times m$, where $2^k$ is its length, and m is its width. The unit would have a k-bit address port, to represent addresses between 0 and $2^k - 1$.
Examples
An 8×4 memory has eight 4-bit words. An address is 3 bits ($8 = 2^3$), to specify addresses between 0 and 7 (000 - 111).
Composition types:
I Horizontal - Creating a wider memory unit out of thinner units.
I Vertical - Creating a longer memory unit out of shorter units.
Horizontal Composition
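Examples (Building an 8×4 RAM from two 8×2 RAMs.)
(Figure: the two 8×2 RAMs share the 3-bit address A and the control lines W and E. One unit stores bits 3-2 of each word, with ports Din,3-2 and Dout,3-2. The other stores bits 1-0, with ports Din,1-0 and Dout,1-0. Together they form the 4-bit Din and Dout of the 8×4 RAM.)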
Vertical Composition
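Examples (Building an 8×4 ROM from four 2×4 ROMs.)
(Figure: the eight words of the 8×4 ROM, at addresses 0 to 7, are divided among four 2×4 ROMs. The 3-bit address, A2 A1 A0, is split between a 2-to-4 decoder, which enables one of the four units through its E input, and the address inputs of the units. The 4-bit Dout ports of the units share one 4-bit output.)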
Vertical Composition (cont.)
I The ROM is split into four sections. Each section is covered by a small ROM unit.
I We number the small units, 0 - 3, for our example. The 3-bit address is split into a unit number, and an internal address. (The field sizes depend on the composition being performed. Here, with four units of two words each, the unit number is 2 bits, and the internal address is 1 bit.)
I The unit number is used to enable the correct ROM, and the internal address is fed into the ROM as its address signal.
I This, where the unit number is the high-order part of the address, is called high-order interleaving.
I When the unit number is the low-order part of the address, that is called low-order interleaving.
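The two ways of splitting an address can be sketched in Python with shifts and masks. The function names `split_high_order` and `split_low_order` are illustrative; the field sizes in the example are those of the 8×4 ROM built from four 2×4 ROMs.

```python
def split_high_order(addr, unit_bits, internal_bits):
    """High-order interleaving: the unit number is the high-order field."""
    unit = (addr >> internal_bits) & ((1 << unit_bits) - 1)
    internal = addr & ((1 << internal_bits) - 1)
    return unit, internal


def split_low_order(addr, unit_bits, internal_bits):
    """Low-order interleaving: the unit number is the low-order field."""
    unit = addr & ((1 << unit_bits) - 1)
    internal = (addr >> unit_bits) & ((1 << internal_bits) - 1)
    return unit, internal


# Four units of two words each: a 2-bit unit number and a 1-bit internal address.
for addr in range(8):
    print(format(addr, "03b"),
          split_high_order(addr, 2, 1),
          split_low_order(addr, 2, 1))
```

With high-order interleaving, consecutive addresses fall in the same unit; with low-order interleaving, consecutive addresses fall in different units.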
Internal Memory Structure
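(Figure: internal structure of a 4×2 RAM. A 2-to-4 decoder on the address A selects one of four 2-bit registers, Reg0 to Reg3. Din feeds every register. Gates combine each decoder line with W to drive the LD input of the selected register, and with E to let the output of the selected register onto Dout.)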
Internal Memory Structure (cont.)
I Shown is a 4×2 RAM. Each word is stored in a register.
I An address decoder turns an address into trigger lines.
I AND gates check for the selected row, and the correct operation.
I The input, Din, presents itself at each register, and enters the register only if its LD input is triggered.
I The output from each row is allowed onto the output bus, Dout, only if the tri-state switch is opened.
I A ROM has the same output structure, and no input.
RAM Types
RAM units can be classified as follows.
I Dynamic RAM (DRAM). It uses capacitors to store bits. (Charged is a 1, and depleted is a 0.) Capacitors leak over time. A capacitor memory has to be rewritten (refreshed) to persist.
I Static RAM (SRAM). It uses latches to store Boolean values.
Comparison:
I Access Speed: DRAM units tend to be slower than SRAM. This is because charging capacitors takes time.
I Density: Capacitors can be built much smaller than gates, and the DRAM can be built more compactly than the SRAM.
I Cost: Storage using capacitor technology is cheaper to build than storage using the technology used in gates.
ROM Types
I ROM: Standard read-only memory. Contents are burnt in on creation.
I PROM: Programmable ROM. Chips are originally blank. Using a PROM burner you upload its contents. Once burnt, it is permanent.
I EPROM: Erasable PROM. The chip contains a window, through which you shine UV light, which erases the chip contents. So, the chip can be reprogrammed.
I EEPROM: Electrically Erasable PROM. The chip is erasable, like the EPROM, only with a special high voltage input pin.
Word and Byte Addressing
I The same memory is used to store both integers, and characters, which have radically different sizes.
I A character requires 8 bits (1 byte) to represent 256 possible keyboard characters.
I Use a combined memory. For a 16-bit integer, each word would be 16 bits. It would be split into 2 bytes, allowing us to store 2 characters in it.
I Each byte has an address.
I Word addresses are multiples of 2. Byte addresses are multiples of 1.
Word and Byte Addressing (cont.)
Addressing for an 8×16 memory:
(Figure: RAM 8x16. The eight words have the addresses 0, 2, 4, ..., 14, and the two bytes of each word are numbered 0 and 1.)
Instructions to store data into a byte
movb M[7], R0
or a word
movw M[6], R0
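The two store instructions can be mimicked in Python on a 16-byte array, which matches the 8x16 RAM of the figure (eight words of two bytes). This is a sketch under assumptions: the function names follow the mnemonics, the low byte is stored at the lower address (an arbitrary choice here), and a word store to an odd address is simply rejected.

```python
BYTES_PER_WORD = 2
memory = bytearray(16)   # 8 words x 2 bytes, byte addresses 0..15


def movb(addr, value):
    """Store the low byte of value at byte address addr."""
    memory[addr] = value & 0xFF


def movw(addr, value):
    """Store a 16-bit word at a word address (a multiple of 2)."""
    if addr % BYTES_PER_WORD != 0:
        raise ValueError("word addresses must be multiples of 2")
    # Assumption: low byte first.
    memory[addr:addr + BYTES_PER_WORD] = (value & 0xFFFF).to_bytes(2, "little")


movw(6, 0x1234)   # fills bytes 6 and 7
movb(7, 0xAB)     # overwrites byte 7 only
print(memory[6:8].hex())
```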
Machine Byte Order
Addressing Schemes
I little-endian
I big-endian
(Figure: byte numbering within a word. Little-endian: 3 2 1 0. Big-endian: 0 1 2 3.)
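The difference shows up when a multi-byte value is laid out in memory. In the usual definition, little-endian stores the least significant byte at the lowest address, and big-endian stores the most significant byte there. A short Python sketch, with an arbitrary example value and an illustrative function name:

```python
def byte_layout(value, order):
    """Return the 4 bytes of value, listed from the lowest address upward."""
    return list(value.to_bytes(4, order))


value = 0x0A0B0C0D
print([hex(b) for b in byte_layout(value, "little")])
print([hex(b) for b in byte_layout(value, "big")])
```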
Peripheral Devices
I Input devices. These are devices from which the processor reads data. The keyboard and pointer devices like the mouse are examples of input devices.
I Output devices. These are devices to which the processor writes data. The monitor and printer are examples of such devices.
I I/O devices. These are devices that combine both an input element and an output element. The processor can write to, and read from, these devices. An example of such a device is a disk drive.
Peripheral Device Types
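(Figure: the three device types. The output device has a register with the inputs Din and LD. The input device has a switch with the enable input E and the output Dout. The I/O device has both.)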
Device Interface
I The output device has a register that is loaded with the output value.
I The input device has a switch that lets the input out of the output port.
I The I/O device has both interfaces.
Device Polling
I Problem: There is no way to determine when a device is ready with new input/output.
I Solution: Every device has a READY bit associated with it.
I The READY bit is raised to a 1 by the device, when the device is free.
I When the read/write operation is performed, the READY bit is lowered to 0 by the processor.
I Before the processor accesses the device, it checks the READY bit to see if it is 1, indicating the device is ready.
Interrupts
I When using device polling, the processor spends a lot of time "busy waiting". (In a loop where it checks the READY bit, over and over and over.)
I With interrupts, the processor is sent a signal when the READY bit is raised. It no longer needs to busy wait.
I The processor can now work on another process, while processing I/O.
I When an interrupt is received, the processor suspends the process it is executing, and jumps to an Interrupt Service Routine (ISR). The ISR handles the interrupt request.
I When the ISR is done, the processor jumps back to where it left off in the other process.
Interrupts (cont.)
Interrupts have many causes. A CAUSE register is used by the device to pass the ISR a cause code, so that the ISR knows how to handle the interrupt.
(Figure: memory holding the user program and the ISR, with the PC pointing into memory.)
Software Interrupts
I Even a user program can request to be interrupted. (A software interrupt.) Why?
I System security:
  I User mode - The process in user mode is limited in what operations it can perform.
  I Kernel mode - The process in kernel mode is unlimited.
I To do a kernel operation, the user program (operating in user mode) requests a service of the OS by asking to be interrupted, and passing the ISR information on its request.
The CPU
The processor is a device that executes the machine cycle over and over. Each time the machine cycle is executed, a single machine instruction is executed.
Machine cycle:
1. Fetch. The PC contains the address of the next instruction to be executed. The instruction indicated by the PC is fetched into the CPU from memory, and the PC is updated.
2. Decode. The processor determines the operation to be performed, and the location of the operands required.
3. Execute. Any operands are fetched, the operation is performed, and the result is written to the destination.
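The machine cycle can be sketched as a loop in Python. The sketch borrows the instruction format of the Chapter 1 machine-language example (op-code, destination, source, constant), with op-code 1 for multiply and 14 for add as in that example. The HALT op-code, the `run` function, and the register contents are hypothetical, added only so that the loop terminates.

```python
MULT, ADD = 1, 14   # op-codes as used in the Chapter 1 example
HALT = 0            # hypothetical op-code, added to stop the loop


def run(program, registers):
    """Execute the machine cycle until a HALT instruction is fetched."""
    pc = 0
    while True:
        instruction = program[pc]          # fetch the instruction at the PC
        pc += 1                            # update the PC
        op, dst, src, const = instruction  # decode the fields
        if op == HALT:
            return registers
        if op == MULT:                     # execute, and write the destination
            registers[dst] = registers[src] * const
        elif op == ADD:
            registers[dst] = registers[src] + const
        else:
            raise ValueError("unknown op-code: %d" % op)


# x = 5 + y * 3, with R1 holding x, and R2 holding y = 4.
program = [(MULT, 1, 2, 3), (ADD, 1, 1, 5), (HALT, 0, 0, 0)]
print(run(program, {1: 0, 2: 4}))
```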
Bus Communication
Bus structure:
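(Figure: the bus consists of the address lines, A, the data lines, D, and the control lines, Ct, which are Rd and Wt.)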
Bus Use
There are three buses:
I The data bus carries data from the processor to the device. It also carries data from a device to the CPU.
I The address bus carries addresses to memory units.
I The control bus carries the control signals read, to input devices, and write, to output devices.
How does a bus device know if a message is for it, or some other device?
Bus Addressing
Every device has a collection of bus addresses that belong to it. The bus address is split into two fields:
I The unit number - every device on the bus is given a number that identifies it.
I The internal address - memory units are sent addresses for read and write operations.
Bus Addressing Example
I An 8×4 RAM unit; addresses range from 0000000 to 0000111.
I A 16×4 ROM unit; addresses range from 0010000 to 0011111.
I An input device with address 0100000.
I An output device with address 0110000.
I An I/O device with address 1000000.
Bus Addressing Example (cont.)
Deduce:
I The unit number is 3 bits (there are 5 devices).
I The internal address is 4 bits (the largest memory unit is of length 16).
I The bus address is 7 bits. This is the size of the address bus.
I The data bus is of width 4 (all units are at most 4 bits wide).
I The RAM should perform a read operation only if the CPU sends a unit number of 000, and a read request.
I The RAM should perform a write operation only if the CPU sends a unit number of 000, and a write request.
Bus Addressing Example (cont.)
I The ROM performs a read when the CPU asks for a read operation on Unit 001.
I The input device performs a read when the request is for a read from Unit 010.
I The output device performs a write operation when the request is for a write operation on Unit 011.
I The I/O device performs a read or write when that operation is requested on Unit 100.
I For each device control input we use 2 gates:
  1. Addressing gate - checks for the proper unit number.
  2. Operation gate - checks for the proper operation (read or write).
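The addressing and operation checks can be sketched in Python for the five devices of this example. The table `DEVICES` and the function `select` are illustrative; the unit numbers and the allowed operations are those listed above.

```python
# Unit number -> (device name, operations the device responds to).
DEVICES = {
    0b000: ("RAM", {"read", "write"}),
    0b001: ("ROM", {"read"}),
    0b010: ("input device", {"read"}),
    0b011: ("output device", {"write"}),
    0b100: ("I/O device", {"read", "write"}),
}


def select(address, operation):
    """Return (device, internal address) if a device responds, else None."""
    unit = (address >> 4) & 0b111      # high-order 3 bits: unit number
    internal = address & 0b1111        # low-order 4 bits: internal address
    entry = DEVICES.get(unit)          # addressing gate: is the unit number ours?
    if entry is None or operation not in entry[1]:
        return None                    # operation gate: is the operation ours?
    return entry[0], internal


print(select(0b0010101, "read"))
print(select(0b0010101, "write"))
print(select(0b0110000, "write"))
```

A request is served only when both checks pass, which is why a write to the ROM's address range selects no device.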
Example Memory Connection
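(Figure: the RAM 8x4 and the ROM 16x4 on the bus, which has a 7-bit address bus A, a 4-bit data bus D, and the control lines Rd and Wt. Both units take their address from the low-order address lines, and connect their data ports to D. Gates on the unit-number lines, A6-4, combined with Wt and Rd, drive W and E on the RAM, and E on the ROM.)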
Example Peripheral Connection
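(Figure: the input device, the output device, and the I/O device on the bus, which has a 7-bit address bus A, a 4-bit data bus D, and the control lines Rd and Wt. Gates on the unit-number lines, A6-4, combined with Rd and Wt, drive the E input of each input switch, and the LD input of each output register.)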
Chapter 5: The Register Transfer Language Level
I RTL (register transfer language) provides a tool for describing circuitry at a higher level than the FSM or truth-table.
I The tools we have developed so far are structural descriptions. They describe the structure of a circuit.
I RTL is a behavioral description. It describes the behavior of a circuit.
I An RTL description is a collection of µ-instructions.
I Each µ-instruction describes a circuit.
I A µ-instruction is composed of one or more µ-operations.
RTL Design
A µ-instruction has two parts (separated by a colon):
I A data-path specification, describing how data flows through the circuit.
I A control part indicating when the µ-operations are performed.
Examples (RTL implementation)
T : R1 ← R2
(Figure: the output of register R2 is connected to the input of register R1, and T drives the LD input of R1.)
RTL Design (cont.)
Examples (Use of trigger gates to generate control.)
ab + c : R1 ← 0
(Figure: an AND gate on a and b, and an OR gate with c, drive the CL input of register R1.)
RTL Design (cont.)
Examples
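ab : R1 ← R1 + R2, R2 ← 3
(Figure: an adder on R1 and R2 feeds the input of R1, the constant 3 feeds the input of R2, and an AND gate on a and b drives the LD inputs of both registers.)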
RTL Design (cont.)
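Examples (RTL with input choice, using a MUX and OR gate.)
ab : R1 ← R1 + R2
ab : R1 ← 3
(Figure: a 2-to-1 MUX chooses between the output of an adder on R1 and R2, and the constant 3, as the input of R1. An OR gate on the two control signals drives the LD input of R1.)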
A Larger Example
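Examples (Use of decoder instead of trigger gates.)
$\overline{x} \cdot \overline{y}$ : R1 ← R1 + R3, R2 ← 0
$\overline{x}y$ : R3 ← R3 + 1
$x\overline{y}$ : R2 ← R1, R0 ← R3
$xy$ : R1 ← R0, R3 ← 5
(Figure: the registers R0 to R3, an adder, a MUX, the constant 5, and a 2-to-4 decoder with the inputs x and y. The decoder outputs, 0 to 3, drive the LD, CL, and IN inputs of the registers.)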
RTL Analysis
To generate an RTL description from a schematic:
1. Write down control signals, using the decoder values. For the previous example, Option 0 gives us $\overline{x}\,\overline{y}$, Option 1 gives us $\overline{x}y$, and so on.
2. Follow the decoder trigger lines to determine the µ-operations performed. For example, for Option 2, the trigger line triggers the LD line on R0, and the IN line on R3. This means that a µ-operation is performed on R0, and another is performed on R3.
3. Follow the data-path lines to determine the exact µ-instruction. In the example, the input port of R0 is connected to R3, giving us the µ-instruction $x\overline{y}$ : R0 ← R3, R3 ← R3 + 1
4. Repeat the procedure for all decoder options.
From Structural to Behavioral Description
The method of transforming from circuit diagram to RTL is not universal. Here is a universal method.
i   Q(0)1   Q(0)0   Q(1)1   Q(1)0   p
0   0       0       0       1       1
0   0       1       0       1       1
0   1       0       0       1       1
0   1       1       0       0       0
1   0       0       0       0       1
1   0       1       1       0       1
1   1       0       1       1       1
1   1       1       1       1       0
Structural to Behavioral (cont.)
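Copy each row as a µ-instruction.
$\overline{i} \cdot \overline{Q_1} \cdot \overline{Q_0}$ : Q ← 1, p ← 1
$\overline{i} \cdot \overline{Q_1}\,Q_0$ : Q ← 1, p ← 1
$\overline{i}\,Q_1\,\overline{Q_0}$ : Q ← 1, p ← 1
$\overline{i}\,Q_1\,Q_0$ : Q ← 0, p ← 0
$i\,\overline{Q_1} \cdot \overline{Q_0}$ : Q ← 0, p ← 1
$i\,\overline{Q_1}\,Q_0$ : Q ← 2, p ← 1
$i\,Q_1\,\overline{Q_0}$ : Q ← 3, p ← 1
$i\,Q_1\,Q_0$ : Q ← 3, p ← 0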
Problems with Reverse Engineering
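Building a circuit from this µ-program, with data-path and control, yields a poor design, compared to the original design. (The mechanical translation loses semantic information.)
(Figure: the circuit built from the µ-program. Two 8-input MUXes, with inputs 0 to 7 selected by i and the current state, choose the constant loaded into the 2-bit register Q, and the constant driven onto p.)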
Common Processor µ-Instructions
RTL is good at describing high-level circuitry, but it can be used at all levels.
Examples (Combinational Circuit: the MUX)
$\overline{s_1} \cdot \overline{s_0}$ : p ← i0
$\overline{s_1}\,s_0$ : p ← i1
$s_1\,\overline{s_0}$ : p ← i2
$s_1\,s_0$ : p ← i3
Examples (Sequential Circuit: the J-K flip-flop)
$\overline{J}K$ : Q ← 0
$J\overline{K}$ : Q ← 1
$JK$ : Q ← $\overline{Q}$
Examples (Sequential Circuit: the counter)
IN : Q ← Q + 1
Processor µ-Instructions
Arithmetic
1. Addition: X ← X + Y
2. Subtraction: X ← X − Y
3. Increment: X ← X + 1
4. Decrement: X ← X − 1
5. Transfer: X ← Y
6. Clear: X ← 0
Logic
1. AND: X ← X ∧ Y
2. OR: X ← X ∨ Y
3. NOT: X ← $\overline{X}$
4. XOR: X ← X ⊕ Y
Processor µ-Instructions (cont.)
Shift
1. Logic Shift left: X ← shl X
2. Logic Shift right: X ← shr X
3. Circular shift left: X ← cil X
4. Circular shift right: X ← cir X
5. Arithmetic shift left: X ← ashl X
6. Arithmetic shift right: X ← ashr X
Memory
1. Read: X ← M[AR]
2. Write: M[AR]← X
Processor µ-Instructions (cont.)
Logic operations are bitwise. (they are done column by column.)
  0110          0    1    1    0
∧ 0101    ⇒   ∧ 0  ∧ 1  ∧ 0  ∧ 1
  0100          0    1    0    0
Shifts of 1110
1. shl: 1100
2. shr: 0111
3. cil: 1101
4. cir: 0111
5. ashl: 1100
6. ashr: 1111
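The six results above can be reproduced with a few lines of Python on 4-bit values. The function names follow the RTL mnemonics; the constants `WIDTH` and `MASK` are illustrative.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1     # 1111
SIGN = 1 << (WIDTH - 1)     # 1000


def shl(x):   # logical shift left: a 0 enters on the right
    return (x << 1) & MASK


def shr(x):   # logical shift right: a 0 enters on the left
    return x >> 1


def cil(x):   # circular shift left: the leftmost bit re-enters on the right
    return ((x << 1) & MASK) | (x >> (WIDTH - 1))


def cir(x):   # circular shift right: the rightmost bit re-enters on the left
    return (x >> 1) | ((x & 1) << (WIDTH - 1))


def ashl(x):  # arithmetic shift left: same bit movement as shl
    return (x << 1) & MASK


def ashr(x):  # arithmetic shift right: the sign bit is kept
    return (x >> 1) | (x & SIGN)


for op in (shl, shr, cil, cir, ashl, ashr):
    print(op.__name__, format(op(0b1110), "04b"))
```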
Memory addresses are specified using the address register (AR). To fetch an instruction from the location specified by the PC requires two µ-operations.
AR ← PC
X ← M[AR]
Shift types
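Left    Right
shl     shr
cil     cir
ashl    ashr
(Figure: the bit movement for each shift type. The logical shifts shift in a 0, and shift the outgoing bit into cout. The circular shifts feed the outgoing bit back in at the other end. The arithmetic shift left shifts in a 0, and the arithmetic shift right keeps the sign bit.)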
Algorithmic Machines
RTL is typically considered a declarative language: it specifies how a circuit is put together.
We can, however, use it as a procedural language: specifying a sequence of steps, or actions.
Examples (The Teapot Example)
Design a control circuit for a teapot.
(Figure: block symbol of the teapot controller, tea, with the inputs S, T, and Clk, and the outputs H and X.)
Teapot Example
I Inputs
  I S, the switch sensor: S = 0 if the switch is off, and S = 1 if the switch is on.
  I T, the temperature sensor: T = 0 if the liquid is too cool, and T = 1 if the liquid is hot enough.
I Outputs
  I X, turns off the on/off switch: X = 1 to turn the switch off, and X = 0 to leave the switch state unchanged.
  I H, turns on the heating element: H = 0 turns off the element, and H = 1 turns on the element.
Teapot Control Algorithm
stuck = 0
loop forever
    if S and not stuck and not T then
        H = 1
        X = 0
        stuck = 0
    else if S and not stuck and T then
        H = 0
        X = 1
        stuck = 1
    else if S and stuck then
        H = 0
        X = 0
        stuck = 1
    else if not S then
        H = 0
        X = 0
        stuck = 0
Teapot Flowchart
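(Figure: flowchart of the teapot algorithm. Node T0 sets stuck = 0. The decision nodes T1 to T4 test the conditions $S \cdot \overline{stuck} \cdot \overline{T}$, $S \cdot \overline{stuck} \cdot T$, $S \cdot stuck$, and $\overline{S}$, in turn. A 1 branches to the matching action node, T5 to T8, and a 0 falls through to the next test. Each action node sets H, X, and stuck, and returns to T1.)
I Each node of the chart is given a state name, Ti.
I A sequencer is a circuit that produces timing trigger signals, Ti.
I It consists of a counter, and a decoder.
I Each node becomes a µ-instruction in RTL, with the timing signals as control, and the node actions as data-path.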
Teapot Sequencer
(Figure: the 4-bit counter C feeds a 4-to-16 decoder, with outputs 0 to F. The decoder outputs 0 to 8 are the timing signals T0 to T8.)
Generating RTL from the Flowchart
Def: T0 ≡ C = 0, T1 ≡ C = 1, T2 ≡ C = 2, T3 ≡ C = 3, T4 ≡ C = 4, T5 ≡ C = 5, T6 ≡ C = 6, T7 ≡ C = 7, T8 ≡ C = 8
$T_0$ : stuck ← 0, C ← 1
$T_1\,S\,\overline{stuck} \cdot \overline{T}$ : C ← 5
$T_1\,\overline{S\,\overline{stuck} \cdot \overline{T}}$ : C ← 2
$T_2\,S\,\overline{stuck}\,T$ : C ← 6
$T_2\,\overline{S\,\overline{stuck}\,T}$ : C ← 3
$T_3\,S\,stuck$ : C ← 7
$T_3\,\overline{S\,stuck}$ : C ← 4
$T_4\,\overline{S}$ : C ← 8
$T_4\,S$ : C ← 1
$T_5$ : H ← 1, X ← 0, stuck ← 0, C ← 1
$T_6$ : H ← 0, X ← 1, stuck ← 1, C ← 1
$T_7$ : H ← 0, X ← 0, stuck ← 1, C ← 1
$T_8$ : H ← 0, X ← 0, stuck ← 0, C ← 1
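The µ-program can be traced in software before it is built. The Python sketch below applies, on each clock cycle, the one µ-instruction enabled by the current value of C. The complemented conditions are taken from the teapot control algorithm; the dictionary layout and the function names `step` and `run` are illustrative.

```python
def step(state, S, T):
    """Apply the µ-instruction enabled by the current value of C."""
    C, stuck = state["C"], state["stuck"]
    nxt = dict(state)
    if C == 0:
        nxt.update(stuck=0, C=1)
    elif C == 1:
        nxt["C"] = 5 if (S and not stuck and not T) else 2
    elif C == 2:
        nxt["C"] = 6 if (S and not stuck and T) else 3
    elif C == 3:
        nxt["C"] = 7 if (S and stuck) else 4
    elif C == 4:
        nxt["C"] = 8 if not S else 1
    elif C == 5:
        nxt.update(H=1, X=0, stuck=0, C=1)
    elif C == 6:
        nxt.update(H=0, X=1, stuck=1, C=1)
    elif C == 7:
        nxt.update(H=0, X=0, stuck=1, C=1)
    elif C == 8:
        nxt.update(H=0, X=0, stuck=0, C=1)
    return nxt


def run(state, S, T, cycles):
    """Hold the sensor inputs fixed for the given number of clock cycles."""
    for _ in range(cycles):
        state = step(state, S, T)
    return state


state = {"C": 0, "stuck": 0, "H": 0, "X": 0}
state = run(state, S=1, T=0, cycles=3)   # passes through T0, T1, T5
print(state)
```

With the switch on and the liquid cool, three cycles are enough to turn the heating element on.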
RTL and Verilog
RTL can be thought of as pseudo-code for a hardware description language, such as Verilog or VHDL (VHSIC Hardware Description Language).
// teapot controller
module teapot(clk, S, T, H, X);
  // input ports
  input clk, S, T;
  // output ports
  output reg H, X;
  // internal registers
  reg stuck;
  reg [3:0] C;
  // timing signals, one per sequencer state
  wire T0, T1, T2, T3, T4, T5, T6, T7, T8;
  // define the states
  assign T0 = C == 4'b0000;
  assign T1 = C == 4'b0001;
  assign T2 = C == 4'b0010;
  assign T3 = C == 4'b0011;
  assign T4 = C == 4'b0100;
  assign T5 = C == 4'b0101;
  assign T6 = C == 4'b0110;
  assign T7 = C == 4'b0111;
  assign T8 = C == 4'b1000;
  // the circuit behavior
  // (non-blocking assignments, as is usual for clocked logic)
  always @(posedge clk) begin
    if (T0) begin
      stuck <= 0;
      C <= 4'b0001;
    end
    if (T1) begin
      if (S && !stuck && !T)
        C <= 4'b0101;
      else
        C <= 4'b0010;
    end
    if (T2) begin
      if (S && !stuck && T)
        C <= 4'b0110;
      else
        C <= 4'b0011;
    end
    if (T3) begin
      if (S && stuck)
        C <= 4'b0111;
      else
        C <= 4'b0100;
    end
    if (T4) begin
      if (!S)
        C <= 4'b1000;
      else
        C <= 4'b0001;
    end
    if (T5) begin
      H <= 1;
      X <= 0;
      stuck <= 0;
      C <= 4'b0001;
    end
    if (T6) begin
      H <= 0;
      X <= 1;
      stuck <= 1;
      C <= 4'b0001;
    end
    if (T7) begin
      H <= 0;
      X <= 0;
      stuck <= 1;
      C <= 4'b0001;
    end
    if (T8) begin
      H <= 0;
      X <= 0;
      stuck <= 0;
      C <= 4'b0001;
    end
  end // behavior
  // initialize the state
  initial begin
    C = 4'b0000;
    H = 0;
    X = 0;
  end
endmodule
I The module definition gives the names of the input and output ports. This is followed by declarations that give the port types, and sizes.
I Types are either input or output, or reg, a register, with optional bit numbers to specify the size.
RTL and Verilog (cont.)
I We define the timing signals, Ti, according to the sequencer values.
I An action section specifies that the action takes place on the positive edge of the clock signal.
I µ-instructions are implemented as if statements. The test implements the control, and the body implements the data-path.
I The code contains µ-instructions for all timing signals, T0 - T8.
I The last section initializes the registers.
Chapter 6: Common Computer Architectures
I We examine some common ways of organizing a processor.
I Each organization is currently in use in some processor.
I Topics
  I ISA (instruction set architecture) - what instructions are available.
  I Instruction format - how information is coded as a number.
  I Addressing modes - how the location of operands is specified as a number.
Instruction Set Architecture
Instruction types
I Data transfer. Move data from one location to another.
I Data manipulation. Perform arithmetic, logic, or shift operations on data.
I Control. Change the order of execution of machine instructions.
Data Transfer Instructions
Categories based on location of the data.
I Register-to-Register. Movement inside the processor from one register to another.
mov R0, R1 ; R0 <- R1
I Register-to-Memory. Movement from a processor register out to a memory unit.
store 5, R1 ; M[5] <- R1
I Memory-to-Register. Movement from a memory unit in to a processor register.
load R0, 5 ; R0 <- M[5]
Data Transfer Instructions (cont.)
I Register-to-Device. Movement from a register out to an output device. (Devices are designated by their channel number.)
out 3, R0 ; D[3] <- R0
I Device-to-Register. Movement from an input device to a register.
in R0, 3 ; R0 <- D[3]
I/O organization
I In special instruction I/O (as above) the processor uses input and output instructions to perform I/O.
I In memory-mapped I/O, devices are mapped to special memory locations. (I/O is just movement to/from memory.)
store 255, R0 ; M[255] <- R0 (address 255 is mapped to a device)
Data Manipulation & Data-Types
The processor operates on data.
Common data-types
I Integer data.
I Real data.
I Boolean data.
I Character data.
I Binary coded decimal (BCD) data.
The Integer Data-Type
I Integer data consists of whole numbers.
I Integers are stored in a word.
I Word size must be large enough to store the integer values a user is interested in operating on. Eight bits is not sufficient. A typical word size might be 32 bits.
I Integer types
  I Unsigned integer – all bit configurations of the word are used to represent non-negative numbers. (The range for a 32-bit word is 0 to $2^{32} - 1 \approx 4.3 \times 10^{9}$.)
  I Signed integer – half the bit configurations are used for non-negative integers, and half are used for negative integers.
The Real Data-Type
I A real number is a number with a fractional part.
I Real number representation is based on scientific notation.
I There are three important pieces of information in the scientific notation: sign, mantissa, and exponent.
I On a computer, this scientific notation representation is called floating-point format.
I There are two sizes of floating-point format: single precision, with a 32-bit FP word, and double precision, with a 64-bit word.
$-45.375 = -4.5375 \times 10^{1}$
sign: −; mantissa: 4.5375; exponent: 1
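The three fields can be inspected on a real machine. The sketch below assumes the single precision format is IEEE 754 (the slides do not name a standard), in which the fields are binary rather than decimal: 1 sign bit, 8 exponent bits stored with a bias of 127, and 23 mantissa bits. The helper name float32_fields is hypothetical.

```python
import struct

def float32_fields(value):
    """Split a number, stored as a 32-bit float, into (sign, exponent, mantissa)."""
    # Pack as a big-endian single, then reinterpret the same 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit: 1 means negative
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127 (assumed IEEE 754)
    mantissa = bits & 0x7FFFFF        # 23 bits: the fraction after the implied leading 1
    return sign, exponent, mantissa

sign, exponent, mantissa = float32_fields(-45.375)
# -45.375 is -1.01101011 (binary) times 2 to the power 5
print(sign, exponent - 127, format(mantissa, "023b"))
```

In binary the exponent is 5 rather than 1, because the machine normalizes to a power of 2 instead of a power of 10.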
The Boolean Data-Type
I It only takes a single bit to store the values true, or false.
I Most computer memories can only be accessed by the word, or byte, and so this is not convenient.
I Boolean values are stored in a byte: 0 for false, and not 0 for true.
The Character Data-Type
I To represent characters on a computer, the characters must be encoded, because a computer can only store numbers.
I All of the characters on a keyboard can be numbered with codes from 0 to 255. This requires 8 bits.
I Eight bits is called a byte. It is possible to encode all characters on the keyboard with a code that fits in a byte.
I The standard 1-byte code is ASCII (American Standard Code for Information Interchange).
I The ASCII byte is only large enough for the Latin characters. To represent other languages and scripts, a larger code is needed.
I UNICODE is a 16-bit code. ASCII is a subset of UNICODE. To form the UNICODE code for a Latin character, you prefix it with a byte of 0. (Other languages have prefix bytes that are not 0.)
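A short sketch of the prefix-byte idea, using Python's built-in UTF-16 big-endian encoding as a stand-in for the 16-bit code described above (that choice of encoding is an assumption; the slides do not name one).

```python
# ord() gives the numeric code of a character.
code = ord("A")                      # the ASCII code of 'A'

# Two bytes per character: a prefix byte, then the low byte.
utf16 = "A".encode("utf-16-be")      # Latin letter: prefix byte is 0
greek = "α".encode("utf-16-be")      # Greek letter: prefix byte is not 0

print(code, list(utf16), list(greek))
```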
Binary Coded Decimal (BCD)
I Humans work in decimal. When inputting and outputting numbers, the numbers typically need to be converted between decimal and binary.
I BCD is a way of representing integers that allows easy conversion between binary and decimal. The drawback is that arithmetic is more complex for BCD than it is for positional binary.
I In BCD, an integer is represented as a string of 4-bit binary encodings of its digits.
Examples
Decimal 365 has digits 3: 0011, 6: 0110, 5: 0101
In BCD: 0011 0110 0101
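The digit-by-digit encoding can be sketched in a few lines. The function names to_bcd and from_bcd are hypothetical, and the sketch handles only non-negative integers.

```python
def to_bcd(n):
    """Return the BCD encoding of a non-negative integer as 4-bit groups."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Each decimal digit is encoded independently as a 4-bit binary number.
    return " ".join(format(int(digit), "04b") for digit in str(n))

def from_bcd(bcd):
    """Decode space-separated 4-bit groups back to an integer."""
    return int("".join(str(int(group, 2)) for group in bcd.split()))

print(to_bcd(365))
```

Conversion never needs division or multiplication by powers of 10, which is why BCD is convenient for input and output.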
Data Manipulation Operation Types
I Arithmetic operations. The usual arithmetic operations, like addition, subtraction, multiplication, and division.
I Logic operations. Bitwise Boolean operations, like AND, OR, and NOT.
I Shift operations. Shifting integers left or right.
Data Manipulation Operations: Arithmetic
Arithmetic Operations.
I A processor must support arithmetic on all numeric data-types.
I The ISA may contain an instruction
iadd R0, R1
for integer addition, and
fadd F0, F1
for floating-point addition.
Data Manipulation Operations: Logic
Logic Operations.
I Logic operators allow us to manipulate individual bits in an integer.
I As an example
or R0, #00010000b
sets Bit 4 in R0, by ORing R0 with the mask 00010000.
I In the example
and R0, #00010000b
the AND operator clears all bits but Bit 4 in R0.
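The two mask operations above can be tried directly with Python's bitwise operators. The function names are hypothetical; the mask is the one from the slide.

```python
MASK = 0b00010000            # only Bit 4 is 1

def set_bit4(r0):
    return r0 | MASK         # or R0, #00010000b

def keep_only_bit4(r0):
    return r0 & MASK         # and R0, #00010000b

print(format(set_bit4(0b00000101), "08b"))
print(format(keep_only_bit4(0b11111111), "08b"))
```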
Data Manipulation Operations: Shift
Shift Operations.
I Shift operations can be to the right or left, and can be logical, circular, or arithmetic.
I As an example
shl R0
shifts R0 by one bit to the left.
I Shift operators can be used to perform multiplication by a constant. They can do this faster than a full multiplication instruction.
I As an example, to multiply R0 by 5:
mov R1, R0 ; R1 <- R0
shl R0 ; R0 <- R0 * 2
shl R0 ; R0 <- R0 * 2
add R0, R1 ; R0 <- R0 + R1
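The four-instruction sequence computes 4 × R0 + R0. A minimal sketch, assuming a 16-bit register (the word size is not stated on this slide) and a hypothetical function name times5:

```python
def times5(r0, word_bits=16):
    mask = (1 << word_bits) - 1   # keep results within the register width
    r1 = r0                       # mov R1, R0
    r0 = (r0 << 1) & mask         # shl R0      ; R0 * 2
    r0 = (r0 << 1) & mask         # shl R0      ; R0 * 4
    r0 = (r0 + r1) & mask         # add R0, R1  ; R0 * 4 + R0
    return r0

print(times5(7))
```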
Control Operations
Control operations change the flow of control of a program from the next sequential instruction to another instruction.
Types of instructions
I Unconditional branches.
I Conditional branches.
I Machine reset.
I Context manipulation.
Unconditional Branches
...
jump xyz
...
xyz:
...
I The jump causes the execution of the instruction at label xyz. A label is a symbolic address.
I The symbolic address represents the memory address of the next instruction in machine language.
I The jump instruction is called unconditional because the branch is always taken. In a conditional branch, the branch is taken only under certain circumstances.
Conditional Branches: Arithmetic
...
beq R0, xyz
...
xyz:
...
I The beq checks to see if R0 is equal to zero. If so, it branches to address xyz.
I If R0 ≠ 0, execution continues with the sequentially next instruction.
I It is called an arithmetic branch because the beq instruction performs arithmetic (compares R0 with 0) to determine if the branch should be taken.
Conditional Branches: Status Flag
...
sub R0, #0
bz R0, xyz
...
xyz:
...
I In status flag branches, there is a processor register (the FLAGS register) which is a collection of status bits.
I When the processor performs arithmetic, it sets the status bits according to the results.
I The bz (branch if zero) instruction checks the Z status flag to determine if a jump should occur. The Z flag is set if the result of the last arithmetic operation was zero.
I The subtract instruction before the bz instruction is used to set the Z flag to 1 if R0 is 0.
Machine Reset Instruction
halt
I This instruction halts the machine.
I Usually the user does not want to terminate a program by halting the machine.
I Usually, when a program finishes it should return control to the operating system.
I Only low-level OS programs would need to halt the machine.
Context Manipulation Instructions: Subroutines
High-level and low-level subroutine instructions.
High-level:
...
f()
...
function f()
begin
...
return
end
Low-level:
...
call f
...
f:
...
ret
Context Manipulation: Subroutines (cont.)
I A function call is translated into a call instruction.
I A return statement is translated into a ret instruction.
I A call instruction causes a jump to the address specified.
I A ret instruction causes a jump back to the instruction following the call instruction. (This jump is back to what is called the return address.)
I The return address is pushed on to the system stack by the call instruction.
I When the ret instruction needs the return address, it is popped off the stack.
Context Manipulation: Subroutines (cont.)
Stack:
I A LIFO (last in first out) data structure.
I It supports a pop operation, which takes the top element off the stack.
I It supports a push operation that places a new element on the top of the stack.
[Figure: the stack and its stack pointer (SP) after push(x), after push(y), and after pop() = y.]
Context Manipulation: Subroutines (cont.)
call instruction actions:
1. Push the return address on to the system stack.
2. Jump to the subroutine address.
ret instruction actions:
1. Pop the return address off of the system stack.
2. Load the popped return address into the PC.
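The call and ret actions can be sketched with a list acting as the system stack. The class name Machine and all addresses below are hypothetical, chosen only to show a function f calling a function g. The sketch assumes the PC already points at the instruction after the call when the call executes.

```python
class Machine:
    """A toy processor state: just a program counter and a system stack."""

    def __init__(self, pc=0):
        self.pc = pc
        self.stack = []              # the end of the list is the top of the stack

    def call(self, target):
        self.stack.append(self.pc)   # 1. push the return address
        self.pc = target             # 2. jump to the subroutine address

    def ret(self):
        self.pc = self.stack.pop()   # pop the return address and load it into the PC

m = Machine(pc=11)      # a call at address 10 has just been fetched
m.call(50)              # the main program calls f, which starts at 50
m.pc = 56               # f runs; a call at address 55 has just been fetched
m.call(80)              # f calls g, which starts at 80
m.ret()                 # g returns into f
after_g = m.pc
m.ret()                 # f returns into the main program
print(after_g, m.pc)
```

Because the stack is LIFO, nested calls return in the reverse of the order in which they were made.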
Context Manipulation: Subroutines (cont.)
Examples (A function f calls a function g)
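[Figure: the system stack and SP as f calls g. call f pushes ra(f), call g pushes ra(g) above it, and the two ret instructions pop them in reverse order.]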
Context Manipulation Instructions: Interrupts
I Interrupts are like subroutines, except there is no call instruction executed.
I Upon an interrupt, control jumps to the start of the ISR.
I The interrupted program must be unaware that it has been interrupted.
I When the interrupt occurs, the state of the processor registers must be saved. (This is called context saving.)
I On return from the interrupt, the processor registers must be restored (context restoration).
I An interrupt may be caused by stack problems. The return address and the context are therefore saved in the ISR area, rather than on the stack.
Context Manipulation: Interrupts (cont.)
I A program can request to be interrupted with a syscall instruction.
I A syscall instruction is issued when the program requests the OS to perform a service which it does not have permission to perform itself (like using a printer).
Examples (A syscall execution.)
...
syscall
...
ISR:
...
iret
I The syscall instruction jumps to the fixed ISR location, and saves the context.
I The iret instruction jumps back to the return address, after restoring the context.
Context Manipulation: Interrupts (cont.)
Execution Modes:
I User mode. The program operating in user mode has restrictions. It cannot access certain devices, and certain memory sections.
I Kernel mode. The program operating in kernel mode can do anything.
The users' programs typically operate in user mode. Many parts of the OS operate in kernel mode.
To terminate, a program asks the OS to take over. This is done through a syscall.
Instruction Format
Assembly instructions must be represented numerically.
Instruction parts
add R0, R1
I The operation: the operation being performed. In this instruction it is addition.
I The destination: the operand where the result will be stored. In this instruction this would be R0.
I The source: the other operand. In this case this would be R1.
Machine code equivalent
Field:  op  dst  src
Bits:   6   5    5
Instruction Format (cont.)
Example instruction
Field:  op      dst    src
Value:  001001  00000  00001
Full instruction: 0010010000000001
I The machine has $2^6 = 64$ instructions in its ISA.
I The machine has $2^5 = 32$ registers.
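Packing and unpacking the three fields is a matter of shifts and masks. The sketch below uses the 6/5/5 layout and the add op-code 001001 from the example; the function names encode and decode are hypothetical.

```python
def encode(op, dst, src):
    """Pack a 6-bit op-code and two 5-bit register numbers into a 16-bit word."""
    assert 0 <= op < 64 and 0 <= dst < 32 and 0 <= src < 32
    return (op << 10) | (dst << 5) | src

def decode(word):
    """Recover (op, dst, src) from a 16-bit word."""
    return (word >> 10) & 0x3F, (word >> 5) & 0x1F, word & 0x1F

word = encode(0b001001, 0, 1)    # add R0, R1
print(format(word, "016b"))
```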
Addressing Modes
Numbers in operand fields have several interpretations, in a less simplistic computer. They could indicate a register number, a memory address, or a constant.
Addressing modes indicate where the operand is found.
Addressing Modes:
1. Direct mode.
2. Indirect mode.
3. Register direct mode.
4. Register indirect mode.
5. Immediate mode.
6. Implicit mode
7. Relative mode.
8. Indexed mode.
Addressing Modes (cont.)
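Direct mode: The operand field gives the address of the effective operand.
load R0, 5
R0 ← M[5]
Indirect mode: The operand field gives the address of a pointer to the effective operand.
load R0, (5)
R0 ← M[M[5]]
[Figure: RAM contents illustrating indirect mode, in which the word at address 5 is a pointer to the location holding the effective operand.]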
Addressing Modes (cont.)
Register direct mode: The operand field contains the number of a register which contains the effective operand.
mov R0, R5
R0 ← R5
Register indirect mode: The operand field contains the number of a register that contains a pointer to the effective operand.
load R0, @R5
R0 ← M[R5]
Immediate mode: The operand field contains the effective operand.
mov R0, #5
R0 ← 5
Addressing Modes (cont.)
Implicit mode: An operand is not explicitly given.
Operand explicitly given (subroutine address):
call 5
Operand not given explicitly (ISR address):
syscall
Relative mode: The address of the effective operand is calculated as the contents of the operand field added as an offset to the contents of the PC.
load R0, $5
R0 ← M[PC + 5]
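The modes that take an explicit operand field differ only in how the field is turned into a value. A minimal sketch, with memory and registers modeled as Python lists; the function name operand, the mode strings, and the memory contents are all hypothetical.

```python
def operand(mode, field, mem, regs, pc):
    """Return the effective operand for one operand field."""
    if mode == "direct":             # load R0, 5
        return mem[field]
    if mode == "indirect":           # load R0, (5)
        return mem[mem[field]]
    if mode == "register direct":    # mov R0, R5
        return regs[field]
    if mode == "register indirect":  # load R0, @R5
        return mem[regs[field]]
    if mode == "immediate":          # mov R0, #5
        return field
    if mode == "relative":           # load R0, $5
        return mem[pc + field]
    raise ValueError(f"unknown mode: {mode}")

mem = list(range(100, 116))   # mem[a] == 100 + a, so fetched values are easy to recognize
mem[5] = 9                    # location 5 holds a pointer to location 9
regs = [0] * 8
regs[5] = 2                   # R5 holds a pointer to location 2
for mode in ("direct", "indirect", "register direct",
             "register indirect", "immediate", "relative"):
    print(mode, operand(mode, 5, mem, regs, pc=3))
```

The same field value, 5, yields six different operands depending on the mode.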
Addressing Modes: Relative Mode
Relative mode addressing allows the relocation of a program, without modifying addresses.
Examples (A program that addresses a location 125, inside its workspace.)
I VA: The address is absolute.
load R0, 125
I VR: The address is relative to the PC.
load R0, $25
Addressing Modes: Relative Mode (cont.)
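If the workspace is moved, the load instruction in VA will fetch from outside the workspace. The load for VR will fetch from the same offset inside the workspace.
[Figure: memory layouts for VA and VR. Before the move, the PC is at 100 and the operand is at 125. After the move, the PC is at 50: VA still fetches from 125, while VR fetches from 50 + 25 = 75.]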
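The effect of relocation can be checked with a small experiment. The sketch assumes the PC is at 100 when the load executes before the move, and at 50 afterwards, so the datum moves from 125 to 75; these numbers and the function names are illustrative.

```python
def fetch_absolute(mem, address):
    return mem[address]            # VA: load R0, 125

def fetch_relative(mem, pc, offset):
    return mem[pc + offset]        # VR: load R0, $25

mem = [None] * 256
# Original placement: PC = 100 when the load executes, the datum is at 125.
mem[125] = "datum"
before = (fetch_absolute(mem, 125), fetch_relative(mem, 100, 25))

# Move the workspace down by 50 locations: the datum is now at 75, the PC at 50.
mem[75], mem[125] = mem[125], "unrelated"
after = (fetch_absolute(mem, 125), fetch_relative(mem, 50, 25))
print(before, after)
```

Only the relative load still finds the datum after the move, and its instruction did not have to change.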
Addressing Modes (cont.)
Indexed mode: The address of the effective operand is calculated by adding together two fields in the instruction: an operand offset field, and an operand index register field.
load R0, 5(R1)
R0 ← M[R1 + 5]
This mode allows for the easy implementation of array structures in memory.
Addressing Modes: Indexed Mode
Array layout in memory.
I The array has a base address.
I Each element is located at an address which is the base address added to an offset.
Element  Address
A[0]     A + 0
A[1]     A + 1
A[2]     A + 2
A[3]     A + 3
...      ...
Addressing Modes: Index Mode (cont.)
Index mode is used to implement common array operations, with the index register storing the offset.
for i = 0 to n-1 do
    x = x + A[i]
(Assume that the variable i is stored in R1. Memory addresses are specified as symbolic addresses.)
mov R1, #0 ; i = 0
lab1:
load R0, n ; R0 <- n
sub R0, R1 ; R0 <- n - i, sets Z when i = n
bz ext ; leave the loop once A[n-1] has been added
add x, A(R1) ; x <- x + M[A + R1]
add R1, #1 ; i = i + 1
jump lab1
ext:
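The loop can be mirrored step by step in Python, with the index register R1 supplying the offset from the base address A. The function name sum_array, the base address 10, and the array contents are hypothetical.

```python
def sum_array(mem, base, n):
    """Sum n words starting at address base, using base + index addressing."""
    x = 0
    r1 = 0                      # mov R1, #0     ; i = 0
    while n - r1 != 0:          # load, sub, bz  ; leave the loop when i = n
        x += mem[base + r1]     # add x, A(R1)   ; effective address is A + R1
        r1 += 1                 # add R1, #1
    return x

mem = [0] * 32
A = 10                          # hypothetical base address of the array
mem[A:A + 4] = [3, 1, 4, 1]
print(sum_array(mem, A, 4))
```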
Addressing in Machine Language
A machine with a 16-bit word, a 6-bit op-code, and two 5-bit operand fields. (Notice that this small operand field is not adequate for a reasonably sized memory unit.) To represent addressing modes, we use one bit of each operand field as an addressing mode bit. The single bit chooses between register direct (0), and direct (1) modes.
Field:  op  DM  dst  SM  src
Bits:   6   1   4    1   4
Examples
add R0, 14
Field:  op      DM  dst   SM  src
Value:  001001  0   0000  1   1110
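A sketch of this encoding, assuming the fields are laid out left to right as op, DM, dst, SM, src (the field order is inferred from the widths given above) and reusing the add op-code 001001 from the earlier instruction format example. The function name encode is hypothetical.

```python
REGISTER_DIRECT, DIRECT = 0, 1

def encode(op, dst_mode, dst, src_mode, src):
    """Pack op(6) DM(1) dst(4) SM(1) src(4) into a 16-bit word (assumed field order)."""
    assert 0 <= op < 64 and 0 <= dst < 16 and 0 <= src < 16
    assert dst_mode in (0, 1) and src_mode in (0, 1)
    return (op << 10) | (dst_mode << 9) | (dst << 5) | (src_mode << 4) | src

# add R0, 14: the destination is register 0, the source is memory location 14.
word = encode(0b001001, REGISTER_DIRECT, 0, DIRECT, 14)
print(format(word, "016b"))
```

Giving up one bit per operand for the mode halves the number of registers or addresses each operand can name, from 32 to 16.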
Alternate Machine Architectures
Machines can be classified by the number of operands in their instructions.
I Register machine (3-operand machine).
I Register implicit machine (2-operand machine).
I Accumulator machine (1-operand machine).
I Stack machine (0-operand machine).
We use a standard machine specification called the RIM machine to illustrate these alternate architectures.
RIM specification:
I Registers R0 through R7.
I A 256 × 16 RAM unit.
I A single I/O device.
I A bus connection.
RIM ISA Specification
I Data transfer.
  I Register-to-register.
  I Memory-to-register.
  I Register-to-memory.
I Arithmetic.
  I Addition.
  I Subtraction.
I Logic.
  I AND.
  I OR.
  I NOT.
I Control.
  I Jump.
  I Branch if zero.
  I Branch if not zero.
The Register Machine
At least the arithmetic instructions have three operands.
add R0, R1, R2
R0 ← R1 + R2
ISA:
Assembly Code     Machine Code       Meaning
load R1, m        0000 R1 m          R1 ← M[m]
store m, γ        0001 γ m           M[m] ← γ
add R1, γ2, γ3    0010 R1 γ2 γ3      R1 ← γ2 + γ3, Z ← (γ2 + γ3) = 0
sub R1, γ2, γ3    0011 R1 γ2 γ3      R1 ← γ2 − γ3, Z ← (γ2 − γ3) = 0
and R1, γ2, γ3    0100 R1 γ2 γ3      R1 ← γ2 ∧ γ3, Z ← (γ2 ∧ γ3) = 0
or R1, γ2, γ3     0101 R1 γ2 γ3      R1 ← γ2 ∨ γ3, Z ← (γ2 ∨ γ3) = 0
not R1, γ2        0110 R1 γ2 0000    R1 ← ¬γ2, Z ← (¬γ2) = 0
jump m            0111 0000 m        PC ← m
bz m              1000 0000 m        if Z then PC ← m
bnz m             1001 0000 m        if ¬Z then PC ← m
Register Machine: Instruction Format
The register machine has two formats:
I The register format
I The memory format
The memory format allows for larger address fields.
Interpreting the table.
Examples (Register format)
add R1, γ2, γ3
1. First operand, R1 – indicates a register number.
2. Second operand, γ2 – indicates a register, or an immediate value.
3. Third operand, γ3 – has the same meaning as the second operand.
Register Machine: Instruction Format (cont.)
Examples (Register Format (cont.))
add R0, R0, #1
Machine language:
Field:  op    (0)  dst  SI1  src1  SI2  src2
Bits:   4     1    3    1    3     1    3
Value:  0010  0    000  0    000   1    001
I op-code – 0010, for the add instruction.
I one bit 0.
I dst – 000, for R0.
I SI1 – 0, for register mode for the first operand.
I src1 – 000, for R0.
I SI2 – 1, for immediate mode for the second operand.
I src2 – 001, for the immediate value #1.
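The field list above can be turned into a small encoder. The function name encode_register_format is hypothetical; the field widths and values are the ones listed for add R0, R0, #1.

```python
def encode_register_format(op, dst, si1, src1, si2, src2):
    """Pack op(4), one 0 bit, dst(3), SI1(1), src1(3), SI2(1), src2(3) into 16 bits."""
    for value in (dst, src1, src2):
        assert 0 <= value < 8
    assert si1 in (0, 1) and si2 in (0, 1)
    # Bit 11 is the unused bit and is left as 0.
    return (op << 12) | (dst << 8) | (si1 << 7) | (src1 << 4) | (si2 << 3) | src2

ADD = 0b0010
# add R0, R0, #1: first source is register R0, second source is the immediate 1.
word = encode_register_format(ADD, dst=0, si1=0, src1=0, si2=1, src2=1)
print(format(word, "016b"))
```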
Register Machine: Instruction Format (cont.)
Examples (Memory Format)
load R1, m
0000 R1 m
1. First operand, R1 – indicates a register number.
2. Second operand, m – indicates a memory address, or immediate value.
load R2, 19
Machine language:
Field:  op    (0)  dst  address
Bits:   4     1    3    8
Value:  0000  0    010  00010011
Register Machine: Instruction Format (cont.)
Examples (Memory Format (cont.))
I op-code – 0000 for load.
I one bit of 0.
I dst – 010 for R2.
I address – 00010011 for 19.
Register Machine: Instruction Format (cont.)
Instruction types:
I Data Transfer
  I load – memory to register
  I store – register to memory
I Data Manipulation (all set the Z status flag)
  I add
  I sub
  I and
  I or
  I not
I Control
  I jump – unconditional branch
  I bz – conditional (branch if zero) (uses the Z status flag)
  I bnz – conditional (branch if not zero)
Register Machine: Programming Example
Examples (Program to output x × y to sum.)
sum = 0
i = 0
while i != x do
    sum = sum + y
    i = i + 1
Register Machine: Programming Example (cont.)
Assembly version of the multiplication program.
add R0, #0, #0 ; sum = 0
store sum, R0
add R0, #0, #0 ; i = 0
store i, R0
lp: ; while i != x do
load R0, i
load R1, x
sub R0, R0, R1
bz ext
load R0, sum ; sum = sum + y
load R1, y
add R0, R0, R1
store sum, R0
load R0, i ; i = i + 1
add R0, R0, #1
store i, R0
jump lp
ext:
Register Machine: Programming Example (cont.)
Machine language version of multiplier.
Address Machine Code Assembly code
00010100 0010 0 000 1 000 1 000 add R0, #0, #0
00010101 0001 0 000 00000000 store sum, R0
00010110 0010 0 000 1 000 1 000 add R0, #0, #0
00010111 0001 0 000 00000001 store i, R0
00011000 0000 0 000 00000001 load R0, i
00011001 0000 0 001 00000010 load R1, x
00011010 0011 0 000 0 000 0 001 sub R0, R0, R1
00011011 1000 0 000 00100100 bz ext
00011100 0000 0 000 00000000 load R0, sum
00011101 0000 0 001 00000011 load R1, y
00011110 0010 0 000 0 000 0 001 add R0, R0, R1
00011111 0001 0 000 00000000 store sum, R0
00100000 0000 0 000 00000001 load R0, i
00100001 0010 0 000 0 000 1 001 add R0, R0, #1
00100010 0001 0 000 00000001 store i, R0
00100011 0111 0 000 00011000 jump lp
Register Machine: Programming Example (cont.)
Machine language translation.
I A machine language program is split into two segments of memory: a data segment, and a code segment. The data segment contains variables. The code segment contains machine instructions.
I Our data segment starts at location 0, and our code segment starts at location 20.
I Variable assignments:
  I sum is M[0]
  I i is M[1]
  I x is M[2]
  I y is M[3]
I The table is used to assemble each machine instruction, from the assembly instruction.
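To check the translation, the 16 machine words can be run on a small simulator. This is a sketch under stated assumptions, not the machine's actual design: it implements only the six op-codes the program uses, treats results as 16-bit values, and stops when the PC reaches the word after the last instruction (the address of ext). The names run and source are hypothetical.

```python
LOAD, STORE, ADD, SUB, JUMP, BZ = 0b0000, 0b0001, 0b0010, 0b0011, 0b0111, 0b1000

PROGRAM = [                        # loaded at address 20
    0b0010_0_000_1_000_1_000,      # add R0, #0, #0
    0b0001_0_000_00000000,         # store sum, R0
    0b0010_0_000_1_000_1_000,      # add R0, #0, #0
    0b0001_0_000_00000001,         # store i, R0
    0b0000_0_000_00000001,         # lp: load R0, i
    0b0000_0_001_00000010,         # load R1, x
    0b0011_0_000_0_000_0_001,      # sub R0, R0, R1
    0b1000_0_000_00100100,         # bz ext
    0b0000_0_000_00000000,         # load R0, sum
    0b0000_0_001_00000011,         # load R1, y
    0b0010_0_000_0_000_0_001,      # add R0, R0, R1
    0b0001_0_000_00000000,         # store sum, R0
    0b0000_0_000_00000001,         # load R0, i
    0b0010_0_000_0_000_1_001,      # add R0, R0, #1
    0b0001_0_000_00000001,         # store i, R0
    0b0111_0_000_00011000,         # jump lp
]

def source(regs, is_immediate, field):
    """A source field is either an immediate value or a register number."""
    return field if is_immediate else regs[field]

def run(program, data, start=20, max_steps=100_000):
    mem = [0] * 256
    mem[:len(data)] = data                       # data segment at location 0
    mem[start:start + len(program)] = program    # code segment at location 20
    end = start + len(program)                   # address of the label ext
    regs, z, pc = [0] * 8, False, start
    for _ in range(max_steps):
        if pc == end:
            return mem
        word = mem[pc]
        pc += 1
        op, dst, addr = word >> 12, (word >> 8) & 0b111, word & 0xFF
        if op == LOAD:
            regs[dst] = mem[addr]
        elif op == STORE:
            mem[addr] = regs[dst]
        elif op in (ADD, SUB):
            a = source(regs, (word >> 7) & 1, (word >> 4) & 0b111)
            b = source(regs, (word >> 3) & 1, word & 0b111)
            regs[dst] = (a + b if op == ADD else a - b) & 0xFFFF
            z = regs[dst] == 0                   # data manipulation sets the Z flag
        elif op == JUMP:
            pc = addr
        elif op == BZ:
            if z:
                pc = addr
        else:
            raise ValueError(f"op-code {op:04b} is not implemented in this sketch")
    raise RuntimeError("step limit reached")

memory = run(PROGRAM, data=[0, 0, 3, 4])         # sum, i, x = 3, y = 4
print(memory[0])
```

After the run, M[0] holds the product of the values placed in M[2] and M[3].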
The Register Implicit Machine
The instructions have two operands.
add R0, R1
ISA:
Assembly Code   Machine Code          Meaning
load R1, m      0000 1 R1 m           R1 ← M[m]
mov R1, γ2      0000 0 R1 γ2          R1 ← γ2
store m, R1     0001 0 R1 m           M[m] ← R1
add R1, γ2      0010 0 R1 γ2          R1 ← R1 + γ2, Z ← (R1 + γ2) = 0
sub R1, γ2      0011 0 R1 γ2          R1 ← R1 − γ2, Z ← (R1 − γ2) = 0
and R1, γ2      0100 0 R1 γ2          R1 ← R1 ∧ γ2, Z ← (R1 ∧ γ2) = 0
or R1, γ2       0101 0 R1 γ2          R1 ← R1 ∨ γ2, Z ← (R1 ∨ γ2) = 0
not R1          0110 0 R1 00000000    R1 ← ¬R1, Z ← (¬R1) = 0
jump m          0111 0 000 m          PC ← m
bz m            1000 0 000 m          if Z then PC ← m
bnz m           1001 0 000 m          if ¬Z then PC ← m
Register Implicit Machine: Instruction Format
A single instruction format. The src field can take one of three forms.
Field:  op  M  dst  src
Bits:   4   1  3    8
Forms of the src field:
Address:    8-bit address
Register:   sRI (1 bit), sr (3 bits), 0000
Immediate:  sRI (1 bit), sr (7 bits)
Register Implicit Machine: Instruction Format (cont.)
I The M bit controls the interpretation of the src field:
  I 0 – the src field is either in register direct mode (sRI = 0), or immediate mode (sRI = 1).
  I 1 – the src field is an address.
I The load and mov instructions are actually the same instruction; one with the src operand in direct mode, and the other with the src operand in either register direct or immediate mode. This instruction is referred to as the movR/M instruction.
Examples (Forms of the movR/M instruction.)
load R0, 128 ; move from memory
mov R0, R1 ; move between registers
Register Implicit Machine: Machine Language
Examples
mov R0, R1
Op-code is 0000 for movR/M, M is 0 for register direct mode, dst is 000 for R0, sRI is 0 for register direct mode, sr is 001 for R1.
0000 0 000 0 001 0000
mov R2, #4
Op-code is 0000, M is 0, dst is 010 for R2, sRI is 1 for immediate mode, sr is 0000100 for #4.
0000 0 010 1 0000100
load R1, 127
Op-code is 0000, M is 1 for direct mode, dst is 001 for R1, address is 01111111 for 127.
0000 1 001 01111111
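The three forms of the src field correspond to three small encoders. The function names are hypothetical; the bit layouts follow the format described above, and the three calls reproduce the three examples.

```python
MOVRM = 0b0000   # op-code shared by mov and load

def encode_register(op, dst, src_reg):
    # M = 0, sRI = 0, then a 3-bit register number and four padding zeros
    assert 0 <= dst < 8 and 0 <= src_reg < 8
    return (op << 12) | (0 << 11) | (dst << 8) | (0 << 7) | (src_reg << 4)

def encode_immediate(op, dst, value):
    # M = 0, sRI = 1, then a 7-bit immediate value
    assert 0 <= dst < 8 and 0 <= value < 128
    return (op << 12) | (0 << 11) | (dst << 8) | (1 << 7) | value

def encode_address(op, dst, address):
    # M = 1, then an 8-bit memory address
    assert 0 <= dst < 8 and 0 <= address < 256
    return (op << 12) | (1 << 11) | (dst << 8) | address

for word in (encode_register(MOVRM, 0, 1),      # mov R0, R1
             encode_immediate(MOVRM, 2, 4),     # mov R2, #4
             encode_address(MOVRM, 1, 127)):    # load R1, 127
    print(format(word, "016b"))
```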
Register Implicit Machine: Programming Example
mov R0, #0 ; sum = 0
store sum, R0
mov R0, #0 ; i = 0
store i, R0
lp: ; while i != x do
load R0, i
load R1, x
sub R0, R1
bz ext
load R0, sum ; sum = sum + y
load R1, y
add R0, R1
store sum, R0
load R0, i ; i = i + 1
add R0, #1
store i, R0
jump lp
ext:
Register Implicit Machine: Programming example (cont.)
Address Machine Code Assembly code
00010100 0000 0 000 1 0000000 mov R0, #0
00010101 0001 0 000 00000000 store sum, R0
00010110 0000 0 000 1 0000000 mov R0, #0
00010111 0001 0 000 00000001 store i, R0
00011000 0000 1 000 00000001 load R0, i
00011001 0000 1 001 00000010 load R1, x
00011010 0011 0 000 0 001 0000 sub R0, R1
00011011 1000 0 000 00100100 bz ext
00011100 0000 1 000 00000000 load R0, sum
00011101 0000 1 001 00000011 load R1, y
00011110 0010 0 000 0 001 0000 add R0, R1
00011111 0001 0 000 00000000 store sum, R0
00100000 0000 1 000 00000001 load R0, i
00100001 0010 0 000 1 0000001 add R0, #1
00100010 0001 0 000 00000001 store i, R0
00100011 0111 0 000 00011000 jump lp
The Accumulator Machine
For the accumulator machine, instructions have only one explicit operand; a special register, the accumulator (AC), is always a second implicit operand.
Examples
add R2 ; add R2 to AC
load 128 ; put M[128] into the AC
The Accumulator Machine (cont.)
ISA:
Assembly Code   Machine Code            Meaning
load m          0000 1 000 m            AC ← M[m]
load γ2         0000 0 000 γ2           AC ← γ2
store m         0001 1 000 m            M[m] ← AC
store R1        0001 0 000 0 R1 0000    R1 ← AC
add γ2          0010 0 000 γ2           AC ← AC + γ2, Z ← (AC + γ2) = 0
sub γ2          0011 0 000 γ2           AC ← AC − γ2, Z ← (AC − γ2) = 0
and γ2          0100 0 000 γ2           AC ← AC ∧ γ2, Z ← (AC ∧ γ2) = 0
or γ2           0101 0 000 γ2           AC ← AC ∨ γ2, Z ← (AC ∨ γ2) = 0
not             0110 0 000 00000000     AC ← ¬AC, Z ← (¬AC) = 0
jump m          0111 0 000 m            PC ← m
bz m            1000 0 000 m            if Z then PC ← m
bnz m           1001 0 000 m            if ¬Z then PC ← m
Accumulator Machine: Instruction Format
The accumulator machine uses the same format as the register implicit machine. The destination field is unused.
Field:  op  M  (unused)  src
Bits:   4   1  3         8
Value:         000
(Notice that the store instruction is capable of storing to memory, using direct mode, or a register, using register direct mode.)
Accumulator Machine: Programming Example
load #0 ; sum = 0
store sum
load #0 ; i = 0
store i
lp: ; while i != x do
load x
store R0
load i
sub R0
bz ext
load y ; sum = sum + y
store R0
load sum
add R0
store sum
load #1 ; i = i + 1
store R0
load i
add R0
store i
jump lp
ext:
Accumulator Machine: Programming Example (cont.)
Address Machine Code Assembly code
00010100 0000 0 000 1 0000000 load #0
00010101 0001 1 000 00000000 store sum
00010110 0000 0 000 1 0000000 load #0
00010111 0001 1 000 00000001 store i
00011000 0000 1 000 00000010 load x
00011001 0001 0 000 0 000 0000 store R0
00011010 0000 1 000 00000001 load i
00011011 0011 0 000 0 000 0000 sub R0
00011100 1000 0 000 00101000 bz ext
00011101 0000 1 000 00000011 load y
00011110 0001 0 000 0 000 0000 store R0
00011111 0000 1 000 00000000 load sum
00100000 0010 0 000 0 000 0000 add R0
00100001 0001 1 000 00000000 store sum
00100010 0000 0 000 1 0000001 load #1
00100011 0001 0 000 0 000 0000 store R0
00100100 0000 1 000 00000001 load i
00100101 0010 0 000 0 000 0000 add R0
00100110 0001 1 000 00000001 store i
00100111 0111 0 000 00011000 jump lp
The Stack Machine
On the stack machine, instructions have no explicit operands. All operands implicitly come off the arithmetic stack.
I Operands are pushed onto the stack.
I Operators pop their operands off of the stack, and push their results onto the stack.
Examples
3 × (4 + 5)
[Figure: stack contents, bottom to top, after each instruction: push #3 gives 3; push #4 gives 3, 4; push #5 gives 3, 4, 5; add gives 3, 9; mult gives 27.]
The Stack Machine (cont.)
Examples (Arithmetic example)
push #3
push #4
push #5
add
mult
Although some instructions have operands, the arithmetic operations have no operands.
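The five-instruction example can be traced with a Python list as the arithmetic stack. The function name run_stack is hypothetical, and only the push, add, and mult instructions used in the example are handled.

```python
def run_stack(program):
    """Execute a list of stack machine instructions and return the final stack."""
    stack = []
    for instruction in program:
        name, *args = instruction.split()
        if name == "push":
            stack.append(int(args[0].lstrip("#")))    # push an immediate value
        elif name == "add":
            stack.append(stack.pop() + stack.pop())   # pop two operands, push the sum
        elif name == "mult":
            stack.append(stack.pop() * stack.pop())   # pop two operands, push the product
        else:
            raise ValueError(f"unknown instruction: {name}")
    return stack

print(run_stack(["push #3", "push #4", "push #5", "add", "mult"]))
```

The operators name no operands at all: the position of each value on the stack determines which operator consumes it.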
The Stack Machine (cont.)
Assembly Code   Machine Code         Meaning
push m          0000 0 000 m         push(M[m])
push i          0000 1 000 i         push(i)
pop m           0001 0 000 m         M[m] ← pop
add             0010 000000000000    push(pop1 + pop2)
sub             0011 000000000000    push(pop1 − pop2)
and             0100 000000000000    push(pop1 ∧ pop2)
or              0101 000000000000    push(pop1 ∨ pop2)
not             0110 000000000000    push(¬pop)
jump            0111 000000000000    PC ← pop
bz              1000 000000000000    if pop1 = 0 then PC ← pop2
bnz             1001 000000000000    if pop1 ≠ 0 then PC ← pop2
Stack Machine: Instruction Format
I There are two types of instructions: 0-operand, and 1-operand instructions.
I Both types of instruction fit into the same format as for the accumulator machine.
I A 0-operand instruction leaves the operand field blank.
I pop and push are used to transfer data from, and to, the stack, respectively.
I Notation: Subscripts in the table on the pop operation indicate order. (pop1 is the first pop, and pop2 is the second pop.)
I The Z flag is no longer used; the bz and bnz instructions now check their first operand (arithmetic branch).
Stack Machine: Programming Example
push #0 ; sum = 0
pop sum
push #0 ; i = 0
pop i
lp: ; while i != x do
push #ext
push x
push i
sub
bz
push y ; sum = sum + y
push sum
add
pop sum
push #1 ; i = i + 1
push i
add
pop i
push #lp
jump
ext:
Stack Machine: Programming Example (cont.)
Address Machine Code Assembly code
00010100 0000 1 000 00000000 push #0
00010101 0001 0 000 00000000 pop sum
00010110 0000 1 000 00000000 push #0
00010111 0001 0 000 00000001 pop i
00011000 0000 1 000 00100111 push #ext
00011001 0000 0 000 00000010 push x
00011010 0000 0 000 00000001 push i
00011011 0011 0 000 00000000 sub
00011100 1000 0 000 00000000 bz
00011101 0000 0 000 00000011 push y
00011110 0000 0 000 00000000 push sum
00011111 0010 0 000 00000000 add
00100000 0001 0 000 00000000 pop sum
00100001 0000 1 000 00000001 push #1
00100010 0000 0 000 00000001 push i
00100011 0010 0 000 00000000 add
00100100 0001 0 000 00000001 pop i
00100101 0000 1 000 00011000 push #lp
00100110 0111 0 000 00000000 jump
ISA Design Issues
I Number of registers
  I The more registers, the more operands that can be held in the processor without reading from memory, decreasing operand fetch latency.
  I However, the more registers in the processor, the bigger the processor circuit, making it slower.
I Word size
  I With a large word size, the machine can accommodate large data values, and it is easier to fit all of the fields of a machine instruction into a single word.
  I However, a large word size increases the size of registers and memory, slowing them down. Also, for small data, many of the word bits will be wasted.
ISA Design Issues (cont.)
I Variable or fixed length instructions
  I By using variable length instructions, you can better accommodate a varying number of operands on the machine.
  I However, a variable number of words is harder to fetch, and the circuitry is more complex than that needed if all of the instructions fit in a single word.
I Memory access
  I Allowing all instructions to fetch operands from memory eliminates the need to prefetch direct mode operands.
  I However, data manipulation instructions must be slowed down to allow time for the memory fetch. Also, it is a problem to fit memory addresses in instructions with several operands.
ISA Design Issues (cont.)
I Orthogonality – an instruction set is orthogonal if there is only one way to do any operation.
Example: if a machine has the two instructions
inc R2
add R2, #1
then these two instructions are not orthogonal, because an increment can be done with an addition instruction.
I Completeness – an instruction set is complete if every operation the user requires is in the ISA.
I Orthogonality and completeness are often at odds: completeness leads to large ISAs, and orthogonality tends to restrict the size of the ISA. Orthogonality can reduce the size of the processor, speeding up the machine. Completeness produces an instruction set that is easier to use.
ISA Design Issues (cont.)
I RISC (reduced instruction set computer) – Computers classified as RISC have small instruction sets, with simple instructions. They tend to have instruction sets that are orthogonal, but harder to use (incomplete).
I CISC (complex instruction set computer) – Computers classified as CISC have large instruction sets, with complex instructions that combine operations. Their instruction set is complete, but sacrifices orthogonality.
ISA Design Issues (cont.)
Architecture
I As we move from 3-operand to 0-operand machines, more instructions are needed to perform a programming task.
I However, instructions have more implicit operands, operands that are in fixed locations, and the fetching hardware becomes simpler. So, each instruction may execute faster.
I However, we need more memory fetches.
I Which architecture is best depends on a complex set of factors.
The BRIM Machine
The BRIM (basic register implicit machine) is the machine we use throughout the book. It is the register implicit machine presented with a single I/O device and instructions to use it.
Assembly Code Machine Code Meaning
in R 1010 0 R 00000000 R ← Din
out γ 1011 0 000 γ Dout ← γ
Chapter 7: Hardwired CPU Design
I We construct the processor for the BRIM machine, and adapt it to other architectures.
I Processor design types:
  I Hardwired control.
  I Micro-programmed control.
I For hardwired design, the processor is designed as a sequential circuit, at the RTL level, building the data-path, and the control unit.
I For micro-programmed control, the processor is structured as a smaller processor (the sequencer), executing micro-instructions to perform machine instruction operations.
I We start with hardwired control.
Register Implicit Machine Design
I We use the bus-based architecture.
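I The BRIM machine has several registers, numbered 0 through 7. These are collected in a register file.
[Figure: the register file (Reg), with a 3-bit address input A, a 16-bit data input Din, a 16-bit data output Dout, and the control inputs LD (load) and E (enable).]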
Computer Organization: Basic Processor Structure
Bus-Based Architecture
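[Figure: the bus-based data-path. The devices attached to the 16-bit bus are the register file (Reg), the I/O device, PC, DR, SR, the ALU, the RAM, IR, AR, the Z flag, and the Mode MUX. The register file address is chosen between IR10-8 and IR6-4 by SRF. Control lines shown: RLD, RE, SRF, MEMW, ME, IOLD, IOE, PCLD, PCIN, SPC, DRLD, SRLD, ALUOP, SALU, IRLD, ARLD, ZLD, SAD, SS, SD. Status lines shown: IRX (IR15-0) and ZX. Data connections are 16 bits wide; address connections (PC, AR, IR7-0) are 8 bits wide.]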
Computer Organization: Basic Processor Structure
Bus-Based Architecture (cont.)
I The figure shows the data bus.
I We have eliminated the address bus and bus addressing. This is done by expanding the control bus to include dedicated read/write control lines for each device.
[Figure: a bus with a single write line (Wt) and bus addressing (A), versus a bus with dedicated write lines (Wt0, Wt1), each connecting data bus D to devices D0 and D1.]
Computer Organization: Basic Processor Structure
Bus-Based Architecture (cont.)
I The diagram contains a mix of devices: some internal to the processor, like registers, and some external, like the memory unit.
I The control unit (CU) is not shown. It sends the control signals to the bus.
I Bus data is blocked from entering a register by the register load line.
I Register data is blocked from entering the bus by a tri-state switch.
I Bus connections are 16-bit, to transfer data, and 8-bit, to transfer addresses. For addresses, only the lower half of the bus connection is used.
Computer Organization: Basic Processor Structure
Bus-Based Architecture (cont.)
The bus diagram contains an I/O device, a memory unit, a register file, and 6 processor registers.
I The Program Counter (PC): contains the address of the next instruction to be executed. (8 bits)
I The Destination Register (DR): the destination operand from the current instruction. (16 bits)
I The Source Register (SR): the source operand from the current instruction. (16 bits)
I The Instruction Register (IR): the current instruction, after it has been fetched from memory. (16 bits)
I The Address Register (AR): used to address the RAM unit. It is hardwired to the RAM address port. (8 bits)
I The Zero status flag (Z): set when the result of an ALU operation is zero. (1 bit)
Computer Organization: Basic Processor Structure
Bus-Based Architecture (cont.)
Verifying that all instructions can be executed.
I Destination. The destination comes over the bus, into the register file. The register file must be addressable using the dst field (IR bits 8 through 10).
I Source. The source comes from a register, an immediate value, or memory. The register file must be addressable by the src field (IR bits 4 through 6). The memory unit must be addressable by the IR bits 0 through 7. It must also be possible to put an immediate value, IR bits 0 through 6, on to the bus.
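The field positions listed above can be sketched in Python. This is only an illustration: the function name and the dictionary keys are hypothetical, and the op-code position (IR bits 12 through 15) is taken from the control unit diagram later in the chapter.

```python
def decode_fields(ir: int) -> dict:
    """Split a 16-bit instruction word into the bit fields named on the slides."""
    return {
        "op": (ir >> 12) & 0xF,   # IR bits 15-12: op-code
        "dst": (ir >> 8) & 0x7,   # IR bits 10-8: destination register number
        "src": (ir >> 4) & 0x7,   # IR bits 6-4: source register number
        "addr": ir & 0xFF,        # IR bits 7-0: memory address
        "imm": ir & 0x7F,         # IR bits 6-0: immediate value
    }


if __name__ == "__main__":
    # An arbitrary bit pattern, chosen only to exercise every field.
    print(decode_fields(0b0010_0011_0101_0000))
```

The address and immediate fields overlap the src field, which is why the data-path needs separate paths for each interpretation of the same bits.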
Computer Organization: Basic Processor Structure
Bus-Based Architecture (cont.)
I It must be possible to load the DR, and SR registers from the bus, for the arithmetic and logic instructions.
I For the jump instructions, it must be possible to load the PC off of the bus.
I The Z flag is calculated with a NOR gate. The ALU is used to do arithmetic and logic operations.
I Connections must be present to the I/O device.
Computer Organization: Basic Processor Structure
Bus-Based Architecture (cont.)
Bus control lines:
1. Register file load (RLD)
2. Write to memory (MW)
3. I/O load (IOLD)
4. Load program counter (PCLD)
5. Load destination register (DRLD)
6. Load source register (SRLD)
7. Load instruction register (IRLD)
8. Load address register (ARLD)
9. Load Z flag (ZLD)
10. Register file enable (RE)
11. Enable memory (ME)
12. Enable I/O device (IOE)
13. Select program counter (SPC)
14. Select arithmetic logic unit (SALU)
15. Select instruction address (SAD)
16. Select register file address (SRF)
17. Select source operand (SS)
18. Select destination operand (SD)
19. Increment program counter (PCIN)
20. ALU op-code; 3-bit (ALUOP)
Computer Organization: Basic Processor Structure
Bus-Based Architecture (cont.)
Bus status lines:
1. Instruction register contents; 16-bit (IRX)
2. Z flag contents (ZX)
The CU needs access to the Z flag, and the IR register. These are supplied by the status lines.
Components we have not covered:
I The ALU.
I The Mode MUX (selects the addressing mode).
Computer Organization: Basic Processor Structure
The ALU
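A simple add-AND ALU. A MUX chooses which computational unit produces the result.
[Figure: a simple ALU with inputs A and B, an adder (+) and an AND unit, and a MUX controlled by ALUOP (0 or 1) that selects the output Z.]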
Computer Organization: Basic Processor Structure
The ALU (cont.)
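The BRIM ALU. It has two units: an arithmetic unit (AU), and a logic unit (LU).
[Figure: the BRIM ALU. The 16-bit inputs A and B feed an adder (the AU) and a four-option logic unit (the LU, options 0 through 3). ALUOP0 and ALUOP1 control the units, and ALUOP2 controls the output MUX that produces the 16-bit result Z.]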
Computer Organization: Basic Processor Structure
The ALU (cont.)
The bits of the ALUOP are used to control the output MUX, the AU, and the LU.
ALUOP = Unit# (ALUOP2), Option# (ALUOP1, ALUOP0)
Unit 1 - LU options:
  00 - Identity
  01 - NOT
  10 - OR
  11 - AND
Unit 0 - AU options (ALUOP1 = SelectB, ALUOP0 = Cin):
  00 - Add
  11 - Sub
Computer Organization: Basic Processor Structure
The ALU (cont.)
I ALUOP2 specifies the unit number: 0 for AU, 1 for LU.
I If the LU is being used, bits ALUOP1 and ALUOP0 give the operation number: 0 for identity, 1 for complement, 2 for OR, and 3 for AND.
I If the AU is used, ALUOP1 selects the B operand: 0 for just B, and 1 for the one's complement of B.
I If the AU is used, ALUOP0 gives the carry-in to the adder.
Examples
Addition: ALUOP = 000.
Subtraction (add the two's complement): ALUOP = 011.
OR: ALUOP = 110.
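The ALUOP encoding described above can be modeled behaviorally. The following Python function is a minimal sketch, not the circuit itself; the name alu and the choice to apply the one-operand LU options (identity, NOT) to input A are assumptions made for illustration.

```python
MASK = 0xFFFF  # the data-path is 16 bits wide


def alu(aluop: int, a: int, b: int) -> int:
    """Behavioral model of a 3-bit ALUOP: bit 2 picks the unit, bits 1-0 the option."""
    unit = (aluop >> 2) & 1
    option = aluop & 0b11
    if unit == 1:  # logic unit
        if option == 0b00:
            return a & MASK          # identity (assumed to pass input A)
        if option == 0b01:
            return ~a & MASK         # NOT (assumed to complement input A)
        if option == 0b10:
            return (a | b) & MASK    # OR
        return (a & b) & MASK        # AND
    # arithmetic unit: ALUOP1 selects B or its one's complement, ALUOP0 is the carry-in
    select_b = (aluop >> 1) & 1
    carry_in = aluop & 1
    operand = (~b & MASK) if select_b else (b & MASK)
    return (a + operand + carry_in) & MASK


if __name__ == "__main__":
    print(alu(0b000, 5, 3))            # addition
    print(alu(0b011, 5, 3))            # subtraction: A + ~B + 1
    print(bin(alu(0b110, 0b1100, 0b1010)))  # OR
```

Subtraction needs no separate circuit: setting both SelectB and Cin turns the adder into A + (two's complement of B).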
Computer Organization: Basic Processor Structure
The Mode MUX
BRIM addressing modes:
I Direct – So few instructions use direct mode, that it can be implemented by the control circuits.
I Register direct – handled by the data-path mode MUX.
I Immediate – handled by the data-path mode MUX.
The mode MUX delivers either a register value (register direct) or an immediate value, for the source operand, on to the bus. It delivers a register value on to the bus, for the destination operand.
Computer Organization: Basic Processor Structure
The Mode MUX: Structure
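[Figure: the mode MUX. Its inputs are the 16-bit register file output (Reg) and the IR register. Tri-state switches controlled by SS, SD, and IR7 place either the register value or the immediate value IR6-0, extended with zeros to 16 bits, on the bus.]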
Computer Organization: Basic Processor Structure
The Mode MUX: Structure (cont.)
I Inputs:
  I Register file
  I IR register (the whole machine instruction)
I The output is sent to the bus.
I Tri-state switches choose an option.
I Options:
  I A register direct value from the register file. Two switches generate this option: one for the source, and one for the destination.
  I An immediate value from the machine instruction, extended to 16 bits.
I Source register direct is chosen when the SS line is on, and the mode bit (IR7) is 0.
I Source immediate value is chosen when the SS line is on, and the mode bit (IR7) is 1.
I Destination register direct is chosen when the SD line is on.
Computer Organization: Basic Processor Structure
The RIM Control Unit
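[Figure: the CU, connected to the sequencer and the data-path. The sequencer supplies 8 inputs to the CU and receives 2 control signals from it; the data-path supplies 5 inputs to the CU and receives 22 control signals, for 24 CU outputs in all.]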
Inputs to CU:
I 8 timing signals from the sequencer.
I 5 signals from the data-path: IRX0−3, and ZX.
Outputs: 24 bits: 22 bus control signals, and 2 sequencer control signals.
Computer Organization: Basic Processor Structure
The CU: The Machine Cycle
Fetching an Instruction
AR ← PC
IR ← M[AR], PC ← PC + 1
Decoding an Instruction
Decoding is a CU operation, not a data-path operation.
Executing an Instruction
Varies from instruction to instruction.
Computer Organization: Basic Processor Structure
The CU: Executing Instructions
Executing the movR/M Instruction
I Direct Mode
  AR ← IR7−0
  SR ← M[AR]
I Register Direct, or Immediate Mode
  SR ← if IR7 then IR6−0 else R[IR6−4]
Executing the store Instruction
AR ← IR7−0
M[AR] ← R[IR10−8]
Computer Organization: Basic Processor Structure
The CU: Executing Instructions (cont.)
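Executing the Arithmetic and Logic Instructions
Operand Fetch:
SR ← if IR7 then IR6−0 else R[IR6−4]
DR ← R[IR10−8]
I Add.
$R[IR_{10-8}] \leftarrow DR + SR,\quad Z \leftarrow \overline{\bigvee_{i=0}^{15} (DR + SR)_i}$
I Sub.
$R[IR_{10-8}] \leftarrow DR - SR,\quad Z \leftarrow \overline{\bigvee_{i=0}^{15} (DR - SR)_i}$
I AND.
$R[IR_{10-8}] \leftarrow DR \wedge SR,\quad Z \leftarrow \overline{\bigvee_{i=0}^{15} (DR \wedge SR)_i}$
I OR.
$R[IR_{10-8}] \leftarrow DR \vee SR,\quad Z \leftarrow \overline{\bigvee_{i=0}^{15} (DR \vee SR)_i}$
I NOT.
$R[IR_{10-8}] \leftarrow \overline{DR},\quad Z \leftarrow \overline{\bigvee_{i=0}^{15} \overline{DR}_i}$
(In each case Z is the NOR of the 16 result bits, so Z is 1 exactly when the result is zero.)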
Computer Organization: Basic Processor Structure
The CU: Executing Instructions (cont.)
Executing the Branch Instructions
(They differ only in the situation under which the µ-instructions are performed.)
PC ← IR7−0
Executing the I/O Instructions
I In.
  R[IR10−8] ← Din
I Out.
  Dout ← if IR7 then IR6−0 else R[IR6−4]
Computer Organization: Basic Processor Structure
The CU Behavioral Description
Overall structure of the CU:
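[Figure: overall structure of the CU. An op decoder (Dec) turns the op-code bits IRX15-12 from the data-path into the lines OP0-15. A time decoder (Dec) turns the 3-bit sequencer counter C into the timing lines T7-0. The control block combines OP0-15, T7-0, the IR status lines, and ZX to drive the 22-bit control bus to the data-path and the 2 sequencer inputs IN and CL.]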
Computer Organization: Basic Processor Structure
The CU Behavioral Description (cont.)
Flow-chart for the CU:
[Figure: the CU flow-chart. Stage F0 (AR <- PC) and stage F1 (IR <- M[AR], PC <- PC + 1) are followed by the decode stage Dec, which branches on IR15-12. Op-code 0 leads to M2 (AR <- IR7-0), then, depending on IR11, to MD3 (SR <- M[AR]) or MRD3 (SR <- IR7 ? IR6-0 : R[IR6-4]), then to M4 (R[IR10-8] <- SR). Op-code 1 leads to St2 (AR <- IR7-0) and St3 (M[AR] <- R[IR10-8]). Op-codes 2 through 6 lead to the three-stage sequences Ad, Sb, An, Or, and N: stage 2 is DR <- R[IR10-8], stage 3 is SR <- IR7 ? IR6-0 : R[IR6-4], and stage 4 writes the ALU result to R[IR10-8] and sets Z. Op-codes 7 and 8 lead to the conditional branches Bz and Bn (PC <- IR7-0, depending on Z). Op-code 9 leads to J2 (PC <- IR7-0), op-code 10 to I2 (R[IR10-8] <- Din), and op-code 11 to Ot2 (Dout <- IR7 ? IR6-0 : R[IR6-4]). Every path returns to F0.]
Computer Organization: Basic Processor Structure
The CU: Flow-Chart
I Each square node contains a µ-instruction. Each diamond contains a control decision.
I Subscripting on node names indicates the clock cycle in the machine cycle on which that µ-instruction is performed.
I Fetch is performed in stages F0, and F1.
I Decode is done in stage Dec. Control branches to one of several nodes, based on the value of the op-code.
I Branch 0 implements the movR/M instruction. A sub-branch handles either direct addressing mode or register/immediate mode.
Computer Organization: Basic Processor Structure
The CU: Flow-Chart (cont.)
I Branch 1 handles the store instruction.
I Branches 2 through 6 implement the 5 ALU instructions.
I Branches 7 and 8 implement the conditional branches, bz and bnz. Each branch contains a sub-branch that either causes the branch, or does nothing, based on the value of the Z flag.
I Branch 9 performs a branch for the jump instruction.
I Branches 10 and 11 perform the µ-code for the in, and out instructions.
I After executing a machine language instruction, the flow-chart returns to stage F0 to begin the next instruction.
Computer Organization: Basic Processor Structure
The CU: Stage Control
I The CU knows what stage it is in, by the values of its control inputs: the op-code, indicated by the inputs OP15−0, from the op-decoder; and the timing step, indicated by the inputs T7−0, from the sequencer.
I The stage is updated by changing the value of the sequencer counter.
I For square nodes in the flow-chart, the counter, C, is either incremented to the next stage in a sequence, or cleared, to send control back to the stage F0.
Computer Organization: Basic Processor Structure
The CU: Stage Control (cont.)
Phase | Control Signals | Sequencer Control
F0 | T0 | C ← C + 1
F1 | T1 | C ← C + 1
M2 | T2 · OP0 | C ← C + 1
St2 | T2 · OP1 | C ← C + 1
Ad2 | T2 · OP2 | C ← C + 1
Sb2 | T2 · OP3 | C ← C + 1
An2 | T2 · OP4 | C ← C + 1
Or2 | T2 · OP5 | C ← C + 1
N2 | T2 · OP6 | C ← C + 1
Bz2 | T2 · OP7 · ZX |
Bn2 | T2 · OP8 · ¬ZX |
Bz2 ∨ ¬Bz2 | T2 · OP7 | C ← 0
Bn2 ∨ ¬Bn2 | T2 · OP8 | C ← 0
J2 | T2 · OP9 | C ← 0
I2 | T2 · OP10 | C ← 0
Ot2 | T2 · OP11 | C ← 0
MD3 | T3 · OP0 · IRX11 | C ← C + 1
MRD3 | T3 · OP0 · ¬IRX11 | C ← C + 1
St3 | T3 · OP1 | C ← 0
Ad3 | T3 · OP2 | C ← C + 1
Sb3 | T3 · OP3 | C ← C + 1
An3 | T3 · OP4 | C ← C + 1
Or3 | T3 · OP5 | C ← C + 1
N3 | T3 · OP6 | C ← C + 1
M4 | T4 · OP0 | C ← 0
Ad4 | T4 · OP2 | C ← 0
Sb4 | T4 · OP3 | C ← 0
An4 | T4 · OP4 | C ← 0
Or4 | T4 · OP5 | C ← 0
N4 | T4 · OP6 | C ← 0
(Bz2 ∨ ¬Bz2, and Bn2 ∨ ¬Bn2 indicate actions that are taken as part of a conditional branch, whether or not the branch is taken.)
Computer Organization: Basic Processor Structure
The CU: Full Control Specification
Phase | Micro-instruction | Control Output
F0 | T0 : AR ← PC, C ← C + 1 | ARLD, SPC, CIN
F1 | T1 : IR ← M[AR], PC ← PC + 1, C ← C + 1 | IRLD, ME, PCIN, CIN
M2 | T2 · OP0 : AR ← IR7−0, C ← C + 1 | ARLD, SAD, CIN
MD3 | T3 · OP0 · IRX11 : SR ← M[AR], C ← C + 1 | SRLD, ME, CIN
MRD3 | T3 · OP0 · ¬IRX11 : SR ← if IR7 then IR6−0 else R[IR6−4], C ← C + 1 | SRLD, RE, SS, CIN
M4 | T4 · OP0 : R[IR10−8] ← SR, C ← 0 | RLD, SRF, SALU, CCL, ALUOP = 100
St2 | T2 · OP1 : AR ← IR7−0, C ← C + 1 | ARLD, SAD, CIN
St3 | T3 · OP1 : M[AR] ← R[IR10−8], C ← 0 | MW, RE, SRF, SD, CCL
Ad2 | T2 · OP2 : DR ← R[IR10−8], C ← C + 1 | DRLD, RE, SD, SRF, CIN
Ad3 | T3 · OP2 : SR ← if IR7 then IR6−0 else R[IR6−4], C ← C + 1 | SRLD, RE, SS, CIN
Computer Organization: Basic Processor Structure
The CU: Full Control Specification (cont.)
Phase | Micro-instruction | Control Output
Ad4 | T4 · OP2 : R[IR10−8] ← SR + DR, Z ← (SR + DR) = 0, C ← 0 | RLD, SRF, SALU, ZLD, ALUOP = 000, CCL
Sb2 | T2 · OP3 : DR ← R[IR10−8], C ← C + 1 | DRLD, RE, SD, SRF, CIN
Sb3 | T3 · OP3 : SR ← if IR7 then IR6−0 else R[IR6−4], C ← C + 1 | SRLD, RE, SS, CIN
Sb4 | T4 · OP3 : R[IR10−8] ← SR − DR, Z ← (SR − DR) = 0, C ← 0 | RLD, SRF, SALU, ZLD, ALUOP = 011, CCL
An2 | T2 · OP4 : DR ← R[IR10−8], C ← C + 1 | DRLD, RE, SD, SRF, CIN
An3 | T3 · OP4 : SR ← if IR7 then IR6−0 else R[IR6−4], C ← C + 1 | SRLD, RE, SS, CIN
An4 | T4 · OP4 : R[IR10−8] ← SR ∧ DR, Z ← (SR ∧ DR) = 0, C ← 0 | RLD, SRF, SALU, ZLD, ALUOP = 111, CCL
Or2 | T2 · OP5 : DR ← R[IR10−8], C ← C + 1 | DRLD, RE, SD, SRF, CIN
Computer Organization: Basic Processor Structure
The CU: Full Control Specification (cont.)
Phase | Micro-instruction | Control Output
Or3 | T3 · OP5 : SR ← if IR7 then IR6−0 else R[IR6−4], C ← C + 1 | SRLD, RE, SS, CIN
Or4 | T4 · OP5 : R[IR10−8] ← SR ∨ DR, Z ← (SR ∨ DR) = 0, C ← 0 | RLD, SRF, SALU, ZLD, ALUOP = 110, CCL
N2 | T2 · OP6 : DR ← R[IR10−8], C ← C + 1 | DRLD, RE, SD, SRF, CIN
N3 | T3 · OP6 : SR ← if IR7 then IR6−0 else R[IR6−4], C ← C + 1 | SRLD, RE, SS, CIN
N4 | T4 · OP6 : R[IR10−8] ← ¬DR, Z ← (¬DR) = 0, C ← 0 | RLD, SRF, SALU, ZLD, ALUOP = 101, CCL
Bz2 | T2 · OP7 · ZX : PC ← IR7−0 | PCLD, SAD
Bz2 ∨ ¬Bz2 | T2 · OP7 : C ← 0 | CCL
Bn2 | T2 · OP8 · ¬ZX : PC ← IR7−0 | PCLD, SAD
Bn2 ∨ ¬Bn2 | T2 · OP8 : C ← 0 | CCL
Computer Organization: Basic Processor Structure
The CU: Full Control Specification (cont.)
Phase | Micro-instruction | Control Output
J2 | T2 · OP9 : PC ← IR7−0, C ← 0 | PCLD, SAD, CCL
I2 | T2 · OP10 : R[IR10−8] ← Din, C ← 0 | RLD, SRF, IOE, CCL
Ot2 | T2 · OP11 : Dout ← if IR7 then IR6−0 else R[IR6−4], C ← 0 | SS, RE, IOLD, CCL
The table gives the stage, the µ-instruction performed, and the control signals output by the CU to realize the µ-instruction.
Equations for the outputs can be derived from the table by copying out inputs for rows where the output signal occurs.
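The derivation rule just stated (collect the input condition of every row in which a signal appears, and OR them together) can be sketched in Python. Only a few rows of the table are included, the names ROWS and output_equation are hypothetical, and the notation IRX11' for the complement of IRX11 is an assumption of this sketch.

```python
# A few rows of the control specification: (stage, input condition, control outputs).
# "IRX11'" stands for the complement of IRX11 (notation assumed for this sketch).
ROWS = [
    ("F0",   "T0",            {"ARLD", "SPC", "CIN"}),
    ("F1",   "T1",            {"IRLD", "ME", "PCIN", "CIN"}),
    ("M2",   "T2·OP0",        {"ARLD", "SAD", "CIN"}),
    ("MD3",  "T3·OP0·IRX11",  {"SRLD", "ME", "CIN"}),
    ("MRD3", "T3·OP0·IRX11'", {"SRLD", "RE", "SS", "CIN"}),
    ("St2",  "T2·OP1",        {"ARLD", "SAD", "CIN"}),
    ("St3",  "T3·OP1",        {"MW", "RE", "SRF", "SD", "CCL"}),
]


def output_equation(signal, rows=ROWS):
    """Return the sum-of-products formula for one control signal."""
    terms = [condition for _stage, condition, outputs in rows if signal in outputs]
    return " + ".join(terms)


if __name__ == "__main__":
    for name in ("ARLD", "ME", "MW"):
        print(name, "=", output_equation(name))
```

The formulas produced this way are unsimplified; factoring out common timing signals, as in T0 + T2(OP0 + OP1), is a separate algebraic step.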
Computer Organization: Basic Processor Structure
The Control Circuitry
Output Signal | Formula
RLD | T2 · OP10 + T4(OP0 + OP2 + OP3 + OP4 + OP5 + OP6)
MW | T3 · OP1
IOLD | T2 · OP11
PCLD | T2(OP7 · ZX + OP8 · ¬ZX + OP9)
DRLD | T2(OP2 + OP3 + OP4 + OP5 + OP6)
SRLD | T3(OP0 + OP2 + OP3 + OP4 + OP5 + OP6)
IRLD | T1
ARLD | T0 + T2(OP0 + OP1)
ZLD | T4(OP2 + OP3 + OP4 + OP5 + OP6)
RE | T2(OP2 + OP3 + OP4 + OP5 + OP6 + OP11) + T3(OP0 · ¬IRX11 + OP1 + OP2 + OP3 + OP4 + OP5 + OP6)
ME | T1 + T3 · OP0 · IRX11
IOE | T2 · OP10
SPC | T0
SALU | T4(OP0 + OP2 + OP3 + OP4 + OP5 + OP6)
SAD | T2(OP0 + OP1 + OP7 · ZX + OP8 · ¬ZX + OP9)
SRF | T2(OP2 + OP3 + OP4 + OP5 + OP6 + OP10) + T3 · OP1 + T4(OP0 + OP2 + OP3 + OP4 + OP5 + OP6)
SS | T2 · OP11 + T3(OP0 · ¬IRX11 + OP2 + OP3 + OP4 + OP5 + OP6)
SD | T2(OP2 + OP3 + OP4 + OP5 + OP6) + T3 · OP1
PCIN | T1
ALUOP0 | T4(OP3 + OP4 + OP6)
ALUOP1 | T4(OP3 + OP4 + OP5)
ALUOP2 | T4(OP0 + OP4 + OP5 + OP6)
CIN | T0 + T1 + T2(OP0 + OP1 + OP2 + OP3 + OP4 + OP5 + OP6) + T3(OP0 + OP2 + OP3 + OP4 + OP5 + OP6)
CCL | T2(OP7 + OP8 + OP9 + OP10 + OP11) + T3 · OP1 + T4(OP0 + OP2 + OP3 + OP4 + OP5 + OP6)
Computer Organization: Basic Processor Structure
The Control Circuitry (cont.)
Examples
SRLD occurs in stages MD3, MRD3, Ad3, Sb3, An3, Or3, and N3.
This corresponds to input signals T3 · OP0 · IRX11, T3 · OP0 · ¬IRX11, T3 · OP2, T3 · OP3, T3 · OP4, T3 · OP5, and T3 · OP6. The first two terms combine to T3 · OP0, giving SRLD = T3(OP0 + OP2 + OP3 + OP4 + OP5 + OP6).
Computer Organization: Basic Processor Structure
Control for the Register Machine
Data-path modification.
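[Figure: the modified data-path for the register machine. A three-input register addressing MUX, controlled by the 2-bit SRF line, selects the register file address from IR2-0 (input 0), IR6-4 (input 1), or IR10-8 (input 2). The Mode MUX is controlled by SD, SS1, and SS2.]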
Computer Organization: Basic Processor Structure
Control for the Register Machine (cont.)
Data-path modifications:
I The register addressing MUX allows register numbers from the dst field, the src1 field, or the src2 field.
I The mode MUX now selects between the register direct destination (SD), either an immediate value or a register direct operand for the first source (SS1), or an immediate value or register direct operand for the second source (SS2).
Computer Organization: Basic Processor Structure
Control for the Register Machine (cont.)
Phase | Micro-instruction | Control Output
F0 | T0 : AR ← PC, C ← C + 1 | ARLD, SPC, CIN
F1 | T1 : IR ← M[AR], PC ← PC + 1, C ← C + 1 | IRLD, ME, PCIN, CIN
L2 | T2 · OP0 : AR ← IR7−0, C ← C + 1 | ARLD, SAD, CIN
L3 | T3 · OP0 : SR ← M[AR], C ← C + 1 | SRLD, ME, CIN
L4 | T4 · OP0 : R[IR10−8] ← SR, C ← 0 | RLD, SALU, SRF = 10, CCL, ALUOP = 100
St2 | T2 · OP1 : AR ← IR7−0, C ← C + 1 | ARLD, SAD, CIN
St3 | T3 · OP1 : M[AR] ← R[IR10−8], C ← 0 | MW, RE, SRF = 10, SD, CCL
Ad2 | T2 · OP2 : DR ← R[IR6−4], C ← C + 1 | DRLD, RE, SRF = 01, SS1, CIN
Ad3 | T3 · OP2 : SR ← R[IR2−0], C ← C + 1 | SRLD, RE, SRF = 00, SS2, CIN
Ad4 | T4 · OP2 : R[IR10−8] ← SR + DR, Z ← (SR + DR) = 0, C ← 0 | RLD, SRF = 10, SALU, ZLD, CCL, ALUOP = 000
Computer Organization: Basic Processor Structure
Control for the Register Machine (cont.)
Phase | Micro-instruction | Control Output
Sb2 | T2 · OP3 : DR ← R[IR6−4], C ← C + 1 | DRLD, RE, SRF = 01, SS1, CIN
Sb3 | T3 · OP3 : SR ← R[IR2−0], C ← C + 1 | SRLD, RE, SRF = 00, SS2, CIN
Sb4 | T4 · OP3 : R[IR10−8] ← SR − DR, Z ← (SR − DR) = 0, C ← 0 | RLD, SRF = 10, SALU, ZLD, CCL, ALUOP = 011
An2 | T2 · OP4 : DR ← R[IR6−4], C ← C + 1 | DRLD, RE, SRF = 01, SS1, CIN
An3 | T3 · OP4 : SR ← R[IR2−0], C ← C + 1 | SRLD, RE, SRF = 00, SS2, CIN
An4 | T4 · OP4 : R[IR10−8] ← SR ∧ DR, Z ← (SR ∧ DR) = 0, C ← 0 | RLD, SRF = 10, SALU, ZLD, CCL, ALUOP = 111
Or2 | T2 · OP5 : DR ← R[IR6−4], C ← C + 1 | DRLD, RE, SRF = 01, SS1, CIN
Or3 | T3 · OP5 : SR ← R[IR2−0], C ← C + 1 | SRLD, RE, SRF = 00, SS2, CIN
Or4 | T4 · OP5 : R[IR10−8] ← SR ∨ DR, Z ← (SR ∨ DR) = 0, C ← 0 | RLD, SRF = 10, SALU, ZLD, CCL, ALUOP = 110
Computer Organization: Basic Processor Structure
Control for the Register Machine (cont.)
Phase | Micro-instruction | Control Output
N2 | T2 · OP6 : DR ← R[IR6−4], C ← C + 1 | DRLD, RE, SRF = 01, SS1, CIN
N3 | T3 · OP6 : SR ← R[IR2−0], C ← C + 1 | SRLD, RE, SRF = 00, SS2, CIN
N4 | T4 · OP6 : R[IR10−8] ← ¬DR, Z ← (¬DR) = 0, C ← 0 | RLD, SRF = 10, SALU, ZLD, CCL, ALUOP = 101
Bz2 | T2 · OP8 · ZX : PC ← IR7−0 | PCLD, SAD
Bz2 ∨ ¬Bz2 | T2 · OP8 : C ← 0 | CCL
Bn2 | T2 · OP9 · ¬ZX : PC ← IR7−0 | PCLD, SAD
Bn2 ∨ ¬Bn2 | T2 · OP9 : C ← 0 | CCL
J2 | T2 · OP7 : PC ← IR7−0, C ← 0 | PCLD, SAD, CCL
Computer Organization: Basic Processor Structure
Control for the Accumulator Machine
Data-path modification: The same data-path used for the register implicit machine is used for the accumulator machine, except that the SR register is renamed the AC.
Phase | Micro-instruction | Control Output
F0 | T0 : AR ← PC, C ← C + 1 | ARLD, SPC, CIN
F1 | T1 : IR ← M[AR], PC ← PC + 1, C ← C + 1 | IRLD, ME, PCIN, CIN
L2 | T2 · OP0 : AR ← IR7−0, C ← C + 1 | ARLD, SAD, CIN
L3 | T3 · OP0 : AC ← M[AR], C ← 0 | ACLD, ME, CCL
St2 | T2 · OP1 : AR ← IR7−0, C ← C + 1 | ARLD, SAD, CIN
St3 | T3 · OP1 : M[AR] ← AC, C ← 0 | MW, SALU, ALUOP = 100, CCL
Ad2 | T2 · OP2 : DR ← R[IR2−0], C ← C + 1 | DRLD, RE, SS, CIN
Ad3 | T3 · OP2 : AC ← AC + DR, Z ← (AC + DR) = 0, C ← 0 | ACLD, SALU, ZLD, CCL, ALUOP = 000
Sb2 | T2 · OP3 : DR ← R[IR2−0], C ← C + 1 | DRLD, RE, SS, CIN
Sb3 | T3 · OP3 : AC ← AC − DR, Z ← (AC − DR) = 0, C ← 0 | ACLD, SALU, ZLD, CCL, ALUOP = 011
Computer Organization: Basic Processor Structure
Control for the Accumulator Machine (cont.)
Phase | Micro-instruction | Control Output
An2 | T2 · OP4 : DR ← R[IR2−0], C ← C + 1 | DRLD, RE, SS, CIN
An3 | T3 · OP4 : AC ← AC ∧ DR, Z ← (AC ∧ DR) = 0, C ← 0 | ACLD, SALU, ZLD, CCL, ALUOP = 111
Or2 | T2 · OP5 : DR ← R[IR2−0], C ← C + 1 | DRLD, RE, SS, CIN
Or3 | T3 · OP5 : AC ← AC ∨ DR, Z ← (AC ∨ DR) = 0, C ← 0 | ACLD, SALU, ZLD, CCL, ALUOP = 110
N2 | T2 · OP6 : DR ← R[IR2−0], C ← C + 1 | DRLD, RE, SS, CIN
N3 | T3 · OP6 : AC ← ¬AC, Z ← (¬AC) = 0, C ← 0 | ACLD, SALU, ZLD, CCL, ALUOP = 101
Bz2 | T2 · OP8 · ZX : PC ← IR7−0 | PCLD, SAD
Bz2 ∨ ¬Bz2 | T2 · OP8 : C ← 0 | CCL
Bn2 | T2 · OP9 · ¬ZX : PC ← IR7−0 | PCLD, SAD
Bn2 ∨ ¬Bn2 | T2 · OP9 : C ← 0 | CCL
J2 | T2 · OP7 : PC ← IR7−0, C ← 0 | PCLD, SAD, CCL
Computer Organization: Basic Processor Structure
Control for the Stack Machine
Data-path modifications:
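[Figure: the modified data-path for the stack machine. The register file (Reg) is addressed by the 3-bit TP register, an up-down counter with the control inputs TPIN (increment) and TPDE (decrement).]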
Computer Organization: Basic Processor Structure
Control for the Stack Machine (cont.)
I The register file is now only addressed by the stack top pointer (TP).
I The TP register is an up-down counter, capable of being incremented for the pop operation, and decremented for the push operation.
Computer Organization: Basic Processor Structure
Control for the Stack Machine (cont.)
Phase | Micro-instruction | Control Output
F0 | T0 : AR ← PC, C ← C + 1 | ARLD, SPC, CIN
F1 | T1 : IR ← M[AR], PC ← PC + 1, C ← C + 1 | IRLD, ME, PCIN, CIN
Pu2 | T2 · OP0 : AR ← IR7−0, TP ← TP − 1, C ← C + 1 | ARLD, SAD, TPDE, CIN
PuM3 | T3 · OP0 · IRX11 : R[TP] ← M[AR] | RLD, ME
PuI3 | T3 · OP0 · ¬IRX11 : R[TP] ← IR7−0 | RLD, SAD
PuM3 ∨ PuI3 | T3 · OP0 : C ← 0 | CCL
Po2 | T2 · OP1 : AR ← IR7−0, C ← C + 1 | ARLD, SAD, CIN
Po3 | T3 · OP1 : M[AR] ← R[TP], TP ← TP + 1, C ← 0 | MW, RE, TPIN, CCL
Ad2 | T2 · OP2 : DR ← R[TP], TP ← TP + 1, C ← C + 1 | DRLD, RE, TPIN, CIN
Ad3 | T3 · OP2 : SR ← R[TP], C ← C + 1 | SRLD, RE, CIN
Ad4 | T4 · OP2 : R[TP] ← SR + DR, Z ← (DR + SR) = 0, C ← 0 | RW, SALU, ZLD, CCL, ALUOP = 000
Computer Organization: Basic Processor Structure
Control for the Stack Machine (cont.)
Phase | Micro-instruction | Control Output
Sb2 | T2 · OP3 : DR ← R[TP], TP ← TP + 1, C ← C + 1 | DRLD, RE, TPIN, CIN
Sb3 | T3 · OP3 : SR ← R[TP], C ← C + 1 | SRLD, RE, CIN
Sb4 | T4 · OP3 : R[TP] ← SR − DR, Z ← (DR − SR) = 0, C ← 0 | RW, SALU, ZLD, CCL, ALUOP = 011
An2 | T2 · OP4 : DR ← R[TP], TP ← TP + 1, C ← C + 1 | DRLD, RE, TPIN, CIN
An3 | T3 · OP4 : SR ← R[TP], C ← C + 1 | SRLD, RE, CIN
An4 | T4 · OP4 : R[TP] ← SR ∧ DR, Z ← (DR ∧ SR) = 0, C ← 0 | RW, SALU, ZLD, CCL, ALUOP = 111
Or2 | T2 · OP5 : DR ← R[TP], TP ← TP + 1, C ← C + 1 | DRLD, RE, TPIN, CIN
Or3 | T3 · OP5 : SR ← R[TP], C ← C + 1 | SRLD, RE, CIN
Or4 | T4 · OP5 : R[TP] ← SR ∨ DR, Z ← (DR ∨ SR) = 0, C ← 0 | RW, SALU, ZLD, CCL, ALUOP = 110
N2 | T2 · OP6 : DR ← R[TP], C ← C + 1 | DRLD, RE, CIN
N3 | T3 · OP6 : R[TP] ← ¬DR, Z ← (¬DR) = 0, C ← 0 | RW, SALU, ZLD, CCL, ALUOP = 101
Computer Organization: Basic Processor Structure
Control for the Stack Machine (cont.)
Phase | Micro-instruction | Control Output
Bz2 | T2 · OP8 · ZX : PC ← R[TP], TP ← TP + 1 | PCLD, RE, TPIN
Bz2 ∨ ¬Bz2 | T2 · OP8 : C ← 0 | CCL
Bn2 | T2 · OP9 · ¬ZX : PC ← R[TP], TP ← TP + 1 | PCLD, RE, TPIN
Bn2 ∨ ¬Bn2 | T2 · OP9 : C ← 0 | CCL
J2 | T2 · OP7 : PC ← R[TP], TP ← TP + 1, C ← 0 | PCLD, RE, CCL
I The movR/M instruction has been replaced by the push instruction, and the store instruction has been replaced by the pop instruction.
I Branch instructions get the target address off the top of the stack, instead of as an immediate value operand.
Computer Organization: Basic Processor Structure
Chapter 8: Computer Arithmetic
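Computer arithmetic is performed by the ALU, or ALSU (arithmetic, logic, shift unit).
Operations:
I Arithmetic.
I Logic (bit-wise Boolean).
I Shift.
Logic and shift operations are performed by shift units, or logic gate arrays.
[Figure: a 4-bit shift-left unit (shl), in which input bits A0 through A3 are routed one position up to the outputs Z0 through Z3 and a 0 is shifted in, and a 4-bit logic gate array, in which each output bit Zi is produced by one gate from Ai and Bi.]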
Computer Organization: Basic Processor Structure
Arithmetic Operations
Arithmetic is performed on elements of numeric data-types.
Numeric data-types:
I Integer Data
  I Unsigned Integer
  I Signed Integer
I Floating-Point Data
Computer Organization: Basic Processor Structure
Unsigned and Signed Integers
I With unsigned integers, all bit configurations of the word are considered to represent non-negative integers.
I With signed integers, half of the bit configurations represent negative integers, and half represent non-negative integers.
I Unsigned integers are used for representing and manipulating addresses, which are all non-negative.
I Signed integers are used to implement general integer arithmetic.
Computer Organization: Basic Processor Structure
Unsigned and Signed Integers (cont.)
I Example: the 8-bit configuration 11110100, interpreted as an unsigned value, is the decimal number 244. Interpreted as a signed integer, however, it represents –12.
I The processor treats a bit configuration according to the instruction it is executing.
addu R0, R1 ; treat the values in R0, and R1 as unsigned
addi R0, R1 ; treat the values in R0, and R1 as signed
Computer Organization: Basic Processor Structure
Unsigned and Signed Integers (cont.)
I Unsigned integers (8-bit): 00000000 to 11111111 (0 to 2^8 − 1 = 255).
I Signed integers (8-bit): The top bit is used to determine sign.
  I 0 (non-negative): 00000000 to 01111111 (0 to 2^7 − 1 = 127).
  I 1 (negative): 11111111 to 10000000 (–1 to –2^7 = –128).
I Negative numbers proceed towards negative infinity by counting with zeros: 11111111, 11111110, 11111101, 11111100, ...
I Notice that there is one more negative number than there are positive numbers, since one non-negative configuration is taken by 0.
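The two readings of a bit configuration can be checked with a short Python sketch. The helper names are hypothetical; the signed reading subtracts 2^n from the unsigned value whenever the top bit is 1, which is the two's complement interpretation used on these slides.

```python
def as_unsigned(bits: str) -> int:
    """Read a string of 0s and 1s as an unsigned integer."""
    return int(bits, 2)


def as_signed(bits: str) -> int:
    """Read the same string as a two's complement signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":               # top bit set: the value is negative
        value -= 1 << len(bits)      # subtract 2^n, where n is the word width
    return value


if __name__ == "__main__":
    pattern = "11110100"
    print(as_unsigned(pattern), as_signed(pattern))
```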
Computer Organization: Basic Processor Structure
Unsigned Arithmetic
Operations:
I Addition: Z ← A + B
I Subtraction: Z ← A− B
I Multiplication: Z ← A× B
I Division: Z ← A÷ B
I Remainder: Z ← A mod B
Computer Organization: Basic Processor Structure
Unsigned Addition
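Binary addition requires two operands, and a carry-in as input, and produces a sum and a carry-out as output. The carry-out is discarded, but serves a purpose in detecting overflow.
Examples (Demonstrating overflow.)
52 + 141 = 193
  carry:  0 01111000
  A:        00110100   (52)
  B:      + 10001101   (141)
  sum:      11000001   (193)
(The leading digit of the carry row is the carry-out; the remaining digits are the carry-ins to each bit position.)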
Computer Organization: Basic Processor Structure
Unsigned Addition (cont.)
Examples (Demonstrating overflow, cont.)
151 + 116 = 11 (something is wrong: arithmetic overflow)
  carry:  1 11101000
  A:        10010111   (151)
  B:      + 01110100   (116)
  sum:      00001011   (11)
The processor typically sets a status flag (V) when there is overflow. Overflow occurs when there is a non-zero carry-out.
V = Cout
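The two addition examples can be reproduced with a small Python model of an 8-bit adder. The function name add8 is hypothetical; the carry-out is simply bit 8 of the full sum.

```python
def add8(a: int, b: int, cin: int = 0):
    """Add two 8-bit unsigned values; return (8-bit sum, carry-out)."""
    total = (a & 0xFF) + (b & 0xFF) + cin
    return total & 0xFF, total >> 8   # V = Cout for unsigned addition


if __name__ == "__main__":
    print(add8(52, 141))    # fits in 8 bits
    print(add8(151, 116))   # true sum is 267, which does not fit
```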
Computer Organization: Basic Processor Structure
Unsigned Subtraction
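Subtraction is done by the adder circuit. That is,
Z ← A − B
is implemented as
Z ← A + (−B)
(The A operand is added to the two's complement of the B operand, which is the one's complement of B plus a carry-in of 1.)
[Figure: an adder (+) used as a subtractor. Input B is complemented before entering the adder, the carry-in Cin is fixed at 1, and the outputs are the sum S and the carry-out Cout.]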
Computer Organization: Basic Processor Structure
Unsigned Subtraction (cont.)
Examples (Overflow)
55 − 14 = 41
  carry:  1 11101111
  A:        00110111   (55)
  ¬B:     + 11110001   (one's complement of 14)
  sum:      00101001   (41)
57 − 112 = −55 (something is wrong: we cannot represent negative values as unsigned integers)
  carry:  0 01111111
  A:        00111001   (57)
  ¬B:     + 10001111   (one's complement of 112)
  sum:      11001001   (201 as an unsigned value)
In subtraction, overflow occurs when the carry-out is zero.
V = ¬Cout
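Subtraction through the adder can be sketched in the same way. The function name sub8 is hypothetical; it adds the one's complement of B with a carry-in of 1 and reports overflow as the complement of the carry-out.

```python
def sub8(a: int, b: int):
    """Compute a - b on 8-bit unsigned values using only addition.

    Returns (8-bit difference, carry-out, overflow flag V).
    """
    total = (a & 0xFF) + (~b & 0xFF) + 1   # A + one's complement of B + Cin of 1
    difference = total & 0xFF
    cout = total >> 8
    return difference, cout, 1 - cout       # V is the complement of Cout


if __name__ == "__main__":
    print(sub8(55, 14))    # no overflow
    print(sub8(57, 112))   # negative result cannot be represented
```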
Computer Organization: Basic Processor Structure
Unsigned Multiplication
6 × 11 = 66
      0110
    × 1011
    ------
      0110
     0110
    0000
   0110
   -------
   1000010
The intermediate rows are either shifted copies of the multiplicand, or rows of zeros.
Computer Organization: Basic Processor Structure
Unsigned Multiplication (cont.)
A method for the ALSU. (4-bit multiplication)
I We use three registers:
  I M: The multiplier: 4 bits wide. For k-bit multiplication, this register would be k bits wide.
  I N: The multiplicand: 8 bits wide. For k-bit multiplication, this register would be 2k bits wide.
  I P: The product: 8 bits wide. For k-bit multiplication, this register would be 2k bits wide.
  (The N, and P registers must be the same size, since the N register is added to the P register.)
I We shift the product right, and hold the multiplicand stationary, rather than shifting the multiplicand left, and holding the product stationary.
I Rather than save up intermediate rows, we add the multiplicand to the product immediately.
Computer Organization: Basic Processor Structure
Unsigned Multiplication (cont.)
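N | M | P | Action
01100000 | 1011 | 00000000 | +
 | | 01100000 | →
 | 0101 | 00110000 | +
 | | 10010000 | →
 | 0010 | 01001000 | →
 | 0001 | 00100100 | +
 | | 10000100 | →
 | 0000 | 01000010 | X
(Trace of 6 × 11. "+" adds N to P, "→" shifts M and P right, and "X" marks the end.)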
Computer Organization: Basic Processor Structure
Unsigned Multiplication (cont.)
I The multiplicand starts in the top half of the N register. It remains constant throughout the algorithm.
I A loop is executed. Each time through the loop the M, and the P registers are shifted one bit to the right.
I On iterations where the bottom bit of the multiplier is 1, the N register is added to the P register, before the shift.
I The iterations stop when 4 shifts have been performed.
Computer Organization: Basic Processor Structure
Unsigned Multiplication (cont.)
Algorithm:
N7−4 = N3−0
N3−0 = 0
P = 0
I = 0
while I ≠ 4 do
    if M0 then
        P = P + N
    M = M >> 1
    P = P >> 1
    I = I + 1
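The algorithm above translates almost line for line into Python. This is a sketch for 4-bit operands; the function name multiply4 is hypothetical, and Python's unbounded integers stand in for a P register whose adder carry-out is shifted back in, since P + N can momentarily exceed 8 bits.

```python
def multiply4(multiplicand: int, multiplier: int) -> int:
    """Add-shift multiplication of two 4-bit unsigned values."""
    n = (multiplicand & 0xF) << 4   # multiplicand in the top half of N
    m = multiplier & 0xF            # M register
    p = 0                           # P register
    for _ in range(4):              # I counts the 4 shifts
        if m & 1:                   # bottom bit of the multiplier
            p = p + n               # add before shifting
        m >>= 1
        p >>= 1
    return p


if __name__ == "__main__":
    print(multiply4(6, 11))
```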
Computer Organization: Basic Processor Structure
Unsigned Multiplication (cont.)
µ-Program:
Def: I4X ≡ (I = 4), Tk ≡ (C = k)
T0 : N7−4 ← N3−0, N3−0 ← 0, P ← 0, I ← 0, C ← C + 1
T1 · I4X : C ← 4
T1 · ¬I4X : C ← 2
T2 · M0 : P ← P + N
T2 : C ← C + 1
T3 : M ← shr M, P ← shr P, I ← I + 1, C ← 1
T4 : (no action; the multiplication is finished)
Notice that while addition can be performed in one clock cycle, multiplication requires several, and is significantly slower than addition.
Computer Organization: Basic Processor Structure
Unsigned Division
79 ÷ 7: quotient 11, remainder 2.
        01011
  111 | 1001111
        −111
         010111
          −111
           1001
           −111
            010
The intermediate rows are either shifted copies of the divisor subtracted from the remaining dividend, or a subtraction of zero (no subtraction).
Computer Organization: Basic Processor Structure
Unsigned Division (cont.)
A method for the ALSU. (4-bit division)
I We use three registers:
  I Q: The quotient: 4 bits wide. For k-bit division, this register would be k bits wide.
  I D: The divisor: 8 bits wide. For k-bit division, this register would be 2k bits wide.
  I R: The remainder: 8 bits wide. For k-bit division, this register would be 2k bits wide.
  (The D, and R registers must be the same size, since the D register is subtracted from the R register.)
I We shift the remainder left, and hold the divisor stationary, rather than shifting the divisor right, and holding the remainder stationary.
I The remainder starts as the dividend, and as we proceed it represents the portion of the dividend remaining. When we finish, the R register contains the remainder after division.
Computer Organization: Basic Processor Structure
Unsigned Division (cont.)
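D | Q | R | Action
01110000 | 0000 | 01001111 | ←
 | 0000 | 10011110 | -
 | 0001 | 00101110 | ←
 | 0010 | 01011100 | ←
 | 0100 | 10111000 | -
 | 0101 | 01001000 | ←
 | 1010 | 10010000 | -
 | 1011 | 00100000 | X
(Trace of 79 ÷ 7. "←" shifts Q and R left, "-" subtracts D from R and adds 1 to Q, and "X" marks the end.)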
Computer Organization: Basic Processor Structure
Unsigned Division (cont.)
I The divisor starts in the top half of the D register. It remains constant throughout the algorithm.
I A loop is executed. Each time through the loop the Q, and the R registers are shifted one bit to the left.
I On iterations where, after the shift, R ≥ D, the D register is subtracted from the R register, and 1 is added to the Q register.
I The iterations stop when 4 shifts have been performed.
Computer Organization: Basic Processor Structure
Unsigned Division (cont.)
Algorithm:
D7−4 = D3−0
D3−0 = 0
Q = 0
I = 0
while I ≠ 4 do
    Q = Q << 1
    R = R << 1
    if R − D ≥ 0 then
        R = R − D
        Q = Q + 1
    I = I + 1
R3−0 = R7−4
R7−4 = 0
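The division algorithm can be sketched the same way. The function name divide4 is hypothetical, and the sketch assumes the quotient fits in 4 bits (the dividend is less than 16 times the divisor), as it does in the 79 ÷ 7 example. Python's unbounded integers hide the fact that the shifted R may briefly need a ninth bit.

```python
def divide4(dividend: int, divisor: int):
    """Shift-subtract division: 8-bit dividend, 4-bit divisor.

    Returns (quotient, remainder); assumes dividend < 16 * divisor.
    """
    d = (divisor & 0xF) << 4   # divisor in the top half of D
    r = dividend & 0xFF        # R starts as the dividend
    q = 0
    for _ in range(4):         # I counts the 4 shifts
        q <<= 1
        r <<= 1
        if r - d >= 0:
            r -= d
            q += 1
    return q, r >> 4           # the remainder ends up in the top half of R


if __name__ == "__main__":
    print(divide4(79, 7))
```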
Computer Organization: Basic Processor Structure
Unsigned Division (cont.)
µ-Program:
Def: I4X ≡ (I = 4), Tk ≡ (C = k), RDX ≡ (R − D ≥ 0)
T0 : D7−4 ← D3−0, D3−0 ← 0, Q ← 0, I ← 0, C ← C + 1
T1 · I4X : C ← 4
T1 · ¬I4X : C ← 2
T2 : Q ← shl Q, R ← shl R, C ← C + 1
T3 · RDX : R ← R − D, Q ← Q + 1
T3 : I ← I + 1, C ← 1
T4 : R3−0 ← R7−4, R7−4 ← 0, C ← 5
T5 : (no action; the division is finished)
The circuit that performs add-shift multiplication can be built so that it can be "reversed" to perform shift-subtract division.
Computer Organization: Basic Processor Structure
Signed Addition and Subtraction
Addition of signed integers is done using the same adder as is used for unsigned addition. The difference is in how overflow is detected.
Examples (Overflow)
118 − 38 = 80
  carry:  1 11111100
  A:        01110110   (118)
  B:      + 11011010   (−38)
  sum:      01010000   (80)
95 + 51 = 146 (something went wrong: the result of adding two positive numbers was negative)
  carry:  0 11111110
  A:        01011111   (95)
  B:      + 00110011   (51)
  sum:      10010010   (−110 as a signed value)
Signed Addition and Subtraction (cont.)
Overflow occurs when the sum of two operands of the same sign results in a value of the opposite sign. Or, equivalently, the carry-in to the sign bit is not equal to the carry-out of the sign bit.
V = Cout ⊕ Cin
Examples (No overflow with negative operands.)
−40 − 3 = −43 (Cin = Cout, so no overflow.)
    1 11110000    (carries)
      11011000    (−40)
    + 11111101    (−3)
      11010101    (−43)
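The overflow rule V = Cout ⊕ Cin can be checked with a short Python sketch. The helper names `add8` and `to_signed` are hypothetical, and an 8-bit word is assumed, as in the examples.

```python
def add8(a, b):
    """Add two 8-bit patterns; return (sum, carry_out, overflow)."""
    a &= 0xFF
    b &= 0xFF
    c_in = ((a & 0x7F) + (b & 0x7F)) >> 7   # carry into the sign bit
    total = a + b
    c_out = total >> 8                      # carry out of the sign bit
    return total & 0xFF, c_out, c_out ^ c_in


def to_signed(pattern):
    """Read an 8-bit pattern as a 2s complement number."""
    return pattern - 256 if pattern & 0x80 else pattern


for x, y in [(118, -38), (95, 51), (-40, -3)]:
    s, c_out, v = add8(x, y)
    print(x, y, format(s, "08b"), to_signed(s), "overflow" if v else "ok")
```

Only the second pair, 95 + 51, sets V, which matches the examples above.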
Signed Multiplication and Division
We can use the unsigned add-shift multiplier circuit to multiply signed numbers, using the following procedure. (Example: 11110110 × 00000110.)
1. Calculate their magnitudes: 00001010, and 00000110.
2. Use the unsigned multiplier to multiply the magnitudes: 00001010 × 00000110 = 00111100.
3. Calculate the product sign, by taking the XOR of the two operand sign bits: 1 ⊕ 0 = 1.
4. If the product sign is negative, take the two's complement of the product magnitude, to produce the actual product: 11000100.
Signed Multiplication and Division (cont.)
I For division we can use the unsigned shift-subtract circuit on the magnitudes of the operands, as we did for signed multiplication.
I The sign of the quotient is calculated as the XOR of the signs of the operands, as it is for the product in multiplication.
I The remainder represents part of the dividend, so it should have the same sign as the dividend.
Examples
d ÷ D (d is the dividend, and D is the divisor) should yield (Q, R), where d = D × Q + R.
−56 ÷ 9 = (−6, −2), where −56 = −6 · 9 + (−2).
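The sign handling for signed multiplication and division can be sketched in Python. The function names are hypothetical, and Python's built-in `*` and `divmod` on the magnitudes stand in for the unsigned multiplier and divider circuits.

```python
def signed_multiply(a, b):
    """Multiply the magnitudes, then apply the XOR of the operand signs."""
    magnitude = abs(a) * abs(b)              # stands in for the unsigned multiplier
    negative = (a < 0) != (b < 0)            # XOR of the two sign bits
    return -magnitude if negative else magnitude


def signed_divide(d, D):
    """Return (Q, R) with d == D * Q + R, where R takes the sign of the dividend d."""
    if D == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    q_mag, r_mag = divmod(abs(d), abs(D))    # stands in for the unsigned divider
    q = -q_mag if (d < 0) != (D < 0) else q_mag
    r = -r_mag if d < 0 else r_mag
    return q, r


print(signed_multiply(-10, 6))   # the multiplication example
print(signed_divide(-56, 9))     # the division example
```

Note that this differs from Python's own `//` and `%` operators, which round the quotient toward negative infinity and give the remainder the sign of the divisor.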
Floating-Point Data
I Floating-point (FP) is used to represent “real numbers”.
I We cannot represent all real numbers exactly. For instance, π cannot be represented with a finite number of bits, since it is irrational.
I We represent only rational numbers with finite expansions. Arbitrary real numbers must be approximated by the closest such rational number.
I We store rationals with finite expansions in an FP word with a fixed width. To do this we store the scientific notation representation of the number.
Floating-Point Data (cont.)
I The scientific notation for the number has 3 pieces of significance: (for the example −43.8125 = −4.38125 × 10^1)
    I sign of the number (negative, in this case).
    I mantissa; the fraction (4.38125).
    I exponent; the power (1 in this case).
(The base, which in this case is 10, is insignificant, since 10 is always used.)
I The mantissa is always normalized. (There is only one nonzero digit above the decimal point.)
I The decimal point is moved, or floated, to normalize the mantissa.
I Certain numbers, like 0, cannot be normalized. (You cannot write 0 as a mantissa with a nonzero digit.) These numbers are given standard notations. (0 = +0.0 × 10^0)
Floating-Point Data (cont.)
I The computer uses binary scientific notation. −101011.1101 = −1.010111101 × 2^5
I In binary, the leading digit in a normalized mantissa is 1.
I Storing a rational number in a floating-point word. (A floating-point word is 32 bits.)
Field:  sign  exponent  mantissa
Bits:   1     8         23
Floating-Point Data (cont.)
Floating-point format.
I sign: behaves the same as an integer sign bit (0, non-negative; 1, negative).
I mantissa: a 23-bit field. Since the leading bit is 1 for normalized numbers, it is not stored. For the example, −1.010111101 × 2^5, the mantissa stored is 0101 1110 1000 0000 0000 000 (truncate off the “1.”, pad to 23 bits with zeros).
I exponent: stored in 127-bias notation. To calculate the 127-bias notation, add 127 to the exponent, and write the result in 8 bits.
Floating-Point Data (cont.)
127-bias notation.
The top bit of the exponent is its sign bit. It behaves contrary to 2s complement; a 0 in the sign bit indicates that the exponent is non-positive, and a 1 indicates that it is positive.
Decimal Number  127-Bias
–127            00000000
–126            00000001
–125            00000010
–124            00000011
...             ...
0               01111111
1               10000000
2               10000001
3               10000010
...             ...
128             11111111
Floating-Point Data (cont.)
Examples (127-bias)
Actual exponent: −5
−5 + 127 = 122 = 01111010 (127-bias exponent)
127-bias exponent: 10000111
10000111 − 01111111 = 135 − 127 = 8 (actual exponent)
Shortcut conversion:
I Positive exponent, x: write a 1 sign bit, and the 7-bit value x − 1. Actual exponent 12: 1 0001011 = 10001011, 127-bias exponent.
I Non-positive exponent, x: write a 0 sign bit, and the 7-bit 1s complement of the magnitude of x. Actual exponent –12: 0 comp(0001100) = 0 1110011 = 01110011, 127-bias exponent.
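A small Python sketch can confirm that the shortcut agrees with adding 127 for every exponent in the table. The function names are hypothetical.

```python
def bias127(exponent):
    """127-bias notation: add 127 and write the result in 8 bits."""
    if not -127 <= exponent <= 128:
        raise ValueError("exponent outside the 8-bit 127-bias range")
    return format(exponent + 127, "08b")


def bias127_shortcut(exponent):
    """Shortcut conversion: a sign bit followed by a 7-bit value."""
    if exponent > 0:
        return "1" + format(exponent - 1, "07b")       # positive: x - 1
    return "0" + format(~(-exponent) & 0x7F, "07b")    # non-positive: 1s complement of |x|


for e in (-5, 8, 12, -12):
    print(e, bias127(e), bias127_shortcut(e))
```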
Floating-Point Data (cont.)
Examples (−1.010111101 × 2^5)
Sign: 1
Exponent: 10000100
Mantissa: 0101 1110 1000 0000 0000 000
Floating-point word: 1,10000100,0101 1110 1000 0000 0000 000
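As a cross-check, the three fields can be packed in Python and compared against the machine's own single precision encoding through the standard `struct` module. The helper names `pack_fields` and `word_to_float` are hypothetical.

```python
import struct


def pack_fields(sign, exponent, fraction_bits):
    """Build the 32-bit word from the sign, the actual exponent, and the bits after '1.'."""
    fraction = fraction_bits.replace(" ", "").ljust(23, "0")   # pad to 23 bits with zeros
    if len(fraction) != 23:
        raise ValueError("mantissa field holds at most 23 bits")
    return f"{sign}{exponent + 127:08b}{fraction}"


def word_to_float(word):
    """Interpret a 32-character bit string as a single precision number."""
    return struct.unpack(">f", int(word, 2).to_bytes(4, "big"))[0]


word = pack_fields(1, 5, "010111101")   # -1.010111101 x 2^5
print(word[0], word[1:9], word[9:])
print(word_to_float(word))
```

The decoded value is −43.8125, the number used in the running example.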
Converting between Floating-Point and Decimal
1. Convert the decimal number to binary.
2. Write the binary number in scientific notation.
3. Pack the scientific notation number into the floating-point word.
Converting to binary
I The integer part is converted using successive division.
I The fraction part is converted using successive multiplication.
I In successive multiplication, we multiply the fraction by 2. For each multiplication, whatever digit pops up above the decimal point is recorded as a part of the result, and deleted.
Converting between Floating-Point and Decimal (cont.)
Example: –43.8125 (43 = 101011)
Calculation   Integer Part   Fractional Part
.8125 × 2     1              .625
.625 × 2      1              .25
.25 × 2       0              .5
.5 × 2        1              .0
(.8125 = .1101)
–43.8125 = –101011.1101
Scientific notation: −1.010111101 × 2^5
Floating-point format: 1,10000100,0101 1110 1000 0000 0000 000 (Calculate the 127-bias exponent, and truncate the “1.” off of the mantissa.)
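Successive division and successive multiplication can be sketched in Python as below. The function names and the `max_bits` cut-off are assumptions, and `Fraction` is used so that the decimal fraction is held exactly.

```python
from fractions import Fraction


def integer_to_binary(n):
    """Successive division by 2; the remainders are the bits, low-order first."""
    digits = []
    while n:
        n, remainder = divmod(n, 2)
        digits.append(str(remainder))
    return "".join(reversed(digits)) or "0"


def fraction_to_binary(fraction, max_bits=23):
    """Successive multiplication by 2; the digit above the point is recorded and deleted."""
    frac = Fraction(fraction)              # e.g. Fraction("0.8125")
    bits = []
    while frac and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)                    # the digit that popped up above the point
        bits.append(str(bit))
        frac -= bit
    return "." + "".join(bits)


print(integer_to_binary(43) + fraction_to_binary("0.8125"))
```

A fraction such as 0.1 never reaches zero, which is why the sketch stops after `max_bits` digits.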
Converting between Floating-Point and Decimal (cont.)
Examples (From floating-point to decimal.)
0,01111110,1010 0000 0000 0000 0000 000:
Sign: +
Exponent: −comp(1111110) = −0000001 = −1
Mantissa: (1.)1010 = 1.1010
Scientific notation: +1.101 × 2^−1
Converting from scientific notation to decimal.
I CM: convert and multiply. Convert all parts to decimal, and multiply them, in decimal, to obtain the result.
I MC: multiply and convert. Multiply the parts, in binary, and convert the result to decimal.
Converting between Floating-Point and Decimal (cont.)
CM Conversion
In the same way that we can rewrite 2.543 = 2543/10^3, we can write the mantissa 1.101 = 1101/2^3 = 1101/1000 (in binary).
+1.101 × 2^−1 = +(1101/1000) × (1/10) = +(13/8) × (1/2) (convert to decimal)
              = +13/16 = +0.8125 (multiply)
MC Conversion
+1.101 × 2^−1 = +(1101/2^3) × (1/2^1) = +1101/2^4 = +0.1101 (multiply)
              = +13 × 2^−4 = +13/16 = +0.8125 (convert)
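The CM route can be sketched in Python with exact fractions. The function name `decode_word` is hypothetical, and the sketch assumes a normalized number, so the hidden leading 1 is always restored.

```python
from fractions import Fraction


def decode_word(word):
    """Decode a normalized 32-bit floating-point word into an exact fraction."""
    bits = word.replace(",", "").replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    sign_bit, exp_bits, frac_bits = bits[0], bits[1:9], bits[9:]
    mantissa = Fraction(int("1" + frac_bits, 2), 2 ** 23)   # restore the hidden "1."
    exponent = int(exp_bits, 2) - 127                       # remove the 127 bias
    value = mantissa * Fraction(2) ** exponent              # convert, then multiply
    return -value if sign_bit == "1" else value


value = decode_word("0,01111110,1010 0000 0000 0000 0000 000")
print(value, float(value))
```

For the example word this gives 13/16, or 0.8125.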
Standardization
I Several different formats for the floating-point word used to exist.
I This caused programs to be non-portable. A program would work on one machine, but would result in floating-point overflow on another, due to a smaller exponent field.
I Industry stakeholders got together, and agreed on a standard format for the floating-point word. This standardization was facilitated by IEEE.
I The floating-point standard is referred to as the IEEE 754 standard. It is the 32-bit format we have been using.
Standardization (cont.)
I For some scientific applications an 8-bit exponent, and a 23-bit mantissa are not precise enough.
I IEEE 754 actually has two standard floating-point word widths: a 32-bit format, and a 64-bit format.
I The two word sizes are referred to as single precision, and double precision. They correspond to the C types float, and double.
I The double precision exponent uses 1023-bias notation, which is the 11-bit version of 127-bias.
Field:  sign  exponent  mantissa
Bits:   1     11        52
Standardization (cont.)
Non-standard numbers include:
I 0: represented as 0,00000000,0000 0000 0000 0000 0000 000.
I +∞
I −∞
I NaN (not a number): a value often used to represent an undefined result in some arithmetic calculation.
Field Order
I When we write numbers in scientific notation, the order of the fields is sign, mantissa, and exponent.
I In floating-point we do not use this usual order. Instead we write the number as sign, exponent, and mantissa.
I The floating-point format allows us to use integer comparison circuitry to compare floating-point numbers.
Examples (Comparing floating-point bit strings as if they were integers.)
Exponent first:
a = 0, 10000011, 00100000000000000000000 = +1.001 × 2^4
b = 0, 01111000, 00110000000000000000000 = +1.0011 × 2^−7
a > b, as it should be, because the most significant bits (the exponent) are further to the left.
Field Order (cont.)
Examples (Comparing floating-point bit strings as if they were integers. (cont.))
Mantissa first:
a = 0, 00100000000000000000000, 10000011
b = 0, 00110000000000000000000, 01111000
b > a, incorrectly, because the least significant bits (the mantissa) are further to the left.
We use 127-bias for the same reason. That is to say, the bit string for a negative exponent should appear smaller than the bit string for a positive exponent.
Field Order (cont.)
Examples (Comparing exponents as part of an integer.)
127-bias:
a = 0, 10000011, 00100000000000000000000 = +1.001 × 2^4
b = 0, 01111000, 00110000000000000000000 = +1.0011 × 2^−7
a > b, as it should be, because a positive exponent starts with a 1, and a negative exponent starts with a 0.
2s-complement:
a = 0, 00000100, 00100000000000000000000 = +1.001 × 2^4
b = 0, 11111001, 00110000000000000000000 = +1.0011 × 2^−7
b > a, incorrectly, because a positive exponent starts with a 0, and a negative exponent starts with a 1.
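The integer-comparison property can be observed from Python with the standard `struct` module. The helper name `float_bits` is hypothetical, and the sketch is restricted to non-negative numbers, since the sign bit makes negative words compare in the reverse order.

```python
import struct


def float_bits(x):
    """Return the single precision word for x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


a = 1.125 * 2 ** 4       # +1.001  x 2^4  in binary scientific notation
b = 1.1875 * 2 ** -7     # +1.0011 x 2^-7

print(format(float_bits(a), "032b"))
print(format(float_bits(b), "032b"))
print(a > b, float_bits(a) > float_bits(b))   # both comparisons agree
```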
Arithmetic Approximation
Three properties are associated with approximating real numbers with floating-point.
I Precision. The number of digits in the mantissa. So, for the IEEE single precision format, we would say that we have a precision of 24 bits. (This works out to be about seven decimal digits of precision.)
I Range. This is the interval of numbers that we can represent, from the smallest to the largest. For the single precision format, this is the interval [1,11111110,1111 1111 1111 1111 1111 111 .. 0,11111110,1111 1111 1111 1111 1111 111] (the smallest non-infinite negative number, to the largest non-infinite positive number).
I Gap. The largest distance between consecutive, representable rational numbers.
Arithmetic Approximation (cont.)
The gap draws attention to the fact that we cannot represent all real numbers in the range.
Examples
0, 10000011, 01010000111101010000110 = +1.0101, 0000, 1111, 0101, 0000, 110 × 2^4 (approximately 21.059826)
The next larger number is:
0, 10000011, 01010000111101010000111 = +1.0101, 0000, 1111, 0101, 0000, 111 × 2^4 (approximately 21.059828) (derived by flipping the rightmost bit)
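The distance between these two neighbours can be measured in Python. The helper names are hypothetical, and `next_up` assumes a positive, finite number below the largest representable value, so that adding 1 to the word gives the next number.

```python
import struct


def word_to_float(word):
    """Interpret an unsigned 32-bit integer as a single precision number."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


def next_up(word):
    """The next larger positive number: add 1 to the word, as if it were an integer."""
    return word + 1


word = int("0" + "10000011" + "01010000111101010000110", 2)
x = word_to_float(word)
y = word_to_float(next_up(word))
print(x, y, y - x)
```

With an exponent of 4 and 23 fraction bits, the difference is 2^(4−23) = 2^−19, roughly 0.0000019.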
Arithmetic Approximation (cont.)
The range can also be exceeded by an arithmetic calculation.
I Floating-point overflow: the magnitude of the number is larger than the largest floating-point number that is representable. This is indicated by an exponent that exceeds 11111111 (128).
I Floating-point underflow: the magnitude of the number is too small to be represented. This is indicated by an exponent that is smaller than 00000000 (−127).
Rounding
I When performing an arithmetic calculation, we cannot represent infinite precision.
I We typically keep the allowable precision, and two more bits. These bits are called the round bit, and the guard bit. (The high-order bit is the round bit.)
I For example, if we had 5 bits of precision, the result of a calculation might be +1.0110, 10 × 2^6, with round and guard bits.
I When packing this into the floating-point word, the round, and guard bits are dropped, through the process of rounding.
I IEEE 754 allows 4 methods to be used to round.
Rounding (cont.)
Rounding methods:
I Round Nearest (RN): Round the number to the nearest floating-point number.
I Round Zero (RZ): Round the number to the closest floating-point number, towards zero.
I Round Positive (RP): Round the number to the closest floating-point number, in the direction of positive infinity.
I Round Minus (RM): Round the number to the closest floating-point number, in the direction of negative infinity.
Rounding (cont.)
I The round and guard bits are used to determine if rounding is necessary (it is necessary when they are non-zero).
I The round bit is used to perform RN. If it is 1, then you would round up. If it is 0, then you would round down.
I To round up, you add 1 to the magnitude, and drop the round and guard bits.
I To round down, or truncate, you simply drop the round and guard bits.
I For RN you use the round bit to determine if you round the magnitude up, or just truncate it.
Rounding (cont.)
I For RZ you always truncate.
I For RM you truncate positive numbers, and round up negative numbers.
I For RP you truncate negative numbers, and round up positive numbers.
Method  +1.0110, 10  −1.0110, 10  +1.0110, 01  −1.0110, 01
RN      +1.0111      −1.0111      +1.0110      −1.0110
RZ      +1.0110      −1.0110      +1.0110      −1.0110
RP      +1.0111      −1.0110      +1.0111      −1.0110
RM      +1.0110      −1.0111      +1.0110      −1.0111
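The four methods can be sketched in Python and checked against the table. This is an illustration under assumptions: the function name is hypothetical, the magnitude is passed as an integer whose two low-order bits are the round and guard bits, and RN follows the rule given on these slides (round up exactly when the round bit is 1).

```python
def round_magnitude(sign, magnitude, mode):
    """Drop the round and guard bits of a magnitude according to the rounding mode.

    sign is +1 or -1; magnitude includes the round and guard bits as its two low bits.
    Returns the rounded magnitude without those two bits.
    """
    kept, extra = magnitude >> 2, magnitude & 0b11
    if mode == "RN":
        round_up = (extra >> 1) == 1           # decided by the round bit alone
    elif mode == "RZ":
        round_up = False                       # always truncate
    elif mode == "RP":
        round_up = sign > 0 and extra != 0     # only positive numbers move up
    elif mode == "RM":
        round_up = sign < 0 and extra != 0     # only negative magnitudes grow
    else:
        raise ValueError("unknown rounding mode")
    return kept + 1 if round_up else kept


for mode in ("RN", "RZ", "RP", "RM"):
    row = [round_magnitude(s, m, mode) for m in (0b1011010, 0b1011001) for s in (+1, -1)]
    print(mode, [format(r, "05b") for r in row])
```

In the printed rows, 10110 stands for 1.0110 and 10111 for 1.0111; the sign is carried separately.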
Floating-Point Addition
Example: +1.0111 × 2^3 − 1.1011 × 2^2
1. Align binary points. Adjust the smaller exponent to the largest. +1.0111, 00 × 2^3 − 0.1101, 10 × 2^3.
2. Add the mantissas. (The subtraction is carried out by adding the 2s complement.)
      001.0111,00            1 110.1100,00    (carries)
    − 000.1101,10     ⇒        001.0111,00
                             + 111.0010,10
                               000.1001,10
Result: +0.1001, 10 × 2^3.
3. Normalize the mantissa. +1.0011, 00 × 2^2.
4. Round the mantissa (using one of the four rounding methods). +1.0011 × 2^2.
(It may be necessary to re-normalize the mantissa. Example: 1.1111,11 rounded up to 10.0000.)
Floating-Point Addition (cont.)
Algorithm: FP numbers (sign, exponent, mantissa) (SA, EA, MA) added to (SB, EB, MB) producing (SZ, EZ, MZ)
    EZ = EA
    if EB > EA then
        EZ = EB
    while EA < EZ do
        EA = EA + 1
        MA = MA >> 1
    while EB < EZ do
        EB = EB + 1
        MB = MB >> 1
    if SA then
        MA = -MA
    if SB then
        MB = -MB
    MZ = MA + MB
    SZ = 0
    if MZ < 0 then
        MZ = -MZ
        SZ = 1
    if MZ[1] then
        MZ = MZ >> 1
        EZ = EZ + 1
    while !MZ[0] do
        MZ = MZ << 1
        EZ = EZ - 1
    MZ[-5..-6] = [00]
Floating-Point Multiplication
Example: +1.0111 × 2^3 × −1.1011 × 2^2
1. Add the exponents. +1.0111 × −1.1011 × 2^5.
2. Multiply the mantissas.
          1.0111
        × 1.1011
           10111
          10111
         00000
        10111
       10111
      10.01101101
Result: +1 × −1 × 10.0110, 11 × 2^5 (The sign has not been determined yet.)
3. Calculate the sign. −10.0110, 11 × 2^5
Floating-Point Multiplication (cont.)
Example: +1.0111 × 2^3 × −1.1011 × 2^2 (cont.)
4. Normalize the result. −1.0011, 01 × 2^6
5. Round the result. We would use one of the four rounding methods. (For the example, we use RM.) −1.0100 × 2^6
Floating-Point Multiplication (cont.)
Algorithm: FP numbers (sign, exponent, mantissa) (SA, EA, MA) multiplied by (SB, EB, MB) producing (SZ, EZ, MZ)
    EZ = EA + EB
    MZ = MA * MB
    SZ = SA ^ SB
    if MZ[1] then
        MZ = MZ >> 1
        EZ = EZ + 1
    if SZ then
        if MZ[-5] | MZ[-6] then
            MZ[1..-4] = MZ[1..-4] + [000001]
    MZ[-5..-6] = [00]
    if MZ[1] then
        MZ = MZ >> 1
        EZ = EZ + 1
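The multiplication algorithm, with RM rounding, can be sketched in Python for a toy format. Everything here is an assumption made for illustration: the function name, the 4-bit fraction (`FRAC`), and the use of plain integers for the mantissas, with only the round and guard bits kept below the fraction.

```python
FRAC = 4   # fraction bits kept in the mantissa (toy format, not IEEE 754)


def fp_multiply_rm(sa, ea, ma, sb, eb, mb):
    """Multiply (sa, ea, ma) by (sb, eb, mb) and round with RM.

    Mantissas are integers with FRAC fraction bits, e.g. 1.0111 is 0b10111.
    Returns (sign, exponent, mantissa) in the same form.
    """
    ez = ea + eb
    mz = (ma * mb) >> (FRAC - 2)    # keep FRAC fraction bits plus round and guard bits
    sz = sa ^ sb
    if mz >> (FRAC + 3):            # MZ[1] is set: the product is 2.0 or more
        mz >>= 1
        ez += 1
    if sz and mz & 0b11:            # RM: a negative result rounds away from zero
        mz += 0b100
    mz &= ~0b11                     # clear the round and guard bits
    if mz >> (FRAC + 3):            # rounding may require re-normalizing
        mz >>= 1
        ez += 1
    return sz, ez, mz >> 2


s, e, m = fp_multiply_rm(0, 3, 0b10111, 1, 2, 0b11011)
print(s, e, format(m, "05b"))
```

For +1.0111 × 2^3 times −1.1011 × 2^2 the sketch gives −1.0100 × 2^6 = −80, the nearest representable value at or below the exact product −77.625.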
Computer Arithmetic: Increasing Efficiency
Because multiplication is so much slower than addition, efforts to increase arithmetic efficiency have concentrated on multiplication.
I Wallace Trees. Instead of using one adder to sequentially add the rows of shifted multiplicand to the product, use one adder per row. This way the additions can be carried out in parallel. The adders are usually organized as a tree.
I ROM Lookup Table. Instead of doing the multiplication of smaller numbers, store the products in a ROM, and look up the result using the operands as an address.
I Arithmetic Pipeline. Do the additions sequentially, but with multiple adders. Once an adder has done its job on the current operation, it can be used to do the next multiplication. In this way the data-path can actually work on several multiplications simultaneously.
Chapter 9: Micro-Programmed CPU Design
µ-instructions look like programming language instructions.
Examples
RTL:
    R ← R + S
C++:
    R = R + S;
More complex RTL:
    c : R ← R + 1
    ¬c : R ← 0
More complex C++:
    if(c)
        R = R + 1;
    else
        R = 0;
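Micro-Programmed CPU Design (cont.)
Structure of the µ-controlled processor:
[Figure: the µ-sequencer. The address selector, driven by the control input and the sequence field, loads the µAR. The µAR addresses the µROM. The µop field of the µ-instruction feeds the µDec, which produces the control output.]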
Micro-Programmed CPU Design (cont.)
Elements of the µ-sequencer.
I µ-ROM: contains the µ-instructions that implement the machine cycle.
I µAR (µ-address-register): contains the address of the current µ-instruction. It is changed each clock cycle.
I Address selector: chooses the next µ-instruction address.
I µDec (µ-decoder): sends the required control signals to implement the current µ-instruction.
Micro-Instruction Format
We must design a numeric form of RTL instructions, tailored to the BRIM machine.
Fields:
I µ-op field. Specifies the µ-instruction to be performed.
I sequence field. Specifies the next µ-instruction to be executed.
The Sequence Field
The µ-program is stored in the µROM, one instruction per word.
Address calculation methods:
1. Increment the current address. The sequencer would use an adder to add one to the current address, producing the address of the next micro-instruction.
2. Unconditional jump to a new address. The new address might be given as a field in the micro-instruction.
3. Conditional branch to a new address. The sequencer would decide whether or not to take the branch, based on a control input. If the branch were taken, the new address would be looked up in a ROM jump table. If the branch were not taken, the current address would be incremented.
The Sequence Field (cont.)
The sequence field contains a code that indicates the choice of addressing method.
Alternate ways of calculating a conditional branch address:
I A hardwired address calculator could be used to calculate the new address.
I The new address might be given as part of the micro-instruction, in the same way as it is for an unconditional branch.
I The new address might be looked up in a ROM table, often called a jump table, using the control input to the sequencer as an index into the table. (This is our choice.)
The Sequence Field (cont.)
A simple jump table, based on the BRIM machine. Inputs are used as the address into the table:
I IRX15−12: the op-code.
I IRX11: the addressing mode bit.
I ZX: the contents of the Z flag.
IRX15−12  IRX11  ZX  Address
0000      X      X   01101
0101      1      X   10011
1010      X      1   01111
The Select and Address Sub-Fields
The sequence field can be split into two subfields:
I select field: selects between address calculation methods 2, and 3. (Method 1 is not implemented for the BRIM machine.)
I address field: the address of an unconditional branch. (Conditional branches use a jump table.)
[Figure: µ-instruction format. The sequence field, made up of the address and select sub-fields, is followed by the micro-op field.]
The Select and Address Sub-Fields (cont.)
I The sequence field is one of the inputs to the address selector.
I The other input is the control word.
I The control word consists of the op-code, and the value of the Z flag.
[Figure: control word format. IRX (16 bits), ZX (1 bit).]
The Select and Address Sub-Fields (cont.)
Address Selector Structure:
[Figure: the address selector. The control word passes through the mapper to address the µCROM. A two-input multiplexer, controlled by the select field, chooses between the µCROM output and the address field, and its output is loaded into the µAR.]
I The µCROM (micro control ROM) contains the jump table.
I For a conditional branch, the control word is hashed into an address by the mapper, and that jump table entry is used as the address of the next µ-instruction.
I For an unconditional branch, the address field from the µ-instruction is used as the address of the next µ-instruction.
Micro Architectures
I The µDec takes the µ-op field of the µ-instruction, and translates it into control signals that are sent to the data-path.
I There are several ways of implementing the µ-op field.
    I Direct control: the µ-op field is simply composed of the control signals.
    I Vertical control: the data-path is split into several units. The µ-op field specifies an op-code for each of the units.
    I Horizontal control: the µ-op field is composed of bits that correspond to individual µ-operations in the processor. This is a compromise between vertical control, and direct control.
Micro Architecture (cont.)
Examples (Simple Machine)
A single operation:
    M[TP + 1] ← M[TP] + M[TP + 1], TP ← TP + 1
The µ-instructions:
    AR ← TP
    X ← M[AR], AR ← AR + 1
    Y ← M[AR], TP ← TP + 1
    M[AR] ← X + Y
Micro Architecture (cont.)
Names, and control for each µ-operation:
Code Name  Micro-Inst.      Control
Reg1       AR ← TP          ARLD, STP
Mem1       X ← M[AR]        XLD, ME
Reg2       AR ← AR + 1      INAR
Mem2       Y ← M[AR]        YLD, ME
Reg3       TP ← TP + 1      INTP
Mem3       M[AR] ← X + Y    MW, SADD
We have split the operations into two categories: Reg (register operations), and Mem (memory operations). Notice that no two register operations are performed in the same µ-instruction, and no two memory operations are performed in the same µ-instruction.
Direct Control
In direct control, the µop field contains one bit for each control line. For our sample machine this would result in a 9-bit field.
ARLD  STP  XLD  ME  INAR  YLD  INTP  MW  SADD
0     0    1    1   1     0    0     0   0
(This sample shows the µop field for the µ-instruction X ← M[AR], AR ← AR + 1.)
Notice that the µDec is just a straight-line connection between the µ-instruction, and the control lines, for direct control.
Horizontal Control
In horizontal control, each µ-operation has a bit in the µop field. We start by giving each µ-operation a name. (Notice that this, in general, shrinks the size of the µop field.)
Bit Name  Micro-Op.
TP2AR     AR ← TP
RdX       X ← M[AR]
ARInc     AR ← AR + 1
RdY       Y ← M[AR]
TPInc     TP ← TP + 1
WtXY      M[AR] ← X + Y
Horizontal Control (cont.)
The resulting µop field, for our simple machine, is 6 bits wide. We show the same sample µ-instruction.
TP2AR  RdX  ARInc  RdY  TPInc  WtXY
0      1    1      0    0      0
The µDec is no longer a simple direct connection. It must translate between µ-operations, and control signals.
Horizontal Control (cont.)
The Horizontal control µDec:
Interface:
[Figure: the µDec. Inputs: TP2AR, RdX, ARInc, RdY, TPInc, WtXY. Outputs: ARLD, XLD, STP, ME, INAR, YLD, INTP, MW, SADD.]
Control:
Output  Inputs
ARLD    TP2AR
STP     TP2AR
XLD     RdX
ME      RdX + RdY
INAR    ARInc
YLD     RdY
INTP    TPInc
MW      WtXY
SADD    WtXY
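The control table of the horizontal µDec can be sketched in Python as a set of OR terms. The names `UDEC` and `decode` are hypothetical; each output is raised when any of its listed µ-operation bits is set.

```python
# Each control output is the OR of the µ-operation bits listed for it.
UDEC = {
    "ARLD": {"TP2AR"},
    "STP": {"TP2AR"},
    "XLD": {"RdX"},
    "ME": {"RdX", "RdY"},
    "INAR": {"ARInc"},
    "YLD": {"RdY"},
    "INTP": {"TPInc"},
    "MW": {"WtXY"},
    "SADD": {"WtXY"},
}


def decode(active_bits):
    """Return the set of control signals raised by the active µ-operation bits."""
    active = set(active_bits)
    return {output for output, inputs in UDEC.items() if inputs & active}


# The sample µ-instruction X <- M[AR], AR <- AR + 1
print(sorted(decode({"RdX", "ARInc"})))
```

For the sample µ-instruction this raises XLD, ME, and INAR, the same three lines that are set in the direct control encoding.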
Vertical Control
I The µop field is divided into sub-fields, corresponding to groups of µ-operations.
I We group µ-operations. Typically the grouping is based on data-path devices.
I We assign each µ-operation in a group an op-code. To perform a particular µ-operation, the op-code is stored in the sub-field belonging to that group.
I You must verify that no two µ-operations in a group are performed simultaneously.
I Groups for our example machine:
    I Micro-operations associated with register manipulation.
    I Micro-operations associated with memory manipulation.
Vertical Control (cont.)
The names previously given to the µ-operations, Reg1, Reg2, Reg3, Mem1, Mem2, and Mem3, give the group, and code for each µ-operation.
Instruction format, and codes for the sample µ-instruction (Reg2 and Mem1):
Reg  Mem
10   01
(Notice that the width of the µop field decreases from the horizontal control.)
Vertical Control (cont.)
The vertical µDec.
I The two sub-fields of the µop field are decoded into trigger lines for the horizontal machine.
I The trigger lines are fed into a horizontal µDec, which decodes them into control signals for the data-path.
[Figure: the vertical µDec. The 2-bit Reg and Mem sub-fields each feed a decoder with outputs 1, 2, and 3. The Reg decoder drives the trigger lines TP2AR, ARInc, and TPInc, and the Mem decoder drives RdX, RdY, and WtXY. The trigger lines feed the horizontal µDec, which drives the 9 control lines of the data-path.]
(Notice that as we move from direct control to vertical control, the size of the µop field, and therefore the µROM, decreases. However, the complexity of the µDec increases.)
Micro-Control for the BRIM Machine
We will be implementing the horizontal control for the BRIM machine, since it is a good compromise in terms of the size of the µROM, and the complexity of the µDec.
Design tasks:
I Specify the contents of the µROM.
I Specify the contents of the µCROM.
I Describe the structure of the mapper.
I Give a structural description of the µDec.
The BRIM Micro-Program
Naming the BRIM µ-operations.
Micro-Operation                            Signal
AR ← PC                                    PC2AR
IR ← M[AR]                                 RdIR
PC ← PC + 1                                PCInc
AR ← IR7−0                                 IR2AR
SR ← M[AR]                                 RdSR
SR ← if IR7 then IR6−0 else R[IR6−4]       SRFtch
R[IR10−8] ← SR                             SR2R
M[AR] ← R[IR10−8]                          WtR
DR ← R[IR10−8]                             R2DR
R[IR10−8] ← SR + DR                        RAdd
Z ← (SR + DR) = 0                          ZAdd
R[IR10−8] ← SR − DR                        RSub
Z ← (SR − DR) = 0                          ZSub
The BRIM Micro-Program (cont.)
Naming the BRIM µ-operations (cont.).
Micro-Operation                            Signal
R[IR10−8] ← SR ∧ DR                        RAnd
Z ← (SR ∧ DR) = 0                          ZAnd
R[IR10−8] ← SR ∨ DR                        ROr
Z ← (SR ∨ DR) = 0                          ZOr
R[IR10−8] ← ¬DR                            RNot
Z ← ¬DR = 0                                ZNot
PC ← IR7−0                                 IR2PC
R[IR10−8] ← Din                            InR
Dout ← if IR7 then IR6−0 else R[IR6−4]     OutR
The BRIM Micro-Program (cont.)
µOp Format (one bit per µ-operation, left to right):
PC2AR, RdIR, PCInc, IR2AR, RdSR, SRFtch, SR2R, WtR, R2DR, RAdd, ZAdd, RSub, ZSub, RAnd, ZAnd, ROr, ZOr, RNot, ZNot, IR2PC, InR, OutR
We now give the layout of the µROM.
The BRIM Micro-Program (cont.)
Columns: Phs (phase), Loc (location), Add (address field), Sel (select field), then the µ-op bits in the order PC2AR, RdIR, PCInc, IR2AR, RdSR, SRFtch, SR2R, WtR, R2DR, RAdd, ZAdd, RSub, ZSub, RAnd, ZAnd, ROr, ZOr, RNot, ZNot, IR2PC, InR, OutR.
Phs   Loc    Add    Sel  µ-op bits
F0    00000  00001  0    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F1    00001  00000  1    0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
MD2   00010  00011  0    0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
MD3   00011  00100  0    0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
M4    00100  00000  0    0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
MRD2  00101  00110  0    0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
MRD3  00110  00100  0    0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
St2   00111  01000  0    0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
St3   01000  00000  0    0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ad2   01001  01010  0    0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Ad3   01010  01011  0    0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ad4   01011  00000  0    0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
Sb2   01100  01101  0    0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Sb3   01101  01110  0    0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Sb4   01110  00000  0    0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
The BRIM Micro-Program (cont.)
Columns: Phs (phase), Loc (location), Add (address field), Sel (select field), then the µ-op bits in the order PC2AR, RdIR, PCInc, IR2AR, RdSR, SRFtch, SR2R, WtR, R2DR, RAdd, ZAdd, RSub, ZSub, RAnd, ZAnd, ROr, ZOr, RNot, ZNot, IR2PC, InR, OutR.
Phs   Loc    Add    Sel  µ-op bits
An2   01111  10000  0    0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
An3   10000  10001  0    0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
An4   10001  00000  0    0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
Or2   10010  10011  0    0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Or3   10011  10100  0    0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Or4   10100  00000  0    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
N2    10101  10110  0    0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
N3    10110  10111  0    0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
N4    10111  00000  0    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
J2    11000  00000  0    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
I2    11001  00000  0    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
Ot2   11010  00000  0    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
The BRIM Micro-Program (cont.)
I The phase, and location of each µ-instruction are given.
I The location must match the jump table.
I All µ-operations corresponding to the sequencer, C, have been eliminated. The µ-controlled processor does not use the hardwired sequencer.
I µ-instructions consist of the address field, the select field, and the horizontal µop bits.
The BRIM Micro-Program (cont.)
Differences between the µ-controlled machine, and the hardwired machine.
1. The phase M2 has been replaced by two identical micro-instructions: MD2, and MRD2.
2. The phases Bz2, Bz2 ∨ Bz2, Bn2, Bn2 ∨ Bn2, and J2 have all been compressed into a single phase, J2.
Difference 1 is to simplify the jump table. Difference 2 is to make the µ-program smaller.
The BRIM Micro-Program (cont.)
Difference 1 illustrated:
[Figure: two flowcharts for op-code IR15-12 = 0. Hardwired machine: phase M2 (AR <- IR7-0), then a decision on IR11 between direct addressing, MD3 (SR <- M[AR]), and MRD3 (SR <- IR3 ? IR2-0 : R[IR2-0]), then M4 (R[IR10-8] <- SR). µ-controlled machine: the decision on IR11 comes first and selects MD2 or MRD2 (both AR <- IR7-0), followed by MD3 or MRD3, then M4.]
The BRIM Jump Table and Mapper
The mapper converts from the µ-control word to an address into the µCROM. The jump table is used to jump to the code for the machine instruction, from F1.
[Figure: mapper output format. IRX15-11 (5 bits), ZX (1 bit).]
(This mapper only eliminates bits of IRX.)
The BRIM Jump Table and Mapper (cont.)
The jump table.
IRX15−12  IRX11  ZX  Add    Phase
0000      1      X   00010  MD2
0000      0      X   00101  MRD2
0001      X      X   00111  St2
0010      X      X   01001  Ad2
0011      X      X   01100  Sb2
0100      X      X   01111  An2
0101      X      X   10010  Or2
0110      X      X   10101  N2
0111      X      1   11000  J2
0111      X      0   00000  F0
1000      X      0   11000  J2
1000      X      1   00000  F0
1001      X      X   11000  J2
1010      X      X   11001  I2
1011      X      X   11010  Ot2
11XX      X      X   XXXXX
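A lookup in this jump table, with X treated as a don't-care, can be sketched in Python. The names `JUMP_TABLE` and `lookup` are hypothetical. The 11XX row is left out because its address is unspecified, so the sketch returns `None` for those op-codes.

```python
# (IRX15-12, IRX11, ZX, address, phase); "X" matches either bit value.
JUMP_TABLE = [
    ("0000", "1", "X", "00010", "MD2"),
    ("0000", "0", "X", "00101", "MRD2"),
    ("0001", "X", "X", "00111", "St2"),
    ("0010", "X", "X", "01001", "Ad2"),
    ("0011", "X", "X", "01100", "Sb2"),
    ("0100", "X", "X", "01111", "An2"),
    ("0101", "X", "X", "10010", "Or2"),
    ("0110", "X", "X", "10101", "N2"),
    ("0111", "X", "1", "11000", "J2"),
    ("0111", "X", "0", "00000", "F0"),
    ("1000", "X", "0", "11000", "J2"),
    ("1000", "X", "1", "00000", "F0"),
    ("1001", "X", "X", "11000", "J2"),
    ("1010", "X", "X", "11001", "I2"),
    ("1011", "X", "X", "11010", "Ot2"),
]


def lookup(opcode, ir11, zx):
    """Return (address, phase) of the next µ-instruction, or None if unspecified."""
    key = f"{opcode:04b}{ir11:b}{zx:b}"
    for op, mode, z, address, phase in JUMP_TABLE:
        pattern = op + mode + z
        if all(p in ("X", k) for p, k in zip(pattern, key)):
            return address, phase
    return None


print(lookup(0b0000, 1, 0))   # op-code 0000 with IRX11 = 1
print(lookup(0b0111, 0, 1))   # op-code 0111 with the Z flag set
```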
The µDec
Converting from horizontal signals to direct control signals. (Figure out the control for each µ-operation.)
Output Signal  Formula
RLD            SR2R + RAdd + RSub + RAnd + ROr + RNot + InR
MW             WtR
IOLD           OutR
PCLD           IR2PC
DRLD           R2DR
SRLD           RdSR + SRFtch
IRLD           RdIR
ARLD           PC2AR + IR2AR
ZLD            ZAdd + ZSub + ZAnd + ZOr + ZNot
RE             SRFtch + WtR + R2DR + OutR
ME             RdIR + RdSR
IOE            InR
The µDec (cont.)
Output Signal  Formula
SPC            PC2AR
SALU           SR2R + RAdd + RSub + RAnd + ROr + RNot
SAD            IR2AR + IR2PC
SRF            SR2R + WtR + R2DR + RAdd + RSub + RAnd + ROr + RNot + InR
SS             SRFtch + OutR
SD             WtR + R2DR
PCIN           PCInc
ALUOP0         RAnd + ZAnd + RSub + ZSub + RNot + ZNot
ALUOP1         RSub + ZSub + RAnd + ZAnd + ROr + ZOr
ALUOP2         SR2R + RAnd + ZAnd + ROr + ZOr + RNot + ZNot
Comparing µ-Control to Hardwired Control
I Ease of Processor Modification
We might want to modify the processor to add capability, fix bugs, or impose a new architecture. The hardwire-controlled processor must be redesigned from scratch. With the µ-controlled processor, all that is often necessary is to reprogram it.
I Complexity of the Processor Circuitry
With the hardwire-controlled processor, the circuitry must implement the full control flowchart. In the µ-controlled processor, most of the complexity is in the firmware.
I Speed of Machine Instruction Execution
The hardwire-controlled processor is an efficient, custom circuit. The µ-controlled processor design is more general, and it does extensive memory access to fetch the µ-instructions from the µROM.
Chapter 10: A Few Last Topics
We address two limitations of computers:
I Decreasing execution time. Techniques:
  I Cache memory.
  I Instruction pipelining.
I Increasing memory space. Techniques:
  I Virtual memory.
Cache Memory
I A small, fast memory in the processor.
I The processor keeps data that will probably be used soon in the cache unit.
I The prediction of usefulness for data is based on two principles.
I Temporal Locality. If a particular piece of data has just been accessed, it will probably be accessed again soon.
I Spatial Locality. If a particular piece of data has just been accessed, data close to it will probably be accessed soon.
I We study one caching method.
Direct Mapped Cache
I Data is always placed in a fixed location in cache.
I Example:
  I DRAM is 64 × 8.
  I Cache is a 4 × 20 SRAM unit.
I Each word of the cache is called a cache entry.
I When a word is fetched into cache, its neighbor is also fetched. (Each entry contains two words of DRAM.)
I Placing a word: Use its address modulo 4. (Address 010110 would be placed at SRAM location 10.)
Direct Mapped Cache (cont.)
I This location might be already in use. (For example, DRAM address 000110 would map to location 10 also.) This is called a cache conflict.
I A cache conflict may occur even if there are unoccupied entries.
I Use: When the processor needs a datum, it first looks to see if it is in cache. If so (a cache hit), it is fetched from cache. If not (a cache miss), it is fetched from DRAM, and placed in cache.
I We need a way of determining if a word is in its cache entry, or not.
Direct Mapped Cache (cont.)
Details of the cache entry.
Entry  V  Tag  Data (offset 0)  Data (offset 1)
00     1  101  10100001         00101111
01     1  011  10000001         00011100
10     1  110  11110111         00001000
11     0  000  01100110         10011001
I The Data field stores two words from DRAM.
I The V and tag fields are bookkeeping fields used to determine what locations are stored in the entry.
Direct Mapped Cache (cont.)
I The data field contains two DRAM words.
I Each DRAM word has a different offset: one at offset 0, and one at offset 1.
I When a DRAM word is fetched, so is the word at its offset pair.
I The V bit indicates that an entry is occupied.
Direct Mapped Cache (cont.)
Cache addressing: The DRAM address is split into fields, and used to address the cache unit.
(Figure: the address fields, from most to least significant: tag, 3 bits; index, 2 bits; offset, 1 bit.)
I The offset field is used to access the correct data word in the entry.
I The index is the entry number in the SRAM.
I The tag is stored in the entry, to uniquely identify the DRAM words in the cache entry.
Direct Mapped Cache (cont.)
Examples
For DRAM address 00011100, the offset is 0, the index is 10, and the tag is 00011.
To determine if this word is in the cache, the entry 10 is examined. If the entry’s V bit is set, and its tag is 00011, then it is a cache hit. Otherwise it is a cache miss.
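The field split in this example can be checked with a few lines of Python. This is a minimal sketch. The function name split_address is hypothetical, and the field widths (2 index bits, 1 offset bit) are taken from the example above.

```python
def split_address(address_bits, index_bits, offset_bits):
    """Split a binary address string into (tag, index, offset) bit strings."""
    tag_end = len(address_bits) - index_bits - offset_bits
    index_end = tag_end + index_bits
    tag = address_bits[:tag_end]              # leading bits
    index = address_bits[tag_end:index_end]   # middle bits: the cache entry number
    offset = address_bits[index_end:]         # trailing bits: the word within the entry
    return tag, index, offset
```

Calling split_address("00011100", 2, 1) reproduces the tag, index, and offset given in the example.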
Direct Mapped Cache (cont.)
Procedure for a read operation:
1. Use the index to locate the cache entry corresponding to the address.
2. If the V bit is clear, this read results in a cache miss. The word and its offset pair are read from DRAM into the cache unit.
3. If the tag does not match the tag of the entry, this is also a miss.
4. Otherwise, the read results in a hit. The offset of the DRAM address specifies which of the two data words contains the corresponding data.
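The read procedure above can be sketched as a small simulation. This is an illustrative Python model, not a description of any particular hardware. The class name DirectMappedCache and its fields are hypothetical, DRAM is modeled as a Python list, and the defaults (4 entries, 2 words per entry) follow the 64 × 8 example.

```python
class DirectMappedCache:
    """Read-only model of a direct mapped cache (writes are not modeled)."""

    def __init__(self, dram, index_bits=2, offset_bits=1):
        self.dram = dram
        self.index_bits = index_bits
        self.offset_bits = offset_bits
        # One entry per index: V bit, tag, and a block of data words.
        self.entries = [{"v": 0, "tag": 0, "data": []}
                        for _ in range(1 << index_bits)]

    def read(self, address):
        """Return (word, hit) for the given DRAM address."""
        words = 1 << self.offset_bits
        offset = address & (words - 1)
        index = (address >> self.offset_bits) & ((1 << self.index_bits) - 1)
        tag = address >> (self.offset_bits + self.index_bits)
        entry = self.entries[index]                      # step 1
        hit = entry["v"] == 1 and entry["tag"] == tag    # steps 2 and 3
        if not hit:
            # Miss: load the word and its offset pair from DRAM.
            base = address - offset
            entry.update(v=1, tag=tag, data=self.dram[base:base + words])
        return entry["data"][offset], hit                # step 4
```

With dram = list(range(64)), reading address 0b010110 misses, reading its neighbor 0b010111 then hits, and reading 0b000110 misses again because it conflicts with the same entry.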
Direct Mapped Cache (cont.)
Determining address field sizes for a cache.
Examples
I DRAM size: $2^m \times n$ (A DRAM address is m bits.)
I Number of words in the data field: $2^w$
I Cache length: $2^k$
I Offset field: w bits
I Index field: k bits
I Tag field: m − k − w bits
Direct Mapped Cache (cont.)
Cache entry width:
I Size of tag: m − k − w
I Size of V bit: 1
I Size of data: $n \times 2^w$
I Total: $2^w \cdot n + m - k - w + 1$
Examples
I A DRAM that is of size 1K × 8 ($2^{10} \times 8$).
I A cache with eight ($2^3$) data words per entry, and 16 ($2^4$) entries.
Size of tag: 10 − 4 − 3 = 3
Width of entry: 8 · 8 + (10 − 4 − 3) + 1 = 68
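The field sizes follow mechanically from m, n, w, and k, so they are easy to compute. The sketch below assumes, as in the worked example, that the data field holds 2^w words of n bits each. The function name cache_layout is hypothetical.

```python
def cache_layout(m, n, w, k):
    """Field sizes, in bits, for a direct mapped cache.

    m: DRAM address bits, n: DRAM word width,
    w: log2(words per entry), k: log2(number of entries).
    """
    tag = m - k - w
    data = n * 2 ** w          # 2^w words of n bits each
    return {
        "offset": w,
        "index": k,
        "tag": tag,
        "entry_width": data + tag + 1,   # data + tag + V bit
    }
```

For the 1K × 8 DRAM above, cache_layout(10, 8, 3, 4) gives a 3-bit tag and a 68-bit entry. For the earlier 64 × 8 DRAM, cache_layout(6, 8, 1, 2) gives the 20-bit entry of the 4 × 20 SRAM.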
Writing to Cache
I Writes are always performed on the cache entry.
I When a cache entry must be replaced because of a conflict, we must ensure that the changes have been written to DRAM.
I A cache entry that has changes is called dirty. A cache entry that has not been changed is called clean.
I Writing dirty entries to DRAM:
  I Write-back: the dirty cache entry is written to DRAM when the cache entry is replaced, due to a conflict.
  I Write-through: the dirty cache entry is written to DRAM immediately after each change.
  I Comparison: Write-back keeps the SRAM and DRAM inconsistent for long periods of time. Write-through requires many DRAM accesses, which decreases the cache performance.
Cache Performance
Measuring performance.
Hit and miss ratios:
$R_h = \frac{N_h}{n}, \quad R_m = 1 - R_h$
where n is the number of memory accesses, and $N_h$ is the number of hits.
Expected hit and miss rates:
$h = E(R_h), \quad m = 1 - h$
Expected memory access time, on a machine with cache:
$T_A = T_h + (1 - h) \cdot T_m$
Expected time to access memory on a machine without cache:
$T_B = T_m$
Cache Performance (cont.)
Cache performance (comparing the performance of the two machines):
$P_{A,B} = \frac{T_B}{T_A}$
Examples
For Machine A, $T_h = 2$, $T_m = 8$, and $h = 0.75$. The performance of the cached machine compared to the base machine is as follows:
$P_{A,B} = \frac{8}{2 + (1 - 0.75) \cdot 8} = \frac{8}{4} = 2$
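The same calculation can be written as a short Python sketch, which makes it easy to try other hit rates. The function names are hypothetical. The formulas are the ones given on the preceding slides.

```python
def expected_access_time(t_hit, t_miss, hit_rate):
    """T_A = T_h + (1 - h) * T_m, for a machine with cache."""
    return t_hit + (1 - hit_rate) * t_miss


def cache_performance(t_hit, t_miss, hit_rate):
    """P_{A,B} = T_B / T_A, where T_B = T_m for the machine without cache."""
    return t_miss / expected_access_time(t_hit, t_miss, hit_rate)
```

With the values of the example, cache_performance(2, 8, 0.75) evaluates to 2.0.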
Instruction Pipelining
I Pipelining increases throughput: the number of instructions that can be executed in a given time period.
I Pipelining does not change execution speed: the time needed to execute a single instruction.
I Pipelining introduces parallelism into the architecture: the ability to work on several instructions simultaneously.
I Ways of introducing parallelism.
  I Using several processors. This requires divvying up the program among the processors, and devising a method of communication between processors.
  I Using multiple cores. Cores are scaled down processors that are on the same chip, and that share devices, like memory. Multi-core processors have fewer problems with inter-core communication.
  I Using a single data-path and control unit that is designed to process several instructions simultaneously (pipelining).
Instruction Pipelining (cont.)
I Pipelining requires restructuring of the normal bus architecture into a stream architecture.
I In a stream architecture, the data-path is structured as several devices, each feeding its output in as the input to the next downstream device.
I Each device in the stream is referred to as a stage.
I In a simple scenario, each stage does its work in 1 clock cycle.
Instruction Pipelining (cont.)
An example 5-stage pipeline:
(Figure: the stages Ftch, Dec, Ex, Mem, and WtB in sequence, together with the Memory, Reg, and ALSU units that they use.)
Instruction Pipelining
Pipeline Stages:
1. Ftch: Fetch an instruction from memory.
2. Dec: Decode the instruction, and fetch operands.
3. Ex: Execute the instruction.
4. Mem: Read/write from/to memory, if needed.
5. WtB (Write-back): Write result to the register file, if necessary.
Clocking.
I The clock period must be long enough to accommodate the slowest stage.
I For an s-stage pipeline, an instruction will finish in s cycles.
Instruction Pipelining (cont.)
I We plan on running all stages simultaneously.
I Resources, like the memory unit, used by the pipeline must be specially designed to handle multiple requests simultaneously, from several stages.
I In the pipelining scheme, all stages work on different instructions.
Example with consecutive instructions, I1, I2, I3, I4, and I5.
Cycle  Ftch  Dec  Ex  Mem  WtB
0      I1
1      I2    I1
2      I3    I2   I1
3      I4    I3   I2  I1
4      I5    I4   I3  I2   I1
5            I5   I4  I3   I2
6                 I5  I4   I3
7                     I5   I4
8                          I5
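The occupancy table for an ideal pipeline, with no stalls, can be generated rather than drawn by hand. The following Python is a minimal sketch under that assumption. The names STAGES and pipeline_schedule are hypothetical.

```python
STAGES = ["Ftch", "Dec", "Ex", "Mem", "WtB"]


def pipeline_schedule(instructions, num_stages=len(STAGES)):
    """Return one row per cycle. Each row lists the instruction in each stage."""
    total_cycles = len(instructions) + num_stages - 1
    table = []
    for cycle in range(total_cycles):
        row = []
        for stage in range(num_stages):
            i = cycle - stage  # instruction i entered the first stage at cycle i
            row.append(instructions[i] if 0 <= i < len(instructions) else "")
        table.append(row)
    return table
```

For the five instructions of the example, the table has 9 rows, and the row for cycle 4 is the only one in which all five stages are busy.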
Problems with Pipelines
Problems:
I Data hazards. A result of an unfinished machine instruction is needed by a later machine instruction.
I Branch hazard. A result from a conditional branch instruction is needed to determine which instruction to fetch next.
Examples (Data Hazard)
add R0, R1, R2
mult R2, R0, R2
Cycle  Ftch  Dec   Ex   Mem  WtB
0      Add
1      Mult  Add
2            Mult  Add
(Notice that when the Mult instruction is fetching R2, the Add instruction has not stored its result in R2.)
Problems with Pipelines (cont.)
One solution, although it slows the pipeline, is to inject a 3-cycle stall between the two instructions.
Cycle  Ftch  Dec   Ex   Mem  WtB
0      Add
1            Add
2                  Add
3                       Add
4      Mult                  Add
5            Mult
(Notice that the Add instruction has completed writing its result by the time the Mult instruction is fetching operands.)
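The effect of a stall on the schedule can be sketched by delaying the fetch of the dependent instruction. In the Python below, the function name stalled_schedule and the (name, stall) pair format are hypothetical. The stall count is supplied by hand, since hazard detection is not modeled.

```python
def stalled_schedule(program, num_stages=5):
    """Build a stage-occupancy table for a non-empty program.

    program: list of (name, stall) pairs with unique names, where stall is
    the number of empty cycles injected before the instruction is fetched.
    """
    fetch_cycle = {}
    cycle = 0
    for name, stall in program:
        cycle += stall              # wait out the stall
        fetch_cycle[name] = cycle   # the instruction enters the first stage here
        cycle += 1
    total_cycles = max(fetch_cycle.values()) + num_stages
    table = [[""] * num_stages for _ in range(total_cycles)]
    for name, start in fetch_cycle.items():
        for stage in range(num_stages):
            table[start + stage][stage] = name
    return table
```

With program = [("Add", 0), ("Mult", 3)], the Mult instruction is fetched in cycle 4, the cycle in which Add reaches the last stage, and it reaches the decode stage in cycle 5.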
Problems with Pipelines (cont.)
Examples (Branch Hazard)
beq R0, R1, xyz
Cycle  Ftch  Dec  Ex   Mem  WtB
0      Beq
1      I1    Beq
2      I2    I1   Beq
3      I3    I2   I1   Beq
4      I4    I3   I2   I1   Beq
I Notice that by the time that the Beq instruction has written its result to the PC, the next 4 instructions have entered the pipeline.
I If the branch is taken, these instructions should not have been started.
I Again, this can be solved by inserting a 4-cycle stall after the Beq instruction.
Pipeline Performance
We compare a pipelined machine to one that is not.
Expected time to execute a sequence of n instructions on a k-stage pipeline:
$T_k = (n - 1) + k$
(It is assumed that the pipeline is operated at full capacity.)
Expected time to execute n instructions on a single-stage machine:
$T_1 = n \cdot k$
Performance:
$P_k = \frac{T_1}{T_k}$
Pipeline Performance (cont.)
Examples (A 4-stage machine executing n = 1,000 instructions)
$T_k = (1{,}000 - 1) + 4 = 1{,}003$
$T_1 = 1{,}000 \cdot 4 = 4{,}000$
$P_k = \frac{4{,}000}{1{,}003} \approx 3.988$
(The pipelined machine is about 4 times as fast.)
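The performance formulas translate directly into code. This Python sketch uses hypothetical function names and the same full-capacity assumption as the formulas. Times are measured in clock cycles.

```python
def pipelined_time(n, k):
    """T_k = (n - 1) + k cycles for n instructions on a k-stage pipeline."""
    return (n - 1) + k


def unpipelined_time(n, k):
    """T_1 = n * k cycles on a single-stage machine."""
    return n * k


def pipeline_performance(n, k):
    """P_k = T_1 / T_k."""
    return unpipelined_time(n, k) / pipelined_time(n, k)
```

For n = 1,000 and k = 4 this reproduces the figures of the example. As n grows, the ratio approaches k.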
Increasing Memory Space
The memory hierarchy extends logical memory from the cache, all of the way to disk storage.
(Figure: the memory hierarchy, from top to bottom: Reg, Cache L1, Cache L2, Memory, Disk.)
Virtual Memory and Paging
I To execute a program that cannot fit in memory, we split it into pieces called pages.
I Pages are brought into memory as needed, and shuffled out when we finish with them. On disk they reside in the swap file.
I Every word in the program has a virtual address: its location in the large virtual memory.
I Every word that is in DRAM has a physical address: its location in the DRAM unit.
I When the processor fetches a word using its virtual address, it must be determined if the word is on disk, or in DRAM.
I If the word is in DRAM (a page hit), it is used from DRAM.
I If the word is not in memory (a page fault), it is read from the swap file and placed in DRAM.
Paging
Examples
I Program size: 16K = $2^4 \times 2^{10}$ (14-bit virtual address)
I Workspace size: 4K = $2^2 \times 2^{10}$ (12-bit physical address)
I Page size: 1K = $2^{10}$ (10-bit page offset)
I The program would be split into 16 pages.
I The workspace is split into frames. Each frame holds 1 page.
I Frames are numbered 0 – 3 (00 – 11)
I We refer to a workspace split up into frames as a frame table.
I The swap file would be 16K words. It would be split into 16 pages.
I Pages would be numbered 0 – 15 (0000 - 1111).
Paging (cont.)
Address translation: for the example machine, the 14-bit virtual address would be split into a 4-bit page number and a 10-bit offset.
(Figure: address translation. The virtual address consists of Page# 0100 and Offset 0101010101. The page table entry for Page# 0100 has V = 1 and Frame# 10, so the physical address consists of Frame# 10 and Offset 0101010101. The workspace holds frames 00 to 11, and the swap file is indexed by Page#.)
Paging (cont.)
I A page table is kept in protected memory. It has an entry for every page of the program. The page table entry for a page number gives its frame number, if it is in DRAM (as determined by the V bit).
I To assemble the physical address of a page that is in DRAM (the V bit is 1), the page number is looked up in the page table, yielding the frame number.
I The physical address is then completed by appending the offset from the virtual address.
I The required word is then accessed from the frame table.
I If a page is not in the frame table (the V bit is 0), the swap file is accessed using the page number, and the page is loaded into the frame table.
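The translation steps above can be sketched in a few lines of Python for the example machine (4-bit page number, 10-bit offset). The names translate and PageFault, and the representation of the page table as a list of (V, frame) pairs, are assumptions made for illustration. Loading the page from the swap file is left to the caller.

```python
OFFSET_BITS = 10  # 1K pages, as in the example machine


class PageFault(Exception):
    """Raised when the requested page is not in DRAM (V bit is 0)."""


def translate(virtual_address, page_table):
    """Translate a virtual address into a physical address.

    page_table: list indexed by page number, holding (v_bit, frame_number).
    """
    page = virtual_address >> OFFSET_BITS
    offset = virtual_address & ((1 << OFFSET_BITS) - 1)
    valid, frame = page_table[page]
    if not valid:
        raise PageFault(page)  # the page must first be loaded from the swap file
    return (frame << OFFSET_BITS) | offset
```

If page 0100 is mapped to frame 10, the virtual address 0100 0101010101 translates to the physical address 10 0101010101.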
Page Replacement
If a new page needs to be loaded into the frame table, but the frame table is full, one of the frames must be written out to the swap file, and the emptied frame must be filled with the new page. This process is called a page swap.
Page replacement strategies:
I Random (RAN) replacement. Randomly choose a frame to replace.
I Least recently used (LRU) replacement. Replace the page that has not been accessed for the longest time.
I First in, first out (FIFO) replacement. Replace the frame that has been sitting in the frame table the longest.
Disk Access
I LRU and FIFO both make valid assumptions about how useful the pages in the frame table are.
I It is, however, possible to create page reference scenarios, which are not obscure, that cause LRU and FIFO to do excessive swaps.
I Random replacement is more immune to this, mostly because of its unpredictability.
I Another problem with paging is that pages are fixed size, and a program is split automatically into pages. Natural programming structures, like loops, might be split between pages. This could cause page swapping as the loop is executed.
Disk Access (cont.)
I In an alternate scheme to paging, segmentation, the user explicitly breaks the program up into variable size blocks, resulting in more natural breaks.
I The problem with segmentation is making full use of the workspace with variable size segments. Unused space in the workspace results in an increase in swapping.
I (It is possible to combine paging and segmentation into a 2-level hybrid compromise system.)
Memory Protection
I In a non-virtual memory system, there is nothing stopping one process from writing or reading to/from another process’s workspace, by specifying an appropriate address.
I In a virtual memory system, any address specified by a program is considered a virtual address, and translated into a location in the program’s own workspace.
I The translation process automatically protects all processes from each other.
I The weakness of the virtual memory system is the page table. A program cannot be allowed to change its own page table.
I The page table is stored in protected memory that can only be accessed by a process running in kernel mode (the OS). Because of this, the page table can only be changed through the interrupt system.
Other Interesting Topics
I/O Structure
I How is data sent to another device: one bit at a time (serial communication), or all at once (parallel communication)?
I The actual circuitry involved in handling interrupts (the interrupt cycle).
I Speeding up memory access. This can be done by out-sourcing transfers to a separate processor, called a direct memory access device (DMA). This allows the CPU to continue work while the DMA works on the transfer asynchronously.
Other Interesting Topics (cont.)
Parallel Architectures
I Systems with several processors
I Adding processors to a task can decrease execution time. However, there are complications.
I A scheme for interprocess communication must be developed.
I Processor synchronization must be addressed.
I Sharing of memory is an issue: is a single memory shared by all processors, is memory divided up among processors (distributed memory), or does each processor have its own unshared memory?