Computer Systems Organization and Architecture Topic 3: Processor Design.

CPU must:
◦ Fetch instructions (suruhan ambil)
◦ Interpret _______ (tafsir _______)
◦ _______ data (_______ data)
◦ Process data (proses data)
◦ Write data (tulis data)

Registers form the highest level of the memory hierarchy (hierarki ingatan)
◦ Small set of high speed storage locations
◦ ______ storage for data and control information

Two types of registers
◦ User-visible
    May be referenced by assembly-level instructions (suruhan paras perhimpunan) and are thus “_______” to the user
◦ Control (kawalan) and _______ registers
    Used to control the operation of the CPU
    Most are not visible to the user

General categories based on function
◦ General purpose (Serba guna)
    Can be assigned a variety of functions
    Ideally, they are defined _______ to the operations within the instructions
◦ _______
    These registers only hold data
◦ Address (Alamat)
    These registers only hold _______ information
    Examples: general purpose address registers, segment pointers, stack pointers, index registers
◦ _______ codes (Kod _______)
    Visible to the user but values set by the CPU as the result of performing operations
    Example code bits: _______, _______, overflow (limpahan)
    Bit values are used as the basis for conditional jump instructions (suruhan lompat bersyarat)

Design trade off (tukar ganti) between general purpose and specialized registers
◦ General purpose registers _______ flexibility in instruction design
◦ _______ purpose registers permit implicit register specification in instructions, which reduces the register field size in an instruction
◦ No clear “best” design approach

How many registers are enough?
◦ More registers permit more operands (kendalian) to be held within the CPU, reducing memory bandwidth requirements to some extent
◦ More registers cause an _______ in the field sizes needed to specify registers in an instruction word
◦ Locality of reference may not support too many registers
◦ Most machines use _______ registers

How big (wide)?
◦ Address registers should be _______ enough to hold the longest address
◦ Data registers should be wide enough to hold most data types
    Would not want to use _______-bit registers if the vast majority of data operations used 16 and 32-bit operands
    Related to width of memory _______ bus
    Concatenate registers together to store longer formats
        B-C registers in the 8085
        AccA-AccB registers in the 68HC11

These registers are used during the _______, decoding (penyahkodan) and _______ of instructions
◦ Many are not visible to the user / programmer
◦ Some are visible but cannot be (easily) modified

Typical registers
◦ _______ counter (PC)
    Points to the next instruction to be executed
◦ _______ register (IR)
    Contains the instruction being executed (most recently)
◦ Memory _______ register (MAR)
    Contains the address of a location in memory
◦ Memory _______ / _______ register (MBR)
    Contains a word of data to be written to memory or the word most recently read
◦ Program _______ word(s)
    Superset of condition code register
    Interrupt masks, supervisory modes, etc.
    Status information

A set of bits
Includes Condition Codes
_______
◦ Contains the sign of the result of the last arithmetic operation
_______
◦ Set when the result is 0
_______
◦ Set if an operation resulted in a carry (addition) into or borrow (subtraction) out of a high-order bit
_______
◦ Set if a logical compare result is equality
_______
◦ Used to indicate arithmetic overflow
Interrupt enable/disable
◦ Used to enable or disable interrupts
Supervisor
◦ Indicates whether the CPU is executing in supervisor or user mode
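As a rough illustration of how the condition-code bits described above could be derived, here is a minimal Python sketch for an 8-bit addition. The function name `add8_flags`, the 8-bit width and the flag names are assumptions made for this example, not part of any particular CPU.

```python
def add8_flags(a, b):
    """Add two 8-bit values and derive sign, zero, carry and overflow bits."""
    raw = (a & 0xFF) + (b & 0xFF)      # unbounded sum, used to detect carry out
    result = raw & 0xFF                # value kept in the 8-bit register
    flags = {
        "sign": (result >> 7) & 1,     # copy of the result's high-order bit
        "zero": int(result == 0),      # set when the result is 0
        "carry": int(raw > 0xFF),      # carry out of the high-order bit
        # overflow: both operands share a sign that the result does not have
        "overflow": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


result, flags = add8_flags(0x7F, 0x01)   # +127 + 1 leaves the signed 8-bit range
```

A conditional jump instruction would then test one or more of these bits, for example branching only when `flags["zero"]` is 1.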

_______ Cycle
◦ May require memory access to fetch operands
◦ _______ addressing requires more memory accesses
◦ Can be thought of as additional instruction ________

Depends on CPU design
In general:
_______
◦ PC contains _______ of next instruction
◦ Address moved to _______
◦ Address placed on address bus
◦ Control unit requests memory read
◦ Result placed on _______ bus, copied to MBR, then to IR
◦ Meanwhile PC _______ by 1
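The register transfers in this cycle can be sketched in a few lines of Python. This is a toy model only: the dictionary-based `state`, the word-addressed `memory` list and the instruction words in it are made up for illustration, and the buses are not modelled separately.

```python
def fetch(state, memory):
    """One fetch cycle on a toy word-addressed machine (hypothetical model)."""
    state["MAR"] = state["PC"]            # address of next instruction moved to MAR
    state["MBR"] = memory[state["MAR"]]   # memory read: the word arrives in MBR
    state["IR"] = state["MBR"]            # instruction copied from MBR to IR
    state["PC"] += 1                      # PC now points at the following word
    return state


memory = [0x1940, 0x5941, 0x2941]         # made-up instruction words
state = {"PC": 0, "MAR": 0, "MBR": 0, "IR": 0}
fetch(state, memory)
```

In real hardware the PC update overlaps with the memory read; the sequential statements here only show the data movement.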

IR is examined
If indirect addressing, indirect cycle is _______
◦ Right most N bits of _______ transferred to _______
◦ Control unit requests memory _______
◦ Result (address of _______) moved to MBR

May take many forms
Depends on _______ being executed
May include
◦ _______ read/write
◦ Input/Output
◦ _______ transfers
◦ _______ operations

_______ _______
Current PC saved to allow resumption after interrupt
Contents of PC copied to MBR
Special memory location (e.g. _______ pointer) loaded to MAR
MBR written to _______
PC loaded with address of interrupt handling routine
Next instruction (first of _______ handler) can be fetched

Prefetch
◦ Fetch accessing main _______
◦ Execution usually does not _______ main memory
◦ Can fetch next instruction during execution of current instruction
◦ Called instruction _______

Improved Performance
◦ But not doubled:
    Fetch usually _______ than execution
        Prefetch more than one instruction?
    Any jump or _______ means that prefetched instructions are not the required instructions
◦ Add more _______ to improve performance

The Central Processing Unit (CPU) is the _______ combination (kombinasi lojik) of the _______ _______ _______ (ALU) and the system’s control unit

In this sub-section, we focus on the ALU and its operation
◦ Overview of the ALU
◦ Data representation (Perwakilan data)
◦ Computer Arithmetic and its hardware implementation

The ALU is that part of the computer that actually performs _______ and _______ operations on data

All other elements of the computer system are there mainly to bring _______ to the ALU for processing or to take _______ from the ALU

Registers are used as _______ and _______ for most ALU operations

In early machines, _______ and _______ determined the overall structure of the CPU and its ALU
◦ Result was that machines were built around a single register, known as the __________ (penumpuk)
◦ The __________ was used in almost all ALU related _________

The _______ and _______ of the CPU and the ALU are improved through increases in the complexity of the hardware
◦ Use _______ register sets to store operands, addresses and results
◦ _______ the capabilities of the ALU
◦ Use special hardware to support _______ of execution between points in a program
◦ _______ functional units within the ALU to permit concurrent operations

Problem: design a minimal cost yet fully functional ALU
◦ What building block components would be included?

Solution:
◦ Only 2 basic _______ are required to produce a fully functional ALU
    A bit-wide _______ _______ unit
    A 2-input _______ gate
◦ NAND is a functionally complete logic operation
◦ Similarly, if you can add, all other arithmetic operations can be derived from addition.
◦ To conduct operations on _______ bit words is clearly tedious (menjemukan)!
◦ Goal then is to develop arithmetic and logic circuitry that is algorithmically _______ while remaining cost effective
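To make the claim about NAND concrete, the following Python sketch builds NOT, AND, OR and XOR from a single `nand` function and then assembles a one-bit full adder from them. The helper names are chosen for this example only; bits are modelled as the integers 0 and 1.

```python
def nand(a, b):
    """2-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def xor_(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def full_adder(a, b, carry_in):
    """One-bit full adder built only from the NAND-derived gates above."""
    partial = xor_(a, b)
    total = xor_(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return total, carry_out
```

Every gate here reduces to calls to `nand`, which is one way to see why a NAND gate plus the ability to add one bit at a time is enough in principle, even though it is tedious in practice.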

_______-_______ format
◦ Positional representation using n bits
◦ Left most bit position is the sign bit
    0 for _______ number
    1 for _______ number
◦ Remaining n-1 bits represent the _______
◦ Range: {-(2^(n-1) - 1), +(2^(n-1) - 1)}
◦ Problems:
    Sign must be considered during arithmetic operations
    Dual representation of zero (-0 and +0)

Ones ______________ format
◦ Binary case of diminished (menyusut) _______ complement
◦ Negative numbers are represented by a bit-by-bit ______________ of the (positive) magnitude (the process of negation)
◦ Sign bit interpreted as in sign-magnitude format
◦ Examples (8-bit words):
    +42 = 0 0101010
    -42 = 1 1010101
◦ Still have a _______ representation for zero (all zeros and all ones)
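The 8-bit examples can be reproduced with a small Python sketch. The function names and the default word size of 8 bits are assumptions for this illustration.

```python
def ones_complement(value, bits=8):
    """Encode a signed integer as a ones' complement bit pattern."""
    mask = (1 << bits) - 1
    if value >= 0:
        return value & mask
    return ~(-value) & mask            # flip every bit of the magnitude

def from_ones_complement(pattern, bits=8):
    """Decode a ones' complement bit pattern back to a signed integer."""
    mask = (1 << bits) - 1
    if pattern >> (bits - 1):          # sign bit set: the number is negative
        return -(~pattern & mask)
    return pattern


plus_42 = format(ones_complement(42), "08b")     # '00101010'
minus_42 = format(ones_complement(-42), "08b")   # '11010101'
```

Both `0b00000000` and `0b11111111` decode to zero with `from_ones_complement`, which is the double representation of zero noted above.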

Twos ______________ format
◦ Binary case of radix complement
◦ Negative numbers, -X, are represented by the pseudo-positive number 2^n - |X|
◦ With 2^n symbols
    2^(n-1) - 1 _______ numbers
    2^(n-1) _______ numbers
◦ Given the representation for +X, the representation for -X is found by taking the 1s complement of +X and adding 1
◦ Caution: avoid confusing the 2s complement _______ (representation) with the 2s complement _______
◦ Converting between two word lengths (e.g., convert an 8-bit format into a 16-bit format) requires a sign extension:
    The _______ bit is extended from its current location up to the new location
    All bits in the extension take on the value of the old _______ bit

    +18 = 00010010 (8 bits)
    +18 = 00000000 00010010 (16 bits)
    -18 = 11101110 (8 bits)
    -18 = 11111111 11101110 (16 bits)
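The negation rule (1s complement plus 1) and the sign extension examples for 18 can be checked with a short Python sketch. The function names `negate` and `sign_extend` are illustrative choices, not standard library calls.

```python
def negate(pattern, bits=8):
    """Two's complement negation: invert every bit, then add 1."""
    mask = (1 << bits) - 1
    return ((~pattern & mask) + 1) & mask

def sign_extend(pattern, from_bits, to_bits):
    """Widen a two's complement pattern by copying its sign bit upward."""
    sign = (pattern >> (from_bits - 1)) & 1
    if sign:
        # fill every new high-order position with 1s
        pattern |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return pattern


minus_18 = negate(0b00010010)                 # 0b11101110
wide_minus_18 = sign_extend(minus_18, 8, 16)  # 0b1111111111101110
wide_plus_18 = sign_extend(0b00010010, 8, 16) # 0b0000000000010010
```

Negating twice returns the original pattern, and the extended values still represent +18 and -18 in the wider format.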

Use of a single _______ adder is the simplest hardware
◦ Must implement an n-repetition for-loop for an n-bit addition
◦ This is lots of _______ for a typical addition

Use a _______ adder unit instead
◦ n full adder units cascaded together
◦ In adding X and Y together, unit i adds Xi and Yi to produce SUMi and CARRYi
◦ Carry out of each stage is the carry in to the next stage
◦ Worst case add time is n times the delay of each unit -- despite the _______ operation of each adder unit -- Order (n) delay
◦ With signed numbers, watch out for _______: when adding 2 positive or 2 negative numbers, _______ has occurred if the result has the _______ sign
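A software model of the cascaded adder makes the stage-by-stage carry chain and the sign check explicit. This is a sketch under the assumption of an 8-bit word; the function name `ripple_add` is made up for the example.

```python
def ripple_add(x, y, bits=8):
    """Add two bit patterns one full-adder stage at a time."""
    carry = 0
    total = 0
    for i in range(bits):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        total |= (xi ^ yi ^ carry) << i              # SUMi
        carry = (xi & yi) | (carry & (xi ^ yi))      # CARRYi feeds stage i+1
    sign_x = (x >> (bits - 1)) & 1
    sign_y = (y >> (bits - 1)) & 1
    sign_sum = (total >> (bits - 1)) & 1
    # operands with the same sign must give a result with that sign
    overflow = sign_x == sign_y and sign_sum != sign_x
    return total, carry, overflow


total, carry, overflow = ripple_add(100, 50)   # 150 does not fit in signed 8 bits
```

The loop runs once per bit position, mirroring the O(n) worst-case delay of the hardware chain.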

Alternatives to the ripple adder
◦ Must allow for the worst case delay in a ripple adder
◦ In most cases, _______ signals do not propagate through the entire adder
◦ Provide additional hardware to detect where carries will occur or when the carry _______ is completed
◦ Carry Completion Sensing Adders use additional circuitry to detect the time when all carries are completed
    Signal control unit that add is finished
    Essentially an ______________ device
    Typical add times are O(log n)
◦ Carry ___________ Adders
    Predict in advance what adder stage of a ripple adder will generate a carry out
    Use prediction to avoid the carry propagation delays -- generate all of the carries at once
    Add time is a _______, regardless of the width, n, of the word -- O(1)
    Problem: prediction in stage i requires information from all previous stages -- gates to implement this require large numbers of inputs, making this adder impractical for even moderate values of n
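The idea of producing every carry directly from the inputs can be sketched with the usual generate (g = x AND y) and propagate (p = x XOR y) signals. This Python model is an assumption-laden illustration: the 4-bit width and the function names are arbitrary, and the nested loops stand in for what would be wide AND-OR gates in hardware.

```python
def lookahead_carries(x, y, bits=4, c0=0):
    """Return [c0, c1, ..., c_bits], each computed from g, p and c0 only."""
    g = [(x >> i) & (y >> i) & 1 for i in range(bits)]     # stage generates a carry
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(bits)]   # stage propagates a carry
    carries = [c0]
    for i in range(bits):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[1]g[0] | p[i]...p[0]c0
        carry = c0
        for k in range(i + 1):
            carry &= p[k]
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            carry |= term
        carries.append(carry)
    return carries

def cla_add(x, y, bits=4):
    """Sum bits use the precomputed carries; no carry ripples between stages."""
    carries = lookahead_carries(x, y, bits)
    total = 0
    for i in range(bits):
        p_i = ((x >> i) ^ (y >> i)) & 1
        total |= (p_i ^ carries[i]) << i
    return total, carries[bits]
```

The number of terms in each carry expression grows with the stage index, which is the fan-in problem mentioned above.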

To perform X-Y, realize that X-Y = X+(-Y)

Therefore, the following hardware is “typical”
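In two's complement, the identity X - Y = X + (-Y) means an adder can subtract by inverting Y and forcing a carry-in of 1. A minimal Python sketch, assuming 8-bit words and a made-up function name:

```python
def subtract(x, y, bits=8):
    """Compute x - y with an adder only: x + (NOT y) + 1, truncated to the word."""
    mask = (1 << bits) - 1
    inverted_y = ~y & mask             # bitwise complement of the subtrahend
    return (x + inverted_y + 1) & mask # the +1 plays the role of the carry-in


difference = subtract(42, 18)          # 24
negative = subtract(18, 42)            # 0xE8, the 8-bit pattern for -24
```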

A number of methods exist to perform integer multiplication
◦ Repeated _______: add the multiplicand to itself “multiplier” times
◦ Shift and add -- traditional “pen and paper” way of multiplying (extended to binary format)
◦ High speed (special purpose) hardware multipliers

_______ addition
◦ Least sophisticated method
◦ Just use adder over and over again
◦ If the multiplier is n bits, can have as many as 2^n iterations of addition -- O(2^n) !!!!
◦ Not used in an _______

Shift and add
◦ Computer’s version of the pen and paper approach:

        1011   (11)
      x 1101   (13)
    ========
        1011
       00000
      101100     Partial products
     1011000
    ========
    10001111   (143)

◦ The computer version accumulates the partial products into a running (partial) sum as the algorithm progresses
◦ Each partial product generation results in an _______ and _______ operation
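The running-sum version can be modelled in Python with three registers and a carry bit. The register names A, Q, M and C follow the common textbook convention and are assumptions here, as is the 4-bit default width.

```python
def shift_add_multiply(multiplicand, multiplier, bits=4):
    """Unsigned shift-and-add multiply; returns the 2*bits-wide product."""
    mask = (1 << bits) - 1
    a = 0                              # A: running partial sum (high half)
    q = multiplier & mask              # Q: multiplier, later the low half
    m = multiplicand & mask            # M: multiplicand
    for _ in range(bits):
        c = 0                          # C: carry out of the addition
        if q & 1:                      # low multiplier bit set: add M to A
            total = a + m
            c = total >> bits
            a = total & mask
        # shift C, A, Q right one position as a single register
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (c << (bits - 1))
    return (a << bits) | q


product = shift_add_multiply(0b1011, 0b1101)   # 11 x 13 = 143
```

The loop always runs exactly `bits` times, in contrast to repeated addition, whose iteration count depends on the multiplier's value.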

Shift and add hardware for unsigned integers

Shift and add flowchart for unsigned integers

To multiply signed numbers (2s ____________)
◦ Normal shift and add does not work (problem in the basic algorithm of no sign extension to 2n bits)
◦ ________ all numbers to their positive magnitudes, multiply, then figure out the correct sign
◦ Use a method that works for both positive and negative numbers
    ________ algorithm is popular (recoding the multiplier)
◦ ________ algorithm
    As in S&A, strings of 0s in the ________ only require shifting (no addition steps)
    “Recode” strings of 1s to permit similar ________
    String of 1s from 2^u down to 2^v is treated as 2^(u+1) - 2^v
    In other words,
    - At the right end of a string of 1s in the multiplier, perform a ________
    - At the left end of the string perform an ________
    - For all of the 1s in between, just do ________
    Hardware modifications required in (Figure shift and add hardware for unsigned integers)
    - Ability to perform ________
    - Ability to perform ________ shifting rather than logical shifting (for sign extension)
    - A flip flop for bit Q_-1
    To determine ________ (add and shift, subtract and shift, shift) examine the bits Q_0 Q_-1
    - 00 or 11: just shift
    - 10: ________ and shift
    - 01: ________ and shift

Booth’s algorithm for multiplication
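A compact Python model of Booth's algorithm, using the A, Q, Q_-1 and M registers from the description above. The function name and the 8-bit default are assumptions for illustration. One known limitation of this n-bit formulation is stated in the docstring: the most negative multiplicand (for example -128 with 8 bits) cannot be negated within n bits, so it is excluded.

```python
def booth_multiply(multiplicand, multiplier, bits=8):
    """Signed multiply via Booth recoding.

    Assumes -2**(bits-1) < multiplicand < 2**(bits-1); the most negative
    multiplicand is not handled by this n-bit sketch.
    """
    mask = (1 << bits) - 1
    a, q, q_1 = 0, multiplier & mask, 0
    m = multiplicand & mask
    for _ in range(bits):
        pair = (q & 1, q_1)
        if pair == (1, 0):             # right end of a string of 1s: subtract
            a = (a - m) & mask
        elif pair == (0, 1):           # left end of a string of 1s: add
            a = (a + m) & mask
        # arithmetic shift right of A, Q, Q_-1 (A keeps its sign bit)
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (a & (1 << (bits - 1)))
    product = (a << bits) | q
    if product >> (2 * bits - 1):      # reinterpret the 2n-bit pattern as signed
        product -= 1 << (2 * bits)
    return product


result = booth_multiply(7, -3)         # -21
```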

Advantages of Booth:
- Treats positive and negative numbers ________
- Strings of 1s and 0s can be skipped over with shift operations for faster ________ time

High performance multipliers
◦ ________ the computation time by employing more hardware than would normally be found in a S&A-type multiplier unit
◦ Not generally found in general-purpose processors due to expense
◦ Examples
    Combinational hardware multipliers
    Pipelined Wallace Tree adders from Carry-Save Adder units

Once you have committed to implementing multiplication, implementing division is a relatively easy next step that utilizes much of the same hardware

Want to find quotient, Q, and remainder, R, such that D = Q x V + R

Restoring division for ________ integers
◦ Algorithm adapted from the traditional “pen and paper” approach
◦ Algorithm is of time complexity O(n) for n-bit dividend
◦ Uses essentially the same ALU hardware as the ________ multiplication algorithm
    Adder / subtractor unit
    ________ wide shift register AQ that can be shifted to the left
    ________ for the divisor
    Control logic

Restoring division algorithm for unsigned integers
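The restoring scheme for unsigned integers can be sketched in Python with the same A, Q and M registers. This is a model under assumptions: the function name and 8-bit default are illustrative, and Python's unbounded integers stand in for the extra sign bit a hardware A register would need.

```python
def restoring_divide(dividend, divisor, bits=8):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << bits) - 1
    a = 0                              # A: partial remainder
    q = dividend & mask                # Q: dividend, gradually replaced by quotient
    m = divisor & mask                 # M: divisor
    for _ in range(bits):
        # shift the combined AQ register left by one position
        a = (a << 1) | (q >> (bits - 1))
        q = (q << 1) & mask
        a -= m                         # trial subtraction
        if a < 0:
            a += m                     # went negative: restore A, quotient bit 0
        else:
            q |= 1                     # subtraction succeeded: quotient bit 1
    return q, a


quotient, remainder = restoring_divide(7, 3, bits=4)   # (2, 1)
```

The results satisfy D = Q x V + R with 0 <= R < V, and the loop runs once per dividend bit.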

For two’s complement numbers, must deal with the ________ extension “problem”

Algorithm:
◦ Load M with divisor, AQ with dividend (using sign bit extension)
◦ ________ AQ left 1 position
◦ If M and A have same sign, A ← A - M, otherwise A ← A + M
◦ Q0 ← 1 if sign bit of A has not changed or (A=0 AND Q=0), otherwise Q0 ← 0 and restore A
◦ Repeat ________ and +/- operations for all bits in Q
◦ Remainder is in A, quotient in Q
    If the signs of the divisor and the dividend were the same, quotient is correct, otherwise, Q is the 2’s complement of the quotient

2’s complement division examples

________ fixed point schemes do not have the ability to represent very large or very small numbers

Need the ability to dynamically ________ the decimal point to a convenient location

Format: +/-M x R^(+/-E)

Significand / mantissas are stored in a ________ format
◦ Either 1.xxxxx or 0.1xxxxx
◦ Since the 1 is required, don’t need to explicitly store it in the data word -- insert it for calculations only

Exponents can be positive or negative values
◦ Use ________ (Excess coding) to avoid operating on negative exponents
◦ ________ is added to all exponents to store as positive numbers

For a fixed n-bit representation length, 2^n combinations of symbols
◦ If floating point ________ the range of numbers in the format (compared to integer representation) then the “spacing” between the numbers must increase
    This causes a ________ in the format’s precision
◦ If more bits are allocated to the exponent, range is ________ at the expense of decreased precision
◦ Similarly, more significand bits increases the ________ and reduces the range
◦ The ________ is chosen at design time and is not explicitly represented in the format
    Small -- smaller range
    Large -- increased range but loss of significant bits as a result of mantissa alignment when normalizing

Problems to deal with in the format
◦ Representation of ________
◦ Over and ________ and how to detect
◦ ________ operations

IEEE 754 format
◦ Defines single and double ________ formats (32 and 64 bits)
◦ Standardizes formats across many different platforms
◦ Radix 2
◦ Single
    Range 10^-38 to 10^+38
    8-bit exponent with 127 bias
    23-bit mantissa
◦ Double
    Range 10^-308 to 10^+308
    11-bit exponent with 1023 bias
    52-bit mantissa

IEEE 754 Formats
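The single-precision layout (1 sign bit, 8-bit exponent with bias 127, 23-bit stored fraction with an implied leading 1) can be inspected from Python using the standard `struct` module. The two function names are chosen for this sketch, and `rebuild_normalized` deliberately handles normalized values only.

```python
import struct

def decode_single(x):
    """Split a number's IEEE 754 single-precision encoding into its fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]   # raw 32-bit pattern
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF     # stored (biased) exponent
    fraction = bits & 0x7FFFFF         # 23 stored fraction bits
    return sign, exponent, fraction

def rebuild_normalized(sign, exponent, fraction):
    """Recompute the value; valid only for normalized numbers (0 < exponent < 255)."""
    significand = 1 + fraction / 2**23         # re-insert the implied leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


fields = decode_single(-6.5)           # -6.5 = -1.625 x 2^2
value = rebuild_normalized(*fields)
```

For -6.5 the stored exponent is 2 + 127 = 129, which shows the bias keeping the exponent field non-negative.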

Floating point arithmetic operations
◦ Addition and subtraction
    ________ significand
    Add or subtract significand
    Post ________
◦ Multiplication
    ________ exponents
    Multiply significand
    Post normalize
◦ Division
    ________ exponents
    Divide significand
    Post normalize
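The step sequences above can be tried on a toy format. In this Python sketch a number is a pair (significand, exponent) meaning significand x 2^exponent, with a normalized significand occupying exactly `P` bits. Everything here is an assumption made for illustration: the 8-bit width, the function names, the restriction to positive values, and truncation instead of rounding. It is not IEEE 754 arithmetic.

```python
P = 8  # significand width in bits (arbitrary toy choice)

def normalize(sig, exp):
    """Post-normalize so the significand occupies exactly P bits."""
    if sig == 0:
        return 0, 0
    while sig >= (1 << P):             # too wide: shift right, raise exponent
        sig >>= 1
        exp += 1
    while sig < (1 << (P - 1)):        # too narrow: shift left, lower exponent
        sig <<= 1
        exp -= 1
    return sig, exp

def fp_add(a, b):
    """Align to the larger exponent, add significands, post-normalize."""
    (sa, ea), (sb, eb) = a, b
    if ea < eb:
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    sb >>= ea - eb                     # alignment discards low-order bits
    return normalize(sa + sb, ea)

def fp_mul(a, b):
    """Add exponents, multiply significands, post-normalize."""
    return normalize(a[0] * b[0], a[1] + b[1])


one_and_half = (0b11000000, -7)        # 192 x 2^-7 = 1.5
one = (0b10000000, -7)                 # 128 x 2^-7 = 1.0
total = fp_add(one_and_half, one)      # (160, -6), i.e. 2.5
```

The right shift during alignment is where significant bits of the smaller operand can be lost.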

In this section, we have focused on the operation of the CPU
◦ Registers and their use
◦ Instruction execution

Looked at the basic concepts associated with computer arithmetic
◦ Number representation
◦ Basic ALU construction
◦ Hardware and software implementations of multiplication and division operations
◦ Floating point numbers and operations

Computer Organization and Architecture, 6th Edition. Stallings, W. Prentice Hall.

Computer Organization and Design. David A. Patterson, John L. Hennessy. Morgan Kaufmann