The Central Processing Unit - Edward...

Chapter 11 – The Central Processing Unit

We now focus on the detailed design of the CPU (Central Processing Unit) of the Boz–5. The CPU has two major components: the Control Unit and the ALU (Arithmetic Logic Unit). The goal of this chapter is to explain the design as it evolves and justify the decisions made as they are taken; not “here it is – take it”, but “here is what I have done and why I chose to do it that way”. The hope is that following this author’s thought process, flawed as it might be, will help the student understand the process of design.

Architecture and Design of the Boz–5 CPU There are a number of ways in which one might approach this chapter. One of the simplest (and perhaps most interesting) would be to design a CPU and then discover what it does. This text follows a more traditional approach of specifying a functional description of the computer architecture and then evolving the implementation of that architecture to respond to the original functional design. Along the way, we might discover that the implementation might suggest fortunate modifications to the functional specification; but this is a side effect.

In a previous chapter we have described the assembly language of the Boz–5. The assembly language forms a large part of the functional specification that we now must attempt to satisfy. This chapter begins by examining each assembly language instruction and showing the implementation details that follow from the necessity to execute that instruction. We first shall discover that a considerable amount of functionality is implied by the necessity to fetch each instruction, independently of the details of its execution.

Along the way, we shall make choices for the implementation. A few are almost random, as if the designer flipped a coin and took the results as binding. Some are required in order to have a consistent design. The overall goal is simplicity in the control unit, even at the cost of additional special-purpose registers in the CPU. Registers are static devices in that they always exist and can be understood easily. Control signals are dynamic events that exist for only one clock pulse; management of these can be difficult.

The central point of this chapter is simple. It is that the design of the CPU is driven by the functional specifications for the computer as represented in its assembly language.

It would be tempting to say that all design decisions are made with full anticipation of the side-effects of the choices made; in other words, perfect foreknowledge. This is not the case. In fact, the original specification had to be changed a number of times in order to avoid complexities that arose in the design at a later point.

We have mentioned the IR (Instruction Register) and the three-bus structure in a previous chapter. We mentioned that buses B1 and B2 would be used to feed results into the ALU and bus B3 would take a result from the ALU and store it in an appropriate register. Each register places its contents on one of B1 or B2 for transmission to the ALU.

Page 335 CPSC 5155 Last Revised October 15, 2008 Copyright © 2008 by Ed Bosworth

Chapter 11 Boz-5 Design of the Central Processing Unit

Program Execution The program execution cycle is the basic Fetch / Execute cycle in which the 32-bit instruction is fetched from the memory and executed. This cycle is based on two registers: PC the Program Counter – a 20-bit address register IR the Instruction Register – a 32-bit data register.

At the beginning of the instruction fetch cycle the PC contains the address of the instruction to be executed next. The fetch cycle begins by reading the memory at the address indicated by the PC and copying the memory into the IR. At this point, the PC is incremented by 1 to point to the next instruction. This is done due to the high probability that the instruction to be executed next is the instruction in the address that follows immediately; program jumps (BRU, BGT, etc.) are somewhat unusual, during these the PC might be given a new value by execution of the instruction.

All instructions share a common beginning to the fetch sequence. The common fetch sequence is adapted to the relative speed of the CPU and memory. We assume that the access time of the memory unit is such that the memory contents are not available on the step following the memory read, but on the step after that. Here is the common fetch sequence.

MAR ← PC send the address of the instruction to the memory Read Memory this causes MBR ← MAR[PC] PC ← PC + 1 cannot access memory, so might as well increment the PC IR ← MBR now the instruction is in the Instruction Register.

At this point, we note that the Boz–5 is simpler than most modern computers in that it lacks an instruction pre-fetch unit. If the design did include an instruction pre-fetch unit, that unit would independently fetch instructions and place them in an instruction queue for use by the execute unit, which might then fetch and execute an instruction in a single step. For such a design, the queue is implemented using a number of fast registers on the CPU chip.

When the instruction is in the IR, it is decoded and the common fetch sequence terminates. After this point, the execution sequence is specific to the instruction. This subsequent execution sequence includes calculation of the EA (Effective Address) for those instructions that take an operand. For the Boz–5, these are the LDR, STR, BR, and JSR instructions.

The next step in the design of the CPU is to specify the microoperations corresponding to the steps that must be executed in order for each of the assembly language instructions to be executed. Before considering these microoperations, we study several topics. the structure of the bus or buses internal to the CPU the functional requirements on the ALU



CPU Internal Bus Structure We first consider the bus structure of the computer. Note that the computer has a number of buses at several levels. For example, there is a bus that connects the CPU to the memory unit and a bus that connects the CPU to the I/O devices. In addition to these important buses, there are often buses internal to the CPU, of which the programmer is usually unaware. We now consider the bus structure in light of the common fetch sequence.

PC ← PC + 1 This microoperation represents the incrementing of the PC to point to the next instruction on the probability that the next instruction will be the next to be executed. Note that this one microoperation places a functional requirement on the ALU – it must implement an addition operation. We shall use the notation add to denote the ALU addition operation (and the control signal that causes that ALU operation) and the all uppercase ADD to denote the assembly language operation.

At this point, we know that there must be at least one bus internal to the CPU so that the contents of the PC can be transferred to the ALU and the incremented value copied back to the PC. We consider a one bus solution and immediately notice a problem. The ALU must have two inputs for the add operation, one for the value of the PC and one for the value 1 used to increment the PC. If we use a single bus solution, we must allow for the fact that only one value at a time may be placed on the bus. We now present a design based on the single bus assumption.

One design would add an increment primitive for the ALU, but we avoid that complexity and base our solution on the add operation only. We need a source of the constant 1, so we create a “1 register” to hold the number. We postulate a two input ALU with a register Z to hold the output. Since the bus can have only one value at a time, we must have a temporary register Y to hold one of the two inputs to the ALU. Here are the microoperations.

CP1: 1 → Bus, Bus → Y

CP2: PC → Bus, add // Result cannot be placed on bus

CP3: Z → Bus, Bus → PC // Bus is now available

We note that the single bus solution is rather slow. We would like another way to do this, preferably a faster one.

The solution we use is to have three buses in the CPU, named B1, B2, and B3. With three buses, we can put one value on each of two buses that serve as input to the ALU and copy the results on the third bus, serving as input to the PC, as follows PC → B1, 1 → B2, add, B3 → PC

37 CPSC 5155 Last Revised October 15, 2008 Copyright © 2008 by Ed Bosworth


More Implications of the Above Design We now discuss explicitly a number of issues that arise as a direct result of the desire to implement the operation to increment the PC as a single simple addition operation, with microinstructions as shown above and repeated here.

PC → B1, 1 → B2, add, B3 → PC

Timing Constraints The first requirement is that the CPU be fast enough to accomplish the operations in the time allowed. A detailed examination of a clock pulse will show the timing requirements.

Figure: Timing Imposed by a Single Clock Cycle

The figure above attempts to show the constraints. The contents of the PC are placed on bus B1 and the contents of the constant register +1 are placed on bus B2 some time after the rise of the clock pulse. Before the rise of the next clock pulse, the new contents for the PC must have been transferred into that register. Note the number of things that must happen within this clock cycle: 1. The contents of the PC and the +1 register must be placed on the two buses, 2. The ALU must have added the contents of its two input buses, 3. The ALU must have placed the results of the addition on its output bus B3, and 4. The contents of B3 must have been transferred into the PC and become stable there.

We now see where the clock rate of a computer comes from. We want the clock rate to be as high as possible so the computer can be as fast as possible. Nevertheless, the clock rate must be slow enough to allow for transfers on the buses and for computation by the ALU. As an example, suppose that the ALU requires 2 nanoseconds to complete its computation. If we allow the CPU one–half cycle to do its work, that means that the whole cycle time cannot be shorter than 4 nanoseconds, and the clock rate cannot exceed 250 megahertz.

The Use of Master–Slave Registers Note that the contents of the PC are incremented within the same clock pulse. As a direct consequence, the PC must be implemented as a master–slave flip–flop; one that responds to its input only during the positive phase of the clock. In the design of this computer, all registers in the CPU will be implemented as master–slave flip–flops.



The Three-Bus Structure As mentioned above, the design of a CPU with three internal data buses allows a more efficient design. We name the buses B1, B2, and B3. The use of these buses is as follows: B1 and B2 are input to the ALU B3 is an output from the ALU

Put another way: B3 is the source for all data going to each register. Each special–purpose register outputs data to one of bus B1 or bus B2. We allocate these registers to buses based partially on chance and partially on the requirement to avoid conflicts; if two data need to be sent to the ALU at the same time they need to be assigned to different buses. When we introduce the eight general–purpose registers, we specify that each of those can output to either bus B1 or bus B2. At times such a register feeds B1, and at other times it feeds B2.

What does the ALU require? The only way to determine what must be placed on each input bus is to examine each assembly language instruction, break it into microoperations, and allocate the bus assignments based on the requirements of the microoperations.

Common Fetch Sequence We repeat the main steps in the common fetch sequence MAR ← PC send the address of the instruction to the memory Read Memory this causes MBR ← MAR[PC] PC ← PC + 1 cannot access memory, so might as well increment the PC IR ← MBR now the instruction is in the Instruction Register.

This sequence of four microoperations gives rise to a remarkable number of requirements for both the ALU and the bus assignments. We first examined the simple microoperation PC ← PC + 1 and investigated the design implications of the requirement to execute this efficiently.

We have already noted the requirement that the ALU have an add control signal associated with the eponymous ALU primitive operation (use your dictionary). We have also noted the requirement that the ALU have two input buses and one output bus, in order to produce the output within one clock cycle.

If the ALU is to produce the sum (PC + 1) in one clock pulse, the PC and the +1 register must be allocated to different buses. The CPU has two buses for input to the ALU: B1 and B2. We allocate the PC to one and, necessarily, the +1 register to the other. We make the bus allocations as follows The PC is allocated to B1, in that it outputs an address to B1. At this moment the allocation is arbitrary.

We allocate the constant +1 to B2, because it is the other available bus. In this 32–bit design, such a register has bit 0 connected to voltage and all other bits connected to ground.

As an aside at this point, we have noted that B3 is used to transfer the results of the addition into the PC. As noted above, the complete set of control signals we have specified is PC → B1, 1 → B2, add, B3 → PC



The Primitives For Data Transfer We now consider the implication of the microoperation MAR ← PC. We have noted that the PC outputs to B1 and that B3 is used to transfer data to all registers. We now consider possibilities for transferring the contents of the PC to the MAR.

One possibility would be for a direct transfer via a data bus dedicated to communication between the Program Counter and the Memory Address Register. Experience in the design of computers and their control units has shown that a direct–connect design is overly complex (see the appendix to this chapter) and that it is better to minimize dedicated data paths and maximize the use of common buses. The design of the Boz–5 follows this approach and uses the three data buses as a shared way to communicate between most of the registers in the CPU. As mentioned earlier, these are B1, B2, and B3.

We have specified the three buses (B1, B2, and B3) in terms of their functionality for the ALU. Let us now define them as used by the registers in the CPU: 1. Buses B1 and B2 communicate data from the registers to the ALU, and 2. Bus B3 communicates data from the ALU to the registers.

Under this design approach, all transfers between any two registers must be passed through the ALU. Specifically this necessitates control signals to connect the buses that input into the ALU (B1 and B2) to the bus that outputs from the ALU (B3). This leads to the definition of ALU primitives to affect the transfer between buses.

We define the two ALU primitives for data transfer tra1 transfer the contents of B1 to B3 tra2 transfer the contents of B2 to B3.

Under this design, the only way for data to get to B3 from B1 is via the ALU. Thus, the requirement to transfer the contents of the PC to the MAR gives rise to the control signals

PC → B1, tra1, B3 → MAR

This is read as “place the PC contents on bus B1, connect bus B1 to bus B3, and then copy the contents of bus B3 into the MAR”.

Since we have mentioned the Memory Address Register, we might as well allocate it a bus so that it can send data to the ALU. We arbitrarily allocate the MAR to bus B1.

We now examine the last microoperation IR ← MBR. We assign the MBR to B2, thus requiring the tra2 primitive, already defined. At this point, we review what we have discovered from these four microoperations by converting them to control signals.

MAR ← PC PC → B1, tra1, B3 → MAR Read Memory READ PC ← PC + 1 PC → B1, 1 → B2, add, B3 → PC IR ← MBR MBR → B2, tra2, B3 → IR

For reasons that will become obvious later, we assign the IR to the bus not assigned to the MBR. As the MBR outputs to bus B2, we allocate the IR to bus B1.



Notation for Control Signals Microoperations correspond to basic steps in program execution that can be executed in one clock pulse. Control signals correspond to those discrete signals that actually cause the microoperations to have effect. We discussed the difference above, when we mentioned the possibility of a control signal IR ← MBR to implement the microoperation IR ← MBR. Control signals are named for the action that each enables; microoperations may correspond to a sequence of control signals that all can be asserted in parallel during one clock pulse.

Consider the following three control signal sequences. They are identical, in that each has the same interpretation and causes the same actions to take place.

MBR → B2, tra2, B3 → IR. B2 ← MBR, tra2, IR ← B3. IR ← B3, tra2, B2 ← MBR.

We use whatever notation that is most convenient. This author prefers the first notation, and will use it almost exclusively. Students may use any of the tree, if the use is consistent.

A First Look At The CPU and Its Buses We now look at the CPU design as it has evolved to this point in response to the requirements imposed by the common fetch sequence.

Figure: Partial CPU Design

Note that the buses B1 and B2 are shown as input to the ALU and that the divided bus B3 is shown as output from the ALU. The convention of drawing bus B3 this way, coming down from the ALU and dividing into two parts, is a convention to facilitate drawing the figures and has no particular significance otherwise.



Another Look at the IR (Instruction Register) We now note that the IR does not communicate with bus B1 in the same way as other registers communicate with the bus structure. In order to understand this difference, we must examine the structure of the IR; specifically what data are placed into it.

Figure: Different Allocations of Bits in the Instruction Register

At this point, the important fact is that only the low order 20 bits are transferred to bus B1. This is due to the fact that only the low order 20 bits are interpreted as an address or data; other bits signify the op–code and other control information, such as register selection. In other words, the only part of the Instruction Register that is passed to the bus system is that part that is used in address computation or as data for the immediate operands. The bits that are used to determine the operation and select registers are passed directly to the control unit.

The reader will note that bits 19 through 17 of the IR are sent to both bus B1 and to the control unit. This is not a duplication, but a simplification in the design. When those bits are used as an address part, the control unit will make no use of them. When they are used by the control unit, they will specify a register number in an instruction that does not use addresses. Bottom line: we may use bits in a register for several distinct purposes.

We now address the issue of how to transfer 20 bits via a 32–bit bus. There are two options: as a sign extended 20–bit two’s–complement integer, or as 32 individual bits with the 20 high order bits set to 0. In order to understand this decision, we examine the seven instructions that will involve one of these transfers. The instructions are the following. LDI Load the (sign extended) value of IR19-0 into the 32–bit register. This allows loading negative values in the range ( – 219) to ( – 1). ANDI Use the 20 bits in IR19-0 as a 20–bit Boolean mask for logical AND with the contents of the 32–bit register. At present, this is not sign–extended. ADDI Add the (sign extended) value of IR19-0 to the 32–bit register. This allows subtraction of constant numbers. LDR Use the unsigned value of IR19-0 to compute a memory address. STR Use the unsigned value of IR19-0 to compute a memory address. BR Use the unsigned value of IR19-0 to compute a memory address. JSR Use the unsigned value of IR19-0 to compute a memory address.



We use a control signal “Extend” to determine how to interpret the 20 low–order bits found in the Instruction Register. The interpretation of this signal is as follows: 1) If Extend = 1, the value of IR19-0 is treated as a 20–bit two’s–complement integer and sign extended into a 32–bit two’s–complement integer. 2) If Extend = 0, the value of IR19-0 is treated as a 20–bit unsigned integer and 0000 0000 0000 ¢ IR19-0 is transferred to the bus.

Figure: Communicate the IR to the Bus

General Purpose Register File We now add the eight general purpose registers to the mix, specifying that each can feed either bus B1 or bus B2. Note that constant register %R0 has no input from bus B3.

Figure: Add the General Purpose Registers



Add The Other Registers

Before we continue, it is prudent to add the other registers to the bus diagram of the CPU. The other registers are introduced now because this author cannot think of a better place to do it. Each of the new registers will be explained at the appropriate time, although all have been discussed briefly in the chapter on the Instruction Set Architecture.

Figure: The Complete Register Set of the Boz–5

There are four new registers introduced here. SP the Stack Pointer, used in calling subroutines and returning from them. Subroutine calls will PUSH the return address onto the stack, and subroutine returns will POP the return address from the stack. Future revisions in this design might add user–callable PUSH and POP to the Instruction Set Architecture.

– 1 the “minus one” constant register is used to decrement the SP on POP.

+ 1 the “plus one” constant register is used to increment the SP (Stack Pointer) on PUSH and to increment the PC (Program Counter) during the fetch cycle.

IOA the 16–bit address used to select the I/O register.

IOD the 32–bit register used for I/O data, either input or output.

We are about to discuss addressing modes as used to access computer memory. In the current design, these do not apply to I/O device registers, which are directly addressed.



Two Addressing Modes: Direct and Indexed We shall soon consider all four addressing modes. For now, we consider the impact of two of the addressing modes on the CPU design. Recall that the address part of the register load and store instructions occupies the lower 20 bits: bits 19 through 0 inclusive. When a LDR (Load Register) or STR (Store Register) instruction is copied into the Instruction Register, the address part is IR19–0. In direct addressing, this is the address to use. In indexed addressing, the address to use is IR19–0 + (R), where (R) denotes the contents of the register specified in IR22-20 to be used as an index register. These addresses go to the MAR, thus Direct Addressing MAR ← IR19–0 Indexed Addressing MAR ← IR19–0 + (R)

At this point, we mention the trick with register 0, actually a standard design practice. Consider the above two descriptions, slightly rewritten. IR22IR21IR20 = 000 MAR ← IR19–0 + 0 IR22IR21IR20 = 001 MAR ← IR19–0 + (%R1) IR22IR21IR20 = 010 MAR ← IR19–0 + (%R2) IR22IR21IR20 = 011 MAR ← IR19–0 + (%R3) IR22IR21IR20 = 100 MAR ← IR19–0 + (%R4) IR22IR21IR20 = 101 MAR ← IR19–0 + (%R5) IR22IR21IR20 = 110 MAR ← IR19–0 + (%R6) IR22IR21IR20 = 111 MAR ← IR19–0 + (%R7)

The trick is to define register %R0 as a constant register containing the constant value 0. With this new design consideration, the microoperation MAR ← IR19–0 becomes the same as MAR ← IR19–0 + (%R0). The advantage of this trick is that the control unit is considerably simplified – always a good thing. As a result, we have only two design options at the control signal level: indexed and indexed-indirect. The effect is given in the following table.

Indexed by %R0 Indexed by another register

Indirection Not Used, IR26 = 0 Direct Indexed

Indirection Used, IR26 = 1 Indirect Indexed-Indirect

Attaching the General-Purpose Registers to the Three Buses The next step here is to decide how to attach the general purpose registers to the bus structure. To do this, we use selectors and control signals. The selectors are three-bit signals generated based on bits in the Instruction Register. B1S Bus 1 Source, a 3-bit selector specifying the register to place on bus B1 when the control signal R → B1 is asserted by the control unit.

B2S Bus 2 Source, a 3-bit selector specifying the register to place on bus B2 when the control signal R → B2 is asserted by the control unit.

B3D Bus 3 Destination, a 3-bit selector specifying the register to copy the contents of bus B3 when the control signal B3 → R is asserted by the control unit.

Here is the figure showing how the eight general purpose registers are connected to the three buses B1, B2, and B3. For simplicity, only a single bit is shown.



Figure: Connecting a Single Bit of Each Register to the Buses

Note that the enable input of the 3-to-8 decoder is connected to the signal B3 → R. When this signal is asserted, the contents of the three-bit selector signal B3D determine which register is to receive the contents of bus B3 and the clock input of each flip-flop in that register is pulsed, thus loading the register. Note that output 0 of the decoder goes nowhere, corresponding to the fact that register %R0 is a constant register that cannot be loaded.

The three-bit selector signals B1S and B2S are always active, so that each of the two 8-to-1 multiplexers always has an output. Each of these outputs is transferred to the corresponding bus only when the corresponding control signal is asserted. For example, we might have B1S = 011, but %R3 is placed on the bus if and only if R → B1 = 1. If R → B1 = 0, either a special-purpose register, such as the IR, is being placed on bus B1 or the bus is not active.

The three–bit selectors B1S, B2S, and B3D are related to the bit fields found in the IR, but not identical to them due to the structure of the instruction set. In order to determine how to generate these three selectors, we must look at the structure of each assembly language instruction that references a general purpose register.



Generation of the Bus Select Signals B1S, B2S, and B3D We now examine the instruction set to determine how we generate the three–bit selectors B1S, B2S, and B3D. These three 3–bit selectors are associated with bits in the IR. In general, the association of these selectors with bits in the Instruction Register (IR) is quite straightforward. For many instructions, the fields are uniformly specified as follows.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 – 0 Op Code I bit Destination

Register Source

Register 2 Source

Register 1 Not used

The general rule is that B3D is determined by bits 25 – 23 of the Instruction Register B2S is determined by bits 22 – 20 of the Instruction Register B1S is determined by bits 19 – 17 of the Instruction Register

We shall soon note a number of variations on this basic format, which is based on the binary register–to–register operations. We begin with a few observations.

1) Bits 19 – 0 of the Instruction Register are used by some instructions in address computation. For these instructions, the selector B1S is not used.

2) We shall provide hardware for generating the selectors even when they are not used. This is much simpler than any restriction based on usage.

3) The instructions that do compute argument addresses can use indexed addressing, in which the contents of a general–purpose register (including %R0) are added to an address from the IR19-0 to compute an effective address. Indexed addresses will be computed using the following sequence of control signals.

IR19-0 → B1, R → B2, add, B3 → MAR.

4) The one exception to the “general rule” is the STR (Store Register) instruction, in which the register denoted by bits 25 – 23 of the IR must be used for the source register. For this instruction, bits 25 – 23 of the IR determine the value of B1S, and B3D is not used. Since the 3–bit value B3D is not used, it is also set to IR25-23.

As the last statement might seem a bit abstract and even arbitrary, we shall examine it in a bit more detail. In order to do this, we must look ahead and notice the format of the instruction.

The STR Instruction

31 30 29 28 27 26 25 24 23 22 21 20 19 – 0 0 1 1 0 1 I

bitSource

Register Index

Register Address

This is the only instruction for which bits 25 to 23 contain a source register. Normally these bits specify the destination for bus 3 (that is the selector B3D). As there is no destination register for this instruction, the control signal B3 → R is never asserted, and there is no need to suppress generation of the selector B3D. The source register must be copied to either bus B1 or bus B2, as those two buses are the only way to communicate data from a register to the ALU. However bus B2 is “claimed” by the index register, so we use bus B1.



Immediate Addressing

31 30 29 28 27 26 25 24 23 22 21 20 19 – 0 Op–Code Destination

Register Source

Register Immediate Argument

Op–Code 00000 HLT Halt 00001 LDI Load Immediate (Does not use Source Register) 00010 ANDI Immediate logical AND 00011 ADDI Add Immediate

In these instructions, the source register most commonly will be the same as the destination register. While there is some benefit to having a distinct source register, the true motivation for this design is that it simplifies the logic of the control unit. For these four instructions, the contents of IR25-23 will always be interpreted as a destination register (generate B3D) and the contents of IR22-20 will always be interpreted as a source register (generate B2S).

The most common immediate instructions will probably be the following. LDI %RD, 0 -- Load the register with a 0. LDI %RD, 1 -- Load the register with a 1 ADDI %RD, %RD, 1 -- Increment the register ADDI %RD, %RD, – 1 -- Decrement the register

A Gap in the Op–Codes Op–Codes 00100 (0x04), 00101 (0x05), 00110 (0x06), and 00111 (0x07) are presently not assigned. This gap has been introduced in order to facilitate design of the control unit.

Input/Output Instructions The design calls for isolated I/O, so it has dedicated input and output instructions. A memory–mapped I/O design would skip the GET and PUT, having dedicated I/O addresses.

Input Op-Code 01000 GET Get a 32–bit word into a destination register from an input. 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 – 0 0 1 0 0 0 Destination

Register Not Used Not Used I/O

Address Output Op-Code 01001 PUT Put a 32–bit word from a source register to an output register. 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 – 0 0 1 0 0 1 Not Used Source

Register Not Used I/O

Address Note that these two instructions use different fields to denote the register affected. This choice will simplify the control circuit. All unused bits are assumed to be 0, but need not be as these bits will be ignored by the control unit.



Return from Subroutine / Return from Interrupt.

31 30 29 28 27 26 – 0 Op–Code Not Used

Op–Code = 01010 RET Return from Subroutine 01011 RTI Return from Interrupt (Not presently implemented) Neither of these instructions takes an argument or uses an address, as the appropriate information is assumed to have been placed on the stack.

Memory Addressing The next four instructions (LDR, STR, JSR, and BR) can use memory addressing. The first two use the memory address for a data copy between a specific register and memory. The next two use the memory address as the target location for a jump. The generic structure of these instructions is as follows.

31 30 29 28 27 26 25 24 23 22 21 20 19 – 0 Op Code I bit Register/Flags Index Address

The contents of bits 25 – 23 depends on the instruction. The Real Reason for %R0 ≡ 0 We now discuss an addressing trick that is one of the real reasons that we have included a general–purpose register that is identically 0. What we are doing is simplifying the control unit by not having to process non-indexed addressing; that is, direct or indirect. Note that bits 22 – 20 of the IR specify the index register to be used in address calculations. When the I-bit (bit 26) is zero, we will call for indexed addressing, using the specified register. Thus the effective address is given by EA = Address + (%Rn), where %Rn is the register specified in bits 22 – 20 of the IR. But note the following If Bits 22 – 20 = 0, we have %R0 and EA = Address + 0, thus a direct address. When the I-bit is 1, we have the same convention. Indexed by %R0, we have indirect addressing, and indexed by another register, we have indexed-indirect addressing. The “bottom line” on these addresses is shown in the table below.

IR22-20 = 000 IR22-20 ≠ 000 IR26 = 0 Indexed by %R0 (Direct) Indexed IR26 = 1 Indirect, indexed by %R0 (Indirect) Indexed-Indirect



Load Register

31 30 29 28 27 26 25 24 23 22 21 20 19 – 0 0 1 1 0 0 I

bitDestination

Register Index

Register Address

Here the I bit can be considered part of the opcode, if desired. 011000 Load the register using direct or indexed addressing 011001 Load the register using indirect or indexed-indirect addressing For a load register operation, bits 25 – 23 specify the destination register. If the destination register is %R0, no register will change value. While this seems to be a “no operation”, it does set the condition codes in the PSR and might be used solely for that effect. We note here that such a programming trick is recommended in a number of text books.

Store Register

31 30 29 28 27 26 25 24 23 22 21 20 19 – 0 0 1 1 0 1 I

bitSource

Register Index

Register Address

Here the I bit can be considered part of the opcode, if desired. 011010 Store the register using direct or indexed addressing 011011 Store the register using indirect or indexed-indirect addressing. For a store register operation, bits 25 – 23 specify the source register. If the source register is %R0, the memory at the effective address will be cleared.

Subroutine Call

31 30 29 28 27 26 25 24 23 22 21 20 19 – 0 0 1 1 1 0 I bit Not Used Index

Register Address

Here the I bit can be considered part of the opcode, if desired. 011100 Call subroutine using direct or indexed addressing 011101 Call subroutine using indirect or indexed-indirect addressing

An earlier design of this computer used conditional subroutine calls, with bits 25 – 23 of the instruction specifying a condition, as they do for the BR instruction. This was rejected as both overly complex and not reflected in the design of commercial computers. All JSR instructions are unconditional; the subroutine is always called.



Branch

31 30 29 28 27 26 25 24 23 22 21 20 19 – 0 0 1 1 1 1 I bit Branch

Condition Index

Register Address

Here the I bit can be considered part of the opcode, if desired. 011110 Branch using direct or indexed addressing 011111 Branch using indirect or indexed-indirect addressing The branch condition code field determines under which conditions the Branch instruction is executed. The eight possible options are. Condition Action 000 Branch Always (Unconditional Jump) 001 Branch on negative result 010 Branch on zero result 011 Branch if result not positive 100 Branch if carry–out is 0 101 Branch if result not negative 110 Branch if result is not zero 111 Branch on positive result

The alert reader will note that most of the condition codes come in pairs; with one exception condition code “1xy” specifies the opposite of condition code “0xy”. This facilitates the design of the hardware to generate the signal “Branch” that will actually determine if the branch is to be taken.

Some authors have taken this symmetry to an extreme, thus having condition 000 for “branch always” and condition “100” for “Not (branch always)”; i.e., “branch never”. The designer of this computer has dismissed the “branch never” instruction as nonsense, and looked around for another useful condition. The best he can do is to select a condition that will facilitate multiple–precision arithmetic.

We shall here anticipate a design decision that will speed up the CPU. There are two options for conditional branches: either the branch is to be taken or it is not to be taken. This will depend on the value of a signal, called “Branch”, that will be generated from the status bits in the PSR (Program Status Register) and the condition codes, listed above.

If Branch = = 1, the branch is always taken. This is always true for condition code 0.

If Branch = = 0, the branch is not taken. This can be the case when the condition code is not 0 and the condition required for branching is not satisfied. When this is the case, the control unit will proceed to fetch the instruction following the branch instruction, and not waste cycles computing an address that is guaranteed not to be used.

We shall see that this action is controlled by the Major State Register, which will be defined in due time.



Binary Register-To-Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 – 0

Op–Code Destination Register

Source Register 2

Source Register 1

Not used

Here the bits IR25-23 specify a destination register and each of IR22-20 and IR19-17 specify a source register. Here the assignments appear obvious: B3D = IR25-23, B2S = IR22-20, and B1S = IR19-17.

Note that subtraction with the destination register set to %R0 becomes a comparison to set the condition codes for a future branch operation.

Opcode = 10101 ADD Addition 10110 SUB Subtraction 10111 AND Logical AND 11000 OR Logical OR 11001 XOR Logical Exclusive OR

Unary Register-To-Register

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 – 0

Op–Code Destination Register

Source Register

Shift Count Not Used

Here bits IR25-23 specify a destination register and IR22-20 specify a source register. In previous instructions, we have used IR22-20 to specify the control B2S, so we continue the practice. Thus we have B3D = IR25-23 and B2S = IR22-22.

Note that bus B1 is not used by these instructions. To simplify the control unit, we arbitrarily make the assignment B1S = IR19–17, even though the assignment will not be used.

Opcode = 10000 LLS Logical Left Shift 10001 LCS Circular Left Shift 10010 RLS Logical Right Shift 10011 RAS Arithmetic Right Shift 10100 NOT Logical NOT (Shift count ignored)

NOTES: 1. If (Count Field) = 0, a shift operation becomes a register move. 2. If (Source Register = 0), the operation becomes a clear. 3. Circular right shifts are not supported, because they may be implemented using circular left shifts. A right circular shift by N bits (0 ≤ N ≤ 31) may be implemented as a circular left shift by (32 – N) bits. No bits are lost. 4. The shift count, being a 5 bit number, has values 0 to 31 inclusive. 5. When the control unit is processing the NOT signal, bits 19 – 0 of the IR are ignored. Specifically, the field called “shift count” is not used. 6. The use of a variable or register to hold the shift count is not supported by this microarchitecture. Use a looping structure with repeated shifts to do this.



Summary The following table summarizes the requirements levied by the instructions on the generation of the control signals B1S, B2S, and B3D.

B1S B2S B3D HLT LDI IR25-23

ANDI IR22-20 IR25-23ADDI IR22-20 IR25-23GET IR25-2PUT IR22-20 LDR IR22-20 IR25-23STR IR25-23 IR22-20 BR IR22-20 JSR IR22-20 RET RTI

Unary Register IR22-20 IR25-23Binary Register IR19-17 IR22-20 IR25-23

We now display a circuit that is compatible with these requirements.

Figure: Generation of Selectors From the IR

Note that B1S = IR25-23 for IR31-27 = 01101 and B1S = IR19-17 otherwise. This will give a value to B1S for a number of instructions that do not use bus B1, but this causes no trouble and yields a simpler control unit. Note that we always have B2S = IR22-20 and B3D = IR25-23.



A Clarification The figure above is a bit busy, so we shall give two different simplifications, one for the STR instruction and one for other instructions.

STR Op–Code = 01101 Here is the effective circuit when IR31-27 = 01101. The selector B3D is not used as the control signal B3 → R is not asserted.

Other Op–Codes Here is the effective circuit for other instructions.



Major States vs. Minor States In this version of the design, the computer will have a control unit for the CPU based on three major states: Fetch, Defer, and Execute. We shall present two designs for the control unit: hardwired and microprogrammed. The hardwired control unit will be based on the major states, each containing four minor states, labeled T0, T1, T2, and T3. In the microprogrammed control unit, the major states will represent logical divisions of the microcode and the minor states will be present only by implication. The design will focus on “single state” execution, meaning that most instructions will execute in the “Fetch” major state, with only the memory-referencing instructions requiring Defer and Execute.

Control Signals We now present a discussion of the control signals for each of the instructions. We begin with a discussion of the common fetch control signals. F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR)

In the above, the student should recall that the parentheses indicate the contents of a register. The notation is perhaps redundant, but we use “(PC)” to refer to the contents of the PC.

At this point, the control unit will attempt to execute the instruction during the T3 phase of the Fetch major state. The only instructions that cannot be executed in this time slot are those four instructions that reference memory: LDR memory address of the argument to be copied into a general-purpose register, STR memory address to receive the contents of a general-purpose register, BR memory address indicating the next instruction for execution, and JSR memory address indicating the location of the subroutine.

For these three instructions only, the Fetch state is defined fully as follows. F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: 000000000000 ¢ IR19-0 → B1, R → B2, add, B3 → MAR.

The operation in F, T3 is the concatenation operator. Here twelve zeroes are appended to the 20-bit address from the IR to produce a full 32-bit address with the twelve high-order bits all set to 0. The hardware has been designed to append these 0 bits during the transfer.

Defer State For these four instructions only, the control unit may cause execution of a Defer state if the “I bit” – IR26 is set to 1. Here is the uniform code for the defer state. The reader will note the two WAIT states. This is due to the fact that our design calls for four minor states per major state and there is nothing else to do in the defer state. D, T0: READ. // Address is already in the MAR. D, T1: WAIT. // Cannot access the MBR just now. D, T2: MBR → B2, tra2, B3 → MAR. // MAR ← (MBR) D, T3: WAIT.



Control Signals for the Boz-5 The control signals are listed in numeric order by Op-Code, with some general comments added as necessary to clarify the control signals.

HLT Op-Code = 00000 (Hexadecimal 0x00) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: 0 → RUN. // Reset the RUN Flip-Flop

LDI Op-Code = 00001 (Hexadecimal 0x01) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: IR → B1, extend, tra1, B3 → R. // Copy IR19-0 as signed integer

In the next instructions, the source register most commonly will be the same as the destination register. While there is some benefit to having a distinct source register, the true motivation for this design is that it simplifies the logic of the control unit.

ANDI Op-Code = 00010 (Hexadecimal 0x02)

F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: IR → B1, R → B2, and, B3 → R. // Copy IR19-0 as 20 bits. // The 20 bits IR19-0 are copied without extension, so we have in reality // 0000 0000 0000 ¢ IR19-0 → B1. This may be changed in a future design.

ADDI Op-Code = 00011 (Hexadecimal 0x03) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: IR → B1, R → B2, extend, add, B3 → R. // Add signed integer

A Gap in the Op–Codes Op–Codes 00100 0x04 00101 0x05 00110 0x06 00111 0x07 are presently not assigned.



The next two instructions will have immediate action with regard to the Input/Output devices. These two instructions should be used only after the status of the I/O device has been tested and the device found to be ready for an I/O transaction.

At present the I/O Address Register, IOA, is a 16–bit register. In the transfer from the 32–bit bus B3, denoted by B3 → IOA, only the 16 low order bits of the bus are copied.

The reader will note that (F, T3) for each of these instructions is a WAIT or No–Operation. This choice is made to isolate the I/O–specific code to the Execute phase. The reader will also note that neither instruction uses the Defer phase. This is due to the simplicity of generation of addresses for the I/O device registers; just put the value into IR15-0.

The observant reader will also note that neither of these instructions is particularly sophisticated, in that neither performs a number of important checks. In particular, the GET operation will input from the addressed register without regard to two important items: 1) that the register actually exists and is an input register, and 2) that the register actually has fresh data in it.

Similarly, the PUT operation will attempt to output data to nonexistent registers or registers that are for input only. In addition, there is no interlock to prevent this instruction from overwriting data previously sent out and not yet processed by the output device.

GET Op-Code = 01000 (Hexadecimal 0x08) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: WAIT.

E, T0: IR → B1, tra1, B3 → IOA. // Send out the I/O address E, T1: WAIT. E, T2: IOD → B2, tra2, B3 → R. // Get the results. E, T3: WAIT.

PUT Op-Code = 01001 (Hexadecimal 0x09) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: WAIT.

E, T0: R → B2, tra2, B3 → IOD // Get the data ready E, T1: WAIT. E, T2: IR → B1, tra1, B3 → IOA. // Sending out the address E, T3: WAIT. // causes the output of data.

The timing assumptions for the PUT operation may soon be revised, but for the moment it is assumed that data are placed into the output data register as soon as its address is placed into the register IOA, and thus onto the I/O address bus.



Subroutine Call and Return The Boz–5 provides the stack–based mechanisms for subroutine call and return that are required to support recursive subroutine and function calls. A full implementation (yet to be designed) would provide for pushing arguments onto the stack prior to subroutine call and popping them from the stack after the return.

If function calls are implemented, functions will return values by use of a dedicated register to hold either the return value or the address of a data structure used to return the values. In this, the design follows that used by the CDC–6400 and CDC–7600.

At this point, the reader might ask why the RET (and associated RTI) instruction are defined before the JSR instruction. Again, the answer lies in the design of the Major State Register. The key feature, which we might as well admit now, is that the four instructions (GET, PUT, RET, and RTI) that execute in Fetch and Execute, without ever entering Defer, all have the prefix “010” for their op–codes.

RET Op-Code = 01010 (Hexadecimal 0x0A) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: WAIT

E, T0: SP → B1, – 1 → B2, add, B3 → SP. // Decrement the SP E, T1: SP → B1, tra1, B3 → MAR, READ. // Get the return address E, T2: WAIT. E, T3: MBR → B2, tra2, B3 → PC. // Put return address into PC

RTI Op-Code = 01011 (Hexadecimal 0x0B) Not yet implemented.

This will not be implemented until a consistent interrupt strategy is designed and implemented.



LDR Op-Code = 01100 (Hexadecimal 0x0C) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: IR → B1, R → B2, add, B3 → MAR. // Do the indexing.

Here the major state register takes control. 1) If the I–bit (bit 26) is 1, then the Defer state is entered. 2) If the I–bit is 0, then the E state is entered.

D, T0: READ. // Address is already in the MAR. D, T1: WAIT. // Cannot access the MBR just now. D, T2: MBR → B2, tra2, B3 → MAR. // MAR ← (MBR) D, T3: WAIT.

Here the transition is automatic from the D state to the E state. E, T0: READ. // Again, address is already in the MAR. E, T1: WAIT. E, T2: MBR → B2, tra2, B3 → R. E, T3: WAIT.

STR Op-Code = 01101 (Hexadecimal 0x0D) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: IR → B1, R → B2, add, B3 → MAR. // Do the indexing.


E, T0: WAIT. E, T1: R → B1, tra1, B3 → MBR, WRITE. E, T2: WAIT. E, T3: WAIT.

We have two comments about the execute phase of the above instruction. 1) In (E, T1), the register feeds bus 1, as bus 2 is allocated to the index register. 2) The sequence of micro–operations in (E, T1) could have been done in any of (E, T0), (E, T1), or (E, T2). The requirement of a one cycle “slack time” after a memory write requires that it be done no later than (E, T2). It is done in T1 to facilitate design of the control signal generation tree.



JSR Op-Code = 01110 (Hexadecimal 0x0E) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: IR → B1, R → B2, add, B3 → MAR. // Do the indexing.


// At this point, the MAR has the target address for the subroutine. // the SP points to the top of the stack. // the PC contains the return address.

E, T0: PC → B1, tra1, B3 → MBR. // Put return address in MBR E, T1: MAR → B1, tra1, B3 → PC. // Set up for jump to target. E, T2: SP → B1, tra1, B3 → MAR, WRITE. // Put return address on stack. E, T3: SP → B1, 1 → B2, add, B3 → SP. // Bump SP for the next PUSH.

Now the Program Counter contains the address of the first instruction in the subroutine and the memory at the top of the stack contains the return address. The Stack Pointer contains the address into which the next address will be placed. M[SP – 1] has the return address.

BR Op-Code = 01111 (Hexadecimal 0x0F) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: IR → B1, R → B2, add, B3 → MAR. // Do the indexing.

Here the Major State Register takes control. If the control signal Branch = 1, then the following is executed. If the control signal Branch = 0, the next instruction is fetched.


E, T0: WAIT. E, T1: WAIT. E, T2: WAIT. E, T3: MAR → B1, tra1, B3 → PC.

Placing an address into the Program Counter causes the instruction at that address to be the next one executed. This is always the way that a branch to a new address is implemented.



Setting the Branch Condition Signals from the PSR are input into an 8–to–1 MUX that uses the branch condition bits to select which signal is to be passed to the single discrete “Branch”. The branch is taken if and only if Branch = 1. This signal is used by the Major State Register to determine the next state. If the state following Fetch is also Fetch, the instruction immediately following the BR is fetched into the Instruction Register and executed; the branch is not taken.

To clarify what will become obvious when we completely discuss the Major State Register, the BR instruction enters the Execute State (possibly following the Defer State) if and only if the signal Branch = 1; that is, if the branch condition specified by IR25-22 is satisfied. If the branch condition is not satisfied, there is no reason to devote clock cycles to the computation of an address that will not be used. As we have a simple mechanism to avoid this extra work, we elect to use it. It is also the case that the results of (F, T3) are not used when the branch condition is not satisfied, but there is no easy way to cut that step short.

Why Use The Signal “Branch”? As indicated above, the use of the signal “Branch” is simple: if it is asserted the branch is taken and if it is not, the branch is not taken and the instruction immediately following the branch instruction is executed. We now explain the use of the multiplexer to generate the single signal “Branch” from the branch condition codes (IR25-22) and the PSR status bits.

The motivation for use of the one signal “Branch” is a desire to reduce the complexity of the control unit. Other designs with which this author is familiar have three separate control signals (“BGT”, “BEQ”, and “BLT”), each of which requires dedicated logic to test it. This results in a proliferation of logic gates for the signal generation tree and more microcode instructions for the microprogrammed implementation; in short a more complex design.

This author greatly favors simplicity in the design of the control unit. As a result, we are using the simpler implementation with the use of one multiplexer (an easy design) and one signal being sent to the control logic.



Unary Register-To-Register These instructions take the contents of one register as input (hence the name “unary”) and copy the result to another register, possibly the same as the source register. Four of these instructions use the barrel shifter for effect. There are four control signals for the shifter.

shift causes the barrel shifter to be activated. R/L if 0, a right shift is taken; if 1, a left shift is taken. C if C = 1 the shift is circular A if C = 0 and A = 1, the shift is arithmetic.

The structure of the barrel shifter is shown below. The lines labeled “Control Signals” refer to the four control signals defined just above.

Figure: The Barrel Shifter

Here are the control signals, listed by instruction. Note that the Shift Count register is hardwired to bits 19 – 15 of the Instruction Register and available for use by the shifter. In the figure above, the 32-bit input to the shift register is indicated by X31-0 and the 32-bit output by Y31-0. We shall discuss the barrel shifter and its connection to the rest of the Arithmetic-Logic unit when we discuss the design of the ALU.



LLS Op-Code = 10000 (Hexadecimal 0x10) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: R → B2, shift, R/L = 1, A = 0. C = 0, B3 → R.

LCS Op-Code = 10001 (Hexadecimal 0x11) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: R → B2, shift, R/L = 1, A = 0. C = 1, B3 → R.

RLS Op-Code = 10010 (Hexadecimal 0x12) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: R → B2, shift, R/L = 0, A = 0. C = 0, B3 → R.

RAS Op-Code = 10011 (Hexadecimal 0x13) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: R → B2, shift, R/L = 0, A = 1. C = 0, B3 → R.

NOT Op-Code = 10100 (Hexadecimal 0x14) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: R → B2, not, B3 → R.

As noted in above, the negate instruction is syntactic sugar, implemented as subtraction from the constant register %R0 ≡ 0. One has two choices other than implementing both subtract and negate as ALU primitives – either to implement the negate and convert subtraction to adding the negated value (thus A – B = A + ( – B) ), or implement the subtract and have negation as subtraction from 0 (thus – B = 0 – B). This design opts for the latter.



Binary Register-To-Register These instructions take the contents of two source registers as input (hence the name “binary”) and copy the result to a destination register. The design allows for the two source registers to be the same and either or both of the source registers to be the same as the destination register. Here are the control signals for these operations.

ADD Op-Code = 10101 (Hexadecimal 0x15) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: R → B1, R → B2, add, B3 → R.

SUB Op-Code = 10110 (Hexadecimal 0x16) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: R → B1, R → B2, sub, B3 → R.

AND Op-Code = 10111 (Hexadecimal 0x17) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: R → B1, R → B2, and, B3 → R.

OR Op-Code = 11000 (Hexadecimal 0x18) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: R → B1, R → B2, or, B3 → R.

XOR Op-Code = 11001 (Hexadecimal 0x19) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: R → B1, R → B2, xor, B3 → R.



Top-Level View of the Arithmetic-Logic Unit

Before we begin the design of the ALU, let us recall that we have seen hints of how it must be organized. In the definition of the assembly language, presented in chapter 7 of this text, we hinted that the ALU would be divided into a number of execution units. In our analysis of the assembly language instructions and translation into control signals, we have specified a number of functions required of the ALU. Let’s list what we require of the ALU.

Function Reason add Need to perform addition. First seen in the need to update the PC, this also supports the ADD assembly language instruction. tra1 Transfer bus B1 contents to bus B3 tra2 Transfer bus B2 contents to bus B3. shift Needed to activate the barrel shifter not Needed to support the assembly language instruction NOT. sub Needed to support the subtract instruction SUB. or Needed to support the assembly language instruction OR. and Needed to support the assembly language instruction AND. xor Needed to support the assembly language instruction XOR.

As indicated above, the ALU will be designed as a collection of functional units, each of which is responsible for the complete execution of only a few machine instructions.

As another study in preparation for the design of the ALU, let us look at the source of data for each of the nine ALU primitives. This study will assist in allocating the primitives to functional units of the Arithmetic Logic Unit. This table has been populated by surveying the control signals for the machine instructions and placing an “X” in the column for an ALU primitive whenever it uses a given bus as a source.

Source tra1 tra2 shift not add sub or and xor

B1 X X X X X X

B2 X X X X X X X X

Table: ALU Primitives associated with data sources.



The above two analyses indicate a simple division of the ALU into four primitive units. TRA / NOT this handles the tra1, tra2, and not primitives, SHIFT this is the barrel shifter; it handles the shift primitives, ADD / SUB this handles addition and subtraction, and LOGICAL this handles the logical operations: or, and, xor.

Here then is the top-level ALU design.

Note that each of buses B1 and B2 feed all of the units except the barrel shifter, which is fed only by bus B3. All units output on bus B3 and are connected by tri-state units, so that bus conflicts do noccur. For this first unit, we have tra1 input from B1 tra2 input from B2 not input from B1

ot

The Logical Unit contains those Boolean functions that take two inputs. These are AND, OR, and XOR. Although the NOT is also a Boolean function, it is more easily placed in the first unit.

The Adder/Subtractor unit is also a binary unit and might be placed with the Logical Unit, except that such a design would appear more complicated than necessary. We design this unit as a standalone module.

The Barrel Shifter accepts input only from bus B2. We have seen its design earlier, with the input labeled X31-0 and the output labeled Y31-0.

Figure: Top-Level ALU Design. The above control signals are generally required to be mutually exclusive in order for the ALU to function correctly. Of the set {tra1, tra2, not, or, and, xor, add, sub, shift} at most one may be active during any clock pulse or the ALU will malfunction. The three shift mode selectors (A, C, and R/L ) may be asserted in any combination (though A = 1 and C = 1 is arbitrarily changed to A = 0 and C = 1) and have effect only when shift = 1.



The TRA / NOT Unit The design of this unit is particularly simple. We present the design for a single bit and note that the complete design just replicates this 32 times. In this and other figures B1K refers to bit K on bus B1, B2K to bit K on bus B2, and B3K to bit K on bus B3, with 0 ≤ K ≤ 31. Note the extensive use of tri-state buffers to connect output to bus B3.

Figure: The TRA/NOT Unit

The Binary Logical Unit The design of this unit is also particularly simple. Again, we show the design for just one bit and note that the complete design replicates this 32 times.

Figure: The Binary Logical Unit



The Add/Subtract Unit To avoid additional complexity, we implement this unit as a 32-bit ripple-carry unit.

Figure: High and Lower Order Bits of an Adder/Subtractor, with Overflow Bit V

The Barrel Shifter Unit All we do here is to attach the barrel shifter to buses B2 and B3.

Figure: The Barrel Shifter Unit



The State Registers We now consider the two state registers used in the hardwired control unit. These are the minor state register (responsible for the four pulses T0, T1, T2, and T3) and the major state register (responsible for transitions between the three major states: F, D, and A).

The Minor State Register The minor state register will be a modulo-4 counter, implemented as a one-hot design using four D flip-flops. Because of the close association, we shall also show the RUN flip-flop.

Figure: The Minor State Register with the Run Flip-Flop

Note that when RUN = 0, the system clock does not reach the minor state register, which remains frozen in its last state until the system is restarted. The Start signal is used to reset the minor state register to T0 = 1. The state register is a four-bit circular shift register.

This style of shift register is called “one hot” because, at any given time, only one flip–flop has value 1 and the rest are set to 0. A design with two flip–flops and a 2–to–4 decoder could perform an equivalent function, though with time delays for the state decoding.

The Major State Register The function of the major state register is to control the execution state of the machine language instructions. The current design has 3 major states: Fetch, Defer, and Execute. The design of this register is simplified by the fact that almost all of the instructions execute in the Fetch cycle. Only eight instructions (GET, PUT, RET, RTI, LDR, STR, BR, and JSR) even enter the Execute state, much less the Defer state. Recall that GET, PUT, RET, and RTI cannot enter the Defer stage and that the others enter it only if IR26 = 1.



In the next table, we examine these instructions closely to determine patterns that will be of use in defining the Major State Register. For all other instructions, the state after Fetch is Fetch again; the instruction completes execution in one major cycle and the next is fetched.

IR31 IR30 IR29 IR28 IR27 IR26 = 0 IR26 = 1 GET 0 1 0 0 0 Execute PUT 0 1 0 0 1 Execute RET 0 1 0 1 0 Execute RTI 0 1 0 1 1 Execute LDR 0 1 1 0 0 Execute Defer STR 0 1 1 0 1 Execute Defer JSR 0 1 1 1 0 Execute Defer BR 0 1 1 1 1 Execute if Branch = 1,

Fetch Otherwise Defer if Branch = 1,

Fetch Otherwise We define two generated control signals, S1 and S2, as follows: 1. If the present state is Fetch and S1 = = 0, the next state will be Fetch. If the present state is Fetch and S1 = = 1, the next state is either Defer or Execute.

2. If the present state is Fetch, S1 = = 1, and S2 = = 0, the next state will be Execute. If the present state is Fetch, S1 = = 1, and S2 = = 1, the next state will be Defer.

3. Automatic rule: If the present state is Defer, the next state will be Execute.

4. Automatic rule: If the present state is Execute, the next state will be Fetch.

This leads to the following state diagram for the Major State Register.

Figure: State Diagram for the Major State Register

A three–state diagram requires two flip–flops for its implementation. To begin this design, we assign two–bit binary numbers, denoted Y1Y0, to each of the major states.

State Y1 Y0F 0 0 D 0 1 E 1 0



The easiest way to implement this design uses two D flip–flops, with inputs D1 and D0. We are now left with only two questions: 1. How to generate the two inputs D1 and D0 from S1, S2, Y0, and Y1. 2. How to generate S1 and S2 from the op–codes.

It will be seen below that the circuitry to generate these signals is quite simple. We first ask ourselves how it came to be so simple when it had the possibility of great complexity. To see what has happened, we examine the evolution of the op–codes for the first 12 instructions.

Op-Code Version 1 Version 2 Version 3 Version 4 00 000 HLT HLT HLT HLT 00 001 LDI LDI LDI LDI 00 010 ANDI ANDI ANDI ANDI 00 011 ADDI ADDI ADDI ADDI 00 100 GET 00 101 PUT 00 110 LDR 00 111 STR 01 000 BR GET GET GET 01 001 JSR PUT PUT PUT 01 010 RET LDR RET RET 01 011 RTI STR RTI RTI 01 100 BR LDR LDR 01 101 JSR STR STR 01 110 RET BR JSR 01 111 RTI JSR BR

In each of these designs, the four “immediate instructions” are allocated the first 4 op–codes, numbered 0 through 3. The original idea was that all such instructions would have op–codes beginning with “000”. This was a good idea, but has yet to be exploited in these designs.

Version 1 of the list of instructions just presented the instructions in the way the author thought them up. The instructions were considered to exist in four groups: GET & PUT; LDR & STR; JSR, RET, & RTI; and BR. They were listed in that order, with the exception that the BR was listed first, because early designs did not allow for subroutine calls. This almost–random order of op–codes yielded a very messy control unit.

Version 2 of this list resulted from the observation that introducing four NOP instructions and moving the instructions beginning with GET down by four would yield the result that all instructions that could leave the Fetch state would have op–codes beginning with “01”. This decision was taken because it introduced a regularity into the pattern of op–codes and this author expected such a pattern to yield a simplification in the circuitry.

Version 3 of the list resulted from the observation that moving the RET and RTI instructions to follow GET and PUT would yield the result that those instructions that might use the Defer state all began with “011”.

1 CPSC 5155 Last Revised October 15, 2008 Copyright © 2008 by Ed Bosworth


Version 4 of the list is a minor modification of version 3. It is a result of the observation that the branch instruction, BR, is the only one that has an additional constraint on its leaving the Fetch state. It leaves Fetch if and only if the signal Branch = = 1. This is a time saving feature that avoids computation of an effective address when the branch is not going to be taken. For this reason, the BR instruction was moved to last in the list.

We now repeat the table that began this discussion and recall the definition of the two generated control signals S1 and S2.

IR31 IR30 IR29 IR28 IR27 IR26 = 0 IR26 = 1 GET 0 1 0 0 0 Execute PUT 0 1 0 0 1 Execute RET 0 1 0 1 0 Execute RTI 0 1 0 1 1 Execute LDR 0 1 1 0 0 Execute Defer STR 0 1 1 0 1 Execute Defer JSR 0 1 1 1 0 Execute Defer BR 0 1 1 1 1 Execute if Branch = 1,

Fetch Otherwise Defer if Branch = 1,

Fetch Otherwise

We define two generated control signals, S1 and S2, as follows: 1. If the present state is Fetch and S1 = = 0, the next state will be Fetch. If the present state is Fetch and S1 = = 1, the next state is either Defer or Execute.

2. If the present state is Fetch, S1 = = 1, and S2 = = 0, the next state will be Execute. If the present state is Fetch, S1 = = 1, and S2 = = 1, the next state will be Defer.

We now see the end result of modification of the op–codes: 1. Only instructions with op–codes beginning with “01” can leave Fetch 2. Only instructions with op–codes beginning with “011” can enter Defer. We now derive the equations for the generated control signals.

S1: We note that S1 is 0 when IR31IR30 ≠ “01”. We also note that S1 is 0 when IR31IR30 = “01”, if Branch = 0 and IR29IR28IR27 = “111”. We could say S1 is 1 when IR31IR30 = “01”, and either Branch = 1 or IR29-27 ≠ “111”. But IR29-27 ≠ “111” is the same as IRIRIR 272829 ++ . Given this observation, we see

S1 = IRIR 3031• •( Branch + IRIRIR 272829 ++ ).

S2: Given that this signal is used only when S1 is 1, we can proceed from two observations. 1. Only instructions with IR29 = 1 can enter the defer state. 2. The defer state is entered by these four instructions only when IR26 = 1.

S2 = IR29 • IR26

As an aside, we note that many textbooks set S2 = IR26, thus saying that all instructions for which the Indirect bit is set will enter the defer state. Our definition of S2 = IR29 • IR26 and our insistence that Defer is entered only when S1•S2 = 1 avoids traps on bad bits.



Design of the Major State Register We now have all we need to complete a design of the major state register.

1. The register will be designed using two D flip–flops, with inputs D1 and D0, and outputs Y1 and Y0. The binary encoding for these states is shown in the table.

State Y1 Y0F 0 0 D 0 1 E 1 0

2. There will be two control signals, S1 and S2, to sequence the register. If the present state is Fetch and S1 = = 0, the next state will be Fetch. If the present state is Fetch, S1 = = 1, and S2 = = 0, the next state will be Execute. If the present state is Fetch, S1 = = 1, and S2 = = 1, the next state will be Defer.

Automatic rule: If the present state is Defer, the next state will be Execute. Automatic rule: If the present state is Execute, the next state will be Fetch.

3. S1 = IRIR 3031• •( Branch + IRIRIR 272829 ++ ). S2 = IR29 • IR26

4. We note that the circuit, when operating properly, never has both D1 = 1 and D0 = 1. Thus we may say that D1 = conditions to move to Execute D0 = conditions to move to Defer

So we have the following equations: D0 = F•S1•S2 D1 = SSF 21•• + D // D = 1 if and only if in the Defer state

Figure: The Major State Register of the Boz–5

Note that the trigger for the transition between major states is T3 from the minor state register. When it is active, the minor state register continuously cycles through its states, and the major state register changes to its next state when triggered.



Instruction Decoder The function of the Instruction Decoder is to take the output of the appropriate bits of the IR (Instruction Register) and generate the discrete signal associated with the instruction. Note that the discrete signal associated with an assembly language instruction has the same name; thus LDI is the discrete signal asserted when the op-code in the IR is 000001, which is associated with the LDI (Load Register Immediate) assembly language instruction.

Figure: The Decoding of IR31-27 into Discrete Signals for the Instructions

The instruction decoder is implemented as a simple 5–to–32 decoder, in that there are five bits in the op–code and a maximum of 32 instructions. To save space outputs 26 – 31 of the decoder are not shown. Also, outputs 4 – 7 of the decoder are not connected to any circuit, indicating that these op–codes are presently NOP’s.



Signal Generation Tree We now have the three major parts of circuits required to generate the control signals. 1) the major state register (F, D, and E), 2) the minor state register (T0, T1, T2, and T3), and 3) the instruction decoder.

Common Fetch Cycle In hardwired control units, these and some other condition signals are used as input to combinational circuits for generation of control signals. As an example, we consider the generation of the control signals for the first three steps of the fetch phase. Note that these signals are common for all machine language instructions, as (F, T2) results in the placing of the instruction into the Instruction Register, from whence it is decoded.

Figure: Control Signals for the Common Fetch Sequence

This figure involves logical signal, asserted to either 0 or 1. Each output of the AND gates should be viewed also as a discrete logic signal, which when asserted as 1 causes an action (indicated by the signal name) to take place. Thus, when F = 1 and T2 = 1 (indicating that the control unit is in step T2 of the Fetch state), then the three signals MBR → B2, tra2, and B3 → IR are asserted as logic 1. The assertion of the signal MBR → B2 as logic 1 causes the contents of the MBR register to be transferred to bus B2. The assertion of signal tra2 to logic 1 causes the contents of bus B2 to be transferred through the ALU and onto bus B3. The assertion of signal B3 → IR to logic 1 causes the contents of bus B3 to be copied into the Instruction Register, also called the IR.

There is one obvious remark about the above drawing. Notice that each of the top two AND gates generates a signal labeled “PC → B1”. At some point in the design, these and any other identical signals are all input into an OR gate used to effect the actual transfer.

The reader will note that we now have terminology that must be used carefully. Consider the machine language instruction with op-code = 10101. There are 3 terms associated with this. ADD the mnemonic for the assembly language instruction associated, and ADD the discrete signal (logic 0 or logic 1) emitted by the instruction decoder, and add the discrete signal emitted by the control unit that causes the ALU to add. The first and second used of the term “ADD” are distinguished by context. Whenever the term is used as a logic signal, it cannot be the assembly language mnemonic.



Defer Cycle We now show the only other part of the signal generation tree that is independent of the machine language instruction being executed. This is the tree for signals associated with the Defer phase of execution. The reader will recall that only three instructions (LDR, STR, and BR) can enter the Defer phase, and then only when IR26 = 1. Note that there are no signals generated for T1 or T3 during the Defer phase, because nothing happens at those times.

Figure: Control Signals for the Defer Major State

The Rest of Fetch We now investigate the control signals issued during step T3 of Fetch for the rest of the instructions. We use the next table to investigate commonalities in the signal generation.

Op–Code B1 B2 B3 ALU Other IR31 IR30 IR29 IR28 IR27

0 0 0 0 0 HLT 0 → RUN 0 0 0 0 1 LDI IR R tra1 0 0 0 1 0 ANDI IR R R and 0 0 0 1 1 ADDI IR R R add 0 1 0 0 0 GET 0 1 0 0 1 PUT 0 1 0 1 0 RET 0 1 0 1 1 RTI 0 1 1 0 0 LDR IR R MAR add 0 1 1 0 1 STR IR R MAR add 0 1 1 1 0 JSR IR R MAR add 0 1 1 1 1 BR IR R MAR add 1 0 0 0 0 LLS R R shift 1, 0, 0* 1 0 0 0 1 LCS R R shift 1, 0, 1 1 0 0 1 0 RLS R R shift 0, 0, 0 1 0 0 1 1 RAS R R shift 0, 1, 0 1 0 1 0 0 NOT R R not 1 0 1 0 1 ADD R R R add 1 0 1 1 0 SUB R R R sub 1 0 1 1 1 AND R R R and 1 1 0 0 0 OR R R R or 1 1 0 0 1 XOR R R R xor

*Shift control signals: L/R’, A, and C; for Left/Right, Arithmetic, and Circular



While we certainly could focus on generation of a set of signals for each of the twenty–two instructions in the above table, we shall use the commonalities displayed by the table to simplify the signal tree considerably. We begin with a consideration of the first four instructions, as these contain an insight that will yield significant reductions in complexity.


0 0 0 0 0 HLT 0 → RUN 0 0 0 0 1 LDI IR R tra1 0 0 0 1 0 ANDI IR R R and 0 0 0 1 1 ADDI IR R R add

The bus allocations for these instructions are obvious, but worth note. The LDI instruction has no allocation for bus B2. Suppose we allocated a general–purpose register to B2. Then some random register would have its contents transferred to bus B2, only to be ignored by the ALU which is executing a tra1 instruction. Now consider the HLT instruction. Here we might also allocate registers to both bus B1 and bus B2, as the ALU is not active and does nothing with its input. Hence the table above might as well be the following.


0 0 0 0 0 HLT IR R 0 → RUN 0 0 0 0 1 LDI IR R R tra1 0 0 0 1 0 ANDI IR R R and 0 0 0 1 1 ADDI IR R R add

One might legitimately ask why not go “whole hog” and allocate a register to B3 for the HLT instruction. The answer is that such an action would cause some random register to become corrupted as it would cause data (possibly all 0’s) to be input to the selected register. It is very likely that register %R0 would be selected, resulting in a NOP, but the designer of a control unit cannot make such assumptions.

Examining the above table, we come to the following conclusions. 1) We use the ALU code to differentiate between the instructions, placing registers on buses B1 and B2 in any way that does not cause problems.

2) The rule for bus B1 is as follows: IR → B1 if IR31 = = 0 and R → B1 if IR31 = = 1. This will cause IR → B1 for the GET, PUT, RET, and RTI instructions, but that is not a problem as the ALU does nothing for these. It will cause R → B1 for the shift and NOT instructions, but that also is not problem as only bus B2 is input to these.

3) The rule for bus B2 is as follows: R → B2 always. The only instructions that do not call for such are HLT, LDI, GET, PUT, RET, and RTI. The last four do nothing in this minor cycle, and the first two are not made to be incorrect by the assignment.

4) The handling of bus B3 is trickier. If IR31-29 = = “011”, we have B3 → MAR. If either IR31 = = 1 or IR31-27 = = “00001”, “00010”, or “00011”, then B3 → R. A bit of Boolean algebra yields the condition for B3 → R as follows: ( IR31 = = 1 ) OR [ ( IR30 = = 0 ) AND ( IR29 = = 0 ) AND ( IR28 + IR27 = = 1 ) ]



Here is the complete signal generation tree for the T3 minor cycle in the Fetch major cycle. Note that the signals output from the AND gates to the right of the tree are control signals that activate actual transfers; thus “IR → B1” causes the contents of the IR (Instruction Register) to be transferred to bus B1.

Figure: Signal Generation Tree for Fetch, T3

Note that the last fourteen entries on the left side of the signal tree are all in upper case letters. Each of these is the control signal generated by the instruction decoder based on the op–code bits in the Instruction Register. The entries in lower case, to the right of the signal tree, are control signals to the ALU.



Study of the Execute Phase The reader will recall that only eight instructions access this state. These instructions are GET, PUT, RET, RTI (Not Implemented), LDR, STR, JSR, and BR. We collect the control signals used by these instructions in the execute phase in an attempt to organize our thinking prior to drawing the signal generation trees for the Execute phase.

At this point, we make two remarks that are only marginally related to this study.

1. The original design for the control signal had both the STR (store register) and BR (branch) issue their final control signal in (E, T2). This lead to the following counts for control signals in Execute: 5, 2, 6, and 2. The final signal for STR could not be later than (E, T2) so it was moved to (E, T1). The final signal for BR could occur any time in the Execute phase, so it was moved to (E, T3). This resulted in the counts: 5, 3, 4, and 3.

2. The name “Execute” for this phase is a bit awkward. The Fetch phase is so named because if fetches the instruction. The Defer phase is so named because it calculates the deferred address. The Execute phase does contain execution logic for eight of the instructions, but the majority of the instructions complete execution in the Fetch phase. There is simply no good name for this phase.

Execute, T0

GET: IR → B1, tra1, B3 → IOA. // Send out the I/O address PUT: R → B2, tra2, B3 → IOD // Get the data ready RET: SP → B1, – 1 → B2, add, B3 → SP. // Decrement the SP LDR: READ. // Again, address is already in the MAR. JSR: PC → B1, tra1, B3 → MBR. // Put return address in MBR

Execute, T1

RET: SP → B1, tra1, B3 → MAR, READ. // Get the return address STR: R → B1, tra1, B3 → MBR, WRITE. JSR: MAR → B1, tra1, B3 → PC. // Set up for jump to target.

Execute, T2

GET: IOD → B2, tra2, B3 → R. // Get the results. PUT: IR → B1, tra1, B3 → IOA. // Sending out the address LDR: MBR → B2, tra2, B3 → R. JSR: SP → B1, tra1, B3 → MAR, WRITE. // Put return address on stack. Execute, T3

RET: MBR → B2, tra2, B3 → PC. // Put return address into P JSR: SP → B1, 1 → B2, add, B3 → SP. // Bump SP BR: MAR → B1, tra1, B3 → PC.



The Execute State We now show the signal generation trees for the Execute State.

Figure: Control Signals for the Execute State

The reader should remember that the Major State Register will enter the Execute State for the BR (Branch) instruction only if the branch condition is true. If the branch is not to be taken, then execution of the BR proceeds directly to Fetch, at which time the next instruction is fetched and executed.



Micro–Programmed Version of the Control Unit We have just shown the signal generation trees for the hardwired version of the control unit. We now present the micro–programmed version of the same unit.

We begin with a summary of the control signals used. This table is just a listing of the signals. At some time later, these signals will be assigned numeric codes and the table shown in another presentation. Note that the first row in the table is unlabeled, reflecting the fact that we must allow for no activity on each of the units.

Bus 1 Bus 2 Bus 3 ALU Other

PC → B1 1 → B2 B3 →PC tra1 L / R’ MAR → B1 – 1 → B2 B3 → MAR tra2 A

R → B1 R → B2 B3 → R shift C IR → B1 MBR → B2 B3 → IR not READ SP → B1 IOD → B2 B3 → SP add WRITE

B3 → MBR sub extend B3 → IOD and 0 → RUN B3 → IOA or xor

Microcoding (microprogramming) is another way of generating control signals. Rather than generating these signals from hardwired gates, these are generated from words in a memory unit, called a micro–memory. To illustrate this concept, consider a simple micro–controller to generate control signals for bus B1.

Figure: A Sample Micro–Memory

Here we see an example, written in the style of horizontal micro–coding (soon to be defined) with one bit in the micro–memory for each of the control signals to be emitted. When the word at micro–address 105 is read into the micro–MBR (the register at the bottom), the control signals generated are PC → B1 = 0, MAR → B1 = 1, R → B1 = 0, IR → B1 = 0, and SP → B1 = 0. Thus, copying micro–word 105 into the Micro–MBR asserts MAR → B1. Similarly, copying micro–word 106 into the Micro–MBR asserts R → B1.



Horizontal vs. Vertical Micro–Code The micro–programming strategy called “horizontal microcode” allows one bit in the micro–memory for each control signal generated. We have illustrated this with a small memory to issue control signals for bus B1. There are five control signals associated with this bus, so this part of the micro–memory would comprise five–bit numbers.

A quick count from the table of control signals shows that there are thirty–four discrete control signals associated with this control unit. A full horizontal implementation of the microcode would thus require 34 bits in each micro–word just to issue the control signals. The memory width is not a big issue; indeed there are commercial computers with much wider micro–memories. We just note the width requirement.

In vertical microcoding, each signal is assigned a numeric code that is unique for its function. Thus, each of the five signals for control of bus B1 would be assigned a numeric code. The following table illustrates the codes actually used in the design of the Boz–5.

Code Signal 000 001 PC → B1 010 MAR → B1 011 R → B1 100 IR → B1 101 SP → B1

It is particularly important that a vertical microcoding scheme allow for the option that no signal is being placed on the bus. In this design we reserve the code 0 for “nothing on bus” or “ALU does nothing”, etc. The three bits in this design are placed into a 3–to–8 decoder, as shown in the figure below. Admittedly, this design is slower than the horizontal microcode in that it incurs the time penalty associated with the decoder.

Figure: Sample of Vertical Microcoding

In this revised example, word 105 generates MAR → B1 and word 106 generates R → B1.



One advantage of encoding the control signals is the unique definition of the signal for each function. As an example, consider both the horizontal and vertical encodings for bus B1. In the five–bit horizontal encoding, we were required to have at most one 1 per micro–word. An examination of that figure will show that the micro–word “10100” would assert the two control signals PC → B1 and R → B1 simultaneously, causing considerable difficulties. In the vertical microcoding example, the three–bit micro–word with contents “011” causes the control signal R → B1, and only that control signal, to be asserted. To be repetitive, the code “000” is reserved for not specifying any source for bus B1; in which case the contents of the bus are not specified. In such a case, the ALU cannot accept input from bus B1.

The design chosen for the microcode will be based on the fact that four of the CPU units (bus B1, bus B2, bus B3, and the ALU) can each have only one “function”. For this reason, the control signals for these units will be encoded. There are seven additional control signals that could be asserted in any combination. These signals will be represented in horizontal microcode, with one bit for each signal.

Structure of the Boz–5 Microcode As indicated above, the Boz–5 microcode will be a mix of horizontal and vertical microcode. The reader will note that some of the encoded fields require 3–bit codes and some require 4–bit codes. For uniformity of notation we shall require that each field be encoded in 4 bits.

The requirement that each field be encoded by a 4–bit binary number has no justification in engineering practice. Rather it is a convenience to the student, designed to remove at least one minor nuisance from the tedium of writing binary microcode and converting it to hex.

Consider the following example, taken from the common fetch sequence.

MBR → B2, tra2, B3 → IR.

A minimal–width encoding of this sequence of control signals would yield the following.

0 000 110 100 010 000 0000 0000 0000 0000 0000.

Conversion of this to hexadecimal requires regrouping the bits and then rewriting.

0000 1101 0001 0000 0000 0000 0000 0000 0000 or 0x0 D100 0000

The four–bit constant width coding of this sequence yields the following.

0000 0000 0110 0100 0010 0000 0000 0000 0000 0000 0000

This is immediately converted to 0x006 4200 0000 without shuffling any bits.

Dispatching the Microcode In addition to micro–words that cause control signals to be emitted, we need micro–words to sequence the execution of the microcode. This is seen most obviously in the requirement for a dispatch based on the assembly language op–code. Let’s begin with an observation that is immediately obvious. If the microprogrammed control unit is to handle each distinct assembly language opcode differently, it must have sections of microprogram that are unique to each of the assembly language instructions.

The solution to this will be a dispatch microoperation, one which invokes a section of the microprogram that is selected based on the 5–bit opcode that is currently in the Instruction Register. But what is called and how does it return?



The description above suggests the use of a micro–subroutine, which would be the microprogramming equivalent of a subroutine in either assembly language or a higher level language. This option imposes a significant control overhead in the microprogrammed control unit, one that we elect not to take.

The “where to return issue” is easily handled by noting that the action next after executing any assembly language instruction is the fetching of the next one to execute. For reasons that will soon be explained, we place the first microoperation of the common fetch sequence at address 0x20 in the micromemory; each execution phase ends with “go to 0x20”.

The structure of the dispatch operation is best considered by examination of the control signals for the common fetch sequence.

F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: Do something specific to the opcode in the IR.

In the hardwired control unit, the major and minor state registers would play a large part in generation of the control signals for (F, T3) and the major state register would handle the operation corresponding to “dispatch”, that is selection of what to do next. Proper handling of the dispatch in the microprogrammed control unit requires an explicit micro–opcode and a slight resequencing of the common fetch control signals. Here is the revised sequence.

F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR)

Dispatch based on the assembly language opcode

F, T3: Do something specific to the opcode in the IR.

The next issue for our consideration in the design of the structure of the microprogram is a decision on how to select the address of the micro–instruction to be executed next after the current micro–instruction. In order to clarify the choices, let’s examine the microprogram sequence for a specific assembly language instruction and see what we conclude.

The assembly language instructions that most clearly illustrate the issue at hand are the register–to–register instructions. We choose the logical AND instruction and arbitrarily assume that its microprogram segment begins at address 0x80 (a new design, to be developed soon, will change this) and see what we have. Were we to base our control sequence on the model of assembly language programming, we would write it as follows.

0x20 PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) 0x21 PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 0x22 MBR → B2, tra2, B3 → IR. // IR ← (MBR) 0x23 Dispatch based on the assembly language opcode

0x80 R → B1, R → B2, and, B3 → R. 0x81 Go to 0x20.



While the above sequence corresponds to a coding model that would be perfectly acceptable at the assembly language level, it presents several significant problems at the microcode level. We begin with the observation that it requires the introduction of an explicitly managed microprogram counter in addition to the micro–memory address register.

The second, and most significant, drawback to the above design is that it requires two clock pulses to execute what the hardwired control unit executed in one clock pulse. One might also note that the present design calls for using two micro–words (addresses 0x80 and 0x81) where one micro–word might do. This is a valid observation, but the cost of memory is far less significant than the “time cost” to execute the extra instruction.

The design choice taken here is to encode the address of the next microinstruction in each microinstruction in the microprogram. This removes the complexity of managing a program counter and the necessity of the time–consuming explicit branch instruction. Recasting the example above in the context of our latest decision leads to the following sequence.

Address Control Signals Next address 0x20 PC → B1, tra1, B3 → MAR, READ. 0x21 0x21 PC → B1, 1 → B2, add, B3 → PC. 0x22 0x22 MBR → B2, tra2, B3 → IR. 0x23 0x23 Dispatch based on IR31–IR27. ?? – We decide later

0x80 R → B1, R → B2, and, B3 → R. 0x20

Note that the introduction of an explicit next address causes the execution phase of the logical AND instruction to be reduced to one clock pulse, as desired. The requirement for uniformity of microcode words leads to use of an explicit next address in every micro–word in the micromemory. The only microinstruction that appears not to require an explicit next address in the dispatch found at address 0x23.

A possible use for the next address field of the dispatch instruction is seen when we consider the effort put into the hardwired control unit to avoid wasting execution time on a Branch instruction when the branch condition was not met. The implementation of this decision in a microprogrammed control unit is to elect not to dispatch to the opcode–specific microcode when the instruction is a branch and the condition is not met. What we have is shown below.

Address Control Signals Next address 0x20 PC → B1, tra1, B3 → MAR, READ. 0x21 0x21 PC → B1, 1 → B2, add, B3 → PC. 0x22 0x22 MBR → B2, tra2, B3 → IR. 0x23 0x23 Dispatch based on IR31–IR27. 0x20

0x80 R → B1, R → B2, and, B3 → R. 0x20

The present design places the next address for dispatch when the condition is not met in the field of the micro–word associated with the next address for two reasons:

1. This results in a more regular design, one that is faster and easier to implement.

2. This avoids “hard coding” the address of the beginning of the common fetch.



At this point in the design of the microprogrammed control unit, we have two distinct types of microoperations: a type that issues control signals and a type that dispatches based on the assembly language opcode. To handle this distinction, we introduce the idea of a micro–opcode with the following values at present.

Micro–Op Function 0000 Issue control signals 0001 Dispatch based on the assembly language opcode.

We have stated that there are conditions under which the dispatch will not be taken. There is only one condition that will not be dispatched: the assembly–language opcode is 0x0F and the branch condition is not met. Before we consider how to handle this situation, we must first address another design issue, that presented by indirect addressing.

Handling Defer Consider the control signals for the LDR (Load Register) assembly language instruction.

LDR Op-Code = 01100 (Hexadecimal 0x0C) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: IR → B1, R → B2, add, B3 → MAR. // Do the indexing.

Here the major state register takes control. 1) If the I–bit (bit 26) is 1, then the Defer state is entered. 2) If the I–bit is 0, then the E state is entered.


Here the transition is automatic from the D state to the E state. E, T0: READ. // Again, address is already in the MAR. E, T1: WAIT. E, T2: MBR → B2, tra2, B3 → R. E, T3: WAIT.

The issue here is that we no longer have an explicit major state register to handle the sequencing of major states. The microprogram itself must handle the sequencing; it must do something different for each of the two possibilities: indirect addressing is used and indirect addressing is not used. Assuming a dispatch to address 0x0C for LDR (as it will be done in the final design), the current design calls for the following microinstruction at that address.

Address Control Signals Next address 0x0C IR → B1, R → B2, add, B3 → MAR. Depends on IR26.

Suddenly we need two “next addresses”, one if the defer phase is to be entered and one to be used if that phase is not to be entered. This last observation determines the final form of the microprogram; each micro–word has length 44 bits with structure as shown below.



In this representation of the microprogram words, we use “D = 0” to indicate that the defer phase is not to be entered and “D = 1” to indicate that it should be entered. This notation will be made more precise after we explore the new set of signals used to control the sequencing of the microprogram. Here we assume no more than 256 micro–words in the control store.

Micro–Op B1 B2 B3 ALU M1 M2 D = 0 D = 1

4 bit 4 bit 4 bit 4 bit 4 bit 4 bit 4 bit 8 bit 8 bit

Notes: 1. The width of each field is either four or eight bits. The main reason for this is to facilitate the use of hexadecimal notation in writing the microcode.

2. The use of four bits to encode only two options for the micro–opcode may appear extravagant. This follows our desire for a very regular structure.

3. The use of “D = 0” and “D = 1” is not exactly appropriate for the dispatch instruction with micro–opcode = 0001. We shall explain this later.

4. The bits associated with the M1 field are those specifying the shift parameters Bit 3 L / R (1 for a left shift, 0 for a right shift) Bit 2 A (1 for an arithmetic shift) Bit 1 C (1 for circular shift) Bit 0 Not used

5. The bits associated with the M2 field are Bit 3 READ (Indicates a memory reference) Bit 2 WRITE (Unless READ = 1) Bit 1 extend (Sign–extend contents of IR when copying to B1) Bit 0 0 → RUN (Stop the computer)

6. For almost every micro–instruction, the two “next addresses” are identical. For these, we cannot predict the value of the generated control signal “branch” and do not care, since the next address will be independent of that value.

7. The values for next addresses will each be two hexadecimal digits. It is here that we have made the explicit assumption on the maximum size of the micromemory.

Sequencing the Boz–5 Microprogrammed Control Unit In addition to the assembly language opcode, we shall need two new signals in order to sequence the microprogrammed control unit correctly. We call these two control signals “S1” and “S2”, because they resemble the control signals S1 and S2 used in the hardwired control unit but are not exactly identical.

In the hardwired control unit, the signal S1 was used to determine whether or not the state following Fetch would again be Fetch. This allowed completion of the execution of 14 of the 22 assembly language instructions in the Fetch phase. In the microprogrammed control unit, the signal S1 will be used to determine whether or not the dispatch microinstruction is executed. The only condition under which it is not executed is that in which the assembly language calls for a conditional branch and the branch condition is not met.



This leads to a simple statement defining this sequencing signal.

S1 = 0 if and only if the assembly language opcode = 0x0F and branch = 0, where branch = 1 if and only if the branch condition is met.

The sequencing signal S2 is used to control the entering of the defer code for those instructions that can use indirect addressing. Recall that the assignment of opcodes to the assembly language instructions has been structured so that only instructions beginning with “011” can enter the defer phase. Even these enter the defer phase only when IR26 = 1. Thus, we have the following definition of this signal.

S2 = 1 if and only if (IR31 = 0, IR30 = 1, IR29 = 1, and IR26 = 1)

In a way, this is exactly the definition of the sequencing control signal S2 as used in the hardwired control unit. The only difference is that in this design the signal S2 must be used independently of the signal S1, so we must use the full definition. The figure below illustrates the circuitry to generate the two sequencing signals S1 and S2.

Given these circuits, we have the final form and labeling of the micro–words in the micro–memory. Note that there are no “micro–data” words, only microinstructions.

Micro–Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1


The form of the type 1 instruction is completely defined and can be given as follows.

Micro–Op B1 B2 B3 ALU M1 M2 D = 0 D = 1

0001 0x0 0x0 0x0 0x0 0x0 0x0 0x20 0x20

But what exactly does this dispatch instruction do? The question becomes one of defining the dispatch table, which is used to determine the address of the microcode that is invoked explicitly by this dispatch. We now address that issue.



The design of the Boz–5 uses a dispatch mechanism copied from that used by Andrew S. Tanenbaum in his textbook Structured Computer Organization (Fifth Edition, published by Pearson/Prentice–Hall in 2006; ISBN 0 – 13 – 148521 – 0). It is apparent that this dispatch mechanism is commonly used in many commercial implementations of microcode.

The solution is to use the opcode itself as the dispatch address. As the Boz–5 uses a five bit opcode, the sequencer unit for the microprogrammed control unit must expand it to an 8–bit address by adding three high order 0 bits so that the dispatch address is 000 ¢ IR31–27. As an example, the binary opcode for LDR is 01100; its dispatch address is 0000 1100 or 0x0C.

The Boz–5 uses five–bit opcodes, with range 0x00 through 0x1F. It then follows that addresses 0x00 through 0x1F in the micromemory must be reserved for dispatch addresses. It is for this reason that the common fetch sequence begins at address 0x20; that is the next available address. We now see that this structure works only because the address of the next microinstruction is explicitly encoded in the microinstruction.

This dispatch mechanism works very well on the 14 of 22 assembly language instructions that complete execution in one additional clock pulse. As examples, we examine the control signals for the first 8 opcodes (0x00 – 0x07) along with the common fetch microcode.

Address Micro Control Signals Next Address Opcode S2 = 0 S2 = 1

0x00 0 0 → RUN 0x20 0x20

0x01 0 IR → B1, extend, tra1, B3 → R 0x20 0x20

0x02 0 IR → B1, R → B2, and, B3 → R 0x20 0x20

0x03 0 IR → B1, R → B2, extend, add, B3 →R 0x20 0x20

0x04 0 NOP 0x20 0x20

0x05 0 NOP 0x20 0x20

0x06 0 NOP 0x20 0x20

0x07 0 NOP 0x20 0x20

0x20 0 PC → B1, tra1, B3 → MAR, READ 0x21 0x21

0x21 0 PC → B1, 1 → B2, add, B3 → PC 0x22 0x22

0x22 0 MBR → B2, tra2, B3 → IR 0x23 0x23

0x23 1 Dispatch based on opcode 0x20 0x20

At this point, our design continues to look sound. Note that the unused opcodes (those at microprogram addresses 0x04 through 0x07) simply do nothing and return to the common fetch sequence at address 0x20. This allows for future expansion of the instruction set.

The only problem left to be addressed is the proper handling of instructions that require more than one clock cycle (following the common fetch) in which to execute. The first such instruction is the GET assembly language instruction.



The control signals for the GET assembly language instruction, as implemented in the hardwired control unit are shown below. Note that (F, T3) here does nothing as the instruction cannot be completed in Fetch and I decided not to make the (F, T3) signal generation tree more complex when I could force the execution into (E, T0) and (E, T2).

GET Op-Code = 01000 (Hexadecimal 0x08) F, T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) F, T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 F, T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) F, T3: WAIT.

E, T0: IR → B1, tra1, B3 → IOA. // Send out the I/O address E, T1: WAIT. E, T2: IOD → B2, tra2, B3 → R. // Get the results. E, T3: WAIT.

Noting that the WAIT microoperations in this sequence are used only because there is nothing that can be done during those clock pulses, we can rewrite the sequence as follows.

GET Op-Code = 01000 (Hexadecimal 0x08) T0: PC → B1, tra1, B3 → MAR, READ. // MAR ← (PC) T1: PC → B1, 1 → B2, add, B3 → PC. // PC ← (PC) + 1 T2: MBR → B2, tra2, B3 → IR. // IR ← (MBR) T3: IR → B1, tra1, B3 → IOA. // Send out the I/O address T4: IOD → B2, tra2, B3 → R. // Get the results.

This fits into the structure of the microcode fairly well, except that it seems to call for two instruction to be placed at address 0x08. The solution to the problem is based on the fact that each microinstruction must encode the address of the next microinstruction. Just select the next available word in micromemory and place the “rest of the execution sequence” there.

What we have is as follows.

Address Micro Control Signals Next Address Opcode S2 = 0 S2 = 1

0x08 0 IR → B1, tra1, B3 → IOA. 0x24 0x24

0x20 0 PC → B1, tra1, B3 → MAR, READ 0x21 0x21

0x21 0 PC → B1, 1 → B2, add, B3 → PC 0x22 0x22

0x22 0 MBR → B2, tra2, B3 → IR 0x23 0x23

0x23 1 Dispatch based on opcode 0x20 0x20

0x24 0 IOD → B2, tra2, B3 → R. 0x20 0x20

In software engineering, such a structure is called “spaghetti code” and is highly discouraged. The reason is simple; one writes a few thousand lines in this style and nobody (including the original author) can follow the logic. The microprogram, however, comprises a small number (fewer than 33) independent threads of short (less than 12) instructions. For such a structure, even spaghetti code can be tolerated.



Assignment of Numeric Codes to Control Signals We now start writing the microcode. This step begins with the assignment of numeric values to the control signals that the control unit emits.

The next table shows the numeric codes that this author has elected to assign to the encoded control signals; these being the controls for bus B1, bus B2, bus B3, and the ALU. While the assignment may appear almost random, it does have some logic. The basic rule is that code 0 does nothing. The bus codes have been adjusted to have the greatest commonality; thus code 6 is the code for both MBR → B2 and B3 → MBR.

Code Bus 1 Bus 2 Bus 3 ALU 0 1 PC → B1 1 → B2 B3 →PC tra1 2 MAR → B1 – 1 →B2 B3 → MAR tra2 3 R → B1 R → B2 B3 →R shift 4 IR → B1 B3 → IR not 5 SP → B1 B3 → SP add 6 MBR → B2 B3 → MBR sub 7 IOD →B2 B3 → IOD and 8 B3 → IOA or 9 xor 10

Other assignments may be legitimately defended, but this is the one we use.

Example: Common Fetch Sequence We begin our discussion of microprogramming by listing the control signals for the first three minor cycles in the Fetch major cycle and translating these to microcode. We shall mention here, and frequently, that the major and minor cycles are present in the microcode only implicitly. It is better to think that major cycles map into sections of microcode.

For this example, we do the work explicitly. Location 0x20 F, T0: PC → B1 B1 code is 1 tra1 ALU code is 1 B3 → MAR B3 code is 2 READ M2(Bit 3) = 1, so M2 = 8

Micro–Op = 0. B2 code and M1 code are both 0.

Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1 0x20 0 1 0 2 1 0 8 0x21 0x21



Location 0x21 F, T1: PC → B1 B1 code is 1 1 → B2 B2 code is 1 add ALU code is 5 B3 → PC. B3 code is 1


Location 0x22 F, T2: MBR → B2 B2 code is 6 tra2 ALU code is 2 B3 → IR B3 code is 4


Location 0x23 Dispatch on the op–code in the machine language instruction

For this we assume that the Micro–Op is 1 and that none of the other fields are used.


Here is the microprogram for the common fetch sequence.

Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1 0x20 0 1 0 2 1 0 8 0x21 0x21 0x21 0 1 1 1 5 0 0 0x22 0x22 0x22 0 0 6 4 2 0 0 0x23 0x23 0x23 1 0 0 0 0 0 0 0x20 0x20

Here is the section of microprogram for the common fetch sequence, written in the form that would be seen in a utility used for debugging the microcode.

Address Contents 0x20 0x010 2108 2121 0x21 0x011 1500 2222 0x22 0x006 4200 2323 0x23 0x100 0000 2020

We now have assembled all of the design tricks required to write microcode and have examined some microcode in detail. It is time to finish the microprogramming.



The Execution of Op–Codes 0x00 through 0x07 The first four of these machine instructions (0x00 –0x00) use immediate addressing and execute in a single cycle, while the last four (0x04 –0x07) are NOP’s, also executing in a single cycle. The microcode for these goes in addresses 0x00 through 0x07 of the micro–memory. The next step for each of these is Fetch for the next instruction, so the next address for all of them is 0x20.

HLT Op-Code = 00000 0 → RUN.


LDI Op-Code = 00001 IR → B1, extend, tra1, B3 → R.


ANDI Op-Code = 00010 IR → B1, R → B2, and, B3 → R.


ADDI Op-Code = 00011 IR → B1, R → B2, extend, add, B3 → R.


We are now in a position to specify the first eight micro–words.

Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1 0x00 0 0 0 0 0 0 1 0x20 0x20 0x01 0 4 0 3 1 0 2 0x20 0x20 0x02 0 4 3 3 7 0 0 0x20 0x20 0x03 0 1 3 3 5 0 2 0x20 0x20 0x04 0 0 0 0 0 0 0 0x20 0x20 0x05 0 0 0 0 0 0 0 0x20 0x20 0x06 0 0 0 0 0 0 0 0x20 0x20 0x07 0 0 0 0 0 0 0 0x20 0x20

Based on the tables above, we state the contents of the first eight micro–words.

Address Contents 0x00 0x 000 0001 2020 0x01 0x 040 3102 2020 0x02 0x 043 3700 2020 0x03 0x 013 3502 2020 0x04 0x 000 0000 2020 0x05 0x 000 0000 2020 0x06 0x 000 0000 2020 0x07 0x 000 0000 2020



For the moment, let’s skip the next eight opcodes and finish the simpler cases.

LLS Op-Code = 10000 R → B2, shift, R/L = 1, A = 0. C = 0, B3 → R.


LCS Op-Code = 10001 R → B2, shift, R/L = 1, A = 0. C = 1, B3 → R.


RLS Op-Code = 10010 R → B2, shift, R/L = 0, A = 0. C = 0, B3 → R.


RAS Op-Code = 10011 R → B2, shift, R/L = 0, A = 1. C = 0, B3 → R.


NOT Op-Code = 10100 R → B2, not, B3 → R.


ADD Op-Code = 10101 R → B1, R → B2, add, B3 → R.


SUB Op-Code = 10110 R → B1, R → B2, sub, B3 → R.


AND Op-Code = 10111 R → B1, R → B2, and, B3 → R.


OR Op-Code = 11000 R → B1, R → B2, or, B3 → R.


XOR Op-Code = 11001 R → B1, R → B2, xor, B3 → R.


We have now completed the microprogramming for all but eight of the instructions. The table on the next page shows what we have generated up to this point.



Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1 0x00 0 0 0 0 0 0 1 0x20 0x20 0x01 0 4 0 3 1 0 2 0x20 0x20 0x02 0 4 3 3 7 0 0 0x20 0x20 0x03 0 1 3 3 5 0 2 0x20 0x20 0x04 0 0 0 0 0 0 0 0x20 0x20 0x05 0 0 0 0 0 0 0 0x20 0x20 0x06 0 0 0 0 0 0 0 0x20 0x20 0x07 0 0 0 0 0 0 0 0x20 0x20 0x08 0x09 0x0A 0x0B 0x0C 0x0D 0x0E 0x0F 0x10 0 0 3 3 3 8 0 0x20 0x20 0x11 0 0 3 3 3 9 0 0x20 0x20 0x12 0 0 3 3 3 0 0 0x20 0x20 0x13 0 0 3 3 3 4 0 0x20 0x20 0x14 0 0 3 3 4 0 0 0x20 0x20 0x15 0 3 3 3 5 0 0 0x20 0x20 0x16 0 3 3 3 6 0 0 0x20 0x20 0x17 0 3 3 3 7 0 0 0x20 0x20 0x18 0 3 3 3 8 0 0 0x20 0x20 0x19 0 3 3 3 9 0 0 0x20 0x20 0x1A 0 0 0 0 0 0 0 0x20 0x20 0x1B 0 0 0 0 0 0 0 0x20 0x20 0x1C 0 0 0 0 0 0 0 0x20 0x20 0x1D 0 0 0 0 0 0 0 0x20 0x20 0x1E 0 0 0 0 0 0 0 0x20 0x20 0x1F 0 0 0 0 0 0 0 0x20 0x20 0x20 0 1 0 2 1 0 8 0x21 0x21 0x21 0 1 1 1 5 0 0 0x22 0x22 0x22 0 0 6 4 2 0 0 0x23 0x23 0x23 1 0 0 0 0 0 0 0x20 0x20

Note that instructions 0x1A through 0x1F are not yet implemented, so they show as NOP’s. We now move to those instructions that require Defer and Execute for completion. Due to the ordering of the op–codes, we first investigate those instructions that cannot enter Defer.



GET Op-Code = 01000 (Hexadecimal 0x08) F, T3: WAIT. E, T0: IR → B1, tra1, B3 → IOA. // Send out the I/O address E, T1: WAIT. E, T2: IOD → B2, tra2, B3 → R. // Get the results. E, T3: WAIT.

As noted above, we can ignore any WAIT signal that is not required by considerations of memory timing. The first of two microoperations is associated with the dispatch address for the GET instruction and the second one at the first available micromemory word.

IR → B1, tra1, B3 → IOA.


IOD → B2, tra2, B3 → R.


PUT Op-Code = 01001 (Hexadecimal 0x09) F, T3: WAIT. E, T0: R → B2, tra2, B3 → IOD // Get the data ready E, T1: WAIT. E, T2: IR → B1, tra1, B3 → IOA. // Sending out the address E, T3: WAIT. // causes the output of data.

R → B2, tra2, B3 → IOD


IR → B1, tra1, B3 → IOA.


RET Op-Code = 01010 (Hexadecimal 0x0A) F, T3: WAIT E, T0: SP → B1, – 1 → B2, add, B3 → SP. // Decrement the SP E, T1: SP → B1, tra1, B3 → MAR, READ. // Get the return address E, T2: WAIT. E, T3: MBR → B2, tra2, B3 → PC. // Put return address into PC

Here we have three non–waiting instructions plus a WAIT that is necessary for the memory access. As a result, we must allocate four micro–memory words to the execution.



SP → B1, – 1 → B2, add, B3 → SP

Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1 0x0A 0 5 2 5 5 0 0 0x26 0x26

SP → B1, tra1, B3 → MAR, READ. Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x26 0 5 0 2 1 0 8 0x27 0x27

WAIT Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x27 0 0 0 0 0 0 0 0x28 0x28

MBR → B2, tra2, B3 → PC Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x28 0 0 6 1 2 0 0 0x20 0x20

RTI Op-Code = 01011 (Hexadecimal 0x0B) Not yet implemented.

This is encoded as another NOP, until time to develop the details of interrupt handling.

The “link” Address Micro-Op B1 B2 B3 ALU M1 M2 B = 0 B = 1

0x0B 0 0 0 0 0 0 0 0x20 0x20

We now turn to the four instructions that use both Defer and Execute. One might be tempted to write a common Defer “subroutine” to be “called” by each of the Execute sections. While this would reduce duplication of micro–code, we opt for in–line coding.

In each of the following four instructions, the step corresponding to (F, T3) issues control signals. These will be issued by the “link” micro–operation. We must also introduce a new micro–op to account for conditionally entering or not entering the Defer section.



The next four assembly language instructions (LDR, STR, JSR, and BR) are the only that possibly use indirect addressing. As discussed in our presentation of the sequencing control signals for the microprogram, the defer phase will be entered if and only if S2 = 1. The template for the DEFER state is shown below. It has one essential WAIT state. X READ X + 1 WAIT X + 2 MBR → B2, tra2, B3 → MAR X + 3 Code for E, T0 Due to this structure, the address for (S2 = = 0) will be 3 more than that for (S2 = = 1).

LDR Op-Code = 01100 (Hexadecimal 0x0C) F, T3: IR → B1, R → B2, add, B3 → MAR. // Do the indexing. D, T0: READ. // Address is already in the MAR. D, T1: WAIT. // Cannot access the MBR just now. D, T2: MBR → B2, tra2, B3 → MAR. // MAR ← (MBR) D, T3: WAIT. E, T0: READ. // Again, address is already in the MAR. E, T1: WAIT. E, T2; MBR → B2, tra2, B3 → R. E, T3: WAIT.

IR → B1, R → B2, add, B3 → MAR Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x0C 0 4 3 2 5 0 0 0x2C 0x29

READ Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x29 0 0 0 0 0 0 8 0x2A 0x2A


0x2A 0 0 0 0 0 0 0 0x2B 0x2B

MBR → B2, tra2, B3 → MAR Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x2B 0 0 6 2 2 0 0 0x2C 0x2C

READ. Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x2C 0 0 0 0 0 0 8 0x2D 0x2D

WAIT. Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x2D 0 0 0 0 0 0 0 0x2E 0x2E


0x2E 0 0 6 2 2 0 0 0x20 0x20



STR Op-Code = 01101 (Hexadecimal 0x0D)

F, T3: IR → B1, R → B2, add, B3 → MAR. // Do the indexing. D, T0: READ. // Address is already in the MAR. D, T1: WAIT. // Cannot access the MBR just now. D, T2: MBR → B2, tra2, B3 → MAR. // MAR ← (MBR) D, T3: WAIT. E, T0: WAIT. E, T1; R → B1, tra1, B3 → MBR, WRITE. E, T2: WAIT. E, T3: WAIT.


0x0D 0 4 3 2 5 0 0 0x32 0x2F


0x2F 0 0 0 0 0 0 8 0x30 0x30


0x30 0 0 0 0 0 0 0 0x31 0x31


0x31 0 0 6 2 2 0 0 0x32 0x32

R → B1, tra1, B3 → MBR, WRITE Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x32 0 3 0 6 1 0 4 0x33 0x33


0x33 0 0 0 0 0 0 0 0x20 0x20

Note here that we must have a WAIT state following the last WRITE of the execute phase. This allows the memory to complete the instruction before the next instruction is fetched.



JSR Op-Code = 01110 (Hexadecimal 0x0E) F, T3: IR → B1, R → B2, add, B3 → MAR. // Do the indexing. D, T0: READ. // Address is already in the MAR. D, T1: WAIT. // Cannot access the MBR just now. D, T2: MBR → B2, tra2, B3 → MAR. // MAR ← (MBR) D, T3: WAIT. E, T0: PC → B1, tra1, B3 → MBR. // Put return address in MBR E, T1: MAR → B1, tra1, B3 → PC. // Set up for jump to target. E, T2: SP → B1, tra1, B3 → MAR, WRITE. // Put return address on stack. E, T3: SP → B1, 1 → B2, add, B3 → SP. // Bump SP


0x0E 0 4 3 2 5 0 0 0x37 0x34


0x34 0 0 0 0 0 0 8 0x35 0x35


0x35 0 0 0 0 0 0 0 0x36 0x36


0x36 0 0 6 2 2 0 0 0x37 0x37

PC → B1, tra1, B3 → MBR Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x37 0 1 0 6 1 0 0 0x38 0x38

MAR → B1, tra1, B3 → PC Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x38 0 2 0 1 1 0 0 0x39 0x39

SP → B1, tra1, B3 → MAR, WRITE Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x39 0 5 0 2 1 0 0 0x3A 0x3A

SP → B1, 1 → B2, add, B3 → SP Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x3A 0 5 1 5 5 0 0 0x20 0x20



BR Op-Code = 01111 (Hexadecimal 0x0F) F, T3: IR → B1, R → B2, add, B3 → MAR. // Do the indexing. D, T0: READ. // Address is already in the MAR. D, T1: WAIT. // Cannot access the MBR just now. D, T2: MBR → B2, tra2, B3 → MAR. // MAR ← (MBR) D, T3: WAIT. E, T0: WAIT. E, T1: WAIT. E, T2: WAIT. E, T3: MAR → B1, tra1, B3 → PC.

Remember that the microcode at address 0x23 will not dispatch to address 0x0F if the branch condition is not true. This avoids wasted time and incorrect execution.


0x0F 0 4 3 2 5 0 0 0x3E 0x3B


0x3B 0 0 0 0 0 0 8 0x3C 0x3C


0x3C 0 0 0 0 0 0 0 0x3D 0x3D


0x3D 0 0 6 2 2 0 0 0x3E 0x3E

MAR → B1, tra1, B3 → PC Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1

0x3E 0 2 0 1 1 0 0 0x20 0x20

This completes the derivation of the microprogram for the Boz–5.



Here now is the complete microprogram of the Boz–5, shown in two pages of tables.

Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1 0x00 0 0 0 0 0 0 1 0x20 0x20 0x01 0 4 0 3 1 0 2 0x20 0x20 0x02 0 4 3 3 7 0 0 0x20 0x20 0x03 0 1 3 3 5 0 2 0x20 0x20 0x04 0 0 0 0 0 0 0 0x20 0x20 0x05 0 0 0 0 0 0 0 0x20 0x20 0x06 0 0 0 0 0 0 0 0x20 0x20 0x07 0 0 0 0 0 0 0 0x20 0x20 0x08 0 4 0 8 1 0 0 0x24 0x24 0x09 0 0 3 7 2 0 0 0x25 0x25 0x0A 0 5 2 5 5 0 0 0x26 0x26 0x0B 0 0 0 0 0 0 0 0x20 0x20 0x0C 0 4 3 2 5 0 0 0x2C 0x29 0x0D 0 4 3 2 5 0 0 0x32 0x2F 0x0E 0 4 3 2 5 0 0 0x37 0x34 0x0F 0 4 3 2 5 0 0 0x3E 0x3B 0x10 0 0 3 3 3 8 0 0x20 0x20 0x11 0 0 3 3 3 9 0 0x20 0x20 0x12 0 0 3 3 3 0 0 0x20 0x20 0x13 0 0 3 3 3 4 0 0x20 0x20 0x14 0 0 3 3 4 0 0 0x20 0x20 0x15 0 3 3 3 5 0 0 0x20 0x20 0x16 0 3 3 3 6 0 0 0x20 0x20 0x17 0 3 3 3 7 0 0 0x20 0x20 0x18 0 3 3 3 8 0 0 0x20 0x20 0x19 0 3 3 3 9 0 0 0x20 0x20 0x1A 0 0 0 0 0 0 0 0x20 0x20 0x1B 0 0 0 0 0 0 0 0x20 0x20 0x1C 0 0 0 0 0 0 0 0x20 0x20 0x1D 0 0 0 0 0 0 0 0x20 0x20 0x1E 0 0 0 0 0 0 0 0x20 0x20 0x1F 0 0 0 0 0 0 0 0x20 0x20 0x20 0 1 0 2 1 0 8 0x21 0x21 0x21 0 1 1 1 5 0 0 0x22 0x22 0x22 0 0 6 4 2 0 0 0x23 0x23 0x23 1 0 0 0 0 0 0 0x00 0x00



Address Micro-Op B1 B2 B3 ALU M1 M2 S2 = 0 S2 = 1 0x24 0 0 7 7 2 0 0 0x20 0x20 0x25 0 4 0 8 1 0 0 0x20 0x20 0x26 0 5 0 2 1 0 8 0x27 0x27 0x27 0 0 0 0 0 0 0 0x28 0x28 0x28 0 0 6 1 2 0 0 0x20 0x20 0x29 0 0 0 0 0 0 8 0x2A 0x2A 0x2A 0 0 0 0 0 0 0 0x2B 0x2B 0x2B 0 0 6 2 2 0 0 0x2C 0x2C 0x2C 0 0 0 0 0 0 8 0x2D 0x2D 0x2D 0 0 0 0 0 0 0 0x2E 0x2E 0x2E 0 0 6 2 2 0 0 0x20 0x20 0x2F 0 0 0 0 0 0 8 0x30 0x30 0x30 0 0 0 0 0 0 0 0x31 0x31 0x31 0 0 6 2 2 0 0 0x32 0x32 0x32 0 3 0 6 1 0 4 0x33 0x33 0x33 0 0 0 0 0 0 0 0x20 0x20 0x34 0 0 0 0 0 0 8 0x35 0x35 0x35 0 0 0 0 0 0 0 0x36 0x36 0x36 0 0 6 2 2 0 0 0x37 0x37 0x37 0 1 0 6 1 0 0 0x38 0x38 0x38 0 2 0 1 1 0 0 0x39 0x39 0x39 0 5 0 2 1 0 0 0x3A 0x3A 0x3A 0 5 1 5 5 0 0 0x20 0x20 0x3B 0 0 0 0 0 0 8 0x3C 0x3C 0x3C 0 0 0 0 0 0 0 0x3D 0x3D 0x3D 0 0 6 2 2 0 0 0x3E 0x3E 0x3E 0 2 0 1 1 0 0 0x20 0x20

The last address is 0x3E = 62 (in decimal). The microprogram has used 63 of the available 256 addresses (it is an 8–bit address) for a 25% usage. With 63 addresses used, we could have opted for a 6–bit address. We chose an 8–bit address for the sake of simplicity.

The reader will note that this leaves plenty of unused microprogram space for possible implementation of any new instructions.



The Sequencer for the Microprogrammed Control Unit

We now discuss the micro–control unit, which is responsible for sequencing the control unit itself, which is microprogrammed.

Recall that there are no “microdata” in the microprogram, but only microinstructions. Based on this observation, the only function of the micro–control unit is the selection of the next address to place into the micro–memory address register (μMAR).

There are four sources for this address. 1. the assembly language opcode, 2. the S2 = 0 field of the microinstruction, 3. the S2 = 1 field of the microinstruction, and 4. a constant register 0x20 which is used only when the computer is started.

Here is the circuit for the micro–control unit, omitting only the constant register.



Summary of Control Signals

This is a summary of control signals, organized by bus and functional unit. It is designed to support discussions of microprogramming, but can stand on its own as a handy table.

Code Bus 1 Bus 2 Bus 3 ALU 0 1 PC → B1 1 → B2 B3 →PC tra1 2 MAR → B1 – 1 →B2 B3 → MAR tra2 3 R → B1 R → B2 B3 →R shift 4 IR → B1 B3 → IR not 5 SP → B1 B3 → SP add 6 MBR → B2 B3 → MBR sub 7 IOD →B2 B3 → IOD and 8 B3 → IOA or 9 xor 10

Miscellaneous control signals Specified by the M1 and M2 fields These fields are not encoded, so that each bit can be set separately. Each of M1 and M2 is a four bit field, having bits Bit3, Bit2, Bit1, and Bit0. Bit Number Shift Select Other Signals Bit3 L / R READ Bit2 A WRITE Bit1 C extend Bit0 0 → RUN

Micro-Code Format The following assumes no more than 256 micro-words in the control store.

On Off Next Address if

Micro-Op B1 B2 B3 ALU M1 M2 Branch = 0 Branch = 1


Four–bit field format

Bit3 Bit2 Bit1 Bit0


The Central Processing Unit - Edward...

Documents

Transcript of The Central Processing Unit - Edward...