Multi Cycle CPU Mid-term Review Discussion...


Transcript of Multi Cycle CPU Mid-term Review Discussion...

Page 1: Multi Cycle CPU Mid-term Review Discussion Sessioncseweb.ucsd.edu/~tsoni/cse141/L5.pdfMultiplication, Division, Floating Point numbers ALUs –Single Cycle CPU Exceptions Multicycle


CS141-L4-1 Tarun Soni, Summer ‘03

Multi Cycle CPU

• Previously: built a Single Cycle CPU.
• Today:
– Exceptions
– Multi-cycle CPU
– Microprogramming

CS141-L4-2 Tarun Soni, Summer ‘03

Mid-term Review Discussion Session

• Peterson Hall 104
• Tue: 2-3 pm
• Tue: 3-4 pm

[Chart: one data series (Series1); vertical axis 0 to 50, horizontal axis 1 to 55]

CS141-L4-3 Tarun Soni, Summer ‘03

• Instruction Set Architectures
• Performance issues
• 2s complement, Addition, Subtraction
• Multiplication, Division, Floating Point numbers
• ALUs
• Single Cycle CPU
• Exceptions
• Multicycle CPU: datapath; control
• Microprogramming

The Story so far:

CS141-L4-4 Tarun Soni, Summer ‘03

• Design alternative:

– provide more powerful operations

– goal is to reduce number of instructions executed

– danger is a slower cycle time and/or a higher CPI

• Sometimes referred to as “RISC vs. CISC”

– virtually all new instruction sets since 1982 have been RISC

– VAX: minimize code size, make assembly language easy
   instructions from 1 to 54 bytes long!

• We’ll look at Pentium, UltraSparc and JVM

Alternative Architectures

CS141-L4-5 Tarun Soni, Summer ‘03

Pentium

CS141-L4-6 Tarun Soni, Summer ‘03

Java VM

• Most instr one byte
– ADD
– POP
• One byte arg
– ILOAD IND8
– BIPUSH CON8
• Two byte arg
– SIPUSH CON16
– IF_ICMPEQ OFFSET16
• Type = int, signed int etc.


CS141-L4-7 Tarun Soni, Summer ‘03

UltraSparc

CS141-L4-8 Tarun Soni, Summer ‘03

Exceptions

or

Oops!

CS141-L4-9 Tarun Soni, Summer ‘03

Exceptions

• There are two sources of non-sequential control flow in a processor
– explicit branch and jump instructions
– exceptions
• Branches are synchronous and deterministic
• Exceptions are typically asynchronous and non-deterministic
• Guess which is more difficult to handle?

• exceptions as any unexpected change in control flow

• interrupts as any externally-caused exception

• Literature is not consistent

arithmetic overflow

divide by zero

I/O device signals completion to CPU

user program invokes the OS

memory parity error

illegal instruction

timer signal

CS141-L4-10 Tarun Soni, Summer ‘03

Exceptions

• The machine we’ve been designing in class can generate two types of exceptions.

– arithmetic overflow

– illegal instruction

• On an exception, we need to

– save the PC (invisible to user code)

– record the nature of the exception/interrupt

– transfer control to OS

[Figure: an Exception transfers control from the user program to the System Exception Handler, which then returns from the exception to the user program]

CS141-L4-11 Tarun Soni, Summer ‘03

Exceptions

• MIPS architecture defines the instruction as having no effect if the instruction causes an exception.

• When we get to virtual memory we will see that certain classes of exceptions must prevent the instruction from changing the machine state.

• This aspect of handling exceptions becomes complex and potentially limits performance => why it is hard

• Interrupts

– caused by external events

– asynchronous to program execution

– may be handled between instructions

– simply suspend and resume user program

• Traps/Exceptions

– caused by internal events

• exceptional conditions (overflow)

• errors (parity)

• faults (non-resident page)

– synchronous to program execution

– condition must be remedied by the handler

– instruction may be retried or simulated and program continued or program may be aborted

CS141-L4-12 Tarun Soni, Summer ‘03

Exceptions

Addressing the Exception Handler

• Traditional Approach: Interrupt Vector
– PC <– MEM[ IV_base + cause || 00 ]
– 370, 68000, Vax, 80x86, . . .
• RISC Handler Table
– PC <– IT_base + cause || 0000
– saves state and jumps
– Sparc, PA, M88K, . . .
• MIPS Approach: fixed entry
– PC <– EXC_addr
– Actually very small table
   • RESET entry
   • TLB
   • other

[Figure: with an interrupt vector, iv_base + cause selects a pointer to the handler code; with a handler table, iv_base + cause selects the handler entry code itself]
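The three addressing schemes differ only in how the new PC is formed from the cause code. A minimal sketch, assuming a word-addressed dictionary stands in for memory and that all base addresses and table contents below are made-up illustration values (none come from the slides):

```python
def interrupt_vector_pc(mem, iv_base, cause):
    """Traditional approach: PC <- MEM[IV_base + cause || 00].

    Appending two zero bits multiplies the cause by 4, so each vector
    entry is one 32-bit word holding a pointer to the handler.
    """
    return mem[iv_base + (cause << 2)]


def handler_table_pc(it_base, cause):
    """RISC handler table: PC <- IT_base + cause || 0000.

    Appending four zero bits multiplies the cause by 16, so each cause
    gets a 16-byte slot of handler entry code to jump into directly.
    """
    return it_base + (cause << 4)


def fixed_entry_pc(exc_addr):
    """MIPS approach: every exception enters at the same address."""
    return exc_addr


# Hypothetical vector table: cause 0 -> 0x4000, cause 1 -> 0x4800.
vector_table = {0x100: 0x4000, 0x104: 0x4800}

print(hex(interrupt_vector_pc(vector_table, 0x100, 1)))
print(hex(handler_table_pc(0x2000, 3)))
print(hex(fixed_entry_pc(0x8000)))
```

The vector approach costs an extra memory read before the jump; the handler table and the fixed entry avoid that read but leave the software to do more dispatching.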


CS141-L4-13 Tarun Soni, Summer ‘03

Exceptions

Saving State

• Push it onto the stack

– Vax, 68k, 80x86

• Save it in special registers
– MIPS EPC, BadVaddr, Status, Cause

• Shadow Registers
– M88k
– Save state in a shadow of the internal pipeline registers

Significant component of “interrupt response time”

CS141-L4-14 Tarun Soni, Summer ‘03

Exceptions

• For our MIPS-subset architecture, we will add two registers:

– EPC: a 32-bit register to hold the user’s PC

– Cause: A register to record the cause of the exception

• we’ll assume undefined inst = 0, overflow = 1

• We will also add three control signals:

– EPCWrite (will need to be able to subtract 4 from PC)

– CauseWrite

– IntCause

• We will extend PCSource multiplexor to be able to latch the interrupt handler address into the PC.
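The register and signal additions above amount to three writes when an exception is taken. A minimal sketch, assuming the PC has already been incremented when the exception is detected (which is why 4 is subtracted) and using a hypothetical handler address; the dictionary-based machine state is an illustration device, not part of the slides:

```python
UNDEFINED_INST = 0   # Cause encoding assumed on the slide
OVERFLOW = 1

HANDLER_ADDR = 0x80000180  # hypothetical interrupt handler address


def take_exception(state, cause):
    """Return the machine state after an exception is taken.

    EPCWrite:   EPC <- PC - 4 (the address of the faulting instruction)
    CauseWrite: Cause <- cause (IntCause selects which value is written)
    PCSource:   PC <- interrupt handler address
    """
    new_state = dict(state)
    new_state["EPC"] = (state["PC"] - 4) & 0xFFFFFFFF
    new_state["Cause"] = cause
    new_state["PC"] = HANDLER_ADDR
    return new_state


before = {"PC": 0x00400008, "EPC": 0, "Cause": 0}
after = take_exception(before, OVERFLOW)
print({name: hex(value) for name, value in after.items()})
```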

CS141-L4-15 Tarun Soni, Summer ‘03

[Figure: single-cycle datapath extended for exceptions. New elements: Cause register (CauseWrite, IntCause), EPC register (EPCWrite, loaded with PC minus 4 through a sub4 block), and a PCSource mux input for InterruptHandlerAddress. Existing elements: PC and next-PC adders (nPC_sel), Inst Memory, 32 x 32-bit register file (Rw, Ra, Rb; busA, busB, busW; RegWr, RegDst), Extender (ExtOp, imm16), ALU (ALUSrc, ALUctr, Equal), Data Memory (MemWr), MemtoReg mux. Instruction fields: Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>]

Exceptions

CS141-L4-16 Tarun Soni, Summer ‘03

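[Figure: Control block and DATA PATH. Inst Memory supplies Instruction<31:0>; the Op and Fun fields feed Control, and Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15> feed the datapath. Control drives nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg and the exception signals; Equal returns from the datapath]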

Exceptions: Creating a “ Control line”

Regs:
– EPC
– Cause

Control signals:
– EPCWrite (subtract 4 from PC)
– CauseWrite
– IntCause

CS141-L4-17 Tarun Soni, Summer ‘03

[Figure: abstract datapath. PC and Next Address logic, Ideal Instruction Memory (Instruction Address in, Instruction out; fields Rd, Rs, Rt, Imm16), 32 x 32-bit register file (Rw, Ra, Rb), ALU (inputs A and B), Ideal Data Memory (Data Address, Data In), all clocked by Clk]

Regs:
– EPC
– Cause

Control signals:
– EPCWrite (subtract 4 from PC)
– CauseWrite
– IntCause

Extend PCSource MUX to include jump address from int-table

Exceptions: Creating the data path

CS141-L4-18 Tarun Soni, Summer ‘03

CPU

Multi Cycle CPU


CS141-L4-19 Tarun Soni, Summer ‘03

CPU

The Big Picture: Where are We Now?

• The Five Classic Components of a Computer

• Datapath Design, then Control Design

[Figure: Processor (Control, Datapath), Memory, Input, Output]

CS141-L4-20 Tarun Soni, Summer ‘03

Recap: Processor Design is a Process

• Bottom-up

– assemble components in target technology to establish critical timing

• Top-down

– specify component behavior from high-level requirements

• Iterative refinement

– establish partial solution, expand and improve

[Figure: Instruction Set Architecture => processor (datapath, control), built from Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer, which are in turn built from Cells and Gates]

CS141-L4-21 Tarun Soni, Summer ‘03

CPU: The single cycle

[Figure: instruction execution loop]
– Instruction Fetch
– Instruction Decode
– Operand Fetch
– Execute
– Result Store
– Next Instruction

° Design hardware for each of these steps!!!

Execute an entire instruction: Fetch, Decode, Fetch, Execute, Store, Next

CS141-L4-22 Tarun Soni, Summer ‘03

CPU: Clocking

[Figure: Clk waveform with Setup and Hold windows around each clock edge; signals are Don’t Care outside those windows]

• All storage elements are clocked by the same clock edge

CS141-L4-23 Tarun Soni, Summer ‘03

CPU: Main Control

PLA Implementation of the Main Control

[Figure: PLA. The AND plane decodes op<5..0> into R-type, ori, lw, sw, beq, jump; the OR plane produces RegWrite, ALUSrc, MemtoReg, MemWrite, Branch, Jump, RegDst, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>]

CS141-L4-24 Tarun Soni, Summer ‘03

CPU: Main Control

• In our single-cycle processor, each instruction is realized by exactly one control command or “microinstruction”

– in general, the controller is a finite state machine

– microinstruction can also control sequencing (see later)

[Figure: the Instruction OPcode and the Conditions from the Datapath feed the Control Logic / Store (PLA, ROM); its output, the microinstruction, is decoded into the Control Points of the Datapath]


CS141-L4-25 Tarun Soni, Summer ‘03

CPU: Abstract View of a single cycle processor

• looks like a FSM with PC as state

[Figure: abstract single-cycle datapath. Stages: PC, Next PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), Reg. Wrt, Result Store. Main Control and ALU control take op and fun and drive nPC_sel, RegDst, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegWr; Equal returns from the datapath]

CS141-L4-26 Tarun Soni, Summer ‘03

CPU: Why is a CPI=1 processor bad?

• Long Cycle Time
• All instructions take as much time as the slowest
• Real memory is not so nice as our idealized memory
– cannot always get the job done in one (short) cycle

[Figure: path through the datapath for each instruction class]
Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
Load (Critical Path): PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
Branch: PC, Inst Memory, Reg File, cmp, mux

CS141-L4-27 Tarun Soni, Summer ‘03

         I cache   Decode, R-Read   ALU   PC update   D cache   R-Write   Total
R-type   1         1                .9    -           -         .8        3.7
Load     1         1                .9    -           1         .8        4.7
Store    1         1                .9    -           1         -         3.9
beq      1         1                .9    .1          -         -         3.0

• Load needs 5 cycles
• Store and R-type need 4
• beq needs 3

Goal: balance amount of work done each cycle.

CPU: Why is a CPI=1 processor bad?
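The table's argument can be checked with a few lines of arithmetic: a single-cycle clock must fit the slowest instruction, while a multicycle clock only has to fit the slowest step. A minimal sketch, assuming the step times are held in tenths of the slide's (unstated) time unit so that the sums stay exact:

```python
# Step latencies from the table, in tenths of the slide's time unit.
STEPS = {
    "R-type": {"I cache": 10, "Decode": 10, "ALU": 9, "R-Write": 8},
    "Load":   {"I cache": 10, "Decode": 10, "ALU": 9, "D cache": 10, "R-Write": 8},
    "Store":  {"I cache": 10, "Decode": 10, "ALU": 9, "D cache": 10},
    "beq":    {"I cache": 10, "Decode": 10, "ALU": 9, "PC update": 1},
}

# Cycle counts quoted on the slide for the multicycle design.
CYCLES = {"Load": 5, "Store": 4, "R-type": 4, "beq": 3}

# Total combinational delay of each instruction class.
totals = {name: sum(steps.values()) for name, steps in STEPS.items()}

# Single cycle: the clock period is set by the slowest instruction.
single_cycle_period = max(totals.values())

# Multicycle: the clock period is set by the slowest individual step.
multi_cycle_period = max(t for steps in STEPS.values() for t in steps.values())

# Time each instruction class takes in the multicycle design.
multi_time = {name: CYCLES[name] * multi_cycle_period for name in CYCLES}

for name in STEPS:
    print(name, totals[name] / 10, single_cycle_period / 10, multi_time[name] / 10)
```

Under these numbers a load is slightly slower in the multicycle design (5.0 versus 4.7), while beq, store, and R-type instructions finish sooner; the overall win depends on the instruction mix.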

CS141-L4-28 Tarun Soni, Summer ‘03

CPU: Reducing Cycle Time

• Cut combinational dependency graph and insert register / latch

• Do same work in two fast cycles, rather than one slow one

[Figure: storage element, Acyclic Combinational Logic, storage element => storage element, Acyclic Combinational Logic (A), storage element, Acyclic Combinational Logic (B), storage element]

CS141-L4-29 Tarun Soni, Summer ‘03

CPU: Building blocks

• Adder

• MUX

• ALU

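[Figure: block symbols. Adder: 32-bit inputs A and B plus Carry In, outputs 32-bit Sum and Carry. MUX: 32-bit inputs A and B plus Select, output 32-bit Y. ALU: 32-bit inputs A and B plus OP, output 32-bit Result]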

CS141-L4-30 Tarun Soni, Summer ‘03

CPU: Building blocks

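[Figure: two 32-bit Adders chained into a 64-bit adder. The low Adder takes A[31..0], B[31..0] and Carry In and produces Sum[31..0]; its Carry feeds the Carry In of the high Adder, which takes A[63..32] and B[63..32] and produces Sum[63..32] and Carry. A MUX symbol (inputs A and B, Select, output Y) is also shown]

• Building a 64-bit adder from 2x32-bit adders
• Speed of addition?
– For one ADD?
– For consecutive ADDS?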
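The chaining of the two adders can be written out directly. A minimal behavioral sketch (not a gate-level model), with function names chosen here for illustration:

```python
MASK32 = 0xFFFFFFFF


def add32(a, b, carry_in=0):
    """One 32-bit adder block: returns (32-bit sum, carry out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64(a, b, carry_in=0):
    """64-bit add built from two add32 blocks.

    The carry out of the low half is the carry in of the high half,
    so the high adder cannot produce its final result until the low
    adder has finished.
    """
    low_sum, low_carry = add32(a & MASK32, b & MASK32, carry_in)
    high_sum, carry_out = add32(a >> 32, b >> 32, low_carry)
    return (high_sum << 32) | low_sum, carry_out


print(add64(0xFFFFFFFF, 1))
print(add64(0xFFFFFFFFFFFFFFFF, 1))
```

For a single ADD the two blocks work one after the other, so the latency is roughly two 32-bit additions. For consecutive ADDs, a register between the halves would let the low half of the next addition start while the high half of the current one completes, which is the same idea as cutting a long combinational path with a storage element.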


CS141-L4-31 Tarun Soni, Summer ‘03

Multicycle CPU: Individual operations

• Next address logic

– PC <= branch ? PC + offset : PC + 4

• Instruction Fetch

– InstructionReg <= Mem[PC]

• Register Access

– A <= R[rs]

• ALU operation

– R <= A + B

[Figure: abstract datapath. Stages: PC, Next PC, Instruction Fetch, Operand Fetch, Exec, Mem Access (Data Mem), Reg. File, Result Store. Control drives nPC_sel, RegDst, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegWr]

CS141-L4-32 Tarun Soni, Summer ‘03

• Five execution steps (some instructions use fewer)
– IF: Instruction Fetch
– ID: Instruction Decode (& register fetch & add PC+immed)
– EX: Execute
– Mem: Memory access
– WB: Write-Back into registers

IF ID EX Mem WB

         I cache   Decode, R-Read   ALU   PC update   D cache   R-Write   Total
R-type   1         1                .9    -           -         .8        3.7
Load     1         1                .9    -           1         .8        4.7
Store    1         1                .9    -           1         -         3.9
beq      1         1                .9    .1          -         -         3.0

Multicycle CPU: Partitioning Time

CS141-L4-33 Tarun Soni, Summer ‘03

[Figure: single-cycle datapath divided into five steps. Elements: PC, Instruction memory, Registers, Sign extend (16 to 32), Shift left 2, two Add units, ALU (ALU operation, Zero, ALU result), Data memory, and muxes controlled by PCSrc, ALUSrc, MemtoReg; control signals RegWrite, MemRead, MemWrite]

Multicycle CPU: Steps
Note: Reuse of ALU

CS141-L4-34 Tarun Soni, Summer ‘03

Multicycle CPU

Partitioning the CPI=1 Datapath

• Add registers between smallest steps

[Figure: abstract datapath. Stages: PC, Next PC, Instruction Fetch, Operand Fetch, Exec, Mem Access (Data Mem), Reg. File, Result Store; control signals nPC_sel, RegDst, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegWr]

CS141-L4-35 Tarun Soni, Summer ‘03

Multicycle CPU

[Figure: timing comparison against Clk]

Single Cycle Implementation:
Cycle 1: Load
Cycle 2: Store, followed by Waste

Multiple Cycle Implementation:
Load: Cycles 1-5 (Ifetch, Reg, Exec, Mem, Wr)
Store: Cycles 6-9 (Ifetch, Reg, Exec, Mem)
R-type: starts in Cycle 10 (Ifetch)

CS141-L4-36 Tarun Soni, Summer ‘03

Step: Instruction Fetch (all instructions)
   IR = Mem[PC]
   PC = PC + 4

Step: Instruction Decode/register fetch (all instructions)
   A = Reg[IR[25-21]]
   B = Reg[IR[20-16]]
   ALUout = PC + (sign-extend(IR[15-0]) << 2)

Step: Execution, address computation, branch completion
   R-type: ALUout = A op B
   Memory: ALUout = A + sign-extend(IR[15-0])
   Branch: if (A==B) then PC=ALUout

Step: Memory access or R-type completion
   R-type: Reg[IR[15-11]] = ALUout
   Memory: memory-data = Mem[ALUout] or Mem[ALUout] = B

Step: Write-back
   Memory: Reg[IR[20-16]] = memory-data

Multicycle CPU: Instruction Types


CS141-L4-37 Tarun Soni, Summer ‘03

Multicycle CPU: Sharing Hardware

• Example: memory is used twice, at different times

– Ave mem access per inst = 1 + Flw + Fsw ~ 1.3

– if CPI is 4.8, imem utilization = 1/4.8, dmem = 0.3/4.8

• We could reduce HW without hurting performance

– extra control
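The utilization figures follow from counting memory accesses per instruction and dividing by the cycles each instruction occupies. A minimal sketch, assuming a hypothetical split of the load and store fractions (0.2 and 0.1) that adds up to the slide's 0.3:

```python
def memory_stats(f_lw, f_sw, cpi):
    """Memory accesses per instruction and per-cycle utilization.

    Every instruction is fetched once; loads and stores add one data access.
    """
    data_accesses = f_lw + f_sw
    accesses = 1 + data_accesses
    return {
        "accesses_per_inst": accesses,
        "imem_util": 1 / cpi,              # separate instruction memory
        "dmem_util": data_accesses / cpi,  # separate data memory
        "shared_util": accesses / cpi,     # one memory serving both
    }


# f_lw and f_sw are illustrative values; only their sum (~0.3) is from the slide.
stats = memory_stats(f_lw=0.2, f_sw=0.1, cpi=4.8)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Even a single shared memory is busy for only about 27% of cycles under these numbers, which is why merging the two memories need not hurt performance.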

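[State diagram: register transfers per instruction]

All instructions: IR <– Mem[PC]; then A <– R[rs]; B <– R[rt]

R-type: S <– A + B; then R[rd] <– S; PC <– PC+4
LW: S <– A + SX; then M <– Mem[S]; then R[rt] <– M; PC <– PC+4
ORi: S <– A or ZX; then R[rt] <– S; PC <– PC+4
SW: S <– A + SX; then Mem[S] <– B; PC <– PC+4
BEQ: PC <– PC+4 (not equal) or PC <– PC+SX (equal)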

CS141-L4-38 Tarun Soni, Summer ‘03

Multicycle CPU: Sharing Functional Units

[Figure: multicycle datapath with shared functional units. PC supplies the Address of a single Memory (Instruction or data); Data goes to the Instruction register and the Memory data register; Registers (three Register # inputs, Data in) feed A and B; the ALU result is held in ALUOut]

Step name: Instruction fetch
   All: IR = Memory[PC]; PC = PC + 4

Step name: Instruction decode/register fetch
   All: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2)

Step name: Execution, address computation, branch/jump completion
   R-type: ALUOut = A op B
   Memory-reference: ALUOut = A + sign-extend(IR[15-0])
   Branches: if (A == B) then PC = ALUOut
   Jumps: PC = PC[31-28] || (IR[25-0] << 2)

Step name: Memory access or R-type completion
   R-type: Reg[IR[15-11]] = ALUOut
   Memory-reference: Load: MDR = Memory[ALUOut] or Store: Memory[ALUOut] = B

Step name: Memory read completion
   Memory-reference: Load: Reg[IR[20-16]] = MDR

Reuse:

• ALU

• Memory

Need more

• Muxing

• Control

Single ALU, Common data and instruction memory datapath

CS141-L4-39 Tarun Soni, Summer ‘03

Since we reuse logic (e.g. ALU), we need to store results between states

Need extra registers when:
– signal is computed in one clock cycle and used in another, AND
– the inputs to the combinational circuit can change before the signal is written into a state element.

Multicycle CPU: Adding State Elements

CS141-L4-40 Tarun Soni, Summer ‘03

[Figure: single-cycle datapath with the points marked where state elements are added. Elements: PC, Instruction memory, Registers, Sign extend (16 to 32), Shift left 2, two Add units, ALU (ALU operation, Zero, ALU result), Data memory, and muxes controlled by PCSrc, ALUSrc, MemtoReg; control signals RegWrite, MemRead, MemWrite]

Multicycle CPU: Adding State Elements

CS141-L4-41 Tarun Soni, Summer ‘03

[Figure: full multicycle datapath. PC; shared Memory (Address, Write data, MemData); Instruction register with fields Instruction [31-26], [25–21], [20–16], [15–11], [15–0], [5–0]; Memory data register; Registers; A and B; Sign extend (16 to 32); Shift left 2; ALU with ALU control, Zero, ALU result; ALUOut; Jump address [31-0] formed from PC [31-28] and Instruction [25–0] shifted left 2 (26 to 28 bits). The Control unit takes Op [5–0] and outputs PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst]

Multicycle CPU: The Full Multi-Cycle Implementation

CS141-L4-42 Tarun Soni, Summer ‘03

Cycle 1: Instruction Fetch

[Figure: multicycle datapath (PC, shared Memory, Instruction register, Memory data register, Registers, A, B, ALU, ALUOut) with the Control unit and its outputs, active paths for instruction fetch]

Datapath: IR = Memory[PC], PC = PC + 4 (may be revised later)
Control: IorD=0, MemRead=1, MemWr=0, IRwrite=1, ALUsrcA=0, etc


CS141-L4-43 Tarun Soni, Summer ‘03

A = Register[IR[25-21]]
B = Register[IR[20-16]]
ALUout = PC + (sign-extend(IR[15-0]) << 2)

[Figure: multicycle datapath (PC, shared Memory, Instruction register, Memory data register, Registers, A, B, ALU, ALUOut) with the Control unit and its outputs, active paths for instruction decode and register fetch]

Cycle 2: Instruction Decode

CS141-L4-44 Tarun Soni, Summer ‘03

A = Reg[IR[25-21]]
B = Reg[IR[20-16]]
ALUout = PC + (sign-extend(IR[15-0]) << 2)

We compute target address even though we don’t know if it will be used
– Operation may not be branch
– Even if it is, branch may not be taken

Why? Everything up to this point must be instruction-independent, because we haven’t decoded the instruction.

The ALU, the (incremented) PC, and the immed field are now all available

Cycle 2: Instruction Decode & RegFetch
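The target computation done speculatively in this cycle is a sign extension, a shift, and an add. A minimal sketch, assuming 32-bit wraparound arithmetic and using example addresses that are not from the slides:

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(incremented_pc, imm16):
    """ALUout = PC + (sign-extend(IR[15-0]) << 2), with PC already PC + 4."""
    return (incremented_pc + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


# Forward branch by 3 instructions, then backward branch by 1 instruction.
print(hex(branch_target(0x1004, 0x0003)))
print(hex(branch_target(0x1004, 0xFFFF)))
```

Because only the PC and the immediate field are needed, the computation can run before the opcode is known; if the instruction turns out not to be a taken branch, ALUout is simply overwritten later.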

CS141-L4-45 Tarun Soni, Summer ‘03

Cycle 3 for beq: EXecute

• In cycle 1, PC was incremented by 4
• In cycle 2, ALUout was set to branch target
• This cycle, we conditionally reset PC: if (A==B) PC=ALUout

[Figure: multicycle datapath (PC, shared Memory, Instruction register, Memory data register, Registers, A, B, ALU, ALUOut) with the Control unit and its outputs; A, B, and ALUout are highlighted]

CS141-L4-46 Tarun Soni, Summer ‘03

• Cycle 3 (EXecute)

ALUout = A op B

• Cycle 4 (WriteBack)

Reg[IR[15-11]] = ALUout

R-type instruction is finished

Cycle 3: R-type Instruction

CS141-L4-47 Tarun Soni, Summer ‘03

Cycle 3: ALUout = A op B

Cycle 4: Reg[IR[15-11]] = ALUout

[Figure: multicycle datapath (PC, shared Memory, Instruction register, Memory data register, Registers, A, B, ALU, ALUOut) with the Control unit and its outputs; A and B are highlighted]

Cycle 3: R-type Instruction

CS141-L4-48 Tarun Soni, Summer ‘03

Cycle 3: ALUout = A op B

Cycle 4: Reg[IR[15-11]] = ALUout

[Figure: multicycle datapath (PC, shared Memory, Instruction register, Memory data register, Registers, A, B, ALU, ALUOut) with the Control unit and its outputs; A, B, and ALUout are highlighted]

Cycle 4: R-type Instruction


CS141-L4-49 Tarun Soni, Summer ‘03

Multicycle CPU: The datapath

[Figure: abstract multicycle datapath. Stages: PC, Next PC, Instruction Fetch, Operand Fetch, Ext, ALU, Mem Access (Data Mem), Reg. File, Result Store, with registers IR, A, B, R, M between them; control signals nPC_sel, RegDst, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegWr; Equal returns from the datapath]

Extra Registers:
• IR
• A, B
• R (sometimes called S or ALUout)
• M

CS141-L4-50 Tarun Soni, Summer ‘03

Multicycle CPU: The datapath

• Logical Register Transfer

• Physical Register Transfers

inst Logical Register Transfers

ADDU R[rd] <– R[rs] + R[rt]; PC <– PC + 4

inst Physical Register Transfers

IR <– MEM[pc]

ADDU A<– R[rs]; B <– R[rt]

S <– A + B

R[rd] <– S; PC <– PC + 4
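The difference between the logical transfer and the physical transfers is that the latter name the intermediate registers (IR, A, B, S) written on each clock. A minimal sketch that steps through the ADDU transfers, assuming a dictionary register file and instruction fields passed in directly rather than decoded from a fetched word:

```python
def run_addu(regs, pc, rs, rt, rd):
    """Apply the four physical register transfers of ADDU, one per cycle.

    Returns (new register file, new PC, list of transfers performed).
    """
    trace = []

    # Cycle 1: IR <- MEM[pc] (the fields rs, rt, rd stand in for IR)
    trace.append("IR <- MEM[pc]")

    # Cycle 2: A <- R[rs]; B <- R[rt]
    a, b = regs[rs], regs[rt]
    trace.append("A <- R[rs]; B <- R[rt]")

    # Cycle 3: S <- A + B (unsigned add, wraps at 32 bits)
    s = (a + b) & 0xFFFFFFFF
    trace.append("S <- A + B")

    # Cycle 4: R[rd] <- S; PC <- PC + 4
    new_regs = dict(regs)
    new_regs[rd] = s
    trace.append("R[rd] <- S; PC <- PC + 4")

    return new_regs, pc + 4, trace


regs, pc, trace = run_addu({1: 5, 2: 7, 3: 0}, pc=0x100, rs=1, rt=2, rd=3)
print(regs[3], hex(pc), len(trace))
```

The net effect matches the logical transfer R[rd] <- R[rs] + R[rt]; PC <- PC + 4, but spread over four clock cycles.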

[Figure: abstract multicycle datapath. PC, Next PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access (Data Mem), M, Reg. File; Equal returns from the datapath]

CS141-L4-51 Tarun Soni, Summer ‘03

Multicycle CPU: The datapath

• Logical Register Transfer

• Physical Register Transfers

inst Logical Register Transfers

ORI R[rt] <– R[rs] OR zx(Im16); PC <– PC + 4

inst Physical Register Transfers

IR <– MEM[pc]

ORI A<– R[rs]; B <– R[rt]

S <– ( A or ZeroExt(Im16) )

R[rt] <– S; PC <– PC + 4

[Figure: abstract multicycle datapath. PC, Next PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access (Data Mem), M, Reg. File; Equal returns from the datapath]

CS141-L4-52 Tarun Soni, Summer ‘03

Multicycle CPU: The datapath

• Logical Register Transfer

• Physical Register Transfers

inst Logical Register Transfers

LW R[rt] <– MEM(R[rs] + sx(Im16));

PC <– PC + 4

inst Physical Register Transfers

IR <– MEM[pc]

LW A<– R[rs]; B <– R[rt]

S <– A + SignEx(Im16)

M <– MEM[S]

R[rt] <– M; PC <– PC + 4

[Figure: abstract multicycle datapath. PC, Next PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access (Data Mem), M, Reg. File; Equal returns from the datapath]

CS141-L4-53 Tarun Soni, Summer ‘03

Multicycle CPU: The datapath

• Logical Register Transfer

• Physical Register Transfers

inst Logical Register Transfers

SW MEM(R[rs] + sx(Im16)) <– R[rt];

PC <– PC + 4

inst Physical Register Transfers

IR <– MEM[pc]

SW A<– R[rs]; B <– R[rt]

S <– A + SignEx(Im16);

MEM[S] <– B; PC <– PC + 4

[Figure: abstract multicycle datapath. PC, Next PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access (Data Mem), M, Reg. File; Equal returns from the datapath]

CS141-L4-54 Tarun Soni, Summer ‘03

Multicycle CPU: The datapath

• Logical Register Transfer

• Physical Register Transfers

inst Logical Register Transfers

BEQ if R[rs] == R[rt]

then PC <= PC + sx(Im16) || 00

else PC <= PC + 4

inst Physical Register Transfers

IR <– MEM[pc]

BEQ & ~Eq PC <– PC + 4

inst Physical Register Transfers

IR <– MEM[pc]

BEQ & Eq PC <– PC + sx(Im16) || 00

[Figure: abstract multicycle datapath. PC, Next PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access (Data Mem), M, Reg. File; Equal returns from the datapath]


CS141-L4-55 Tarun Soni, Summer ‘03

Multicycle CPU: Summary

Step name: Instruction fetch
   All: IR = Memory[PC]; PC = PC + 4

Step name: Instruction decode/register fetch
   All: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2)

Step name: Execution, address computation, branch/jump completion
   Action for R-type instructions: ALUOut = A op B
   Action for memory-reference instructions: ALUOut = A + sign-extend(IR[15-0])
   Action for branches: if (A == B) then PC = ALUOut
   Action for jumps: PC = PC[31-28] || (IR[25-0] << 2)

Step name: Memory access or R-type completion
   Action for R-type instructions: Reg[IR[15-11]] = ALUOut
   Action for memory-reference instructions: Load: MDR = Memory[ALUOut] or Store: Memory[ALUOut] = B

Step name: Memory read completion
   Action for memory-reference instructions: Load: Reg[IR[20-16]] = MDR

CS141-L4-56 Tarun Soni, Summer ‘03

Multicycle CPU: Mid-term alert !!

• How many cycles will it take to execute this code?

lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label #assume not
add $t5, $t2, $t3
sw $t5, 8($t3)

Label: ...

• What is going on during the 8th cycle of execution?

• In what cycle does the actual addition of $t2 and $t3 take place?
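One way to approach questions like these is to lay the instructions end to end on a cycle timeline. A minimal sketch, assuming the per-class cycle counts given earlier (load 5, store and R-type 4, beq 3), that the beq is not taken, and that the execute step is always the third cycle of an instruction:

```python
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

program = ["lw", "lw", "beq", "add", "sw"]


def schedule(instructions):
    """Return a list whose entry i describes clock cycle i + 1.

    Each entry is (instruction index, opcode, step number within
    that instruction), since a multicycle CPU runs one instruction
    at a time with no overlap.
    """
    timeline = []
    for index, op in enumerate(instructions):
        for step in range(1, CYCLES[op] + 1):
            timeline.append((index, op, step))
    return timeline


timeline = schedule(program)
total_cycles = len(timeline)
eighth_cycle = timeline[8 - 1]
# Cycle in which the add instruction is in its third (execute) step.
add_execute_cycle = timeline.index((3, "add", 3)) + 1

print(total_cycles, eighth_cycle, add_execute_cycle)
```

Under these assumptions the code takes 21 cycles, the 8th cycle is the third step (address computation) of the second lw, and the addition of $t2 and $t3 happens in cycle 16.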

CS141-L4-57 Tarun Soni, Summer ‘03

Multicycle CPU: Sharing Hardware

“Princeton” Organization

• Single memory for instruction and data access

– memory utilization -> 1.3/4.8

• In this case our state diagram does not change

– several additional control signals

– must ensure each bus is only driven by one source on each cycle

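[Figure: bus-based datapath. RegFile, A, and B drive the A-Bus and B Bus; IR, S, PC, nextPC, ZX, SX, and Mem connect to the buses; results return on the W-Bus]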

CS141-L4-58 Tarun Soni, Summer ‘03

Multicycle CPU: Control Line Timing

[Figure: multicycle datapath with control lines IorD, MemRead, MemWrite, IRWrite, MemtoReg, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, shown above a timing diagram against Clk: Load occupies Cycles 1-5 (Ifetch, Reg, Exec, Mem, Wr), Store occupies Cycles 6-9 (Ifetch, Reg, Exec, Mem), an R-type starts with Ifetch in Cycle 10, and the IRWrite waveform is traced across the cycles]

CS141-L4-59 Tarun Soni, Summer ‘03

Review: Finite State Machines

• Finite state machines:

– a set of states and

– next state function (determined by current state and the input)

– output function (determined by current state and possibly input)

– We’ll use a Moore machine (output based only on current state)

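[Figure: Inputs and Current state feed the Next-state function and the Output function; the Next state is loaded into Current state on the Clock; the Output function produces the Outputs]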

CS141-L4-60 Tarun Soni, Summer ‘03

Multicycle CPU: Control

[Figure: Control logic block. Inputs: Instruction register opcode field (Op5..Op0) and current state (S3..S0) from the State register. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and next state (NS3..NS0)]

if (State == InstructionFetch)
{
    IRWrite = 1;
    // All other signals are 0;
    State = OperandFetch;
}

if (State == Execute && InstructionOpCode == BEQ)
{
    // Do your thing..
}

ControlOutput = f(State, OpCode)

NextState = f(State, OpCode)
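The two functions above can be written out as tables. A minimal sketch of a Moore-style controller, assuming the state numbers 0 to 9 follow the usual textbook numbering of the ten-state multicycle diagram (an assumption here) and that opcodes are passed as short strings rather than 6-bit fields:

```python
# Moore outputs: they depend on the current state only.
# Signals not listed for a state are 0 (deasserted) or don't-care.
OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

# Dispatch out of the decode state, keyed by instruction class.
DECODE = {"lw": 2, "sw": 2, "R-type": 6, "beq": 8, "j": 9}


def next_state(state, op):
    """NextState = f(State, OpCode)."""
    if state == 0:          # instruction fetch -> decode
        return 1
    if state == 1:          # decode -> first instruction-specific state
        return DECODE[op]
    if state == 2:          # address computed -> load or store access
        return 3 if op == "lw" else 5
    if state == 3:          # load data read -> write-back
        return 4
    if state == 6:          # R-type execute -> R-type completion
        return 7
    return 0                # states 4, 5, 7, 8, 9 finish the instruction


def trace(op):
    """List the states visited while executing one instruction."""
    states = [0]
    while True:
        following = next_state(states[-1], op)
        if following == 0:
            return states
        states.append(following)


for op in DECODE:
    print(op, trace(op))
```

The lengths of the traces reproduce the cycle counts used earlier: five states for lw, four for sw and R-type, three for beq and j.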


CS141-L4-61 Tarun Soni, Summer ‘03

Multicycle CPU: Our basic FSM

[State diagram: Instruction fetch, then Decode and Register Fetch, then one of four paths: Memory instructions, R-type instructions, Branch instructions, Jump instruction]

CS141-L4-62 Tarun Soni, Summer ‘03

Multicycle CPU: Control

[State diagram: register transfers by step]

“instruction fetch”: IR <= MEM[PC]

“decode / operand fetch”: A <= R[rs]; B <= R[rt]

Execute:
– R-type: S <= A fun B
– ORi: S <= A or ZX
– LW: S <= A + SX
– SW: S <= A + SX
– BEQ & ~Equal: PC <= PC + 4
– BEQ & Equal: PC <= PC + SX || 00

Memory:
– LW: M <= MEM[S]
– SW: MEM[S] <= B; PC <= PC + 4

Write-back:
– R-type: R[rd] <= S; PC <= PC + 4
– ORi: R[rt] <= S; PC <= PC + 4
– LW: R[rt] <= M; PC <= PC + 4

CS141-L4-63 Tarun Soni, Summer ‘03

Multicycle CPU: Control

[State diagram: multicycle control FSM]

State 0, Instruction fetch (Start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00. Next: state 1.
State 1, Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: state 2 if (Op = 'LW') or (Op = 'SW'); state 6 if (Op = R-type); state 8 if (Op = 'BEQ'); state 9 if (Op = 'J').
State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: state 3 if (Op = 'LW'); state 5 if (Op = 'SW').
State 3, Memory access: MemRead, IorD = 1. Next: state 4.
State 4, Write-back step: RegDst = 0, RegWrite, MemtoReg = 1. Next: state 0.
State 5, Memory access: MemWrite, IorD = 1. Next: state 0.
State 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: state 7.
State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0. Next: state 0.
State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01. Next: state 0.
State 9, Jump completion: PCWrite, PCSource = 10. Next: state 0.

Number of states?

Number of bits for state?
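One way to check the two questions above is to write the state diagram down as a table and count. The sketch below is only an illustration: the table layout and the helper names (NEXT_STATE, next_state, cycles) are assumptions made for this example, and the state numbers 0 to 9 follow the diagram on this slide.

```python
# Next-state table for the multicycle controller (state numbers 0-9).
# A key of None marks an unconditional transition; string keys are
# opcode-dependent transitions.
NEXT_STATE = {
    0: {None: 1},                                           # instruction fetch
    1: {"LW": 2, "SW": 2, "R-type": 6, "BEQ": 8, "J": 9},   # decode / register fetch
    2: {"LW": 3, "SW": 5},                                  # memory address computation
    3: {None: 4},                                           # memory access (load)
    4: {None: 0},                                           # write-back step
    5: {None: 0},                                           # memory access (store)
    6: {None: 7},                                           # execution
    7: {None: 0},                                           # R-type completion
    8: {None: 0},                                           # branch completion
    9: {None: 0},                                           # jump completion
}


def next_state(state, op):
    """Return the successor of `state` for an instruction of class `op`."""
    row = NEXT_STATE[state]
    return row[None] if None in row else row[op]


def cycles(op):
    """Count the clock cycles one instruction spends before returning to fetch."""
    state, count = 0, 0
    while True:
        state = next_state(state, op)
        count += 1
        if state == 0:
            return count


num_states = len(NEXT_STATE)
# Bits needed to give every state a distinct binary code.
state_bits = (num_states - 1).bit_length()

print(num_states, state_bits)                                  # 10 4
print({op: cycles(op) for op in ("LW", "SW", "R-type", "BEQ", "J")})
```

Under these assumptions a load takes five cycles while a branch or jump takes three, which is the point of letting different instructions use different numbers of states.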

CS141-L4-64 Tarun Soni, Summer ‘03

Multicycle CPU: Control: Assigning States

State 0000, “instruction fetch”: IR <= MEM[PC]

State 0001, “decode”: A <= R[rs]; B <= R[rt]

Instruction     Execute                      Memory                                 Write-back
R-type          0100: S <= A fun B                                                  0101: R[rd] <= S; PC <= PC + 4
ORi             0110: S <= A or ZX                                                  0111: R[rt] <= S; PC <= PC + 4
LW              1000: S <= A + SX            1001: M <= MEM[S]                      1010: R[rt] <= M; PC <= PC + 4
SW              1011: S <= A + SX            1100: MEM[S] <= B; PC <= PC + 4
BEQ & ~Equal    0011: PC <= PC + 4
BEQ & Equal     0010: PC <= PC + SX || 00

CS141-L4-65 Tarun Soni, Summer ‘03

Multicycle CPU: Detailed control spec.

Columns: State | Op field | Eq | Next | control outputs, left to right (IR; PC en, sel; Ops A, B; Exec Ex, Sr, ALU, S; Mem R, W, M; Write-Back M-R, Wr, Dst)

        State  Op field  Eq  Next   Control outputs
        0000   ??????    ?   0001   1
        0001   BEQ       0   0011   1 1
        0001   BEQ       1   0010   1 1
        0001   R-type    x   0100   1 1
        0001   orI       x   0110   1 1
        0001   LW        x   1000   1 1
        0001   SW        x   1011   1 1
        0010   xxxxxx    x   0000   1 1
        0011   xxxxxx    x   0000   1 0
R:      0100   xxxxxx    x   0101   0 1 fun 1
        0101   xxxxxx    x   0000   1 0 0 1 1
ORi:    0110   xxxxxx    x   0111   0 0 or 1
        0111   xxxxxx    x   0000   1 0 0 1 0
LW:     1000   xxxxxx    x   1001   1 0 add 1
        1001   xxxxxx    x   1010   1 0 0
        1010   xxxxxx    x   0000   1 0 1 1 0
SW:     1011   xxxxxx    x   1100   1 0 add 1
        1100   xxxxxx    x   0000   1 0 0 1

CS141-L4-66 Tarun Soni, Summer ‘03

Multicycle CPU: Implementation styles

• ROM = "Read Only Memory"

– values of memory locations are fixed ahead of time

• A ROM can be used to implement a truth table

– if the address is m bits, we can address 2^m entries in the ROM.

– our outputs are the bits of data that the address points to.

– 2^m is the "height", and n is the "width"

Address (m = 3)   Data (n = 4)
0 0 0             0 0 1 1
0 0 1             1 1 0 0
0 1 0             1 1 0 0
0 1 1             1 0 0 0
1 0 0             0 0 0 0
1 0 1             0 0 0 1
1 1 0             0 1 1 0
1 1 1             0 1 1 1
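As a small illustration of a ROM acting as a truth table, the sketch below stores the eight 4-bit words from the example above in a list and reads them by address. The names ROM, M_BITS, N_BITS and rom_read are made up for this example.

```python
M_BITS = 3   # address width: 2**3 = 8 entries (the "height")
N_BITS = 4   # data width: 4 output bits per entry (the "width")

# Contents copied from the slide's example table, indexed by address 000..111.
ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]


def rom_read(address):
    """Return the n-bit word stored at the given m-bit address."""
    return ROM[address]


def output_bit(address, bit):
    """Return one output column of the truth table (bit 0 is the rightmost)."""
    return (rom_read(address) >> bit) & 1


for addr in range(2 ** M_BITS):
    print(format(addr, "03b"), format(rom_read(addr), "04b"))
```

Each output column is an arbitrary Boolean function of the address bits, which is why a ROM can implement any truth table of that size.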

CS141-L4-67 Tarun Soni, Summer ‘03

Multicycle CPU: Implementation styles

• How many inputs are there? 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)

• How many outputs are there? 16 datapath-control outputs, 4 state bits = 20 outputs

• ROM is 2^10 x 20 = 20K bits (and a rather unusual size)

• Rather wasteful, since for lots of the entries, the outputs are the same, i.e., opcode is often ignored

CS141-L4-68 Tarun Soni, Summer ‘03

Multicycle CPU: Implementation styles

• Break up the table into two parts

– 4 state bits tell you the 16 outputs, 2^4 x 16 bits of ROM

– 10 bits tell you the 4 next state bits, 2^10 x 4 bits of ROM

– Total: 4.3K bits of ROM

• PLA is much smaller

– can share product terms

– only need entries that produce an active output

– can take into account don't cares

• Size is (#inputs x #product-terms) + (#outputs x #product-terms)

For this example = (10x17)+(20x17) = 510 PLA cells

• PLA cells usually about the size of a ROM cell (slightly bigger)
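The size comparison above is plain arithmetic, so it can be checked in a few lines. The sketch below only restates the numbers from these slides (6 opcode bits, 4 state bits, 16 datapath-control outputs, 17 product terms); the variable names are chosen for this example.

```python
opcode_bits = 6
state_bits = 4
datapath_outputs = 16
product_terms = 17

address_lines = opcode_bits + state_bits      # 10 ROM address lines
outputs = datapath_outputs + state_bits       # 20 output bits

# One big ROM: every (opcode, state) pair stores all 20 outputs.
single_rom_bits = 2 ** address_lines * outputs

# Split ROM: datapath outputs depend on the state only,
# next-state bits depend on opcode and state.
output_rom_bits = 2 ** state_bits * datapath_outputs
next_state_rom_bits = 2 ** address_lines * state_bits
split_rom_bits = output_rom_bits + next_state_rom_bits

# PLA: (#inputs x #product-terms) + (#outputs x #product-terms).
pla_cells = address_lines * product_terms + outputs * product_terms

print(single_rom_bits)   # 20480, i.e. 20K bits
print(split_rom_bits)    # 4352, i.e. about 4.3K bits
print(pla_cells)         # 510
```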

CS141-L4-69 Tarun Soni, Summer ‘03

Multicycle CPU: Implementation styles

PLA Implementation

[Figure: PLA with inputs Op5..Op0 and S3..S0, and outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst, NS3, NS2, NS1, NS0]

IRWrite = (!S0 && !S1 && !S2 && !S3)

NS0 = (S[3..0] == 0000) || (S[3..0] == 0110) || (S[3..0] == 0001 && OP[5..0] == 000010) || (…)(…)

CS141-L4-70 Tarun Soni, Summer ‘03

Microprogramming

[Figure: the ten-state multicycle control FSM (states 0 to 9, from instruction fetch through jump completion), repeated from the "Multicycle CPU: Control" slide]

• Control is the hard part of processor design

° Datapath is fairly regular and well-organized

° Memory is highly regular

° Control is irregular and global

Consider the FSM in the case of 100s of instructions!

• FSMs get unmanageable quickly as they grow.

– hard to specify

– hard to manipulate

– error prone

– hard to visualize

• The state diagrams that define the controller for an instruction set processor are highly structured

• Use this structure to construct a simple “microsequencer”

• Control reduces to programming this very simple device

– microprogramming

CS141-L4-71 Tarun Soni, Summer ‘03

Microprogramming

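[Figure: microprogrammed control. Control logic (PLA or ROM) with its inputs and outputs drives the multicycle datapath; the next state comes from a state register, an adder (+1) and address select logic fed by the opcode.]

Types of “branching”

• Set state to 0

• Dispatch (state 1)

• Use incremented state number

Common case: State += 1;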
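The three kinds of "branching" can be sketched as a tiny sequencer. This is an illustration under assumptions, not the course's implementation: the names SEQ, FETCH, DISPATCH, ADDR_CTL and DISPATCH_ROM are invented here, the state numbers are the 0 to 9 numbering of the control FSM, and treating state 2 as a second dispatch point (to split LW from SW) is an assumption of this sketch, since the slide only names state 1.

```python
# The three ways of choosing the next state.
SEQ, FETCH, DISPATCH = "seq", "fetch", "dispatch"

# How each FSM state picks its successor.
ADDR_CTL = {
    0: SEQ,        # fetch -> decode
    1: DISPATCH,   # decode: branch on the opcode
    2: DISPATCH,   # address computation: LW and SW part ways (assumption)
    3: SEQ,        # load memory access -> write-back
    4: FETCH, 5: FETCH,
    6: SEQ,        # execution -> R-type completion
    7: FETCH, 8: FETCH, 9: FETCH,
}

# Dispatch table: (current state, opcode class) -> target state.
DISPATCH_ROM = {
    (1, "LW"): 2, (1, "SW"): 2, (1, "R-type"): 6, (1, "BEQ"): 8, (1, "J"): 9,
    (2, "LW"): 3, (2, "SW"): 5,
}


def next_address(state, opcode):
    """Address select logic: increment, go back to 0, or dispatch."""
    mode = ADDR_CTL[state]
    if mode == SEQ:
        return state + 1              # common case: State += 1
    if mode == FETCH:
        return 0                      # set state to 0
    return DISPATCH_ROM[(state, opcode)]


def trace(opcode):
    """List the states one instruction visits, ending back at fetch."""
    states = [0]
    while True:
        states.append(next_address(states[-1], opcode))
        if states[-1] == 0:
            return states


print(trace("LW"))       # [0, 1, 2, 3, 4, 0]
print(trace("R-type"))   # [0, 1, 6, 7, 0]
```

Most states need no opcode at all, which is why a small adder plus a dispatch table can replace a full next-state function of opcode and state.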

Microprogramming: A particular strategy for implementing the control unit of a processor by "programming" at the level of register transfer operations

Microarchitecture: Logical structure and functional capabilities of the hardware as seen by the microprogrammer

Historical Note:

IBM 360 Series first to distinguish between architecture & organization. Same instruction set across wide range of implementations, each with different cost/performance

CS141-L4-72 Tarun Soni, Summer ‘03

Macro-Micro programming?

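[Figure: Main memory holds the user program plus data (ADD, SUB, AND, ..., DATA); this can change! The CPU contains the execution unit and the control memory. The control memory holds microsequences, e.g., the AND microsequence: Fetch, Calc Operand Addr, Fetch Operand(s), Calculate, Save Answer(s). One of these (an instruction in main memory) is mapped into one of these (a microsequence in control memory).]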

CS141-L4-73 Tarun Soni, Summer ‘03

Horizontal Microinstructions

° “Horizontal” Microcode

– control field for each control point in the machine

Microinstruction format: µseq | µaddr | A-mux | B-mux | bus enables | register enables

[Figure: the opcode goes through instruction decode into the control logic / store (PLA, ROM); the microinstruction read from the store drives the control points of the datapath, and conditions from the datapath feed back into the control logic.]

Depending on bus organization, many potential control combinations are simply wrong, i.e., they imply transfers that can never happen at the same time.

Idea: encode fields to save ROM space

Example: mem_to_reg and ALU_to_reg should never happen simultaneously; => encode in a single bit which is decoded, rather than two separate bits
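The encoding idea can be sketched as follows: signals that are never active together share one encoded field, and a small local decoder expands the field back into individual control lines. The helper names field_width and decode are assumptions for this example; only mem_to_reg and ALU_to_reg come from the slide.

```python
def field_width(num_exclusive_signals, allow_none=True):
    """Bits needed to encode signals of which at most one is active at a time.

    With allow_none=True one extra code is reserved for "no signal active".
    """
    codes = num_exclusive_signals + (1 if allow_none else 0)
    return max(1, (codes - 1).bit_length())


def decode(code, signals, allow_none=True):
    """Local decoder: expand an encoded field into one control line per signal."""
    active = code - 1 if allow_none else code
    return {name: (index == active) for index, name in enumerate(signals)}


WRITE_SOURCE = ["mem_to_reg", "ALU_to_reg"]

# Horizontal form: one bit per signal.
print(len(WRITE_SOURCE))                            # 2
# Encoded form from the slide's example: a single bit selects one of the two.
print(field_width(2, allow_none=False))             # 1
print(decode(0, WRITE_SOURCE, allow_none=False))    # mem_to_reg active
print(decode(1, WRITE_SOURCE, allow_none=False))    # ALU_to_reg active
```

The saving grows with the number of mutually exclusive signals: eight such signals need eight bits in horizontal form but only three or four bits when encoded, at the cost of a decoder on the path to the control points.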

CS141-L4-74 Tarun Soni, Summer ‘03

Vertical Microinstructions

° “Vertical” Microcode

– encoded control fields with local decode

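[Figure: a microinstruction with encoded src and dst fields, each expanded by a local decoder (DEC), plus other control fields; a MUX selects among inputs to determine the next state. Some of these may have nothing to do with registers!]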

CS141-L4-75 Tarun Soni, Summer ‘03

Design Microinstruction Sets

1) Start with list of control signals

2) Group signals together that make sense (vs. random): called “fields”

3) Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last)

4) Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals

– Use computers to design computers

5) To minimize the width, encode operations that will never be used at the same time

CS141-L4-76 Tarun Soni, Summer ‘03

Microinstructions

Start with list of control signals, grouped into fields

Single Bit Control

Signal name    Effect when deasserted         Effect when asserted
ALUSelA        1st ALU operand = PC           1st ALU operand = Reg[rs]
RegWrite       None                           Reg. is written
MemtoReg       Reg. write data input = ALU    Reg. write data input = memory
RegDst         Reg. dest. no. = rt            Reg. dest. no. = rd
TargetWrite    None                           Target reg. = ALU
MemRead        None                           Memory at address is read
MemWrite       None                           Memory at address is written
IorD           Memory address = PC            Memory address = ALU
IRWrite        None                           IR = Memory
PCWrite        None                           PC = PCSource
PCWriteCond    None                           IF ALUzero then PC = PCSource

Multiple Bit Control

Signal name    Value    Effect
ALUOp          00       ALU adds
               01       ALU subtracts
               10       ALU does function code
               11       ALU does logical OR
ALUSelB        000      2nd ALU input = Reg[rt]
               001      2nd ALU input = 4
               010      2nd ALU input = sign extended IR[15-0]
               011      2nd ALU input = sign extended, shift left 2 IR[15-0]
               100      2nd ALU input = zero extended IR[15-0]
PCSource       00       PC = ALU
               01       PC = Target
               10       PC = PC+4[29-26] : IR[25–0] << 2

CS141-L4-77 Tarun Soni, Summer ‘03

Microinstructions

Field Name         Width (wide)   Width (narrow)   Control Signals Set
ALU Control        4              2                ALUOp
SRC1               2              1                ALUSelA
SRC2               5              3                ALUSelB
ALU Destination    6              4                RegWrite, MemtoReg, RegDst, TargetWr.
Memory             4              3                MemRead, MemWrite, IorD
Memory Register    1              1                IRWrite
PCWrite Control    5              4                PCWrite, PCWriteCond, PCSource
Sequencing         3              2                AddrCtl
Total width        30             20               bits
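To make the narrow format concrete, the sketch below packs the eight fields into one 20-bit word and unpacks it again. The field names and narrow widths are taken from the table above; the bit ordering (first field in the most significant bits) and the helper names pack and unpack are assumptions of this example.

```python
# (field name, narrow width in bits), in the order listed in the table.
FIELDS = [
    ("ALU Control", 2), ("SRC1", 1), ("SRC2", 3), ("ALU Destination", 4),
    ("Memory", 3), ("Memory Register", 1), ("PCWrite Control", 4),
    ("Sequencing", 2),
]

TOTAL_WIDTH = sum(width for _, width in FIELDS)


def pack(values):
    """Build one microinstruction word; fields not named in `values` are 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a microinstruction word back into its named fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


micro = pack({"ALU Control": 1, "SRC2": 5, "Sequencing": 3})
print(TOTAL_WIDTH)                   # 20
print(format(micro, "020b"))
print(unpack(micro)["SRC2"])         # 5
```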

CS141-L4-78 Tarun Soni, Summer ‘03

Microinstructions: MIPS field names and values

Field Name         Values for Field   Function of Field with Specific Value
ALU                Add                ALU adds
                   Subt.              ALU subtracts
                   Func code          ALU does function code
                   Or                 ALU does logical OR
SRC1               PC                 1st ALU input = PC
                   rs                 1st ALU input = Reg[rs]
SRC2               4                  2nd ALU input = 4
                   Extend             2nd ALU input = sign ext. IR[15-0]
                   Extend0            2nd ALU input = zero ext. IR[15-0]
                   Extshft            2nd ALU input = sign ex., sl IR[15-0]
                   rt                 2nd ALU input = Reg[rt]
ALU destination    Target             Target = ALUout
                   rd                 Reg[rd] = ALUout
Memory             Read PC            Read memory using PC
                   Read ALU           Read memory using ALU output
                   Write ALU          Write memory using ALU output
Memory register    IR                 IR = Mem
                   Write rt           Reg[rt] = Mem
                   Read rt            Mem = Reg[rt]
PC write           ALU                PC = ALU output
                   Target-cond.       IF ALU Zero then PC = Target
                   jump addr.         PC = PCSource
Sequencing         Seq                Go to sequential µinstruction
                   Fetch              Go to the first microinstruction
                   Dispatch           Dispatch using ROM.

CS141-L4-79 Tarun Soni, Summer ‘03

Microinstructions: The datapath again

[Figure: multicycle datapath with PC, memory, instruction register, memory data register, register file, sign extend (16 to 32 bits), shift left 2, A and B registers, ALU with ALU control and Zero output, and ALUOut; control signals IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB, ALUOp]

Field Name         Values for Field   Function of Field with Specific Value
SRC1               PC                 1st ALU input = PC
                   rs                 1st ALU input = Reg[rs]
SRC2               4                  2nd ALU input = 4
                   Extend             2nd ALU input = sign ext. IR[15-0]
                   Extend0            2nd ALU input = zero ext. IR[15-0]
                   Extshft            2nd ALU input = sign ex., sl IR[15-0]
                   rt                 2nd ALU input = Reg[rt]
ALU destination    Target             Target = ALUout
                   rd                 Reg[rd] = ALUout

CS141-L4-80 Tarun Soni, Summer ‘03

Microinstructions: Pros-Cons

• Specification Advantages:

– Easy to design and write

– Design architecture and microcode in parallel

• Implementation (off-chip ROM) Advantages

– Easy to change since values are in memory

– Can emulate other architectures and instruction sets

– Can make use of internal registers

• Implementation Disadvantages, SLOWER now that:

– Control is implemented on same chip as processor

– ROM is no longer faster than RAM

– No need to go back and make changes

CS141-L4-81 Tarun Soni, Summer ‘03

CPU Control: Methodology

Initial representation:     Finite state diagram           | Microprogram
Sequencing control:         Explicit next state function   | Microprogram counter + dispatch ROMs
Logic representation:       Logic equations                | Truth tables
Implementation technique:   Programmable logic array       | Read only memory

CS141-L4-82 Tarun Soni, Summer ‘03

Microprogramming: the last word ?

Summary: Microprogramming one inspiration for RISC

• If simple instruction could execute at very high clock rate…

• If you could even write compilers to produce microinstructions…

• If most programs use simple instructions and addressing modes…

• If microcode is kept in RAM instead of ROM so as to fix bugs …

• If same memory used for control memory could be used instead as cache for “macroinstructions”…

• Then why not skip instruction interpretation by a microprogram and simply compile directly into lowest language of machine? (microprogramming is overkill when ISA matches datapath 1-1)

CS141-L4-83 Tarun Soni, Summer ‘03

Exceptions

Supporting exceptions in our FSM

Start: Instruction Fetch, state 0: MemRead, ALUSelA = 0, IorD = 0, IRWrite, ALUSelB = 01, ALUOp = 00, PCWrite, PCSource = 00

Instruction Decode/Register Fetch, state 1: ALUSelA = 0, ALUSelB = 11, ALUOp = 00, TargetWrite

From state 1:

Opcode = LW or SW: Memory Inst FSM

Opcode = R-type: R-type Inst FSM

Opcode = BEQ: Branch Inst FSM

Opcode = JMP: Jump Inst FSM

Opcode = anything else: to state 10

CS141-L4-84 Tarun Soni, Summer ‘03

Exceptions

Supporting exceptions in our FSM

R-type instructions

From state 1: ALUSelA = 1, ALUSelB = 00, ALUOp = 10

Then: ALUSelA = 1, RegDst = 1, RegWrite, MemtoReg = 0, ALUSelB = 10, ALUOp = 10; to state 0

On overflow: to state 11

CS141-L4-85 Tarun Soni, Summer ‘03

Exceptions

Supporting exceptions in our FSM

State 10 (illegal instruction): IntCause = 0, CauseWrite

State 11 (arithmetic overflow): IntCause = 1, CauseWrite

State 12: ALUSelA = 0, ALUSelB = 01, ALUOp = 01, EPCWrite

State 13: PCWrite, PCSource = 11; to state 0 (fetch)

[Figure: datapath additions: EPC and Cause registers next to the PC; control signals PCWrite, EPCWrite, CauseWrite, IntCause, PCSource; a sub4 block; InterruptHandlerAddress]

• Write Cause into register

• Write PC into EPC

• Load Exception Handler address to PC
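The three steps above can be sketched as register transfers on a small dictionary standing in for the CPU state. The cause codes 10 (RI) and 12 (Ovf) and the PC - 4 adjustment are the ones used on these slides; the handler address EXP_ADDR and the function name take_exception are hypothetical, chosen only for this example.

```python
# Cause codes as used on the slides.
CAUSE_RI = 10    # undefined (reserved) instruction
CAUSE_OVF = 12   # arithmetic overflow

# Hypothetical handler address, chosen only for this sketch.
EXP_ADDR = 0x40000040


def take_exception(cpu, cause):
    """Apply the exception-entry register transfers to `cpu` in place."""
    cpu["cause"] = cause          # write Cause into register
    cpu["EPC"] = cpu["PC"] - 4    # PC was already incremented during fetch
    cpu["PC"] = EXP_ADDR          # load exception handler address into PC
    return cpu


# An instruction at address 0x100 overflows: after fetch the PC already holds 0x104.
cpu = {"PC": 0x104, "EPC": 0, "cause": 0}
take_exception(cpu, CAUSE_OVF)
print(hex(cpu["EPC"]), hex(cpu["PC"]), cpu["cause"])
```

Saving PC - 4 rather than PC means EPC points at the offending instruction itself, so the handler can inspect it or restart it.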

CS141-L4-86 Tarun Soni, Summer ‘03

Exceptions

Instruction fetch: IR <= MEM[PC]; PC <= PC + 4

Decode: A <= R[rs]; B <= R[rt]

R-type: S <= A fun B; then R[rd] <= S

    overflow (additional condition from datapath): EPC <= PC - 4; PC <= exp_addr; cause <= 12 (Ovf)

ORi: S <= A op ZX; then R[rt] <= S

LW: S <= A + SX; then M <= MEM[S]; then R[rt] <= M

SW: S <= A + SX; then MEM[S] <= B

BEQ (states 0010, 0011): S <= A - B; if Equal, PC <= PC + SX || 00; if ~Equal, PC is left unchanged

other (undefined instruction): EPC <= PC - 4; PC <= exp_addr; cause <= 10 (RI)

CS141-L4-87 Tarun Soni, Summer ‘03

Summary

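• multicycle CPUs shorten the clock cycle: each instruction takes only as many cycles as it needs, instead of every instruction taking as long as the slowest one.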

• control is harder.

• microprogramming can simplify (conceptually) CPU control generation

• a microprogram is a small program inside the CPU that executes the individual instructions of the “real” program.

• exception-handling is difficult in the CPU, because the interactions between the executing instructions and the interrupt are complex and unpredictable.

CS141-L4-88 Tarun Soni, Summer ‘03

Mid-Term Review

• Technology trends: Design for the future

• Instruction Set Architectures: types of ISAs: Addressing modes, length of instruction etc.

• MIPS instruction format: basic classes of instructions

• Registers and load store architectures

• Data types, operands, memory organization/addressing

• Basic MIPS instructions: Arithmetic, logical, data transfer, branching, jumps

• Issues in jump/branching distance and immediate addressing modes

• Stacks and frames

• E.g., swap(), leaf_procedure(), nested_procedure()

• Performance: Relative (Boeing, e.g.), Metrics, Benchmarking, SPEC marks

• Execution time = Instruction Count x Cycles/Instruction x Seconds/Cycle

• Amdahl’s law: Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)

• Arithmetic: 2s complement

• Basic digital logic, 1-bit adder, full adder, 32-bit adder/subtractor

• ALU: adder+mux+special conditions

• Delays in combinational logic, clocking

• Ripple carry vs. Carry look ahead adders
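For review, the two performance formulas can be tried on small numbers. The sketch below is only a worked example: the function names and the sample figures (a 100-second program with 80 seconds of multiplies, sped up 4 times) are made up for illustration.

```python
def execution_time(instruction_count, cycles_per_instruction, seconds_per_cycle):
    """Execution time = Instruction Count x Cycles/Instruction x Seconds/Cycle."""
    return instruction_count * cycles_per_instruction * seconds_per_cycle


def time_after_improvement(time_unaffected, time_affected, amount_of_improvement):
    """Amdahl's law: only the affected part of the run time shrinks."""
    return time_unaffected + time_affected / amount_of_improvement


# 1000 instructions at 2 cycles each with a 0.5 s cycle (toy numbers).
print(execution_time(1000, 2, 0.5))          # 1000.0

# 100 s program, 80 s of it in multiplies, multiplies made 4x faster.
new_time = time_after_improvement(20, 80, 4)
print(new_time)                              # 40.0
print(100 / new_time)                        # overall speedup: 2.5
```

Even an unbounded improvement to the multiplies could never bring this program below the 20 unaffected seconds, which is the practical message of Amdahl's law.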

CS141-L4-89 Tarun Soni, Summer ‘03

Mid-Term Review

• Multiplication & Division: grade school version

• 3 incrementally better algorithms (data paths)

• Basics of Booth arithmetic

• Floating point representation

• Floating point operations (+,-,*,/)

• Guard, round and sticky bits

• Single cycle CPU

• Building blocks: Register files, memory etc.

• Storage units, clocking methodology

• PC arithmetic

• Instruction fetch

• Datapath on various operations: Load, Store, Branch, R-type, I-type

• Control: basic control signals for the MIPS subset

• Distributed control: Main control + ALU control

• PLA implementation

• Timing diagrams

CS141-L4-90 Tarun Soni, Summer ‘03

Mid-Term Review

• Multi-cycle CPU

• Datapath: registers/stages: Ifetch, A, B, Execute, Store etc.

• Various instructions through the datapath

• Control: Sharing functional units

• Finite state machine perspective for control: FSM for MIPS

• Implementation styles: ROM, PLA

• Microprogramming: Horizontal, vertical, relationship to RISC

• Exceptions: change in FSM, internal, external; need to save state.