COSC 2021: Computer Organization Instructor: Dr. Amir Asif...

Post on 22-Jul-2020

6 views 0 download

Transcript of COSC 2021: Computer Organization Instructor: Dr. Amir Asif...

COSC 2021: Computer Organization Instructor: Dr. Amir Asif

Department of Computer Science York University

Handout # 11 Multicycle Implementation of a MIPS Processor

Topics: A multiple cycle implementation

Distributed Notes

2

Why Multicycle?

Example: Assume that the operation times for major functional unit in a microprocessor are: Memory unit ~ 2ns, ALU and adders ~ 2ns, Register file ~ 1ns Compare the performance of the following instruction mix Loads: 24%; Stores: 12%; ALU instructions: 44%; Branches: 18%; Jumps: 2% on the two implementations Implementation I: Each instruction operates in 1 clock cycle Implementation II: Each instruction is as long as it needs to be.

Instruction Class Functional units used (Steps involved) ALU type Instruction fetch Register Access ALU Register Access 6ns

Load word Instruction fetch Register Access ALU Memory Access Register Access 8ns

Store word Instruction fetch Register Access ALU Memory Access 7ns

Branch Instruction fetch Register Access ALU 5ns

Branch Instruction fetch 2ns

Average time per instruction: Implementation 1: ~ 8ns Implementation 2: ~0.24(8)+0.12(7)+0.44(6)+0.18(5)+0.02(2) = 6.34ns

3

Multicycle Implementation

Instruction: —  Execution of each instruction is broken into different steps —  Each step requires 1 clock cycle —  Each instruction takes multiple clock cycles Functional Unit: —  Can be used more than once in an instruction (but still only once in a clock cycle) Advantages: —  Functional units can be shared —  ALU and adder is combined —  Single memory is used for instructions and data

4

Multicycle Implementation: Abstract Diagram

P C

M e m o r y

A d d r e s s

I n s t r u c t i o n o r d a t a

D a t a

I n s t r u c t i o n r e g i s t e r

R e g i s t e r s R e g i s t e r #

D a t a

R e g i s t e r #

R e g i s t e r #

A L U M e m o r y

d a t a r e g i s t e r

A

B

A L U O u t

—  One ALU is used for incrementing PC and for arithmetic operations —  Data memory and Instruction memory are combined —  5 additional registers are added

1.  An instruction register (IR) to hold instructions before distributing data to register file or ALU 2.  A memory data register (MDR) to hold data before distributing to register file or ALU 3.  Regsiters A and B that hold data before the ALU 4.  Register ALUout that hold data computed by ALU

5

S h i f t l e f t 2

P C

M e m o r y M e m D a t a

W r i t e d a t a

M u x 0

1 R e g i s t e r s

W r i t e r e g i s t e r W r i t e d a t a

R e a d d a t a 1

R e a d d a t a 2

R e a d r e g i s t e r 1 R e a d r e g i s t e r 2

M u x

0

1 M u x

0

1

4 I n s t r u c t i o n

[ 1 5 – 0 ]

S i g n e x t e n d

3 2 1 6

I n s t r u c t i o n [ 2 5 – 2 1 ]

I n s t r u c t i o n [ 2 0 – 1 6 ]

I n s t r u c t i o n [ 1 5 – 0 ]

I n s t r u c t i o n r e g i s t e r 1 M

u x

0

3 2

M u x

A L U r e s u l t A L U Z e r o

M e m o r y d a t a

r e g i s t e r

I n s t r u c t i o n [ 1 5 – 1 1 ]

A

B A L U O u t

0

1 A d d r e s s

Multicycle Implementation: Multiplexers added

Because functional units are shared, multiplexers are added to select data between different devices 1.  MUX before memory selects either the PC output (fetch instruction) or ALU output (storing data) 2.  MUX before “write register” selects write-register number (instruction [15-11] or instruction[20-16]) 3.  MUX before “write data” selects data from “ALUOut” (R-type instruction) or “MemData” (lw instruction) 4.  Upper MUX before ALU selects PC output (increment PC) or “Read data 1” (R-type instruction) 5.  Lower MUX before ALU selects “Read data 2”, or “sign extended instruction[15-0]” or shift left sign

extended instruction[15-0], or 4

6

Multicycle Implementation: Controls added

Because functional units are shared, multiplexers are added to select data between different devices 1.  MUX before memory selects either the PC output (fetch instruction) or ALU output (storing data) 2.  MUX before “write register” selects write-register number (instruction [15-11] or instruction[20-16])

S h i f t l e f t 2

M e m t o R e g

I o r D M e m R e a d M e m W r i t e

P C

M e m o r y M e m D a t a

W r i t e d a t a

M u x 0

1 R e g i s t e r s

W r i t e r e g i s t e r W r i t e d a t a

R e a d d a t a 1 R e a d

d a t a 2

R e a d r e g i s t e r 1 R e a d r e g i s t e r 2

I n s t r u c t i o n [ 1 5 – 1 1 ]

M u x 0

1

M u x 0

1

4

A L U O p A L U S r c B

R e g D s t R e g W r i t e

I n s t r u c t i o n [ 1 5 – 0 ]

I n s t r u c t i o n [ 5 – 0 ]

S i g n e x t e n d

3 2 1 6

I n s t r u c t i o n [ 2 5 – 2 1 ]

I n s t r u c t i o n [ 2 0 – 1 6 ]

I n s t r u c t i o n [ 1 5 – 0 ]

I n s t r u c t i o n r e g i s t e r 1 M

u x

0

3 2

A L U c o n t r o l

M u x 0

1 A L U

r e s u l t A L U

A L U S r c A

Z e r o A

B A L U O u t

I R W r i t e

A d d r e s s

M e m o r y d a t a

r e g i s t e r

7

Multicycle Implementation: Control Units added

S h i f t l e f t 2

P C M u x 0

1 R e g i s t e r s

W r i t e r e g i s t e r W r i t e d a t a

R e a d d a t a 1 R e a d

d a t a 2

R e a d r e g i s t e r 1 R e a d r e g i s t e r 2

I n s t r u c t i o n [ 1 5 – 1 1 ]

M u x 0

1 M u x 0

1

4 I n s t r u c t i o n

[ 1 5 – 0 ]

S i g n e x t e n d

3 2 1 6

I n s t r u c t i o n [ 2 5 – 2 1 ]

I n s t r u c t i o n [ 2 0 – 1 6 ]

I n s t r u c t i o n [ 1 5 – 0 ]

I n s t r u c t i o n r e g i s t e r

A L U c o n t r o l

A L U r e s u l t A L U Z e r o

M e m o r y d a t a

r e g i s t e r

A

B

I o r D M e m R e a d M e m W r i t e

M e m t o R e g

P C W r i t e C o n d P C W r i t e

I R W r i t e

A L U O p A L U S r c B A L U S r c A

R e g D s t

P C S o u r c e

R e g W r i t e C o n t r o l O u t p u t s

O p [ 5 – 0 ]

I n s t r u c t i o n [ 3 1 - 2 6 ]

I n s t r u c t i o n [ 5 – 0 ]

M u x

0

2 J u m p a d d r e s s [ 3 1 - 0 ] I n s t r u c t i o n [ 2 5 – 0 ] 2 6 2 8

S h i f t l e f t 2 P C [ 3 1 - 2 8 ]

1

1 M u x

0

3 2

M u x 0

1 A L U O u t

M e m o r y M e m D a t a

W r i t e d a t a

A d d r e s s

8

Action of 1-bit Control Signals

Control Input Effect when Deasserted (0) Effect when asserted (1)

IorD PC supplies address to memory (instruction fetch) ALUout supplies address to memory (lw/sw)

MemRead None Memory content specified by address is placed on “Memdata” o/p (lw/any instruction)

MemWrite None I/p “Write data” is stored at specified address (sw)

IRWrite None “MemData” o/p is written on IR (instruction fetch)

RegDst “Write Register” specified by Instruction[20-16] (lw) “WriteRegister” specified by Instruction[15-11] (R-type)

RegWrite None Data from “WriteData” i/p is written on the register specified by “WriteRegister” number

ALUSrcA PC is the first operand in ALU (increment PC) Register A is the first operand in ALU

MemtoReg “WriteData” of the register file comes from ALUOut “WriteData” of the register file comes from MDR

PCWrite Operation at PC depends on PCWriteCond and zero output of ALU PC is written; Source is determined by PCSource

PCWriteCond Operation at PC depends on PCWrite PC is written if zero o/p of ALU = 1; Source is determined by PCSource

9

Action of 2-bit Control Signals

Control Input Value Effect

ALUOp

00 ALU performs an add operation

01 ALU performs a subtract operation

10 The function field of Instruction defines the operation of ALU

ALUSrcB

00 The second operand of ALU comes from Register B

01 The second operand of ALU = 4

10 The second operand of ALU is sign extended Instruction[15-0]

11 The second operand of ALU is sign extended, 2-bit left shifted Instruction[15-0]

PCSource

00 Output of ALU (PC + 4) is sent to PC

01 Contents of ALUOut (branch target address = PC + 4 + 4 x offset) is sent to PC

10 Contents of Instruction[25-0], shift left by 2, and concatenated with the MSB 4-bits of PC is sent to PC (jump instruction)

10

Breaking the Instruction Execution into Clock Cycles

Execution of each instruction is broken into a series of steps —  Each step is balanced to do almost equal amount of work —  Each step takes one clock cycle —  Each step contains at the most 1 ALU operation, or 1 register file access, or 1 memory access —  Operations listed in 1 step occurs in parallel in 1 clock cycle —  Different steps occur in different clock cycles —  Different steps are:

1.  Instruction fetch step 2.  Instruction decode and register fetch step 3.  Execution, memory address computation, or branch completion step 4.  Memory access of R-type instruction completion step 5.  Memory read completion step

11

Multicycle Implementation: Control Units added

S h i f t l e f t 2

P C M u x 0

1 R e g i s t e r s

W r i t e r e g i s t e r W r i t e d a t a

R e a d d a t a 1 R e a d

d a t a 2

R e a d r e g i s t e r 1 R e a d r e g i s t e r 2

I n s t r u c t i o n [ 1 5 – 1 1 ]

M u x 0

1 M u x 0

1

4 I n s t r u c t i o n

[ 1 5 – 0 ]

S i g n e x t e n d

3 2 1 6

I n s t r u c t i o n [ 2 5 – 2 1 ]

I n s t r u c t i o n [ 2 0 – 1 6 ]

I n s t r u c t i o n [ 1 5 – 0 ]

I n s t r u c t i o n r e g i s t e r

A L U c o n t r o l

A L U r e s u l t A L U Z e r o

M e m o r y d a t a

r e g i s t e r

A

B

I o r D M e m R e a d M e m W r i t e

M e m t o R e g

P C W r i t e C o n d P C W r i t e

I R W r i t e

A L U O p A L U S r c B A L U S r c A

R e g D s t

P C S o u r c e

R e g W r i t e C o n t r o l O u t p u t s

O p [ 5 – 0 ]

I n s t r u c t i o n [ 3 1 - 2 6 ]

I n s t r u c t i o n [ 5 – 0 ]

M u x

0

2 J u m p a d d r e s s [ 3 1 - 0 ] I n s t r u c t i o n [ 2 5 – 0 ] 2 6 2 8

S h i f t l e f t 2 P C [ 3 1 - 2 8 ]

1

1 M u x

0

3 2

M u x 0

1 A L U O u t

M e m o r y M e m D a t a

W r i t e d a t a

A d d r e s s

12

Step 1: Instruction Fetch

Fetch instruction from memory and compute the address of next sequential instruction

IR = Memory[PC]; PC = PC + 4;

Operation: 1.  Send PC to the memory as address (IorD = 0) 2.  Read memory cell defined by PC (MemRead = 1) 3.  Copy output of memory (MeMdata) into IR (IRwrite = 1) 4.  Increment PC by 4 (ALUSrcA = 0, ALUSrcB = 01, PCSrc = 00) 5.  Store (PC + 4) into PC (PCWrite = 1)

13

Step 2: Instruction Decode and Register Fetch

Read register rs in register file and store content of rs in register A Read rt in register file and store content of rt from register file Compute branch target address

A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2);

Operation: 1.  Access register file to write rs in A. 2.  Access register file to write rt in B. 3.  Compute branch target address and store in ALUOut (ALUSrcA = 0; ALUSrcB = 11)

Remember that ALU must add (ALUOp = 00)

After this step, one of the four actions are possible: Memory reference (lw/sw), R-type, Branch, or Jump

14

Step 3: Execution, Memory address Computation, or Branch Completion

Memory Reference (sw/lw): ALUOut = A + sign-extend(IR[15-0])

ALU adds content of A and sign-extend(IR[15-0]) (ALUSrcA = 1, ALUSrcB = 10), (ALUOp = 00)

R-type (add/sub/or/and): ALUOut = A op B

ALU performs specified operation on A and B (ALUSrcA = 1, ALUSrcB = 00), Operation of ALU is determined by the function field code (ALUOp = 10) Branch (beq):

if (A == B) PC = ALUOut; ALU does the equal comparison operation on A and B (ALUSrcA = 1, ALUSrcB = 00), ALU must subtract (ALUOp = 01) Update PC with ALUOut if A == B (PCWriteCond = 1, PCSource = 01) . Complete. Jump (j):

PC = PC[31-28] || (IR[25-0) << 2); PC gets overwritten by output of jump address MUX (PCSource = 10, PCWrite = 1). Complete.

15

Step 4: Memory Access or R-type Instruction Completion

Memory Reference (sw/lw): MDR = Memory[ALUOut]; (for lw) or Memory[ALUOut] = B; (for sw)

1.  Address from ALUOut is applied at “address” i/p of memory (IorD = 1) 2.  For sw, MemWrite = 1. For lw, MemRead = 1.

sw is complete. R-type Instruction (add/sub/or/and):

Reg[IR[15-11]] = ALUOut;

ALUOut is stored into the register specified by IR[15-11] (MemtoReg = 0, RegWrite = 1). Complete.

16

Step 5: Memory Read Completion

load (lw): Reg[IR[20-16]] = MDR;

MDR is stored into the register specified by IR[20-16] (MemtoReg = 1, RegWrite = 1, RegDst = 0)

17

Summary of Steps used in different Instructions

Step Name Action for

R-type Instruction Memory Reference Instruction Branch Jump

Instruction fetch IR = Memory[PC];

PC = PC + 4;

Instruction decode / Register fetch

A = Reg[IR[25-21]]; B = Reg[IR[20-16]];

ALUOut = PC + (sign-extend(IR[15-0])<<2);

R-type Execution / address computation / Branch / Jump

ALUOut = A op B ALUOut = A + sign-extend(IR[15-0])

if(A == B) then PC = ALUOut;

PC = PC[31-28]|| (IR[25-0)<<2);

Memory Access / R-type Completion

Reg[IR[15-11]] = ALUOut;

lw: MDR = Memory[ALUOut] or sw: Memory[ALUOut] = B

Memory Read Completion

lw: Reg[IR[20-16]]=MDR;

18

Multipath Datapath Implementation: Control

—  Recall that design of single cycle datapath was based on a combinational circuit —  Design of multicycle datapath is more complicated

1.  Instructions are executed in a series of steps 2.  Each step must occur in a sequence 3.  Control of multicycle must specify both the control signals and the next step

—  The control of a multicycle datapath is based on a sequential circuit referred to as a finite state machine

00

10

01 11

State 0

State 1

State 2

State 3

A finite state diagram for a 2-bit counter —  Each state specifies a set of output —  By default, unspecified outputs are assumed disabled —  The number of the arrows identify inputs

19

Finite State Machine Control of Multicycle Datapath (1)

M e m o r y a c c e s s i n s t r u c t i o n s ( F i g u r e 5 . 3 8 )

R - t y p e i n s t r u c t i o n s ( F i g u r e 5 . 3 9 )

B r a n c h i n s t r u c t i o n ( F i g u r e 5 4 0 )

J u m p i n s t r u c t i o n ( F i g u r e 5 . 41 )

I n s t r u c t i o n f e t c h / d e c o d e a n d r e g i s t e r f e t c h ( F i g u r e 5 . 3 7 )

S t a r t

High-Level View

20

Finite State Machine Control of Multicycle Datapath (2)

A L U S r c A = 0 A L U S r c B = 1 1 A L U O p = 0 0

M e m R e a d A L U S r c A = 0

I o r D = 0 I R W r i t e

A L U S r c B = 0 1 A L U O p = 0 0

P C W r i t e P C S o u r c e = 0 0

I n s t r u c t i o n f e t c h I n s t r u c t i o n d e c o d e / R e g i s t e r f e t c h

( O p =

' J M P

' )

0 1

S t a r t

M e m o r y r e f e r e n c e F S M ( F i g u r e 5 . 3 8 ) R - t y p e F S M

( F i g u r e 5 . 3 9 ) B r a n c h F S M ( F i g u r e 5 . 4 0 )

J u m p F S M ( F i g u r e 5 . 4 1 )

Fig. 5.37: Steps 1 and 2: Instruction Fetch and Decode Instructions

21

Finite State Machine Control of Multicycle Datapath (3)

Fig. 5.38: Finite State Machine for Memory Reference Instructions M e m W r i t e

I o r D = 1 M e m R e a d I o r D = 1

A L U S r c A = 1 A L U S r c B = 1 0 A L U O p = 0 0

R e g W r i t e M e m t o R e g = 1

R e g D s t = 0

M e m o r y a d d r e s s c o m p u t a t i o n ( O p = ' L W ' ) o r ( O p = ' S W ' )

M e m o r y a c c e s s

W r i t e - b a c k s t e p

( O p =

' L W '

)

4

2

5 3

F r o m s t a t e 1

T o s t a t e 0 ( F i g u r e 5 . 3 2 )

M e m o r y a c c e s s

22

Finite State Machine Control of Multicycle Datapath (4)

Fig. 5.39: Finite State Machines for R-type Instructions

A L U S r c A = 1 A L U S r c B = 0 0 A L U O p = 1 0

R e g D s t = 1 R e g W r i t e

M e m t o R e g = 0

E x e c u t i o n

R - t y p e c o m p l e t i o n

6

7

( O p = R - t y p e ) F r o m s t a t e 1

T o s t a t e 0 ( F i g u r e 5 . 3 2 )

23

Finite State Machine Control of Multicycle Datapath (5)

Fig. 5.40: Finite State Machine for Branch Instruction

B r a n c h c o m p l e t i o n 8

( O p = ' B E Q ' ) F r o m s t a t e 1

T o s t a t e 0 ( F i g u r e 5 . 3 2 )

A L U S r c A = 1 A L U S r c B = 0 0 A L U O p = 0 1 P C W r i t e C o n d

P C S o u r c e = 0 1

J u m p c o m p l e t i o n 9

( O p = ' J ' ) F r o m s t a t e 1

T o s t a t e 0 ( F i g u r e 5 . 3 2 )

P C W r i t e P C S o u r c e = 1 0

Fig. 5.41: Finite State Machine for Jump Instruction

24

P C W r i t e P C S o u r c e = 1 0

A L U S r c A = 1 A L U S r c B = 0 0 A L U O p = 0 1 P C W r i t e C o n d

P C S o u r c e = 0 1

A L U S r c A = 1 A L U S r c B = 0 0

A L U O p = 1 0

R e g D s t = 1 R e g W r i t e

M e m t o R e g = 0 M e m W r i t e I o r D = 1

M e m R e a d I o r D = 1

A L U S r c A = 1 A L U S r c B = 1 0 A L U O p = 0 0

R e g D s t = 0 R e g W r i t e

M e m t o R e g = 1

A L U S r c A = 0 A L U S r c B = 1 1 A L U O p = 0 0

M e m R e a d A L U S r c A = 0

I o r D = 0 I R W r i t e

A L U S r c B = 0 1 A L U O p = 0 0

P C W r i t e P C S o u r c e = 0 0

I n s t r u c t i o n f e t c h I n s t r u c t i o n d e c o d e / r e g i s t e r f e t c h

J u m p c o m p l e t i o n

B r a n c h c o m p l e t i o n E x e c u t i o n

M e m o r y a d d r e s s c o m p u t a t i o n

M e m o r y a c c e s s

M e m o r y a c c e s s R - t y p e c o m p l e t i o n

W r i t e - b a c k s t e p

( O p =

' J ' )

( O p =

' L W

' )

4

0 1

9 8 6 2

7 5 3

S t a r t

25

Multicycle Implementation: Control Units added

S h i f t l e f t 2

P C M u x 0

1 R e g i s t e r s

W r i t e r e g i s t e r W r i t e d a t a

R e a d d a t a 1 R e a d

d a t a 2

R e a d r e g i s t e r 1 R e a d r e g i s t e r 2

I n s t r u c t i o n [ 1 5 – 1 1 ]

M u x 0

1 M u x 0

1

4 I n s t r u c t i o n

[ 1 5 – 0 ]

S i g n e x t e n d

3 2 1 6

I n s t r u c t i o n [ 2 5 – 2 1 ]

I n s t r u c t i o n [ 2 0 – 1 6 ]

I n s t r u c t i o n [ 1 5 – 0 ]

I n s t r u c t i o n r e g i s t e r

A L U c o n t r o l

A L U r e s u l t A L U Z e r o

M e m o r y d a t a

r e g i s t e r

A

B

I o r D M e m R e a d M e m W r i t e

M e m t o R e g

P C W r i t e C o n d P C W r i t e

I R W r i t e

A L U O p A L U S r c B A L U S r c A

R e g D s t

P C S o u r c e

R e g W r i t e C o n t r o l O u t p u t s

O p [ 5 – 0 ]

I n s t r u c t i o n [ 3 1 - 2 6 ]

I n s t r u c t i o n [ 5 – 0 ]

M u x

0

2 J u m p a d d r e s s [ 3 1 - 0 ] I n s t r u c t i o n [ 2 5 – 0 ] 2 6 2 8

S h i f t l e f t 2 P C [ 3 1 - 2 8 ]

1

1 M u x

0

3 2

M u x 0

1 A L U O u t

M e m o r y M e m D a t a

W r i t e d a t a

A d d r e s s

26

Finite State Machine Control of Multicycle Datapath (5)

P C W r i t e P C W r i t e C o n d I o r D

M e m t o R e g P C S o u r c e A L U O p A L U S r c B A L U S r c A R e g W r i t e R e g D s t

N S 3 N S 2 N S 1 N S 0

O p 5

O p 4

O p 3

O p 2

O p 1

O p 0

S 3

S 2

S 1

S 0

S t a t e r e g i s t e r

I R W r i t e

M e m R e a d M e m W r i t e

I n s t r u c t i o n r e g i s t e r o p c o d e f i e l d

O u t p u t s

C o n t r o l l o g i c

I n p u t s