Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: +...

50
Carnegie Mellon 1 Design of Digital Circuits 2014 Srdjan Capkun Frank K. Gürkaynak Adapted from Digital Design and Computer Architecture, David Money Harris & Sarah L. Harris ©2007 Elsevier http://www.syssec.ethz.ch/education/Digitaltechnik_14 Multi-cycle MIPS Processor

Transcript of Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: +...

Page 1: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

1

Design of Digital Circuits 2014Srdjan CapkunFrank K. Gürkaynak

Adapted from Digital Design and Computer Architecture, David Money Harris & Sarah L. Harris ©2007 Elsevier

http://www.syssec.ethz.ch/education/Digitaltechnik_14

Multi-cycle MIPS Processor

Page 2: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

2

What Will We Learn?

What are the problems of Single-cycle Processor

Multi-cycle Architecture for the MIPS

Determine the performance of Multi-cycle Processor

Page 3: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

3

Single-cycle MIPS Processor

Single-cycle microarchitecture:+ simple

- cycle time limited by longest instruction (lw)

- two adders/ALUs and two memories

Page 4: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

4

Single-cycle Performance

SignImm

CLK

A RD

Instruction

Memory

+

4

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1

A RD

Data

Memory

WD

WE0

1

PC0

1

PC' Instr25:21

20:16

15:0

5:0

SrcB

20:16

15:11

<<2

+

ALUResult ReadData

WriteData

SrcA

PCPlus4

PCBranch

WriteReg4:0

Result

31:26

RegDst

Branch

MemWrite

MemtoReg

ALUSrc

RegWrite

Op

Funct

Control

Unit

Zero

PCSrc

CLK

ALUControl2:0

ALU

1

0100

1

0

1

0 0

Page 5: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

5

Delay of Individual Components (Example)

Element Parameter Delay (ps)

Register clock-to-Q tpcq_PC 30

Register setup tsetup 20

Multiplexer tmux 25

ALU tALU 200

Memory read tmem 250

Register file read tRFread 150

Register file setup tRFsetup 20

Page 6: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

6

Processor Performance

How fast is my program? Every program consists of a series of instructions

Each instruction needs to be executed.

So how fast are my instructions ? Instructions are realized on the hardware

They can take one or more clock cycles to complete

Cycles per Instruction = CPI

How much time is one clock cycle? The critical path determines how much time one cycle requires =

clock period.

1/clock period = clock frequency = how many cycles can be done each second.

Page 7: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

7

Processor Performance

Now as a general formula Our program consists of executing N instructions.

Our processor needs CPI cycles for each instruction.

The maximum clock speed of the processor is f,and the clock period is therefore T=1/f

Our program will execute in

N x CPI x (1/f) = N x CPI x T seconds

Page 8: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

8

How can I Make the Program Run Faster?

N x CPI x (1/f)

Reduce the number of instructions Make instructions that ‘do’ more (CISC)

Use better compilers

Use less cycles to perform the instruction Simpler instructions (RISC)

Use multiple units/ALUs/cores in parallel

Increase the clock frequency Find a ‘newer’ technology to manufacture

Redesign time critical components

Adopt pipelining

Page 9: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

9

Multi-cycle MIPS Processor

Single-cycle microarchitecture:+ simple

- cycle time limited by longest instruction (lw)

- two adders/ALUs and two memories

Multi-cycle microarchitecture:+ higher clock speed

+ simpler instructions run faster

+ reuse expensive hardware on multiple cycles

- sequencing overhead paid many times

Same design steps: datapath & control

Page 10: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

10

What Do We Want To Optimize

Single Cycle Architecture uses two memories

One memory stores instructions, the other data

We want to use a single memory (Smaller size)

Page 11: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

11

What Do We Want To Optimize

Single Cycle Architecture uses two memories

One memory stores instructions, the other data

We want to use a single memory (Smaller size)

Single Cycle Architecture needs three adders

ALU, PC, Branch address calculation

We want to use the ALU for all operations (smaller size)

Page 12: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

12

What Do We Want To Optimize

Single Cycle Architecture uses two memories

One memory stores instructions, the other data

We want to use a single memory (Smaller size)

Single Cycle Architecture needs three adders

ALU, PC, Branch address calculation

We want to use the ALU for all operations (smaller size)

In Single Cycle Architecture all instructions take one cycle

The most complex operation slows down everything!

Divide all instructions into multiple steps

Simpler instructions can take fewer cycles (average case may be faster)

Page 13: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

13

Consider lw instruction

For an instruction such as: lw $t0, 0x20($t1)

We need to:

Read the instruction from memory

Then read $t1 from register array

Add the immediate value (0x20) to calculate the memory address

Read the content of this address

Write to the register $t0 this content

Page 14: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

14

Multi-cycle Datapath: instruction fetch

First consider executing lw

STEP 1: Fetch instruction

b

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Register

File

PCPC' Instr

CLK

WD

WE

CLK

EN

IRWrite

read from the memory location [rs]+imm to location [rt]

op rs rt imm

6 bits 5 bits 5 bits 16 bits

I-Type

Page 15: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

15

Multi-cycle Datapath: lw register read

b

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Register

File

PCPC' Instr25:21

CLK

WD

WE

CLK CLK

A

EN

IRWrite

op rs rt imm

6 bits 5 bits 5 bits 16 bits

I-Type

Page 16: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

16

Multi-cycle Datapath: lw immediate

SignImm

b

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

PCPC' Instr25:21

15:0

CLK

WD

WE

CLK CLK

A

EN

IRWrite

op rs rt imm

6 bits 5 bits 5 bits 16 bits

I-Type

Page 17: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

17

Multi-cycle Datapath: lw address

SignImm

b

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

PCPC' Instr25:21

15:0

SrcB

ALUResult

SrcA

ALUOut

CLK

ALUControl2:0

ALU

WD

WE

CLK CLK

A CLK

EN

IRWrite

op rs rt imm

6 bits 5 bits 5 bits 16 bits

I-Type

Page 18: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

18

Multi-cycle Datapath: lw memory read

SignImm

b

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

PCPC' Instr25:21

15:0

SrcB

ALUResult

SrcA

ALUOut

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

Data

CLK

CLK

A CLK

EN

IRWriteIorD

0

1

op rs rt imm

6 bits 5 bits 5 bits 16 bits

I-Type

Page 19: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

19

Multi-cycle Datapath: lw write register

SignImm

b

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

PCPC' Instr25:21

15:0

SrcB20:16

ALUResult

SrcA

ALUOut

RegWrite

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

Data

CLK

CLK

A CLK

EN

IRWriteIorD

0

1

op rs rt imm

6 bits 5 bits 5 bits 16 bits

I-Type

Page 20: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

20

Multi-cycle Datapath: increment PC

PCWrite

SignImm

b

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1PCPC' Instr25:21

15:0

SrcB

20:16

ALUResult

SrcA

ALUOut

ALUSrcARegWrite

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

Data

CLK

CLK

A

00

01

10

11

4

CLK

ENEN

ALUSrcB1:0

IRWriteIorD

0

1

Page 21: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

21

Multi-cycle Datapath: sw

Write data in rt to memory

SignImm

b

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1PC0

1

PC' Instr25:21

20:16

15:0

SrcB20:16

ALUResult

SrcA

ALUOut

MemWrite ALUSrcARegWrite

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

Data

CLK

CLK

A

00

01

10

11

4

CLK

ENEN

ALUSrcB1:0

IRWriteIorDPCWrite

B

Page 22: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

22

Multi-cycle Datapath: R-type Instructions

Read from rs and rt

Write ALUResult to register file

Write to rd (instead of rt)

0

1

SignImm

b

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1PC0

1

PC' Instr25:21

20:16

15:0

SrcB20:16

15:11

ALUResult

SrcA

ALUOut

RegDstMemWrite MemtoReg ALUSrcARegWrite

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

Data

CLK

CLK

A

B00

01

10

11

4

CLK

ENEN

ALUSrcB1:0

IRWriteIorDPCWrite

Page 23: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

23

Multi-cycle Datapath: beq

Determine whether values in rs and rt are equal

Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC+4)

SignImm

b

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1 0

1

PC0

1

PC' Instr25:21

20:16

15:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

RegDst BranchMemWrite MemtoReg ALUSrcARegWrite

Zero

PCSrc

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

0

1Data

CLK

CLK

A

B00

01

10

11

4

CLK

ENEN

ALUSrcB1:0

IRWriteIorD PCWrite

PCEn

Page 24: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

24

Complete Multi-cycle Processor

SignImm

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1 0

1

PC0

1

PC' Instr25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

Re

gD

st

Branch

MemWrite

Mem

toR

eg

ALUSrcA

RegWriteOp

Funct

Control

Unit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

0

1Data

CLK

CLK

A

B00

01

10

11

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWrite

PCEn

Page 25: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

25

Control Unit

ALUSrcA

PCSrc

Branch

ALUSrcB1:0

Opcode5:0

Control

Unit

ALUControl2:0

Funct5:0

Main

Controller

(FSM)

ALUOp1:0

ALU

Decoder

RegWrite

PCWrite

IorD

MemWrite

IRWrite

RegDst

MemtoReg

Register

Enables

Multiplexer

Selects

Page 26: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

26

Main Controller FSM: Fetch

Reset

S0: Fetch

SignImm

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1 0

1

PC0

1

PC' Instr25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

Reg

Dst

Branch

MemWrite

Mem

toR

eg

ALUSrcA

RegWriteOp

Funct

Control

Unit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

0

1Data

CLK

CLK

A

B00

01

10

11

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWrite

PCEn

0

1 1

0

X

X

00

01

0100

1

0

Page 27: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

27

Main Controller FSM: Fetch

SignImm

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1 0

1

PC0

1

PC' Instr25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

Reg

Dst

Branch

MemWrite

Mem

toR

eg

ALUSrcA

RegWriteOp

Funct

Control

Unit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

0

1Data

CLK

CLK

A

B00

01

10

11

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWrite

PCEn

0

1 1

0

X

X

00

01

0100

1

0

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 0

IRWrite

PCWrite

Reset

S0: Fetch

Page 28: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

28

Main Controller FSM: Decode

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 0

IRWrite

PCWrite

Reset

S0: Fetch S1: Decode

SignImm

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1 0

1

PC0

1

PC' Instr25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

Reg

Dst

Branch

MemWrite

Mem

toR

eg

ALUSrcA

RegWriteOp

Funct

Control

Unit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

0

1Data

CLK

CLK

A

B00

01

10

11

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWrite

PCEn

X

0 0

0

X

X

0X

XX

XXXX

0

0

Page 29: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

29

Main Controller FSM: Address Calculation

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 0

IRWrite

PCWrite

Reset

S0: Fetch

S2: MemAdr

S1: Decode

Op = LW

or

Op = SW

SignImm

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1 0

1

PC0

1

PC' Instr25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

Reg

Dst

Branch

MemWrite

Mem

toR

eg

ALUSrcA

RegWriteOp

Funct

Control

Unit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

0

1Data

CLK

CLK

A

B00

01

10

11

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWrite

PCEn

X

0 0

0

X

X

01

10

010X

0

0

Page 30: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

30

Main Controller FSM: Address Calculation

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 0

IRWrite

PCWrite

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

Reset

S0: Fetch

S2: MemAdr

S1: Decode

Op = LW

or

Op = SW

SignImm

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1 0

1

PC0

1

PC' Instr25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

Reg

Dst

Branch

MemWrite

Mem

toR

eg

ALUSrcA

RegWriteOp

Funct

Control

Unit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

0

1Data

CLK

CLK

A

B00

01

10

11

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWrite

PCEn

X

0 0

0

X

X

01

10

010X

0

0

Page 31: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

31

Main Controller FSM: lw

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 0

IRWrite

PCWrite

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

IorD = 1

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemRead

Op = LW

or

Op = SW

Op = LW

RegDst = 0

MemtoReg = 1

RegWrite

S4: Mem

Writeback

Page 32: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

32

Main Controller FSM: sw

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 0

IRWrite

PCWrite

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

IorD = 1IorD = 1

MemWrite

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

Op = LW

or

Op = SW

Op = LW

Op = SW

RegDst = 0

MemtoReg = 1

RegWrite

S4: Mem

Writeback

Page 33: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

33

Main Controller FSM: R-Type

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 0

IRWrite

PCWrite

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

IorD = 1

RegDst = 1

MemtoReg = 0

RegWrite

IorD = 1

MemWrite

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 10

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALU

Writeback

Op = LW

or

Op = SW

Op = R-type

Op = LW

Op = SW

RegDst = 0

MemtoReg = 1

RegWrite

S4: Mem

Writeback

Page 34: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

34

Main Controller FSM: beq

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 0

IRWrite

PCWrite

ALUSrcA = 0

ALUSrcB = 11

ALUOp = 00

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

IorD = 1

RegDst = 1

MemtoReg = 0

RegWrite

IorD = 1

MemWrite

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 10

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 01

PCSrc = 1

Branch

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALU

Writeback

S8: Branch

Op = LW

or

Op = SW

Op = R-type

Op = BEQ

Op = LW

Op = SW

RegDst = 0

MemtoReg = 1

RegWrite

S4: Mem

Writeback

Page 35: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

35

Complete Multi-cycle Controller FSM

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 0

IRWrite

PCWrite

ALUSrcA = 0

ALUSrcB = 11

ALUOp = 00

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

IorD = 1

RegDst = 1

MemtoReg = 0

RegWrite

IorD = 1

MemWrite

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 10

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 01

PCSrc = 1

Branch

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALU

Writeback

S8: Branch

Op = LW

or

Op = SW

Op = R-type

Op = BEQ

Op = LW

Op = SW

RegDst = 0

MemtoReg = 1

RegWrite

S4: Mem

Writeback

Page 36: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

36

Main Controller FSM: addi

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 0

IRWrite

PCWrite

ALUSrcA = 0

ALUSrcB = 11

ALUOp = 00

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

IorD = 1

RegDst = 1

MemtoReg = 0

RegWrite

IorD = 1

MemWrite

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 10

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 01

PCSrc = 1

Branch

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALU

Writeback

S8: Branch

Op = LW

or

Op = SW

Op = R-type

Op = BEQ

Op = LW

Op = SW

RegDst = 0

MemtoReg = 1

RegWrite

S4: Mem

Writeback

Op = ADDI

S9: ADDI

Execute

S10: ADDI

Writeback

Page 37: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

37

Main Controller FSM: addi

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 0

IRWrite

PCWrite

ALUSrcA = 0

ALUSrcB = 11

ALUOp = 00

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

IorD = 1

RegDst = 1

MemtoReg = 0

RegWrite

IorD = 1

MemWrite

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 10

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 01

PCSrc = 1

Branch

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALU

Writeback

S8: Branch

Op = LW

or

Op = SW

Op = R-type

Op = BEQ

Op = LW

Op = SW

RegDst = 0

MemtoReg = 1

RegWrite

S4: Mem

Writeback

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

RegDst = 0

MemtoReg = 0

RegWrite

Op = ADDI

S9: ADDI

Execute

S10: ADDI

Writeback

Page 38: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

38

Extended Functionality: j

SignImm

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1PC0

1

PC' Instr25:21

20:16

15:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

RegDst BranchMemWrite MemtoReg ALUSrcARegWrite

Zero

PCSrc1:0

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

0

1Data

CLK

CLK

A

B00

01

10

11

4

CLK

ENEN

ALUSrcB1:0

IRWriteIorD PCWrite

PCEn

00

01

10

<<2

25:0 (jump)

31:28

27:0

PCJump

Page 39: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

39

Control FSM: j

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 00

IRWrite

PCWrite

ALUSrcA = 0

ALUSrcB = 11

ALUOp = 00

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

IorD = 1

RegDst = 1

MemtoReg = 0

RegWrite

IorD = 1

MemWrite

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 10

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 01

PCSrc = 01

Branch

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALU

Writeback

S8: Branch

Op = LW

or

Op = SW

Op = R-type

Op = BEQ

Op = LW

Op = SW

RegDst = 0

MemtoReg = 1

RegWrite

S4: Mem

Writeback

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

RegDst = 0

MemtoReg = 0

RegWrite

Op = ADDI

S9: ADDI

Execute

S10: ADDI

Writeback

Op = J

S11: Jump

Page 40: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

40

Control FSM: j

IorD = 0

AluSrcA = 0

ALUSrcB = 01

ALUOp = 00

PCSrc = 00

IRWrite

PCWrite

ALUSrcA = 0

ALUSrcB = 11

ALUOp = 00

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

IorD = 1

RegDst = 1

MemtoReg = 0

RegWrite

IorD = 1

MemWrite

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 10

ALUSrcA = 1

ALUSrcB = 00

ALUOp = 01

PCSrc = 01

Branch

Reset

S0: Fetch

S2: MemAdr

S1: Decode

S3: MemReadS5: MemWrite

S6: Execute

S7: ALU

Writeback

S8: Branch

Op = LW

or

Op = SW

Op = R-type

Op = BEQ

Op = LW

Op = SW

RegDst = 0

MemtoReg = 1

RegWrite

S4: Mem

Writeback

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

RegDst = 0

MemtoReg = 0

RegWrite

Op = ADDI

S9: ADDI

Execute

S10: ADDI

Writeback

PCSrc = 10

PCWrite

Op = J

S11: Jump

Page 41: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

41

Multi-cycle Performance

Instructions take different number of cycles:

3 cycles: beq, j

4 cycles: R-Type, sw, addi

5 cycles: lw

CPI is weighted average, i.e. SPECINT2000 benchmark:

25% loads

10% stores

11% branches

2% jumps

52% R-type

Average CPI = (0.11 + 0.02) 3 +(0.52 + 0.10) 4 +(0.25) 5 = 4.12

Page 42: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

42

Multi-cycle Performance

Multi-cycle critical path:

Tc =

SignImm

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1 0

1

PC0

1

PC' Instr25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

Re

gD

st

Branch

MemWrite

Mem

toR

eg

ALUSrcA

RegWriteOp

Funct

Control

Unit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

0

1Data

CLK

CLK

A

B00

01

10

11

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWrite

PCEn

Page 43: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

43

Multi-cycle Performance

Multicycle critical path:

Tc = tpcq + tmux + max(tALU + tmux, tmem) + tsetup

SignImm

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1 0

1

PC0

1

PC' Instr25:21

20:16

15:0

5:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

31:26

Re

gD

st

Branch

MemWrite

Mem

toR

eg

ALUSrcA

RegWriteOp

Funct

Control

Unit

Zero

PCSrc

CLK

CLK

ALUControl2:0

ALU

WD

WE

CLK

Adr

0

1Data

CLK

CLK

A

B00

01

10

11

4

CLK

ENEN

ALUSrcB1:0IRWrite

IorD

PCWrite

PCEn

Page 44: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

44

Multicycle Performance Example

Element Parameter Delay (ps)

Register clock-to-Q tpcq_PC 30

Register setup tsetup 20

Multiplexer tmux 25

ALU tALU 200

Memory read tmem 250

Register file read tRFread 150

Register file setup tRFsetup 20

Tc =

Page 45: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

45

Multicycle Performance Example

Element Parameter Delay (ps)

Register clock-to-Q tpcq_PC 30

Register setup tsetup 20

Multiplexer tmux 25

ALU tALU 200

Memory read tmem 250

Register file read tRFread 150

Register file setup tRFsetup 20

Tc = tpcq_PC + tmux + max(tALU + tmux, tmem) + tsetup

= [30 + 25 + 250 + 20] ps

= 325 ps

Page 46: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

46

Multi-cycle Performance Example

For a program with 100 billion instructions executing on a multi-cycle MIPS processor

CPI = 4.12

Tc = 325 ps

Execution Time = (# instructions) × CPI × Tc

= (100 × 109)(4.12)(325 × 10-12)= 133.9 seconds

This is slower than the single-cycle processor (92.5 seconds). Why?

Page 47: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

47

Multi-cycle Performance Example

For a program with 100 billion instructions executing on a multi-cycle MIPS processor

CPI = 4.12

Tc = 325 ps

Execution Time = (# instructions) × CPI × Tc

= (100 × 109)(4.12)(325 × 10-12)= 133.9 seconds

This is slower than the single-cycle processor (92.5 seconds). Why?

Not all steps the same length

Sequencing overhead for each step (tpcq + tsetup= 50 ps)

Page 48: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

48

Review: Single-Cycle MIPS Processor

SignImm

CLK

A RD

Instruction

Memory

+

4

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1

A RD

Data

Memory

WD

WE0

1

PC0

1PC' Instr

25:21

20:16

15:0

5:0

SrcB

20:16

15:11

<<2

+

ALUResult ReadData

WriteData

SrcA

PCPlus4

PCBranch

WriteReg4:0

Result

31:26

RegDst

Branch

MemWrite

MemtoReg

ALUSrc

RegWrite

Op

Funct

Control

Unit

Zero

PCSrc

CLK

ALUControl2:0

ALU

0

1

25:0 <<2

27:0 31:28

PCJump

Jump

Page 49: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

49

Review: Multicycle MIPS Processor

ImmExt

CLK

ARD

Instr / Data

Memory

A1

A3

WD3

RD2

RD1WE3

A2

CLK

Sign Extend

Register

File

0

1

0

1PC0

1

PC' Instr25:21

20:16

15:0

SrcB20:16

15:11

<<2

ALUResult

SrcA

ALUOut

Zero

CLK

ALU

WD

WE

CLK

Adr

0

1Data

CLK

CLK

A

B00

01

10

11

4

CLK

ENEN

00

01

10

<<2

25:0 (Addr)

31:28

27:0

PCJump

5:0

31:26

Branch

MemWrite

ALUSrcA

RegWriteOp

Funct

Control

Unit

PCSrc

CLK

ALUControl2:0

ALUSrcB1:0IRWrite

IorD

PCWrite

PCEn

Re

gD

st

Mem

toR

eg

Page 50: Multi-cycle MIPS Processor - ETH Z · Multi-cycle MIPS Processor Single-cycle microarchitecture: + simple-cycle time limited by longest instruction (lw)-two adders/ALUs and two memories

Carnegie Mellon

50

What Have We Learned?

A more ‘realistic’ architecture

Shared data and program memory

A single ALU for all operations

Multi-cycle: Operations take different number of cycles

Simpler operations take less steps

More complex operations take more steps

Average CPI

Bottom line

Smaller

More complex control

Not necessarily faster (overhead)