Chapter 4 The Processor. Chapter 4 — The Processor — 2 Introduction We will examine two MIPS...


Chapter 4

The Processor


Chapter 4 — The Processor — 2

§4.1 Introduction

We will examine two MIPS implementations:
- A simplified version
- A more realistic pipelined version

Simple subset, shows most aspects:
- Memory reference: lw, sw
- Arithmetic/logical: add, sub, and, or, slt
- Control transfer: beq, j


Log in to Uoh.blackboard.com using:
- Username: your username
- Password: your email password


Go to “Courses” menu


Select “201401_COE308_001_3646: Computer Architecture”


Select “Content”


Slides


First Task


First Task


§4.5 An Overview of Pipelining

Pipelining Analogy
Pipelined laundry: overlapping execution
- Parallelism improves performance
- Four loads: Speedup = 8/3.5 = 2.3


MIPS Pipeline
Five stages, one step per stage:

1. IF: Instruction fetch from memory

2. ID: Instruction decode & register read

3. EX: Execute operation or calculate address

4. MEM: Access memory operand

5. WB: Write result back to register


Pipeline Performance
Assume time for stages is:
- 100ps for register read or write
- 200ps for other stages

Compare pipelined datapath with single-cycle datapath:

Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time
lw       | 200ps       | 100ps         | 200ps  | 200ps         | 100ps          | 800ps
sw       | 200ps       | 100ps         | 200ps  | 200ps         | -              | 700ps
R-format | 200ps       | 100ps         | 200ps  | -             | 100ps          | 600ps
beq      | 200ps       | 100ps         | 200ps  | -             | -              | 500ps


Pipeline Performance
Single-cycle (Tc = 800ps)

Pipelined (Tc = 200ps)


Basic Idea

Assembly line:
- Divide the execution of a task among a number of stages
- A task is divided into subtasks to be executed in sequence
- Performance improvement compared to sequential execution


Pipeline

[Figure: a task divided into sub-tasks 1, 2, ..., n, executed by an n-stage pipeline fed by a stream of tasks]


5 Tasks on a 4-Stage Pipeline

[Figure: Gantt chart of Tasks 1 to 5 passing through the 4 stages over time units 1 to 8]


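Speedup

[Figure: a stream of m tasks entering an n-stage pipeline; each stage takes time t]

$T_{seq} = n \cdot m \cdot t$

$T_{pipe} = n \cdot t + (m - 1) \cdot t$

$\text{Speedup} = \dfrac{T_{seq}}{T_{pipe}} = \dfrac{n \cdot m}{n + m - 1}$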


Efficiency

$T_{seq} = n \cdot m \cdot t$

$T_{pipe} = n \cdot t + (m - 1) \cdot t$

$\text{Efficiency} = \dfrac{\text{Speedup}}{n} = \dfrac{m}{n + m - 1}$
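As a worked example of the efficiency formula, take the 5 tasks on a 4-stage pipeline (n = 4, m = 5), and also look at what happens for a long stream of tasks:

$$
E = \frac{m}{n + m - 1}, \qquad
E\big|_{n=4,\,m=5} = \frac{5}{4 + 5 - 1} = \frac{5}{8} = 0.625, \qquad
\lim_{m \to \infty} \frac{m}{n + m - 1} = 1
$$

So the pipeline is only 62.5% busy for 5 tasks, because of the n - 1 time units spent filling it, and approaches full utilization as the stream of tasks grows.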


Throughput

$T_{seq} = n \cdot m \cdot t$

$T_{pipe} = n \cdot t + (m - 1) \cdot t$

$\text{Throughput} = \text{no. of tasks executed per unit of time} = \dfrac{m}{(n + m - 1) \cdot t}$


Instruction Pipeline

Pipeline stall: some of the stages might need more time to perform their function. E.g., I2 needs 3 time units to perform its function.

This is called a “bubble” or “pipeline hazard”.


Pipeline and Instruction Dependency

Instruction dependency: the operation performed by a stage depends on the operation(s) performed by other stage(s).

E.g., with a conditional branch, instruction I4 cannot be executed until the branch condition in I3 has been evaluated and stored. The branch takes 3 units of time.


Group Activity

Show a Gantt chart for 10 instructions that enter a four-stage pipeline (IF, ID, IE, and IS).

Assume that fetching I5 depends on the result of evaluating I4.


Answer


Pipeline and Data Dependency

Data dependency: a source operand of instruction Ii depends on the result of executing a preceding instruction Ij, where i > j.

E.g., the operands of Ii cannot be fetched until the results of Ij are saved.


Group Activity

Ii:    ADD R1, R2, R3    ; R3 ← R1 + R2
Ii+1:  SL R3             ; R3 ← SL(R3)
Ii+2:  SUB R5, R6, R4    ; R4 ← R5 - R6

Assume that we have five stages in the pipeline: IF (Instruction Fetch), ID (Instruction Decode), OF (Operand Fetch), IE (Instruction Execute), and IS (Instruction Store).

Show a Gantt chart for this code.


Answer

R3 is written by both Ii and Ii+1; therefore, the problem is a write-after-write (WAW) data dependency. (Ii+1 also reads the R3 produced by Ii, which is a read-after-write dependency as well.)


When do stalls occur in the pipeline?
- Write after write (WAW)
- Read after write (RAW)
- Write after read (WAR)

Read after read (RAR) does not cause a stall.


Read after write


Group Activity

Consider the execution of the following sequence of instructions on a five-stage pipeline consisting of IF, ID, OF, IE, and IS. Show the succession of these instructions in the pipeline, identify all types of data dependency, and compute the speedup and efficiency.


Answer


No Operation (NOP) Method
- Prevents fetching the wrong instruction / operand
- Equivalent to doing nothing


Group Activity

Consider the execution of ten instructions I1–I10 on a pipeline consisting of four pipeline stages: IF, ID, IE, and IS. Assume that instruction I4 is a conditional branch instruction and that, when it is executed, the branch is not taken; that is, the branch condition is not satisfied. Draw a Gantt chart showing the NOPs.


Answer
Prevents fetching the wrong instruction


Group Activity

Consider the execution of the following piece of code on a five-stage pipeline (IF, ID, OF, IE, IS). Draw a Gantt chart with NOPs.


Answer
Prevents fetching the wrong operands


Reducing the Stalls Due to Instruction Dependency


Unconditional Branch Instructions
- Reordering of instructions
- Use of dedicated hardware in the fetch unit: speeds up instruction fetching
- Precomputing the branch and reordering the instructions
- Instruction prefetch: instructions can be fetched and stored in the instruction queue


Conditional Branching Instructions

The target address of a conditional branch will not be known until the execution of the conditional branch has been completed.

- Delayed branch: fill the pipeline with some instructions until the branch instruction is executed.
- Prediction of the next instruction to be fetched: this assumes that the branch outcome is random. Assume that the branch is not taken. If the prediction is correct, the time is saved; otherwise, the wrongly fetched instructions are discarded and fetching restarts from the branch target.


Example
Before delaying

After delaying


Reducing Pipeline Stalls due to Data Dependency


Hardware Operand Forwarding

Allows the result of an ALU operation to be available to another ALU operation.

Without forwarding, SUB cannot start until R3 is stored. If R3 is forwarded to SUB at the same time as the store operation, stall time is saved.


Group Activity


Group Activity


Group Activity

    int i, X = 3;
    for (i = 0; i < 10; i++) {
        X = X + 5;
    }

Assume that we have five stages in the pipeline: IF (Instruction Fetch), ID (Instruction Decode), OF (Operand Fetch), IE (Instruction Execute), and IS (Instruction Store).

Show a Gantt chart for this code.


Group Activity

    int i, X = 3;
    for (i = 0; i < 10; i++) {
        X = X + 5;
    }

MIPS Code

            li   $t0, 10          # t0 is the constant 10
            li   $t1, 0           # t1 is our counter (i)
            li   $t2, 3           # t2 is our X
    loop:
            beq  $t1, $t0, end    # if t1 == 10, we are done
            addi $t2, $t2, 5      # add 5 to X
            addi $t1, $t1, 1      # add 1 to t1
            j    loop             # jump back to the top
    end: