Post on 04-Jan-2016
Csci 136 Computer Architecture II – Branch Hazards, Exceptions
Xiuzhen Cheng, cheng@gwu.edu
Announcement
Homework assignment #10, Due time – Before class, April 12
Readings: Sections 6.4 – 6.5
Problems: 6.17-6.19, 6.21-6.22, 6.33-6.36, 6.39-6.40 (six of them will be graded. Your TA will give hints in the lab sections.)
Project #3 is due on April 10, 2005
Quiz #4: April 12, 2005
Final: Thursday, May 12, 12:40PM-2:40PM
Note: you must pass the final to pass this course!
Review on Data Hazards, Forwarding, Stall
When does a data hazard happen?
Data dependencies
Using forwarding to overcome data hazards
Data is available after ALU stage
Forwarding conditions
Stall the pipeline for load-use instructions
Data is available after MEM stage (lw instruction)
Hazard detection conditions
Why in ID stage?
LW and SW
lw $5, 0($15)
sw $5, 100($15)

lw $5, 0($15)
beq $5, $0, Exit
sw $5, 100($15)

lw $5, 0($15)
add $8, $8, $8
sw $5, 100($15)
SW is in MEM Stage
MEM/WB.RegWrite and EX/MEM.MemWrite and
MEM/WB.RegisterRd = EX/MEM.RegisterRd and
MEM/WB.RegisterRd != 0
[Figure: datapath for lw followed by sw; labeled units: Sign-Ext, EX/MEM, Data memory]
lw $5, 0($15)
sw $5, 100($15)
SW is in EX Stage
ID/EX.MemWrite and MEM/WB.RegWrite and
MEM/WB.RegisterRd = ID/EX.RegisterRt and
MEM/WB.RegisterRd != 0
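The two forwarding conditions above can be written as predicates and checked on small cases. This is a minimal sketch, not the course's reference solution: the PipelineRegs class and its field names are hypothetical and only mirror the signal names on the slides, and it assumes the register number carried in EX/MEM for a sw is the register whose value is being stored.

```python
from dataclasses import dataclass


@dataclass
class PipelineRegs:
    # Hypothetical container; each field mirrors one signal named on the slides.
    mem_wb_reg_write: bool   # MEM/WB.RegWrite
    mem_wb_rd: int           # MEM/WB.RegisterRd
    ex_mem_mem_write: bool   # EX/MEM.MemWrite
    ex_mem_rd: int           # EX/MEM.RegisterRd
    id_ex_mem_write: bool    # ID/EX.MemWrite
    id_ex_rt: int            # ID/EX.RegisterRt


def forward_to_sw_in_mem(r: PipelineRegs) -> bool:
    """Condition for the case 'SW is in MEM stage'."""
    return (r.mem_wb_reg_write and r.ex_mem_mem_write
            and r.mem_wb_rd == r.ex_mem_rd and r.mem_wb_rd != 0)


def forward_to_sw_in_ex(r: PipelineRegs) -> bool:
    """Condition for the case 'SW is in EX stage'."""
    return (r.id_ex_mem_write and r.mem_wb_reg_write
            and r.mem_wb_rd == r.id_ex_rt and r.mem_wb_rd != 0)


# lw $5, 0($15) is in WB while sw $5, 100($15) is in MEM.
back_to_back = PipelineRegs(True, 5, True, 5, False, 0)
# lw $5, 0($15) is in WB while sw $5, 100($15) is in EX (one instruction in between).
one_apart = PipelineRegs(True, 5, False, 0, True, 5)
```

With these inputs, forward_to_sw_in_mem(back_to_back) and forward_to_sw_in_ex(one_apart) both evaluate to True, while a write to register $0 never triggers forwarding.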
More Cases
lw $15, 0($8) # load-use,
sw $5, 100($15) # stall pipeline
R-Type followed by sw?
The result from R-Type will be saved into memory
R-Type will overwrite base register for sw
An Example
40: lw $2, 20($1)
44: and $4, $2, $5
48: or $8, $2, $4
[Figures: pipelined datapath at clock cycles 1 to 5; instruction in each stage]
Clock 1: IF: lw $2, 20($1)
Clock 2: IF: and $4, $2, $5; ID: lw $2, 20($1)
Clock 3: IF: or $8, $2, $4; ID: and $4, $2, $5; EX: lw $2, 20($1)
Clock 4: IF: or $8, $2, $4; ID: and $4, $2, $5; EX: bubble; MEM: lw $2, 20($1)
Clock 5: ID: or $8, $2, $4; EX: and $4, $2, $5; MEM: bubble; WB: lw $2, 20($1)
Branch Hazards
Control hazard: attempt to make a decision before condition is evaluated
[Figure: pipeline diagram; the point where the branch decision is made is marked, and the three instructions fetched after the branch are flushed]
Observations
Branch decision does not occur until MEM stage; 3 CCs are wasted. – Current design, non-optimized
Is it possible to reduce branch delay? YES
In EX stage? Two CCs branch delay
In ID stage? One CC branch delay
How? – for beq $x, $y, label: compute $x xor $y, then or all bits; this is much faster than an ALU operation. Also we have a separate ALU to compute the branch address.
3 strategies: Delayed branch; Static branch prediction; Dynamic branch prediction
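The ID-stage comparison described above (xor the two registers, then or all the bits) can be sketched in a few lines. This is an illustrative model only: the function names are hypothetical, and the branch target rule (PC+4 plus the sign-extended 16-bit offset shifted left by 2) is the standard MIPS rule rather than something spelled out on these slides.

```python
def registers_equal(x: int, y: int, width: int = 32) -> bool:
    """Model of the fast equality test: XOR the operands, then OR-reduce the result."""
    diff = (x ^ y) & ((1 << width) - 1)
    any_bit_set = False
    for i in range(width):
        # OR together every bit of the XOR result.
        any_bit_set = any_bit_set or bool((diff >> i) & 1)
    # The registers are equal exactly when no bit of the XOR is set.
    return not any_bit_set


def branch_target(pc_plus_4: int, offset16: int) -> int:
    """Model of the separate address adder: PC+4 + (sign-extended offset << 2)."""
    offset16 &= 0xFFFF
    if offset16 & 0x8000:          # sign-extend the 16-bit immediate
        offset16 -= 0x10000
    return (pc_plus_4 + (offset16 << 2)) & 0xFFFFFFFF


# A branch at address 40 with a word offset of 7 targets 44 + 28 = 72.
print(registers_equal(7, 7), branch_target(44, 7))
```

The point of the sketch is that equality needs no carry chain: every bit is compared independently, which is why it fits in the ID stage.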
Delayed Branch
Will always execute the instruction following the branch.
Only one instruction (the one in the branch delay slot) will be executed
Done by compiler or assembler
50% success rate
Losing popularity. Why?
More pipeline stages
Superscalar
Scheduling the Branch Delay Slot
Independent instruction: best choice
B is good when the probability of taking the branch is high
It must be OK to execute the sub instruction when the branch goes in the unexpected direction
Static Branch Prediction
Assume the branch will not be taken; if the prediction is wrong, clear the effect of sequential instruction execution.
How to discard instructions in the pipeline?
Branch decision is made at MEM stage: instructions in IF, ID, EX stages need to be discarded.
Branch decision is made at ID stage: only flush IF/ID pipeline register!
[Figures: pipeline diagram with three flushed instructions after the branch decision point; datapath with the IF.Flush signal]
Pipelined Branch – An Example
[Figures: two datapath snapshots of the branch example; surviving labels: instruction addresses 36, 40, 44, 72; values 10, $4, $8, 28; IF.Flush]
Dynamic Branch Prediction
Static branch prediction is crude!
Take history into consideration
If a branch was taken last time, then fetch the new instruction from the same place as last time
Branch prediction buffer – indexed by the lower bits of the branch instruction address
This memory contains a bit (or bits) that tells whether the branch was recently taken or not
Is the prediction correct? Any bad effect?
1-bit prediction scheme
2-bit prediction scheme
[Figure: state diagram of the 2-bit prediction scheme: two "Prediction taken" states and two "Prediction not taken" states, connected by "taken" and "not taken" transitions]
Observation
Since we move the branch decision to the ID stage, we need to copy forwarding control related hardware to the ID stage too!
Beq following lw: hazard detection unit should work.
In-Class Exercise
Consider a loop branch that branches nine times in a row, then is not taken once. What is the prediction accuracy for this branch, assuming the prediction bit for this branch remains in the prediction buffer?
With 1-bit prediction?
With 2-bit prediction?
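One way to check an answer to the exercise is to simulate both predictors on the loop pattern. This is a sketch under stated assumptions: the 2-bit scheme is modeled as a saturating counter (the exact transitions of the slide's state diagram may differ), the predictor starts in the strongest "taken" state, and accuracy is measured in steady state after one warm-up pass through the loop. The function name and parameters are hypothetical.

```python
def steady_state_accuracy(pattern, bits, warmup_passes=1, measured_passes=10):
    """Fraction of correct predictions for a repeating branch outcome pattern.

    bits=1 remembers only the last outcome; bits=2 is a saturating counter
    (an assumption) that predicts taken when the counter is in its upper half.
    """
    max_state = (1 << bits) - 1
    threshold = (max_state + 1) // 2
    state = max_state              # assumption: start in the strongest "taken" state
    correct = total = 0
    for p in range(warmup_passes + measured_passes):
        for taken in pattern:
            predicted_taken = state >= threshold
            if p >= warmup_passes:  # only count predictions after warm-up
                total += 1
                correct += (predicted_taken == taken)
            # Move toward "taken" or "not taken", saturating at the ends.
            state = min(max_state, state + 1) if taken else max(0, state - 1)
    return correct / total


loop_branch = [True] * 9 + [False]   # taken nine times, then not taken once
print(steady_state_accuracy(loop_branch, bits=1))
print(steady_state_accuracy(loop_branch, bits=2))
```

Under these assumptions the 1-bit predictor is right 8 times out of 10 (it mispredicts the loop exit and then the first iteration of the next run), while the 2-bit predictor is right 9 times out of 10 (only the loop exit is mispredicted).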
Performance Comparison
Compare the performance of single-cycle, multi-cycle and pipelined datapath
200ps for memory access, 100ps for ALU operation, 50ps for register file access
25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU ops
For the pipelined datapath, 50% of loads are immediately followed by an instruction that uses the result
Branch delay on misprediction is 1 clock cycle and 25% of branches are mispredicted
Jump delay is 1 clock cycle
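The comparison can be worked out numerically from the figures above. This is a hedged sketch, not the official solution: the per-class cycle counts for the multi-cycle datapath (5 for loads, 4 for stores and ALU ops, 3 for branches and jumps) are the usual textbook values and are an assumption here, as is taking the 200ps memory access as the clock period for the multi-cycle and pipelined designs.

```python
# Instruction mix and unit delays from the slide.
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}
MEM_PS, ALU_PS, REG_PS = 200, 100, 50

# Assumption: usual textbook cycle counts for the multi-cycle datapath.
MULTI_CYCLES = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}


def single_cycle_time():
    """Clock must fit the slowest instruction (lw): IF + reg read + ALU + MEM + reg write."""
    return MEM_PS + REG_PS + ALU_PS + MEM_PS + REG_PS


def multi_cycle_time():
    """Average time per instruction: clock period times the mix-weighted CPI."""
    clock = max(MEM_PS, ALU_PS, REG_PS)
    cpi = sum(MIX[k] * MULTI_CYCLES[k] for k in MIX)
    return clock * cpi


def pipelined_time():
    """Average time per instruction with load-use stalls, branch and jump delays."""
    clock = max(MEM_PS, ALU_PS, REG_PS)
    cpi = (MIX["load"] * (1 + 0.50 * 1)      # half of the loads stall one cycle
           + MIX["store"] * 1
           + MIX["alu"] * 1
           + MIX["branch"] * (1 + 0.25 * 1)  # a quarter of the branches lose one cycle
           + MIX["jump"] * (1 + 1))          # every jump loses one cycle
    return clock * cpi


for name, f in [("single-cycle", single_cycle_time),
                ("multi-cycle", multi_cycle_time),
                ("pipelined", pipelined_time)]:
    print(f"{name}: {f():.1f} ps per instruction")
```

Under these assumptions the average time per instruction is 600ps for single-cycle, about 824ps for multi-cycle (CPI 4.12), and about 234.5ps for the pipelined datapath (CPI 1.1725).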
Exceptions
Exceptions: events other than branches or jumps that change the normal flow of instruction execution
Arithmetic overflow, undefined instruction, etc.
Internal to the processor
Interrupts come from outside the processor – IO interrupts
Use arithmetic overflow as an example
When an overflow is detected, we need to transfer control to the exception handling routine at location 0x80000180 immediately, because we do not want this invalid value to contaminate other registers or memory locations
Similar idea as branch hazard
Detected in the EX stage
De-assert all control signals in EX and ID stages, flush IF/ID
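The overflow case can be sketched as two small functions: one that detects signed 32-bit overflow for an add, and one that shows what the pipeline holds in the cycle after the flush. This is a simplified model under stated assumptions: the stage dictionary, the "bubble" marker, and the function names are hypothetical, and saving the address of the offending instruction is left out.

```python
HANDLER_PC = 0x80000180  # exception handler address from the slide


def add32_overflows(a: int, b: int) -> bool:
    """True if a + b does not fit in a signed 32-bit register."""
    result = a + b
    return not (-2**31 <= result <= 2**31 - 1)


def next_cycle_on_overflow(stages: dict) -> tuple:
    """Pipeline contents one cycle after overflow is detected in EX.

    'stages' maps IF, ID, EX, MEM, WB to instruction strings for the cycle
    in which the overflowing instruction is in EX.
    """
    next_stages = {
        "IF": "first handler instruction",  # fetched from HANDLER_PC
        "ID": "bubble",                     # IF/ID was flushed
        "EX": "bubble",                     # control signals de-asserted in ID
        "MEM": "bubble",                    # control signals de-asserted in EX
        "WB": stages["MEM"],                # older instructions complete normally
    }
    return HANDLER_PC, next_stages


at_detection = {
    "IF": "lw $16, 50($7)",
    "ID": "slt $15, $6, $7",
    "EX": "add $1, $2, $1",
    "MEM": "or $13, $2, $6",
    "WB": "and $12, $2, $5",
}
next_pc, after_flush = next_cycle_on_overflow(at_detection)
```

In this model the add, slt, and lw never write a register or memory, while the older or instruction still reaches WB, which is the same flushing idea used for a mispredicted branch.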
[Figure: pipelined datapath with exception handling; label: 80000180]
Example
sub $11, $2, $4
and $12, $2, $5
or $13, $2, $6
add $1, $2, $1 # overflow occurs
slt $15, $6, $7
lw $16, 50($7)
Exception handling routine:
0x80000180 sw $25, 1000($0)
0x80000184 sw $26, 1004($0)
[Figures: datapath snapshots of the example at clock 6 and clock 7; label: 80000180]
Questions?