ECE 486/586 Computer Architecture Lecture # 17web.cecs.pdx.edu/~zeshan/ece586_lec17.pdf ·...

ECE 486/586

Computer Architecture

Lecture # 17

Spring 2019

Portland State University

Lecture Topics

• Branch Prediction

– Tournament Predictors

– Branch Target Buffer (BTB)

– Return Address Stack (RAS)

• Speculative Execution

Reference:

• Chapter 3: Sections 3.3, 3.6, 3.9

Tournament Predictors: Adaptively Combining Local and Global Predictors

• Some branches are predicted more accurately with global predictors

• Some branches are predicted better with local predictors

• Key Idea: Combine both local and global predictors and dynamically select the right predictor for the right branch

• The selector is yet another 2-bit predictor (a small state machine) that tracks which predictor (local, global or even some mix) was most effective in recent predictions
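The selection mechanism described above can be sketched in a few lines of Python. This is a simplified illustration, not the design of any particular processor: the class names, the 4-bit table index, the 4-byte instruction size, and the use of a plain per-PC 2-bit counter as the "local" predictor are all assumptions made to keep the example short.

```python
class TwoBitCounter:
    """Saturating 2-bit counter: states 0..3, predicts True when state >= 2."""

    def __init__(self, state=1):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, outcome):
        if outcome:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


class TournamentPredictor:
    """Hypothetical tournament predictor: local vs. global, chosen per branch."""

    def __init__(self, index_bits=4):
        size = 1 << index_bits
        self.mask = size - 1
        self.history = 0  # global history of recent branch outcomes
        self.local = [TwoBitCounter() for _ in range(size)]     # indexed by PC
        self.glob = [TwoBitCounter() for _ in range(size)]      # indexed by history
        self.selector = [TwoBitCounter() for _ in range(size)]  # True = use global

    def _index(self, pc):
        return (pc >> 2) & self.mask  # assumes 4-byte instructions

    def predict(self, pc):
        i = self._index(pc)
        local_pred = self.local[i].predict()
        global_pred = self.glob[self.history].predict()
        return global_pred if self.selector[i].predict() else local_pred

    def update(self, pc, taken):
        i = self._index(pc)
        local_pred = self.local[i].predict()
        global_pred = self.glob[self.history].predict()
        # Train the selector only when the two predictors disagree,
        # moving it toward whichever one was right.
        if local_pred != global_pred:
            self.selector[i].update(global_pred == taken)
        self.local[i].update(taken)
        self.glob[self.history].update(taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


# Usage: a branch that alternates taken / not taken defeats the per-PC
# counter but is captured by global history, so the selector moves to global.
predictor = TournamentPredictor()
results = []
for n in range(120):
    taken = (n % 2 == 0)
    results.append(predictor.predict(0x40) == taken)
    predictor.update(0x40, taken)
```

In this sketch the selector is trained only on disagreements, since a case where both predictors agree gives no evidence about which one is better.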

Predictor Comparison

Predicting Branch Targets

• To avoid the branch penalty in a 5-stage pipeline, we need to know which address to fetch the next instruction from before the end of the IF stage

• Requires us to know whether the (as-yet undecoded) instruction is a branch and, if so, what the next PC should be

• Solution: Predict the target address for a potential branch during the IF stage

Branch Target Buffer (BTB)

• During IF stage, use PC of current instruction (possible branch) to index into table of predicted target PCs for that branch

• Fetch of predicted target begins at the start of next cycle

Predicting Branch Targets

• Unlike branch direction prediction, we cannot permit aliasing: the full PC must match; otherwise we would fetch predicted targets for non-branch instructions, hurting performance

• If branch is later resolved to be not-taken, remove the BTB entry

• Fetch for predicted-not-taken branch is the same as a non-branch; sequential

• If a two-bit branch predictor is used within the BTB, the BTB entry can be retained and the prediction bits in the table used instead
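The BTB behavior described in the bullets above (full-PC match, insert on taken, remove on not-taken) can be sketched as follows. The class and method names, the direct-mapped organization, the 4-bit index, and the 4-byte instruction size are assumptions for illustration only; the variant with two-bit prediction bits is not modeled.

```python
class BranchTargetBuffer:
    """Hypothetical direct-mapped BTB holding (branch PC, predicted target)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.entries = [None] * (1 << index_bits)

    def _index(self, pc):
        return (pc >> 2) & self.mask  # assumes 4-byte instructions

    def lookup(self, pc):
        """IF stage: return the PC to fetch from in the next cycle."""
        entry = self.entries[self._index(pc)]
        # The stored PC must match exactly; an aliasing PC is treated as a miss.
        if entry is not None and entry[0] == pc:
            return entry[1]   # hit: predicted taken, fetch the stored target
        return pc + 4         # miss: fetch sequentially, as for a non-branch

    def resolve(self, pc, taken, target):
        """Called once the branch outcome is known."""
        i = self._index(pc)
        if taken:
            self.entries[i] = (pc, target)   # insert or refresh the entry
        elif self.entries[i] is not None and self.entries[i][0] == pc:
            self.entries[i] = None           # resolved not-taken: remove entry


# Usage
btb = BranchTargetBuffer()
first_fetch = btb.lookup(0x100)    # miss: sequential fetch
btb.resolve(0x100, True, 0x200)    # branch turned out to be taken
second_fetch = btb.lookup(0x100)   # hit: predicted target
```

Here 0x140 maps to the same table index as 0x100, but the tag comparison in `lookup` keeps it from inheriting the target stored for 0x100.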

Branch Target Buffer Behavior

Branch Penalty

Instruction in Buffer   Prediction   Actual Branch   Penalty Cycles
Yes                     Taken        Taken           0
Yes                     Taken        Not Taken       2
No                      Not Taken    Taken           2
No                      Not Taken    Not Taken       0

The penalties above assume that the branch outcome is computed in the EX stage
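One way to use the penalty table is to weight its two non-zero rows by how often they occur, giving an average penalty per branch. The sketch below does that arithmetic; the function name, parameter names, and the sample rates in the usage lines are illustrative assumptions, not measurements from the lecture.

```python
MISPREDICT_PENALTY = 2  # cycles, from the table (outcome resolved in EX)


def expected_branch_penalty(hit_rate, wrong_given_hit, taken_given_miss):
    """Average penalty cycles per branch.

    hit_rate         -- fraction of branches found in the BTB
    wrong_given_hit  -- fraction of BTB hits that resolve not-taken
    taken_given_miss -- fraction of BTB misses that resolve taken
    """
    in_buffer_but_not_taken = hit_rate * wrong_given_hit
    not_in_buffer_but_taken = (1.0 - hit_rate) * taken_given_miss
    return MISPREDICT_PENALTY * (in_buffer_but_not_taken + not_in_buffer_but_taken)


# Usage with made-up rates: 90% hit rate, 10% of hits wrong, 60% of misses taken
penalty = expected_branch_penalty(0.9, 0.1, 0.6)
```

The two rows with zero penalty drop out of the sum, so only the two misprediction cases contribute.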

Branch Folding

• Store the actual target instruction rather than its PC

– Saves a memory fetch cycle => can be leveraged to build a larger branch target buffer (additional latency compensated by instruction fetch savings)

• Zero-cycle unconditional branches

– Branch target buffer signals a hit and provides the target instruction

– Target instruction substituted for current instruction (unconditional branch)


Dealing with Indirect Branches

• Indirect branches have multiple potential targets, since address comes from a register, which can have many possible values

• Branch target buffers could be used for indirect branch target prediction

– However, many mispredictions can happen because the BTB can store only one target per branch

• Most indirect branches come from return instructions

Returning from Procedure Calls

• Procedures return to their callers via a Jump instruction

• Procedures may be called from different places in a program

• Makes branch target prediction inaccurate if relying on previous return address

[Figure: three call sites, each using JALR R31, invoke the same procedure, which returns with JR R31]

Return Address Stack (RAS)

Key Idea: Cache the most recent return addresses in a small buffer operating as a stack, called return address stack (RAS)

• When “procedure call” occurs, push the return address (which is the Call address + 4) onto the RAS

• When return instruction encountered, pop the address from the RAS (last-in, first-out) and use it as the target

[Figure: the RAS drawn as a stack of entries Return Address 1, Return Address 2, ..., Return Address n]


If the RAS is sufficiently large (i.e., as large as the maximum call depth), it will predict the return addresses perfectly
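The push-on-call, pop-on-return behavior of the RAS can be sketched as below. The class and method names, the default depth of 8, and the policy of discarding the oldest entry on overflow are assumptions for illustration; real designs differ in how they handle overflow and underflow.

```python
class ReturnAddressStack:
    """Hypothetical fixed-depth return address stack."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc):
        """Procedure call: push the return address (call address + 4)."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: the oldest return address is lost
        self.stack.append(call_pc + 4)

    def on_return(self):
        """Return: pop the most recent return address as the predicted target.

        Returns None when the stack is empty (no prediction available).
        """
        return self.stack.pop() if self.stack else None


# Usage: nested calls from 0x100 and then 0x200 unwind in last-in, first-out order
ras = ReturnAddressStack()
ras.on_call(0x100)
ras.on_call(0x200)
inner_return = ras.on_return()   # 0x204
outer_return = ras.on_return()   # 0x104
```

With a depth smaller than the call depth, the outermost return addresses are dropped and those returns can no longer be predicted from the stack, which is why the prediction is perfect only when the RAS is at least as deep as the maximum call depth.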

Speculative Execution

• In high performance pipelines with multiple issue, control dependency becomes the primary bottleneck

• Branch prediction allows the pipeline to partially continue until branch outcome is known

• Instructions continue to be fetched/issued but cannot be executed until the branch outcome is known

Solution: Speculatively execute instructions based upon predicted branch outcome

Hardware-based Speculation

• Resulting problem: if the branch prediction is wrong, we

– Must “undo” the effects of wrongly executed instructions

– Must deal with potential exceptions arising from wrongly executed instructions

• Solution: Separate “execution” and “write results” from “commit”

Hardware-based Speculation

• Combines three key ideas:

• Dynamic branch prediction

• Speculation

– Allow execution of instructions before control dependences are resolved

– Ability to undo the effects of incorrectly executed instructions

• Dynamic scheduling

• Used in most modern high performance processors

• Relies on extension to Tomasulo’s algorithm

• Separate bypassing from actual instruction completion (“commit”)

• Commit order implemented with “Re-order Buffer”
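The separation of "write results" from "commit" can be illustrated with a toy reorder buffer. This sketch models only in-order commit and squashing after a mispredicted branch; it leaves out reservation stations, register renaming, and exception handling, and every name in it (`ReorderBuffer`, `issue`, `write_result`, `commit`, `squash_after`) is hypothetical.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: results complete out of order, commit in order."""

    def __init__(self):
        self.entries = deque()  # program order, oldest entry on the left
        self.registers = {}     # architectural (committed) register state

    def issue(self, tag, dest):
        """Allocate an entry in program order; dest is None for branches."""
        self.entries.append({"tag": tag, "dest": dest, "value": None, "done": False})

    def write_result(self, tag, value):
        """Record a finished result; architectural state is not touched yet."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"], entry["done"] = value, True
                return
        # No matching entry: the instruction was squashed, so drop the result.

    def commit(self):
        """Retire finished instructions from the head, strictly in order."""
        committed = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            if entry["dest"] is not None:
                self.registers[entry["dest"]] = entry["value"]
            committed.append(entry["tag"])
        return committed

    def squash_after(self, branch_tag):
        """Misprediction: discard every entry younger than the branch."""
        tags = [entry["tag"] for entry in self.entries]
        if branch_tag in tags:
            keep = tags.index(branch_tag) + 1
            while len(self.entries) > keep:
                self.entries.pop()


# Usage: i3 executes speculatively past the branch, then is squashed
rob = ReorderBuffer()
rob.issue("i1", "R1")
rob.issue("br", None)
rob.issue("i3", "R2")
rob.write_result("i3", 99)       # finished early, but not committed
rob.squash_after("br")           # branch was mispredicted
rob.write_result("i1", 5)
rob.write_result("br", None)
retired = rob.commit()           # ["i1", "br"]; R2 is never written
```

Because the speculative result of `i3` lived only in its buffer entry, undoing it amounts to discarding the entry; the committed register state never saw it.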
