
Sequential Hardware Prefetching in Shared-Memory Multiprocessors

Fredrik Dahlgren, Member, IEEE Computer Society, Michel Dubois, Senior Member, IEEE, and Per Stenstrom, Member, IEEE

INTRODUCTION

• Why prefetching?

• Motivations for prefetching data.

• Types of prefetching: SOFTWARE and HARDWARE.

SIMPLEST & MOST OBVIOUS PREFETCHING TECHNIQUE:

INCREASE BLOCK SIZE!!!
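A minimal sketch (not from the paper) of why a larger block size acts as implicit prefetching: in a toy direct-mapped cache, a sequential access pattern misses once per block, so each doubling of the block size halves the miss count. All names and sizes below are illustrative.

```python
# Toy direct-mapped cache: count misses for a given block size.
def count_misses(addresses, block_size, num_sets=64):
    tags = [None] * num_sets           # one resident block tag per set
    misses = 0
    for addr in addresses:
        block = addr // block_size     # block address
        idx = block % num_sets         # direct-mapped index
        if tags[idx] != block:
            misses += 1
            tags[idx] = block
    return misses

seq = list(range(1024))                # sequential word accesses
for bs in (1, 4, 16):
    print(bs, count_misses(seq, bs))   # larger blocks -> fewer misses
```

The same mechanism also explains the downside the next slides discuss: larger blocks move more data per miss (traffic) and widen the window for false sharing.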

IMPACT OF BLOCK SIZE ON ACCESS PENALTIES AND TRAFFIC

TYPES OF MISSES

COLD • REPLACEMENT • COHERENCE (TRUE SHARING / FALSE SHARING)

EFFECT OF INCREASING THE BLOCK SIZE

• EFFECT OF BLOCK SIZE ON DIFFERENT TYPES OF MISSES

• EFFECT OF BLOCK SIZE ON MEMORY TRAFFIC

• EFFECT OF BLOCK SIZE ON WRITE PENALTY

SIMULATED NODE ARCHITECTURE

TWO SIMPLE HARDWARE-CONTROLLED SEQUENTIAL PREFETCHING TECHNIQUES

1. FIXED SEQUENTIAL PREFETCHING

2. ADAPTIVE SEQUENTIAL PREFETCHING
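As a hedged sketch of the first scheme: in fixed sequential prefetching, a miss on block b fetches b and also prefetches blocks b+1 through b+K, with the degree K fixed. The function below is an illustrative model, not the paper's simulator.

```python
# Fixed sequential prefetching over a stream of block addresses.
# On each miss, the K sequentially following blocks are also fetched.
def access_with_fixed_prefetch(blocks, K):
    cache = set()
    misses = prefetches = 0
    for b in blocks:
        if b not in cache:
            misses += 1
            cache.add(b)
            for d in range(1, K + 1):      # prefetch K successor blocks
                if b + d not in cache:
                    cache.add(b + d)
                    prefetches += 1
    return misses, prefetches

# Sequential stream of 100 blocks with K = 1: every other access hits.
print(access_with_fixed_prefetch(range(100), 1))
```

On a purely sequential stream every prefetch is useful; on irregular streams a fixed K wastes bandwidth, which motivates the adaptive scheme.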

In the adaptive scheme, the prefetch degree K is controlled by the lookahead counter.

Mechanisms needed: prefetch bit, zero bit, lookahead counter, prefetch counter, useful counter.

ADAPTIVE SEQUENTIAL PREFETCHING ALGORITHM

MEASURES PREFETCH EFFICIENCY BY COUNTING THE FRACTION OF USEFUL PREFETCHES.

COUNTING PREFETCH BLOCKS : INCREMENT THE PREFETCH COUNTER WHENEVER WE DO A PREFETCH.

COUNTING THE USEFUL PREFETCHES : INCREMENT THE USEFUL COUNTER WHENEVER A BLOCK WITH ITS PREFETCH BIT = 1 IS ACCESSED.

IF PREFETCH COUNTER = MAX, THEN CHECK USEFUL COUNTER:

USEFUL COUNTER > UPPER THRESHOLD: LOOKAHEAD COUNTER INCREASED.

USEFUL COUNTER < LOWER THRESHOLD: LOOKAHEAD COUNTER DECREASED.

LOWER THRESHOLD ≤ USEFUL COUNTER ≤ UPPER THRESHOLD: LOOKAHEAD COUNTER UNAFFECTED.
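The counter logic above can be sketched as follows. MAX and the two thresholds are assumed example values, not the paper's, and the zero-bit mechanism for restarting prefetching once the lookahead counter reaches zero is omitted.

```python
# Illustrative sketch of the adaptive degree adjustment (values assumed).
MAX = 16            # prefetch-counter saturation point (assumed)
LOW, HIGH = 4, 12   # lower/upper useful-counter thresholds (assumed)

class AdaptiveDegree:
    def __init__(self):
        self.K = 1               # lookahead counter = prefetch degree
        self.prefetch_count = 0  # prefetches issued this epoch
        self.useful_count = 0    # prefetched blocks later accessed

    def on_prefetch_issued(self):
        self.prefetch_count += 1
        if self.prefetch_count == MAX:   # epoch over: evaluate efficiency
            self._adjust()

    def on_useful_prefetch(self):
        # Called when a block with its prefetch bit set is accessed.
        self.useful_count += 1

    def _adjust(self):
        if self.useful_count > HIGH:
            self.K += 1                  # prefetching pays off: look further
        elif self.useful_count < LOW and self.K > 0:
            self.K -= 1                  # mostly useless: back off
        self.prefetch_count = 0          # start a new epoch
        self.useful_count = 0
```

For example, 13 useful prefetches out of an epoch of 16 exceeds the upper threshold, so K grows; an epoch with none shrinks K toward zero.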

EXPERIMENTAL RESULTS

READ, WRITE AND SYNCHRONIZATION TRAFFIC & THE THREE SCHEMES

RELATIVE READ STALL TIMES FOR FIXED AND ADAPTIVE PREFETCHING NORMALIZED TO NO PREFETCHING.

CONCLUSIONS

• Prefetching improves efficiency.

• Fixed sequential prefetching (analyzed for K = 1): read misses decrease by 25–45%, read stall time decreases by 20–35%.

• Adaptive sequential prefetching: read stall time reduced by 58%, execution time decreased by 25%.

QUESTIONS ??

THANK YOU