Sequential Hardware Prefetching in Shared-Memory Multiprocessors
Fredrik Dahlgren, Member, IEEE Computer Society, Michel Dubois, Senior Member, IEEE, and Per Stenström, Member, IEEE
INTRODUCTION
• Why prefetching? Motivations for prefetching data.
• Types of prefetching.
SOFTWARE vs. HARDWARE
SIMPLEST & MOST OBVIOUS PREFETCHING TECHNIQUE:
INCREASE THE BLOCK SIZE!
IMPACT OF BLOCK SIZE ON ACCESS PENALTIES AND TRAFFIC
TYPES OF MISSES
• COLD
• REPLACEMENT
• COHERENCE (TRUE SHARING and FALSE SHARING)
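The miss categories above can be illustrated with a small uniprocessor sketch (not from the paper): a fully associative LRU cache where the first-ever access to a block is a cold miss and a re-access after eviction is a replacement miss. Coherence misses (true/false sharing) need a multiprocessor simulation and are omitted here.

```python
# Hypothetical sketch: classify misses in a fully associative LRU cache.
# Cold miss        = first-ever access to a block.
# Replacement miss = block was cached before but has been evicted.
from collections import OrderedDict

def classify_misses(block_refs, cache_blocks):
    cache = OrderedDict()          # LRU order: oldest entry first
    seen = set()                   # blocks ever brought into the cache
    stats = {"hit": 0, "cold": 0, "replacement": 0}
    for b in block_refs:
        if b in cache:
            stats["hit"] += 1
            cache.move_to_end(b)   # mark as most recently used
        else:
            stats["cold" if b not in seen else "replacement"] += 1
            seen.add(b)
            cache[b] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict the LRU block
    return stats

# A reference stream that revisits block 0 after it was evicted:
print(classify_misses([0, 1, 2, 3, 0], cache_blocks=2))
# -> {'hit': 0, 'cold': 4, 'replacement': 1}
```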
EFFECTS OF INCREASING THE BLOCK SIZE
• EFFECT OF BLOCK SIZE ON DIFFERENT TYPES OF MISSES
• EFFECT OF BLOCK SIZE ON MEMORY TRAFFIC
• EFFECT OF BLOCK SIZE ON WRITE PENALTY
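The miss/traffic trade-off behind these bullets can be sketched with some illustrative arithmetic (my simplification, not the paper's simulation data): for a dense sequential scan, a larger block size B cuts cold misses, but for scattered single-word accesses each miss still moves a whole block, so traffic grows linearly with B.

```python
# Illustrative arithmetic: how block size B trades misses against
# traffic for two extreme reference patterns, assuming every miss
# loads one whole block from memory. (Parameters are hypothetical.)

def sequential(n_bytes, B):
    """Dense scan: one cold miss per block; traffic equals n_bytes."""
    misses = n_bytes // B
    return misses, misses * B

def sparse(n_accesses, B):
    """Scattered single-word accesses: each lands in its own block,
    so misses stay constant but traffic grows linearly with B."""
    return n_accesses, n_accesses * B

for B in (16, 64, 256):
    print(f"B={B:3d}  sequential: {sequential(4096, B)}  sparse: {sparse(64, B)}")
```

Doubling B helps the sequential pattern (fewer misses, same traffic) but inflates traffic for the sparse one, which is the tension the adaptive scheme later tries to manage.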
SIMULATED NODE ARCHITECTURE
TWO SIMPLE HARDWARE-CONTROLLED SEQUENTIAL PREFETCHING TECHNIQUES
1. FIXED SEQUENTIAL PREFETCHING
2. ADAPTIVE SEQUENTIAL PREFETCHING
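The fixed scheme can be sketched in a few lines (a minimal model, assuming a constant prefetch degree K as in the fixed technique): on a miss to block b, the cache also fetches blocks b+1 .. b+K.

```python
# Minimal sketch of fixed sequential prefetching: on a miss to block
# `block`, fetch it plus its K sequential successors. `memory_fetch`
# stands in for a memory/bus transaction (hypothetical interface).
def access(block, cache, memory_fetch, K=1):
    """Return True on a hit; on a miss, fetch the block plus K successors."""
    if block in cache:
        return True
    for b in range(block, block + K + 1):
        if b not in cache:
            memory_fetch(b)      # one transaction per missing block
            cache.add(b)
    return False

cache, fetched = set(), []
access(0, cache, fetched.append, K=2)         # miss: fetches blocks 0, 1, 2
assert access(1, cache, fetched.append, K=2)  # hit thanks to the prefetch
```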
K is controlled by the lookahead counter.
Mechanisms needed:
• Prefetch bit
• Zero bit
• Lookahead counter
• Prefetch counter
• Useful counter
ADAPTIVE SEQUENTIAL PREFETCHING ALGORITHM
MEASURES PREFETCH EFFICIENCY BY COUNTING THE FRACTION OF USEFUL PREFETCHES.
COUNTING PREFETCHED BLOCKS: INCREMENT THE PREFETCH COUNTER WHENEVER A PREFETCH IS ISSUED.
COUNTING USEFUL PREFETCHES: INCREMENT THE USEFUL COUNTER WHENEVER A BLOCK WITH ITS PREFETCH BIT = 1 IS ACCESSED.
WHEN THE PREFETCH COUNTER REACHES MAX, CHECK THE USEFUL COUNTER:
• USEFUL COUNTER > UPPER THRESHOLD: INCREASE THE LOOKAHEAD COUNTER.
• USEFUL COUNTER < LOWER THRESHOLD: DECREASE THE LOOKAHEAD COUNTER.
• LOWER THRESHOLD ≤ USEFUL COUNTER ≤ UPPER THRESHOLD: LOOKAHEAD COUNTER UNCHANGED.
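The counter logic above can be sketched as a small class (the interval length MAX_COUNT and the two thresholds below are illustrative values, not the paper's tuned parameters):

```python
class AdaptivePrefetcher:
    """Sketch of the adaptive scheme: the lookahead counter K grows
    when prefetches prove useful and shrinks when they do not."""
    MAX_COUNT = 16      # prefetches per measurement interval (illustrative)
    LOW, HIGH = 4, 12   # useful-counter thresholds (illustrative)

    def __init__(self):
        self.k = 1                  # lookahead (prefetch degree)
        self.prefetch_counter = 0
        self.useful_counter = 0

    def on_prefetch_issued(self):
        self.prefetch_counter += 1
        if self.prefetch_counter == self.MAX_COUNT:
            self._adapt()

    def on_prefetched_block_used(self):
        # Called when a block with its prefetch bit = 1 is accessed.
        self.useful_counter += 1

    def _adapt(self):
        if self.useful_counter > self.HIGH:
            self.k += 1                  # prefetching pays off: look further ahead
        elif self.useful_counter < self.LOW:
            self.k = max(0, self.k - 1)  # mostly useless prefetches: back off
        # between the thresholds: leave k unchanged
        self.prefetch_counter = 0
        self.useful_counter = 0

p = AdaptivePrefetcher()
for _ in range(16):
    p.on_prefetch_issued()   # an interval with no useful prefetches
assert p.k == 0              # lookahead decreased from its initial 1
```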
EXPERIMENTAL RESULTS
READ, WRITE, AND SYNCHRONIZATION TRAFFIC FOR THE THREE SCHEMES
RELATIVE READ STALL TIMES FOR FIXED AND ADAPTIVE PREFETCHING NORMALIZED TO NO PREFETCHING.
CONCLUSIONS
• Prefetching improves efficiency.
• Fixed sequential prefetching (analyzed for K = 1): read misses decrease by 25–45%, read stall time decreases by 20–35%.
• Adaptive sequential prefetching: read stall time reduced by 58%, execution time decreased by 25%.
QUESTIONS?
THANK YOU