Sequential Hardware Prefetching in Shared-Memory Multiprocessors

Fredrik Dahlgren, Member, IEEE Computer Society, Michel Dubois, Senior Member, IEEE, and Per Stenstrom, Member, IEEE

INTRODUCTION

• Why prefetching?

• Motivations for Prefetching of data.

• Types of prefetching: software, hardware.

SIMPLEST & MOST OBVIOUS PREFETCHING TECHNIQUE:

INCREASE BLOCK SIZE!!!

IMPACT OF BLOCK SIZE ON ACCESS PENALTIES AND TRAFFIC

TYPES OF MISSES

• COLD
• REPLACEMENT
• COHERENCE: TRUE SHARING, FALSE SHARING

EFFECT OF INCREASING THE BLOCK SIZE ON THE DIFFERENT TYPES OF MISSES

• EFFECT OF BLOCK SIZE ON DIFFERENT TYPES OF MISSES

• EFFECT OF BLOCK SIZE ON MEMORY TRAFFIC

• EFFECT OF BLOCK SIZE ON WRITE PENALTY

SIMULATED NODE ARCHITECTURE

TWO SIMPLE HARDWARE-CONTROLLED SEQUENTIAL PREFETCHING TECHNIQUES

1. FIXED SEQUENTIAL PREFETCHING
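
A minimal sketch of the fixed scheme, assuming that a miss to block B fetches B and also the K blocks that follow it when they are not already cached. The slide gives no details, so the unbounded cache, the block-number trace, and the names `fixed_sequential_prefetch` and `count_misses` are illustrative assumptions. Only miss-triggered prefetching is modeled here.

```python
def fixed_sequential_prefetch(cache, missed_block, k=1):
    """Fetch the missed block plus the next k blocks that are not cached yet.

    Assumption: `cache` is an unbounded set of block numbers (no replacement).
    Returns the list of blocks actually brought into the cache.
    """
    fetched = []
    for block in range(missed_block, missed_block + k + 1):
        if block not in cache:
            cache.add(block)
            fetched.append(block)
    return fetched


def count_misses(trace, k):
    """Count demand misses for a trace of block numbers with prefetch degree k."""
    cache = set()
    misses = 0
    for block in trace:
        if block not in cache:
            misses += 1
            fixed_sequential_prefetch(cache, block, k)
    return misses


if __name__ == "__main__":
    sequential_trace = list(range(8))
    for k in (0, 1, 3):
        print("K =", k, "misses =", count_misses(sequential_trace, k))
```

On a purely sequential trace the miss count drops as K grows. A trace with no spatial locality would see no such benefit while still paying for the extra fetched blocks.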

2. ADAPTIVE SEQUENTIAL PREFETCHING

K, the number of consecutive blocks prefetched, is controlled by the LookAhead Counter.

Mechanisms needed:

• Prefetch bit
• Zero bit
• LookAhead Counter
• Prefetch Counter
• Useful Counter

ADAPTIVE SEQUENTIAL PREFETCHING ALGORITHM

MEASURES PREFETCH EFFICIENCY BY COUNTING THE FRACTION OF USEFUL PREFETCHES.

COUNTING PREFETCH BLOCKS : INCREMENT THE PREFETCH COUNTER WHENEVER WE DO A PREFETCH.

COUNTING THE USEFUL PREFETCHES : INCREMENT THE USEFUL COUNTER WHENEVER A BLOCK WITH ITS PREFETCH BIT = 1 IS ACCESSED.

IF PREFETCH COUNTER = MAX , THEN CHECK USEFUL COUNTER.

• USEFUL COUNTER > UPPER THRESHOLD: LOOKAHEAD COUNTER INCREASED.
• USEFUL COUNTER < LOWER THRESHOLD: LOOKAHEAD COUNTER DECREASED.
• LOWER THRESHOLD < USEFUL COUNTER < UPPER THRESHOLD: LOOKAHEAD COUNTER UNAFFECTED.
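
The counter rules above can be sketched as a small controller. This is an illustration under stated assumptions, not the authors' hardware design: the class name, the threshold values, the cap on the lookahead, and the reset of both counters after each check are all hypothetical choices. The zero bit is not modeled.

```python
class AdaptivePrefetchController:
    """Adjusts the lookahead (K) from the fraction of useful prefetches."""

    def __init__(self, max_count=16, upper=12, lower=8, max_lookahead=8):
        # All four parameter values are illustrative assumptions.
        self.max_count = max_count
        self.upper = upper
        self.lower = lower
        self.max_lookahead = max_lookahead
        self.lookahead = 1          # LookAhead Counter (current K)
        self.prefetch_counter = 0   # prefetches issued in this window
        self.useful_counter = 0     # prefetched blocks that were accessed

    def on_prefetch(self):
        """Call whenever a prefetch is issued."""
        self.prefetch_counter += 1
        if self.prefetch_counter == self.max_count:
            self._adjust()

    def on_prefetched_block_accessed(self):
        """Call when a block whose prefetch bit is 1 is accessed."""
        self.useful_counter += 1

    def _adjust(self):
        if self.useful_counter > self.upper:
            self.lookahead = min(self.lookahead + 1, self.max_lookahead)
        elif self.useful_counter < self.lower:
            # Floor at 0; recovering from 0 would need the zero bit,
            # which this sketch leaves out.
            self.lookahead = max(self.lookahead - 1, 0)
        # Assumption: both counters restart for the next measurement window.
        self.prefetch_counter = 0
        self.useful_counter = 0


if __name__ == "__main__":
    ctrl = AdaptivePrefetchController(max_count=4, upper=2, lower=1)
    for _ in range(3):
        ctrl.on_prefetched_block_accessed()
    for _ in range(4):
        ctrl.on_prefetch()
    print("lookahead after a useful window:", ctrl.lookahead)
```

Checking only when the Prefetch Counter reaches its maximum means the Useful Counter is always compared against a fixed number of prefetches, so the thresholds act as fractions of that maximum.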

EXPERIMENTAL RESULTS

READ, WRITE AND SYNCHRONIZATION TRAFFIC & THE THREE SCHEMES

RELATIVE READ STALL TIMES FOR FIXED AND ADAPTIVE PREFETCHING NORMALIZED TO NO PREFETCHING.

CONCLUSIONS

• Prefetching improves efficiency.
• Fixed Sequential Prefetching analyzed for K=1: read misses decrease by 25–45%, read stall time decreases by 20–35%.
• Under Adaptive Sequential Prefetching: read stall time reduced by 58%, execution time decreased by 25%.

QUESTIONS?

THANK YOU