Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors
Karin Strauss, Xiaowei Shen*, Josep Torrellas
University of Illinois at Urbana-Champaign
*IBM Research
http://iacoma.cs.uiuc.edu
Motivation
• CMPs are becoming standard components
• cheaper to build medium size machines
– 32 to 128 cores (multi-CMP)
• shared memory, cache coherent
– easier to program, easier to manage
• supporting cache coherence is difficult
Cache coherence solutions
strategy                   ordered network?   pros       cons
snoopy, broadcast bus      yes                simple     difficult to scale
directory-based protocol   no                 scalable   indirection, extra hardware
snoopy, embedded ring      no                 simple     long latencies
• other proposals (e.g. token coherence)
Contributions
• family of adaptive coherence protocols for rings
• two were chosen as best options
– Superset Aggressive: high performance scheme
– Superset Conservative: energy conscious scheme
[Figure: performance and energy consumption of Superset Aggressive and Superset Conservative, compared to fastest state-of-the-art scheme]
Multi-CMP multiprocessor
[Figure: multi-CMP machine; labels: CMP, Proc + L1 + L2, local network, memory]
• coherence protocol used: only one supplier if line is cached
Ring in action
[Figure: the same request serviced on the ring under Lazy, Eager, and Oracle; ring nodes labeled R and S; legend: cmp, request, snoop, response, data, supplier predictor]
Ring in action
[Figure: Lazy, Eager, and Oracle compared on latency, snoops, and messages; ring nodes labeled R and S]
• goal: adaptive schemes that approximate Oracle’s behavior
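To make the Lazy / Eager / Oracle comparison concrete, here is a toy cost model in Python. It is a sketch under stated assumptions, not the model used in the talk: the requester sits at position 0 of a unidirectional ring, the single supplier is `supplier_hops` nodes downstream, Lazy snoops at every node before forwarding, Eager forwards first and snoops at every other node, and Oracle snoops only at the supplier. The cycle counts `HOP_CYCLES` and `SNOOP_CYCLES` are hypothetical, and message counts are left out.

```python
from dataclasses import dataclass

HOP_CYCLES = 10    # hypothetical latency of one ring hop
SNOOP_CYCLES = 20  # hypothetical latency of one snoop operation


@dataclass
class Cost:
    snoops: int   # snoop operations triggered by the request
    latency: int  # cycles until the supplier has snooped the request


def miss_cost(scheme: str, supplier_hops: int, n_nodes: int) -> Cost:
    """Toy cost of delivering one read request to the supplier."""
    if not 1 <= supplier_hops <= n_nodes - 1:
        raise ValueError("supplier must be another node on the ring")
    if scheme == "lazy":
        # Every node up to and including the supplier snoops, then forwards,
        # so each snoop sits on the critical path.
        return Cost(snoops=supplier_hops,
                    latency=supplier_hops * (HOP_CYCLES + SNOOP_CYCLES))
    if scheme == "eager":
        # Every node forwards first and snoops off the critical path;
        # assumed here: all other nodes end up snooping.
        return Cost(snoops=n_nodes - 1,
                    latency=supplier_hops * HOP_CYCLES + SNOOP_CYCLES)
    if scheme == "oracle":
        # Only the supplier snoops; all other nodes just forward.
        return Cost(snoops=1,
                    latency=supplier_hops * HOP_CYCLES + SNOOP_CYCLES)
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for scheme in ("lazy", "eager", "oracle"):
        print(scheme, miss_cost(scheme, supplier_hops=4, n_nodes=8))
```

Under these assumptions Oracle combines Lazy's low snoop count with Eager's short latency, which is the behavior the adaptive schemes try to approximate.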
Primitive snooping actions
• snoop and then forward
+ fewer messages
• forward and then snoop
+ shorter latency
• forward only
+ fewer snoops
+ shorter latency
– false negative predictions not allowed
Predictors and algorithms
predictor / algorithm   action on negative prediction   action on positive prediction
Subset                  forward then snoop              snoop
Superset Con            forward                         snoop then forward
Superset Agg            forward                         forward then snoop
Exact                   forward                         snoop
[Figure: for each predictor, the set of addresses in predictor versus the set of addresses node can supply]
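The action table on this slide can be read as a simple lookup from (algorithm, prediction) to a primitive snooping action. The Python sketch below encodes it that way; the names `Action`, `POLICY`, and `choose_action` are illustrative and do not come from the talk.

```python
from enum import Enum


class Action(Enum):
    SNOOP = "snoop"
    FORWARD = "forward"
    SNOOP_THEN_FORWARD = "snoop then forward"
    FORWARD_THEN_SNOOP = "forward then snoop"


# algorithm -> (action on negative prediction, action on positive prediction),
# as listed in the slide's table.
POLICY = {
    "Subset": (Action.FORWARD_THEN_SNOOP, Action.SNOOP),
    "Superset Con": (Action.FORWARD, Action.SNOOP_THEN_FORWARD),
    "Superset Agg": (Action.FORWARD, Action.FORWARD_THEN_SNOOP),
    "Exact": (Action.FORWARD, Action.SNOOP),
}


def choose_action(algorithm: str, predicted_supplier: bool) -> Action:
    """Pick the primitive action a node takes for an incoming snoop request."""
    on_negative, on_positive = POLICY[algorithm]
    return on_positive if predicted_supplier else on_negative


if __name__ == "__main__":
    for name in POLICY:
        print(name, choose_action(name, False).value, "|",
              choose_action(name, True).value)
```

In this encoding Superset Con and Superset Agg differ only in the action taken on a positive prediction.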
Algorithms
[Figure: per miss service, Lazy, Eager, Subset, SupersetCon, SupersetAgg, and Oracle / Exact placed along three axes: number of snoops, snoop message latency, number of messages]
algorithm      negative             positive
Subset         forward then snoop   snoop
Superset Con   forward              snoop then forward
Superset Agg   forward              forward then snoop
Exact          forward              snoop
Predictor implementation
• Subset
– associative table: subset of addresses that can be supplied by node
• Superset
– Bloom filter: superset of addresses that can be supplied by node
– associative table (exclude cache): addresses that recently suffered false positives
• Exact
– associative table: all addresses that can be supplied by node
– downgrading: if address has to be evicted from predictor table, corresponding line in node has to be downgraded
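As a rough illustration of the Superset predictor, the sketch below pairs a Bloom filter with a small exclude table. Several details are assumptions made for the example and are not taken from the talk: counters are used instead of single bits so that addresses can be removed, slots are derived with `hashlib.blake2b`, the exclude table is replaced in LRU order, and the class name, method names, and sizes are all hypothetical.

```python
import hashlib
from collections import OrderedDict


class SupersetPredictor:
    """Counting Bloom filter plus exclude table (illustrative sketch)."""

    def __init__(self, n_counters=1024, n_hashes=3, exclude_entries=64):
        self.counters = [0] * n_counters
        self.n_hashes = n_hashes
        self.exclude = OrderedDict()  # addresses with recent false positives
        self.exclude_entries = exclude_entries

    def _slots(self, addr):
        # Derive n_hashes counter indices from the address.
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(f"{i}:{addr}".encode(),
                                     digest_size=8).digest()
            yield int.from_bytes(digest, "big") % len(self.counters)

    def insert(self, addr):
        """The node can now supply addr."""
        self.exclude.pop(addr, None)  # a stale exclusion would be a false negative
        for slot in self._slots(addr):
            self.counters[slot] += 1

    def remove(self, addr):
        """The node can no longer supply addr (must have been inserted)."""
        for slot in self._slots(addr):
            self.counters[slot] -= 1

    def predict(self, addr):
        """True may be a false positive; False is never a false negative."""
        if addr in self.exclude:
            return False
        return all(self.counters[slot] > 0 for slot in self._slots(addr))

    def record_false_positive(self, addr):
        """Call after a snoop showed the node cannot supply addr."""
        self.exclude[addr] = True
        self.exclude.move_to_end(addr)
        if len(self.exclude) > self.exclude_entries:
            self.exclude.popitem(last=False)  # drop least recently added entry


if __name__ == "__main__":
    predictor = SupersetPredictor()
    predictor.insert(0x1000)
    print(predictor.predict(0x1000))
```

Because `insert` always clears a matching exclude entry, the structure keeps the superset property: an address the node can supply is never predicted negative, which is what makes the forward-only action safe.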
Downgrading
[Figure: downgrading example; labels: A, B, E, S]
Negative effects:
• writes by this node need to snoop other nodes
• reads and writes by other nodes need to fetch line from memory
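The downgrading rule for the Exact predictor can be sketched as a bounded table that notifies the cache whenever it evicts an address. This is an illustration only: the LRU replacement order, the `downgrade` callback, and the class and method names are assumptions and are not described in the talk.

```python
from collections import OrderedDict


class ExactPredictor:
    """Bounded table of all addresses the node can supply (illustrative)."""

    def __init__(self, capacity, downgrade):
        self.capacity = capacity
        self.downgrade = downgrade  # hypothetical hook into the node's cache
        self.table = OrderedDict()

    def insert(self, addr):
        """The node can now supply addr; may evict another address."""
        self.table[addr] = True
        self.table.move_to_end(addr)
        if len(self.table) > self.capacity:
            victim, _ = self.table.popitem(last=False)
            # The table must stay exact, so the evicted address may no
            # longer be supplied by this node: downgrade its cache line.
            self.downgrade(victim)

    def remove(self, addr):
        """The node stopped being able to supply addr for another reason."""
        self.table.pop(addr, None)

    def predict(self, addr):
        """No false positives and no false negatives."""
        return addr in self.table


if __name__ == "__main__":
    downgraded = []
    predictor = ExactPredictor(capacity=2, downgrade=downgraded.append)
    for addr in (0xA, 0xB, 0xC):
        predictor.insert(addr)
    print([hex(a) for a in downgraded], predictor.predict(0xC))
```

The cost of exactness shows up in the callback: every eviction from the table forces a line in the node out of its supplier state, which leads to the negative effects listed on this slide.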
Experiments
• 8 CMPs, 4 out-of-order (ooo) cores each = 32 cores
– private L2 caches
• on-chip bus interconnect
• off-chip 2D torus interconnect with embedded unidirectional ring
• per node predictors: latency of 3 processor cycles
• sesc simulator (sesc.sourceforge.net)
• SPLASH-2, SPECjbb, SPECweb
Execution time
[Figure: normalized execution time for SPLASH-2, SPECjbb, and SPECweb under Lazy, Eager, Oracle, Subset, SupersetCon, SupersetAgg, and Exact]
• the fastest of all algorithms is SupersetAgg
• performance of most flexible snooping algorithms is similar to Eager
Miss service energy
[Figure: normalized energy consumption for SPLASH-2, SPECjbb, and SPECweb under Lazy, Eager, Oracle, Subset, SupersetCon, SupersetAgg, and Exact; one off-scale bar is labeled 3.22]
• SupersetCon is least energy-hungry algorithm
• algorithms that eagerly forward messages use more energy
Most cost-effective algorithms
[Figure: normalized execution time and normalized energy consumption for SPLASH-2, SPECjbb, and SPECweb under Lazy, Eager, Oracle, Subset, SupersetCon, SupersetAgg, and Exact; one off-scale energy bar is labeled 3.22]
• high performance: Superset Aggressive
– faster than Eager at lower energy consumption
• energy conscious: Superset Conservative
– slightly slower than Eager at much lower energy consumption
Most cost-effective algorithms
compared to fastest state-of-the-art scheme (Eager)
• Superset Aggressive: high performance scheme
• Superset Conservative: energy conscious scheme
• can be combined by only changing forwarding policy
[Figure: performance and energy consumption of each scheme relative to Eager]
Conclusions
• proposed flexible snooping, a family of adaptive protocols for embedded rings
• two chosen protocols
– high performance: Superset Aggressive
– energy conservation: Superset Conservative
– can be selected dynamically
• flexible snooping makes embedded-ring protocols more attractive
Arch map
Google: architecture conference map (1st hit)
http://iacoma.cs.uiuc.edu/students/archmap/archmap.html