(C) 2003 Milo Martin
Token Coherence: Decoupling Performance and Correctness
Milo Martin, Mark Hill, and David Wood
Wisconsin Multifacet Project, http://www.cs.wisc.edu/multifacet/
University of Wisconsin-Madison



We See Two Problems in Cache Coherence

1. Protocol ordering bottlenecks
   – Artifact of conservatively resolving racing requests
   – “Virtual bus” interconnect (snooping protocols)
   – Indirection (directory protocols)

2. Protocol enhancements compound complexity
   – Fragile, error prone & difficult to reason about
   – Why? A distributed & concurrent system
   – Often enhancements too complicated to implement (predictive/adaptive/hybrid protocols)

Performance and correctness tightly intertwined


Rethinking Cache-Coherence Protocols

• Goal of invalidation-based coherence
  – Invariant: many readers -or- single writer
  – Enforced by globally coordinated actions

• Enforce this invariant directly using tokens
  – Fixed number of tokens per block
  – One token to read, all tokens to write

• Guarantees safety in all cases
  – Global invariant enforced with only local rules
  – Independent of races, request ordering, etc.

Key innovation
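The step from "one token to read, all tokens to write" to "many readers -or- single writer" is a short counting argument. The derivation below is a sketch under assumed notation: t_i(b) for the tokens of block b held by component i and T for the fixed token count are labels introduced here, not taken from the slides.

```latex
% Assumed notation (not from the slides):
%   t_i(b) >= 0 : tokens of block b held by component i
%                 (a cache, memory, or a message in transit)
%   T           : fixed number of tokens per block
\begin{align*}
  &\textstyle\sum_i t_i(b) = T
      && \text{(tokens are only moved, never created or destroyed)}\\
  &w \text{ writes } b \;\Longrightarrow\; t_w(b) = T
      && \text{(all tokens to write)}\\
  &r \text{ reads } b \;\Longrightarrow\; t_r(b) \ge 1
      && \text{(one token to read)}\\
  &t_w(b) = T \;\Longrightarrow\; \textstyle\sum_{i \ne w} t_i(b) = 0
      \;\Longrightarrow\; t_r(b) = 0 \ \text{ for every } r \ne w
\end{align*}
```

So while any component holds all T tokens, no other component can hold the token it would need to read. Each rule is checked locally, yet the invariant holds globally.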


Token Coherence: A New Framework for Cache Coherence

• Goal: Decouple performance and correctness
  – Fast in the common case
  – Correct in all cases

• To remove ordering bottlenecks (focus of this talk)
  – Ignore races (fast common case)
  – Tokens enforce safety (all cases)

• To reduce complexity (briefly described)
  – Performance enhancements (fast common case)
  – Without affecting correctness (all cases)
  – (without increased complexity)


Outline

• Overview
• Problem: ordering bottlenecks
• Solution: Token Coherence (TokenB)
• Evaluation
• Further exploiting decoupling
• Conclusions


Technology Trends

• High-speed point-to-point links
  – No (multi-drop) busses

• Desire: low-latency interconnect
  – Avoid “virtual bus” ordering
  – Enabled by directory protocols

• Increasing design integration
  – “Glueless” multiprocessors
  – Improve cost & latency

Technology trends → unordered interconnects


Workload Trends

• Commercial workloads
  – Many cache-to-cache misses
  – Clusters of small multiprocessors

• Goals:
  – Direct cache-to-cache misses (2 hops, not 3 hops)
  – Moderate scalability

[Figure: processors (P) and memory (M); a direct cache-to-cache miss takes 2 hops, a directory protocol takes 3 hops]

Workload trends → avoid indirection, broadcast ok


Basic Approach

• Low-latency protocol
  – Broadcast with direct responses
  – As in snooping protocols

• Low-latency interconnect
  – Use unordered interconnect
  – As in directory protocols

[Figure: processors (P) and memory (M); a broadcast request (1) followed by a direct response (2)]

Fast & works fine with no races… …but what happens in the case of a race?

Basic approach… but not yet correct

[Figure, built up over slides 9-13: P0, P1, and P2 exchange numbered messages (1-7) over the unordered interconnect. Message labels: request to write, Ack, request to read.]

Initial state: P0 No Copy, P1 No Copy, P2 Read/Write

• P0 issues a request to write (delayed in the interconnect on its way to P2)
• P1 issues a request to read
• P2 responds with data to P1 (P1 and P2 become Read-only)
• P0’s delayed request arrives at P2
• P2 responds to P0 (P2 becomes No Copy, P0 becomes Read/Write)

Problem: P0 and P1 are in inconsistent states

Locally “correct” operation, globally inconsistent
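The race in this example can be replayed in a few lines of Python. This is a minimal sketch rather than the protocol from the talk: the `state` dictionary, the two `deliver_*` helpers, and the way message delivery is modeled are all illustrative assumptions. Each helper applies only a locally sensible rule, which is exactly why the outcome is globally wrong.

```python
# Illustrative replay of the race; every name here is an assumption.
state = {"P0": "No Copy", "P1": "No Copy", "P2": "Read/Write"}

def deliver_read_request(holder, requester):
    # A writer that sees a read request shares the block:
    # both caches end up with read-only copies.
    if state[holder] == "Read/Write":
        state[holder] = "Read-only"
        state[requester] = "Read-only"

def deliver_write_request(holder, requester):
    # A cache with a copy gives it up, and the requester then
    # assumes it is the only holder and may write.
    if state[holder] != "No Copy":
        state[holder] = "No Copy"
        state[requester] = "Read/Write"

# P0's write request reaches P1 first (P1 has no copy, so nothing changes),
# but it is delayed on its way to P2.
deliver_write_request("P1", "P0")
# P1's read request reaches P2 before P0's delayed write request does.
deliver_read_request("P2", "P1")
# P0's delayed write request finally arrives at P2.
deliver_write_request("P2", "P0")

writers = [p for p, s in state.items() if s == "Read/Write"]
readers = [p for p, s in state.items() if s == "Read-only"]
print(state)
print("writers:", writers, "readers:", readers)
```

The final state has one writer (P0) and one reader (P1) at the same time, which violates the "many readers -or- single writer" invariant even though no individual step looked wrong.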


Contribution #1: Token Counting

• Tokens control reading & writing of data
• At all times, all blocks have T tokens (e.g., one token per processor)
• One or more to read
• All tokens to write
• Tokens: in caches, memory, or in transit
• Components exchange tokens & data

Provides safety in all cases
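The token-counting rules are small enough to write down directly. The sketch below is one possible model, not the hardware design: the `Block` class, its method names, and the choice of 16 tokens are assumptions made for illustration, and tokens in transit are not modeled separately (a transfer is treated as instantaneous).

```python
from dataclasses import dataclass, field

TOTAL_TOKENS = 16  # assumption: one token per processor, 16 processors

@dataclass
class Block:
    # Tokens held per component; memory starts out holding every token.
    tokens: dict = field(default_factory=lambda: {"memory": TOTAL_TOKENS})

    def held(self, who):
        return self.tokens.get(who, 0)

    def can_read(self, who):
        # One or more tokens to read.
        return self.held(who) >= 1

    def can_write(self, who):
        # All tokens to write.
        return self.held(who) == TOTAL_TOKENS

    def transfer(self, src, dst, count):
        # Tokens are only moved, never created or destroyed.
        if count < 0 or count > self.held(src):
            raise ValueError("cannot send tokens that are not held")
        self.tokens[src] = self.held(src) - count
        self.tokens[dst] = self.held(dst) + count
        assert sum(self.tokens.values()) == TOTAL_TOKENS

block = Block()
block.transfer("memory", "P0", 1)    # P0 becomes a reader
block.transfer("memory", "P1", 15)   # P1 holds most tokens, but not all
print(block.can_read("P0"), block.can_write("P1"))
block.transfer("P0", "P1", 1)        # P0 gives up its token
print(block.can_read("P0"), block.can_write("P1"))
```

Because `can_write` requires every token, it can never be true for one component while `can_read` is true for another, whatever order the transfers happen in.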


Basic Approach (Revisited)

• As before:
  – Broadcast with direct responses (like snooping)
  – Use unordered interconnect (like directory)

• Track tokens for safety

• More refinement in a moment…

Token Coherence Example

[Figure, built up over slides 16-21: the same race as before, now with a token count per processor. T=16 is the maximum number of tokens.]

Initial state: P0 T=0, P1 T=0, P2 T=16 (R/W)

• P0 issues a request to write (delayed to P2)
• P1 issues a request to read
• P2 responds with data to P1, sending T=1 (P1: T=1 (R), P2: T=15 (R))
• P0’s delayed request arrives at P2
• P2 responds to P0, sending T=15 (P0: T=15 (R), P2: T=0)

Resulting state: P0 T=15 (R), P1 T=1 (R), P2 T=0

Now what? (P0 wants all tokens)


Basic Approach (Re-Revisited)

• As before:
  – Broadcast with direct responses (like snooping)
  – Use unordered interconnect (like directory)
  – Track tokens for safety

• Reissue requests as needed
  – Needed due to racing requests (uncommon)
  – Timeout to detect failed completion
    • Wait twice average miss latency
    • Small hardware overhead
  – All races handled in this uniform fashion
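The reissue rule can be sketched as a loop over attempts. This is an illustrative model only: `run_request`, its arguments, and the 100-cycle average miss latency are assumptions chosen for the example, and time is counted in simulated cycles rather than measured.

```python
def run_request(attempt_outcomes, average_miss_latency=100):
    """Replay one miss.

    attempt_outcomes[i] is True when attempt i gathers enough tokens.
    The 100-cycle default latency is a made-up value for illustration.
    """
    timeout = 2 * average_miss_latency   # "wait twice average miss latency"
    timeouts = 0
    cycles_waited = 0
    for succeeded in attempt_outcomes:
        if succeeded:
            return {"completed": True, "timeouts": timeouts,
                    "cycles_waited": cycles_waited}
        # No completion within the timeout: abandon the attempt and reissue.
        timeouts += 1
        cycles_waited += timeout
    return {"completed": False, "timeouts": timeouts,
            "cycles_waited": cycles_waited}

no_race = run_request([True])          # common case: first attempt completes
one_race = run_request([False, True])  # a race costs one timeout, then succeeds
print(no_race)
print(one_race)
```

The requester never needs to know which race occurred or who won it; a missing completion is the only signal, which is what lets every race be handled the same way.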

Token Coherence Example (continued)

[Figure, slides 23-24: before the reissue, P0 T=15 (R), P1 T=1 (R), P2 T=0]

• Timeout!
• P0 reissues request
• P1 responds with a token (T=1)
• P0’s request completed

Final state: P0 T=16 (R/W), P1 T=0, P2 T=0

One final issue: What about starvation?
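The whole token example, including the reissue, can be replayed as three token transfers. This is a sketch with assumed names (`tokens`, `send`, `permission`); only the token counts and the order of events come from the slides.

```python
TOTAL = 16
tokens = {"P0": 0, "P1": 0, "P2": TOTAL}   # P2 starts with all tokens (R/W)

def send(src, dst, count):
    # Move tokens between processors; the total never changes.
    assert 0 < count <= tokens[src]
    tokens[src] -= count
    tokens[dst] += count
    assert sum(tokens.values()) == TOTAL

def permission(p):
    if tokens[p] == TOTAL:
        return "R/W"
    if tokens[p] >= 1:
        return "R"
    return "none"

send("P2", "P1", 1)     # P2 answers P1's read request with data and one token
send("P2", "P0", 15)    # P0's delayed write request arrives; P2 sends the rest
after_race = {p: permission(p) for p in tokens}

# P0 holds 15 of 16 tokens, so it can read but not write.
# Its request times out and is reissued.
send("P1", "P0", 1)     # P1 responds to the reissued request with its token
final = {p: permission(p) for p in tokens}

print(after_race)   # {'P0': 'R', 'P1': 'R', 'P2': 'none'}
print(final)        # {'P0': 'R/W', 'P1': 'none', 'P2': 'none'}
```

At no point do a writer and a reader coexist: the race only delays P0, it never lets P0 write early.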


Contribution #2: Guaranteeing Starvation-Freedom

• Handle pathological cases
  – Infrequently invoked
  – Can be slow, inefficient, and simple

• When normal requests fail to succeed (4x)
  – Longer timeout and issue a persistent request
  – Request persists until satisfied
  – Table at each processor
  – “Deactivate” upon completion

• Implementation
  – Arbiter at memory orders persistent requests
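One way to picture persistent requests is an arbiter that activates a single request at a time and a table at each processor that remembers the active requester. The sketch below is a simplified assumption, not the mechanism as implemented: the `Arbiter` and `Processor` classes, the FIFO ordering, and the single-block token count per processor are all hypothetical.

```python
from collections import deque

class Processor:
    def __init__(self, name, tokens=0):
        self.name = name
        self.tokens = tokens    # tokens held for the one block modeled here
        self.table = {}         # block -> requester with an active persistent request

    def forward_tokens(self, block, by_name):
        # While a persistent request is active, hand every token to its requester.
        target = self.table.get(block)
        if target is not None and target != self.name and self.tokens:
            by_name[target].tokens += self.tokens
            self.tokens = 0

class Arbiter:
    """Hypothetical arbiter at memory: one active persistent request, FIFO order."""
    def __init__(self, processors):
        self.processors = processors
        self.queue = deque()
        self.active = None

    def request(self, requester, block):
        self.queue.append((requester, block))
        self._activate_next()

    def deactivate(self, requester, block):
        if self.active == (requester, block):
            for p in self.processors:
                p.table.pop(block, None)
            self.active = None
            self._activate_next()

    def _activate_next(self):
        if self.active is None and self.queue:
            self.active = self.queue.popleft()
            requester, block = self.active
            for p in self.processors:
                p.table[block] = requester   # the request persists in every table

procs = [Processor("P0", 15), Processor("P1", 1), Processor("P2", 0)]
by_name = {p.name: p for p in procs}
arbiter = Arbiter(procs)

arbiter.request("P0", "B")      # activated immediately
arbiter.request("P2", "B")      # queued behind P0
for p in procs:
    p.forward_tokens("B", by_name)
p0_tokens_when_active = by_name["P0"].tokens

arbiter.deactivate("P0", "B")   # P0 is done; P2's request becomes active
for p in procs:
    p.forward_tokens("B", by_name)
p2_tokens_when_active = by_name["P2"].tokens
print(p0_tokens_when_active, p2_tokens_when_active)
```

Because the arbiter picks exactly one requester and everyone forwards tokens to it, that requester eventually gathers all tokens no matter how other requests race. The path can be slow since it is rarely taken.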


Evaluation Goal: Four Questions

1. Are reissued requests rare? Yes

2. Can Token Coherence outperform snooping? Yes: lower-latency unordered interconnect

3. Can Token Coherence outperform directory? Yes: direct cache-to-cache misses

4. Is broadcast overhead reasonable? Yes (for 16 processors)

Quantitative evidence for qualitative behavior


Workloads and Simulation Methods

• Workloads
  – OLTP - On-line transaction processing
  – SPECjbb - Java middleware workload
  – Apache - Static web serving workload
  – All workloads use Solaris 8 for SPARC

• Simulation methods
  – 16 processors
  – Simics full-system simulator
  – Out-of-order processor model
  – Detailed memory system model
  – Many assumptions and parameters (see paper)

Q1: Reissued Requests (percent of all L2 misses)

Outcome                              OLTP    SPECjbb   Apache
Not Reissued                         98%     98%       96%
Reissued Once                        2%      2%        3%
Reissued > 1                         0.4%    0.3%      0.7%
Persistent Requests (Reissued > 4)   0.2%    0.1%      0.3%

Yes; reissued requests are rare (these workloads, 16p)


Q2: Runtime: Snooping vs. Token Coherence
Hierarchical Switch Interconnect

Similar performance on same interconnect

“Tree” interconnect


Q2: Runtime: Snooping vs. Token Coherence Direct Interconnect

Snooping not applicable

“Torus” interconnect


Q2: Runtime: Snooping vs. Token Coherence

Yes; Token Coherence can outperform snooping (15-28% faster)

Why? Lower-latency interconnect


Q3: Runtime: Directory vs. Token Coherence

Yes; Token Coherence can outperform directories (17-54% faster with slow directory)

Why? Direct “2-hop” cache-to-cache misses


Q4: Traffic per Miss: Directory vs. Token

Yes; broadcast overheads reasonable for 16 processors (directory uses 21-25% less bandwidth)


Why? Requests are smaller than data (8B v. 64B)

[Chart legend: requests & forwards; responses]


Contribution #3: Decoupled Coherence

Cache Coherence Protocol
  – Correctness Substrate (all cases)
    • Safety (token counting)
    • Starvation Freedom (persistent requests)
  – Performance Protocol (common cases)
    • Many implementation choices


Example Opportunities of Decoupling

• Example #1: Broadcast is not required
  – Predict a destination-set [ISCA ‘03]
  – Based on past history
  – Need not be correct (rely on persistent requests)
  – Enables larger or more cost-effective systems

• Example #2: predictive push

[Figure: predictive push; processors P0, P1, P2, … Pn, with T=16 tokens]

Requires no changes to correctness substrate
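A destination-set predictor can be very simple precisely because it is allowed to be wrong. The sketch below is a toy last-supplier predictor written for illustration; it is not the design from the cited ISCA '03 work, and the class name, methods, and fallback-to-broadcast policy are all assumptions.

```python
class LastSupplierPredictor:
    """Toy predictor: guess that whoever supplied a block last time has it now."""

    def __init__(self, all_nodes):
        self.all_nodes = frozenset(all_nodes)
        self.last_supplier = {}   # block -> node that last responded with tokens

    def predict(self, block, requester):
        guess = self.last_supplier.get(block)
        if guess is None or guess == requester:
            # No useful history: fall back to broadcasting to everyone else.
            return self.all_nodes - {requester}
        return frozenset({guess})

    def train(self, block, supplier):
        # Record past history; a stale entry only costs performance.
        self.last_supplier[block] = supplier

predictor = LastSupplierPredictor(["P0", "P1", "P2", "memory"])
first = predictor.predict("B", "P0")    # no history yet: everyone but P0
predictor.train("B", "P2")
second = predictor.predict("B", "P0")   # history says P2
print(sorted(first), sorted(second))
```

If the guess misses the current token holders, the request simply fails to complete and the reissue and persistent-request mechanisms take over, so the predictor itself carries no correctness burden.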


Conclusions

• Token Coherence (broadcast version)
  – Low cache-to-cache miss latency (no indirection)
  – Avoids “virtual bus” interconnects
  – Faster and/or cheaper

• Token Coherence (in general)
  – Correctness substrate
    • Tokens for safety
    • Persistent requests for starvation freedom
  – Performance protocol for performance
  – Decouple correctness from performance

• Enables further protocol innovation


Cache-Coherence Protocols

• Goal: provide a consistent view of memory

• Permissions in each cache per block
  – One read/write -or-
  – Many readers

• Cache coherence protocols
  – Distributed & complex
  – Correctness critical
  – Performance critical

• Races: the main source of complexity
  – Requests for the same block at the same time


Evaluation Parameters

• Processors
  – SPARC ISA
  – 2 GHz, 11 pipe stages
  – 4-wide fetch/execute
  – Dynamically scheduled
  – 128 entry ROB
  – 64 entry scheduler

• Memory system
  – 64 byte cache lines
  – 128KB L1 Instruction and Data, 4-way SA, 2 ns (4 cycles)
  – 4MB L2, 4-way SA, 6 ns (12 cycles)
  – 2GB main memory, 80 ns (160 cycles)

• Interconnect
  – 15ns link latency
  – Switched tree (4 link latencies) - 240 cycles 2-hop round trip
  – 2D torus (2 link latencies on average) - 120 cycles 2-hop round trip
  – Link bandwidth: 3.2 Gbyte/second

• Coherence Protocols
  – Aggressive snooping
  – Alpha 21364-like directory
  – 72 byte data messages
  – 8 byte request messages
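The cycle counts in these parameters follow from the 2 GHz clock and the 15 ns link latency. The short check below reproduces them; the function names are illustrative, and the round trip is assumed to be two hops (request and response) with each hop crossing the stated number of links.

```python
CLOCK_GHZ = 2.0          # processor clock from the parameter list
LINK_LATENCY_NS = 15     # per-link latency from the parameter list

def ns_to_cycles(ns):
    # At 2 GHz one nanosecond is two cycles.
    return round(ns * CLOCK_GHZ)

def two_hop_round_trip_cycles(links_per_hop):
    # Assumption: a round trip is 2 hops, each crossing links_per_hop links.
    return ns_to_cycles(2 * links_per_hop * LINK_LATENCY_NS)

tree_cycles = two_hop_round_trip_cycles(4)    # switched tree: 4 link latencies
torus_cycles = two_hop_round_trip_cycles(2)   # 2D torus: 2 link latencies on average
print(tree_cycles, torus_cycles)              # 240 120
print(ns_to_cycles(2), ns_to_cycles(6), ns_to_cycles(80))   # 4 12 160
```

With these numbers, the torus halves the two-hop round trip relative to the tree, and a snooping protocol that needs a totally ordered interconnect cannot use the torus.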


More Information in Paper

• Traffic optimization
  – Transfer tokens without data
  – Add an “owner” token

• Note: no silent read-only replacements
  – Worst case: 10% interconnect traffic overhead

• Comparison to AMD’s Hammer protocol


Verifiability & Complexity

• Divide and conquer complexity
  – Formal verification is work in progress
  – Difficult to quantify, but promising
  – All races handled uniformly (reissuing)

• Local invariants
  – Safety is response-centric; independent of requests
  – Locally enforced with tokens
  – No reliance on global ordering properties

• Explicit starvation avoidance
  – Simple mechanism

• Further innovation → no correctness worries


Traditional v. Token Coherence

• Traditional protocols
  – Sensitive to request ordering
    • Interconnect or directory
  – Monolithic
    • Complicated
    • Intertwine correctness and performance

• Token Coherence
  – Track tokens (safety)
  – Persistent requests (starvation avoidance)
  – Requests are only “hints”

Separate Correctness and Performance


Conceptual Interface

[Figure: each processor (P0, P1) and each memory (M) has a controller; the controllers are split into two layers]

• Performance Protocol: exchanges “hint” requests
• Correctness Substrate: exchanges data, tokens, & persistent requests

Conceptual Interface (example messages)

• “I’m looking for block B”
• “Please send Block B to P1”
• “Here is block B & one token” (Or, no response)


Snooping v. Directories: Which is Better?

• Snooping multiprocessors
  – Uses broadcast
  – “Virtual bus” interconnect
  + Directly locate data (2 hops)

• Directory-based multiprocessors
  – Directory tracks writer or readers
  + Avoids broadcast
  + Avoids “virtual bus” interconnect
  – Indirection for cache-to-cache (3 hops)

[Figure: processors (P) and memory (M); snooping locates data in 2 hops, a directory takes 3 hops]

Examine workload and technology trends


Workload Trends

• Commercial workloads
  – Many cache-to-cache misses or sharing misses
  – Cluster small- or moderate-scale multiprocessors

• Goals:
  – Low cache-to-cache miss latency (2 hops)
  – Moderate scalability

Workload trends → snooping protocols


Technology Trends

• High-speed point-to-point links
  – No (multi-drop) busses

• Desire: unordered interconnect
  – No “virtual bus” ordering
  – Decouple interconnect & protocol

• Increasing design integration
  – “Glueless” multiprocessors
  – Improve cost & latency

Technology trends → directory protocols


Multiprocessor Taxonomy

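[Figure: taxonomy along two axes, interconnect and cache-to-cache latency]

                  Cache-to-cache latency
Interconnect      High             Low
Virtual Bus                        Snooping
Unordered         Directories      Our Goal

Technology trends point toward unordered interconnects; workload trends point toward low cache-to-cache latency.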


Overview

• Two approaches to cache-coherence

• The Problem
  – Workload trends → snooping protocols
  – Technology trends → directory protocols
  – Want a protocol that fits both trends

• A Solution
  – Unordered broadcast & direct response
  – Track tokens, reissue requests, prevent starvation

• Generalization: a new coherence framework
  – Decouple correctness from “performance” policy
  – Enables further opportunities