Cache Coherence Protocols Are Hard Easy · Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin

Page 1:

Cache Coherence Protocols Are Hard Easy

Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin

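[Figure: a state diagram with the stable states S, I, M is turned by ProtoGen into a state diagram with the states I, S, M, IMAD, IMA, IMADI, IMADS, IMAI, IMAS, IMADSI, IMASI.]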

Page 2:

Directory Cache Coherence

[Figure: three snapshots of two cores, each with a cache, connected through an interconnect to the directory and memory. Snapshot 1: both caches hold X = 0. Snapshot 2: one cache holds X = 1, the other X = 0. Snapshot 3: one cache holds X = 1, the other cache holds no value for X.]

Page 3:

“Cache coherence protocols are notoriously difficult to implement” [DSL 1997]

“Sophisticated cache coherence protocols are notoriously difficult to get right” [ICS 1999]

“Cache coherence protocols for distributed shared memory multiprocessors are notoriously difficult to design” [ICFS 1996]

“Cache coherence protocols are notoriously difficult to design and verify” [High Perf. Memory Systems, 2004]

“… directory-based cache coherence protocols are notoriously complex” [PACT 2011]

“The coherence problem is difficult, because it requires coordinating events across nodes” [IEEE Concurrency 2000]

“Coherence protocols are notoriously difficult to design and implement correctly” [ASPLOS 2017]

“… designing and verifying a new hardware coherence protocol is difficult” [Spandex: A Flexible Interface for Efficient Heterogeneous Coherence - ISCA 2018]

Page 4:

From AnandTech: “… coherency was broken and manually disabled on the Galaxy S 4. The implications are serious from a power consumption (and performance) standpoint.”

Bugs in the Wild

Page 5:

[Figure: two state diagrams. One has only the stable states S, I, M and is labeled "physical atomicity". The other has the states I, S, M, IMAD, IMA, IMADI, IMADS, IMAI, IMAS, IMADSI, IMASI and is labeled "logical atomicity". Transition labels: store / send GetM to Dir; recv Data; recv Data + Ack; recv Acks; recv Fwd-GetM; recv Fwd-GetS; recv Inv.]

Page 6:

Atomic S to M Transition

S -> M: store / send GetM / recv Ack

Page 7:

Transient States

S -> SMAD: store / send GetM

SMAD -> M: recv Ack

non-atomic transaction
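
The split of the S to M transition into two steps can be sketched as a small transition table. This is an illustrative sketch only, not ProtoGen output: the table layout and the `run` helper are assumptions, while the state and message names (S, SMAD, M, GetM, Ack) come from the slide.

```python
# Non-atomic S -> M transaction from the slide, as a transition table.
# Key: (current state, event). Value: (next state, message to send or None).
# The table layout and helper below are assumptions made for illustration.
NON_ATOMIC = {
    ("S", "store"): ("SMAD", "GetM"),  # request sent, permission not yet granted
    ("SMAD", "Ack"): ("M", None),      # response arrives, transaction completes
}


def run(table, state, events):
    """Apply events in order; return the visited states and the messages sent."""
    visited, sent = [state], []
    for event in events:
        state, message = table[(state, event)]
        visited.append(state)
        if message is not None:
            sent.append(message)
    return visited, sent


visited, sent = run(NON_ATOMIC, "S", ["store", "Ack"])
print(visited, sent)
```

Between the two events the cache sits in SMAD, which is the window in which messages from other transactions can arrive.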

Page 8:

Concurrent Transactions

[Figure: S -> SMAD -> M (store / send GetM, then recv Ack), with incoming messages recv Inv and recv Fwd-GetS.]

Page 9:

Concurrent Transactions

[Figure: S -> SMAD -> M (store / send GetM, then recv Ack), with the incoming messages recv Inv and recv Fwd-GetS shown twice.]

non-atomic transactions + concurrency = complexity

Page 10:

To Summarize…

§ Stable state protocols assume physically atomic transactions

§ Need to support concurrency for performance

§ Transient states required to provide logically atomic transactions

Page 11:

Thus…

§ Stable state protocol is a sequential specification

§ The final protocol is a non-blocking concurrent implementation

§ Transient states are synchronization operations

Page 12:

Insight

[Figure: the state diagram with the stable states S, I, M is labeled "Sequential object {………}". The state diagram with the states I, S, M, IMAD, IMA, IMADI, IMADS, IMAI, IMAS, IMADSI, IMASI is labeled "Non-blocking concurrent {………}".]

No wonder cache coherence protocols are Hard!

Page 13:

Insight

concurrent Method-1 { … RMW(…); // linearization point … }

[Figure: a timeline with overlapping calls to Method-1() and Method-2(), next to two cores, each with a cache holding X = 0, connected through an interconnect to the directory and memory.]

Directory is the Linearization point!
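
A toy model of this idea, assuming a directory that handles one request at a time: concurrent requests race, but the order in which the directory processes them becomes the single order everyone must agree on. The `Directory` class, `get_m` method, and `race` helper are hypothetical names for this sketch and are not part of ProtoGen.

```python
import threading


class Directory:
    """Toy directory (hypothetical): each request takes effect under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.order = []    # serialization order of the requests
        self.owner = None  # cache that currently holds write permission

    def get_m(self, cache_id):
        with self._lock:  # linearization point of this request
            previous = self.owner
            self.order.append(cache_id)
            self.owner = cache_id
            return previous  # cache that has to give up the block, if any


def race(num_caches=4):
    """Let several caches request write permission concurrently."""
    directory = Directory()
    results = {}

    def worker(cache_id):
        results[cache_id] = directory.get_m(cache_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_caches)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return directory, results
```

Whatever interleaving the threads take, `directory.order` is a total order, and each requester learns exactly which cache preceded it.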

Page 14:

Demystifying Transient States

How do transient states provide logical atomicity?

§ Convey directory serialization order to caches

§ Transient states ensure that caches obey this order

ProtoGen automates by leveraging this insight!

Page 15:

How does cache infer serialization order?

[Figure: S -> SMAD -> M (store / send GetM, then recv Ack), with the incoming messages recv Inv and recv Fwd-GetS shown twice.]
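
One way to read the question, sketched under assumptions: a cache in SMAD compares an incoming message with the messages its start state (S) and its final state (M) react to. A message that only the start state handles suggests the remote transaction was serialized first; a message that only the final state handles suggests it was serialized after the cache's own transaction. The `HANDLED_IN` table is an assumption taken from a reading of the figure, and `infer_order` is a hypothetical helper, not the tool's API.

```python
# Forwarded messages each stable state reacts to (assumed from the figure).
HANDLED_IN = {
    "S": {"Inv"},
    "M": {"Fwd-GetS"},
}


def infer_order(start, final, message):
    """Classify a message received while moving from `start` to `final`."""
    in_start = message in HANDLED_IN[start]
    in_final = message in HANDLED_IN[final]
    if in_start and not in_final:
        return "remote transaction serialized before own"
    if in_final and not in_start:
        return "remote transaction serialized after own"
    if in_start and in_final:
        return "ambiguous"  # the same message name is valid in both states
    return "unexpected"


print(infer_order("S", "M", "Inv"))
print(infer_order("S", "M", "Fwd-GetS"))
```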

Page 16:

How to resolve name conflicts?

[Figure: O -> OMAC -> M (store / send GetM, then recv Data + Ack), with the incoming message recv Fwd-GetM shown four times.]

Page 17:

Rename Messages

[Figure: O -> OMAC -> M (store / send GetM, then recv Data + Ack), with the incoming messages recv Fwd-GetM-O and recv Fwd-GetM-M each shown twice.]
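
The conflict and its fix can be sketched as follows. The message names Fwd-GetM-O and Fwd-GetM-M are taken from the slide; the `HANDLED_IN` table, the helper names, and the rule of suffixing every message with the receiving state are simplifying assumptions for illustration.

```python
# Forwarded messages each stable state reacts to (assumed from the figure).
HANDLED_IN = {
    "O": {"Fwd-GetM"},
    "M": {"Fwd-GetM"},
}


def conflicts(start, final, handled):
    """Messages a cache moving from `start` to `final` cannot tell apart."""
    return handled[start] & handled[final]


def rename(handled):
    """Suffix each message with the stable state it is meant for."""
    return {
        state: {f"{message}-{state}" for message in messages}
        for state, messages in handled.items()
    }


print(conflicts("O", "M", HANDLED_IN))  # the name conflict
renamed = rename(HANDLED_IN)
print(conflicts("O", "M", renamed))     # no shared names after renaming
```

After renaming, the message name alone tells a cache in OMAC whether the sender still regarded it as being in O or already in M.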

Page 18:

ProtoGen Summary

§ Infer serialization order from incoming messages

§ Rename messages in order to achieve this

§ React like in stable state
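
A minimal sketch of "react like in stable state", for the case where the remote transaction was serialized before the cache's own. It assumes the stable state S answers Inv with Inv-Ack and moves to I, so a cache in SMAD sends the same response and continues its own transaction from I, ending up in IMAD. The `STABLE` table, the function name, and the way the transient state name is assembled are assumptions for illustration, not the tool's algorithm.

```python
# Stable-state reactions: (state, message) -> (next stable state, response).
# Assumed for this sketch: S answers Inv with Inv-Ack and moves to I.
STABLE = {
    ("S", "Inv"): ("I", "Inv-Ack"),
}


def react_before_own(start, final, message, pending="AD"):
    """Reaction of the transient state start->final to a message that was
    serialized before the cache's own transaction."""
    new_start, response = STABLE[(start, message)]
    # Respond as the start state would, then keep waiting for the own
    # transaction, which now begins from the new stable state.
    return f"{new_start}{final}{pending}", response


print(react_before_own("S", "M", "Inv"))
```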

Page 19:

ProtoGen Tool

[Figure: tool flow with the labels "ProtoGen DSL", "ProtoGen IR for protocols", "ProtoGen", and "Murϕ (DSL)". A state diagram with the stable states S, I, M appears on one side and a state diagram with the states I, S, M, IMAD, IMA, IMADI, IMADS, IMAI, IMAS, IMADSI, IMASI on the other.]

Page 20:

ProtoGen Verification

Verified Protocols

MSI ✓ MESI ✓ MOSI ✓ MOESI ✓ TSO-CC ✓

Page 21:

How good are ProtoGen protocols?

§ Protocol specifications from Primer

§ Stalling protocols: Almost identical

§ Non-stalling MSI protocol: 5 fewer stalls

ProtoGen protocols are as good as (or better than) manually generated protocols

Page 22:

ProtoGen is work in progress…

• Only directory protocols (we believe snooping is possible)

• Needs a correct SSP (working on autocorrecting SSP protocols)

• Only flat protocols (working on hierarchical)

• Needs virtual channel assignment (working on automating it)

Page 23:

ProtoGen makes coherence protocols easy!

§ https://github.com/icsa-caps/ProtoGen

N. Oswald, V. Nagarajan and D.J. Sorin. ProtoGen: Automatically Generating Directory Cache Coherence Protocols from Atomic Specifications. ISCA 2018.

Page 25:

Own Transaction after Remote Transaction

[Figure: state diagram with the states I, S, M, IMAD, IMA, SMAD, SMA. Transition labels: store / send Upgrade; store / send GetM; recv Data; recv Data, recv Last(Inv-Ack); recv Last(Inv-Ack); recv Inv / send Inv-Ack; recv Ack; recv DataNoAck; recv Inv-Ack.]

Page 26:

Own Transaction before Remote Transaction

[Figure: state diagram with the states S, M, SMAD, SMADI, SMA, SMAI, SMAII. Transition labels: store / send Upgrade; recv Data, recv Last(Inv-Ack); recv Ack; recv Fwd-GetM / send DataNoAck; recv Fwd-GetM; recv Ack / send DataNoAck; recv Data; recv Last(Inv-Ack) / send DataNoAck; recv Inv-Ack.]

Page 27:

[Figure: state diagram with the states S, SMAD, SMA, M. Transition labels: store / send GetM; recv Ack; recv Data, recv Last(Inv-Ack); recv Inv-Ack. Incoming messages shown: recv Inv; recv Fwd-GetM; recv Fwd-GetS.]

Page 28:

Bluespec

§ Idea: Guarded atomic actions

§ Atomic updates of multiple participants

§ Dave et al. implemented non-blocking coherence protocol

§ Input was not SSP protocol, but complete non-blocking MSI protocol description

Page 29:

Teapot

§ Language Support for Writing Memory Coherence Protocols

§ Similar to ProtoGen DSL

§ Does not automatically generate transient states or transitions

§ Input is not SSP protocol, but complete non-blocking MSI protocol description

Page 30:

Atomic Coherence

§ Atomic Coherence: Leveraging Nanophotonics to Build Race-Free Cache Coherence Protocols

§ Atomic transactions

§ Mutex based approach

§ Performance achieved by leveraging optical interconnects