Volition: Scalable and Precise Sequential Consistency...
Transcript of Volition: Scalable and Precise Sequential Consistency...
Universityof Illinois http://iacoma.cs.uiuc.edu/
Volition: Scalable and Precise Sequential Consistency Violation
Detection
Xuehai Qian, Benjamin SahelicesJosep Torrellas, Depei Qian
University of Illinois
Xuehai Qian Volition: Scalable and Precise SC Detection
Sequential Consistency (SC)
2
• In SC, memory accesses:
• Appear atomic
• Have a total global order
• For each thread, follow program order
P0
A0: x=1A1: y=1
P1
B0: p=yB1: t=x
Total Order
A0A1B0B1
A0B0A1B1
A0B0B1A1
B0B1A0A1
B0A0B1A1
B0A0A1B1
Xuehai Qian Volition: Scalable and Precise SC Detection
Sequential Consistency Violation(SCV)
3
• SCV: accesses reorder that does not conform to SC
• Machines support relaxed memory models, not SC
• Machines may induce SC violations (SCV)
P0
A0: x=1A1: y=1
P1
B0: p=yB1: t=x
Total Order
A0A1B0B1
A0B0A1B1
A0B0B1A1
B0B1A0A1
B0A0B1A1
B0A0A1B1
Initially: x=y=0
A1B0B1A0(p=1)
(t=0) (1,0)(1,1) (0,0) (0,1)
SCV can cause very unintuitive
results (bugs)
Xuehai Qian Volition: Scalable and Precise SC Detection
Why Detecting SCVs is Important?
4
• Programmers assume SC
• SCV is almost always a bug: unexpected interleaving
• Single-stepping debuggers cannot reproduce the bug
• Causes portability problems (e.g Intel TBB)
• Code may not work across machines
• Lock-free data structures sometimes explicitly use races but rely on SC
• Traditional data race detectors won’t work
Xuehai Qian Volition: Scalable and Precise SC Detection
Contribution
5
• Volition: first hardware scheme that detects SCVs in a relaxed-consistency machine, such that
• It is precise
• It is scalable
• Works for an arbitrary number of participating processors
• Leverages coherence protocol transactions to detect cycles in memory access orders across threads
• Incurs negligible overhead and is decoupled from the cache coherence protocol
Xuehai Qian Volition: Scalable and Precise SC Detection
Outline
6
•Motivation
•Volition Mechanisms
•Evaluation
Xuehai Qian Volition: Scalable and Precise SC Detection
Understanding SCV
7
• Two or more data races overlap and form a cycle
• Problem: detecting such cycles without affecting the execution timing
• Design Goals:
• Precise
• Scalable
• Low overhead
P0 P1 P0 P1 P2
wr y
rd x
wr x
rd y
Not completed
Completed
Xuehai Qian Volition: Scalable and Precise SC Detection
Definition: Active Accesses
8
P0 P1
wr x
rd y
wr y
Not completed
Completed
• Active Accesses are the accesses that may participate in SCV
• Active Access:
• Either itself or older local access is not completed OR
• It is the destination of a dependence from an active access
• Volition provides metadata hardware:
• For all accesses: Sequence Number (SN)
• For active accesses: current state (address, completion status, ..)
rd x
Xuehai Qian Volition: Scalable and Precise SC Detection
Definition: Active Data Race (AR)
9
• Data race where the source (and therefore destination) are active
• SCVs are composed of two or more ARs forming a cycle
• Volition detects ARs and cycles in hardware dynamically
P0 P1 P0 P1 P2
wr y
rd x
wr x
rd y
Xuehai Qian Volition: Scalable and Precise SC Detection
How Does Volition Detect an SCV?
10
• All active accesses in a processor: recorded in an ACtive Table (ACT)
• If active access is the source of a dependence
• Record AR in AR Source Table (ARST)
• Respond with a bit in the coherence response
• Destination:
• Record the AR in the AR Destination Table (ARDT)
• If source becomes inactive
• Remove local ARST
• Send Expire msg to destination, remove ARDT
• SCV: when a processor finds two ARs forming a cycle (i.e. SNsrc > SNdst)
P0 P1
wr y
rd x
wr x
rd y
ACT ACT94100
94
100
SN
90
98
SN
9098
ARST100➙90
ARST98➙9498➙94
ARDT ARDT100➙90
Xuehai Qian Volition: Scalable and Precise SC Detection
Active Table (ACT)
11
• FIFO table of all active accesses
• When to insert an ACT entry?
• Access is issued (program order)
• When to delete an ACT entry?
• Access is at the head of ACT and completed, AND
• Access is not the destination of an AR
• HW keeps deleting entries until the conditions don’t hold
• After deleting an entry: if it is the source of an AR
• Remove the related entry in ARST
• Send Expire msg
• Remove ARDT entry when expire is received
P0 P1
wr y
rd x
wr x
ACT ACT
94100
94
100
SN
90
SN
90yx
SN Addr PC ... SN Addr PC ...
ARST100➙90
ARDT100➙90
x
Xuehai Qian Volition: Scalable and Precise SC Detection
Finding Cycles with any # of Processors
12
Pi Pj Pk
SNi
SNjd
SNjs
SNkAR
ARARtrans
• AR Propagation: Volition creates a transitive dependence out of two dependences
• ARtrans: from the src of one AR to the dst of the other AR
• Detected by the processor in the middle
• Only if program order connects these two ARs
P0 P1 P2
Xuehai Qian Volition: Scalable and Precise SC Detection
SCV Detection with Arbitrary # of Processors
13
Pi Pj Pk
1
2
3
4
5
6
• In each processor: Volition HW keeps generating transitive dependences
• When a processor detects a local dependence from younger to older access ➙ SCV
Pi Pj Pk
1
2
3
4
5
6
Pi Pj Pk
1
2
3
4
5
6
AR Propagation AR detected or propagated so farSCV!!
Xuehai Qian Volition: Scalable and Precise SC Detection
Multi-word Cache Line Issue
14
P0 P1
a b
Wc
Rb
Wa
Wa does not access same word as Rb ➙ HW
does not record ARRc
Wb
Wb does not cause a coherence transaction ➙HW does not record AR
Missed AR and cycle
Xuehai Qian Volition: Scalable and Precise SC Detection
Support for Multi-word Cache Line
15
• For lines that are being actively shared
• Augment line with Summary of Active Information (SAI)
• Contains current information on system-wide active accesses to any of the words
• Indicates: when future access, do we need to record an AR?
a bSAIa
R3,R6W1
SAIb
• When a word is accessed and AR-recording is necessary:
• If a cache coherence transaction occurs ➙ reuse the transaction to record the AR
• Else ➙ create a “metadata access” transaction to record the AR
• SAI info cleared when an access becomes inactive
Xuehai Qian Volition: Scalable and Precise SC Detection
Summary of Evaluation
16
• Volition is able to detect SCVs in codes in which we removed fences
• Support for multi-word cache line metadata is needed to avoid missing SCVs
• Using data races as proxies for SCVs is insufficient
Xuehai Qian Volition: Scalable and Precise SC Detection
Conclusion
17
• Volition is the first scalable and precise SCV detector
• Works with directory protocol
• Can detect SCVs involving an arbitrary number of processors
• Incurs negligible execution time and traffic overhead
• Needs modest-sized hardware structures
Universityof Illinois http://iacoma.cs.uiuc.edu/
Volition: Scalable and Precise Sequential Consistency Violation
Detection
Xuehai Qian, Benjamin SahelicesJosep Torrellas, Depei Qian
University of Illinois