Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics...
Transcript of Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics...
![Page 1: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/1.jpg)
Chasing Away RAts: Semantics and Evaluation
for Relaxed Atomics on Heterogeneous Systems
Matthew D. Sinclair, Johnathan Alsop, Sarita V. Adve
University of Illinois @ Urbana-Champaign
Matt’s Future: AMD Research, University of Wisconsin
![Page 2: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/2.jpg)
“Everyone (thinks they) can cook” use relaxed atomics (RAts)
Incorrect usage No formal definition
Out-of-thin-air values Hard to debug
Not portable
Health code violations: Correctness
![Page 3: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/3.jpg)
Consistency is Complex
“If you think you understand quantum computers, it’s
because you don’t. Quantum computing is actually
harder than memory consistency models.”
- Luis Ceze, video in ISCA ‘16 Keynote
Memory consistency: gold standard for complexity
3
Relaxed atomics add even more complexity
![Page 4: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/4.jpg)
No Formal Specification for Relaxed Atomics
C++17 "specification" for relaxed atomics • Races that don't order other accesses
• Implementations should ensure no “out-of-thin-air”
values are computed that circularly depend on their own
computation “C++ (relaxed) atomics were the worst idea ever. I just
spent days (and days) trying to get something to work.
… My example only has 2 addresses and 4 accesses, it
shouldn’t be this hard. Can you help?”
- Email from employee at major research lab
4
Formal specification for relaxed atomics is a longstanding problem
![Page 5: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/5.jpg)
• But generally use simple, SW-based coherence
– Cost of staying away from relaxed atomics too high!
5
Why Use Relaxed Atomics?
0X
10X
20X
Spe
ed
up
27X 99X 28X
![Page 6: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/6.jpg)
• Previous work
– Goal: formal semantics for all possible relaxed atomics uses
– Unsuccessful despite ~15 years of effort
• Insight: analyze how real codes use relaxed atomics
– What are common uses of relaxed atomics?
– Why do they work?
– Can we formalize semantics for them?
6
Our Approach
![Page 7: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/7.jpg)
Contributions
7
Everyone can safely use RAts
• Identified common uses of relaxed atomics
– Work queues, event counters, ref counters, seqlocks, …
• Data-race-free-relaxed (DRFrlx) memory model:
– Sequentially consistent (SC) centric semantics + efficiency
• Evaluated benefits of using relaxed atomics
– Up to 53% less cycles (33% avg), 40% less energy (20% avg)
![Page 8: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/8.jpg)
Outline
• Motivation
• Background
• Data-race-free-relaxed
• Results
• Conclusion
8
![Page 9: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/9.jpg)
Atomics Background
• Default: Data-race-free-0 (DRF0) [ISCA ‘90]
– Identify all races as synchronization accesses (C++: atomics)
– All atomics order data accesses
– Atomics order other atomics
Ensures SC semantics if no data races
9
// each thread for i = 0:n … ADD R4, A[i], R1 ADD R5, B[i], R1 …
synch (atomic)
synch (atomic)
![Page 10: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/10.jpg)
Atomics Background (Cont.)
• Default: Data-race-free-0 (DRF0) [ISCA ‘90]
– All atomics order data accesses
– Atomics order other atomics
Ensures SC semantics if no data races
• Data-race-free-1 (DRF1): unpaired atomics [TPDS ‘93]
+ Unpaired atomics do not order data accesses
– Atomics order other atomics
Ensures SC semantics if no data races
• Relaxed atomics [PLDI ‘08]
+ Do not order data or other atomics
But can violate SC and no formal specification 10
![Page 11: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/11.jpg)
Outline
• Motivation
• Background
• Data-race-free-relaxed
• Results
• Conclusion
11
![Page 12: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/12.jpg)
Identifying Relaxed Atomic Use Cases
• Our Approach
– What are common uses of relaxed atomics?
– Why do they work?
– Can we formalize semantics for them?
• Contacted vendors, developers, and researchers
12 How do relaxed atomics work in Event Counters?
![Page 13: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/13.jpg)
Accel
• Threads concurrently update counters
– Read part of a data array, updates its counter
13
Event Counter
L1
Cache
L1
Cache
L1
Cache
L1
Cache
L2 Cache
Counters … 0 0 0 0 0 0
…
1 1 1 1 1 1
![Page 14: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/14.jpg)
Accel
• Threads concurrently update counters
– Read part of a data array, updates its counter
– Increments race, so have to use atomics
14
Event Counter (Cont.)
L1
Cache
L1
Cache
L1
Cache
L1
Cache
L2 Cache
Counters … 2 1 3 2 1 1
…
1 1 1 1 1 1
![Page 15: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/15.jpg)
Accel
• Threads concurrently update counters
– Read part of a data array, updates its counter
– Increments race, so have to use atomics
15
Event Counter (Cont.)
L1
Cache
L1
Cache
L1
Cache
L1
Cache
L2 Cache
Counters …
…
7 1 9 1 5 3
Commutative increments: order does not affect final result
How to formalize?
![Page 16: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/16.jpg)
Incorporating Commutativity Into DRFrlx
16
What about the other use cases?
• New relaxed atomic category: commutative
• Formalism:
– Accesses are commutative
– Intermediate values must not be observed
Final result is always SC
![Page 17: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/17.jpg)
Incorporating Other Use Cases Into DRFrlx
17
SC
Final result always SC
SC-centric: non-SC parts isolated
Unpaired
Non-Ordering
Commutative
Speculative
Quantum
Semantics Category Use Case
Work Queues
Flags
Seqlocks
Event Counters
Ref Counters
Split Counters
![Page 18: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/18.jpg)
Outline
• Motivation
• Background
• Data-race-free-relaxed
• Results
• Conclusion
18
![Page 19: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/19.jpg)
Evaluation Methodology
19
• 1 CPU core + 15 GPU compute units (CU)
– Each node has private L1, scratchpad, tile of shared L2
• Simulation Environment
– GEMS, Simics, Garnet, GPGPU-Sim, GPUWattch, McPAT
• Study DRF0, DRF1, DRFrlx w/ GPU & DeNovo coherence
• Workloads
– Microbenchmarks for each use case
• Relaxed atomics help a little (Avg: 10% cycles, 5% energy)
– Benchmarks with biggest RAts speedups on discrete GPU
• UTS, PageRank (PR), Betweeness Centrality (BC)
![Page 20: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/20.jpg)
Relaxed Atomics Applications – Execution Time
20
0%
20%
40%
60%
80%
100%
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
G0 = GPU coherence + DRF0
G1 = GPU coherence + DRF1
GRlx = GPU coherence + DRFrlx
D0 = DeNovo coherence + DRF0
D1 = DeNovo coherence + DRF1
DRlx = DeNovo coherence + DRFrlx
![Page 21: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/21.jpg)
Relaxed Atomics Applications – Execution Time
21
0%
20%
40%
60%
80%
100%
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
Relaxed atomics reduce cycles up to ~50%
DeNovo increases reuse over GPU: 10% avg. for DRFrlx
104
G0 G1 D0 D1 DRlx GRlx
PR-2 PR-3 PR-4 PR-1 UTS BC-1 BC-2 BC-3 BC-4 AVG
![Page 22: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/22.jpg)
Relaxed Atomics Applications – Energy
22
0%
20%
40%
60%
80%
100%
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
GD
0G
D1
GD
2D
D0
DD
1D
D2
N/W L2 $ L1 D$ Scratch GPU Core+
Energy similar to execution time trends
DeNovo’s reuse reduces energy over GPU: 29% avg. for DRFrlx
104
PR-2 PR-3 PR-4 PR-1 UTS BC-1 BC-2 BC-3 BC-4 AVG
![Page 23: Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics …rsim.cs.uiuc.edu/Talks/17-isca-sinclair-rats.pdf · 2018-02-19 · Consistency is Complex “If you think you understand](https://reader034.fdocuments.in/reader034/viewer/2022050113/5f4aae20e35a1a131942d585/html5/thumbnails/23.jpg)
Conclusion
23
DRFrlx: SC-centric semantics + efficiency
Everyone can safely use RAts
• Cost of avoiding relaxed atomics too high
• Difficult to use correctly: no formal specification
• Insight: Analyze how real codes use relaxed atomics