A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2...
-
Upload
jodie-strickland -
Category
Documents
-
view
215 -
download
1
Transcript of A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2...
![Page 1: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/1.jpg)
A Shape Analysis for Optimizing Parallel Graph Programs
Dimitrios Prountzos1
Keshav Pingali1,2
Roman Manevich2
Kathryn S. McKinley1
1: Department of Computer Science, The University of Texas at Austin2: Institute for Computational Engineering and Sciences, The
University of Texas at Austin
![Page 2: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/2.jpg)
2
Motivation
• Graph algorithms are ubiquitous
• Goal: Compiler analysis for optimization of parallel graph algorithms
Computational biology Social Networks
Computer Graphics
![Page 3: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/3.jpg)
3
Organization• Parallelization of graph algorithms in Galois system
– Speculative execution– Example: Boruvka MST algorithm
• Optimization opportunities– Reduce speculation overheads– Analysis problem: LockSet shape analysis
• Lockset shape analysis – Abstract Data Type (ADT) modeling– Hierarchy summarization abstraction– Predicate discovery
• Evaluation– Fast and infers all available optimizations– Optimizations give speedup up to 12x
![Page 4: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/4.jpg)
4
Boruvka’s Minimum Spanning Tree Algorithm
Build MST bottom-uprepeat { pick arbitrary node ‘a’ merge with lightest neighbor ‘lt’ add edge ‘a-lt’ to MST} until graph is a single node
c d
a b
e f
g
2 4
6
5
3
7
4
1d
a,c b
e f
g
4
6
3
4
17
lt
![Page 5: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/5.jpg)
5
• Algorithm = repeated application of operator to graph
– Active node: • Node where computation is needed
– Activity: • Application of operator to active node
– Neighborhood:• Sub-graph read/written to perform activity
– Unordered algorithms: • Active nodes can be processed in any order
• Amorphous data-parallelism– Parallel execution of activities, subject to neighborhood constraints
• Neighborhoods are functions of runtime values – Parallelism cannot be uncovered at compile time in general
Parallelism in Boruvka
i1
i2
i3
![Page 6: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/6.jpg)
6
Optimistic Parallelization in Galois• Programming model
– Client code has sequential semantics– Library of concurrent data structures
• Parallel execution model– Thread-level speculation (TLS)– Activities executed speculatively
• Conflict detection– Each node/edge has associated exclusive
lock– Graph operations acquire locks on
read/written nodes/edges– Lock owned by another thread conflict
iteration rolled back– All locks released at the end
• Two main overheads– Locking– Undo actions
i1
i2
i3
![Page 7: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/7.jpg)
7
Overheads (I): Locking• Optimizations– Redundant locking elimination– Lock removal for iteration private data– Lock removal for lock domination
• ACQ(P): set of definitely acquired locks per program point P• Given method call M at P:
Locks(M) ACQ(P) Redundant Locking
![Page 8: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/8.jpg)
8
Overheads (II): Undo actions
Lockset Grows
Lockset Stable
Failsafe
…
foreach (Node a : wl) {
…
…
}
foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}
Program point P is failsafe if: Q : Reaches(P,Q) Locks(Q) ACQ(P)
![Page 9: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/9.jpg)
9
GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}
Lockset Analysis
• Redundant Locking• Locks(M) ACQ(P)
• Undo elimination• Q : Reaches(P,Q)
Locks(Q) ACQ(P)
• Need to compute ACQ(P)
: Runtime overhead
![Page 10: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/10.jpg)
10
Analysis Challenges• The usual suspects: – Unbounded Memory Undecidability – Aliasing, Destructive updates
• Specific challenges:– Complex ADTs: unstructured graphs– Heap objects are locked– Adapt abstraction to ADTs
• We use Abstract Interpretation [CC’77]– Balance precision and realistic performance
![Page 11: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/11.jpg)
11
Organization• Parallelization of graph algorithms in Galois system
– Speculative execution– Example: Boruvka MST algorithm
• Optimization opportunities– Reduce speculation overheads– Analysis problem: LockSet shape analysis
• Lockset shape analysis – Abstract Data Type (ADT) modeling– Hierarchy summarization abstraction– Predicate discovery
• Evaluation– Fast and infers all available optimizations– Optimizations give speedup up to 12x
![Page 12: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/12.jpg)
Shape Analysis Overview
12
HashMap-Graph
Tree-based Set
……
Graph { @rep nodes @rep edges …}
Graph Spec
Concrete ADTImplementationsin Galois library
Predicate Discovery
Shape Analysis
Boruvka.javaOptimizedBoruvka.java
Set { @rep cont …}
Set Spec
ADT Specifications
![Page 13: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/13.jpg)
13
ADT Specification
Graph<ND,ED> {
@rep set<Node> nodes @rep set<Edge> edges
Set<Node> neighbors(Node n);
}
Graph Spec
...Set<Node> S1 = g.neighbors(n);
...
Boruvka.java
Abstract ADT state by virtual set fields
@locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)@op( nghbrs = n.rev(src).dst + n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) )
Assumption: Implementation satisfies Spec
![Page 14: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/14.jpg)
14
Graph<ND,ED> {
@rep set<Node> nodes@rep set<Edge> edges
@locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)@op( nghbrs = n.rev(src).dst + n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) ) Set<Node> neighbors(Node n);}
Modeling ADTs
c
a bGraph Spec
dst
src
srcdst
dst
src
![Page 15: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/15.jpg)
15
Modeling ADTs
c
a b
nodes edges
Abstract State
cont
ret nghbrs
Graph Spec
dst
src
srcdst
dst
src
Graph<ND,ED> {
@rep set<Node> nodes@rep set<Edge> edges
@locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)@op( nghbrs = n.rev(src).dst + n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) ) Set<Node> neighbors(Node n);}
![Page 16: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/16.jpg)
16
Organization• Parallelization of graph algorithms in Galois system
– Speculative execution– Example: Boruvka MST algorithm
• Optimization opportunities– Reduce speculation overheads– Analysis problem: LockSet shape analysis
• Lockset shape analysis – Abstract Data Type (ADT) modeling– Hierarchy summarization abstraction– Predicate discovery
• Evaluation– Fast and infers all available optimizations– Optimizations give speedup up to 12x
![Page 17: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/17.jpg)
17
cont cont
S1 S2L(S1.cont) L(S2.cont)
Abstraction Scheme
(S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont)
• Parameterized by set of LockPaths: L(Path) o . o ∊ Path Locked(o)– Tracks subset of must-be-locked objects
• Abstract domain elements have the form: Aliasing-configs 2LockPaths …
![Page 18: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/18.jpg)
18
( L(y.nd) ) ( () L(x.nd) )
( L(y.nd) ) ( L(y.nd) L(x.rev(src)) ) ( () L(x.nd) )
Joining Abstract States
( L(y.nd) ) ( () L(x.nd) )
Aliasing is crucial for precisionMay-be-locked does not enable our optimizations
#Aliasing-configs : small constant (6)
![Page 19: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/19.jpg)
19
lt
GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}
Example Invariant in Boruvka
The immediate neighbors of a and lt are locked
a
( a ≠ lt ) ∧ L(a) L(a.rev(src)) L(a.rev(dst))∧ ∧ ∧ L(a.rev(src).dst) L(a.rev(dst).src) ∧ ∧ L(lt) L(lt.rev(dst)) L(lt.rev(src)) ∧ ∧ ∧ L(lt.rev(dst).src) L(lt.rev(src).dst)∧
…..
![Page 20: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/20.jpg)
20
Heuristics for Finding Paths
• Hierarchy Summarization (HS)– x.( fld )*– Type hierarchy graph acyclic
bounded number of paths
– Preflow-Push: • L(S.cont) L(S.cont.nd)∧• Nodes in set S and their data are locked
Set<Node>
S
Node
NodeData
cont
nd
![Page 21: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/21.jpg)
21
Footprint Graph Heuristic• Footprint Graphs (FG)[Calcagno et al. SAS’07]– All acyclic paths from arguments of ADT method to
locked objects– x.( fld | rev(fld) )* – Delaunay Refinement: L(S.cont) L(S.cont.rev(src)) L(S.cont.rev(dst)) ∧ ∧ ∧ L(S.cont.rev(src).dst) L(S.cont.rev(dst).src)∧– Nodes in set S and all of their immediate
neighbors are locked
• Composition of HS, FG– Preflow-Push: L(a.rev(src).ed)
FG HS
![Page 22: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/22.jpg)
22
Organization• Parallelization of graph algorithms in Galois system
– Speculative execution– Example: Boruvka MST algorithm
• Optimization opportunities– Reduce speculation overheads– Analysis problem: LockSet shape analysis
• Shape analysis – Abstract Data Type modeling– Hierarchy summarization abstraction– Predicate discovery
• Evaluation– Fast and infers all available optimizations– Optimizations give speedup up to 12x
![Page 23: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/23.jpg)
23
Experimental Evaluation• Implement on top of TVLA– Encode abstraction by 3-Valued Shape Analysis
[SRW TOPLAS’02]• Evaluation on 4 Lonestar Java benchmarks
• Inferred all available optimizations• # abstract states practically linear in program size
Benchmark Analysis Time (sec)
Boruvka MST 6
Preflow-Push Maxflow 7
Survey Propagation 12
Delaunay Mesh Refinement 16
![Page 24: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/24.jpg)
24
Impact of Optimizations for 8 Threads
Boruvka MST Delaunay Mesh Refinement
Survey Propaga-tion
Preflow-Push Maxflow
0
50
100
150
200
250
Baseline Optimized
Tim
e (s
ec)
5.6× 4.7×11.4×
2.9× 8-core Intel Xeon @ 3.00 GHz
![Page 25: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/25.jpg)
25
Related Work• Safe programmable speculative parallelism [Prabhu et al. PLDI’10]
– Focused on value speculation on ordered algorithms– Different rollback freedom condition
• Transactional Memory compiler optimizations [Harris et al. PLDI’06, Dragojevic et al. SPAA’09]– Similar optimizations– Don’t target rollback freedom– Imprecise for unbounded data-structures
• Optimizations for parallel graph programs [Mendez-Lojo et al. PPOPP’10]– Manual optimizations– Failsafe subsumes cautious
• Verifying conformance of ADT implementation to specification– The Jahob project (Kuncak, Rinard, Wies et al.)
![Page 26: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/26.jpg)
26
Conclusion• New application for static analysis – Optimization of optimistically parallelized graph
programs
• Novel shape analysis– Utilize observations on the structure of concrete
states and programming style
• Enables optimizations crucial for performance
![Page 27: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/27.jpg)
27
Thank You!
![Page 28: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/28.jpg)
28
Backup
![Page 29: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/29.jpg)
Outline of Boruvka MST CodeGSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}
Pick arbitrary worklist node
Find lightest neighbor
Update neighbors of lightest
Update worklist and MST
![Page 30: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/30.jpg)
30
Approximating Sets of Locked Objects
Reachability-based Scheme
cont cont
S1 S2
Reach(S1.cont)Reach(S2.cont) Reach(S2.cont) Reach(S2.cont)
cont cont
S1 S2
Reach(S1.cont)Reach(S2.cont) Reach(S2.cont)
![Page 31: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/31.jpg)
31
cont cont
S1 S2L(S1.cont) L(S2.cont)
Approximating Sets of Locked Objects
Hierarchy Summarization Scheme
cont cont
S1 S2L(S1.cont) L(S2.cont) (S1 ≠ S2)
∧ L(S1.cont) ∧ L(S2.cont)
L(Path) o . o ∊ Path Locked(o)
![Page 32: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/32.jpg)
32
• Graph API uses flags to enable/disable locking and storing undo actions
– removeEdge(Node src, Node dst, Flag f);
Enabling Optimizations in Galois
Challenge: Find minimal flag per ADT method call Solution: Lockset analysis
UNDOLOCKS
NONE
ALL
Lock(src)Lock(dst)
Lock( (src,dst) )addEdge(src, dst);
![Page 33: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/33.jpg)
33
Optimization Conditions• set of definitely acquired locks per program
point
• Given method call at :
• Program point is failsafe if:
– Infer from by simple backward analysis
![Page 34: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/34.jpg)
34
Speculation Overheads and Optimizations
Source of Overhead Optimization
Locking shared objects• Redundant locking elimination• Lock elision for iteration private data• Lock domination
Backup original state for rollback • Avoid backups after failsafe points
![Page 35: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/35.jpg)
35
Modeling ADTs
c d
a b
7
23
4
5
ns es
Abstract State
cont
retnghbrs
Graph<ND,ED> {
@rep set<Node> ns; // nodes @rep set<Edge> es; // edges
@locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)@op( nghbrs = n.rev(src).dst + n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) ) Set<Node> neighbors(Node n);
}
Graph Spec
a b
5
srcdst
ed
![Page 36: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/36.jpg)
36
Failsafe Points – Eliminating Undo Actions
c d
a b
7
23
4
5
Graph Node
Graph Edge
Edge Data
lt
GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}
a
g.neighbors(lt);CAUTIOUS OPERATOR [Mendez et al. PPOPP’10]
![Page 37: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/37.jpg)
37
Boruvka’s Minimum Spanning Tree Algorithm
• Build MST bottom uprepeat { pick arbitrary active node ‘a’ merge with lightest neighbor ‘lt’ add edge ‘a-lt’ to MST} until graph is a singular node
c d
a b
e f
g
2 4
6
5
3
7
4
1
3
7
![Page 38: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/38.jpg)
39
Failsafe Points – Eliminating Undo Actions
c d
a b
7
23
4
5
Graph Node
Graph Edge
Edge Data Acquired
New Lock
Redundant
lt
GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}
a
g.neighbors(lt);CAUTIOUS OPERATOR [Mendez et al. PPOPP’10]
![Page 39: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/39.jpg)
40
GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}
Redundant Locking Example
c d
a b
7
23
4
5
Graph Node
Graph Edge
Edge Data Acquired
New Lock
Redundant
lt
a
![Page 40: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/40.jpg)
41
Boruvka’s Minimum Spanning Tree Algorithm
c d
a b
e f
g
2 4
6
5
3
7
4
1
3
7
GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}
• Build MST iteratively• Pick random active node• Contract edge with lightest
neighbor
![Page 41: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/41.jpg)
42
GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}
Parallelism in Boruvka’s Algorithm
c d
a b
e f
g
2 4
6
5
37
4
1 f
• Dependences between activities are functions of runtime values • Parallelism cannot be uncovered
at compile time in general
• Don’t Care Non-Determinism• All produced MSTs correct and
optimal
![Page 42: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/42.jpg)
43
Abstraction of a Single State
c d
a b
7
23
4
5
a
lt
State after first loop in Boruvka
( a lt ) ∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst)) ∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src) ∧
UniqueRef(EdgeData)
We maintain We loose
Set of definitely locked objects denoted by lockpaths Maybe locked information
Aliasing of top level variables Cardinality of sets
Uniquely pointed-to types and objects referenced from the stack
Content sharing of multiple collections
![Page 43: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.](https://reader030.fdocuments.in/reader030/viewer/2022032611/56649e885503460f94b8c736/html5/thumbnails/43.jpg)
44
Hierarchy Summarization Intuition
GraphSet<Node>
Weight
Node
Edge
Iterator<Node>
es
src,dst
nscont
ed
past, at,future
nd
all
gaNghbrsltNghbrsnIter
a, lt, n
w,wan,
minW
e,an
Gset<Node>
gcont
wl
GBag<Weight>
mst
bcont
Void