Evaluating the Impact of Thread Escape Analysis on Memory Consistency Optimizations
1
Evaluating the Impact of Thread Escape Analysis on
Memory Consistency Optimizations
Chi-Leung Wong, Zehra Sura, Xing Fang, Kyungwoo Lee, Samuel P. Midkiff, Jaejin Lee and David Padua
University of Illinois at Urbana-Champaign
IBM T.J. Watson Research Center
Purdue University
Seoul National University
2
Outline
• Memory Models
• The Pensieve System
• Escape Analyses
• Qualitative Impact of Escape Analyses on Delay Set Analysis and Synchronization Analysis
• Experimental Results
• Conclusion
3
Memory Models
• Consider the following code segments (assuming data is initially 0):
  – Thread 1: data = 100; data_ready = true;
  – Thread 2: while (!data_ready); t = data;
• Can t == 0?
  – Yes, if reordering happens
    • Thread 1: data_ready = true; data = 100;
    • Can be done by the compiler and the hardware
  – Memory models tell us the answer
    • Sequential consistency says no
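The two-thread example above can be written as a small Java program. This is only an illustrative sketch: the class name MessagePassing and the choice to mark the flag volatile are assumptions, not part of the slides. With a volatile flag, the Java memory model orders the write to data before the write to the flag, so the reader cannot observe t == 0.

```java
// Sketch of the slide's two-thread example. The class name and the use of
// `volatile` are assumptions made for illustration.
class MessagePassing {
    static int data = 0;
    // volatile: the write to data may not be reordered after this flag write
    static volatile boolean dataReady = false;

    static int run() {
        data = 0;
        dataReady = false;
        int[] t = new int[1];
        Thread writer = new Thread(() -> {
            data = 100;        // Thread 1: data = 100;
            dataReady = true;  // Thread 1: data_ready = true;
        });
        Thread reader = new Thread(() -> {
            while (!dataReady) {       // Thread 2: while (!data_ready);
                Thread.onSpinWait();
            }
            t[0] = data;               // Thread 2: t = data;
        });
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return t[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 100
    }
}
```

Without volatile on the flag, the compiler or hardware would be free to reorder the two writes, which is the situation the slide describes.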
4
Objective of the Pensieve Project
• Sequential consistency (SC) on top of Intel x86 memory models
  – Implementation based on Jikes RVM
• All analyses done at JIT time
• Need to minimize both analysis time and application execution time
5
Enforcing SC
• Done by enforcing memory access orders
  – Not all orderings need to be enforced
  – Only enforce the orders that are really needed
• Delay Set Analysis (DSA) [SS88] computes such orders
• Our approach: approximation of DSA
  – Orders enforced by inserting fences in generated code
6
Original DSA
• Program edge
  – x executes before y in the same thread
• Conflict edge
  – x and x’ are conflicting accesses
    • Order of access affects program outcome
    • In this paper:
      – to the same memory location
      – one of them is a write
[Figure: example graph over accesses x, y, x’, y’ showing program edges and conflict edges]
7
Original DSA (Cont’d)
• Critical cycle
  – Minimal
    • Cannot form a smaller cycle using a subset of its nodes
  – Mixed
    • Contains both program edges and conflict edges
• Enforce program edges on a critical cycle
[Figure: example cycles over accesses x, y, x’, y’, z, labeled Minimal, Not minimal, Mixed, and Not mixed]
8
Approximate DSA
• Approximation of a critical cycle
  – x precedes y
  – Conflicting accesses for
    • x and x’
    • y and y’
  – y’ precedes x’
• Enforce program edges on an approximate critical cycle
[Figure: approximate critical cycle over accesses x, y, y’, x’]
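The approximate check above can be sketched as a brute-force search over access pairs. This is a toy model, not the Pensieve implementation: the class ApproxDelaySet, the Access type, and the assumption that conflicting accesses belong to different threads are all introduced here for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of the approximate delay check; all names are hypothetical.
class ApproxDelaySet {
    static final class Access {
        final int thread;     // owning thread
        final int pos;        // position in that thread's program order
        final String loc;     // memory location touched
        final boolean write;  // true for a write, false for a read
        Access(int thread, int pos, String loc, boolean write) {
            this.thread = thread; this.pos = pos; this.loc = loc; this.write = write;
        }
        String label() { return (write ? "W " : "R ") + loc; }
    }

    // Conflict edge: same location, at least one write.
    // Assumption: the two accesses belong to different threads.
    static boolean conflict(Access a, Access b) {
        return a.thread != b.thread && a.loc.equals(b.loc) && (a.write || b.write);
    }

    // Program edge: a comes before b in the same thread.
    static boolean precedes(Access a, Access b) {
        return a.thread == b.thread && a.pos < b.pos;
    }

    // x -> y must be enforced if some y' conflicts with y, some x' conflicts
    // with x, and y' precedes x'.
    static boolean needsDelay(Access x, Access y, List<Access> all) {
        for (Access yp : all) {
            if (!conflict(y, yp)) continue;
            for (Access xp : all) {
                if (conflict(x, xp) && precedes(yp, xp)) return true;
            }
        }
        return false;
    }

    static List<String> delays(List<Access> all) {
        List<String> out = new ArrayList<>();
        for (Access x : all) {
            for (Access y : all) {
                if (precedes(x, y) && needsDelay(x, y, all)) {
                    out.add(x.label() + " -> " + y.label());
                }
            }
        }
        return out;
    }

    static List<String> example() {
        List<Access> all = new ArrayList<>();
        all.add(new Access(0, 0, "data", true));        // thread 0: data = 100
        all.add(new Access(0, 1, "data_ready", true));  // thread 0: data_ready = true
        all.add(new Access(0, 2, "local", true));       // thread 0: location no other thread touches
        all.add(new Access(1, 0, "data_ready", false)); // thread 1: while (!data_ready)
        all.add(new Access(1, 1, "data", false));       // thread 1: t = data
        return delays(all);
    }

    public static void main(String[] args) {
        // prints [W data -> W data_ready, R data_ready -> R data]
        System.out.println(example());
    }
}
```

The write to "local" takes part in no conflict edge, so no order involving it needs to be enforced.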
9
The Pensieve System
[Figure: system diagram with components Source Program, Program Analyses, Thread Escape Analysis, Synchronization Analysis, Delay Set Analysis, Orders to Enforce, Code Optimizations, Fence Insertion & Optimization, Target Program]
10
Escape Analyses
• Identify objects that may be accessed by two or more threads
• Output: set of variables
  – {v | v points to an object that may be accessed by >= 2 threads}
11
Impact on Delay Set Analysis
• x, y, y’, x’ must be escaping accesses
  – Cannot form a cycle if one of them is not an escaping access
• Fewer escaping accesses implies fewer possible pairs (x, y)
  – Fewer checks to be done
  – Fewer delays
[Figure: cycle over accesses x, y, y’, x’]
12
Impact on Synchronization Analysis
• Synchronization analysis reduces the number of conflict edges considered by DSA
  – Considers synchronized constructs
  – Considers calls to start() and join()
• Our system only considers t1.join()
  – if it can match some t2.start() call
  – t1 and t2 are not escaping
• More precise escape information lets more join() calls be matched, which gives a more precise DSA result
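A minimal Java sketch of the start()/join() pattern that such an analysis can exploit. The class and field names are hypothetical. It relies only on the Java language guarantee that a thread's actions happen-before another thread's successful return from join() on it; it does not show how Pensieve matches the calls.

```java
// Illustration of start()/join() ordering; names are hypothetical.
class JoinOrdering {
    static int shared = 0; // plain, non-volatile field

    static int run() {
        shared = 0;
        Thread worker = new Thread(() -> shared = 42);
        worker.start(); // the reset above happens-before the worker's write
        try {
            worker.join(); // the worker's write happens-before this return
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        // This read is already ordered after the worker's write, so the
        // pair does not need to be treated as a conflict.
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```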
13
Escape Analyses Comparison
• In this study, we compare 4 algorithms:
  – Connectivity Analysis (Pensieve)
  – Field Base Analysis (Pensieve)
    • For comparison purposes
  – Bogda’s Analysis
    • Removing Unnecessary Synchronization in Java. (OOPSLA 1999)
  – Ruf’s Analysis
    • Effective Synchronization Removal for Java. (PLDI 2000)
14
Connectivity Escape Analysis
• An object is escaping if both
  – Reachable by more than one thread, in one of two possible cases:
    • Reachable from a static field
    • Passed from a thread constructor
  – Accessed by more than one thread
• Does not assume this is escaping in run() by default
• Field insensitive for most memory accesses
  – I.e., does not distinguish x.f from x.g
  – Except accesses to Runnable objects
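The two conditions above can be sketched over a toy object graph. This is a heavily simplified, hypothetical model: the real analysis runs on program variables inside the JIT, whereas here objects are plain string labels and the "accessed by" sets are supplied by hand. Edges are field insensitive, so x.f and x.g are not distinguished.

```java
import java.util.*;

// Toy model of the connectivity rule; all names are hypothetical.
class ConnectivityEscapeSketch {
    // object -> objects it refers to through any field (field insensitive)
    final Map<String, Set<String>> pointsTo = new HashMap<>();
    // objects stored in a static field or handed over through a thread constructor
    final Set<String> roots = new HashSet<>();
    // object -> ids of the threads that access it
    final Map<String, Set<Integer>> accessedBy = new HashMap<>();

    void edge(String from, String to) {
        pointsTo.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }
    void root(String o) { roots.add(o); }
    void access(String o, int thread) {
        accessedBy.computeIfAbsent(o, k -> new HashSet<>()).add(thread);
    }

    // Escaping = reachable from a root AND accessed by more than one thread.
    Set<String> escaping() {
        Set<String> reachable = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String o = work.pop();
            if (!reachable.add(o)) continue; // already visited
            for (String n : pointsTo.getOrDefault(o, Collections.emptySet())) {
                work.push(n);
            }
        }
        Set<String> result = new TreeSet<>(); // sorted for stable output
        for (String o : reachable) {
            if (accessedBy.getOrDefault(o, Collections.emptySet()).size() > 1) {
                result.add(o);
            }
        }
        return result;
    }

    static Set<String> example() {
        ConnectivityEscapeSketch g = new ConnectivityEscapeSketch();
        g.root("queue");          // stored in a static field
        g.edge("queue", "item");  // queue refers to item through some field
        g.root("config");         // handed over through a thread constructor
        g.access("queue", 1);
        g.access("queue", 2);
        g.access("item", 1);
        g.access("item", 2);
        g.access("config", 2);    // reachable, but used by one thread only
        g.access("scratch", 1);   // not reachable from any root
        return g.escaping();
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints [item, queue]
    }
}
```

Dropping the second condition, so that every reachable object counts as escaping, would also report "config" in this example.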
15
Field Base Escape Analysis
• An object is escaping if
  – Reachable from a static field
  – Passed from a thread constructor
• Does not assume this is escaping in run() by default
  – Similar to connectivity analysis
• Field sensitive
  – Suppose O1, O2 are of the same type
    • O1.f is different from O1.g
    • O1.f is the same as O2.f
16
Bogda’s Escape Analysis
• An object is escaping if it is reachable:
  – By a static field
  – By a Runnable object
  – Via more than 1 field reference
17
Ruf’s Escape Analysis
• An object is escaping if both
  – Reachable from either
    • A static field, or
    • A Runnable object
  – Synchronized by more than one thread
• Adapted for our own use
  – “synchronized” replaced by “accessed”
18
Experimental Settings (Machine)
• Intel (Dell PowerEdge 6600 SMP)
  – 4 Intel hyperthreaded 1.5 GHz Xeon processors
  – with 1 MB cache each
  – 6 GB system memory
19
Experimental Settings (Software)
• Original
  – default Jikes RVM implementation
  – base case for performance comparison
• Enforcing SC
  – Empty
  – Arg Escaping
  – Connectivity analysis
  – Field-base analysis
  – Bogda’s analysis (bogda)
  – Ruf’s analysis
20
Measurements
• Escape Analysis Time
• Impact on Delay Set Analysis Time
• Impact on Synchronization Analysis Time
• Slowdown due to fence insertion
  – Delay Set Analysis only
  – Delay Set Analysis with Synchronization Analysis
21
Escape Analysis Time
[Chart: Escape Analysis Time in ms (log scale, 1 to 1,000,000) per benchmark: mtrt, moldyn, montecarlo, raytracer, boundedbuf, disksched, geneticalgo, hashmap, seive, jbb, AVG. Series: empty, argEscape, connect, ruf5, field-base, bogda]
22
Impact on Delay Set Analysis Time
[Chart: Delay Set Analysis Time in ms (axis 0 to 350) per benchmark: mtrt, moldyn, montecarlo, raytracer, boundedbuf, disksched, geneticalgo, hashmap, seive, jbb, AVG. Series: connect, ruf5, bogda, field-base, argEscape, empty]
23
Impact on Synchronization Analysis Time
[Chart: Synchronization Time in ms (log scale, 1 to 1,000,000) per benchmark: mtrt, moldyn, montecarlo, raytracer, boundedbuf, disksched, geneticalgo, hashmap, seive, jbb, AVG. Series: field-base, bogda, empty, argEscape, connect, ruf5]
24
Escape + DSA + Synchronization Analysis Time / Compilation Time
[Chart: Analysis Time / Compilation Time (axis 0 to 1) per benchmark: mtrt, moldyn, montecarlo, raytracer, boundedbuf, disksched, geneticalgo, hashmap, seive, jbb, AVG. Series: empty, argEscape, connect, ruf5, field-base, bogda]
25
Slowdown (DSA Only)
[Chart: Slowdown (DSA only) (axis 0 to 14) per benchmark: mtrt, moldyn, montecarlo, raytracer, boundedbuf, disksched, geneticalgo, hashmap, seive, jbb, AVG. Series: connect, ruf5, bogda, field-base, argEscape, empty]
26
Slowdown (DSA+Sync Analysis)
[Chart: Slowdown (DSA + Synchronization Analysis) (axis 0 to 14) per benchmark: mtrt, moldyn, montecarlo, raytracer, boundedbuf, disksched, geneticalgo, hashmap, seive, jbb, AVG. Series: connect, ruf5, bogda, field-base, argEscape, empty]
27
Slowdown of connect (DSA+Sync Analysis)
[Chart: Slowdown of connect (DSA + Synchronization Analysis) (axis 0 to 4) per benchmark: mtrt, moldyn, montecarlo, raytracer, boundedbuf, disksched, geneticalgo, hashmap, seive, jbb, AVG. Series: connect]
28
Conclusions
• Evaluated the interaction between escape analysis and synchronization/delay set analysis
• Montecarlo and jbb motivate enabling field sensitivity for connectivity analysis
29
Backup Slides Follow
30
Number of Delay Checks Performed
[Chart: Number of Delay Checks Performed (log scale, 1 to 10,000,000) per benchmark: mtrt, moldyn, montecarlo, raytracer, boundedbuf, disksched, geneticalgo, hashmap, seive, jbb, AVG. Series: connect, ruf5, bogda, field-base, argEscape, empty]
31
Total Compilation Time
[Chart: Total Compilation Time in ms (log scale, 1 to 1,000,000) per benchmark: mtrt, moldyn, montecarlo, raytracer, boundedbuf, disksched, geneticalgo, hashmap, seive, jbb, AVG. Series: connect, ruf5, argEscape, empty, field-base, bogda]
32
Number of Delays Found (DSA Only)
[Chart: Number of Delays Found (DSA Only) (log scale, 1 to 10,000,000) per benchmark: mtrt, moldyn, montecarlo, raytracer, boundedbuf, disksched, geneticalgo, hashmap, seive, jbb, AVG. Series: connect, ruf5, bogda, field-base, argEscape, empty]
33
Number of Delays Found (DSA + Sync Analysis)
[Chart: Number of Delays Found (DSA + Sync Analysis) (log scale, 1 to 10,000,000) per benchmark: mtrt, moldyn, montecarlo, raytracer, boundedbuf, disksched, geneticalgo, hashmap, seive, jbb, AVG. Series: connect, ruf5, bogda, field-base, argEscape, empty]