Synchronization Transformations for Parallel Computing
Pedro Diniz and Martin Rinard
Department of Computer Science, University of California, Santa Barbara
http://www.cs.ucsb.edu/~{pedro,martin}
Motivation
Parallel Computing Becomes Dominant Form of Computation
Parallel Machines Require Parallel Software
Parallel Constructs Require New Analysis and Optimization Techniques
Our Goal: Eliminate Synchronization Overhead
Talk Outline
• Motivation
• Model of Computation
• Synchronization Optimization Algorithm
• Applications Experience
• Dynamic Feedback
• Related Work
• Conclusions
Model of Computation
• Parallel Programs
  • Serial Phases
  • Parallel Phases
• Single Address Space
• Atomic Operations on Shared Data
  • Mutual Exclusion Locks
  • Acquire Constructs
  • Release Constructs
[Diagram: an Acquire (Acq) construct, a mutual exclusion region containing statement S1, and a Release (Rel) construct]
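This model can be sketched in Python (a hypothetical stand-in for the paper's C++ setting): a parallel phase in which several threads perform atomic operations on shared data, each update bracketed by an acquire and a release on the same mutual exclusion lock.

```python
import threading

# Hypothetical sketch: shared data protected by a mutual exclusion lock.
lock = threading.Lock()
total = 0

def worker(values):
    global total
    for v in values:
        lock.acquire()   # Acquire construct: enter the mutual exclusion region
        total += v       # Atomic operation on shared data (the region's S1)
        lock.release()   # Release construct: exit the mutual exclusion region

# Parallel phase: several threads execute the worker concurrently.
threads = [threading.Thread(target=worker, args=([1] * 1000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)  # 4000: the lock makes each update atomic, so no increment is lost
```

Because every update runs inside its own acquire/release pair, this code executes 4000 acquires; that per-operation overhead is what the optimizations below target.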
Synchronization Optimization
Idea: Replace Computations that Repeatedly Acquire and Release the Same Lock with a Computation that Acquires and Releases the Lock Only Once
Result: Reduction in the Number of Executed Acquire and Release Constructs
Mechanism: Lock Movement Transformations and Lock Cancellation Transformations
Synchronization Optimization Algorithm
Overview:
• Find Two Mutual Exclusion Regions With the Same Lock
• Expand Mutual Exclusion Regions Using Lock Movement Transformations Until They are Adjacent
• Coalesce Using Lock Cancellation Transformation to Form a Single Larger Mutual Exclusion Region
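The effect of the algorithm can be sketched in Python (a hypothetical illustration, not the compiler's actual output): two adjacent mutual exclusion regions that use the same lock are coalesced into one larger region, halving the number of executed acquire/release pairs.

```python
import threading

lock = threading.Lock()
counts = {"a": 0, "b": 0}
acquires = 0  # instrumentation: count executed acquire constructs

def acquire():
    global acquires
    lock.acquire()
    acquires += 1

# Before optimization: two mutual exclusion regions with the same lock.
def update_original():
    acquire(); counts["a"] += 1; lock.release()
    acquire(); counts["b"] += 1; lock.release()

# After optimization: the adjacent regions are coalesced into a single
# larger mutual exclusion region with one acquire/release pair.
def update_optimized():
    acquire()
    counts["a"] += 1
    counts["b"] += 1
    lock.release()

update_original()   # executes 2 acquire/release pairs
update_optimized()  # executes 1 acquire/release pair
print(acquires)     # 3
```

The optimized version performs the same updates but executes half as many acquire and release constructs per call.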
Synchronization Optimization Trade-Off
• Advantage:
  • Reduces Number of Executed Acquires and Releases
  • Reduces Acquire and Release Overhead
• Disadvantage: May Introduce False Exclusion
  • Multiple Processors Attempt to Acquire Same Lock
  • Processor Holding the Lock is Executing Code that was Originally in No Mutual Exclusion Region
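A hypothetical Python sketch of the downside: after aggressive coalescing, the lock is held across work on private data that originally needed no synchronization, so another processor attempting to acquire the same lock must wait even though the holder is not touching shared data.

```python
import threading
import time

lock = threading.Lock()

def expensive_local_work():
    # Originally in no mutual exclusion region: touches only private data.
    time.sleep(0.05)

# After coalescing, the private work now sits inside the enlarged region,
# so the two tasks serialize on the lock: false exclusion.
def coalesced_task(shared):
    with lock:
        shared.append(1)
        expensive_local_work()  # lock is held here unnecessarily
        shared.append(2)

shared = []
t1 = threading.Thread(target=coalesced_task, args=(shared,))
t2 = threading.Thread(target=coalesced_task, args=(shared,))
start = time.time()
t1.start(); t2.start()
t1.join(); t2.join()
elapsed = time.time() - start
# The tasks serialize, so the elapsed time is roughly 2 * 0.05 s rather
# than the 0.05 s that truly parallel execution of the private work allows.
print(elapsed)
```

Had the private work stayed outside the region, the two threads could have overlapped it; this lost concurrency is the contention cost that the false exclusion policies below try to bound.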
False Exclusion Policy
Goal: Limit Potential Severity of False Exclusion
Mechanism: Constrain the Application of Basic Transformations
• Original: Never Apply Transformations
• Bounded: Apply Transformations Only on Cycle-Free Subgraphs of the ICFG
• Aggressive: Always Apply Transformations
Experimental Results
• Automatic Parallelizing Compiler Based on Commutativity Analysis [PLDI’96]
• Set of Complete Scientific Applications (C++ subset)
  • Barnes-Hut N-Body Solver (1500 Lines of Code)
  • Liquid Water Simulation Code (1850 Lines of Code)
  • Seismic Modeling String Code (2050 Lines of Code)
• Different False Exclusion Policies
• Performance of Generated Parallel Code on Stanford DASH Shared-Memory Multiprocessor
Lock Overhead
[Bar charts: Percentage Lock Overhead (0–60%) for Barnes-Hut (16K Particles), Water (512 Molecules), and String (Big Well Model) under the Original, Bounded, and Aggressive policies (String: Original and Aggressive only)]
Percentage of Time that the Single Processor Execution Spends Acquiring and Releasing Mutual Exclusion Locks
Contention Overhead
[Line graphs: Contention Percentage (0–100%) vs. Processors (0–16) for Barnes-Hut (16K Bodies), Water (512 Molecules), and String (Big Well Model) under the Original, Bounded, and Aggressive policies]
Percentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors
Performance Results: Barnes-Hut
[Graph: Speedup (0–16) vs. Number of Processors (0–16) for Barnes-Hut (16384 Bodies); curves for Ideal, Aggressive, Bounded, and Original]
Performance Results: Water
[Graph: Speedup (0–16) vs. Number of Processors (0–16) for Water (512 Molecules); curves for Ideal, Aggressive, Bounded, and Original]
Performance Results: String
[Graph: Speedup (0–16) vs. Number of Processors (0–16) for String (Big Well Model); curves for Ideal, Aggressive, and Original]
Choosing Best Policy
• Best False Exclusion Policy May Depend On• Topology of Data Structures• Dynamic Schedule Of Computation
• Information Required to Choose Best Policy Unavailable at Compile Time
• Complications• Different Phases May Have Different Best Policy• In Same Phase, Best Policy May Change Over Time
Solution: Dynamic Feedback
• Generated Code Consists of
  • Sampling Phases: Measure Performance of Different Policies
  • Production Phases: Use Best Policy From Sampling Phase
• Periodically Resample to Discover Changes in Best Policy
• Guaranteed Performance Bounds
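The sampling/production structure can be sketched in Python (a hypothetical illustration with made-up overhead numbers; the real system measures the generated C++ code versions at run time):

```python
POLICIES = ["Original", "Bounded", "Aggressive"]
SAMPLE_ITERS = 2       # iterations run per policy in a sampling phase
PRODUCTION_ITERS = 20  # iterations run with the best policy before resampling

def measure_overhead(policy):
    # Hypothetical per-iteration synchronization overheads; in the real
    # system these are measured, not known in advance.
    return {"Original": 5.0, "Bounded": 2.0, "Aggressive": 1.0}[policy]

def dynamic_feedback(total_iters):
    history = []  # which policy each production iteration used
    it = 0
    while it < total_iters:
        # Sampling phase: run each code version and measure its overhead.
        scores = {}
        for p in POLICIES:
            cost = 0.0
            for _ in range(SAMPLE_ITERS):
                cost += measure_overhead(p)
                it += 1
            scores[p] = cost
        best = min(scores, key=scores.get)
        # Production phase: use the best policy from the sampling phase.
        for _ in range(PRODUCTION_ITERS):
            history.append(best)
            it += 1
        # Loop back: periodically resample to detect a change in best policy.
    return history

hist = dynamic_feedback(26)
print(hist[0])  # Aggressive
```

Bounding the sampling phase length relative to the production phase length is what yields the guaranteed performance bounds: only a fixed fraction of the execution can run under a suboptimal policy.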
Dynamic Feedback
[Timeline diagram: Overhead vs. Time; a sampling phase measures the Aggressive, Original, and Bounded code versions, then a production phase runs the best (Aggressive) code version, followed by another sampling phase]
Dynamic Feedback : Barnes-Hut
[Graph: Speedup (0–16) vs. Number of Processors (0–16) for Barnes-Hut (16384 Bodies); curves for Ideal, Aggressive, Dynamic Feedback, Bounded, and Original]
Dynamic Feedback : Water
[Graph: Speedup (0–16) vs. Number of Processors (0–16) for Water (512 Molecules); curves for Ideal, Aggressive, Dynamic Feedback, Bounded, and Original]
Dynamic Feedback : String
[Graph: Speedup (0–16) vs. Number of Processors (0–16) for String (Big Well Model); curves for Ideal, Aggressive, Dynamic Feedback, and Original]
Related Work
• Parallel Loop Optimizations (e.g. [Tseng:PPoPP95])
  • Array-Based Scientific Computations
  • Barriers vs. Cheaper Mechanisms
• Concurrent Object-Oriented Programs (e.g. [PZC:POPL95])
  • Merge Access Regions for Invocations of Exclusive Methods
• Concurrent Constraint Programming
  • Bring Together Ask and Tell Constructs
• Efficient Synchronization Algorithms
  • Efficient Implementations of Synchronization Primitives
Conclusions
• Synchronization Optimizations
  • Basic Synchronization Transformations for Locks
  • Synchronization Optimization Algorithm
• Integrated into Prototype Parallelizing Compiler
  • Object-Based Programs with Dynamic Data Structures
  • Commutativity Analysis
• Experimental Results
  • Optimizations Have a Significant Performance Impact
  • With Optimizations, Applications Perform Well
• Dynamic Feedback