Function Level Parallelism Lead by Data Dependencies
Sean Rul, Hans Vandierendonck and Koen De Bosschere
Ghent University, ELIS-PARIS, Sint-Pietersnieuwstraat 41, 9000 Gent, Belgium
Sean Rul is supported by the Institute for the Promotion of Innovation through Science and Technology in Flanders
[email protected] http://www.elis.ugent.be/~srul
[Figure: speedup (y-axis, 0.0 to 4.0) for Compression, Decompression and Total; legend: Original, Heterogeneous, Homogeneous]
[Poster panel titles: Conclusion, Results, Problem, Applications]
[Figure: functions m, f, h, g, i, l, j and k grouped into clusters; legend: intercluster data stream, intracluster data stream; label: FPGA]
Method
Matching parallel constructs
Call Graph, Interprocedural Data Flow Graph, Data Sharing Graph
Abstracting profiled information
Parallelizing
Too much information
Profile
Sequential program
Multithreaded program
Besides parallelizing sequential programs:
• Hybrid hardware/software or embedded systems
• Data partitioning on the Cell processor
Program Bzip2 (SPEC2000) with reference input
Executed on a quad Itanium® system
[Figure: call graph of functions m, f, h, g, i, l, j and k, annotated with # executions (x 10, x 10, x 20, x 20, x 100, x 30, x 20, x 10) and % execution time (1%, 20%, 14%, 15%, 15%, 10%, 10%, 14%)]
[Figure: graph connecting functions m, f, h, g, i, l, j and k to data structures ds1 to ds9; edges are marked Read or Write, and data structures are marked cluster private or cluster shared]
• New microprocessor generation: Increase in parallel computing power
• Sequential programs: Cannot exploit these resources
• Parallelizing by hand: Difficult and time-consuming
• Let the compiler do it: Set up a framework for parallelism detection
• Call graph and interprocedural data flow graph are useful for detecting parallel constructs
• Data sharing graph reveals data affinity between functions
• Future work:
  - Find new parallel constructs
  - Investigate bidirectional data streams
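As an illustration of how a parallel construct might be matched on an interprocedural data flow graph, the sketch below looks for a simple data pipeline: a chain of functions in which each one passes data only to the next. It is a minimal sketch and not the poster's algorithm. The function name find_pipelines, the edge-list input format and the example edges are assumptions made for illustration; only the function letters are borrowed from the poster's figures.

```python
from collections import defaultdict


def find_pipelines(flows):
    """Return maximal chains f1 -> f2 -> ... in a data flow graph.

    flows is an iterable of (producer, consumer) function pairs
    (a hypothetical input format). A link a -> b belongs to a chain
    only if it is the single outgoing flow of a and the single
    incoming flow of b, so data moves in one direction only.
    """
    succ = defaultdict(set)
    pred = defaultdict(set)
    for producer, consumer in flows:
        succ[producer].add(consumer)
        pred[consumer].add(producer)

    def is_link(a, b):
        return succ[a] == {b} and pred[b] == {a}

    nodes = set(succ) | set(pred)
    chains = []
    for start in sorted(nodes):
        # Skip nodes that are already the tail of a link: they sit
        # inside a chain that starts earlier.
        if any(is_link(p, start) for p in pred[start]):
            continue
        chain = [start]
        while len(succ[chain[-1]]) == 1:
            nxt = next(iter(succ[chain[-1]]))
            if not is_link(chain[-1], nxt) or nxt in chain:
                break
            chain.append(nxt)
        if len(chain) > 1:
            chains.append(chain)
    return chains


# Invented example edges, not taken from the poster's measurements.
flows = [("m", "f"), ("m", "j"), ("f", "g"), ("g", "h"), ("h", "i")]
print(find_pipelines(flows))  # [['f', 'g', 'h', 'i']]
```

Functions that exchange data in both directions never form a link under this rule, so bidirectional streams are simply not reported by this sketch.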
Look for a balanced solution
Detect, for example, a data pipeline
Minimize communication between threads
Add synchronization and initialization code
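The trade-off between a balanced solution and minimal communication between threads can be sketched as a small graph partitioning problem. The code below is only an illustration under stated assumptions: the function name partition, the max_imbalance parameter, the exhaustive search and all numbers are hypothetical and are not the method or the data behind the poster. Exhaustive search is practical only for a handful of functions.

```python
from itertools import product


def partition(exec_time, comm, n_clusters=2, max_imbalance=0.2):
    """Assign functions to clusters (one cluster per thread).

    exec_time: {function: share of execution time}
    comm: {(function_a, function_b): amount of data exchanged}
    Returns (cut, cluster_of), where cut is the data crossing
    cluster boundaries, or None if no assignment is balanced enough.
    """
    funcs = sorted(exec_time)
    ideal = sum(exec_time.values()) / n_clusters
    best = None
    for assignment in product(range(n_clusters), repeat=len(funcs)):
        cluster_of = dict(zip(funcs, assignment))
        loads = [0.0] * n_clusters
        for func, cluster in cluster_of.items():
            loads[cluster] += exec_time[func]
        # Balance constraint: no cluster may exceed the ideal load
        # by more than max_imbalance.
        if max(loads) > ideal * (1 + max_imbalance):
            continue
        # Communication cost: data flowing between different clusters.
        cut = sum(w for (a, b), w in comm.items()
                  if cluster_of[a] != cluster_of[b])
        if best is None or cut < best[0]:
            best = (cut, cluster_of)
    return best


# Invented numbers for illustration only.
exec_time = {"f": 30, "g": 20, "h": 30, "i": 20}
comm = {("f", "g"): 100, ("g", "h"): 5, ("h", "i"): 100}
print(partition(exec_time, comm))  # (5, {'f': 0, 'g': 0, 'h': 1, 'i': 1})
```

In this example the cheapest balanced split cuts the graph at the weakest data stream (g to h), which is where synchronization code would then have to be added.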
Elliptic node: Data structure
Rectangular node: Function
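A graph with function nodes and data structure nodes, joined by read and write edges, can be modeled as a bipartite structure. The sketch below assumes a hypothetical profile format of (function, data structure, mode) records and a given assignment of functions to clusters; the helper names and the example accesses are invented for illustration. It labels each data structure as cluster private or cluster shared.

```python
from collections import defaultdict


def build_data_sharing_graph(accesses):
    """Group accesses per data structure.

    accesses: iterable of (function, data_structure, mode) records,
    where mode is "R" (read) or "W" (write). Any other mode raises
    KeyError.
    """
    graph = defaultdict(lambda: {"R": set(), "W": set()})
    for func, ds, mode in accesses:
        graph[ds][mode].add(func)
    return graph


def classify(graph, cluster_of):
    """Label each data structure by how many clusters touch it."""
    labels = {}
    for ds, users in graph.items():
        clusters = {cluster_of[f] for f in users["R"] | users["W"]}
        if len(clusters) == 1:
            labels[ds] = "cluster private"
        else:
            labels[ds] = "cluster shared"
    return labels


# Invented accesses and clustering, for illustration only.
accesses = [
    ("f", "ds1", "W"), ("g", "ds1", "R"),
    ("g", "ds2", "W"), ("h", "ds2", "R"),
]
cluster_of = {"f": 0, "g": 0, "h": 1}
graph = build_data_sharing_graph(accesses)
print(classify(graph, cluster_of))
# {'ds1': 'cluster private', 'ds2': 'cluster shared'}
```

Data structures touched by functions in more than one cluster are the ones that need communication or synchronization between threads.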