Low Cost, High Performance, and Scalability: A New Approach to User-Level Distributed
Shared Memory
Patrick Anthony La Fratta
WORTS 2005
15 December 2005
GUARANTEED!
OR YOUR MONEY BACK!!!
Initialization, Step 3: Take away all access privileges from the shared segments.
Implementing the DSM Client
Implementing the DSM System
Shared address read, Step 3: Client contacts the server to get the page's data.
Shared address read, Step 4: Client grants read access privileges to the application.
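The read path can be illustrated with a small simulation. This is only a sketch under stated assumptions: `ToyServer`, `ToyClient`, `Access`, and the 4096-byte `PAGE_SIZE` are hypothetical names and values chosen for illustration, and a dictionary lookup stands in for the access fault that a real client would take on a page with no privileges.

```python
from enum import Enum

PAGE_SIZE = 4096  # assumed page size, for illustration only


class Access(Enum):
    NONE = 0
    READ = 1
    WRITE = 2


class ToyServer:
    """Hypothetical page server holding the master copy of every page."""

    def __init__(self, num_pages):
        self.pages = {p: bytearray(PAGE_SIZE) for p in range(num_pages)}

    def fetch(self, page_no):
        # Return a copy so the client works on its own local buffer.
        return bytes(self.pages[page_no])


class ToyClient:
    """Hypothetical client that tracks per-page access privileges."""

    def __init__(self, server, num_pages):
        self.server = server
        # Initialization: the application starts with no access to any page.
        self.access = {p: Access.NONE for p in range(num_pages)}
        self.local = {}
        self.faults = 0

    def read(self, addr):
        page_no, offset = divmod(addr, PAGE_SIZE)
        if self.access[page_no] is Access.NONE:
            # Stands in for the access fault caught by a real client.
            self.faults += 1
            # Read Step 3: get the page's data from the server.
            self.local[page_no] = bytearray(self.server.fetch(page_no))
            # Read Step 4: grant read access to the application.
            self.access[page_no] = Access.READ
        return self.local[page_no][offset]


server = ToyServer(num_pages=4)
server.pages[1][10] = 42
client = ToyClient(server, num_pages=4)
first = client.read(PAGE_SIZE + 10)   # faults and fetches page 1
second = client.read(PAGE_SIZE + 11)  # served from the local copy
```

Only the first access to a page contacts the server; later reads of the same page are satisfied locally until the page's privileges are taken away again.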
Shared address write, Step 3: Client contacts the server with a write notification.
Shared address write, Step 4: Server calls back all other copies of the pages being written.
Shared address write, Step 6: Client grants write privileges to the application.
Shared address write, Step 7: Later, the application detaches the pages so others may use them.
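The write steps listed above can be sketched as a toy protocol. All names here (`ToyDirectory`, `ToyWriter`, `write_notice`, `invalidate`, `detach`) are hypothetical, page data transfer is left out, and the step not shown on the slides is not modelled; the sketch covers only the privilege bookkeeping.

```python
class ToyDirectory:
    """Hypothetical server-side record of which clients hold each page."""

    def __init__(self):
        self.holders = {}  # page number -> set of client names
        self.clients = {}  # client name -> ToyWriter

    def register(self, client):
        self.clients[client.name] = client

    def note_read(self, name, page_no):
        self.holders.setdefault(page_no, set()).add(name)

    def write_notice(self, name, page_no):
        # Write Step 4: call back every other copy of the page.
        for other in sorted(self.holders.get(page_no, set()) - {name}):
            self.clients[other].invalidate(page_no)
        self.holders[page_no] = {name}

    def detach(self, name, page_no):
        self.holders.get(page_no, set()).discard(name)


class ToyWriter:
    """Hypothetical client; a missing entry in `access` means no privileges."""

    def __init__(self, name, directory):
        self.name = name
        self.directory = directory
        self.access = {}  # page number -> "read" or "write"
        directory.register(self)

    def read(self, page_no):
        self.directory.note_read(self.name, page_no)
        self.access[page_no] = "read"

    def write(self, page_no):
        if self.access.get(page_no) != "write":
            # Write Step 3: send the write notification to the server.
            self.directory.write_notice(self.name, page_no)
            # Write Step 6: grant write privileges to the application.
            self.access[page_no] = "write"

    def invalidate(self, page_no):
        self.access.pop(page_no, None)

    def detach(self, page_no):
        # Write Step 7: give the page up so others may use it.
        self.access.pop(page_no, None)
        self.directory.detach(self.name, page_no)


directory = ToyDirectory()
a, b = ToyWriter("a", directory), ToyWriter("b", directory)
a.read(7)
b.read(7)
a.write(7)  # b's copy of page 7 is called back
state_after_write = (a.access.get(7), b.access.get(7))
a.detach(7)
b.read(7)   # b can obtain the page again
```

After `a.write(7)`, client `a` is the only holder of page 7; once it detaches, client `b` can fetch the page again.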
Preliminary Results: All Pairs Shortest Paths
Note: Results matched for all test cases, and all runs completed successfully.
[Figure: Execution time vs. problem size for sequential and parallel (row-wise decomposition) implementations of Floyd's algorithm. x-axis: problem size, n (number of vertices), 8 to 256. y-axis: execution time, t (sec), log scale, 0.00001 to 100. Series: Sequential; Parallel, 2 PEs.]
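For reference, a minimal sketch of Floyd's algorithm with a row-wise decomposition. The function names (`floyd_sequential`, `row_blocks`, `floyd_rowwise`) are hypothetical, and the blocks of rows are processed one after another in a single process; in a DSM setting each block would belong to a different PE, and row k would be the data every PE reads in iteration k.

```python
import math


def floyd_sequential(dist):
    """Plain triple-loop Floyd's algorithm on an adjacency matrix."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def row_blocks(n, num_pes):
    """Split rows 0..n-1 into num_pes contiguous blocks of near-equal size."""
    base, extra = divmod(n, num_pes)
    blocks, start = [], 0
    for pe in range(num_pes):
        size = base + (1 if pe < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def floyd_rowwise(dist, num_pes):
    """Same result as floyd_sequential, organized by per-PE row blocks."""
    n = len(dist)
    d = [row[:] for row in dist]
    blocks = row_blocks(n, num_pes)
    for k in range(n):
        row_k = d[k][:]  # the one row every PE needs in iteration k
        for rows in blocks:  # each block would run on its own PE
            for i in rows:
                dik = d[i][k]
                for j in range(n):
                    if dik + row_k[j] < d[i][j]:
                        d[i][j] = dik + row_k[j]
    return d


INF = math.inf
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
parallel_result = floyd_rowwise(graph, num_pes=2)
```

Each PE writes only its own rows, so the only cross-PE traffic in iteration k is the read of row k.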
System Profiling: All Pairs Shortest Paths
[Figure, panel 1: Total Execution Time. x-axis: problem size, n (number of vertices), 8 to 2048. y-axis: time, t (sec), log scale, 0.010 to 1000.]
[Figure, panel 2: x-axis: problem size, n (number of vertices), 8 to 2048. y-axis: time, t (sec), log scale, 0.001 to 1000. Series: T, C1DT, C2DT, BWT.]
System Modifications and Extensions
System profiles resulted in:
• Better understanding of the trade-offs in the design of the interface.
• Efficient synchronization primitives through extended memory semantics with full/empty bits.
• Server-side per-page locking and client-side full-page flushing.
* Speedups > 1! *
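Full/empty-bit synchronization can be illustrated with a single memory cell that carries one extra bit. This is a thread-based sketch only, not the system's implementation: `FullEmptyCell`, `write_ef` (wait for empty, write, set full), and `read_fe` (wait for full, read, set empty) are hypothetical names.

```python
import threading


class FullEmptyCell:
    """One word of memory tagged with a full/empty bit (illustrative)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def write_ef(self, value):
        """Block until the cell is empty, store the value, mark it full."""
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_fe(self):
        """Block until the cell is full, take the value, mark it empty."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            value, self._value = self._value, None
            self._full = False
            self._cond.notify_all()
            return value


cell = FullEmptyCell()
received = []


def consumer():
    for _ in range(3):
        received.append(cell.read_fe())


worker = threading.Thread(target=consumer)
worker.start()
for item in (10, 20, 30):
    cell.write_ef(item)  # blocks until the previous item has been consumed
worker.join()
```

The producer and consumer need no separate lock or flag: the bit on the cell itself orders each write before the matching read.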
Performance Results: Speedup for Various Configurations
[Figure: Speedup vs. sequential implementation (two panels with the same axes and legend). x-axis: problem size, n (number of vertices), 256 to 8192. y-axis: speedup, 0 to 7. Series: 2, 4, 8, and 16 PEs.]
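For reading the speedup axis, the usual definition of speedup relative to the sequential implementation is assumed here. The symbols $T_{\mathrm{seq}}$ and $T_p$ are notation introduced for this note, not taken from the slides:

```latex
S_p(n) = \frac{T_{\mathrm{seq}}(n)}{T_p(n)}
```

Here $T_{\mathrm{seq}}(n)$ is the sequential execution time for problem size $n$ and $T_p(n)$ is the execution time on $p$ PEs, so $S_p(n) > 1$ means the parallel run finished faster than the sequential one.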
Performance Results: Trends