PADS 2002 1
Conservative Simulation using Distributed-Shared Memory
Teo, Y. M., Ng, Y. K. and Onggo, B. S. S.
Department of Computer Science
National University of Singapore
Objectives
Improve the performance of SPaDES/Java by reducing the overhead of:
Event synchronization
Distributed communications
Study the memory requirements of parallel simulations.
Presentation Outline
Parallel Simulation
Null Message Protocol
Performance Improvement
Memory Requirement
Conclusion
Parallel Simulation
Sequential simulations execute on a single thread on one processor.
Ideally, parallelizing a simulation should improve its wall-clock performance, since the workload is distributed across processors.
Maintaining causality throughout a parallel simulation requires event synchronization protocols.
=> These add inter-process communication.
=> A new bottleneck!
Null Message Protocol
First designed by Chandy and Misra (1979); prevents deadlock between logical processes (LPs).
At the end of every simulation pass, LPi sends a null message to each of its neighbours, with timestamp = local virtual time of LPi.
The timestamp T on a null message indicates that the source LP will not send any further message with a timestamp earlier than T.
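A minimal sketch of this rule for a single LP, assuming (as stated above) that the null message timestamp is the LP's local virtual time, and that messages on each input channel arrive in timestamp order. The class and method names (`CmbLogicalProcess`, `simulationPass`) are hypothetical and are not the SPaDES/Java API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch of one logical process under the null message protocol. */
final class CmbLogicalProcess {
    double lvt = 0.0;                                                   // local virtual time
    private final Map<String, Double> channelClock = new HashMap<>();   // latest timestamp per input channel
    private final PriorityQueue<Double> pending = new PriorityQueue<>(); // unprocessed real events
    private final List<String> neighbours;                              // output channels

    CmbLogicalProcess(List<String> inputs, List<String> neighbours) {
        for (String in : inputs) channelClock.put(in, 0.0);
        this.neighbours = neighbours;
    }

    /** Real and null messages both advance the clock of the channel they arrive on. */
    void receive(String from, double timestamp, boolean isNull) {
        channelClock.merge(from, timestamp, Math::max);
        if (!isNull) pending.add(timestamp);
    }

    /** No message earlier than the smallest channel clock can still arrive, so events up to it are safe. */
    double safeTime() {
        double t = Double.POSITIVE_INFINITY;
        for (double c : channelClock.values()) t = Math.min(t, c);
        return t;
    }

    /** One simulation pass: process all safe events, then emit a null message to every neighbour. */
    Map<String, Double> simulationPass() {
        double horizon = safeTime();
        while (!pending.isEmpty() && pending.peek() <= horizon) {
            lvt = pending.poll();                                       // "processing" = advancing LVT
        }
        Map<String, Double> nullMessages = new LinkedHashMap<>();
        for (String n : neighbours) nullMessages.put(n, lvt);          // timestamp = LVT of this LP
        return nullMessages;
    }
}
```

With inputs from LP0 and LP2, a real event at time 5 from LP0 cannot be processed while LP2's channel only guarantees time 3; once a null message from LP2 raises that clock to 7, the event becomes safe and the next null messages carry timestamp 5.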
Performance Improvement
The Chandy-Misra-Bryant (CMB) protocol performs poorly because of its high null message overhead: it transmits null messages on every simulation pass.
The null message ratio (NMR) is close to 1 for nearly all of [0, T).
Optimizations incorporated:
Carrier-null message scheme
Flushing mechanism
Demand-driven null message algorithm
Remote communications using JavaSpace
Carrier-Null Message Algorithm
Problem: cyclic topologies generate redundant null messages.
Solution: the carrier-null message algorithm (Wood and Turner, 1996) avoids transmitting redundant null messages in such cycles.
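The suppression idea can be illustrated with a toy model, assuming a null message simply records which LPs have forwarded it and is dropped once it returns to one of them. This is a deliberate simplification for illustration, not Wood and Turner's algorithm (the published scheme carries more information than a route); `CarrierNull` and `transmissionsInRing` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy null message that carries the route it has travelled (hypothetical, simplified). */
final class CarrierNull {
    final double timestamp;
    final Set<String> route;                         // LPs that have already forwarded this message

    CarrierNull(double timestamp, Set<String> route) {
        this.timestamp = timestamp;
        this.route = route;
    }

    /** Forward from lp, or return null when lp is already on the route (the message closed a cycle). */
    CarrierNull forwardFrom(String lp, double lpTime) {
        if (route.contains(lp)) return null;         // going round again would add no new information
        Set<String> next = new LinkedHashSet<>(route);
        next.add(lp);
        return new CarrierNull(lpTime, next);        // each LP stamps its own time, as in the basic rule
    }

    /** Counts transmissions of one null message launched around a ring of LPs until it is suppressed. */
    static int transmissionsInRing(List<String> ring) {
        CarrierNull msg = new CarrierNull(0.0, new LinkedHashSet<>(List.of(ring.get(0))));
        int sent = 0;
        for (int hop = 1; msg != null; hop++) {
            String next = ring.get(hop % ring.size());
            sent++;                                  // msg is transmitted to `next`
            msg = msg.forwardFrom(next, hop);        // hop number stands in for the receiver's LVT
        }
        return sent;
    }
}
```

In a ring of three LPs the message makes exactly one lap (three transmissions) and then stops, instead of circulating indefinitely.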
Performance Improvement
Demand-driven null messaging + flushing
[Figure: logical processes (A) and (B) linked by an output channel and a request channel carrying REQ messages; a future event list (FEL) and a flusher are also shown]
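The demand-driven part can be sketched as follows, assuming a blocked LP sends a REQ upstream and the upstream LP answers with a single null message carrying a lower bound on its next output. Computing that bound from the head of the future event list (FEL) and the input channel clocks, plus a lookahead, is an assumption about how FEL information might be used; the flusher itself is not modelled, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch of an upstream LP that sends null messages only on request. */
final class DemandDrivenLp {
    final PriorityQueue<Double> fel = new PriorityQueue<>();    // timestamps in the future event list
    final Map<String, Double> inputClock = new HashMap<>();    // lower bound on each input channel
    final double lookahead;                                     // assumed minimum service time
    double lvt = 0.0;

    DemandDrivenLp(double lookahead) { this.lookahead = lookahead; }

    /** Assumed bound: no output can precede the next event this LP might process, plus lookahead. */
    double outputLowerBound() {
        double next = fel.isEmpty() ? Double.POSITIVE_INFINITY : fel.peek();
        for (double c : inputClock.values()) next = Math.min(next, c);
        return Math.max(lvt, next) + lookahead;
    }

    /** Demand-driven: invoked only when a blocked neighbour sends REQ; returns the null message timestamp. */
    double onRequest() {
        return outputLowerBound();
    }
}
```

The design point is that no null messages are generated on ordinary simulation passes; traffic is created only when some neighbour is actually blocked.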
Performance Evaluation
Experiments were conducted on a PC cluster of 8 nodes running Red Hat Linux 7.0. Each node has a Pentium II 400 MHz processor and 256 MB of memory, and the nodes are connected through a 100 Mbps switch.
2 benchmark programs:
PHOLD system
Linear Pipeline
PHOLD (3x3, m)
[Figure: nine nodes forming a 3x3 PHOLD network]
Closed system
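A sequential sketch of the PHOLD workload may help fix ideas. It assumes the n x n nodes form a torus, that m is the number of jobs initially at each node, and that each job moves to a uniformly chosen neighbour after an exponentially distributed delay; these parameter meanings and all names are assumptions, and a parallel version would partition the nodes across LPs.

```java
import java.util.PriorityQueue;
import java.util.Random;

/** Sequential sketch of PHOLD on an n x n torus; m is assumed to be the job population per node. */
final class PholdSketch {
    record Event(double time, int node) {}
    record Result(long processed, int population) {}

    static Result run(int n, int m, double endTime, long seed) {
        Random rng = new Random(seed);
        PriorityQueue<Event> fel = new PriorityQueue<>((a, b) -> Double.compare(a.time(), b.time()));
        for (int node = 0; node < n * n; node++)
            for (int j = 0; j < m; j++) fel.add(new Event(expo(rng), node));
        long processed = 0;
        while (!fel.isEmpty() && fel.peek().time() < endTime) {
            Event e = fel.poll();
            processed++;
            int row = e.node() / n, col = e.node() % n;
            switch (rng.nextInt(4)) {                 // forward the job to a random neighbour (wrap-around)
                case 0 -> row = (row + n - 1) % n;
                case 1 -> row = (row + 1) % n;
                case 2 -> col = (col + n - 1) % n;
                default -> col = (col + 1) % n;
            }
            fel.add(new Event(e.time() + expo(rng), row * n + col));
        }
        return new Result(processed, fel.size());     // closed system: population never changes
    }

    /** Exponential delay with mean 1.0. */
    static double expo(Random rng) { return -Math.log(1.0 - rng.nextDouble()); }
}
```

Because every processed event schedules exactly one new event, the job population stays at n * n * m, which is what makes PHOLD a closed system.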
Linear Pipeline (4, p)
Open system
[Figure: a customer population feeding four service centers in series, followed by departure]
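A sequential sketch of the linear pipeline, assuming each service center is a single FCFS server with exponential service, arrivals are Poisson with rate 1, and the second parameter p is the server utilisation (mean service time p). These assumptions and the names are illustrative only.

```java
import java.util.Random;

/** Sequential sketch of an open linear pipeline of n FCFS single-server service centers. */
final class PipelineSketch {
    /** Mean time in system; p is assumed to be server utilisation (arrival rate 1, mean service time p). */
    static double meanSojourn(int n, double p, int customers, long seed) {
        Random rng = new Random(seed);
        double[] lastDeparture = new double[n];       // previous customer's departure time at each center
        double arrival = 0.0, total = 0.0;
        for (int c = 0; c < customers; c++) {
            arrival += expo(rng, 1.0);                 // Poisson arrivals from the customer population
            double t = arrival;
            for (int k = 0; k < n; k++) {
                double start = Math.max(t, lastDeparture[k]); // queue if the server is still busy
                t = start + expo(rng, p);
                lastDeparture[k] = t;
            }
            total += t - arrival;                      // customer departs the pipeline at time t
        }
        return total / customers;
    }

    static double expo(Random rng, double mean) { return -mean * Math.log(1.0 - rng.nextDouble()); }
}
```

Unlike PHOLD, customers leave the system after the last service center, so the population is not fixed.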
PHOLD (n x n, m)
[Figure: NMR vs. problem size (4 x 4, 8 x 8, 16 x 16) for CMB, + carrier-null, + flushing and + demand-driven null messaging, each with m = 1, 8, 16]
Linear Pipeline (n, p)
[Figure: NMR vs. problem size n (4, 8, 12, 16) for CMB / carrier-null, + flushing and + demand-driven null messaging, each with p = 0.2, 0.4, 0.6, 0.8]
Performance Summary
Percentage reduction in NMR:
PHOLD system
CMB + carrier-null: 30%
+ flushing: 42%
+ demand-driven null messaging: 55%
Linear Pipeline
CMB + carrier-null: 0%
+ flushing: 23%
+ demand-driven null messaging: 35%
Distributed Communications
SPaDES/Java originally used the Java RMI library to transmit messages between remote LPs, but the serialization phase is a bottleneck.
Previous performance optimization effort: message deflation.
The only way to further reduce remote communication overhead is to send fewer messages. How?
Target the null messages.
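The slides do not say how message deflation was implemented. One plausible reading, assumed here, is compressing each message's serialized bytes with `java.util.zip` before transmission; the class name `MessageDeflation` is hypothetical. The sketch also shows the limit of this approach: it shrinks each message, but every message is still serialized once.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Sketch: compress a message's serialized form before sending it to a remote LP. */
final class MessageDeflation {
    static byte[] deflate(Serializable message) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream also finishes and closes the DeflaterOutputStream.
        try (ObjectOutputStream out = new ObjectOutputStream(new DeflaterOutputStream(bytes))) {
            out.writeObject(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object inflate(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new InflaterInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```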
JavaSpaces
A Jini service developed by Sun Microsystems, Inc., built on top of Java RMI and modelled on a tuple space.
An abstract platform for developing complex distributed applications.
Distributed data persistence.
Holds objects, known as entries, with attributes of varying types.
Key concept: matching on attribute types and values.
JavaSpaces
[Figure: two clients interacting with a space through write, read and take operations; a notifier delivers notify events]
Four generic operations: write, read, take and notify.
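To make the four operations and attribute matching concrete, here is a toy, single-process tuple space. It is not the JavaSpaces API: the real `net.jini.space.JavaSpace` interface works on `Entry` objects and takes transaction, lease and timeout arguments. Here, as a simplifying assumption, entries are field maps, a template matches any entry with equal values for every field the template specifies, and `read`/`take` return immediately instead of blocking. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

/** Toy in-memory tuple space; NOT the JavaSpaces API. */
final class ToySpace {
    private final List<Map<String, Object>> entries = new ArrayList<>();
    private final List<Map.Entry<Map<String, Object>, Consumer<Map<String, Object>>>> listeners =
        new ArrayList<>();

    /** write: store an entry and notify every listener whose template matches it. */
    synchronized void write(Map<String, Object> entry) {
        entries.add(entry);
        for (Map.Entry<Map<String, Object>, Consumer<Map<String, Object>>> l : listeners)
            if (matches(l.getKey(), entry)) l.getValue().accept(entry);
    }

    /** read: return a matching entry without removing it, or null if none exists. */
    synchronized Map<String, Object> read(Map<String, Object> template) {
        for (Map<String, Object> e : entries) if (matches(template, e)) return e;
        return null;
    }

    /** take: remove and return a matching entry, or null if none exists. */
    synchronized Map<String, Object> take(Map<String, Object> template) {
        for (Iterator<Map<String, Object>> it = entries.iterator(); it.hasNext(); ) {
            Map<String, Object> e = it.next();
            if (matches(template, e)) { it.remove(); return e; }
        }
        return null;
    }

    /** notify: call the listener for every later write that matches the template. */
    synchronized void notifyOnWrite(Map<String, Object> template, Consumer<Map<String, Object>> listener) {
        listeners.add(Map.entry(template, listener));
    }

    /** Fields absent from the template act as wildcards; every specified field must be equal. */
    private static boolean matches(Map<String, Object> template, Map<String, Object> entry) {
        for (Map.Entry<String, Object> f : template.entrySet())
            if (!Objects.equals(f.getValue(), entry.get(f.getKey()))) return false;
        return true;
    }
}
```

For example, LP1 could collect process entries addressed to it with the template `Map.of("type", "SProcess", "receiver", 1)`, leaving entries for other receivers untouched.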
Distributed Communications
Replace the RMI communication module in SPaDES/Java with one built on a single JavaSpace.
Use a FrontEndSpace, which permits crash recovery of the entries in the space.
Processes and null messages transmitted between remote hosts pass through the FrontEndSpace as space entries.
Space Communications: Processes
[Figure: LP1 and LP2 exchanging SProcess entries through the space, shown at time = 0 and at time = t > 0; entries carry sender and receiver fields, e.g. sender = 2, receiver = 1]
Space Communications: Null Messages
[Figure: LP1 to LP4 exchanging NullMsg and Req entries through the space; the entries shown carry sender = 2]
Performance Evaluation: PHOLD(n x n, m)
[Figure: NMR vs. problem size (4 x 4, 8 x 8, 16 x 16) for RMI/JavaSpace on 1 processor, RMI on 4 and 8 processors, and JavaSpace on 4 and 8 processors, each with m = 1, 8, 16]
Overall Performance Evaluation: PHOLD(n x n, m)
[Figure: NMR vs. problem size (4 x 4, 8 x 8, 16 x 16) for CMB, + carrier-null, + flushing, + demand-driven null messaging, and JavaSpace on 4 and 8 processors, each with m = 1, 8, 16]
Performance Summary
Percentage reduction in NMR (PHOLD system):
CMB + carrier-null: 30%
+ flushing: 42%
+ demand-driven null messaging: 55%
JavaSpace (4 processors): 63%
JavaSpace (8 processors): 74%
Memory Requirement
$$M_{prob} = \sum_{i=1}^{n} \mathrm{MaxQueueSize}(LP_i)$$
$$M_{ord} = \sum_{i=1}^{n} \mathrm{MaxFELSize}(LP_i)$$
$$M_{sync} = \sum_{i=1}^{n} \mathrm{MaxNullMsgBufferSize}(LP_i)$$
Memory Requirement
Space usage for PIPELINE (16, p) and PHOLD (16x16, m):

                     PIPELINE (16, p)            PHOLD (16x16, m)
                     p=0.2  p=0.4  p=0.6  p=0.8  m=1   m=8   m=16
Mprob                98     192    320    740    256   2048  4096
Mord                 50     52     54     56     -     -     -
Msync (RMI)          331    341    348    352    665   651   638
Msync (JavaSpaces)   305    308    311    312    347   332   317
M (RMI)              479    585    722    1148   921   2699  4734
M (JavaSpaces)       453    552    685    1108   603   2380  4413
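The totals in the table are consistent with the overall memory requirement being the sum of the three components (an observation from the numbers rather than a formula stated on the slides); for PHOLD, where no Mord value is reported, the total equals Mprob + Msync. Two columns checked:

```latex
\begin{aligned}
M &= M_{prob} + M_{ord} + M_{sync} \\
\text{PIPELINE}(16, 0.2),\ \text{RMI:}\quad 98 + 50 + 331 &= 479 \\
\text{PHOLD}(16 \times 16, 16),\ \text{JavaSpaces:}\quad 4096 + 317 &= 4413
\end{aligned}
```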
Achievements & Conclusion
Enhanced the performance of SPaDES/Java through various synchronization protocol optimizations, achieving an NMR below 30%.
Implemented a new discrete-event simulation library based on distributed shared memory in a JavaSpace.
Implemented a TSA in SPaDES/Java that can serve as a test bench for memory usage studies in parallel simulations.
Acknowledgments
Port of Singapore Authority (PSA)
Ministry of Education, Singapore
Constructive feedback from referees