Parallel and Distributed Simulation
Time Parallel Simulation
Outline
• Space-Time Framework
• Time Parallel Simulation
• Parallel Cache Simulation
• Simulation Using Parallel Prefix
Space-Time Framework

A simulation computation can be viewed as computing the state of the physical processes in the system being modeled over simulated time.

Algorithm:
1. partition space-time region into non-overlapping regions
2. assign each region to a logical process
3. each LP computes state of physical system for its region, using inputs from other regions and producing new outputs to those regions
4. repeat step 3 until a fixed point is reached
[Figure: space-time diagram with simulated time on the horizontal axis and physical processes on the vertical axis; one region of the diagram is highlighted together with the inputs to that region. A second diagram shows space-parallel simulation (e.g., Time Warp): the physical processes are partitioned into horizontal strips, one per logical process (LP1-LP6), each spanning all of simulated time.]
Time Parallel Simulation
Basic idea:
• divide the simulated time axis into non-overlapping intervals
• each processor computes the sample path of the interval assigned to it
Observation: The simulation computation is a sample path through the set of possible states across simulated time.
[Figure: possible system states (vertical axis) versus simulated time (horizontal axis); the simulated time axis is divided into five non-overlapping intervals, assigned to processors 1-5, and each processor computes the sample path for its interval.]
Key question: What is the initial state of each interval (processor)?
Time Parallel Simulation: Relaxation Approach
1. guess initial state of each interval (processor)
2. each processor computes sample path of its interval
3. using final state of previous interval as initial state, “fix up” sample path
4. repeat step 3 until a fixed point is reached

Benefit: massively parallel execution
Liabilities: cost of “fix up” computation; convergence may be slow (worst case, N iterations for N processors); state may be complex
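The relaxation steps above can be sketched in a few lines of Python. This is an illustrative sketch, not from the slides: the function name `time_parallel_relax`, the `step(state, x)` transition function, and the use of a single common guess for every interval's initial state are all assumptions.

```python
def time_parallel_relax(step, interval_inputs, guess):
    """Time-parallel simulation via relaxation (hypothetical names).

    step(state, x) -> next state.  Each interval's sample path is computed
    from a guessed initial state; intervals are then "fixed up" from the
    final state of their predecessor until a fixed point is reached
    (worst case, N iterations for N intervals)."""
    n = len(interval_inputs)
    final = [None] * n
    # step 1: guess each interval's initial state (interval 0's guess is
    # taken to be the true initial state of the simulation)
    initial = [guess] * n
    while True:
        # steps 2-3: recompute every interval's sample path; these n loops
        # are independent, so each could run on its own processor
        new_final = []
        for init, xs in zip(initial, interval_inputs):
            state = init
            for x in xs:
                state = step(state, x)
            new_final.append(state)
        if new_final == final:     # step 4: fixed point reached -> done
            return final
        final = new_final
        # fix-up: interval i restarts from interval i-1's final state
        initial = [guess] + final[:-1]
```

With `step = max` the state is a running maximum, so `time_parallel_relax(max, [[3, 1], [2, 5], [4, 0]], 0)` converges to the per-interval final states `[3, 5, 5]` after the fix-up passes.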
Example: Cache Memory
• Cache holds a subset of the entire memory
  – Memory organized as blocks
  – Hit: referenced block in cache
  – Miss: referenced block not in cache
• Replacement policy determines which block to delete when requested data is not in cache (miss)
  – LRU: delete least recently used block from cache
• Implementation: Least Recently Used (LRU) stack
  – Stack contains addresses of memory blocks (block numbers)
  – For each memory reference in the input (memory reference trace):
    – if referenced address is in the stack (hit), move it to the top of the stack
    – if not in the stack (miss), place the address on top of the stack, deleting the address at the bottom
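The LRU stack update just described can be written as a short routine. A minimal sketch in Python; the function name `lru_simulate` and the default stack depth of 4 are assumptions (the depth matches the worked example on these slides).

```python
def lru_simulate(trace, depth=4, initial=None):
    """Count hits and misses for an LRU stack of the given depth.

    stack[0] is the most recently used block; stack[-1] is the least
    recently used block, which falls out of the cache on a miss."""
    stack = list(initial) if initial else []
    hits = misses = 0
    for addr in trace:
        if addr in stack:            # hit: move referenced block to the top
            hits += 1
            stack.remove(addr)
        else:                        # miss: block at the bottom falls out
            misses += 1
            if len(stack) == depth:
                stack.pop()
        stack.insert(0, addr)
    return hits, misses, stack
```

On processor 1's trace from the example that follows, `lru_simulate([1, 2, 1, 3, 4, 3, 6, 7])` returns 2 hits, 6 misses, and the final stack `[7, 6, 3, 4]`.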
Example: Trace-Driven Cache Simulation

Given a sequence of references to blocks in memory, determine the number of hits and misses using LRU replacement.

First iteration: assume each stack is initially empty.

processor 1
  address:    1    2    1    3    4    3    6    7
  LRU stack:  1--- 21-- 12-- 312- 4312 3412 6341 7634

processor 2
  address:    2    1    2    6    9    3    3    6
  LRU stack:  2--- 12-- 21-- 621- 9621 3962 3962 6392

processor 3
  address:    4    2    3    1    7    2    7    4
  LRU stack:  4--- 24-- 324- 1324 7132 2713 7213 4721

Second iteration: processor i uses the final state of processor i-1 as its initial state.

processor 1: (idle)

processor 2 (initial stack 7634)
  address:    2    1    2    6    9
  LRU stack:  2763 1276 2176 6217 9621   match! (same state as in the first iteration, so the rest of the sample path is unchanged)

processor 3 (initial stack 6392)
  address:    4    2    3    1
  LRU stack:  4639 2463 3246 1324   match! (same state as in the first iteration, so the rest of the sample path is unchanged)

Done!
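The two-iteration example above can be reproduced in code. A sketch under stated assumptions: the function names are hypothetical, and the fix-up pass below runs intervals in order (rather than as a second fully parallel iteration), stopping early as soon as a stack state matches the first pass.

```python
def lru_states(trace, initial, depth=4):
    """Return the list of LRU stack states after each reference."""
    stack, states = list(initial), []
    for addr in trace:
        if addr in stack:
            stack.remove(addr)
        elif len(stack) == depth:
            stack.pop()
        stack.insert(0, addr)
        states.append(list(stack))
    return states

def time_parallel_lru(traces, depth=4):
    """Time-parallel trace-driven cache simulation.

    First pass: every interval starts from an empty stack; these runs are
    independent and could execute in parallel.  Fix-up: interval i re-runs
    from its predecessor's final state, but stops as soon as its stack
    matches the first-pass state, since the rest of the path is unchanged."""
    first = [lru_states(t, [], depth) for t in traces]   # "parallel" pass
    paths = [first[0]]                                   # interval 0 is exact
    for i in range(1, len(traces)):
        stack = list(paths[i - 1][-1])                   # predecessor's final state
        fixed = []
        for j, addr in enumerate(traces[i]):
            if addr in stack:
                stack.remove(addr)
            elif len(stack) == depth:
                stack.pop()
            stack.insert(0, addr)
            fixed.append(list(stack))
            if fixed[-1] == first[i][j]:     # match! remainder is unchanged
                fixed += first[i][j + 1:]
                break
        paths.append(fixed)
    return paths
```

Running it on the three traces from the slide reproduces the matches at stacks 9621 and 1324 and the final stacks 7634, 6392, and 4721.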
Outline
• Space-Time Framework
• Time Parallel Simulation
• Parallel Cache Simulation
• Simulation Using Parallel Prefix
Time Parallel Simulation Using Parallel Prefix
Basic idea: formulate the simulation computation as a linear recurrence, and solve it using a parallel prefix computation.

Parallel prefix: given N values X1, X2, …, XN, compute the N initial products

P1 = X1
P2 = X1 • X2
…
Pi = X1 • X2 • X3 • … • Xi, for i = 1, …, N, where • is an associative operator
[Figure: parallel prefix over X1 … X8 in three steps:
step 1, add the value one position to the left, producing partial products 1-2, 2-3, 3-4, 4-5, 5-6, 6-7, 7-8;
step 2, add the value two positions to the left, producing 1-3, 1-4, 2-5, 3-6, 4-7, 5-8;
step 3, add the value four positions to the left, producing 1-5, 1-6, 1-7, 1-8; every element now holds its prefix product.]
Parallel prefix requires O(log N) time
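The doubling scheme in the figure can be sketched as follows; the updates within each step are independent and would run in parallel on a real machine, though here they are written as a sequential list comprehension. The function name `parallel_prefix` is an assumption.

```python
def parallel_prefix(xs, op):
    """Inclusive scan by doubling: in step k, each element combines with the
    partial product 2**k positions to its left, so all N elements finish
    after ceil(log2(N)) steps.  op must be associative."""
    ys = list(xs)
    shift = 1                      # 1, 2, 4, ... positions to the left
    while shift < len(ys):
        ys = [ys[i] if i < shift else op(ys[i - shift], ys[i])
              for i in range(len(ys))]
        shift *= 2
    return ys
```

With addition as the operator, `parallel_prefix([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b)` yields the prefix sums `[1, 3, 6, 10, 15, 21, 28, 36]` in three doubling steps.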
Example: G/G/1 Queue

Given:
• ri = interarrival time of the ith job
• si = service time of the ith job
Compute
• Ai = arrival time of the ith job
• Di = departure time of the ith job, for i=1, 2, 3, … N
Solution: rewrite equations as parallel prefix computations:
• Ai = Ai-1 + ri (= r1 + r2 + r3 + … + ri)
• Di = max (Di-1 , Ai ) + si
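A minimal sketch of the two recurrences, assuming Python and a hypothetical name `gg1_times`: the arrival times are an ordinary prefix sum, computed here with the standard-library scan; the departure times are computed sequentially, with a comment noting the (max, +) reformulation, a standard technique that the slides' "rewrite as parallel prefix" step relies on but does not spell out.

```python
from itertools import accumulate
import operator

def gg1_times(r, s):
    """Arrival and departure times for a G/G/1 queue.

    A_i = A_{i-1} + r_i is a plain prefix sum over the interarrival times,
    so it maps onto a parallel prefix directly.  D_i = max(D_{i-1}, A_i) + s_i
    is evaluated sequentially below; because (max, +) is associative, it can
    also be cast as a parallel prefix over that algebra."""
    A = list(accumulate(r, operator.add))    # A_i = r_1 + r_2 + ... + r_i
    D, prev = [], 0
    for a, svc in zip(A, s):
        prev = max(prev, a) + svc            # D_i = max(D_{i-1}, A_i) + s_i
        D.append(prev)
    return A, D
```

For interarrival times `[1, 2, 1]` and service times `[4, 1, 2]`, this gives arrivals `[1, 3, 4]` and departures `[5, 6, 8]`: job 2 waits behind job 1, while job 3 waits behind job 2.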
Summary of Time Parallel Algorithms
Pro:
• allows for massive parallelism
• often, little or no synchronization is required after spawning the parallel computations
• substantial speedups obtained for certain problems: queueing networks, caches, ATM multiplexers

Con:
• only applicable to a very limited set of problems