Blue Gene Simulator

Gengbin Zheng, Gunavardhan Kakulapati (kakulapa@uiuc.edu)

Parallel Programming Laboratory, Department of Computer Science

University of Illinois at Urbana-Champaign

http://charm.cs.uiuc.edu


Transcript of Blue Gene Simulator

Page 1: Blue Gene Simulator

Blue Gene Simulator

Gengbin Zheng

Gunavardhan Kakulapati

Parallel Programming Laboratory, Department of Computer Science

University of Illinois at Urbana-Champaign, http://charm.cs.uiuc.edu

Page 2: Blue Gene Simulator

Overview

Blue Gene Emulator

Blue Gene Simulator

Timing correction schemes

Performance and results

Page 3: Blue Gene Simulator

Emulation on a Parallel Machine

[Figure: diagram with the labels "Simulating (Host) Processor", "BG/C Nodes" and "Hardware thread".]

Page 4: Blue Gene Simulator

Blue Gene Emulator: functional view

[Figure: one Blue Gene/C node, with communication threads, worker threads, inBuffer, affinity message queues, non-affinity message queues and CorrectionQ.]

Page 5: Blue Gene Simulator

Blue Gene Emulator: functional view

[Figure: two emulated nodes, each with communication threads, worker threads, inBuff, affinity message queues, non-affinity message queues and CorrectionQ, together with the Converse scheduler and the Converse Q.]

Page 6: Blue Gene Simulator

What it is capable of …

Blue Gene API support

Blue Gene Charm++

– Structured Dagger

Trace Projections

Page 7: Blue Gene Simulator

Emulator to Simulator

Emulator:

– Study programming model and application development

Simulator:

– Performance prediction capability

– Models communication latency based on a network model

– Doesn’t model on-chip memory access or network contention

Page 8: Blue Gene Simulator

Simulator

Parallel performance is hard to model

– Communication subsystem

  Out-of-order messages

  Communication/computation overlap

– Event dependencies

Parallel Discrete Event Simulation

– Emulation program executes in parallel with event time stamp correction.

– Exploit inherent determinacy of application

Page 9: Blue Gene Simulator

How to simulate?

Time stamping events

– Per-thread timer (sharing one physical timer)

– Time stamp messages

  Calculate communication latency based on network model

Parallel event simulation

– When a message is sent out, calculate the predicted arrival time for the destination bluegene-processor

– When a message is received, update current time: currTime = max(currTime, recvTime)

– Time stamp correction
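
A minimal sketch of how per-thread timers could share one physical timer, assuming that a simulated thread's virtual time advances only while that thread is the one running on the host. The class `SharedClockTimers`, the method `switch_to` and the injected `clock` callable are hypothetical names for illustration and are not taken from the emulator.

```python
class SharedClockTimers:
    """Per-thread virtual timers driven by a single physical clock (sketch)."""

    def __init__(self, clock):
        self.clock = clock      # callable returning the physical time
        self.virtual = {}       # thread id -> accumulated virtual time
        self.running = None     # thread currently on the host processor
        self.started = 0.0      # physical time when it was switched in

    def switch_to(self, tid):
        now = self.clock()
        if self.running is not None:
            # Charge the elapsed physical time to the thread that was running.
            self.virtual[self.running] += now - self.started
        self.virtual.setdefault(tid, 0.0)
        self.running, self.started = tid, now


# A fake clock keeps the example deterministic.
ticks = iter([0.0, 3.0, 4.0, 9.0])
timers = SharedClockTimers(clock=lambda: next(ticks))
for tid in ["t0", "t1", "t0", "t1"]:
    timers.switch_to(tid)
```

In a real run the `clock` argument would be something like `time.perf_counter`; the fake clock here only makes the accounting easy to follow.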

Page 10: Blue Gene Simulator

Time Stamping messages and threads

Thread timer: curT

Message sent: RecvT(msg) = curT + Latency

Message scheduled: curT = max(curT, RecvT(msg))
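
The two rules above can be sketched in a few lines of Python. The names `ThreadTimer`, `Message`, `compute`, `send` and `schedule` are hypothetical and chosen for illustration only; the latency is passed in directly rather than computed from a network model.

```python
from dataclasses import dataclass


@dataclass
class Message:
    payload: str
    recv_t: float = 0.0   # predicted arrival time, RecvT(msg)


class ThreadTimer:
    """Virtual clock of one simulated thread (curT on the slide)."""

    def __init__(self):
        self.cur_t = 0.0

    def compute(self, duration):
        # Local computation advances the thread's own clock.
        self.cur_t += duration

    def send(self, msg, latency):
        # Message sent: RecvT(msg) = curT + Latency
        msg.recv_t = self.cur_t + latency
        return msg

    def schedule(self, msg):
        # Message scheduled: curT = max(curT, RecvT(msg))
        self.cur_t = max(self.cur_t, msg.recv_t)


sender, receiver = ThreadTimer(), ThreadTimer()
sender.compute(5.0)
m = sender.send(Message("hello"), latency=2.0)
receiver.schedule(m)
```

Here the receiver was idle at time 0, so scheduling the message moves its clock forward to the message's predicted arrival time.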

Page 11: Blue Gene Simulator

Need for timestamp correction

Time stamp correction is needed for out-of-order messages

Out-of-order delivery can occur:

– A message arrives late while some other message updates the thread time to the future

– So the late message executes in the context of the future, although its predicted time is earlier
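
A small sketch of the problem, assuming each message takes one time unit to execute and the thread clock follows curT = max(curT, RecvT(msg)). The function `run` and the message names are hypothetical.

```python
def run(messages, exec_time=1.0):
    """Return {name: start_time} when messages are scheduled in list order."""
    cur_t = 0.0
    starts = {}
    for name, recv_t in messages:
        cur_t = max(cur_t, recv_t)   # thread clock jumps to the arrival time
        starts[name] = cur_t
        cur_t += exec_time
    return starts


# "A" is predicted to arrive at t=3 but is delivered after "B" (t=10).
delivered = [("B", 10.0), ("A", 3.0)]
uncorrected = run(delivered)
corrected = run(sorted(delivered, key=lambda m: m[1]))
```

Without correction, "A" starts at 11.0 because "B" has already pushed the clock into the future; ordering by predicted receive time lets "A" start at 3.0.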

Page 12: Blue Gene Simulator

Parallel correction algorithm

Sort message execution by receive time

Adjust time stamps when needed

Use correction messages to inform of the change in event startTime

Send out correction messages following the path along which the message was sent

The events already in the timeline may have to move
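
The steps above can be sketched for a single timeline as follows. This is an illustrative simplification, not the simulator's code: `Event` and `insert_and_correct` are hypothetical names, and the returned dictionary stands in for the correction messages that would be sent along the paths of the affected events' outgoing messages.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    recv_time: float
    duration: float
    start_time: float = 0.0

    @property
    def end_time(self):
        return self.start_time + self.duration


def insert_and_correct(timeline, new_event):
    """Insert an event, keep the timeline sorted by receive time, and
    return {event name: shift in startTime} for events that had to move."""
    old = {e.name: e.start_time for e in timeline}
    timeline.append(new_event)
    timeline.sort(key=lambda e: e.recv_time)
    prev_end = 0.0
    corrections = {}
    for e in timeline:
        e.start_time = max(e.recv_time, prev_end)
        prev_end = e.end_time
        if e.name in old and e.start_time != old[e.name]:
            corrections[e.name] = e.start_time - old[e.name]
    return corrections


timeline = []
insert_and_correct(timeline, Event("M1", recv_time=0.0, duration=2.0))
insert_and_correct(timeline, Event("M2", recv_time=5.0, duration=2.0))
# M3 is delivered last but is predicted to arrive before M2.
corrections = insert_and_correct(timeline, Event("M3", recv_time=4.0, duration=2.0))
order = [e.name for e in timeline]
```

M3 is placed before M2, which pushes M2's start from 5.0 to 6.0; that shift is what a correction message would carry to the receivers of anything M2 sent.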

Page 13: Blue Gene Simulator

Timestamps Correction

[Figure: execution timeline with messages M1 to M7 along the RecvTime axis, and message M8 shown separately.]

Page 14: Blue Gene Simulator

Timestamps Correction

[Figure: execution timeline with messages M1 to M8 along the RecvTime axis.]

Page 15: Blue Gene Simulator

Timestamps Correction

[Figure: two views of the execution timeline along the RecvTime axis, first with M1 to M7 and M8 separate, then with M8 placed in the timeline and a correction message sent.]

Page 16: Blue Gene Simulator

Timestamps Correction

[Figure: three views of the execution timeline (M1 to M7) along the RecvTime axis as a correction message for M4 is applied, with further correction messages sent out.]

Page 17: Blue Gene Simulator

Linear-order correction

Works only when

– Programs have no alternate orders of execution possible

– Messages are processed in the same order for multiple executions

– E.g.: MPI programs with no wildcard receives, structured-dagger code with no “overlap” or “forall”.

Page 18: Blue Gene Simulator

Reasons:

Correction algorithm breaks dependency logic

– Only based on receive time

– Cases:

  When an event depends on several messages

  – Last message triggers the computation

  Message buffered until some condition holds

Example for invalid correction scheme: Jacobi-1D

Page 19: Blue Gene Simulator


Page 20: Blue Gene Simulator

Solution

Use structured dagger to retrieve dependence information

As the program runs, form a chain of bluegene logs preserving the dependency information.

Bluegene logs for entry functions and structured dagger functions

Page 21: Blue Gene Simulator

Timestamp correction scheme

Every event has a list of backward and forward dependents.

An event cannot start till its backward dependents have finished.

Define effRecvTime = max(recvTime, endOfBackDeps)

An event can start only after its effRecvTime.

startTime = max(effRecvTime, timeline.last.endTime)
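
A minimal sketch of these two formulas. The `Event` class, the `back_deps` list and `append_event` are hypothetical names; placing the dependency on a second timeline is an assumption made only so that effRecvTime and timeline.last.endTime differ in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    recv_time: float
    duration: float
    back_deps: list = field(default_factory=list)   # backward dependents
    start_time: float = 0.0

    @property
    def end_time(self):
        return self.start_time + self.duration

    @property
    def eff_recv_time(self):
        # effRecvTime = max(recvTime, endOfBackDeps)
        end_of_back_deps = max((d.end_time for d in self.back_deps), default=0.0)
        return max(self.recv_time, end_of_back_deps)


def append_event(timeline, event):
    # startTime = max(effRecvTime, timeline.last.endTime)
    last_end = timeline[-1].end_time if timeline else 0.0
    event.start_time = max(event.eff_recv_time, last_end)
    timeline.append(event)


other, local = [], []
a = Event("a", recv_time=1.0, duration=4.0)
append_event(other, a)                        # a runs from 1.0 to 5.0
c = Event("c", recv_time=0.0, duration=3.0)
append_event(local, c)                        # c runs from 0.0 to 3.0
b = Event("b", recv_time=2.0, duration=1.0, back_deps=[a])
append_event(local, b)
```

Although `b` is received at 2.0 and the local timeline is free at 3.0, it cannot start before its backward dependent `a` finishes at 5.0.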

Page 22: Blue Gene Simulator

Timestamp correction scheme

Timeline is not sorted on the recvTime of the event like the previous case.

Timeline is sorted based on the effRecvTime.

Steps to process a correction message

– Find the earliest updated event due to the message

– Cut the timeline from that event

– Calculate new effRecvTimes from then.

– Reinsert into the timeline in the order of effRecvTime
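
The four steps can be sketched as below, under simplifying assumptions: events are plain dictionaries, all backward dependencies live on the same timeline, and a correction only changes one event's receive time. The functions `eff_recv`, `rebuild` and `process_correction` are hypothetical names, not the simulator's API.

```python
def eff_recv(event, done):
    """effRecvTime = max(recvTime, end time of all backward dependencies)."""
    return max([event["recv"]] + [done[d]["end"] for d in event["deps"]])


def rebuild(kept, pending):
    """Reinsert pending events after the kept prefix, in effRecvTime order."""
    timeline = list(kept)
    done = {e["name"]: e for e in timeline}
    pending = list(pending)
    while pending:
        # Only events whose backward dependencies are already placed are ready.
        ready = [e for e in pending if all(d in done for d in e["deps"])]
        nxt = min(ready, key=lambda e: eff_recv(e, done))
        nxt["eff"] = eff_recv(nxt, done)
        last_end = timeline[-1]["end"] if timeline else 0.0
        nxt["start"] = max(nxt["eff"], last_end)
        nxt["end"] = nxt["start"] + nxt["dur"]
        timeline.append(nxt)
        done[nxt["name"]] = nxt
        pending.remove(nxt)
    return timeline


def process_correction(timeline, name, new_recv):
    idx = next(i for i, e in enumerate(timeline) if e["name"] == name)
    # Earliest event that can be affected: the corrected event's old slot, or
    # the first slot whose effRecvTime is later than the new receive time.
    first_later = next(
        (i for i, e in enumerate(timeline) if e["eff"] > new_recv), len(timeline)
    )
    cut = min(idx, first_later)
    timeline[idx]["recv"] = new_recv
    return rebuild(timeline[:cut], timeline[cut:])   # cut, recompute, reinsert


events = [
    {"name": "A", "recv": 0.0, "dur": 2.0, "deps": []},
    {"name": "B", "recv": 3.0, "dur": 2.0, "deps": []},
    {"name": "C", "recv": 4.0, "dur": 1.0, "deps": ["A"]},
]
timeline = rebuild([], events)
before = [e["name"] for e in timeline]
timeline = process_correction(timeline, "B", 6.0)
after = [e["name"] for e in timeline]
```

When the correction moves B's receive time from 3.0 to 6.0, the timeline is cut at B, and C (effRecvTime 4.0) is reinserted ahead of it and can start a full time unit earlier.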

Page 23: Blue Gene Simulator

Non-linear order correction scheme

The new scheme:

– Takes into account the event dependencies

– Works even when messages can be received in different orders in different runs.

– Requires all the dependencies to be captured using structured dagger.

But the timing correction is very slow.

Several optimizations possible.

Page 24: Blue Gene Simulator

Optimizations to online correction scheme

Overwrite old corrections:

– An event can get multiple correction messages.

– Reduce the number of corrections

– Same scheme if correction message arrives earlier than the message itself

Use multisend

– Messages destined to same real processor but different events can be sent collectively.
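
A rough sketch of both optimizations, assuming corrections are keyed by an event identifier and that a lookup from event to real (host) processor is available. `CorrectionBuffer`, `group_by_processor` and the `host` mapping are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


class CorrectionBuffer:
    """Keeps only the latest correction per event (older ones are overwritten)."""

    def __init__(self):
        self.pending = {}   # event id -> latest corrected receive time

    def add(self, event_id, new_recv_time):
        # This also covers a correction that arrives before the message itself:
        # it simply waits here under the event id.
        self.pending[event_id] = new_recv_time


def group_by_processor(corrections, processor_of):
    """Batch corrections per real processor so each batch can go out as one send."""
    batches = defaultdict(list)
    for event_id, recv_time in corrections:
        batches[processor_of(event_id)].append((event_id, recv_time))
    return dict(batches)


buf = CorrectionBuffer()
buf.add("e1", 5.0)
buf.add("e2", 7.0)
buf.add("e1", 6.0)            # overwrites the earlier correction for e1

host = {"e1": 0, "e2": 0, "e3": 1}   # assumed event -> real processor mapping
batches = group_by_processor(
    [("e1", 6.0), ("e2", 7.0), ("e3", 2.0)], processor_of=host.get
)
```

Three corrections arrive, but only two remain pending, and the three outgoing corrections collapse into two sends, one per real processor.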

Page 25: Blue Gene Simulator

More optimizations

Prioritize messages based on their predicted recvTime.

Lazy processing

– Process correction messages periodically.

– Allows corrections to be overwritten.

Batch processing

– Process many correction messages at a time

– Many events will be affected

– Choose the earliest and reinsert in the order of effRecvTime.

Ability to start corrections in the middle

– Can ignore the startup events for timing correction
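
One way to sketch prioritization and batch processing is a heap keyed on the predicted recvTime that is drained a batch at a time. `PrioritizedQueue` and `drain_batch` are hypothetical names; the sketch uses Python's standard `heapq` module and does not reflect how the Converse scheduler orders messages.

```python
import heapq
import itertools


class PrioritizedQueue:
    """Message queue ordered by predicted recvTime (earliest first)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker keeps insertion order

    def push(self, recv_time, msg):
        heapq.heappush(self._heap, (recv_time, next(self._seq), msg))

    def pop(self):
        recv_time, _, msg = heapq.heappop(self._heap)
        return recv_time, msg

    def __len__(self):
        return len(self._heap)


def drain_batch(queue, batch_size):
    """Take up to batch_size messages, earliest predicted recvTime first."""
    batch = []
    while len(queue) and len(batch) < batch_size:
        batch.append(queue.pop())
    return batch


q = PrioritizedQueue()
q.push(9.0, "c")
q.push(2.0, "a")
q.push(5.0, "b")
batch = drain_batch(q, batch_size=2)
earliest = batch[0][0]   # the timeline would be cut at the earliest affected event
```

Draining in batches means the timeline only needs to be cut once per batch, at the earliest affected event, rather than once per correction message.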

Page 26: Blue Gene Simulator

Timing correction still very slow.

Observations:

– Don’t let the execution go far ahead of the correction wave.

– A large difference means many wrong events to be corrected.

– Closely following the execution wave also may not help.

A new scheme

– Similar to the one used for GVT (global virtual time)

Page 27: Blue Gene Simulator

GVT-like scheme

Use heartbeat

– Periodically broadcast asking for gvt

Gvt

– Is the time after which the events are invalid due to pending corrections

– Compute the gvt as the minimum of the predicted recvTimes of all correction messages and the startTimes of all affected events.

Use a parameter “leash”.

Execution of the program cannot go beyond “gvt + leash”
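
A minimal sketch of the gvt computation and the leash check, assuming each processor reports a local minimum in response to the heartbeat and the global gvt is the minimum over those reports. The function names and the sample values are hypothetical.

```python
def local_gvt(correction_recv_times, affected_start_times):
    """Minimum over pending corrections' predicted recvTimes and the
    startTimes of affected events; infinity if nothing is pending."""
    candidates = list(correction_recv_times) + list(affected_start_times)
    return min(candidates) if candidates else float("inf")


def global_gvt(local_values):
    """Combine the per-processor answers to one heartbeat broadcast."""
    return min(local_values)


def may_execute(event_start_time, gvt, leash):
    """Execution may not run beyond gvt + leash."""
    return event_start_time <= gvt + leash


p0 = local_gvt(correction_recv_times=[12.0, 8.0], affected_start_times=[9.5])
p1 = local_gvt(correction_recv_times=[], affected_start_times=[])
gvt = global_gvt([p0, p1])
allowed = may_execute(12.0, gvt, leash=5.0)
blocked = may_execute(14.0, gvt, leash=5.0)
```

With a gvt of 8.0 and a leash of 5.0, an event at 12.0 may run while one at 14.0 has to wait until the corrections catch up.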

Page 28: Blue Gene Simulator

Projections before correction

Page 29: Blue Gene Simulator

Projections after correction

Page 30: Blue Gene Simulator

Correctness of the scheme (using Jacobi1D)

Page 31: Blue Gene Simulator

Predicted time vs latency factor

Page 32: Blue Gene Simulator

Predicted speedup

Page 33: Blue Gene Simulator

More work

Ongoing work

– Make sure gvt scheme is correct

Future work

– The presented scheme is on-line correction

– Explore the off-line (post-mortem) correction scheme using generated traces.