Improving the Scalability of Comparative Debugging with MRNet


Page 1: Improving the Scalability of Comparative Debugging with MRNet

Improving the Scalability of Comparative Debugging with MRNet

MeSsAGE Lab (Monash Uni.)

• David Abramson

• Minh Ngoc Dinh

• Jin Chao

Cray Inc.

• Luiz DeRose

• Robert Moench

• Andrew Gontarek

Jin Chao

Page 2: Improving the Scalability of Comparative Debugging with MRNet

Outline

Assertion-based comparative debugging

The architecture of Guard

Improving the scalability of Guard with MRNet

Performance evaluation

Conclusion and future work

Page 3: Improving the Scalability of Comparative Debugging with MRNet

Assertion-based Comparative Debugging

Page 4: Improving the Scalability of Comparative Debugging with MRNet

Challenges faced by parallel debugging (1)

Cognitive challenge: large datasets on popular supercomputers

Cray Blue Waters:
• > 1.5 PB aggregate memory
• > 380,000 CPU cores

A large set of scientific data that is not human-readable:
• multi-dimensional
• floating-point

What is the correct state?

Page 5: Improving the Scalability of Comparative Debugging with MRNet

Challenges faced by parallel debugging (2)

Control-flow based parallel debuggers

Limitations of visualization tools: errors in visualized presentations are hard to detect

Page 6: Improving the Scalability of Comparative Debugging with MRNet

Comparative debugging

Data-centric debugging: focusing on the data sets in parallel programs

Comparing state between programs:
• Porting codes across different platforms
• Re-writing codes in different languages
• Software evolution: modifying/improving existing codes

Comparing state with users’ expectations:
• Invariants based on the properties of scientific modelling or mathematical theories
• “Verifying” the correctness of the computation’s state

Page 7: Improving the Scalability of Comparative Debugging with MRNet

Assertion for Comparative debugging

“An assertion is a statement about an intended state of a system’s component.”

Assertion in Guard: an ad hoc, debug-time assertion. Simple assertion:

assert $a::p@“source.c”:33 = 5000

Incorrect code:
31: for (j = 0; j < n; j++) {
32:   for (i = 0; i < m; i++) {
33:     p[j][j] = 5000; }}

Correct code:
31: for (j = 0; j < n; j++) {
32:   for (i = 0; i < m; i++) {
33:     p[j][i] = 5000; }}

Page 8: Improving the Scalability of Comparative Debugging with MRNet

Assertion for Comparative debugging

“An assertion is a statement about an intended state of a system’s component.”

Assertion in Guard: an ad hoc, debug-time assertion. Simple assertion: assert $a::var@“source.c”:34 = 1024

Comparative assertions: observing the divergence in the key data structures as the programs execute

assert $a::p_array@“prog1.c”:34 = $b::p_array@“prog2.c”:37

Statistical assertions: verifying the statistical properties of scientific modelling or mathematical theories:
• standard deviation
• histogram

Page 9: Improving the Scalability of Comparative Debugging with MRNet

Example of statistical assertion: histogram

Histogram: a user-defined abstract data model, created with a two-phase operation

• create $model = randset(Gaussian, 100000, 0.05)
• set reduce histogram(1000, 0.0, 1.0)
• assert $a::[email protected]:10 ~ $model < 0.02

estimate operator (~): χ² goodness-of-fit test (a standalone sketch follows)
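As an illustration only, and not Guard's implementation, the following self-contained C++ sketch builds histograms with the same bucket layout as "set reduce histogram(1000, 0.0, 1.0)" and compares them with a χ² statistic; the helper names, the normalization by sample size, and the Gaussian mean of 0.5 are our assumptions.

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Bucket a data set into nbuckets equal-width bins over [lo, hi),
// mirroring "set reduce histogram(1000, 0.0, 1.0)".
std::vector<double> histogram(const std::vector<double>& data,
                              int nbuckets, double lo, double hi) {
    std::vector<double> h(nbuckets, 0.0);
    for (double x : data) {
        int b = static_cast<int>((x - lo) / (hi - lo) * nbuckets);
        if (b >= 0 && b < nbuckets) h[b] += 1.0;
    }
    return h;
}

// Chi-squared statistic between an observed histogram and a model
// histogram, normalized by the sample size (our assumption) so it can be
// compared against a small user threshold such as 0.02.
double chi_squared(const std::vector<double>& observed,
                   const std::vector<double>& model) {
    double chi2 = 0.0, n = 0.0;
    for (size_t i = 0; i < observed.size(); ++i) {
        n += observed[i];
        if (model[i] > 0.0) {
            double d = observed[i] - model[i];
            chi2 += d * d / model[i];
        }
    }
    return chi2 / n;
}

int main() {
    // Stand-in for "randset(Gaussian, 100000, 0.05)": 100,000 samples with
    // sigma 0.05; the mean of 0.5 is an assumption to keep data in [0, 1).
    std::mt19937 gen(42);
    std::normal_distribution<double> gauss(0.5, 0.05);
    std::vector<double> program_data, model_data;
    for (int i = 0; i < 100000; ++i) {
        program_data.push_back(gauss(gen));
        model_data.push_back(gauss(gen));
    }
    double stat = chi_squared(histogram(program_data, 1000, 0.0, 1.0),
                              histogram(model_data, 1000, 0.0, 1.0));
    std::printf("normalized chi^2 = %g (assert passes if < 0.02)\n", stat);
    return 0;
}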

Page 10: Improving the Scalability of Comparative Debugging with MRNet

Implementation of assertions
An assertion is compiled into a dataflow graph (an illustrative encoding follows the figure)

Dataflow machine

set reduce “sum”
assert $a::p_array@“prog.c”:34 > 1,000,000

[Figure: dataflow graph with nodes PROCSET($a), Set BKPT, Go, BKPT Hit, Get VAR, Reduce Sum(p_array), Compare (against 1,000,000) and Exit]
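Purely as an illustration of the compiled form (the node names come from the figure above; the data structure, the edge layout and the firing order are our assumptions, not Guard's internals), the graph for this assertion could be encoded like this:

#include <cstdio>
#include <string>
#include <vector>

// One node of the compiled dataflow graph; it fires once its inputs arrive
// and pushes its output to the listed consumer nodes.
struct DataflowNode {
    std::string op;              // operation performed by the node
    std::vector<int> consumers;  // indices of nodes fed by this node
};

// Illustrative graph for:
//   set reduce "sum"
//   assert $a::p_array@"prog.c":34 > 1,000,000
std::vector<DataflowNode> build_graph() {
    return {
        {"ProcSet($a)",          {1}},   // bind the set of debugged processes
        {"SetBkpt(prog.c:34)",   {2}},   // plant the breakpoint
        {"Go",                   {3}},   // resume execution
        {"BkptHit",              {4}},   // wait until every process stops
        {"GetVar(p_array)",      {5}},   // fetch the variable from each process
        {"ReduceSum",            {6}},   // sum the per-process values
        {"Compare(> 1,000,000)", {7}},   // evaluate the assertion predicate
        {"Exit",                 {}},    // report the verdict and finish
    };
}

int main() {
    for (const DataflowNode& n : build_graph())
        std::printf("%s\n", n.op.c_str());
    return 0;
}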

Page 11: Improving the Scalability of Comparative Debugging with MRNet

Dataflow graphs

Page 12: Improving the Scalability of Comparative Debugging with MRNet

The Architecture of Guard

Page 13: Improving the Scalability of Comparative Debugging with MRNet

Features of Guard (CCDB)

A general parallel debugger: supporting C, Fortran and MPI

A relative debugger: comparative assertions and statistical assertions

Client/server structure:
• machine-independent command-line client
• Visual Studio client for Windows
• Sun ONE Studio and IBM Eclipse

Supporting servers on different architectures:
• Unix: Sun (Solaris), x86 (Linux), IBM RS/6000 (AIX)
• Windows: Visual Studio .NET

Architecture Independent Format (AIF)

Page 14: Improving the Scalability of Comparative Debugging with MRNet

The architecture of Guard

[Diagram: the front-end (debug client with a command-line interface and the relative debugger) communicates over a socket network with back-end debug servers s0…sn, one per application process p0…pn, each driving a GDB instance]

Page 15: Improving the Scalability of Comparative Debugging with MRNet

Features of MRNet

Tree-Based Overlay Networks (TBONs)

Scalable broadcast and gather

Custom data aggregation

Page 16: Improving the Scalability of Comparative Debugging with MRNet

General purpose API of MRNet

User-defined tree topology. Topology file: k-ary, k-nomial and tailored layouts

Communicator: a set of back-ends

Stream: a logical data channel over a communicator; multicast, gather and custom reduction

Packet: a collection of data

Filter: modifies data transferred across a stream; synchronization modes WaitForAll, TimeOut, NoWait

Startup: launching back-end processes (a minimal front-end sketch using these abstractions follows)
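A minimal front-end sketch of these abstractions, following the API described in the MRNet user guide (Network::CreateNetworkFE, get_BroadcastCommunicator, new_Stream with the built-in TFILTER_SUM and SFILTER_WAITFORALL filters); exact signatures and return conventions vary between MRNet releases, the matching back-end program is omitted, and the topology file name, back-end binary name and tag are placeholders:

#include <cstdio>
#include "mrnet/MRNet.h"

using namespace MRN;

int main() {
    const int TAG_SUM = FirstApplicationTag;   // application-defined message tag

    // Instantiate the tree from a user-defined topology file and launch the
    // back-end executable (placeholder name) on every leaf of the tree.
    const char* be_argv[] = { NULL };
    Network* net = Network::CreateNetworkFE("topology.txt", "guard_be", be_argv);
    if (net == NULL || net->has_Error()) return 1;

    // A communicator naming every back-end, and a stream over it that sums
    // integer values on the way up and waits for all children at each node.
    Communicator* comm = net->get_BroadcastCommunicator();
    Stream* stream = net->new_Stream(comm, TFILTER_SUM, SFILTER_WAITFORALL);

    // Multicast one request packet down the tree ...
    stream->send(TAG_SUM, "%d", 1);
    stream->flush();

    // ... and receive a single, already-reduced reply at the front-end.
    int tag = 0, sum = 0;
    PacketPtr pkt;
    if (stream->recv(&tag, pkt) != -1) {
        pkt->unpack("%d", &sum);
        std::printf("sum over all back-ends = %d\n", sum);
    }

    delete net;   // tears the tree down
    return 0;
}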

Page 17: Improving the Scalability of Comparative Debugging with MRNet

Improving the scalability of Guard with MRNet

Page 18: Improving the Scalability of Comparative Debugging with MRNet

The architecture of Guard with MRNet

[Diagram: the front-end stacks the debug client and command-line interface on the comparative debugger, which reaches the MRNet front-end through a C wrapper; MRNet communication processes (CP) running the Guard filter form the internal tree, and each MRNet back-end hosts a debug server s0…sn driving a GDB instance per application process p0…pn]

Page 19: Improving the Scalability of Comparative Debugging with MRNet

Communication with MRNet
Communication patterns in Guard as a general parallel debugger:

Topology: balanced, k-nomial
Placement of MRNet internal nodes: requiring no additional resources

procset: Communicator

Channels for commands, events and I/O:
• command stream: synchronous channel
• event and I/O streams: asynchronous channels

Aggregating redundant messages (sketched below):
• WaitForAll filter: synchronous channel
• TimeOut filter: asynchronous channel
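The aggregation idea, shown here independently of MRNet's actual filter interface (the types and the grouping-by-payload strategy below are our illustration): an internal tree node gathers the messages waiting from its children and forwards one message per distinct payload, tagged with the ranks that produced it.

#include <cstdio>
#include <map>
#include <string>
#include <vector>

// A debug event travelling up the tree: originating rank plus its text.
struct Message {
    int rank;
    std::string payload;   // e.g. "breakpoint hit at prog.c:34"
};

// Merge identical payloads from all children into one upward message each,
// keeping the ranks so the front-end still knows who reported what.
std::vector<std::pair<std::string, std::vector<int>>>
aggregate(const std::vector<Message>& inbox) {
    std::map<std::string, std::vector<int>> groups;
    for (const Message& m : inbox)
        groups[m.payload].push_back(m.rank);
    std::vector<std::pair<std::string, std::vector<int>>> out(groups.begin(),
                                                              groups.end());
    return out;
}

int main() {
    std::vector<Message> inbox = {
        {0, "breakpoint hit at prog.c:34"},
        {1, "breakpoint hit at prog.c:34"},
        {2, "signal SIGFPE"},
        {3, "breakpoint hit at prog.c:34"},
    };
    for (const auto& g : aggregate(inbox))
        std::printf("%s  (from %zu ranks)\n", g.first.c_str(), g.second.size());
    return 0;
}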

Page 20: Improving the Scalability of Comparative Debugging with MRNet

Creating a communication tree of MRNet

Invoking a debug session on Cray:

[Diagram: the Guard client, aprun/apinit launching the application processes p0…pn, a launch helper agent starting the debug servers s0…sn (each driving GDB), and the MRNet front-end that connects them]

Page 21: Improving the Scalability of Comparative Debugging with MRNet

Comparative assertion with MRNet (1)

assert $a::p_array@“prog1.c”:34 = $b::p_array@“prog2.c”:37

[Diagram: the Guard client drives two MRNet trees, FE0 and FE1, one per program; each tree's communication processes (CP) fan out to MRNet back-ends hosting the debug servers s0…sn and s0…sm of the two programs]

Page 22: Improving the Scalability of Comparative Debugging with MRNet

Comparative assertion with MRNet (2)
Centralized comparison:

[Diagram: each program's debug servers ship their local blocks d0…d3 up their own MRNet tree; the front-end reconstructs both global arrays and compares them]

Page 23: Improving the Scalability of Comparative Debugging with MRNet

Data Reconstruction

Currently supports regular decompositions. Blockmap:

Assertions require the debugger to understand data decomposition

[Diagram: the blocks held by processes P1…P4 are mapped back into the global arrays p_array and s_array]

blockmap test(P::V)
  define distribute(block, *)
  define data(1024, 1024)
end
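As a sketch of what reconstruction involves (our own, simplified to the row-block case of distribute(block, *); Guard's blockmap language above also covers other regular layouts), each process's local block is copied back to its place in a single global array so the two programs' arrays can be compared index for index:

#include <cstdio>
#include <vector>

// One process's contribution: its rank and its contiguous block of rows
// from a (block, *) distribution of an nrows x ncols array.
struct Block {
    int rank;
    std::vector<double> rows;   // (nrows / nranks) * ncols values
};

// Reassemble the global array, assuming rank r owns rows
// [r * chunk, (r + 1) * chunk) and that nranks divides nrows evenly.
std::vector<double> reconstruct(const std::vector<Block>& blocks,
                                int nrows, int ncols, int nranks) {
    const int chunk = nrows / nranks;
    std::vector<double> global(static_cast<size_t>(nrows) * ncols, 0.0);
    for (const Block& b : blocks) {
        const size_t offset = static_cast<size_t>(b.rank) * chunk * ncols;
        for (int i = 0; i < chunk * ncols; ++i)
            global[offset + i] = b.rows[i];
    }
    return global;
}

int main() {
    const int NROWS = 8, NCOLS = 4, NRANKS = 4;
    std::vector<Block> blocks;
    for (int r = 0; r < NRANKS; ++r)   // fill rank r's rows with the value r
        blocks.push_back({r, std::vector<double>(NROWS / NRANKS * NCOLS, r)});
    std::vector<double> g = reconstruct(blocks, NROWS, NCOLS, NRANKS);
    std::printf("first element of row 0: %.0f, of last row: %.0f\n",
                g[0], g[(NROWS - 1) * NCOLS]);
    return 0;
}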

Page 24: Improving the Scalability of Comparative Debugging with MRNet

Comparative assertion with MRNet (3)

Point-to-point comparison:

[Diagram: point-to-point comparison: corresponding blocks d0…d3 of the two programs are compared in a distributed fashion, and only the comparison results r0…r3 travel up the two MRNet trees to the front-end]
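A sketch of the per-block comparison that this distributes (our own types and tolerance handling; how Guard pairs and ships the blocks is not shown): each pair of corresponding blocks is compared element by element, and only a small verdict record needs to be forwarded.

#include <cmath>
#include <cstdio>
#include <vector>

// Per-rank verdict forwarded up the tree instead of the raw data.
struct CompareResult {
    int rank;
    int mismatches;
    double max_abs_diff;
};

// Compare one rank's block of program $a against the matching block of
// program $b, element by element, within a user-chosen tolerance.
CompareResult compare_block(int rank,
                            const std::vector<double>& a,
                            const std::vector<double>& b,
                            double tolerance) {
    CompareResult r = {rank, 0, 0.0};
    for (size_t i = 0; i < a.size() && i < b.size(); ++i) {
        double d = std::fabs(a[i] - b[i]);
        if (d > r.max_abs_diff) r.max_abs_diff = d;
        if (d > tolerance) ++r.mismatches;
    }
    return r;
}

int main() {
    std::vector<double> a = {1.0, 2.0, 3.0, 4.0};
    std::vector<double> b = {1.0, 2.0, 3.5, 4.0};   // one diverging element
    CompareResult r = compare_block(0, a, b, 1e-9);
    std::printf("rank %d: %d mismatches, max |diff| = %g\n",
                r.rank, r.mismatches, r.max_abs_diff);
    return 0;
}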

Page 25: Improving the Scalability of Comparative Debugging with MRNet


Statistical assertion with MRNet

Standard deviation: a two-phase operation
• Parallel phase: calculate a set of primary statistics on each process
• Aggregation phase: form the full statistical model (a sketch follows the figure below)

[Diagram: each debug server s0…sn computes its primary statistics ps0…psn; the MRNet communication processes (CP) aggregate them on the way up to the Guard client at the front-end]
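A sketch of the two-phase computation for the standard deviation (our own helper names; in Guard the parallel phase runs on the debug servers and the aggregation runs in the tree or at the front-end): each process reduces its slice to count, sum and sum of squares, and those partial statistics merge associatively into one global result.

#include <cmath>
#include <cstdio>
#include <vector>

// Primary statistics computed independently on each process (phase 1).
struct PartialStats {
    double n, sum, sumsq;
};

PartialStats local_stats(const std::vector<double>& slice) {
    PartialStats p = {0.0, 0.0, 0.0};
    for (double x : slice) { p.n += 1.0; p.sum += x; p.sumsq += x * x; }
    return p;
}

// Aggregation (phase 2): partial statistics merge associatively, so this
// step can run pairwise inside a reduction tree.
PartialStats merge(const PartialStats& a, const PartialStats& b) {
    PartialStats m = {a.n + b.n, a.sum + b.sum, a.sumsq + b.sumsq};
    return m;
}

// Population standard deviation from the merged statistics.
double stddev(const PartialStats& s) {
    double mean = s.sum / s.n;
    return std::sqrt(s.sumsq / s.n - mean * mean);
}

int main() {
    // Pretend four processes each hold a slice of the global data set.
    std::vector<std::vector<double>> slices = {
        {1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}, {7.0, 8.0}};
    PartialStats total = {0.0, 0.0, 0.0};
    for (const std::vector<double>& s : slices)
        total = merge(total, local_stats(s));
    std::printf("global standard deviation = %f\n", stddev(total));
    return 0;
}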

Page 26: Improving the Scalability of Comparative Debugging with MRNet

Performance Evaluation

Page 27: Improving the Scalability of Comparative Debugging with MRNet

Performance Evaluation

The configuration of MRNet (3.1.0): a balanced topology with a fan-out of 64

Test bed: ‘Hera’, a Cray XE6 (Gemini 1.2) system with 752 compute nodes, each with 32 GB of memory; 21,760 CPU cores in total

General parallel debugger: SP (Scalar Penta-diagonal) from the NAS Parallel Benchmarks (NPB) 3.3.1

Relative debugger:
• Comparative assertion: data examples from WRF
• Statistical assertion: a molecular dynamics simulation

Page 28: Improving the Scalability of Comparative Debugging with MRNet

Scalability of Invoke Command

[Plot: invoke-command time (seconds) versus the number of parallel processes, up to 2×10^4; curves: Total-MRNet, MRNet Instantiation, MRNet Attachment, MRNet Overall]

Page 29: Improving the Scalability of Comparative Debugging with MRNet

Latency of Debugging Commands

[Plot: command latency (seconds, log scale 10^-3 to 10^0) versus the number of parallel processes, up to 16,000; curves: All bkpt (MRNet), All step (MRNet)]

Page 30: Improving the Scalability of Comparative Debugging with MRNet

Message Aggregation

A memory buffer of 256 bytes was added to the SP program, and its value was inspected under different degrees of aggregation (DoA).

[Plot: inspection time (seconds, log scale 10^-3 to 10^0) versus the number of parallel processes, up to 16,000; curves: DoA = 1, DoA = 0.1, DoA = 0.01 (all with MRNet)]

Page 31: Improving the Scalability of Comparative Debugging with MRNet

Comparative assertion

320 KBytes × 10,000 ≈ 3 GB. The data sizes are taken from WRF, a production climate model.

Page 32: Improving the Scalability of Comparative Debugging with MRNet

Statistical assertion
Molecular dynamics simulation: 209×10^6 atoms

The atomistic mechanism of fracture accompanying structural phase transformation in AlN ceramic under hypervelocity impact.

[Plot: assertion time (seconds, log scale 10^-1 to 10^2) versus the number of parallel processes, up to 12,000; curves: Overall-Histogram, Reduction-Histogram, Overall-Stdev, Reduction-Stdev]

Page 33: Improving the Scalability of Comparative Debugging with MRNet

Conclusion and Future Work

MRNet improves the scalability of Guard: more than 20,000 debug servers

The overhead of creating an MRNet tree:
• attachment
• instantiation: one BE process per core

How to take advantage of the computing capability of MRNet:
• Comparison across multiple trees? Building a tree across multiple aprun launches?
• Programming filters to handle the aggregation of statistical assertions

The best way of maintaining the C wrapper for the C++ interface?

Page 34: Improving the Scalability of Comparative Debugging with MRNet

Related publications
• Dinh M.N., Abramson D., Jin C., Gontarek A., Moench R., and DeRose L., Debugging Scientific Applications With Statistical Assertion, International Conference on Computational Science (ICCS), Omaha, USA, 2012.

• Dinh M.N., Abramson D., Jin C., Gontarek A., Moench R., and DeRose L., Scalable parallel debugging with statistical assertions. ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPoPP) 2012, New Orleans, USA. (Poster session)

• Jin C., Abramson D., Dinh M.N., Gontarek A., Moench R., and DeRose L., A Scalable Parallel Debugging Library with Pluggable Communication Protocols, 12th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2012), Ottawa, Canada, May 2012.

• Dinh M.N., Abramson D., Kurniawan D., Jin C., Moench R., and DeRose L., Assertion Based Parallel Debugging, 11th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2011), Newport Beach, USA, May 2011.

• Abramson, D., Dinh, M.N., Kurniawan, D., Moench, B. and DeRose, L. Data Centric Highly Parallel Debugging, 2010 International Symposium on High Performance Distributed Computing (HPDC 2010), Chicago, USA, June 2010.

• Abramson D., Foster, I., Michalakes, J. and Sosic R., Relative Debugging and its Application to the Development of Large Numerical Models, Proceedings of IEEE Supercomputing 1995, San Diego, December 95. (Best paper award)

Page 35: Improving the Scalability of Comparative Debugging with MRNet

Thanks and Questions?