MRNet: From Scalable Performance to Scalable Reliability


Transcript of MRNet: From Scalable Performance to Scalable Reliability


MRNet: From Scalable Performance to Scalable Reliability

Dorian C. Arnold, University of Wisconsin-Madison

Paradyn/Condor Week, April 14-16, 2004

Madison, WI


More HPC Facts
Statistics from the Top500 List:

• 24% of systems have ≥ 512 processors
• 10% of systems have ≥ 1024 processors
• 9 systems have ≥ 4096 processors
• The largest system has 8192 processors
• By 2009, the 500th entry will be faster than today's #1

Bottom Line: HPC systems with many thousands of nodes will soon be the standard.


Applications Must Address Scalability!

Challenge 1: Scalable Performance

• Provide distributed tools with a mechanism for scalable, efficient group communications and data analyses.
 – Scalable multicast
 – Scalable reductions
 – In-network data aggregations


Applications Must Address Scalability!

Scalability necessitates reliability!

Challenge 2: Scalable Reliability
• Provide mechanisms for reliability in our large-scale environment that do not degrade scalability.
 – Scalable multicast
 – Scalable reductions
 – In-network data aggregations


Target Applications
Distributed tools and debuggers
• Paradyn, Tau, PAPI's perfometer, …
Grid and Distributed Middleware
• Condor, Globus
Cluster and system monitoring applications
Distributed shell for command-line tools

Goal: Provide a generic scaling mechanism for monitoring, control, troubleshooting, and general middleware components for Grid infrastructures.


Challenge 1: Scalable Performance
Problem: Centralization leads to poor scalability
• Communication overhead does not scale.
• Data analyses restricted to the front-end.

[Figure: the tool front-end connects directly to back-ends BE0 … BEn-1, each contributing data a0 … an-1.]


MRNet: Solution to Scalable Tool Performance
Multicast/Reduction Network
• Scalable data multicast and reduction operations.
• In-network data aggregations.

[Figure: the tool front-end reaches back-ends BE0 … BEn-1 (data a0 … an-1) through a tree of internal processes that perform multicast and reduction.]
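To make the in-network aggregation idea concrete, here is a minimal C++ sketch of a reduction filter as an internal tree process might run it. The Packet type and sum_filter function are illustrative assumptions for this deck, not the actual MRNet API.

```cpp
#include <cstdint>
#include <vector>

// Illustrative only: a reduction "filter" as an internal tree process might
// run it. Each child (or back-end) contributes one packet; the node forwards
// a single aggregated packet toward the front-end instead of n messages.
struct Packet {
    std::uint32_t tag;   // identifies the stream/metric
    double value;        // one sample of the metric
};

// Sum filter: collapse the packets arriving from all children into one.
Packet sum_filter(const std::vector<Packet>& from_children) {
    Packet out{from_children.empty() ? 0u : from_children.front().tag, 0.0};
    for (const Packet& p : from_children)
        out.value += p.value;   // in-network aggregation happens here
    return out;                 // forwarded upward as a single message
}
```

Because each internal node forwards one packet instead of one per descendant, the front-end's message load grows with the fan-out rather than with the total number of back-ends.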


Paradyn/MRNet Integration
Scalable start-up
• Broadcast metric data to daemons
• Gather daemon data at front-end
• Front-end/daemon clock-skew detection (see the sketch after this list)
Performance data aggregation
• Time-based synchronization
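One common way to estimate front-end/daemon clock skew is a single request/response timestamp exchange; the sketch below shows that approach under the assumption of roughly symmetric network delay. This is an illustration of the general technique, not necessarily Paradyn's actual method, and the timestamps are hypothetical.

```cpp
#include <cstdio>

// Illustrative skew estimate from one request/response exchange.
// t0: front-end send, t1: daemon receive, t2: daemon send, t3: front-end
// receive, each in seconds on the respective local clocks.
double estimate_skew(double t0, double t1, double t2, double t3) {
    // Assumes symmetric network delay; returns the daemon clock's offset
    // relative to the front-end clock.
    return ((t1 - t0) + (t2 - t3)) / 2.0;
}

int main() {
    // Hypothetical timestamps: the daemon clock runs ~0.5 s ahead.
    double skew = estimate_skew(100.0, 100.6, 100.7, 100.3);
    std::printf("estimated skew: %.2f s\n", skew);
    return 0;
}
```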


Paradyn Data Aggregation (32 metrics)

[Figure: service rate / arrival rate (y-axis, 0 to 1) versus number of back-ends (x-axis, 0 to 600), for flat trees with 32, 16, 8, and 1 metrics, and for 32 metrics with an 8-way fanout tree.]


MRNet References
Technical papers:

• Roth, Arnold, and Miller, “MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools”, in SC2003 (Phoenix, AZ, November 2003).

• Roth, Arnold and Miller, “Benchmarking the MRNet Distributed Tool Infrastructure: Lessons Learned”, in 2004 High-Performance Grid Computing Workshop held in conjunction with IPDPS 2004 (Santa Fe, New Mexico, April 2004).


Scalable Performance Achieved: What Next?

More and increasingly complex components in large-scale systems.

MTTF_system = MTTF_component^2 / ( N (N - 1) * MTTR_component )

A system with 10,000 nodes is 10^4 times more likely to fail than one with 100 nodes.
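As a quick check of the 10^4 claim, the sketch below evaluates the reliability model above for 100 and 10,000 nodes. The component MTTF and MTTR values are hypothetical placeholders; only the ratio matters.

```cpp
#include <cstdio>

// Check of the failure model above, assuming
// MTTF_system = MTTF_component^2 / (N * (N - 1) * MTTR_component).
double system_mttf(double mttf_comp, double mttr_comp, double n) {
    return (mttf_comp * mttf_comp) / (n * (n - 1.0) * mttr_comp);
}

int main() {
    const double mttf = 1.0e6;  // hypothetical component MTTF (hours)
    const double mttr = 10.0;   // hypothetical component MTTR (hours)
    const double small_sys = system_mttf(mttf, mttr, 100.0);
    const double large_sys = system_mttf(mttf, mttr, 10000.0);
    // The ratio is roughly 10^4: the larger system fails ~10^4 times as often.
    std::printf("MTTF ratio (100 vs 10,000 nodes): %.0f\n",
                small_sys / large_sys);
    return 0;
}
```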


Challenge 2: Scalable Reliability
Goals:

• Design scalable reliability mechanisms for communication infrastructures with reduction operations and in-network data aggregations.

• Quantitative understanding of scalability trade-off between different levels of resiliency and reliability.


Challenge 2: Scalable Reliability
Reliability vs. Resiliency:

• A reliable system executes correctly in the presence of (tolerated) failures.

• A resilient system recovers to a mode in which it can once again execute correctly.
 – During a failure, errors are visible at the system interface level.


Challenge 2: Scalable Reliability
Problem:

• Scalability → decentralization, low overhead
 – Scalability wants simple systems.
• Reliability → consensus, convergence, high overhead
 – Reliability wants complex systems.

How can we leverage our tree-based topology to achieve scalable reliability?


Recovery Models and Semantics
Fault model: crash-stop failures
TCP-like reliability for tree-based multicast and reduction operations
System should tolerate any and all internal node failures
• System slowly degrades to a flat topology
Models based on operational complexity
• E.g., are in-network filters stateful?


Recovery Models and Semantics: Challenges

Detecting loss, duplication, and ordering

Quick recovery from message loss

Correct recovery from failure

Recovery of state information from aggregation operations

Simultaneous failures

Validation of our scalability methodology


Challenge 2: Scalable Reliability

Hypothesis: Aggregating control messages can effectively achieve scalable, reliable systems.


Example: Scalable Failure Detection

Goal: A scalable failure-detection service with high rates of convergence.

Previous work:
• non-scalable overhead
• poor convergence properties
• non-deterministic guarantees
• costly assumptions

– E.g. fully-connected meshes


Failure Detection Approaches
• Gossip-style failure detection and propagation
 – Gupta et al., van Renesse et al.


Failure Detection Approaches
• Hierarchical heartbeat detection and propagation
 – Felber et al., Overcast, Grid monitoring


Scalable Failure Detection
Tracking senders in an aggregated message:
• Naïve approaches:
 – Append a 32/64-bit source ID for each source
  • Pathological case: many senders
 – Bit-array where bits represent potential sources
  • Pathological case: many potential sources, few actual senders
• Our approach:
 – Variable-size bit-array: the number of bits varies according to the number of descendants beneath the intermediate node (i.e., its depth in the topology); see the sketch below.
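A minimal sketch of the variable-size bit-array idea, assuming each child owns a contiguous range of descendant slots in its parent's array. The types, names, and layout are illustrative assumptions, not the actual implementation.

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: an internal node sizes its array to the number of
// descendants beneath it and splices in the (smaller) arrays reported by
// its children.
struct ChildReport {
    std::size_t offset;             // first descendant slot owned by this child
    std::vector<bool> heard_from;   // one bit per descendant of that child
};

std::vector<bool> merge_heartbeats(std::size_t num_descendants,
                                   const std::vector<ChildReport>& reports) {
    std::vector<bool> merged(num_descendants, false);
    for (const ChildReport& r : reports)
        for (std::size_t i = 0; i < r.heard_from.size(); ++i)
            merged[r.offset + i] = r.heard_from[i];   // splice child's range in
    return merged;   // forwarded upward as one aggregated control message
}
```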


Scalable Failure Detection
Hierarchical heartbeats/propagation (with message aggregation):

[Figure: a heartbeat tree in which each internal node merges the bit-arrays reported by its children into a single aggregated bit-array before forwarding it toward the root.]


Scalable Failure Detection
Study the scalability and convergence implications of our scalable failure detection protocol.
• In theory:
 – Pure hierarchical: msgs = n^h × h
 – Hierarchical w/ aggregation: msgs = ((n^(h+1) − 1) / (n − 1)) − 1
• Example, n = 8, h = 4 (4096 leaves):
 – Pure hierarchical: 16,384 msgs
 – With aggregation: 4,680 msgs
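The two counts can be checked directly; this small sketch evaluates both expressions for the n = 8, h = 4 example.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double n = 8.0, h = 4.0;            // 8-way fan-out, height 4
    const double pure = std::pow(n, h) * h;   // n^h leaves, one msg per hop
    const double aggregated = (std::pow(n, h + 1.0) - 1.0) / (n - 1.0) - 1.0;
    std::printf("pure hierarchical: %.0f msgs\n", pure);        // 16384
    std::printf("with aggregation:  %.0f msgs\n", aggregated);  // 4680
    return 0;
}
```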


Scalable Event Propagation
Implement a generic event propagation service:
• Encode events into 1-byte codes
• Combine with the aggregation protocol for low-overhead control messages
• Piggyback control messages on data messages (a sketch follows)
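A minimal sketch of the piggybacking idea with an assumed message layout; the field names are illustrative only, not the actual MRNet message format.

```cpp
#include <cstdint>
#include <vector>

// Illustrative: 1-byte event codes ride along with a data message, so
// control traffic adds little extra overhead.
struct DataMessage {
    std::uint32_t tag;                   // stream/metric identifier
    std::vector<std::uint8_t> payload;   // application data
    std::vector<std::uint8_t> events;    // piggybacked 1-byte event codes
};

void piggyback_event(DataMessage& msg, std::uint8_t event_code) {
    msg.events.push_back(event_code);    // sent with the next data message
}
```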


Summary
MRNet provides tools and grid services with scalable communications and data analyses.

We are studying techniques to provide high degrees of reliability at large scales.

MRNet website: http://www.paradyn.org/mrnet

darnold@cs.wisc.edu