
Computing in the Reliable Array of Independent Nodes

Vasken Bohossian, Charles Fan, Paul LeMahieu, Marc Riedel, Lihao Xu, Jehoshua Bruck

May 5, 2000

IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems

California Institute of Technology

Marc Riedel

RAIN Project

Collaboration:

• Caltech’s Parallel and Distributed Computing Group www.paradise.caltech.edu

• JPL’s Center for Integrated Space Microsystems www.csmt.jpl.nasa.gov

RAIN Platform

Heterogeneous network of nodes and switches.

[Diagram: compute nodes attached to multiple switches and a bus network]

RAIN Testbed

www.paradise.caltech.edu

• 10 Pentium boxes with multiple NICs

• 4 eight-way Myrinet switches

Proof of Concept: Video Server

[Diagram: nodes A, B, C, D connected through two switches]

Video client & server on every node.

Limited Storage

Insufficient storage to replicate all the data on each node.

k-of-n Code

Erasure-correcting code: each of n = 4 columns stores one data symbol and one parity symbol:

data:   a   | b   | c   | d
parity: d+c | d+a | a+b | b+c

Recover the data from any k = 2 of the n = 4 columns. For example, from columns (a, d+c) and (c, a+b): b = a + (a+b) and d = c + (d+c).
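As a concrete illustration, the 2-of-4 code above can be sketched in a few lines. This is a reconstruction of the XOR array code on the slide (symbols as integers, "+" as XOR), not the project's actual implementation:

```python
# 2-of-4 XOR array code: column i stores data symbol x[i] plus the parity
# x[(i+2) % 4] ^ x[(i+3) % 4]  (column 0 holds a and d+c, column 1 holds
# b and d+a, etc.). Any 2 of the 4 columns recover all four data symbols.

def encode(x):
    """x: list of 4 integer symbols -> 4 columns of (data, parity)."""
    assert len(x) == 4
    return [(x[i], x[(i + 2) % 4] ^ x[(i + 3) % 4]) for i in range(4)]

def decode(columns):
    """columns: dict {column index: (data, parity)} with >= 2 entries."""
    known = {i: d for i, (d, _) in columns.items()}    # directly read symbols
    parity = {i: p for i, (_, p) in columns.items()}   # parity symbols
    # Each parity equation: parity[i] == x[(i+2)%4] ^ x[(i+3)%4].
    # Propagate known symbols through the equations until all four are known.
    while len(known) < 4:
        progress = False
        for i, p in parity.items():
            a, b = (i + 2) % 4, (i + 3) % 4
            if a in known and b not in known:
                known[b] = p ^ known[a]; progress = True
            elif b in known and a not in known:
                known[a] = p ^ known[b]; progress = True
        if not progress:
            raise ValueError("not enough columns to decode")
    return [known[i] for i in range(4)]

cols = encode([10, 20, 30, 40])
# Lose columns 1 and 3; recover everything from columns 0 and 2:
print(decode({0: cols[0], 2: cols[2]}))   # [10, 20, 30, 40]
```

The same propagation works for every pair of surviving columns, which is exactly the 2-of-4 property.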

Encoding

Encode video using the 2-of-4 code.

Decoding

Retrieve data and decode.

Node Failure

Dynamically switch to another node.

Link Failure

Dynamically switch to another network path.

Switch Failure

Dynamically switch to another network path.

Node Recovery

Continuous reconfiguration (e.g., load-balancing).

Features

High availability:
• tolerates multiple node/link/switch failures
• no single point of failure

Efficient use of resources:
• multiple data paths
• redundant storage
• graceful degradation

Dynamic scalability/reconfigurability

(Certified Buzz-Word Compliant)

RAIN Project: Goals

Efficient, reliable distributed computing and storage systems.

Key building blocks: Networks, Communication, Storage, Applications.

Topics

Today's Talk:

• Fault-Tolerant Interconnect Topologies

• Connectivity

• Group Membership

• Distributed Storage

Interconnect Topologies

Goal: lose at most a constant number of nodes for a given network loss.

[Diagram: computing/storage nodes (N) attached to a network]

Resistance to Partitions

Large partitions are problematic for distributed services and computation.

[Diagram: a network failure splitting the nodes into two large partitions]

Related Work

Embedding hypercubes, rings, meshes, and trees in fault-tolerant networks:
• Hayes et al., Bruck et al., Boesch et al.

Bus-based networks that are resistant to partitioning:
• Ku and Hayes, 1997, "Connective Fault-Tolerance in Multiple-Bus Systems"

A Ring of Switches

A naïve solution: degree-2 compute nodes and degree-4 switches, arranged as a ring of switches with the nodes hanging off the ring.

[Diagram: a ring of switches (S), each with compute nodes (N) attached]

Such a ring is easily partitioned: two well-separated switch failures split it.
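The "easily partitioned" claim can be made concrete with a small brute-force connectivity check. The topology encoding and function names below are illustrative assumptions, not from the RAIN paper:

```python
# Brute-force partition check for a ring-of-switches topology:
# switch i links to switch (i+1) mod n, and each compute node attaches
# to two switches. After some switches fail, do the surviving compute
# nodes still form one connected group?

def is_partitioned(n, node_switches, failed):
    """True if surviving compute nodes no longer form one connected group."""
    alive = [s for s in range(n) if s not in failed]
    adj = {f"S{s}": set() for s in alive}
    for s in alive:                      # ring edges between live switches
        t = (s + 1) % n
        if t not in failed:
            adj[f"S{s}"].add(f"S{t}"); adj[f"S{t}"].add(f"S{s}")
    live_nodes = []
    for i, (a, b) in enumerate(node_switches):
        links = [s for s in (a, b) if s not in failed]
        if not links:
            continue                     # node lost entirely, not a partition
        live_nodes.append(f"N{i}")
        adj[f"N{i}"] = set()
        for s in links:
            adj[f"N{i}"].add(f"S{s}"); adj[f"S{s}"].add(f"N{i}")
    if not live_nodes:
        return False
    seen, stack = set(), [live_nodes[0]]  # BFS/DFS from one surviving node
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(adj[v] - seen)
    return any(nd not in seen for nd in live_nodes)

# Naive ring: node i sits between adjacent switches i and i+1.
n = 8
naive = [(i, (i + 1) % n) for i in range(n)]
# Two well-separated switch failures already split the surviving nodes:
print(is_partitioned(n, naive, {0, 4}))   # True
```

The same checker can be pointed at other attachment patterns (such as the diagonal construction below) to compare their fault tolerance.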

Resistance to Partitioning

11

11 22

33

44

55

66

77

88

22

33

4455

66

77

88nodes on diagonals

degree-2 compute nodes,degree-4 switches

Resistance to Partitioning

11

11 22

33

44

55

66

77

88

22

33

4455

66

77

88nodes on diagonals

degree-2 compute nodes,degree-4 switches

nodes on diagonals

degree-2 compute nodes,degree-4 switches

• tolerates any 3 switch failures (optimal)

• generalizes to arbitrary node/switch degrees.

Resistance to Partitioning

Details: paper IPPS’98, www.paradise.caltech.edu

22

33

55

77

88

22

33

4455

77

88

1

1

46

6

11

11 22

33

44

55

77

88

22

33

4455

66

77

88

66

11

22

33

44

55

66

77

88

11

22

33 44

55

66

7788

Resistance to Partitioning

Isomorphic

Details: paper IPPS’98, www.paradise.caltech.edu

Point-to-Point Connectivity

Is the path from A to B up or down?

[Diagram: nodes A and B at opposite ends of a network of nodes]

Connectivity

Link is seen as up or down by each node.

[Diagram: Node A and Node B each see the link state as U (up) or D (down)]

Bi-directional communication. Each node sends out pings; a node may time out, deciding the link is down.

Consistent History

[Timeline: nodes A and B each record their local history of link-state transitions (U, D, U, ...); the protocol keeps the two histories consistent]

The Slack

[Timeline: A's transition history runs ahead of B's (A is 1 ahead, then 2 ahead); now A will wait for B to transition]

Slack n=2: at most 2 unacknowledged transitions before a node waits.
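The slack mechanism can be sketched as follows; the class and method names are illustrative assumptions, not the RAIN API:

```python
# Sketch of the slack rule: an endpoint may run at most `slack` link-state
# transitions ahead of what its peer has acknowledged; beyond that it
# holds its current state and waits for the peer to catch up.

class LinkMonitor:
    def __init__(self, slack=2):
        self.slack = slack
        self.state = "U"           # local view of the link: U(p) or D(own)
        self.transitions = 0       # transitions we have made so far
        self.peer_acked = 0        # transitions the peer has acknowledged

    def can_transition(self):
        # At most `slack` unacknowledged transitions before waiting.
        return self.transitions - self.peer_acked < self.slack

    def observe(self, link_ok):
        """Record a new observation; toggle state only if slack allows."""
        desired = "U" if link_ok else "D"
        if desired != self.state and self.can_transition():
            self.state = desired
            self.transitions += 1
        return self.state

    def ack(self, count):
        """Peer acknowledges having seen `count` of our transitions."""
        self.peer_acked = max(self.peer_acked, count)

a = LinkMonitor(slack=2)
a.observe(False)   # U -> D (1 unacknowledged transition)
a.observe(True)    # D -> U (2 unacknowledged transitions)
a.observe(False)   # blocked: A now waits for B
print(a.state, a.transitions)   # U 2
a.ack(1)
a.observe(False)   # acknowledged, so the transition is allowed again
print(a.state, a.transitions)   # D 3
```

Bounding the gap this way is what lets both endpoints report the same sequence of channel errors.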

Consistent History

Consistency in error reporting: if A sees a channel error, B sees a channel error.

Birman et al.: “Reliability Through Consistency”


Details: paper IPPS’99, www.paradise.caltech.edu

Group Membership

[Diagram: nodes A, B, C, D, each holding the membership view ABCD]

Consistent global view given local, point-to-point connectivity information, despite:
• link/node failures
• dynamic reconfiguration

Related Work

Systems: Totem, Isis/Horus, Transis

Theory: Chandra et al., impossibility of group membership in an asynchronous environment

Token-Ring based Group Membership Protocol

[Diagram: nodes A, B, C, D passing a token around a ring]

The token carries:

• the group membership list

• a sequence number

As the token circulates, each receiving node increments the sequence number and records it (1: ABCD, 2: ABCD, 3: ABCD, 4: ABCD, ...).

Node or link fails: if a node is inaccessible, it is excluded and bypassed (e.g., 5: ACD).

Node with token fails: if the token is lost, it is regenerated. When competing regenerated tokens exist (5: ACD vs. 6: AD), the highest sequence number prevails.

Node recovers: recovering nodes are added back to the membership list (7: ADC).

Features:

• Unicast messages

• Dynamic reconfiguration

• Mean time-to-failure > convergence time

Details: publication forthcoming.
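A toy simulation of the protocol steps above; the data layout (sequence number, membership list, holder) and function names are illustrative assumptions, not the RAIN implementation:

```python
# Token-ring membership: the token circulates and bumps a sequence number;
# an unreachable node is excluded and bypassed; a lost token is regenerated
# from the survivors' state, and the highest sequence number prevails.

class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.last_seq = 0                 # highest sequence number seen

def advance(token, nodes):
    """Pass the token one hop around the ring, excluding dead nodes."""
    seq, members, holder = token
    ring = list(members)
    i = ring.index(holder)
    while True:
        i = (i + 1) % len(ring)
        nxt = ring[i]
        if nodes[nxt].alive:
            break
        members = [m for m in members if m != nxt]   # exclude and bypass
    nodes[nxt].last_seq = seq + 1
    return (seq + 1, members, nxt)

def regenerate(nodes, members):
    """Token lost: survivors propose tokens; highest sequence number wins."""
    alive = [m for m in members if nodes[m].alive]
    best = max(alive, key=lambda m: nodes[m].last_seq)
    return (nodes[best].last_seq + 1, alive, best)

nodes = {c: Node(c) for c in "ABCD"}
token = (0, list("ABCD"), "D")            # D holds the token, sequence 0
for _ in range(4):                        # one full round: sequence 1..4
    token = advance(token, nodes)

nodes["B"].alive = False                  # B fails
token = advance(token, nodes)             # token moves on to A
token = advance(token, nodes)             # B is excluded and bypassed
print(token)                              # (6, ['A', 'C', 'D'], 'C')

nodes["C"].alive = False                  # the token holder fails
token = regenerate(nodes, token[1])       # survivors regenerate the token
print(token)                              # (6, ['A', 'D'], 'A')
```

Note how the regenerated token reuses sequence number 6: the node with the highest recorded sequence number wins the regeneration, so stale competing tokens are discarded.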

Distributed Storage

[Diagram: a bit stream (1 0 1 0 0 1 ...) striped across four disks]

Focus: reliability and performance.

Array Codes

The "B-code": each column stores one data symbol and one redundancy symbol.

data:       a   | b   | c   | d
redundancy: d+c | d+a | a+b | b+c

Ideally suited for distributed storage; low encoding/decoding complexity. The data (a, b, c, d) can be recovered from any k of the n columns: e.g., b = a + (a+b) and d = c + (d+c).

B-Code and X-Code:
• optimally redundant
• optimal encoding/decoding complexity

Details: IEEE Trans. Info Theory, www.paradise.caltech.edu

Summary

• Fault-Tolerant Interconnect Topologies

• Connectivity

• Group Membership

• Distributed Storage

Proof-of-Concept Applications

• RAINVideo: high-availability video server

• RAINCheck: distributed checkpoint rollback/recovery system

• SNOW: Stable Network of Webservers

Rainfinity

www.rainfinity.com

Start-up based on RAIN technology

Business Plan: clustered solutions for Internet data centers, focusing on:

• availability

• scalability

• performance

Rainfinity

Company:

• Founded Sept. 1998

• Released first product April 1999

• Received $15 million funding in Dec. 1999

• Now over 50 employees

Future Research

• Development of APIs

• Fault-Tolerant Distributed Filesystem

• Fault-Tolerant MPI/PVM implementation

End of Talk

Material that was cut...

Erasure Correcting Codes

Strategy: encode data with an erasure-correcting code.

[Diagram: k data symbols encoded into n symbols; up to m coordinates are lost; the k data symbols are reconstructed from the survivors]

A code is optimally redundant (MDS) if k = n − m. Example: Reed-Solomon code.

RAIN: Distributed Store

[Diagram: columns (a, d+c), (b, d+a), (c, a+b), (d, b+c), one per disk]

• Encode data with an (n, k) array code

• Store one symbol per node

RAIN: Distributed Retrieve

[Diagram: encoded columns retrieved from the four disks]

• Retrieve encoded data from any k nodes

• Reconstruct data (a, b, c, d)
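Because any k of the n nodes suffice to decode, a client is free to prefer the least busy nodes. A minimal sketch, assuming a per-node load metric (all names here are illustrative):

```python
# Load-balanced retrieval: with an (n, k) erasure code, any k nodes can
# serve a read, so pick the k nodes with the lowest current load.

def choose_replicas(loads, k):
    """loads: {node: load metric}. Return the k least-loaded nodes."""
    ranked = sorted(loads, key=loads.get)   # ascending by load
    return ranked[:k]

loads = {"A": 0.9, "B": 0.2, "C": 0.5, "D": 0.1}   # e.g. queue depth
print(choose_replicas(loads, 2))   # ['D', 'B']
```

This is the freedom the next slides exploit: a busy or failed node is simply left out of the chosen k.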

RAIN: Distributed Retrieve

• Reliability (similar to RAID systems)

• Performance: load-balancing (if a node is busy, retrieve from any other k nodes instead)