Post on 08-Aug-2020
spcl.inf.ethz.ch
@spcl_eth
MACIEJ BESTA, TORSTEN HOEFLER
Fault Tolerance for Remote Memory Access
Programming Models
spcl.inf.ethz.ch
@spcl_eth
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
spcl.inf.ethz.ch
@spcl_eth
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
Memory
Process p
A
spcl.inf.ethz.ch
@spcl_eth
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
Memory Memory
Process p Process q
A
BB
spcl.inf.ethz.ch
@spcl_eth
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
Memory Memory
Cray
BlueWaters
Process p Process q
A
BB
spcl.inf.ethz.ch
@spcl_eth
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
Memory Memory
Cray
BlueWaters
Process p Process q
A
BB
spcl.inf.ethz.ch
@spcl_eth
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
Memory Memory
Cray
BlueWaters
Process p Process q
A
BB
spcl.inf.ethz.ch
@spcl_eth
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
Memory Memory
Cray
BlueWaters
Process p Process q
A
BB
spcl.inf.ethz.ch
@spcl_eth
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
Memory Memory
Cray
BlueWaters
put
Process p Process q
A
BB
AA
spcl.inf.ethz.ch
@spcl_eth
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
Memory Memory
Cray
BlueWaters
put
Process p Process q
A
Bget
B
A
B
A
B
spcl.inf.ethz.ch
@spcl_eth
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
Memory Memory
Cray
BlueWaters
put
Process p Process q
A
Bget
B
A
B
flush
A
B
spcl.inf.ethz.ch
@spcl_eth
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
Memory Memory
Cray
BlueWaters
put
Process p Process q
A
Bget
B
A
B
flush
A
B
spcl.inf.ethz.ch
@spcl_eth
One-sided communication
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
Memory Memory
Cray
BlueWaters
put
Process p Process q
A
Bget
B
A
B
flush
A
B
spcl.inf.ethz.ch
@spcl_eth
One-sided communication
2
REMOTE MEMORY ACCESS (RMA) PROGRAMMING
Memory Memory
Cray
BlueWaters
put
Process p Process q
A
Bget
B
A
B
flush
A
B
no active
participation
spcl.inf.ethz.ch
@spcl_eth
3
REMOTE MEMORY ACCESS PROGRAMMING
Implemented in hardware in NICs in the majority of HPC
networks support RDMA
spcl.inf.ethz.ch
@spcl_eth
3
REMOTE MEMORY ACCESS PROGRAMMING
Implemented in hardware in NICs in the majority of HPC
networks support RDMA
spcl.inf.ethz.ch
@spcl_eth
3
REMOTE MEMORY ACCESS PROGRAMMING
Implemented in hardware in NICs in the majority of HPC
networks support RDMA
spcl.inf.ethz.ch
@spcl_eth
3
REMOTE MEMORY ACCESS PROGRAMMING
Implemented in hardware in NICs in the majority of HPC
networks support RDMA
spcl.inf.ethz.ch
@spcl_eth
3
REMOTE MEMORY ACCESS PROGRAMMING
Implemented in hardware in NICs in the majority of HPC
networks support RDMA
spcl.inf.ethz.ch
@spcl_eth
4
REMOTE MEMORY ACCESS PROGRAMMING
Supported by many HPC libraries and languages
spcl.inf.ethz.ch
@spcl_eth
4
REMOTE MEMORY ACCESS PROGRAMMING
Supported by many HPC libraries and languages
spcl.inf.ethz.ch
@spcl_eth
4
REMOTE MEMORY ACCESS PROGRAMMING
Supported by many HPC libraries and languages
spcl.inf.ethz.ch
@spcl_eth
4
REMOTE MEMORY ACCESS PROGRAMMING
Supported by many HPC libraries and languages
spcl.inf.ethz.ch
@spcl_eth
5
REMOTE MEMORY ACCESS PROGRAMMING
Enables significant speedups over message passing in
many types of applications, e.g.:
[1] R. Gerstenberger, M. Besta, T. Hoefler, Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided.
ACM/IEEE Supercomputing 2013, SC13, Best Paper Award
[2] D. Petrovic et al., High-performance RMA-based broadcast on the Intel SCC. SPAA’12
spcl.inf.ethz.ch
@spcl_eth
5
REMOTE MEMORY ACCESS PROGRAMMING
Enables significant speedups over message passing in
many types of applications, e.g.: Speedup of ~1.5 for communication patterns in graph analytics
[1] R. Gerstenberger, M. Besta, T. Hoefler, Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided.
ACM/IEEE Supercomputing 2013, SC13, Best Paper Award
[2] D. Petrovic et al., High-performance RMA-based broadcast on the Intel SCC. SPAA’12
spcl.inf.ethz.ch
@spcl_eth
5
REMOTE MEMORY ACCESS PROGRAMMING
Enables significant speedups over message passing in
many types of applications, e.g.: Speedup of ~1.5 for communication patterns in graph analytics
Speedup of ~1.4-2 in physics computations
[1] R. Gerstenberger, M. Besta, T. Hoefler, Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided.
ACM/IEEE Supercomputing 2013, SC13, Best Paper Award
[2] D. Petrovic et al., High-performance RMA-based broadcast on the Intel SCC. SPAA’12
spcl.inf.ethz.ch
@spcl_eth
6
FAULT TOLERANCE + RMA
spcl.inf.ethz.ch
@spcl_eth
6
15.8h of MTBF (for nodes) for the TSUBAME2
FAULT TOLERANCE + RMA
spcl.inf.ethz.ch
@spcl_eth
7
FAULT TOLERANCE + RMA
spcl.inf.ethz.ch
@spcl_eth
Fault tolerance is well studied for message passing
7
FAULT TOLERANCE + RMA
spcl.inf.ethz.ch
@spcl_eth
Fault tolerance is well studied for message passing
7
FAULT TOLERANCE + RMA
Message Passing
spcl.inf.ethz.ch
@spcl_eth
Fault tolerance is well studied for message passing
7
FAULT TOLERANCE + RMA
Message Passing
Coordinated
Checkpointing (CC)
spcl.inf.ethz.ch
@spcl_eth
Fault tolerance is well studied for message passing
7
FAULT TOLERANCE + RMA
Message Passing
Coordinated
Checkpointing (CC)
uncoordinated
checkpointing
and message
logging (UC)
spcl.inf.ethz.ch
@spcl_eth
Fault tolerance is well studied for message passing
7
FAULT TOLERANCE + RMA
Message Passing
Coordinated
Checkpointing (CC)
uncoordinated
checkpointing
and message
logging (UC)
spcl.inf.ethz.ch
@spcl_eth
Fault tolerance is well studied for message passing
Scarce research exists for fault tolerance for RMA
7
FAULT TOLERANCE + RMA
Message Passing
Coordinated
Checkpointing (CC)
uncoordinated
checkpointing
and message
logging (UC)
spcl.inf.ethz.ch
@spcl_eth
Fault tolerance is well studied for message passing
Scarce research exists for fault tolerance for RMA
7
FAULT TOLERANCE + RMA
RMAMessage Passing
Coordinated
Checkpointing (CC)
uncoordinated
checkpointing
and message
logging (UC)
spcl.inf.ethz.ch
@spcl_eth
Fault tolerance is well studied for message passing
Scarce research exists for fault tolerance for RMA
7
FAULT TOLERANCE + RMA
RMAMessage Passing
Coordinated
Checkpointing (CC)
uncoordinated
checkpointing
and message
logging (UC)
spcl.inf.ethz.ch
@spcl_eth
Fault tolerance is well studied for message passing
Scarce research exists for fault tolerance for RMA
7
FAULT TOLERANCE + RMA
RMAMessage Passing
Coordinated
Checkpointing (CC)
uncoordinated
checkpointing
and message
logging (UC)
logging memory
accesses vs. messages
spcl.inf.ethz.ch
@spcl_eth
Fault tolerance is well studied for message passing
Scarce research exists for fault tolerance for RMA
7
FAULT TOLERANCE + RMA
RMAMessage Passing
Coordinated
Checkpointing (CC)
uncoordinated
checkpointing
and message
logging (UC)
logging memory
accesses vs. messages
checkpointing in RMA-
based applications
spcl.inf.ethz.ch
@spcl_eth
Fault tolerance is well studied for message passing
Scarce research exists for fault tolerance for RMA
7
FAULT TOLERANCE + RMA
RMAMessage Passing
Coordinated
Checkpointing (CC)
uncoordinated
checkpointing
and message
logging (UC)
logging memory
accesses vs. messages
checkpointing in RMA-
based applications
fault tolerance
mechanisms and
schemes
spcl.inf.ethz.ch
@spcl_eth
Fault tolerance is well studied for message passing
Scarce research exists for fault tolerance for RMA
7
FAULT TOLERANCE + RMA
RMAMessage Passing
Coordinated
Checkpointing (CC)
uncoordinated
checkpointing
and message
logging (UC)
logging memory
accesses vs. messages
checkpointing in RMA-
based applications
performance
fault tolerance
mechanisms and
schemes
spcl.inf.ethz.ch
@spcl_eth
8
OVERVIEW OF OUR RESEARCH
spcl.inf.ethz.ch
@spcl_eth
Generic model
8
OVERVIEW OF OUR RESEARCH
spcl.inf.ethz.ch
@spcl_eth
Generic model
8
OVERVIEW OF OUR RESEARCH
spcl.inf.ethz.ch
@spcl_eth
Generic model
8
OVERVIEW OF OUR RESEARCH
spcl.inf.ethz.ch
@spcl_eth
Generic model
8
OVERVIEW OF OUR RESEARCH
spcl.inf.ethz.ch
@spcl_eth
CC in RMAGeneric model
8
OVERVIEW OF OUR RESEARCH
spcl.inf.ethz.ch
@spcl_eth
CC in RMAGeneric model
8
OVERVIEW OF OUR RESEARCH
MP vs. RMA
spcl.inf.ethz.ch
@spcl_eth
CC in RMAGeneric model
8
OVERVIEW OF OUR RESEARCH
MP vs. RMA Schemes
spcl.inf.ethz.ch
@spcl_eth
CC in RMAGeneric model
UC in RMA
8
OVERVIEW OF OUR RESEARCH
MP vs. RMA Schemes
spcl.inf.ethz.ch
@spcl_eth
CC in RMAGeneric model
UC in RMA
8
OVERVIEW OF OUR RESEARCH
MP vs. RMA
MP vs. RMA
Schemes
spcl.inf.ethz.ch
@spcl_eth
CC in RMAGeneric model
UC in RMA
8
OVERVIEW OF OUR RESEARCH
MP vs. RMA
MP vs. RMA
Checkpointing
schemesSchemes
spcl.inf.ethz.ch
@spcl_eth
CC in RMAGeneric model
UC in RMA
8
OVERVIEW OF OUR RESEARCH
MP vs. RMA
MP vs. RMA
Checkpointing
schemesSchemes
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
CC in RMAGeneric model
UC in RMA
Recovery in RMA
8
OVERVIEW OF OUR RESEARCH
MP vs. RMA
MP vs. RMA
Checkpointing
schemesSchemes
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
CC in RMAGeneric model
UC in RMA
Recovery in RMA
8
OVERVIEW OF OUR RESEARCH
MP vs. RMA
MP vs. RMA
Checkpointing
schemesSchemes
Basic scheme
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
CC in RMAGeneric model
UC in RMA
Recovery in RMA
8
OVERVIEW OF OUR RESEARCH
MP vs. RMA
MP vs. RMA
Checkpointing
schemesSchemes
Basic scheme
Extended RMA
semantics
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMA
8
OVERVIEW OF OUR RESEARCH
MP vs. RMA
MP vs. RMA
Checkpointing
schemesSchemes
Basic scheme
Extended RMA
semantics
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMA
8
OVERVIEW OF OUR RESEARCH
MP vs. RMA
MP vs. RMA
Model extensions
Checkpointing
schemesSchemes
Basic scheme
Extended RMA
semantics
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMA
8
OVERVIEW OF OUR RESEARCH
Distribution of
processes
MP vs. RMA
MP vs. RMA
Model extensions
Checkpointing
schemesSchemes
Basic scheme
Extended RMA
semantics
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMA
8
OVERVIEW OF OUR RESEARCH
Distribution of
processes
MP vs. RMA
MP vs. RMA
Model extensions
Checkpointing
schemesSchemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMA
8
OVERVIEW OF OUR RESEARCH
Distribution of
processes
MP vs. RMA
MP vs. RMA
Model extensions
Checkpointing
schemesSchemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMA
8
OVERVIEW OF OUR RESEARCH
Distribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Checkpointing
schemesSchemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMA
8
OVERVIEW OF OUR RESEARCH
Distribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMA
8
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
8
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
8
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
8
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
9
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
10
COORDINATED CHECKPOINTING (MP)
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
barrier
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
barrier
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
barrier com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
barrier
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
barrier com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
com
pute
barrier
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
barrier com
pute
com
pute
com
pute
com
pute
barrier
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
barrier com
pute
com
pute
com
pute
com
pute
global
rollback
barrier
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
barrier com
pute
com
pute
com
pute
com
pute
barrier
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
barrier com
pute
com
pute
com
pute
com
pute
barrier
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
barrier com
pute
com
pute
com
pute
com
pute
barrier
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
barrier com
pute
com
pute
com
pute
com
pute
barrier
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
barrier com
pute
com
pute
com
pute
com
pute
barrier
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
10
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
11
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
11
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
coordinated
checkpoint
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
11
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
coordinated
checkpoint
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
11
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
send
coordinated
checkpoint
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
11
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
send
coordinated
checkpoint
recv
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
11
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
send
coordinated
checkpoint
recv
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
11
COORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
coordinated
checkpoint
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
12
COORDINATED CHECKPOINTING (RMA)
Proc k Proc 1Proc 1 ... ... Proc k...
coordinated
checkpoint
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
12
COORDINATED CHECKPOINTING (RMA)
Proc k Proc 1Proc 1 ... ... Proc k...
coordinated
checkpoint
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
12
COORDINATED CHECKPOINTING (RMA)
Proc k Proc 1Proc 1 ... ... Proc k...
put
coordinated
checkpoint
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
12
COORDINATED CHECKPOINTING (RMA)
Proc k Proc 1Proc 1 ... ... Proc k...
put
coordinated
checkpoint
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
12
COORDINATED CHECKPOINTING (RMA)
Proc k Proc 1Proc 1 ... ... Proc k...
put
coordinated
checkpoint
flush
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
12
COORDINATED CHECKPOINTING (RMA)
Proc k Proc 1Proc 1 ... ... Proc k...
put
coordinated
checkpoint
flush
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
13
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
13
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
13
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A
B
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A
B
actions are
non-blocking
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A
B
actions are
non-blocking
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A
B
actions are
non-blocking
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A B
actions are
non-blocking
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A B
C
actions are
non-blocking
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A B
C
D
actions are
non-blocking
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A B
C
D
actions are
non-blocking
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A B CD
actions are
non-blocking
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A B CD
E
F
actions are
non-blocking
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A B CD
E
F
actions are
non-blocking
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A B CD E F
actions are
non-blocking
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A B CD E F
actions are
non-blockingEpoch 0
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A B CD E F
actions are
non-blockingEpoch 0
Epoch 1
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
14
RMA: EPOCHS
Proc p Proc q
memorymemory
A B C D E F
A B CD E F
actions are
non-blockingEpoch 0
Epoch 1
Epoch 2
data will be valid
upon synchronizing
memories
spcl.inf.ethz.ch
@spcl_eth
15
RMA: EPOCHS
Proc p Proc q
spcl.inf.ethz.ch
@spcl_eth
15
RMA: EPOCHS
Proc p Proc q
X
Y
Z
spcl.inf.ethz.ch
@spcl_eth
15
RMA: EPOCHS
Proc p Proc q
Epoch 0
X
Y
Z
spcl.inf.ethz.ch
@spcl_eth
15
RMA: EPOCHS
Proc p Proc q
Epoch 0
Epoch 1
X
Y
Z
spcl.inf.ethz.ch
@spcl_eth
15
RMA: EPOCHS
Proc p Proc q
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
B
C
E
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
B
C
E
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
B
C
E
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
B
C
E
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
co
B
C
B Cput put
E
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
co
B
C
B Cput put
E
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
co
B
C
B Cput put
E
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
co
co
B
C
B Cput put
E Fget getE
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
co
co
For recovery, a
process has to
replay actions
in the correct
order! co
B
C
B Cput put
E Fget getE
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
co
co
For recovery, a
process has to
replay actions
in the correct
order! co
For this,
we use epoch
counters (EC)
B
C
B Cput put
E Fget getE
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
co
co
For recovery, a
process has to
replay actions
in the correct
order! co
For this,
we use epoch
counters (EC)
memory
EC
B
C
B Cput put
E Fget getE
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
co
co
For recovery, a
process has to
replay actions
in the correct
order! co
For this,
we use epoch
counters (EC)
memory
EC 0
B
C
B Cput put
E Fget getE
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
co
co
For recovery, a
process has to
replay actions
in the correct
order! co
For this,
we use epoch
counters (EC)
memory
EC 1
B
C
B Cput put
E Fget getE
F
spcl.inf.ethz.ch
@spcl_eth
16
RMA: THE CONSISTENCY ORDER
Proc p Proc q
A
D
Epoch 0
Epoch 1
Epoch 2
co
co
co
For recovery, a
process has to
replay actions
in the correct
order! co
For this,
we use epoch
counters (EC)
memory
EC 2
B
C
B Cput put
E Fget getE
F
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
17
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
17
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
17
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
uncoordinated
checkpoint
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
uncoordinated
checkpoint
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
uncoordinated
checkpoint
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
uncoordinated
checkpoint
rollback
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
uncoordinated
checkpoint
dependency
rollback
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
uncoordinated
checkpoint
dependency
rollback
rollback
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
uncoordinated
checkpoint
dependency
dependency
rollback
rollback
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
uncoordinated
checkpoint
dependency
dependency
rollback
rollback rollback
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
uncoordinated
checkpoint
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
uncoordinated
checkpoint
logging a
message
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
local
rollback
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
get the log and
replay the message
spcl.inf.ethz.ch
@spcl_eth
Node 1 Node N
18
UNCOORDINATED CHECKPOINTING (MP)
Proc k Proc 1Proc 1 ... ... Proc k...
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
memoryCC
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
memoryCC
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
memoryCC
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
memory
Logs of puts
CC
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
record the put
memory
Logs of puts
CC
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
record the put
memory
Logs of puts
CC
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
record the put
memory
Logs of puts
C
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
record the put
memory
Logs of puts
C + #put
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
record the put
memory
Logs of puts
C + #put
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
EC 1
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
record the put
memory
Logs of puts
C + #put
+ EC( )
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
EC 1
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
record the put
memory
Logs of puts
C + #put
+ EC( )
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
EC 1
1
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
record the put
memory
Logs of puts
C + #put
+ EC( )
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
EC 1
1
spcl.inf.ethz.ch
@spcl_eth
19
RMA: LOGGING PUTS
Proc p Proc q
C
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
1
spcl.inf.ethz.ch
@spcl_eth
20
RMA: LOGGING GETS
Proc p Proc q
C
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
20
RMA: LOGGING GETS
Proc p Proc q
A
B
C
D
E
F
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
20
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
memory memoryD
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
memory
Logs of #gets
memoryD
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
memory
Logs of #gets
memoryD
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
memory
Logs of #gets
memoryD
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
memory
Logs of #gets
memoryD
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
memory
Logs of #gets
Data is not
yet valid
memoryD
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
Data is not
yet valid
memoryD
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
#get
Data is not
yet valid
memoryD
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
#get + EC( )
Data is not
yet valid
EC 1memory
D1
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
#get + EC( )
Data is not
yet valid
EC 1memory
D
1
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
#get + EC( )
Data is not
yet valid
EC 1memory
D
1...
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
#get + EC( )
Data is not
yet valid
EC 1memory
D
1...
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
#get + EC( )
Data is not
yet valid
EC 1memory
DD
1...
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
#get + EC( )
EC 1memory
DD
Data is valid
1...
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
#get + EC( )
EC 1memory
D
Now this data
may be invalid
Data is valid
1...
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
#get + EC( )
EC 1memory
D
Now this data
may be invalid
Data is valid
1...
Logs of gets
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
#get + EC( )
EC 1memory
+ #get
+ EC( )
D
Now this data
may be invalid
Data is valid
1...
Logs of gets
D
1
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
#get + EC( )
EC 1memory
+ #get
+ EC( )
D
Now this data
may be invalid
Data is valid
delete #get
and EC
1...
Logs of gets
D
1
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
D
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
log(#get + EC)
memory
Logs of #gets
EC 1memory
+ #get
+ EC( )
D
Now this data
may be invalid
Data is valid
delete #get
and EC
...Logs of gets
D
1
spcl.inf.ethz.ch
@spcl_eth
21
RMA: LOGGING GETS
Proc p Proc q
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
22
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
22
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
22
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
memorymemory
23
RMA: RECOVERY
Proc p Proc q
App. dataChpt. data
DataP
spcl.inf.ethz.ch
@spcl_eth
memorymemory
23
RMA: RECOVERY
Proc p Proc q
App. data
DataP
spcl.inf.ethz.ch
@spcl_eth
memorymemory
23
RMA: RECOVERY
Proc p Proc q
Stage 2: replay actions beyond the checkpoint
App. data
DataP
spcl.inf.ethz.ch
@spcl_eth
memorymemory
23
RMA: RECOVERY
Proc p Proc q
Stage 2: replay actions beyond the checkpoint
X
Y
Z
App. data
DataP
spcl.inf.ethz.ch
@spcl_eth
memorymemory
23
RMA: RECOVERY
Proc p Proc q
Stage 2: replay actions beyond the checkpoint
X
Y
Z
App. data
DataP
spcl.inf.ethz.ch
@spcl_eth
memorymemory
23
RMA: RECOVERY
Proc p Proc q
Stage 2: replay actions beyond the checkpoint
X
Y
App. data
DataP
spcl.inf.ethz.ch
@spcl_eth
memorymemory
23
RMA: RECOVERY
Proc p Proc q
Stage 2: replay actions beyond the checkpoint
X
Y
App. data
DataP
spcl.inf.ethz.ch
@spcl_eth
memorymemory
23
RMA: RECOVERY
Proc p Proc q
memory
Logs of puts
Stage 2: replay actions beyond the checkpoint
X
Y
App. data
DataPX + #put + EC( )0
Y + #put + EC( )0
spcl.inf.ethz.ch
@spcl_eth
memorymemory
23
RMA: RECOVERY
Proc p Proc q
memory
Logs of puts
Stage 2: replay actions beyond the checkpoint
X
Y
App. data
DataP
Epoch 0
X + #put + EC( )0
Y + #put + EC( )0
spcl.inf.ethz.ch
@spcl_eth
memorymemory
23
RMA: RECOVERY
Proc p Proc q
memory
Logs of puts
Stage 2: replay actions beyond the checkpoint
X
Y
X + #put + EC( )0
Y + #put + EC( )0
App. data
DataP
Epoch 0
X + #put + EC( )0
Y + #put + EC( )0
spcl.inf.ethz.ch
@spcl_eth
memorymemory
23
RMA: RECOVERY
Proc p Proc q
memory
Logs of puts
Stage 2: replay actions beyond the checkpoint
X
Y
X + #put + EC( )0
Y + #put + EC( )0
App. data
DataP
Epoch 0
X + #put + EC( )0
Y + #put + EC( )0
spcl.inf.ethz.ch
@spcl_eth
memorymemory
23
RMA: RECOVERY
Proc p Proc q
memory
Logs of puts
Stage 2: replay actions beyond the checkpoint
X + #put + EC( )0
Y + #put + EC( )0
App. data
DataPX + #put + EC( )0
Y + #put + EC( )0
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
Stage 2: replay actions beyond the checkpoint
DataP
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
A
B
C
D
E
F
Stage 2: replay actions beyond the checkpoint
DataP
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
A
B
C
D
E
F
Stage 2: replay actions beyond the checkpoint
DataP
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
D
E
F
Stage 2: replay actions beyond the checkpoint
DataP
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
D
E
F
Stage 2: replay actions beyond the checkpoint
DataP
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
D
E
F
Logs of gets
D + #get + EC( )1
E + #get + EC( )2
F + #get + EC( )2
Stage 2: replay actions beyond the checkpoint
DataP
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
D
E
F
Logs of gets
D + #get + EC( )1
E + #get + EC( )2
F + #get + EC( )2
Stage 2: replay actions beyond the checkpoint
DataP
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
D
E
F
Logs of gets
D + #get + EC( )1
E + #get + EC( )2
F + #get + EC( )2
Stage 2: replay actions beyond the checkpoint
DataP
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
D
E
F
Logs of gets
E + #get + EC( )2
F + #get + EC( )2
D + #get + EC( )1
Stage 2: replay actions beyond the checkpoint
DataP
D + #get + EC( )1D
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
D
E
F
Logs of gets
E + #get + EC( )2
F + #get + EC( )2
D + #get + EC( )1
Stage 2: replay actions beyond the checkpoint
DataP
D + #get + EC( )1D
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
D
E
F
Logs of gets
D + #get + EC( )1
E + #get + EC( )
F + #get + EC( )2
2
Stage 2: replay actions beyond the checkpoint
DataP
D + #get + EC( )1D
E + #get + EC( )2
F + #get + EC( )2
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
updating
state...
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
D
E
F
Logs of gets
D + #get + EC( )1
E + #get + EC( )
F + #get + EC( )2
2
Stage 2: replay actions beyond the checkpoint
DataP
D + #get + EC( )1D
E + #get + EC( )2
F + #get + EC( )2
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
updating
state...
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
X + #put + EC( )0
Y + #put + EC( )0
D
E
F
Logs of gets
D + #get + EC( )1
E + #get + EC( )
F + #get + EC( )2
2
Stage 2: replay actions beyond the checkpoint
DataP
D + #get + EC( )1D
E + #get + EC( )2
F + #get + EC( )2
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
updating
state...
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
D
E
F
Logs of gets
D + #get + EC( )1
E + #get + EC( )
F + #get + EC( )2
2
Stage 2: replay actions beyond the checkpoint
DataP
D + #get + EC( )1D
E + #get + EC( )2
F + #get + EC( )2
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
updating
state...
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
D
E
F
Logs of gets
D + #get + EC( )1
E + #get + EC( )
F + #get + EC( )2
2
Stage 2: replay actions beyond the checkpoint
DataP
D + #get + EC( )1D
E + #get + EC( )2
F + #get + EC( )2
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
updating
state...
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
D
E
F
Logs of gets
E + #get + EC( )
F + #get + EC( )2
2
Stage 2: replay actions beyond the checkpoint
DataP
D + #get + EC( )1D
E + #get + EC( )2
F + #get + EC( )2
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
updating
state...
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
D
E
F
Logs of gets
E + #get + EC( )
F + #get + EC( )2
2
Stage 2: replay actions beyond the checkpoint
DataP
D + #get + EC( )1D
E + #get + EC( )2
F + #get + EC( )2
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
updating
state...
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
D
E
F
Logs of gets
Stage 2: replay actions beyond the checkpoint
DataP
D + #get + EC( )1D
E + #get + EC( )2
F + #get + EC( )2
spcl.inf.ethz.ch
@spcl_eth
memory memory
24
RMA: RECOVERY
Proc p Proc q
Logs of puts
X + #put + EC( )0
Y + #put + EC( )0
App. data
D
E
F
Logs of gets
state
restored!
Stage 2: replay actions beyond the checkpoint
DataP
D + #get + EC( )1D
E + #get + EC( )2
F + #get + EC( )2
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
25
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
25
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
25
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
26
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
spcl.inf.ethz.ch
@spcl_eth
26
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Today’s supercomputers have a hierarchical layout
spcl.inf.ethz.ch
@spcl_eth
26
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Today’s supercomputers have a hierarchical layout
A Cray XE/XT
supercomputer
spcl.inf.ethz.ch
@spcl_eth
26
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Today’s supercomputers have a hierarchical layout
...4 cabinets:
A Cray XE/XT
supercomputer
spcl.inf.ethz.ch
@spcl_eth
26
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Today’s supercomputers have a hierarchical layout
...4 cabinets:
3 chassis: ...
A Cray XE/XT
supercomputer
spcl.inf.ethz.ch
@spcl_eth
26
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Today’s supercomputers have a hierarchical layout
...4 cabinets:
3 chassis: ...
A Cray XE/XT
supercomputer
...8 blades:
spcl.inf.ethz.ch
@spcl_eth
26
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Today’s supercomputers have a hierarchical layout
...4 cabinets:
3 chassis:
4 nodes:
...
...
A Cray XE/XT
supercomputer
...8 blades:
spcl.inf.ethz.ch
@spcl_eth
26
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Today’s supercomputers have a hierarchical layout
...4 cabinets:
3 chassis:
4 nodes:
...
...
...
A Cray XE/XT
supercomputer
...8 blades:
32 cores:
spcl.inf.ethz.ch
@spcl_eth
26
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Today’s supercomputers have a hierarchical layout
A single hardware crash may kill multiple processes
...4 cabinets:
3 chassis:
4 nodes:
...
...
...
A Cray XE/XT
supercomputer
...8 blades:
32 cores:
spcl.inf.ethz.ch
@spcl_eth
26
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Today’s supercomputers have a hierarchical layout
A single hardware crash may kill multiple processes
...4 cabinets:
3 chassis:
4 nodes:
...
...
...
A Cray XE/XT
supercomputer
...8 blades:
32 cores:
Up to 128 process
failures (assuming
1 process per core)
spcl.inf.ethz.ch
@spcl_eth
26
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Today’s supercomputers have a hierarchical layout
A single hardware crash may kill multiple processes
Introduced protocols usually cannot handle > 1 process crash
...4 cabinets:
3 chassis:
4 nodes:
...
...
...
A Cray XE/XT
supercomputer
...8 blades:
32 cores:
Up to 128 process
failures (assuming
1 process per core)
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
G
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
G
A
B
C
Application data
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
G
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
Application data
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
G
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
X
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
X
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
X
Y
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
X
Y
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
X
Y
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
Parity data
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
X
Y
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
X
Y
D
E
F
S
T
G
H
I
U
W
J
K
L
Z
V
M
N
O
1
2
P
Q
R
3
4
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
X
Y
D
E
F
S
T
G
H
I
U
W
J
K
L
Z
V
M
N
O
1
2
P
Q
R
3
4
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
X
Y
D
E
F
S
T
G
H
I
U
W
J
K
L
Z
V
M
N
O
1
2
P
Q
R
3
4
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
X
Y
D
E
F
S
T
G
H
I
U
W
J
K
L
Z
V
M
N
O
1
2
P
Q
R
3
4
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
X
Y
D
E
F
S
T
G
H
I
U
W
J
K
L
Z
V
M
N
O
1
2
P
Q
R
3
4
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
X
Y
D
E
F
S
T
G
H
I
U
W
J
K
L
Z
V
M
N
O
1
2
P
Q
R
3
4
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
X
Y
D
E
F
S
T
G
H
I
U
W
J
K
L
Z
V
M
N
O
1
2
P
Q
R
3
4
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
X
Y
D
E
F
S
T
G
H
I
U
W
J
K
L
Z
V
M
N
O
1
2
P
Q
R
3
4
spcl.inf.ethz.ch
@spcl_eth
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
27
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 1: groups of processes: Divide processes into groups of size G each
Add m parity processes to each group to store the parity data
m
G
A
B
C
X
Y
D
E
F
S
T
G
H
I
U
W
J
K
L
Z
V
M
N
O
1
2
P
Q
R
3
4
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
A D G J M P
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups:
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
A D G J M P
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups: For example, apply topology-awareness at the level of blades...
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
A D G J M P
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups: For example, apply topology-awareness at the level of blades...
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
A D G J M P
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups: For example, apply topology-awareness at the level of blades...
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
A D G J M P
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups: For example, apply topology-awareness at the level of blades...
... and nodes
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
A D G J M P
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups: For example, apply topology-awareness at the level of blades...
... and nodes
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
A D G J M P
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups: For example, apply topology-awareness at the level of blades...
... and nodes
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
A D G J M P
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups: For example, apply topology-awareness at the level of blades...
... and nodes
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
A D G J M P
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups: For example, apply topology-awareness at the level of blades...
... and nodes
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
G J M P
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups: For example, apply topology-awareness at the level of blades...
... and nodes
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups: For example, apply topology-awareness at the level of blades...
... and nodes
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups: For example, apply topology-awareness at the level of blades...
... and nodes
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
A D G J M P
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
28
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
Step 2: topology-aware distribution of groups: For example, apply topology-awareness at the level of blades...
... and nodes
Proc 5 Proc 6
Proc 7 Proc 8 Proc 11 Proc 12
Proc 13 Proc 14 Proc 17 Proc 18
Proc 19 Proc 20 Proc 23 Proc 24
Proc 19 Proc 20 Proc 21 Proc 22 Proc 23 Proc 24
Proc 1 Proc 2 Proc 3 Proc 4
Proc 9 Proc 10
Proc 15 Proc 16
Proc 21 Proc 22
A D G J M P
Y T W V 2 4
Q
R
3
B
X
E
F
S
H
I
U
K
L
Z
N
O
1
C
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failure
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failure
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failure
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failureProbability
that xj elements
of level j will fail
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failureProbability
that xj elements
of level j will fail
Probability that xj
given failures at level j
are catastrophic
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failureProbability
that xj elements
of level j will fail
Probability that xj
given failures at level j
are catastrophic
Every number
of xj elements
is considered...
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failureProbability
that xj elements
of level j will fail
Probability that xj
given failures at level j
are catastrophic
Every number
of xj elements
is considered...
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failureProbability
that xj elements
of level j will fail
Probability that xj
given failures at level j
are catastrophic
Every number
of xj elements
is considered...
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failureProbability
that xj elements
of level j will fail
Probability that xj
given failures at level j
are catastrophic
Every number
of xj elements
is considered...
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failureProbability
that xj elements
of level j will fail
Probability that xj
given failures at level j
are catastrophic
Every number
of xj elements
is considered...
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failureProbability
that xj elements
of level j will fail
Probability that xj
given failures at level j
are catastrophic
Every number
of xj elements
is considered...
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failureProbability
that xj elements
of level j will fail
Probability that xj
given failures at level j
are catastrophic
Every number
of xj elements
is considered...
...at every
hierarchy level
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
29
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓 =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗(𝑥𝑗 ∩ 𝑥𝑗,𝑐𝑓) =
𝑗=1
ℎ
𝑥𝑗=1
𝐻𝑗
𝑃𝑗 𝑥𝑗 𝑃𝑗(𝑥𝑗,𝑐𝑓|𝑥𝑗)
Probability that xj elements
of level j will fail and cause
a catastrophic failureProbability
that xj elements
of level j will fail
Probability that xj
given failures at level j
are catastrophic
Every number
of xj elements
is considered...
...at every
hierarchy level
...
...
...
...
...
spcl.inf.ethz.ch
@spcl_eth
30
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
spcl.inf.ethz.ch
@spcl_eth
30
EXTENDING THE PROTOCOLS FOR MORE RESILIENCE
The probability of a catastrophic failure in a multi-level
computing machine A catastrophic failure: a failure that takes place when more than m processes in
the same group die
𝑃𝑐𝑓/𝑑𝑎𝑦:
𝑁𝑟 𝑜𝑓 𝑝𝑎𝑟𝑖𝑡𝑦 𝑝𝑟𝑜𝑐𝑒𝑠𝑠𝑒𝑠
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
31
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
31
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
31
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
32
HOLISTIC RESILIENCE PROTOCOL FOR RMATHE OVERVIEW
spcl.inf.ethz.ch
@spcl_eth
32
HOLISTIC RESILIENCE PROTOCOL FOR RMATHE OVERVIEW
Implemented as FTRMA: a portable fault-tolerance library Based on C and foMPI, an available MPI-3 RMA implementation [1]
[1] R. Gerstenberger, M. Besta, T. Hoefler, Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided.
ACM/IEEE Supercomputing 2013, SC13, Best Paper Award
spcl.inf.ethz.ch
@spcl_eth
32
HOLISTIC RESILIENCE PROTOCOL FOR RMATHE OVERVIEW
Implemented as FTRMA: a portable fault-tolerance library Based on C and foMPI, an available MPI-3 RMA implementation [1]
The layered protocol:
[1] R. Gerstenberger, M. Besta, T. Hoefler, Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided.
ACM/IEEE Supercomputing 2013, SC13, Best Paper Award
spcl.inf.ethz.ch
@spcl_eth
32
HOLISTIC RESILIENCE PROTOCOL FOR RMATHE OVERVIEW
Implemented as FTRMA: a portable fault-tolerance library Based on C and foMPI, an available MPI-3 RMA implementation [1]
The layered protocol:
transparent logging
[1] R. Gerstenberger, M. Besta, T. Hoefler, Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided.
ACM/IEEE Supercomputing 2013, SC13, Best Paper Award
spcl.inf.ethz.ch
@spcl_eth
32
HOLISTIC RESILIENCE PROTOCOL FOR RMATHE OVERVIEW
Implemented as FTRMA: a portable fault-tolerance library Based on C and foMPI, an available MPI-3 RMA implementation [1]
The layered protocol:
transparent logging
uncoordinated checkpointing
[1] R. Gerstenberger, M. Besta, T. Hoefler, Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided.
ACM/IEEE Supercomputing 2013, SC13, Best Paper Award
spcl.inf.ethz.ch
@spcl_eth
32
HOLISTIC RESILIENCE PROTOCOL FOR RMATHE OVERVIEW
Implemented as FTRMA: a portable fault-tolerance library Based on C and foMPI, an available MPI-3 RMA implementation [1]
The layered protocol:
transparent logging
uncoordinated checkpointing
coordinated checkpointing
[1] R. Gerstenberger, M. Besta, T. Hoefler, Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided.
ACM/IEEE Supercomputing 2013, SC13, Best Paper Award
spcl.inf.ethz.ch
@spcl_eth
32
HOLISTIC RESILIENCE PROTOCOL FOR RMATHE OVERVIEW
Implemented as FTRMA: a portable fault-tolerance library Based on C and foMPI, an available MPI-3 RMA implementation [1]
The layered protocol:
transparent logging
uncoordinated checkpointing
coordinated checkpointing
topology-awareness
uses
[1] R. Gerstenberger, M. Besta, T. Hoefler, Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided.
ACM/IEEE Supercomputing 2013, SC13, Best Paper Award
spcl.inf.ethz.ch
@spcl_eth
32
HOLISTIC RESILIENCE PROTOCOL FOR RMATHE OVERVIEW
[2] J.T.Daly, A higher order estimate of the optimum checkpoint interval for restart dumps. Future Generation Computer Systems,
Volume 22 Issue 3, February 2006, Pages 303-312
Implemented as FTRMA: a portable fault-tolerance library Based on C and foMPI, an available MPI-3 RMA implementation [1]
The layered protocol:
transparent logging
uncoordinated checkpointing
usescoordinated checkpointing
Daly’s [2]
formula
topology-awareness
uses
≈ 2𝑀𝑇
[1] R. Gerstenberger, M. Besta, T. Hoefler, Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided.
ACM/IEEE Supercomputing 2013, SC13, Best Paper Award
spcl.inf.ethz.ch
@spcl_eth
32
HOLISTIC RESILIENCE PROTOCOL FOR RMATHE OVERVIEW
[2] J.T.Daly, A higher order estimate of the optimum checkpoint interval for restart dumps. Future Generation Computer Systems,
Volume 22 Issue 3, February 2006, Pages 303-312
Implemented as FTRMA: a portable fault-tolerance library Based on C and foMPI, an available MPI-3 RMA implementation [1]
The layered protocol:
transparent logging
uncoordinated checkpointing
usescoordinated checkpointing
Daly’s [2]
formula
topology-awareness
uses
≈ 2𝑀𝑇
[1] R. Gerstenberger, M. Besta, T. Hoefler, Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided.
ACM/IEEE Supercomputing 2013, SC13, Best Paper Award
usesdemand
checkpoints
spcl.inf.ethz.ch
@spcl_eth
33
Evaluation on CSCS Monte Rosa
1,496 computing Cray XE6 nodes
47,872 schedulable cores
46TB memory
4 protocols
2 applications
PERFORMANCE
spcl.inf.ethz.ch
@spcl_eth
34
PERFORMANCE: COORDINATED CHECKPOINTING
NAS 3D FFT
spcl.inf.ethz.ch
@spcl_eth
34
NAS 3D FFT [1] Performance
[1] Nishtala et al. Scaling communication-intensive applications on BlueGene/P using one-sided communication and
overlap.IPDPS’09
PERFORMANCE: COORDINATED CHECKPOINTING
NAS 3D FFT
spcl.inf.ethz.ch
@spcl_eth
34
NAS 3D FFT [1] Performance
[1] Nishtala et al. Scaling communication-intensive applications on BlueGene/P using one-sided communication and
overlap.IPDPS’09
no-FT: no fault tolerance
PERFORMANCE: COORDINATED CHECKPOINTING
NAS 3D FFT
spcl.inf.ethz.ch
@spcl_eth
34
NAS 3D FFT [1] Performance
[1] Nishtala et al. Scaling communication-intensive applications on BlueGene/P using one-sided communication and
overlap.IPDPS’09
no-FT: no fault tolerance
f-daly: using Daly’s interval
1-5% slower than no-FT
PERFORMANCE: COORDINATED CHECKPOINTING
NAS 3D FFT
spcl.inf.ethz.ch
@spcl_eth
34
NAS 3D FFT [1] Performance
[1] Nishtala et al. Scaling communication-intensive applications on BlueGene/P using one-sided communication and
overlap.IPDPS’09
no-FT: no fault tolerance
f-daly: using Daly’s interval
1-5% slower than no-FT
f-no-daly: no Daly’s interval
1-15% slower than no-FT
PERFORMANCE: COORDINATED CHECKPOINTING
NAS 3D FFT
spcl.inf.ethz.ch
@spcl_eth
34
NAS 3D FFT [1] Performance
SCR: a popular checkpoint
/ restart library
21-67% slower than no-FT
[1] Nishtala et al. Scaling communication-intensive applications on BlueGene/P using one-sided communication and
overlap.IPDPS’09
no-FT: no fault tolerance
f-daly: using Daly’s interval
1-5% slower than no-FT
f-no-daly: no Daly’s interval
1-15% slower than no-FT
PERFORMANCE: COORDINATED CHECKPOINTING
NAS 3D FFT
spcl.inf.ethz.ch
@spcl_eth
35
PERFORMANCE: UNCOORDINATED CHECKPOINTING
NAS 3D FFT
spcl.inf.ethz.ch
@spcl_eth
35
NAS 3D FFT [1] Performance
[1] Nishtala et al. Scaling communication-intensive applications on BlueGene/P using one-sided communication and
overlap.IPDPS’09
PERFORMANCE: UNCOORDINATED CHECKPOINTING
NAS 3D FFT
spcl.inf.ethz.ch
@spcl_eth
35
NAS 3D FFT [1] Performance
[1] Nishtala et al. Scaling communication-intensive applications on BlueGene/P using one-sided communication and
overlap.IPDPS’09
How the log size impacts the
number of uncoordinated
checkpoints and the
performance of the code?
PERFORMANCE: UNCOORDINATED CHECKPOINTING
NAS 3D FFT
spcl.inf.ethz.ch
@spcl_eth
35
NAS 3D FFT [1] Performance
[1] Nishtala et al. Scaling communication-intensive applications on BlueGene/P using one-sided communication and
overlap.IPDPS’09
How the log size impacts the
number of uncoordinated
checkpoints and the
performance of the code?
PERFORMANCE: UNCOORDINATED CHECKPOINTING
NAS 3D FFT
spcl.inf.ethz.ch
@spcl_eth
36
PERFORMANCE: ACCESS LOGGING
NAS 3D FFT
spcl.inf.ethz.ch
@spcl_eth
36
NAS 3D FFT [1] Performance
[1] Nishtala et al. Scaling communication-intensive applications on BlueGene/P using one-sided communication and
overlap.IPDPS’09
PERFORMANCE: ACCESS LOGGING
NAS 3D FFT
spcl.inf.ethz.ch
@spcl_eth
36
NAS 3D FFT [1] Performance
[1] Nishtala et al. Scaling communication-intensive applications on BlueGene/P using one-sided communication and
overlap.IPDPS’09
PERFORMANCE: ACCESS LOGGING
NAS 3D FFT
no-FT: no fault tolerance
spcl.inf.ethz.ch
@spcl_eth
36
NAS 3D FFT [1] Performance
[1] Nishtala et al. Scaling communication-intensive applications on BlueGene/P using one-sided communication and
overlap.IPDPS’09
PERFORMANCE: ACCESS LOGGING
NAS 3D FFT
no-FT: no fault tolerance
FTRMA: logging puts (FFT
code does not use gets)
8-9% slower than no-FT
spcl.inf.ethz.ch
@spcl_eth
36
NAS 3D FFT [1] Performance
[1] Nishtala et al. Scaling communication-intensive applications on BlueGene/P using one-sided communication and
overlap.IPDPS’09
PERFORMANCE: ACCESS LOGGING
NAS 3D FFT
no-FT: no fault tolerance
FTRMA: logging puts (FFT
code does not use gets)
8-9% slower than no-FT
ML: a simple protocol
based on message logging
18% slower than no-FT
spcl.inf.ethz.ch
@spcl_eth
37
PERFORMANCE: ACCESS LOGGING
DISTRIBUTED HASHTABLE
spcl.inf.ethz.ch
@spcl_eth
37
Distributed Hashtable Performance
PERFORMANCE: ACCESS LOGGING
DISTRIBUTED HASHTABLE
spcl.inf.ethz.ch
@spcl_eth
37
Distributed Hashtable Performance
PERFORMANCE: ACCESS LOGGING
DISTRIBUTED HASHTABLE
An insert: 1 get and 1 put are logged
A collision: 4 gets and 6 puts are logged
spcl.inf.ethz.ch
@spcl_eth
37
Distributed Hashtable Performance
PERFORMANCE: ACCESS LOGGING
DISTRIBUTED HASHTABLE
no-FT: no fault tolerance
An insert: 1 get and 1 put are logged
A collision: 4 gets and 6 puts are logged
spcl.inf.ethz.ch
@spcl_eth
37
Distributed Hashtable Performance
PERFORMANCE: ACCESS LOGGING
DISTRIBUTED HASHTABLE
no-FT: no fault tolerance
f-puts: logging puts
12% slower than no-FT
An insert: 1 get and 1 put are logged
A collision: 4 gets and 6 puts are logged
spcl.inf.ethz.ch
@spcl_eth
37
Distributed Hashtable Performance
PERFORMANCE: ACCESS LOGGING
DISTRIBUTED HASHTABLE
no-FT: no fault tolerance
f-puts: logging puts
12% slower than no-FT
f-puts-gets: logging puts
and gets
33% slower than no-FT
An insert: 1 get and 1 put are logged
A collision: 4 gets and 6 puts are logged
spcl.inf.ethz.ch
@spcl_eth
37
Distributed Hashtable Performance
PERFORMANCE: ACCESS LOGGING
DISTRIBUTED HASHTABLE
no-FT: no fault tolerance
f-puts: logging puts
12% slower than no-FT
ML: logging puts and gets
40% slower than no-FT
f-puts-gets: logging puts
and gets
33% slower than no-FT
An insert: 1 get and 1 put are logged
A collision: 4 gets and 6 puts are logged
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
38
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
38
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
38
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
38
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
38
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
38
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
38
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Holistic fault-tolerance library
Topology-awareness
CC in RMAGeneric model
UC in RMA
Recovery in RMAProofs
38
OVERVIEW OF OUR RESEARCH
PerformanceDistribution of
processesDesign
MP vs. RMA
MP vs. RMA
Model extensions
Deadlock freedom
Correct recovery
Checkpointing
schemes
Optimizations
Checkpoints
on demand
Schemes
Basic scheme
Extended RMA
semantics
Decreasing
failure prob.
Logging accesses
spcl.inf.ethz.ch
@spcl_eth
Thanks to:
Paul Hargrove (and the whole UPC team)
and the MPI Forum RMA WG …
… and the institutions:
39
ACKNOWLEDGMENTS
spcl.inf.ethz.ch
@spcl_eth
Thank you
for your attention
40
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
memory
Logs of puts
CC
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
memory
Logs of puts
CC
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
record the put
memory
Logs of puts
CC
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
record the put
memory
Logs of puts
CC
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
record the put
memory
Logs of puts
C
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
record the put
memory
Logs of puts
C + #put
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
To replay the actions
preserving ,
we also record
epoch counters
record the put
memory
Logs of putsco
C + #put
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
To replay the actions
preserving ,
we also record
epoch counters
record the put
memory
Logs of putsco
C + #put
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
EC 2
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
To replay the actions
preserving ,
we also record
epoch counters
record the put
memory
Logs of putsco
C + #put
+ EC( )
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
EC 2
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
To replay the actions
preserving ,
we also record
epoch counters
record the put
memory
Logs of putsco
C + #put
+ EC( )
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
EC 2
2
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
To replay the actions
preserving ,
we also record
epoch counters
record the put
memory
Logs of putsco
C + #put
+ EC( )
C
Data is valid
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
EC 2
2
spcl.inf.ethz.ch
@spcl_eth
41
RMA: LOGGING PUTS
Proc p Proc q
C
𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔,… , 𝑑𝑎𝑡𝑎
#𝑎 = 𝑠𝑟𝑐, 𝑡𝑟𝑔, …
2
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory memory
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
memory
App. data
...
DataPDataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
memory
Chpt. data
App. data
Chpt. data
...
...
DataPDataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
memory
Application
state before
the checkpoint
Chpt. data
App. data
Chpt. data
...
...
DataPDataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
memorymemory
Application
state before
the checkpoint
Chpt. data
App. data
Chpt. data
...
...
DataPDataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memorymemory
Application
state before
the checkpoint
Chpt. data
App. data
Chpt. data
...
...
DataPDataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Application
state before
the checkpoint
DataC
Chpt. data
App. data
Chpt. data
...
...
DataPDataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Application
state before
the checkpoint
DataC
Chpt. data
take a local
checkpointApp. data
Chpt. data
...
...
DataPDataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataPDataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataPDataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataPDataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataP
DataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataP
DataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataP
DataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataP
DataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataP
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataP
can clear
the logs
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
The Epoch Condition
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataP
can clear
the logs
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
The Epoch Condition
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataP
can clear
the logs
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
The Epoch Condition
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataP
can clear
the logs
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
The Epoch Condition
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataP
can clear
the logs
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
The Epoch Condition
Application
state before
the checkpoint
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...
DataP
can clear
the logs
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
The Epoch Condition
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...Application
state some time
later, after some
communication...
DataP
can clear
the logs
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
The Epoch Condition
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...Application
state some time
later, after some
communication...
DataP
can clear
the logs
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
The Epoch Condition
DataC
Chpt. data
DataP take a local
checkpointApp. data
Chpt. data
...
...State lost!
DataP
can clear
the logs
spcl.inf.ethz.ch
@spcl_eth
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
Integrate
the data
with the
checksum
The Epoch Condition
DataC
Chpt. data
take a local
checkpointApp. data
Chpt. data
...
...State lost!
can clear
the logs
spcl.inf.ethz.ch
@spcl_eth
memory
42
RMA: CHECKPOINTING
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
The Epoch Condition
DataC
Chpt. data
App. data
Chpt. data
...
...State lost!
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
DataC
Chpt. data
App. data
Chpt. data
...
...
RMA: RECOVERY
State lost!
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
DataC
Chpt. data
App. data
Chpt. data
...
...
RMA: RECOVERY
State lost!
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
DataC
Chpt. data
App. data
Chpt. data
...
...
RMA: RECOVERY
State lost!
Stage 1: restore the state upon checkpointing
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
DataC
Chpt. data
App. data
Chpt. data
...
...
RMA: RECOVERY
State lost!
Fetch the data
Stage 1: restore the state upon checkpointing
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
DataC
Chpt. data
App. data
Chpt. data
...
...
RMA: RECOVERY
State lost!
...
Fetch the data
Stage 1: restore the state upon checkpointing
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
DataC
Chpt. data
App. data
Chpt. data
...
...
RMA: RECOVERY
State lost!
...
Fetch the data
Decode
the data
Stage 1: restore the state upon checkpointing
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
DataC
Chpt. data
App. data
Chpt. data
...
...
RMA: RECOVERY
State lost!
...
Fetch the data
Decode
the data
Stage 1: restore the state upon checkpointing
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
DataC
Chpt. data
App. data
Chpt. data
...
...
RMA: RECOVERY
State lost!
...
DataP
Fetch the data
Decode
the data
Stage 1: restore the state upon checkpointing
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
DataC
Chpt. data
App. data
Chpt. data
...
...
RMA: RECOVERY
State lost!DataP
Fetch the data
Decode
the data
Stage 1: restore the state upon checkpointing
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
DataC
Chpt. data
App. data
Chpt. data
...
...
RMA: RECOVERY
State lost!DataP
Fetch the data
Decode
the data
Stage 1: restore the state upon checkpointing
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
DataC
Chpt. data
App. data
Chpt. data
...
...
RMA: RECOVERY
State lost!DataP
Fetch the data
Decode
the data
DataP
Stage 1: restore the state upon checkpointing
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
Proc C
a process
that stores
parity data
memory
Parity data
memory
DataC
Chpt. data
App. data
Chpt. data
...
...
RMA: RECOVERY
DataP
Fetch the data
Decode
the data
DataP
Stage 1: restore the state upon checkpointing
spcl.inf.ethz.ch
@spcl_eth
memory
42
Proc p Proc q
memory
App. data
memory
Chpt. data
RMA: RECOVERY
DataP