GGF10 - GridCPR-WG PARIS project-team Activities in Checkpoint Recovery
Berlin, March 11th, 2004
Christine Morin
PARIS INRIA project-team
IRISA – Rennes (France)
http://www.irisa.fr/paris
Cluster Federations
A particular case of grid: interconnection of several clusters of moderate size
Homogeneity and heterogeneity:
More and more homogeneous platforms: PC, Linux
Heterogeneous networks (SAN, LAN, WAN)
Clusters with different amounts and kinds of resources
Considered applications:
Scientific applications (numerical simulation): sequential and parallel applications based on either the shared memory or the message-passing communication paradigm
Code coupling applications
Applications requiring a huge amount of resources (memory, computing power)
Dynamicity:
A cluster may join or leave the federation at any time
Individual nodes in a cluster may fail
[Figure: cluster federation diagram showing SAN, LAN and WAN interconnects]
Grid-aware OS for Cluster Federations
A single system image (SSI) OS on each cluster: a cluster appears as a single machine offering a kind of standard interface (e.g., Mosix, Amoeba, Kerrighed)
A cluster federation is seen as a set of peers: a structured peer-to-peer (P2P) network (instead of a hierarchy)
Fully decentralized control
Native support for dynamicity
Designed for scalability: size of the routing tables bounded by log(N), probabilistic log(N) bounds on the number of routing hops
“Standardization” of the APIs (IRIS project)
Promising work to take into account the network's topology and security issues (Pastry)
Structured P2P systems usually provide distributed hash tables (DHT), a building block for higher-level services
[Figure: higher-level services (DSM, DFS, CPU)]
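To make the log(N) bounds concrete, here is a minimal sketch of structured P2P routing. It uses Chord-style finger tables, swapped in only because they are short to write down; Pastry, cited above, routes by identifier prefixes instead but offers the same kind of bounds. The ring size M, the node identifiers and the names `Node`, `successor`, `between` and `find_successor` are illustrative assumptions, not part of Kerrighed or of any system named on this slide.

```python
M = 6            # identifier bits (assumption): ids live on a ring of 2**M positions
RING = 2 ** M


def successor(sorted_ids, key):
    """First node id at or clockwise after key: the node that stores key."""
    for node_id in sorted_ids:
        if node_id >= key:
            return node_id
    return sorted_ids[0]  # wrap around past zero


def between(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] when inclusive_right is set."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # wrapping interval; a == b means the whole ring except a


class Node:
    def __init__(self, node_id, all_ids):
        self.id = node_id
        ids = sorted(all_ids)
        # Routing table of M fingers: finger[i] = successor(id + 2**i).
        self.fingers = [successor(ids, (node_id + 2 ** i) % RING) for i in range(M)]


def find_successor(nodes, start_id, key):
    """Greedy lookup from start_id; returns (node storing key, number of hops)."""
    node = nodes[start_id]
    hops = 0
    while not between(key, node.id, node.fingers[0], inclusive_right=True):
        # Forward to the farthest finger that still precedes the key.
        for finger in reversed(node.fingers):
            if between(finger, node.id, key):
                node = nodes[finger]
                break
        hops += 1
    return node.fingers[0], hops


if __name__ == "__main__":
    ids = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
    nodes = {i: Node(i, ids) for i in ids}
    print(find_successor(nodes, 8, 54))  # (56, 2): route 8 -> 42 -> 51, whose successor stores key 54
```

Each node keeps M fingers, of which only O(log N) are distinct, and each forwarding step at least halves the remaining distance to the key's predecessor; this gives at most M hops here, and O(log N) hops with high probability in Chord.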
Current Work on Checkpoint Recovery
Cluster federations:
Execution of multithreaded applications in cluster federations
A coherence protocol for cached copies of volatile objects in peer-to-peer systems (multiple failures tolerated)
Hierarchical checkpointing protocol for code coupling applications
Cluster single system image (SSI) operating system: Kerrighed
Full Posix thread interface
Global process and memory management
Configurable global scheduler
High availability:
Dynamic resource management for tolerating cluster reconfigurations (node addition, eviction or failure)
Checkpoint recovery mechanisms
Goals for Checkpoint Recovery in Kerrighed
Experimental platform for checkpointing strategies for parallel applications
Basic mechanisms common to different checkpointing protocols in MP and SM systems
Being able to checkpoint any kind of parallel application
Transparent checkpointing
Implementation of various checkpointing strategies in a single system:
To allow the programmer to choose a suitable strategy for a particular application
To be able to compare several strategies with realistic (industrial) applications
Avoid code duplication in the system
Robustness
Fair comparison
Common framework:
Checkpoint and rollback servers
Checkpoint numbering
Dependency management: a unified model for the message-passing and shared memory models
Direct Dependency Vector (DDV) management
Message logging
Incremental checkpointing
Checkpointing in the background
Communication system: atomic multicast
Stable storage: different implementations (disk, memory)
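The goals above mention checkpoint numbering and incremental checkpointing. The following sketch shows one simple way the two fit together, under assumptions: process memory is modelled as a dict from page number to bytes, writes are reported explicitly instead of being trapped by the kernel, and an in-memory dict stands in for stable storage. Each checkpoint gets the next number and saves only the pages written since the previous one; a rollback replays checkpoints 0..n in order. The class `IncrementalCheckpointer` and its methods are hypothetical and do not describe Kerrighed's implementation.

```python
class IncrementalCheckpointer:
    """Hypothetical sketch of numbered, page-granularity incremental checkpoints."""

    def __init__(self):
        self.pages = {}        # live memory: page number -> contents
        self.dirty = set()     # pages written since the last checkpoint
        self.storage = {}      # stand-in for stable storage: checkpoint number -> saved pages
        self.next_number = 0   # checkpoint numbering

    def write(self, page, data):
        """Record a write; a kernel would detect it with write-protection faults or dirty bits."""
        self.pages[page] = data
        self.dirty.add(page)

    def checkpoint(self):
        """Save only the pages dirtied since the previous checkpoint; return the new number."""
        number = self.next_number
        self.storage[number] = {p: self.pages[p] for p in self.dirty}
        self.dirty.clear()
        self.next_number += 1
        return number

    def restore(self, number):
        """Roll back to checkpoint `number` by replaying checkpoints 0..number in order."""
        image = {}
        for n in range(number + 1):
            image.update(self.storage[n])  # later checkpoints override older page copies
        for later in [n for n in self.storage if n > number]:
            del self.storage[later]        # checkpoints after the rollback point are discarded
        self.pages = dict(image)
        self.dirty.clear()
        self.next_number = number + 1
        return image


if __name__ == "__main__":
    cp = IncrementalCheckpointer()
    cp.write(0, b"A")
    cp.write(1, b"B")
    first = cp.checkpoint()        # saves pages 0 and 1
    cp.write(1, b"B2")
    second = cp.checkpoint()       # saves page 1 only
    cp.write(0, b"A3")             # lost on rollback: never checkpointed
    print(cp.restore(second))      # {0: b'A', 1: b'B2'}
```

The trade-off is the usual one for incremental schemes: each checkpoint is small, but a restore has to read the whole chain, which is one reason a real system would also need garbage collection or periodic full checkpoints.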
Checkpoint Recovery in Kerrighed: Current Status and Work Directions
Current Status
Linux-based (2.4) Kerrighed prototype: a small kernel patch and a set of modules
Transparent checkpoint recovery for individual (computing) processes
Virtualization of a process in the cluster
A single ghost mechanism for process migration, checkpointing and restoration
Easy specialization of the stable storage implementation
A ghost can be sent to or retrieved from the network, memory or disk
Work Directions
Complete the debugging of coordinated checkpointing (and recovery) for multithreaded and message-passing based applications
Checkpointable locks and barriers in a cluster
Disk I/O management
Posix extension for a proper integration of transparent checkpointing/recovery in the operating system
[Figure: ghost process exchanged with memory, disk or network to implement duplication, migration and checkpoint/restart]
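The ghost idea is that one representation of a process, written to or read from memory, disk or the network, serves duplication, migration and checkpoint/restart alike. The sketch below illustrates that design at user level only, under assumptions: process state is a plain dict serialized with `pickle`, and a connected socket pair stands in for the network. `GhostStore`, `MemoryStore`, `DiskStore`, `SocketStore`, `export_ghost` and `import_ghost` are hypothetical names; Kerrighed's ghosts are kernel objects with their own interface.

```python
import os
import pickle
import socket
import struct
import tempfile
from abc import ABC, abstractmethod


class GhostStore(ABC):
    """Hypothetical backend a ghost can be sent to or retrieved from."""

    @abstractmethod
    def put(self, data: bytes) -> None: ...

    @abstractmethod
    def get(self) -> bytes: ...


class MemoryStore(GhostStore):
    """In-memory copy, e.g. for process duplication or a memory checkpoint."""

    def __init__(self):
        self._data = b""

    def put(self, data):
        self._data = data

    def get(self):
        return self._data


class DiskStore(GhostStore):
    """File-backed copy, e.g. for a checkpoint on disk."""

    def __init__(self, path):
        self.path = path

    def put(self, data):
        with open(self.path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the checkpoint to the device before returning

    def get(self):
        with open(self.path, "rb") as f:
            return f.read()


class SocketStore(GhostStore):
    """Network copy, e.g. for migration: length-prefixed bytes over a connected socket."""

    def __init__(self, sock):
        self.sock = sock

    def put(self, data):
        self.sock.sendall(struct.pack("!I", len(data)) + data)

    def get(self):
        (size,) = struct.unpack("!I", self._recv_exact(4))
        return self._recv_exact(size)

    def _recv_exact(self, n):
        buf = b""
        while len(buf) < n:
            chunk = self.sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed the connection")
            buf += chunk
        return buf


def export_ghost(state: dict, store: GhostStore) -> None:
    """Serialize a (simplified) process state and hand it to any backend."""
    store.put(pickle.dumps(state))


def import_ghost(store: GhostStore) -> dict:
    """Rebuild the process state from whichever backend holds the ghost."""
    return pickle.loads(store.get())


if __name__ == "__main__":
    state = {"pid": 42, "registers": [1, 2, 3], "pages": {0: b"code"}}  # toy state
    path = os.path.join(tempfile.mkdtemp(), "ghost.ckpt")
    export_ghost(state, DiskStore(path))            # checkpoint to disk
    print(import_ghost(DiskStore(path)) == state)   # True: restart from the file
    src, dst = socket.socketpair()
    export_ghost(state, SocketStore(src))           # migration over the network
    print(import_ghost(SocketStore(dst)) == state)  # True
    src.close()
    dst.close()
```

Adding a backend only means writing one more `GhostStore` subclass, which mirrors the "easy specialization of the stable storage implementation" point above.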
Hierarchical Checkpoint Recovery for Cluster Federations
Relaxed inter-cluster synchronism to reflect the architecture:
Coordinated checkpointing in a cluster
Communication-induced checkpointing between clusters
Independent checkpoints in each cluster
Forced checkpoints when a communication generates a new dependency
Force a checkpoint only if the sender has saved a checkpoint since its last send
Several cluster checkpoints are kept
Management of Direct Dependency Vectors (DDV) to detect dependencies:
DDV included in inter-cluster messages
DDV associated with cluster checkpoints
Garbage collection of useless cluster checkpoints
Evaluation by discrete-event simulation
Works well if:
there are few inter-cluster communications
inter-cluster communications are “quasi-unidirectional”
[Figure: Simulation, Processing and Display components]
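The dependency rules above can be sketched per cluster. In this simplified, hypothetical model (the class `Cluster` and its fields are illustrative names), each cluster numbers its cluster checkpoints, piggybacks a copy of its DDV on every inter-cluster message, and keeps every checkpoint together with the DDV current when it was taken. A message creates a new dependency when the sender's own DDV entry exceeds the receiver's entry for that sender; after the first message from a cluster, this can only happen if the sender has saved a checkpoint since it last sent to that receiver. The receiver then takes a forced checkpoint before delivery. Intra-cluster coordination, rollback and garbage collection are left out, and this is one reading of the slide's rules rather than the published protocol.

```python
class Cluster:
    """Hypothetical sketch: one cluster in a communication-induced scheme driven by DDVs."""

    def __init__(self, cid, n_clusters):
        self.cid = cid
        self.sn = 0                          # number of the last cluster checkpoint (0 = initial state)
        self.ddv = [-1] * n_clusters         # ddv[j]: highest checkpoint number received from cluster j
        self.ddv[cid] = self.sn              # own entry tracks own checkpoint number
        self.saved = [(self.sn, list(self.ddv))]  # kept cluster checkpoints, each with its DDV
        self.forced = 0                      # number of forced checkpoints taken so far

    def checkpoint(self, forced=False):
        """Take a cluster checkpoint (coordinated inside the cluster in the real protocol)."""
        self.sn += 1
        self.ddv[self.cid] = self.sn
        self.saved.append((self.sn, list(self.ddv)))
        if forced:
            self.forced += 1

    def send(self, payload):
        """Inter-cluster send: piggyback a copy of the sender's DDV."""
        return self.cid, list(self.ddv), payload

    def receive(self, message):
        sender, sender_ddv, payload = message
        sender_sn = sender_ddv[sender]
        if sender_sn > self.ddv[sender]:
            # New dependency: save a checkpoint that does not depend on it, then record it.
            self.checkpoint(forced=True)
            self.ddv[sender] = sender_sn
        return payload


if __name__ == "__main__":
    a, b = Cluster(0, 2), Cluster(1, 2)
    for i in range(5):              # quasi-unidirectional traffic A -> B, no checkpoint at A
        b.receive(a.send(i))
    print(b.forced)                 # 1: only the first message created a new dependency
    a.checkpoint()                  # A's independent checkpoint
    b.receive(a.send("next"))
    print(b.forced)                 # 2
```

Taking the forced checkpoint before recording the new entry means the saved state does not depend on the sender's new checkpoint interval, which limits how far the receiver must roll back if the sender later restarts from its latest checkpoint. The usage lines also hint at why the scheme suits mostly one-way traffic: five messages from A cost B a single forced checkpoint.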
Future Work
Checkpoint recovery in the large (we plan to hire a PhD student)
Dealing with applications with huge data sets executed in cluster federations
Follow-up of our preliminary work on a hierarchical checkpointing protocol for code coupling applications in cluster federations
Based on the Kerrighed experimental platform
Not only basic coordinated checkpointing but also various variants of independent and communication-induced strategies
Standard interface and basic building blocks
Implementation in Kerrighed of ideas studied in previous projects
ICARE: fault-tolerant software DSM
Combining the replication inherent to the DSM with the replication needed to ensure the stability of recovery data
Extension of the coherence protocol to manage recovery data in memory
HA-PSLS: integration of a DSM and a parallel file system
Upgrading ICARE
Coexistence of persistent and in-memory checkpoints
Swap management (to avoid memory size limitations and to evict recovery data from memory)
Mapped file management (in-place checkpoints)
http://www.kerrighed.org
Kerrighed is registered as a community trademark.
Software Distribution
Kerrighed web site: http://www.kerrighed.org (open since mid-November 2002)
Open source under GPL licence
Current version: Kerrighed V0.81, based on Linux 2.4.24
Kerrighed users mailing list (created in April 2003)
Kerrighed forum (created February 2004)
Notes: Kerrighed is a registered trademark; each public release of Kerrighed is deposited at the APP
Kerrighed tutorial (in conjunction with ICS’04, Saint-Malo (France), June 27th, 2004)
Roadmap for the Kerrighed Prototype
March 2004: MPI (with migration)
April 2004: Kerrighed V1.00 (SSI-OSCAR), SGFD
January 2005: Kerrighed V1.10, 64 bits (Opteron), checkpointing for parallel applications
July 2005: Kerrighed V2.0, high availability
Current Support: EDF
Kerrighed research prototype (2000-2003): CRECO EDF/INRIA
CIFRE Ph.D. grant (Geoffroy Vallée)
Industrial post-doc (Renaud Lottiaux)
Experiments with the first industrial applications provided by EDF: HRM1D, CATHARE, Cyrano 3, Aster
Kerrighed integration in OSCAR (2004-2005): INRIA industrial post-doc (G. Vallée) with EDF & ORNL
SSI-OSCAR
Current Support: DGA
Kerrighed robustness and full set of functionalities (2003-2005): COCA PEA funded by DGA
Partnership with CGEY and ONERA-CERT
2 full-time engineers (Renaud Lottiaux, David Margery)
Experiments with industrial applications: Ligase, Gorf3D, Mixsar, RTI HLA
Current Kerrighed Team (part of the PARIS project-team)
Faculty: Christine Morin (DR, INRIA)
PhD students: Pascal Gallard (INRIA), Gaël Utard (INRIA), Louis Rilling (ENS-Cachan)
Post-doc: Geoffroy Vallée (PDI-EDF)
Engineers: Renaud Lottiaux (INRIA), David Margery (INRIA)
Invited researcher: Isaac Scherson (UCI)
Master students: Jamal Ghaffour, Etienne Rivière
Former members:
Ramamurthy Badrinath (assistant professor, IIT Kharagpur, India), May 2002-April 2003
Viet Hoa Dinh (engineer), September 2001-September 2002
Jean-Yves Burlett (Master student, univ. Rennes 1), February-June 2001
Sébastien Monnet (Master student, univ. Rennes 1), February-June 2003
H. Maka (Bachelor student, IIT Kharagpur), May-July 2003
Academic Collaborations
University of Ulm, Germany: checkpointing for shared memory parallel applications
Rutgers University, USA: Myrinet, Infiniband, self-healing clusters
ORNL: SSI-OSCAR
University of California, Irvine, USA: global scheduling
Deakin University, Australia: SSI (informal contacts)