DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous...
-
Upload
emmeline-baker -
Category
Documents
-
view
216 -
download
0
Transcript of DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous...
![Page 1: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/1.jpg)
DISTRIBUTED ALGORITHMS
Luc Onana
Seif Haridi
![Page 2: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/2.jpg)
DISTRIBUTED SYSTEMS
• Collection of autonomous computers, processes, or processors (nodes) interconnected by a communication network.
• Motivations:– Information exchange (collaborative work)
– resources sharing (e.g. printer, backup storage, disk units, etc.)
– cost reduction
– partial-failure (enables increase of availability)
– increase of performance through parallelism,...
![Page 3: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/3.jpg)
Main characteristics
• No shared memory between different nodes– each node has its memory– communication by messages
• No global clock – each node has its own clock
• Impossible for a node to obtain an instantaneous global state of the system
![Page 4: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/4.jpg)
Why do we need distributed algorithms?
• Distributed algorithms are backbone of distributed computing systems– they are essential for the implementation of
distributed systems• distributed operating systems
• distributed databases, communication systems,
• real-time process-control systems,
• transportation, etc.
![Page 5: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/5.jpg)
Focus of the course
• How to design distributed algorithms– Study of fundamental problems– Analysis of distributed algorithms
• How to achieve fault-tolerance in a distributed system– Very important for high availability
– I.e. almost nonstop operation
![Page 6: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/6.jpg)
Categories of Distributed Algorithms
• Fully decentralized
– more difficult in general
• With a centralized coordinator
– conceptually simpler. But require efficient mechanisms for selecting a new coordinator if the current one fails
![Page 7: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/7.jpg)
Material
• Book
– Distributed Operating Systems & Algorithms
By
Randy Chow and Theodore Johnson
• Others
![Page 8: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/8.jpg)
Content
• Introductory chapter (chapter 9)– Causality
• Ordering of events (timestamps)
• Causal communication
– Distributed snapshots• Detecting stable properties, Diffusing computation
– Modeling a distributed computation• Expressing correctness properties of a dist. algo.
![Page 9: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/9.jpg)
Content (cont.)
• Introductory chapter (cont.)– Failures in a distributed system
• In a distributed system, a node may fail, a communication link may fail.
• There is then a need to minimize the damage of a component failure to the users of the system. This is the goal of fault-tolerance
![Page 10: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/10.jpg)
Content (cont.)
• Chapter 10:– Synchronization
• distributed mutual exclusion: needed to regulate accesses to a common resource that can be used only by one process at a time
– Election• as previously mentioned, when a central coordinator
fails, a distributed algorithm is needed to select a new coordinator (important for fault-tolerance)
![Page 11: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/11.jpg)
Content (cont.)
• Chapter 11:– Distributed agreement
• How to get a set of nodes to agree on a value.
• Distributed agreement is used for instance, – to determine which nodes are alive in the system
– to confine malicious behavior of some components
» Fault-tolerance
![Page 12: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/12.jpg)
Content (cont.)
• Chapter 12:– Replicated data management
• A key for high availability is to replicate components (data/files, servers, etc.)
• we shall be concerned with – techniques for maintaining replicated data in a distributed
system, (database techniques)
– atomic broadcast/multicast
– membership
![Page 13: DISTRIBUTED ALGORITHMS Luc Onana Seif Haridi. DISTRIBUTED SYSTEMS Collection of autonomous computers, processes, or processors (nodes) interconnected.](https://reader036.fdocuments.in/reader036/viewer/2022082517/56649e4c5503460f94b41831/html5/thumbnails/13.jpg)
Content (cont.)
• Chapter 13:– Checkpointing and recovery
• Error recovery is essential for fault-tolerance
• When a processor fails and then is repaired, it will need to recover its state of the computation.
• To enable recovery, checkpointing (recording of the state into a stable storage) is needed
• We will be concerned with techniques used for this, in the context of distributed systems