Clustered Systems for Massive Parallelism
![Page 1: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/1.jpg)
N. Xiong@ GSU Slide 1
Chapter 05
Clustered Systems for
Massive Parallelism
N. Xiong
Georgia State University
![Page 2: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/2.jpg)
Review and Introduction
![Page 3: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/3.jpg)
Chapter 05 Main Contents
Design Objectives of Clusters and MPPs
Cluster and MPP System Architectures
Design Principles of Clustered Systems
Multiple Job Scheduling and Management
Virtual Clustering and Resource Provisioning
Homework Problems
![Page 4: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/4.jpg)
Design Objectives of Clustered Systems
Scalability
Packaging
Control
Homogeneity
Security
![Page 5: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/5.jpg)
Design Objectives of Clustered Systems
![Page 6: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/6.jpg)
Fundamental Cluster Design Issues
Scalable Performance
Single System Image
Availability Support
Cluster Job Management
Internode Communication
Fault Tolerance and Recovery
Growth of Servers in HPC and HTC Systems
![Page 7: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/7.jpg)
Resource-Sharing in Cluster Systems
![Page 8: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/8.jpg)
An Idealized Cluster Architecture
Conventional databases and OLTP monitors offer users a desktop environment
Supports parallel programming based on standard languages and communication libraries
A user-interface subsystem combines the advantages of the Web interface and the Windows GUI
![Page 9: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/9.jpg)
Node Architectures and System Packaging
Two types of cluster nodes: compute nodes and service nodes
![Page 10: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/10.jpg)
Compute Node Examples
![Page 11: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/11.jpg)
Modular Packaging of IBM BlueGene/L System
![Page 12: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/12.jpg)
Cluster System Interconnects
![Page 13: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/13.jpg)
High-Bandwidth Interconnects
![Page 14: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/14.jpg)
An InfiniBand Cluster Interconnection Network
![Page 15: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/15.jpg)
High-bandwidth Interconnects in Top-500 Systems
![Page 16: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/16.jpg)
Hardware, Software, and Middleware Support
![Page 17: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/17.jpg)
Design Principles of Clusters
Single-System-Image (SSI) Features
Single System
Single Control
Symmetry
Location Transparent
![Page 18: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/18.jpg)
Design Principles of Clusters
Single-System-Image Layers
Application Software Layer
Hardware or Kernel Layer
Middleware Layer
![Page 19: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/19.jpg)
Design Principles of Clusters
Single-System-Image Composition
Single Entry Point
Single File Hierarchy
Single I/O, Networking, and Memory Space
Other Desired SSI Features
![Page 20: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/20.jpg)
Single Entry Point
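A single entry point means users connect to one cluster-wide name while the system chooses a physical node behind it. The following is a minimal sketch assuming a plain round-robin policy; the class name `SingleEntryPoint` and the host names are hypothetical, and a real cluster might instead rely on DNS rotation, a load balancer, or load-aware node selection.

```python
from itertools import cycle


class SingleEntryPoint:
    """Hypothetical front end: one cluster name, many physical nodes."""

    def __init__(self, cluster_name, nodes):
        self.cluster_name = cluster_name
        self._next_node = cycle(nodes)  # round-robin iterator over the nodes

    def login(self, user):
        # The user only ever addresses cluster_name; the node choice is hidden.
        node = next(self._next_node)
        return {"user": user, "entry": self.cluster_name, "node": node}


entry = SingleEntryPoint("cluster.example.edu", ["node1", "node2", "node3"])
for u in ["alice", "bob", "carol", "dave"]:
    print(entry.login(u))  # dave wraps around to node1
```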
![Page 21: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/21.jpg)
Single File Hierarchy
It is persistent and fault tolerant to some degree.
Examples of such file systems include the Network File System (NFS) and the Andrew File System (AFS).
![Page 22: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/22.jpg)
Single File Hierarchy
![Page 23: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/23.jpg)
Single I/O, Networking, and Memory Space
Single Input/Output
Single Networking
Single Point of Control
Single Memory Space
![Page 24: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/24.jpg)
Single I/O, Networking, and Memory Space
![Page 25: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/25.jpg)
An Example
![Page 26: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/26.jpg)
Other Desired SSI Features
Single Job Management System
Single User Interface
Single Process Space
![Page 27: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/27.jpg)
Middleware Support for SSI Clustering
![Page 28: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/28.jpg)
High Availability Through Redundancy
Reliability
Availability
Serviceability
![Page 29: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/29.jpg)
Availability and Failure Rate
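Availability is commonly defined as the fraction of time a system is operational, A = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR the mean time to repair. The sketch below evaluates that formula. Treating the failure rate as 1/MTTF assumes a constant failure rate (exponentially distributed failures), and the input numbers are made up for illustration.

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def failure_rate(mttf_hours):
    """Failures per hour, assuming a constant failure rate (1 / MTTF)."""
    return 1.0 / mttf_hours


# Hypothetical system: fails every 1,000 hours on average, 2 hours to repair.
a = availability(1000, 2)
print(f"availability = {a:.4%}, failure rate = {failure_rate(1000)} per hour")
```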
![Page 30: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/30.jpg)
Availability Values of Several Representative Systems
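Availability percentages like those in this comparison are often easier to judge as expected downtime per year. This sketch converts a few generic availability levels (not the values on the slide), assuming a 365-day year and downtime proportional to (1 - A).

```python
HOURS_PER_YEAR = 365 * 24  # assumption: ignores leap years


def downtime_per_year(availability):
    """Expected hours of downtime per year for an availability in [0, 1]."""
    return (1.0 - availability) * HOURS_PER_YEAR


for a in (0.99, 0.999, 0.9999, 0.99999):
    hours = downtime_per_year(a)
    print(f"{a:.3%} available -> {hours:8.2f} h/yr ({hours * 60:9.1f} min/yr)")
```

Each extra "nine" cuts the yearly downtime by a factor of ten, from roughly 87.6 hours at 99% to roughly 5 minutes at 99.999%.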
![Page 31: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/31.jpg)
Redundancy Techniques
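Redundancy helps because a redundant group fails only if every copy fails. Assuming independent failures (optimistic when copies share power, network, or software bugs), components in series multiply their availabilities, while n redundant copies give 1 - (1 - A)^n. The function names and the 0.99 figure are illustrative assumptions.

```python
from math import prod


def series_availability(avails):
    """Every component must be up (no redundancy)."""
    return prod(avails)


def parallel_availability(avails):
    """At least one redundant copy must be up; failures assumed independent."""
    return 1.0 - prod(1.0 - a for a in avails)


node = 0.99  # hypothetical single-node availability
print("single node  :", node)
print("two in series:", series_availability([node, node]))
print("two redundant:", parallel_availability([node, node]))
```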
![Page 32: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/32.jpg)
Fault-Tolerant Cluster Configurations
Hot Standby
Mutual Takeover
Fault-Tolerance
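Both hot standby and mutual takeover depend on detecting that a node has failed and shifting its work elsewhere. Below is a minimal sketch of heartbeat-based detection for a hot-standby pair, using simulated timestamps; `FailoverPair`, the node names, and the 3-second timeout are hypothetical. In mutual takeover both nodes run work and each monitors the other, which this sketch does not model.

```python
class FailoverPair:
    """Hypothetical hot-standby pair: the standby takes over when the active
    node's heartbeats stop for longer than `timeout` seconds."""

    def __init__(self, primary, standby, timeout):
        self.active, self.standby = primary, standby
        self.timeout = timeout
        self.last_heartbeat = 0.0

    def heartbeat(self, now):
        # Called whenever the active node reports that it is alive.
        self.last_heartbeat = now

    def check(self, now):
        # Failure detection: no heartbeat within the timeout window.
        if now - self.last_heartbeat > self.timeout:
            # The failed node is recorded as standby, but it can only
            # serve as one again after it has been repaired.
            self.active, self.standby = self.standby, self.active
            self.last_heartbeat = now  # fresh window for the new active node
            return f"failover: {self.active} is now active"
        return f"{self.active} healthy"


pair = FailoverPair("nodeA", "nodeB", timeout=3.0)
pair.heartbeat(1.0)
print(pair.check(2.0))  # 1 s since last heartbeat: still healthy
print(pair.check(5.0))  # 4 s without a heartbeat: failover to nodeB
```

A timeout alone cannot tell a crashed primary from a slow or partitioned network, so production failover usually adds fencing or a quorum so that two nodes never act as primary at once.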
![Page 33: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/33.jpg)
Recovery Schemes
Backward recovery
Forward recovery: used in real-time systems
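Backward recovery restores a previously saved state and redoes the lost work; forward recovery instead tries to reach a correct new state without rolling back, which matters when rollback delay is unacceptable, as in real-time systems. The sketch covers only backward recovery, and it keeps the checkpoint in memory as a stand-in for stable storage (an assumption; real checkpoints go to disk or remote storage).

```python
import copy


class BackwardRecovery:
    """Minimal rollback sketch: save state at checkpoints and restore the
    most recent checkpoint when a failure is detected."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = copy.deepcopy(state)  # stand-in for stable storage

    def checkpoint(self):
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        # Work done since the last checkpoint is lost and must be redone.
        self.state = copy.deepcopy(self._checkpoint)


job = BackwardRecovery({"step": 0, "total": 0})
for step in range(1, 7):
    job.state["step"] = step
    job.state["total"] += step
    if step == 3:
        job.checkpoint()  # checkpoint taken after step 3
job.rollback()            # simulated failure after step 6
print(job.state)          # {'step': 3, 'total': 6}
```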
![Page 34: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/34.jpg)
Checkpointing and Recovery Techniques
Kernel, Library, and Application Levels
Checkpoint Overheads
Choosing an Optimal Checkpoint Interval
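The interval trade-off can be made concrete: checkpointing more often raises the fixed cost of writing checkpoints, while checkpointing less often raises the work lost on each failure. As a swapped-in, widely used first-order rule (Young's approximation, not necessarily the formula on the slide), with checkpoint cost C and mean time between failures M, the overhead fraction is about C/T + T/(2M), which is minimized at T = sqrt(2CM). The numbers below are hypothetical.

```python
from math import sqrt


def overhead_fraction(interval, ckpt_cost, mtbf):
    """First-order model: checkpoint writing cost (C/T) plus expected lost
    work per failure (T/2 on average, at a rate of 1/MTBF)."""
    return ckpt_cost / interval + interval / (2.0 * mtbf)


def young_interval(ckpt_cost, mtbf):
    """Young's approximation: the interval minimizing overhead_fraction."""
    return sqrt(2.0 * ckpt_cost * mtbf)


C, M = 60.0, 24 * 3600.0  # hypothetical: 60 s per checkpoint, 24 h MTBF
t_opt = young_interval(C, M)
print(f"optimal interval ~ {t_opt / 60:.1f} min")
for t in (t_opt / 2, t_opt, 2 * t_opt):
    print(f"T = {t / 60:6.1f} min -> overhead {overhead_fraction(t, C, M):.3%}")
```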
![Page 35: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/35.jpg)
Checkpointing Parallel Programs
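For a parallel program, the local checkpoints of all processes must together form a consistent global state. The usual test is that no message is orphaned: its receipt is recorded in the receiver's checkpoint, but its sending is not recorded in the sender's. The checker below is a minimal sketch under an assumed event-numbering scheme. Messages sent but not yet received (in transit) do not break consistency, though they would still need to be logged or resent on restart, which is not modeled here.

```python
def orphan_messages(checkpoints, messages):
    """Return messages that make a set of local checkpoints inconsistent.

    checkpoints: {process: number of local events covered by its checkpoint}
    messages: list of (sender, send_event, receiver, recv_event), where event
              numbers count each process's local events starting from 1.
    """
    orphans = []
    for sender, sent_at, receiver, recv_at in messages:
        receipt_saved = recv_at <= checkpoints[receiver]
        send_saved = sent_at <= checkpoints[sender]
        if receipt_saved and not send_saved:  # received, but never "sent"
            orphans.append((sender, sent_at, receiver, recv_at))
    return orphans


msgs = [("P0", 2, "P1", 3),  # sent at P0's 2nd event, received at P1's 3rd
        ("P1", 5, "P0", 4)]

print(orphan_messages({"P0": 3, "P1": 4}, msgs))  # [] -> consistent
print(orphan_messages({"P0": 1, "P1": 4}, msgs))  # first message is orphaned
```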
![Page 36: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/36.jpg)
Cluster Job Scheduling and Management
Cluster Job Management Issues
A user server
A job scheduler
A resource manager
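The three components can be sketched as cooperating objects: the user server accepts and queues jobs, the job scheduler decides which job runs next, and the resource manager tracks and allocates nodes. This is a minimal sketch with a strict first-come, first-served policy; all class and method names are hypothetical and not the interface of any particular cluster job manager.

```python
from collections import deque


class ResourceManager:
    """Tracks free nodes and hands them out (hypothetical interface)."""

    def __init__(self, nodes):
        self.free = list(nodes)

    def allocate(self, n):
        if n > len(self.free):
            return None
        grant, self.free = self.free[:n], self.free[n:]
        return grant

    def release(self, nodes):
        self.free.extend(nodes)


class JobScheduler:
    """Strict FCFS: starts queued jobs in order while nodes are available."""

    def __init__(self, rm):
        self.rm = rm
        self.queue = deque()
        self.running = {}

    def submit(self, job_id, nodes_needed):
        # The user-server role: accept a job and place it in the queue.
        self.queue.append((job_id, nodes_needed))

    def schedule(self):
        while self.queue:
            job_id, n = self.queue[0]
            grant = self.rm.allocate(n)
            if grant is None:  # head of the queue blocks later jobs
                break
            self.queue.popleft()
            self.running[job_id] = grant

    def finish(self, job_id):
        self.rm.release(self.running.pop(job_id))
        self.schedule()


rm = ResourceManager(["n1", "n2", "n3", "n4"])
sched = JobScheduler(rm)
for job, n in [("A", 2), ("B", 3), ("C", 1)]:
    sched.submit(job, n)
sched.schedule()
print(sched.running)  # only A runs; B waits for 3 nodes and C waits behind B
sched.finish("A")
print(sched.running)  # B and C start once A's nodes are released
```

Strict FCFS keeps ordering simple but can leave nodes idle while a large job waits at the head of the queue; backfilling is a common refinement.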
![Page 37: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/37.jpg)
Cluster Job Types
Serial jobs
Parallel jobs
Interactive jobs
Batch jobs
Foreign jobs
![Page 38: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/38.jpg)
Multi-Job Scheduling Schemes
![Page 39: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/39.jpg)
Share Cluster Nodes
Dedicated Mode
Space Sharing
Time Sharing
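These modes can be pictured as a matrix of time slots (rows) by nodes (columns). Dedicated mode gives one job the whole matrix, space sharing places several jobs side by side in a single row, and time sharing adds rows that are cycled round-robin. The sketch below fills such a matrix first-fit in the style of gang scheduling, where all processes of a parallel job occupy the same row so they run at the same time; the function name and job sizes are hypothetical.

```python
def gang_schedule(jobs, num_nodes):
    """Build a time-slot x node matrix; each job's processes share one row.
    Rows would then be cycled round-robin to time-share the cluster."""
    slots = []
    for job, width in jobs:
        if width > num_nodes:
            raise ValueError(f"{job} needs more nodes than the cluster has")
        for row in slots:                 # first fit: reuse a row with room
            if row.count(None) >= width:
                break
        else:                             # no row had room: open a new slot
            row = [None] * num_nodes
            slots.append(row)
        free = [i for i, owner in enumerate(row) if owner is None][:width]
        for i in free:
            row[i] = job
    return slots


matrix = gang_schedule([("A", 2), ("B", 3), ("C", 2), ("D", 1)], num_nodes=4)
for t, row in enumerate(matrix):
    print(f"slot {t}: {row}")
```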
![Page 40: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/40.jpg)
Migration Schemes Issues
Node Availability
Migration Overhead
Recruitment Threshold: the amount of time a workstation stays unused before the cluster considers it an idle node
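Applying the recruitment threshold amounts to filtering workstations by their last activity time. A minimal sketch with hypothetical node names and timestamps (in seconds) follows. Roughly, a short threshold recruits nodes whose owners may return soon and force work to migrate away again, while a long threshold leaves idle cycles unused.

```python
def idle_nodes(last_activity, now, recruitment_threshold):
    """Return workstations unused for at least `recruitment_threshold`
    seconds; only these are treated as idle and eligible for migrated work."""
    return sorted(
        node for node, t in last_activity.items()
        if now - t >= recruitment_threshold
    )


activity = {"ws1": 100.0, "ws2": 880.0, "ws3": 400.0}  # last owner activity
print(idle_nodes(activity, now=1000.0, recruitment_threshold=300.0))
```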
![Page 41: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/41.jpg)
Virtual Clustering and Resource Provisioning
![Page 42: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/42.jpg)
Five Virtual Cluster Research Projects
![Page 43: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/43.jpg)
Live VM Migration and Cluster Management
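Live migration is commonly done by iterative pre-copy: memory is copied while the VM keeps running, pages dirtied during each round are resent, and the VM is paused only for a short final copy. The toy model below assumes constant bandwidth and a constant dirtying rate in MB/s, plus hypothetical stop parameters. If the dirty rate approaches the bandwidth, the rounds stop shrinking and the `max_rounds` cap ends the pre-copy phase.

```python
def precopy_migration(mem_mb, bandwidth_mb_s, dirty_mb_s,
                      stop_mb=50.0, max_rounds=30):
    """Toy pre-copy model. Returns (rounds, total seconds, downtime seconds).

    Round 1 sends all memory while the VM runs; each later round resends
    the data dirtied during the previous round. When the remainder is at
    most stop_mb (or rounds run out), the VM pauses for a final copy.
    """
    to_send = mem_mb
    total_time = 0.0
    rounds = 0
    while to_send > stop_mb and rounds < max_rounds:
        t = to_send / bandwidth_mb_s  # time to send this round
        total_time += t
        to_send = dirty_mb_s * t      # data dirtied during this round
        rounds += 1
    downtime = to_send / bandwidth_mb_s  # stop-and-copy phase
    return rounds, total_time + downtime, downtime


# Hypothetical VM: 4 GB of memory, 125 MB/s link, 25 MB/s dirtying rate.
rounds, total, down = precopy_migration(4096, 125, 25)
print(f"{rounds} pre-copy rounds, {total:.2f} s total, {down * 1000:.0f} ms downtime")
```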
![Page 44: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/44.jpg)
Effect of Live Migration
![Page 45: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/45.jpg)
Dynamic Virtual Resource Provisioning
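One simple way to provision virtual resources dynamically is a threshold rule on measured load. This sketch assumes average utilization is available as a fraction and uses hypothetical thresholds and VM limits. The gap between the low and high thresholds keeps the pool from oscillating when load hovers near a single cut-off.

```python
def provision(current_vms, avg_util, low=0.3, high=0.8, min_vms=1, max_vms=16):
    """Hypothetical threshold policy: add a VM above `high` utilization,
    remove one below `low`, otherwise keep the current size."""
    if avg_util > high and current_vms < max_vms:
        return current_vms + 1
    if avg_util < low and current_vms > min_vms:
        return current_vms - 1
    return current_vms  # inside the [low, high] band: no change


vms = 2
for util in [0.85, 0.90, 0.60, 0.20, 0.10]:
    vms = provision(vms, util)
    print(f"utilization {util:.0%} -> {vms} VMs")
```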
![Page 46: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/46.jpg)
Autonomic Adaptation of Virtual Environments
![Page 47: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/47.jpg)
Some References and Further Reading
![Page 48: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/48.jpg)
Homework Problems
![Page 49: Clustered Systems for Massive Parallelism](https://reader035.fdocuments.in/reader035/viewer/2022062517/56813767550346895d9ef8d1/html5/thumbnails/49.jpg)
Homework Problems