InstantGrid: A Framework for On-Demand Grid Point Construction R.S.C. Ho, K.K. Yin, D.C.M. Lee,...


InstantGrid: A Framework for On-Demand Grid Point Construction

R.S.C. Ho, K.K. Yin, D.C.M. Lee, D.H.F. Hung, C.L. Wang, and F.C.M. Lau

Dept. of Computer Science, The University of Hong Kong

Grid point construction is a difficult task

Different grid users/applications demand different execution environments (EE’s)

Managing, and switching between, different EE’s incurs substantial system administration overhead

E.g. Computing grid (MPICH-G2, etc.) vs. service grid (GT3); different OS distributions/versions, libraries, etc.

Our solution – InstantGrid

A framework for efficient construction of grid points

• Convenient system administration for multiple EE’s

• Instant EE construction in remote nodes

• Complete transparency to user applications

• Supports in-memory execution – protects data on the hard disk from malicious access


The InstantGrid Framework

All EE’s are installed, configured, and managed in central InstantGrid servers

Cluster/grid nodes obtain customized EE’s through the network (i.e., the “dissemination” process)

Framework consists of the following key elements:

• Application-centric software grouping

• Proactive software configuration

• Discriminative file sharing mechanisms

• Options for file storage in compute nodes

• An EE dissemination service

Single Linux Image Management (SLIM): The infrastructure for EE dissemination

SLIM is able to deliver customized EE’s for:

• HPC cluster/grid systems

• Linux desktops

• Diskless Linux nodes


Application-centric Software Grouping

Software is grouped to match the specific requirements of applications

An EE is a collection of software components, which include an OS, system libraries, grid/cluster middleware, applications, and the user data

Customized EE “images” for different users/applications

Facilitates software management and dissemination

Sample EE’s:

(a) A service-oriented grid point

(b) A frontend node for HPC job submission

(c) A typical cluster node, which processes jobs dispatched from the frontend node

(b)+(c): A single EE group indicating the software requirements of a cluster-based grid point, which includes a gatekeeper and a number of compute nodes


Proactive Software Configuration

Traditionally, software is installed and configured incrementally

InstantGrid advocates “configuration before dissemination”

Configure all software on the central server wherever possible

The EE’s disseminated are (almost) ready to run

Discriminative File Sharing Mechanism

Full replication is impractical due to the large size of typical EE’s

Updating files through NFS is slow

InstantGrid adopts a hybrid approach: replicate frequently updated files, and serve all other files through NFS
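The hybrid file-sharing approach described above can be illustrated with a minimal sketch. This is not InstantGrid's actual implementation: the `EEFile` type, the `frequently_updated` flag, and the example paths are all hypothetical, and the sketch assumes an administrator marks in advance which files are frequently updated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EEFile:
    path: str                 # hypothetical path inside an EE image
    frequently_updated: bool  # assumption: flagged by the administrator


def plan_sharing(files):
    """Split EE files into a locally replicated set and an NFS-served set."""
    plan = {"replicate": [], "nfs": []}
    for f in files:
        # Frequently updated files are replicated; everything else stays on NFS.
        key = "replicate" if f.frequently_updated else "nfs"
        plan[key].append(f.path)
    return plan


# Hypothetical example files, for illustration only.
example_files = [
    EEFile("/var/log/messages", True),
    EEFile("/etc/hostname", True),
    EEFile("/usr/lib/libm.so", False),
    EEFile("/usr/bin/gcc", False),
]
plan = plan_sharing(example_files)
```

Here `plan["replicate"]` lists the files a node would copy locally, and `plan["nfs"]` lists the files it would read from the server.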

Options for File Storage in Compute Nodes

• “Full-copy to RAM” – files stored entirely in physical memory

• “Full-copy to HD” – files stored on the hard disk

• “Copy-if-needed” – files stored on the hard disk; only new files are copied
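The difference between the three storage options can be sketched as follows. The names `StorageOption`, `files_to_copy`, and `storage_target` are hypothetical. The sketch also assumes that a "new" file is one that is missing locally or whose content digest differs from the server's copy; the source does not define "new" precisely.

```python
from enum import Enum


class StorageOption(Enum):
    FULL_COPY_TO_RAM = "full-copy to RAM"
    FULL_COPY_TO_HD = "full-copy to HD"
    COPY_IF_NEEDED = "copy-if-needed"


def storage_target(option):
    """Return where the node keeps its files under the given option."""
    return "ram" if option is StorageOption.FULL_COPY_TO_RAM else "hd"


def files_to_copy(option, server_files, local_files):
    """Return the sorted paths a node must fetch from the server.

    Both arguments map a file path to a content digest (an assumption).
    """
    if option is StorageOption.COPY_IF_NEEDED:
        # Assumed meaning of "new": absent locally or digest mismatch.
        return sorted(
            path for path, digest in server_files.items()
            if local_files.get(path) != digest
        )
    # Both full-copy options transfer every file.
    return sorted(server_files)


# Hypothetical example data.
server = {"/bin/sh": "a1", "/etc/hosts": "b2", "/usr/lib/libm.so": "c3"}
local = {"/bin/sh": "a1", "/etc/hosts": "old"}
needed = files_to_copy(StorageOption.COPY_IF_NEEDED, server, local)
```

With this example data, copy-if-needed skips `/bin/sh` because the local copy already matches, which is consistent with copy-if-needed being the fastest option in the reported construction times.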

EE Dissemination Service

Service is offered through a DHCP server, a TFTP server, and an NFS server

When a client machine boots up, it obtains its IP address from the DHCP server and its kernel from the TFTP server

The client then constructs the pre-defined EE by replicating writable files to local storage and mounting the read-only directories through NFS
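The boot-time sequence described above can be written out as an ordered plan. The sketch below only builds a list of steps; it does not contact any server or run any command. The function name `construction_steps`, the host name `slim-server`, and the example paths are hypothetical.

```python
def construction_steps(writable_files, readonly_dirs, nfs_server="slim-server"):
    """Return the ordered (action, detail) steps a client follows to build its EE."""
    steps = [
        ("dhcp", "obtain IP address"),  # served by the DHCP server
        ("tftp", "fetch kernel"),       # served by the TFTP server
    ]
    # Writable files are replicated to the node's local storage.
    steps += [("copy", path) for path in writable_files]
    # Read-only directories are mounted from the NFS server.
    steps += [("nfs-mount", f"{nfs_server}:{d}") for d in readonly_dirs]
    return steps


# Hypothetical example: one writable file and two read-only directories.
steps = construction_steps(["/etc/hostname"], ["/usr", "/lib"])
```

Keeping the plan as plain data makes the ordering explicit: network configuration first, then the kernel, then local replication, then the NFS mounts.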


Example – Constructing a service-oriented grid point

(Figure: four numbered panels. Labels: SLIM server, OS image, /usr/local/gt3.2, DHCP, TFTP, CA server, certificate, clients.)

1. Software installation at SLIM server

2. Client boots and obtains kernel

3. OS image/App disseminated

4. Process to generate certificates


Performance Evaluation

Conducted on HKU CS’s Gideon Cluster (300 Pentium 4 nodes; Fast Ethernet; each node has 512 MB RAM and a 40 GB IDE hard disk)

Two tests: (a) a cluster-based grid point, and (b) standalone grid points

A 256-node cluster-based grid point can be constructed from scratch in three (copy-if-needed) to five (full-copy to hard disk) minutes

Standalone grid points take longer to construct; the bottleneck lies mainly in generating host certificates

Future Work

To devise standard protocols for communicating EE specifications between the InstantGrid servers and compute nodes

To optimize InstantGrid’s performance over WANs