Virtualization in MetaSystems

Vaidy Sunderam, Emory University, Atlanta, USA ([email protected])



Transcript of Virtualization in MetaSystems

Page 1: Virtualization in MetaSystems

Virtualization in MetaSystems

Vaidy Sunderam, Emory University, Atlanta, USA

[email protected]

Page 2: Virtualization in MetaSystems

Credits and Acknowledgements

Distributed Computing Laboratory, Emory University

Dawid Kurzyniec, Piotr Wendykier, David DeWolfs, Dirk Gorissen, Maciej Malawski, Vaidy Sunderam

Collaborators: Oak Ridge National Laboratory (A. Geist, C. Engelmann, J. Kohl); University of Tennessee (J. Dongarra, G. Fagg, E. Gabriel)

Sponsors: U.S. Department of Energy, National Science Foundation, Emory University

Page 3: Virtualization in MetaSystems

Virtualization

A fundamental and universal concept in CS, now receiving renewed, explicit recognition.

Machine level:
Single OS image: Virtuozzo, VServers, Zones
Full virtualization: VMware, VirtualPC, QEMU
Para-virtualization: UML, Xen (Ian Pratt et al., cl.cam.ac.uk)
"Consolidate under-utilized resources, avoid downtime, balance load, enforce security policy"

Parallel distributed computing:
Software systems: PVM, MPICH, grid toolkits and systems
Same goals, plus resource aggregation: consolidate under-utilized resources, avoid downtime, balance load, enforce security policy

Page 4: Virtualization in MetaSystems

Virtualization in PVM

Historical perspective – PVM 1.0, 1989

Page 5: Virtualization in MetaSystems

Key PVM Abstractions

Programming model: a timeshared, multiprogrammed virtual machine
Two-level process space: functional name + ordinal number
Flat, open, reliable messaging substrate
Heterogeneous messages and data representation

Multiprocessor emulation:
Processor/process decoupling
Dynamic addition/deletion of processors
Raw nodes projected transparently, or with exposure of heterogeneous attributes

Page 6: Virtualization in MetaSystems

Parallel Distributed Computing

Platforms: multiprocessor systems; parallel distributed-memory computing
Stable and mainstream: SPMD, MPI
Issues relatively clear: performance

Applications: correspondingly tightly coupled

Page 7: Virtualization in MetaSystems

Parallel Distributed Computing

Metacomputing and grids

Platforms:
Parallelism possibly within components, but mostly loose concurrency or pipelining between components (PVM: 2-level model)
Grids: resource virtualization across multiple admin domains

Moved to an explicit focus on service orientation: "Wrap applications as services, compose applications into workflows"; deploy on service-oriented infrastructure
Motivation: service/resource coupling; the provider supplies both resource and service, with virtualized access

Page 8: Virtualization in MetaSystems

Virtualization in PDC

What can/should be virtualized?

Raw resource:
CPU: process/task instantiation => staging, security, etc.
Storage: e.g. a network file system over GMail
Data: value-added or processed

Service: define interface and input-output behavior; the service provider must operate the service

Communication: an interaction paradigm with strong/adequate semantics

Key capability: configurable/reconfigurable resource, service, and communication

Page 9: Virtualization in MetaSystems

The Harness II Project

Theme: virtualized abstractions for critical aspects of parallel distributed computing, implemented as pluggable modules (including programming systems)

Major project components:
Fault-tolerant MPI: specification, libraries
Container/component infrastructure: C-kernel, H2O
Communication framework: RMIX
Programming systems: FT-MPI + H2O, MOCCA (CCA + H2O), PVM

Page 10: Virtualization in MetaSystems

DVM-enabling components

[Diagram: applications (App 1, App 2) run over a programming-model layer (FT-MPI, PVM components, active objects, ...) on the Harness II virtual layer, which aggregates resources from providers A, B, and C for cooperating users]

Aggregation for concurrent high-performance computing

Hosting layer: a collection of H2O kernels; flexible/lightweight middleware
Equivalent to a distributed virtual machine, but only on the client side
DVM pluglets responsible for: (co)allocation/brokering; naming/discovery; failures/migration/persistence
Programming environments: FT-MPI, CCA, paradigm frameworks, distributed numerical libraries

Page 11: Virtualization in MetaSystems

H2O Middleware Abstraction

Providers own resources and independently make them available over the network
Clients discover, locate, and utilize resources
Resource sharing occurs between a single provider and a single client
Relationships may be tailored as appropriate, including identity formats, resource allocation, and compensation agreements
Clients can themselves be providers; cascading pairwise relationships may be formed

[Diagram: providers and clients connected pairwise across the network]

Page 12: Virtualization in MetaSystems

H2O Framework

Resources are provided as services

Service = active software component exposing the functionality of the resource; may represent "added value"
Runs within a provider's container (execution context)
May be deployed by any authorized party: provider, client, or third-party reseller
Provider specifies policies: authentication/authorization; actors: kernel/pluglet
Decoupling among providers, deployers, and clients

[Diagram: in the traditional model, the provider deploys component A into its own container on the provider host, and the client performs lookup & use; in the H2O model, deployment into the container may be done by the provider, the client, or a reseller, while the client still performs lookup & use]

Page 13: Virtualization in MetaSystems

Example usage scenarios

[Diagram: three usage scenarios in which components A, B, C are deployed from repositories into provider containers by a provider, a client, a reseller, or a developer]

Registration and discovery: e-mail, phone, JNDI, UDDI, LDAP, DNS, GIS, ...

Resource = computational service: the reseller deploys a software component into the provider's container, notifies the client about the offered computational service, and the client utilizes the service

Resource = raw CPU power: the client gathers application components, deploys them into the providers' containers, and executes a distributed application utilizing the providers' CPU power

Resource = legacy application: the provider deploys the service and stores information about it in a registry; the client discovers the service and accesses the legacy application through it

Page 14: Virtualization in MetaSystems

Model and Implementation

H2O nomenclature: container = kernel; component = pluglet
Object-oriented model; Java- and C-based implementations
Pluglet = remotely accessible object
Must implement the Pluglet interface; may implement the Suspendible interface, used by the kernel to signal/trigger pluglet state changes
Model: implement (or wrap) the service as a pluglet to be deployed on kernel(s)

[Diagram: clients invoke functional interfaces (e.g. StockQuote) on pluglets hosted in a kernel; pluglets optionally implement Suspendible]

interface Pluglet {
    void init(ExecutionContext cxt);
    void start();
    void stop();
    void destroy();
}

interface Suspendible {
    void suspend();
    void resume();
}

interface StockQuote {
    double getStockQuote();
}
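To make the model concrete, a minimal pluglet might implement both the lifecycle interface above and a functional interface. This is a hypothetical sketch, not H2O source code: ExecutionContext is stubbed out, and the returned quote is a placeholder.

```java
// Hypothetical sketch of a pluglet: the Pluglet and StockQuote interfaces
// follow the slide; ExecutionContext is a stand-in for the H2O kernel type.
interface ExecutionContext {}

interface Pluglet {
    void init(ExecutionContext cxt);
    void start();
    void stop();
    void destroy();
}

interface StockQuote {
    double getStockQuote();
}

class StockQuotePluglet implements Pluglet, StockQuote {
    private volatile boolean running;

    public void init(ExecutionContext cxt) { /* acquire resources, read config */ }
    public void start()   { running = true; }
    public void stop()    { running = false; }
    public void destroy() { /* release resources */ }

    // The functional interface exposed to remote clients (via RMIX in H2O).
    public double getStockQuote() {
        if (!running) throw new IllegalStateException("pluglet not started");
        return 42.0;  // placeholder; a real pluglet would query a data feed
    }
}
```

In this sketch the kernel would drive init/start/stop/destroy, while clients would see only the StockQuote interface.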

Page 15: Virtualization in MetaSystems

Accessing Virtualized Services

Request-response is ideally suited, but:
Stateful service access must be supported
Efficiency issues; concurrent access
Asynchronous access for compute-intensive services
Semantics of cancellation and error handling
Many approaches focus on performance alone and ignore semantic issues

Solution: an enhanced procedure call/method invocation; a well-understood paradigm, extended to be more appropriate for accessing metacomputing services
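A sketch of what such an enhanced invocation might look like in Java: the call returns immediately, and the remote result or exception is delivered only when the caller claims it, preserving RMI-style error semantics. The AsyncStub class and computeAsync operation are illustrative names, not the H2O/RMIX API.

```java
import java.util.concurrent.CompletableFuture;

// Sketch of an "enhanced" asynchronous invocation that keeps RMI-style
// error semantics: the remote exception reaches the caller at claim time.
// AsyncStub and computeAsync are illustrative, not the actual RMIX API.
class AsyncStub {
    // Asynchronous variant of a remote call: returns at once; the result
    // (or the remote exception) is delivered on join()/get().
    CompletableFuture<Double> computeAsync(double input) {
        return CompletableFuture.supplyAsync(() -> {
            if (input < 0) {
                throw new IllegalArgumentException("negative input");
            }
            return input * 2;   // stands in for the remote computation
        });
    }
}
```

The point of the design is that the familiar call/return paradigm is kept; only the moment of result delivery, and hence of exception delivery, is decoupled from call initiation.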

Page 16: Virtualization in MetaSystems

The RMIX layer

H2O is built on top of the RMIX communication substrate, which provides a flexible peer-to-peer communication layer for H2O applications
Enables various message-layer protocols within a single, provider-based framework library, adopting common RMI semantics
Enables high performance and interoperability; easy porting between protocols; dynamic protocol negotiation
Offers a flexible communication model while retaining RMI simplicity; extended with asynchronous and one-way calls
Issues: consistency, ordering, exceptions, cancellation

[Diagram: RPC clients, Web Services, and SOAP clients interoperate with pluglets on Java H2O kernels through RMIX, which runs over multiple wire protocols (RPC, IIOP, JRMP, SOAP, ...)]

Page 17: Virtualization in MetaSystems

RMIX Overview

Extensible RMI framework
Client and provider APIs: uniform access to communication capabilities supplied by pluggable provider implementations
Multiple protocols supported: JRMPX, ONC-RPC, SOAP
Configurable and flexible: protocol switching; asynchronous invocation
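As a sketch of what protocol switching means for the caller, the toy class below keeps one caller-visible operation while swapping the wire protocol underneath. The Transport abstraction and the way providers are registered here are purely illustrative; the real RMIX provider API differs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative sketch of RMIX-style protocol switching: one client-visible
// call surface, multiple pluggable wire protocols selected at runtime.
class ProtocolSwitch {
    interface Transport {           // hypothetical stand-in for a provider
        String invoke(String method);
    }

    private final Map<String, Transport> providers = new HashMap<>();
    private Transport active;

    void register(String name, Transport t) { providers.put(name, t); }

    // Switch protocols without changing the caller-visible interface.
    void use(String name) {
        active = Objects.requireNonNull(providers.get(name), "unknown protocol");
    }

    String call(String method) { return active.invoke(method); }
}
```

The same idea underlies dynamic protocol negotiation: endpoints agree on a protocol name, then both sides select the matching provider before invoking.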

[Diagram: Java and native service-access clients (ONC-RPC, Web Services/SOAP, GM) reach services through pluggable RMIX providers: RMIX-XSOAP, RMIX-RPCX, RMIX-Myri, RMIX-JRMPX]

Page 18: Virtualization in MetaSystems

RMIX Abstractions

Uniform interface and API
Protocol switching and protocol negotiation
Various protocol stacks for different situations:
SOAP: interoperability
SSL: security
ARPC, custom (Myrinet, Quadrics): efficiency

[Diagram: H2O pluglets (clients or servers) on a Harness kernel communicate across the Internet, choosing protocol stacks for security and firewall traversal or for efficiency as the situation demands]

Asynchronous access to virtualized remote resources

Page 19: Virtualization in MetaSystems

Virtualizing communications: performance/familiarity vs. semantic issues
Parameter marshalling and data consistency (also in PVM, MPI, etc.)
Exceptions/cancellation: critical for stateful servers; conservative vs. best-effort handling
Other issues: execution order, security

[Sequence diagram: an asynchronous RMIX call proceeds through call initiation, parameter marshalling, parameter unmarshalling, the method call, result marshalling, result unmarshalling, and result delivery, with "started" and "completed" notifications sent to the client stub; the cancellation options at the successive stages are: disregard at client side, interrupt client I/O, disregard at server side, interrupt server thread, interrupt server I/O, and ignore result/reset server state]

Cancellation at various stages of the call
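One of these stages, interrupting the server-side thread, can be mimicked with plain java.util.concurrent. This is a sketch of the idea only, standing in for the RMIX machinery rather than reproducing it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the "interrupt server thread" cancellation stage:
// the client cancels an in-progress call, and the server-side
// thread observes the interruption and can reset its state.
class Cancellation {
    static String runAndCancel() {
        ExecutorService server = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        StringBuilder serverSide = new StringBuilder();
        try {
            Future<?> call = server.submit(() -> {
                started.countDown();
                try {
                    Thread.sleep(60_000);           // stands in for a long remote call
                    serverSide.append("completed");
                } catch (InterruptedException e) {
                    serverSide.append("interrupted"); // server observed cancellation
                }
                finished.countDown();
            });
            started.await();
            call.cancel(true);   // true = interrupt the server-side thread
            finished.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            server.shutdown();
        }
        return serverSide.toString();
    }
}
```

Passing false to cancel() would instead model "disregard at client side": the result is dropped at the client while the server runs the call to completion.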

Page 20: Virtualization in MetaSystems

Programming Models: CCA and H2O

Common Component Architecture: a component standard for HPC
Uses and provides ports described in SIDL
Support for scientific data types
Existing tightly coupled (CCAFFEINE) and loosely coupled, distributed (XCAT) frameworks
H2O is well matched to the CCA model

[Diagram, repeated from the H2O Framework slide: in the traditional model the provider deploys components A and B into its own container for the client to look up and use; in the proposed model the deployer may be the provider, the client, or a reseller]

Page 21: Virtualization in MetaSystems

MOCCA implementation in H2O

[Diagram: CCA components run inside component pluglets on H2O kernels; a MoccaMainBuilder exposes a BuilderService that invokes and manages per-kernel builder pluglets]

Each component runs in a separate pluglet
Thanks to H2O kernel security mechanisms, multiple components may run without interfering
Two-level builder hierarchy
ComponentID: pluglet URI
MOCCA_Light: pure Java implementation (no SIDL)

Page 22: Virtualization in MetaSystems

Performance: Small Data Packets

Factors: SOAP header overhead in XCAT; connection pools in RMIX

Page 23: Virtualization in MetaSystems

Large Data Packets

• Encoding (binary vs. base64)

• CPU saturation on Gigabit LAN (serialization)

• Variance caused by Java garbage collection

Page 24: Virtualization in MetaSystems

Use Case 2: H2O + FT-MPI

Overall scheme:
H2O framework installed on computational nodes or cluster front-ends
Pluglet for startup, event notification, node discovery
FT-MPI native communication (also MPICH)

Major value added:
FT-MPI need not be installed anywhere on the computing nodes; it is staged just in time before program execution
Likewise, application binaries and data need not be present on the computing nodes
The system must be able to stage them in a secure manner

Page 25: Virtualization in MetaSystems

Staging FT-MPI runtime with H2O

The FT-MPI runtime library and daemons are staged from a repository (e.g. a Web server) to the computational node upon the user's request
Automatic platform type detection; the appropriate binary files are downloaded from the repository as needed
Allows users to run fault-tolerant MPI programs on machines where FT-MPI is not pre-installed
No login account is needed to do so: H2O credentials are used instead

[Diagram: on "deploy" from the user, the provider's H2O kernel starts a startup pluglet, which stages startup_d and libftmpi.so for the detected platform (LINUX/, SUN4SOL2/, ...) from the FT-MPI binary repository to the host]
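The platform-detection step of staging can be sketched as below. The LINUX/ and SUN4SOL2/ directory names follow the repository layout on the slide, while the Stager class and its simple two-way fallback are purely illustrative; the real startup pluglet probes the node.

```java
// Sketch of choosing the platform-specific binary in the staging repository.
// The LINUX/ and SUN4SOL2/ directory names come from the slide; the Stager
// class itself is illustrative, not part of H2O.
class Stager {
    static String binaryUrl(String repoBase, String osName, String file) {
        // Simplified detection: anything non-Linux falls back to SUN4SOL2
        // here purely for illustration.
        String platform = osName.toLowerCase().contains("linux")
                ? "LINUX" : "SUN4SOL2";
        return repoBase + "/" + platform + "/" + file;
    }
}
```

The startup pluglet would download the resulting URLs (startup_d, libftmpi.so) before launching the FT-MPI daemons.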

Page 26: Virtualization in MetaSystems

Launching FT-MPI applications with H2O

Staging applications from a network repository:
Uses a URL code base to refer to a remotely stored application
The platform-specific binary is transparently uploaded to a computational node upon client request

Separation of roles:
The application developer bundles the application and puts it into a repository
The end user launches the application, unaware of heterogeneity

[Diagram: an application repository at http://myorg.edu/mpiapps/ holds per-platform binaries (LINUX/myapp1, myapp2; SUN4SOL2/myapp1, myapp2; ...); the user's request reaches startup pluglets on the kernels of clusters 1..n, which stage and run the application across the providers]

ftmpirun -np 512 -codebase "http://myorg.edu/mpiapps/" myapp1

Together, the clusters form a distributed virtual machine on which the application is staged and run.

Page 27: Virtualization in MetaSystems

Interconnecting heterogeneous clusters

Private, non-routable networks: communication proxies on cluster front-ends route the data streams
Local (intra-cluster) channels are not affected
Nodes use virtual addresses at the IP level, resolved by the proxy

[Diagram: H2O proxies on the front-ends of Cluster 1 and Cluster 2 relay communication across clusters between application processes and startup daemons (App, Startup_d); communication within a cluster bypasses the proxies]

Page 28: Virtualization in MetaSystems

Initial experimental results

Proxied connection versus direct connection: a standard FT-MPI throughput benchmark was run within a Gigabit Ethernet cluster; proxies retain 65% of the direct throughput

Page 29: Virtualization in MetaSystems

Summary

Virtualization in PDC: devising appropriate abstractions; balancing pragmatics and performance against model cleanness

The Harness II Project:
H2O kernel: reconfigurability by clients and third-party resellers is very valuable
RMIX communications framework: high-level abstractions for control communications (native data communications)
Multiple programming-model overlays: CCA, FT-MPI, PVM
Concurrent computing environments on demand