From Clusters to Grids


October 2003 – Linköping, Sweden

Andrew Grimshaw
Department of Computer Science, University of Virginia
CTO & Founder, Avaki Corporation


Agenda

• Grid Computing Background

• Legion

• Existing Systems & Standards

• Summary


Grid Computing


First: What is a Grid System?

A Grid system is a collection of distributed resources connected by a network.

Examples of distributed resources:
• Desktop and handheld hosts
• Devices with embedded processing resources, such as digital cameras and phones
• Tera-scale supercomputers


What is a Grid?

A grid enables users to collaborate securely by sharing processing, applications, and data across heterogeneous systems and administrative domains, for faster application execution and easier access to data.

• Compute Grids
• Data Grids

A grid is all about gathering resources together and making them accessible to users and applications.


What are the characteristics of a Grid system?

• Numerous resources
• Ownership by mutually distrustful organizations and individuals
• Potentially faulty resources
• Different security requirements and policies
• Heterogeneous resources
• Geographic separation
• Different resource management policies
• Connection by heterogeneous, multi-level networks


Technical Requirements of a Successful Grid Architecture

• Simple
• Secure
• Scalable
• Extensible
• Site autonomy
• Persistence & I/O
• Multi-language
• Legacy support
• Single namespace
• Transparency
• Heterogeneity
• Fault-tolerance & exception management

Success requires an integrated solution AND flexible policy. Manage complexity!


Implication: Complexity is THE Critical Challenge

How should complexity be addressed?


High-level versus low-level solutions

[Chart: robustness versus time and cost to develop, comparing a "sockets & shells" approach with an integrated solution.]

A low-level, "sockets & shells" approach is low in robustness and high in time and cost to develop. An integrated approach is high in robustness and low in time and cost to develop. As application complexity increases, the differences between the two approaches increase dramatically.


The Importance of Integration in a Grid Architecture

If separate pieces are used, the programmer must integrate them.

If some pieces are missing, the programmer must develop enough of the missing pieces to support the application.

Bottom line: both cases raise the bar by putting the cognitive burden on the programmer.


Misconceptions about Grids

• Grids are simple cycle aggregation
• The state of the art is essentially scheduling and queuing for CPU cluster management
• These definitions sell short the promise of Grid technology
• Avaki believes grids are not just about aggregating and scheduling CPU cycles, but also about:
  • Virtualizing many types of resources, internally and across domains
  • Empowering anyone to have secure access to any and all resources through easy administration


Compute Grid Categories

• Sons of SETI@home
  • United Devices, Entropia, Data Synapse
  • Low-end desktop cycle aggregation
  • A hard sell in corporate America
• Cluster load management
  • LSF, PBS, SGE
  • High-end; great for managing local clusters, but not well proven in multi-cluster environments
• As soon as you go outside the local cluster to cross-domain, multi-cluster operation, the game changes dramatically with the introduction of three major issues:
  • Data
  • Security
  • Administration

To address these issues, you need a fully integrated solution, or a toolkit to build one.


Typical Grid Scenarios

• Desktop cycle aggregation
  • Desktop only
  • United Devices, Entropia, Data Synapse
• Cluster & departmental grids
  • Single owner, platform, domain, file system, and location
  • Sun SGE, Platform LSF, PBS
• Enterprise grids
  • Single enterprise; multiple owners, platforms, domains, file systems, locations, and security policies
  • Sun SGE EE, Platform MultiCluster
• Global grids
  • Multiple enterprises, owners, platforms, domains, file systems, locations, and security policies
  • Legion, Avaki, Globus


What are grids being used for today?

Typical adopters have:
• Multiple sites with multiple data sources (public and private)
• A need for secure access to data and applications for sharing
• Partnership relationships with other organizations: internal, partners, or customers
• Computationally challenging applications
• R&D groups distributed across companies, networks, and geographies
• Large files to stage
• A desire to utilize and leverage heterogeneous compute resources
• A need for accounting of resources
• A need to handle multiple queuing systems
• An interest in purchasing compute cycles for spikes in demand


Legion


Legion Grid Software

Wide-area access to data, processing, and application resources in a single, uniform operating environment that is secure and easy to administer.

Legion Grid capabilities:
• Wide-area data access
• Distributed processing
• Global naming
• Policy-based administration
• Resource accounting
• Fine-grained security
• Automatic failure detection and recovery

[Diagram: users and applications reaching desktops, servers, data, and clusters across departments, partners, and vendors through the Legion Grid, layered over local load management and queuing.]


Legion Combines Data and Compute Grid

[Diagram: users and applications accessing desktops, servers, data servers, and clusters across departments, partners, and vendors through a single Legion Grid, layered over local load management and queuing.]


The Legion Data Grid


Data Grid

Wide-area access to data at its source location, based on business policies, eliminating manual copying and the errors caused by accessing out-of-date copies.

Data Grid capabilities:
• Federates multiple data sources
• Provides global naming
• Works with local and virtual file systems: NFS, XFS, CIFS
• Accesses data in DAS, NAS, and SAN
• Uses standard interfaces
• Caches data locally (a sketch of global naming with caching follows)

[Diagram: users and applications accessing federated data sources across departments, partners, and vendors through the Legion Grid.]


Data Grid Share

Data is mapped into the Grid namespace via Legion ExportDir. The Legion Data Grid transparently handles client and application requests, maps them to the global namespace, and returns the data (a conceptual sketch follows).

[Diagram: Linux, NT, and Solaris shares at a tools vendor, research center, headquarters, and informatics partner, exported into one grid namespace for users and applications.]
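As a rough model of what exporting a directory amounts to (hypothetical names and data structures; not the ExportDir implementation), the sketch below publishes local directories under grid names and routes lookups to whichever host holds the data:

```python
# Grid namespace: global directory name -> (host, local path).
GRID_NAMESPACE: dict[str, tuple[str, str]] = {}

def export_dir(host: str, local_path: str, grid_path: str) -> None:
    """Publish a local directory under a location-independent grid name."""
    GRID_NAMESPACE[grid_path] = (host, local_path)

def lookup(grid_file: str) -> tuple[str, str]:
    """Route a grid file name to the host and local path that serve it."""
    for grid_path, (host, local) in GRID_NAMESPACE.items():
        if grid_file.startswith(grid_path + "/"):
            return host, local + grid_file[len(grid_path):]
    raise FileNotFoundError(grid_file)

export_dir("hq-1", "/data/genomes", "/grid/hq/genomes")
export_dir("rd-2", "/scratch/runs", "/grid/rd/runs")
print(lookup("/grid/hq/genomes/sequence_a"))   # ('hq-1', '/data/genomes/sequence_a')
```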


Data Grid Access

• Access files using the standard NFS protocol or Legion commands
  • NFS security issues eliminated
  • Caches exploit semantics
• Access files using a global name
• Access based on specified privileges

[Diagram: users and applications at a tools vendor, research center, headquarters, and informatics partner reaching files such as sequence_a, sequence_b, and sequence_c, and applications such as BLAST, through an access point with fine-grained security.]


Data Grid Access using virtual NFS (Legion-NFS)

Complexity = servers + clients
• Clients mount the grid
• Servers share files to the grid
• Clients access data using the NFS protocol
• Wide-area access to data outside the administrative domain, with fine-grained security


Keeping Data in the Grid

• Legion storage servers
  • Data is copied into Legion storage servers that execute on a set of hosts
  • The particular set of hosts used is a configuration option; here, five hosts are used
• Access to the different files is completely independent and asynchronous
• Very high sustained read/write bandwidth is possible using commodity resources (see the sketch below)

[Diagram: a directory tree with files a through h distributed across the local disks of five storage hosts.]
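A minimal sketch of why independent, per-file storage servers yield high aggregate bandwidth: each file lives on one of several hosts, so concurrent reads fan out across disks. The host names and placement rule are assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

HOSTS = ["host1", "host2", "host3", "host4", "host5"]   # the configured set of hosts

def place(filename: str) -> str:
    """Assign each file to a storage host (here, by hashing its name)."""
    return HOSTS[hash(filename) % len(HOSTS)]

def read_file(filename: str) -> bytes:
    # Stand-in for a remote read from the file's storage server.
    return f"<contents of {filename} from {place(filename)}>".encode()

files = ["a", "b", "c", "d", "e", "f", "g", "h"]
# Accesses are independent and asynchronous, so they proceed in parallel,
# each hitting a different disk rather than queuing behind one server.
with ThreadPoolExecutor(max_workers=len(files)) as pool:
    contents = list(pool.map(read_file, files))
```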


I/O Performance

[Chart: large-read aggregate bandwidth, 0–200 MB/s, versus number of readers (1, 10, 20, 30, 40, 50), for NFS, lnfsd, and LegionFS.]

Read performance in NFS, Legion-NFS, and the Legion I/O libraries. The x-axis indicates the number of clients that simultaneously perform 1 MB reads on 10 MB files; the y-axis indicates total read bandwidth. All results are the average of multiple runs. All clients ran on 400 MHz Intel machines; the NFS server ran on an 800 MHz Intel server.
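The kind of measurement behind this chart can be sketched as follows (our reconstruction of the setup, not the original harness): N concurrent clients each read a 10 MB file in 1 MB chunks, and aggregate bandwidth is total bytes over wall-clock time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20            # 1 MB reads, as in the experiment

def read_all(path: str) -> int:
    """Read one file in 1 MB chunks; return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total

def aggregate_bandwidth(paths: list[str]) -> float:
    """Total MB/s across all concurrent readers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        total_bytes = sum(pool.map(read_all, paths))
    return total_bytes / (1 << 20) / (time.perf_counter() - start)

# Hypothetical usage with 50 pre-created 10 MB test files on the grid mount:
# print(aggregate_bandwidth([f"/mnt/grid/test_{i}" for i in range(50)]))
```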


Data Grid Benefits

• Easy, convenient, wide-area access to data, regardless of location, administrative domain, or platform
• Eliminates time-consuming copying and obtaining accounts on machines where data resides
• Provides access to the most recent data available
• Eliminates confusion and errors caused by inconsistent naming of data
• Caches remote data for improved performance
• Requires no changes to legacy or commercial applications
• Protects data with fine-grained security and limits access privileges to those required
• Eases data administration and management
• Eases migration to new storage technologies


The Legion Compute Grid


Compute Grid

Wide-area access to processing resources based on business policies, managing utilization of processing resources for fast, efficient job completion.

Compute Grid capabilities:
• Job scheduling and priority-based queuing (sketched below)
• Easy integration with third-party load management and queuing software
• Automatic staging of data and applications
• Efficient processing of both sequential and parallel applications
• Failure detection and recovery
• Usage accounting

[Diagram: users and applications submitting work to desktops, servers, data servers, and clusters across departments, partners, and vendors through the Legion Grid.]
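Priority-based queuing can be sketched with a simple heap. This is a generic illustration of the technique, not Legion's scheduler:

```python
import heapq
import itertools

class JobQueue:
    """Dequeue jobs by priority; ties break in submission (FIFO) order."""
    def __init__(self):
        self._heap = []
        self._order = itertools.count()

    def submit(self, priority: int, job: str) -> None:
        # heapq is a min-heap, so negate priority: higher numbers run first.
        heapq.heappush(self._heap, (-priority, next(self._order), job))

    def next_job(self) -> str:
        return heapq.heappop(self._heap)[2]

q = JobQueue()
q.submit(1, "archive-sync")
q.submit(5, "urgent-blast-run")
q.submit(1, "nightly-report")
print(q.next_job())   # urgent-blast-run
```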


Compute Grid Access

The grid:
• Locates resources
• Authenticates and grants access privileges
• Stages applications and data
• Detects failures and recovers
• Writes output to the specified location
• Accounts for usage

A runnable sketch of these steps follows the diagram.

[Diagram: logins and submissions from a tools vendor, research center, headquarters, and informatics partner flowing through scheduling, queuing, usage management, accounting, and recovery to Solaris, NT, and Linux servers and clusters, under fine-grained security.]
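The per-job steps above, expressed as a linear sketch. Every function here is a hypothetical stand-in for work the grid performs transparently on submission:

```python
import random

def locate(app):              return random.choice(["hq-1", "rd-2", "pm-1"])
def authenticate(user, host): return f"token({user}@{host})"
def stage(app, data, host):   print(f"staging {app} and {data} -> {host}")
def execute(app, host, tok):  return {"output": f"{app} results", "cpu_hours": 2.5}
def account(user, host, use): print(f"charging {user} on {host}: {use} CPU-hours")

def run_grid_job(user, app, data, outputs):
    host = locate(app)                       # locate a suitable resource
    token = authenticate(user, host)         # authenticate, grant privileges
    stage(app, data, host)                   # stage application and data
    try:
        result = execute(app, host, token)   # run the job
    except OSError:
        return run_grid_job(user, app, data, outputs)   # detect failure, recover
    outputs[app] = result["output"]          # write output to specified location
    account(user, host, result["cpu_hours"]) # account for usage
    return result

results = {}
run_grid_job("alice", "BLAST", "sequence_a", results)
```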


Tools (all are cross-platform)

• MPI
• P-space studies (multi-run)
• Parallel C++
• Parallel object-based Fortran
• CORBA binding
• Object migration
• Accounting
• legion_make (remote builds)
• Fault-tolerant MPI libraries
• Post-mortem debugger
• "Console" objects
• Parallel 2D file objects
• Collections



Related Work

• Avaki
• All of the distributed systems literature
• Globus
• AFS/DFS
• LSF, PBS, …
• Global Grid Forum: OGSA


Avaki Company Background

• Grid pioneers: a Legion spin-off
• Over $20M capitalization
• The only commercial grid software provider with a solution that addresses data access, security, and compute power challenges
• A leader in standards efforts

[Diagram: logos of partners, standards organizations, and customers.]


AFS/DFS comparison with the Legion Data Grid

• AFS presumes that all files are kept in AFS; there is no federation with other file systems. Legion allows data to be kept in Legion or in an NFS, XFS, PFS, or Samba file system.
• AFS presumes that all sites use Kerberos and that realms "trust" each other. Legion assumes nothing about the local authentication mechanism, and there is no need for cross-realm trust.
• AFS semantics are fixed (copy on open). Legion can support multiple semantics; the default is Unix semantics.
• AFS is volume-oriented (sub-trees). Legion can be volume-oriented or file-oriented.
• AFS caching semantics are not extensible. Legion caching semantics are extensible.


Legion & Globus GT2

Projects with many common goals:
• Metacomputing (or the "Grid")
• Middleware for wide-area systems
• Heterogeneous resource sets
• Disjoint administrative domains
• High-performance, large-scale applications


Legion Specific Goals

• Shared collaborative environment including shared file system

• Fault-tolerance and high-availability

• Both HPC applications and distributed applications

• Complete security model including access control

• Extensible

• Integrated - create a meta-operating system


Many "Similar" Features

• Resource management support
• Message-passing libraries (e.g., MPI)
• Distributed I/O facilities (Globus GASS/remote I/O vs. the Avaki Data Grid)
• Security infrastructure


Globus

• The "toolkit" approach: provide services as separate libraries (e.g., Nexus, GASS, LDAP)
• Pros:
  • Decoupled architecture: easy to add new services into the mix
  • Low buy-in: use only what you like! (Although in practice all the pieces use each other.)
• Cons:
  • No unifying abstractions: a very complex environment to learn in full, and composition of services becomes difficult as the number of services grows
  • Interfaces keep changing due to an ever-evolving design
  • Does not cover the space of problems


Standards: GGF

Background:
• Grid standards are now being developed at the Global Grid Forum (GGF)
• The in-development Open Grid Services Infrastructure (OGSI) standard will extend Web Services (SOAP/XML, WSDL, etc.) with:
  • Names and a two-level naming scheme
  • Factories and lifetime management (sketched below)
  • A mandatory set of interfaces, e.g., discovery interfaces
• OGSA: the Open Grid Services Architecture
  • The over-arching architecture
  • Still in development
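A rough sketch of the factory-plus-lifetime-management idea (our illustration, not OGSI's actual interfaces): a factory creates transient service instances that expire unless clients renew their lifetime.

```python
import time
import uuid

class ServiceInstance:
    """A transient grid service with a client-managed lifetime."""
    def __init__(self, ttl: float):
        # Abstract, location-independent name; in OGSI's two-level scheme,
        # a handle like this would be resolved to a concrete reference.
        self.handle = f"handle:{uuid.uuid4()}"
        self._expires = time.time() + ttl

    def renew(self, ttl: float) -> None:
        self._expires = time.time() + ttl        # client keeps the instance alive

    def alive(self) -> bool:
        return time.time() < self._expires       # unrenewed instances expire

class Factory:
    """Creates service instances on request (the 'factory' pattern)."""
    def create(self, ttl: float = 60.0) -> ServiceInstance:
        return ServiceInstance(ttl)

svc = Factory().create(ttl=30.0)
print(svc.handle, svc.alive())   # a fresh instance is alive until its lease lapses
```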


Summary

• Grids are about resource federation and sharing
• Grids are here today. They are being used in production computing in industry to solve real problems and provide real value.
  • Compute Grids
  • Data Grids
• We believe that users want high-level abstractions; they don't want to think about the grid
  • This requires low activation energy and legacy support
• There are a number of challenges still to be solved, and different applications and organizations want to solve them differently
  • Policy heterogeneity
  • Strong separation of policy and mechanism
• There are several areas where really good policies are still lacking
  • Scheduling
  • Security and security-policy interactions
  • Failure recovery (and the interaction of different policies)