SP3.1: High-Performance Distributed Computing

7 April 2006, Vrije Universiteit
SP3.1: High-Performance Distributed Computing
The KOALA grid scheduler and the Ibis Java-centric grid middleware
Dick Epema, Catalin Dumitrescu, Alex Iosup, Hashim Mohamed, Ozan Sonmez
Henri Bal, Thilo Kielmann, Jason Maassen, Rob van Nieuwpoort, et al.

Transcript of SP3.1: High-Performance Distributed Computing

Page 1: SP3.1: High-Performance           Distributed Computing

7 April 2006, Vrije Universiteit

SP3.1: High-Performance Distributed Computing

The KOALA grid scheduler and the Ibis Java-centric grid middleware

Dick Epema, Catalin Dumitrescu, Alex Iosup, Hashim Mohamed, Ozan Sonmez

Henri Bal, Thilo Kielmann, Jason Maassen, Rob van Nieuwpoort, et al.

Page 2: SP3.1: High-Performance           Distributed Computing

TUDelft: KOALA

• KOALA is a multicluster/grid scheduler
• Main goals of KOALA:
  - Load sharing of jobs across the sites in a grid: automatic resource selection
  - Co-allocation of jobs across the sites in a grid:
    - in order to use more resources
    - as dictated by the structure of applications (e.g., simulation/visualization)
• KOALA was released on the DAS in September 2005

Page 3: SP3.1: High-Performance           Distributed Computing

KOALA: Scheduling

[Figure: a global queue feeds KOALA, which places global jobs on the clusters through load sharing or co-allocation; each cluster has a local queue with a local scheduler (LS) that also receives local jobs.]

Page 4: SP3.1: High-Performance           Distributed Computing

VU: Ibis

• Ibis: Java-centric grid middleware for distributed supercomputing
• Satin: divide-and-conquer parallelism in grids
• GAT: Grid Application Toolkit
• Implemented several Java applications from
  - SP 1.3 (Medical/VUmc)
  - SP 1.6 (Telescience/AMOLF)
  - SP 2.1 (iPSE/UvA)
  - SP 2.2 (AID/UvA)
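
To make the divide-and-conquer idea behind Satin concrete, here is a minimal sketch in Python rather than Java, counting N-queens solutions. It is not Satin code: the function names (`safe`, `count`, `count_parallel`) are made up for illustration, tasks are spawned only at the top level of the recursion, and Python threads only show the spawn/sync structure without giving a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def safe(placed, col):
    """True if a queen in the next row at column `col` attacks no earlier queen."""
    row = len(placed)
    return all(c != col and abs(c - col) != row - r
               for r, c in enumerate(placed))


def count(n, placed=()):
    """Sequential divide-and-conquer: count the completions of a partial board."""
    if len(placed) == n:
        return 1
    return sum(count(n, placed + (col,))
               for col in range(n) if safe(placed, col))


def count_parallel(n, workers=4):
    """Divide on the first-row queen, solve the subproblems as tasks, combine.

    Intended for n >= 1.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # "spawn": one independent subproblem per first-row column
        futures = [pool.submit(count, n, (col,)) for col in range(n)]
        # "sync": wait for all subproblems and add up their results
        return sum(f.result() for f in futures)
```

For example, `count_parallel(8)` returns 92, the known number of solutions for the 8-queens problem.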

Page 5: SP3.1: High-Performance           Distributed Computing

Ibis: Grid’5000 experiments

• Grid’5000: French computer science grid with 2000 nodes at 9 sites
• Used Grid’5000 for
  - Running Satin applications
  - N-Queens challenge (2nd Grid Plugtest): Ibis/Satin/GAT application running on 960 nodes at 6 sites, ~85% efficiency
  - Large-scale peer-to-peer experiments using Zorilla (Gnutella-like latency-based flooding of ads for joining a computation)
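
As a rough reading of the ~85% figure, assume it is the usual parallel efficiency, i.e. speedup divided by the number of processors, and that each of the 960 nodes counts as one processor. The slide states neither assumption, so the result is only indicative:

```latex
E = \frac{S}{p}
\quad\Longrightarrow\quad
S = E \cdot p \approx 0.85 \times 960 = 816
```

Under these assumptions the run would have been roughly 816 times faster than a single node.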

Page 6: SP3.1: High-Performance           Distributed Computing

KOALA feature 1: the Runners

• There are many ugly application types out there
• No way they can all be supported by a single scheduler
• Solution: runners (= interface modules)
• Currently supported:
  - Any type of single-component job
  - MPI/DUROC jobs
  - Ibis jobs
  - HOC applications
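
The runner idea can be sketched as one small interface with an implementation per application type, so the scheduler itself never needs to know how a given kind of job is started. Everything here is hypothetical (the `Runner` class, `launch_plan`, `submit`, and the text they produce); it is not KOALA's actual runner API.

```python
from abc import ABC, abstractmethod


class Runner(ABC):
    """Hypothetical interface module between the scheduler and one job type."""

    job_type = ""

    @abstractmethod
    def launch_plan(self, executable, placement):
        """Return one launch description per site in `placement` (site -> processors)."""


class SingleComponentRunner(Runner):
    job_type = "single"

    def launch_plan(self, executable, placement):
        if len(placement) != 1:
            raise ValueError("a single-component job runs at exactly one site")
        site, procs = next(iter(placement.items()))
        return [f"{site}: run {executable} on {procs} processors"]


class MpiRunner(Runner):
    job_type = "mpi"

    def launch_plan(self, executable, placement):
        # One component per site; all components are started together.
        return [f"{site}: start {procs} MPI processes of {executable}"
                for site, procs in sorted(placement.items())]


# The scheduler only looks up a runner by job type.
RUNNERS = {r.job_type: r for r in (SingleComponentRunner(), MpiRunner())}


def submit(job_type, executable, placement):
    """Dispatch a placed job to the runner registered for its type."""
    if job_type not in RUNNERS:
        raise ValueError(f"no runner for job type {job_type!r}")
    return RUNNERS[job_type].launch_plan(executable, placement)
```

Supporting a new application type then means adding one more `Runner` subclass to the registry, without touching the placement logic.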

Page 7: SP3.1: High-Performance           Distributed Computing

KOALA feature 2: the policies

• Originally supported co-allocation policies:
  - Worst-Fit: balance job components across sites
  - Close-to-Files: take into account the locations of input files to minimize transfer times
• Different application types require different ways of component placement
• So:
  - Modular structure with pluggable policies
  - Take into account internal communication structure of applications
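
The two placement policies can be illustrated with a short sketch. It assumes that Worst-Fit places components largest first, each on the site with the most idle processors, and that Close-to-Files picks the site with enough idle processors and the smallest estimated input transfer time. The function names, the data layout, and the transfer-time estimates are assumptions for illustration, not KOALA's implementation.

```python
def worst_fit(components, idle):
    """Place each component (a processor count) on the currently emptiest site.

    `idle` maps site name -> idle processors. Returns {component index: site},
    or None if some component does not fit anywhere.
    """
    remaining = dict(idle)  # do not modify the caller's view of the grid
    placement = {}
    # Largest components first, so big ones are not squeezed out by small ones.
    for idx in sorted(range(len(components)), key=lambda i: -components[i]):
        site = max(remaining, key=lambda s: remaining[s])
        if remaining[site] < components[idx]:
            return None
        remaining[site] -= components[idx]
        placement[idx] = site
    return placement


def close_to_files(size, idle, transfer_time):
    """Pick the site that fits `size` processors with the smallest transfer time.

    `transfer_time` maps site name -> estimated seconds to move the input
    files there (hypothetical estimates). Returns None if no site fits.
    """
    candidates = [s for s, free in idle.items() if free >= size]
    if not candidates:
        return None
    return min(candidates, key=lambda s: transfer_time[s])
```

With `idle = {"A": 32, "B": 20, "C": 10}`, `worst_fit([16, 8, 8], idle)` returns `{0: "A", 1: "B", 2: "A"}`: after each placement the next component goes to whichever site then has the most idle processors, which spreads the load. Because both functions take the same kind of input, either one can be plugged in as the placement policy.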

Page 8: SP3.1: High-Performance           Distributed Computing

KOALA feature 3: support for HOCs

• Higher-Order Components (HOCs):
  - Pre-packaged software components with generic patterns of parallel behavior
  - Patterns: master-worker, pipelines, wavefront
• Benefits:
  - Facilitates parallel programming in grids
  - Enables user-transparent scheduling in grids
• Most important additional middleware:
  - Translation layer that builds a performance model from the HOC patterns and the user-supplied application parameters
• Supported by KOALA (with Univ. of Münster)
• Initial results: up to 50% reduction in runtimes
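
A higher-order component can be pictured as a generic pattern that takes the application-specific code as a parameter. The sketch below shows a master-worker pattern together with a deliberately simple runtime estimate of the kind a translation layer might derive from the pattern and user-supplied parameters. Both functions and the model (equal-length tasks executed in rounds) are assumptions for illustration, not the actual HOC middleware.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def master_worker(work, tasks, workers):
    """Generic master-worker pattern: the user supplies only `work` and `tasks`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The master hands tasks to idle workers; results keep the task order.
        return list(pool.map(work, tasks))


def estimated_runtime(num_tasks, task_seconds, workers):
    """Toy performance model: equal-length tasks run in rounds of `workers` tasks."""
    return math.ceil(num_tasks / workers) * task_seconds
```

For instance, `estimated_runtime(10, 3.0, 4)` gives 9.0 seconds (three rounds of at most four tasks), an estimate a scheduler could use to compare candidate placements without the user being involved.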

Page 9: SP3.1: High-Performance           Distributed Computing

TUDelft: GrenchMark

• GrenchMark is a flexible grid workload generator, submitter, and results analyzer
• Main goals of GrenchMark:
  - Generic workload definition for many types of workloads and application characteristics
  - Grid workload generation
  - Submitting and replaying workloads in different grid settings
• GrenchMark released in November 2005
• GrenchMark used to test KOALA
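
To give an idea of what generating and replaying a workload involves, here is a minimal sketch of a synthetic workload generator. The job format, the exponential inter-arrival times, and the `generate_workload` function are assumptions for illustration; they are not GrenchMark's workload description format or API.

```python
import random


def generate_workload(num_jobs, mean_interarrival, sizes, seed=0):
    """Generate jobs with exponential inter-arrival times and random sizes.

    A fixed seed makes the workload reproducible, so the same job stream
    can be replayed in different grid settings.
    """
    rng = random.Random(seed)
    submit_time = 0.0
    jobs = []
    for job_id in range(num_jobs):
        submit_time += rng.expovariate(1.0 / mean_interarrival)
        jobs.append({
            "id": job_id,
            "submit_time": submit_time,
            "processors": rng.choice(sizes),
        })
    return jobs
```

Calling `generate_workload(100, 30.0, [1, 2, 4, 8, 16])` twice with the same seed yields identical job lists, which is what makes a replayed experiment comparable.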

Page 10: SP3.1: High-Performance           Distributed Computing

KOALA future (1)

• Support for more application types, e.g.,
  - Workflows
  - Parameter sweep applications
• Communication-aware and application-aware scheduling policies:
  - Take into account the communication pattern of applications when co-allocating
  - Also schedule bandwidth (in DAS3)
• Better interface between KOALA and the local schedulers
  - KOALA is too nice

Page 11: SP3.1: High-Performance           Distributed Computing

KOALA future (2)

• Peer-to-peer structure instead of hierarchical grid scheduler
• Support heterogeneity
  - DAS3
  - DAS2 + DAS3
  - PoC
  - DAS3 + Grid’5000

Page 12: SP3.1: High-Performance           Distributed Computing

KOALA and Ibis future

[Figure: DAS-3 diagram showing five groups of CPUs, each labeled with an R, and a NOC.]

Page 13: SP3.1: High-Performance           Distributed Computing

Conclusions

• SP3.1 is well on track
• SP3.1 has delivered reliable software tools for everybody to use:
  - KOALA/GrenchMark
  - Ibis/Satin
• SP3.1 has a bright future
  - Still many research challenges
  - (Access to) great new heterogeneous testbeds

Page 14: SP3.1: High-Performance           Distributed Computing

More information

• Web sites:
  - www.st.ewi.tudelft.nl/koala: general description, KOALA tutorial, papers
  - grenchmark.st.ewi.tudelft.nl: general description, download, papers
  - www.cs.vu.nl/ibis: Ibis distribution, documentation, papers