Middleware for Itanium Systems · EU Expert Group: “A grid provides an abstraction for resource...


IT-Symposium 2007 18.04.2007

www.hp-user-society.de 1

Roland Niemeier, IT-Symposium 2007, HP User Society, April 18th, 2007

www.science-computing.de

Middleware for Itanium Systems

Roland Niemeier

science + computing ag

science + computing ag © 2007

science + computing at a Glance

IT Services and Software for Technical Computing Environments

Revenue 2005/06: 22 million €

Staff: 218 (Jul 2006)

Locations (Germany): Tuebingen, Munich, Berlin, Duesseldorf, Ingolstadt

Founded in 1989 as a spin-off from the Institute of Theoretical Astrophysics in Tuebingen

[Figure: revenue (million €, scale 0 to 30) and number of employees (scale 0 to 300) by year, 1990 to 2005/2006]


Need for Middleware

Cluster & Departmental Compute Farms

Enterprise Compute Farms

Global Compute Farms

Desktop Aggregation

• Increased complexity
- Heterogeneous platforms
- Multiple administrative domains
- Multiple locations
- Varying security policies
- Flexible cooperation structures

It's all about Distributed Resource Management


Middleware for Computing

scVENUS: Efficient System Administration

Platform LSF (Platform Computing): Distributed Computing

Partner since 1995. First major customers: Audi ('95), MTU ('96), Airbus ('97)

openSoftware: Support for Open Software


optiSLang (Dynardo): Multidisciplinary Optimization and Robust Design

EnginFrame (NICE): Computing Portal

VMware (VMware EMC): Virtualization


Challenges for System Administrators

Operating Systems: Unix, Linux, Windows

Hardware Processors: Itanium II, X86, RISC


Horizontal System Structure

Applications

Middleware

Operating System

Hardware


What is the Grid?

• Academic:

– Ian Foster: “flexible, secure, coordinated resource-sharing among dynamic collections of individuals, institutions, and resources”.

– EU Expert Group: “A grid provides an abstraction for resource sharing and collaboration across multiple administrative domains”.

• Commercial:

– IBM: “Grid computing enables the virtualization of distributed computing and data resources such as processing, network bandwidth and storage capacity to create a single system image, granting users and applications seamless access to vast IT capabilities. Just as an Internet user views a unified instance of content via the Web, a grid user essentially sees a single, large virtual computer”.

– Deutsche Bank (New economy Report): “We expect B2B e-Commerce to be the main catalyst towards a new kind of computing, and a new kind of Internet... A new technology infrastructure, which we call the GRID, will emerge to supercede the Web, and form the bedrock of the new economy”.


A question at the Platform User Meeting in Stuttgart in 2004:

Does science + computing currently provide Grid Services?

Even though the tools exist, we do not really provide significant Grid Services today. However, when we do so 10 years from now, nobody will call it Grid Services.

Technological Changes in Middleware

[Figure: visibility over time for successive middleware technologies]

- Parallel and Distributed Computing (peak ~ 1990)

- Virtualization (~ 1998)

- Grid Computing (~ 2003)

- IT Controlling and Optimization (?)


Hype curve for Middleware

[Figure: hype curve, visibility versus maturity, with the phases Technology trigger, Peak of inflated expectations, Trough of disillusionment (Valley of death), Slope of enlightenment, and Plateau of productivity. Items placed on the curve: Distributed and Parallel Computing, Virtualization, Grid, Evolutionary Strategies for IT]


From Phenomenology to Understanding

Basic Elements of System Theory: Delay, Positive and Negative Feedback Loop

[Figure: a Technology Loop and an Implementation Loop connecting New or improved Middleware, Tests and Pilots, Requests, First Deployments, and Roll Out, with a Delay]


The Technology Adoption Life Cycle


The Chasm in the Cycle …

• Geoffrey A. Moore: Crossing the Chasm


Challenges for Computational Grids (from The Grid, edited by Ian Foster and Carl Kesselman, 1999)

• Distributed Supercomputing

• High-Throughput Computing

• On Demand Computing

• Data-Intensive Computing

• Collaborative Computing


Hype curve for Grid Computing

[Figure: hype curve, visibility versus maturity, with the phases Technology trigger, Peak of inflated expectations, Trough of disillusionment (Valley of death), Slope of enlightenment, and Plateau of productivity. Items placed on the curve: High Throughput Computing, ASP, DataGrid, Collaboration, Security, IT Controlling, IT Optimization]


High Throughput Computing in CAE

• In industrial applications, many calculations are run on x86 Linux clusters (some codes are still 32-bit)

• Most CAE applications are now available for Itanium II

• Computational Fluid Dynamics (CFD) codes were among the first parallel codes

• CFD makes better use of the scalability and performance of Itanium II

• In aerospace there are large Itanium II clusters

• LSF HPC is especially useful there for achieving controlled throughput


Commercial CFD Calculations

An example of a commercial CFD software package that runs on Itanium 2 (since 2003) is PowerFlow from Exa

The postprocessor of PowerFlow is developed by science + computing

Partial List of PowerFlow Customers: Audi AG, BMW AG, Daihatsu, Daimler-Chrysler, …

Example: In 2005, Audi AG chose a cluster of Intel Itanium 2 based two-way HP Integrity rx2006 servers


Time Line of Past Events at Audi

Middleware and Services from s+c

1995 Platform LSF

1999 IT-Services for Audi

2002 scVENUS

2003 EnginFrame

2003 openSoftware

2006 Site Licenses for Platform LSF and scVENUS


Measuring the Utilization Rate with Platform LSF

LSF HPC Cluster Efficiency 01/2004 – 05/2006 (Itanium II Cluster with HP-UX)

[Figure: jobs running and jobs pending over time]


Measuring the Utilization with LSF by month

LSF HPC Cluster Efficiency March 2007

[Figure: number of nodes, jobs running, and jobs pending]
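As a rough illustration of what such utilization charts summarize, the following sketch computes a utilization rate from periodic samples of running jobs, pending jobs, and available job slots. The `Sample` structure and the two function names are assumptions made for this example; they are not part of Platform LSF, whose own accounting and reporting tools would normally supply these figures.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring sample (hypothetical structure, not an LSF record)."""
    running: int      # job slots occupied by running jobs
    pending: int      # jobs waiting in the queues
    total_slots: int  # job slots available in the cluster


def utilization_rate(samples):
    """Return busy slots divided by available slots, summed over all samples."""
    total = sum(s.total_slots for s in samples)
    if total == 0:
        return 0.0
    return sum(s.running for s in samples) / total


def saturated_fraction(samples):
    """Fraction of samples where jobs were pending while every slot was busy."""
    if not samples:
        return 0.0
    full = sum(1 for s in samples if s.pending > 0 and s.running >= s.total_slots)
    return full / len(samples)


samples = [Sample(80, 0, 100), Sample(100, 12, 100), Sample(60, 0, 100)]
print(f"utilization: {utilization_rate(samples):.0%}")
print(f"saturated:   {saturated_fraction(samples):.0%}")
```

Looking at pending jobs alongside running jobs helps distinguish a cluster that is idle because there is no demand from one that is full while work is still waiting.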


Some Benefits of a Batch System

• Selection of the best compute server

• Avoids overloading machines

• Users don't have to hunt for resources and job results

• More organized system administration

• Recovery from system crashes

• Analysis of the usage of resources


Some Challenges for a Batch System

• Parametric Studies

• Job Dependencies

• Interactive Usage

• Processor Sets

• Control of Massively Parallel Jobs

• Prioritization of Parallel Jobs

• Configuration Changes in Production Systems


Batch system functionalities

• Resources (Definition & Measurement, Requirements, Reservation, Selection)

• Queues (administration, limits, interactive)

• Scheduling (FCFS, Fairshare, Preemption, Exclusive)

• Job Control (Suspend and Resume, Dependencies, Arrays, Signal Handling, Checkpointing)
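To make the difference between two of the scheduling policies named above more concrete, here is a minimal sketch that orders the same set of jobs first by FCFS and then by a simplified fairshare rule. The `Job` class, the share and usage dictionaries, and the rule "pick the user with the lowest usage per share" are assumptions for illustration only; a real batch system such as Platform LSF computes dynamic priorities from more factors than this.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    user: str
    submit_time: int  # arbitrary time units


def fcfs_order(jobs):
    """First come, first served: dispatch strictly by submission time."""
    return sorted(jobs, key=lambda j: (j.submit_time, j.job_id))


def fairshare_order(jobs, shares, used):
    """Pick jobs one at a time from the user with the lowest usage per share.

    shares: user -> configured share (assumed > 0 for every submitting user)
    used:   user -> slots already consumed (hypothetical accounting input)
    """
    usage = defaultdict(float, used)
    waiting = defaultdict(list)
    for job in fcfs_order(jobs):
        waiting[job.user].append(job)

    order = []
    while any(waiting.values()):
        # Among users that still have jobs, choose the one furthest below its share.
        user = min((u for u, q in waiting.items() if q),
                   key=lambda u: (usage[u] / shares[u], u))
        order.append(waiting[user].pop(0))
        usage[user] += 1  # each dispatched job counts as one slot in this sketch
    return order


jobs = [Job(1, "alice", 0), Job(2, "alice", 1), Job(3, "alice", 2), Job(4, "bob", 3)]
print([j.job_id for j in fcfs_order(jobs)])
print([j.job_id for j in fairshare_order(jobs, {"alice": 1, "bob": 1},
                                         {"alice": 0, "bob": 0})])
```

Under FCFS a user who submits many jobs early occupies the head of the queue; under the fairshare rule a later job from another user can move ahead once the first user has consumed part of the share.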


What is the LSF Family of Products?

Products/Solutions


History of Distributed Resource Management

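[Figure: diagram of distributed resource management systems, arranged between Host Centric and Network Centric. Legend: Homogeneous Support, Commercial Product, Public Domain, Discontinued. Systems shown: NQS, DNQS, LoadLeveler, Codine, GridEngine, DQS, TaskBroker, LoadBalancer, GNQS, NQE, ConnectQ, Condor, Utopia, LSF, DREAM, PBS]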


LSF Batch Architecture

[Figure: submission hosts, master host, and server hosts. Elements shown: jobs, batch queues, Scheduler, Master LIM, resource information]


Concepts: Resources

• Resources

Resources are anything that can be quantified and assigned a value so that it can be shared by users.

• Defining Resources

These characteristics ensure that Platform LSF chooses the host that best matches the requirements of an application.

• Types of Resources

Classification by:

– availability (general/special resources)

– the way values are assigned (dynamic/static resources)

– types of values (numerical/string/boolean resources)

– definition (built-in/site-specific resources)

– locality (shared/non-shared resources)


Job Resource Requirements for dynamic distributed scheduling

Examples:

itanium2 && starcd order[cpu]

• Lightly loaded/fast CPU of Itanium II host type that is configured for starcd runs

idle>10 && mem>=1000 && swap>1000 && !risc

• Non-RISC host with at least 1 GB of available memory and 1 GB of available swap that has been idle for 10 minutes or more

order [mem : login]

• Machine with lots of memory and fewest users
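The three requirement strings above can be read as "filter the hosts, then sort what is left". The sketch below mimics that reading in plain Python. The `Host` fields, the host names, and the `choose_hosts` helper are hypothetical and only mirror the names used in the examples; this is not how Platform LSF parses or evaluates resource requirement strings.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    tags: set = field(default_factory=set)  # boolean resources, e.g. {"itanium2", "starcd"}
    idle: int = 0      # minutes the host has been idle
    mem: int = 0       # available memory in MB
    swap: int = 0      # available swap in MB
    cpu: float = 0.0   # CPU load (lower is better)
    login: int = 0     # number of logged-in users


def choose_hosts(hosts, select, order):
    """Keep hosts that satisfy `select`, then sort them by `order` (best first)."""
    return sorted((h for h in hosts if select(h)), key=order)


hosts = [
    Host("ia1", {"itanium2", "starcd"}, idle=30, mem=4000, swap=8000, cpu=0.9, login=2),
    Host("ia2", {"itanium2", "starcd"}, idle=5, mem=2000, swap=4000, cpu=0.1, login=0),
    Host("pa1", {"risc"}, idle=60, mem=8000, swap=8000, cpu=0.0, login=0),
    Host("x1", {"x86"}, idle=15, mem=1500, swap=2000, cpu=0.3, login=1),
]

# itanium2 && starcd order[cpu]
a = choose_hosts(hosts, lambda h: {"itanium2", "starcd"} <= h.tags, lambda h: h.cpu)

# idle>10 && mem>=1000 && swap>1000 && !risc  (sorted by name only to get a stable order)
b = choose_hosts(
    hosts,
    lambda h: h.idle > 10 and h.mem >= 1000 and h.swap > 1000 and "risc" not in h.tags,
    lambda h: h.name,
)

# order [mem : login]: most memory first, then fewest logged-in users
c = choose_hosts(hosts, lambda h: True, lambda h: (-h.mem, h.login))

print([h.name for h in a], [h.name for h in b], [h.name for h in c])
```

Separating the selection predicate from the ordering key keeps the two halves of a requirement independent, which matches how the examples combine a boolean expression with an `order[...]` clause.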


Platform LSF HPC

• For parallel jobs

– scheduling across hosts

– distributed parallel task startup, monitoring & control

– job-level resource consumption, monitoring & limits

– configurable application level job checkpointing & migration

• Support for a variety of parallel packages: MPI, PVM, HPF, …

• Support for CPU sets

• All functionalities of Platform LSF


Platform LSF HPC for CPU Sets

Example: With psets (on HP-UX 11i for Superdome multiprocessor systems), LSF HPC can create efficient execution environments for parallel jobs through:

• Processor isolation

• Processor distance

• Pset creation and deallocation

• LSF HPC topology adapter for psets (RLA)


Platform LSF HPC Architecture

The Parallel Application Manager (PAM) in combination with the Task Starter (TS) keeps control of the distributed parallel threads of the application


Enterprise Grid-level: LSF Multicluster


Job Forwarding Model

[Figure: Site A and Site B, each with its own compute servers; jobs pass from a send queue to a receive queue]

You submit, LSF takes care of:

• Job transfer

• Data staging

• Account mapping

• Accounting


Resource Leasing Model

Single system image, ease of administration, scalability

Enables fairshare, preemption, pending reason support, chunk jobs, advance reservation, interactive jobs, parallel jobs, … across clusters


Easy to Configure

[Figure: two diagrams of Site A and Site B with their compute servers, one for resource leasing and one for job forwarding with a send queue and a receive queue]

Resource leasing:

Begin Queue
QUEUE = lease
HOSTS = all@siteB
End Queue

Begin HostExport
PER_HOST = hopper curie
DISTRIBUTION = [siteA, 10]
MEM = 2 GB
End HostExport

Job forwarding:

Begin Queue
QUEUE = export
SNDJOBS_TO = import@siteB
End Queue

Begin Queue
QUEUE = import
RCVJOBS_FROM = siteA
End Queue
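The configuration fragments on this slide all follow the same "Begin <Section> ... End <Section>" pattern with `KEY = value` lines in between. As a small illustration of that structure, the sketch below reads such blocks into Python dictionaries. The function `parse_sections` is a hypothetical helper written for this example; it is not the parser Platform LSF uses, and it makes no claim about which parameters a real installation accepts.

```python
def parse_sections(text):
    """Parse 'Begin <Name> ... End <Name>' blocks into (name, {key: value}) pairs."""
    sections = []
    current_name, current = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("Begin "):
            if current_name is not None:
                raise ValueError(f"nested section inside {current_name}")
            current_name, current = line[len("Begin "):].strip(), {}
        elif line.startswith("End "):
            name = line[len("End "):].strip()
            if name != current_name:
                raise ValueError(f"unexpected End {name}")
            sections.append((current_name, current))
            current_name, current = None, None
        elif current_name is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
        else:
            raise ValueError(f"cannot parse line: {line!r}")
    if current_name is not None:
        raise ValueError(f"missing End {current_name}")
    return sections


config = """
Begin Queue
QUEUE = export
SNDJOBS_TO = import@siteB
End Queue

Begin Queue
QUEUE = import
RCVJOBS_FROM = siteA
End Queue
"""

for name, params in parse_sections(config):
    print(name, params)
```

Read this way, the job forwarding setup amounts to one queue on the sending site that names its target and one queue on the receiving site that names the cluster it accepts jobs from.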


A customer that uses Multicluster

... seen from a distance

• 400 servers

• 1,000 CPUs

• 1,000 accounts

• 20,000 jobs / day

Largest local compute farm

About 20 compute farms worldwide


Where is the limit and when is it reached?


Summary

CFD calculations are a major domain for the Itanium II

Throughput is more important than peak performance

Nearly all CAE applications run on Itanium II

Monitoring and analytic understanding are required for large clusters

Middleware does not merely provide the lowest common denominator; it enables new technology to be integrated


Thank you for your attention!