Transcript of “Cluster and Grid Computing”, Pittsburgh Supercomputing Center. John Kochmar, J. Ray Scott (Derek Simmel) (Jason Sommerfield)


Page 1:


Page 2:

Cluster and Grid Computing

Pittsburgh Supercomputing Center

John Kochmar, J. Ray Scott, (Derek Simmel), (Jason Sommerfield)

Page 3:

Pittsburgh Supercomputing Center: Who We Are

• Cooperative effort of
  – Carnegie Mellon University
  – University of Pittsburgh
  – Westinghouse Electric
• Research Department of Carnegie Mellon
• Offices in Mellon Institute, Oakland
  – On CMU campus
  – Adjacent to University of Pittsburgh campus

Page 4:

Westinghouse Electric Company

Energy Center, Monroeville, PA

Page 5:

Agenda

• HPC Clusters

• Large Scale Clusters

• Commodity Clusters

• Cluster Software

• Grid Computing

Page 6:

TOP500 Benchmark Completed October 1, 2001

[Timeline of milestones: May 1999, August 1999, December 1999, February 2000, April 2000, May 2000, August 2000, October 2000, March 2001, August - October 2001]

Page 7:

Three Systems in the Top 500

HP AlphaServer SC ES40 “TCSINI”: Ranked 246 with 263.6 GFlops Linpack Performance

Cray T3E900 “Jaromir”: Ranked 182 with 341 GFlops Linpack Performance

HP AlphaServer SC ES45 “LeMieux”: Ranked 6 with 4.463 TFlops Linpack Performance (Top Academic System)

Page 8:

Cluster Node Count

Rank  Installation Site  Nodes

1 Earth Simulator Center 640

2 Los Alamos National Laboratory 1024

3 Los Alamos National Laboratory 1024

4 Lawrence Livermore National Laboratory 512

5 Lawrence Livermore National Laboratory 128

6 Pittsburgh Supercomputing Center 750

7 Commissariat a l'Energie Atomique 680

8 Forecast Systems Laboratory - NOAA 768

9 HPCx 40

10 National Center for Atmospheric Research 40

Page 9:

One Year of Production

lemieux.psc.edu

Page 10:

It’s Really All About Applications

• Single CPU with common data stream
  – seti@home

• Large Shared Memory Jobs

• Multi-CPU Jobs

• …but, let’s talk systems!

Page 11:

HPC Systems Architectures

Page 12:

HPC Systems

• Larger SMPs

• MPP - Massively Parallel Machines

• Non Uniform Memory Access (NUMA) machines

• Clusters of smaller machines

Page 13:

Larger SMPs

• Pros:
  – Use existing technology and management techniques
  – Maintain parallelization paradigm (threading; see the sketch after this list)
  – It’s what users really want!
• Cons:
  – Cache coherency gets difficult
  – Increased resource contention
  – Pin counts add up
  – Increased incremental cost
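The “threading” paradigm above means one process whose threads all share a single memory image, which is exactly what a large SMP provides. A minimal sketch in C using POSIX threads (not from the slides; the thread count and array size are arbitrary illustration values):

/* threads_sum.c - shared-memory threading sketch: each thread sums a slice
 * of one array that every thread can see.  Build with: cc threads_sum.c -pthread */
#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4

static double data[N];
static double partial[NTHREADS];

static void *worker(void *arg)
{
    long id = (long)arg;
    long lo = id * (N / NTHREADS), hi = lo + (N / NTHREADS);
    double s = 0.0;
    for (long i = lo; i < hi; i++)
        s += data[i];
    partial[id] = s;                    /* each thread writes its own slot */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (long i = 0; i < N; i++)
        data[i] = 1.0;

    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);

    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += partial[t];
    }
    printf("sum = %.0f\n", total);      /* expect 1000000 */
    return 0;
}

On a cluster, by contrast, no shared address space spans the nodes, which is why the programming model has to change (see the MPI sketch later in the deck).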

Page 14:

HPC Clusters

• Rationale
  – If one box can’t do it, maybe 10 can…
  – Commodity hardware is advancing rapidly
  – Potentially far less costly than a single larger system
  – Big systems are only so big

Page 15:

HPC Clusters

• Central Issues
  – Management of multiple systems
  – Performance
    • Within each node
    • Interconnections
  – Effects on parallel programming methodology
    • Varying communication characteristics

Page 16:

The Next Contender?

• CPU: 128-bit CPU

• System Clock Frequency: 294.912 MHz

• Main Memory: 32 MB Direct RDRAM

• Embedded Cache VRAM: 4 MB

• I/O Processor

• CD-ROM and DVD-ROM

Page 17:

Why not let everyone play?

Page 18:

What’s a Cluster? Base Hardware

• Commodity Nodes
  – Single, Dual, Quad, ???
  – Intel, AMD
  – Switch port cost vs. CPU
• Interconnect
  – Bandwidth
  – Latency
• Storage
  – Node local
  – Shared filesystem

Page 19:

Terascale Computing System

Hardware Summary
• 750 ES45 Compute Nodes
• 3000 EV68 CPUs @ 1 GHz
• 6 TFlop
• 3 TB memory
• 41 TB node disk, ~90 GB/s
• Multi-rail fat-tree network
• Redundant Interactive nodes
• Redundant monitor/ctrl
• WAN/LAN accessible
• File servers: 30 TB, ~32 GB/s
• Mass Store buffer disk, ~150 TB
• Parallel visualization
• ETF coupled

[Diagram: Compute Nodes and Interactive nodes on the Quadrics interconnect, with a Control LAN, File Servers (/tmp, /usr), and WAN/LAN access]

Page 20:

• Compute Nodes: AlphaServer ES45
  – 5 nodes per cabinet
  – 3 local disks per node

Page 21:

Row upon row…

Page 22:

PSC/HP Grid Alliance

• A strategic alliance to demonstrate the potential of the National Science Foundation's Extensible TeraGrid
• 16-node HP Itanium2/Linux cluster
• Through this collaboration, PSC and HP expect to further the TeraGrid goals of enabling scalable, open source, commodity computing on IA64/Linux to address real-world problems

Page 23:

What’s a Cluster? Base Hardware

• Commodity Nodes
  – Single, Dual, Quad, ???
  – Switch port cost vs. CPU
• Interconnect
  – Bandwidth
  – Latency
• Storage
  – Node local
  – Shared filesystem

Page 24:
Page 25:

Cluster Interconnect: Low End

• 10/100 Mbit Ethernet
  – Very cheap
  – Slow, with high latency
• Gigabit Ethernet
  – Sweet spot
  – Especially with:
    • Channel Bonding
    • Jumbo Frames (see the MTU sketch below)
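As a small aside (not from the slides): jumbo frames simply mean raising the interface MTU, typically to about 9000 bytes, on both the nodes and the switch. A hedged sketch for checking this on a Linux node using the standard SIOCGIFMTU ioctl; the interface name "eth0" is a placeholder:

/* mtu_check.c - query an interface's MTU on Linux to see whether jumbo
 * frames are enabled.  "eth0" is a placeholder interface name. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

int main(void)
{
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);     /* any socket works for this ioctl */
    if (fd < 0) { perror("socket"); return 1; }

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);

    if (ioctl(fd, SIOCGIFMTU, &ifr) < 0) { perror("SIOCGIFMTU"); close(fd); return 1; }

    printf("%s MTU = %d (%s)\n", ifr.ifr_name, ifr.ifr_mtu,
           ifr.ifr_mtu >= 9000 ? "jumbo frames" : "standard frames");
    close(fd);
    return 0;
}

Channel bonding (aggregating multiple NICs into one logical link) is configured in the operating system's network setup rather than in application code, so no sketch is given for it here.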

Page 26:

Cluster Interconnect, cont.: Mid-Range

• Myrinet
  – http://www.myrinet.com/
  – High speed with good (not great) latency
  – High port count switches
  – Well adopted and supported in the Cluster Community
• InfiniBand
  – Emerging
  – Should be inexpensive and pervasive

Page 27:

Cluster Interconnect, cont.: Outta Sight!

• Quadrics Elan
  – http://www.quadrics.com/
  – Very High Performance
    • Great Speed
    • Spectacular Latency
  – Software
    • RMS
    • QSNET
  – Becoming more “Commodity”

Page 28:
Page 29:

512-1024-way switch (4096- and 8192-way are the same, just bigger)

[Diagram of the federated switch: 8 top-level switches, each 8*(16-way), above 8-16 lower 64U64D switches (13 for TCS)]

Page 30:

Overhead Cables

Page 31:

Wiring: Quadrics

Fully wired switch cabinet, 1 of 24. Wires up & down.

Page 32:

What’s a Cluster? Base Hardware

• Commodity Nodes
  – Single, Dual, Quad, ???
  – Switch port cost vs. CPU
• Interconnect
  – Bandwidth
  – Latency
• Storage
  – Node local
  – Shared filesystem

Page 33:
Page 34:
Page 35:

Commodity Cache Servers

• Linux
• Custom Software
  – libtcom/tcsiod
  – Coherency Manager (SLASH)
• Special Purpose DASP
  – Connection to Outside
  – Multi-Protocol
    • *ftp
    • SRB
    • Globus
• 3Ware SCSI/ATA Disk Controllers

Page 36:

What’s a Cluster? System Software

• Installation

• Replication

• Consistency

• Parallel File System

• Resource Management

• Job Control

Page 37:

• Installation

• Replication

• Consistency

Page 38:

Job Management Software

[Diagram: Users submit jobs to queues; Batch Job Management (the “Simon” scheduler, TCS scheduling practices, PBS/RMS) handles job invocation, usage accounting (database), and monitoring; processes are distributed, executed, and controlled on the Compute Nodes with checkpoint/restart (CPR), tcscomm, and requeue; tcscopy / hsmtcscopy move data between user file servers and HSM; node event management feeds call tracking, the field service db, and user notification; Visualization nodes; demand and supply (“What’s next?”); PSC and NSF]

PSC Terascale Computing System

Page 39:
Page 40:
Page 41:

Monitoring Non-Contiguous Scheduling

Page 42:
Page 43:
Page 44:

What’s a Cluster? Application Support

• Parallel Execution
• MPI
  – http://www.mpi-forum.org/
  – (see the minimal MPI sketch below)
• Shared Memory
• Other…
  – Portals
  – Global Arrays
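Since MPI is the workhorse for parallel execution on clusters, here is a minimal sketch of an MPI program in C (a generic example, not from the slides; the compiler wrapper and launcher names in the comment vary by MPI implementation):

/* hello_mpi.c - minimal MPI example.  Typical build/run (names vary by MPI):
 *   mpicc hello_mpi.c -o hello_mpi
 *   mpirun -np 4 ./hello_mpi
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                  /* start the MPI runtime          */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's id (0..size-1)  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes      */
    MPI_Get_processor_name(host, &len);      /* which node this rank runs on   */

    printf("rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();                          /* shut down cleanly              */
    return 0;
}

Each rank is an ordinary process, usually one per CPU, and ranks on different nodes communicate only through MPI messages over the interconnect; that is the key difference from the shared-memory threading sketch earlier.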

Page 45:

Building Your Cluster

• Pre-Built
  – PSSC – Chemistry
  – Tempest
• Roll-your-Own
  – Campus Resources
  – Web
• Use PSC
  – Rich Raymond ([email protected])
  – http://www.psc.edu/homepage_files/state_funding.html

Page 46:

OSCAR

• Open Source Cluster Application Resources
• Cluster on a CD – automates cluster install process
• Wizard driven
• Nodes are built over network
• OSCAR <= 64 node clusters for initial target
• Works on PC commodity components
• RedHat based (for now)
• Components: open source and BSD-style license
• NCSA “Cluster in a Box” base

www.oscar.sourceforge.net

Rocks

• Enable application scientists to build and manage their own resources
  – Hardware cost is not the problem
  – System administrators cost money, and do not scale
  – Software can replace much of the day-to-day grind of system administration
• Train the next generation of users on loosely coupled parallel machines
  – Current price-performance leader for HPC
  – Users will be ready to “step up” to NPACI (or other) resources when needed
• Rocks scales to Top500-sized resources
  – Experiment on small clusters
  – Build your own supercomputer with the same software!

www.rockscluster.org

scary technology

Page 47:
Page 48:

GriPhyN and European DataGrid

[Diagram: Interactive User Tools, Virtual Data Tools, Request Planning and Scheduling Tools, and Request Execution Management Tools sit above Resource Management Services, Security and Policy Services, and Other Grid Services; transforms and raw data sources feed distributed resources (code, storage, computers, and network); users include the Production Team, Individual Investigators, and Other Users]

Illustration courtesy C. Catlett, ©2001 Global Grid Forum

Page 49:
Page 50:

Extensible Terascale Facility - ETF "TeraGrid"

[Diagram: Extensible Backplane Network linking an LA Hub and a Chicago Hub (30 Gb/s site connections, 40 Gb/s backbone) to five sites:
NCSA (Compute-Intensive): 10 TF IA-64, 128 large-memory nodes, 230 TB storage
SDSC (Data-Intensive): 5 TF IA-64, DB2 server, 500 TB storage, 1.1 TF Power4
PSC (Compute-Intensive): 6 TF EV68, 71 TB storage, 0.3 TF EV7 shared-memory, 150 TB storage server
ANL (Visualization): 1.25 TF IA-64, 96 visualization nodes, 20 TB storage
Caltech (Data collection analysis): 0.4 TF IA-64, IA32 Datawulf, 80 TB storage
Legend: storage server, disk storage, cluster, shared memory, visualization cluster; other labels include IA64, IA32, EV68, EV7, Power4, Sun]

Page 51:

Grid Building Blocks

Middleware: hardware and software infrastructure to enable access to computational resources

Services:
• Security
• Information Services
• Resource Discovery / Location
• Resource Management
• Fault Tolerance / Detection

Page 52:

www.globus.org

Page 53:

Thank You

lemieux.psc.edu