Cluster and Grid Computing
Pittsburgh Supercomputing Center
John Kochmar, J. Ray Scott, (Derek Simmel), (Jason Sommerfield)
Pittsburgh Supercomputing Center: Who We Are
• Cooperative effort of:
  – Carnegie Mellon University
  – University of Pittsburgh
  – Westinghouse Electric
• Research department of Carnegie Mellon
• Offices in Mellon Institute, Oakland
  – On CMU campus
  – Adjacent to University of Pittsburgh campus
Westinghouse Electric Company
Energy Center, Monroeville, PA
Agenda
• HPC Clusters
• Large Scale Clusters
• Commodity Clusters
• Cluster Software
• Grid Computing
TOP500 Benchmark Completed October 1, 2001
[Timeline of benchmark runs: May 1999, August 1999, December 1999, February 2000, April 2000, May 2000, August 2000, October 2000, March 2001, August–October 2001]
Three Systems in the Top 500
• HP AlphaServer SC ES40 “TCSINI”: ranked 246 with 263.6 GFlops Linpack performance
• Cray T3E900 “Jaromir”: ranked 182 with 341 GFlops Linpack performance
• HP AlphaServer SC ES45 “LeMieux”: ranked 6 with 4.463 TFlops Linpack performance
Top Academic System
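LeMieux’s ranking pairs its 4.463 TFlops Linpack result with the 6 TFlops peak quoted for the same machine later in the deck; the ratio is the usual Linpack efficiency figure. A quick sketch of that arithmetic:

```python
# Linpack efficiency for LeMieux, using the two figures quoted in the
# deck (4.463 TFlops Linpack, 6 TFlops peak).
linpack_tflops = 4.463
peak_tflops = 6.0

efficiency = linpack_tflops / peak_tflops
print(f"Linpack efficiency: {efficiency:.1%}")  # about 74%
```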
Cluster Node Count

Rank  Installation Site                         Nodes
   1  Earth Simulator Center                      640
   2  Los Alamos National Laboratory             1024
   3  Los Alamos National Laboratory             1024
   4  Lawrence Livermore National Laboratory      512
   5  Lawrence Livermore National Laboratory      128
   6  Pittsburgh Supercomputing Center            750
   7  Commissariat a l'Energie Atomique           680
   8  Forecast Systems Laboratory - NOAA          768
   9  HPCx                                         40
  10  National Center for Atmospheric Research     40
One Year of Production
lemieux.psc.edu
It’s Really All About Applications
• Single CPU with common data stream
  – seti@home
• Large shared-memory jobs
• Multi-CPU jobs
• …but, let’s talk systems!
HPC Systems Architectures
HPC Systems
• Larger SMPs
• MPP – Massively Parallel Processing machines
• Non-Uniform Memory Access (NUMA) machines
• Clusters of smaller machines
Larger SMPs
• Pros:
  – Use existing technology and management techniques
  – Maintain parallelization paradigm (threading)
  – It’s what users really want!
• Cons:
  – Cache coherency gets difficult
  – Increased resource contention
  – Pin counts add up
  – Increased incremental cost
HPC Clusters
• Rationale
  – If one box can’t do it, maybe 10 can…
  – Commodity hardware is advancing rapidly
  – Potentially far less costly than a single larger system
  – Big systems are only so big
HPC Clusters
• Central Issues
  – Management of multiple systems
  – Performance
    • Within each node
    • Interconnections
  – Effects on parallel programming methodology
    • Varying communication characteristics
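The interconnect trade-off above (bandwidth vs. latency) is commonly modeled to first order as t = latency + size / bandwidth. The sketch below uses that model with illustrative ballpark numbers, which are assumptions rather than measured figures for any product:

```python
# First-order message-transfer model: small messages are latency-bound,
# large messages are bandwidth-bound.
def transfer_time(size_bytes, latency_s, bandwidth_bps):
    """Time to move one message of size_bytes over one link."""
    return latency_s + size_bytes * 8 / bandwidth_bps

# Assumed ballpark (latency in s, bandwidth in bits/s) -- illustrative only.
interconnects = {
    "100 Mb Ethernet": (100e-6, 100e6),
    "Gigabit Ethernet": (50e-6, 1e9),
    "Quadrics": (5e-6, 2e9),
}

for name, (lat, bw) in interconnects.items():
    small = transfer_time(64, lat, bw)         # latency dominates
    large = transfer_time(1_000_000, lat, bw)  # bandwidth dominates
    print(f"{name}: 64 B -> {small * 1e6:.1f} us, 1 MB -> {large * 1e3:.2f} ms")
```

The point the slide makes falls out directly: for fine-grained parallel codes (many small messages), latency matters far more than headline bandwidth.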
The Next Contender?
• CPU: 128-bit CPU
• System clock frequency: 294.912 MHz
• Main memory: 32 MB direct RDRAM
• Embedded cache VRAM: 4 MB
• I/O processor
• CD-ROM and DVD-ROM
Why not let everyone play?
What’s a Cluster? Base Hardware
• Commodity nodes
  – Single, dual, quad, ???
  – Intel, AMD
  – Switch port cost vs. CPU
• Interconnect
  – Bandwidth
  – Latency
• Storage
  – Node-local
  – Shared filesystem
Terascale Computing System
Hardware Summary
• 750 ES45 compute nodes
• 3000 EV68 CPUs @ 1 GHz
• 6 TFlops
• 3 TB memory
• 41 TB node disk, ~90 GB/s
• Multi-rail fat-tree network
• Redundant interactive nodes
• Redundant monitor/control
• WAN/LAN accessible
• File servers: 30 TB, ~32 GB/s
• Mass store buffer disk, ~150 TB
• Parallel visualization
• ETF coupled
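The summary’s headline numbers are internally consistent: 750 quad-CPU nodes give 3000 CPUs, and at 1 GHz with two floating-point results per cycle (an assumption about the EV68 pipeline, not stated on the slide) the peak comes out at 6 TFlops:

```python
# Sanity check of the TCS hardware-summary arithmetic.
nodes = 750
cpus_per_node = 4          # quad-CPU AlphaServer ES45 nodes
clock_hz = 1e9             # 1 GHz EV68
flops_per_cycle = 2        # assumed: two FP results per clock per CPU

cpus = nodes * cpus_per_node
peak_flops = cpus * clock_hz * flops_per_cycle
print(cpus)        # 3000
print(peak_flops)  # 6e12, i.e. 6 TFlops
```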
[System diagram: compute nodes on a Quadrics fabric with a control LAN; file servers (/tmp, /usr) and redundant interactive nodes reachable over the WAN/LAN.]
• Compute nodes: AlphaServer ES45
  – 5 nodes per cabinet
  – 3 local disks per node
Row upon row…
PSC/HP Grid Alliance
• A strategic alliance to demonstrate the potential of the National Science Foundation's Extensible TeraGrid
• 16-node HP Itanium2/Linux cluster
• Through this collaboration, PSC and HP expect to further the TeraGrid goals of enabling scalable, open-source, commodity computing on IA64/Linux to address real-world problems
What’s a Cluster? Base Hardware
• Commodity nodes
  – Single, dual, quad, ???
  – Switch port cost vs. CPU
• Interconnect
  – Bandwidth
  – Latency
• Storage
  – Node-local
  – Shared filesystem
Cluster Interconnect: Low End
• 10/100 Mbit Ethernet
  – Very cheap
  – Slow, with high latency
• Gigabit Ethernet
  – Sweet spot
  – Especially with:
    • Channel bonding
    • Jumbo frames
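A rough sketch of why the two Gigabit Ethernet tweaks above help. Jumbo frames amortize per-frame header overhead over a larger payload, and channel bonding stripes traffic across links. The header sizes (18 B Ethernet framing, 20 B IP, 20 B TCP) are typical assumed values, and ideal striping is assumed for bonding:

```python
# Payload efficiency of Ethernet frames: the MTU covers IP + TCP headers
# plus payload, and Ethernet framing adds about 18 bytes on the wire.
def payload_efficiency(mtu):
    payload = mtu - 40          # subtract assumed IP (20 B) + TCP (20 B)
    return payload / (mtu + 18) # add assumed Ethernet framing overhead

std = payload_efficiency(1500)    # standard frames
jumbo = payload_efficiency(9000)  # jumbo frames

# Channel bonding: two 1 Gbit/s links, assuming ideal striping.
bonded_gbps = 2 * 1.0 * jumbo

print(f"1500-byte MTU efficiency: {std:.1%}")
print(f"9000-byte MTU efficiency: {jumbo:.1%}")
print(f"2-way bonded effective throughput: {bonded_gbps:.2f} Gbit/s")
```

Jumbo frames also cut per-packet interrupt and protocol-processing load, which on commodity NICs of this era mattered as much as the wire efficiency.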
Cluster Interconnect, cont.: Mid-Range
• Myrinet – http://www.myrinet.com/
  – High speed with good (not great) latency
  – High port-count switches
  – Well adopted and supported in the cluster community
• InfiniBand
  – Emerging
  – Should be inexpensive and pervasive
Cluster Interconnect, cont.: Outta Sight!
• Quadrics Elan – http://www.quadrics.com/
  – Very high performance
    • Great speed
    • Spectacular latency
  – Software
    • RMS
    • QSNET
  – Becoming more “commodity”
Wiring: Quadrics
[Switch wiring diagram: a 512–1024-way federated switch (the 4096- and 8192-way versions are the same design, but bigger), built from stages of 8 × (16-way) switches and 8–16 64U64D switches (13 for TCS), with overhead cables; a fully wired switch cabinet, 1 of 24, with wires running up and down.]
What’s a Cluster? Base Hardware
• Commodity nodes
  – Single, dual, quad, ???
  – Switch port cost vs. CPU
• Interconnect
  – Bandwidth
  – Latency
• Storage
  – Node-local
  – Shared filesystem
Commodity Cache Servers
• Linux
• Custom software
  – libtcom/tcsiod
  – Coherency manager (SLASH)
• Special-purpose DASP
  – Connection to outside
  – Multi-protocol
    • *ftp
    • SRB
    • Globus
• 3Ware SCSI/ATA disk controllers
What’s a Cluster? System Software
• Installation
• Replication
• Consistency
• Parallel file system
• Resource management
• Job control
Job Management Software
[Workflow diagram: users submit jobs to queues handled by batch job management (PBS/RMS) and the “Simon” scheduler, which implements the TCS scheduling practices, balancing supply and demand of compute nodes. Job invocation and process distribution, execution, and control feed a usage-accounting database; monitoring and node-event management connect to call tracking, a field-service database, and user notification. Checkpoint/restart (CPR) and requeue run on the compute nodes, while tcscomm and tcscopy/hsmtcscopy move data among compute nodes, visualization nodes, user file servers, and HSM. (PSC, NSF)]
PSC Terascale Computing System: Monitoring and Non-Contiguous Scheduling
What’s a Cluster? Application Support
• Parallel execution
• MPI – http://www.mpi-forum.org/
• Shared memory
• Other…
  – Portals
  – Global Arrays
Building Your Cluster
• Pre-built
  – PSSC – Chemistry
  – Tempest
• Roll your own
  – Campus resources
  – Web
• Use PSC
  – Rich Raymond ([email protected])
  – http://www.psc.edu/homepage_files/state_funding.html
OSCAR
• Open Source Cluster Application Resources
• Cluster on a CD – automates the cluster install process
• Wizard driven
• Nodes are built over the network
• Clusters of <= 64 nodes as the initial target
• Works on PC commodity components
• RedHat based (for now)
• Components: open source and BSD-style license
• NCSA “Cluster in a Box” base
www.oscar.sourceforge.net
NPACI Rocks
• Enable application scientists to build and manage their own resources
  – Hardware cost is not the problem
  – System administrators cost money, and do not scale
  – Software can replace much of the day-to-day grind of system administration
• Train the next generation of users on loosely coupled parallel machines
  – Current price-performance leader for HPC
  – Users will be ready to “step up” to NPACI (or other) resources when needed
• Rocks scales to Top500-sized resources
  – Experiment on small clusters
  – Build your own supercomputer with the same software!
www.rockscluster.org
GriPhyN and European DataGrid
[Architecture diagram: interactive user tools for a production team, individual investigators, and other users sit atop virtual data tools, request planning and scheduling tools, and request execution management tools. These rest on resource management services, security and policy services, and other grid services, which mediate access to distributed resources (code, storage, computers, and network), transforms, and raw data sources.]
Illustration courtesy C. Catlett, ©2001 Global Grid Forum
Extensible Terascale Facility - ETF "TeraGrid"
[Network diagram: an extensible backplane network links the five sites through LA and Chicago hubs over 30 Gb/s connections, with a 40 Gb/s backbone.]
• NCSA (compute-intensive): 10 TF IA-64, 128 large-memory nodes, 230 TB storage
• SDSC (data-intensive): 5 TF IA-64, DB2 server, 500 TB storage, 1.1 TF Power4
• PSC (compute-intensive): 6 TF EV68, 71 TB storage; 0.3 TF EV7 shared-memory, 150 TB storage server
• ANL (visualization): 1.25 TF IA-64, 96 visualization nodes, 20 TB storage
• Caltech (data collection analysis): 0.4 TF IA-64, IA32 Datawulf, 80 TB storage
[Legend: storage server, disk storage, cluster, shared memory, visualization cluster; node types IA64, IA32, EV68, EV7, Power4, Sun]
Grid Building Blocks
• Middleware: hardware and software infrastructure to enable access to computational resources
• Services:
  – Security
  – Information services
  – Resource discovery / location
  – Resource management
  – Fault tolerance / detection
www.globus.org
Thank You
lemieux.psc.edu