Introduction to Cluster Computing
Why do we need High Performance Computers?
The change in the scientific discovery process
[Diagram: Hypothesis, Experiment, Simulation and Modeling]
Compute-Intensive Applications
• Simulation and modeling problems
  – Based on successive approximations; more calculations, better results
  – Optimization problems
• Problems that depend on computations and manipulation of large amounts of data
• Examples
  – Weather prediction
  – Image and signal processing, graphics
  – Databases and data mining
CFD for Clean room
• Analyzing the behaviour of air flow in a clean room for the electronics industry
• Collaboration project
  – Suranaree University of Technology
  – Kasetsart University
  – Funded by NECTEC
CFD Software
• CAMETA version 3.0 (SUT)
  – Time-independent (steady-state) solution
  – Three-dimensional domain
  – Cartesian coordinate system
• Physical quantities of interest:
  – Temperature distribution
  – Relative humidity distribution
  – Particle concentration distribution
Molecular Dynamic Simulation
• Drug discovery using molecular docking
  – Avian flu
  – HIV
• Analyzing properties of chemical compounds
SWAP Model Parameter Identification – Data Assimilation using RS and GA (KU/AIT)
[Diagram: data assimilation loop. RS observations of LAI and evapotranspiration (plotted against day of year, 0 to 360) are fitted to SWAP crop growth model output; a GA searches for the optimized SWAP input parameters (sowing date, soil property, water management, etc.)]
Challenges
• Identifying the SWAP parameters for just 1 pixel (1 sq.km) takes from several minutes up to 30 minutes.
• Thus, an RS image covering 1000 x 1000 sq.km (1000 x 1000 pixels) would take more than 50 years (30 min x 1000 x 1000), which is not acceptable.
• Solution
  – Parallel computing
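The per-pixel parameter searches are independent of one another, which makes this workload easy to parallelize. A minimal sketch with a process pool; `fit_pixel` here is a hypothetical stand-in for the real SWAP/GA optimization, not the actual model code:

```python
from concurrent.futures import ProcessPoolExecutor

def fit_pixel(coord):
    """Hypothetical stand-in for the per-pixel SWAP parameter search."""
    x, y = coord
    # In the real workload this would run the GA against the RS observations.
    return (coord, (x + y) % 7)  # dummy "fitted parameter"

def fit_image(width, height, workers=4):
    coords = [(x, y) for x in range(width) for y in range(height)]
    # Each pixel is an independent task: ideal for a pool of worker processes.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fit_pixel, coords))

if __name__ == "__main__":
    results = fit_image(10, 10)
    print(len(results))  # → 100
```

With enough workers, the 50-year sequential estimate above divides roughly by the number of processors, since the pixels share no state.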
• Longitude: 100.008133
• Latitude: 14.388195

22/08/49 TAM2005
Graphics Rendering and Special Effect
• Rendering
  – Generating an image from a 3D model
• Problems
  – Rendering is a time-consuming process, especially for complex, realistic scenes
  – A massive number of rendering jobs must be done to create a movie
Top500 Fastest Installed Computers
• www.top500.org
• Lists the top 500 supercomputer sites worldwide
• Provides a snapshot of the supercomputers installed around the world
• Began in 1993; published every 6 months
• Measures performance with the TPP Linpack benchmark
• Provides a way to measure trends
Parallel Processing
• Solving a large problem by breaking it into a number of small problems, then solving them on multiple processors at the same time
• Real-life examples
  – Building a house using many workers working on separate parts
  – An assembly line in a factory
[Diagram: a large-scale problem is decomposed into subtasks, each executed on a separate processor]
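The decomposition in the diagram can be sketched with a sum split across workers. This is an illustrative sketch only; in CPython, real speedup for CPU-bound work needs processes rather than threads:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    return sum(chunk)

def parallel_sum(data, n_workers=4):
    # Split the large problem into subtasks...
    size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # ...solve them at the same time on the workers...
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(partial_sum, chunks)
    # ...then combine the partial results.
    return sum(partials)

print(parallel_sum(list(range(1000))))  # → 499500
```

The split/solve/combine shape is the same whether the workers are threads, processes, or cluster nodes exchanging messages.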
Parallel Computer
• A parallel computer is a special computer designed for parallel processing
  – Multiple processors
  – A high-speed interconnection network that links these processors together
  – Hardware and software that help coordinate the computing tasks
• Sometimes this type of computer is called an MPP (Massively Parallel Processor) computer
Compare with SMP
Introduction
• Cluster computing is a technology for building high-performance, scalable computing systems from a collection of small computing systems and a high-speed interconnection network
Applications
• Scientific computing
  – CAD/CAM
  – Bioinformatics
  – Large-scale financial analysis
  – Simulation
  – Drug design
  – Automobile design (crash simulation)
• IT infrastructure
  – Scalable web servers, search engines (Google uses more than 10,000 server nodes)
• Entertainment
  – Rendering
  – Online gaming
Why? Price Performance!
Beowulf Clustering
• Clustering technology originated from the Beowulf Project at NASA
  – Project started by Thomas Sterling and Donald Becker
• Characteristics of a Beowulf cluster
  – Uses PC-based hardware and commodity components to gain high performance at low cost
  – Based on the Linux OS and open-source software
  – Very popular in the high-performance scientific computing world; moving quickly toward enterprise use
Beowulfs!
Beowulf Software Architecture
[Diagram: HTC, HPC, HPTC]
How to use a cluster system for your work
• High Throughput Computing
  – Use the cluster as a collection of processors
  – Run sequential applications
  – Use a job scheduler to control execution
• High Performance Computing
  – Use the cluster as a parallel machine
  – Develop parallel applications to harness the computing power of the cluster
What is needed?
• Nodes
• Interconnection network
• Software
Cluster System Structure
Compute Node
InterconnectionNetwork
Frontend
Component
• Node
  – CPU
  – Memory
  – Motherboard
  – Hard disk
  – Display card
  – Chassis
• Network
• Other accessories
  – KVM switch
  – UPS
CPU
• Various architectures – x86 or x86-64
• x86/IA32 – Pentium 4, Xeon, Xeon MP, Athlon XP
• x86-64 – Athlon 64, Athlon 64 FX
• IA64 – Itanium 2
• The choice affects the chassis, heat sink, and power supply
Heat Sink
• 2 types of CPU packaging – box and tray
• Box comes with a heat sink and fan
• Tray has only the CPU
• Box is recommended
CPU Performance
Memory
• Speed and bandwidth must be compatible with the CPU
  – Pentium 4 – DDR
  – Athlon XP – DDR
  – Opteron – Dual-channel DDR
  – Athlon 64 – Dual-channel DDR
  – Athlon 64 FX – Dual-channel DDR
Memory Performance
Motherboard
• Must be compatible with the CPU
• Bus speed
• Compute nodes may use a full-option motherboard, for example with LAN and VGA on board
  – An advantage for rack systems
Selected Motherboard Features
• Wake-on-LAN• RAID• PXE• PCI Slot• Onboard VGA• Onboard Gigabit Network
Pre-boot Execution Environment
• Network boot
• The network must provide DHCP and TFTP services
• After POST, the node uses DHCP to request an IP address and the boot server
• After the node gets its IP, it downloads the OS image from the boot server via TFTP
• Advantage: installation via PXE is very convenient, and it enables diskless clusters
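The DHCP side of the steps above can be sketched in a dhcpd configuration. The addresses and subnet here are hypothetical examples; `next-server` points the node at the TFTP boot server and `filename` names the boot loader image it should fetch:

```
# /etc/dhcpd.conf (sketch) -- hand out IPs and point nodes at the TFTP server
subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.10 192.168.1.250;
  next-server 192.168.1.1;      # TFTP boot server
  filename "pxelinux.0";        # PXE boot loader, served via TFTP
}
```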
PCI
• Peripheral Component Interconnect
• Internal data path
• Data rate depends on clock speed and bus width
• Currently 33 and 66 MHz, with 32- and 64-bit widths
• 2 voltage levels – 3.3 and 5 V
PCI-X
• Next-generation PCI
• Compatible with PCI 3.3 V 66 MHz
• PCI-X version 2.0 operates at 266 to 1066 MHz; the data rate is about 2.1 to 8.5 GB/s
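The data rates above follow directly from bus width times clock. A quick check (these are peak rates; real throughput is lower because of protocol overhead):

```python
def pci_peak_gbytes(width_bits, clock_mhz):
    """Peak PCI/PCI-X bandwidth in GB/s: bytes per transfer x transfers per second."""
    return width_bits / 8 * clock_mhz * 1e6 / 1e9

print(round(pci_peak_gbytes(32, 33), 2))    # classic PCI: 0.13 GB/s
print(round(pci_peak_gbytes(64, 66), 2))    # PCI 64/66:   0.53 GB/s
print(round(pci_peak_gbytes(64, 266), 2))   # PCI-X 266:   2.13 GB/s
print(round(pci_peak_gbytes(64, 1066), 2))  # PCI-X 1066:  8.53 GB/s
```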
Harddisk
• Stores system and user data
• Various interfaces
  – IDE or ATA
  – SATA
  – SCSI
IDE (Integrated Drive Electronics)
• Also called ATA (AT Attachment)
• Low cost
• Easy to install because virtually every motherboard supports it
• 1 interface can connect up to 2 drives: master and slave
• Maximum speed 133 MB/s with an 80-wire connector under the Ultra ATA/133 standard
• Rotation speed: 7,200 RPM; seek time: 9 ms
• Size 10 – 200 GB (Seagate)
SATA – Serial ATA
• A bit more expensive than ATA
• Higher speed than ATA by 1 – 5%
• Transfer speed: 150 MB/s
• Rotation speed: 7,200 RPM; seek time about 9 ms
• Size 80 – 200 GB
• Needs a new kernel (2.6.2)
SCSI (Small Computer System Interface)
• Intelligent interface: can connect various devices, for example hard disks, scanners, CD-ROM/RW drives, and tape drives
• High cost, for high-end systems
• Only server motherboards support SCSI; a low-cost motherboard needs a SCSI adapter
• 1 interface can connect up to 15 devices
SCSI (cont’d)
• Rotation speed: 10,000 – 15,000 RPM; seek time is about 4.5 ms
• Various speeds: currently available speeds range from 20 MB/s (UltraSCSI) to 320 MB/s (Ultra320, also called Ultra4 SCSI)
• Capacity 18 – 147 GB (Seagate)
Harddisk Performance
RAID
• Redundant Array of Independent Disks
• Increases capacity, speed, and reliability of data
• Logically merges many hard disks into one big one
• Both software and hardware implementations exist
• Normally, RAID is built over SCSI hard disks
• RAID has various levels; the popular ones are 0, 1, and 5
RAID Level
• Level 0 – Striping. Increases capacity and speed. None of the space is wasted as long as the hard drives used are identical
• Level 1 – Mirroring. Increases data reliability via redundancy
• Level 5 – Block-level striping with distributed parity. Increases capacity and data reliability
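The parity idea behind level 5 can be sketched with XOR: the parity block is the XOR of the data blocks, and any single lost block is recovered by XOR-ing the survivors. This is a simplified model; real controllers also rotate the parity block across the disks:

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks on three disks
parity = xor_blocks(data)            # parity block on a fourth disk

# Disk 1 fails: rebuild its block from the parity and the surviving disks.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Level 0 skips the parity entirely (speed, no redundancy); level 1 writes each block to two disks instead of computing parity.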
RAID Implementation
• Hardware
  – RAID adapter, for example from Adaptec. Some motherboards have a built-in RAID controller
  – Linux supports many hardware RAID vendors, especially Adaptec
  – There is only one logical RAID drive at boot time
• Software
  – Linux can emulate RAID 0, 1, and 5
  – Done by creating partitions with the "RAID" type, then using a RAID utility to create the RAID drive. The RAID drives are mapped to /dev/md0, /dev/md1, etc.
RAID Performance
Display
• Monitor
  – Resolution at least 1024 × 768
  – May be LCD or CRT
• Display adapter
  – Linux supports most display adapters
  – An on-board display adapter is recommended for ease of installation
Chassis
• 3 main types
  – Tower
  – Rack-mount
  – Blade
Tower Chassis
• The most popular
• 2 sizes – tower and mini-tower
• Advantages – low cost; ease of maintenance, installation, and air flow
• Disadvantages – large size and wiring problems in a big cluster
Rack-Mount
• Smaller – suitable for large clusters
• Advantage – designed for racks
• Disadvantage – the smaller the chassis, the more problems with air flow
Blade
• A 1U rack case is small, but wiring is still a problem
• A blade server is a case that contains one or more hot-swappable devices called blades
• Each blade is a computer; the blades share some equipment, such as power supplies and network connections
System Interconnect
• Interconnection among nodes is the key to cluster performance
• Considerations
  – Cost
  – Bandwidth and latency
  – Scalability of the interconnect
• Technologies used
  – Fast Ethernet (switched)
  – SCI (Scalable Coherent Interface)
  – Gigabit Ethernet
  – Myrinet
  – Infiniband
Performance of MPICH-GM
• High Performance Linpack
[Chart: HPL runtime (sec) vs. problem size (4000 to 8000) for Myrinet and Ethernet]
Interconnection Network
• At least 1 Gb/s
• Various vendors and standards
  – Gigabit Ethernet
  – Myrinet
  – Infiniband
Gigabit Ethernet
• IEEE 802.3
• 1 Gb/s
• Commodity hardware
• Connected by switches
• IEEE 802.3z using fiber or STP
• IEEE 802.3ab using CAT 5 UTP
• A 64-bit PCI internal bus is recommended
Gigabit Ethernet Switch
• Both layer-2 and layer-3 switches are supported
• A cluster can use a layer-2 switch because the routing function is not required
• For future upgrades, a stackable switch is recommended
Myrinet
• Proprietary network
• Operates at 2+2 Gb/s
• High scalability
• An on-board communication processor reduces main CPU utilization
• Supports both PCI-X and PCI 64
• Fiber cabling
Infiniband
• Infiniband is a technology for connecting remote storage and networks
• Creates IPC clusters
• Operates at 2.5 Gb/s with copper or fiber cable
Network Performance
• Data from NetPIPE 3.6
KVM Switch
• Keyboard, Video, Mouse
• Connects multiple nodes to a single keyboard, monitor, and mouse
• Very convenient
UPS
• Uninterruptible Power Supply
• Emergency power source
• Filters surges from the AC outlet, telephone line, and LAN
• Should be able to notify the system to shut down
• Load is specified in VA units
• Capacity is determined by the batteries
Basic Clustering Software
• Linux (Red Hat, Mandrake, Fedora, etc.)
• Cluster distributions
  – ROCKS, SCE, OSCAR
• Management tools and environments
  – SCMS, Ganglia
• Programming
  – Compilers: PGI, Intel, gcc
  – Parallel programming: PVM, MPI
• Libraries
  – PGAPack, ScaLAPACK, PETSc, Linpack
• Load schedulers
  – OpenPBS, SGE, SQMS, LSF
• Packages
  – NWCHEM, GAMESS, Fluent, Oracle 10g
Cluster Distribution
• A custom-made Linux distribution that can be used to build a cluster
  – Pre-selected set of software
  – Automatic installation tool or "builder"
  – Easy to use, but the user must follow some guidelines
• Examples: Scyld, OSCAR, Rocks, and SCE
NPACI ROCKS – National Partnership for Advanced Computational Infrastructure
• A scalable, complete, and fully automated cluster deployment solution with rational out-of-the-box default settings
• Developed by
  – San Diego Supercomputer Center – Grid & Cluster Computing Group
  – UC Berkeley Millennium Project
  – SCS Linux Competency Centre
Rocks Philosophy
• NPACI ROCKS is an entire cluster-aware distribution
  – Full Red Hat release (tracked closely)
  – NPACI ROCKS packages
  – Installation, configuration & management tools
  – Integrated de-facto standard cluster packages
    • PBS/MAUI, MPICH, ATLAS, ScaLAPACK, HPL...
    • SGE, PVFS (added by SCS-LCC)
• An integrated, easy-to-manage cluster system!
Rocks Philosophy
• Integrated, easy-to-manage cluster system
• Excellent scaling to large numbers of nodes
• A single, consistent cluster management methodology
• Avoids version skew and maintains a consistent image across the cluster!
• Highly skilled and dedicated development teams at SDSC, UC Berkeley, and SCS-LCC
• SUPPORT!
Closed Cluster
• Every node is behind the main node, or frontend
• Pros
  – Suitable for parallel programs because external traffic does not interfere with communication within the cluster
  – Ease of security
• Cons
  – If the frontend fails, users cannot access any of the nodes
Summary
• The cluster is the new dominant platform for HPC
  – Powerful
  – Cost-effective
• A combination of software and hardware is needed to build a cluster
• The application is the key to harnessing the power of the cluster
How to measure your performance
• Standard open-source benchmarking tools
  – HPL (High Performance Linpack)
  – Linpack
  – Stream – memory benchmark
  – Bonnie++ – I/O benchmark
  – IOZone – intensive I/O benchmark
  – Iperf – network performance snapshot
  – NetPipe – intensive bandwidth test
HPL (High Performance Linpack) and Linpack
• HPL
  – A software package that solves a (random) dense linear system in double-precision (64-bit) arithmetic on distributed-memory computers
  – The standard benchmark tool for cluster computers, used at top500.org
  – Performance reported in Flops (floating-point operations per second)
  – Relies on MPI (Message-Passing Interface)
  – More info at http://www.netlib.org/benchmark/hpl/
• Linpack
  – An older implementation of the Linpack benchmark, used to determine single-node performance
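The Flops figure these benchmarks report is simply a known operation count divided by measured time. A toy single-node sketch using plain-Python Gaussian elimination (about 2n³/3 floating-point operations for the factorization); HPL itself distributes this work across nodes with MPI and is far more sophisticated:

```python
import random, time

def solve(a, b):
    """Gaussian elimination with partial pivoting; ~2n^3/3 flops."""
    n = len(b)
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))  # pivot row
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):          # back substitution
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

n = 100
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [sum(row) for row in a]                 # exact solution is all ones
t0 = time.perf_counter()
x = solve([row[:] for row in a], b[:])
elapsed = time.perf_counter() - t0
assert max(abs(v - 1.0) for v in x) < 1e-6  # check the answer, as HPL does
print("MFLOPS ~", round(2 * n**3 / 3 / elapsed / 1e6, 1))
```

Interpreted Python is orders of magnitude slower than the tuned BLAS that HPL links against, but the accounting is the same.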
Stream
• Sustainable memory bandwidth in high performance computers
• Used to measure memory bandwidth using various methods
• Performance reported in MB/s
• More info at http://www.cs.virginia.edu/stream/
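The idea can be sketched with a simple copy kernel: move a large buffer, count the bytes read plus written, and divide by time. Python's interpreter overhead means this understates what STREAM's compiled kernels would report on the same machine:

```python
import time

def copy_bandwidth_mb_s(n_bytes=50_000_000):
    src = bytearray(n_bytes)
    t0 = time.perf_counter()
    dst = bytes(src)                  # STREAM-style Copy kernel: dst[i] = src[i]
    elapsed = time.perf_counter() - t0
    assert dst == src
    # The copy touches each byte twice: one read plus one write.
    return 2 * n_bytes / elapsed / 1e6

print(round(copy_bandwidth_mb_s(), 1), "MB/s")
```

STREAM's other kernels (scale, add, triad) follow the same pattern with a little arithmetic mixed into the data movement.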
Bonnie++ and IOZone
• Bonnie++
  – A benchmark suite aimed at performing a number of simple tests of hard drive and file system performance
  – Tests database-type access to a single file
  – Performance reported as I/O rates in KB/s
  – More info at http://www.coker.com.au/bonnie++/
• IOZone
  – A file system benchmark tool; the benchmark generates and measures a variety of file operations
  – Performs intensive I/O benchmarks over various file sizes and record sizes
  – More info at http://www.iozone.org/
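A toy version of what these tools measure: write a file, force it to disk, read it back, and report MB/s. Real tools also vary record sizes, mix access patterns, and work around OS caching, which this sketch does not:

```python
import os, tempfile, time

def file_io_mb_s(n_bytes=10_000_000):
    data = os.urandom(n_bytes)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        t0 = time.perf_counter()
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # force the bytes to the disk, not just the cache
        write_s = time.perf_counter() - t0
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        back = f.read()
    read_s = time.perf_counter() - t0
    os.unlink(path)
    assert back == data               # sanity check the round trip
    return n_bytes / write_s / 1e6, n_bytes / read_s / 1e6

w, r = file_io_mb_s()
print(f"write {w:.1f} MB/s, read {r:.1f} MB/s")
```

The read figure is usually much higher than the write figure here because the file is still in the page cache.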
Iperf and Netpipe
• Iperf
  – A tool to measure maximum TCP bandwidth, allowing the tuning of various parameters and UDP characteristics
  – Reports data in Kbps or Mbps
  – More info at http://dast.nlanr.net/Projects/Iperf/
• Netpipe
  – A protocol-independent performance tool that visually represents network performance under a variety of conditions
  – More info at http://www.scl.ameslab.gov/netpipe/
Thank you