Linux Cluster Architecture
Linux Users Group Slide # 1, October 5th, 2002. Copyright © 2002 Alexander Vrenios
by
Alex Vrenios, [email protected]
(Shameless Plug)
Overview:
• Why would anyone want to build a cluster system?
• Computer Architecture Review: UPs through Clusters
• Gathering the PC computer hardware (on the cheap!)
• Connecting the node computers into a local area network
• Configuring relevant Linux OS files for internetworking
• Client-Services and sockets make PCs work as a team
• The design of our simple master-slave cluster server
• Internal and external performance monitoring and tuning
Why would anyone want to build a cluster system?
• Hobbyists:
It’s a new and interesting pathway to experience; and how many of your friends have a cluster server anyway?
• Professionals:
Sophisticated systems are often developed in parallel, meaning the hardware won’t be ready when you want to test your software. Having a test bed will get you past the hardware-independent bugs, and put you in a position to polish your product when the platform is finally ready.
• Managers:
This is all bleeding-edge stuff; you’ll want to prepare for the issues your people might face and the questions they might ask. Experience gives you the insight you’ll need.
• Academics:
Analyze data from a live system, instead of questionable and potentially oversimplified simulation output.
Computer Architecture Review:
• Uniprocessor or UP
[Diagram: a CPU, made up of an instruction processor and an arithmetic and logic unit, exchanges instructions and data with RAM over a data bus that also serves the INPUT and OUTPUT I/O ports.]
SISD*: Single instruction, single data stream.
The typical PC is a uniprocessor.
* Flynn proposed this taxonomy - some other configurations follow…
• Array or Vector Processor
[Diagram: a CONTROLLER sends each instruction over the instruction bus to CPU0, CPU1, . . . CPUN; each CPU works on its own data pair, A[0], B[0] through A[n], B[n].]
SIMD: Single instruction, multiple data streams.
(ILLIAC IV, IBM 390, DSPs, etc.)
• Pipeline Processor
[Diagram: a single CPU, attached by a data bus to RAM holding instructions and data, whose pipe provides confluent instruction execution.]
MISD: Multiple instruction, single data stream?
(Some say there is no MISD.)
• Multiprocessor or MP
[Diagram: CPU0, CPU1, . . . CPUN share one data bus to RAM holding instructions and data.]
MIMD: Multiple instruction, multiple data streams.
Mainframes, workstations, etc. (Mostly for the very wealthy!)
• The MIMD is so interesting that it gets its own taxonomy:
MP (multiprocessor):
- UMA: Uniform Memory Access. [Diagram: CPU0, CPU1, . . . CPUN share one data bus to RAM holding instructions and data.] A tightly-coupled multiprocessor: all CPUs access instructions and data at the same transfer rate.
- NUMA: Non-Uniform Memory Access. [Diagram: CPU0, CPU1, . . . CPUN, each with its own instructions and data, on a high-speed back plane.] For example, plug-in boards, each with a CPU and some local memory: VME chassis, some Beowulf clusters, and many embedded processor systems.
MULTICOMPUTER (loosely coupled):
- NORMA: No (hardware) Remote Memory Access (a rare distinction). [Diagram: CPU0, CPU1, . . . CPUN, each with its own instructions and data, on a local area network.] Older Beowulf clusters, distributed shared memory systems (IVY), and some modern-day cluster computers: PCs whose “personality” may be molded by their software!
Gathering PC Computer Hardware:
• Small computer stores (Renaissance Computer, e.g.)
• Newspaper, club and organization newsletter ads
• Family, friends and neighbors (closets, garage sales)
• Large corporations? (hospitals, American Express, Motorola, etc.)
• Computer salvage outlets:
ASU Salvage [Map: near University, Pima, the 101 and N Rio Salado.]
Connecting the Node PCs into a LAN:
[Diagram: nodes alpha (10.0.0.1), beta (10.0.0.2), . . . omega (10.0.0.5) of chaos.org are each cabled to a hub. A PC rear view shows the built-in or add-on network interface ports (RJ-45 jacks), joined by 10/100 Base T cables with RJ-45 plugs to the hub ports.]
Network Block Diagram:
CHAOS: CHeap Array of Outmoded Systems
[Block diagram: the CHAOS multicomputer server. Eight ’386 nodes (10MB, 12MB, 12MB, 13MB, 13MB, 13MB, 16MB and 13MB of RAM) and one 32MB ’486 (p75 equiv.), all running Linux rh4.2, are connected by Ethernet; NFS is also shown. An interactive client sends queries and receives responses over the Ethernet. An external monitor, a desktop PC, displays real-time performance.]
Configuring a Linux Network – Local User Files:
/ (the root directory)
/home
/home/chief: .rhosts (and others)
/home/chief/bin: pgm
/home/chief/src: pgm.c, makefile
/home/chief/inc: pgm.h
Building on alpha:
alpha:/home/chief/src> make pgm
The makefile rule:
pgm:
	gcc -I../inc/ -o ../bin/pgm pgm.c
Configuring a Linux Network – Remote User Files:
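• Network File System: the illusion of locality via remote-mount points
[Diagram: alpha (10.0.0.1), the NFS server, and omega (10.0.0.5), an NFS client, are connected through a hub on chaos.org, and each has its own /dev/hda. The server’s disk holds /, /home, /home/chief and /home/chief/bin, /home/chief/src and /home/chief/inc. The client’s / holds /home and /home/chief (annotated adduser), where the server’s files appear.]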
Configuring a Linux Network:
• File /etc/hosts.equiv on every cluster node:
alpha.chaos.org chief
. . .
omega.chaos.org chief
• File /home/chief/.rhosts in the user’s home directory:
alpha.chaos.org chief
. . .
omega.chaos.org chief
• Test access using rsh, a remote shell command:
alpha:/home/chief> rsh omega
> Note that this, and what follows, may lead to a SECURITY leak!
Configuring a Linux Network (continued):
• First, file /etc/hosts belongs on all the network nodes:
127.0.0.1 localhost localhost.chaos.org
10.0.0.1 alpha alpha.chaos.org
. . .
10.0.0.5 omega omega.chaos.org
• File /etc/exports on 10.0.0.1, the NFS server named alpha:
/home (rw)
• File /etc/fstab on each cluster node except the server named alpha (that is, on the NFS clients):
/dev/hda1 swap swap defaults 0 0
/dev/hda2 / ext2 defaults 1 1
alpha:/home /home nfs rw 0 0
/dev/fd0 /mnt/floppy ext2 noauto 0 0
none /proc proc defaults 0 0
Internetworking Services – Operation:
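[Diagram: on the local machine, the user types rcat myfile remote at the keyboard. The rcat client sends the request across the network, through a UDP or TCP socket, to the rcatd service on the remote machine. That service cats myfile and sends the text back for screen output on the local machine:]
> rcat myfile remote
First line in myfile
Second line, etc.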
Internetworking services – Configuration (inetd):
• Add a line to file /etc/services on each remote-server node:
rcatd 5000/udp # remote-cat UDP service on port 5000
• Add a line to file /etc/inetd.conf on each remote-server node (the first field, rcatd, refers to the entry in /etc/services):
rcatd dgram udp wait chief /home/chief/bin/rcatd
• [Reconfiguration if necessary] omega:/root> killall -HUP inetd
Sequence of events:
1. The client process sends a UDP packet to the server’s port 5000.
2. The daemon (inetd) starts the process at /home/chief/bin/rcatd.
3. The service reads the incoming UDP packet data from the “keyboard” (its standard input).
Internetworking Services – Configuration (xinetd):
• File /etc/xinetd.d/rcatd on each (xinetd) remote-server node. Here, rcatd refers to the name of the service, and only_from = 10.0.0.0 means 10.0.0.*:
service rcatd
{
    port        = 5000
    socket_type = dgram
    protocol    = udp
    wait        = yes
    user        = chief
    server      = /home/chief/bin/rcatd
    only_from   = 10.0.0.0
    disable     = no
}
• [Reconfiguration] omega:/root> /etc/rc.d/init.d/xinetd restart
Distributed Systems C-Language Skills:
[Diagram, four panels:
SUBTASKING: main starts a subtask; the two communicate through sockets or shared memory.
INTERNETWORKING: main and its subtask also reach a remote process across the network through sockets.
NETWORK SERVICES: main and its subtask reach a remote service that inetd starts on demand across the network.
SIGNAL HANDLING: main and its subtask, with SIGALRM and SIGCHLD.]
Many examples in the book!
Master-Slave Cluster Server - Initialization:
Broadcast starts slave tasks…
[Diagram: the master runs on alpha; slaves run on beta, gamma and delta.]
The master starts a local subtask (s1, s2, s3), one for each registering remote slave.
Local subtasks contact slaves…
[Diagram: subtasks s1, s2 and s3 each connect to a slave on beta, gamma or delta; a perform subtask runs beside the master and each slave.]
…and all tasks start perform subtasks.
Master-Slave Cluster Server - Operation:
Perform tasks send performance info to the monitor…
[Diagram: the master on alpha, its subtasks s1, s2 and s3, and the slaves on beta, gamma and delta. A perform subtask on every node reports to the monitor, and a client sends a query and receives a response.]
Incoming queries are processed by the first available slave, via its subtask.
Real-Time Performance Monitoring – Internal:
• Resource utilization reporting via the /proc pseudo-files*:
- CPU utilization in /proc/stat: running jiffy counts in each state (user, nice, system, idle)
cpu 1256 0 1566 565277
- Disk reads and writes in /proc/stat: running I/O counts for /dev/hda
disk_rio 1270 0 0 0
disk_wio 1337 0 0 0
* Note that the exact meaning and content of the /proc files can depend on the OS release.
Real-Time Performance Monitoring – Internal (continued):
• Resource utilization reporting (continued):
- Memory utilization in /proc/meminfo: current values (total, used, free)
Mem: 14942208 13713408 1228800 . . .
- Packets sent and received in /proc/net/dev: running I/O counts (received, then transmitted)
lo: 80 0 0 0 0 80 0 0 0 0
eth0: 115 0 0 0 0 68 0 0 0 0
Real-Time Performance Monitoring – monitor/perform(s):
NEAR REAL-TIME CLUSTER PERFORMANCE STATISTICS
chaos.org (10Base2)

Node    Address    Cpu   Mem   Rio  Wio  Rcvd  Sent
ALPHA   10.0.0.1    7%   94%    1    0     0    12
BETA    10.0.0.2   28%   40%    0    1    21     1
GAMMA   10.0.0.3    2%   75%    4    0     2     0
DELTA   10.0.0.4    5%   56%    3    0     0    10

- Overall Network Loading - 23 Pkts/sec
Real-Time Performance Monitoring – External (displayed):
• Resource utilization reporting via a custom client process:
RESPONSE     | OBSERVATIONS
TIME (msec)  |        10        20        30        40        50
-------------+----+----+----+----+----+----+----+----+----+----+
  1 -  10    |
 11 -  20    |
 21 -  30    |************************
 31 -  40    |************************
 41 -  50    |**
 51 -  60    |
 61 -  70    |
 71 -  80    |
 81 -  90    |
 91 - 100    |
50 Total Observations
Average = 30 milliseconds …so what if you’re not happy with this level of performance?
Performance Tuning – Defining Execution Phases:
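[Diagram: the path of a query from the client across the Ethernet to the MASTER (subtasks S1 and S2, a shared STP table), on to Slave 1, Slave 2 and the DB, and of the response back to the client. Execution phases are numbered 1 through 8, and transit times are marked.]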
Performance Tuning – Execution Phase Times:
[Bar chart: Initial MSI Phase Times. Average time (seconds, 0.00 to 0.02) for execution phases 1 through 8 under three time distributions: Expon, Pulse and Sweep. Annotations: “Leave the file open?”, “Not a SW issue.” and “Found a bug!”]
Performance Tuning – Final Times:
[Bar chart: Final MSI Phase Times. Average time (seconds, 0.00 to 0.02) for execution phases 1 through 8 under three time distributions: Expon, Pulse and Sweep. Annotations: “Dramatic reduction!” and “About a 10% improvement (not too bad)”.]
See the book for further details on the statistical distributions.
Performance Tuning:
• The proof is in the pudding!
RESPONSE     | OBSERVATIONS
TIME (msec)  |        10        20        30        40        50
-------------+----+----+----+----+----+----+----+----+----+----+
  1 -  10    |
 11 -  20    |*******
 21 -  30    |*********************************
 31 -  40    |********
 41 -  50    |**
 51 -  60    |
 61 -  70    |
 71 -  80    |
 81 -  90    |
 91 - 100    |
50 Total Observations
Average = 25 milliseconds = 17% improvement!
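The quoted improvement is the drop in average response time, from 30 milliseconds before tuning to 25 milliseconds after. It is taken relative to the original average:

```latex
\frac{30\,\text{ms} - 25\,\text{ms}}{30\,\text{ms}} = \frac{5}{30} \approx 0.167 \approx 17\%
```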
Further Details are in the Book:
• Download all the source code for free:
http://www.samspublishing.com
- Search on “Linux cluster architecture” or “Vrenios”
- Click on the “Downloads” link in the book description
1. Individual chapter examples are in zip files
2. A complete environment for the user chief is in a tar.gz file
• Book Signings:
Sep 8th, Borders Chandler, Sunday @ 2pm
Sep 15th, Borders Arrowhead, Sunday @ 2pm
Oct 25th, Barnes & Noble Arrowhead, Friday @ 7pm
References:
Andrew S. Tanenbaum (of MINIX and AMOEBA fame!), Distributed Operating Systems, Prentice Hall, 1995
Chris Brown, Unix Distributed Programming, Prentice Hall, 1994
W. Richard Stevens, Advanced Programming in the UNIX Environment, Addison-Wesley, 1992
Alex Vrenios, “CHAOS: A CHeap Array of Outmoded Systems,” LinuxGazette.com, October 1998
Alex Vrenios, “CHAOS Part 2,” LinuxGazette.com, December 1998
Rusling et al., Linux Programming White Papers, Coriolis Open, 1999
You’ve been a terrific audience!
Any questions?
[Chart: Manifold Server - Final Performance Results. Time for 10 queries (seconds, 0 to 20) versus the number of remote workers (1 to 7).]
Hurry out and buy this book!