Linux High Availability Cluster Selection
8/9/2019 Linux High Availability Cluster Selection
Linux High Availability Cluster Selection
Tim Burke
Which cluster product is right for me?
There is no one size fits all
Rapidly evolving marketplace
The good news: there is a lot to choose from
The bad news: there is a lot to choose from
Strategy - be an informed consumer
Selection Process / Presentation Outline
Identify target applications - usage model
Identify required cluster feature set
Open source vs proprietary, product vs project
Cost factors
Vendor evaluation
OEM & ISV endorsements
Identify Target Applications / Clustering Categories
High Availability Clusters
Databases, fileservers
Off the shelf applications
Load Balancing Clusters
Dispatching web traffic
High Performance Computing
Large computational problems
High Performance Computing
HPC, HPTC cluster attributes:
1. Large # of systems working together to solve a common problem - scalability
2. Performance, not reliability, is of utmost importance
3. Requires custom parallelized applications
4. Tends to be bleeding edge, early adopters
5. Example deployments: genetics, pharmaceutical, weather, seismic analysis, modeling
Load Balancing Clusters
Front end dispatching node (or 2 for redundancy)
Pool of inexpensive back end servers
Redirect transactions so no one system is overloaded
Balancing algorithms: round robin, weighted, load based
Typically used for web server traffic (Apache front end)
Useful for static content
Not applicable for dynamic content
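The balancing algorithms listed above can be sketched in a few lines. Below is a minimal weighted round-robin dispatcher; the server names and weights are hypothetical, not from any product described here:

```python
from itertools import cycle

def weighted_round_robin(servers):
    """Yield back end server names in proportion to their weights.

    servers: list of (name, weight) pairs; a higher weight receives
    proportionally more of the incoming transactions.
    """
    # Expand each server into `weight` slots, then cycle forever.
    pool = [name for name, weight in servers for _ in range(weight)]
    return cycle(pool)

# Hypothetical back end pool: one fast node, two slower ones.
dispatch = weighted_round_robin([("web1", 2), ("web2", 1), ("web3", 1)])
first_eight = [next(dispatch) for _ in range(8)]
# web1 handles half the traffic, web2 and web3 a quarter each.
```

A load-based algorithm would replace the static weights with a periodically refreshed measurement (connection count, CPU load) reported by each back end.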
High Availability Clusters
The need for high availability (HA)
Overview of high availability features
Reliability, Availability, Serviceability (RAS)
Users & businesses have high expectations:
1. Reliability - high degree of protection for corporate data. Information is a crucial business asset.
2. Availability - near continuous data access
3. Serviceability - procedures to correct problems with minimal business impact
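Availability is commonly quoted in "nines". A quick sketch of the standard conversion from an availability percentage to permitted downtime per year (straightforward arithmetic, not vendor data):

```python
def downtime_per_year_minutes(availability_pct):
    """Minutes of downtime per year permitted at a given
    availability percentage (e.g. 99.9 = "three nines")."""
    minutes_per_year = 365 * 24 * 60  # 525,600
    return minutes_per_year * (1 - availability_pct / 100)

print(round(downtime_per_year_minutes(99.9), 1))   # 525.6 minutes/year
print(round(downtime_per_year_minutes(99.99), 1))  # 52.6 minutes/year
```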
Sources of Downtime - The Standish Group, 2001
Application bug or error
Main system hardware failure
Database error
Main server system bug
Network
Operator error
Other server's hardware failure
Other server's system bug
Environmental conditions
Planned outage
Other
Downtime Costs - The Standish Group
[Bar chart: cost per minute of downtime in dollars, scale $0 to $13,000, by application: electronic resource planning, supply chain management, e-commerce, internet banking, customer service center, messaging]
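Per-minute figures like those in the chart translate directly into outage cost. A trivial sketch with illustrative numbers (the rate and duration below are examples, not survey data):

```python
def outage_cost(cost_per_minute, outage_minutes):
    """Total cost of an outage at a flat per-minute rate."""
    return cost_per_minute * outage_minutes

# Illustrative: a $13,000/minute application down for 15 minutes.
print(outage_cost(13_000, 15))  # 195000 dollars
```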
No Single Point of Failure (NSPF)
Hardware redundancy - increased overall reliability and availability:
1. Multiple paths between systems
2. Storage - mirrored, RAID5
3. Multiple power sources
4. Multiple external networks
High Availability Clusters
Redundancy for fault tolerance
Failover - if one node shuts down or fails, another node takes over the application load
Facilitates planned maintenance
Failover
Involves selecting a target node & moving resources - failover policies
Example resource types:
1. Physical disk ownership
2. Filesystems
3. Applications
4. Databases
5. IP addresses
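One common failover policy is an ordered preference list per service: the takeover target is the first preferred node still online. A minimal sketch of that selection step (node names are hypothetical, and real products offer other policies too):

```python
def select_target(preference_list, online_nodes):
    """Pick the takeover node under an ordered-preference failover
    policy: the first node in the service's preference list that
    is currently online wins."""
    for node in preference_list:
        if node in online_nodes:
            return node
    return None  # no eligible node; the service stays offline

# Service prefers node "a", then "b", then "c"; "a" has failed.
print(select_target(["a", "b", "c"], {"b", "c"}))  # b
```

Once a target is selected, resources move in dependency order: disk ownership and filesystems first, then the IP address and application on top of them.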
Failover Configurations
Active / Passive
1 node runs application(s)
Other node on standby for takeover
Idle node can take over with no performance degradation
Active / Active
All nodes actively running application(s)
Workload moves to survivor on failure
Effectively utilizes capacity (TCO)
Data Integrity Provisions
Crucial for safe failover of data centric services (filesystem / database)
In failure scenarios (e.g. a hung node), ensure the failed node cannot access storage - I/O barriers, I/O fencing
Lack of I/O fencing can result in:
Loss of data (backups?)
System crashes
Common mechanisms:
Power switches
SCSI reservations
Watchdog timers
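Whichever mechanism is used, the key property is ordering: the failed node must be cut off from shared storage before any survivor touches it. A minimal sketch of that sequencing, where the fence and service-start callables are hypothetical stand-ins for a power switch action or SCSI reservation and a start script:

```python
def safe_takeover(failed_node, fence, start_service):
    """Fence first, then start. If fencing fails we must NOT take
    over: a hung-but-alive node could otherwise issue stale writes
    and corrupt shared storage."""
    if not fence(failed_node):
        raise RuntimeError(f"could not fence {failed_node}; refusing takeover")
    return start_service()

# Hypothetical callables recording the order of operations.
events = []
safe_takeover(
    "node2",
    fence=lambda n: events.append(f"fenced {n}") or True,
    start_service=lambda: events.append("service started") or True,
)
print(events)  # ['fenced node2', 'service started']
```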
Application Monitoring
All HA clusters monitor node state
Most monitor key cluster resources - network, disk
Many monitor application health:
Process existence
Application check scripts
HTTP GET on web server
Record retrieval on database
Filesystem directory listing
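The first two checks above can be sketched with stdlib primitives. This assumes a POSIX system; the check commands are placeholders for whatever a real product's check scripts run:

```python
import os
import subprocess
import sys

def process_alive(pid):
    """Process-existence check: signal 0 probes a PID without
    actually delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists, but belongs to another user
    return True

def check_script_healthy(cmd):
    """Application check script: healthy iff the command exits 0.
    A real script might do an HTTP GET against the web server, a
    test query against the database, or a directory listing."""
    return subprocess.run(cmd, capture_output=True).returncode == 0

print(process_alive(os.getpid()))                            # True
print(check_script_healthy([sys.executable, "-c", "pass"]))  # True
```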
Failover Times
Don't get too hung up on this
Remember that data integrity is paramount
Quoted failover times only include cluster overhead; they don't include application recovery:
Application startup time
Filesystem consistency checks
Database recovery - transaction replay
Example: product literature cites a 5 second failover time
Can be several minutes for database recovery (size & activity dependent)
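The arithmetic behind that caveat, as a one-line sketch (the 3-minute replay figure is illustrative):

```python
def observed_failover_time(cluster_overhead_s, app_recovery_s):
    """What users actually experience: cluster detection and
    switchover overhead plus application-level recovery."""
    return cluster_overhead_s + app_recovery_s

# The literature's "5 second failover" plus a 3-minute
# database transaction replay:
print(observed_failover_time(5, 180))  # 185 seconds
```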
Open Source vs Proprietary, Project vs Product
Open source facilitates self-support & customization
Support is a key determinant
Products are generally well tested
Some products are also open source
If you care enough about high availability & solution stacks, you're likely to go the product route
Heterogeneous HA Products
Proprietary offerings that run on Linux, W2K, UNIX
Unifies user training
May compromise flexibility, adaptability or data integrity (ouch!)
Some are Linux products with GUIs that run on other platforms
Virtually none allow heterogeneous platforms within the same cluster
Cost Factors
Beware of hidden charges:
Product base fee
Application specific charges (Oracle, DB2, NFS, etc)
Support
Some only come with bundled service offerings
Hardware requirements
Proprietary UNIX offerings typically cost several times more
Vendor Evaluation
Company vision - do their cluster offerings complement or distract? Futures roadmap.
Financial stability
Ability to impact the marketplace
Responsiveness - ability to provide ongoing feature enhancements
Proprietary vs open source
Product integration - fit with distribution, kernel patches, compatibility & support implications
New Linux technology vs large monolithic legacy ports
How long it's been on the market
Open Source Projects
FailSafe - from SGI & SuSE
Optional data integrity provisions (power switch)
Supports 16 nodes
Good set of application kits
Red Hat Cluster Manager
Also offered as a product
Described later in presentation
HA Cluster Product Comparisons
The ground rules:
Trying to remain objective
Highlight product strengths
Listed in alphabetical order
Based on web site content as of 10/2002
HP - MC/Serviceguard
Proprietary - Ported from HP/UX
Only supported on HP hardware
Dynamic online addition/removal of members
Worldwide support services
Quorum voting membership
Up to 8 nodes using FibreChannel storage, 2 nodes using SCSI
Compaq Alpha line targeted at HPC clusters
Legato - Availability Manager
Proprietary
Heterogeneous (Linux, W2K, Solaris, HP-UX)
Strong data centric services
Well integrated with SAN environments
Replication
Storage management, volume management, backup
Application monitoring
Extensive set of application specific modules
PolyServe - Application Manager
Proprietary
Application monitoring
Up to 16 nodes
Multiple platforms - Linux, W2K, Solaris
Doesn't require shared storage
Dynamic member addition/removal
Centralized management
PolyServe - Matrix Server
Tailored for Oracle 9i Real Application Clusters
Concurrent read + write access to data on shared storage (SAN)
Cluster filesystem with lock manager + distributed cache
Allows incremental growth by adding servers + storage
Proprietary
Red Hat - Cluster Manager
Bundled with RHL Advanced Server 2.1
Both open source & product
Data integrity provisions
Power switches (optional)
Watchdog timer software
Application monitoring
Heterogeneous fileserving via NFS + Samba
Web monitoring GUI
Also integrated Piranha load balancing cluster
Steeleye - LifeKeeper
Proprietary - UNIX port
Multi-platform - Linux, W2K
Wide set of application kits (separately purchased)
Established OEM relationships
Data integrity provisions - via SCSI reservations, requiring kernel patches
Application monitoring
IBM
Focusing on HPC
Rackmounted Intel servers
Custom solutions
(older) xCAT software for management, parallel operations, and installation
(newer) Cluster Systems Mgt (CSM) for Linux
Remote monitoring, resets, BIOS console
Parallel shell
Requires IBM hardware for embedded service processor
High Availability via partnering
Veritas Cluster Server
Recent Linux port
16 nodes, wide range of supported apps
Also runs on Windows, AIX, UNIX, Solaris
Integrates with their storage offerings (volume management, backup, data replication)
Proprietary
Other Vendors
Dell
Strategic partnering for HA software
Penguin Computing
HPC offering via partnership with Scyld Beowulf
Consolidated Solutions
Egenera
BladeFrame hardware, backplane eliminates cabling
Management software, HA, provisioning
Linux NetworX
Turnkey solution, preintegrated hardware + management tools
Custom hardware, dense racks
Summary
Know what category of cluster is right for you
Be knowledgeable of required cluster features
Weigh your cost criteria
Choose a vendor you can trust to safeguard your corporate assets
Be wary of marketing collateral