HA-OSCAR: unleashing HA Beowulf HA-OSCAR: High ... · – The first known field-grade HA Beowulf...
Innovation and information technology
HA-OSCAR: unleashing HA Beowulf
HA-OSCAR: High Availability Open Source Cluster Application Resources
Team:
• Chokchai (Box) Leangsuksun, Louisiana Tech University
• Stephen L. Scott, Oak Ridge National Laboratory, DOE
• Ibrahim Haddad, Open System Lab, Ericsson
• Richard Libby, Intel Corporation
• Goals:
  – COTS-based HPC solution towards non-stop services
  – Production-quality Linux clustering
  – Ease of build, operation, and maintenance
HA-OSCAR Goals
• Aims to drive downtime toward zero via infrastructure R&D
  – Unplanned downtime
  – Planned downtime
• COTS-based HPC Beowulf cluster
  – Open source
  – Production quality
  – Solve HA issues with ease
  – Fault-tolerant services (including compute nodes)
What is OSCAR?
• Framework for cluster installation, configuration, and management
• Commonly used cluster tools
• Wizard-based cluster software installation
  – Operating system
  – Cluster environment
    • Administration
    • Operation
  – Automatically configures cluster components
• Increases consistency among cluster builds
• Reduces time to build / install a cluster
• Reduces need for expertise
[Figure: OSCAR installation wizard, Step 1 (Start…) through Step 8 (Done!)]
Beowulf Cluster
HA-OSCAR Beowulf
Self-healing Schemes
Adaptive recovery state diagram
[Figure: state diagram. Labels recovered from the diagram:]
• working
• failure
• Detect
• Alert
• previous state, # counter, recovery
• threshold reached after # retry
• Failover: switch over & take control at the standby
• After the primary node is repaired, optional fallback
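The labels above suggest a retry-then-failover loop. The following is a minimal sketch of one possible reading of the diagram, not the HA-OSCAR implementation. The class name `RecoveryMonitor`, the `try_recover` callback, and the default threshold of 3 are all assumptions made for illustration; the slide gives no threshold value.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class State(Enum):
    WORKING = auto()
    ALERT = auto()
    FAILOVER = auto()


@dataclass
class RecoveryMonitor:
    """Hypothetical model of the retry-then-failover loop named on the slide."""

    try_recover: Callable[[], bool]  # assumed local recovery action; True on success
    retry_threshold: int = 3         # assumed value; the slide gives no number
    state: State = State.WORKING
    retries: int = 0                 # the "# counter" label

    def on_failure_detected(self) -> State:
        """Handle one detected failure while the primary is in control."""
        if self.state is State.FAILOVER:
            return self.state  # the standby already has control
        self.state = State.ALERT
        while self.retries < self.retry_threshold:
            self.retries += 1
            if self.try_recover():
                # Recovery succeeded: return to the previous state, reset the counter.
                self.state = State.WORKING
                self.retries = 0
                return self.state
        # Threshold reached after the allowed retries: switch over to the standby.
        self.state = State.FAILOVER
        return self.state

    def on_primary_repaired(self, fall_back: bool = True) -> State:
        """Optional fallback once the primary node has been repaired."""
        if self.state is State.FAILOVER and fall_back:
            self.state = State.WORKING
            self.retries = 0
        return self.state
```

For example, `RecoveryMonitor(try_recover=lambda: False).on_failure_detected()` ends in `State.FAILOVER` after three failed recovery attempts, and a later `on_primary_repaired()` returns the model to `State.WORKING`.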
Network monitoring (sample)
• HA-OSCAR 1.0 Beta release (March 2004)
  – Active / hot-standby model for head node
  – The first known field-grade HA Beowulf cluster release
  – Self-configuration multi-head Beowulf system
  – HA and HPC clustering techniques to enable critical HPC infrastructure
  – Self-healing with 3-5 sec automatic failover time
  – 1-1.5 hours to self-build failover head nodes w/o preloaded OS
  – Optional Image Server for disaster recovery
  – Supports existing HPC applications (e.g. MPI) without any modification
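To put the 3-5 sec failover time in context, the arithmetic below counts how many failovers fit inside a yearly downtime budget. This is a rough sketch only: the 99.999% availability target is an assumption (the slides state no target), and it treats the failover time as the only downtime per incident. The function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year


def downtime_budget_s(availability: float) -> float:
    """Allowed downtime per year, in seconds, for a given availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def failovers_within_budget(availability: float, failover_s: float) -> int:
    """Whole number of failovers of the given length that fit in the budget."""
    return int(downtime_budget_s(availability) // failover_s)


# Assumed target of 99.999% ("five nines"); failover times from the slide.
budget = downtime_budget_s(0.99999)               # about 315 seconds per year
slow = failovers_within_budget(0.99999, 5.0)      # 5 sec failovers
fast = failovers_within_budget(0.99999, 3.0)      # 3 sec failovers
```

Under these assumptions, the budget is about 315 seconds per year, which covers 63 failovers at 5 sec each or 105 at 3 sec each.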
• How to build HA-OSCAR
  – Can retrofit an existing Linux Beowulf
  – Or start with the OSCAR installation tool
  – HA-OSCAR GUI-based installation tool
HA-OSCAR installation
• Adopts the same ease of build and operation as the OSCAR concept
• ~30 min installation
• Initial HA build takes almost the same time as a disaster recovery (provided you are prepared!)
Step 1
Step 2: create head image
Step 3: clone image
Step 4: config standby
Step 5: web admin to add/config more services
1. HA-OSCAR Installation Wizard
2. Fetch / Clone Server Image
3. Standby Server Initial Network Configuration
4. Standby Server MAC Address Configuration
4+. After MAC Address Collection
Add / Configure more services
Architecture and Experiment
• 2 Head Nodes:
  • dual Xeon 2.4 GHz
  • 1 GB RAM
  • 40 GB Disk
  • 2 NICs
• 4 Compute Nodes:
  • dual Xeon 2.4 GHz
  • 512 MB RAM
  • 40 GB Disk
  • 1 NIC
• 1 Switch 10/100 Mbps
Monitoring overheads
• 0.9% CPU usage at each monitoring interval
[Figure: Comparison of network usage for different HA-OSCAR polling intervals. X-axis: HA-OSCAR Mon polling interval (s), at 1, 2, 5, 10, 15, 20, 30, 60. Y-axis: HA-OSCAR network load in packets/min, measured by tcptrace, scale 0 to 300.]
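The measured values from the figure are not recoverable here, but the general trade-off between polling interval and network load can be sketched with a simple model. This is an illustration under stated assumptions, not the measured HA-OSCAR behavior: it assumes every poll costs a fixed number of packets, and `PACKETS_PER_POLL = 4` is a hypothetical value. Only the interval list comes from the figure's x-axis.

```python
from typing import List, Tuple

INTERVALS_S = [1, 2, 5, 10, 15, 20, 30, 60]  # polling intervals from the x-axis
PACKETS_PER_POLL = 4  # hypothetical; the slide does not give a per-poll cost


def packets_per_minute(polling_interval_s: float, packets_per_poll: int) -> float:
    """Network load if each poll sends a fixed number of packets."""
    if polling_interval_s <= 0:
        raise ValueError("polling interval must be positive")
    return packets_per_poll * 60.0 / polling_interval_s


def load_table(packets_per_poll: int = PACKETS_PER_POLL) -> List[Tuple[int, float]]:
    """(interval, packets/min) pairs for every interval on the x-axis."""
    return [(s, packets_per_minute(s, packets_per_poll)) for s in INTERVALS_S]
```

In this model the load is inversely proportional to the interval, so shortening the interval from 60 s to 1 s multiplies the traffic by 60, while a longer interval delays failure detection.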
Roadmap
• Grid-aware HA-OSCAR
• Multi-head n+1 active-active
• Hardware abstraction and policy-based recovery management
• Hot-upgrade cluster (OS/CMS)
• Fault-tolerant applications/services and interface framework
• FCAPS management
• Complete carrier grade
• Policy-based access controls (LDAP)
Featured on the front cover of two major Linux magazines, and in various technical papers and research exhibitions.
Web site: http://xcr.cenit.latech.edu/ha-oscar