A Fast Rejuvenation Technique for Server Consolidation with Virtual Machines
Kenichi Kourai, Shigeru Chiba
Tokyo Institute of Technology
Server consolidation with VMs
Server consolidation is widely carried out
• Multiple server machines are integrated onto one physical machine
Recently, it is done using virtual machines (VMs)
• VMs run on a virtual machine monitor (VMM), which multiplexes the hardware resources
[Figure: several VMs running on a VMM, which runs on the hardware]
Software aging of VMMs
Software aging of a VMM is critical
Software aging is the phenomenon in which the software state degrades with time
• E.g., exhaustion of system resources
Software aging of a VMM affects all the VMs on it
• E.g., performance degradation
[Figure: several VMs running on an aging VMM]
Software rejuvenation of VMMs
Preventive maintenance
• Performed before software aging of a VMM affects its VMs
• Occasionally stops a VMM, cleans its internal state, and restarts it
Typical example: rebooting a VMM
• Cleans the internal state automatically and completely
• The easiest way
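The slides do not say how the moment to rejuvenate is chosen. As an illustration only, the sketch below assumes a simple trend-based policy: fit a straight line to samples of an aging indicator (here a hypothetical free-memory metric of the VMM), extrapolate when it will cross a threshold, and trigger rejuvenation once that point is closer than a chosen lead time. `Sample`, `hours_until_exhaustion`, and `should_rejuvenate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    t_hours: float  # hours since the last VMM reboot
    free_mb: float  # hypothetical aging indicator: free memory left in the VMM


def hours_until_exhaustion(samples: Sequence[Sample], threshold_mb: float) -> Optional[float]:
    """Fit a least-squares line and extrapolate it to the threshold.

    Returns None when there is no downward trend (no aging detected).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(s.t_hours for s in samples) / n
    mean_f = sum(s.free_mb for s in samples) / n
    var_t = sum((s.t_hours - mean_t) ** 2 for s in samples)
    if var_t == 0:
        raise ValueError("samples need distinct timestamps")
    cov = sum((s.t_hours - mean_t) * (s.free_mb - mean_f) for s in samples)
    slope = cov / var_t
    if slope >= 0:
        return None
    intercept = mean_f - slope * mean_t
    t_cross = (threshold_mb - intercept) / slope  # when the line reaches the threshold
    return max(0.0, t_cross - samples[-1].t_hours)


def should_rejuvenate(samples: Sequence[Sample], threshold_mb: float,
                      lead_time_hours: float) -> bool:
    """Rejuvenate before aging affects the VMs, i.e. before the predicted crossing."""
    remaining = hours_until_exhaustion(samples, threshold_mb)
    return remaining is not None and remaining <= lead_time_hours
```

For samples that lose 10 MB per hour starting from 1000 MB, the line reaches 200 MB at hour 80, so after the sample at hour 4 there are 76 hours left. A linear fit is only the simplest choice; real aging indicators are often noisy or non-linear.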
Drawbacks (1/2): Increasing service downtime
The VMM reboot needs:
Rebooting all the OSes running on the VMs
• The time tends to be long
• It grows with the number of VMs and with the startup time of services
A hardware reset
• The BIOS power-on self test is time-consuming
[Figure: the VMM reboot sequence: OS shutdown, VMM shutdown, hardware reset, VMM boot, OS boot]
Drawbacks (2/2): Performance degradation
The file cache is lost by the OS reboot
OSes cannot restore their performance until the file cache is refilled
• They strongly rely on the file cache to speed up file accesses
Refilling tends to take a long time
• The file cache size is increasing
• VMs are given large amounts of memory, and free memory is used as the file cache
[Figure: a process accesses the disk through the OS file cache]
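Why losing the file cache hurts can be seen with a toy model. The sketch assumes an LRU file cache measured in blocks (the class and function names are hypothetical, not taken from any OS) and counts how many reads must go to disk for the same workload with a warm cache, as preserved when the OS is not rebooted, versus the empty cache left by an OS reboot.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


class LRUFileCache:
    """Toy file cache: holds up to `capacity` blocks, evicting the least recently used."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks: "OrderedDict[Hashable, None]" = OrderedDict()
        self.disk_reads = 0

    def read(self, block: Hashable) -> None:
        if block in self.blocks:
            self.blocks.move_to_end(block)  # hit: mark as most recently used
            return
        self.disk_reads += 1  # miss: the block must be read from disk
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


def disk_reads_for(cache: LRUFileCache, workload: Iterable[Hashable]) -> int:
    """Number of disk reads the workload causes on this cache."""
    before = cache.disk_reads
    for block in workload:
        cache.read(block)
    return cache.disk_reads - before
```

With a working set of 100 blocks read three times, a cache filled before the reboot serves everything from memory, while a cold cache must first fetch all 100 blocks from disk.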
Warm-VM reboot
Fast rejuvenation technique
• Efficiently reboots only the VMM
• The VMM reboot causes no OS reboot
Basic idea
• Suspend all VMs before the VMM reboot
• Resume them after the reboot
Challenge
• How can a VMM efficiently deal with the large memory images of VMs?
On-memory suspend of VMs
Freezes the memory images of VMs in the main memory
• That memory area is just reserved
• The time does not depend on the memory size
Saving the images to a slow disk is inefficient
• On-memory suspend is like the ACPI S3 state (Suspend To RAM) for VMs; traditional suspend is like the ACPI S4 state (suspend to disk)
[Figure: VM memory images frozen in main memory instead of being written to disk]
On-memory resume of VMs
Unfreezes the memory images preserved in the main memory
• They are reused directly as the memory of VMs
• No need to read them from a slow disk
The file cache of the OSes is also restored
• No performance degradation
[Figure: frozen memory images unfrozen in place; the disk is not used]
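The freeze/unfreeze idea can be modeled in a few lines. This sketch assumes each VM is described by a hypothetical P2M list (guest page number to machine frame number) and that the VMM records which machine frames each VM owns. Suspending keeps only the mapping and leaves the frames reserved; resuming hands the same frames back, so page contents, including the guest's file cache, are never copied or read from disk. All names are illustrative, not Xen's.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Machine:
    frames: List[str]  # contents of each machine frame (stand-in for real pages)
    owner: Dict[int, int] = field(default_factory=dict)  # frame number -> owning VM id
    frozen: Dict[int, List[int]] = field(default_factory=dict)  # VM id -> saved P2M list


@dataclass
class VM:
    vm_id: int
    p2m: List[int]  # guest page number -> machine frame number


def on_memory_suspend(machine: Machine, vm: VM) -> None:
    # Only the mapping (one entry per page) is saved; page contents are not copied.
    # The frames stay recorded in machine.owner, i.e. reserved for this VM.
    machine.frozen[vm.vm_id] = list(vm.p2m)


def on_memory_resume(machine: Machine, vm_id: int) -> VM:
    p2m = machine.frozen[vm_id]
    for mfn in p2m:
        if machine.owner.get(mfn) != vm_id:
            raise RuntimeError(f"frame {mfn} was not preserved for VM {vm_id}")
    del machine.frozen[vm_id]
    return VM(vm_id, p2m)  # the same frames become the VM's memory again


def guest_read(machine: Machine, vm: VM, page: int) -> str:
    return machine.frames[vm.p2m[page]]
```

After a suspend/resume cycle, a guest read of a cached page returns the same contents it had before, without any disk access; if a preserved frame was handed to someone else in between, resume refuses to proceed.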
Quick reload of VMMs
Directly boots a new VMM without a hardware reset
The memory images of VMs are preserved through the VMM reboot
• Software can keep track of them
• A hardware reset does not guarantee this
The VMM is rebooted quickly
• No overhead due to a hardware reset
[Figure: the old VMM preloads the new VMM while the VM memory images stay in main memory]
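During a quick reload the new VMM must not allocate the frames that still hold frozen VM memory. A minimal sketch, assuming the old VMM passes the new one a list of preserved frame ranges (a hypothetical hand-off, not the actual RootHammer or kexec interface), subtracts those ranges from the machine's frames to build the new VMM's free list:

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end) range of frame numbers


def free_ranges(total_frames: int, preserved: List[Range]) -> List[Range]:
    """Frame ranges the new VMM may allocate, skipping preserved VM memory.

    Preserved ranges may be unsorted or overlap; they are assumed to lie
    within [0, total_frames).
    """
    free: List[Range] = []
    cursor = 0
    for start, end in sorted(preserved):
        if start > cursor:
            free.append((cursor, start))  # the gap before this preserved range is free
        cursor = max(cursor, end)
    if cursor < total_frames:
        free.append((cursor, total_frames))
    return free
```

For example, with 100 frames and VM images in frames 10 to 19 and 40 to 59, the new VMM may use frames 0 to 9, 20 to 39, and 60 to 99.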
Comparison with other methods
Cold-VM reboot
• Needs the OS reboot
Saved-VM reboot
• A naive implementation of the warm-VM reboot: VMs are saved to a disk

Reboot method                   Cold-VM   Saved-VM   Warm-VM
Depends on # of VMs             Yes       No         No
Depends on services             Yes       No         No
Depends on memory size of VMs   No        Yes        No
Performance degradation         Yes       No         No
Model for availability
Must consider the software rejuvenation of both the VMM and the OSes
Warm-VM reboot
• The OS rejuvenation is independent of the VMM rejuvenation
Cold-VM reboot
• The OS rejuvenation is affected by the VMM rejuvenation
• The number of OS rejuvenations increases
[Figure: timelines of OS rejuvenation and VMM rejuvenation under the warm-VM and cold-VM reboot]
RootHammer
We have implemented the warm-VM reboot in Xen 3.0.0
On-memory suspend/resume
• Based on Xen's suspend/resume
• Manages the mapping from the VM memory to the physical memory
Quick reload
• Based on the kexec mechanism in Linux
• Kexec for a VMM is included in the latest Xen, but it is not designed for reusing the memory images
[Figure: mapping from VM memory to physical memory]
Experiments
Examine whether the warm-VM reboot reduces downtime and performance degradation
Comparison
• Cold-VM reboot with the OS reboot
• Saved-VM reboot using Xen's suspend/resume
[Figure: experimental setup: a server (VMM running Linux VMs; two dual-core Opteron processors, 12 GB of SDRAM, a 15,000 rpm SCSI disk) connected to a Linux client via gigabit Ethernet]
Performance of on-memory suspend/resume
Suspend/resume of one VM with 11 GB of memory
• Ours: 1 sec; Xen's: 280 sec
• Depends on the memory size
Suspend/resume of 11 VMs
• Ours: 4 sec; OS reboot: 58 sec
• Depends on the # of VMs
Effect of quick reload
Time for rebooting a VMM with no VMs
Warm-VM reboot
• 11 sec
• The time of the quick reload is negligible
Cold-VM reboot
• 59 sec
• The time due to a hardware reset is 48 sec
[Figure: stacked bar chart (0 to 70 sec) for the warm-VM and cold-VM reboot, split into VMM shutdown, hardware reset or quick reload, and VMM boot]
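The two measurements are consistent with each other: subtracting the hardware reset from the cold-VM reboot leaves the time of the warm-VM reboot, as expected if the quick reload itself costs almost nothing. Here T_cold, T_reset, and T_warm are our labels for the three measured times.

```latex
T_{\text{cold}} - T_{\text{reset}} = 59\,\mathrm{s} - 48\,\mathrm{s} = 11\,\mathrm{s} = T_{\text{warm}}
```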
Downtime of services
Warm-VM reboot
• Always the same: 42 sec
Saved-VM reboot
• Depends on the # of VMs: 429 sec (11 VMs)
Cold-VM reboot
• Affected by the service type: 157 sec (sshd), 241 sec (JBoss)
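As a worked example, the relative downtime reduction of the warm-VM reboot over the cold-VM reboot follows directly from these numbers. The JBoss case gives roughly 83%, consistent with the maximum reduction stated in the conclusion.

```latex
\frac{241 - 42}{241} = \frac{199}{241} \approx 0.826 \quad \text{(JBoss)}, \qquad
\frac{157 - 42}{157} = \frac{115}{157} \approx 0.732 \quad \text{(sshd)}
```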
Availability of JBoss
The warm-VM reboot achieves four 9s
Assumptions
• OS rejuvenation every week, taking 34 sec
• VMM rejuvenation every 4 weeks, 0.5 week after the last OS rejuvenation
Results
• Warm-VM reboot: 99.993%
• Cold-VM reboot: 99.985%
• Saved-VM reboot: 99.977%
[Figure: timeline of weekly OS rejuvenation, with VMM rejuvenation 0.5 week after an OS rejuvenation]
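The warm-VM and saved-VM figures can be roughly reproduced under a simplification that is ours rather than the slides': add up the downtime of every rejuvenation in one 4-week VMM rejuvenation cycle (four weekly OS rejuvenations of 34 sec plus one VMM rejuvenation) and divide by the cycle length. The VMM rejuvenation downtimes (42 sec warm-VM, 429 sec saved-VM) come from the downtime slide; the function name is hypothetical.

```python
WEEK_S = 7 * 24 * 3600  # seconds in a week


def availability(vmm_downtime_s: float,
                 os_downtime_s: float = 34.0,
                 os_period_weeks: int = 1,
                 vmm_period_weeks: int = 4) -> float:
    """Fraction of time the service is up over one VMM rejuvenation cycle.

    Simplification: downtimes just add up, and the VMM rejuvenation does not
    change the OS rejuvenation schedule (neither the warm-VM nor the saved-VM
    reboot reboots the OSes).
    """
    cycle_s = vmm_period_weeks * WEEK_S
    n_os = vmm_period_weeks // os_period_weeks  # OS rejuvenations per cycle
    total_down_s = n_os * os_downtime_s + vmm_downtime_s
    return 1.0 - total_down_s / cycle_s
```

`availability(42)` gives about 99.993% and `availability(429)` about 99.977%. The cold-VM reboot is deliberately left out: there the VMM rejuvenation also reboots the OSes and changes the OS rejuvenation schedule, which this naive sum does not model.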
Performance degradation
Throughput of the Apache web server before and after the VMM reboot
Warm-VM reboot
• No degradation
Cold-VM reboot
• Degraded by 69%
Software rejuvenation in a cluster environment
Clustering achieves zero downtime
• Multiple hosts can provide the same service
Consider the total throughput of all hosts in a cluster (m: # of hosts, p: throughput of one host)
Warm-VM reboot
• (m-1)p during the reboot
Cold-VM reboot
• (m-1)p during the reboot
• (m-0.69)p for a while after the reboot
[Figure: total throughput over time t, between mp and (m-1)p, annotated with 42 sec (warm-VM reboot) and 241 sec (cold-VM reboot)]
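These throughput curves can be written as a piecewise function of the time t since one host starts rejuvenating; the area between mp and the curve is the work lost. The sketch assumes this simple model. The slide does not quantify how long the degraded period after a cold-VM reboot lasts ("for a while"), so `degraded_s` is a free, hypothetical parameter, as is the per-host throughput used in the example.

```python
def total_throughput(t: float, m: int, p: float, outage_s: float,
                     degraded_s: float = 0.0, degradation: float = 0.0) -> float:
    """Cluster throughput t seconds after one of the m hosts starts rejuvenating."""
    if t < outage_s:
        return (m - 1) * p  # the rebooting host serves nothing
    if t < outage_s + degraded_s:
        return (m - degradation) * p  # that host is back but slowed by `degradation`
    return m * p


def lost_work(p: float, outage_s: float,
              degraded_s: float = 0.0, degradation: float = 0.0) -> float:
    """Area between m*p and total_throughput over time; m cancels out."""
    return p * (outage_s + degradation * degraded_s)
```

With a hypothetical p of 1000 requests/sec, the warm-VM reboot loses 42,000 requests, while the cold-VM reboot loses 241,000 during the outage plus 690 for every second of degraded operation afterwards (655,000 in total if that lasts 600 sec).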
Comparison with VM migration in a cluster environment
VM migration achieves nearly zero downtime
• VMs are moved to another host, e.g., Xen's live migration or VMware's VMotion
Total throughput
Normal run
• (m-1)p, since one host is reserved for migration
Live migration
• (m-1.12)p
[Figure: total throughput over time t, between mp and (m-1)p, annotated with 42 sec (warm-VM reboot) and 17 min (live migration)]
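The two approaches trade a short outage of one host against a permanently reserved host. The sketch compares the capacity lost over one period between VMM rejuvenations, assuming the slide's numbers apply as follows: the warm-VM reboot takes one host out for 42 sec, while with migration the total is (m-1)p normally and (m-1.12)p during a 17-min migration. The period length and the function names are hypothetical.

```python
def lost_with_warm_reboot(p: float, period_s: float, reboot_s: float = 42.0) -> float:
    """All m hosts serve, except one host during its warm-VM reboot."""
    if reboot_s > period_s:
        raise ValueError("the reboot must fit in the period")
    return p * reboot_s


def lost_with_live_migration(p: float, period_s: float,
                             migration_s: float = 17 * 60, extra: float = 0.12) -> float:
    """One host is always reserved (total (m-1)p); during migration the total
    drops further to (m-1.12)p."""
    if migration_s > period_s:
        raise ValueError("the migration must fit in the period")
    return p * period_s + extra * p * migration_s
```

Over a 4-week period the warm-VM reboot loses 42 sec of one host's work, while the reserved host alone accounts for the whole period. Under these assumptions the reserved host dominates the cost of the migration approach.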
Related work
Microreboot [Candea et al. '04]
• Reboots only a subset of subcomponents
• The warm-VM reboot instead enables rebooting only a parent component (the VMM for VMs)
Checkpointing/restart [Randell '75]
• Saves/restores OS processes
• Similar to suspend/resume of VMs
Optimizations of suspend/resume
• Incremental suspend, compression of memory images
Conclusion
We proposed the warm-VM reboot
• On-memory suspend/resume: freezes/unfreezes the memory images of VMs
• Quick reload: preserves the memory images through the VMM reboot
It achieved fast rejuvenation
• Downtime reduced by up to 83%
• No performance degradation