Memory-efficient Virtual Machine High Availability
Karen Kai-Yuan Hou
Prof. Kang G. Shin
University of Michigan
Mustafa Uysal (VMware)
Arif Merchant (HP Labs)
Sharad Singhal (HP Labs)
2
Protect VM from Host Failures
• Set up a backup by replicating the primary VM
• The backup takes over execution promptly if the primary fails
• High memory cost: e.g., to protect a 1 GB VM, an additional 1 GB of memory is reserved just to hold the backup
[Figure: a primary host (hypervisor, primary VM running App 1 and App 2) and a backup host (hypervisor, backup VM running App 1 and App 2); labeled "Physical Host Failure".]
3
Use a Shared Storage
• “Maintain” the backup VM in storage instead of RAM
• Improve resource and energy efficiency. Recover anywhere.
[Figure: Hosts 1 through n, each with a hypervisor and running other primary (active) VMs, attached to shared storage; the primary VM (App 1, App 2) runs on Host 1 and its backup VM (App 1, App 2) is kept in shared storage.]
4
Protection: Tracking Primary VM State
• Take checkpoints of the primary VM
  – Incremental, periodic, copy-on-write checkpoints
[Figure: the primary VM's memory space (App 1, App 2) and the VM fail-over image.]
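The incremental, periodic checkpointing named on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions, not the prototype's Xen code: `ToyVM`, `take_checkpoint`, and the dictionary used as the fail-over image are hypothetical names invented for the example. Pages are immutable `bytes` objects, so the snapshot can hold references while the VM keeps writing, which loosely mimics copy-on-write.

```python
PAGE_SIZE = 4096  # assumed page size for the toy model


class ToyVM:
    """Hypothetical stand-in for a VM: a list of pages plus a dirty-page set."""

    def __init__(self, num_pages):
        self.pages = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        # Every page counts as dirty until the first checkpoint has run.
        self.dirty = set(range(num_pages))

    def write(self, page_no, data):
        # Replace the page object instead of mutating it in place, so an
        # earlier snapshot that references the old page stays consistent.
        self.pages[page_no] = data.ljust(PAGE_SIZE, b"\0")[:PAGE_SIZE]
        self.dirty.add(page_no)


def take_checkpoint(vm, image):
    """Copy only the pages dirtied since the last checkpoint into the image."""
    snapshot = {n: vm.pages[n] for n in vm.dirty}  # the "pause" window
    vm.dirty.clear()                               # VM could resume here
    image.update(snapshot)                         # commit to the image
    return len(snapshot)


vm = ToyVM(num_pages=4)
image = {}                               # hypothetical fail-over image
first = take_checkpoint(vm, image)       # full copy: all 4 pages
vm.write(2, b"hello")
second = take_checkpoint(vm, image)      # incremental: only page 2
```

In a periodic scheme, `take_checkpoint` would be called at a fixed interval; the cost of each call then depends on how many pages were dirtied in that interval rather than on the VM's total memory size.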
5
Fail-over: Bringing Up Backup VM
• Slim VM Restore
  – Load only the necessary information and switch on the backup VM quickly
  – Fetch pages on demand as the backup VM executes
[Figure: the VM fail-over image and the memory space of the restored backup VM (App 1, App 2).]
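The idea of switching on the backup VM with only a small resident set and fetching the rest on demand can be sketched as follows. This is an illustrative model, not the authors' implementation: `LazyBackupVM`, `eager_pages`, and the dictionary standing in for the fail-over image are all assumptions made for the example.

```python
class LazyBackupVM:
    """Hypothetical backup VM that loads pages from the image on first access."""

    def __init__(self, image, eager_pages=()):
        self.image = image  # stand-in for the fail-over image in storage
        # Only the pages assumed necessary for start-up are loaded up front.
        self.resident = {n: image[n] for n in eager_pages}
        self.faults = 0     # number of on-demand fetches so far

    def read(self, page_no):
        if page_no not in self.resident:
            # Page is not in memory yet: fetch it from the image once.
            self.faults += 1
            self.resident[page_no] = self.image[page_no]
        return self.resident[page_no]


# Toy fail-over image with 8 small "pages".
image = {n: bytes([n]) * 4 for n in range(8)}
backup = LazyBackupVM(image, eager_pages=[0])

backup.read(0)  # already resident, no fetch
backup.read(5)  # first touch, fetched from the image
backup.read(5)  # now resident, no second fetch
```

Start-up cost in this model depends on the size of `eager_pages`, while the running cost shows up as one storage read per first-touched page, which is the small, random read pattern the next slide associates with the restore side.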
6
Improving I/O Efficiency with SSDs
• Small, random I/Os are more efficient on SSDs
[Figure: the primary side updates the VM fail-over image continuously (small, random writes); the restore side fetches from the VM image on demand (small, random reads).]
7
Preliminary Evaluation
• Prototype built on Xen 3.3.2
• Questions
  – How much overhead does continuous checkpointing introduce on the primary VM?
  – How does the shared storage support continuous updating of the fail-over image?
  – How quickly can our system bring up a backup VM?
  – How does the backup VM perform when it executes by fetching pages on demand?
8
Checkpointing Overheads
[Figure: two bar charts, Kernel Compilation and RUBiS, showing overhead (%) for checkpoints taken every 10 s, 5 s, and 2 s; series: HD; HD, CoW; SSD; SSD, CoW.]
9
CoW and SSD Enhancements
• CoW reduces VM pause time for taking checkpoints
• Checkpoints commit faster on an SSD
[Figure: two bar charts for checkpoints taken every 10 s, 5 s, and 2 s: pause time (ms), without CoW vs. with CoW; commit time (sec), HD vs. SSD.]
10
Fail-over Time and Demand Fetching
• Time required to bring up a backup VM
• Overheads of fetching VM pages on-demand
[Figure: two bar charts for Kernel Compilation, RUBiS, and Video Transcoding, each comparing HD and SSD: fail-over time (sec) and overhead (%).]
11
Interesting Observations: Page Fetching Behavior
• How a VM uses (demand fetches) its pages while compiling a kernel:
12
Interesting Observations: Page Fetching Behavior
• What actually happens on disk (recorded by blktrace):
13
Conclusions
[Figure labels: 35; 113 ms, 10.1 ms, 10.1 ms; 20 s, 20 s, 20 s; 1.47 s; save, restore; 35 s.]
14
• Thank you!