Memory-efficient Virtual Machine High Availability


Karen Kai-Yuan Hou, Prof. Kang G. Shin

University of Michigan

Mustafa Uysal (VMware), Arif Merchant (HP Labs), Sharad Singhal (HP Labs)


Protect VM from Host Failures

• Set up a backup by replicating the primary VM
• The backup takes over execution promptly if the primary fails

• High memory cost: e.g., to protect a 1 GB VM, an additional 1 GB of memory is reserved just to hold the backup.

[Figure: primary VM (App 1, App 2) on the primary host and backup VM (App 1, App 2) on the backup host, each on its own hypervisor; the primary host suffers a physical host failure.]


Use a Shared Storage

• “Maintain” the backup VM in storage instead of RAM
• Improve resource and energy efficiency; recover anywhere

[Figure: Hosts 1 to n, each running a hypervisor; Host 1 runs the primary VM (App 1, App 2), the other hosts run other primary (active) VMs, and the backup VM is kept in the shared storage attached to all hosts.]


Protection: Tracking Primary VM State

• Take checkpoints of the primary VM
– Incremental, periodic, copy-on-write checkpoints

[Figure: the primary VM's memory space (App 1, App 2) is checkpointed into the VM fail-over image.]
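To make "incremental, periodic, copy-on-write" concrete, here is a minimal Python sketch of one way such checkpoints can work. It is a toy simulation under stated assumptions, not the prototype's Xen code: guest memory is a list of page-sized `bytes`, the fail-over image is a dict, and the names `SimulatedVM`, `begin_checkpoint`, and `commit_checkpoint` are hypothetical. The pause only swaps out the set of dirty pages; the data is written afterwards, and any page the guest overwrites before it is committed is preserved first (copy-on-write), so the image still matches the state at the pause.

```python
PAGE_SIZE = 4096  # assumed guest page size


class SimulatedVM:
    """Toy guest whose memory is a list of page-sized bytes objects."""

    def __init__(self, num_pages):
        self.pages = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        # Every page starts dirty, so the first checkpoint is a full one.
        self.dirty = set(range(num_pages))
        self.epoch = set()   # pages covered by the checkpoint being committed
        self.preserved = {}  # page -> old contents saved by copy-on-write

    def write(self, page_no, data):
        # Copy-on-write: the first write to a page that the in-flight checkpoint
        # still has to commit saves the old contents before overwriting them.
        if page_no in self.epoch and page_no not in self.preserved:
            self.preserved[page_no] = self.pages[page_no]
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def begin_checkpoint(self):
        # The short pause: swap out the dirty set; no page data is copied here.
        self.epoch, self.dirty = self.dirty, set()

    def commit_checkpoint(self, image):
        # Runs after the guest resumes: write only this epoch's pages (incremental),
        # using the preserved copy for any page the guest has overwritten since.
        for page_no in sorted(self.epoch):
            image[page_no] = self.preserved.pop(page_no, self.pages[page_no])
        self.epoch = set()


vm = SimulatedVM(num_pages=4)
image = {}                      # stand-in for the VM fail-over image
vm.begin_checkpoint()           # epoch 1: full checkpoint of pages 0-3
vm.write(0, b"A" * PAGE_SIZE)   # guest keeps running while epoch 1 commits
vm.commit_checkpoint(image)     # image keeps page 0 as it was at the pause
vm.begin_checkpoint()           # epoch 2: only page 0 is dirty
vm.commit_checkpoint(image)
print(sorted(image), image[0][:1])  # prints: [0, 1, 2, 3] b'A'
```

Calling `begin_checkpoint` on a timer would make the checkpoints periodic; the sketch drives it by hand to keep the ordering visible.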


Fail-over: Bringing Up Backup VM

• Slim VM Restore
– Load only the necessary information and switch on the backup VM quickly
– Fetch pages on demand as the backup VM executes

[Figure: the VM fail-over image is loaded into the memory space of the restored backup VM (App 1, App 2).]
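The restore side can be illustrated with a similarly small Python sketch of demand fetching, assuming the fail-over image is a dict from page number to `bytes` and that pages never saved are zero pages. `LazyRestoredVM` and its methods are hypothetical names for illustration only. The VM starts with nothing resident, and each page is fetched from the image the first time the guest touches it.

```python
PAGE_SIZE = 4096  # assumed guest page size


class LazyRestoredVM:
    """Toy backup VM switched on with no memory resident; pages are fetched
    from the fail-over image the first time the guest touches them."""

    def __init__(self, image, num_pages):
        self.image = image        # hypothetical: page number -> bytes on shared storage
        self.num_pages = num_pages
        self.resident = {}        # pages already fetched into local RAM
        self.fetches = 0          # demand fetches served from the image

    def _fault_in(self, page_no):
        if not 0 <= page_no < self.num_pages:
            raise IndexError(page_no)
        if page_no not in self.resident:
            # A page never saved in the image is treated as a zero page.
            self.resident[page_no] = self.image.get(page_no, bytes(PAGE_SIZE))
            self.fetches += 1

    def read(self, page_no):
        self._fault_in(page_no)
        return self.resident[page_no]

    def write(self, page_no, data):
        # Fetch first, then overwrite; this keeps the sketch simple.
        self._fault_in(page_no)
        self.resident[page_no] = data


image = {0: b"A" * PAGE_SIZE, 2: b"C" * PAGE_SIZE}
vm = LazyRestoredVM(image, num_pages=4)  # runs immediately, nothing resident yet
vm.read(2)
vm.read(2)                               # already resident: no second fetch
vm.write(3, b"D" * PAGE_SIZE)
print(vm.fetches, sorted(vm.resident))   # prints: 2 [2, 3]
```

The trade-off this shows: start-up cost no longer depends on memory size, but the first touch of each page pays a fetch from storage while the backup VM runs.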


Improving I/O Efficiency with SSDs

• Small, random I/Os are more efficient on SSDs

Primary side: updates the VM fail-over image continuously (small, random writes).

Restore side: fetches from the VM fail-over image on demand (small, random reads).
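Why both sides produce small, random I/O can be seen with a minimal Python sketch that assumes the fail-over image is a flat file indexed by page number, with page `n` stored at byte offset `n * PAGE_SIZE`. This layout and the helpers `write_page`/`read_page` are assumptions for illustration, not the prototype's on-disk format. Dirty pages are written in place wherever they fall, and demand fetches read single pages at arbitrary offsets.

```python
import tempfile

PAGE_SIZE = 4096  # assumed guest page size


def write_page(f, page_no, data):
    """Primary side: overwrite one page of the image in place (a small, random write)."""
    if len(data) != PAGE_SIZE:
        raise ValueError("expected exactly one page of data")
    f.seek(page_no * PAGE_SIZE)
    f.write(data)


def read_page(f, page_no):
    """Restore side: fetch one page on demand (a small, random read)."""
    f.seek(page_no * PAGE_SIZE)
    # Pages past the end of the file were never written; treat them as zero pages.
    return f.read(PAGE_SIZE).ljust(PAGE_SIZE, b"\0")


with tempfile.TemporaryFile() as img:      # stand-in for the image on an SSD
    for page_no in (5, 1, 6):             # dirty pages arrive in no particular order
        write_page(img, page_no, bytes([page_no]) * PAGE_SIZE)
    img.flush()
    print(read_page(img, 6)[:2], read_page(img, 0)[:2])  # prints: b'\x06\x06' b'\x00\x00'
```

Each call touches one 4 KB block at a scattered offset, which is the access pattern where an SSD avoids the seek cost a hard disk pays.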


Preliminary Evaluation

• Prototype built on Xen 3.3.2
• Questions
– How much overhead does continuous checkpointing introduce on the primary VM?
– How does the shared storage support continuous updating of the fail-over image?
– How quickly can our system bring up a backup VM?
– How does the backup VM perform when it executes by fetching pages on demand?


Checkpointing Overheads

[Figure: checkpointing overhead (%) for Kernel Compilation and RUBiS with checkpoints taken every 10 s, 5 s, and 2 s, under four configurations: HD, HD with CoW, SSD, and SSD with CoW.]
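The slides do not say exactly how "overhead (%)" is computed. A common definition, assumed here, is the relative slowdown versus an unprotected run: extra completion time for a job such as a kernel build, or lost throughput for a service such as RUBiS. The helper `overhead_percent` and the numbers in the example are hypothetical.

```python
def overhead_percent(baseline, with_checkpointing, higher_is_better=False):
    """Relative cost (%) of checkpointing versus an unprotected run.

    Time-like metrics (e.g. build time): lower is better, so overhead is the
    extra time. Rate-like metrics (e.g. requests/s): higher is better, so
    overhead is the lost rate.
    """
    if baseline <= 0 or with_checkpointing <= 0:
        raise ValueError("metrics must be positive")
    if higher_is_better:
        return (baseline - with_checkpointing) / baseline * 100.0
    return (with_checkpointing - baseline) / baseline * 100.0


# Hypothetical numbers, not measurements from the slides:
print(round(overhead_percent(300.0, 330.0), 1))                         # prints: 10.0
print(round(overhead_percent(500.0, 480.0, higher_is_better=True), 1))  # prints: 4.0
```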


CoW and SSD Enhancements

• CoW reduces VM pause time for taking checkpoints

• Checkpoints commit faster on an SSD


[Figure: checkpoint pause time (ms) without and with CoW; checkpoint commit time (sec) on HD and SSD.]


Fail-over Time and Demand Fetching

• Time required to bring up a backup VM

• Overheads of fetching VM pages on-demand

[Figure: fail-over time (sec) and demand-fetching overhead (%) for Kernel Compilation, RUBiS, and Video Transcoding, on HD and SSD.]


Interesting Observations: Page Fetching Behavior

• How a VM uses (demand fetches) its pages while compiling a kernel:


Interesting Observations: Page Fetching Behavior

• What actually happens on disk (recorded by blktrace):


Conclusions

[Figure: save and restore times; recoverable labels: 113 ms, 10.1 ms, 20 s, 1.47 s, 35 s.]


• Thank you!
