kMemvisor: Flexible System Wide Memory Mirroring in Virtual Environments
(Transcript of the presentation slides)
Bin Wang
Zhengwei Qi
Haibing Guan
Haoliang Dong
Wei Sun
Shanghai Key Laboratory of Scalable Computing and Systems
Shanghai Jiao Tong University
Yaozu Dong, Intel China Software Center
Is your memory error-prone?
Today's memories do become error-prone.
• [B. Schroeder et al., SIGMETRICS 09] Memory failures are common in clusters
  - 8% of DIMMs have correctable errors per year
  - 1.29% uncorrectable errors in Google testbed
[Slide graphic: several memory-intensive applications running side by side]
Memory HA in Cloud Computing
• A 1.29% error rate means, at cloud scale: 13,000 failures per year, 1,083 failures per month, 35 failures per day, 1.5 failures per hour
• [Andy A. Hwang et al., ASPLOS 12] Memory errors happen at a significant rate in all four sites, with 2.5% to 5.5% of nodes affected per system
• A Service Level Agreement of 99.95% allows: 4.38 hours downtime per year, 21.56 minutes downtime per month, 5.04 minutes downtime per week
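The failure-rate and SLA arithmetic above can be reproduced with a short script. The fleet size of one million machines is an assumption (not stated on the slide) chosen so that a 1.29% annual rate comes out near the quoted 13,000 failures per year:

```python
# Reproduce the slide's failure-rate and SLA-downtime arithmetic.
# ASSUMPTION: a fleet of ~1,000,000 machines, which makes a 1.29%
# annual uncorrectable-error rate yield roughly 13,000 failures/year.

FLEET = 1_000_000
ERROR_RATE = 0.0129      # uncorrectable errors per machine-year
SLA = 0.9995             # 99.95% availability

failures_per_year = FLEET * ERROR_RATE
failures_per_day = failures_per_year / 365

def downtime_minutes(period_hours: float) -> float:
    """Allowed downtime within a period under the SLA."""
    return (1 - SLA) * period_hours * 60

print(round(failures_per_year))                    # ~12,900 per year
print(round(failures_per_day))                     # ~35 per day
print(round(downtime_minutes(365 * 24) / 60, 2))   # 4.38 hours per year
print(round(downtime_minutes(7 * 24), 2))          # 5.04 minutes per week
```

Note that 5.04 minutes is exactly the 99.95% budget over one week, which is why the slide's figure is read here as a per-week number.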
Existing Solutions
• Hardware
  - ECC (HP, IBM, Google, et al.): bit-granularity checking
  - Mirrored Memory (HP, IBM, Google, et al.): expensive, low flexibility
• Software
  - Duo-backup (GFS, Amazon Dynamo): system-level tolerance
  - Checkpoint, hot spare + VM migration/replication: application-specific and high overhead (e.g., Remus [NSDI 08] with 40 checkpoints/sec, 103% overhead)
Design Guideline
• Low cost
• Efficiency & compatibility
  - Arbitrary platform
  - On the fly + hot spare
• Low maintenance
  - Little impact on other resources (e.g., networking utilization)
  - Without migration
• Cloud requirement
  - VM granularity
kMemvisor
• A hypervisor providing system-wide memory mirroring based on hardware virtualization
• Supports VMs with or without the mirror memory feature on the same physical machine
• Supports N-modular redundancy for special mission-critical applications
kMemvisor High-level Architecture
[Figure: three kinds of guests run side by side — an ordinary VM, an app-level HA VM, and a system-wide HA VM, each with its own guest OS and apps — on top of kMemvisor (CPU management, memory management, page table, code translation management), which runs on the hardware (CPU, memory)]
Code is translated as it is loaded from disk: a store such as
  ...MOV addr, data...
becomes
  ...MOV addr, data; MOV mirror_addr, data...
where mva = mirror(nva).
Memory Mirroring
[Figure: one virtual address is mapped twice into physical memory]
1. Create the native PTE
2. Create the mirror PTE
3. Every write is then duplicated:
   mov $2, addr
   mov $2, mirror(addr)
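The duplicated-write idea can be illustrated with a toy model. All names, the page size, and the offset below are illustrative only; kMemvisor does this at the instruction level, not in Python:

```python
# Toy model of mirrored writes: every store lands in the native page and
# in its mirror at a fixed offset. PAGE and OFFSET are made-up values.

PAGE = 4096
OFFSET = 8 * PAGE                 # mirror space starts here (toy value)
memory = bytearray(16 * PAGE)     # flat "physical" memory

def mirror(addr: int) -> int:
    return addr + OFFSET

def store(addr: int, value: int) -> None:
    """The translated form of `mov $value, addr`: write twice."""
    memory[addr] = value          # mov $value, addr
    memory[mirror(addr)] = value  # mov $value, mirror(addr)

store(0x42, 2)
assert memory[0x42] == memory[mirror(0x42)] == 2
```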
Retrieve Memory Failure
[Figure: the native copy is corrupted (X) while the mirror copy still holds the value 2]
1. Memory corruption is detected in the native page
2. Re-create the PTE to point at a fresh physical page
3. Copy the data back from the mirror page
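Recovery can be sketched the same way: when the native copy is corrupted, remap it and refill it from the mirror. This is a toy model with made-up constants, not kMemvisor's actual recovery path:

```python
# Toy recovery: restore a corrupted native page from its mirror copy.
PAGE = 4096
OFFSET = 8 * PAGE
memory = bytearray(16 * PAGE)

def recover(page_addr: int) -> None:
    """Re-create the (toy) mapping and copy the mirror page back."""
    src = page_addr + OFFSET
    memory[page_addr:page_addr + PAGE] = memory[src:src + PAGE]

# Simulate: the mirror holds good data, the native byte is corrupted.
memory[OFFSET + 0x10] = 2     # good mirror copy
memory[0x10] = 0xFF           # corrupted native byte
recover(0)
assert memory[0x10] == 2      # native page restored from the mirror
```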
Create Mirror Page Table
[Figure: the application calls malloc(); the guest OS turns it into mmap() (system call); kMemvisor handles the resulting update_va_mapping() hypercall and (1) creates the native PTE, then (2) creates the mirror PTE, mapping both native and mirror virtual addresses onto physical memory]
Modified Memory Layout
Virtual memory, top to bottom:
  0xFFFFFFFFFFFF  kMemvisor Space
  0xFFFFFC000000  Mirrored User Stack Space
                  Mirrored User Heap Space
  0x800000000000  Native Kernel Space
                  Native User Heap Space
  0x000000000000  Native User Stack Space
Physical memory is divided into high-quality memory, mirrored memory, and native memory; the native space maps to native memory and the mirror space to mirrored memory.
mva = mirror(nva) = nva + offset
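The address split can be written down directly. The concrete offset below is an assumption inferred from the 0x800000000000 boundary in the layout above; the constant in kMemvisor itself may differ:

```python
# Sketch of the native/mirror virtual-address split implied by the
# layout above. ASSUMPTION: the offset equals the 0x800000000000
# boundary between native and mirrored space.

NATIVE_TOP = 0x8000_0000_0000   # end of native space (assumed)
OFFSET = NATIVE_TOP             # mva = nva + offset

def mirror(nva: int) -> int:
    """mva = mirror(nva) = nva + offset"""
    assert 0 <= nva < NATIVE_TOP, "nva must lie in native space"
    return nva + OFFSET

assert mirror(0x7F00_0000) == 0x8000_7F00_0000
```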
Memory Synchronization
Toolchain: Compiler (cc1) → Assembler (as) → Binary Translation → Linker (ld), with modified libraries
(e.g., Sysbench.c → Sysbench.s → M_sysbench.s → Sysbench.o → Sysbench.out)
Native instructions:            Modified instructions:
  movq $4, 144(%rdi)              movq $4, 144(%rdi)
                                  movq $4, offset+144(%rdi)
  call log_text                   call log_text
  addq %rdx, (%rax)               addq %rdx, (%rax)
                                  addq %rdx, offset(%rax)
  pushq %rbp                      pushq %rbp
                                  movq %rbp, offset(%rsp)
.Log_text:                      .Log_text:
                                  pushq %rax
                                  movq 8(%rsp), %rax
                                  movq %rax, (offset+8)(%rsp)
                                  popq %rax
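The per-instruction rewriting in the pipeline above can be mimicked by a toy translator that appends a mirrored store after each explicit memory write. This is a deliberate simplification: real x86 parsing, implicit writes via push/call, flag preservation, and the many other addressing modes kMemvisor handles are all ignored:

```python
import re

# Toy AT&T-syntax translator: duplicate each explicit `mov src, disp(%base)`
# store to the mirror address (disp -> offset+disp). Everything else
# passes through unchanged.

STORE = re.compile(r"^(mov[qlwb]?)\s+(\S+),\s*(-?\w*)\(%(\w+)\)$")

def translate(line: str) -> list[str]:
    m = STORE.match(line.strip())
    if not m:
        return [line]                  # not an explicit store: unchanged
    op, src, disp, base = m.groups()
    mirrored = f"{op} {src}, offset+{disp or '0'}(%{base})"
    return [line, mirrored]

print(translate("movq $4, 144(%rdi)"))
# -> ['movq $4, 144(%rdi)', 'movq $4, offset+144(%rdi)']
print(translate("pushq %rbp"))
# -> ['pushq %rbp']   (implicit writes need the separate handling below)
```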
Explicit and Implicit Instructions
A call instruction implicitly writes the return address to the stack. Mirror instructions placed after the call would not be executed until the call returns, causing data inconsistency:
  call grep
  pushl %eax
  movl %eax, offset(%esp)
  movl 4(%esp), %eax
  movl %eax, offset+4(%esp)
  popl %eax
Solution: instrument the mirror instructions at the start of the called procedure instead:
  grep:
    pushl %eax
    movl %eax, offset(%esp)
    movl 4(%esp), %eax
    movl %eax, offset+4(%esp)
    popl %eax
The Stack After an int Instruction Completes
[Figure: native stack vs. mirror stack after an interrupt]
• Privilege changed: the processor pushes ss, esp, cs, eflags, eip, and the error code onto the native stack; kMemvisor copies the same frame to the mirror stack.
• Privilege not changed: the processor pushes cs, eflags, eip, and the error code on top of the existing stack data (data A, data B); kMemvisor copies the frame to the mirror stack.
Test Environment
• Hardware
  - Dell PowerEdge T610 server: 6-core 2.67 GHz Intel Xeon CPU with 12 MB L3 cache
  - Samsung 8 GB DDR3 RAM with ECC and a 148 GB SATA disk
• Software
  - Hypervisor: Xen 3.4.2
  - Kernel: Linux 2.6.30
  - Guest OS: BusyBox 1.19.2
Malloc Micro-test with Different Block Sizes
The performance impact is noticeable when the allocated block is smaller than 256 KB, but much more limited for larger sizes.
Memory Sequential Read & Write
Sequential memory read: no overhead. Sequential memory write: 33% overhead on average.
[Figure: xv6 command benchmark — normalized latency and count of memory instructions for echo, mkdir, wc, cat, ls; native vs. modified]
XV6 Benchmark
• Usertests performance comparison: 43% overhead on average, 60% at the largest.
• Command performance comparison: 5% overhead on average, 7.2% at the largest.
[Figure: xv6 usertests — normalized latency for sbrk, bigwrite, mem, fork; native vs. modified]
Web Server & Database
• Performance of thttpd: largest overhead 10%.
• Performance impact for create, insert, select, update, and delete in SQLite: up to double overhead, impacted by lock operations; the cache mechanism causes frequent memory write operations.
[Figure: normalized latency for SQLite create, insert, select, update, delete — native vs. modified]
Compilation Time
Mirror-instruction details and the compilation-time overhead introduced by kMemvisor: 5.64% for SQLite, 6% on average.
Implicit write instructions are sometimes fewer in number, but usually more complicated, which costs extra overhead.
Discussion
• Special memory areas — hypervisor + kernel page table
• Other device memory operations — I/O operations
• Address conflicts — mva = nva + offset
• Challenges in binary translation — self-modifying code & values
• Multi-threads and multi-cores
  - Multi-thread: emulate mirrored instructions
  - Multi-core: explicit mutex lock
Conclusion
• A software mirrored-memory solution
  - CPU-intensive tasks are almost unaffected
  - Our stressful memory-write benchmark shows a backup overhead of 55%
  - Average overhead in real-world applications: 30%
• Dynamic binary translation
• Full kernel mirroring
Thanks!