kMemvisor: Flexible System Wide Memory Mirroring in Virtual Environments
(Transcript of the presentation slides)
Bin Wang
Zhengwei Qi
Haibing Guan
Haoliang Dong
Wei Sun
Shanghai Key Laboratory of Scalable Computing and Systems
Shanghai Jiao Tong University
Yaozu Dong, Intel China Software Center
Is your memory error-prone?
Today's memories do become error-prone.
• [B. Schroeder et al., SIGMETRICS 09] Memory failures are common in clusters
  - 8% of DIMMs have correctable errors per year
  - 1.29% uncorrectable errors in Google testbed
[Slide graphic: several memory-intensive applications running side by side]
Memory HA in Cloud Computing
• A 1.29% error rate means, at cloud scale: 13,000 failures per year, 1,083 failures per month, 35 failures per day, 1.5 failures per hour
• [Andy A. Hwang et al., ASPLOS 12] Memory errors happen at a significant rate in all four sites, with 2.5% to 5.5% of nodes affected per system
• A Service Level Agreement of 99.95% allows: 4.38 hours downtime per year, 21.56 minutes downtime per month, 5.04 minutes downtime per week
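The failure-rate and SLA arithmetic above can be reproduced with a short script. The fleet size of one million machines is an assumption (not stated on the slide) chosen so that a 1.29% annual rate comes out near the quoted 13,000 failures per year:

```python
# Reproduce the slide's failure-rate and SLA-downtime arithmetic.
# ASSUMPTION: a fleet of ~1,000,000 machines, which makes a 1.29%
# annual uncorrectable-error rate yield roughly 13,000 failures/year.

FLEET = 1_000_000
ERROR_RATE = 0.0129      # uncorrectable errors per machine-year
SLA = 0.9995             # 99.95% availability

failures_per_year = FLEET * ERROR_RATE
failures_per_day = failures_per_year / 365

def downtime_minutes(period_hours: float) -> float:
    """Allowed downtime within a period under the SLA."""
    return (1 - SLA) * period_hours * 60

print(round(failures_per_year))                    # ~12,900 per year
print(round(failures_per_day))                     # ~35 per day
print(round(downtime_minutes(365 * 24) / 60, 2))   # 4.38 hours per year
print(round(downtime_minutes(7 * 24), 2))          # 5.04 minutes per week
```

Note that 5.04 minutes is exactly the 99.95% budget over one week, which is why the slide's figure is read here as a per-week number.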
Existing Solutions
• Hardware
  - ECC (HP, IBM, Google, et al.): bit-granularity checking
  - Mirrored Memory (HP, IBM, Google, et al.): expensive, low flexibility
• Software
  - Duo-backup (GFS, Amazon Dynamo): system-level tolerance
  - Checkpoint, hot spare + VM migration/replication: application-specific and high overhead (e.g., Remus [NSDI 08] with 40 checkpoints/sec, 103% overhead)
Design Guideline
• Low cost
• Efficiency & compatibility
  - Arbitrary platform
  - On the fly + hot spare
• Low maintenance
  - Little impact on other resources (e.g., networking utilization)
  - Without migration
• Cloud requirement
  - VM granularity
kMemvisor
• A hypervisor providing system-wide memory mirroring based on hardware virtualization
• Supports VMs with or without the mirror memory feature on the same physical machine
• Supports N-modular redundancy for special mission-critical applications
kMemvisor High-level Architecture
[Figure: three kinds of guests run side by side — an ordinary VM, an app-level HA VM, and a system-wide HA VM, each with its own guest OS and apps — on top of kMemvisor (CPU management, memory management, page table, code translation management), which runs on the hardware (CPU, memory)]
Code is translated as it is loaded from disk: a store such as
  ...MOV addr, data...
becomes
  ...MOV addr, data; MOV mirror_addr, data...
where mva = mirror(nva).
Memory Mirroring
[Figure: one virtual address is mapped twice into physical memory]
1. Create the native PTE
2. Create the mirror PTE
3. Every write is then duplicated:
   mov $2, addr
   mov $2, mirror(addr)
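The duplicated-write idea can be illustrated with a toy model. All names, the page size, and the offset below are illustrative only; kMemvisor does this at the instruction level, not in Python:

```python
# Toy model of mirrored writes: every store lands in the native page and
# in its mirror at a fixed offset. PAGE and OFFSET are made-up values.

PAGE = 4096
OFFSET = 8 * PAGE                 # mirror space starts here (toy value)
memory = bytearray(16 * PAGE)     # flat "physical" memory

def mirror(addr: int) -> int:
    return addr + OFFSET

def store(addr: int, value: int) -> None:
    """The translated form of `mov $value, addr`: write twice."""
    memory[addr] = value          # mov $value, addr
    memory[mirror(addr)] = value  # mov $value, mirror(addr)

store(0x42, 2)
assert memory[0x42] == memory[mirror(0x42)] == 2
```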
Retrieve Memory Failure
[Figure: the native copy is corrupted (X) while the mirror copy still holds the value 2]
1. Memory corruption is detected in the native page
2. Re-create the PTE to point at a fresh physical page
3. Copy the data back from the mirror page
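Recovery can be sketched the same way: when the native copy is corrupted, remap it and refill it from the mirror. This is a toy model with made-up constants, not kMemvisor's actual recovery path:

```python
# Toy recovery: restore a corrupted native page from its mirror copy.
PAGE = 4096
OFFSET = 8 * PAGE
memory = bytearray(16 * PAGE)

def recover(page_addr: int) -> None:
    """Re-create the (toy) mapping and copy the mirror page back."""
    src = page_addr + OFFSET
    memory[page_addr:page_addr + PAGE] = memory[src:src + PAGE]

# Simulate: the mirror holds good data, the native byte is corrupted.
memory[OFFSET + 0x10] = 2     # good mirror copy
memory[0x10] = 0xFF           # corrupted native byte
recover(0)
assert memory[0x10] == 2      # native page restored from the mirror
```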
Create Mirror Page Table
[Figure: the application calls malloc(); the guest OS turns it into mmap() (system call); kMemvisor handles the resulting update_va_mapping() hypercall and (1) creates the native PTE, then (2) creates the mirror PTE, mapping both native and mirror virtual addresses onto physical memory]
Modified Memory Layout
Virtual memory, top to bottom:
  0xFFFFFFFFFFFF  kMemvisor Space
  0xFFFFFC000000  Mirrored User Stack Space
                  Mirrored User Heap Space
  0x800000000000  Native Kernel Space
                  Native User Heap Space
  0x000000000000  Native User Stack Space
Physical memory is divided into high-quality memory, mirrored memory, and native memory; the native space maps to native memory and the mirror space to mirrored memory.
mva = mirror(nva) = nva + offset
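The address split can be written down directly. The concrete offset below is an assumption inferred from the 0x800000000000 boundary in the layout above; the constant in kMemvisor itself may differ:

```python
# Sketch of the native/mirror virtual-address split implied by the
# layout above. ASSUMPTION: the offset equals the 0x800000000000
# boundary between native and mirrored space.

NATIVE_TOP = 0x8000_0000_0000   # end of native space (assumed)
OFFSET = NATIVE_TOP             # mva = nva + offset

def mirror(nva: int) -> int:
    """mva = mirror(nva) = nva + offset"""
    assert 0 <= nva < NATIVE_TOP, "nva must lie in native space"
    return nva + OFFSET

assert mirror(0x7F00_0000) == 0x8000_7F00_0000
```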
Memory Synchronization
Toolchain: Compiler (cc1) → Assembler (as) → Binary Translation → Linker (ld), with modified libraries
(e.g., Sysbench.c → Sysbench.s → M_sysbench.s → Sysbench.o → Sysbench.out)
Native instructions:            Modified instructions:
  movq $4, 144(%rdi)              movq $4, 144(%rdi)
                                  movq $4, offset+144(%rdi)
  call log_text                   call log_text
  addq %rdx, (%rax)               addq %rdx, (%rax)
                                  addq %rdx, offset(%rax)
  pushq %rbp                      pushq %rbp
                                  movq %rbp, offset(%rsp)
.Log_text:                      .Log_text:
                                  pushq %rax
                                  movq 8(%rsp), %rax
                                  movq %rax, (offset+8)(%rsp)
                                  popq %rax
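The per-instruction rewriting in the pipeline above can be mimicked by a toy translator that appends a mirrored store after each explicit memory write. This is a deliberate simplification: real x86 parsing, implicit writes via push/call, flag preservation, and the many other addressing modes kMemvisor handles are all ignored:

```python
import re

# Toy AT&T-syntax translator: duplicate each explicit `mov src, disp(%base)`
# store to the mirror address (disp -> offset+disp). Everything else
# passes through unchanged.

STORE = re.compile(r"^(mov[qlwb]?)\s+(\S+),\s*(-?\w*)\(%(\w+)\)$")

def translate(line: str) -> list[str]:
    m = STORE.match(line.strip())
    if not m:
        return [line]                  # not an explicit store: unchanged
    op, src, disp, base = m.groups()
    mirrored = f"{op} {src}, offset+{disp or '0'}(%{base})"
    return [line, mirrored]

print(translate("movq $4, 144(%rdi)"))
# -> ['movq $4, 144(%rdi)', 'movq $4, offset+144(%rdi)']
print(translate("pushq %rbp"))
# -> ['pushq %rbp']   (implicit writes need the separate handling below)
```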
Explicit and Implicit Instructions
A call instruction implicitly writes the return address to the stack. Mirror instructions placed after the call would not be executed until the call returns, causing data inconsistency:
  call grep
  pushl %eax
  movl %eax, offset(%esp)
  movl 4(%esp), %eax
  movl %eax, offset+4(%esp)
  popl %eax
Solution: instrument the mirror instructions at the start of the called procedure instead:
  grep:
    pushl %eax
    movl %eax, offset(%esp)
    movl 4(%esp), %eax
    movl %eax, offset+4(%esp)
    popl %eax
The Stack After an int Instruction Completes
[Figure: native stack vs. mirror stack after an interrupt]
• Privilege changed: the processor pushes ss, esp, cs, eflags, eip, and the error code onto the native stack; kMemvisor copies the same frame to the mirror stack.
• Privilege not changed: the processor pushes cs, eflags, eip, and the error code on top of the existing stack data (data A, data B); kMemvisor copies the frame to the mirror stack.
Test Environment
• Hardware
  - Dell PowerEdge T610 server: 6-core 2.67 GHz Intel Xeon CPU with 12 MB L3 cache
  - Samsung 8 GB DDR3 RAM with ECC and a 148 GB SATA disk
• Software
  - Hypervisor: Xen 3.4.2
  - Kernel: Linux 2.6.30
  - Guest OS: BusyBox 1.19.2
Malloc Micro-test with Different Block Sizes
The performance impact is noticeable when the allocated block is smaller than 256 KB, but much more limited for larger sizes.
Memory Sequential Read & Write
Sequential memory read: no overhead. Sequential memory write: 33% overhead on average.
[Figure: xv6 command benchmark — normalized latency and count of memory instructions for echo, mkdir, wc, cat, ls; native vs. modified]
XV6 Benchmark
• Usertests performance comparison: 43% overhead on average, 60% at the largest.
• Command performance comparison: 5% overhead on average, 7.2% at the largest.
[Figure: xv6 usertests — normalized latency for sbrk, bigwrite, mem, fork; native vs. modified]
Web Server & Database
• Performance of thttpd: largest overhead 10%.
• Performance impact for create, insert, select, update, and delete in SQLite: up to double overhead, impacted by lock operations; the cache mechanism causes frequent memory write operations.
[Figure: normalized latency for SQLite create, insert, select, update, delete — native vs. modified]
Compilation Time
Mirror-instruction details and the compilation-time overhead introduced by kMemvisor: 5.64% for SQLite, 6% on average.
Implicit write instructions are sometimes fewer in number, but usually more complicated, which costs extra overhead.
Discussion
• Special memory areas — hypervisor + kernel page table
• Other device memory operations — I/O operations
• Address conflicts — mva = nva + offset
• Challenges in binary translation — self-modifying code & values
• Multi-threads and multi-cores
  - Multi-thread: emulate mirrored instructions
  - Multi-core: explicit mutex lock
Conclusion
• A software mirrored-memory solution
  - CPU-intensive tasks are almost unaffected
  - Our stressful memory-write benchmark shows a backup overhead of 55%
  - Average overhead in real-world applications: 30%
• Dynamic binary translation
• Full kernel mirroring
Thanks!