Source: pages.cs.wisc.edu/~powerjg/files/10_vm.pdf

Virtual Virtual Memory

Jason Power

4/17/2015 UNIVERSITY OF WISCONSIN 1

3/20/2015

With contributions from Jayneel Gandhi and Lena Olson

Virtual Machine History

• 1970s: VMMs
• 1997: Disco
• 1999: VMware (binary translation)
• 2003: Xen (para-virtualization)
• 2006(ish): Hardware support

VM Origins

1974

Virtual Machine Monitor (VMM)

[Figure: two guest operating systems (OS 1 and OS 2), each running processes P1 and P2, sit on top of the Virtual Machine Monitor, which runs on the hardware: CPU, Mem, I/O dev, Disk.]

The VMM is also called a hypervisor.

Disco: trap and emulate

[Figure: guest OS 1 and OS 2, each running processes P1 and P2, on top of the Virtual Machine Monitor, which runs on the hardware: CPU, Mem, I/O dev, Disk.]

What about x86?

• x86 can’t use trap and emulate
• Classic example: the popf instruction
  o Same instruction behaves differently depending on execution mode
  o User mode: changes ALU flags
  o Kernel mode: changes ALU and system flags
  o Does not generate a trap in user mode
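The popf problem can be illustrated with a toy model. This is a minimal sketch, not real x86 semantics: the `Cpu` class and the `popf` function are hypothetical, and only two flag bits are modeled (CF as an ALU flag, IF as a system flag). It shows why a deprivileged guest kernel silently misbehaves instead of trapping to the VMM.

```python
from dataclasses import dataclass

CF = 1 << 0  # carry flag: an ordinary ALU flag
IF = 1 << 9  # interrupt-enable flag: a system flag


@dataclass
class Cpu:
    """Hypothetical CPU model with a mode bit and a flags register."""
    kernel_mode: bool
    flags: int = 0


def popf(cpu: Cpu, value: int) -> None:
    """Simplified popf: load the flags register from a popped stack value."""
    if cpu.kernel_mode:
        # Kernel mode: ALU flags and system flags both change.
        cpu.flags = value
    else:
        # User mode: only ALU flags change; the old IF bit is silently kept.
        # No trap is raised, so the VMM never gets a chance to emulate.
        cpu.flags = (value & ~IF) | (cpu.flags & IF)


# A guest OS that the VMM runs deprivileged (in user mode).
guest_kernel = Cpu(kernel_mode=False, flags=IF)
popf(guest_kernel, CF)  # the guest intends to set CF and clear IF
```

After the call, the guest believes interrupts are disabled, but IF is still set and the VMM was never notified. This is the behavior binary translation has to work around.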

VMware solution: binary translation

• Only need to translate OS code
  o Makes SPEC run fast by default
• Most instruction sequences don’t change
• Instructions that do change:
  o Indirect control flow: call/ret, jmp
  o PC-relative addressing
  o Privileged instructions

Overheads

• Traps are heavyweight
• Binary translation
  o Bad for OS-heavy workloads (many server apps)
• What if you’re allowed to change the OS a little?

Paravirtualization and Xen

• Use hypercalls (explicit calls from the modified guest OS into the VMM) to bypass trapping
• Still emulate for corner cases and safety reasons
• Commonly used! Amazon EC2
• Not “full virtualization”

[Figure: guest OS 1 and OS 2, each running processes P1 and P2, on top of the Virtual Machine Monitor, which runs on the hardware: CPU, Mem, I/O dev, Disk.]

Hardware Support

• Another ring
  o int moves from user-mode to kernel-mode
  o vmrun moves from kernel-mode to vmm-mode
• Many other instructions
• What about virtual memory?

Let’s recall virtual memory

[Figure: processes P1 and P2 each have a virtual address space from 0 to 2^64; OS 1 maps them onto a physical address space from 0 to 2^34.]

Virtualize the OS’s memory?

[Figure: OS 1 and OS 2 each see a physical address space from 0 to 2^34; the hypervisor maps both onto the machine’s physical address space from 0 to 2^34.]

Two dimensions of translation

[Figure: processes P1 and P2 on OS 1 and on OS 2 use virtual addresses; each OS manages guest physical addresses; the hypervisor maps those onto physical addresses (0 to 2^34).]

Two dimensions of translation

[Figure: (1) the guest page table, rooted at the guest cr3, translates a guest virtual address (gVA) to a guest physical address (gPA); (2) the nested page table, rooted at its own cr3, translates the gPA to a host physical address (hPA).]
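The two steps in the diagram can be sketched as two table lookups in sequence. This is a minimal sketch under simplifying assumptions: each page table is a flat dictionary of page numbers rather than a multi-level radix tree, and the names `translate`, `guest_page_table`, and `nested_page_table` are hypothetical.

```python
PAGE = 4096  # assumed 4 KiB pages


def translate(page_table: dict[int, int], addr: int) -> int:
    """Map an address through a flat table of page-number mappings."""
    page, offset = divmod(addr, PAGE)
    # A missing entry raises KeyError, which stands in for a page fault.
    return page_table[page] * PAGE + offset


guest_page_table = {0x10: 0x2}    # gVA page -> gPA page, owned by the guest OS
nested_page_table = {0x2: 0x7F}   # gPA page -> hPA page, owned by the VMM

gva = 0x10 * PAGE + 0x123
gpa = translate(guest_page_table, gva)    # step 1: guest page table
hpa = translate(nested_page_table, gpa)   # step 2: nested page table
```

The guest OS only ever sees `gpa`; the second lookup is invisible to it, which is what lets an unmodified OS run inside the VM.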

What about the TLB?

• Want to cache virtual → machine in TLB
• (Relatively) easy with software-loaded TLBs
  o TLB miss is a trap (virtual → guest physical)
  o Guest OS loads TLB (VMM trap)
  o Translates guest physical → machine physical
  o VMM actually does the TLB insert
• Problem?
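The software-loaded TLB sequence above can be sketched as two handlers. This is an illustrative model only: `SoftwareTlbVmm`, `on_guest_tlb_write`, and `guest_tlb_miss_handler` are hypothetical names, and page tables are flat dictionaries of page numbers.

```python
class SoftwareTlbVmm:
    """Hypothetical VMM for a machine whose TLB is filled by software."""

    def __init__(self, guest_phys_to_machine: dict[int, int]):
        self.guest_phys_to_machine = guest_phys_to_machine  # kept by the VMM
        self.tlb: dict[int, int] = {}  # real TLB: virtual page -> machine page

    def on_guest_tlb_write(self, virt_page: int, guest_phys_page: int) -> None:
        """Trap handler: the guest's privileged TLB insert lands here."""
        machine_page = self.guest_phys_to_machine[guest_phys_page]
        self.tlb[virt_page] = machine_page  # the VMM does the actual insert


def guest_tlb_miss_handler(guest_pt: dict[int, int],
                           vmm: SoftwareTlbVmm, virt_page: int) -> None:
    """Guest OS miss handler: look up virtual -> guest physical, then load the TLB."""
    guest_phys_page = guest_pt[virt_page]
    # On real hardware this would be a privileged instruction that traps.
    vmm.on_guest_tlb_write(virt_page, guest_phys_page)


vmm = SoftwareTlbVmm({2: 0x7F})
guest_tlb_miss_handler({0x10: 2}, vmm, 0x10)
```

The TLB ends up holding virtual → machine entries even though the guest only ever supplied a guest physical page. The scheme depends on every TLB fill going through software.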

Hardware-walked page table

• Page table walker walks nested page table
• Need a “fake” page table

Shadow Paging

[Figure: the guest page table (1) maps gVA to gPA and the nested page table (2) maps gPA to hPA, each with its own cr3; the shadow page table, with its own cr3, maps gVA directly to hPA.]

• VMM creates shadow page tables
• VMM keeps them coherent
• Keeping the shadow page table coherent introduces overheads
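A shadow page table is the composition of the guest and nested mappings. The sketch below is a toy model with flat dictionaries; `build_shadow`, `Vmm`, and `guest_writes_pte` are hypothetical names. It assumes the VMM is entered on every guest page table write (for example by write-protecting the guest page table), which is one common way to keep the shadow coherent and not necessarily the one the slides have in mind.

```python
def build_shadow(guest_pt: dict[int, int],
                 nested_pt: dict[int, int]) -> dict[int, int]:
    """Compose gVA->gPA with gPA->hPA into a direct gVA->hPA table."""
    return {gva_page: nested_pt[gpa_page]
            for gva_page, gpa_page in guest_pt.items()
            if gpa_page in nested_pt}


class Vmm:
    """Hypothetical VMM that owns the shadow table the hardware walks."""

    def __init__(self, nested_pt: dict[int, int]):
        self.nested_pt = nested_pt
        self.guest_pt: dict[int, int] = {}
        self.shadow_pt: dict[int, int] = {}
        self.interventions = 0  # how often the VMM had to step in

    def guest_writes_pte(self, gva_page: int, gpa_page: int) -> None:
        """Models the trap taken when the guest edits its page table."""
        self.interventions += 1
        self.guest_pt[gva_page] = gpa_page
        # Rebuilding everything is wasteful; it keeps the sketch short.
        self.shadow_pt = build_shadow(self.guest_pt, self.nested_pt)


vmm = Vmm({2: 0x7F, 3: 0x80})
vmm.guest_writes_pte(0x10, 2)
vmm.guest_writes_pte(0x11, 3)
```

Every guest update costs one VMM intervention, which is the coherence overhead named above. In exchange, a TLB miss walks only the single shadow table.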

Hardware support

• Today’s hardware is aware of the nested page table
• Nested page table walk
  o For each level of the guest page table, must do a full nested page table walk
  o Can be very high overhead

Support for Virtualizing Memory

[Figure: two-dimensional page walk. Starting from the guest CR3 and the gVA, each step of the guest walk yields a gPA that must itself be translated through the nested page table (rooted at ncr3); the figure shows five such nested walks, the last of which produces the hPA.]

Tradeoffs

Nested Paging
• Up to 24 memory references per TLB miss
• Updates to either page table need no VMM intervention
• Beneficial with
  o Low TLB miss rate
  o High rate of page table updates

Shadow Paging
• Up to 4 memory references per TLB miss
• Updates to either page table require costly VMM intervention
• Beneficial with
  o High TLB miss rate
  o Low rate of page table updates
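The 24 versus 4 figures follow from counting memory references in a worst-case walk. The sketch below assumes four-level page tables in both dimensions and no caching of intermediate entries; the function names are hypothetical.

```python
def nested_walk_refs(guest_levels: int, nested_levels: int) -> int:
    """Worst-case memory references for a two-dimensional page walk."""
    # Each guest level is reached through a gPA, so it costs one full
    # nested walk (nested_levels refs) plus 1 ref to read the guest entry.
    per_guest_level = nested_levels + 1
    # The final gPA of the data page needs one more nested walk.
    return guest_levels * per_guest_level + nested_levels


def shadow_walk_refs(shadow_levels: int) -> int:
    """A shadow table is walked like a native table: one ref per level."""
    return shadow_levels


nested = nested_walk_refs(guest_levels=4, nested_levels=4)  # 4 * 5 + 4
shadow = shadow_walk_refs(shadow_levels=4)
```

With four levels in each dimension this gives 24 references for nested paging and 4 for shadow paging, matching the numbers on the slide.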

Cost of virtualization

[Figure: bar charts of execution time overhead for graph500, memcached, NPB:CG, and gups, in two panels: Overheads (native) and Overheads (virtualized). Bars are grouped by page-size configuration (4K, 2M, 4K+4K, 4K+2M, 2M+2M). Axis ticks run from 0% to 100% and from 0% to 1200%. Callouts on the chart: 202% and 126%.]

Reducing translation overhead

• Bhargava et al.: page walk cache
• Opportunity? PTE reuse (10% of entries cover 90% of accesses)
• Why? Nested translations are redundant

Reducing translation overhead

• Bhargava et al.: page walk cache
• Page walk cache
  o Why not cache L1 entries?
• What is the NTLB?
  o Caches guest physical to system physical
  o Skips the 2nd dimension walk
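The effect of an NTLB can be sketched by counting the references spent on the second dimension. This is a toy model, not the design from Bhargava et al.: `NestedWalker` is a hypothetical class, the nested page table is a flat dictionary, and the cache is unbounded with no eviction.

```python
class NestedWalker:
    """Translates gPA pages to hPA pages, with a small NTLB in front."""

    def __init__(self, nested_pt: dict[int, int], nested_levels: int = 4):
        self.nested_pt = nested_pt
        self.nested_levels = nested_levels
        self.ntlb: dict[int, int] = {}  # gPA page -> hPA page
        self.memory_refs = 0            # refs spent walking the nested table

    def gpa_to_hpa_page(self, gpa_page: int) -> int:
        if gpa_page in self.ntlb:
            # NTLB hit: the 2nd dimension walk is skipped entirely.
            return self.ntlb[gpa_page]
        # NTLB miss: pay for a full nested walk, then remember the result.
        self.memory_refs += self.nested_levels
        hpa_page = self.nested_pt[gpa_page]
        self.ntlb[gpa_page] = hpa_page
        return hpa_page


walker = NestedWalker({5: 50})
first = walker.gpa_to_hpa_page(5)    # miss: 4 memory references
second = walker.gpa_to_hpa_page(5)   # hit: no additional references
```

Because guest page table pages are reused across many walks, their gPA → hPA translations hit in the NTLB often, which is where the redundancy noted on the previous slide is recovered.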

DEVICES AND VIRTUAL MEMORY

That was fun, let’s make it more complicated…

No IOMMU: No Virtualization

[Figure: Proc, Kernel, and Device; the device is given physical addresses.]

No IOMMU (virtualized)

[Figure: Proc and (guest) Kernel work with guest physical addresses; the VMM kernel sits between them and the Device and works with physical addresses.]

Virtualization

● Devices accessed by physical addresses
  o Emulation of I/O devices is too expensive!
● Approach 1: VMM driver (paravirtualization)
  o Protection domains: IOMMU checks permissions for the memory location; use physical address
  o Need to rewrite drivers!
● Approach 2: Guest driver (true virtualization)
  o Direct assignment: driver uses guest physical address
  o IOMMU translates to machine physical address
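Direct assignment can be sketched as a per-device lookup followed by a permission check and a translation. This is a simplified model, not a real IOMMU programming interface: `Iommu`, `Domain`, and `IoPageFault` are hypothetical names, and the I/O page table is a flat dictionary of page numbers.

```python
from dataclasses import dataclass, field

PAGE = 4096  # assumed 4 KiB pages


class IoPageFault(Exception):
    """Raised when a device touches memory it is not allowed to access."""


@dataclass
class Domain:
    # gPA page -> (hPA page, writable)
    io_page_table: dict[int, tuple[int, bool]] = field(default_factory=dict)


class Iommu:
    def __init__(self) -> None:
        self.device_table: dict[int, Domain] = {}  # device id -> domain

    def translate(self, device_id: int, gpa: int, write: bool) -> int:
        """Translate a DMA address issued by a device into an hPA."""
        domain = self.device_table.get(device_id)  # device table lookup
        page, offset = divmod(gpa, PAGE)
        entry = domain.io_page_table.get(page) if domain else None
        if entry is None or (write and not entry[1]):
            raise IoPageFault(f"device {device_id}, gpa {gpa:#x}")
        return entry[0] * PAGE + offset


iommu = Iommu()
iommu.device_table[7] = Domain({0x2: (0x7F, False)})  # read-only mapping
hpa = iommu.translate(7, 0x2010, write=False)
```

The guest driver programs the device with guest physical addresses and never learns the machine addresses. The same table that translates also enforces which memory each device may touch.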

IOMMU Overview

[Figure: the CPU (running VMs and processes) reaches Memory through its TLB and MMU; the GPU and other I/O devices, with an IOTLB, reach Memory through the IOMMU. IOMMU functions labeled in the figure: read memory, send interrupts, address translation service, device table lookup, interrupt remapper.]

History

● Initially a combination of
  o GART (graphics aperture remapping table) and
  o DEV (device exclusion vector)
● GART
  o Physical-to-physical translation so graphics addresses appear contiguous
  o IOMMU is a generalization
● DEV
  o Devices classified into domains
  o Each domain is allowed to access a set of physical addresses

Laundry list of features

● I/O page tables for I/O devices to access memory
  o permission checking
  o virtual address translation
● Interrupt remapping for I/O interrupts
● Service page faults from I/O devices
● Legacy I/O
● User mode device access
● VM guest device access
● Virtualized user mode device access
● Two-level address translation
● Interrupt virtualization
● …


IOMMU data structures


I/O page tables

Device memory access

[Figure: a GPU or I/O device checks its IOTLB; a hit goes to Memory directly, a miss goes to the IOMMU via ATS. Inside the IOMMU, the device table and a TLB are consulted; a hit returns the translation, a miss walks the page tables (PT) in Memory. The CPU reaches the same Memory (PT and Data) through its own TLB and MMU.]

Page faults (before)

● Generated if the I/O device accesses memory it is not allowed to access
● Fatal error
● Written to a log
● Requires pinned memory

Page faults (now)

● Generated if the I/O device accesses memory it is not allowed to access
● Written to a buffer
● Interrupt raised on a CPU core
  o (Kernel) driver handles the fault
● No support to notify the device that it should retry
  o Device keeps on executing/waiting for the TLB miss