Virtualization 101
Transcript of Virtualization 101
-
7/28/2019 Virtualization 101
1/50
Copyright 2007 VMware, Inc. All rights reserved.
MIT IAP Course, Lecture #1: Virtualization 101
Carl Waldspurger (SB, SM '89; PhD '95), VMware R&D
January 16, 2007
-
What is Virtualization?
Virtual systems
  Abstract physical components using logical objects
  Dynamically bind logical objects to physical configurations
Examples
  Network: Virtual LAN (VLAN), Virtual Private Network (VPN)
  Storage: Storage Area Network (SAN), LUN
  Computer: Virtual Machine (VM), simulator
vir·tu·al (adj.): existing in essence or effect, though not in actual fact
-
Overview
Virtual Machines
Virtualization Approaches
Processor Virtualization
Additional Topics
-
Starting Point: A Physical Machine
Physical Hardware
  Processors, memory, chipset, I/O bus and devices, etc.
  Physical resources often underutilized
Software
  Tightly coupled to hardware
  Single active OS image
  OS controls hardware
-
What is a Virtual Machine?
Hardware-Level Abstraction
  Virtual hardware: processors, memory, chipset, I/O devices, etc.
  Encapsulates all OS and application state
Virtualization Software
  Extra level of indirection decouples hardware and OS
  Multiplexes physical hardware across multiple guest VMs
  Strong isolation between VMs
  Manages physical resources, improves utilization
-
VM Isolation
Secure Multiplexing
  Run multiple VMs on a single physical host
  Processor hardware isolates VMs, e.g. MMU
Strong Guarantees
  Software bugs, crashes, and viruses within one VM cannot affect other VMs
Performance Isolation
  Partition system resources
  Example: VMware controls for reservation, limit, shares
-
VM Encapsulation
Entire VM is a File
  OS, applications, data
  Memory and device state
Snapshots and Clones
  Capture VM state on the fly and restore to point-in-time
  Rapid system provisioning, backup, remote mirroring
Easy Content Distribution
  Pre-configured apps, demos
  Virtual appliances
-
VM Compatibility
Hardware-Independent
  Physical hardware hidden by virtualization layer
  Standard virtual hardware exposed to VM
Create Once, Run Anywhere
  No configuration issues
  Migrate VMs between hosts
Legacy VMs
  Run ancient OS on new platform
  E.g. DOS VM drives virtual IDE and vLance devices, mapped to modern SAN and GigE hardware
-
Common Virtualization Uses Today
Server Consolidation and Containment: Eliminate server sprawl by deploying systems into virtual machines that can run safely and move transparently across shared hardware
Test and Development: Rapidly provision test and development servers; store libraries of pre-configured test machines
Enterprise Desktop: Secure unmanaged PCs without compromising end-user autonomy by layering a security policy in software around desktop virtual machines
Business Continuity: Reduce cost and complexity by encapsulating entire systems into single files that can be replicated and restored onto any target server
-
Overview
Virtual Machines
Virtualization Approaches
  Virtual machine monitors (VMMs)
  Virtualization platform types
  Alternative system virtualizations
Processor Virtualization
Additional Topics
-
What is a Virtual Machine Monitor?
VMM Characteristics
  Fidelity
  Performance
  Isolation / Safety
An Old Concept
  Classic definition from Popek & Goldberg '74
  IBM mainframes since the '60s
-
VMM Technology
So this is just like Java, right?
  No, a Java VM is very different from the physical machine that runs it
  A hardware-level VM reflects the underlying processor architecture
Like a simulator or emulator that can run old Nintendo games?
  No, they emulate the behavior of different hardware architectures
  Simulators generally have very high overhead
  A hardware-level VM utilizes the underlying physical processor directly
-
VMMs: Past
An Old Idea
  Hardware-level VMs since the '60s
  IBM S/360, IBM VM/370 mainframe systems
  Timeshare multiple single-user OS instances on expensive hardware
Classical VMM
  Run VM directly on hardware
  "Trap and emulate" model for privileged instructions
  Vendors had vertical control over proprietary hardware, operating systems, VMM
[Figure: from IBM VM/370 product announcement, ca. 1972]
-
VMMs: Present
Renewed Interest
  Academic research since the '90s
  VMs for commodity systems
  Server consolidation
VMM for x86
  Industry-standard hardware, from laptops to datacenter
  Run unmodified commodity guest operating systems
  Significant challenges, e.g. non-virtualizable instructions
  Pioneered by VMware in '98
[Figure: VMware Fusion for Mac OS X running WinXP, 2006]
-
VMM Platform Types
Hosted Architecture
  Install as application on existing x86 host OS, e.g. Windows, Linux, OS X
  Small context-switching driver
  Leverage host I/O stack and resource management
  Examples: VMware Player/Workstation/Server, Microsoft Virtual PC/Server, Parallels Desktop
Bare-Metal Architecture
  Hypervisor installs directly on hardware
  Acknowledged as preferred architecture for high-end servers
  Examples: VMware ESX Server, Xen, Microsoft Viridian (2008)
-
System Virtualization Alternatives
Virtual machines abstracted using a layer at different places:
  Language level
  OS level
  Hardware level
-
System Virtualization Taxonomy
System Virtualization
  High-Level Language: Java, Microsoft .NET / Mono, Smalltalk
  OS Level: FreeBSD Jail, HP Secure Resource Partitions, Sun Solaris Zones, SWsoft Virtuozzo, User-Mode Linux
  Hardware Level
    Bare-Metal / Hypervisor: HP Integrity VM, IBM zSeries z/VM, VMware ESX Server, Xen
    Hosted: Microsoft Virtual Server, Microsoft Virtual PC, Parallels Desktop, VMware Player, VMware Workstation, VMware Server
    Para-virtualization: Virtual Iron, VMware VMI, Xen
  Emulators: Bochs, Microsoft VPC for Mac, QEMU, Virtutech Simics
-
Overview
Virtual Machines
Virtualization Approaches
Processor Virtualization
  Classical techniques
  Software x86 VMM
  Hardware-assisted x86 VMM
  Para-virtualization
Additional Topics
-
Classical Instruction Virtualization
Trap and Emulate
  Run guest operating system deprivileged
  All privileged instructions trap into VMM
  VMM emulates instructions against virtual state, e.g. disable virtual interrupts, not physical interrupts
  Resume direct execution from next guest instruction
Implementation Technique
  This is just one technique
  Popek and Goldberg criteria permit others
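The trap-and-emulate loop can be sketched in Python under toy assumptions. Instructions are plain strings, and only `cli` and `sti` count as privileged. `VirtualCPU`, `PrivilegeTrap`, and `run_guest` are hypothetical names. The sketch shows that a trap lets the VMM change the virtual interrupt flag and then resume at the next guest instruction, without touching the real CPU state.

```python
# Toy model of trap-and-emulate. Instructions are strings; only "cli" and
# "sti" are treated as privileged. All names here are hypothetical.
PRIVILEGED = {"cli", "sti"}


class PrivilegeTrap(Exception):
    """Raised when deprivileged code executes a privileged instruction."""


class VirtualCPU:
    def __init__(self):
        self.interrupts_enabled = True  # virtual interrupt flag, not the real one
        self.pc = 0                     # index of the next guest instruction


def execute_deprivileged(insn):
    # Unprivileged instructions run directly at full speed;
    # privileged ones fault because the guest kernel is deprivileged.
    if insn in PRIVILEGED:
        raise PrivilegeTrap(insn)
    return f"ran {insn} natively"


def vmm_emulate(vcpu, insn):
    # Apply the instruction to virtual state only.
    if insn == "cli":
        vcpu.interrupts_enabled = False
    elif insn == "sti":
        vcpu.interrupts_enabled = True


def run_guest(vcpu, code):
    log = []
    while vcpu.pc < len(code):
        insn = code[vcpu.pc]
        try:
            log.append(execute_deprivileged(insn))
        except PrivilegeTrap:
            vmm_emulate(vcpu, insn)
            log.append(f"trapped {insn}, emulated")
        vcpu.pc += 1  # resume direct execution at the next instruction
    return log
```

For example, running `["mov", "cli", "add"]` traps once, on `cli`, and leaves the virtual interrupt flag cleared.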
-
Classical Memory Virtualization
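Traditional VMM Approach
Extra Level of Indirection
  Virtual → "Physical": guest maps VPN to PPN using primary page tables
  "Physical" → Machine: VMM maps PPN to MPN
  (VPN = virtual page number, PPN = guest "physical" page number, MPN = machine page number)
Shadow Page Table
  Composite of the two mappings
  For ordinary memory references, hardware maps VPN to MPN
  Cached by the physical TLB
[Figure: the guest maps VPN → PPN and the VMM maps PPN → MPN; the shadow page table, cached by the hardware TLB, maps VPN → MPN directly]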
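The composite mapping can be modeled with two dictionaries. One stands in for the guest page table (VPN to PPN) and one for the VMM's PPN-to-MPN map. A third dictionary caches VPN to MPN as the shadow. This sketch assumes shadow entries are filled lazily on a "hidden" fault; that is a modeling choice, not a claim about any particular VMM. `ShadowMMU` and the page numbers are hypothetical.

```python
class ShadowMMU:
    """Toy shadow paging: the hardware only sees the VPN -> MPN shadow."""

    def __init__(self, guest_pt, pmap):
        self.guest_pt = guest_pt  # VPN -> PPN, maintained by the guest OS
        self.pmap = pmap          # PPN -> MPN, maintained by the VMM
        self.shadow = {}          # VPN -> MPN, walked by the hardware
        self.hidden_faults = 0

    def translate(self, vpn):
        if vpn not in self.shadow:
            # "Hidden" fault handled by the VMM, invisible to the guest:
            # compose the two mappings and fill the shadow entry.
            self.hidden_faults += 1
            ppn = self.guest_pt[vpn]
            self.shadow[vpn] = self.pmap[ppn]
        return self.shadow[vpn]
```

With `ShadowMMU({0x10: 0x2}, {0x2: 0x7a})`, the first `translate(0x10)` composes the mappings and returns `0x7a`. Later lookups hit the shadow directly, much as the TLB would.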
-
Memory Traces
Shadow Page Table
  Derived from primary page table in guest
  VMM must keep primary and shadow coherent
"Trace" = Coherency Mechanism
  Write-protect primary page table
  Trap guest writes to primary
  Update or invalidate corresponding shadow entry
  Transparent to guest
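A toy model of a trace, assuming the write-protection fault can be represented as a method call. Every guest store to its primary page table passes through the VMM, which records the new entry and invalidates the stale shadow entry so the next hardware lookup refills it. `TracedPageTable` is a hypothetical name. Invalidating rather than eagerly updating is one of the two options listed above.

```python
class TracedPageTable:
    """Toy trace keeping primary and shadow page tables coherent."""

    def __init__(self, pmap):
        self.primary = {}     # guest-visible VPN -> PPN (write-protected)
        self.pmap = pmap      # VMM's PPN -> MPN
        self.shadow = {}      # VPN -> MPN used by the hardware
        self.trace_faults = 0

    def guest_write_pte(self, vpn, ppn):
        # The page holding the primary table is write-protected, so this
        # store faults into the VMM instead of completing directly.
        self.trace_faults += 1
        self.primary[vpn] = ppn
        # Invalidate the now-stale shadow entry; it is refilled on demand.
        self.shadow.pop(vpn, None)

    def hw_translate(self, vpn):
        if vpn not in self.shadow:  # refill from primary + pmap
            self.shadow[vpn] = self.pmap[self.primary[vpn]]
        return self.shadow[vpn]
```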
-
Classical VMM Performance
Native Speed Except for Traps
  No overhead in direct execution
  Overhead = trap frequency × average trap cost
Trap Sources
  Most frequent: guest page table traces
  Privileged instructions
  Memory-mapped device traces
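Dividing the overhead formula by the clock rate gives a back-of-the-envelope estimate of the fraction of CPU time lost to traps. The function below is a sketch that assumes direct execution adds nothing. The numbers in the usage note are hypothetical, chosen only to show the arithmetic.

```python
def vmm_overhead_fraction(traps_per_sec, avg_trap_cycles, cpu_hz):
    """Fraction of CPU time spent handling traps.

    Overhead = trap frequency x average trap cost, divided by the
    clock rate to express it as a fraction. Assumes direct execution
    itself adds no overhead.
    """
    cycles_lost_per_sec = traps_per_sec * avg_trap_cycles
    return cycles_lost_per_sec / cpu_hz
```

Take a hypothetical 100,000 traps per second at 2,000 cycles each on a 2 GHz CPU. `vmm_overhead_fraction(100_000, 2_000, 2_000_000_000)` gives 2 × 10^8 / (2 × 10^9) = 0.1, so about 10% of CPU time goes to traps.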
-
x86 Virtualization Challenges
Not Classically Virtualizable
  x86 ISA includes instructions that read or modify privileged state
  But which don't trap in unprivileged mode
Example: POPF instruction
  Pop top-of-stack into EFLAGS register
  EFLAGS.IF bit privileged (interrupt enable flag)
  POPF silently ignores attempts to alter EFLAGS.IF in unprivileged mode (CPL > IOPL)!
  So no trap to return control to VMM
Deprivileging not possible with x86!
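The POPF problem fits in a few lines of Python. This sketch assumes protected mode and models only the IF bit (bit 9 of EFLAGS). The `popf` function and its `iopl` parameter are illustrative, and other flags that POPF treats specially are ignored. The model shows why classical deprivileging fails: at CPL 1 with IOPL 0, the guest kernel's attempt to clear IF neither takes effect nor traps.

```python
IF_MASK = 1 << 9  # EFLAGS.IF (interrupt enable flag) is bit 9


def popf(eflags, popped, cpl, iopl=0):
    """Toy model of how POPF treats EFLAGS.IF in protected mode.

    Returns the new EFLAGS value. Only the IF rule is modeled:
    if CPL <= IOPL the popped IF takes effect; otherwise the old IF
    is silently kept and, crucially, no fault is raised.
    """
    if cpl <= iopl:
        return popped
    return (popped & ~IF_MASK) | (eflags & IF_MASK)
```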
-
How to Virtualize x86?
Interpretation
  Problem: too inefficient
  x86 decoding slow
Code Patching
  Problem: not transparent
  Guest can inspect its own code
Binary Translation (BT)
  Approach pioneered by VMware
  Run any unmodified x86 OS in VM
Extend x86 Architecture
-
Software VMM: Binary Translation
Direct execute unprivileged guest application code
  Will run at full speed until it traps, we get an interrupt, etc.
Binary translate all guest kernel code, run it unprivileged
  Since x86 has non-virtualizable instructions, proactively transfer control to the VMM (no need for traps)
  Safe instructions are emitted without change
  For "unsafe" instructions, emit a controlled emulation sequence
  VMM translation cache for good performance
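The translation step for one basic block can be sketched as follows. Instructions are (mnemonic, operand) pairs, and a small hypothetical `UNSAFE` set stands in for the real classification of privileged and non-virtualizable x86 instructions. Safe instructions are copied unchanged. Unsafe ones become a call into a hypothetical `call_vmm_emulate` routine, so control reaches the VMM without relying on a trap.

```python
# Hypothetical toy subset of instructions needing controlled emulation.
UNSAFE = {"cli", "sti", "popf", "pushf"}


def translate_block(guest_block):
    """Translate one guest basic block into a compiled code fragment (CCF).

    Safe instructions are emitted unchanged (IDENT translation); unsafe
    ones are replaced by an explicit call into the VMM's emulator.
    """
    ccf = []
    for mnemonic, operand in guest_block:
        if mnemonic in UNSAFE:
            ccf.append(("call_vmm_emulate", mnemonic))
        else:
            ccf.append((mnemonic, operand))
    return ccf
```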
-
VMware Translator Properties
Binary: input is x86 "hex", not source
Dynamic: interleave translation and execution
On Demand: translate only what is about to execute (lazy)
System Level: makes no assumptions about guest code
Subsetting: full x86 to safe subset
Adaptive: adjust translations based on guest behavior
-
BT Mechanics
Each Translator Invocation
  Consume a basic block (BB)
  Produce a compiled code fragment (CCF)
Store CCF in Translation Cache
  Future reuse
  Capture working set of guest kernel
  Amortize translation costs
  Not "patching in place"
[Figure: the translator takes an input BB (bytes 55 ff 33 c7 03 ...) and produces an output CCF (bytes 55 ff 33 c7 03 ...)]
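The translation cache can be sketched as a dictionary keyed by the guest address of a basic block. This assumes a `translate` callback that produces a CCF. `TranslationCache` and `lookup` are hypothetical names. Translation happens only on the first lookup of a block, which is what makes the approach lazy and amortizes its cost over the guest kernel's working set.

```python
class TranslationCache:
    """Toy translation cache keyed by a basic block's guest address."""

    def __init__(self, translate):
        self.translate = translate  # callback: guest_addr -> CCF
        self.cache = {}
        self.translations = 0

    def lookup(self, guest_addr):
        ccf = self.cache.get(guest_addr)
        if ccf is None:
            # First execution of this block: translate on demand, then keep it.
            self.translations += 1
            ccf = self.translate(guest_addr)
            self.cache[guest_addr] = ccf
        return ccf
```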
-
Example: IDENT Translation
Guest basic block (BB):
  80403a69  push %ebp
  80403a6a  push (%ebx)
  80403a6c  mov  (%ebx), ffffffff
  80403a72  mov  %edx, %esp
  80403a74  mov  %esp, 81c(%ebx)
  80403a7a  push %edx
  80403a7b  mov  %ebp, %eax
  80403a7d  call 80460ba4
Compiled code fragment (CCF):
  25555b0  push %ebp
  25555b1  push (%ebx)
  25555b3  mov  (%ebx), ffffffff
  25555b9  mov  %edx, %esp
  25555bb  mov  %esp, 81c(%ebx)
  25555c1  push %edx
  25555c2  mov  %ebp, %eax
  25555c4  push 80403a82    (push return address)
  25555c9  int  3a          (invoke translator on callee)
  25555cb  data: 80460ba4
-
Adaptive BT
Translated Code Is Fast
  Mostly IDENT translations
  Runs at speed
Except Writes to Traced Memory
  Page fault (shown as !*!)
  Decode and interpret instruction
  Fire trace callbacks
  Resume execution
  Can take 1000s of cycles
[Figure: translation cache with a faulting store (!*!) and an "Invoke Translator" path]
-
Adaptive BT: Fast Trace Handling
Detect and Track Trace Faults
Splice in TRACE Translation
  Execute memory access in software
  Avoid page fault
  No re-decoding
  Faster resumption
Faster Traces
  10x performance improvement
  Adapts to runtime behavior
[Figure: the faulting store is replaced by a JMP to a TRACE translation; "Invoke Translator" path shown]
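Adaptive re-translation can be sketched as a per-instruction fault counter. This model assumes a hypothetical threshold (`threshold=3` by default). Once a store has faulted on traced memory that many times, it switches from its IDENT form to a TRACE form that performs the access in software. The class and method names are illustrative, not VMware's implementation.

```python
class AdaptiveTranslator:
    """Toy adaptive BT: re-translate stores that keep faulting on traces."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.fault_counts = {}  # guest pc -> number of trace faults seen
        self.form = {}          # guest pc -> "IDENT" or "TRACE"

    def execute_store(self, pc, target_is_traced):
        if self.form.get(pc, "IDENT") == "TRACE":
            return "software access"  # no page fault, no re-decoding
        if not target_is_traced:
            return "native store"     # IDENT code runs at speed
        # Slow path: fault, decode, interpret, fire trace callbacks.
        count = self.fault_counts.get(pc, 0) + 1
        self.fault_counts[pc] = count
        if count >= self.threshold:
            self.form[pc] = "TRACE"   # splice in a TRACE translation
        return "page fault"
```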
-
Software VMM Evaluation
Benefits
  Adaptation
    Fast traces
    Fast I/O emulation
  Flexibility
Costs
  Running translator
  Path lengthening
  System call slowdown
  Complexity
-
Hardware-Assisted VMM
Recent x86 Extension
  1998-2005: Software-only VMMs using binary translation
  2005: Intel and AMD start extending x86 to support virtualization
First-Generation Hardware
  Enables classical trap-and-emulate VMMs
  Intel VT, aka "Vanderpool Technology"
  AMD SVM, aka "Pacifica"
Performance
  VT/SVM help avoid BT, but not MMU ops (actually slower!)
  Main problem is efficient virtualization of MMU and I/O, not executing the virtual instruction stream
-
VT/SVM Architecture
Diagram
  Y-axis: old-school x86 privilege (CPL)
  X-axis: virtualization privilege
Guest Mode
  Runs unmodified OS
  Sensitive operations exit (trap out) to host mode
VMCB: Virtual Machine Control Block (AMD SVM's name; Intel VT calls it the VMCS)
  VMM-controlled, hardware-walked
  Buffers simple exits
[Figure: host and guest modes side by side, each with privilege levels CPL 0 through CPL 3]
-
Hardware-Assisted VMM
[Figure: the guest runs by hardware-assisted direct execution in guest mode (CPL 0-3); faults, traces, interrupts, I/O, etc. exit to the VMM in host mode (CPL 0-3), which then resumes the guest]
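The control flow in the diagram can be sketched as a host-mode loop. Real hardware runs the guest until an exit and records the reason in the control block. Here a list of `ExitReason` values, a hypothetical enum, stands in for that record, and the handler strings only name what a VMM would do. Each loop iteration corresponds to one entry into guest mode followed by one exit.

```python
from enum import Enum, auto


class ExitReason(Enum):
    """Hypothetical subset of reasons the guest can exit to host mode."""
    IO = auto()
    PAGE_FAULT = auto()  # e.g. a write to a traced guest page table
    INTERRUPT = auto()   # a physical interrupt arrived while the guest ran
    HLT = auto()         # guest went idle; ends this toy loop


def vmm_run_loop(reported_exits):
    """Host-mode loop: each iteration is one VM entry followed by one exit.

    `reported_exits` stands in for the exit reasons the hardware would
    record in the control block.
    """
    log = []
    for reason in reported_exits:
        if reason is ExitReason.HLT:
            log.append("guest halted")
            break
        if reason is ExitReason.IO:
            log.append("emulate device access")
        elif reason is ExitReason.PAGE_FAULT:
            log.append("update shadow page table")
        elif reason is ExitReason.INTERRUPT:
            log.append("service physical interrupt")
        # Continuing the loop corresponds to resuming the guest.
    return log
```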
-
Hardware-Assisted VMM Evaluation
Benefits
  Simplicity (no BT)
  Fast system calls
  No translator overheads
Costs
  Exits: 1000s of cycles for traces and I/O
  No adaptation or software flexibility
  "Stateless" model
Future
  Hardware support for fast MMU virtualization: Intel EPT, AMD NPT
-
What is Paravirtualization?
Full Virtualization
  No modifications to guest OS
  Excellent compatibility, good performance, but complex
Paravirtualization Exports Simpler Architecture
  Term coined by Denali project in '01, popularized by Xen
  Modify guest OS to be aware of virtualization layer
  Remove non-virtualizable parts of architecture
  Avoid "rediscovery" of knowledge in hypervisor
  Excellent performance and simple, but poor compatibility
Ongoing Linux Standards Work
  Paravirt Ops interface between guest and hypervisor
  Small team from VMware, Xen, IBM LTC, etc.
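An interface such as Paravirt Ops routes the guest kernel's sensitive operations through a table that is filled in at boot, either for native hardware or for a hypervisor. The sketch below is a hypothetical Python model of that indirection, not the actual Linux interface. `NativeOps`, `HypervisorOps`, and the hypercall names are invented for illustration. The same guest code path issues privileged instructions on bare hardware and explicit hypercalls under a hypervisor.

```python
class NativeOps:
    """Hypothetical ops table used when running on bare hardware."""

    def __init__(self):
        self.log = []

    def irq_disable(self):
        self.log.append("cli")  # privileged instruction

    def write_cr3(self, value):
        self.log.append(f"mov cr3, {value:#x}")


class HypervisorOps:
    """Hypothetical ops table used under a hypervisor: explicit calls, no traps."""

    def __init__(self):
        self.log = []

    def irq_disable(self):
        self.log.append("hypercall irq_disable")

    def write_cr3(self, value):
        self.log.append(f"hypercall set_pt_base {value:#x}")


def guest_switch_address_space(ops, new_pt_base):
    # Same guest kernel code either way; only the ops table differs.
    ops.irq_disable()
    ops.write_cr3(new_pt_base)
    return ops.log
```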
-
Paravirtualization: Conceptual Diagram
[Figure: side-by-side stacks of guest OS, hypervisor, and hardware for Full Virtualization and Paravirtualization; labels: "Hypercalls (GOOD)", "System call interface", "NOT GOOD!"]
-
VMware Vision: Transparent Paravirtualization
Same OS binary
[Figure: columns for Native, Xen 3.0.x, and VMware ESX; labels: Dom0 Linux, DomU XenoLinux, VMI, VMI Linux, Windows, Solaris]
-
Further Reading
VMware Publications: www.vmware.com/academic/resources.html
  "A Comparison of Software and Hardware Techniques for x86 Virtualization" (ASPLOS '06)
  "Fast Transparent Migration for Virtual Machines" (USENIX '05)
  "Memory Resource Management in VMware ESX Server" (OSDI '02)
  "Virtualizing I/O Devices on VMware Workstation's Hosted VMM" (USENIX '01)
Additional Academic Publications
  "Xen and the Art of Virtualization" (SOSP '03)
  "Disco: Running Commodity Operating Systems on Scalable Multiprocessors" (SOSP '97)
  Many more
-
Additional Topics
I/O Virtualization
Memory Management
-
I/O Virtualization Stack
Guest
  Device driver
Virtual Device
  Model an existing device, e.g. e1000
  Model an idealized device, e.g. vmxnet
Virtualization Layer
  Emulates the virtual device
  Remaps guest and real I/O addresses
  Multiplexes and drives physical device
  Provides additional features, e.g. transparent NIC teaming
Real Device
  Physical hardware, e.g. bcm5700
  Likely to be different from the virtual device
[Figure: guest OS device driver → device emulation → I/O stack → device driver → physical device]
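The virtualization layer's role can be sketched as a device model. It traps guest register writes, remaps the guest buffer address, and forwards the data to the physical device. The register layout (`TX_ADDR`, `TX_LEN`, `TX_GO`) and `EmulatedNIC` are hypothetical and do not correspond to the real e1000 or vmxnet interfaces. A `bytearray` stands in for guest-physical memory, and a callback stands in for the physical driver.

```python
class EmulatedNIC:
    """Toy virtual NIC: each guest register write is trapped and emulated."""

    TX_ADDR = 0x00  # guest-physical address of the packet buffer
    TX_LEN = 0x04   # packet length in bytes
    TX_GO = 0x08    # writing 1 starts transmission

    def __init__(self, guest_memory, backend_send):
        self.regs = {self.TX_ADDR: 0, self.TX_LEN: 0}
        self.mem = guest_memory           # bytearray standing in for guest memory
        self.backend_send = backend_send  # stands in for the physical device driver

    def mmio_write(self, offset, value):
        if offset == self.TX_GO and value == 1:
            addr, length = self.regs[self.TX_ADDR], self.regs[self.TX_LEN]
            # Remap the guest address to the real buffer, then drive the device.
            self.backend_send(bytes(self.mem[addr:addr + length]))
        else:
            self.regs[offset] = value
```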
-
I/O Virtualization Implementations
Emulated I/O
  Hosted or Split: device emulation in the host OS / Dom0 / parent domain, using its I/O stack and device drivers
    Examples: VMware Workstation, VMware Server, VMware ESX Server (for slow devices), Xen, Microsoft Viridian, Virtual Server
  Hypervisor: device emulation, I/O stack, and device drivers inside the hypervisor
    Example: VMware ESX Server (storage and network)
Direct Passthrough I/O
  Components: guest OS device driver, device manager
  A future option; many challenges
-
Passthrough I/O Virtualization
High Performance
  Guest drives device directly
  Minimizes CPU utilization
Enabled by HW Assists
  I/O-MMU for DMA isolation, e.g. Intel VT-d, AMD IOMMU
  Partitionable I/O device, e.g. PCI-SIG IOV spec
Challenges
  Hardware independence
  Migration, suspend/resume
  Memory overcommitment
[Figure: three guest OS device drivers, each using one virtual function (VF) of a single I/O device through the I/O MMU; the virtualization layer's device manager owns the physical function (PF). PF = Physical Function, VF = Virtual Function]
-
Additional Topics
I/O Virtualization
Memory Management
-
Memory Management
Desirable capabilities
  Efficient memory overcommitment
  Accurate resource controls
  Exploit sharing opportunities
Challenges
  Allocations should reflect both importance and working set
  Best data to guide decisions known only to guest OS
  Guest and meta-level policies may clash
-
VMware Memory Management
Reclamation mechanisms
  Ballooning: guest driver allocates pinned PPNs, hypervisor deallocates backing MPNs
  Swapping: hypervisor transparently pages out PPNs, paged in on demand
  Page sharing: hypervisor identifies identical PPNs based on content, maps them to the same MPN copy-on-write
Allocation policies
  Proportional sharing: revoke memory from the VM with the minimum shares-per-page ratio
  Idle memory tax: charge a VM more for idle pages than for active pages to prevent unproductive hoarding
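One way to combine the two allocation policies is to compute a shares-per-page ratio for each VM in which idle pages cost more than active ones, then reclaim from the VM with the smallest ratio. The formula in this sketch is ρ = S / (P · (f + k(1 − f))) with k = 1 / (1 − τ). It follows the form described in the ESX Server memory management paper listed under Further Reading. Here S is shares, P is pages, f is the active fraction, and τ is the tax rate. The default τ = 0.75 and the function names are assumptions for illustration.

```python
def adjusted_shares_per_page(shares, pages, active_fraction, tax_rate):
    """rho = S / (P * (f + k * (1 - f))), with idle-page cost k = 1 / (1 - tau).

    An idle page is charged k times as much as an active one, so a VM
    hoarding idle memory has a lower ratio and loses memory first.
    """
    k = 1.0 / (1.0 - tax_rate)
    return shares / (pages * (active_fraction + k * (1.0 - active_fraction)))


def pick_victim(vms, tax_rate=0.75):
    """Revoke memory from the VM with the minimum adjusted ratio."""
    return min(
        vms,
        key=lambda vm: adjusted_shares_per_page(
            vm["shares"], vm["pages"], vm["active"], tax_rate
        ),
    )["name"]
```

Consider two VMs with equal shares and pages, one 90% active and one 10% active. With τ = 0.75 (k = 4), the mostly idle VM's denominator is 3.7P against 1.3P for the busy one. Its ratio is therefore lower, and `pick_victim` selects it.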
-
Ballooning
[Figure: a guest OS containing a balloon, shown in three states. Inflating the balloon (+ pressure) may make the guest page out to its virtual disk; deflating it (- pressure) lets the guest page in from its virtual disk. The guest OS manages memory: implicit cooperation]
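Ballooning can be sketched as bookkeeping on both sides of the guest/hypervisor boundary. In this toy model, the guest pins pages from its own free list when the balloon inflates, and the hypervisor returns the backing MPNs to its free pool. Deflating reverses both steps. `BalloonedVM` is a hypothetical name. The model assumes the guest has enough free pages; a real guest under pressure may instead page out to its virtual disk, which is not modeled here.

```python
class BalloonedVM:
    """Toy ballooning: guest pins PPNs, hypervisor reclaims their MPNs."""

    def __init__(self, pmap, guest_free):
        self.pmap = dict(pmap)              # PPN -> MPN backing (hypervisor view)
        self.guest_free = list(guest_free)  # PPNs the guest OS has free
        self.balloon = []                   # PPNs pinned by the balloon driver
        self.host_free_mpns = []            # MPNs the hypervisor can reuse

    def inflate(self, n):
        for _ in range(min(n, len(self.guest_free))):
            ppn = self.guest_free.pop()     # guest allocates a page...
            self.balloon.append(ppn)        # ...and the driver pins it
            self.host_free_mpns.append(self.pmap.pop(ppn))  # backing MPN reclaimed

    def deflate(self, n):
        for _ in range(min(n, len(self.balloon), len(self.host_free_mpns))):
            ppn = self.balloon.pop()
            self.pmap[ppn] = self.host_free_mpns.pop()  # back the page again
            self.guest_free.append(ppn)                 # guest may reuse it
```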
-
Page Sharing
Motivation
  Multiple VMs running same OS, apps
  Collapse redundant copies of code, data, zeros
Transparent page sharing
  Map multiple PPNs to single MPN copy-on-write
  Pioneered by Disco [Bugnion 97], but required guest OS hooks
Content-based sharing
  General-purpose, no guest OS changes
  Background activity saves memory over time
-
Page Sharing: Scan Candidate PPN
[Figure: VMs 1, 2, and 3 above machine memory; a candidate PPN's contents (011010110101010111101100...) are hashed (hash ...2bd806af) and looked up in a hash table, which records a hint frame: Hash 06af, VM 3, PPN 43f8, MPN 123b]
-
Page Sharing: Successful Match
[Figure: VMs 1, 2, and 3 above machine memory; after a successful match, the hash table entry becomes a shared frame: Hash 06af, Refs 2, MPN 123b]
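The scan-and-match flow can be sketched in Python as below, under several assumptions:

- SHA-1 from `hashlib` stands in for whatever hash the hypervisor actually uses.
- A dictionary stands in for the hash table.
- `PageSharing`, `map_page`, and `scan` are hypothetical names.

The first page with a given hash is recorded only as a hint. When a later page's contents match exactly, it is remapped to the same MPN and both mappings are marked copy-on-write. The duplicate machine page is then freed. Breaking sharing on a guest write is not modeled.

```python
import hashlib


class PageSharing:
    """Toy content-based page sharing with hint and shared frames."""

    def __init__(self):
        self.table = {}    # content hash -> frame record
        self.pmap = {}     # (vm, ppn) -> mpn
        self.memory = {}   # mpn -> page contents (stands in for machine memory)
        self.cow = set()   # (vm, ppn) mappings marked copy-on-write
        self.freed = []    # MPNs reclaimed by sharing

    def map_page(self, vm, ppn, mpn, data):
        self.pmap[(vm, ppn)] = mpn
        self.memory[mpn] = data

    def scan(self, vm, ppn):
        mpn = self.pmap[(vm, ppn)]
        data = self.memory[mpn]
        key = hashlib.sha1(data).hexdigest()
        frame = self.table.get(key)
        if frame is None:
            # First page with this hash: record a hint, share nothing yet.
            self.table[key] = {"hint": (vm, ppn), "mpn": mpn, "refs": 0}
            return "hint"
        if frame["mpn"] == mpn:
            return "already"  # this page is the hint or is already shared
        if self.memory[frame["mpn"]] != data:
            if frame["refs"] == 0:
                # Stale hint: that page changed after it was hashed; replace it.
                self.table[key] = {"hint": (vm, ppn), "mpn": mpn, "refs": 0}
                return "hint"
            return "no match"  # hash collision with a shared frame
        # Full comparison succeeded: share the page copy-on-write.
        if frame["refs"] == 0:
            self.cow.add(frame["hint"])  # promote the hint to a shared frame
            frame["refs"] = 1
        self.pmap[(vm, ppn)] = frame["mpn"]
        self.cow.add((vm, ppn))
        del self.memory[mpn]
        self.freed.append(mpn)
        frame["refs"] += 1
        return "shared"
```

In the usage below, VM 3's PPN 43f8 backed by MPN 123b comes from the figures. The second page and its numbers are made up. After the second scan, both PPNs map to MPN 123b with a reference count of 2.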