Operating Systems & Virtual Machines
Xen and the Art of Virtualization
PhD Program, Seminars on “Advances in Operating Systems”
Dott. Luca Veraldi, PhD Student in Computer Science
12th July 2006
Agenda
• Why Virtualization
  • challenges
• Virtualization Concepts
• Xen
  • the overall picture
  • implementation details: MM, CPU, I/O, Exc/Int, Scheduling
  • administration tools
  • the cost of porting guest OS to Xen
  • performance evaluation
• VM Migration using Virtualization techniques
  • general concepts
  • implementation over Xen
  • related work
• References
Why Virtualization
• The OS is already a virtualization layer in itself
  • it virtualizes FW resources for the applications' sake
• Key concepts in the definition of OSs
  • modularity
  • maintainability
  • efficiency
  • expandability
  • isolation?
• Existing OSs pay little attention to isolation
• We would like to isolate application domains
  • drivers from the rest of the OS
  • OS instances from one another
[Figure: layer stack of Firmware, OS and Applications; the OS provides virtualized Firmware access]
Why Virtualization - Challenges
• Either a matter of security or of performance
  • isolating applications of mutually untrusting users
  • the problem of device drivers
    • (untrusted) code running in privileged mode
    • potentially complete control over OS data structures
    • critical, complex and bug-prone
  • cutting down server management activities
    • isolation avoids unexpected configuration interactions between services
• List of desiderata:
  • allow multiple OS instances (domains or Virtual Machines)
  • isolated physical memory for each domain
  • restricted/verified privileged behavior
  • isolated devices
  • verified DMA accesses
  • performance isolation:
    • the execution of one domain must not affect the performance of another
  • generality
    • support for existing binaries is fundamental
    • support for as wide a variety of existing OSs as possible
    • unchanged guest OS?
  • efficiency:
    • virtualization techniques must not introduce significant overhead
• Some of these issues are already targeted by µ-kernels and Exokernels
Virtualization Concepts
• A further virtualization layer in between the OS and the FW: the VMM
• Allows for multiple concurrent OS instances
  • modern PCs are powerful enough to create the illusion of several OS virtual machines running simultaneously
• Allows for OS migration
• Two different scenarios:
  • full virtualization
    • there is a complete functional ordering between layers
    • full abstraction of the machine (from BIOS to disks, DMA controllers, video…)
    • virtualization is fully transparent: guest OS unchanged
    • much more complex to design and implement
    • VMware: overheads due to shadow page tables
[Figure: several guest OSs, each with its own Apps, on top of the VMM, which runs on the FW; the interface of the FW is fully abstracted]
Virtualization Concepts
• A further virtualization layer in between the OS and the FW: the VMM
• Allows for multiple concurrent OS instances
  • modern PCs are powerful enough to create the illusion of several OS virtual machines running simultaneously
• Allows for OS migration
• Two different scenarios:
  • para-virtualization
    • not a strictly hierarchical ordering between layers
    • the virtualized interface is similar to the FW interface, but neither complete nor identical
    • guest OSs must be modified to become VM-aware
    • there is a potential gain in performance, due to specialization of kernel code
    • easier to design
    • but the interfaces must be thought out carefully
[Figure: several guest OSs, each with its own Apps, on top of the VMM, which runs on the FW; annotation: "this interface is much more critical, now"]
Xen – The overall picture
• Para-virtualized approach
  • non-pure hierarchy between OS and VMM
  • we export both the real FW and the intermediate VMM abstractions to the layers above
    • speeds up performance, reduces the levels of interpretation
• Isolate the VMM layer from the OS
  • use protection levels for ASM instructions (a.k.a. Intel protection rings)
  • rings 0 to 3: applications at ring 3, OS at ring 0
  • modify the guest OS to run in the less privileged ring 1
  • privileged operations performed by the VMM in ring 0
  • if the FW does not provide enough rings, run the OS within the same protection ring as the applications
[Figure: Apps at ring 3, guest OSs at ring 1, VMM at ring 0; system calls, signals and events between Apps and OS; hypercalls and events between OS and VMM; the OS schedules processes, the VMM schedules Virtual Machines]
Xen – Implementation details
• Memory Management virtualization
  • most critical aspect, heavy intervention on guest OSs
  • much more difficult due to the x86 architecture
    • TLB misses are handled directly at the FW level
    • process relocation tables (page tables) must be available at the FW level
[Figure: OS above the FW (CPU, MMU, RAM), with per-process relocation tables P0 RelocTable and P1 RelocTable]
• the guest OS continues to manage its own relocation tables
• relocation tables need to be verified by Xen at creation time
• they remain read-only for the OS
• Xen resides at the topmost entries
  • which are reserved and not used by the OS
  • to avoid TLB flushes on hypercalls
Xen – Implementation details
• Process creation
  • the guest OS requests a new relocation table from Xen
  • relocation tables are augmented to include the pages mapped by Xen
  • Xen registers the new relocation table and acquires exclusive write access
  • all updates from the OS will cause page faults, so that Xen can verify the update request
[Figure: OS above the VMM above the FW (CPU, MMU, RAM), with relocation tables P0 RelocTable and P1 RelocTable; labels: protection verification (in the VMM), protection fault]
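The verification step described above can be sketched as follows. This is only a minimal model, not Xen's actual interface: the class `Hypervisor`, the method `mmu_update`, the constant `XEN_RESERVED_START` and the two checks are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical boundary: virtual pages at or above it belong to the hypervisor.
XEN_RESERVED_START = 0xFC00


class ProtectionFault(Exception):
    """Raised when a guest requests a mapping it is not allowed to create."""


@dataclass
class Hypervisor:
    # physical frame number -> id of the domain that owns it
    frame_owner: dict = field(default_factory=dict)
    # (domain id, virtual page) -> frame; only the hypervisor writes here
    page_tables: dict = field(default_factory=dict)

    def mmu_update(self, domain: int, vpage: int, frame: int) -> None:
        """Validate one relocation-table update, then apply it."""
        if vpage >= XEN_RESERVED_START:
            raise ProtectionFault("virtual page is reserved for the hypervisor")
        if self.frame_owner.get(frame) != domain:
            raise ProtectionFault("frame is not owned by the requesting domain")
        self.page_tables[(domain, vpage)] = frame
```

In this model the guest never writes `page_tables` itself: every update goes through `mmu_update`, which plays the role of the write access mediated by the VMM.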
Xen – Implementation details
• CPU virtualization
  • change in the protection ring of the guest OS (0 → 1)
  • privileged instructions within the kernel
    • replace direct privileged operations within the OS kernel by hypercalls to Xen
  • scheduling of virtual machines must be efficient
    • many applications depend on timing (TCP/IP RTT, real-time services, …)
• Borrowed Virtual Time scheduling algorithm
  • addresses the low-latency constraint
  • grants efficient dispatch even in real-time contexts
  • notion of virtual time
  • possibility to borrow virtual time and get dispatch preference
  • general-purpose algorithm, not specialized for complex real-time paradigms
  • useful to address the problem of virtualization overheads in scheduling
• The notion of system time
  • guest OSs are provided with both real time and virtual time
  • timers are dispatched to guest OSs by means of events
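The core idea of Borrowed Virtual Time can be illustrated with a small simulation. It is a simplified sketch, not the scheduler shipped with Xen: the names `Domain`, `pick_next` and `run` are hypothetical, and context-switch allowances and warp time limits are left out. Each domain accumulates actual virtual time (AVT) inversely to its weight; a latency-sensitive domain may "warp", i.e. subtract a borrowed amount to obtain its effective virtual time (EVT), and the domain with the smallest EVT is dispatched.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    weight: float          # larger weight -> larger share of CPU time
    warp: float = 0.0      # amount of virtual time the domain may borrow
    warped: bool = False   # True while the domain asks for low latency
    avt: float = 0.0       # actual virtual time accumulated so far

    @property
    def evt(self) -> float:
        """Effective virtual time: AVT minus the borrowed warp, if warping."""
        return self.avt - (self.warp if self.warped else 0.0)


def pick_next(domains):
    """Dispatch the runnable domain with the smallest effective virtual time."""
    return min(domains, key=lambda d: d.evt)


def run(domains, slices, slice_len=1.0):
    """Simulate `slices` scheduling decisions and return the dispatch order."""
    order = []
    for _ in range(slices):
        d = pick_next(domains)
        d.avt += slice_len / d.weight   # heavier domains age more slowly
        order.append(d.name)
    return order
```

With two domains of weight 2 and 1, `run` gives the first one twice as many slices; setting `warped=True` on a domain lets it be dispatched ahead of others even if its AVT is larger.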
Xen – Implementation details
• I/O device virtualization
  • Xen addresses two critical issues
    • define a simplified interface for access to I/O
    • isolate drivers within their own virtual machine
• A simplified interface for I/O
  • not a novel issue
  • but one that has always ended in disaster
    • interface unioning instead of top-down simplification
    • a political matter, more than a technical one
    • device firmware continuously evolving over time
    • flexibility, extensibility
  • Xen proposes one approach, validating it through experiments
• All data transfers are passed through and verified by Xen
  • a potential performance issue
  • but…
Xen – Implementation details
• Implementing zero-copy message passing
  • for high-performance data exchange among layers
  • shared, circular communication channels
  • out-of-band data buffers
  • shared memory pages between guest OS and Xen
  • pinning of physical memory pages for DMA
  • ownership exchange upon data receipt (network, disk)
[Figure: OS, VMM and FW; pinned memory pages within the guest OS; a shared, circular communication channel carrying buffer descriptors; protection verification in the VMM; DMA bypasses the VMM; page ownership exchange]
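A shared, circular channel of buffer descriptors can be sketched as below. This is a single-process model under stated assumptions: the class `IORing`, its `post`/`take` methods and the index names `req_prod`/`req_cons` are illustrative, and the descriptor is just a tuple standing in for a reference to an out-of-band data buffer. Only descriptors travel through the ring; the data itself stays in the pinned pages.

```python
class IORing:
    """Fixed-size circular ring of buffer descriptors shared by two parties."""

    def __init__(self, size: int):
        self.slots = [None] * size
        self.size = size
        self.req_prod = 0   # advanced by the guest when it posts a request
        self.req_cons = 0   # advanced by the hypervisor when it takes one

    def post(self, descriptor) -> bool:
        """Guest side: enqueue a descriptor; return False if the ring is full."""
        if self.req_prod - self.req_cons == self.size:
            return False
        self.slots[self.req_prod % self.size] = descriptor
        self.req_prod += 1
        return True

    def take(self):
        """Hypervisor side: dequeue the oldest descriptor, or None if empty."""
        if self.req_cons == self.req_prod:
            return None
        descriptor = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return descriptor
```

Because each index is written by one side only, producer and consumer do not need to agree on anything beyond the slot layout, which is what keeps the interface small.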
Xen – Implementation details
• Exceptions and Interrupts virtualization
  • a matter of translation
  • for page faults it is trickier: the faulting virtual address is held in the privileged register CR2, readable only at ring 0
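[Figure: page-fault path across FW (CPU, MMU, RAM, relocation tables P0 RelocTable and P1 RelocTable), VMM and OS: bitp = 0 raises a page fault; register CR2 carries the faulting address; the VMM reads CR2 and saves it to a known location, then jumps to the handler code within the guest OS]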
• the handler table is registered within the FW
• the table refers to Xen code
• the privileged Xen code reads the content of the CR2 register and copies it to a known location within the guest OS
  • (the one responsible for the faulting virtual address space)
• finally, the Xen code simply jumps to the guest OS handler
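The page-fault path listed above can be modelled in a few lines. The sketch is purely illustrative: `Guest`, `saved_fault_addr`, `hypervisor_page_fault` and `guest_handler` are hypothetical names, and the CR2 register is represented by a plain integer argument.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Guest:
    # exception name -> handler code inside the guest OS
    handlers: Dict[str, Callable[["Guest"], str]] = field(default_factory=dict)
    # the "known location" where the hypervisor stores the faulting address
    saved_fault_addr: int = 0


def guest_handler(guest: Guest) -> str:
    """Guest-side handler: it reads the saved copy, never CR2 itself."""
    return f"fault at {guest.saved_fault_addr:#x}"


def hypervisor_page_fault(cr2: int, guest: Guest) -> str:
    """Hypervisor-side entry point: copy CR2 for the guest, then bounce."""
    guest.saved_fault_addr = cr2
    return guest.handlers["page_fault"](guest)
```

The point of the indirection is that the less privileged guest code gets the faulting address without ever executing the privileged read of CR2.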
Xen – Implementation details
• Exceptions and Interrupts virtualization
  • the two kinds of exceptions most frequently raised
    • page faults
    • kernel traps (software exceptions)
• A performance risk
  • the first kind necessarily requires Xen mediation
  • for the second kind, mediation can perhaps be skipped
    • directly register the guest OS exception handler table
    • subject to prior security validation by Xen, at start time
    • only for those entries
    • tables swapped every time a virtual machine is scheduled
• Interrupts are more critical
  • data transfer through shared channels
    • either directly in pinned pages within the guest OS
    • or through ownership exchange
  • no de-multiplexing in FW
  • validation by Xen
  • a matter of translation
    • from FW interrupts to Xen HP events dispatched to the guest OS
[Figure: handler registration across OS, VMM and FW: hypercall to Xen; validation (no privileged op claimed by code); modification of handler for page-fault; registration within FW]
Xen – Administration tools
• The Xen layer just performs control and protection
• Policies are left to the layers above
  • exactly as in traditional OS design
  • separation of mechanisms and policies
    • crucial mechanisms in the µ/Exo kernel
    • policies in user-space processes/library functions
• Management and administration issues
  • same as in traditional OSs
  • memory sharing among VMs
    • through physical memory partitioning, to enforce strong isolation
  • scheduling parameters
    • to control dispatching of VMs and weighted sharing of CPU time
  • creation of new VMs
    • virtual network interfaces
    • virtual block devices
• Several application-level tools to ease VMM management
Xen – The cost of porting guest OS
• The para-virtualized approach requires modifications
  • privileged operations replaced by hypercalls to Xen
  • device drivers and unified device interface
• The Linux case
  • somewhat modular structure of sources
  • the case of three-level relocation tables
  • a circumscribed intervention
  • XenoLinux
• The Windows XP case
  • a big mess
  • huge replication of code and functions
  • not yet completed
  • a really monolithic approach eventually hurts
• Numbers (lines of code modified or added)

What                        Linux    Windows XP
Architecture-independent       78          1299
Device drivers               1554         -----
Other (MM…)                  1363          3321
Portion of the whole        1.36%         -----
Xen – Performance evaluation
• We have to evaluate
  • Xen vs. other virtualization solutions
    • VMware and the like (but benchmarking restrictions…)
      • we can only say that “Xen significantly outperforms VMware”
    • User-Mode Linux
  • guest OS over Xen vs. native OS
  • multiplexing of VMs within Xen
  • performance isolation
    • with synthetic antisocial processes running alongside web servers
Migration technology: how Virtualization can help
• A typical problem for data center/cluster administrators
  • not an HP solution
  • just a server management issue
• What to migrate… a process or an OS?
  • an entire OS (a Virtual Machine within Xen) is easier
    • extremely simple OS/VMM interface
    • less prone to residual dependencies
      • a critical issue if migrating for maintenance
    • preserves OS-related abstractions
      • network connections
      • open files
    • no need to care about application-dependent approaches
• Three phases
  • pushing
    • copy address space pages to the target machine
    • the source entity is still computing
  • stop, final copying, restart
    • suspend the source entity
    • just re-transfer dirty pages
  • pulling
    • use the page fault handler to obtain missing pages
    • network connection to the source on demand
    • residual dependencies
Migration technology: how Virtualization can help
• Implementation over Xen
  • a distributed file system
  • minimize the downtime of the system
    • migrate while still computing: push + stop phases
    • no pulling phase: residual dependencies cannot be allowed when management activities are planned on the source
  • the concept of Writeable Working Set (WWS)
  • two different solutions
    • at Xen level, managed migration
    • at OS level, self migration
  • interesting performance
    • about 0.2 s of downtime when migrating the SPECweb benchmark
[Figure: source and target machines connected by a network, each running OSs with their Applications on top of a VMM; labels: VMM copy daemon, VMM stub]
Migration technology: how Virtualization can help
• Writeable Working Set (WWS)
  • an extension of the traditional Working Set in OSs
  • a (possibly large) set of pages will seldom or never be modified any more
  • useful to estimate the downtime of the VM
    • pages in the WWS contribute to the overhead of the stop-and-copy phase
• Statistics about dirtying speed during each transfer phase
  • only transfer those dirty pages that were not dirty at the previous round
  • take care of the usual (small) set of pages that will always be dirtied
    • stack
• Incremental network bandwidth utilization
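The push and stop-and-copy phases can be illustrated with a toy simulation. It is a sketch under simplifying assumptions, not the algorithm implemented in Xen: the function `precopy_migrate` and its parameters are hypothetical, the dirtying behaviour is given as input instead of being measured, and the refinement of skipping pages that were already dirty in the previous round is not modelled.

```python
def precopy_migrate(pages, dirtied_per_round, max_rounds=5, stop_threshold=2):
    """Simulate iterative pre-copy of a running VM.

    pages: set of all page numbers of the VM.
    dirtied_per_round: list of sets; entry i holds the pages the still-running
        VM writes while round i is being transferred.
    Returns (pages sent in each push round, pages sent during stop-and-copy).
    """
    to_send = set(pages)            # round 0 pushes the whole address space
    sent_per_round = []
    for rnd in range(max_rounds):
        sent_per_round.append(len(to_send))
        if rnd < len(dirtied_per_round):
            dirty = dirtied_per_round[rnd]
        else:
            dirty = set()
        to_send = set(dirty)        # next round re-sends what was dirtied
        if len(to_send) <= stop_threshold:
            break                   # small enough: suspend the VM now
    # The VM is suspended here; the pages left over determine the downtime.
    return sent_per_round, len(to_send)
```

For a VM of 100 pages that dirties 6, then 3, then 2 pages in successive rounds, the push rounds send 100, 6 and 3 pages while the VM keeps running, and only 2 pages remain for the stop-and-copy phase. The pages that keep reappearing in every round approximate the Writeable Working Set.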
References
• T.E. Anderson. The Case for Application-Specific Operating Systems. In Third Workshop on Workstation Operating Systems, pages 92-94, 1992.
• D.R. Engler, M.F. Kaashoek, J. O’Toole Jr. Exokernel: An Operating System Architecture for Application-Level Resource Management. In Proceedings of the 15th ACM Symposium on Operating Systems Principles, pages 251-266, 1995.
• P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfield. Xen and the Art of Virtualization. In Proceedings of the ACM Symposium on Operating Systems Principles, 2003.
• K. Fraser, S. Hand, R. Neugebauer, I. Pratt, A. Warfield, M. Williamson. Safe Hardware Access with the Xen Virtual Machine Monitor. In Proceedings of the 1st Workshop on Operating System and Architectural Support for On-Demand IT Infrastructure, 2004.
• K.J. Duda, D.R. Cheriton. Borrowed Virtual Time (BVT) Scheduling: Supporting Latency-Sensitive Threads in a General-Purpose Scheduler. In Proceedings of the 17th ACM SIGOPS Symposium on Operating Systems Principles, pages 261-276, 1999.
• T. Abels, P. Dhawan, B. Chandrasekaran. An Overview of Xen Virtualization.
• C. Clark, K. Fraser, S. Hand, J.G. Hansen, E. Jul, C. Limpach, I. Pratt, A. Warfield. Live Migration of Virtual Machines. In Proceedings of the ACM/USENIX Symposium on Networked Systems Design and Implementation, 2005.