Lecture 14: Advanced Conformational Sampling Dr. Ronald M. Levy Statistical Thermodynamics.
IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M....
-
Upload
lee-berenice-hamilton -
Category
Documents
-
view
216 -
download
0
Transcript of IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M....
![Page 1: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/1.jpg)
IMPROVING THE RELIABILITY OF COMMODITY OPERATING
SYSTEMS
Michael M. SwiftBrian N. BershadHenry M. LevyUniversity of Washington
![Page 2: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/2.jpg)
Key Idea
• Kernel extensions are a major source of system failures
• Nooks isolates extensions within lightweight protection domains inside the kernel address space
• Requires little or no changes to extension and kernel code
• Still prevents most system failures
![Page 3: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/3.jpg)
INTRODUCTION
• Want to allow existing kernel extensions to execute safely in commodity kernels because– Computer reliability remains an
unsolved issue– Kernel extensions are increasingly
popular in Windows and Linux– Account for most system failures (85%
for Windows XP)
![Page 4: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/4.jpg)
Why?
• Programmers writing device drivers are often less experienced than kernel programmers
• Number of variants prevent us from testing them completely
![Page 5: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/5.jpg)
The Nooks approach
• Works with commodity operating systems
• Requires little or no changes to extension and kernel code
• Prevents most but not all system failures
• Lets kernel extensions reside in the kernel address space
• Solution is practical, backward-compatible and efficient
![Page 6: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/6.jpg)
PREVIOUS WORK (I)
• Capability-based architectures, ring and segment architectures:
– Enable construction and isolation of privileged subsystems.
– Slow and do not address recovery issues
![Page 7: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/7.jpg)
PREVIOUS WORK (II)
• Microkernels:– Put extensions into separate address
spaces– Slow and do not address recovery
issues• Database recovery techniques:
– Atomic transactions– Work well for file system – Often awkward and slow
![Page 8: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/8.jpg)
PREVIOUS WORK (III)
• Type-safe programming languages:– Must rewrite the kernel
• Static analysis of extensions:– Can detect some errors
![Page 9: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/9.jpg)
• Virtual machines:– Can reduce the amount of code that
can crash whole machine– Fails if extension executes inside the
virtual machine monitor
PREVIOUS WORK (III)
VM
VM Monitor
Hardware
VM VM
![Page 10: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/10.jpg)
Required modifications
Approach Hardware OS ExtensionsCapabilities yes yes yesMicrokernels no yes yesLanguages no yes yesNew Driver Arch. no yes yesTransactions no no yesVirtual Machines no no noStatic Analysis no no noNooks no no no
![Page 11: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/11.jpg)
NOOKS
• Design for fault-resistance, not fault-tolerance:– System must prevent and recover from
most extension mistakes– Occupy middle ground between
unprotected (Linux, Windows) and safe (SPIN, Java VM)
• Design for mistakes, not abuse:– Exclude malicious behavior
![Page 12: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/12.jpg)
Nooks goals
• Isolation:– Isolate kernel from extension failures– Detect extension failures before they
corrupt kernel• Automatic recovery from extension
failures• Backward compatibility with existing
systems and extensions
![Page 13: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/13.jpg)
Nooks functions
• New system reliability layer:the Nooks Isolation Manager (NIM)
OS Kernel
Kernel extensions
Isolation Interposition Object Tracking Recovery
![Page 14: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/14.jpg)
Nooks isolation mechanisms
• Every extension executes within its own lightweight kernel protection domain
• Communication between kernel and extensions must go through new kernel service, the Extension Procedure Call (XPC)– Similar to an RPC between two
processes located on the same machine (Lightweight RPC)
![Page 15: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/15.jpg)
Lightweight protection domains
• Provide protection by having extensions execute with a different page table giving them – Read/write access to their own pages– Read only access to other kernel
pages
![Page 16: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/16.jpg)
Lightweight protection domains
• Lightweight solution because extensions execute in kernel mode– A malicious extension could switch
back to the kernel’s page table– Another could misuse DMA
![Page 17: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/17.jpg)
Nooks interposition mechanisms
• Ensure that– All extension-to-kernel and kernel-to-
extension control flow goes through Nooks XPC mechanism
– All data transfers between kernel and extension go through Nooks object tracking code
• All interfaces are done through wrappers, similar to the stubs of an RPC package
![Page 18: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/18.jpg)
Nooks object tracking functions
• Maintain a list of kernel data structures accessed by each extension
• Control all modifications to these structures
• Provide object information for cleanup if object fails
![Page 19: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/19.jpg)
Nooks object tracking functions
• Extensions cannot directly modify kernel data structures
• Object tracking code will:– Copy kernel data structures into
extension address space– Copy them back after changes have
been applied– Perform checks whenever possible
![Page 20: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/20.jpg)
Nooks recovery functions
• Detect and recover from various extension faults:– When an extension improperly invokes
a kernel function– When processor raises an exception
![Page 21: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/21.jpg)
Nooks recovery functions
• Recovery helped by Nooks isolation mechanisms:– All access to kernel structures are
done through wrappers– Nooks can successfully release
extension-held kernel structures
![Page 22: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/22.jpg)
IMPLEMENTATION
• Inside Linux 2.4.18 kernel on Intel x86 architecture
• Linux kernel provides– over 700 functions callable by
extensions– over 650 extension-entry functions
callable by the kernel• Most interactions between kernel and
extensions go through function calls
![Page 23: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/23.jpg)
Nooks
Linux Kernel
Nooks Isolation Manager
Interposition
Extension
Interposition
Extension
Extensions are wrapped by Nooks wrapper stubs
![Page 24: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/24.jpg)
Details
• 22,000 lines of code– Linux kernel has 2.4 million lines
• Makes no use of Intel x86 protection rings– Extensions execute at same
protection level as kernel
![Page 25: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/25.jpg)
Isolation
• Two parts:– memory management:
to implement lightweight protection domains
– extension procedure call (XPC)
![Page 26: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/26.jpg)
Memory management
• All components share the same kernel address space
• Each extension executes in a separate lightweight protection domain
• Extension has:– Read-only access to kernel– Read-write access to its own domain
![Page 27: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/27.jpg)
Lightweight protection domain
• Each lightweight protection domain has– A synchronized copy of the kernel page
table– Its own heap, a pool of stacks and other
private data structures• Changing protection domains requires a
change of page tables– Results in a TLB flush!
• No protection against DMA misuse by extensions
![Page 28: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/28.jpg)
Interposition (I)
• Done by wrapper stubs all executing in kernel protection domain:– Before the call when kernel calls an
extension– After the call when an extension calls the
kernel
• Wrappers:– Check parameters for validity– Implement call by value and result
![Page 29: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/29.jpg)
i = 1
i = 0
Passing by value and result
Caller: … i = 0; abc(&i); …
i
abc(int *k){ (*k)++;}
Variable i is increasedafter caller receivesserver’s reply
![Page 30: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/30.jpg)
Interposition (II)
• Writing wrappers is not an easy task:– Requires knowing how parameters
are used• Significant amount of wrapper sharing
among extensions– Especially when extensions
implement the same functionality
![Page 31: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/31.jpg)
Object tracking
• Records kernel objects and types manipulated by extensions
![Page 32: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/32.jpg)
Recovery
• Mostly through undoing
![Page 33: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/33.jpg)
Limitations
• Cannot prevent extensions from executing privileged instructions that would corrupt the kernel
• Cannot prevent infinite loops inside an extension
• Prevented by Linux semantics to do a thorough check of the parameters passed to the operating system
• Current implementation of recovery assumes that extensions can be killed and restarted
![Page 34: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/34.jpg)
Transparency
• “Neither the extension not the kernel is aware of the Nooks layer”– In reality, must sometimes modify a
few lines of extension code
![Page 35: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/35.jpg)
Reliability
• Tested eight extensions– Two sound card drivers– Four Ethernet drivers– A Win95 compatible file system
(VFAT)– An in-kernel Web server
• Injected 400 faults– 317 resulted in extension failures
![Page 36: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/36.jpg)
Test results
• Nooks eliminated 99% of the crashes observed with native Linux (313 out of 317)
• Nooks is slower– VFAT benchmark spent 165 s in the kernel
instead of 29.5 s for native Linux– Web server could only serve about 6,000
pages/s instead of 15,000 pages/s for native Linux
Ouch!
![Page 37: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/37.jpg)
PERFORMANCE
• Ten percent performance penalty for sound and Ethernet drivers
• Sixty percent performance penalty for in-kernel web server– (15,000 - 6,000)/15,000
![Page 38: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/38.jpg)
Recovery errors
• Big problems with VFAT extension
– 90% of attempted recoveries resulted in on-disk corruption
– VFAT extension cannot be safely killed and restarted
We should expect that!
![Page 39: IMPROVING THE RELIABILITY OF COMMODITY OPERATING SYSTEMS Michael M. Swift Brian N. Bershad Henry M. Levy University of Washington.](https://reader036.fdocuments.in/reader036/viewer/2022062516/56649e235503460f94b1067a/html5/thumbnails/39.jpg)
CONCLUSIONS
• Nooks approach focuses on achievingbackward compatibility– Cannot provide complete isolation
and fault-tolerance• Can still achieve “an extremely high
level of operating system reliability”
• Performance loss varies between 0 and 60%