Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged...

41
VIRTUAL MEMORY CS6410 9/22/2011 DAN DENG

Transcript of Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged...

Page 1: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

VIRTUAL MEMORY

CS6410

9/22/2011

DAN DENG

Page 2: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Virtual Memory

• Separates

• Logical memory as perceived by users

• Physical memory in the system

• Provides for

• Larger programs

• Process isolation

• Memory sharing

Page 3: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Comparing Asbestos and Mach VM

• Similarities

• Underlying IPC mechanism

• Mach VM

• Simplify support for diverse architectures

• Efficiently support memory sharing

• Asbestos

• Applications specify protection policies

• Kernel enforces policies to isolate private data

Page 4: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Machine-Independent Virtual Memory

Management for Paged Uniprocessor and

Multiprocessor Architectures Richard Rashid - Chief Executive Officer, Microsoft Research (joined

Microsoft in 1991)

Lead developer of Mach

Avadis Tevanian – Chief Software Technology Officer (2003-2006) and

Senior VP of Software Engineering (1997-2003) at Apple Inc., Witness for

the United States DoJ testifying against Microsoft in United States v.

Microsoft,

William Bolosky (CMU) – Microsoft Research

David Black (CMU)

Michael Young (CMU)

David Golub (CMU)

Robert Baron (CMU)

Jonathan Chew (CMU)

Page 5: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Mach Takeaways

• Hardware-independent virtual memory (VM) is possible

• Hardware-dependent structures can be kept small

• Mach can be easily adapted to a variety of architectures

• Flaws

• Incomplete Evaluation – microbenchmarks

• Use of IPC for all tasks turned out costly

• Memory remapping between programs became more expensive

Page 6: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Context

• Successor to Accent

• Accent pioneered many novel OS concepts

• Accent limited by inability to execute UNIX applications

• Accent had strong ties to a single hardware architecture

• Explosion of hardware architectures and memory structures

• Many of them were multiprocessor systems

• Solution

• Mach – cleanly defined, UNIX-based, highly portable OS

Page 7: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Design Principles

• Support for diverse architectures

• Simplified kernel structure

• Distributed operation

• Integrated memory management and IPC

• Heterogeneous system support

Page 8: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Approach

• Simple, extensible kernel

• As little as possible inside the kernel

• What is in the kernel must be powerful

• Concentrate on communication facilities

• IPC handles all kernel requests and data movement among tasks

• System-wide protection by protecting the IPC mechanism

• Key to efficiency

• Integration of memory management and IPC

Page 9: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Abstractions

• Task

• Unit of resource allocation

• Thread

• Unit of execution

• Port

• Communication channel for IPC

• Message

• Collection of objects used in communication

• Memory Object

• Collection of data provided and managed by a task

Page 10: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Improving Portability

• Perform software memory management

• Manage all important virtual memory information in software

• Communicate with hardware using a well-defined interface

• Machine dependent code

• Implement the bare minimum to allow the kernel to run

• Export a clean interface to the software

Page 11: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Implementation

• Split hardware-independent and hardware-dependent

memory management

• Hardware-independent:

• Resident Page Table

• Address Map

• Memory Object

• Hardware-dependent:

• Pmap

• Hardware-independent parts are the driving force

Page 12: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Hardware-Independent

• Resident Page table

• Indexed by physical page number

• Entries are linked into

• Memory object list

• Memory allocation queue

• Object/offset hash bucket

• Address map

• Sorted, doubly-linked list of entries

• Maps a range of virtual addresses to area of a memory object

• Memory Object

• Resembles a UNIX file

• Managed by a pager

Page 13: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Hardware-Dependent

• Pmaps – management of physical address maps

• VAX – pmap corresponds to VAX page table

• IBM RT PC – pmap corresponds to allocated segment registers

• Pmaps are small

• Code is confined to a single module

• Interface with Mach is relatively small

• Modifying pmap requires little knowledge of Mach

• Pmap does not have to keep track of all valid mappings

Page 14: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

VM-IPC Integration

• Memory management based on use of memory objects,

which are represented by ports

• IPC in memory management

• Memory objects may reside on remote systems

• Kernel caches the contents of memory objects in local memory

• Memory-management use in IPC

• Mach passes messages by moving pointers to shared memory

objects

• Mach uses VM remapping to transfer large messages

Page 15: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Adapting Mach

• Code for VM originally ran on VAX machines

• IBM RT PC

• Approx. 3 months for pmap module

• Sequent Balance

• 5 weeks – bootable system

• Sun 3, Encore MultiMAX

Page 16: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Performance Table 7.1

The cost of various

measures of virtual

memory performance

for Mach, ACIS 4.2a,

SunOS 3.2, and 4.3bsd

UNIX

Table 7.2

Cost of compiling the

entire Mach kernel and

a set of 13C programs

on a VAX 8650 with 36

MB of memory under

both Mach and 4.3bsd

UNIX

Page 17: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Perspective

• Achieved Goals

• Sophisticated, hardware-independent VM system possible

• Good micro-benchmark performance

• Ongoing research

• Later evaluation demonstrated poor performance

Page 18: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Labels and Event Processes in the

Asbestos Operating System Petros Efstathopoulos – Researcher at Symantec, works on OS kernel

Projects, PhD from UCLA in 2008

Maxwell Krohn– Founder and Chief Scientist of OkCupid.com (built on

OKWS), Creator SFSLite and OK Web Server, PhD from MIT in 2008

Steve VanDeBogart – PhD from UCLA in 2009

Cliff Frey – CS Meng from MIT in 2005, Meraki

David Ziegler – CS Meng from MIT in 2005

Page 19: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Labels and Event Processes in the

Asbestos Operating System

Robert Morris – CS Professor at MIT, member of CSAIL, Parallel and

Distributed Operating Systems group. Creator of Chord.

Eddie Kohler – CS Professor at Harvard. Creator of Click Modular Router.

David Mazieres – CS Professor at Standard University, Secure Systems

Group. Creator of SFS and libasync.

Francs Kaashoek – CS Professor at MIT, member of CSAIL, Parallel and

Distributed Operating Systems group. Creator of Chord.

Page 20: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Asbestos Takeaways

• Asbestos operating system

• Applications use labels to define their own isolation policies

• Kernel enforces isolation policies

• Isolate processes by tracking and limiting the flow of information

• OK web server on Asbestos

• Performs comparably to Apache

• Provides better security properties than Apache

• Lessons/Flaws

• Label checking scales linearly with number of compartments

• Kernel must be invoked to update send labels

Page 21: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Context

• Problem

• Web servers and networked systems are insecure

• Lack of good primitives for security

• Monolithic code with many privileges

• Exploits: XSS, SQL Injection, Buffer Overrun

• Lack of isolation of user data

• Breaches divulge private information on a massive scale

• TJX – 45 million stolen CC numbers in 2007

• Sony PSN – $171 million in stolen accounts

Page 22: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Example The Problem

• If Bob compromises

the system, he can

access Alice's data

/submit_order.cgi

Kernel

Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

The Problem

• If Bob compromises

the system, he can

access Alice's data

/submit_order.cgi

Kernel

Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Kernel

Apache

/alice_private_data.dat

Alice

123 Main St. 4275-8204-

4009-7915

By exploiting a

software flaw, Bob can

access Alice’s data

Alice

Bob

POST

/bin/sh Write()

Page 23: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Goal

• Bob can’t access Alice's data without Alice's permission

• Alice’s and Bob’s data are isolated

• Complications

• Bugs in the applications

• Alice's data may travel through several processes

• There may be hundreds of thousands of users

• To isolate, must prevent inappropriate data flow

Page 24: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control The Problem

• If Bob compromises

the system, he can

access Alice's data

/submit_order.cgi

Kernel

Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

The Problem

• If Bob compromises

the system, he can

access Alice's data

/submit_order.cgi

Kernel

Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Kernel

Apache

/bob_private_data.dat

Bob

456 Elm St. 5829-

7640-4607-1273

/alice_private_data.dat

Alice

123 Main St. 4275-

8204-4009-7915

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Label data w.r.t. its

owner (contaminate

w.r.t. to its owner)

Keep track of who

the connection is for

Base access control

decisions on labels

Page 25: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Asbestos Goals

• Large-scale

• Changing population of 100,000s of users

• Efficient

• Unprivileged

• Minimum privilege required to do its job

• Isolated

• One user can’t gain inappropriate access to other users’ data

• Isolation policy

• Application-defined, OS-enforced

Page 26: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Compared to MAC

• Mandatory access control systems

• Inflexible, administrator-established policies

• Fixed number of levels and compartments

• Central authority, no privilege delegation

• Asbestos

• Applications can define flexible policies

• Almost unlimited* number of compartments

• Any application can dynamically create compartments

Page 27: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Asbestos Overview

• Processes communicate using ports

• Asbestos labels

• Each process has a send label and a receive label

• Labels determine where messages can be sent

• Contamination of receiver updated

• Event processes

• Efficiently support/isolate many concurrent users

Page 28: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Asbestos Labels

• Applications have send and receive labels

• Labels consist of strings of handles and levels

• Handle

• 61-bit # identifying a compartment

• Applications can create compartments

• Creating a compartment returns its handle

• Level

• Contamination & Privilege = level (*, 0-3)

• Example send label – { } = {A 3, B 3, 1}

• Label size linear in # compartments

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Page 29: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Contamination Levels

• Asbestos’s four levels

• 2 levels for send(1)/receive(2) defaults, 1 above & below defaults

• User messages using PS of 3

• System defaults (2) denies user-tainted messages

• Raising PR requires special privilege

• User messages using PS of 2

• System defaults (2) allows communication

• must explicitly lower the PR to deny user messages

Page 30: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Policy enforcement

• Kernel performs clearance checks on message send

• Labels form a lattice

• Partial ordering • Sender's send label ⊑ destination's receive label

• L1 ⊑ L2 iff L1(h) ≤ L2(h) for all h

• Send label updated with a least upper bound operator

Implementing Clearance Checks

• How does the clearance check work?

• Labels form a lattice

• Partial ordering

– Sender's send label must be less than or equal to the

destination's receive label

• Send label updated with a least upper bound

operator

v

v v

v

Page 31: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Return to example The Problem

• If Bob compromises

the system, he can

access Alice's data

/submit_order.cgi

Kernel

Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

The Problem

• If Bob compromises

the system, he can

access Alice's data

/submit_order.cgi

Kernel

Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Alice’s ahttpd Bob’s ahttpd cgi script

user

kernel

Send

Label

Receive

Label

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Basic Label

Example

Alice's

ahttpd

cgi script

User

Kernel

Bob's

ahttpd

Backend

DB

Send

Label

Recv

Label

Basic Label

Example

Alice's

ahttpd

cgi script

User

Kernel

Bob's

ahttpd

Backend

DB

Send

Label

Recv

Label

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

The kernel contaminates

the message with the

sender’s contamination

Basic Label

Example

Alice's

ahttpd

cgi script

User

Kernel

Bob's

ahttpd

Backend

DB

Send

Label

Recv

Label

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

The kernel validates that the

destination has clearance to

receive the contamination of

the message

Basic Label

Example

Alice's

ahttpd

cgi script

User

Kernel

Bob's

ahttpd

Backend

DB

Send

Label

Recv

Label

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

At delivery, the destination

takes on the contamination

of the message

Page 32: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Return to example The Problem

• If Bob compromises

the system, he can

access Alice's data

/submit_order.cgi

Kernel

Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

The Problem

• If Bob compromises

the system, he can

access Alice's data

/submit_order.cgi

Kernel

Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Alice’s ahttpd Bob’s ahttpd cgi script

user

kernel

Send

Label

Receive

Label

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Basic Label

Example

Alice's

ahttpd

cgi script

User

Kernel

Bob's

ahttpd

Backend

DB

Send

Label

Recv

Label

Basic Label

Example

Alice's

ahttpd

cgi script

User

Kernel

Bob's

ahttpd

Backend

DB

Send

Label

Recv

Label

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

Basic Label

Example

Alice's

ahttpd

cgi script

User

Kernel

Bob's

ahttpd

Backend

DB

Send

Label

Recv

Label

Information Flow Control

/submit_order.cgi

Kernel

Label data with its

owner (contaminate

with respect to its owner) Alice

123 Main St.

4275-8204-4009-7915

Bob

456 Elm St.

5829-7640-4607-1273

The kernel contaminates

the message with the

sender’s contamination

The kernel validates that the

destination has clearance to

receive the contamination of

the message

Page 33: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Efficient Isolation

• Threads don’t provide enough isolation

• One process per user

• Processes consume too much memory

• Solution is an event-process abstraction The code for a typical event process-based server resembles the

event-driven dispatch loop above:

1. ep checkpoint(&msg);2. if (!state.initialized) {3. initialize state(state);4. state.reply = new port();5. }6. process msg(msg, state);7. ep yield();

Intuitively, many event process entities now share the event loop,

each with its own isolated state. The two ep system calls manage

control flow transfer between events. The single “state” variable

refers to a different user in each event process.

When the base process first calls the ep checkpoint system call,

it enters the event process realm, and the base process itself will

never run again. The process is de-scheduled until a message ar-

rives on a port for which the base process, or any of its event pro-

cesses, holds receive rights. The kernel then schedules an event pro-

cess to receive that message. If a particular event process holds re-

ceive rights for the relevant port, then ep checkpoint returns in that

event process’s context, restoring its private labels, receive rights,

and memory. If, on the other hand, the base process holds the port’s

receive rights, the kernel creates a new event process and returns

in that event process’s context. The event process starts with send

and receive labels copied from the base process’s labels, no receive

rights, and no private memory pages.

When ep checkpoint returns a message in an event process’s

context, the label contamination and declassification rules are ap-

plied to that event process’s labels. The kernel makes event process

memory writes private by marking all shared pages copy-on-write,

and an event process gets receive rights for any ports it creates.

After it finishes processing its message, the event process calls

the ep yield system call. This call saves any changes to the event

process’s labels, receive rights, and memory and then suspends

the whole process, just as when the base process first called

ep checkpoint. No event process will run until another message

is available for delivery.

Event processes often make temporary modifications to mem-

ory that are useful only for the current event. To keep the kernel

from saving such memory modifications across ep yields, an event

process can call ep clean to revert a specified memory range to

the base process’s state. An event process frees all its resources,

including its kernel-maintained state, with the ep exit system call.

Event processes can execute most system calls, including send-

ing and receiving messages, allocating memory, and so forth. Event

processes’ execution states, unlike their memory states, are not iso-

lated: an event process may block indefinitely in recv, blocking the

entire process, or even exit via the process-wide exit system call.

Usage Messages delivered to a base process handle typically cor-

respond to the advent of new client processes or new client net-

work connections, exactly the situations in which it is appropriate

to create a new event process. An event process can tell it is new

by checking and setting a memory location that the base process

initializes to zero; a new event process inherits the zero, while a

re-activation of an existing event process will see its previous non-

zero write to that location. A new event process will typically allo-

cate itself a new port on which to receive messages, as in line 4 of

the above code sample. The system ensures that messages to this

port will be delivered to the current event process, which can thus

send queries to file or database servers on behalf of the current user

and later receive any replies. For some applications, newly created

event processes might exit immediately without creating a handle;

though this cannot store state between messages, it does avoid ac-

cumulating taint.

An event process has all the power of an ordinary process to

restrict its labels, for example to reflect the fact that it is processing

a specific user’s private data. In the multi-user file server example

in Section 5.2, the file server would end up contaminating an event

process’s send label with the user’s uC handle, correctly reflecting

that just the event process was contaminated.

The base process does not explicitly create event processes, nor

does it know of their existence. In fact, once it calls ep checkpoint,

the base process never executes again in user space, and there is no

way to change its memory. Different event processes are also un-

aware of each other’s existence except possibly through message-

based communication, preserving the independence and isolation

represented by per-event process labels. We plan for future work

to investigate mechanisms for event processes to selectively share

memory, subject to label checks.

6.2 Implementation

Event processes are efficient for two reasons. First, the kernel

scheduling cost is little higher than that of a single process, even

with many event processes. Second, the memory overhead of an

event process can be as low as a single page of memory—to hold

the event process’s user-level state—plus a few hundred bytes of

kernel state.

While event process memory acts like a copy-on-write copy of

the base process’s memory, the implementation is optimized for

event processes that modify very little memory. The memory state

of each dormant event process includes just a list of modified pages

and the modified pages themselves. That is, event processes do not

keep their own page tables. A running event process borrows the

base process’s page table data structure in the kernel, changing it in

exactly those places that differ.

Typically, programs scatter users’ data across the stack in addi-

tion to various places on the heap. This would lead to a relatively

large number of pages that are unnecessarily specific to each event

process. Some of these pages merely hold temporary variables, oth-

ers must persist across the processing of several messages. Storing

the non-temporary data in a contained data structure on the heap

can minimize the number of persistent pages required. This data

management technique is much more natural in event-driven pro-

gramming, which the event process system calls symbiotically en-

courage. Thus, event processes tend to use minimal private mem-

ory, and the optimization of only storing page table differences is

profitable in practice.

7 WEB SERVER DESIGN

The Asbestos Web server is based on the OKWS system for

UNIX [20]. In the original OKWS design, a demultiplexer, ok-

demux, accepts each incoming TCP connection and parses its

HTTP headers to determine what service the remote client is re-

questing. It then hands the connection off to a worker process spe-

cialized for providing that service. OKWS’s goal is to isolate ser-

vices, so that one compromised service cannot affect others. Like

its UNIX predecessor, OKWS on Asbestos also isolates services

in different worker processes, but it additionally enforces user iso-

lation within workers via event processes: a compromised worker

cannot leak one user’s information to other users.

7.1 Startup

OKWS is started by a launcher process. The launcher spawns ok-

demux, site-specific workers requested by the site operator, and two

other processes seen in Figure 1, idd and ok-dbproxy. The processes

24

Page 34: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Event Process Abstraction

• Fork memory state for each new session

• No event process will run until message is available

• Kernel scheduling cost is little higher than a single process

• Small differences anticipated, stored efficiently

• Event loop allows shared execution state

• Light weight context switches

• Event process isolates state

• Each event process is only contaminated by one user

• One process per service with one event process per user

Page 35: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Evaluation

• Asbestos web server

• Based on the OKWS

• Enforces user isolation within workers via event processes

• Apache web server

• Version 1.3.33

• Application

• Response is a string of characters whose length depends on

client’s parameters

• OKWS had comparable throughput and latency to Apache

Page 36: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Label Costs

• Size of labels in the system increases with # sessions

• IPC operations take more time as # sessions increase

• Linear performance degradation as labels increase in size

Page 37: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Perspective

• Asbestos labels make MAC (mandatory access control)

tractable

• Labels provide decentralized policy specification

• Kernel enforces information flow control

• Event processes efficiently supports many users

• The OK web server on Asbestos

• Performs comparably to Apache

• Provides better security properties than Apache

Page 38: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Backup slides

Page 39: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Delegating to user processes

• Using IPC, a task can

• Allocate a region of VM

• De-allocate a region of VM

• Set the protection status of a region of VM

• Specify the inheritance of a region of VM

• Create and manage a memory object

Page 40: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Sharing Memory

• Copy-on-write for sharing memory objects

• Shadow objects

• Rely on original object for unmodified data

• Shadow objects may also be shadowed

• Read/write sharing must use the sharing map

• Sharing map is identical in organization to address map

Page 41: Virtual memory - Cornell University · Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Richard Rashid - Chief Executive Officer,

Multiprocessor Issues

• TLBs Consistency

• Solutions

• Force interrupts to all CPUs

• Wait until timer interrupt

• Temporarily allow inconsistency