
1

Chapter 10 Virtual Memory

2

Outline

• Background
• Demand Paging
• Process Creation
• Page Replacement
• Allocation of Frames
• Thrashing
• OS Examples

3

Background

• Memory management in Chapter 9
– Place the entire logical address space into the physical address space
– Exception -- overlays and dynamic loading, handled by the programmer
• Virtual memory – allows the execution of processes that may not be completely in memory
– Only part of the program needs to be in memory for execution
– Separates the user logical address space (memory) from the physical address space (memory)
– The logical address space can be much larger than the physical address space
– Needs to allow pages to be swapped in and out

4

Does the entire process need to be in memory for execution…?

• Code to handle unusual error conditions
• Arrays, lists, and tables are often allocated more memory than they actually need
• Certain options and features of a program may be used rarely
• Even when the entire program is needed, it may not all be needed at the same time

5

Benefits of executing a program partially in memory…

• A program can be larger than the physical memory
– Programmers no longer need to worry about the amount of physical memory available
• Increase the level of multiprogramming
– Increase the CPU utilization and throughput, without increasing the response time or turnaround time (really?)
• Less I/O would be needed to load or swap each user program into memory

6

A large VM when only a small physical memory is available

[Figure: a large virtual memory mapped onto a small physical memory. Pages resident in physical memory are accessed directly; a page not in physical memory must be brought in, and if physical memory is full a frame must be swapped out (page replacement).]

7

Virtual Memory

• Implementation
– Demand paging
– Demand segmentation
• Segment-replacement algorithms are more complex because segments have variable sizes

8

10.2 Demand Paging

9

Basic Concepts

• Similar to paging + swapping
– Processes reside on secondary memory until needed for execution
• Swap: move the entire process in/out of memory (swapper)
• Demand paging: never swap a page into memory unless it will be needed (lazy swapper, or pager)
– The pager guesses which pages will be used before the process is swapped out again, and brings only those necessary pages into memory
• Demand paging benefits
– Less I/O needed
– Less memory needed
– Faster response
– More processes
• A page is needed when there is a reference to it
– invalid reference ⇒ abort
– not-in-memory ⇒ bring to memory

10

Transfer of a paged memory to contiguous disk space

11

Hardware Support for Demand Paging

• A valid–invalid bit is associated with each page table entry
– 1 ⇒ legal and in memory; 0 ⇒ not in memory or illegal
• Initially the valid–invalid bit is set to 0 on all entries
• Marking a page invalid has no effect if the process never attempts to access that page
• During address translation, if the valid–invalid bit in the page table entry is 0 ⇒ page-fault trap
– Illegal reference?
– Legal but not in memory?

12

Page table when some pages are not in main memory…

[Figure: a page table with some entries marked invalid – an access to such an entry is either an illegal access or a legal reference to a page in the backing store. The OS puts the process in the backing store when it starts executing.]

13

Page-Fault Trap

• The first reference to a page that has not yet been brought into memory traps to the OS ⇒ page fault
• The OS looks at an internal table (kept with the PCB) to decide:
– Invalid reference ⇒ abort
– Just not in memory ⇒ page it in
• Get a free frame from the free-frame list
– What if there are no free frames? ⇒ Page Replacement
• Swap the page into the frame (schedule a disk operation)
• Reset the tables (internal and page tables), set valid bit = 1
• Restart the instruction (needs architecture support)
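A minimal Python sketch of this control flow; every structure and helper name here (internal_table, page_table, free_frame_list, select_victim) is invented for illustration, not taken from any real kernel:

    # Hypothetical per-process table kept with the PCB, plus a free-frame list.
    free_frame_list = [7, 8, 9]

    def select_victim(page_table):
        raise NotImplementedError      # page-replacement algorithms: see 10.4

    def handle_page_fault(internal_table, page_table, page_no):
        entry = internal_table[page_no]
        if not entry["legal"]:
            raise MemoryError("invalid reference -> abort")
        # Get a free frame, or run page replacement if none is free.
        frame = free_frame_list.pop() if free_frame_list else select_victim(page_table)
        # (Here the OS would schedule a disk read of the page into the frame.)
        page_table[page_no] = {"frame": frame, "valid": 1}   # reset tables
        # Finally the faulting instruction is restarted (architecture support).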

14

Steps in handling a page fault

15

Restart any instruction after a page fault…

• ADD A, B, C
– Fetch and decode the instruction (ADD)
– Fetch A
– Fetch B
– Add A and B
– Store the sum in C

The key is how to keep the original state for restarting… Need architecture support.

16

What happens if there is no free frame?

• Page replacement – find some page in memory that is not really in use, and swap it out
– Algorithm?
– Performance – want an algorithm that results in the minimum number of page faults
• The same page may be brought into memory several times
– Page in ⇒ page out ⇒ page in … ⇒ page out

17

More about Demand Paging…

• Pure demand paging
– Never bring a page into memory until it is required
– Start executing a process with no pages in memory
• Locality of reference
– Results in reasonable performance from demand paging
• Hardware support: same as paging and swapping
– Page table
– Secondary memory: holds the pages not in main memory
• Swap space or backing store
• Must be fast

18

Performance of Demand Paging

• Page fault rate p: 0 ≤ p ≤ 1.0
– if p = 0, no page faults
– if p = 1, every reference is a fault
• Effective Access Time (EAT)

EAT = (1 – p) × memory access
      + p × (service page-fault trap
             + [swap page out]
             + read page in
             + restart overhead)

19

Performance of Demand Paging (Example)

• Memory access time = 100 nanoseconds
• Average page-fault service time = 25 milliseconds
• EAT = (1 – p) × 100 + p × (25 milliseconds)
      = (1 – p) × 100 + p × 25,000,000
      = 100 + 24,999,900 × p
• p = 0.001 ⇒ EAT ≈ 25 microseconds (a 250× slowdown)
• EAT = 110 ns (a 10-percent slowdown) requires p < 0.0000004
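The arithmetic on this slide can be checked directly; a small Python sketch reproducing the numbers:

    memory_access_ns = 100
    fault_service_ns = 25_000_000              # 25 milliseconds in ns

    def eat(p):
        return (1 - p) * memory_access_ns + p * fault_service_ns

    print(eat(0.001))        # ~25,100 ns, i.e. about 25 microseconds
    # For at most a 10% slowdown, solve 100 + 24_999_900 * p <= 110:
    print(10 / 24_999_900)   # p must stay below about 4e-7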

20

10.3 Process Creation

21

Process Creation

• Virtual memory allows other benefits during process creation:
– Copy-on-Write
– Memory-Mapped Files

22

Copy-on-Write

• Copy-on-Write (COW) allows both parent and child processes to initially share the same pages in memory
– Their page tables point to the same frames

• If either process modifies a shared page, only then is the page copied (Copy-on-Write).

• COW allows more efficient process creation as only modified pages are copied.

• Free pages are allocated from a pool of zeroed-out pages.

23

Memory-Mapped Files

• Memory-mapped file I/O allows file I/O to be treated as routine memory access by mapping a disk block to a page in memory.

• A file is initially read using demand paging. A page-sized portion of the file is read from the file system into a physical page.

• Subsequent reads/writes to/from the file are treated as ordinary memory accesses.

• Simplifies file access by treating file I/O as memory access rather than through read() and write() system calls.

• Also allows several processes to map the same file, allowing the pages in memory to be shared.

24

Memory Mapped Files

25

10.4 Page Replacement

26

Introduction

• Prevent over-allocation of memory by modifying page-fault service routine to include page replacement.

• Use modify (dirty) bit to reduce overhead of page transfers – only modified pages are written to disk.

• Page replacement completes separation between logical memory and physical memory – large virtual memory can be provided on a smaller physical memory.

27

Need for Page Replacement

28

Basic Page Replacement

• Find the location of the desired page on the disk
• Find a free frame
– If there is a free frame, use it
– If there is no free frame, use a page-replacement algorithm to select a victim frame
– Write the victim to the disk; change the page and frame tables accordingly
• Read the desired page into the newly freed frame
• Update the page and frame tables
• Restart the user process

29

Page Replacement

30

Two Major Problems to Implement Demand Paging

• Frame-allocation algorithm
– How many frames to allocate to each process
• Page-replacement algorithm
– Select the frames that are to be replaced
– Want the lowest page-fault rate
– Evaluate an algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults on that string
• Reference string: the string of memory references
– 0100, 0432, 0101, 0612, 0102, 0103, 0104, 0101, 0611, 0102, 0103, 0104, 0101, 0610, 0102, 0103, 0104, 0101, 0609, 0102, 0105
– With page size = 100, this reduces to: 1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1
– Reference string used below: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

31

Ideal graph of page faults vs. the number of frames…

32

First-In-First-Out (FIFO) Algorithm

33

FIFO Page Replacement

34

Belady’s Anomaly

The page-fault rate may increase as the number of allocated frames increases
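Belady's anomaly is easy to reproduce with a small FIFO simulation; the following Python sketch (the function and variable names are mine) counts faults for the reference string used in this chapter:

    from collections import deque

    def fifo_faults(refs, nframes):
        """Count page faults under FIFO replacement."""
        frames, order, faults = set(), deque(), 0
        for page in refs:
            if page in frames:
                continue
            faults += 1
            if len(frames) == nframes:
                frames.discard(order.popleft())   # evict the oldest page
            frames.add(page)
            order.append(page)
        return faults

    refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
    print(fifo_faults(refs, 3))   # 9 faults
    print(fifo_faults(refs, 4))   # 10 faults -- more frames, more faults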

35

Optimal Algorithm

• Replace the page that will not be used for the longest period of time
• How do you know this? Requires future knowledge
• Used for comparison studies (like SJF)
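Because OPT needs the future reference string, it is only computable offline; a sketch in the same style as the FIFO simulation above, for comparison studies:

    def opt_faults(refs, nframes):
        """OPT/MIN: evict the page whose next use lies farthest in the future."""
        frames, faults = set(), 0
        for i, page in enumerate(refs):
            if page in frames:
                continue
            faults += 1
            if len(frames) == nframes:
                future = refs[i + 1:]
                def next_use(p):
                    return future.index(p) if p in future else len(future)
                frames.discard(max(frames, key=next_use))
            frames.add(page)
        return faults

    print(opt_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))   # 7 faults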

36

Optimal Algorithm

37

Least Recently Used (LRU) Algorithm

• Replace the page that has not been used for the longest period of time
– Associate with each page the time of that page's last use

38

LRU Algorithm

39

LRU Implementation

• Counter or clock
– Every page-table entry has a time-of-use field; every time the page is referenced through this entry, copy the clock into the field
– When a page needs to be replaced, look at the counters to select the victim
– Requires a search of the page table to find the LRU page, and a write to memory for each memory access
– Overflow of the clock must be considered

40

LRU Implementation(Cont.)

• Stack implementation – keep a stack of page numbers in a doubly linked form:
– Page referenced:
• move it to the top
• requires 6 pointers to be changed
– No search for replacement
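The stack scheme maps naturally onto an ordered dictionary; a sketch of LRU in those terms (not the doubly linked implementation itself, but the same recency order):

    from collections import OrderedDict

    def lru_faults(refs, nframes):
        stack, faults = OrderedDict(), 0
        for page in refs:
            if page in stack:
                stack.move_to_end(page)          # referenced: move to the top
            else:
                faults += 1
                if len(stack) == nframes:
                    stack.popitem(last=False)    # bottom of the stack = LRU page
                stack[page] = True
        return faults

    print(lru_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))   # 10 faults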

41

Stack Implementation of LRU

42

LRU Approximation Algorithms

• Reference bit
– With each page associate a bit, initially = 0
– When the page is referenced, the bit is set to 1
– Replace a page whose bit is 0 (if one exists). We do not know the order, however.
• Additional-reference-bit
– Record the reference bits at regular intervals, say in an 8-bit history byte per page
– At each interval, shift the history byte right and copy the reference bit into its high-order bit
– Replace the page with the lowest number in its history bits
• e.g., 11000100 was used more recently than 01110111
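The shifting can be written as one line of bit arithmetic; a sketch (the sample history value is mine):

    def age(history, ref_bit):
        """Shift the 8-bit history right; the reference bit enters on the left."""
        return (ref_bit << 7) | (history >> 1)

    h = 0b10110011
    print(f"{age(h, 1):08b}")       # 11011001
    # Interpret history bytes as unsigned numbers to pick a victim:
    # 0b11000100 (196) was used more recently than 0b01110111 (119).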

43

LRU Approximation Algorithms (Cont.)

• Second chance (clock)
– The basic algorithm is FIFO
– Needs a reference bit
– If the page to be replaced (in FIFO order) has reference bit = 1, then:
• Set the reference bit to 0
• Leave the page in memory (give it a second chance)
• Consider the next page (in FIFO order), subject to the same rules

44

The Clock Policy (Second Chance)

• The set of frames that are candidates for replacement is treated as a circular buffer
• When a page is replaced, a pointer is set to point to the next frame in the buffer
• A use bit for each frame is set to 1 whenever
– A page is first loaded into the frame
– The corresponding page is referenced
• When it is time to replace a page, the first frame encountered with the use bit set to 0 is replaced
– During the search for a replacement, each use bit set to 1 is changed to 0
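A sketch of the clock policy as just described, with the frame buffer as a Python list and the pointer as an index (the names are mine):

    def clock_faults(refs, nframes):
        slots = [None] * nframes      # circular buffer of [page, use_bit]
        where = {}                    # page -> slot index
        hand, faults = 0, 0
        for page in refs:
            if page in where:
                slots[where[page]][1] = 1         # referenced: set use bit
                continue
            faults += 1
            while slots[hand] is not None and slots[hand][1] == 1:
                slots[hand][1] = 0                # clear bit: second chance
                hand = (hand + 1) % nframes
            if slots[hand] is not None:
                del where[slots[hand][0]]         # evict the victim
            slots[hand] = [page, 1]               # use bit set on first load
            where[page] = hand
            hand = (hand + 1) % nframes
        return faults

    print(clock_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))   # 9 faults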

47

Second-Chance Page-Replacement Algorithm

48

Comparison of Clock with FIFO and LRU

• An asterisk indicates that the corresponding use bit is set to 1
• Clock protects frequently referenced pages by setting the use bit to 1 at each reference

49

Comparison of Clock with FIFO and LRU

• Numerical experiments tend to show that the performance of Clock is close to that of LRU
• Experiments have been performed with the number of frames allocated to each process fixed, and with only pages local to the page-fault process considered for replacement
– When few (6 to 8) frames are allocated per process, there is almost a factor of 2 in page faults between LRU and FIFO
– This factor drops close to 1 when several (more than 12) frames are allocated (but then more main memory is needed to support the same level of multiprogramming)

50

51

LRU Approximation Algorithms (Cont.)

• Enhanced second chance: reference bit + modify bit
– Replace the first page encountered in the lowest nonempty class
• 1: (0,0) neither recently used nor modified
• 2: (0,1) not recently used but modified
• 3: (1,0) recently used but not modified
• 4: (1,1) recently used and modified

52

Counting-Based Algorithms

• Keep a counter of the number of references that have been made to each page
– LFU algorithm: replaces the page with the smallest count
– MFU algorithm: based on the argument that the page with the smallest count was probably just brought in and has yet to be used
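A sketch of LFU in the same style as the earlier simulations; whether counts survive eviction is a policy choice, and in this sketch they do:

    from collections import Counter

    def lfu_faults(refs, nframes):
        frames, counts, faults = set(), Counter(), 0
        for page in refs:
            counts[page] += 1
            if page in frames:
                continue
            faults += 1
            if len(frames) == nframes:
                # Evict the resident page with the smallest reference count.
                frames.discard(min(frames, key=lambda p: counts[p]))
            frames.add(page)
        return faults

    # MFU is the same loop with max() in place of min().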

53

10.5 Allocation of Frames

54

Minimum Number of Frames

• Each process needs a minimum number of frames
– Defined by the computer (instruction set) architecture
• Example: IBM 370 – 6 frames to handle the SS MOVE instruction:
– the instruction is 6 bytes and might span 2 pages
– 2 pages to handle the "from" operand
– 2 pages to handle the "to" operand
• Two major allocation schemes
– fixed allocation
– priority allocation

55

Fixed Allocation

• Equal allocation – e.g., if 100 frames and 5 processes, give each process 20 frames
• Proportional allocation – allocate according to the size of the process:

s_i = size of process p_i
S = Σ s_i
m = total number of frames
a_i = allocation for p_i = (s_i / S) × m

• Example: m = 64, s_1 = 10, s_2 = 127
a_1 = (10 / 137) × 64 ≈ 5
a_2 = (127 / 137) × 64 ≈ 59
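The slide's numbers, reproduced:

    sizes = {"p1": 10, "p2": 127}                 # s_i, in pages
    m = 64                                        # total frames
    S = sum(sizes.values())                       # 137
    alloc = {p: round(s / S * m) for p, s in sizes.items()}   # a_i = s_i/S * m
    print(alloc)                                  # {'p1': 5, 'p2': 59}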

56

Priority Allocation

• Use a proportional allocation scheme based on priorities rather than size
• If process P_i generates a page fault,
– select for replacement one of its frames, or
– select for replacement a frame from a process with a lower priority number

57

Global vs. Local Allocation

• Global replacement – process selects a replacement frame from the set of all frames; one process can take a frame from another.

• Local replacement – each process selects from only its own set of allocated frames.

58

10.6 Thrashing

59

Thrashing

• If a process does not have "enough" pages, the page-fault rate is very high. This leads to:
– Low CPU utilization
– The operating system thinks it needs to increase the degree of multiprogramming
– Another process is added to the system
• Think of an N-iteration loop in which each iteration needs 4 pages (1, 2, 3, 4) but only 3 frames are allocated: each reference evicts a page that is needed again almost immediately, so the process faults on page after page
• Thrashing ⇒ a process is busy swapping pages in and out

60

Thrashing Diagram

61

Locality Model

• To prevent thrashing, we must provide a process with as many frames as it needs
• Why does paging work? Locality model
– A locality is a set of pages that are actively used together
– In the previous loop example, the locality consists of 4 pages
– A process migrates from one locality to another
– If we can keep the current localities of all processes in memory ⇒ no thrashing
• Why does thrashing occur? Σ size of localities > total memory size

62

63

64

Working-Set Model

• Look at how many frames a process is actually using
• Δ ≡ working-set window ≡ a fixed number of page references
– Example: 10,000 instructions
• WSS_i (working set size of process P_i) = total number of pages referenced in the most recent Δ
– if Δ is too small, it will not encompass the entire locality
– if Δ is too large, it will encompass several localities
– if Δ = ∞, it will encompass the entire program
• D = Σ WSS_i ≡ total demand for frames
• if D > m ⇒ Thrashing
– Policy: if D > m, then suspend one of the processes
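A sketch of the definition itself (the reference string here is invented):

    def wss(refs, t, delta):
        """Pages touched in the window of the most recent delta references."""
        return set(refs[max(0, t - delta + 1): t + 1])

    refs = [1, 2, 1, 5, 7, 7, 7, 7, 5, 1]
    print(wss(refs, 4, 5))       # working set at t=4 with delta=5: {1, 2, 5, 7}
    # D = sum of WSS_i over all processes; if D > m, suspend a process.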

65

Working-Set Model Illustration

66

Keeping Track of Working Set

• Approximate with an interval timer + a reference bit
• Example: Δ = 10,000
– Timer interrupts after every 5,000 time units
– Keep in memory 2 bits for each page
– Whenever the timer interrupts, copy the reference bits and then set them all to 0
– If one of the bits in memory = 1 ⇒ page is in the working set
• Improvement: 10 bits and an interrupt every 1,000 time units

67

Page-Fault Frequency Scheme

Use page-fault frequency to establish an "acceptable" page-fault rate. If the actual rate is too low, the process loses a frame; if the actual rate is too high, the process gains a frame.

68

Process Suspension When Thrashing

• Lowest-priority process
• Faulting process
– this process does not have its working set in main memory, so it will be blocked anyway
• Last process activated
– this process is least likely to have its working set resident

69

Process Suspension

• Process with the smallest resident set
– this process requires the least future effort to reload
• Largest process
– obtains the most free frames
• Process with the largest remaining execution window

70

10.8 Other Considerations

71

Pre-paging and Page Size

• Pre-paging
– Attempt to prevent the high level of initial paging
– Keep the working set when a process is suspended
– Worthwhile only if the cost of pre-paging is less than that of the page-fault service
• Page size selection (larger pages are preferred now…)
– Internal fragmentation ⇒ favors small pages
– Page table size ⇒ favors large pages
– I/O overhead ⇒ favors large pages
– Total I/O ⇒ favors small pages
– Number of page faults ⇒ favors large pages

72

TLB Reach

• TLB reach – the amount of memory accessible from the TLB
– TLB reach = (TLB size) × (page size)
• Ideally, the working set of each process is stored in the TLB; otherwise there is a high degree of page faults
• Increasing TLB reach
– Increase the TLB size
– Increase the page size. This may lead to an increase in fragmentation, as not all applications require a large page size
– Provide multiple page sizes: allow applications requiring larger page sizes to use them without an increase in fragmentation
• Needs an OS-managed TLB
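The reach formula is plain multiplication; with assumed (not slide-given) numbers:

    tlb_entries = 64
    page_size = 4 * 1024                     # assume 4 KB pages
    print(tlb_entries * page_size)           # reach = 262,144 bytes = 256 KB
    # The same 64 entries with 4 MB pages would reach 256 MB.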

73

Program Structure

• Program structure ⇒ how to improve locality
– Array A[1024, 1024] of integer
– Each row is stored in one page
– One frame allocated
– Program 1:

  for j := 1 to 1024 do
    for i := 1 to 1024 do
      A[i,j] := 0;

  ⇒ 1024 × 1024 page faults

– Program 2:

  for i := 1 to 1024 do
    for j := 1 to 1024 do
      A[i,j] := 0;

  ⇒ 1024 page faults

74

Inverted Page Table

• IPT no longer contains complete information about the logical address space of a process, which is required to process page fault

• An external page table (one for each process) must be kept– Look like the traditional per-process page table, containing

information on where each virtual page is located– Reference only when a page fault occur no need to be quick– May page in and out of memory as necessary

75

I/O Interlocks

• What happens if a page waiting for I/O is paged out, and another page is put in the frame originally used for that page?
• Solution 1: never execute I/O to user memory
– I/O takes place only between system memory and the I/O device
– Extra copies are needed to transfer data between system memory and user memory
• Solution 2: allow pages to be locked into memory
– A locked page cannot be selected for eviction by a page-replacement algorithm

76

Reason Why Frames Used for I/O Must be in Memory

77

Locking A Page to Prevent Replacement

• Kernel memory is usually locked in memory (performance)
• Prevent replacing a newly brought-in page until it has been used at least once

78

Real-Time Processing

• VM provides the best overall utilization of a computer• VM increases overall system throughput, but individual

processes may suffer from page faults and replacements• VM is the antithesis real-time computing

– VM can introduce unexpected long-term delays in the execution of a process while pages are brought into memory

– Real-time systems almost never have virtual memory

• Solaris 2 – real-time and time-sharing processes– Allow a process to tell it which pages are important to that process

(“hint” on page use so that important pages may not be paged out)– Privileged users can lock pages