OMSE 510: Computing Foundations 8: The Address Space
Transcript of OMSE 510: Computing Foundations 8: The Address Space
1
OMSE 510: Computing Foundations 8: The Address Space
Chris Gilmore <[email protected]>
Portland State University/OMSE
Material Borrowed from Jon Walpole’s lectures
2
Today
Memory Management
Virtual/Physical Address Translation
Page Tables
MMU, TLB
3
Memory management
Memory – a linear array of bytes
  Holds the O.S. and programs (processes)
  Each memory cell is named by a unique memory address
Recall, processes are defined by an address space, consisting of text, data, and stack regions
Process execution:
  The CPU fetches instructions from the text region according to the value of the program counter (PC)
  Each instruction may request additional operands from the data or stack region
4
Virtual memory management overview
What do we know about memory management?
  Processes require memory to run
  We provide the appearance that the entire process is resident during execution
We know some functions/code in processes never get invoked
  Error detection and recovery routines
  In a graphics package, functions like smooth, sharpen, brighten, etc. may never be invoked
Virtual memory - allows the execution of processes that may not be completely in memory (an extension of the paging technique from the last chapter)
5
Virtual memory overview
Goals:
  Hides physical memory from the user
  Allows a higher degree of multiprogramming (only bring in pages that are accessed)
  Allows large processes to be run on small amounts of physical memory
  Reduces the I/O required to swap processes in/out (makes the system faster)
Requires:
  Pager - pages in/out pages as required
  “Swap” space to hold processes that are only partially resident
  Hardware support to do address translation
6
Addressing memory
We cannot know ahead of time where in memory a program will be loaded!
The compiler produces code containing embedded addresses; these addresses can’t be absolute (physical) addresses
The linker combines pieces of the program, assuming the program will be loaded at address 0
We need to bind the compiler/linker-generated addresses to the actual memory locations
7
Relocatable address generation
[Figure: relocatable address generation through the toolchain (Compilation -> Assembly -> Linking -> Loading).
  Compilation: Prog P calls foo(); the call is symbolic
  Assembly:    P contains "jmp _foo"; assembling resolves it to "jmp 75" relative to address 0
  Linking:     library routines (addresses 0-100) are combined with P, so foo moves and the jump becomes "jmp 175", still assuming the program loads at 0
  Loading:     the program is loaded at address 1000, so the jump must become "jmp 1175"]
8
Address binding
Address binding: fixing a physical address to the logical address of a process’ address space
Compile-time binding: if the program location is fixed and known ahead of time
Load-time binding: if the program location in memory is unknown until load time, AND the location is then fixed
Execution-time binding: if processes can be moved in memory during execution; requires hardware support!
9
[Figure: the three binding times applied to the same program.
  Compile-time binding: the code contains "jmp 175"; the program must load at address 0
  Load-time binding: the loader rewrites the code to "jmp 1175" when the program is placed at address 1000
  Execution-time binding: the code keeps "jmp 175"; a base register holding 1000 relocates every address at run time]
10
Memory management architectures
Fixed-size allocation: memory is divided into fixed partitions
Dynamically-sized allocation: memory is allocated to fit processes exactly
11
Runtime binding – base & limit registers
A simple runtime relocation scheme: use 2 registers to describe a partition
For every address generated, at runtime:
  Compare it to the limit register (and abort if larger)
  Add it to the base register to give the physical memory address
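The per-access check can be sketched in C (a minimal model of the scheme, not any particular MMU; the function name and return convention are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of the MMU's per-access check: every logical address is
 * compared against the limit register, then relocated by the base
 * register.  Returns true and writes the physical address on success;
 * returning false models the addressing-error trap. */
bool translate(uint32_t logical, uint32_t base, uint32_t limit,
               uint32_t *physical)
{
    if (logical >= limit)       /* outside the partition: abort */
        return false;
    *physical = base + logical; /* relocate */
    return true;
}
```

With base = 1000 and limit = 300, logical address 175 maps to physical address 1175, matching the relocation example on the surrounding slides.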
12
Dynamic relocation with a base register
[Figure: process i's program-generated address is presented to the MMU, which adds the relocation (base) register value for process i (here 1000) to form the physical memory address.]
Memory Management Unit (MMU) - dynamically converts logical addresses into physical address
MMU contains base address register for running process
13
Protection using base & limit registers
[Figure: a logical address is compared against the limit register; if it is within bounds, the base register is added to form the physical memory address, otherwise an addressing-error trap occurs.]
Memory protection:
  The base register gives the starting address for the process
  The limit register limits the offset accessible from the relocation register
14
Multiprogramming with base and limit registers
[Figure: memory holding the OS plus partitions A-E, with the running process's partition described by the base and limit registers.]
Multiprogramming: a separate partition per process
What happens on a context switch?
  Store process A’s base and limit register values
  Load new values into the base and limit registers for process B
15
[Figure sequence, slides 15-23: dynamic partitioning over time.
  Memory starts as the 128K O.S. plus an 896K hole.
  P1 (320K) is loaded, leaving 576K free; P2 (224K) leaves 352K; P3 (288K) leaves 64K.
  P2 is swapped out, leaving a 224K hole; P4 (128K) is loaded into it, leaving a 96K hole.
  P1 is swapped out, leaving a 320K hole; P5 (224K) is loaded into it, leaving another 96K hole.
  Now P6 (128K) arrives: the free holes are 96K, 96K, and 64K, so P6 fits nowhere even though 256K is free in total.]
24
Swapping
When a program is running:
  The entire program must be in memory
  Each program is put into a single partition
When the program is not running:
  It may remain resident in memory
  It may get “swapped” out to disk
Over time:
  Programs come into memory when they get swapped in
  Programs leave memory when they get swapped out
25
Basics - swapping
[Figure: memory holds the operating system and processes i, j, k, m; processes are swapped in from and out to disk.]
Benefits of swapping: allows more programs to run concurrently than will fit in memory at once
26
Swapping can also lead to fragmentation
27
Dealing with fragmentation
[Figure: memory containing P3, P4, P5 with free holes of 96K, 96K, and 64K scattered between them, so P6 (128K) cannot be placed; after compaction the holes form a single 256K block, into which P6 fits.]
Compaction – from time to time shift processes around to collect all free space into one contiguous block
Placement algorithms: First-fit, best-fit, worst-fit
28
Influence of allocation policy
[Figure: the same sequence of process arrivals and departures handled by best-fit and by first-fit allocation, followed by a scan-and-compact step; the two policies leave different patterns of holes.]
29
How big should partitions be?
Programs may want to grow during execution: more room for stack, heap allocation, etc.
Problem: if the partition is too small, programs must be moved, which requires modifying the base and limit registers
  Why not make the partitions a little larger than necessary to accommodate “some” growth?
Fragmentation:
  External fragmentation = unused space between partitions
  Internal fragmentation = unused space within partitions
30
Allocating extra space within partitions
31
Managing memory
Each chunk of memory is either used by some process or unused (“free”)
Operations:
  Allocate a chunk of unused memory big enough to hold a new process
  Free a chunk of memory by returning it to the free pool after a process terminates or is swapped out
32
Managing memory with bit maps
Problem - how do we keep track of used and unused memory?
Technique 1 - bit maps
  A long bit string, with one bit for every chunk of memory
    1 = in use
    0 = free
  The size of the allocation unit influences the space required
    Example: unit size = 32 bits; bit map overhead = 1/33 ≈ 3%
    Example: unit size = 4 Kbytes; bit map overhead = 1/32,769
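A minimal C sketch of such a bit map (the unit count and function names are illustrative):

```c
#include <stdint.h>

/* One bit per allocation unit: 1 = in use, 0 = free, as on the slide.
 * 256 units tracked in a 32-byte map; sizes are illustrative. */
#define NUNITS 256
static uint8_t bitmap[NUNITS / 8];

void mark_used(int unit) { bitmap[unit / 8] |=  (uint8_t)(1u << (unit % 8)); }
void mark_free(int unit) { bitmap[unit / 8] &= (uint8_t)~(1u << (unit % 8)); }
int  is_used(int unit)   { return (bitmap[unit / 8] >> (unit % 8)) & 1; }

/* Find the start of the first run of n consecutive free units; -1 if none. */
int find_free_run(int n)
{
    int run = 0;
    for (int i = 0; i < NUNITS; i++) {
        run = is_used(i) ? 0 : run + 1;
        if (run == n)
            return i - n + 1;
    }
    return -1;
}
```

Allocation then means finding a long-enough run of zero bits and setting them, which is why larger allocation units make the scan cheaper but waste more space per allocation.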
33
Managing memory with bit maps
34
Managing memory with linked lists
Technique 2 - linked list
Keep a list of elements; each element describes one unit of memory:
  Free / in-use bit (“P = process, H = hole”)
  Starting address
  Length
  Pointer to next element
35
Managing memory with linked lists
36
Merging holes
Whenever a unit of memory is freed we want to merge adjacent holes!
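The merge step can be sketched in C over the list representation from the previous slides (field names are illustrative, and only the forward merge is shown):

```c
#include <stddef.h>

/* Each list element describes one region of memory,
 * as on the linked-list slides. */
struct mem_elem {
    char type;                 /* 'P' = process, 'H' = hole */
    unsigned start, len;
    struct mem_elem *next;
};

/* Free the region `r` and coalesce it with the following hole, if any,
 * so two adjacent holes become one.  (A full implementation would also
 * merge with the preceding hole; the freed node is left for the caller
 * to reclaim.) */
void free_and_merge(struct mem_elem *r)
{
    r->type = 'H';
    if (r->next && r->next->type == 'H') {
        struct mem_elem *h = r->next;
        r->len += h->len;      /* absorb the following hole */
        r->next = h->next;     /* unlink it */
    }
}
```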
[Slides 37-40: figure sequence stepping through the hole-merging cases, coalescing adjacent free regions in the list.]
41
Managing memory with linked lists
Searching the list for space for a new process:
  First fit - take the first hole that is big enough
  Next fit - like first fit, but start from the current location in the list (not as good as first fit)
  Best fit - find the smallest hole that will work (tends to create lots of little holes)
  Worst fit - find the largest hole, so the remainder will be big
  Quick fit - keep separate lists for common sizes
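First fit and best fit over the hole list can be sketched in C (the structure fields follow the slide: type, starting address, length, next pointer):

```c
#include <stddef.h>

/* Each list element describes one region: a process ('P') or a hole ('H'). */
struct region {
    char type;                 /* 'P' = process, 'H' = hole */
    unsigned start, len;
    struct region *next;
};

/* First fit: return the first hole at least `size` units long. */
struct region *first_fit(struct region *head, unsigned size)
{
    for (struct region *r = head; r != NULL; r = r->next)
        if (r->type == 'H' && r->len >= size)
            return r;
    return NULL;
}

/* Best fit: return the smallest hole that is still big enough. */
struct region *best_fit(struct region *head, unsigned size)
{
    struct region *best = NULL;
    for (struct region *r = head; r != NULL; r = r->next)
        if (r->type == 'H' && r->len >= size &&
            (best == NULL || r->len < best->len))
            best = r;
    return best;
}
```

Note that best fit must scan the whole list, while first fit can stop early; this is one reason first fit often performs well in practice.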
42
Fragmentation
Memory is divided into partitions; each partition has a different size
Processes are allocated space, and the space is later freed
After a while, memory will be full of small holes!
  No single free hole is large enough for a new process, even though there is enough free memory in total
  This is external fragmentation
If we allow free space within a partition, we have internal fragmentation
43
Solution to fragmentation?
Allocate memory in equal fixed-size units?
  Reduces external fragmentation problems
  But what about wasted space inside a unit due to internal fragmentation?
How big should the units be?
  The smaller the better for internal fragmentation
  The larger the better for management overhead
Can we use a unit size smaller than the memory needed by a process?
  I.e., allocate non-contiguous units to the same process?
  … but how would the base and limit registers work?
44
Using pages for non-contiguous allocation
Memory is divided into fixed-size page frames
  Page frame size = 2^n bytes
  The lowest n bits of an address specify the byte offset within a page
But how do we associate page frames with processes?
And how do we map memory addresses within a process to the correct memory byte in a page frame?
Solution:
  Processes use virtual addresses
  Hardware uses physical addresses
  Hardware support translates virtual addresses to physical addresses
45
Virtual addresses
  bit 31 ............ bit 12 | bit 11 ........ bit 0
     page number (20 bits)   |   offset (12 bits)

Example: 32-bit virtual address
  Page size = 2^12 = 4KB
  Address space size = 2^32 bytes = 4GB
Virtual memory addresses (what the process uses):
  Page number plus byte offset within the page
  The low-order n bits are the byte offset
  The remaining high-order bits are the page number
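Extracting the two fields is simple bit manipulation; a C sketch for the 4 KB page size used in this example:

```c
#include <stdint.h>

/* 32-bit virtual address, 4 KB pages: the low 12 bits are the byte
 * offset, the high 20 bits are the page number (the slide's example). */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)   /* 4096 bytes */

static inline uint32_t page_number(uint32_t va) { return va >> PAGE_SHIFT; }
static inline uint32_t page_offset(uint32_t va) { return va & (PAGE_SIZE - 1); }
```

Because the page size is a power of two, splitting an address costs only a shift and a mask, which is what makes hardware translation cheap.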
46
Physical addresses
  bit 23 ....... bit 12 | bit 11 ........ bit 0
  page frame number (12 bits) | offset (12 bits)

Example: 24-bit physical address
  Page frame size = 2^12 = 4KB
  Max physical memory size = 2^24 bytes = 16MB
Physical memory addresses (what the CPU uses):
  Page frame number plus byte offset within the page
  The low-order n bits are the byte offset
  The remaining high-order bits are the page frame number
47
Address translation
Hardware maps page numbers to page frame numbers
The memory management unit (MMU) has multiple registers for multiple pages
  Like a base register, except its value is substituted for the page number rather than added to it
Why don’t we need a limit register for each page?
48
Memory Management Unit (MMU)
49
Virtual address spaces
Here is the virtual address space (as seen by the process)
50
Virtual address spaces
[Figure: the virtual address space divided into pages 0 through N.]
The address space is divided into “pages” In x86, the page size is 4K
51
Virtual address spaces
In reality, only some of the pages are used
52
Physical memory
Physical memory is divided into “page frames” (Page size = frame size)
53
Virtual and physical address spaces
Some page frames are used to hold the pages of this process
54
Virtual and physical address spaces
Some page frames are used for other processes
55
Virtual address spaces
Address mappings say which frame has which page
56
Page tables
[Figure: the page-to-frame mappings recorded in the page table.]
Address mappings are stored in a page table in memory
One page table entry per page:
  Is this page in memory?
  If so, which frame is it in?
57
Address mappings and translation
Address mappings are stored in a page table in memory, typically one page table per process
Address translation is done by hardware (i.e., the MMU)
How does the MMU get the address mappings?
  Either the MMU holds the entire page table (too expensive)
  Or the MMU holds a portion of the page table: the MMU caches page table entries in a translation look-aside buffer (TLB)
58
Address mappings and translation
What if the TLB needs a mapping it doesn’t have?
Software-managed TLB:
  The hardware generates a TLB-miss fault, which is handled by the operating system (like interrupt or trap handling)
  The operating system looks in the page tables, gets the mapping from the right entry, and puts it in the TLB
Hardware-managed TLB:
  The hardware looks in a pre-specified memory location for the appropriate page table entry
  The hardware architecture defines where page tables must be stored in memory
59
A Simple Architecture
Page size: 4 Kbytes
Virtual addresses (“logical addresses”):
  32 bits --> 4GB virtual address space
  2^20 = 1M pages --> 20 bits for the page number
60
A Simple Architecture
  bit 31 ............ bit 12 | bit 11 ........ bit 0
     page number (20 bits)   |   offset (12 bits)

Page size: 4 Kbytes
Virtual addresses (“logical addresses”):
  32 bits --> 4GB virtual address space
  2^20 = 1M pages --> 20 bits for the page number
61
A Simple Architecture
Physical addresses:
  32 bits --> 4 Gbytes of installed memory (max)
  2^20 = 1M frames --> 20 bits for the frame number
Hardware Extensions…
62
A Simple Architecture
[Figure: the 32-bit virtual address = 20-bit page number + 12-bit offset; the page number indexes a single-level page table (1M entries), whose entry selects the page frame in physical memory.]
63
Quiz
What is the difference between a virtual and a physical address?
Why are programs not usually written using physical addresses?
64
Page tables
When and why do we access a page table?
On every instruction, to translate virtual to physical addresses?
65
Page tables
When and why do we access a page table?
  On every instruction, to translate virtual to physical addresses? NO!
  On TLB miss faults, to refill the TLB
  During process creation and destruction
  When a process allocates or frees memory?
  …
66
Translation Lookaside Buffer (TLB)
Problem: the MMU must go to the page table on every memory access!
67
Translation Lookaside Buffer (TLB)
Problem: the MMU must go to the page table on every memory access!
Solution: cache the page table entries in a hardware cache
  A small number of entries (e.g., 64)
  Each entry contains the page number plus the other fields of the page table entry
  Associatively indexed on page number
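A software model of the associative lookup (a real TLB compares all entries in parallel; the entry count and field names here are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64   /* small, as on the slide */

struct tlb_entry {
    bool     valid;
    uint32_t page;    /* virtual page number: the key */
    uint32_t frame;   /* page frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Associative lookup: hardware compares every valid entry's page
 * number against the key at once; this sketch just scans them.
 * Returns true on a hit and writes the frame number. */
bool tlb_lookup(uint32_t page, uint32_t *frame)
{
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].page == page) {
            *frame = tlb[i].frame;
            return true;
        }
    return false;   /* miss: refill from the page table */
}
```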
68
Hardware operation of the TLB
[Figure, slides 68-73: a virtual address (page number in bits 31-13, offset in bits 12-0) is presented to the TLB. The page number is the key, compared associatively against all valid entries; each entry also holds the frame number plus D/R/W/V bits. On a hit, the frame number is concatenated with the offset to form the physical address.]
74
Software operation of the TLB
What if the entry is not in the TLB?
  Go to the page table, find the right entry, and move it into the TLB
  Which TLB entry should be replaced?
Hardware TLB refill: page tables must live in a specific location and format
Software refill:
  The hardware generates a trap (TLB miss fault) and lets the OS deal with the problem
  Page tables become entirely an OS data structure!
Want to do a context switch? Must empty the TLB, or just clear its “valid” bits
75
Software operation of the TLB
What should we do with the TLB on a context switch? How can we prevent the next process from using the last process’s address mappings?
  Option 1: empty the TLB; the new process will generate faults until it pulls enough of its own entries into the TLB
  Option 2: just clear the “valid” bits; same effect
  Option 3: the hardware maintains a process-id tag on each TLB entry and compares it against a process id held in a special register on every translation
76
Page tables
Do we access a page table when a process allocates or frees memory?
77
Page tables
Do we access a page table when a process allocates or frees memory?
Not necessarily:
  Library routines (malloc) can service small requests from a pool of free memory within the process
  Only when these routines run out of space must a new page be allocated and its entry inserted into the page table
78
Page tables
When and why do we access a page table?
  On every instruction, to translate virtual to physical addresses? NO!
  On TLB miss faults, to refill the TLB
  During process creation and destruction
  When a process allocates or frees memory?
    Library routines (malloc) can service small requests from a pool of free memory within the process
    When these routines run out of space, a new page must be allocated and its entry inserted into the page table
  During swapping/paging to disk
79
Page tables
In a well-provisioned system, TLB miss faults will be the most frequently occurring event
TLB miss fault: given a virtual page number, we must find the right page table entry
  Fastest approach - index the page table using virtual page numbers
80
Page table design
Page table size depends on the page size and the virtual address length
Memory used for page tables is overhead!
  How can we save space … and still find entries quickly?
Two main ideas: multi-level page tables and inverted page tables
81
Multi-level Page Tables
[Figure, slides 82-88: a two-level page table. The virtual address splits into PT1 (10 bits), PT2 (10 bits), and offset (12 bits). PT1 indexes the top-level page table; the selected entry points to one of the second-level tables; PT2 indexes that table to find the page frame in memory.]
89
Multi-level page tables
OK, so how does this save space?
Not all pages within a virtual address space are allocated
  Not only do such pages have no page frame, that range of virtual addresses is not being used at all
  So there is no need to maintain complete information about it: intermediate page tables that would be empty are simply not needed
We could also page the page table itself
  This saves space but slows access … a lot!
90
The x86 architecture
Page size: 4 Kbytes
Virtual addresses (“logical addresses”):
  32 bits --> 4GB virtual address space
  2^20 = 1M pages --> 20 bits for the page number
91
The x86 architecture
[Figure: the x86 page table mapping: Page Directory -> Page Table -> Frame. The 32-bit virtual address splits into 10 bits (page directory index), 10 bits (page table index), and 12 bits (offset); the page directory and each page table hold 1024 entries.]
92
Inverted page tables
Problem: page table overhead increases with address space size, and page tables get too big to fit in memory!
Consider a computer with 64-bit addresses
  Assume 4-Kbyte pages (12 bits for the offset)
  Virtual address space = 2^52 pages!
  The page table would need 2^52 entries - much too large for memory!
But we only need fast access to translations for the pages that are actually in memory!
  A 256-Mbyte memory can only hold 64K 4-Kbyte pages
  So we really only need 64K page table entries!
93
Inverted page tables
An inverted page table:
  Has one entry for every frame of memory
  Tells which page is in that frame
  Is indexed by frame number, not page number!
So how can we search it?
If we have a page number (from a faulting address) and want to find its page table entry, do we do an exhaustive search of all entries?
94
Inverted page tables
An inverted page table:
  Has one entry for every frame of memory
  Tells which page is in that frame
  Is indexed by frame number, not page number!
So how can we search it? If we have a page number (from a faulting address) and want to find its page table entry, do we do an exhaustive search of all entries?
  No, that’s too slow!
  Why not maintain a hash table to allow fast access given a page number?
95
Inverted Page Table
96
Which page table design is best?
The best choice depends on CPU architecture
64 bit systems need inverted page tables
Some systems use a combination of regular page tables together with segmentation (later)
97
Page tables
A typical page table entry
98
Performance of memory translation
Why can’t memory address translation be done in software?
How often is translation done?
What work is involved in translating a virtual address to a physical address?
  Indexing into page tables
  Interpreting page descriptors
  More memory references!
99
Memory hierarchy performance
The “memory” hierarchy consists of several types of memory (typical latencies):
  L1 cache (typically on die): ~0.5 ns (1 cycle)
  L2 cache (typically available): 0.5 - 20 ns (1 - 40 cycles)
  Memory (DRAM, SRAM, RDRAM, …): 40 - 80 ns (80 - 160 cycles)
  Disk (lots of space available): 8 - 13 ms (16M - 26M cycles) - longer than you want!
  Tape (even more space available…): ~360 billion cycles
100
Performance of memory translation (2)
How can additional memory references be avoided?
  TLB - translation look-aside buffer
  An associative memory cache for page table entries
  If there is locality of reference, performance is good
101
Translation lookaside buffer
[Figure: the CPU emits (page number p, offset o). On a TLB hit, the frame number f comes straight from the TLB; on a miss, it comes from the page table. (f, o) then addresses physical memory.]
102
TLB entries
103
TLB implementation
To be fast, TLBs must implement an associative search, where the cache is searched in parallel. EXPENSIVE
  The number of entries varies (8 to 2048)
Because the TLB translates logical pages to physical pages, the TLB must be flushed on every context switch to work correctly
  Performance can be improved by associating process-id bits with each TLB entry
A TLB must implement an eviction policy that flushes old entries out of the TLB
  This occurs when the TLB is full
104
Page table organization
How big should a virtual address space be? what factors influence its size?
How big are page tables? what factors determine their size?
Can page tables be held entirely in cache? can they be held entirely in memory even?
How big should page sizes be?
105
Page size issues
Choose a large page size:
  More loss due to internal fragmentation
  Assume a process is using 5 regions of memory heavily: it will need 5 pages, regardless of page size
  ---> Ties up more memory
Choose a small page size:
  The page table becomes very large
  Example: virtual address space 4G bytes, page size 4K (e.g., Pentium)
    Page table size: 1M entries (4 Mbytes)!
106
Address space organization
How big should a virtual address space be?
Which regions of the address space should be allocated for different purposes - stack, data, instructions?
What if memory needs for a region increase dynamically?
What are segments?
What is the relationship between segments and pages?
Can segmentation and paging be used together?
If segments are used, how are segment selectors incorporated into addresses?
107
Memory protection
At what granularity should protection be implemented? Page level? Segment level?
How is protection checking implemented?
  Compare page protection bits with process capabilities and operation types on every access
  Sounds expensive!
How can protection checking be done efficiently?
  Segment registers
  Protection look-aside buffers
108
Memory protection with paging
[Figure: a page table whose entries each hold a frame number, an R/W bit, and a V/I bit; some entries are marked invalid.]
Associate protection bits with each page table entry:
  Read/write bit - can provide read-only access for re-entrant code
  Valid/invalid bit - tells the MMU whether the page exists in the process address space
Page Table Length Register (PTLR) - stores how long the page table is, to avoid an excessive number of unused page table entries
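The MMU's check against these bits amounts to the following (a simplified model; the structure and field names are illustrative):

```c
#include <stdbool.h>

/* Protection bits from the slide's page table entry. */
struct pte {
    unsigned frame;
    bool writable;   /* R/W bit: false = read-only   */
    bool valid;      /* V/I bit: false = not present */
};

/* Check an access: any access to an invalid page, or a write to a
 * read-only page, traps to the OS. */
bool access_ok(const struct pte *e, bool is_write)
{
    if (!e->valid)
        return false;              /* invalid page: fault */
    if (is_write && !e->writable)
        return false;              /* read-only violation */
    return true;
}
```

Because this check rides along with translation, it costs nothing extra on a TLB hit: the bits are cached in the TLB entry.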
109
Handling accesses to invalid pages
The page table is used to translate logical addresses to physical addresses; pages that are not in memory are marked invalid
A page fault occurs when a process accesses an invalid page
Page faults require the operating system to:
  Suspend the process
  Find a free frame in memory
  Swap in the page that caused the fault
  Update the page table entry (PTE)
  Restart the process
110
Page fault handling in more detail
Hardware traps to the kernel
General registers are saved
The OS determines which virtual page is needed
The OS checks the validity of the address and seeks a page frame
If eviction is needed and the victim frame is dirty, write it to disk
111
Page fault handling in more detail
The OS brings the new page in from disk
Page tables are updated
The faulting instruction is backed up to where it began
The faulting process is scheduled
Registers are restored
The program continues
112
Anatomy of a page fault
[Figure: logical memory holds pages A-E; the page table maps A, C, E to frames 9, 2, 5 and marks B and D invalid. On an access to an invalid page: a page fault is raised, the OS finds a free frame, gets the page from the backing store, brings the page into the frame, updates the PTE, and restarts the process.]
113
Locking pages in memory
An issue to be aware of: virtual memory and I/O occasionally interact
  A process issues a call to read from a device into a buffer
  While it waits for the I/O, another process starts up and has a page fault
  The first process’s buffer may be chosen to be paged out!
Some pages need to be locked (pinned) in memory, exempting them from being chosen as victim pages
114
Quiz
Why is hardware support required for dynamic address translation?
What is a page table used for?
What is a TLB used for?
How many address bits are used for the page offset in a system with 2KB page size?
115
Memory protection
At what granularity should protection be implemented?
  Page level? A lot of overhead for storing protection information for non-resident pages
  Segment level? Coarser grain than pages; makes sense if contiguous groups of pages share the same protection status
116
Memory protection
How is protection checking implemented?
  Compare page protection bits with process capabilities and operation types on every load/store
  Sounds expensive! Requires hardware support!
How can protection checking be done efficiently?
  Use the TLB as a protection look-aside buffer
  Use special segment registers
117
Protection lookaside buffer
A TLB is often used for more than just “translation”
Memory accesses need to be checked for validity:
  Does the address refer to an allocated segment of the address space? If not: segmentation fault!
  Is this process allowed to access this memory segment? If not: segmentation/protection fault!
  Is the type of access valid for this segment (read, write, execute …)? If not: protection fault!
118
Page-grain protection checking with a TLB
119
Segment-grain protection
All pages within a segment usually share the same protection status
  So we should be able to batch the protection information
Why not just use segment-sized pages?
  Segments vary in size
  Segments change size dynamically (stack, heap, etc.)
120
Segmentation in a single address space
Example: A compiler
121
Segmented address spaces
Traditional virtual address space: a “flat” (one-dimensional) address space
Segmented address space:
  A program is made of several “pieces”
  Each segment is like a mini address space
  Addresses within a segment start at zero
  The program must always say which segment it means: either embed a segment id in an address, or load a value into a segment register
  Addresses: segment + offset
  Each segment can grow independently of the others
122
Segmented memory
Each space grows, shrinks independently!
123
Separate instruction and data spaces
* One address space * Separate I and D spaces
124
Page sharing
In a large multiprogramming system, some users run the same program at the same time
  Why have more than one copy of the same page in memory???
Goal: share pages among “processes” (not just threads!)
Writable pages cannot be shared: if they were, processes would notice each other’s effects
The text segment can be shared
125
Page sharing
[Figure: the page tables of process 1 and process 2 map their instruction pages (rx) to the same physical frames, while each keeps private data (rw) and stack (rw) pages.]
126
Page sharing
“Fork” system call: copy the parent’s virtual address space
  … and often immediately do an “exec” system call
  Exec overwrites the calling address space with the contents of an executable file (i.e., a new program)
Desired semantics: pages are copied, not shared
Observations:
  Copying every page in an address space is expensive!
  Processes can’t notice the difference between copying and sharing unless pages are modified!
127
Page sharing
Idea: copy-on-write
  Initialize the new page table, but point its entries at the parent’s existing page frames: share the pages
  Temporarily mark all pages “read-only”; share all pages until a protection fault occurs
Protection fault (copy-on-write fault):
  Is this page really read-only, or is it writable but temporarily protected for copy-on-write?
  If it is writable: copy the page, mark both copies “writable”, and resume execution as if no fault had occurred
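The copy-on-write fault handler described above can be sketched as follows (heavily simplified: frames are plain buffers, only two sharers are modeled, and all names are illustrative):

```c
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 4096

/* A shared, temporarily read-only mapping of one page. */
struct cow_pte {
    char *frame;      /* the page frame this entry points at   */
    bool  writable;   /* false while protected for copy-on-write */
};

/* Copy-on-write fault: the faulting process gets a private copy of
 * the page in `fresh_frame`, and both mappings become writable. */
void cow_fault(struct cow_pte *faulting, struct cow_pte *other,
               char *fresh_frame)
{
    memcpy(fresh_frame, faulting->frame, PAGE_SIZE); /* copy the page  */
    faulting->frame    = fresh_frame;                /* remap to copy  */
    faulting->writable = true;                       /* mark both      */
    other->writable    = true;                       /* copies writable */
}
```

Until a write actually happens, fork costs only a page table copy; the expensive page copies are deferred to (and only paid on) the first write.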
128
On page replacement…
Paging performance: paging works best if there are plenty of free frames
If all frames hold dirty pages, we must perform 2 disk operations for each page fault: write a dirty page out, then read the needed page in
129
Page replacement
Assume a normal page table, and a user program executing
A PageInvalidFault occurs: the page needed is not in memory!
  Select some frame and remove the page in it
    If it has been modified, it must be written back to disk; the “dirty” bit in its page table entry tells us if this is necessary
  Figure out which page was needed from the faulting address
  Read the needed page into this frame
  Restart the interrupted process by retrying the same instruction
130
Page replacement algorithms
Which frame to replace?
Algorithms: The Optimal Algorithm First In First Out (FIFO) Not Recently Used (NRU) Second Chance / Clock Least Recently Used (LRU) Not Frequently Used (NFU) Working Set (WS) WSClock
131
The optimal page replacement algorithm
Idea: select the page that will not be needed for the longest time
132
Optimal page replacement
Replace the page that will not be needed for the longest time
Example (4 frames, request string c a d b e b a b c d):
  Time      0  1  2  3  4  5  6  7  8  9  10
  Requests     c  a  d  b  e  b  a  b  c  d
  Frame 0   a  a  a  a  a
  Frame 1   b  b  b  b  b
  Frame 2   c  c  c  c  c
  Frame 3   d  d  d  d  d
  Faults
(The first four requests all hit: a, b, c, d are already resident.)
133
Optimal page replacement
Select the page that will not be needed for the longest time
Example continued: at time 5, request e faults; of the resident pages, d is needed furthest in the future (time 10), so e replaces d. At time 10, d itself faults.
  Time      0  1  2  3  4  5  6  7  8  9  10
  Requests     c  a  d  b  e  b  a  b  c  d
  Frame 0   a  a  a  a  a  a  a  a  a  a  a
  Frame 1   b  b  b  b  b  b  b  b  b  b  b
  Frame 2   c  c  c  c  c  c  c  c  c  c  c
  Frame 3   d  d  d  d  d  e  e  e  e  e  d
  Faults                   X              X
134
The optimal page replacement algorithm
Idea: select the page that will not be needed for the longest time
Problem: we can’t know the future of a program
  We can’t know when a given page will be needed next
  The optimal algorithm is unrealizable
135
The optimal page replacement algorithm
However: We can use it as a control case for simulation studies. Run the program once, generate a log of all memory references, and use the log to simulate various page replacement algorithms. Other algorithms can then be compared against the "optimal" one.
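Given such a log, the optimal victim can be computed directly from the remaining reference string; a small sketch (the function name is illustrative):

```python
# Sketch of computing the optimal victim from a logged reference string.

def optimal_victim(frames, future):
    """Return the resident page whose next use is farthest away
    (or which is never used again)."""
    def next_use(page):
        try:
            return future.index(page)      # distance to next reference
        except ValueError:
            return float('inf')            # never needed again
    return max(frames, key=next_use)
```

For the trace in the example above, with frames a, b, c, d and future references b a b c d, this picks d.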
136
FIFO page replacement algorithm
Always replace the oldest page … “Replace the page that has been in memory for
the longest time.”
137
FIFO page replacement algorithm
Replace the page that was first brought into memory.
Example: Memory system with 4 frames (pages were loaded in the order c, a, d, b):
Time      0  1  2  3  4  5  6  7  8  9  10
Requests     c  a  d  b  e  b  a  b  c  a
Frame 0   a  a  a  a  a
Frame 1   b  b  b  b  b
Frame 2   c  c  c  c  c
Frame 3   d  d  d  d  d
Faults                   X
138
FIFO page replacement algorithm
Replace the page that was first brought into memory.
Example: Memory system with 4 frames (pages were loaded in the order c, a, d, b):
Time      0  1  2  3  4  5  6  7  8  9  10
Requests     c  a  d  b  e  b  a  b  c  a
Frame 0   a  a  a  a  a  a
Frame 1   b  b  b  b  b  b
Frame 2   c  c  c  c  c  e
Frame 3   d  d  d  d  d  d
Faults                   X
139
FIFO page replacement algorithm
Replace the page that was first brought into memory.
Example: Memory system with 4 frames (pages were loaded in the order c, a, d, b):
Time      0  1  2  3  4  5  6  7  8  9  10
Requests     c  a  d  b  e  b  a  b  c  a
Frame 0   a  a  a  a  a  a  a  a  a  c  c
Frame 1   b  b  b  b  b  b  b  b  b  b  b
Frame 2   c  c  c  c  c  e  e  e  e  e  e
Frame 3   d  d  d  d  d  d  d  d  d  d  a
Faults                   X            X  X
At time 5, e evicts c (the first page loaded); at time 9, c evicts a; at time 10, a evicts d.
140
FIFO page replacement algorithm
Always replace the oldest page: "Replace the page that has been in memory for the longest time."
Implementation: Maintain a linked list of all pages in memory, kept in the order in which they came into memory. The page at the front of the list is the oldest; add new pages to the end of the list.
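The linked-list scheme above can be sketched with a deque (the class name is illustrative); `access` returns True on a page fault:

```python
from collections import deque

# Sketch of the linked-list FIFO described above.

class FIFOReplacer:
    def __init__(self, nframes):
        self.nframes = nframes
        self.queue = deque()               # front = oldest page

    def access(self, page):
        if page in self.queue:
            return False                   # hit: FIFO ignores reuse
        if len(self.queue) == self.nframes:
            self.queue.popleft()           # evict the oldest page
        self.queue.append(page)            # newest goes to the end
        return True                        # page fault
```

Replaying the trace c a d b e b a b c a on an empty 4-frame system gives four compulsory faults plus the three replacement faults shown in the example.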
141
FIFO page replacement algorithm
Disadvantage: The oldest page may be needed again soon. Some page may be important throughout execution; it will get old, but replacing it will cause an immediate page fault.
142
Page table: referenced and dirty bits
Each page table entry (and TLB entry) has a Referenced bit, set by the TLB when the page is read or written, and a Dirty/modified bit, set when the page is written.
If the TLB entry for this page is valid, it holds the most up-to-date version of these bits for the page; the OS must copy them into the page table entry during fault handling.
On some hardware there is a ReadOnly bit but no Dirty bit.
143
Page table: referenced and dirty bits
Idea: Software sets the ReadOnly bit for all pages. When the program tries to update a page, a trap occurs; software sets the Dirty bit, clears the ReadOnly bit, and resumes execution of the program.
144
Not recently used page replacement alg.
Use the Referenced Bit and the Dirty Bit
Initially, all pages have Referenced Bit = 0 and Dirty Bit = 0.
Periodically (e.g., whenever a timer interrupt occurs), clear the Referenced Bit.
145
Not recently used page replacement alg.
When a page fault occurs...
Categorize each page:
Class 1: Referenced = 0, Dirty = 0
Class 2: Referenced = 0, Dirty = 1
Class 3: Referenced = 1, Dirty = 0
Class 4: Referenced = 1, Dirty = 1
Choose a victim page from class 1 … why?
If none, choose a page from class 2 … why?
If none, choose a page from class 3 … why?
If none, choose a page from class 4 … why?
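One plausible answer to the "why?" prompts: a clean, unreferenced page costs nothing to write back and is apparently not in use. The class-by-class search can be sketched as follows (pages modeled as illustrative (name, referenced, dirty) tuples, classes numbered 0-3 internally):

```python
# Sketch of NRU victim selection.

def nru_victim(pages):
    """Pick a page from the lowest non-empty class: the Referenced
    bit outweighs the Dirty bit, so a clean, unreferenced page
    (nothing to write back, apparently unused) goes first."""
    def klass(entry):
        _, referenced, dirty = entry
        return 2 * referenced + dirty      # class 1..4 as 0..3
    return min(pages, key=klass)[0]
```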
146
Second chance page replacement alg.
Modification to FIFO
Pages kept in a linked list Oldest is at the front of the list
Look at the oldest page. If its "referenced bit" is 0, select it for replacement.
Else, it was used recently and we don't want to replace it: clear its "referenced bit", move it to the end of the list, and repeat.
What if every page was used in the last clock tick? Select a page at random.
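A sketch of this loop in Python, with pages as illustrative [name, referenced_bit] pairs, oldest at the front. Note one simplification: if every page was recently referenced, this sketch degenerates to FIFO after one clearing pass rather than picking a page at random.

```python
from collections import deque

# Sketch of the second-chance selection loop.

def second_chance_victim(pages):
    while True:
        name, referenced = pages[0]        # look at the oldest page
        if not referenced:
            pages.popleft()
            return name                    # old and not recently used
        pages[0][1] = 0                    # clear its referenced bit
        pages.rotate(-1)                   # move it to the end; repeat
```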
147
Clock algorithm (same as second chance)
[Figure: a circular list of frames, numbered 0-5, each with a clock (referenced) bit; the clock hand sweeps around the ring.]
Maintain a circular list of pages in memory. Set a bit for a page when the page is referenced. The clock sweeps over memory looking for a victim page that does not have the referenced bit set; if the bit is set, clear it and move on to the next page. This replaces pages that haven't been referenced for one complete clock revolution.
148
Least recently used algorithm (LRU)
Keep track of when a page is used.
Replace the page that has been used least recently.
149
LRU page replacement
Replace the page that hasn't been referenced in the longest time.
Time      0  1  2  3  4  5  6  7  8  9  10
Requests     c  a  d  b  e  b  a  b  c  d
Frame 0   a
Frame 1   b
Frame 2   c
Frame 3   d
Faults
150
LRU page replacement
Replace the page that hasn't been referenced in the longest time.
Time      0  1  2  3  4  5  6  7  8  9  10
Requests     c  a  d  b  e  b  a  b  c  d
Frame 0   a  a  a  a  a  a  a  a  a  a  a
Frame 1   b  b  b  b  b  b  b  b  b  b  b
Frame 2   c  c  c  c  c  e  e  e  e  e  e
Frame 3   d  d  d  d  d  d  d  d  d  c  c
Faults                   X            X  X
At time 5, e evicts c (the least recently used page); at time 9, c evicts d; d then faults again at time 10.
151
Least recently used algorithm (LRU)
But how can we implement this?
Implementation #1: Keep a linked list of all pages. On every memory reference, move that page to the front of the list.
The page at the tail of the list is replaced.
Problem: "on every memory reference..." is not feasible in software.
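Ignoring the feasibility problem, implementation #1 can be sketched with an OrderedDict acting as the linked list (the class name is illustrative): the most recently used page sits at one end, and the victim is taken from the other.

```python
from collections import OrderedDict

# Sketch of LRU implementation #1.

class LRUReplacer:
    def __init__(self, nframes):
        self.nframes = nframes
        self.pages = OrderedDict()

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)   # now most recently used
            return False                   # hit
        if len(self.pages) == self.nframes:
            self.pages.popitem(last=False) # evict least recently used
        self.pages[page] = True
        return True                        # page fault
```

Preloading a, b, c, d and replaying the trace c a d b e b a b c d reproduces the three faults from the example above.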
152
LRU implementation
Take the referenced page and put it at the head of the list.
Time      0  1  2  3  4  5  6  7  8  9  10
Requests     c  a  d  b  e  b  a  b  c  d
153
LRU implementation
Take the referenced page and put it at the head of the list.
Requests: c a d b e b a b c d
List after reference c (time 1): C A B D
List after reference a (time 2): A C B D
154
LRU implementation
Take the referenced page and put it at the head of the list.
Requests: c a d b e b a b c d
List after c (time 1): C A B D
List after a (time 2): A C B D
List after d (time 3): D A C B
List after b (time 4): B D A C
155
LRU implementation
Take the referenced page and put it at the head of the list; on a fault, evict the tail.
Requests: c a d b e b a b c d
After c: C A B D
After a: A C B D
After d: D A C B
After b: B D A C
After e: E B D A   (fault: c evicted from the tail)
After b: B E D A
After a: A B E D
After b: B A E D
After c: C B A E   (fault: d evicted)
After d: D C B A   (fault: e evicted)
Three page faults in total (at times 5, 9, and 10).
156
Least recently used algorithm (LRU)
But how can we implement this? … without requiring every access to be recorded?
Implementation #2: The MMU (hardware) maintains a counter, incremented on every clock cycle. Every time a page table entry is used, the MMU writes the counter value into the entry: a "timestamp" / "time-of-last-use".
When a page fault occurs, software looks through the page table and identifies the entry with the oldest timestamp.
157
Least recently used algorithm (LRU)
What if we don’t have hardware support?
Implementation #3: No hardware support. Maintain a counter in software. On every timer interrupt:
Increment the counter, then run through the page table; for every entry that has ReferencedBit = 1, update its timestamp and clear the ReferencedBit.
This approximates LRU. If several entries have the oldest time, choose one arbitrarily.
158
Not frequently used algorithm (NFU)
Associate a counter with each page
On every clock interrupt, the OS looks at each page. If the Reference Bit is set...
Increment that page’s counter & clear the bit.
The counter approximates how often the page is used.
For replacement, choose the page with lowest counter.
159
Not frequently used algorithm (NFU)
Problem: Some page may be heavily used
---> Its counter is large
Then the program's behavior changes: now this page is not used ever again (or only rarely).
This algorithm never forgets! The page will never be chosen for replacement.
160
Modified NFU with aging
Associate a counter with each page
On every clock tick, the OS looks at each page: shift the counter right 1 bit (divide its value by 2); if the Reference Bit is set, set the most-significant bit and clear the Referenced Bit.
Example (6-bit counter for one page):
Tick 1, referenced:     100000 = 32
Tick 2, not referenced: 010000 = 16
Tick 3, not referenced: 001000 = 8
Tick 4, not referenced: 000100 = 4
Tick 5, referenced:     100010 = 34
Referenced every tick:  111111 = 63
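One clock tick of this aging step, sketched in Python; the counters and referenced maps are illustrative stand-ins for per-page hardware/OS state.

```python
# Sketch of one clock tick of NFU with aging.

def age_counters(counters, referenced, bits=6):
    msb = 1 << (bits - 1)                  # most-significant counter bit
    for page in counters:
        counters[page] >>= 1               # shift right: divide by 2
        if referenced[page]:
            counters[page] |= msb          # record the recent reference
        referenced[page] = False           # clear the Referenced bit
```

A page referenced on ticks 1 and 5 only runs through the values 32, 16, 8, 4, 34, matching the example.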
161
Paged Memory Management
Concepts….
162
Working set page replacement
Demand paging: Pages are only loaded when accessed; when a process begins, all pages are marked INVALID.
Locality of Reference: Processes tend to use only a small fraction of their pages.
Working Set: The set of pages a process needs. If the working set is in memory, there are no page faults. What if you can't get the working set into memory?
163
Working set page replacement
Thrashing: If you can't get the working set into memory, pages fault every few instructions and no work gets done.
164
Working set page replacement
Prepaging (prefetching): Load pages before they are needed.
Main idea: Identify the process's "working set".
How big is the working set? Look at the last k memory references. As k gets bigger, more pages are needed; in the limit, all pages are needed.
165
Working set page replacement
[Figure: the size of the working set plotted against k (the time interval).]
166
Working set page replacement
Idea: Look back over the last T msec of time. Which pages were referenced? This is the working set.
Current Virtual Time: only consider how much CPU time this process has seen.
Implementation: On each clock tick, look at each page. Was it referenced? If yes, make a note of the Current Virtual Time in its entry.
If a page has not been used in the last T msec, it is not in the working set: evict it, and write it out if it is dirty.
167
Working set page replacement
168
WSClock page replacement algorithm
All pages are kept in a circular list (ring)
As pages are added, they go into the ring.
The “clock hand” advances around the ring.
Each entry contains “time of last use”.
Upon a page fault... If the Reference Bit = 1: the page is in use now; do not evict. Clear the Referenced Bit and update the "time of last use" field.
169
WSClock page replacement algorithm
If the Reference Bit = 0:
If the age of the page is less than T, this page is in the working set; advance the hand and keep looking.
If the age of the page is greater than T: if the page is clean, reclaim the frame and we are done! If the page is dirty, schedule a write for the page, then advance the hand and keep looking.
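A simplified sketch of one WSClock sweep; the ring-of-dicts representation is illustrative, and a "scheduled" write is modeled by simply clearing the dirty bit.

```python
# Sketch of one WSClock sweep.

def wsclock_evict(ring, hand, now, tau):
    """Return (index of the reclaimed frame, new hand position)."""
    n = len(ring)
    for _ in range(2 * n):                 # bounded sweep of the ring
        page = ring[hand]
        if page['ref']:
            page['ref'] = False            # in use now: do not evict
            page['last_use'] = now
        elif now - page['last_use'] > tau:
            if not page['dirty']:
                return hand, (hand + 1) % n  # old and clean: reclaim
            page['dirty'] = False          # old and dirty: "schedule" write
        hand = (hand + 1) % n              # advance and keep looking
    return hand, (hand + 1) % n            # fallback: evict at the hand
```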
170
Summary
171
Theory and practice
Keep the number of free pages between two thresholds: low water mark < # free pages < high water mark.
Identifying a victim frame on each page fault typically requires two disk accesses per page fault.
Alternative: the O.S. can keep several pages free in anticipation of upcoming page faults. In Unix: low and high water marks.
172
Free pages and the clock algorithm
The rate at which the clock sweeps through memory determines the number of pages that are kept free: too high a rate means too many free pages are marked; too low a rate means not enough (or no) free pages are marked.
Large memory system considerations: as memory systems grow, it takes longer and longer for the hand to sweep through memory, which washes out the effect of the clock somewhat. A two-handed clock can be used to reduce the time between the passing of the hands.
173
The UNIX memory model: UNIX page replacement uses a two-handed clock algorithm.
If a page has not been accessed, move it to the free list for use as an allocatable page: if modified/dirty, write it to disk (but still keep the contents in memory); if unmodified, just move it to the free list.
High and low water marks govern the number of free pages.
Pages on the free list can be re-allocated if they are accessed again before being overwritten.
174
Modeling page replacement: Run a program and record all memory references. We don't need all this data; look only at which pages are accessed:
0000001222333300114444001123444
Eliminating consecutive duplicates gives 012301401234.
This is a Reference String. Use it to evaluate different page replacement algorithms.
175
Belady's anomaly: If you have more page frames (i.e., more memory), you will have fewer page faults, right?
Not always!
Consider FIFO page replacement with this reference string: 012301401234
Case 1: 3 frames available --> 9 page faults
Case 2: 4 frames available --> 10 page faults
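Both cases are easy to reproduce with a few lines of Python (a plain-list FIFO, for illustration only):

```python
# Reproduce Belady's anomaly with a simple FIFO fault counter.

def fifo_faults(refs, nframes):
    frames, faults = [], 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.pop(0)              # evict the oldest page
            frames.append(page)
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
assert fifo_faults(refs, 3) == 9           # 3 frames: 9 faults
assert fifo_faults(refs, 4) == 10          # 4 frames: 10 faults!
```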
176
Belady’s anomaly
FIFO with 3 page frames
177
Belady’s anomaly
FIFO with 3 page frames
FIFO with 4 page frames
178
Local vs. global page replacement
Assume several processes: A, B, C, ...
Some process gets a page fault (say, process A)
Choose a page to replace.
Local page replacement Only choose one of A’s pages
Global page replacement Choose any page
179
Local vs. global page replacement
Original Local Global
Example: Process has a page fault...
180
Local vs. global page replacement
Assume we have 5,000 frames in memory and 10 processes.
Idea: Give each process 500 frames.
Fairness? Small processes do not need all those pages; large processes may benefit from even more frames.
Idea: Look at the size of each process and give each a pro-rated number of frames, with a minimum of (say) 10 frames per process.
181
Page fault frequency“If you give a process more pages,
its page fault frequency will decline.”
182
Page fault frequency
Too High: Need to give thisprocess some more frames!
Too Low: Take some framesaway and give to other processes!
“If you give a process more pages,
its page fault frequency will decline.”
183
Page fault frequency
Measure the page fault frequency of each process.
Count the number of faults every second.
May want to consider the past few seconds as well.
184
Page fault frequencyMeasure the page fault frequency of each process.
Count the number of faults every second.
May want to consider the past few seconds as well.
Aging: Keep a running value. Every second: count the number of page faults, divide the running value by 2, and add in the count for this second.
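The aging rule above as a small sketch (the function name is illustrative):

```python
# Sketch of the page-fault-frequency aging rule.

def update_pff(running, faults_this_second):
    """Halve the running value, then add this second's fault count."""
    return running / 2 + faults_this_second

running = 0.0
for count in [10, 10, 0, 0]:               # faults observed per second
    running = update_pff(running, count)   # recent seconds dominate
```

After the four seconds above, the running value has decayed to 3.75: the two quiet seconds quickly outweigh the earlier bursts.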
185
Separation of Policy and Mechanism
Implementation ideas…
Kernel contains: code to manipulate the MMU (machine dependent), and code to handle page faults (machine independent).
A user-level “External Pager” process can determine policy
Which page to evict When to perform disk I/O How to manage the swap file
Examples: Mach, Minix
186
Separation of Policy and Mechanism