OMSE 510: Computing Foundations 8: The Address Space
Transcript of OMSE 510: Computing Foundations 8: The Address Space
1
OMSE 510: Computing Foundations 8: The Address Space
Chris Gilmore <[email protected]>
Portland State University/OMSE
Material Borrowed from Jon Walpole’s lectures
2
Today
Memory Management
Virtual/Physical Address Translation
Page Tables
MMU, TLB
3
Memory management
Memory – a linear array of bytes
  Holds the O.S. and programs (processes)
  Each memory cell is named by a unique memory address
Recall, processes are defined by an address space, consisting of text, data, and stack regions
Process execution:
  The CPU fetches instructions from the text region according to the value of the program counter (PC)
  Each instruction may request additional operands from the data or stack region
4
Virtual memory management overview
What do we know about memory management?
  Processes require memory to run
  We provide the appearance that the entire process is resident during execution
We know some functions/code in processes never get invoked
  Error detection and recovery routines
  In a graphics package, functions like smooth, sharpen, brighten, etc. may never be invoked
Virtual memory - allows the execution of processes that may not be completely in memory (an extension of the paging technique from the last chapter)
5
Virtual memory overview
Goals:
  Hides physical memory from the user
  Allows a higher degree of multiprogramming (only bring in pages that are accessed)
  Allows large processes to be run on small amounts of physical memory
  Reduces the I/O required to swap processes in/out (makes the system faster)
Requires:
  Pager - pages in/out pages as required
  “Swap” space to hold processes that are only partially resident
  Hardware support to do address translation
6
Addressing memory
We cannot know ahead of time where in memory a program will be loaded!
The compiler produces code containing embedded addresses; these addresses can’t be absolute (physical) addresses
The linker combines pieces of the program, assuming the program will be loaded at address 0
We need to bind the compiler/linker-generated addresses to the actual memory locations
7
Relocatable address generation
[Figure: relocatable address generation through the toolchain (Compilation -> Assembly -> Linking -> Loading).
  Compilation: Prog P calls foo(); the call is symbolic
  Assembly:    P contains "jmp _foo"; assembling resolves it to "jmp 75" relative to address 0
  Linking:     library routines (addresses 0-100) are combined with P, so foo moves and the jump becomes "jmp 175", still assuming the program loads at 0
  Loading:     the program is loaded at address 1000, so the jump must become "jmp 1175"]
8
Address binding
Address binding: fixing a physical address to the logical address of a process’ address space
Compile-time binding: if the program location is fixed and known ahead of time
Load-time binding: if the program location in memory is unknown until load time, AND the location is then fixed
Execution-time binding: if processes can be moved in memory during execution; requires hardware support!
9
[Figure: the three binding times applied to the same program.
  Compile-time binding: the code contains "jmp 175"; the program must load at address 0
  Load-time binding: the loader rewrites the code to "jmp 1175" when the program is placed at address 1000
  Execution-time binding: the code keeps "jmp 175"; a base register holding 1000 relocates every address at run time]
10
Memory management architectures
Fixed-size allocation: memory is divided into fixed partitions
Dynamically-sized allocation: memory is allocated to fit processes exactly
11
Runtime binding – base & limit registers
A simple runtime relocation scheme: use 2 registers to describe a partition
For every address generated, at runtime:
  Compare it to the limit register (and abort if larger)
  Add it to the base register to give the physical memory address
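The per-access check can be sketched in C (a minimal model of the scheme, not any particular MMU; the function name and return convention are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of the MMU's per-access check: every logical address is
 * compared against the limit register, then relocated by the base
 * register.  Returns true and writes the physical address on success;
 * returning false models the addressing-error trap. */
bool translate(uint32_t logical, uint32_t base, uint32_t limit,
               uint32_t *physical)
{
    if (logical >= limit)       /* outside the partition: abort */
        return false;
    *physical = base + logical; /* relocate */
    return true;
}
```

With base = 1000 and limit = 300, logical address 175 maps to physical address 1175, matching the relocation example on the surrounding slides.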
12
Dynamic relocation with a base register
[Figure: process i's program-generated address is presented to the MMU, which adds the relocation (base) register value for process i (here 1000) to form the physical memory address.]
Memory Management Unit (MMU) - dynamically converts logical addresses into physical address
MMU contains base address register for running process
13
Protection using base & limit registers
[Figure: a logical address is compared against the limit register; if it is within bounds, the base register is added to form the physical memory address, otherwise an addressing-error trap occurs.]
Memory protection:
  The base register gives the starting address for the process
  The limit register limits the offset accessible from the relocation register
14
Multiprogramming with base and limit registers
[Figure: memory holding the OS plus partitions A-E, with the running process's partition described by the base and limit registers.]
Multiprogramming: a separate partition per process
What happens on a context switch?
  Store process A’s base and limit register values
  Load new values into the base and limit registers for process B
15
[Figure sequence, slides 15-23: dynamic partitioning over time.
  Memory starts as the 128K O.S. plus an 896K hole.
  P1 (320K) is loaded, leaving 576K free; P2 (224K) leaves 352K; P3 (288K) leaves 64K.
  P2 is swapped out, leaving a 224K hole; P4 (128K) is loaded into it, leaving a 96K hole.
  P1 is swapped out, leaving a 320K hole; P5 (224K) is loaded into it, leaving another 96K hole.
  Now P6 (128K) arrives: the free holes are 96K, 96K, and 64K, so P6 fits nowhere even though 256K is free in total.]
24
Swapping
When a program is running:
  The entire program must be in memory
  Each program is put into a single partition
When the program is not running:
  It may remain resident in memory
  It may get “swapped” out to disk
Over time:
  Programs come into memory when they get swapped in
  Programs leave memory when they get swapped out
25
Basics - swapping
[Figure: memory holds the operating system and processes i, j, k, m; processes are swapped in from and out to disk.]
Benefits of swapping: allows more programs to run concurrently than will fit in memory at once
26
Swapping can also lead to fragmentation
27
Dealing with fragmentation
[Figure: memory containing P3, P4, P5 with free holes of 96K, 96K, and 64K scattered between them, so P6 (128K) cannot be placed; after compaction the holes form a single 256K block, into which P6 fits.]
Compaction – from time to time shift processes around to collect all free space into one contiguous block
Placement algorithms: First-fit, best-fit, worst-fit
28
Influence of allocation policy
[Figure: the same sequence of process arrivals and departures handled by best-fit and by first-fit allocation, followed by a scan-and-compact step; the two policies leave different patterns of holes.]
29
How big should partitions be?
Programs may want to grow during execution: more room for stack, heap allocation, etc.
Problem: if the partition is too small, programs must be moved, which requires modifying the base and limit registers
  Why not make the partitions a little larger than necessary to accommodate “some” growth?
Fragmentation:
  External fragmentation = unused space between partitions
  Internal fragmentation = unused space within partitions
30
Allocating extra space within partitions
31
Managing memory
Each chunk of memory is either used by some process or unused (“free”)
Operations:
  Allocate a chunk of unused memory big enough to hold a new process
  Free a chunk of memory by returning it to the free pool after a process terminates or is swapped out
32
Managing memory with bit maps
Problem - how do we keep track of used and unused memory?
Technique 1 - bit maps
  A long bit string, with one bit for every chunk of memory
    1 = in use
    0 = free
  The size of the allocation unit influences the space required
    Example: unit size = 32 bits; bit map overhead = 1/33 ≈ 3%
    Example: unit size = 4 Kbytes; bit map overhead = 1/32,769
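A minimal C sketch of such a bit map (the unit count and function names are illustrative):

```c
#include <stdint.h>

/* One bit per allocation unit: 1 = in use, 0 = free, as on the slide.
 * 256 units tracked in a 32-byte map; sizes are illustrative. */
#define NUNITS 256
static uint8_t bitmap[NUNITS / 8];

void mark_used(int unit) { bitmap[unit / 8] |=  (uint8_t)(1u << (unit % 8)); }
void mark_free(int unit) { bitmap[unit / 8] &= (uint8_t)~(1u << (unit % 8)); }
int  is_used(int unit)   { return (bitmap[unit / 8] >> (unit % 8)) & 1; }

/* Find the start of the first run of n consecutive free units; -1 if none. */
int find_free_run(int n)
{
    int run = 0;
    for (int i = 0; i < NUNITS; i++) {
        run = is_used(i) ? 0 : run + 1;
        if (run == n)
            return i - n + 1;
    }
    return -1;
}
```

Allocation then means finding a long-enough run of zero bits and setting them, which is why larger allocation units make the scan cheaper but waste more space per allocation.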
33
Managing memory with bit maps
34
Managing memory with linked lists
Technique 2 - linked list
Keep a list of elements; each element describes one unit of memory:
  Free / in-use bit (“P = process, H = hole”)
  Starting address
  Length
  Pointer to next element
35
Managing memory with linked lists
36
Merging holes
Whenever a unit of memory is freed we want to merge adjacent holes!
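The merge step can be sketched in C over the list representation from the previous slides (field names are illustrative, and only the forward merge is shown):

```c
#include <stddef.h>

/* Each list element describes one region of memory,
 * as on the linked-list slides. */
struct mem_elem {
    char type;                 /* 'P' = process, 'H' = hole */
    unsigned start, len;
    struct mem_elem *next;
};

/* Free the region `r` and coalesce it with the following hole, if any,
 * so two adjacent holes become one.  (A full implementation would also
 * merge with the preceding hole; the freed node is left for the caller
 * to reclaim.) */
void free_and_merge(struct mem_elem *r)
{
    r->type = 'H';
    if (r->next && r->next->type == 'H') {
        struct mem_elem *h = r->next;
        r->len += h->len;      /* absorb the following hole */
        r->next = h->next;     /* unlink it */
    }
}
```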
[Slides 37-40: figure sequence stepping through the hole-merging cases, coalescing adjacent free regions in the list.]
41
Managing memory with linked lists
Searching the list for space for a new process:
  First fit - take the first hole that is big enough
  Next fit - like first fit, but start from the current location in the list (not as good as first fit)
  Best fit - find the smallest hole that will work (tends to create lots of little holes)
  Worst fit - find the largest hole, so the remainder will be big
  Quick fit - keep separate lists for common sizes
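First fit and best fit over the hole list can be sketched in C (the structure fields follow the slide: type, starting address, length, next pointer):

```c
#include <stddef.h>

/* Each list element describes one region: a process ('P') or a hole ('H'). */
struct region {
    char type;                 /* 'P' = process, 'H' = hole */
    unsigned start, len;
    struct region *next;
};

/* First fit: return the first hole at least `size` units long. */
struct region *first_fit(struct region *head, unsigned size)
{
    for (struct region *r = head; r != NULL; r = r->next)
        if (r->type == 'H' && r->len >= size)
            return r;
    return NULL;
}

/* Best fit: return the smallest hole that is still big enough. */
struct region *best_fit(struct region *head, unsigned size)
{
    struct region *best = NULL;
    for (struct region *r = head; r != NULL; r = r->next)
        if (r->type == 'H' && r->len >= size &&
            (best == NULL || r->len < best->len))
            best = r;
    return best;
}
```

Note that best fit must scan the whole list, while first fit can stop early; this is one reason first fit often performs well in practice.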
42
Fragmentation
Memory is divided into partitions; each partition has a different size
Processes are allocated space, and the space is later freed
After a while, memory will be full of small holes!
  No single free hole is large enough for a new process, even though there is enough free memory in total
  This is external fragmentation
If we allow free space within a partition, we have internal fragmentation
43
Solution to fragmentation?
Allocate memory in equal fixed-size units?
  Reduces external fragmentation problems
  But what about wasted space inside a unit due to internal fragmentation?
How big should the units be?
  The smaller the better for internal fragmentation
  The larger the better for management overhead
Can we use a unit size smaller than the memory needed by a process?
  I.e., allocate non-contiguous units to the same process?
  … but how would the base and limit registers work?
44
Using pages for non-contiguous allocation
Memory is divided into fixed-size page frames
  Page frame size = 2^n bytes
  The lowest n bits of an address specify the byte offset within a page
But how do we associate page frames with processes?
And how do we map memory addresses within a process to the correct memory byte in a page frame?
Solution:
  Processes use virtual addresses
  Hardware uses physical addresses
  Hardware support translates virtual addresses to physical addresses
45
Virtual addresses
  bit 31 ............ bit 12 | bit 11 ........ bit 0
     page number (20 bits)   |   offset (12 bits)

Example: 32-bit virtual address
  Page size = 2^12 = 4KB
  Address space size = 2^32 bytes = 4GB
Virtual memory addresses (what the process uses):
  Page number plus byte offset within the page
  The low-order n bits are the byte offset
  The remaining high-order bits are the page number
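Extracting the two fields is simple bit manipulation; a C sketch for the 4 KB page size used in this example:

```c
#include <stdint.h>

/* 32-bit virtual address, 4 KB pages: the low 12 bits are the byte
 * offset, the high 20 bits are the page number (the slide's example). */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)   /* 4096 bytes */

static inline uint32_t page_number(uint32_t va) { return va >> PAGE_SHIFT; }
static inline uint32_t page_offset(uint32_t va) { return va & (PAGE_SIZE - 1); }
```

Because the page size is a power of two, splitting an address costs only a shift and a mask, which is what makes hardware translation cheap.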
46
Physical addresses
  bit 23 ....... bit 12 | bit 11 ........ bit 0
  page frame number (12 bits) | offset (12 bits)

Example: 24-bit physical address
  Page frame size = 2^12 = 4KB
  Max physical memory size = 2^24 bytes = 16MB
Physical memory addresses (what the CPU uses):
  Page frame number plus byte offset within the page
  The low-order n bits are the byte offset
  The remaining high-order bits are the page frame number
47
Address translation
Hardware maps page numbers to page frame numbers
The memory management unit (MMU) has multiple registers for multiple pages
  Like a base register, except its value is substituted for the page number rather than added to it
Why don’t we need a limit register for each page?
48
Memory Management Unit (MMU)
49
Virtual address spaces
Here is the virtual address space (as seen by the process)
50
Virtual address spaces
[Figure: the virtual address space divided into pages 0 through N.]
The address space is divided into “pages” In x86, the page size is 4K
51
Virtual address spaces
In reality, only some of the pages are used
52
Physical memory
Physical memory is divided into “page frames” (Page size = frame size)
53
Virtual and physical address spaces
Some page frames are used to hold the pages of this process
54
Virtual and physical address spaces
Some page frames are used for other processes
55
Virtual address spaces
Address mappings say which frame has which page
56
Page tables
[Figure: the page-to-frame mappings recorded in the page table.]
Address mappings are stored in a page table in memory
One page table entry per page:
  Is this page in memory?
  If so, which frame is it in?
57
Address mappings and translation
Address mappings are stored in a page table in memory, typically one page table per process
Address translation is done by hardware (i.e., the MMU)
How does the MMU get the address mappings?
  Either the MMU holds the entire page table (too expensive)
  Or the MMU holds a portion of the page table: the MMU caches page table entries in a translation look-aside buffer (TLB)
58
Address mappings and translation
What if the TLB needs a mapping it doesn’t have?
Software-managed TLB:
  The hardware generates a TLB-miss fault, which is handled by the operating system (like interrupt or trap handling)
  The operating system looks in the page tables, gets the mapping from the right entry, and puts it in the TLB
Hardware-managed TLB:
  The hardware looks in a pre-specified memory location for the appropriate page table entry
  The hardware architecture defines where page tables must be stored in memory
59
A Simple Architecture
Page size: 4 Kbytes
Virtual addresses (“logical addresses”):
  32 bits --> 4GB virtual address space
  2^20 = 1M pages --> 20 bits for the page number
60
A Simple Architecture
  bit 31 ............ bit 12 | bit 11 ........ bit 0
     page number (20 bits)   |   offset (12 bits)

Page size: 4 Kbytes
Virtual addresses (“logical addresses”):
  32 bits --> 4GB virtual address space
  2^20 = 1M pages --> 20 bits for the page number
61
A Simple Architecture
Physical addresses:
  32 bits --> 4 Gbytes of installed memory (max)
  2^20 = 1M frames --> 20 bits for the frame number
Hardware Extensions…
62
A Simple Architecture
[Figure: the 32-bit virtual address = 20-bit page number + 12-bit offset; the page number indexes a single-level page table (1M entries), whose entry selects the page frame in physical memory.]
63
Quiz
What is the difference between a virtual and a physical address?
Why are programs not usually written using physical addresses?
64
Page tables
When and why do we access a page table?
On every instruction, to translate virtual to physical addresses?
65
Page tables
When and why do we access a page table?
  On every instruction, to translate virtual to physical addresses? NO!
  On TLB miss faults, to refill the TLB
  During process creation and destruction
  When a process allocates or frees memory?
  …
66
Translation Lookaside Buffer (TLB)
Problem: the MMU must go to the page table on every memory access!
67
Translation Lookaside Buffer (TLB)
Problem: the MMU must go to the page table on every memory access!
Solution: cache the page table entries in a hardware cache
  A small number of entries (e.g., 64)
  Each entry contains the page number plus the other fields of the page table entry
  Associatively indexed on page number
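A software model of the associative lookup (a real TLB compares all entries in parallel; the entry count and field names here are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64   /* small, as on the slide */

struct tlb_entry {
    bool     valid;
    uint32_t page;    /* virtual page number: the key */
    uint32_t frame;   /* page frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Associative lookup: hardware compares every valid entry's page
 * number against the key at once; this sketch just scans them.
 * Returns true on a hit and writes the frame number. */
bool tlb_lookup(uint32_t page, uint32_t *frame)
{
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].page == page) {
            *frame = tlb[i].frame;
            return true;
        }
    return false;   /* miss: refill from the page table */
}
```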
68
Hardware operation of the TLB
[Figure, slides 68-73: a virtual address (page number in bits 31-13, offset in bits 12-0) is presented to the TLB. The page number is the key, compared associatively against all valid entries; each entry also holds the frame number plus D/R/W/V bits. On a hit, the frame number is concatenated with the offset to form the physical address.]
74
Software operation of the TLB
What if the entry is not in the TLB?
  Go to the page table, find the right entry, and move it into the TLB
  Which TLB entry should be replaced?
Hardware TLB refill: page tables must live in a specific location and format
Software refill:
  The hardware generates a trap (TLB miss fault) and lets the OS deal with the problem
  Page tables become entirely an OS data structure!
Want to do a context switch? Must empty the TLB, or just clear its “valid” bits
75
Software operation of the TLB
What should we do with the TLB on a context switch? How can we prevent the next process from using the last process’s address mappings?
  Option 1: empty the TLB; the new process will generate faults until it pulls enough of its own entries into the TLB
  Option 2: just clear the “valid” bits; same effect
  Option 3: the hardware maintains a process-id tag on each TLB entry and compares it against a process id held in a special register on every translation
76
Page tables
Do we access a page table when a process allocates or frees memory?
77
Page tables
Do we access a page table when a process allocates or frees memory?
Not necessarily:
  Library routines (malloc) can service small requests from a pool of free memory within the process
  Only when these routines run out of space must a new page be allocated and its entry inserted into the page table
78
Page tables
When and why do we access a page table?
  On every instruction, to translate virtual to physical addresses? NO!
  On TLB miss faults, to refill the TLB
  During process creation and destruction
  When a process allocates or frees memory?
    Library routines (malloc) can service small requests from a pool of free memory within the process
    When these routines run out of space, a new page must be allocated and its entry inserted into the page table
  During swapping/paging to disk
79
Page tables
In a well-provisioned system, TLB miss faults will be the most frequently occurring event
TLB miss fault: given a virtual page number, we must find the right page table entry
  Fastest approach - index the page table using virtual page numbers
80
Page table design
Page table size depends on the page size and the virtual address length
Memory used for page tables is overhead!
  How can we save space … and still find entries quickly?
Two main ideas: multi-level page tables and inverted page tables
81
Multi-level Page Tables
[Figure, slides 82-88: a two-level page table. The virtual address splits into PT1 (10 bits), PT2 (10 bits), and offset (12 bits). PT1 indexes the top-level page table; the selected entry points to one of the second-level tables; PT2 indexes that table to find the page frame in memory.]
89
Multi-level page tables
OK, so how does this save space?
Not all pages within a virtual address space are allocated
  Not only do such pages have no page frame, that range of virtual addresses is not being used at all
  So there is no need to maintain complete information about it: intermediate page tables that would be empty are simply not needed
We could also page the page table itself
  This saves space but slows access … a lot!
90
The x86 architecture
Page size: 4 Kbytes
Virtual addresses (“logical addresses”):
  32 bits --> 4GB virtual address space
  2^20 = 1M pages --> 20 bits for the page number
91
The x86 architecture
[Figure: the x86 page table mapping: Page Directory -> Page Table -> Frame. The 32-bit virtual address splits into 10 bits (page directory index), 10 bits (page table index), and 12 bits (offset); the page directory and each page table hold 1024 entries.]
92
Inverted page tables
Problem: page table overhead increases with address space size, and page tables get too big to fit in memory!
Consider a computer with 64-bit addresses
  Assume 4-Kbyte pages (12 bits for the offset)
  Virtual address space = 2^52 pages!
  The page table would need 2^52 entries - much too large for memory!
But we only need fast access to translations for the pages that are actually in memory!
  A 256-Mbyte memory can only hold 64K 4-Kbyte pages
  So we really only need 64K page table entries!
93
Inverted page tables
An inverted page table:
  Has one entry for every frame of memory
  Tells which page is in that frame
  Is indexed by frame number, not page number!
So how can we search it?
If we have a page number (from a faulting address) and want to find its page table entry, do we do an exhaustive search of all entries?
94
Inverted page tables
An inverted page table:
  Has one entry for every frame of memory
  Tells which page is in that frame
  Is indexed by frame number, not page number!
So how can we search it? If we have a page number (from a faulting address) and want to find its page table entry, do we do an exhaustive search of all entries?
  No, that’s too slow!
  Why not maintain a hash table to allow fast access given a page number?
95
Inverted Page Table
96
Which page table design is best?
The best choice depends on CPU architecture
64 bit systems need inverted page tables
Some systems use a combination of regular page tables together with segmentation (later)
97
Page tables
A typical page table entry
98
Performance of memory translation
Why can’t memory address translation be done in software?
How often is translation done?
What work is involved in translating a virtual address to a physical address?
  Indexing into page tables
  Interpreting page descriptors
  More memory references!
99
Memory hierarchy performance
The “memory” hierarchy consists of several types of memory (typical latencies):
  L1 cache (typically on die): ~0.5 ns (1 cycle)
  L2 cache (typically available): 0.5 - 20 ns (1 - 40 cycles)
  Memory (DRAM, SRAM, RDRAM, …): 40 - 80 ns (80 - 160 cycles)
  Disk (lots of space available): 8 - 13 ms (16M - 26M cycles) - longer than you want!
  Tape (even more space available…): ~360 billion cycles
100
Performance of memory translation (2)
How can additional memory references be avoided?
  TLB - translation look-aside buffer
  An associative memory cache for page table entries
  If there is locality of reference, performance is good
101
Translation lookaside buffer
[Figure: the CPU emits (page number p, offset o). On a TLB hit, the frame number f comes straight from the TLB; on a miss, it comes from the page table. (f, o) then addresses physical memory.]
102
TLB entries
103
TLB implementation
To be fast, TLBs must implement an associative search, where the cache is searched in parallel. EXPENSIVE
  The number of entries varies (8 to 2048)
Because the TLB translates logical pages to physical pages, the TLB must be flushed on every context switch to work correctly
  Performance can be improved by associating process-id bits with each TLB entry
A TLB must implement an eviction policy that flushes old entries out of the TLB
  This occurs when the TLB is full
104
Page table organization
How big should a virtual address space be? what factors influence its size?
How big are page tables? what factors determine their size?
Can page tables be held entirely in cache? can they be held entirely in memory even?
How big should page sizes be?
105
Page size issues
Choose a large page size:
  More loss due to internal fragmentation
  Assume a process is using 5 regions of memory heavily: it will need 5 pages, regardless of page size
  ---> Ties up more memory
Choose a small page size:
  The page table becomes very large
  Example: virtual address space 4G bytes, page size 4K (e.g., Pentium)
    Page table size: 1M entries (4 Mbytes)!
106
Address space organization
How big should a virtual address space be?
Which regions of the address space should be allocated for different purposes - stack, data, instructions?
What if memory needs for a region increase dynamically?
What are segments?
What is the relationship between segments and pages?
Can segmentation and paging be used together?
If segments are used, how are segment selectors incorporated into addresses?
107
Memory protection
At what granularity should protection be implemented? Page level? Segment level?
How is protection checking implemented?
  Compare page protection bits with process capabilities and operation types on every access
  Sounds expensive!
How can protection checking be done efficiently?
  Segment registers
  Protection look-aside buffers
108
Memory protection with paging
[Figure: a page table whose entries each hold a frame number, an R/W bit, and a V/I bit; some entries are marked invalid.]
Associate protection bits with each page table entry:
  Read/write bit - can provide read-only access for re-entrant code
  Valid/invalid bit - tells the MMU whether the page exists in the process address space
Page Table Length Register (PTLR) - stores how long the page table is, to avoid an excessive number of unused page table entries
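The MMU's check against these bits amounts to the following (a simplified model; the structure and field names are illustrative):

```c
#include <stdbool.h>

/* Protection bits from the slide's page table entry. */
struct pte {
    unsigned frame;
    bool writable;   /* R/W bit: false = read-only   */
    bool valid;      /* V/I bit: false = not present */
};

/* Check an access: any access to an invalid page, or a write to a
 * read-only page, traps to the OS. */
bool access_ok(const struct pte *e, bool is_write)
{
    if (!e->valid)
        return false;              /* invalid page: fault */
    if (is_write && !e->writable)
        return false;              /* read-only violation */
    return true;
}
```

Because this check rides along with translation, it costs nothing extra on a TLB hit: the bits are cached in the TLB entry.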
109
Handling accesses to invalid pages
The page table is used to translate logical addresses to physical addresses; pages that are not in memory are marked invalid
A page fault occurs when a process accesses an invalid page
Page faults require the operating system to:
  Suspend the process
  Find a free frame in memory
  Swap in the page that caused the fault
  Update the page table entry (PTE)
  Restart the process
110
Page fault handling in more detail
Hardware traps to the kernel
General registers are saved
The OS determines which virtual page is needed
The OS checks the validity of the address and seeks a page frame
If eviction is needed and the victim frame is dirty, write it to disk
111
Page fault handling in more detail
The OS brings the new page in from disk
Page tables are updated
The faulting instruction is backed up to where it began
The faulting process is scheduled
Registers are restored
The program continues
112
Anatomy of a page fault
[Figure: logical memory holds pages A-E; the page table maps A, C, E to frames 9, 2, 5 and marks B and D invalid. On an access to an invalid page: a page fault is raised, the OS finds a free frame, gets the page from the backing store, brings the page into the frame, updates the PTE, and restarts the process.]
113
Locking pages in memory
An issue to be aware of: virtual memory and I/O occasionally interact
  A process issues a call to read from a device into a buffer
  While it waits for the I/O, another process starts up and has a page fault
  The first process’s buffer may be chosen to be paged out!
Some pages need to be locked (pinned) in memory, exempting them from being chosen as victim pages
114
Quiz
Why is hardware support required for dynamic address translation?
What is a page table used for?
What is a TLB used for?
How many address bits are used for the page offset in a system with 2KB page size?
115
Memory protection
At what granularity should protection be implemented?
  Page level? A lot of overhead for storing protection information for non-resident pages
  Segment level? Coarser grain than pages; makes sense if contiguous groups of pages share the same protection status
116
Memory protection
How is protection checking implemented?
  Compare page protection bits with process capabilities and operation types on every load/store
  Sounds expensive! Requires hardware support!
How can protection checking be done efficiently?
  Use the TLB as a protection look-aside buffer
  Use special segment registers
117
Protection lookaside buffer
A TLB is often used for more than just “translation”
Memory accesses need to be checked for validity:
  Does the address refer to an allocated segment of the address space? If not: segmentation fault!
  Is this process allowed to access this memory segment? If not: segmentation/protection fault!
  Is the type of access valid for this segment (read, write, execute …)? If not: protection fault!
118
Page-grain protection checking with a TLB
119
Segment-grain protection
All pages within a segment usually share the same protection status
  So we should be able to batch the protection information
Why not just use segment-sized pages?
  Segments vary in size
  Segments change size dynamically (stack, heap, etc.)
120
Segmentation in a single address space
Example: A compiler
121
Segmented address spaces
Traditional virtual address space: a “flat” (one-dimensional) address space
Segmented address space:
  A program is made of several “pieces”
  Each segment is like a mini address space
  Addresses within a segment start at zero
  The program must always say which segment it means: either embed a segment id in an address, or load a value into a segment register
  Addresses: segment + offset
  Each segment can grow independently of the others
122
Segmented memory
Each space grows, shrinks independently!
123
Separate instruction and data spaces
* One address space * Separate I and D spaces
124
Page sharing
In a large multiprogramming system, some users run the same program at the same time
  Why have more than one copy of the same page in memory???
Goal: share pages among “processes” (not just threads!)
Writable pages cannot be shared: if they were, processes would notice each other’s effects
The text segment can be shared
125
Page sharing
[Figure: the page tables of process 1 and process 2 map their instruction pages (rx) to the same physical frames, while each keeps private data (rw) and stack (rw) pages.]
126
Page sharing
“Fork” system call: copy the parent’s virtual address space
  … and often immediately do an “exec” system call
  Exec overwrites the calling address space with the contents of an executable file (i.e., a new program)
Desired semantics: pages are copied, not shared
Observations:
  Copying every page in an address space is expensive!
  Processes can’t notice the difference between copying and sharing unless pages are modified!
127
Page sharing
Idea: copy-on-write
  Initialize the new page table, but point its entries at the parent’s existing page frames: share the pages
  Temporarily mark all pages “read-only”; share all pages until a protection fault occurs
Protection fault (copy-on-write fault):
  Is this page really read-only, or is it writable but temporarily protected for copy-on-write?
  If it is writable: copy the page, mark both copies “writable”, and resume execution as if no fault had occurred
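The copy-on-write fault handler described above can be sketched as follows (heavily simplified: frames are plain buffers, only two sharers are modeled, and all names are illustrative):

```c
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 4096

/* A shared, temporarily read-only mapping of one page. */
struct cow_pte {
    char *frame;      /* the page frame this entry points at   */
    bool  writable;   /* false while protected for copy-on-write */
};

/* Copy-on-write fault: the faulting process gets a private copy of
 * the page in `fresh_frame`, and both mappings become writable. */
void cow_fault(struct cow_pte *faulting, struct cow_pte *other,
               char *fresh_frame)
{
    memcpy(fresh_frame, faulting->frame, PAGE_SIZE); /* copy the page  */
    faulting->frame    = fresh_frame;                /* remap to copy  */
    faulting->writable = true;                       /* mark both      */
    other->writable    = true;                       /* copies writable */
}
```

Until a write actually happens, fork costs only a page table copy; the expensive page copies are deferred to (and only paid on) the first write.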
128
On page replacement…
Paging performance: paging works best if there are plenty of free frames
If all frames hold dirty pages, we must perform 2 disk operations for each page fault: write a dirty page out, then read the needed page in
129
Page replacement
Assume a normal page table, and a user program executing
A PageInvalidFault occurs: the page needed is not in memory!
  Select some frame and remove the page in it
    If it has been modified, it must be written back to disk; the “dirty” bit in its page table entry tells us if this is necessary
  Figure out which page was needed from the faulting address
  Read the needed page into this frame
  Restart the interrupted process by retrying the same instruction
130
Page replacement algorithms
Which frame to replace?
Algorithms: The Optimal Algorithm First In First Out (FIFO) Not Recently Used (NRU) Second Chance / Clock Least Recently Used (LRU) Not Frequently Used (NFU) Working Set (WS) WSClock
131
The optimal page replacement algorithm
Idea: select the page that will not be needed for the longest time
132
Optimal page replacement
Replace the page that will not be needed for the longest time
Example (4 frames, request string c a d b e b a b c d):
  Time      0  1  2  3  4  5  6  7  8  9  10
  Requests     c  a  d  b  e  b  a  b  c  d
  Frame 0   a  a  a  a  a
  Frame 1   b  b  b  b  b
  Frame 2   c  c  c  c  c
  Frame 3   d  d  d  d  d
  Faults
(The first four requests all hit: a, b, c, d are already resident.)
133
Optimal page replacement
Select the page that will not be needed for the longest time
Example continued: at time 5, request e faults; of the resident pages, d is needed furthest in the future (time 10), so e replaces d. At time 10, d itself faults.
  Time      0  1  2  3  4  5  6  7  8  9  10
  Requests     c  a  d  b  e  b  a  b  c  d
  Frame 0   a  a  a  a  a  a  a  a  a  a  a
  Frame 1   b  b  b  b  b  b  b  b  b  b  b
  Frame 2   c  c  c  c  c  c  c  c  c  c  c
  Frame 3   d  d  d  d  d  e  e  e  e  e  d
  Faults                   X              X
134
The optimal page replacement algorithm
Idea: select the page that will not be needed for the longest time
Problem: we can’t know the future of a program
  We can’t know when a given page will be needed next
  The optimal algorithm is unrealizable
135
The optimal page replacement algorithm
However: We can use it as a control case for simulation studies. Run the program once, generate a log of all memory references, and use the log to simulate various page replacement algorithms. Other algorithms can then be compared against the "optimal" one.
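Given such a log, the optimal victim can be computed directly from the remaining reference string; a small sketch (the function name is illustrative):

```python
# Sketch of computing the optimal victim from a logged reference string.

def optimal_victim(frames, future):
    """Return the resident page whose next use is farthest away
    (or which is never used again)."""
    def next_use(page):
        try:
            return future.index(page)      # distance to next reference
        except ValueError:
            return float('inf')            # never needed again
    return max(frames, key=next_use)
```

For the trace in the example above, with frames a, b, c, d and future references b a b c d, this picks d.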
136
FIFO page replacement algorithm
Always replace the oldest page … “Replace the page that has been in memory for
the longest time.”
137
FIFO page replacement algorithm
Replace the page that was first brought into memory.
Example: Memory system with 4 frames (pages were loaded in the order c, a, d, b):
Time      0  1  2  3  4  5  6  7  8  9  10
Requests     c  a  d  b  e  b  a  b  c  a
Frame 0   a  a  a  a  a
Frame 1   b  b  b  b  b
Frame 2   c  c  c  c  c
Frame 3   d  d  d  d  d
Faults                   X
138
FIFO page replacement algorithm
Replace the page that was first brought into memory.
Example: Memory system with 4 frames (pages were loaded in the order c, a, d, b):
Time      0  1  2  3  4  5  6  7  8  9  10
Requests     c  a  d  b  e  b  a  b  c  a
Frame 0   a  a  a  a  a  a
Frame 1   b  b  b  b  b  b
Frame 2   c  c  c  c  c  e
Frame 3   d  d  d  d  d  d
Faults                   X
139
FIFO page replacement algorithm
Replace the page that was first brought into memory.
Example: Memory system with 4 frames (pages were loaded in the order c, a, d, b):
Time      0  1  2  3  4  5  6  7  8  9  10
Requests     c  a  d  b  e  b  a  b  c  a
Frame 0   a  a  a  a  a  a  a  a  a  c  c
Frame 1   b  b  b  b  b  b  b  b  b  b  b
Frame 2   c  c  c  c  c  e  e  e  e  e  e
Frame 3   d  d  d  d  d  d  d  d  d  d  a
Faults                   X            X  X
At time 5, e evicts c (the first page loaded); at time 9, c evicts a; at time 10, a evicts d.
140
FIFO page replacement algorithm
Always replace the oldest page: "Replace the page that has been in memory for the longest time."
Implementation: Maintain a linked list of all pages in memory, kept in the order in which they came into memory. The page at the front of the list is the oldest; add new pages to the end of the list.
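The linked-list scheme above can be sketched with a deque (the class name is illustrative); `access` returns True on a page fault:

```python
from collections import deque

# Sketch of the linked-list FIFO described above.

class FIFOReplacer:
    def __init__(self, nframes):
        self.nframes = nframes
        self.queue = deque()               # front = oldest page

    def access(self, page):
        if page in self.queue:
            return False                   # hit: FIFO ignores reuse
        if len(self.queue) == self.nframes:
            self.queue.popleft()           # evict the oldest page
        self.queue.append(page)            # newest goes to the end
        return True                        # page fault
```

Replaying the trace c a d b e b a b c a on an empty 4-frame system gives four compulsory faults plus the three replacement faults shown in the example.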
141
FIFO page replacement algorithm
Disadvantage: The oldest page may be needed again soon. Some page may be important throughout execution; it will get old, but replacing it will cause an immediate page fault.
142
Page table: referenced and dirty bits
Each page table entry (and TLB entry) has a Referenced bit, set by the TLB when the page is read or written, and a Dirty/modified bit, set when the page is written.
If the TLB entry for this page is valid, it holds the most up-to-date version of these bits for the page; the OS must copy them into the page table entry during fault handling.
On some hardware there is a ReadOnly bit but no Dirty bit.
143
Page table: referenced and dirty bits
Idea: Software sets the ReadOnly bit for all pages. When the program tries to update a page, a trap occurs; software sets the Dirty bit, clears the ReadOnly bit, and resumes execution of the program.
144
Not recently used page replacement alg.
Use the Referenced Bit and the Dirty Bit
Initially, all pages have Referenced Bit = 0 and Dirty Bit = 0.
Periodically (e.g., whenever a timer interrupt occurs), clear the Referenced Bit.
145
Not recently used page replacement alg.
When a page fault occurs...
Categorize each page:
Class 1: Referenced = 0, Dirty = 0
Class 2: Referenced = 0, Dirty = 1
Class 3: Referenced = 1, Dirty = 0
Class 4: Referenced = 1, Dirty = 1
Choose a victim page from class 1 … why?
If none, choose a page from class 2 … why?
If none, choose a page from class 3 … why?
If none, choose a page from class 4 … why?
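One plausible answer to the "why?" prompts: a clean, unreferenced page costs nothing to write back and is apparently not in use. The class-by-class search can be sketched as follows (pages modeled as illustrative (name, referenced, dirty) tuples, classes numbered 0-3 internally):

```python
# Sketch of NRU victim selection.

def nru_victim(pages):
    """Pick a page from the lowest non-empty class: the Referenced
    bit outweighs the Dirty bit, so a clean, unreferenced page
    (nothing to write back, apparently unused) goes first."""
    def klass(entry):
        _, referenced, dirty = entry
        return 2 * referenced + dirty      # class 1..4 as 0..3
    return min(pages, key=klass)[0]
```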
146
Second chance page replacement alg.
Modification to FIFO
Pages kept in a linked list Oldest is at the front of the list
Look at the oldest page. If its "referenced bit" is 0, select it for replacement.
Else, it was used recently and we don't want to replace it: clear its "referenced bit", move it to the end of the list, and repeat.
What if every page was used in the last clock tick? Select a page at random.
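A sketch of this loop in Python, with pages as illustrative [name, referenced_bit] pairs, oldest at the front. Note one simplification: if every page was recently referenced, this sketch degenerates to FIFO after one clearing pass rather than picking a page at random.

```python
from collections import deque

# Sketch of the second-chance selection loop.

def second_chance_victim(pages):
    while True:
        name, referenced = pages[0]        # look at the oldest page
        if not referenced:
            pages.popleft()
            return name                    # old and not recently used
        pages[0][1] = 0                    # clear its referenced bit
        pages.rotate(-1)                   # move it to the end; repeat
```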
147
Clock algorithm (same as second chance)
[Figure: a circular list of frames, numbered 0-5, each with a clock (referenced) bit; the clock hand sweeps around the ring.]
Maintain a circular list of pages in memory. Set a bit for a page when the page is referenced. The clock sweeps over memory looking for a victim page that does not have the referenced bit set; if the bit is set, clear it and move on to the next page. This replaces pages that haven't been referenced for one complete clock revolution.
148
Least recently used algorithm (LRU)
Keep track of when a page is used.
Replace the page that has been used least recently.
149
LRU page replacement
Replace the page that hasn't been referenced in the longest time.
Time      0  1  2  3  4  5  6  7  8  9  10
Requests     c  a  d  b  e  b  a  b  c  d
Frame 0   a
Frame 1   b
Frame 2   c
Frame 3   d
Faults
150
LRU page replacement
Replace the page that hasn't been referenced in the longest time.
Time      0  1  2  3  4  5  6  7  8  9  10
Requests     c  a  d  b  e  b  a  b  c  d
Frame 0   a  a  a  a  a  a  a  a  a  a  a
Frame 1   b  b  b  b  b  b  b  b  b  b  b
Frame 2   c  c  c  c  c  e  e  e  e  e  e
Frame 3   d  d  d  d  d  d  d  d  d  c  c
Faults                   X            X  X
At time 5, e evicts c (the least recently used page); at time 9, c evicts d; d then faults again at time 10.
151
Least recently used algorithm (LRU)
But how can we implement this?
Implementation #1: Keep a linked list of all pages. On every memory reference, move that page to the front of the list.
The page at the tail of the list is replaced.
Problem: "on every memory reference..." is not feasible in software.
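Ignoring the feasibility problem, implementation #1 can be sketched with an OrderedDict acting as the linked list (the class name is illustrative): the most recently used page sits at one end, and the victim is taken from the other.

```python
from collections import OrderedDict

# Sketch of LRU implementation #1.

class LRUReplacer:
    def __init__(self, nframes):
        self.nframes = nframes
        self.pages = OrderedDict()

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)   # now most recently used
            return False                   # hit
        if len(self.pages) == self.nframes:
            self.pages.popitem(last=False) # evict least recently used
        self.pages[page] = True
        return True                        # page fault
```

Preloading a, b, c, d and replaying the trace c a d b e b a b c d reproduces the three faults from the example above.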
152
LRU implementation
Take the referenced page and put it at the head of the list.
Time      0  1  2  3  4  5  6  7  8  9  10
Requests     c  a  d  b  e  b  a  b  c  d
153
LRU implementation
Take the referenced page and put it at the head of the list.
Requests: c a d b e b a b c d
List after reference c (time 1): C A B D
List after reference a (time 2): A C B D
154
LRU implementation
Take the referenced page and put it at the head of the list.
Requests: c a d b e b a b c d
List after c (time 1): C A B D
List after a (time 2): A C B D
List after d (time 3): D A C B
List after b (time 4): B D A C
155
LRU implementation
Take the referenced page and put it at the head of the list; on a fault, evict the tail.
Requests: c a d b e b a b c d
After c: C A B D
After a: A C B D
After d: D A C B
After b: B D A C
After e: E B D A   (fault: c evicted from the tail)
After b: B E D A
After a: A B E D
After b: B A E D
After c: C B A E   (fault: d evicted)
After d: D C B A   (fault: e evicted)
Three page faults in total (at times 5, 9, and 10).
156
Least recently used algorithm (LRU)
But how can we implement this? … without requiring every access to be recorded?
Implementation #2: The MMU (hardware) maintains a counter, incremented on every clock cycle. Every time a page table entry is used, the MMU writes the counter value into the entry: a "timestamp" / "time-of-last-use".
When a page fault occurs, software looks through the page table and identifies the entry with the oldest timestamp.
157
Least recently used algorithm (LRU)
What if we don’t have hardware support?
Implementation #3: No hardware support. Maintain a counter in software. On every timer interrupt:
Increment the counter, then run through the page table; for every entry that has ReferencedBit = 1, update its timestamp and clear the ReferencedBit.
This approximates LRU. If several entries have the oldest time, choose one arbitrarily.
158
Not frequently used algorithm (NFU)
Associate a counter with each page
On every clock interrupt, the OS looks at each page. If the Reference Bit is set...
Increment that page’s counter & clear the bit.
The counter approximates how often the page is used.
For replacement, choose the page with lowest counter.
159
Not frequently used algorithm (NFU)
Problem: Some page may be heavily used
---> Its counter is large
Then the program's behavior changes: now this page is not used ever again (or only rarely).
This algorithm never forgets! The page will never be chosen for replacement.
160
Modified NFU with aging
Associate a counter with each page
On every clock tick, the OS looks at each page: shift the counter right 1 bit (divide its value by 2); if the Reference Bit is set, set the most-significant bit and clear the Referenced Bit.
Example (6-bit counter for one page):
Tick 1, referenced:     100000 = 32
Tick 2, not referenced: 010000 = 16
Tick 3, not referenced: 001000 = 8
Tick 4, not referenced: 000100 = 4
Tick 5, referenced:     100010 = 34
Referenced every tick:  111111 = 63
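One clock tick of this aging step, sketched in Python; the counters and referenced maps are illustrative stand-ins for per-page hardware/OS state.

```python
# Sketch of one clock tick of NFU with aging.

def age_counters(counters, referenced, bits=6):
    msb = 1 << (bits - 1)                  # most-significant counter bit
    for page in counters:
        counters[page] >>= 1               # shift right: divide by 2
        if referenced[page]:
            counters[page] |= msb          # record the recent reference
        referenced[page] = False           # clear the Referenced bit
```

A page referenced on ticks 1 and 5 only runs through the values 32, 16, 8, 4, 34, matching the example.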
161
Paged Memory Management
Concepts….
162
Working set page replacement
Demand paging: Pages are only loaded when accessed; when a process begins, all pages are marked INVALID.
Locality of Reference: Processes tend to use only a small fraction of their pages.
Working Set: The set of pages a process needs. If the working set is in memory, there are no page faults. What if you can't get the working set into memory?
163
Working set page replacement
Thrashing: If you can't get the working set into memory, pages fault every few instructions and no work gets done.
164
Working set page replacement
Prepaging (prefetching): Load pages before they are needed.
Main idea: Identify the process's "working set".
How big is the working set? Look at the last k memory references. As k gets bigger, more pages are needed; in the limit, all pages are needed.
165
Working set page replacement
[Figure: the size of the working set plotted against k (the time interval).]
166
Working set page replacement
Idea: Look back over the last T msec of time. Which pages were referenced? This is the working set.
Current Virtual Time: only consider how much CPU time this process has seen.
Implementation: On each clock tick, look at each page. Was it referenced? If yes, make a note of the Current Virtual Time in its entry.
If a page has not been used in the last T msec, it is not in the working set: evict it, and write it out if it is dirty.
167
Working set page replacement
168
WSClock page replacement algorithm
All pages are kept in a circular list (ring)
As pages are added, they go into the ring.
The “clock hand” advances around the ring.
Each entry contains “time of last use”.
Upon a page fault... If the Reference Bit = 1: the page is in use now; do not evict. Clear the Referenced Bit and update the "time of last use" field.
169
WSClock page replacement algorithm
If the Reference Bit = 0:
If the age of the page is less than T, this page is in the working set; advance the hand and keep looking.
If the age of the page is greater than T: if the page is clean, reclaim the frame and we are done! If the page is dirty, schedule a write for the page, then advance the hand and keep looking.
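A simplified sketch of one WSClock sweep; the ring-of-dicts representation is illustrative, and a "scheduled" write is modeled by simply clearing the dirty bit.

```python
# Sketch of one WSClock sweep.

def wsclock_evict(ring, hand, now, tau):
    """Return (index of the reclaimed frame, new hand position)."""
    n = len(ring)
    for _ in range(2 * n):                 # bounded sweep of the ring
        page = ring[hand]
        if page['ref']:
            page['ref'] = False            # in use now: do not evict
            page['last_use'] = now
        elif now - page['last_use'] > tau:
            if not page['dirty']:
                return hand, (hand + 1) % n  # old and clean: reclaim
            page['dirty'] = False          # old and dirty: "schedule" write
        hand = (hand + 1) % n              # advance and keep looking
    return hand, (hand + 1) % n            # fallback: evict at the hand
```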
170
Summary
171
Theory and practice
Keep the number of free pages between two thresholds: low water mark < # free pages < high water mark.
Identifying a victim frame on each page fault typically requires two disk accesses per page fault.
Alternative: the O.S. can keep several pages free in anticipation of upcoming page faults. In Unix: low and high water marks.
172
Free pages and the clock algorithm
The rate at which the clock sweeps through memory determines the number of pages that are kept free: too high a rate means too many free pages are marked; too low a rate means not enough (or no) free pages are marked.
Large memory system considerations: as memory systems grow, it takes longer and longer for the hand to sweep through memory, which washes out the effect of the clock somewhat. A two-handed clock can be used to reduce the time between the passing of the hands.
173
The UNIX memory model: UNIX page replacement uses a two-handed clock algorithm.
If a page has not been accessed, move it to the free list for use as an allocatable page: if modified/dirty, write it to disk (but still keep the contents in memory); if unmodified, just move it to the free list.
High and low water marks govern the number of free pages.
Pages on the free list can be re-allocated if they are accessed again before being overwritten.
174
Modeling page replacement: Run a program and record all memory references. We don't need all this data; look only at which pages are accessed:
0000001222333300114444001123444
Eliminating consecutive duplicates gives 012301401234.
This is a Reference String. Use it to evaluate different page replacement algorithms.
175
Belady's anomaly: If you have more page frames (i.e., more memory), you will have fewer page faults, right?
Not always!
Consider FIFO page replacement with this reference string: 012301401234
Case 1: 3 frames available --> 9 page faults
Case 2: 4 frames available --> 10 page faults
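Both cases are easy to reproduce with a few lines of Python (a plain-list FIFO, for illustration only):

```python
# Reproduce Belady's anomaly with a simple FIFO fault counter.

def fifo_faults(refs, nframes):
    frames, faults = [], 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.pop(0)              # evict the oldest page
            frames.append(page)
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
assert fifo_faults(refs, 3) == 9           # 3 frames: 9 faults
assert fifo_faults(refs, 4) == 10          # 4 frames: 10 faults!
```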
176
Belady’s anomaly
FIFO with 3 page frames
177
Belady’s anomaly
FIFO with 3 page frames
FIFO with 4 page frames
178
Local vs. global page replacement
Assume several processes: A, B, C, ...
Some process gets a page fault (say, process A)
Choose a page to replace.
Local page replacement Only choose one of A’s pages
Global page replacement Choose any page
179
Local vs. global page replacement
Original Local Global
Example: Process has a page fault...
180
Local vs. global page replacement
Assume we have 5,000 frames in memory and 10 processes.
Idea: Give each process 500 frames.
Fairness? Small processes do not need all those pages; large processes may benefit from even more frames.
Idea: Look at the size of each process and give each a pro-rated number of frames, with a minimum of (say) 10 frames per process.
181
Page fault frequency“If you give a process more pages,
its page fault frequency will decline.”
182
Page fault frequency
Too High: Need to give thisprocess some more frames!
Too Low: Take some framesaway and give to other processes!
“If you give a process more pages,
its page fault frequency will decline.”
183
Page fault frequency
Measure the page fault frequency of each process.
Count the number of faults every second.
May want to consider the past few seconds as well.
184
Page fault frequencyMeasure the page fault frequency of each process.
Count the number of faults every second.
May want to consider the past few seconds as well.
Aging: Keep a running value. Every second: count the number of page faults, divide the running value by 2, and add in the count for this second.
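The aging rule above as a small sketch (the function name is illustrative):

```python
# Sketch of the page-fault-frequency aging rule.

def update_pff(running, faults_this_second):
    """Halve the running value, then add this second's fault count."""
    return running / 2 + faults_this_second

running = 0.0
for count in [10, 10, 0, 0]:               # faults observed per second
    running = update_pff(running, count)   # recent seconds dominate
```

After the four seconds above, the running value has decayed to 3.75: the two quiet seconds quickly outweigh the earlier bursts.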
185
Separation of Policy and Mechanism
Implementation ideas…
Kernel contains: code to manipulate the MMU (machine dependent), and code to handle page faults (machine independent).
A user-level “External Pager” process can determine policy
Which page to evict When to perform disk I/O How to manage the swap file
Examples: Mach, Minix
186
Separation of Policy and Mechanism