Introduction to Computer Organization and Architecture
Lecture 9
By Juthawut Chantharamalee
http://dusithost.dusit.ac.th/~juthawut_cha/home.htm
Outline
Virtual Memory Basics
Address Translation
Cache vs. VM
Paging
Replacement
TLBs
Segmentation
Page Tables
2Introduction to Computer Organization and Architecture
The Full Memory Hierarchy

Level         Capacity     Access Time             Cost
Registers     100s bytes   <10s ns
Cache         KB           10-100 ns               1-0.1 cents/bit
Main Memory   MB           200-500 ns              0.0001-0.00001 cents/bit
Disk          GB           10 ms (10,000,000 ns)   10^-5 - 10^-6 cents/bit
Tape          infinite     sec-min                 10^-8 cents/bit

Staging/transfer units between levels:
Registers <-> Cache:       instructions/operands, managed by program/compiler, 1-8 bytes
Cache <-> Main Memory:     blocks, managed by cache controller, 8-128 bytes
Main Memory <-> Disk:      pages, managed by the OS, 4K-16K bytes
Disk <-> Tape:             files, managed by user/operator, MBytes

Upper levels are faster; lower levels are larger.
Virtual Memory
Some facts of computer life:
Computers run lots of processes simultaneously.
There is no full address space of memory for each process.
Processes must share smaller amounts of physical memory among many processes.
Virtual memory is the answer! It divides physical memory into blocks and assigns them to different processes.
Virtual Memory
Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
VM address translation provides a mapping from the virtual address of the processor to the physical address in main memory or on disk.
The compiler assigns data to a "virtual" address; the VA is translated to a real/physical address somewhere in memory.
(This allows any program to run anywhere; where it runs is determined by a particular machine and OS.)
VM Benefits
VM provides the following benefits:
Allows multiple programs to share the same physical memory
Allows programmers to write code as though they have a very large amount of main memory
Automatically handles bringing data in from disk
Virtual Memory Basics
Programs reference "virtual" addresses in a non-existent memory; these are then translated into real "physical" addresses.
The virtual address space may be bigger than the physical address space.
Divide physical memory into blocks, called pages: anywhere from 512 bytes to 16 MB (4 KB is typical).
Virtual-to-physical translation is done by an indexed table lookup; another cache holds recent translations (the TLB).
All of this is invisible to the programmer: it looks to your application like you have a lot of memory!
VM: Page Mapping
[Figure: Process 1's virtual address space and Process 2's virtual address space are each mapped, page by page, onto page frames in physical memory or onto disk.]
VM: Address Translation
[Figure: a virtual address splits into a virtual page number (here 20 bits) and a page offset (here 12 bits, i.e., log2 of the page size). The virtual page number indexes a per-process page table, located via a page table base register; each entry holds a physical page number plus valid, protection, dirty, and reference bits. The physical page number is concatenated with the unchanged page offset to form the address sent to physical memory.]
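The translation in the slide above can be sketched in a few lines of Python. This is my own illustration, not from the lecture; the page table is just a dict from VPN to PPN, and the example mapping is made up.

```python
# A sketch of page-table address translation:
# 32-bit virtual address, 4 KB pages -> 20-bit VPN, 12-bit offset.

PAGE_OFFSET_BITS = 12                     # log2(4096)

def translate(vaddr, page_table):
    """Map a virtual address to a physical address via a page table
    (a dict VPN -> PPN here; a None entry means the page is on disk)."""
    vpn = vaddr >> PAGE_OFFSET_BITS       # virtual page number
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    ppn = page_table[vpn]
    if ppn is None:
        raise LookupError("page fault: page %#x is on disk" % vpn)
    return (ppn << PAGE_OFFSET_BITS) | offset

# Hypothetical example: virtual page 0x1 maps to physical page 0x300.
pt = {0x1: 0x300}
print(hex(translate(0x1ABC, pt)))         # -> 0x300abc
```

Note that the offset passes through unchanged; only the page-number bits are remapped.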
Example of Virtual Memory
Relieves the problem of making a program that was too large to fit in physical memory actually fit!
Allows a program to run in any location in physical memory (called relocation). This is really useful, as you might want to run the same program on lots of machines.
[Figure: the logical program occupies contiguous virtual address space (0, 4K, 8K, 12K) and consists of 4 pages: A, B, C, D. Of these, 3 pages (A, B, C) are in physical main memory (addresses 0 through 28K) and 1 page (D) is located on disk.]
Cache Terms vs. VM Terms
Some definitions/"analogies":
A "page" or "segment" of memory is analogous to a "block" in a cache.
A "page fault" or "address fault" is analogous to a cache miss.
The role of the cache is played by "real"/physical memory: if we go to main memory and our data isn't there, we need to get it from disk.
More Definitions and Cache Comparisons
These are more definitions than analogies:
With VM, the CPU produces "virtual addresses" that are translated by a combination of HW/SW to "physical addresses."
The "physical addresses" access main memory.
The process described above is called "memory mapping" or "address translation."
Cache vs. VM Comparisons (1/2)

Parameter           First-level cache     Virtual memory
Block (page) size   16-128 bytes          4096-65,536 bytes
Hit time            1-2 clock cycles      40-100 clock cycles
Miss penalty        8-100 clock cycles    700,000-6,000,000 clock cycles
  (Access time)     (6-60 clock cycles)   (500,000-4,000,000 clock cycles)
  (Transfer time)   (2-40 clock cycles)   (200,000-2,000,000 clock cycles)
Miss rate           0.5-10%               0.00001-0.001%
Data memory size    0.016-1 MB            4 MB-4 GB
Cache vs. VM Comparisons (2/2)
Replacement policy:
Replacement on cache misses is primarily controlled by hardware.
Replacement with VM (i.e., which page do I replace?) is usually controlled by the OS.
Because of the bigger miss penalty, we want to make the right choice.
Sizes:
The size of the processor address determines the size of VM.
Cache size is independent of the processor address size.
Virtual Memory
Timing's tough with virtual memory:
AMAT = Tmem + (1-h) * Tdisk
     = 100 ns + (1-h) * 25,000,000 ns
h (the hit rate) has to be incredibly (almost unattainably) close to perfect for this to work.
So: VM is a "cache," but an odd one.
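Plugging numbers into the AMAT formula above makes the point concrete. The arithmetic is mine, using the slide's 100 ns memory time and 25,000,000 ns disk time:

```python
# How good must the hit rate be for AMAT = Tmem + (1-h) * Tdisk
# to stay close to the DRAM access time?

T_MEM = 100            # ns, main memory access (from the slide)
T_DISK = 25_000_000    # ns, disk access (from the slide)

def amat(hit_rate):
    return T_MEM + (1 - hit_rate) * T_DISK

for h in (0.99, 0.9999, 0.999999):
    print(f"h = {h}: AMAT = {amat(h):,.0f} ns")
# Even h = 99.99% gives ~2,600 ns, 26x slower than DRAM alone.
# Only around h = 99.9999% does AMAT approach the 100 ns memory time.
```

This is why page faults must be vanishingly rare for VM to be usable at all.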
Paging Hardware
[Figure: the CPU issues a 32-bit virtual address, split into a page number and an offset. The page number indexes the page table to obtain a frame number; the frame number plus the unchanged offset form the 32-bit address into physical memory.]
How big is a page? How big is the page table?
Address Translation in a Paging System
[Figure: the program's virtual address (page # + offset) goes to the paging hardware. A page table pointer register, added to the page number, locates the page table entry in main memory; the frame # read from that entry is combined with the offset to form the physical address (frame # + offset).]
How Big Is a Page Table?
Suppose:
a 32-bit architecture
a page size of 4 kilobytes
Therefore the 32-bit address divides as:
Page number: upper 20 bits (2^20 pages)
Offset: lower 12 bits (2^12 = 4096 bytes per page)
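The slide's question can be answered with a little arithmetic. The 4-byte page table entry is my assumption; the slides do not state a PTE size here:

```python
# Page table size for a 32-bit address space with 4 KB pages.

ADDR_BITS = 32
PAGE_SIZE = 4 * 1024                        # 4 KB
OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # 12
VPN_BITS = ADDR_BITS - OFFSET_BITS          # 20

entries = 1 << VPN_BITS                     # 2**20 = 1,048,576 entries
PTE_SIZE = 4                                # assume one 4-byte entry per page
table_bytes = entries * PTE_SIZE

print(f"{entries:,} entries x {PTE_SIZE} B = "
      f"{table_bytes // (1024 * 1024)} MB per process")
# -> 1,048,576 entries x 4 B = 4 MB per process
```

So every process carries megabytes of page table, which motivates the multi-level tables discussed later.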
Test Yourself
A processor asks for the contents of virtual memory address 0x10020. The paging scheme in use breaks this into a VPN of 0x010 and an offset of 0x020.
PTR (a CPU register that holds the address of the page table) has a value of 0x100, indicating that this process's page table starts at location 0x100.
The machine uses word addressing, and the page table entries are each one word long.
PTR = 0x100
Memory reference: VPN = 0x010, OFFSET = 0x020
Test Yourself

ADDR      CONTENTS
0x00000   0x00000
0x00100   0x00010
0x00110   0x00022
0x00120   0x00045
0x00130   0x00078
0x00145   0x00010
0x10000   0x03333
0x10020   0x04444
0x22000   0x01111
0x22020   0x02222
0x45000   0x05555
0x45020   0x06666

What is the physical address calculated?
1. 0x10020
2. 0x22020
3. 0x45000
4. 0x45020
5. none of the above
PTR = 0x100; VPN = 0x010, OFFSET = 0x020
Test Yourself (continued; same memory contents as above)
What is the physical address calculated?
What is the contents of this address returned to the processor?
How many memory accesses in total were required to obtain the contents of the desired address?
PTR = 0x100; VPN = 0x010, OFFSET = 0x020
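The exercise above can be worked mechanically. This walkthrough is mine, not part of the slides; it uses only the two memory words the lookup actually touches:

```python
# Working the "Test Yourself" example: word-addressed memory,
# one-word PTEs, PTR = 0x100, VPN = 0x010, offset = 0x020.

memory = {
    0x00110: 0x00022,   # page table entry for VPN 0x010 (at PTR + VPN)
    0x22020: 0x02222,   # the data word itself
}

PTR, VPN, OFFSET = 0x100, 0x010, 0x020

pte_addr = PTR + VPN                 # 0x110: where the PTE lives
frame = memory[pte_addr]             # 0x022: physical frame number
# The offset field is 3 hex digits wide here, so shift the frame by 12 bits.
paddr = (frame << 12) | OFFSET       # 0x22020 -- answer choice 2
data = memory[paddr]                 # 0x02222 returned to the processor

print(hex(paddr), hex(data))         # -> 0x22020 0x2222
# Two memory accesses in total: one for the PTE, one for the data.
```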
Another Example
[Figure: a 16-byte logical memory (bytes a through p at addresses 0-15) with 4-byte pages is mapped through the page table {0 -> 5, 1 -> 6, 2 -> 1, 3 -> 2} into a 32-byte physical memory. Page 0 (a,b,c,d) lands at physical addresses 20-23, page 1 (e,f,g,h) at 24-27, page 2 (i,j,k,l) at 4-7, and page 3 (m,n,o,p) at 8-11.]
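The mapping in the example above can be reproduced directly. This sketch is mine, not from the slides:

```python
# The example's mapping: 4-byte pages, page table {0: 5, 1: 6, 2: 1, 3: 2}.

PAGE_SIZE = 4
page_table = {0: 5, 1: 6, 2: 1, 3: 2}

logical = "abcdefghijklmnop"          # logical memory, addresses 0..15
physical = [None] * 32                # 32-byte physical memory

for laddr, byte in enumerate(logical):
    page, offset = divmod(laddr, PAGE_SIZE)
    physical[page_table[page] * PAGE_SIZE + offset] = byte

print(physical[20:24])                # -> ['a', 'b', 'c', 'd']
print(physical[4:8])                  # -> ['i', 'j', 'k', 'l']
```

Each 4-byte logical page lands wherever its page-table entry says, even though the logical addresses are contiguous.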
Replacement policies
Block Replacement
Which block should be replaced on a virtual memory miss?
Again, we'll stick with the strategy that it's a good thing to eliminate page faults. Therefore, we want to replace the LRU block.
Many machines use a "use" or "reference" bit, periodically reset, which gives the OS an estimate of which pages are being referenced.
Writing a Block
What happens on a write?
We don't even want to think about a write-through policy! The time involved with accesses, VM, the hard disk, etc. is so great that it is not practical.
Instead, a write-back policy is used, with a dirty bit to tell whether a block has been written.
Mechanism vs. Policy
Mechanism:
paging hardware
trap on page fault
Policy:
fetch policy: when should we bring in the pages of a process?
1. load all pages at the start of the process
2. load only on demand: "demand paging"
replacement policy: which page should we evict, given a shortage of frames?
Replacement Policy
Given a full physical memory, which page should we evict? What policy?
Random
FIFO: first-in, first-out
LRU: least recently used
MRU: most recently used
OPT: optimal (evict the page that will not be used for the longest time in the future)
Replacement Policy Simulation
Example sequence of page numbers: 0 1 2 3 4 2 2 3 7 1 2 3
FIFO? LRU? OPT?
How do you keep track of LRU info? (another data structure question)
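Two of the policies above can be simulated in a few lines. This is my own sketch, not from the slides; the 3-frame memory size is an assumption, and the reference string follows the slide's example:

```python
# Simulating FIFO and LRU page replacement on a reference string.
from collections import OrderedDict

def simulate(refs, frames, policy):
    """Count page faults; policy is 'fifo' or 'lru'."""
    resident = OrderedDict()              # pages in memory, oldest first
    faults = 0
    for page in refs:
        if page in resident:
            if policy == "lru":
                resident.move_to_end(page)   # refresh recency on a hit
            continue                         # FIFO ignores hits
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)     # evict front of the queue
        resident[page] = True
    return faults

refs = [0, 1, 2, 3, 4, 2, 2, 3, 7, 1, 2, 3]
print("FIFO faults:", simulate(refs, 3, "fifo"))
print("LRU  faults:", simulate(refs, 3, "lru"))
```

The only difference between the two policies is whether a hit refreshes a page's position in the eviction order, which is exactly the "data structure question" the slide raises.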
Page Tables and Lookups
1. It's slow! We've turned every access to memory into two accesses to memory.
Solution: add a specialized "cache" called a "translation lookaside buffer (TLB)" inside the processor.
2. It's still huge! Even worse: we're ultimately going to have a page table for every process. Suppose 1024 processes; that's 4 GB of page tables!
Paging/VM (1/3)
[Figure: the CPU issues virtual page 42; the page table maps it to physical frame 356 in physical memory. An invalid entry (i) means the page is on disk, and the operating system must bring it in.]
Paging/VM (2/3)
[Figure: the same translation (virtual page 42 to physical frame 356), but now the page table itself resides in physical memory.]
Place the page table in physical memory.
However: this doubles the time per memory access!!
Paging/VM (3/3)
[Figure: a special-purpose cache inside the CPU holds recent translations, so most references avoid the page table in memory.]
A special-purpose cache for translations.
Historically called the TLB: Translation Lookaside Buffer.
It's a cache!
Translation Cache
Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.
TLBs are usually small, typically not more than 128-256 entries even on high-end machines. This permits a fully associative lookup on those machines; most mid-range machines use small n-way set-associative organizations.
Note: 128-256 entries times 4 KB-16 KB per entry is only 512 KB-4 MB, so the L2 cache is often bigger than the "span" of the TLB.
Translation with a TLB
[Figure: the CPU presents a virtual address (VA) to the TLB. On a TLB hit, the physical address (PA) goes straight to the cache, and a cache hit returns data to the CPU. On a TLB miss, the full translation (page table walk) produces the PA, which then probes the cache and, on a cache miss, main memory.]

Translation Cache
A way to speed up translation is to use a special cache of recently used page table entries. This has many names, but the most frequently used is Translation Lookaside Buffer, or TLB.
A TLB entry holds: virtual page number (the tag), physical frame number, and dirty, reference, valid, and access bits.
It is really just a cache (a special-purpose cache) on the page table mappings.
TLB access time is comparable to cache access time (much less than main memory access time).
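The structure above can be sketched as a tiny software TLB. This is my own illustration, not from the lecture; the 4-entry capacity and LRU replacement are assumptions (real TLBs are hardware and often fully associative with other replacement schemes):

```python
# A tiny fully associative TLB with LRU replacement, sitting in front
# of a page table (here just a dict VPN -> PPN).
from collections import OrderedDict

class TLB:
    def __init__(self, page_table, entries=4):
        self.page_table = page_table      # the backing page table
        self.entries = entries
        self.cache = OrderedDict()        # VPN -> PPN, recency-ordered
        self.hits = self.misses = 0

    def lookup(self, vpn):
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)   # refresh recency
            return self.cache[vpn]
        self.misses += 1                  # walk the page table instead
        ppn = self.page_table[vpn]
        if len(self.cache) == self.entries:
            self.cache.popitem(last=False)  # evict the LRU translation
        self.cache[vpn] = ppn
        return ppn

tlb = TLB({v: v + 0x100 for v in range(16)})
for vpn in [1, 2, 1, 1, 3]:
    tlb.lookup(vpn)
print(tlb.hits, tlb.misses)               # -> 2 3
```

Repeated references to the same page hit in the TLB and never touch the page table, which is the entire point.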
An Example of a TLB
[Figure: the virtual address splits into a page frame address <30 bits> and a page offset <13 bits>. Each TLB entry holds a valid bit <1>, read and write permission bits <2> each, a tag <30>, and a physical address field <21>. The low-order 13 bits of the address pass through unchanged; a 32:1 mux selects the matching entry's <21>-bit physical page number, which is concatenated with the 13-bit offset to form a 34-bit physical address. Read/write policies and permissions are checked as part of the lookup.]
The "Big Picture" and TLBs
Address translation is usually on the critical path, which determines the clock cycle time of the processor.
Even in the simplest cache, TLB values must be read and compared.
The TLB is usually smaller and faster than the cache-address-tag memory, so that multiple TLB reads don't increase the cache hit time.
TLB accesses are usually pipelined because they are so important!
The "Big Picture" and TLBs
[Flowchart: a virtual address first goes to the TLB. On a TLB miss, stall and try to read the translation from the page table, setting it in the TLB; a page fault at this step means replacing a page from disk. On a TLB hit, a read tries the cache: a cache hit delivers data to the CPU, a cache miss stalls. A write instead goes to the cache/buffer memory write path.]
Pages Are Cached in a Virtual Memory System
We can ask the same four questions we did about caches.
Q1: Block placement
The choice: lower miss rates with complex placement, or vice versa. The miss penalty is huge, so choose a low miss rate ==> place a page anywhere in physical memory (similar to the fully associative cache model).
Q2: Block addressing: use an additional data structure
Fixed-size pages: use a page table. Virtual page number ==> physical page number, then concatenate the offset.
A tag bit indicates presence in main memory.
Normal Page Tables
The size is the number of virtual pages. The purpose is to hold the translation of VPN to PPN.
This permits easy page relocation. Make sure to keep tags to indicate whether a page is mapped.
Potential problem: consider a 32-bit virtual address and 4 KB pages. 4 GB / 4 KB = 1 M words required just for the page table! We might even have to page in the page table itself...
Consider how the problem gets worse on 64-bit machines with even larger virtual address spaces!
One answer: multi-level page tables.
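A multi-level table avoids allocating entries for untouched regions of the address space. This sketch is mine, not from the slides; the 10/10/12 bit split is one common choice for 32-bit addresses with 4 KB pages:

```python
# A two-level page table: 10-bit outer index, 10-bit inner index,
# 12-bit offset. Inner tables are allocated only when first touched.

OFFSET_BITS, LEVEL_BITS = 12, 10

class TwoLevelPageTable:
    def __init__(self):
        self.outer = {}                       # outer index -> inner table

    def _split(self, vaddr):
        vpn = vaddr >> OFFSET_BITS
        return vpn >> LEVEL_BITS, vpn & ((1 << LEVEL_BITS) - 1)

    def map(self, vaddr, frame):
        hi, lo = self._split(vaddr)
        self.outer.setdefault(hi, {})[lo] = frame

    def translate(self, vaddr):
        hi, lo = self._split(vaddr)
        frame = self.outer[hi][lo]            # KeyError here = page fault
        return (frame << OFFSET_BITS) | (vaddr & ((1 << OFFSET_BITS) - 1))

pt = TwoLevelPageTable()
pt.map(0x12345678, 0xABC)
print(hex(pt.translate(0x12345678)))          # -> 0xabc678
print(len(pt.outer))                          # only 1 inner table allocated
```

A sparse address space that touches a handful of regions needs only a handful of inner tables, instead of the full 4 MB flat table.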
Inverted Page Tables
Similar to a set-associative mechanism.
Make the page table reflect the number of physical pages (not virtual), and use a hash mechanism:
virtual page number ==> hash ==> index into the inverted page table.
Compare the virtual page number with the tag to make sure it is the one you want.
If it matches: check that the page is in memory; OK if yes, page fault if not.
If it does not match: miss; go to the full page table on disk to get the new entry. This implies 2 disk accesses in the worst case. It trades an increased worst-case penalty for a decrease in the capacity-induced miss rate, since there is now more room for real pages with a smaller page table.
Inverted Page Table
[Figure: the page number is hashed to index the inverted table. Each entry holds a page tag, a valid bit (V), and a frame number; an equality check (=) between the stored page tag and the presented page number confirms the match (OK), and the frame number is combined with the offset to form frame + offset.]
Only entries for pages in physical memory are stored.
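The hash-and-compare scheme above can be sketched as follows. This is my own illustration, not from the slides; the 8-frame size and linear probing on collisions are assumptions (real designs often use chaining):

```python
# An inverted page table: one entry per physical frame, indexed by a
# hash of the VPN, with linear probing on collisions.

NUM_FRAMES = 8

class InvertedPageTable:
    def __init__(self):
        # One slot per physical frame; each holds the VPN mapped there.
        self.slots = [None] * NUM_FRAMES

    def insert(self, vpn):
        start = hash(vpn) % NUM_FRAMES
        for i in range(NUM_FRAMES):           # probe for a free frame
            frame = (start + i) % NUM_FRAMES
            if self.slots[frame] is None:
                self.slots[frame] = vpn
                return frame
        raise MemoryError("no free frames: must evict")

    def translate(self, vpn):
        start = hash(vpn) % NUM_FRAMES
        for i in range(NUM_FRAMES):
            frame = (start + i) % NUM_FRAMES
            if self.slots[frame] == vpn:      # tag comparison
                return frame
        raise LookupError("page fault")       # go to the full table on disk

ipt = InvertedPageTable()
f = ipt.insert(0x12345)
print(ipt.translate(0x12345) == f)            # -> True
```

The table's size is proportional to physical memory, not virtual, which is what makes it attractive for huge address spaces.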
Address Translation Reality
The translation process using page tables takes too long! Use a cache to hold recent translations: the Translation Lookaside Buffer.
Typically 8-1024 entries
Block size same as a page table entry (1 or 2 words)
Only holds translations for pages in memory
1-cycle hit time
Highly or fully associative
Miss rate < 1%
A miss goes to main memory (where the whole page table lives)
Must be purged on a process switch
Back to the 4 Questions
Q3: Block replacement (pages in physical memory)
LRU is best, so use it to minimize the horrible miss penalty.
However, real LRU is expensive. The page table contains a use tag; on access, the use tag is set. The OS checks the tags every so often, records what it sees, and resets them all. On a miss, the OS decides which page has been used the least.
Basic strategy: the miss penalty is so huge that you can afford to spend a few OS cycles to help reduce the miss rate.
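The use-bit scheme above can be sketched as an "aging" approximation of LRU. This is my own illustration, not from the slides; the 8-bit age counter is an assumption:

```python
# Reference-bit approximation to LRU: hardware sets a use bit on each
# access; the OS periodically shifts the sampled bits into a per-page
# aging counter and clears them. Smallest counter = best eviction victim.

class AgingLRU:
    def __init__(self, pages):
        self.use_bit = {p: 0 for p in pages}   # set by the "MMU" on access
        self.age = {p: 0 for p in pages}       # history of sampled use bits

    def access(self, page):
        self.use_bit[page] = 1                 # what the hardware would do

    def os_tick(self):
        """OS timer interrupt: fold each use bit into the age counter."""
        for p in self.age:
            self.age[p] = (self.age[p] >> 1) | (self.use_bit[p] << 7)
            self.use_bit[p] = 0                # reset for the next interval

    def victim(self):
        return min(self.age, key=self.age.get) # least-recently-used estimate

lru = AgingLRU(["A", "B", "C"])
lru.access("A"); lru.access("B"); lru.os_tick()
lru.access("A"); lru.os_tick()
print(lru.victim())                            # -> C
```

Pages referenced in recent intervals accumulate high-order bits, so the page with the smallest counter is the one least recently used, to within the sampling interval.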
Last Question
Q4: Write policy
Always write-back, due to the access time of the disk. So you need to keep tags to show when pages are dirty and need to be written back to disk when they're swapped out.
Anything else is pretty silly. Remember: the disk is SLOW!
Page Sizes
An architectural choice.
Large pages are good:
reduces page table size
amortizes the long disk access
if spatial locality is good, then the hit rate will improve
Large pages are bad:
more internal fragmentation: if everything is random, each structure's last page is only half full, and half of bigger is still bigger. If there are 3 structures per process (text, heap, and control stack), then 1.5 pages are wasted per process.
process start-up takes longer, since at least 1 page of each type is required prior to start, and the transfer-time penalty is higher.
More on TLBs
The TLB must be on chip; otherwise it is worthless.
Small TLBs are worthless anyway, and large TLBs are expensive; high associativity is likely.
==> The price of CPUs is going up! That's OK as long as performance goes up faster.
Selecting a Page Size
Reasons for a larger page size:
The page table size is inversely proportional to the page size; therefore memory is saved.
A fast cache hit time is easy when cache size < page size (virtually addressed caches); a bigger page makes this feasible as cache size grows.
Transferring larger pages to or from secondary storage, possibly over a network, is more efficient.
The number of TLB entries is restricted by clock cycle time, so a larger page size maps more memory, thereby reducing TLB misses.
Reasons for a smaller page size:
Want to avoid internal fragmentation: don't waste storage; data must be contiguous within a page.
Quicker process start for small processes: don't need to bring in more memory than needed.
Memory Protection
With multiprogramming, a computer is shared by several programs or processes running concurrently.
Need to provide protection; need to allow sharing.
Mechanisms for providing protection:
Provide base and bound registers: Base ≤ Address ≤ Bound
Provide both user and supervisor (operating system) modes
Provide CPU state that the user can read but cannot write: base and bound registers, user/supervisor bit, exception bits
Provide a method to go from user to supervisor mode and vice versa:
system call: user to supervisor
system return: supervisor to user
Provide permissions for each page or segment in memory
Pitfall: Address Space Too Small
One of the biggest mistakes that can be made when designing an architecture is to devote too few bits to the address.
The address size limits the size of virtual memory, and it is difficult to change, since many components depend on it (e.g., PC, registers, effective-address calculations).
As program size increases, larger and larger address sizes are needed:
8 bit: Intel 8080 (1975)
16 bit: Intel 8086 (1978)
24 bit: Intel 80286 (1982)
32 bit: Intel 80386 (1985)
64 bit: Intel Merced (1998)
Virtual Memory Summary
Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
The large miss penalty of virtual memory leads to different strategies from caches: fully associative placement, TLB + page table, LRU, write-back.
Designs:
paged: fixed-size blocks
segmented: variable-size blocks
hybrid: segmented paging or multiple page sizes
Avoid a small address size.
Summary 2: Typical Choices

Option                 TLB                   L1 Cache          L2 Cache         VM (page)
Block Size             4-8 bytes (1 PTE)     4-32 bytes        32-256 bytes     4k-16k bytes
Hit Time               1 cycle               1-2 cycles        6-15 cycles      10-100 cycles
Miss Penalty           10-30 cycles          8-66 cycles       30-200 cycles    700k-6M cycles
Local Miss Rate        0.1-2%                0.5-20%           13-15%           0.00001-0.001%
Size                   32B-8KB               1-128 KB          256KB-16MB
Backing Store          L1 Cache              L2 Cache          DRAM             Disks
Q1: Block Placement    Fully or set assoc.   DM                DM or SA         Fully associative
Q2: Block ID           Tag/block             Tag/block         Tag/block        Table
Q3: Block Replacement  Random (not last)     N.A. for DM       Random (if SA)   LRU/LFU
Q4: Writes             Flush on PTE write    Through or back   Write-back       Write-back
The End
Lecture 9