Machine Structures Lecture 26 – Virtual Memory I

23
L26 Virtual Memory I (1) Machine Structures Lecture 26 – Virtual Memory I www.sci-tech-today.com/story.xhtml?story_id=13200DIX5QFO CPU + GPU = “Fusion” AMD recently aquired ATI, maker of high-performance graphics chips, and will create a single, unified processor with BOTH the CPU and GPU on a chip. This could yield cheaper PCs & better performance QuickTime™ and a TIFF (Uncompressed) decompres are needed to see this pict QuickTime™ and a TIFF (Uncompressed) decompres are needed to see this pict QuickTime™ and a TIFF (Uncompressed) decompr are needed to see this pi +

description

Machine Structures Lecture 26 – Virtual Memory I. CPU + GPU = “ Fusion ”  AMD recently aquired ATI, maker of high-performance graphics chips, and will create a single, unified processor with BOTH the CPU and GPU on a chip. This could yield cheaper PCs & better performance. +. - PowerPoint PPT Presentation

Transcript of Machine Structures Lecture 26 – Virtual Memory I

Page 1: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (1)

Machine Structures

Lecture 26 – Virtual Memory I

www.sci-tech-today.com/story.xhtml?story_id=13200DIX5QFO

CPU + GPU = “Fusion” AMD recently aquired ATI,

maker of high-performance graphics chips, and will create a single, unified processor

with BOTH the CPU and GPU on a chip. This could yield cheaper PCs & better performance

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.QuickTime™ and a

TIFF (Uncompressed) decompressorare needed to see this picture.

+

Page 2: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (2)

Review: Caches

•Cache design choices:• size of cache: speed v. capacity• direct-mapped v. associative• for N-way set assoc: choice of N• block replacement policy• 2nd level cache?• Write through v. write back?

•Best choice depends on programs, technology, budget.

•Use performance model to pick between choices.

Page 3: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (3)

Another View of the Memory Hierarchy Regs

L2 Cache

Memory

Disk

Tape

Instr. Operands

Blocks

Pages

Files

Upper Level

Lower Level

Faster

Larger

CacheBlocks

Thus far{{Next:

VirtualMemory

Page 4: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (4)

Memory Hierarchy Requirements

•局部性原理实现了对内存的快速访问(接近 Cache 的速度),容量不变。这使我们有了推广这一做法的想法:

• 为什么不使用下一级存储来提供 DRAM 内存的速度,而同时具备磁盘的容量呢 ?

•除了上述想法• 还能做什么 ?

Page 5: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (5)

Memory Hierarchy Requirements

•允许多个进程同时驻留内存,并受到保护(即不允许一个程序读写另一个的内存)

•地址空间 – 每个程序都似乎拥有自己私有的内存

• 假定代码始于地址 0x40000000. 不同的进程代码不同 , 都驻留在同一地址 . 因此每个程序都对内存有一个自己的视图( view ) .

Page 6: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (6)

Virtual Memory

•Called “Virtual Memory”

•Next level in the memory hierarchy:• 给程序的印象是有很大的内存 :

• 正在运行的 “ pages (页面)” 驻留在内存中,其他留在磁盘中 .

•同时也允许 OS 共享内存 , 程序互不干扰•今天,更重要的是保护 ,而不是另一层次

的内存•每个进程( process )认为自己拥有整个

内存• (Historically, it predates caches)

Page 7: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (7)

Virtual to Physical Address Translation

•每个程序操作其自己的虚拟地址空间 ; 当然仅仅限于正在运行的程序•互不干扰•OS 决定每个程序访问哪个内存•硬件提供 virtual physical 映射

virtualaddress

(inst. fetchload, store)

Programoperates inits virtualaddressspace

HWmapping physical

address(inst. fetchload, store)

Physicalmemory

(incl. caches)

Page 8: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (8)

Analogy

•书名好比虚拟内存•书的标准索书号好比物理地址•目录卡片好比 page table, 实现书名到索

书号之间的映射•卡片上的“当地图书馆”还是“其他分支

机构”,正如 valid bit 表示在主存中还是在磁盘上•卡片上 , 可在图书馆使用 2- 小时 ( 还是

借出 checkout 2 周 ) 正如 access rights

Page 9: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (9)

Simple Example: Base and Bound Reg

0

OS

User A

User B

User C

$base

$base+$bound

•Want: • discontinuous mapping

• Process size >> mem

•Addition not enough!

use Indirection!

Enough space for User D,but discontinuous (“fragmentation problem”)

Page 10: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (10)

Mapping Virtual Memory to Physical Memory

0

Physical Memory

Virtual Memory

Code

Static

Heap

Stack

64 MB

•分为相同大小的块 ( 约 4 KB - 8 KB)

0

•虚拟内存的块与物理内存的块间的映射是任意的 (“page”)

Page 11: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (11)

Paging Organization (assume 1 KB pages)

AddrTransMAP

Page is unit of mapping

Page also unit of transfer from disk to physical memory

page 0 1K1K

1K

01024

31744

Virtual Memory

VirtualAddress

page 1

page 31

1K2048 page 2

...... ...

page 001024

7168

PhysicalAddress

PhysicalMemory

1K1K

1K

page 1

page 7...... ...

Page 12: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (12)

Virtual Memory Mapping Function

•无法使用简单函数预测任意映射•使用查表法进行映射

•使用查表法 (“Page Table”) 进行映射 : 页号即索引•虚拟内存映射函数

• Physical Offset = Virtual Offset

• Physical Page Number= PageTable[Virtual Page Number]

(P.P.N. also called “Page Frame”)

Page Number Offset

Page 13: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (13)

Address Mapping: Page Table

Virtual Address:page no. offset

Page TableBase Reg

Page Table located in physical memory

indexintopagetable

+

PhysicalMemoryAddress

Page Table

Val-id

AccessRights

PhysicalPageAddress

.

V A.R. P. P. A.

...

...

Page 14: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (14)

Page Table

•page table 为操作系统中的一个数据结构,包含虚拟地址到物理位置间的映射

• There are several different ways, all up to the operating system, to keep this data around

•操作系统中运行的每个进程都有自己的page table

• 进程的“状态 State”包含 PC, all registers, plus page table

• OS changes page tables by changing contents of Page Table Base Register

Page 15: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (15)

Requirements revisited

Remember the motivation for VM:

•受保护的共享内存• 不同的物理页可以分配给不同的进程 ( 共享 )

• 进程只能访问自己 page table 中的页面 (protection)

•分离地址空间• 由于程序工作在虚拟空间中 , 不同的程序可

以在相同的地址具有不同的数据 / 代码 !

What about the memory hierarchy?

Page 16: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (16)

Page Table Entry (PTE) Format•Contains either Physical Page Number or indication not in Main Memory

•OS maps to disk if Not Valid (V = 0)

• If valid, also check if have permission to use page: Access Rights (A.R.) may be Read Only, Read/Write, Executable

...Page Table

Val-id

AccessRights

PhysicalPageNumber

V A.R. P. P. N.

V A.R. P. P.N.

...

P.T.E.

Page 17: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (17)

Paging/Virtual Memory Multiple Processes

User B: Virtual Memory

Code

Static

Heap

Stack

0Code

Static

Heap

Stack

A PageTable

B PageTable

User A: Virtual Memory

00

Physical Memory

64 MB

Page 18: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (18)

Comparing the 2 levels of hierarchy

Cache version Virtual Memory vers.

Block or Line Page

Miss Page Fault

Block Size: 32-64B Page Size: 4K-8KB

Placement: Fully AssociativeDirect Mapped, N-way Set Associative

Replacement: Least Recently UsedLRU or Random (LRU)

Write Thru or Back Write Back

Page 19: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (19)

Notes on Page Table•解决碎片问题 : 所有块同样大小 , 所有的洞都可使用•OS 必须在磁盘上为每个进程保留 “交换空间 Swap

Space”

•如果有新的进程 , ask Operating System• 若有未使用的页 , OS 优先使用他们• 否则 , OS 将内存中一些旧页与磁盘进行交换• (Least Recently Used to pick pages to swap)

•每个进程有自己的 Page Table

•将来会加入一些细节 , 但 Page Table 是虚拟内存的基础

Page 20: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (20)

Virtual Memory Problem #1

•映射每个地址 每每每每每每每每每每 Page Table 访问内存一次 一次虚拟内存访问 = 两次物理内存访问 SLOW!

•观察 : 由于数据页的局部性 , 因此,这些页的虚拟地址变换也必然有局部性

•因为 small is fast, 为什么不使用一个虚拟到物理地址转换的小 cache 来加速变换呢 ?

•由于历史的原因 , 这个 cache 被称为 Translation Lookaside Buffer, or TLB

Page 21: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (21)

Translation Look-Aside Buffers (TLBs)

•TLBs usually small, typically 128 - 256 entries

• Like any other cache, the TLB can be direct mapped, set associative, or fully associative

Processor TLBLookup

CacheMain

Memory

VA PA

miss

hit dataTrans-lation

hit

miss

On TLB miss, get page table entry from main memory

Page 22: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (22)

Peer Instruction

A. Locality is important yet different for cache and virtual memory (VM): temporal locality for caches but spatial locality for VM

B. Cache management is done by hardware (HW), page table management by the operating system (OS), but TLB management is either by HW or OS

C. VM helps both with security and cost

ABC1: FFF2: FFT3: FTF4: FTT5: TFF6: TFT7: TTF8: TTT

Page 23: Machine Structures Lecture 26 –  Virtual Memory I

L26 Virtual Memory I (24)

And in conclusion…

•Manage memory to disk? Treat as cache• Included protection as bonus, now critical

• Use Page Table of mappings for each uservs. tag/data in cache

• TLB is cache of VirtualPhysical addr trans

•Virtual Memory allows protected sharing of memory between processes

•Spatial Locality means Working Set of Pages is all that must be in memory for process to run fairly well