ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and...

13
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture 28: More Virtual Memory

Transcript of ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and...

Page 1: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

ECE232: Hardware Organization and Design

Lecture 28: More Virtual Memory

Page 2: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

ECE232: More Virtual Memory 2

Overview

Virtual memory used to protect applications from each other

Portions of application located both in main memory and on disk

Need to speed up access for virtual memory

Idea: use a small cache to store translation for frequently used pages

Page 3: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

ECE232: More Virtual Memory 3

How to Translate Fast?

Problem: Virtual Memory requires two memory accesses!

• one to translate Virtual Address into Physical Address (page table lookup) - Page Table is in physical memory

• one to transfer the actual data (hopefully cache hit)

• VM hierarchy only or Cache-memory-disk hierarchy

Why not create a cache of virtual to physical address translations to make translation fast? (smaller is faster)

For historical reasons, such a “page table cache” is called a Translation Lookaside Buffer, or TLB

CPU

Memory

Page 4: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

ECE232: More Virtual Memory 4

Translation-Lookaside Buffer (TLB)

Physical Page N-1

Physical Page 0

Physical Page 1

Main Memory

of page 1

H. Stone, “High Performance Computer Architecture,” AW 1993

Page 5: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

ECE232: More Virtual Memory 5

TLB and Page Table

Page 6: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

ECE232: More Virtual Memory 6

Translation Look-Aside Buffers

TLB is usually small, typically 32-512 entries

Like any other cache, the TLB can be fully associative, set associative, or direct mapped

TLB Cache Main

Memory

misshit

data

hit

miss

Disk

MemoryOS FaultHandler

page fault/protection violation

Page

Table

data

virtualaddr.

physicaladdr.

Processor

Page 7: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

ECE232: More Virtual Memory 7

Steps in Memory Access - Example

TLB Cache Main

Memory

misshit

data

hit

miss

Disk

MemoryOS FaultHandler

Page

Table

data

virtualaddr.

physicaladdr.

CPU

Page 8: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

ECE232: More Virtual Memory 8

Valid Tag Data

Page offset

Page offset

Virtual page number

Physical page numberValid

1220

20

16 14

Cache index

32

DataCache hit

2Byte

offset

Dirty Tag

TLB hit

Physical page number

Physical address tag

31 30 29 15 14 13 12 11 10 9 8 3 2 1 0 DECStation 3100/MIPS R2000Virtual Address

TLB

Cache

64 entries,

fully

associative

Physical Address

16K entries,

direct

mapped

Page 9: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

ECE232: More Virtual Memory 9

Real Stuff: Pentium Pro Memory Hierarchy

Address Size: 32 bits (VA, PA)

VM Page Size: 4 KB

TLB organization: separate i,d TLBs(i-TLB: 32 entries, d-TLB: 64 entries)4-way set associativeLRU approximatedhardware handles miss

L1 Cache: 8 KB, separate i,d 4-way set associativeLRU approximated32 byte blockwrite back

L2 Cache: 256 or 512 KB

Page 10: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

ECE232: More Virtual Memory 10

Each processor has: private 32-KB instruction and 32-KB data

caches and a 512-KB L2 cache. The four cores share an 8-MB L3

cache. Each core also has a two-level TLB.

Intel “Nehalim” quad-core processor

13.519.6 mm die;

731 million

transistors;

Two 128-bit

memory channels

Page 11: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

ECE232: More Virtual Memory 11

Intel Nehalem AMD Opteron X4

Virtual addr 48 bits 48 bits

Physical addr

44 bits 48 bits

Page size 4KB, 2/4MB 4KB, 2/4MB

L1 TLB(per core)

L1 I-TLB: 128 entries

L1 D-TLB: 64 entries

Both 4-way, LRU replacement

L1 I-TLB: 48 entries

L1 D-TLB: 48 entries

Both fully associative, LRU replacement

L2 TLB(per core)

Single L2 TLB: 512 entries

4-way, LRU replacement

L2 I-TLB: 512 entries

L2 D-TLB: 512 entries

Both 4-way, round-robin LRU

TLB misses Handled in hardware Handled in hardware

Comparing Intel’s Nehalim to AMD’s Opteron

Page 12: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

ECE232: More Virtual Memory 12

Intel Nehalem AMD Opteron X4

L1 caches(per core)

L1 I-cache: 32KB, 64-byte blocks, 4-way, approx LRU, hit time n/a

L1 D-cache: 32KB, 64-byte blocks, 8-way, approx LRU, write-back/allocate, hit time n/a

L1 I-cache: 32KB, 64-byte blocks, 2-way, LRU, hit time 3 cycles

L1 D-cache: 32KB, 64-byte blocks, 2-way, LRU, write-back/allocate, hit time 9 cycles

L2 unified cache(per core)

256KB, 64-byte blocks, 8-way, approx LRU, write-back/allocate, hit time n/a

512KB, 64-byte blocks, 16-way, approx LRU, write-back/allocate, hit time n/a

L3 unified cache (shared)

8MB, 64-byte blocks, 16-way, write-back/allocate, hit time n/a

2MB, 64-byte blocks, 32-way, write-back/allocate, hit time 32 cycles

Further Comparison

Page 13: ECE232: Hardware Organization and Design · 2019. 8. 29. · Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture

ECE232: More Virtual Memory 13

Summary

Virtual memory allows the appearance of a main memory that is larger than what is physically present

Virtual memory can be shared by multiple applications

Page table indicates how to translate from virtual to physical address

TLB speeds up access to virtual memory

• Generally set associative or fully associative

• Much smaller than main memory

Next time: Putting it all together (cache, TLB, virtual memory)