CS161 Spring 2014 11
VM Internals
• Topics• Hardware support for virtual memory
• Learning Objectives:• Specify what the hardware must provide to enable virtual
memory.• Discuss the hardware mechanisms that make virtual
memory perform well.
2/27/14
OS Designer Humor
CS161 Spring 2014 2
Hey, what if we translated every user address? Wow, that
sounds really slow!
What do we usually do to improve performance? Well, we could
try a cache…
2/27/14
Everything's a Cache
CPU Re-gisters
L1 Cache
L2 Cache
Memory
Disk
3
A Virtual Address Cache
• A translation lookaside buffer (TLB) is a cache of virtual to physical address translation.• Implemented in hardware• It contains a few tens to a few hundreds of mappings between
virtual and physical addresses.
• But a process may have many, many more mappings, how can such a small number help us?
CS161 Spring 2014 42/27/14
A Virtual Address Cache
• A translation lookaside buffer (TLB) is a cache of virtual to physical address translation.• Implemented in hardware• It contains a few tens to a few hundreds of mappings
between virtual and physical addresses.
• But a process may have many, many more mappings, how can such a small number help us?• Programs exhibit locality!• At any one time, you are only executing out of a few pages,
and you are only accessing data from a few pages.• Reuse a mapping many, many times and then replace it
when you no longer need it.
CS161 Spring 2014 52/27/14
The MMU
CS161 Spring 2014 6
CPUMemoryMMU
TLB
Regular Translation
2/27/14
The TLB
CS161 Spring 2014 7
VA PA V
junk junk 0junk junk 0junk junk 0
junk junk 0
1 2 3 4page offset
Compare page number to those in TLB
104800030003
Miss! Trap!
2/27/14
A TLB Miss (in software)
CS161 Spring 2014 8
VA PA Vjunk junk 0junk junk 0junk junk 0
junk junk 0
1 2 3 4page offset
Parallel comparison
10480003Trap• Exception PC• Faulting Address
10480003
11152129003A001D72016800
Page Table
001D
TLB refill
2/27/14
TLB Refill
CS161 Spring 2014 9
VA PA V
junk junk 0junk junk 0junk junk 0
junk junk 0
1 2 3 4page offset104800030003
001D
From mapping
1
2/27/14
1 2 3 4page offset
Physical Address
TLB Hit (1)
CS161 Spring 2014 10
VA PA V
junk junk 0junk junk 003 1D 1
junk junk 0
1 2 3 4page offset10480003
Virtual Address
10480003
?0003 001D 1
Is valid?
2/27/14
001D1D
page offset1 2 3 4
Virtual Address
104C00030003
1 2 3 4page offset
Physical Address
1D
TLB Hit (2)
CS161 Spring 2014 11
VA PA V
junk junk 0junk junk 003 1D 1
junk junk 0
?0003 001D 1
Is valid?001D
104C
2/27/14
page offset1 2 3 43246
Virtual Address
0000
1 2 3 4page offset
Physical Address
When your TLB Fills …
CS161 Spring 2014 12
0000
VA PA V
FD04 2340 10191 140E 10003 001D 1
4223 1403 1
?
3246
We know how to refill, butthere are no empty spots!
2/27/14
TLB Eviction
• How do we decide which entry to evict?• Goal is to evict something you are unlikely to need soon.
• Thoughts?
• We may also be constrained by where we are allowed to place an entry …
13CS161 Spring 20142/27/14
TLB Eviction
• How do we decide which entry to evict?• Goal is to evict something you are unlikely to need soon.
• Thoughts?• If we knew which entry would be used farthest in the future, we would
like to evict that one, because it’s doing us the least good.• If our fortune telling isn’t as good as we would like, how might we
guess?• How about the entry that has been unused the longest?• That would lead us to want to do LRU replacement.
• We may also be constrained by where we are allowed to place an entry …• Where we place entries is determined by how we find
entries in the TLB.
14CS161 Spring 20142/27/14
TLB Searching
• How do we quickly look up a virtual page number?• Linear search?
15CS161 Spring 20142/27/14
TLB Searching
• How do we quickly look up a virtual page number?• Linear search?
• Very simple!• Likely to be really slow!
• Direct mapped• Let each page number map to a particular slot in the TLB.• Typically select the slot be extracting a few bits from the page number:
16
VA PA V
FD04 2340 10191 140E 10003 001D 11244 1403 1
1010101010101010
High bits? Low bits?Middle bits?
CS161 Spring 20142/27/14
Direct Mapping: Which bits?
• High bits:
• Low bits:
• Middle bits:
• Mixture of high and low bits:
17CS161 Spring 20142/27/14
Direct Mapping: Which bits?
• High bits:• All the pages in a particular segment map to the same entry.
• Low bits:• Great for sequential access• “Obvious” places in different segments will map to the same
place (e.g., beginning of code and beginning of data).
• Middle bits:• Sensitive to segment sizes and actual mapping.
• Mixture of high and low bits:• Works well using mostly low and adding in a high bit to
differentiate segments.
18CS161 Spring 20142/27/14
Direct Mapping: Thrashing (1)
• One of the problems with direct mapped caches is that they are subject to thrashing: that is repeated missing and eviction of the same pages.
• Consider a sequence of instructions whose instruction addresses and data addresses map to the same entry in the TLB.
• For example:for (i = 0; i < 1024; i++)
sum += array[i]
• Let’s assume that this loop starts at PC 12340000 and the array resides at 32040000.
• Our page numbers are: 1234 and 3204
19
0011 01000001 0010
3 41 2
Instruction page
0000 01000011 0010
0 43 2
Data page
CS161 Spring 2014
Map to the same TLB entry!
2/27/14
Direct Mapping: Thrashing (2)
20
12340000: lw r1, 0(r0)12340004: add r2, r2, r112340008: br loop1234000C: add r0, r0, 4
VA PA V
FD04 2340 10191 140E 10002 001D 11143 1403 1
3204 0000R0
PC
VPN PPN Hit/Miss
1234 2244 HIT 1234 2244 14321 8864 13232 5678 14223 5679 1
Direct Mapping: Thrashing (2)
21
12340000: lw r1, 0(r0)12340004: add r2, r2, r112340008: br loop1234000C: add r0, r0, 4
VA PA V
FD04 2340 10191 140E 10002 001D 11143 1403 1
3204 0000R0
PC
VPN PPN Hit/Miss
1234 2244 HIT
3204 3579
1234 2244
1234 2244
1234 2244
1234 2244
3204 3579
1234 2244
1234 2244
1234 2244 14321 8864 13232 5678 14223 5679 1
3204 3579
MISS
MISS
MISS
MISS
HIT
HIT
HIT
HIT
0004
TLB Hit Ratios
• TLB performance or effectiveness is expressed in terms of a hit rate: the percentage of times that an access hits in the TLB.
22
TLB Hit Rate# TLB Accesses
# TLB Hits
CS161 Spring 2014
• What is the hit rate on the previous slide?• Even small TLBs typically achieve about a 98% hit
rate; anything less is intolerable!• How do they do it?
2/27/14
TLB: From Direct Mapped toSet Associative
• The fundamental problem we have is that any address can go in only one place.
• When more than one address that you need frequently maps to that one place, you are out of luck.
• Solution:• Provide more flexibility: let an address map to multiple
locations• Search those locations in parallel.
23CS161 Spring 20142/27/14
2-Way Set Associate TLB
24
VPN PPN Valid VPN PPN Valid
FF00 1234 1 AB00 4321 1
0001 8765 1 5501 2468 1
0002 1357 1 4202 2222 1
8843 1235 1 4203 2223 1
8844 0014 1 3204 2220 1
2345 0105 1 1355 A000 1
3636 FF00 1 2356 0106 1
0007 FE00 1 AB07 4567 1
Comparator Comparator
VPNindex
Physical Page NumberCS161 Spring 20142/27/14
Higher Associativity?
• If two-way is good, wouldn’t four way be better?• How about 8-way?• How about fully associative?
• Any entry can go in any location.• Older, tiny TLBs were, in fact, fully associative.• Implemented with a content-addressable memory (CAM).• CAMs are very expensive and become slower as you increase
their size.
• In practice, most processors use a direct mapped or two-way set associative TLB.
• Once you have a few hundred entries, it doesn’t seem to matter.
252/27/14 CS161 Spring 2014
Back to TLB Eviction
• Recall (slide 13) that we started to look at how TLBs were arranged to answer the question, “How do we select an entry to evict when the TLB is full?”
• Answer:• You do not have many choices.• In a direct mapped TLB?• In a 2-way TLB?• What do you suppose you do?
262/27/14 CS161 Spring 2014
The MMU and Context Switching
• What happens to the state in the TLB on a process switch?
• How could you design a smarter TLB?
• We’ve seen that traps must do a lot of work in saving and restoring state. This makes context switching expensive. What are some of the other (hidden) costs of context switching?
272/27/14 CS161 Spring 2014
The MMU and Context Switching
• What happens to the state in the TLB on a process switch?• Must be invalidated!
• How could you design a smarter TLB?• Add an address space ID in the TLB.
• We’ve seen that traps must do a lot of work in saving and restoring state. This makes context switching expensive. What are some of the other (hidden) costs of context switching?• If the last process used all of the TLB, you will take a lot of
TLB faults as you start running.
282/27/14 CS161 Spring 2014
Address Translation Summary (1)
29
TLB
HW Page Table
Virtual Address
1) Look up in TLB (fast)
OperatingSystem
Physical Address
TLB Hit!!!
2/27/14 CS161 Spring 2014
Address Translation Summary (2)
30
TLB
HW Page Table
Virtual Address
1) Look up in TLB (fast)
OperatingSystem
Physical Address
TLB Miss
2) Look up in page table(slower)
In table
Load TLB
TLB Hit!!!
2/27/14 CS161 Spring 2014
Address Translation Summary (3)
31
TLB
HW Page Table
Virtual Address
1) Look up in TLB (fast)
OperatingSystem
Physical Address
TLB Miss
2) Look up in page table(slower)
Not in table!
Load TLB
TLB Hit!!!
3) Ask theOS for help(slowest) Load
table
2/27/14 CS161 Spring 2014
Hardware/Software Boundary
32
TLB
HW Page Table
Virtual Address
OperatingSystem
Hardware
Software
Not found on allprocessors
2/27/14 CS161 Spring 2014
HW versus SW TLB Fault Handling
• Software (MIPS R2000):• Advantages
Simplicity (of hardware) Flexibility More able to monitor page accesses Good for homework assignments in operating systems
Disadvantages: Slow?
Hardware (x86): Advantages:
Speed Disadvantages:
Less control Hardware dictates page table structure
332/27/14 CS161 Spring 2014
Top Related