CS 471 Operating Systems Yue Cheng

CS 471 Operating Systems Yue Cheng George Mason University Fall 2019


Page 1: CS 471 Operating Systems Yue Cheng

CS 471 Operating Systems

Yue Cheng, George Mason University

Fall 2019

Page 2: CS 471 Operating Systems Yue Cheng

Paging Problems

o Page tables are too slow
o Page tables are too big

Page 3: CS 471 Operating Systems Yue Cheng

Address Translation Steps

o Hardware: for each memory reference
  1. Extract VPN (virt page num) from VA (virt addr)
  2. Calculate addr of PTE (page table entry)
  3. Fetch PTE
  4. Extract PFN (phys page frame num)
  5. Build PA (phys addr)
  6. Fetch PA to register

o Q: Which steps are expensive?

Page 4: CS 471 Operating Systems Yue Cheng

Address Translation Steps

o Hardware: for each memory reference
  1. Extract VPN (virt page num) from VA (virt addr) [cheap]
  2. Calculate addr of PTE (page table entry) [cheap]
  3. Fetch PTE [expensive]
  4. Extract PFN (phys page frame num) [cheap]
  5. Build PA (phys addr) [cheap]
  6. Fetch PA to register [expensive]

o Q: Which steps are expensive?

Page 5: CS 471 Operating Systems Yue Cheng

Address Translation Steps

o Hardware: for each memory reference
  1. Extract VPN (virt page num) from VA (virt addr) [cheap]
  2. Calculate addr of PTE (page table entry) [cheap]
  3. Fetch PTE [expensive]
  4. Extract PFN (phys page frame num) [cheap]
  5. Build PA (phys addr) [cheap]
  6. Fetch PA to register [expensive]

o Q: Which expensive steps can we avoid?

Page 6: CS 471 Operating Systems Yue Cheng

Array Iterator

o A simple code snippet in array.c

    int sum = 0;
    for (i=0; i<N; i++) {
        sum += a[i];
    }

o Compile it using gcc
o Dump the assembly code
  – objdump (Linux) or otool (Mac)

Page 7: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses

Virt
load 0x3000
load 0x3004
load 0x3008
load 0x300C
…

Page 8: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   load 0x100C, load 0x7004
load 0x3008   load 0x100C, load 0x7008
load 0x300C   load 0x100C, load 0x700C
…

Page 9: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   load 0x100C, load 0x7004
load 0x3008   load 0x100C, load 0x7008
load 0x300C   load 0x100C, load 0x700C
…

1st mem access (load 0x100C): Fetch PTE

Page 10: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   load 0x100C, load 0x7004
load 0x3008   load 0x100C, load 0x7008
load 0x300C   load 0x100C, load 0x700C
…

Map VPN to PFN: 3 → 7

Page 11: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   load 0x100C, load 0x7004
load 0x3008   load 0x100C, load 0x7008
load 0x300C   load 0x100C, load 0x700C
…

2nd mem access (load 0x7000): access a[i]

Page 12: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   load 0x100C, load 0x7004
load 0x3008   load 0x100C, load 0x7008
load 0x300C   load 0x100C, load 0x700C
…

Note:
1. Each virt mem access → two phys mem accesses
2. Repeated memory accesses!

Page 13: CS 471 Operating Systems Yue Cheng


Translation Lookaside Buffer (TLB)

Page 14: CS 471 Operating Systems Yue Cheng

Performance Problems of Paging

o A basic memory access protocol
  1. Fetch the translation from in-memory page table
  2. Explicit load/store access on a memory address

o In this scheme every data/instruction access requires two memory accesses
  – One for the page table
  – and one for the data/instruction

o Too much performance overhead!

Page 15: CS 471 Operating Systems Yue Cheng

Speeding up Translation

o The two memory access problem can be solved by the use of a special fast-lookup hardware cache called translation lookaside buffer (TLB)
o A TLB is part of the memory-management unit (MMU)
o A TLB is a hardware cache
o Algorithm sketch
  – For each virtual memory reference, hardware first checks the TLB to see if the desired translation is held therein

Page 16: CS 471 Operating Systems Yue Cheng

TLB Basic Algorithm

1. Extract VPN from VA

Page 17: CS 471 Operating Systems Yue Cheng

TLB Basic Algorithm

1. Extract VPN from VA
2. Check if TLB holds the translation

Page 18: CS 471 Operating Systems Yue Cheng

TLB Basic Algorithm

1. Extract VPN from VA
2. Check if TLB holds the translation
3. If it is a TLB hit – extract PFN from the TLB entry, concatenate it onto the offset to form the PA

Page 19: CS 471 Operating Systems Yue Cheng

TLB Basic Algorithm

1. Extract VPN from VA
2. Check if TLB holds the translation
3. If it is a TLB hit (fast path) – extract PFN from the TLB entry, concatenate it onto the offset to form the PA

Page 20: CS 471 Operating Systems Yue Cheng

TLB Basic Algorithm

1. Extract VPN from VA
2. Check if TLB holds the translation
3. If it is a TLB hit (fast path) – extract PFN from the TLB entry, concatenate it onto the offset to form the PA
4. If it is a TLB miss – access page table to get the translation, update the TLB entry with the translation

Page 21: CS 471 Operating Systems Yue Cheng

TLB Basic Algorithm

1. Extract VPN from VA
2. Check if TLB holds the translation
3. If it is a TLB hit (fast path) – extract PFN from the TLB entry, concatenate it onto the offset to form the PA
4. If it is a TLB miss (slow path) – access page table to get the translation, update the TLB entry with the translation

Page 22: CS 471 Operating Systems Yue Cheng

Array Iterator (w/ TLB)

    int sum = 0;
    for (i=0; i<1024; i++) {
        sum += a[i];
    }

Page 23: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt
load 0x3000
load 0x3004
load 0x3008
load 0x300C
…

Page 24: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000
load 0x3004
load 0x3008
load 0x300C
…

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
0
0
0
0

Page 25: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   (TLB miss)
load 0x3004
load 0x3008
load 0x300C
…

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
0
0
0
0

Page 26: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   (TLB miss) load 0x100C
load 0x3004
load 0x3008
load 0x300C
…

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
0
0
0
0

Page 27: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   load 0x100C
load 0x3004
load 0x3008
load 0x300C
…

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
1      3     7
0
0
0

Page 28: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004
load 0x3008
load 0x300C
…

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
1      3     7
0
0
0

Page 29: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004
load 0x3008
load 0x300C
…

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
1      3     7
0
0
0

Page 30: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   (TLB hit)
load 0x3008
load 0x300C
…

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
1      3     7
0
0
0

Page 31: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   (TLB hit) load 0x7004
load 0x3008
load 0x300C
…

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
1      3     7
0
0
0

Page 32: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   (TLB hit) load 0x7004
load 0x3008   (TLB hit) load 0x7008
load 0x300C   (TLB hit) load 0x700C
…

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
1      3     7
0
0
0

Page 33: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   (TLB hit) load 0x7004
load 0x3008   (TLB hit) load 0x7008
load 0x300C   (TLB hit) load 0x700C
…
load 0x2000

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
1      3     7
0
0
0

Page 34: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   (TLB hit) load 0x7004
load 0x3008   (TLB hit) load 0x7008
load 0x300C   (TLB hit) load 0x700C
…
load 0x2000   (TLB miss) load 0x1008

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
1      3     7
1      2     4
0
0

Page 35: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   (TLB hit) load 0x7004
load 0x3008   (TLB hit) load 0x7008
load 0x300C   (TLB hit) load 0x700C
…
load 0x2000   load 0x1008, load 0x4000

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
1      3     7
1      2     4
0
0

Page 36: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   (TLB hit) load 0x7004
load 0x3008   (TLB hit) load 0x7008
load 0x300C   (TLB hit) load 0x700C
…
load 0x2000   load 0x1008, load 0x4000
load 0x2004

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
1      3     7
1      2     4
0
0

Page 37: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   (TLB hit) load 0x7004
load 0x3008   (TLB hit) load 0x7008
load 0x300C   (TLB hit) load 0x700C
…
load 0x2000   load 0x1008, load 0x4000
load 0x2004   (TLB hit)

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
1      3     7
1      2     4
0
0

Page 38: CS 471 Operating Systems Yue Cheng

Trace the Memory Accesses (w/ TLB)

Virt          Phys
load 0x3000   load 0x100C, load 0x7000
load 0x3004   (TLB hit) load 0x7004
load 0x3008   (TLB hit) load 0x7008
load 0x300C   (TLB hit) load 0x700C
…
load 0x2000   load 0x1008, load 0x4000
load 0x2004   (TLB hit) load 0x4004

P1’s page table (VPN → PFN): 0 → 5, 1 → 3, 2 → 4, 3 → 7

CPU’s TLB cache
Valid  Virt  Phys
1      3     7
1      2     4
0
0

Page 39: CS 471 Operating Systems Yue Cheng

How Many TLB Lookups

    int sum = 0;
    for (i=0; i<1024; i++) {
        sum += a[i];
    }

Assume 4KB pages

Page 40: CS 471 Operating Systems Yue Cheng

How Many TLB Lookups

    int sum = 0;
    for (i=0; i<1024; i++) {
        sum += a[i];
    }

Assume 4KB pages

Array a[] has 1024 items, each item is 4 bytes: Size(a) = 4096 bytes

Page 41: CS 471 Operating Systems Yue Cheng

How Many TLB Lookups

    int sum = 0;
    for (i=0; i<1024; i++) {
        sum += a[i];
    }

Assume 4KB pages

Array a[] has 1024 items, each item is 4 bytes: Size(a) = 4096 bytes

Num of TLB misses: 4096/4096 = 1 if a[] is page-aligned, or 2 if it straddles a page boundary

Page 42: CS 471 Operating Systems Yue Cheng

How Many TLB Lookups

    int sum = 0;
    for (i=0; i<1024; i++) {
        sum += a[i];
    }

Assume 4KB pages

Array a[] has 1024 items, each item is 4 bytes: Size(a) = 4096 bytes

Best case:
Num of TLB misses: 4096/4096 = 1
TLB miss rate: 1/1024 ≈ 0.1%

Page 43: CS 471 Operating Systems Yue Cheng

How Many TLB Lookups

    int sum = 0;
    for (i=0; i<1024; i++) {
        sum += a[i];
    }

Assume 4KB pages

Array a[] has 1024 items, each item is 4 bytes: Size(a) = 4096 bytes

Best case:
Num of TLB misses: 4096/4096 = 1
TLB miss rate: 1/1024 ≈ 0.1%

TLB hit rate: ≈ 99.9% (almost 100%)

Page 44: CS 471 Operating Systems Yue Cheng

TLB Content

o Some entries are wired down (reserved) for permanently valid translations
o TLB is a fully associative cache
  – Any given translation can be anywhere in the TLB
  – Hardware searches entire TLB in parallel to find a match
o A typical TLB entry
    VPN | PFN | other bits

Page 45: CS 471 Operating Systems Yue Cheng

Paging Hardware w/ TLB

[Figure: paging hardware with TLB]

Page 46: CS 471 Operating Systems Yue Cheng

TLB Issue: Context Switch

o TLB contains translations only valid for the currently running process
o Switching from one process to another requires OS or hardware to do more work

Page 47: CS 471 Operating Systems Yue Cheng

One Example

o How does OS distinguish which entry is for which process?

[Figure: TLB entries belonging to processes P1 and P2]

Page 48: CS 471 Operating Systems Yue Cheng

One Simple Solution: Flush

o OS flushes the whole TLB on context switch
o Flush operation sets all valid bits to 0

Page 49: CS 471 Operating Systems Yue Cheng

One Simple Solution: Flush

o OS flushes the whole TLB on context switch
o Flush operation sets all valid bits to 0
o Problem: the overhead is too high if OS switches processes too frequently

Page 50: CS 471 Operating Systems Yue Cheng

Optimization: ASID

o Some hardware systems provide an address space identifier (ASID) field in the TLB
o Think of ASID as a process identifier (PID)
  – An 8-bit field

Page 51: CS 471 Operating Systems Yue Cheng

Page Sharing

o Leveraging ASID for supporting page sharing
o In this example, two entries from two processes with two different VPNs point to the same physical page

Page 52: CS 471 Operating Systems Yue Cheng

Page Sharing (cont.)

o Shared code
  – One copy of read-only (reentrant) code shared among processes (e.g., text editors, compilers, window systems)
  – Particularly important for time-sharing environments
o Private code and data
  – Each process keeps a separate copy of the code and data

Page 53: CS 471 Operating Systems Yue Cheng

TLB Replacement Policy

o Cache: When we want to add a new entry to a full TLB, an old entry must be evicted and replaced
o Least-recently-used (LRU) policy
  – Intuition: A page entry that has not been used recently is unlikely to be used in the near future
o Random policy
  – Evicts an entry at random

Page 54: CS 471 Operating Systems Yue Cheng

TLB Workloads

o Sequential array accesses can almost always hit in the TLB, and hence are very fast
o What pattern would be slow?

Page 55: CS 471 Operating Systems Yue Cheng

TLB Workloads

o Sequential array accesses can almost always hit in the TLB, and hence are very fast
o What pattern would be slow?
  – Highly random, with no repeat accesses

Page 56: CS 471 Operating Systems Yue Cheng

Workload Characteristics

Workload A

    int sum = 0;
    for (i=0; i<1024; i++) {
        sum += a[i];
    }

Workload B

    int sum = 0;
    srand(1234);
    for (i=0; i<512; i++) {
        sum += a[rand() % N];
    }
    srand(1234); // same seed
    for (i=0; i<512; i++) {
        sum += a[rand() % N];
    }

Page 57: CS 471 Operating Systems Yue Cheng

Access Patterns

[Figure: address (Addr) vs. Time plots for Workload A and Workload B]

Page 58: CS 471 Operating Systems Yue Cheng

Access Patterns

[Figure: address (Addr) vs. Time plots for Workload A and Workload B]

Workload A: Spatial Locality
Workload B: Temporal Locality

Page 59: CS 471 Operating Systems Yue Cheng

Workload Locality

o Spatial locality:
  – Future access will be to nearby addresses
o Temporal locality:
  – Future access will be repeated to the same data

Page 60: CS 471 Operating Systems Yue Cheng

Workload Locality

o Spatial locality:
  – Future access will be to nearby addresses
o Temporal locality:
  – Future access will be repeated to the same data
o Q: What TLB characteristics are best for each type?

Page 61: CS 471 Operating Systems Yue Cheng

Workload Locality

o Spatial locality:
  – Future access will be to nearby addresses
o Temporal locality:
  – Future access will be repeated to the same data
o Q: What TLB characteristics are best for each type?
  – One TLB entry holds the translation for one memory page: all accesses to that particular page benefit from this single TLB entry (spatial locality)
  – TLB is a small cache (if supporting LRU): memory accesses with temporal locality benefit

Page 62: CS 471 Operating Systems Yue Cheng

TLB Replacement Policy

o Cache: When we want to add a new entry to a full TLB, an old entry must be evicted and replaced
o Least-recently-used (LRU) policy
  – Intuition: A page entry that has not been used recently is unlikely to be used in the near future
o Random policy
  – Evicts an entry at random

Page 63: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
0
0
0
0

Page 64: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      0     ?
0
0
0

Page 65: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      0     ?
0
0
0

TLB miss

Page 66: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      0     ?
1      1     ?
0
0

Page 67: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      0     ?
1      1     ?
0
0

TLB miss

Page 68: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      0     ?
1      1     ?
1      2     ?
0

Page 69: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      0     ?
1      1     ?
1      2     ?
0

TLB miss

Page 70: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      0     ?
1      1     ?
1      2     ?
1      3     ?

Page 71: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      0     ?
1      1     ?
1      2     ?
1      3     ?

TLB miss

Page 72: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      0     ?
1      1     ?
1      2     ?
1      3     ?

Page 73: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      0     ?
1      1     ?
1      2     ?
1      3     ?

Now, 0 is the least-recently used item in TLB

Page 74: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      4     ?
1      1     ?
1      2     ?
1      3     ?

Replace 0 with 4

Page 75: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      4     ?
1      1     ?
1      2     ?
1      3     ?

TLB miss

Replace 0 with 4

Page 76: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      4     ?
1      1     ?
1      2     ?
1      3     ?

Accessing 0 again, which was unfortunately just evicted…

Page 77: CS 471 Operating Systems Yue Cheng

LRU Trouble

Virt addr: 0, 1, 2, 3, 4

CPU’s TLB cache
Valid  Virt  Phys
1      4     ?
1      0     ?
1      2     ?
1      3     ?

TLB miss

Accessing 0 again, which was unfortunately just evicted…
Replace 1 (which is the least-recently used item at this point) with 0…

Page 78: CS 471 Operating Systems Yue Cheng

Takeaway

o LRU
o Random
o When is each better?
  – Sometimes random is better than a “smart” policy!