Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and...

47
Operating System Support for Shared Hardware Data Structures A dissertation thesis by Gedare Bloom George Washington University SEAS / Computer Science 19 Nov. 2012 Tompkins Hall 205 Committee Members: Gabriel Parmer, GWU CS Evan Drumwright, GWU CS Guru Venkataramani, GWU ECE Advised by Bhagirath Narahari and Rahul Simha

Transcript of Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and...

Page 1: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

Operating System Support for Shared Hardware Data Structures

A dissertation thesis by Gedare Bloom

George Washington UniversitySEAS / Computer Science

19 Nov. 2012Tompkins Hall 205

Committee Members:Gabriel Parmer, GWU CSEvan Drumwright, GWU CSGuru Venkataramani, GWU ECE

Advised by Bhagirath Narahari and Rahul Simha

Page 2: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

2

Memory wall

Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

Page 3: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

3

Cache grows with bandwidth

Page 4: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

4

Diminishing returns of cache growth

Cache Size (KB)

Cac

he

mis

s ra

te b

y ty

pe

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

1 2 4 8

16 32 64 128

1-way

2-way

4-way

8-way

Capacity

Conflict

Compulsory

Hennessy and Patterson

Page 5: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

5

Prefetching picks up the slack

5

Spe

edup

Hennessy and Patterson

Page 6: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

6

bzip

2ga

pm

cf

pars

er

vorte

xvp

r

amm

pap

plu ar

t

equa

ke

face

rec

galg

el

mes

am

grid

sixtra

cksw

im

wupwise

gmea

n

hmea

n0.0

1.0

2.0

3.0

4.0

5.0No Prefetching Conservative Aggressive

Prefetching not always beneficialIn

stru

ctio

ns p

er c

ycle

S. Srinath, O. Mutlu Kim Hyesoon, Y.N. Patt “Feedback Directed Prefetching” 2007

48% slowdown!

Page 7: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

7

Multicore game changer

● Industry: “use spare transistors for cores”

● Problems– Parallel programming is hard

– More data sharing: bad for cache

– More bus contention: bad for prefetching

Page 8: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

8

Hardware data structures (HWDSs)

● HWDS = parallelism + smart storage– Advantage: reduce algorithmic complexity

– Disadvantage: devotion of chip space

● New use for spare transistors

Page 9: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

9

Priority Queue HWDS

N-1

out

in

1

out

in

0

outin

out

in

out

control

priority

payload

priority

payload

HighestPriority

New entry

Page 10: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

10

Map HWDS

key

value

Insert

CAM RAM

Search/Extract

index

key

value

Page 11: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

11

Why PQ and Map?

● Critical to important software– 50-60% Dijkstra's algorithm

– 30% grey-weighted distance transform

– 40% discrete event simulation

– 18% real-time task scheduling

– 12% web browser

– 20% physics simulations / scientific apps

– 30%-900% referent object (bounds) checker

PQ

Map

Page 12: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

12

OS Support for HWDSs

● This thesis contributes– DS operation API

– Spilling HWDS overflow

– HWDS Assignment for sharing hardware

– Multiple kinds of HWDSs

– Improved predictability for real-time systems

– Cycle-accurate evaluation with real-world data

Page 13: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

13

Hardware’s advantage: parallelismso

ftwa

reh

ard

war

e

0

1

2

3

0

1

time

Page 14: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

14

Hardware’s advantage: parallelismso

ftwa

reh

ard

war

e

0

1

2

3

0

1

time

Page 15: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

15

Hardware’s disadvantage: capacityso

ftwa

reh

ard

war

e

0

1

2

3

0

1

time

Page 16: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

16

Hardware’s disadvantage: capacityso

ftwa

reh

ard

war

e

malloc

No space!

0

1

2

3

0

1

time

4

Page 17: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

17

Solving capacity by spilling

Overflow (software) DS HWDS

SPILL

Page 18: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

18

Solving capacity by filling

Overflow (software) DS HWDS

FILL

Page 19: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

19

HWDS Overflow: spill and fill

● United HWDS– Structural locality, HWDS knowledge

● Split HWDS– Spill inserts to overflow DS

● Overflow support needed– spill / fill HWDS instructions

– Exception handling / interposition

– Size limits

– What and how much to spill / fill

Page 20: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

20

PQ Overflow

● Structural locality– Spill lowest priority node

– Fill highest priority node

● Insert-after-spill violates ordering– Hardware marks ordering violations

– Filling clears the marks

● United HWDS: merge-sorted linked list

Page 21: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

21

Map Overflow

● Spill locality– Least recently used (LRU)

– Least frequently used (LFU)

● Fill locality– Most recently used (MRU)

– Most frequently used (MFU)

● Search fail-over– Fill after search (FAS)

United HWDS?

Page 22: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

22

Hardware’s disadvantage: sharing

0

1

2

3

0

1

time

softw

are

ha

rdw

are

Page 23: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

23

Hardware’s disadvantage: sharing

0

1

2

3

0

1

time

softw

are

ha

rdw

are

Wrong DS!

Page 24: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

24

Sharing hardware: context switch

Overflow (software) DS HWDS

Context switch

Page 25: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

25

Sharing hardware: context switch

Overflow (software) DS HWDS

Context switch

Page 26: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

26

Sharing hardware: Assignment

● HWDS or software implementation?● Static / dynamic algorithms

– Context switch● How much to restore● Pinning● Eviction

– Size limits

– Interposition

– Stalling

Page 27: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

27

HWDSs for Hard Real-Time

● HWDS reduces variance in operation times ● Apply OS support for HWDS to real-time

– Four HWDS assignment algorithms● Software-only assignment (SOA)● Hardware-only assignment (HOA)● Priority-aware assignment (PAA)● Context switch cost-aware assignment (CSCAA)

● Real-world applications improve by5–15% utilization (see [1])

1. G. Bloom, G. Parmer, B. Narahari, and R. Simha, “Shared Hardware Data Structures for Hard Real-Time Systems,” 12th International Conference on Embedded Software. EMSOFT 2012. October 2012

Page 28: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

28

Experiments: Setup

● HWDSs implemented in Simics/GEMS

– New functional unit: atomic, non-speculative

– 12-cycle HWDS instructions● OS support in RTEMS

– Exception handling and interposition library

– Overflow DS

– HWDS context switch and assignment

– Task scheduling

Page 29: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

29

Synthetic benchmarks

● Pending event set: classic hold [1]– Build PQ to max size

– Execute hold operations

● Skewed search [2]– Build map to max size

– Execute search and update operation

1. Douglas W Jones. An empirical comparison of priority-queue and event-set implementations. Commun. ACM, 29(4):300311, April 1986.2. Jim Bell and Gopal Gupta. An evaluation of self-adjusting binary search tree techniques. Software: Practice and Experience, 23(4):369–382, April 1993.

Page 30: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

30

pq-1task-4k-1024-1.eps

Overflow with HWPQ

pq-1task-4k-128-1.eps

Small PQ Big PQ

Page 31: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

31

HWDS sharing: 4 same-sized PQs

pq-4tasks-4k-same-1-1.eps

Page 32: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

32

Map overflow: spill last

Noskew

Largeskew

80% updatesRead-only

Page 33: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

33

Map overflow: LRU, FAS

Noskew

Largeskew

80% updatesRead-onlyFail

Page 34: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

34

Bigger HWDS, LRU FAS, skew80% updatesRead-only

128

1024

Page 35: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

35

Drill Down: LRU FAS w/ large skew80% updatesRead-only

128

1024

Page 36: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

36

Evict on extract (4Kops, 80% updates)No skew Large skew

Page 37: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

37

Sharing different size maps (4Kops)80% updatesRead-only

Noskew

Largeskew

Page 38: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

38

Size checked Map HWDS (4Kops)80% updatesRead-only

Noskew

Largeskew

Fail

Page 39: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

39

Synthetic Benchmarks: Summary

● Overflow and useful SW:HW ratios

< 16:1 for PQ – advantage United HWDS

< 1.5:1 for update-heavy skewed search

● Shared HWDSs are effective

● Policies can avoid performance loss

Page 40: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

40

Real-World Applications: Planning

● Dijkstra's Algorithm, A* search– PQ time can be 50% or more of total

– Benefits from change-key operation

● 9th DIMACS challenge: GPS navigation– USA road maps

Input PQ Size PQ Operations PQ timeNY 925 528693 28.50%

BAY 886 642540 27.10%

COL 945 871332 30.10%Behavior of 3 smallest USA inputs

Page 41: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

41

256 512 10240.0%

5.0%

10.0%

15.0%

20.0%

25.0%

30.0%

-33.6%

3.8%

21.2%

6.7%

23.2%24.6%

9.8%

24.1%26.0%

NY BAY COL

Hardware Priority Queue Size

Per

form

ance

ver

sus

Sm

artQ

United HWDS: 5 queries

Page 42: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

42

Colorado benchmark: 1 query

128 256 512 10240

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

0.540.65

0.860.940.96

1.15

1.33 1.36

SmartQ SplitHWDS UnitedHWDS

Hardware Priority Queue Size

Nor

mal

ized

Exe

cutio

n T

ime

Fail

Page 43: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

43

GPS Application: Summary

● Performance improves despite overflow

● Prior art does not use HWDS effectively

● This thesis benefits real-world applications

Page 44: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

44

Conclusion

● HWDS can improve memory workloads● OS support necessary for applications● This thesis:

– Handles overflow better

– First to support HWDS sharing

– Demonstrates benefit for real applications

– Evaluates overheads with cycle precision

– Opens a new door for future explorations

Page 45: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

45

Future Work

● Memory access policies like with cache● HWDS assignment algorithms● Sharing data in a HWDS● OS optimizations from HWDS knowledge● Language and library integration● Hardware improvements

Page 46: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

46

PublicationsG. Bloom, G. Parmer, B. Narahari, and R. Simha, “Shared Hardware Data Structures for Hard Real-Time Systems,” 12th International Conference on Embedded Software. EMSOFT 2012. October 2012.

E. Leontie, G. Bloom, B. Narahari, and R. Simha, “No Principal Too Small: Memory Access Control for Fine-Grained Protection Domains,” 15th Euromicro Conference on Digital System Design. DSD 2012. September 2012.

G. Bloom, E. Leontie, B. Narahari, and R. Simha, “Chapter 12 - Hardware and Security: Vulnerabilities and Solutions,” in Handbook on Securing Cyber-Physical Critical Infrastructure, Boston: Morgan Kaufmann, 2012, pp. 305–331. ISBN: 978-0-12-415815-3.

E. Leontie, G. Bloom, R. Simha, “Automation for Creating and Configuring Security Manifests for Hardware Containers,” 4th Symposium on Configuration Analytics and Automation. SafeConfig 2011. October 2011.

G. Bloom, G. Parmer, B. Narahari, and R. Simha. “Real-Time Scheduling with Hardware Data Structures,” Work in Progress, IEEE Real-Time Systems Symposium, 2010. RTSS 2010. December 2010.

G. Bloom, B. Narahari, and R. Simha. “Fab Forensics: Increasing Trust in IC Fabrication,” IEEE International Conference on Technologies for Homeland Security, 2010. HST '10. November 2010.

E. Leontie, G. Bloom, O. Gelbart, B. Narahari, and R. Simha. “A compiler-hardware technique for protecting against buffer overflow attacks,” Journal of Information Assurance and Security, vol. 5, no.1, pp. 1-8, 2010.

E. Leontie, G. Bloom, B. Narahari, R. Simha, and J. Zambreno. “Hardware-enforced Fine-grained Isolation of Untrusted Code,” Proceedings of the First ACM Workshop on Secure Execution of Untrusted Code. SecuCode '09. November 2009.

G. Bloom, B. Narahari, R. Simha, and J. Zambreno. “Providing secure execution environments with a last line of defense against Trojan circuit attacks,” Computers & Security, vol. 28, no. 7, pp. 660-669, October 2009.

E. Leontie, G. Bloom, B. Narahari, R. Simha, and J. Zambreno. “Hardware Containers for Software Components: A Trusted Platform for COTS-Based Systems,” 2009 IEEE/IFIP International Symposium on Trusted Computing and Communications. TRUSTCOM 2009. August 2009.

G. Bloom, B. Narahari, and R.Simha. “OS Support for Detecting Trojan Circuit Attacks,” 2nd IEEE International Workshop on Hardware-Oriented Security and Trust. HOST 2009. July 2009.

G. Bloom and S. Popoveniuc, “Information leakage in mix networks with randomized partial checking,” 2009 International Conference on Information Security and Privacy. ISP-09. July 2009.

Page 47: Operating System Support for Shared Hardware Data … · 2 Memory wall Hennessy and Patterson,"Computer Architecture: A Quantitative Approach,” 4th Ed., 2007, Morgan Kaufman Publishers.

47

Thanks!

“programming is basically planning and detailing the enormous traffic of words through the von Neumann bottleneck, and much of that traffic concerns not significant data itself but where to find it”

— John Backus, 1977 ACM Turing Award Lecture

“Advances in microelectronics have made the realization of “smart” data structures a practical reality.”

— Charles Leiserson, Systolic Priority Queues, 1979

Indeed, I believe that virtually every important aspect of programming arises somewhere in the context of sorting or searching!

— Don Knuth, The Art of Computer Programming