Fifteen Implementation Principles

CS 685 Network Algorithmics, Spring 2006




Taxonomy of Principles

• P1-P5: System-oriented Principles
  – These recognize/leverage the fact that a system is made up of components
  – Basic idea: move the problem to somebody else’s subsystem
• P6-P10: Improve efficiency without destroying modularity
  – “Pushing the envelope” of module specifications
  – Basic engineering: system should satisfy spec but not do more
• P11-P15: Local optimization techniques
  – Akin to “peephole optimizations” in compilers
  – Apply these after you have looked at the big picture


Part I: Systems Principles


P1: Avoid Obvious Waste

• Key Concept: look for ways to avoid doing something, or to avoid doing it multiple times
• Example: copying in protocol stacks
  – In the old days, copying was not a bottleneck
  – But transmission & processor speeds increased much faster than memory speeds
    • Today, reading/writing from/to memory is the slowest operation on a packet
  – Eventually multiple copies became a bottleneck
    • “Zero-copy” protocol stacks never copy packet data once it is stored in memory by the NIC


P2: Shift Computation in Time

• Key Concept: Move expensive operations from where time is scarce to where it is more plentiful
  – P2a: Precompute (= shift earlier in time)
    • Example: computing a function by table lookup
  – P2b: Evaluate Lazily (= shift later in time)
    • Example: copy-on-write
  – P2c: Share Expenses (= collect things in time)
    • Example: garbage collection
• Examples:
  – initializing counter arrays: lazy evaluation + batching
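The lazy half of the counter-array example can be sketched as follows. This is a minimal illustration, not necessarily the scheme presented in the course: it assumes a generation-number trick, and the class name `LazyCounterArray` and its methods are hypothetical. The batching half is not shown.

```python
class LazyCounterArray:
    """Counter array whose reset is O(1); stale slots are zeroed lazily on access."""

    def __init__(self, size):
        self.counts = [0] * size
        self.stamp = [0] * size   # generation in which each slot was last refreshed
        self.generation = 0       # current generation (assumed scheme, see lead-in)

    def reset(self):
        # Shift the zeroing work later in time: invalidate every slot at once
        # by bumping the generation instead of touching the whole array.
        self.generation += 1

    def _refresh(self, i):
        # Zero a slot only when it is first touched after a reset.
        if self.stamp[i] != self.generation:
            self.counts[i] = 0
            self.stamp[i] = self.generation

    def increment(self, i, delta=1):
        self._refresh(i)
        self.counts[i] += delta

    def get(self, i):
        self._refresh(i)
        return self.counts[i]
```

The cost of clearing is paid only for the slots that are actually used after a reset, at the price of one extra stamp per counter.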


P3: Relax (Sub)System Requirements

• Key Concept: make one subsystem’s job easier by relaxing the specification (possibly at the expense of another subsystem’s job getting slightly harder)
  – P3a: Trade certainty for time (= use probabilistic solutions)
  – P3b: Trade accuracy for time (= use approximate solutions)
    • Remember: “Good enough” is good enough
  – P3c: Shift computation in space (= let someone else do it)
• Example: DF bit in IPv4
  – Purpose: to help End Systems avoid fragmentation
  – Why avoid?
    • ease load on routers, avoid loss of APDUs


P4: Leverage Off-System Components

• Key Concept: design bottom-up, use hardware
  – P4a: Exploit locality (= exploit caches)
    • Note: not always useful (cf. IP forwarding lookups)
  – P4b: Trade Memory for Speed
    • Note: this can be interpreted two ways
      – Use more memory to avoid computation (cf. P12)
      – Use less memory to make data structures fit in cache
  – P4c: Exploit hardware features
• Examples
  – Some NICs compute the TCP checksum on the board (i.e. the OS/software does not need to compute it)


P5: Add Hardware to Improve Performance

• Key Concept: Hardware is inherently parallel, can be very fast, and is cheap in volume; consider adding hardware to perform operations that must be done often and are expensive in software
  – P5a: Use memory interleaving, pipelining (= parallelism)
  – P5b: Use Wide-word parallelism (save memory accesses)
  – P5c: Combine SRAM, DRAM
• Examples:
  – DES encryption: requires permuting bits, nonlinear mappings (some say it was designed to require hardware for fast implementation)
    • Add-on boards for encryption/decryption at wire speeds
  – Ephemeral State Store


Ephemeral State Store (Associative Memory) Implementation

[Figure: Ephemeral State Store implementation. Labels: Hash Table (SRAM) holding handles; Handle Table (DRAM) with tag, expire, and chain fields; Value Table (DRAM) holding values; Last and Next pointers; field widths of k and 64 bits; the store holds (tag, value) bindings.]

A store of size 2^k bindings requires 128 + (h+1)k + z bits, where h = (hash table size / store size) and z = timestamp size.


Part II: Modularity With Efficiency

• Note: read Clark & Tennenhouse, “Architectural Considerations for a New Generation of Protocols”, Proceedings of ACM SIGCOMM 1990


P6: Replace inefficient general routines with efficient specialized ones

• Key Concept: General-purpose routines cannot leverage problem-specific knowledge that can improve performance
• Example:
  – in_pcblookup() function in BSD Unix: general-purpose state-block retrieval for both TCP and UDP sockets
    • Uses simple linear search
    • Not adequate for large servers with tens of thousands of open sockets
    • Applications have very different reference characteristics, so organize PCBs for different apps separately (Calvert & Dixon '95)
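The contrast between a general linear scan and a specialized lookup can be sketched as below. This is an illustrative assumption, not the BSD code and not the Calvert & Dixon organization: it swaps in a plain hash index on the exact 4-tuple, ignores the wildcard matching that a real in_pcblookup() must handle, and uses a hypothetical simplified `PCB` record.

```python
from collections import namedtuple

# Hypothetical, simplified protocol control block (PCB).
PCB = namedtuple("PCB", "local_addr local_port remote_addr remote_port state")


def four_tuple(pcb):
    """Key that identifies an established connection."""
    return (pcb.local_addr, pcb.local_port, pcb.remote_addr, pcb.remote_port)


def lookup_linear(pcb_list, key):
    """General-purpose approach: scan every PCB until the 4-tuple matches."""
    for pcb in pcb_list:
        if four_tuple(pcb) == key:
            return pcb
    return None


def build_index(pcb_list):
    """Specialized approach: index established connections by exact 4-tuple."""
    return {four_tuple(pcb): pcb for pcb in pcb_list}


def lookup_hashed(index, key):
    """Expected constant-time lookup, independent of the number of sockets."""
    return index.get(key)
```

With tens of thousands of sockets the linear version touches, on average, half the list per packet, while the indexed version does one hash probe.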


P7: Avoid Unnecessary Generality

• Key Concept: Do not include features or handle cases that are not needed.
  – "When in doubt, leave it out"
• Example:
  – mbuf structure was designed to not waste memory for devices that produced small amounts of input
    • Today memory is cheap, and most devices produce 1000+ bytes at once
    • Most implementations use large monolithic buffers, big enough to hold an Ethernet packet (2K bytes or more)


P8: Don't be tied to reference implementations

• Key Concept:
  – Implementations are sometimes given (e.g. by manufacturers) as a way to make the specification of an interface precise, or show how to use a device
  – These do not necessarily show the right way to think about the problem; they are chosen for conceptual clarity!
• Examples:
  – RSARef implementation of RSA cryptography
  – Thread-per-layer implementations vs. upcalls


P9: Pass hints across interfaces

• Key Concept: if the caller knows something the callee will have to compute, pass it (or something that makes it easier to compute) as an argument!
  – "hint" = something that makes the recipient's life easier, but may not be correct
  – "tip" = hint that is guaranteed to be correct
  – Caveat: callee must either trust caller, or verify (probably should do both)
• Example
  – Passing addresses of device-mapped pages in from user processes (Text Section 4.1)


P10: Pass hints in protocol headers

• Key Concept: If sender knows something receiver will have to compute, pass it in the header
• Example:
  – Identifying state blocks
    • TCP/IP method (4-tuple) is inefficient and slow
    • Better: have each end give the other a handle, to be included in each packet; handle can be index into array
      – Include a nonce or key to validate the information
      – Nonce stored in both header and state block
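The handle-plus-nonce idea might look like the sketch below. Everything here is an assumption made for illustration: the class `StateTable`, its method names, and the 32-bit nonce width do not come from any particular protocol.

```python
import secrets


class StateTable:
    """Receiver-side table of state blocks, indexed directly by a handle."""

    def __init__(self, size):
        self.blocks = [None] * size   # each entry: (nonce, state) or None

    def open(self, state):
        # Find a free slot (linear scan is fine here: this is off the per-packet path).
        handle = self.blocks.index(None)   # raises ValueError if the table is full
        nonce = secrets.randbits(32)       # assumed nonce width
        self.blocks[handle] = (nonce, state)
        return handle, nonce               # the peer echoes both in every packet

    def lookup(self, handle, nonce):
        # The handle is only a hint from the peer, so verify before trusting it.
        if not 0 <= handle < len(self.blocks):
            return None
        entry = self.blocks[handle]
        if entry is None or entry[0] != nonce:
            return None
        return entry[1]

    def close(self, handle):
        self.blocks[handle] = None
```

Per-packet demultiplexing becomes one array index and one comparison, and a stale or forged handle is rejected because its nonce does not match the one stored in the state block.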


Part III: Local Speedup Techniques


P11: Optimize the Expected Case

• Key Concept: If 80% of the cases can be handled similarly, optimize for those cases
• P11a: Use Caches
  – A form of using state to improve performance
• Example:
  – TCP input "header prediction"
    • If an incoming packet is in order and does what is expected, can process in small number of instructions (see code)
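A heavily simplified sketch of the fast-path idea follows. It is loosely inspired by BSD-style header prediction but is not the actual kernel code: the `Conn` and `Segment` types, the `slow_path` stub, and the exact set of checks are assumptions for illustration.

```python
from dataclasses import dataclass, field

# TCP flag bits (standard values).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


@dataclass
class Conn:
    rcv_nxt: int                      # next sequence number expected
    snd_wnd: int                      # peer's last advertised window
    received: bytearray = field(default_factory=bytearray)


@dataclass
class Segment:
    seq: int
    flags: int
    window: int
    payload: bytes


def slow_path(conn, seg):
    # Stub standing in for full TCP input processing (reordering, SYN/FIN/RST, ...).
    return "slow"


def tcp_input(conn, seg):
    # Expected case: plain in-order data segment, no special flags, window unchanged.
    if ((seg.flags & ~PSH) == ACK
            and seg.seq == conn.rcv_nxt
            and seg.window == conn.snd_wnd
            and seg.payload):
        conn.received += seg.payload
        conn.rcv_nxt = (conn.rcv_nxt + len(seg.payload)) % 2**32
        return "fast"
    return slow_path(conn, seg)
```

A handful of comparisons decides whether the common case applies; anything unusual falls through to the general routine, so correctness does not depend on the prediction being right.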


P12: Add or Exploit State to Gain Speed

• Key Concept: Remember things to make it easier to compute them later
• P12a: Compute incrementally
  – Here the idea is to "accumulate" as you go, rather than computing all-at-once at the end
• Example:
  – Incremental computation of Chi-Square statistic
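One way to accumulate a chi-square statistic as samples arrive is sketched below. It assumes a uniform expected distribution over k bins, which may differ from the setting used in the course, and the class name `IncrementalChiSquare` is hypothetical. With N samples and expected count E = N/k, the statistic sum((O_i - E)^2 / E) simplifies to (k/N) * sum(O_i^2) - N, so only the running sum of squares needs updating.

```python
class IncrementalChiSquare:
    """Chi-square against a uniform expectation, updated in O(1) per sample."""

    def __init__(self, num_bins):
        self.counts = [0] * num_bins
        self.n = 0        # total number of samples seen
        self.sum_sq = 0   # running sum of squared bin counts

    def add(self, bin_index):
        c = self.counts[bin_index]
        self.sum_sq += 2 * c + 1          # (c + 1)^2 - c^2
        self.counts[bin_index] = c + 1
        self.n += 1

    def statistic(self):
        if self.n == 0:
            return 0.0
        k = len(self.counts)
        return k * self.sum_sq / self.n - self.n


def chi_square_direct(counts):
    """All-at-once computation, for comparison."""
    n, k = sum(counts), len(counts)
    expected = n / k
    return sum((c - expected) ** 2 / expected for c in counts)
```

Reading the statistic is then a constant-time operation at any point in the stream, instead of a pass over all bins.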


P13: Optimize Degrees of Freedom

• Key Concept: Consider all the aspects of the problem that might be adapted to the conditions

• Example: IP trie lookups, where the "width" of a node varies across the tree


P14: Use special techniques for finite universes (e.g. small integers)

• Key Concept: when the domain of a function is small, techniques like bucket sorting, bitmaps, etc. become feasible.

• Example: Timing wheels
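A basic timing wheel can be sketched as follows, assuming timer delays are small integers bounded by the number of slots. The class `TimingWheel` and its methods are hypothetical names, and hierarchical or hashed variants are not covered.

```python
class TimingWheel:
    """Circular array of timer buckets; one bucket per clock tick."""

    def __init__(self, num_slots):
        self.slots = [[] for _ in range(num_slots)]
        self.now = 0   # current tick count

    def schedule(self, delay, callback):
        # Delays come from a small finite universe, so a timer's bucket
        # is found by direct indexing rather than by sorting or searching.
        if not 1 <= delay <= len(self.slots):
            raise ValueError("delay must be between 1 and the number of slots")
        self.slots[(self.now + delay) % len(self.slots)].append(callback)

    def tick(self):
        # Advance the clock by one tick and fire everything in the current bucket.
        self.now += 1
        index = self.now % len(self.slots)
        due, self.slots[index] = self.slots[index], []
        for callback in due:
            callback()
```

Starting a timer and processing a tick both take constant time plus the work of the timers that actually expire, in contrast to a sorted timer list.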


P15: Use algorithmic techniques to create efficient data structures

• Key Concept: once P1-P14 have been applied, think about how to build an ingenious data structure that exploits what you know
• Examples
  – IP forwarding lookups
    • PATRICIA trees (data structure) were first
    • Then many other more-efficient approaches
  – Packet classification
    • Given a set of patterns to match 5-tuples, and a 5-tuple, find {all|the first} pattern(s) that it matches
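As a baseline for the IP forwarding lookup example, longest-prefix match can be sketched with a plain one-bit-at-a-time trie. This is deliberately not a PATRICIA tree (there is no path compression) and not one of the faster later schemes; prefixes and addresses are given as bit strings for readability, and the class names are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]   # child for bit 0 and for bit 1
        self.next_hop = None           # set only if a prefix ends at this node


class PrefixTrie:
    """Unibit trie supporting longest-prefix-match lookups."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix_bits, next_hop):
        node = self.root
        for ch in prefix_bits:
            bit = int(ch)
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, addr_bits):
        # Walk down the trie, remembering the last prefix seen on the way.
        node = self.root
        best = node.next_hop
        for ch in addr_bits:
            node = node.children[int(ch)]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best
```

A lookup costs up to one memory access per address bit, which is the cost that compressed and multibit tries aim to reduce.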


Caveats

• These are implementation principles, not design principles
  – But when you go to design new protocols, it is very helpful to know them! (E.g. SCTP uses receiver-chosen handles to identify state blocks.)
• There are other (probably better) ways to carve up these ideas into groups of "Principles"
  – The value is in thinking about and applying them


Cautionary Examples

• Having web servers pre-serve embedded objects when an HTML object is requested
  – Varghese says they tried it, found it hurt performance!
  – Two proposed reasons:
    • Interaction with TCP slow-start
    • Client caching
• Multiple-string matching in Snort (IDS)
  – Modified Boyer-Moore matching algorithm to do "set matches"
  – Incorporated into Snort: little improvement
  – Why?
    • String matching was not the bottleneck!
    • A large data structure was required, which no longer fit in cache


Cautionary Examples, cont.

• Process-list searching in PDP-11 Unix
  – Many kernel operations involved a linear search through the list of processes
  – Idea: use doubly-linked list of processes to speed search, insertion, deletion
  – Result: would've taken about twice as long for typical table sizes; needed thousands of processes before it would pay off!


Cautionary Questions

• Q1: Is improvement really needed?
• Q2: Is this really the bottleneck?
• Q3: What impact will change have on rest of system?
• Q4: Does back-of-the-envelope (BoE) analysis indicate significant improvement?
• Q5: Is it worth adding custom hardware?
• Q6: Can protocol change be avoided?
• Q7: Do prototypes confirm the initial promise?
• Q8: Will performance gains be lost if environment changes?


CANEs Packet-Processing Model

[Figure: CANEs packet-processing model. Labels: incoming channels; Generic Forwarding Function with predefined “slots”; customizing code (e.g. active congestion control); outgoing channels.]


Experiment Configuration

[Figure: experiment topology. Labels: background traffic source; MPEG source (avg rate 725 kbps); active IP router; bottleneck link (2 Mbps).]