Fifteen Implementation Principles
description
Transcript of Fifteen Implementation Principles
Spring 2006 CS 685 Network Algorithmics 1
Fifteen Implementation Principles
CS 685 Network AlgorithmicsSpring 2006
Spring 2006 CS 685 Network Algorithmics 2
Taxonomy of Principles• P1-P5: System-oriented Principles
– These recognize/leverage the fact that a system is made up of components
– Basic idea: move the problem to somebody else’s subsystem
• P6-P10: Improve efficiency without destroying modularity– “Pushing the envelope” of module specifications– Basic engineering: system should satisfy spec but not do
more• P11-P15: Local optimization techniques
– Akin to “peephole optimizations” in compilers– Apply these after you have looked at the big picture
Spring 2006 CS 685 Network Algorithmics 3
Part I: Systems Principles
Spring 2006 CS 685 Network Algorithmics 4
P1: Avoid Obvious Waste• Key Concept: look for ways to avoid doing
something, or to avoid doing it multiple times• Example: copying in protocol stacks
– In the old days, copying was not a bottleneck– But transmission & processor speeds increased much
faster than memory speeds• Today, reading/writing from/to memory is slowest
operation on a packet• “Zero-copy” protocol stacks never copy packet data
once it is stored in memory by the NIC– Eventually multiple copies became a bottleneck
Spring 2006 CS 685 Network Algorithmics 5
P2: Shift Computation in Time• Key Concept: Move expensive operations from
where time is scarce to where it is more plentiful– P2a: Precompute (= shift earlier in time)
• Example: computing a function by table lookup– P2b: Evaluate Lazily (= shift later in time)
• Example: copy-on-write– P2c: Share Expenses (= collect things in time)
• Example: garbage collection• Examples:
– initializing counter arrays: lazy evaluation + batching
Spring 2006 CS 685 Network Algorithmics 6
P3: Relax (Sub)System Requirements
• Key Concept: make one subsystem’s job easier by relaxing the specification (possibly at the expense of another subsystem’s job getting slightly harder)– P3a: Trade certainty for time (= use probabilistic solutions)– P3b: Trade accuracy for time (= use approximate
solutions)• Remember: “Good enough” is good enough
– P3c: Shift computation in space (= let someone else do it)• Example: DF bit in IPv4
– Purpose: to help End Systems avoid fragmentation– Why avoid?
• ease load on routers, avoid loss of APDUs
Spring 2006 CS 685 Network Algorithmics 7
P4: Leverage Off-System Components
• Key Concept: design bottom-up, use hardware– P4a: Exploit locality (= exploit caches)
• Note: not always useful (cf. IP forwarding lookups)– P4b: Trade Memory for Speed
• Note: this can be interpreted two ways– Use more memory to avoid computation (cf P12)– Use less memory to make data structures fit in cache
– P4c: Exploit hardware features• Examples
– Some NIC cards compute TCP checksum on the board (i.e. the OS/software does not need to compute it)
Spring 2006 CS 685 Network Algorithmics 8
P5: Add Hardware to Improve Performance
• Key Concept: Hardware is inherently parallel, can be very fast, and is cheap in volume; consider adding hardware to perform operations that must be done often and are expensive in software– P5a: Use memory interleaving, pipelining (= parallelism)– P5b: Use Wide-word parallelism (save memory accesses)– P5c: Combine SRAM, DRAM
• Examples:– DES encryption — requires permuting bits, nonlinear
mappings (some say it was designed to require hardware for fast implementation)
• Add-on boards for encryption/decryption at wire speeds– Ephemeral State Store
Spring 2006 CS 685 Network Algorithmics 9
Ephemeral State Store(Associative Memory)
Implementation
Hash Table (SRAM)
tag expire chain
Last
Next
tag expire chain
Handle
Handle Table (DRAM) Value Table (DRAM)
k
k64k 64
A store of size 2k bindings requires 128 + (h+1)k + z bitswhere h = (hash table size / store size)
and z = timestamp size
value
(tag, value)
Spring 2006 CS 685 Network Algorithmics 10
Part II: Modularity With Efficiency
• Note: read Clark & Tennenhouse, “Architectural Considerations for a New Generation of Protocols”, Proceedings of ACM SIGCOMM 1990
Spring 2006 CS 685 Network Algorithmics 11
P6: Replace inefficient general routines with efficient specialized
ones• Key Concept: General-purpose routines cannot
leverage problem-specific knowledge that can improve performance
• Example: – in_pcblookup() function in BSD Unix: general-
purpose state-block retrieval for both TCP and UDP sockets
• Uses simple linear search• Not adequate for large servers with 10000’s of open
sockets• Applications have very different reference characteristics, so
organize PCB’s for different apps separately (Calvert & Dixon '95)
Spring 2006 CS 685 Network Algorithmics 12
P7: Avoid Unnecessary Generality
• Key Concept: Do not include features or handle cases that are not needed.– "When in doubt, leave it out"
• Example:– mbuf structure was designed to not waste memory
for devices that produced small amounts of input• Today memory is cheap, and most devices produce
1000+ bytes at once• Most implementations use large monolithic buffers, big
enough to hold an Ethernet packet (2K bytes or more)
Spring 2006 CS 685 Network Algorithmics 13
P8: Don't be tied to reference implementations
• Key Concept:– Implementations are sometimes given (e.g. by
manufacturers) as a way to make the specification of an interface precise, or show how to use a device
– These do not necessarily show the right way to think about the problem—they are chosen for conceptual clarity!
• Examples:– RSARef implementation of RSA cryptography– Thread-per-layer implementations vs. upcalls
Spring 2006 CS 685 Network Algorithmics 14
P9: Pass hints across interfaces• Key Concept: if the caller knows something the
callee will have to compute, pass it (or something that makes it easier to compute) as an argument!– "hint" = something that makes the recipient's life
easier, but may not be correct– "tip" = hint that is guaranteed to be correct– Caveat: callee must either trust caller, or verify
(probably should do both)• Example
– Passing addresses of device-mapped pages in from user processes (Text Section 4.1)
Spring 2006 CS 685 Network Algorithmics 15
P10: Pass hints in protocol headers
• Key Concept: If sender knows something receiver will have to compute, pass it in the header
• Example:– Identifying state blocks
• TCP/IP method (4-tuple) is inefficient and slow• Better: have each end give the other a handle, to be
included in each packet; handle can be index into array– Include a nonce or key to validate the information– Nonce stored in both header and state block
Spring 2006 CS 685 Network Algorithmics 16
Part III: Local Speedup Techniques
Spring 2006 CS 685 Network Algorithmics 17
P11: Optimize the Expected Case
• Key Concept: If 80% of the cases can be handled similarly, optimize for those cases
• P11a: Use Caches– A form of using state to improve performance
• Example:– TCP input "header prediction"
• If an incoming packet is in order and does what is expected, can process in small number of instructions (see code)
Spring 2006 CS 685 Network Algorithmics 18
P12: Add or Exploit State to Gain Speed
• Key Concept: Remember things to make it easier to compute them later
• P12a: Compute incrementally– Here the idea is to "accumulate" as you go, rather
than computing all-at-once at the end• Example:
– Incremental computation of Chi-Square statistic
Spring 2006 CS 685 Network Algorithmics 19
P13: Optimize Degrees of Freedom
• Key Concept: Consider all the aspects of the problem that might be adapted to the conditions
• Example: IP trie lookups, where "width" of tree varies in the tree
Spring 2006 CS 685 Network Algorithmics 20
P14: Use special techniques for finite universes (e.g. small integers)
• Key Concept: when the domain of a function is small, techniques like bucket sorting, bitmaps, etc. become feasible.
• Example: Timing wheels
Spring 2006 CS 685 Network Algorithmics 21
P15: Use algorithmic techniques to create efficient data structures
• Key Concept: once P1-P14 have been applied, think about how to build an ingenious data structure that exploits what you know
• Examples– IP forwarding lookups
• PATRICIA trees (data structure) were first• Then many other more-efficient approaches
– Packet classification• Given a set of patterns to match 5-tuples, and a 5-
tuple, find {all|the first} pattern(s) that it matches
Spring 2006 CS 685 Network Algorithmics 22
Caveats• These are implementation principles, not
design principles– But when you go to design new protocols, it is very
helpful to know them! (E.g. SCTP uses receiver-chosen handles to identify state blocks.)
• There are other (probably better) ways to carve up these ideas into groups of "Principles"– The value is in thinking about and applying them
Spring 2006 CS 685 Network Algorithmics 23
Cautionary Examples• Having web servers pre-serve embedded objects when an
HTML object is requested– Varghese says they tried it, found it hurt performance!– Two proposed reasons:
• Interaction with TCP slow-start• Client caching
• Multiple-string matching in Snort (IDS)– Modified Boyer-Moore matching algorithm to do "set
matches"– Incorporated into Snort – little improvement– Why?
• String matching was not the bottleneck!• A large data structure was required, which no longer fit in cache
Spring 2006 CS 685 Network Algorithmics 24
Cautionary Examples, cont.• Process-list searching in PDP-11 Unix
– Many kernel operations involved a linear search through the list of processes
– Idea: use doubly-linked list of processes to speed search, insertion, deletion
– Result: would've taken about twice as long for typical table sizes; needed 1000's of processes before it would pay off!
Spring 2006 CS 685 Network Algorithmics 25
Cautionary Questions• Q1: Is improvement really needed?• Q2: Is this really the bottleneck?• Q3: What impact will change have on rest of
system?• Q4: Does BoE-analysis indicate significant
improvement?• Q5: Is it worth adding custom hardware?• Q6: Can protocol change be avoided?• Q7: Do prototypes confirm the initial promise?• Q8: Will performance gains be lost if
environment changes?
Spring 2006 CS 685 Network Algorithmics 26
CANEs Packet-Processing Model
incoming channels
customizing code
outgoing channels
predefined “slots”
GenericForwardingFunction
(e.g. active congestion control)
Spring 2006 CS 685 Network Algorithmics 27
Experiment Configuration
Background traffic source
MPEG source(avg rate 725 kbps)
Bottleneck link(2 Mbps)
Active IP router