Towards a Billion Routing Lookups per Second in Software


Towards a Billion Routing Lookups per Second in Software

Authors: Marko Zec, Luigi Rizzo, Miljenko Mikuc

Publisher: SIGCOMM Computer Communication Review, 2012

Presenter: Yuen-Shuo Li

Date: 2012/12/12

Outline

Idea
Introduction
Search Data Structure
Lookup Algorithm
Updating
Performance

Idea

Can a software routing implementation compete in a field generally reserved for specialized lookup hardware?

Introduction(1/3)

Software routing lookups have fallen out of the research community's focus over the past decade, as the performance of general-purpose CPUs has not kept pace with quickly increasing packet rates.

Software vs Hardware?

Introduction(2/3)

The landscape has changed, however.

1. The performance potential of general-purpose CPUs has increased significantly over the past decade.

• instruction-level parallelism, shorter execution pipelines, etc.

2. The increasing interest in virtualized systems and software defined networks calls for a communication infrastructure that is more flexible and less tied to the hardware.

Introduction(3/3)

DXR is an efficient IPv4 lookup scheme for software running on general-purpose CPUs.

The primary goal was not to implement a universal routing lookup scheme: the data structures and algorithms are optimized exclusively for the IPv4 protocol.

DXR distills a real-world BGP snapshot with 417,000 prefixes into a structure consuming only 782 KBytes, less than 2 bytes per prefix, and achieves 490 MLps.

MLps: million lookups per second

Search Data Structure(1/11)

Building the search data structure begins by expanding all prefixes from the database into address ranges.
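For concreteness, a prefix such as 1.2.4.0/22 expands to the range [1.2.4.0, 1.2.7.255]. The helpers below are a minimal sketch of this expansion (illustrative names, not the authors' code):

```c
/* Minimal sketch (not the authors' code): expand an IPv4 prefix given as
 * (address, prefix length) into the [start, end] address range it covers. */
#include <stdint.h>

uint32_t prefix_mask(int plen)
{
    return plen ? 0xFFFFFFFFu << (32 - plen) : 0;  /* avoid the undefined 32-bit shift */
}

uint32_t range_start(uint32_t addr, int plen)
{
    return addr & prefix_mask(plen);               /* e.g. 1.2.4.0 for 1.2.4.0/22 */
}

uint32_t range_end(uint32_t addr, int plen)
{
    return addr | ~prefix_mask(plen);              /* e.g. 1.2.7.255 for 1.2.4.0/22 */
}
```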

Search Data Structure(2/11)

Neighboring address ranges that resolve to the same next hop are then merged.

Search Data Structure(3/11)

Each entry only needs the start address and the next hop.

The search data we need!
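A minimal sketch of the resulting entries and of the merge pass (struct layout and helper names are assumptions for illustration, not the paper's implementation):

```c
/* Sketch of the merged search data: one (start address, next hop) entry per
 * range. Struct layout and the merge helper are illustrative assumptions. */
#include <stddef.h>
#include <stdint.h>

struct range_entry {
    uint32_t start;      /* first address covered by the range   */
    uint16_t nexthop;    /* index into a separate next-hop table */
};

/* Merge neighboring ranges that resolve to the same next hop.
 * `r` must be sorted by start address; returns the new entry count. */
size_t merge_ranges(struct range_entry *r, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++)
        if (out == 0 || r[out - 1].nexthop != r[i].nexthop)
            r[out++] = r[i];
    return out;
}
```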

Search Data Structure(4/11)

Search Data Structure(5/11)

Lookup table

The search is shortened by splitting the entire address range into chunks, and using the initial k bits of the address to reach the corresponding chunk through a lookup table.

With k = 16 there are 2^16 chunks.

Search Data Structure(6/11)

Each entry in the lookup table must contain the position and size of the corresponding chunk in the range table.

A special value for size indicates that the 19 position bits are an offset into the next-hop table.

Lookup table entry (32 bits): Format: 1 bit | Size: 12 bits | Position: 19 bits

With k = 16 there are 2^16 chunks.
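One way to express this 32-bit entry is a C bitfield; field names and bit ordering are illustrative, not taken from the authors' code:

```c
/* The 32-bit lookup table entry sketched as a C bitfield (illustrative). */
#include <stdint.h>

struct lookup_entry {
    uint32_t format   : 1;   /* long vs. short chunk format                 */
    uint32_t size     : 12;  /* number of range entries in the chunk; a     */
                             /* special value means `position` is an offset */
                             /* into the next-hop table                     */
    uint32_t position : 19;  /* chunk start within the range table          */
};
```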

Search Data Structure(7/11)

Search Data Structure(8/11)

Range table

With k bits already consumed to index the lookup table, the range table entries only need to store the remaining 32-k address bits, and the next hop index.

Search Data Structure(9/11)

Range table

A further optimization is especially effective for large BGP views, where the vast majority of prefixes are /24 or less specific: range table entries can be stored in either a long or a short format.

Search Data Structure(10/11)

Range table

One bit in the lookup table entry is used to indicate whether a chunk is stored in long or short format.

Long format (32 bits): Lookup index: 16 bits | Next hop: 16 bits

Short format (16 bits): Lookup index: 8 bits | Next hop: 8 bits
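The two entry formats could be sketched as C structs like the following (names are illustrative; with k = 16 the long format keeps all 16 remaining address bits, while the short format keeps only 8 of them, which is presumably why it pays off when prefixes are /24 or less specific):

```c
/* The two range table entry formats from the slide, as C structs (illustrative). */
#include <stdint.h>

struct range_long {          /* 32 bits per entry */
    uint16_t lookup_index;   /* remaining 16 address bits (k = 16) */
    uint16_t nexthop;
};

struct range_short {         /* 16 bits per entry */
    uint8_t lookup_index;    /* next 8 address bits */
    uint8_t nexthop;
};
```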

Search Data Structure(11/11)

Lookup Algorithm(1/4)

The k leftmost bits of the key are used as an index in the lookup table.

Example: 1.2.4.0 = 00000001.00000010.00000100.00000000; the first k = 16 bits form the index.
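As a small worked example (a standalone program, not part of the paper's code), 1.2.4.0 is 0x01020400 in host byte order, so with k = 16 the lookup index is 0x0102 (258):

```c
/* Worked example: extract the k = 16 leftmost bits of 1.2.4.0 as the index. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t key = 0x01020400u;                          /* 1.2.4.0               */
    printf("index = 0x%04x\n", (unsigned)(key >> 16));   /* prints index = 0x0102 */
    return 0;
}
```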

Lookup Algorithm(2/4)

If the entry points to the next hop, the lookup is complete.

A special value for size indicates that the 19 bits are an offset into the next-hop table.

Lookup Algorithm(3/4)

Otherwise, the (position, size) information in the entry selects the portion of the range table on which to perform a binary search of the (remaining bits of the) key.
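Putting the pieces together, the sketch below shows one possible shape of the lookup path for k = 16 with long-format chunks only; the special "direct" size value and all names are assumptions, not the authors' implementation:

```c
/* Sketch of the full lookup path, for k = 16 and long-format chunks only.
 * Types match the earlier sketches; SIZE_DIRECT and all names are assumed. */
#include <stdint.h>

#define DXR_K       16
#define SIZE_DIRECT 0            /* assumed marker: next hop stored directly */

struct lookup_entry {
    uint32_t format   : 1;
    uint32_t size     : 12;
    uint32_t position : 19;      /* chunk start, or next-hop offset if direct */
};

struct range_long {
    uint16_t lookup_index;       /* remaining 16 address bits */
    uint16_t nexthop;
};

uint16_t dxr_lookup(const struct lookup_entry *lt,
                    const struct range_long *rt,
                    uint32_t dst)                 /* host byte order */
{
    struct lookup_entry e = lt[dst >> (32 - DXR_K)];
    uint16_t rest = (uint16_t)dst;                /* low 32 - k bits */

    if (e.size == SIZE_DIRECT)                    /* lookup table resolves it */
        return (uint16_t)e.position;

    /* Binary search the chunk for the last entry with lookup_index <= rest. */
    uint32_t lo = e.position;
    uint32_t hi = e.position + e.size - 1;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo + 1) / 2;
        if (rt[mid].lookup_index <= rest)
            lo = mid;
        else
            hi = mid - 1;
    }
    return rt[lo].nexthop;
}
```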

Lookup Algorithm(4/4)

Updating(1/3)

Lookup structures store only the information necessary for resolving next hop lookups, so a separate database which stores detailed information on all the prefixes is required for rebuilding the lookup structures.

Updating(2/3)

Updates are handled as follows:

Updates covering multiple chunks (prefix length < k) are expanded.

Each affected chunk is then processed independently and rebuilt from scratch.

A rebuild finds the best matching route for the first IPv4 address belonging to the chunk and translates it into a range table entry.

If the range table heap contains only a single element when the process ends, the next hop can be directly encoded in the lookup table.

Updating(3/3)

The rebuild time can be reduced by:

• coalescing multiple updates into a single rebuild, implemented by delaying the reconstruction for several milliseconds after the first update is received

• processing chunks in parallel, if multiple cores are available
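A minimal, self-contained sketch of the coalescing idea (all names, the 50 ms hold-off, and the dirty-chunk bookkeeping are assumptions for illustration):

```c
/* Sketch of update coalescing: updates only mark chunks dirty, and a single
 * rebuild pass runs a few milliseconds after the first pending update. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NCHUNKS    (1u << 16)
#define HOLDOFF_MS 50                       /* "several milliseconds" (assumed) */

static bool     chunk_dirty[NCHUNKS];
static uint64_t rebuild_deadline;           /* 0 = no rebuild pending */

static uint64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static void rebuild_chunk(uint32_t c)       /* stand-in for the real rebuild */
{
    printf("rebuilding chunk 0x%04x\n", (unsigned)c);
}

/* Called for every received route update; only marks the affected chunks. */
static void on_route_update(uint32_t first_chunk, uint32_t last_chunk)
{
    for (uint32_t c = first_chunk; c <= last_chunk; c++)
        chunk_dirty[c] = true;
    if (rebuild_deadline == 0)
        rebuild_deadline = now_ms() + HOLDOFF_MS;
}

/* Called periodically; rebuilds all dirty chunks once the hold-off expires.
 * With multiple cores available, this loop could process chunks in parallel. */
static void maybe_rebuild(void)
{
    if (rebuild_deadline == 0 || now_ms() < rebuild_deadline)
        return;
    for (uint32_t c = 0; c < NCHUNKS; c++)
        if (chunk_dirty[c]) {
            rebuild_chunk(c);
            chunk_dirty[c] = false;
        }
    rebuild_deadline = 0;
}

int main(void)
{
    on_route_update(0x0102, 0x0102);        /* one update touching chunk 0x0102 */
    on_route_update(0x0103, 0x0103);        /* coalesced into the same rebuild  */
    while (rebuild_deadline != 0)
        maybe_rebuild();
    return 0;
}
```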

Performance(1/11)

Our primary test vectors were three IPv4 snapshots from routeviews.org BGP routers.

Performance(2/11)

Here we present the results obtained using the LINX snapshot, selected because it is the most challenging for our scheme:

it contains the highest number of next hops

and an unusually high number of prefixes more specific than /24

Performance(3/11)

Our primary test machine had a 3.6 GHz AMD FX-8150 8-core CPU and 4 GB of RAM. Each pair of cores shares a private 2 MB L2 cache block, for a total of 8 MB of L2 cache. All 8 cores share 8 MB of L3 cache.

We also ran some benchmarks on a 2.8 GHz Intel Xeon W3530 4-core CPU with smaller but faster L2 caches (256 KB per core) and 8 MB of shared L3 cache.

• All tests ran on the same platform/compiler (FreeBSD 8.3, amd64, gcc 4.2.2)

• Each test ran for 10 seconds

Performance(4/11)

We used three different request patterns:

REP (repeated): each key is looked up 16 times before moving to the next one.

RND (random): each key is looked up only once. The test loop has no data dependencies between iterations.

SEQ (sequential): same as RND, but the timing loop has an artificial data dependency between iterations so that requests become effectively serialized.

Performance(5/11)

Performance(6/11)

Average lookup throughput with D18R (k = 18) is consistently higher than with D16R (k = 16) on our test systems.

Performance(7/11)

The distribution of binary search steps for uniformly random queries observed using the LINX database.

n = 0 means a direct match in the lookup table.

Performance(8/11)

Interference between multiple cores causes a modest slowdown of the individual threads.

Performance(9/11)

DIR-24-8-BASIC

Performance(10/11)

Performance(11/11)