Towards a Billion Routing Lookups per Second in Software

Transcript of Towards a Billion Routing Lookups per Second in Software

Page 1: Towards a Billion Routing Lookups per Second in Software


Towards a Billion Routing Lookups per Second in Software

Authors: Marko Zec, Luigi Rizzo, Miljenko Mikuc

Publisher: SIGCOMM Computer Communication Review, 2012

Presenter: Yuen-Shuo Li

Date: 2012/12/12

Page 2: Towards a Billion Routing Lookups per Second in Software


Outline

Idea
Introduction
Search Data Structure
Lookup Algorithm
Updating
Performance

Page 3: Towards a Billion Routing Lookups per Second in Software


Idea

Can a software routing implementation compete in a field generally reserved for specialized lookup hardware?

Page 4: Towards a Billion Routing Lookups per Second in Software


Introduction(1/3)

Software routing lookups have fallen out of the research community's focus over the past decade, as the performance of general-purpose CPUs could not keep pace with quickly increasing packet rates.

Software vs Hardware?

Page 5: Towards a Billion Routing Lookups per Second in Software


Introduction(2/3)

The landscape has changed, however.

1. The performance potential of general-purpose CPUs has increased significantly over the past decade.

• instruction-level parallelism, shorter execution pipelines, etc.

2. The increasing interest in virtualized systems and software defined networks calls for a communication infrastructure that is more flexible and less tied to the hardware.

Page 6: Towards a Billion Routing Lookups per Second in Software


Introduction(3/3)

DXR is an efficient IPv4 lookup scheme for software running on general-purpose CPUs.

The primary goal was not to implement a universal routing lookup scheme: the data structures and algorithms are optimized exclusively for the IPv4 protocol.

DXR distills a real-world BGP snapshot with 417,000 prefixes into a structure consuming only 782 KBytes, less than 2 bytes per prefix, and achieves 490 MLps.

MLps: million lookups per second

Page 7: Towards a Billion Routing Lookups per Second in Software


Search Data Structure(1/11)

Building the search data structure begins by expanding all prefixes from the database into address ranges.
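As a minimal sketch of this step in C (names are illustrative, not taken from the paper), a prefix given as an (address, length) pair expands into the inclusive address range it covers:

    #include <stdint.h>

    /* Expand an IPv4 prefix (addr, plen) into the inclusive
     * address range [*start, *end] that it covers. */
    static void
    prefix_to_range(uint32_t addr, int plen, uint32_t *start, uint32_t *end)
    {
        uint32_t mask = (plen == 0) ? 0 : 0xffffffffU << (32 - plen);

        *start = addr & mask;   /* first address covered by the prefix */
        *end = *start | ~mask;  /* last address covered by the prefix */
    }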

Page 8: Towards a Billion Routing Lookups per Second in Software


Search Data Structure(2/11)

Neighboring address ranges that resolve to the same next hop are then merged.
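A sketch of that merge pass, assuming the ranges are already sorted by start address (the struct and field names are assumptions, not the paper's):

    struct range {
        uint32_t start;     /* first address of the range */
        uint32_t nexthop;   /* next-hop index */
    };

    /* Collapse neighboring ranges that resolve to the same
     * next hop; returns the number of entries that remain. */
    static int
    merge_ranges(struct range *r, int n)
    {
        int i, j = 0;

        for (i = 1; i < n; i++)
            if (r[i].nexthop != r[j].nexthop)
                r[++j] = r[i];
        return (n ? j + 1 : 0);
    }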

Page 9: Towards a Billion Routing Lookups per Second in Software


Search Data Structure(3/11)

Each entry only needs the start address and the next hop.

This is the search data we need!

Page 10: Towards a Billion Routing Lookups per Second in Software


Search Data Structure(4/11)

Page 11: Towards a Billion Routing Lookups per Second in Software


Search Data Structure(5/11)

Lookup table

The search is shortened by splitting the entire address range into chunks, using the initial k bits of the address to reach the right chunk through a lookup table.

With k = 16, there are 2^16 chunks.

Page 12: Towards a Billion Routing Lookups per Second in Software


Search Data Structure(6/11)

Each entry in the lookup table must contain the position and size of the corresponding chunk in the range table.

A special value for size indicates that the 19 bits are an offset into the next-hop table.

Lookup table entry (32 bits):

    format   size      position
    1 bit    12 bits   19 bits

With k = 16, there are 2^16 lookup table entries, one per chunk.
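A sketch of how such a 32-bit entry could be unpacked in C. Only the 1 + 12 + 19 bit split comes from the slide; the bit order and the sentinel value are assumptions:

    #include <stdint.h>

    /* 32-bit lookup table entry: format flag (1 bit),
     * chunk size (12 bits), position (19 bits). */
    #define LUT_FORMAT(e)   ((e) >> 31)             /* long or short chunk format */
    #define LUT_SIZE(e)     (((e) >> 19) & 0xfff)   /* range entries in the chunk */
    #define LUT_POS(e)      ((e) & 0x7ffff)         /* offset into the range table */

    /* Assumed sentinel: when SIZE holds this value, POS is an
     * offset into the next-hop table and the lookup is complete. */
    #define LUT_DIRECT      0xfff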

Page 13: Towards a Billion Routing Lookups per Second in Software


Search Data Structure(7/11)

Page 14: Towards a Billion Routing Lookups per Second in Software


Search Data Structure(8/11)

Range table

With k bits already consumed to index the lookup table, the range table entries only need to store the remaining 32-k address bits, and the next hop index.

Page 15: Towards a Billion Routing Lookups per Second in Software


Search Data Structure(9/11)

Range table

A further optimization, especially effective for large BGP views where the vast majority of prefixes are /24 or less specific, is to store such chunks in a short format instead of the long one.

Page 16: Towards a Billion Routing Lookups per Second in Software


Search Data Structure(10/11)

Range table

One bit in the lookup table entry is used to indicate whether a chunk is stored in long or short format.

Long format (32 bits per entry):

    lookup index   next hop
    16 bits        16 bits

Short format (16 bits per entry):

    lookup index   next hop
    8 bits         8 bits
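In C, the two entry layouts might be sketched as follows (field names are illustrative):

    /* Long format: 32 bits per entry, storing all 32 - k = 16
     * remaining address bits (for k = 16). */
    struct range_long {
        uint16_t start;     /* remaining 16 address bits */
        uint16_t nexthop;   /* next-hop index */
    };

    /* Short format: 16 bits per entry; sufficient when the low 8
     * of the remaining bits are zero, i.e. for prefixes /24 or
     * less specific. */
    struct range_short {
        uint8_t start;      /* upper 8 of the remaining 16 bits */
        uint8_t nexthop;    /* next-hop index */
    };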

Page 17: Towards a Billion Routing Lookups per Second in Software

Search Data Structure(11/11)

Page 18: Towards a Billion Routing Lookups per Second in Software

Lookup Algorithm(1/4)

The k leftmost bits of the key are used as an index into the lookup table.

Example: key 1.2.4.0 = 00000001.00000010.00000100.00000000; with k = 16, the first 16 bits (00000001.00000010) form the index.
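In code, extracting the index is a single shift (a sketch, with k = 16):

    uint32_t idx = key >> (32 - 16);    /* 1.2.4.0 -> index 0x0102 */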

Page 19: Towards a Billion Routing Lookups per Second in Software

Lookup Algorithm(2/4)

If the entry points to the next hop, the lookup is complete.

A special value for size indicates that the 19 bits are an offset into the next-hop table.

Page 20: Towards a Billion Routing Lookups per Second in Software

Lookup Algorithm(3/4)

Otherwise, the (position, size) information in the entry selects the portion of the range table on which to perform a binary search on the (remaining bits of the) key.
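Combining the steps, a minimal sketch of the whole lookup in C, assuming k = 16, the entry macros sketched earlier, long-format chunks only, and illustrative global tables (lookup_tbl, range_tbl are not the paper's names):

    /* Assumed globals:
     *   uint32_t lookup_tbl[1 << 16];
     *   struct range_long range_tbl[];  (built as described above) */

    /* Longest-prefix match for one IPv4 key; returns a next-hop index. */
    static uint32_t
    dxr_lookup(uint32_t key)
    {
        uint32_t e = lookup_tbl[key >> 16];     /* initial k = 16 bits */
        struct range_long *chunk;
        uint16_t rest;
        int lo, hi;

        if (LUT_SIZE(e) == LUT_DIRECT)          /* direct hit: done */
            return (LUT_POS(e));

        /* Binary search for the last range whose start address is
         * at or below the remaining 16 bits of the key. */
        chunk = &range_tbl[LUT_POS(e)];
        rest = key & 0xffff;
        lo = 0;
        hi = (int)LUT_SIZE(e) - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) / 2;

            if (chunk[mid].start <= rest)
                lo = mid;
            else
                hi = mid - 1;
        }
        return (chunk[lo].nexthop);
    }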

Page 21: Towards a Billion Routing Lookups per Second in Software

Lookup Algorithm(4/4)

Page 22: Towards a Billion Routing Lookups per Second in Software


Updating(1/3)

Lookup structures store only the information necessary for resolving next-hop lookups, so a separate database, which stores detailed information on all the prefixes, is required for rebuilding the lookup structures.

Page 23: Towards a Billion Routing Lookups per Second in Software


Updating(2/3)

Updates are handled as follows:

Updates covering multiple chunks (prefix length < k) are expanded.

Each chunk is then processed independently and rebuilt from scratch.

The rebuild finds the best matching route for the first IPv4 address belonging to the chunk and translates it into a range table entry.

If the range table heap contains only a single element when the process ends, the next hop can be directly encoded in the lookup table.
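A sketch of the per-update entry point under these rules. K and rebuild_chunk are hypothetical names, and addr is assumed to be aligned to its prefix:

    #define K 16                        /* lookup-table stride */

    static void rebuild_chunk(uint32_t chunk);  /* hypothetical */

    /* Rebuild every chunk touched by an update of prefix (addr, plen).
     * Prefixes shorter than K bits span 2^(K - plen) chunks. */
    static void
    route_change(uint32_t addr, int plen)
    {
        uint32_t first = addr >> (32 - K);
        uint32_t last = (plen < K) ?
            (first | ((1U << (K - plen)) - 1)) : first;
        uint32_t c;

        for (c = first; c <= last; c++)
            rebuild_chunk(c);
    }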

Page 24: Towards a Billion Routing Lookups per Second in Software


Updating(3/3)

The rebuild time can be reduced:

by coalescing multiple updates into a single rebuild; we have implemented this by delaying the reconstruction for several milliseconds after the first update is received

by processing chunks in parallel, if multiple cores are available
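A minimal sketch of the coalescing idea; the timer API and the delay constant are assumptions (the paper only says "several milliseconds"):

    #define REBUILD_DELAY_MS 50         /* assumed; "several milliseconds" */

    void timer_arm(int ms, void (*cb)(void));   /* hypothetical timer API */
    static void rebuild_all_dirty(void);

    static int updates_pending;

    /* Called for every received route update. */
    void
    on_route_update(void)
    {
        if (updates_pending++ == 0)     /* first update of a burst */
            timer_arm(REBUILD_DELAY_MS, rebuild_all_dirty);
    }

    /* Timer callback: one rebuild covers the whole burst; dirty
     * chunks could also be handed to multiple cores in parallel. */
    static void
    rebuild_all_dirty(void)
    {
        updates_pending = 0;
        /* ... rebuild each dirty chunk ... */
    }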

Page 25: Towards a Billion Routing Lookups per Second in Software


Performance(1/11)

Our primary test vectors were three IPv4 snapshots from routeviews.org BGP routers.

Page 26: Towards a Billion Routing Lookups per Second in Software


Performance(2/11)

Here we present the results obtained using the LINX snapshot, selected because it is the most challenging for our scheme:

it contains the highest number of next hops

and an unusually high number of prefixes more specific than /24

Page 27: Towards a Billion Routing Lookups per Second in Software


Performance(3/11)

Our primary test machine had a 3.6 GHz AMD FX-8150 8-core CPU and 4 GB of RAM. Each pair of cores shares a private 2 MB L2 cache block, for a total of 8 MB of L2 cache. All 8 cores share 8 MB of L3 cache.

We also ran some benchmarks on a 2.8 GHz Intel Xeon W3530 4-core CPU with smaller but faster L2 caches (256 KB per core) and a shared 8 MB L3 cache.

• All tests ran on the same platform/compiler (FreeBSD 8.3, amd64, gcc 4.2.2)

• Each test ran for 10 seconds

Page 28: Towards a Billion Routing Lookups per Second in Software


Performance(4/11)

We used three different request patterns:

REP (repeated): each key is looked up 16 times before moving to the next one.

RND (random): each key is looked up only once. The test loop has no data dependencies between iterations.

SEQ (sequential): same as RND, but the timing loop has an artificial data dependency between iterations, so that requests become effectively serialized.
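One way such a dependency can be created is to fold a bit of the previous result into the next key; the exact construction used in the paper may differ, and dxr_lookup is the illustrative entry point sketched earlier:

    /* SEQ timing loop: each key depends on the previous result,
     * so lookups cannot overlap and latency is fully exposed.
     * Assumed: uint32_t keys[nkeys] filled with random addresses. */
    uint32_t nh = 0;
    int i;

    for (i = 0; i < nkeys; i++) {
        uint32_t key = keys[i] ^ (nh & 1);  /* artificial dependency */

        nh = dxr_lookup(key);
    }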

Page 29: Towards a Billion Routing Lookups per Second in Software

Performance(5/11)


Page 30: Towards a Billion Routing Lookups per Second in Software


Performance(6/11)

Average lookup throughput with D18R is consistently higher than with D16R on our test systems (D16R and D18R denote DXR variants with k = 16 and k = 18, respectively).

Page 31: Towards a Billion Routing Lookups per Second in Software


Performance(7/11)

The distribution of binary search steps for uniformly random queries, observed using the LINX database: n = 0 means a direct match in the lookup table.

Page 32: Towards a Billion Routing Lookups per Second in Software


Performance(8/11)

Interference between multiple cores causes a modest slowdown of the individual threads.

Page 33: Towards a Billion Routing Lookups per Second in Software


Performance(9/11)

Comparison with DIR-24-8-BASIC.

Page 34: Towards a Billion Routing Lookups per Second in Software


Performance(10/11)

Page 35: Towards a Billion Routing Lookups per Second in Software


Performance(11/11)