
Computer Networks 126 (2017) 162–173


ICE: A memory-efficient BGP route collecting engine

Enrico Gregori, Barbara Guidi, Alessandro Improta ∗, Luca Sani

Institute of Informatics and Telematics, Italian National Research Council (IIT-CNR), Pisa, Italy

Article info

Article history:

Received 24 November 2016

Revised 6 June 2017

Accepted 18 July 2017

Available online 19 July 2017

Keywords:

Internet

Route collectors

BGP

Data compression

Abstract

Since their deployment, BGP route collectors have played a fundamental role in investigating and detecting routing accidents and hijack attempts. However, an increasing number of detection techniques designed for real-time environments show that the lack of interactivity of route collectors represents a limitation to their efficacy, together with the small number of sources from which data is collected. Both issues stem from the current implementation of route collectors, which relies on single-threaded and general-purpose routing suites to establish BGP sessions and collect data. With this implementation any interactive operation impacts the collection process, and the number of sessions that can be established is limited by memory usage, which is not optimized for route collecting purposes. In this paper we present ICE, a multi-threaded and memory-efficient BGP collecting engine which allows route collectors to overcome the above mentioned limitations. The multi-threaded environment solves the lack of interactivity by allowing concurrent read/write operations. Memory efficiency has been obtained thanks to the design of a variant of the Lempel-Ziv compression algorithms specifically tailored to operate within a BGP real-time collecting environment. The proposed technique exploits the high degree of repetitiveness characterizing BGP data and reduces the ICE memory usage by as much as 30%.

© 2017 Elsevier B.V. All rights reserved.


1. Introduction

Twenty-seven years have passed since Kirk Lougheed and Yakov Rekhter conceived the very first version of the Border Gateway Protocol (BGP) [1]. To the present day, the Two Napkin Protocol 1 is still widely used and is the de facto standard protocol to establish inter-domain routing between Autonomous Systems (ASes). The main reason is its extreme versatility, which allows network operators to implement any kind of routing policy by applying simple filters on the routes announced [2,3]. Based on these policies, ASes announce to each other which portions of the Internet they can carry traffic towards. These pieces of routing information are stored inside the Routing Information Base (RIB) of each BGP speaker, and represent a potential gold mine of information about the Internet ecosystem. Since ASes are typically managed by private organizations, nowadays the only possibility of analyzing these pieces of information is thanks to the voluntary contribution of ASes to route collectors. Route collectors were originally conceived in the late 90s as a tool for network administrators to obtain information about the Internet inter-domain routing.

∗ Corresponding author.

E-mail addresses: [email protected] (E. Gregori), [email protected] (B. Guidi), [email protected], [email protected] (A. Improta), [email protected] (L. Sani).

1 http://www.computerhistory.org/atchm/the-two-napkin-protocol/

http://dx.doi.org/10.1016/j.comnet.2017.07.009

Over time, the needs of network administrators have become increasingly time-constrained and source-dependent. This means that the ability to check the routing status of a given network in real-time from a large number of different BGP perspectives has become an important requirement for several applications, such as the detection of prefix hijack attempts [4–6] and routing anomalies [7–10].

These new requirements cannot be directly fulfilled with the classic implementation of route collectors, since they require direct random read operations on the RIB of each route collector – which are usually not permitted – and a cost-efficient methodology to keep as many Adj-RIBs-In 2 as possible in the RAM of a single machine. One of the main causes of the current lack of interactivity and the poor scalability of route collectors is that current implementations use single-threaded general-purpose routing suites [12] to collect routing data. On the one hand, the general-purpose nature of these routing suites introduces overheads into the collection process, both in terms of memory and of route processing speed, due to features of the BGP protocol not needed in route collection (e.g. the BGP decision process). On the other hand, any read operation on the RIB would affect the route collection process of every BGP session established on the very same machine, since the routing suites are single-threaded.

2 The Adj-RIBs-In (Adjacent Routing Information Bases, Incoming) are the part of the RIB which "... stores routing information learned from inbound UPDATE messages that were received from other BGP speakers." [11]


The delay introduced can be small and imperceptible in the best case, but in the worst case it can lead to BGP session failures [13] of other ASes and the consequent peaks of traffic caused by RIB transfers when the BGP sessions are re-established. The direct consequence of these phenomena is that public route collector projects either forbid direct access to their routing suites or provide a restricted Telnet access.

In this paper we propose the Interactive Collecting Engine (ICE). This is an open-source route collecting engine able to solve the above mentioned problems and which allows the deployment of a new class of cost-efficient and interactive route collectors. Differently from existing solutions (e.g. Quagga [14] and Bird [15]), ICE exploits the multi-thread paradigm to allow fast and concurrent accesses to the RIB, in addition to periodically dumping routing information in MRT format [16]. Thanks to multi-threading, ICE is able to handle a large number of read operations without affecting the collecting process or the reader's user experience. ICE has been developed and designed with a particular focus on memory usage, to enable an ideal route collecting project to deploy only a small number of servers to accomplish large scale AS coverage. To do that, ICE exploits the large amount of repetitiveness present in the PATH ATTRIBUTE field of BGP packets to compress the size of each Adj-RIB-In with a dedicated compression technique derived from the classic Lempel-Ziv algorithms. This approach allows ICE to establish about 30% more full route BGP sessions in memory, enabling a simple 16GB RAM machine dedicated to route collecting to store up to 285 BGP sessions while keeping the introduced processing time overhead reasonable. ICE is already deployed and working as the core of the IIT-CNR Isolario project [17], a distributed system which provides real-time monitoring services to every participating AS administrator, in addition to classic route collection.

The rest of the paper is organized as follows. Section 2 presents the new route collecting challenges and evaluates the limits of existing route collector design. Section 3 describes the implementation of ICE. Section 4 provides an analysis of BGP data repetitiveness and an overview of compression techniques available to compress the Adj-RIBs-In. Section 5 describes the compression technique applied in ICE. Finally, Section 6 describes Isolario and how ICE has been interfaced with the rest of the system, and Section 7 concludes the paper.

2. Towards interactive route collectors

The first deployment of BGP route collectors dates back to 1997, when data collection was performed by the Measurement and Operations Analysis Team (MOAT) of the National Laboratory for Advanced Network Research (NLANR) project [18]. Data was initially extracted via shell scripting, which retrieved regular RIB snapshots. Then, thanks to the introduction of MRT [16], the data format was standardized and every single packet in BGP sessions established towards route collectors was caught and stored, thus making it possible to re-create the BGP flow collected in a given period of time to investigate network issues. NLANR ceased its activities some years ago, 3 but route collecting was continued by the Route Views project at the University of Oregon [19] – which started to collect MRT data in 2001 – and by the Routing Information Service (RIS) at the Réseaux IP Européens Network Coordination Center (RIPE NCC) [20] – which started to collect MRT data in 1999. Both projects make publicly available periodic RIB snapshots and a collection of every single BGP packet collected at different time intervals. 4

3 http://www.nlanr.net/

Route collector design has changed little since their initial deployment. Route collectors are basically servers which mimic the role of a BGP border router and collect the best routes advertised by connected ASes – hereafter feeders – without announcing any route back to the other party. To do this, route collectors run a dedicated piece of software – hereafter route collecting engine – able to keep the full route announced by each feeder and to dump routing data periodically in MRT format. The design has not changed much, but the requirements of users have evolved from the capability to perform offline static analyses on collected data towards the possibility of analyzing in real-time the content of the Adj-RIBs-In, mostly for online routing problem detection purposes (e.g. [4–10]). In order to meet these new requirements – taking inspiration from [21] – an interactive route collector must rely on the availability of a route collecting engine able to:

Collect data in real-time. BGP messages should be recorded as soon as received, without additional delays introduced by other collateral data operations.

Use a low amount of resources. Route collectors should keep as many Adj-RIBs-In as possible in RAM to save costs and make large-scale deployment possible.

Handle real-time routing table queries. Users have to be able to retrieve routing information for any IP space portion stored in the Adj-RIBs-In. Their requests should be handled as soon as possible and should not interfere with the route collection process.

Dump messages in MRT format. Incoming BGP messages should be stored using a standard methodology to allow a posteriori analyses together with data collected from other route collecting projects.

So far, Quagga [14] represents the best known open-source routing suite available and is used in a plethora of different environments, such as data centers or network-driven research projects. Thanks to its capability to dump received BGP messages in MRT format, it has been used intensively in the last two decades in BGP route collectors. Nevertheless, it does not fulfill the above requirements.

First of all, the BGP daemon provided in Quagga is optimized to emulate the whole BGP process and to act like a real BGP router. Therefore, its memory usage is not optimized for route collecting purposes. To better understand the capabilities of Quagga in storing multiple full routing tables we tested it by feeding it with random RIB snapshots chosen from the set of feeders connected to Route Views [19], RIS [20] and Isolario [17] and sharing their IPv4 full routing table on March 2nd, 2016. Fig. 1 shows the results found by running Quagga v.1.0.20160315 on a standard multiprocessor server equipped with 2 x E5-2407 4-core CPUs at 2.20 GHz and 16GB of RAM running Debian 3.16.0 (64 bit). In this testbed, Quagga shows an average increment of 81.4MB per feeder connected, leading to a maximum number of about 201 feeders.

The main problem of using Quagga in route collectors is represented, however, by its single-threaded core, which does not allow simultaneous read and write operations on the Adj-RIBs-In kept in memory. Using a single-threaded collecting engine in an interactive route collector means that every single read/write operation on the RIB causes a potential delay in recording incoming BGP packets. The consequences strictly depend on the amount of data to be read and on the amount of consecutive operations performed. In the best case, the timestamps of the BGP packets collected during the read operations are postponed by some seconds. In the worst case, the delay introduced by read operations

4 Route Views RIB snapshots are shot every 2 h, RIS snapshots every 8 h. Route Views UPDATE messages are dumped every 15 min, RIS UPDATE messages every 5 min.


Fig. 1. Quagga memory usage.

Fig. 2. Packet timestamp recorded by Quagga in MRT files during two sequential RIB transfers. The second RIB transfer (t = 45 s) has been interrupted by ten sequential full table read operations (t = 60 s).

Fig. 3. ICE overview.


5 By feeder handler thread we mean the combination of all those threads dedicated to a given feeder – i.e. two threads to receive BGP packets and send periodical KEEPALIVE messages to the feeder, one thread to write in the common RIB and one thread to dump MRT data.

would also delay the dispatch of the KEEPALIVE message from

Quagga to the feeders, likely causing their hold timer to expire,

the BGP session to be torn down from the feeder side and the

re-transmission of the whole feeder full routing table as soon as

the session is re-established. In both cases, results collected in

MRT files would not reflect what really happened in the Internet,

and any application relying on data collected during these events

would perceive artificial silences and peaks of BGP announcements,

which can potentially lead to drawing wrong conclusions.

As proof of this, we tested Quagga by recreating two sequential

RIB transfers involving about 550k routes. The first RIB is trans-

ferred without any interruption, while during the second we per-

form ten sequential read operations related to the whole routing

information announced by the first feeder F 1 via CLI and the com-

mand show ip bgp neighbors F i routes . Fig. 2 shows the

timestamps recorded by Quagga in the UPDATE dumps during the

RIB transfers. The first RIB transfer ends after about 35 s, while

the second RIB transfer starts after 45 s and finishes after 80 s.

The read operation is performed after 60 s, and causes a delay of

about 10 s in data collection. A larger set of similar sequential read

operations would have delayed write operations even more, intro-

ducing into the MRT files potentially minutes of routing data si-

lence followed by peaks of routing traffic at the end of the read

operations. It must be noted, however, that this behavior is com-

mon to every routing suite based on a single-threaded paradigm.

To the best of our knowledge, none of the available routing

suites completely fulfill the requirements described in Section 2 .

Most of them indeed have a single-threaded engine to manage BGP

sessions (e.g. Bird [15], OpenBGPD [22]) – thus sharing the same problem as Quagga – or do not provide any MRT file dump (e.g. XORP [23]).

3. ICE: an Interactive Collecting Engine

The first step to making a route collector interactive is to substitute the current single-threaded routing suites with a new engine able to i) collect routing data in MRT format, ii) maintain multiple full routing tables in IPv4 and IPv6, and iii) be queryable at any time without affecting the route collection process. Following these strict requirements, we developed ICE, a BGP engine specifically designed for route collecting purposes and based on multi-threading. The ICE architecture is depicted in Fig. 3. Read requests are served via request handler threads which allow users to read portion(s) of the routing table of a given feeder either in plain text (via a CLI channel) or in binary format (via a raw channel). On the other hand, feeder handler threads 5 are responsible for managing the BGP sessions established with each feeder following the guidelines indicated by the BGP Finite State Machine described in [11], and for producing MRT files periodically – i.e. snapshots of the RIB every X hours and dumps of every single UPDATE packet received from the feeder every Y minutes. No BGP decision process is implemented in ICE, since one of the peculiarities of route collectors is not to announce any UPDATE message back to the feeders.
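The split of work inside a feeder handler (see footnote 5) can be summarized with the following minimal sketch. It is written in Python purely for illustration and is not the ICE code: the class and method names, the queue-based hand-off between threads and the simulated socket (inbox/outbox) are assumptions made for the example.

    import queue
    import threading
    import time

    class FeederHandler:
        """Illustrative per-feeder thread bundle: receiver + KEEPALIVE sender for
        the BGP session, one RIB writer and one MRT dumper (cf. footnote 5)."""

        def __init__(self, feeder_id, rib, mrt_writer, keepalive_interval=30.0):
            self.feeder_id = feeder_id
            self.rib = rib                  # shared RIB exposing apply(feeder_id, update)
            self.mrt_writer = mrt_writer    # object exposing dump(feeder_id, message)
            self.keepalive_interval = keepalive_interval
            self.inbox = queue.Queue()      # stands in for the inbound side of the TCP socket
            self.outbox = queue.Queue()     # stands in for the outbound side (KEEPALIVEs)
            self.rib_queue = queue.Queue()  # parsed UPDATEs waiting to be stored in the RIB
            self.mrt_queue = queue.Queue()  # raw messages waiting to be dumped in MRT files

        def start(self):
            for target in (self._receiver, self._keepalive_sender,
                           self._rib_writer, self._mrt_dumper):
                threading.Thread(target=target, daemon=True).start()

        def _receiver(self):
            # Fan incoming messages out to the RIB writer and the MRT dumper so that
            # neither RIB insertion nor file dumping can stall message reception.
            while True:
                message = self.inbox.get()
                self.mrt_queue.put(message)
                if message.get("type") == "UPDATE":
                    self.rib_queue.put(message)

        def _keepalive_sender(self):
            while True:
                time.sleep(self.keepalive_interval)
                self.outbox.put({"type": "KEEPALIVE"})

        def _rib_writer(self):
            while True:
                self.rib.apply(self.feeder_id, self.rib_queue.get())

        def _mrt_dumper(self):
            while True:
                self.mrt_writer.dump(self.feeder_id, self.mrt_queue.get())

    # Toy usage with print-based stand-ins for the RIB and the MRT writer:
    class EchoRIB:
        def apply(self, feeder_id, update):
            print("RIB <-", feeder_id, update["nlri"])

    class EchoMRT:
        def dump(self, feeder_id, message):
            print("MRT <-", feeder_id, message["type"])

    handler = FeederHandler(feeder_id=1, rib=EchoRIB(), mrt_writer=EchoMRT())
    handler.start()
    handler.inbox.put({"type": "UPDATE", "nlri": "48.0.0.0/4", "path_attributes": b"..."})
    time.sleep(0.1)   # give the worker threads a moment to drain the queues

Because reads are served by separate request handler threads, a slow reader never sits on the path between the receiver and the RIB/MRT queues in this layout.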

Similarly to [24], ICE maintains a local RIB implemented as a Patricia Trie (PT), which evolves with the contribution of BGP packets received from every feeder connected. Whenever one UPDATE packet is received by a feeder handler, the set of destinations is extracted (i.e. the content of the NLRI field and MP REACH NLRI attribute, if available) and each of them is inserted into the PT together with the TOTAL PATH ATTRIBUTE chunk found in the packet. To save space we only keep the Network Address of Next Hop field and its length [25] in each MP REACH NLRI announcement carrying IPv6 networks, since it is possible to infer the remaining fields from the PT node where the attribute is inserted, as described in [16]. Every time a new subnet is recorded from any feeder, the PT is modified with one or more nodes, each representing one or more bits of the binary representation of the subnet. Then, the TOTAL PATH ATTRIBUTE chunk is inserted


Fig. 4. Packet timestamp recorded by ICE in MRT files during two sequential RIB transfers in variable multi-core environments. Multiple read operations are performed 1 s after the beginning of the second RIB transfer. (a) Number of cores = 1. (b) Number of cores = 8.


Table 1
Mean value (μ) and standard deviation (σ) of time elapsed (in seconds) to complete N concurrent read operations before and during the F2 RIB transfer.

Number of readers | Before F2 RIB transfer          | During F2 RIB transfer
                  | 1 core        | 8 cores         | 1 core        | 8 cores
                  | μ      σ      | μ      σ        | μ      σ      | μ      σ
1                 | 2.12   0.07   | 2.09   0.07     | 2.21   0.07   | 2.18   0.08
2                 | 2.15   0.10   | 2.14   0.10     | 2.19   0.09   | 2.16   0.08
4                 | 2.32   0.15   | 2.20   0.11     | 2.28   0.13   | 2.17   0.11
8                 | 3.66   0.06   | 2.15   0.14     | 3.73   0.09   | 2.13   0.15


inside the node representing the last bit(s) of the subnet, in a slot dedicated to the feeder which sent the packet. For example, the PT depicted in Fig. 6 contains nodes related to the subnets 48.0.0.0/4 and 192.0.0.0/3 announced by Feeders 1 and 2, and to 64.0.0.0/3 announced by Feeders 3 and 4. Feeder slots in each subnet node are allocated only when needed, i.e. the first time a feeder announces that subnet. On the other hand, each withdrawn route (i.e. the content of the WITHDRAWN ROUTES field and MP UNREACH NLRI attribute, if available) causes the removal of the path attribute chunk from the feeder slot, but not the removal of any node from the PT, even if that subnet is withdrawn from every feeder. This design choice was made because withdrawn events are most likely related to network transients caused by network malfunctions [26], so most of the involved routes tend to be re-announced in time and not to disappear forever. Accesses to the PT and to its nodes are protected according to the classic readers-writers paradigm, which is implemented by the Sync module. Concurrent read operations are always allowed, while concurrent write and read/write operations are allowed only if they involve different nodes. A write operation consists in a preliminary read operation to find out whether the node for the related subnet exists. If the node is found, then the path attribute chunk is inserted in the proper feeder slot of the node. Otherwise, the new node is created with the path attribute in the proper feeder slot and finally inserted into the RIB by locking the entire PT. Note that, as a consequence of the choice not to remove any node in any case from the PT, only the insertion of new nodes causes the lock of the entire PT. This happens frequently only during the ICE start-up phase, when the first feeder announces the whole RIB.
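To make the structure concrete, here is a deliberately simplified sketch of such a RIB in Python. It is not the ICE implementation and takes several shortcuts: the trie is a plain (non path-compressed) binary trie rather than a true Patricia trie, the per-node readers-writers protection of the Sync module is approximated by a plain mutex, and names such as PatriciaRIB, announce and withdraw are invented for the example. It does, however, follow the rules described above: per-feeder slots inside each node, a global lock taken only when a brand-new node has to be linked into the trie, and withdrawals that clear the feeder slot without ever removing nodes.

    import threading

    class PTNode:
        def __init__(self):
            self.children = [None, None]    # 0/1 branches of the binary trie
            self.slots = {}                 # feeder id -> path attribute chunk
            self.lock = threading.Lock()    # protects this node's feeder slots

    class PatriciaRIB:
        def __init__(self):
            self.root = PTNode()
            self.trie_lock = threading.Lock()   # taken only to insert brand-new nodes

        @staticmethod
        def _bits(prefix, length):
            # First `length` bits of an IPv4 prefix, e.g. ("48.0.0.0", 4) -> [0, 0, 1, 1].
            value = 0
            for octet in prefix.split("."):
                value = (value << 8) | int(octet)
            return [(value >> (31 - i)) & 1 for i in range(length)]

        def _find(self, bits):
            node = self.root
            for b in bits:
                node = node.children[b]
                if node is None:
                    return None
            return node

        def announce(self, feeder_id, prefix, length, path_attributes):
            bits = self._bits(prefix, length)
            node = self._find(bits)             # preliminary read: does the node exist?
            if node is None:
                with self.trie_lock:            # new subnet: lock the whole trie
                    node = self.root
                    for b in bits:
                        if node.children[b] is None:
                            node.children[b] = PTNode()
                        node = node.children[b]
            with node.lock:                     # existing node: only the node is locked
                node.slots[feeder_id] = path_attributes

        def withdraw(self, feeder_id, prefix, length):
            node = self._find(self._bits(prefix, length))
            if node is not None:
                with node.lock:
                    node.slots.pop(feeder_id, None)   # the node itself is never removed

        def read(self, feeder_id, prefix, length):
            node = self._find(self._bits(prefix, length))
            if node is None:
                return None
            with node.lock:
                return node.slots.get(feeder_id)

    # Reproducing the toy content of Fig. 6:
    rib = PatriciaRIB()
    rib.announce(1, "48.0.0.0", 4, b"attrs-feeder-1")
    rib.announce(2, "48.0.0.0", 4, b"attrs-feeder-2")
    rib.announce(1, "192.0.0.0", 3, b"attrs-feeder-1")
    rib.announce(2, "192.0.0.0", 3, b"attrs-feeder-2")
    rib.announce(3, "64.0.0.0", 3, b"attrs-feeder-3")
    rib.announce(4, "64.0.0.0", 3, b"attrs-feeder-4")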

To quantify the effectiveness of the multithreading approach, we ran a set of tests very similar to the test performed on Quagga at the end of Section 2. The experiment stresses the system in a worst case scenario, i.e. the first feeder F1 establishes the BGP session with ICE, completes the RIB transfer and then proceeds to send BGP keepalives to keep the BGP session with ICE up. Then, after about 20 s, the second feeder F2 proceeds to perform the same operations. After 1 s from the beginning of the RIB transfer of F2, ICE receives multiple sequential read requests on the whole IP space available for F1, which in Section 2 caused a delay in Quagga data recording. Given the multithreaded nature of the software, we ran the experiment varying the number of cores available for ICE. Fig. 4 shows the timestamps recorded by ICE in the UPDATE dumps during the RIB transfers. When the number of threads is higher than the number of available cores, the timestamps in MRT packets are recorded with a slight delay with respect to the zero-readers case. This is expected, since the scheduler has to share one or more cores among the writer and the readers. Nevertheless, even when the number of available cores is one, ICE is still able to write incoming BGP packets thanks to the scheduler activity. This allows ICE to limit the delay in MRT data records in this scenario to about 8–10 s with respect to the 8-core scenario depicted in Fig. 4, maintaining the same curve trend. Finally, we compared the time that a set of concurrent readers take to retrieve the full routing table of F1 before and during the table transfer of F2, varying the number of cores. Results are shown in Table 1. As can be seen, the time the second set of readers take to retrieve the F1 table is very close to the first set, confirming that multiple readers can proceed concurrently with the feeder that is currently sending its full table without experiencing any significant delay.

It is noteworthy that the tests performed above represent worst case scenarios. First, we made the readers dump the whole IP space of F1, whereas read operations are mostly driven to given specific portions of the IP space. Then we performed them during an ideal RIB transfer phase where a sequence of ∼630k BGP UPDATE packets, each containing a single subnet, was sent from the feeder to ICE without any additional delay. On a regular multiprocessor server equipped with 2 x E5-2407 4-core CPUs at 2.20 GHz and 16GB of RAM running Debian 3.16.0 (64 bit), ICE terminates the write operations requested in about 16 s, meaning that it is able to handle a peak rate of ∼30k packets per second. RIB transfers are however performed only as soon as the BGP session goes up – either because a new BGP peer has been connected or an existing BGP session has been reset [13] – and represent a very rare event. During normal operations the write request rate is much lower than in the RIB case. Fig. 5 shows the distribution of the average and maximum rate of BGP announcement/withdrawal events received by Isolario route collectors [17] from each of its feeders (37) in a random day


Fig. 5. Average and maximum BGP event rate as measured from 37 BGP feeds connected to Isolario.

Fig. 6. Patricia Trie implementation of the RIB.

Fig. 7. ICE memory usage while storing multiple IPv4 full routing tables.


of analysis (April 10th, 2017). On average, each feeder generates 1–10 events per second, with rare peaks of about 1k–10k.

4. Data compression

The route collecting engine proposed in Section 3 allows concurrent routing data operations while still avoiding the usage of inter-feeder semaphores, but the design choice to keep each plain BGP attribute in memory limits its scalability. To have a better insight into ICE memory usage we tested it with the same testbed described in Section 2. As can be seen from Fig. 7, ICE uses more or less the same amount of memory as Quagga, showing an average per-feeder increment of 82.4MB. This can be reasonable in several scenarios where memory consumption is not an issue or the number of feeders is low, but it may be troublesome in other scenarios. For example, in [27] it has been shown that about five thousand ASes should be connected to route collecting projects to have the chance to reveal the full AS connectivity of every transit AS in the Internet ecosystem. In other words, the route collecting project should set up at least about five thousand different BGP sessions, requiring more than 25 machines running a fully-loaded ICE instance each. A less ideal example is the deployment of a route collector on Internet Exchange Points (IXPs). Some of these facilities in Europe will soon have more than a thousand ASes connected, 6 which would mean that to collect a full route from each IXP participant five machines running a fully-loaded ICE instance each would be required.

6 In November 2016, 803 different ASes were connected on the Amsterdam Internet Exchange (AMS-IX), 769 on the London Internet Exchange (LINX) and 705 on the Frankfurt site of the Deutscher Commercial Internet Exchange (DE-CIX).
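Both machine counts follow from the per-feeder footprint measured above (a rough calculation, assuming fully-loaded 16GB machines and ignoring the fixed baseline of the process):

    16,384 MB / 82.4 MB per feeder ≈ 199 full IPv4 feeds per machine
    5000 feeders / 199 feeds per machine ≈ 25 machines;  1000 feeders / 199 ≈ 5 machines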

This scalability issue can be mitigated by exploiting the characteristics of the BGP data collected and applying classic data compression techniques on each incoming BGP packet to compress the path attribute data stored in each node of the PT, saving space thanks to the inter-packet redundancy found in the BGP session. In this section, we provide an analysis of BGP data to investigate the amount of this repetitiveness, which is an indication as to the effectiveness of any data compression technique applied to BGP data. Then, we investigate the most well known data compression techniques to discover how they could be adapted (if possible) to fit our requirements.

4.1. On the repetitiveness of BGP data

The first step towards a more memory-efficient route collecting software using compression techniques is to understand how much data is repetitive, and thus compressible. The presence of a large degree of repetitiveness in BGP data can be identified even with a simple analysis of the protocol design. In each BGP session, routing information is carried by UPDATE messages which "... advertise feasible routes that share common path attributes to a peer, or to withdraw multiple unfeasible routes from service" [11]. Path attributes are used by routers in their BGP decision process to select the best routes, and are the key to analyzing the Internet characteristics [3,27,28]. Some of these attributes are mandatory, while others are discretionary/optional and appear only whenever they are needed. In external BGP scenarios (which include route collecting) the mandatory attributes listed in [11] are ORIGIN, AS PATH and NEXT HOP, while the AGGREGATOR, MULTI EXIT DISC and ATOMIC AGGREGATE are defined as optional attributes. The first hint of repetitiveness can be found in the description of each attribute. Apart from the AS PATH and the AGGREGATOR, every other attribute just marks the route with a value taken from a limited set of possibilities, which may depend on the BGP peer policies applied (e.g. NEXT HOP) and on the BGP peer receiving the attribute (e.g. MULTI EXIT DISC) – values that obviously tend to be highly repeated.

In order to have a thorough view of the path attribute repetitiveness, we analyzed BGP data collected from each full feeder in time. We considered as a full feeder every BGP peer which


Table 2
BGP RIB snapshot details.

                            March 2nd, 2016   February 2nd, 2016   March 2nd, 2015   March 2nd, 2006
IPv4
  # of full feeders         330               327                  244               150
  Avg. # of routes in RIB   575,335.26        568,892.60           525,888.18        177,073.38
  # of ASes                 53,525            53,145               49,986            21,901
  # of /32 covered          2.71 · 10^9       2.67 · 10^9          2.58 · 10^9       1.29 · 10^9
IPv6
  # of full feeders         256               255                  180               14
  Avg. # of routes in RIB   27,141.82         26,514.96            21,321.86         677.64
  # of ASes                 11,581            11,371               9666              599
  # of /128 covered         2.95 · 10^31      2.79 · 10^31         2.15 · 10^31      3.78 · 10^29

Fig. 8. Average u-index values of each path attribute. (a) BGP attributes in IPv4 network announcements. (b) BGP attributes in IPv6 network announcements.


announced to the route collectors more than 75% of the whole IPv4/IPv6 space collected. Every route with a mask value smaller than /8 in IPv4 sessions and /32 in IPv6 sessions is discarded from this computation. Starting from March 2nd, 2016 we focused on BGP data contained in the RIB snapshot shot at midnight from every Isolario, RIS and Route Views route collector. Then we moved back in time by analyzing the snapshots gathered one month earlier (February 2nd, 2016), one year earlier (March 2nd, 2015) and ten years earlier (March 2nd, 2006). Details regarding the analyzed BGP tables can be found in Table 2. To highlight the large amount of repetitiveness of collected data we focus on the different values of each BGP path attribute found by every single full feeder, and we define the uniqueness index (u-index) as the ratio between the number of different values assumed by the attribute in the feeder RIB and the number of occurrences of the very same attribute in the feeder RIB. In IPv6 scenarios we improperly refer to NEXT HOP as the value of the next hop field found in the MP REACH NLRI attribute, since the NEXT HOP attribute is IPv4 specific. Moreover, we show only the most used path attribute types, and we omit results related to the ORIGIN and the ATOMIC AGGREGATE attributes due to their intrinsic (and obvious) repetitiveness. Nevertheless, during these analyses we found that the only optional attribute not described in [11] and widely used is the COMMUNITIES attribute (described in [29]). In the analysis of February and March 2016 we also found that less than 10% of full feeders recorded at least one occurrence of the AS4 PATH and of the AS4 AGGREGATOR (both defined in [30]), while less than 1% of them recorded at least one occurrence of the deprecated CONNECTOR and AS PATHLIMIT path attributes.
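In code form, the u-index of each attribute type can be computed as in the short Python sketch below (illustrative only: the extraction of attributes from MRT data is abstracted into a list of (attribute type, value) pairs per route, and the function name is invented):

    from collections import Counter, defaultdict

    def u_index(feeder_rib):
        # feeder_rib: iterable of routes, each route being a list of
        # (attribute_type, value) pairs, e.g. ("AS_PATH", (64500, 64510)).
        # Returns, per attribute type, the number of distinct values divided by
        # the number of occurrences of that attribute in the feeder RIB.
        occurrences = Counter()
        distinct = defaultdict(set)
        for route in feeder_rib:
            for attr_type, value in route:
                occurrences[attr_type] += 1
                distinct[attr_type].add(value)
        return {t: len(distinct[t]) / occurrences[t] for t in occurrences}

    # Toy example: one NEXT_HOP value over three routes and two distinct AS paths
    # give u-index 1/3 for NEXT_HOP and 2/3 for AS_PATH.
    rib = [
        [("AS_PATH", (64500, 64510)), ("NEXT_HOP", "10.0.0.1")],
        [("AS_PATH", (64500, 64510)), ("NEXT_HOP", "10.0.0.1")],
        [("AS_PATH", (64500, 64520)), ("NEXT_HOP", "10.0.0.1")],
    ]
    print(u_index(rib))   # {'AS_PATH': 0.666..., 'NEXT_HOP': 0.333...}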

The most interesting result is that each attribute tends to be extremely repetitive in IPv4 scenarios (Fig. 8(a)) and moderately repetitive in IPv6 scenarios (Fig. 8(b)). The different behavior of IPv4 and IPv6 scenarios can be explained by the different RIB sizes (Table 2) and by their (current) different pervasiveness in the Internet. Only about one in every five ASes appears in AS PATH attributes carrying IPv6 reachability information, and these ASes announce on average only two subnets each, versus the ten subnets of IPv4 scenarios. The most repetitive attribute is obviously the NEXT HOP, which gets multiple values only in data collected by route collectors located on IXPs with a route server presence. It is worth noting, however, that MULTI EXIT DISC, COMMUNITIES and AGGREGATOR are optional attributes, and their u-index is calculated only on the amount of subnets carrying them. A further particular note can be applied to the MULTI EXIT DISC attribute. This is a non-transitive attribute which is used to influence the BGP decision process of the direct peer for a given set of routes and has no particular meaning in a BGP session towards a route collector. Nevertheless, it is announced on average on two out of three full feeder BGP sessions. One of the possible reasons for this apparently strange behavior is that AS administrators typically do not use special BGP export policies towards route collectors, choosing to apply one of the existing policies in which they announce their full route towards a customer, thus including the MULTI EXIT DISC.

The less intuitive result regards the repetitiveness of the AS PATH attribute. Given the size of each RIB, one would imagine a plethora of different paths used. On the contrary, each full feeder uses on average a number of distinctive paths which is much smaller than the average size of the RIB (Table 2), and this is recurrent in every scenario analyzed. More interestingly, simply analyzing the Complementary Cumulative Distribution Function (CCDF) of the number of unique AS paths of each full feeder (Fig. 9(a) and (b)) it is possible to see that almost every feeder uses a


Fig. 9. CCDF of the amount of unique AS path attributes. (a) AS paths in IPv4 network announcements. (b) AS paths in IPv6 network announcements.

Fig. 10. CCDF of the amount of unique path attribute fields. (a) Path attributes in IPv4 network announcements. (b) Path attributes in IPv6 network announcements.


similar number of distinctive AS paths to reach every Internet destination. Even more interestingly, the number of distinctive AS paths is about 1.5 times greater than the number of announced ASes, i.e. a typical full feeder reaches every other AS with an average of 1.5 different AS paths. This is likely one of the consequences of the hierarchical structure of the core of the Internet. The Internet is composed mostly of small/medium organizations with a country-level coverage [31] and by a very small set of ASes which are spread across different regions and provide worldwide transit services. None of these ASes can reach every other AS by itself though, and every geographic region has its own different peculiarities [32], which lead one AS to be more pervasive than others (e.g. national/regional telcos). As a consequence of this, packets from any AS have to cross a small set of provider-free ASes to reach all the destinations found in different geographic regions, leading to a limited number of possible paths available.

The repetitiveness of every single path attribute is not enough to claim that the BGP data collected is highly repetitive. Each route is independent of the others, and repetitive attributes may combine in many different ways. Since path attribute ordering is claimed as optional in [11], it is also technically possible that routes with the same set of path attributes receive them in a different order, leading to two different path attribute fields. We complete the overview on data repetitiveness focusing on the amount of different values (in bytes) of the PATH ATTRIBUTES field received from each full feeder. The CCDF of the number of unique fields per feeder is depicted in Fig. 10(a) and (b). As can be seen, the CCDF behaviors are very close to the behaviors of the CCDFs of the single AS PATH attribute, showing that there are not so many combinations of attributes currently used. This also means that it is possible to efficiently compress the whole PATH ATTRIBUTES field of incoming BGP packets without the need to introduce further overheads required to identify each path attribute element and compress it separately.

4.2. Data compression algorithms vs. real-time BGP data collection

During the last century the data compression research field produced a plethora of algorithms to efficiently compress different types of data. Most of them are extremely effective on large amounts of static data with no particular time constraints, while others are more adaptive and allow random access at record level. Given the real-time collection constraint, real-time BGP data collection represents an extremely peculiar scenario. The compression algorithm required to compress BGP data effectively has to be able to work on continuous streams of data composed of small packets (max 4096 B [11]) containing routing information similar to each other, which has to be stored in a PT as soon as possible. To be effective, a compression algorithm in this scenario has to:

Be lossless. The original data has to be reconstructed from the compressed data without any loss of information.

Be adaptive. There is no a priori information about the size and content of the input stream, which can potentially change in time.

Be fast. Data should be compressed and inserted in the PT as soon as possible, so as not to affect other following operations.

Allow random access at record level. Any user should be able to read the piece of routing information in which he/she is interested just by decompressing the path attributes related to the


Input: TOTAL PATH ATTRIBUTE P of length L
s = ''
while read byte b from P
    sb = s + b
    if (exists dictionary[sb])
        s = sb
        continue
    else
        if (dictionary is not full)
            code = dictionary.add(sb)
            C.append(code)
            s = ''
            unused_codes.remove(code)
        else if (unused_codes is not empty)
            code = pop_first(unused_codes)
            dictionary[sb] = code
            C.append(code)
            s = ''
        else
            code = dictionary[s]
            C.append(code)
            s = b
Output: compressed chunk C

Fig. 11. Compression algorithm LZW∗.

Input: compressed chunk C
while read code from C
    s = dictionary.access_by_index(code)
    P.append(s)
Output: TOTAL PATH ATTRIBUTE P of length L

Fig. 12. Decompression algorithm LZW∗.

Fig. 13. Memory usage of LZW∗ (dictionary size and data size, with Dict=2B, Dict=4B and without LZW∗).


required destinations. Thus, the decompression phase has to be untied from the compression phase.

Most of the available lossless data compression algorithms fall into two main categories. The first, defined as entropy encoding, contains a set of algorithms that only work with data that have known characteristics and where order is known and preserved. The most well known algorithm in this category is the Huffman coding [33], which assigns binary codes to symbols based on the frequency of the symbol in the original chunk of data. The main limit of this algorithm in our scenario – and in general, of the whole entropy coding category – is that it relies on occurrence frequencies which have to be computed a priori. The second category, defined as dictionary coders, contains all those algorithms which exploit a data structure – i.e. the dictionary – containing the most recurrent patterns found in the input and their related compressed values. To compress data, these algorithms firstly search for the longest match in the dictionary and substitute it with the related compressed value. The most well known examples of algorithms in this category are the Lempel-Ziv coders LZ77 [34], LZ78 [35] and their variants (e.g. LZW [36], LZ4 [37] and Snappy [38]). LZ77 is able to capture the repetitiveness of a chunk of data by building a dictionary that is a portion of the previously encoded sequence, thus implicitly tying compression and decompression phases and not fulfilling our requirements. LZ78 and LZW – which is an improvement of LZ78 – on the other hand, create a dictionary dynamically to store repetitive patterns, which can be recreated on the decompression side in a second moment or – as in our case – can be shared by compressor and decompressor. Another two fast block-based compression algorithms based on the Lempel-Ziv algorithms are LZ4 and Snappy. Both of them are faster and more effective than the original algorithms on large blocks of data, and represent a good choice to compress the whole RIB in snapshots. However, to be implemented in ICE they would require the creation of a further buffer in memory in which to store a large batch of uncompressed BGP packets, leading to peaks of RAM usage.

5. Saving memory in ICE with path attribute compression

Given the requirements of our scenario, the repetitiveness of BGP data and the characteristics of the data compression algorithms available, a valid solution to reduce the memory usage in ICE is to apply a variant of LZW to incoming data while keeping a per-feeder dictionary in memory. The choice to keep a dictionary in memory shared between compressors and decompressors – thus removing the classic constraint of compressing data such that the dictionary could be rebuilt on the decompression side just by analyzing the sequence of codes received – allows us to simplify the original LZW algorithm and optimize it for our special needs. The LZW∗ compression algorithm starts with a dictionary initialized with one code per every symbol in the alphabet and works as illustrated in Fig. 11. It starts by trying to find the shortest substring P in the input string that is not indexed by the dictionary. Then, P is added to the dictionary (line 7) – provided that the dictionary is not full – and the related N-bit code is appended to the output (lines 8–11). The algorithm then restarts, trying to find the next shortest prefix starting from the character immediately following P. If the dictionary is found to be full, the compressor checks if an unused dictionary index is available (lines 11–14) and uses it to add P to the dictionary. A dictionary entry can become unused whenever all the compressed chunks containing its index are removed from the PT as a consequence of route replacements or withdrawals. This means that each entry also contains a usage counter to keep track of how many compressed chunks are currently using every given index. If there are no unused indices, the code of the longest portion of P found in the dictionary – a substring composed of the first P.size() − 1 elements of P – is appended to the output, and the iteration continues from the last character of the substring (lines 19–22). The decompression algorithm (Fig. 12) simply reads the compressed chunk as a sequence of N-bit indices and rebuilds the original TOTAL PATH ATTRIBUTE field by concatenating all the substrings found in the dictionary at the corresponding indices.
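The following Python sketch renders LZW∗ as just described. It is illustrative rather than the ICE source: the dictionary object, the reference counter used to recycle codes and the end-of-input flush are modelled explicitly here, codes are plain integers bounded by the N-byte limit, and the same in-memory dictionary is shared by compressor and decompressor.

    class LZWStarDictionary:
        # Per-feeder dictionary shared by the compressor and the decompressor.
        def __init__(self, code_bytes=2):
            self.max_codes = 1 << (8 * code_bytes)
            self.by_string = {bytes([b]): b for b in range(256)}   # alphabet seed
            self.by_code = {b: bytes([b]) for b in range(256)}
            self.refcount = {}     # code -> number of stored chunks referencing it
            self.unused = []       # recycled codes whose reference count reached zero
            self.next_code = 256

        def lookup(self, s):
            return self.by_string.get(s)

        def add(self, s):
            # Index substring s, reusing a recycled code if the dictionary is full.
            if self.next_code < self.max_codes:
                code = self.next_code
                self.next_code += 1
            elif self.unused:
                code = self.unused.pop()
                del self.by_string[self.by_code[code]]
            else:
                return None        # dictionary full and nothing to recycle
            self.by_string[s] = code
            self.by_code[code] = s
            return code

        def acquire(self, code):
            self.refcount[code] = self.refcount.get(code, 0) + 1

        def release(self, code):
            # Called when a compressed chunk is replaced or withdrawn from the PT.
            self.refcount[code] -= 1
            if self.refcount[code] == 0 and code >= 256:
                self.unused.append(code)   # single-byte seed codes are never recycled

    def compress(path_attributes, dictionary):
        codes, s = [], b""
        for byte in path_attributes:
            sb = s + bytes([byte])
            if dictionary.lookup(sb) is not None:
                s = sb                       # keep growing the current substring
                continue
            code = dictionary.add(sb)
            if code is None:                 # cannot index sb: emit the longest known prefix
                code = dictionary.lookup(s)
                s = bytes([byte])
            else:
                s = b""
            codes.append(code)
            dictionary.acquire(code)
        if s:                                # flush whatever is left at the end of the input
            code = dictionary.lookup(s)
            codes.append(code)
            dictionary.acquire(code)
        return codes

    def decompress(codes, dictionary):
        return b"".join(dictionary.by_code[c] for c in codes)

    # Round-trip example on a small, highly repetitive chunk of bytes.
    d = LZWStarDictionary(code_bytes=2)
    chunk = bytes.fromhex("40010100") * 3
    codes = compress(chunk, d)
    assert decompress(codes, d) == chunk

In such a setting, compress() would be invoked by the feeder handler on the path attribute chunk before storing it in the PT, and decompress() by the request handler threads when a read request arrives.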

The compression performance of LZW∗ is strictly related to the value of N, which limits the size of the dictionary and affects the size of the compressed sequences stored. To evaluate the best value of N we compressed with LZW∗ every path attribute recorded by Route Views, RIS and Isolario on March 2nd, 2016. Fig. 13 shows the results found using N = 2 B and N = 4 B. As can be seen, a large N has the best compression effectiveness, but the size of the dictionary potentially increases indefinitely. On the contrary, a small N keeps the dictionary small and still manages to compress data effectively, even though about 50% of feeders completely fill up their dictionary (Fig. 14).
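For reference, these two settings bound the dictionary as follows (simple arithmetic):

    N = 2 B  →  at most 2^16 = 65,536 codes
    N = 4 B  →  at most 2^32 ≈ 4.29 · 10^9 codes

which is why roughly half of the feeders saturate the 2-byte dictionary, while the 4-byte dictionary is much harder to exhaust.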

Fig. 15 shows the integration of the compression module inside ICE. Whenever an UPDATE message arrives from a feeder Fi,
Page 9: ICE: A memory-efficient BGP route collecting engine · ICE: A memory-efficient BGP route collecting engine ... tecting routing accidents and hijack attempts. However, an increasing

170 E. Gregori et al. / Computer Networks 126 (2017) 162–173

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

100 1000 10000 100000 1x106 1x107

P(X

>x)

Dictionary Entries

Dict=4B Dict=2B

Fig. 14. CCDF of the number of dictionary entries in LZW

∗ per feeder (March, 2nd

2016).

Fig. 15. Integration of LZW

∗ algorithm into ICE.

0 B

1GB

2GB

3GB

4GB

5GB

6GB

1 2 4 8 16 32 64

Mem

ory

usag

e

# of IPv4 full routing tables

without LZW*with LZW*

Fig. 16. ICE memory usage with LZW

∗ while storing multiple IPv4 full routing

tables.

Fig. 17. CCDF of the elapsed time required by ICE to parse IPv4 full routing tables.

Fig. 18. CCDF of the average route processing rate of ICE with IPv4 full routing

tables.

a

w

m

b

w

o

T

w

t

o

R

i

l

a

p

v

m

w

t

1

t

i

t

A

e

t

(

c

a

m

it is parsed and – if the packet contains the announcement of a

route – its TOTAL PATH ATTRIBUTE field is compressed using

the dictionary dedicated to F i and finally stored in the PT . When-

ever a read request has to be performed, the sequence of codes is

retrieved from PT , the original TOTAL PATH ATTRIBUTE field is

rebuilt thanks to the dedicated feeder dictionary, and it is finally

sent to the reader.

Fig. 16 shows the amount of memory used by ICE with LZW

with an increasing number of feeders sharing their IPv4 full rout-

ing table. The compression module shows an increment of only

57.5MB per feeder connected, thus saving up to 30% of memory

used by the original version of ICE. Returning to the ideal scenario

imed at connecting the Internet core described in Section 2 , this

ould mean that only 18 machines would be required, since each

achine with 16GB RAM can store up to 285 IPv4 full routing ta-

les. The improved performances in memory management gained

ith the application of LZW

∗ in ICE, however, comes with the price

f an increased execution time required by the compression phase.

o better understand the amount of additional elaboration time,

e stressed ICE by performing local RIB transfers and we analyzed

he time required to complete each RIB transfer with and with-

ut applying the LZW

∗ technique. Fig. 17 shows the CCDF of the

IB transfer time of each of the 319 feeders which were announc-

ng their full IPv4 routing table to either Route Views, RIS or Iso-

ario on March 2nd, 2016. Each RIB transfer involves an average of

bout 570k routes per feeder, each sent to ICE in a dedicated BGP

acket via UNIX sockets from a separate process running on the

ery same machine hosting ICE, so that transmission overhead is

inimized. As expected, the processing speed of ICE slows down

hen LZW

∗ is enabled. ICE without LZW

∗ received and stored in

he PT each full route in a window of time between 5.45 s and

1 . 25 s, while with LZW

∗ the window of time ranges from 9 . 31 s

o 36 . 65 s. Fig. 18 shows the CCDF of the average route process-

ng rate, which is computed per feeder as the ratio between the

ransfer time and the amount of subnets involved in the transfer.

s can be seen, ICE without LZW

∗ is able to parse about one route

very 12 μs–20 μs (i.e. 50k–80k routes per second), while the in-

roduction of LZW

∗ leads to parsing one route every 30 μs–64 μs

i.e. 15k–33k routes per second). Obviously this is just a worst-

ase scenario, since each RIB transfer is performed locally as fast

s possible, whereas real table transfers have been shown to be

uch slower, taking also several minutes [39] . However, it is pos-

Page 10: ICE: A memory-efficient BGP route collecting engine · ICE: A memory-efficient BGP route collecting engine ... tecting routing accidents and hijack attempts. However, an increasing

E. Gregori et al. / Computer Networks 126 (2017) 162–173 171

Fig. 19. ICE in Isolario real-time environment.

s

s

p

u

d

R

i

6

c

c

t

c

w

[

r

B

i

e

l

c

p

i

w

m

a

t

r

q

w

m

r

t

t

t

a

r

B

a

i

p

d

s

s

t

t

e

c

B

T

i

c

o

n

f

r

u

7

e

m

h

p

I

s

u

t

f

r

t

d

e

o

I

p

w

/

o

c

r

b

a

I

R

ible to infer from these results that ICE with LZW

∗ running on a

tandard server does not introduce any delay other than the com-

ression procedure in data recorded if fed from feeders sending

p to about 16k routes per second. In other words, ICE on a stan-

ard server would lose precision only in rare events, such as full

IB transfers, large-scale events or network topology changes (e.g.

ntroduction/removal of one of the transit providers) on the feeder.

. Isolario: an ICE use case

ICE has been devised with the specific aim of enabling route

ollectors to be more interactive, but it can also be used in other

ontexts. For example, ICE can be applied to intra-AS environments

o monitor the evolution of the best routes of each BGP router, or

an be used as an engine for looking glass software and queried at

ill from the web. A special use case of ICE is the Isolario project

17] , a distributed system developed at the IIT-CNR which provides

eal-time monitoring services based on the analysis of incoming

GP sessions in exchange for full routing tables. A network admin-

strator which decides to participate in Isolario would be able, for

xample, to analyze in real-time its BGP session and detect patho-

ogical inter-domain routing events like route flapping without in-

reasing the computational load on its routers or introducing third-

arty software, and/or to analyze the historic routing evolution of

ts own networks. More details about Isolario can be found on the

ebsite [17,40] .

ICE is one of the main pillars of the Isolario real-time environ-

ent, since it is used both for regular route collecting operations

nd to retrieve the full routing views whenever required. Most of

he Isolario services are based on the real-time analyses of the

eachability of specific portions of IP space. Users are typically re-

uired to identify which networks they want to monitor and from

hich BGP session. Then, the requests are forwarded to the Isolario

odules where the involved BGP sessions are established and a di-

ect channel is created, dedicated to the monitored network from

he feeders to the user web browser. Every single packet involving

he monitored network will flow in this channel, as well as its ini-

ial state as recorded by ICE. To achieve this, ICE cannot be used

s it is but should be inserted in a framework which is summa-

ized in Fig. 19 . The dispatcher module duplicates every incoming

GP packet and propagates the packet both to ICE and to a set of

pplication-driven filters . Every packet matching the filter criteria

s then forwarded to the proper service module, which will finally

repare data and will propagate it to the user. Since ICE and the

ispatcher are two separate and independent processes, it is pos-

ible that an incoming packet has already been forwarded to the

ervices, while ICE has not yet parsed it. Thus, it is not possible

o infer the current status of a given network with a simple query

o ICE without risking introducing errors. To solve this problem,

very service request has to populate the filter module and a dis-

riminator module with the networks involved (1) to allow fresh

GP packets to be immediately forwarded to the rest of the system.

hen, (2) it must request ICE to supply the status of the networks

nvolved, which will be received by the discriminator (3). The dis-

riminator module is able to understand whether parts of the read

peration results are outdated or not just checking if the related

etwork was announced/withdrawn in any BGP message coming

rom the dispatcher, and forwards to the system only the correct

outing information providing real-time and reliable results to the

ser.
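The reconciliation step performed by the discriminator can be sketched as follows. This is only an illustration of the logic described above, under the assumption of a simple per-prefix staleness flag; the names (Discriminator, register, on_live_update, reconcile) are hypothetical and not taken from the Isolario code.

```python
# Illustrative sketch of steps (1)-(3); class and method names are
# hypothetical, not taken from the Isolario sources.

class Discriminator:
    def __init__(self):
        self.seen_live = {}        # prefix -> True once a live update arrived

    def register(self, prefixes):
        """Step (1): start tracking the requested networks before the
        read request is issued to ICE."""
        for p in prefixes:
            self.seen_live[p] = False

    def on_live_update(self, prefix):
        """Called for every matching BGP packet coming from the dispatcher,
        possibly before ICE has parsed the same packet."""
        if prefix in self.seen_live:
            self.seen_live[prefix] = True

    def reconcile(self, ice_snapshot):
        """Step (3): forward only the parts of ICE's answer that are still
        current; prefixes already covered by fresher live updates are
        dropped, since their state has already reached the services."""
        fresh = {p: state for p, state in ice_snapshot.items()
                 if not self.seen_live.get(p, False)}
        self.seen_live.clear()
        return fresh
```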

7. Conclusions

In this paper we presented ICE, a multi-threaded and memory-efficient route collecting engine developed to allow the deployment of a new class of interactive route collectors. ICE exploits the high degree of repetitiveness found in BGP packets received on a per-feeder basis, saving up to 30% of the memory required to store the Adj-RIB-In of each feeder thanks to a variant of the Lempel-Ziv algorithms specifically tailored for BGP data. Thanks to the compression module, it is possible to store in memory up to 285 IPv4 full routing tables, allowing the deployment of large-scale route collecting infrastructures. ICE also enhances the concept of interactiveness of route collectors thanks to its multi-threaded environment, creating the basis for the development of innovative and responsive inter-domain routing analysis software.

To the best of our knowledge, ICE is the first multi-threaded engine which implements a variant of a classic compression technique on data stored in memory to lower the amount of resources used. ICE is already deployed and working as the core of the Isolario project [17] and is publicly available as open-source software written for GNU/Linux platforms on the Isolario website (https://www.isolario.it). ICE has been developed with its primary focus on route collecting, but it could be used as the basic building block of more complex pieces of software. For example, it could be used to create real-time looking glass software as described in Section 6, or it could be enhanced with a proper BGP decision process so as to create a multi-threaded BGP daemon to be used as a route server at large IXPs.

References

[1] K. Lougheed, J. Rekhter, RFC 1105 - A Border Gateway Protocol (BGP), 1989.
[2] M. Chiesa, L. Cittadini, L. Vanbever, S. Vissicchio, G. Di Battista, Using routers to build logic circuits: how powerful is BGP?, in: Proc. of IEEE ICNP, 2013, pp. 1–10.
[3] L. Gao, On inferring autonomous system relationships in the Internet, IEEE/ACM Trans. Netw. 9 (6) (2001) 733–745.
[4] X. Hu, Z. Morley Mao, Accurate real-time identification of IP hijacking, in: Proc. of IEEE Symposium on Security and Privacy, 2007, pp. 3–17.
[5] M. Lad, D. Massey, D. Pei, Y. Wu, B. Zhang, L. Zhang, PHAS: a prefix hijack alert system, in: Proc. of USENIX Security Symposium, vol. 15, 2006.
[6] X. Shi, Y. Xiang, Z. Wang, X. Yin, J. Wu, Detecting prefix hijackings in the Internet with Argus, in: Proc. of ACM SIGCOMM IMC, 2012, pp. 15–28.
[7] S.T. Teoh, K. Zhang, S.-M. Tseng, K.-L. Ma, S.F. Wu, Combining visual and automated data mining for near-real-time anomaly detection and analysis in BGP, in: Proc. of ACM VizSEC/DMSEC, 2004, pp. 35–44.
[8] S. Deshpande, M. Thottan, T. Ho, B. Sikdar, A statistical approach to anomaly detection in interdomain routing, in: Proc. of BROADNETS, 2006, pp. 1–10.
[9] T. Wong, V. Jacobson, C. Alaettinoglu, Internet routing anomaly detection and visualization, in: Proc. of DSN, 2005, pp. 172–181.
[10] K. Zhang, A. Yen, X. Zhao, D. Massey, S. Felix Wu, L. Zhang, On detection of anomalous routing dynamics in BGP, in: Proc. of IFIP-TC6 Networking, 2004, pp. 259–270.
[11] Y. Rekhter, T. Li, S. Hares, RFC 4271 - A Border Gateway Protocol 4 (BGP-4), 2006.
[12] W.A. Miltenburg, Research on RIS route collectors, (https://labs.ripe.net/Members/wouter_miltenburg/Researchpaper.pdf).
[13] P.C. Cheng, X. Zhao, B. Zhang, L. Zhang, Longitudinal study of BGP monitor session failures, ACM SIGCOMM Comput. Commun. Rev. 40 (2) (2010) 34–42.
[14] Quagga routing suite, (http://www.nongnu.org/quagga/).


[15] The BIRD routing daemon project, (http://bird.network.cz/).
[16] L. Blunk, M. Karir, C. Labovitz, RFC 6396 - Multi-threaded Routing Toolkit (MRT) routing information export format, 2011.
[17] IIT-CNR Isolario project, last accessed 2016-11-21, (http://www.isolario.it).
[18] T. McGregor, H.-W. Braun, J. Brown, The NLANR network analysis infrastructure, IEEE Commun. Mag. 38 (5) (2000) 122–128.
[19] University of Oregon Route Views project, last accessed 2016-11-21, (http://www.routeviews.org).
[20] RIPE NCC Routing Information Service, last accessed 2016-11-21, (http://www.ripe.net/data-tools/stats/ris/routing-information-service).
[21] S. Vissicchio, L. Vergantini, L. Cittadini, V. Mezzapesa, M. Pizzonia, M.L. Papagni, Beyond the best: real-time non-invasive collection of BGP messages, in: Proc. of INM/WREN, 2010, pp. 9–15.
[22] OpenBGPD, last accessed 2016-11-21, (http://www.openbgpd.org).
[23] XORP, last accessed 2016-11-21, (http://www.xorp.org).
[24] F. Raspall, Building Nemo, a system to monitor IP routing and traffic paths in real time, Comput. Networks 97 (2016) 1–30.
[25] T. Bates, R. Chandra, D. Katz, Y. Rekhter, RFC 4724 - Graceful restart mechanism for BGP, 2007.
[26] J. Li, M. Guidero, Z. Wu, E. Purpus, T. Ehrenkranz, BGP routing dynamics revisited, ACM SIGCOMM Comput. Commun. Rev. 37 (2) (2007) 5–16.
[27] E. Gregori, A. Improta, L. Lenzini, L. Rossi, L. Sani, A novel methodology to address the Internet AS-level data incompleteness, IEEE/ACM Trans. Netw. 23 (4) (2015) 1314–1327.
[28] R. Oliveira, D. Pei, W. Willinger, B. Zhang, L. Zhang, The (in)completeness of the observed Internet AS-level structure, IEEE/ACM Trans. Netw. 18 (1) (2010) 109–122.
[29] P. Traina, R. Chandrasekeran, RFC 1997 - BGP communities attribute, 1996.
[30] Q. Vohra, E. Chen, RFC 6793 - BGP support for four-octet Autonomous System (AS) number space, 2012.
[31] E. Gregori, A. Improta, L. Lenzini, L. Rossi, L. Sani, Discovering the geographic properties of the Internet AS-level topology, Netw. Sci. 3 (1–4) (2013) 34–42.
[32] W.B. Norton, The Internet Peering Playbook: Connecting to the Core of the Internet, DrPeering Press, Palo Alto, CA, 2011.
[33] D.A. Huffman, A method for the construction of minimum-redundancy codes, Proc. Inst. Radio Eng. 40 (9) (1952) 1098–1101.
[34] J. Ziv, A. Lempel, A universal algorithm for sequential data compression, IEEE Trans. Inf. Theory 23 (3) (1977) 337–343.
[35] J. Ziv, A. Lempel, Compression of individual sequences via variable-rate coding, IEEE Trans. Inf. Theory 24 (5) (1978) 530–536.
[36] T.A. Welch, A technique for high-performance data compression, IEEE Computer 17 (6) (1984) 8–19.
[37] LZ4, last accessed 2016-11-21, (http://cyan4973.github.io/lz4).
[38] Snappy, last accessed 2016-11-21, (http://google.github.io/snappy).
[39] Z. Ben Houidi, M. Meulle, R. Teixeira, Understanding slow BGP routing table transfers, in: Proc. of ACM SIGCOMM IMC, 2009, pp. 350–355.
[40] E. Gregori, A. Improta, L. Sani, Isolario: a do-ut-des approach to improve the appeal of BGP route collecting.


Enrico Gregori received the Laurea in electronic engineering from the University of Pisa in 1980. He joined CNUCE, an institute of the Italian National Research Council (CNR), in 1981. He is currently a CNR research director. In 1986 he held a visiting position in the IBM research center in Zurich, working on network software engineering and on heterogeneous networking. He has contributed to several national and international projects on computer networking. He has authored a large number of papers in the area of computer networks and has published in international journals and conference proceedings. His current research interests include: Internet topology, wireless access to the Internet, wireless LANs, evolution of TCP/IP protocols and ad hoc networks. He has been on the editorial board of several international journals: the Cluster Computing Journal, Computer Networks, ACM Wireless Networks. He has been the general chairman of international conferences: Networking 2002, IEEE PERCOM 2006. Enrico Gregori has published more than 140 papers in journals and in proceedings of international conferences.

Barbara Guidi is currently a postdoctoral researcher at the Department of Computer Science of the University of Pisa. She received her Bachelor degree in February 2007 and the M.Sc. degree in October 2011. She received her Ph.D. degree in 2015 from the Department of Computer Science of the University of Pisa. From the beginning of her Ph.D., she has been working on novel approaches to manage the problem of data availability in Distributed Online Social Networks. Her research interests also include distributed systems, peer-to-peer networks, and Internet AS-level measurement and analysis. In 2014, during her Ph.D., she was a visitor at the Heinrich Heine University of Dusseldorf.

Alessandro Improta received his B.Sc. and M.Sc. in Computer Engineering from the University of Pisa, Italy, in 2006 and 2009 respectively, and his Ph.D. degree in Information Engineering from the University of Pisa in 2013. Since 2009 he has held a research position with the Institute of Informatics and Telematics (IIT) at the Italian National Research Council (CNR) in Pisa. In 2010 he was a visitor at the AT&T Research Labs in Florham Park, U.S., working on the discovery of Internet link characteristics. His research interests include the Internet AS-level ecosystem analysis and the discovery of Internet path characteristics.

Luca Sani received his B.Sc. and M.Sc. in Computer Engineering from the University of Pisa, respectively in 2008 and 2010. In 2014 he received his Ph.D. in Computer Science and Engineering from IMT School for Advanced Studies Lucca. In 2013 he was a visiting researcher at the Computer Science Department of the Colorado State University, working on a BGP monitoring project. Since 2014 he has been a researcher with the Institute of Informatics and Telematics (IIT) at the Italian National Research Council (CNR) in Pisa. His research interests are in Internet mapping, monitoring and analysis.