Post on 31-Aug-2019

Inferring Geography from BGP raw data

Luca Sani

Enrico Gregori, Alessandro Improta, Luciano Lenzini, Lorenzo Rossi

Luca Sani Inferring Geography from BGP raw data 1 / 18

The Internet

The Internet: a huge set of interconnected Autonomous Systems (ASes)

ASes are owned by organizations with different geographic distribution and economic purposes

Motivation

Internet AS-level topology inferred from BGP data:

1 node = 1 AS

1 edge = 1 or more BGP connections between two ASes

This global view hides the heterogeneity of the Internet

Goals

1 Infer regional AS-level topologies from BGP data

2 Analyze graph and economic properties at continental granularity:

Africa

Asia Pacific (Asia and Oceania)

Europe

Latin America (the Caribbean, Central America, Mexico and South America)

North America (Bermuda, Canada, Greenland, Saint Pierre and Miquelon, USA)

What is “BGP data”?

We use BGP data provided by the University of Oregon RouteViews and the RIPE RIS projects (October 2011)

These projects deployed route collectors around the world

Route collectors gather routes from cooperating ASes (feeders)

What is “BGP data”? (cont.)

Each route contains three pieces of information relevant to our work:

Set of AS paths ⇒ Global Topology

39,974 ASes

139,944 Connections
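As a minimal sketch of how a set of AS paths yields an AS-level topology (1 node = 1 AS, 1 edge = adjacent ASes in some path), the snippet below builds an undirected graph from a few paths. The function name `build_topology`, the sample paths and the AS numbers (taken from the range reserved for documentation) are illustrative assumptions, not the authors' tooling or data.

```python
from collections import defaultdict


def build_topology(as_paths):
    """Return an undirected AS-level graph as {asn: set(neighbor_asns)}."""
    graph = defaultdict(set)
    for path in as_paths:
        # Collapse AS-path prepending (consecutive repeats of the same AS).
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        for asn in hops:
            graph[asn]  # touching the key makes every AS a node
        for a, b in zip(hops, hops[1:]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical AS paths, not real BGP data.
paths = [
    [64500, 64501, 64502],
    [64500, 64501, 64501, 64503],  # 64501 is prepended once
    [64504, 64502],
]
topology = build_topology(paths)
num_ases = len(topology)
num_connections = sum(len(neigh) for neigh in topology.values()) // 2
print(num_ases, num_connections)
```

Each connection is counted once regardless of how many paths traverse it, which matches the "1 edge = 1 or more BGP connections" convention of the slides.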

First Step - AS geolocation

“An AS is a connected group of one or more IP prefixes run by one or more IP network operators which has a single and clearly defined routing policy” (RFC 1930)

For each AS

1 We collect its IP prefixes from BGP data

2 We geolocate it by geolocating its prefixes (MaxMind GeoLite database)

96% of the 39,974 ASes are located in only one region

88% of the 139,944 connections involve at least one AS located in only one region
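The two steps above can be sketched as follows, assuming the geolocation database has already been reduced to a prefix-to-regions lookup. The names `as_regions` and `regions_of_prefix`, the prefixes and the AS numbers are hypothetical; the real lookup would come from the geolocation database.

```python
def as_regions(prefixes_by_as, regions_of_prefix):
    """Map each AS to the set of regions in which its prefixes are geolocated."""
    return {
        asn: set().union(*(regions_of_prefix.get(p, set()) for p in prefixes))
        for asn, prefixes in prefixes_by_as.items()
    }


# Hypothetical input: prefixes announced by each AS (step 1) ...
prefixes_by_as = {
    64500: ["192.0.2.0/24", "198.51.100.0/24"],
    64501: ["203.0.113.0/24"],
    64502: [],  # no prefix seen: this AS cannot be geolocated
}
# ... and a stand-in for the geolocation database (step 2).
regions_of_prefix = {
    "192.0.2.0/24": {"Europe"},
    "198.51.100.0/24": {"North America"},
    "203.0.113.0/24": {"North America"},
}

regions = as_regions(prefixes_by_as, regions_of_prefix)
single_region_ases = {asn for asn, r in regions.items() if len(r) == 1}
print(regions, single_region_ases)
```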

Second Step - Single Region ASes

The connection A-B is geolocated in North America

BGP requires the interfaces of the connection to share the same IP subnet (exception: BGP multihop)

The subnet S belongs either to AS A or to AS B

Second Step - Single Region ASes (cont.)

A does not own any IP address outside North America!

Second Step - Single Region ASes (cont.)

In any case, IP A must be in North America

⇒ The connection is geolocated in the single common region
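A minimal sketch of this rule, assuming each AS is described by the set of regions in which it owns addresses: when at least one endpoint is a single-region AS and the two ASes share that region, the connection is placed there. The function name `geolocate_connection` is hypothetical; all other cases are left to the later steps and return `None` here.

```python
def geolocate_connection(regions_a, regions_b):
    """Return the region of connection A-B, or None if this rule does not apply."""
    common = regions_a & regions_b
    one_is_single_region = len(regions_a) == 1 or len(regions_b) == 1
    if one_is_single_region and len(common) == 1:
        return next(iter(common))
    return None  # no shared region (e.g. multihop) or both ASes span several regions


# A is located only in North America, B spans two regions.
print(geolocate_connection({"North America"}, {"North America", "Europe"}))
```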

Second Step - Other cases

We exploit single-region ASes or the SOURCE and DESTINATION regions

We geolocate the connection in North America (regional principle)

Regional Topologies: EU and NA case

              Europe         North America    World
ASes          17,101         15,894           39,974
Connections   72,581         42,610           139,944
Avg. Degree   8.49           5.36             6.97
Max Degree    1,818 (RETN)   2,542 (Level3)   3,418 (Cogent)

[Figure: P(X > x) with x = k (node degree), log-log axes; curves for Europe, North America and World]
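The plot on this slide shows P(X > x) for the node degree k, i.e. the complementary cumulative distribution of the degrees. A minimal sketch of how such a curve can be computed from a list of degrees is given below; the function name `degree_ccdf` and the sample degrees are illustrative assumptions, not the data behind the figure.

```python
from collections import Counter


def degree_ccdf(degrees):
    """Return sorted (k, P(X > k)) pairs for the given list of node degrees."""
    total = len(degrees)
    counts = Counter(degrees)
    remaining = total
    ccdf = []
    for k in sorted(counts):
        remaining -= counts[k]  # nodes whose degree is strictly greater than k
        ccdf.append((k, remaining / total))
    return ccdf


degrees = [1, 1, 1, 2, 2, 3, 5, 8]  # hypothetical degrees of eight ASes
points = degree_ccdf(degrees)
print(points)
```

On log-log axes the last point (probability 0 at the maximum degree) cannot be drawn, so it is normally dropped before plotting.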

Regional Topologies: EU and NA case

[Figure: P(X > x) with x = kNN/max(k), linear y axis and logarithmic x axis; curves for Europe, North America and World]
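The slide does not define kNN. Assuming it denotes the average degree of a node's neighbors, the quantity on the x axis (kNN normalized by the maximum degree of the topology) could be computed as in the sketch below. The function name `normalized_knn` and the toy graph are hypothetical.

```python
def normalized_knn(graph):
    """Average neighbor degree of each node, divided by the maximum degree.

    graph: {node: set(neighbors)}, undirected. Assumes kNN means the
    average degree of a node's neighbors (an interpretation, not stated
    in the slides).
    """
    degree = {node: len(neigh) for node, neigh in graph.items()}
    max_degree = max(degree.values())
    return {
        node: sum(degree[n] for n in neigh) / len(neigh) / max_degree
        for node, neigh in graph.items()
        if neigh  # isolated nodes have no neighbors to average over
    }


graph = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"A", "C"},
}
knn = normalized_knn(graph)
print(knn)
```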

Economic Analysis

In order to get better insights we investigate the economic nature of the connections

Classic Economic Tags

Provider-to-Customer (P2C), Peer-to-Peer (P2P), Sibling-to-Sibling (S2S)

We adapted an economic tagging algorithm* to deal with geographic information

*Enrico Gregori, Alessandro Improta, Luciano Lenzini, Lorenzo Rossi, Luca Sani: BGP and inter-AS Economic Relationships, IFIP Networking ’11

Economic Analysis - Results

       Europe   North America   World
P2C    32,471   31,820          80,095
P2P    39,813   10,230          58,040
S2S    297      560             1,743

P2C = Provider-to-Customer, P2P = Peer-to-Peer, S2S = Sibling-to-Sibling

Europe vs North America case

They have a similar number of P2C connections

Europe has many more P2P connections

IXPs play a fundamental role in this difference
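To make the comparison concrete, the share of each tag can be computed from the counts reported on this slide. The snippet is only a worked calculation; the helper name `tag_shares` is an assumption introduced here.

```python
# Tag counts as reported on this slide.
tags = {
    "Europe": {"P2C": 32471, "P2P": 39813, "S2S": 297},
    "North America": {"P2C": 31820, "P2P": 10230, "S2S": 560},
}


def tag_shares(counts):
    """Fraction of tagged connections carrying each tag."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


shares = {region: tag_shares(counts) for region, counts in tags.items()}
for region, s in shares.items():
    print(f"{region}: P2P share = {s['P2P']:.1%}")
# Europe: P2P share = 54.9%
# North America: P2P share = 24.0%
```

With these figures, more than half of the tagged European connections are P2P, against roughly a quarter in North America.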

Conclusion

We developed a methodology to infer continental AS-level topologies

We analyzed their graph and economic properties

We highlighted structural differences otherwise hidden in the global topology

Future

Fine-grained analysis (requires a high-precision geolocation tool)

Sensitivity of the results with respect to geolocation databases

Influence of current BGP feeders distribution on the results

The End

Thank you for your attention!

Questions

luca.sani@imtlucca.it

Backup

Backup Slides

Active Measurement

Active measurements to resolve the geolocation of particular connections

Step 2 - FROM & NEXT HOP

The FROM field identifies the BGP neighbor

The NEXT HOP field identifies the IP address of the neighbor

Economic Analysis - Tag Changes

Tag changes from the worldwide to the regional scenarios

                     Africa   Asia Pacific   Europe   Latin America   North America
Peering to transit   12       86             325      36              219
Transit to peering   165      824            2,304    361             1,136

Geolocation issues

a) 145 ASes are not geolocated at all

b) pairs of ASes that do not share any region (partial geolocation or multihop)

These do not appear in any regional topology:

6,141 out of 139,944 connections

199 out of 39,974 ASes ⇐ 145 because of a), 44 because of b)

Tagging algorithm

step A: Inference of all the possible economic relationships for each direct AS connection

direct means that (A,B) ≠ (B,A)

It is based on the approach proposed by Oliveira et al. in [2]

The list of Tier-1 ASes provided by Wikipedia has been exploited

For each tag, the lifespan of the AS path used is maintained

At the end of this step we have multiple (tag, lifespan) pairs for each connection

Tagging algorithm

step B: Inference of a single economic relationship for each direct AS connection

All (tag, lifespan) pairs related to the same direct connection have to be merged:

Find the max lifespan among the pairs

Merge only those pairs whose lifespan is comparable with the max, i.e. those that do not differ by more than N orders of magnitude from the max

Record the largest lifespan as the lifespan of the resulting tag

Merge table for two tags of the same direct connection [A, B]:

       p2c   p2p   c2p   s2s
p2c    p2c   p2c   s2s   s2s
p2p    p2c   p2p   c2p   s2s
c2p    s2s   c2p   c2p   s2s
s2s    s2s   s2s   s2s   s2s
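Step B can be sketched as follows: keep only the pairs whose lifespan is within N orders of magnitude of the maximum, fold their tags through the merge table, and record the maximum lifespan. The value of N is not given in the slides, so `n_orders=1` is an assumption, as are the function name `merge_pairs`, the lifespan unit and the sample pairs.

```python
import math
from functools import reduce

TAGS = ["p2c", "p2p", "c2p", "s2s"]
# Rows of the merge table on this slide, in the column order of TAGS.
ROWS = {
    "p2c": ["p2c", "p2c", "s2s", "s2s"],
    "p2p": ["p2c", "p2p", "c2p", "s2s"],
    "c2p": ["s2s", "c2p", "c2p", "s2s"],
    "s2s": ["s2s", "s2s", "s2s", "s2s"],
}
MERGE = {(r, c): ROWS[r][i] for r in TAGS for i, c in enumerate(TAGS)}


def merge_pairs(pairs, n_orders=1):
    """Merge (tag, lifespan) pairs of one direct connection into a single pair.

    pairs: non-empty list of (tag, lifespan) with lifespan > 0.
    n_orders: the N of the slide; its default value here is an assumption.
    """
    max_life = max(life for _, life in pairs)
    comparable = [
        tag for tag, life in pairs
        if math.log10(max_life / life) <= n_orders
    ]
    merged_tag = reduce(lambda a, b: MERGE[(a, b)], comparable)
    return merged_tag, max_life


# Hypothetical lifespans (e.g. in seconds); the short-lived c2p is ignored.
print(merge_pairs([("p2c", 2_000_000), ("p2p", 900_000), ("c2p", 50)]))
# Comparable p2c and c2p observations merge into s2s.
print(merge_pairs([("p2c", 1000), ("c2p", 800)]))
```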

Tagging algorithm

step C: Final tagging and two-way validation

In order to obtain the economic relationship existing between AS A and AS B, the tags inferred for the (A,B) and (B,A) connections have to be merged

The approach used is the same as in Step B, taking into account the different direction of the connections, e.g. (A,B) = p2c and (B,A) = c2p have the same meaning

The merge is still based on lifespan, thus if the lifespans are not comparable, only the longer-lasting tag affects the final tag

If there is a tag for both (A,B) and (B,A) and their lifespans are comparable, then the tag is said to be two-way validated
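A minimal sketch of Step C under the same assumptions as Step B (N = 1 order of magnitude is a guess, and the names `final_tag`, `merge_tags` and `FLIP` are hypothetical): the (B,A) tag is first rewritten from A's point of view, then merged with the (A,B) tag if the lifespans are comparable. The merge is written as a rule that gives the same results as the Step B table.

```python
import math

# Rewriting a tag of (B,A) from A's point of view.
FLIP = {"p2c": "c2p", "c2p": "p2c", "p2p": "p2p", "s2s": "s2s"}


def merge_tags(a, b):
    """Same outcome as the Step B merge table, expressed as a rule."""
    if a == b:
        return a
    if "s2s" in (a, b) or {a, b} == {"p2c", "c2p"}:
        return "s2s"
    return a if b == "p2p" else b  # p2p gives way to p2c / c2p


def final_tag(ab, ba, n_orders=1):
    """ab, ba: (tag, lifespan) for (A,B) and (B,A); at most one may be None.

    Returns (tag seen from A, two_way_validated). n_orders is an assumed N.
    """
    if ba is not None:
        ba = (FLIP[ba[0]], ba[1])
    if ab is None or ba is None:
        present = ab if ab is not None else ba
        return present[0], False
    (tag_ab, life_ab), (tag_ba, life_ba) = ab, ba
    if abs(math.log10(life_ab / life_ba)) <= n_orders:
        return merge_tags(tag_ab, tag_ba), True
    # Lifespans not comparable: only the longer-lasting tag counts.
    return (tag_ab if life_ab > life_ba else tag_ba), False


print(final_tag(("p2c", 1000), ("c2p", 900)))    # both directions agree
print(final_tag(("p2p", 10), ("c2p", 100_000)))  # short-lived p2p is ignored
```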

Step 1 - Inferring Enhanced Routes from BGP data

IP Geolocation Database: MaxMind GeoLiteCity

The geolocation of the feeder's IP address is trivial

The geolocation of a /X prefix requires geolocating 2^(32−X) IP addresses
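The address count of an IPv4 prefix can be checked with Python's standard `ipaddress` module; the prefixes below are arbitrary examples and the helper name `addresses_in_prefix` is introduced only for this sketch.

```python
import ipaddress


def addresses_in_prefix(prefix):
    """Number of IPv4 addresses covered by a prefix written as 'a.b.c.d/X'."""
    return ipaddress.ip_network(prefix).num_addresses


for prefix in ("198.51.100.0/30", "192.0.2.0/24", "10.0.0.0/8"):
    x = ipaddress.ip_network(prefix).prefixlen
    # The count reported by the library matches 2^(32 - X).
    assert addresses_in_prefix(prefix) == 2 ** (32 - x)
    print(prefix, addresses_in_prefix(prefix))
```

A single /8 already covers 16,777,216 addresses, which shows why address-by-address geolocation of large prefixes is costly.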

What about AS geolocation?

Step 2 - Detection of SRLTPs inside enhanced Routes

SRLTP = Single Region Located Transit Point

In each enhanced route we find the regions through which the traffic has to transit:

1 SOURCE REGION, DEST REGION and single-region ASes

2 ASes with only one region in common with their neighbors
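The two rules can be sketched as below. The slides do not specify the layout of an enhanced route, so it is assumed here to be a source region, a list of (AS, regions) hops and a destination region. Rule 2 is implemented under one possible reading (exactly one region shared with an adjacent AS in the path). The function name `find_srltps` and the sample route are hypothetical.

```python
def find_srltps(source_region, hops, dest_region):
    """List (label, region) transit points of an enhanced route, in path order.

    hops: list of (asn, set_of_regions) along the AS path (assumed layout).
    """
    points = [("SOURCE", source_region)]
    for i, (asn, regions) in enumerate(hops):
        if len(regions) == 1:
            # Rule 1: a single-region AS pins the traffic to its region.
            points.append((asn, next(iter(regions))))
            continue
        # Rule 2 (one reading): exactly one region shared with an adjacent AS.
        for j in (i - 1, i + 1):
            if 0 <= j < len(hops):
                common = regions & hops[j][1]
                if len(common) == 1:
                    point = (asn, next(iter(common)))
                    if point not in points:
                        points.append(point)
    points.append(("DEST", dest_region))
    return points


# Hypothetical enhanced route with documentation-range AS numbers.
route = [
    (64500, {"Europe"}),
    (64501, {"Europe", "North America"}),
    (64502, {"North America"}),
]
srltps = find_srltps("Europe", route, "North America")
print(srltps)
```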

Step 2 - Detection of SRLTP - Examples

Step 3 - Inferring Geographic AS paths

For each enhanced route we analyze each SRLTP

Given the region of an SRLTP we try to expand the set of connections in that region

Regional Topologies - Results

                       Africa      Asia Pacific   Europe      Latin America   North America   World
ASes                   815         6,427          17,101      2,453           15,894          39,974
Connections            2,002       18,040         72,581      8,329           42,610          139,944
Avg. Overlap (Conns)   0.03±0.01   0.05±0.02      0.03±0.02   0.03±0.01       0.05±0.02       -
Avg. Degree            4.90        5.61           8.49        6.79            5.36            6.68

Economic Analysis - Results

       Africa   Asia Pacific   Europe   Latin America   North America   World
P2C    1,456    12,808         32,471   4,514           31,820          80,095
P2P    492      5,012          39,747   3,719           10,164          58,040
S2S    21       102            297      37              350             1,743

P2C = Provider-to-Customer, P2P = Peer-to-Peer, S2S = Sibling-to-Sibling

Economic Analysis

In order to get better insights we investigate the economic nature of the connections

          Original*                         Adapted
Input:    AS paths + Lifespans              Geographic AS paths + Lifespans
Output:   Global Economic Tagged Topology   Regional Economic Tagged Topologies

Classic Economic Tags

Provider-to-Customer (P2C), Peer-to-Peer (P2P), Sibling-to-Sibling (S2S)

*Enrico Gregori, Alessandro Improta, Luciano Lenzini, Lorenzo Rossi, Luca Sani: BGP and inter-AS Economic Relationships, IFIP Networking ’11

Economic Relationships

provider-to-customer: the customer pays the provider to reach all the ASes that it cannot reach in other ways

peer-to-peer: the two ASes exploit each other to reach their customer cones (typically free of charge)

sibling-to-sibling: each AS acts as a provider for the other
