Internet Tomography and Geography What is this area all about? Related work in the area Main Paper...

Post on 13-Jan-2016

Internet Tomography and Geography

What is this area all about?
Related work in the area
Main paper
  WEBMAPPER’s features
  How WEBMAPPER works
  WEBMAPPER results summarized

Internet Tomography / Geography

Where, geographically, are my clients (or servers) located?
What’s the best server for me to get content from?
Given two IPs, what’s the latency between them?
Which IPs all come from the same geographic location?
Generally, show me a complete and accurate map of the Internet

Related Work

B. Krishnamurthy and J. Wang, “On network-aware clustering of web clients,” in Proceedings of SIGCOMM ’00, August 2000

Problem: which IPs belong to the same cluster?
Does BGP-table-based clustering
Groups IPs by common administrative control
Passive

S. Jamin, C. Jin, Y. Jin, D. Raz, Y. Shavitt, and L. Zhang, “On the placement of Internet instrumentation,” in Proceedings of INFOCOM ’00, March 2000, pp. 295–304.

Problem: what’s the latency between 2 IPs?
Clusters the Internet based on BGP address prefixes
Places “Tracers” everywhere in the Internet that actively ping each other

Related work

An Investigation of Geographic Mapping Techniques for Internet Hosts
Padmanabhan, Subramanian, SIGCOMM 2001
Problem: geo-locate a host from its IP
3 solutions
  GeoTrack: infer location by DNS / traceroute
  GeoPing: probe target from known locations, triangulate [active]
  GeoCluster: do passive BGP, IP-prefix-based clustering

Related work

Predicting Internet Network Distance with Coordinate-Based Approaches
Ng, Zhang, CMU, INFOCOM 2002
Problem: given 2 IPs, what’s the latency?
Active network of landmarks, pinging each other
Better math model (GNP) for interpolating distance than IDMaps

Proprietary Solutions (Akamai, etc.)
Problem: what’s the closest server to an IP?

Clustering and Server Selection Using Passive Monitoring

M. Andrews, B. Shepherd, A. Srinivasan, P. Winkler, F. Zane (Bell Labs), INFOCOM 2002

Problem: Server Selection

Given a client and a set of servers, all with identical content, tell me the “best” server
For the client, “best” means lowest latency or highest throughput
For the whole system, it may mean something else

What is Passive Monitoring?

Content servers don’t ping each other
Content servers don’t ping clients
No pinging is done: no additional network traffic is introduced
Instead, servers record the round-trip time of the TCP handshake from a client
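As a rough illustration of the last point, here is a minimal sketch of deriving a handshake RTT on the server side. It assumes the server can timestamp the moment it sends the SYN-ACK and the moment the client's final ACK arrives; the `Handshake` record, its field names, and the sample values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Handshake:
    """One observed TCP handshake (hypothetical record layout)."""
    client_ip: str
    synack_sent: float   # seconds: when the server sent its SYN-ACK
    ack_received: float  # seconds: when the client's final ACK arrived


def handshake_rtt_ms(h: Handshake) -> float:
    """RTT as seen passively by the server, in milliseconds."""
    return (h.ack_received - h.synack_sent) * 1000.0


# Made-up observations, only to show the shape of the data.
samples = [
    Handshake("25.135.70.9", 10.000, 10.042),
    Handshake("132.4.5.6", 11.500, 11.518),
]

# (client, latency) pairs of the kind a server could report onward.
latency_pairs = [(h.client_ip, round(handshake_rtt_ms(h), 3)) for h in samples]
```

No extra packets are sent here: the measurement falls out of a connection the client was opening anyway.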

WEBMAPPER

WEBMAPPER: Clusterer

Clusterer uses the client-server latency pairs reported by the servers
Determines which address prefixes correspond to the same network location, and which don’t
Does more than just find the closest server to each cluster; assigns probabilities to each to balance the network flow

WEBMAPPER Output

Client cluster    Content server 1    Content server 2    Content server 3
25.135.64.0/19    0.9                 0.05                0.05
132.0.0.0/8       0.85                0                   0.15
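One way such an output table might be consumed is sketched below: find the most specific cluster containing the client, then pick a server at random according to that row's probabilities. The two rows are the ones from the slide; the server names, the `pick_server` helper, and the longest-prefix tie-breaking are assumptions made for illustration, not WEBMAPPER's actual interface.

```python
import ipaddress
import random

# Rows from the slide: cluster prefix -> probability per content server.
TABLE = {
    ipaddress.ip_network("25.135.64.0/19"): [0.9, 0.05, 0.05],
    ipaddress.ip_network("132.0.0.0/8"): [0.85, 0.0, 0.15],
}
SERVERS = ["server1", "server2", "server3"]  # hypothetical names


def pick_server(client_ip, rng=random):
    """Return a server for client_ip, or None if no cluster matches."""
    addr = ipaddress.ip_address(client_ip)
    matches = [net for net in TABLE if addr in net]
    if not matches:
        return None
    # Assumption: the most specific (longest) matching prefix wins.
    best = max(matches, key=lambda net: net.prefixlen)
    # Weighted random choice spreads a cluster's clients across servers.
    return rng.choices(SERVERS, weights=TABLE[best], k=1)[0]
```

For example, `pick_server("25.135.70.1")` would return `server1` about 90% of the time, which is how a probabilistic table can balance load instead of sending every client in a cluster to the single closest server.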

WEBMAPPER: Big Tree

Giant binary tree of all IP addresses
Well, not all… assume the last 8 bits are always clustered
Root of tree: 0.0.0.0/0
Children: 0.0.0.0/1, 128.0.0.0/1
Leaves: e.g. 123.123.123.123/24
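The tree shape described above can be sketched by listing the prefixes on the path from the root to the /24 leaf that covers an address. The `prefix_path` function is a hypothetical helper written for this illustration; it prints the leaf in canonical form with the host bits zeroed.

```python
import ipaddress


def prefix_path(ip, leaf_len=24):
    """Prefixes from the root (/0) down to the /leaf_len leaf covering ip."""
    addr = int(ipaddress.IPv4Address(ip))
    path = []
    for length in range(leaf_len + 1):
        # Keep the top `length` bits and zero the rest.
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        path.append(f"{ipaddress.IPv4Address(addr & mask)}/{length}")
    return path


path = prefix_path("123.123.123.123")
# path[0] is the root, path[1] one of its two children, path[-1] the leaf.
```

Each step down the path fixes one more leading bit, so every node has exactly two children and the leaves sit 24 levels below the root.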

WEBMAPPER: Big Tree

Leaves also store
  Sum of recorded distances (per server)
  Squared sum of recorded distances
  Number of recorded distances
Leaf data is periodically aged exponentially (I = I * 0.9)
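A minimal sketch of such a leaf record follows, for one (leaf, server) pair. It assumes that "squared sum" means the sum of squared distances, which is what a variance calculation needs, and that aging multiplies all three quantities by the same factor; the `LeafStats` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LeafStats:
    """Running statistics for one leaf and one server (hypothetical layout)."""
    total: float = 0.0     # sum of recorded distances
    total_sq: float = 0.0  # sum of squared distances (assumed meaning)
    count: float = 0.0     # number of samples; fractional after aging

    def record(self, distance):
        self.total += distance
        self.total_sq += distance * distance
        self.count += 1

    def age(self, factor=0.9):
        # Exponential aging: I = I * 0.9 applied to every stored quantity.
        self.total *= factor
        self.total_sq *= factor
        self.count *= factor

    def mean(self):
        return self.total / self.count

    def variance(self):
        # Sample variance from the running sums; 0 if too few samples.
        if self.count <= 1:
            return 0.0
        return (self.total_sq - self.total ** 2 / self.count) / (self.count - 1)
```

Because all three sums shrink together, aging leaves the mean unchanged while reducing the weight of old samples relative to new ones.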

WEBMAPPER: Small Tree

Clusters are formed from the big tree by folding children into their parents when they’re “similar” to their siblings
Uses a statistical test to determine “similar” (two-sample t-test)
Threshold based
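The similarity check can be sketched from the per-node summary statistics alone (count, sum, sum of squares). The version below uses the unequal-variance (Welch) form of the two-sample t statistic and a threshold of 2.0; both choices are assumptions for illustration, since the slides do not say which variant or threshold WEBMAPPER uses.

```python
import math


def t_statistic(n1, sum1, sq1, n2, sum2, sq2):
    """Absolute two-sample t statistic (Welch form) from summary stats.

    Each sample is described by its count, sum, and sum of squares,
    and needs at least two observations.
    """
    m1, m2 = sum1 / n1, sum2 / n2
    v1 = (sq1 - n1 * m1 * m1) / (n1 - 1)  # sample variance of sample 1
    v2 = (sq2 - n2 * m2 * m2) / (n2 - 1)  # sample variance of sample 2
    se = math.sqrt(v1 / n1 + v2 / n2)
    if se == 0:
        return 0.0 if m1 == m2 else math.inf
    return abs(m1 - m2) / se


def similar(a, b, threshold=2.0):
    """True if siblings a and b look alike enough to fold into their parent."""
    return t_statistic(*a, *b) < threshold


# Made-up sibling statistics as (count, sum, sum of squares).
left = (3, 60, 1400)      # RTTs 10, 20, 30
right = (3, 66, 1652)     # RTTs 12, 22, 32
far = (3, 330, 36500)     # RTTs 100, 110, 120
```

Here `similar(left, right)` holds, so those two children would be merged, while `similar(left, far)` does not, so they would stay separate clusters.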

Assigning Clusters to Servers

Assigning servers to clusters is complicated
1. Testing Index
  Futzes with the probability of a cluster being assigned to a server
  Based on multi-armed bandit solution

Assigning Clusters to Servers

Calculate server capacity
Each cluster assignment has a latency cost
Try to minimize the total cost without exceeding any server’s capacity
Graph theory to the rescue: min-cost flow
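To make the min-cost flow idea concrete, here is a small self-contained sketch using successive shortest augmenting paths (Bellman-Ford on the residual graph). The network layout (source to clusters, clusters to servers, servers to sink), the `min_cost_assignment` function, and the example numbers are all assumptions for illustration and are not necessarily the paper's formulation.

```python
INF = float("inf")


def min_cost_assignment(demand, capacity, latency):
    """demand: {cluster: load}; capacity: {server: max load};
    latency: {(cluster, server): cost per unit of load}.
    Returns ({(cluster, server): load}, total_cost)."""
    nodes = ["src", "sink"] + [("c", c) for c in demand] + [("s", s) for s in capacity]
    graph = {n: [] for n in nodes}  # node -> indices of outgoing edges
    edges = []                      # [to, residual capacity, cost]; i ^ 1 is the reverse

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges)); edges.append([v, cap, cost])
        graph[v].append(len(edges)); edges.append([u, 0, -cost])

    pair_edge = {}
    for c, d in demand.items():
        add_edge("src", ("c", c), d, 0)
    for (c, s), cost in latency.items():
        pair_edge[(c, s)] = len(edges)
        add_edge(("c", c), ("s", s), demand[c], cost)
    for s, cap in capacity.items():
        add_edge(("s", s), "sink", cap, 0)

    total_cost = 0
    while True:
        # Bellman-Ford: cheapest path from src to sink in the residual graph.
        dist = {n: INF for n in nodes}
        dist["src"] = 0
        prev = {}
        for _ in range(len(nodes) - 1):
            changed = False
            for u in nodes:
                if dist[u] == INF:
                    continue
                for i in graph[u]:
                    v, cap, cost = edges[i]
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v], changed = dist[u] + cost, i, True
            if not changed:
                break
        if dist["sink"] == INF:
            break  # no more load can be routed
        # Find the bottleneck along the path, then push that much flow.
        push, v = INF, "sink"
        while v != "src":
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = "sink"
        while v != "src":
            i = prev[v]
            edges[i][1] -= push
            edges[i ^ 1][1] += push
            total_cost += push * edges[i][2]
            v = edges[i ^ 1][0]

    # Flow on a cluster->server edge equals the capacity of its reverse edge.
    flows = {pair: edges[i ^ 1][1] for pair, i in pair_edge.items() if edges[i ^ 1][1] > 0}
    return flows, total_cost


# Made-up example: both clusters prefer "east", but it cannot hold both.
flows, cost = min_cost_assignment(
    demand={"A": 10, "B": 10},
    capacity={"east": 12, "west": 12},
    latency={("A", "east"): 20, ("A", "west"): 80, ("B", "east"): 30, ("B", "west"): 40},
)
```

In this example cluster A goes entirely to the east server, and cluster B is split across both, because moving B's load west costs far less extra latency than moving A's. Dividing each flow by the cluster's demand gives per-server fractions of the kind shown in the output table.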

Results

Experiment 1:
  28-day log of busy web traffic
  Recorded client IP, time, RTT
  Clustering produced 17,270 clusters
  Here are 12 example clusters for high-traffic days

Results

Experiment 2:
  Set up a west-coast and an east-coast server
  Force clients to download something from each (to get actual measurements)
  Have clients download something from the “either-or” server (WEBMAPPER powered)

Opinions

Statistics and weighting formulae are cool
What’s a good way to tell if the clustering is any good, aside from eyeing samples?
A two-server test is all we can get out of Bell Labs?