On the Internet Delay Space Dimensionality
Bruno Abrahao Robert Kleinberg On the Internet Delay Space Dimensionality
ACM/SIGCOMM Internet Measurement Conference, 2008
InetDim: Characterizing the Internet Delay Space Dimensionality
Bruno Abrahao Robert Kleinberg
Cornell University
Develop a geometric framework for exploring properties of networks
This understanding allows us to predict the consequences of the growth and evolution of the network and to guide the design of distributed systems.
InetDim Project
Concerned with properties of the Internet that contribute to the overall dimensionality of its latency space.
B. Abrahao and R. Kleinberg, On the Internet Delay Space Dimensionality, In Proc. of ACM/SIGCOMM Internet Measurement Conference (IMC 2008)
Current Investigation
Internet Delay Space
― matrix of all-pairs round-trip times between Internet hosts
Dimensionality
― We’ll consider several definitions in this talk
― A value that abstracts the notion of network complexity
Definitions
Why study the Internet delay space dimensionality?
Question
Network embedding (GNP, Vivaldi)
― Models the network as contained in a vector space
― Estimates real distances with geometric distances
Elegant, compact, relative success
BUT they suffer from
― inherent embedding distortion
― disappointing accuracy [Lua et al., IMC’05][Ledlie-Gardner-Seltzer, NSDI’07]
Coordinate-based positioning systems
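As a rough illustration of how a coordinate-based system estimates latency and where distortion comes from, the sketch below predicts RTTs as Euclidean distances between host coordinates and reports the relative error per pair. The function names, the toy RTT matrix, and the coordinates are made up for illustration; they are not taken from GNP or Vivaldi.

```python
import numpy as np

def predicted_rtt(coords):
    """Pairwise Euclidean distances between host coordinates (predicted RTTs)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def relative_errors(rtt, coords):
    """|predicted - measured| / measured for every host pair i < j."""
    pred = predicted_rtt(coords)
    i, j = np.triu_indices(len(rtt), k=1)
    return np.abs(pred[i, j] - rtt[i, j]) / rtt[i, j]

# Toy (made-up) RTTs that violate the triangle inequality: 100 > 30 + 40,
# so no placement of the three hosts in a Euclidean space fits them exactly.
rtt = np.array([[0.0, 30.0, 100.0],
                [30.0, 0.0, 40.0],
                [100.0, 40.0, 0.0]])
coords = np.array([[0.0, 0.0], [30.0, 0.0], [70.0, 0.0]])  # hypothetical coordinates
errs = relative_errors(rtt, coords)
```

With these hypothetical coordinates two pairs are predicted exactly and the third absorbs all of the distortion, which is the kind of inherent error the slide refers to.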
Meridian [Wong et al. SIGCOMM’04]
iPlane [Madhyastha et al. OSDI’06]
Relatively more accurate
BUT
― Measurement intensive
― Strong scaling assumptions
Measurement based positioning systems
Dimensionality of target space is a tunable parameter
― Critical parameter influencing accuracy, convergence, stability [Dabek et al., SIGCOMM’04] [Ledlie-Gardner-Seltzer, NSDI’07]
Question: What is the optimal value?
― too low: high distortion
― too high: inefficiency
Motivation 1: Coordinate-based positioning systems
Prior work: delay space can be embedded into a 5- to 9-dimensional Euclidean space with “reasonably” low distortion [Ng-Zhang’01; Tang-Crovella’03]
Is this value optimal?
Invariant with scaling?
― For a worst-case metric space, dimensionality increases logarithmically with the cardinality of the metric [Bourgain 75]
Motivation 1: Coordinate-based positioning systems
Scaling assumptions underlying Meridian
― Latency space is a metric of bounded doubling dimension D
Motivation 2: Measurement-based positioning systems
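For reference, the usual textbook definition of doubling dimension can be written as below. The ball notation B(x, r) and the centers x_i are introduced here for illustration and do not appear in the slides.

```latex
% A metric space (X, d) has doubling dimension at most D if every ball
% can be covered by at most 2^D balls of half its radius.
\[
  \forall x \in X,\ \forall r > 0:\quad
  B(x, r) \subseteq \bigcup_{i=1}^{m} B\!\left(x_i, \tfrac{r}{2}\right)
  \quad \text{for some } x_1, \dots, x_m \in X \text{ with } m \le 2^{D}.
\]
```

The doubling dimension is the smallest D for which this holds.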
Question 1: What properties of the Internet contribute to its dimensionality? And to what extent?
Question 2: Is the Internet delay space dimensionality homogeneous, or is it made up of many lower-dimensional pieces?
― Hierarchical embedding [Zhang ‘06]
Motivation 3: Internet properties
How can we generate realistic synthetic delay spaces?
Previous work [Zhang, IMC’06]
― Statistical properties preserved
Generative models?
Motivation 4: Synthetic delay space generation
Part I: Studies different methods for characterizing the Internet delay space geometry
Part II: Demonstrates how to use these tools for predicting dimensionality shifts caused by structural properties
Outline
Uses raw measurements collected via the King method by Meridian (5200 hosts) and P2PSim (1953 hosts)
― Filter out pairs with < 10 measurements total in both directions
― Latency = median of measurements in both directions
― Eliminate ambiguity which arises from missing values
― Approximate the largest clique with the remaining pairs
― After filtering
  ― Meridian: 2385 hosts
  ― P2PSim: 298 hosts
Datasets 1/2
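The filtering steps above can be sketched roughly as follows. The input format (a dict of directed RTT samples), the function names, and the greedy clique heuristic are assumptions made for illustration; the slides do not say how the largest clique was approximated.

```python
import numpy as np
from collections import defaultdict

MIN_SAMPLES = 10  # from the slide: drop pairs with < 10 measurements in total

def build_latency(samples):
    """samples: dict {(src, dst): [rtt, ...]} of directed raw measurements
    (hypothetical format). Returns {frozenset({a, b}): median RTT}."""
    merged = defaultdict(list)
    for (src, dst), rtts in samples.items():
        if src != dst:
            merged[frozenset((src, dst))].extend(rtts)  # pool both directions
    return {pair: float(np.median(r)) for pair, r in merged.items()
            if len(r) >= MIN_SAMPLES}

def greedy_clique(hosts, latency):
    """Greedy heuristic (an assumption, not necessarily the authors' method):
    repeatedly drop the host with the most missing pairs until every
    remaining pair has a latency value."""
    hosts = set(hosts)
    while True:
        missing = {h: sum(1 for g in hosts
                          if g != h and frozenset((h, g)) not in latency)
                   for h in sorted(hosts)}
        worst = max(missing, key=missing.get, default=None)
        if worst is None or missing[worst] == 0:
            return hosts
        hosts.remove(worst)

# Made-up measurements: the pair (a, d) has only 3 samples and is filtered out.
samples = {("a", "b"): [10.0] * 6, ("b", "a"): [12.0] * 6,
           ("a", "c"): [20.0] * 10, ("b", "c"): [30.0] * 12,
           ("a", "d"): [5.0] * 3}
latency = build_latency(samples)
clique = greedy_clique({"a", "b", "c", "d"}, latency)
```

In this toy input, host d has no usable measurements to any other host, so the heuristic removes it and the remaining three hosts form a complete latency matrix.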
Major limitation is availability of datasets
Other publicly available datasets: DIMES, PlanetLab, DS2, King (Harvard), …
― Small cliques, collected over long periods of time
Combined with geolocation obtained by querying hostip.info
Datasets 2/2
Meridian geolocation visualization
Question: How to estimate the Internet Delay Space dimensionality?
What methods were applied in prior work? What are their assumptions?
― data can be approximately embedded into a low-dimensional Euclidean space
― the distance matrix can be accurately approximated by a low-rank matrix
Part I: Dimensionality measures
Embedding Dimension
Benefit: recovers coordinates of points
Suggests a dimensionality value between 4 and 7
The dimensionality can be estimated using an embedding algorithm, such as Vivaldi
[Plot: Meridian dataset]
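One way to picture an embedding-dimension estimate is to embed the distance matrix at increasing dimensions and watch where the error stops improving. The sketch below swaps in classical (Torgerson) multidimensional scaling for Vivaldi, purely to keep the example short, and uses synthetic 3D points rather than the Meridian data; all names are illustrative.

```python
import numpy as np

def classical_mds(dist, dim):
    """Classical MDS (not Vivaldi): coordinates in `dim` dimensions."""
    n = len(dist)
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigval, eigvec = np.linalg.eigh(gram)            # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:dim]             # indices of the largest ones
    return eigvec[:, top] * np.sqrt(np.clip(eigval[top], 0.0, None))

def median_rel_error(dist, coords):
    """Median relative error of embedded distances over all pairs i < j."""
    diff = coords[:, None, :] - coords[None, :, :]
    est = np.sqrt((diff ** 2).sum(-1))
    i, j = np.triu_indices(len(dist), k=1)
    return float(np.median(np.abs(est[i, j] - dist[i, j]) / dist[i, j]))

rng = np.random.default_rng(0)
points = rng.random((60, 3))                         # synthetic "hosts" in 3D
dist = np.sqrt(((points[:, None] - points[None]) ** 2).sum(-1))
errors = {d: median_rel_error(dist, classical_mds(dist, d)) for d in range(1, 6)}
```

For this synthetic input the error drops to numerical noise at three dimensions and adding more dimensions gains nothing. Real delay matrices are not exactly Euclidean, so the curve flattens gradually instead, which is why the cut-off is harder to read.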
Embedding Dimension Shortcomings (1/2)
Embeddings using 8D and beyond are worse than 7D!
Curse of dimensionality: embedding algorithm is overwhelmed with so many degrees of freedom
[Plot: Meridian dataset]
Existing network embedding algorithms are suboptimal and slow to converge!
Finding an embedding that minimizes distortion is intractable [Matoušek-Sidiropoulos ‘08]
Fails if the measured distances reflect a metric other than Euclidean distance
Algorithm often produces lots of empty space, unnecessarily inflating the estimate of dimensionality
Embedding Dimension Shortcomings (2/2)
Approximates the distance matrix by a low-rank matrix
Rotates the axes so that data variance is better captured by the components
Principal Component Analysis
[Plot: Meridian dataset]
Assumes that the dimensions are independent
― e.g. surface of a sphere is reported as a 3D object
The dimensionality cut-off is not obvious
Principal Component Analysis
[Plot: Meridian dataset]
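The sphere example can be reproduced with a few lines of NumPy. This sketch applies PCA to synthetic point coordinates, whereas the slides apply it to the delay matrix, so it only illustrates the linearity limitation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Points on the surface of a unit sphere: intrinsically 2D, embedded in 3D.
points = rng.normal(size=(2000, 3))
points /= np.linalg.norm(points, axis=1, keepdims=True)

centered = points - points.mean(axis=0)
singular = np.linalg.svd(centered, compute_uv=False)
explained = singular ** 2 / np.sum(singular ** 2)   # variance ratio per component
```

Each of the three components carries roughly a third of the variance, so PCA reports three dimensions even though the surface is two-dimensional.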
Question: How to avoid assuming a specific host metric space or assuming linearity?
If a dataset exhibits power-law behavior in its statistical or structural properties, one can model and measure its dimensionality using Fractal Measures
Intrinsic notion of dimensionality
Power-law behavior over two decimal orders of magnitude
Intrinsic dimensionality metrics
Correlation Fractal Dimension (pair-count plot)
Fractal measures indicate dimensionality values less than 2
[Pair-count plots: Meridian and P2PSim, distance in usec; annotation: includes almost all intra-continental distances]
Correlation Dimension
# samples within distance r of x is proportional to r
Correlation Dimension
# samples within distance r of x is proportional to r^2
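The scaling behavior on these two slides is commonly formalized through the correlation integral. The symbols N (number of points), x_i (the points), and C(r) are notation introduced here, following the standard pair-count definition, not taken from the slides.

```latex
% Fraction of point pairs within distance r, and its power-law exponent.
\[
  C(r) = \frac{2}{N(N-1)}
         \left|\left\{ (i, j) : i < j,\ d(x_i, x_j) \le r \right\}\right| ,
  \qquad
  C(r) \propto r^{D_2} ,
  \qquad
  D_2 = \frac{\mathrm{d} \log C(r)}{\mathrm{d} \log r} .
\]
```

The exponent D_2 is read off as the slope of the pair-count plot on log-log axes, over the range of r where the plot is approximately a straight line.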
Sampling from a unit square
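A minimal sketch of the pair-count estimate on exactly this kind of input is shown below. The function name, the radius range, and the sample size are choices made for illustration.

```python
import numpy as np

def correlation_dimension(dist, r_min, r_max, steps=20):
    """Slope of log(pair count) versus log(r) over [r_min, r_max]."""
    i, j = np.triu_indices(len(dist), k=1)
    pair_dist = dist[i, j]
    radii = np.logspace(np.log10(r_min), np.log10(r_max), steps)
    counts = np.array([(pair_dist <= r).sum() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return float(slope)

rng = np.random.default_rng(0)
points = rng.random((1000, 2))                       # uniform samples in the unit square
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(-1))
estimate = correlation_dimension(dist, r_min=0.01, r_max=0.1)
```

The estimate comes out close to 2, slightly below it because balls near the boundary of the square are truncated. The same function works on any distance matrix, including a measured delay matrix, as long as the radius range is restricted to where the log-log plot is roughly linear.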
There is an infinite family of fractal dimensions, parameterized by q
Some practical dimensions are
― Hausdorff Dimension D_0
― Shannon’s Entropy D_1
― Correlation Dimension D_2
For manifolds, i.e., spaces locally modeled on R^d, the fractal dimension coincides with d
Fractal Dimension Family
Easy to measure
Sensitive to non-linear behavior
How much does the embedded structure deviate from the original space?
Correlation Dimension of Embedded Matrix
Question: What does the fractal behavior imply about the Internet? Is it self-similar? Recursive?
― [Li, Alderson, Willinger, Doyle, 2004]
― [Mitzenmacher, 2004]
Fractal behavior may also arise from non-recursive structures
― e.g., snowflakes, coastlines, surface of the human brain, etc.
Fractals and Internet Models
Questions
What features contribute to the delay space behavior? To what extent?
Why is it useful to study the Internet through the lens of the fractal measures?
Part II: Dimensionality Components
Power-law extending over two orders of magnitude.
Geolocation does not account for the whole delay space dimensionality, although it is a strong component
Geographic Location
Transit links between Tier-1 ASes are contained in a significant fraction of Internet routes
Decomposition into overlapping subsets, each rooted at a Tier-1 network
Dimensionality Reducing Decomposition
Notice: not a partition, due to multihomed networks
Superposition of these pieces may inflate the dimensionality
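The decomposition can be pictured with a small sketch. The input format (each host mapped to the set of Tier-1 networks its routes are rooted at) and all host and network names are hypothetical.

```python
from collections import defaultdict

def tier1_decomposition(upstreams):
    """upstreams: dict {host: set of Tier-1 networks its routes are rooted at}
    (a hypothetical input format). Returns {tier1: set of hosts}."""
    pieces = defaultdict(set)
    for host, tier1s in upstreams.items():
        for t in tier1s:
            pieces[t].add(host)
    return dict(pieces)

# Made-up example: host "h2" is multihomed, so it appears in two pieces.
upstreams = {"h1": {"T1-A"}, "h2": {"T1-A", "T1-B"}, "h3": {"T1-B"}}
pieces = tier1_decomposition(upstreams)
overlap = pieces["T1-A"] & pieces["T1-B"]
```

Because "h2" lands in both pieces, the result is a cover of the host set and not a partition, matching the note about multihomed networks.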
Approximation to ground truth decomposition using a snapshot of the inferred topology [Oliveira et al., SIGMETRICS’08]
Dimensionality Reducing Decomposition
We measured each piece separately using
― Embedding dimension
― PCA
― Correlation dimension
― Hausdorff dimension (see paper)
Dimensionality Reducing Decomposition
Dimensionality Reducing Decomposition Results
Embedding dimension and PCA were insensitive to decomposition
Dimensionality Reducing Decomposition Results
Fractal measures capture the structural change and report a dimensionality reduction
The power law is preserved over the same range for the subsets, but with different exponents
• Is the reduction in dimensionality due to other side effects of the decomposition?
― Pieces of smaller diameter?
― Decompose the network according to geographical and latency-based clustering
Sanity Check 1
• Causes the set to deviate from power-law behavior
Distance-based Clustering
• Not comparable to topology-based decomposition
• Is the reduction in dimensionality due to other side effects of the decomposition?
― Pieces of smaller cardinality [Bourgain ’75]?
― Generated 139 random subnetworks with smaller cardinality (i.e., the median of the Tier-1 subnetwork sizes)
Sanity Check 2
Random subsets with lower cardinality
• Random subsets of lower cardinality do not explain the dimensionality reduction
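The logic of this sanity check can be sketched on synthetic data: if shrinking the cardinality alone lowered the estimate, random subsets of a set with known dimension would show it. The example uses uniform points in the unit square and 20 subsets instead of the 139 subnetworks from the slides; the sizes and radius range are arbitrary choices.

```python
import numpy as np

def correlation_dimension(dist, r_min, r_max, steps=15):
    """Slope of log(pair count) versus log(r) over [r_min, r_max]."""
    i, j = np.triu_indices(len(dist), k=1)
    pair_dist = dist[i, j]
    radii = np.logspace(np.log10(r_min), np.log10(r_max), steps)
    counts = np.array([(pair_dist <= r).sum() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return float(slope)

rng = np.random.default_rng(1)
points = rng.random((1200, 2))                       # synthetic stand-in for the hosts
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(-1))

full = correlation_dimension(dist, 0.02, 0.2)
subset_estimates = []
for _ in range(20):
    # Random subset of lower cardinality, with its induced distance matrix.
    idx = rng.choice(len(points), size=400, replace=False)
    subset_estimates.append(correlation_dimension(dist[np.ix_(idx, idx)], 0.02, 0.2))
median_subset = float(np.median(subset_estimates))
```

On this synthetic input the subset estimates stay near the full-set estimate, which is the pattern the slide reports for the random subnetworks: lower cardinality by itself does not produce the reduction.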
• Fractal tool reports a dimensionality shift which was not captured by Embedding Dimension or PCA
• What explains the discrepancy?
Dimensionality Paradox
• Dimensionality reducing decomposition is real and not an artifact of the fractal methodology
Isomap
Space is rich in non-linearity
• Dimensionality is a critical aspect influencing the performance of algorithms
• Fractal nature and non-linear behavior: Internet delay space dimensionality is better characterized by fractal measures
― Lightweight
― Sensitive to Internet features and structural changes
• Properties influencing dimensionality
• Internet delay space dimensionality is not homogeneous
Conclusion
• InetDim Project Website
― King datasets annotated with IP addresses, code, more info
― http://www.cs.cornell.edu/~abrahao/inetdim
Further Info