Improving Internet Availability with Path Splicing Murtaza Motiwala Nick Feamster Santosh Vempala.

Improving Internet Availability with Path Splicing

Murtaza Motiwala, Nick Feamster, Santosh Vempala

“It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. Over time, our list will evolve. It should be:

1. Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today.

2. …

Availability

Availability of Other Services

• Carrier Airlines (2002 FAA Fact Book)
  – 41 accidents, 6.7M departures
  – 99.9993% availability

• 911 Phone service (1993 NRIC report +)
  – 29 minutes per year per line
  – 99.994% availability

• Std. Phone service (various sources)
  – 53+ minutes per line per year
  – 99.99+% availability

Can the Internet Be “Always On”?

• Various studies (Paxson, etc.) show the Internet is at about 2.5 “nines”

• More “critical” (or at least availability-centric) applications are running on the Internet

• At the same time, the Internet is getting more difficult to debug
  – Increasing scale, complexity, disconnection, etc.

Is it possible to get to “5 nines” of availability? If so, how?
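
The “nines” on these slides are annual downtime expressed on a log scale. The sketch below reproduces that arithmetic. It assumes a 525,600-minute (non-leap) year, and the function names are illustrative only.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored (assumption)

def availability_from_downtime(minutes_down: float) -> float:
    """Fraction of the year a service is up, given annual downtime in minutes."""
    return 1.0 - minutes_down / MINUTES_PER_YEAR

def nines(availability: float) -> float:
    """Number of "nines": 0.99 -> 2, 0.999 -> 3, 0.99999 -> 5."""
    return -math.log10(1.0 - availability)

def downtime_budget(n_nines: float) -> float:
    """Annual downtime in minutes permitted at a given number of nines."""
    return MINUTES_PER_YEAR * 10 ** (-n_nines)

if __name__ == "__main__":
    for label, minutes in [("911 service", 29), ("standard phone service", 53)]:
        a = availability_from_downtime(minutes)
        print(f"{label}: {a:.4%} available ({nines(a):.2f} nines)")
    print(f"2.5 nines allows {downtime_budget(2.5) / 60:.1f} hours of downtime per year")
    print(f"5 nines allows {downtime_budget(5):.2f} minutes of downtime per year")
```

At five nines the downtime budget is only about 5.3 minutes per year. At 2.5 nines it is roughly 28 hours.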

Availability: Two Aspects

• Reliability: Connectivity in the routing tables should approach that of the underlying graph
  – If two nodes s and t remain connected in the underlying graph, there should be some sequence of hops in the routing tables that delivers traffic from s to t

• Recovery: In case of failure (i.e., link or node removal), nodes should be able to discover a new path quickly

Where Today’s Protocols Stand

• Reliability: Today’s routing protocols are single-path
  – When a link or node failure occurs, routers must recompute new paths to each destination
  – Approach: Compute backup paths
  – Challenge: Many possible failure scenarios!

• Recovery: Today’s Internet routing protocols must reconverge after a failure
  – Meanwhile, packets are dropped, reordered, etc.
  – Approach: Switch to a backup when a failure occurs
  – Challenge: Must quickly discover a new working path

Multipath: Promise and Problems

• Bad: If one link fails on each of the two paths, s is disconnected from t

• Want: End systems remain connected unless the failures cut the underlying graph
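
The toy example below makes the “Bad” and “Want” bullets concrete. The topology is hypothetical: two paths, s-a-t and s-b-t, plus a cross link a-b. A single failure on each path breaks both precomputed paths. The underlying graph, however, still connects s and t through s-a-b-t.

```python
from collections import deque

def edges_of(path):
    """Undirected edge set of a node sequence."""
    return {frozenset(e) for e in zip(path, path[1:])}

def connected(edges, s, t):
    """Breadth-first search reachability over an undirected edge set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# Hypothetical topology: two paths s-a-t and s-b-t plus a cross link a-b.
paths = [["s", "a", "t"], ["s", "b", "t"]]
graph = edges_of(paths[0]) | edges_of(paths[1]) | {frozenset(("a", "b"))}
failed = {frozenset(("a", "t")), frozenset(("s", "b"))}  # one failure per path

multipath_ok = any(not (edges_of(p) & failed) for p in paths)
graph_ok = connected(graph - failed, "s", "t")
print(f"some precomputed path survives: {multipath_ok}")  # False
print(f"underlying graph still connects s and t: {graph_ok}")  # True
```

The surviving route mixes pieces of both paths. Path splicing aims to expose routes of exactly this kind.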

Path Splicing: Main Idea

• Step 1 (Perturbations): Run multiple instances of the routing protocol, each with a slightly perturbed version of the configuration

• Step 2 (Splicing): Allow traffic to switch between instances at any node along the path

Compute multiple forwarding trees per destination. Allow packets to switch slices midstream.

Outline

• Path Splicing
  – Achieving Reliable Connectivity
    • Mechanism #1: Random Perturbations
    • Mechanism #2: Network Slicing
  – Forwarding
  – Recovery

• Properties
  – High Reliability
  – Bounded Stretch
  – Fast Recovery

• Ongoing Work

Mechanism #1: Perturbations

• Goal: Each instance provides different paths

• Mechanism: Each edge is given a weight that is a slightly perturbed version of the original weight
  – Two schemes: Uniform and degree-based

[Figure: “Base” graph and perturbed graph between s and t; in the perturbed graph each link carries a perturbed version of its original weight]

How to Perturb the Link Weights?

• Uniform: Perturbation is a function of the initial weight of the link

• Degree-based: Perturbation is a linear function of the degrees of the incident nodes
  – Intuition: Deflect traffic away from nodes where traffic might tend to pass through by default
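
The slide names the two schemes but not their exact formulas, so the sketch below assumes simple forms for both. Uniform adds noise drawn from [0, a·w]. Degree-based adds noise whose range is b·(deg(u) + deg(v)). The parameters a and b and the topology are hypothetical. Each perturbed weight set yields one slice, a shortest-path tree toward the destination, computed here with Dijkstra's algorithm.

```python
import heapq
import random

def perturb_uniform(weights, a, rng):
    """Assumed uniform scheme: add noise drawn from [0, a*w] to each weight w."""
    return {e: w + rng.uniform(0, a * w) for e, w in weights.items()}

def perturb_degree(weights, b, rng):
    """Assumed degree-based scheme: the noise range grows linearly with deg(u) + deg(v)."""
    deg = {}
    for u, v in weights:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return {(u, v): w + rng.uniform(0, b * (deg[u] + deg[v]))
            for (u, v), w in weights.items()}

def next_hops(weights, dst):
    """Dijkstra from dst over undirected links; returns {node: next hop toward dst}."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist, nh = {dst: 0.0}, {}
    heap = [(0.0, dst)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                nh[v] = u  # v reaches dst by forwarding to u
                heapq.heappush(heap, (d + w, v))
    return nh

# Hypothetical base topology; slice 0 uses the unperturbed weights.
base = {("s", "a"): 3, ("a", "t"): 3, ("s", "b"): 3, ("b", "t"): 3, ("a", "b"): 1}
rng = random.Random(7)
slices = [next_hops(base, "t")]
slices += [next_hops(perturb_degree(base, 0.5, rng), "t") for _ in range(3)]
for i, table in enumerate(slices):
    print(f"slice {i}: {table}")
```

Perturbations only ever increase weights in this sketch. Different slices can still pick different next hops wherever paths were close in cost.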

Mechanism #2: Network Slicing

• Goal: Allow multiple instances to co-exist

• Mechanism: Virtual forwarding tables

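[Figure: example topology with nodes s, a, b, c, and t; node s keeps a separate (dst, next-hop) forwarding table for Slice 1 and Slice 2, with destination t mapped to next hop a in one slice and to next hop c in the other]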
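
A minimal sketch of virtual forwarding tables, assuming one plain dst to next-hop dictionary per slice. The class and method names are hypothetical. “Slice 1” and “Slice 2” become indices 0 and 1. The assignment of t to a and t to c follows one reading of the figure residue.

```python
from dataclasses import dataclass, field

@dataclass
class SplicingRouter:
    """Hypothetical router keeping k virtual forwarding tables, one per slice."""
    name: str
    k: int
    tables: list = field(init=False)

    def __post_init__(self):
        # One independent dst -> next-hop table per slice.
        self.tables = [{} for _ in range(self.k)]

    def install(self, slice_id: int, dst: str, next_hop: str) -> None:
        self.tables[slice_id][dst] = next_hop

    def lookup(self, slice_id: int, dst: str) -> str:
        return self.tables[slice_id][dst]

    def state_size(self) -> int:
        """Total entries: grows linearly with the number of slices."""
        return sum(len(t) for t in self.tables)

# Router s with two slices toward t (mapping assumed from the figure).
s = SplicingRouter("s", k=2)
s.install(0, "t", "a")
s.install(1, "t", "c")
print(s.lookup(0, "t"), s.lookup(1, "t"), s.state_size())  # a c 2
```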

Forwarding Traffic

• Packet has a shim header with forwarding bits

• Routers use lg(k) bits (for k slices) to index forwarding tables
  – Shift bits after inspection

• To access different (or multiple) paths, end systems simply change the forwarding bits
  – Incremental deployment is trivial
  – Persistent loops cannot occur
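
The slide does not fix a bit order or header width, so this sketch makes two assumptions. The low-order bits belong to the next router, and each router shifts right after reading them. Under these assumptions, every remaining hop reads slice 0 once the end host's choices are used up. That offers one way to see why persistent loops cannot occur: after the bits run out, the packet stays on a single slice's loop-free tree. The helper names are hypothetical. For k that is not a power of two, the sketch rounds lg(k) up.

```python
def bits_per_hop(k: int) -> int:
    """ceil(lg k) bits select one of k slices."""
    return (k - 1).bit_length()

def encode(slice_choices, k: int) -> int:
    """Pack per-hop slice IDs into an integer; the first hop's choice occupies the low bits."""
    b = bits_per_hop(k)
    header = 0
    for i, sl in enumerate(slice_choices):
        if not 0 <= sl < k:
            raise ValueError(f"slice {sl} out of range for k={k}")
        header |= sl << (i * b)
    return header

def read_and_shift(header: int, k: int):
    """Router step: read the low ceil(lg k) bits as the slice ID, then shift them out."""
    b = bits_per_hop(k)
    return header & ((1 << b) - 1), header >> b

header = encode([2, 0, 3, 1], k=4)
hops = []
for _ in range(6):  # more hops than encoded choices
    slice_id, header = read_and_shift(header, k=4)
    hops.append(slice_id)
print(hops)  # [2, 0, 3, 1, 0, 0]
```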

Putting It Together

• End system sets forwarding bits in packet header

• Forwarding bits specify slice to be used at any hop

• Router: examines/shifts forwarding bits, and forwards
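
The steps above can be simulated end to end. In this sketch the network, the table layout tables[node][slice][dst], and the failures are all hypothetical. Each of the two slices loses one link on its own s-to-t path. Some header bits still splice the slices into a working route.

```python
def forward(tables, src, dst, header, k, failed=frozenset(), ttl=32):
    """Follow per-slice next hops, consuming ceil(lg k) header bits at each router.

    tables[node][slice][dst] gives the next hop (a hypothetical layout).
    Returns the list of nodes visited, or None if the packet crosses a failed
    link or its TTL runs out.
    """
    b = (k - 1).bit_length()
    mask = (1 << b) - 1
    path, node = [src], src
    while node != dst:
        if ttl == 0:
            return None
        slice_id, header = header & mask, header >> b
        nxt = tables[node][slice_id][dst]
        if frozenset((node, nxt)) in failed:
            return None  # dropped; the end host can retry with different bits
        path.append(nxt)
        node, ttl = nxt, ttl - 1
    return path

# Hypothetical two-slice network: links s-a, s-b, a-b, a-t, b-t.
tables = {
    "s": [{"t": "a"}, {"t": "b"}],
    "a": [{"t": "t"}, {"t": "b"}],
    "b": [{"t": "t"}, {"t": "t"}],
}
failed = {frozenset(("a", "t")), frozenset(("s", "b"))}

print(forward(tables, "s", "t", 0b0, 2, failed))    # slice 0 only: None
print(forward(tables, "s", "t", 0b111, 2, failed))  # slice 1 only: None
print(forward(tables, "s", "t", 0b10, 2, failed))   # spliced: ['s', 'a', 'b', 't']
```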

A Definition Motivated by Reliability

• Reliability: the probability that, upon failing each edge with probability p, the graph remains connected

• Reliability curve: the fraction of source-destination pairs that remain connected for various link failure probabilities p

• The underlying graph has an underlying reliability (and reliability curve)
  – Goal: Reliability of the routing system should approach that of the underlying graph.
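
These definitions translate directly into a Monte Carlo estimate. The graph and trial count below are illustrative. The estimate reports the fraction of pairs disconnected, which is the complement of the fraction that remain connected. Applied to the full edge list, it gives the underlying graph's (optimal) curve. Applying it to only the links that appear in some slice's tree gives a rough, optimistic proxy for the routing system's curve, because spliced routes must still follow next-hop directions.

```python
import random
from itertools import combinations

def component_labels(nodes, edges):
    """Union-find: map each node to a representative of its connected component."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return {n: find(n) for n in nodes}

def fraction_disconnected(nodes, edges, p, trials=1000, seed=0):
    """Monte Carlo estimate of one point on the reliability curve: fail each
    edge independently with probability p and average the fraction of
    source-destination pairs left disconnected."""
    rng = random.Random(seed)
    pairs = list(combinations(nodes, 2))
    total = 0.0
    for _ in range(trials):
        alive = [e for e in edges if rng.random() >= p]  # edge survives w.p. 1 - p
        label = component_labels(nodes, alive)
        total += sum(label[u] != label[v] for u, v in pairs) / len(pairs)
    return total / trials

# Hypothetical 6-node ring with two chords.
nodes = list(range(6))
edges = [(i, (i + 1) % 6) for i in range(6)] + [(0, 3), (1, 4)]
for p in (0.0, 0.05, 0.1, 0.2):
    print(f"p={p:.2f}: {fraction_disconnected(nodes, edges, p):.3f} of pairs disconnected")
```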

Reliability Curve: Illustration

[Figure: reliability curve illustration; x-axis: probability of link failure (p); y-axis: fraction of source-destination pairs disconnected; lower curves indicate better reliability]

More edges available to end systems -> better reliability

Reliability Approaches Optimal

• Sprint (Rocketfuel) topology

• 1,000 trials

• p indicates the probability that an edge was removed from the base graph

[Figure: Sprint topology, degree-based perturbations. Annotations: “Reliability approaches optimal”; “Average stretch is only 1.3”]

Recovery is Fast

• Which paths can be recovered within 5 trials?
  – Sequential trials: 5 round-trip times
  – …but trials could also be made in parallel

Recovery approaches maximum possible

Adding a few more slices improves recovery beyond best possible reliability with fewer slices.
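
A sketch of the recovery loop, assuming the end host simply draws fresh random forwarding bits after each failed attempt. The function names, header width, and network are hypothetical. In this network one failure hits each slice's path, so exactly one spliced route (s-a-b-t) survives. A random 16-bit header finds it with probability 1/4 per trial.

```python
import random

def forward(tables, src, dst, header, k, failed, ttl=32):
    """Per-slice forwarding: each router consumes ceil(lg k) header bits.
    Returns the visited nodes, or None if a failed link or TTL expiry stops the packet."""
    b = (k - 1).bit_length()
    mask = (1 << b) - 1
    path, node = [src], src
    while node != dst:
        if ttl == 0:
            return None
        slice_id, header = header & mask, header >> b
        nxt = tables[node][slice_id][dst]
        if frozenset((node, nxt)) in failed:
            return None
        path.append(nxt)
        node, ttl = nxt, ttl - 1
    return path

def recover(tables, src, dst, k, failed, max_trials=5, header_bits=16, seed=None):
    """End host draws fresh random forwarding bits until a packet gets through.
    Sequential trials cost about one round-trip time each."""
    rng = random.Random(seed)
    for trial in range(1, max_trials + 1):
        header = rng.getrandbits(header_bits)
        path = forward(tables, src, dst, header, k, failed)
        if path is not None:
            return trial, path
    return None

# Hypothetical two-slice network with one failure on each slice's s-t path.
tables = {
    "s": [{"t": "a"}, {"t": "b"}],
    "a": [{"t": "t"}, {"t": "b"}],
    "b": [{"t": "t"}, {"t": "t"}],
}
failed = {frozenset(("a", "t")), frozenset(("s", "b"))}
print(recover(tables, "s", "t", 2, failed, seed=1))
```

With a 1/4 success chance per trial, five sequential trials succeed with probability 1 - (3/4)^5 ≈ 0.76. Parallel trials would trade extra packets for lower latency.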

Stretch is Bounded

• Stretch: How much longer is the path taken by packets than the “optimal” path?
  – Stretch is bounded in one slice by the amount of perturbation
  – …but what about the stretch of spliced paths?
  – As long as “significant progress” (a large fraction of the distance to d) is achieved at each hop, stretch is bounded

Implication: Loops are rare.
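
Stretch can be measured in link weight or in hop count. This sketch uses link weight with hypothetical integer weights, and it uses exact fractions so the ratio is easy to read.

```python
from fractions import Fraction

def path_cost(path, weights):
    """Sum of link weights along a node sequence; links are undirected."""
    total = 0
    for u, v in zip(path, path[1:]):
        total += weights[(u, v)] if (u, v) in weights else weights[(v, u)]
    return total

def stretch(path, shortest, weights):
    """Ratio of a path's cost to the cost of the shortest path."""
    return Fraction(path_cost(path, weights), path_cost(shortest, weights))

# Hypothetical weights; the spliced detour s-a-b-t avoids link a-t.
weights = {("s", "a"): 3, ("a", "t"): 3, ("s", "b"): 3, ("b", "t"): 3, ("a", "b"): 1}
print(stretch(["s", "a", "b", "t"], ["s", "a", "t"], weights))  # 7/6
```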

Summary: Splicing Improves Availability

• Reliability: Connectivity in the routing tables should approach that of the underlying graph
  – Approach: Overlay trees generated using random link-weight perturbations. Allow traffic to switch between them.
  – Result: Splicing ~10 trees achieves near-optimal reliability

• Recovery: In case of failure (i.e., link or node removal), nodes should be able to discover a new path quickly
  – Approach: End nodes randomly select new bits.
  – Result: Recovery within five trials approaches the best possible.

Open Questions and Future Work

• How does splicing interact with traffic engineering? Sources controlling traffic?

• What are the best mechanisms for reliability and recovery?

• What changes are required to today’s routers to make splicing possible?

• Can splicing eliminate dynamic routing?

Variation: BGP Splicing

• Observation: Many routers already learn multiple alternate routes to each destination.

• Idea: Use the forwarding bits to index into these alternate routes at an AS’s ingress and egress routers.

Required new functionality:

• Storing multiple entries per prefix

• Indexing into them based on packet headers

• Selecting the “best” k routes for each destination

[Figure: default and alternate routes toward destination d; splice paths at ingress and egress routers]
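
The sketch below covers the three new functions. It stores up to k routes per prefix, does a longest-prefix match, and lets packet bits pick among the stored routes. The class, its methods, and the choice of bits modulo the number of routes are assumptions, not a described router API. The prefixes are RFC 5737 documentation ranges, and the input route lists are assumed to be pre-sorted by BGP preference.

```python
import ipaddress

class SplicingRib:
    """Hypothetical RIB storing up to k routes per prefix, selectable by packet bits."""

    def __init__(self, k: int):
        self.k = k
        self.routes = {}  # ip_network -> next hops, most preferred first

    def add(self, prefix: str, next_hops) -> None:
        # Assumes next_hops is already sorted by BGP preference; keep the best k.
        self.routes[ipaddress.ip_network(prefix)] = list(next_hops)[: self.k]

    def lookup(self, addr: str, bits: int):
        ip = ipaddress.ip_address(addr)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
        hops = self.routes[best]
        return hops[bits % len(hops)]  # bits == 0 selects the default route

rib = SplicingRib(k=2)
rib.add("192.0.2.0/24", ["peer-A", "peer-B", "peer-C"])  # peer-C dropped: k = 2
rib.add("192.0.2.128/25", ["peer-D"])
print(rib.lookup("192.0.2.10", 0), rib.lookup("192.0.2.10", 1))  # peer-A peer-B
print(rib.lookup("192.0.2.200", 1))  # peer-D (only one route stored)
```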

Conclusion

• Simple: Forwarding bits provide access to different paths through the network

• Scalable: Exponential increase in available paths, linear increase in state

• Stable: Fast recovery does not require fast routing protocols

• No modifications to existing routing protocols

http://www.cc.gatech.edu/~feamster/papers/splicing-hotnets.pdf

History: Network Embedding

• Given: virtual (V) and physical (P) network
  – Topology, constraints, etc.

• Problem: find the appropriate mapping onto available physical resources (nodes and edges)

• Idea: Define a virtual graph G’ onto which G can be embedded

• A link in G can be mapped to multiple links in G’

• How to forward traffic over multiple links in G’?

• …

Possible Applications/Future Work

• Fast recovery from poorly performing paths

• Data transfer with easy multi-path
  – Overlay networks, CDNs, etc.
  – Transfer of video with multiple description coding

• Security applications

• Spatial diversity in wireless networks

Significant Novelty for Modest Stretch

• Novelty: how much a perturbed shortest path differs from the original shortest path

Example (figure: a short path and a perturbed, longer path from s to d): novelty = 1 – (fraction of edges on the short path shared with the long path) = 1 – (1/3) = 2/3
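
The novelty formula can be computed directly, following the caption's edge-based definition. The two paths below are hypothetical and reproduce the 2/3 example: the long path shares only the first of the short path's three edges.

```python
from fractions import Fraction

def edges_of(path):
    """Undirected edges along a node sequence."""
    return {frozenset(e) for e in zip(path, path[1:])}

def novelty(short_path, long_path):
    """1 minus the fraction of the short path's edges that the long path also uses."""
    short = edges_of(short_path)
    shared = short & edges_of(long_path)
    return 1 - Fraction(len(shared), len(short))

# Hypothetical paths: only the edge (s, x) is shared.
print(novelty(["s", "x", "y", "d"], ["s", "x", "z", "w", "d"]))  # 2/3
```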

Related Work

• Pre-Computed Backup Paths
  – Multi-Topology Routing
  – Multiple Routing Configurations
  – MPLS Fast Reroute

• End-Node Controlled Traffic
  – Source routing
  – Routing deflections

• Multipath routing (ECMP, MIRO, etc.)

• IGP link-weight optimization

• Measurement of path diversity and multihoming

• Layer-3 VPNs

Other Properties

• Scalable
  – Exponential increase in paths, linear increase in state

• Fast recovery from underlying failures

• Automatic tuning (e.g., for traffic engineering)
  – Perturbations automatically spread traffic across different links
  – Standard link-weight optimization is potentially brittle in the face of link failures

• Incrementally deployable
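
Here is one way to read “exponential increase in paths, linear increase in state.” It is a back-of-the-envelope gloss, not a result stated on the slides. With k slices and a set D of destinations, a router stores k·|D| forwarding entries. A packet crossing h routers, however, can be steered by any of k^h slice sequences. Many sequences map to the same route, so k^h is only an upper bound on the number of distinct paths.

```latex
\text{entries per router} = k\,|D|,
\qquad
\text{slice sequences over } h \text{ hops} = k^{h},
\qquad
\text{e.g. } k = 4,\ h = 6:\ 4^{6} = 4096 .
```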

Prototype Implementation

• Click and Quagga on PL-VINI
  – http://www.vini-veritas.net/

[Figure: prototype architecture; a classifier plus, for each slice, a control-plane daemon and a forwarding table]

Loops, Reconsidered

• Problem: Potential for loops between ASes
  – AS-level loops can be longer than intra-AS loops

• Two possible approaches
  – Detection: routers mark packets and determine that packets have traversed the same AS twice
  – Prevention: Exploit “common” routing policies to ensure that packets are only deflected along valley-free paths

Preventing Inter-AS Loops with Policy

Observation: inter-AS loops inherently involve a traversal that violates the valley-free property

Constraints:
1. Once a “down” deflection has occurred, do not deflect again
2. Allow only one “across” deflection

Possible relaxation: allow a limited number of violations, specified by the source
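
A sketch of the two constraints as a check over a packet's sequence of AS-level deflections. The labels "up", "across", and "down" are assumed to mean deflecting toward a provider, a peer, or a customer. The function name and the way violations are counted are hypothetical. Only the two stated constraints are enforced, not full valley-freeness.

```python
def deflections_ok(deflections, max_violations=0):
    """Check a packet's AS-level deflections against the two constraints.

    deflections: sequence of "up", "across", or "down" (hypothetical labels).
    max_violations: the source-specified relaxation; 0 enforces both rules.
    """
    violations = 0
    seen_down = False
    across = 0
    for d in deflections:
        if seen_down:
            violations += 1  # rule 1: no deflection after a "down" deflection
        if d == "across":
            across += 1
            if across > 1:
                violations += 1  # rule 2: at most one "across" deflection
        if d == "down":
            seen_down = True
    return violations <= max_violations

print(deflections_ok(["up", "across", "down"]))  # True
print(deflections_ok(["down", "up"]))  # False
print(deflections_ok(["across", "across"], max_violations=1))  # True
```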

Definitions of Path Diversity

• Connectivity: Minimum number of edges whose failure disconnects the graph (min cut)

• Expansion: Intuitively, small cuts disconnect small groups of nodes from the graph
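
Connectivity as defined here can be computed with the standard max-flow reduction given by Menger's theorem. The sketch below uses Edmonds-Karp with unit capacities on hypothetical graphs. Expansion is harder to compute exactly and is not shown.

```python
from collections import deque

def max_flow_unit(adj, s, t):
    """Edmonds-Karp with unit capacities on an undirected graph.

    By Menger's theorem the result is the number of edge-disjoint s-t paths,
    which equals the size of a minimum s-t edge cut.
    """
    cap = {(u, v): 1 for u in adj for v in adj[u]}  # residual capacities
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:  # push one unit along the augmenting path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

def edge_connectivity(adj):
    """Global min cut: any cut separates a fixed node s0 from some other node t."""
    s0, *others = list(adj)
    return min(max_flow_unit(adj, s0, t) for t in others)

ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
ring_with_chords = {n: set(nbrs) for n, nbrs in ring.items()}
for u, v in [(0, 3), (1, 4), (2, 5)]:
    ring_with_chords[u].add(v)
    ring_with_chords[v].add(u)
print(edge_connectivity(ring), edge_connectivity(ring_with_chords))  # 2 3
```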

Design Goals

• Reachability: allow endpoints to communicate

• High Diversity: expose paths to end hosts that survive failures
  – Capacity: the total available data rate between each source-destination pair should be high
  – Fault tolerance: the number of disjoint paths should be high, and the network should remain connected under failures

• Low Stretch: paths should not be too circuitous

• Scalability: scale to a large number of networks, destinations, routers, etc.

Today’s routing protocols do not exploit the diversity of the underlying network graph