BGP Safety with Spurious Updates The Conditions of BGP Convergence

58
BGP Safety with Spurious Updates The Conditions of BGP Convergence Martin Suchara in collaboration with: Alex Fabrikant and Jennifer Rexford January 12, 2011

description

BGP Safety with Spurious Updates The Conditions of BGP Convergence. Martin Suchara in collaboration with: Alex Fabrikant and Jennifer Rexford. Local Control vs. Global Properties. The Internet is a “network of networks” ~35,000 independently administered ASes - PowerPoint PPT Presentation

Transcript of BGP Safety with Spurious Updates The Conditions of BGP Convergence

Page 1: BGP Safety with Spurious Updates The Conditions of BGP Convergence

BGP Safety with Spurious UpdatesThe Conditions of BGP Convergence

Martin Suchara

in collaboration with:Alex Fabrikant and

Jennifer Rexford

January 12, 2011

Page 2: BGP Safety with Spurious Updates The Conditions of BGP Convergence

2

Local Control vs. Global Properties The Internet is a “network of networks”

~35,000 independently administered ASes Competitive cooperation to find routes

Local ControlInterdomain policies, Intradomain routing

Global PropertiesStability, reliability, performance, etc.

Page 3: BGP Safety with Spurious Updates The Conditions of BGP Convergence

3

Interdomain Routing Autonomous systems (AS) have different goals

Different views on which path is best

Interdomain routing: agree on a set of paths

11 22

33

66

44

client

Prefer 5346 to 5326

Has to use : 5 → 3 → 2 → 6

Prefer 326 to 346

55

Page 4: BGP Safety with Spurious Updates The Conditions of BGP Convergence

4

The Border Gateway Protocol (BGP) ASes exchange information about paths

Policy configurations provided by AS operators

Path selection: which path do I choose?

Path export: which neighbors do I tell?

d

2211

Data traffic

“I can reach d via AS 3”

“I can reach d”

33

Page 5: BGP Safety with Spurious Updates The Conditions of BGP Convergence

5

Business Driven Policies of ASes

Peer-Peer Relationship

Export only customer routers to a peer

Export peer routes only to customers

Customer-Provider Relationship

Provider exports its customer’s routes to everybody

Customer exports provider’s routes only to downstream customers

Page 6: BGP Safety with Spurious Updates The Conditions of BGP Convergence

6

BGP Safety Challenges 35,000 ASes and 300,000 address blocks

Flexible AS policies

Routing convergence usually takes minutes

But the system does not always converge…

0

1 2

d

Prefer 120 to 10

Prefer 210 to 20

Use 20Use 10Use 120

Use 210

Page 7: BGP Safety with Spurious Updates The Conditions of BGP Convergence

7

Results on BGP Safety

Necessary or sufficient conditions of safety (Gao and Rexford, 2001), (Gao, Griffin and Rexford, 2001), (Griffin, Jaggard and Ramachandran, 2003), (Feamster, Johari and Balakrishnan, 2005), (Sobrinho, 2005), (Fabrikant and Papadimitriou, 2008), (Cittadini, Battista, Rimondini and Vissicchio, 2009), …

Safety verification important to network operators

Absence of a “dispute wheel” sufficient for safety (Griffin, Shepherd, Wilfong, 2002)

Page 8: BGP Safety with Spurious Updates The Conditions of BGP Convergence

8

Models of BGP Existing models (variants of SPVP)

Widely used to analyze BGP properties Simple but do not capture spurious

behavior of BGP

This work A new model of BGP with spurious updates Spurious updates have major consequences More accurate model makes proofs easier!

Page 9: BGP Safety with Spurious Updates The Conditions of BGP Convergence

9

Overview

I. Classical model of BGP: the SPVP

III. The surprise: networks believed to be safe oscillate!

IV. The consequences: applicability of earlier results

V. Convergence conditions: polynomial time verifiable

VI. Conclusion

II. Spurious BGP updates: what are they?

Page 10: BGP Safety with Spurious Updates The Conditions of BGP Convergence

10

SPVP– Traditional Model of BGP (Griffin and Wilfong, 2000)

12010ε

Permitted paths

Network topology

2

0

1

The higher the more preferred

21020ε

The destination

Always includes the empty path

Page 11: BGP Safety with Spurious Updates The Conditions of BGP Convergence

11

SPVP– Traditional Model of BGP (Griffin and Wilfong, 2000)

12010ε

0

1

21020ε

Selected paths

2

Page 12: BGP Safety with Spurious Updates The Conditions of BGP Convergence

12

SPVP– Traditional Model of BGP (Griffin and Wilfong, 2000)

12010ε

0

21020ε

Activation21

Activation models the processing of BGP update messages sent by neighbors

Vertex or edge activations

Page 13: BGP Safety with Spurious Updates The Conditions of BGP Convergence

13

SPVP– Traditional Model of BGP (Griffin and Wilfong, 2000)

12010ε

0

21020ε

Vertex or edge activations

Switch to best available

2 Activation1

Activation models the processing of BGP update messages sent by neighbors

Page 14: BGP Safety with Spurious Updates The Conditions of BGP Convergence

14

SPVP– Traditional Model of BGP (Griffin and Wilfong, 2000)

12010ε

0

21020ε

System is safe if all “fair” activation sequences lead to a stable path assignment

Switch to best available

2 Activation1

Page 15: BGP Safety with Spurious Updates The Conditions of BGP Convergence

15

Overview

I. Classical model of BGP: the SPVP

III. The surprise: networks believed to be safe oscillate!

IV. The consequences: applicability of earlier results

V. Convergence conditions: polynomial time verifiable

VI. Conclusion

II. Spurious BGP updates: what are they?

Page 16: BGP Safety with Spurious Updates The Conditions of BGP Convergence

16

What are Spurious Updates? A phenomenon: router announces a route

other than the highest ranked one

Spurious BGP update 230:

Selected path: 20

Behavior not allowed in SPVP

0

1 2

3

123010

30

21020230

230

Page 17: BGP Safety with Spurious Updates The Conditions of BGP Convergence

17

What Causes Spurious Updates?

1. Limited visibility to improve scalability Internal structure of ASes Cluster-based router architectures

2. Timers and delays to prevent instabilities and reduce overhead Route flap damping MRAI timers Grouping updates to priority classes Finite size message queues in routers

Page 18: BGP Safety with Spurious Updates The Conditions of BGP Convergence

18

Cause 1 – Limited Visibility

r1

r2

r3BA

r3

The internal structure of ASes improves scalability while reducing visibility

After route r1 is withdrawn, router B temporarily announces r3

r3r1

ri

RR

A

Route reflector

Router

Route announcement

Autonomous system (AS)

r1r2

RR

1 2 3 4

r1

r2

r3

5

Page 19: BGP Safety with Spurious Updates The Conditions of BGP Convergence

19

Cause 1 – Limited Visibility

r1

r2

r3BA

r3

The internal structure of ASes improves scalability while reducing visibility

After withdrawals of routes r1 and r3, router B temporarily withdraws the route

ri

A

Route reflector

Router

Route announcement

Autonomous system (AS)

r1r2

ε

RRRR

1 2 3 4

5

Page 20: BGP Safety with Spurious Updates The Conditions of BGP Convergence

20

Cause 2 – Delays Route flap damping temporarily suppresses all

routes learned from a neighbor

After the update r2→r1 the less preferred route r3

is temporarily selected

r1

r2

r3

r2r3

A

r3r2

Autonomous system (AS)

→r1

ri

A Router

Route announcement

1 2 3

r1

r2

r3

4

Page 21: BGP Safety with Spurious Updates The Conditions of BGP Convergence

21

DPVP– A More General Model of BGP DPVP = Dynamic Path Vector Protocol

Generalizes the earlier model (SPVP) Spurious update with a less preferred route

that was recently available

Spurious updates allowed in transient period τ after last route change

Safety is independent of numerical value τ

Page 22: BGP Safety with Spurious Updates The Conditions of BGP Convergence

22

DPVP– A More General Model of BGP

12010ε

The permitted paths and their ranking

2

0

1

20

21020ε

Spurious update

Selected path: 210

Spurious updates are allowed only if current time < StableTime

Spurious updates may include paths that were recently available or the empty path

Remember all recently available paths (e.g. 20, 210)

StableTime = τ after last path change

Page 23: BGP Safety with Spurious Updates The Conditions of BGP Convergence

23

DPVP– A More General Model of BGP Behavior captured irrespective of cause

Simple future-proof model independent of underlying network technologies

For every allowed spurious behavior in DPVP we can find a possible cause Details in our technical report:

TR-881-10, Dept. of Comp. Sci., Princeton, July 2010

www.cs.princeton.edu/~msuchara/publications.html

Page 24: BGP Safety with Spurious Updates The Conditions of BGP Convergence

24

Overview

I. Classical model of BGP: the SPVP

III. The surprise: networks believed to be safe oscillate!

IV. The consequences: applicability of earlier results

V. Convergence conditions: polynomial time verifiable

VI. Conclusion

II. Spurious BGP updates: what are they?

Page 25: BGP Safety with Spurious Updates The Conditions of BGP Convergence

25

Consequences of Spurious Updates Spurious behavior is temporary

Tempting to conclude that it cannot have long term consequences

The surprise: spurious behavior may trigger permanent oscillations!

Page 26: BGP Safety with Spurious Updates The Conditions of BGP Convergence

26

The Surprise: Spurious Announcements Trigger Permanent Oscillations!

Permanent oscillations with spurious behavior

Safe instance in all classical models of routing:

13010

21020

321032031030

1

3

0

2

Stable outcome

Page 27: BGP Safety with Spurious Updates The Conditions of BGP Convergence

27

Example of Oscillation

13010

21020

321032031030

ε

2

1

3

0

B

A

RR

Autonomous system (AS)

1

RR

A

1

Route reflector

Router

AS activation

Route announcement

Page 28: BGP Safety with Spurious Updates The Conditions of BGP Convergence

28

Example of Oscillation

13010

21020

321032031030

2

1

3

0

B

A

Autonomous system (AS)Route reflector

Router

AS activation

Route announcementε

1

A

1

RR

RR

Page 29: BGP Safety with Spurious Updates The Conditions of BGP Convergence

29

Example of Oscillation

13010

21020

321032031030

2

1

3

0

B

A

ε210

210

Autonomous system (AS)Route reflector

Router

AS activation

Route announcement

ε

ε

1

A

1

RR

RR

Did not learn about withdrawal of 210 yet

Does not know any route

Page 30: BGP Safety with Spurious Updates The Conditions of BGP Convergence

30

Example of Oscillation

13010

21020

321032031030

2

1

3

0

B

A

ε210

210

Autonomous system (AS)Route reflector

Router

AS activation

Route announcement

ε

ε

1

A

1

RR

RR

Did not learn about withdrawal of 210 yet

Does not know any route

Page 31: BGP Safety with Spurious Updates The Conditions of BGP Convergence

31

Example of Oscillation

13010

21020

321032031030

2

1

3

0

B

A

ε210

210

Autonomous system (AS)Route reflector

Router

AS activation

Route announcement

ε

ε

1

A

1

RR

RR

Did not learn about withdrawal of 210 yet

Does not know any route

Page 32: BGP Safety with Spurious Updates The Conditions of BGP Convergence

32

Example of Oscillation

13010

21020

321032031030

2

1

3

0

B

A

2020

20

Autonomous system (AS)Route reflector

Router

AS activation

Route announcementε

1

A

1

RR

RR

Page 33: BGP Safety with Spurious Updates The Conditions of BGP Convergence

33

Example of Oscillation

13010

21020

321032031030

2

1

3

0

B

A

Autonomous system (AS)Route reflector

Router

AS activation

Route announcementε

1

A

1

RR

RR

Page 34: BGP Safety with Spurious Updates The Conditions of BGP Convergence

34

Example of Oscillation

13010

21020

321032031030

2

1

3

0

B

A

Autonomous system (AS)Route reflector

Router

AS activation

Route announcementε

1

A

1

RR

RR

Page 35: BGP Safety with Spurious Updates The Conditions of BGP Convergence

35

Example of Oscillation

13010

21020

321032031030

2

1

3

0

B

A

Autonomous system (AS)Route reflector

Router

AS activation

Route announcementε

1

A

1

RR

RR

Page 36: BGP Safety with Spurious Updates The Conditions of BGP Convergence

36

Example of Oscillation

13010

21020

321032031030

2

1

3

0

B

A

Autonomous system (AS)Route reflector

Router

AS activation

Route announcementε

1

A

1

RR

RR

Page 37: BGP Safety with Spurious Updates The Conditions of BGP Convergence

37

Consequences of Spurious Updates Temporary behavior may cause permanent

oscillations

The number of oscillating nodes and / or frequency of oscillations may increase

Which results do not hold in the new model?

Page 38: BGP Safety with Spurious Updates The Conditions of BGP Convergence

38

Overview

I. Classical model of BGP: the SPVP

III. The surprise: networks believed to be safe oscillate!

IV. The consequences: applicability of earlier results

V. Convergence conditions: polynomial time verifiable

VI. Conclusion

II. Spurious BGP updates: what are they?

Page 39: BGP Safety with Spurious Updates The Conditions of BGP Convergence

39

Which SPVP Results Hold in DPVP? Most previous results in SPVP also hold for

DPVP Formal justification later in the talk

Some results cannot be extended Slightly different conditions of convergence Exponentially slower convergence possible

Page 40: BGP Safety with Spurious Updates The Conditions of BGP Convergence

40

Case Study IDifferent Conditions of Convergence Safety under filtering: is instance safe under any

filtering?Filter arbitrary subset of routes

3

Absence of a “dispute reel” necessary and sufficient for safety under filtering in SPVP (Cittadini et al., 2009)

Our result: permanent oscillations in DPVP even without a reel

Subgraph with specific properties

Page 41: BGP Safety with Spurious Updates The Conditions of BGP Convergence

41

Case Study I Different Conditions of Convergence Example of a “safe” topology, Cittadini et al.:

Spurious updates cause oscillations

132010130

321030320

213020210

210

320

130

2

3

0

1

Page 42: BGP Safety with Spurious Updates The Conditions of BGP Convergence

42

Case Study IIExponential Slowdown of Convergence BGP converges in 2l + 2 phases

(Sami et al., 2009)

l: length of longest customer-provider chain Phase: each node processes and sends

updates Assumes standard business relationships

With spurious updates exponential slowdown to (2k + 1)l-2 phases

k: max. # of spurious updates after route change

Page 43: BGP Safety with Spurious Updates The Conditions of BGP Convergence

43

Overview

I. Classical model of BGP: the SPVP

III. The surprise: networks believed to be safe oscillate!

IV. The consequences: applicability of earlier results

V. Convergence conditions: polynomial time verifiable

VI. Conclusion

II. Spurious BGP updates: what are they?

Page 44: BGP Safety with Spurious Updates The Conditions of BGP Convergence

44

Convergence Conditions

Absence of a “dispute wheel” is still sufficient for safety in DPVP Most of the previous results of the past

decade still hold under DPVP!

Absence of a “dispute wheel” sufficient for safety in SPVP (Griffin, Shepherd, Wilfong, 2002)

One of the most cited results

Other stronger results in DPVP next

Page 45: BGP Safety with Spurious Updates The Conditions of BGP Convergence

45

Why are Proofs Easier in DPVP? No need to prove that:

Announced route is the highest ranked one Announced route is the last one learned from

the downstream neighbor

Next the necessary and sufficient conditions

We changed the problem PSPACE complete vs. NP complete

Page 46: BGP Safety with Spurious Updates The Conditions of BGP Convergence

46

Necessary and Sufficient Conditions How can we prove a system may oscillate?

Classify each node as “stable” or “coy” At least one “coy” node exists Prove that “stable” nodes must be stable Prove that “coy” nodes may oscillate

Next: a formal definition of a construction that captures this intuition

Easy in a model with spurious announcements

Page 47: BGP Safety with Spurious Updates The Conditions of BGP Convergence

47

Necessary and Sufficient Conditions

Coy nodes may make spurious announcements

Stable nodes have a permanent path

Theorem: DPVP oscillates if and only if it has a CoyOTE

Definition: CoyOTE is a triple (C, S, Π) satisfying several conditions

One path assigned to each node proves if the node is coy or stable

0

1 2

3

123010

30

21020230

Page 48: BGP Safety with Spurious Updates The Conditions of BGP Convergence

48

Necessary and Sufficient Conditions

Definition: CoyOTE satisfies these conditions:

0

1 2

3

123010

30

21020230

= coy

= stable

1) The best stable path assigned to each stable node

2) Coy node is assigned a coy path:

• more preferred than the best stable path

• consistent with the paths of stable nodes

3) Origin is stable

Page 49: BGP Safety with Spurious Updates The Conditions of BGP Convergence

Verifying the Convergence Conditions = Finding a CoyOTE In general an NP-hard problem

Compact inputs with regular expressions

Can be checked in polynomial time for most “reasonable” network configurations!

49

e.g.

Page 50: BGP Safety with Spurious Updates The Conditions of BGP Convergence

50

DeCoy – Safety Verification Algorithm Goal: verify safety in polynomial time

Main idea: find the maximal stable set S by expanding it in a greedy fashion

If all nodes are stable system is safe

Otherwise system may oscillate

Page 51: BGP Safety with Spurious Updates The Conditions of BGP Convergence

51

DeCoy – Safety Verification Algorithm

0

1 2

3

123010

30

21020230

= coy= stable

Goal: verify safety in polynomial time

Initially the destination is the only stable node; it has a path to itself

Page 52: BGP Safety with Spurious Updates The Conditions of BGP Convergence

52

DeCoy – Safety Verification Algorithm

0

1 2

3

123010

30

21020230

Goal: verify safety in polynomial time

Find a coy node where its most preferred path consistent with the paths of all stable nodes is a stable path

= coy= stable

Page 53: BGP Safety with Spurious Updates The Conditions of BGP Convergence

53

DeCoy – Safety Verification Algorithm

0

1 2

3

123010

30

21020230

Goal: verify safety in polynomial time

Assign this path to the node and mark the node as stable

Keep expanding the stable region until we get stuck

= coy= stable

Find a coy node where its most preferred path consistent with the paths of all stable nodes is a stable path

Page 54: BGP Safety with Spurious Updates The Conditions of BGP Convergence

54

DeCoy – Safety Verification Algorithm Goal: verify safety in polynomial time

Theorem: if all nodes are added to the stable set the system is safe. Otherwise it is not safe.

Open question: find a distributed version of the algorithm that preserves privacy of the nodes

Business relationships are secret

Page 55: BGP Safety with Spurious Updates The Conditions of BGP Convergence

55

Overview

I. Classical model of BGP: the SPVP

III. The surprise: networks believed to be safe oscillate!

IV. The consequences: applicability of earlier results

V. Convergence conditions: polynomial time verifiable

VI. Conclusion

II. Spurious BGP updates: what are they?

Page 56: BGP Safety with Spurious Updates The Conditions of BGP Convergence

56

Conclusion DPVP: best of both worlds

More accurate model of BGP Model simplifies theoretical analysis

Key results

Page 57: BGP Safety with Spurious Updates The Conditions of BGP Convergence

57

Thank You!

Page 58: BGP Safety with Spurious Updates The Conditions of BGP Convergence

58