Peter Lothberg +1 703 864 7887 · z24 z32 z36 global i-bgp. ... r2 g2 b3r3 g3 b2 x1 x2 x4x3 x5 x6...

74
http://radweb SPRINTLINK IP NETWORK Peter Lothberg <[email protected]> +1 703 864 7887

Transcript of Peter Lothberg +1 703 864 7887 · z24 z32 z36 global i-bgp. ... r2 g2 b3r3 g3 b2 x1 x2 x4x3 x5 x6...

http://radweb

SPRINTLINK IP NETWORK

Peter Lothberg<[email protected]>+1 703 864 7887

10/10/2002http://radweb

2

David Meyer (DMM)

10/10/2002http://radweb

3

SprintLink and the Philosophy of Building Large Networks

David MeyerChief Technologist/Senior Scientist

[email protected] 10, 2002

10/10/2002http://radweb

4

Agenda

Philosophy -- How We Build Networks SprintLink Architecture/CoverageWhat is all of this MPLS talk about?Putting it all Together− Network Behavior in a Couple Failure Scenarios

IPv6FutureClosing/Q&A

10/10/2002http://radweb

5

Build Philosophy

Simplicity Principle− “Some Internet Architectural Guidelines and

Philosophy”, draft-ymbk-arch-guidelines-05.txt Use fiber plant− To efficiently provision robust paths− “1:1 Protection Provisioning”

And remember that the job of the core is to move packets, not inspect or rewrite them.− Zero Drop, Speed-of-Light-like Latency, Low Jitter− Side-effect of provisioning approach

10/10/2002http://radweb

6

Support Philosophy

Three S’s− Simple

NOC Staff can operate it

− SaneDon’t have to be a PhD to understand and troubleshoot the routing

− SupportableIf it takes twelve hours to figure out what’s wrong, something isn't right..

If upgrading means re-thinking and redesigning the whole support process, something is likely broken

10/10/2002http://radweb

7

Aside: System Complexity

Complexity impedes efficient scaling, and hence is the primary driver behind both OPEX and CAPEX (Simplicity Principle)

Complexity in systems such as the Internet derives from scale and from two well-known properties from non-linear systems theory:− Amplification− Coupling

10/10/2002http://radweb

8

Amplification Principle

In very large system, even small things can (and do) cause huge events− Corollary: In large systems such as the

Internet, even small perturbations on the input to a process can destabilize the system’s output

− Example: It has been shown that increased interconnectivity results in more complex and frequently slower BGP routing convergence

“The Impact of Internet Policy and Topology on Delayed Routing Convergence”, Labovitz et. Al, Infocom, 2002Related: “What is the sound of One Route Flapping”, Timothy Griffin, IPAM Workshop on Large Scale Communication Networks, March, 2002

10/10/2002http://radweb

9

Coupling Principle

As systems get larger, they often exhibit increased interdependence between components− Corollary: The more events that simultaneously

occur, the larger the likelihood that two or more will interact

− Unforeseen Feature Interaction“Robustness and the Internet: Design and Evolution”, Willinger et al.

Example: Slow start synchronization

10/10/2002http://radweb

10

Example: The Myth of 5 Nines

80% of outages caused by people and process errors [SCOTT]. Implies that at best you have a 20% window in which to work on components

In order to increase component reliability, we add complexity (optimization), effectively narrowing the 20% window

i.e., in the quest for increased robustness, you increase the likelihood of people/process failures

10/10/2002http://radweb

11

Example: The Myth of 5 Nines

The result is a Complexity/Robustness Spiral, in which increases in system complexity create further and more serious sensitivities, which in turn require additional robustness, … [WILLINGER2002]

Keeping in mind that we can always do better…

What does this say about all of the router HA work?

10/10/2002http://radweb

12

Aside: System Complexity

Bottom Line: We must manage complexity closely or complexity will quickly overwhelm all other facets of a system − “Some Internet Architectural Guidelines and Philosophy”,

Randy Bush and David Meyer, draft-ymbk-arch-guidelines-05.txt, August, 2002

Currently in the RFC editor’s queue− “Complexity and Robustness”, Carlson, et. al., Proceedings of

the National Academy of Science, Vol. 99, Suppl. 1, February, 2002

See me if you’d like additional literature for your spare time :-)

10/10/2002http://radweb

13

Physical Topology Principle

S L B BR o u te r

S L B BR o u te r

S L B BR o u te r

S L B BR o u te r

A F ib e r P a th

B F ib e r P a th

DWDM

B S ys te m

A S ys te m

DWDM

DWDM

B S y s te m

A S y s te m

DWDM

DWDM

B S ys te m

A S ys te m

DWDM

DWDM

B S ys te m

A S ys te m

DWDM

DWDM

B S ys te m

A S ys te m

DWDM

DWDM

B S y s te m

A S y s te m

DWDM

DWDM

B S ys te m

A S ys te m

DWDM

DWDM

B S y s te m

A S y s te m

DWDM

10/10/2002http://radweb

14

POP Design 2001 – 6 Core Routers

OC192s(POS)

OC12 SRP RING(DPT)

OC48s (POS)

OC192s

PeeringWAN

Data Centers

PeeringWAN

Data Centers

10/10/2002http://radweb

15

PeeringWAN

Data Centers

PeeringWAN

Data Centers

PeeringWAN

Data Centers

PeeringWAN

Data Centers

OC12 SRP RING(DPT)

OC48s (POS)

OC192s(POS)OC192

POP Design 2001 – 8 Core Routers

10/10/2002http://radweb

16

POP Design 2002 – 9 Core Routers

http://radweb

POP APOP A

POP FPOP F

POP EPOP E

POP DPOP D

POP CPOP C

POP BPOP B

Backbone Principle

10/10/2002http://radweb

18

G1R1

B1

B2

R3 R2

G2B3

G3

X1X2

X4X3

Y1 Y1Y1Y1

Z3Z2

Z1

Z4

Backbone Route Reflectors

10/10/2002http://radweb

19

G1R1 B1

B2R3R2 G2 B3G3

Z8X1 X2 X4X3 Y4Y1 Y3Y2 Z3Z2Z1 Z4X5 X6 X7 X8 Z5 Z6 Z7Y6 Y7Y5 Y8

X9 X10 X12X11 Y12Y9 Y11Y10 Z11Z10Z9 Z12X13X14X15X16 Z13 Z14 Z15Y14 Y15Y13 Y16

X17 X18 X20X19 Y20Y17 Y19Y18 Z19Z18Z17 Z20X21X22X23X24 Z21 Z22 Z23Y22 Y23Y21 Y24

X25 X26 X28X27 Y28Y25 Y27Y26 Z27Z26Z25 Z28X29 X30X31X32 Z29 Z30 Z31Y30 Y31Y29 Y32

X33X34X35X36 Z33 Z34 Z35Y34 Y35Y33 Y36

Route Reflector A

Route Reflector B

Network Wide IBGP

Z16

Z24

Z32

Z36

Global I-BGP

10/10/2002http://radweb

20

G1R1 B1

B2R3R2 G2 B3G3

Z8X1 X2 X4X3 Y4Y1 Y3Y2 Z3Z2Z1 Z4X5 X6 X7 X8 Z5 Z6 Z7Y6 Y7Y5 Y8

X9 X10 X12X11 Y12Y9 Y11Y10 Z11Z10Z9 Z12X13X14X15X16 Z13 Z14 Z15Y14 Y15Y13 Y16

X17 X18 X20X19 Y20Y17 Y19Y18 Z19Z18Z17 Z20X21X22X23X24 Z21 Z22 Z23Y22 Y23Y21 Y24

X25 X26 X28X27 Y28Y25 Y27Y26 Z27Z26Z25 Z28X29 X30X31X32 Z29 Z30 Z31Y30 Y31Y29 Y32

X33X34X35X36 Z33 Z34 Z35Y34 Y35Y33 Y36

Route Reflector A

Route Reflector B

Z16

Z24

Z32

Z36

Ring 3 LAN Ring 2 LANRing 1 LAN

POP Logical Topology

10/10/2002http://radweb

21

Sprintlink POP, Relay MD (SL-BB20-RLY)

10/10/2002http://radweb

22

Traditional Access Today

L E CC O

T 1

D e d ic a te dC u s to m e r

C P ER o u te r

S p rin t P O P

L E C A D M

S P R IN T A D M(R in g X X .1 )

D W D M D W D M

S p rin t S w itc h

A D M (R in g X X .1 )

D W D M

B B D C S

W B D C S

R o u te rO C 1 2 (D S 3 S e rv ic e )

D S 3 (T 1 S e rv ic e )

D W D MS p rin t S w itc h o r P O P

A D M (R in g X X .1 )

D W D M D W D M

L E C A D M

10/10/2002http://radweb

23

3/1 DXC

BB DXC

ADM ADM 3/1 DXC

ADM

Sprint LEC

ADM

Changing access model

10/10/2002http://radweb

24

Internet Transport Node

OC192OC48OC12OC3Internet Center3rd Party Data Center

ChicagoChicago

Los AngelesLos AngelesAnaheimAnaheim

Palo AltoPalo Alto

OrlandoOrlando

MiamiMiami

San JoseSan Jose

RaleighRaleigh

RestonReston

PennsaukenPennsaukenSacramentoSacramento

Kansas CityKansas City

DallasDallas

DenverDenver

BostonBostonSpringfieldSpringfield

StocktonStockton

RoachdaleRoachdale

CheyenneCheyenne

FortWorthFort

Worth

SeattleSeattle

New YorkNew York

TacomaTacoma

AtlantaAtlanta

Relay/DCRelay/DC

To PearlCity, HITo PearlCity, HI

SecaucusSecaucus

SiliconValley

US Network (non technical slide…)

10/10/2002http://radweb

25

Brussels

Dublin

FrankfurtFrankfurt

Stockholm

Hamburg

Milan

MunichParis

Amsterdam

STM 64 (OC-192)STM16 (OC-48)STM4 (OC-12)Internet Transport NodeLanding Station

Oslo

Copenhagen

Bude

NoerreNebel

Manasquan, NJTuckerton, NJ

Relay / DC

Raleigh

Atlanta

Orlando

Springfield / Boston

New YorkPennsauken, NJ

London

EU Network

10/10/2002http://radweb

26

SL-BB20-TUK, SL-BB21-TUK

10/10/2002http://radweb

27

Tuckerton – London, STM-64

10/10/2002http://radweb

28

EU Network, Traffic

10/10/2002http://radweb

29

2002 Asia Sprint IP Backbone Network

Internet Transport NodeLanding StationFuture Location

Seoul

Taipei

Hawaii

Bandon

San Luis Obispo

To Chicago, Kansas City, New York

To Cheyenne

To Relay/DC, Chicago

To Pennsauken

PusanChinju

Tseung KwanLanta Island

Penang

Brookvale

Alexandria

Singapore

Tokyo

Hong Kong

Sydney

Stockton

SeattleTacoma

San Jose

Auckland

Spencer BeachKahe Point

Suva

Fangshan

Toucheng

Anaheim

Pt. Arena

Los Osos

Nedonna Beach

To Fort Worth

PAIXKite-IbarakiAligura

MaruyamaChikura

Shima

10/10/2002http://radweb

30

Back to Navgation BarBack to Navgation Bar

Miami (NAP of the Americas)

Caracas

Santiago

Bogota

Buenos Aires

Internet Transport NodeLanding StationFuture Location

STM1 (Protected)

STM-1 (Protected)

Central and South America Backbone Network

10/10/2002http://radweb

31

US 10 Internet Centers

RoachdaleRoachdale

OrlandoOrlando

RTPRTPAtlantaAtlanta

ChicagoChicago

SeattleSeattle

SanJoseSanJose

AnaheimAnaheim

FortWorthFort

Worth

Kansas CityKansas City

CheyenneCheyenneNew YorkNew York

TacomaTacoma

SpringfieldSpringfield

Relay/DCRelay/DCPennsaukenPennsauken

StocktonStockton

DallasDallas

NYCNYC

BostonBoston

KCKCRestonReston

RanchoRancho

Silicon ValleySilicon Valley

AtlantaAtlanta

LALA

DenverDenver

10/10/2002http://radweb

32

10+ Carrier Hotel Sites

RoachdaleRoachdale

OrlandoOrlando

RTPRTPAtlantaAtlanta

ChicagoChicago

SeattleSeattle

SanJoseSanJose

AnaheimAnaheim

FortWorthFort

Worth

Kansas CityKansas City

CheyenneCheyenneNew YorkNew York

TacomaTacoma

SpringfieldSpringfield

Relay/DCRelay/DCPennsaukenPennsauken

StocktonStockton

MiamiMiami

SprintLink Shared Tenant site SprintLink Shared Tenant site

ChicagoChicago

San JoseSan Jose

Palo AltoPalo Alto AshburnAshburn

SprintLink Shared Tenant site SprintLink Shared Tenant site

New YorkNew YorkSecaucusSecaucus

TukwillaTukwilla

DallasDallas

LALA

AtlantaAtlanta

10/10/2002http://radweb

33

SprintLink - Strengths

Homogeneous Global ArchitectureSingle AS Globally (exception: AU)IP Layer Redundancy Drives Accountability− Accountability equals Customer Service

L3/L1 Architecture from Day 1 - No False StartsSuccess at Driving New Equipment DevelopmentLeader in Peering ArchitecturesRobust Architecture Allows for Unsurpassed StabilityLead in the Introduction of Multicast TechnologyLeading SLAs via Zero Loss & Speed of Light Delays

10/10/2002http://radweb

34

Agenda -- MPLS

Brief MPLS History of the MPLS Universe...Traffic EngineeringQoSConvergence/RestorationLayer 2 Transport/VPNLayer 3 Transport/VPNProvisioningAnything Else?

10/10/2002http://radweb

35

Brief History of the MPLS Universe

This Page Intentionally Left Blank...

10/10/2002http://radweb

36

Traffic Engineering

MPLS Approach:− Off/On-line computation of CoS paths− RSVP-TE + IS-IS/OSPF-TE− Tunnel Topology− Can consider a wide variety of “metrics”

Sprintlink Approach− “1:1 Protection Provisioning”− Nice side effect: Zero loss, speed-of-light-like latency, small

jitter− Provisioning ahead of demand curve

Note demand/provisioning curve deltas

10/10/2002http://radweb

37

Demand vs. Provisioning Time Lines

10/10/2002http://radweb

38

Traffic Engineering

Aggregated Traffic in a core network (> = OC48) is “uncorrelated”, that is, not self-similar− “Impact of Aggregation on Scaling Behavior of Internet Backbone

Traffic”, Zhi-Li Zhang, Vinay Riberio, Sue Moon, Christophe Diot, Sprint ATL Technical Report TR02-ATL-020157 (http://www.sprintlabs.com/ipgroup.htm)

So you can actually provision to avoid queuing in a core network

With proper network design, you can get within 3% of optimal (utilization)− “Traffic Engineering With Traditional IP Routing Protocols”, Bernard

Fortz, Jennifer Rexford, and Mikkel ThorupSo why would you buy the complexity of MPLS-TE?

10/10/2002http://radweb

39

Aside: Self-similarity

10/10/2002http://radweb

40

Aside: Self-similarity

10/10/2002http://radweb

41

MPLS-TE and Sprintlink

Engineering Aside -- No Current Need for MPLS-TE− All Links Are Same Speed Between All Cities Domestically

(two exceptions)− 50% of bandwidth is reserved by design on every link for

protection (actually 1/n reserved…)− If there is no queuing and/or buffering, why do we need a

constraint on which packets get forwarded first.More to Follow

− We are in the business of delivering ALL packets for ALL of our customers

− Too Much State in Your Core Will Eventually Burn YouOr Your Edge for That Matter

10/10/2002http://radweb

42

QoS/CoS

MPLS Approach− MPLS in and of itself provides no QoS facilities− Diffserv-aware MPLS-TE, lots of other machinery, state in the

core, complexity

Sprintlink Approach− Congestion free core, CoS on edge (“edge QoS”, as access is

where congestion occurs− As previously mentioned, recent results show that aggregated

traffic in the core network “uncorrelated”, which means you can actually provision a core to avoid queuing

What does QoS in a core mean anyway?

http://radweb

Sprintlink Core SLA

Forwarding outages < 1sPacket loss 0.05%Packet reordering 1%RTT US 100msRTT World 380msJitter 5msBW/Delay quota 2.4G/350msMTU 4470B

10/10/2002http://radweb

44

T1 & T3 Queueing Delay

10/10/2002http://radweb

45

T1 & OC3 Queueing Delay

10/10/2002http://radweb

46

T1 & OC12 Queueing Delay

10/10/2002http://radweb

47

T1 & OC48 Queuing Delay

10/10/2002http://radweb

48

Convergence/Restoration

MPLS Approach− Fast Reroute, with various kinds of protection− O(N^2*C) complexity (C classes of service)− B/W must be available

Sprintlink approach− Simple network design− Equal cost multi-path/IS-IS improvements for sub-second

convergence− BTW, what is the (service) convergence time requirement?

Note: Recent work shows that FIB download dominates service restoration time, so...

http://radweb

02468

10121416

1998 1999 2000 2001 2002

Seconds

0.3S

Non Forwarding after topology change

10/10/2002http://radweb

50

US Network (summary)

10/10/2002http://radweb

51

US Network Detail

10/10/2002http://radweb

52

US Cross Country Link

10/10/2002http://radweb

53

L2 Transport/VPN

MPLS Approach− PWE3 consolidated approach (e.g. martini encap)− CoS/QoS Capabilities

Sprintlink Approach− L2TPv3 (UTI) + Edge QoS− Already doing (I)VPL, Ethernet, and Frame Relay

10/10/2002http://radweb

54

L3 Transport/VPN

MPLS Approach− RFC 2547 (MPLS/BGP VPN)

Sprintlink Approach− CPE Based and VR based (network based)

Interestingly, although many customers seem to be asking for 2547 VPN, there is no artifact that will allow users to distinguish between a VR VPN and a 2547 VPN− See also “Integrity for Virtual Private Routed Networks”,

Randy Bush and Tim Griffin, INFOCOMM 2003 Result: 2547 cannot provide isolation (“security”) in the multi-provider (inter-domain) case

10/10/2002http://radweb

55

Core IP Network

mp-BGP mp-BGP

MPLS/LSP through L2TPv3 Attachment Circuits

CPE orOverlay PE

CPE orOverlay PE

CPE orOverlay PE

PE

PEPE

LSP -> VCID Attachment Circuit

L2TPv3

Customer VRF

IPL2TP (VCID->LSP)

LSPVPNv4 IP

BGP D

erived Ta

rgete

d LDP

BGP Derived Targeted LDP

BGP Derived Targeted LDP

LSP

LDP Derived VCID for L2TPv3

LDP Derived VCID for L2TPv3

LDP D

erived V

CID fo

r L2T

Pv3

10/10/2002http://radweb

56

Comment on VPN “Security”

Many providers are claiming− Isolation == Security

This is the “Private network argument”− In particular, from DoS like attacks

Reality Check --> Isolation != Security− This is the Security by Obscurity argument!− On a public infrastructure...

you would have to trace the tunnel(s)end points are RFC 1918, so not globally visableand not even addressed in L2 VPN

− On “Isolated” infrastructure...

10/10/2002http://radweb

57

Isolated Infrastructure...

Well, as soon as > 1 customer, we’re no longer “isolated”

What happens when someone puts up a public internet g/w?− Appears to be some kind of false security

Isolation != Security (of any real kind)

10/10/2002http://radweb

58

Provisioning/Optical Control Planes

MPLS Approach− GMPLS or some variant (ASON)

Sprint Approach− Support the deployment of an optical layer control plane− Integration into backoffice/OSS systems still under study− Reliability/Robustness must be proven before deployment

There is, however, reason to be skeptical of optical control planes like GMPLS...

10/10/2002http://radweb

59

What is there to be skeptical about?

Well, a fundemental part of the IP architecture is “broken” (decoupled) by GMPLS− Basically, the “decoupling” means that one can no longer

assume that a control plane adjacency implies a data plane adjacency, so you need a convergence layer (RSVP-TE+LMP)

− What are the implications of this?

Aside: We know that IP doesn’t run well over a control plane that operates on similar timescales (cf. IP over ATM with PNNI)

10/10/2002http://radweb

60

MPLS – Bottom Line

If you have 5 OC48s Worth of Traffic…− You need 5 OC48s…

none of these TE or {C,Q}oS techniques manufactures bandwidth

− If the path that carries those 5 OC48s (or a subset of breaks)…− Then you better have 5 more (or that subset) between the

source and destination…− Its that simple for a true tier 1 operator.

If the above is not the case…− Then be prepared to honor your SLAs and pay out (waive the

fees)

10/10/2002http://radweb

61

A Brief Look...

At a couple of high profile failure scenarios

Baltimore Tunnel Fire

Other Fiber cuts

10/10/2002http://radweb

62

Baltimore Train Tunnel Fire

10/10/2002http://radweb

63

Train Derailment, Major Fiber Cut In Ohio April 25

“WorldCom officials blame the problem on a train derailment thatoccurred in Ohio, 50 miles south of Toledo, resulting in fiber cuts. Meanwhile, independent engineers pointed to Cisco Systems Inc.(Nasdaq: CSCO - message board) routers, which Cisco officials later confirmed. But the bottom line may be: If there's a fiber cut or router problem, isn't the network supposed to stay up anyway?”

Lightreading – 4/26/02

10/10/2002http://radweb

65

More Stats – 3rd Party

10/10/2002http://radweb

66

Closing

Robust, yet simple, and built (day 1) on native Packet-Over-SONET/SDH framing infrastructure− Ask me about HOT (Highly Optimized Tolerance) models of complex

systems if we wind up with timeBasic result: Complex systems such as the Internet are characterized by Robust yet Fragile behavior

Load-sharing is done by a per-destination caching scheme− I.E. traffic flows take only ONE best path across the SprintLink

NetworkMinimized packet re-ordering, reduced fiber-path induced jitter.

IP traffic growth is still doubling ~yearly− Easier to provision the network to ensure no congestion in the core,

more cost-effective than fancy queuing in the core.− Simple means reliable, fixable, and more stable.

10/10/2002http://radweb

67

Closing 2

Queuing only needed at the edge, where packet/frame sizes are ‘large’ in proportion to the ingress bandwidth.

Stays with Simplicity PrincipleFrees up Core routing system’s resources

Aside: Recent work in the complex systems field is leading to a deep understanding of the Complexity/Robustness tradeoffs in large (non-linear) systems. Let me know if you’d like more literature on this one...

So once UMTS/3G is deployed, there is

a worldwide IPv6 network you may not be able to talk

with…..

SprintLink and IPv6, Sure!

10/10/2002http://radweb

69

1997: Obtained 6bone address space (3ffe:2900::/24)− Original router under Robs desk ☺

1998: Totaling 15 customers using tunnels to 6bone1999: Totaling 40 customers using tunnels to 6bone− Move router out to the network…

2000: Obtained ARIN space (2001:440::/35)− Totaling 110 customer using tunnels to 6bone.

2001-2002: Added 4 more IPv6 capable PoP’s− Brussels, Washington DC, San Jose, New

York− Member of the NY6IX exchange− Adding 1-2 connections/week!

SprintLink IPv6 history

10/10/2002http://radweb

70

New York

Washington

Seattle

San Jose

Stockton

Brussels

Tokyo

IPv6 Backbone Today

10/10/2002http://radweb

71

Sprintlink IPv6 (AS6175) UUNET-UK

IIJ

ISI

CHHTL-TW

FIBERTEL

UNAM-MX COMPENDIUM-AR

AOL

TELE-2

BELLSOUTH

VBNS (I2)

INFN-IT

NTT

VIAGENIE CAIRNCNNIC/

CERNET

SONG-FI

TUMPET-AU GBLX

CSIR-ZA

External IPv6 Connectivity

10/10/2002http://radweb

72

Sprintlink IPv6 (AS6175)BAY NETWORKS

DEC

MICROSOFT

CISCO

HP

NOKIA

LUCENT

FUTUREUNIX

BROKERSYS

SprintLink IPv6 connectivity to HW & SW vendors

http://radweb

“Core” “Core”

“Core”

Customers

Customers

Customers

Looking in to 2003

Thank YouQuestions?

Interested in switching over to a real IP network?

Call or send E-Mail for a Quote! [email protected]

+1 703 864 7887