Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data...

52
Internetworking BGP From mort&tim

Transcript of Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data...

Page 1: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Internetworking

BGPFrom mort&tim

Page 2: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

2

Internetworking

• So far we have talked about:– Moving data between hosts– Moving data within a network

(administrative domain)• So what is the Internet then, really?

The InternetBT VerizonAT&T

Page 3: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

3

Page 4: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

4

Recall: Routing vs. Forwarding

• Router receives an IP packet: what to do?– Drop or forward via an interface

• Deciding which interface is forwarding– IP bases this decision (almost) solely on the

destination IP address• Building up the information to do so is routing

– Where are all the addresses at the moment?

Page 5: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

5

Recall: Longest Prefix Matching

1100 0000 . 1010 1000 . 0000 0000 . 0000 0000192 168 0 0 /16

1100 0000 . 1010 1000 . 0000 1000 . 0000 0000192 168 8 0 /21

1100 0000 . 1010 1000 . 0000 1010 . 0000 0000192 168 10 0 /23

1100 0000 . 1010 1000 . 0000 1010 . 0000 1100192 168 10 12 /32 – Host

1100 0000 . 1010 1000 . 0000 1010 . 0000 0000192 168 10 0 /24

1100 0000 . 1010 1000 . 0000 0100 . 0000 0000192 168 4 0 /24

Page 6: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

6

Contents

• Routing• The Protocol• Decision Process• Operations

Page 7: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

7

Contents

• Routing– Inter-domain Routing– BGPv4– Autonomous Systems

• The Protocol• Decision Process• Operations

Page 8: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

8

Routing Protocols

• Distribute the data to build forwarding tables• Examples we saw: OSPF, IS-IS, RIP

– Link-state, Distance vector• These are intra-domain routing protocols

– Or Interior Gateway Protocols– Source and destination inside the same network

• What happens between networks?

Page 9: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

9

Inter-domain Routing

• An important distinction: local vs global– Interior vs Exterior Gateway Protocol (IGP, EGP)– Why is this important? Two reasons:

• Dynamics– Need to scope information propagation (why?)

• Protection– Need to hide information (why?)

Page 10: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

10

Border Gateway Protocol, BGPv4

• The Internet inter-domain routing protocol– RFC 4271, updating RFC 1771– Derives originally from GGP, EGP (1982)– Updated over time (RFCs 1105, 1163, 1267)

• Deals in IP prefixes and Autonomous Systems– ASs purely administrative– Purpose is to enable policy to be applied– Only prefixes matter in the data-plane

Page 11: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

11

Autonomous Systems, ASs

• Internet policy domains– Logical construct only– No meaning outside BGP– Do not map simply onto ISPs or networks

• Currently ~493,000 prefixes, ~46,000 ASs

AS1

AS2

AS4

AS3

Page 12: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

12

Contents

• Routing• The Protocol

– Sessions– Updates– Path Attributes

• Decision Process• Operations

Page 13: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

13

A Very Simple Protocol

• Exchanges prefixes– Uses TCP/179 as transport– OPEN, UPDATE, KEEPALIVE,

NOTIFICATION• Sessions between peers

– Simple capability negotiation– Manage simultaneous OPEN– Lose everything on session

failure (why?)

Peer A Peer B

OPEN (myAS, id)

KEEPALIVE

UPDATEs(withdrawn, attributes, advertised)

Page 14: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

14

Sessions & RIBs

• BGP peer typically has many sessions– 10? 20? 100s?

• Logically, Adj-RIB-In & -Out for each session– Advertisements received and to be sent

• Generate Loc-RIB from Adj-RIB-In– Routes to use and potentially distribute– Resolved into per-port forwarding tables

• Generate Adj-RIB-Out from Loc-RIB and policy

Routing Information Bases

Page 15: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

15

UPDATEs

• Incremental – indicate changes to state– Withdrawn routes– Path attributes, common to all advertised routes– Advertised routes, known as NLRI

• There are ~27 path attributes defined– Perhaps a dozen or so are in common use– Communicate information about prefixes– Used to apply policy in BGP decision process

Page 16: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

16

Path Attributes

• Well-known, Mandatory– Next Hop– AS Path– Origin

• Well-known, Discretionary– Local Preference– Atomic Aggregate

• Optional, Transitive– Aggregator– Community– Extended Communities

• Optional, Non-transitive– Multi-Exit Discriminator– Originator ID– …

Page 17: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

17

An Example UPDATE[ Thu Apr 1 04:26:25 2010 ]MRT packet: len: 81, type: PROTOCOL_BGP4MP, subtype: MESSAGE AS(src): 39202, AS(dst): 12654 ifc idx: 0, AFI: IP IP(src): 195.66.225.2, IP(dst): 195.66.225.241 Update (len=65): unfeasible_len=0 path_attr_len=26 UNFEASIBLE ROUTES: PATH ATTRIBUTES: ORIGIN: IGP [ transitive ] AS_PATH: (SEQUENCE)[ <- 39202 <- 3491 <- 17639 <- 6163 <- 6163 ] [ transitive ] NEXT_HOP: 195.66.224.167 [ transitive ] FEASIBLE ROUTES: 1: 61.9.0.0/24 2: 61.9.1.0/24 3: 61.9.62.0/24 4: 202.47.132.0/24

Page 18: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

18

Page 19: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

19

Contents

• Routing• The Protocol• Decision Process

– Path Vectors• Operations

Page 20: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

20

Path Vectors – AS_PATH

• Distance vector – prefer lowest cost path– Need to break loops somehow (how?)

• Path Vector– How do we know if we’ve seen this advert before?– Store the list of ASs through which it reached us– The AS_PATH

• Loops can be broken: – If our ASN appears in a received AS_PATH, drop

the advert

Page 21: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

21

Decision Process

• Drop prefix if:– NEXT_HOP is unreachable via local routing table– Local AS appears in AS_PATH

• Then (commonly) apply following preference:1. Higher WEIGHT

(local to this router)2. Highest LOCAL_PREF3. Shortest AS_PATH

(leads to AS padding)4. Lowest ORIGIN5. Lowest MED

(if from same AS – why?)

6. EGP to IGP (hot potato)

7. Shortest internal path8. Prefer oldest route9. Lowest Router-ID

(usually, highest router IP)10. Lowest interface IP

address

Page 22: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

22

Contents

• Routing• The Protocol• Decision Process• Operations

– Consistency– Scaling– Confederations– Route Reflectors

Page 23: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

23

Consistency

• Learn external routes on EBGP sessions– EBGP defined as peers having different ASNs– Must ensure every router knows all external

routes (why?)• Redistribute external routes inside network

– Via IGP – only in small networks (why?)– Via IBGP – gives full control over route distribution

• What’s the problem with IBGP?

Page 24: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

24

Scaling

• Can’t distribute IBGP routes on IBGP sessions– Why?

• Have to maintain N.(N-1)/2 IBGP sessions– Each carrying up to 490k routes x 2 tables

• Two standard solutions– Route Reflectors:

supernodes, readvertising IBGP routes– AS Confederations:

split AS up into mini-ASs– Both tweak decision process somewhat

Page 25: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

25

Operations

• Handle link failures– Bind to loopback– Flap damping (but can make things worse!)

• Process failures– Out of memory error due to too many routes

• Hijacking, intentional and unintentional– “Don’t believe everything you read”– http://www.youtube.com/watch?v=IzLPKuAOe50

• Anycast (1:1-of-N)– Advertise same prefix in many places. Carefully.

Page 26: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

26

Network Interconnection

• Networks interconnect via EBGP sessions– POPs, Points-of-Presence; or IXs, Internet eXchanges

• Multi-homing– This is all logical – what about physical diversity?

• How does this all fit together?– Public/Private Peering vs Transit– Roughly hierarchical (though this is changing)– Tier-1/core/backbone vs the rest

• As ever, business and politics– E.g., Level3 vs Cogent de-peering

Page 27: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

27

Simple Example of a Complex Graph

(Policy – example from Level3)

Page 28: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

28

Contents• Routing

– Inter-domain Routing– BGPv4– Autonomous Systems

• The Protocol– Sessions– Updates– Path Attributes

• Decision Process– Path Vectors

• Operations– Consistency– Scaling– Confederations– Route Reflectors

Page 29: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

29

Summary

• The Internet is inter-connected networks– The routing protocols are what hold it together

• BGPv4 is the inter-network routing protocol– All about application of policy– To meet business needs

• Simple protocol, can be arbitrarily complex– Many operational matters make this hard

Page 30: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

30

Quiz (1)

1. What information needs to be exchanged between networks to route packets?

2. What constraints are different between an IGP and an EGP?

3. Why does BGP add path attributes to prefixes?4. What is an AS?5. Why is simultaneous open of BGP sessions an issue,

and how is it resolved?6. What might happen if the corresponding tables and

routes were not removed on session failure?

Page 31: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Load Balancing Example

primary link for prefix P1backup link for prefix P2

AS 1

AS 2

AS 3 AS 4

provider

peer peer

provider

customer

AS 5customer

primary link for prefix P2backup link for prefix P1

Simple session reset my not work!!

Page 32: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Can’t un-wedge with session resets!

1

2

3 4

5

1

2

3 4

5

1

2

3 4

5

1

2

3 4

5

1

2

3 4

5

1—2 down 1—5 down

1—2 up 1—5 up

P2wedged

P1wedged

INTENDED

Reset 1—2 Reset 1—5

1

2

3 4

5

BOTHP1 & P2wedged

1—2 & 1—5 down

1

2

3 4

5

1

2

3 4

5

1—2 & 1—5 down

all up all up

Note that when bringingall up we could actually landthe system in any one of the 4 stable states --- dependson message order….

Page 33: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Recovery

1

2

3 4

5

1

2

3 4

5

1

2

3 4

5

1

2

3 4

5

1

2

3 4

5

1—2 down 1—5 down

1—2 up 1—5 up

P2wedged

P1wedged

INTENDED

Temporarilyfilter P2 from 1—5 session

Temporarilyfilter P1 from 1—2 session

Who among us could figure this one out? When 1—2 is in New York and 1—5 is in Tokyo?

Page 34: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

AS 1

AS 2

AS 3 AS 4

customer

provider

peer peer

provider

customer

customer

provider

primary link

Full Wedgie Example

AS 5

backup links

• AS 1 implements backup links by sending AS 2 and AS 3 a “depref me” communities.

• AS 2 implements its community so that the resulting local pref is below that of its upstream providers and it’s peers (AS 3 and AS 5 routes)

• AS 5 implements its community so that the resulting local pref is below its peers (AS 2) but above that of its providers (AS 3)

customer

peer peer

Page 35: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

And the Routings are…

AS 1

AS 2

AS 3 AS 4

AS 5

AS 1

AS 2

AS 3 AS 4

AS 5

Intended Routing Unintended Routing

Page 36: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Resetting 1—2 does not help!!

AS 1

AS 2

AS 3 AS 4

AS 5

AS 1

AS 2

AS 3 AS 4

AS 5

Bring down AS 1-2 session

Bring up AS 1-2 session

Page 37: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Recovery

AS 1

AS 2

AS 3 AS 4

AS 5

AS 1

AS 2

AS 3 AS 4

AS 5

Bring down AS 1-2 sessionAND AS 1-5 session

AS 1

AS 2

AS 3 AS 4

AS 5

A lot of “non-local” knowledge is required to arrive at this recovery strategy!

Try to convince AS 5 and AS 1 that their session has be reset (or filtered) even though it is not associated with an active route!

Bring up AS 1-2 sessionAND AS 1-5 session

Page 38: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

That Can’t happen in MY network!!

AU++

APEMEA

LA

NA

An “normal” global global backbone (ISP or Corporate Intranet) implemented with 5 regional ASes

Page 39: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

The Full Wedgie Example, in a new Guise

AU

EMEA

NA AP

LA

Intended Routing for some prefixes in AU,implementedwith communities.

DOES THIS LOOK FAMILIAR??

Message: Same problems can arisewith “traffic engineering” acrossregional networks.

Page 40: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Recommendations

• Be aware of BGP Wedgies• Preference-impacting Interdomain

communities should be defined with care and consistently implemented (this may require translating and transiting communities).

Page 41: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

References

• Internet Draft (grow working group):

draft-ietf-grow-bgp-wedgies-03.txt

• Long-term solution? – Metarouting!– http://www.acm.org/sigs/sigcomm/sigcomm2005/

techprog.html#session1

Page 42: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

42

Extras…

Page 43: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

So, how do you build an IP network?

1. Buy (lease) routers2. Buy (lease) fibre3. Connect them all together

4. Configure routers

5. Configure end-systems

$1m? $2m? for a new, populated, backbone router!

Wayleaves = $$$Be a landowner!

Correctly.For now.

Mwuhahaha.

Someone else’s can of worms.

Page 44: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Multiple Router Flavours

• Core– OC-12 (622Mbps) and up (to OC-768 ~= 40Gbps)– Big, fat, fast, expensive– E.g., Cisco HFR, Juniper T-640– HFR: 1.2Tbps each, interconnect up to 72 giving

92Tbps, start at $450k• Transit/Peering-facing

– OC-3 and up, good GigE density– ACLs, full-on BGP, uRPF, accounting

Page 45: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Multiple Router Flavours

• Customer-facing– FR/ATM/…– Feature set as above, plus fancy queues, etc

• Broadband aggregator– High scalability: sessions, ports, reconnections– Feature set as above

• Customer-premises (CPE)– 100Mbps, maybe– NAT, DHCP, firewall, wireless, VoIP, …– Low cost, low-end, perhaps just software on a PC

Page 46: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Multiple Router Flavours

Cisco CRS-1Multi-shelf system

Page 47: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Network Design

• Whose network?– ISPs, IXs, enterprise, campus– POPs, DCs

• Many designs: – Flat– Hierarchical– Hybrids– Multiple scales

Page 48: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Network Design Constraints

• Business– Backwards compatibility. Who to connect. Peering.

• Technology– Power – directly (24x7 operation) and indirectly (cooling)– Port density vs. raw bandwidth– Software reliability– Hardware/software capability

• Addressing schemes for scalability, summarization• Can’t run feature X with feature Y on vendor C in network size N

• Connectivity/resiliency– “All core routers connect to at least 2 other core routers”– “All edge routers connect to at least 2 core routers”

Page 49: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Router OS Configuration

• Initialization– Name the router, setup boot options, setup

authentication options

• Configure interfaces– Loopback, Ethernet, fibre, ATM– Subnet/mask, filters, static routes– Shutdown (or not), queuing options, full/half

duplex

Page 50: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Router Software Configuration

• Configure routing protocols (OSPF, BGP, &c)– Process number, addresses to accept routes from,

networks to advertise– Access lists, filters, ...

• Numeric id, permit/deny, subnet/mask, protocol, port

– Route-maps, matching routes rather than data traffic

• Other configuration aspects: traps, syslog, &c– (Oh, and switch configuration is about as painful)

Page 51: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Router Configuration Fragmentshostname FOOBAR!boot system flash slot0:a-boot-image.binboot system flash bootflash:logging buffered 100000 debugginglogging console informationalaaa new-model aaa authentication login default tacacs local aaa authentication login consoleport none aaa authentication ppp default if-needed tacacs aaa authorization network tacacs !ip tftp source-interface Loopback0no ip domain-lookupip name-server 10.34.56.78!ip multicast-routingip dvmrp route-limit 7000ip cef distributed

interface Loopback0 description router-1.network.corp.com ip address 10.65.21.43 255.255.255.255!interface FastEthernet0/0/0 description Link to New York ip address 10.65.43.21 255.255.255.128 ip access-group 175 in ip helper-address 10.65.12.34 ip pim sparse-mode ip cgmp ip dvmrp accept-filter 98 neighbor-list 99 full-duplex!interface FastEthernet4/0/0 no ip address ip access-group 183 in ip pim sparse-mode ip cgmp shutdown full-duplex

router ospf 2 log-adjacency-changes passive-interface FastEthernet0/0/0 passive-interface FastEthernet0/1/0 passive-interface FastEthernet1/0/0 passive-interface FastEthernet1/1/0 passive-interface FastEthernet2/0/0 passive-interface FastEthernet2/1/0 passive-interface FastEthernet3/0/0 network 10.65.23.45 0.0.0.255 area 1.0.0.0 network 10.65.34.56 0.0.0.255 area 1.0.0.0 network 10.65.43.0 0.0.0.127 area 1.0.0.0

access-list 24 remark Mcast ACLaccess-list 24 permit 239.255.255.254access-list 24 permit 224.0.1.111access-list 24 permit 239.192.0.0 0.3.255.255access-list 24 permit 232.192.0.0 0.3.255.255access-list 24 permit 224.0.0.0 0.0.0.255access-list 1011 deny 0000.0000.0000 ffff.ffff.ffff ffff.ffff.ffff 0000.0000.0000 0xD1 2 eq 0x42access-list 1011 permit 0000.0000.0000 ffff.ffff.ffff 0000.0000.0000 ffff.ffff.fffftftp-server slot1:some-other-image.bintacacs-server host 10.65.0.2tacacs-server key xxxxxxxxrmon event 1 trap Trap1 description "CPU Utilization>75%" owner configrmon event 2 trap Trap2 description "CPU Utilization>95%" owner config

Page 52: Internetworking BGP From mort&tim. Internetworking So far we have talked about: – Moving data between hosts – Moving data within a network (administrative.

Router Configuration• Lots of large, fragile text files

– 00s/000s routers, 00s/000s lines per config– Errors are hard to find and have non-obvious results– Router configuration also editable on-line– Order matters!

• How to keep track of them all?– Naming schemes, directory trees, CVS, ssh upload and atomic commit

to router– Perhaps even a proper database

• State of the art is pretty basic– Few tools to check consistency, design goals– Generally generate configurations from templates and have human-

intensive process to control access to running configs

This counts as advanced!