BGP Tutorial
John S. Graham
What’s In Store?
• Routing Architectures
• BGP Basics
• Path Selection (many examples)
• BGP State Machine
• Multiprotocol Extensions
  – Multicast Routing Covered
  – Layer-III VPN Omitted
Cisco Architecture
BGP
RIP OSPF
Static EIGRP
192.168.0.0/16
Metric:
Next-Hop:
172.16.0.0/12
Metric:
Next-Hop:
10.0.0.0/8
Metric:
Next-Hop:
IP Routing Table
Export
Import
BGP
RIP
OSPF
Static
IS-IS
Juniper Architecture
Metric 1:
Metric 2:
Next Hop: Area:
172.16.0.0/12
Export
Import
AS-Path: Community: Level:
BGP Schematic (IPv4 Unicast)
BGP Path Selection
Routing Policy
BGP Table
Routing Table
inet.0
172.16.0.0/12
192.168.0.0/16
The Autonomous System
• Collection of routers under one administrative control
• Single internal routing protocol
• Identified using an AS Number
AS100
AS200
AS Numbers
• An ASN is a 16-bit number
  – 1 through 64511 are assigned by RIRs
  – 64512 through 65534 are for private use and should never appear on the Internet
  – Numbers ‘0’ and ‘65535’ are reserved
  – AS 23456 is used to represent a 4-byte ASN to routers unable to handle the new standard
• All major routing platforms now support 32-bit ASN: http://www.get4byteasn.info/
• Interesting contrasts between European and US-based R&E networks
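The 16-bit ranges above can be checked mechanically. The following is a minimal sketch; the function name `classify_asn` and its return labels are illustrative choices, not part of any router implementation.

```python
def classify_asn(asn: int) -> str:
    """Classify a 16-bit AS number using the ranges listed on the slide."""
    if not 0 <= asn <= 65535:
        raise ValueError(f"not a 16-bit ASN: {asn}")
    if asn in (0, 65535):
        return "reserved"
    if asn == 23456:
        # Placeholder that stands in for a 4-byte ASN on older routers.
        return "as_trans"
    if asn <= 64511:
        return "public"
    return "private"  # 64512 through 65534


for asn in (100, 23456, 64512, 65535):
    print(asn, classify_asn(asn))
```

A check like this is the basis for the policy of rejecting any prefix whose AS_PATH contains a private AS.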
Test 4-Byte ASN 39322
• Operated by Force10 Networks
• Advertises 64.127.137.0/24
  – Accessible via Internet2 CPS
  – Check the AS-Path on your network!
• 4-Byte ASN can be expressed as:
  – AS-Plane = 39322
  – AS-Dot = 153.154
Thanks to Brent R. Sweeny for advice
ISP Routing Components
• Interior Gateway Protocol (IGP)
• Internal BGP (iBGP)
  – Routes customer prefixes around internal infrastructure
  – Is NOT congruent with physical connectivity
• External BGP (eBGP)
  – Prefix interchange with customers
  – Most routing policy located here
Routing Components Depicted Circuit
IS-IS Adjacency
iBGP Peering
eBGP Peering
A.0
B.0
C.0
D.0
E.0
A.1
B.1
B.2
C.2 C.3
D.3
D.4
A.4 Router A
Router B
Router C
Router D
Router E
Attribute Classes

Well-Known         Well-Known          Optional        Optional
Mandatory          Discretionary       Transitive      Non-Transitive
AS_PATH            LOCAL_PREF          COMMUNITY       MED
NEXT_HOP           ATOMIC_AGGREGATE    AGGREGATOR      CLUSTER_LIST
ORIGIN                                                 ORIGINATOR_ID
                                                       MP_REACH_NLRI
                                                       MP_UNREACH_NLRI
The AS_PATH Attribute
172.16.0.0/12 200 100
172.16.0.0/12 300 200 100
172.16.0.0/12 100
172.16.0.0/12 400 300 200 100
172.16.0.0/12 500 400 300 200 100
AS100 detects its own AS in the path from AS500 and ignores the prefix
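The prepend-and-check behaviour shown in the diagram can be sketched in a few lines. The function names `advertise` and `accept_path` are hypothetical, and the AS_PATH is modelled as a plain list of AS numbers.

```python
from __future__ import annotations


def advertise(local_as: int, as_path: list[int]) -> list[int]:
    """Prepend the local AS before sending the prefix to an eBGP peer."""
    return [local_as] + as_path


def accept_path(local_as: int, as_path: list[int]) -> bool:
    """Ignore any prefix whose AS_PATH already contains the local AS."""
    return local_as not in as_path


# 172.16.0.0/12 originates in AS100 and travels 200 -> 300 -> 400 -> 500.
path = [100]
for asn in (200, 300, 400, 500):
    path = advertise(asn, path)

print(path)                    # AS_PATH as sent by AS500
print(accept_path(100, path))  # AS100 sees itself in the path
```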
Which is the ‘Better’ Path?
The NEXT_HOP Attribute
B.1
B.2
A.1
C.2 C.3 D.3
Router   NEXT_HOP (SELF = NO)   NEXT_HOP (SELF = YES)
A        ―                      ―
B        A.1                    A.1
C        A.1                    B.0
D        C.3                    C.3
The LOCAL_PREF Attribute
Graphic used with kind permission of Philip Smith, Cisco Systems
Multi-Exit Discriminator
Graphic used with kind permission of Philip Smith, Cisco Systems
MED: Metric Confusion
• MED is non-transitive and optional
  – Some implementations send learned MED to iBGP peers by default and others do not
  – Some implementations send MEDs to eBGP peers by default while others do not
• Default metric varies by vendor
  – No explicit metric implies 2^32 - 1 or 2^32 - 2
  – No explicit metric implies zero
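The effect of the two vendor defaults can be illustrated with a small sketch. The helper `effective_med` and its `missing_as_worst` flag are hypothetical names used only for this example, not a vendor setting.

```python
from typing import Optional

MED_MAX = 2**32 - 1  # worst possible 32-bit MED


def effective_med(med: Optional[int], missing_as_worst: bool) -> int:
    """Return the MED used for comparison when a path may carry none."""
    if med is not None:
        return med
    return MED_MAX if missing_as_worst else 0


explicit, missing = 50, None

# Vendor treating a missing MED as zero: the path without a MED wins.
print(effective_med(missing, False) < effective_med(explicit, False))

# Vendor treating a missing MED as the maximum: the explicit MED wins.
print(effective_med(explicit, True) < effective_med(missing, True))
```

The same pair of paths is ranked in opposite order depending on the default, which is the confusion the slide warns about.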
eBGP vs iBGP

eBGP
1. Uses AS_PATH for loop avoidance
2. NEXT_HOP is modified
3. Peers must be directly connected
4. MED is reset
5. LOCAL_PREF is never advertised

iBGP
1. Chinese whispers prohibited
2. NEXT_HOP is unchanged
3. Peers not necessarily directly connected
4. MED is propagated
5. LOCAL_PREF always advertised
Internal Peering Topologies
Daisy Chain (Wrong)
Full Mesh (Allowed)
Route Reflector (Allowed)
How Are Prefixes Passed Around?
• On any given router only the best path for a prefix is passed to other peers
• Best path learned via eBGP
  – Advertised to all other eBGP peers
  – Advertised to iBGP peers
• Best path learned via iBGP
  – Advertised onto eBGP peers
  – Not advertised to other iBGP peers
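The advertisement rules above reduce to a single test. This is a minimal sketch that assumes no route reflection and does not model skipping the peer the path was learned from; `advertise_to` is a hypothetical name.

```python
def advertise_to(learned_via: str, peer_type: str) -> bool:
    """Decide whether a best path is passed on to a peer.

    Both arguments are either "ebgp" or "ibgp".
    """
    if learned_via == "ebgp":
        return True  # sent to eBGP and iBGP peers alike
    # Learned via iBGP: only eBGP peers receive it.
    return peer_type == "ebgp"


for learned in ("ebgp", "ibgp"):
    for peer in ("ebgp", "ibgp"):
        print(learned, "->", peer, advertise_to(learned, peer))
```

The one `False` case (iBGP to iBGP) is why internal peers need a full mesh or a route reflector.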
The Golden Rule
• Never redistribute routes from the IGP into BGP
• Never redistribute routes from BGP into the IGP
Best Route Selection
• Longest prefix always wins regardless of routing protocol
• Source of routing information
  – Connected > Static > eBGP > {IGP} > iBGP
• BGP ignores received prefixes if
  – There is no route to the NEXT_HOP
  – The AS_PATH contains the local AS number
  – Not synchronized (Cisco IOS only)
BGP Path Vector Algorithm
1. {Highest WEIGHT}
2. Highest LOCAL_PREF is Preferred
3. Shortest AS_PATH
4. Lowest ORIGIN Code
5. Lowest MED
6. Prefer prefix received from eBGP peer over iBGP peer
7. Path with lowest metric to NEXT_HOP (aka Metric2)
8. Lowest ROUTER_ID
9. Shortest CLUSTER_LIST
10. Lowest neighbor IP address
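The ten tie-breakers can be expressed as one sort key. This is a simplified sketch, not any vendor's implementation: the `Path` class and its default values are assumptions, and MED is compared across all paths, whereas real routers usually compare it only between paths from the same neighbouring AS.

```python
from dataclasses import dataclass


@dataclass
class Path:
    weight: int = 0
    local_pref: int = 100      # assumed default
    as_path: tuple = ()
    origin: int = 0            # 0 = IGP, 1 = EGP, 2 = incomplete
    med: int = 0
    from_ebgp: bool = True
    igp_metric: int = 0        # metric to NEXT_HOP
    router_id: int = 0
    cluster_list_len: int = 0
    neighbor_ip: int = 0


def preference_key(p: Path) -> tuple:
    """Smaller tuples are better; 'highest wins' fields are negated."""
    return (
        -p.weight,               # 1. highest WEIGHT
        -p.local_pref,           # 2. highest LOCAL_PREF
        len(p.as_path),          # 3. shortest AS_PATH
        p.origin,                # 4. lowest ORIGIN code
        p.med,                   # 5. lowest MED
        0 if p.from_ebgp else 1, # 6. eBGP over iBGP
        p.igp_metric,            # 7. lowest metric to NEXT_HOP
        p.router_id,             # 8. lowest ROUTER_ID
        p.cluster_list_len,      # 9. shortest CLUSTER_LIST
        p.neighbor_ip,           # 10. lowest neighbor IP address
    )


def best_path(paths):
    return min(paths, key=preference_key)


long_but_preferred = Path(local_pref=200, as_path=(19782, 19782, 87))
short = Path(local_pref=100, as_path=(87,))
print(best_path([long_but_preferred, short]) is long_but_preferred)
```

Because LOCAL_PREF is evaluated before AS_PATH length, a high LOCAL_PREF overrides any amount of AS_PATH prepending.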
Interior Gateway Protocol (IGP)
• Can be OSPF or IS-IS
  – Prefer IS-IS
    • Doesn’t require IP or ANY Layer-III (e.g. CLNS) protocol to work
    • Routes IPv6 with no more effort than IPv4
• Routes ISP infrastructure addresses:
  – Router /32 loopback
  – Point-to-point backbone /30 or /31 subnets
  – (Customer assigned link networks)
  – (IXP assigned address for public peerings)
  – Other infrastructure addresses such as management networks
• Simple and invariant configuration compared with an enterprise network
  – Adjacencies follow physical backbone connectivity
  – Non-backbone interfaces run IGP passively
  – Metric
    • Based on route miles
    • Can be adjusted for traffic engineering purposes
  – Complex sub-divisions (areas, levels) unnecessary
• Principal job of IGP is determining least-cost path between any two routers
Static iBGP; Dynamic IGP
Within Internet2:
1. ‘Next Hop’ for customer prefixes in RIB provided by iBGP and does not change if backbone circuit fails.
2. Entry in FIB provided by IS-IS and depends on backbone connectivity.
Route Reflection
CHIC
HOUS ATLA LOSA NEWY SALT SEAT WASH
KANS
Route Reflection Rules
• Prefix received from a client
  – Reflect to all client peers apart from the sender
  – Reflect to all non-client peers
• Prefix received from a non-client peer
  – Reflect to all client peers
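The reflection rules can be sketched as a set computation. The function `reflect_targets` is a hypothetical name; the peer names reuse router codes from the Internet2 diagram, and the client and non-client assignments are invented for illustration.

```python
def reflect_targets(sender: str, clients: set, non_clients: set) -> set:
    """Return the iBGP peers a route reflector passes a prefix on to."""
    if sender in clients:
        # From a client: every other client plus all non-clients.
        return (clients - {sender}) | non_clients
    # From a non-client: clients only.
    return set(clients)


clients = {"HOUS", "ATLA", "LOSA"}
non_clients = {"KANS"}

print(sorted(reflect_targets("HOUS", clients, non_clients)))
print(sorted(reflect_targets("KANS", clients, non_clients)))
```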
Loop Prevention for RR
A received prefix is ignored when either of the following occurs:
• iBGP speaker receives a prefix with the ORIGINATOR_ID attribute equal to its own ROUTER_ID
• Route reflector receives a prefix with the CLUSTER_LIST attribute containing its own CLUSTER_ID
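Both loop checks fit in one predicate. This is a sketch under the assumption that identifiers are plain strings and that `cluster_id` is `None` on a router that is not a route reflector; `rr_loop_detected` is a hypothetical name.

```python
from typing import Optional, Sequence


def rr_loop_detected(router_id: str,
                     cluster_id: Optional[str],
                     originator_id: Optional[str],
                     cluster_list: Sequence[str]) -> bool:
    """Return True when a reflected prefix has looped back to this router."""
    # Check applied by every iBGP speaker.
    if originator_id is not None and originator_id == router_id:
        return True
    # Check applied only by route reflectors.
    return cluster_id is not None and cluster_id in cluster_list


# A reflector in cluster 1.1.1.1 receives a prefix already reflected by its own cluster.
print(rr_loop_detected("10.0.0.1", "1.1.1.1", "10.0.0.9", ["1.1.1.1"]))
```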
Progress of Route Reflection
1. External Prefix advertised to both RR
2. Prefix reflected to all client peers
3. Prefix passed between RR but ignored due to CLUSTER_LIST attribute
iBGP Tracks the IGP Metric
CHIC
KANS LOSA SEAT SALT HOUS ATLA WASH
NEWY
KANS WASH ATLA HOUS SALT SEAT LOSA
BUFF NEWY
278 978 2363 2363 3019 3932 4068
3212 2932 2019 689 1507 1045 905
1000
1000
128.122.0.0/16 128.122.0.0/16
NREN and Internet2
Metric = 2932
Metric = 2019
Possible Layer-II Connectivity
• Internet2 and NREN routers directly connected
• Connection through a VLAN on a single peering switch
• Connection via VLAN that traverses multiple Layer-II and optical devices
Internet2 Connection at NGIX-W
OC192
OC192
OC192
10GE
GE
10GE
10GE
VLAN 166
VLAN 166
Prefer the 10G-Connected Path
• Internet2 router in Seattle
  – Associate a high LOCAL_PREF with prefixes received on external peering with NREN
• Internet2 router in Salt Lake City
  – Receives NREN prefixes over iBGP peering with Seattle with high LOCAL_PREF
  – Now prefers iBGP to SEAT over eBGP via NGIX-W
Boomerang Prefixes
CHIC
KANS
NEWY
GPN
GLBX
RRMA RRMA
161.46.0.0/16
Boomerang Prefixes 2/2
• Loop is prevented by AS_PATH attribute
• Suppressing boomerang prefixes:
  – Advertise the prefix to the echoing peer
  – Attach NO_EXPORT community to transiting peer
  – Request echoing peer to filter their outbound (sub-optimal)
Redundant Connections
• Multiple Peerings with Internet2
  – Set the MED on prefixes sent to I2
  – Use communities to change the LOCAL_PREF on I2
  – Allow both peerings with I2 to float
• Advertise prefixes both directly to I2 and through another RON
Use of MED: Internet2 & ESNet
128.175/16 [0]
128.175/16 [278]
128.175/16 [1001]
128.175/16 [1000]
278
1000
905
ESNet Send MEDs
131.225/16 [1]
131.225/16 [0]
131.225/16 [1000]
131.225/16 [905]
1000
905
278
Bidirectional MED Causing Asymmetric Routing
Fermi UPenn UPenn Fermi
What If I2 Doesn’t Send MEDs?
• ESNet use their IGP Metric
  – Traffic remains on ESNet backbone only until it reaches a router with an I2 peering.
• ESNet use LOCAL_PREF on One Peering
  – A single (congested?) egress from ESNet for all traffic to Internet2-connected destinations
• Different I2 Prefixes Receive High LOCAL_PREF at Different Locations
  – Same effect as MEDs, but implementation far more manual and complex
Introducing Prefixes Into BGP
1. Use network statement
   1. With auto-summary disabled
   2. With auto-summary configured
2. Configure aggregate routing
3. Use route maps to redistribute
   1. Prefixes learned from an IGP
   2. Static routes
1.1 The network Statement (Cisco)
router bgp 87
 no auto-summary
 network 129.79.0.0
!
ip route 129.79.0.0 255.255.0.0 Null0 200
1. There is no mask following the prefix in the ‘network’ statement as we are advertising a classful network
2. The static route serves two important purposes:
1. The prefix will not be advertised by BGP unless there is an exact match in the IP routing table
2. Traffic sent to non-existent IP addresses in the range will be silently dropped. This avoids wasting bandwidth sending ICMP Unreachables and is a valuable defense against scanners and DDoS attacks.
1.2 Using auto-summary (Cisco)
router bgp 87
 auto-summary
 network 129.79.0.0
1. The prefix will be advertised by BGP providing there is at least one contained prefix in the IP routing table
2. This IGP-learned prefix can be any length; it does not have to match the classful network that BGP is being asked to advertise
3. Use this command:
show ip route 129.79.0.0 255.255.0.0 longer-prefixes
to check whether there is an IGP-learned route. If there are none, then BGP will not advertise the 129.79/16 parent
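The difference between the two `network` statement modes can be modelled with Python's standard `ipaddress` module. This is a sketch of the matching rule only, not of IOS behaviour in full; `advertised` is a hypothetical function and the routing table is a plain list of prefixes.

```python
import ipaddress


def advertised(network: str, routing_table: list, auto_summary: bool) -> bool:
    """Decide whether BGP would originate `network` given the routing table."""
    target = ipaddress.ip_network(network)
    routes = [ipaddress.ip_network(r) for r in routing_table]
    if auto_summary:
        # Any contained prefix, of any length, is enough.
        return any(r.subnet_of(target) for r in routes)
    # Without auto-summary an exact match is required.
    return target in routes


table = ["129.79.9.0/24"]
print(advertised("129.79.0.0/16", table, auto_summary=True))
print(advertised("129.79.0.0/16", table, auto_summary=False))
```

With auto-summary disabled, the supporting static route to Null0 is what supplies the exact match.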
3.2 Redistribute Static (Juniper)
policy-options {
    policy-statement ORIGINATE {
        term Seed {
            from {
                protocol static;
                route-filter 129.79.0.0/16 exact;
            }
            then accept;
        }
    }
}

routing-options {
    static {
        route 129.79.0.0/16 discard;
    }
}
Aggregation Scenario
Universities
Regional Network
Internet2 or NLR
10.100/16 10.200/16 10.300/16
10/8
Prefix Leaking Problem
router bgp 400
 network 10.0.0.0
 neighbor 192.168.1.1 remote-as 100
 neighbor 192.168.1.5 remote-as 200
 neighbor 192.168.1.9 remote-as 300
 neighbor 192.168.1.13 remote-as 500
!
ip route 10.0.0.0 255.0.0.0 Null0 200
Router D
(Faking) Aggregation
• Use the NO_EXPORT Community
  – Not the best choice as the problem and its solution reside in different domains
• Filter outbound prefixes to NLR/Internet2
  – A straightforward robust solution
• Deploy route aggregation
  – Downstream problems could cause routing flaps on peering with Internet2
Configuring Aggregation (Cisco)
router bgp 400
 neighbor 192.168.1.1 remote-as 100
 neighbor 192.168.1.5 remote-as 200
 neighbor 192.168.1.9 remote-as 300
 neighbor 192.168.1.13 remote-as 500
 aggregate-address 10.0.0.0 255.0.0.0 summary-only
Router D
Configuring Aggregation (JunOS)
routing-options {
    aggregate {
        route 10.0.0.0/8 discard passive community 65535:65281;
    }
}

policy-options policy-statement ORIGINATE {
    term AGGREGATE_to_BGP {
        from protocol aggregate;
        then accept;
    }
}

protocols bgp group ISP {
    export ORIGINATE;
    peer-as 400;
    neighbor 192.168.1.13 { … }
}
Global NOC Recommendation
• Use a ‘network’ statement to originate ARIN allocations to Internet2
• Configure a supporting static route
• Disable the ‘auto-summary’ capability
• Filter more specific contained prefixes using an outbound route-map applied to the peering with Internet2 or NLR
Steering Inbound Traffic (1/2)
129.79/16 [19782 87 I] 129.79/16 [19782 19782 19782 87 I]
[210 11537 19782 87 I]
[19401 19782 19782 19782 87 I]
Steering Inbound Traffic (2/2)
129.79/16 129.79.9/24
LOCAL_PREF = 200
LOCAL_PREF = 100
129.79/16
Global Routing Flap Analysis 1/3
[email protected]> show route aspath-regex " .* 47868 .* " table commodity.inet.0

commodity.inet.0: 284903 destinations, 486789 routes (284902 active, 0 holddown, 5 hidden)
+ = Active Route, - = Last Active, * = Both

94.125.216.0/21    *[BGP/170] 05:29:58, localpref 200, from 149.165.255.64
                      AS path: 4323 3257 25512 47868 I
                    > to 149.165.254.25 via ge-0/1/0.1, label-switched-path LL640Lo0.0->CTC640lo0.0
                      to 149.165.254.29 via xe-2/0/0.111, label-switched-path LL640Lo0.0->CTC640lo0.0
                    [BGP/170] 16:42:14, MED 208, localpref 200
                      AS path: 1239 3257 3257 25512 47868 I
                    > to 144.228.154.165 via so-1/0/0.0
47868 25512 3257 4323
Global Routing Flap Analysis 2/3
47868 29113 3257
1299
3356
255*[47868]
Global Routing Flap Analysis 3/3
Graphics used with kind permission of Renesys and Earl Zmijewski
Functions Served by Communities
1. Assign prefixes to pre-defined groups (local significance only)
2. Control how prefix is advertised by peer
3. Control your peer’s LOCAL_PREF for the specific prefix
4. Signal peer to prepend multiple AS numbers to AS_PATH
5. Blackhole all traffic to specific prefix
Expressing A Community
1. A community is just a 32-bit number
2. By convention, the most-significant 16 bits represent an AS number
3. To convert from ‘new’ to ‘old’ formats
   1. Multiply the ‘high’ 16-bits by 2^16
   2. Add the ‘low’ 16-bits to the result
JunOS & Cisco ‘New Format’ 11537:260
Cisco ‘Old Format’ 756089092
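The conversion between the two formats is simple arithmetic, sketched below. The function names `to_old_format` and `to_new_format` are illustrative only.

```python
def to_old_format(community: str) -> int:
    """Convert 'high:low' notation to the single 32-bit decimal number."""
    high, low = (int(part) for part in community.split(":"))
    return high * 2**16 + low


def to_new_format(value: int) -> str:
    """Convert a 32-bit decimal community back to 'high:low' notation."""
    return f"{value >> 16}:{value & 0xFFFF}"


print(to_old_format("11537:260"))  # 756089092, the value shown on the slide
print(to_new_format(756089092))    # 11537:260
```

The same arithmetic shows that the `ip community-list` value 756089743 used in the Cisco blackhole configuration is 11537:911.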
BGP Communities on Internet2
• Classify prefixes
  – Directly connected participants
  – Sponsored
  – SEGP
• Adjust Internet2 LOCAL_PREF
• Request Internet2 to black-hole a prefix
• (Prevent ISP from advertising prefix to specified upstream peers)
Well-Known Communities
Community             Value         Meaning
NO-EXPORT             65535:65281   Do not advertise to any eBGP peer
NO-ADVERTISE          65535:65282   Do not advertise to any peer
NO-EXPORT-SUBCONFED   65535:65283   Do not advertise beyond local AS (confederations only)
NO-PEER               65535:65284   Do not advertise to bi-lateral peer
Using Communities (1/5)
96.4.0.0/15
128.169.0.0/16
192.55.208.0/24
+14048:10 +14048:10
128.169.0.0/16
141.225.0.0/16
192.55.208.0/24
NO-EXPORT Community 172.16.0.0/12
172.16.1.0/24 +NO-EXPORT
172.16.0.0/12
172.16.2.0/24 +NO-EXPORT
172.16.0.0/16
Security Diversion
Null 0
Routers are optimized for packet forwarding; not packet filtering
Routing to Null saves valuable CPU cycles
Inbound Packet
Interface
ACL
Routing Table
Customer-Triggered Blackhole Inbound Prefix
11537:911
Prefix > /24
Discard Forward
No
No
Yes
Yes
Static Route
11537:911
Prefix > /24
Redistribute Into BGP
ISP Customer
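The ISP-side decision in the flowchart can be sketched as follows. The function `blackhole_action` is hypothetical, and the length test assumes "24 bits or longer" to match the `ge 24` prefix-list used in the Cisco IOS configuration.

```python
import ipaddress

BLACKHOLE_COMMUNITY = "11537:911"


def blackhole_action(prefix: str, communities: set) -> str:
    """Return 'discard' for a tagged, sufficiently specific prefix."""
    net = ipaddress.ip_network(prefix)
    tagged = BLACKHOLE_COMMUNITY in communities
    # Assumption: /24 through /32 qualify, as in 'ge 24'.
    specific_enough = net.prefixlen >= 24
    return "discard" if tagged and specific_enough else "forward"


print(blackhole_action("192.168.17.15/32", {"11537:911"}))
print(blackhole_action("192.168.0.0/16", {"11537:911"}))
print(blackhole_action("192.168.17.15/32", set()))
```

The length check stops a customer from blackholing a large block with a single mistaken announcement.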
Customer-Triggered Blackhole (ISP Perspective; Cisco IOS)
interface Null0
 no ip unreachables
!
ip policy-list BLACKHOLE permit
 match ip address prefix-list 24_TO_32
 match community 10
!
ip community-list 10 permit 756089743
!
ip prefix-list 24_TO_32 seq 5 permit 0.0.0.0/0 ge 24
!
route-map CUSTOMER_IN permit 10
 match policy-list BLACKHOLE
 set community no-export
 set interface Null0
Customer-Triggered Blackhole (Customer Perspective; JunOS)
routing-options {
    static {
        route 192.168.17.15/32 {
            discard;
            community 11537:911;
        }
    }
}

policy-options {
    policy-statement ORIGINATE {
        term BLACKHOLE {
            from {
                protocol static;
                route-filter 0.0.0.0/0 prefix-length-range /24-/32;
                community BLACKHOLE;
            }
            then accept;
        }
        term … { … }
    }
}
Customer-Side Policy: IOS vs JunOS
• Juniper
  – Blackhole prefix statically routed to Discard
  – Attach a community tag to the static route
• Cisco IOS
  – Blackhole prefix statically routed to Null0
  – Add the new prefix to the ‘blackhole’ prefix list
  – Existing route-map
    • Redistributes ‘blackhole’ prefix list into BGP
    • Attaches the correct community
Recommended Routing Policy
• Should be implemented
  – Reject any prefix with a private AS in the AS_PATH
  – Reject bogon prefixes (following slide)
• Consider implementing
  – Assign higher LOCAL_PREF to Internet2 or NLR prefixes than to commodity.
  – Max prefixes limit on some peers
LOCAL_PREF Gotcha
149.165.240.64/26
149.165.128.0/17 149.165.128.0/17
LOCAL_PREF = 200
Indiana Gigapop
Bogon Prefixes

Prefix            Reason                        RFC
0.0.0.0/0         Default
10.0.0.0/8        Private                       1918
172.16.0.0/12     Private                       1918
192.168.0.0/16    Private                       1918
127.0.0.0/8       Loopback
169.254.0.0/16    Link Local                    3330
192.0.2.0/24      IANA Reserved
192.88.99.1/32    6 to 4 relay
198.18.0.0/15     Network device benchmarking   2544
224.0.0.0/4       Multicast group addresses     3171
240.0.0.0/4       Class E addresses
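A bogon filter built from this table can be sketched with the standard `ipaddress` module. The function `is_bogon` is a hypothetical name. The sketch rejects the default route and any prefix contained in a listed block; it does not catch a less specific prefix that merely overlaps one.

```python
import ipaddress

BOGONS = [ipaddress.ip_network(p) for p in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # private (RFC 1918)
    "127.0.0.0/8",                                    # loopback
    "169.254.0.0/16",                                 # link local (a /16 block)
    "192.0.2.0/24",                                   # IANA reserved
    "192.88.99.1/32",                                 # 6 to 4 relay
    "198.18.0.0/15",                                  # device benchmarking
    "224.0.0.0/4",                                    # multicast
    "240.0.0.0/4",                                    # class E
)]


def is_bogon(prefix: str) -> bool:
    """Return True when a received prefix should be rejected."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen == 0:
        return True  # default route
    return any(net.subnet_of(bogon) for bogon in BOGONS)


for p in ("10.1.0.0/16", "129.79.0.0/16", "0.0.0.0/0"):
    print(p, is_bogon(p))
```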
BGP Messages
Type Description References
1 Open RFC 4271
2 Update RFC 4271
3 Notification RFC 4271
4 Keepalive RFC 4271
5 Route-Refresh RFC 2918
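The message type sits in a fixed 19-byte header that precedes every BGP message. The layout used here (16-byte marker of all ones, 2-byte length, 1-byte type) comes from RFC 4271 rather than from the slide, and `parse_header` is a hypothetical helper.

```python
import struct

MESSAGE_TYPES = {
    1: "OPEN",
    2: "UPDATE",
    3: "NOTIFICATION",
    4: "KEEPALIVE",
    5: "ROUTE-REFRESH",
}

HEADER = struct.Struct("!16sHB")  # marker, length, type (network byte order)


def parse_header(data: bytes):
    """Return (length, type name) from the first 19 bytes of a message."""
    marker, length, msg_type = HEADER.unpack(data[:HEADER.size])
    if marker != b"\xff" * 16:
        raise ValueError("bad marker")
    return length, MESSAGE_TYPES.get(msg_type, "UNKNOWN")


# A KEEPALIVE consists of the header alone, so its length is 19.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
print(parse_header(keepalive))
```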
The BGP Update Message
Type
Length
Value
Length
Prefix
Length
Prefix Withdrawn Routes
Total Path Attributes Length
Path Attributes
NLRI
× M
× N
× P
BGP Header
Data
Unfeasible Routes Length
N >= M
BGP State Machine
Idle
Open-Sent
Established
Connect
Open-Confirm
Active
BGP Convergence Scenarios
Both BGP processes immediately transition from ‘Established’ to ‘Active’
The router connected via the unaffected circuit blackholes traffic for up to 90 seconds
Anatomy of Brief Outage
1. Link between ESNet and MANLAN goes down
2. ESNet router sends NOTIFICATION which is not received
3. Peering on Internet2 remains Established even though ESNet side is Down
4. Link between ESNet and MANLAN is restored before the hold timer expires on Internet2
5. ESNet router negotiates new TCP virtual circuit with Internet2 router
6. The peering on Internet2 resets
Source-Specific Multicast
S G
G R
(S,G) Join
(S,G) Join
IGMPv3 Join
Multicast Routing
Unicast IPv4 Prefixes
Unicast and Multicast IPv4 Prefixes
Unicast traffic AS400 <> AS200
Multicast traffic AS400 <> AS200