SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated...

20
SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: [email protected] Hamid Ould-Brahim: [email protected] NANOG 74, Vancouver, October 2018

Transcript of SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated...

Page 1: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

SDN-basedAutomatedPeeringOptimizationChallengesandSolutions

RedaLaichi:[email protected]:[email protected]

NANOG74,Vancouver,October2018

Page 2: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 2 Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 2

Agenda

•  DefiningtheProblem•  SDNandAutomation•  UseCases•  SummaryandReferences

Page 3: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 3 Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 3

Internet traffic reality

2000 à 2018 From web browsing to social media, video streaming and online gaming

Internet traffic became much more versatile, dynamic and unpredictable

Traffic

Time

iOS update

New episodes

Viral video

Page 4: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 4 Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 4

Outbound Traffic

Peering Only

Inbound Traffic

Peering & Internal Network

What traffic are we interested in?

What part of the network?

What do we want to address?

Better bandwidth management, automatic congestion resolution, better traffic symmetry

Reducing transit peering cost, addressing OPEX & CAPEX

Better SLAs, & application Performance (Latency, Packet Loss)

Balanced

Problem Space Objectives & applicability...

Page 5: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 5 Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 5

Existing Limitations...

•  Unaware of link capacity & real-time utilization Packet loss and congestion

•  No real-time path performance indication High latency

•  No end-to-end performance indication Sub-optimum overall performance

•  Multiple teams involved (network operations, peer engineers, OAM probing, EMS alarms…)

Slow communication •  Complex, manual processes

Error-prone configurations •  Reactive model

Inadequate for sudden real-time event changes

Routing mechanism - Border Gateway Protocol (BGP)

Organization and tools Visibility

Limited traffic engineering and steering Lack or limited visibility

Page 6: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 6 Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 6

Agenda

•  DefiningtheProblem•  SDNandAutomation•  UseCases•  SummaryandReferences

Page 7: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 7 Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 7

What is being done today? Webscale players leading the way...

Gaming companies Webscale players Existing route controllers/appliances

Latency Source: https://qz.com/790208/how-the-company-behind-league-of-legends-rebuilt-its-own-internet-backbone-so-that-its-faster-for-gamers/

Source: delivery.acm.org, 2017, Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering

Source: research.fb.com, 2017, Engineering Egress with Edge Fabric Steering Oceans of Content to the World

Page 8: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 8 Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 8

The Approach: Ten Thousand Foot View... Closed-Loop Automation for Peering

Intent

Network control & Monitoring Real-time visibility

Automation & Optimization

Information

Action Insight

•  Cost •  Latency •  Bandwidth

Automated

Semi-automated

Page 9: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 9 Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 9

Approach?

Complexity of the “automation” problem?

Peering Analytics & Telemetry

OAM - Probing

Network Data & Monitoring

topology, path, router/interface

discovery

Automatic Real-time Events

Detection

Software Control

Steering, flows, Routes, Policies

Challenge:Existingoperationalenvironmentanditscomplexity,andtakingintoaccountthehumandimension

•  Stats, BGP routes, flow stats, protocol, destination AS, prefixes, application-based?

•  Rates of sampling & collection

•  Ping, TWAMP, TCP/UDP, HTTP, in-band?

•  Internal lsp-ping •  Bi-directionality •  Return-path

Theotherchallengeisinsystemintegration,openinterfacesandmulti-vendor...atscale

Filters, BGP Policies, Route Injection, BGP FlowSpec, Openflow, BGP SR TE Policies & route coloring, RIB/FIB API, RSVP-TE/SR, PCEP, Netconf

BGP-LS, EPE extensions, OSPF, ISIS, BMP, netconf Leveraging recent technology

developments

Page 10: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 10 Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 10

Agenda

•  DefiningtheProblem•  SDNandAutomation•  Usecases•  SummaryandReferences

Page 11: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 11

Stats per source subnet or per destination subnet or per destination AS, per traffic category or per specific APP

Use Case 1: Local Peer Engineering Congestion example for outbound traffic

Internet

ASBR (BR1)

ASBR (BR2)

Transit 1

End users

Content source

Transit 2

Peer

IXP

ASBR (BR3)

AS522

AS521

AS523

PE1

PE2

DC/Hosting ContentASBR (BR4)

!

Analytics SDN

12

Automatic link congestion detection Real-time stat/data collection and correlation (gRPC, Netconf/SNMP) Interface & Flow Stats (IPFix)

Determine optimal alternate peer based on bandwidth availability

Match selected traffic flows at local ASBR based on IP flow/route/AS/5 tuple, or application match (requires transaction-based steering)

Redirect to next hop using FlowSpec, openflow, filters/Netconf, RIB/FIB API, BGP policies)

Problem: 100s of peering partners, which link is best and is least congested?

1

2

Approach: 1 2 3 4

Interface Heatmap

BR1

Congestion Threshold

Target AlternateTraffic Offloading

Page 12: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 12

AnalyticsSDN

Internet

ASBR (BR1)

ASBR (BR2)

Transit 1

End users

Content source

Transit 2

Peer

IXP

ASBR (BR3)

AS522

AS521

AS523

PE1

PE2

DC/Hosting ContentASBR (BR4)

!

Use Case 2: Egress peer engineering

1

3 2

Stats per source subnet or per destination subnet or per destination AS, per traffic category or per specific APP

Establish an LSP/Tunnel to alternate that meets traffic offloading profile

Steer selected traffic directly to LSP /Tunnel (filters, openflow, flowspec, segment routing BGP colored route

Determine optimal alternate ASBR & alternate peer based on topology and bandwidth availability.

Auto-create/use existing tunnel/path to alternate ASBR (can be PCE initiated)

Steer selected IP flows at the edge of the network across the newly/existing tunnel and encode the egress peer link label/segment ID.

Note: Colored BGP route to BGP-SR TE Policy tunnel can be used for steering

Problem: 100s of peering partners, which peer and egress link is best and is least congested?

1

2

Approach:

3

Automatic link congestion detection Real-time stat/data collection and correlation (gRPC, Netconf/SNMP) Interface & Flow Stats (IPFix)

Page 13: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 13

Use Case 3: Locale and EPE with Latency-based Steering Performance-based Optimization End-to-End Problem:

100s of peering partners, BGP best path on default peer causing high latency for top traffic or selected application/branch destination traffic.

1

3 2

In addition to stats, collect latency measurement from PE-ASBR (in-band using LSP ping) and ASBR-Destination point per selected prefix using Ping, TWAMP, TCP/UDP, HTTP

•  Compute end-to-end based on Total latency information

•  Establish an LSP/Tunnel to alternate that meets traffic offloading profile

Steer selected traffic directly to LSP /Tunnel (filters, openflow, flowspec, segment routing BGP colored route)

Determine optimal alternate ASBR based on end-to-end latency/performance data using path computation based on abstract topology Auto-create the LSP/tunnel to alternate ASBR

Real-time probing from each source PE and each alternate ASBRs of discovered top/VIP destinations

Approach:

Auto-steer selected traffic to the LSP/tunnel to alternate ASBR/ASBR+Egress peer link

1

2

3

AnalyticsSDN

Internet

ASBR (BR1)

ASBR (BR2)

Transit 1

End users

Content source

Transit 2

Peer

IXP

ASBR (BR3)

AS522

AS521

AS523

PE1

PE2

DC/Hosting ContentASBR (BR4)

! Latency degradation

Page 14: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 14

Use Case 4: Controlling Inbound Traffic The example of Auto-pilot mode

•  Problem: The links from transit providers are congested due to high incoming traffic. Automate traffic shifting per BGP communities to alternate TP.

•  Solution/Approach: Integrates and automate analytics data showing top BGP community traffic.

•  Monitors bandwidth availability down to Customer devices and performs BGP route extraction and analysis (using BMP) – Extract topology information, LSPs, telemetry

•  Using BGP Policies or route injection, selectively shift customer incoming traffic to alternate peer links or to even to an alternate transit provider

•  Optionally steer selected customer prefixes to specific LSPs

Internet

Analytics

ASBR (BR1)

ASBR (BR2)

TP2

Content source

TP3

TP1

PE1

SDN

Inbound traffic

TP4

Customers/Users

1

Collects interface stats and flow stats per destination AS/destination prefix (IPFIX) Build BGP community real-time utilization

Inject BGP communities or BGP routes to influence traffic shifting (depending on BGP implementation/config.)

2

2

•  Automation hierarchical policies: Automate within the same router, other routers same site, completely different region (slice)

•  Need to factor in the BGP convergence time

Page 15: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 15

Use Case 4: Controlling Inbound Traffic Example of Operator Triggered Change Problem: The links from transit providers are congested due to high incoming traffic.

Peering Policy Intent: Shift top traffic per destination AS/prefix to South/East region and pick and choose which prefix to steer to alternate tunnel once traffic is shifted.

Solution/Approach:

•  Slice the peering/network in terms of automation/optimization zones.

•  Performs BGP route extraction and analysis (using BMP) – Extract topology information (BGP-LS, IGP), discovers LSPs.

•  Using BGP Policies or route injection, selectively shift customer incoming traffic to alternate transit provider by injecting the change to the PEs.

•  Optionally steer selected customer prefixes to specific LSPs

1

Integrates and automate analytics data showing customers destination AS traffic is in the top-5. Collects interface stats and flow stats per destination AS/destination prefix (IPFIX)

Add BGP communities or BGP routes to influence traffic shifting (depending on BGP implementation/config.)

2

2

Internet

Analytics

ASBR (BR1)

ASBR (BR2)

Transit 1

Content source

Transit 2

Peer

PE1

SDN

End users

Inbound traffic

Transit 3South Region/East

North Region/West

Select prefixes and steer to selected LSPs/tunnels. Use BMP to learn all the routes

Automation/optimization slices

Policies set once (match on community set MED...)

Page 16: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 16

Use Case 4: Controlling Inbound Traffic – Behind The Scene Example of Operator Triggered Change

Deployed on Border Router: community ASPATH_PREPEND_1_TP1_NORTH members 1000:12201;

term 40 {

from {

protocol bgp;

community ASPATH_PREPEND_1_TP1_SOUTH;

}

then {

as-path-prepend 1000;

.

From Transit provider 2/Slice6011 to transit provider 2/Slice6012 policy-statement eBGP {

...

from {

protocol bgp;

as-path CUSTOMER1-ASN;

}

then {

community add TP2_SLICE6011_SUPPRESSED;

community add TP1_SUPPRESSED;

community add TP2_SLICE6012_USED;

community add TP2_USED;

accept;

}

community TP1_SLICE6011_SUPPRESSED members 1000:12227;

community TP1_SUPPRESSED members [ 1000:12201 1000:12222 ];

community TP2_SLICE6012_USED members 1000:12176;

community TP2_USED members [ 1000:12151 1000:12171 ];

Automatic Injection

Page 17: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 17

Use Case 5: The Case of Destination AS Optimization

Stats per destination AS, per traffic category or per specific APP

12

Internet

Analytics

ASBR (BR1)

ASBR (BR2)

Transit 1

End usersTransit 2

Peer

IXP

ASBR (BR3)

AS522

AS521

AS523

PE1

PE2

DC/Hosting ContentASBR (BR4)

SDN

UplinksService Provider

Problem:

Hosting provider with congestion uplinks and available alternate network (metro) but limited bandwidth used congestion with no BGP reconfiguration

Determine best next hop and steer selected AS to that next hop.

Real-time traffic utilization monitoring per destination AS.

Approach:

1

2

Build one steering transaction of a set of traffic steering rules and redirect to alternate Next Hop/LSP

Requires next hop tracking capabilities

To ASBR BR4

Page 18: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 18

•  Tracking changes (uniform data retention consideration) •  Culture and operational change considerations •  Think about “Revertive” actions (Undo button)! •  Peering simulation & predictive analysis on multi-dimensional data-set •  Traffic engineering/steering on multi-dimensional data-set •  Network control changes at large scale and real-time or near real-time

telemetry at high frequency. •  Keep what BGP is best at and complement existing functionality

Summary and Other Considerations

Page 19: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 19 Reda Laichi & Hamid Ould-Brahim, NANOG 74, Oct. 2018, 19

References

•  BGP-LS•  https://tools.ietf.org/html/rfc7752

•  PCEP•  https://tools.ietf.org/html/rfc5440

•  SegmentRoutingArchitecture•  https://tools.ietf.org/html/rfc8402

•  InsightDrivenAutomatedNetworkwhitepaper•  https://pages.nokia.com/12018.Insight-driven-automated-networking.html

Page 20: SDN-based Automated Peering Optimization Challenges and … · 2018-10-09 · SDN-based Automated Peering Optimization Challenges and Solutions Reda Laichi: reda.laichi@nokia.com

Questions?

RedaLaichi:[email protected]:[email protected]

20