Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery...
Transcript of Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery...
![Page 1: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/1.jpg)
Blink: Fast Connectivity Recovery Entirely in the Data Plane
Joint work with
Edgar Costa Molero Maria Apostolaki Stefano VissicchioAlberto DainottiLaurent Vanbever
ETH Zürich ETH Zürich University College LondonCAIDA, UC San DiegoETH Zürich
Thomas HolterbachETH Zürich
NSDI26th February 2019
https://blink.ethz.ch
![Page 2: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/2.jpg)
2
![Page 3: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/3.jpg)
www.opte.org
AS level topologyin 2015
3
![Page 4: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/4.jpg)
www.opte.org
AS level topologyin 2015
4
Your network
![Page 5: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/5.jpg)
www.opte.org
AS level topologyin 2015
5
Your networkLocal
![Page 6: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/6.jpg)
www.opte.org
AS level topologyin 2015
6
Remote
Remote
RemoteRemote
Remote
Remote
Your networkLocal
![Page 7: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/7.jpg)
Upon local failures, connectivity can be quickly restored
7
![Page 8: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/8.jpg)
Upon local failures, connectivity can be quickly restored
8
Fast failure detectionusing e.g., hardware-generated signals
Fast traffic reroutingusing e.g., Prefix Independent Convergenceor MPLS Fast Reroute
![Page 9: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/9.jpg)
Upon remote failures, the only way to restore connectivity isto wait for the Internet to converge
9
![Page 10: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/10.jpg)
10
… and the Internet converges very slowly*
*Holterbach et al. SWIFT: Predictive Fast RerouteACM SIGCOMM, 2017
Upon remote failures, the only way to restore connectivity isto wait for the Internet to converge
![Page 11: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/11.jpg)
11
![Page 12: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/12.jpg)
12
0 100 200 300 400 500 600Time difference (s) between the
outage and the first and last withdrawal
0.0
0.2
0.4
0.6
0.8
1.0
CD
F ov
er th
e BG
P pe
ers First
withdrawal
Lastwithdrawal
AS11427
Time difference between the outage
CDF over the BGP peers
Time difference between the outageand the BGP withdrawals (s)
BGP took minutes to converge upon the Time Warner Cable outage in 2014
![Page 13: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/13.jpg)
13
0 100 200 300 400 500 600Time difference (s) between the
outage and the first and last withdrawal
0.0
0.2
0.4
0.6
0.8
1.0
CD
F ov
er th
e BG
P pe
ers First
withdrawal
Lastwithdrawal
AS11427
Time difference between the outage
CDF over the BGP peers
Time difference between the outageand the BGP withdrawals (s)
BGP took minutes to converge upon the Time Warner Cable outage in 2014
![Page 14: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/14.jpg)
Control-plane (e.g., BGP) based techniques typically converge slowlyupon remote outages
14
![Page 15: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/15.jpg)
15
What about using data-plane signals for fast rerouting?
Control-plane (e.g., BGP) based techniques typically converge slowlyupon remote outages
![Page 16: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/16.jpg)
16
Blink: Fast Connectivity Recovery Entirely in the Data Plane
Thomas HolterbachETH Zürich
NSDI26th February 2019
https://blink.ethz.ch
![Page 17: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/17.jpg)
Outline
4. Blink works in practice, on existing devices
1. Why and how to use data-plane signals for fast rerouting
2. Blink infers more than 80% of the failures, often within 1s
3. Blink quickly reroutes traffic to working backup paths
17
![Page 18: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/18.jpg)
Outline
4. Blink works in practice, on existing devices
1. Why and how to use data-plane signals for fast rerouting
2. Blink infers more than 80% of the failures, often within 1s
3. Blink quickly reroutes traffic to working backup paths
18
![Page 19: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/19.jpg)
TCP flows exhibit the same behavior upon failures
19
![Page 20: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/20.jpg)
A:1000
S:500
source destination
20
TCP flows exhibit the same behavior upon failures
![Page 21: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/21.jpg)
A:1000
failure
S:500
source destination
21
TCP flows exhibit the same behavior upon failures
![Page 22: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/22.jpg)
cwnd:4 pkts
S:4100
t
S:3100
S:2100
S:1000
A:1000
failure
(=congestion window)
S:500
source destination
22
TCP flows exhibit the same behavior upon failures
![Page 23: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/23.jpg)
RTO: 200ms cwnd:4 pkts
S:4100
t
S:3100
S:2100
S:1000
A:1000
failure
(=congestion window)
S:500
source destination
23
TCP flows exhibit the same behavior upon failures
Retransmission timeout (RTO) = SRTT + 4∗RTT_VAR
![Page 24: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/24.jpg)
RTO: 200ms cwnd:4 pkts
S:4100
t
t + 200ms cwnd:1
S:3100
S:2100
S:1000
A:1000
failure
(=congestion window)
S:500
source destination
24
TCP flows exhibit the same behavior upon failures
S:1000
Retransmission timeout (RTO) = SRTT + 4∗RTT_VAR
![Page 25: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/25.jpg)
RTO: 200ms cwnd:4 pkts
S:4100
t
t + 200ms cwnd:1
cwnd:1
S:3100
S:2100
S:1000
A:1000
failure
exponential backoff
(=congestion window)
t + 600ms
……
S:500
source destination
25
TCP flows exhibit the same behavior upon failures
S:1000
S:1000
Retransmission timeout (RTO) = SRTT + 4∗RTT_VAR
![Page 26: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/26.jpg)
RTO: 200ms cwnd:4 pkts
S:4100
t
t + 200ms cwnd:1
cwnd:1
S:3100
S:2100
S:1000
A:1000
failure
exponential backoff
(=congestion window)
t + 600ms
…
……
…
t + 1400ms cwnd:1
S:500
source destination
26
TCP flows exhibit the same behavior upon failures
S:1000
S:1000
S:1000
Retransmission timeout (RTO) = SRTT + 4∗RTT_VAR
![Page 27: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/27.jpg)
When multiple flows experience the same failure the signal is a wave of retransmissions
27
![Page 28: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/28.jpg)
*CAIDA equinix-chicagodirection A, 2015
28
Same RTT distributionthan in a real trace*
We simulated a failure affecting100k flows with NS3
When multiple flows experience the same failure the signal is a wave of retransmissions
![Page 29: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/29.jpg)
0 1 2 3 4 5 6 7Time (s)
0
10K
20K
30K
40K
50K
60K
70K
Num
ber o
f ret
rans
mis
sion
s
Failu
re
29
*CAIDA equinix-chicagodirection A, 2015
Same RTT distributionthan in a real trace*
Number ofretransmissions
Time (s)
We simulated a failure affecting100k flows with NS3
When multiple flows experience the same failure the signal is a wave of retransmissions
![Page 30: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/30.jpg)
0 1 2 3 4 5 6 7Time (s)
0
10K
20K
30K
40K
50K
60K
70K
Num
ber o
f ret
rans
mis
sion
s
Failu
re
30
*CAIDA equinix-chicagodirection A, 2015
Same RTT distributionthan in a real trace*
Number ofretransmissions
Time (s)
We simulated a failure affecting100k flows with NS3
When multiple flows experience the same failure the signal is a wave of retransmissions
![Page 31: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/31.jpg)
0 1 2 3 4 5 6 7Time (s)
0
10K
20K
30K
40K
50K
60K
70K
Num
ber o
f ret
rans
mis
sion
s
Failu
re
31
*CAIDA equinix-chicagodirection A, 2015
Same RTT distributionthan in a real trace*
0 1 2 3 4 5 6 7Time (s)
0
10K
20K
30K
40K
50K
60K
70K
Num
ber o
f ret
rans
mis
sion
s
Failu
re
Number ofretransmissions
Time (s)
We simulated a failure affecting100k flows with NS3
When multiple flows experience the same failure the signal is a wave of retransmissions
![Page 32: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/32.jpg)
32
*CAIDA equinix-chicagodirection A, 2015
Same RTT distributionthan in a real trace*
0 1 2 3 4 5 6 7Time (s)
0
10K
20K
30K
40K
50K
60K
70K
Num
ber o
f ret
rans
mis
sion
s
Failu
re
Number ofretransmissions
Time (s)
We simulated a failure affecting100k flows with NS3
When multiple flows experience the same failure the signal is a wave of retransmissions
![Page 33: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/33.jpg)
33
*CAIDA equinix-chicagodirection A, 2015
Same RTT distributionthan in a real trace*
We simulated a failure affecting100k flows with NS3
When multiple flows experience the same failure the signal is a wave of retransmissions
0 1 2 3 4 5 6 7Time (s)
0
10K
20K
30K
40K
50K
60K
70K
Num
ber o
f ret
rans
mis
sion
s
Failu
re
Time (s)
Number ofretransmissions
![Page 34: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/34.jpg)
34
*CAIDA equinix-chicagodirection A, 2015
Same RTT distributionthan in a real trace*
0 1 2 3 4 5 6 7Time (s)
0
10K
20K
30K
40K
50K
60K
70K
Num
ber o
f ret
rans
mis
sion
s
Failu
re
Time (s)
Number ofretransmissions
We simulated a failure affecting100k flows with NS3
When multiple flows experience the same failure the signal is a wave of retransmissions
![Page 35: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/35.jpg)
Outline
4. Blink works in practice, on existing devices
1. Why and how to use data-plane signals for fast rerouting
2. Blink infers more than 80% of the failures, often within 1s
3. Blink quickly reroutes traffic to working backup paths
35
![Page 36: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/36.jpg)
To detect failures, Blink looks at TCP retransmissions
36
![Page 37: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/37.jpg)
To detect failures, Blink looks at TCP retransmissionsProblem: TCP retransmissions can be unrelated to a failure (i.e., noise)
37
![Page 38: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/38.jpg)
38
number of retransmissions
Time
To detect failures, Blink looks at TCP retransmissionsProblem: TCP retransmissions can be unrelated to a failure (i.e., noise)
![Page 39: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/39.jpg)
39
number of retransmissions
Time
congestions
To detect failures, Blink looks at TCP retransmissionsProblem: TCP retransmissions can be unrelated to a failure (i.e., noise)
![Page 40: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/40.jpg)
40
number of retransmissions
Time
one "bogus" flow
To detect failures, Blink looks at TCP retransmissionsProblem: TCP retransmissions can be unrelated to a failure (i.e., noise)
![Page 41: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/41.jpg)
41
number of retransmissions
Time
failure
To detect failures, Blink looks at TCP retransmissionsProblem: TCP retransmissions can be unrelated to a failure (i.e., noise)
![Page 42: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/42.jpg)
Solution #1: Blink looks at consecutive packets with the same sequence number
![Page 43: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/43.jpg)
RTO: 200ms cwnd:4 pkts
S:4100
t
t + 200ms cwnd:1
cwnd:1
S:3100
S:2100
S:1000
A:1000
failure
exponential backoff
(=congestion window)
t + 600ms
…
……
…
t + 1400ms cwnd:1
S:500
source destination
S:1000
S:1000
S:1000
Retransmission timeout (RTO) = SRTT + 4∗RTT_VAR
Solution #1: Blink looks at consecutive packets with the same sequence number
![Page 44: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/44.jpg)
44
Solution #2: Blink monitors the number of flows experiencingretransmissions over time using a sliding window
![Page 45: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/45.jpg)
45
number of retransmissions
Time
number of flowsexperiencingretransmissions
Solution #2: Blink monitors the number of flows experiencingretransmissions over time using a sliding window
congestionsone "bogus" flow
failure
![Page 46: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/46.jpg)
46
number of retransmissions
Time
number of flowsexperiencingretransmissions
Solution #2: Blink monitors the number of flows experiencingretransmissions over time using a sliding window
congestionsone "bogus" flow
failure
![Page 47: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/47.jpg)
47
number of retransmissions
Time
number of flowsexperiencingretransmissions
Solution #2: Blink monitors the number of flows experiencingretransmissions over time using a sliding window
congestionsone "bogus" flow
failure
![Page 48: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/48.jpg)
48
number of retransmissions
Time
number of flowsexperiencingretransmissions
Solution #2: Blink monitors the number of flows experiencingretransmissions over time using a sliding window
congestionsone "bogus" flow
failure
![Page 49: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/49.jpg)
49
number of retransmissions
Time
number of flowsexperiencingretransmissions
Solution #2: Blink monitors the number of flows experiencingretransmissions over time using a sliding window
congestionsone "bogus" flow
failure
![Page 50: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/50.jpg)
50
number of retransmissions
Time
number of flowsexperiencingretransmissions
Solution #2: Blink monitors the number of flows experiencingretransmissions over time using a sliding window
congestionsone "bogus" flow
failure
![Page 51: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/51.jpg)
51
number of retransmissions
Time
number of flowsexperiencingretransmissions
800ms
congestionsone "bogus" flow
failure
Solution #2: Blink monitors the number of flows experiencingretransmissions over time using a sliding window
![Page 52: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/52.jpg)
52
number of retransmissions
Time
number of flowsexperiencingretransmissions
congestionsone "bogus" flow
failure
Solution #2: Blink monitors the number of flows experiencingretransmissions over time using a sliding window
![Page 53: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/53.jpg)
53
Blink is intended to run in programmable switches
![Page 54: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/54.jpg)
54
Blink is intended to run in programmable switchesProblem: those switches have very limited resources
![Page 55: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/55.jpg)
Solution #1: Blink focuses on the popular prefixes,i.e., the ones that attract data traffic
55
![Page 56: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/56.jpg)
56
As Internet traffic follows a Zipf-like distribution* (1k pref. account for >50%),Blink covers the vast majority of the Internet traffic
*Sarra et al. Leveraging Zipf’s Law for Traffic offloadingACM CCR, 2012
Solution #1: Blink focuses on the popular prefixes,i.e., the ones that attract data traffic
![Page 57: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/57.jpg)
Solution #2: Blink monitors a sample of the flowsfor each monitored prefix
57
TCP flows
Traffic to a destination prefix
![Page 58: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/58.jpg)
58
TCP flows
Traffic to a destination prefix
Solution #2: Blink monitors a sample of the flowsfor each monitored prefix
default 64 flowsmonitored
![Page 59: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/59.jpg)
To monitor active flows, Blink evicts a flow from the sampleif it does not send a packet for a given time (default 2s)
59
![Page 60: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/60.jpg)
60
and selects a new one in a first-seen, first-selected manner
To monitor active flows, Blink evicts a flow from the sampleif it does not send a packet for a given time (default 2s)
![Page 61: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/61.jpg)
61
Blink infers a failure for a prefix when the majority ofthe monitored flows experience retransmissions
![Page 62: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/62.jpg)
62
Time
number of flowsexperiencingretransmissions
Blink infers a failure for a prefix when the majority ofthe monitored flows experience retransmissions
![Page 63: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/63.jpg)
63
Time
number of flowsexperiencingretransmissions
32
FAILURE
Blink infers a failure for a prefix when the majority ofthe monitored flows experience retransmissions
![Page 64: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/64.jpg)
We evaluated Blink failure inference using 15 real traces,13 from CAIDA, 2 from MAWI, covering a total of 15.8 hours
![Page 65: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/65.jpg)
We evaluated Blink failure inference using 15 real traces,13 from CAIDA, 2 from MAWI, covering a total of 15.8 hours
We are interested in:
Accuracy: True Positive Rate vs False Positive Rate
Speed: How long does Blink take to infer failures
![Page 66: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/66.jpg)
As we do not have ground truth, we generated synthetic tracesfollowing the traffic characteristics extracted from the real traces
66
![Page 67: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/67.jpg)
67
Step #1 - We extracted the RTT, Packet rate, Flow durationfrom the real traces
Step #2 - We used NS3 to replay these flowsand simulate a failure
Step #3 - We ran a Python-based version of Blinkon the resulting traces
As we do not have ground truth, we generated synthetic tracesfollowing the traffic characteristics extracted from the real traces
![Page 68: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/68.jpg)
12
34
56
78
910
1112
1314
15Trace ID
0.0
0.2
0.4
0.6
0.8Tr
ue P
ositi
ve R
ate
68
Blink failure inference accuracy is above 80% for 13 real traces out of 15
Real traces ID
True Positive Rate
![Page 69: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/69.jpg)
12
34
56
78
910
1112
1314
15Trace ID
0.0
0.2
0.4
0.6
0.8Tr
ue P
ositi
ve R
ate
69
Real traces ID
True Positive Rate
Blink failure inference accuracy is above 80% for 13 real traces out of 15
![Page 70: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/70.jpg)
70
Blink avoids incorrectly inferring failures when packet loss is below 4%
packet loss % 1 2 3 4 5 8 9…
False Positive Rate
![Page 71: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/71.jpg)
71
Blink avoids incorrectly inferring failures when packet loss is below 4%
packet loss % 1 2 3 4 5 8 9
0 0 0 0.67 0.67 1.3 2.7False Positive Rate …
…
![Page 72: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/72.jpg)
72
Blink infers a failure within 1s for the majority of the cases
12
34
56
78
910
1112
1314
15Trace ID
0
2
4
6Sp
eed
(s)
1s
Real traces ID
Speed (s)
![Page 73: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/73.jpg)
12
34
56
78
910
1112
1314
15Trace ID
0
2
4
6Sp
eed
(s)
1s
73
Blink infers a failure within 1s for the majority of the cases
Real traces ID
Speed (s)
![Page 74: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/74.jpg)
Outline
4. Blink works in practice, on existing devices
1. Why and how to use data-plane signals for fast rerouting
2. Blink infers more than 80% of the failures, often within 1s
3. Blink quickly reroutes traffic to working backup paths
74
![Page 75: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/75.jpg)
75
Upon detection of a failure, Blink immediately activatesbackup paths pre-populated by the control-plane
![Page 76: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/76.jpg)
Problem: since the rerouting is done entirely in the data-plane,Blink cannot prevent forwarding issues
![Page 77: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/77.jpg)
77
AS3 (backup #2)
AS2(backup #1)
AS1 (primary)
Blink
destination
Problem: since the rerouting is done entirely in the data-plane,Blink cannot prevent forwarding issues
AS4
![Page 78: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/78.jpg)
Problem: since the rerouting is done entirely in the data-plane,Blink cannot prevent forwarding issues
AS3 (backup #2)
AS2(backup #1)
AS1 (primary)
Blink
destination
AS4
![Page 79: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/79.jpg)
Problem: since the rerouting is done entirely in the data-plane,Blink cannot prevent forwarding issues
AS3 (backup #2)
AS2(backup #1)
AS1 (primary)
Blink
destination
AS4
BLACKHOLE
![Page 80: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/80.jpg)
Problem: since the rerouting is done entirely in the data-plane,Blink cannot prevent forwarding issues
AS3 (backup #2)
AS2(backup #1)
AS1 (primary)
Blink
destination
AS4
LOOP
![Page 81: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/81.jpg)
81
Solution: As for failures, Blink uses data-plane signalsto pick a working backup path
![Page 82: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/82.jpg)
AS3 (backup #2)
AS2(backup #1)
AS1 (primary)
Blink
destination
AS4
32 monitoredflows
32 monitored flows +the non-monitored ones
The probing periodlasts up to 1s
Solution: As for failures, Blink uses data-plane signalsto pick a working backup path
![Page 83: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/83.jpg)
AS3 (backup #2)
AS2(backup #1)
AS1 (primary)
Blink
destination
AS4
Solution: As for failures, Blink uses data-plane signalsto pick a working backup path
![Page 84: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/84.jpg)
As for failures, Blink compares the sequence number ofconsecutive packets to detect blackholes or loops*
84
*See the paper for an evaluation of the rerouting
![Page 85: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/85.jpg)
Outline
4. Blink works in practice, on existing devices
1. Why and how to use data-plane signals for fast rerouting
2. Blink infers more than 80% of the failures, often within 1s
3. Blink quickly reroutes traffic to working backup paths
85
![Page 86: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/86.jpg)
86
We ran Blink on the 15 real traces (15.8 hours)
![Page 87: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/87.jpg)
87
We ran Blink on the 15 real traces (15.8 hours)and it detected 6 outages, each affecting at least 42% of all the flows
![Page 88: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/88.jpg)
88
On current programmable switches, Blink supports up to 10k prefixes
![Page 89: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/89.jpg)
On current programmable switches, Blink supports up to 10k prefixes
89
Number of prefixes
Memory
![Page 90: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/90.jpg)
90
Number of prefixes
Memory
1 pref.
6418 bits
On current programmable switches, Blink supports up to 10k prefixes
![Page 91: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/91.jpg)
91
Number of prefixes
Memory
1 pref.
6418 bits
8 Mb
10k pref.
On current programmable switches, Blink supports up to 10k prefixes
![Page 92: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/92.jpg)
Blink works on a real Barefoot Tofino switch
![Page 93: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/93.jpg)
Blink works on a real Barefoot Tofino switch
TOFINO
![Page 94: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/94.jpg)
Blink works on a real Barefoot Tofino switch
TOFINO
Source
Destination
![Page 95: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/95.jpg)
Number of packetsevery 100ms
Time (s)
RTTs in [10ms; 300ms]
95
Blink works on a real Barefoot Tofino switch
![Page 96: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/96.jpg)
Number of packetsevery 100ms
Time (s)
RTTs in [10ms; 300ms]
96
Blink works on a real Barefoot Tofino switch
![Page 97: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/97.jpg)
Number of packetsevery 100ms
Time (s)
RTTs in [10ms; 300ms]
97
1.1s
Blink works on a real Barefoot Tofino switch
![Page 98: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/98.jpg)
Blink: Fast Connectivity Recovery Entirely in the Data Plane
Infers failures from data-plane signalswith more than 80% accuracy, and often within 1s
Fast reroutes traffic at line rateto working backup paths
Works on real traffic traces and on existing devices
https://blink.ethz.ch
![Page 99: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/99.jpg)
Blink: Fast Connectivity Recovery Entirely in the Data Plane
Joint work with
Edgar Costa Molero Maria Apostolaki Stefano VissicchioAlberto DainottiLaurent Vanbever
ETH Zürich ETH Zürich University College LondonCAIDA, UC San DiegoETH Zürich
Thomas HolterbachETH Zürich
NSDI26th February 2019
![Page 100: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/100.jpg)
0 1 2 3 4 5 6 7Time (s)
0
10K
20K
30K
40K
50K
60K
Num
ber o
f ret
rans
mis
sion
s
Failu
re
100
*CAIDA equinix-chicagodirection A, 2015
Same RTT distributionthan in a real trace*
Time (s)
Number ofretransmissions
We simulated a failure affecting100k flows with NS3
When multiple flows experience the same failure the signal is a wave of retransmissions
RTT x1.5
![Page 101: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/101.jpg)
0 1 2 3 4 5 6 7Time (s)
0
10K
20K
30K
40K
50K
60K
Num
ber o
f ret
rans
mis
sion
s
Failu
re
101
Time (s)
When multiple flows experience the same failure the signal is a wave of retransmissions
RTT x1.5
0 1 2 3 4 5 6 7Time (s)
0
10K
20K
30K
40K
50K
60K
70K
Num
ber o
f ret
rans
mis
sion
s
Failu
re
![Page 102: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/102.jpg)
102
Blink failure inference accuracy is close to a best case scenario,and is above 80% for 13 real traces out of 15
12
34
56
78
910
1112
1314
15Trace ID
0.0
0.2
0.4
0.6
0.8
1.0
True
Pos
itive
Rat
e
"best case", i.e.,no sampling butthreshold still 32
12
34
56
78
910
1112
1314
15Trace ID
0.0
0.2
0.4
0.6
0.8
1.0
True
Pos
itive
Rat
eBlink
![Page 103: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/103.jpg)
12
34
56
78
910
1112
1314
15Trace ID
0
2
4
6Sp
eed
(s)
1s
103
Blink infers a failure within 1s for the majority of the cases
"best case", i.e.,no sampling butthreshold still 32
Blink
Real traces ID
Speed (s)
![Page 104: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/104.jpg)
104
Blink avoids incorrectly inferring failures when packet loss is below 4%
packet loss % 1 2 3 4 5 8 9
0 0 0 0.67 0.67 1.3 2.7
False Positive Rate
Blink
no sampling butthreshold still 32 59 85 93 94 95 97
…
…
… 98
![Page 105: Blink: Fast Connectivity Recovery Entirely in the Data Plane · Blink: Fast Connectivity Recovery Entirely in the Data Plane Joint work with Edgar Costa Molero Maria Apostolaki Stefano](https://reader033.fdocuments.in/reader033/viewer/2022060318/5f0c9fee7e708231d43654ee/html5/thumbnails/105.jpg)
Blink quickly infers and avoids forwarding loops
Time (s)
Number of packetsevery 100ms