Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the...

65
Pantheon: the training ground for Internet congestion-control research https://pantheon.stanford.edu Francis Y. Yan , Jestin Ma , Greg Hill , Deepti Raghavan , Riad S. Wahby , Philip Levis , Keith Winstein Stanford University, Massachusetts Institute of Technology May 15, 2018 Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 1 / 44

Transcript of Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the...

Page 1: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Pantheon: the training ground for Internetcongestion-control researchhttps://pantheon.stanford.edu

Francis Y. Yan†, Jestin Ma†, Greg Hill†, Deepti Raghavan¶,Riad S. Wahby†, Philip Levis†, Keith Winstein†

†Stanford University, ¶Massachusetts Institute of Technology

May 15, 2018

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 1 / 44

Page 2: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Congestion control

Most important topic in computer networking

Avoid congestion collapse

Allocate resources among users

Affect every application using TCP socket

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 2 / 44

Page 3: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Status quo of congestion control research

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 3 / 44

Page 4: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Status quo of congestion control research

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 3 / 44

Page 5: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Status quo of congestion control research

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 3 / 44

Page 6: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Status quo of congestion control research

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 3 / 44

Page 7: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Inconsistent behaviors

Better

Figure: AWS Brazil to Colombia (cellular, 1 flow, 3 trials, P1392)

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 4 / 44

Page 8: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Inconsistent behaviors

Figure: a figure presented in the paper(flows share the bandwidth fairly)

Figure: between two real nodes onPantheon (throughput ratio 32:4:1)

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 5 / 44

Page 9: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Challenges and problems

Every emerging algorithm claims to be the “state-of-the-art”

... compared with other algorithms that they picked

... evaluated on their own testbeds in real world

... and/or on simulators/emulators with their settings

... based on the specific results that they collected

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 6 / 44

Page 10: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Challenges and problems

... compared with other algorithms that they picked

=⇒ must acquire, compile, and execute prior algorithms

... evaluated on their own testbeds in real world

=⇒ large service operators: risky to deploy, long turnaround time

=⇒ researchers: at much smaller scale, results may not generalize

... and/or on simulators/emulators with their settings

=⇒ how to configure the settings?

... based on the specific results that they collected

=⇒ but the Internet is diverse and evolving

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 7 / 44

Page 11: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Challenges and problems

... compared with other algorithms that they picked

=⇒ must acquire, compile, and execute prior algorithms

... evaluated on their own testbeds in real world

=⇒ large service operators: risky to deploy, long turnaround time

=⇒ researchers: at much smaller scale, results may not generalize

... and/or on simulators/emulators with their settings

=⇒ how to configure the settings?

... based on the specific results that they collected

=⇒ but the Internet is diverse and evolving

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 7 / 44

Page 12: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Challenges and problems

... compared with other algorithms that they picked

=⇒ must acquire, compile, and execute prior algorithms

... evaluated on their own testbeds in real world

=⇒ large service operators: risky to deploy, long turnaround time

=⇒ researchers: at much smaller scale, results may not generalize

... and/or on simulators/emulators with their settings

=⇒ how to configure the settings?

... based on the specific results that they collected

=⇒ but the Internet is diverse and evolving

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 7 / 44

Page 13: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Challenges and problems

... compared with other algorithms that they picked

=⇒ must acquire, compile, and execute prior algorithms

... evaluated on their own testbeds in real world

=⇒ large service operators: risky to deploy, long turnaround time

=⇒ researchers: at much smaller scale, results may not generalize

... and/or on simulators/emulators with their settings

=⇒ how to configure the settings?

... based on the specific results that they collected

=⇒ but the Internet is diverse and evolving

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 7 / 44

Page 14: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Challenges and problems

... compared with other algorithms that they picked

=⇒ must acquire, compile, and execute prior algorithms

... evaluated on their own testbeds in real world

=⇒ large service operators: risky to deploy, long turnaround time

=⇒ researchers: at much smaller scale, results may not generalize

... and/or on simulators/emulators with their settings

=⇒ how to configure the settings?

... based on the specific results that they collected

=⇒ but the Internet is diverse and evolving

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 7 / 44

Page 15: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Challenges and problems

... compared with other algorithms that they picked

=⇒ must acquire, compile, and execute prior algorithms

... evaluated on their own testbeds in real world

=⇒ large service operators: risky to deploy, long turnaround time

=⇒ researchers: at much smaller scale, results may not generalize

... and/or on simulators/emulators with their settings

=⇒ how to configure the settings?

... based on the specific results that they collected

=⇒ but the Internet is diverse and evolving

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 7 / 44

Page 16: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 8 / 44

Page 17: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 8 / 44

Page 18: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Pantheon: a community evaluation platform for congestioncontrol research

a common reference set of benchmark algorithms

a diverse testbed of network nodes

Cellular and wired: U.S., Mexico, Brazil, Colombia, India, ChinaWired networks only: U.K., Australia, Japan, Korea

a collection of calibrated emulators and pathological emulatednetworks

a continous-testing system and a public archive of results athttps://pantheon.stanford.edu

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 9 / 44

Page 19: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Pantheon: a community evaluation platform for congestioncontrol research

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 9 / 44

Page 20: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Our vision

XXX outperforms 15 algorithms on Pantheon

XXX shows 10x performance improvement on Pantheon

results (P1392, P950, ...) are available on Pantheon

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 10 / 44

Page 21: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Pantheon: a community resource

A common language in congestion control

shared testbedsbenchmark algorithmspublic data

A training ground for congestion control

enables faster innovation and more reproducible researche.g., Vivace, Copa (NSDI ’18), and Indigo: data-driven congestioncontrol based on neural networks

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 11 / 44

Page 22: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Introduction

Pantheon: a community resource

A common language in congestion control

shared testbedsbenchmark algorithmspublic data

A training ground for congestion control

enables faster innovation and more reproducible researche.g., Vivace, Copa (NSDI ’18), and Indigo: data-driven congestioncontrol based on neural networks

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 11 / 44

Page 23: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Pantheon: a community evaluation platform for congestion control

Outline

1 Introduction

2 Pantheon: a community evaluation platform for congestion control

3 Calibrated Emulators

4 Ongoing projectsVivace, Copa, and moreIndigo

5 Conclusion

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 12 / 44

Page 24: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Pantheon: a community evaluation platform for congestion control

A collection of congestion-control algorithms, eachexposing the same interface

15+ algorithms and designed to be easily extended

TCP Cubic, TCP Vegas, TCP BBR, QUIC Cubic, LEDBAT, WebRTC(media), Sprout, Remy, Verus, PCC, SCReAM, FillP, Copa, Vivace,Indigo, ...

Common testing interface

A full-throttle flow that runs until killed

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 13 / 44

Page 25: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Pantheon: a community evaluation platform for congestion control

Operation and testing methods

Runs multiple benchmarks per week on each wired and wireless pathsin both directions

Each benchmark runs all schemes in round-robin for multiple times

... with single flow for 30 seconds or multiple flows

Calculates mean throughput, 95th-percentile one-way delay, loss rate

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 14 / 44

Page 26: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Pantheon: a community evaluation platform for congestion control

Findings

Measurement study from more than a year of data

Comparative analysis: which scheme an end host should run

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 15 / 44

Page 27: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Pantheon: a community evaluation platform for congestion control

Key finding 1: scheme performance varies by path

Better

Figure: AWS Brazil to Colombia(cellular, 1 flow, 3 trials, P1392)

Better

Figure: Stanford to AWS California(cellular, 1 flow, 3 trials, P950)

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 16 / 44

Page 28: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Pantheon: a community evaluation platform for congestion control

Key finding 2: scheme performance varies by path direction

Better

Figure: AWS Brazil to Colombia(cellular, 1 flow, 3 trials, P1392)

Better

Figure: Colombia to AWS Brazil(cellular, 1 flow, 3 trials, P1391)

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 17 / 44

Page 29: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Pantheon: a community evaluation platform for congestion control

Key finding 3: scheme performance varies in time

Better

Figure: AWS Brazil to Colombia (cellular, 1 flow, 3 trials, filled dots showperformance after 2 days)

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 18 / 44

Page 30: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Pantheon: a community evaluation platform for congestion control

Limitations

Only tests schemes at full throttle

Nodes are not necessarily representative

No available Wi-Fi connections currently

Does not measure interactions between different schemes (ongoingcollaboration with CMU)

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 19 / 44

Page 31: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Calibrated Emulators

Outline

1 Introduction

2 Pantheon: a community evaluation platform for congestion control

3 Calibrated Emulators

4 Ongoing projectsVivace, Copa, and moreIndigo

5 Conclusion

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 20 / 44

Page 32: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Calibrated Emulators

Motivations

Simulation/emulation: reproducible and allow rapid experimentation

ns-2/ns-3, Mininet, Mahimahi, etc.

fine-grained and detailed, providing a number of parameters

do not necessarily capture real-world paths

Open problem

What is the choice of parameter values to faithfully emulate a particulartarget network?

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 21 / 44

Page 33: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Calibrated Emulators

Motivations

Simulation/emulation: reproducible and allow rapid experimentation

ns-2/ns-3, Mininet, Mahimahi, etc.

fine-grained and detailed, providing a number of parameters

do not necessarily capture real-world paths

Open problem

What is the choice of parameter values to faithfully emulate a particulartarget network?

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 21 / 44

Page 34: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Calibrated Emulators

New figure of merit for network emulators

Replication error

Average difference of the performance of a set of transport algorithms runover the emulator compared with over the target real network path.

Better

Figure: Filled dots: real results over a network path; open dots: results over anemulator.

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 22 / 44

Page 35: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Calibrated Emulators

New figure of merit for network emulators

Replication error

Average difference of the performance of a set of transport algorithms runover the emulator compared with over the target real network path.

Better

Figure: Filled dots: real results over a network path; open dots: results over anemulator.

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 22 / 44

Page 36: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Calibrated Emulators

Emulator characteristics

Five parameters

a bottleneck link rate

a constant propagation delay

a DropTail threshold for the sender’s queue

a loss rate (per-packet, i.i.d)

a bit that selects constant rate or Poisson-governed rate

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 23 / 44

Page 37: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Calibrated Emulators

Automatically calibrate emulators to match a network path

Collect a set of results over a particular network path on Pantheon

average throughput and 95th percentile delay of a dozen algorithms

Run Bayesian optimization

standard black-box optimization approach

designed for optimizing a noisy function

when observations are expensive to obtain

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 24 / 44

Page 38: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Calibrated Emulators

Automatically calibrate emulators to match a network path

Collect a set of results over a particular network path on Pantheon

average throughput and 95th percentile delay of a dozen algorithms

Run Bayesian optimization

standard black-box optimization approach

designed for optimizing a noisy function

when observations are expensive to obtain

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 24 / 44

Page 39: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Calibrated Emulators

Bayesian optimization details

Input x : <rate, propagation delay, queue size, loss rate>

Run twice: constant rate and Poisson-governed rate

Objective function f (x): mean replication error

Other common settings:

Prior: Gaussian processAcquisition function: expected improvement

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 25 / 44

Page 40: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Calibrated Emulators

Results of calibrated emulators

Trained emulators calibrated to 6 of Pantheon’s paths

Nepal (Wi-Fi), Colombia (cellular), Mexico (cellular), China (wired),India (wired), and Mexico (wired)including single flow and three flows for Mexico (wired)

Each for about 2 hours on 30 machines with 4 cores each

Replication error is within 17% on average

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 26 / 44

Page 41: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Calibrated Emulators

Results of calibrated emulators

Trained emulators calibrated to 6 of Pantheon’s paths

Nepal (Wi-Fi), Colombia (cellular), Mexico (cellular), China (wired),India (wired), and Mexico (wired)including single flow and three flows for Mexico (wired)

Each for about 2 hours on 30 machines with 4 cores each

Replication error is within 17% on average

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 26 / 44

Page 42: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Calibrated Emulators

Representative calibration result

Better

Figure: AWS California to Mexico (wired, 3 flows, 10 trials, P1237).Mean replication error: 14.4%.

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 27 / 44

Page 43: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects

Outline

1 Introduction

2 Pantheon: a community evaluation platform for congestion control

3 Calibrated Emulators

4 Ongoing projectsVivace, Copa, and moreIndigo

5 Conclusion

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 28 / 44

Page 44: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Vivace, Copa, and more

Pantheon’s contribution to Vivace, Copa, ...

Vivace (NSDI ’18): validating a new scheme in the real world

Copa (NSDI ’18): iterative design with measurements

Other ongoing projects:

Mixed-scheme multi-flow measurements (CMU)FillP (Huawei)

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 29 / 44

Page 45: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Indigo: a machine-learned congestion-control algorithm

Trained on data gathered by Pantheon

=⇒ Pantheon enables entirely new designs

Uses a recurrent neural network (LSTM) to encode the algorithm

Imitates a congestion-control oracle

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 30 / 44

Page 46: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Indigo: a machine-learned congestion-control algorithm

Trained on data gathered by Pantheon

=⇒ Pantheon enables entirely new designs

Uses a recurrent neural network (LSTM) to encode the algorithm

Imitates a congestion-control oracle

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 30 / 44

Page 47: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Model and imitation learning

Agent: sender

Environment: network

State: congestion signals

Action: congestion window adjustment

Indigo’s goal

Learn to imitate an expert’s behavior: state?−→ action

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 31 / 44

Page 48: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Model and imitation learning

Agent: sender

Environment: network

State: congestion signals

Action: congestion window adjustment

Indigo’s goal

Learn to imitate an expert’s behavior: state?−→ action

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 31 / 44

Page 49: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Expert policy

Congestion-control oracle

Brings congestion window size closer to the ideal size

Does not know ideal size in real world — only exists in emulators

What is the ideal size?

BDP: simple links with a fixed bandwidth and min RTT

Search around BDP otherwise

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 32 / 44

Page 50: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Expert policy

Congestion-control oracle

Brings congestion window size closer to the ideal size

Does not know ideal size in real world — only exists in emulators

What is the ideal size?

BDP: simple links with a fixed bandwidth and min RTT

Search around BDP otherwise

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 32 / 44

Page 51: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Indigo: design and training

Model

Input: a state

→ 1-layer LSTM network with 32 hidden units per layer

→ Softmax classifier with cross-entropy loss

→ Output: an action

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 33 / 44

Page 52: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Indigo: design and training

Step

10 ms (congestion window is adjusted every 10 ms)

but observations occur each time an ACK is received

State

An EWMA of the queuing delay

An EWMA of the sending rate

An EWMA of the receiving rate

The current congestion window size

The previous action taken

Action

÷2,−10 (packets),+0,+10 (packets),×2

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 34 / 44

Page 53: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Indigo: design and training

Step

10 ms (congestion window is adjusted every 10 ms)

but observations occur each time an ACK is received

State

An EWMA of the queuing delay

An EWMA of the sending rate

An EWMA of the receiving rate

The current congestion window size

The previous action taken

Action

÷2,−10 (packets),+0,+10 (packets),×2

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 34 / 44

Page 54: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Indigo: design and training

Step

10 ms (congestion window is adjusted every 10 ms)

but observations occur each time an ACK is received

State

An EWMA of the queuing delay

An EWMA of the sending rate

An EWMA of the receiving rate

The current congestion window size

The previous action taken

Action

÷2,−10 (packets),+0,+10 (packets),×2

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 34 / 44

Page 55: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Indigo: design and training

Training

State-of-the-art algorithm: DAgger

Like supervised learning but in a repeated fashion

Distributed

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 35 / 44

Page 56: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Indigo: design and training

Global LSTMNetwork

Worker 2 Worker NLocal LSTMNetwork

Sender

ReceiverNetworkoracle

1. sync localparameters

2. startflow

3.c get action1

3.b getexpert action1

4. enqueuedata, finish iteration

1repeat step 3every 10ms

3.a get state1

5.

5.

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 36 / 44

Page 57: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Method

Trained on 6 calibrated emulators

help mitigate the “distribution mismatch”

and on 24 synthetic emulators

all combinations of (5, 10, 20, 50, 100, 200 Mbps) link rate with (10,20, 40, 80 ms) min one-way delayinfinite queues and no loss

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 37 / 44

Page 58: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Compute cost

Training

30 worker machines with 4 cores eachthe training server has 8 cores and an Nvidia Tesla K80 GPUtakes less than 12 hours

Inference

each action evaluation takes about 0.5 msslightly higher than TCP Cubic in CPU usage

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 38 / 44

Page 59: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Real-world results

Better

Figure: AWS Brazil to Colombia (wired, 1 flow, 10 trials, P1439)

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 39 / 44

Page 60: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Real-world results

Better

Figure: India to AWS India (wired, 3 flows, 10 trials, P1476)

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 40 / 44

Page 61: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Ongoing projects Indigo

Real-world results

Figure: Time-domain three-flow test (one trial in P1476)

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 41 / 44

Page 62: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Conclusion

Outline

1 Introduction

2 Pantheon: a community evaluation platform for congestion control

3 Calibrated Emulators

4 Ongoing projectsVivace, Copa, and moreIndigo

5 Conclusion

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 42 / 44

Page 63: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Conclusion

Conclusion

Pantheon

A common language in congestion control

shared testbeds, benchmark algorithms and public data

A training ground for congestion control

enables faster innovation and more reproducible researche.g., Vivace, Copa (NSDI ’18), and Indigo: data-driven congestioncontrol based on neural networks

Calibrated emulators

Replication error — new figure of merit for network emulators

Automatically calibrate an emulator to accurately model real networks

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 43 / 44

Page 64: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Conclusion

Conclusion

Pantheon

A common language in congestion control

shared testbeds, benchmark algorithms and public data

A training ground for congestion control

enables faster innovation and more reproducible researche.g., Vivace, Copa (NSDI ’18), and Indigo: data-driven congestioncontrol based on neural networks

Calibrated emulators

Replication error — new figure of merit for network emulators

Automatically calibrate an emulator to accurately model real networks

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 43 / 44

Page 65: Pantheon: the training ground for Internet congestion ... Talks/2018/Francis Y Yan.pdfPantheon: the training ground for Internet congestion-control research Francis Y. Yany, Jestin

Conclusion

Q&A

Visit https://pantheon.stanford.edu for more results and the paper!

Francis Y. Yan (Stanford) Pantheon of Congestion Control May 15, 2018 44 / 44