
Page 1: 1 Backward Congestion Notification Version 2.0 Davide Bergamasco (davide@cisco.com)davide@cisco.com Rong Pan (ropan@cisco.com) Cisco Systems, Inc. IEEE.


Backward Congestion Notification Version 2.0

Davide Bergamasco (davide@cisco.com)

Rong Pan (ropan@cisco.com)

Cisco Systems, Inc.

IEEE 802.1 Interim Meeting

Garden Grove, CA (USA)

September 22, 2005

Page 2:

Credits

• Valentina Alaria (Cisco)

• Andrea Baldini (Cisco)

• Flavio Bonomi (Cisco)

• Manoj K. Wadekar (Intel)

Page 3:

BCN v2.0

• Desire from Mick to see an analytical study of BCN stability

• BCN v2.0 improvements

• Linear control loop allows analysis of stability

• Simplified detection mechanism

• Reduced signaling rate

• Original BCN framework remains the same

Page 4:

BCN Background

[Figure: Data Center Network. End Nodes A, B, and C connect through Edge Switches A, B, and C to a Core Switch over 10 Gbps links. Traffic flows toward a point marked Congestion, and BCN Messages travel back from it toward the traffic sources.]

Page 5:

Detection & Signaling

[Figure: a queue drawn from EMPTY QUEUE to FULL QUEUE (IN to OUT) with thresholds Qsc and Qeq. Frames are sampled with probability P. Decision boxes: "Sampled Frame?" and "RL Tagged Frame?". Message to generate: BCN (Qoff, Qdelta), BCN (0,0), or No Message. Actions: Send BCN or NOP.]

Qoff = Qeq - Qlen, range [-Qeq, +Qeq]

Qdelta = #pktEnq - #pktDeq, range [-2Qeq, +2Qeq]
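The two quantities carried in a BCN message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Qdelta is taken to be accumulated between two consecutive samples, both values are assumed to saturate at the bracketed bounds, and the class name `CongestionPoint` and its methods are hypothetical, not part of the proposal.

```python
def clamp(value, low, high):
    """Saturate value into the closed interval [low, high]."""
    return max(low, min(high, value))


class CongestionPoint:
    """Hypothetical queue model that tracks Qlen and the enqueue/dequeue counts."""

    def __init__(self, q_eq):
        self.q_eq = q_eq      # equilibrium threshold Qeq (in packets)
        self.q_len = 0        # current queue length Qlen
        self.enq = 0          # packets enqueued since the last sample
        self.deq = 0          # packets dequeued since the last sample

    def enqueue(self):
        self.q_len += 1
        self.enq += 1

    def dequeue(self):
        if self.q_len > 0:
            self.q_len -= 1
            self.deq += 1

    def sample(self):
        """Return (Qoff, Qdelta) and restart the counting interval (assumption)."""
        q_off = clamp(self.q_eq - self.q_len, -self.q_eq, self.q_eq)
        q_delta = clamp(self.enq - self.deq, -2 * self.q_eq, 2 * self.q_eq)
        self.enq = 0
        self.deq = 0
        return q_off, q_delta
```

With Qeq = 16, twenty enqueues and two dequeues leave 18 packets in the queue, so the sample reports a queue above equilibrium (negative Qoff) that is still growing (positive Qdelta).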

Page 6:

Reaction

[Figure: rate limiters at the EDGE NODE. BCN Messages from the congested point arrive on Control IN. Data IN is matched against filters F1, F2, ..., Fn, each with its rate limiter R1, R2, ..., Rn, plus a No Match path. Data OUT leaves toward the NETWORK CORE as packets marked with RATE_LIMITED_TAG.]

* Feedback

Fb = (Qoff - W * Qdelta)

* Additive Increase (Fb > 0)

R = R + Gi * Fb * ru

* Multiplicative Decrease (Fb < 0)

R = R * ( 1 - Gd * |Fb| )

* Parameters

W = derivative weight

Gi = increase gain

Gd = decrease gain

ru = rate unit
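The reaction rules above translate almost directly into code. The following Python sketch is an illustration, not the authors' implementation: the function names are hypothetical, the rate is expressed in Mbps, and the `min_rate` floor is an added assumption so that a large negative feedback cannot drive the rate below zero.

```python
def feedback(q_off, q_delta, w):
    """Fb = Qoff - W * Qdelta."""
    return q_off - w * q_delta


def react(rate, fb, gi, gd, ru, min_rate=0.0):
    """Apply one BCN message to a rate limiter and return the new rate.

    rate, ru and min_rate share one unit (Mbps here, by assumption).
    min_rate is a hypothetical floor that is not stated on the slide.
    """
    if fb > 0:
        # Additive increase: R = R + Gi * Fb * ru
        return rate + gi * fb * ru
    if fb < 0:
        # Multiplicative decrease: R = R * (1 - Gd * |Fb|)
        return max(min_rate, rate * (1.0 - gd * abs(fb)))
    return rate
```

Using the parameter values listed for the simulations (W = 2, Gi = 4, Gd = 1/64, Ru = 8 Mbps), a message with Qoff = -2 and Qdelta = 18 gives Fb = -38, and `react(1000.0, -38, 4, 1/64, 8.0)` cuts a 1000 Mbps limiter to 406.25 Mbps.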

Page 7:

Suggested BCN Message Format

 0                            15                              31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+   DA = SA of sampled frame    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+    SA = MAC Address of CP     +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   IEEE 802.1Q Tag or S-Tag                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        EtherType = BCN        |Version|       Reserved        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                             CPID                              +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Qoff              |            Qdelta             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Timestamp           |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
|        First N bytes of sampled frame starting from DA        |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              FCS                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
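As an illustration of the suggested layout, the fixed part of a BCN message can be serialized with Python's `struct` module. This is a sketch under several assumptions that the slide does not state: Qoff and Qdelta are treated as signed 16-bit integers, the Version occupies the top 4 bits of the 16-bit field that follows the EtherType, the FCS is left to the MAC layer, and `BCN_ETHERTYPE` is a placeholder (the IEEE local experimental EtherType), not an assigned value. The function name `pack_bcn` is hypothetical.

```python
import struct

# Placeholder only: IEEE "local experimental" EtherType, NOT an assigned BCN value.
BCN_ETHERTYPE = 0x88B5

# DA(6) SA(6) 802.1Q tag(4) EtherType(2) Version+Reserved(2)
# CPID(8) Qoff(2, signed) Qdelta(2, signed) Timestamp(2); network byte order.
BCN_HEADER = struct.Struct("!6s6s4sHHQhhH")


def pack_bcn(da, sa, vlan_tag, version, cpid, q_off, q_delta, timestamp,
             sampled_frame, n=64):
    """Build a BCN message without FCS (assumed to be appended by the MAC).

    da is the SA of the sampled frame, sa the MAC address of the congestion
    point; the first n bytes of the sampled frame are copied after the header.
    """
    version_reserved = (version & 0xF) << 12   # 4-bit version, 12 reserved bits
    header = BCN_HEADER.pack(da, sa, vlan_tag, BCN_ETHERTYPE, version_reserved,
                             cpid, q_off, q_delta, timestamp)
    return header + sampled_frame[:n]
```

The header built this way is 34 bytes long, with Qoff and Qdelta at byte offsets 28 and 30.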

Page 8:

Suggested RLT Tag Format

 0     3       7              15                              31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+   DA of rate-limited frame    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   SA of rate-limited frame    +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        IEEE 802.1Q Tag or S-Tag of rate-limited frame         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        EtherType = RLT        |Version|       Reserved        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                             CPID                              +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Timestamp           |EtherType of rate limited frame|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                 Payload of rate-limited frame                 +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              FCS                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Page 9:

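Simulation Environment (1)

[Figure: simulation topology. Labeled nodes: ST1-ST4 (TCP Bulk), SU1-SU4 (UDP On/Off), DT, DU, SR1, SR2, DR1, DR2, and SJ, attached to edge switches ES1-ES6 around a Core Switch. A Congestion marker identifies the congested point.]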

Page 10:

Simulation Environment (2)

• Short Range, High Speed DC Network

• Link Capacity = 10 Gbps

• Switch latency = 1 µs

• Link Length = 100 m (0.5 µs propagation delay)

• Control loop

• Delay ~ 3 µs

• Parameters

• W = 2

• Gi = 4

• Gd = 1/64

• Ru = 8 Mbps

• Workload

• ST1-ST4: 10 parallel TCP connections transferring 1 MB each continuously

• SU1-SU4: 64 KB bursts of UDP traffic starting at t = 10 ms
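The propagation delay quoted for the link length can be checked with a one-line calculation. The sketch below assumes a signal speed of 2 x 10^8 m/s (roughly two thirds of the speed of light, typical for fiber or copper); that constant and the function name are assumptions, not values taken from the slides.

```python
# Assumed signal speed in the medium, about 2/3 of the speed of light.
SIGNAL_SPEED_M_PER_S = 2.0e8


def propagation_delay_us(length_m):
    """One-way propagation delay in microseconds for a link of length_m meters."""
    return length_m * 1e6 / SIGNAL_SPEED_M_PER_S
```

Under this assumption a 100 m link gives 0.5 µs, and the 20000 m link of the long-range scenario gives 100 µs.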

Page 11:

BCNv1.0

Page 12:

BCNv2.0

Higher Stability @ Steady State

Faster Transient Response

Page 13:

Simulation Environment (3)

• Long Range, High Speed DC Network

• Link Capacity = 10 Gbps

• Switch latency = 1 µs

• Link Length = 20000 m (100 µs propagation delay)

• Control loop

• Delay ~ 200 µs

• Parameters

• W = 2

• Gi = 4

• Gd = 1/64

• Ru = 8 Mbps

• Workload

• ST1-ST4: 10 parallel TCP connections transferring 1 MB each continuously

• SU1-SU4: 64 KB bursts of UDP traffic starting at t = 10 ms

Page 14:

BCNv1.0

Page 15:

BCNv2.0

Much higher stability @ steady state with larger loop delays

Page 16:

Summary

• BCN v2 has a number of advantages …

• Can be studied analytically

• Better protection of TCP flows in mixed TCP and UDP traffic scenarios

• Detection algorithm independent of Switch implementation

• Better Performance

• Lower signaling frequency (from 10% to 1%)

• Better stability

• Increased tolerance to loop delays

• … and one disadvantage

• Slower convergence to fairness

Page 17:

A Control-Theoretic Approach to BCN Design and Analysis

Page 18:

Notation

N: Number of Flows

C: Link Capacity

τ: Round Trip Delay

w: Weight of the Derivative

Pm: Sampling Probability

Gi: Additive Increase Gain

Gd: Multiplicative Decrease Gain

Page 19:

Block Diagram of BCN Congestion Control

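[Figure: block diagram of the control loop with summing junctions and blocks labeled N, C, q, R, ∆R, Pm, Gi, Gd, and Time Delay. The feedback computed at sample T is]

$F_b(T) = (q_{eq} - q(T)) - w \cdot (q(T) - q(T-1))$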

Page 20:

Non-linear Differential Equations

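Link Control

$\frac{dq(t)}{dt} = N \cdot R(t) - C$

$F_b(t) = (q_{eq} - q(t)) - \frac{w}{C \cdot P_m} \cdot \frac{dq(t)}{dt}$

Source Control

If $F_b(t-\tau) > 0$: $\frac{dR(t)}{dt} = G_i \cdot F_b(t-\tau) \cdot R(t) \cdot P_m$

If $F_b(t-\tau) < 0$: $\frac{dR(t)}{dt} = G_d \cdot F_b(t-\tau) \cdot R(t)^2 \cdot P_m$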
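The right-hand sides of the fluid model can be written as small Python functions, which makes it easy to check the operating point numerically. This sketch assumes the model reads dq/dt = N*R - C, Fb = (q_eq - q) - w/(C*Pm) * dq/dt, dR/dt = Gi*Fb*R*Pm for positive delayed feedback and dR/dt = Gd*Fb*R^2*Pm for negative delayed feedback. The function names, the argument units (packets and packets per second), and passing the delayed feedback in explicitly are all assumptions made for illustration.

```python
def dq_dt(rate, n_flows, capacity):
    """Link control: queue growth rate, dq/dt = N * R - C."""
    return n_flows * rate - capacity


def feedback(q, dq, q_eq, w, capacity, p_m):
    """Link control: Fb = (q_eq - q) - w / (C * Pm) * dq/dt."""
    return (q_eq - q) - w * dq / (capacity * p_m)


def dR_dt(rate, fb_delayed, g_i, g_d, p_m):
    """Source control driven by the feedback value delayed by one round trip."""
    if fb_delayed > 0:
        # Additive increase: each message adds Gi * Fb; messages arrive at rate R * Pm.
        return g_i * fb_delayed * rate * p_m
    if fb_delayed < 0:
        # Multiplicative decrease: each message removes Gd * |Fb| * R.
        return g_d * fb_delayed * rate * rate * p_m
    return 0.0
```

At R = C/N and q = q_eq all three functions return zero, which is the operating point used for the linearization.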

Page 21:

Linearization Around Operating Point

• Using feedback control to analyze local stability

• Operating point:

R = C/N;

q’ = qeq – q = 0;

• Linearization

Difficulty: the system responds differently depending on sgn(Fb(t-d))

– Luckily, the response is a piecewise-linear function

Details are in the appendix

Page 22:

Block Diagram of BCN Feedback Control

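[Figure: feedback loop built from summing junctions and the blocks listed below.]

• Queue integrator: $\frac{N}{s}$ (lose 90° margin)

• Feedback: $F_b(s) = \left(1 + \frac{w \cdot s}{C \cdot P_m}\right)$ with time delay $e^{-s\tau}$ (add lead zero to compensate)

• Multiplicative Decrease: $\frac{G_d \cdot P_m \cdot C^2 / N^2}{s + w \cdot G_d \cdot C / N}$

• Additive Increase: $\frac{G_i \cdot P_m \cdot C / N}{s + w \cdot G_i}$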
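The two annotations on the diagram ("lose 90° margin" and "add lead zero to compensate") can be made concrete by computing the phase each element contributes at a given frequency. The sketch below covers only the queue integrator N/s, the zero (1 + w*s/(C*Pm)), and the time delay; it leaves out the source-control pole, so it is not a full phase-margin computation. The function name and the choice of units (rad/s, seconds, packets per second) are assumptions.

```python
import math


def phase_contributions_deg(omega, w, capacity, p_m, delay):
    """Phase in degrees contributed at angular frequency omega (rad/s) by:

    - the queue integrator N/s: a constant -90 degrees,
    - the lead zero 1 + w*s/(C*Pm): +atan(w*omega/(C*Pm)),
    - the time delay exp(-s*tau): -omega*tau radians.
    """
    integrator = -90.0
    lead_zero = math.degrees(math.atan(w * omega / (capacity * p_m)))
    dead_time = -math.degrees(omega * delay)
    return integrator, lead_zero, dead_time
```

The zero recovers 45° at omega = C*Pm/w and approaches 90° at higher frequencies, while the delay term keeps subtracting phase in proportion to omega. A larger w moves the zero to lower frequencies, and a larger delay erodes the margin.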

Page 23:

The Effect Of Zero From Time Domain’s Eyes

[Plot labels: R, q, zero: dq/dt]

Page 24:

Choosing Parameters – an example

• Network conditions (10G link)

N = 50

τ = 200us

• Choose parameters such that the feedback loop is stable with a 35° margin

w = 4

Gi = 2 Mbps

Gd = 1/128

Pm = 0.01

Page 25:

Stability Result

[Plot annotation: lost 90° margin]

1. With N = 50, delay = 200us, the system is stable

2. Phase margin translates into allowing extreme network conditions of N -> 1000 flows or τ -> 1ms before oscillation

Page 26:

Simulation Result Shows A Stable System for N = 50; Delay = 200us

Page 27:

Simulation Result Shows System is stable, but on the verge of oscillation: N = 50, Delay = 1ms

Page 28:

Change W = 4 -> 1

1. When w = 1, a system with N = 50, delay = 200us already runs out of margin, on the verge of oscillation

2. With w = 1 the effect of the zero diminishes, and the system can’t cope with a wide range of network conditions

Page 29:

Indeed, the system is stable but on the verge of oscillation even for N = 50, Delay = 200us when w = 1.0

Page 30:

Requests to 802.1

• Start a Task Force on Congestion Management

• Use BCN as a Baseline Proposal

Page 31:

Appendix

Page 32:

Linearizing…

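With q(t) and R(t) denoting deviations from the operating point:

$\dot{q}(t) = N \cdot R(t)$

$q(s) = \frac{N \cdot R(s)}{s}$

$F_b(t) = -G \cdot \left( q(t) + \frac{w \cdot \dot{q}(t)}{C \cdot P_m} \right)$

$F_b(s) = -G \cdot \left( 1 + \frac{w \cdot s}{C \cdot P_m} \right) \cdot q(s)$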

Page 33:

Linearizing Additive Increase Function

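$f: \frac{dR(t)}{dt} = G_i \cdot R(t) \cdot P_m \cdot F_b(t)$

$\frac{\partial f}{\partial F_b(t)} = G_i \cdot R(t) \cdot P_m = \frac{G_i \cdot P_m \cdot C}{N}$

$\frac{\partial f}{\partial R(t)} = \frac{\partial}{\partial R(t)} \left[ G_i \cdot R(t) \cdot P_m \cdot G \cdot \left( (q_{eq} - q(t)) - \frac{w}{C \cdot P_m} \cdot \frac{dq(t)}{dt} \right) \right]$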

Page 34:

Linearizing Additive Increase Function

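$\frac{\partial f}{\partial R(t)} = \frac{\partial}{\partial R(t)} \left[ G_i \cdot R(t) \cdot P_m \cdot G \cdot \left( (q_{eq} - q(t)) - \frac{w}{C \cdot P_m} \cdot \frac{dq(t)}{dt} \right) \right]$

$= G_i \cdot P_m \cdot G \cdot \left( (q_{eq} - q(t)) - \frac{w}{C \cdot P_m} \cdot \frac{dq(t)}{dt} \right) + G \cdot G_i \cdot R(t) \cdot P_m \cdot \frac{\partial}{\partial R(t)} \left( (q_{eq} - q(t)) - \frac{w}{C \cdot P_m} \cdot \frac{dq(t)}{dt} \right)$

(the first term vanishes at the operating point, where q = q_eq and dq/dt = 0)

$= G \cdot G_i \cdot R(t) \cdot P_m \cdot \frac{\partial}{\partial R(t)} \left( -\frac{w \cdot (N \cdot R(t) - C)}{C \cdot P_m} \right)$

$= -G \cdot G_i \cdot R(t) \cdot P_m \cdot \frac{w \cdot N}{C \cdot P_m}$

$= -G \cdot G_i \cdot w$ (using R(t) = C/N)

$s \cdot R(s) = \frac{G_i \cdot P_m \cdot C}{N} \cdot F_b(s) - G \cdot G_i \cdot w \cdot R(s)$

$R(s) = \frac{G_i \cdot P_m \cdot C / N}{s + G \cdot G_i \cdot w} \cdot F_b(s)$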

Page 35:

Linearizing Multiplicative Decrease Function

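$g: \frac{dR(t)}{dt} = G_d \cdot R(t)^2 \cdot P_m \cdot F_b(t)$

$\frac{\partial g}{\partial F_b(t)} = G_d \cdot R(t)^2 \cdot P_m = \frac{G_d \cdot P_m \cdot C^2}{N^2}$

$\frac{\partial g}{\partial R(t)} = \frac{\partial}{\partial R(t)} \left[ G_d \cdot R(t)^2 \cdot P_m \cdot G \cdot \left( (q_{eq} - q(t)) - \frac{w}{C \cdot P_m} \cdot \frac{dq(t)}{dt} \right) \right]$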

Page 36:

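Linearizing Multiplicative Decrease Function

$\frac{\partial g}{\partial R(t)} = \frac{\partial}{\partial R(t)} \left[ G_d \cdot R(t)^2 \cdot P_m \cdot G \cdot \left( (q_{eq} - q(t)) - \frac{w}{C \cdot P_m} \cdot \frac{dq(t)}{dt} \right) \right]$

$= 2 \cdot G_d \cdot R(t) \cdot P_m \cdot G \cdot \left( (q_{eq} - q(t)) - \frac{w}{C \cdot P_m} \cdot \frac{dq(t)}{dt} \right) + G_d \cdot R(t)^2 \cdot P_m \cdot G \cdot \frac{\partial}{\partial R(t)} \left( (q_{eq} - q(t)) - \frac{w}{C \cdot P_m} \cdot \frac{dq(t)}{dt} \right)$

(the first term vanishes at the operating point, where q = q_eq and dq/dt = 0)

$= G_d \cdot R(t)^2 \cdot P_m \cdot G \cdot \frac{\partial}{\partial R(t)} \left( -\frac{w \cdot (N \cdot R(t) - C)}{C \cdot P_m} \right)$

$= -G_d \cdot R(t)^2 \cdot P_m \cdot G \cdot \frac{w \cdot N}{C \cdot P_m}$

$= -\frac{G_d \cdot R(t)^2 \cdot G \cdot w \cdot N}{C} = -\frac{G \cdot w \cdot G_d \cdot C}{N}$ (using R(t) = C/N)

$s \cdot R(s) = \frac{G_d \cdot P_m \cdot C^2}{N^2} \cdot F_b(s) - \frac{G \cdot w \cdot G_d \cdot C}{N} \cdot R(s)$

$R(s) = \frac{G_d \cdot P_m \cdot C^2 / N^2}{s + G \cdot w \cdot G_d \cdot C / N} \cdot F_b(s)$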

Page 37:

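Issue #1: Non-linearity

[Plot: queue length Q versus time t, oscillating around Qeq, with negative (-) and positive (+) BCN messages marked along the curve and an interval labeled "Stop Generation of BCN Messages".]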

• ISSUE: Overshoots and undershoots accumulate over time

• SOLUTION: Signal only when

• Q > Qeq && dQ/dt > 0

• Q < Qeq && dQ/dt < 0

• Easy to implement in hardware: just an Up/Down counter

• Increment @ every enqueue

• Decrement @ every dequeue

• Reduces signaling rate by 50%!!
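The signaling rule and the up/down counter can be sketched as follows. This is an illustration under assumptions the slide does not spell out: the counter is read and cleared at each sampling instant, its sign is used as the estimate of dQ/dt, and the class and method names are hypothetical.

```python
class SignalGate:
    """Decides whether a sampled frame should trigger a BCN message."""

    def __init__(self, q_eq):
        self.q_eq = q_eq      # equilibrium threshold Qeq (in packets)
        self.q_len = 0        # current queue length Q
        self.counter = 0      # up/down counter since the last sample

    def on_enqueue(self):
        self.q_len += 1
        self.counter += 1     # increment at every enqueue

    def on_dequeue(self):
        if self.q_len > 0:
            self.q_len -= 1
            self.counter -= 1  # decrement at every dequeue

    def should_signal(self):
        """Signal only when Q > Qeq and growing, or Q < Qeq and shrinking."""
        growing = self.counter > 0
        shrinking = self.counter < 0
        decision = ((self.q_len > self.q_eq and growing) or
                    (self.q_len < self.q_eq and shrinking))
        self.counter = 0      # assumption: restart the interval at each sample
        return decision
```

A queue that is above Qeq but already draining, or below Qeq but already filling, produces no message, which is where the reduction in signaling rate comes from.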

Page 38:

Issue #2: Specific Detection Mechanism

[Figure: a queue drawn from EMPTY QUEUE through EQUILIBRIUM to FULL QUEUE (IN to OUT) with thresholds T-4 ... T-1, T+0, T+1 ... T+4. Frames are sampled with probability P. Message to generate per threshold region: BCN-4 ... BCN-1, BCN 0, BCN+1 ... BCN+4, or No Message. Decision boxes: "Sampled Frame?", "RL Tagged Frame?", "RL Tag && Solicit Bit Set?", "BCN type" (+, -, 0), "dQ/dt < 0?", and "dQ/dt > 0". Actions: Send BCN or NOP.]
