House - Dynamic Bandwidth Throttling in a Client Server ...



Page 1: House - Dynamic Bandwidth Throttling in a Client Server ...
Page 2: House - Dynamic Bandwidth Throttling in a Client Server ...

Dynamic Bandwidth Throttling

Bart House,

Development Lead, Microsoft

Page 3: House - Dynamic Bandwidth Throttling in a Client Server ...

The Problem

• Client / Server games require a server
• Server uses high outbound bandwidth
> Bandwidth to update N clients at 30Hz
    Bandwidth = N * N * per-client-update * 30
    Bandwidth for 15 clients with just a 5-byte (40-bit) update
    Bandwidth = 15 * 15 * 40 * 30 = 270,000 bps = 270kbps
> Average home connection <300kbps
> We never have enough bandwidth
> Cannot update ideally for large games
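
The arithmetic on this slide can be checked with a short sketch. The function name and parameters are illustrative and not from the talk; the only assumption is that the per-client update is counted in bits (5 bytes = 40 bits), which is what makes the slide's 270kbps figure work out.

```python
def server_outbound_bps(num_clients: int, update_bytes: int, rate_hz: int = 30) -> int:
    """Outbound bits per second when the server sends each of N clients
    an update about each of N clients, rate_hz times per second."""
    bits_per_update = update_bytes * 8
    return num_clients * num_clients * bits_per_update * rate_hz


# 15 clients, 5-byte update, 30Hz
print(server_outbound_bps(15, 5))  # 270000
```

Because the cost grows with N * N, doubling the player count roughly quadruples the host's outbound bandwidth.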

Page 4: House - Dynamic Bandwidth Throttling in a Client Server ...

The Problem (cont.)

• Home machine is in a hostile environment
> Many devices compete for bandwidth
    Voice over IP can quickly saturate bandwidth
    High bandwidth browsing and downloading
        – WoW patches, P2P downloads, etc…
    And the problem will only get worse
        – MP3 players downloading in the background wirelessly
> Many Xboxes are connected wirelessly
• Home machine can move
> Team LAN parties

Page 5: House - Dynamic Bandwidth Throttling in a Client Server ...

Questions we will answer

• How do we adjust bandwidth utilization to match bandwidth availability?
> This is the bulk of the talk

• As available bandwidth changes, how do we adjust game state replication?

• What do we do when a machine can no longer host?

• When matchmaking, how do we ensure someone can host the game?

Page 6: House - Dynamic Bandwidth Throttling in a Client Server ...

Adjusting Bandwidth

• Adjusting is done at three different levels
> Connection Control
    Server to client connection
    Takes care of rapid adjustments due to client specific problems
> Global Control
    Server adjusts all client connections simultaneously
    Adjustments for problems across multiple clients
        – Problems most likely due to local bottleneck
> History Control
    Server adjusts overall bandwidth target as conditions change and evidence builds
    Provides continuity during a game
    Allows for growth and adjustment over course of multiple games
    Provides basis for estimating future performance

Page 7: House - Dynamic Bandwidth Throttling in a Client Server ...

Connection Control

• Bandwidth between server and client
• Try to reach and maintain goal bandwidth
> Goal bandwidth is set by Global Control
• All traffic is UDP based
> Reliable messaging built on top
> Not part of talk today
• Bandwidth is always used
> Packets are handed up to game to fill
> Unused packet space is padded
> Ensures bandwidth is available when needed

Page 8: House - Dynamic Bandwidth Throttling in a Client Server ...

Congestion Control

• When congestion is detected
> Reduce current bandwidth
> When congestion has cleared, increase current bandwidth over time until goal is reached
> Maintain bandwidth at goal
• Two signals for congestion control
> Increase in measured RTT
> Packet loss due to timeout or subsequent acknowledgements

Page 9: House - Dynamic Bandwidth Throttling in a Client Server ...

Congestion Control States

• Maintain State
> Remain in this state if goal is reached
> Transition to Growth if able to maintain current bandwidth for some period of time
• Recovery State
> Entered when congestion is detected
> Transition to Maintain when able to achieve current bandwidth and when RTT has stabilized

Page 10: House - Dynamic Bandwidth Throttling in a Client Server ...

Congestion Control States

• Growth State
> Growth rate established when target was set
    Allows for rapidly growing connections when appropriate
> Growth is stopped when congestion is detected
    Recovery state is entered
> Growth is stopped if measured throughput fails to come close to current bandwidth
    Maintain state is entered
> Growth occurs in steps
    Next step taken when measured throughput comes close enough to current bandwidth
> RTT threshold used for congestion warning signal
    Established when state is entered
    Adjusted as packet sizes increase
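
The three per-connection states (Maintain, Growth, Recovery) can be sketched as a small state machine. This is a minimal illustration, not the talk's implementation: the class name, the 50% backoff, the 90% "close enough" ratio, the fixed growth step, and the three-tick hold time are all assumed values chosen for the example.

```python
from enum import Enum


class State(Enum):
    MAINTAIN = "maintain"
    GROWTH = "growth"
    RECOVERY = "recovery"


class ConnectionControl:
    """Hypothetical per-connection controller; all tuning values are assumptions."""

    def __init__(self, goal_bps, start_bps, growth_step_bps,
                 backoff=0.5, close_enough=0.9, hold_ticks=3):
        self.goal_bps = goal_bps
        self.current_bps = start_bps
        self.growth_step_bps = growth_step_bps
        self.backoff = backoff            # fraction kept when congestion is seen
        self.close_enough = close_enough  # throughput ratio counted as "keeping up"
        self.hold_ticks = hold_ticks      # stable ticks required before growing
        self.state = State.MAINTAIN
        self.stable_ticks = 0

    def tick(self, measured_bps, congested, rtt_stable=True):
        keeping_up = measured_bps >= self.close_enough * self.current_bps
        if congested:
            # Congestion in any state: reduce bandwidth and enter Recovery.
            self.current_bps *= self.backoff
            self.state = State.RECOVERY
            self.stable_ticks = 0
        elif self.state is State.RECOVERY:
            # Leave Recovery once current bandwidth is achieved and RTT is stable.
            if keeping_up and rtt_stable:
                self.state = State.MAINTAIN
        elif self.state is State.MAINTAIN:
            self.stable_ticks = self.stable_ticks + 1 if keeping_up else 0
            if self.current_bps < self.goal_bps and self.stable_ticks >= self.hold_ticks:
                self.state = State.GROWTH
        else:  # State.GROWTH
            if not keeping_up:
                # Throughput failed to come close: stop growing.
                self.state = State.MAINTAIN
                self.stable_ticks = 0
            else:
                # Take the next step, never past the goal set by Global Control.
                self.current_bps = min(self.goal_bps,
                                       self.current_bps + self.growth_step_bps)
                if self.current_bps >= self.goal_bps:
                    self.state = State.MAINTAIN
                    self.stable_ticks = 0
        return self.state
```

The RTT threshold adjustment described on the slide is left out here; the `congested` flag stands in for whichever signal (RTT or packet loss) fired.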

Page 11: House - Dynamic Bandwidth Throttling in a Client Server ...

RTT

• Primary congestion control signal

• To calculate RTT
> Timestamp in every packet
    Allowed us to see changes in RTT more quickly
> Used low pass filter to calculate smooth RTT

• Baseline RTT established in Maintain state

• Significant deviation from baseline used as a signal of congestion
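
A low pass filter with a baseline check might look like the sketch below. The talk does not give the filter or the deviation threshold, so this assumes an exponentially weighted moving average (the 1/8 gain is borrowed from TCP's smoothed RTT) and a hypothetical rule that 1.5x the baseline counts as a significant deviation.

```python
class RttMonitor:
    """Hypothetical smoothed-RTT tracker; alpha and threshold are assumed values."""

    def __init__(self, alpha=0.125, threshold=1.5):
        self.alpha = alpha          # filter gain: smaller means smoother
        self.threshold = threshold  # multiple of baseline treated as congestion
        self.smoothed = None
        self.baseline = None

    def add_sample(self, rtt_ms):
        # Exponentially weighted moving average acts as a simple low pass filter.
        if self.smoothed is None:
            self.smoothed = rtt_ms
        else:
            self.smoothed += self.alpha * (rtt_ms - self.smoothed)
        return self.smoothed

    def set_baseline(self):
        # Called while the connection sits in the Maintain state.
        self.baseline = self.smoothed

    def congested(self):
        return (self.baseline is not None
                and self.smoothed > self.threshold * self.baseline)
```

With a timestamp in every packet, `add_sample` would run once per acknowledged packet, which is what lets the filter react quickly despite the smoothing.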

Page 12: House - Dynamic Bandwidth Throttling in a Client Server ...

Packet-loss

• Two types of packet-loss
> Loss due to subsequent acknowledgement
    If we get multiple subsequent acknowledgements we take this as an indication of a packet loss
> Loss due to timeout
    Packet failed to be acknowledged after some period of time
    Timeout calculated from filtered RTT
• Only causes a congestion control signal if multiple events encountered over an interval of time
> Spurious packet loss does not trigger control signal
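
The "multiple events over an interval" rule can be sketched as a sliding window over loss timestamps. The class name, the three-event minimum, and the two-second window are assumptions for illustration; the talk does not state the actual values.

```python
from collections import deque


class LossSignal:
    """Hypothetical loss counter: signals only when losses cluster in time."""

    def __init__(self, min_events=3, window_s=2.0):
        self.min_events = min_events
        self.window_s = window_s
        self.events = deque()  # timestamps of recent loss events

    def record_loss(self, now_s):
        self.events.append(now_s)
        return self.signal(now_s)

    def signal(self, now_s):
        # Forget losses that fell out of the window, then count what remains.
        while self.events and now_s - self.events[0] > self.window_s:
            self.events.popleft()
        return len(self.events) >= self.min_events
```

A single stray loss never reaches `min_events`, so it ages out of the window without triggering congestion control.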

Page 13: House - Dynamic Bandwidth Throttling in a Client Server ...

Global Control

• The server keeps a current goal bandwidth which is divided among all client connections

• Attempt to reach goal by growing bandwidth, backing off when bandwidth is exceeded

• Ability to look at behavior across multiple connections allows detection of bad connections

Page 14: House - Dynamic Bandwidth Throttling in a Client Server ...

Global Control States

• Recovery State
> Entered when bandwidth is exceeded
> Slow Adjustment is entered when fully recovered
• Rapid Growth State
> Quick adjustments are made until bandwidth is exceeded or goal is reached
• Slow Adjustment State
> Small incremental growth until bandwidth is exceeded or goal is reached
• Goal Reached State
> Goal bandwidth is maintained until bandwidth is exceeded

Page 15: House - Dynamic Bandwidth Throttling in a Client Server ...

Global Recovery State

• Continue to reduce current global bandwidth as long as bandwidth is being exceeded

• Reduction occurs at regular interval
• Percentage reduction applied until some minimum
• Current bandwidth is reduced immediately when state is entered
> Amount of reduction is less when entered from rapid growth

Page 16: House - Dynamic Bandwidth Throttling in a Client Server ...

Global Slow/Rapid Growth

• Grow bandwidth in steps

• Each step is a small/large percentage of current global bandwidth

• Next step taken when global bandwidth is measured and maintained over a period of time

• Growth continues until bandwidth is exceeded or goal is reached
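
The percentage-based steps for global growth and recovery can be sketched as two small functions. The percentages (5% slow, 25% rapid, 10% reduction) and the function names are assumed for illustration only; the talk says "small/large percentage" without giving numbers.

```python
def grow_step(current_bps, goal_bps, rapid, small_pct=5, large_pct=25):
    """Next global bandwidth after one growth step, capped at the goal."""
    pct = large_pct if rapid else small_pct
    return min(goal_bps, current_bps * (100 + pct) // 100)


def recovery_step(current_bps, minimum_bps, reduction_pct=10):
    """Next global bandwidth after one recovery reduction, floored at a minimum."""
    return max(minimum_bps, current_bps * (100 - reduction_pct) // 100)


print(grow_step(100_000, 300_000, rapid=True))   # 125000
print(grow_step(100_000, 300_000, rapid=False))  # 105000
print(recovery_step(200_000, 64_000))            # 180000
```

In a full controller each growth step would be taken only after the current global bandwidth has been measured and held for a period, and each recovery step would repeat at a regular interval while bandwidth is still exceeded.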

Page 17: House - Dynamic Bandwidth Throttling in a Client Server ...

Detecting Bandwidth Overuse

• If over 50% of the connections are in their recovery control state we assume that the bandwidth is exceeded
• Single bad connection can affect global control
> Will not cause bandwidth exceeded signal
> But can prohibit growth of bandwidth due to failure to deliver its share of throughput
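
The majority rule above is simple enough to state directly in code. The function name and the string labels for connection states are hypothetical.

```python
def bandwidth_exceeded(connection_states, threshold=0.5):
    """True when more than `threshold` of connections are in recovery."""
    if not connection_states:
        return False
    recovering = sum(1 for state in connection_states if state == "recovery")
    return recovering / len(connection_states) > threshold


states = ["recovery"] * 8 + ["maintain"] * 7
print(bandwidth_exceeded(states))  # True
```

One recovering connection out of fifteen stays well under the threshold, which matches the slide's point that a single bad connection does not raise the bandwidth exceeded signal.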

Page 18: House - Dynamic Bandwidth Throttling in a Client Server ...

Dividing Bandwidth

• Even bandwidth distribution among client connections

• Extra bandwidth given based on need
> Some clients act as voice repeaters and thus require additional bandwidth

• Bandwidth might be limited for bad connections
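
One way to express this division is a weighted share with a cap for bad connections. This is a sketch under assumptions: the weights (1 for a normal client, 2 for a voice repeater), the 16kbps cap, and the function name are invented for the example, and the sketch does not redistribute bandwidth freed by the cap.

```python
def divide_bandwidth(goal_bps, weights, bad=frozenset(), bad_cap_bps=16_000):
    """Split goal_bps across clients in proportion to their weights.

    Equal weights give an even split; a larger weight models extra need.
    Clients listed in `bad` are limited to bad_cap_bps.
    """
    total_weight = sum(weights.values())
    shares = {}
    for client, weight in weights.items():
        share = goal_bps * weight // total_weight
        if client in bad:
            share = min(share, bad_cap_bps)
        shares[client] = share
    return shares


print(divide_bandwidth(300_000, {"a": 1, "b": 1, "c": 2}, bad={"b"}))
# {'a': 75000, 'b': 16000, 'c': 150000}
```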

Page 19: House - Dynamic Bandwidth Throttling in a Client Server ...

Bad Connections

• Any connection that accumulates congestion signals over time is eventually marked as bad
> We keep a counter that increments for every period that experienced congestion and decrements for every period that did not
• Bandwidth to that connection is limited
• Its throughput is no longer taken into consideration when determining whether goal bandwidth is met
> All traffic sent is added to total throughput since we can’t rely on acknowledgements from connection
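
The counter described above can be sketched as follows. The threshold of 5, the floor at zero, and the choice to keep a connection marked bad once it crosses the threshold are all assumptions; the talk only says the counter goes up for congested periods and down for clean ones.

```python
class BadConnectionTracker:
    """Hypothetical per-connection counter for spotting persistently bad links."""

    def __init__(self, bad_threshold=5):
        self.bad_threshold = bad_threshold
        self.counter = 0
        self.is_bad = False

    def end_period(self, had_congestion):
        if had_congestion:
            self.counter += 1
        else:
            # Assumed floor at zero so long clean runs do not bank credit.
            self.counter = max(0, self.counter - 1)
        if self.counter >= self.bad_threshold:
            self.is_bad = True  # assumed sticky for the rest of the game
        return self.is_bad
```

Because clean periods subtract from the counter, occasional congestion never accumulates; only a connection that is congested more often than not drifts up to the threshold.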

Page 20: House - Dynamic Bandwidth Throttling in a Client Server ...

History Control

• Global bandwidth is adjusted over time

• Global goal bandwidth
> Adjusted up linearly
    If the host is able to consistently maintain it
> Adjusted down exponentially
    If the host fails to maintain it
• Bandwidth History Period
> Global goal held constant
> Periods used as quantum of measurement

Page 21: House - Dynamic Bandwidth Throttling in a Client Server ...

Bandwidth History Periods

• Starts when a global goal bandwidth is set
• Period ends when:
> Goal changes and the period is canceled
    For instance due to a client leaving
> Goal reached and held for a period of time
    No global control recoveries occurred
        – Period is considered successful
        – Successful result recorded with goal bandwidth
    Some global control recoveries occurred
        – Period is neither successful nor failed
        – Nothing is recorded
> Failure occurs
    Goal not reached given sufficient time
    Multiple global control recoveries occurred

Page 22: House - Dynamic Bandwidth Throttling in a Client Server ...

Bandwidth History

• Successful/failed periods are recorded

• Recorded in circular buffer
> Large enough to hold many hours of play

• Stored using per-user live storage
> History is tied both to box and the player

• From the history, we can calculate:
> Reliability percentage for some bandwidth
> Failure percentage given some bandwidth

Page 23: House - Dynamic Bandwidth Throttling in a Client Server ...

Reliability Percentage

• Given some bandwidth X, how reliable has this machine been at delivering that bandwidth?
> Success(X) / (Success(X) + Failure(X)) * 100
    Where Success(X) is the number of successful periods at or above bandwidth X
    Where Failure(X) is the number of failed periods at or below bandwidth X

Page 24: House - Dynamic Bandwidth Throttling in a Client Server ...

Failure Percentage

• Given some bandwidth X, how often have we failed to deliver that bandwidth?
> Failure(X) / (TotalSuccess + Failure(X)) * 100
    Where TotalSuccess is the total number of successful periods regardless of bandwidth

Page 25: House - Dynamic Bandwidth Throttling in a Client Server ...

Reliable Bandwidth

• Greatest bandwidth X such that ReliabilityPercentage(X) >= 95%

• Stated another way
> What bandwidth should we pick to ensure that we will only get a 1 in 20 chance of having a failure to maintain that bandwidth

• We want to ensure overall consistency in game play
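
The three history formulas (ReliabilityPercentage, FailurePercentage, and reliable bandwidth) can be sketched directly from their definitions. The history representation (a list of `(bandwidth_kbps, succeeded)` pairs), the candidate list of bandwidth steps, and the behavior on an empty history are assumptions made for the example.

```python
def success_count(history, x):
    # Successful periods at or above bandwidth x.
    return sum(1 for bw, ok in history if ok and bw >= x)


def failure_count(history, x):
    # Failed periods at or below bandwidth x.
    return sum(1 for bw, ok in history if not ok and bw <= x)


def reliability_percentage(history, x):
    s, f = success_count(history, x), failure_count(history, x)
    return 100.0 * s / (s + f) if s + f else 0.0  # assumed 0 with no evidence


def failure_percentage(history, x):
    total_success = sum(1 for _, ok in history if ok)
    f = failure_count(history, x)
    return 100.0 * f / (total_success + f) if total_success + f else 0.0


def reliable_bandwidth(history, candidates, threshold=95.0):
    """Greatest candidate X with ReliabilityPercentage(X) >= threshold, else None."""
    reliable = [x for x in candidates if reliability_percentage(history, x) >= threshold]
    return max(reliable) if reliable else None


# 20 successful periods at 300kbps and one failed attempt at 320kbps.
history = [(300, True)] * 20 + [(320, False)]
print(reliable_bandwidth(history, [280, 300, 320]))  # 300
print(round(failure_percentage(history, 320), 2))    # 4.76
```

Note the asymmetry in the definitions: a success at a higher bandwidth counts as evidence for every lower bandwidth, and a failure at a lower bandwidth counts against every higher one.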

Page 26: House - Dynamic Bandwidth Throttling in a Client Server ...

Use of Reliable Bandwidth

• Almost always used as global goal

• Two exceptions
> When no history is present
    Conservative estimate is used
    Enough to be considered when picking host
    But not too much
    Below average home connection speed
    Reduce chance of poor game play at game start

Page 27: House - Dynamic Bandwidth Throttling in a Client Server ...

Trying Higher Bandwidth

• Only consider one step higher
> A bandwidth that is one step higher than our reliable bandwidth

• Will use this higher bandwidth if:
> FailurePercentage(X) <= 5%

• Success or failure will be recorded

• Logic runs again

Page 28: House - Dynamic Bandwidth Throttling in a Client Server ...

Trying Higher Bandwidths

• Consider this case
> Recent failure at 320kbps (stepped up)
> Connection speed at 300kbps
> ReliableBandwidth at 300kbps
> FailurePercentage(320) > 5%

• What happens
> 300kbps will be used repeatedly
> Each success slowly reduces FailurePercentage(320)
> Eventually, FailurePercentage(320) <= 5%
> 320kbps will be tried
> Failure will be recorded
> This will repeat, trying 320 once in every 2 hours
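
To see why the retry is infrequent, the sketch below counts how many successful periods are needed before FailurePercentage(X) drops back to 5%, using the formula Failure(X) / (TotalSuccess + Failure(X)) * 100. The function name and parameters are illustrative, and the starting counts are assumed.

```python
def successes_needed(failures_at_x, existing_successes=0, limit_pct=5):
    """Additional successful periods required so that
    FailurePercentage(X) <= limit_pct."""
    extra = 0
    # Integer form of: 100 * F / (S + F) > limit_pct
    while 100 * failures_at_x > limit_pct * (existing_successes + extra + failures_at_x):
        extra += 1
    return extra


print(successes_needed(1))  # 19
```

With a single recorded failure at 320kbps and no prior successes, 19 successful periods at 300kbps are needed before 320kbps is tried again.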

Page 29: House - Dynamic Bandwidth Throttling in a Client Server ...

Bandwidth Control In Action

• Let's consider some typical scenarios
> No history but good connection
> No history but bad bandwidth
> Good history but temporary problem

Page 30: House - Dynamic Bandwidth Throttling in a Client Server ...

No History Good Connection

• A client with no history will default to assuming a reasonable reliable bandwidth
> Assumption is enough to host a moderate size game
> If host has good pings to other clients and open NAT, it is likely they will be picked to serve
• Global Goal Bandwidth will use default
> Default is conservative
> Below average home connection
> Goal will be achievable > 50% of the time
> Bandwidth period will be recorded as a success
> Next bandwidth step will immediately be tried
    FailurePercentage(x) is always 0% until a failure

Page 31: House - Dynamic Bandwidth Throttling in a Client Server ...

No History Good Connection

• Cont.
> Higher bandwidth will likely succeed

> Another success recorded

> Higher and higher bandwidths will be tried

> Eventually bandwidth will approach actual

> Latency spike on all connections will occur

> RTT congestion signal will trigger

> Connection control recovery state entered across all connections

Page 32: House - Dynamic Bandwidth Throttling in a Client Server ...

No History Good Connection

• Cont.
> Majority congestion problems
    Global recovery state will be entered
> After recovery
    Slow growth will then be followed by another recovery
> Multiple recoveries will cause period failure
> Reported failure will stop increases
> Another step up will not occur for a while
> Bandwidth usage stabilizes just below actual bandwidth

Page 33: House - Dynamic Bandwidth Throttling in a Client Server ...

No History Bad Bandwidth

• Initial guess is too high
• Congestion will be seen immediately
• First period will be reported as a failure
• Step down bandwidth is tried next
• If this fails, step downs are increased exponentially
• Eventually bandwidth will be set below actual bandwidth
• Bandwidth will stabilize here

Page 34: House - Dynamic Bandwidth Throttling in a Client Server ...

Good History Hiccups Occur

• Bandwidth is stable below actual bandwidth
• Something happens
> Available bandwidth is lowered
• If it occurs very briefly
> Connections will briefly experience congestion
> Bandwidth across all connections will be dropped
> Single global recovery will occur
> Period will not fail
• If it occurs over a sustained period of time
> Failure will be recorded
> Reduced bandwidth used next
> Reductions will continue
> ReliableBandwidth is significantly reduced
> This is what we want

Page 35: House - Dynamic Bandwidth Throttling in a Client Server ...

Picking Best Host

• Reliable bandwidth is primary factor

• Reflects ability of that box under that player's control to reliably deliver bandwidth (at least 95% of the time)

• Basing the estimate on a long history captures the ability of the player and box to survive in the hostile home environment

Page 36: House - Dynamic Bandwidth Throttling in a Client Server ...

Picking Best Host (cont.)

• We build a pool of hosts from those that have a reliable bandwidth large enough for current game size

• From this host pool we use other criteria as tie breakers
> Ping times to other players
> NAT type (open preferred over moderate)
> Percentage of games left gracefully
    Graceful exits are those in which the game has a chance to remove the player from the game before the box is turned off or the network cable is unplugged
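
Host selection as described (build a pool by reliable bandwidth, then break ties) could be sketched like this. The `Candidate` fields, the function name, and especially the order in which the tie breakers are applied are assumptions; the talk lists the criteria without ranking them.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    reliable_bps: int          # reliable bandwidth from history
    avg_ping_ms: float         # average ping to other players
    nat_open: bool             # open NAT preferred over moderate
    graceful_exit_pct: float   # percentage of games left gracefully


def pick_host(candidates, required_bps):
    """Return the preferred host, or None if nobody can support the game size."""
    pool = [c for c in candidates if c.reliable_bps >= required_bps]
    if not pool:
        return None
    # Assumed ordering: lower ping, then open NAT, then more graceful exits.
    return min(pool, key=lambda c: (c.avg_ping_ms, not c.nat_open, -c.graceful_exit_pct))


players = [
    Candidate("a", 400_000, 60, True, 90),
    Candidate("b", 200_000, 20, True, 99),   # best ping, but not enough bandwidth
    Candidate("c", 350_000, 60, False, 95),
    Candidate("d", 350_000, 60, True, 95),
]
print(pick_host(players, 270_000).name)  # d
```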

Page 37: House - Dynamic Bandwidth Throttling in a Client Server ...

Game State Replication

• Game state
> Player positions
> Weapon damage
> Object positions in world
> Weapon in hand
> Player health
> Object damage

• As bandwidth between the server and a client changes, the game must react appropriately to the changing conditions

Page 38: House - Dynamic Bandwidth Throttling in a Client Server ...

Priority Based Replication

• Updates are assigned a priority based on:
> Importance of update
    Player position more important than player health
> Time of last update
    The longer since last update, the higher the priority
> Client importance
    Can the players on that client see/hear what is being updated

Page 39: House - Dynamic Bandwidth Throttling in a Client Server ...

Priority Based Replication

• Updates are then sent based on priority

• Time intensive task to get priority scheme right
> Must play, take traces and decide whether the right decisions are being made by priority system when under load
> This hand tuning is critical to get the best polish for the game
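
A priority scheme built from the three factors above (importance, time since last update, client relevance) might be sketched as follows. The scoring formula, the 0.25 weight for updates the client cannot see or hear, and the greedy packet fill are all hypothetical; the talk stresses that the real scheme was hand tuned from play traces.

```python
def priority(importance, seconds_since_update, relevant_to_client):
    """Hypothetical score: grows with importance and staleness,
    and is scaled down when the client cannot see or hear the object."""
    relevance = 1.0 if relevant_to_client else 0.25
    return importance * (1.0 + seconds_since_update) * relevance


def select_updates(pending, budget_bytes):
    """Fill the per-packet budget with the highest-priority updates that fit."""
    chosen, used = [], 0
    for update in sorted(pending, key=lambda u: u["priority"], reverse=True):
        if used + update["size"] <= budget_bytes:
            chosen.append(update["name"])
            used += update["size"]
    return chosen


pending = [
    {"name": "pos", "size": 12, "priority": priority(10, 0.1, True)},
    {"name": "health", "size": 4, "priority": priority(3, 0.1, True)},
    {"name": "far_obj", "size": 12, "priority": priority(5, 2.0, False)},
]
print(select_updates(pending, 20))  # ['pos', 'health']
```

As the connection's bandwidth shrinks, `budget_bytes` shrinks with it, and low-priority updates simply wait until their staleness raises their score.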

Page 40: House - Dynamic Bandwidth Throttling in a Client Server ...

Host Migration

• Host migration is moving the hosting responsibilities from one box to another box

• Shadowrun only supports host initiated migration
> Eliminates the potential for exploitation
> Halo 2’s host election process had unintended consequences
    Encouraged griefing

Page 41: House - Dynamic Bandwidth Throttling in a Client Server ...

Host Migration

• Host is migrated when:
> Host chooses to leave
    Game will end, hosting is migrated and host is then removed from game
> Current host is no longer best
    Host changed between games
    Consistent gameplay for duration of game
    Between rounds would perhaps have been better
> Game is prematurely stopped
    Host bandwidth no longer supports game size

Page 42: House - Dynamic Bandwidth Throttling in a Client Server ...

Matchmaking

• Good hosts
> Good bandwidth
> Open NAT
> High Hosting Reliability
• Good hosts will favor games without a good host
> Will try to join a game that needs a good host before trying to join one that does not
> But will only do so if game is a good match

Page 43: House - Dynamic Bandwidth Throttling in a Client Server ...

Putting It All Together

• Home is a hard place to serve from
• History attempts to identify players who can manage it well
• Design focuses on consistency
> Global bandwidth increases are made over multiple rounds of play
• Good hosts are rewarded with host advantage, which encourages players to be good hosts

Page 44: House - Dynamic Bandwidth Throttling in a Client Server ...

Putting It All Together (cont.)

• Quick to respond to changing conditions
> Low level point-to-point control ensures continued connectivity
> This response happens very quickly

• If problems persist, global bandwidth control will kick in and reduce overall targets within minutes
> Host will eventually be replaced
> Bad host will have low reliable bandwidth

Page 45: House - Dynamic Bandwidth Throttling in a Client Server ...

Putting It All Together

• During Beta bandwidth histories accurately reflected player connection abilities

• System repeatedly found the same good hosts as system histories were reset

• Bad hosts did cause bad rounds of play but were quickly eliminated from pool of hosts in future games

Page 46: House - Dynamic Bandwidth Throttling in a Client Server ...

Wish List

• Ability for game to know when box has moved
> Create a signature that can be stored with the history that represents the network location
> For instance using the MAC of the local gateway along with perhaps routing information to known service
• Ability to manage QoS in the home
> Demands for bandwidth in the home are only going to get worse
> Efforts need to be made to help manage bandwidth across devices in the home

Page 47: House - Dynamic Bandwidth Throttling in a Client Server ...

Wish List (cont.)

> Consistent Bandwidth Control and Prediction across titles and platforms
    Game developers should be relieved of this job
    Common problem for many games
    History is applicable across titles and platforms
> Power Off UDP Packet Delivery
    Add ability for hardware to send out notification of powering down before actually powering down
    This will allow others who are connected to game to be notified of removal of box and thus can handle it gracefully