LOAD BALANCING - QCon San Francisco · 2020-05-10 · LOAD BALANCING IS IMPOSSIBLE 6 SLIDE...

LOAD BALANCING IS IMPOSSIBLE

Page 1:

LOAD BALANCING IS IMPOSSIBLE

Page 2:

LOAD BALANCING IS IMPOSSIBLE

Tyler McMullen

[email protected]

@tbmcmullen

Page 3:

WHAT IS LOAD BALANCING?

Page 4:

[DIAGRAM DESCRIBING LOAD BALANCING]

Page 5:

[ALLEGORY DESCRIBING LOAD BALANCING]

Page 6:

Why Load Balance?

Three major reasons, the least of which is balancing load.

Abstraction

• Treat many servers as one

• Single entry point

• Simplification

Failure

• Transparent failover

• Recover seamlessly

• Simplification

Balancing Load

• Spread the load efficiently across servers

Page 7:

RANDOM

THE INGLORIOUS DEFAULT

AND BANE OF MY EXISTENCE

Page 8:

What’s good about random?

• Simplicity

• Few edge cases

• Easy failover

• Works identically when distributed

Page 9:

What’s bad about random?

• Latency

• Especially long-tail latency

• Useable capacity

Page 10:

BALLS-INTO-BINS

Page 11:

If you throw m balls into n bins, what is the maximum load of any one bin?

Page 12:
Page 13:

import numpy as np
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

bins = [0] * n

# Assign each request to a server chosen uniformly at random.
for chosen_bin in nr.randint(0, n, m):
    bins[chosen_bin] += 1

print(bins)

[129, 100, 134, 113, 117, 136, 148, 123]
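
A single run shows one outcome; the balls-into-bins question is about how far the fullest bin tends to stray from the average. Here is a minimal sketch, assuming the same n = 8 and m = 1000 and using Python's standard `random` module rather than numpy. The function name `max_load`, the fixed seed, and the trial count are illustrative choices, not from the talk.

```python
import random

def max_load(n, m, rng):
    """Throw m balls into n bins uniformly at random; return the fullest bin's count."""
    bins = [0] * n
    for _ in range(m):
        bins[rng.randrange(n)] += 1
    return max(bins)

rng = random.Random(0)        # fixed seed so reruns are repeatable
n, m, trials = 8, 1000, 200   # trials is an arbitrary illustrative value

loads = [max_load(n, m, rng) for _ in range(trials)]
print("mean load per bin:", m / n)
print("average max load: ", sum(loads) / trials)
print("worst max load:   ", max(loads))
```

The fullest bin can never hold less than the mean of m / n, so the gap between the two printed figures is capacity that a random balancer leaves unusable on the other servers.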

Page 14:

import numpy as np
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

bins = [0] * n

# Each request now carries a cost drawn uniformly from [0, 2), so the mean cost is 1.
for weight in nr.uniform(0, 2, m):
    chosen_bin = nr.randint(0, n)
    bins[chosen_bin] += weight

print([round(float(b), 1) for b in bins])

[133.1, 133.9, 144.7, 124.1, 102.9, 125.4, 114.2, 121.3]

Page 15:

How do you model request latency?

Page 16:

What do Erlang and getting kicked by a horse have in common?

Page 17:

POISSON PROCESS
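
In a Poisson process, events arrive independently at a constant average rate, which means the gaps between arrivals are exponentially distributed. The sketch below generates request arrivals that way and counts them per second. The rate and duration are hypothetical values picked for illustration, and `poisson_arrivals` is a name introduced here, not from the talk.

```python
import random

def poisson_arrivals(rate, duration, rng):
    """Return arrival times of a Poisson process with the given rate over [0, duration)."""
    times = []
    t = rng.expovariate(rate)  # gaps between arrivals are exponentially distributed
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times

rng = random.Random(0)
rate = 100.0      # hypothetical average of 100 requests per second
duration = 60.0   # simulate one minute

arrivals = poisson_arrivals(rate, duration, rng)

# Count arrivals in each one-second window.
counts = [0] * int(duration)
for t in arrivals:
    counts[int(t)] += 1

mean = sum(counts) / len(counts)
print("mean requests per second:", mean)
print("quietest second:", min(counts))
print("busiest second: ", max(counts))
```

Even with a perfectly steady average rate, individual seconds come out noticeably busier or quieter than the mean, which is the burstiness a load balancer has to absorb.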

Page 18:
Page 19:
Page 20:
Page 21:
Page 22:

WHY IS THAT A PROBLEM?

Page 23:

50ms

Page 24:
Page 25:

Even if your application has perfect constant response time

... It doesn’t.

Page 26:

Log-normal Distribution

50th: 0.6

75th: 1.2

95th: 3.1

99th: 6.0

99.9th: 14.1

MEAN: 1.0
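
Percentiles like these can be computed directly, because a log-normal value is the exponential of a normal one. The sketch below is an illustration under assumptions: the slide does not state its parameters, so mu = 0.0 and sigma = 1.15 are borrowed from the simulation code later in the deck, and the printed figures need not match the slide exactly. It relies on `statistics.NormalDist`, available in Python 3.8 and later; `lognormal_percentile` is a name introduced here.

```python
import math
from statistics import NormalDist

def lognormal_percentile(p, mu, sigma):
    """Value below which a fraction p of a log-normal(mu, sigma) distribution falls."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

mu, sigma = 0.0, 1.15  # assumed parameters; the slide does not state its own
mean = math.exp(mu + sigma ** 2 / 2)

for p in (0.50, 0.75, 0.95, 0.99, 0.999):
    # Divide by the mean so the distribution is rescaled to a mean of 1.0.
    print(f"{p * 100:g}th: {lognormal_percentile(p, mu, sigma) / mean:.1f}")
```

The point to notice is the shape: the median sits well below the mean, while the far tail is many times larger than it.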

Page 27:

User-Generated Content

Social

Ad-serving

Photos

Page 28:

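import math
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests
bins = [0.0] * n

mu = 0.0
sigma = 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)

desired_mean = 1.0

def normalize(value):
    # Rescale so the average request weight equals desired_mean.
    return value / lognorm_mean * desired_mean

for weight in nr.lognormal(mu, sigma, m):
    chosen_bin = nr.randint(0, n)
    bins[chosen_bin] += normalize(weight)

[128.7, 116.7, 136.1, 153.1, 98.2, 89.1, 125.4, 130.4]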

Page 29:

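import math
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests
bins = [0.0] * n

mu = 0.0
sigma = 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)

desired_mean = 1.0
baseline = 0.05

def normalize(value):
    # Every request pays the fixed baseline; the variable part is scaled
    # so the overall average still equals desired_mean.
    return (value / lognorm_mean * (desired_mean - baseline) + baseline)

for weight in nr.lognormal(mu, sigma, m):
    chosen_bin = nr.randint(0, n)
    bins[chosen_bin] += normalize(weight)

[100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]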
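
The baseline version of `normalize` gives every request a fixed minimum cost while keeping the average at `desired_mean`. A quick sketch to check that numerically, assuming the slide's parameters and using `random.lognormvariate` from the standard library in place of numpy. The sample size and seed are arbitrary choices made for this example.

```python
import math
import random

mu, sigma = 0.0, 1.15
lognorm_mean = math.exp(mu + sigma ** 2 / 2)
desired_mean = 1.0
baseline = 0.05

def normalize(value):
    # Rescale so the variable part averages (desired_mean - baseline),
    # then add the fixed baseline cost every request pays.
    return value / lognorm_mean * (desired_mean - baseline) + baseline

rng = random.Random(0)
samples = [normalize(rng.lognormvariate(mu, sigma)) for _ in range(200_000)]

print("sample mean:", sum(samples) / len(samples))  # should land close to desired_mean
print("smallest:   ", min(samples))                 # never below the baseline
print("largest:    ", max(samples))
```

The largest sample is the interesting one: a single request can cost many times the mean, and whichever server receives it falls behind the rest.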

Page 30:

THIS IS WHY PERFECTION IS IMPOSSIBLE

Page 31:

._.


Page 32:

WHAT EFFECT DOES IT HAVE?

Page 33:

Random simulation

Actual distribution

Page 34:

The probability of a single resource request avoiding the 99th percentile is 99%.

The probability of all N resource requests in a page avoiding the 99th percentile is (99% ^ N).

99% ^ 69 = 49.9%
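
The arithmetic can be checked in a couple of lines. This sketch assumes the requests are independent of one another, which is what the 99% ^ N formula implies; `prob_all_fast` is a name introduced for this example.

```python
def prob_all_fast(n_requests, percentile=0.99):
    """Probability that every one of n independent requests beats the given percentile."""
    return percentile ** n_requests

for n_requests in (1, 10, 69, 100):
    print(n_requests, f"{prob_all_fast(n_requests):.2%}")
```

With 69 resources on a page, roughly half of all page loads include at least one request from the slowest 1%.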

Page 35:
Page 36:

SO WHAT DO WE DO ABOUT IT?

Page 37:

Random simulation

JSQ simulation

Page 38:
Page 39:
Page 40:

Join-shortest-queue
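
Join-shortest-queue sends each request to whichever server currently has the least outstanding work. The sketch below compares it with random assignment in the same static bins model as the earlier slides. That model is a simplifying assumption: work is only ever added and never completes, so "queue length" here means accumulated load. The function names, seed, and use of the standard `random` module are choices made for this example, not from the talk.

```python
import math
import random

def simulate(n, m, choose, rng):
    """Assign m log-normally weighted requests to n servers using the choose(bins, rng) policy."""
    mu, sigma = 0.0, 1.15
    lognorm_mean = math.exp(mu + sigma ** 2 / 2)
    bins = [0.0] * n
    for _ in range(m):
        weight = rng.lognormvariate(mu, sigma) / lognorm_mean  # mean weight of 1.0
        bins[choose(bins, rng)] += weight
    return bins

def pick_random(bins, rng):
    # Random: ignore load entirely.
    return rng.randrange(len(bins))

def pick_shortest(bins, rng):
    # Join-shortest-queue: send the request to the least-loaded server.
    return min(range(len(bins)), key=bins.__getitem__)

rng = random.Random(0)
random_bins = simulate(8, 1000, pick_random, rng)
jsq_bins = simulate(8, 1000, pick_shortest, rng)

print("random spread:", max(random_bins) - min(random_bins))
print("JSQ spread:   ", max(jsq_bins) - min(jsq_bins))
```

Note that `pick_shortest` needs an up-to-date view of every server's load, which is easy for a single balancer and is exactly what becomes hard once there are several.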

Page 41:

LET’S THROW A WRENCH INTO THIS...

DISTRIBUTED LOAD BALANCING

AND WHY IT MAKES EVERYTHING HARDER

Page 42:

DISTRIBUTED RANDOM IS EXACTLY THE SAME

Page 43:

DISTRIBUTED JOIN-SHORTEST-QUEUE IS A NIGHTMARE

Page 44:
Page 45:
Page 46:
Page 47:

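import math
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests
bins = [0.0] * n

mu = 0.0
sigma = 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)

desired_mean = 1.0
baseline = 0.05

def normalize(value):
    # Every request pays the fixed baseline; the variable part is scaled
    # so the overall average still equals desired_mean.
    return (value / lognorm_mean * (desired_mean - baseline) + baseline)

for weight in nr.lognormal(mu, sigma, m):
    chosen_bin = nr.randint(0, n)
    bins[chosen_bin] += normalize(weight)

[100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]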

Page 48:

import math
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests
bins = [0.0] * n

mu = 0.0
sigma = 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)

desired_mean = 1.0
baseline = 0.05

def normalize(value):
    return (value / lognorm_mean * (desired_mean - baseline) + baseline)

for weight in nr.lognormal(mu, sigma, m):
    # Pick two servers at random and send the request to the less loaded one.
    a = nr.randint(0, n)
    b = nr.randint(0, n)
    chosen_bin = a if bins[a] < bins[b] else b
    bins[chosen_bin] += normalize(weight)

[130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]

Page 49:

[100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]

STANDARD DEVIATION: 22.9

[130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]

STANDARD DEVIATION: 1.18
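
Each figure belongs to one of the two result lists: the purely random totals give 22.9 and the two-random-choices totals give 1.18. Both are consistent with the population standard deviation (dividing by n, as numpy's `np.std` does by default). A short check using the standard library, with variable names introduced for this example:

```python
from statistics import pstdev

# Per-server totals copied from the two simulation runs on the slides.
random_bins = [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]
two_choice_bins = [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]

# Population standard deviation of the per-server totals.
print(f"random:      {pstdev(random_bins):.1f}")
print(f"two choices: {pstdev(two_choice_bins):.2f}")
```

Comparing just two randomly chosen servers per request cuts the spread by more than an order of magnitude in this run.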

Page 50:

Random simulation

JSQ simulation

Randomized JSQ simulation

Page 51:
Page 52:
Page 53:
Page 54:
Page 55:

ANOTHER CRAZY IDEA

Page 56:
Page 57:

WRAP UP

Page 58:

THANKS BYE

[email protected]

@tbmcmullen
