System Reliability and Resilience and stuff. Some things need to be cleared up first.

Transcript of System Reliability and Resilience and stuff. Some things need to be cleared up first.

Page 1: System Reliability and Resilience and stuff. Some things need to be cleared up first.

System Reliability and Resilience and stuff

Page 2: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Some things need to be cleared up first

Page 3: System Reliability and Resilience and stuff. Some things need to be cleared up first.

http://en.wikipedia.org/wiki/Vedette_(cabaret)

Page 4: System Reliability and Resilience and stuff. Some things need to be cleared up first.

tuple

Page 5: System Reliability and Resilience and stuff. Some things need to be cleared up first.

// Initialize customer and invoice
Initialize(customer, invoice);

Page 6: System Reliability and Resilience and stuff. Some things need to be cleared up first.

public void Initialize(Customer customer, Invoice invoice)
{
    customer.Name = "asdf";
    invoice.Date = DateTime.Now;
}

Page 7: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Initialize(customer, invoice);
// did something happen to customer and/or invoice?

Page 8: System Reliability and Resilience and stuff. Some things need to be cleared up first.

customer.Name = InitNameFrom(customer, invoice);
invoice.Date = InitDateFrom(customer, invoice);

Page 9: System Reliability and Resilience and stuff. Some things need to be cleared up first.

customer.Name = GetNameFrom(customer, invoice);
invoice.Date = GetDateFrom(customer, invoice);

Page 10: System Reliability and Resilience and stuff. Some things need to be cleared up first.

var results = Initialize(customer, invoice);
customer.Name = results.Item1;
invoice.Date = results.Item2;

Page 11: System Reliability and Resilience and stuff. Some things need to be cleared up first.

public Tuple<string, DateTime> Initialize(Customer customer, Invoice invoice)
{
    return new Tuple<string, DateTime>("asdf", DateTime.Now);
}

Page 12: System Reliability and Resilience and stuff. Some things need to be cleared up first.

public static bool TryParse(string s, out DateTime result)

or

public static Tuple<bool, DateTime?> TryParse(string s)

Page 13: System Reliability and Resilience and stuff. Some things need to be cleared up first.

tuple
• Avoid side effects
• Avoid out parameters
• Return multiple values without a specific type
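The tuple-returning TryParse from the earlier slide can be sketched as a thin wrapper over the framework's own DateTime.TryParse. The class name DateParser and the choice of the invariant culture are assumptions made for illustration, not part of the talk.

```csharp
using System;
using System.Globalization;

public static class DateParser
{
    // Hypothetical helper: returns the success flag and the parsed value
    // together, so the caller needs no out parameter.
    public static Tuple<bool, DateTime?> TryParse(string s)
    {
        DateTime parsed;
        bool ok = DateTime.TryParse(
            s, CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);

        return ok
            ? Tuple.Create(true, (DateTime?)parsed)
            : Tuple.Create(false, (DateTime?)null);
    }
}
```

A caller reads result.Item1 for success and result.Item2 for the value; nothing passed in is modified.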

Page 14: System Reliability and Resilience and stuff. Some things need to be cleared up first.

null object

Page 15: System Reliability and Resilience and stuff. Some things need to be cleared up first.

private ILogger _logger;

public MyClass(ILogger logger)
{
    _logger = logger;
}

if (_logger != null)
{
    _logger.Debug("it worked on my machine!");
}

Page 16: System Reliability and Resilience and stuff. Some things need to be cleared up first.

null checks for everyone!

Page 17: System Reliability and Resilience and stuff. Some things need to be cleared up first.

forget one and…

Page 18: System Reliability and Resilience and stuff. Some things need to be cleared up first.

public class NullLogger : ILogger
{
    public void Debug(string text)
    {
        // do sweet nothing
    }
}

Page 19: System Reliability and Resilience and stuff. Some things need to be cleared up first.

private ILogger _logger = new NullLogger();

public MyClass(ILogger logger)
{
    // keep the NullLogger default if the caller passes null
    _logger = logger ?? _logger;
}

_logger.Debug("it worked on my machine!");

Page 20: System Reliability and Resilience and stuff. Some things need to be cleared up first.

null object
• Can eliminate null checks
• Simple to implement
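Put together, the null object pattern from the slides above might look like the sketch below. ILogger, NullLogger and MyClass follow the slides; ListLogger, DoWork and the parameterless constructor are assumptions added so the example stands on its own.

```csharp
using System;
using System.Collections.Generic;

public interface ILogger
{
    void Debug(string text);
}

public class NullLogger : ILogger
{
    public void Debug(string text) { /* do sweet nothing */ }
}

// Hypothetical "real" logger, here only to show the two are interchangeable.
public class ListLogger : ILogger
{
    public readonly List<string> Lines = new List<string>();
    public void Debug(string text) { Lines.Add(text); }
}

public class MyClass
{
    private readonly ILogger _logger;

    public MyClass() : this(new NullLogger()) { }

    public MyClass(ILogger logger)
    {
        // Fall back to the null object so no call site needs a null check.
        _logger = logger ?? new NullLogger();
    }

    public void DoWork()
    {
        _logger.Debug("it worked on my machine!");
    }
}
```

new MyClass().DoWork() runs without a logger and without a null check; passing a ListLogger records the message instead.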

Page 21: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Circuit Breaker

Page 22: System Reliability and Resilience and stuff. Some things need to be cleared up first.
Page 23: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Retry

Page 24: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Your Application → Out of Process Dependency

N times

Page 25: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Out of Process Dependency

N times * Y clients

Page 26: System Reliability and Resilience and stuff. Some things need to be cleared up first.

= Denial of Service Attack

Page 27: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Limit the # of retries

Page 28: System Reliability and Resilience and stuff. Some things need to be cleared up first.

N * Y becomes 5 * Y

Page 29: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Y is still a problem
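A capped retry loop, as a minimal sketch. The Retry class, its Execute signature and the pause between attempts are assumptions for illustration. It bounds N for a single client only; it does nothing about Y.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class Retry
{
    // Hypothetical helper: calls operation at most maxAttempts times,
    // pausing between attempts, and keeps every failure it saw.
    public static T Execute<T>(Func<T> operation, int maxAttempts, TimeSpan pause)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException("maxAttempts");

        var failures = new List<Exception>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex)
            {
                failures.Add(ex);
            }

            // No point sleeping after the final attempt.
            if (attempt < maxAttempts)
                Thread.Sleep(pause);
        }

        throw new AggregateException(
            "All " + maxAttempts + " attempts failed.", failures);
    }
}
```

With maxAttempts set to 5, a dependency that stays down sees 5 calls per client request instead of an unbounded N.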

Page 30: System Reliability and Resilience and stuff. Some things need to be cleared up first.
Page 31: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Circuit Breaker

Page 32: System Reliability and Resilience and stuff. Some things need to be cleared up first.
Page 33: System Reliability and Resilience and stuff. Some things need to be cleared up first.

State Machine

On :: Off

Page 34: System Reliability and Resilience and stuff. Some things need to be cleared up first.

On → Off when not healthy

Page 35: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Off → On manually

Page 36: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Get to software before we ask you to dance

Page 37: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Out of Process Dependency

Healthy or Unhealthy

Page 38: System Reliability and Resilience and stuff. Some things need to be cleared up first.

State is independent of requestor

Out of Process Dependency

Page 39: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Your Application

Has many independent external dependencies

Page 40: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Your Application

Can throttle itself

Page 41: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Your Application

Has a wait threshold

Page 42: System Reliability and Resilience and stuff. Some things need to be cleared up first.

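Your Application | Circuit Breaker | External Dependency

Threshold = 2, Pause = 10ms, Timeout = 30s, State = Closed

Your Application → Circuit Breaker: Request
Circuit Breaker → External Dependency: Request
External Dependency → Circuit Breaker: Failure (e.g. HTTP 500)
Failure Count = 1
Pause 10ms
Circuit Breaker → External Dependency: Request
External Dependency → Circuit Breaker: Failure (e.g. HTTP 500)
Failure Count = 2, State = Open
Circuit Breaker → Your Application: OperationFailedException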

Page 43: System Reliability and Resilience and stuff. Some things need to be cleared up first.

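Your Application | Circuit Breaker | External Dependency

Threshold = 2, Pause = 10ms, Timeout = 30s, State = Open

Your Application → Circuit Breaker: Request
30s has not passed
Circuit Breaker → Your Application: CircuitBreakerOpenException
Your Application → Circuit Breaker: Request
30s has not passed
Circuit Breaker → Your Application: CircuitBreakerOpenException

System can try to become healthy for 30s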

Page 44: System Reliability and Resilience and stuff. Some things need to be cleared up first.

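Your Application | Circuit Breaker | External Dependency

Threshold = 2, Pause = 10ms, Timeout = 30s, State = ½ Open

30s has passed
Your Application → Circuit Breaker: Request
Circuit Breaker → External Dependency: Request
External Dependency → Circuit Breaker: Failure (e.g. HTTP 500)
Failure Count = 2, State = Open
Circuit Breaker → Your Application: OperationFailedException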

Page 45: System Reliability and Resilience and stuff. Some things need to be cleared up first.

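Your Application | Circuit Breaker | External Dependency

Threshold = 2, Pause = 10ms, Timeout = 30s, State = ½ Open

30s has passed
Your Application → Circuit Breaker: Request
Circuit Breaker → External Dependency: Request
External Dependency → Circuit Breaker: Response
Failure Count = 0, State = Closed
Circuit Breaker → Your Application: Response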

Page 46: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Closed
Open
½ Open

Page 47: System Reliability and Resilience and stuff. Some things need to be cleared up first.

½ Open is like a manual reset

Page 48: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Pause
Timeout

Page 49: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Pause between calls in the loop

Page 50: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Timeout before you can call again
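The Closed / Open / ½ Open walkthrough, with Pause and Timeout, can be sketched roughly as below. This is one possible reading of the slides, not the speaker's implementation: the Execute signature, the injectable clock and the constructor shapes are assumptions, and the class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public enum CircuitState { Closed, Open, HalfOpen }

public class OperationFailedException : AggregateException
{
    public OperationFailedException(IEnumerable<Exception> failures)
        : base("Operation failed; circuit is now open.", failures) { }
}

public class CircuitBreakerOpenException : ApplicationException
{
    public CircuitBreakerOpenException()
        : base("Circuit is open; call rejected.") { }
}

public class CircuitBreaker
{
    private readonly int _threshold;
    private readonly TimeSpan _pause;
    private readonly TimeSpan _timeout;
    private readonly Func<DateTime> _clock; // assumption: injectable so tests need not wait
    private int _failureCount;
    private DateTime _openedAt;

    public CircuitState State { get; private set; }

    public CircuitBreaker(int threshold, TimeSpan pause, TimeSpan timeout,
                          Func<DateTime> clock = null)
    {
        _threshold = threshold;
        _pause = pause;
        _timeout = timeout;
        _clock = clock ?? (() => DateTime.UtcNow);
        State = CircuitState.Closed;
    }

    public T Execute<T>(Func<T> operation)
    {
        if (State == CircuitState.Open)
        {
            // Timeout: leave the dependency alone until it has elapsed.
            if (_clock() - _openedAt < _timeout)
                throw new CircuitBreakerOpenException();
            State = CircuitState.HalfOpen; // allow a trial call
        }

        var failures = new List<Exception>();
        while (true)
        {
            try
            {
                T result = operation();
                _failureCount = 0;
                State = CircuitState.Closed;
                return result;
            }
            catch (Exception ex)
            {
                failures.Add(ex);
                // The count is capped at the threshold, so a half-open trial
                // (count already at the threshold) reopens on its first failure.
                _failureCount = Math.Min(_failureCount + 1, _threshold);
                if (_failureCount >= _threshold)
                {
                    State = CircuitState.Open;
                    _openedAt = _clock();
                    throw new OperationFailedException(failures);
                }
            }

            Thread.Sleep(_pause); // Pause: wait between calls in the loop.
        }
    }
}
```

With threshold 2, pause 10ms and timeout 30s, two consecutive failures open the circuit, calls within the next 30s are rejected without touching the dependency, and the first call after that either closes the circuit or reopens it.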

Page 51: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Exceptions

Page 52: System Reliability and Resilience and stuff. Some things need to be cleared up first.

OperationFailed : AggregateException

Page 53: System Reliability and Resilience and stuff. Some things need to be cleared up first.

CircuitBreakerOpen : ApplicationException

Page 54: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Don't Lose Exception Info

Page 55: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Always use InnerException(s)
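A small sketch of what keeping exception info means in practice: when wrapping a failure, pass the original along as the inner exception. WeatherProxy and Fetch are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical proxy around an out-of-process call.
public static class WeatherProxy
{
    public static string Fetch(Func<string> call)
    {
        try
        {
            return call();
        }
        catch (Exception ex)
        {
            // Passing ex as InnerException keeps the original type,
            // message and stack trace available to whoever catches this.
            throw new ApplicationException("Weather lookup failed.", ex);
        }
    }
}
```

Throwing a new exception without the inner one would leave the caller with "Weather lookup failed." and no way to tell a timeout from a bad response.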

Page 56: System Reliability and Resilience and stuff. Some things need to be cleared up first.

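Your Application | Circuit Breaker | External Dependency

Threshold = 3, State = Closed

Your Application → Circuit Breaker: Request
Circuit Breaker → External Dependency: Request
External Dependency → Circuit Breaker: Failure (e.g. HTTP 500)
Failure Count = 1
Circuit Breaker → External Dependency: Request
External Dependency → Circuit Breaker: Failure (e.g. HTTP 500)
Failure Count = 2
Circuit Breaker → External Dependency: Request?
External Dependency → Circuit Breaker: Response
Failure Count = 0, State = Closed
Circuit Breaker → Your Application: Response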

Page 57: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Segregate Dependencies

Page 58: System Reliability and Resilience and stuff. Some things need to be cleared up first.

circuitBreaker("database")

circuitBreaker("weatherservice")

Page 59: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Dependency type, endpoint svc, endpoint
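One way to keep a separate breaker per dependency is a small registry keyed by name, sketched below. The CircuitBreakers class and its For method are assumptions, and the CircuitBreaker here is a bare stand-in so the example compiles on its own.

```csharp
using System;
using System.Collections.Concurrent;

// Minimal stand-in; a real breaker would carry state, threshold and timeout.
public class CircuitBreaker
{
    public string Name { get; }
    public CircuitBreaker(string name) { Name = name; }
}

public static class CircuitBreakers
{
    private static readonly ConcurrentDictionary<string, CircuitBreaker> Breakers =
        new ConcurrentDictionary<string, CircuitBreaker>(StringComparer.OrdinalIgnoreCase);

    // Returns the single breaker for this dependency, creating it on first use,
    // so a failing "weatherservice" never trips the "database" breaker.
    public static CircuitBreaker For(string dependency)
    {
        return Breakers.GetOrAdd(dependency, name => new CircuitBreaker(name));
    }
}
```

The key can be as coarse or as fine as the slide suggests: a dependency type, a service on an endpoint, or a single endpoint.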

Page 60: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Where?

Page 61: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Your Application | Circuit Breaker | Proxy | Out of Process Dependency

Page 62: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Watch for Inception

Page 63: System Reliability and Resilience and stuff. Some things need to be cleared up first.

Diagram labels: Your Application, Web Service, Circuit Breaker, Circuit Breaker, Proxy, Database, Repository

Page 64: System Reliability and Resilience and stuff. Some things need to be cleared up first.

circuit breaker
• retry looping
• slow down attempts
• good neighbour

Page 65: System Reliability and Resilience and stuff. Some things need to be cleared up first.

¡Muchas gracias!

Page 66: System Reliability and Resilience and stuff. Some things need to be cleared up first.

gracias

Donald Belcham
@dbelcham
