Cynycal software

17
CYNICAL SOFTWARE Tips for writing software that do not wake you up at 2am [email protected] https://twitter.com/_dagi Sunday 17 November 13

description

Tips for writing software that do not wake you up at 2am. A topic highly influenced by Michael T. Nygard's book Release It and lessons we've learnt at GoodData.

Transcript of Cynycal software

Page 1: Cynycal software

CYNICAL SOFTWARE

Tips for writing software that do not wake you up at 2am

[email protected]://twitter.com/_dagi

Sunday 17 November 13

Page 2: Cynycal software

HOW TO BECOME CYNICAL

• Eat your own dog food

• Strong feedback

• Pager duty (Engineer on duty)

•DevOps

Sunday 17 November 13

No throw over the wall syndrom

Page 3: Cynycal software

Sunday 17 November 13

Page 4: Cynycal software

http://pragprog.com/book/mnee/release-itSunday 17 November 13

Read an excelent book about Cynical software- stability patterns&anti-patterns- architecture, testing and design

Page 5: Cynycal software

LB

DB

3rd party service

Frontend service

Backend service

Backend service

Frontend service

Backend service LB DB

DB

LB

Failure modes

Integration points

INTEGRATION POINTS & FAILURE MODES

Sunday 17 November 13

- shift toward SOA, interconnected services, remote communication, 3rd party services- how a crack may appear - failure mode- integration point- cracks are tightly coupled to integration points- the way cracks appear and propagated across multiple layers and services- Integration points may accelerate (chained reaction) or stop cracks- Failure in one component increase the probability of failure in another component service- Slow responses, endpoint unreachability- High levels of complexity provide more directions for the cracks to propagate in.

Page 6: Cynycal software

Sunday 17 November 13

Page 7: Cynycal software

Datacenter Rackspace

Datacenter AWS

User user; Projects projects;try { projects = dao.getProjects(user, OFFSET, LIMIT).await(250, MS);} catch (FutureTimeoutException e) { throw new AccountInfoUnavailableException();}

for (Project project : projects) { Set<String> userPermissions = dao.getPermissions(project, user); if (userPermissions.contains(CAN_INIT_DATA)) { return new AccountInfo(Boolean.TRUE); }}

Frontend server Backend server

C3

MySQL

1

2

MySQL

MySQL

Frontend server Backend server

C3

MySQLMySQLMySQL

Hanging SQL connection

(90 s timeout)

ClientHanging HTTP connection (60s

timeout)

Hanging HTTP worker thread

(120s)

Sunday 17 November 13

Page 8: Cynycal software

Sunday 17 November 13

Page 9: Cynycal software

A cynical software

“Cynicism is merely the art of seeing things as they are instead of as they ought to be” [1.]

Bad things happen

Lack of trust

Internal barriers

No intimacy

Resilience to impulse and stressSunday 17 November 13

Explain the main attributes of cynical software

Page 10: Cynycal software

STABILITY PATTERNS & ANTIPATTERNS

BulkheadsTest harness

Circuit breakerHandshaking Fail fast

Timeouts

Steady state

Decoupling middleware

SLA inversion

Blocked threads

Slow responses

Attacks of self denial

Unbounded result set

Scaling effects

Unbalanced capacities

Sunday 17 November 13

- the antipatterns will create, accelerate or multiply cracks in the system- the patterns provide architecture and design guidance to reduce, eliminate, or mitigate the effects of cracks in the system

Page 11: Cynycal software

CIRCUIT BREAKER

Closed Failure

Failure counter : 1Failure counter : 2

FailureOpenTimeoutHalf-OpenFailure counter : 0

OpenFrontend service

Backend service

Sunday 17 November 13

- mediator (decoupling, isolation), integration point wrapper- fail fast, recovery

Page 12: Cynycal software

HTTP client

BULKHEADS

Frontend service

Backend service A

Backend service B

Backend service C

Connections pool

Hiccup

Bulkheads

Hiccup

Sunday 17 November 13

fat tails - sizing/capacityisolation, degrading functionality

Page 13: Cynycal software

Latency and Fault Tolerance for Distributed Systems

https://github.com/Netflix/Hystrix

Sunday 17 November 13

- do not reinvent the wheel- OSS, Java- most of the patterns are implemented there (circuit breaker, bulkheads, fail fast)

Page 14: Cynycal software

public class CommandHelloWorld extends HystrixCommand<String> {

private final String name;

public CommandHelloWorld(String name) { super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup")); this.name = name; }

@Override protected String run() { // a real example would do work like a network call here return "Hello " + name + "!"; }}

String s = new CommandHelloWorld("World").execute();

Future<String> fs = new CommandHelloWorld("World").queue();

Sunday 17 November 13

HystrixCommand = circuit breaker + bulkheadsSynchronous/Asynchronous usageAsync usage Future -> timeout

Page 15: Cynycal software

TESTINGFocus on failures

Chaos monkey [2]

Simulate bad things

Sunday 17 November 13

- we do test but on "sligthly" different topology. It makes hard to reveal some kind of bugs - we mostly test optimistic cases - HTTP mock server (bad responses, slow responses, protocol violation...) - Longevity tests

Page 16: Cynycal software

ARCHITECTURE

Adaptable design [3]

Conway’s law[4]

ROC [5]Sunday 17 November 13

- early decisions are hard to revert later (costs)- Big Up Front Design doesn’t work prefer Adaptable design - Framework, Platform- restartability (No restart the world), diagnostics (health checks), recovery mechanism - circuit breaker, isolation/redundancy ()

Page 17: Cynycal software

Q&A

https://github.com/dagi/cynical-software

Sunday 17 November 13