PAGE 1 | GRACE HOPPER CELEBRATION 2016 | #GHC16 | PRESENTED BY THE ANITA BORG INSTITUTE AND THE ASSOCIATION FOR COMPUTING MACHINERY

Building Resilient Distributed Systems by Using Caching Command and Rollback-Replay

Tanuja Phadke


The problem with resiliency in distributed systems

[Diagram: Single node system. One Node containing Database1, Database2, a Web container, and caching.]

• All components reside on the same machine.
• It’s not too hard to ensure atomicity.
  • Either all operations occur or none occur.

The problem with resiliency in distributed systems

[Diagram: Multi-node system with components spread across node1 and node2.]

• Components are spread out.
• Maintaining atomicity and resiliency is a challenge.
• So we strive for eventual consistency.
  • The change will eventually be propagated to all the copies of data.

Intuit case study: Login Service

Intuit makes financial software. Many of these products use the Login service for login and fetching users’ bank accounts and transactions securely.

Requirements for the Login Service

• Fast response times
• Resilience
• Fault-tolerance
• Consistency

4-step solution we used to solve the problem

1. Decouple design
   a. Implement single responsibility principle (SRP)
   b. Use the command pattern
2. Use circuit breaker framework
3. Use reactor to recover
4. Use caching (record)

1. Decouple design

• Individual components can be developed independently.
• Components can be plugged into a bigger solution (plug and play).

1a. Implement single responsibility principle

• A module or class should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility.
• Separation of concerns
• Each module/method does only one task.

1b. Use command pattern

[Diagram: UML of the command pattern. Elements: Client, Invoker, <<interface>> Command with execute() and recover(), ConcreteCommand A, ConcreteCommand B. Relationships: creates, uses, implements.]

A behavioral design pattern in which an object is used to encapsulate all information needed to perform an action or trigger an event at a later time.
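A minimal Python sketch of the interface in the diagram. Only `execute()` and `recover()` come from the slide; `CreateUserCommand`, `Invoker.run`, `Invoker.recover_all`, and the in-memory dict standing in for a service are hypothetical names chosen for illustration, and recovery is assumed here to mean rollback.

```python
from abc import ABC, abstractmethod


class Command(ABC):
    """Interface from the slide: every command can execute and recover."""

    @abstractmethod
    def execute(self) -> None: ...

    @abstractmethod
    def recover(self) -> None: ...


class CreateUserCommand(Command):
    """Hypothetical concrete command; `store` stands in for a remote service."""

    def __init__(self, store: dict, user_id: str, name: str) -> None:
        self.store = store
        self.user_id = user_id
        self.name = name

    def execute(self) -> None:
        self.store[self.user_id] = self.name

    def recover(self) -> None:
        # Recovery is assumed to be a rollback: undo the create if it happened.
        self.store.pop(self.user_id, None)


class Invoker:
    """Runs commands without knowing what they do."""

    def __init__(self) -> None:
        self.history = []

    def run(self, command: Command) -> None:
        command.execute()
        self.history.append(command)

    def recover_all(self) -> None:
        # Undo in reverse order of execution.
        while self.history:
            self.history.pop().recover()


# Usage: execute one command, then roll it back.
store = {}
invoker = Invoker()
invoker.run(CreateUserCommand(store, "u1", "Ada"))
after_execute = dict(store)
invoker.recover_all()
after_recover = dict(store)
```

Because the invoker only sees the `Command` interface, new operations can be added without changing the code that runs or recovers them.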

Benefits of the command pattern

• Each command knows how to execute itself.
• Each command knows how to react to failures.
  • Rollback
  • Retry
  • Something else

Traditional model with services

[Diagram: Orchestration Handler (GET, PUT) with Service A and Service B, each exposing create, update, delete, and get.]

Introduce commands

[Diagram: Orchestration Handler (GET, PUT) with Service A and Service B, each exposing create, update, delete, and get, plus a Command (create, update, ...) for each service.]

2. Use circuit breaker

• A circuit breaker is used to detect failures, and it encapsulates the logic for reacting to failure (during maintenance, temporary external system failure, or unexpected system difficulties).

• The circuit breaker pattern is a stability pattern that can be applied in a RESTful architecture.

• Several open source implementations are available (Hystrix, developed by Netflix, is a popular one).
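To make the idea concrete, here is a minimal Python sketch of a circuit breaker. It is not Hystrix's API; the class name `CircuitBreaker`, the parameters `max_failures` and `reset_after`, and the injectable `clock` are assumptions made for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised internally when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast and do not touch the failing service.
                return fallback(CircuitOpenError("circuit open"))
            # Reset window elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception as exc:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback(exc)
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


# Usage with a fake clock so the example does not need to sleep.
now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: now[0])
calls = []


def flaky():
    calls.append("called")
    raise ConnectionError("service down")


def fallback(exc):
    return "fallback: " + type(exc).__name__


results = [breaker.call(flaky, fallback) for _ in range(3)]
```

After two failures the third call is short-circuited: `flaky` is not invoked again, which is what lets the caller fail fast instead of waiting on a service that is already down.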

Example of circuit breaker

3. Use reactor

• Gets invoked in case of failure.
• We can specify the behavior.
  • Rollback
  • Retry
  • Trigger a back-up
  • Fallback
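One way to picture a reactor with a configurable behavior is the Python sketch below. The `Reactor` class, its `strategy` names, and the `on_failure` signature are hypothetical; the slides do not show the actual implementation, and the "trigger a back-up" behavior is left out for brevity.

```python
class Reactor:
    """Hypothetical reactor: invoked on failure with a configurable behavior."""

    def __init__(self, strategy="rollback", max_retries=2):
        self.strategy = strategy
        self.max_retries = max_retries

    def on_failure(self, execute, rollback, fallback_value=None):
        if self.strategy == "retry":
            for _ in range(self.max_retries):
                try:
                    return execute()
                except Exception:
                    continue
            return fallback_value  # retries exhausted
        if self.strategy == "rollback":
            rollback()
            return fallback_value
        # "fallback": take no other action, just hand back the default value.
        return fallback_value


# Usage: an operation that fails once and then succeeds.
attempts = []


def flaky_update():
    attempts.append(1)
    if len(attempts) < 2:
        raise TimeoutError("temporary failure")
    return "updated"


undone = []
retry_result = Reactor("retry", max_retries=3).on_failure(
    flaky_update, lambda: None
)
rollback_result = Reactor("rollback").on_failure(
    flaky_update, lambda: undone.append("undo"), "error response"
)
```

With the retry strategy the second attempt succeeds; with the rollback strategy the operation is not re-run and only the undo step is executed.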

Use circuit breaker

[Diagram: Orchestration Handler (GET, PUT), Commands, and Service A and Service B (create, update, delete, get each), with a Circuit breaker added. Circuit breaker labels: Fallback, Short circuit, Log error, Error response.]

4. Use caching

• Use cache to save the commands so that they can be used for recovery.

• Some popular open source solutions:
  • Hazelcast
  • Memcached
  • Redis
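A rough Python sketch of recording commands so they can be found again for recovery. The in-memory dict is only a stand-in for a real cache such as Hazelcast, Memcached, or Redis, and `CommandCache`, `record`, `mark_dirty`, and `dirty_commands` are hypothetical names; the `dirty` flag is an assumption based on the "dirty" and "not dirty" labels in the later diagrams.

```python
import json


class CommandCache:
    """In-memory stand-in for a distributed cache (assumption, not a real client)."""

    def __init__(self):
        self._entries = {}

    def record(self, request_id, name, args, dirty=False):
        # Store the command as JSON so it could live in any key-value cache.
        self._entries[request_id] = json.dumps(
            {"name": name, "args": args, "dirty": dirty}
        )

    def mark_dirty(self, request_id):
        entry = json.loads(self._entries[request_id])
        entry["dirty"] = True
        self._entries[request_id] = json.dumps(entry)

    def dirty_commands(self):
        # Commands that still need recovery (rollback or replay).
        return [
            (request_id, entry)
            for request_id, raw in self._entries.items()
            for entry in [json.loads(raw)]
            if entry["dirty"]
        ]

    def clear(self, request_id):
        self._entries.pop(request_id, None)


# Usage: record two commands, then flag the second one for recovery.
cache = CommandCache()
cache.record("req-1", "update_user", {"user_id": "u1", "email": "a@example.com"})
cache.record("req-2", "update_user", {"user_id": "u2", "email": "b@example.com"})
cache.mark_dirty("req-2")
pending = cache.dirty_commands()
```

Storing the command's name and arguments, rather than a live object, means a recovery process on another node could rebuild and replay or roll back the command.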

4. Use caching

[Diagram: Orchestration Handler (GET, PUT) with Service A and Service B (create, update, delete, get each); circuit breaker labels Fallback, Short circuit, Log error, Error response; and a Cache Client, a Cache, and a Cache Listener (Reactor).]

Full resilient picture

[Diagram: Orchestration Handler (GET, PUT) with Service A and Service B, each exposing create, update, delete, and get.]

Service A fails

[Diagram: Orchestration Handler (GET, PUT) with Service A and Service B (create, update, delete, get each). Failure path labels: Fallback, Short circuit, Log error, Error response.]

Service A is successful and Service B fails

[Diagram: Orchestration Handler (GET, PUT) with Service A and Service B (create, update, delete, get each); failure path labels Fallback, Short circuit, Log error, Error response; Cache Client, Cache, and Cache Listener (Reactor). Annotations: Reactor (recover), Caching dirty.]
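The scenario on this slide can be sketched in Python as follows, assuming that "recover" means rolling back the work Service A already completed and that "dirty" is a flag set in the cache when a later step fails. `UpdateCommand`, `handle_put`, and the dicts standing in for the services and the cache are all hypothetical.

```python
class UpdateCommand:
    """Hypothetical command that writes one key to one service's data."""

    def __init__(self, service, key, value, fail=False):
        self.service, self.key, self.value, self.fail = service, key, value, fail
        self.previous = None

    def execute(self):
        if self.fail:
            raise ConnectionError("service unavailable")
        self.previous = self.service.get(self.key)
        self.service[self.key] = self.value

    def recover(self):
        # Roll back to the value seen before execute().
        if self.previous is None:
            self.service.pop(self.key, None)
        else:
            self.service[self.key] = self.previous


def handle_put(commands, cache):
    """Run commands in order; on failure, mark the cache dirty and recover."""
    cache["executed"], cache["dirty"] = [], False
    for command in commands:
        try:
            command.execute()
            cache["executed"].append(command)  # record for possible recovery
        except Exception:
            cache["dirty"] = True
            # Reactor step: undo recorded commands, newest first.
            for done in reversed(cache["executed"]):
                done.recover()
            return "error response"
    return "ok"


# Usage: Service A succeeds, Service B fails.
service_a = {"email": "old@example.com"}
service_b = {}
cache = {}
response = handle_put(
    [
        UpdateCommand(service_a, "email", "new@example.com"),
        UpdateCommand(service_b, "email", "new@example.com", fail=True),
    ],
    cache,
)
```

After the failed request, Service A holds its original value again, so the two services do not disagree about the user's data.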

Service A and Service B both succeed

[Diagram: Orchestration Handler (GET, PUT) with Service A and Service B (create, update, delete, get each); failure path labels Fallback, Short circuit, Log error, Error response; Cache Client, Cache, and Cache Listener (Reactor). Annotations: Caching (record), Not dirty.]

4-step solution we used to solve the problem

1. Decouple design
   a. Implement single responsibility principle (SRP)
   b. Use the command pattern
2. Use circuit breaker framework
3. Use reactor to recover
4. Use caching (record)

Our story: How did we benefit?

Over 100 user update requests were failing.
• They got slow responses.
• This resulted in high CPU utilization and cascading failures.

After we implemented this solution, we failed fast and could adhere to the SLAs.

For more info ...

Retry pattern: https://msdn.microsoft.com/en-us/library/dn589788.aspx

Command Handling: http://www.axonframework.org/docs/2.0/command-handling.html

Thank you
