AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)


Transcript of AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Page 1: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Carl Youngblood, Lead Engineer, Under Armour

Prahlad Rao, Solutions Architect, AWS

November 29, 2016

Cross-Region Replication with Amazon DynamoDB Streams

Page 2: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

What to expect from the session

DynamoDB introduction

1. SQL vs NoSQL refresher

2. Amazon DynamoDB recap

3. DynamoDB replication patterns

Implementing cross-region replication at Under Armour

1. What does single sign-on mean?

2. Background and problem context

3. Decision process that led to our current solution

4. Our experience so far

5. Next steps

6. Starting over

Page 3: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Amazon DynamoDB

• Fully managed NoSQL

• Document or key-value

• Scales to any workload

• Fast and consistent

• Access control

• Event-driven programming

Page 4: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Table

Items

Attributes

Hash key (mandatory)

• Key-value access pattern

• Determines data distribution

Range key (optional)

• Model 1:N relationships

• Enables rich query capabilities

• All items for a hash key

• ==, <, >, >=, <=

• “begins with”

• “between”

• Sorted results

• Counts

• Top/bottom N values

• Paged responses

Table can be partitioned for scale
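The access patterns listed on this slide can be illustrated with a small in-memory model. This is only a sketch in plain Scala, not the DynamoDB API: `KeyModelSketch`, `Table`, and the method names are hypothetical, and the range key is assumed to be a string compared lexicographically.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical in-memory model of a table with a hash key and a range key.
// Not the DynamoDB API: it only illustrates the access patterns on the slide.
object KeyModelSketch {
  type Item = Map[String, String]

  // hash key -> (range key -> item), with range keys kept sorted
  final case class Table(partitions: Map[String, TreeMap[String, Item]] = Map.empty) {

    def put(hashKey: String, rangeKey: String, item: Item): Table = {
      val sorted = partitions.getOrElse(hashKey, TreeMap.empty[String, Item])
      Table(partitions.updated(hashKey, sorted.updated(rangeKey, item)))
    }

    // Key-value access: exact hash key and range key
    def get(hashKey: String, rangeKey: String): Option[Item] =
      partitions.get(hashKey).flatMap(_.get(rangeKey))

    // All items for a hash key, sorted by range key
    def query(hashKey: String): List[(String, Item)] =
      partitions.get(hashKey).map(_.toList).getOrElse(Nil)

    // "begins with" condition on the range key
    def beginsWith(hashKey: String, prefix: String): List[(String, Item)] =
      query(hashKey).filter { case (rangeKey, _) => rangeKey.startsWith(prefix) }

    // "between" condition on the range key, inclusive at both ends
    def between(hashKey: String, low: String, high: String): List[(String, Item)] =
      query(hashKey).filter { case (rangeKey, _) => rangeKey >= low && rangeKey <= high }

    // Top N values: the n highest range keys, highest first
    def topN(hashKey: String, n: Int): List[(String, Item)] =
      query(hashKey).reverse.take(n)
  }
}
```

In this model the hash key alone decides which sorted map an item lands in, which mirrors "determines data distribution", while the range key gives the ordering that the query conditions rely on.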

Page 5: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Partitions are three-way replicated

[Diagram: a table split into Partition 1, Partition 2, ... Partition N, with the same items shown on Replica 1, Replica 2, and Replica 3. Example items:]

Id   Name   Dept
1    Jim
2    Andy   Engg
3    Kim    Ops

Page 6: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

DynamoDB replication patterns

Page 7: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Replication use cases

• Globally distributed applications

• Lower-latency data access

• Traffic distribution

• Disaster recovery

• In-region and cross-region

Page 8: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

DynamoDB Streams

Stream of updates to a table

Asynchronous

Exactly once

Strictly ordered

• Per item

Highly durable

• Scale with table

24-hour lifetime

Sub-second latency

Page 9: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

In-region replication

• Automatic replication across AZs within a region (natively provided)

• Writes replicated continuously across 3 AZs, persisted to disk (SSD)

• Reads: strongly or eventually consistent

• For data redundancy and protection

• DynamoDB Streams and AWS Lambda

• Streams of updates to a table

• DynamoDB triggers invoke a Lambda function to run your code

Page 10: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Cross-region replication

Open source cross-region replication library

• Solution uses the Amazon DynamoDB Cross-Region Replication Library

• Leverages DynamoDB Streams to keep tables in sync across multiple regions in near real time

• Leverage the cross-region replication library in your applications

• Available in the GitHub repository at https://github.com/awslabs/dynamodb-cross-region-library

Page 11: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Cross-region replication

DynamoDB Streams and Amazon Kinesis Client Library

[Diagram labels: DynamoDB client application; Updates; Table (Partitions 1 to 5); Stream (Shards 1 to 4); Amazon Kinesis Client Library application (four KCL workers); Table]

Page 12: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

DynamoDB Streams and AWS Lambda

Page 13: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Cross-region replication at Under Armour

Page 14: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)
Page 15: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Under Armour connected fitness

To make all athletes better through passion, design, and the relentless pursuit of innovation.

Page 16: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Engineering Team Locations

[Pie chart. Legend: Austin, San Francisco, Copenhagen, Denver, Baltimore, Guangzhou, Off-Site. Slice labels as extracted: 42%, 33%, 11%, 8%, 2%, 1%, 3%]

Page 17: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

About me

Page 18: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

What does single sign-on mean?

Page 19: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)
Page 20: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)
Page 21: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)
Page 22: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)
Page 23: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)
Page 24: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)
Page 25: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Background and problem context

Page 26: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Background and problem context

• 1 manager/developer/tech lead

• 1 developer

• 1 site reliability engineer (me!)

• Fast startup

• Fast iteration

• Low overhead

• Reliable

188 million users. Sign on once. That’s it.

Page 27: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Background and problem context

STOP

Personally identifiable information (PII)…as used in US privacy law…is information that can be used…to identify, contact, or locate a single person, or to identify an individual in context.

Source: https://en.wikipedia.org/wiki/Personally_identifiable_information

Page 28: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Background and problem context

*not to scale

• Store data where it belongs

• Don’t store data where it doesn’t belong

• Get data where and when it’s needed

1. Replicate PII-free pointers across regions

2. Follow pointers to locate user data

userId   homeRegion
42       US

[Map labels: US users, German users, Other EU users]
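The two steps on this slide (replicate PII-free pointers, then follow them) can be sketched as a lookup over in-memory stand-ins for each region. Everything here is hypothetical and for illustration only: the `Pointer`, `Profile`, and `Region` types, the `findProfile` function, and the sample email are assumptions, not the speakers' schema.

```scala
// Hypothetical sketch of the pointer pattern: names and data are illustrative only.
object PointerLookupSketch {
  // PII-free pointer, replicated to every region
  final case class Pointer(userId: Long, homeRegion: String)

  // PII record, stored only in the user's home region
  final case class Profile(userId: Long, email: String)

  // Stand-in for one region's storage: replicated pointers plus local PII
  final case class Region(
      name: String,
      pointers: Map[Long, Pointer],
      profiles: Map[Long, Profile]
  )

  // Step 1: read the pointer locally.
  // Step 2: follow it to the home region and read the PII there.
  def findProfile(local: Region, regions: Map[String, Region], userId: Long): Option[Profile] =
    for {
      pointer <- local.pointers.get(userId)
      home    <- regions.get(pointer.homeRegion)
      profile <- home.profiles.get(userId)
    } yield profile
}
```

A request served in an EU region would find the pointer for user 42 locally, see `homeRegion = US`, and read the profile from the US region, so no PII has to be stored outside the home region.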

Page 29: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Decision process

Page 30: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Decision process

Google: “dynamodb cross region replication.”

Click first result: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.CrossRegionRepl.html

Profit. …sort of.

Page 31: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Decision process—AWS CloudFormation

Struggles

• CloudFormation

• Amazon EC2 Container Service

• Tuning containers based on throughput

• Possible to wedge the whole thing if you go full chaos monkey

• No custom replication logic

*This solution has now been deprecated.

Page 32: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Decision process

Google: “dynamodb cross region replication.”

Click first result: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.CrossRegionRepl.html

Check out the Amazon Kinesis Client Library, plus the DynamoDB Streams adapter

Profit. …well, sort of.

Page 33: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Decision process—Amazon Kinesis Client Library

Struggles

• Requires running a process somewhere

• Troubleshooting, startup, rebalancing, and failovers

• State tracking DynamoDB table in your account

• Scaling processes for throughput

• Less is more

Page 34: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Decision process

Google: “dynamodb cross region replication.”

Click first result: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.CrossRegionRepl.html

Check out the Amazon Kinesis Client Library, plus the DynamoDB Streams adapter

DynamoDB Streams + Lambda

Profit. …yep!

Page 35: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Decision process—Lambda

Strengths

• 24 hours to respond to problems

• Parallelizable with 1,024 threads

• Almost zero operational overhead

• Automatically scales with throughput

Page 36: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Decision process—Lambda

Struggles

• Log4j

• Logs to Amazon CloudWatch

• Lack of run-time configuration

Page 37: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Our experience so far

Page 38: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Experience—reads

• Public DynamoDB endpoints + TLS

• Read anonymous data locally

• Read PII from user’s home region

[Diagram labels: us-east-1 OpenID server; eu-west-1 OpenID server; regions us-east-1 and eu-west-1]

Page 39: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Experience—writes

• Write anonymous data to us-east-1

• Replicate anonymous data

• Write PII to user’s home region

• Public DynamoDB endpoints + TLS

[Diagram labels: us-east-1 OpenID server; regions us-east-1 and eu-west-1]
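The read rules on the previous slide and the write rules on this one amount to a pair of routing decisions. The sketch below is one possible way to express them; `RoutingSketch`, `DataKind`, and both function names are hypothetical, and only the rules themselves come from the slides.

```scala
// Hypothetical routing rules for the single-master setup described on the slides.
object RoutingSketch {
  sealed trait DataKind
  case object Anonymous extends DataKind
  case object Pii extends DataKind

  // Region that receives all writes of anonymous data
  val MasterRegion = "us-east-1"

  // Region whose DynamoDB endpoint a read should go to
  def readRegion(kind: DataKind, localRegion: String, homeRegion: String): String =
    kind match {
      case Anonymous => localRegion // replicated everywhere, so read nearby
      case Pii       => homeRegion  // PII stays in the user's home region
    }

  // Region whose DynamoDB endpoint a write should go to
  def writeRegion(kind: DataKind, homeRegion: String): String =
    kind match {
      case Anonymous => MasterRegion // single writer; replication fans out from here
      case Pii       => homeRegion
    }
}
```

With a single write region for anonymous data there are no concurrent writers to reconcile, which is the property the multimaster slides later give up in exchange for lower latency.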

Page 40: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Experience—replication

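// Imports assume scala-logging 3.x, aws-lambda-java-core, and aws-lambda-java-events.
import com.amazonaws.services.lambda.runtime.Context
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent
import com.typesafe.scalalogging.StrictLogging
import scala.collection.JavaConverters._

// The Main companion-object helpers (loadConfFromContext, readConfRegions,
// buildClientsFromConf, filterReplicatedUpdate, replicate) are not shown on the slide.
class Main extends StrictLogging {

  def handler(event: DynamodbEvent, context: Context): Unit = {
    val conf = Main.loadConfFromContext(context)
    logger.info("Replicating to regions: %s".format(Main.readConfRegions(conf)))

    val clients = Main.buildClientsFromConf(conf)

    // Split the batch into records to replicate and records to skip
    val (records, skipped) = event.getRecords.asScala.toList.partition(Main.filterReplicatedUpdate)
    logger.info("Skipping %s records: %s".format(
      skipped.length, for (r <- skipped) yield (r.getEventSourceARN, r.getDynamodb.getKeys)))
    logger.info("Replicating %s records: %s".format(
      records.length, for (r <- records) yield (r.getEventSourceARN, r.getDynamodb.getKeys)))

    // Replicate the remaining records in parallel
    records.par.map(Main.replicate(_, clients))
  }

}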

Page 41: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Experience—latency

[Diagram: latency scale from Slow to Fast, listing request paths in this order:]

• Outside us-east-1, outside home region

• Outside us-east-1, inside home region

• Inside us-east-1, outside home region

• Inside us-east-1, inside home region

Latency from us-east-1: ~50 ms to eu-west-1, ~150 ms to ap-northeast-1

Page 42: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Experience—reliability

• ~1 year in production

• CloudWatch alarms on throttles, errors

• ~0 pager alerts

Page 43: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Next steps

Page 44: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

circuit: open

us-east-1

Multimaster—reliability

[Diagram labels, in extraction order: us-east-1 OpenID server; eu-west-1; us-east-1; circuit: open; eu-west-1; circuit: closed; ap-northeast-1; fallback; fallback]
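The "circuit: open / closed" and "fallback" labels suggest a circuit-breaker pattern, where calls to an unhealthy region are skipped in favor of another region. The following is a minimal sketch of that general pattern, not the speakers' implementation: the `Breaker` class, its failure threshold, and the manual `reset` are all assumptions.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical, minimal circuit breaker. Not the speakers' implementation.
object CircuitBreakerSketch {
  final class Breaker(maxFailures: Int) {
    private var failures = 0

    // "Open" means calls to the primary are skipped
    def isOpen: Boolean = failures >= maxFailures

    def call[A](primary: () => A, fallback: () => A): A = {
      if (isOpen) {
        fallback()
      } else {
        Try(primary()) match {
          case Success(value) =>
            failures = 0 // a success resets the consecutive-failure count
            value
          case Failure(_) =>
            failures += 1
            fallback()
        }
      }
    }

    // Close the circuit again, for example after a cool-down or health check
    def reset(): Unit = failures = 0
  }
}
```

A production breaker would normally also re-test the primary after a timeout (a "half-open" state); that is left out here to keep the sketch short.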

Page 45: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Multimaster—latency

[Diagram: latency scale from Slow to Fast, annotated "SQUISH", listing request paths in this order:]

• Outside us-east-1, outside home region

• Outside us-east-1, inside home region

• Inside us-east-1, outside home region

• Inside us-east-1, inside home region

Better non-PII data locality

Latency from us-east-1: ~50 ms to eu-west-1, ~150 ms to ap-northeast-1

Page 46: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Multimaster—write ordering

Extra fields:

1. Timestamp

2. Write ID

3. Replication flag

userId        42
email, etc.   [email protected]
timestamp     1476106431728
writeId       5c0fb0d3-c1fe-4526-a2cf-0678880952f9
replicateMe   true

Page 47: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Lambda

DynamoDB

Application

Multimaster—write ordering

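[Flow diagram, reconstructed from the slide labels:]

1. Application: write to DynamoDB

2. DynamoDB: stream shard updated

3. Lambda: poll DynamoDB Stream event source

4. if (replicateMe): replicate; otherwise done

5. if (writeConditionFailed): write to Amazon S3

6. Done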
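The flow on this slide can be sketched as a single function with the DynamoDB write and the S3 archive injected as plain functions. This is an illustration under assumptions: `ReplicationFlowSketch` and its names are hypothetical, no AWS SDK calls are made, and clearing `replicateMe` on the replicated copy is inferred from the later diagrams rather than stated on this slide.

```scala
// Hypothetical sketch of the Lambda-side flow. The writer and archiver are
// injected stand-ins, not AWS SDK calls.
object ReplicationFlowSketch {
  final case class Item(userId: Long, timestamp: Long, writeId: String, replicateMe: Boolean)

  sealed trait Outcome
  case object NotReplicated extends Outcome
  final case class Replicated(applied: List[String], archived: List[String]) extends Outcome

  def handle(
      item: Item,
      targetRegions: List[String],
      conditionalWrite: (String, Item) => Boolean, // true when the write condition held
      archive: (String, Item) => Unit              // for example, a write to Amazon S3
  ): Outcome = {
    if (!item.replicateMe) {
      // This update is itself a replica: stop here to avoid replication loops
      NotReplicated
    } else {
      // Assumption: replicas are written with the flag cleared
      val copy = item.copy(replicateMe = false)
      val (applied, rejected) =
        targetRegions.partition(region => conditionalWrite(region, copy))
      // Writes that lost the conditional check are kept for later inspection
      rejected.foreach(region => archive(region, copy))
      Replicated(applied, rejected)
    }
  }
}
```

Archiving rejected writes rather than dropping them keeps the losing version available if someone later needs to audit or repair a conflict.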

Page 48: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Multimaster—write ordering

// Write condition expression

(:timestamp > timestamp)

OR (:timestamp = timestamp

AND :writeId > writeId)
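The condition expression above reads as "the incoming write wins if its timestamp is newer, or if the timestamps tie and its write ID is greater". A minimal sketch of the same comparison in plain Scala follows; `WriteConditionSketch`, `Version`, and `shouldApply` are hypothetical names, and write IDs are assumed to be ASCII strings such as UUIDs so that string comparison is unambiguous.

```scala
// Hypothetical restatement of the write condition as a pure function.
object WriteConditionSketch {
  final case class Version(timestamp: Long, writeId: String)

  // The incoming write is applied on a newer timestamp, or on a greater
  // writeId when the timestamps are equal.
  def shouldApply(incoming: Version, stored: Version): Boolean =
    incoming.timestamp > stored.timestamp ||
      (incoming.timestamp == stored.timestamp && incoming.writeId > stored.writeId)
}
```

Because an identical version never satisfies the condition, replaying the same stream record twice leaves the stored item unchanged.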

Page 49: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

[Diagram: one write with ts=1; item copies labelled r=t, r=f, r=f across us-east-1, eu-west-1, and ap-northeast-1 (ts = timestamp, r = replicateMe)]

Multimaster—write ordering

Page 50: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Multimaster—write ordering

[Diagram: item versions ts=1 (r=t), ts=2 (r=t), ts=3 (r=t), ts=3 (r=f), ts=3 (r=f) shown across us-east-1, eu-west-1, and ap-northeast-1]

Page 51: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Multimaster—write ordering

[Diagram: item versions ts=1,wid=a (r=t); ts=1,wid=b (r=t); ts=1,wid=a (r=t); ts=1,wid=b (r=t); ts=1,wid=b (r=t) shown across us-east-1, eu-west-1, and ap-northeast-1]
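The write-ordering diagrams show regions ending up with the same version even when two writes carry the same timestamp. The sketch below checks that property for a toy model in which every region applies the same "timestamp, then write ID" rule; `ConvergenceSketch` and its names are hypothetical, and the tuple ordering is just a compact way to state that rule.

```scala
// Hypothetical simulation: each region keeps only the newest write it has seen.
object ConvergenceSketch {
  final case class Write(timestamp: Long, writeId: String, value: String)

  private def key(w: Write): (Long, String) = (w.timestamp, w.writeId)

  // Newer means a greater (timestamp, writeId) pair, compared left to right
  def isNewer(a: Write, b: Write): Boolean =
    Ordering[(Long, String)].gt(key(a), key(b))

  // Apply an incoming write only if it is newer than what is stored
  def applyWrite(stored: Option[Write], incoming: Write): Option[Write] =
    stored match {
      case Some(current) if !isNewer(incoming, current) => stored
      case _                                            => Some(incoming)
    }

  // Final state of one region after receiving the writes in the given order
  def replay(writes: Seq[Write]): Option[Write] =
    writes.foldLeft(Option.empty[Write])(applyWrite)
}
```

Replaying the same set of writes in every possible arrival order yields a single final state, which is what lets regions converge without coordinating. Note that the losing concurrent write is discarded, the limitation the "What if we started over?" slides address.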

Page 52: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

What if we started over?

Page 53: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Concurrent writes will happen!

The question is not how to work around or avoid them.

The question is how to recognize and resolve them.

Page 54: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Document schema

Concurrent writes require storage for multiple versions of your data.

Either formally as a CRDT data structure or ad hoc for eventual conflict resolution by a person or process.
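One ad hoc way to store multiple versions is to keep every unresolved version as a "sibling" and merge replicas by set union, which behaves like a grow-only set CRDT. The sketch below is an assumption-laden illustration, not a schema from the talk: `SiblingsSketch`, `Version`, `Document`, and the sample emails are hypothetical.

```scala
// Hypothetical document schema that keeps concurrent versions side by side.
object SiblingsSketch {
  final case class Version(writeId: String, email: String)

  // All versions not yet resolved; more than one means a conflict is pending
  final case class Document(userId: Long, siblings: Set[Version]) {

    def hasConflict: Boolean = siblings.size > 1

    // Merging two replicas keeps every version either side has seen.
    // Set union is commutative, associative, and idempotent.
    def merge(other: Document): Document =
      Document(userId, siblings ++ other.siblings)

    // Resolution by a person or process picks the surviving version
    def resolve(pick: Set[Version] => Version): Document =
      Document(userId, Set(pick(siblings)))
  }
}
```

A plain union cannot tell a resolved replica from a stale one, so merging after a resolution brings the old siblings back. Tracking causality, as the dotted version vectors slide suggests, is what removes that limitation.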

Page 55: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Dotted version vectors

Thank you:

• basho: http://basho.com

• Russell Brown: https://github.com/russelldb

• Nuno Preguiça

• Carlos Baquero

• Paulo Almeida

• Victor Fonte

• Ricardo Gonçalves

Efficient Causality Tracking in Distributed Storage Systems With Dotted Version Vectors.
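As background for the paper cited above, the sketch below shows plain version vectors, the simpler structure that dotted version vectors refine. It is not an implementation of dotted version vectors, and `VersionVectorSketch` and its names are hypothetical. It shows how two versions can be recognized as concurrent instead of being forced into an order by timestamp.

```scala
// Hypothetical sketch of plain (not dotted) version vectors.
object VersionVectorSketch {
  // region -> number of writes seen from that region
  type VersionVector = Map[String, Long]

  sealed trait Relation
  case object Equal extends Relation
  case object Before extends Relation
  case object After extends Relation
  case object Concurrent extends Relation

  // Record one more write made in the given region
  def increment(vv: VersionVector, region: String): VersionVector =
    vv.updated(region, vv.getOrElse(region, 0L) + 1L)

  def compare(a: VersionVector, b: VersionVector): Relation = {
    val regions = a.keySet ++ b.keySet
    val aAhead = regions.exists(r => a.getOrElse(r, 0L) > b.getOrElse(r, 0L))
    val bAhead = regions.exists(r => b.getOrElse(r, 0L) > a.getOrElse(r, 0L))
    (aAhead, bAhead) match {
      case (false, false) => Equal
      case (true, false)  => After
      case (false, true)  => Before
      case (true, true)   => Concurrent // neither has seen the other's write
    }
  }

  // Pointwise maximum: describes everything either side has seen
  def merge(a: VersionVector, b: VersionVector): VersionVector =
    (a.keySet ++ b.keySet)
      .map(r => r -> math.max(a.getOrElse(r, 0L), b.getOrElse(r, 0L)))
      .toMap
}
```

When `compare` returns `Concurrent`, both versions would be kept as siblings and the merged vector attached to whatever version eventually resolves them.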

Page 56: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Thank you!

Carl Youngblood

[email protected]

Page 57: AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DAT201)

Remember to complete your evaluations!