AWS Summit Berlin 2013 - Tadaa - HD Camera and Photo Community

Post on 06-May-2015

description

tadaa is the camera app replacing your DSLR, coupled with a massive community of pro and amateur photographers. Let's talk about cost-efficient living, scaling and growing in the cloud. Speaker: Friedemann Wachsmuth, CTO, menschmaschine Publishing GmbH

Transcript of AWS Summit Berlin 2013 - Tadaa - HD Camera and Photo Community

Scaling tadaa

Photo Broadcasting in the Cloud

What is tadaa?

•HD Camera & Photo Editor for iOS

•Photosharing Community for Prosumers

•3 Million App Users

tadaa is a Realtime Application...

•Constant Relationship Changes

•Reactions, Interactions & Notifications

•Fan-outs, Deletions

•Sync Notification Counts

...with a Lot of Data Underneath

•250M Contact Hashes

•Hundreds of Millions of Images on S3

•~2500 Messages / Second

Architecture

•100% Hosted on AWS

•Mostly Using AWS Services: EC2, EMR, CloudFront, S3, DynamoDB, ElastiCache, CloudWatch, IAM...

Apple Push Notifications Sent

Load can be Bursty

•If a Single User has 10k Followers

➡ One Photo can cause 10k Push Notifs!

•With 25% Open Rate, they create

➡ 2500 API calls within a few Seconds

•leading to Thousands of Likes and Reactions... Within a few Seconds.
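
The burst arithmetic on this slide can be written out as a short sketch. The function name `burst_estimate` is made up for illustration; the inputs are the slide's own figures (10k followers, 25% open rate).

```python
def burst_estimate(followers, open_rate):
    """Return (push notifications sent, API calls caused by opened notifications)."""
    pushes = followers                      # one push notification per follower
    api_calls = round(pushes * open_rate)   # every opened notification hits the API
    return pushes, api_calls


pushes, api_calls = burst_estimate(followers=10_000, open_rate=0.25)
print(f"{pushes} pushes -> {api_calls} API calls within a few seconds")
```
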

When Instagram Changed T&Cs...

ALL YOUR BASE ARE BELONG TO US

Using CouchDB...

Pros:

•Simple REST API

•Replicates

•Schemaless

•Map/Reduce

•Everything versioned

Cons:

•Sharding is Hard

•JSON = Data Inflation

•Very Disk-I/O Dependent

•Everything versioned

Juggling 8TB Database Files on EBS isn't Fun.

Hello, DynamoDB!

Moving to DynamoDB

•Predictable Performance

•Infinite Table Size

•Full Redundancy

•8TB Became 150GB!

DynamoDB Scales

•We Migrated the Live System with no Downtime within a Few Days

•Query Latency is Really just 2-3ms

•Worst Value ever seen was 8ms

•Now You get Some Free Burst Allowance!

Cost Scales Up, too...

“How Can we Handle Bursts Better?”

Optimizing Access Patterns

•Level Your Reads and Writes

•Vary Hashkeys

•Throttle Non-Realtime Tasks

•Mirror in ElastiCache Where Possible, Persist Lazily

➡ Reduce Required CUs by 75%

Leveling Reads/Writes

•Avoid Bursts and Scans

•Queue Your I/O through a Messaging System

•If You Can, Separate Hot and Cold Data
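
A minimal sketch of what leveling writes through a queue could look like, simulated second by second. The in-memory `deque` stands in for a real messaging system, and `level_writes` and `max_per_second` are illustrative names rather than anything from tadaa's code.

```python
from collections import deque


def level_writes(arrivals, max_per_second):
    """Simulate draining a write queue at a capped rate.

    arrivals[i] is the number of writes that arrive in second i.
    Returns the number of writes actually performed in each second
    until the queue is empty.
    """
    queue = deque()
    performed = []
    second = 0
    while second < len(arrivals) or queue:
        if second < len(arrivals):
            # Incoming writes are only enqueued, never executed directly.
            queue.extend(range(arrivals[second]))
        # The worker takes at most max_per_second items per second.
        batch = min(len(queue), max_per_second)
        for _ in range(batch):
            queue.popleft()
        performed.append(batch)
        second += 1
    return performed


# A burst of 2500 writes in one second, drained at 500 writes per second.
print(level_writes([2500, 0, 0], max_per_second=500))
```

The burst is traded for a short delay: the database sees a flat load that a much lower provisioned throughput can absorb.
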

Vary Your Hashkeys

•Your Table is Partitioned

•# of Partitions Grows with Size of Table and Provisioned CUs

•You don't know the # of Actual Partitions

•You only get your Provisioned CUs when using all Partitions Equally!
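
One common way to vary hash keys is to append a shard suffix to a hot key, so that writes for a single popular item spread over several hash key values. The sketch below assumes a shard count of 10 and a `base#shard` key layout; both are illustrative choices, not tadaa's actual schema.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 10  # assumption: picked for illustration only


def sharded_hash_key(base_key, item_id, num_shards=NUM_SHARDS):
    """Derive a hash key such as 'likes:photo42#7' from the item being written."""
    digest = hashlib.md5(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest, "big") % num_shards
    return f"{base_key}#{shard}"


def all_shard_keys(base_key, num_shards=NUM_SHARDS):
    """Every hash key a reader has to query to see the whole logical item."""
    return [f"{base_key}#{shard}" for shard in range(num_shards)]


# 10,000 likes on one photo now land on ten different hash keys.
writes = Counter(
    sharded_hash_key("likes:photo42", f"user{i}") for i in range(10_000)
)
for key in all_shard_keys("likes:photo42"):
    print(key, writes[key])
```

The price is on the read side: a reader has to query every key from `all_shard_keys` and merge the results.
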

Throttle Non-Realtime Tasks

•Purge old Data through a Throttled Message Queue

•Spool Data with expected Redundancies and drain only the deduplicated Changes (e.g. Facebook LiveAPI)

•Scale your Worker Throughput based on CloudWatch Data
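
The spooling and throttling ideas above can be sketched as follows. `DedupSpool` and `background_budget` are hypothetical names, and the 20% headroom is an assumed value. In a real setup the consumed units would be derived from CloudWatch (for instance the `ConsumedWriteCapacityUnits` metric); here they are passed in as plain numbers.

```python
class DedupSpool:
    """Collects redundant change events and hands out only the latest per key."""

    def __init__(self):
        self._pending = {}   # key -> latest payload
        self.received = 0    # total events seen, including redundant ones

    def add(self, key, payload):
        self.received += 1
        self._pending[key] = payload   # a newer event overwrites the older one

    def drain(self, max_items):
        """Remove and return up to max_items unique changes."""
        batch = []
        while self._pending and len(batch) < max_items:
            key = next(iter(self._pending))
            batch.append((key, self._pending.pop(key)))
        return batch


def background_budget(provisioned_units, consumed_units, headroom_percent=20):
    """Units per second left for non-realtime work after reserving headroom."""
    usable = provisioned_units * (100 - headroom_percent) // 100
    return max(0, usable - consumed_units)


spool = DedupSpool()
spool.add("user1", {"name": "old"})
spool.add("user2", {"name": "x"})
spool.add("user1", {"name": "new"})   # redundant update for the same user

budget = background_budget(provisioned_units=1000, consumed_units=600)
changes = spool.drain(max_items=budget)
print(spool.received, "events ->", len(changes), "writes, budget", budget)
```

When realtime traffic already uses the table's capacity, `background_budget` returns 0 and the worker simply waits.
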

Use ElastiCache for hot Data

•Write hot Data to ElastiCache and your DAL

•Alter Hot Data in the Cache (e.g. Like-Counts)

•Benefit e.g. from Increments

•Read latest Data from Cache once you persist to DynamoDB
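
A rough sketch of keeping like counts in the cache and persisting them lazily. Plain dictionaries stand in for ElastiCache and the DynamoDB table, and `LikeCounter` is a hypothetical name. A real memcached or Redis increment is atomic on the server, which this stand-in is not.

```python
class LikeCounter:
    """Write-behind counter: increments hit the cache, the store is updated lazily."""

    def __init__(self, cache, store):
        self.cache = cache    # stand-in for ElastiCache
        self.store = store    # stand-in for the DynamoDB table
        self.dirty = set()    # photo ids changed since the last flush

    def like(self, photo_id):
        if photo_id not in self.cache:
            # Cache miss: seed the counter from the persistent store.
            self.cache[photo_id] = self.store.get(photo_id, 0)
        self.cache[photo_id] += 1
        self.dirty.add(photo_id)
        return self.cache[photo_id]

    def read(self, photo_id):
        """The cache holds the latest value; fall back to the store on a miss."""
        if photo_id in self.cache:
            return self.cache[photo_id]
        return self.store.get(photo_id, 0)

    def flush(self):
        """Persist every changed counter with one write each; return the write count."""
        writes = 0
        for photo_id in sorted(self.dirty):
            self.store[photo_id] = self.cache[photo_id]
            writes += 1
        self.dirty.clear()
        return writes


store = {"photo42": 5}
counter = LikeCounter(cache={}, store=store)
for _ in range(3):
    counter.like("photo42")

print(counter.read("photo42"), store["photo42"])   # cache is ahead of the store
writes_done = counter.flush()
print(writes_done, store["photo42"])               # one write persists three likes
```
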

It is all well worth it.

Thank You!

friedemann@tadaa.net

@peaceman