© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
November 12, 2014 | Las Vegas
BDT312
Using the Cloud to Scale
from a Database to a Data Platform
Ryan Horn, Lead Software Engineer at Twilio
Hi, I’m Ryan, Tech Lead of the User Data team at Twilio
What is Twilio?
We provide a communications API that enables phones, VoIP, and messaging to be embedded into web, desktop, and mobile software.
How Does it Work?
1. A user calls your number
2. Twilio receives the call
3. Your app responds
What is the User Data Team?
• We scale Twilio's backend database infrastructure
• We build customer facing data APIs
• We manage data policies and data security at rest
Databases at Twilio
Calls and Messages are Stateful
Queued
Ringing
In Progress
Completed
Queued
Sending
Sent
Delivered
In the Beginning…
All data was placed in the same physical database regardless of where the call or message was in its lifecycle.
The Monolithic Database Model
Diagram: API, Web, Billing, and the Call/Message Service (which talks to the Carriers) all read and write a single MySQL database.
Problems at Scale
• Many consumers of data
• Data with different performance characteristics
• Failure in the database degrades many services
• Horizontal scaling and orchestration are complicated
Moving to a Service-Oriented Architecture
What is a Service-Oriented Architecture?
An architecture in which required system behavior
is decomposed into discrete units of functionality,
implemented as individual services for applications
to compose and consume.
Communicate Through Interfaces, Not Databases
Diagram: API, Web, and Billing talk to an In Flight Service and a Post Flight Service. The Call/Message Service (which talks to the Carriers) and the In Flight Service share the In Flight MySQL database; the Post Flight Service owns the Post Flight MySQL database.
Database Can Change Without Changing Every Service
Diagram: the same architecture, but the Post Flight Service's backing store is now Amazon DynamoDB instead of MySQL. API, Web, Billing, and the Call/Message Service are unchanged.
SOA Doesn’t Solve Everything
No matter how many services you put in front of MySQL, it’s still a single point of failure.
Sharding MySQL
Implementing Sharding (the easy part)
1. Choose a partitioning scheme
2. Implement routing logic
3. Send application queries through router
4. Go!
Sharding at Twilio
Diagram: the Application sends queries to a Router, which maps key ranges to shards (0-3 → Shard0, 3-6 → Shard1, 6-9 → Shard2).
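The routing step above can be sketched as a minimal range-based router. This is a hypothetical illustration, not Twilio's actual implementation; the key ranges mirror the diagram and are treated as half-open intervals (0-3, 3-6, 6-9):

```python
from bisect import bisect_right

class ShardRouter:
    """Maps integer keys to shards via half-open key ranges."""

    def __init__(self, boundaries, shards):
        # boundaries[i] is the exclusive upper bound of shards[i]'s range
        assert len(boundaries) == len(shards)
        self.boundaries = boundaries
        self.shards = shards

    def route(self, key):
        # bisect_right finds the first range whose upper bound exceeds key
        idx = bisect_right(self.boundaries, key)
        if idx >= len(self.shards):
            raise KeyError(f"key {key} outside all shard ranges")
        return self.shards[idx]

# Ranges from the diagram: [0, 3) -> shard0, [3, 6) -> shard1, [6, 9) -> shard2
router = ShardRouter([3, 6, 9], ["shard0", "shard1", "shard2"])
```

The application never talks to a shard directly; all queries pass through `route()`, so the partitioning scheme can change without touching query sites.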
Rolling it Out With Zero Downtime (the hard part)
• We provide a 24/7, always-on service
• Communications are intolerant of inconsistency and latency
• There is no maintenance window
Bringing Up a New Shard
Diagram: the Application routes the full key range 0-9 to Master1 (with Slave1). A new pair, Master2 and Slave2, is brought up replicating from Master1.
Split Odds and Evens for Writes
Diagram: still serving the full range 0-9, the Application sends odd-keyed writes to one master and even-keyed writes to the other, with replication keeping both pairs in sync.
Update Routing
Diagram: routing is updated so Master1 serves keys 0-4 and Master2 serves keys 5-9, while the odd/even write split and replication remain in place.
Cut Slave Link
Diagram: with routing settled at 0-4 → Master1 and 5-9 → Master2, the replication link between the two master/slave pairs is cut, leaving two independent shards.
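The rollout above can be summarized as a sequence of routing-table states. This is a simplified illustration of the steps in the slides; shard names and the table representation are assumed:

```python
def route(routing, key):
    """Return the master serving `key`, given a routing table that maps
    inclusive (lo, hi) key ranges to masters."""
    for (lo, hi), master in routing.items():
        if lo <= key <= hi:
            return master
    raise KeyError(key)

# Steps 1-2: one logical range; Master2 replicates from Master1, and
# writes are split odd/even so both masters stay consistent.
before = {(0, 9): "master1"}

# Step 3: routing is updated so each master owns half the range.
after = {(0, 4): "master1", (5, 9): "master2"}

# Step 4: the slave link is cut and `after` becomes permanent.
```

Because replication kept both masters consistent before the routing change, the switch from `before` to `after` is invisible to readers and writers, giving zero downtime.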
New Solutions, New Problems
A Necessary Burden
In the beginning, the burden of managing our own databases was non-negotiable.
The Landscape Has Changed
We now have a variety of managed database services which solve these problems for us, such as Amazon RDS, Amazon DynamoDB, Amazon SimpleDB, and Amazon Redshift.
Cost Is Never Optimized
Application developers do not (and should not) optimize for database cost.
Self-Managed Databases Are Costly
Pie chart: Databases 78%, Everything Else 22%. Source: Twilio Data Usage
Keeping Up With Growth
As growth continues to accelerate, we need to somehow keep up.
A Change in Approach
• Change our hiring practices and bring in specialists
• Remove the context switching
Focusing on What We Do Well
Adopting Amazon DynamoDB
Thinking in Terms of Throughput
Amazon DynamoDB allows us to scale in terms of throughput, not machines. This is the future of resource provisioning.
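"Thinking in throughput" means sizing tables in DynamoDB's published capacity units rather than in machines: one write unit covers one 1 KB write per second, and one read unit covers one strongly consistent 4 KB read per second (eventually consistent reads cost half). A back-of-the-envelope sketch, with made-up workload numbers rather than Twilio figures:

```python
import math

def write_units(item_kb, writes_per_sec):
    """Write capacity units: items are billed in 1 KB increments."""
    return math.ceil(item_kb) * writes_per_sec

def read_units(item_kb, reads_per_sec, consistent=True):
    """Read capacity units: items are billed in 4 KB increments;
    eventually consistent reads cost half as much."""
    units = math.ceil(item_kb / 4) * reads_per_sec
    return units if consistent else math.ceil(units / 2)

# Example workload: 1.5 KB events, 100 writes/sec, 50 reads/sec
wcu = write_units(1.5, 100)   # 2 units per write x 100/sec = 200
rcu = read_units(6, 50)       # 2 units per read x 50/sec = 100
```

Provisioning then becomes a matter of dialing these numbers up or down, with no cluster to resize.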
Operations
Management and scaling of our cluster is fully abstracted away from us.
Cost Compared to MySQL
Pie chart: MySQL 82%, Amazon DynamoDB 18%. Source: Twilio Data Usage
Cost with MySQL Fully Replaced
Pie chart: Databases 39%, Everything Else 61%. Source: Twilio Data Usage
A Relational Model with Amazon DynamoDB
Many of our services allow for querying data in a way
that maps naturally to a relational database.
GET /Accounts/2/Events
SELECT * FROM events ORDER BY Date DESC;
SELECT * FROM events WHERE IpAddress='5.6.7.8' ORDER BY Date DESC;
SELECT * FROM events WHERE IpAddress='5.6.7.8' AND Date<='2014-10-03' ORDER BY Date DESC;
GET /Accounts/2/Events?IpAddress=5.6.7.8&Date<=2014-10-03
AccountId (Hash) Date (Range) IpAddress_Date Type
2 2014-10-03 5.6.7.8|2014-10-03 call
2 2014-10-01 5.6.7.8|2014-10-01 message
GET /Accounts/2/Events
AccountId=2, ScanIndexForward=false
AccountId (Hash)  IpAddress_Date (Range)  Date        Type
2                 5.6.7.8|2014-10-03      2014-10-03  call
2                 5.6.7.8|2014-10-01      2014-10-01  message
GET /Accounts/2/Events?IpAddress=5.6.7.8
AccountId=2, IpAddress_Date begins with “5.6.7.8|”, ScanIndexForward=false
AccountId (Hash)  IpAddress_Date (Range)  Date        Type
2                 5.6.7.8|2014-10-03      2014-10-03  call
2                 5.6.7.8|2014-10-01      2014-10-01  message
GET /Accounts/2/Events?IpAddress=5.6.7.8&Date<=2014-10-03
AccountId=2, IpAddress_Date LT “5.6.7.8|2014-10-03”, ScanIndexForward=false
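The composite range key pattern above can be sketched in pure Python, simulating the query conditions in memory rather than calling DynamoDB. Field names follow the slides; the `query` helper and its parameters are illustrative assumptions:

```python
def range_key(ip, date):
    """Build the IpAddress_Date composite range key: because ISO dates
    sort lexicographically, one key answers both 'by IP' and
    'by IP and date' queries."""
    return f"{ip}|{date}"

def query(items, account_id, begins_with=None, le=None):
    """Mimic a DynamoDB Query on (AccountId hash, IpAddress_Date range),
    newest first (i.e. ScanIndexForward=false)."""
    rows = [i for i in items if i["AccountId"] == account_id]
    if begins_with is not None:  # fix the IP prefix
        rows = [i for i in rows if i["IpAddress_Date"].startswith(begins_with)]
    if le is not None:           # upper-bound the date within that IP
        rows = [i for i in rows if i["IpAddress_Date"] <= le]
    return sorted(rows, key=lambda i: i["IpAddress_Date"], reverse=True)

events = [
    {"AccountId": 2, "IpAddress_Date": range_key("5.6.7.8", "2014-10-03"), "Type": "call"},
    {"AccountId": 2, "IpAddress_Date": range_key("5.6.7.8", "2014-10-01"), "Type": "message"},
]

# GET /Accounts/2/Events?IpAddress=5.6.7.8
by_ip = query(events, 2, begins_with="5.6.7.8|")

# GET /Accounts/2/Events?IpAddress=5.6.7.8&Date<=2014-10-03
by_ip_date = query(events, 2, begins_with="5.6.7.8|",
                   le=range_key("5.6.7.8", "2014-10-03"))
```

The key insight is that concatenating the two attributes turns a multi-column relational predicate into a single range-key condition DynamoDB can evaluate natively.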
Need to Handle Exceeded Throughput Failures
Exceeding provisioned throughput is a runtime error.
Handling Exceeded Write Throughput with Amazon SQS
Queuing events to Amazon SQS and processing them asynchronously allows us to gracefully deal with write throughput errors.
Diagram: API, Web, and Billing publish events to Amazon SQS; an Events Processor consumes the queue and writes into Amazon DynamoDB.
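The buffering pattern above can be sketched with stubs standing in for the AWS clients. This is a hypothetical illustration: the exception class mimics DynamoDB's provisioned-throughput error, and the in-memory deque stands in for SQS:

```python
from collections import deque

class ThroughputExceeded(Exception):
    """Stand-in for DynamoDB's ProvisionedThroughputExceededException."""

def process_one(queue, write):
    """Drain one event from the queue into the datastore. On a
    throughput error, re-queue the event so a later attempt (typically
    after backoff) retries it instead of dropping it or failing the
    original API caller."""
    if not queue:
        return False
    event = queue.popleft()
    try:
        write(event)
    except ThroughputExceeded:
        queue.append(event)  # retry later; the write is not lost
    return True
```

Because producers only ever enqueue, a burst that exceeds provisioned write throughput shows up as queue depth and added latency rather than as errors surfaced to API, Web, or Billing.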
Maximum of 5 Global and 5 Local Indexes
You can manage your own indexes, but your
application must then handle partial mutation failures.
Local Index Size Limits
Local secondary indexes provide immediate consistency… and limit the data set for a given hash key to 10 GB.
Data Warehouse
Brief History
2008 - 2011
All business intelligence queries ran on replicas of MySQL clusters serving production traffic.
Brief History
2011 - 2013
Data pushed to Amazon S3 and queried with Pig on Amazon EMR, improving our ability to aggregate, but with high latency.
Brief History
2013 - Present
Moving to Amazon Redshift cut the time these reports took from hours to seconds, allowing us to answer critical BI and financial questions in near real time.
Pushing Data Into Amazon Redshift
Diagram: the Post Flight Service publishes to Kafka; an S3 Loader (with Amazon SQS as a dead-letter queue) writes batches to Amazon S3; a Warehouse Loader then loads them from S3 into Amazon Redshift.
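The final hop in the pipeline above is typically a Redshift COPY from S3. A minimal sketch of how a warehouse loader might render that statement; the table, bucket, and role names are placeholders, and the JSON/GZIP options are an assumption about the batch format:

```python
def build_copy_sql(table, s3_prefix, iam_role):
    """Render a Redshift COPY statement for a batch of gzipped JSON
    files under an S3 prefix. In practice the loader would execute
    this over a database connection; here we only build the string."""
    return (
        f"COPY {table} "
        f"FROM 's3://{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "JSON 'auto' GZIP;"
    )

sql = build_copy_sql(
    "events",
    "example-warehouse/events/2014-10-03/",
    "arn:aws:iam::123456789012:role/warehouse-loader",
)
```

Bulk-loading whole S3 prefixes per COPY (rather than row-by-row inserts) is what lets Redshift ingest and aggregate at warehouse speeds.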
Wrapping Up
Managed Services as a Culture
Our focus on creating an experience that unifies and simplifies communications is reflected in our adoption of managed services.
Managed Services as a Culture
Understanding and focusing on our areas of expertise, and leveraging managed services for the rest, accelerates the delivery of value and innovation to our customers.
Thank You!
http://bit.ly/awsevals