Post on 16-Apr-2017
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Raju Gulabani, VP Database Services, AWS
November 2016
DAT320
AWS Database ServicesState of the Union
What to Expect from the Session
• Learn our strategy and overview of our key services
• Get a sense of our scale and key customers per service
• Understand when to use which services for your apps
Strategy
• Start from the customer and work backwards
• Offer managed services
• Leverage the cloud architecture
• Support migration of apps and data from/to on-premises
• Multiple services, each optimized for different use case
Comprehensive Product Portfolio
Traditional Apps
Relational Databases
NoSQL & In-MemoryBig
Data
RDS
Aurora
Database Migration Service
Relational Databases
DynamoDB
ElastiCache
NoSQL & In-Memory
Amazon Redshift
EMR
Data Pipeline
Athena
Big Data
QuickSight
Elasticsearch
Amazon ML
Analytics
Database Services Usage
• Amazon Aurora is the fastest growing service in AWS history
• More than 14,000 databases have been migrated using AWS
Database Migration Service
• DynamoDB served over 56 billion extra requests worldwide on
Prime Day compared to the same day the previous week.
Select DB Services Customers
Relational Databases
Amazon RDS
Amazon Aurora
Database Migration Service
• Multi-engine support: Aurora, MySQL, MariaDB, PostgreSQL,
Oracle, SQL Server
• Automated provisioning, patching, scaling, backup/restore,
failover
• Use with GP2 or Provisioned IOPS storage
• High availability with RDS Multi-AZ
– 99.95% SLA for Multi-AZ deployments
Amazon RDS
Amazon Aurora
Key Insight: Relational Databases are Complex
• Our experience running Amazon.com taught us that
relational databases can be a pain to manage and
operate with high availability
• Poorly-managed relational databases are a leading
cause of lost sleep and downtime in the IT world!
• Lower TCO because we manage the muck
• Get more leverage from your teams
• Focus on the things that differentiate you
• Built-in high availability and cross region replication across multiple data centers
• Available on all engines, including base/standard editions, not just for enterprise editions
• Now even a small startup can leverage multiple data centers to design highly available apps with over 99.95% availability.
We Made Things Cheaper, Easier, and Better
Enterprise-grade fault tolerance
solution for production
databases
Automatic failover
Synchronous replication
Inexpensive & enabled with one click
High Availability Multi-AZ Deployments
Amazon RDS Customers
• Airbnb moved its main MySQL database to Amazon RDS
with only 15 minutes of downtime
• RDS simplifies much of the time-consuming administrative
tasks associated with databases so engineers can spend
more time on features
• Uses asynchronous master-slave replication to improve
website performance launched via the RDS console or an
API call
• Leverages multi-Availability Zone (Multi-AZ) for high
availability
Airbnb – Amazon RDS for MySQL
Reinventing the Relational Database
Key Questions We Asked
• What if we started from a clean sheet of paper with only constraint being that the
database was a relational database?
• Could we offer much better performance by leveraging the massive scale of our
cloud?
• Could we give you a database with designed durability indistinguishable from 100%
and availability of 99.99%?
• …And could we be better and cheaper than the 30-year old commercial databases in
use today?
Yes, We Can. Answer = Amazon Aurora
• A new relational database engine, built from the ground
up to leverage AWS• For all new apps that require SQL, we recommend Amazon Aurora
• Commercial-grade performance and availability at open
source prices
• Retains compatibility with MySQL 5.6
Amazon RDS for Aurora
• MySQL compatible with up to 5x better performance on the
same hardware: 100,000 writes/sec & 500,000 reads/sec
• Scalable with up to 64 TB in single database, up to 15 read
replicas
• Highly available, durable, and fault-tolerant custom SSD
storage layer: 6-way replicated across 3 Availability Zones
• Transparent encryption for data at rest using AWS KMS
• Stored procedures in Amazon Aurora can invoke AWS
Lambda functions
Fastest growing service
in AWS history
Amazon Aurora Customers
Use case: Near real-time analytics and reporting
MasterRead
Replica
Read
Replica
Read
Replica
Shared distributed storage volume
Reader end-point
A customer in the travel industry migrated to Aurora for
their core reporting application accessed by ~1,000
internal users.
Replicas can be created, deleted and scaled within
minutes based on load.
Read-only queries are load balanced across replica
fleet through a DNS endpoint – no application
configuration needed when replicas are added or
removed.
Low replication lag allows mining for fresh data with
no delays, immediately after the data is loaded.
Significant performance gains for core analytics
queries - some of the queries executing in 1/100th
the original time.
► Up to 15 promotable read replicas
► Low replica lag – typically < 10ms
► Reader end-point with load balancing
Amazon Aurora is now PostgreSQL-compatible
• PostgreSQL 9.6 compatibility with support for PostGIS
• All the features you expect from Amazon Aurora
including 15 read replicas with <10ms lag, shared
storage, failover without data loss, 6-way replication
across 3 Availability Zones, encryption with AWS KMS
• Available now in preview
Simplify monitoring from the
AWS Management Console
Database load: Identifies
database bottlenecks
Easy
Powerful
Identifies source of bottlenecks
Top SQL
Adjustable time frame
Hour, day, week, and longer
Max CPU
Performance Insights for Amazon RDS
AWS Database Migration Service
• Fully managed service for migration from on-premises to
the AWS Cloud with minimal downtime
• Migrates data to and from all widely used commercial and
open source DBs
• Schema Conversion Tool that converts source DB schemas,
stored procedures and application code to a different target
format
• Supports homogenous and heterogeneous data replication
• A terabyte-sized DB can be migrated for as little as $3
Database Conversion Capabilities in SCT
Source Database Target Database
Microsoft SQL Server Amazon Aurora, MySQL, PostgreSQL
MySQL PostgreSQL
Oracle Amazon Aurora, MySQL, PostgreSQL
Oracle Data Warehouse Amazon Redshift
PostgreSQL Amazon Aurora, MySQL
Teradata, Netezza, Greenplum Amazon Redshift
AWS Database Migration Service Customers
Heterogeneous Migration
• Oracle private DC to RDS PostgreSQL migration
• Used the AWS Schema Conversion Tool to convert their database schema
• Used on-going replication (CDC) to keep databases in sync until they reached the cutover window
• Benefits:• Improved reliability of the cloud environment
• Savings on Oracle licensing costs
• SCT Assessment Report let them understand the scope of the migration
NoSQL & In MemoryDynamoDB
ElastiCache
Fast, Flexible, Scalable NoSQLAmazon
DynamoDB
History of NoSQL at Amazon
Key Questions We Asked
• Aurora was designed with a single constraint• SQL compatibility and relational database semantics
• What if we said no to this constraint?• No to SQL = NoSQL
• Could we eliminate the things we didn’t like about
relational databases?
Yes, We Can. Answer = Amazon DynamoDB
• Database that can scale beyond a single box without any changes to your app
• You can start small but know that there is no limit to how successful your app can be
• If your app is running fast today with 10 users, it will always run fast, even when you have 1M,
10M or 100M users using your app
• No need to spend time tuning queries and diagnosing why your app is running slow
• Deliver availability and durability indistinguishable from 100%.
• 99.99% and 60 second failover are not good enough
• You don’t have to manage anything. You don’t even need to know what a database instance is
• No schema. All you need to tell us is the number of reads/sec and writes/sec you want to execute.
We do the rest
Amazon DynamoDB Customers
Lyft Easily Scales Up its Ride Location Tracking System using DynamoDB
It was so simple to scale out.
We had two knobs. One was
for reads and one was for
writes.
Chris Lambert
CTO, Lyft
”
“ • Lyft serves up to 8x more rides during
peak times
• The GPS location for all rides was
tracked in the ride location tracking
system.
• In June, 2014, Lyft deployed DynamoDB
in production.
• Lyft has since moved many of its other
data stores over to DynamoDB as well.
In-memory cache
Memcached or Redis
Fully managed; zero admin
Amazon
ElastiCache
Key ElastiCache Features
• Fully managed
• Cache node auto-discovery
• Multi-AZ node
placement
• Fully managed
• Persistence
• Read replicas
• Multi-AZ with
auto-failover
• Redis cluster
Gaming AdTech Media Mobile Other
Amazon ElastiCache Customers
RDS and ElastiCache are Behind Grab’s Taxi-Booking App
The latency of a cab call must be low,
and remain low even in times of peak
traffic of hundreds of thousands of
cab requests per minute. We use
ElastiCache for Redis in front of RDS
MySQL to keep our systems’ real
time performance at any scale.
Ryan Ooi
Sr. Devops Engineer, Grab
”
“ • Grab is a popular taxi hailing app in southeast Asia.
• Average response time of the API layer is <40ms, mandating an
in-memory layer to achieve such performance.
• A small devops team that tried running Redis on EC2 before, but
that was too much work. Using both RDS and ElastiCache in
Multi-AZ allowed them to outsource all the management to AWS.
Big Data
Amazon Redshift
Amazon EMR
Amazon Athena
Data Pipeline
Amazon Redshift
• Petabyte-scale, relational, MPP, data
warehousing
• Fully managed with SSD and HDD platforms
• Built-in end-to-end security, including
customer-managed keys
• $1,000/TB/year; start at $0.25/hour
Why we built Amazon Redshift
• Customers were generating data in the cloud but moving
it on-premises to analyze it using a data warehouse
• Customers had migrated everything to AWS except their
on-premises data warehouses. • They wanted to shut down these data centers but could not till we offered
them a solution in the cloud
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Available for analysis
Generated data
1990 2000 2010 2020
Key Insight: Most Data Falls on the Floor
90% of the data in a company
is never analyzed
High costs and complexity of
traditional DW systems make it
hard to justify the capital
expense
Key Questions We Asked
• Could we design a system cheap and scalable enough
to let you analyze all your data?
• Could we build a service that was faster, cheaper, and
easier to use than traditional DW systems?
Yes, We Can. Answer = Amazon Redshift
• A massively parallel processing (MPP) system with up to 128 compute nodes to
store and process up to 2PB of compressed data
• At $1,000/TB/year, its so cheap that you can analyze all your data
• You can provision a petabyte in under three minutes and pay for it by the hour
• 10x performance and 1/10 the price of other solutions
• Fully managed with automated provisioning, patching, securing, backup,
restore, and built-in fault tolerance
Amazon Redshift Customers
NTT Docomo: Japan’s largest mobile provider
• 68 million customers
• 10s of TBs per day of data across
mobile network
• 6PB of total data (uncompressed)
• Data science for marketing
operations, logistics etc.
• Greenplum on-premises
• 125 node DS2.8XL cluster
• 4,500 vCPUs, 30TB RAM
• 6 PB uncompressed data
• 10x faster analytic queries
• 50% reduction in time for
new BI app. deployment
• Significantly less ops.
overhead
Amazon EMR
• Hadoop, Hive, Presto, Spark, Tez, Impala etc.
• Release 5.2: Hadoop 2.7.3, Hive 2.1, Spark 2.02, Zeppelin, Presto, HBase
1.2.3 and HBase on S3, Phoenix, Tez, Flink
• New applications added within 30 days of their open source release
• Fully managed, automatically scaling clusters with support for On-
Demand and Spot pricing
• Support for HDFS and S3 filesystems enabling separated compute and
storage; multiple clusters can run against the same data in S3
• HIPAA-eligible. Support for end-to-end encryption, IAM/VPC, S3 client-
side encryption with customer managed keys and AWS KMS
Why we built Amazon EMR
• Customers wanted to use the latest open source analytic frameworks to analyze and transform their data
• Customers wanted to use technologies like Spark and Presto in conjunction with AWS services like Amazon S3 and features like EC2 Spot Instances
• Customers wanted to benefit from the elasticity that AWS offers
Amazon EMR Customers
Amazon Athena
• Serverless query service for querying data in S3 with no
infrastructure to manage.
• No data loading required; query directly from Amazon S3
• Use standard ANSI SQL queries with support for joins,
JSON, and window functions.
• Support for multiple data formats include text, CSV, TSV,
JSON, Avro, ORC, Parquet
• Pay per query only when you’re running queries; $5/TB
scanned; if you compress your data, your queries cost less
Why we built Amazon Athena
• Customers wanted an easy way to run ad-hoc queries
on data in Amazon S3 with no infrastructure to manage
• Customers wanted a service that could complement their
use of Amazon Redshift and Amazon EMR
• Customers wanted to give this capability to anyone in
their company and only pay per query
Amazon Athena Customers
Analytics QuickSight
Amazon ES
Amazon ML
As a native cloud service, QuickSight
combines the speed, scalability, and
and ease of deployment that our
customers have come to depend on
with the value and cost effectiveness
you expect from AWS.
Amazon QuickSight
Fast, easy to use business analytics service at 1/10th the
cost of traditional BI solutions.
Amazon QuickSight
• Auto-Discover AWS data sources like Amazon Redshift, RDS, and S3
• Connect to third-party sources like Excel, Salesforce, and other
hosted/on-premises databases
• Super-fast performance with SPICE
• Instant visualizations with Autograph
• Securely share and collaborate on analyses, dashboards and stories
• Native iPhone experience and web based access from all other devices
• Governed datasets
• User access controls
• Active Directory Integration
QuickSight providing real-time insights at MLB Advanced Media
QuickSight provides us with
a real-time, 360 degree view
of our business without
being constrained by pre-
built dashboards and
metrics expanding our use
of data to make informed
decisions.
Brandon Sangiovanni
Sr. BI Development Manager
”
“
Distributed search and analytics engine
Managed service using Elasticsearch and Kibana
Fully managed; zero admin
Highly available and reliable
Tightly integrated with other AWS servicesAmazon
Elasticsearch
Service
Amazon Elasticsearch Service Leading Use-Cases
Log Analytics &
Operational Monitoring
• Monitor the performance of your
application, web servers, and
hardware
• Easy to use, yet powerful data
visualization tools to detect
issues in near real-time
• Ability to dig into your logs in an
intuitive, fine-grained way
• Kibana provides fast, easy
visualization
Traditional Search
• Application or website provides search capabilities over diverse documents
• Tasked with making this knowledge base searchable and accessible
• Key search features including text matching, faceting, filtering, fuzzy search, auto complete, and highlighting
• Query API to support application search
Media and Entertainment
Online Services
Technology Other
Amazon Elasticsearch Customers
Case Study: Adobe Developer Platform (Adobe I/O)
Over 200,000 API calls per second peak• destinations, response times, bandwidth
Log data is routed with Amazon Kinesis to Amazon Elasticsearch Service, then
displayed using AES Kibana
Adobe team can easily see traffic patterns and error rates, quickly identifying
anomalies and potential challenges
Amazon
Kinesis
StreamsSpark Streaming
Amazon
Elasticsearch
Service
Data
Sources
1
Which Service Should You Use?
Situation Solution
Existing application
Use your existing engine on RDS• MySQL Amazon Aurora, RDS for MySQL
• PostgreSQL RDS for PostgreSQL
• Oracle, SQL Server RDS for Oracle, RDS for SQL Server
New application• If you can avoid relational features DynamoDB
• If you need relational features Amazon Aurora
Data Warehouse & BI • Amazon Redshift and Amazon QuickSight
Ad hoc analysis of data in S3 • Amazon Athena and Amazon QuickSight
Spark, Hadoop, Hive, HBase • Amazon EMR
Log analytics, operational
monitoring and search• Amazon Elasticsearch Service
Thank you!
Remember to complete
your evaluations!