AWS re:Invent 2016: AWS Database State of the Union (DAT320)

Raju Gulabani, VP Database Services, AWS

November 2016

DAT320

AWS Database Services State of the Union

What to Expect from the Session

• Learn our strategy and overview of our key services

• Get a sense of our scale and key customers per service

• Understand when to use which services for your apps

Strategy

• Start from the customer and work backwards

• Offer managed services

• Leverage the cloud architecture

• Support migration of apps and data from/to on-premises

• Multiple services, each optimized for a different use case

Comprehensive Product Portfolio

Traditional Apps

Relational Databases: Aurora, Database Migration Service

NoSQL & In-Memory: DynamoDB, ElastiCache

Big Data: Amazon Redshift, Data Pipeline, Athena

Analytics: QuickSight, Elasticsearch, Amazon ML

Database Services Usage

• Amazon Aurora is the fastest growing service in AWS history

• More than 14,000 databases have been migrated using AWS

• DynamoDB served over 56 billion extra requests worldwide on Prime Day compared to the same day the previous week.

Select DB Services Customers

Amazon RDS

Amazon Aurora

• Multi-engine support: Aurora, MySQL, MariaDB, PostgreSQL,

Oracle, SQL Server

• Automated provisioning, patching, scaling, backup/restore,

failover

• Use with GP2 or Provisioned IOPS storage

• High availability with RDS Multi-AZ

– 99.95% SLA for Multi-AZ deployments
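To make the 99.95% figure concrete, the availability target can be converted into a downtime budget. This is a minimal arithmetic sketch, not the SLA's actual measurement method; the 30-day month and 365-day year are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(availability_pct: float, period_hours: float) -> float:
    """Return the downtime budget, in minutes, for an availability target."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_hours * 60.0


# Assumed periods: a 30-day month and a 365-day year.
monthly = allowed_downtime_minutes(99.95, 30 * 24)
yearly = allowed_downtime_minutes(99.95, 365 * 24)

print(f"99.95% over 30 days allows about {monthly:.1f} minutes of downtime")
print(f"99.95% over 365 days allows about {yearly / 60:.2f} hours of downtime")
```

Under these assumptions the budget works out to roughly 21.6 minutes per month.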

Amazon RDS

Amazon Aurora

Key Insight: Relational Databases are Complex

• Our experience running Amazon.com taught us that

relational databases can be a pain to manage and

operate with high availability

• Poorly-managed relational databases are a leading

cause of lost sleep and downtime in the IT world!

• Lower TCO because we manage the muck

• Get more leverage from your teams

• Focus on the things that differentiate you

• Built-in high availability and cross region replication across multiple data centers

• Available on all engines, including base/standard editions, not just for enterprise editions

• Now even a small startup can leverage multiple data centers to design highly available apps with over 99.95% availability.

We Made Things Cheaper, Easier, and Better

Enterprise-grade fault tolerance solution for production databases

Automatic failover

Synchronous replication

Inexpensive & enabled with one click

High Availability Multi-AZ Deployments

Amazon RDS Customers

• Airbnb moved its main MySQL database to Amazon RDS

with only 15 minutes of downtime

• RDS simplifies much of the time-consuming administrative

tasks associated with databases so engineers can spend

more time on features

• Uses asynchronous master-slave replication, launched via the RDS console or an API call, to improve website performance

• Leverages multi-Availability Zone (Multi-AZ) for high

availability

Airbnb – Amazon RDS for MySQL

Reinventing the Relational Database

Key Questions We Asked

• What if we started from a clean sheet of paper with the only constraint being that the database was a relational database?

• Could we offer much better performance by leveraging the massive scale of our

cloud?

• Could we give you a database with designed durability indistinguishable from 100%

and availability of 99.99%?

• …And could we be better and cheaper than the 30-year-old commercial databases in use today?

Yes, We Can. Answer = Amazon Aurora

• A new relational database engine, built from the ground up to leverage AWS

• For all new apps that require SQL, we recommend Amazon Aurora

• Commercial-grade performance and availability at open

source prices

• Retains compatibility with MySQL 5.6

Amazon RDS for Aurora

• MySQL compatible with up to 5x better performance on the

same hardware: 100,000 writes/sec & 500,000 reads/sec

• Scalable with up to 64 TB in single database, up to 15 read

replicas

• Highly available, durable, and fault-tolerant custom SSD

storage layer: 6-way replicated across 3 Availability Zones

• Transparent encryption for data at rest using AWS KMS

• Stored procedures in Amazon Aurora can invoke AWS

Lambda functions

Fastest growing service

in AWS history

Amazon Aurora Customers

Use case: Near real-time analytics and reporting

[Diagram: a master and read replicas on a shared distributed storage volume, behind a reader end-point]

A customer in the travel industry migrated to Aurora for

their core reporting application accessed by ~1,000

internal users.

Replicas can be created, deleted and scaled within

minutes based on load.

Read-only queries are load balanced across replica

fleet through a DNS endpoint – no application

configuration needed when replicas are added or

removed.

Low replication lag allows mining for fresh data with

no delays, immediately after the data is loaded.

Significant performance gains for core analytics

queries - some of the queries executing in 1/100th

the original time.

► Up to 15 promotable read replicas

► Low replica lag – typically < 10ms

► Reader end-point with load balancing
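The reader end-point idea can be sketched on the application side as a simple read/write split: read-only statements go to the load-balanced reader end-point, everything else goes to the master. This is an illustrative sketch only. Both host names and the `pick_endpoint` helper are hypothetical, and a real router would also need to handle cases such as `SELECT ... FOR UPDATE` or reads inside a write transaction.

```python
# Hypothetical endpoint names; real values come from your own Aurora cluster.
WRITER_ENDPOINT = "mycluster.cluster-example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-example.us-east-1.rds.amazonaws.com"

# Statement keywords treated as read-only in this sketch (an assumption).
READ_ONLY_KEYWORDS = ("select", "show", "describe", "explain")


def pick_endpoint(sql: str) -> str:
    """Route read-only statements to the reader end-point, all others to the writer."""
    stripped = sql.strip()
    first_word = stripped.split(None, 1)[0].lower() if stripped else ""
    if first_word in READ_ONLY_KEYWORDS:
        return READER_ENDPOINT
    return WRITER_ENDPOINT


print(pick_endpoint("SELECT count(*) FROM bookings"))
print(pick_endpoint("INSERT INTO bookings (id) VALUES (1)"))
```

Because the reader end-point is a single DNS name, the routing logic stays the same when replicas are added or removed.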

Amazon Aurora is now PostgreSQL-compatible

• PostgreSQL 9.6 compatibility with support for PostGIS

• All the features you expect from Amazon Aurora

including 15 read replicas with <10ms lag, shared

storage, failover without data loss, 6-way replication

across 3 Availability Zones, encryption with AWS KMS

• Available now in preview

Simplify monitoring from the

AWS Management Console

Database load: Identifies

database bottlenecks

Powerful

Identifies source of bottlenecks

Top SQL

Adjustable time frame

Hour, day, week, and longer

Max CPU

Performance Insights for Amazon RDS

AWS Database Migration Service

• Fully managed service for migration from on-premises to

the AWS Cloud with minimal downtime

• Migrates data to and from all widely used commercial and

open source DBs

• Schema Conversion Tool that converts source DB schemas,

stored procedures and application code to a different target

format

• Supports homogeneous and heterogeneous data replication

• A terabyte-sized DB can be migrated for as little as $3

Database Conversion Capabilities in SCT

Source Database | Target Database
Microsoft SQL Server | Amazon Aurora, MySQL, PostgreSQL
MySQL | PostgreSQL
Oracle | Amazon Aurora, MySQL, PostgreSQL
Oracle Data Warehouse | Amazon Redshift
PostgreSQL | Amazon Aurora, MySQL
Teradata, Netezza, Greenplum | Amazon Redshift

AWS Database Migration Service Customers

Heterogeneous Migration

• Oracle private DC to RDS PostgreSQL migration

• Used the AWS Schema Conversion Tool to convert their database schema

• Used on-going replication (CDC) to keep databases in sync until they reached the cutover window

• Benefits:

– Improved reliability of the cloud environment

– Savings on Oracle licensing costs

• SCT Assessment Report let them understand the scope of the migration

NoSQL & In-Memory

DynamoDB

ElastiCache

Fast, Flexible, Scalable NoSQL

Amazon DynamoDB

History of NoSQL at Amazon

• Aurora was designed with a single constraint

– SQL compatibility and relational database semantics

• What if we said no to this constraint?

– No to SQL = NoSQL

• Could we eliminate the things we didn’t like about

relational databases?

Yes, We Can. Answer = Amazon DynamoDB

• Database that can scale beyond a single box without any changes to your app

• You can start small but know that there is no limit to how successful your app can be

• If your app is running fast today with 10 users, it will always run fast, even when you have 1M,

10M or 100M users using your app

• No need to spend time tuning queries and diagnosing why your app is running slow

• Deliver availability and durability indistinguishable from 100%.

• 99.99% and 60 second failover are not good enough

• You don’t have to manage anything. You don’t even need to know what a database instance is

• No schema. All you need to tell us is the number of reads/sec and writes/sec you want to execute.

We do the rest
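The "tell us the reads/sec and writes/sec" model can be sketched as the request you would hand to boto3's DynamoDB `create_table` call. The sketch below only builds the request so that it runs without AWS credentials. The table name, key name, and capacity numbers are hypothetical. Note that capacity units are not identical to raw requests per second: one read capacity unit covers one strongly consistent read per second for an item of up to 4 KB.

```python
def build_create_table_request(table_name, hash_key, read_units, write_units):
    """Build keyword arguments for a DynamoDB create_table call with provisioned throughput."""
    return {
        "TableName": table_name,
        # Only key attributes are declared; other attributes need no schema.
        "AttributeDefinitions": [{"AttributeName": hash_key, "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


# Hypothetical table and capacity values, for illustration only.
request = build_create_table_request("ride_locations", "ride_id", 100, 50)
print(request["ProvisionedThroughput"])

# With AWS credentials configured, the request could be sent like this:
# import boto3
# boto3.client("dynamodb").create_table(**request)
```

The two throughput values are the only sizing inputs in the request.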

Amazon DynamoDB Customers

Lyft Easily Scales Up its Ride Location Tracking System using DynamoDB

“It was so simple to scale out. We had two knobs. One was for reads and one was for writes.”

Chris Lambert, CTO, Lyft

• Lyft serves up to 8x more rides during peak times

• The GPS location for all rides was tracked in the ride location tracking system.

• In June, 2014, Lyft deployed DynamoDB in production.

• Lyft has since moved many of its other data stores over to DynamoDB as well.

In-memory cache

Memcached or Redis

Fully managed; zero admin

Amazon ElastiCache

Key ElastiCache Features

Memcached:

• Fully managed

• Cache node auto-discovery

• Multi-AZ node placement

Redis:

• Fully managed

• Persistence

• Read replicas

• Multi-AZ with auto-failover

• Redis cluster

Gaming AdTech Media Mobile Other

Amazon ElastiCache Customers

RDS and ElastiCache are Behind Grab’s Taxi-Booking App

“The latency of a cab call must be low, and remain low even in times of peak traffic of hundreds of thousands of cab requests per minute. We use ElastiCache for Redis in front of RDS MySQL to keep our systems’ real time performance at any scale.”

Ryan Ooi, Sr. Devops Engineer, Grab

• Grab is a popular taxi-hailing app in Southeast Asia.

• Average response time of the API layer is <40ms, mandating an in-memory layer to achieve such performance.

• A small devops team tried running Redis on EC2 before, but that was too much work. Using both RDS and ElastiCache in Multi-AZ allowed them to outsource all the management to AWS.
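Putting an in-memory cache in front of a relational database is commonly done with the cache-aside pattern: look in the cache first, fall back to the database on a miss, then populate the cache. The sketch below is an assumption about the general pattern, not Grab's actual implementation. A plain dictionary stands in for Redis, `fake_db_lookup` stands in for a MySQL query, and all names are hypothetical.

```python
import time


class CacheAside:
    """Cache-aside wrapper: serve from cache when fresh, otherwise load and store."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value); stands in for Redis

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        value = self._load(key)  # cache miss: read from the database
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key after the underlying row changes."""
        self._entries.pop(key, None)


db_reads = []


def fake_db_lookup(driver_id):
    # Stand-in for a database query; records each call so hits are visible.
    db_reads.append(driver_id)
    return {"driver_id": driver_id, "status": "available"}


cache = CacheAside(fake_db_lookup, ttl_seconds=30.0)
first = cache.get("driver-42")   # miss, goes to the database
second = cache.get("driver-42")  # hit, served from memory
print(len(db_reads))
```

The time-to-live bounds how stale a cached value can get; explicit invalidation on writes tightens that further.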

Big Data

Amazon Redshift

Amazon EMR

Amazon Athena

Data Pipeline

Amazon Redshift

• Petabyte-scale, relational, MPP, data

warehousing

• Fully managed with SSD and HDD platforms

• Built-in end-to-end security, including

customer-managed keys

• $1,000/TB/year; start at $0.25/hour

Why we built Amazon Redshift

• Customers were generating data in the cloud but moving

it on-premises to analyze it using a data warehouse

• Customers had migrated everything to AWS except their on-premises data warehouses.

– They wanted to shut down these data centers but could not till we offered them a solution in the cloud

[Chart: generated data vs. data available for analysis, 1990 to 2020]

Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares

Key Insight: Most Data Falls on the Floor

90% of the data in a company is never analyzed

High costs and complexity of traditional DW systems make it hard to justify the capital expense

• Could we design a system cheap and scalable enough

to let you analyze all your data?

• Could we build a service that was faster, cheaper, and

easier to use than traditional DW systems?

Yes, We Can. Answer = Amazon Redshift

• A massively parallel processing (MPP) system with up to 128 compute nodes to store and process up to 2PB of compressed data

• At $1,000/TB/year, it's so cheap that you can analyze all your data

• You can provision a petabyte in under three minutes and pay for it by the hour

• 10x performance and 1/10 the price of other solutions

• Fully managed with automated provisioning, patching, securing, backup,

restore, and built-in fault tolerance
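The $1,000/TB/year figure can be turned into a quick budgeting calculation. This is a back-of-the-envelope sketch using only the headline price quoted above. Actual pricing depends on node type, region, and reservation term, none of which are modeled here, and the 8,760-hour year is an assumption.

```python
PRICE_PER_TB_YEAR_USD = 1000.0  # headline figure quoted in the slides
HOURS_PER_YEAR = 365 * 24       # assumption: non-leap year


def yearly_cost_usd(terabytes: float) -> float:
    """Estimated yearly cost at the headline per-terabyte price."""
    return terabytes * PRICE_PER_TB_YEAR_USD


def hourly_cost_usd(terabytes: float) -> float:
    """The same estimate expressed per hour, since usage is billed by the hour."""
    return yearly_cost_usd(terabytes) / HOURS_PER_YEAR


for tb in (1, 100, 2000):
    print(f"{tb} TB: ${yearly_cost_usd(tb):,.0f}/year, ${hourly_cost_usd(tb):.2f}/hour")
```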

Amazon Redshift Customers

NTT Docomo: Japan’s largest mobile provider

• 68 million customers

• 10s of TBs per day of data across mobile network

• 6PB of total data (uncompressed)

• Data science for marketing operations, logistics etc.

• Greenplum on-premises

• 125 node DS2.8XL cluster

• 4,500 vCPUs, 30TB RAM

• 6 PB uncompressed data

• 10x faster analytic queries

• 50% reduction in time for new BI app. deployment

• Significantly less ops. overhead

Amazon EMR

• Hadoop, Hive, Presto, Spark, Tez, Impala etc.

• Release 5.2: Hadoop 2.7.3, Hive 2.1, Spark 2.0.2, Zeppelin, Presto, HBase 1.2.3 and HBase on S3, Phoenix, Tez, Flink

• New applications added within 30 days of their open source release

• Fully managed, automatically scaling clusters with support for On-Demand and Spot pricing

• Support for HDFS and S3 filesystems enabling separated compute and storage; multiple clusters can run against the same data in S3

• HIPAA-eligible. Support for end-to-end encryption, IAM/VPC, S3 client-side encryption with customer managed keys and AWS KMS

Why we built Amazon EMR

• Customers wanted to use the latest open source analytic frameworks to analyze and transform their data

• Customers wanted to use technologies like Spark and Presto in conjunction with AWS services like Amazon S3 and features like EC2 Spot Instances

• Customers wanted to benefit from the elasticity that AWS offers

Amazon EMR Customers

Amazon Athena

• Serverless query service for querying data in S3 with no

infrastructure to manage.

• No data loading required; query directly from Amazon S3

• Use standard ANSI SQL queries with support for joins,

JSON, and window functions.

• Support for multiple data formats, including text, CSV, TSV, JSON, Avro, ORC, Parquet

• Pay per query only when you’re running queries; $5/TB

scanned; if you compress your data, your queries cost less
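The pay-per-scan model, and why compression lowers the bill, can be shown with a small calculation. This is a rough sketch based only on the $5/TB figure above. The compression ratio is a hypothetical input, and billing details such as rounding or per-query minimums are not modeled.

```python
PRICE_PER_TB_SCANNED_USD = 5.0  # figure quoted in the slides


def athena_query_cost_usd(uncompressed_tb: float, compression_ratio: float = 1.0) -> float:
    """Estimate the cost of one query that scans the whole dataset.

    compression_ratio is uncompressed size divided by stored size, so a ratio
    of 4.0 means the stored data is a quarter of the original size.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    tb_scanned = uncompressed_tb / compression_ratio
    return tb_scanned * PRICE_PER_TB_SCANNED_USD


raw_cost = athena_query_cost_usd(2.0)
compressed_cost = athena_query_cost_usd(2.0, compression_ratio=4.0)
print(f"uncompressed: ${raw_cost:.2f}, compressed 4:1: ${compressed_cost:.2f}")
```

Columnar formats such as ORC and Parquet can reduce the scanned volume further when a query touches only a few columns, which this sketch does not model.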

Why we built Amazon Athena

• Customers wanted an easy way to run ad-hoc queries

on data in Amazon S3 with no infrastructure to manage

• Customers wanted a service that could complement their

use of Amazon Redshift and Amazon EMR

• Customers wanted to give this capability to anyone in

their company and only pay per query

Amazon Athena Customers

Analytics

QuickSight

Amazon ES

Amazon ML

As a native cloud service, QuickSight combines the speed, scalability, and ease of deployment that our customers have come to depend on with the value and cost effectiveness you expect from AWS.

Amazon QuickSight

Fast, easy to use business analytics service at 1/10th the

cost of traditional BI solutions.

Amazon QuickSight

• Auto-Discover AWS data sources like Amazon Redshift, RDS, and S3

• Connect to third-party sources like Excel, Salesforce, and other

hosted/on-premises databases

• Super-fast performance with SPICE

• Instant visualizations with Autograph

• Securely share and collaborate on analyses, dashboards and stories

• Native iPhone experience and web based access from all other devices

• Governed datasets

• User access controls

• Active Directory Integration

QuickSight providing real-time insights at MLB Advanced Media

QuickSight provides us with a real-time, 360 degree view of our business without being constrained by pre-built dashboards and metrics expanding our use of data to make informed decisions.

Brandon Sangiovanni, Sr. BI Development Manager

Distributed search and analytics engine

Managed service using Elasticsearch and Kibana

Fully managed; zero admin

Highly available and reliable

Tightly integrated with other AWS services

Amazon Elasticsearch Service

Amazon Elasticsearch Service Leading Use-Cases

Log Analytics &

Operational Monitoring

• Monitor the performance of your

application, web servers, and

hardware

• Easy to use, yet powerful data

visualization tools to detect

issues in near real-time

• Ability to dig into your logs in an

intuitive, fine-grained way

• Kibana provides fast, easy

visualization

Traditional Search

• Application or website provides search capabilities over diverse documents

• Tasked with making this knowledge base searchable and accessible

• Key search features including text matching, faceting, filtering, fuzzy search, auto complete, and highlighting

• Query API to support application search
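Several of the search features listed above (text matching, fuzzy search, filtering, faceting, highlighting) map onto the Elasticsearch query DSL. The sketch below builds such a request body as a plain dictionary so that it runs without a cluster. The field names `body` and `category` and the helper name are hypothetical, and the exact DSL accepted can vary between Elasticsearch versions.

```python
import json


def build_search_query(text, field="body", category=None, page_size=10):
    """Build a search request body combining fuzzy matching, filtering, facets, and highlighting."""
    must = [{"match": {field: {"query": text, "fuzziness": "AUTO"}}}]  # fuzzy text match
    filters = []
    if category is not None:
        filters.append({"term": {"category": category}})  # exact-value filter
    return {
        "size": page_size,
        "query": {"bool": {"must": must, "filter": filters}},
        "aggs": {"by_category": {"terms": {"field": "category"}}},  # facet counts
        "highlight": {"fields": {field: {}}},
    }


# The misspelling is deliberate, to show where fuzzy matching would help.
query = build_search_query("databse migration", category="howto")
print(json.dumps(query, indent=2))
```

The resulting JSON would be sent to a search endpoint by the application's query layer.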

Media and Entertainment

Online Services

Technology Other

Amazon Elasticsearch Customers

Case Study: Adobe Developer Platform (Adobe I/O)

Over 200,000 API calls per second peak

• destinations, response times, bandwidth

Log data is routed with Amazon Kinesis to Amazon Elasticsearch Service, then

displayed using AES Kibana

Adobe team can easily see traffic patterns and error rates, quickly identifying

anomalies and potential challenges

[Diagram: Sources, Amazon Kinesis Streams, Spark Streaming, Amazon Elasticsearch Service]

Which Service Should You Use?

Situation | Solution
Existing application | Use your existing engine on RDS
• MySQL | Amazon Aurora, RDS for MySQL
• PostgreSQL | RDS for PostgreSQL
• Oracle, SQL Server | RDS for Oracle, RDS for SQL Server
New application |
• If you can avoid relational features | DynamoDB
• If you need relational features | Amazon Aurora
Data Warehouse & BI | Amazon Redshift and Amazon QuickSight
Ad hoc analysis of data in S3 | Amazon Athena and Amazon QuickSight
Spark, Hadoop, Hive, HBase | Amazon EMR
Log analytics, operational monitoring and search | Amazon Elasticsearch Service
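The situation-to-solution guidance above can be encoded as a small lookup, which is sometimes handy in internal tooling or architecture checklists. This is an illustrative sketch. The function name, the situation keys, and the lower-cased engine names are all hypothetical choices; only the recommendations themselves come from the slide.

```python
EXISTING_ENGINE_TARGETS = {
    "mysql": ["Amazon Aurora", "RDS for MySQL"],
    "postgresql": ["RDS for PostgreSQL"],
    "oracle": ["RDS for Oracle"],
    "sql server": ["RDS for SQL Server"],
}

WORKLOAD_TARGETS = {
    "data warehouse & bi": ["Amazon Redshift", "Amazon QuickSight"],
    "ad hoc analysis of data in s3": ["Amazon Athena", "Amazon QuickSight"],
    "spark, hadoop, hive, hbase": ["Amazon EMR"],
    "log analytics, operational monitoring and search": ["Amazon Elasticsearch Service"],
}


def recommend(situation, engine=None, needs_relational=True):
    """Return the services suggested for a situation, following the table above."""
    key = situation.strip().lower()
    if key == "existing application":
        if engine is None:
            raise ValueError("an existing application needs its current engine")
        return EXISTING_ENGINE_TARGETS[engine.strip().lower()]
    if key == "new application":
        return ["Amazon Aurora"] if needs_relational else ["DynamoDB"]
    return WORKLOAD_TARGETS[key]


print(recommend("Existing application", engine="MySQL"))
print(recommend("New application", needs_relational=False))
print(recommend("Data Warehouse & BI"))
```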

Thank you!

Remember to complete

your evaluations!
