Cloud Analytics and Business Intelligence on AWS

Page 1: Cloud Analytics and Business Intelligence on AWSaws-de-media.s3.amazonaws.com/images/AWS Summit Berlin 2015... · Cloud Analytics and Business Intelligence on AWS. Infrastructure

Cloud Analytics and Business Intelligence on AWS

Page 2:

Infrastructure: Regions, Availability Zones, Points of Presence

Enterprise Applications: Virtual Desktops, Sharing & Collaboration

Core Services
Storage (Object, Block and Archival)
Compute (VMs, Auto-scaling and Load Balancing)
Databases (Relational, NoSQL, Caching)
Networking (VPC, DX, DNS)
CDN

Administration & Security
Access Control
Usage & Resource Tracking
Monitoring and Logs
Key Storage & Management
Identity Management
Service Catalog

Platform Services

Deployment & Management: One-click web app deployment, Dev/ops resource management, Resource Templates

Mobile Services: Push Notifications, Identity, Sync, Mobile Analytics

App Services: Queuing & Notifications, Workflow, App streaming, Transcoding, Email, Search

Analytics: Hadoop, Data Pipeline, Data Warehouse, Real-time Streaming Data, Machine Learning

Code Deploy, Code Pipeline, Code Commit

Page 3:

Simple Storage Service
Highly scalable object storage for the internet
1 byte to 5TB in size
99.999999999% durability

A Distributed Object Store
Not a file system
No Single Points of Failure
Eventually consistent

Paradigm: Object store
Performance: Very Fast
Redundancy: Across Availability Zones
Security: Public Key / Private Key
Pricing: $0.03/GB/month
Typical use case: Write once, read many
Availability: 99.99%
Durability: 99.999999999%
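As a rough illustration of what the pricing and durability figures on this slide imply, the sketch below computes a flat monthly storage bill and the expected number of objects lost per year. It assumes the durability figure is an annual, per-object probability, that 1 TB = 1,000 GB, and that request and transfer charges are ignored; the function names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_GB_MONTH = Decimal("0.03")    # quoted on the slide
DURABILITY = Decimal("0.99999999999")   # eleven nines, quoted on the slide


def monthly_storage_cost(gigabytes: int) -> Decimal:
    """Flat-rate monthly cost in dollars (storage only)."""
    return PRICE_PER_GB_MONTH * gigabytes


def expected_annual_losses(object_count: int) -> Decimal:
    """Expected objects lost per year, assuming independent annual loss."""
    return (1 - DURABILITY) * object_count


if __name__ == "__main__":
    print("200 TB per month:", monthly_storage_cost(200_000))
    print("Expected losses for 10M objects:", expected_annual_losses(10_000_000))
```

Under these assumptions, storing ten million objects corresponds to an expected loss of one object roughly every ten thousand years.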

Page 4:

S3 Performance & Scalability

Amazon S3 provides near linear scalability

S3 Streaming Performance
100 VMs; 9.6GB/s; $26/hr
350 VMs; 28.7GB/s; $90/hr
34 secs per terabyte

[Chart: GB/Second against Reader Connections]
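The "near linear" claim can be checked against the two data points on the slide. The following sketch derives per-VM throughput, time to stream one terabyte, and a scaling efficiency. It assumes 1 TB = 1,000 GB; the function names and the efficiency definition (speed-up divided by the increase in VM count) are illustrative choices, not taken from the talk.

```python
RUNS = [
    # (reader VMs, aggregate GB/s, cost in $/hr), as quoted on the slide
    (100, 9.6, 26.0),
    (350, 28.7, 90.0),
]


def per_vm_throughput(vms, gb_per_s):
    """Average GB/s delivered to each reader VM."""
    return gb_per_s / vms


def seconds_per_terabyte(gb_per_s, gb_per_tb=1000):
    """Time to stream one terabyte at the given aggregate rate."""
    return gb_per_tb / gb_per_s


def scaling_efficiency(small, large):
    """Achieved speed-up divided by the increase in VM count (1.0 = linear)."""
    speedup = large[1] / small[1]
    vm_ratio = large[0] / small[0]
    return speedup / vm_ratio


if __name__ == "__main__":
    for vms, rate, cost in RUNS:
        print(vms, "VMs:", round(per_vm_throughput(vms, rate), 3), "GB/s per VM,",
              round(seconds_per_terabyte(rate), 1), "s per TB")
    print("efficiency:", round(scaling_efficiency(RUNS[0], RUNS[1]), 2))
```

At 28.7 GB/s a terabyte takes just under 35 seconds, which is close to the "34 secs per terabyte" figure quoted on the slide.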

Page 5:

Application Services

Amazon Kinesis
Managed Service for Real Time Big Data Processing
Create Streams to Produce & Consume Data
Elastically Add and Remove Shards for Performance
Use Kinesis Worker Library to Process Data
Integration with S3, Redshift and DynamoDB
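To make the role of shards concrete, here is a minimal sketch of how a record's partition key can be mapped to a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer and each shard owns a range of that hash space; the equal-sized contiguous ranges and the function names below are simplifying assumptions for illustration.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def hash_key(partition_key: str) -> int:
    """128-bit integer hash of the partition key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose range contains the key's hash.

    Assumes the hash space is split into equal contiguous ranges.
    """
    return hash_key(partition_key) * shard_count // HASH_SPACE


if __name__ == "__main__":
    for key in ("sensor-1", "sensor-2", "sensor-3"):
        print(key, "-> shard", shard_for(key, 4))
```

Because the mapping depends only on the key, all records with the same partition key land on the same shard, and adding shards narrows each shard's range.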

[Diagram: AWS Global Infrastructure, Compute, Storage, Database, Networking, App Services, Deployment & Administration, Analytics]

Page 6:

Amazon Kinesis

[Diagram: Data Sources; AWS Endpoint; Shard 1, Shard 2, ... Shard N; Availability Zone (x3); App.1 [Aggregate & De-Duplicate]; App.2 [Metric Extraction]; App.3 [Sliding Window Analysis]; App.4 [Machine Learning]; S3, DynamoDB, Redshift]

Page 7:

AWS Security Services

Cloud HSM
Dedicated Tenancy SafeNet Luna SA HSM Device
Common Criteria EAL4+, NIST FIPS 140-2

AWS Key Management Service
Implemented on HSM
Automated Key Rotation & Auditing
Integration with other AWS Services

AWS Server Side Encryption
AWS Managed Key Infrastructure

Page 8:

Structured Data Management

Page 9:

Database

Relational Database Service
Managed Oracle, MySQL & SQL Server

DynamoDB
Managed NoSQL Database

ElastiCache
Managed In Memory Caching

Amazon Redshift
Massively Parallel Petabyte Scale Data Warehouse

Page 10:

Database

Relational Database Service

Database-as-a-Service

No need to install or manage database instances

Scalable and fault tolerant configurations

Integration with Data Pipeline

Page 11:

Database

DynamoDB

Provisioned throughput NoSQL database

Fast, predictable, configurable performance

Fully distributed, fault tolerant HA architecture

Integration with EMR & Hive

Page 12:

DynamoDB Consistency

• Writes
– Writes are acknowledged (committed) once they exist in at least two physical data centers
– Writes are persisted to SSD
• Reads
– Tunable for Application Requirements
• No reduction in durability or consistency in order to achieve throughput

Eventually Consistent Read | Strongly Consistent Read
Stale values possible | No stale values
Highest Throughput | Lower Potential Throughput
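The throughput trade-off between the two read modes can be sketched in terms of provisioned read capacity. The figures used below are not on the slide: they assume DynamoDB's provisioned-throughput model, in which one read capacity unit covers one strongly consistent read per second of an item up to 4 KB and an eventually consistent read costs half as much. The function name is hypothetical.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # assumed: one read capacity unit covers up to 4 KB


def read_capacity_units(item_bytes, reads_per_second, strongly_consistent):
    """Read capacity units to provision for a steady read workload."""
    units_per_read = math.ceil(item_bytes / READ_UNIT_BYTES)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = total / 2  # assumed: eventually consistent reads cost half
    return math.ceil(total)


if __name__ == "__main__":
    print(read_capacity_units(6000, 100, strongly_consistent=True))
    print(read_capacity_units(6000, 100, strongly_consistent=False))
```

Under these assumptions the same provisioned capacity serves twice as many eventually consistent reads, which is the "Highest Throughput" entry in the comparison above.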

Page 13:

Database

Redshift

Managed Massively Parallel Petabyte Scale Data Warehouse

Streaming Backup/Restore to S3

Load data from S3, DynamoDB and EMR

Extensive Security Features

Scale from 160 GB -> 1.6 PB Online

Page 14:

Redshift Parallelizes Everything

Query, Load, Backup, Restore, Resize

[Diagram: Common BI Tools; JDBC/ODBC; Leader Node; Compute Nodes; 10GigE Mesh]
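The leader node and compute node split can be illustrated with a toy scatter-gather aggregate. This is a conceptual sketch only, not Redshift's implementation: threads stand in for compute nodes, Python lists stand in for data slices, and both function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_sum(slice_rows):
    """Partial aggregate over one node's slice of the data."""
    return sum(slice_rows), len(slice_rows)


def leader_average(slices):
    """Fan the query out to every slice, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(compute_node_sum, slices))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    print(leader_average([[1, 2, 3], [4, 5], [6]]))
```

Each worker only ever sees its own slice, so the merge step at the leader handles one small partial result per node rather than the raw rows.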

Page 15:

Exploratory Analytics…

Data Cleansing…

Advanced Data Science

Page 16:

Managed Big Data

Elastic MapReduce
Managed, elastic Hadoop (1.x & 2.x) cluster
Integrates with S3, DynamoDB and Redshift
Install End User Tools Automatically (Spark, Impala)
Support for EC2 Spot Instances
Transient or Always on Clusters

Page 17:

Page 18:

Vibrant Ecosystem

EMR, Pig, HDFS

Page 19:

Weather Insurance for Farms

Challenge: Volatile weather is deadly to crops like grapes
60 years of crop data
200 TB of S3 Data
1M government Doppler radar points

Solution: Built a predictive model based on freely available data:
150B Soil Observations
850K Precision Rainfall Grids Tracked
3M Daily Weather Measurements

50 EMR clusters process new data as it comes into S3 each day, continuously updating the model

Page 20:

Choose your instance types

Try different configurations to find the optimal cost/performance balance

CPU: c3 family, cc2.8xlarge, d2 family
Memory: m2 family, r3 family
Disk/IO: d2 family, i2 family
General: m3 family

Workloads: ETL, Machine Learning, Spark, HDFS

Page 21:

New EC2 Instances – C4

Custom Intel Xeon processors for AWS

C4 = highest performing EC2 instances

Page 22:

The Financial Industry Regulatory Authority

30 Billion Market Events / Day

Objective: react to changing market dynamics

Amazon Elastic MapReduce & Amazon S3

$10-20M Savings by moving Platform to AWS

Page 23:

Event Processing

AWS Lambda

Fully Managed Event Processor

Node.js, Integrated AWS SDK & ImageMagick

Natively Compile & Install Node.js modules

Specify Runtime RAM & Timeout

Automatically Scaled to support Event Volume

Events from S3, DynamoDB, Kinesis & Lambda

Integrated CloudWatch Logging
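As an illustration of event processing, the sketch below shows a handler that pulls the bucket and object key out of an S3 notification event. The slide lists Node.js as the runtime; Python is used here only to stay consistent with the other examples. The event shape follows the S3 notification format, and the handler name and return value are hypothetical.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Return (bucket, key) pairs for each S3 record in the event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


if __name__ == "__main__":
    sample = {
        "Records": [
            {"s3": {"bucket": {"name": "example-bucket"},
                    "object": {"key": "logs/2015/app+server.log"}}}
        ]
    }
    print(handler(sample, None))
```

A real function would go on to read or transform each object; that part is omitted because it depends on the application.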

Page 24:

Introducing Amazon Machine Learning

Easily create machine learning models

Visualize and optimize models

Put models into production in seconds

Battle-hardened technology

Machine Learning expertise

SDE expertise

Page 25:

Easy to Use, High Performance

Train and optimize models on GBs of data

Batch process predictions

Real-time prediction API in one-click

No servers to provision or manage

Page 26:

Developing with Amazon Machine Learning

1. Build model

2. Validate & optimize

3. Make predictions

Page 27:

Building a Predictive Model with Amazon Machine Learning

Use existing data in S3, Redshift and RDS

Automatic data visualization & exploration

Descriptive and summary statistics

Your data doesn’t have to be perfect

Missing data, malformed data records, type validation

Page 28:

Model Validation and Optimization Tools

Page 29:

Making Predictions with Amazon Machine Learning

Batch predictions

Asynchronous predictions with trained model

Real time predictions

Synchronous, low latency, high throughput

Mount API end-point with a single click

Page 30:

Traditional Business Intelligence…

OLAP…

Data Sources for ML

Page 31:

Managed Data Warehouse

Redshift

Managed Massively Parallel Petabyte Scale Data Warehouse

Streaming Backup/Restore to S3

Load data from S3, DynamoDB and EMR

Extensive Security Features

Scale from 160 GB -> 1.6 PB Online

Page 32:

Redshift lets you start small and grow big

Extra Large Node (dw1.xl & dw2.xl)

3 spindles, 15GiB RAM, 2 virtual cores, 10GigE

Single Node (160GB SSD or 2TB Magnetic)

Cluster 2-32 Nodes (320GB SSD – 64TB Magnetic)

8 Extra Large Node (dw1.8xl & dw2.8xl)

24 spindles, 120GiB RAM, 1.2TB SSD or 16TB Magnetic, 16 virtual cores, 10GigE

Cluster 2-100 Nodes (2.4TB SSD – 1.6PB Magnetic)

[Diagram: grids of XL and 8XL node icons]
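The cluster capacity ranges quoted on this slide follow from per-node storage multiplied by node count. The sketch below reproduces them and estimates how many nodes a given data set needs. Per-node sizes and node limits are taken as quoted on the slide; pairing dw1 with magnetic storage and dw2 with SSD, the decimal units (1 TB = 1,000 GB), and the function names are assumptions.

```python
NODE_TYPES = {
    # name: (storage per node in GB, min cluster nodes, max cluster nodes)
    "dw2.xl": (160, 2, 32),        # SSD
    "dw1.xl": (2_000, 2, 32),      # magnetic
    "dw2.8xl": (1_200, 2, 100),    # SSD
    "dw1.8xl": (16_000, 2, 100),   # magnetic
}


def capacity_range_gb(node_type):
    """Smallest and largest multi-node cluster capacity in GB."""
    per_node, min_nodes, max_nodes = NODE_TYPES[node_type]
    return per_node * min_nodes, per_node * max_nodes


def nodes_needed(node_type, data_gb):
    """Nodes required to hold data_gb, ignoring replication and headroom."""
    per_node, min_nodes, max_nodes = NODE_TYPES[node_type]
    nodes = max(min_nodes, -(-data_gb // per_node))  # ceiling division
    if nodes > max_nodes:
        raise ValueError("data set exceeds the maximum cluster size")
    return nodes


if __name__ == "__main__":
    for name in NODE_TYPES:
        print(name, capacity_range_gb(name))
    print("200 TB on dw1.8xl:", nodes_needed("dw1.8xl", 200_000), "nodes")
```

The smallest dw2.xl cluster comes out at 320 GB and the largest dw1.8xl cluster at 1.6 PB, matching the ranges on the slide.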

Page 33:

End User Reporting

Redshift

S3

EMR

Dynamo DB

Page 34:

Ignite Your Ambition

Leading Index Provider With 41,000+ Indexes Across Asset Classes And Geographies

Over 10,000 Corporate Clients in 60 countries

Our technology powers over 70 marketplaces, regulators, CSDs and clearinghouses in over 50 countries

100+ data product offerings supporting 2.5+ million investment professionals and users in 98 countries

26 Markets, 3 Clearing Houses, 5 Central Securities Depositories

Lists more than 3,500 companies in 35 countries, representing more than $8.8 trillion in total market value

Page 35:

NDW 1.0 Requirements

Original scope was to replace on-premises warehouse with Redshift, keeping equivalent schemas and data

4-8 Billion Rows/Day

Legacy limited to 1 Year Retention

Must be lower cost than legacy system

Legacy $1.16M/Year

Must satisfy multiple security and regulatory requirements

Must perform similarly to legacy warehouse under concurrent query load

Page 36:

Migration Completed On Schedule

Migrated off legacy warehouse to Redshift (start to finish) in 7 man-months

Redshift costs were 43% of legacy budget for the same data set (~1100 tables)

Tuned queries now running faster than on legacy system

Data Ingest

5.5B rows/day average for 2014

High water mark: 14B rows in 1 day

Best write rates ~2.76M rows/second

450 GB/day (after compression) into Redshift

1,895 GB/day average uncompressed

Currently resize clusters once a quarter (if necessary)

NDW_Prod is currently growing +3 dw1.8xl nodes per quarter
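A few derived figures make the migration numbers easier to compare. The sketch below computes the average ingest rate, the compression ratio, and the implied annual Redshift cost. It assumes the "legacy budget" is the $1.16M/year figure from the requirements slide and that ingest is spread evenly over a day; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_second(rows_per_day):
    """Average ingest rate if rows arrive evenly across the day."""
    return rows_per_day / SECONDS_PER_DAY


def compression_ratio(uncompressed_gb, compressed_gb):
    """How many times smaller the data is after compression."""
    return uncompressed_gb / compressed_gb


def annual_cost(legacy_cost, fraction_of_legacy):
    """Annual cost expressed as a fraction of the legacy budget."""
    return legacy_cost * fraction_of_legacy


if __name__ == "__main__":
    print("average rows/s:", round(rows_per_second(5.5e9)))
    print("peak-day rows/s:", round(rows_per_second(14e9)))
    print("compression ratio:", round(compression_ratio(1895, 450), 2))
    print("Redshift cost per year ($):", round(annual_cost(1.16e6, 0.43)))
```

Even on the 14B-row peak day the average rate works out far below the best observed write rate of about 2.76M rows/second, which suggests ingest arrives in bursts rather than continuously.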

Page 37:

Integrated Analytics