Big Data Analytics on AWS

Post on 22-Jan-2018



© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Dickson Yue, Solutions Architect

17 June 2016

Digital Innovation & e-Commerce Track

How to get started?

Data -> Answers

START HERE WITH A BUSINESS CASE

Revenue Lift

Market acquisition

Product recommendation

Improve user experience

Operational intelligence

Data -> Ingest/Collect -> Store -> Process/Analyze -> Consume/Visualize -> Answers

Time to Answer (Latency)

Throughput

Cost


Amazon S3 Amazon Kinesis Amazon DynamoDB Amazon RDS

Amazon EMR

Amazon Redshift

Amazon Machine Learning

Storage Processing Visualize

Amazon Elasticsearch Service

QuickSight

ElastiCache

Tracking Clickstream, user retention

Use case

Answer
•  User retention
•  High spending customer navigation pattern
•  Product recommendation
•  User journey in the shop
•  UX improvement
•  What deal/ad to try next

Data source
•  Page
•  Click event
•  Web log
•  Thing event

JavaScript (Snowplow)

AWS SDK

Logstash

Fluentd

Ingest Store

@ 30km/s a.k.a 300 rps

HTTP Post

Amazon S3

Storage

@ 100km/s Ingest Store

JavaScript (Snowplow)

AWS SDK

LOG4J

Flume

Fluentd

HTTP Post

Amazon Kinesis

Firehose

API Server Streaming Buffer

24hrs-7days

Web Servers

Amazon S3

Storage Data lake
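The path above (API server to Amazon Kinesis Firehose to Amazon S3) can be sketched from the API server's side. This is a minimal sketch, not the presenter's code: it assumes a boto3 Firehose client is passed in, and `to_records`, `send_events`, `FakeFirehose`, and the stream name `clickstream` are hypothetical names used only for illustration. The one real API call is `put_record_batch(DeliveryStreamName=..., Records=...)`, which accepts at most 500 records per call, so events are sent in chunks.

```python
import json

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_records(events):
    """Encode each event as newline-delimited JSON, the shape Firehose expects."""
    return [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]


def send_events(client, stream_name, events):
    """Send events in chunks of MAX_BATCH and return the number of failed records."""
    records = to_records(events)
    failed = 0
    for i in range(0, len(records), MAX_BATCH):
        resp = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records[i:i + MAX_BATCH],
        )
        failed += resp.get("FailedPutCount", 0)
    return failed


class FakeFirehose:
    """Hypothetical stand-in for boto3.client("firehose") so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    def put_record_batch(self, DeliveryStreamName, Records):
        self.calls.append(len(Records))
        return {"FailedPutCount": 0}


client = FakeFirehose()
failed = send_events(client, "clickstream", [{"page": "/", "event": "click"}] * 1200)
print(client.calls, failed)
```

In a real deployment the fake would be replaced by `boto3.client("firehose")`, and records reported as failed would normally be retried rather than only counted.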

@ 100km/s Ingest Store

JavaScript (Snowplow)

AWS SDK

LOG4J

Flume

Fluentd

HTTP Post

Amazon S3

Amazon Kinesis

Firehose

API Gateway

API Server Streaming Buffer

24hrs-7days

Storage Data lake

Amazon S3

Storage Data lake

Store Process/Analyze

EMR

Redshift

Redshift EMR ETL

Visualize

JDBC ODBC

JDBC ODBC

QuickSight

Amazon S3

Store Process

EMR

Visualize

JDBC ODBC

Redshift Basket

CRM ERP DBs

Log file

QuickSight

Day-14 retention over time

User retention and growth

N-day retention
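N-day retention, as charted above, is the share of users who first appeared on a given day and were active again exactly N days later. A minimal sketch of that calculation follows; the function name `n_day_retention`, the input shapes, and the sample data are assumptions for illustration, not taken from the slides (in the architecture shown, the equivalent query would more likely run in Redshift or EMR).

```python
from datetime import date, timedelta


def n_day_retention(first_seen, activity, n):
    """Fraction of each signup cohort that was active exactly n days later.

    first_seen: dict of user_id -> date of first visit
    activity:   set of (user_id, date) pairs, one per active day
    Returns a dict of cohort date -> retention fraction in [0, 1].
    """
    cohorts = {}
    for user, start in first_seen.items():
        cohorts.setdefault(start, []).append(user)

    result = {}
    for start, users in cohorts.items():
        target = start + timedelta(days=n)
        retained = sum(1 for u in users if (u, target) in activity)
        result[start] = retained / len(users)
    return result


# Hypothetical sample data: two users sign up on 1 June, one on 2 June.
first_seen = {"a": date(2016, 6, 1), "b": date(2016, 6, 1), "c": date(2016, 6, 2)}
activity = {
    ("a", date(2016, 6, 1)),
    ("b", date(2016, 6, 1)),
    ("c", date(2016, 6, 2)),
    ("a", date(2016, 6, 15)),  # user a returns 14 days after first visit
}
retention = n_day_retention(first_seen, activity, 14)
print(retention)
```

Plotting the returned fractions by cohort date gives the "Day-14 retention over time" view.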

Social listening Social CRM, Chatbot

Use case

Answer
•  Campaign performance
•  Customer service automation
•  Building Chatbot

Data
•  Brand page activity
•  Post
•  #hashtag
•  User profile

Logstash

AWS SDK

Ingest Store

Bot AWS SDK

App

Crawlers AWS SDK

Amazon Kinesis

Firehose

Store

Amazon S3 Data Lake

Amazon Elasticsearch Service (last 120 minutes)

Analysts

AWS SDK

Why do we need machine learning for this?

The social media stream is high-volume, and most of the messages are not actionable by customer service (CS)

Logstash

AWS SDK

Ingest Store

Bot AWS SDK

App

Crawlers AWS SDK

Amazon Kinesis

Process

AWS Lambda

Analyze

AWS SDK

Machine learning

Notification

Action

Support issue

Database

Feature request

Keep training the ML model with new data
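The flow above (Amazon Kinesis to Lambda to a machine learning model, then routing to a support issue or feature request) can be sketched as a Lambda handler. This is an illustration under stated assumptions, not the presenter's implementation: `classify` is a hypothetical keyword stub standing in for the trained model, the `text` field and `make_event` helper are invented for the example, and notification or database writes are left out. The event layout (`Records[].kinesis.data`, base64-encoded) is the shape Lambda receives from a Kinesis stream.

```python
import base64
import json


def classify(text):
    """Hypothetical stand-in for the ML model: label a message by keywords."""
    lowered = text.lower()
    if any(w in lowered for w in ("broken", "refund", "not working")):
        return "support_issue"
    if any(w in lowered for w in ("please add", "wish", "would be great")):
        return "feature_request"
    return "ignore"  # most social messages are not actionable


def handler(event, context):
    """Decode each Kinesis record, classify it, and count messages per label."""
    routed = {"support_issue": [], "feature_request": [], "ignore": []}
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        routed[classify(message["text"])].append(message)
    # A real handler would notify support or write to a database here.
    return {label: len(msgs) for label, msgs in routed.items()}


def make_event(messages):
    """Build a Kinesis-style Lambda event for local testing (hypothetical helper)."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(m).encode()).decode()}}
            for m in messages
        ]
    }


sample = make_event([
    {"text": "My order arrived broken"},
    {"text": "I wish you had dark mode"},
    {"text": "lol"},
])
print(handler(sample, None))
```

Keeping the classifier behind a single function makes it straightforward to swap the stub for a call to a trained model and to retrain that model as new labelled data accumulates in Amazon S3.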

Action

Amazon S3

AWS SDK

Ingest Store

Bot AWS SDK

Messenger

Amazon Kinesis

Process

AWS Lambda

Analysts

Machine learning

Action

Bot

App

Get prediction

Keep training the ML model with new data

Amazon S3

Operational intelligence (OI) from a business view, with custom sources

Refrigerator

POS

Door sensor

Water

Camera

Storefront

Kitchen

Lambda

SQS

AWS IoT

SQSPoller

HTTP Event Collector

Serverless Architecture

Our Big Data Scale

•  Total ~25 PB DW on Amazon S3
•  Read ~10% DW daily
•  Write ~10% of read data daily
•  ~550 billion events daily
•  ~350 active platform users
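To put the figures above in absolute terms, the percentages can be turned into daily volumes and an average event rate. The arithmetic below is a rough sketch that treats the approximate slide figures as exact and assumes decimal petabytes (1 PB = 10^15 bytes); the variable names are illustrative only.

```python
PB = 10**15  # assumption: decimal petabytes
SECONDS_PER_DAY = 24 * 60 * 60

warehouse_bytes = 25 * PB             # ~25 PB data warehouse on Amazon S3
read_per_day = warehouse_bytes // 10  # ~10% of the warehouse read daily
write_per_day = read_per_day // 10    # ~10% of the read volume written daily
events_per_second = 550_000_000_000 / SECONDS_PER_DAY  # ~550 billion events daily

print(read_per_day / PB)         # 2.5 PB read per day
print(write_per_day / PB)        # 0.25 PB (250 TB) written per day
print(round(events_per_second))  # roughly 6.4 million events per second on average
```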

Predict what you want to watch before you watch it.

Netflix Prize - best collaborative filtering algorithm
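For readers unfamiliar with the term, collaborative filtering predicts a user's rating for an item from the ratings of similar users. The sketch below is a toy user-based variant with cosine similarity; it is not the Netflix Prize winning algorithm or anything Netflix runs, and the function names and ratings are made up for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += sim
    return num / den if den else None


# Hypothetical ratings: ann's tastes match bob's far more than cat's.
ratings = {
    "ann": {"A": 5, "B": 1},
    "bob": {"A": 5, "B": 1, "C": 5},
    "cat": {"A": 1, "B": 5, "C": 1},
}
print(predict(ratings, "ann", "C"))
```

Because ann's ratings resemble bob's more than cat's, the prediction for item C lands closer to bob's rating of 5 than to cat's rating of 1.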

Storage Compute Service Tools

Big Data Portal

API Portal

Big Data API

Amazon S3

Data -> Ingest/Collect -> Store -> Process/Analyze -> Consume/Visualize -> Answers

START WITH A BUSINESS CASE

MATCH AVAILABLE DATA

CHOOSE BEST FIT

Amazon S3 Amazon Kinesis Amazon DynamoDB Amazon RDS

Amazon EMR

Amazon Redshift

Amazon Machine Learning

Storage Processing Visualize

Amazon Elasticsearch Service

QuickSight

ElastiCache

Source DBs

3rd Party Data

Log Data

Reporting

Analysis

Processing

Data Lake

S3

Source of truth

Remember to complete your evaluations!

Thank you

CRM ERP DBs

Log file

AWStats

days

MB

2002 Big bang

<2005 Hello world

Page/Event tracking

GA

hours

GB

SOLOMO

minutes - hours

TB

<2008 New customer service

New System monitoring New QA

IoT

O2O

seconds – hours PB

2016 Fast and big

data driven marketing

Analytics

ETL

Interactive data exploration

Interactive slice & dice

Real-time analytics, iterative/ML algorithms, and more ...

Different Big Data Processing Needs