Modern data architectures for real time analytics and engagement

Post on 12-Apr-2017


Modern Data Architectures for Real-Time Analytics & Engagement

Russell Nash, APAC Solutions Architect, Amazon Web Services

SCALABLE FLEXIBLE MANAGEABLE COST EFFECTIVE

Modern Data Architecture

Ingest Serving

Speed (Real-time)

Scale (Batch)

Data analysts

Data scientists

Business users

Engagement platforms

Automation / events

Sources


Real-time Pipeline

Amazon Kinesis

Machines

Devices

Mobile

Clickstream

Amazon Kinesis Streams

Amazon Kinesis Firehose

Amazon Kinesis Analytics

Kinesis Family

Availability Zone

Availability Zone

Availability Zone

Amazon Kinesis

Stream

AWS Lambda

KCL App

Amazon EMR

Streaming

Logs

Alerts

Analysis

Dashboards

Predictions

Amazon Kinesis Stream

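Throughput by shard count (data in; data out):

1 shard: in 1,000 TPS or 1 MB/s; out 5 TPS or 2 MB/s

2 shards: in 2,000 TPS or 2 MB/s; out 10 TPS or 4 MB/s

3 shards: in 3,000 TPS or 3 MB/s; out 15 TPS or 6 MB/s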

Retention: 24 hours to 7 Days
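The per-shard limits above suggest a simple sizing calculation. The following is a minimal sketch, assuming the slide's limits (1,000 records/s or 1 MB/s in, 2 MB/s out per shard) and that every consumer reads the full stream; `shards_needed` and its parameters are hypothetical names, not part of any AWS API.

```python
import math

# Per-shard limits as listed on the slide.
WRITE_RECORDS_PER_SEC = 1000
WRITE_MB_PER_SEC = 1
READ_MB_PER_SEC = 2


def shards_needed(records_per_sec, avg_record_kb, consumers=1):
    """Estimate how many shards a workload needs (hypothetical helper)."""
    write_mb = records_per_sec * avg_record_kb / 1024
    # Assumption: each consumer reads every record once.
    read_mb = write_mb * consumers
    return max(
        1,
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(write_mb / WRITE_MB_PER_SEC),
        math.ceil(read_mb / READ_MB_PER_SEC),
    )
```

For example, 2,500 small records per second is bound by the record-count limit, while a few large records read by several consumers is bound by the read limit.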

Creating a Kinesis Stream

Amazon Kinesis Stream

SHARD

SHARD

SHARD

EVENT PRODUCERS

Kinesis Endpoint

Specify Partition Key
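To illustrate why the producer specifies a partition key: Kinesis hashes the key with MD5 into a 128-bit integer and routes the record to the shard whose hash-key range contains it. The sketch below assumes the shards split the hash space evenly, which is not guaranteed after resharding; `shard_for_key` is a hypothetical helper.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard a key would land on.

    Assumption: shard hash-key ranges divide the space evenly.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE
```

Records with the same partition key always map to the same shard, so a low-cardinality key can concentrate traffic on one shard.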

Kinesis Producer Library

• Writes to one or more Amazon Kinesis Streams
• Retry mechanism
• Uses PutRecords
• Aggregates
• Integrates with Amazon KCL to de-aggregate
• Submits Amazon CloudWatch metrics
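As a rough illustration of the PutRecords batching the library relies on, the sketch below groups records into batches that respect the PutRecords request limits (500 records and 5 MB per call). It does not reproduce the KPL's record aggregation or retry logic; `batch_records` is a hypothetical helper.

```python
MAX_RECORDS = 500              # PutRecords limit per request
MAX_BYTES = 5 * 1024 * 1024    # PutRecords payload limit per request


def batch_records(records):
    """Yield PutRecords-sized batches from (partition_key, data_bytes) pairs."""
    batch, size = [], 0
    for key, data in records:
        record_size = len(key.encode("utf-8")) + len(data)
        # Start a new batch when either limit would be exceeded.
        if batch and (len(batch) == MAX_RECORDS or size + record_size > MAX_BYTES):
            yield batch
            batch, size = [], 0
        batch.append({"PartitionKey": key, "Data": data})
        size += record_size
    if batch:
        yield batch
```

Each yielded batch has the shape expected by the `Records` argument of boto3's `put_records`; a real producer would also inspect the response for failed records and retry them.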

Kinesis Agent

• Monitors files and sends new data records to your delivery stream
• Handles file rotation, checkpointing, and retry upon failures
• Delivers all data in a reliable, timely, and simple manner
• Emits Amazon CloudWatch metrics


Kinesis Data Out – Kinesis Client Library

SHARD 1

SHARD 2

SHARD 3

SHARD N

EC2 Instance

Worker 1

Worker 2

EC2 Instance

Worker 3

Worker N

KCL: Java, Node.js, Python, .NET, Ruby

twitter-trends.com

twitter-trends.com website

twitter-trends.com

The solution: Local Top 10

My top-10

My top-10

My top-10

Global top-10
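The local-to-global step can be sketched as follows. This assumes each hashtag is counted by exactly one worker (for example, because the hashtag is the partition key), in which case merging the local top-10 lists gives the exact global top-10; `local_top` and `global_top` are hypothetical names.

```python
from collections import Counter


def local_top(counts, n=10):
    """Top-n hashtags seen by one worker, as {hashtag: count}."""
    return dict(Counter(counts).most_common(n))


def global_top(local_tops, n=10):
    """Merge the workers' local top-n lists into a global top-n."""
    total = Counter()
    for top in local_tops:
        total.update(top)  # adds counts per hashtag
    return total.most_common(n)
```

If the same hashtag can appear on several shards, merging truncated local lists is only an approximation, since a tag just outside each local top-10 could still rank globally.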

KINESIS

twitter-trends.com

Challenges using the Kinesis API directly

Kinesis application

Manual creation of workers and assignment to shards

How many workers per EC2 instance?

How many EC2 instances?

KINESIS

twitter-trends.com

Using the Kinesis Client Library

Kinesis application

Shard mgmt table

KINESIS

twitter-trends.com

Elasticity and load balancing

Shard mgmt table

Auto scaling Group
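The load-balancing idea can be illustrated with a simplified assignment function. The KCL itself coordinates workers through leases in the shard management table; the round-robin below is only a stand-in showing the intended outcome (shards spread evenly as workers come and go), and `rebalance` is a hypothetical name.

```python
def rebalance(shards, workers):
    """Spread shards across workers as evenly as possible (round-robin)."""
    assignment = {worker: [] for worker in workers}
    for i, shard in enumerate(sorted(shards)):
        assignment[workers[i % len(workers)]].append(shard)
    return assignment
```

When the Auto Scaling group adds an instance, re-running the assignment with the new worker list reduces the number of shards each worker handles.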

KINESIS

twitter-trends.com

Fault tolerance support in KCL

Shard mgmt table

Availability Zone 1 (marked X, failed)

Availability Zone 3

Checkpoint, replay design pattern

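[Figure: Kinesis shard Shard-i with records at sequence numbers 2, 3, 5, 8, 10, 14, 17, 18, 21, 23. A shard management table (Shard ID, Lock, Seq num) records the lock holder (Host A, then Host B) and the checkpointed sequence number for Shard-i. A second table (Shard ID, Local top-10) holds the state for Shard-i: {#Movies: 10235, #Weather: 9835, …} on Host A and {#Movies: 10235, #Weather: 9910, …} on Host B.]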
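The checkpoint and replay pattern can be sketched in a few lines. This is a simulation under assumptions, not KCL code: the table is a plain dict standing in for the shard management table, state is saved together with the checkpoint, and `run_worker` and `crash_after` are hypothetical names.

```python
from collections import Counter


def run_worker(records, table, shard_id, checkpoint_every=3, crash_after=None):
    """Process (seq, hashtag) records that come after the stored checkpoint.

    table maps shard_id -> {"seq": last checkpointed sequence number,
                            "counts": hashtag counts as of that checkpoint}.
    Returns the final counts, or None if the simulated host crashed.
    """
    entry = table.setdefault(shard_id, {"seq": 0, "counts": {}})
    counts = Counter(entry["counts"])
    last_seq = entry["seq"]
    processed = 0
    for seq, tag in records:
        if seq <= entry["seq"]:
            continue  # already covered by the checkpoint
        if crash_after is not None and processed == crash_after:
            return None  # host fails; work since the last checkpoint is lost
        counts[tag] += 1
        last_seq = seq
        processed += 1
        if processed % checkpoint_every == 0:
            table[shard_id] = entry = {"seq": seq, "counts": dict(counts)}
    table[shard_id] = {"seq": last_seq, "counts": dict(counts)}
    return counts
```

If Host A crashes between checkpoints, Host B resumes from the last checkpointed sequence number and replays the records after it, so the final counts match an uninterrupted run.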


Kinesis & Lambda

SHARD 1

SHARD 2

SHARD 3

SHARD N

AWS Lambda: Node.js, Java, Python, C#
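A minimal sketch of a Python Lambda function consuming a Kinesis stream is shown below. Lambda delivers each batch as `event["Records"]`, with the payload base64-encoded under `kinesis.data`; the JSON payload with a `hashtag` field is an assumption made for illustration.

```python
import base64
import json


def handler(event, context):
    """Decode a batch of Kinesis records and collect their hashtags."""
    tags = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)  # assumed: producers send JSON
        tags.append(message["hashtag"])  # assumed field name
    return {"processed": len(tags), "hashtags": tags}
```

Lambda polls each shard and invokes the function with batches of records, so no worker or shard assignment code is needed.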

AWS Lambda

Lambda Blueprints


Spark Core

SparkSQL

Spark Streaming

SparkR

Spark ML, GraphX


Stream → Micro-batches → Results
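The micro-batch idea behind Spark Streaming can be shown without Spark: a continuous stream is cut into small fixed-interval batches, and each batch is then processed like an ordinary batch job. The sketch below is plain Python, not the Spark API; `micro_batches` is a hypothetical name and the events are assumed to arrive ordered by timestamp.

```python
from itertools import groupby


def micro_batches(events, interval):
    """Group (timestamp_seconds, value) events into fixed-interval batches.

    Yields (batch_start_time, [values]) for each non-empty interval.
    """
    for window, group in groupby(events, key=lambda e: int(e[0] // interval)):
        yield window * interval, [value for _, value in group]
```

Each yielded batch could then be handed to the same code used for batch processing, which is the main appeal of the micro-batch model.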

Amazon Kinesis

Apache Kafka


Data Prep

Prediction Model

Split: Train 70%, Test 30%
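The 70/30 split can be sketched as a shuffled partition of the prepared rows. This is a plain-Python illustration rather than the Spark ML call a pipeline on EMR would likely use; `split`, `train_fraction`, and `seed` are hypothetical names.

```python
import random


def split(rows, train_fraction=0.7, seed=42):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

The model is trained on the first part and evaluated on the held-out part, which it has not seen during training.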

Near Real-time Data

Training Data

SQL

ML


Amazon Kinesis AWS Lambda

Application

Amazon EMR

Streaming

S3 (Log)

Amazon Elasticsearch (Dashboard)

Real-time Pipeline

Amazon Elasticsearch

• Search and Analytics
• Scalable
• Fully Managed
• Integrated – Logstash, Kibana


Amazon Kinesis AWS Lambda

Application

Amazon EMR

Streaming

S3 (Logs)

Amazon Elasticsearch (Dashboards)

Amazon EMR(Predictions)

ML

Amazon SNS(Alerts)

Real-time Pipeline

Amazon Redshift

(Analytics)


S3

Redshift

Elasticsearch

Amazon Kinesis Firehose

Auto provisioning

Auto partition keys

End-to-end elastic

Batch, Compress, Encrypt
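The batch and compress steps can be sketched as a small buffering function. This only illustrates the idea of buffering records up to a size threshold and compressing each buffer before delivery; it is not Firehose code, encryption is left out, and `buffer_and_compress` and `max_bytes` are hypothetical names.

```python
import gzip


def buffer_and_compress(records, max_bytes):
    """Group records into newline-delimited buffers and gzip each buffer.

    A buffer is flushed when adding the next record would exceed
    max_bytes (measured before compression).
    """
    buf, size = [], 0
    for rec in records:
        line = rec + b"\n"
        if buf and size + len(line) > max_bytes:
            yield gzip.compress(b"".join(buf))
            buf, size = [], 0
        buf.append(line)
        size += len(line)
    if buf:
        yield gzip.compress(b"".join(buf))
```

Each compressed blob corresponds to one object that would be delivered to the destination, such as S3.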


Kinesis Analytics

Data In (Stream or Firehose) → SQL → Data Out (Stream or Firehose)
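The "SQL over streaming data" idea can be illustrated with a stand-in. Kinesis Analytics has its own streaming SQL dialect and runs continuously; the sketch below instead loads one window of records into an in-memory SQLite table and runs an ordinary aggregate query, so the table name, columns, and `top_hashtags` helper are all assumptions.

```python
import sqlite3


def top_hashtags(rows, limit=3):
    """Run a SQL aggregation over one window of (timestamp, hashtag) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (ts INTEGER, hashtag TEXT)")
    conn.executemany("INSERT INTO tweets VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT hashtag, COUNT(*) AS n FROM tweets "
        "GROUP BY hashtag ORDER BY n DESC, hashtag LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return result
```

In the managed service the query would be applied to the incoming stream and its results written to an output stream or delivery stream.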

Sonos

New X1 Instance - Tons of Memory

• Large-scale, in-memory applications

• Intel® Xeon® E7 8880 v3 Haswell processors

• Up to 2TB of memory

• Up to 128 vCPUs per instance

Intel® Processor Technologies

Intel® AVX – Dramatically increases performance for highly parallel HPC workloads such as life science engineering, data mining, financial analysis, media processing

Intel® AES-NI – Enhances security with new encryption instructions that reduce the performance penalty associated with encrypting/decrypting data

Intel® Turbo Boost Technology – Increases computing power with performance that adapts to spikes in workloads

Intel® Transactional Synchronization Extensions (TSX) – Enables execution of transactions that are independent to accelerate throughput

P state & C state control – provides granular performance tuning for cores and sleep states to improve overall application performance

twitter.com/awsawscloudseasia

aws-asean-marketing@amazon.com

facebook.com/amazonwebservices/

youtube.com/user/AmazonWebServices

slideshare.net/amazonwebservices

Thank you for joining us today. Please complete the survey & let us know what you think of the webinar.

Get hands-on experience working with AWS technology. Access the complimentary Big Data on AWS self-paced labs.

REGISTER NOW: http://amzn.to/2jFt11N

Complimentary labs are available only till 31 March 2017

Q&A