Building big data applications on AWS by Ran Tessler

Ran Tessler, Manager, Solutions Architecture Building Big Data Applications on AWS

Transcript of Building big data applications on AWS by Ran Tessler

A Modern Take on Alchemy

Turning Data into Actionable Insights

What to Expect from this Session

• Big Data architectural principles

• Reference Lambda Architecture

• Live demo

Architectural Principles

• Decoupled “data bus”: Data → Store → Process → Answers

• Use the right tool for the job: latency, throughput, access patterns

• Apply Lambda architecture ideas: immutable (append-only) log, batch/speed/serving layers

• Leverage AWS managed services: no/low admin

• Be cost conscious: big data ≠ big cost

Simplify Big Data Processing

data → Ingest / collect → Store → Process / analyze → Consume / visualize → answers

Time to Answer (data freshness) / Throughput

Demo

http://aws.amazon.com/big-data/use-cases/

AccessLog - Common Log Format (CLF)

75.35.230.210 - - [20/Jul/2009:22:22:42 -0700] "GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236
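
A minimal sketch of how a line like the one above could be split into fields, assuming the standard Common Log Format layout (host, ident, user, timestamp, request, status, size). The function name `parse_clf` and the field names are illustrative choices, not something taken from the demo itself.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Parse one CLF line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 20/Jul/2009:22:22:42 -0700
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A size of "-" means no body was returned
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # The request field is normally "METHOD PATH PROTOCOL"
    parts = record["request"].split()
    if len(parts) == 3:
        record["method"], record["path"], record["protocol"] = parts
    else:
        record["method"] = record["path"] = record["protocol"] = None
    return record


sample = ('75.35.230.210 - - [20/Jul/2009:22:22:42 -0700] '
          '"GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236')
record = parse_clf(sample)
```

Lines that do not follow the format return `None`, so a caller can count or skip malformed records instead of failing on them.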

Did NASA’s STS-69 Mission Land …

… On the right homepage?

Your First Big Data Application on AWS

COLLECT → STORE → PROCESS → ANALYZE & VISUALIZE

Your First Big Data Application on AWS

COLLECT: Amazon Kinesis Firehose (Logs)

STORE: S3

PROCESS

ANALYZE & VISUALIZE
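
As a rough illustration of the COLLECT stage, the sketch below sends one log line to a Firehose delivery stream. It assumes a client object exposing the boto3 Firehose `put_record(DeliveryStreamName=..., Record={"Data": ...})` call; the helper `send_log_line`, the stand-in `RecordingClient`, and the stream name `demo-stream` are hypothetical and only there so the code can be tried without AWS credentials.

```python
def send_log_line(firehose_client, stream_name, line):
    """Send one log line to a Firehose delivery stream.

    firehose_client is expected to behave like boto3.client("firehose").
    """
    # Firehose concatenates records as-is, so end each one with a newline
    # to keep one log line per row in the delivered S3 objects.
    data = (line.rstrip("\n") + "\n").encode("utf-8")
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": data},
    )


class RecordingClient:
    """Hypothetical offline stand-in that records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def put_record(self, DeliveryStreamName, Record):
        self.calls.append((DeliveryStreamName, Record))
        return {"RecordId": "local-test"}


fake = RecordingClient()
response = send_log_line(
    fake,
    "demo-stream",  # assumed stream name, not from the talk
    '75.35.230.210 - - [20/Jul/2009:22:22:42 -0700] '
    '"GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236',
)
```

Against a real account, the same function would be called with `boto3.client("firehose")` in place of `RecordingClient()`, and the delivery stream would need to exist already with S3 as its destination.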

Your First Big Data Application on AWS

COLLECT: Amazon Kinesis Firehose (Logs)

STORE: S3

PROCESS: Amazon EMR with Spark & Hive

ANALYZE & VISUALIZE

Your First Big Data Application on AWS

COLLECT: Amazon Kinesis Firehose (Logs)

STORE: S3

PROCESS: Amazon EMR with Spark & Hive

ANALYZE & VISUALIZE: Amazon Redshift and Amazon QuickSight
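
One common way to move processed output from S3 into Amazon Redshift is the `COPY` command. The sketch below only builds the SQL text; the table name, bucket path, role ARN, and the choice of CSV output are all assumptions made for illustration and are not taken from the demo.

```python
def build_copy_statement(table, s3_path, iam_role_arn):
    """Build a Redshift COPY statement that loads CSV files from S3.

    Values are interpolated directly, so pass only trusted identifiers.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV;"
    )


# All three arguments are hypothetical placeholders.
statement = build_copy_statement(
    "weblogs",
    "s3://example-bucket/processed/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

The resulting statement would be run through any SQL client connected to the cluster; if the processing step writes a different file format, the `FORMAT AS` clause would have to change to match.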

Reference Lambda Architecture

data → store → process → Apps

Batch Layer: Amazon Kinesis S3 Connector, Amazon S3, Amazon Redshift, Amazon EMR (Presto, Hive, Pig, Spark)

Serving Layer: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon ES

Speed Layer: Amazon Kinesis, KCL, AWS Lambda, Spark Streaming, Storm

Amazon ML
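
The interplay of the three layers can be sketched in a few lines of plain Python. This is a toy, single-process model under stated assumptions: the class `LambdaCounts` and its method names are hypothetical, a list stands in for the immutable append-only log, and counters stand in for the batch and speed views that the services above would hold.

```python
from collections import Counter


class LambdaCounts:
    """Toy Lambda architecture that counts events per key."""

    def __init__(self):
        self.log = []                # append-only master dataset
        self.batch_view = Counter()  # recomputed from the full log
        self.speed_view = Counter()  # covers events since the last batch run

    def ingest(self, key):
        # New data is only ever appended, never updated in place.
        self.log.append(key)
        # Speed layer: update the real-time view incrementally.
        self.speed_view[key] += 1

    def run_batch(self):
        # Batch layer: recompute the view from the entire log.
        self.batch_view = Counter(self.log)
        # The batch view now covers everything, so the speed view resets.
        self.speed_view = Counter()

    def query(self, key):
        # Serving layer: merge the batch and speed views at read time.
        return self.batch_view[key] + self.speed_view[key]


counts = LambdaCounts()
for status in ["200", "200", "404"]:
    counts.ingest(status)
counts.run_batch()      # batch view now holds the first three events
counts.ingest("200")    # arrives after the batch run, lives in the speed view
total_200 = counts.query("200")
```

Because the log is never modified, a bug in the batch logic can be fixed and the view rebuilt from scratch, which is the main reason the append-only principle is stressed.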

Back to our demo…

COLLECT: Amazon Kinesis Firehose (Logs)

STORE: S3

PROCESS: Amazon EMR with Spark & Hive

ANALYZE & VISUALIZE: Amazon Redshift and Amazon QuickSight

DIY

Download all steps: http://bit.ly/29fhcwu

Amazon Kinesis Firehose, Amazon EMR, Amazon S3, Amazon Redshift, Amazon QuickSight

http://aws.amazon.com/big-data/use-cases/