Big Data Building Blocks with AWS Cloud


description

This presentation explains why Cloud Computing is Big Data's best friend and how AWS Cloud components fit together to complete your Big Data life cycle.

Agenda:
- How big is Big Data, and how fast is it growing?
- How Cloud has the potential to become Big Data's best friend
- A tour of the Big Data life cycle
- How AWS Cloud components fit into this life cycle
- A case study of our log analytics tool, Cloudlytics, which uses a Big Data implementation on AWS Cloud

Transcript of Big Data Building Blocks with AWS Cloud

Page 1: Big Data Building Blocks with AWS Cloud

Big Data Building Blocks With AWS

Kinesis, Redshift, EMR, DynamoDB

2.5 quintillion bytes of data is generated every day!

How Do You Tackle These Big Data Challenges?

Page 2: Big Data Building Blocks with AWS Cloud

Agenda

1. Big Data is getting Bigger and Bigger!
2. Why is Cloud Big Data’s Best Friend?
3. Figuring Out the Big Data Life Cycle
4. How AWS Building Blocks can Help Tame Big Data!
5. How Cloudlytics Uses AWS Cloud for its Big Data

Cloud IT Better 2

Page 3: Big Data Building Blocks with AWS Cloud

So What is Big Data ?

Simply put, Big Data is data which cannot be processed by the current tools or technologies. Big Data is too Big, too Fast and too Varied.

Image: High-resolution images from NASA, our place in the cosmos!

Page 4: Big Data Building Blocks with AWS Cloud

The 3 V’s that make Big Data difficult to Tame!

Volume: 2.5 quintillion bytes of data is generated every day! Conventional databases process data in batches, and a single batch of Big Data could take days or weeks to process.

Variety: Data comes from social networks, sensors installed at store entrances, traffic lights, airplanes, car GPS units and countless other sources!

Velocity: Twitter generates 5 gigabytes of data per minute; Facebook generates 7 gigabytes of data per minute.
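To put the quoted rates on a common scale, here is a small back-of-the-envelope sketch. It assumes decimal (SI) units, which the slide does not specify, and the helper names are made up for illustration.

```python
# Back-of-the-envelope conversion of the data rates quoted on this slide.
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes and 1 TB = 10**12 bytes.

SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


def bytes_per_second(bytes_per_day: float) -> float:
    """Convert a daily volume in bytes to an average rate in bytes per second."""
    return bytes_per_day / SECONDS_PER_DAY


def gb_per_min_to_tb_per_day(gb_per_min: float) -> float:
    """Convert a rate in GB per minute to a daily volume in TB."""
    return gb_per_min * MINUTES_PER_DAY / 1000


daily_total_bytes = 2.5e18  # 2.5 quintillion bytes per day
print(f"Global average: {bytes_per_second(daily_total_bytes) / 1e12:.1f} TB per second")
print(f"Twitter:  {gb_per_min_to_tb_per_day(5):.2f} TB per day")
print(f"Facebook: {gb_per_min_to_tb_per_day(7):.2f} TB per day")
```

Under these assumptions, 2.5 quintillion bytes a day averages out to roughly 29 TB every second, while 5 GB per minute adds up to about 7.2 TB per day.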

Page 5: Big Data Building Blocks with AWS Cloud

Big Data is Getting Bigger and BIGGER!

“It is estimated that Walmart collects more than 2.5 petabytes of data EVERY HOUR from its customer transactions.”

“Zuckerberg noted that 1 billion pieces of content are shared via Facebook’s Open Graph DAILY!”

“More data crosses the internet EVERY SECOND than was stored in the entire internet just 20 years ago.”

Page 6: Big Data Building Blocks with AWS Cloud

Why is Cloud Big Data’s Best Friend ?

With Big Data, we know we want to Generate, Store, Analyze & Share.

But how does Cloud come into the picture?

Page 7: Big Data Building Blocks with AWS Cloud

Our IT Resources are Limited & Precious!

And Cloud has the Solution for this!

Page 8: Big Data Building Blocks with AWS Cloud

Cloud Has Many Advantages

• Elasticity
• On Demand
• No CapEx
• Scalable
• Pay Per Use
• Fast Time to Market
• Pooled Resources
• Remote Access
• Flexible
• Secure
• Cost Effective
• Resilient

Page 9: Big Data Building Blocks with AWS Cloud

Cloud Optimizes Your IT Resources

Cloud makes sure that your precious IT resources are OPTIMIZED.

Page 10: Big Data Building Blocks with AWS Cloud

Cloud makes it Easy!

Cloud Makes Big Data Easier To Handle

Image Courtesy: http://www.slideshare.net/AmazonWebServicesLATAM/big-data-on-aws?

Page 11: Big Data Building Blocks with AWS Cloud

Let us Figure out the Big Data Life Cycle

To make the entire Big Data process more tangible, it is divided into 4 stages:

1. Generation
2. Collection & Store
3. Analyze & Computation
4. Data Collaboration & Sharing

Page 12: Big Data Building Blocks with AWS Cloud

Generating the Data

• Structured Data: Employee Records
• Semi-Structured Data: End User Logs
• Unstructured Data: Social User Profile Images

• Data mining
• Log file analysis
• Machine learning
• Web indexing
• Financial analysis
• Scientific simulations
• Data warehousing
• Bioinformatics research

Web-based APIs can be used to access this data and store it.
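The difference between the three kinds of data named on this slide can be illustrated with a short sketch. The employee fields, the log line layout and the image bytes are all hypothetical examples, not formats taken from the presentation.

```python
import re
from dataclasses import dataclass


@dataclass
class Employee:
    """Structured data: every record has the same typed fields (hypothetical schema)."""
    employee_id: int
    name: str
    department: str


# Semi-structured data: a log line has recognizable fields, but no enforced schema.
# This pattern describes an assumed layout: ip [timestamp] "request" status
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3})'
)


def parse_log_line(line: str):
    """Return the fields of a log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Unstructured data: an image is just bytes until something interprets it.
profile_image = bytes([0x89, 0x50, 0x4E, 0x47])  # placeholder bytes, not a real image

record = Employee(employee_id=1, name="A. Example", department="IT")
entry = parse_log_line('10.0.0.1 [01/Jan/2014:00:00:01 +0000] "GET /index.html" 200')
print(record)
print(entry)
print(len(profile_image), "bytes of unstructured data")
```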

Page 13: Big Data Building Blocks with AWS Cloud

Transferring Your Data to AWS Cloud

To transfer your data sets onto the Cloud, you can use:

• AWS Direct Connect: Establish a dedicated network connection from your premises to AWS.
• AWS Storage Gateway: Secure integration between on-premises IT and AWS’s storage infrastructure.
• AWS Import/Export: Move large amounts of data into and out of AWS using portable storage devices for transport.

Page 14: Big Data Building Blocks with AWS Cloud

Collecting & Storing Data on AWS Cloud

• Simple Storage Service (S3): Write, read, and delete objects containing from 1 byte to 5 terabytes of data each.
• AWS Relational Database Service (RDS): A full-featured relational database service giving you access to the capabilities of the MySQL, Oracle, SQL Server, or PostgreSQL database engines.
• AWS DynamoDB: A fast, fully managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic.

Page 15: Big Data Building Blocks with AWS Cloud

Data Analysis on AWS Cloud

http://dorkutopia.com/wp-content/uploads/2013/06/

Once you’ve stored your content on Cloud, it is time to analyze it!

So if you’re thinking of implementing a Hadoop infrastructure ……

Page 16: Big Data Building Blocks with AWS Cloud

Data Analysis on AWS Cloud

Setting Up a Hadoop Infrastructure is not that Easy, But AWS Has the Answer !

Image courtesy: http://globalgeeknews.com/wp-content/uploads/

Page 17: Big Data Building Blocks with AWS Cloud

Data Analysis on AWS Cloud

Amazon Elastic Map Reduce (EMR)

• A managed Hadoop distribution from Amazon Web Services, built on a customized Apache Hadoop framework.
• Uses MapReduce, in which data processing tasks are mapped to a set of servers in a cluster for processing.
• EMR integrates with AWS S3 (an alternative storage to HDFS) and EC2 (compute instances).
• EMR allows you to tune the default Hadoop job flows to your custom needs.
• The various how-tos of Hadoop architecture, such as adding, removing and configuring nodes, are taken care of by EMR.
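The MapReduce idea behind EMR can be sketched in a few lines of plain Python. This is only a single-process illustration of the map, shuffle and reduce phases, not EMR or Hadoop code; the log format ("path status") and all function names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (status_code, 1) for one log line; malformed lines emit nothing."""
    fields = line.split()
    if len(fields) >= 2:
        yield fields[1], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over an iterable of log lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


sample_logs = ["/index.html 200", "/missing 404", "/index.html 200", "garbage"]
print(run_job(sample_logs))
```

On a real cluster the map and reduce calls would run on different servers, with the shuffle moving data between them; here everything runs in one process so the data flow is easy to follow.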

Page 18: Big Data Building Blocks with AWS Cloud

AWS Redshift for Retrieval & Collaboration

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools.

• Amazon Redshift has a massively parallel processing (MPP) architecture that parallelizes and distributes SQL operations.
• You can use AWS Redshift to store and retrieve processed data quickly, to generate custom reports.
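The MPP idea of parallelizing and distributing an operation can be sketched as follows. This is a toy illustration with threads in one process, not how Redshift is implemented; the round-robin distribution, the "slice" terminology as used here, and the sample data are assumptions for the example. It mimics a query of the form SELECT region, SUM(amount) ... GROUP BY region.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, n_slices):
    """Spread rows across slices round-robin (one possible distribution style)."""
    slices = [[] for _ in range(n_slices)]
    for index, row in enumerate(rows):
        slices[index % n_slices].append(row)
    return slices


def partial_sum(rows):
    """Each slice aggregates only its own rows."""
    totals = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals


def parallel_group_sum(rows, n_slices=4):
    """Aggregate every slice in parallel, then merge the partial results."""
    slices = distribute(rows, n_slices)
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        partials = list(pool.map(partial_sum, slices))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the counts together
    return dict(merged)


sales = [("east", 10), ("west", 5), ("east", 7), ("north", 1), ("west", 3)]
print(parallel_group_sum(sales))
```

The point of the sketch is the shape of the work: each slice sees only part of the data, and a final step combines the partial answers.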

Page 19: Big Data Building Blocks with AWS Cloud

AWS Data Pipelines for Automation

Diagram labels: Input Node, Activity, Output Node

AWS Data Pipeline allows users to define a dependent chain of data sources and destinations, with an option to create data processing activities. This chain is called a pipeline.

• Can be implemented across all stages of the Big Data life cycle.
• Tasks are scheduled to perform data movement and processing activities.
• Failure and retry options are also available in Data Pipeline workflows.
• Input and output data nodes support S3 Bucket, DynamoDB, MySQL DB & SQL Data Source.
• Activities currently supported are Copy, EMR, Hive & Shell Activity.
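The notion of an input node, a chain of activities and an output node, with retries on failure, can be sketched in plain Python. This is a conceptual illustration only and does not use the AWS Data Pipeline API; every name below is hypothetical.

```python
def run_with_retry(activity, data, max_attempts=3):
    """Run one activity, retrying on any exception up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return activity(data)
        except Exception as error:  # a real pipeline would retry selectively
            last_error = error
    raise RuntimeError(f"activity failed after {max_attempts} attempts") from last_error


def run_pipeline(input_node, activities, output_node, max_attempts=3):
    """Read from the input node, apply each activity in order, write to the output node."""
    data = input_node()
    for activity in activities:
        data = run_with_retry(activity, data, max_attempts)
    output_node(data)
    return data


# Example: a copy step followed by a processing step that fails once, then succeeds.
attempts = {"count": 0}


def flaky_upper(records):
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise IOError("transient failure")
    return [record.upper() for record in records]


sink = []
result = run_pipeline(lambda: ["a", "b"], [list, flaky_upper], sink.extend)
print(result, sink, attempts["count"])
```

Each activity depends on the output of the one before it, which is the "dependent chain" the slide describes; scheduling is left out to keep the sketch short.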

Page 20: Big Data Building Blocks with AWS Cloud

AWS Kinesis (NEW)

Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. Amazon Kinesis can collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources.

• Real-time processing allows you to answer questions about the current state of your data.
• Amazon Kinesis automatically provisions and manages the storage required to reliably and durably collect your data stream.
• You can add as many Kinesis streams as desired, based on the volume and variety of data.
• Your Kinesis streams are connected to your Kinesis app, from which you can use DynamoDB or Redshift to process complex queries in real time.

Image courtesy: https://static.gosquared.com/images/liquidicity/kinesis/
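Answering questions about the current state of a stream usually means keeping a rolling view of recent records. The sketch below shows one way to do that with a sliding time window. It is plain Python over in-memory records standing in for a stream, does not call the Kinesis API, and assumes timestamps arrive in non-decreasing order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count events per key over the most recent window_seconds of a stream."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()    # (timestamp, key) in arrival order
        self._counts = Counter()  # live counts for events still inside the window

    def add(self, timestamp, key):
        """Record one event and drop everything that has fallen out of the window."""
        self._events.append((timestamp, key))
        self._counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_key = self._events.popleft()
            self._counts[old_key] -= 1
            if self._counts[old_key] == 0:
                del self._counts[old_key]

    def current_counts(self):
        """Return the counts for the current window as a plain dict."""
        return dict(self._counts)


# Hypothetical stream of (timestamp in seconds, requested path) records.
stream = [(0, "/a"), (30, "/b"), (61, "/a"), (95, "/a")]
window = SlidingWindowCounter(window_seconds=60)
for timestamp, path in stream:
    window.add(timestamp, path)
    print(timestamp, window.current_counts())
```

Because old events are evicted as new ones arrive, memory stays bounded by the number of events in one window rather than by the total size of the stream.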

Page 21: Big Data Building Blocks with AWS Cloud

The Big Data Life cycle - Compiled

Stages: Generation, Collection & Store, Analyze & Computation, Data Collaboration & Sharing

AWS components shown across the stages:

• AWS S3, AWS RDS, AWS DynamoDB
• AWS EMR
• AWS S3, AWS RDS, AWS DynamoDB, AWS Redshift
• AWS Data Pipeline (shown at several stages)

Page 22: Big Data Building Blocks with AWS Cloud

Use Case - Cloudlytics

Cloudlytics is a Pay-as-you-Go, SaaS based Log Analytics Tool powered by AWS. It Takes the Big Data Approach using AWS Components such as EMR & Redshift.

Diagram labels: Customer Log Files Stored in S3, Processing, Processed Data, Customer Reports
