Big Data Building Blocks with AWS Cloud
Uploaded by: blazeclan-technologies-private-limited
Big Data Building Blocks With AWS
Kinesis, Redshift, EMR, DynamoDB
2.5 quintillion bytes of data are generated every day!
How do you tackle these Big Data challenges?
Agenda
1. Big Data is getting Bigger and Bigger!
2. Why is Cloud Big Data’s Best Friend?
3. Figuring Out the Big Data Life Cycle
4. How AWS Building Blocks can Help Tame Big Data!
5. How Cloudlytics Uses AWS Cloud for its Big Data
Cloud IT Better 2
So What is Big Data?
Simply put, Big Data is data that cannot be processed with current tools or technologies. Big Data is too big, too fast and too varied.
Image: high-resolution images from NASA, our place in the cosmos!
The 3 V’s that make Big Data difficult to tame!
Volume
2.5 quintillion bytes of data are generated every day!
Conventional databases process data in batches; it could take days or weeks to process one batch of Big Data.
Variety
Data comes from social networks, sensors installed at store entrances, traffic lights, airplanes, car GPS units and countless other sources!
Velocity
Twitter generates 5 gigabytes of data per minute; Facebook generates 7 gigabytes of data per minute.
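To put the figures quoted above on a common scale, the arithmetic can be sketched as follows. This is only a rough check that takes the slide's numbers at face value and assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes); the helper name `per_second` is illustrative.

```python
# Rough scale check for the quoted figures (decimal units are an assumption).
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

GB = 10**9
TB = 10**12


def per_second(bytes_per_day: float) -> float:
    """Convert a daily volume into an average per-second rate."""
    return bytes_per_day / SECONDS_PER_DAY


daily_bytes = 2.5 * 10**18                    # "2.5 quintillion bytes" per day
twitter_per_day = 5 * GB * MINUTES_PER_DAY    # 5 GB/min figure from the slide
facebook_per_day = 7 * GB * MINUTES_PER_DAY   # 7 GB/min figure from the slide

print(f"Global average: {per_second(daily_bytes) / TB:.1f} TB per second")
print(f"Twitter:  {twitter_per_day / TB:.2f} TB per day")
print(f"Facebook: {facebook_per_day / TB:.2f} TB per day")
```

At these rates, a single day of data from either network already runs to several terabytes, which is the batch-processing problem described under Volume.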
Big Data is Getting Bigger and BIGGER!
“It is estimated that Walmart collects more than 2.5 petabytes of data EVERY HOUR from its customer transactions.”
“Zuckerberg noted that 1 billion pieces of content are shared via Facebook’s Open Graph DAILY!”
“More data crosses the internet EVERY SECOND than was stored in the entire internet just 20 years ago.”
Why is Cloud Big Data’s Best Friend?
With Big Data, we know we want to generate, store, analyze and share.
But how does Cloud come into the picture?
Our IT resources are limited and precious!
And Cloud has the solution for this!
Cloud Has Many Advantages
Elasticity
On Demand
No CapEx
Scalable
Pay Per Use
Fast Time to Market
Pooled Resources
Remote Access
Flexible
Secure
Cost Effective
Resilient
Cloud Optimizes Your IT Resources
Cloud makes sure that your precious IT resources are OPTIMIZED.
Cloud Makes Big Data Easier To Handle
Cloud makes it easy!
Image Courtesy: http://www.slideshare.net/AmazonWebServicesLATAM/big-data-on-aws?
Let us Figure out the Big Data Life Cycle
To make the entire Big Data process more tangible, it is divided into 4 stages:
1. Generation
2. Collection & Store
3. Analyze & Computation
4. Data Collaboration & Sharing
Generating the Data
Structured Data – Employee Records
Semi Structured Data – End User Logs
Unstructured Data – Social User Profile images
Data mining
Log file analysis
Machine learning
Web indexing
Financial analysis
Scientific simulations
Data warehousing
Bioinformatics research
Web based APIs can be used to access this data and store it.
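As an illustration of what "semi structured" means for end user logs, here is a minimal sketch that turns one web server log line into a structured record. The log layout (a Common Log Format style line), the sample line, and the names `LOG_PATTERN` and `parse_log_line` are assumptions made for this example; real customer logs may look different.

```python
import re
from typing import Optional

# Assumed layout: ip, two ignored fields, [timestamp], "request", status, size.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" size means no body was sent; treat it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample line, not taken from any real log.
sample = '203.0.113.7 - - [10/Oct/2013:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
record = parse_log_line(sample)
print(record["ip"], record["path"], record["status"])
```

Once lines are parsed like this, they can be stored and analyzed in the same way as structured records.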
Transferring Your Data to AWS Cloud
To transfer your data sets to the cloud, you can use:
• AWS Direct Connect: Establish a dedicated network connection from your premises to AWS.
• AWS Storage Gateway: Secure integration between on-premises IT and AWS’s storage infrastructure.
• AWS Import/Export: Move large amounts of data into and out of AWS using portable storage devices for transport.
Collecting & Storing Data on AWS Cloud
• Simple Storage Service (S3): Write, read, and delete objects containing from 1 byte to 5 terabytes of data each.
• AWS Relational Database Service (RDS): A full-featured relational database service giving you access to the capabilities of the MySQL, Oracle, SQL Server, or PostgreSQL database engines.
• AWS DynamoDB: A fast, fully managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic.
Data Analysis on AWS Cloud
Image courtesy: http://dorkutopia.com/wp-content/uploads/2013/06/
Once you’ve stored your content on the cloud, it is time to analyze it!
So if you’re thinking of implementing a Hadoop infrastructure ……
Data Analysis on AWS Cloud
Setting up a Hadoop infrastructure is not that easy, but AWS has the answer!
Image courtesy: http://globalgeeknews.com/wp-content/uploads/
Data Analysis on AWS Cloud
Amazon Elastic MapReduce (EMR)
• A managed Hadoop distribution by Amazon Web Services, using a customized Apache Hadoop framework.
• Uses MapReduce, in which data processing tasks are mapped to a set of servers in a cluster for processing.
• EMR integrates with AWS S3 (an alternative storage to HDFS) and EC2 (compute instances).
• EMR allows you to tune the default Hadoop job flows to your custom needs.
• The various how-tos of Hadoop architecture, such as adding, removing and configuring nodes, are taken care of by EMR.
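The MapReduce model mentioned above can be illustrated with a small word count. This is a single-process sketch of the map, shuffle and reduce steps only; it does not use Hadoop or EMR, where the same steps would run across the servers of a cluster. The function names and the sample input are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence within one process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


lines = ["big data on aws", "big data life cycle"]
counts = run_job(lines)
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread over many nodes, which is the part a managed cluster handles.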
AWS Redshift for Retrieval & Collaboration
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools.
• Amazon Redshift has a massively parallel processing (MPP) architecture, parallelizing and distributing SQL operations.
• You can use AWS Redshift to store and retrieve processed data quickly and to generate custom reports.
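The idea of parallelizing and distributing a SQL operation can be sketched in plain Python. The example below mimics a query such as `SELECT region, SUM(amount) FROM sales GROUP BY region` by spreading rows over several slices, aggregating each slice separately, and combining the partial results. It is a conceptual sketch only and makes no claim about how Redshift works internally; the table, column names and helper functions are all hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def distribute(rows: List[dict], num_slices: int) -> List[List[dict]]:
    """Assign each row to a slice by hashing its (assumed) distribution key."""
    slices: List[List[dict]] = [[] for _ in range(num_slices)]
    for row in rows:
        index = zlib.crc32(row["region"].encode("utf-8")) % num_slices
        slices[index].append(row)
    return slices


def partial_sum(slice_rows: List[dict]) -> Counter:
    """Aggregate one slice on its own, as a single worker would."""
    totals: Counter = Counter()
    for row in slice_rows:
        totals[row["region"]] += row["amount"]
    return totals


def parallel_sum_by_region(rows: List[dict], num_slices: int = 4) -> Dict[str, int]:
    """Run the per-slice aggregation in parallel and merge the partial totals."""
    slices = distribute(rows, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        partials = list(pool.map(partial_sum, slices))
    combined: Counter = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds the counts together
    return dict(combined)


# Hypothetical sample rows.
sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
    {"region": "north", "amount": 3},
]
totals = parallel_sum_by_region(sales)
print(totals)
```

The final result does not depend on how many slices are used, which is what lets the work be spread over more workers as data grows.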
AWS Data Pipeline for Automation
Diagram: Input Node → Activity → Output Node
AWS Data Pipeline lets users define a dependent chain of data sources and destinations, with optional data processing activities; this chain is called a pipeline.
• Can be implemented across all stages of the Big Data life cycle.
• Tasks are scheduled to perform data movement and processing activities.
• Failure and retry options are available in Data Pipeline workflows.
• Input and output data nodes support S3 Bucket, DynamoDB, MySQL DB & SQL Data Source.
• Activities currently supported are Copy, EMR, Hive & Shell Activity.
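The notion of a dependent chain of activities with retries can be sketched without any AWS service. The example below orders activities by their dependencies and retries a failing one a fixed number of times. It is not the AWS Data Pipeline API or its definition format; `run_pipeline`, the activity names and the simulated failure are assumptions for illustration, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set


def run_pipeline(
    activities: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
    max_retries: int = 2,
) -> List[str]:
    """Run activities in dependency order, retrying each one on failure."""
    order = list(TopologicalSorter(depends_on).static_order())
    completed: List[str] = []
    for name in order:
        for attempt in range(1, max_retries + 2):  # first try plus retries
            try:
                activities[name]()
                completed.append(name)
                break
            except Exception:
                if attempt == max_retries + 1:
                    raise  # retries exhausted, stop the pipeline
    return completed


store: Dict[str, object] = {}
attempts = {"process": 0}


def copy_input() -> None:
    store["raw"] = [3, 1, 2]            # stands in for an input data node


def process() -> None:
    attempts["process"] += 1
    if attempts["process"] == 1:
        raise RuntimeError("simulated transient failure")
    store["sorted"] = sorted(store["raw"])


def export_output() -> None:
    store["report"] = ",".join(str(x) for x in store["sorted"])  # output node


activities = {"copy_input": copy_input, "process": process, "export_output": export_output}
depends_on = {"process": {"copy_input"}, "export_output": {"process"}}
completed = run_pipeline(activities, depends_on)
print(completed, store["report"])
```

Here the `process` activity fails once and succeeds on its retry, so the chain still completes in order.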
AWS Kinesis (NEW)
Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. Amazon Kinesis can collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources.
• Real-time processing lets you answer questions about the current state of your data.
• Amazon Kinesis automatically provisions and manages the storage required to reliably and durably collect your data stream.
• You can add as many Kinesis streams as desired, based on the volume and variety of data.
• Your Kinesis streams are connected to your Kinesis app, from which you can use DynamoDB or Redshift to process complex queries in real time.
Image courtesy: https://static.gosquared.com/images/liquidicity/kinesis/
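To make "answering questions about the current state of your data" concrete, here is a minimal sketch of a consumer that keeps per-key counts over a sliding time window. It does not call the Kinesis API; a plain list of events stands in for the stream, events are assumed to arrive in timestamp order, and `SlidingWindowCounter` is a hypothetical name.

```python
from collections import Counter, deque
from typing import Deque, Dict, Tuple


class SlidingWindowCounter:
    """Count events per key over the most recent `window_seconds`."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        self.events: Deque[Tuple[int, str]] = deque()  # (timestamp, key), oldest first
        self.counts: Counter = Counter()

    def add(self, timestamp: int, key: str) -> None:
        """Record one event, then drop anything that fell out of the window."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now: int) -> None:
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def snapshot(self) -> Dict[str, int]:
        """Current counts for the window ending at the latest event."""
        return dict(self.counts)


# Hypothetical page-view events: (seconds since start, page).
stream = [(0, "/home"), (10, "/cart"), (30, "/home"), (65, "/home")]
window = SlidingWindowCounter(window_seconds=60)
for timestamp, page in stream:
    window.add(timestamp, page)
print(window.snapshot())
```

By the time the last event arrives, the first one has aged out of the 60-second window, so the snapshot reflects only recent activity.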
The Big Data Life Cycle - Compiled
Stages: Generation → Collection & Store → Analyze & Computation → Data Collaboration & Sharing
Components by stage:
Collection & Store: AWS S3, AWS RDS, AWS DynamoDB, AWS Data Pipeline
Analyze & Computation: AWS EMR, AWS Data Pipeline
Data Collaboration & Sharing: AWS S3, AWS RDS, AWS DynamoDB, AWS Redshift, AWS Data Pipeline
Use Case - Cloudlytics
Cloudlytics is a pay-as-you-go, SaaS-based log analytics tool powered by AWS. It takes the Big Data approach, using AWS components such as EMR and Redshift.
Customer log files stored in S3 → Processing → Processed data → Customer reports
Check out our Past Webinars
Thank you
www.blazeclan.com
Follow Us On :
Our Blog : http://blog.blazeclan.com/
Contact us : [email protected]