AWS Big Data Analytics IP Expo 2013
Transcript of AWS Big Data Analytics IP Expo 2013
Big Data Analytics
David de Santiago
Business Development Manager, Analytics EMEA
Overview
1. Introducing Big Data
2. From data to actionable information
3. Analytics and Cloud Computing
1. Introducing Big Data
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
The cost of data generation is falling
Generation, Collection & storage: lower cost, higher throughput
Analytics & computation, Collaboration & sharing: highly constrained
Data volume: generated data vs. data available for analysis
Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Elastic and highly scalable
+ No upfront capital expense
+ Only pay for what you use
+ Available on-demand
= Remove constraints
Accelerated
Big Data: technologies and techniques for working productively with data, at any scale.
2. From data to actionable information
Per day:
3.5 billion records
13 TB of click stream logs
71 million unique cookies
User recently bought a home theatre system and is now looking at sports games
Targeted ad
Results:
500% return on ad spend
17,000% reduction in procurement time
“We couldn’t have done it”
Identified early mobile usage
Invested heavily in mobile development
Finding signal in the noise of logs
In January 2013, 9,432,061 unique mobile devices used the Yelp mobile app.
Other features powered by EMR:
People Who Viewed this Also Viewed
Review highlights
Auto complete as you type on search
Search spelling suggestions
Top searches
Ads
Open web index.
3.4 billion records.
Available to all.
You Are What You Tweet: Analyzing Twitter for Public Health. M. J. Paul and M. Dredze, 2011
Tweeting about Flu
Full parse for impact of
social networks
300 lines of Ruby code.
14 hours.
$100.
3. Analytics and Cloud Computing
Generation
Collection & storage: S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase
Analytics & computation: EC2 & Elastic MapReduce
Collaboration & sharing: EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift
Amazon Redshift
Fully Managed Data Warehouse
Scales to 1.6PB
Faster, Simpler, Cheaper
Amazon Redshift pricing
                      Effective hourly price per TB    Effective annual price per TB
On-Demand             $0.425                           $3,723
1 Year Reservation    $0.250                           $2,190
3 Year Reservation    $0.114                           $999
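The effective annual prices above appear to follow directly from the hourly ones. A minimal sketch, assuming a year of 8,760 hours (365 × 24); the slide does not state this, but it reproduces the listed figures to the nearest dollar. The function and constant names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 8,760 hours, not stated on the slide

# Effective hourly price per TB in USD, as listed on the slide.
HOURLY_PRICE_PER_TB = {
    "On-Demand": 0.425,
    "1 Year Reservation": 0.250,
    "3 Year Reservation": 0.114,
}


def annual_price_per_tb(hourly_price):
    """Convert an hourly price per TB to an annual price in whole dollars."""
    return round(hourly_price * HOURS_PER_YEAR)


for plan, hourly in HOURLY_PRICE_PER_TB.items():
    print(f"{plan}: ${annual_price_per_tb(hourly):,} per TB per year")
```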
“Two months to migrate to Amazon Redshift.”
Greg Johnson, Head of Analytics, Nokia
“Towards the end of last year our data volumes literally broke the existing database. We were no longer able to scale the database or do anything useful, like running queries.”
Elastic MapReduce: How does it work?
1. Put the data into S3 (or HDFS)
2. Launch your cluster. Choose:
• Hadoop distribution
• How many nodes
• Node type (hi-CPU, hi-memory, etc.)
• Hadoop apps (Hive, Pig, HBase)
3. Get the results
You can easily resize the cluster
Use Spot nodes to save time and money
Launch parallel clusters against the same data source (tune for the workload)
When the work is complete, you can terminate the cluster (and stop paying)
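The launch choices above (distribution, node count, node type, apps, Spot nodes) can be sketched as a cluster request. This is a rough illustration, not the method shown in the talk: the dictionary is shaped after the keyword arguments of boto3's `run_job_flow`, while the helper name, cluster name, release label and instance type are assumptions chosen for the example. No call to AWS is made, and job steps are left out.

```python
def build_cluster_request(name, node_count, node_type, apps,
                          release_label, use_spot=False):
    """Build a request dict describing an EMR cluster (hypothetical helper)."""
    if node_count < 2:
        raise ValueError("need one master node plus at least one core node")
    master_group = {
        "Name": "master",
        "InstanceRole": "MASTER",
        "InstanceType": node_type,
        "InstanceCount": 1,
        "Market": "ON_DEMAND",
    }
    core_group = {
        "Name": "core",
        "InstanceRole": "CORE",
        "InstanceType": node_type,
        "InstanceCount": node_count - 1,
        # Spot capacity is cheaper but can be reclaimed by AWS.
        "Market": "SPOT" if use_spot else "ON_DEMAND",
    }
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # selects the Hadoop distribution
        "Applications": [{"Name": app} for app in apps],
        "Instances": {
            "InstanceGroups": [master_group, core_group],
            # False: the cluster shuts down once its steps are done,
            # so billing stops when the work is complete.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# All argument values below are assumptions for illustration.
request = build_cluster_request(
    "log-analysis", 10, "m5.xlarge", ["Hive", "Pig", "HBase"],
    release_label="emr-5.36.0", use_spot=True,
)
print(request["Instances"]["InstanceGroups"][1])
```

With boto3 installed and credentials configured, a dict like this could be passed as `boto3.client("emr").run_job_flow(**request)`, with the input and output data living in S3.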
Thousands of Customers, 5+ Million Clusters
Give it a try.
Cost to run a 100-node EMR cluster:
£4.90 / hour
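As a back-of-the-envelope check on that figure, here is a small sketch for estimating the cost of a job. It assumes the price scales linearly with node count and running time, which the slide does not state; billing granularity and Spot discounts are not modelled, and the function name is illustrative.

```python
CLUSTER_NODES = 100           # from the slide
CLUSTER_RATE_PER_HOUR = 4.90  # GBP per hour for the 100-node cluster, from the slide


def estimate_cost(nodes, hours):
    """Estimate the cost in GBP, assuming linear scaling per node-hour."""
    per_node_hour = CLUSTER_RATE_PER_HOUR / CLUSTER_NODES
    return round(nodes * hours * per_node_hour, 2)


print(estimate_cost(100, 1))   # the slide's own case: one hour on 100 nodes
print(estimate_cost(100, 14))  # a 14-hour run on the same cluster
```

Because the cluster can be terminated when the work is complete, the hours term is the job's running time rather than a standing commitment.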
AWS Data Pipeline
Data-intensive orchestration and automation
Reliable and scheduled
Easy to use, drag and drop
Execution and retry logic
Map data dependencies
Create and manage temporary compute
resources
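To make "map data dependencies" and "execution and retry logic" concrete, here is a minimal sketch in plain Python. It is not the AWS Data Pipeline API: the function, activity names and retry count are hypothetical, and it only illustrates running activities in dependency order and retrying a failed one. It relies on `graphlib` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def run_pipeline(activities, dependencies, max_attempts=3):
    """Run activities in dependency order, retrying each one on failure.

    activities:   dict mapping activity name -> zero-argument callable
    dependencies: dict mapping activity name -> set of names it depends on
    Returns the names of the activities in the order they completed.
    """
    order = TopologicalSorter(dependencies).static_order()
    completed = []
    for name in order:
        for attempt in range(1, max_attempts + 1):
            try:
                activities[name]()
                completed.append(name)
                break
            except Exception:
                # Give up and surface the error after the last attempt.
                if attempt == max_attempts:
                    raise
    return completed


# Hypothetical three-stage pipeline: copy logs, transform them, load the results.
steps = {
    "copy": lambda: print("copying logs"),
    "transform": lambda: print("transforming"),
    "load": lambda: print("loading results"),
}
deps = {"transform": {"copy"}, "load": {"transform"}}
print(run_pipeline(steps, deps))
```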
Anatomy of a pipeline
Arbitrarily complex pipelines
Thanks. [email protected]
To Learn More:
aws.amazon.com/elasticmapreduce
aws.amazon.com/datapipeline
aws.amazon.com/big-data
aws.amazon.com/redshift
aws.amazon.com/rds
Thank you!