Big Data & The Cloud
Transcript of Big Data & The Cloud
Amazon Web Services
Big Data and the Cloud: A Best Friend Story
Joe Ziegler
Technical [email protected] @jiyosub
Big Data on the Cloud in the Real World
How the Cloud Is Big Data’s Best Friend
Characteristics of Big Data
Characteristics of Big Data
BIG DATA
When your data sets become so large that you have to start
innovating how you collect, store, organize, analyze and share them
Bigger Data is Better Data
Features driven by MapReduce
Bigger Data is Harder Data
Big Data is Getting Bigger
2.7 zettabytes in 2012
Over 90% will be unstructured
Data spread across a wide array of silos
Why is Big Data Hard (and Getting Harder)?
Changing Data Requirements
• Faster response time on fresher data
• Sampling is not good enough & history is important
• Increasing complexity of analytics
• Users demand inexpensive experimentation
Where is it Coming From?
Computer Generated
• Application server logs (web sites, games)
• Sensor data (weather, water, smart grids)
• Images/videos (traffic, security cameras)
Human Generated
• Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year
• Blogs/Reviews/Emails/Pictures
• Social Graphs: Facebook, LinkedIn, Contacts
The Role of Data is Changing
Until now, the questions you asked drove the data model
The new model is to collect as much data as possible: the “Data-First Philosophy”
Data is the new raw material for any business,
on par with capital, people, labor
We Need Tools Built Specifically for Big Data
Hadoop
• Scale out Easily
• Parallel Computing
• Commodity Hardware
• Solves some Problems
• Complex to Run
• Special Skills to Maintain
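As a rough illustration of the map/reduce style of parallel computing that Hadoop is built around, here is a minimal word-count sketch in plain Python. It does not use Hadoop itself; the function names and the sample log lines are made up for the example, and a real cluster would run the map and reduce steps on many machines rather than in one process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit one (token, 1) pair per whitespace-separated token in the line.
    return [(token.lower(), 1) for token in line.split()]


def shuffle(pairs):
    # Group all emitted values by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine the values for one key into a single count.
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical application server log lines, for illustration only.
sample_logs = ["GET /home 200", "GET /cart 200", "POST /cart 500"]
counts = word_count(sample_logs)
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across commodity machines, which is where the "scale out" property comes from.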
How the Cloud Is Big Data’s Best Friend
How do we define the cloud?
By Benefits!
Cloud
• Elasticity
• Fast Time to Market
• Focus on core competency
• Pay Per Use
• No Cap Ex
Why is the Cloud Big Data’s Best Friend?
We know we want to collect, store, organize, analyze and share it.
But we have limited resources.
The Cloud Optimizes Precious IT Resources
i.e. Skilled People
“Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x,
while the pool of IT staff available to manage them will grow only slightly, at 1.5x.”
- 2011 IDC Digital Universe Study
Deploying a Hadoop cluster is hard
[Figure: two charts with a 30%/70% split comparing “The Old IT World” with “Cloud-Based Infrastructure” (cloud computing). Labels: “Using Big Data” and “Managing All of the ‘Undifferentiated Heavy Lifting’” for the old IT world; “Analyzing and Using Big Data” and “Configuring Cloud Assets” for the cloud.]
Reusability
Managed Services
Scale
Innovation
The Cloud Optimizes Capacity Resources
Elastic Compute Capacity
[Figure: four usage patterns: On and Off, Fast Growth, Variable peaks, Predictable peaks. Chart of Capacity over Time comparing Traditional IT capacity and Elastic cloud capacity against Your IT needs, with the gaps labeled WASTE and CUSTOMER DISSATISFACTION.]
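The WASTE and CUSTOMER DISSATISFACTION labels in the capacity chart can be made concrete with a little arithmetic. The sketch below is only an illustration: the demand figures and the fixed capacity of 80 units are invented, and "elastic" is idealized as capacity that tracks demand exactly.

```python
def capacity_gaps(demand, capacity):
    """Return (waste, shortfall) summed over all periods.

    waste     = provisioned capacity that nobody used
    shortfall = demand that could not be served
    """
    waste = sum(max(c - d, 0) for d, c in zip(demand, capacity))
    shortfall = sum(max(d - c, 0) for d, c in zip(demand, capacity))
    return waste, shortfall


# Hypothetical demand with variable peaks, in arbitrary capacity units.
demand = [20, 35, 90, 40, 25, 120, 30]

# Traditional IT: one fixed capacity level chosen up front (assumed value).
fixed = [80] * len(demand)

# Idealized elastic capacity: provision exactly what each period needs.
elastic = list(demand)

fixed_waste, fixed_shortfall = capacity_gaps(demand, fixed)
elastic_waste, elastic_shortfall = capacity_gaps(demand, elastic)
```

With these made-up numbers the fixed plan both over-provisions in quiet periods and under-serves the two peaks, while the idealized elastic plan does neither.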
The Cloud Empowers Users to Balance Cost and Time
1 instance for 500 hours = 500 instances for 1 hour
I like this!
I scale
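The equality on this slide is about cost, not time. A minimal sketch of the arithmetic, assuming the job splits perfectly across instances and is billed per instance-hour; the price of 0.10 per instance-hour is a placeholder, not a real AWS rate.

```python
def job_cost_and_time(instances, total_instance_hours, price_per_instance_hour):
    """Return (cost, wall_clock_hours) for a perfectly parallel job."""
    # Each instance takes an equal share of the work (idealized assumption).
    wall_clock_hours = total_instance_hours / instances
    cost = instances * wall_clock_hours * price_per_instance_hour
    return cost, wall_clock_hours


PRICE = 0.10  # hypothetical price per instance-hour

slow_cost, slow_hours = job_cost_and_time(1, 500, PRICE)
fast_cost, fast_hours = job_cost_and_time(500, 500, PRICE)
```

Under these assumptions both runs cost the same, but the second finishes in 1 hour instead of 500, so the user chooses the point on the cost/time trade-off rather than the hardware budget choosing it.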
The Cloud Reduces Cost for Experimentation
The Cloud Enables Collection and Storage of Big Data
Simple Storage Service
[Figure: growth chart by quarter, Q4 2006 through Q2 2012, vertical axis ticks from 0 to 1000, labeled 1 Trillion]
750k+ peak transactions per second
Global Accessibility
Regions
US-WEST (N. California)
EU-WEST (Ireland)
ASIA PAC (Tokyo)
ASIA PAC (Singapore)
US-WEST (Oregon)
SOUTH AMERICA (Sao Paulo)
US-EAST (Virginia)
GOV CLOUD
Storage Costs are Declining
Big Data on the Cloud in the Real World
Big Data Verticals
Media/Advertising
Targeted Advertising
Image and Video Processing
Oil & Gas
Seismic Analysis
Retail
Recommend
Transactions Analysis
Life Sciences
Genome Analysis
Financial Services
Monte Carlo Simulations
Risk Analysis
Security
Anti-virus
Fraud Detection
Image Recognition
Social Network/Gaming
User Demographics
Usage analysis
In-game metrics
Visualizations
Bank – Monte Carlo Simulations
“The AWS platform was a good fit for its
unlimited and flexible computational power to our risk-simulation process requirements.
With AWS, we now have the power to decide how fast we want to obtain simulation results,
and, more importantly, we have the ability to run simulations not possible before due to the
large amount of infrastructure required.”
– Castillo, Director, Bankinter
23 Hours to 20 Minutes
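Monte Carlo simulations suit this kind of speedup because the trials are independent and can be split into batches. The sketch below is a generic illustration, not Bankinter's model: the normal return distribution, its parameters, the batch count and the value-at-risk helper are all assumptions. Batches run one after another here; in the cloud each batch could go to its own instance.

```python
import random


def simulate_batch(seed, trials, mean_return=0.0, volatility=0.02):
    # Each batch gets its own seeded generator, so batches are independent
    # and reproducible no matter which machine runs them.
    rng = random.Random(seed)
    return [rng.gauss(mean_return, volatility) for _ in range(trials)]


def value_at_risk(returns, confidence=0.95):
    # Loss threshold exceeded in roughly (1 - confidence) of simulated trials.
    ordered = sorted(returns)
    index = int((1 - confidence) * len(ordered))
    return -ordered[index]


# Eight hypothetical batches of 10,000 trials each.
batches = [simulate_batch(seed, 10_000) for seed in range(8)]
all_returns = [r for batch in batches for r in batch]
var_95 = value_at_risk(all_returns)

# Speedup implied by the slide: 23 hours down to 20 minutes.
speedup = (23 * 60) / 20
```

The last line works out to a 69x reduction in elapsed time, which is the kind of gain to expect when independent batches run side by side instead of in sequence.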
etsy.com/gifts
Recommendations
Gift Ideas for Facebook Friends
Targeted Ad
User recently purchased a sports movie and is searching for video games
(1.7 Million per day)
Click Stream Analysis
Big Data on the Cloud in the Real World
How the Cloud Is Big Data’s Best Friend
Characteristics of Big Data
Questions?
Joe Ziegler
Technical [email protected] @jiyosub