Big Data & The Cloud

Amazon Web Services Big Data and the Cloud: A Best Friend Story

Joe Ziegler's presentation at the 5th Elephant conference in Bangalore.

Transcript of Big Data & The Cloud

Page 1: Big Data & The Cloud

Amazon Web Services

Big Data and the Cloud: A Best Friend Story

Page 2: Big Data & The Cloud

Joe Ziegler

Technical [email protected] @jiyosub

Page 3: Big Data & The Cloud

Big Data on the Cloud in the Real World

How the Cloud Is Big Data’s Best Friend

Characteristics of Big Data

Page 4: Big Data & The Cloud

Characteristics of Big Data

Page 5: Big Data & The Cloud

BIG DATA

When your data sets become so large that you have to start innovating how to collect, store, organize, analyze and share them

Page 6: Big Data & The Cloud

Bigger Data is Better Data

Page 7: Big Data & The Cloud

Features driven by MapReduce

Page 8: Big Data & The Cloud

Bigger Data is Harder Data

Page 9: Big Data & The Cloud

Big Data is Getting Bigger

• 2.7 zettabytes in 2012

• Over 90% will be unstructured

• Data spread across a wide array of silos

Page 10: Big Data & The Cloud

Why is Big Data Hard (and Getting Harder)?

Changing Data Requirements

• Faster response time on fresher data

• Sampling is not good enough & history is important

• Increasing complexity of analytics

• Users demand inexpensive experimentation

Page 11: Big Data & The Cloud

Where is it Coming From?

Computer Generated

• Application server logs (web sites, games)

• Sensor data (weather, water, smart grids)

• Images/videos (traffic, security cameras)

Human Generated

• Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year

• Blogs/Reviews/Emails/Pictures

• Social Graphs: Facebook, LinkedIn, Contacts

Page 12: Big Data & The Cloud

The Role of Data is Changing

Page 13: Big Data & The Cloud

Until now, the questions you asked drove the data model

The new model is to collect as much data as possible: the “Data-First Philosophy”

Page 14: Big Data & The Cloud

Data is the new raw material for any business, on par with capital, people, labor

Page 15: Big Data & The Cloud

We Need Tools Built Specifically for Big Data

Page 16: Big Data & The Cloud

Hadoop

• Scale out Easily

• Parallel Computing

• Commodity Hardware

• Solves some Problems

• Complex to Run

• Special Skills to Maintain

Page 17: Big Data & The Cloud

How the Cloud Is Big Data’s Best Friend

Page 18: Big Data & The Cloud

How do we define the cloud?

By Benefits!

Page 19: Big Data & The Cloud

Cloud

Elasticity

Fast Time to Market

Focus on core competency

Pay Per Use

No Cap Ex

Page 20: Big Data & The Cloud

Why is the Cloud Big Data’s Best Friend?

Page 21: Big Data & The Cloud

We know we want to collect, store, organize, analyze and share it.

But we have limited resources.

Page 22: Big Data & The Cloud

The Cloud Optimizes Precious IT Resources

i.e. Skilled People

Page 23: Big Data & The Cloud

“Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x.

While the pool of IT staff available to manage them will grow only slightly. At 1.5x”

- 2011 IDC Digital Universe Study

Page 24: Big Data & The Cloud

Deploying a Hadoop cluster is hard

Page 25: Big Data & The Cloud

[Chart: The Old IT World versus Cloud computing. Segments labeled “Using Big Data” and “Managing All of the ‘Undifferentiated Heavy Lifting’”, with shares of 70% and 30%.]

Page 26: Big Data & The Cloud

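[Chart: The Old IT World versus Cloud-Based Infrastructure (Cloud computing). Old IT World segments: “Using Big Data” and “Managing All of the ‘Undifferentiated Heavy Lifting’”. Cloud-Based Infrastructure segments: “Analyzing and Using Big Data” and “Configuring Cloud Assets”. Each side is split 70% / 30%.]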

Page 27: Big Data & The Cloud

• Reusability

• Managed Services

• Scale

• Innovation

Page 32: Big Data & The Cloud

The Cloud Optimizes Capacity Resources

Page 33: Big Data & The Cloud

Elastic Compute Capacity

• On and Off

• Fast Growth

• Variable peaks

• Predictable peaks

Page 34: Big Data & The Cloud

Elastic Compute Capacity

• On and Off

• Fast Growth

• Variable peaks

• Predictable peaks

WASTE

CUSTOMER DISSATISFACTION
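
The two labels on this slide can be read as the two ways fixed capacity misses demand: idle capacity when demand is low (waste) and unserved demand when it is high (customer dissatisfaction). A minimal sketch of that comparison follows; the demand figures, the fixed capacity of 40, and the function name `provisioning_gap` are hypothetical and not taken from the talk.

```python
def provisioning_gap(demand, capacity):
    """Return (wasted, unmet) capacity units summed over all periods."""
    wasted = sum(max(c - d, 0) for d, c in zip(demand, capacity))
    unmet = sum(max(d - c, 0) for d, c in zip(demand, capacity))
    return wasted, unmet


# Hypothetical demand per period with variable peaks (illustrative numbers).
demand = [10, 12, 50, 15, 11, 80, 14]

# Fixed capacity: the same amount is provisioned in every period.
fixed = [40] * len(demand)

# Idealized elastic capacity: provisioned capacity tracks demand exactly.
elastic = list(demand)

for name, capacity in (("fixed", fixed), ("elastic", elastic)):
    wasted, unmet = provisioning_gap(demand, capacity)
    print(f"{name:8s} wasted={wasted:4d} unmet={unmet:4d}")
```

Real elastic capacity lags demand and is billed in increments, so the zero gaps for the elastic case are an upper bound on the benefit, not a prediction.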

Page 35: Big Data & The Cloud

Elastic Compute Capacity

[Chart: capacity over time, comparing traditional IT capacity and elastic cloud capacity against your IT needs]

Page 36: Big Data & The Cloud

Elastic Compute Capacity

• On and Off

• Fast Growth

• Variable peaks

• Predictable peaks

Page 37: Big Data & The Cloud

The Cloud Empowers Users to Balance Cost and Time

Page 38: Big Data & The Cloud

1 instance for 500 hours = 500 instances for 1 hour

I like this!

I scale
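
The equivalence on this slide follows from billing by the instance-hour: both options consume 500 instance-hours, so they cost the same while the wall-clock time differs by a factor of 500. A minimal sketch, assuming a flat hourly price and a workload that parallelizes perfectly; the price of 0.10 and the function name `job_cost` are hypothetical, not figures from the talk.

```python
def job_cost(instances: int, hours: float, price_per_instance_hour: float) -> float:
    """Cost of running `instances` machines for `hours` each."""
    return instances * hours * price_per_instance_hour


PRICE = 0.10  # hypothetical price per instance-hour, for illustration only

serial = job_cost(instances=1, hours=500, price_per_instance_hour=PRICE)
parallel = job_cost(instances=500, hours=1, price_per_instance_hour=PRICE)

print(f"1 instance x 500 hours : cost {serial:.2f}, finished after 500 hours")
print(f"500 instances x 1 hour : cost {parallel:.2f}, finished after 1 hour")
```

In practice the parallel run carries some coordination overhead, so the two costs are equal only under the perfect-parallelism assumption stated above.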

Page 39: Big Data & The Cloud

The Cloud Reduces Cost for Experimentation

Page 40: Big Data & The Cloud

The Cloud Enables Collection and Storage of Big Data

Page 41: Big Data & The Cloud

Simple Storage Service

[Chart: growth by quarter from Q4 2006 to Q2 2012 on an axis from 0 to 1,000, with the final point labeled “1 Trillion”]

750k+ peak transactions per second

Page 42: Big Data & The Cloud

Global Accessibility

Regions

• US-EAST (Virginia)

• US-WEST (N. California)

• US-WEST (Oregon)

• EU-WEST (Ireland)

• ASIA PAC (Tokyo)

• ASIA PAC (Singapore)

• SOUTH AMERICA (Sao Paulo)

• GOV CLOUD

Page 43: Big Data & The Cloud

Storage Costs are Declining

Page 44: Big Data & The Cloud

Big Data on the Cloud in the Real World

Page 45: Big Data & The Cloud

Big Data Verticals

Media/Advertising

• Targeted Advertising

• Image and Video Processing

Oil & Gas

• Seismic Analysis

Retail

• Recommend

• Transactions Analysis

Life Sciences

• Genome Analysis

Financial Services

• Monte Carlo Simulations

• Risk Analysis

Security

• Anti-virus

• Fraud Detection

• Image Recognition

Social Network/Gaming

• User Demographics

• Usage analysis

• In-game metrics

Page 46: Big Data & The Cloud

Visualizations

Page 47: Big Data & The Cloud

Bank – Monte Carlo Simulations

“The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter

23 Hours to 20 Minutes
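
Going from 23 hours to 20 minutes is a reduction of about 69x (1,380 minutes / 20 minutes). Monte Carlo workloads lend themselves to this because each batch of trials is independent and only the final counts need to be combined. The sketch below illustrates that structure only; the standard-normal "shock" model, the threshold, the batch sizes, and the function names are hypothetical and are not Bankinter's method.

```python
import random


def run_batch(seed: int, trials: int, threshold: float = 2.0) -> int:
    """Count trials in which a simulated standard-normal shock exceeds threshold."""
    rng = random.Random(seed)  # each batch has its own generator, no shared state
    return sum(1 for _ in range(trials) if rng.gauss(0.0, 1.0) > threshold)


def estimate_tail_probability(batches: int, trials_per_batch: int) -> float:
    """Combine independent batches into one probability estimate."""
    # Each batch could run on a separate worker or instance; here they run in a
    # plain loop and only the hit counts are added together.
    hits = sum(run_batch(seed, trials_per_batch) for seed in range(batches))
    return hits / (batches * trials_per_batch)


estimate = estimate_tail_probability(batches=20, trials_per_batch=5_000)
print(f"estimated P(shock > 2 sigma) = {estimate:.4f}")
```

Because the batches share nothing, spreading them over more machines shortens the run without changing the combined result.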

Page 48: Big Data & The Cloud

The Taste Test http://www.etsy.com/tastetest

Recommendations

Page 49: Big Data & The Cloud

etsy.com/gifts

Recommendations

Gift Ideas for Facebook Friends

Page 51: Big Data & The Cloud

Targeted Ad

User recently purchased a sports movie and is searching for video games

(1.7 Million per day)

Click Stream Analysis

Page 52: Big Data & The Cloud

Big Data on the Cloud in the Real World

How the Cloud Is Big Data’s Best Friend

Characteristics of Big Data

Page 53: Big Data & The Cloud

Questions?

Page 54: Big Data & The Cloud

Joe Ziegler

Technical [email protected] @jiyosub