Big Data & The Cloud

Posted on 15-Jan-2015


Joe Ziegler's presentation at the 5th Elephant conference in Bangalore.

Transcript of Big Data & The Cloud

Amazon Web Services

Big Data and the Cloud: A Best Friend Story

Joe Ziegler, Technical Evangelist, zieglerj@amazon.com, @jiyosub

Big Data on the Cloud in the Real World

How the Cloud Is Big Data’s Best Friend

Characteristics of Big Data

Characteristics of Big Data

Big Data: when your data sets become so large that you have to start innovating how to collect, store, organize, analyze, and share them

Bigger Data Is Better Data

Features driven by MapReduce

Bigger Data Is Harder Data

Big Data is Getting Bigger

2.7 zettabytes in 2012; over 90% will be unstructured data, spread across a wide array of silos

Why is Big Data Hard (and Getting Harder)?

Changing data requirements
• Faster response time of fresher data
• Sampling is not good enough & history is important

Increasing complexity of analytics
• Users demand inexpensive experimentation

Where Is It Coming From?

Computer Generated

• Application server logs (web sites, games)

• Sensor data (weather, water, smart grids)

• Images/videos (traffic, security cameras)

Human Generated

• Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year

• Blogs/Reviews/Emails/Pictures

• Social Graphs: Facebook, LinkedIn, Contacts

The Role of Data is Changing

Until now, the questions you asked drove the data model

The new model is to collect as much data as possible: a “Data-First Philosophy”

Data is the new raw material for any business, on par with capital, people, and labor


We Need Tools Built Specifically for Big Data

Hadoop

• Scale out easily
• Parallel computing
• Commodity hardware

• Solves some problems
• Complex to run
• Special skills to maintain
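Hadoop's parallel computing rests on the MapReduce model mentioned earlier: a map step processes each input split independently and emits key/value pairs, a shuffle groups those pairs by key, and a reduce step aggregates each group. A minimal single-process Python sketch of that flow, using word counting over log lines as an illustrative job (an assumption; the slides name no specific job). On a real cluster, the splits and the reduce keys would be spread across many commodity machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # Each split could be mapped on a different node; here they run one after another.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    logs = ["GET /home GET /cart", "GET /home POST /cart"]
    print(word_count(logs))
```

Because map calls share no state and each reduce sees only its own key, adding machines adds throughput, which is the "scale out easily" property; the operational cost of running that cluster is the "complex to run" side.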

How the Cloud Is Big Data’s Best Friend

How do we define the cloud? By benefits!

Cloud

• Elasticity
• Fast time to market
• Focus on core competency
• Pay per use
• No CapEx

Why Is the Cloud Big Data’s Best Friend?

We know we want to collect, store, organize, analyze, and share it. But we have limited resources.

The Cloud Optimizes Precious IT Resources, i.e. Skilled People

“Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x, while the pool of IT staff available to manage them will grow only slightly, at 1.5x.”

- 2011 IDC Digital Universe Study
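Taking the two growth factors in the quote at face value, the load on each IT staff member follows from their ratio (a worked reading of the quoted figures, not a number stated by the study itself):

```latex
\frac{\text{growth in files or containers}}{\text{growth in IT staff}}
  = \frac{75\times}{1.5\times}
  = 50\times \text{ more files or containers per staff member}
```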

Deploying a Hadoop cluster is hard

The Old IT World: 70% managing all of the “undifferentiated heavy lifting”, 30% using Big Data

Cloud computing (cloud-based infrastructure): 30% configuring cloud assets, 70% analyzing and using Big Data

• Reusability
• Managed services
• Scale
• Innovation


The Cloud Optimizes Capacity Resources

Elastic compute capacity for every usage pattern: on and off, fast growth, variable peaks, predictable peaks

[Figure: capacity over time, comparing your IT needs with traditional IT capacity and elastic cloud capacity; traditional capacity above need is WASTE, need above capacity is CUSTOMER DISSATISFACTION]
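The capacity comparison can be made concrete with a little bookkeeping: provisioned capacity above demand is idle (waste), and demand above capacity goes unserved (customer dissatisfaction). A minimal Python sketch, assuming a made-up hourly demand series and a hypothetical fixed capacity of 8 servers for the traditional case; the elastic case simply tracks demand each step.

```python
from typing import List, Tuple


def waste_and_shortfall(demand: List[int], capacity: List[int]) -> Tuple[int, int]:
    """Return (idle capacity units, unmet demand units) summed over all time steps."""
    waste = sum(max(c - d, 0) for d, c in zip(demand, capacity))
    shortfall = sum(max(d - c, 0) for d, c in zip(demand, capacity))
    return waste, shortfall


def elastic_capacity(demand: List[int], headroom: int = 0) -> List[int]:
    """Elastic capacity follows demand at each step, plus an optional buffer."""
    return [d + headroom for d in demand]


if __name__ == "__main__":
    # Hypothetical hourly demand with one spike (units: servers needed).
    demand = [4, 5, 6, 12, 6, 5, 4, 3]
    fixed = [8] * len(demand)  # traditional: provisioned once, ahead of time
    print("fixed:  ", waste_and_shortfall(demand, fixed))
    print("elastic:", waste_and_shortfall(demand, elastic_capacity(demand)))
```

With these numbers the fixed allocation idles 23 server-hours and still misses 4 during the spike, while the idealized elastic allocation has neither. Real autoscaling lags demand and usually keeps some `headroom`, so its waste is small rather than zero.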

The Cloud Empowers Users to Balance Cost and Time

1 instance for 500 hours = 500 instances for 1 hour

“I like this!” “I scale.”
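Under pay-per-use pricing, cost depends on total instance-hours rather than on how they are spread over time, which is why 1 instance for 500 hours costs the same as 500 instances for 1 hour while finishing 500 times sooner. A Python sketch under stated assumptions: the work is perfectly parallel, each instance is billed per started hour, and `HOURLY_RATE_CENTS` is a hypothetical flat price (real pricing varies by instance type and billing granularity).

```python
import math
from typing import Tuple

HOURLY_RATE_CENTS = 10  # hypothetical price per instance-hour, illustration only


def job_cost_and_time(instance_hours: int, instances: int) -> Tuple[int, int]:
    """Return (cost in cents, wall-clock hours) for perfectly parallel work.

    Each instance is assumed to be billed per started hour, so an uneven
    split rounds up to whole hours on every instance.
    """
    wall_clock_hours = math.ceil(instance_hours / instances)
    cost_cents = instances * wall_clock_hours * HOURLY_RATE_CENTS
    return cost_cents, wall_clock_hours


if __name__ == "__main__":
    for n in (1, 500, 300):
        cost, hours = job_cost_and_time(500, n)
        print(f"{n:>3} instance(s): {hours:>3} h, ${cost / 100:.2f}")
```

The 300-instance case shows the caveat: when the work does not divide evenly, per-hour rounding makes the wider fan-out cost a little more ($60.00 instead of $50.00).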

The Cloud Reduces Cost for Experimentation

The Cloud Enables Collection and Storage of Big Data

Simple Storage Service

[Figure: Amazon S3 objects stored, Q4 2006 through Q2 2012, reaching 1 trillion objects, with 750k+ peak transactions per second]

Global Accessibility: Regions

• US-WEST (N. California)
• EU-WEST (Ireland)
• ASIA PAC (Tokyo)
• ASIA PAC (Singapore)
• US-WEST (Oregon)
• SOUTH AMERICA (Sao Paulo)
• US-EAST (Virginia)
• GOV CLOUD

Storage Costs are Declining

Big Data on the Cloud in the Real World

Big Data Verticals

• Media/Advertising: targeted advertising, image and video processing
• Oil & Gas: seismic analysis
• Retail: recommendations, transactions analysis
• Life Sciences: genome analysis
• Financial Services: Monte Carlo simulations, risk analysis
• Security: anti-virus, fraud detection, image recognition
• Social Network/Gaming: user demographics, usage analysis, in-game metrics
• Visualizations

Bank: Monte Carlo Simulations

“The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter

23 Hours to 20 Minutes
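Monte Carlo risk simulations suit elastic capacity because batches of scenarios are independent, so they can be spread over many instances and merged afterwards. A minimal Python sketch, assuming a single asset whose one-year return is normally distributed with hypothetical parameters `MU` and `SIGMA`; each batch uses its own seeded generator (so it could run on a separate instance), and the merged losses give an empirical 99% value-at-risk. This illustrates the general technique only, not Bankinter's model.

```python
import random
from typing import List

MU, SIGMA = 0.05, 0.20  # hypothetical annual return mean and volatility


def simulate_batch(seed: int, n: int) -> List[float]:
    """Simulate n one-year losses (negated returns) from an independent RNG stream."""
    rng = random.Random(seed)
    return [-rng.gauss(MU, SIGMA) for _ in range(n)]


def value_at_risk(losses: List[float], confidence: float = 0.99) -> float:
    """Empirical VaR: the loss level exceeded in only (1 - confidence) of scenarios."""
    ordered = sorted(losses)
    index = min(int(confidence * len(ordered)), len(ordered) - 1)
    return ordered[index]


def batched_var(batches: int, per_batch: int) -> float:
    # Each simulate_batch call could run on its own instance; here they run in a loop
    # and their outputs are merged before computing the quantile.
    losses: List[float] = []
    for seed in range(batches):
        losses.extend(simulate_batch(seed, per_batch))
    return value_at_risk(losses)


if __name__ == "__main__":
    print(f"99% one-year VaR: {batched_var(batches=20, per_batch=5_000):.3f}")
```

For a normal return the exact 99% VaR is about 2.326 × `SIGMA` − `MU` ≈ 0.415, which the estimate approaches as the scenario count grows. Wall-clock time falls roughly in proportion to the number of workers, since only the merge step is sequential.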

The Taste Test http://www.etsy.com/tastetest

Recommendations

etsy.com/gifts

Recommendations

Gift Ideas for Facebook Friends
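One common way to produce “people who liked X also liked Y” recommendations is item co-occurrence counting over user histories. A minimal Python sketch, assuming hypothetical user-to-item favorites; the slides do not describe Etsy's actual method, so treat this as one illustrative technique rather than theirs.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List, Set


def co_occurrence(histories: Dict[str, Set[str]]) -> Dict[str, Counter]:
    """Count how often each ordered pair of items appears in the same user's history."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for items in histories.values():
        # sorted() only makes iteration order deterministic; counts are unaffected.
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1
    return counts


def recommend(item: str, counts: Dict[str, Counter], k: int = 3) -> List[str]:
    """Return up to k items most often seen together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


if __name__ == "__main__":
    # Hypothetical favorites per user.
    histories = {
        "u1": {"mug", "tea", "scarf"},
        "u2": {"mug", "tea"},
        "u3": {"scarf", "gloves"},
    }
    print(recommend("mug", co_occurrence(histories)))
```

Pair counting is a natural MapReduce job at scale: the map step emits item pairs per user, and the reduce step sums them per item.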

Targeted Ad

User recently purchased a sports movie and is searching for video games (1.7 million per day)

Click Stream Analysis
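Click stream analysis usually begins by grouping raw page-view events into per-user sessions, which is how a signal like "recently bought a sports movie, now browsing video games" is recovered from logs. A minimal Python sketch, assuming events arrive as hypothetical `(user_id, unix_seconds, page)` tuples and that 30 minutes of inactivity ends a session (a common convention, chosen here as an assumption).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, int, str]  # (user_id, unix_seconds, page)
SESSION_TIMEOUT = 30 * 60  # assumed inactivity gap, in seconds, that ends a session


def sessionize(events: List[Event]) -> Dict[str, List[List[str]]]:
    """Split each user's time-ordered page views into sessions."""
    by_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for user, ts, page in events:
        by_user[user].append((ts, page))

    sessions: Dict[str, List[List[str]]] = {}
    for user, views in by_user.items():
        views.sort()  # logs may arrive out of order
        user_sessions: List[List[str]] = []
        last_ts: Optional[int] = None
        for ts, page in views:
            if last_ts is None or ts - last_ts > SESSION_TIMEOUT:
                user_sessions.append([])  # first view or long gap: start a new session
            user_sessions[-1].append(page)
            last_ts = ts
        sessions[user] = user_sessions
    return sessions


if __name__ == "__main__":
    events = [
        ("alice", 0, "/movies/sports"),
        ("alice", 120, "/checkout"),
        ("alice", 4000, "/games"),
        ("bob", 10, "/home"),
    ]
    print(sessionize(events))
```

Here Alice's third view comes more than 30 minutes after her checkout, so it opens a second session; downstream jobs can then look for cross-session patterns such as a purchase followed by a new browsing interest.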


Questions?
