The Hadoop Guarantee: Keeping Analytics Running On Time

Grab some coffee and enjoy the preshow banter before the top of the hour!

Transcript of The Hadoop Guarantee: Keeping Analytics Running On Time

The Briefing Room

Twitter Tag: #briefr The Briefing Room

Welcome

Host: Eric Kavanagh

@eric_kavanagh

Mission

•  Reveal the essential characteristics of enterprise software, good and bad

•  Provide a forum for detailed analysis of today’s innovative technologies

•  Give vendors a chance to explain their product to savvy analysts

•  Allow audience members to pose serious questions... and get answers!

Topics

September: HADOOP 2.0

October: DATA MANAGEMENT

November: ANALYTICS

The Holy Grail of Hadoop

•  Mixed Workloads!

•  Deep visibility into the cluster

•  Ability to define & meet SLAs

Analyst: Robin Bloor

Robin Bloor is Chief Analyst at The Bloor Group

@robinbloor

Pepperdata

Pepperdata offers a platform for managing and optimizing Hadoop clusters

The platform monitors and balances resources across multiple workloads and/or clusters in real time

Pepperdata provides an interactive dashboard with real-time visualizations and reports on hardware usage

Guest: Sean Suchter

Sean Suchter, Cofounder and CEO of Pepperdata

Sean was the founding GM of Microsoft’s Silicon Valley Search Technology Center, where he led the integration of Facebook and Twitter content into Bing search. Prior to Microsoft, Sean managed the Yahoo Search Technology team, the first production user of Hadoop. Sean joined Yahoo through the acquisition of Inktomi, and holds a B.S. in Engineering and Applied Science from Caltech.

©2015 Pepperdata

Sean Suchter, CEO & Cofounder

September 15, 2015

Pepperdata: Bringing Predictability & Reliability to Hadoop

Agenda

•  Market trends

•  Customer demands

•  Where Pepperdata fits

•  Q&A

Market Reality

•  Unreliability of Hadoop

•  Growing skills gap

•  Multitude of vendors & tools in ecosystem

Many organizations state that big data is a high priority for them, but many will fail to see a competitive advantage due to issues such as:

•  Unpredictable jobs: bottlenecks, missed SLAs

•  Poor visibility: lengthy troubleshooting, “flying blind”

•  Inefficient cluster allocation: overbuilding, costs

Mature deployments have increasing requirements

Organizations today demand:

•  Multi-tenancy (multiple workloads, multiple tenants)

•  Internal deployments of Hadoop-as-a-Service

•  Guaranteed SLAs

You need more than YARN

YARN

•  When jobs are scheduled: schedule jobs; pre-allocate memory, CPU

•  During & after job runtime: node-level metrics

Pepperdata

•  Once jobs are running: allocate resources dynamically (maximize utilization); control hardware usage (priority jobs complete on time); prevent rogue jobs from harming high-priority jobs

•  During & after job runtime: node-level metrics; real-time metrics by queue, user, job, task
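
The idea of controlling hardware usage so that priority jobs complete on time can be illustrated with a toy allocator. This is only a sketch under stated assumptions, not Pepperdata's or YARN's actual algorithm: the `Task` fields, the priority scale, and the capacity figure are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int   # hypothetical scale: higher number = more important
    demand: float   # resource the task currently wants, e.g. MB/s of disk I/O


def allocate(tasks, capacity):
    """Grant each task its demand in priority order until capacity runs out."""
    grants = {}
    remaining = capacity
    for task in sorted(tasks, key=lambda t: t.priority, reverse=True):
        granted = min(task.demand, remaining)
        grants[task.name] = granted
        remaining -= granted
    return grants


# Hypothetical workload on one node with 100 units of capacity.
tasks = [
    Task("nightly-report", priority=10, demand=60.0),
    Task("ad-hoc-query", priority=5, demand=30.0),
    Task("rogue-scan", priority=1, demand=80.0),
]
grants = allocate(tasks, capacity=100.0)
print(grants)
```

In this example the low-priority "rogue-scan" task is held to the 10 units left over after the two higher-priority tasks are served. A real system would rerun this kind of decision continuously as demands change.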

Time and sweat won’t solve the problem

No human can make the thousands of decisions a second necessary for dynamic, real-time hardware resource management.

Pepperdata lets enterprises rely on Hadoop

Companies can now:

•  Provide mission-critical applications in multi-tenant environments

•  Monitor and control hardware usage dynamically and in real time

•  Enable SLAs, increase throughput, and improve visibility

DEMO

Pepperdata: unmatched visibility

Thank you.

Perceptions & Questions

Analyst: Robin Bloor

Hadoop Performance

Robin Bloor, PhD

Hadoop Had a Dream

The Biological Analog

•  Our human control system works at different speeds:

    •  Internal systems – Enteric nervous system

    •  Instant external reflex – Spinal cord

    •  Fast external response – Motor systems

    •  Considered response – The brain

•  Swift external response is predictive analytics & triggers

•  Considered response is analytics

A While Ago…

The Hadoop Disruption

Hadoop Evolution

•  HDFS & MapReduce: Serial Single Batch

•  HDFS, YARN, MapReduce: Serial Multiple Batch

•  HDFS, YARN, Spark: Serial Multiple Microbatch

The Spark Dynamic

•  Spark has become the de facto vehicle for many distinct Hadoop projects: analytics and data integration

•  It can do “microbatch streaming,” but it is not ideal for very low latency applications

•  It has in-memory capability (claimed speedups over MapReduce of up to 100x in memory and 10x on disk)

•  Speed of development

•  Spark SQL

So What’s Missing?

•  Resource allocation

•  Resource management by “job”

•  Dynamic prioritization of workloads

•  Real-time monitoring

•  Service management: performance and throughput feedback and controls

•  Capacity planning
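
As a rough illustration of what performance feedback for service management could look like, the sketch below projects a running job's finish time from its progress so far and compares it with an SLA deadline. It assumes progress is linear, which real jobs often are not. The function names and figures are hypothetical and do not come from any Hadoop or Pepperdata API.

```python
def projected_finish(elapsed_s, fraction_done):
    """Linearly extrapolate total runtime (seconds) from progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s / fraction_done


def will_meet_sla(elapsed_s, fraction_done, sla_s):
    """Return True if the projected runtime fits within the SLA window."""
    return projected_finish(elapsed_s, fraction_done) <= sla_s


# Hypothetical job with a one-hour SLA, checked ten minutes in.
on_track = will_meet_sla(elapsed_s=600, fraction_done=0.25, sla_s=3600)
at_risk = will_meet_sla(elapsed_s=600, fraction_done=0.10, sla_s=3600)
print(on_track, at_risk)
```

A monitor could run a check like this periodically and feed the at-risk jobs into whatever prioritization mechanism the cluster uses.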

Operational Control

Hadoop has the potential to be the “scale-out OS” for data as soon as it can manage its resources

•  How easy is Pepperdata to implement? What’s the process?

•  What is (roughly) the most complex environment in respect to workloads where Pepperdata is deployed? Please describe.

•  What is the Pepperdata proposition in respect to ROI?

•  Are there any competing products?

•  Which specific companies/products do you complement?

•  Is there any Hadoop distribution that you prefer? If so, why?

Upcoming Topics

www.insideanalysis.com

September: HADOOP 2.0

October: DATA MANAGEMENT

November: ANALYTICS

THANK YOU for your ATTENTION!

Some images provided courtesy of Wikimedia Commons and http://desvadgama.com/wp-content/uploads/2012/11/holy-grail.jpg