ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Enabling Real-Time Analytics Using Hadoop Map/Reduce Copyright © 2013 by ScaleOut Software, Inc. Briefing on New Product Release: ScaleOut hServer™ V2 October 14, 2013 Bill Bain, CEO ([email protected]) David Brinker, COO ([email protected])

description

Welcome to real-time analytics for Hadoop! ScaleOut hServer V2 is the world's first in-memory execution engine for Hadoop MapReduce. Now you can analyze live data using standard Hadoop MapReduce code, in memory and in parallel without the need to install and manage the Hadoop stack of software. (Only one small change is needed to your Hadoop program.) Gone are disk I/O latencies, slow start-up times, and software environment management headaches. Benchmark tests have demonstrated 20x faster execution time over the Apache Hadoop distribution. Now you can use Hadoop MapReduce in live applications in financial services, e-commerce, logistics, and countless other scenarios where results are needed in seconds instead of minutes or hours. Learn more: http://www.scaleoutsoftware.com/products/scaleout-hserver Watch the presentation video: http://inside-bigdata.com/2013/10/15/enabling-real-time-analytics-using-hadoop-mapreduce/

Transcript of ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Page 1: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Enabling Real-Time Analytics Using Hadoop Map/Reduce

Copyright © 2013 by ScaleOut Software, Inc.

Briefing on New Product Release: ScaleOut hServer™ V2

October 14, 2013

Bill Bain, CEO ([email protected]) David Brinker, COO ([email protected])

Page 2: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

2 ScaleOut Software, Inc.

ScaleOut hServer V2:

•  World’s first Hadoop MapReduce engine integrated with a scalable, in-memory data grid

•  Full Hadoop MapReduce support for “live” fast-changing data

•  20x performance improvement in benchmark tests

•  Significant new technology to simplify development and maximize ease of use

What’s New Today

Page 3: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  Develops and markets software middleware for:

•  Scaling application performance and

•  Performing real-time analytics using

•  In-memory data storage and computing

•  Executive Team:

•  Dr. William Bain, Founder & CEO

•  Career focused on parallel computing – Bell Labs, Intel, Microsoft

•  3 prior start-ups, last acquired by Microsoft and product now ships as Network Load Balancing in Windows Server

•  David Brinker, COO

•  25 years software business and executive management experience

•  Mentor Graphics, Cadence, Webridge

•  Eight years market experience in Windows & Linux; 400 customers

About ScaleOut Software

Page 4: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  ScaleOut StateServer®

•  In-Memory Data Grid for Windows and Linux

•  Scales application performance.

•  Industry-leading performance and ease of use

•  ScaleOut GeoServer® adds

•  WAN-based data replication for DR

•  Breakthrough technology for global data access

•  ScaleOut Analytics Server® adds

•  Real-time data analysis for “live” data

•  Comprehensive management tools

•  Introducing ScaleOut hServer™ V2

•  Full Hadoop Map/Reduce engine (20X faster*)

•  Hadoop Map/Reduce on live, in-memory data

ScaleOut Software Products

[Diagram: ScaleOut StateServer In-Memory Data Grid, showing four GridService instances]

*in benchmark testing

Page 5: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

ScaleOut Analytics Server stores and analyzes “live” data:

•  In-memory storage holds live data sets which are continuously updated and accessed within operational systems.

•  Examples: stock ticker data, business rules, order & inventory data

•  Integrated analytics engine tracks important patterns & trends.

•  Data-parallel analysis delivers results in msec. to seconds.

IMDGs Perform Real-Time Analytics

Page 6: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Integrate analysis into a stock trading platform:

•  The IMDG holds market data and hedging strategies.

•  Updates to market data continuously flow through the IMDG.

•  The IMDG performs repeated map/reduce analysis on hedging strategies and alerts traders in real time.

•  IMDG automatically and dynamically scales its throughput to handle new hedging strategies by adding servers.

Example in Financial Services
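
The pattern on this slide can be sketched in a few lines. This is a minimal, single-process sketch in Python and not ScaleOut's API: plain in-memory structures stand in for the IMDG, and the names `prices`, `strategies`, `map_strategy`, `reduce_alerts`, and `analyze`, along with the sample tickers and thresholds, are all hypothetical.

```python
from functools import reduce

# Stand-in for the IMDG: market data and hedging strategies held in memory.
prices = {"AAA": 100.0, "BBB": 50.0}
strategies = [
    {"id": "s1", "symbol": "AAA", "sell_below": 95.0},
    {"id": "s2", "symbol": "BBB", "sell_below": 45.0},
]

def map_strategy(strategy, prices):
    """Map step: evaluate one strategy against the current prices."""
    price = prices[strategy["symbol"]]
    triggered = price < strategy["sell_below"]
    return [(strategy["id"], price)] if triggered else []

def reduce_alerts(acc, partial):
    """Reduce step: merge per-strategy results into one alert list."""
    return acc + partial

def analyze(strategies, prices):
    """One map/reduce pass over all strategies."""
    mapped = (map_strategy(s, prices) for s in strategies)
    return reduce(reduce_alerts, mapped, [])

# Re-run the analysis after each market data update.
alerts_before = analyze(strategies, prices)
prices["AAA"] = 94.0  # a live update flows into the in-memory store
alerts_after = analyze(strategies, prices)
print(alerts_before, alerts_after)
```

In a real grid the map step would presumably run in parallel on each server against the strategies it holds; the sketch only shows the repeated map/reduce pass over data that changes between runs.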

Page 7: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Example Uses

Online loan apps & banking

Portfolio management

Trading systems

Reservations systems

Ecommerce shopping

Customer service sites

Streaming entertainment

Configuration engines

Gaming

Customers

•  400 unique customers

•  35 Fortune 500 customers

•  32 countries

•  9,000 servers licensed

•  50% have multiple deployments

Customers by sector (% in $$s):

•  Entertainment & Communications: 13%

•  Financial & Insurance: 26%

•  Ecommerce Sales: 17%

•  Ecommerce Services: 19%

•  Travel & Transportation: 4%

•  Gov't & Education: 10%

•  Software: 8%

•  Other: 3%

Page 8: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  In-Memory Data Grids have become key in several fast-growth markets.

•  Drivers:

•  Cloud computing / virtualization

•  Hardware enablement

•  Competitive pressure

•  Exploding workloads

•  Big data analysis

•  ScaleOut addresses scalability and analytics.

IMDGs Seeing Wide Adoption

Market sizes (2013 estimates):

•  Big Data Analytics: $18B (source 1)

•  Enterprise Software: $292B (source 2)

•  HPC / Grid Computing: $25B (source 3)

•  In-Memory Data Grids: $355M (source 4)

Sources:

1. Wikibon 2013

2. Gartner 2010, rolled forward to 2013

3. Market Research Media 2015, rolled back to 2013

4. Gartner 2011, rolled forward to 2013

Page 9: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Big Data Analytics: $18B

Batch “Business Intelligence”:

•  Static data sets

•  Petabytes

•  Disk storage

•  Hours to minutes

•  Best uses: analyzing warehoused data; mining for long-term trends

Real-time “Operational Intelligence”:

•  Live data sets

•  Gigabytes to terabytes

•  In-memory storage

•  Minutes to seconds

•  Best uses: tracking live data; immediately identifying trends and capturing opportunities

[Figure: products placed along a real-time-to-batch axis: Analytics Server, hServer, Hadoop, IBM, Teradata, SAS, SAP]

Analytics Market

Page 10: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  Run continuous Hadoop on live data, while it’s being updated.

“Capture perishable business opportunities and identify issues.”

Examples: real-time risk analysis, credit card fraud detection

•  Accelerate Hadoop on static data with a one-line code change.

“Speed-up Hadoop execution by >10X for faster business insights.”

Examples: process simulations, financial modeling

•  Quickly prototype Hadoop code.

“Validate your Hadoop code before it goes into batch processing.”

Examples: fast-turn debug and tuning, no need to install Hadoop stack

ScaleOut hServer Targeted Use Cases

Page 11: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  Typically used for very large, static, offline datasets

•  Data must be copied from disk-based storage (e.g., HDFS) into memory for analysis.

•  Hadoop Map/Reduce adds lengthy batch scheduling overhead.

Problem: Hadoop Cannot Efficiently Perform Real-Time Analytics

Page 12: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Benefits:

•  Enables real-time analysis using Hadoop M/R APIs.

•  Accelerates data access by staging data in memory.

•  Eliminates batch scheduling and data shuffling overheads of standard Hadoop distributions.

•  Analyzes “live” data.

•  Allows existing Hadoop M/R programs to run with only a one-line code change.

•  Eliminates complexity in Hadoop deployment.

•  Enables rapid prototyping.

Solution: Integrate Hadoop M/R into In-Memory Data Grid

Page 13: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Enables Hadoop Map/Reduce to perform real-time analysis:

•  Adds full Map/Reduce engine to SOAS IMDG.

•  Delivers results in msec. to seconds instead of minutes or hours.

•  Benchmark results show 20X speedup.

•  Has flexible options for data storage/access:

•  Hadoop programs can access/store key/value pairs using either IMDG or HDFS.

•  Automatically caches HDFS data in IMDG for fast access.

•  Allows dynamic updates to key/value pairs during analysis to support “live” data.

•  Ships as open source Java library combined with SOAS IMDG.

Introducing ScaleOut hServer™ V2

Page 14: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  ScaleOut hServer adds Grid Record Reader for accessing key/value pairs held in the IMDG.

•  Hadoop programs optionally can output results to IMDG with Grid Record Writer.

•  Grid Record Reader optimizes access to key/value pairs to eliminate network overhead.

•  Applications can access and update key/value pairs as operational data during analysis.

Enabling Access to IMDG Data
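
One way to picture how a record reader could avoid network overhead is to have each grid node read only the key/value pairs it already stores. The following is a minimal sketch under that assumption, not the actual Grid Record Reader: `owner`, `build_grid`, `local_record_reader`, and the four-node grid size are hypothetical names and values chosen for illustration.

```python
import zlib

NUM_NODES = 4  # hypothetical grid size

def owner(key, num_nodes=NUM_NODES):
    """Assign a key to a node with a stable hash (crc32, not Python's salted hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def build_grid(pairs, num_nodes=NUM_NODES):
    """Distribute key/value pairs into per-node in-memory stores."""
    grid = [dict() for _ in range(num_nodes)]
    for key, value in pairs.items():
        grid[owner(key, num_nodes)][key] = value
    return grid

def local_record_reader(grid, node):
    """Yield only the pairs stored on `node`, so no pair has to cross the network."""
    yield from grid[node].items()

pairs = {f"order-{i}": i * 10 for i in range(20)}
grid = build_grid(pairs)

# Each node processes its own pairs; the partial sums are then combined.
partials = [
    sum(value for _, value in local_record_reader(grid, node))
    for node in range(NUM_NODES)
]
total = sum(partials)
print(total)
```

Because every pair lives on exactly one node, the combined result matches a single pass over the whole data set while each reader touches only local memory.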

Page 15: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  ScaleOut hServer adds Dataset Record Reader (wrapper) to cache HDFS data during program execution.

•  Hadoop automatically retrieves data from ScaleOut IMDG on subsequent runs.

•  Dataset Record Reader stores and retrieves data with minimum network and memory overheads.

•  Tests with the TeraSort benchmark have demonstrated 11X lower access latency than HDFS without the IMDG.

Enabling Fast Access to HDFS Data
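
The wrapper idea on this slide, reading from the slow source once and serving later runs from memory, can be sketched as follows. This is an illustrative sketch only, not the Dataset Record Reader itself: `CachingRecordReader`, `slow_source`, and the dict used as a cache are hypothetical stand-ins for the wrapper, an HDFS read, and the IMDG.

```python
class CachingRecordReader:
    """Wraps a record source and keeps its records in an in-memory cache."""

    def __init__(self, read_source, cache):
        self.read_source = read_source  # callable: dataset name -> iterable of (key, value)
        self.cache = cache              # dict standing in for the IMDG

    def read(self, dataset):
        if dataset in self.cache:
            # Subsequent runs: serve records from memory.
            yield from self.cache[dataset]
            return
        records = []
        for record in self.read_source(dataset):
            records.append(record)
            yield record
        # Cache only after the source has been read completely.
        self.cache[dataset] = records

source_reads = {"count": 0}

def slow_source(dataset):
    """Hypothetical stand-in for reading a file from HDFS."""
    source_reads["count"] += 1
    return [(i, f"{dataset}-line-{i}") for i in range(3)]

reader = CachingRecordReader(slow_source, cache={})
first = list(reader.read("input"))   # first run reads the source and fills the cache
second = list(reader.read("input"))  # second run is served from the cache
print(first == second, source_reads["count"])
```

The program consuming the records does not need to know whether they came from the source or the cache, which is the point of implementing the cache as a wrapper.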

Page 16: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

ScaleOut hServer Editions

•  Offered in community and commercial editions

•  Community Edition can be used for evaluation or production

•  Hybrid open source / proprietary licensing

Editions                 Community          Commercial
# Servers                Up to 4            100s
Expected data set size   256GB (max)        GB - TBs
Pricing                  Free               Subscription & perpetual
Support                  Community Forum    Full support

Page 17: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  IMDGs help scale application performance and analyze “live” data in real-time.

•  Hadoop focuses on analyzing large, static (offline) datasets held in file systems.

•  ScaleOut hServer V2 introduces breakthrough technology enabling Hadoop applications to perform real-time analytics:

•  Integrates Hadoop Map/Reduce engine with SOAS’s IMDG.

•  Accelerates Map/Reduce execution by 20X in benchmark tests.

•  Enables Hadoop applications to analyze “live,” in-memory data.

•  Offers flexible access to both in-memory and file-based data.

•  Eliminates complex Hadoop deployment and tuning.

•  Offers a fast, easy-to-use platform for rapid prototyping.

Summary

Page 18: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

A few examples:

•  Equity trading: to minimize risk during a trading day

•  Ecommerce: to optimize real-time shopping activity

•  Reservations systems: to identify issues, reroute, etc.

•  Credit cards: to detect fraud in real time

•  Smart grids: to optimize power distribution & detect issues

Online Systems Need Real-Time Analysis

Page 19: ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

•  ScaleOut Software conducted an informal survey at the Strata 2013 Conference (Santa Clara).

•  Based on 150 responses:

•  78% of organizations generate fast-changing data.

•  60% use Hadoop and 78% plan to expand usage of Hadoop within 12 months.

•  Only 42% consider Hadoop to be an effective platform for real-time analysis, but…

•  93% would benefit from real-time data analytics.

•  71% consider a 10X improvement in performance meaningful.

•  Take-away: Hadoop users need real-time analytics.

Hadoop Users Need Real-Time Analytics
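
To give the percentages a sense of scale, they can be converted to approximate respondent counts. The sketch below assumes every percentage is taken over all 150 responses, which the slide does not state. The "78% plan to expand usage" figure is left out because its base (all respondents or only Hadoop users) is unclear. The function name `approx_count` is hypothetical.

```python
RESPONSES = 150  # from the slide

# Percentages as reported on the slide.
findings = {
    "generate fast-changing data": 78,
    "use Hadoop": 60,
    "consider Hadoop effective for real-time analysis": 42,
    "would benefit from real-time analytics": 93,
    "consider a 10X improvement meaningful": 71,
}

def approx_count(percent, total=RESPONSES):
    """Convert a reported percentage to an approximate respondent count."""
    return percent * total / 100

counts = {label: approx_count(p) for label, p in findings.items()}
for label, n in counts.items():
    print(f"{label}: about {n:g} of {RESPONSES}")
```

Fractional results such as 139.5 indicate that the published percentages were rounded, so the counts are estimates rather than exact tallies.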