ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce
Enabling Real-Time Analytics Using Hadoop Map/Reduce
Copyright © 2013 by ScaleOut Software, Inc.
Briefing on New Product Release: ScaleOut hServer™ V2
October 14, 2013
Bill Bain, CEO
David Brinker, COO
2 ScaleOut Software, Inc.
ScaleOut hServer V2:
• World’s first Hadoop MapReduce engine integrated with a scalable, in-memory data grid
• Full Hadoop MapReduce support for “live” fast-changing data
• 20x performance improvement in benchmark tests
• Significant new technology to simplify development and maximize ease of use
What’s New Today
• Develops and markets software middleware for scaling application performance and performing real-time analytics using in-memory data storage and computing
• Executive Team:
  • Dr. William Bain, Founder & CEO
    • Career focused on parallel computing – Bell Labs, Intel, Microsoft
    • 3 prior start-ups, last acquired by Microsoft and product now ships as Network Load Balancing in Windows Server
  • David Brinker, COO
    • 25 years software business and executive management experience
    • Mentor Graphics, Cadence, Webridge
• Eight years market experience in Windows & Linux; 400 customers
About ScaleOut Software
• ScaleOut StateServer®
  • In-Memory Data Grid for Windows and Linux
  • Scales application performance
  • Industry-leading performance and ease of use
• ScaleOut GeoServer® adds:
  • WAN-based data replication for DR
  • Breakthrough technology for global data access
• ScaleOut Analytics Server® adds:
  • Real-time data analysis for “live” data
• Comprehensive management tools
• Introducing ScaleOut hServer™ V2:
  • Full Hadoop Map/Reduce engine (20X faster*)
  • Hadoop Map/Reduce on live, in-memory data
*in benchmark testing
[Diagram: ScaleOut StateServer In-Memory Data Grid, GridService ×4]
ScaleOut Software Products
ScaleOut Analytics Server stores and analyzes “live” data:
• In-memory storage holds live data sets which are continuously updated and accessed within operational systems.
  • Examples: stock ticker data, business rules, order & inventory data
• Integrated analytics engine tracks important patterns & trends.
• Data-parallel analysis delivers results in msec. to seconds.
IMDGs Perform Real-Time Analytics
Integrate analysis into a stock trading platform:
• The IMDG holds market data and hedging strategies.
• Updates to market data continuously flow through the IMDG.
• The IMDG performs repeated map/reduce analysis on hedging strategies and alerts traders in real time.
• The IMDG automatically and dynamically scales its throughput as servers are added to handle new hedging strategies.
Example in Financial Services
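The repeated analysis in this example can be pictured with a minimal Python sketch. It does not use ScaleOut's API: the plain dictionaries standing in for the IMDG, the `HedgingStrategy` fields, and the `map_strategy`, `reduce_alerts`, and `analyze` functions are all hypothetical names chosen for illustration. Each strategy is evaluated against the latest prices (map), the per-strategy results are merged into one alert list (reduce), and the same pass is rerun after a market update.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class HedgingStrategy:
    # Hypothetical strategy: alert when a symbol's price leaves [low, high].
    name: str
    symbol: str
    low: float
    high: float


def map_strategy(strategy, prices):
    """Map step: evaluate one strategy against the current market data."""
    price = prices.get(strategy.symbol)
    if price is None or strategy.low <= price <= strategy.high:
        return []
    return [(strategy.name, strategy.symbol, price)]


def reduce_alerts(left, right):
    """Reduce step: merge two partial alert lists."""
    return left + right


def analyze(strategies, prices):
    """One map/reduce pass over all strategies held in memory."""
    mapped = (map_strategy(s, prices) for s in strategies.values())
    return reduce(reduce_alerts, mapped, [])


# Plain dicts stand in for the in-memory data grid (an assumption of this sketch).
strategies = {
    "s1": HedgingStrategy("s1", "ABC", low=10.0, high=20.0),
    "s2": HedgingStrategy("s2", "XYZ", low=50.0, high=60.0),
}
prices = {"ABC": 15.0, "XYZ": 55.0}

first_pass = analyze(strategies, prices)   # every price is still in range
prices["XYZ"] = 61.5                       # a market update arrives
second_pass = analyze(strategies, prices)  # the same analysis runs again
print(first_pass, second_pass)
```

In a real grid the map step would run in parallel on the servers that hold each strategy; the sketch runs it sequentially only to keep the example self-contained.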
Example Uses
Online loan apps & banking
Portfolio management
Trading systems
Reservations systems
Ecommerce shopping
Customer service sites
Streaming entertainment
Configuration engines
Gaming
Customers
• 400 unique customers
• 35 Fortune 500 customers
• 32 countries
• 9,000 servers licensed
• 50% have multiple deployments
Customer segments (% in $$s)
• Entertain. & Commun.: 13%
• Financial & Insurance: 26%
• Ecommerce Sales: 17%
• Ecommerce Services: 19%
• Travel & Transport.: 4%
• Gov't & Education: 10%
• Software: 8%
• Other: 3%
• In-Memory Data Grids have become key in several fast-growth markets.
• Drivers:
• Cloud computing / virtualization
• Hardware enablement
• Competitive pressure
• Exploding workloads
• Big data analysis
• ScaleOut addresses scalability and analytics.
IMDGs Seeing Wide Adoption
Market sizes:
• Big Data Analytics: $18B (1)
• Enterprise Software: $292B (2)
• HPC / Grid Computing: $25B (3)
• In-Memory Data Grids: $355M (4)
Sources: (1) Wikibon 2013; (2) Gartner 2010, rolled forward to 2013; (3) Market Research Media 2015, rolled back to 2013; (4) Gartner 2011, rolled forward to 2013
Big Data Analytics $18B
Analytics Market
Batch “Business Intelligence” (Hadoop, IBM, Teradata, SAS, SAP):
• Static data sets
• Petabytes
• Disk storage
• Hours to minutes
• Best uses:
  • Analyzing warehoused data
  • Mining for long-term trends
Real-time “Operational Intelligence” (Analytics Server, hServer):
• Live data sets
• Gigabytes to terabytes
• In-memory storage
• Minutes to seconds
• Best uses:
  • Tracking live data
  • Immediately identifying trends and capturing opportunities
Run continuous Hadoop on live data, while it’s being updated.
Accelerate Hadoop on static data with a one-line code change.
Quickly prototype Hadoop code.
ScaleOut hServer Targeted Use Cases
“Capture perishable business opportunities and identify issues.”
• Real-time risk analysis
• Credit card fraud detection
• ...
“Speed-up Hadoop execution by >10X for faster business insights.”
• Process simulations
• Financial modeling
• ...
“Validate your Hadoop code before it goes into batch processing.”
• Fast-turn debug and tuning
• No need to install Hadoop stack
• ...
• Hadoop is typically used for very large, static, offline datasets.
• Data must be copied from disk-based storage (e.g., HDFS) into memory for analysis.
• Hadoop Map/Reduce adds lengthy batch scheduling overhead.
Problem: Hadoop Cannot Efficiently Perform Real-Time Analytics
Benefits:
• Enables real-time analysis using Hadoop M/R APIs.
• Accelerates data access by staging data in memory.
• Eliminates batch scheduling and data shuffling overheads of standard Hadoop distributions.
• Analyzes “live” data.
• Allows Hadoop M/R programs to run without change.
• Eliminates complexity in Hadoop deployment.
• Enables rapid prototyping.
Solution: Integrate Hadoop M/R into In-Memory Data Grid
Enables Hadoop Map/Reduce to perform real-time analysis:
• Adds full Map/Reduce engine to the ScaleOut Analytics Server (SOAS) IMDG.
  • Delivers results in msec. to seconds instead of minutes or hours.
  • Benchmark results show 20X speedup.
• Has flexible options for data storage/access:
  • Hadoop programs can access/store key/value pairs using either IMDG or HDFS.
  • Automatically caches HDFS data in IMDG for fast access.
• Allows dynamic updates to key/value pairs during analysis to support “live” data.
• Ships as open source Java library combined with SOAS IMDG.
Introducing ScaleOut hServer™ V2
• ScaleOut hServer adds Grid Record Reader for accessing key/value pairs held in the IMDG.
• Hadoop programs optionally can output results to IMDG with Grid Record Writer.
• Grid Record Reader optimizes access to key/value pairs to eliminate network overhead.
• Applications can access and update key/value pairs as operational data during analysis.
Enabling Access to IMDG Data
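The idea of a reader that avoids network overhead by reading only the key/value pairs stored on its own host can be sketched in a few lines of Python. This is a conceptual illustration, not the Grid Record Reader itself: `PartitionedGrid`, `host_for`, and `local_record_reader` are hypothetical names, and hash partitioning by CRC32 is an assumption made for the sketch.

```python
import zlib


class PartitionedGrid:
    """Toy in-memory grid: keys are hash-partitioned across `num_hosts` hosts."""

    def __init__(self, num_hosts):
        self.partitions = [{} for _ in range(num_hosts)]

    def host_for(self, key):
        # crc32 gives a hash that is stable across runs, unlike hash() on str.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self.host_for(key)][key] = value

    def get(self, key):
        return self.partitions[self.host_for(key)].get(key)


def local_record_reader(grid, host):
    """Yield the key/value pairs stored on one host, with no cross-host reads."""
    # Snapshot the items so updates made during iteration do not break the loop.
    for key, value in list(grid.partitions[host].items()):
        yield key, value


grid = PartitionedGrid(num_hosts=3)
for i in range(9):
    grid.put(f"order-{i}", i * 10)

# Each host's mapper reads only its own partition; together they cover every key.
seen = {}
for host in range(3):
    for key, value in local_record_reader(grid, host):
        seen[key] = value
        grid.put(key, value + 1)  # operational update while the analysis runs

print(len(seen), grid.get("order-4"))
```

The update inside the loop mirrors the last bullet above: the analysis sees the value that was present when it read the pair, while the grid already holds the newer one.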
• ScaleOut hServer adds Dataset Record Reader (wrapper) to cache HDFS data during program execution.
• Hadoop automatically retrieves data from ScaleOut IMDG on subsequent runs.
• Dataset Record Reader stores and retrieves data with minimum network and memory overheads.
• Tests with the TeraSort benchmark have demonstrated 11X lower access latency than HDFS without the IMDG.
Enabling Fast Access to HDFS Data
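The wrapper pattern described here (read from the file system on the first run, serve from memory afterwards) can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the Dataset Record Reader's implementation: `SlowFileReader`, `CachingRecordReader`, and the `dataset_id` key are hypothetical, and a plain dict stands in for the IMDG.

```python
class SlowFileReader:
    """Stand-in for a disk-based record reader (e.g., one reading HDFS splits)."""

    def __init__(self, records):
        self.records = records
        self.reads = 0  # counts how many times the underlying source is scanned

    def read(self):
        self.reads += 1
        yield from self.records


class CachingRecordReader:
    """Wrapper: the first run reads the source and caches it; later runs use the cache."""

    def __init__(self, source, cache, dataset_id):
        self.source = source
        self.cache = cache            # a dict standing in for the in-memory grid
        self.dataset_id = dataset_id  # hypothetical key identifying the input

    def read(self):
        cached = self.cache.get(self.dataset_id)
        if cached is not None:
            yield from cached         # served from memory, source untouched
            return
        buffered = []
        for record in self.source.read():
            buffered.append(record)
            yield record
        # Store only after a complete pass so a partial read is never cached.
        self.cache[self.dataset_id] = buffered


cache = {}
source = SlowFileReader([("k1", "a"), ("k2", "b"), ("k3", "c")])
reader = CachingRecordReader(source, cache, dataset_id="input-0")

first_run = list(reader.read())
second_run = list(reader.read())
print(first_run == second_run, source.reads)
```

Because the wrapper exposes the same `read()` method as the reader it wraps, the calling program does not change between the first run and later runs.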
ScaleOut hServer Editions
• Offered in community and commercial editions
• Community Edition can be used for evaluation or production
• Hybrid open source / proprietary licensing
Editions                 Community         Commercial
# Servers                Up to 4           100s
Expected data set size   256GB (max)       GB - TBs
Pricing                  Free              Subscription & perpetual
Support                  Community Forum   Full support
• IMDGs help scale application performance and analyze “live” data in real time.
• Hadoop focuses on analyzing large, static (offline) datasets held in file systems.
• ScaleOut hServer V2 introduces breakthrough technology enabling Hadoop applications to perform real-time analytics:
  • Integrates Hadoop Map/Reduce engine with SOAS’s IMDG.
  • Accelerates Map/Reduce execution by 20X in benchmark tests.
  • Enables Hadoop applications to analyze “live,” in-memory data.
  • Offers flexible access to both in-memory and file-based data.
  • Eliminates complex Hadoop deployment and tuning.
  • Offers a fast, easy-to-use platform for rapid prototyping.
Summary
A few examples:
• Equity trading: to minimize risk during a trading day
• Ecommerce: to optimize real-time shopping activity
• Reservations systems: to identify issues, reroute, etc.
• Credit cards: to detect fraud in real time
• Smart grids: to optimize power distribution & detect issues
Online Systems Need Real-Time Analysis
• ScaleOut Software conducted an informal survey at the Strata 2013 Conference (Santa Clara).
• Based on 150 responses:
• 78% of organizations generate fast-changing data.
• 60% use Hadoop and 78% plan to expand usage of Hadoop within 12 months.
• Only 42% consider Hadoop to be an effective platform for real-time analysis, but…
• 93% would benefit from real-time data analytics.
• 71% consider a 10X improvement in performance meaningful.
• Take-away: Hadoop users need real-time analytics.
Hadoop Users Need Real-Time Analytics