Benchmarking “No One Size Fits All” Big Data Analytics
![Page 1: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/1.jpg)
Benchmarking “No One Size Fits All” Big Data Analytics
BigFrame Team
The Hong Kong Polytechnic University
Duke University
HP Labs
![Page 2: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/2.jpg)
Analytics System Landscape
• MPP DB
  o Greenplum, SQL Server PDW, Teradata, etc.
• Columnar
  o Vertica, Redshift, Vectorwise, etc.
• MapReduce
  o Hadoop, Hive, HadoopDB, Tenzing, etc.
• Streaming
  o Storm, Streambase, etc.
• Graph
  o Pregel, GraphLab, etc.
• Multi-tenancy
  o Mesos, Yarn, etc.
![Page 3: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/3.jpg)
What does this mean for Big Data Practitioners?
![Page 4: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/4.jpg)
Gives them a lot of power!
![Page 5: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/5.jpg)
Even the mighty may need a little help
![Page 6: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/6.jpg)
Challenges for Practitioners
Which system to use for the app that I am developing?
• Features (e.g., graph data)
• Performance (e.g., claims like System A is 50x faster than B)
• Resource efficiency
• Growth and scalability
• Multi-tenancy
App Developers, Data Scientists
![Page 7: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/7.jpg)
Challenges for Practitioners
Which system to use for the app that I am developing?
Different parts of my app have different requirements
Compose "best of breed" systems, or use a "one size fits all" system?
App Developers, Data Scientists
![Page 8: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/8.jpg)
Challenges for Practitioners
Which system to use for the app that I am developing?
Different parts of my app have different requirements
Managing many systems is hard!
App Developers, Data Scientists
System Admins, CIO
Total Cost of Ownership (TCO)?
![Page 9: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/9.jpg)
Need Benchmarks
![Page 10: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/10.jpg)
One Approach
Categorize systems
Develop a benchmark per system category
![Page 11: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/11.jpg)
Useful, But ...
• MPP DB, Columnar
  o TPC-H/TPC-DS, Berkeley Big Data Benchmark, etc.
• MapReduce
  o Terasort, DFSIO, GridMix, HiBench, etc.
• Streaming
  o Linear Road, etc.
• Graph
  o Graph 500, PageRank, etc.
• ...
![Page 12: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/12.jpg)
Problem: May miss the Big Picture
![Page 13: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/13.jpg)
Problem: May miss the Big Picture
• Cannot capture the complexities and end-to-end behavior of big data applications and deployments:
  o Bottlenecks
  o Data conversion, transfer, & loading overheads
  o Storage costs & other parts of the data life-cycle
  o Resource management challenges
  o Total Cost of Ownership (TCO)
![Page 14: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/14.jpg)
A Better Approach:
BigBench or Deep Analytics Pipeline:
• Application-driven
• Involves multiple types of data:
  o Structured
  o Semi-structured
  o Unstructured
• Involves multiple types of operators:
  o Relational operators: join, group by
  o Text analytics: sentiment analysis
  o Machine learning
![Page 15: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/15.jpg)
Problem:
Give a man a fish and you will feed him for a day.
Give him fishing gear and you will feed him for life.
-- Anonymous
[Figure: "Benchmark" versus "Benchmark Generator"]
![Page 16: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/16.jpg)
BigFrame
A Benchmark Generator for Big Data Analytics
![Page 17: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/17.jpg)
How a user uses BigFrame
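[Figure: BigFrame workflow diagram]
• Components: BigFrame Interface, Benchmark Generator, Benchmark Driver for System Under Test, System Under Test
• Systems pictured: Hive, MapReduce, HBase
• Artifacts: bigif (benchmark input format), bigspec (benchmark specification), result
• Action: run the benchmark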
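The workflow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the flow is bigif, then Benchmark Generator, then bigspec, then a system-specific Benchmark Driver that returns a result. Every class, function, and field name here (`BigIf`, `BigSpec`, `generate_benchmark`, `run_benchmark`, `fake_hive_driver`) is hypothetical and is not BigFrame's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names below are hypothetical illustrations, not BigFrame's real API.


@dataclass
class BigIf:
    """Benchmark input format: what the user asks for."""
    data_variety: List[str]
    query_variety: str
    data_volume_gb: int


@dataclass
class BigSpec:
    """Benchmark specification: what the generator produces."""
    datasets: List[str]
    queries: List[str]


def generate_benchmark(bigif: BigIf) -> BigSpec:
    """Toy generator: derives dataset and query names from the request."""
    datasets = [f"{v}_data_{bigif.data_volume_gb}GB" for v in bigif.data_variety]
    queries = [f"{bigif.query_variety}_query_over_{v}" for v in bigif.data_variety]
    return BigSpec(datasets=datasets, queries=queries)


def run_benchmark(spec: BigSpec, driver: Callable[[str], float]) -> Dict[str, float]:
    """Pass each query to a system-specific driver and collect one metric per query."""
    return {query: driver(query) for query in spec.queries}


def fake_hive_driver(query: str) -> float:
    """Stand-in for a real driver: returns a made-up number, not a measurement."""
    return float(len(query))


spec = generate_benchmark(BigIf(["relational", "text"], "macro", 100))
result = run_benchmark(spec, fake_hive_driver)
print(spec.datasets)
print(sorted(result))
```

Swapping `fake_hive_driver` for another callable is the point of the design: the specification stays the same while the system under test changes.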
![Page 18: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/18.jpg)
bigspec: Benchmark Specification
[Figure labels: Hive, MapReduce, HBase]
![Page 19: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/19.jpg)
What should be captured by the benchmark input format
• The 3Vs
  o Volume
  o Velocity
  o Variety
![Page 20: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/20.jpg)
bigif: BigFrame's InputFormat
![Page 21: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/21.jpg)
Benchmark Generation
[Figure: bigif (benchmark input format), Benchmark Generator, bigspec (benchmark specification)]
bigif describes points in a discrete space of {Data, Query} × {Variety, Volume, Velocity}
1. Initial data to load
2. Data refresh pattern
3. Query streams
4. Evaluation metrics
Benchmark generation can be addressed as a search problem within a rich application domain
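To make the "discrete space" and "search problem" framing concrete, here is a minimal sketch that enumerates {Data, Query} × {Variety, Volume, Velocity} and filters it with a predicate. The six axes come from the slide; the levels on each axis are assumptions chosen for illustration (the Query Volume and Query Velocity levels in particular are pure placeholders), and exhaustive filtering stands in for whatever search BigFrame actually performs.

```python
from itertools import product

SUBJECTS = ("Data", "Query")
PROPERTIES = ("Variety", "Volume", "Velocity")

# The six axes of the space: ("Data", "Variety"), ("Data", "Volume"), ...
AXES = list(product(SUBJECTS, PROPERTIES))

# Hypothetical discrete levels per axis (not BigFrame's actual vocabulary).
LEVELS = {
    ("Data", "Variety"): ("relational", "relational+text"),
    ("Data", "Volume"): ("small", "large"),
    ("Data", "Velocity"): ("static", "fast"),
    ("Query", "Variety"): ("micro", "macro", "continuous"),
    ("Query", "Volume"): ("low", "high"),
    ("Query", "Velocity"): ("low", "high"),
}


def all_points():
    """Yield every point in the space as a dict from axis to chosen level."""
    for combo in product(*(LEVELS[axis] for axis in AXES)):
        yield dict(zip(AXES, combo))


def search(predicate):
    """Brute-force search: keep the points that satisfy the user's requirements."""
    return [point for point in all_points() if predicate(point)]


# Example requirement: fast-arriving data with continuous queries.
dashboard_points = search(
    lambda p: p[("Data", "Velocity")] == "fast"
    and p[("Query", "Variety")] == "continuous"
)
print(len(AXES), len(dashboard_points))
```

Each returned point is one candidate bigif; a real generator would then expand the chosen point into data, refresh patterns, query streams, and metrics.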
![Page 22: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/22.jpg)
Application Domain Modeled Currently
E-commerce sales, promotions, recommendations
Social media sentiment & influence
Benchmark generation can be addressed as a search problem within a rich application domain
![Page 23: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/23.jpg)
Application Domain Modeled Currently
![Page 24: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/24.jpg)
Application Domain Modeled Currently
[Figure: relational schema including the tables Item, Web_sales, and Promotion]
![Page 25: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/25.jpg)
Application Domain Modeled Currently
![Page 26: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/26.jpg)
Use Case 1: Exploratory BI
• Large volumes of relational data
• Mostly aggregations and few joins
• Can Spark's performance match that of an MPP DB?
BigFrame will generate a benchmark specification containing relational data and (SQL-ish) queries
Data Variety = {Relational}
Query Variety = {Micro}
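As an illustration of the kind of "SQL-ish" micro query this use case has in mind (mostly aggregation, a single join), here is a small sketch using Python's built-in `sqlite3`. The table names `item` and `web_sales` echo the application domain shown earlier, but the columns and rows are invented for this example and the query is not taken from BigFrame.

```python
import sqlite3

# In-memory database with a toy schema; columns and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE web_sales (item_id INTEGER, quantity INTEGER, price REAL);
    INSERT INTO item VALUES (1, 'books'), (2, 'music');
    INSERT INTO web_sales VALUES (1, 2, 10.0), (1, 1, 10.0), (2, 3, 5.0);
""")

# One join, one aggregation: revenue per item category.
rows = conn.execute("""
    SELECT i.category, SUM(s.quantity * s.price) AS revenue
    FROM web_sales AS s
    JOIN item AS i ON i.item_id = s.item_id
    GROUP BY i.category
    ORDER BY i.category
""").fetchall()

print(rows)  # [('books', 30.0), ('music', 15.0)]
conn.close()
```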
![Page 27: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/27.jpg)
Use Case 2: Complex BI
• Large volumes of relational data
• Even larger volumes of text data
• Combined analytics
Data Variety = {Relational, text}
Query Variety = {Macro} (application-focused instead of micro-benchmark)
BigFrame will generate a benchmark specification that includes sentiment analysis tasks over tweets
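A toy version of such a combined workload might look like the following sketch, which scores tweets with a tiny word lexicon and joins the scores with relational sales figures. The lexicon, the sample data, and the scoring rule are all assumptions made for illustration; the slides do not say which sentiment method BigFrame's generated benchmarks use.

```python
from collections import defaultdict

# Hypothetical mini-lexicon; a real workload would use a proper sentiment model.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "broken"}


def sentiment(text: str) -> int:
    """Score = number of positive words minus number of negative words."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Relational side: units sold per item (made-up numbers).
sales = {"camera": 120, "phone": 300}

# Text side: (item mentioned, tweet text) pairs (made-up tweets).
tweets = [
    ("camera", "love this camera great photos"),
    ("camera", "battery is bad"),
    ("phone", "hate the broken screen"),
]

# Aggregate sentiment per item, then combine with the sales figures.
scores = defaultdict(int)
for item, text in tweets:
    scores[item] += sentiment(text)

report = {item: (units, scores[item]) for item, units in sales.items()}
print(report)  # {'camera': (120, 1), 'phone': (300, -2)}
```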
![Page 28: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/28.jpg)
Use Case 3: Dashboards
• Large volume and velocity of relational and text data
• Continuously-updated Dashboards
Data Velocity = Fast
Query Variety = Continuous (as opposed to exploratory)
BigFrame will generate a benchmark specification that includes data refresh as well as continuous queries whose results change upon data refresh
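The idea of a continuous query whose result changes on each data refresh can be sketched as an incrementally maintained aggregate. The class and method names below (`RunningSalesDashboard`, `refresh`, `result`) are hypothetical, and the example says nothing about how BigFrame or any particular system implements refresh.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class RunningSalesDashboard:
    """Continuous query: total sales amount per item, updated on every refresh."""

    def __init__(self) -> None:
        self._totals: Dict[str, float] = defaultdict(float)

    def refresh(self, batch: Iterable[Tuple[str, float]]) -> None:
        """Apply a batch of newly arrived (item, amount) records."""
        for item, amount in batch:
            self._totals[item] += amount

    def result(self) -> Dict[str, float]:
        """Return the current query result as a plain dict."""
        return dict(self._totals)


dashboard = RunningSalesDashboard()

dashboard.refresh([("camera", 100.0), ("phone", 50.0)])
first = dashboard.result()

dashboard.refresh([("camera", 25.0)])
second = dashboard.result()

print(first)   # {'camera': 100.0, 'phone': 50.0}
print(second)  # {'camera': 125.0, 'phone': 50.0}
```

A benchmark for this use case would measure how quickly `result()` reflects each refresh, not just how fast a one-off query runs.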
![Page 29: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/29.jpg)
Working with the community
• First release of BigFrame planned for August 2013
  o Open source with extensibility APIs
• Benchmark Driver for more systems
• Utilities (accessed through the Benchmark Driver to drill down into system behavior during benchmarking)
• Instantiate the BigFrame pipeline for more app domains
![Page 30: Benchmarking “No One Size Fits All” Big Data Analytics](https://reader036.fdocuments.in/reader036/viewer/2022062521/56816766550346895ddc4ab4/html5/thumbnails/30.jpg)
Take Away
• Benchmarks shape a field (for better or worse); they are how we determine the value of change.
  -- David Patterson, University of California, Berkeley, 1994
• Benchmarks meet different needs for different people
  o End customers, application developers, system designers, system administrators, researchers, CIOs
• BigFrame helps users generate benchmarks that best meet their needs