Source path: tozsu/courses/CS848/W19/projects...
Insights of Approximate Query Processing Systems
Presented by: Huanyi Chen
Ruoxi Zhang
Agenda
§ Introduction
§ Background
§ VerdictDB & SnappyData
§ Experiment Setup
§ Evaluation
§ Insights
Why AQP?
Shop income

| Day | Income (CAD) |
|---|---|
| 1 | 150 |
| 2 | 240 |
| 3 | 180 |
| 4 | 200 |
| 5 | 230 |
| 6 | 190 |
| 7 | 180 |

Avg(Income) = 195.71
Why AQP?
Avg(Income) on a sample of the shop income rows: 186.67 (full table: 195.71)
More efficient (50% rows), accuracy > 95%
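The arithmetic behind this slide can be checked with a short sketch. The income values are the ones on the slide; the choice of sampled days (1, 3 and 5) is an assumption, picked only because it reproduces the 186.67 shown. The slide does not say which rows were drawn.

```python
from statistics import mean

# Daily shop income (CAD) from the slide, days 1..7.
income = [150, 240, 180, 200, 230, 190, 180]

# Hypothetical sample: days 1, 3 and 5. The slide does not say which rows
# were sampled; this choice merely reproduces its 186.67.
sample_days = [1, 3, 5]
sample = [income[d - 1] for d in sample_days]

exact_avg = mean(income)    # full scan over all 7 rows
approx_avg = mean(sample)   # scan over the sampled rows only

# Accuracy taken here as 1 minus the relative error against the exact answer.
accuracy = 1 - abs(exact_avg - approx_avg) / exact_avg

print(f"exact={exact_avg:.2f} approx={approx_avg:.2f} accuracy={accuracy:.1%}")
# prints: exact=195.71 approx=186.67 accuracy=95.4%
```

Reading roughly half the rows gives an answer within 5% of the exact average, which is the trade-off AQP aims for.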
Why AQP?
99.9% Identical
100x-200x Faster
Sampling Based AQP
Query Column Set (QCS): the columns a query uses for filtering and grouping (WHERE, GROUP BY, HAVING)
Why SnappyData & VerdictDB?
| Name | Online/Offline | Distributed/Standalone | Platform | Algorithm | Skewed |
|---|---|---|---|---|---|
| BlinkDB | Offline | Distributed | Hive/Hadoop (Shark) | Stratified sampling | Yes |
| Sapprox | Online | Distributed | Hadoop | Distribution-aware online sampling | No |
| Approxhadoop | Online | Distributed | Hadoop | Approximation-enabled MapReduce | No |
| Quickr | Online | Distributed | N/A | ASALQA algorithm | No |
| SnappyData | Online | Distributed | Spark and GemFire | Spark as a computational engine; GemFire as transactional store | No |
| FluoDB | Online | Distributed | Spark | Mini-batch execution OLA model | No |
| XDB | Online | Standalone | PostgreSQL | Wander join | No |
| VerdictDB | Online | Standalone | Spark SQL | Database learning | No |
| IDEA | Online | Standalone | N/A | Reuse answers of past overlapping queries for new query | No |
| BEAS | Online | Standalone | Commercial DBMS | Approximability theorem | No |
| ABS | Online | Standalone | N/A | Bootstrap | No |

• Spark
• Open-source*
SnappyData
SDE is NOT open source
SnappyData
+ WITH ERROR
QCS
FRACTION
VerdictDB
Experiment Setup
§ Cluster Setup
  § SnappyData: 1 locator, 1 lead, and 2 servers
  § VerdictDB on Spark: 1 master and 2 executors
§ Each Node
  § 24/32 GB memory used
  § 500 GB HDD
Experiment Setup
§ TPC-H Benchmark
  § OLAP
  § 22 queries, including aggregation, join, etc.
  § Well known and standard
  § Customizable
§ Data
  § 1 GB and 10 GB
  § Uniformly distributed
Evaluation
SnappyData
• Stratified Sampling
• In-memory
VerdictDB
• Uniform Sampling
• Not in-memory (bug?)
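The difference between the two sampling strategies can be sketched as follows. This is a generic illustration, not the implementation used by SnappyData or VerdictDB: the per-stratum cap follows the usual stratified-sampling idea, and the table, the `region` column, and the function names are all hypothetical.

```python
import random
from collections import defaultdict

def uniform_sample(rows, fraction, rng):
    """Keep each row independently with probability `fraction`."""
    return [r for r in rows if rng.random() < fraction]

def stratified_sample(rows, qcs, cap, rng):
    """Keep at most `cap` rows per distinct value of the QCS columns.

    Each kept row carries a weight (stratum size / rows kept) so that
    aggregates can be scaled back up to the full table.
    """
    strata = defaultdict(list)
    for r in rows:
        strata[tuple(r[c] for c in qcs)].append(r)
    sample = []
    for group in strata.values():
        k = min(cap, len(group))
        weight = len(group) / k
        for r in rng.sample(group, k):
            sample.append({**r, "weight": weight})
    return sample

rng = random.Random(0)
# Hypothetical skewed table: one large group and one rare group.
rows = [{"region": "big", "income": 100} for _ in range(10_000)]
rows += [{"region": "rare", "income": 500} for _ in range(5)]

uni = uniform_sample(rows, 0.01, rng)
strat = stratified_sample(rows, ["region"], cap=50, rng=rng)

print("uniform sample regions:   ", sorted({r["region"] for r in uni}))
print("stratified sample regions:", sorted({r["region"] for r in strat}))
print("estimated COUNT(*):", sum(r["weight"] for r in strat))
```

A 1% uniform sample can easily miss the five "rare" rows altogether, while the stratified sample always keeps every group that appears in the QCS columns.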
SnappyData - Latency
Execution time (ms) using TPC-H (SF=10, fraction 0.1)

| Query | SnappyData | SnappyData_AQP (>95% accuracy) |
|---|---|---|
| Q1 | 29439 | 8092 |
| Q6 | 1832 | 1399 |
| Q14 | 6629 | 3870 |

Q1: Up to 3.6x speedup, ~0.0001 error
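The speedup figure can be recomputed from the bar labels. The split of the fused labels into per-query values is a reading of the chart residue, not something the slide states outright; it is consistent with the 3.6x quoted for Q1.

```python
# Execution times in ms as read from the bar labels
# (TPC-H, SF=10, sample fraction 0.1). The per-query split is an assumption.
exact_ms = {"Q1": 29439, "Q6": 1832, "Q14": 6629}
aqp_ms = {"Q1": 8092, "Q6": 1399, "Q14": 3870}

# Speedup = exact execution time / approximate execution time.
speedup = {q: exact_ms[q] / aqp_ms[q] for q in exact_ms}

for q, s in speedup.items():
    print(f"{q}: {s:.2f}x")
# prints:
# Q1: 3.64x
# Q6: 1.31x
# Q14: 1.71x
```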
SnappyData - Accuracy
Base Table, Sample Tables (fraction 0.01, fraction 0.1, ...)
[Figure: Actual error for TPC-H Q14 result (SF=10) given different sample tables (fraction 0.01, 0.1, 0.2, 0.3); axis 0.00 to 0.02]
[Figure: Time (ms) for TPC-H Q14 result (SF=10) given different sample tables (fraction 0.01, 0.1, 0.2, 0.3, and Snappy); axis 0 to 7,000]
SnappyData - Creating Sample Tables
[Figure: Time (ms) for creating SnappyData sample tables with different fractions (0.01, 0.1, 0.2, 0.3); axis 0 to 250,000]
VerdictDB - Latency
Execution time (ms) using TPC-H (SF=10, fraction 0.1)

| Query | SparkSQL | VerdictDB (> 95% accuracy) |
|---|---|---|
| Q1 | 206034 | 17598 |
| Q6 | 88912 | 15210 |
| Q14 | 99195 | 24355 |

Up to ~11x speedup!
VerdictDB - Speedup
[Figure: Speedup for TPC-H (SF=10, fraction=0.1) for Q1, Q6, Q14; axis 0 to 14]
[Figure: Speedup for TPC-H (SF=1, fraction=0.1) for Q1, Q6, Q14; axis 0 to 12]
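Since the bar heights are not recoverable from the transcript, the SF=10 speedups can be approximated from the VerdictDB latency labels on the previous slide. The per-query split of those fused labels is an assumption, and the SF=1 values cannot be reconstructed this way because no SF=1 latencies are given.

```python
# Execution times in ms as read from the latency slide
# (TPC-H, SF=10, sample fraction 0.1). The per-query split is an assumption.
sparksql_ms = {"Q1": 206034, "Q6": 88912, "Q14": 99195}
verdictdb_ms = {"Q1": 17598, "Q6": 15210, "Q14": 24355}

verdict_speedup = {q: sparksql_ms[q] / verdictdb_ms[q] for q in sparksql_ms}
best_query = max(verdict_speedup, key=verdict_speedup.get)

for q, s in verdict_speedup.items():
    print(f"{q}: {s:.1f}x")
print("largest speedup:", best_query)
# prints:
# Q1: 11.7x
# Q6: 5.8x
# Q14: 4.1x
# largest speedup: Q1
```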
VerdictDB - Creating Sample Tables
[Figure: Time (ms) for creating VerdictDB sample tables with different fractions (0.01, 0.1, 0.2, 0.3); axis 0 to 900,000]
VerdictDB - Accuracy
Base Table, Sample Tables (fraction 0.01, fraction 0.1, ...)
[Figure: Actual error for TPC-H Q14 result (SF=10) given different sample tables (fraction 0.01, 0.05, 0.1, 0.2, 0.3); axis 0 to 0.25]
[Figure: Time (ms) for TPC-H Q14 result (SF=10) given different sample tables (fraction 0.01, 0.05, 0.1, 0.2, 0.3, and SparkSQL); axis 0 to 120,000]
converge!
Other Queries?
Q19
Error: ~80%
Speedup: ~5.5X
Q14
Error: ~1.7%
Speedup: ~1.7X
Other Queries?
Q7: AQP not working!
Key missing in sample tables!
Careful design of the sample table or the original table is needed!
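One way keys go missing is when a query joins tables that were sampled independently. The sketch below is an illustration under assumed data, not a reproduction of Q7: the `customers` and `orders` tables, their sizes, and the `custkey` column are hypothetical.

```python
import random

def uniform_sample(rows, fraction, rng):
    """Keep each row independently with probability `fraction`."""
    return [r for r in rows if rng.random() < fraction]

def join_count(left, right):
    """Count rows of `right` whose custkey also appears in `left`."""
    keys = {r["custkey"] for r in left}
    return sum(1 for r in right if r["custkey"] in keys)

rng = random.Random(42)
# Hypothetical tables: every order references an existing customer key.
customers = [{"custkey": k} for k in range(1_000)]
orders = [{"orderkey": i, "custkey": i % 1_000} for i in range(10_000)]

fraction = 0.1
cust_s = uniform_sample(customers, fraction, rng)
ord_s = uniform_sample(orders, fraction, rng)

full = join_count(customers, orders)   # every order finds its customer
sampled = join_count(cust_s, ord_s)    # most sampled orders lose their customer
missing = {r["custkey"] for r in ord_s} - {r["custkey"] for r in cust_s}

print(f"full join rows: {full}")
print(f"sampled join rows: {sampled} (ratio {sampled / full:.4f})")
print(f"customer keys referenced by sampled orders but absent: {len(missing)}")
```

With both sides sampled at 10%, the join keeps on the order of 1% of the rows rather than 10%, so scaling the result by the sampling fraction no longer recovers the true answer.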
Insights
§ AQP performs well:
  § For aggregate functions such as SUM, AVG and COUNT
  § When WHERE is simple
§ Users' foresight is important!
  § for both query and original table
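A minimal sketch of why SUM, AVG and COUNT suit sampling: each can be estimated from a uniform sample by simple scaling. The table contents and the 10% fraction are assumptions made for illustration; neither system's estimator is reproduced here.

```python
import random

rng = random.Random(7)
# Hypothetical base table: 100,000 income values.
table = [rng.uniform(100, 300) for _ in range(100_000)]

fraction = 0.1
sample = [x for x in table if rng.random() < fraction]

exact = {
    "COUNT": len(table),
    "SUM": sum(table),
    "AVG": sum(table) / len(table),
}
approx = {
    "COUNT": len(sample) / fraction,    # scale the sample count up
    "SUM": sum(sample) / fraction,      # scale the sample sum up
    "AVG": sum(sample) / len(sample),   # the sample mean needs no scaling
}

rel_err = {k: abs(approx[k] - exact[k]) / exact[k] for k in exact}
for name in exact:
    print(f"{name}: exact={exact[name]:.1f} approx={approx[name]:.1f} "
          f"rel_err={rel_err[name]:.4f}")
```

Aggregates such as MIN and MAX have no comparable scaling rule, since the extreme row may simply not be in the sample.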
Future Work
§ Test error estimation in sampling
§ Other sampling techniques
  § Biased Sampling
§ Database learning
§ Approximate hardware