A Thin Monitoring Layer for Top-k Aggregation Queries over a Database
Foteini Alvanaki, Sebastian Michel
Saarland University
DBRank 2013, Riva Del Garda, Italy
Data Cube
[Figure: data cube over the dimensions Brand, Country, and Product Type, with measure sum(price*quantity)]
Data Cube
Brand    Product Type   Country    sum(Price*Quantity)
Brand1   Type1          Country1   1234
Brand1   Type2          Country1   3522
Brand1   Type1          ANY        1234
Brand1   Type2          ANY        3522
Brand1   ANY            Country1   4756
ANY      Type1          Country1   1234
ANY      Type2          Country1   3522
Brand1   ANY            ANY        4756
ANY      Type1          ANY        1234
ANY      Type2          ANY        3522
ANY      ANY            Country1   4756
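The aggregated rows of the cube can be derived from the two base rows by summing over every subset of the three dimensions. The following Python sketch illustrates this; the function name `data_cube` and the `ANY` placeholder for an aggregated-away dimension are assumptions made for illustration, not names from the talk.

```python
from collections import defaultdict
from itertools import product

ANY = "ANY"  # assumed placeholder for a dimension that is aggregated away

def data_cube(rows):
    """Sum revenue for every combination of kept/dropped dimensions.

    rows: iterable of (brand, product_type, country, revenue) tuples,
    where revenue stands for price*quantity.
    """
    cube = defaultdict(int)
    for brand, ptype, country, revenue in rows:
        # Each dimension is either kept or replaced by ANY: 2^3 cells per row.
        for cell in product((brand, ANY), (ptype, ANY), (country, ANY)):
            cube[cell] += revenue
    return dict(cube)

base = [("Brand1", "Type1", "Country1", 1234),
        ("Brand1", "Type2", "Country1", 3522)]
cube = data_cube(base)
print(cube[("Brand1", ANY, "Country1")])  # 4756
```

The sketch also produces the grand total over all three dimensions, which the slide does not list.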
1. What are the top-2 product types with the highest revenue of each brand in each country?
2. What are the top-2 brands with the highest revenue in each country?
Top-k Queries
• Primary Attribute: the attribute/dimension over which the selection is performed (e.g. product type)
• Secondary Attributes: used to filter specific results (e.g. brand, country)
• Aggregated Attributes: used to compute an aggregated score (e.g. price, quantity)
• Aggregate Function: e.g. sum
One top-k query for each combination of secondary attribute instances (filtering condition)
Filtering Conditions: Example (1)
brand = {X} - country = {Y, W}
brand=X AND country=Y
brand=X AND country=W

SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X AND country=Y
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT K

SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X AND country=W
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT K
Filtering Conditions: Example (2)
country = {Y, W} - brand = {X}
country=Y
country=W
brand=X

SELECT type, SUM(price*quantity)
FROM relation
WHERE country=Y
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT K

SELECT type, SUM(price*quantity)
FROM relation
WHERE country=W
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT K

SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT K
Filtering Conditions: Example (3)
country = {Y, W} - brand = {X}

SELECT type, SUM(price*quantity)
FROM relation
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT K
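Taken together, the three examples cover every filtering condition over brand = {X} and country = {Y, W}: each secondary attribute is either fixed to one of its instances or left unconstrained. A minimal Python sketch of that enumeration follows. The helper names `filtering_conditions` and `topk_query` are hypothetical, and the SQL text is built by string formatting purely for display; real code would use bound parameters.

```python
from itertools import product

def filtering_conditions(domains):
    """Yield every filtering condition as a dict {attribute: value}.

    Each secondary attribute is either left unconstrained (None) or fixed
    to one of its instances.
    """
    attrs = list(domains)
    for combo in product(*[[None, *domains[a]] for a in attrs]):
        yield {a: v for a, v in zip(attrs, combo) if v is not None}

def topk_query(condition, k):
    """Build the top-k SQL text for one filtering condition (display only)."""
    where = " AND ".join(f"{a}='{v}'" for a, v in condition.items())
    return ("SELECT type, SUM(price*quantity) FROM relation "
            + (f"WHERE {where} " if where else "")
            + "GROUP BY type ORDER BY SUM(price*quantity) DESC "
            + f"LIMIT {k}")

conditions = list(filtering_conditions({"brand": ["X"], "country": ["Y", "W"]}))
for condition in conditions:
    print(topk_query(condition, 2))
```

For this domain the sketch yields six conditions: two from Example (1), three from Example (2), and the unfiltered query of Example (3).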
Updates
Insertions to the underlying database that contain all information related to the top-k queries
INSERT INTO relation (type, brand, country, price, quantity)
VALUES (T, X, Y, 100, 3)
Problem
How to maintain all these queries in the presence of fast updates?
Outline
• Setting/Problem
• Algorithms
  – Naïve Approach
  – Estimates Approach
  – Groups Approach
• Experimental Results
• Conclusions
Example
SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X AND country=Y
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT 2
Update: (type, X, Y, 300)
Naïve Approach
Case 1: type in the top-2, e.g. (B,X,Y,300)
Before the update:
Type  Score
A     3452
B     2406  (+300)
After the update:
Type  Score
A     3452
B     2706
Case 2: type NOT in the top-2, e.g. (K,X,Y,300)
Verification Query:
SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X AND country=Y AND type=K
GROUP BY type
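The two cases of the Naïve Approach can be sketched in Python against an in-memory SQLite database. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `naive_update`, the table layout, and the seed rows are made up, all inserted values are assumed positive, and every update is assumed to match the ranking's filtering condition brand=X AND country=Y.

```python
import sqlite3

def naive_update(conn, topk, k, update):
    """Apply one insertion to a single top-k ranking.

    topk maps type -> exact score. Returns True if a verification
    query had to be sent to the database.
    """
    ptype, brand, country, price, quantity = update
    conn.execute("INSERT INTO relation VALUES (?, ?, ?, ?, ?)", update)
    if ptype in topk:
        # Case 1: the type is already in the top-k, so add the value in memory.
        topk[ptype] += price * quantity
        return False
    # Case 2: the type is unknown, so ask the database for its exact score.
    (score,) = conn.execute(
        "SELECT SUM(price*quantity) FROM relation "
        "WHERE brand=? AND country=? AND type=?",
        (brand, country, ptype)).fetchone()
    if len(topk) < k:
        topk[ptype] = score
    else:
        weakest = min(topk, key=topk.get)
        if score > topk[weakest]:
            del topk[weakest]
            topk[ptype] = score
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relation (type TEXT, brand TEXT, country TEXT, "
             "price INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO relation VALUES (?, ?, ?, ?, ?)",
                 [("A", "X", "Y", 3452, 1), ("B", "X", "Y", 2406, 1),
                  ("K", "X", "Y", 2200, 1)])
top2 = {"A": 3452, "B": 2406}
queried_b = naive_update(conn, top2, 2, ("B", "X", "Y", 100, 3))   # Case 1
queried_k1 = naive_update(conn, top2, 2, ("K", "X", "Y", 100, 3))  # Case 2
queried_k2 = naive_update(conn, top2, 2, ("K", "X", "Y", 100, 3))  # Case 2
print(top2)
```

In this sketch every update to a type outside the top-2 costs one database round trip, whether or not the ranking changes.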
Estimates Approach
In-memory Structures
• top-(k+N) instances with exact aggregated scores
• B instances with estimated aggregated scores
• Estimated score = best possible score (basic score) + inserted values
top-5 (first two rows form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167
E     1987
Buffer:
Type  Score
O     1990
P     2112
Q     2076
R     1997
Estimates Approach
Case 1.1: type in the top-2, e.g. (B,X,Y,300)
top-5 before the update (first two rows form the top-2):
Type  Score
A     3452
B     2406  (+300)
C     2356
D     2167
E     1987
top-5 after the update:
Type  Score
A     3452
B     2706
C     2356
D     2167
E     1987
Estimates Approach
Case 1.2: type in the top-5, e.g. (D,X,Y,300)
top-5 before the update (first two rows form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167  (+300)
E     1987
top-5 after adding the value:
Type  Score
A     3452
B     2406
C     2356
D     2467
E     1987
top-5 after re-sorting:
Type  Score
A     3452
D     2467
B     2406
C     2356
E     1987
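Cases 1.1 and 1.2 need no database access, because the scores kept for the top-(k+N) instances are exact. A small Python sketch of that step, using the numbers from Case 1.2; the function name `apply_exact` is an assumption made for illustration.

```python
def apply_exact(top, k, ptype, value):
    """Add an inserted value to a type held in the top-(k+N) structure.

    top maps type -> exact score. Returns (top-k, full ranking), both as
    lists of (type, score) pairs sorted by descending score.
    """
    top[ptype] += value
    ranking = sorted(top.items(), key=lambda item: item[1], reverse=True)
    return ranking[:k], ranking

top5 = {"A": 3452, "B": 2406, "C": 2356, "D": 2167, "E": 1987}
top2, ranking = apply_exact(top5, 2, "D", 300)
print(top2)  # [('A', 3452), ('D', 2467)]
```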
Estimates Approach
Case 2: type in the Buffer, e.g. (P,X,Y,300)
top-5 (first two rows form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167
E     1987
Buffer before the update:
Type  Score
O     1990
P     2112  (+300)
Q     2076
R     1997
Buffer after the update:
Type  Score
O     1990
P     2412
Q     2076
R     1997
Verification Query:
SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X AND country=Y AND type=P
GROUP BY type
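Estimates Approach
Sub-case 2.1: score(P) < score(E), e.g. score(P) = 756
top-5 (unchanged; first two rows form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167
E     1987
Buffer with the verified score of P:
Type  Score
O     1990
P     756
Q     2076
R     1997
Buffer after removing P:
Type  Score
O     1990
Q     2076
R     1997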
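Estimates Approach
Sub-case 2.2: score(P) > score(E), e.g. score(P) = 2178
top-5 before verification (first two rows form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167
E     1987
Buffer with the verified score of P:
Type  Score
O     1990
P     2178
Q     2076
R     1997
top-5 after P replaces E:
Type  Score
A     3452
B     2406
C     2356
P     2178
D     2167
Buffer after removing P:
Type  Score
O     1990
Q     2076
R     1997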
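Estimates Approach
Sub-case 2.3: score(P) > score(B), e.g. score(P) = 2407
top-5 before verification (first two rows form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167
E     1987
Buffer with the verified score of P:
Type  Score
O     1990
P     2407
Q     2076
R     1997
top-5 after P enters the top-2:
Type  Score
A     3452
P     2407
B     2406
C     2356
D     2167
Buffer after removing P:
Type  Score
O     1990
Q     2076
R     1997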
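The three sub-cases differ only in where the verified score of P lands relative to the exact scores already held. The Python sketch below starts after the Verification Query has returned; the slides do not state what triggers that query, so the trigger is left out. The function name `resolve_verified` and the treatment of ties (a tie with the lowest score does not enter the top-(k+N)) are assumptions.

```python
def resolve_verified(top, buffer, ptype, exact):
    """Place a buffered type after its exact score has been fetched.

    top maps type -> exact score for the top-(k+N) instances; buffer maps
    type -> estimated score. Returns the ranking sorted by descending score.
    """
    buffer.pop(ptype)                  # the type leaves the buffer in all sub-cases
    lowest = min(top, key=top.get)
    if exact > top[lowest]:            # sub-cases 2.2 and 2.3
        del top[lowest]
        top[ptype] = exact
    return sorted(top.items(), key=lambda item: item[1], reverse=True)

def fresh_state():
    top = {"A": 3452, "B": 2406, "C": 2356, "D": 2167, "E": 1987}
    buffer = {"O": 1990, "P": 2412, "Q": 2076, "R": 1997}
    return top, buffer

results = {}
for label, exact in (("2.1", 756), ("2.2", 2178), ("2.3", 2407)):
    top, buffer = fresh_state()
    ranking = resolve_verified(top, buffer, "P", exact)
    results[label] = ([t for t, _ in ranking], sorted(buffer))
    print(label, results[label])
```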
Estimates Approach
Case 3: type NOT in in-memory structures, e.g. (T,X,Y,300)
Estimated Score(T) = basic score + 300 = 2287
Buffer full: Reset Query
SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X AND country=Y AND type IN (O,P,Q,R)
GROUP BY type
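Estimates Approach
Case 3: type NOT in in-memory structures, e.g. (T,X,Y,300)
Exact scores returned by the Reset Query: score(O)=1254, score(P)=432, score(Q)=2050, score(R)=1990
top-5 before the reset (first two rows form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167
E     1987
Buffer before the reset:
Type  Score
O     1990
P     2112
Q     2076
R     1997
top-5 after the reset:
Type  Score
A     3452
B     2406
C     2356
D     2167
Q     2050
Buffer after the reset:
Type  Score
T     2287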
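Case 3 can be sketched in Python as follows. The sketch assumes a buffer capacity of 4, takes the basic score as 1987 (the value implied by 2287 − 300), and stands in for the Reset Query with a callback named `fetch_exact`; all of these names and the merge rule (keep the best scores from the union of the top-(k+N) and the freshly verified buffer entries) are assumptions chosen to reproduce the slide's numbers.

```python
def handle_unseen(top, buffer, capacity, basic_score, ptype, value, fetch_exact):
    """Handle an update for a type held neither in top nor in the buffer.

    top maps type -> exact score; buffer maps type -> estimated score.
    fetch_exact(types) stands in for the Reset Query and returns the exact
    scores of the given types.
    """
    if len(buffer) >= capacity:
        exact = fetch_exact(sorted(buffer))
        merged = {**top, **exact}
        # Keep as many instances as before, chosen by exact score.
        keep = sorted(merged, key=merged.get, reverse=True)[:len(top)]
        kept = {t: merged[t] for t in keep}
        top.clear()
        top.update(kept)
        buffer.clear()
    # The new type enters the buffer with an estimated score.
    buffer[ptype] = basic_score + value

top5 = {"A": 3452, "B": 2406, "C": 2356, "D": 2167, "E": 1987}
buf = {"O": 1990, "P": 2112, "Q": 2076, "R": 1997}
exact_scores = {"O": 1254, "P": 432, "Q": 2050, "R": 1990}

handle_unseen(top5, buf, 4, 1987, "T", 300,
              lambda types: {t: exact_scores[t] for t in types})
print(top5)  # Q has replaced E
print(buf)   # {'T': 2287}
```

One Reset Query verifies the whole buffer at once, so the database is contacted once per full buffer rather than once per unseen type.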
Queries Characteristics
• SAME primary attribute
• SAME aggregate attributes
• SAME aggregate function
• SAME top-k condition
• DIFFERENT filtering condition
Lattice organisation
Groups Approach
• The updates are forwarded from top to bottom in the lattice
• Each ranking forwards the queried results to the rankings lying in lower levels in the lattice
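One way to picture the lattice is to place the unfiltered ranking at the top and link each ranking to those that fix exactly one more secondary attribute. The Python sketch below builds those links for brand = {X}, country = {Y, W}. The orientation (fewer constraints higher up) and the helper names `all_conditions` and `children` are assumptions made for illustration.

```python
from itertools import product

def all_conditions(domains):
    """Every filtering condition as a dict {attribute: value}."""
    attrs = list(domains)
    combos = product(*[[None, *domains[a]] for a in attrs])
    return [{a: v for a, v in zip(attrs, combo) if v is not None}
            for combo in combos]

def children(condition, conditions):
    """Rankings one level lower: same constraints plus exactly one more."""
    return [c for c in conditions
            if len(c) == len(condition) + 1
            and all(c.get(a) == v for a, v in condition.items())]

conds = all_conditions({"brand": ["X"], "country": ["Y", "W"]})
print(children({}, conds))              # brand=X, country=Y, country=W
print(children({"brand": "X"}, conds))  # brand=X AND country=Y / country=W
```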
Groups Approach: Example
SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT 2
Update: (type, X, Y, 300)
Ranking: brand=X, country=ANY
Groups Approach
Case 2: type in the Buffer, e.g. (P,X,Y,300)
Verification Query:
SELECT type, brand, country, price*quantity
FROM relation
WHERE brand=X AND type=P
Case 4: type NOT in in-memory structures, e.g. (T,X,Y,300)
Buffer Reset Query:
SELECT type, brand, country, price*quantity
FROM relation
WHERE brand=X AND type IN (O,P,Q,R)
Groups Approach
• Each ranking queries tuples (type, brand, country, price*quantity), limited to those satisfying its filtering condition
• It uses them to compute the scores
• It forwards them to the rankings lower in the lattice
• Rankings receiving tuples use those that satisfy their own filtering condition to compute their scores
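Because the Groups Approach fetches raw tuples rather than aggregated scores, one query result can serve several rankings. The Python sketch below shows a ranking for brand=X and two lower rankings computing their own scores from the same forwarded tuples. The helper names `matches` and `scores` and the tuple values are made up for illustration.

```python
from collections import defaultdict

def matches(condition, row):
    """True if the row satisfies every attribute=value pair of the condition."""
    return all(row[attr] == value for attr, value in condition.items())

def scores(condition, rows):
    """Aggregate the forwarded tuples that satisfy this ranking's condition."""
    totals = defaultdict(int)
    for row in rows:
        if matches(condition, row):
            totals[row["type"]] += row["value"]   # value stands for price*quantity
    return dict(totals)

# Tuples as a Verification Query for brand=X AND type=P might return them.
fetched = [
    {"type": "P", "brand": "X", "country": "Y", "value": 500},
    {"type": "P", "brand": "X", "country": "W", "value": 256},
    {"type": "P", "brand": "X", "country": "Y", "value": 1000},
]

parent = scores({"brand": "X"}, fetched)
child_y = scores({"brand": "X", "country": "Y"}, fetched)
child_w = scores({"brand": "X", "country": "W"}, fetched)
print(parent, child_y, child_w)
```

The lower rankings obtain exact scores without issuing their own queries, at the price of transferring unaggregated tuples.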
Outline
• Problem
• Algorithms
  – Naïve Approach
  – Estimates Approach
  – Groups Approach
• Experimental Results
• Conclusions
Experiments (1)
• TPC-H data
• Select on part.p_partKey (200,000 unique values)
• Filter on customer.c_mktsegment, orders.o_orderpriority and region.r_name
• Aggregation sum on lineitem.l_quantity
• 216 total rankings
• 30,000 updates/insertions
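The figure of 216 rankings is consistent with one reading of the setup: in TPC-H each of the three filter attributes takes five distinct values, and each attribute may also be left unconstrained. The slides do not spell this out, so the short Python check below is an interpretation rather than a stated fact.

```python
from math import prod

# Distinct values per filter attribute in TPC-H (five each).
distinct_values = {"c_mktsegment": 5, "o_orderpriority": 5, "r_name": 5}

# Each attribute is either fixed to one value or left unconstrained (+1).
total_rankings = prod(n + 1 for n in distinct_values.values())
print(total_rankings)  # 216
```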
Experiments (2)
Updates
• Random: inserts quantity between 1 and 50 for a random part.p_partKey.
• 80-20: inserts quantity between 1 and 50 for a part.p_partKey selected according to the 80-20 rule
N-extra Gap
• Difference between top-k and top-(k+N) scores: 100% (1*50) and 200% (2*50)
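The two update workloads can be mimicked with a few lines of Python. The slides do not say how the 80-20 rule was implemented; the sketch assumes that 80% of the updates go to a fixed "hot" 20% of the part keys (here simply the lowest-numbered keys). The function names and the seed are illustrative.

```python
import random

N_PARTS = 200_000   # unique part.p_partKey values, as stated in the setup

def random_update(rng):
    """Uniformly random part key with a quantity between 1 and 50."""
    return rng.randint(1, N_PARTS), rng.randint(1, 50)

def eighty_twenty_update(rng):
    """80% of updates hit the assumed hot set: the first 20% of the keys."""
    hot = N_PARTS // 5
    if rng.random() < 0.8:
        key = rng.randint(1, hot)
    else:
        key = rng.randint(hot + 1, N_PARTS)
    return key, rng.randint(1, 50)

rng = random.Random(0)
sample = [eighty_twenty_update(rng) for _ in range(30_000)]
hot_share = sum(key <= N_PARTS // 5 for key, _ in sample) / len(sample)
print(round(hot_share, 2))
```

Under the skewed workload the same part keys recur often, so their scores are more likely to be held in memory already.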
80-20 Updates: Queries
80-20 Updates: Time
Random Updates: Queries
Random Updates: Time
Naïve Approach
• 80-20 updates: 239,985 Verification Queries, 4 secs/update
• Random updates: 239,977 Verification Queries, 4 secs/update
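These counts sit just below 240,000, which would be the total if every one of the 30,000 updates triggered a verification query in each ranking it affects. The Python arithmetic below assumes that an inserted tuple matches 2^3 = 8 rankings (each of the three filter attributes either equals the tuple's value or is unconstrained); that reading is an inference, not something stated on the slides.

```python
updates = 30_000
matching_rankings = 2 ** 3          # assumed: fixed-or-unconstrained per attribute

upper_bound = updates * matching_rankings
avoided_80_20 = upper_bound - 239_985
avoided_random = upper_bound - 239_977
print(upper_bound, avoided_80_20, avoided_random)  # 240000 15 23
```

Under this reading the Naïve Approach avoids a database query for only a handful of updates, which fits the reported 4 secs/update.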
Outline
• Problem
• Algorithms
  – Naïve Approach
  – Estimates Approach
  – Groups Approach
• Experimental Results
• Conclusions
Conclusion
• Two algorithms to maintain top-k rankings in the presence of fast updates arriving in an underlying database
• Exact top-k results
• Faster than a Naïve approach, while the Groups Approach further limits the communication with the database
• Preliminary results which provide insights into the impact of the various parameters on the effectiveness of our methods
Thank you!
Additional Instances