A Thin Monitoring Layer for Top-k Aggregation Queries over a Database Foteini Alvanaki Sebastian Michel Saarland University DBRank 2013, Riva Del Garda, Italy

A Thin Monitoring Layer for Top-k Aggregation Queries over a Database

Foteini Alvanaki, Sebastian Michel

Saarland University

DBRank 2013, Riva Del Garda, Italy

Data Cube

[Figure: data cube with dimensions Brand, Country, and Product Type; measure: sum(price*quantity)]

Data Cube

Brand    Product Type   Country    sum(Price*Quantity)
Brand1   Type1          Country1   1234
Brand1   Type2          Country1   3522
Brand1   Type1          -          1234
Brand1   Type2          -          3522
Brand1   -              Country1   4756
-        Type1          Country1   1234
-        Type2          Country1   3522
Brand1   -              -          4756
-        Type1          -          1234
-        Type2          -          3522
-        -              Country1   4756

(- marks a dimension that is aggregated over.)

1. What are the top-2 product types with the highest revenue for each brand in each country?

2. What are the top-2 brands with the highest revenue in each country?

Top-k Queries

• Primary Attribute: the attribute/dimension over which the selection is performed (e.g. product type)
• Secondary Attributes: used to filter specific results (e.g. brand, country)
• Aggregated Attributes: used to compute an aggregated score (e.g. price, quantity)
• Aggregate Function: e.g. sum

One top-k query for each combination of secondary attribute instances (filtering condition)
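The four components map onto the clauses of a SQL query. The following is a minimal Python sketch of that mapping, not the authors' code: the helper name `topk_query`, the `score` alias, and the default table name `relation` are assumptions made for illustration. The sort is descending because the highest scores are wanted.

```python
def topk_query(primary, aggregate, condition, k, table="relation"):
    """Build one top-k aggregation query (hypothetical helper).

    primary   -- primary attribute, e.g. "type"
    aggregate -- aggregate expression, e.g. "SUM(price*quantity)"
    condition -- dict of secondary attribute -> instance (the filtering condition)
    k         -- number of results to keep
    Identifiers are interpolated as-is, so they must come from trusted input;
    only the filter values are passed as bound parameters.
    """
    sql = f"SELECT {primary}, {aggregate} AS score FROM {table}"
    if condition:
        sql += " WHERE " + " AND ".join(f"{attr} = ?" for attr in condition)
    sql += f" GROUP BY {primary} ORDER BY score DESC LIMIT {int(k)}"
    return sql, list(condition.values())

sql, params = topk_query("type", "SUM(price*quantity)",
                         {"brand": "X", "country": "Y"}, k=2)
```

Calling the helper with an empty `condition` gives the unfiltered query over the whole relation.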

Filtering Conditions: Example (1)

brand = {X} - country = {Y, W}

brand=X AND country=Y

SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X AND country=Y
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT K

brand=X AND country=W

SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X AND country=W
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT K

Filtering Conditions: Example (2)

country = {Y, W} - brand = {X}

country=Y

SELECT type, SUM(price*quantity)
FROM relation
WHERE country=Y
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT K

country=W

SELECT type, SUM(price*quantity)
FROM relation
WHERE country=W
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT K

brand=X

SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT K

Filtering Conditions: Example (3)

country = {Y, W} - brand = {X}

No filtering condition:

SELECT type, SUM(price*quantity)
FROM relation
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT K
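Taken together, Examples (1) to (3) cover every filtering condition in which each secondary attribute is either fixed to one of its instances or left unconstrained. A minimal Python sketch of that enumeration follows; the function name `filtering_conditions` and the dict representation are assumptions for illustration, not part of the slides.

```python
from itertools import product

def filtering_conditions(domains):
    """Yield every filtering condition as a dict {attribute: instance}.

    `domains` maps each secondary attribute to its instances. Each
    attribute is either fixed to one instance or left out of the
    condition (unconstrained).
    """
    names = list(domains)
    # None stands for "this attribute is not constrained".
    choices = [[None] + list(domains[name]) for name in names]
    for combo in product(*choices):
        yield {n: v for n, v in zip(names, combo) if v is not None}

conditions = list(filtering_conditions({"brand": ["X"], "country": ["Y", "W"]}))
```

For brand = {X} and country = {Y, W} this yields six conditions: the two of Example (1), the three of Example (2), and the empty condition of Example (3).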

Updates

Insertions to the underlying database that contain all information related to the top-k queries

INSERT INTO relation (type, brand, country, price, quantity)
VALUES (T, X, Y, 100, 3)

Problem

How to maintain all these queries in the presence of fast updates?

Outline

• Setting/Problem
• Algorithms
  – Naïve Approach
  – Estimates Approach
  – Groups Approach
• Experimental Results
• Conclusions

Example

SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X AND country=Y
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT 2

Update: (type, X, Y, 300), where 300 is the inserted price*quantity

Naïve Approach

Case 1: type in the top-2, e.g. (B,X,Y,300)

Before:
Type  Score
A     3452
B     2406   (+300)

After:
Type  Score
A     3452
B     2706

Case 2: type NOT in the top-2, e.g. (K,X,Y,300)

Verification Query:
SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X AND country=Y AND type=K
GROUP BY type
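The two cases of the Naïve Approach can be sketched in Python with an in-memory SQLite database. This is an illustration under stated assumptions, not the authors' implementation: the function name `naive_update`, the eviction of the lowest entry after a verification, and the starting score of 2200 for type K are all invented for the example. It also assumes inserted values are positive, so a type already in the top-k cannot drop out.

```python
import sqlite3

VERIFY = ("SELECT SUM(price*quantity) FROM relation "
          "WHERE brand=? AND country=? AND type=?")

def naive_update(db, topk, k, row):
    """Insert `row` and refresh `topk` (dict: type -> exact score).

    Returns True if a verification query had to be sent to the database.
    """
    type_, brand, country, price, quantity = row
    db.execute("INSERT INTO relation VALUES (?, ?, ?, ?, ?)", row)
    if type_ in topk:
        # Case 1: the exact score is known, so just add the inserted value.
        topk[type_] += price * quantity
        return False
    # Case 2: the score is unknown, so ask the database for the exact sum.
    topk[type_] = db.execute(VERIFY, (brand, country, type_)).fetchone()[0]
    if len(topk) > k:
        # Assumed eviction rule: drop the lowest-scoring entry.
        del topk[min(topk, key=topk.get)]
    return True

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE relation (type, brand, country, price, quantity)")
db.executemany("INSERT INTO relation VALUES (?, ?, ?, ?, ?)",
               [("A", "X", "Y", 3452, 1), ("B", "X", "Y", 2406, 1),
                ("K", "X", "Y", 2200, 1)])  # K's score is made up
top2 = {"A": 3452, "B": 2406}
queried_b = naive_update(db, top2, 2, ("B", "X", "Y", 100, 3))  # Case 1
queried_k = naive_update(db, top2, 2, ("K", "X", "Y", 100, 3))  # Case 2
```

Every update to a type outside the top-k costs one database query, which is the overhead the other two approaches aim to reduce.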

Estimates Approach

In-memory Structures

• top-(k+N) instances with exact aggregated scores
• B instances with estimated aggregated scores
• Estimated score: best possible score (basic score) + inserted values

top-5 (exact scores; A and B form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167
E     1987

Buffer (estimated scores):
Type  Score
O     1990
P     2112
Q     2076
R     1997

Estimates Approach

Case 1.1: type in the top-2, e.g. (B,X,Y,300)

top-5 before (A and B form the top-2):
Type  Score
A     3452
B     2406   (+300)
C     2356
D     2167
E     1987

top-5 after (A and B form the top-2):
Type  Score
A     3452
B     2706
C     2356
D     2167
E     1987

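Estimates Approach

Case 1.2: type in the top-5, e.g. (D,X,Y,300)

top-5 before (A and B form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167   (+300)
E     1987

top-5 after adding 300 to D:
Type  Score
A     3452
B     2406
C     2356
D     2467
E     1987

top-5 after re-ranking (A and D now form the top-2):
Type  Score
A     3452
D     2467
B     2406
C     2356
E     1987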
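Cases 1.1 and 1.2 need no database access, because the exact score of the updated type is already held in memory. A minimal Python sketch of this step, assuming a hypothetical helper `apply_to_exact` and a plain dict for the top-(k+N) structure:

```python
def apply_to_exact(top, k, type_, value):
    """Add `value` to a type held in the top-(k+N) dict and return the new top-k.

    `top` maps type -> exact aggregated score and is updated in place.
    """
    top[type_] += value
    ranked = sorted(top.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

top5 = {"A": 3452, "B": 2406, "C": 2356, "D": 2167, "E": 1987}
top2 = apply_to_exact(top5, 2, "D", 300)   # Case 1.2: D overtakes B
```

With the numbers from the slide, D rises to 2467 and replaces B in the top-2.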

Estimates Approach

Case 2: type in the Buffer, e.g. (P,X,Y,300)

top-5 (A and B form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167
E     1987

Buffer before:
Type  Score
O     1990
P     2112   (+300)
Q     2076
R     1997

Buffer after:
Type  Score
O     1990
P     2412
Q     2076
R     1997

Verification Query:
SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X AND country=Y AND type=P
GROUP BY type

Estimates Approach

Sub-case 2.1: score(P) < score(E), e.g. score(P) = 756

top-5 (unchanged; A and B form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167
E     1987

Buffer with the exact score of P:
Type  Score
O     1990
P     756
Q     2076
R     1997

Buffer after P is removed:
Type  Score
O     1990
Q     2076
R     1997

Estimates Approach

Sub-case 2.2: score(P) > score(E), e.g. score(P) = 2178

top-5 before (A and B form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167
E     1987

Buffer with the exact score of P:
Type  Score
O     1990
P     2178
Q     2076
R     1997

top-5 after (P replaces E):
Type  Score
A     3452
B     2406
C     2356
P     2178
D     2167

Buffer after:
Type  Score
O     1990
Q     2076
R     1997

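Estimates Approach

Sub-case 2.3: score(P) > score(B), e.g. score(P) = 2407

top-5 before (A and B form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167
E     1987

Buffer with the exact score of P:
Type  Score
O     1990
P     2407
Q     2076
R     1997

top-5 after (A and P now form the top-2; E is dropped):
Type  Score
A     3452
P     2407
B     2406
C     2356
D     2167

Buffer after:
Type  Score
O     1990
Q     2076
R     1997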
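Sub-cases 2.1 to 2.3 differ only in where the verified exact score of P falls relative to the scores held in the top-(k+N) structure. The Python sketch below handles all three. The name `resolve_buffer_hit` and the dict representation are assumptions for illustration, and the slides do not state what triggers the verification query; in the example, the estimate of 2412 happens to exceed score(B) = 2406.

```python
def resolve_buffer_hit(top, buffer, type_, exact):
    """Process the exact score returned by a verification query.

    top    -- dict type -> exact score, the top-(k+N) structure
    buffer -- dict type -> estimated score
    Both dicts are updated in place; the ranked top-(k+N) is returned.
    """
    del buffer[type_]                   # the estimate is no longer needed
    lowest = min(top, key=top.get)      # E in the slides
    if exact > top[lowest]:
        # Sub-cases 2.2 and 2.3: the verified type replaces the lowest entry.
        del top[lowest]
        top[type_] = exact
    # Sub-case 2.1: otherwise the verified type is simply dropped.
    return sorted(top.items(), key=lambda item: item[1], reverse=True)

top5 = {"A": 3452, "B": 2406, "C": 2356, "D": 2167, "E": 1987}
buffer = {"O": 1990, "P": 2412, "Q": 2076, "R": 1997}
ranked = resolve_buffer_hit(top5, buffer, "P", 2407)   # sub-case 2.3
top2 = ranked[:2]
```

Whether the top-k itself changes (sub-case 2.3) or only the N extra entries (sub-case 2.2) falls out of the re-ranking, so no separate branch is needed.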

Estimates Approach

Case 3: type NOT in in-memory structures, e.g. (T,X,Y,300)

Estimated Score(T) = basic score + 300 = 2287

Buffer full: Reset Query

SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X AND country=Y
AND type IN (O,P,Q,R)
GROUP BY type

Estimates Approach

Case 3: type NOT in in-memory structures, e.g. (T,X,Y,300)

Exact scores returned by the Reset Query:
score(O)=1254, score(P)=432, score(Q)=2050, score(R)=1990

top-5 before (A and B form the top-2):
Type  Score
A     3452
B     2406
C     2356
D     2167
E     1987

Buffer before:
Type  Score
O     1990
P     2112
Q     2076
R     1997

top-5 after (Q replaces E):
Type  Score
A     3452
B     2406
C     2356
D     2167
Q     2050

Buffer after:
Type  Score
T     2287
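Case 3 can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' code: the function name `insert_unknown` and the `reset_query` callback (standing in for the Reset Query sent to the database) are hypothetical, and taking the basic score to be the lowest exact score in the top-(k+N) structure is inferred from the example, where 2287 - 300 = 1987 = score(E).

```python
def insert_unknown(top, buffer, capacity, type_, value, reset_query):
    """Handle an update for a type held in neither in-memory structure.

    top         -- dict type -> exact score (top-(k+N))
    buffer      -- dict type -> estimated score, at most `capacity` entries
    reset_query -- callable: list of types -> dict type -> exact score
    """
    basic = min(top.values())            # assumed basic score, see lead-in
    estimate = basic + value
    if len(buffer) >= capacity:
        size = len(top)
        # Buffer full: fetch exact scores for all buffered types at once.
        top.update(reset_query(list(buffer)))
        # Keep only the best k+N entries, then empty the buffer.
        for t in sorted(top, key=top.get, reverse=True)[size:]:
            del top[t]
        buffer.clear()
    buffer[type_] = estimate
    return estimate

EXACT = {"O": 1254, "P": 432, "Q": 2050, "R": 1990}   # values from the slide
top5 = {"A": 3452, "B": 2406, "C": 2356, "D": 2167, "E": 1987}
buffer = {"O": 1990, "P": 2112, "Q": 2076, "R": 1997}
estimate = insert_unknown(top5, buffer, 4, "T", 300,
                          lambda types: {t: EXACT[t] for t in types})
```

A single Reset Query replaces four individual verification queries here, and only Q scores high enough to enter the top-5.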

Query Characteristics

• SAME primary attribute
• SAME aggregate attributes
• SAME aggregate function
• SAME top-k condition
• DIFFERENT filtering condition

Lattice organisation

Groups Approach

• The updates are forwarded from top to bottom in the lattice

• Each ranking forwards the results of its queries to the rankings at lower levels of the lattice

Groups Approach: Example

SELECT type, SUM(price*quantity)
FROM relation
WHERE brand=X
GROUP BY type
ORDER BY SUM(price*quantity) DESC
LIMIT 2

Update: (type, X, Y, 300)

Ranking: brand=X, country=ANY

Groups Approach

Case 2: type in the Buffer, e.g. (P,X,Y,300)

Verification Query:
SELECT type, brand, country, price*quantity
FROM relation
WHERE brand=X AND type=P

Case 4: type NOT in in-memory structures, e.g. (T,X,Y,300)

Buffer Reset Query:
SELECT type, brand, country, price*quantity
FROM relation
WHERE brand=X AND type IN (O,P,Q,R)

Groups Approach

A ranking that queries the database:

• Retrieves tuples (type, brand, country, price*quantity), limited to those satisfying its filtering condition

• Uses them to compute its scores

• Forwards them to the rankings lower in the lattice

Rankings that receive tuples use those satisfying their own filtering condition to compute their scores
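The tuple forwarding of the Groups Approach can be sketched in Python as follows. The function name `scores` and the tuple values are invented for illustration; only the tuple layout (type, brand, country, price*quantity) comes from the slides. The point is that rankings lower in the lattice re-use tuples already fetched by a ranking above them instead of querying the database themselves.

```python
from collections import defaultdict

def scores(tuples, condition):
    """Sum the value per type over the tuples that satisfy `condition`.

    tuples    -- iterable of (type, brand, country, value)
    condition -- dict attribute -> required instance; {} matches everything
    """
    totals = defaultdict(int)
    for type_, brand, country, value in tuples:
        row = {"brand": brand, "country": country}
        if all(row[attr] == wanted for attr, wanted in condition.items()):
            totals[type_] += value
    return dict(totals)

# Tuples as the ranking brand=X might fetch them for type P (made-up values).
fetched = [("P", "X", "Y", 500), ("P", "X", "W", 200), ("P", "X", "Y", 300)]

parent = scores(fetched, {"brand": "X"})                    # brand=X, country=ANY
child_y = scores(fetched, {"brand": "X", "country": "Y"})   # lower in the lattice
child_w = scores(fetched, {"brand": "X", "country": "W"})   # lower in the lattice
```

This is why the Groups queries return raw tuples with brand and country rather than a grouped sum: the receiving rankings need those attributes to apply their own filters.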

Outline

• Problem
• Algorithms
  – Naïve Approach
  – Estimates Approach
  – Groups Approach
• Experimental Results
• Conclusions

Experiments (1)

• TPC-H data
• Select on part.p_partKey (200,000 unique values)
• Filter on customer.c_mktsegment, orders.o_orderpriority and region.r_name
• Aggregation sum on lineitem.l_quantity
• 216 total rankings
• 30,000 updates/insertions
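The figure of 216 rankings is consistent with each of the three filter attributes being either fixed to one value or left unconstrained. The short Python check below assumes 5 distinct values per attribute, as in the standard TPC-H schema; the slides themselves do not state these counts.

```python
from math import prod

# Assumed number of distinct values per filter attribute (standard TPC-H).
distinct_values = {"c_mktsegment": 5, "o_orderpriority": 5, "r_name": 5}

# Each attribute contributes its values plus one "unconstrained" choice.
total_rankings = prod(n + 1 for n in distinct_values.values())
```

Under that assumption the count is (5 + 1)^3 = 216, matching the slide.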

Experiments (2)

Updates
• Random: inserts a quantity between 1 and 50 for a random part.p_partKey
• 80-20: inserts a quantity between 1 and 50 for a part.p_partKey selected according to the 80-20 rule

N-extra Gap
• Difference between top-k and top-(k+N) scores: 100% (1*50) and 200% (2*50)
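The two update workloads could be generated along the following lines. This Python sketch is one possible reading, not the authors' generator: it interprets the 80-20 rule as 80% of the updates going to 20% of the part keys, and the function names, the choice of the lowest-numbered keys as the "hot" set, and the key range 1..n_parts are all assumptions.

```python
import random

def random_update(rng, n_parts):
    """Uniformly random part key with a quantity between 1 and 50."""
    return rng.randint(1, n_parts), rng.randint(1, 50)

def skewed_update(rng, n_parts, hot_fraction=0.2, hot_prob=0.8):
    """Part key drawn so that roughly 80% of updates hit 20% of the keys."""
    hot = max(1, int(n_parts * hot_fraction))   # keys 1..hot form the hot set
    if rng.random() < hot_prob:
        key = rng.randint(1, hot)
    else:
        key = rng.randint(hot + 1, n_parts)
    return key, rng.randint(1, 50)

rng = random.Random(0)                          # fixed seed for repeatability
updates = [skewed_update(rng, 200_000) for _ in range(30_000)]
hot_share = sum(key <= 40_000 for key, _ in updates) / len(updates)
```

Because every inserted quantity is at most 50, an N-extra gap of 100% or 200% corresponds to one or two maximal insertions.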

80-20 Updates: Queries

80-20 Updates: Time

Random Updates: Queries

Random Updates: Time

Naïve Approach

• 80-20 updates: 239,985 Verification Queries, 4 secs/update

• Random updates: 239,977 Verification Queries, 4 secs/update

Outline

• Problem
• Algorithms
  – Naïve Approach
  – Estimates Approach
  – Groups Approach
• Experimental Results
• Conclusions

Conclusion

• Two algorithms to maintain top-k rankings in the presence of fast updates arriving in an underlying database

• Exact top-k results

• Faster than the Naïve Approach, while the Groups Approach further limits the communication with the database

• Preliminary results that provide insight into the impact of the various parameters on the effectiveness of our methods

Thank you!

Additional Instances