HKU CSIS DB Seminar Skyline Queries HKU CSIS DB Seminar 9 April 2003 Speaker: Eric Lo.



Skyline
- A new operator in database systems
- Filters out a set of interesting points from a potentially large set of data points
- A data point is interesting if it is not dominated by any other point


Example
- Find some good places for us to hold the next DB Seminar
- Dataset (Table Homes):

Home      Distance from HKU   Area (m^2)
Kevin     1 km                10
Ben       9 km                100
Felix     5 km                2
K.K Loo   8 km                250

- Good: close to HKU (Min.)
- Good: large area (Max.)
- Return those homes that are not dominated by any other home, i.e. no other home is at least as good in ALL DIMENSIONS and strictly better in at least one
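The dominance test behind this example can be sketched in Python. This is a minimal illustration rather than code from the talk: the names `HOMES`, `dominates` and `skyline` are hypothetical, and the tuple layout (distance in km, area in m^2) is an assumption taken from the table above.

```python
# Hypothetical encoding of the Homes table: name -> (distance_km, area_m2).
HOMES = {
    "Kevin": (1, 10),
    "Ben": (9, 100),
    "Felix": (5, 2),
    "K.K Loo": (8, 250),
}


def dominates(p, q):
    """True if home p is at least as good as q in both dimensions
    (distance is minimized, area is maximized) and strictly better in one."""
    no_worse = p[0] <= q[0] and p[1] >= q[1]
    strictly_better = p[0] < q[0] or p[1] > q[1]
    return no_worse and strictly_better


def skyline(homes):
    """Names of the homes that no other home dominates (brute force)."""
    return [name for name, p in homes.items()
            if not any(dominates(q, p) for q in homes.values())]


print(skyline(HOMES))  # ['Kevin', 'K.K Loo']
```

Felix is dominated by Kevin (closer and larger) and Ben by K.K Loo, so only Kevin and K.K Loo remain.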


Outline
- Introduction to Skyline Queries
- Skyline Operator in SQL
- Implementation Algorithms
- Progressive Algorithms
- Variations of Skyline Queries
- Experimental Result
- Conclusion


The Skyline Operator
ICDE 2001
S. Börzsönyi, D. Kossmann, K. Stocker

1. Define the skyline operator in databases
2. Extension of SQL for skyline
3. Block-nested-loop Algorithm
4. Divide-and-conquer Algorithm


Problem Definition
- Related to:
  - maximum vector problem
  - contour problem
  - convex hull of a data set
- Assume the whole dataset fits in memory


SQL Extensions

    SELECT … FROM … WHERE …
    GROUP BY … HAVING …
    SKYLINE OF [DISTINCT] d1 [MIN | MAX], …, dm [MIN | MAX]
    ORDER BY …

- d1 … dm denote the dimensions that participate in the skyline
- Example:

    SELECT * FROM HOMES
    WHERE CITY = 'HK'
    SKYLINE OF DIST MIN, AREA MAX;
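Standard SQL has no SKYLINE OF clause, but the same result can be obtained with a NOT EXISTS subquery. The sketch below runs such a query through Python's built-in sqlite3 module. It is only an illustration under assumptions: the table layout, the function name `skyline_homes`, and the choice to place all four example homes in city 'HK' are hypothetical and not from the slides.

```python
import sqlite3


def skyline_homes():
    """Emulate SKYLINE OF DIST MIN, AREA MAX with plain SQL (NOT EXISTS)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE homes (name TEXT, city TEXT, dist REAL, area REAL)")
    # Assumption: every example home is in 'HK'; the slides give no city column.
    conn.executemany("INSERT INTO homes VALUES (?, ?, ?, ?)", [
        ("Kevin", "HK", 1, 10),
        ("Ben", "HK", 9, 100),
        ("Felix", "HK", 5, 2),
        ("K.K Loo", "HK", 8, 250),
    ])
    query = """
        SELECT h.name FROM homes h
        WHERE h.city = 'HK'
          AND NOT EXISTS (
              SELECT 1 FROM homes o
              WHERE o.city = 'HK'
                AND o.dist <= h.dist AND o.area >= h.area   -- no worse
                AND (o.dist < h.dist OR o.area > h.area)    -- strictly better
          )
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names


print(sorted(skyline_homes()))
```

The subquery looks for a dominating home; a row is kept only when no such home exists, which is the nested-loop style evaluation that a dedicated skyline operator is meant to avoid.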


Naïve Approach for Skyline
- A 1D skyline is equivalent to computing MIN or MAX in SQL
- Naïve 2D skyline:
  - Sort the data on the 2 dimensions
  - Compare every tuple with its predecessor (the most recent tuple kept in the skyline)
  - Sorting may need 2 or more passes if the data do not fit into memory; use existing external sorting techniques


Naïve 2D

Home    Distance from HKU   Area
Kevin   1 km                10
Felix   5 km                2
KK      8 km                250
Ben     9 km                100

1. Sort by "Distance"
2. Compare "Felix" with "Kevin": eliminate "Felix"
3. Compare "KK" with "Kevin": incomparable, so "KK" is part of the skyline
4. Compare "Ben" with "KK": eliminate "Ben"
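The four steps above can be sketched as a sort followed by a single scan. This is an illustrative sketch, not code from the talk: `skyline_2d` and the tuple layout (name, distance, area) are hypothetical, and exact duplicates are reported only once.

```python
# Hypothetical rows: (name, distance_km, area). Distance MIN, area MAX.
HOMES = [("Kevin", 1, 10), ("Ben", 9, 100), ("Felix", 5, 2), ("KK", 8, 250)]


def skyline_2d(homes):
    """Sort by distance, then keep a tuple only if its area beats the area
    of the last tuple kept (its 'predecessor' in the skyline)."""
    # Ties on distance are broken by larger area first.
    ordered = sorted(homes, key=lambda h: (h[1], -h[2]))
    result = []
    best_area = None
    for home in ordered:
        if best_area is None or home[2] > best_area:
            result.append(home)      # incomparable with the predecessor
            best_area = home[2]
        # otherwise the predecessor is closer (or as close) and at least
        # as large, so this home is eliminated
    return result


print([name for name, _, _ in skyline_2d(HOMES)])  # ['Kevin', 'KK']
```

After sorting, every earlier tuple is at least as close, so a single comparison on area against the last kept tuple is enough in two dimensions.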


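Naïve 2D does not work for more than 2 dimensions
- If the skyline involves more than 2 dimensions, sorting alone does not work

Home    Distance from HKU   Area   Rent
Kevin   1 km                10     $9
Felix   5 km                2      $5
KK      8 km                250    $10
Ben     9 km                100    $9

2D (distance, area):
- Cmp Felix, Kevin: Felix eliminated
- Cmp KK, Kevin: KK part of skyline
- Cmp Ben, KK: Ben eliminated

3D (distance, area, rent):
- Cmp Felix, Kevin: Felix part of skyline
- Cmp KK, Felix: KK part of skyline
- Cmp Ben, KK: Ben part of skyline
- Not safe! Ben was only checked against KK. In 3D a tuple can be dominated by an earlier tuple (such as Kevin) that is not its predecessor, so the predecessor comparison does not work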


Block-nested-loops Algorithm
- A straightforward approach:
  - Compare each point p with every other point
  - If p is not dominated, it is part of the skyline
- Scan the data file while keeping a list of candidate skyline points in main memory


BNL cont.

1. Insert the 1st data point into the list
2. For each subsequent point p:
   1. If p is dominated by any point in the list, discard p
   2. If p dominates any point in the list, insert p into the list and remove all points dominated by p
   3. If p is neither dominated by, nor dominates, any point in the list, insert p into the list as part of the skyline
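A minimal in-memory sketch of these steps, assuming every dimension is minimized and the candidate list always fits in memory. The function names are hypothetical and not from the talk.

```python
def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in one
    (smaller is better in all dimensions)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def bnl_in_memory(points):
    """Block-nested-loops with an unbounded candidate list."""
    candidates = []
    for p in points:
        # Step 2.1: p is dominated by a candidate, discard it.
        if any(dominates(c, p) for c in candidates):
            continue
        # Step 2.2: remove every candidate that p dominates.
        candidates = [c for c in candidates if not dominates(p, c)]
        # Steps 2.2 and 2.3: p itself becomes a candidate.
        candidates.append(p)
    return candidates


print(bnl_in_memory([(1, 9), (2, 10), (3, 2), (9, 1), (4, 3)]))
# [(1, 9), (3, 2), (9, 1)]
```

Step 1 needs no special case here: the first point meets an empty list and is simply appended.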


BNL cont.
- The candidate list is self-organizing:
  - Points that have dominated other points are moved to the top of the list
  - This reduces the number of comparisons
- E.g. the self-organizing list holding the partial skyline looks like:

Home    Distance from HKU   Area   Rent
Kevin   1 km                249    $1
K.K     8 km                250    $100
…

- The other skyline points are not as strong as Kevin, except in a few dimensions


More on BNL
- Point 3 in BNL: if p is neither dominated by, nor dominates, any point in the list, insert p into the list as part of the skyline
- If there is no more space in the list, write p to a temporary file on disk
- Tuples in the temporary file are processed further in the next iteration of the algorithm


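More on BNL (cont.)

Input: A B C D E F G H I J (the list becomes full at 4 points)

Point   Result of comparison                          List afterwards
A       first point, inserted                         A
B       dominated by A                                A
C       dominated by A                                A
D       dominated by A                                A
E       incomparable with A                           A, E
F       incomparable with A, E                        A, E, F
G       dominates F, replaces F                       A, E, G
H       incomparable with A, E, G                     A, E, G, H
I       incomparable with A, E, G, H, but list full!  I goes to the temporary file
J       incomparable with A, E, G, H, but list full!  J goes to the temporary file (J has not been compared with I)

After the 1st iteration, A, E, G, H are output as skyline points. Then clear the list, treat I, J, … as the new data set and perform BNL again.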
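The multi-pass behaviour can be sketched as follows. This is an illustration under assumptions, not the talk's code: all dimensions are minimized, a Python list stands in for the temporary file, and the names are hypothetical. One detail goes beyond the slide: a list entry inserted after the first overflow has not been compared with the tuples written earlier, so the sketch carries such entries into the next pass instead of outputting them.

```python
def dominates(p, q):
    """p dominates q (smaller is better in every dimension)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def bnl_skyline(points, window_size):
    """Block-nested-loops with a bounded candidate list (window)."""
    skyline = []
    pending = list(points)
    while pending:
        window = []            # entries are (insert_time, point)
        overflow = []          # stands in for the temporary file
        first_overflow = None  # time of the first write to the temp file
        for t, p in enumerate(pending):
            if any(dominates(w, p) for _, w in window):
                continue                       # p is dominated: discard
            window = [(tw, w) for tw, w in window if not dominates(p, w)]
            if len(window) < window_size:
                window.append((t, p))
            else:
                overflow.append(p)             # list is full
                if first_overflow is None:
                    first_overflow = t
        # Entries inserted before the first overflow were compared with
        # every surviving tuple of this pass, so they are final.
        done = [w for tw, w in window
                if first_overflow is None or tw < first_overflow]
        late = [w for tw, w in window
                if first_overflow is not None and tw > first_overflow]
        skyline.extend(done)
        pending = overflow + late
    return skyline


data = [(0.2, 4), (0.8, 4), (0.4, 3), (0.3, 2), (0.1, 2),
        (0.6, 1), (0.8, 3), (0.2, 3), (0.3, 3), (0.5, 2)]
print(sorted(bnl_skyline(data, window_size=1)))  # [(0.1, 2), (0.6, 1)]
```

In the A to J trace above, A, E, G and H all entered the list before I overflowed, so all four can be output after the first pass, as the slide states.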


Short summary on BNL
- Easy to implement
- Works for any number of dimensions without using an index or sorting
- Relies on main memory; may need many iterations
- Not adequate for on-line processing: it has to read the entire data file before it returns the first skyline point (not progressive)


Divide-and-Conquer Algorithm

Price   Dist
0.2     4
0.8     4
0.4     3
0.3     2
0.1     2
0.6     1
0.8     3
0.2     3
0.3     3
0.5     2

1) Find the median of some dimension, say price: Price(med) = 0.3
2) Split the input into 2 partitions according to Price(med)
3) Compute skyline S1 of P1 (price <= 0.3) and skyline S2 of P2 (price > 0.3) by recursive partitioning [Note: S1 is better than S2 on price]
4) Partition recursively until a partition contains very few (or 1) tuples
5) With only a few tuples, finding the skyline is very easy
6) Merge the skylines of the partitions by eliminating those tuples of S2 that are dominated by a tuple of S1 [Note: no tuple in S1 can be dominated by a tuple in S2, because all tuples in S1 are better than those in S2 on price, i.e. tuples in UPPER are never eliminated]
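A compact sketch of steps 1) to 6), assuming both dimensions are minimized. It is a simplified illustration and not the algorithm as published: the merge is a plain nested loop, the function names are hypothetical, and the sketch falls back to brute force when ties on the split dimension make a split impossible.

```python
def dominates(p, q):
    """p dominates q (smaller is better in every dimension)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def brute_skyline(points):
    """Skyline of a small partition by comparing all pairs."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dc_skyline(points, dim=0, small=2):
    """Divide-and-conquer skyline, splitting on dimension `dim`."""
    if len(points) <= small:
        return brute_skyline(points)            # steps 4) and 5)
    values = sorted(p[dim] for p in points)
    pivot = values[(len(values) - 1) // 2]      # step 1): median value
    p1 = [p for p in points if p[dim] <= pivot]  # step 2)
    p2 = [p for p in points if p[dim] > pivot]
    if not p2:                                  # all values tie at the top
        return brute_skyline(points)
    s1 = dc_skyline(p1, dim, small)             # step 3)
    s2 = dc_skyline(p2, dim, small)
    # Step 6): S1 is strictly better on `dim`, so only S2 can lose tuples.
    return s1 + [q for q in s2 if not any(dominates(p, q) for p in s1)]


data = [(0.2, 4), (0.8, 4), (0.4, 3), (0.3, 2), (0.1, 2),
        (0.6, 1), (0.8, 3), (0.2, 3), (0.3, 3), (0.5, 2)]
print(dc_skyline(data))  # [(0.1, 2), (0.6, 1)]
```

On the (Price, Dist) table above the first pivot is 0.3, which gives two partitions of five tuples each.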


Divide-and-Conquer Algorithm

1) Find the median of some dimension, say price: Price(med) = 0.3
2) Split the input into 2 partitions according to Price(med)

P1 (Price, Dist): (0.3, 2) (0.3, 3) (0.2, 4) (0.2, 3) (0.1, 2)
P2 (Price, Dist): (0.8, 4) (0.6, 1) (0.5, 2) (0.4, 3) (0.8, 3)


Divide-and-Conquer Algorithm

[Figure: the partitions P1 and P2 are split recursively into smaller partitions; the candidate skyline points in the plot are labelled S1 to S7]


Divide-and-Conquer Algorithm

[Figure: merging the partial skylines. One partition contributes S1, S2 and the other contributes S4, S5, S7; the merged skyline is S1, S2, S7, i.e. tuples in UPPER are never eliminated]


Efficient Progressive Skyline Computation
VLDB 2001
K.L. Tan, P.K. Eng, B.C. Ooi

Previous approaches require at least one pass over the dataset before returning the first interesting point. We propose:

1. Bitmap-based Algorithm
2. B+-tree-based Algorithm

They can return an interesting point as soon as it is identified.


Progressive?
- Both the bitmap and the tree-based algorithms return skyline points very quickly
- Useful if you are not willing to wait long for the first few interesting homes out of a large dataset
- They also outperform BNL and D&C in overall response time


Skyline by Bitmap
- Main idea: given a point p, if "something" can tell you that
  - p is not dominated by any other point in the DB: p is in the skyline!
  - p is dominated by some point in the DB: throw p away
- Non-blocking! Skyline points can be returned immediately


Bitmap
- All information required to decide whether a point is in the skyline is encoded in bitmaps
- A data point p = (p1, p2, …, pd), where d is the number of dimensions, is mapped to an m-bit vector, where m is the total number of distinct values over all dimensions


Bitmap
- The numbers of distinct values on price and distance are 7 and 4, so m = 11

Price   Dist
0.2     4
0.8     4
0.4     3
0.3     2
0.1     2
0.6     1
0.8     3
0.2     3
0.3     3
0.5     2


Bitmap representation
- Distinct values on x: 10
- Distinct values on y: 10
- m = 20, i.e. a 20-bit vector
- E.g. (4,8):
  - 4 is the 4th smallest value on dimension x, so the bits from the 4th position (counting from the right) up to the leftmost are set to 1
  - 8 is the 8th smallest value on y, so the bits from the 8th position up to the leftmost are set to 1

Point     Bitmap Representation
(1,9)     (1111111111, 1100000000)
(2,10)    (1111111110, 1000000000)
(4,8)     (1111111000, 1110000000)
(6,7)     (1111100000, 1111000000)
(9,10)    (1100000000, 1000000000)
(7,5)     (1111000000, 1111110000)
(5,6)     (1111110000, 1111100000)
(4,3)     (1111111000, 1111111100)
(3,2)     (1111111100, 1111111110)
(9,1)     (1100000000, 1111111111)
(10,4)    (1000000000, 1111111000)
(6,2)     (1111100000, 1111111110)
(8,3)     (1110000000, 1111111100)
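The encoding in the table can be reproduced with a few lines of Python. This sketch assumes, as in the example, that each coordinate is an integer from 1 to 10 and therefore equals its own rank; `encode` and `point_bitmap` are hypothetical names.

```python
def encode(value, distinct=10):
    """Bit string for the value ranked `value`-th smallest among `distinct`
    values: positions `value`..`distinct` (counted from the right) are 1."""
    return "1" * (distinct - value + 1) + "0" * (value - 1)


def point_bitmap(point, distinct=10):
    """One bit string per dimension of the point."""
    return tuple(encode(v, distinct) for v in point)


print(point_bitmap((4, 8)))  # ('1111111000', '1110000000')
```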


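Bitmap representation
- Is (4,8) a skyline point? (min x, min y)
- Create bit-strings Cx and Cy (not CY Ng!): Cx has one bit per point, set to 1 if the point's x is at most 4; Cy has one bit per point, set to 1 if the point's y is at most 8
- Cx      = 1110000110000
- Cy      = 0011011111111
- Cx & Cy = 0010000110000
- If Cx & Cy has more than one '1', the point is dominated by some other point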


Bitmap representation
- Is (3,2) a skyline point? (min x, min y)
- Create bit-strings Cx and Cy
- Cx      = 1100000010000
- Cy      = 0000000011010
- Cx & Cy = 0000000010000
- If Cx & Cy has only one '1', the point is a skyline point
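The Cx / Cy test can be sketched as below. This follows the simplified rule on the slides and assumes there are no duplicate points; for brevity the bit slices are computed on the fly, whereas the algorithm described in the talk reads them from pre-computed bitmaps. The names `POINTS`, `bit_slice` and `is_skyline_point` are hypothetical.

```python
# The 13 example points, in the order used for the bit-strings.
POINTS = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]


def bit_slice(points, dim, value):
    """One bit per point: '1' if the point is no worse than `value`
    on dimension `dim` (smaller is better)."""
    return "".join("1" if p[dim] <= value else "0" for p in points)


def is_skyline_point(points, p):
    """p is in the skyline if it is the only point set in Cx & Cy."""
    cx = bit_slice(points, 0, p[0])
    cy = bit_slice(points, 1, p[1])
    both = int(cx, 2) & int(cy, 2)
    return bin(both).count("1") == 1


print(bit_slice(POINTS, 0, 4))           # 1110000110000
print(bit_slice(POINTS, 1, 8))           # 0011011111111
print(is_skyline_point(POINTS, (4, 8)))  # False
print(is_skyline_point(POINTS, (3, 2)))  # True
```

Each point is tested independently of the others, which is why skyline points can be returned as soon as they are found.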


Short summary on Bitmap
- Need to pre-compute the bitmap representation of every point
- For each point, all bitmaps must be retrieved in order to get the juxtapositions (Cx and Cy)
- Large storage if the domain of each attribute is large


Some other progressive algorithms
- B+-tree index (also proposed in the VLDB 2001 paper by Tan, Eng and Ooi)
  - Organizes the points into d lists (d is the number of dimensions in the data)
  - Builds a B+-tree on the lists for retrieving skyline points
  - Suffers from problems similar to those of the bitmap approach


Some other progressive algorithms (cont.)
- NN algorithm (by Donald Kossmann again) [VLDB 02]


NN skyline


An Optimal and Progressive Algorithm for Skyline Queries
SIGMOD 2003
D. Papadias, Y. Tao, G. Fu, B. Seeger

We propose:

1. An NN algorithm which is more efficient and I/O optimal
2. Ranked skyline queries
3. Constrained skyline queries
4. Dynamic skyline queries
5. K-dominating queries


Ranked Skyline
- A ranked skyline query returns the K skyline points that have the minimum (or maximum) score according to a function f
- In our example, f = 3*Dist + 7*Area
- Return the top K homes
- Although skyline queries return interesting points, we may want the most interesting points according to our own preferences, especially when the data set is large (say, hotels) and the skyline is also large!


Constrained Skyline
- Returns the most interesting points within a specific region of the data space


Dynamic Skyline
- Returns a skyline that is updated dynamically
- E.g. ask for hotels with minimum distance and price (again?)
- The minimum distance now depends on my current location


Enumerating Skyline
- Enumerating queries return, for each skyline point p, the number of points dominated by p
- Sometimes useful: suppose skyline hotel C dominates 1000 hotels, while another skyline hotel Y dominates only 1 hotel
  - Maybe C is better than Y in many properties (e.g. price, dist, etc.), but Y has only 1 property better than C, e.g. it comes with a PS2
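A brute-force sketch of an enumerating query, reusing the 13 points from the bitmap example with both dimensions minimized. The function names are hypothetical, and the sketch only illustrates the definition; it makes no attempt at the efficient evaluation discussed in the paper.

```python
def dominates(p, q):
    """p dominates q (smaller is better in every dimension)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def enumerate_skyline(points):
    """Map each skyline point to the number of points it dominates."""
    counts = {}
    for p in points:
        if not any(dominates(q, p) for q in points):
            counts[p] = sum(dominates(p, q) for q in points)
    return counts


POINTS = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(enumerate_skyline(POINTS))
# {(1, 9): 2, (3, 2): 9, (9, 1): 2}
```

Here (3,2) plays the role of hotel C: it dominates 9 of the 13 points, while the other two skyline points dominate only 2 each.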


Experimental Evaluation
- Running time comparison of the progressive algorithms, without the NN approaches

[Figure: running time of Index, Bitmap, D&C and BNL]


Conclusion
- Introduction to skyline queries
- How to implement (support) the skyline operator in a DBMS?
- Variations of skyline queries
- If the information is stored in different places, how can skyline queries be answered on a mobile device?


References
- S. Börzsönyi, D. Kossmann, K. Stocker. The Skyline Operator. ICDE 2001.
- K.L. Tan, P.K. Eng, B.C. Ooi. Efficient Progressive Skyline Computation. VLDB 2001.
- D. Kossmann, F. Ramsak, S. Rost. Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. VLDB 2002.
- D. Papadias, Y. Tao, G. Fu, B. Seeger. An Optimal and Progressive Algorithm for Skyline Queries. SIGMOD 2003.