Skyline Queries
HKU CSIS DB Seminar, 9 April 2003
Speaker: Eric Lo
Skyline
- A new operator in database systems
- Filters out a set of interesting points from a potentially large set of data points
- A data point is interesting if it is not dominated by any other point
Example
Find some good places for us to hold the next DB Seminar.
Dataset (Table Homes):
Home      Distance from HKU   Area (m²)
Kevin     1 km                10
Ben       9 km                100
Felix     5 km                2
K.K Loo   8 km                250
- Good: close to HKU (Min.)
- Good: large area (Max.)
Return those homes that are not dominated, i.e. no other home is at least as good in ALL DIMENSIONS and better in at least one.
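A minimal sketch of the dominance test on the Homes table above, assuming distance is minimised and area is maximised as the slide states. The names `dominates` and `skyline` are illustrative, not from the talk.

```python
from typing import List, Tuple

Home = Tuple[str, float, float]  # (name, distance in km, area in m2)

HOMES: List[Home] = [
    ("Kevin", 1, 10),
    ("Ben", 9, 100),
    ("Felix", 5, 2),
    ("K.K Loo", 8, 250),
]

def dominates(a: Home, b: Home) -> bool:
    """a dominates b: no worse in both dimensions and strictly better in one."""
    no_worse = a[1] <= b[1] and a[2] >= b[2]
    better = a[1] < b[1] or a[2] > b[2]
    return no_worse and better

def skyline(homes: List[Home]) -> List[Home]:
    """Keep every home that no other home dominates (quadratic, for illustration)."""
    return [h for h in homes if not any(dominates(o, h) for o in homes)]

print([h[0] for h in skyline(HOMES)])  # ['Kevin', 'K.K Loo']
```

Felix is dominated by Kevin and Ben by K.K Loo; Kevin and K.K Loo do not dominate each other, so both remain.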
Outline
- Introduction to Skyline Queries
- Skyline Operator in SQL
- Implementation Algorithms
- Progressive Algorithms
- Variations of Skyline Queries
- Experimental Results
- Conclusion
The Skyline Operator
ICDE 2001
S. Börzsönyi, D. Kossmann, K. Stocker
1. Define the skyline operator in databases
2. Extension of SQL for skyline
3. Block-nested-loop algorithm
4. Divide-and-conquer algorithm
Problem Definition
- Related to:
  - maximum vector problem
  - contour problem
  - convex hull of a data set
- Assume the whole dataset fits in memory
SQL Extensions
SELECT … FROM … WHERE …
GROUP BY … HAVING …
SKYLINE OF [DISTINCT] d1 [MIN | MAX], …, dm [MIN | MAX]
ORDER BY …
d1 … dm denote the dimensions that participate in the skyline.
Example:
SELECT * FROM HOMES WHERE CITY = 'HK' SKYLINE OF DIST MIN, AREA MAX;
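SKYLINE OF is a proposed extension, not standard SQL. As a hedged illustration, the same result can be written in standard SQL with a NOT EXISTS subquery. The sketch below runs that query in SQLite through Python's `sqlite3` module. The table layout and the CITY values are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homes (name TEXT, city TEXT, dist REAL, area REAL)")
conn.executemany(
    "INSERT INTO homes VALUES (?, ?, ?, ?)",
    [
        ("Kevin", "HK", 1, 10),
        ("Ben", "HK", 9, 100),
        ("Felix", "HK", 5, 2),
        ("K.K Loo", "HK", 8, 250),
    ],
)

# A home is kept unless some other home in the same city is no worse in both
# dimensions (DIST MIN, AREA MAX) and strictly better in at least one.
QUERY = """
SELECT h.name
FROM homes AS h
WHERE h.city = 'HK'
  AND NOT EXISTS (
        SELECT 1 FROM homes AS o
        WHERE o.city = 'HK'
          AND o.dist <= h.dist AND o.area >= h.area
          AND (o.dist < h.dist OR o.area > h.area))
"""
skyline_names = {row[0] for row in conn.execute(QUERY)}
print(sorted(skyline_names))
```

The nested subquery compares every tuple with every other tuple, which is the cost a dedicated skyline operator is meant to avoid.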
Naïve Approach for Skyline
- A 1D skyline is equivalent to computing min or max in SQL
- Naïve 2D skyline:
  - Sort the data according to the 2 dimensions
  - Compare every tuple with its predecessor (the last tuple that was kept)
  - Sorting may need 2 or more passes if the data do not fit into memory → use current external sorting techniques
Naïve 2D
Home    Distance from HKU   Area
Kevin   1 km                10
Felix   5 km                2
KK      8 km                250
Ben     9 km                100
1. Sort by "Distance"
2. Compare "Felix" with "Kevin" → eliminate "Felix"
3. Compare "KK" with "Kevin" → incomparable (neither dominates the other) → "KK" is part of the skyline
4. Compare "Ben" with "KK" → eliminate "Ben"
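The four steps above can be sketched as a sort followed by a single scan. This is an illustration under two assumptions: the data fit in memory (so Python's built-in sort stands in for an external sort) and there are no exact duplicate tuples. The function name is hypothetical.

```python
def skyline_2d_sorted(homes):
    """homes: (name, distance, area) tuples; distance is minimised, area maximised."""
    # Sort by distance ascending; for equal distance put the larger area first.
    ordered = sorted(homes, key=lambda h: (h[1], -h[2]))
    kept = []
    for home in ordered:
        # Everything already kept is at least as close, so the new tuple
        # survives only if its area beats the last kept tuple.
        if not kept or home[2] > kept[-1][2]:
            kept.append(home)
    return kept

HOMES = [("Kevin", 1, 10), ("Felix", 5, 2), ("KK", 8, 250), ("Ben", 9, 100)]
print([h[0] for h in skyline_2d_sorted(HOMES)])  # ['Kevin', 'KK']
```

Comparing only with the last kept tuple is enough in 2D because the kept areas increase along the sort order.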
Naïve 2D does not work for more than 2 dimensions
If the skyline involves more than 2 dimensions, sorting does not work.
Home    Distance from HKU   Area   Rent
Kevin   1 km                10     $9
Felix   5 km                2      $5
KK      8 km                250    $10
Ben     9 km                100    $9
2D:
- Cmp Felix, Kevin → eliminated
- Cmp KK, Kevin → part of skyline
- Cmp Ben, KK → eliminated
3D:
- Cmp Felix, Kevin → part of skyline
- Cmp KK, Felix → part of skyline
- Cmp Ben, KK → part of skyline
No! Ben dominated by Kevin → comparing with the predecessor only does not work!
Block-nested-loops Algorithm
- A straightforward approach:
  - Compare each point p with every other point
  - If p is not dominated → part of skyline
- Scan the data file and keep a list of candidate skyline points in main memory
BNL (cont.)
1. Insert the 1st data point into the list
2. For each subsequent point p:
   1. If p is dominated by any point in the list, it is discarded
   2. If p dominates any point in the list, insert it into the list and remove all points dominated by p
   3. If p is neither dominated by nor dominates any point in the list, insert it into the list as part of the skyline
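A minimal in-memory sketch of the steps above, assuming the candidate list never overflows and every dimension is minimised (a MAX dimension such as area is negated beforehand). The function names are illustrative.

```python
def dominates(a, b):
    """All dimensions minimised: a is no worse everywhere and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def bnl(points):
    """Block-nested-loops skyline with an unbounded candidate list."""
    candidates = []
    for p in points:
        # Case 1: p is dominated by a candidate, so it is discarded.
        if any(dominates(c, p) for c in candidates):
            continue
        # Cases 2 and 3: drop every candidate that p dominates, then keep p.
        candidates = [c for c in candidates if not dominates(p, c)]
        candidates.append(p)
    return candidates

# Homes as (distance, -area): Kevin, Ben, Felix, K.K Loo.
print(bnl([(1, -10), (9, -100), (5, -2), (8, -250)]))  # [(1, -10), (8, -250)]
```

Ben first enters the list because it is incomparable with Kevin, and is removed later when K.K Loo arrives.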
BNL (cont.)
- The candidate list is self-organizing:
  - Points that have dominated other points are moved to the top of the list
  - This reduces the number of comparisons
- E.g. the self-organizing list holds a partial skyline like:
Home    Distance from HKU   Area   Rent
Kevin   1 km                249    $1
K.K     8 km                250    $100
…
- The other skyline points are not as strong as Kevin, except in a few dimensions
More on BNL
- Point 3 in BNL: if p is neither dominated by nor dominates any point in the list, insert it into the list as part of the skyline
- If there is no more space in the list, write p to a temporary file on disk
- Tuples in the temporary file are processed further in the next iteration of the algorithm
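More on BNL (cont.)
Input: A B C D E F G H I J; the list holds at most 4 points.
Point   Outcome
A       inserted into the list
B       dominated by A
C       dominated by A
D       dominated by A
E       incomparable with A → inserted
F       incomparable with A, E → inserted
G       dominates F → replaces F
H       incomparable with A, E, G → inserted
I       incomparable with A, E, G, H, but the list is full → temporary file
J       incomparable with A, E, G, H, but the list is full → temporary file (J has not been compared with I)
List after the 1st iteration: A, E, G, H. Temporary file: I, J.
After the 1st iteration, A, E, G, H are output as skyline points; then the list is cleared, I, J, … are treated as the new data set, and BNL is performed again.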
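The bounded list and the temporary file can be sketched as follows. This is an illustration, not the authors' implementation: a Python list stands in for the file on disk, all dimensions are minimised, and the arrival positions are an added assumption. A point that entered the list before the first overflow has met every point in the temporary file, so it can be output. A point that entered later is carried into the next iteration.

```python
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def bnl_bounded(points, list_size):
    """BNL with at most list_size candidates in memory."""
    if list_size < 1:
        raise ValueError("list_size must be at least 1")
    result, window, data = [], [], list(points)
    while data:
        temp = []               # stands in for the temporary file
        first_overflow = None   # position of the first point written to temp
        for pos, p in enumerate(data):
            if any(dominates(w, p) for w, _ in window):
                continue
            window = [(w, t) for w, t in window if not dominates(p, w)]
            if len(window) < list_size:
                window.append((p, pos))
            else:
                temp.append(p)
                if first_overflow is None:
                    first_overflow = pos
        if first_overflow is None:
            done, carried = window, []
        else:
            done = [(w, t) for w, t in window if t < first_overflow]
            # Carried points get position -1: they precede the next input.
            carried = [(w, -1) for w, t in window if t > first_overflow]
        result.extend(w for w, _ in done)
        window, data = carried, temp
    return result

PRICE_DIST = [(0.2, 4), (0.8, 4), (0.4, 3), (0.3, 2), (0.1, 2),
              (0.6, 1), (0.8, 3), (0.2, 3), (0.3, 3), (0.5, 2)]
print(bnl_bounded(PRICE_DIST, 1))
```

In the slide's example A, E, G and H all entered before I was written out, which is why all four can be output after the first iteration.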
Short summary on BNL
- Easy to implement
- Works for any number of dimensions without using an index or sorting
- Relies on main memory → may need many iterations
- Not adequate for on-line processing: it has to read the entire data file before it returns the first skyline point (not progressive)
Divide-and-Conquer Algorithm
Price   Dist
0.2     4
0.8     4
0.4     3
0.3     2
0.1     2
0.6     1
0.8     3
0.2     3
0.3     3
0.5     2
1) Find the median of some dimension, say price: Price(med) = 0.3
2) Split the input into 2 partitions according to Price(med)
3) Compute skyline S1 in P1 (≤ 0.3) and S2 in P2 (> 0.3) respectively by recursive partitioning [Note: S1 is better than S2 on price]
4) Partition recursively until a partition contains very few (or 1) tuples
5) If a partition holds only a few tuples, finding its skyline is easy
6) Merge the skylines of the partitions by eliminating those tuples of S2 that are dominated by a tuple of S1 [Note: none of the tuples in S1 can be dominated by a tuple in S2, as all tuples in S1 are better than those in S2 on price, i.e. tuples in UPPER are never eliminated]
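A compact sketch of the six steps, assuming all dimensions are minimised and the data fit in memory. As a simplification that is not from the slides, the input is sorted lexicographically once and split at the middle index instead of at a median value. This keeps the property used in the merge step: no tuple of the second half can dominate a tuple of the first half.

```python
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def _dc(sorted_points):
    # Steps 4 and 5: a partition with at most one tuple is its own skyline.
    if len(sorted_points) <= 1:
        return list(sorted_points)
    # Steps 1 and 2: split at the middle of the sorted order.
    mid = len(sorted_points) // 2
    # Step 3: solve both partitions recursively.
    s1 = _dc(sorted_points[:mid])
    s2 = _dc(sorted_points[mid:])
    # Step 6: S1 is kept whole; S2 loses every tuple dominated by a tuple of S1.
    return s1 + [q for q in s2 if not any(dominates(p, q) for p in s1)]

def dc_skyline(points):
    return _dc(sorted(points))

PRICE_DIST = [(0.2, 4), (0.8, 4), (0.4, 3), (0.3, 2), (0.1, 2),
              (0.6, 1), (0.8, 3), (0.2, 3), (0.3, 3), (0.5, 2)]
print(dc_skyline(PRICE_DIST))  # [(0.1, 2), (0.6, 1)]
```

On the example data only (0.1, 2) and (0.6, 1) survive: the first dominates every tuple with distance 2 or more, and no tuple has distance 1 or less with price at most 0.6.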
Divide-and-Conquer Algorithm (cont.)
1) Find the median of some dimension, say price: Price(med) = 0.3
2) Split the input into 2 partitions according to Price(med)
Partition 1 (Price, Dist): (0.3, 2), (0.3, 3), (0.2, 4), (0.2, 3), (0.1, 2)
Partition 2 (Price, Dist): (0.8, 4), (0.6, 1), (0.5, 2), (0.4, 3), (0.8, 3)
Divide-and-Conquer Algorithm (cont.)
[Figure: the input, its two partitions and their recursive sub-partitions, with the partial skylines labelled S1 to S7]
Divide-and-Conquer Algorithm (cont.)
[Figure: same layout as before, showing the merge step: the partial skylines S1, S2 and S4, S5, S7 are merged into S1, S2, S7, i.e. tuples in UPPER are never eliminated]
Efficient Progressive Skyline Computation
VLDB 2001
K.L. Tan, P.K. Eng, B.C. Ooi
Previous approaches require at least one pass over the dataset to return the first interesting point. We propose:
1. Bitmap-based algorithm
2. B+-tree-based algorithm
They can return the first interesting point as soon as it is identified.
Progressive?
- Both the bitmap and the tree-based algorithm return the first skyline points very quickly
- Useful if you are not willing to wait long for the first few interesting homes out of a large dataset
- They also outperform BNL and D&C in overall response time
Skyline by Bitmap
- Main idea: given a point p, if "something" can tell you:
  - p is not dominated by any other point in the DB → skyline!
  - p is dominated by some point in the DB → throw away
- Non-blocking! Can return the skyline points immediately
Bitmap
- All the information required to decide whether a point is in the skyline is encoded in bitmaps
- A data point p = (p1, p2, …, pd), where d is the number of dimensions, is mapped to an m-bit vector, where m is the total number of distinct values over all dimensions
Bitmap (cont.)
The numbers of distinct values on price and distance are 7 and 4 → m = 11
Price   Dist
0.2     4
0.8     4
0.4     3
0.3     2
0.1     2
0.6     1
0.8     3
0.2     3
0.3     3
0.5     2
Bitmap representation
- Distinct values on x: 10; distinct values on y: 10 → m = 20, i.e. a 20-bit vector
- E.g. (4,8):
  - 4 is the 4th smallest value on dimension x: set the bits from the 4th position (counting from the right) up to the leftmost bit to 1
  - 8 is the 8th smallest value on y: set the bits from the 8th position (counting from the right) up to the leftmost bit to 1
Point    Bitmap representation
(1,9)    (1111111111, 1100000000)
(2,10)   (1111111110, 1000000000)
(4,8)    (1111111000, 1110000000)
(6,7)    (1111100000, 1111000000)
(9,10)   (1100000000, 1000000000)
(7,5)    (1111000000, 1111110000)
(5,6)    (1111110000, 1111100000)
(4,3)    (1111111000, 1111111100)
(3,2)    (1111111100, 1111111110)
(9,1)    (1100000000, 1111111111)
(10,4)   (1000000000, 1111111000)
(6,2)    (1111100000, 1111111110)
(8,3)    (1110000000, 1111111100)
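The encoding in the table can be reproduced with a few lines. This is a sketch with illustrative names (`encode`, `bitmap`): each coordinate gets one bit per distinct value, largest value leftmost, and the bit for value u is 1 exactly when the coordinate is at most u.

```python
def encode(value, distinct_values):
    """Bit string for one coordinate; bit for u is '1' iff value <= u."""
    ordered = sorted(distinct_values, reverse=True)  # largest value leftmost
    return "".join("1" if value <= u else "0" for u in ordered)

def bitmap(point, domains):
    """One bit string per dimension of the point."""
    return tuple(encode(v, dom) for v, dom in zip(point, domains))

DOMAIN = range(1, 11)  # the 10 distinct values on x and on y in the example
print(bitmap((4, 8), (DOMAIN, DOMAIN)))  # ('1111111000', '1110000000')
print(bitmap((9, 1), (DOMAIN, DOMAIN)))  # ('1100000000', '1111111111')
```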
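Bitmap representation (cont.)
- Is (4,8) a skyline point? (min x, y)
- Create the bit-strings Cx and Cy (not CY Ng!), with one bit per point in the order of the table of points: the bit in Cx is 1 if the point's x ≤ 4, the bit in Cy is 1 if the point's y ≤ 8
  Cx      = 1110000110000
  Cy      = 0011011111111
  Cx & Cy = 0010000110000
- If Cx & Cy has more than one '1', the point is dominated by some other point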
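Bitmap representation (cont.)
- Is (3,2) a skyline point? (min x, y)
- Create the bit-strings Cx and Cy, with one bit per point in the order of the table of points: the bit in Cx is 1 if the point's x ≤ 3, the bit in Cy is 1 if the point's y ≤ 2
  Cx      = 1100000010000
  Cy      = 0000000011010
  Cx & Cy = 0000000010000
- If Cx & Cy has only one '1' (the bit of the point itself), it is a skyline point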
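The two checks can be replayed in code. The sketch below builds Cx and Cy directly from the 13 example points rather than from stored bitmaps, and it follows the slide's counting rule, which assumes that no two points are identical. All function names are illustrative.

```python
POINTS = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]

def bit_slice(points, dim, value):
    """One bit per point, in table order: '1' iff the point's coordinate <= value."""
    return "".join("1" if q[dim] <= value else "0" for q in points)

def bit_and(a, b):
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))

def is_skyline(p, points):
    cx = bit_slice(points, 0, p[0])
    cy = bit_slice(points, 1, p[1])
    # Only p itself may be at least as good in both dimensions.
    return bit_and(cx, cy).count("1") == 1

print(bit_slice(POINTS, 0, 4), bit_slice(POINTS, 1, 8))
print(is_skyline((4, 8), POINTS), is_skyline((3, 2), POINTS))  # False True
```

For (4,8) the extra '1' bits belong to (4,3) and (3,2), which are the points that dominate it.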
Short summary on Bitmap
- Need to pre-compute the bitmap representation of every point
- For each point, all bitmaps must be retrieved in order to get the juxtapositions (Cx and Cy)
- Large storage if the domain of each attribute is large
Some other progressive algorithms
- B+-tree index (also proposed by Tan, Eng and Ooi)
  - Organizes the points into d lists (d is the number of dimensions in the data)
  - Builds a B+-tree on the lists for retrieving skyline points
  - Suffers from a similar problem as the bitmap approach
Some other progressive algorithms (cont.)
- NN algorithm (by Donald Kossmann again) [VLDB 02]
NN skyline
An Optimal and Progressive Algorithm for Skyline Queries
SIGMOD 2003
D. Papadias, Y. Tao, G. Fu, B. Seeger
We propose:
1. An NN algorithm which is more efficient and I/O optimal
2. Ranked skyline queries
3. Constrained skyline queries
4. Dynamic skyline queries
5. K-dominating queries
Ranked Skyline
- A ranked skyline returns the K skyline points that have the minimum/maximum score according to a function f
- In our example, f = 3*Dist + 7*Area → return the top K homes
- Although a skyline returns interesting points, we may want the most interesting points according to our own preferences, especially when the data set is large (say, hotels) and the skyline is also large!
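A small sketch of a ranked skyline on the Homes example, using the scoring function f = 3*Dist + 7*Area from the slide. Whether the lowest or the highest score is preferred is left as a parameter, since the slide allows both. The function names are illustrative.

```python
HOMES = [("Kevin", 1, 10), ("Ben", 9, 100), ("Felix", 5, 2), ("K.K Loo", 8, 250)]

def dominates(a, b):
    """Distance (index 1) is minimised, area (index 2) is maximised."""
    return (a[1] <= b[1] and a[2] >= b[2]) and (a[1] < b[1] or a[2] > b[2])

def f(home):
    return 3 * home[1] + 7 * home[2]

def ranked_skyline(homes, score, k, lowest_first=True):
    """Compute the skyline, then return its k best points by score."""
    sky = [h for h in homes if not any(dominates(o, h) for o in homes)]
    return sorted(sky, key=score, reverse=not lowest_first)[:k]

print(ranked_skyline(HOMES, f, 1))                      # [('Kevin', 1, 10)]
print(ranked_skyline(HOMES, f, 1, lowest_first=False))  # [('K.K Loo', 8, 250)]
```

Kevin scores 3*1 + 7*10 = 73 and K.K Loo scores 3*8 + 7*250 = 1774; Ben and Felix are not ranked because they are not in the skyline.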
Constrained Skyline
- Returns the most interesting points within a specific region of the data space
Dynamic Skyline
- Returns an updated skyline dynamically
- E.g. ask for hotels with minimum distance and price (again?)
- The minimum distance now depends on my current location
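A sketch of the idea with hypothetical hotels: the distance dimension is recomputed from the current location before the skyline is taken over (distance, price). The hotel names, coordinates and prices are made up for illustration and are not from the talk.

```python
import math

# Hypothetical data: (name, x, y, price).
HOTELS = [("A", 0.0, 0.0, 100.0), ("B", 10.0, 0.0, 80.0), ("C", 5.0, 5.0, 120.0)]

def dynamic_skyline(hotels, here):
    """Skyline over (distance from here, price), both minimised."""
    scored = [(name, math.hypot(x - here[0], y - here[1]), price)
              for name, x, y, price in hotels]

    def dominates(a, b):
        return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])

    return [s[0] for s in scored if not any(dominates(o, s) for o in scored)]

print(dynamic_skyline(HOTELS, (1.0, 0.0)))  # ['A', 'B']
print(dynamic_skyline(HOTELS, (9.0, 0.0)))  # ['B']
```

Moving from (1, 0) to (9, 0) makes B both the nearest and the cheapest hotel, so the skyline shrinks to B alone.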
Enumerating Skyline
- Enumerating queries return, for each skyline point p, the number of points dominated by p
- Sometimes useful: suppose skyline hotel C dominates 1000 hotels, while another skyline hotel Y dominates only 1 hotel → maybe C is better than Y in many properties (e.g. price, dist, etc.), and Y is better than C in only 1 property, e.g. it has a PS2
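A brute-force sketch of an enumerating query on the Price/Dist data used earlier, with both dimensions minimised. The function name is illustrative, and the quadratic counting is for clarity only.

```python
PRICE_DIST = [(0.2, 4), (0.8, 4), (0.4, 3), (0.3, 2), (0.1, 2),
              (0.6, 1), (0.8, 3), (0.2, 3), (0.3, 3), (0.5, 2)]

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def enumerate_skyline(points):
    """Map each skyline point to the number of points it dominates."""
    sky = [p for p in points if not any(dominates(q, p) for q in points)]
    return {p: sum(dominates(p, q) for q in points) for p in sky}

print(enumerate_skyline(PRICE_DIST))  # {(0.1, 2): 8, (0.6, 1): 2}
```

Here (0.1, 2) dominates 8 of the other 9 points, while (0.6, 1) dominates only the two points with price 0.8.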
Experimental Evaluation
Running time comparison of the algorithms, without the NN approaches
[Figure: running-time plot with the series Index, Bitmap, D&C, BNL]
Conclusion
- Introduction to skyline queries
- How to implement (support) the skyline operator in a DBMS?
- Variations of skyline queries
- If the information is stored in different places, how can skyline queries be answered on a mobile device?
References
- S. Börzsönyi, D. Kossmann, K. Stocker. The Skyline Operator. ICDE 2001.
- K.L. Tan, P.K. Eng, B.C. Ooi. Efficient Progressive Skyline Computation. VLDB 2001.
- D. Kossmann, F. Ramsak, S. Rost. Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. VLDB 2002.
- D. Papadias, Y. Tao, G. Fu, B. Seeger. An Optimal and Progressive Algorithm for Skyline Queries. SIGMOD 2003.