![Page 1: SUBSKY: Efficient Computation of Skylines in Subspaces Authors: Yufei Tao, Xiaokui Xiao, and Jian Pei Conference: ICDE 2006 Presenter: Kamiru Superviosr:](https://reader035.fdocuments.in/reader035/viewer/2022062309/56649f175503460f94c2d8aa/html5/thumbnails/1.jpg)
SUBSKY: Efficient Computation of Skylines in Subspaces
Authors: Yufei Tao, Xiaokui Xiao, and Jian Pei
Conference: ICDE 2006
Presenter: Kamiru
Supervisor: Dr. Nikos Mamoulis
Skyline Queries
Given a set of d-dimensional points, a point p dominates another point p’ if p[i] <= p’[i] for all dimensions i, and p[j] < p’[j] for at least one dimension j
Skyline queries aim to find the points that are not dominated by any other point
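The dominance test and a brute-force skyline can be sketched as follows. This is a minimal illustration, not an algorithm from the paper: smaller values are assumed to be better on every dimension, and the `players` values are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q on every dimension and strictly better on one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Quadratic-time skyline: keep each point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (turnover rate, foul rate) pairs; lower is better on both axes.
players = [(0.2, 0.6), (0.4, 0.3), (0.7, 0.1), (0.5, 0.5), (0.8, 0.8)]
print(skyline(players))
```

A point never dominates itself, so comparing each point against the whole list is safe.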
For the NBA database, a low turnover rate and a low foul rate are two important factors for a defensive player
[Figure: players plotted by turnover rate and foul rate, each axis from 0 to 1; the best point is marked.]
Applications of Skyline Queries
Find a good hotel for me according to distance and price
[Figure: hotels A, B, C, and D placed by distance from the user, with price labels 500, 1000, 1500, and 2000.]
Hotel D cannot be a good hotel for this user, since its price is higher and its distance is farther than those of the other hotels
Alternative applications of Skyline Queries - i
Some top-k queries can be answered using skyline queries
A top-k query retrieves the k tuples in P with the highest scores according to g, where g must be a monotonic function, e.g.:
g(p) = p.x + p.y
Alternative applications of Skyline Queries - i
Please help me find the top-2 NBA players according to the sum of their points and assists in the 2007-2008 season
[Figure: players plotted by points and assists; each player's values are read at the top-right corner of the player photo, and one region is marked PRUNED.]
The results of this query (up to Jan 23, 2008) are
Allen Iverson, 27+6.9
LeBron James, 29.7+7.4
The top-2 results must be in the top-2 skyband
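The skyband claim can be checked with a small sketch. It assumes the usual definition of the k-skyband (points dominated by fewer than k other points) and that larger values are better; only the Iverson and James figures come from the slide, and the other three players are hypothetical.

```python
def dominates_max(p, q):
    """p dominates q when larger is better on every dimension."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def k_skyband(points, k):
    """Names of the points dominated by fewer than k other points."""
    return {
        name
        for name, v in points.items()
        if sum(dominates_max(w, v) for other, w in points.items() if other != name) < k
    }


def top_k(points, k, g):
    """Rank only the k-skyband by the monotonic score g and keep the best k."""
    band = k_skyband(points, k)
    return sorted(band, key=lambda name: g(points[name]), reverse=True)[:k]


players = {
    "Allen Iverson": (27.0, 6.9),   # (points, assists) from the slide
    "LeBron James": (29.7, 7.4),    # from the slide
    "player_c": (15.0, 3.0),        # hypothetical
    "player_d": (10.0, 9.0),        # hypothetical
    "player_e": (20.0, 5.0),        # hypothetical
}
print(top_k(players, 2, g=sum))
```

Because g is monotonic, a point dominated by k or more others can never reach the top k, so restricting the ranking to the skyband loses nothing.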
Alternative applications of Skyline Queries - ii
Another interesting measure is the dominating count (DC)
The DC of a point is the number of points that it dominates
Ex: find the top-2 dominating players in the NBA database according to turnover rate and foul rate
[Figure: players plotted by turnover rate and foul rate, each axis from 0 to 1; the best point is marked.]
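A top-k dominating query can be sketched by counting, for each point, how many points it dominates. This is a brute-force illustration under the assumption that smaller values are better; the sample values are hypothetical.

```python
def dominates(p, q):
    """Smaller is better on every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def dominating_count(p, points):
    """Number of points in the dataset that p dominates."""
    return sum(dominates(p, q) for q in points)


def top_k_dominating(points, k):
    """The k points with the largest dominating count."""
    return sorted(points, key=lambda p: dominating_count(p, points), reverse=True)[:k]


# Hypothetical (turnover rate, foul rate) pairs.
players = [(0.1, 0.1), (0.3, 0.2), (0.2, 0.8), (0.6, 0.6), (0.9, 0.9)]
print(top_k_dominating(players, 2))
```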
Skyline Computations
Two categories of skyline computations: computing from scratch (no index) and relying on an index
1. Computing from scratch (no index)
Advantages
• No pre-computation
• No index to update when the data change
Drawback
• Must calculate from scratch
– The entire dataset must be scanned at least once
Skyline Computations
2. Relying on an index
• Once the index is built, it can be used many times
• Query cost is lower because the search is performed on an appropriate structure
– B-tree
– R-tree
– …
Since all of us are database people (I hope…), we prefer method 2
Related works
1. Computing from scratch (no index)
• Block nested loop
• Sort filter skyline
• Divide and conquer
• Bitmap
• Linear elimination sort for skyline
Related works
2. Relying on an index
• B-tree approach
– Index
• R-tree approach
– Nearest neighbor (NN)
– Branch and bound skyline (BBS): proven to be I/O optimal, i.e., it accesses no more disk pages than any other algorithm based on R-trees
Related works
Index
A point p is added to list i if p has its smallest value in dimension i
List x: p5:0.1, p6:0.25, p2:0.3, p7:0.6
List y: p4:0.1, p1:0.2, p3:0.3, p8:0.6
[Figure: points p1 to p8 in the x-y plane; the best point is marked.]
1) Ssky = {p5}
2) Ssky = {p5,p4}
3) Ssky = {p5,p4,p1}
• All remaining elements in List x are pruned by p1, since both coordinates of p6 are larger than those of p1
• For the same reason, all remaining elements in List y are pruned by p1 too
Related works
BBS
[Figure: points p1 to p8 in the plane, grouped into R-tree nodes N1 to N4 under M1 and M2; the best point is marked and the dominant region of p1 is shaded.]
R-tree levels: M1 M2 / N1 N2 N3 N4 / p1 p2 p3 p4 p5 p6 p7 p8
HNN={p1,p2,N2,M2}
1) p1 is the first NN object from the best point; the dominant region of p1 is shown in grey
2) p2 is pruned because it lies in the dominant region
3) Expand N2
4) …
SUBSKY
In the NBA database, each player has more than 10 different attributes
A skyline query may be interested in only some of these attributes
SUBSKY
Option 1: build one R-tree and run BBS
• BBS is an I/O-optimal algorithm based on an R-tree index, but such approaches are optimized for a fixed set of dimensions
Option 2: build R-trees for all elements in the power set of dimensions
• Huge storage space
SUBSKY for uniform data
Anchor point Ac: the maximal corner of the data space, having the maximum coordinate on all dimensions
f(p) = max(1 - p[i]), where i is from 1 to d
fsky(psky) = min(1 - psky[i]), where i is from 1 to d
No point p satisfying f(p) < fsky(psky) can belong to the skyline
[Figure: unit square with axes x and y, anchor Ac at the maximal corner, a skyline point psky with its pruning region, and points p1 and p2 with f(p1), f(p2), and fsky(psky) marked; the best point is marked.]
A similar result exists for the skyline of any subspace
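The pruning rule follows directly from the two definitions: if f(p) < fsky(psky), then 1 - p[i] < 1 - psky[i] on every dimension, so psky dominates p. A minimal sketch, assuming coordinates normalized to [0, 1] with smaller values better; the sample points are hypothetical.

```python
def f(p):
    """L-infinity distance from p to the anchor Ac = (1, ..., 1)."""
    return max(1 - x for x in p)


def f_sky(p_sky):
    """Smallest gap between the skyline point and Ac over all dimensions."""
    return min(1 - x for x in p_sky)


def pruned_by(p, p_sky):
    """True when the 1D test alone proves that p_sky dominates p."""
    return f(p) < f_sky(p_sky)


p_sky = (0.2, 0.3)                      # hypothetical skyline point
print(pruned_by((0.6, 0.9), p_sky))     # inside the pruning region
print(pruned_by((0.1, 0.9), p_sky))     # outside: the test cannot prune it
```

The test is sufficient but not necessary: a point that fails it may still be dominated and has to be checked explicitly.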
SUBSKY for uniform data
A skyline query applies only to the relevant dimensions SUB
f’sky(psky) = min(1 - psky[i]), where i is in SUB
Then fsky(psky) <= f’sky(psky), since the minimum is taken over fewer dimensions
No point p satisfying f(p) < f’sky(psky) can belong to the skyline of SUB
SUBSKY for uniform data
Assume that our skyline query is interested in dimensions x and y only
        p1   p2   p3   p4   p5
x       0.2  0.4  0.5  0.9  0.6
y       0.2  0.4  0.3  0.1  0.8
z       0.5  0.9  0.1  0.6  0.7
f(pi)   0.8  0.6  0.9  0.9  0.4
First, we sort the data in descending order of f(pi): p3, p4, p1, p2, p5
1) Ssky={p3}, f’sky(p3)=0.5=min(1-0.5,1-0.3), U=0.5 (U is the largest f’sky value in Ssky)
2) Ssky={p3,p4}, f’sky(p4)=0.1, U=0.5
3) Ssky={p1,p4}, f’sky(p1)=0.8, U=0.8; p3 is removed when p1 is added, since p1 dominates it
4) f(p2)=0.6 < U, so p2 and p5 are pruned and the scan stops
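The walkthrough above can be reproduced with a short sketch of the single-anchor scan. It is an in-memory illustration only (the paper's method reads the sorted f values from an index), and the function names are chosen here for illustration.

```python
data = {
    "p1": (0.2, 0.2, 0.5),
    "p2": (0.4, 0.4, 0.9),
    "p3": (0.5, 0.3, 0.1),
    "p4": (0.9, 0.1, 0.6),
    "p5": (0.6, 0.8, 0.7),
}


def f(p):
    """1D key: L-infinity distance to the anchor Ac = (1, 1, 1)."""
    return max(1 - x for x in p)


def dominates_in(p, q, sub):
    """Dominance restricted to the dimensions in sub (smaller is better)."""
    return all(p[i] <= q[i] for i in sub) and any(p[i] < q[i] for i in sub)


def subsky_uniform(points, sub):
    """Scan points in descending f order; stop once f drops below U."""
    order = sorted(points, key=lambda name: f(points[name]), reverse=True)
    sky, u = [], 0.0
    for name in order:
        p = points[name]
        if f(p) < u:          # every remaining point is pruned as well
            break
        if any(dominates_in(points[s], p, sub) for s in sky):
            continue
        # Drop skyline members that the new point dominates, then add it.
        sky = [s for s in sky if not dominates_in(p, points[s], sub)] + [name]
        # U is the largest f'sky value among the current skyline points.
        u = max(min(1 - points[s][i] for i in sub) for s in sky)
    return sky


print(subsky_uniform(data, sub=(0, 1)))  # dimensions x and y
```

Ties in f (p3 and p4 both have 0.9) keep their input order because Python's sort is stable, which matches the order used on the slide.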
General SUBSKY
In practice, data are usually clustered
If the data are clustered, we should expect that a single anchor point cannot give us enough pruning power
[Figure: unit square showing Ac, a second anchor A1, a skyline point psky, and the best point.]
General SUBSKY
[Figure: clusters s1 to s4 with anchors for different clusters (A1, A2) alongside Ac and psky; the best point is marked.]
Two questions:
1) How to find the anchors?
2) How to assign points to anchors?
Finding the Anchors
First, let us see what a perfect anchor of a point p is
If p is assigned to a perfect anchor A, then p can be pruned by any skyline point dominating p
[Figure: point p with its anti-dominant region, the major perpendicular plane through Ac, and candidate anchors A1, A2, and A3 on a line through p; any point on this line is a perfect anchor of p.]
Finding the Anchors
For each point, find its projection onto the major perpendicular plane, e.g., p’1, p’2, …
Partition the projected points into m clusters using the k-means algorithm, and formulate an anchor for each cluster
[Figure: points p1 and p2 with their projections p’1 and p’2 on the major perpendicular plane, and a good anchor for them; Ac and the best point are marked.]
Finding the Anchors
How to decide an anchor for a cluster?
Blue points are assigned to cluster S. How can we decide the anchor for S?
1) Obtain point B, whose coordinate on each dimension equals the lowest coordinate of the points in S (in their original space) on that axis
2) Then compute the smallest square with corner B that covers all points in S; the corner opposite to B is the anchor A
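The two steps can be sketched as below. This assumes the anchor A is the corner of the covering square (hypercube) diagonally opposite B, which is how the figure labels A and B are read here; the cluster values are hypothetical.

```python
def cluster_anchor(cluster):
    """Return (B, A) for a cluster of points given in their original space."""
    d = len(cluster[0])
    # Step 1: B takes the lowest coordinate of the cluster on every axis.
    b = [min(p[i] for p in cluster) for i in range(d)]
    # Step 2: the side of the smallest covering hypercube is the largest extent.
    side = max(max(p[i] for p in cluster) - b[i] for i in range(d))
    a = [x + side for x in b]
    return tuple(b), tuple(a)


S = [(0.2, 0.3), (0.4, 0.35), (0.3, 0.6)]   # hypothetical cluster
B, A = cluster_anchor(S)
print(B, A)
```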
Assigning Points to Anchors
A naïve way is to assign each point to its closest anchor point in the major perpendicular plane (projected space)
However, this does not directly quantify the benefit of an assignment
Assigning Points to Anchors
To assign a point to a good anchor, the paper introduces a new measure called the effective region (ER)
Every point in the ER of p (yellow region) produces a pruning region with respect to Ac that covers p
The larger the ER volume of p, the more likely p is to be pruned
[Figure: point p with its ER relative to Ac, and the pruning regions of p1 and p2; the best point is marked.]
Assigning Points to Anchors
[Figure: two panels showing the ER of p, one with respect to Ac and one with respect to another anchor A’; points p1 and p2 and the best point are marked in both.]
Assigning Points to Anchors
The pruning volume of a point p with respect to an anchor point Aj is
∏ max(0, Aj[i] - L∞(p,Aj)), where i is from 1 to d
Therefore, assign each point p to the anchor Aj that produces the largest pruning volume
Query example
We use the same example as in the previous slides
Assume that we have two anchors: Ac, and A’ found by k-means (m=1)
Ac=(1,1,1) and A’=(1,1,0.8)
First, we calculate the ER volume of each data point with respect to Ac and A’
        p1   p2   p3   p4   p5
x       0.2  0.4  0.5  0.9  0.6
y       0.2  0.4  0.3  0.1  0.8
z       0.5  0.9  0.1  0.6  0.7
f(pi)   0.8  0.6  0.9  0.9  0.4
ER volume (unit: 10^-3)
        p1   p2   p3   p4   p5
Ac      8    64   1    1    216
A’      0    -    9    -    144
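The volumes in the table can be recomputed from the pruning-volume formula ∏ max(0, Aj[i] - L∞(p,Aj)). The sketch below applies the formula as written; it does not model whatever rule produces the "-" entries for A’, so those two cells are simply evaluated numerically.

```python
from math import prod

data = {
    "p1": (0.2, 0.2, 0.5),
    "p2": (0.4, 0.4, 0.9),
    "p3": (0.5, 0.3, 0.1),
    "p4": (0.9, 0.1, 0.6),
    "p5": (0.6, 0.8, 0.7),
}
anchors = {"Ac": (1.0, 1.0, 1.0), "A'": (1.0, 1.0, 0.8)}


def l_inf(p, a):
    """L-infinity distance between a point and an anchor."""
    return max(abs(x - y) for x, y in zip(p, a))


def pruning_volume(p, anchor):
    """Product over dimensions of max(0, anchor[i] - L_inf(p, anchor))."""
    r = l_inf(p, anchor)
    return prod(max(0.0, a - r) for a in anchor)


def assign(points, anchor_set):
    """Assign each point to the anchor giving the largest pruning volume."""
    return {
        name: max(anchor_set, key=lambda a: pruning_volume(p, anchor_set[a]))
        for name, p in points.items()
    }


for name, p in data.items():
    print(name, {a: round(pruning_volume(p, v) * 1000, 3) for a, v in anchors.items()})
print(assign(data, anchors))
```

Only p3 gains from A’ (about 9 versus 1 in units of 10^-3), so it is the only point assigned to it.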
Query example
Sorted lists by f:
Ac: p4, p1, p2, p5
A’: p3
1) Ssky={p4}, f’sky(p4)=0.1, U=0.1
2) Ssky={p4, p1}, f’sky(p1)=0.8, U=0.8
Experiments
3 real datasets: NBA, Household, and Color
              NBA   Household   Color
Dimension     13    6           9
Cardinality   17k   127k        68k
2 synthetic datasets (10D): Uniform and Clustered
• Clustered: 10 cluster centroids
• For each centroid, N/10 points are generated whose coordinate on each axis follows a Gaussian distribution with variance 0.05 and a mean equal to the corresponding coordinate of the centroid
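The clustered generator described above might look like the following sketch. How the centroids are chosen is not stated on the slide, so drawing them uniformly from the unit cube is an assumption, as are the function name, the seed, and the absence of clipping to [0, 1].

```python
import random


def clustered_dataset(n, d=10, clusters=10, variance=0.05, seed=0):
    """Generate n points in d dimensions around `clusters` Gaussian centroids."""
    rng = random.Random(seed)
    sigma = variance ** 0.5          # random.gauss takes a standard deviation
    # Assumption: centroids are drawn uniformly from the unit cube.
    centroids = [[rng.random() for _ in range(d)] for _ in range(clusters)]
    points = []
    for centroid in centroids:
        for _ in range(n // clusters):
            points.append(tuple(rng.gauss(mu, sigma) for mu in centroid))
    return points


sample = clustered_dataset(1000)
print(len(sample), len(sample[0]))
```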
Experiments
3D subspaces, 1 million cardinality
3D subspaces, full-space dimensionality is 10
Conclusion
The core of SUBSKY is a transformation that converts multi-dimensional data into 1D values
It shows better performance than an I/O-optimal algorithm on the subspace skyline problem
Some continuous monitoring cases are worth investigating
• How to adapt the set of anchor points if the data are updated rapidly
• The f values could be stored in another index structure to support fast updates