Post on 15-Jul-2015
INI Lab.
An Optimal and Progressive Algorithm for Skyline Queries
Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger
ACM SIGMOD 2003
Presenters: KYEONG SEOK HYUN, WOO-SUNG CHOI, JA-YEON KIM
Abstract
An Optimal and Progressive Algorithm for Skyline Queries Using R-Tree
Contents
1. Introduction
2. Related Work
2.1 Block Nested Loop (BNL)
2.5 Nearest Neighbor (NN)
3. Branch and Bound Skyline Algorithm (with I/O analysis)
5. Experimental Evaluation
Skyline
Problem definition
Which one do you prefer?
http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html
http://drmoontv.blogspot.kr/2013/03/blog-post_17.html
http://emperia.egloos.com/m/2516211
5,000 Won
40,000 Won
4,500 Won
http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting
Good value >> poor value
Preliminaries
Formal definition of Dominates (≪)
Given a set of d-dimensional points $T$,
we say that a point $t_1 \in T$ DOMINATES another point $t_2 \in T$
if and only if
$\forall i \in \{1, 2, 3, \dots, d\}:\ t_1[i] \ge t_2[i]$, and
$\exists j \in \{1, 2, 3, \dots, d\}:\ t_1[j] > t_2[j]$.
This is denoted by $t_2 \ll t_1$
(simply put, $t_1$ is the better deal).
Definition from http://www.comp.nus.edu.sg/~atung/publication/k_dominant.pdf
Note that the meaning of ‘dominates’ may differ
according to the type of application.
Which one do you prefer?
http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html
http://drmoontv.blogspot.kr/2013/03/blog-post_17.html
http://emperia.egloos.com/m/2516211
5,000 Won
40,000 Won
4,500 Won
4,500 Won
http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting
Still: good value >> poor value
Hotel(attraction, 1/price, 1/distance)
Two hotels
A : `80`, `1/15,000`, `1/500m`
B : `30`, `1/20,000`, `1/1500m`
𝐵 ≪ 𝐴
Why?
30<80
1/20,000 < 1/15,000
1/1,500m < 1/500m
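The dominance test above can be sketched in a few lines of Python. This is a minimal illustration, assuming larger values are better in every dimension (as in the definition earlier); the function name `dominates` and the variable names are hypothetical, and the hotel values are the ones from the slide.

```python
from typing import Sequence


def dominates(t1: Sequence[float], t2: Sequence[float]) -> bool:
    """Return True if t1 dominates t2 (written t2 << t1 on the slides).

    Assumption: larger is better in every dimension.
    """
    if len(t1) != len(t2):
        raise ValueError("points must have the same dimensionality")
    # t1 must be at least as good everywhere ...
    at_least_as_good = all(a >= b for a, b in zip(t1, t2))
    # ... and strictly better in at least one dimension.
    strictly_better = any(a > b for a, b in zip(t1, t2))
    return at_least_as_good and strictly_better


# Hotels encoded as (attraction, 1/price, 1/distance).
hotel_a = (80, 1 / 15_000, 1 / 500)
hotel_b = (30, 1 / 20_000, 1 / 1500)

a_dominates_b = dominates(hotel_a, hotel_b)
b_dominates_a = dominates(hotel_b, hotel_a)
```

Note that a point never dominates itself, because the strict inequality fails in every dimension.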
[Figure: hotels A and B plotted on attraction and 1/price axes; A dominates B (B ≪ A)]
for example,
Very important
Problem Definition (mathematical)
The Skyline operator
Input: a set of objects $P = \{p_1, p_2, \dots, p_N\}$
Output: $\{\, p_i \mid p_i \in P \text{ and } \nexists\, p^* \in P \text{ s.t. } p_i \ll p^* \,\}$
[Figure: points A to G on x and y axes, with Dominating Area(B) marked]
Common misconceptions
"B ∈ Output since B ≫ C, D, F": wrong
"B ∈ Output since no other point p ∈ P satisfies p ≫ B": correct
Naïve approach
for processing skyline queries
Exhaustive Test
Suppose there are n objects in the given set
$D = \{o_1, o_2, \dots, o_n\}$
Algorithm: Naïve 1
S = ∅
for each object o_x ∈ D
    boolean isDominated = false
    for each object o_y ∈ D
        if ¬(o_x = o_y) AND o_x ≪ o_y then
            isDominated = true
            break
    if ¬isDominated then S = S ∪ {o_x}
return S
[Figure: points A to G]
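The exhaustive test can be written as a short runnable sketch. It assumes larger values are better in every dimension, as in the formal definition; `naive_skyline`, `dominates`, and the sample coordinates are hypothetical and are not the coordinates of points A to G.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(t1: Point, t2: Point) -> bool:
    """True if t1 dominates t2 (assumption: larger is better)."""
    return (all(a >= b for a, b in zip(t1, t2))
            and any(a > b for a, b in zip(t1, t2)))


def naive_skyline(points: Sequence[Point]) -> List[Point]:
    """Compare every object against every other object: O(n^2) tests."""
    skyline = []
    for i, o_x in enumerate(points):
        is_dominated = False
        for j, o_y in enumerate(points):
            # Skip the object itself; stop at the first dominator found.
            if i != j and dominates(o_y, o_x):
                is_dominated = True
                break
        if not is_dominated:
            skyline.append(o_x)
    return skyline


# Hypothetical 2-d sample data.
sample = [(1, 9), (3, 7), (2, 6), (5, 5), (4, 4), (8, 2), (7, 1)]
result = naive_skyline(sample)
```

Every object is read once in the outer loop and up to n times in the inner loop, which is why this approach always scans the full dataset.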
Nested Loop Structure
Modification (Algorithm: Naïve 2)
Idea 1. Use a nested loop structure.
Idea 2. Take advantage of ‘block transfer’ for better reuse.
[Figure: points A to G divided into Block A and Block B]
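One way to picture the block-based variant is the sketch below. It is a simplification under stated assumptions, not the exact algorithm from the slides: the data is read one block at a time, a window of candidate skyline points is kept in memory, and the window is assumed never to overflow (a full Block Nested Loop implementation would spill to a temporary file). Larger values are assumed to be better, and `bnl_skyline` and `block_size` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(t1: Point, t2: Point) -> bool:
    """True if t1 dominates t2 (assumption: larger is better)."""
    return (all(a >= b for a, b in zip(t1, t2))
            and any(a > b for a, b in zip(t1, t2)))


def bnl_skyline(points: Sequence[Point], block_size: int = 3) -> List[Point]:
    """Block-wise nested loop with an in-memory window of candidates."""
    window: List[Point] = []
    for start in range(0, len(points), block_size):
        block = points[start:start + block_size]  # one simulated block read
        for p in block:
            # Drop p if a current candidate dominates it.
            if any(dominates(w, p) for w in window):
                continue
            # Otherwise p evicts every candidate it dominates.
            window = [w for w in window if not dominates(p, w)]
            window.append(p)
    return window


# Hypothetical 2-d sample data.
sample = [(4, 4), (5, 5), (1, 9), (2, 6), (3, 7), (7, 1), (8, 2)]
result = bnl_skyline(sample)
```

Each block is compared against the window while it is in memory, so a block is reused for many comparisons per read. The full scan over the data remains.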
The inherent limitation of these approaches
1. They need a full scan over the data.
2. Yet the query result contains only a small fraction of the dataset.
3. That is, these approaches are wasteful.
R-Tree Index Approach
for processing skyline queries
Preliminaries
R-Tree
Nearest Neighbor Query
Preliminaries
R-Tree: a balanced tree for indexing multi-dimensional objects
Supports dynamic operations (insert, update, delete)
R-Tree Index Approach
R-Tree vs. B-Tree
B+-Tree
  Balanced: requires that all leaves be at the same depth
  Leaf nodes contain one-dimensional values
R-Tree
  Similar to B+-Tree
  Leaf nodes contain d-dimensional values
http://courses.cs.washington.edu/courses/cse444/09sp/hw/hw3/hw3.html
R-Tree Index Approach
Spatial objects (or d-dimensional objects or geometric objects)
d-dimensional objects? An R-Tree is used to organize
a set of d-dimensional objects.
How? Main idea:
Minimum Bounding Rectangles (MBRs)
http://caversham.otago.ac.nz/research/geog.php
<Objects in 2-dimension space>
Quiz
What is the minimum number of points needed to represent a rectangle?
Assumption: each rectangle is parallel to the coordinate axes.
[Figure: an axis-parallel rectangle drawn on x and y axes]
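For an axis-parallel rectangle, two opposite corners are enough, which is how an MBR is usually stored. The sketch below illustrates that representation; the class name `MBR`, its methods, and the sample coordinates are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class MBR:
    """Axis-parallel minimum bounding rectangle stored as two corners."""
    low: Point   # minimum coordinate in every dimension
    high: Point  # maximum coordinate in every dimension

    @classmethod
    def of(cls, points: Sequence[Point]) -> "MBR":
        """Smallest axis-parallel box that encloses all given points."""
        if not points:
            raise ValueError("need at least one point")
        dims = range(len(points[0]))
        return cls(tuple(min(p[d] for p in points) for d in dims),
                   tuple(max(p[d] for p in points) for d in dims))

    def contains(self, p: Point) -> bool:
        return all(lo <= x <= hi
                   for lo, x, hi in zip(self.low, p, self.high))

    def intersects(self, other: "MBR") -> bool:
        # Boxes overlap only if their intervals overlap in every dimension.
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for a_lo, a_hi, b_lo, b_hi
                   in zip(self.low, self.high, other.low, other.high))


# Hypothetical 2-d points.
box = MBR.of([(6, 4), (8, 7), (7, 5)])
```

The same two-corner layout works unchanged for any number of dimensions, which is why R-Tree entries stay compact.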
R-Tree Index Approach
Demonstration
R-Tree Simulator
Nearest Neighbor (NN) Query Processing using R-Tree
Nearest Neighbor Query
Input
Given a set of objects P = {𝑝1, 𝑝2, … , 𝑝𝑁}
Query Point - q
Output: $\{\, p_i \mid p_i \in P \text{ and } \nexists\, p^* \in P \text{ s.t. } L_p(p_i, q) > L_p(p^*, q) \,\}$
See how it works in appendix
R-Tree Index Approach
[Figure: root node with entries 0 and 1 on x and y axes, annotated with MINDIST(X, 0), MINDIST(X, 1), MINMAXDIST(X, 0), and MINMAXDIST(X, 1)]
Key idea: pruning!
http://www.installitdirect.com/blog/easy-tips-for-pruning-your-plants/
http://ko.aliexpress.com/store/category/pruning-tools/519349_100005637.html
http://www.davey.com/
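The two metrics named on the pruning slide can be sketched as follows. This follows the commonly used definitions of MINDIST and MINMAXDIST for R-Tree nearest neighbor search, with squared Euclidean distance assumed; the function names and the sample boxes are hypothetical, and the slides themselves do not spell out the formulas.

```python
from typing import Sequence


def mindist_sq(q: Sequence[float], low: Sequence[float],
               high: Sequence[float]) -> float:
    """Squared distance from q to the closest point of the box [low, high]."""
    total = 0.0
    for qi, lo, hi in zip(q, low, high):
        nearest = min(max(qi, lo), hi)  # clamp qi into [lo, hi]
        total += (qi - nearest) ** 2
    return total


def minmaxdist_sq(q: Sequence[float], low: Sequence[float],
                  high: Sequence[float]) -> float:
    """Upper bound on the squared distance from q to the nearest object
    inside the box (every face of an MBR touches at least one object)."""
    # Closer face coordinate per dimension.
    near = [lo if qi <= (lo + hi) / 2 else hi
            for qi, lo, hi in zip(q, low, high)]
    # Farther face coordinate per dimension.
    far = [lo if qi >= (lo + hi) / 2 else hi
           for qi, lo, hi in zip(q, low, high)]
    far_sq = [(qi - f) ** 2 for qi, f in zip(q, far)]
    total_far = sum(far_sq)
    # Fix one dimension to its closer face, take the far corner elsewhere.
    return min(total_far - far_sq[k] + (q[k] - near[k]) ** 2
               for k in range(len(q)))


def can_prune(q, box, other) -> bool:
    """A box can be skipped if even its closest point is farther than the
    guaranteed upper bound of another box."""
    return mindist_sq(q, *box) > minmaxdist_sq(q, *other)


# Hypothetical query point and boxes, each given as (low, high).
query = (0, 0)
box_a = ((1, 1), (3, 2))
box_b = ((4, 4), (5, 5))
```

With these inputs, `box_b` can be discarded without being opened, because `box_a` is guaranteed to contain an object closer to the query than anything in `box_b`.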
Back to the original question
Skyline with R-Tree
R-Tree Index Approach
Let’s process skyline objects using an R-Tree
Strategy 1: use a traditional technique (i.e., NN query)
Strategy 2: the approach of this paper
Strategy 1
Partition the data space recursively using NN queries
Distance metric: L1 norm
First NN query: start from the ideal point (i.e., the zero point)
Strategy 1
Recursive NN Query
[Figure: example with points a, b, c, d, e, f, g, i, k, m, n on x and y axes and the IDEAL point. Dominating Area(i) is marked, leaving To-do Area 1 and To-do Area 2.]
example
[Figure: points a, b, i, k and the IDEAL point, with Dominating Area(i), To-do Area 1, and To-do Area 2 marked]
example
[Figure: points a, b, i, k and the IDEAL point, with Dominating Area(i), Dominating Area(k), To-do Area 1, and To-do Area 2 marked]
Next, test these areas (only to find nothing).
example
[Figure: points a, b, i, k and the IDEAL point, with Dominating Area(i), Dominating Area(k), Dominating Area(a), and To-do Area 1 marked]
Result
[Figure: skyline points i, k, a and the IDEAL point on x and y axes, with Dominating Area(i) and Dominating Area(a) marked]
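The recursive partitioning walked through above can be sketched for the 2-d case. This is an illustration under several assumptions: smaller values are better (the ideal point is the origin), coordinates are non-negative so that x + y is the L1 distance to the origin, and the NN query is a brute-force scan standing in for an R-Tree search. The names `nearest_in_region` and `nn_skyline_2d` and the sample coordinates are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def nearest_in_region(points: Sequence[Point], x_bound: float,
                      y_bound: float) -> Optional[Point]:
    """Brute-force stand-in for an R-Tree NN query from the origin,
    restricted to the region [0, x_bound) x [0, y_bound)."""
    candidates = [p for p in points if p[0] < x_bound and p[1] < y_bound]
    return min(candidates, key=lambda p: p[0] + p[1], default=None)


def nn_skyline_2d(points: Sequence[Point]) -> List[Point]:
    skyline: List[Point] = []
    todo = [(math.inf, math.inf)]  # to-do list of regions still to search
    while todo:
        x_bound, y_bound = todo.pop()
        p = nearest_in_region(points, x_bound, y_bound)
        if p is None:
            continue  # empty query: nothing in this region
        skyline.append(p)
        # The area dominated by p is discarded; two to-do areas remain.
        todo.append((p[0], y_bound))  # points with smaller x than p
        todo.append((x_bound, p[1]))  # points with smaller y than p
    return skyline


# Hypothetical 2-d sample data.
sample = [(1, 9), (3, 2), (9, 1), (5, 5), (4, 8)]
result = nn_skyline_2d(sample)
```

In two dimensions the overlap of the two to-do areas is always empty, so no skyline point is reported twice. That property does not carry over to higher dimensions.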
Limitation of Strategy 1
Generally speaking, in a d-dimensional space,
each skyline object discovered triggers d recursive partitioning phases.
[Figure: the dominated region of a skyline object]
[Figure: the space around a skyline object split into Area 1, Area 2, and Area 3, each next to the dominated region]
What if?
In general, for d > 2,
the partitions overlap,
which necessitates DUPLICATE ELIMINATION.
[Figure: overlapping partitions Area 1, Area 2, and Area 3 around the dominated region]
Disadvantage!
Strategy 1 needs an additional phase
for removing redundant outputs.
4 elimination methods
1. Laisser-faire
2. Propagate
3. Merge
4. Fine-grained Partitioning
They work.
Problem: sub-optimal
Strategy 2
Branch & Bound Skyline Algorithm
Idea!
Similar to the previous NN query
Branch & Bound Skyline (BBS)
http://greatleadersserve.org/leadership/big-idea-great-leaders-serve/
example
[Figure: points a to n on x and y axes with the IDEAL point, leaf MBRs L2E1 to L2E4, and intermediate MBRs L1E1 and L1E2]
R-Tree nodes
Node   Entry 1  Entry 2  Entry 3  Entry 4
Root   Ptr 1    Ptr 2    Ptr 3    Ptr 4
L1E1   Ptr 1    Ptr 2    Ptr 3    Ptr 4
L1E2   Ptr 1    Ptr 2    Ptr 3    Ptr 4
L2E1   a        b        c        null
L2E2   c        h        i        null
L2E3   d        g        m        null
L2E4   f        k        l        n
example
Root entries: L1E1, L1E2
Queue: (L1E2, 4), (L1E1, 10)
Result: (empty)
example
Expand (L1E2, 4)
Queue: (L2E2, 5), (L2E3, 7), (L2E4, 8), (L1E1, 10)
Result: (empty)
example
Queue: (L2E2, 5), (L2E3, 7), (L2E4, 8), (L1E1, 10)
Entries of L2E2: (c, 12), (h, 7), (i, 5)
Result: (empty)
example
Queue: (i, 5), (h, 7), (L2E3, 7), (L2E4, 8), (L1E1, 10), (c, 12)
Result: (empty)
example
Queue: (h, 7), (L2E3, 7), (L2E4, 8), (L1E1, 10), (c, 12)
Result: (i, 5)
example
Queue: (L2E4, 8), (L1E1, 10), (c, 12)
Entries of L2E4 shown: (k, 10), f, n
Result: (i, 5)
example
Queue: (empty)
Result: (i, 5), (a, 10), (k, 10)
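The walkthrough above can be condensed into a small in-memory sketch of the branch-and-bound idea. It is an illustration, not the authors' implementation: smaller values are assumed to be better, coordinates are assumed non-negative, the tree is built by hand instead of by a real R-Tree, and each node stores only its lower-left corner (the upper corner is not needed for pruning). The names `Node`, `bbs_skyline`, and all coordinates are hypothetical and do not correspond to points a to n.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Point = Tuple[float, ...]


@dataclass
class Node:
    """Tree node; `low` is the lower-left corner of its bounding box."""
    children: List[Union["Node", Point]]
    low: Point = field(init=False)

    def __post_init__(self) -> None:
        corners = [c.low if isinstance(c, Node) else c for c in self.children]
        self.low = tuple(min(v) for v in zip(*corners))


def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q (assumption: smaller is better)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def bbs_skyline(root: Node) -> List[Point]:
    skyline: List[Point] = []
    counter = 0  # tie-breaker so the heap never compares entries directly
    heap = [(sum(root.low), counter, root)]  # key: L1 mindist to the origin
    while heap:
        _, _, entry = heapq.heappop(heap)
        corner = entry.low if isinstance(entry, Node) else entry
        # Prune entries dominated by a skyline point found so far.
        if any(dominates(s, corner) for s in skyline):
            continue
        if isinstance(entry, Node):
            for child in entry.children:
                c = child.low if isinstance(child, Node) else child
                if not any(dominates(s, c) for s in skyline):
                    counter += 1
                    heapq.heappush(heap, (sum(c), counter, child))
        else:
            skyline.append(entry)  # reported progressively, in mindist order
    return skyline


# Hypothetical hand-built tree with three leaves.
leaf1 = Node([(1, 9), (2, 10), (4, 8)])
leaf2 = Node([(3, 2), (5, 5), (6, 3)])
leaf3 = Node([(9, 1), (8, 6), (10, 4)])
tree = Node([Node([leaf1]), Node([leaf2, leaf3])])
result = bbs_skyline(tree)
```

Because entries leave the heap in ascending order of mindist, a popped point that survives the dominance check can be output immediately, and no duplicate elimination phase is needed.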
Analysis
Strategy 1
Analysis of Strategy 1
Notation
Variable  Description
s         # of skyline objects
e         # of empty queries
ne        # of non-empty queries
r         # of redundant queries
d         dimensionality
h         height of the given R-Tree
Recursion Tree
[Figure: recursion tree in which each non-empty query spawns d new recursive NN queries]
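$\mathit{ne} = s + r$
$e = \mathit{ne} \cdot (d - 1) + 1$, since $\mathit{ne} + e = \mathit{ne} \cdot d + 1$ (the $+1$ is the root)
$e = (s + r)(d - 1) + 1$
$NA_{NN} \ge (e + s + r) \cdot h = [(s + r)(d - 1) + 1 + s + r] \cdot h > s \cdot h \cdot d$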
Analysis
Strategy 2
Analysis of Strategy 2 (brief version)
Notation
Variable  Description
s         # of skyline objects
h         height of the given R-Tree
$s \cdot h \ge NA_{BBS}$
$NA_{NN} > s \cdot h \cdot d > NA_{BBS}$
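To get a feel for the two bounds, they can be evaluated for concrete values. The sketch below simply plugs numbers into the formulas from the analysis slides; the function names and the sample values for s, r, d, and h are hypothetical.

```python
def nn_node_access_lower_bound(s: int, r: int, d: int, h: int) -> int:
    """Lower bound on node accesses of Strategy 1: (e + s + r) * h."""
    ne = s + r               # non-empty queries
    e = ne * (d - 1) + 1     # empty queries
    return (e + ne) * h


def bbs_node_access_upper_bound(s: int, h: int) -> int:
    """Upper bound on node accesses of BBS: s * h."""
    return s * h


# Hypothetical setting: 10 skyline objects, no redundant queries,
# 3 dimensions, R-Tree of height 4.
nn_bound = nn_node_access_lower_bound(s=10, r=0, d=3, h=4)
bbs_bound = bbs_node_access_upper_bound(s=10, h=4)
```

Even with no redundant queries, the Strategy 1 bound grows with the dimensionality d, while the BBS bound does not depend on d.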
Is it the optimal solution?
BBS Algorithm
Proof 1. Termination & Correctness
Lemma 1. BBS visits entries in ascending order
of their distance to the ‘ideal point’.
Lemma 2. Any data point added to Result_Set
is guaranteed to be a final skyline point.
Proof.
Suppose not: some $p_j$ was added to Result_Set but is not a final skyline point.
Then $\exists\, p^* \in DB$ s.t. $p^* \gg p_j$, which means $L_1(\text{ideal}, p^*) < L_1(\text{ideal}, p_j)$.
By Lemma 1, $p^*$ must have been visited before $p_j$, so $p^*$ is either in Result_Set or dominated by a point in it, which then also dominates $p_j$.
Hence $p_j$ should have been pruned, which contradicts the assumption.
Lemma 3. Every data point is examined unless one of its ancestor
nodes has been pruned.
Lemmas for the theorem
Lemma 4. Any skyline algorithm based on an R-Tree must access all the
nodes whose MBRs intersect the skyline search region (SSR).
Lemma 5. If an entry e does not intersect the SSR,
then $\exists\, p^*$ s.t. $L_1(\text{ideal}, p^*) < L_1(\text{ideal}, e.\text{leftdown})$.
Theorem: The number of node accesses
performed by BBS is OPTIMAL.
[Figure: points A to G on x and y axes, with Dominating Area(B) and the SSR marked]
Proof of the theorem
Proof 1. BBS only accesses nodes that
may contain skyline points.
That is, BBS only accesses nodes
whose MBRs intersect the SSR.
Suppose not: BBS accesses a node e that does not intersect the SSR.
Then $\exists\, p^*$ as in Lemma 5.
By Lemma 1, $p^*$ is visited before e, so e would have been pruned: a contradiction.
Proof 2. BBS visits each node at most
once. (trivial)
To quantify the actual cost
Skip the details
Experimental Evaluation
Dimensionality
Cardinality (3d dataset)
Progressive behavior (N=1M, d=3)
Constrained skyline queries (N=1M, d=3)
[Figure: example points a to n on x and y axes with the IDEAL point and MBRs L1E1, L1E2, and L2E1 to L2E4]
Constrain