1 University of Minnesota
Exploiting Page-Level Upper Bound (PLUB) for Multi-Type Nearest Neighbor
(MTNN) Queries
Xiaobin Ma
Advisor: Shashi Shekhar
Dec, 2005
Outline
Motivation
Problem statement
Related work and our contributions
Proposed algorithm and cost model
Experiment design and results
Conclusion and future work
Motivation
GIS applications
Find the shortest path through one point from each of several different feature types
A Running Example
Three feature types:
red (r), green (g), black (b)
q is query point
Route with solid red line is shortest route
Routes with dashed lines are other possible routes
Basic Concepts
<P1, P2, …, Pk>: an ordered point sequence, where P1, P2, …, Pk come from k different (feature) types of data sets
R(q, P1, P2, …, Pk): a route from q through points P1, P2, …, Pk
d(R(q, P1, P2, …, Pk)): the distance of route R(q, P1, P2, …, Pk)
Multi-Type Nearest Neighbor (MTNN): the ordered point sequence <P1', P2', …, Pk'> such that d(R(q, P1', P2', …, Pk')) is the minimum among all possible routes
MTNN distance: d(R(q, P1', P2', …, Pk'))
MTNN query: a query that finds the MTNN
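The definitions above can be made concrete with a brute-force search. The following is a minimal sketch, not part of the original slides: it assumes Euclidean distance, an open route that does not return to q, and that every visiting order of the feature types is allowed. The function names and the toy coordinates are hypothetical.

```python
import math
from itertools import permutations, product


def route_length(q, points):
    """Length of the open route q -> points[0] -> ... -> points[-1]."""
    total, prev = 0.0, q
    for p in points:
        total += math.dist(prev, p)
        prev = p
    return total


def brute_force_mtnn(q, datasets):
    """Try every order of feature types and every choice of one point per type.

    Returns (MTNN distance, sequence of (feature type, point) pairs).
    """
    best_dist, best_seq = math.inf, None
    for order in permutations(datasets):
        for combo in product(*(datasets[t] for t in order)):
            d = route_length(q, combo)
            if d < best_dist:
                best_dist, best_seq = d, tuple(zip(order, combo))
    return best_dist, best_seq


# Hypothetical toy data: three feature types with two points each.
q = (0.0, 0.0)
datasets = {
    "r": [(1.0, 0.0), (5.0, 5.0)],
    "g": [(2.0, 0.0), (0.0, 4.0)],
    "b": [(3.0, 0.0), (-6.0, 0.0)],
}
dist, seq = brute_force_mtnn(q, datasets)
print(dist, seq)
```

The search examines k! x N1 x N2 x … x Nk routes, which is why pruning is needed for anything beyond toy inputs.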
Problem Statement for MTNN Query
Given:
A query point
A distance metric
k different (feature) types of spatial objects, with N1, N2, N3, …, Nk data points respectively
An R-tree for each data set
Find: the multi-type nearest neighbor (MTNN)
Objective: minimize the length of the route from the query point covering an instance of each feature type
Constraints:
Correctness: the tour should be the shortest path for the query point and the given collection of spatial query feature types
Completeness: only the shortest path is returned as the query result
Related Work
Optimal sequenced route (OSR) query [Kolahdouzan et al., Tech. Report 05-840, USC]
Optimal algorithms (RLORD)
Focus on optimal algorithms for a specified permutation of feature types
Point-based algorithms
Trip planning query (TPQ) [Li et al., SSTD 05]
Heuristic algorithms
Give approximate results
RLORD Example
q is the query point
Search order is <r, b, g>
R(q,r2,b2,g2) is the greedy route
Radius of the circle is d(R(q,r2,b2,g2))
[Figure: query point q, the red (r), black (b), and green (g) data points, and the circle of radius d(R(q,r2,b2,g2))]
RLORD Running Iterations
Use the backward search strategy: O = <g, b, r>
First iteration: examine feature type g
Put <g2>, <g3>, <g4>, <g5>, <g7>, <g9>, <g10>, <g12>, <g13>, <g14>, <g15>, <g16> into a set R
Second iteration: examine the next feature type in O
For every point bi in the black set, iterate over every partial route <gj> in R:
IF d(R(q, bi)) + d(R(bi,gj)) < d(R(q,r2,b2,g2)) THEN put <bi,gj> into a set R1
For each bi, keep only the ordered sequence <bi,gj> in R1 for which d(R(bi,gj)) + d(R(gj)) is minimum
This leaves <b1,g13>, <b2,g2>, <b3,g3>, <b4,g3>, <b6,g14>, <b7,g14>, <b11,g3>, <b12,g13>, <b13,g14>, <b14,g3>, <b15,g13> in a set R2
R <- R2
Examine the next feature type and repeat the above procedure until all types of data are examined
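The iterations above can be sketched in a few lines. This is a simplified illustration of the backward search as the slide describes it, not the RLORD implementation from the cited report: it assumes Euclidean distance and plain in-memory point lists, it uses a greedy route as the upper bound, and it prunes with `<=` plus a small tolerance (rather than the strict `<` on the slide) so that a route tying the greedy bound is not lost. All names and coordinates are hypothetical.

```python
import math

EPS = 1e-9  # tolerance so a route tying the bound survives rounding


def greedy_route_length(q, ordered_sets):
    """Greedy upper bound: repeatedly hop to the nearest point of the next type."""
    total, cur = 0.0, q
    for pts in ordered_sets:
        nxt = min(pts, key=lambda p: math.dist(cur, p))
        total += math.dist(cur, nxt)
        cur = nxt
    return total


def backward_search(q, ordered_sets, upper_bound):
    """Backward pass for one fixed order of feature types.

    `partial` maps the first point of a partial route to (tail length, route).
    """
    # Last type: every point within the bound of q is a partial route of length 0.
    partial = {p: (0.0, (p,)) for p in ordered_sets[-1]
               if math.dist(q, p) <= upper_bound + EPS}
    for pts in reversed(ordered_sets[:-1]):
        extended = {}
        for p in pts:
            best = None
            for s, (tail_len, tail) in partial.items():
                cand = math.dist(p, s) + tail_len
                # Prune if even the straight hop q -> p makes the route too long.
                if math.dist(q, p) + cand > upper_bound + EPS:
                    continue
                if best is None or cand < best[0]:
                    best = (cand, (p,) + tail)  # keep only the best tail per point
            if best is not None:
                extended[p] = best
        partial = extended
    # Attach the query point to the surviving routes and take the shortest.
    return min(((math.dist(q, p) + length, route)
                for p, (length, route) in partial.items()),
               default=(math.inf, ()))


# Hypothetical data in search order <r, b, g>; the greedy route is not optimal here.
q = (0.0, 0.0)
red = [(1.0, 0.0), (-1.5, 0.0)]
black = [(-3.0, 0.0), (1.0, 5.0)]
green = [(-4.0, 0.0)]
ub = greedy_route_length(q, [red, black, green])
best_len, best_route = backward_search(q, [red, black, green], ub)
print(ub, best_len, best_route)
```

In this toy input the greedy route starts at the nearest red point and ends up longer than the route found by the backward pass, which is the situation the upper bound is meant to handle.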
Our Contributions
Formalized a new nearest neighbor search problem – Multi-Type Nearest Neighbor (MTNN) query problem
Proposed a new algorithm: the Page Level Upper Bound (PLUB) based algorithm
Evaluated the proposed algorithm via a cost model and experiments
Key Ideas of PLUB
Prune the search space at page level
Create candidate leaf page sequences
Search for the candidate MTNN in these candidate leaf page sequences
Page Level Upper Bound (PLUB) Algorithm
Step 1: First upper bound search
Use the basic R-tree based nearest neighbor search algorithm with a greedy strategy to find an initial upper bound, which becomes the current upper bound
Step 2: R-tree search
Prune the search space with the current upper bound and form a set of leaf node candidate sequences, using the page-level pruning approach
Step 3: Subset search
Search for the candidate MTNN in the leaf node candidate sequences
Go to step 2, using the candidate MTNN distance as the current upper bound, until all permutations of feature types have been examined
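Step 2 can be illustrated with a short sketch. The slides do not state how the page-level bound of a leaf page sequence is computed, so the code below makes an assumption: it sums the minimum rectangle-to-rectangle distances along the sequence, a value that no route through those pages can undercut, and discards any sequence (or prefix) whose sum exceeds the current search bound. Rectangle coordinates, page names, and function names are hypothetical.

```python
import math


def min_dist(a, b):
    """Minimum Euclidean distance between two axis-aligned rectangles.

    Rectangles are (xmin, ymin, xmax, ymax); a point is a degenerate rectangle.
    """
    dx = max(0.0, a[0] - b[2], b[0] - a[2])
    dy = max(0.0, a[1] - b[3], b[1] - a[3])
    return math.hypot(dx, dy)


def candidate_page_sequences(q, pages_per_type, bound):
    """Return (page names, accumulated distance) for sequences within `bound`.

    pages_per_type lists, in visiting order, a dict of page name -> rectangle.
    """
    q_rect = (q[0], q[1], q[0], q[1])
    survivors = []

    def extend(level, prev_rect, acc, names):
        if level == len(pages_per_type):
            survivors.append((names, acc))
            return
        for name, rect in pages_per_type[level].items():
            d = acc + min_dist(prev_rect, rect)
            if d <= bound:  # otherwise the whole prefix is eliminated
                extend(level + 1, rect, d, names + (name,))

    extend(0, q_rect, 0.0, ())
    return survivors


# Hypothetical leaf pages for one permutation <r, b, g>.
q = (0.0, 0.0)
pages = [
    {"R1": (1, 0, 2, 1), "R2": (8, 8, 9, 9)},
    {"B1": (2, 1, 3, 2), "B3": (0, 6, 1, 7)},
    {"G1": (3, 2, 4, 3), "G4": (-5, -5, -4, -4)},
]
survivors = candidate_page_sequences(q, pages, bound=3.37)
print(survivors)
```

Eliminating a prefix as soon as it exceeds the bound avoids enumerating its extensions, which keeps the number of rectangle-to-rectangle distance calculations small.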
PLUB – An Example
Inputs
q: query point
Distance metric: Euclidean distance
R-tree for each feature type
R(q,r2,b2,g2) is the greedy route
Radius of the circle is d(R(q,r2,b2,g2)) = 3.37
Rectangles are leaf pages in the R-trees
[Figure: query point q, the red, black, and green data points, leaf pages R1–R4, B1–B4, and G1–G4, and the circle of radius 3.37]
PLUB – An Example
Leaf page upper bound calculation (current search bound 3.37)
Leaf page sequence    UB      Eliminated (E?)
R1 B1 G1              2.04    N
R1 B1 G3              6.2     Y
R1 B1 G4              4.27    Y
R1 B3 G1              7.53    Y
R1 B3 G3              6.54    Y
R1 B3 G4              4.29    Y
R1 B4 G1              4.02    Y
R2 B1                 3.7     Y
R2 B3 G4              3.43    Y
R2 B4                 5.17    Y
R4 B1                 4.08    Y
R4 B3                 7.94    Y
R4 B4                 7.56    Y
Only leaf node sequence <R1,B1,G1> is left
PLUB – An Example
Search for the candidate MTNN in <R1,B1,G1> (time unit: P-P distance calculations)
1st iteration: <g2>, <g10>, <g12>, <g13>
Time 4
2nd iteration: <b12,g13>, <b1,g13>, <b2,g2>, <b15,g13>
Time 4x4+4=20
3rd iteration: <r10,b15,g13>, <r9,b15,g13>, <r2,b2,g2>, <r11,b1,g13>
Time 4x4+4=20
Output: shortest route R(q,r11,b1,g13) with distance 3.16
Running Results of RLORD
First iteration (time unit: P-P distance calculations)
<g2>, <g3>, <g4>, <g5>, <g7>, <g9>, <g10>, <g12>, <g13>, <g14>, <g15>, <g16>
Time 11
Second iteration
<b1,g13>, <b2,g2>, <b3,g3>, <b4,g3>, <b6,g14>, <b7,g14>, <b11,g3>, <b12,g13>, <b13,g14>, <b14,g3>, <b15,g13>
Time 11x12+12=144
Third iteration
<r1,b11,g3>, <r2,b2,g2>, <r3,b11,g3>, <r8,b1,g13>, <r9,b15,g13>, <r10,b15,g13>, <r11,b1,g13>, <r12,b11,g3>, <r13,b1,g13>, <r14,b1,g13>, <r15,b1,g13>
Time 12x11+11=143
R(q,r11,b1,g13) is the shortest among all routes
Shortest distance value: 3.16
Running Time Comparison Table
R-R: rectangle-to-rectangle distance calculations
P-P: point-to-point distance calculations
Algorithm    R-R    P-P
PLUB         17     44
RLORD        0      298
RLORD has no R-R distance calculations, but many more P-P calculations
Cost of R-R < 2 x cost of P-P
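The totals in the table follow from the per-iteration counts on the two example slides. As a quick check, the sketch below tallies them and converts PLUB's R-R calculations into P-P units using the stated bound (one R-R costs less than two P-P); the variable names are illustrative only.

```python
# P-P distance calculations per iteration, copied from the example slides.
plub_pp_per_iteration = [4, 4 * 4 + 4, 4 * 4 + 4]
rlord_pp_per_iteration = [11, 11 * 12 + 12, 12 * 11 + 11]
plub_rr = 17  # R-R distance calculations used by PLUB for page-level pruning

plub_pp = sum(plub_pp_per_iteration)
rlord_pp = sum(rlord_pp_per_iteration)

# Upper bound on PLUB's total cost in P-P units, since cost(R-R) < 2 x cost(P-P).
plub_total_bound = 2 * plub_rr + plub_pp

print(plub_pp, rlord_pp, plub_total_bound)
```

Even with every R-R calculation charged at the full two P-P units, PLUB's total stays well below RLORD's P-P count in this example.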
Cost Model for PLUB (For One Permutation)
Total cost: $C_{R\text{-}T} + C_{LF} + C_{PN}$
$C_{R\text{-}T}$: cost of R-tree traversal to find all R-tree leaf nodes intersected by the circle centered at query point q whose radius is the current upper bound
$C_{LF}$: cost of the page-level leaf node search for R-tree candidate leaf node sequences
$C_{PN}$: cost of the point-level search for the candidate MTNN in the candidate leaf node sequences
$C_{R\text{-}T}$ Model of PLUB
$C_{R\text{-}T}$: R-tree traversal cost
$C_{PR}$: cost of a point-to-rectangle distance calculation
$N_{t,i}$: number of tree nodes visited in the tree traversal for feature type i
$C_{R\text{-}T} = C_{PR} \times \sum_{i=1}^{k} N_{t,i}$
$C_{LF}$ Model of PLUB
$C_{LF}$: cost of the search for R-tree candidate leaf node sequences
$N_{R\text{-}R}$: number of leaf nodes visited in the candidate leaf node sequence search
$C_{R\text{-}R}$: cost of a rectangle-to-rectangle distance calculation
$C_{LF} = N_{R\text{-}R} \times C_{R\text{-}R}$
$C_{PN}$ Model of PLUB
$C_{PN}$: cost of the MTNN search in the candidate leaf node sequences
$F_{LS}$: leaf node candidate sequence filtering ability ratio
$n_l$: average number of points in a leaf node over all feature types
$p_i$: number of pages of feature type i
$C_{P\text{-}P}$: cost of a point-to-point distance calculation
$C_{ls}$: cost of the MTNN search in a single leaf node sequence
$C_{ls} = C_{P\text{-}P} \times \big[\underbrace{(n_l + n_l \times n_l) + (n_l + n_l \times n_l) + \dots + (n_l + n_l \times n_l)}_{k-1 \text{ items}}\big] = (k-1) \times n_l \times (n_l + 1) \times C_{P\text{-}P}$
$C_{PN} = C_{ls} \times \prod_{i=1}^{k} p_i \times (1 - F_{LS})$
Cost Model for RLORD (For One Permutation)
Total cost: $C'_{R\text{-}T} + C_{PS}$
$C'_{R\text{-}T}$: cost of R-tree based coarse pruning, i.e., finding all data points inside the initial upper bound
$C'_{R\text{-}T} = C_{R\text{-}T} + C_{P\text{-}P} \times n_l \times (p_1 + p_2 + p_3 + \dots + p_{k-1} + p_k)$
$C_{PS}$: cost of the candidate MTNN search in the remaining subsets
$C_{P\text{-}P}$: cost of a point-to-point distance calculation
$C_{PS} = C_{P\text{-}P} \times n_l \times \big[(p_1 + n_l \times p_1 \times p_2) + (p_2 + n_l \times p_2 \times p_3) + \dots + (p_{k-1} + n_l \times p_{k-1} \times p_k)\big]$
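Cost Model Summary of PLUB and RLORD (One Permutation)
In random or approximately random datasets, $F_{LS}$ is not large enough, so PLUB takes more time than RLORD.
In clustered datasets, $F_{LS}$ tends to be very large. PLUB runs faster than RLORD when
$1 - F_{LS} < \dfrac{n_l \times \big[(p_1 + n_l \times p_1 \times p_2) + (p_2 + n_l \times p_2 \times p_3) + \dots + (p_{k-1} + n_l \times p_{k-1} \times p_k)\big]}{(k-1) \times n_l \times (n_l + 1) \times \prod_{i=1}^{k} p_i}$
For clustered datasets, this condition becomes true as the clusters become more compact.
Left side: remaining ratio (r-ratio)
Right side: comparison ratio (c-ratio)
PLUB, general form: $C_{R\text{-}T} + C_{LF} + C_{PN}$
PLUB, approximate form: $C_{P\text{-}P} \times (k-1) \times n_l \times (n_l + 1) \times \prod_{i=1}^{k} p_i \times (1 - F_{LS})$
RLORD, general form: $C'_{R\text{-}T} + C_{PS}$
RLORD, approximate form: $C_{P\text{-}P} \times n_l \times \big[(p_1 + n_l \times p_1 \times p_2) + (p_2 + n_l \times p_2 \times p_3) + \dots + (p_{k-1} + n_l \times p_{k-1} \times p_k)\big]$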
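To see how the remaining ratio and the comparison ratio interact, the approximate forms can be evaluated numerically. The sketch below is only an illustration of those formulas: it ignores the R-tree traversal and page-level terms, measures cost in units of one P-P calculation, and uses hypothetical page counts and leaf sizes. The function names are not from the slides.

```python
from math import prod


def c_ls(k, n_l, c_pp=1.0):
    """Cost of the MTNN search in one leaf node sequence: (k-1) * n_l * (n_l+1) * C_PP."""
    return (k - 1) * n_l * (n_l + 1) * c_pp


def c_pn(pages, n_l, f_ls, c_pp=1.0):
    """Approximate PLUB cost: C_ls * prod(p_i) * (1 - F_LS)."""
    return c_ls(len(pages), n_l, c_pp) * prod(pages) * (1.0 - f_ls)


def c_ps(pages, n_l, c_pp=1.0):
    """Approximate RLORD cost: C_PP * n_l * sum(p_i + n_l * p_i * p_(i+1))."""
    return c_pp * n_l * sum(a + n_l * a * b for a, b in zip(pages, pages[1:]))


def comparison_ratio(pages, n_l):
    """Right-hand side of the condition under which PLUB is expected to win."""
    return c_ps(pages, n_l) / (c_ls(len(pages), n_l) * prod(pages))


def plub_expected_faster(pages, n_l, f_ls):
    """True when the remaining ratio 1 - F_LS is below the comparison ratio."""
    return (1.0 - f_ls) < comparison_ratio(pages, n_l)


# Hypothetical setting: k = 3 feature types, 4 leaf pages each, 4 points per page.
pages, n_l = [4, 4, 4], 4
ratio = comparison_ratio(pages, n_l)
print(ratio, plub_expected_faster(pages, n_l, 0.9), plub_expected_faster(pages, n_l, 0.5))
```

With these made-up numbers, a filtering ratio of 0.9 puts the remaining ratio below the comparison ratio while 0.5 does not, matching the summary's point that PLUB needs strong page-level filtering to pay off.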
Experiment Design
Synthetic Data Sets Generation
Randomly generate cluster centers in the rectangle with bottom-left point (0,0) and top-right point (10000,10000)
Constraint: the minimum distance between two cluster centers is minCCDist
Around every cluster center, generate cluster member points
The maximum distance from a member point to its cluster center is ClusterSize
The simplified maximum cluster center distance is determined by:
maxCCDist = 10000.0/(int)(sqrt(CN)+1)
Thus the minimum cluster center distance used when generating cluster centers is:
minCCDist = BCF x maxCCDist
Then the cluster size is:
ClusterSize = ICF x minCCDist
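The generation procedure can be sketched as follows. The three formulas are taken from the slide; everything else is an assumption made for illustration: centers are placed by rejection sampling, member points are drawn uniformly from a disc of radius ClusterSize (the slides do not state the distribution), points are not clipped to the rectangle, and the function names, seed, and retry limit are hypothetical.

```python
import math
import random

EXTENT = 10000.0  # side length of the square data space


def cluster_parameters(cn, bcf, icf):
    """Return (maxCCDist, minCCDist, ClusterSize) from the slide's formulas."""
    max_cc_dist = EXTENT / int(math.sqrt(cn) + 1)
    min_cc_dist = bcf * max_cc_dist
    cluster_size = icf * min_cc_dist
    return max_cc_dist, min_cc_dist, cluster_size


def generate_clusters(cn, bcf, icf, points_per_cluster, seed=0, max_tries=100000):
    """Generate cn cluster centers and their member points."""
    rng = random.Random(seed)
    _, min_cc_dist, cluster_size = cluster_parameters(cn, bcf, icf)
    centers, tries = [], 0
    while len(centers) < cn:
        tries += 1
        if tries > max_tries:  # dense settings may not be reachable by rejection
            raise RuntimeError("could not place all cluster centers")
        c = (rng.uniform(0, EXTENT), rng.uniform(0, EXTENT))
        if all(math.dist(c, other) >= min_cc_dist for other in centers):
            centers.append(c)
    clusters = []
    for cx, cy in centers:
        members = []
        for _ in range(points_per_cluster):
            r = cluster_size * math.sqrt(rng.random())  # uniform over the disc
            theta = rng.uniform(0, 2 * math.pi)
            members.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
        clusters.append(members)
    return centers, clusters


params = cluster_parameters(cn=20, bcf=0.5, icf=0.5)
centers, clusters = generate_clusters(cn=20, bcf=0.5, icf=0.5, points_per_cluster=10)
print(params, len(centers), sum(len(m) for m in clusters))
```

Larger BCF values spread the centers further apart, and smaller ICF values tighten each cluster around its center.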
Experiment Parameters
Feature types: 2-7
Between-cluster Compactness Factor (BCF): 0.1-1.0
In-cluster Compactness Factor (ICF): 0.1-0.5
Cluster Number (CN): 20, 50, 100, 200
Synthetic Datasets Example
BCF=0.5,ICF=0.5,CN=20,Feature Type=2
BCF=0.5,ICF=0.3,CN=20,Feature Type=2
Experiment Setup & Data Sets
Setup
C / Pentium-IV 3.2GHz / Linux / 1GB Memory / Synthetic data
Synthetic data
Scalability test in terms of the number of feature types
Effect of data set density
Effect of between-cluster compactness factor
Effect of in-cluster compactness factor
Scalability Test
Parameters
Fixed: BCF=0.1, ICF=0.1, CN=20
Variable: feature types (2-7)
Trend
PLUB is much faster than RLORD when the number of feature types is high
Effect of Data Sets Density
Parameters
Fixed: FT=7, BCF=0.1, ICF=0.5
Variable: cluster number (20, 50, 100, 200)
Trend
PLUB is always faster than RLORD for all densities of data sets
Effect of Between-cluster Compactness Factor
Parameters
Fixed: FT=7, ICF=0.3, CN=50
Variable: BCF (0.1-1.0)
Effect of Between-cluster Compactness Factor
Top: execution time vs. BCF
Trend
PLUB is faster than RLORD when BCF is less than 0.7
PLUB is slower than RLORD when BCF is greater than 0.7
Effect of Between-cluster Compactness Factor
Bottom: remaining ratio (r-ratio) and comparison ratio (c-ratio) vs. BCF
Trend
Both ratios increase as BCF increases
The remaining ratio is less than the comparison ratio when BCF is less than 0.8
Effect of Between-cluster Compactness Factor
Contradiction?
As BCF increases, the remaining ratio increases (that is, the pruning ratio decreases), yet the execution time decreases
Explanation: when BCF increases, fewer leaf nodes intersect the current search bound, so the total number of possible candidate leaf node sequences decreases dramatically
Effect of Between-cluster Compactness Factor
Key information
When the remaining ratio is less than the comparison ratio, PLUB runs faster than RLORD
When the remaining ratio is greater than the comparison ratio, PLUB takes more time than RLORD
Effect of In-cluster Compactness Factor
Parameters
Fixed: FT=7, BCF=0.1, CN=50
Variable: ICF (0.1-0.5)
Trend
PLUB is always faster than RLORD for ICF from 0.1 to 0.5
Conclusion and Future Work
Conclusion
Formalized the MTNN query problem
Proposed a PLUB based algorithm for the MTNN query
Compared PLUB and RLORD
Future work
Design heuristic algorithms to tackle the MTNN query problem for a large number of feature types
References
[1] M. Kolahdouzan, M. Sharifzadeh and C. Shahabi. The Optimal Sequenced Route Query. USC, CS Dept., Tech. Report 05-840, 2005.
[2] F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios and S.-H. Teng. On Trip Planning Queries in Spatial Databases. SSTD 2005.