
Searching Trajectories by Locations – An Efficiency Study Zaiben Chen 1 , Heng Tao Shen 1 , Xiaofang Zhou 1 , Yu Zheng 2 , Xing Xie 2 1 The University of Queensland 2 Microsoft Research, Asia



Searching Trajectories by Locations – An Efficiency Study

Zaiben Chen1, Heng Tao Shen1, Xiaofang Zhou1, Yu Zheng2, Xing Xie2

1 The University of Queensland, 2 Microsoft Research, Asia


Outline

Research problem & application scenarios

Basic ideas: k Best-Connected Trajectory (k-BCT) query; the Incremental k-NN Algorithm (IKNN)

Performance study: best-first; depth-first

Optimization & extension

Experiments

Conclusion


Research Problem: Searching Trajectory Databases

GPS trajectories collected by the GeoLife project, MSRA

How to retrieve the trajectories we want?


Searching Trajectory Databases

Search by a location: Frentzos et al., GeoInformatica07; Pfoser et al., VLDB00 (R-tree variants)

Search by a sample trajectory: Chen et al., SIGMOD05; Vlachos et al., ICDE02; Yi et al., ICDE98, etc. (similarity measures)


Searching Trajectory Databases

The problem we study: searching by multiple locations

Goal: find trajectories that are ‘close’ to all the query locations. Technically, this extends the single-location query but is more complicated; practically, it provides a more general way to search trajectories.

Two extreme cases: one location and many locations.


Application motivations

The Microsoft GeoLife Project: http://research.microsoft.com/en-us/projects/geolife/

GeoLife is a location-based service built on Microsoft Virtual Earth.

Our work benefits the following two functions:

(1) Travel recommendation

E.g., helping a visitor plan a trip to multiple attractions by considering other users' travel trajectories.

(2) Sharing life experiences & friend recommendation

E.g., finding users who share a similar daily route through Queens Plaza, Central Station and Mains St.


Application motivations

Geo-Coding: From Pictures to Coordinates

[Figure: the recommended route]

The first step: define the closeness (i.e., distance) between a trajectory and a set of locations.


Similarity Function

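The similarity function reflects how close a trajectory is to the given locations; we call the most similar trajectory the best-connected trajectory.

Q = {q1, q2, …, qm}, R = {p1, p2, …, pn}

Step 1. For each query location qi, find the closest trajectory point on R.

Step 2. Sum up the contributions of all matched pairs (unordered query):

$$\mathrm{Sim}(Q, R) = \sum_{i=1}^{m} e^{-\mathrm{Dist}_q(q_i, R)}, \qquad \mathrm{Dist}_q(q_i, R) = \min_{p_j \in R} |q_i, p_j|$$

Distq(qi, R) is the shortest distance from qi to R, so a trajectory passing through every query location reaches the maximum similarity m.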
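A minimal Python sketch of the similarity function, assuming 2-D points given as (x, y) tuples and Euclidean distance for |q, p| (the metric is an assumption here). It scans every point of R, so it only serves as a reference implementation:

```python
import math

Point = tuple[float, float]

def dist_q(q: Point, trajectory: list[Point]) -> float:
    """Dist_q(q, R): shortest (assumed Euclidean) distance from q to any point of R."""
    return min(math.dist(q, p) for p in trajectory)

def sim(query: list[Point], trajectory: list[Point]) -> float:
    """Sim(Q, R): sum over q_i of exp(-Dist_q(q_i, R)) (unordered query)."""
    return sum(math.exp(-dist_q(q, trajectory)) for q in query)

Q = [(0.0, 0.0), (3.0, 0.0)]
R = [(0.0, 1.0), (2.0, 0.0), (3.0, 0.0)]
print(round(sim(Q, R), 4))  # 1.3679  (= e^-1 + e^0)
```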


Problem Definition

k-Best Connected Trajectory (k-BCT) query

Given a set of trajectories T = {R1, R2, …, Rn}, a set of query locations Q = {q1, q2, …, qm} and the similarity function Sim(Q, R), the k-BCT query finds the k trajectories in T with the highest similarity to Q.

Assumption:

The number of query locations is small (m is a small constant).

Intuition:

The k-BCT result is the JOIN of m single-location queries.
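For comparison, here is a hypothetical brute-force baseline (not the IKNN algorithm): score every trajectory with Sim and keep the k highest. It assumes Euclidean distance and trajectories stored in a dict keyed by an id; `k_bct_bruteforce` is an illustrative name. Its cost grows with the size of the whole database, which is what an index-based method tries to avoid:

```python
import heapq
import math

Point = tuple[float, float]

def sim(query: list[Point], trajectory: list[Point]) -> float:
    # Sim(Q, R) = sum_i exp(-min_j |q_i, p_j|), Euclidean distance assumed
    return sum(math.exp(-min(math.dist(q, p) for p in trajectory)) for q in query)

def k_bct_bruteforce(trajectories: dict[str, list[Point]], query: list[Point], k: int):
    """Baseline k-BCT: score every trajectory and keep the k highest similarities."""
    scored = ((sim(query, r), tid) for tid, r in trajectories.items())
    return heapq.nlargest(k, scored)

T = {
    "R1": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "R2": [(0.0, 5.0), (1.0, 5.0)],
    "R3": [(0.0, 0.5), (2.0, 0.5)],
}
Q = [(0.0, 0.0), (2.0, 0.0)]
print([tid for _, tid in k_bct_bruteforce(T, Q, k=2)])  # ['R1', 'R3']
```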


Basic ideas

Incremental k-NN Algorithm (IKNN)

Step 1. Index all trajectory points with a single R-tree, so that the shortest distance from a query location to each trajectory can be obtained by nearest-neighbor search.

Step 2. Search for the λ-nearest neighbors (λ-NN) of each query location (q1 to qm), using any traditional k-nearest-neighbor algorithm over the R-tree.

For any trajectory scanned by a λ-NN search, its shortest distance to that query location is known: the search returns every point within its radius, so the nearest scanned point of the trajectory is its nearest point overall.

Candidate set C = {all scanned trajectories}
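Steps 1 and 2 can be sketched as follows. Instead of an R-tree, this hypothetical `lambda_nn_scan` simply sorts every trajectory point by distance to the query location (an assumption made only to keep the example self-contained). For each trajectory it records the distance of the first point it meets in ascending order, which is that trajectory's shortest distance. It also returns the search radius of the λ-NN search:

```python
import math

Point = tuple[float, float]

def lambda_nn_scan(trajectories: dict[str, list[Point]], q: Point, lam: int):
    """lam-NN search for one query location over all trajectory points.

    Returns (known, radius): known maps each scanned trajectory id to its
    shortest distance to q; radius is the distance of the lam-th neighbor.
    A full sort stands in for the R-tree k-NN search (an assumption).
    """
    nearest = sorted(
        (math.dist(q, p), tid) for tid, r in trajectories.items() for p in r
    )[:lam]
    known: dict[str, float] = {}
    for d, tid in nearest:        # ascending distance
        known.setdefault(tid, d)  # first hit = closest point of that trajectory
    return known, nearest[-1][0]

T = {"R1": [(0.0, 0.0), (1.0, 0.0)], "R2": [(0.0, 3.0)], "R3": [(5.0, 5.0)]}
Q = [(0.0, 0.0), (1.0, 0.0)]
scans = [lambda_nn_scan(T, q, lam=3) for q in Q]
candidates = set().union(*(known for known, _ in scans))  # candidate set C
print(sorted(candidates))  # ['R1', 'R2']
```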


IKNN algorithm

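Step 3. Construct lower bounds of similarity.

For a trajectory R1 in C, assume three of its points, p1, p2 and p3, have been scanned by the λ-NN searches of q1 and q2, with p1 its closest point to q1 and p2 its closest point to q2. The search of q3 has not reached R1, so its closest point to q3 (p5) is still unknown. Since every term of Sim is positive, dropping the unknown term gives a lower bound:

$$\mathrm{Sim}(Q, R_1) = e^{-|q_1, p_1|} + e^{-|q_2, p_2|} + e^{-|q_3, p_5|} \ge e^{-|q_1, p_1|} + e^{-|q_2, p_2|}$$

[Figure: trajectory R1 with points p1, p2, p3 and p5, and query locations q1, q2, q3]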
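The lower bound of Step 3 in a few lines of Python. This is a sketch that assumes each λ-NN search has produced a dict from trajectory id to shortest distance. `lower_bound` is a hypothetical name, and the distances below are made-up values that mirror the R1 example:

```python
import math

def lower_bound(tid: str, known_per_location: list[dict[str, float]]) -> float:
    """Lower bound of Sim(Q, R) for a candidate trajectory tid.

    known_per_location[i] maps the trajectory ids reached by the lambda-NN search
    of q_i to their shortest distance. Locations whose search has not reached tid
    contribute nothing; every term of Sim is positive, so the sum is <= Sim.
    """
    return sum(math.exp(-known[tid]) for known in known_per_location if tid in known)

# Hypothetical distances: R1 reached by the searches of q1 and q2, not by q3.
known_q1 = {"R1": 0.5}
known_q2 = {"R1": 1.0, "R2": 0.2}
known_q3 = {"R2": 0.7}
print(round(lower_bound("R1", [known_q1, known_q2, known_q3]), 4))  # 0.9744
```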


The Incremental k-NN algorithm

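Step 4. Construct the upper bound of similarity.

For any trajectory not covered by the λ-NN searches, e.g. R5, its distance to each qi is no smaller than radiusi, the search radius of the λ-NN search of qi. Because e^{-x} decreases as x grows:

$$\mathrm{Sim}(Q, R_5) = e^{-|q_1, R_5|} + e^{-|q_2, R_5|} + e^{-|q_3, R_5|} \le e^{-\mathrm{radius}_1} + e^{-\mathrm{radius}_2} + e^{-\mathrm{radius}_3}$$

[Figure: query locations q1, q2, q3 with search circles of radius1, radius2, radius3, and trajectories R1 and R5]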
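The upper bound of Step 4 needs only the m search radii. A minimal sketch with a hypothetical function name and made-up radii:

```python
import math

def upper_bound_unscanned(radii: list[float]) -> float:
    """Upper bound of Sim(Q, R) for any trajectory R that no search has reached.

    radii[i] is the search radius of the lambda-NN search of q_i. R is at
    distance >= radii[i] from q_i and exp(-x) is decreasing, so each term of
    Sim is at most exp(-radii[i]).
    """
    return sum(math.exp(-r) for r in radii)

print(round(upper_bound_unscanned([1.0, 2.0, 0.5]), 4))  # 1.1097
```

Enlarging any radius can only lower this bound, which is why growing λ eventually lets the stop condition fire.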


The Incremental k-NN algorithm

Step 5. Check the stop condition (pruning condition).

For a k-BCT query, if there are k candidate trajectories whose lower bounds are no less than the upper bound of similarity for all un-scanned trajectories, then the k best-connected trajectories must be included in the candidate set: every un-scanned trajectory scores at most that upper bound, while those k candidates score at least their lower bounds.

if the condition is satisfied:
    go to the refinement step
else:
    increase λ by some Δ
    repeat the search process

As the search region of each λ-NN search grows, the k best-connected trajectories will eventually be found.
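Putting Steps 1 to 5 together gives the following minimal sketch. It is our own illustration, not the authors' implementation. A full sort again stands in for the R-tree, every round rescans from scratch exactly as the basic IKNN does, and the refinement step is assumed to compute the exact similarity of every candidate. The function names, the dict-of-trajectories input and the extra "everything scanned" guard are hypothetical:

```python
import heapq
import math

def lambda_nn_scan(trajectories, q, lam):
    # Stand-in for the R-tree lam-NN search: sort all points by distance to q.
    nearest = sorted((math.dist(q, p), tid) for tid, r in trajectories.items() for p in r)[:lam]
    known = {}
    for d, tid in nearest:
        known.setdefault(tid, d)  # first hit per trajectory = its shortest distance
    return known, nearest[-1][0]

def sim(query, r):
    return sum(math.exp(-min(math.dist(q, p) for p in r)) for q in query)

def iknn(trajectories, query, k, lam, delta):
    total_points = sum(len(r) for r in trajectories.values())
    while True:
        scans = [lambda_nn_scan(trajectories, q, lam) for q in query]  # Step 2
        candidates = set().union(*(known for known, _ in scans))
        # Step 3: lower bound of every candidate, largest first
        lbs = sorted((sum(math.exp(-kn[t]) for kn, _ in scans if t in kn) for t in candidates),
                     reverse=True)
        # Step 4: upper bound for every trajectory that no search has reached
        ub = sum(math.exp(-radius) for _, radius in scans)
        # Step 5: stop once k lower bounds reach ub (or everything has been scanned)
        if (len(lbs) >= k and lbs[k - 1] >= ub) or lam >= total_points:
            break
        lam += delta  # the next round rescans from scratch, as in basic IKNN
    # Refinement (assumed): exact similarity for every candidate, keep the top k
    return heapq.nlargest(k, ((sim(query, trajectories[t]), t) for t in candidates))

T = {
    "R1": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "R2": [(0.0, 5.0), (1.0, 5.0)],
    "R3": [(0.0, 0.5), (2.0, 0.5)],
    "R4": [(9.0, 9.0)],
}
Q = [(0.0, 0.0), (2.0, 0.0)]
print([t for _, t in iknn(T, Q, k=2, lam=2, delta=2)])  # ['R1', 'R3']
```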


Problem

The problem: we may need to increase λ and recompute the lower/upper bounds for many rounds before we finally find the k-BCT results, and in every round the λ-NN search is rerun for every query location.

(Let λ = k initially, and Δ = k as well.)

round 1: 1st to k-th nearest neighbors

round 2: 1st to 2k-th nearest neighbors

round i: 1st to (i·k)-th nearest neighbors

Trajectory points are visited multiple times.

Normally λ ≫ k, and reaching λ takes λ/k rounds, so each query location retrieves k + 2k + … + λ = k(1 + 2 + … + λ/k) ≈ λ²/(2k) neighbors in total: the cost grows quadratically with λ.
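The arithmetic can be checked with a tiny counter. It is a sketch that assumes every round restarts the search from scratch and that the final λ is reached exactly. The doubling variant, proposed later for the depth-first strategy, is included for comparison; `rescan_cost` is a hypothetical helper:

```python
def rescan_cost(final_lambda: int, k: int, doubling: bool = False) -> int:
    """Total neighbors retrieved for one query location when each round
    restarts the search from scratch (lambda starts at k).

    Linear growth (delta = k): k + 2k + ... + final_lambda ~ final_lambda**2 / (2k).
    Doubling: k + 2k + 4k + ... + final_lambda < 2 * final_lambda.
    """
    total, lam = 0, k
    while lam <= final_lambda:
        total += lam
        lam = lam * 2 if doubling else lam + k
    return total

print(rescan_cost(1000, 10))                 # 50500, close to 1000**2 / 20 = 50000
print(rescan_cost(1280, 10, doubling=True))  # 2550, below 2 * 1280
```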


Can we reduce the overlapped search regions?


Efficiency study of the IKNN

Adaptation of the λ-NN algorithm

The best-first nearest neighbor search [Hjaltason et al., TODS99]

A priority queue keyed by MINDIST stores all R-tree entries that have yet to be visited, so MBRs/objects are visited in ascending order of MINDIST.

The depth-first nearest neighbor search [Roussopoulos et al., SIGMOD95]

It recursively traverses the R-tree level by level in a depth-first manner, while maintaining a global list of the k nearest candidates found so far.

Estimate the performance of IKNN when adopting each λ-NN algorithm.


Adaptation of the λ-NN algorithm

The best-first NN search: retrieve the λ, λ+Δ, λ+2Δ, … NN for each query location incrementally, until the k best-connected trajectories are included in the candidate set.

Benefit

The λ-NN is returned incrementally: each round continues from where the previous one stopped.

I/O optimal; no overlap occurs, so the total number of object visits per query location is Vsum = λ.

Shortcoming

Memory consumption is NOT guaranteed. The priority queue stores all R-tree entries that have yet to be visited, and in an extreme case it may grow as large as the whole dataset.
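A minimal sketch of best-first incremental NN retrieval in the spirit of [Hjaltason et al., TODS99]. It assumes a toy bulk-loaded tree instead of a real R-tree; `Node`, `build` and `best_first_nn` are hypothetical names. Because the search is a generator, each new round pulls Δ more neighbors from the same priority queue, so no point is visited twice. The same queue (`heap`) is where the unbounded memory use comes from:

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

Point = tuple[float, float]

@dataclass
class Node:
    mbr: tuple[float, float, float, float]             # (xmin, ymin, xmax, ymax)
    children: list["Node"] = field(default_factory=list)
    points: list[Point] = field(default_factory=list)  # non-empty only in leaves

def bounding(pts):
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))

def build(points: list[Point], cap: int = 4) -> Node:
    """Toy bulk load (sort by x, then chunk); not a real R-tree construction."""
    pts = sorted(points)
    level = [Node(bounding(pts[i:i + cap]), points=pts[i:i + cap]) for i in range(0, len(pts), cap)]
    while len(level) > 1:
        level = [Node(bounding([c for n in level[i:i + cap] for c in (n.mbr[:2], n.mbr[2:])]),
                      children=level[i:i + cap])
                 for i in range(0, len(level), cap)]
    return level[0]

def mindist(q: Point, mbr) -> float:
    dx = max(mbr[0] - q[0], 0.0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0.0, q[1] - mbr[3])
    return math.hypot(dx, dy)

def best_first_nn(root: Node, q: Point):
    """Yield (distance, point) pairs in ascending distance from q."""
    tie = itertools.count()  # tie-breaker so the heap never compares nodes/points
    heap = [(mindist(q, root.mbr), next(tie), root)]
    while heap:
        d, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            for child in item.children:
                heapq.heappush(heap, (mindist(q, child.mbr), next(tie), child))
            for p in item.points:
                heapq.heappush(heap, (math.dist(q, p), next(tie), p))
        else:
            yield d, item  # every remaining key is >= d, so d is the next NN

pts = [(float(x), float(y)) for x in range(5) for y in range(5)]
root = build(pts)
nn = best_first_nn(root, (0.0, 0.0))
first = list(itertools.islice(nn, 3))  # round 1: lambda = 3
more = list(itertools.islice(nn, 3))   # round 2: 3 more, continuing the same search
```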


The best-first strategy

Performance (R-tree leaf accesses)

Estimate the radius r of the circle around a query location that contains λ points [Belussi et al. VLDB95].

Estimate the leaf accesses of a range query with radius r [Korn et al. TKDE2001].

The total cost is that of m independent λ-NN queries.

[Figure: query location q and the circle of the estimated radius containing λ objects]
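The cited estimators rely on the fractal dimension of the data. As a simpler stand-in, the sketch below assumes uniformly distributed points in a unit square, square leaf MBRs and no boundary effects. It estimates the radius holding λ points and the expected leaf accesses of the resulting range query: a leaf is read iff its MBR meets the circle, i.e. its center falls in a region of area s² + 4sr + πr². The result is then multiplied by m. This is an illustrative approximation, not the formula used in the paper, and all names and parameter values are hypothetical:

```python
import math

def radius_for_lambda(lam: int, n_points: int, area: float = 1.0) -> float:
    """Radius r of a circle expected to hold lam points under uniform density."""
    return math.sqrt(lam * area / (math.pi * n_points))

def leaf_accesses(r: float, n_points: int, capacity: int, area: float = 1.0) -> float:
    """Expected leaf accesses of a circular range query of radius r, assuming
    uniform data and square leaves of side s that tile the space. A leaf is hit
    iff its center lies in the Minkowski sum of the circle and the leaf MBR,
    whose area is s^2 + 4 s r + pi r^2."""
    leaves = n_points / capacity
    s = math.sqrt(area / leaves)
    return leaves * (s * s + 4 * s * r + math.pi * r * r) / area

# m independent lambda-NN queries (hypothetical sizes)
m, lam, n, c = 3, 500, 1_000_000, 50
r = radius_for_lambda(lam, n)
print(round(m * leaf_accesses(r, n, c), 1))
```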


Adaptation of the λ-NN algorithm

The depth-first NN search: every time we search for the (λ+Δ)-NN, we have to re-visit the search region of the λ-NN query.

Benefit: guaranteed memory usage, O(c · log_c N), where c is the node capacity of the R-tree.

Drawback: too many overlapping re-visits.

A simple improvement: double λ in each round, which reduces the number of rounds and amortizes the cost; since k + 2k + 4k + … + λ < 2λ, the total work stays below twice that of the final round.

Pruning: in the (λ+Δ)-NN search, every MBR whose MAXDIST is smaller than the search radius of the previous λ-NN search can be skipped, because all of its points were already retrieved.
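A sketch of the modified depth-first strategy. It uses a toy tree and 2-D points tagged with a hypothetical unique id, and all names are hypothetical. The λ-NN search is a depth-first branch-and-bound that orders and prunes children by MINDIST only (Roussopoulos et al. also use MINMAXDIST, omitted here). λ doubles each round, each round is seeded with the previous result, and a node is skipped when its MAXDIST is below the previous search radius:

```python
import heapq
import math

Entry = tuple[float, float, int]  # (x, y, pid); pid is a hypothetical unique point id

class Node:
    """Toy tree node: a leaf holds entries, an internal node holds children."""
    def __init__(self, children=None, points=None):
        self.children, self.points = children or [], points or []
        corners = self.points or [c for ch in self.children for c in (ch.mbr[:2], ch.mbr[2:])]
        self.mbr = (min(p[0] for p in corners), min(p[1] for p in corners),
                    max(p[0] for p in corners), max(p[1] for p in corners))

def build(points: list[Entry], cap: int = 4) -> Node:
    """Toy bulk load (sort by x, then chunk); not a real R-tree construction."""
    pts = sorted(points)
    level = [Node(points=pts[i:i + cap]) for i in range(0, len(pts), cap)]
    while len(level) > 1:
        level = [Node(children=level[i:i + cap]) for i in range(0, len(level), cap)]
    return level[0]

def mindist(q, m):
    return math.hypot(max(m[0] - q[0], 0.0, q[0] - m[2]), max(m[1] - q[1], 0.0, q[1] - m[3]))

def maxdist(q, m):
    # distance from q to the farthest corner of the MBR
    return math.hypot(max(abs(q[0] - m[0]), abs(q[0] - m[2])),
                      max(abs(q[1] - m[1]), abs(q[1] - m[3])))

def depth_first_knn(root, q, lam, prev=(), prev_radius=-1.0):
    """Depth-first branch-and-bound lam-NN, seeded with the previous round.

    prev: sorted (dist, entry) pairs of the previous round; prev_radius: its
    largest distance. A node with MAXDIST < prev_radius only holds points that
    are already in prev, so it is skipped without being read.
    """
    best = [(-d, e[2], e) for d, e in prev]  # max-heap on distance via negation
    heapq.heapify(best)
    seen = {e[2] for _, e in prev}

    def visit(node):
        if maxdist(q, node.mbr) < prev_radius:
            return
        for e in node.points:
            if e[2] in seen:
                continue
            d = math.dist(q, e[:2])
            if len(best) < lam:
                heapq.heappush(best, (-d, e[2], e))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, e[2], e))
        # descend in MINDIST order; prune children that cannot beat the lam-th candidate
        for child in sorted(node.children, key=lambda c: mindist(q, c.mbr)):
            if len(best) < lam or mindist(q, child.mbr) < -best[0][0]:
                visit(child)

    visit(root)
    return sorted((-nd, e) for nd, _, e in best)

def doubling_knn(root, q, k, final_lam):
    """Rounds with lam = k, 2k, 4k, ..., each reusing the previous result."""
    result, lam = [], k
    while True:
        radius = result[-1][0] if result else -1.0
        result = depth_first_knn(root, q, min(lam, final_lam), result, radius)
        if lam >= final_lam:
            return result
        lam *= 2

pts = [(float(x), float(y), 10 * x + y) for x in range(10) for y in range(10)]
root = build(pts)
q = (3.2, 4.7)
got = [d for d, _ in doubling_knn(root, q, k=5, final_lam=40)]
expected = sorted(math.dist(q, p[:2]) for p in pts)[:40]
print(got == expected)  # True
```

The recursion holds at most one sorted child list per tree level, which is where the bounded memory comes from.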


The depth-first strategy

Performance (R-tree leaf accesses)

The search region is not necessarily a circle, so the previous method cannot be used directly.

Estimate the size of the first visited MBR (at any level) that contains at least λ points.

Estimate the radius (MAXDIST) of the circle around qi that encloses this MBR.

[Figure: query location qi, the first such MBR (MBR1) and the circle of radius MAXDIST]

R-tree nodes outside the circle of radius MAXDIST won't be visited, since MBR1 alone already supplies λ points within distance MAXDIST of qi.


The depth-first strategy (cont.)

Performance: estimate the leaf accesses of a range query with radius MAXDIST [Korn et al. TKDE2001].

Finally,


Summary

IKNN algorithm           Memory usage      Object visits   Leaf access
Best-first strategy      no guarantee      m × O(λ)
Depth-first strategy     O(c · log_c N)    m × O(λ)

Although the best-first strategy has no guarantee on memory usage, it normally runs faster, and its priority queue can still easily fit in the main memory of a modern computer.

The modified depth-first strategy reaches nearly the same performance as the best-first strategy while preserving low memory consumption.


Optimization & Extension

Considering the importance of the query locations by assigning them different weights when exploring objects.

Extension to query locations with a specified order.


Experiments

12,653 trajectories (1,147,116 points) collected by the GeoLife project

Number of query locations: 2 to 10. Tests were conducted on a PC with a 2.1 GHz CPU and 1 GB of memory.


Experiments – Node Access


Experiments – Query Time


Experiments – Memory Usage


Thank you