Progressive Computation of The Min-Dist Optimal-Location Query Donghui Zhang, Yang Du, Tian Xia,...

Post on 17-Dec-2015


Progressive Computation of The Min-Dist Optimal-Location Query

Donghui Zhang, Yang Du, Tian Xia, Yufei Tao*

Northeastern University

* Chinese University of Hong Kong

VLDB’06, Seoul, Korea

Donghui Zhang et al. Optimal Location Query 2

Motivation

• “What is the optimal location in the Boston area to build a new McDonald’s store?”

• Suppose a customer drives to the closest McDonald’s.

• Optimality: Minimize AVG driving distance.

Who will be interested?

• Corporations
– Chain restaurants (e.g. McDonald’s, Burger King, Starbucks)
– Supermarkets (e.g. Wal-Mart, Costco, Stop & Shop)
– Location-based service providers (e.g. Verizon, AT&T)

• Computer scientists, especially in
– Databases
– Computational Geometry
– Algorithms

min-dist OL

• Without any new site: AD = (200+200+600+600)/4 = 400.
• With new site l1: AD(l1) = (30+30+600+600)/4 = 315.
• With new site l2: AD(l2) = (200+200+30+30)/4 = 115.

[Figure: four objects at distances 200, 200, 600, 600 from their nearest existing sites. New site l1 reduces the two distances of 200 to 30; new site l2 reduces the two distances of 600 to 30.]

Formal Definition

• Given a set S of sites, a set O of objects, and a query range Q,

• min-dist OL is a location l ∈ Q which minimizes

$$AD(l) = \frac{1}{|O|} \sum_{o \in O} dNN(o, S \cup \{l\})$$

where dNN(o, S ∪ {l}) is the distance between o and its nearest site.

• “Solution”: compute all AD(l). But…

Challenging

1. There are infinitely many locations in Q! How to produce a finite set of candidates (while keeping optimality)?

2. How to avoid computing AD(l) for all candidates?

Solution Highlights

1. Algorithm to compute AD(l).
2. Theorems to limit #candidates.
3. Lower-bound of AD(l) for all locations l in a cell C.
4. Progressive algorithm.

L1 Distance

• d(o, s) = |o.x – s.x|+|o.y – s.y|
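As a minimal sketch, AD(l) can be computed straight from its definition using this L1 distance. The function names (`l1`, `d_nn`, `avg_dist`) and the coordinates are hypothetical; the coordinates were chosen only so that the numbers match the min-dist OL example (400, 315, 115).

```python
def l1(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def d_nn(o, sites):
    """Distance from object o to its nearest site."""
    return min(l1(o, s) for s in sites)

def avg_dist(objects, sites, new_site=None):
    """AD when new_site is None, otherwise AD(new_site), by brute force."""
    all_sites = list(sites) + ([new_site] if new_site is not None else [])
    return sum(d_nn(o, all_sites) for o in objects) / len(objects)

# Hypothetical layout: one existing site, two objects 200 away, two 600 away.
sites = [(0, 0)]
objects = [(200, 0), (170, 30), (-600, 0), (-570, -30)]
loc1 = (185, 15)     # within 30 of the two nearby objects
loc2 = (-585, -15)   # within 30 of the two far objects

print(avg_dist(objects, sites))        # 400.0
print(avg_dist(objects, sites, loc1))  # 315.0
print(avg_dist(objects, sites, loc2))  # 115.0
```

This brute-force version touches every object for every location, which is the cost the rest of the talk tries to avoid.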

1. Compute AD(l)

• Remember

$$AD(l) = \frac{1}{|O|} \sum_{o \in O} dNN(o, S \cup \{l\})$$

• Define

$$AD = \frac{1}{|O|} \sum_{o \in O} dNN(o, S)$$

• Let RNN(l) be the objects “attracted” by l.
• AD(l) = AD if RNN(l) = ∅.
• Otherwise AD(l) = AD − ?, where “?” is the average savings for customers in RNN(l).

[Figure: a location l with RNN(l) = ∅, so AD(l) = AD; and a location l with RNN(l) = {o7, o8}, so AD(l) < AD.]

1. Compute AD(l)

• Theorem

$$AD(l) = AD - \frac{1}{|O|} \sum_{o \in RNN(l)} \bigl( dNN(o, S) - d(o, l) \bigr)$$

• S and O are “static” versus l.
– AD can be pre-computed.
– So can dNN(o, S).

• To compute AD(l):
– Find RNN(l).
– ∀ o ∈ RNN(l), compute d(o, l).
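The theorem says AD(l) equals the pre-computed AD minus the savings of the objects in RNN(l), divided by |O|. A minimal sketch of that computation follows, assuming L1 distance and a naive linear scan to find RNN(l); the names `precompute`, `rnn` and `ad_with_site` and the sample coordinates are hypothetical.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def precompute(objects, sites):
    """One-off work: dNN(o, S) for every object, and AD."""
    dnn = [min(l1(o, s) for s in sites) for o in objects]
    return sum(dnn) / len(objects), dnn

def rnn(objects, dnn, loc):
    """Naive RNN(l): indices of objects strictly closer to loc than to any site."""
    return [i for i, o in enumerate(objects) if l1(o, loc) < dnn[i]]

def ad_with_site(objects, ad, dnn, loc):
    """AD(l) = AD - (1/|O|) * sum over RNN(l) of (dNN(o, S) - d(o, l))."""
    savings = sum(dnn[i] - l1(objects[i], loc) for i in rnn(objects, dnn, loc))
    return ad - savings / len(objects)

# Hypothetical data: one site, two objects 200 away, two objects 600 away.
sites = [(0, 0)]
objects = [(200, 0), (170, 30), (-600, 0), (-570, -30)]
ad, dnn = precompute(objects, sites)

print(ad)                                          # 400.0
print(ad_with_site(objects, ad, dnn, (185, 15)))   # 315.0
print(ad_with_site(objects, ad, dnn, (0, 0)))      # 400.0, RNN(l) is empty
```

Only the objects in RNN(l) contribute to the sum, so a faster way of finding RNN(l) directly reduces the cost of each AD(l).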

How to compute RNN(l)?

• This is an implementation detail, dealing with computational geometry and spatial databases.

• Naïve solution: ∀ o ∈ O, compare with all sites and l.

• More efficient:
1. Compute Voronoi cell of l.
2. Retrieve objects inside the Voronoi cell using a range search on R-tree.

How to compute RNN(l)? (1) Compute Voronoi cell

• Remember: RNN(l) is the set of objects closer to l than to any existing site in S.

• Consider all sites. Draw the spatial region that is closer to l than to any site.

How to compute RNN(l)? (2) Retrieve objects

• Standard range search.
• Any spatial access method, e.g. R-tree.

[Figure: objects a to m plotted in a 10×10 space (x axis, y axis).]

Range query: find the objects in a given range. E.g. find all hotels in Boston.

No index: scan through all objects. NOT EFFICIENT!

[Figure sequence: an R-tree over the objects a to m, shown next to the same plot. The root holds entries E1 and E2; E1 points to E3, E4, E5 and E2 points to E6, E7. Leaves E3 and E4 hold a to e, E5 holds f, g, h, E6 holds i, j, k, and E7 holds l, m. Each entry is a Minimum Bounding Rectangle (MBR).]

2. Limit #candidates

• Theorem: within the X/Y range of Q, draw grid lines crossing objects. Only need to consider intersections!

[Figure: query range Q with horizontal and vertical grid lines through the objects; 5x6=30 candidates.]
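The candidate set from this theorem can be sketched as follows. This is only an illustration under assumptions: Q is an axis-aligned rectangle given as `(xmin, ymin, xmax, ymax)`, a vertical grid line is drawn through every object whose x lies in Q's X range, and a horizontal one through every object whose y lies in Q's Y range. Whether the borders of Q also count as grid lines is not stated on the slide, so they are not added here. The name `candidates` is hypothetical.

```python
def candidates(objects, q):
    """Intersections of the grid lines through objects, restricted to Q."""
    xmin, ymin, xmax, ymax = q
    # Vertical lines: x-coordinates of objects inside Q's X range.
    xs = sorted({x for x, _ in objects if xmin <= x <= xmax})
    # Horizontal lines: y-coordinates of objects inside Q's Y range.
    ys = sorted({y for _, y in objects if ymin <= y <= ymax})
    return [(x, y) for x in xs for y in ys]

objects = [(1, 1), (2, 5), (4, 3), (9, 2)]
q = (0, 0, 5, 4)
cands = candidates(objects, q)
print(len(cands))  # 9: x in {1, 2, 4} times y in {1, 2, 3}
```

Note that an object outside Q, such as (9, 2), can still contribute a grid line, because only one of its coordinates needs to fall within the range of Q.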

2. Limit #candidates

• Proof idea: suppose the OL is not at an intersection; moving it will produce a better (or equal) result.

• Consider RNN(l).

• Moving l to the right (by δ) saves total distance.

2. VCU(Q)

• A spatial region, enclosing the objects closer to Q than to sites in S.

• It’s the Voronoi cell of Q versus sites in S.

2. Further Limit #candidates

• Only consider objects in VCU(Q).

[Figure: drawing grid lines only through objects in VCU(Q) reduces the 5x6=30 candidates to 4x4=16 candidates.]

Naïve Algorithm

• Derive candidates.
• Compute AD(l) for each.
• Pick smallest.

• Not efficient! Too many candidates! To compute AD(l) for each one, need to:
– compute RNN(l)
– retrieve all these objects…

Progressive Idea

• Treat Q as a cell and consider its corners.

• Divide the cell.

• Recursively divide a sub-cell.

• Able to check all candidates.

Progressive Idea

• Q: What do you save?
• A: Cell pruning, if the cell’s lower bound ≥ AD(l0) of some candidate l0.

• E.g. AD(l0) = 50. Suppose 60 is a lower bound for AD(l), ∀ l ∈ C. Then no location in C can beat l0, so C is pruned.

3. LB(C): lower bound for AD(l), l ∈ C

• Theorem:

$$LB(C) = \max\left\{ \frac{AD(c_1) + AD(c_4)}{2}, \frac{AD(c_2) + AD(c_3)}{2} \right\} - \frac{p}{4}$$

is a lower bound, where c1, c2, c3, c4 are the corners of cell C and p is its perimeter.

• e.g. with AD(c1)=1000, AD(c2)=3000, AD(c3)=4000, AD(c4)=2500: LB(C)=3500-p/4

3. LB(C): lower bound for AD(l), l ∈ C

• A better lower bound Theorem:

$$LB(C) = \max\left\{ \frac{AD(c_1) + AD(c_4)}{2}, \frac{AD(c_2) + AD(c_3)}{2} \right\} - \frac{p}{4} \cdot \frac{|VCU(C)|}{|O|}$$

• Comparing with the previous lower bound:
– Higher quality since the lower bound is larger.
– More computation.
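Both bounds take the larger of the two corner-pair averages, (AD(c1)+AD(c4))/2 and (AD(c2)+AD(c3))/2, and subtract a term based on the perimeter p: p/4 for the first bound, and p/4 scaled by |VCU(C)|/|O| for the better one. A small sketch of the arithmetic follows. The function names are hypothetical, and reading |VCU(C)| as the number of objects inside VCU(C) is an assumption, not something the slides state.

```python
def lb_basic(ad_c1, ad_c2, ad_c3, ad_c4, perimeter):
    """First bound: max of the two corner-pair averages, minus p/4."""
    return max((ad_c1 + ad_c4) / 2, (ad_c2 + ad_c3) / 2) - perimeter / 4

def lb_vcu(ad_c1, ad_c2, ad_c3, ad_c4, perimeter, vcu_count, num_objects):
    """Better bound: the p/4 term is scaled by |VCU(C)| / |O|.

    Assumption: vcu_count is the number of objects inside VCU(C).
    """
    pair_max = max((ad_c1 + ad_c4) / 2, (ad_c2 + ad_c3) / 2)
    return pair_max - (perimeter / 4) * (vcu_count / num_objects)

# Corner values from the slide; the perimeter and counts are made up.
print(lb_basic(1000, 3000, 4000, 2500, perimeter=40))         # 3490.0
print(lb_vcu(1000, 3000, 4000, 2500, 40, vcu_count=25,
             num_objects=100))                                # 3497.5
```

Since |VCU(C)|/|O| is at most 1 under this reading, the second bound is never smaller than the first, which matches the "higher quality" remark.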

4. The Progressive Algorithm

1. Maintain a heap of cells ordered by LB(). Initially one cell: Q.
2. Maintain the best candidate lopt.
3. Pick the cell with minimum LB() and partition it.
4. Compute AD() for the corners of sub-cells.
5. Compute LB() for the sub-cells.
6. Insert sub-cell ci to heap if LB(ci)<AD(lopt).
7. Goto 3.
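The steps above can be sketched roughly as below. This is an illustration under several assumptions, not the authors' implementation: cells are index ranges over the sorted candidate coordinates `xs` and `ys`, each cell is split at the median index into at most four sub-cells, c1/c4 and c2/c3 are taken to be the diagonally opposite corner pairs, the simpler bound with p/4 is used, AD(l) is computed by brute force with L1 distance, and the loop stops once no cell in the heap can beat the best candidate. All names (`make_ad`, `progressive_ol`) are hypothetical.

```python
import heapq
from itertools import product

def l1(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def make_ad(objects, sites):
    """Return a function loc -> AD(loc), computed from the definition."""
    def ad(loc):
        return sum(min(l1(o, s) for s in sites + [loc])
                   for o in objects) / len(objects)
    return ad

def progressive_ol(xs, ys, ad):
    """xs, ys: sorted, non-empty candidate coordinates. Returns (lopt, AD(lopt))."""
    cache = {}
    best = {"ad": float("inf"), "loc": None}

    def corner_ad(i, j):  # step 4, also maintains lopt (step 2)
        if (i, j) not in cache:
            cache[(i, j)] = ad((xs[i], ys[j]))
            if cache[(i, j)] < best["ad"]:
                best["ad"], best["loc"] = cache[(i, j)], (xs[i], ys[j])
        return cache[(i, j)]

    def lower_bound(cell):  # step 5, assumed corner numbering
        i0, i1, j0, j1 = cell
        c1, c2 = corner_ad(i0, j0), corner_ad(i1, j0)
        c3, c4 = corner_ad(i0, j1), corner_ad(i1, j1)
        p = 2 * ((xs[i1] - xs[i0]) + (ys[j1] - ys[j0]))
        return max((c1 + c4) / 2, (c2 + c3) / 2) - p / 4

    def splittable(cell):  # does the cell contain non-corner candidates?
        i0, i1, j0, j1 = cell
        return i1 - i0 > 1 or j1 - j0 > 1

    def halves(lo, hi):
        mid = (lo + hi) // 2
        return [(lo, mid), (mid, hi)] if hi - lo > 1 else [(lo, hi)]

    root = (0, len(xs) - 1, 0, len(ys) - 1)
    heap = [(lower_bound(root), root)]              # step 1
    while heap:
        lb, cell = heapq.heappop(heap)              # step 3
        if lb >= best["ad"] or not splittable(cell):
            continue                                # pruned or fully checked
        i0, i1, j0, j1 = cell
        for (a, b), (c, d) in product(halves(i0, i1), halves(j0, j1)):
            sub = (a, b, c, d)
            sub_lb = lower_bound(sub)
            if sub_lb < best["ad"] and splittable(sub):
                heapq.heappush(heap, (sub_lb, sub))  # step 6
    return best["loc"], best["ad"]

# Hypothetical data; every object coordinate is assumed to lie inside Q.
sites = [(0, 0), (10, 10)]
objects = [(1, 7), (2, 3), (4, 9), (5, 1), (6, 6), (7, 2), (8, 8), (9, 4)]
xs = sorted({x for x, _ in objects})
ys = sorted({y for _, y in objects})
ad = make_ad(objects, sites)
loc, val = progressive_ol(xs, ys, ad)
print(loc, val)
```

At any point during the loop, `best["ad"]` and the smallest bound still in the heap bracket the true optimum, so the loop could be interrupted early with an approximate answer.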

Progressiveness

• The algorithm quickly reports a candidate OL with a confidence interval, and keeps refining.

[Figure: over time, the interval between AD(best candidate), initially AD(best corner of Q), and Min{ LB(C) | C in heap }, initially LB(Q), narrows. AD(real OL) is inside the interval.]

• User may choose to terminate any time.

Batch Partitioning

• When partitioning a cell, partition it into multiple sub-cells.

• Reason: computing AD(l) requires accessing the R*-tree of objects, so each access to the R*-tree should be used to compute multiple AD(l).

• Tradeoff: partitioning too much is wasteful, since some candidates could have been pruned.

Performance Setup

• O: 123,593 postal addresses in the northeastern part of the US, stored using an R*-tree.
• S: randomly select 100 sites from O.
• Buffer: 128 pages.
• Dell Pentium IV 3.2GHz.
• Query size: 1% in each dimension.


Effect of VCU Computation


Comparison of Lower Bounds

Effect of Batch Partitioning


Progressiveness

• Each step: partition a cell to 40 sub-cells.
• After 200 steps, accurate answer.
• After 20 steps, answer is 1% away from optimal.

Conclusions

• Introduced the min-dist optimal-location query.

• Proved theorems to limit the number of candidates.

• Presented lower-bound estimators.

• Proposed a progressive algorithm.