Progressive Computation of The Min-Dist Optimal-Location Query

Progressive Computation of The Min-Dist Optimal-Location Query. Donghui Zhang, Yang Du, Tian Xia, Yufei Tao*. Northeastern University; * Chinese University of Hong Kong. VLDB’06, Seoul, Korea.


Transcript of Progressive Computation of The Min-Dist Optimal-Location Query

Page 1: Progressive Computation of The Min-Dist  Optimal-Location Query

Progressive Computation of The Min-Dist Optimal-Location Query

Donghui Zhang, Yang Du, Tian Xia, Yufei Tao*

Northeastern University

* Chinese University of Hong Kong

VLDB’06, Seoul, Korea

Page 2: Progressive Computation of The Min-Dist  Optimal-Location Query

Donghui Zhang et al. Optimal Location Query 2

Motivation

• “What is the optimal location in the Boston area to build a new McDonald’s store?”

• Suppose each customer drives to the closest McDonald’s.

• Optimality: minimize the average driving distance.

Page 3: Progressive Computation of The Min-Dist  Optimal-Location Query

min-dist OL

• Without any new site: AD = (200+200+600+600)/4 = 400.

[Figure: four customers whose distances to their nearest existing sites are 200, 200, 600, and 600.]

Page 4: Progressive Computation of The Min-Dist  Optimal-Location Query

min-dist OL

• Without any new site: AD = (200+200+600+600)/4 = 400.

• With new site l1: AD(l1) = (30+30+600+600)/4 = 315.

[Figure: with new site l1, the four distances become 30, 30, 600, and 600.]

Page 5: Progressive Computation of The Min-Dist  Optimal-Location Query

min-dist OL

• Without any new site: AD = (200+200+600+600)/4 = 400.

• With new site l1: AD(l1) = (30+30+600+600)/4 = 315.

• With new site l2: AD(l2) = (200+200+30+30)/4 = 115.

[Figure: with new site l2, the four distances become 200, 200, 30, and 30.]

Page 6: Progressive Computation of The Min-Dist  Optimal-Location Query

Formal Definition

• Given a set S of sites, a set O of objects, and a query range Q,

• min-dist OL is a location l ∈ Q which minimizes

$$AD(l) = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S \cup \{l\})$$

where d_NN(o, S ∪ {l}) is the distance between o and its nearest site.

Page 7: Progressive Computation of The Min-Dist  Optimal-Location Query

L1 Distance

• d(o, s) = |o.x - s.x| + |o.y - s.y|
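The definition of AD(l) under the L1 distance can be evaluated directly by brute force. The sketch below is only an illustration: the function names (`l1`, `d_nn`, `average_distance`) and the toy coordinates are assumptions, not taken from the slides.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def l1(a: Point, b: Point) -> float:
    """L1 (Manhattan) distance d(o, s) = |o.x - s.x| + |o.y - s.y|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def d_nn(o: Point, sites: List[Point]) -> float:
    """Distance from object o to its nearest site."""
    return min(l1(o, s) for s in sites)


def average_distance(objects: List[Point], sites: List[Point],
                     new_site: Optional[Point] = None) -> float:
    """AD when new_site is None, otherwise AD(l) with l = new_site."""
    all_sites = sites + ([new_site] if new_site is not None else [])
    return sum(d_nn(o, all_sites) for o in objects) / len(objects)


# Hypothetical toy data: four objects and one existing site.
objects = [(0, 0), (2, 0), (10, 0), (10, 4)]
sites = [(1, 0)]
print(average_distance(objects, sites))            # AD
print(average_distance(objects, sites, (10, 2)))   # AD(l) for l = (10, 2)
```

Every call scans all objects and all sites, which is the cost the later slides try to avoid.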

Page 8: Progressive Computation of The Min-Dist  Optimal-Location Query

Challenges

1. There are infinitely many locations in Q. How can we produce a finite set of candidates while preserving optimality?

2. How can we avoid computing AD(l) for all candidates?

Page 9: Progressive Computation of The Min-Dist  Optimal-Location Query

Solution Highlights

1. Algorithm to compute AD(l).

2. Theorems to limit #candidates.

3. Lower bound of AD(l) for all locations l in a cell C.

4. Progressive algorithm.

Page 10: Progressive Computation of The Min-Dist  Optimal-Location Query

1. Compute AD(l)

• Remember

$$AD(l) = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S \cup \{l\})$$

• Define

$$AD = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S)$$

• Let RNN(l) be the objects “attracted” by l.

• AD(l) = AD if RNN(l) = ∅.

[Figure: a location l with RNN(l) = ∅, so AD = AD(l).]

Page 11: Progressive Computation of The Min-Dist  Optimal-Location Query

1. Compute AD(l)

• Remember

$$AD(l) = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S \cup \{l\})$$

• Define

$$AD = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S)$$

• Let RNN(l) be the objects “attracted” by l.

• AD(l) = AD if RNN(l) = ∅.

[Figure: a location l with RNN(l) = {o7, o8}, so AD(l) < AD.]

Page 12: Progressive Computation of The Min-Dist  Optimal-Location Query

1. Compute AD(l)

• Remember

$$AD(l) = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S \cup \{l\})$$

• Define

$$AD = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S)$$

• Let RNN(l) be the objects “attracted” by l.

• AD(l) = AD if RNN(l) = ∅.

• AD(l) = AD − ?, where “?” is the average savings for customers in RNN(l).

Page 13: Progressive Computation of The Min-Dist  Optimal-Location Query

1. Compute AD(l)

• Theorem

$$AD(l) = AD - \frac{1}{|O|} \sum_{o \in RNN(l)} \bigl( d_{NN}(o, S) - d(o, l) \bigr)$$

• S and O are “static” versus l.
– AD can be pre-computed.
– So can dNN(o, S).

• To compute AD(l):
– Find RNN(l).
– ∀ o ∈ RNN(l), compute d(o, l).
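The theorem suggests pre-computing AD and dNN(o, S) once, then correcting AD only for the objects in RNN(l). A minimal sketch follows, under two assumptions: a linear scan stands in for the index-based RNN retrieval, and RNN(l) is taken to be the objects strictly closer to l than to their nearest existing site. The names `precompute` and `ad_with_new_site` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def l1(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def precompute(objects: List[Point], sites: List[Point]):
    """One-time work: dNN(o, S) for every object, and AD."""
    dnn = [min(l1(o, s) for s in sites) for o in objects]
    return dnn, sum(dnn) / len(objects)


def ad_with_new_site(l: Point, objects: List[Point],
                     dnn: List[float], ad: float) -> float:
    """AD(l) = AD - (1/|O|) * sum over RNN(l) of (dNN(o, S) - d(o, l))."""
    savings = 0.0
    for o, d_old in zip(objects, dnn):
        d_new = l1(o, l)
        if d_new < d_old:          # o is "attracted" by l, i.e. o is in RNN(l)
            savings += d_old - d_new
    return ad - savings / len(objects)


# Hypothetical toy data.
objects = [(0, 0), (2, 0), (10, 0), (10, 4)]
sites = [(1, 0)]
dnn, ad = precompute(objects, sites)
print(ad_with_new_site((10, 2), objects, dnn, ad))
```

With a spatial index, only the objects in RNN(l) would need to be touched; the scan here keeps the example self-contained.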

Page 14: Progressive Computation of The Min-Dist  Optimal-Location Query

2. Limit #candidates

• Theorem: within the X/Y range of Q, draw grid lines crossing objects. Only the intersections need to be considered!

[Figure: query range Q with grid lines drawn through the objects.]

Page 15: Progressive Computation of The Min-Dist  Optimal-Location Query

2. Limit #candidates

• Theorem: within the X/Y range of Q, draw grid lines crossing objects. Only the intersections need to be considered!

[Figure: query range Q with grid lines drawn through the objects, giving 5x6=30 candidates.]
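The candidate theorem can be turned into a small enumeration routine. In the sketch below, the rectangle encoding `(x_lo, y_lo, x_hi, y_hi)` and the function name are assumptions. Adding the borders of Q as grid lines is also an assumption made here so that the candidate set is never empty; the slide itself only mentions lines through objects.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)


def candidate_locations(objects: List[Point], q: Rect) -> List[Point]:
    """Intersections of grid lines through objects within Q's X/Y range."""
    x_lo, y_lo, x_hi, y_hi = q
    # Vertical lines: x-coordinates of objects inside Q's X range.
    # Assumption: Q's own borders are added as extra grid lines.
    xs = sorted({x for x, _ in objects if x_lo <= x <= x_hi} | {x_lo, x_hi})
    # Horizontal lines: y-coordinates of objects inside Q's Y range.
    ys = sorted({y for _, y in objects if y_lo <= y <= y_hi} | {y_lo, y_hi})
    return list(product(xs, ys))


# Hypothetical toy data: an object outside Q still contributes a line
# when one of its coordinates falls inside Q's range.
objects = [(1, 5), (2, 9), (7, 1), (3, 3)]
q = (0, 0, 4, 4)
candidates = candidate_locations(objects, q)
print(len(candidates))
```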

Page 16: Progressive Computation of The Min-Dist  Optimal-Location Query

2. Limit #candidates

• Proof idea: suppose the OL is not at an intersection; moving it will produce a better (or equal) result.

• Consider RNN(l).

• Moving to the right saves total distance.

[Figure: location l, the objects in RNN(l), and a shift δ to the right.]

Page 17: Progressive Computation of The Min-Dist  Optimal-Location Query

2. VCU(Q)

• A spatial region enclosing the objects that are closer to Q than to the sites in S.

• It is the Voronoi cell of Q versus the sites in S.

Page 18: Progressive Computation of The Min-Dist  Optimal-Location Query

2. Further Limit #candidates

• Only consider objects in VCU(Q).

[Figure: 5x6=30 candidates.]

Page 20: Progressive Computation of The Min-Dist  Optimal-Location Query

2. Further Limit #candidates

• Only consider objects in VCU(Q).

[Figure: 4x4=16 candidates.]

Page 21: Progressive Computation of The Min-Dist  Optimal-Location Query

Naïve Algorithm

• Derive candidates.

• Compute AD(l) for each.

• Pick the smallest.

• Not efficient! Too many candidates! To compute AD(l) for each one, we need to:
– compute RNN(l)
– retrieve all these objects…
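The three steps of the naïve algorithm can be sketched as below. This is a brute-force illustration, not the authors' implementation: all names are hypothetical, Q's borders are added as grid lines by assumption, and AD(l) is evaluated by scanning every object, which is exactly the cost the slide warns about.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)


def l1(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def ad(l: Point, objects: List[Point], sites: List[Point]) -> float:
    """AD(l): average distance to the nearest site once l is added."""
    return sum(min(l1(o, s) for s in sites + [l]) for o in objects) / len(objects)


def naive_optimal_location(objects: List[Point], sites: List[Point],
                           q: Rect) -> Point:
    x_lo, y_lo, x_hi, y_hi = q
    # Step 1: derive candidates (grid-line intersections inside Q).
    xs = sorted({x for x, _ in objects if x_lo <= x <= x_hi} | {x_lo, x_hi})
    ys = sorted({y for _, y in objects if y_lo <= y <= y_hi} | {y_lo, y_hi})
    # Steps 2 and 3: compute AD(l) for each candidate and pick the smallest.
    return min(product(xs, ys), key=lambda l: ad(l, objects, sites))


# Hypothetical toy data.
objects = [(0, 0), (2, 0), (10, 0), (10, 4)]
sites = [(1, 0)]
best = naive_optimal_location(objects, sites, (8, 0, 12, 4))
print(best, ad(best, objects, sites))
```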

Page 22: Progressive Computation of The Min-Dist  Optimal-Location Query

Progressive Idea

• Treat Q as a cell and consider its corners.

Page 23: Progressive Computation of The Min-Dist  Optimal-Location Query

Progressive Idea

• Divide the cell.

Page 25: Progressive Computation of The Min-Dist  Optimal-Location Query

Progressive Idea

• Recursively divide a sub-cell.

Page 26: Progressive Computation of The Min-Dist  Optimal-Location Query

Progressive Idea

• Recursively divide a sub-cell.

• Able to check all candidates.

Page 27: Progressive Computation of The Min-Dist  Optimal-Location Query

Progressive Idea

• Q: What do you save?

• A: Cell pruning. A cell can be pruned if its lower bound ≥ AD(l0) for some candidate l0.

• Example: AD(l0) = 50. Suppose 60 is a lower bound for AD(l), ∀ l ∈ C. Then C can be pruned.

Page 28: Progressive Computation of The Min-Dist  Optimal-Location Query

3. LB(C): lower bound for AD(l), ∀ l ∈ C

[Figure: cell C with corners c1, c2, c3, c4, where AD(c1)=1000, AD(c2)=3000, AD(c3)=4000, AD(c4)=2500.]

Page 29: Progressive Computation of The Min-Dist  Optimal-Location Query

3. LB(C): lower bound for AD(l), ∀ l ∈ C

• Theorem:

$$\max\left\{ \frac{AD(c_1) + AD(c_4)}{2}, \frac{AD(c_2) + AD(c_3)}{2} \right\} - \frac{p}{4}$$

is a lower bound, where p is the perimeter of C.

• e.g. with AD(c1)=1000, AD(c2)=3000, AD(c3)=4000, AD(c4)=2500: LB(C) = 3500 − p/4.
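The lower-bound theorem reduces to a one-line computation once the four corner values are known. In this sketch the function name and the perimeter value of 400 are illustrative assumptions; the corner values are the ones shown on the slide.

```python
def lower_bound(ad_c1: float, ad_c2: float, ad_c3: float, ad_c4: float,
                perimeter: float) -> float:
    """LB(C) = max{(AD(c1)+AD(c4))/2, (AD(c2)+AD(c3))/2} - p/4."""
    # c1/c4 and c2/c3 are the two pairs of diagonally opposite corners.
    return max((ad_c1 + ad_c4) / 2, (ad_c2 + ad_c3) / 2) - perimeter / 4


# Corner values from the slide; the perimeter 400 is a hypothetical choice.
print(lower_bound(1000, 3000, 4000, 2500, perimeter=400))  # 3500 - 400/4
```

The pair (c2, c3) dominates here because (3000 + 4000)/2 = 3500 exceeds (1000 + 2500)/2 = 1750.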

Page 30: Progressive Computation of The Min-Dist  Optimal-Location Query

3. LB(C): lower bound for AD(l), ∀ l ∈ C

• A better lower bound. Theorem:

$$\max\left\{ \frac{AD(c_1) + AD(c_4)}{2}, \frac{AD(c_2) + AD(c_3)}{2} \right\} - \frac{p}{4} \cdot \frac{|VCU(C)|}{|O|}$$

• Compared with the previous lower bound:
– Higher quality, since the lower bound is larger.
– More computation.
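The tighter bound scales the subtracted term by the fraction |VCU(C)| / |O|. The sketch below reads |VCU(C)| as the number of objects that fall inside VCU(C); that reading, the function name, and the sample counts are assumptions rather than details given on the slide.

```python
def lower_bound_vcu(ad_c1: float, ad_c2: float, ad_c3: float, ad_c4: float,
                    perimeter: float, n_vcu: int, n_objects: int) -> float:
    """max{(AD(c1)+AD(c4))/2, (AD(c2)+AD(c3))/2} - (p/4) * |VCU(C)| / |O|."""
    diagonal_max = max((ad_c1 + ad_c4) / 2, (ad_c2 + ad_c3) / 2)
    # Assumption: n_vcu counts the objects inside VCU(C), so the ratio is <= 1.
    return diagonal_max - (perimeter / 4) * (n_vcu / n_objects)


# Hypothetical numbers: 25 of 100 objects lie in VCU(C), perimeter 400.
tight = lower_bound_vcu(1000, 3000, 4000, 2500, 400, n_vcu=25, n_objects=100)
loose = lower_bound_vcu(1000, 3000, 4000, 2500, 400, n_vcu=100, n_objects=100)
print(tight, loose)
```

When every object lies in VCU(C) the ratio is 1 and the expression coincides with the simpler bound; otherwise it is larger, at the price of having to count the objects in VCU(C).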

Page 31: Progressive Computation of The Min-Dist  Optimal-Location Query

4. The Progressive Algorithm

1. Maintain a heap of cells ordered by LB(). Initially one cell: Q.

2. Maintain the best candidate lopt.

3. Pick the cell with minimum LB() and partition it.

4. Compute AD() for the corners of sub-cells.

5. Compute LB() for the sub-cells.

6. Insert sub-cell ci into the heap if LB(ci) < AD(lopt).

7. Go to 3.
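The seven steps can be sketched as a best-first search over the candidate grid. This is a simplified illustration under stated assumptions, not the authors' implementation: cells are index ranges over the sorted grid lines, each cell is split into at most four sub-cells (no batch partitioning), AD() is computed by a linear scan rather than an R*-tree, Q's borders are added as grid lines, and the simpler p/4 lower bound is used. All names are hypothetical.

```python
import heapq
from itertools import product


def l1(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def progressive_ol(objects, sites, q):
    """Return (l_opt, AD(l_opt)) for query rectangle q = (x_lo, y_lo, x_hi, y_hi)."""
    x_lo, y_lo, x_hi, y_hi = q
    xs = sorted({x for x, _ in objects if x_lo <= x <= x_hi} | {x_lo, x_hi})
    ys = sorted({y for _, y in objects if y_lo <= y <= y_hi} | {y_lo, y_hi})
    dnn = [min(l1(o, s) for s in sites) for o in objects]  # pre-computed dNN(o, S)
    cache = {}                      # AD() of every corner evaluated so far
    best = [float("inf"), None]     # step 2: [AD(l_opt), l_opt]

    def ad(i, j):
        if (i, j) not in cache:
            l = (xs[i], ys[j])
            value = sum(min(d, l1(o, l)) for o, d in zip(objects, dnn)) / len(objects)
            cache[i, j] = value
            if value < best[0]:
                best[0], best[1] = value, l
        return cache[i, j]

    def lb(cell):
        i0, i1, j0, j1 = cell
        p = 2 * ((xs[i1] - xs[i0]) + (ys[j1] - ys[j0]))
        return max((ad(i0, j0) + ad(i1, j1)) / 2,
                   (ad(i0, j1) + ad(i1, j0)) / 2) - p / 4

    def split(lo, hi):
        if hi - lo < 2:             # no grid line strictly inside this range
            return [(lo, hi)]
        mid = (lo + hi) // 2
        return [(lo, mid), (mid, hi)]

    root = (0, len(xs) - 1, 0, len(ys) - 1)
    heap = [(lb(root), root)]       # step 1: heap ordered by LB(), initially Q
    while heap:
        bound, (i0, i1, j0, j1) = heapq.heappop(heap)   # step 3
        if bound >= best[0]:
            break                   # no remaining cell can beat l_opt
        for (a0, a1), (b0, b1) in product(split(i0, i1), split(j0, j1)):
            sub = (a0, a1, b0, b1)
            sub_bound = lb(sub)     # steps 4 and 5: corner ADs, then LB
            has_inner_candidates = a1 - a0 >= 2 or b1 - b0 >= 2
            if has_inner_candidates and sub_bound < best[0]:
                heapq.heappush(heap, (sub_bound, sub))  # step 6
    return best[1], best[0]


# Hypothetical toy data.
objects = [(0, 0), (2, 0), (10, 0), (10, 4)]
sites = [(1, 0)]
print(progressive_ol(objects, sites, (8, 0, 12, 4)))
```

At any point during the loop, `best[0]` and the smallest bound still in the heap bracket the true optimum, which is what allows a caller to stop early with a bounded error.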

Page 32: Progressive Computation of The Min-Dist  Optimal-Location Query

Progressiveness

• The algorithm quickly reports a candidate OL with a confidence interval, and keeps refining.

[Figure: confidence interval over Time, bounded above by AD(best corner of Q) and below by LB(Q); AD(real OL) is inside the interval.]

Page 33: Progressive Computation of The Min-Dist  Optimal-Location Query

Progressiveness

• The algorithm quickly reports a candidate OL with a confidence interval, and keeps refining.

[Figure: confidence interval over Time, bounded above by AD(best candidate) and below by LB(Q); AD(real OL) is inside the interval.]

Page 34: Progressive Computation of The Min-Dist  Optimal-Location Query

Progressiveness

• The algorithm quickly reports a candidate OL with a confidence interval, and keeps refining.

[Figure: confidence interval over Time, bounded above by AD(best candidate) and below by Min{ LB(C) | C in heap }; AD(real OL) is inside the interval.]

• User may choose to terminate at any time.

Page 35: Progressive Computation of The Min-Dist  Optimal-Location Query

Batch Partitioning

• When partitioning a cell, partition it into multiple sub-cells at once.

• Reason: computing AD(l) requires accessing the R*-tree of objects, so each access should be used to compute multiple AD(l) values.

• Tradeoff: partitioning too much is wasteful, since some candidates could have been pruned.

Page 36: Progressive Computation of The Min-Dist  Optimal-Location Query

Performance Setup

• O: 123,593 postal addresses in the northeastern part of the US, stored in an R*-tree.

• S: 100 sites randomly selected from O.

• Buffer: 128 pages.

• Dell Pentium IV 3.2GHz.

• Query size: 1% in each dimension.

Page 38: Progressive Computation of The Min-Dist  Optimal-Location Query

Effect of VCU Computation

Page 41: Progressive Computation of The Min-Dist  Optimal-Location Query

Comparison of Lower Bounds

Page 42: Progressive Computation of The Min-Dist  Optimal-Location Query

Effect of Batch Partitioning

Page 44: Progressive Computation of The Min-Dist  Optimal-Location Query

Progressiveness

• Each step: partition a cell into 40 sub-cells.

• After 200 steps: accurate answer.

• After 20 steps: answer is 1% away from optimal.

Page 45: Progressive Computation of The Min-Dist  Optimal-Location Query

Conclusions

• Introduced the min-dist optimal-location query.

• Proved theorems to limit the number of candidates.

• Presented lower-bound estimators.

• Proposed a progressive algorithm.