Caching Dynamic Skyline Queries

50
July 11 SSDBM'08 Caching Dynamic Skyline Queries D. Sacharidis 1 , P. Bouros 1 , T. Sellis 1,2 1 National Technical University of Athens 2 Institute for Management of Information Systems – R.C. Athena

description

Caching Dynamic Skyline Queries. D. Sacharidis 1 , P. Bouros 1 , T. Sellis 1,2 1 National Technical University of Athens 2 Institute for Management of Information Systems – R.C. Athena. Outline. Introduction Skyline (SL) and dynamic skyline queries (DSL) Related work - PowerPoint PPT Presentation

Transcript of Caching Dynamic Skyline Queries

Page 1: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Caching Dynamic Skyline Queries

D. Sacharidis1, P. Bouros1, T. Sellis1,2

1National Technical University of Athens2Institute for Management of Information Systems – R.C. Athena

Page 2: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Outline

• Introduction– Skyline (SL) and dynamic skyline queries (DSL)

• Related work• Evaluating dynamic skyline queries

– Computing orthant skylines (OSL)– Computing dynamic skyline via caching

• LRU, LFU, LPP cache replacement policies

• Experimental evaluation• Conclusions and Future work

Page 3: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Skyline queries (SL)

• Given a dataset of d-dimensional points– SL contains points not

dominated by others– x dominates y iff x as

good as y in all dimensions and strictly better in at least one

Page 4: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Skyline queries (SL)

• Given a dataset of d-dimensional points– SL contains points not

dominated by others– x dominates y iff x as

good as y in all dimensions and strictly better in at least one

• Example– Dataset of hotels– Prefer cheap hotels

close to the sea

Distance from sea

Pric

e

Page 5: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Skyline queries (SL)

• Given a dataset of d-dimensional points– SL contains points not

dominated by others– x dominates y iff x as

good as y in all dimensions and strictly better in at least one

• Example– Dataset of hotels– Prefer cheap hotels

close to the sea

Distance from sea

Pric

e

Skyline points

Page 6: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Skyline queries (SL)

• Given a dataset of d-dimensional points– SL contains points not

dominated by others– x dominates y iff x as

good as y in all dimensions and strictly better in at least one

• Example– Dataset of hotels– Prefer cheap hotels

close to the sea

Distance from sea

Pric

e

Skyline points

p1

Page 7: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Skyline queries (SL)

• Given a dataset of d-dimensional points– SL contains points not

dominated by others– x dominates y iff x as

good as y in all dimensions and strictly better in at least one

• Example– Dataset of hotels– Prefer cheap hotels

close to the sea

Distance from sea

Pric

e

Skyline points

p1p2

Page 8: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Dynamic skyline queries (DSL)

• Extension of skyline queries– Given a query point q– DSL contains points not

dynamically dominated by others w.r.t q

– x dynamically dominates y iff x as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one

• Can be treated as static SL– Transform points w.r.t. q

Page 9: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Dynamic skyline queries (DSL)

• Extension of skyline queries– Given a query point q– DSL contains points not

dynamically dominated by others w.r.t q

– x dynamically dominates y iff x as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one

• Can be treated as static SL– Transform points w.r.t. q

• Example– User defines “ideal”

hotel q

Distance from sea

Pric

e

Query point q

Page 10: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Dynamic skyline queries (DSL)

• Extension of skyline queries– Given a query point q– DSL contains points not

dynamically dominated by others w.r.t q

– x dynamically dominates y iff x as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one

• Can be treated as static SL– Transform points w.r.t. q

• Example– User defines “ideal”

hotel q

Distance from sea

Pric

e

Dynamic Skyline points

q

Page 11: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Dynamic skyline queries (DSL)

• Extension of skyline queries– Given a query point q– DSL contains points not

dynamically dominated by others w.r.t q

– x dynamically dominates y iff x as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one

• Can be treated as static SL– Transform points w.r.t. q

• Example– User defines “ideal”

hotel q

Distance from sea

Pric

e

Dynamic Skyline points

q

p4

p5

Page 12: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Intuition (1)

• Traditional SL algorithms need to run anew for each DSL query

• Our idea– Exploit results from past queries to reduce

processing cost for future DSL queries– Cache past queries– Decide which queries in cache are useful

Page 13: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Intuition (2)

Distance from sea

Pric

e

Page 14: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Intuition (2)

• 2 past DSL queries– qa, qb

• Each query partitions space in 4 quadrants

Distance from sea

Pric

e

qa

qb

Page 15: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Intuition (3)

• A new query q arrives• Consider DSL for qa

– p1 is contained DSL(qa)– p1 dominates p2, p3, p4

• p1 lies in upper right quadrant w.r.t. qa

• qa lies in upper right quadrant w.r.t. q

• p1 dominates also p2, p3, p4 w.r.t. to q– Exclude p2, p3, p4 from

dominance test for DSL(q)

Distance from sea

Pric

e

qa

qb

p1

p2

p3

p4

q

• Shaded area denotes points dominated by p1

Page 16: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Contribution in brief

• Caching past DSL queries cannot reduce processing cost for future ones– We need more information about dominance

relationships• Introduce orthant skylines (OSL) and examine

their relationship with DSL • Extend Bitmap algorithm to compute OSL in

parallel with DSL• Cache OSL to enhance DSL queries evaluation

– Present 3 cache replacement policies• LRU, LFU, LPP

• Experimental evaluation of caching mechanism

Page 17: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Related work

• Non-indexed methods– Block-Nested Loops (BnL)– Bitmap – Multidimensional Divide and Conquer (DnC)– Sort First Scan (SFS)

• Index-based methods– B-tree

• sort points according to the lowest valued coordinate

– R-tree• Nearest neighbor based (NN)• Branch and bound (BBS)

Page 18: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Related work

• Non-indexed methods– Block-Nested Loops (BnL)– Bitmap – Multidimensional Divide and Conquer (DnC)– Sort First Scan (SFS)

• Index-based methods– B-tree

• sort points according to the lowest valued coordinate

– R-tree• Nearest neighbor based (NN)• Branch and bound (BBS)

Page 19: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Bitmap

• BnL variant• Suitable for domains with low cardinality and

discrete• In brief

– Computes a bitmap representation of the points in the dataset

– Examines each point separately (dominance test)• Checks whether it is contained in the skyline or not• Exploits fast bitwise operations OR/AND

Page 20: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Bitmap – Dominance test

• For each point p – Define A = A1 & A2 & … & Ad

• Denotes the points as good as p in all dimensions

– Define B = B1 | B2 | … | Bd

• Denotes the points strictly better than p in at least one dimension

– Dominance test:• If C = A & B has all bits set to 0 then p is in SL

Page 21: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Orthant skyline (OSL)

• OSL provides more information about dominance relationships than DSL– Useful for pruning

• Given a dataset of d-dimensional points and a query point q– Space partitioned in 2d

orthants– o-th orthant skyline (OSL) of q

contains points of the o-th orthant not dynamically dominated by others inside orthant o w.r.t q

Page 22: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Orthant skyline (OSL)

• OSL provides more information about dominance relationships than DSL– Useful for pruning

• Given a dataset of d-dimensional points and a query point q– Space partitioned in 2d

orthants– o-th orthant skyline (OSL) of q

contains points of the o-th orthant not dynamically dominated by others inside orthant o w.r.t q

Quadrant 2Distance from sea

Pric

e

Quadrant 0

Quadrant 3

Quadrant 1

Query point q

Page 23: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Orthant skyline (OSL)

• OSL provides more information about dominance relationships than DSL– Useful for pruning

• Given a dataset of d-dimensional points and a query point q– Space partitioned in 2d

orthants– o-th orthant skyline (OSL) of q

contains points of the o-th orthant not dynamically dominated by others inside orthant o w.r.t q

Quadrant 2Distance from sea

Pric

e

Quadrant 0

Quadrant 3

Quadrant 1

Query point q

Page 24: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Orthant skyline (OSL)

• OSL provides more information about dominance relationships than DSL– Useful for pruning

• Given a dataset of d-dimensional points and a query point q– Space partitioned in 2d

orthants– o-th orthant skyline (OSL) of q

contains points of the o-th orthant not dynamically dominated by others inside orthant o w.r.t q

Quadrant 2Distance from sea

Pric

e

Quadrant 0

Quadrant 3

Quadrant 1

Query point q

Quadrant 2 skyline points

Page 25: Caching Dynamic Skyline Queries

July 11 SSDBM'08

OSL and DSL relationship

Quadrant 2Distance from sea

Pric

e

Quadrant 0

Quadrant 3

Quadrant 1

q

Page 26: Caching Dynamic Skyline Queries

July 11 SSDBM'08

OSL and DSL relationship

Quadrant 2Distance from sea

Pric

e

Quadrant 0

Quadrant 3

Quadrant 1

q

Page 27: Caching Dynamic Skyline Queries

July 11 SSDBM'08

OSL and DSL relationship

• Map points from quadrants 1,2,3 to points inside quadrant 0

Quadrant 2Distance from sea

Pric

e

Quadrant 0

Quadrant 3

Quadrant 1

q

Page 28: Caching Dynamic Skyline Queries

July 11 SSDBM'08

OSL and DSL relationship

• Map points from quadrants 1,2,3 to points inside quadrant 0

• Compute DSL w.r.t. q

original pointmapped point

dynamic skylinepointquadrant skylinepoint

Quadrant 2Distance from sea

Pric

e

Quadrant 0

Quadrant 3

Quadrant 1

q

Page 29: Caching Dynamic Skyline Queries

July 11 SSDBM'08

OSL and DSL relationship

• Map points from quadrants 1,2,3 to points inside quadrant 0

• Compute DSL w.r.t. q• Union of all OSLs is

superset of DSL w.r.t. to q

original pointmapped point

dynamic skylinepointquadrant skylinepoint

Quadrant 2Distance from sea

Pric

e

Quadrant 0

Quadrant 3

Quadrant 1

q

Page 30: Caching Dynamic Skyline Queries

July 11 SSDBM'08

OSL and DSL relationship

• Map points from quadrants 1,2,3 to points inside quadrant 0

• Compute DSL w.r.t. q• Union of all OSLs is

superset of DSL w.r.t. to q

original pointmapped point

dynamic skylinepointquadrant skylinepoint

Quadrant 2Distance from sea

Pric

e

Quadrant 0

Quadrant 3

Quadrant 1

q

p1

p2

Page 31: Caching Dynamic Skyline Queries

July 11 SSDBM'08

OSL and DSL relationship

• Map points from quadrants 1,2,3 to points inside quadrant 0

• Compute DSL w.r.t. q• Union of all OSLs is

superset of DSL w.r.t. to q

original pointmapped point

dynamic skylinepointquadrant skylinepoint

Quadrant 2Distance from sea

Pric

e

Quadrant 0

Quadrant 3

Quadrant 1

q

p3

p2

Page 32: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Computing orthant skylines

• Algorithm DBM– Extends Bitmap to compute DSL and OSLs at

the same time

• Method:– Compute bitmap representation

• Transform each point coordinates w.r.t. to query q

– Dominance test, point p, orthant o• p not in OSLo and not in DSL• p not in DSL, but in OSLo

• p in DSL and in OSLo

Page 33: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Dynamic skylines Via Caching

• Cache OSLs instead of DSLs– Query cache contains (query point qj, OSLs)– OSLs encode by bitmaps

• Algorithm cDBM– OSL contains information about dominance test inside orthant– Discard points inside orthants from dominance tests

• Method:– Compute bitmap representation– For each point p consider its position (orthant) w.r.t. to cache

queries qj

– If p in the same orthant o w.r.t qj as qj w.r.t. q and p not in OSLo(qj) then exclude it from OSLo(q), DSL(q)

Page 34: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Cache Replacement Policies

• General idea– Limited cache space– Identify least useful query point in cache– Replace it with new one

Page 35: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Usage-based policies

• Only a few queries in cache are useful

• Log cache query usage• Given a new query q

– Consider as input the query point cache Q

– Only query points in OSL of Q w.r.t. q are useful

– Update cache - remove:• Least Recently Used

(LRU) query point• Least Frequently Used

(LFU) query point

Page 36: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Usage-based policies

• Only a few queries in cache are useful

• Log cache query usage• Given a new query q

– Consider as input the query point cache Q

– Only query points in OSL of Q w.r.t. q are useful

– Update cache - remove:• Least Recently Used

(LRU) query point• Least Frequently Used

(LFU) query point

Distance from sea

Pric

e

qa

qbqc

qd

q

Page 37: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Usage-based policies

• Only a few queries in cache are useful

• Log cache query usage• Given a new query q

– Consider as input the query point cache Q

– Only query points in OSL of Q w.r.t. q are useful

– Update cache - remove:• Least Recently Used

(LRU) query point• Least Frequently Used

(LFU) query point

Distance from sea

Pric

e

qa

qbqc

qd

q

Redundant queries

Page 38: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Usage-based policies

• Only a few queries in cache are useful

• Log cache query usage• Given a new query q

– Consider as input the query points in cache Q

– Only query points in OSL of Q w.r.t. q are useful

– Update cache - remove:• Least Recently Used

(LRU) query point• Least Frequently Used

(LFU) query point

Distance from sea

Pric

e

qa

qbqc

qd

q

Redundant queries

Page 39: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Usage-based policies

• Only a few queries in cache are useful

• Log cache query usage• Given a new query q

– Consider as input the query points in cache Q

– Only query points in OSL of Q w.r.t. q are useful

– Update cache - remove:• Least Recently Used

(LRU) query point• Least Frequently Used

(LFU) query point

Distance from sea

Pric

e

qa

qbqc

qd

q

Redundant queries

Page 40: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Pruning power-based policy

• Usage-based policies do not indicate usefulness

• Useful cached query– Great pruning power

• Probability that a query can prune points of dataset from DSL computation

– Depends on• Points dominated by query in

an orthant j• Points contained in the

antisymetric orthant of j

• Update cache – remove– Query point with less pruning

power (LPP)

Distance from sea

Pric

e

qa

q

Page 41: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Pruning power-based policy

• Usage-based policies do not indicate usefulness

• Useful cached query– Great pruning power

• Probability that a query can prune points of dataset from DSL computation

– Depends on• Points dominated by query in

an orthant j• Points contained in the

antisymetric orthant of j

• Update cache – remove– Query point with less pruning

power (LPP)

Distance from sea

Pric

e

qa

q

2:24

5:24

3:24

74:24

Page 42: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Pruning power-based policy

• Usage-based policies do not indicate usefulness

• Useful cached query– Great pruning power

• Probability that a query can prune points of dataset from DSL computation

– Depends on• Points dominated by query in

an orthant j• Points contained in the

antisymetric orthant of j

• Update cache – remove– Query point with less pruning

power (LPP)

Distance from sea

Pric

e

qa

q

2:34

5:74

3:44

74:884

Page 43: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Pruning power-based policy

• Usage-based policies do not indicate usefulness

• Useful cached query– Great pruning power

• Probability that a query can prune points of dataset from DSL computation

– Depends on• Points dominated by query in

an orthant j• Points contained in the

antisymetric orthant of j

• Update cache – remove– Query point with less pruning

power (LPP)

Distance from sea

Pric

e

qa

q

2:3176

5:74

3:44

74:884

Page 44: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Pruning power-based policy

• Usage-based policies do not indicate usefulness

• Useful cached query– Great pruning power

• Probability that a query can prune points of dataset from DSL computation

– Depends on• Points dominated by query in

an orthant j• Points contained in the

antisymetric orthant of j

• Update cache – remove– Query point with less pruning

power (LPP)

Distance from sea

Pric

e

qa

q

2:3176

5:720

3:421

74:88222

Page 45: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Pruning power-based policy

• Usage-based policies do not indicate usefulness

• Useful cached query– Great pruning power

• Probability that a query can prune points of dataset from DSL computation

– Depends on• Points dominated by query in

an orthant j• Points contained in the

antisymetric orthant of j

• Update cache – remove– Query point with less pruning

power (LPP)

Distance from sea

Pric

e

qa

q

2:3176

5:720

3:421

74:88222

Page 46: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Experimental Evaluation• Synthetic datasets

– Distribution types• Independent, correlated, anti-correlated

– Number of points N• 10k, 20k, 50k, 100k,

– Dimensionality• d = {2,3,4,5,6}

– Domain size for dimension• |D| = {10,20,50}

• Compare– Bitmap (NO-CACHE)– cDBM with LFU,LRU,LPP cache replacement policies– Query cache

• |Q| = {10,20,30,40,50} past query points• Cache size is |Q|*N bits uncompressed

Page 47: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Varying query cache size

• Dataset: N = 50k points, with d = 4 dimensions of |D| = 20 domain size

• LFU,LRU cache queries not representative for future ones

• LPP caches queries with great pruning power

Anti-correlatedIndependent

Page 48: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Effect of distribution parameters

• Relative improvement in running time over NO-CACHE• Vary number of points N

– d = 4 dimensions of |D| = 20 domain size• Vary number of dimensions d

– N = 50k, |D| = 20

Correlated vary dCorrelated vary N

Page 49: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Conclusions and Future work

• Conclusions– Introduced orthant skylines (OSLs) and discussed its

relationship with DSL– Extended Bitmap to compute OSLs and DSL at the

same time (DBM algorithm)– Proposed caching mechanism of OSLs to reduce cost

for future DSL queries• LRU, LFU, LPP cache replacement policies

– Experimentally verified the efficiency of caching mechanism

• Future work– Apply caching mechanism to index-based methods– Further increase pruning power of cached queries

Page 50: Caching Dynamic Skyline Queries

July 11 SSDBM'08

Questions ?