Skyline Snippets
Markus Endres and Werner Kießling
Outline
1. Skyline and Preference Queries
2. Skyline Snippets
3. Performance Benchmarks
4. Summary and Outlook
1. Skyline Queries
Skyline Queries and Pareto Preferences
[Figure: scatter plot of the beverages Drink 1 to Drink 9; axes: Cal [mg] and Fat [mg].]
Beverages with lowest calories and lowest fat?
Literature:
• On Finding the Maxima of a Set of Vectors (Kung et al., 1975)
• The Skyline Operator (Börzsönyi et al., 2001)
• Foundations of Preferences in Database Systems (Kießling, 2002)
Skyline / Preference SQL query
SELECT *
FROM Beverage B
PREFERRING B.cal LOWEST AND B.fat LOWEST
Skyline Queries: Motivation
‣ Skyline results become large for
• high dimensionality (dimensions up to 10 are not uncommon)
• large database relations
‣ Computing the full Skyline is time and memory consuming
‣ In many applications a fraction of the full Skyline is sufficient, e.g. Web services, mobile Internet
‣ State of the art:
• Full Skyline: BNL, LESS, Hexagon / Lattice Skyline, ... Algorithms with and without indexes.
• Progressive Skyline: BBS, Bitmap, PDS, ... Highly specialized indexes necessary.
Skyline Queries: Preference Background (Kießling)
‣ Skyline queries are a subset of Pareto preference queries
‣ Preference: a strict partial order $<_P$ on dom(A); $x <_P y$ means: I like y more than x
‣ Preference selection of a preference P (Skyline / BMO-set / Winnow):
$\sigma[P](R) := \{ t \in R \mid \neg\exists t' \in R : t <_P t' \}$
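A minimal sketch of the preference selection above, assuming the preference is passed in as a predicate. The names `preference_selection` and `worse` and the calorie values are illustrative assumptions, not part of Preference SQL.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def preference_selection(relation: Iterable[T],
                         worse: Callable[[T, T], bool]) -> List[T]:
    """Return sigma[P](R): every t in R for which no t2 in R has t <_P t2.

    worse(x, y) encodes x <_P y, i.e. y is liked more than x.
    """
    rows = list(relation)
    return [t for t in rows if not any(worse(t, other) for other in rows)]


# Hypothetical single-attribute data: lower calories are preferred.
calories = [12, 5, 20, 5]
best = preference_selection(calories, lambda x, y: y < x)
print(best)  # [5, 5]
```

The naive nested loop needs a quadratic number of comparisons; it only illustrates the definition and is not one of the algorithms named on the slides.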
‣ Weak Order Preference (WOP): dominance test by a numerical utility function, which depends on the type of preference
$f_P : \mathrm{dom}(A) \to \mathbb{R}_0^+$
$x <_P y \iff f_P(x) > f_P(y)$
‣ Base preference constructors: LOWEST, HIGHEST, POS, NEG, ...
• $P := \mathrm{LOWEST}_d(A)$: $f_P(x) := \lceil (x - \min) / d \rceil$
• $P := \mathrm{HIGHEST}_d(A)$: $f_P(x) := \lceil (\max - x) / d \rceil$
The d-parameter allows the partitioning of the range of domain values
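A small sketch of the two utility functions, assuming the rounding brackets in the formulas are a ceiling and that min and max are the bounds of the attribute domain. The function names and the sample domain 0 to 100 are assumptions for illustration.

```python
import math


def lowest_d(x: float, minimum: float, d: float) -> int:
    """Utility for LOWEST_d: smaller x gives a smaller (better) value."""
    return math.ceil((x - minimum) / d)


def highest_d(x: float, maximum: float, d: float) -> int:
    """Utility for HIGHEST_d: larger x gives a smaller (better) value."""
    return math.ceil((maximum - x) / d)


# Hypothetical domain 0..100 with d = 10: values fall into classes of width d.
for x in (0, 7, 10, 11):
    print(x, lowest_d(x, minimum=0, d=10))
```

With d = 10 the values 7 and 10 receive the same utility, so neither is preferred to the other; a larger d yields fewer, coarser classes.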
‣ Complex preference constructors, e.g. Pareto (Skyline)
For weak order preferences $P_1 = (A_1, <_{P_1}), \ldots, P_m = (A_m, <_{P_m})$, a Pareto preference is defined as
$P := \otimes(P_1, \ldots, P_m) = (A_1 \times \cdots \times A_m, <_P)$
$(x_1, \ldots, x_m) <_P (y_1, \ldots, y_m) \iff \exists i \in \{1, \ldots, m\} : f_{P_i}(x_i) > f_{P_i}(y_i) \;\wedge\; \forall j \in \{1, \ldots, m\}, j \neq i : f_{P_j}(x_j) \geq f_{P_j}(y_j)$
A tuple is said to dominate another tuple if it is better in at least one dimension and not worse in any other dimension.
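The dominance test can be sketched as follows, assuming each tuple has already been mapped to its vector of utility values, where smaller values are better. The name `pareto_worse` is an assumption for illustration.

```python
from typing import Sequence


def pareto_worse(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """True if x <_P y, given the utility vectors fx and fy (smaller is better).

    y must be strictly better in at least one dimension
    and not worse in any other dimension.
    """
    not_worse_anywhere = all(b <= a for a, b in zip(fx, fy))
    strictly_better_somewhere = any(b < a for a, b in zip(fx, fy))
    return not_worse_anywhere and strictly_better_somewhere


print(pareto_worse((1, 2), (1, 1)))  # True: y is better in the second dimension
print(pareto_worse((1, 1), (1, 1)))  # False: equal vectors do not dominate
print(pareto_worse((0, 2), (1, 1)))  # False: the tuples are incomparable
```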
Skyline Queries: Preference Constructor - An Overview
‣ Taxonomy of base preference constructors
[Figure: taxonomy diagram with the nodes POS, NEG, LOWESTd, HIGHESTd, EXPLICIT, POS/POS, POS/NEG, AROUNDd, LAYEREDm, BETWEENd, SCOREd, CONTAINS, GEO PREFERENCE, NEARBYd, WITHINd, BUFFERd, ONROUTEd.]
‣ Complex preference constructors
• Equal importance: Pareto
• More important: Prioritization
• Weighted importance: Rank, ...
Skyline Queries: High Dimensional Preference Query
SELECT r.id, r.name
FROM restaurant r, city_map c
PREFERRING
  c.location NEARBY <lat>, <lon>, 1000 AND
  c.ascent LESS THAN 200, 20 AND
  r.cuisine IN ('Italian', 'Mexican') NOT IN ('German') AND
  r.priceCategory NOT IN ('Expensive', 'Luxury') AND
  r.rating BETWEEN '2star' AND '3star' AND
  r.ambient IN ('pleasant') AND
  r.waitingTime LOWEST AND
  r.customerFriendly HIGHEST
A demo of Preference SQL is available at www.trial.PreferenceSQL.com
2. Skyline Snippets
Skyline Snippets are a general method to compute a fraction of the full Skyline without any index structure.
2.1 Pareto k-partition
Skyline Queries: Sub-Preferences
‣ Sub-preference: a lower-dimensional Pareto preference (similar to the concept of subspace Skylines)
‣ Example: sample sub-preferences of $P := \otimes(P_1, P_2, P_3)$
• $P^{\{P_1,P_2\}} := \otimes(P_1, P_2)$
• $P^{\{P_1,P_3\}} := \otimes(P_1, P_3)$
• $P^{\{P_2,P_3\}} := \otimes(P_2, P_3)$
Skyline Snippets: Pareto k-Partition
‣ A k-partition of $P := \otimes(P_1, \ldots, P_m)$ is a decomposition of P into k disjoint Pareto sub-preferences $P^{I_1}, \ldots, P^{I_k}$ such that
$\otimes(P_1, \ldots, P_m) = \otimes(P^{I_1}, \ldots, P^{I_k})$
‣ Example: a few partitions of $P = \otimes(P_1, \ldots, P_4)$ for k = 2 are
• $P = \otimes(P^{\{P_1,P_2\}}, P^{\{P_3,P_4\}})$
• $P = \otimes(P^{\{P_1,P_3\}}, P^{\{P_2,P_4\}})$
• $P = \otimes(P^{\{P_1\}}, P^{\{P_2,P_3,P_4\}})$
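To see how many k-partitions exist, the following sketch enumerates all decompositions of a list of base preferences into exactly k non-empty, disjoint blocks. The function name `k_partitions` and the string labels are assumptions for illustration; the slides do not prescribe how partitions are generated.

```python
from typing import Iterator, List, Sequence


def k_partitions(items: Sequence[str], k: int) -> Iterator[List[List[str]]]:
    """Yield every partition of items into exactly k non-empty blocks."""
    if not items:
        if k == 0:
            yield []
        return
    if k == 0 or len(items) < k:
        return
    first, rest = items[0], list(items[1:])
    # Case 1: the first item forms a block of its own.
    for partition in k_partitions(rest, k - 1):
        yield [[first]] + partition
    # Case 2: the first item joins one block of a k-partition of the rest.
    for partition in k_partitions(rest, k):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


partitions = list(k_partitions(["P1", "P2", "P3", "P4"], 2))
print(len(partitions))  # 7
for partition in partitions:
    print(partition)
```

For m = 4 and k = 2 there are 7 partitions, among them the three listed above. The count grows quickly with m, which is one reason a heuristic for choosing a partition is of interest.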
2.2 The Skyline Snippets Algorithm
‣ The Skyline Snippets Theorem
• Given a Pareto preference $P := \otimes(P_1, \ldots, P_m)$ and a k-partition $\otimes(P^{I_1}, \ldots, P^{I_k})$
• Let $S := \sigma[P](R)$ be the Skyline on a relation R.
1. Let $S_k = \bigcup_{i=1}^{k} \sigma[P^{I_i}](R)$, then
• $\sigma[P](S_k) \neq \emptyset$
• $\sigma[P](S_k) \subseteq S$
$\sigma[P](S_k)$ is called a k-snippet of the Skyline S.
2. Let $L_k = \bigcap_{i=1}^{k} \sigma[P^{I_i}](R)$. If $L_k \neq \emptyset$, then $L_k \subseteq S$.
Skyline Snippets: Example
‣ Example 1: $P := \otimes(P_1, \ldots, P_4)$, only LOWEST preferences on R
Table 1: Sample data set R.
ID  A1  A2  A3  A4
t1  0   0   1   0
t2  0   1   0   0
t3  1   0   0   1
t4  0   0   1   1
t5  1   0   1   0
• The Skyline is $S := \{t_1, t_2, t_3\}$
• 2-partition:
– $\sigma[P^{\{P_1,P_2\}}](R) = \{t_1, t_4\}$
– $\sigma[P^{\{P_3,P_4\}}](R) = \{t_2\}$
• The 2-snippet is: $\sigma[P](\{t_1, t_4\} \cup \{t_2\}) = \{t_1, t_2\} \subseteq S$
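Example 1 can be retraced with a few lines of code. This is a naive sketch on the sample data set, assuming 0-based attribute indexes and LOWEST preferences on every attribute; the helper name `skyline` is an assumption.

```python
# Sample data set R from Table 1: ID -> (A1, A2, A3, A4).
R = {
    "t1": (0, 0, 1, 0),
    "t2": (0, 1, 0, 0),
    "t3": (1, 0, 0, 1),
    "t4": (0, 0, 1, 1),
    "t5": (1, 0, 1, 0),
}


def skyline(ids, dims):
    """IDs in `ids` that no other ID in `ids` dominates on the attributes `dims`."""
    def worse(a, b):  # a <_P b: b is nowhere larger and not identical on dims
        va = [R[a][i] for i in dims]
        vb = [R[b][i] for i in dims]
        return va != vb and all(y <= x for x, y in zip(va, vb))
    return {a for a in ids if not any(worse(a, b) for b in ids)}


all_dims = (0, 1, 2, 3)
S = skyline(R.keys(), all_dims)               # full Skyline
part_1 = skyline(R.keys(), (0, 1))            # sub-preference on A1, A2
part_2 = skyline(R.keys(), (2, 3))            # sub-preference on A3, A4
snippet = skyline(part_1 | part_2, all_dims)  # 2-snippet

print(sorted(S))        # ['t1', 't2', 't3']
print(sorted(part_1))   # ['t1', 't4']
print(sorted(part_2))   # ['t2']
print(sorted(snippet))  # ['t1', 't2']
```

The final dominance test runs on three candidates instead of five tuples, and it removes t4, which is optimal on A1 and A2 but dominated by t1 in the full preference.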
‣ Example 2: $P := \otimes(P_1, \ldots, P_4)$, only LOWEST preferences on R (sample data set of Table 1)
• The Skyline is $S := \{t_1, t_2, t_3\}$
• 2-partition:
– $\sigma[P^{\{P_1\}}](R) = \{t_1, t_2, t_4\}$
– $\sigma[P^{\{P_2,P_3,P_4\}}](R) = \{t_1, t_2, t_3, t_5\}$
• $L_k = \{t_1, t_2, t_4\} \cap \{t_1, t_2, t_3, t_5\} \neq \emptyset \Rightarrow L_k = \{t_1, t_2\} \subseteq S$
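The intersection variant can be checked the same way. The sketch below assumes the sample data set, 0-based attribute indexes and LOWEST preferences; `sub_skyline` and `l_k` are illustrative names.

```python
# Sample data set R from Table 1: ID -> (A1, A2, A3, A4).
R = {
    "t1": (0, 0, 1, 0),
    "t2": (0, 1, 0, 0),
    "t3": (1, 0, 0, 1),
    "t4": (0, 0, 1, 1),
    "t5": (1, 0, 1, 0),
}


def sub_skyline(dims):
    """IDs of R that are undominated on the attribute indexes `dims`."""
    def worse(a, b):  # a <_P b on dims
        va = [R[a][i] for i in dims]
        vb = [R[b][i] for i in dims]
        return va != vb and all(y <= x for x, y in zip(va, vb))
    return {a for a in R if not any(worse(a, b) for b in R)}


def l_k(partition):
    """Intersection of the sub-preference Skylines of a k-partition."""
    return set.intersection(*(sub_skyline(block) for block in partition))


print(sorted(l_k([(0,), (1, 2, 3)])))  # ['t1', 't2']
print(sorted(l_k([(0, 1), (2, 3)])))   # []
```

The second call uses the partition of Example 1 and returns the empty set, which shows why part 2 of the theorem is conditional: the intersection needs no final dominance test, but it may yield nothing.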
Skyline Snippets: The Skyline Snippets Algorithm (SSA)
[Listing: pseudocode of the SSA, not included in this transcript.]
Note: Line 4 can be done in parallel on multi-core architectures.
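Since the listing itself is missing here, the following is only a sketch derived from part 1 of the Skyline Snippets Theorem, not the authors' pseudocode. It assumes that the parallel step is the evaluation of the sub-preference Skylines; the names `skyline` and `ssa` and the naive dominance loop are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def skyline(rows, dims):
    """Rows not dominated on the attribute indexes `dims` (LOWEST preferences)."""
    rows = list(rows)

    def worse(a, b):  # a <_P b on dims
        va = [a[i] for i in dims]
        vb = [b[i] for i in dims]
        return va != vb and all(y <= x for x, y in zip(va, vb))

    return {a for a in rows if not any(worse(a, b) for b in rows)}


def ssa(rows, partition):
    """Compute a k-snippet: Skyline of the union of the sub-preference Skylines."""
    all_dims = sorted(i for block in partition for i in block)
    # The sub-preference Skylines are independent and can run concurrently.
    with ThreadPoolExecutor() as pool:
        sub_skylines = list(pool.map(partial(skyline, rows), partition))
    candidates = set().union(*sub_skylines)
    return skyline(candidates, all_dims)


rows = [(0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 1), (0, 0, 1, 1), (1, 0, 1, 0)]
print(sorted(ssa(rows, [(0, 1), (2, 3)])))  # [(0, 0, 1, 0), (0, 1, 0, 0)]
```

Threads keep the sketch short; in standard CPython a process pool would be needed for real multi-core speedup of CPU-bound dominance tests.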
3. Performance Benchmarks
‣ SSA vs.
• Hexagon (Lattice Skyline); Preisinger, Kießling: The Hexagon Algorithm for Pareto Preference Queries (2007)
• Progressive Hexagon
‣ Implementation in Preference SQL
• Java framework for preference queries on conventional databases
• Oracle 11g database
‣ Experiments
• Synthetic data sets: ANTI, COR, IND (data generator, Börzsönyi 2001)
• Vary data cardinality, number of distinct values, d-parameter
Benchmark 1: Computation time Hexagon vs. SSA
• Pareto preference, only LOWEST preferences (MIN)
• Hexagon computes the full Skyline, whereas SSA computes a few Skyline points
• n = 500K tuples, domain size c = 100K, d-parameter d = 10K
[Figure: two panels, ANTI and COR, plotting runtime in sec against dimension m (2 to 10) for Hexagon and SSA.]
Benchmark 2: Progressive Hexagon vs. SSA
• Pareto preference: $\otimes(P_1, \ldots, P_8)$
• Stop progressive Hexagon after it has computed as many Skyline points as SSA
• k-partitions with k = 2, 4, 8 to evaluate the influence of the partitions
• n = 500K tuples, domain size c = 100K, d-parameter d = 10K
• Full Skyline size: 5902
Table 1: Hexagon (prog.) vs. SSA (ANTI).
                  k = 2               k = 4               k = 8
                  #Skylines  sec      #Skylines  sec      #Skylines  sec
Hexagon (prog.)   3801       6.22     1075       5.95     419        5.29
SSA               3801       3.81     1075       0.812    419        0.198
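The trade-off in the Benchmark 2 table is easier to read as ratios. The short calculation below uses only the figures reported for ANTI; the variable names and the derived ratios are illustrative and not part of the original slides.

```python
full_skyline = 5902

# k -> (Skyline points in the snippet, progressive Hexagon sec, SSA sec)
results = {
    2: (3801, 6.22, 3.81),
    4: (1075, 5.95, 0.812),
    8: (419, 5.29, 0.198),
}

summary = {}
for k, (size, hexagon_sec, ssa_sec) in results.items():
    share = size / full_skyline      # fraction of the full Skyline returned
    speedup = hexagon_sec / ssa_sec  # runtime ratio, progressive Hexagon to SSA
    summary[k] = (share, speedup)
    print(f"k = {k}: {share:.1%} of the full Skyline, SSA {speedup:.1f}x faster")
```

In this data set a larger k returns a smaller share of the full Skyline (about 64%, 18% and 7%) while the runtime advantage of SSA grows (about 1.6x, 7.3x and 26.7x).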
Benchmark 3: Number of Skyline points computed by Hexagon and SSA
• Pareto preference: $\otimes(P_1, \ldots, P_m)$, m = 4, 6, 8
• m/2-partitions
• Hexagon computes the full Skyline
• n = 500K tuples, domain size c = 100K, d-parameter d = 10K
Table 1: Skyline points computed by Hexagon and SSA (ANTI).
m   # Skyline points   σ[P](Sk)   P{P1,P2}   P{P3,P4}   P{P5,P6}   P{P7,P8}
4   12312              1348       1211       1394       -          -
6   18771              2851       1378       1631       1299       -
8   24432              5495       1812       1919       1058       1403
Table 2: Skyline points computed by Hexagon and SSA (COR).
m   # Skyline points   σ[P](Sk)   P{P1,P2}   P{P3,P4}   P{P5,P6}   P{P7,P8}
4   3126               982        706        703        -          -
6   8931               117        516        621        581        -
8   11026              1131       643        681        657        597
4. Summary and Outlook
Summary
‣ Too many Skyline points in high-dimensional spaces
‣ Skyline evaluation in high-dimensional spaces is time and memory consuming
‣ Some snippets of the full Skyline are often sufficient, e.g. for mobile Internet, Web services
‣ The Skyline Snippets algorithm (SSA) works without any specialized index structure
‣ Very fast computation of some Skyline points
Outlook
‣ Extended performance benchmarks investigating
• Influence of the different types of preference constructors
• Performance impact of different k-partitions
‣ Development of heuristics for choosing k-partitions
Thank you for your attention!
Questions?
{endres,kiessling}@informatik.uni-augsburg.de