Efficient and Continuous Skyline Monitoring in Two Tier Streaming Settings
Authors: Hua Lu et al.
Aalborg University, Denmark
Reported by: Tzu-Li Tai
National Cheng Kung University, Taiwan
High Performance Parallel and Distributed Systems Lab
Elsevier: Information Systems, Volume 38, 2013
HPDS Lab, Institute of Computer and Communication Engineering, Electrical Engineering - NCKU
A. Background Knowledge
B. The Problem: Efficient Continuous Skyline Monitoring
C. The Approach: Two-Phase Monitoring
D. Personal Feedback
Background Knowledge
Before anything else…
What is a skyline?
Definition of “tuple A dominates tuple B”:
A is no worse than B in every attribute, and strictly better than B in at least one attribute.
Notation:
$tp_A \succ tp_B$, where $tp_A = \langle p^A_1, p^A_2, \ldots, p^A_n \rangle$ and $tp_B = \langle p^B_1, p^B_2, \ldots, p^B_n \rangle$
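Written out formally, the definition reads as follows. The symbols $\preceq_i$ ("not worse than on attribute $i$") and $\prec_i$ ("strictly better than on attribute $i$") are our own notation, not the slides'; which direction counts as better depends on the attribute (for example, a lower price or a higher rating).

$$
tp_A \succ tp_B \iff \left(\forall i \in \{1, \ldots, n\}:\ p^A_i \preceq_i p^B_i\right) \wedge \left(\exists j \in \{1, \ldots, n\}:\ p^A_j \prec_j p^B_j\right)
$$

$tp_A \nsucc tp_B$ denotes the negation of this condition.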
[Figure: Price and Rating of Hotels (axes: Rating, Price)]
$tp = \langle rating, price \rangle$ (a higher rating and a lower price are better)
[Figure: Price and Rating of Hotels, with hotels A and B marked]
$tp_A = \langle 5, 4000 \rangle$, $tp_B = \langle 2.5, 5000 \rangle$ ⇒ $tp_A \succ tp_B$ (A has a higher rating and a lower price)
[Figure: Price and Rating of Hotels, with hotels A and B marked]
$tp_A = \langle 4, 1500 \rangle$, $tp_B = \langle 4, 4500 \rangle$ ⇒ $tp_A \succ tp_B$ (equal rating, but A is cheaper)
[Figure: Price and Rating of Hotels, with hotels A and B marked]
$tp_A = \langle 2, 2000 \rangle$, $tp_B = \langle 4, 4500 \rangle$ ⇒ $tp_A \nsucc tp_B$ and $tp_B \nsucc tp_A$ (A is cheaper, but B has the higher rating)
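The three hotel comparisons can be checked mechanically. The Python sketch below assumes tuples are ordered as ⟨rating, price⟩ with a higher rating and a lower price being better, as the examples imply; the function name `dominates` is illustrative, not from the paper.

```python
from typing import Dict, Tuple

# A hotel is a <rating, price> tuple, as in the slide examples.
Hotel = Tuple[float, float]


def dominates(a: Hotel, b: Hotel) -> bool:
    """True if hotel a dominates hotel b.

    Assumption: a higher rating and a lower price are better.
    """
    rating_a, price_a = a
    rating_b, price_b = b
    no_worse = rating_a >= rating_b and price_a <= price_b
    strictly_better = rating_a > rating_b or price_a < price_b
    return no_worse and strictly_better


# The three (A, B) pairs from the slides.
examples: Dict[str, Tuple[Hotel, Hotel]] = {
    "example 1": ((5, 4000), (2.5, 5000)),
    "example 2": ((4, 1500), (4, 4500)),
    "example 3": ((2, 2000), (4, 4500)),
}

for name, (a, b) in examples.items():
    # Prints: name, whether A dominates B, whether B dominates A.
    print(name, dominates(a, b), dominates(b, a))
# example 1 True False
# example 2 True False
# example 3 False False
```

The third pair is the interesting one: each hotel wins on one attribute, so neither dominates the other.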
Definition of Skyline:
The subset of all tuples that are not dominated by any other tuple.
[Figure: Price and Rating of Hotels]
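A direct (naive, quadratic) reading of this definition is sketched below: keep every tuple that no other tuple dominates. It reuses the ⟨rating, price⟩ convention with a higher rating and lower price being better; the five input hotels are simply the distinct points from the dominance examples, combined here purely for illustration, and the names `skyline` and `hotels` are hypothetical.

```python
from typing import Dict, List, Tuple

Hotel = Tuple[float, float]  # <rating, price>


def dominates(a: Hotel, b: Hotel) -> bool:
    """Assumption: a higher rating and a lower price are better."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def skyline(points: Dict[str, Hotel]) -> List[str]:
    """Naive O(n^2) skyline: keep each tuple that no other tuple dominates."""
    return [
        pid
        for pid, p in points.items()
        if not any(dominates(q, p) for qid, q in points.items() if qid != pid)
    ]


# Illustrative input: the five distinct hotels from the dominance examples.
hotels = {
    "h1": (5, 4000),
    "h2": (2.5, 5000),
    "h3": (4, 1500),
    "h4": (4, 4500),
    "h5": (2, 2000),
}
print(skyline(hotels))  # ['h1', 'h3']
```

Here h2 is dominated by h1, while h4 and h5 are both dominated by h3, leaving h1 and h3 as the skyline.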
Now that we know what a skyline is…
What is a two-tier streaming setting for continuous skyline monitoring?
[Figure: a central server (query interface) connected to multiple data sites]
The Problem: Efficient Continuous Skyline Monitoring
Problem Statement:
In a geographically distributed computing environment consisting of a central server and multiple data sites, a more efficient method for continuous skyline monitoring is needed.
The Approach: Two-Phase Monitoring
Initialization phase
• Obtain the initial query result by merging all local skylines
• Categorize all tuples based on their membership in the local skyline and the global skyline
Maintenance phase
• Continuously monitor the global skyline by referring to formalized cases of possible skyline changes
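The initialization phase can be sketched in a single process as follows. This is a minimal sketch, not the paper's implementation: it assumes every attribute is "smaller is better", simulates the data sites as in-memory dictionaries, and the helper names (`local_skyline`, `initialize`) and the sample data are hypothetical. It relies on the fact that a tuple dominated within its own site can never be in the global skyline, so the server only needs to merge the local skylines.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, ...]  # assumption: smaller is better in every attribute
Site = Dict[str, Point]    # site-local tuple ID -> attribute values


def dominates(a: Point, b: Point) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def local_skyline(site: Site) -> Set[str]:
    """SK_l: tuples of one site that no other tuple of that site dominates."""
    return {i for i, p in site.items() if not any(dominates(q, p) for q in site.values())}


def initialize(sites: Dict[int, Site]) -> Tuple[
    Set[Tuple[int, str]], Dict[int, Set[str]], Dict[int, Set[str]]
]:
    """Return SK_g as (site ID, tuple ID) pairs, plus SK_lg and SK_fp per site."""
    # Step 1: every site computes its local skyline and ships it to the server.
    sk_l = {s: local_skyline(site) for s, site in sites.items()}
    candidates = [(s, i, sites[s][i]) for s in sites for i in sk_l[s]]
    # Step 2: the server merges the local skylines into the global skyline.
    sk_g = {
        (s, i)
        for s, i, p in candidates
        if not any(dominates(q, p) for _, _, q in candidates)
    }
    # Step 3: split each local skyline into global members and false positives.
    sk_lg = {s: {i for i in sk_l[s] if (s, i) in sk_g} for s in sites}
    sk_fp = {s: sk_l[s] - sk_lg[s] for s in sites}
    return sk_g, sk_lg, sk_fp


# Hypothetical data: two sites, two attributes per tuple.
sites = {
    1: {"tp1": (1, 5), "tp2": (4, 4), "tp3": (3, 2)},
    2: {"tp1": (2, 6), "tp2": (2, 3), "tp3": (3.5, 2.5)},
}
sk_g, sk_lg, sk_fp = initialize(sites)
print(sorted(sk_g))  # [(1, 'tp1'), (1, 'tp3'), (2, 'tp2')]
print(sk_fp)         # {1: set(), 2: {'tp3'}}
```

In this sample, site 2's tp3 survives its local check but is dominated by site 1's tp3, so it ends up in site 2's false-positive set.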
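Initialization Phase
Site 1: $SK_l = \{tp_1, tp_3\}$, $SK_{lg} = \{tp_1, tp_3\}$, $SK_{fp} = \emptyset$
Site 2: $SK_l = \{tp_1, tp_2, tp_3\}$, $SK_{lg} = \{tp_2\}$, $SK_{fp} = \{tp_1, tp_3\}$
Site 3: $SK_l = \{tp_1\}$, $SK_{lg} = \{tp_1\}$, $SK_{fp} = \emptyset$
Central server: $SK_g = \{(1, tp_1), (1, tp_3), (2, tp_2), (3, tp_1)\}$, where each entry is a (site ID, tuple) pair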
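Maintenance Phase (Site 1, Site 2, Site 3)
When a tuple is updated, $tp(t) \rightarrow tp(t')$, the update is classified by:
⟹ the category of the old value: $tp(t) \in \{NS, FS, GS\}$
⟹ the dominance relationship between $tp(t)$ and $tp(t')$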
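The two classification inputs can be sketched as a small helper. Reading NS, FS and GS as "not in the local skyline", "local-skyline false positive (in $SK_{fp}$)" and "global skyline (in $SK_{lg}$)" is our assumption, inferred from the example in which a tuple outside $SK_l$ is labeled NS; the slides do not spell out the acronyms or the mapping from (category, relationship) pairs to case numbers, so the sketch stops at the pair. Function names and the sample coordinates are hypothetical, and every attribute is assumed to be "smaller is better".

```python
from typing import Set, Tuple

Point = Tuple[float, ...]  # assumption: smaller is better in every attribute


def dominates(a: Point, b: Point) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def category(tid: str, sk_lg: Set[str], sk_fp: Set[str]) -> str:
    """Assumed reading: GS = in SK_lg, FS = in SK_fp, NS = not in the local skyline."""
    if tid in sk_lg:
        return "GS"
    if tid in sk_fp:
        return "FS"
    return "NS"


def relation(old: Point, new: Point) -> str:
    """Dominance relationship between tp(t) and tp(t')."""
    if dominates(old, new):
        return "old dominates new"
    if dominates(new, old):
        return "new dominates old"
    return "incomparable"  # neither dominates the other (also covers identical values)


def classify_update(
    tid: str, old: Point, new: Point, sk_lg: Set[str], sk_fp: Set[str]
) -> Tuple[str, str]:
    return category(tid, sk_lg, sk_fp), relation(old, new)


# Hypothetical coordinates for Site 1's tp2 moving from (4, 4) to (0.5, 4.5).
print(classify_update("tp2", (4, 4), (0.5, 4.5), {"tp1", "tp3"}, set()))
# ('NS', 'incomparable')
```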
Question 1. Is $tp(t')$ not dominated by any global skyline point? If yes, $tp(t')$ is in the global skyline.
Question 2. Does $tp(t')$ dominate any global skyline point? If yes, each dominated point is eliminated from the set of skyline points.
Question 3. If $tp(t)$ was a global skyline point that solely dominated some non-skyline points, does $tp(t')$ stop dominating them? If yes, those previously non-skyline points enter the set of skyline points.
Question 4. Does $tp(t')$ stop being a false-positive global skyline point because it is now dominated by some other point? If yes, the data site removes $tp$ from its false-positive set.
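Questions 1 to 3 can be illustrated with a simplified, centralized sketch that updates one skyline after a single tuple changes. It is not the paper's distributed algorithm: it assumes all tuples are visible in one dictionary (so the site-side false-positive bookkeeping of Question 4 is not modeled), every attribute is "smaller is better", and `apply_update` is a hypothetical name. Q2 and Q3 are answered first so that Q1 compares $tp(t')$ against the already-updated skyline; any tuple dominating $tp(t')$ is itself either in that skyline or dominated by a member of it.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, ...]  # assumption: smaller is better in every attribute


def dominates(a: Point, b: Point) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Dict[str, Point]) -> Set[str]:
    """Naive skyline, used to build the starting state."""
    return {i for i, p in points.items() if not any(dominates(q, p) for q in points.values())}


def apply_update(points: Dict[str, Point], sky: Set[str], tid: str, new: Point) -> Set[str]:
    """Change tuple `tid` to `new` (mutates `points`) and return the new skyline."""
    old = points[tid]
    points[tid] = new
    result = sky - {tid}
    # Q2: skyline points dominated by tp(t') are eliminated.
    result = {s for s in result if not dominates(new, points[s])}
    # Q3: if tp(t) was a skyline point, tuples it dominated may now be undominated.
    if tid in sky:
        for q, v in points.items():
            if q != tid and q not in sky and dominates(old, v):
                if not any(dominates(w, v) for w in points.values()):
                    result.add(q)
    # Q1: tp(t') joins the skyline if no remaining skyline point dominates it.
    if not any(dominates(points[s], new) for s in result):
        result.add(tid)
    return result


# Hypothetical coordinates chosen to mirror the Case 1 example at Site 1:
# tp2 moves to a value that dominates tp1 and is incomparable with tp3.
points = {"tp1": (1, 5), "tp2": (4, 4), "tp3": (3, 2)}
sky = skyline(points)  # tp1 and tp3
print(sorted(apply_update(points, sky, "tp2", (0.5, 4.5))))  # ['tp2', 'tp3']
```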
Example: Site 1
Initial state of Site 1: $SK_l = \{tp_1, tp_3\}$, $SK_{lg} = \{tp_1, tp_3\}$, $SK_{fp} = \emptyset$
Global skyline: $SK_g = \{(1, tp_1), (1, tp_3), (2, tp_2), (3, tp_1)\}$
$tp_2$ is updated at $t = t'$.
Category: $tp_2 \in NS$
Dominance: $tp_2(t) \sim tp_2(t')$ (neither dominates the other)
⇒ Case 1: consider Q1 and Q2.
Q1: $tp_2(t') \succ tp_1$ and $tp_2(t') \sim tp_3$, so $tp_2(t')$ is not dominated. YES!
$SK_l = \{tp_1, tp_2, tp_3\}$, $SK_{lg} = \{tp_1, tp_2, tp_3\}$, $SK_{fp} = \emptyset$
Q2: $tp_2(t') \succ tp_1$. YES! $tp_1$ is eliminated.
$SK_l = \{tp_2, tp_3\}$, $SK_{lg} = \{tp_2, tp_3\}$, $SK_{fp} = \emptyset$
At the central server, $(1, tp_1)$ is removed: $SK_g = \{(1, tp_3), (2, tp_2), (3, tp_1)\}$
Then $(1, tp_2)$ is added: $SK_g = \{(1, tp_2), (1, tp_3), (2, tp_2), (3, tp_1)\}$
Personal Feedback
I/O rate increases dramatically
The performance of the proposed approach remains debatable because the I/O rate increases sharply compared with the traditional two-tier streaming setting.
Keeping all skyline datasets in main memory throughout the maintenance phase is a viable option, but it raises fault-tolerance issues.
Critical Path
Further enhancing real-time response for two-tier streaming settings
Could the datasets be kept in remote distributed shared memory across data sites (clouds)?
Is this feasible?