Efficient and Continuous Skyline Monitoring in Two Tier Streaming Settings

Post on 10-May-2015

[Paper Study] Hua Lu, et al., Aalborg University, Denmark 2013 Elsevier Volume 38.

Author: Hua Lu, et al.

Aalborg University, Denmark

Reported by: Tzu-Li Tai

National Cheng Kung University, Taiwan

High Performance Parallel and Distributed Systems Lab

Elsevier: Information Systems, Volume 38, 2013

HPDS Lab, Institute of Computer and Communication Engineering, Electrical Engineering - NCKU

A. Background Knowledge

B. The Problem: Efficient Continuous Skyline Monitoring

C. The Approach: Two-Phase Monitoring

D. Personal Feedback

Background Knowledge

Before anything else…

What is a skyline?

Definition of “tuple A dominates tuple B”:

A is no worse than B on every attribute, and A is strictly better than B on at least one attribute.

Notation:

𝑡𝑝𝐴 ≻ 𝑡𝑝𝐵

𝑡𝑝𝐴 = (𝑝1, 𝑝2, …, 𝑝𝑛)

𝑡𝑝𝐵 = (𝑝1, 𝑝2, …, 𝑝𝑛)
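The dominance test above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `dominates` and the `higher_is_better` flags are assumptions introduced here so that each attribute can state whether larger or smaller values are preferred.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float],
              higher_is_better: Sequence[bool]) -> bool:
    """Return True if tuple a dominates tuple b.

    higher_is_better[i] is a hypothetical flag saying whether a larger
    value of attribute i is preferred (True) or a smaller one (False).
    """
    strictly_better = False
    for x, y, prefer_high in zip(a, b, higher_is_better):
        if not prefer_high:
            x, y = -x, -y  # flip the sign so that larger is always better
        if x < y:
            return False  # a is worse than b on this attribute
        if x > y:
            strictly_better = True
    return strictly_better


# Illustrative tuples of the form (rating, price):
# a higher rating is better, a lower price is better.
prefs = (True, False)
print(dominates((5, 4000), (2.5, 5000), prefs))
print(dominates((2, 2000), (4, 4500), prefs))
```

Two identical tuples do not dominate each other, because neither is strictly better on any attribute.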

[Figure: scatter plot "Price and Rating of Hotels"; x-axis: Price (0 to 7000), y-axis: Rating (0 to 6)]

𝑡𝑝 = (𝑟𝑎𝑡𝑖𝑛𝑔, 𝑝𝑟𝑖𝑐𝑒)

[Figure: "Price and Rating of Hotels" with hotels A and B marked]

𝑡𝑝𝐴 = (5, 4000)

𝑡𝑝𝐵 = (2.5, 5000)

⇒ 𝑡𝑝𝐴 ≻ 𝑡𝑝𝐵 (A has the higher rating and the lower price)

[Figure: "Price and Rating of Hotels" with hotels A and B marked]

𝑡𝑝𝐴 = (4, 1500)

𝑡𝑝𝐵 = (4, 4500)

⇒ 𝑡𝑝𝐴 ≻ 𝑡𝑝𝐵 (same rating, but A is cheaper)

[Figure: "Price and Rating of Hotels" with hotels A and B marked]

𝑡𝑝𝐴 = (2, 2000)

𝑡𝑝𝐵 = (4, 4500)

⇒ 𝑡𝑝𝐴 ⊁ 𝑡𝑝𝐵 and 𝑡𝑝𝐵 ⊁ 𝑡𝑝𝐴 (A is cheaper, but B has the higher rating)

Definition of Skyline:

The subset of all tuples that are not dominated by any other tuple.

[Figure: "Price and Rating of Hotels"; x-axis: Price (0 to 7000), y-axis: Rating (0 to 6)]
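The skyline definition translates directly into a quadratic-time filter. The sketch below is only an illustration of the definition, not the paper's algorithm. It assumes tuples of the form (rating, price), with a higher rating and a lower price preferred; the `hotels` list is a small made-up sample.

```python
def dominates(a, b):
    """a and b are (rating, price); higher rating and lower price are better."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def skyline(tuples):
    """Keep every tuple that no other tuple dominates (naive O(n^2) scan)."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]


# Hypothetical sample of hotels as (rating, price).
hotels = [(5, 4000), (2.5, 5000), (4, 1500), (4, 4500), (2, 2000)]
print(skyline(hotels))  # [(5, 4000), (4, 1500)]
```

Here (5, 4000) survives because no hotel is both at least as well rated and at most as expensive, and (4, 1500) survives for the same reason; the other three are each dominated by one of these two.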

Now that we know what a skyline is…

What is a two-tier streaming setting for continuous skyline monitoring?

[Figure: two-tier setting with a Central Server (Query Interface) on top and multiple Data Sites below]

The Problem: Efficient Continuous Skyline Monitoring

Problem Statement:

In a geographically distributed computing environment with a central server and multiple data sites, a more efficient method for continuous skyline monitoring is needed.

The Approach: Two-Phase Monitoring

Initialization phase

• Obtain the initial query result by merging all local skylines

• Categorize all tuples based on their membership in the local skyline and the global skyline

Maintenance phase

• Continuously monitor the global skyline by referring to formalized cases of possible skyline changes
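The initialization phase can be sketched as follows. This is a simplified, single-process illustration and not the paper's protocol: the function names, the `sites` dictionary, and the convention that smaller values are better on every attribute are all assumptions made for the example. The category labels NS, FS, and GS are read here as non-skyline, false-positive (local-only) skyline, and global skyline, which is an interpretation of the slides.

```python
def dominates(a, b):
    """Assumes smaller is better on every attribute (illustrative choice)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]


def initialize(sites):
    """sites maps a site id to its list of tuples (hypothetical layout)."""
    # Each data site computes its local skyline.
    local = {sid: skyline(tuples) for sid, tuples in sites.items()}
    # The server merges the local skylines into the global skyline.
    candidates = [(sid, tp) for sid, sk in local.items() for tp in sk]
    global_sk = [(sid, tp) for sid, tp in candidates
                 if not any(dominates(other, tp) for _, other in candidates)]
    # Categorize every tuple by local and global skyline membership.
    categories = {}
    for sid, tuples in sites.items():
        for tp in tuples:
            if (sid, tp) in global_sk:
                categories[(sid, tp)] = "GS"  # in the global skyline
            elif tp in local[sid]:
                categories[(sid, tp)] = "FS"  # local skyline only
            else:
                categories[(sid, tp)] = "NS"  # not even in the local skyline
    return local, global_sk, categories


sites = {1: [(1, 5), (5, 1), (4, 4)], 2: [(2, 2), (6, 6)], 3: [(3, 6)]}
local, global_sk, categories = initialize(sites)
print(global_sk)
print(categories)
```

Merging only the local skylines is sufficient because any tuple dominated inside its own site is also dominated by some local skyline point of that site, so it can never appear in the global skyline.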

Initialization Phase

Site 1: 𝑆𝐾𝑙 = {𝑡𝑝1, 𝑡𝑝3}, 𝑆𝐾𝑙𝑔 = {𝑡𝑝1, 𝑡𝑝3}, 𝑆𝐾𝑓𝑝 = ∅

Site 2: 𝑆𝐾𝑙 = {𝑡𝑝1, 𝑡𝑝2, 𝑡𝑝3}, 𝑆𝐾𝑙𝑔 = {𝑡𝑝2}, 𝑆𝐾𝑓𝑝 = {𝑡𝑝1, 𝑡𝑝3}

Site 3: 𝑆𝐾𝑙 = {𝑡𝑝1}, 𝑆𝐾𝑙𝑔 = {𝑡𝑝1}, 𝑆𝐾𝑓𝑝 = ∅

Global skyline: 𝑆𝐾𝑔 = {(1, 𝑡𝑝1), (1, 𝑡𝑝3), (2, 𝑡𝑝2), (3, 𝑡𝑝3)}

Maintenance Phase

[Figure: Sites 1, 2, and 3, with an updated tuple 𝒕𝒑 marked]

𝑡𝑝(𝑡) → 𝑡𝑝(𝑡′)

⟹ 𝒕𝒑(𝒕) ∈ {𝑵𝑺, 𝑭𝑺, 𝑮𝑺}

⟹ Dominance relationship between 𝒕𝒑(𝒕) and 𝒕𝒑(𝒕′)
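The two pieces of information used to pick a maintenance case, the old category of the tuple and the dominance relationship between its old and new values, can be computed as in the sketch below. The function names and the relation labels are hypothetical, smaller values are assumed to be better on every attribute, and the mapping from each (category, relation) pair to a numbered case is left out because the slides do not list it in full.

```python
def dominates(a, b):
    """Assumes smaller is better on every attribute (illustrative choice)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def dominance_relation(old, new):
    """Describe how tp(t) and tp(t') relate to each other."""
    if old == new:
        return "unchanged"
    if dominates(old, new):
        return "old dominates new"
    if dominates(new, old):
        return "new dominates old"
    return "incomparable"


def update_key(category, old, new):
    """Return the (category, relation) pair used to select a case."""
    if category not in {"NS", "FS", "GS"}:
        raise ValueError(f"unknown category: {category}")
    return category, dominance_relation(old, new)


print(update_key("NS", (3, 3), (1, 5)))
print(update_key("GS", (2, 2), (1, 1)))
```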

Question 1. Is 𝑡𝑝(𝑡′) not dominated by any global skyline point? If yes, 𝑡𝑝(𝑡′) is in the global skyline.

Question 2. Does 𝑡𝑝(𝑡′) dominate any global skyline point? If yes, each dominated skyline point is eliminated from the set of skyline points.

Question 3. 𝑡𝑝(𝑡) was a global skyline point. If 𝑡𝑝(𝑡) was the sole dominator of some non-skyline points, does 𝑡𝑝(𝑡′) stop dominating them? If yes, those previously non-skyline points enter the set of skyline points.

Question 4. Does 𝑡𝑝(𝑡′) stop being a false-positive global skyline point because it is now dominated by some other point? If yes, remove 𝑡𝑝 from the false-positive set at the data site.
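Questions 1 and 2 amount to one pass over the current global skyline. The sketch below is an illustration only: `apply_q1_q2` is a hypothetical name, smaller values are assumed to be better on every attribute, and the old value of the updated tuple is assumed not to be in the global skyline (for example, it was a non-skyline tuple). Questions 3 and 4 need per-site bookkeeping and are not covered here.

```python
def dominates(a, b):
    """Assumes smaller is better on every attribute (illustrative choice)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def apply_q1_q2(global_sk, site_id, new_tp):
    """global_sk is a list of (site_id, tuple) pairs.

    Returns the updated global skyline and whether new_tp entered it.
    """
    # Question 1: is the new value dominated by any global skyline point?
    if any(dominates(tp, new_tp) for _, tp in global_sk):
        return list(global_sk), False
    # Question 2: drop every global skyline point the new value dominates.
    survivors = [(sid, tp) for sid, tp in global_sk
                 if not dominates(new_tp, tp)]
    survivors.append((site_id, new_tp))
    return survivors, True


sk_g = [(1, (1, 5)), (1, (5, 1)), (2, (2, 2))]
print(apply_q1_q2(sk_g, 3, (3, 3)))    # dominated by (2, 2): no change
print(apply_q1_q2(sk_g, 3, (1, 1)))    # dominates every current point
print(apply_q1_q2(sk_g, 3, (1.5, 3)))  # incomparable with all: simply added
```

Question 2 only needs to run when the answer to Question 1 is yes: a new value that is dominated by a global skyline point cannot itself dominate another global skyline point, because dominance is transitive and skyline points do not dominate each other.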

Example: an update at Site 1

Initial state at Site 1: 𝑆𝐾𝑙 = {𝑡𝑝1, 𝑡𝑝3}, 𝑆𝐾𝑙𝑔 = {𝑡𝑝1, 𝑡𝑝3}, 𝑆𝐾𝑓𝑝 = ∅

Initial global skyline: 𝑆𝐾𝑔 = {(1, 𝑡𝑝1), (1, 𝑡𝑝3), (2, 𝑡𝑝2), (3, 𝑡𝑝3)}

𝒕𝒑𝟐 is updated at 𝒕 = 𝒕′

Category of 𝑡𝑝2? 𝒕𝒑𝟐 ∈ 𝑵𝑺

Dominance? 𝒕𝒑𝟐(𝒕) ∽ 𝒕𝒑𝟐(𝒕′)

⇒ 𝑪𝒂𝒔𝒆 𝟏: consider Q1 and Q2

Q1: 𝑡𝑝2 ≻ 𝑡𝑝1 and 𝑡𝑝2 ∽ 𝑡𝑝3 ⇒ YES! 𝑡𝑝2 enters the skyline: 𝑆𝐾𝑙 = {𝑡𝑝1, 𝑡𝑝2, 𝑡𝑝3}, 𝑆𝐾𝑙𝑔 = {𝑡𝑝1, 𝑡𝑝2, 𝑡𝑝3}

Q2: 𝑡𝑝2 ≻ 𝑡𝑝1 ⇒ YES! 𝑡𝑝1 is eliminated: 𝑆𝐾𝑙 = {𝑡𝑝2, 𝑡𝑝3}, 𝑆𝐾𝑙𝑔 = {𝑡𝑝2, 𝑡𝑝3}, 𝑆𝐾𝑓𝑝 = ∅

Global skyline after removing (1, 𝑡𝑝1): 𝑆𝐾𝑔 = {(1, 𝑡𝑝3), (2, 𝑡𝑝2), (3, 𝑡𝑝3)}

Global skyline after adding (1, 𝑡𝑝2): 𝑆𝐾𝑔 = {(1, 𝑡𝑝2), (1, 𝑡𝑝3), (2, 𝑡𝑝2), (3, 𝑡𝑝3)}

Personal Feedback

I/O rate is increased dramatically

The performance of the proposed approach remains debatable because of the massive increase in I/O rate (compared with the traditional two-tier streaming setting).

Keeping all skyline datasets in main memory throughout the maintenance phase is an option worth considering, but it raises fault-tolerance issues.

Critical Path

Further enhancing real-time response for two-tier streaming settings

Remote distributed shared-memory datasets across data sites (clouds)?

Is it possible?
