Research and Practice at University of Queensland Wei Lu ( 卢卫 ) 2/19/2009.

Research and Practice at University of Queensland

Wei Lu (卢卫 )

2/19/2009

Seminar Plan

• Tour in Australia

• Join UQ

• How to write a good paper (by Xiaofang and Xuemin)

• Research Interests

Tour in Australia

Join UQ

UQ

Introduction to UQ

• Founded in 1910
• Outstanding Majors
– Business
– Biology
– Medical
• School of ITEE
– Biomedical Engineering
– Cognitive Systems Engineering
– Complex & Intelligent Systems
– Data & Knowledge Engineering (DKE)
– eResearch
– Microwave & Optical Communications
– Power & Energy Systems
– Security & Surveillance
– Systems & Software Engineering
– Ubiquitous Computing

Academic Staff in DKE Group

Prof. Xiaofang Zhou

He is the Head of the Data and Knowledge Engineering Research Group (DKE). He is also the Convenor of the ARC Research Network in Enterprise Information Infrastructure (EII), and a Chief Investigator of the ARC Centre of Excellence in Bioinformatics. Professor Zhou received his BSc and MSc degrees in Computer Science from Nanjing University in 1984 and 1987 respectively, and his PhD in Computer Science from the University of Queensland in 1994. From 1994 to 1999, he worked as a Senior Research Scientist at CSIRO, leading its Spatial Information Systems group. His research focuses on finding effective and efficient solutions for managing, integrating and analysing very large amounts of complex data for business and scientific applications. His research interests include spatial and multimedia databases, data quality, high performance query processing, Web information systems and bioinformatics.

Dr. Shazia Sadiq

Her research interests are innovative solutions for Business Information Systems that span several areas including business process management, governance, risk and compliance, data quality management, workflow systems, and service oriented computing.

Dr. Xue Li

Associate Professors

His research interests and expertise include: Data Mining, Multimedia Data Security, Database Systems, and Intelligent Web Information Systems.

Senior Lecturer

Dr. Hengtao Shen

His research interests:

• Media (Video)/Web Search
• Multimedia/Web/Spatial/Genome Database Management
• Nonlinear/Local Dimensionality Reduction
• Indexing and Query Processing
• P2P Database Management

Research Staff

Ken Deng: 1. Data Quality 2. Spatial Database

Helen Huang: 1. Video Retrieval 2. Knowledge Discovery

Gabriel: 1. Data Mining and Knowledge Discovery 2. Time Series Mining and Forecasting 3. Skyline Query Processing

Stella: 1. Video Search & Retrieval 2. Web Data Extraction & Analysis 3. Recipe Data Modeling

Seminar Plan

• Guide to writing a good paper (by Xiaofang and Xuemin)

• Research Interests:
– Skyline Query Processing (From)
– Data Quality: Record Linkage (To)

Guide to writing a good paper (by Xiaofang and Xuemin)

• Motivation
– Interesting
– Reasonable
– Pure
• Solution
– Smart
– Sharp
• Conclusion
– Experiment: time and space complexity

Cont.

• Tools
– Word?
– LaTeX: WinEdt/Eclipse + gnuplot + Illustrator/SmartDraw
• Format: Jian Pei’s papers
– Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach
– Efficiently Answering Top-k Typicality Queries on Large Databases
– ……

Research Interests

• Skyline Query Processing (2007.12 ~ 2008.7)

• Data Quality—Record Linkage (2008.7~)

Skyline Query Processing

• Motivations

• How to do experiments

Motivations---skyline

• Given a dataset of d-dimensional points
– a dominates b iff a outperforms b on at least one dimension and is not worse on any other dimension
– The skyline S contains the points not dominated by any other point
• Example
– Dataset of girls
– Prefer good-looking and tall

[Figure: points plotted by height and appearance, with the skyline points highlighted]
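The dominance test and the skyline definition above can be sketched in a few lines of Python. This is a minimal, naive O(n^2) illustration and not the algorithm of any particular paper. It assumes larger values are better on every dimension, and the (height, appearance) pairs are hypothetical values made up for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is not worse than b on every dimension and better on at least one.

    Assumption for this sketch: larger values are better.
    """
    not_worse = all(x >= y for x, y in zip(a, b))
    better_somewhere = any(x > y for x, y in zip(a, b))
    return not_worse and better_somewhere


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (height in cm, appearance score) pairs, for illustration only.
people = [(160, 9), (170, 7), (180, 4), (165, 6), (175, 3)]
result = skyline(people)
print(result)  # [(160, 9), (170, 7), (180, 4)]
```

Here (165, 6) is dropped because (170, 7) is better on both dimensions, and (175, 3) is dropped because of (180, 4). The three remaining points are mutually incomparable, which is exactly what makes them skyline points.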

Dynamic Skyline

• Extension of skyline queries
– Given a query point q
– a dominates b iff a is closer to q than b on at least one dimension and not farther from q on any other dimension
– S contains the points not dominated by any other point
• Example
– User defines an “ideal” girlfriend as the query point q

[Figure: points plotted by height and appearance around the query point q]
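A dynamic skyline can be sketched by comparing points on their per-dimension distance to the query point q rather than on their raw values. The sketch below assumes the common absolute-difference formulation |p_i - q_i|. The candidate points and the "ideal" query point are hypothetical values chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dynamic_dominates(a: Sequence[float], b: Sequence[float],
                      q: Sequence[float]) -> bool:
    """True if a is at least as close to q as b on every dimension and
    strictly closer on at least one (assumed distance: absolute difference)."""
    da = [abs(x - c) for x, c in zip(a, q)]
    db = [abs(y - c) for y, c in zip(b, q)]
    return (all(x <= y for x, y in zip(da, db))
            and any(x < y for x, y in zip(da, db)))


def dynamic_skyline(points: Sequence[Point], q: Sequence[float]) -> List[Point]:
    """Naive O(n^2) dynamic skyline with respect to the query point q."""
    return [p for p in points
            if not any(dynamic_dominates(o, p, q) for o in points)]


# Hypothetical (height, appearance) candidates and "ideal" query point.
candidates = [(160, 9), (170, 7), (180, 4), (165, 6), (175, 3)]
ideal = (166, 7)
result = dynamic_skyline(candidates, ideal)
print(result)  # [(170, 7), (165, 6)]
```

Note that the answer differs from the ordinary skyline of the same points: (170, 7) matches the ideal appearance exactly, (165, 6) is closest in height, and every other candidate is farther from q on both dimensions than (170, 7).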

Window Skyline

• Extension of skyline queries
– Given an area and a query point q
• Example
– User defines an “ideal” girlfriend

[Figure: points plotted by height and appearance]

Reverse Skyline

• Extension of skyline queries
– A set of query points (red)
– Which points (white) have q as one of their skyline points?

[Figure: points plotted by height and appearance, with the query point q marked]

Skyline Cube

• A real estate example

Properties and Values

      price (100K)  dist  age  …
P1    3             3     5    …
P2    5             1     1    …
P3    1             4     4    …
P4    4             5     2    …
P5    2             2     3    …

[Figure: P1 to P5 plotted on price vs. dist, titled “Skyline on price & dist”]

[Figure: P1 to P5 plotted on price vs. age, titled “Skyline on price & age”]
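As a rough illustration of the skyline cube idea, the sketch below computes the skyline of the real estate table on every non-empty subset of its three listed attributes. It assumes smaller values are better for price, dist and age, which the slide does not state explicitly. The function names are hypothetical, and the enumeration is brute force rather than an optimized cube algorithm.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Values from the real estate table: price (100K), dist, age.
ATTRS = ("price", "dist", "age")
HOUSES: Dict[str, Tuple[int, ...]] = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Assumption for this sketch: smaller is better on every attribute."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def subspace_skyline(data: Dict[str, Tuple[int, ...]],
                     subspace: Sequence[str]) -> List[str]:
    """Skyline of the data projected onto the named attributes."""
    idx = [ATTRS.index(name) for name in subspace]
    proj = {key: tuple(row[i] for i in idx) for key, row in data.items()}
    return sorted(key for key, p in proj.items()
                  if not any(dominates(o, p) for o in proj.values()))


def skyline_cube(data: Dict[str, Tuple[int, ...]]) -> Dict[Tuple[str, ...], List[str]]:
    """Skylines of all non-empty subspaces (2^d - 1 of them)."""
    return {sub: subspace_skyline(data, sub)
            for size in range(1, len(ATTRS) + 1)
            for sub in combinations(ATTRS, size)}


cube = skyline_cube(HOUSES)
print(cube[("price", "dist")])  # ['P2', 'P3', 'P5']
print(cube[("price", "age")])   # ['P2', 'P3', 'P4', 'P5']
```

Under the smaller-is-better assumption, P4 is in the skyline on price and age but not on price and dist, which shows why a subspace skyline cannot simply be read off the skyline of another subspace.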

Variations of Skyline Queries

• Multi-Sources
– Multiple query points
• P2P Skyline Computation
– Each peer runs skyline computation
• ……

An Open Question

• Group Skyline
– Given 10 NBA players (score, rebound, assist), choose 3 of them as a team to maximize the team's scores, rebounds, and assists

Fatal Defects of Skyline Queries

• The cardinality of the result is huge
– Given a set of n d-dimensional points whose dimensions are all independent, the expected number of skyline points is $O(\ln^{d-1} n / (d-1)!)$
• Impractical
• How can this problem be mitigated?
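To get a feel for how quickly the skyline grows with dimensionality, the leading term of the bound above can be evaluated directly. This is a back-of-the-envelope sketch: it computes only the expression inside the O(), not an exact expectation, and the function name and the dataset size of one million points are assumptions made for illustration.

```python
import math


def skyline_size_estimate(n: int, d: int) -> float:
    """Leading term ln^(d-1)(n) / (d-1)! of the expected skyline size
    for n points with d independent dimensions (an order-of-magnitude guide)."""
    return math.log(n) ** (d - 1) / math.factorial(d - 1)


n = 1_000_000
for d in (2, 3, 5, 8):
    print(f"d={d}: about {skyline_size_estimate(n, d):,.0f} skyline points")
```

For one million points the estimate is on the order of ten points at d = 2 but roughly 1,500 at d = 5, far more than a user can inspect by hand.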

Reduce the cardinality of skyline

• Selecting Stars

• K-Dominant Skyline

• Core Skyline

Selecting Stars

• Skyline {p2, p4, p6}
• 1 representative skyline point
– p6
• 2 representative skyline points
– {p2, p6}

Selecting Stars

Skyline Point   Dominant set
p2              p1
p4              p3
p6              p3, p5, p7

• Challenge: NP-Complete when d > 2

Selecting Stars

Skyline Points   Dominant set
p2, p4           p1, p3
p2, p6           p1, p3, p5, p7
p4, p6           p3, p5, p7
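The tables above can be reproduced with a small exhaustive search: pick the k skyline points whose combined dominant sets cover the most points. The sketch below is a brute-force illustration over the slide's example, not the algorithm from the original work. It tries every size-k subset, so it is only usable on tiny inputs. The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Dominant sets taken from the slide's example.
DOMINATED: Dict[str, Set[str]] = {
    "p2": {"p1"},
    "p4": {"p3"},
    "p6": {"p3", "p5", "p7"},
}


def k_representatives(dominated: Dict[str, Set[str]],
                      k: int) -> Tuple[Tuple[str, ...], Set[str]]:
    """Return the k skyline points that together dominate the most points,
    along with the set of points they dominate (exhaustive search)."""
    best: Tuple[str, ...] = ()
    best_cover: Set[str] = set()
    for combo in combinations(sorted(dominated), k):
        cover = set().union(*(dominated[s] for s in combo))
        if len(cover) > len(best_cover):
            best, best_cover = combo, cover
    return best, best_cover


print(k_representatives(DOMINATED, 1))  # ('p6',) with {p3, p5, p7}
print(k_representatives(DOMINATED, 2))  # ('p2', 'p6') with {p1, p3, p5, p7}
```

Note that p3 is counted only once for the pair {p4, p6}, because the objective is the size of the union of the dominant sets, not the sum of their sizes.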

K-Dominant Skyline

k-Dominant Skyline (cont.)

• k-Dominate
– If A is not worse than B on k dimensions and better on at least one of those k dimensions, we say A k-dominates B.

k-Dominant Skyline (cont.)

• k-Dominant Skyline
– The k-dominant skyline contains all the points that cannot be k-dominated by any other point
• Problems:
– The result can be empty
– Some good points may be pruned
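The first problem listed above, an empty result, is easy to reproduce because k-dominance can be cyclic. The following sketch is a naive illustration that assumes smaller values are better. The three points are hypothetical and chosen so that each one 2-dominates the next.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_dominates(a: Sequence[float], b: Sequence[float], k: int) -> bool:
    """True if a is not worse than b on at least k dimensions and strictly
    better on at least one of them (assumption: smaller is better).

    Every strictly-better dimension is also a not-worse dimension, so a
    qualifying set of k dimensions exists whenever both counts are met.
    """
    not_worse = sum(1 for x, y in zip(a, b) if x <= y)
    better = sum(1 for x, y in zip(a, b) if x < y)
    return not_worse >= k and better >= 1


def k_dominant_skyline(points: Sequence[Point], k: int) -> List[Point]:
    """Points that no other point k-dominates (naive O(n^2) scan)."""
    return [p for p in points
            if not any(k_dominates(q, p, k) for q in points)]


# Hypothetical 3-dimensional points forming a cycle under 2-dominance.
pts = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
print(k_dominant_skyline(pts, 3))  # all three points (ordinary skyline, k = d)
print(k_dominant_skyline(pts, 2))  # [] because each point is 2-dominated
```

With k equal to the number of dimensions the definition reduces to ordinary dominance, so all three points survive. With k = 2 the first point 2-dominates the second, the second 2-dominates the third, and the third 2-dominates the first, leaving nothing.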

Core Skyline

Skylines on Uncertain Data

• Consider game-by-game statistics
• Conventional methods compute the skyline on
– Separate game records
– Aggregate: mean or median
• Limitations
– Biased by outliers
– Lose data distributions
• Probabilistic skylines
– An instance has a probability to represent the object
– An object has a probability to be in the skyline

The 33rd International Conference on Very Large Data Bases (VLDB), Vienna, Austria, September 23-28, 2007

Possible worlds

World  A   B   C
1      a1  b1  c1
2      a1  b1  c2
3      a1  b2  c1
4      a1  b2  c2
5      a2  b1  c1
6      a2  b1  c2
7      a2  b2  c1
8      a2  b2  c2

P(A) = 1, P(B) = 6/8, P(C) = 0

• Top-K query
– Given a threshold p, identify all the players with skyline probability >= p
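The possible-worlds computation can be sketched by enumerating every combination of one instance per object and counting the worlds in which each object's chosen instance is not dominated. The slide does not give the instance coordinates, so the 2-dimensional values below are hypothetical, picked only so that the result matches P(A) = 1, P(B) = 6/8 and P(C) = 0. The sketch also assumes every instance of an object is equally likely and that smaller values are better.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

# Hypothetical instances (a1, a2), (b1, b2), (c1, c2); not from the slide.
OBJECTS: Dict[str, List[Point]] = {
    "A": [(1, 5), (2, 2)],
    "B": [(5, 1), (3, 3)],
    "C": [(6, 6), (4, 7)],
}


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Assumption for this sketch: smaller is better on every dimension."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline_probabilities(objects: Dict[str, List[Point]]) -> Dict[str, Fraction]:
    """Fraction of possible worlds in which each object is in the skyline,
    assuming all instances of an object are equally likely."""
    names = list(objects)
    worlds = list(product(*(objects[name] for name in names)))
    counts = {name: 0 for name in names}
    for world in worlds:
        for name, inst in zip(names, world):
            if not any(dominates(other, inst) for other in world):
                counts[name] += 1
    return {name: Fraction(c, len(worlds)) for name, c in counts.items()}


probs = skyline_probabilities(OBJECTS)
print(probs)  # A -> 1, B -> 3/4, C -> 0

# Threshold query: objects whose skyline probability is at least p.
p = 0.5
print([name for name, pr in probs.items() if pr >= p])  # ['A', 'B']
```

B falls out of the skyline only in the two worlds where A appears as a2 and B as b2, which is where the 6/8 comes from. Enumerating worlds is exponential in the number of objects, so this only illustrates the semantics.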

KNN probabilistic skyline

• Given an object O, find the K objects whose skyline probabilities are nearest to that of O.

• Applications:
– Given an NBA player (or a singer), find the K NBA players (or singers) whose performance is most similar to his or hers.

Experiment

• Dataset
– Real dataset: NBA
– Synthetic datasets: anti-correlated, correlated, independent
• Parameters
– Dimension
– Cardinality of dataset
• Efficiency
– Time
– Memory
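The three synthetic distributions can be approximated with a few lines of standard-library Python. The generators below are simplified assumptions made for illustration, not the generator used in any published benchmark: correlated points are a shared base value plus small per-dimension noise, and anti-correlated points are pushed toward the plane where the coordinates sum to d/2. The function names and the spread parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _clamp(x: float) -> float:
    return min(1.0, max(0.0, x))


def independent(n: int, d: int, rng: random.Random) -> List[Point]:
    """Every coordinate drawn uniformly and independently from [0, 1)."""
    return [tuple(rng.random() for _ in range(d)) for _ in range(n)]


def correlated(n: int, d: int, rng: random.Random, spread: float = 0.1) -> List[Point]:
    """A point that is good on one dimension tends to be good on the others."""
    pts = []
    for _ in range(n):
        base = rng.random()
        pts.append(tuple(_clamp(base + rng.uniform(-spread, spread))
                         for _ in range(d)))
    return pts


def anti_correlated(n: int, d: int, rng: random.Random, spread: float = 0.1) -> List[Point]:
    """A point that is good on one dimension tends to be bad on the others:
    coordinates are shifted so they sum to roughly d / 2, plus small noise."""
    pts = []
    for _ in range(n):
        raw = [rng.random() for _ in range(d)]
        shift = (d / 2 - sum(raw)) / d
        pts.append(tuple(_clamp(x + shift + rng.uniform(-spread, spread))
                         for x in raw))
    return pts


def skyline_size(points: Sequence[Point]) -> int:
    """Naive skyline count, assuming smaller is better."""
    def dom(a: Point, b: Point) -> bool:
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return sum(1 for p in points if not any(dom(q, p) for q in points))


rng = random.Random(2009)  # fixed seed so runs are repeatable
for gen in (correlated, independent, anti_correlated):
    data = gen(500, 2, rng)
    print(gen.__name__, skyline_size(data))
```

Anti-correlated data typically yields the largest skyline and correlated data the smallest, which is why experiments usually report time and memory on all three while varying the dimension and the cardinality.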

Thanks!