Research and Practice at University of Queensland Wei Lu ( 卢卫 ) 2/19/2009.
-
Upload
marilyn-cain -
Category
Documents
-
view
251 -
download
1
Transcript of Research and Practice at University of Queensland Wei Lu ( 卢卫 ) 2/19/2009.
Seminar Plan
• Tour in Australia
• Join UQ
• How to write a good paper(by Xiaofang and Xuemin)
• Research Interests
Seminar Plan
• Tour in Australia
• Join UQ
• How to write a good paper (by Xiaofang and Xuemin)
• Research Interests
Seminar Plan
• Tour in Australia
• Join UQ
• How to write a good paper (by Xiaofang and Xuemin)
• Research Interests
Introduction to UQ• founded in 1910• Outstanding Majors
– Business– Biology– Medical
• School of ITEE– Biomedical Engineering – Cognitive Systems Engineering – Complex & Intelligent Systems – Data & Knowledge Engineering (DKE)– eResearch – Microwave & Optical Communications – Power & Energy Systems – Security & Surveillance – Systems & Software Engineering – Ubiquitous Computing
Academic Staff in DKE Group
Prof. Xiaofang ZhouHe is the Head of the Data and Knowledge Engineering Research Group (DKE). He is also the Convenor of ARC Research Network in Enterprise Information Infrastructure (EII), and a Chief Investigator of ARC Centre of Excellence in Bioinformatics. Professor Zhou received his BSc and MSc degrees in Computer Science from Nanjing University in 1984 and 1987 respectively, and PhD in Computer Science from the University of Queensland in 1994. From 1994 to 1999, he worked as a Senior Research Scientist in CSIRO, leading its Spatial Information Systems group. His research focuses on finding effective and efficient solutions to managing, integrating and analysing very large amount of complex data for business and scientific applications. His research interests include spatial and multimedia databases, data quality, high performance query processing, Web information systems and bioinformatics.
Dr. Shazia Sadiq
Her research interests are innovative solutions for Business Information Systems that span several areas including business process management, governance, risk and compliance, data quality management, workflow systems, and service oriented computing.
Dr. Xue Li
Associate Professors
His research interests and expertise include: Data Mining, Multimedia Data Security, Database Systems, and Intelligent Web Information Systems.
Senior LectureDr. Hengtao Shen
His Research interests:
Media (Video)/Web Search Multimedia/Web/Spatial/Genome Database Management Nonlinear/Local Dimensionality Reduction Indexing and Query Processing P2P Database Management
Research StaffKen Deng:1. Data Quality 2. Spatial Database
Helen Huang:1.Video retrieval 2. knowledge discovery
Gabriel:1. Data Mining and Knowledge Discovery 2. Time Series Mining and Forecasting3. Skyline Query Processing
Stella:1. Video Search &
Retrieval, 2. Web Data Extraction &
Analysis,3. Recipe Data Modeling
Seminar Plan
• Tour in Australia
• Join UQ
• How to write a good paper (by Xiaofang and Xuemin)
• Research Interests
Seminar Plan
• Guide of writing a good paper (by Xiaofang and Xuemin)
• Research Interests:– Skyline Query Processing (From)– Data Quality—Record Linkage (To)
Guide of writing a good paper(by Xiaofang and Xuemin)
• Motivation– Interesting– Reasonable– Pure
• Solution– Smart– Sharp
• Conclusion– Experiment: time and space complexity
Cont.
• Tools– Word?– Latex: winEdt/eclipse+GNUPlot+Illustrator/smartdraw
• Format: Jian Pei’s papers– Ranking Queries on Uncertain Data: A Probabilisti
c Threshold Approach– Efficiently Answering Top-k Typicality Queries on
Large Databases– ……
Research Interests
• Skyline Query Processing (2007.12 ~ 2008.7)
• Data Quality—Record Linkage (2008.7~)
Motivations---skyline
• Given a dataset of d-dimensional points– a dominates b iff a out
performs b at least one dimension and not worse at other dimensions
height
app
eara
nce a
b
Motivations---skyline
• Given a dataset of d-dimensional points– a dominates b iff a out
performs b at least one dimension and not worse at other dimensions
– S contains points not dominated by others • Example
– Dataset of girls– Prefer good-looking
tall
height
app
eara
nce a
b
Motivations---skyline
• Given a dataset of d-dimensional points– a dominates b iff a out
performs b at least one dimension and not worse at other dimensions
– S contains points not dominated by others
Skyline points
height
app
eara
nce
• Example– Dataset of girls– Prefer good-looking
tall
Dynamic Skyline
• Extension of skyline queries– Given a query point q– a dominates b iff a outpe
rforms b at least one dimension and not worse at other dimensions
– S contains points not dominated by others
height
app
eara
nce
Query point q
• Example– User defines “ideal” gi
rlfriend
Dynamic Skyline
• Extension of skyline queries– Given a query point q– a dominates b iff a outpe
rforms b at least one dimension and not worse at other dimensions
– S contains points not dominated by others
height
app
eara
nce
Query point q
• Example– User defines “ideal” gi
rlfriend
Dynamic Skyline
• Extension of skyline queries– Given a query point q– a dominates b iff a outpe
rforms b at least one dimension and not worse at other dimensions
– S contains points not dominated by others
height
app
eara
nce
Query point q
• Example– User defines “ideal” gi
rlfriend
Window Skyline
• Extension of skyline queries– Given an area and a que
ry point q
height
app
eara
nce
• Example– User defines “ideal” gi
rlfriend
Reverse Skyline
• Extension of skyline queries– A set of query points (re
d)– Which points (white) ma
ke q as its skyline point
height
app
eara
nce
Query point q
Skyline Cube
• A real estate example
price (100K) dist age …
P1 3 3 5 …
P2 5 1 1 …
P3 1 4 4 …
P4 4 5 2 …
P5 2 2 3 …Properties and Values
Skyline on price & dist
Skyline on price & age
P1
P3
P5
P4
P2
price
age
P4
P3
P5
P1
P2
price
dist
Variation of Skyline Queries
• Multi-Sources– Multiple query points
• P2P Skyline Computation– Each peer runs skyline computation
• ……
An Open Question
• Group Skyline– 10 NBA Players (score, rebound, assist),
choose 3 Player among them as a team to maximize the scores, rebounds, and assists
Fatal Defects of Skyline Queries
• The cardinality of result is huge – Given a set of d-dimensional points with the
cardinality n, the expected number of skyline points is O(lnd−1n/(d−1)!). (all the dimensions are independent )
• Unpractical
• How to improve this problem?
Selecting Stars
• Skyline {p2,p4,p6}
• 1 Representative skyline point– p6
• 2 Representative skyline points– {p2,p6}
Selecting Stars
Skyline Point Dominant setp2 p1
p4 p3
p6 p3, p5, p7
• Skyline {p2,p4,p6}
• 1 Representative skyline point– p6
• Skyline {p2,p4,p6}
• 2 Representative skyline points– {p2,p6}
• Challenge: NP-Complete when d > 2
Selecting Stars
Skyline Point Dominant setp2 , p4 p1 , p3
p2 , p6 p1 , p3, p5, p7
p4, p6 p3, p5, p7
k-Dominant Skyline (cont.)
• k-Dominate– If A is not worse than B on k dimensions, and
better on at least one of the k dimensions, we say A k-dominate B.
k-Dominant Skyline (cont.)
• k-Dominant Skyline– k-dominant skyline contains all the points that
cannot be k-dominated by any other point
• Problems:– The result can be null– Some good points may be pruned
Skylines on Uncertain Data
• Consider game-by-game statistics• Conventional methods compute the skyline on
– Separate game records– Aggregate: mean or median
• Limitations– Biased by outliers– Lose data distributions
• Probabilistic skylines– An instance has a probability
to represent the object– An object has a probability
to be in the skylineThe 33rd International Conference on Very Large Data Bases (VLDB), Vienna, Austria, September 23-28
2007
Possible worlds
A B C
1 a1 b1 c1
2 a1 b1 c2
3 a1 b2 c1
4 a1 b2 c2
5 a2 b1 c1
6 a2 b1 c2
7 a2 b2 c1
8 a2 b2 c2
P(A) = 1P(B) = 6/8P(C) = 0
• Top-K query– Given a threshold p, trying to identify all the players
with skyline probability >= p;
KNN probabilistic skyline
• Given an object O, find K objects whose skyline probabilities are nearest to O.
• Applications:– Given an NBA player (singer star), try to find
K NBA players (singer stars), whose performances are most similar to him/her.
Experiment
• Dataset– Real dataset: NBA– Synthetic dataset: anti-related, correlated,
independent
• Parameters– Dimension– Cardinality of dataset
• Efficiency– Time– Memory