Making the Pyramid Technique Robust to Query Types and Workloads


Page 1: Making the Pyramid Technique Robust to Query Types and Workloads

Making the Pyramid Technique Robust to Query Types and Workloads

Rui Zhang, Beng Chin Ooi, Kian-Lee Tan

Department of Computer Science

National University of Singapore

Singapore

Page 2: Making the Pyramid Technique Robust to Query Types and Workloads

Outline

• Background

• Existing work and limitations

• Our proposal: The P+-tree

• Experimental results

• Conclusion

Page 3: Making the Pyramid Technique Robust to Query Types and Workloads

Problem & Motivation

Problem:

Indexing multidimensional point data

Applications:

• Low dimension: GIS, CAD, medical imaging (X-rays, MRI brain scans)

• High dimension: image databases, video databases, data warehouses

Page 4: Making the Pyramid Technique Robust to Query Types and Workloads

Typical Query Types

• Point Query

• Window Query

$[q_0^{\min}, q_0^{\max}], [q_1^{\min}, q_1^{\max}], \ldots, [q_{d-1}^{\min}, q_{d-1}^{\max}]$

• Range Query

Query point $X(x_0, x_1, \ldots, x_{d-1})$, radius $r$

• K-Nearest Neighbor Query (kNN query)

Query point $X(x_0, x_1, \ldots, x_{d-1})$, number of neighbors $k$
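The query types above can be illustrated with a brute-force sketch over a small in-memory point list. This is only meant to pin down the semantics of each query, not how any index answers it; the function names and the choice of Euclidean distance are assumptions made for the example. A point query is the special case of a window query whose lower and upper bounds coincide.

```python
import math

def window_query(points, lows, highs):
    """Return points p with lows[j] <= p[j] <= highs[j] in every dimension j."""
    return [p for p in points
            if all(lo <= x <= hi for x, lo, hi in zip(p, lows, highs))]

def range_query(points, center, r):
    """Return points within Euclidean distance r of center."""
    return [p for p in points if math.dist(p, center) <= r]

def knn_query(points, center, k):
    """Return the k points closest to center (Euclidean distance)."""
    return sorted(points, key=lambda p: math.dist(p, center))[:k]

# Hypothetical two-dimensional data set used only for illustration.
points = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.4), (0.45, 0.6)]

in_window = window_query(points, (0.4, 0.4), (0.6, 0.7))
in_range = range_query(points, (0.5, 0.5), 0.2)
nearest = knn_query(points, (0.5, 0.5), 1)
```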

Page 5: Making the Pyramid Technique Robust to Query Types and Workloads

Existing work: Four Strategies

• Data partitioning: R-tree family

• Space partitioning: k-d-tree family

• Dimensionality Reduction: mapping

• Data Compression: VA-file, IQ-tree

Page 6: Making the Pyramid Technique Robust to Query Types and Workloads

Existing work: Comparison

• Low-dimensional space

– The R-tree family of structures

• High-dimensional space

– Window query: the Pyramid technique, iMinMax

– kNN query: the IQ-tree, iDistance

Page 7: Making the Pyramid Technique Robust to Query Types and Workloads

Existing work: Limitations

• Limited to certain query types

– The Pyramid technique, iMinMax: window query

– iDistance, the IQ-tree: kNN query

• Limited to certain workloads

– The Pyramid technique: hyper-cube shaped window queries located around the center of the data space

Page 8: Making the Pyramid Technique Robust to Query Types and Workloads

Our proposal: the P+-tree

• Based on the Pyramid technique

• Supports both window and kNN queries

• Robust under different workloads

Page 9: Making the Pyramid Technique Robust to Query Types and Workloads

Review of the Pyramid Technique

$i$: pyramid number

$h_v$: height of point $v$, measured in the $i$-th dimension (if $i < d$) or the $(i-d)$-th dimension (if $i \ge d$)

$pv_v = i + h_v$
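A minimal sketch of the pyramid value computation follows. The slide only states that the pyramid value is $i + h_v$; the rule used here to pick the pyramid number (the dimension in which the point deviates most from the center 0.5, with pyramids $d$ to $2d-1$ on the high side) and the height $|0.5 - v_j|$ follow the usual definition of the Pyramid technique and are assumptions as far as this deck is concerned. The function name is hypothetical.

```python
def pyramid_value(v):
    """Map a point v in [0, 1]^d to its one-dimensional pyramid value i + h_v."""
    d = len(v)
    # Dimension in which v lies farthest from the center 0.5.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramids 0..d-1 lie on the low side of the center, d..2d-1 on the high side.
    i = j_max if v[j_max] < 0.5 else j_max + d
    # Height of v inside pyramid i.
    h = abs(0.5 - v[j_max])
    return i + h

low_side = pyramid_value((0.2, 0.6))   # pyramid 0, height about 0.3
high_side = pyramid_value((0.5, 0.9))  # pyramid 3, height about 0.4
```

Because the height never exceeds 0.5, the values of different pyramids fall into disjoint intervals, which is what lets a one-dimensional index keep them apart.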

Page 10: Making the Pyramid Technique Robust to Query Types and Workloads

Sensitivity to location of query window / data distribution

Page 11: Making the Pyramid Technique Robust to Query Types and Workloads

Sensitivity to shape of query

Page 12: Making the Pyramid Technique Robust to Query Types and Workloads

The P+-tree

• Divide the data space into subspaces

– Based on clustering

– Divide in the dimension where the two clusters differ the most

• Transform the points in each subspace

– Transform a subspace to the unit hyper-cube, $[s_i^{\min}, s_i^{\max}]^d \to [0, 1]^d$, so that the Pyramid technique can be applied

– Move the cluster center to the center of the transformed space $(0.5, 0.5, \ldots, 0.5)$, the case in which the Pyramid technique is efficient

Page 13: Making the Pyramid Technique Robust to Query Types and Workloads

Space division and data transformation

Page 14: Making the Pyramid Technique Robust to Query Types and Workloads

Transformation function

• A set of $d$ functions, $t_0, t_1, \ldots, t_{d-1}$

• Requirements:

– $t_i$ is a bijection from $[s_i^{\min}, s_i^{\max}]$ to $[0, 1]$

– $t_i$ is monotonic

– $t_i(c_i) = 0.5$

• In equations:

– $t_i(s_i^{\min}) = 0$

– $t_i(s_i^{\max}) = 1$

– $t_i(c_i) = 0.5$

Page 15: Making the Pyramid Technique Robust to Query Types and Workloads

Transformation function

• $t_i(x) = (a_i x - b_i)^{e_i}$, $i = 0, 1, \ldots, d-1$

• For subspace $[s_0^{\min}, s_0^{\max}], [s_1^{\min}, s_1^{\max}], \ldots, [s_{d-1}^{\min}, s_{d-1}^{\max}]$

$a_i = 1/(s_i^{\max} - s_i^{\min})$

$b_i = s_i^{\min}/(s_i^{\max} - s_i^{\min})$

$e_i = -1/\log_2(a_i c_i - b_i)$
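The following sketch builds one such transformation for a single dimension and checks it against the three requirements. It assumes the denominators are $s_i^{\max} - s_i^{\min}$, since that is the sign that yields $t_i(s_i^{\min}) = 0$ and $t_i(s_i^{\max}) = 1$, and it assumes the cluster center lies strictly inside the interval. The function name and the sample numbers are hypothetical.

```python
import math

def make_transform(s_min, s_max, c):
    """Build t(x) = (a*x - b)**e for one dimension of a subspace.

    Assumes s_min < c < s_max and uses denominators (s_max - s_min).
    """
    a = 1.0 / (s_max - s_min)
    b = s_min / (s_max - s_min)
    # Chosen so that (a*c - b)**e == 0.5, i.e. the center maps to 0.5.
    e = -1.0 / math.log2(a * c - b)

    def t(x):
        return (a * x - b) ** e

    return t, (a, b, e)

# Hypothetical subspace extent [2, 10] with cluster center 4 in this dimension.
t, (a, b, e) = make_transform(2.0, 10.0, 4.0)
```

With these numbers the linear part maps the center to 0.25, so the exponent comes out as 0.5 and the square root lifts 0.25 to 0.5.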

Page 16: Making the Pyramid Technique Robust to Query Types and Workloads

The space-tree

The subspace number SNo and the parameters $a_i$, $b_i$, $e_i$ are stored in the leaf nodes

Page 17: Making the Pyramid Technique Robust to Query Types and Workloads

Space division algorithm

• Cluster the data

• Recursively divide the space into two subspaces in the dimension where the two cluster centers differ the most

• Build the space-tree
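One division step might look like the sketch below. The two cluster centers are taken as given (the slides do not name a clustering algorithm), and placing the cut midway between the centers is purely an assumption of this example, since the slides do not say where the cut goes. A subspace is represented as a list of `(low, high)` pairs, one per dimension; all names are hypothetical.

```python
def split_dimension(c1, c2):
    """Dimension in which the two cluster centers differ the most."""
    return max(range(len(c1)), key=lambda j: abs(c1[j] - c2[j]))

def divide(bounds, c1, c2):
    """Split a subspace, given as a list of (low, high) pairs, into two.

    Assumption: the cut is placed midway between the two centers.
    """
    j = split_dimension(c1, c2)
    cut = (c1[j] + c2[j]) / 2.0
    low_half = list(bounds)
    high_half = list(bounds)
    low_half[j] = (bounds[j][0], cut)
    high_half[j] = (cut, bounds[j][1])
    return j, cut, low_half, high_half

# Hypothetical unit square with two cluster centers.
j, cut, low_half, high_half = divide([(0.0, 1.0), (0.0, 1.0)],
                                     (0.2, 0.5), (0.8, 0.6))
```

Applying `divide` again to each half, with the centers of the clusters that fall inside it, gives the recursion; each internal node of the space-tree would then record `j` and `cut`.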

Page 18: Making the Pyramid Technique Robust to Query Types and Workloads

Build the P+-tree

• The P+-tree is in effect a B+-tree that stores the data points in the leaf nodes, with the P+-value as key

• P+-value: $SNo \cdot 2d + pv(v')$

• For a newly inserted point $v$, traverse the space-tree to determine the subspace it belongs to

• Transform the point $v$ to $v'$ and calculate its P+-value

• Insert the point $v$, with its P+-value as key
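The key computation can be sketched as follows. A sorted Python list maintained with `bisect.insort` stands in for the B+-tree, the pyramid value follows the usual definition of the Pyramid technique, and the transformed points passed in are hypothetical values assumed to have been produced already by the subspace's transformation function.

```python
import bisect

def pyramid_value(v):
    """Pyramid value i + h of a point v in [0, 1]^d (usual definition)."""
    d = len(v)
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    i = j_max if v[j_max] < 0.5 else j_max + d
    return i + abs(0.5 - v[j_max])

def p_plus_value(sno, v_prime):
    """P+-value of a transformed point: SNo * 2d + pv(v')."""
    d = len(v_prime)
    return sno * 2 * d + pyramid_value(v_prime)

def insert(index, sno, v, v_prime):
    """Store the original point v under the key computed from v'."""
    bisect.insort(index, (p_plus_value(sno, v_prime), v))

index = []  # sorted list standing in for the B+-tree leaf level
insert(index, 1, (7.0, 3.0), (0.5, 0.9))  # subspace 1
insert(index, 0, (1.0, 2.0), (0.2, 0.6))  # subspace 0
```

Pyramid values stay below $2d$, so the offset $SNo \cdot 2d$ gives every subspace its own key range and one B+-tree can hold all subspaces.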

Page 19: Making the Pyramid Technique Robust to Query Types and Workloads

Window search algorithm

• Traverse the space-tree to see which subspaces are intersected by the query

• For each intersected subspace, transform the query according to the transformation function for the subspace

• Search the subspace according to the transformed query
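The second step, transforming the query for one intersected subspace, can be sketched as below. The sketch assumes each per-dimension transformation is monotonically increasing, so the endpoints of the clipped query interval map to the endpoints of the transformed interval. For brevity the demonstration uses plain linear rescaling as the transformation, which corresponds to a cluster center at the midpoint of the subspace; all names are hypothetical.

```python
def transform_query(q_lows, q_highs, bounds, transforms):
    """Clip a window query to one subspace and map it into [0, 1]^d.

    Returns None if the query does not intersect the subspace.
    """
    new_lows, new_highs = [], []
    for lo, hi, (s_min, s_max), t in zip(q_lows, q_highs, bounds, transforms):
        # Restrict the query to the part that lies inside the subspace.
        lo, hi = max(lo, s_min), min(hi, s_max)
        if lo > hi:
            return None
        # Monotonically increasing t keeps the interval orientation.
        new_lows.append(t(lo))
        new_highs.append(t(hi))
    return new_lows, new_highs

# Hypothetical subspace [0, 10] x [0, 10] with linear transformations.
bounds = [(0.0, 10.0), (0.0, 10.0)]
transforms = [lambda x, lo=lo, hi=hi: (x - lo) / (hi - lo) for lo, hi in bounds]

hit = transform_query((-5.0, 2.0), (5.0, 4.0), bounds, transforms)
miss = transform_query((12.0, 0.0), (15.0, 1.0), bounds, transforms)
```

The transformed window would then be answered with the ordinary Pyramid technique window search inside that subspace's key range.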

Page 20: Making the Pyramid Technique Robust to Query Types and Workloads

kNN search algorithm

• Start from a small window query

• Gradually increase the side length of the query window until the k nearest neighbors are found
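A sketch of this expanding-window idea is given below, with a linear scan standing in for the index-based window search. The stopping rule is an assumption of the example, since the slides do not spell one out: the search stops once at least k points lie within Euclidean distance of half the window side, because any point outside the window is farther away than that. The initial side length and growth factor are arbitrary illustrative defaults.

```python
import math

def knn_by_window(points, q, k, side=0.1, growth=2.0):
    """Find the k nearest neighbors of q by enlarging a window centered on q."""
    if k > len(points):
        raise ValueError("k exceeds the number of points")
    while True:
        half = side / 2.0
        # Window query: a hyper-cube of the current side length around q.
        inside = [p for p in points
                  if all(abs(x - c) <= half for x, c in zip(p, q))]
        # Points within distance `half` of q cannot be beaten by any point
        # outside the window, so k such points are the true k nearest.
        close = sorted((p for p in inside if math.dist(p, q) <= half),
                       key=lambda p: math.dist(p, q))
        if len(close) >= k:
            return close[:k]
        side *= growth

# Hypothetical two-dimensional data set used only for illustration.
points = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.4), (0.45, 0.6)]
neighbors = knn_by_window(points, (0.5, 0.5), 2)
```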

Page 21: Making the Pyramid Technique Robust to Query Types and Workloads

Experiments: Window Queries

Page 22: Making the Pyramid Technique Robust to Query Types and Workloads

Experiments: Partial Window Queries

Page 23: Making the Pyramid Technique Robust to Query Types and Workloads

Experiments: kNN Queries