Making the Pyramid Technique Robust to Query Types and Workloads
Rui Zhang, Beng Chin Ooi, Kian-Lee Tan
Department of Computer Science
National University of Singapore
Singapore
Outline
• Background
• Existing work and limitations
• Our proposal: the P+-tree
• Experimental results
• Conclusion
Problem & Motivation
Problem:
Indexing multidimensional point data
Applications:
• Low-dimensional: GIS, CAD, medical images (X-rays, MRI brain scans)
• High-dimensional: image databases, video databases, data warehouses
Typical Query Types
• Point Query
• Window Query
$[q_0^{\min}, q_0^{\max}] \times [q_1^{\min}, q_1^{\max}] \times \dots \times [q_{d-1}^{\min}, q_{d-1}^{\max}]$
• Range Query
Query point $X = (x_0, x_1, \dots, x_{d-1})$ and radius $r$
• K-Nearest Neighbor Query (kNN query)
Query point $X = (x_0, x_1, \dots, x_{d-1})$ and number of neighbors $k$
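As a reference for what each query type returns, here is a brute-force sketch over a plain list of points. The function names are hypothetical, and Euclidean distance is an assumption for the range and kNN queries (the slides do not name a metric). An index such as the P+-tree should return the same answers without scanning every point.

```python
import heapq
import math

def point_query(points, x):
    """All stored points equal to x."""
    return [p for p in points if tuple(p) == tuple(x)]

def window_query(points, q_min, q_max):
    """Points inside [q_0min, q_0max] x ... x [q_(d-1)min, q_(d-1)max]."""
    return [p for p in points
            if all(lo <= v <= hi for v, lo, hi in zip(p, q_min, q_max))]

def range_query(points, x, r):
    """Points within Euclidean distance r of the query point x (assumed metric)."""
    return [p for p in points if math.dist(p, x) <= r]

def knn_query(points, x, k):
    """The k points closest to x under Euclidean distance (assumed metric)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, x))
```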
Existing work: Four Strategies
• Data partitioning: R-tree family
• Space partitioning: k-d-tree family
• Dimensionality reduction: mapping points to one dimension
• Data compression: VA-file, IQ-tree
Existing work: Comparison
• Low-dimensional space
  – The R-tree family structures
• High-dimensional space
  – Window query: the Pyramid technique, iMinMax
  – kNN query: the IQ-tree, iDistance
Existing work: Limitations
• Limited to specific query types
  – The Pyramid technique, iMinMax: window queries
  – iDistance, the IQ-tree: kNN queries
• Limited to certain workloads
  – The Pyramid technique: efficient only for hypercube-shaped window queries located around the center of the data space
Our proposal: the P+-tree
• Based on the Pyramid technique
• Supports both window and kNN queries
• Robust under different workloads
Review of the Pyramid Tech.
• $i$: pyramid number
• $h_v$: height of point $v$, measured in dimension $i$ (if $i < d$) or dimension $i - d$ (if $i \ge d$)
• Pyramid value: $pv_v = i + h_v$
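A minimal sketch of the pyramid value for a point $v \in [0,1]^d$, following the standard Pyramid technique: the pyramid number comes from the dimension $j_{\max}$ in which $v$ deviates most from the center 0.5, and the height is $|0.5 - v_{i \bmod d}|$. The function name is illustrative, and breaking ties toward the lowest dimension is an assumption of this sketch.

```python
def pyramid_value(v):
    """Pyramid value pv_v = i + h_v of a point v in [0, 1]^d."""
    d = len(v)
    # Dimension where v is farthest from the center 0.5 (ties: lowest index).
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramid number: j_max below the center, j_max + d above it.
    i = j_max if v[j_max] < 0.5 else j_max + d
    # Height: distance from the center in dimension i (i < d) or i - d (i >= d).
    h = abs(0.5 - v[i % d])
    return i + h
```

Because $i \in \{0, \dots, 2d-1\}$ and $h_v \in [0, 0.5]$, every pyramid value lies in $[0, 2d)$, which is what lets a one-dimensional B+-tree index the points by this key.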
Sensitivity to the location of the query window and to the data distribution
Sensitivity to the shape of the query
The P+-tree
• Divide the data space into subspaces
  – Based on clustering
  – Divide in the dimension where the two cluster centers differ the most
• Transform the points in each subspace
  – Map the subspace onto the unit hypercube, $[s_0^{\min}, s_0^{\max}] \times \dots \times [s_{d-1}^{\min}, s_{d-1}^{\max}] \to [0, 1]^d$, so that the Pyramid technique can be applied
  – Move the cluster center to the center of the transformed space, $(0.5, 0.5, \dots, 0.5)$, the case in which the Pyramid technique is efficient
Space division and data transformation
Transformation function
• A set of d functions, $t_0, t_1, \dots, t_{d-1}$
• Requirements:
  – $t_i$ is a bijection from $[s_i^{\min}, s_i^{\max}]$ to $[0, 1]$
  – $t_i$ is monotonic
  – $t_i(c_i) = 0.5$, where $c_i$ is the i-th coordinate of the cluster center
• In equations:
  – $t_i(s_i^{\min}) = 0$
  – $t_i(s_i^{\max}) = 1$
  – $t_i(c_i) = 0.5$
Transformation function
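• $t_i(x) = (a_i x - b_i)^{e_i}$, $i = 0, 1, \dots, d-1$
• For the subspace $[s_0^{\min}, s_0^{\max}] \times [s_1^{\min}, s_1^{\max}] \times \dots \times [s_{d-1}^{\min}, s_{d-1}^{\max}]$:
  $a_i = 1/(s_i^{\max} - s_i^{\min})$
  $b_i = s_i^{\min}/(s_i^{\max} - s_i^{\min})$
  $e_i = -1/\log_2(a_i c_i - b_i)$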
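A per-dimension sketch of $t_i$, written to check the three requirements numerically. With the coefficients above, $a_i x - b_i = (x - s_i^{\min})/(s_i^{\max} - s_i^{\min})$ runs from 0 to 1, and solving $(a_i c_i - b_i)^{e_i} = 0.5$ gives the stated $e_i$. The function name `make_transform` is hypothetical, and the sketch assumes $c_i$ lies strictly inside $(s_i^{\min}, s_i^{\max})$, otherwise $e_i$ is undefined.

```python
import math

def make_transform(s_min, s_max, c):
    """Sketch of t_i(x) = (a_i*x - b_i)**e_i for one dimension of a subspace.

    Maps [s_min, s_max] monotonically onto [0, 1] with t(c) = 0.5.
    Returns the function and the parameters (a, b, e) a space-tree leaf would store.
    """
    if not s_min < c < s_max:
        raise ValueError("cluster center c must lie strictly inside (s_min, s_max)")
    a = 1.0 / (s_max - s_min)
    b = s_min / (s_max - s_min)
    u_c = (c - s_min) / (s_max - s_min)   # algebraically equal to a*c - b
    e = -1.0 / math.log2(u_c)             # solves u_c**e = 0.5; e > 0

    def t(x):
        # (x - s_min)/(s_max - s_min) equals a*x - b, but this form avoids a
        # tiny negative rounding error at x = s_min before the fractional power.
        u = (x - s_min) / (s_max - s_min)
        return u ** e

    return t, (a, b, e)
```

Since $e_i > 0$, $t_i$ is increasing, so interval endpoints map to interval endpoints; this is what makes the later query transformation straightforward.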
The space-tree
The subspace number SNo and the parameters $a_i$, $b_i$, $e_i$ of each subspace are stored in the leaf nodes
Space division algorithm
• Cluster the data
• Divide the space into two subspaces in the dimension where the two cluster centers differ the most (recursively)
• Build the space-tree
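A sketch of the space division and the resulting space-tree, under loudly stated assumptions: the slides do not say which clustering algorithm, split position, or stopping rule is used, so this uses plain 2-means clustering, splits at the midpoint of the two cluster centers, stops when a subspace holds at most `max_points` points, and takes each leaf's center as the mean of its points. All names (`Node`, `two_means`, `build_space_tree`, `locate`) are hypothetical. Leaves keep their bounds and center, from which $a_i$, $b_i$, $e_i$ can be derived.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    dim: int = -1                      # split dimension (internal nodes)
    split: float = 0.0                 # split value (internal nodes)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    sno: int = -1                      # subspace number SNo (leaves only, >= 0)
    s_min: List[float] = field(default_factory=list)
    s_max: List[float] = field(default_factory=list)
    center: List[float] = field(default_factory=list)

def _mean(pts):
    return [sum(c) / len(pts) for c in zip(*pts)]

def _d2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def two_means(points, iters=10):
    """Plain 2-means (an assumed stand-in for the unspecified clustering step)."""
    d = len(points[0])
    # Seed with the two extreme points along the dimension of widest spread.
    j = max(range(d), key=lambda k: max(p[k] for p in points) - min(p[k] for p in points))
    c0 = list(min(points, key=lambda p: p[j]))
    c1 = list(max(points, key=lambda p: p[j]))
    for _ in range(iters):
        g0 = [p for p in points if _d2(p, c0) <= _d2(p, c1)]
        g1 = [p for p in points if _d2(p, c0) > _d2(p, c1)]
        if not g0 or not g1:
            break
        c0, c1 = _mean(g0), _mean(g1)
    return c0, c1

def build_space_tree(points, s_min, s_max, max_points, counter=None):
    """Recursively split where the two cluster centers differ the most."""
    counter = counter if counter is not None else [0]
    if len(points) > max_points:
        c0, c1 = two_means(points)
        dim = max(range(len(c0)), key=lambda k: abs(c0[k] - c1[k]))
        split = (c0[dim] + c1[dim]) / 2    # assumed: midpoint of the two centers
        lo = [p for p in points if p[dim] < split]
        hi = [p for p in points if p[dim] >= split]
        if lo and hi:                      # otherwise fall through and make a leaf
            left_max = list(s_max); left_max[dim] = split
            right_min = list(s_min); right_min[dim] = split
            return Node(dim=dim, split=split,
                        left=build_space_tree(lo, s_min, left_max, max_points, counter),
                        right=build_space_tree(hi, right_min, s_max, max_points, counter))
    leaf = Node(sno=counter[0], s_min=list(s_min), s_max=list(s_max), center=_mean(points))
    counter[0] += 1
    return leaf

def locate(node, v):
    """Descend the space-tree to the leaf (subspace) that contains point v."""
    while node.sno < 0:
        node = node.left if v[node.dim] < node.split else node.right
    return node
```

`locate` uses the same `<` test as the partitioning step, so every point is routed to the subspace it was assigned to during construction.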
Build the P+-tree
• The P+-tree is in effect a B+-tree that stores the data points in its leaf nodes, with their P+-values as keys
• P+-value: $SNo \cdot 2d + pv_{v'}$, where $v'$ is the transformed point
• To insert a new point v:
  – Traverse the space-tree to determine the subspace v belongs to
  – Transform v to v' and calculate its P+-value
  – Insert v with its P+-value as the key
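A sketch of the P+-value for a point whose subspace is already known. Because a pyramid value always lies in $[0, 2d)$, offsetting it by $SNo \cdot 2d$ gives each subspace its own disjoint key range in the B+-tree. The `Subspace` record and function names are hypothetical, and the subspace lookup itself (the space-tree traversal) is not repeated here.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Subspace:
    sno: int                 # subspace number SNo
    s_min: List[float]       # lower bounds s_i^min
    s_max: List[float]       # upper bounds s_i^max
    center: List[float]      # cluster center c_i, assumed strictly inside the bounds

    def transform(self, v):
        """Apply t_i(x) = (a_i*x - b_i)**e_i in every dimension, giving v'."""
        out = []
        for x, lo, hi, c in zip(v, self.s_min, self.s_max, self.center):
            e = -1.0 / math.log2((c - lo) / (hi - lo))
            out.append(((x - lo) / (hi - lo)) ** e)
        return out

def pyramid_value(v):
    """Standard Pyramid technique value of v in [0, 1]^d."""
    d = len(v)
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    i = j_max if v[j_max] < 0.5 else j_max + d
    return i + abs(0.5 - v[i % d])

def p_plus_value(subspace, v):
    """P+-value = SNo * 2d + pv(v'); keys of different subspaces never overlap."""
    d = len(v)
    return subspace.sno * 2 * d + pyramid_value(subspace.transform(v))
```

For example, in a 2-d subspace with SNo = 1, a point that lands exactly on its cluster center gets $v' = (0.5, 0.5)$ and the key $1 \cdot 4 + 2 = 6$.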
Window search algorithm
• Traverse the space-tree to find the subspaces that intersect the query window
• For each intersected subspace, transform the query window with that subspace's transformation functions
• Search the subspace with the transformed query, using the Pyramid technique's window search
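A sketch of the query transformation step for one subspace. Since each $t_i$ is increasing and acts on one dimension, the transformed window is again an axis-aligned hyper-rectangle, obtained by mapping the endpoints of the window clipped to the subspace. Clipping to the subspace bounds is an assumption of this sketch (each $t_i$ is only defined on $[s_i^{\min}, s_i^{\max}]$), and the function names are hypothetical. The Pyramid technique's own window search on the result is not shown.

```python
import math

def transform_coord(x, lo, hi, c):
    """t_i(x) = ((x - lo)/(hi - lo))**e_i, with e_i chosen so that t_i(c) = 0.5."""
    e = -1.0 / math.log2((c - lo) / (hi - lo))
    return ((x - lo) / (hi - lo)) ** e

def transform_window(q_min, q_max, s_min, s_max, center):
    """Map a query window into one subspace's transformed [0, 1]^d space.

    Returns (new_min, new_max), or None if the window misses the subspace.
    """
    out_min, out_max = [], []
    for qa, qb, lo, hi, c in zip(q_min, q_max, s_min, s_max, center):
        a, b = max(qa, lo), min(qb, hi)    # clip the window to the subspace
        if a > b:
            return None                    # no overlap in this dimension
        # t_i is increasing, so the clipped endpoints map to the new endpoints.
        out_min.append(transform_coord(a, lo, hi, c))
        out_max.append(transform_coord(b, lo, hi, c))
    return out_min, out_max
```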
KNN search algorithm
• Start from a small window query
• Gradually increase the side length of the query window until the k nearest neighbors are found
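A sketch of the expanding-window kNN idea, assuming Euclidean distance, a doubling growth factor, and a brute-force function standing in for the P+-tree window search (all hypothetical choices). The stopping rule matters: a window of half side length $r$ also contains corner points farther than $r$, so the search stops only when at least $k$ found points lie within distance $r$. Any point outside the window is farther than $r$, so those $k$ are the true nearest neighbors.

```python
import math

def brute_force_window(points, lo, hi):
    """Hypothetical stand-in for a P+-tree window search."""
    return [p for p in points
            if all(l <= x <= h for x, l, h in zip(p, lo, hi))]

def knn_by_expanding_window(points, q, k, r0=0.01, growth=2.0):
    """k nearest neighbours of q found by repeatedly enlarging a window query."""
    if k <= 0:
        return []
    r = r0                                   # half of the window's side length
    while True:
        lo = [x - r for x in q]
        hi = [x + r for x in q]
        found = brute_force_window(points, lo, hi)
        found.sort(key=lambda p: math.dist(p, q))
        # Points outside the window are farther than r, so candidates within
        # distance r cannot be beaten by anything not yet retrieved.
        certain = [p for p in found if math.dist(p, q) <= r]
        if len(certain) >= k or len(found) == len(points):
            return found[:k]
        r *= growth
```

The second exit condition covers data sets with fewer than $k$ points: once the window holds every point, all distances are known and the answer is exact.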
Experiments: Window Queries
Experiments: Partial Window Queries
Experiments: kNN Queries