The R* Tree
-
7/30/2019 The R* Tree
1/38
An Efficient and Robust Access Method for Points and Rectangles
Norbert Beckmann, Hans-Peter Kriegel,
Ralf Schneider, Bernhard Seeger
Presented by: Alok Kumar Yadav (2009cs10177)
-
Overview
Introduction
R-tree and Its Optimization
R-tree Variants
R*-tree
Experiments
Conclusions
-
Introduction
Spatial Access Methods (SAMs):
Each complex spatial object is approximated by its minimum bounding rectangle (MBR)
Pros: A complex object can be represented by a limited number of bytes
The MBR preserves the most essential geometric properties, i.e. location and extension
Cons: A lot of information is lost
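As a minimal sketch of the approximation step, assuming 2D objects given as lists of vertices and rectangles stored as (xmin, ymin, xmax, ymax) tuples (both representations are illustrative choices, not taken from the slides):

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def mbr(points: Iterable[Point]) -> Rect:
    """Return the minimum bounding rectangle of a set of 2D points."""
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))

# An L-shaped building outline with hypothetical coordinates.
building = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
box = mbr(building)
```

Six vertices collapse into four numbers: location and extension survive, the L shape does not.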
-
Introduction (Cont..)
R-tree:
B+-tree-like structure
Popular access method for rectangles
Based on the heuristic optimization of the area of the enclosing rectangles in each inner node
R*-tree:
Combined optimization: area, margin and overlap
Outperforms existing R-tree variants
Efficiently supports point and spatial data
-
Introduction (Cont..)
Given a city map, index all university buildings in an efficient structure for quick topological search.
-
Introduction (Cont..)
-
Introduction (Cont..)
MBR of the city neighbourhoods.
MBR of the city, defining the overall search region.
-
R-tree
B+-tree-like structure
Nodes hold entries E = (cp, rectangle)
cp: child pointer; for leaves it refers to a record in the database
rectangle: MBR of all rectangles in the child node; for leaves it is the enclosing rectangle of the spatial object
The maximum number of entries in a node is M, the minimum is m
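The node layout described here could be sketched as follows. The class names, the `record_id` field and the capacities M = 4, m = 2 are assumptions made for illustration only:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

M, m = 4, 2  # example node capacities (assumed values)

@dataclass
class Entry:
    rect: Rect                       # MBR of the child node, or of the object in a leaf
    child: Optional["Node"] = None   # cp in an inner node
    record_id: Optional[int] = None  # cp in a leaf: hypothetical record handle

@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

def node_mbr(node: Node) -> Rect:
    """MBR of all entry rectangles stored in a node."""
    xmins, ymins, xmaxs, ymaxs = zip(*(e.rect for e in node.entries))
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))

leaf_a = Node(True, [Entry((0, 0, 1, 1), record_id=1),
                     Entry((2, 2, 3, 4), record_id=2)])
leaf_b = Node(True, [Entry((6, 0, 7, 1), record_id=3),
                     Entry((8, 1, 9, 3), record_id=4)])
# Each directory entry stores the MBR of the node its pointer refers to.
root = Node(False, [Entry(node_mbr(leaf_a), child=leaf_a),
                    Entry(node_mbr(leaf_b), child=leaf_b)])
```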
-
R-tree (Cont..): Structure
[Figure: R-tree structure, showing entries I(A), I(B), ..., I(M) and I(a), I(b), I(c), I(d), with rectangle B and rectangles a, b, c, d]
-
R-tree (Cont..)
B+-tree-like structure
Nodes hold entries E = (cp, rectangle)
cp: child pointer; for leaves it refers to a record in the database
rectangle: MBR of all rectangles in the child node; for leaves it is the enclosing rectangle of the spatial object
Optimization criterion: minimization of the area of the enclosing rectangles in each inner node
Directory rectangles are allowed to overlap, hence a single search path cannot be guaranteed
-
R-tree Variants
The R-tree is dynamic, hence all optimizations have to be applied during insertion
Insertion Algorithm:
ChooseSubtree Algorithm: finds the most suitable subtree for the new entry
Split Algorithm: invoked if ChooseSubtree ends in a node already filled with M entries; a node holding more than M entries has its entries distributed over two nodes in an appropriate manner
-
R-tree Variants
Original R-tree: Guttman
Greene's R-tree
-
Original R-tree: Guttman
Method of optimization: minimize the directory rectangle area
Split algorithms: exponential, linear, quadratic
Exponential finds the best split, but its CPU cost is too high
The others are approximations
Quadratic outperforms linear
-
Guttman's ChooseSubtree
CS1 Set N to be the root node
CS2 If N is a leaf,
return N
else
Choose the entry in N whose rectangle needs least area enlargement to include the new data. Resolve ties by choosing the entry with the rectangle of smallest area
end
CS3 Set N to be the child node pointed to by the child pointer of the chosen entry. Repeat from CS2
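Steps CS1 to CS3 can be sketched in Python roughly as below. The `Node` class with `(rect, child)` pairs and the (xmin, ymin, xmax, ymax) tuples are assumptions made for this illustration, not the presenter's code:

```python
class Node:
    def __init__(self, is_leaf, entries=None):
        self.is_leaf = is_leaf
        self.entries = entries or []  # inner node: list of (rect, child) pairs

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(r, new):
    """Area the rectangle r must grow by to include the rectangle new."""
    return area(union(r, new)) - area(r)

def choose_subtree(root, new_rect):
    node = root                                   # CS1
    while not node.is_leaf:                       # CS2
        # Least area enlargement; ties go to the smaller rectangle.
        _, child = min(
            node.entries,
            key=lambda e: (enlargement(e[0], new_rect), area(e[0])),
        )
        node = child                              # CS3
    return node

left, right = Node(True), Node(True)
root = Node(False, [((0, 0, 2, 2), left), ((5, 5, 9, 9), right)])
target = choose_subtree(root, (1, 1, 3, 3))
```

Here the left rectangle needs 5 units of extra area and the right one 48, so the descent ends in `left`.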
-
Guttman's Split Algorithm
Quadratic Split [Divide a set of M+1 index entries into two groups]
QS1 Invoke PickSeeds to choose two entries, each to be the first entry of one group
QS2 Repeat
DistributeEntry
until
all entries are distributed or one of the two groups has M-m+1 entries (so that the other group has m entries)
QS3 If entries remain, assign them to the other group so that it has the minimum number m required
-
Guttman's Split Algorithm (Cont..)
PickSeeds
PS1 For each pair of entries E1 and E2, compose a rectangle R including E1 rectangle and E2 rectangle
Calculate d = area(R) - area(E1 rectangle) - area(E2 rectangle)
PS2 Choose the pair with the largest d
DistributeEntry
DE1 Invoke PickNext to choose the next entry to be assigned
DE2 Add it to the group whose covering rectangle will have to be enlarged least to accommodate it. Resolve ties by adding the entry to the group with the smallest area, then to the one with the fewer entries, then to either
-
Guttman's Split Algorithm (Cont..)
PickNext
PN1 For each entry E not yet in a group, calculate d1 = the area increase required in the covering rectangle of Group 1 to include E rectangle.
Calculate d2 analogously for Group 2.
PN2 Choose the entry with the maximum difference between d1 and d2
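Put together, QuadraticSplit, PickSeeds, DistributeEntry and PickNext might look like the sketch below. It works on bare rectangle tuples (xmin, ymin, xmax, ymax) rather than full index entries, which is a simplification assumed here:

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enl(box, r):
    """Area increase needed for box to include r."""
    return area(union(box, r)) - area(box)

def quadratic_split(rects, m):
    """Split M+1 rectangles into two groups with at least m entries each."""
    # PickSeeds: the pair that wastes the most area when put together.
    pairs = [(i, j) for i in range(len(rects)) for j in range(i + 1, len(rects))]
    i, j = max(pairs, key=lambda p: area(union(rects[p[0]], rects[p[1]]))
               - area(rects[p[0]]) - area(rects[p[1]]))
    g1, g2 = [rects[i]], [rects[j]]
    bb1, bb2 = rects[i], rects[j]
    remaining = [r for k, r in enumerate(rects) if k not in (i, j)]
    max_size = len(rects) - m  # M - m + 1
    while remaining:
        # QS3: once one group is full, the rest goes to the other group.
        if len(g1) == max_size:
            g2.extend(remaining)
            break
        if len(g2) == max_size:
            g1.extend(remaining)
            break
        # PickNext: entry with the largest difference between d1 and d2.
        e = max(remaining, key=lambda r: abs(enl(bb1, r) - enl(bb2, r)))
        remaining.remove(e)
        # DistributeEntry: least enlargement, then smaller area, then fewer entries.
        if (enl(bb1, e), area(bb1), len(g1)) <= (enl(bb2, e), area(bb2), len(g2)):
            g1.append(e)
            bb1 = union(bb1, e)
        else:
            g2.append(e)
            bb2 = union(bb2, e)
    return g1, g2

rects = [(0, 0, 1, 1), (1, 0, 2, 1), (8, 8, 9, 9), (9, 8, 10, 9), (0, 1, 1, 2)]
group1, group2 = quadratic_split(rects, m=2)
```

With M = 4 and m = 2, the three rectangles near the origin end up in one group and the two near (9, 9) in the other.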
-
Problems
Small seeds:
If a far-away rectangle agrees with one seed in d-1 of the d axes, a needle-like bounding rectangle may be formed
This may initiate a bad split
[Figure: rectangles R1, R2 and R3]
-
Problems (Cont..)
Preference for the bounding rectangle:
The algorithm prefers the MBR created by the previous assignments
Since it was already enlarged, it requires less area enlargement to include the next entry
If a group reaches M-m+1 entries, the rest are assigned to the other group without considering geometric properties
[Figure: rectangles W, X, Y, Z and groups G1, G2]
-
Greene's R-tree
ChooseSubtree is the same as Guttman's
Alternative split algorithm:
Invokes PickSeeds to find the two most distant rectangles
Picks an axis using these two rectangles, depending on their separation distance
Sorts the remaining rectangles along the chosen axis
Distributes half of the entries to one group and the remainder to the other
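A rough sketch of this split follows, under several assumptions that the slide does not spell out: seeds are chosen with Guttman's quadratic PickSeeds, the separation along an axis is the gap between the two seeds divided by the node's extent on that axis, all entries are sorted by their lower coordinate, and with an odd count the extra entry simply goes to the second group.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def greene_split(rects):
    """Return (axis, group1, group2) for 2D rectangles (xmin, ymin, xmax, ymax)."""
    n = len(rects)
    # Seeds: the pair wasting the most area (assumed PickSeeds variant).
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    i, j = max(pairs, key=lambda p: area(union(rects[p[0]], rects[p[1]]))
               - area(rects[p[0]]) - area(rects[p[1]]))
    a, b = rects[i], rects[j]
    box = rects[0]
    for r in rects[1:]:
        box = union(box, r)

    def separation(axis):
        lo, hi = axis, axis + 2
        gap = max(a[lo], b[lo]) - min(a[hi], b[hi])  # negative if seeds overlap
        extent = box[hi] - box[lo]
        return gap / extent if extent > 0 else 0.0

    axis = max((0, 1), key=separation)
    ordered = sorted(rects, key=lambda r: r[axis])
    half = n // 2
    return axis, ordered[:half], ordered[half:]

rects = [(0, 0, 1, 1), (0, 5, 1, 6), (0, 9, 1, 10), (1, 2, 2, 3)]
axis, low_group, high_group = greene_split(rects)
```

In this example the seeds are separated only along y, so the split is made along axis 1.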
-
Greene's R-tree (Cont..)
In some situations it cannot find the right axis, and a bad split may occur
-
Inspiration For R*-tree
Minimize overlap between directory rectangles: decreases the number of paths to be traversed
Minimize the margin of a directory rectangle: the rectangle becomes more quadratic (closer to a square)
Optimize space utilization: the height of the tree will be low
-
R*-tree
The structure is the same as the R-tree's
For insertion, the earlier R-tree versions consider only area
The R*-tree considers area, margin and overlap in different combinations
Overlap is defined as
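In the R*-tree paper, the overlap of an entry E_k in a node with entries E_1, ..., E_p is the sum, over all i ≠ k, of area(E_k rectangle ∩ E_i rectangle). A small Python sketch of that sum, assuming 2D rectangles stored as (xmin, ymin, xmax, ymax) tuples:

```python
def intersection_area(a, b):
    """Area shared by two axis-parallel rectangles (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def overlap(k, rects):
    """Overlap of entry k: summed intersection area with every other entry."""
    return sum(intersection_area(rects[k], r)
               for i, r in enumerate(rects) if i != k)

rects = [(0, 0, 4, 4), (2, 2, 6, 6), (3, 0, 5, 1), (10, 10, 11, 11)]
```

For the first rectangle the sum is 4 (shared with the second) plus 1 (shared with the third), so `overlap(0, rects)` is 5.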
-
R*-tree: ChooseSubtree
Similar to the original one; the only difference is that it minimizes overlap enlargement when the child pointers in N point to leaves
CS1 Set N to be the root node
CS2 If N is a leaf,
return N
else
If the child pointers in N point to leaves [determine the minimum overlap cost],
choose the entry in N whose rectangle needs least overlap enlargement to include the new data rectangle. Resolve ties by choosing the entry whose rectangle needs least area enlargement, then the entry with the rectangle of smallest area
else [determine the minimum area cost]
choose the entry in N whose rectangle needs least area enlargement to include the new data rectangle. Resolve ties by choosing the entry with the rectangle of smallest area
end
CS3 Set N to be the child node pointed to by the child pointer of the chosen entry. Repeat from CS2
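The R*-tree variant of ChooseSubtree could be sketched as below. The `Node` class, the `(rect, child)` pairs and the helper names are assumptions made for illustration; overlap enlargement is computed as the overlap of the enlarged rectangle minus the overlap of the original one.

```python
class Node:
    def __init__(self, is_leaf, entries=None):
        self.is_leaf = is_leaf
        self.entries = entries or []  # inner node: list of (rect, child) pairs

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def inter(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def choose_subtree(root, new):
    node = root                                              # CS1
    while not node.is_leaf:                                  # CS2
        rects = [r for r, _ in node.entries]

        def area_enl(k):
            return area(union(rects[k], new)) - area(rects[k])

        def overlap_enl(k):
            grown = union(rects[k], new)
            others = [r for i, r in enumerate(rects) if i != k]
            return (sum(inter(grown, r) for r in others)
                    - sum(inter(rects[k], r) for r in others))

        if node.entries[0][1].is_leaf:   # child pointers point to leaves
            key = lambda k: (overlap_enl(k), area_enl(k), area(rects[k]))
        else:
            key = lambda k: (area_enl(k), area(rects[k]))
        best = min(range(len(rects)), key=key)
        node = node.entries[best][1]                         # CS3
    return node

leaf_a, leaf_b = Node(True), Node(True)
root = Node(False, [((0, 0, 8, 8), leaf_a), ((8, 6, 16, 14), leaf_b)])
target = choose_subtree(root, (9, 2, 11, 4))
```

In this example the first rectangle needs less area enlargement (24 versus 32), so Guttman's rule would pick `leaf_a`; enlarging it would create 6 units of overlap with the second rectangle, while enlarging the second creates none, so the overlap criterion picks `leaf_b`.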
-
R*-tree: Split Algorithm
Three goodness values:
area-value = area[bb(first group)] + area[bb(second group)]
margin-value = margin[bb(first group)] + margin[bb(second group)]
overlap-value = area[bb(first group) ∩ bb(second group)]
Depending on these values, the final distribution is determined
*bb: bounding box
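The three goodness values can be computed as in the sketch below. It assumes 2D rectangles as (xmin, ymin, xmax, ymax) tuples and takes the margin of a rectangle to be the sum of its edge lengths, i.e. its perimeter in 2D:

```python
def bb(rects):
    """Bounding box of a group of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def margin(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def intersection_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def goodness(group1, group2):
    b1, b2 = bb(group1), bb(group2)
    return {
        "area": area(b1) + area(b2),
        "margin": margin(b1) + margin(b2),
        "overlap": intersection_area(b1, b2),
    }

values = goodness([(0, 0, 2, 2), (1, 1, 3, 3)],
                  [(2, 2, 5, 4), (4, 0, 6, 3)])
```

The two bounding boxes are (0, 0, 3, 3) and (2, 0, 6, 4), giving an area-value of 25, a margin-value of 28 and an overlap-value of 3.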
-
R*-tree: Split Algorithm (Cont..)
Method for a good split:
Along each axis, the entries are sorted first by the lower and then by the upper value of their rectangles
For each sort, M-2m+2 distributions of the M+1 entries are determined
In the k-th distribution (k = 1, ..., M-2m+2), the first group contains the first (m-1)+k entries and the second group contains the remaining entries
For each distribution, the goodness values are computed
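As a small worked example of the counting above, the generator below enumerates the distributions of an already sorted list of M+1 entries. The function name and the use of plain integers as stand-in entries are illustrative assumptions:

```python
def distributions(sorted_entries, m):
    """Yield the M-2m+2 ways to cut M+1 sorted entries into two groups."""
    M = len(sorted_entries) - 1
    for k in range(1, M - 2 * m + 3):   # k = 1, ..., M-2m+2
        cut = (m - 1) + k               # size of the first group
        yield sorted_entries[:cut], sorted_entries[cut:]

# M = 4, m = 2: five entries, two possible distributions.
sizes = [(len(a), len(b)) for a, b in distributions([10, 20, 30, 40, 50], 2)]
```

With M = 4 and m = 2 there are M-2m+2 = 2 distributions, of sizes (2, 3) and (3, 2); both groups always keep at least m entries.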
-
R*-tree: Split Algorithm (Cont..)
Split
S1 Invoke ChooseSplitAxis to determine the axis, perpendicular to which the split is performed
S2 Invoke ChooseSplitIndex to determine the best distribution into two groups along that axis
S3 Distribute the entries into two groups
ChooseSplitAxis
CSA1 For each axis
Sort the entries by the lower, then by the upper value of their rectangles and determine all distributions as described above. Compute S, the sum of all margin-values of the different distributions
end
CSA2 Choose the axis with the minimum S as split axis
ChooseSplitIndex
CSI1 Along the chosen split axis, choose the distribution with the minimum overlap-value. Resolve ties by choosing the distribution with minimum area-value
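Split, ChooseSplitAxis and ChooseSplitIndex can be sketched together as follows. The sketch assumes 2D rectangles as (xmin, ymin, xmax, ymax) tuples, reads "by the lower, then by the upper value" as two separate sorts per axis, and uses the perimeter as margin; all function names are illustrative.

```python
def bb(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def margin(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def inter(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def rstar_split(rects, m):
    """Return (axis, group1, group2) for a list of M+1 rectangles."""
    n = len(rects)

    def cuts(order):
        # First group sizes m, ..., M-m+1, i.e. (m-1)+k for k = 1..M-2m+2.
        return [(order[:c], order[c:]) for c in range(m, n - m + 1)]

    best = None
    for axis in (0, 1):                                    # CSA1
        by_lower = sorted(rects, key=lambda r: r[axis])
        by_upper = sorted(rects, key=lambda r: r[axis + 2])
        candidates = cuts(by_lower) + cuts(by_upper)
        s = sum(margin(bb(a)) + margin(bb(b)) for a, b in candidates)
        if best is None or s < best[0]:                    # CSA2
            best = (s, axis, candidates)
    _, axis, candidates = best
    # CSI1: minimum overlap-value, ties broken by minimum area-value.
    g1, g2 = min(candidates,
                 key=lambda d: (inter(bb(d[0]), bb(d[1])),
                                area(bb(d[0])) + area(bb(d[1]))))
    return axis, g1, g2

rects = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (8, 0, 9, 1), (9, 1, 10, 2)]
axis, group1, group2 = rstar_split(rects, m=2)
```

Here the margin sum S is 88 along x and 144 along y, so the x axis is chosen; both candidate cuts have zero overlap, and the tie is resolved in favour of the one with the smaller area-value (8 instead of 20).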
-
Reinsert
Underfilled nodes in the R-tree are handled by removing the node's entries and reinserting them
This improves retrieval performance
Deletions and reinsertions tune the R-tree, but the structure is otherwise very static
To achieve dynamic reorganization, the R*-tree uses forced reinsertion during the insertion routine
-
Forced Reinsert
If a node is overfilled, the R*-tree selects p entries based on the distance of their centers from the center of the node's MBR
It removes these p entries and adjusts the MBR
It reinserts them, which may avoid a split
If they end up in the same node again, split is called
The CPU cost of an insertion increases, but averaged over a large number of insertions the insertion cost rises by only about 4%, owing to fewer splits and a better structure
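The selection step might be sketched as follows, assuming 2D rectangles as (xmin, ymin, xmax, ymax) tuples and treating p as a plain parameter. The entries farthest from the center are removed here, which matches the statement that outer rectangles are reinserted; the actual reinsertion into the tree is left out.

```python
def bb(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def pick_for_reinsert(rects, p):
    """Return (removed, kept, adjusted MBR) for an overfilled node."""
    cx, cy = center(bb(rects))

    def dist2(r):
        x, y = center(r)
        return (x - cx) ** 2 + (y - cy) ** 2

    ordered = sorted(rects, key=dist2, reverse=True)  # farthest first
    removed, kept = ordered[:p], ordered[p:]
    return removed, kept, bb(kept)

node = [(4, 4, 6, 6), (3, 4, 5, 6), (5, 3, 7, 5), (4, 5, 6, 7), (18, 0, 20, 2)]
removed, kept, new_mbr = pick_for_reinsert(node, p=1)
```

The outlier at (18, 0, 20, 2) is removed and the node's MBR shrinks from (3, 0, 20, 7) to (3, 3, 7, 7), a much more square directory rectangle.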
-
Forced Reinsert: Advantages
More restructuring, fewer splits
Storage utilization is improved
Outer rectangles are reinserted, so directory rectangles become more quadratic, which is a desired property
-
Experiments
Comparison between four R-tree variants:
R-tree with quadratic split algorithm (qua.Gut)
R-tree with linear split algorithm (lin.Gut)
Greene's variant of the R-tree (Greene)
R*-tree
Six data files, each containing about 100,000 2D rectangles
All experiments were measured in number of disk accesses
-
Experiments (Cont..)
Types of queries:
Rectangle intersection query: given a rectangle S, find all rectangles R in the file with R ∩ S ≠ ∅
Point query: given a point P, find all rectangles R in the file with P ∈ R
Rectangle enclosure query: given a rectangle S, find all rectangles R in the file with R ⊇ S
Spatial join: over two files f1 and f2, the set of all pairs of rectangles where the rectangle from f1 intersects the rectangle from f2
Insertion cost and storage utilization were also measured
Seven query files were created
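The query predicates can be written down directly. The sketch below assumes closed 2D rectangles as (xmin, ymin, xmax, ymax) tuples, so rectangles that merely touch count as intersecting, and it scans lists by brute force; it shows what each query returns, not how an R*-tree evaluates it.

```python
def intersects(r, s):
    """R ∩ S ≠ ∅ for closed rectangles."""
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]

def contains_point(r, p):
    """P ∈ R."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def encloses(r, s):
    """R ⊇ S."""
    return r[0] <= s[0] and r[1] <= s[1] and r[2] >= s[2] and r[3] >= s[3]

def spatial_join(f1, f2):
    """All pairs (a, b) with a from f1 intersecting b from f2."""
    return [(a, b) for a in f1 for b in f2 if intersects(a, b)]

data = [(0, 0, 2, 2), (1, 1, 5, 5), (6, 6, 7, 7)]
S = (2, 2, 3, 3)
hits_intersection = [r for r in data if intersects(r, S)]
hits_point = [r for r in data if contains_point(r, (1.5, 1.5))]
hits_enclosure = [r for r in data if encloses(r, S)]
pairs = spatial_join(data, [(4, 4, 6, 6)])
```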
-
Results
The page accesses for queries on the R*-tree are normalized to 100%
The performance of all four R-tree variants is reported relative to this baseline
-
Results (Cont..)
Unweighted average results over all distributions
-
Results (Cont..)
The R*-tree is very efficient as a point access method (PAM)
It even outperforms the very popular 2-level grid file
-
Conclusions
Since area, margin and overlap are all reduced, the R*-tree is very robust against ugly data
Storage utilization is higher, insertion cost is low
Outperforms all existing R-tree variants
The R*-tree can be used efficiently as an access method in database systems organizing both multidimensional points and spatial data
-
References
N. Beckmann, H.-P. Kriegel, R. Schneider and B. Seeger. The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD 1990
http://en.wikipedia.org/wiki/R-tree
-
Thanks
Q/A