The R* Tree

download The R* Tree

of 38

Transcript of The R* Tree

  • 7/30/2019 The R* Tree

    1/38

    An Efficient and Robust Access

    Method forPoints and Rectangles

    Norbert Beckmann, Hans-Peter Kriegel,

    Ralf Schneider, Bernhard Seeger

    Presented By :

    Alok Kumar Yadav2009cs10177

  • 7/30/2019 The R* Tree

    2/38

    Overview Introduction

    Rtree and Its Optimization

    R-tree Variants R*-tree

    Experiments

    Conclusions

  • 7/30/2019 The R* Tree

    3/38

    Introduction

    Spatial Access Methods (SAMs):

    Approximation of complex spatial object by MBR

    Pros: Complex object can be represented by limited no of bytes

    Preserves most essential geometric properties i.e. location andextension

    Cons: (Obviously) A lot of information is lost

  • 7/30/2019 The R* Tree

    4/38

    Introduction (Cont..) R-tree:

    B+-tree like structure

    Popular access method for rectangles

    Based on the heuristic optimization ofarea of enclosingrectangles in each inner node

    R*-tree:

    Combined optimization: Area, Margin & Overlap Outperforms exiting R-tree variants

    Efficiently supports point and spatial data

  • 7/30/2019 The R* Tree

    5/38

    Introduction (Cont..)

    Given a city map, index all university buildings in an efficient structure for quicktopological search.

  • 7/30/2019 The R* Tree

    6/38

    Introduction (Cont..)

  • 7/30/2019 The R* Tree

    7/38

    Introduction (Cont..)

    MBRof the cityneighbourhoods.

    MBRof the citydefining the

    overall search

    region.

  • 7/30/2019 The R* Tree

    8/38

    R-tree B+-tree like structure

    Nodes : E (cp, rectangle)

    cp : Child Pointer

    For leaves it is a record in database

    rectangle :

    MBR of all rectangle in child node For leaves it is enclosing rectangle of spatial object

    Max no of elements in a node is M, min is m

  • 7/30/2019 The R* Tree

    9/38

    R-tree (Cont..)Structure

    I(A) I(B) I(M)

    I(a) I(b) I(c) I(d)

    B

    a

    b

    c

    d

  • 7/30/2019 The R* Tree

    10/38

    R-tree (Cont..) B+-tree like structure

    Nodes : E (cp, rectangle) cp :

    Child Pointer For leaves it is a record in database

    rectangle : MBR of all rectangle in child node

    For leaves it is enclosing rectangle of spatial object

    Optimization criterion: Minimization of areaof enclosingrectangles in each inner node

    Allows overlapping of directory rectangles, hence cannotguarantee a single search path

  • 7/30/2019 The R* Tree

    11/38

    Insertion Algorithm

    ChooseSubtree Algorithm

    Split Algorithm

    If ends in a node

    filled with M entries

    R-tree Variants It is dynamic, hence all optimization

    have to applied during insertion

    Finds most suitable subtree fornew entry

    If node is filled with more then M

    entries, distribute in two nodes in a

    appropriate manner

  • 7/30/2019 The R* Tree

    12/38

    R-tree Variants Original R-tree : Guttman

    Greenes R-tree

  • 7/30/2019 The R* Tree

    13/38

    Original R-tree: Guttman Method of optimization is minimize directory

    rectangle area

    Split algos: Exponential, Linear, Quadratic Exponential best but cpu cost too high

    Others are approximations

    Quadratic outperforms linear

  • 7/30/2019 The R* Tree

    14/38

    Guttmans ChooseSubtreeCS1 Set N to be the root node

    CS2 If N is a leaf,

    return N

    else

    Choose the entry in N whose rectangle needs least areaenlargement to include the new data. Resolve ties bychoosing the entry with the rectangle of smallest area

    endCS3 Set N to be the childnode pointed to by the childpointer of

    the chosen entry. Repeat from CS2

  • 7/30/2019 The R* Tree

    15/38

    Guttmans Split AlgorithmQuadratic Split[Divide a set of M+1 index entries into two groups]

    QS1 Invoke PickSeeds to choose two entries, each be first entryof each group

    QS2 Repeat

    DistributeEntry

    until

    all entries are distributed or one of the two groups hasMm+1 entries (so that the other group has m entries)

    QS3 If entries remain, assign them to the other group so that ithas the minimum number m required

  • 7/30/2019 The R* Tree

    16/38

    Guttmans Split Algorithm (Cont..) PickSeedsPS1 For each pair of entries E1 and E2, compose a

    rectangle R including E1 rectangle and E2 rectangle

    Calculate d = area(R) - area(E1 rectangle) - area(E2 rectangle)PS2 Choose the pair with the largest d

    DistributeEntry

    DE1 Invoke PickNext to choose the next entry to be assignedDE2 Add It to the group whose covering rectangle will have tobe enlarged least to accommodate It. Resolve ties by addingthe entry to the group with the smallest area, then to theone with the fewer entries, then to either

  • 7/30/2019 The R* Tree

    17/38

    Guttmans Split Algorithm (Cont..)PickNextDE1 For each entry E not yet in a group, calculate

    d1 = the area increase required in the coveringrectangle of Group 1 to include E Rectangle.

    Calculate d2 analogously for Group 2.

    DE2 Choose the entry with the maximum difference

    between d1 and d2

  • 7/30/2019 The R* Tree

    18/38

    Problems Small Seeds:

    If d-1 of the d axes of a far away rectangle is same as oneseed, needle like bounding rectangle may be formed

    May initiate a bad split

    R1 R2

    R3

  • 7/30/2019 The R* Tree

    19/38

    Problems (Cont..) Prefer Bounding Rectangle:

    Algo prefer the MBR created

    from previous assignment Since it was enlarged, it

    requires less area enlarge-

    ment to include next entry

    If a group reached M-m+1 entries, rest are assigned

    to other without considering geometric properties

    Z

    YX

    W

    G2

    G1

  • 7/30/2019 The R* Tree

    20/38

    Greenes R-tree ChooseSubtree is same as Guttmans

    Alternative split algorithm

    Invokes PickSeeds to find two most distant rectangles Picks a axis using these two rectangles depending

    upon there separation distance

    Sorts the remaining rectangles along chosen axis

    Distributes half entries to one group and remaining toother

  • 7/30/2019 The R* Tree

    21/38

    Greenes R-tree (Cont..) In some situations cannot find right axis and bad split

    may occur

  • 7/30/2019 The R* Tree

    22/38

    Inspiration For R*-tree Minimize overlap between directory rectangles :

    Decrease no of path to be traversed

    Minimize margin of a directory rectangle : Rectanglewould be shaped more quadratic

    Optimize space utilization : Height will be low

  • 7/30/2019 The R* Tree

    23/38

    R*-tree Structure same as R-tree

    For insertion R-tree versions only consider area

    R*-tree consider area, margin & overlap in differentcombinations

    Overlap is defined as

  • 7/30/2019 The R* Tree

    24/38

    R*-tree: ChooseSubtree Similar to original one, only difference is that it minimizes overlap enlargement

    when N points to leaves

    CS1 Set N to be the root node

    CS2 If N is a leaf,return N

    else Ifchildpointers in N point to leaves [determine theminimum overlap cost],choose the entry in N whose rectangle needs leastoverlap enlargement to include the new data rectangle.Resolve ties by choosing the entry whose rectangleneeds least area enlargement, then the entry with therectangle of smallest area

    else [determine the minimum area cost]choose the entry in N whose rectangle needs least areaenlargement to include the new data rectangle. Resolveties by choosing entry with rectangle of smallest area

    end

    CS3 Set N to be the childnode pointed to by the childpointer ofthe chosen entry. Repeat from CS2

  • 7/30/2019 The R* Tree

    25/38

    R*-tree: Split Algorithm Three goodness values:

    area-value area[bb(first group)] +

    area[bb(second group)]

    margin-value margin[bb(first group)] +margin[bb(second group)]

    overlap-value area[bb(first group)

    bb(second group)]

    Depending on these values final distribution is determined

    *bb: bounding box

  • 7/30/2019 The R* Tree

    26/38

    R*-tree: Split Algorithm (Cont..) Method for good split:

    Along each axis entries are sorted first by lower and thenby upper value of their rectangles

    For each sort M-2m+2 distribution of M+1 entries isdetermined

    First group contains (m-1)+k entries while othercontains remaining (k=1,..,(M-2m+2))

    For each distribution goodness value is measured

  • 7/30/2019 The R* Tree

    27/38

    R*-tree: Split Algorithm (Cont..) SplitS1 Invoke ChooseSplitAxis to determine the axis, perpendicular to

    which the split is performedS2 Invoke ChooseSplitIndex to determine the best distribution

    into two groups along that axis

    S3 Distribute the entries into two groups ChooseSplitAxisCSA1 For each axis

    Sort the entries by the lower then by the upper value oftheir rectangles and determine all distributions asdescribed above. Compute S, the sum of all margin-

    values of the different distributionsend

    CSA2 Choose the axis with the minimum S as split axis

    ChooseSplitIndexCSI1 Along the chosen split axis, choose the distribution with the

    minimum overlap-value. Resolve ties by choosing thedistribution with minimum area-value

  • 7/30/2019 The R* Tree

    28/38

    Reinsert Dealing with under filled nodes in R-tree: Remove its

    entries and reinsert them

    It improves retrieval performances

    Deletion and reinsertions tunes R-tree but it is verystatic

    To achieve dynamic reorganization R*-tree uses forced

    reinsertion during insertion routine

  • 7/30/2019 The R* Tree

    29/38

    Forced Reinsert If a node is overfilled, R*-tree takes p entries based on

    distance of their center from center of MBR

    Removes p entries and adjust the MBR

    Reinserts them to prevent splits

    If they are reinserted in the same node again then itcalls split

    Now cpu cost of insert is increased but if we takeaverage on large insert it is only increased about 4%due to reduced splits and better structure

  • 7/30/2019 The R* Tree

    30/38

    Forced Reinsert: Advantages More reconstruction, less split

    Storage utilization is improved

    Outer rectangles are reinserted, directory rectanglebecomes more quadratic which is a desired property

  • 7/30/2019 The R* Tree

    31/38

    Experiments Comparison between four R-tree variants

    R-tree with quadratic split algorithm (qua.Gut)

    R-tree with linear split algorithm (lin.Gut)

    Greenes variant of R-tree (Greene)

    R*-tree

    Six data files containing about 100,000 2D rectangles

    All experiments were measured in number of diskaccesses

  • 7/30/2019 The R* Tree

    32/38

    Experiments (Cont..) Types of queries

    Rectangle intersection query given a rectangle S, find all rectangles R in the file with R S

    Point query given a point P, find all rectangles R in the file with P R.

    Rectangle enclosure query given a rectangle S, find all rectangles R in the file with R S

    Spatial join Over two files as the set of all pairs of rectangles where rectangle

    from f1 intersects rectangle from f2 Also measured the parameters insertion and storage

    utilization

    Seven query files were created

  • 7/30/2019 The R* Tree

    33/38

    Results The page access for queries to R*-tree are standardized

    to 100%.

    Here is the relative performance for all 4 variants forR-tree

  • 7/30/2019 The R* Tree

    34/38

    Results (Cont..) Unweighted average results over all distributions

  • 7/30/2019 The R* Tree

    35/38

    Results (Cont..) R*-tree are very efficient for PAM

    Even outperforms very popular 2-level grid file

  • 7/30/2019 The R* Tree

    36/38

    Conclusions Since all three area, margin and overlap are reduced,

    R*-tree is very robust against ugly data

    Storage utilization is higher, insertion cost is low

    Outperforms all existing R-tree variants

    R*-tree can efficiently be used as an access method indatabase systems organizing both multidimensional

    points and spatial data

  • 7/30/2019 The R* Tree

    37/38

    References The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles

    -N. Beckmann, H.-P. Kriegel, R. Schneider and B. Seeger. SIGMOD 1990

    http://en.wikipedia.org/wiki/R-tree

    Image Sources: Maps & R-tee: http://electures.informatik.uni-freiburg.de/portal/download/3/8534/thm03%20-%20rTtee%20p1.ppt Thumbs Up: http://www.ideachampions.com/weblogs/Peer%20to%20Peer%20Recognition.png

    Gears: https://reader009.{domain}/reader009/html5/0422/5adcbd7ec56a1/5adcbd8dbeb01.png Tree:http://a01421.deviantart.com/art/tree-variants-304634600

    Choose: http://www.transforming-technologies.com/blog/index.php/2011/06/16/how-to-choose-an-esd-mat/

    Original: http://www.pixmac.com/picture/original+ink+stamp/000045168969

    Split: http://www.clipartguide.com/_pages/0511-1001-2605-2460.html

    Forced: http://thepoliticalcarnival.net/2011/05/ Advantages : http://www.webgraffiti.ca/advantages.html

    Experiments: http://nilssmith.com/becoming-a-social-media-pastor-part-4-the-experiment/

    Results: http://www.iconshock.com/icons/sigma/project_managment/results-icon.html

    Conclusions: http://herbertjlkld.portrelay.com/conclusions-clip-art.html

    Introduction: http://www.eng.fju.edu.tw/iacd_2010S/computer/introduction1.htm

    http://www.google.com/imgres?um=1&hl=en&safe=off&biw=1280&bih=622&tbm=isch&tbnid=qvmYAWv5AJ1YnM:&imgrefurl=http://a01421.deviantart.com/art/tree-variants-304634600&docid=QWzpLMFK3Va1XM&itg=1&imgurl=http://www.deviantart.com/download/304634600/tree_variants_by_a01421-d51ddk8.png&w=1500&h=2095&ei=r2wyUNqgJsuIrAfxgoEg&zoom=1&iact=hc&vpx=421&vpy=114&dur=1228&hovh=265&hovw=190&tx=87&ty=100&sig=114563138652018660749&page=1&tbnh=128&tbnw=92&start=0&ndsp=19&ved=1t:429,r:2,s:0,i:78http://www.google.com/imgres?um=1&hl=en&safe=off&biw=1280&bih=622&tbm=isch&tbnid=qvmYAWv5AJ1YnM:&imgrefurl=http://a01421.deviantart.com/art/tree-variants-304634600&docid=QWzpLMFK3Va1XM&itg=1&imgurl=http://www.deviantart.com/download/304634600/tree_variants_by_a01421-d51ddk8.png&w=1500&h=2095&ei=r2wyUNqgJsuIrAfxgoEg&zoom=1&iact=hc&vpx=421&vpy=114&dur=1228&hovh=265&hovw=190&tx=87&ty=100&sig=114563138652018660749&page=1&tbnh=128&tbnw=92&start=0&ndsp=19&ved=1t:429,r:2,s:0,i:78
  • 7/30/2019 The R* Tree

    38/38

    Thanks

    Q/A