Generalized Search Trees
J.M. Hellerstein, J.F. Naughton and A. Pfeffer, “Generalized Search Trees for Database Systems,” Proc. 21st Int’l Conf. on VLDB, Sep. 1995
Presented By Ihab Ilyas
Topics
Motivation
Database Search Trees
Generalized Search Tree
Properties
Methods
Applications
Motivation
New applications (multimedia, CAD tools, document libraries, etc.)
New data types
Extending search trees to maximum flexibility
Before GiST
Specialized search trees
Example: spatial search trees (R-trees).
Problem: each new application requires a new tree structure built from scratch.
Search trees for extensible data types
Example: extending B+-trees to index any ordinal data.
Problem: this extends the data types but not the set of queries supported.
GiST
A third direction for extending search trees
Extensible both in the data types supported and in the queries applied to that data.
Allows new data types to be indexed in a manner that supports the queries natural to the data type.
GiST (Cont.)
Unifies previously disparate structures for currently common data types.
Example: B+-trees and R-trees can be implemented as extensions of GiST.
Single code base for indexing multiple dissimilar applications.
Database Search Trees
[Figure: canonical rough picture of a database search tree. Internal nodes hold keys (Key1, Key2, …) and sit above the leaf nodes, which form a linked list.]
Search Trees (cont.)
Search Key: A search key may be any arbitrary predicate that holds for each datum below the key.
Search Tree: A hierarchy of categorizations, in which each categorization holds for all data stored under it in the hierarchy.
Generalized Search Tree
Definition: A GiST is a balanced multi-way tree of variable fan-out between kM and M, where k is the minimum fill factor and 2/M ≤ k ≤ 1/2.
The exception is the root node, which can have fan-out from 2 to M.
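The fan-out bounds can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the constraint 2/M ≤ k ≤ 1/2 on the fill factor, the function name `fanout_bounds` is hypothetical, and rounding kM up to a whole number of entries is an assumption, since the slide does not say how a fractional kM is handled.

```python
import math
from typing import Tuple


def fanout_bounds(M: int, k: float, is_root: bool = False) -> Tuple[int, int]:
    """Return (min_entries, max_entries) for a node of capacity M."""
    if not (2 / M <= k <= 0.5):
        raise ValueError("fill factor k must satisfy 2/M <= k <= 1/2")
    # Assumption: a fractional kM is rounded up to the next whole entry.
    min_entries = 2 if is_root else math.ceil(k * M)
    return min_entries, M


# A non-root node and the root of a tree with capacity M = 100, k = 1/2.
print(fanout_bounds(100, 0.5))
print(fanout_bounds(100, 0.5, is_root=True))
```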
GiST (Cont.)
Leaf nodes: (p, ptr)
p: predicate used as a search key.
ptr: the identifier of some tuple of the database.
Non-leaf nodes: (p, ptr)
p: predicate used as a search key.
ptr: pointer to another tree node.
Properties
Every node contains between kM and M index entries unless it is the root.
For each index entry (p, ptr) in a leaf node, p holds for the tuple identified by ptr.
For each index entry (p, ptr) in a non-leaf node, p is true when instantiated with the values of any tuple reachable from ptr.
All leaves appear on the same level.
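The structural properties (fill bounds and equal leaf depth) can be checked mechanically. The following is a minimal sketch, not the paper's implementation: the `Entry` and `Node` classes and the helper names are hypothetical, and the two predicate properties are not checked because they depend on the key domain. It also assumes that a root which is itself a leaf may hold fewer than two entries.

```python
from dataclasses import dataclass, field
from typing import Any, List, Set


@dataclass
class Entry:
    p: Any    # predicate used as the search key
    ptr: Any  # tuple identifier in a leaf, child Node in a non-leaf


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def leaf_depths(node: Node, depth: int = 0) -> Set[int]:
    """Collect the depth of every leaf reachable from node."""
    if node.is_leaf:
        return {depth}
    depths: Set[int] = set()
    for e in node.entries:
        depths |= leaf_depths(e.ptr, depth + 1)
    return depths


def fill_ok(node: Node, min_entries: int, M: int, is_root: bool = True) -> bool:
    """Check the kM..M bound on every node; the root only needs 2 entries."""
    if is_root:
        low = 0 if node.is_leaf else 2  # assumption: a lone leaf root may be small
    else:
        low = min_entries
    if not (low <= len(node.entries) <= M):
        return False
    if node.is_leaf:
        return True
    return all(fill_ok(e.ptr, min_entries, M, False) for e in node.entries)


def has_valid_shape(root: Node, min_entries: int, M: int) -> bool:
    """True if all leaves are on one level and every node is filled correctly."""
    return len(leaf_depths(root)) == 1 and fill_ok(root, min_entries, M)
```

For example, a root with two leaf children of two entries each passes `has_valid_shape(root, 2, 4)`, while a tree whose leaves sit at different depths does not.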
Note on Properties
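[Figure: an entry (p, ptr) in a node points to a child node containing an entry (p’, ptr’), which in turn points to a node containing (p1, ptr1) … (p2, ptr2).]
p holds for p1, p2.
p’ holds for p1, p2.
p’ → p is not required.
This allows orthogonal classifications at different levels of the tree. Recall the R-tree.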
GiST Methods
Key Methods: the methods the user specifies to configure the GiST. They encapsulate the structure and behavior of the object class used for keys in the tree.
Tree Methods: provided by the GiST; they may invoke the required key methods.
Key Methods
E is an entry of the form (p, ptr), q is a query predicate, and P is a set of entries.
Consistent(E, q): returns false if p ∧ q is guaranteed unsatisfiable, true otherwise.
Union(P): returns a predicate r that holds for all tuples stored below the entries in P.
Compress(E): returns (p’, ptr), where p’ is a compressed representation of p.
Decompress(E): returns (r, ptr) where p → r. The compression may be lossy, since p ↔ r is not required.
Key Methods (Cont.)
Penalty(E1, E2): returns a domain-specific penalty for inserting E2 into the subtree rooted at E1. Typically the penalty measures the increase in size from E1.p to Union(E1, E2).
PickSplit(P): given a set P of M+1 entries, splits P into two sets of entries P1 and P2, each of size at least kM. The choice of the minimum fill factor is controlled here.
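The six key methods can be pictured as an interface that each key class must fill in. The sketch below is one possible rendering as a Python abstract base class; the class name `GiSTKeyMethods` and the argument names are hypothetical, and the paper does not prescribe any particular language binding.

```python
from abc import ABC, abstractmethod
from typing import Any, Sequence, Tuple


class GiSTKeyMethods(ABC):
    """User-supplied behavior for one key class (hypothetical interface)."""

    @abstractmethod
    def consistent(self, entry: Any, query: Any) -> bool:
        """False only if entry.p and query are guaranteed unsatisfiable together."""

    @abstractmethod
    def union(self, entries: Sequence[Any]) -> Any:
        """Return a predicate that holds for everything below the given entries."""

    @abstractmethod
    def compress(self, entry: Any) -> Any:
        """Return the entry with a compact representation of its predicate."""

    @abstractmethod
    def decompress(self, entry: Any) -> Any:
        """Return an entry whose predicate is implied by the original one."""

    @abstractmethod
    def penalty(self, e1: Any, e2: Any) -> float:
        """Domain-specific cost of inserting e2 under the subtree rooted at e1."""

    @abstractmethod
    def pick_split(self, entries: Sequence[Any]) -> Tuple[Sequence[Any], Sequence[Any]]:
        """Split M+1 entries into two groups, each at least kM in size."""
```

A B+-tree or R-tree extension would then be a subclass that implements all six methods, while the tree methods are written once against this interface.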
Tree Methods
Search: controlled by the Consistent method.
Insert: controlled by the Penalty and PickSplit methods.
Delete: controlled by the Consistent method.
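To illustrate how Search is driven entirely by Consistent, here is a minimal sketch under stated assumptions: `Entry`, `Node` and `search` are hypothetical names, and the toy Consistent function treats keys and queries as half-open integer ranges. It descends into every subtree whose key is consistent with the query and yields the tuple identifiers found at the leaves.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterator, List


@dataclass
class Entry:
    p: Any
    ptr: Any  # tuple identifier in a leaf, child Node otherwise


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def search(node: Node, q: Any, consistent: Callable[[Entry, Any], bool]) -> Iterator[Any]:
    """Yield ptr for every leaf entry that is consistent with q."""
    for e in node.entries:
        if not consistent(e, q):
            continue  # nothing below e can satisfy q
        if node.is_leaf:
            yield e.ptr
        else:
            yield from search(e.ptr, q, consistent)


def ranges_overlap(e: Entry, q: Any) -> bool:
    """Toy Consistent for half-open ranges (lo, hi)."""
    (lo, hi), (qlo, qhi) = e.p, q
    return lo < qhi and hi > qlo


leaf1 = Node(True, [Entry((1, 2), "t1"), Entry((3, 4), "t3")])
leaf2 = Node(True, [Entry((7, 8), "t7"), Entry((9, 10), "t9")])
root = Node(False, [Entry((1, 4), leaf1), Entry((7, 10), leaf2)])
results = list(search(root, (3, 8), ranges_overlap))
print(results)
```

Because several keys in one node may be consistent with the query, the search can follow more than one path, unlike a B+-tree lookup.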
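Example
[Figure: inserting a new entry (q, ptr) into a tree with root R. At the root, the candidate entries have penalties m and n with m < n; at the next level they have penalties i and j with j < i. At each level the insertion follows the entry with the smaller penalty. The leaf reached is full, so it is split according to PickSplit, and (q, ptr) is stored among the (p, ptr) entries of the resulting leaves.]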
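The descent pictured in this example can be sketched as a loop that follows the entry of least penalty at each level. This is a simplified illustration, not the full insertion algorithm: the names `choose_leaf` and `interval_growth` are hypothetical, the toy penalty assumes keys are half-open integer ranges, and both the split itself and the adjustment of ancestor keys are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class Entry:
    p: Any
    ptr: Any


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def choose_leaf(root: Node, new: Entry,
                penalty: Callable[[Entry, Entry], float]) -> Tuple[Node, List[Entry]]:
    """Walk from the root to a leaf, taking the minimum-penalty entry each time."""
    node, path = root, []
    while not node.is_leaf:
        best = min(node.entries, key=lambda e: penalty(e, new))
        path.append(best)
        node = best.ptr
    return node, path


def interval_growth(e: Entry, new: Entry) -> float:
    """Toy penalty: how far the range e.p must grow to cover new.p."""
    (lo, hi), (nlo, nhi) = e.p, new.p
    return max(nhi - hi, 0) + max(lo - nlo, 0)


M = 3  # node capacity used for this toy tree
leaf_a = Node(True, [Entry((0, 1), "t0"), Entry((5, 6), "t5")])
leaf_b = Node(True, [Entry((20, 21), "t20"), Entry((25, 26), "t25"),
                     Entry((29, 30), "t29")])
root = Node(False, [Entry((0, 10), leaf_a), Entry((20, 30), leaf_b)])

new_entry = Entry((22, 23), "t22")
leaf, path = choose_leaf(root, new_entry, interval_growth)
needs_split = len(leaf.entries) + 1 > M  # a full leaf would go to PickSplit
```

Here the second root entry has penalty 0 and is chosen; the leaf it leads to already holds M entries, so the insertion would continue with PickSplit.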
Applications
GiST over ℤ (B+-trees)
GiST over polygons in ℝ² (R-trees)
B+ Trees Using GiST
p here is of the form Contains([xp, yp), v).
Consistent(E, q) returns true if:
If q = Contains([xq, yq), v): (xp < yq) ∧ (yp > xq)
If q = Equal(xq, v): xp ≤ xq < yp
Union(P), for P = {([x1, y1), ptr1), …, ([xn, yn), ptrn)}, returns [MIN(x1, x2, …, xn), MAX(y1, y2, …, yn)).
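The B+-tree versions of Consistent and Union reduce to simple comparisons on half-open ranges. The sketch below represents a key [x, y) as a Python tuple `(x, y)`; the function names are hypothetical, and the Equal test assumes the comparison xp ≤ xq < yp.

```python
from typing import Iterable, Tuple

Interval = Tuple[float, float]  # (x, y) stands for the half-open range [x, y)


def consistent_contains(p: Interval, q: Interval) -> bool:
    """Range query: true if [xp, yp) and [xq, yq) can share a value."""
    (xp, yp), (xq, yq) = p, q
    return xp < yq and yp > xq


def consistent_equal(p: Interval, xq: float) -> bool:
    """Equality query: true if xq falls inside [xp, yp)."""
    xp, yp = p
    return xp <= xq < yp


def union(intervals: Iterable[Interval]) -> Interval:
    """Smallest range covering every range in the set."""
    items = list(intervals)
    return (min(x for x, _ in items), max(y for _, y in items))


print(consistent_contains((10, 20), (15, 30)))
print(consistent_equal((10, 20), 20))
print(union([(10, 20), (5, 8), (18, 25)]))
```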
B+ Trees Using GiST (Cont.)
Penalty(E, F), with E = ([x1, y1), ptr1) and F = ([x2, y2), ptr2):
If E is the leftmost pointer on its node, returns MAX(y2 − y1, 0).
If E is the rightmost pointer on its node, returns MAX(x1 − x2, 0).
Otherwise, returns MAX(y2 − y1, 0) + MAX(x1 − x2, 0).
PickSplit(P): let the first ⌊|P|/2⌋ entries in order go to the left node and the remaining entries to the right node.
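A small sketch of these two methods follows, again with a key [x, y) written as the tuple `(x, y)`. The names and the keyword flags `leftmost` and `rightmost` are hypothetical. The split rule assumes the garbled fraction on the slide is |P|/2 rounded down, and that "in order" means sorted by lower bound.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (x, y) stands for the half-open range [x, y)


def penalty(e: Interval, f: Interval, *, leftmost: bool = False,
            rightmost: bool = False) -> float:
    """Growth of E's range needed to cover F, ignoring the open-ended side."""
    (x1, y1), (x2, y2) = e, f
    if leftmost:
        return max(y2 - y1, 0)   # only growth to the right counts
    if rightmost:
        return max(x1 - x2, 0)   # only growth to the left counts
    return max(y2 - y1, 0) + max(x1 - x2, 0)


def pick_split(entries: Sequence[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """First half (rounded down) goes left, the rest goes right."""
    ordered = sorted(entries)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


left, right = pick_split([(30, 40), (0, 10), (20, 30), (10, 20), (40, 50)])
print(penalty((10, 20), (15, 25)), left, right)
```

Splitting at the midpoint keeps both halves at roughly half capacity, which is how this extension fixes its minimum fill factor.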
B+ Trees Using GiST (Cont.)
Compress(E):
If E is the leftmost key on a non-leaf node, returns 0 bytes.
Otherwise, returns E.p.x.
Decompress(E):
If E is the leftmost key on a non-leaf node, let x = −∞; otherwise let x = E.p.x.
If E is the rightmost key on a non-leaf node, let y = ∞. If E is any other entry in a non-leaf node, let y = the value stored in the next key. Otherwise (E is in a leaf node), let y = x + 1.
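The compression scheme stores only the lower bound of each range and rebuilds the upper bound from context. The sketch below is illustrative: the function signatures and keyword flags are hypothetical, `None` stands in for the 0-byte value, and the missing bounds on the slide are assumed to be −∞ and +∞.

```python
import math
from typing import Optional, Tuple

Interval = Tuple[float, float]  # (x, y) stands for the half-open range [x, y)


def compress(p: Interval, *, is_leaf: bool, is_leftmost: bool = False) -> Optional[float]:
    """Keep only the lower bound; the leftmost non-leaf key stores nothing."""
    x, _y = p
    if not is_leaf and is_leftmost:
        return None  # stands in for "0 bytes"
    return x


def decompress(stored: Optional[float], *, is_leaf: bool, is_leftmost: bool = False,
               is_rightmost: bool = False,
               next_stored: Optional[float] = None) -> Interval:
    """Rebuild [x, y) from the stored lower bound and the entry's position."""
    if is_leaf:
        return (stored, stored + 1)
    x = -math.inf if is_leftmost else stored
    y = math.inf if is_rightmost else next_stored
    return (x, y)


# Round trip over the three keys of one non-leaf node.
keys = [(-math.inf, 10), (10, 20), (20, math.inf)]
stored = [compress(k, is_leaf=False, is_leftmost=(i == 0)) for i, k in enumerate(keys)]
rebuilt = [
    decompress(s, is_leaf=False, is_leftmost=(i == 0),
               is_rightmost=(i == len(stored) - 1),
               next_stored=stored[i + 1] if i + 1 < len(stored) else None)
    for i, s in enumerate(stored)
]
```

In this example the round trip is exact because neighbouring ranges share their boundaries; in general the rebuilt range only needs to contain the original one.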
R-Trees Using GiST
The key here is of the form (xul, yul, xlr, ylr), the upper-left and lower-right corners of a bounding rectangle.
Query predicates are:
Contains((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2, ylr2))
Returns true if (xul1 ≤ xul2) ∧ (yul1 ≥ yul2) ∧ (xlr1 ≥ xlr2) ∧ (ylr1 ≤ ylr2)
Overlaps((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2, ylr2))
Returns true if (xul1 ≤ xlr2) ∧ (yul1 ≥ ylr2) ∧ (xul2 ≤ xlr1) ∧ (ylr1 ≤ yul2)
Equal((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2, ylr2))
Returns true if (xul1 = xul2) ∧ (yul1 = yul2) ∧ (xlr1 = xlr2) ∧ (ylr1 = ylr2)
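The three rectangle predicates can be written out directly. The sketch below assumes that a key is a tuple `(xul, yul, xlr, ylr)` giving the upper-left and lower-right corners, with y increasing upward, so that yul ≥ ylr; the comparison operators follow from that assumption, and the function names are hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xul, yul, xlr, ylr), y grows upward


def contains(r1: Rect, r2: Rect) -> bool:
    """True if r1 fully encloses r2."""
    xul1, yul1, xlr1, ylr1 = r1
    xul2, yul2, xlr2, ylr2 = r2
    return xul1 <= xul2 and yul1 >= yul2 and xlr1 >= xlr2 and ylr1 <= ylr2


def overlaps(r1: Rect, r2: Rect) -> bool:
    """True if r1 and r2 share at least one point."""
    xul1, yul1, xlr1, ylr1 = r1
    xul2, yul2, xlr2, ylr2 = r2
    return xul1 <= xlr2 and yul1 >= ylr2 and xul2 <= xlr1 and ylr1 <= yul2


def equal(r1: Rect, r2: Rect) -> bool:
    """True if all four coordinates match."""
    return r1 == r2


outer = (0, 10, 10, 0)
inner = (2, 8, 5, 3)
far = (20, 30, 25, 22)
print(contains(outer, inner), overlaps(outer, inner), overlaps(outer, far))
```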
R-Trees Using GiST (Cont.)
Consistent(E, q): p contains (xul1, yul1, xlr1, ylr1), and q is either Contains, Overlaps or Equal applied to (xul2, yul2, xlr2, ylr2).
Returns true if Overlaps((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2, ylr2)).
Union(P): returns the coordinates of the minimum bounding rectangle of all rectangles in P.
R-Trees Using GiST (Cont.)
Penalty(E, F): compute q = Union(E, F) and return area(q) − area(E.p).
PickSplit(P): a variety of algorithms are available to best split the entries in an over-full node.
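As a worked illustration of the R-tree Union and Penalty, the sketch below computes the enclosing rectangle and the resulting area growth. It assumes rectangles are tuples `(xul, yul, xlr, ylr)` with y increasing upward; the function names are hypothetical, and no particular PickSplit algorithm is shown since the slide leaves that choice open.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (xul, yul, xlr, ylr), y grows upward


def area(r: Rect) -> float:
    xul, yul, xlr, ylr = r
    return (xlr - xul) * (yul - ylr)


def union(rects: Iterable[Rect]) -> Rect:
    """Smallest rectangle enclosing every rectangle in the set."""
    items = list(rects)
    return (
        min(r[0] for r in items),  # leftmost x
        max(r[1] for r in items),  # topmost y
        max(r[2] for r in items),  # rightmost x
        min(r[3] for r in items),  # bottommost y
    )


def penalty(e: Rect, f: Rect) -> float:
    """Area that E's rectangle must gain in order to enclose F."""
    return area(union([e, f])) - area(e)


e = (0, 10, 10, 0)
print(penalty(e, (5, 15, 12, 5)))  # F sticks out of E, so the penalty is positive
print(penalty(e, (2, 8, 5, 3)))    # F already lies inside E
```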
R-Trees Using GiST (Cont.)
Compress(E): form the bounding rectangle of E.p.
Decompress(E): the identity function.