Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13...

44
Algorithms and Data Structures for Database Systems Jürgen Treml 2005-06-08

Transcript of Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13...

Page 1: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms and Data Structures for Database Systems

Jürgen Treml2005-06-08

Page 2: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Overview

1. Introduction, Motivation2. R-tree Index Structure – Overview3. Algorithms on R-trees

SearchingInsert / DeleteNode SplittingUpdates & other Operations

4. Performance, Benchmarks5. R-tree Modifications6. Conclusion

Page 3: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Overview – Where are we?

1. Introduction, Motivation2. R-tree Index Structure – Overview3. Algorithms on R-trees

SearchingInsert / DeleteNode SplittingUpdates & other Operations

4. Performance, Benchmarks5. R-tree Modifications6. Conclusion

Page 4: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Introduction, Motivation

Importance of effectively storing and indexing spatial data (CAD, VR, Image Processing, Cartography, …)One-dimensional indexes not suitableStrong limitations with most spatial structures (e.g. point data only, not dynamic, performance with paged memory, …)

Page 5: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Introduction, Motivation

Antonin Guttman, 1984“A Dynamic Index Structure For Spatial Searching”Based on B-trees

Region-trees (R-trees)

Page 6: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Overview – Where are we?

1. Introduction, Motivation2. R-tree Index Structure – Overview3. Algorithms on R-trees

SearchingInsert / DeleteNode SplittingUpdates & other Operations

4. Performance, Benchmarks5. R-tree Modifications6. Conclusion

Page 7: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

R-tree Index StructureData objects are re-

presented by their MBRs

Page 8: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

R-tree Index Structure

Formal DescriptionStructure consisting of (regular) nodes containing tuples ),( CPI n

At the lowest level: leaf-nodes with tuples),( TIDIn

MBR (minimum bounding rectangle)),...,,( 10 nn IIII =

Page 9: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

R-tree Index Structure

Important ParametersMaximum number of elements per node

MMinimum number of elements per node

2Mm ≤

Height of an R-tree⎡ ⎤ 1log −Nm

Page 10: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

R-tree Index Structure

Important RestrictionsAll leaf-nodes are on the same levelRoot has at least two children (unless it is a leaf)

Page 11: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Overview – Where are we?

1. Introduction, Motivation2. R-tree Index Structure – Overview3. Algorithms on R-trees

SearchingInsert / DeleteNode SplittingUpdates & other Operations

4. Performance, Benchmarks5. R-tree Modifications6. Conclusion

Page 12: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Search

Similar to B-tree searchQuite easy & straight forward(Traverse the whole tree starting at the root node)

No guarantee on good worst-case performance!(Possible overlapping of rectangles of entries within a single node!)

Page 13: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Search

Short Description:For all entries in a non-leaf node:Check if overlapping

If yes: check node pointed to by this entryIf node is a leaf-node:Check all entries if overlapping the search object

If yes: entry is a qualifying record!

Page 14: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Search

R111 R112 R113 R131 R132 R133 R221 R222 R223

R12R11 R13 R22R21

R121 R122 R211 R212

R1 R2

R1

R2

R11

R12

R13

R22

R21

R111

R112

R113

R121

R122

R131

R133

R132

R221

R222R223

R211

R212

Page 15: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Overview – Where are we?

1. Introduction, Motivation2. R-tree Index Structure – Overview3. Algorithms on R-trees

SearchingInsert / DeleteNode SplittingUpdates & other Operations

4. Performance, Benchmarks5. R-tree Modifications6. Conclusion

Page 16: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Insert

Again: Similar to corresponding B-tree operationBasically consist of 3 parts:

1. CHOOSELEAF(Find place to insert new object)

2. INSERT(Insert the new object)

3. ADJUSTTREE(Adjusting preceding nodes)

Page 17: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Insert

Short Description:CHOOSELEAF:Start with root Run through all nodes:Find the one which would have to be least enlarged to include given object!INSERT:Check if room for another entry

Insert new entry directly OR after calling SPLITNODE (in case of no room)

Page 18: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Insert

Short Description:CHOOSELEAF (…)INSERT (…)ADJUSTTREE:Ascend from node with new entry:Adjust all MBRs!In case of node split:Add new entry to parent node(If no room in parent node, invoke SPLITNODE again)Propagate upwards till root is reached!

Page 19: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Insert

R1

R2

R11

R12

R13

R22

R21

R111

R112

R113

R121

R122

R131

R133

R132

R221

R222R223

R211

R212

Find node to insert the new object into:

R111 R112 R113 R131 R132 R133 R221 R222 R223

R22R21

R121 R122 R211 R212

R12 R13R11

R2R1

Leaf is full! Overflow!

Page 20: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

R11

R111

R112

R113

R1

R2

R12

R13

R22

R21

R121

R122

R131

R133

R132

R221

R222

R223

R211

R212

Algorithms on R-trees: InsertR11

R111

R112

R113

R11

R111

R112

R113

R11’

R111 R112 R113R111 R112 R113

R11

R111

R112

R113

R11’

Page 21: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Insert

R111 R112 R113 R131 R132 R133 R221 R222 R223

R22R21

R121 R122 R211 R212

R12 R13R11

R2R1

R221 R222 R223R211 R212R111 R112 R131 R132 R133

R22R21

R121 R122

R1 R2R1’

R12 R13R11’

R113

R11

R131 R132 R133 R221 R222 R223

R22R21

R121 R122 R211 R212

R12 R13R11

R2R1

R111 R112 R113 R131 R132 R133 R221 R222 R223

R22R21

R121 R122 R211 R212

R12 R13R11

R2R1

R111 R112 R113 R131 R132 R133 R221 R222 R223

R22R21

R121 R122 R211 R212

R12 R13R11

R2R1

R111 R112 R113

Page 22: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Delete

NOT similar to B-tree DELETE(Treatment of underflows)B-tree:Merge under-full node with “neighbor”R-tree:Delete under-full node and re-insertWhy?Re-use of INSERT routineIncrementally refines spatial structure

Page 23: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Delete

“Very” Short DescriptionFINDLEAF:Locate the leaf that contains object to be deletedDELETE:Delete entry from nodeCONDENSETREE:Delete and re-insert node if under-full (and in the following all resulting under-full nodes)

Page 24: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: DeleteR1 R2

R12R11 R13

R111 R112 R113

R221 R222 R223

R121 R122 R131 R132 R133

R211 R212

R22R21

R22

R211 R221 R222 R223

R22

R211 R221 R222 R223

R1

R111 R112 R113 R121 R122 R131 R132 R133

R12R11 R13 R22

R222 R223 R211R221

R1 R2

R12R11 R13

R22R21

R111 R112 R113

R221 R222 R223

R121 R122 R131 R132 R133

R211 R212

R111 R112 R113 R121 R122 R131 R132 R133

R12R11 R13 R22

R222 R223 R211R221

R1 R2

R12R11 R13

R22R21

R111 R112 R113

R221 R222 R223

R121 R122 R131 R132 R133

R211 R212

Page 25: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Overview – Where are we?

1. Introduction, Motivation2. R-tree Index Structure – Overview3. Algorithms on R-trees

SearchingInsert / DeleteNode SplittingUpdates & other Operations

4. Performance, Benchmarks5. R-tree Modifications6. Conclusion

Page 26: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Splitting

Splitting of nodes necessary after underflow or overflow (as a result of a delete or insert operation)Ultimate goal: Minimize the resulting node’s MBRsSecondary aim: Do it fast! ;-)3 Implementations by Guttmann:Exhaustive, Quadratic-cost, Linear-cost

Page 27: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Splitting

Minimal resulting MBRs:

Bad split Good split

Exhaustive Approach:Simply try all possible split-ups!

Page 28: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Splitting

Quadratic-cost Algorithm:Pick the 2 out of M entries that would consume the most space if put together.

Put one in each groupFor all remaining entries: Pick the one that would make the biggest difference in area when put to one of the two groups

Add it to the one with the least differenceFinished when all entries are put in either group!Quadratic in M, linear in the number of dimensions

Page 29: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Splitting

Linear-cost Implementation:Basically the same as quadratic-cost algorithmDifferences:First pair is picked by finding the two rectangles with the greatest normalized separation in any dimension.Remaining pairs are selected randomly.Linear in M as well as in the number of dimensions

Page 30: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Overview – Where are we?

1. Introduction, Motivation2. R-tree Index Structure – Overview3. Algorithms on R-trees

SearchingInsert / DeleteNode SplittingUpdates & other Operations

4. Performance, Benchmarks5. R-tree Modifications6. Conclusion

Page 31: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Algorithms on R-trees: Other Ops

Modified search algorithmKey search / search for specific entries

Range deletion

R-trees are quite as well extensible and efficient as B-trees for the matter of algorithms

Page 32: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Overview – Where are we?

1. Introduction, Motivation2. R-tree Index Structure – Overview3. Algorithms on R-trees

SearchingInsert / DeleteNode SplittingUpdates & other Operations

4. Performance, Benchmarks5. R-tree Modifications6. Conclusion

Page 33: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Performance of R-trees

Why do Benchmarking?Proof practicality of the structureFind suitable values for m and MTest various node-splitting algorithms

Testing environment:C-implementation running under UNIXLayout of a RISC-II chip with 1057 rectanglesDifferent page sizes (128 – 2048 bytes)Various values for M (6 – 102)

Page 34: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Performance of R-treesLinear INSERT fastestLinear INSERT quite insensitive to M and mQuadratic INSERT depends on M as well as mDELETE extremely sensitive to m

Page 35: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Performance of R-treesSame page hit / miss performance for linear and quadratic splitSlight advantage for exhaustive version ;-)Space usage strongly depending on m

Page 36: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Performance of R-treesQuadratic-cost splitting with jumps at INSERT operations for increasing amount of dataLinear INSERT extremely constantR-tree structure very effective in directing search to small subtrees

Page 37: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Overview – Where are we?

1. Introduction, Motivation2. R-tree Index Structure – Overview3. Algorithms on R-trees

SearchingInsert / DeleteNode SplittingUpdates & other Operations

4. Performance, Benchmarks5. R-tree Modifications6. Conclusion

Page 38: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

R-tree Modifications

Why?Improve performanceOptimize space usage

Different forms of R-trees:R+-trees: Reduce overlapping MBRs Increase search performanceR*-trees: Take overlapping and circumference in to consideration when performing INSERT or DELETE

Page 39: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

R-tree Modifications

TV-trees: Allow more complex objects than MBRsX-trees: Avoid / reduce overlapping by dynamically adjusting node capacitiesQR-trees: Hybrid concept, combining R-trees with quad trees

Page 40: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Overview – Where are we?

1. Introduction, Motivation2. R-tree Index Structure – Overview3. Algorithms on R-trees

SearchingInsert / DeleteNode SplittingUpdates & other Operations

4. Performance, Benchmarks5. R-tree Modifications6. Conclusion

Page 41: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Conclusions

Reasons for R-trees:Importance of being able to store and effectively search spatial dataExisting structures with many limitations

Basic Idea:Structure similar to B-treeRepresent objects by their MBRAllow overlapping

Page 42: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

ConclusionsAlgorithms:

Search and insert similar to B-tree operationsDelete performing re-insert instead of merge for under-full nodes (incrementally refines spatial structure)Node splitting: Benchmarks show that linear version performs quite as well as quadratic-cost implementation

Modifications:Structure is easy to adapt for special applications and their needsPerformance advantages are usually paid for with price of a structure that gets harder to maintain

R-trees are the spatial correspondent of B-trees!

Page 43: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview

Overview – Where are we?

1. Introduction, Motivation2. R-tree Index Structure – Overview3. Algorithms on R-trees

SearchingInsert / DeleteNode SplittingUpdates & other Operations

4. Performance, Benchmarks5. R-tree Modifications6. Conclusion

Page 44: Algorithms and Data Structures for Database Systems · R121 R122 R211 R212 R1 R2 R1 R2 R11 R12 R13 R22 R21 R111 R112 R113 R121 R122 R131 R133 R132 R221 R222 R223 R211 R212. Overview