I/O-Algorithms
![Page 2: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/2.jpg)
Lars Arge
I/O-algorithms
2
I/O-Model
• Parameters
– N = # elements in problem instance
– B = # elements that fit in a disk block
– M = # elements that fit in main memory
– T = output size in searching problem
• We often assume that $M > B^2$
• I/O: Movement of block between memory and disk
[Figure: processor P, main memory M, and disk D, with block I/O between memory and disk]
![Page 3: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/3.jpg)
Until now: Data Structures
• B-trees
– Trees with fanout B, balanced using split/fuse
– $O(\log_B N + T/B)$ query, $O(N/B)$ space, $O(\log_B N)$ update
• Weight-balanced B-trees
– Weight balancing constraint rather than degree constraint
– $\Omega(w(v))$ updates below v between consecutive operations on v
• Persistent B-trees
– Update current version (getting new version)
– Query all versions
• Buffer-trees
– $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ amortized bounds using buffers and laziness
![Page 4: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/4.jpg)
Until now: Data Structures
• Special cases of two-dimensional range search:
– Diagonal corner queries: External interval tree
– Three-sided queries: External priority search tree
– $O(\log_B N + T/B)$ query, $O(N/B)$ space, $O(\log_B N)$ update
• Same bounds cannot be obtained for general planar range searching
[Figure: diagonal corner query at q and three-sided query bounded by q1, q2, q3]
![Page 5: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/5.jpg)
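Until now: Data Structures
• General planar range searching:
– External range tree: $O(\log_B N + T/B)$ query, $O(\frac{N}{B}\frac{\log_B N}{\log_B \log_B N})$ space, $O(\frac{\log_B^2 N}{\log_B \log_B N})$ update
– O-tree: $O(\sqrt{N/B} + T/B)$ query, $O(N/B)$ space, $O(\log_B N)$ update
[Figure: four-sided query rectangle bounded by q1, q2, q3, q4]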
![Page 6: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/6.jpg)
Techniques
• Tools:
– B-trees
– Persistent B-trees
– Buffer trees
– Logarithmic method
– Weight-balanced B-trees
– Global rebuilding
• Techniques:
– Bootstrapping
– Filtering
[Figure: three-sided query (q1, q2, q3), four-sided query (q1, q2, q3, q4), and diagonal corner query at (x,x)]
![Page 7: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/7.jpg)
Other results
• Many other results, e.g. for
– Higher dimensional range searching
– Range counting, range/stabbing max, and stabbing queries
– Halfspace range searching (and other special cases of range searching)
– Queries on moving objects
– Proximity queries (closest pair, nearest neighbor, point location)
– Structures for objects other than points (bounding rectangles)
• Many heuristic structures in database community
![Page 8: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/8.jpg)
Point Enclosure Queries
• Dual of planar range searching problem
– Report all rectangles containing query point (x,y)
• Internal memory:
– Can be solved in $O(N)$ space and $O(\log N + T)$ time
[Figure: rectangles in the plane with query point (x,y)]
![Page 9: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/9.jpg)
Point Enclosure Queries
• Similarity between internal and external results (space, query)
– in general tradeoff between space and query I/O

| Problem | Internal | External |
|---|---|---|
| 1d range search | $(N,\ \log N + T)$ | $(N/B,\ \log_B N + T/B)$ |
| 3-sided 2d range search | $(N,\ \log N + T)$ | $(N/B,\ \log_B N + T/B)$ |
| 2d range search | $(N\frac{\log N}{\log\log N},\ \log N + T)$; $(N,\ \sqrt{N} + T)$ | $(\frac{N}{B}\frac{\log_B N}{\log_B\log_B N},\ \log_B N + T/B)$; $(N/B,\ \sqrt{N/B} + T/B)$ |
| 2d point enclosure | $(N,\ \log N + T)$ | $(N/B,\ \log_B^2 N + T/B)$?; $(N/B,\ \log N + T/B)$; $(N/B^{1-\varepsilon},\ \log_B N + T/B)$ |
![Page 10: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/10.jpg)
Rectangle Range Searching
• Report all rectangles intersecting query rectangle Q
• Often used in practice when handling complex geometric objects
– Store minimal bounding rectangles (MBR)
[Figure: set of rectangles with query rectangle Q]
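A minimal sketch of the two primitives this slide relies on, assuming rectangles are stored as `(xmin, ymin, xmax, ymax)` and treated as closed sets. The names `Rect`, `mbr`, `intersects`, and `range_search` are hypothetical, and the linear scan only fixes the semantics of the query; it is not an I/O-efficient structure.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def mbr(points: Iterable[Tuple[float, float]]) -> Rect:
    """Minimal bounding rectangle of a non-empty set of points."""
    xs, ys = zip(*points)
    return Rect(min(xs), min(ys), max(xs), max(ys))


def intersects(a: Rect, b: Rect) -> bool:
    """True if the closed rectangles a and b share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def range_search(rects: List[Rect], q: Rect) -> List[Rect]:
    """Brute-force rectangle range search: report every rectangle meeting q."""
    return [r for r in rects if intersects(r, q)]
```

A complex object such as a polygon would be stored by its MBR, so a reported MBR is only a candidate that still has to be checked against the exact geometry.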
![Page 11: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/11.jpg)
Rectangle Data Structures: R-Tree
• Most common practically used rectangle range searching structure
• Similar to B-tree
– Rectangles in leaves (on same level)
– Internal nodes contain MBR of rectangles below each child
• Note: Arbitrary order in leaves/grouping order
![Page 12: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/12.jpg)
Example
![Page 13: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/13.jpg)
Example
![Page 14: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/14.jpg)
Example
![Page 15: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/15.jpg)
Example
![Page 16: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/16.jpg)
Example
![Page 17: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/17.jpg)
• (Point) Query:
– Recursively visit relevant nodes
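The recursive query can be sketched as follows, assuming a hypothetical in-memory `Node` whose `mbr` is `(xmin, ymin, xmax, ymax)` and whose `children` list is empty for a stored data rectangle. In an external-memory R-tree each node would occupy one disk block, so every visited node costs one I/O.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    mbr: Box
    children: List["Node"] = field(default_factory=list)  # empty: data rectangle


def contains(r: Box, x: float, y: float) -> bool:
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]


def point_query(node: Node, x: float, y: float,
                out: Optional[List[Box]] = None) -> List[Box]:
    """Report all data rectangles containing (x, y), pruning by MBR."""
    if out is None:
        out = []
    if not contains(node.mbr, x, y):
        return out                      # nothing below can contain the point
    if not node.children:
        out.append(node.mbr)            # a stored rectangle that contains it
    for child in node.children:
        point_query(child, x, y, out)   # may descend into several subtrees
    return out
```

Because sibling MBRs may overlap, a single point query can follow more than one root-to-leaf path, which is why the tree gives no worst-case query guarantee.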
![Page 18: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/18.jpg)
Query Efficiency
• The fewer rectangles intersected the better
![Page 19: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/19.jpg)
Rectangle Order
• Intuitively
– Objects close together in same leaves → small rectangles → queries descend in few subtrees
• Grouping in internal nodes?
– Small area of MBRs
– Small perimeter of MBRs
– Little overlap among MBRs
![Page 20: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/20.jpg)
R-tree Insertion Algorithm
• When not yet at a leaf (choose subtree):
– Determine rectangle whose area increment after insertion is smallest (small area heuristic)
– Increase this rectangle if necessary and recurse
• At a leaf:
– Insert if room, otherwise Split Node (while trying to minimize area)
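The choose-subtree step can be sketched as below, assuming rectangles are `(xmin, ymin, xmax, ymax)` tuples. The helper names are hypothetical, and breaking ties by the smaller current area is an added assumption that the slide does not state.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Box) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Box, b: Box) -> Box:
    """MBR of two rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(mbr: Box, r: Box) -> float:
    """Area increment of mbr if r were inserted below it."""
    return area(union(mbr, r)) - area(mbr)


def choose_subtree(child_mbrs: List[Box], r: Box) -> int:
    """Index of the child whose MBR needs the smallest area increment."""
    return min(range(len(child_mbrs)),
               key=lambda i: (enlargement(child_mbrs[i], r), area(child_mbrs[i])))
```

After descending, the chosen child's MBR is replaced by `union(child_mbr, r)`, which is the "increase this rectangle if necessary" step.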
![Page 21: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/21.jpg)
Node Split
New MBRs
![Page 22: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/22.jpg)
Linear Split Heuristic
• Determine R1 and R2 with largest MBR: the seeds for sets S1 and S2
• While not all MBRs distributed
– Add next MBR to the set whose MBR increases the least
![Page 23: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/23.jpg)
Quadratic Split Heuristic
• Determine R1 and R2 with largest area(MBR) - area(R1) - area(R2), where MBR is the bounding rectangle of R1 and R2: the seeds for sets S1 and S2
• While not all MBRs distributed
– Determine for every not yet distributed rectangle Rj:
d1 = area increment of S1 ∪ {Rj}
d2 = area increment of S2 ∪ {Rj}
– Choose Rj with maximal |d1 - d2| and add it to the set with smallest area increment
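A compact sketch of the quadratic split as described on this slide, assuming `(xmin, ymin, xmax, ymax)` tuples and at least two input rectangles. The function names are hypothetical, and the minimum-fill rule that a full R-tree implementation enforces for both groups is deliberately left out.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Box) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def quadratic_split(rects: List[Box]) -> Tuple[List[Box], List[Box]]:
    n = len(rects)
    # Seeds: the pair wasting the most area when put under one MBR.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    s1, s2 = max(pairs, key=lambda p: area(union(rects[p[0]], rects[p[1]]))
                 - area(rects[p[0]]) - area(rects[p[1]]))
    g1, g2 = [rects[s1]], [rects[s2]]
    m1, m2 = rects[s1], rects[s2]          # current MBRs of S1 and S2
    rest = [r for k, r in enumerate(rects) if k not in (s1, s2)]

    def increments(r: Box) -> Tuple[float, float]:
        return area(union(m1, r)) - area(m1), area(union(m2, r)) - area(m2)

    while rest:
        # Pick the rectangle with the strongest preference for one set.
        r = max(rest, key=lambda c: abs(increments(c)[0] - increments(c)[1]))
        rest.remove(r)
        d1, d2 = increments(r)
        if d1 <= d2:
            g1.append(r)
            m1 = union(m1, r)
        else:
            g2.append(r)
            m2 = union(m2, r)
    return g1, g2
```

Seed selection examines all pairs and every round rescans the undistributed rectangles, which is where the quadratic cost in the node size comes from.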
![Page 24: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/24.jpg)
R-tree Deletion Algorithm
• Find the leaf (node) and delete object; determine new (possibly smaller) MBR
• If the node is too empty:
– Delete the node recursively at its parent
– Insert all entries of the deleted node into the R-tree
• Note: Insertions of entries/subtrees always occur at the level where they came from
![Page 25: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/25.jpg)
![Page 26: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/26.jpg)
![Page 27: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/27.jpg)
![Page 28: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/28.jpg)
![Page 29: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/29.jpg)
Insert as rectangle on middle level
![Page 30: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/30.jpg)
Insert in a leaf
object
![Page 31: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/31.jpg)
R*-trees
• Why try to minimize area?
– Why not overlap, perimeter, …
• R*-tree:
– Experimentally determined algorithms for Choose Subtree and Split Node
![Page 32: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/32.jpg)
R*-trees; Choose Subtree
• At nodes directly above leaves:
– Choose entry (rectangle) with smallest overlap-increase
• At higher nodes:
– Choose entry (rectangle) with smallest area-increase
![Page 33: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/33.jpg)
R*-trees; Split Node
• Determine split axis: For both the x- and the y-axis:
– Sort the rectangles by smallest and largest coordinate
– Determine the M - 2m + 2 allowed distributions into two groups (M = maximum and m = minimum number of entries in a node)
– Determine for each the perimeter of the two MBRs
– Add up all perimeters
• Choose the axis with smallest sum of perimeters
• Determine split index (given the split axis):
– Choose the allowed distribution among the M - 2m + 2 with the smallest area of intersection of the MBRs
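The two phases can be sketched as follows, assuming `(xmin, ymin, xmax, ymax)` tuples and an overflowing node with M + 1 entries. All function names are hypothetical, and further tie-breaking rules of the original R*-tree algorithm are omitted.

```python
from typing import Iterator, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr_of(rects: Sequence[Box]) -> Box:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def perimeter(r: Box) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Box, b: Box) -> float:
    """Area of intersection of a and b (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def distributions(rects: Sequence[Box], axis: int,
                  m: int) -> Iterator[Tuple[List[Box], List[Box]]]:
    """Allowed splits along one axis (0 = x, 1 = y): for each of the two sort
    orders (smallest, largest coordinate), both groups keep at least m entries."""
    for coord in (axis, axis + 2):
        s = sorted(rects, key=lambda r: r[coord])
        for k in range(m, len(s) - m + 1):
            yield s[:k], s[k:]


def rstar_split(rects: Sequence[Box], m: int) -> Tuple[List[Box], List[Box]]:
    def perimeter_sum(axis: int) -> float:
        return sum(perimeter(mbr_of(a)) + perimeter(mbr_of(b))
                   for a, b in distributions(rects, axis, m))

    axis = min((0, 1), key=perimeter_sum)             # split axis
    return min(distributions(rects, axis, m),         # split index
               key=lambda d: overlap(mbr_of(d[0]), mbr_of(d[1])))
```

With M + 1 entries the loop over `k` yields exactly M - 2m + 2 distributions per sort order, matching the count on the slide.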
![Page 34: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/34.jpg)
R*-trees; Forced reinsert
• Intuition:
– When building an R-tree by repeated insertion, the first inserted rectangles are possibly badly placed
• Experiment:
– Make R-tree by inserting 20,000 rectangles
– Delete the first inserted 10,000 and insert them again
• Search time improvement of 20-50%
![Page 35: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/35.jpg)
R-Tree Variants
• Many, many R-tree variants (heuristics) have been proposed
• Often bulk-loaded R-trees are used (forced reinsertion intuition)
– Much faster than repeated insertions
– Better space utilization
– Can optimize more "globally"
– Can be updated using previous update algorithms
• Recently first worst-case efficient structure: PR-tree
![Page 36: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/36.jpg)
Pseudo-PR-Tree
1. Place B extreme rectangles from each direction in priority leaves
2. Split remaining rectangles by xmin coordinates (round-robin using xmin, ymin, xmax, ymax, like a 4d kd-tree)
3. Recursively build sub-trees
Query in $O(\sqrt{N/B} + T/B)$ I/Os
– $O(T/B)$ nodes with priority leaf completely reported
– $O(\sqrt{N/B})$ nodes with no priority leaf completely reported
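The three construction steps can be sketched in memory as follows, assuming `(xmin, ymin, xmax, ymax)` tuples. The dictionary layout and the name `build_pseudo_pr_tree` are hypothetical, and taking the smallest xmin and ymin but the largest xmax and ymax as "extreme" is an assumption about what the slide means by each direction.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def build_pseudo_pr_tree(rects: List[Box], B: int, depth: int = 0) -> Dict:
    """Node with up to four priority leaves and two recursively built children."""
    if len(rects) <= B:
        return {"leaf": list(rects)}
    rest = list(rects)
    priority = []
    # Step 1: peel off the B most extreme rectangles in each of the 4 directions.
    for dim, largest in ((0, False), (1, False), (2, True), (3, True)):
        rest.sort(key=lambda r: r[dim], reverse=largest)
        priority.append(rest[:B])
        rest = rest[B:]
    # Step 2: split the remainder at the median of one coordinate (round-robin).
    split_dim = depth % 4
    rest.sort(key=lambda r: r[split_dim])
    mid = len(rest) // 2
    # Step 3: recurse on both halves, skipping empty ones.
    children = [build_pseudo_pr_tree(part, B, depth + 1)
                for part in (rest[:mid], rest[mid:]) if part]
    return {"priority": priority, "split_dim": split_dim, "children": children}
```

This sketch sorts at every node for brevity; an I/O-efficient construction would instead work from pre-sorted lists.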
![Page 37: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/37.jpg)
Pseudo-PR-Tree: Query Complexity
• Nodes v visited where all rectangles in at least one of the priority leaves of v's parent are reported: $O(T/B)$
• Let v be a visited node such that none of the priority leaves at its parent are reported completely; consider v's parent u
[Figure: query Q in 2d and the corresponding region in 4d, bounded by xmax = xmin(Q) and ymin = ymax(Q)]
![Page 38: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/38.jpg)
Pseudo-PR-Tree: Query Complexity
• The cell in the 4d kd-tree region of u is intersected by two different 3-dimensional hyper-planes defined by sides of query Q
• The intersection of each pair of such 3-dimensional hyper-planes is a 2-dimensional hyper-plane
• Lemma: # of cells in a d-dimensional kd-tree that intersect an axis-parallel f-dimensional hyper-plane is $O((N/B)^{f/d})$
• So, with f = 2 and d = 4, # such cells in a 4d kd-tree: $O(\sqrt{N/B})$
• Total # nodes visited: $O(\sqrt{N/B} + T/B)$
![Page 39: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/39.jpg)
PR-tree from Pseudo-PR-Tree
![Page 40: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/40.jpg)
Query Complexity Remains Unchanged
# nodes visited on leaf level: $\sqrt{N/B} + T/B$
Next level: $\sqrt{N/B^2} + \sqrt{N/B}/B + T/B^2$
Level after that: $\sqrt{N/B^3} + \sqrt{N/B^2}/B + \sqrt{N/B}/B^2 + T/B^3$
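A short derivation sketch of why these per-level counts sum to the same bound. It assumes the pattern on this slide continues for every level j (leaf level j = 0) and that B ≥ 2; the indices i and j are introduced here only for illustration.

```latex
\begin{align*}
\text{visited at level } j
  &\le \sum_{i=0}^{j} \frac{1}{B^{i}} \sqrt{\frac{N}{B^{\,j+1-i}}} + \frac{T}{B^{\,j+1}}
   = \sqrt{\frac{N}{B}}\, B^{-j/2} \sum_{i=0}^{j} B^{-i/2} + \frac{T}{B^{\,j+1}} \\
  &= O\!\left( B^{-j/2} \sqrt{N/B} + T/B^{\,j+1} \right), \\
\sum_{j \ge 0} \text{visited at level } j
  &= O\!\left( \sqrt{N/B} + T/B \right).
\end{align*}
```

Both sums are geometric, so the leaf level dominates and the total stays at the query bound of the pseudo-PR-tree.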
![Page 41: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/41.jpg)
PR-Tree
• PR-tree construction in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
– Pseudo-PR-tree in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os
– Cost dominated by leaf level
• Updates
– $O(\log_B N)$ I/Os using known heuristics
* Loss of worst-case query guarantee
– $O(\log_B^2 N)$ I/Os using logarithmic method
* Worst-case query efficiency maintained
• Extending to d dimensions
– Optimal $O((N/B)^{1-1/d} + T/B)$ query
![Page 42: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/42.jpg)
Geometric Algorithms
• We will now (quickly) look at geometric algorithms
– Solve problems on sets of objects
• Example: Orthogonal line segment intersection
– Given set of axis-parallel line segments, report all intersections
• In internal memory many problems are solved using sweeping
![Page 43: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/43.jpg)
Plane Sweeping
• Sweep plane top-down while maintaining search tree T on vertical segments crossing sweep line (by x-coordinates)
– Top endpoint of vertical segment: Insert in T
– Bottom endpoint of vertical segment: Delete from T
– Horizontal segment: Perform range query with x-interval on T
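An internal-memory sketch of this sweep, assuming verticals are `(x, y_lo, y_hi)` and horizontals are `(y, x_lo, x_hi)` with closed endpoints. A sorted Python list with `bisect` stands in for the balanced search tree T, so insertions and deletions here cost linear time rather than O(log N); the function name is hypothetical.

```python
import bisect
from typing import List, Tuple


def orthogonal_intersections(
        verticals: List[Tuple[float, float, float]],    # (x, y_lo, y_hi)
        horizontals: List[Tuple[float, float, float]],  # (y, x_lo, x_hi)
) -> List[Tuple[int, int]]:
    """Return (vertical index, horizontal index) for every intersecting pair."""
    INSERT, QUERY, DELETE = 0, 1, 2
    events = []
    for i, (x, y_lo, y_hi) in enumerate(verticals):
        events.append((-y_hi, INSERT, i))   # top endpoint
        events.append((-y_lo, DELETE, i))   # bottom endpoint
    for j, (y, x_lo, x_hi) in enumerate(horizontals):
        events.append((-y, QUERY, j))
    active: List[Tuple[float, int]] = []    # T: (x, index) of crossing verticals
    out: List[Tuple[int, int]] = []
    # Negated y sorts top-down; at equal y: insert, then query, then delete.
    for _, kind, idx in sorted(events):
        if kind == INSERT:
            bisect.insort(active, (verticals[idx][0], idx))
        elif kind == DELETE:
            active.remove((verticals[idx][0], idx))
        else:
            _, x_lo, x_hi = horizontals[idx]
            lo = bisect.bisect_left(active, (x_lo, -1))
            hi = bisect.bisect_right(active, (x_hi, len(verticals)))
            out.extend((active[k][1], idx) for k in range(lo, hi))
    return out
```

Ordering inserts before queries and queries before deletes at the same y-coordinate makes segments that merely touch count as intersecting.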
![Page 44: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/44.jpg)
Plane Sweeping
• In internal memory the algorithm runs in optimal $O(N \log N + T)$ time
• In external memory the algorithm performs badly (> N I/Os) if |T| > M
– Even if we implement T as a B-tree: $O(N \log_B N + T/B)$ I/Os
• Solution: Distribution sweeping
![Page 45: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/45.jpg)
Distribution Sweeping
• Divide plane into M/B - 1 slabs with $O(N/(M/B))$ endpoints each
• Sweep plane top-down while reporting intersections between
– part of horizontal segment spanning slab(s) and vertical segments
• Distribute data to M/B - 1 slabs
– vertical segments and non-spanning parts of horizontal segments
• Recurse in each slab
![Page 46: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/46.jpg)
Distribution Sweeping
• Sweep performed in $O(N/B + T'/B)$ I/Os, where T' is the number of intersections reported in the sweep, giving $O(\frac{N}{B}\log_{M/B}\frac{N}{B} + \frac{T}{B})$ I/Os in total
• Maintain active list of vertical segments for each slab (< B in memory)
– Top endpoint of vertical segment: Insert in active list
– Horizontal segment: Scan through all relevant active lists
* Removing "expired" vertical segments
* Reporting intersections with "non-expired" vertical segments
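An in-memory simulation of the recursion, meant only to show the control flow of distribution sweeping: slabs, per-slab active lists with lazy expiry, and distribution of non-spanning pieces. It assumes verticals are `(id, x, y_lo, y_hi)` and horizontals are `(id, y, x_lo, x_hi)` with closed endpoints. The names, the `fanout` parameter (standing in for M/B - 1), and choosing slab boundaries from the distinct x-coordinates of the verticals are assumptions of this sketch; no actual block I/O is modelled.

```python
import bisect
import math
from typing import List, Tuple

Seg = Tuple[int, float, float, float]


def brute(verticals: List[Seg], horizontals: List[Seg]) -> List[Tuple[int, int]]:
    return [(vid, hid) for vid, x, y_lo, y_hi in verticals
            for hid, y, x_lo, x_hi in horizontals
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi]


def distribution_sweep(verticals: List[Seg], horizontals: List[Seg], fanout: int = 4,
                       left: float = -math.inf,
                       right: float = math.inf) -> List[Tuple[int, int]]:
    xs = sorted({v[1] for v in verticals})
    if len(xs) <= 1 or not horizontals:
        return brute(verticals, horizontals)        # base case
    k = min(fanout, len(xs))
    bounds = [left] + [xs[i * len(xs) // k] for i in range(1, k)] + [right]
    slabs = list(zip(bounds, bounds[1:]))           # half-open slabs [a, b)
    active = [[] for _ in slabs]                    # active list per slab
    sub_v = [[] for _ in slabs]                     # data distributed to slabs
    sub_h = [[] for _ in slabs]
    out = []
    events = ([(-v[3], 0, v) for v in verticals] +
              [(-h[1], 1, h) for h in horizontals])
    for _, kind, seg in sorted(events, key=lambda e: e[:2]):   # top-down sweep
        if kind == 0:                               # top endpoint of a vertical
            j = bisect.bisect_right(bounds, seg[1]) - 1
            active[j].append(seg)
            sub_v[j].append(seg)
            continue
        hid, y, x_lo, x_hi = seg
        for j, (a, b) in enumerate(slabs):
            if x_hi < a or x_lo >= b:
                continue                            # horizontal misses slab j
            if x_lo <= a and x_hi >= b:             # horizontal spans slab j
                active[j] = [v for v in active[j] if v[2] <= y]  # drop expired
                out.extend((v[0], hid) for v in active[j])
            else:                                   # non-spanning piece: recurse
                sub_h[j].append((hid, y, max(x_lo, a), min(x_hi, b)))
    for j, (a, b) in enumerate(slabs):
        out.extend(distribution_sweep(sub_v[j], sub_h[j], fanout, a, b))
    return out
```

Each intersecting pair is reported exactly once: either at the level where the horizontal segment spans the vertical's slab, or deeper in the recursion from the clipped end piece.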
![Page 47: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/47.jpg)
Distribution Sweeping
• Other example: Rectangle intersection
– Given set of axis-parallel rectangles, report all intersections
![Page 48: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/48.jpg)
Distribution Sweeping
• Divide plane into M/B - 1 slabs with $O(N/(M/B))$ endpoints each
• Sweep plane top-down while reporting intersections between
– part of rectangles spanning slab(s) and other rectangles
• Distribute data to M/B - 1 slabs
– Non-spanning parts of rectangles
• Recurse in each slab
![Page 49: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/49.jpg)
Distribution Sweeping
• Seems hard to perform sweep in $O(N/B + T'/B)$ I/Os
• Solution: Multislabs
– Reduce fanout of distribution to $\Theta(\sqrt{M/B})$
– Recursion height still $O(\log_{M/B}\frac{N}{B})$
– Room for block from each multislab (active list) in memory
![Page 50: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/50.jpg)
Distribution Sweeping
• Sweep while maintaining rectangle active list for each multislab
– Top side of spanning rectangle: Insert in active multislab list
– Each rectangle: Scan through all relevant multislab lists
* Removing "expired" rectangles
* Reporting intersections with "non-expired" rectangles
• Total: $O(\frac{N}{B}\log_{M/B}\frac{N}{B} + \frac{T}{B})$ I/Os
![Page 51: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/51.jpg)
Distribution Sweeping
• Distribution sweeping can relatively easily be used to solve a number of other problems in the plane I/O-efficiently
• By decreasing distribution fanout to $O((M/B)^{1/c})$ for c ≥ 1, a number of higher-dimensional problems can also be solved I/O-efficiently
![Page 52: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/52.jpg)
Other Results
• Other geometric algorithm results include:
– Red-blue line segment intersection (using distribution sweep, buffer trees/batched filtering, external fractional cascading)
– General planar line segment intersection (as above and external priority queue)
– 2d and 3d convex hull (complicated deterministic 3d and simpler randomized)
– 2d Delaunay triangulation
![Page 53: I/O-Algorithms](https://reader036.fdocuments.in/reader036/viewer/2022081604/56814bf3550346895db8e40a/html5/thumbnails/53.jpg)
References
• External Memory Geometric Data Structures. Lecture notes by Lars Arge. Sections 8-9
• The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree. L. Arge, M. de Berg, H. Haverkort, and K. Yi. Proc. SIGMOD'04. Sections 1-2
• External-Memory Computational Geometry. M.T. Goodrich, J-J. Tsay, D.E. Vengroff, and J.S. Vitter. Proc. FOCS'93. Sections 2.0-2.1