SD-Rtree: A Scalable Distributed Rtree

Witold Litwin & Cédric du Mouza & Philippe Rigaux


Transcript of SD-Rtree: A Scalable Distributed Rtree

Page 1: SD-Rtree: A Scalable Distributed Rtree

1

SD-Rtree: A Scalable Distributed Rtree

Witold Litwin & Cédric du Mouza & Philippe Rigaux

Page 2: SD-Rtree: A Scalable Distributed Rtree

2

Plan

• Introduction
• SDDS
• R-tree
• SD-Rtree
• Evolution
• Balancing
• Spatial Rotations
• Overlapping
• Redundant Coverage
• Queries
• Performance
• Conclusion

Page 3: SD-Rtree: A Scalable Distributed Rtree

3

SDDS Principles (1993)

• Data are at server nodes, which communicate through point-to-point messaging
• Overloaded servers split over new servers
• Queries go to client nodes, which use local images of the SDDS
• No central addressing component
• A node can be both client and server (peer)

Page 4: SD-Rtree: A Scalable Distributed Rtree

4

SDDS Principles (1993)

• An outdated image may send a query to an incorrect server
• Servers forward such a query to the correct server
• The image gets adjusted: an Image Adjustment Message (IAM) comes back
• The client does not repeat the same error twice
• Data are basically in the RAM of the servers
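The forwarding and image-adjustment cycle above can be illustrated with a toy, one-dimensional sketch. It is not the SD-Rtree protocol: the `Server` and `Client` classes, the integer key intervals, and the "split in the middle" rule are all assumptions made for illustration, and keys are assumed to lie in the initial interval.

```python
class Server:
    """Toy server owning the key interval [low, high) (hypothetical model)."""

    def __init__(self, name, low, high):
        self.name, self.low, self.high = name, low, high
        self.next = None  # server that received the upper half at the last split

    def split(self, new_name):
        """Overloaded server: hand the upper half of the interval to a new server."""
        mid = (self.low + self.high) // 2
        new = Server(new_name, mid, self.high)
        new.next = self.next
        self.high, self.next = mid, new
        return new

    def handle(self, key, hops=0):
        """Answer if the key is local, otherwise forward it; count the forwards."""
        if self.low <= key < self.high:
            return self, hops
        return self.next.handle(key, hops + 1)


class Client:
    def __init__(self, contact):
        self.image = [contact]  # possibly outdated view of the servers

    def query(self, key):
        # Address the known server whose lower bound is closest below the key.
        guess = max((s for s in self.image if s.low <= key), key=lambda s: s.low)
        owner, hops = guess.handle(key)
        if owner not in self.image:  # plays the role of the IAM
            self.image.append(owner)
        return owner.name, hops


s0 = Server("s0", 0, 100)
s1 = s0.split("s1")        # s0: [0, 50), s1: [50, 100)
s2 = s1.split("s2")        # s1: [50, 75), s2: [75, 100)
client = Client(s0)        # the client only knows its contact server
first = client.query(80)   # outdated image: the query is forwarded twice
second = client.query(80)  # adjusted image: the query reaches s2 directly
```

The second query for the same key needs no forwarding, which is the "does not repeat the same error twice" property in miniature.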

Page 5: SD-Rtree: A Scalable Distributed Rtree

5

SD-Rtree: a Spatial SDDS

Distributed Spatial Data

Page 6: SD-Rtree: A Scalable Distributed Rtree

6

SD-Rtree: a Spatial SDDS

• Distributed Index
• No central component

Page 7: SD-Rtree: A Scalable Distributed Rtree

7

SD-Rtree: a Spatial SDDS

• Point & Window Queries
• kNN queries (future)

Page 8: SD-Rtree: A Scalable Distributed Rtree

8

SD-Rtree: Generalizes R-tree

R-tree:
• Nodes are minimal bounding boxes
• Leaf nodes point to data
• Internal nodes bound subtrees
• May overlap
• Split when overflow
• Generate balanced m-ary tree

Page 9: SD-Rtree: A Scalable Distributed Rtree

9

SD-Rtree: Generalizes R-tree

R-tree:
• An insert may go through multiple paths
• Ends up in the smallest bounding box that contains the object, if there is any
• Otherwise, one of the boxes gets enlarged
• Box may split
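A minimal sketch of the box choice described on this slide. Rectangles are assumed to be `(xmin, ymin, xmax, ymax)` tuples, and the rule used when no box contains the object (least area enlargement, ties broken by smaller area) is the classic R-tree heuristic, assumed here because the slide does not say which box gets enlarged. The function name `choose_box` is hypothetical.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """Minimal bounding box of two rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def choose_box(boxes, obj):
    """Return (index, box after insertion) for the box that should receive obj."""
    inside = [i for i, b in enumerate(boxes) if contains(b, obj)]
    if inside:
        # Some boxes already contain the object: take the smallest one.
        i = min(inside, key=lambda i: area(boxes[i]))
        return i, boxes[i]

    def growth(i):
        return area(union(boxes[i], obj)) - area(boxes[i])

    # No box contains the object: enlarge the box that grows the least.
    i = min(range(len(boxes)), key=lambda i: (growth(i), area(boxes[i])))
    return i, union(boxes[i], obj)


boxes = [(0, 0, 10, 10), (2, 2, 5, 5), (20, 20, 30, 30)]
inside_case = choose_box(boxes, (3, 3, 4, 4))        # contained by two boxes
enlarge_case = choose_box(boxes, (11, 11, 12, 12))   # contained by none
```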

Page 10: SD-Rtree: A Scalable Distributed Rtree

10

SD-Rtree: Generalizes R-tree

R-tree:
• Search may go through multiple paths
• All paths may bring relevant objects
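Why a search may follow several paths can be seen in a short sketch of a centralized window query. The `Node` class and its fields are assumptions for illustration; rectangles are `(xmin, ymin, xmax, ymax)` tuples.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    mbb: tuple                                     # minimal bounding box of the node
    children: list = field(default_factory=list)   # filled for internal nodes
    objects: list = field(default_factory=list)    # (name, rect) pairs in leaves


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def window_query(node, window):
    """Collect the names of all objects whose rectangle intersects the window."""
    if not intersects(node.mbb, window):
        return []
    hits = [name for name, rect in node.objects if intersects(rect, window)]
    for child in node.children:
        # Sibling boxes may overlap, so every intersecting child is visited.
        hits += window_query(child, window)
    return hits


leaf1 = Node((0, 0, 6, 6), objects=[("a", (1, 1, 2, 2)), ("b", (5, 5, 6, 6))])
leaf2 = Node((4, 4, 10, 10), objects=[("c", (4, 4, 5, 5)), ("d", (9, 9, 10, 10))])
root = Node((0, 0, 10, 10), children=[leaf1, leaf2])
result = window_query(root, (4.5, 4.5, 5.5, 5.5))
```

The window falls in the region where the two leaf boxes overlap, so both leaves are searched and each contributes one object.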

Page 11: SD-Rtree: A Scalable Distributed Rtree

13

SD-Rtree: a Balanced Binary Tree

The SD-Rtree is a balanced binary tree, distributed on a set of servers, such that:
• Each internal node (or routing node) has exactly two sons
• Each leaf node stores a subset of the indexed dataset
• At each node, the heights of the subtrees differ by at most one
• Each server stores one data node and one routing node

Page 12: SD-Rtree: A Scalable Distributed Rtree

14

SD-Rtree: Binary Tree Structure

di = data node (leaf)
ri = routing node (internal node)

Page 13: SD-Rtree: A Scalable Distributed Rtree

15

SD-Rtree: Tree Distribution

Page 14: SD-Rtree: A Scalable Distributed Rtree

17

SD-Rtree Balancing

• The binary tree should be height-balanced
• The heights of the two subtrees rooted at any node should not differ by more than 1 (cf. AVL trees)
• The tree height is then logarithmic in the number of leaves
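The logarithmic height can be checked numerically. The sketch below assumes a leaf has height 0 and every internal node has exactly two sons; the sparsest height-balanced tree of height h then has a Fibonacci number of leaves, which gives the standard AVL-style bound h ≤ log_φ(N) ≈ 1.44 log2(N) for N leaves. The function names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def min_leaves(h):
    """Fewest leaves of a height-balanced binary tree of height h (leaf height 0)."""
    a, b = 1, 2  # minimum leaf counts for heights 0 and 1
    for _ in range(h):
        # A sparsest tree of height h+2 joins sparsest trees of heights h+1 and h.
        a, b = b, a + b
    return a


def max_height(n_leaves):
    """Largest height a height-balanced binary tree with n_leaves leaves can have."""
    h = 0
    while min_leaves(h + 1) <= n_leaves:
        h += 1
    return h


for n in (1, 2, 10, 1000, 10**6):
    # The worst-case height never exceeds log_phi(n), about 1.44 * log2(n).
    assert max_height(n) <= math.log(n, PHI) + 1e-9
```

For example, with 1000 leaves the worst-case height is 14, against about 10 for a perfectly balanced tree.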

Page 15: SD-Rtree: A Scalable Distributed Rtree

18

SD-Rtree Balancing

• SD-Rtree balancing occurs during splits
• Messages are sent bottom-up to adjust the height of the ancestor nodes
• Rotation occurs if an ancestor is imbalanced
• SD-Rtree rotations are spatial: they change the rectangles of internal nodes
• Best rotation minimizes rectangle overlapping
• Tie breaking minimizes the "dead space"
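The selection rule (least overlap first, least dead space as tie-breaker) can be sketched as follows. The slide does not list the candidate reorganizations, so the sketch simply takes candidate pairs of sibling rectangles as input. "Dead space" is assumed here to mean the part of the parent's bounding box covered by neither son; rectangles are `(xmin, ymin, xmax, ymax)` tuples and all function names are hypothetical.

```python
def area(r):
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlap(a, b):
    """Area of the intersection of two rectangles (0 if they are disjoint)."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(inter)


def dead_space(a, b):
    """Area of the parent box covered by neither son (assumed definition)."""
    return area(union(a, b)) - (area(a) + area(b) - overlap(a, b))


def best_pairing(candidates):
    """Pick the sibling pair with least overlap; break ties on dead space."""
    return min(candidates, key=lambda pair: (overlap(*pair), dead_space(*pair)))


p1 = ((0, 0, 4, 4), (3, 0, 7, 4))  # overlap 4
p2 = ((0, 0, 4, 4), (4, 0, 6, 4))  # overlap 0, dead space 0
p3 = ((0, 0, 4, 4), (6, 0, 8, 4))  # overlap 0, dead space 8
choice = best_pairing([p1, p2, p3])
```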

Page 16: SD-Rtree: A Scalable Distributed Rtree

20

Rotation Pattern

Properties
• The sons of a node are not ordered => more freedom for reorganizing the tree
• Any imbalanced node matches a rotation pattern

A rotation pattern is a subtree a(b(e(f,g),d),c) such that:
• h(c) = h(d) = h(f) = n − 1 (n > 0)
• h(g) = max(0, n − 2)
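A small sketch of how a subtree could be matched against this pattern. It assumes that a data node has height 0 and, because sons are unordered, it tries both orders at each level. The `Node` class and the `rotation_pattern` function are hypothetical names.

```python
from itertools import permutations


class Node:
    def __init__(self, name, *sons):
        self.name, self.sons = name, sons  # no sons: data node (leaf)


def height(node):
    return 0 if not node.sons else 1 + max(height(s) for s in node.sons)


def rotation_pattern(a):
    """Return the roles of the pattern a(b(e(f,g),d),c) if a matches, else None."""
    if len(a.sons) != 2:
        return None
    for b, c in permutations(a.sons):          # sons are unordered: try both orders
        if len(b.sons) != 2:
            continue
        for e, d in permutations(b.sons):
            if len(e.sons) != 2:
                continue
            for f, g in permutations(e.sons):
                n = height(c) + 1              # so that h(c) = n - 1 and n > 0
                if height(d) == height(f) == n - 1 and height(g) == max(0, n - 2):
                    return dict(n=n, b=b.name, c=c.name, d=d.name,
                                e=e.name, f=f.name, g=g.name)
    return None


# Imbalanced at "a": one side has height 2, the other is a single data node.
imbalanced = Node("a", Node("c"),
                  Node("b", Node("d"), Node("e", Node("f"), Node("g"))))
balanced = Node("a", Node("x", Node("p"), Node("q")), Node("y"))
match = rotation_pattern(imbalanced)
no_match = rotation_pattern(balanced)
```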

Page 17: SD-Rtree: A Scalable Distributed Rtree

21

SD-Rtree: Spatial Rotation

Page 18: SD-Rtree: A Scalable Distributed Rtree

22

Rotation Cost

• Constant number of messages (3 or 6, depending on the choice)
• Few rotations in practice
  • In particular when the dataset is uniformly distributed
  • See our experiments

Page 19: SD-Rtree: A Scalable Distributed Rtree

23

SD-Rtree: Images

• Each image defines the addressing structure
• Resides as cache on a client or on a peer
• Starts with the address of the contact server
• IAMs make it a subtree
• Splits make images outdated
• IAMs adjust it incrementally

Page 20: SD-Rtree: A Scalable Distributed Rtree

24

Image Adjustment

• Client contacts a server with a query
• Each incorrect server initiates a traversal of the tree
• During the traversal, the description of the nodes is collected
• The correct server sends the up-to-date tree structure
• The client updates its image

Page 21: SD-Rtree: A Scalable Distributed Rtree

26

Out-of-range situation

Page 22: SD-Rtree: A Scalable Distributed Rtree

27

Insertion of objects

Page 23: SD-Rtree: A Scalable Distributed Rtree

28

Overlapping management

• The directory rectangles in an Rtree may overlap
• The local subtree does not suffice for locating all the nodes that contain the point (point query) or the window (window query) searched for
• SD-Rtree servers maintain data on node overlapping: Redundant Coverage
• It avoids systematically accessing the root node

Page 24: SD-Rtree: A Scalable Distributed Rtree

29

Redundant Coverage Example

The region common to A and B is stored on both nodes

If a point query sent to A falls in the region shared with B: A sends a point query message to B

For D: we must keep the intersection with C or B: here empty.
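The example can be mimicked with a minimal sketch in which each node keeps the region it shares with other nodes and forwards a point query when the point falls inside such a region. The `DataNode` class, the `link` helper, and the coordinates are assumptions; the rectangles are illustrative, not those of the slide's figure.

```python
def intersection(a, b):
    """Common region of two rectangles (xmin, ymin, xmax, ymax), or None."""
    r = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return r if r[0] <= r[2] and r[1] <= r[3] else None


def contains_point(r, p):
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


class DataNode:
    def __init__(self, name, mbb, objects):
        self.name, self.mbb, self.objects = name, mbb, objects
        self.overlaps = {}  # other node -> region shared with it


def link(a, b):
    """Store the common region on both nodes; nothing is kept if it is empty."""
    region = intersection(a.mbb, b.mbb)
    if region is not None:
        a.overlaps[b] = region
        b.overlaps[a] = region


def point_query(node, p, forwarded=False):
    hits = [name for name, rect in node.objects if contains_point(rect, p)]
    if not forwarded:
        for other, region in node.overlaps.items():
            if contains_point(region, p):  # the point lies in a shared region
                hits += point_query(other, p, forwarded=True)
    return hits


A = DataNode("A", (0, 0, 6, 6), [("a1", (4, 4, 6, 6))])
B = DataNode("B", (4, 4, 10, 10), [("b1", (4, 4, 5, 5)), ("b2", (8, 8, 9, 9))])
D = DataNode("D", (20, 20, 25, 25), [])
link(A, B)
link(D, A)
link(D, B)
shared = point_query(A, (4.5, 4.5))  # falls in the region common to A and B
local = point_query(A, (1, 1))       # outside the shared region: no forwarding
```

Here D intersects neither A nor B, so its overlap table stays empty, as in the slide's remark about D.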

Page 25: SD-Rtree: A Scalable Distributed Rtree

30

Queries

Point queries and window queries. The technique is similar to the insertion algorithm:
• Search in the client image for a server whose mbb contains the point or intersects the window
• Send the query to this server
• If the server actually covers the point or the window, it answers the client; else it sends the query to its parent node
• A server uses the overlapping information to transmit the query
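The first three steps can be sketched for a point query as follows. This is a simplified, single-process model: it assumes sibling boxes do not overlap, so the overlapping information of the last step is left out, and the `Node` class, `client_query`, and `server_receive` are hypothetical names.

```python
class Node:
    def __init__(self, mbb, children=(), objects=()):
        self.mbb, self.children, self.objects = mbb, list(children), list(objects)
        self.parent = None
        for child in self.children:
            child.parent = self


def covers(rect, p):
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]


def descend(node, p):
    """Top-down search of the subtree rooted at node."""
    if not covers(node.mbb, p):
        return []
    hits = [name for name, rect in node.objects if covers(rect, p)]
    for child in node.children:
        hits += descend(child, p)
    return hits


def server_receive(node, p):
    """Climb to the parent until a node covers p, then search below it."""
    while not covers(node.mbb, p) and node.parent is not None:
        node = node.parent
    return descend(node, p)


def client_query(image, contact, p):
    """image maps known nodes to their mbb as last seen (possibly outdated)."""
    target = next((n for n, mbb in image.items() if covers(mbb, p)), contact)
    return server_receive(target, p)


d1 = Node((0, 0, 5, 5), objects=[("a", (1, 1, 2, 2))])
d2 = Node((6, 0, 10, 5), objects=[("b", (7, 1, 8, 2))])
root = Node((0, 0, 10, 5), children=[d1, d2])
image = {d1: (0, 0, 10, 5)}  # outdated: recorded before d1 split
found = client_query(image, d1, (7.5, 1.5))
out_of_range = client_query(image, d1, (50, 50))
```

In the first query the outdated image sends the client to d1, which no longer covers the point and passes the query to its parent. The second query lies outside every box and returns nothing after reaching the root.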

Page 26: SD-Rtree: A Scalable Distributed Rtree

31

Experiments

• Synthetic data (points and rectangles) generated with GSTD
• 50,000 to 500,000 objects
• 0 to 3,000 queries
• Server capacity: 3,000 objects

Comparison of three SD-Rtree variants:
• BASIC: no image; every query is processed top-down from the root
• IMSERVER: no IAMs among the servers
• IMCLIENT: client images

Page 27: SD-Rtree: A Scalable Distributed Rtree

33

Per Insert Cost

Page 28: SD-Rtree: A Scalable Distributed Rtree

34

Cost of balancing

Page 29: SD-Rtree: A Scalable Distributed Rtree

35

Image convergence

Page 30: SD-Rtree: A Scalable Distributed Rtree

36

Distribution of messages

Page 31: SD-Rtree: A Scalable Distributed Rtree

37

Cost per Query

Page 32: SD-Rtree: A Scalable Distributed Rtree

38

Conclusion

• SD-Rtree is an efficient scalable distributed Rtree
• For very large spatial data collections
  • Can be processed in distributed RAM
  • Access time much faster than to disk data
• Load balancing
  • Spatial rotations
• Overlapping management
  • Redundant coverage
• O(log n) worst insert cost
• Future work
  • kNN-queries
  • Objects distribution balancing on servers

Page 33: SD-Rtree: A Scalable Distributed Rtree

39

SD-Rtree

Thank You for Your Attention

Questions: [email protected]