Trees

CSE 326: Data Structures Binary Search Trees

Transcript of Trees

Page 1: Trees

CSE 326: Data Structures: Binary Search Trees

Page 2: Trees

Today’s Outline

• Dictionary ADT / Search ADT

• Quick Tree Review

• Binary Search Trees

Page 3: Trees

ADTs Seen So Far

• Stack: Push, Pop

• Queue: Enqueue, Dequeue

• Priority Queue: Insert, DeleteMin

Then there is decreaseKey…

Page 4: Trees

The Dictionary ADT

• Data:
  – a set of (key, value) pairs

• Operations:
  – Insert(key, value)
  – Find(key)
  – Remove(key)

The Dictionary ADT is also called the “Map ADT”

• jfogarty → (James Fogarty, CSE 666)

• phenry → (Peter Henry, CSE 002)

• boqin → (Bo Qin, CSE 002)

insert(jfogarty, ….)

find(boqin) → (Bo Qin, …)

Page 5: Trees

A Modest Few Uses

• Sets

• Dictionaries

• Networks : Router tables

• Operating systems : Page tables

• Compilers : Symbol tables


Probably the most widely used ADT!

Page 6: Trees

Implementations

• Unsorted Linked-list

• Unsorted array

• Sorted array

Runtime of insert, delete, and find for each?

Page 7: Trees

Tree Calculations

Recall: height is the max number of edges from the root to a leaf

Find the height of the tree...

[Diagram: a tree rooted at node t]

runtime:

Page 8: Trees

Tree Calculations Example

[Diagram: example tree with nodes A through N]

How high is this tree?

Page 9: Trees

More Recursive Tree Calculations: Tree Traversals

A traversal is an order for visiting all the nodes of a tree

Three types:
• Pre-order: Root, left subtree, right subtree

• In-order: Left subtree, root, right subtree

• Post-order: Left subtree, right subtree, root

[Diagram: an expression tree for (2 * 4) + 5]

Page 10: Trees

Inorder Traversal

void traverse(BNode t){
  if (t != NULL) {
    traverse(t.left);
    process t.element;
    traverse(t.right);
  }
}
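For comparison, the pre-order and post-order versions differ only in where the node is processed. A minimal sketch in the same pseudocode style as the slide (BNode and "process" are taken from the slide above):

void preOrderTraverse(BNode t){
  if (t != NULL) {
    process t.element;       // root before both subtrees
    preOrderTraverse(t.left);
    preOrderTraverse(t.right);
  }
}

void postOrderTraverse(BNode t){
  if (t != NULL) {
    postOrderTraverse(t.left);
    postOrderTraverse(t.right);
    process t.element;       // root after both subtrees
  }
}

On the expression tree above, in-order prints 2 * 4 + 5, pre-order prints + * 2 4 5, and post-order prints 2 4 * 5 + (the postfix form).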

Page 11: Trees

Binary Trees

• A binary tree is
  – a root
  – a left subtree (maybe empty)
  – a right subtree (maybe empty)

• Representation: each node stores its data, a left pointer, and a right pointer

[Diagram: example binary tree with nodes A through J]

Page 12: Trees

Binary Tree: Representation

[Diagram: the tree A, B, C, D, E, F drawn as linked nodes, each holding data, a left pointer, and a right pointer]

Page 13: Trees

Binary Tree: Special Cases

[Diagrams: three example trees, labeled Full Tree, Complete Tree, and Perfect Tree]

Page 14: Trees

Binary Tree: Some Numbers!

For a binary tree of height h:

– max # of leaves:

– max # of nodes:

– min # of leaves:

– min # of nodes:

Average Depth for N nodes?
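For reference (standard results, not filled in on the slide): max # of leaves = 2^h; max # of nodes = 2^(h+1) − 1; min # of leaves = 1; min # of nodes = h + 1. The average depth of a node depends on the shape: Θ(log N) for a balanced tree, but up to Θ(N) for a degenerate chain.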

Page 15: Trees

Binary Search Tree Data Structure

• Structural property
  – each node has at most 2 children
  – result: storage is small, operations are simple, average depth is small

• Order property
  – all keys in the left subtree are smaller than the root’s key
  – all keys in the right subtree are larger than the root’s key
  – result: easy to find any given key

• What must I know about what I store?

[Diagram: example BST with keys 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

Page 16: Trees

Example and Counter-Example

[Diagrams: two example trees]

BINARY SEARCH TREES?

Page 17: Trees

Find in BST, Recursive

Node Find(Object key, Node root) {
  if (root == NULL)
    return NULL;
  if (key < root.key)
    return Find(key, root.left);
  else if (key > root.key)
    return Find(key, root.right);
  else
    return root;
}

[Diagram: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

Runtime:

Page 18: Trees

Find in BST, Iterative

Node Find(Object key, Node root) {
  while (root != NULL && root.key != key) {
    if (key < root.key)
      root = root.left;
    else
      root = root.right;
  }
  return root;
}

[Diagram: the example BST]

Runtime:

Page 19: Trees

Insert in BST

[Diagram: the example BST]

Runtime:

Insert(13), Insert(8), Insert(31)

Insertions happen only at the leaves – easy!
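A recursive insert sketch in the same style as the Find code (an assumed illustration, not the course's code; Node is the slide's node type):

Node Insert(Object key, Node root) {
  if (root == NULL)
    return new Node(key);               // the new node always hangs off a leaf position
  if (key < root.key)
    root.left = Insert(key, root.left);
  else if (key > root.key)
    root.right = Insert(key, root.right);
  // key already present: do nothing
  return root;
}

Called as root = Insert(13, root), this walks the same path Find would and attaches the new node exactly where Find falls off the tree.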

Page 20: Trees

BuildTree for BST

• Suppose keys 1, 2, 3, 4, 5, 6, 7, 8, 9 are inserted into an initially empty BST. Runtime depends on the order!

– in given order

– in reverse order

– median first, then left median, right median, etc.
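For reference (standard analysis, not shown on the slide): inserting in the given order or in reverse order builds a degenerate chain, costing 1 + 2 + … + (n−1) = O(n^2) total; inserting the overall median first, then recursively the medians of each half, keeps the tree balanced and builds it in O(n log n) time.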

Page 21: Trees

Bonus: FindMin/FindMax

• Find minimum

• Find maximum

[Diagram: the example BST]

Page 22: Trees

Deletion in BST

[Diagram: the example BST]

Why might deletion be harder than insertion?

Page 23: Trees

Lazy Deletion

Instead of physically deleting nodes, just mark them as deleted

+ simpler
+ physical deletions done in batches
+ some adds just flip the deleted flag

– extra memory for the “deleted” flag
– many lazy deletions = slow finds
– some operations may have to be modified (e.g., min and max)

[Diagram: the example BST]

Page 24: Trees

Non-lazy Deletion

• Removing an item disrupts the tree structure.

• Basic idea: find the node that is to be removed. Then “fix” the tree so that it is still a binary search tree.

• Three cases:
  – node has no children (leaf node)
  – node has one child
  – node has two children

Page 25: Trees

Non-lazy Deletion – The Leaf Case

[Diagram: the example BST]

Delete(17)

Page 26: Trees

Deletion – The One Child Case

[Diagram: the example BST, now without 17]

Delete(15)

Page 27: Trees

Deletion – The Two Child Case

[Diagram: BST with keys 2, 5, 7, 9, 10, 20, 30]

Delete(5)

What can we replace 5 with?

Page 28: Trees

Deletion – The Two Child Case

Idea: Replace the deleted node with a value guaranteed to be between the two child subtrees

Options:
• succ from the right subtree: findMin(t.right)
• pred from the left subtree: findMax(t.left)

Now delete the original node containing succ or pred
• Leaf or one child case – easy!
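A sketch of deletion using the succ option, in the same pseudocode style as the earlier slides (FindMin returns the node with the smallest key in a subtree; an assumed illustration, not the course code):

Node Delete(Object key, Node root) {
  if (root == NULL) return NULL;
  if (key < root.key)
    root.left = Delete(key, root.left);
  else if (key > root.key)
    root.right = Delete(key, root.right);
  else if (root.left != NULL && root.right != NULL) {
    root.key = FindMin(root.right).key;           // copy succ up into this node
    root.right = Delete(root.key, root.right);    // then remove succ: leaf or one-child case
  } else {
    root = (root.left != NULL) ? root.left : root.right;   // zero or one child
  }
  return root;
}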

Page 29: Trees

Finally…

[Diagram: the tree after the deletion]

7 replaces 5

The original node containing 7 gets deleted

Page 30: Trees

Balanced BST

Observation
• BST: the shallower the better!
• For a BST with n nodes
  – Average height is O(log n)
  – Worst case height is O(n)
• Simple cases such as insert(1, 2, 3, ..., n) lead to the worst case scenario

Solution: Require a Balance Condition that
1. ensures depth is O(log n) – strong enough!
2. is easy to maintain – not too strong!

Page 31: Trees

Potential Balance Conditions

1. Left and right subtrees of the root have an equal number of nodes

2. Left and right subtrees of the root have equal height

Page 32: Trees

Potential Balance Conditions

3. Left and right subtrees of every node have an equal number of nodes

4. Left and right subtrees of every node have equal height

Page 33: Trees

CSE 326: Data Structures: AVL Trees

Page 34: Trees

Balanced BST

Observation
• BST: the shallower the better!
• For a BST with n nodes
  – Average height is O(log n)
  – Worst case height is O(n)
• Simple cases such as insert(1, 2, 3, ..., n) lead to the worst case scenario

Solution: Require a Balance Condition that
1. ensures depth is O(log n) – strong enough!
2. is easy to maintain – not too strong!

Page 35: Trees

Potential Balance Conditions

1. Left and right subtrees of the root have an equal number of nodes

2. Left and right subtrees of the root have equal height

3. Left and right subtrees of every node have an equal number of nodes

4. Left and right subtrees of every node have equal height

Page 36: Trees

The AVL Balance Condition

AVL balance property:
Left and right subtrees of every node have heights differing by at most 1

• Ensures small depth
  – Will prove this by showing that an AVL tree of height h must have a lot of (i.e., O(2^h)) nodes

• Easy to maintain
  – Using single and double rotations

Page 37: Trees

The AVL Tree Data Structure

Structural properties
1. Binary tree property (0, 1, or 2 children)
2. Heights of left and right subtrees of every node differ by at most 1

Result: Worst case depth of any node is O(log n)

Ordering property
– Same as for BST

[Diagram: example AVL tree]

Page 38: Trees

[Diagrams: two trees]

AVL trees or not?

Page 39: Trees

Proving Shallowness Bound


Let S(h) be the min # of nodes in an AVL tree of height h

Claim: S(h) = S(h-1) + S(h-2) + 1

Solution of recurrence: S(h) = O(2^h) (like Fibonacci numbers)

[Diagram: AVL tree of height h = 4 with the min # of nodes (12)]

Page 40: Trees

Testing the Balance Property

[Diagram: the example BST, with NULL children shown]

NULLs have height -1

We need to be able to:

1. Track Balance

2. Detect Imbalance

3. Restore Balance
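A small sketch of tracking and detecting balance, assuming every node stores a height field and using the slide's convention that NULL has height -1 (an illustration, not the course code):

int height(Node t) {
  return (t == NULL) ? -1 : t.height;
}

void updateHeight(Node t) {
  t.height = 1 + max(height(t.left), height(t.right));
}

boolean isBalancedAt(Node t) {
  return abs(height(t.left) - height(t.right)) <= 1;   // the AVL condition at this node
}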

Page 41: Trees

An AVL Tree

[Diagram: an AVL tree; each node is annotated with its data, its height, and pointers to its children]

Page 42: Trees

AVL trees: find, insert

• AVL find:
  – same as BST find.

• AVL insert:
  – same as BST insert, except we may need to “fix” the AVL tree after inserting the new value.

Page 43: Trees

AVL tree insert

Let x be the node where an imbalance occurs.

Four cases to consider. The insertion is in the
1. left subtree of the left child of x
2. right subtree of the left child of x
3. left subtree of the right child of x
4. right subtree of the right child of x

Idea: Cases 1 & 4 are solved by a single rotation.
Cases 2 & 3 are solved by a double rotation.

Page 44: Trees

Bad Case #1

Insert(6)

Insert(3)

Insert(1)

Page 45: Trees

Fix: Apply Single Rotation

[Diagrams: inserting 6, 3, 1 produces the chain 6 → 3 → 1; the AVL property is violated at node 6 (x). After a single rotation, 3 is the root with children 1 and 6.]

Single Rotation:
1. Rotate between x and child

Page 46: Trees

Single rotation in general

[Diagram: before, a is the root with left child b and right subtree Z; b has subtrees X (where the insertion happened) and Y. After, b is the root with left subtree X and right child a, whose subtrees are Y and Z.]

X < b < Y < a < Z

Height of tree before? Height of tree after? Effect on Ancestors?
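A sketch of the case-1 (left-left) single rotation and its mirror image, reusing the height helpers sketched earlier (an assumed illustration, not the course code):

Node rotateWithLeftChild(Node a) {    // fixes an imbalance from an insertion in a's left-left side
  Node b = a.left;                    // b becomes the new root of this subtree
  a.left = b.right;                   // subtree Y moves across to a
  b.right = a;
  updateHeight(a);
  updateHeight(b);
  return b;
}

Node rotateWithRightChild(Node a) {   // mirror image, for the right-right case (case 4)
  Node b = a.right;
  a.right = b.left;
  b.left = a;
  updateHeight(a);
  updateHeight(b);
  return b;
}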

Page 47: Trees

Bad Case #2

Insert(1)

Insert(6)

Insert(3)

Page 48: Trees

Fix: Apply Double Rotation

[Diagrams: inserting 1, 6, 3 produces the zig-zag chain 1 → 6 → 3; the AVL property is violated at node 1 (x). After a double rotation, 3 is the root with children 1 and 6.]

Double Rotation
1. Rotate between x’s child and grandchild
2. Rotate between x and x’s new child

Page 49: Trees

Double rotation in general

[Diagram: before, a is the root with left child b and right subtree Z; b has left subtree W and right child c; c has subtrees X and Y, one of which received the insertion. After the double rotation, c is the root with left child b (subtrees W and X) and right child a (subtrees Y and Z).]

W < b < X < c < Y < a < Z

Height of tree before? Height of tree after? Effect on Ancestors?
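Following the slide's recipe, a double rotation is just two of the single rotations sketched above (case 2, an insertion in the right subtree of the left child; case 3 is the mirror image):

Node doubleWithLeftChild(Node a) {
  a.left = rotateWithRightChild(a.left);   // step 1: rotate between x's child and grandchild
  return rotateWithLeftChild(a);           // step 2: rotate between x and its new child
}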

Page 50: Trees

Double rotation, step 1

[Diagrams: the tree before and after step 1 of the double rotation]

Page 51: Trees

Double rotation, step 2

[Diagrams: the tree before and after step 2 of the double rotation]

Page 52: Trees

Imbalance at node X

Single Rotation

1. Rotate between x and child

Double Rotation

1. Rotate between x’s child and grandchild

2. Rotate between x and x’s new child

Page 53: Trees

Single and Double Rotations:

[Diagram: AVL tree with keys 2, 5, 7, 9, 11, 13, 30]

Inserting what integer values would cause the tree to need a:
1. single rotation?
2. double rotation?
3. no rotation?

Page 54: Trees

Insertion into AVL tree

1. Find spot for new key

2. Hang new node there with this key

3. Search back up the path for imbalance

4. If there is an imbalance:
   case #1: Perform single rotation and exit
   case #2: Perform double rotation and exit

Both rotations keep the subtree height unchanged. Hence only one rotation is sufficient!

Page 55: Trees

Easy Insert

[Diagram: the example AVL tree, with node heights shown]

Insert(3)

Unbalanced?

Page 56: Trees

Hard Insert (Bad Case #1)

[Diagram: the tree after Insert(3), with node heights shown]

Insert(33)

Unbalanced?

How to fix?

Page 57: Trees

Single Rotation

[Diagrams: the tree before and after the single rotation that restores balance]

Page 58: Trees

Hard Insert (Bad Case #2)

Insert(18)

[Diagram: the tree with node heights shown]

Unbalanced?

How to fix?

Page 59: Trees

Single Rotation (oops!)

[Diagrams: a single rotation does not fix this imbalance]

Page 60: Trees

Double Rotation (Step #1)

[Diagrams: the tree before and after step 1 of the double rotation]

Page 61: Trees

Double Rotation (Step #2)

[Diagrams: the tree before and after step 2 of the double rotation]

Page 62: Trees

Insert into an AVL tree: 5, 8, 9, 4, 2, 7, 3, 1

Page 63: Trees

CSE 326: Data Structures: Splay Trees

Page 64: Trees

AVL Trees Revisited

• Balance condition: Left and right subtrees of every node have heights differing by at most 1
  – Strong enough: Worst case depth is O(log n)
  – Easy to maintain: one single or double rotation

• Guaranteed O(log n) running time for
  – Find?
  – Insert?
  – Delete?
  – buildTree?

Page 65: Trees

Single and Double Rotations

[Diagrams: the generic single-rotation and double-rotation pictures from earlier]

Page 66: Trees

AVL Trees Revisited

• What extra info did we maintain in each node?

• Where were rotations performed?

• How did we locate this node?

Page 67: Trees

Other Possibilities?

• Could use different balance conditions, different ways to maintain balance, different guarantees on running time, …

• Why aren’t AVL trees perfect?

• Many other balanced BST data structures
  – Red-Black trees
  – AA trees
  – Splay Trees
  – 2-3 Trees
  – B-Trees
  – …

Page 68: Trees

Splay Trees

• Blind adjusting version of AVL trees
  – Why worry about balance? Just rotate anyway!

• Amortized time per operation is O(log n)
• Worst case time per operation is O(n)
  – But guaranteed to happen rarely

Insert/Find always rotate the node to the root!

SAT/GRE analogy question: AVL is to Splay trees as ___________ is to __________

Page 69: Trees

Recall: Amortized Complexity

If a sequence of M operations takes O(M f(n)) time, we say the amortized runtime is O(f(n)).

Amortized complexity is a worst-case guarantee over sequences of operations.

• Worst case time per operation can still be large, say O(n)

• Worst case time for any sequence of M operations is O(M f(n))

Average time per operation for any sequence is O(f(n))
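A concrete illustration (not from the slide): if a sequence of M operations contains one operation that costs O(M) and M − 1 operations that cost O(1) each, the total is O(M), so the amortized cost is O(1) per operation even though the single worst operation costs O(M).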

Page 70: Trees

Recall: Amortized Complexity

• Is amortized guarantee any weaker than worst case?

• Is amortized guarantee any stronger than average case?

• Is average case guarantee good enough in practice?

• Is amortized guarantee good enough in practice?

Page 71: Trees

The Splay Tree Idea

[Diagram: a long, skinny tree with a deep access path]

If you’re forced to make a really deep access:

Since you’re down there anyway, fix up a lot of deep nodes!

Page 72: Trees

Find/Insert in Splay Trees

1. Find or insert a node k
2. Splay k to the root using: zig-zag, zig-zig, or plain old zig rotation

Why could this be good??

1. Helps the new root, k
   o Great if k is accessed again
2. And helps many others!
   o Great if many others on the path are accessed

Page 73: Trees

Splaying node k to the root: Need to be careful!

One option (that we won’t use) is to repeatedly use AVL single rotation until k becomes the root (see Section 4.5.1 for details):

[Diagrams: rotating k up one level at a time with single rotations]

Page 74: Trees

Splaying node k to the root: Need to be careful!

What’s bad about this process?

[Diagrams: the same single-rotation process, repeated]

Page 75: Trees

Splay: Zig-Zag*

[Diagram: zig-zag case: grandparent g, parent p, and node k with subtrees W, X, Y, Z; after the splay, k is on top with children p and g]

*Just like an…

Which nodes improve depth?

Page 76: Trees

Splay: Zig-Zig*

[Diagram: zig-zig case: grandparent g, parent p, and node k in a line; after the splay the order is reversed, with k on top, then p, then g]

*Is this just two AVL single rotations in a row?

Page 77: Trees

Special Case for Root: Zig

[Diagram: p is the root with child k; a single zig rotation makes k the root]

Relative depth of p, Y, Z? Relative depth of everyone else?

Why not drop zig-zig and just zig all the way?

Page 78: Trees

Splaying Example: Find(6)

Find(6)

[Diagrams: a deep chain containing 6; the first splay step brings 6 up two levels]

Page 79: Trees

Still Splaying 6

[Diagrams: the next splay step brings 6 further up]

Page 80: Trees

Finally…

[Diagrams: the final splay step makes 6 the root]

Page 81: Trees

Another Splay: Find(4)

Find(4)

[Diagrams: splaying 4 brings it to the root]

Page 82: Trees

Example Splayed Out

[Diagrams: the tree before and after 4 is splayed to the root]

Page 83: Trees

But Wait…

What happened here?

Didn’t two find operations take linear time instead of logarithmic?

What about the amortized O(log n) guarantee?

Page 84: Trees

Why Splaying Helps

• If a node n on the access path is at depth d before the splay, it’s at about depth d/2 after the splay

• Overall, nodes which are low on the access path tend to move closer to the root

• Splaying gets amortized O(log n) performance. (Maybe not now, but soon, and for the rest of the operations.)

Page 85: Trees

Practical Benefit of Splaying

• No heights to maintain, no imbalance to check for
  – Less storage per node, easier to code

• Data accessed once is often soon accessed again
  – Splaying does implicit caching by bringing it to the root

Page 86: Trees

Splay Operations: Find

• Find the node in normal BST manner

• Splay the node to the root
  – if the node is not found, splay what would have been its parent

What if we didn’t splay?

Amortized guarantee fails!
Bad sequence: find(leaf k), find(k), find(k), …

Page 87: Trees

Splay Operations: Insert

• Insert the node in normal BST manner

• Splay the node to the root

What if we didn’t splay?

Page 88: Trees

Splay Operations: Remove

find(k)

[Diagram: after the splay, k is the root with left subtree L (keys < k) and right subtree R (keys > k); now delete k]

Now what?

Page 89: Trees

Join

Join(L, R): given two trees such that (stuff in L) < (stuff in R), merge them:

Splay the maximum element in L to the root, then attach R as its right subtree

[Diagram: splay max(L); hang R off the new root]

Does this work to join any two trees?

Page 90: Trees

Delete Example

Delete(4)

[Diagrams: find(4) splays 4 to the root; delete it, leaving subtrees L = {1, 2} and R = {6, 7, 9}; find the max of L (2), splay it to the root, and attach R as its right subtree]

Page 91: Trees

Splay Tree Summary

• All operations are in amortized O(log n) time

• Splaying can be done top-down; this may be better because:
  – only one pass
  – no recursion or parent pointers necessary
  – we didn’t cover top-down in class

• Splay trees are very effective search trees
  – Relatively simple
  – No extra fields required
  – Excellent locality properties: frequently accessed keys are cheap to find

Page 92: Trees

Splay E

[Diagram: a BST containing A through I; splay E]

Page 93: Trees

Splay E

[Diagram: a tree containing A through I, partway through splaying E]

Page 94: Trees

CSE 326: Data Structures: B-Trees

Page 95: Trees

B-Trees

Weiss Sec. 4.7

Page 96: Trees

Time to access (conservative):

CPU (has registers): 1 ns per instruction
Cache (SRAM, 8KB - 4MB): 2-10 ns
Main Memory (DRAM, up to 10GB): 40-100 ns
Disk (many GB): a few milliseconds (5-10 million ns)

Page 97: Trees

Trees so far

• BST

• AVL

• Splay

Page 98: Trees

AVL trees

Suppose we have 100 million items (100,000,000):

• Depth of AVL Tree

• Number of Disk Accesses
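For reference (standard back-of-the-envelope numbers, not printed on the slide): an AVL tree is at most about 1.44 log2 n levels deep, and log2(10^8) ≈ 27, so the depth is roughly 27 to 38. If each node lives in its own disk block, a find can therefore cost on the order of tens of disk accesses, i.e., tens of milliseconds.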

Page 99: Trees

M-ary Search Tree

• Maximum branching factor of M

• Complete tree has height =

# disk accesses for find:

Runtime of find:
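For reference (standard results): a complete M-ary tree with n nodes has height Θ(log_M n) = Θ(log n / log M), so a find touches O(log_M n) nodes (disk accesses), while the total work is still O(log n) comparisons if each node's keys are binary-searched.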

Page 100: Trees

Solution: B-Trees

• specialized M-ary search trees

• Each node has (up to) M-1 keys:
  – the subtree between two keys x and y contains leaves with values v such that x ≤ v < y

• Pick the branching factor M such that each node takes one full {page, block} of memory

[Diagram: a node with keys 3, 7, 12, 21 and subtrees for x < 3, 3 ≤ x < 7, 7 ≤ x < 12, 12 ≤ x < 21, and 21 ≤ x]

Page 101: Trees

B-Trees

What makes them disk-friendly?

1. Many keys stored in a node
   • All brought to memory/cache in one access!

2. Internal nodes contain only keys; only leaf nodes contain keys and actual data
   • The tree structure can be loaded into memory irrespective of data object size
   • Data actually resides on disk

Page 102: Trees

B-Tree: Example

B-Tree with M = 4 (# pointers in internal node) and L = 4 (# data items in leaf)

[Diagram: root with keys 10, 40; internal nodes with keys 3 | 15, 20, 30 | 50; leaves 1 2 | 3 5 6 9 | 10 11 12 | 15 17 | 20 25 26 | 30 32 33 36 | 40 42 | 50 60 70; each leaf entry also points to a data object]

Note: All leaves at the same depth! (Data objects are ignored in the rest of the slides.)

Page 103: Trees

B-Tree Properties ‡

– Data is stored at the leaves
– All leaves are at the same depth and contain between ⌈L/2⌉ and L data items
– Internal nodes store up to M-1 keys
– Internal nodes have between ⌈M/2⌉ and M children
– Root (special case) has between 2 and M children (or the root could be a leaf)

‡ These are technically B+-Trees
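A rough sketch of the node layouts these properties imply, with hypothetical field and type names (Key, Data, and the constants M and L are assumptions, not from the slides); a sketch, not the course implementation:

class InternalNode {           // stores only keys, never data
  int numKeys;                 // up to M-1 keys; keys[i] separates children[i] and children[i+1]
  Key[]  keys;
  Node[] children;             // between ceil(M/2) and M children (root: between 2 and M)
}

class LeafNode {               // all leaves are at the same depth
  int numItems;                // between ceil(L/2) and L data items
  Key[]  keys;
  Data[] data;                 // the actual data objects live only here
}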

Page 104: Trees

Example, Again

B-Tree with M = 4 and L = 4

[Diagram: the same B-Tree as before]

(Only showing keys, but leaves also have data!)

Page 105: Trees

Building a B-Tree

M = 3, L = 2

[Diagrams: the empty B-Tree; Insert(3) creates a single leaf holding 3; Insert(14) adds 14 to that leaf]

Now, Insert(1)?

Page 106: Trees

Splitting the Root

M = 3, L = 2

Insert(1): too many keys in a leaf!

So, split the leaf and create a new root.

[Diagrams: the overfull leaf 1 3 14 splits into leaves 1 3 and 14 under a new root with key 14]

Page 107: Trees

Overflowing leaves

M = 3, L = 2

Insert(59): fits in the existing leaf (14 59).

Insert(26): too many keys in a leaf!

So, split the leaf and add a new child.

[Diagrams: the leaf 14 26 59 splits into 14 26 and 59; the root now has keys 14 and 59 over leaves 1 3 | 14 26 | 59]

Page 108: Trees

Propagating Splits

M = 3, L = 2

Insert(5): split the leaf, but there is no space in the parent!

So, split the node too, add the new child, and create a new root.

[Diagrams: the split propagates up; the final tree has root key 14 over internal nodes with keys 5 and 59, and leaves 1 3 | 5 | 14 26 | 59]

Page 109: Trees

Insertion Algorithm

1. Insert the key in its leaf

2. If the leaf ends up with L+1 items, overflow!
   – Split the leaf into two nodes:
     • original with ⌈(L+1)/2⌉ items
     • new one with ⌊(L+1)/2⌋ items
   – Add the new child to the parent
   – If the parent ends up with M+1 items, overflow!

3. If an internal node ends up with M+1 items, overflow!
   – Split the node into two nodes:
     • original with ⌈(M+1)/2⌉ items
     • new one with ⌊(M+1)/2⌋ items
   – Add the new child to the parent
   – If the parent ends up with M+1 items, overflow!

4. Split an overflowed root in two and hang the new nodes under a new root

This makes the tree deeper!

Page 110: Trees

After More Routine Inserts

M = 3, L = 2

Insert(89), Insert(79)

[Diagram: root 14; left child with key 5 over leaves 1 3 | 5; right child with keys 59, 89 over leaves 14 26 | 59 79 | 89]

Page 111: Trees

Deletion

M = 3, L = 2

Delete(59)

1. Delete the item from its leaf
2. Update the keys of ancestors if necessary

[Diagrams: the tree before and after deleting 59; the separating key 59 becomes 79]

What could go wrong?

Page 112: Trees

Deletion and Adoption

M = 3, L = 2

Delete(5)?

A leaf has too few keys!

So, borrow from a sibling.

[Diagrams: the empty leaf adopts 3 from its sibling; the parent key becomes 3]

Page 113: Trees

Does Adoption Always Work?

• What if the sibling doesn’t have enough for you to borrow from?

e.g. you have ⌈L/2⌉ - 1 items and your sibling has only ⌈L/2⌉?

Page 114: Trees

Deletion and Merging

M = 3, L = 2

Delete(3)?

A leaf has too few keys! And no sibling with surplus!

So, delete the leaf.

But now an internal node has too few subtrees!

[Diagrams: the tree before and after the leaf is deleted and merged]

Page 115: Trees

Deletion with Propagation (More Adoption)

M = 3, L = 2

Adopt a neighbor

[Diagrams: the underfull internal node adopts a subtree from its neighbor, and the keys are updated]

Page 116: Trees

A Bit More Adoption

M = 3, L = 2

Delete(1) (adopt a sibling)

[Diagrams: the tree before and after deleting 1; the remaining leaf adopts from its sibling and the parent key becomes 26]

Page 117: Trees

Pulling out the Root

M = 3, L = 2

Delete(26)

A leaf has too few keys! And no sibling with surplus!
So, delete the leaf and merge.

A node has too few subtrees and no neighbor with surplus!
Delete the node.

But now the root has just one subtree!

[Diagrams: the tree shrinking step by step]

Page 118: Trees

Pulling out the Root (continued)

M = 3, L = 2

The root has just one subtree!

Simply make the one child the new root!

[Diagrams: the old root is removed; its only child becomes the new root]

Page 119: Trees

Deletion Algorithm

1. Remove the key from its leaf

2. If the leaf ends up with fewer than ⌈L/2⌉ items, underflow!
   – Adopt data from a sibling; update the parent
   – If adopting won’t work, delete the node and merge with a neighbor
   – If the parent ends up with fewer than ⌈M/2⌉ items, underflow!

Page 120: Trees

Deletion Slide Two

3. If an internal node ends up with fewer than ⌈M/2⌉ items, underflow!
   – Adopt from a neighbor; update the parent
   – If adoption won’t work, merge with a neighbor
   – If the parent ends up with fewer than ⌈M/2⌉ items, underflow!

4. If the root ends up with only one child, make the child the new root of the tree

This reduces the height of the tree!

Page 121: Trees

Thinking about B-Trees

• B-Tree insertion can cause (expensive) splitting and propagation
• B-Tree deletion can cause (cheap) adoption or (expensive) deletion, merging and propagation

• Propagation is rare if M and L are large (Why?)

• If M = L = 128, then a B-Tree of height 4 will store at least 30,000,000 items
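Checking that last figure (standard reasoning, not spelled out on the slide): with M = L = 128, the root has at least 2 children, every other internal node has at least ⌈M/2⌉ = 64 children, and every leaf holds at least ⌈L/2⌉ = 64 items. A tree of height 4 therefore has at least 2 · 64 · 64 · 64 = 524,288 leaves, i.e., at least 524,288 · 64 = 33,554,432 ≈ 30 million items.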

Page 122: Trees

Tree Names You Might Encounter

FYI:
– B-Trees with M = 3, L = x are called 2-3 trees
  • Nodes can have 2 or 3 keys
– B-Trees with M = 4, L = x are called 2-3-4 trees
  • Nodes can have 2, 3, or 4 keys

Page 123: Trees

K-D Trees and Quad Trees

Page 124: Trees

Range Queries

• Think of a range query.
  – “Give me all customers aged 45-55.”
  – “Give me all accounts worth $5m to $15m.”

• Can be done in time ________.

• What if we want both:
  – “Give me all customers aged 45-55 with accounts worth between $5m and $15m.”

Page 125: Trees

Geometric Data Structures

• Organization of points, lines, planes, etc. in support of faster processing

• Applications
  – Map information
  – Graphics: computing object intersections
  – Data compression: nearest neighbor search
  – Decision trees: machine learning

Page 126: Trees

k-d Trees

• Jon Bentley, 1975, while an undergraduate

• Tree used to store spatial data.
  – Nearest neighbor search
  – Range queries
  – Fast look-up

• k-d trees are guaranteed log2 n depth where n is the number of points in the set.
  – Traditionally, k-d trees store points in d-dimensional space, which are equivalent to vectors in d-dimensional space.

Page 127: Trees

Range Queries

[Diagrams: a set of points a through i in the plane, shown with a rectangular query and a circular query]

Rectangular query    Circular query

Page 128: Trees

Nearest Neighbor Search

[Diagram: the point set a through i with a query point]

Nearest neighbor is e.

Page 129: Trees

k-d Tree Construction

• If there is just one point, form a leaf with that point.
• Otherwise, divide the points in half by a line perpendicular to one of the axes.
• Recursively construct k-d trees for the two sets of points.
• Division strategies
  – divide points perpendicular to the axis with widest spread
  – divide in a round-robin fashion (the book does it this way)

Page 130: Trees

k-d Tree Construction

[Diagram: the point set a through i; the first dividing line is perpendicular to the axis with the widest spread]

divide perpendicular to the widest spread

Page 131: Trees

k-d Tree Construction (18)

[Diagram: the point set divided by splits s1 through s8, and the resulting k-d tree; internal nodes record the splitting axis (x or y), leaves hold the points a through i, and each leaf corresponds to a k-d tree cell]

Page 132: Trees

2-d Tree Decomposition

[Diagram: the first three splits of a 2-d tree decomposition]

Page 133: Trees

k-d Tree Splitting

[Diagram: the point set, with the points sorted in each dimension:
  x: a d g b e i c h f
  y: a c b d f e h g i]

• The max spread is the max of (f.x − a.x) and (i.y − a.y).
• In the selected dimension, the middle point in the list splits the data.
• To build the sorted lists for the other dimensions, scan the sorted list, adding each point to one of two sorted lists.

Page 134: Trees

k-d Tree Splitting

[Diagram: the same sorted lists, plus an indicator bit for each point marking which side of the split it falls on:
  points:    a b c d e f g h i
  indicator: 0 0 1 0 0 1 0 1 1]

Scan the sorted points in the y dimension and add each to the correct set:
  y (set 0): a b d e g
  y (set 1): c f h i

Page 135: Trees

k-d Tree Construction Complexity

• First sort the points in each dimension.
  – O(dn log n) time and dn storage.
  – These are stored in A[1..d, 1..n]

• Finding the widest spread and equally dividing the points into two subsets can be done in O(dn) time.

• We have the recurrence
  – T(n,d) ≤ 2T(n/2,d) + O(dn)

• Constructing the k-d tree can be done in O(dn log n) time and dn storage.

Page 136: Trees

Node Structure for k-d Trees

• A node has 5 fields
  – axis (splitting axis)
  – value (splitting value)
  – left (left subtree)
  – right (right subtree)
  – point (holds a point if left and right children are null)
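The same five fields written out as a small sketch (Point is an assumed type; an illustration, not the course code):

class KDNode {
  int    axis;     // splitting axis
  double value;    // splitting value
  KDNode left;     // left subtree  (points with coordinate < value)
  KDNode right;    // right subtree (points with coordinate >= value)
  Point  point;    // holds a point only if left and right are null (i.e., a leaf)
}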

Page 137: Trees

Rectangular Range Query

• Recursively search every cell that intersects the rectangle.

Page 138: Trees

Rectangular Range Query (1)

[Diagram: the k-d tree cells and the query rectangle; the search starts at the root]

Page 139: Trees

Rectangular Range Query (8)

[Diagram: the cells and tree nodes visited by the range query]

Page 140: Trees

Rectangular Range Query

print_range(xlow, xhigh, ylow, yhigh : integer, root : node pointer) {
  Case {
    root = null:
      return;
    root.left = null:                          // leaf: report the point if it lies in the rectangle
      if xlow < root.point.x and root.point.x < xhigh
         and ylow < root.point.y and root.point.y < yhigh
        then print(root);
    else:                                      // internal node: visit each child whose cell
                                               // can intersect the query rectangle
      if (root.axis = “x” and xlow < root.value) or
         (root.axis = “y” and ylow < root.value) then
        print_range(xlow, xhigh, ylow, yhigh, root.left);
      if (root.axis = “x” and xhigh > root.value) or
         (root.axis = “y” and yhigh > root.value) then
        print_range(xlow, xhigh, ylow, yhigh, root.right);
  }
}

Page 141: Trees

k-d Tree Nearest Neighbor Search

• Search recursively to find the point in the same cell as the query.

• On the return search each subtree where a closer point than the one you already know about might be found.
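A sketch of that idea in the same pseudocode style as the range-query code on the previous page (dist, best, and bestDist are assumed helpers/state; an illustration, not the course code):

nns(q : point, root : node pointer) {
  if root = null then return;
  if root.left = null and root.right = null then {        // leaf: candidate point
    if dist(q, root.point) < bestDist then
      { best := root.point; bestDist := dist(q, root.point); }
    return;
  }
  qcoord := (root.axis = “x”) ? q.x : q.y;
  near := (qcoord < root.value) ? root.left : root.right;  // the cell containing the query
  far  := (qcoord < root.value) ? root.right : root.left;
  nns(q, near);                                            // search the query's cell first
  if |qcoord - root.value| < bestDist then                 // a closer point could lie on the far side
    nns(q, far);
}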

Page 142: Trees

k-d Tree NNS (1)

[Diagram: the point set, the k-d tree, and the query point; the search first descends to the cell containing the query]

Page 143: Trees

k-d Tree NNS

[Diagram: on the way back up, every subtree whose cell lies within the current best distance w of the query is also searched]

Page 144: Trees

Notes on k-d NNS

• Has been shown to run in O(log n) average time per search in a reasonable model.

• Storage for the k-d tree is O(n).

• Preprocessing time is O(n log n) assuming d is a constant.

Page 145: Trees

Worst-Case for Nearest Neighbor Search

[Diagram: a point configuration and query that force many cells to be examined]

• Half of the points visited for a query
• Worst case O(n)
• But: on average (and in practice) nearest neighbor queries are O(log N)

Page 146: Trees

Quad Trees

• Space Partitioning

[Diagram: the plane recursively divided into quadrants, and the corresponding quad tree over the points a through g]

Page 147: Trees

A Bad Case

[Diagram: two points a and b that are very close together force many levels of subdivision]

Page 148: Trees

Notes on Quad Trees

• Number of nodes is O(n(1 + log(Δ/n))), where n is the number of points and Δ is the ratio of the width (or height) of the key space to the smallest distance between two points

• Height of the tree is O(log n + log Δ)

Page 149: Trees

K-D vs Quad

• k-D Trees
  – Density balanced trees
  – Height of the tree is O(log n) with batch insertion
  – Good choice for high dimension
  – Supports insert, find, nearest neighbor, range queries

• Quad Trees
  – Space partitioning tree
  – May not be balanced
  – Not a good choice for high dimension
  – Supports insert, delete, find, nearest neighbor, range queries

Page 150: Trees

Geometric Data Structures

• Geometric data structures are common.

• The k-d tree is one of the simplest.
  – Nearest neighbor search
  – Range queries

• Other data structures are used for
  – 3-d graphics models
  – Physical simulations