CSCE 3110 Data Structures & Algorithm Analysis

CSCE 3110 Data Structures & Algorithm Analysis Rada Mihalcea http://www.cs.unt.edu/~rada/CSCE3110 Search Trees Reading: Chap. 4, Weiss


Page 1: CSCE 3110 Data Structures &  Algorithm Analysis

CSCE 3110 Data Structures & Algorithm Analysis

Rada Mihalcea
http://www.cs.unt.edu/~rada/CSCE3110

Search Trees
Reading: Chap. 4, Weiss

Page 2: CSCE 3110 Data Structures &  Algorithm Analysis

Sorting with BST

Use binary search trees for sorting:
• Start with unsorted sequence
• Insert all elements in a BST
• Traverse the tree… how?
• Running time?
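The idea on this slide can be sketched as follows. This is a minimal illustration, not code from the course: the names `Node`, `insert`, `inorder` and `tree_sort` are assumptions. An in-order traversal visits the keys in sorted order; with a plain (unbalanced) BST the sort takes O(n log n) time on average but O(n^2) in the worst case, for example when the input is already sorted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key):
    """Insert key into an unbalanced BST and return the (possibly new) root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:  # duplicates go to the right
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def inorder(root):
    """Iterative in-order traversal: left subtree, node, right subtree."""
    out, stack, node = [], [], root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.key)
        node = node.right
    return out


def tree_sort(seq):
    root = None
    for key in seq:
        root = insert(root, key)
    return inorder(root)


print(tree_sort([44, 17, 88, 32, 65, 97, 28]))
```

The traversal is written iteratively so that a degenerate (list-like) tree does not exhaust the recursion limit.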

Page 3: CSCE 3110 Data Structures &  Algorithm Analysis

Better Binary Search Trees

Prevent the degeneration of the BST:
• A BST can be set up to maintain balance during updating operations (insertions and removals)
• Types of BST which maintain the optimal performance:
  • splay trees
  • AVL trees
  • Red-Black trees
  • B-trees

Page 4: CSCE 3110 Data Structures &  Algorithm Analysis

AVL Trees

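• Balanced binary search trees
• An AVL Tree is a binary search tree such that for every internal node v of T, the heights of the children of v can differ by at most 1.

[Figure: an example AVL tree on the keys 44, 17, 78, 32, 50, 88, 48, 62, with the height of each node shown next to it (the root has height 4).]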
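A small sketch of the height-balance property, assuming trees are written as nested tuples `(key, left, right)` with `None` for an empty child. The tree shape below is a reconstruction from the keys and heights on the slide, so treat it as an assumption. Heights count nodes, so a node with no children has height 1, matching the slide's annotations. The check covers only the balance condition, not the BST ordering.

```python
def height(t):
    """Height counted in nodes: an empty tree has height 0."""
    return 0 if t is None else 1 + max(height(t[1]), height(t[2]))


def is_height_balanced(t):
    """True if, at every node, the children's heights differ by at most 1."""
    if t is None:
        return True
    _, left, right = t
    return (abs(height(left) - height(right)) <= 1
            and is_height_balanced(left)
            and is_height_balanced(right))


def leaf(key):
    return (key, None, None)


# Assumed shape of the slide's example tree.
example = (44,
           (17, None, leaf(32)),
           (78, (50, leaf(48), leaf(62)), leaf(88)))

# A degenerate chain 1 -> 2 -> 3 violates the property at its root.
chain = (1, None, (2, None, leaf(3)))

print(height(example), is_height_balanced(example), is_height_balanced(chain))
```

Recomputing heights recursively is quadratic in the worst case; practical AVL implementations store the height in each node instead.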

Page 5: CSCE 3110 Data Structures &  Algorithm Analysis

Height of an AVL Tree

Proposition: The height of an AVL tree T storing n keys is $O(\log n)$.

Justification: The easiest way to approach this problem is to find $n(h)$: the minimum number of internal nodes of an AVL tree of height $h$.

• $n(1) = 1$ and $n(2) = 2$
• For $h \ge 3$, an AVL tree of height $h$ with the minimum number of nodes contains the root node, one AVL subtree of height $h-1$ and the other AVL subtree of height $h-2$, so $n(h) = 1 + n(h-1) + n(h-2)$
• Given $n(h-1) > n(h-2)$, we get $n(h) > 2n(h-2)$
• Repeating the argument: $n(h) > 2n(h-2)$, $n(h) > 4n(h-4)$, …, $n(h) > 2^i n(h-2i)$
• Pick $i = h/2 - 1$: $n(h) \ge 2^{h/2-1}$
• Taking logarithms, it follows that $h < 2\log n(h) + 2$, so the height of an AVL tree is $O(\log n)$
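The recurrence can be checked numerically. The sketch below (function name `min_nodes` is an assumption) evaluates $n(h) = 1 + n(h-1) + n(h-2)$ with $n(1) = 1$, $n(2) = 2$ and compares it against the two bounds used in the justification, taking logarithms base 2.

```python
import math


def min_nodes(h):
    """Minimum number of internal nodes n(h) of an AVL tree of height h >= 1."""
    if h < 1:
        raise ValueError("h must be at least 1")
    if h == 1:
        return 1
    prev, cur = 1, 2          # n(1), n(2)
    for _ in range(h - 2):    # advance from n(2) up to n(h)
        prev, cur = cur, 1 + prev + cur
    return cur


for h in range(1, 11):
    n = min_nodes(h)
    lower = 2 ** (h / 2 - 1)            # claimed lower bound on n(h)
    bound = 2 * math.log2(n) + 2        # claimed upper bound on h
    print(h, n, n >= lower, h < bound)
```

Because the minimum node count grows exponentially in h, any AVL tree with n nodes has height at most logarithmic in n.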

Page 6: CSCE 3110 Data Structures &  Algorithm Analysis

Insertion

• A binary search tree T is called balanced if, for every node v, the heights of v’s children differ by at most one.
• Inserting a node into an AVL tree involves performing an expandExternal(w) on T, which changes the heights of some of the nodes in T.
• If an insertion causes T to become unbalanced, we travel up the tree from the newly created node until we find the first node x such that its grandparent z is an unbalanced node.
• Since z became unbalanced by an insertion in the subtree rooted at its child y, height(y) = height(sibling(y)) + 2.
• Need to rebalance...

Page 7: CSCE 3110 Data Structures &  Algorithm Analysis

Insertion: Rebalancing

• To rebalance the subtree rooted at z, we must perform a restructuring.
• We rename x, y, and z to a, b, and c based on the order of the nodes in an in-order traversal.
• z is replaced by b, whose children are now a and c, whose children, in turn, consist of the four other subtrees formerly children of x, y, and z.

Page 8: CSCE 3110 Data Structures &  Algorithm Analysis

Insertion (cont’d)

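[Figure: the example AVL tree (keys 44, 17, 78, 32, 50, 88, 48, 62) after inserting 54, shown unbalanced (before restructuring) and balanced (after restructuring). Nodes x, y, z and subtrees T0 to T3 are labeled, node heights are shown next to each node, and the seven parts are numbered 1 to 7 in in-order.]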

Page 9: CSCE 3110 Data Structures &  Algorithm Analysis

Restructuring

The four ways to rotate nodes in an AVL tree, graphically represented.

Single rotations:

[Figure: the two single-rotation cases, each shown before and after the rotation with subtrees T0 to T3. In the first case a = z, b = y, c = x; in the second case a = x, b = y, c = z.]

Page 10: CSCE 3110 Data Structures &  Algorithm Analysis

Restructuring (cont’d)

Double rotations:

[Figure: the two double-rotation cases, each shown before and after the rotation with subtrees T0 to T3. In the first case a = z, b = x, c = y; in the second case a = y, b = x, c = z.]
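The four rotation cases can be written as two primitive single rotations plus two compositions of them. This is a sketch under assumed names (`Node`, `rotate_left`, `rotate_right`, `shape`); each function takes the unbalanced node z and returns the new root b of the subtree, which the caller must link back into z's former parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rotate_right(z):
    """Single rotation: y = z.left becomes the root (case a = x, b = y, c = z)."""
    y = z.left
    z.left, y.right = y.right, z
    return y


def rotate_left(z):
    """Single rotation: y = z.right becomes the root (case a = z, b = y, c = x)."""
    y = z.right
    z.right, y.left = y.left, z
    return y


def rotate_left_right(z):
    """Double rotation: x is the right child of z's left child y."""
    z.left = rotate_left(z.left)
    return rotate_right(z)


def rotate_right_left(z):
    """Double rotation: x is the left child of z's right child y."""
    z.right = rotate_right(z.right)
    return rotate_left(z)


def shape(n):
    """Nested-tuple view of a tree, handy for printing and comparing."""
    return None if n is None else (n.key, shape(n.left), shape(n.right))


# All four unbalanced three-node shapes end up as 2 with children 1 and 3.
print(shape(rotate_right(Node(3, Node(2, Node(1))))))
print(shape(rotate_left(Node(1, None, Node(2, None, Node(3))))))
print(shape(rotate_left_right(Node(3, Node(1, None, Node(2))))))
print(shape(rotate_right_left(Node(1, None, Node(3, Node(2))))))
```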

Page 11: CSCE 3110 Data Structures &  Algorithm Analysis

Restructure Algorithm

Algorithm restructure(x):

Input: A node x of a binary search tree T that has both a parent y and a grandparent z
Output: Tree T restructured by a rotation (either single or double) involving nodes x, y, and z.

1. Let (a, b, c) be an inorder listing of the nodes x, y, and z, and let (T0, T1, T2, T3) be an inorder listing of the four subtrees of x, y, and z not rooted at x, y, or z.

2. Replace the subtree rooted at z with a new subtree rooted at b.

3. Let a be the left child of b and let T0, T1 be the left and right subtrees of a, respectively.

4. Let c be the right child of b and let T2, T3 be the left and right subtrees of c, respectively.

Page 12: CSCE 3110 Data Structures &  Algorithm Analysis

Cut/Link Restructure Algorithm

• Let’s go into a little more detail on this algorithm...
• Any tree that needs to be balanced can be grouped into 7 parts: x, y, z, and the 4 trees anchored at the children of those nodes (T0-3).

[Figure: an unbalanced tree on the keys 44, 17, 62, 50, 48, 54, 78, 88, with nodes x and y and subtrees T0 to T3 labeled.]

Page 13: CSCE 3110 Data Structures &  Algorithm Analysis

Cut/Link Restructure Algorithm

[Figure: the same unbalanced tree as on the previous slide, with nodes x and y and subtrees T0 to T3 labeled.]

• Make a new tree which is balanced and put the 7 parts from the old tree into the new tree so that the numbering is still correct when we do an in-order traversal of the new tree.
• This works regardless of how the tree is originally unbalanced.
• Let’s see how it works!

Page 14: CSCE 3110 Data Structures &  Algorithm Analysis

Cut/Link Restructure Algorithm

Number the 7 parts by doing an in-order traversal. (Note that x, y, and z are now renamed based upon their order within the traversal.)

[Figure: the unbalanced tree with its 7 parts numbered 1 to 7 in in-order: T0, z (a), T1, y (b), T2, x (c), T3.]

Page 15: CSCE 3110 Data Structures &  Algorithm Analysis

Cut/Link Restructure Algorithm

Now create an array, numbered 1 to 7 (the 0th element can be ignored with minimal waste of space).

• Cut() the 4 T trees and place them in their inorder rank in the array:

Rank:  1    2    3    4    5    6    7
Part:  T0   -    T1   -    T2   -    T3

Page 16: CSCE 3110 Data Structures &  Algorithm Analysis

Cut/Link Restructure Algorithm

Now cut x, y, and z in that order (child, parent, grandparent) and place them in their inorder rank in the array.

Rank:  1    2         3    4         5    6         7
Part:  T0   a = 44    T1   b = 62    T2   c = 78    T3

• Now we can re-link these subtrees to the main tree.
• Link in rank 4 (b) where the subtree’s root formerly was.

Page 17: CSCE 3110 Data Structures &  Algorithm Analysis

Cut/Link Restructure Algorithm

Link in ranks 2 (a) and 6 (c) as 4’s children.

[Figure: node 62 (b, rank 4) with children 44 (a, rank 2) and 78 (c, rank 6).]

Page 18: CSCE 3110 Data Structures &  Algorithm Analysis

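Cut/Link Restructure Algorithm

Finally, link in ranks 1, 3, 5, and 7 as the children of 2 and 6.

[Figure: the restructured tree. 62 (y, rank 4) is the root of the subtree, with children 44 (z, rank 2) and 78 (x, rank 6). T0 (17, rank 1) and T1 (50 with children 48 and 54, rank 3) hang under 44; T2 (rank 5) and T3 (88, rank 7) hang under 78.]

• Now you have a balanced tree!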

Page 19: CSCE 3110 Data Structures &  Algorithm Analysis

Cut/Link Restructure Algorithm

• This algorithm for restructuring has the exact same effect as using the four rotation cases discussed earlier.
• Advantages: no case analysis, more elegant
• Disadvantage: can be more code to write
• Same time complexity
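A minimal sketch of the cut/link idea, assuming nodes with `left` and `right` fields and no parent pointers. The names `Node`, `restructure` and `shape` are hypothetical, and the example tree shape is reconstructed from the keys on the slides. Instead of cutting the subtrees first and the nodes second, this version fills the 7-slot array in a single in-order walk that stops at the boundary of {x, y, z}; the resulting array is the same.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def restructure(x, y, z):
    """Rebuild the subtree rooted at z and return its new root b."""
    trio = {id(x), id(y), id(z)}
    parts = [None]  # slot 0 is unused, as on the slide

    def cut(node):
        # In-order walk over x, y, z; anything else is one of T0..T3
        # (possibly an empty subtree) and is stored whole.
        if node is not None and id(node) in trio:
            cut(node.left)
            parts.append(node)
            cut(node.right)
        else:
            parts.append(node)

    cut(z)
    t0, a, t1, b, t2, c, t3 = parts[1:8]
    a.left, a.right = t0, t1   # ranks 1 and 3 under rank 2
    c.left, c.right = t2, t3   # ranks 5 and 7 under rank 6
    b.left, b.right = a, c     # ranks 2 and 6 under rank 4
    return b


def shape(n):
    return None if n is None else (n.key, shape(n.left), shape(n.right))


# Assumed shape of the slides' example: z = 44, y = 62, x = 78.
x = Node(78, None, Node(88))
y = Node(62, Node(50, Node(48), Node(54)), x)
z = Node(44, Node(17), y)
print(shape(restructure(x, y, z)))
```

The caller is still responsible for linking the returned node b into the place where z used to hang.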

Page 20: CSCE 3110 Data Structures &  Algorithm Analysis

Removal

• We can easily see that performing a removeAboveExternal(w) can cause T to become unbalanced.
• Let z be the first unbalanced node encountered while traveling up the tree from w. Also, let y be the child of z with the larger height, and let x be the child of y with the larger height.
• We can perform operation restructure(x) to restore balance at the subtree rooted at z.
• As this restructuring may upset the balance of another node higher in the tree, we must continue checking for balance until the root of T is reached.
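Choosing x and y after a removal can be sketched as below. The names are hypothetical, and the tie-breaking rule (when both children of y have the same height, take the child on the same side as y) is an assumption that the slide does not state; it leads to a single rather than a double rotation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(n):
    """Height counted in nodes; recomputed here for brevity."""
    return 0 if n is None else 1 + max(height(n.left), height(n.right))


def pick_trio(z):
    """Given the first unbalanced node z, return (x, y, z) for restructure(x)."""
    # z is unbalanced, so its children cannot have equal heights.
    y = z.left if height(z.left) > height(z.right) else z.right
    if height(y.left) > height(y.right):
        x = y.left
    elif height(y.left) < height(y.right):
        x = y.right
    else:
        # Tie: assumed rule, pick the child on the same side as y.
        x = y.left if y is z.left else y.right
    return x, y, z


# Assumed example: 44 is unbalanced because its left subtree is too short.
root = Node(44, Node(17), Node(62, Node(50, Node(48), Node(54)), Node(78, None, Node(88))))
x, y, z = pick_trio(root)
print(x.key, y.key, z.key)
```

A production implementation would store heights in the nodes and repeat this step at every unbalanced ancestor on the way up to the root.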

Page 21: CSCE 3110 Data Structures &  Algorithm Analysis

Removal (cont’d)

Example of deletion from an AVL tree:

[Figure: the example AVL tree before and after a removal, with the unbalanced node z, its taller child y, y's taller child x, and subtrees T0 to T3 labeled; node heights are shown next to each node.]

Page 22: CSCE 3110 Data Structures &  Algorithm Analysis

Removal (cont’d)

Example of deletion from an AVL tree:

[Figure: the tree from the previous slide shown before and after restructure(x), with nodes x, y, z and subtrees T0 to T3 labeled; node heights are shown next to each node.]

Page 23: CSCE 3110 Data Structures &  Algorithm Analysis

Multi-way Search Trees

Each internal node of a multi-way search tree T:

• has at least two children
• stores a collection of items of the form (k, x), where k is a key and x is an element
• contains d - 1 items, where d is the number of children (such a node is called a d-node)
• “contains” 2 pseudo-items: $k_0 = -\infty$, $k_d = +\infty$

Children of each internal node are “between” items: all keys in the subtree rooted at the child fall between keys of those items.

Page 24: CSCE 3110 Data Structures &  Algorithm Analysis

Multi-way Searching

Similar to binary searching:
• If search key $s < k_1$, search the leftmost child
• If $s > k_{d-1}$, search the rightmost child

That’s it in a binary tree; what about if d > 2?
• Find two keys $k_{i-1}$ and $k_i$ between which s falls, and search the child $v_i$.

What would an in-order traversal look like?

[Figure: an example multi-way search tree, showing a search for s = 8 (found) and a search for s = 12 (not found).]
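Both the search rule and the in-order traversal can be sketched in a few lines. The class name `MNode` and the small tree below are assumptions, loosely modeled on keys from the slide's figure rather than a copy of it. `bisect_left` finds the first key that is not smaller than s, which is exactly the index of the child to descend into.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class MNode:
    keys: list                                     # sorted keys k_1 .. k_(d-1)
    children: list = field(default_factory=list)   # d children, or [] at the bottom


def search(node, s):
    """Return the node holding s, or None if s is not in the tree."""
    while node is not None:
        i = bisect_left(node.keys, s)
        if i < len(node.keys) and node.keys[i] == s:
            return node
        # s falls between keys[i-1] and keys[i]: follow child i.
        node = node.children[i] if node.children else None
    return None


def inorder(node):
    """Visit child 0, key 1, child 1, key 2, ..., last child."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out.extend(inorder(child))
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


tree = MNode([22], [
    MNode([5, 10], [MNode([3, 4]), MNode([6, 8]), MNode([14])]),
    MNode([25], [MNode([23, 24]), MNode([27])]),
])

print(search(tree, 8) is not None, search(tree, 12) is not None)
print(inorder(tree))
```

As in the binary case, the in-order traversal lists all keys in increasing order.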

Page 25: CSCE 3110 Data Structures &  Algorithm Analysis

2-4 Trees

a. Nodes may contain 1, 2 or 3 items.
b. A node with k items has k + 1 children.
c. All leaves are on the same level.

Page 26: CSCE 3110 Data Structures &  Algorithm Analysis

Example

Root:      [10 45]
Children:  [3 8]  [25 38]  [70 90 100]

Page 27: CSCE 3110 Data Structures &  Algorithm Analysis

Insertion

Insertion:
• Find the appropriate leaf.
• If there is room in the leaf, just add the item to the leaf.
• If there is no room, move the middle item to the parent and split the remaining items among two children.

Page 28: CSCE 3110 Data Structures &  Algorithm Analysis

Insertion

Insert 80:

Root:      [10 45]
Children:  [3 8]  [25 38]  [70 80 90 100]

Overflow!

Page 29: CSCE 3110 Data Structures &  Algorithm Analysis

Insertion

Split & move middle element to parent:

Root:      [10 45 80]
Children:  [3 8]  [25 38]  [70]  [90 100]
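The overflow-and-split step can be sketched on a two-level tree, represented here simply as a list of parent keys plus a list of leaves (each leaf a sorted list of keys). The function name and this representation are assumptions. Following the slide's example, the second of the four keys is the one moved up; other presentations of 2-4 trees move the third key instead. An overflow of the parent itself is not handled.

```python
from bisect import bisect_left, insort


def insert_into_leaf(parent_keys, leaves, key):
    """Insert key into the right leaf; split the leaf if it overflows."""
    i = bisect_left(parent_keys, key)   # index of the leaf that should hold key
    leaf = leaves[i]
    insort(leaf, key)
    if len(leaf) > 3:                   # a 2-4 tree node holds at most 3 items
        middle = leaf[1]                # assumed choice, as in the slide's example
        left, right = leaf[:1], leaf[2:]
        parent_keys.insert(i, middle)   # middle item moves up to the parent
        leaves[i:i + 1] = [left, right] # remaining items form two children
    return parent_keys, leaves


parent = [10, 45]
leaves = [[3, 8], [25, 38], [70, 90, 100]]
print(insert_into_leaf(parent, leaves, 80))
```

In a full implementation the same split is applied to the parent when it ends up with four keys, possibly all the way up to the root.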

Page 30: CSCE 3110 Data Structures &  Algorithm Analysis

Removal

• First: find the key with a simple multi-way search.
• If the item to delete has non-external children, reduce to the case where the item is at the bottom of the tree:
  • Find the item which precedes it in the in-order traversal (which one?)
  • Swap them
• Remove the item.
• Alternative?

Page 31: CSCE 3110 Data Structures &  Algorithm Analysis

Removal

Not enough items in the node: Underflow!

• Pull an item from the parent, replace it with an item from a sibling: transfer.
• Still not good enough! What happens if siblings are 2-nodes?
• Could we just pull one item from the parent?

Page 32: CSCE 3110 Data Structures &  Algorithm Analysis

Removal

Remove 3:
• move 10 in the subtree
• move 25 in the parent

Root:      [10 45 80]
Children:  [3]  [25 38]  [70]  [90 100]

Page 33: CSCE 3110 Data Structures &  Algorithm Analysis

Removal

If siblings are 2-nodes (i.e. contain only one key):
• cannot ‘steal’ from them
• do node merging

Remove 3:
• move 10 in the subtree
• merge 10 with 25

Root:      [10 45 80]
Children:  [3]  [25]  [70]  [90]
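The two underflow repairs, transfer and merging, can be sketched on the same two-level representation (a list of parent keys and a list of leaves). The function name and the order in which siblings are tried are assumptions. The leaf at index i is assumed to be empty after the removal; an underflow of the parent is not handled here.

```python
def fix_underflow(parent_keys, leaves, i):
    """Repair the empty leaf leaves[i] by a transfer or, failing that, a merge."""
    if i + 1 < len(leaves) and len(leaves[i + 1]) > 1:
        # Transfer from the right sibling through the parent.
        leaves[i].append(parent_keys[i])
        parent_keys[i] = leaves[i + 1].pop(0)
    elif i > 0 and len(leaves[i - 1]) > 1:
        # Transfer from the left sibling through the parent.
        leaves[i].insert(0, parent_keys[i - 1])
        parent_keys[i - 1] = leaves[i - 1].pop()
    elif i + 1 < len(leaves):
        # Merge: pull the separating key down into the right sibling.
        leaves[i + 1].insert(0, parent_keys.pop(i))
        del leaves[i]
    else:
        # Merge with the left sibling (the empty leaf is the last child).
        leaves[i - 1].append(parent_keys.pop(i - 1))
        del leaves[i]
    return parent_keys, leaves


# Remove 3, sibling [25, 38] can spare a key: transfer.
print(fix_underflow([10, 45, 80], [[], [25, 38], [70], [90, 100]], 0))
# Remove 3, sibling [25] is a 2-node: merge 10 with 25.
print(fix_underflow([10, 45, 80], [[], [25], [70], [90]], 0))
```

After a merge the parent has one key fewer; if that leaves the parent empty, the underflow propagates up the tree.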

Page 34: CSCE 3110 Data Structures &  Algorithm Analysis

2-4 Trees

More on removal:
• What if the parent is a 2-node?
• Propagate underflow up the tree

2-4 trees are easy to maintain:
• Insertion and deletion take O(log n)
• Balanced trees

Page 35: CSCE 3110 Data Structures &  Algorithm Analysis

B-Trees

• Up to now, all data that has been stored in the tree has been in memory.
• If data gets too big for main memory, what do we do?
• If we keep a pointer to the tree in main memory, we could bring in just the nodes that we need.
• For instance, to do an insert with a BST, if we need the left child, we do a disk access and retrieve the left child.
• If the left child is NIL, then we can do the insert, and store the child node on the disk.
• Not too good for a BST

Page 36: CSCE 3110 Data Structures &  Algorithm Analysis

B-Trees

• The problem with BST: storing the data requires disk accesses, which are expensive compared to the execution of machine instructions.
• If we can reduce the number of disk accesses, then the procedures run faster.
• The only way to reduce the number of disk accesses is to increase the number of keys in a node.
• The BST allows only one key per node.

Very good and often used for Search Engines! (When the collection size gets very big, the index does not fit in memory.)

Page 37: CSCE 3110 Data Structures &  Algorithm Analysis

B-Trees

If we increase the number of keys in the nodes, how will we do any tree operations effectively?

10 20 30 40 50 60 70

• Above is a node with 7 keys. How do we add children?

10 20 30 40 50 60 70

1 2 3 4 5 6 7 8 9 11 22 33 44 55 66 77

Page 38: CSCE 3110 Data Structures &  Algorithm Analysis

B-Trees

Clearly, the tree below is useless.

10 20 30 40 50 60 70

1 2 3 4 5 6 7 8 9 11 22 33 44 55 66 77

• How many pointers do we need?
• Using the idea of BST, we need to be able to put nodes into the tree that have smaller, same and larger values than the node we are currently examining.

Page 39: CSCE 3110 Data Structures &  Algorithm Analysis

B-Trees: A General Case of Multi-Way Search Trees

[Figure: a root node with keys 10 20 30 40 50 60 70 whose child pointers lead to nodes holding the smaller keys in between (single-digit keys, then 11, 22, 33, 44, 55, 66, 77), each placed between the appropriate root keys.]

• We can easily find any value.
• We need to create operations, which require rules on what makes a tree a B-Tree.
• Clearly, having one key per node would be very bad.
• We need a mechanism to increase the height of the tree (since the number of keys in any node can get very high) so we can shift keys out of a node, making the nodes smaller.

Page 40: CSCE 3110 Data Structures &  Algorithm Analysis

B-Trees: Fields in a Node

A B-Tree is a rooted tree (whose root is root[T]) having the following properties:

1. Every internal node x has the following fields:

[Figure: layout of a node x with fields n[x], key1[x] to key8[x], leaf[x], and child pointers c1[x] to c9[x].]

• n[x] is the number of keys in the node. n[x] = 8 above.
• leaf[x] = false for internal nodes, since x is not a leaf.
• The $key_i[x]$ are the values of the keys, where $key_i[x] \le key_{i+1}[x]$.
• $c_i[x]$ are pointers to child nodes. All the keys in $c_i[x]$ have values that are between $key_{i-1}[x]$ and $key_i[x]$.
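The node fields map naturally onto a small record type. The sketch below uses assumed names (`BTreeNode`, `fields_consistent`); `n` is derived from the key list rather than stored separately, and indices are 0-based where the slide's are 1-based. The check looks only at the keys stored directly in each child, not at whole subtrees.

```python
from dataclasses import dataclass, field


@dataclass
class BTreeNode:
    leaf: bool = True
    keys: list = field(default_factory=list)       # key_1[x] .. key_n[x]
    children: list = field(default_factory=list)   # c_1[x] .. c_(n+1)[x]

    @property
    def n(self):
        """n[x]: the number of keys currently stored in the node."""
        return len(self.keys)


def fields_consistent(x):
    """Check the field properties listed on the slide for one node."""
    if x.keys != sorted(x.keys):
        return False
    if x.leaf:
        return not x.children            # leaves have no child pointers
    if len(x.children) != x.n + 1:
        return False
    # Keys of child i must lie between the neighboring keys of x.
    bounds = [float("-inf")] + x.keys + [float("inf")]
    return all(bounds[i] <= k <= bounds[i + 1]
               for i, child in enumerate(x.children)
               for k in child.keys)


node = BTreeNode(leaf=False, keys=[10, 20],
                 children=[BTreeNode(keys=[1, 5]),
                           BTreeNode(keys=[11, 15]),
                           BTreeNode(keys=[22, 33])])
print(node.n, fields_consistent(node))
```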

Page 41: CSCE 3110 Data Structures &  Algorithm Analysis

B-Trees

• Leaf nodes have no child pointers
• leaf[x] = true for leaf nodes.
• All leaf nodes are at the same level

[Figure: layout of a leaf node x with fields n[x], key1[x] to key8[x], and leaf[x], and no child pointers.]

Page 42: CSCE 3110 Data Structures &  Algorithm Analysis

B-Trees

There are lower and upper bounds on the number of keys a node can contain. These depend on the “minimum degree” $t \ge 2$, which we must specify for any given B-Tree.

a. Every node other than the root must have at least t-1 keys. Every internal node other than the root thus has at least t children. If the tree is nonempty, the root must have at least one key.

b. Every node can contain at most 2t-1 keys. Therefore, an internal node can have at most 2t children. A node is full if it contains exactly 2t-1 keys.
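The bounds can be written down directly. In this sketch the function names are assumptions; note that with t = 2 every non-root node holds 1 to 3 keys and every internal non-root node has 2 to 4 children, which is exactly the 2-4 tree seen earlier.

```python
def key_bounds(t, is_root=False):
    """(min, max) number of keys in a node of a B-tree with minimum degree t."""
    if t < 2:
        raise ValueError("minimum degree t must be at least 2")
    return (1 if is_root else t - 1, 2 * t - 1)


def child_bounds(t, is_root=False):
    """(min, max) number of children of an internal node."""
    lo, hi = key_bounds(t, is_root)
    return lo + 1, hi + 1


def is_full(n_keys, t):
    """A node is full if it contains exactly 2t - 1 keys."""
    return n_keys == 2 * t - 1


for t in (2, 3, 50):
    print(t, key_bounds(t), child_bounds(t))
```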

Page 43: CSCE 3110 Data Structures &  Algorithm Analysis

Height of B-Tree

If $n \ge 1$, then for any n-key B-tree of height h and minimum degree $t \ge 2$,

$h \le \log_t \frac{n+1}{2}$

The important thing to notice is that the height of the tree is log base t. So, as t increases, the height, for any number of keys n, will decrease.

Using the formula $\log_a x = (\log_b x)/(\log_b a)$, we can see that

$\log_2 10^6 = (\log_{10} 10^6)/(\log_{10} 2) \approx 6/0.30103 \approx 19.9$
$\log_{10} 10^6 = 6$

So, about 13 to 14 fewer disk accesses to get to the leaves!
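The arithmetic is easy to reproduce. The sketch below (function name `btree_height_bound` is an assumption) evaluates the bound $\log_t((n+1)/2)$ for a million keys and several minimum degrees, and repeats the change-of-base computation from the slide.

```python
import math


def btree_height_bound(n, t):
    """Upper bound log_t((n + 1) / 2) on the height of an n-key B-tree."""
    if n < 1 or t < 2:
        raise ValueError("need n >= 1 and t >= 2")
    return math.log((n + 1) / 2, t)


n = 10 ** 6
for t in (2, 10, 100, 1000):
    print(t, round(btree_height_bound(n, t), 2))

# Change of base: log_2(x) = log_10(x) / log_10(2)
print(math.log10(n) / math.log10(2), math.log2(n), math.log10(n))
```

The larger the minimum degree, the flatter the tree, which is the reason B-tree nodes are usually sized to fill a whole disk page.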

Page 44: CSCE 3110 Data Structures &  Algorithm Analysis

Basic Operations

The root of the B-tree is always in main memory, so that a Disk-Read on the root is never required; a Disk-Write of the root is required, however, whenever the root node is changed.

Any nodes that are passed as parameters must already have had a Disk-Read operation performed on them.

Page 45: CSCE 3110 Data Structures &  Algorithm Analysis

Searching a B-Tree

• Start at the leftmost key in the node, and go to the right until you go too far.
• If it is a leaf node, then you are done, as there is no child to inspect.
• Otherwise, retrieve the child node from the disk, and put it into memory.
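The three steps can be sketched as follows, with the disk simulated by a dictionary from node ids to nodes. All names here (`BTreeNode`, `btree_search`, the `disk` dictionary and its ids) are assumptions for illustration. The function also counts the simulated Disk-Read operations; the root is assumed to be in memory already.

```python
from dataclasses import dataclass, field


@dataclass
class BTreeNode:
    keys: list
    leaf: bool = True
    children: list = field(default_factory=list)   # ids of child nodes on "disk"


def btree_search(root, k, disk):
    """Return ((node, index), reads) if k is found, else (None, reads)."""
    x, reads = root, 0
    while True:
        # Start at the leftmost key and go right until we go too far.
        i = 0
        while i < len(x.keys) and k > x.keys[i]:
            i += 1
        if i < len(x.keys) and k == x.keys[i]:
            return (x, i), reads
        if x.leaf:
            return None, reads       # no child left to inspect
        x = disk[x.children[i]]      # simulated Disk-Read of the child
        reads += 1


disk = {
    "a": BTreeNode([1, 2, 3]),
    "b": BTreeNode([11]),
    "c": BTreeNode([22]),
    "d": BTreeNode([33, 44]),
}
root = BTreeNode([10, 20, 30], leaf=False, children=["a", "b", "c", "d"])

for key in (20, 22, 5):
    hit, reads = btree_search(root, key, disk)
    print(key, hit is not None, reads)
```

The number of reads is at most the height of the tree, so a wide node layout directly reduces disk traffic.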

Page 46: CSCE 3110 Data Structures &  Algorithm Analysis

Inserting into B-trees

• Really very easy. Very similar to (2,4) trees. Just keep in mind that you are starting at the root, and then finding the subtree where the key should be inserted, and following the pointer.
• A deletion may eventually occur, and sometimes deletions force keys into their parents. So, if we encounter a full node on our way to the node where the insertion will take place, we must split that node into two.

Page 47: CSCE 3110 Data Structures &  Algorithm Analysis

Inserting into B-trees (cont’d)

B-Tree-Insert:
• If the node has 2t-1 keys, it can’t accept any more keys, so you need to split it into 2 nodes before doing the insert.
• Otherwise, call Nonfull().
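A compact in-memory sketch of this insertion scheme, in the style of the textbook single-pass algorithm: every full node met on the way down is split before descending into it. The names (`BNode`, `split_child`, `insert_nonfull`, `btree_insert`) are assumptions, and disk reads and writes are left out.

```python
from bisect import bisect_right, insort
from dataclasses import dataclass, field


@dataclass
class BNode:
    leaf: bool = True
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)


def split_child(x, i, t):
    """Split the full child x.children[i] (2t-1 keys) around its median key."""
    y = x.children[i]
    z = BNode(leaf=y.leaf, keys=y.keys[t:], children=y.children[t:])
    median = y.keys[t - 1]
    y.keys, y.children = y.keys[:t - 1], y.children[:t]
    x.keys.insert(i, median)        # median moves up into the parent
    x.children.insert(i + 1, z)     # new right half sits next to y


def insert_nonfull(x, k, t):
    """Insert k below x, which is known not to be full."""
    while not x.leaf:
        i = bisect_right(x.keys, k)
        if len(x.children[i].keys) == 2 * t - 1:
            split_child(x, i, t)
            if k > x.keys[i]:       # k belongs to the new right half
                i += 1
        x = x.children[i]
    insort(x.keys, k)


def btree_insert(root, k, t):
    """Insert k and return the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:
        # A full root is split first; this is how the tree grows in height.
        new_root = BNode(leaf=False, children=[root])
        split_child(new_root, 0, t)
        root = new_root
    insert_nonfull(root, k, t)
    return root


def inorder(x):
    if x.leaf:
        return list(x.keys)
    out = []
    for i, child in enumerate(x.children):
        out.extend(inorder(child))
        if i < len(x.keys):
            out.append(x.keys[i])
    return out


root = BNode()
for key in [10, 45, 3, 8, 25, 38, 70, 90, 100, 80]:
    root = btree_insert(root, key, t=2)
print(inorder(root))
```

Because splits happen on the way down, the insertion never has to walk back up the tree.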

Page 48: CSCE 3110 Data Structures &  Algorithm Analysis

Deleting Keys from Nodes

Page 49: CSCE 3110 Data Structures &  Algorithm Analysis

Deleting Keys from Nodes