Binary Search Trees - University of Rochesterrsarkis/csc162/_static/lectures/Binary Search...

The Art of Data Structures Binary Search Trees Richard E Sarkis CSC 162: The Art of Data Structures


Class Administrivia


Agenda

• Binary Search Trees

• Search Tree Operations

• Search Tree Implementation

• Search Tree Analysis


Binary Search Trees


Binary Search Tree Maps

• So far, we have seen two different ways to get key-value pairs in a collection

• Recall that these collections implement the map abstract data type

• The two implementations of a map ADT we discussed were binary search on a list and hash tables


Binary Search Tree Maps

• Binary search trees are yet another way to map from a key to a value

• We are not interested in the exact placement of items in the tree, but we are interested in using the binary tree structure to provide efficient searching


Binary Search Tree Map Specification

• Map() creates a new, empty binary tree

• put(key,val) Add a new key-value pair to the tree

• get(key) Given a key, return the value stored in the tree or None otherwise

• delete_key(key) Delete the key-value pair from the tree


Binary Search Tree Map Specification (cont.)

• length() Return the number of key-value pairs stored in the tree

• has_key(key) Return True if the given key is in the dictionary, False otherwise.

• operators We can use the above methods to overload the [] operators for both assignment and lookup; in addition, we can use has_key to override the in operator


Binary Search Tree Definition

• A binary search tree relies on the property that:

• keys that are less than the parent are found in the left subtree,

• keys that are greater than the parent are found in the right subtree

• This is the BST property
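As a rough illustration of the property, the following sketch checks it on a tree written as nested tuples. The tuple layout `(key, left, right)` and the function name `is_bst` are assumptions made for this example only; they are not part of the classes used later in the lecture.

```python
def is_bst(node, low=None, high=None):
    """Return True if a nested-tuple tree satisfies the BST property.

    A node is (key, left, right); an empty subtree is None.
    low and high bound the keys that are allowed in this subtree.
    """
    if node is None:
        return True
    key, left, right = node
    if low is not None and key <= low:
        return False
    if high is not None and key >= high:
        return False
    # Keys on the left must stay below key; keys on the right must stay above it.
    return is_bst(left, low, key) and is_bst(right, key, high)


good = (70, (31, (14, None, None), None), (93, None, None))
bad = (70, (31, None, (80, None, None)), (93, None, None))  # 80 sits left of 70
print(is_bst(good), is_bst(bad))  # True False
```

Note that the check passes bounds down the tree: comparing each node only with its immediate parent would miss the misplaced 80 in the second example.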


Binary Search Tree Example

• The binary search tree on the next slide shows the nodes that exist after we have inserted the following keys, in this order:

• 70, 31, 93, 94, 14, 23, 73
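To see where each of these keys ends up, here is a minimal sketch that inserts them in the listed order. The dict-based nodes and the helper names `insert` and `show` are hypothetical and chosen only to keep the example short; the lecture's own `TreeNode` class comes later.

```python
def insert(node, key):
    """Insert key into a dict-based BST and return the subtree root."""
    if node is None:
        return {"key": key, "left": None, "right": None}
    side = "left" if key < node["key"] else "right"
    node[side] = insert(node[side], key)
    return node


def show(node, depth=0, label="root"):
    """Return an indented outline of the tree, one node per line."""
    if node is None:
        return []
    lines = ["  " * depth + f"{label}: {node['key']}"]
    lines += show(node["left"], depth + 1, "L")
    lines += show(node["right"], depth + 1, "R")
    return lines


root = None
for k in [70, 31, 93, 94, 14, 23, 73]:
    root = insert(root, k)
outline = show(root)
print("\n".join(outline))
```

The outline lists 70 as the root, 31 and 93 as its children, 14 under 31, 23 to the right of 14, and 73 and 94 under 93.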


Binary Search Tree A Simple Binary Tree

[Figure: A Simple Binary Search Tree. 70 is the root with children 31 and 93; 14 is the left child of 31; 23 is the right child of 14; 73 and 94 are the children of 93.]


Binary Search Tree Required Classes

• We'll need to work with an empty binary search tree, so two classes are needed:

• BinarySearchTree

• TreeNode


Binary Search Tree BinarySearchTree Implementation

class BinarySearchTree:

    def __init__(self):
        self.root = None
        self.size = 0

    def length(self):
        return self.size

    def __len__(self):
        return self.size

    def __iter__(self):
        # An empty tree has no root node to iterate over.
        if self.root is None:
            return iter(())
        return self.root.__iter__()


Binary Search Tree TreeNode Implementation

class TreeNode:
    def __init__(self, key, val, left=None, right=None, parent=None):
        self.key = key
        self.payload = val
        self.left_child = left
        self.right_child = right
        self.parent = parent

    def has_left_child(self):
        return self.left_child

    def has_right_child(self):
        return self.right_child

    def is_left_child(self):
        return self.parent and self.parent.left_child == self


Binary Search Tree TreeNode Implementation (cont.)

    def is_right_child(self):
        return self.parent and self.parent.right_child == self

    def is_root(self):
        return not self.parent

    def is_leaf(self):
        return not (self.right_child or self.left_child)

    def has_any_children(self):
        return self.right_child or self.left_child

    def has_both_children(self):
        return self.right_child and self.left_child


Binary Search Tree TreeNode Implementation (cont.)

    def replace_node_data(self, key, value, lc, rc):
        self.key = key
        self.payload = value
        self.left_child = lc
        self.right_child = rc
        if self.has_left_child():
            self.left_child.parent = self
        if self.has_right_child():
            self.right_child.parent = self


Binary Search Tree TreeNode Properties

• Every TreeNode instance keeps track of its parent node

• We leverage keyword arguments to provide optional, customizing parameters


Binary Search Tree BinarySearchTree.put

    def put(self, key, val):
        if self.root:
            self._put(key, val, self.root)
        else:
            self.root = TreeNode(key, val)
        self.size = self.size + 1

    def _put(self, key, val, current_node):
        if key < current_node.key:
            if current_node.has_left_child():
                self._put(key, val, current_node.left_child)
            else:
                current_node.left_child = TreeNode(key, val, parent=current_node)
        else:
            # Keys equal to current_node.key also land here, so an existing
            # key is stored again in the right subtree rather than replaced.
            if current_node.has_right_child():
                self._put(key, val, current_node.right_child)
            else:
                current_node.right_child = TreeNode(key, val, parent=current_node)
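The `_put` shown on this slide sends a key that is already in the tree down the `else` branch, so the pair is stored a second time. One possible adjustment, sketched below under stated assumptions, overwrites the payload instead. The classes `Node` and `Tree` are hypothetical stand-ins written for this example (iterative, and without parent links), not the lecture's `TreeNode` and `BinarySearchTree`.

```python
class Node:
    """Hypothetical minimal node: key, payload, and two child links."""

    def __init__(self, key, val):
        self.key = key
        self.payload = val
        self.left_child = None
        self.right_child = None


class Tree:
    """Hypothetical minimal tree whose put replaces existing keys."""

    def __init__(self):
        self.root = None
        self.size = 0

    def put(self, key, val):
        if self.root is None:
            self.root = Node(key, val)
            self.size += 1
            return
        current = self.root
        while True:
            if key == current.key:
                current.payload = val  # existing key: overwrite, size unchanged
                return
            side = "left_child" if key < current.key else "right_child"
            child = getattr(current, side)
            if child is None:
                setattr(current, side, Node(key, val))
                self.size += 1
                return
            current = child


t = Tree()
t.put(17, "a")
t.put(5, "b")
t.put(17, "c")  # same key again
print(t.size, t.root.payload)  # 2 c
```

The size counter moves into the branches that actually create a node, which is why it can no longer sit at the end of `put` as it does in the original.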


Binary Search Tree BinarySearchTree.__setitem__

    def __setitem__(self, k, v):
        self.put(k, v)


Binary Search Tree Inserting a Node with Key = 19

[Figure 6.21: Inserting a Node with Key = 19]

The get method is even easier than the put method because it simply searches the tree recursively until it gets to a non-matching leaf node or finds a matching key. When a matching key is found, the value stored in the payload of the node is returned.

Listing 6.27 shows the code for get, _get and __getitem__. The search code in the _get method uses the same logic for choosing the left or right child as the _put method. Notice that the _get method returns a TreeNode to get; this allows _get to be used as a flexible helper method for other BinarySearchTree methods that may need to make use of other data from the TreeNode besides the payload.

By implementing the __getitem__ method we can write a Python statement that looks just like we are accessing a dictionary, when in fact we are using a binary search tree, for example z = myZipTree['Fargo']. As you can see from Listing 6.27, all the __getitem__ method does is call get.

Using get, we can implement the in operation by writing a __contains__ method for the BinarySearchTree. The __contains__ method will simply ...
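The dictionary-style access described in this passage can be tried out with a small self-contained sketch. `MiniTree` is a hypothetical stand-in for `BinarySearchTree`: it stores nodes as plain lists and searches iteratively rather than recursively, and the stored value is a placeholder string, not real data.

```python
class MiniTree:
    """Hypothetical stand-in; each node is a list [key, value, left, right]."""

    def __init__(self):
        self.root = None

    def __setitem__(self, key, val):
        if self.root is None:
            self.root = [key, val, None, None]
            return
        node = self.root
        while True:
            if key == node[0]:
                node[1] = val
                return
            i = 2 if key < node[0] else 3  # index of the left or right link
            if node[i] is None:
                node[i] = [key, val, None, None]
                return
            node = node[i]

    def _get(self, key):
        """Return the matching node (not just its value), or None."""
        node = self.root
        while node is not None and node[0] != key:
            node = node[2] if key < node[0] else node[3]
        return node

    def __getitem__(self, key):
        node = self._get(key)
        return node[1] if node is not None else None

    def __contains__(self, key):
        return self._get(key) is not None


zips = MiniTree()
zips["Fargo"] = "placeholder value"
print(zips["Fargo"], "Fargo" in zips, "Oslo" in zips)  # placeholder value True False
```

Because `_get` hands back the node rather than the payload, `__contains__` can distinguish a missing key from a key whose stored value happens to be `None`.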


Binary Search Tree BinarySearchTree.get

    def get(self, key):
        if self.root:
            res = self._get(key, self.root)
            if res:
                return res.payload
            else:
                return None
        else:
            return None

    def _get(self, key, current_node):
        if not current_node:
            return None
        elif current_node.key == key:
            return current_node
        elif key < current_node.key:
            return self._get(key, current_node.left_child)
        else:
            return self._get(key, current_node.right_child)


Binary Search Tree BinarySearchTree.__getitem__

    def __getitem__(self, key):
        return self.get(key)


Binary Search Tree BinarySearchTree.__contains__

    def __contains__(self, key):
        if self._get(key, self.root):
            return True
        else:
            return False


Binary Search Tree BinarySearchTree.delete

    def delete(self, key):
        if self.size > 1:
            node_to_remove = self._get(key, self.root)
            if node_to_remove:
                self.remove(node_to_remove)
                self.size = self.size - 1
            else:
                raise KeyError('Error, key not in tree')
        elif self.size == 1 and self.root.key == key:
            self.root = None
            self.size = self.size - 1
        else:
            raise KeyError('Error, key not in tree')


Binary Search Tree BinarySearchTree.__delitem__

    def __delitem__(self, key):
        self.delete(key)


Binary Search Tree Deleting Node 16, Childless

if currentNode.isLeaf():
    if currentNode == currentNode.parent.leftChild:
        currentNode.parent.leftChild = None
    else:
        currentNode.parent.rightChild = None

Listing 6.30: Case 1: Deleting a Node with No Children

[Figure 6.22: Deleting Node 16, a Node without Children]

If a node has only a single child, we can simply promote the child to take the place of its parent. The code for this case is shown in Listing 6.31. As you look at this code you will see that there are six cases to consider. Since the cases are symmetric with respect to either having a left or right child, we will just discuss the case where the current node has a left child. The decision proceeds as follows:

1. If the current node is a left child then we only need to update the parent reference of the left child to point to the parent of the current node, and then update the left child reference of the parent to point to the current node’s left child.

2. If the current node is a right child then we only need to update the parent reference of the left child to point to the parent of the current node, and then update the right child reference of the parent to point to the current node’s left child.
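The two steps above can be sketched in isolation. The `Node` class and the function `promote_left_child` are hypothetical names for this example, the small tree is made up for the demonstration, and the sketch assumes the node being removed has only a left child and is not the root.

```python
class Node:
    """Hypothetical minimal node with a parent link."""

    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.left_child = None
        self.right_child = None


def promote_left_child(node):
    """Unlink node, which has only a left child and is not the root."""
    child = node.left_child
    parent = node.parent
    child.parent = parent              # the child now points up past node
    if parent.left_child is node:      # the parent now points down past node
        parent.left_child = child
    else:
        parent.right_child = child


# Chain 17 -> 35 -> 29 -> 25, where 29 has a single (left) child.
root = Node(17)
n35 = Node(35, root)
root.right_child = n35
n29 = Node(29, n35)
n35.left_child = n29
n25 = Node(25, n29)
n29.left_child = n25

promote_left_child(n29)
print(n35.left_child.key, n25.parent.key)  # 25 35
```

Both references have to change: updating only the parent's child link would leave the promoted node with a stale `parent` pointer.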


Binary Search Tree Deleting Node 25, Single Child

[Figure 6.23: Deleting Node 25, a Node That Has a Single Child]


Binary Search Tree Deleting Node 5, Two Children

[Figure 6.24: Deleting Node 5, a Node with Two Children. The figure labels the successor node.]
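The idea behind the two-children case in Figure 6.24 is to replace the deleted key with its successor, the smallest key in the right subtree. The sketch below is a simplified recursive variant under stated assumptions: it uses dict-based nodes without parent links or payloads, the helper names `make`, `find_min`, `delete`, and `inorder` are hypothetical, and the sample tree is made up for the demonstration rather than copied from the figure.

```python
def make(key, left=None, right=None):
    """Build a dict-based node."""
    return {"key": key, "left": left, "right": right}


def find_min(node):
    """Return the leftmost node of a subtree."""
    while node["left"] is not None:
        node = node["left"]
    return node


def delete(node, key):
    """Remove key and return the new subtree root."""
    if node is None:
        raise KeyError(key)
    if key < node["key"]:
        node["left"] = delete(node["left"], key)
    elif key > node["key"]:
        node["right"] = delete(node["right"], key)
    elif node["left"] is None:
        return node["right"]                 # no child, or right child only
    elif node["right"] is None:
        return node["left"]                  # left child only
    else:
        succ = find_min(node["right"])       # next-largest key
        node["key"] = succ["key"]            # copy it into this node
        node["right"] = delete(node["right"], succ["key"])  # remove the original
    return node


def inorder(node):
    if node is None:
        return []
    return inorder(node["left"]) + [node["key"]] + inorder(node["right"])


# Node 5 has two children (2 and 11); its successor is 7.
tree = make(17, make(5, make(2), make(11, make(9, make(7)))), make(35))
tree = delete(tree, 5)
print(tree["left"]["key"], inorder(tree))  # 7 [2, 7, 9, 11, 17, 35]
```

The successor never has a left child, so removing it always falls into one of the two simpler cases.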


Binary Search Tree TreeNode.splice_out

    def splice_out(self):
        if self.is_leaf():
            if self.is_left_child():
                self.parent.left_child = None
            else:
                self.parent.right_child = None
        elif self.has_any_children():
            if self.has_left_child():
                if self.is_left_child():
                    self.parent.left_child = self.left_child
                else:
                    self.parent.right_child = self.left_child
                self.left_child.parent = self.parent
            else:
                if self.is_left_child():
                    self.parent.left_child = self.right_child
                else:
                    self.parent.right_child = self.right_child
                self.right_child.parent = self.parent


Binary Search Tree TreeNode.find_successor

    def find_successor(self):
        succ = None
        if self.has_right_child():
            succ = self.right_child.find_min()
        else:
            if self.parent:
                if self.is_left_child():
                    succ = self.parent
                else:
                    self.parent.right_child = None
                    succ = self.parent.find_successor()
                    self.parent.right_child = self
        return succ


Binary Search Tree TreeNode.find_min

    def find_min(self):
        current = self
        while current.has_left_child():
            current = current.left_child
        return current


Binary Search Tree BinarySearchTree.remove

    def remove(self, current_node):
        if current_node.is_leaf():  # leaf
            if current_node == current_node.parent.left_child:
                current_node.parent.left_child = None
            else:
                current_node.parent.right_child = None
        elif current_node.has_both_children():  # interior
            succ = current_node.find_successor()
            succ.splice_out()
            current_node.key = succ.key
            current_node.payload = succ.payload

        else:  # this node has one child
            if current_node.has_left_child():
                if current_node.is_left_child():
                    current_node.left_child.parent = current_node.parent
                    current_node.parent.left_child = current_node.left_child


Binary Search Tree BinarySearchTree.remove (cont.)

                elif current_node.is_right_child():
                    current_node.left_child.parent = current_node.parent
                    current_node.parent.right_child = current_node.left_child
                else:
                    current_node.replace_node_data(
                        current_node.left_child.key,
                        current_node.left_child.payload,
                        current_node.left_child.left_child,
                        current_node.left_child.right_child)
            else:
                if current_node.is_left_child():
                    current_node.right_child.parent = current_node.parent
                    current_node.parent.left_child = current_node.right_child
                elif current_node.is_right_child():
                    current_node.right_child.parent = current_node.parent
                    current_node.parent.right_child = current_node.right_child
                else:
                    current_node.replace_node_data(
                        current_node.right_child.key,
                        current_node.right_child.payload,
                        current_node.right_child.left_child,
                        current_node.right_child.right_child)


Binary Search Tree BinarySearchTree.__iter__

• Using Python's iterator pattern: generator functions

• yield keyword
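As a brief illustration of a generator function, the sketch below walks a tree in order and yields one key at a time. The nested-tuple layout `(key, left, right)` and the function name `inorder` are assumptions for this example; `yield from` is a shorthand for looping over a sub-generator and yielding each of its items.

```python
def inorder(node):
    """Yield the keys of a nested-tuple tree (key, left, right) in sorted order."""
    if node is not None:
        key, left, right = node
        yield from inorder(left)   # everything smaller than key
        yield key
        yield from inorder(right)  # everything larger than key


tree = (70,
        (31, (14, None, (23, None, None)), None),
        (93, (73, None, None), (94, None, None)))
keys = list(inorder(tree))
print(keys)  # [14, 23, 31, 70, 73, 93, 94]
```

Nothing runs until the caller asks for the next key, so a `for` loop over the generator can stop early without visiting the rest of the tree.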


Binary Search Tree TreeNode.__iter__

    def __iter__(self):
        if self:
            if self.has_left_child():
                for elem in self.left_child:
                    yield elem
            yield self.key
            if self.has_right_child():
                for elem in self.right_child:
                    yield elem


Binary Search Trees Analysis


Binary Search Trees Analysis

• Look at put()

• Tree height limits performance

• If keys are added in random order, the height of the tree is around log₂ n

• The number of nodes at depth d is at most 2^d

• The total number of nodes in a perfectly balanced binary tree is 2^(h+1) - 1, where h represents the height of the tree
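The node-count formula can be checked by adding up the levels of a perfect tree. The function name `nodes_in_perfect_tree` is made up for this sketch.

```python
def nodes_in_perfect_tree(h):
    """Count nodes level by level: depth d holds 2**d nodes, for d = 0..h."""
    return sum(2 ** d for d in range(h + 1))


for h in range(4):
    # Each row shows h, the level-by-level count, and the closed form.
    print(h, nodes_in_perfect_tree(h), 2 ** (h + 1) - 1)
```

For h = 0, 1, 2, 3 both columns give 1, 3, 7, 15. Read the other way around, a tree holding n nodes needs a height of at least about log₂ n, which is where the logarithm in the analysis comes from.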



Binary Search Trees Analysis

• A perfectly balanced tree has the same number of nodes in the left subtree as in the right subtree

• In a balanced binary tree, the worst-case performance of put is O(log₂ n)

• log₂ n is the height of the tree, and represents the maximum number of comparisons that put will need to do as it searches for the proper place to insert a new node


Binary Search Trees Analysis

• It is possible to construct a search tree that has height n simply by inserting the keys in sorted order

• In this case the performance of the put method is O(n)
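A small experiment makes the difference visible. This is a sketch under stated assumptions: the dict-based nodes and the helpers `insert`, `height`, and `build` are hypothetical, the value n = 500 is arbitrary, and height is counted here in nodes along the longest path, so the sorted case comes out as exactly n.

```python
import random


def insert(root, key):
    """Iteratively insert key into a dict-based BST and return the root."""
    new = {"key": key, "left": None, "right": None}
    if root is None:
        return new
    cur = root
    while True:
        side = "left" if key < cur["key"] else "right"
        if cur[side] is None:
            cur[side] = new
            return root
        cur = cur[side]


def height(root):
    """Number of nodes on the longest root-to-leaf path (iterative)."""
    best, stack = 0, [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        best = max(best, depth)
        stack.append((node["left"], depth + 1))
        stack.append((node["right"], depth + 1))
    return best


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


n = 500
keys = list(range(n))
sorted_height = height(build(keys))    # keys arrive in sorted order
random.seed(0)                         # fixed seed so the run is repeatable
random.shuffle(keys)
shuffled_height = height(build(keys))  # same keys, random order
print(sorted_height, shuffled_height)
```

The first number equals n because every key becomes the right child of the previous one; the second is far smaller, since a shuffled insertion order tends to spread keys across both subtrees.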


Binary Search Trees Analysis

• Since get searches the tree to find the key, in the worst case the tree is searched all the way to the bottom and no key is found

• For del, the worst-case scenario to find the successor is also just the height of the tree which means that you would simply double the work

• Since doubling is a constant factor it does not change worst case analysis of O(n) for an unbalanced tree


Binary Search Trees A Skewed Binary Search Tree

[Figure 6.25: A Skewed Binary Search Tree, with keys 10, 20, 30, 40, 50]

6.8 Balanced Binary Search Trees

In the previous section we looked at building a binary search tree. As we learned, the performance of the binary search tree can degrade to O(n) for operations like get and put when the tree becomes unbalanced. In this section we will look at a special kind of binary search tree that automatically makes sure that the tree remains balanced at all times. This tree is called an AVL tree and is named for its inventors: G.M. Adelson-Velskii and E.M. Landis.

An AVL tree implements the Map abstract data type just like a regular binary search tree; the only difference is in how the tree performs. To implement our AVL tree we need to keep track of a balance factor for each node in the tree. We do this by looking at the heights of the left and right subtrees for each node. More formally, we define the balance factor for a node as the difference between the height of the left subtree and the height of the right subtree.

balanceFactor = height(leftSubTree) − height(rightSubTree)

Using the definition for balance factor given above we say that a subtree is left-heavy if the balance factor is greater than zero. If the balance factor is less than zero then the subtree is right-heavy. If the balance factor is zero then the tree is perfectly in balance. For purposes of implementing an AVL tree, and gaining the benefit of having a balanced tree, we will define a tree ...
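The balance factor defined above can be computed directly from subtree heights. The following is a minimal sketch, not an AVL implementation: the nested-tuple layout `(key, left, right)` is an assumption, and so is the convention that an empty subtree has height -1 (height counted in edges).

```python
def height(node):
    """Height in edges of a nested-tuple tree; an empty tree counts as -1."""
    if node is None:
        return -1
    _, left, right = node
    return 1 + max(height(left), height(right))


def balance_factor(node):
    """height(left subtree) - height(right subtree)."""
    _, left, right = node
    return height(left) - height(right)


skewed = (10, None, (20, None, (30, None, None)))  # right-leaning chain
even = (20, (10, None, None), (30, None, None))
print(balance_factor(skewed), balance_factor(even))  # -2 0
```

The chain is right-heavy (negative factor), while the second tree is perfectly in balance. Recomputing heights on every call is fine for a demonstration but would be too slow for a real tree, which is why the text speaks of keeping track of the balance factor in each node.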


Questions?