CS 221

CS 221 Analysis of Algorithms Ordered Dictionaries and Search Trees


CS 221. Analysis of Algorithms: Ordered Dictionaries and Search Trees. Portions of these slides come from Michael Goodrich and Roberto Tamassia, Algorithm Design: Foundations, Analysis and Internet Examples, 2002, John Wiley and Sons.

Transcript of CS 221

Page 1: CS 221

CS 221

Analysis of Algorithms

Ordered Dictionaries and Search Trees

Page 2: CS 221

Portions of these slides come from Michael Goodrich and Roberto Tamassia, Algorithm Design: Foundations, Analysis and Internet Examples, 2002, John Wiley and Sons,

and its authors, Michael Goodrich and Roberto Tamassia,

the book's publisher John Wiley & Sons and… www.wikipedia.org

Page 3: CS 221

Reading material

Goodrich and Tamassia, 2002

Chapter 2, section 2.5, pages 114-137 (see also section 2.6)

Chapter 3, section 3.1, pages 141-151

Wikipedia: http://en.wikipedia.org/wiki/AVL_trees

Page 4: CS 221

In the previous episode…

…we defined a data structure which we called a dictionary. It was…

a container to hold multiple objects or, in Goodrich and Tamassia's terminology, "items"

each item = a (key, element) pair

element = a "piece" of data (think: name, address, phone number)

key = a value we associate with the element to help us find, retrieve, delete, etc. an element (think: RDBMS autoincrement key, student ID#)

Page 5: CS 221

Dictionaries

Up until now we have looked at unordered dictionaries:

a container for (k,e) pairs, but… in no particular order

Examples: logfiles, hash tables

Page 6: CS 221

Dictionaries

A terminology note for purposes of our discussion:

A linear unordered dictionary = logfile

A linear ordered dictionary = lookup table

Page 7: CS 221

Game Time

Twenty Questions

One person thinks of an object that can be any person, place or thing… and does not disclose the selected object until it is specifically identified by the other players…

All other players take turns asking Yes/No questions in an attempt to identify the mystery object

Page 8: CS 221

Game Time

Twenty Questions

An efficient problem solving strategy is to ask questions whose answers shrink the problem space (the set of possible solutions) as much as possible

For example, Q: Is it a person? A: Yes

…we just eliminated all places and non-human objects from the solution set

Page 9: CS 221

Game Time

Twenty Questions

Size of problem? N = ??? (large, ~∞)

Yes/No attack makes this a binary search problem…

So, what size of problem space can we effectively search? 2^20 (about one million), since each of the 20 answers can halve the remaining candidates
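The 2^20 figure can be checked with a few lines of Python. This is only a sketch: the helper name questions_needed is made up for illustration, and it assumes every question splits the remaining candidates exactly in half.

```python
def questions_needed(n: int) -> int:
    """Smallest q with 2**q >= n: yes/no questions needed to single out 1 of n items.

    Hypothetical helper; assumes each answer halves the candidate set.
    """
    return (n - 1).bit_length()


print(2 ** 20)                        # 1048576
print(questions_needed(2 ** 20))      # 20
print(questions_needed(2 ** 20 + 1))  # 21
```

One extra candidate beyond 2^20 already forces a 21st question, which is why 2^20 is the limit for twenty perfectly halving questions.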

Page 10: CS 221

Game Time

Twenty Questions

Something to think about…

N is conceivably much larger than 2^20

So, how is it that we can usually solve this problem in 20 steps or less… i.e. correctly identify the mystery object?

Page 11: CS 221

Dictionaries

Ordered Dictionaries

Suppose the items in a dictionary are ordered (sorted), e.g. from low to high

Would that make a difference in terms of

size()

isEmpty()

findElement()

insertItem()

removeElement()

Page 12: CS 221

Dictionaries

Ordered Dictionaries

Suppose we implement an ordered dictionary as a linear data structure or, more specifically, a vector

Items are in the vector in key order

We gain considerable efficiency because we can visit D[x], where x is a rank, in O(1) time

Can we achieve the same findElement() time if the ordered dictionary were implemented as a linked list?

Page 13: CS 221

Binary Search

Binary search performs operation findElement(k) on a dictionary implemented by means of an array-based sequence, sorted by key

similar to the high-low game

at each step, the number of candidate items is halved

terminates after O(log n) steps

Example: findElement(7)

[Figure: four snapshots of the sorted array 1 3 4 5 7 8 9 11 14 16 18 19 (indexed from 0), with markers l, m, h bracketing the search range as it narrows to key 7]
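The halving step can be sketched in Python as follows. The names find_element and NO_SUCH_KEY mirror the slides' vocabulary but are otherwise assumptions, and the function returns the rank of the key rather than an element so the trace is easy to follow.

```python
NO_SUCH_KEY = None  # sentinel chosen for this sketch


def find_element(keys, k):
    """Return the rank (index) of k in the sorted list keys, or NO_SUCH_KEY."""
    low, high = 0, len(keys) - 1
    while low <= high:
        mid = (low + high) // 2      # middle of the remaining range
        if k < keys[mid]:
            high = mid - 1           # discard the upper half
        elif k > keys[mid]:
            low = mid + 1            # discard the lower half
        else:
            return mid
    return NO_SUCH_KEY


keys = [1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 18, 19]
print(find_element(keys, 7))   # 4
print(find_element(keys, 10))  # None
```

For findElement(7) the middle index visits ranks 5, 2, 3 and then 4, so the key is found after four probes in a 12-item array.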

Page 14: CS 221

Binary Search

Lookup tables are not very efficient for dynamic data (lots of insertItem and removeElement operations)

Lookup tables are efficient for dictionaries where the predominant access is findElement, with relatively few inserts or removes: credit card authorizations, code translation tables,…

Method             Logfile   Lookup Table
findElement        O(n)      O(log n)
insertItem         O(1)      O(n)
removeElement      O(n)      O(n)
closestKeyBefore   O(n)      O(log n)
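A lookup table along these lines might be sketched with Python's standard bisect module. The class and method names are illustrative, and closest_key_before is assumed here to mean "largest key less than or equal to k".

```python
import bisect


class LookupTable:
    """Ordered dictionary kept as two parallel lists sorted by key (a sketch)."""

    def __init__(self):
        self._keys = []
        self._elems = []

    def find_element(self, k):
        i = bisect.bisect_left(self._keys, k)        # O(log n) binary search
        if i < len(self._keys) and self._keys[i] == k:
            return self._elems[i]
        return None                                  # stands in for NO_SUCH_KEY

    def insert_item(self, k, e):
        i = bisect.bisect_right(self._keys, k)       # O(log n) to find the slot...
        self._keys.insert(i, k)                      # ...but O(n) to shift later items
        self._elems.insert(i, e)

    def closest_key_before(self, k):
        i = bisect.bisect_right(self._keys, k) - 1   # largest key <= k (assumed meaning)
        return self._keys[i] if i >= 0 else None


table = LookupTable()
for key, elem in [(30, "c"), (10, "a"), (20, "b")]:
    table.insert_item(key, elem)
print(table.find_element(20))        # b
print(table.closest_key_before(25))  # 20
```

The list insert is what makes insertItem O(n): finding the position is cheap, but every later item has to move one slot.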

Page 15: CS 221

Binary Search Tree

Binary tree for holding (k,e) items, such that…

each internal node v stores an element e with key k

keys in the left subtree of v are <= k

keys in the right subtree of v are >= k

external nodes store no elements… only a placeholder (NULL_NODE)

Page 16: CS 221

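Binary Search Tree

Every key in a left subtree is less than its parent's key

Every key in a right subtree is greater than its parent's key

All leaf nodes hold no items

[Figure: example binary search tree rooted at 58, holding keys 12, 25, 31, 36, 42, 58, 62, 75, 90]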

Page 17: CS 221

Search

Algorithm findElement(k, v)
  if T.isExternal(v)
    return NO_SUCH_KEY
  if k < key(v)
    return findElement(k, T.leftChild(v))
  else if k = key(v)
    return element(v)
  else { k > key(v) }
    return findElement(k, T.rightChild(v))

[Figure: example binary search tree rooted at 6, holding keys 1, 2, 4, 6, 8, 9]
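The search algorithm translates almost line for line into Python. In this sketch None plays the role of an external (placeholder) node, and the Node class and insert_item helper are assumptions added only so the example can build a tree to search.

```python
class Node:
    def __init__(self, key, element):
        self.key, self.element = key, element
        self.left = None    # None stands in for an external node
        self.right = None


def insert_item(v, key, element):
    """Insert (key, element) below v and return the subtree root (helper for the demo)."""
    if v is None:
        return Node(key, element)
    if key < v.key:
        v.left = insert_item(v.left, key, element)
    else:
        v.right = insert_item(v.right, key, element)
    return v


def find_element(k, v):
    if v is None:                 # T.isExternal(v)
        return None               # NO_SUCH_KEY
    if k < v.key:
        return find_element(k, v.left)
    if k == v.key:
        return v.element
    return find_element(k, v.right)


root = None
for key in [6, 2, 9, 1, 4, 8]:
    root = insert_item(root, key, f"elem-{key}")
print(find_element(4, root))  # elem-4
print(find_element(5, root))  # None
```

Each recursive call descends one level, so the running time is proportional to the height of the tree.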

Page 18: CS 221

removeElement(k) – simple case

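To perform operation removeElement(k), we search for key k

Assume key k is in the tree, and let v be the node storing k

If node v has a leaf child w, we remove v and w from the tree with operation removeAboveExternal(w)

Example: remove 4

[Figure: before and after. Before: tree rooted at 6 with keys 1, 2, 4, 5, 6, 8, 9, where v is the node storing 4 and w is its leaf child. After: 4 is gone and 5 takes its place]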

Page 19: CS 221

removeElement(k) – more complicated case

We consider the case where the key k to be removed is stored at a node v whose children are both internal

we find the internal node w that follows v in an inorder traversal

we copy key(w) into node v

we remove node w and its left child z (which must be a leaf) by means of operation removeAboveExternal(z)

Example: remove 3

[Figure: before and after. Before: tree rooted at v = 3 with keys 1, 2, 3, 5, 6, 8, 9, where w is the node storing 5 and z is its left leaf child. After: the root stores 5 and node w is gone]
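Both removal cases can be sketched together in Python. This is an illustration under stated assumptions, not the textbook's implementation: None stands in for external nodes (so there is no explicit removeAboveExternal), keys are assumed distinct, and the names Node, insert, remove and inorder are made up for the example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None     # None stands in for an external node
        self.right = None


def insert(v, key):
    if v is None:
        return Node(key)
    if key < v.key:
        v.left = insert(v.left, key)
    else:
        v.right = insert(v.right, key)
    return v


def remove(v, k):
    """Remove key k from the subtree rooted at v; return the new subtree root."""
    if v is None:
        return None                        # key not present
    if k < v.key:
        v.left = remove(v.left, k)
    elif k > v.key:
        v.right = remove(v.right, k)
    elif v.left is None:                   # simple case: left child is external
        return v.right
    elif v.right is None:                  # simple case: right child is external
        return v.left
    else:                                  # both children internal
        w = v.right
        while w.left is not None:          # w follows v in an inorder traversal
            w = w.left
        v.key = w.key                      # copy key(w) into v
        v.right = remove(v.right, w.key)   # w has no left child: a simple case
    return v


def inorder(v):
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)


root = None
for key in [3, 1, 8, 2, 6, 9, 5]:
    root = insert(root, key)
root = remove(root, 3)
print(root.key)        # 5
print(inorder(root))   # [1, 2, 5, 6, 8, 9]
```

The demo rebuilds the "remove 3" example: the inorder successor 5 is copied into the root and its old node is unlinked.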

Page 20: CS 221

Binary Search Tree Performance

Consider a dictionary with n items implemented by means of a binary search tree of height h

the space used is O(n)

methods findElement, insertItem and removeElement take O(h) time

The height h is O(n) in the worst case and O(log n) in the best case
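The gap between the worst and best case shows up directly in a small experiment. The sketch below (all names are illustrative) inserts the same 255 keys in two different orders into a plain binary search tree and measures the height, counting an external node as height 0.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Iterative insert, so degenerate trees do not cause deep recursion here."""
    if root is None:
        return Node(key)
    v = root
    while True:
        if key < v.key:
            if v.left is None:
                v.left = Node(key)
                break
            v = v.left
        else:
            if v.right is None:
                v.right = Node(key)
                break
            v = v.right
    return root


def height(v):
    return 0 if v is None else 1 + max(height(v.left), height(v.right))


def median_first(keys):
    """Reorder sorted keys so each subtree's median is inserted first."""
    if not keys:
        return []
    mid = len(keys) // 2
    return [keys[mid]] + median_first(keys[:mid]) + median_first(keys[mid + 1:])


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


keys = list(range(1, 256))                   # n = 255
print(height(build(keys)))                   # 255: sorted input gives a linear chain
print(height(build(median_first(keys))))     # 8: the same keys in a perfectly balanced tree
```

Sorted input is the classic worst case: every new key goes to the right of the previous one, so h = n.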

Page 21: CS 221

Balanced Trees

When a path in a tree gets very long relative to other paths in the tree… the tree is unbalanced

In fact, in its extreme form an unbalanced tree is a linear list.

So, to achieve optimal performance… you need to keep the tree balanced

Page 22: CS 221

AVL Trees

we want to maintain a balanced tree

recall: height of a node v = length of the longest path from v to an external node

We want to maintain the principle that for every node v the heights of its children can differ by no more than 1

This is the Height-Balance Property

Page 23: CS 221

AVL Trees

Balance Factor = h(right_subtree) - h(left_subtree)

In a balanced tree, |h(right_subtree) - h(left_subtree)| ∈ {0, 1}

A tree with Balance Factor ∉ {-1, 0, 1} is an unbalanced tree and must be rebalanced

Balance Factor exists for every node v except (trivially) external nodes
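The balance factor is straightforward to compute from the height definition. A minimal sketch, assuming None represents an external node of height 0 and using made-up helper names:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def height(v):
    return 0 if v is None else 1 + max(height(v.left), height(v.right))


def balance_factor(v):
    return height(v.right) - height(v.left)


def is_height_balanced(v):
    """True if every internal node has a balance factor in {-1, 0, 1}."""
    if v is None:
        return True
    return (abs(balance_factor(v)) <= 1
            and is_height_balanced(v.left)
            and is_height_balanced(v.right))


balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))
print(balance_factor(balanced), is_height_balanced(balanced))  # 0 True
print(balance_factor(chain), is_height_balanced(chain))        # 2 False
```

Recomputing heights like this is wasteful; practical AVL implementations usually cache the height (or the balance factor) in each node.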

Page 24: CS 221

AVL Trees

If Balance Factor = -1, 0, 1: tree is balanced, does not need to be restructured

If Balance Factor = -2, 2: tree is unbalanced, needs to be restructured

restructuring is done by a process called rotation

Page 25: CS 221

AVL Trees

Rotation

Four types, but pairs are symmetrical

Left Single Rotation

Right Single Rotation

Left Double Rotation

Right Double Rotation

Since the pairs are symmetrical, we only consider single and double rotation

Page 26: CS 221

AVL Trees

Rotation if BF = 2

Page 27: CS 221

AVL Trees

Binary search trees that maintain the Height-Balance Property are called AVL trees

the name comes from the inventors G.M. Adelson-Velsky and E.M. Landis, in a 1962 paper entitled "An algorithm for the organization of information"

Page 28: CS 221

AVL Trees

[Figure: Unbalanced Tree vs. Balanced Tree]

from:http://en.wikipedia.org/wiki/AVL_trees

Page 29: CS 221

AVL Trees

Balance Factor (BF) = h(right_subtree) - h(left_subtree)

If BF ∈ {-1, 0, 1} then tree is balanced (do nothing)

If BF ∉ {-1, 0, 1} then tree is unbalanced (must be restructured)

Restructuring is done by rotation


Page 30: CS 221

AVL Trees

Rotation

four cases, but pairs are symmetrical

left single rotation

right single rotation

left double rotation

right double rotation

since symmetric, we only examine single and double


Page 31: CS 221

AVL Trees - Insertion Rotation

If BF > 2: unbalance occurred further down in the right subtree

Recursively walk down the subtree until |BF| = 2

If BF < -2: unbalance occurred further down in the left subtree

Recursively walk down the subtree until |BF| = 2


Page 32: CS 221

AVL Trees - Insertion Rotation

If BF = 2: unbalance occurred in the right subtree

Recursively walk down the subtree until |BF| = 2

If BF = -2: unbalance occurred in the left subtree

Recursively walk down the subtree until |BF| = 2


Page 33: CS 221

AVL Trees - Insertion Rotation

If BF = 2: unbalance occurred in the right subtree

Step down to the subtree to find where the insertion occurred

If BF = -2: unbalance occurred in the left subtree

Step down to the subtree to find where the insertion occurred


Page 34: CS 221

AVL Trees - Insertion

Rotation (right-heavy case, BF = 2; the BF = -2 case is the mirror image)

If BF at subtree = 1: insertion occurred on the right leaf node, single rotation required

If BF at subtree = -1: insertion occurred on the left leaf node, double rotation required
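The single and double rotation rules can be sketched as a recursive AVL insertion in Python. This is one common way to write it, not necessarily the formulation used in the textbook or on Wikipedia; the helper names (h, update, bf, rotate_left, rotate_right) are assumptions, each node caches its height, and None stands in for an external node.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1


def h(v):
    return v.height if v else 0


def update(v):
    v.height = 1 + max(h(v.left), h(v.right))


def bf(v):
    return h(v.right) - h(v.left)


def rotate_left(x):
    """Single rotation: the right child y of x becomes the subtree root."""
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def rotate_right(y):
    """Mirror image of rotate_left."""
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def insert(v, key):
    if v is None:
        return Node(key)
    if key < v.key:
        v.left = insert(v.left, key)
    else:
        v.right = insert(v.right, key)
    update(v)
    if bf(v) == 2:                          # right-heavy
        if bf(v.right) < 0:                 # right child leans left: double rotation
            v.right = rotate_right(v.right)
        return rotate_left(v)
    if bf(v) == -2:                         # left-heavy, mirror image
        if bf(v.left) > 0:
            v.left = rotate_left(v.left)
        return rotate_right(v)
    return v


root = None
for key in range(1, 8):                     # sorted input: worst case for a plain BST
    root = insert(root, key)
print(root.key, root.height)                # 4 3
```

Inserting 1 through 7 in sorted order would produce a chain of height 7 in a plain binary search tree; with rebalancing the result is a perfect tree of height 3 rooted at 4.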


Page 35: CS 221

AVL Trees - Insertion

Rotation: see http://en.wikipedia.org/wiki/AVL_trees


Page 36: CS 221

AVL Trees - Insertion

Performance

rotations: O(1)

Recall h(T) maintained at O(log n)

insertItem: O(log n)

balanced tree: priceless


Page 37: CS 221

Bounded-depth Search Trees

Search efficiency in a tree is related to the depth of the tree

Can use depth-bounded trees to create ordered dictionaries whose search and update operations run in O(log n) time

Page 38: CS 221

Multi-way Search Trees

Remember Binary Search Trees: any node v can have at most 2 children

What if we get rid of that rule?

Suppose a node could have multiple children (>2)

Terminology: if v has d children, v is a d-node

Page 39: CS 221

Multi-way Search Trees

Multi-way Search Tree T

Each internal node must have at least two children, i.e. each internal node is a d-node with d ≥ 2

Internal nodes store collections of items (k,e)

Each d-node stores d-1 items

Special keys k0 = -∞ and kd = ∞

External nodes are only placeholders
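Searching a multi-way search tree generalizes the binary case: at each d-node, locate the gap between consecutive keys that could contain k and descend into the matching child. A minimal Python sketch, with a hypothetical MultiwayNode class and None standing in for external placeholders:

```python
import bisect


class MultiwayNode:
    """A d-node: d-1 sorted keys (with their elements) and d children."""

    def __init__(self, keys, elems, children=None):
        self.keys = keys
        self.elems = elems
        # None stands in for an external placeholder node
        self.children = children or [None] * (len(keys) + 1)


def find_element(v, k):
    while v is not None:
        i = bisect.bisect_left(v.keys, k)       # index of the first key >= k
        if i < len(v.keys) and v.keys[i] == k:
            return v.elems[i]
        v = v.children[i]                       # child i covers keys between keys[i-1] and keys[i]
    return None                                 # reached an external node: no such key


leaf_a = MultiwayNode([5, 10], ["e5", "e10"])
leaf_b = MultiwayNode([25], ["e25"])
leaf_c = MultiwayNode([40, 50, 60], ["e40", "e50", "e60"])
root = MultiwayNode([20, 30], ["e20", "e30"], [leaf_a, leaf_b, leaf_c])
print(find_element(root, 50))  # e50
print(find_element(root, 27))  # None
```

The special keys k0 = -∞ and kd = ∞ never need to be stored here: the first and last child indices play that role implicitly.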

Page 40: CS 221