Download - Www.monash.edu.au 1 prepared from lecture material © 2004 Goodrich & Tamassia COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material.

www.monash.edu.au

1prepared from lecture material © 2004 Goodrich & Tamassia

COMMONWEALTH OF AUSTRALIA

Copyright Regulations 1969

WARNING

This material has been reproduced and communicated to you by or on behalf of Monash University pursuant to Part VB of the

Copyright Act 1968 (the Act). The material in this communication may be subject to copyright under the Act.

Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act.

Do not remove this notice.

www.monash.edu.au

FIT2004

Algorithms & Data Structures L9: Balanced Trees

Prepared by: Bernd Meyer from lecture materials © 2004 Goodrich & Tamassia February 2007

www.monash.edu.au


ADT: DictionaryA dictionary is a searchable collection of key-value pairs

ADT dictionarysorts dict, key, elem, bool;ops

empty: -> dict;insert: dict x key x elem -> dict;delete: dict x key -> dict;contains: dict x key -> bool;retrieve: dict x key -> elem;isempty: dict -> bool;

etc…

we want to implement a dictionary:– by lists --- bad idea: O(n)– by (binary) trees --- a good idea? O(log n)???

www.monash.edu.au


Search Tables

• A search table is a dictionary implemented by means of a sorted sequence– We store the items of the dictionary in an array-based sequence, sorted by key

• Performance:– find takes O(log n) time, using binary search– insert takes O(n) time since in the worst case we have to shift O(n) items to make room

for the new item– remove take O(n) time since in the worst case we have to shift O(n) items to compact

the items after the removal

• The lookup table is effective only for dictionaries of small size or for dictionaries on which searches are the most common operations, while insertions and removals are rarely performed (e.g., credit card authorizations)

www.monash.edu.au


Binary Search (Revision)

• Binary search can perform operation find(k) on a dictionary implemented by means of an array-based sequence, sorted by key– at each step, the number of candidate items is halved

– terminates after O(log n) steps

• Example: find(7)

1 3 4 5 7 8 9 11 14 16 18 19

1 3 4 5 7 8 9 11 14 16 18 19

1 3 4 5 7 8 9 11 14 16 18 19

1 3 4 5 7 8 9 11 14 16 18 19

0

0

0

0

ml h

ml h

ml h

lm h

www.monash.edu.au


Binary Search Trees

• A binary search tree is a binary tree storing keys (or key-value entries) at its internal nodes and satisfying the following property:– Let u, v, and w be three

nodes such that u is in the left subtree of v and w is in the right subtree of v. We have key(u) key(v) key(w)

• External nodes do not store items

• Which traversal of the tree enumerates the keys in increasing order?

6

92

41 8

www.monash.edu.au


Search

• To search for a key k, we trace a downward path starting at the root

• The next node visited depends on the outcome of the comparison of k with the key of the current node

• If we reach a leaf, the key is not found and we return notFound

• Example: find(4)

Algorithm TreeSearch(k, v)if T.isExternal (v)

return vif k key(v)

return TreeSearch(k, T.left(v))else if k key(v)

return velse { k key(v) }

return TreeSearch(k, T.right(v))

6

92

41 8

www.monash.edu.au


Insertion

• To perform operation insert(k, o), we search for key k (using TreeSearch)

• Assume k is not already in the tree, and let let w be the leaf reached by the search

• We insert k at node w and expand w into an internal node

• Example: insert 5

6

92

41 8

6

92

41 8

5

w

w

www.monash.edu.au


Deletion

• To perform operation remove(k), we search for key k

• Assume key k is in the tree, and let let v be the node storing k

• If node v has a leaf child w, we remove v and w from the tree with operation removeExternal(w), which removes w and its parent

• Example: remove 4

6

92

41 8

5

vw

6

92

51 8

www.monash.edu.au


Deletion (cont.)

• We consider the case where the key k to be removed is stored at a node v whose children are both internal

– we find the internal node w that follows v in an inorder traversal

– we copy key(w) into node v– we remove node w and its left child z

(which must be a leaf) by means of operation removeExternal(z)

• Example: remove 3

• Why is this correct?

3

1

8

6 9

5

v

w

z

2

5

1

8

6 9

v

2

www.monash.edu.au


Performance

• Consider a dictionary with n items implemented by means of a binary search tree of height h

– the space used is O(n)– methods find, insert and

remove take O(h) time• The height h is O(n) in the

worst case and O(log n) in the best case

www.monash.edu.au


Recall: Average Binary Tree Depth

• What is the depth of an average binary search tree?– generate by insertion only, all permutations equally likely:

O(log n) --- we will show this– generated by insertion and deletion, very large sequences

( n) --- very hard to show J. Culberson 1985. The Effects of Updates in Binary Search Trees. 17th annual ACM Symposium on Theory of Computing.

• The tree degenerates: This is not good as all operations become more expensive

• Solution: Self-adjusting trees that maintain their balance using specialize (more expensive) update operations: next lecture.

randomly generated, 500 inserts after 250,000 insert/delete pairs

www.monash.edu.au


Dynamic Self-balancing Trees

• We cannot fully rebalance a binary tree after every operation. This is too costly.

• Thus, we cannot keep a binary tree perfectly balanced at all times.

• As an alternative we will try to relax the requirement in three different ways and hope that we still reach O(log n) access times. We will try to keep it

– almost balanced at all times (AVL Trees)– almost balanced most of the time (Splay Trees)– perfectly balanced at all times but allow it to be it non-binary (2-4-Tree, …)

www.monash.edu.au


AVL Trees

• AVL trees are balanced.• An AVL Tree is a binary

search tree such that for every internal node v of T, the heights of the children of v can differ by at most 1.

88

44

17 78

32 50

48 62

2

4

1

1

2

3

1

1

An example of an AVL tree with heights shown next to the nodes

www.monash.edu.au


Height of an AVL Tree

• Fact: The height of an AVL tree storing n keys is O(log n).• Proof: Let us bound n(h): the minimum number of internal nodes of an

AVL tree of height h.• We easily see that n(1) = 1 and n(2) = 2• For h > 2, an AVL tree of height h contains the root node, one AVL

subtree of height h-1 and another of height h-2.• That is, n(h) = 1 + n(h-1) + n(h-2)• Knowing n(h-1) > n(h-2), we get n(h) > 2n(h-2). So

n(h) > 2n(h-2), n(h) > 4n(h-4), n(h) > 8n(h-6), … (by induction),n(h) > 2in(h-2i)

• Solving the base case we get: n(h) > 2 h/2-1

• Taking logarithms: h < 2log n(h) +2• Thus the height of an AVL tree is O(log n)

3

4 n(1)

n(2)

www.monash.edu.au


Insertion in an AVL Tree

• Insertion is as in a binary search tree• Always done by expanding an external node.• Example: 44

17 78

32 50 88

48 62

54w

b=x

a=y

c=z

44

17 78

32 50 88

48 62

before insertion after insertion

www.monash.edu.au


Trinode Restructuring

• let (a,b,c) be an inorder listing of x, y, z• perform the rotations needed to make b the topmost node of the three

b=y

a=z

c=x

T0

T1

T2 T3

b=y

a=z c=x

T0 T1 T2 T3

c=y

b=x

a=z

T0

T1 T2

T3b=x

c=ya=z

T0 T1 T2 T3

case 1: single rotation(a left rotation about a)

case 2: double rotation(a right rotation about c, then a left rotation about a)

(other two cases are symmetrical)

www.monash.edu.au


Trinode Restructuring Algorithm

• note that given the definitions above both types can be expressed

• let z be the first imbalanced node, y its higher child and x the higher child of y• let a,b,c be an inorder enumeration of x,y,z• let T0, …, T3 be the left-to-right listing of the subtrees rooted at x, y, z

Alg trinode-restructure

1. replace subtree rooted at z with subtree rooted at b2. let a be the left child of b, and let T0 (T1) be the left (right) child of a3. let c be the right child of b and T2 (T3) be the left (right) child of c

www.monash.edu.au


Insertion Example, continued

T0 T1 T3

unbalanced...

...balanced

88

44

17 78

32 50

48 62

2

5

1

1

3

4

2

1

54

1

T0T2

T3

x

y

z

2

3

4

5

67

1

88

44

17

7832 50

48

622

4

1

1

2 2

3

154

1

T2

x

y z1

2

3

4

5

6

7

T1

www.monash.edu.au


Restructuring (as Single Rotations)

• Single Rotations:

T0

T1

T2

T3

c = x

b = y

a = z

T0 T1 T2

T3

c = x

b = y

a = zsingle rotation

T3

T2

T1

T0

a = x

b = y

c = z

T0T1T2

T3

a = x

b = y

c = zsingle rotation

T0T1 T2 T3

www.monash.edu.au


Restructuring (as Double Rotations)

• double rotations:

double rotationa = z

b = x

c = y

T0

T2

T1

T3 T0

T2T3T1

a = z

b = x

c = y

double rotationc = z

b = x

a = y

T0

T2

T1

T3 T0

T2T3 T1

c = z

b = x

a = y

T0

T1T2 T3T0 T1

T2

T3

www.monash.edu.au


Removal in an AVL Tree

• Removal begins as in a binary search tree, which means the node removed will become an empty external node. Its parent, w, may cause an imbalance.

• Example: 44

17

7832 50

8848

62

54

44

17

7850

8848

62

54

before deletion of 32 after deletion

www.monash.edu.au


Rebalancing after a Removal• Let z be the first unbalanced node encountered while travelling up the tree

from w. Also, let y be the child of z with the larger height, and let x be the child of y with the larger height. If both subtrees have same height at y chose x to be the subtree on the same side of y as y’s side on z.

• We perform restructure(x) to restore balance at z.• As this restructuring may upset the balance of another node higher in the

tree, we must continue checking for balance until the root of T is reached44

17

7850

8848

62

54

w

c=x

b=y

a=z

44

17

78

50 88

48

62

54

www.monash.edu.au


Running Times for AVL Trees

• a single restructure is O(1)– using a linked-structure binary tree

• find is O(log n)– height of tree is O(log n), no restructures needed

• insert is O(log n)– initial find is O(log n)– Restructuring at the node, restoring heights

• remove is O(log n)– initial find is O(log n)– Restructuring up the tree, maintaining heights is O(log n)

www.monash.edu.au


Splay Trees

• Splay Trees are Binary Search Trees

• BST Rules:– entries stored only at internal

nodes– keys stored at nodes in the

left subtree of v are less than or equal to the key stored at v

– keys stored at nodes in the right subtree of v are greater than or equal to the key stored at v

• An inorder traversal will return the keys in order

(20,Z)

(37,P)(21,O)(14,J)

(7,T)

(35,R)(10,A)

(1,C)

(1,Q)

(5,G)(2,R)

(5,H)

(6,Y)(5,I)

(8,N)

(7,P)

(36,L)

(10,U)

(40,X)

www.monash.edu.au


Searching in a Splay Tree

• Start same as BST• Search proceeds down

the tree to found item or an external node.

• Example: Search for time with key 11.

(20,Z)

(37,P)(21,O)(14,J)

(7,T)

(35,R)(10,A)

(1,C)

(1,Q)

(5,G)(2,R)

(5,H)

(6,Y)(5,I)

(8,N)

(7,P)

(36,L)

(10,U)

(40,X)

www.monash.edu.au


Example Searching in a BST, continued

• search for key 8, ends at an internal node.

(20,Z)

(37,P)(21,O)(14,J)

(7,T)

(35,R)(10,A)

(1,C)

(1,Q)

(5,G)(2,R)

(5,H)

(6,Y)(5,I)

(8,N)

(7,P)

(36,L)

(10,U)

(40,X)

www.monash.edu.au


Rotations after Every Operation (Even Search)

• new operation splay: moves a node to the root using rotations right rotation

makes the left child x of a node y into y’s parent; y becomes the right child of x

left rotation makes the right child y of a node x into

x’s parent; x becomes the left child of y

y

x

T1 T2

T3

y

x

T1

T2T3

y

x

T1 T2

T3

y

x

T1

T2T3

(structure of tree above y is not modified)

(structure of tree above x is not modified)

a right rotation about y a left rotation about x

www.monash.edu.au

30prepared from lecture material © 2004 Goodrich & Tamassiaprepared from lecture material © 2004 Goodrich & Tamassia

Splaying: “x is a left-left grandchild” means x is a left child of its parent, which is itself a left child of its parent

p is x’s parent; g is p’s parent

is x the root?

stop

is x a child of the root?

right-rotate about the root

left-rotate about the root

is x the left child of the

root?

is x a left-left grandchild?

is x a left-right grandchild?

is x a right-right grandchild?

is x a right-left grandchild?

right-rotate about g, right-rotate about p

left-rotate about g, left-rotate about p

left-rotate about p, right-rotate about g

right-rotate about p, left-rotate about g

start with node x

no

yes

yes

yes

yes

yes

yes

no

no

yes zig-zig

zig-zag

zig-zag

zig-zig

zigzig

www.monash.edu.au


Visualizing the Splaying Cases

zig-zag

y

x

T2 T3

T4

z

T1

y

x

T2 T3 T4

z

T1

y

x

T1 T2

T3

z

T4

zig-zig

y

z

T4T3

T2

x

T1

zig

x

w

T1 T2

T3

y

T4

y

x

T2 T3 T4

w

T1

www.monash.edu.au


Splaying Example• let x = (8,N)

– x is the right child of its parent, which is the left child of the grandparent

– left-rotate around p, then right-rotate around g

(20,Z)

(37,P)(21,O)(14,J)

(7,T)

(35,R)(10,A)

(1,C)

(1,Q)

(5,G)(2,R)

(5,H)

(6,Y)(5,I)

(8,N)

(7,P)

(36,L)

(10,U)

(40,X)

x

g

p

(10,A)

(20,Z)

(37,P)(21,O)

(35,R)

(36,L) (40,X)(7,T)

(1,C)

(1,Q)

(5,G)(2,R)

(5,H)

(6,Y)(5,I)

(14,J)(8,N)

(7,P)

(10,U)

x

g

p (10,A)

(20,Z)

(37,P)(21,O)

(35,R)

(36,L) (40,X)

(7,T)

(1,C)

(1,Q)

(5,G)(2,R)

(5,H)

(6,Y)(5,I)

(14,J)

(8,N)

(7,P)

(10,U)

x

g

p

1.(before rotating)

2.(after first rotation) 3.

(after second rotation)

x is not yet the root, so we splay again

www.monash.edu.au


Splaying Example, Continued

• now x is the left child of the root

– right-rotate around root

(10,A)

(20,Z)

(37,P)(21,O)

(35,R)

(36,L) (40,X)

(7,T)

(1,C)

(1,Q)

(5,G)(2,R)

(5,H)

(6,Y)(5,I)

(14,J)

(8,N)

(7,P)

(10,U)

x

(10,A)

(20,Z)

(37,P)(21,O)

(35,R)

(36,L) (40,X)

(7,T)

(1,C)

(1,Q)

(5,G)(2,R)

(5,H)

(6,Y)(5,I)

(14,J)

(8,N)

(7,P)

(10,U)

x

1.(before applying rotation)

2.(after rotation)

x is the root, so stop

www.monash.edu.au


Example Result of Splaying

• tree might not be more balanced• e.g. splay (40,X)

– before, the depth of the shallowest leaf is 3 and the deepest is 7

– after, the depth of shallowest leaf is 1 and deepest is 8

(20,Z)

(37,P)(21,O)(14,J)

(7,T)

(35,R)(10,A)

(1,C)

(1,Q)

(5,G)(2,R)

(5,H)

(6,Y)(5,I)

(8,N)

(7,P)

(36,L)

(10,U)

(40,X)

(20,Z)

(37,P)

(21,O)

(14,J)(7,T)

(35,R)

(10,A)

(1,C)

(1,Q)

(5,G)(2,R)

(5,H)

(6,Y)(5,I)

(8,N)

(7,P) (36,L)(10,U)

(40,X)

(20,Z)

(37,P)

(21,O)

(14,J)(7,T)

(35,R)

(10,A)

(1,C)

(1,Q)

(5,G)(2,R)

(5,H)

(6,Y)(5,I)

(8,N)

(7,P)

(36,L)

(10,U)

(40,X)

before

after first splay

after second splay

www.monash.edu.au


Splay Tree Definition

• a splay tree is a binary search tree where a node is splayed after it is accessed (for a search or update)– deepest internal node accessed is splayed– splaying costs O(h), where h is height of the tree

– which is still O(n) worst-case> O(h) rotations, each of which is O(1)

www.monash.edu.au


Splay Trees & Ordered Dictionaries

• which nodes are splayed after each operation?

use the parent of the internal node that was actually removed from the tree (the parent of the node that the removed item was swapped with)

remove(k)

use the new node containing the entry insertedinsert(k,v)

if key found, use that node

if key not found, use parent of ending external nodefind(k)

splay nodemethod

www.monash.edu.au


Amortized Analysis of Splay Trees

• “Amortized” means to balance the immediate cost of an operation with the future cost of further operations.

– total cost = immediate cost + future cost– (note: future costs could be negative!)

• Cost (run time) of each operation is proportional to the cost for splaying.

– splay at depth d is O(d), so is find, insert, delete > The immediate cost of an individual rotation are clear:

– Costs: zig = $1, zig-zig = $2, zig-zag = $2.

> splay at depth d performs at most d/2 zig-zig/zig-zag + 1 zig

• We only need to worry about splaying cost!

www.monash.edu.au


Amortized Analysis of Splay Trees

• To get a handle on the future cost we imagine that we maintain a virtual account at each node.

• We can pay into these accounts or withdraw from them.

• We will show that we can maintain a balance of $r(v) at each vertex v by paying a total of $O(log(n)) for each operation.

• Thus the amortized cost (run time) of each operation is O(log(n))

www.monash.edu.au


Invariant for the Analysis

• We define s(v) as the size of the subtree rooted in v (number of nodes) and

r(v)=log2(s(v)) “rank of v”

• Invariant: the balance at each vertex is $r(v).– wisely chosen so that

> the accounts are always positive > we don’t have to cheat by making an advance payment for an empty tree

ie that we do not start the amortization with accounts that have money in them

> we can show the amortized cost to be O(log n)

• The future cost of a rotation is the cost of maintaining this invariant.

www.monash.edu.au


Cost per zig

• Let r(x) be the rank of x before the rotation,• Let r’(x) the rank after the rotation.

• Future cost of a zig at x is at most rank’(x) - rank(x):

future cost = r’(x) + r’(y) - r (y) - r (x) < r’(x) - r (x).• Total cost of zig at x is immediate cost + future cost:

cost < 1 + (r’(x) - r (x)) < 1 + 3*(r’(x) - r (x))

zig

x

w

T1 T2

T3

y

T4

y

x

T2 T3 T4

w

T1

www.monash.edu.au


Cost per zig-zig

• Future cost of a zig-zig at x is at most

future cost < 3(r’(x) - r(x)) - 2.The proof uses the same thinking as the zig case, but is somewhat more complex

(see Goodrich & Tamassia, proposition 9.2, page 440,

or Weiss Sec. 11.5 or Kingston Sec 6.6)

• Total cost of zig-zig at x is immediate cost + future cost:

cost < 2 + 3(r’(x) - r(x)) - 2 = 3(r’(x) - r(x))

y

x

T1 T2

T3

z

T4

zig-zig y

z

T4T3

T2

x

T1

www.monash.edu.au


Cost per zig-zag

Same as cost for a zig-zig:• Future cost of a zig-zag or zig-zag at x is at most

future cost < 3(r’(x) - r(x)) - 2.• Total cost of zig-zig or zig-zag at x is immediate cost + future cost:

cost < 2 + 3(r’(x) - r(x)) - 2 = 3(r’(x) - r(x))

zig-zagy

x

T2 T3

T4

z

T1

y

x

T2 T3 T4

z

T1

www.monash.edu.au


Cost of Splaying (= cost of find)

• Splaying a node x means to rotate it all the way up to the root (m rotations)

• The last operation is a zig, all others are zig-zig or zig-zag.

• total splay cost is the sum of (m-1) zig-zig/zig-zag plus a final zig

• Let rm(x) be the rank of x just after rotation step m

cii=1

m

∑ = cii=1

m−1

∑ + cm

≤ 3(ri (x)−ri−1(x))[ ] +1+ 3(rm(x)−rm−1(x))i=1

m−1

∑= 3rm−1(x)−3r0 (x))[ ] +1+ 3rm(x)−3rm−1(x)= −3r0 (x)) +1+ 3rm(x)≤ 1+ 3rm(x)= 1+ 3logsm(x)= 1+ 3logn= O(logn)

www.monash.edu.au


Cost of Deletion

• The tree shrinks (by one node)

• the total variation of all r(t) is negative

• we don’t have to worry about any extra payment for maintaining the invariant

www.monash.edu.au


Cost of Insertion

• inserting node v increases the ranks of all of v’s ancestors

• let v=vo, v1 be its parent, and vi…vd all the ancestors on the way to the root.

• let n(vi) be the size of the subtree rooted at vi and r(vi) the corresponding rank (before insert)

• let n’(vi) and r’(vi) be the same values after the insert.

– n’(vi) = n(vi) +1

– n(vi) +1 ≤ n(vi+1)

• We have

• so the total variation is

www.monash.edu.au


Amortized Cost

• we now amortize all the costs for m operations (find, insert, delete)– start with empty tree– let n be the total number of insertions (maximum number of keys)

– let ni be the number of keys in the tree after operation I

• the total cost for all these operation is

• thus the amortized cost of each operation is O(log n)

www.monash.edu.au


Performance of Splay Trees

• Thus, amortized cost of any splay operation is O(log n).

• Splay trees can adapt to perform searches on frequently-requested items much faster than O(log n) on average due to the “move-to-root” characteristics.