Balanced BST

49
E.G.M. Petrakis Trees in Main Memory 1 Balanced BST Balanced BSTs guarantee O(logN) performance at all times the height or left and right sub- trees are about the same simple BST are O(N) in the worst case Categories of BSTs AVL, SPLAY trees: dynamic environment optimal trees: static environment

description

Balanced BST. Balanced BSTs guarantee O(logN) performance at all times the height or left and right sub-trees are about the same simple BST are O(N) in the worst case Categories of BSTs AVL , SPLAY trees: dynamic environment optimal trees : static environment. AVL Trees. - PowerPoint PPT Presentation

Transcript of Balanced BST

Page 1: Balanced BST

E.G.M. Petrakis Trees in Main Memory 1

Balanced BST

Balanced BSTs guarantee O(logN) performance at all times the height or left and right sub-trees

are about the same simple BST are O(N) in the worst case

Categories of BSTs AVL, SPLAY trees: dynamic

environment optimal trees: static environment

Page 2: Balanced BST

E.G.M. Petrakis Trees in Main Memory 2

AVL Trees

AVL (Adelson, Lelskii, Landis): the height of the left and right subtrees differ by at most 1 the same for every subtree

Number of comparisons for membership operations best case: completely balanced worst case: 1.44 log(N+2) expected case: logN + .25 <= 1)log(N

Page 3: Balanced BST

E.G.M. Petrakis Trees in Main Memory 3

AVL Trees

“—” : completely balanced sub-tree “/” : the left sub-tree is 1 higher “\” : the right sub-tree is 1 higher

Page 4: Balanced BST

E.G.M. Petrakis Trees in Main Memory 4

AVL Trees

Page 5: Balanced BST

E.G.M. Petrakis Trees in Main Memory 5

“//” : the left sub-tree is 2 higher “\\” : the right sub-tree is 2 higher

critical node

Non AVL Trees

Page 6: Balanced BST

E.G.M. Petrakis Trees in Main Memory 6

Single Right Rotations

Insertions or deletions may result in non AVL trees => apply rotations to balance the tree

T3

T1 T2

Α

Β

T1

T3T2

Α

Β

Page 7: Balanced BST

E.G.M. Petrakis Trees in Main Memory 7

T1

Α

Β

Α

T3T2

ΑB

ΑΑT3

T1 T2

Single Left Rotations

Page 8: Balanced BST

E.G.M. Petrakis Trees in Main Memory 8

4

2

4

2

1

single right rotation

2

1 insert 1

4

8

insert 9

4

8

single left rotation

8

4

4

9

9

Page 9: Balanced BST

E.G.M. Petrakis Trees in Main Memory 9

Double Left Rotation

Composition of two single rotations (one right and one left rotation)

ΑΒ

T4

Α

A

C

T1

ΑB

T2 T3

oror

T4

Α

Β

C

Α

T1

oror

Τ2Τ2

T3

Α

A

T3

oror

ΑC

Τ1Τ1T4 T2

Page 10: Balanced BST

E.G.M. Petrakis Trees in Main Memory 10

Example of Double Left Rotation

4

2 8

7 9

6

\\

/

/

== ==

== ==

4

2 7

6 8

9

7

4 8

2 6 9

Critical node

insert 6

Page 11: Balanced BST

E.G.M. Petrakis Trees in Main Memory 11

Double Right Rotation

Composition of two single rotations (one left and one right rotation)

ΑB

ΑΑ

T4

T1

ΑΒ

T2 T3

oror

ΑC

ΑB

T4

T2

oror

A

T1

T3

Α

C

Β

T3

oror

ΑA

ΤΤ44T2T1

Page 12: Balanced BST

E.G.M. Petrakis Trees in Main Memory 12

Insertion (deletion) in AVL

1. Follow the search path to verify that the key is not already there

2. Insert (delete) the key 3. Retreat along the search path

and check the balance factor 4. Rebalance if necessary (see next)

Page 13: Balanced BST

E.G.M. Petrakis Trees in Main Memory 13

Rebalancing

For every node reached coming up from its left sub-tree after insertion readjust balance factor ‘=’ becomes ‘/’ => no operation ‘\’ becomes ‘=’ => no operation ‘/’ becomes ‘//’ => must be rebalanced!!

The “//” node becomes a “critical node” Only the path from the critical node to

the leaf has to be rebalanced!! Rotation is applied only at the critical

node!

Page 14: Balanced BST

E.G.M. Petrakis Trees in Main Memory 14

Rebalancing (cont.)

The balance factor of the critical node determines what rotation is to take place single or double

If the child and the grand child (inserted node) of the critical node are on the same direction (both “/”) => single rotation

Else => double rotation Rebalance similarly if coming up from the

right sub-tree (opposite signs)

Page 15: Balanced BST

E.G.M. Petrakis Trees in Main Memory 15

Performance

Performance of membership operations on AVL trees: easy for the worst case!

An AVL tree will never be more than 45% higher that its perfectly balanced counterpart (Adelson, Velskii, Landis):

log(N+1) <= hb(N) <=

l.4404log(N+2) – 0.302

Page 16: Balanced BST

E.G.M. Petrakis Trees in Main Memory 16

Worst case AVL

Sparse tree => each sub-tree has minimum number of nodes

Page 17: Balanced BST

E.G.M. Petrakis Trees in Main Memory 17

Fibonacci Trees

Th: tree of height h

Th has two sub-trees, one with height h-1 and one with height h-2 else it wouldn’t have minimum number of

nodes

T0 is the empty sub-tree (also Fibonacci)

T1 is the sub-tree with 1 node (also Fibonacci)

Page 18: Balanced BST

E.G.M. Petrakis Trees in Main Memory 18

Fibonacci Trees (cont.)

Average height Nh number of nodes of Th

Nh = Nh-1 + Nh-2 + 1

N0 = 1

N1 = 2

From which h <= 1.44 log(N+1)

12

51

5

12

51

5

1N

2h2h

h

Page 19: Balanced BST

E.G.M. Petrakis Trees in Main Memory 19

single rotation

77

88

77

88

99

insert 9

88

9977

More Examples

Page 20: Balanced BST

E.G.M. Petrakis Trees in Main Memory 20

double rotation

66

88

66

88

77insert 7

77

8866

Examples (cont.)

Page 21: Balanced BST

E.G.M. Petrakis Trees in Main Memory 21

double rotation

88

66

88

66

7insert 7

77

8866

Examples (cont.)

Page 22: Balanced BST

E.G.M. Petrakis Trees in Main Memory 22

delete 6

66

77

99

88

77

99

88

88

9977

single rotation

Examples (cont.)

Page 23: Balanced BST

E.G.M. Petrakis Trees in Main Memory 23

55

66

99

88

66

99

88

88

9966

delete 5

77 77 77

single rotation

Examples (cont.)

Page 24: Balanced BST

E.G.M. Petrakis Trees in Main Memory 24

55

66

88

66

88

77

8866

delete 5

77 77

double rotation

Examples (cont.)

Page 25: Balanced BST

E.G.M. Petrakis Trees in Main Memory 25

5

3

4

8

7 10

96 1111

2

delete 4

5

2

3

8

7 10

96 1111

1

1

delete 8

General Deletions

Page 26: Balanced BST

E.G.M. Petrakis Trees in Main Memory 26

delete 8 delete 5

5

2

3

7

6 10

9 1111

1

delete 6

5

2

3

10

7 11

9

1

General Deletions (cont.)

Page 27: Balanced BST

E.G.M. Petrakis Trees in Main Memory 27

delete 5

3

2 10

7 11

9

1

General Deletions (cont.)

Page 28: Balanced BST

E.G.M. Petrakis Trees in Main Memory 28

Self Organizing Search

Splay Trees: adapt to query patterns move to root (front): whenever a node is

accessed move it to the root using rotations

equivalent to move-to-front in arrays current node

insertions: inserted node deletions: father of deleted node or null if

this is the root membership: the last accessed node

Page 29: Balanced BST

E.G.M. Petrakis Trees in Main Memory 29

20

15 30

13

1014

8

Search(10)

first rotation

20

15 30

10

8 13

14

second rotation

20

3010

158

13

14

third rotation

10

8 20

3015

13

14

Page 30: Balanced BST

E.G.M. Petrakis Trees in Main Memory 30

Splay Cases

If the current node q has no grandfather but it has father p => only one single rotation

– two symmetric cases: L, R

qa

b c

q

a bc

p

p

Page 31: Balanced BST

E.G.M. Petrakis Trees in Main Memory 31

gp qp

q

a

b

c d

gp

p

a b

cd

LL

gp

pq

a

b cd

pq

a b c d

gpRL

LL symmetric of RRRL symmetric of RL

If p has also grandfather qp => 4 cases

Page 32: Balanced BST

E.G.M. Petrakis Trees in Main Memory 32

1

a q

8 j

i2

7

h6

g3

c 4

d5

e f

current node

1

α q

8 j

i2

7

h6

g

c

4

d

5

ef

3

RR LL

Page 33: Balanced BST

E.G.M. Petrakis Trees in Main Memory 33

1

a q

8 j

i2

7

6

g3

c

4

5

e

f

b

LR

1

α q

8

j

i

2

7

6

h

3

c

4

5

ef

dg

RL

a, b, c … are sub-trees

Page 34: Balanced BST

E.G.M. Petrakis Trees in Main Memory 34

1

a

q

8 j

i

2

7

6

h

3

c

4

5

e

f

b

g

d

Page 35: Balanced BST

E.G.M. Petrakis Trees in Main Memory 35

Splay Performance

Splay trees adapt to unknown or changing probability distributions

Splay trees do not guarantee logarithmic cost for each access AVL trees do! asymptotic cost close to the cost of the

optimal BST for unknown probability distributions

It can be shown that the “cost of m operations on an initially empty splay tree, where n are insertions is O(mlogn) in the worst case”

Page 36: Balanced BST

E.G.M. Petrakis Trees in Main Memory 36

Optimal BST

Static environment: no insertions or deletions

Keys are accessed with various frequencies

Have the most frequently accessed keys near the root

Application: a symbol table in main memory

Page 37: Balanced BST

E.G.M. Petrakis Trees in Main Memory 37

Searching

Given symbols a1 < a2 < ….< an and theirprobabilities: p1, p2, … pn minimize cost

Successful search Transform unsuccessful to

successful consider new symbols E1, E2, … En

- … α1 … α2 …αi … αi+1 ….…αn… αn+1

E0 E1 E2 Ei En

Ei= (αi , αi+1 ) E0= (- , α1 ) En= (αn , )

n

iii alevelpt

1

)(cos

Page 38: Balanced BST

E.G.M. Petrakis Trees in Main Memory 38

an

an-1

an-2EEii

failure nodefailure node

an-3

unsuccessful search for all values in Ei

terminates on the samefailure node (in fact, onenode higher)

Unsuccessful Search

Page 39: Balanced BST

E.G.M. Petrakis Trees in Main Memory 39

(a1, a2, a3) = (do, if, read)p i = q i = 1/7

readread

readread

readread ifif

ifif

ifif

dodo

dodo

dodo

cost = 15/7cost = 15/7

cost = 13/7cost = 13/7

cost = 15/7cost = 15/7

Optimal BST

if

Example

Page 40: Balanced BST

E.G.M. Petrakis Trees in Main Memory 40

read

if

do

if

readdo

cost = 15/7cost = 15/7 cost = 15/7cost = 15/7

Page 41: Balanced BST

E.G.M. Petrakis Trees in Main Memory 41

Search Cost

If pi is the probability to search for ai and qi is the probability to search in Ei then

n

1i

n

1iii 1qp

1}(Ei){levelqlevel(ai)pcostn

1i

n

1iii

successfulsearch

unsuccessfulsearch

Page 42: Balanced BST

E.G.M. Petrakis Trees in Main Memory 42

In a BST, a subtree has nodes that are consecutive in a sorted sequence of keys (e.g. [5,26])

10 25

13 24 26

12 14

20

25

13 24 26

14

5

Observation 1

Page 43: Balanced BST

E.G.M. Petrakis Trees in Main Memory 43

If Tij is a sub-tree of an optimal BST holding keys from i to j then Tij must be optimal among all

possible BSTs that store the same keys

optimality lemma: all sub-trees of an optimal BST are also optimal

Observation 2

Page 44: Balanced BST

E.G.M. Petrakis Trees in Main Memory 44

Optimal BST Construction

1) Construct all possible trees: NP-hard, there are trees!!

2) Dynamic programming solves the problem in polynomial time O(n3)

at each step, the algorithm finds and stores the optimal tree in each range of key values

increases the size of the range at each step until the whole range is obtained

n

2n

1n1

Page 45: Balanced BST

E.G.M. Petrakis Trees in Main Memory 45

Example (successful search only):

1) BSTs with 1 node

range 1 10 20 40cost= 0.3 0.2 0.1 0.4

2) BSTs with 2 nodes k=1-10 k=10-20 k=20-20

range 1-10 range 20-40range 10-20

1

10

10

20

20

40cost=0.3 1+0.2 2=0.7 cost=0.2+0.1 2=0.4

101

4020

2010

cost=0.2 1+0,3.2=0.8 cost=0.1+0.2 2=0.5

optimal

keys 1 10 20 40probabilities 0,3 0,2 0,1 0,4

cost=0.1+0.8=0.9

cost=0.4+0.2=0.6

optimal

optimal

Page 46: Balanced BST

E.G.M. Petrakis Trees in Main Memory 46

k=1-20range 1-20

1

20

10cost=0.1+2 0.3+3 0.2=1.3

1 20

10

cost=0.2+2(0.3+0.1)=1

1

10

20

k=10-40range 10-40

1040

20

cost=0.2+2 0.4+3 0.1=1.3

cost=0.3+2 0.2+3 0.1=1

20

10 40cost=0.1+2(0.2+0.4)=1.3

40

10

20

cost=0.4+2 0.2+30.1=1.1

optimal

3) BSTs with 3 nodes

Page 47: Balanced BST

E.G.M. Petrakis Trees in Main Memory 47

4) BSTs with 4 nodesrange 1-40

1

4010

20

cost=0.3+2 0.4+3 0.2+4 0.1=2.1

10

1 40

20

cost=0.2+2(0.3+0.4)+3 0.1=1.9

20

20

40

40

1

1

10

10

OPTIMAL BST

cost=0.1+2(0.3+0.4)+3 0.2=2.1

cost=0.4+2 0.3+3 0.2+4 0.1=2

Page 48: Balanced BST

E.G.M. Petrakis Trees in Main Memory 48

Complexity

Compute all optimal BSTs for all Cij, i,j=1,2..n

Let m=j-i: number of keys in range Cij

n-m-1 Cij’s must be computed The one with the minimum cost must be

found, this takes O(m(n-m-1)) time For all Cij’s it takes There is a better O(n2) algorithm by Knuth There is also a O(n) greedy algorithm

)O(n)m(nm 3

nm1

2

Page 49: Balanced BST

E.G.M. Petrakis Trees in Main Memory 49

Optimal BSTs

High probability keys should be near the root

But, the value of a key is also a factor

It may not be desirable to put the smallest or largest key close to the root => this may result in skinny trees (e.g., lists)