COMP211slides_10
-
E.G.M. Petrakis Trees in Main Memory 1
Balanced BST
Balanced BSTs guarantee O(log N) performance at all times
  the heights of the left and right sub-trees are about the same
  simple BSTs are O(N) in the worst case
Categories of BSTs
  AVL, splay trees: dynamic environment
  optimal trees: static environment
-
AVL Trees
AVL (Adelson-Velskii, Landis): the heights of the left and right sub-trees differ by at most 1
  the same holds for every sub-tree
Number of comparisons for membership operations
  best case: completely balanced tree
  worst case: 1.44 log(N+2)
  expected case: log N + 0.25
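The balance condition can be checked directly. The following is a minimal sketch, assuming a hypothetical `Node` class (not from the slides) and heights counted in levels, so that an empty tree has height 0:

```python
class Node:
    """Hypothetical BST node used only for illustration."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Height in levels: empty tree = 0, single node = 1."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def is_avl(node):
    """True if, in every sub-tree, left and right heights differ by at most 1."""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_avl(node.left) and is_avl(node.right)


balanced = Node(4, Node(2, Node(1)), Node(8))   # heights differ by 1 at the root
chain = Node(4, Node(2, Node(1)))               # left side is 2 higher at the root
print(is_avl(balanced), is_avl(chain))          # True False
```

Recomputing heights at every node keeps the sketch short but costs O(N log N) or worse; a real AVL tree stores a balance factor or height in each node instead.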
-
AVL Trees
= : completely balanced sub-tree
/ : the left sub-tree is 1 higher
\ : the right sub-tree is 1 higher
-
AVL Trees
-
Non AVL Trees
// : the left sub-tree is 2 higher
\\ : the right sub-tree is 2 higher
A node marked // or \\ is a critical node
-
Single Right Rotations
Insertions or deletions may result in non-AVL trees => apply rotations to balance the tree
[Figure: single right rotation, before and after, on sub-trees T1, T2, T3]
-
Single Left Rotations
[Figure: single left rotation, before and after, on node B and sub-trees T1, T2, T3]
-
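insert 1: the tree with root 4 and left child 2 becomes the left chain 4, 2, 1; a single right rotation makes 2 the root with children 1 and 4
insert 9: the tree with root 4 and right child 8 becomes the right chain 4, 8, 9; a single left rotation makes 8 the root with children 4 and 9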
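The two single rotations can be sketched as pointer updates. This is an illustrative sketch, not the slides' code; the `Node` class and the function names `rotate_right` and `rotate_left` are assumptions. It replays the insert 1 and insert 9 examples:

```python
class Node:
    """Hypothetical BST node used only for illustration."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_right(a):
    """Single right rotation: the left child of `a` becomes the sub-tree root."""
    b = a.left
    a.left = b.right      # the middle sub-tree moves under a
    b.right = a
    return b


def rotate_left(a):
    """Single left rotation: the right child of `a` becomes the sub-tree root."""
    b = a.right
    a.right = b.left      # the middle sub-tree moves under a
    b.left = a
    return b


# Left chain 4, 2, 1 (after inserting 1) and right chain 4, 8, 9 (after inserting 9).
r1 = rotate_right(Node(4, Node(2, Node(1))))
r2 = rotate_left(Node(4, None, Node(8, None, Node(9))))
print(r1.key, r1.left.key, r1.right.key)   # 2 1 4
print(r2.key, r2.left.key, r2.right.key)   # 8 4 9
```

Each rotation changes a constant number of pointers and preserves the in-order sequence of keys.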
-
[Figure: tree with keys 10, 5, 11, 4, 6 and 2, with the required rotation left as a question]
-
Double Left Rotation
Composition of two single rotations (one right and one left rotation)
[Figure: double left rotation on nodes A, B, C with sub-trees T1 to T4, drawn as two successive single rotations (steps 1 and 2)]
-
Example of Double Left Rotation
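insert 6 into the tree with root 4 and children 2 and 8, where 8 has children 7 and 9:
  6 becomes the left child of 7; balance factors: 4 is \\ (critical node), 8 is /, 7 is /, the leaves are =
  first rotation (right, at 8): root 4 with children 2 and 7; 7 has children 6 and 8; 9 is the right child of 8
  second rotation (left, at 4): root 7 with children 4 and 8; 4 has children 2 and 6; 9 is the right child of 8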
-
Double Right Rotation
Composition of two single rotations (one left and one right rotation)
[Figure: double right rotation on nodes A, B, C with sub-trees T1 to T4, drawn as two successive single rotations]
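Since a double rotation is a composition of two single rotations, it can be sketched in two lines each. The names `double_left` and `double_right` and the `Node` class are assumptions made for this illustration; the example replays the slides' "insert 6" case with critical node 4:

```python
class Node:
    """Hypothetical BST node used only for illustration."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    return b


def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b


def double_left(a):
    """Right rotation at the right child of `a`, then left rotation at `a`."""
    a.right = rotate_right(a.right)
    return rotate_left(a)


def double_right(a):
    """Left rotation at the left child of `a`, then right rotation at `a`."""
    a.left = rotate_left(a.left)
    return rotate_right(a)


# Tree after inserting 6: root 4, children 2 and 8; 8 has children 7 and 9; 6 is under 7.
tree = Node(4, Node(2), Node(8, Node(7, Node(6)), Node(9)))
root = double_left(tree)
print(root.key, root.left.key, root.right.key)                    # 7 4 8
print(root.left.left.key, root.left.right.key, root.right.right.key)  # 2 6 9
```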
-
Insertion (deletion) in AVL
1. Follow the search path to verify that the key is not already there
2. Insert (delete) the key
3. Retreat along the search path and check the balance factor
4. Rebalance if necessary (see next)
-
Rebalancing
For every node reached coming up from its left sub-tree after an insertion, readjust the balance factor
  = becomes / => no operation
  \ becomes = => no operation
  / becomes // => must be rebalanced!!
The // node becomes a critical node
Only the path from the critical node to the leaf has to be rebalanced!!
Rotation is applied only at the critical node!
-
Rebalancing (cont.)
The balance factor of the critical node determines which rotation takes place: single or double
If the child and the grandchild (inserted node) of the critical node lean in the same direction (both / or both \) => single rotation
Else => double rotation
Rebalance similarly if coming up from the right sub-tree (opposite signs)
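The insertion and rebalancing steps can be sketched recursively. This is one possible arrangement, not the slides' implementation: it assumes a hypothetical `Node` that stores its height (instead of the /, =, \ balance symbols) and derives the balance factor from the stored heights on the way back up the search path.

```python
class Node:
    """Hypothetical AVL node that stores its height (single node = 1)."""
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1


def h(n):
    return n.height if n else 0


def update(n):
    n.height = 1 + max(h(n.left), h(n.right))


def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    update(a)
    update(b)
    return b


def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    update(a)
    update(b)
    return b


def insert(node, key):
    """Insert `key`, then rebalance while retreating along the search path."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node                          # key already present
    update(node)
    balance = h(node.left) - h(node.right)
    if balance > 1:                          # node is "//": critical node
        if key > node.left.key:              # child and grandchild lean differently
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                         # node is "\\": critical node
        if key < node.right.key:
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


root = None
for k in [4, 2, 8, 7, 9, 6]:                 # the last insertion needs a double rotation
    root = insert(root, k)
print(root.key, root.left.key, root.right.key)   # 7 4 8
```

Storing heights is a design choice that keeps the code compact; the two-bit balance factor described in the slides carries the same information with less memory.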
-
Performance
Performance of membership operations on AVL trees: easy for the worst case!
An AVL tree will never be more than 45% higher than its perfectly balanced counterpart (Adelson-Velskii, Landis):
$\log_2(N+1) \le h < 1.4404\,\log_2(N+2) - 0.328$
-
Worst case AVL
Sparse tree => each sub-tree has minimum number of nodes
-
Fibonacci Trees
T_h: Fibonacci tree of height h
T_h has two sub-trees, one of height h-1 and one of height h-2; otherwise it would not have the minimum number of nodes
T_0 is the empty sub-tree (also Fibonacci)
T_1 is the sub-tree with 1 node (also Fibonacci)
-
Fibonacci Trees (cont.)
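N_h: number of nodes of T_h
$N_h = N_{h-1} + N_{h-2} + 1$, with $N_0 = 0$ (empty tree) and $N_1 = 1$ (single node)
From which $h \approx 1.44\,\log_2(N_h + 2)$, the worst-case height of an AVL tree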
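The recurrence can be evaluated numerically to see how sparse the worst-case trees are. This is a small sketch under the convention of the Fibonacci-tree definition, where the empty tree T_0 has 0 nodes and T_1 has 1 node; the function name `min_avl_nodes` is an assumption.

```python
import math


def min_avl_nodes(h):
    """Minimum number of nodes of an AVL tree of height h (a Fibonacci tree)."""
    a, b = 0, 1                 # N_0 (empty tree) and N_1 (single node)
    for _ in range(h):
        a, b = b, a + b + 1     # N_h = N_(h-1) + N_(h-2) + 1
    return a


for h in range(0, 8):
    n = min_avl_nodes(h)
    # Compare the height with the 1.44 log2(N + 2) worst-case estimate.
    print(h, n, round(1.44 * math.log2(n + 2), 2))
```

The node counts come out as 0, 1, 2, 4, 7, 12, 20, 33, which is one less than a Fibonacci number at every height, hence the name.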
-
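More Examples
insert 9 (single rotation): the tree with root 7 and right child 8 becomes the right chain 7, 8, 9; a single rotation makes 8 the root with children 7 and 9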
-
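Examples (cont.)
insert 7 (double rotation): in the tree with root 6 and right child 8, 7 becomes the left child of 8; a double rotation makes 7 the root with children 6 and 8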
-
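Examples (cont.)
insert 7 (double rotation): in the tree with root 8 and left child 6, 7 becomes the right child of 6; a double rotation makes 7 the root with children 6 and 8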
-
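Examples (cont.)
delete 6 (single rotation): in the tree with keys 6, 7, 8, 9 rooted at 7, deleting 6 leaves the right chain 7, 8, 9; a single rotation makes 8 the root with children 7 and 9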
-
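Examples (cont.)
delete 5 (single rotation): in the tree with root 6, left child 5 and right child 8 (children 7 and 9), deleting 5 makes 6 the critical node; a single rotation makes 8 the root with children 6 and 9, and 7 becomes the right child of 6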
-
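Examples (cont.)
delete 5 (double rotation): in the tree with root 6, left child 5 and right child 8 (left child 7), deleting 5 makes 6 the critical node; a double rotation makes 7 the root with children 6 and 8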
-
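General Deletions
Initial tree: root 5; its left child 3 has children 2 and 4, and 2 has left child 1; its right child 8 has children 7 and 10, where 7 has left child 6 and 10 has children 9 and 11
delete 4: node 3 is left with the chain 3, 2, 1 on its left, so a single rotation makes 2 the parent of 1 and 3
delete 8: applied next, to the tree with root 5 and children 2 and 8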
-
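General Deletions (cont.)
delete 8: 8 has two children and is replaced by 7, the largest key of its left sub-tree; the right sub-tree of the root becomes 7 with children 6 and 10, where 10 has children 9 and 11
delete 6: node 7 becomes the critical node (its right sub-tree is 2 higher); a single rotation makes 10 the root of that sub-tree with children 7 and 11, and 9 becomes the right child of 7
delete 5: applied next, to the tree with root 5 and children 2 and 10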
-
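General Deletions (cont.)
delete 5: the root 5 is replaced by 3, the largest key of its left sub-tree; the final tree has root 3 with children 2 and 10, where 2 has left child 1, 10 has children 7 and 11, and 9 is the right child of 7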
-
Self Organizing Search
Splay Trees: adapt to query patterns
  move to root (front): whenever a node is accessed, move it to the root using rotations
  equivalent to move-to-front in arrays
Current node (the node moved to the root)
  insertions: the inserted node
  deletions: the father of the deleted node, or null if the deleted node is the root
  membership: the last accessed node
-
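Search(10) in the tree with root 20 and children 15 and 30, where the left chain continues 15, 13, 10, 8 and 14 is the right child of 13:
  first rotation (at 13): 10 becomes the left child of 15, with children 8 and 13; 14 stays the right child of 13
  second rotation (at 15): 10 becomes the left child of 20, with children 8 and 15; 13 becomes the left child of 15
  third rotation (at 20): 10 becomes the root, with children 8 and 20; 20 has children 15 and 30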
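The Search(10) example moves the accessed node up one level at a time. A minimal sketch of that move-to-root strategy follows; the `Node` class and the name `move_to_root` are assumptions, and in this sketch a missing key leaves the tree unchanged.

```python
class Node:
    """Hypothetical BST node used only for illustration."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    return b


def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b


def move_to_root(root, key):
    """Find `key` and lift it to the root with one single rotation per level."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        root.left = move_to_root(root.left, key)
        if root.left is not None and root.left.key == key:
            return rotate_right(root)
    else:
        root.right = move_to_root(root.right, key)
        if root.right is not None and root.right.key == key:
            return rotate_left(root)
    return root


tree = Node(20, Node(15, Node(13, Node(10, Node(8)), Node(14))), Node(30))
root = move_to_root(tree, 10)
print(root.key, root.left.key, root.right.key)            # 10 8 20
print(root.right.left.key, root.right.right.key)          # 15 30
```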
-
Splay Cases
If the current node q has no grandfather but has a father p => only one single rotation
two symmetric cases: L, R
[Figure: the single-rotation case, before and after, with nodes p and q and sub-trees a, b, c]
-
If q also has a grandfather gp => 4 cases: LL, RR, LR, RL
LL is symmetric to RR; RL is symmetric to LR
[Figure: the LL and RL cases, before and after, with nodes gp, p, q and sub-trees a, b, c, d]
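The splay cases can be sketched as a bottom-up procedure over the search path. This is an illustrative sketch under stated assumptions, not the slides' code: `Node`, `rotate_up` and `splay` are hypothetical names, LL/RR are taken to mean that q and p are children on the same side, and LR/RL that they are on opposite sides.

```python
class Node:
    """Hypothetical BST node used only for illustration."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_up(parent, child):
    """Single rotation that lifts `child` above `parent`; returns `child`."""
    if parent.left is child:
        parent.left, child.right = child.right, parent
    else:
        parent.right, child.left = child.left, parent
    return child


def splay(root, key):
    """Bring `key` (or the last node on its search path) to the root."""
    path, node = [], root
    while node is not None:                  # record the search path
        path.append(node)
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    if not path:
        return None
    q = path.pop()                           # current node
    while path:
        p = path.pop()
        if not path:                         # no grandfather: case L or R
            rotate_up(p, q)
            break
        gp = path.pop()
        ggp = path[-1] if path else None
        gp_was_left = ggp is not None and ggp.left is gp
        if (gp.left is p) == (p.left is q):  # LL or RR: rotate at gp, then at p
            rotate_up(gp, p)
            rotate_up(p, q)
        else:                                # LR or RL: rotate at p, then at gp
            if gp.left is p:
                gp.left = rotate_up(p, q)
            else:
                gp.right = rotate_up(p, q)
            rotate_up(gp, q)
        if ggp is not None:                  # re-attach the rotated sub-tree
            if gp_was_left:
                ggp.left = q
            else:
                ggp.right = q
    return q


tree = Node(20, Node(15, Node(13, Node(10, Node(8)), Node(14))), Node(30))
root = splay(tree, 10)
print(root.key, root.left.key, root.right.key)   # 10 8 20
```

On this tree the double LL step followed by a single step gives a different shape than lifting 10 one level at a time: 13 ends up as the left child of 20, with 15 below it.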
-
[Figure: splay example on a tree with keys 1 to 8 and sub-trees a to j, with the current node q marked; steps labelled RR and LL]
-
[Figure: splay example (cont.); steps labelled LR and RL]
a, b, c, ... are sub-trees
-
[Figure: splay example (cont.); the resulting tree]
-
Splay Performance
Splay trees adapt to unknown or changing probability distributions
Splay trees do not guarantee logarithmic cost for each access
  AVL trees do!
  asymptotic cost close to the cost of the optimal BST for unknown probability distributions
It can be shown that the cost of m operations on an initially empty splay tree, of which n are insertions, is O(m log n) in the worst case
-
Optimal BST
Static environment: no insertions or deletions
Keys are accessed with various frequencies
Have the most frequently accessed keys near the root
Application: a symbol table in main memory
-
Searching
Given symbols $a_1 < a_2 < \dots < a_n$ and their probabilities $p_1, p_2, \dots, p_n$, minimize the cost
Successful search:
$\mathrm{cost} = \sum_{i=1}^{n} p_i \,\mathrm{level}(a_i)$
Transform unsuccessful to successful search: consider new symbols $E_0, E_1, \dots, E_n$ for the gaps between keys
$E_i = (a_i, a_{i+1})$, $E_0 = (-\infty, a_1)$, $E_n = (a_n, \infty)$
-
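Unsuccessful Search
An unsuccessful search for any value in $E_i$ terminates at the same failure node (in fact, one node higher)
$\mathrm{cost} = \sum_{i=0}^{n} q_i \,\{\mathrm{level}(E_i) - 1\}$, where $q_i$ is the probability that the search falls in $E_i$
[Figure: chain of keys $a_n, a_{n-1}, a_{n-2}, a_{n-3}$ ending in a failure node $E_i$]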
-
Search Cost
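If $p_i$ is the probability to search for $a_i$ and $q_i$ is the probability to search in $E_i$, then
$\sum_{i=1}^{n} p_i + \sum_{i=0}^{n} q_i = 1$
$\mathrm{cost} = \sum_{i=1}^{n} p_i \,\mathrm{level}(a_i) + \sum_{i=0}^{n} q_i \,\{\mathrm{level}(E_i) - 1\}$
The first sum is the cost of successful search; the second sum is the cost of unsuccessful search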
-
Optimal Tree
The binary search tree with the least cost
Minimizes the average search cost, given the search probabilities for a static set of keys
The nodes are known in advance
No insertions or deletions
Not balanced in the general case
An AVL is not always optimal
-
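Example
$(a_1, a_2, a_3)$ = (do, if, read), $p_i = q_i = 1/7$
[Figure: three of the five possible BSTs on these keys. The balanced tree, with if at the root and children do and read, has cost = 13/7 and is the optimal BST; the other two trees shown have cost = 15/7 each]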
-
[Figure: the remaining two BSTs on (do, if, read); cost = 15/7 each]
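The costs in this example can be recomputed from the search-cost formula, counting level(a_i) for each key and level(E_i) - 1 for each failure node. The sketch below assumes a hypothetical tuple encoding `(left, i, right)` for a node holding key a_i, with `None` standing for a failure node; the function name `search_cost` is also an assumption.

```python
from fractions import Fraction


def search_cost(tree, p, q):
    """Sum of p_i * level(a_i) over keys plus q_i * (level(E_i) - 1) over failure nodes."""
    gaps = iter(q)                        # failure nodes are met left to right: E_0 .. E_n

    def walk(node, level):
        if node is None:                  # failure node at this level
            return next(gaps) * (level - 1)
        left, i, right = node
        cost = walk(left, level + 1)      # left first, to keep the E_0 .. E_n order
        cost += p[i - 1] * level
        cost += walk(right, level + 1)
        return cost

    return walk(tree, 1)


DO, IF, READ = 1, 2, 3
leaf = lambda i: (None, i, None)
trees = {
    "balanced (if at root)": (leaf(DO), IF, leaf(READ)),
    "left chain read, if, do": ((leaf(DO), IF, None), READ, None),
    "right chain do, if, read": (None, DO, (None, IF, leaf(READ))),
    "zig-zag read, do, if": ((None, DO, leaf(IF)), READ, None),
    "zig-zag do, read, if": (None, DO, (leaf(IF), READ, None)),
}
p = [Fraction(1, 7)] * 3
q = [Fraction(1, 7)] * 4
for name, t in trees.items():
    print(name, search_cost(t, p, q))     # 13/7 for the balanced tree, 15/7 otherwise
```

Using `Fraction` keeps the results exact, so they can be compared directly with the sevenths on the slide.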
-
Observation 1
In a BST, a sub-tree holds keys that are consecutive in the sorted sequence of keys (e.g. [5,26])
[Figure: example BST with keys between 5 and 26 and one of its sub-trees]
-
Observation 2
If $T_{ij}$ is a sub-tree of an optimal BST holding keys from i to j, then $T_{ij}$ must be optimal among all possible BSTs that store the same keys
optimality lemma: all sub-trees of an optimal BST are also optimal
-
Optimal BST Construction
1) Construct all possible trees:
  exponential time: there are $\frac{1}{n+1}\binom{2n}{n}$ trees!!
2) Dynamic programming solves the problem in polynomial time, $O(n^3)$
  at each step, the algorithm finds and stores the optimal tree for each range of key values
  it increases the size of the range at each step until the whole range is obtained
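The number of distinct BSTs on n keys is the n-th Catalan number, $\frac{1}{n+1}\binom{2n}{n}$, which shows why exhaustive construction is impractical. A short sketch, with `count_bsts` as an assumed helper name:

```python
import math


def count_bsts(n):
    """Number of distinct binary search trees on n keys (n-th Catalan number)."""
    return math.comb(2 * n, n) // (n + 1)


for n in (3, 4, 10, 20):
    print(n, count_bsts(n))
```

For n = 3 this gives the 5 trees of the (do, if, read) example; the count grows roughly like 4^n.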
-
Example (successful search only):
keys           1    10   20   40
probabilities  0.3  0.2  0.1  0.4
1) BSTs with 1 node
range    1    10   20   40
cost =   0.3  0.2  0.1  0.4
2) BSTs with 2 nodes
range 1-10:
  root 1, right child 10: cost = 0.3*1 + 0.2*2 = 0.7 (optimal)
  root 10, left child 1: cost = 0.2*1 + 0.3*2 = 0.8
range 10-20:
  root 10, right child 20: cost = 0.2 + 0.1*2 = 0.4 (optimal)
  root 20, left child 10: cost = 0.1 + 0.2*2 = 0.5
range 20-40:
  root 20, right child 40: cost = 0.1 + 0.8 = 0.9
  root 40, left child 20: cost = 0.4 + 0.2 = 0.6 (optimal)
-
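3) BSTs with 3 nodes
range 1-20:
  root 20, left child 1, 10 below 1: cost = 0.1 + 2*0.3 + 3*0.2 = 1.3
  root 10, children 1 and 20: cost = 0.2 + 2(0.3+0.1) = 1
  root 1, right child 10, 20 below 10: cost = 0.3 + 2*0.2 + 3*0.1 = 1
range 10-40:
  root 10, right child 40, 20 below 40: cost = 0.2 + 2*0.4 + 3*0.1 = 1.3
  root 20, children 10 and 40: cost = 0.1 + 2(0.2+0.4) = 1.3
  root 40, left child 10, 20 below 10: cost = 0.4 + 2*0.2 + 3*0.1 = 1.1 (optimal)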
-
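4) BSTs with 4 nodes
range 1-40:
  root 1, right child 40, 10 below 40, 20 below 10: cost = 0.3 + 2*0.4 + 3*0.2 + 4*0.1 = 2.1
  root 10, children 1 and 40, 20 below 40: cost = 0.2 + 2(0.3+0.4) + 3*0.1 = 1.9 (OPTIMAL BST)
  root 20, children 1 and 40, 10 below 1: cost = 0.1 + 2(0.3+0.4) + 3*0.2 = 2.1
  root 40, left child 1, 10 below 1, 20 below 10: cost = 0.4 + 2*0.3 + 3*0.2 + 4*0.1 = 2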
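The dynamic-programming construction can be sketched for the successful-search-only case and checked against this worked example. The table layout (half-open index ranges) and the name `optimal_bst` are assumptions of this sketch; it relies on the fact that hanging a sub-tree one level deeper adds the total probability of its keys to the cost.

```python
from fractions import Fraction


def optimal_bst(p):
    """Return (cost, root) tables; cost[i][j] is the minimum cost for keys i .. j-1."""
    n = len(p)
    cost = [[0] * (n + 1) for _ in range(n + 1)]     # empty ranges cost 0
    root = [[None] * (n + 1) for _ in range(n + 1)]
    prefix = [0]
    for x in p:
        prefix.append(prefix[-1] + x)
    for size in range(1, n + 1):                     # grow the range at each step
        for i in range(0, n - size + 1):
            j = i + size
            weight = prefix[j] - prefix[i]           # total probability of keys i .. j-1
            best = min(range(i, j), key=lambda r: cost[i][r] + cost[r + 1][j])
            cost[i][j] = cost[i][best] + cost[best + 1][j] + weight
            root[i][j] = best
    return cost, root


keys = [1, 10, 20, 40]
p = [Fraction(3, 10), Fraction(2, 10), Fraction(1, 10), Fraction(4, 10)]
cost, root = optimal_bst(p)
print(cost[0][4], keys[root[0][4]])   # 19/10 10
```

The three nested loops (range size, range start, candidate root) give the O(n^3) running time; the result matches the optimal cost of 1.9 with 10 at the root.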
-
Complexity
Compute the optimal BSTs for all ranges $C_{ij}$, i, j = 1, 2, ..., n
Let m = j - i: number of keys in range $C_{ij}$
n-m-1 $C_{ij}$s must be computed
For each of them, the tree with the minimum cost must be found; this takes O(m(n-m-1)) time
For all $C_{ij}$s it takes
$\sum_{1 \le m \le n} (nm - m^2) = O(n^3)$
There is a better $O(n^2)$ algorithm by Knuth
There is also an O(n) greedy algorithm
-
Optimal BSTs
High probability keys should be near the root
But, the value of a key is also a factor
It may not be desirable to put the smallest or largest key close to the root => this may result in skinny trees (e.g., lists)