COMP211slides_10

download COMP211slides_10

of 51

description

δομές αρχείων

Transcript of COMP211slides_10

  • E.G.M. Petrakis Trees in Main Memory 1

    Balanced BST

    Balanced BSTs guarantee O(logN) performance at all times the height or left and right sub-trees

    are about the same simple BST are O(N) in the worst case

    Categories of BSTs AVL, SPLAY trees: dynamic environment optimal trees: static environment

  • E.G.M. Petrakis Trees in Main Memory 2

    AVL Trees

    AVL (Adelson, Lelskii, Landis): the height of the left and right subtrees differ by at most 1 the same for every subtree

    Number of comparisons for membership operations best case: completely balanced worst case: 1.44 log(N+2) expected case: logN + .25

  • E.G.M. Petrakis Trees in Main Memory 3

    AVL Trees

    : completely balanced sub-tree

    / : the left sub-tree is 1 higher

    \ : the right sub-tree is 1 higher

  • E.G.M. Petrakis Trees in Main Memory 4

    AVL Trees

  • E.G.M. Petrakis Trees in Main Memory 5

    // : the left sub-tree is 2 higher \\ : the right sub-tree is 2 higher

    critical node

    Non AVL Trees

  • E.G.M. Petrakis Trees in Main Memory 6

    Single Right Rotations

    Insertions or deletions may result in non AVL trees => apply rotations to balance the tree

    T3

    T1 T2

    T1

    T3 T2

  • E.G.M. Petrakis Trees in Main Memory 7

    T1

    T3 T2

    B

    T3

    T1 T2

    Single Left Rotations

  • E.G.M. Petrakis Trees in Main Memory 8

    4

    2

    4

    2

    1

    single right rotation

    2

    1 insert 1

    4

    8

    insert 9

    4

    8

    single left rotation

    8

    4

    4

    9

    9

  • E.G.M. Petrakis Trees in Main Memory 9

    10

    5

    4

    11

    2

    ? 6

  • E.G.M. Petrakis Trees in Main Memory 10

    Double Left Rotation

    Composition of two single rotations (one right and one left rotation)

    T4

    A

    C

    T1

    B

    T2 T3

    or

    T4

    C

    T1

    or

    2

    T3

    A

    T3

    or

    C

    1 T4 T2

  • E.G.M. Petrakis Trees in Main Memory 11

    Example of Double Left Rotation

    4

    2 8

    7 9

    6

    \\

    /

    /

    = =

    = =

    4

    2 7

    6 8

    9

    7

    4 8

    2 6 9

    Critical node

    insert 6

  • E.G.M. Petrakis Trees in Main Memory 12

    Double Right Rotation

    Composition of two single rotations (one left and one right rotation)

    B

    T4

    T1

    T2 T3

    or

    C

    B

    T4

    T2

    or

    A

    T1

    T3

    C

    T3

    or

    A

    4 T2 T1

  • E.G.M. Petrakis Trees in Main Memory 13

    Insertion (deletion) in AVL

    1. Follow the search path to verify that the key is not already there

    2. Insert (delete) the key

    3. Retreat along the search path and check the balance factor

    4. Rebalance if necessary (see next)

  • E.G.M. Petrakis Trees in Main Memory 14

    Rebalancing

    For every node reached coming up from its left sub-tree after insertion readjust balance factor = becomes / => no operation \ becomes = => no operation / becomes // => must be rebalanced!!

    The // node becomes a critical node Only the path from the critical node to the

    leaf has to be rebalanced!! Rotation is applied only at the critical node!

  • E.G.M. Petrakis Trees in Main Memory 15

    Rebalancing (cont.)

    The balance factor of the critical node determines what rotation is to take place single or double

    If the child and the grand child (inserted node) of the critical node are on the same direction (both / or \) => single rotation

    Else => double rotation Rebalance similarly if coming up from the

    right sub-tree (opposite signs)

  • E.G.M. Petrakis Trees in Main Memory 16

    Performance

    Performance of membership operations on AVL trees: easy for the worst case!

    An AVL tree will never be more than 45% higher that its perfectly balanced counterpart (Adelson, Velskii, Landis):

    log(N+1)

  • E.G.M. Petrakis Trees in Main Memory 17

    Worst case AVL

    Sparse tree => each sub-tree has minimum number of nodes

  • E.G.M. Petrakis Trees in Main Memory 18

    Fibonacci Trees

    Th: tree of height h

    Th has two sub-trees, one with height h-1 and one with height h-2 else it wouldnt have minimum number of nodes

    T0 is the empty sub-tree (also Fibonacci)

    T1 is the sub-tree with 1 node (also Fibonacci)

  • E.G.M. Petrakis Trees in Main Memory 19

    Fibonacci Trees (cont.)

    Average height Nh number of nodes of Th Nh = Nh-1 + Nh-2 + 1

    N0 = 1

    N1 = 2

    From which h

  • E.G.M. Petrakis Trees in Main Memory 20

    single rotation

    7

    8

    7

    8

    9

    insert 9

    8

    9 7

    More Examples

  • E.G.M. Petrakis Trees in Main Memory 21

    double rotation

    6

    8

    6

    8

    7 insert 7

    7

    8 6

    Examples (cont.)

  • E.G.M. Petrakis Trees in Main Memory 22

    double rotation

    8

    6

    8

    6

    7 insert 7

    7

    8 6

    Examples (cont.)

  • E.G.M. Petrakis Trees in Main Memory 23

    delete 6

    6

    7

    9

    8

    7

    9

    8

    8

    9 7

    single rotation

    Examples (cont.)

  • E.G.M. Petrakis Trees in Main Memory 24

    5

    6

    9

    8

    6

    9

    8

    8

    9 6

    delete 5

    7 7 7

    single rotation

    Examples (cont.)

  • E.G.M. Petrakis Trees in Main Memory 25

    5

    6

    8

    6

    8

    7

    8 6

    delete 5

    7 7

    double rotation

    Examples (cont.)

  • E.G.M. Petrakis Trees in Main Memory 26

    5

    3

    4

    8

    7 10

    9 6 11

    2

    delete 4

    5

    2

    3

    8

    7 10

    9 6 11

    1

    1

    delete 8

    General Deletions

  • E.G.M. Petrakis Trees in Main Memory 27

    delete 8 delete 5

    5

    2

    3

    7

    6 10

    9 11

    1

    delete 6

    5

    2

    3

    10

    7 11

    9

    1

    General Deletions (cont.)

  • E.G.M. Petrakis Trees in Main Memory 28

    delete 5

    3

    2 10

    7 11

    9

    1

    General Deletions (cont.)

  • E.G.M. Petrakis Trees in Main Memory 29

    Self Organizing Search

    Splay Trees: adapt to query patterns move to root (front): whenever a node is

    accessed move it to the root using rotations

    equivalent to move-to-front in arrays

    current node insertions: inserted node deletions: father of deleted node or null

    if this is the root membership: the last accessed node

  • E.G.M. Petrakis Trees in Main Memory 30

    20

    15 30

    13

    10 14

    8

    Search(10)

    first rotation

    20

    15 30

    10

    8 13

    14

    second rotation

    20

    30 10

    15 8

    13

    14

    third rotation

    10

    8 20

    30 15

    13

    14

  • E.G.M. Petrakis Trees in Main Memory 31

    Splay Cases

    If the current node q has no grandfather but it has father p => only one single rotation

    two symmetric cases: L, R

    q a

    b c

    q

    a b c

    p

    p

  • E.G.M. Petrakis Trees in Main Memory 32

    gp q p

    q

    a

    b

    c d

    gp

    p

    a b

    c d

    LL

    gp

    p q

    a

    b c d

    p q

    a b c d

    gp RL

    LL symmetric of RR RL symmetric of RL

    If p has also grandfather qp => 4 cases

  • E.G.M. Petrakis Trees in Main Memory 33

    1

    a q

    8 j

    i 2

    7

    h 6

    g 3

    c 4

    d 5

    e f

    current node

    1

    q

    8 j

    i 2

    7

    h 6

    g

    c

    4

    d

    5

    e f

    3

    RR LL

  • E.G.M. Petrakis Trees in Main Memory 34

    1

    a q

    8 j

    i 2

    7

    6

    g 3

    c

    4

    5

    e

    f

    b

    LR

    1

    q

    8

    j

    i

    2

    7

    6

    h

    3

    c

    4

    5

    e f

    d g

    RL

    a, b, c are sub-trees

  • E.G.M. Petrakis Trees in Main Memory 35

    1

    a

    q

    8 j

    i

    2

    7

    6

    h

    3

    c

    4

    5

    e

    f

    b

    g

    d

  • E.G.M. Petrakis Trees in Main Memory 36

    Splay Performance

    Splay trees adapt to unknown or changing probability distributions

    Splay trees do not guarantee logarithmic cost for each access AVL trees do! asymptotic cost close to the cost of the optimal

    BST for unknown probability distributions It can be shown that the cost of m

    operations on an initially empty splay tree, where n are insertions is O(mlogn) in the worst case

  • E.G.M. Petrakis Trees in Main Memory 37

    Optimal BST

    Static environment: no insertions or deletions

    Keys are accessed with various frequencies

    Have the most frequently accessed keys near the root

    Application: a symbol table in main memory

  • E.G.M. Petrakis Trees in Main Memory 38

    Searching

    Given symbols a1 < a2 < .< an and their probabilities: p1, p2, pn minimize cost

    Successful search

    Transform unsuccessful to successful consider new symbols E1, E2, En

    - 1 2 i i+1 .n n+1 E0 E1 E2 Ei En

    Ei= (i , i+1 ) E0= (- , 1 ) En= (n , )

    n

    i

    ii alevelpt1

    )(cos

  • E.G.M. Petrakis Trees in Main Memory 39

    an

    an-1

    an-2 Ei

    failure node

    an-3

    unsuccessful search for all values in Ei terminates the same failure node (in fact, one node higher)

    Unsuccessful Search

    }1)({cos1

    n

    i

    ii alevelpt

  • E.G.M. Petrakis Trees in Main Memory 40

    Search Cost

    If pi is the probability to search for ai and qi is the probability to search in Ei then

    n

    1i

    n

    1iii 1qp

    1}(Ei){levelqlevel(ai)pcostn

    1i

    n

    1iii

    successful search

    unsuccessful search

  • Optimal Tree

    The binary search tree with the least cost

    Minimizes the average cost for searching given the search probabilities for a static set of given keys

    The nodes are known in advance No insertions or deletions

    Not balanced in the general case

    An AVL is not always optimal

    E.G.M. Petrakis Trees in Main Memory 41

  • E.G.M. Petrakis Trees in Main Memory 42

    (a1, a2, a3) = (do, if, read) p i = q i = 1/7

    read

    read

    read if

    if

    if

    do

    do

    do

    cost = 15/7

    cost = 13/7

    cost = 15/7

    Optimal BST

    if

    Example

  • E.G.M. Petrakis Trees in Main Memory 43

    read

    if

    do

    if

    read do

    cost = 15/7 cost = 15/7

  • E.G.M. Petrakis Trees in Main Memory 44

    In a BST, a subtree has nodes that are consecutive in a sorted sequence of keys (e.g. [5,26])

    10 25

    13 24 26

    12 14

    20

    25

    13 24 26

    14

    5

    Observation 1

  • E.G.M. Petrakis Trees in Main Memory 45

    If Tij is a sub-tree of an optimal BST holding keys from i to j then Tij must be optimal among all

    possible BSTs that store the same keys

    optimality lemma: all sub-trees of an optimal BST are also optimal

    Observation 2

  • E.G.M. Petrakis Trees in Main Memory 46

    Optimal BST Construction

    1) Construct all possible trees:

    NP-hard, there are trees!!

    2) Dynamic programming solves the problem in polynomial time O(n3) at each step, the algorithm finds and

    stores the optimal tree in each range of key values

    increases the size of the range at each step until the whole range is obtained

    n

    2n

    1n1

  • E.G.M. Petrakis Trees in Main Memory 47

    Example (successful search only):

    1) BSTs with 1 node

    range 1 10 20 40

    cost= 0.3 0.2 0.1 0.4

    2) BSTs with 2 nodes k=1-10 k=10-20 k=20-20

    range 1-10 range 20-40 range 10-20

    1

    10

    10

    20

    20

    40 cost=0.3 1+0.2 2=0.7 cost=0.2+0.1 2=0.4

    10

    1

    40

    20

    20

    10 cost=0.2 1+0,3.2=0.8 cost=0.1+0.2 2=0.5

    optimal

    keys 1 10 20 40

    probabilities 0,3 0,2 0,1 0,4

    cost=0.1+0.8=0.9

    cost=0.4+0.2=0.6

    optimal

    optimal

  • E.G.M. Petrakis Trees in Main Memory 48

    k=1-20 range 1-20

    1

    20

    10

    cost=0.1+2 0.3+3 0.2=1.3

    1 20

    10

    cost=0.2+2(0.3+0.1)=1

    1

    10

    20

    k=10-40 range 10-40

    10

    40

    20

    cost=0.2+2 0.4+3 0.1=1.3

    cost=0.3+2 0.2+3 0.1=1

    20

    10 40

    cost=0.1+2(0.2+0.4)=1.3

    40

    10

    20

    cost=0.4+2 0.2+30.1=1.1

    optimal

    3) BSTs with 3 nodes

  • E.G.M. Petrakis Trees in Main Memory 49

    4) BSTs with 4 nodes

    range 1-40 1

    40 10

    20

    cost=0.3+2 0.4+3 0.2+4 0.1=2.1

    10

    1 40

    20

    cost=0.2+2(0.3+0.4)+3 0.1=1.9

    20

    20

    40

    40

    1

    1

    10

    10

    OPTIMAL BST

    cost=0.1+2(0.3+0.4)+3 0.2=2.1

    cost=0.4+2 0.3+3 0.2+4 0.1=2

  • E.G.M. Petrakis Trees in Main Memory 50

    Complexity

    Compute all optimal BSTs for all Cij, i,j=1,2..n

    Let m=j-i: number of keys in range Cij

    n-m-1 Cijs must be computed

    The one with the minimum cost must be found, this takes O(m(n-m-1)) time

    For all Cijs it takes

    There is a better O(n2) algorithm by Knuth

    There is also an O(n) greedy algorithm

    )O(n)m(nm 3nm1

    2

  • E.G.M. Petrakis Trees in Main Memory 51

    Optimal BSTs

    High probability keys should be near the root

    But, the value of a key is also a factor

    It may not be desirable to put the smallest or largest key close to the root => this may result in skinny trees (e.g., lists)