Cs 157 Al 22 Indexing


  • 8/11/2019 Cs 157 Al 22 Indexing

    1/56

    The B-Tree Index

    Prof. Sin-Min Lee

    Department of Computer Science

    San Jose State University

  • 8/11/2019 Cs 157 Al 22 Indexing

    2/56

    Outline

    Introduction

    The B-tree shape

    Dynamic changes in the B-tree

    Properties of the B-tree

    Create index statement syntax

    More about the B-tree

    Summary

  • 8/11/2019 Cs 157 Al 22 Indexing

    3/56

    Introduction

    A B-tree is a keyed index structure, comparable to a number of memory-resident keyed lookup structures such as the balanced binary tree, the AVL tree, and the 2-3 tree.

    The difference is that a B-tree is meant to reside on disk, being made partially memory-resident only when entries in the structure are accessed.

    The B-tree structure is the most commonly used index type in databases today.

    It is provided by ORACLE, DB2, and INGRES.


  • 8/11/2019 Cs 157 Al 22 Indexing

    5/56

    The B-Tree Shape

    A B-tree is built upside down, with the root at the top and the leaves at the bottom.

    All nodes above the leaf level, including the root, are called directory nodes or index nodes.

    Directory nodes below the root are called internal nodes.

    The root node is known as level 1 of the B-tree, and successively lower levels are given successively larger level numbers, with the leaf nodes at the lowest level.

    The total number of levels is called the depth of the B-tree.

  • 8/11/2019 Cs 157 Al 22 Indexing

    6/56

    Balanced and Unbalanced Trees

    Trees can be balanced or unbalanced.

    In a balanced tree, every path from the root to a leaf node is the same length.

    A balanced tree holding n entries has at most about log_order(n) levels, that is, the logarithm of n taken to the base of the tree's order. This is desirable for an index.

  • 8/11/2019 Cs 157 Al 22 Indexing

    7/56

    The Problem of Unbalanced Trees

    [Figure: (a) a troublesome search tree; (b) a more troublesome search tree]

  • 8/11/2019 Cs 157 Al 22 Indexing

    8/56

    Disadvantage of unbalanced tree

    Searching an unbalanced tree may

    require traversing an arbitrary and

    unpredictable number of nodes and

    pointers.

  • 8/11/2019 Cs 157 Al 22 Indexing

    9/56

    Unbalanced Tree (cont.)

    Problems:

    The levels of the tree are only sparsely filled, resulting in long, deep paths and defeating the purpose of binary trees in the first place.

  • 8/11/2019 Cs 157 Al 22 Indexing

    10/56

    The general name for B-trees is multiway trees. However, the best known of them have their very own names: 2-3 trees, B* trees, and B+ trees. Here we will look only at 2-3 trees. They are the simplest, and for purposes of learning the underlying principles this is good, because multiway trees can get complicated quickly. Keep in mind that 2-3 trees as we study them are rarely built. Their larger cousins, B* and B+ trees, are often used for medium and large database applications.

    The general idea of a multiway tree of order n is that each node can hold up to n - 1 key values and each node can have up to n children.

    So, a 2-3 tree is actually a multiway tree of order 3. That means that a 2-3 tree has nodes that can hold 1 or 2 values, and that an internal node has 2 or 3 children (a leaf has none). Thus the name 2-3 tree.

  • 8/11/2019 Cs 157 Al 22 Indexing

    11/56

    The B-Tree Shape (c.)

    [Figure: level 1: root node; level 2: directory nodes; level 3: leaf nodes]


  • 8/11/2019 Cs 157 Al 22 Indexing

    13/56

    B-Trees

    For a binary search tree, the search time is O(h), where h is

    the height of the tree.

    The height cannot be less than roughly log2 n for a tree with n nodes.

    If the tree is too large for internal memory, each access of a node is an I/O.

    For a tree with 10,000 nodes:

    log2 10,000 ≈ 13.3, so about 14 disk accesses in the worst case, even when the tree is perfectly balanced

    We need a structure that would require only 3 or 4 disk accesses.
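
    The arithmetic above can be checked with a short sketch. The fanout of 50 children per node is an assumed, illustrative value (it is not given on this slide), and both function names are hypothetical.

```python
import math

def binary_tree_min_height(n):
    """Fewest levels a binary tree needs to hold n nodes."""
    return math.ceil(math.log2(n + 1))

def multiway_levels(n, fanout):
    """Levels needed before a tree with `fanout` children per node reaches n entries."""
    levels, capacity = 0, 1
    while capacity < n:
        capacity *= fanout   # each extra level multiplies the reach by the fanout
        levels += 1
    return levels

print(binary_tree_min_height(10_000))      # 14
print(multiway_levels(10_000, fanout=50))  # 3 (fanout 50 is an assumption)
```

    With one node per disk page, the wider node is what brings the access count down from about 14 to the 3 or 4 asked for above.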

  • 8/11/2019 Cs 157 Al 22 Indexing

    14/56

    B-Trees - Definition

    A B-tree of order M is an M-ary tree

    with the following properties:

    (1) The data items are stored at leaves.

    (2) The nonleaf nodes store up to M - 1

    keys to guide the searching; key i represents

    the smallest key in subtree i + 1.

  • 8/11/2019 Cs 157 Al 22 Indexing

    15/56

    (3) The root is either a leaf or has between 2

    and M children.

    (4) All nonleaf nodes (except the root) have between ceiling(M/2) and M children.

    (5) All leaves are at the same depth and have between ceiling(L/2) and L data items, for some L.


  • 8/11/2019 Cs 157 Al 22 Indexing

    17/56

    B-tree of order n

    Every B-tree is of some "order n", meaning nodes contain from n to 2n keys (so nodes are always at least half full of keys) and n+1 to 2n+1 pointers; n can be any positive integer.

    Keys are kept in sorted order within each node. A corresponding list of pointers is effectively interspersed between the keys to indicate where to search for a key if it isn't in the current node.

  • 8/11/2019 Cs 157 Al 22 Indexing

    18/56

    A B-tree of order n is a multi-way search

    tree with two properties:

    1. All leaves are at the same level.

    2. The number of keys in any node lies between n and 2n, with the possible exception of the root, which may have fewer keys.

  • 8/11/2019 Cs 157 Al 22 Indexing

    19/56

    Other definition

    A B-tree of order m is an m-way tree that satisfies the following conditions.

    Every node has at most m children.

    Every internal node (except the root) has at least ceiling(m/2) children.

    An internal node with k children contains (k-1) ordered keys. The leftmost child contains keys less than or equal to the first key in the node. The second child contains keys greater than the first key but less than or equal to the second key, and so on.

  • 8/11/2019 Cs 157 Al 22 Indexing

    20/56

    A B-tree of order 2

  • 8/11/2019 Cs 157 Al 22 Indexing

    21/56

    Dynamic changes in the B-Tree

    A B-tree is an efficient self-modifying structure: it reorganizes itself as new entries are inserted pointing to new rows inserted in the indexed table.

    The nodes at every level are generally assumed not to be full.

    Space is left so that inserts are often possible to a node at any level without new disk space being required.

    An insert of a new entry always occurs at the leaf level, but occasionally the leaf node is too full to simply accept the new entry. In this case, to make additional space, the leaf-level node is split into two leaf pages.

  • 8/11/2019 Cs 157 Al 22 Indexing

    22/56

    Properties of the B-Tree

    Assumptions:

    Entry key values can have variable length because of variable-length column values appearing in the index key.

    When a node split occurs, equal lengths of entry information are placed in the left and right split nodes.

    Rebalancing actions in the B-tree occur when entries are deleted.

  • 8/11/2019 Cs 157 Al 22 Indexing

    23/56

    Properties of the B-Tree (c.)

    Properties:

    Every node is disk-page sized and resides in a well-defined location.

    Nodes above the leaf level contain directory entries, with n-1 separator keys and n disk pointers to lower-level B-tree nodes.

    Nodes at the leaf level contain entries with (keyval, rowid) pairs pointing to individual rows indexed.

    All nodes below the root are at least half full with entry information.

    The root node contains at least two entries.

  • 8/11/2019 Cs 157 Al 22 Indexing

    24/56

    Insertion in B-Tree

    1. Insert a, g, f, b:

       leaf:   [a b f g]

    2. Insert k (the full node splits and f moves up):

       root:   [f]
       leaves: [a b] [g k]

  • 8/11/2019 Cs 157 Al 22 Indexing

    25/56

    Insertion (cont.)

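    3. Insert d, h, m:

       root:   [f]
       leaves: [a b d] [g h k m]

    4. Insert j:

       root:   [f j]
       leaves: [a b d] [g h] [k m]

    5. Insert e, s, i, r:

       root:   [f j]
       leaves: [a b d e] [g h i] [k m r s]

    6. Insert x:

       root:   [f j r]
       leaves: [a b d e] [g h i] [k m] [s x]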

  • 8/11/2019 Cs 157 Al 22 Indexing

    26/56

    Insertion (cont.)

    7. Insert c, l, n, t, u:

       root:   [c f j r]
       leaves: [a b] [d e] [g h i] [k l m n] [s t u x]

    8. Insert p:

       root:    [j]
       level 2: [c f] [m r]
       leaves:  [a b] [d e] [g h i] [k l] [n p] [s t u x]

  • 8/11/2019 Cs 157 Al 22 Indexing

    27/56

    Inserting into a B-Tree

    To insert key value x into a B-Tree:

    Use the B-Tree search to determine on which leaf node to make the insertion.

    Insert x into the appropriate position on that leaf node.

    If the resulting number of keys on that node is at most L, then simply output that node to disk and return.

    Otherwise, split the node.

  • 8/11/2019 Cs 157 Al 22 Indexing

    28/56

    Inserting into a B-Tree: Splitting a Node

    Allocate a new leaf node. Put about half (i.e., about L/2) of the keys on the new node and leave about half of the keys on the existing node.

    Make appropriate changes to keys and pointers in the parent node.

    If the parent node was already full, then split the parent node.

    The splitting of parents may continue all the way back up to the root node.
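
    The insert-and-split procedure can be sketched in memory as follows. This is a minimal illustration, not a disk-based implementation: it assumes the variant in which keys live in every node and the median key moves up on a split, with at most 4 keys per node (order 5). The names Node, insert, and max_keys are hypothetical.

```python
import bisect

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = list(keys) if keys else []
        self.children = list(children) if children else []  # empty for a leaf

def _insert(node, key, max_keys):
    """Insert key below node; return (median, new_right_node) if node split."""
    i = bisect.bisect_left(node.keys, key)
    if not node.children:                      # leaf: place the key in order
        node.keys.insert(i, key)
    else:                                      # directory node: descend
        split = _insert(node.children[i], key, max_keys)
        if split:                              # child split: absorb its median
            median, right = split
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) <= max_keys:
        return None
    mid = len(node.keys) // 2                  # overflow: split around the median
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right

def insert(root, key, max_keys=4):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, max_keys)
    if split:                                  # the root itself split: grow upward
        median, right = split
        return Node([median], [root, right])
    return root

root = Node()
for key in "agfbkdhmjesirxclntup":
    root = insert(root, key)
print(root.keys)                               # ['j']
print([child.keys for child in root.children]) # [['c', 'f'], ['m', 'r']]
```

    Splits propagate through the return value of the recursive call, so a split can travel from a leaf all the way back up to the root, which is the only place where the tree gains a level.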

  • 8/11/2019 Cs 157 Al 22 Indexing

    29/56

    Insert 19, 12, 22, 15.

  • 8/11/2019 Cs 157 Al 22 Indexing

    30/56

    Insertion

    Insert the keys in the following order into a B-tree of order 5.

    A, G, F, B, K, D, H, M, J, E, S, I, R, X, C, L, N, T, U, P.


  • 8/11/2019 Cs 157 Al 22 Indexing

    32/56

    Searching

    Searching for an Item in a B-Tree:

    1. Make a local variable, i, equal to the first index such that data[i] >= target. If there is no such index, then set i equal to data_count, indicating that none of the entries is greater than or equal to the target.

    2. if (we found the target at data[i])
           return true;
       else if (the root has no children)
           return false;
       else
           return subset[i]->contains(target);
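
    The two steps translate almost directly into Python. The sketch below keeps the slide's names data and subset; the class name BTreeNode and the small example tree are assumptions made for illustration.

```python
import bisect

class BTreeNode:
    def __init__(self, data, subset=()):
        self.data = list(data)      # sorted keys held in this node
        self.subset = list(subset)  # child nodes; empty for a leaf

    def contains(self, target):
        # Step 1: first index i with data[i] >= target, or len(data) if none.
        i = bisect.bisect_left(self.data, target)
        # Step 2: found here, dead end at a leaf, or recurse into subtree i.
        if i < len(self.data) and self.data[i] == target:
            return True
        if not self.subset:
            return False
        return self.subset[i].contains(target)

# Hypothetical three-level tree used only to exercise the search.
leaf = BTreeNode
root = BTreeNode([6, 17], [
    BTreeNode([4], [leaf([2, 3]), leaf([5])]),
    BTreeNode([12], [leaf([10]), leaf([16])]),
    BTreeNode([19, 22], [leaf([18]), leaf([20]), leaf([25])]),
])
print(root.contains(10))  # True
print(root.contains(11))  # False
```

    Using bisect_left for step 1 gives a binary search within the node; a linear scan would implement the same step.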

  • 8/11/2019 Cs 157 Al 22 Indexing

    33/56

    Searching (cont.)

    Example: target = 10

    [Figure: B-tree with root (6, 17); second-level nodes (4), (12), (19, 22); leaves (2, 3), (5), (10), (16), (18), (20), (25)]

  • 8/11/2019 Cs 157 Al 22 Indexing

    34/56

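    Deletion from a B-Tree

    1. Delete h, r:

       Before:
       root:    [j]
       level 2: [c f] [m r]
       leaves:  [a b] [d e] [g h i] [k l] [n p] [s t u x]

       h is simply removed from its leaf. r sits in a directory node, so promote s and delete s from its leaf.

       After:
       root:    [j]
       level 2: [c f] [m s]
       leaves:  [a b] [d e] [g i] [k l] [n p] [t u x]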

  • 8/11/2019 Cs 157 Al 22 Indexing

    35/56

    Deletion (cont.)

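    2. Delete p:

       Before:
       root:    [j]
       level 2: [c f] [m s]
       leaves:  [a b] [d e] [g i] [k l] [n p] [t u x]

       Pull s down; pull t up.

       After:
       root:    [j]
       level 2: [c f] [m t]
       leaves:  [a b] [d e] [g i] [k l] [n s] [u x]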

  • 8/11/2019 Cs 157 Al 22 Indexing

    36/56

    Deletion (cont.)

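    3. Delete d:

       Before:
       root:    [j]
       level 2: [c f] [m t]
       leaves:  [a b] [d e] [g i] [k l] [n s] [u x]

       Combine: the leaf left holding only e is merged with [a b] and the separator c.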

  • 8/11/2019 Cs 157 Al 22 Indexing

    37/56

    Deletion (cont.)

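    Combine:

       After merging the leaves:
       root:    [j]
       level 2: [f] [m t]
       leaves:  [a b c e] [g i] [k l] [n s] [u x]

       The node [f] is now too small, so it is combined with j and [m t]:

       root:    [f j m t]
       leaves:  [a b c e] [g i] [k l] [n s] [u x]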

  • 8/11/2019 Cs 157 Al 22 Indexing

    38/56

    Deleting from a B-Tree

    To delete a key value x from a B-tree, first search to determine the leaf node that contains x.

    If removing x leaves that leaf node with fewer than the minimum number of keys, try to adopt a key from a neighboring node. If that's possible, then you're finished.

  • 8/11/2019 Cs 157 Al 22 Indexing

    39/56

    Deleting from a B-Tree (continued)

    If the neighboring node is already at its minimum, combine the leaf node with its neighboring node, resulting in one full leaf node.

    This will require restructuring the parent node, since it has lost a child.

    If the parent now has fewer than the minimum number of keys, adopt a key from one of its neighbors. If that's not possible, combine the parent with its neighbor.

  • 8/11/2019 Cs 157 Al 22 Indexing

    40/56

    Deleting from a B-Tree (continued)

    This process may percolate all the way to the root.

    If the root is left with only one child, then remove the root node and make its child the new root.

    Both insertion and deletion are O(h), where h is the height of the tree.
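
    The adopt-or-combine decision at the leaf level can be sketched as below. It is a simplified illustration that handles one parent and its leaves only, assuming the variant where separator keys in the parent are real keys that rotate through the parent on a borrow. The function name and argument layout are hypothetical, and repairing an underfull parent is left to the caller.

```python
def delete_from_leaf(parent_keys, leaves, i, key, min_keys):
    """Delete key from leaves[i], then fix underflow by borrowing or merging."""
    leaves[i].remove(key)
    if len(leaves[i]) >= min_keys:
        return "ok"
    # Adopt a key from the right neighbor, rotating through the parent.
    if i + 1 < len(leaves) and len(leaves[i + 1]) > min_keys:
        leaves[i].append(parent_keys[i])        # pull the separator down
        parent_keys[i] = leaves[i + 1].pop(0)   # pull the neighbor's smallest key up
        return "borrowed from right"
    # Otherwise try the left neighbor.
    if i > 0 and len(leaves[i - 1]) > min_keys:
        leaves[i].insert(0, parent_keys[i - 1])
        parent_keys[i - 1] = leaves[i - 1].pop()
        return "borrowed from left"
    # Both neighbors are at minimum: combine with one, pulling the separator down.
    j = i - 1 if i > 0 else i                   # left member of the merged pair
    leaves[j:j + 2] = [leaves[j] + [parent_keys[j]] + leaves[j + 1]]
    del parent_keys[j]                          # the parent has lost a child
    return "merged"

parent, leaves = ["m", "s"], [["k", "l"], ["n", "p"], ["t", "u", "x"]]
print(delete_from_leaf(parent, leaves, 1, "p", min_keys=2))  # borrowed from right
print(parent, leaves)  # ['m', 't'] [['k', 'l'], ['n', 's'], ['u', 'x']]
```

    When the function reports a merge, the parent has one key fewer, so the same check has to be repeated one level up; that is the percolation toward the root described above.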

  • 8/11/2019 Cs 157 Al 22 Indexing

    41/56

    Delete 18

  • 8/11/2019 Cs 157 Al 22 Indexing

    42/56

    Delete 5

  • 8/11/2019 Cs 157 Al 22 Indexing

    43/56

    Delete 19

  • 8/11/2019 Cs 157 Al 22 Indexing

    44/56

    Delete 12

  • 8/11/2019 Cs 157 Al 22 Indexing

    45/56

    Properties of B-Trees

    (1) In a B-tree of order m with n keys the

    number of nodes, p, satisfies

    and on average p ~1.44 n/m.

    (2) On average less than

    nodes are split per insertion

  • 8/11/2019 Cs 157 Al 22 Indexing

    46/56

    Advantages of B-tree

    Because a B-tree is balanced, all leaves are at the same depth, so there is no runaway pointer overhead. Indeed, even very large B-trees can guarantee that only a small number of nodes must be retrieved to find a given key. For example, a B-tree of 10,000,000 keys with at least 50 keys per node never needs to retrieve more than 4 nodes to find any key.
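
    The 4-node bound can be checked with a short calculation. The sketch assumes that "50 keys per node" is a minimum for every node except the root, that the root holds at least one key, and that keys are stored in every node; the function names are hypothetical.

```python
def min_keys_for_height(h, t):
    """Fewest keys in a B-tree of height h whose non-root nodes hold >= t keys.

    The root has at least 1 key and 2 children; every other node has at
    least t keys and t + 1 children, which sums to 2 * (t + 1)**(h - 1) - 1.
    """
    return 2 * (t + 1) ** (h - 1) - 1

def max_height(n_keys, t):
    """Tallest tree that n_keys keys can form under the minimum-fill rule."""
    h = 1
    while min_keys_for_height(h + 1, t) <= n_keys:
        h += 1
    return h

print(min_keys_for_height(5, 50))   # 13530401
print(max_height(10_000_000, 50))   # 4
```

    A fifth level would need at least 13,530,401 keys, so a tree of 10,000,000 keys cannot be more than 4 levels deep under these assumptions.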

  • 8/11/2019 Cs 157 Al 22 Indexing

    47/56

    ORACLE Create Index Statement

    Syntax:

    create [unique] index indexname on tablename

    (columnname [asc | desc] {, columnname [asc | desc]})

    [tablespace tblspacename]

    [storage ([initial n] [next n] [minextents n] [maxextents n] [pctincrease n])]

    [pctfree n]

    [other disk storage and transaction clauses not covered or deferred]

    [nosort];

  • 8/11/2019 Cs 157 Al 22 Indexing

    48/56

    ORACLE Create Index Statement (c.)

    Explanation:

    The value of n in pctfree can range from 0 to 99, and this number determines the percentage of each B-tree node page that is left unfilled when the index is first created.

    The list of columnnames in parentheses on the second line specifies a concatenation of column values that make up an index key on the table specified.

    The nosort option indicates that the rows already lie on disk in sorted order by the key values for this index.

  • 8/11/2019 Cs 157 Al 22 Indexing

    49/56

    DB2 Create Index Statement

    Syntax:

    create [unique] index indexname on tablename

    (columnname [asc | desc] {, columnname [asc | desc]})

    [using . . .]

    [freepage n]

    [pctfree n]

    [additional clauses not covered or deferred];

  • 8/11/2019 Cs 157 Al 22 Indexing

    50/56

    DB2 Create Index Statement (c.)

    Explanation:

    The using clause specifies how the index is to be constructed from disk files.

    The integer n of the freepage clause specifies how frequently an empty free page should be left in the sequence of pages assigned to the index when it is loaded with entries by a DB2 utility.

    One free page is left for every n index pages, where n varies from 0 to 255. The default value for n is 0, meaning that no free pages are left. A value of n=1 means that alternate disk pages are left empty.

  • 8/11/2019 Cs 157 Al 22 Indexing

    51/56

    INGRES Create Index Statement

    Syntax:

    create [unique] index indexname on table

    (columnname {, columnname})

    [with /* commas separate clauses following */

    [location = . . . ]

    [structure = btree | isam | hash | . . .]

    [key = (columnname {, columnname})]

    [fillfactor = n]

    [nonleaffill = n]

    [additional clauses not covered or deferred] ];

  • 8/11/2019 Cs 157 Al 22 Indexing

    52/56

    INGRES Create Index Statement (c.)

    Explanation:

    The with keyword must be present if any of the later comma-separated clauses appear.

    The location clause specifies how the index is to be constructed from disk files.

    The structure clause is unique to INGRES and names the access structure that the index will be assigned when it is created.

    The key clause indicates that the index key value will be constructed from the columnnames listed.

    The fillfactor and nonleaffill clauses give the percentage of node space that should be filled.

  • 8/11/2019 Cs 157 Al 22 Indexing

    53/56

    Index Node Layout and Free Space

    Below is the schematic layout of a normal leaf-level index

    node with unique key value.

    Header info Free space

    keyval rid keyval rid

  • 8/11/2019 Cs 157 Al 22 Indexing

    54/56

    More about the B-Tree

    The purpose of the B-tree index is to minimize the number of disk I/Os needed to locate a row with a given index key value.

    The depth of the B-tree bears a close relationship to the number of disk I/Os used to reach the leaf-level entry where the rowid is kept.

    The nodes of the B-tree are loaded in a left-to-right fashion, so that successive inserts normally occur to the same leaf node, held consistently in a memory buffer.

    When the leaf node splits, the successive leaf node is allocated from the next disk page of the allocated extent.

  • 8/11/2019 Cs 157 Al 22 Indexing

    55/56

    More about the B-Tree (c.)

    Node splits at every level occur in a controlled way and allow us to leave just the right amount of free space on each page.

    It is common to estimate the fanout at each level to have a value of n, where n is the expected number of entries that appear in each node. Assuming that there are n directory entries at the root node and at every node below that, the number of entries at the second level is n^2, at the third level is n^3, and so on. For a tree of depth K, the number of leaf-level entries is n^K just before a root split occurs in the tree to make it a tree of depth K+1.
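
    As a quick worked example of the n^K estimate, the sketch below lists the entry counts per level. The fanout of 200 is an assumed, illustrative value, and the function name is hypothetical.

```python
def entries_per_level(fanout, depth):
    """Estimated entry counts at levels 1..depth when every node holds `fanout` entries."""
    return [fanout ** level for level in range(1, depth + 1)]

# Assumed fanout of 200 entries per node, tree of depth 3.
print(entries_per_level(200, 3))  # [200, 40000, 8000000]
```

    Under this estimate a depth-3 tree indexes up to 8,000,000 rows before the root must split and the tree grows to depth 4.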

  • 8/11/2019 Cs 157 Al 22 Indexing

    56/56

    Summary

    The B-tree is a tree-like structure that helps us to organize data in an efficient way.

    The B-tree index is a technique used to minimize the disk I/Os needed for the purpose of locating a row with a given index key value.

    Because of its advantages, the B-tree and the B-tree index structure are widely used in databases nowadays.