B_plus_tree_ch14.doc

  • 7/27/2019 B_plus_tree_ch14.doc

    B+ Trees

    A B+ tree is a data structure you can use to implement a very

    efficient method for sorting large amounts of data; B+ trees enable a

    correspondingly efficient searching algorithm. You can think of a B+

    tree as providing an index to a database, which is why B+ trees are

    sometimes referred to as indices.

    In Visual Prolog, a B+ tree resides in an external database. Each entry

    in a B+ tree is a pair of values: a key string and an

    associated database reference number. When building your database,

    you first insert a record in the database and establish a key for that

    record. The Visual Prolog btree predicates may then be used to insert

    this key and the database reference number corresponding to this

    record into a B+ tree.

    When searching a database for a record, all you have to do is to obtain

    a key for that record, and the B+ tree will give you the corresponding

    reference number. Using this reference number, you can retrieve the

    record from the database. As a B+ tree evolves, its entries are kept in

    key order. This means that you can easily obtain a sorted listing of the

    records.
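
    The insert-then-look-up cycle described above can be sketched as

    follows. This is a minimal illustration, not part of the original

    chapter: the file name "demo.dat", the chain name "customers", the

    B+ tree name "names", and the record text are all hypothetical, and

    the database is assumed to be created in memory.

```prolog
/* Sketch only: all file, chain, and B+ tree names are hypothetical. */
DOMAINS
  db_selector = dba

PREDICATES
  demo

CLAUSES
  demo:-
    db_create(dba, "demo.dat", in_memory),
    bt_create(dba, "names", Index, 20, 4),
    % 1. Insert the record; Ref is its database reference number.
    chain_insertz(dba, "customers", string, "Smith, 12 High Street", Ref),
    % 2. Insert the key and the reference number into the B+ tree.
    key_insert(dba, Index, "Smith", Ref),
    % 3. Look up: key -> reference number -> record.
    key_search(dba, Index, "Smith", FoundRef),
    ref_term(dba, string, FoundRef, Record),
    write(Record), nl,
    bt_close(dba, Index),
    db_close(dba).

GOAL
  demo.
```

    The record itself lives in a chain; the B+ tree stores only the key

    and the reference number, which is why the lookup takes two steps.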

    A B+ tree is analogous to a binary tree, with the exception that in a B+

    tree, more than one key string is stored at each node. B+ trees are

    also balanced; this means that the search paths to each key in the

    leaves of the tree have the same length. Because of this feature, a

    search for a given key among more than a million keys can be

    guaranteed, even in the worst case, to require accessing the disk only

    a few times--depending on how many keys are stored at each node.

    Although B+ trees are placed in an external database, they don't need

    to point to terms in the same database. It is possible to have a

    database containing a number of chains, and another database with a

    B+ tree pointing to terms in those chains.

    1. Pages, Order, and Key Length

    In a B+ tree, keys are grouped together in pages; each page has the

    same size, and all pages can contain the same number of keys, which

    means that all the stored keys for that B+ tree must be the same size.

    The size of the keys is determined by the KeyLen argument, which you

    must supply when creating a B+ tree. If you attempt to insert strings

    longer than KeyLen into a B+ tree, Visual Prolog will truncate them. In

    general, you should choose the smallest possible value for KeyLen in

    order to save space and maximize speed.

    When you create a B+ tree, you must also give an argument called

    its Order. This argument determines how many keys should be stored

    in each tree node; usually, you must determine the best choice by trial

    and error. A good first try for Order is 4, which stores between 4 and 8

    keys at each node. You must choose the value of Order by

    experimentation because the B+ tree's search speed depends on the

    values KeyLen and Order, the number of keys in the B+ tree, and your

    computer's hardware configuration.

    2. Duplicate Keys

    When setting up a B+ tree, you must allow for all repeat occurrences

    of your key. For example, if you're setting up a B+ tree for a database

    of customers in which the key is the customer's last name, you need to

    allow for all those customers called Smith. For this reason, it is possible

    to have duplicate keys in a B+ tree.

    When you delete a term in the database, you must delete the

    corresponding entry in a B+ tree with duplicate keys by giving both the

    key and the database reference number.
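
    As a hedged sketch of the point above, the following program indexes

    two customers under the same key and then removes exactly one of

    them. The names "dup.dat", "customers", and "lastname" and the record

    texts are hypothetical; the sixth argument of bt_create is set to 1

    to allow duplicates.

```prolog
/* Sketch only: names and record contents are hypothetical. */
DOMAINS
  db_selector = dba

PREDICATES
  demo

CLAUSES
  demo:-
    db_create(dba, "dup.dat", in_memory),
    bt_create(dba, "lastname", Index, 20, 4, 1),  % 1 = duplicates allowed
    chain_insertz(dba, "customers", string, "Smith, Anna", Ref1),
    chain_insertz(dba, "customers", string, "Smith, Bob", Ref2),
    % Both entries share the key "Smith" but carry different Refs.
    key_insert(dba, Index, "Smith", Ref1),
    key_insert(dba, Index, "Smith", Ref2),
    % Remove the second customer: delete the term, then delete the
    % matching index entry by giving both the key and the Ref.
    term_delete(dba, "customers", Ref2),
    key_delete(dba, Index, "Smith", Ref2),
    % The remaining "Smith" entry is still reachable through the index.
    key_search(dba, Index, "Smith", Ref),
    ref_term(dba, string, Ref, Record),
    write(Record), nl,
    bt_close(dba, Index),
    db_close(dba).

GOAL
  demo.
```

    Giving only the key would not say which of the two "Smith" entries

    to remove; the reference number makes the deletion unambiguous.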

    3. Multiple Scans

    To have more than one internal pointer to the same B+ tree, as you

    need for multiple simultaneous scans, you can open the tree more than once.

    Note, however, that if you update one copy of a B+ tree, for which you

    have other copies currently open, the pointers for the other copies will

    be repositioned to the top of the tree.

    4. The B+ Tree Standard Predicates

    Visual Prolog provides several predicates for handling B+ trees; these

    predicates work in a manner that parallels the

    corresponding db_... predicates.

    (1) bt_create/5 and bt_create/6

    You create new B+ trees by calling the bt_create predicate.

    bt_create(Dbase, BtreeName, Btree_Sel, KeyLen, Order)

    /* (i,i,o,i,i) */

    bt_create(Dbase, BtreeName, Btree_Sel, KeyLen, Order, Duplicates)

    /* (i,i,o,i,i,i) */

    The BtreeName argument specifies the name for the new tree. You

    later use this name as an argument for bt_open. The

    arguments KeyLen and Order for the B+ tree are given when the tree

    is created and can't be changed afterwards. If you call bt_create/5,

    or bt_create/6 with the Duplicates argument set to 1, duplicates will be

    allowed in the B+ tree. If you call bt_create/6 with the Duplicates

    argument set to 0, you will not be allowed to insert duplicates in the

    B+ tree.

    (2) bt_open/3

    bt_open opens an already created B+ tree in a database, which is

    identified by the name given in bt_create.

    bt_open(Dbase, BtreeName, Btree_Sel) /* (i,i,o) */

    When you open or create a B+ tree, the call returns a selector

    (Btree_Sel) for that B+ tree. A B+ tree selector belongs to the

    predefined domain bt_selector and refers to the B+ tree whenever the

    system carries out search or positioning operations. The relationship

    between a B+ tree's name and its selector is exactly the same as the

    relationship between an actual file name and the corresponding

    symbolic file name.

    You can open a given B+ tree more than once in order to handle

    several simultaneous scans. Each time a B+ tree is opened, a

    descriptor is allocated, and each descriptor maintains its own internal

    B+ tree pointer.

    (3) bt_close/2 and bt_delete/2

    You can close an open B+ tree with a call to bt_close or delete an

    entire B+ tree with bt_delete.

    bt_close(Dbase, Btree_Sel) /* (i,i) */

    bt_delete(Dbase, BtreeName) /* (i,i) */

    Calling bt_close releases the internal buffers allocated for the open B+

    tree with BtreeName.

    (4) bt_copyselector/3

    bt_copyselector gives you a new pointer for an already open B+ tree

    selector (a new scan).

    bt_copyselector(Dbase,OldBtree_sel,NewBtree_sel) /* (i,i,o) */

    The new selector will point to the same place in the B+ tree as the old

    selector. Once the copy has been created, the two B+ tree selectors

    can be repositioned freely without affecting each other.
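
    A small sketch of two independent scans over one B+ tree follows.

    The names "scan.dat", "texts", and "ndx" and the keys are

    hypothetical, and closing the copied selector with bt_close is an

    assumption made here for tidiness rather than something the text

    above prescribes.

```prolog
/* Sketch only: names and keys are hypothetical. */
DOMAINS
  db_selector = dba

PREDICATES
  demo

CLAUSES
  demo:-
    db_create(dba, "scan.dat", in_memory),
    bt_create(dba, "ndx", Scan1, 20, 4),
    chain_insertz(dba, "texts", string, "first text", Ref1),
    chain_insertz(dba, "texts", string, "second text", Ref2),
    key_insert(dba, Scan1, "alpha", Ref1),
    key_insert(dba, Scan1, "beta", Ref2),
    % Position the original selector, then copy it.
    key_first(dba, Scan1, _),
    bt_copyselector(dba, Scan1, Scan2),
    % Move only the copy; the original pointer stays where it was.
    key_next(dba, Scan2, _),
    key_current(dba, Scan1, Key1, _),
    key_current(dba, Scan2, Key2, _),
    write(Key1, ' ', Key2), nl,
    bt_close(dba, Scan2),   % assumption: copies are closed like any selector
    bt_close(dba, Scan1),
    db_close(dba).

GOAL
  demo.
```

    All updates are done before the copy is made, so neither pointer is

    reset by an insertion or deletion on the other selector.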

    (5) bt_statistics/8

    bt_statistics returns statistical information for the B+ tree identified

    by Btree_Sel.

    bt_statistics(Dbase,Btree_Sel,NumKeys,NumPages, /* (i,i,o,o, */

    Depth,KeyLen,Order,PgSize) /* o,o,o,o) */

    The arguments to bt_statistics represent the following:

    Dbase is the db_selector identifying the database.

    Btree_Sel is the bt_selector identifying the B+ tree.

    NumKeys is bound to the total number of keys in the B+ tree Btree_Sel.

    NumPages is bound to the total number of pages in the B+ tree.

    Depth is bound to the depth of the B+ tree.

    KeyLen is bound to the key length.

    Order is bound to the order of the B+ tree.

    PgSize is bound to the page size (in bytes).
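
    One way to use bt_statistics while experimenting with KeyLen and

    Order is to print the figures for a tree after loading it. The

    sketch below is illustrative only: show_stats is a hypothetical

    helper, and "stats.dat" and "ndx" are hypothetical names.

```prolog
/* Sketch only: show_stats and all names are hypothetical. */
DOMAINS
  db_selector = dba

PREDICATES
  show_stats(bt_selector)
  demo

CLAUSES
  % Print every value that bt_statistics returns for the given tree.
  show_stats(Index):-
    bt_statistics(dba, Index, NumKeys, NumPages,
                  Depth, KeyLen, Order, PgSize),
    write("keys:      ", NumKeys), nl,
    write("pages:     ", NumPages), nl,
    write("depth:     ", Depth), nl,
    write("key length:", KeyLen), nl,
    write("order:     ", Order), nl,
    write("page size: ", PgSize), nl.

  demo:-
    db_create(dba, "stats.dat", in_memory),
    bt_create(dba, "ndx", Index, 20, 4),
    chain_insertz(dba, "texts", string, "some text", Ref),
    key_insert(dba, Index, "sample", Ref),
    show_stats(Index),
    bt_close(dba, Index),
    db_close(dba).

GOAL
  demo.
```

    Comparing Depth and NumPages across trees built with different

    Order values gives a rough basis for the trial-and-error tuning

    that the choice of Order requires.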

    (6) key_insert/4 and key_delete/4

    You use the standard predicates key_insert and key_delete to update

    the B+ tree.

    key_insert(Dbase, Btree_Sel, Key, Ref) /* (i,i,i,i) */

    key_delete(Dbase, Btree_Sel, Key, Ref) /* (i,i,i,i) */

    By giving both Key and Ref to key_delete, you can delete a specific

    entry in a B+ tree with duplicate keys.

    (7) key_first/3, key_last/3, and key_search/4

    Each B+ tree maintains an internal pointer to its

    nodes. key_first and key_last allow you to position the pointer at the

    first or last key in a B+ tree, respectively. key_search positions the

    pointer on a given key.

    key_first(Dbase, Btree_Sel, Ref) /* (i,i,o) */

    key_last(Dbase, Btree_Sel, Ref) /* (i,i,o) */

    key_search(Dbase, Btree_Sel, Key, Ref) /* (i,i,i,o)(i,i,i,i) */

    If the key is found, key_search will succeed; if it's not found,

    key_search will fail, but the internal B+ tree pointer will be

    positioned at the key immediately after where Key would have been

    located. You can then use key_current to return the key and database

    reference number for this key. If you want to position on an exact

    position in a B+ tree with duplicates, you can also provide the Ref as an

    input argument.
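
    The fall-through behaviour of key_search can be wrapped in a small

    helper that returns either the exact key or the next key in order.

    This is a sketch under stated assumptions: find_at_or_after is a

    hypothetical predicate, and "near.dat", "fruits", "ndx", and the keys

    are hypothetical names.

```prolog
/* Sketch only: find_at_or_after and all names are hypothetical. */
DOMAINS
  db_selector = dba

PREDICATES
  find_at_or_after(bt_selector, string, string, ref)
  demo

CLAUSES
  % Exact hit: key_search succeeds and returns the Ref.
  find_at_or_after(Index, Key, Found, Ref):-
    key_search(dba, Index, Key, Ref), !,
    Found = Key.
  % Miss: key_search failed but left the pointer on the following key,
  % so key_current reports that key. This clause fails if Key sorts
  % after the last key in the tree.
  find_at_or_after(Index, _, Found, Ref):-
    key_current(dba, Index, Found, Ref).

  demo:-
    db_create(dba, "near.dat", in_memory),
    bt_create(dba, "ndx", Index, 20, 4),
    chain_insertz(dba, "fruits", string, "apple record", Ref1),
    chain_insertz(dba, "fruits", string, "cherry record", Ref2),
    key_insert(dba, Index, "apple", Ref1),
    key_insert(dba, Index, "cherry", Ref2),
    % "banana" is not in the tree; the helper reports the key that
    % follows the place where it would have been.
    find_at_or_after(Index, "banana", Found, Ref),
    ref_term(dba, string, Ref, Record),
    write(Found, ' ', Record), nl,
    bt_close(dba, Index),
    db_close(dba).

GOAL
  demo.
```

    A helper like this is a convenient basis for prefix or range

    lookups, where an exact match is not required.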

    (8) key_next/3 and key_prev/3

    You can use the predicates key_next and key_prev to move the B+

    tree's pointer forward or backward in the sorted tree.

    key_next(Dbase, Btree_Sel, NextRef) /* (i,i,o) */

    key_prev(Dbase, Btree_Sel, PrevRef) /* (i,i,o) */

    If the B+ tree is at one of the ends, trying to move the pointer further

    will cause a fail, but the B+ tree pointer will act as if it were placed one

    position outside the tree.

    (9) key_current/4

    key_current returns the key and database reference number for the

    current pointer in the B+ tree.

    key_current(Dbase, Btree_Sel, Key, Ref) /* (i,i,o,o) */

    key_current fails after a call to the predicates bt_open, bt_create,

    key_insert, or key_delete, or when the

    pointer is positioned before the first key (using key_prev) or after the

    last (with key_next).
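
    Combining key_last, key_current, and key_prev gives a listing in

    descending key order. The sketch below is illustrative only:

    list_keys_desc and walk_back are hypothetical predicates, the names

    "desc.dat", "texts", and "ndx" are hypothetical, and key_last is

    assumed to fail on an empty tree.

```prolog
/* Sketch only: predicate and file names are hypothetical. */
DOMAINS
  db_selector = dba

PREDICATES
  list_keys_desc(bt_selector)
  walk_back(bt_selector)
  demo

CLAUSES
  % Start at the last key, then walk toward the first one.
  list_keys_desc(Index):-
    key_last(dba, Index, _), !,
    walk_back(Index).
  list_keys_desc(_).          % assumed case: empty tree, nothing to list

  % Print the current key, then fail to reach the next clause.
  walk_back(Index):-
    key_current(dba, Index, Key, _),
    write(Key, ' '),
    fail.
  % Step backward; when key_prev fails we have passed the first key.
  walk_back(Index):-
    key_prev(dba, Index, _), !,
    walk_back(Index).
  walk_back(_).

  demo:-
    db_create(dba, "desc.dat", in_memory),
    bt_create(dba, "ndx", Index, 20, 4),
    chain_insertz(dba, "texts", string, "text one", Ref1),
    chain_insertz(dba, "texts", string, "text two", Ref2),
    key_insert(dba, Index, "alpha", Ref1),
    key_insert(dba, Index, "beta", Ref2),
    list_keys_desc(Index), nl,
    bt_close(dba, Index),
    db_close(dba).

GOAL
  demo.
```

    The positioning call to key_last comes after all insertions, because

    key_current fails immediately after key_insert.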

    5. Example: Accessing a Database via B+ Trees

    The following example program handles several text files in a single

    database file at once. You can select and edit the texts as though they

    were in different files. A corresponding B+ tree is set up for fast access

    to the texts and to produce a sorted list of the file names.

    /* Program ch14e04.pro */

    DOMAINS

    db_selector = dba

    PREDICATES

    % List all keys in an index

    list_keys(db_selector,bt_selector)

    CLAUSES

    list_keys(dba,Bt_selector):-

    key_current(dba,Bt_selector,Key,_),

    write(Key,' '),

    fail.

    list_keys(dba,Bt_selector):-

    key_next(dba,Bt_selector,_),!,

    list_keys(dba,Bt_selector).

    list_keys(_,_).

    PREDICATES

    open_dbase(bt_selector)

    main(db_selector,bt_selector)

    ed(db_selector,bt_selector,string)

    ed1(db_selector,bt_selector,string)

    CLAUSES

    % Loop until escape is pressed

    main(dba,Bt_select):-

    write("File Name: "),

    readln(Name),

    ed(dba,Bt_select,Name),!,

    main(dba,Bt_select).

    main(_,_).

    % The ed predicates ensure that editing never fails.

    ed(dba,Bt_select,Name):-

    ed1(dba,Bt_select,Name),!.

    ed(_,_,_).

    %* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

    % There are three choices:

    %

    % a) The name is an empty string - list all the names

    % b) The name already exists - modify the contents of the file

    % c) The name is a new name - create a new file

    %* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

    ed1(dba,Bt_select,""):-!,

    key_first(dba,Bt_select,_),

    list_keys(dba,Bt_select),

    nl.

    ed1(dba,Bt_select,Name):-

    key_search(dba,Bt_select,Name,Ref),!,

    ref_term(dba,string,Ref,Str),

    edit(Str,Str1,"Edit old",NAME,"",0,"PROLOG.HLP",RET),

    clearwindow,

    Str>

    bt_open(dba,"ndx",INDEX).

    open_dbase(INDEX):-

    db_create(dba,"dd1.dat",in_file),

    bt_create(dba,"ndx",INDEX,20,4).

    GOAL

    open_dbase(INDEX),

    main(dba,INDEX),

    bt_close(dba,INDEX),

    db_close(dba).