heena-sharma -
7/27/2019 B_plus_tree_ch14.doc
B+ Trees
A B+ tree is a data structure you can use to implement a very efficient
method for sorting large amounts of data; B+ trees enable a
correspondingly efficient searching algorithm. You can think of a B+
tree as providing an index to a database, which is why B+ trees are
sometimes referred to as indices.
In Visual Prolog, a B+ tree resides in an external database. Each entry
in a B+ tree is a pair of values: a key string and an
associated database reference number. When building your database,
you first insert a record in the database and establish a key for that
record. The Visual Prolog btree predicates may then be used to insert
this key and the database reference number corresponding to this
record into a B+ tree.
When searching a database for a record, all you have to do is to obtain
a key for that record, and the B+ tree will give you the corresponding
reference number. Using this reference number, you can retrieve the
record from the database. As a B+ tree evolves, its entries are kept in
key order. This means that you can easily obtain a sorted listing of the
records.
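The build-then-look-up cycle described above can be sketched as a small program. This is a minimal sketch rather than part of the original chapter: the file name demo.dat, the chain name "people", the B+ tree name "names", and the sample record are made up for illustration, and it assumes that the external-database predicate chain_insertz and the in_memory placement are available as in Visual Prolog 5.x.

```prolog
/* Sketch only: all names and data are hypothetical */
DOMAINS
  db_selector = dba

GOAL
  % Create a database and a B+ tree (KeyLen 20, Order 4)
  db_create(dba, "demo.dat", in_memory),
  bt_create(dba, "names", Index, 20, 4),
  % 1. Insert the record; this yields its database reference number
  chain_insertz(dba, "people", string, "Smith, 12 High Street", Ref),
  % 2. Insert the key together with that reference number
  key_insert(dba, Index, "Smith", Ref),
  % 3. Later: the key gives the reference number, which gives the record
  key_search(dba, Index, "Smith", Found),
  ref_term(dba, string, Found, Record),
  write(Record), nl,
  bt_close(dba, Index),
  db_close(dba).
```

The record itself lives in a chain; the B+ tree only stores the key and the reference number, which is why the lookup takes two steps (key_search, then ref_term).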
A B+ tree is analogous to a binary tree, with the exception that in a B+
tree, more than one key string is stored at each node. B+ trees are
also balanced; this means that the search paths to each key in the
leaves of the tree have the same length. Because of this feature, a
search for a given key among more than a million keys can be
guaranteed, even in the worst case, to require accessing the disk only
a few times--depending on how many keys are stored at each node.
Although B+ trees are placed in an external database, they don't need
to point to terms in the same database. It is possible to have a
database containing a number of chains, and another database with a
B+ tree pointing to terms in those chains.
1. Pages, Order, and Keylength
In a B+ tree, keys are grouped together in pages; each page has the
same size, and all pages can contain the same number of keys, which
means that all the stored keys for that B+ tree must be the same size.
The size of the keys is determined by the KeyLen argument, which you
must supply when creating a B+ tree. If you attempt to insert strings
longer than KeyLen into a B+ tree, Visual Prolog will truncate them. In
general, you should choose the smallest possible value for KeyLen in
order to save space and maximize speed.
When you create a B+ tree, you must also give an argument called
its Order. This argument determines how many keys should be stored
in each tree node; usually, you must determine the best choice by trial
and error. A good first try for Order is 4, which stores between 4 and 8
keys at each node. You must choose the value of Order by
experimentation because the B+ tree's search speed depends on the
values KeyLen and Order, the number of keys in the B+ tree, and your
computer's hardware configuration.
2. Duplicate Keys
When setting up a B+ tree, you must allow for all repeat occurrences
of your key. For example, if you're setting up a B+ tree for a database
of customers in which the key is the customer's last name, you need to
allow for all those customers called Smith. For this reason, it is possible
to have duplicate keys in a B+ tree.
When you delete a term in the database, you must delete the
corresponding entry in a B+ tree with duplicate keys by giving both the
key and the database reference number.
3. Multiple Scans
To run multiple scans of a B+ tree, each with its own internal pointer,
you can open the tree more than once. Note, however, that if you update
one copy of a B+ tree while other copies are open, the pointers for the
other copies will be repositioned to the top of the tree.
4. The B+ Tree Standard Predicates
Visual Prolog provides several predicates for handling B+ trees; these
predicates work in a manner that parallels the
corresponding db_... predicates.
(1) bt_create/5 and bt_create/6
You create new B+ trees by calling the bt_create predicate.
bt_create(Dbase, BtreeName, Btree_Sel, KeyLen, Order)
/* (i,i,o,i,i) */
bt_create(Dbase, BtreeName, Btree_Sel, KeyLen, Order, Duplicates)
/* (i,i,o,i,i,i) */
The BtreeName argument specifies the name for the new tree. You
later use this name as an argument for bt_open. The
arguments KeyLen and Order for the B+ tree are given when the tree
is created and can't be changed afterwards. If you call bt_create/5,
or bt_create/6 with the Duplicates argument set to 1, duplicates will
be allowed in the B+ tree. If you call bt_create/6 with the Duplicates
argument set to 0, you will not be allowed to insert duplicates in the
B+ tree.
(2) bt_open/3
bt_open opens an already created B+ tree in a database, which is
identified by the name given in bt_create.
bt_open(Dbase, BtreeName, Btree_Sel) /* (i,i,o) */
When you open or create a B+ tree, the call returns a selector
(Btree_Sel) for that B+ tree. A B+ tree selector belongs to the
predefined domain bt_selector and refers to the B+ tree whenever the
system carries out search or positioning operations. The relationship
between a B+ tree's name and its selector is exactly the same as the
relationship between an actual file name and the corresponding
symbolic file name.
You can open a given B+ tree more than once in order to handle
several simultaneous scans. Each time a B+ tree is opened, a
descriptor is allocated, and each descriptor maintains its own internal
B+ tree pointer.
(3) bt_close/2 and bt_delete/2
You can close an open B+ tree with a call to bt_close or delete an
entire B+ tree with bt_delete.
bt_close(Dbase, Btree_Sel) /* (i,i) */
bt_delete(Dbase, BtreeName) /* (i,i) */
Calling bt_close releases the internal buffers allocated for the open B+
tree identified by Btree_Sel.
(4) bt_copyselector/3
bt_copyselector gives you a new pointer for an already open B+ tree
selector (a new scan).
bt_copyselector(Dbase,OldBtree_sel,NewBtree_sel) /* (i,i,o) */
The new selector will point to the same place in the B+ tree as the old
selector. Once it has been created, the two B+ tree selectors can be
repositioned freely without affecting each other.
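As an illustration of two independent scans, the following minimal sketch copies a selector and then moves only the copy. The database, chain, and tree names and the sample keys are hypothetical, chain_insertz and in_memory are assumed to be available as in Visual Prolog 5.x, and closing the copied selector with bt_close is an assumption rather than something the text above states.

```prolog
/* Sketch only: all names and data are hypothetical */
DOMAINS
  db_selector = dba

GOAL
  db_create(dba, "demo.dat", in_memory),
  bt_create(dba, "names", Index, 20, 4),
  chain_insertz(dba, "people", string, "record for Adams", R1),
  key_insert(dba, Index, "Adams", R1),
  chain_insertz(dba, "people", string, "record for Smith", R2),
  key_insert(dba, Index, "Smith", R2),
  % Position the original selector, then take a copy of it
  key_first(dba, Index, _),
  bt_copyselector(dba, Index, Scan2),
  % Move only the copy; the original should stay on the first key
  key_next(dba, Scan2, _),
  key_current(dba, Index, Key1, _),
  key_current(dba, Scan2, Key2, _),
  write("original on ", Key1, ", copy on ", Key2), nl,
  bt_close(dba, Scan2),   % assumption: a copied selector is closed like any other
  bt_close(dba, Index),
  db_close(dba).
```

Because the keys are inserted before the selector is positioned, neither scan is affected by the repositioning that an update causes.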
(5) bt_statistics/8
bt_statistics returns statistical information for the B+ tree identified
by Btree_Sel.
bt_statistics(Dbase,Btree_Sel,NumKeys,NumPages, /* (i,i,o,o, */
Depth,KeyLen,Order,PgSize) /* o,o,o,o) */
The arguments to bt_statistics represent the following:
Dbase is the db_selector identifying the database.
Btree_Sel is the bt_selector identifying the B+ tree.
NumKeys is bound to the total number of keys in the B+ tree Btree_Sel.
NumPages is bound to the total number of pages in the B+ tree.
Depth is bound to the depth of the B+ tree.
KeyLen is bound to the key length.
Order is bound to the order of the B+ tree.
PgSize is bound to the page size (in bytes).
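A call to bt_statistics might look like the sketch below, which simply prints every output argument. The file, chain, and tree names are made up for illustration, and chain_insertz and in_memory are assumed to be available as in Visual Prolog 5.x.

```prolog
/* Sketch only: all names and data are hypothetical */
DOMAINS
  db_selector = dba

GOAL
  db_create(dba, "demo.dat", in_memory),
  bt_create(dba, "names", Index, 20, 4),
  chain_insertz(dba, "people", string, "record for Smith", Ref),
  key_insert(dba, Index, "Smith", Ref),
  % All six trailing arguments are outputs
  bt_statistics(dba, Index, NumKeys, NumPages, Depth, KeyLen, Order, PgSize),
  write("keys=", NumKeys, " pages=", NumPages, " depth=", Depth), nl,
  write("keylen=", KeyLen, " order=", Order, " pagesize=", PgSize), nl,
  bt_close(dba, Index),
  db_close(dba).
```

Printing these values after loading real data is one way to compare different choices of KeyLen and Order.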
(6) key_insert/4 and key_delete/4
You use the standard predicates key_insert and key_delete to update
the B+ tree.
key_insert(Dbase, Btree_Sel, Key, Ref) /* (i,i,i,i) */
key_delete(Dbase, Btree_Sel, Key, Ref) /* (i,i,i,i) */
By giving both Key and Ref to key_delete, you can delete a specific
entry in a B+ tree with duplicate keys.
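The following minimal sketch shows why key_delete takes both arguments: two records share the key "Smith", and only the reference number tells them apart. The names and sample records are hypothetical, and chain_insertz, term_delete, and in_memory are assumed to be available as in Visual Prolog 5.x.

```prolog
/* Sketch only: all names and data are hypothetical */
DOMAINS
  db_selector = dba

GOAL
  db_create(dba, "demo.dat", in_memory),
  % Final argument 1: allow duplicate keys
  bt_create(dba, "names", Index, 20, 4, 1),
  chain_insertz(dba, "people", string, "Smith, Ann", RefAnn),
  chain_insertz(dba, "people", string, "Smith, Bob", RefBob),
  key_insert(dba, Index, "Smith", RefAnn),
  key_insert(dba, Index, "Smith", RefBob),
  % Remove Ann's index entry (Key + Ref), then the record itself
  key_delete(dba, Index, "Smith", RefAnn),
  term_delete(dba, "people", RefAnn),
  % The one remaining "Smith" entry should now lead to Bob's record
  key_search(dba, Index, "Smith", Left),
  ref_term(dba, string, Left, Record),
  write(Record), nl,
  bt_close(dba, Index),
  db_close(dba).
```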
(7) key_first/3, key_last/3, and key_search/4
Each B+ tree maintains an internal pointer to its
nodes. key_first and key_last allow you to position the pointer at the
first or last key in a B+ tree, respectively. key_search positions the
pointer on a given key.
key_first(Dbase, Btree_Sel, Ref) /* (i,i,o) */
key_last(Dbase, Btree_Sel, Ref) /* (i,i,o) */
key_search(Dbase, Btree_Sel, Key, Ref) /* (i,i,i,o)(i,i,i,i) */
If the key is found, key_search will succeed; if it's not
found, key_search will fail, but the internal B+ tree pointer will be
positioned at the key immediately after where Key would have been
located. You can then use key_current to return the key and database
reference number for this key. If you want to position the pointer on
an exact entry in a B+ tree with duplicates, you can also provide Ref
as an input argument.
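The behaviour on a failed search can be put to use in a "nearest key" lookup, sketched below. The predicate nearest and all names and data are hypothetical, and chain_insertz and in_memory are assumed to be available as in Visual Prolog 5.x.

```prolog
/* Sketch only: nearest/2 and all names and data are hypothetical */
DOMAINS
  db_selector = dba

PREDICATES
  nearest(bt_selector, string)

CLAUSES
  % Case 1: the key exists
  nearest(Index, Key):-
    key_search(dba, Index, Key, _), !,
    write("exact match: ", Key), nl.
  % Case 2: key_search failed, but left the pointer on the next key
  nearest(Index, Key):-
    key_current(dba, Index, Next, _), !,
    write(Key, " not found; next key is ", Next), nl.
  % Case 3: the pointer is past the last key, so key_current fails too
  nearest(_, Key):-
    write(Key, " not found; no later key"), nl.

GOAL
  db_create(dba, "demo.dat", in_memory),
  bt_create(dba, "names", Index, 20, 4),
  chain_insertz(dba, "people", string, "record for Adams", R1),
  key_insert(dba, Index, "Adams", R1),
  chain_insertz(dba, "people", string, "record for Smith", R2),
  key_insert(dba, Index, "Smith", R2),
  nearest(Index, "Smith"),
  nearest(Index, "Jones"),
  nearest(Index, "Young"),
  bt_close(dba, Index),
  db_close(dba).
```

With these two keys, the three calls are meant to exercise the three clauses in turn: an exact match, a miss that falls before "Smith", and a miss beyond the last key.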
(8) key_next/3 and key_prev/3
You can use the predicates key_next and key_prev to move the B+
tree's pointer forward or backward in the sorted tree.
key_next(Dbase, Btree_Sel, NextRef) /* (i,i,o) */
key_prev(Dbase, Btree_Sel, PrevRef) /* (i,i,o) */
If the B+ tree is at one of the ends, trying to move the pointer further
will cause a fail, but the B+ tree pointer will act as if it were placed one
position outside the tree.
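A backward scan can be sketched by starting at key_last and stepping with key_prev until it fails. The predicate list_back and all names and data below are hypothetical, and chain_insertz and in_memory are assumed to be available as in Visual Prolog 5.x.

```prolog
/* Sketch only: list_back/1 and all names and data are hypothetical */
DOMAINS
  db_selector = dba

PREDICATES
  list_back(bt_selector)

CLAUSES
  % Print the key under the pointer, then fail into the next clause
  list_back(Index):-
    key_current(dba, Index, Key, _),
    write(Key), nl,
    fail.
  % Step backward; key_prev fails once the first key has been passed
  list_back(Index):-
    key_prev(dba, Index, _), !,
    list_back(Index).
  list_back(_).

GOAL
  db_create(dba, "demo.dat", in_memory),
  bt_create(dba, "names", Index, 20, 4),
  chain_insertz(dba, "people", string, "record for Jones", R1),
  key_insert(dba, Index, "Jones", R1),
  chain_insertz(dba, "people", string, "record for Adams", R2),
  key_insert(dba, Index, "Adams", R2),
  chain_insertz(dba, "people", string, "record for Smith", R3),
  key_insert(dba, Index, "Smith", R3),
  % Start at the last key and walk toward the first
  key_last(dba, Index, _),
  list_back(Index),
  bt_close(dba, Index),
  db_close(dba).
```

The keys are inserted out of order on purpose; since the tree keeps its entries in key order, the scan should list them in descending order regardless.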
(9) key_current/4
key_current returns the key and database reference number for the
current pointer in the B+ tree.
key_current(Dbase, Btree_Sel, Key, Ref) /* (i,i,o,o) */
key_current fails after a call to the
predicates bt_open, bt_create, key_insert, or key_delete, or when the
pointer is positioned before the first key (using key_prev) or after the
last (with key_next).
5. Example: Accessing a Database via B+ Trees
The following example program handles several text files in a single
database file at once. You can select and edit the texts as though they
were in different files. A corresponding B+ tree is set up for fast access
to the texts and to produce a sorted list of the file names.
/* Program ch14e04.pro */
DOMAINS
  db_selector = dba

PREDICATES
  % List all keys in an index
  list_keys(db_selector,bt_selector)

CLAUSES
  list_keys(dba,Bt_selector):-
    key_current(dba,Bt_selector,Key,_),
    write(Key,' '),
    fail.
  list_keys(dba,Bt_selector):-
    key_next(dba,Bt_selector,_),!,
    list_keys(dba,Bt_selector).
  list_keys(_,_).

PREDICATES
  open_dbase(bt_selector)
  main(db_selector,bt_selector)
  ed(db_selector,bt_selector,string)
  ed1(db_selector,bt_selector,string)

CLAUSES
  % Loop until escape is pressed
  main(dba,Bt_select):-
    write("File Name: "),
    readln(Name),
    ed(dba,Bt_select,Name),!,
    main(dba,Bt_select).
  main(_,_).

  % The ed predicates ensure that editing never fails.
  ed(dba,Bt_select,Name):-
    ed1(dba,Bt_select,Name),!.
  ed(_,_,_).

  %* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
  % There are three choices:
  %
  % a) The name is an empty string - list all the names
  % b) The name already exists - modify the contents of the file
  % c) The name is a new name - create a new file
  %* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
  ed1(dba,Bt_select,""):-!,
    key_first(dba,Bt_select,_),
    list_keys(dba,Bt_select),
    nl.
  ed1(dba,Bt_select,Name):-
    key_search(dba,Bt_select,Name,Ref),!,
    ref_term(dba,string,Ref,Str),
    edit(Str,Str1,"Edit old",NAME,"",0,"PROLOG.HLP",RET),
    clearwindow,
    Str>
    bt_open(dba,"ndx",INDEX).
  open_dbase(INDEX):-
    db_create(dba,"dd1.dat",in_file),
    bt_create(dba,"ndx",INDEX,20,4).

GOAL
  open_dbase(INDEX),
  main(dba,INDEX),
  bt_close(dba,INDEX),
  db_close(dba).