Physical DB Design 10.1

CSE2132 Database Systems

Week 10 Lecture

Physical Database Design - File Structures


Data Structures - What will we cover?

Underlying data structures

– File organizations

– Access modes

– Binary trees

– B+ trees

Oracle data structures


Underlying Data Structures

Data structures are the bricks and mortar that hold databases together.

Data structures (for the ANSI/SPARC standard) are defined in the internal model level and implemented in the physical data organization.

Data structures are often hidden from the application programmer, since they are primarily used by the DBMS and Operating Systems.

A good understanding of data structures, and a good choice among them, is important for machine performance. It also improves program design and makes it easier to communicate with DBMS specialists.


File Organization

A file organization is a technique for physically arranging the records of a file on a secondary storage device.

File organizations

– Sequential

– Indexed
    – Sequential (block index)
        – Hardware-dependent (ISAM)
        – Hardware-independent (VSAM)
    – Non-sequential (full index)

– Direct
    – Relative-Addressed
    – Hash-Addressed


Record Access Modes

Sequential Access

In sequential access, record storage or retrieval starts at a designated point, usually the beginning, and proceeds in a linear sequence through the file. Each record can only be retrieved by accessing all the records that physically precede it.

Random Access

In random access, a given record is accessed "out of the blue" without referencing other records in the file.


File Organization and Access Mode

A file organization is established when the file is created, and is rarely changed. However, record access mode can change each time the file is used.

File organization    Sequential access    Random access
Sequential           Yes                  No (impractical)
Indexed Seq.         Yes                  Yes
Direct-Relative      Yes                  Yes
Direct-Hashed        No (impractical)     Yes


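Indexed Sequential Architecture (Partial Index)

[Figure: an indexed sequential file with a partial index. The index set (many levels) shows entry 747 above entries 363, 575, 683. The sequence set below it holds entries 153, 252, 363, each matching the highest key in one control interval; for example, one control interval holds records 100, 125, 153 and the next holds 207, 221, 252. The control intervals, grouped into a control area, contain the actual data records.]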


Direct - Relative Files

Each record can be retrieved by specifying its relative record number.

The relative record number is a number 0 to n that gives the position of the record relative to the beginning of the file.

This provides a method of direct file organization.

Both sequential and direct access are supported, but a key allocation suitable for this method is not always easy or possible to arrange.
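The relative record number translates directly into a position in the file. The following is a minimal sketch, assuming fixed-length records; the record length `RECORD_SIZE`, the helper `read_record` and the in-memory file are illustrative and do not come from the lecture.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes (illustrative)

def read_record(f, rrn):
    """Return the record whose relative record number is rrn (0-based)."""
    f.seek(rrn * RECORD_SIZE)  # byte offset from the beginning of the file
    return f.read(RECORD_SIZE)

# Build a small in-memory "file" of four fixed-length records.
names = ["CHESS", "COMBAT", "DEFENDER", "ZAXXON"]
relative_file = io.BytesIO(
    b"".join(name.ljust(RECORD_SIZE).encode("ascii") for name in names)
)

# Direct access: fetch record 2 without reading records 0 and 1.
third = read_record(relative_file, 2).decode("ascii").rstrip()
```

Because every record has the same length, one multiplication replaces a scan of the preceding records.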


Direct - Hashed Files

Some applications do updates and retrievals in random mode and rarely need sequential access to the data records (e.g. reservation systems). For these, hashed file organization provides rapid access to individual records based on a key.

The major disadvantage of hash organization is that sequential access is not convenient because the records are not stored in primary key sequence. But highly concurrent environments doing random access are suitable for using hash organization.

The basis of a hash file is an addressing algorithm which transforms the record identifier into a relative address.
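As an illustration of such an addressing algorithm, here is a minimal sketch using the division-remainder method. The lecture does not name a particular algorithm, so the method, the function name `hash_address` and the bucket count are assumptions.

```python
def hash_address(key, num_buckets):
    """Division-remainder hashing: map an integer key to an address 0..num_buckets-1."""
    return key % num_buckets

NUM_BUCKETS = 7  # assumed size of the primary area; a prime is a common choice

# Transform a few record identifiers into relative addresses.
addresses = {key: hash_address(key, NUM_BUCKETS) for key in (1000, 1600, 350, 2000)}

# Two different keys can map to the same home address (a collision).
collision = hash_address(1007, NUM_BUCKETS) == hash_address(1000, NUM_BUCKETS)
```

Collisions like the last one are what an overflow technique has to handle.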


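Components of a Hashed File

– Identifier

– Transformation

– Primary storage area

– Overflow storage area

– Bucket overflow technique

[Figure: the primary storage area drawn as buckets 1, 2, ... b, each divided into slots 1, 2, 3, ... s, with the overflow storage area drawn as a further row of slots 1, 2, 3, ... s.]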


Hashed File Design

Load Factor (Fill Factor): The load factor is the percentage of space allocated to the file that is taken up by the records in the file. A low load factor reduces the number of records that overflow their home addresses. It is common to use 50% to 80%, with a lower load factor for files that will grow.

Bucket Capacity: Increasing the bucket capacity also reduces the number of overflows and hence the average search length.

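[Figure: average search length (vertical axis, marked 1.0, 1.1, 1.3, 1.5) plotted against load factor in % (horizontal axis, 20 to 100), with one curve for each bucket capacity b = 1, 2, 3, 4, where b = records per bucket.]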
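The load factor can be turned into a sizing calculation. The sketch below assumes the load factor is measured in record slots (records divided by buckets times slots per bucket); the function names and the sample figures are illustrative.

```python
import math

def load_factor(num_records, num_buckets, bucket_capacity):
    """Fraction of the allocated record slots that are occupied."""
    return num_records / (num_buckets * bucket_capacity)

def buckets_needed(num_records, bucket_capacity, target_load):
    """Smallest bucket count that keeps the load factor at or below target_load."""
    return math.ceil(num_records / (bucket_capacity * target_load))

# Example: 10,000 records, 4 records per bucket, aiming for a 70% load factor.
n = buckets_needed(num_records=10_000, bucket_capacity=4, target_load=0.70)
actual = load_factor(10_000, n, 4)
```

Choosing a lower target for a file that is expected to grow simply means allocating more buckets up front.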


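Comparison of Organizations

[Figure: Sequential organization: records (ASTEROIDS, BREAKOUT, COMBAT, ... ZAXXON) are stored one after another from the start of the file. Indexed Sequential organization: a key is looked up through an index (entries H, P, Z and A, D, K, M, ...) that leads to the data records (ASTEROIDS, ... MEGAMANIA, ... ZAXXON).]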


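Comparison of Organizations (2)

[Figure: Direct - Relative organization: records (CHESS, COMBAT, DEFENDER, ... ZAXXON) occupy positions 1, 2, 3, ... n and are addressed by relative record number. Direct - Hashed organization: the key is passed through a hashing routine that yields a relative record number; records (PITFALL, BERSERK, ODYSSEY, DONKEYKONG, ...) occupy positions 1, 2, 3, ... n.]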


Binary Trees

A non-linear data structure, each element having several "next" elements ( branching ).

A binary tree has a maximum of two branches per element or node.

A node consists of some data and a maximum of two pointers: a left pointer to the left branch and a right pointer to the right branch. If there is no left or right branch then a nil pointer is used.
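The node layout can be sketched as follows. This is a minimal illustration rather than the lecture's own code: the class and function names are assumptions, `None` plays the role of the nil pointer, and the keys are the product numbers 1000, 1600, 0350, 2000, 0975, 0625 written as integers.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None   # "less than" pointer; None stands for nil
        self.right = None  # "greater than" pointer; None stands for nil

def insert(root, key):
    """Insert key below root and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # a duplicate key leaves the tree unchanged

def in_order(root):
    """Left subtree, node, right subtree: yields the keys in ascending order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

root = None
for product_no in (1000, 1600, 350, 2000, 975, 625):
    root = insert(root, product_no)

keys_in_order = in_order(root)
```

The in-order traversal visits the keys in sorted order, which is what allows a binary tree to support sequential as well as direct access.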


A Diagram of a Binary Tree

Basic binary tree record layout for PRODUCT: PRODUCT# (primary key), data, LLINK (less-than pointer), RLINK (greater-than pointer).

[Figure: the tree after each step: (1) Initial tree, (2) Insert 1000, (3) Insert 1600, (4) Insert 0350, (5) Insert 2000, (6) Insert 0975, (7) Insert 0625. 1000 becomes the root; 0350 goes to its left (<) and 1600 to its right (>); 2000 goes to the right of 1600; 0975 goes to the right of 0350; 0625 goes to the left of 0975.]


An Example of a Binary Tree

[Figure: a binary tree with 1000 at the root, holding the keys 0100, 0350, 0625, 0975, 1000, 1250, 1425, 1600, 1775, 2000; each branch is labelled < or >.]

Task: Indicate the different traversals on this diagram.


B Trees

The problem with binary trees is balance: the tree can easily deteriorate into a linked list, and the reduced search times are then lost. This problem is overcome in B trees.
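The balance problem is easy to reproduce. The sketch below is an illustration under assumptions (an unbalanced binary search tree stored as nested dicts, with hypothetical helper names); it compares the height obtained when the same seven keys arrive in sorted order and in a mixed order.

```python
def bst_insert(tree, key):
    """Insert key into an unbalanced binary search tree; return the root."""
    node = {"key": key, "left": None, "right": None}
    if tree is None:
        return node
    current = tree
    while True:
        side = "left" if key < current["key"] else "right"
        if current[side] is None:
            current[side] = node
            return tree
        current = current[side]

def height(tree):
    """Number of nodes on the longest path from the root to a leaf."""
    if tree is None:
        return 0
    return 1 + max(height(tree["left"]), height(tree["right"]))

def build(keys):
    tree = None
    for key in keys:
        tree = bst_insert(tree, key)
    return tree

# Sorted arrival: every key goes to the right, giving a linked list.
sorted_height = height(build([100, 350, 625, 975, 1000, 1250, 1600]))
# Mixed arrival of the same keys: the tree stays shallow.
mixed_height = height(build([975, 350, 1250, 100, 625, 1000, 1600]))
```

With sorted input a search may have to visit all seven nodes; with the mixed order it visits at most three.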

B stands for Balanced: all the leaves are the same distance from the root, so B trees guarantee a predictable efficiency.

There are several varieties of B trees; most applications use the B+ tree.

A B+tree of degree m has the following properties:

1. All leaves are at the same level, that is the same depth from the root.

2. A non-leaf node that has n branches will contain n-1 keys.


Example of a B Tree

[Figure: a B tree with 1250 at the top, whose nodes hold the keys 0350, 0625, 1000, 1277, 1282, 1291, 1425, 1600, 2000, 2107; branches are labelled < and >.]

A B tree provides balance and quick direct access, but sequential processing can be slow. Because of this the B+ tree was introduced. In a B+ tree all key values occur in a leaf node so that sequential processing can be supported. This means that the leaf nodes have a different structure from high level nodes, and some key values will occur twice in the tree.


B+ Tree Node Structure

Both kinds of node have the layout

P1 K1 P2 K2 ... P(n-1) K(n-1) Pn

A high level node

– P1: pointer to subtree for keys < K1

– P2: pointer to subtree for keys >= K1 & < K2

– ...

– P(n-1): pointer to subtree for keys >= K(n-2) & < K(n-1)

– Pn: pointer to subtree for keys >= K(n-1)

A leaf node (every key value appears in a leaf node)

– P1: pointer to record (block) with key K1

– P2: pointer to record (block) with key K2

– ...

– P(n-1): pointer to record (block) with key K(n-1)

– Pn: pointer to leaf with smallest key greater than K(n-1)
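The pointer rule for a high level node amounts to counting how many keys in the node are less than or equal to the search key. The following is a minimal sketch under assumptions: nodes are plain dicts, the two-level tree is hypothetical sample data, and the names `find_leaf` and `leaf` are illustrative. Python lists are 0-based, so `children[i]` corresponds to pointer P(i+1).

```python
from bisect import bisect_right

def find_leaf(node, key):
    """Descend from node to the leaf that would hold key."""
    while not node["leaf"]:
        # children[i] covers keys k with keys[i-1] <= k < keys[i], so the
        # child to follow is the count of keys that are <= the search key.
        node = node["children"][bisect_right(node["keys"], key)]
    return node

def leaf(*keys):
    return {"leaf": True, "keys": list(keys)}

# Hypothetical tree: one high level node above four leaves.
leaves = [leaf(350, 625), leaf(1000), leaf(1250, 1300), leaf(1425, 1600, 2000)]
root = {"leaf": False, "keys": [1000, 1250, 1425], "children": leaves}

found = find_leaf(root, 1300)["keys"]
```

A key equal to a separator, such as 1000, is routed to the right-hand subtree, matching the >= in the rule.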


Example of a B+ Tree

[Figure: a B+ tree with root key 1250 and second-level index nodes (0625, 1000) and (1425, 2000); branches are labelled < and >=. The leaf nodes hold every key and point to the actual data records 0350, 0625, 1000, 1250, 1300, 1425, 1600, 2000.]


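Building a B+ Tree

Keys are inserted in the order 67, 89, 123, 18, 34, 87, 99, 104, 36, 55, 78, 9.

[Figure: the first steps of the build. 67 and 89 share a single leaf node. Inserting 123 overfills it, so the leaf splits into (67) and (89, 123) and 89 is promoted to a new root node with < and >= branches. Later panels show 18 and 34 being added, with a further split. The leaf nodes point to the data records.]

(Node split a | b c: 3 keys do not fit, so split and promote the middle value.)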
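The split rule can be sketched in a few lines. This is a simplified illustration of a leaf split only, assuming a leaf holds at most two keys as in the diagram; the function name `split_leaf` is hypothetical and the handling of index-node splits is left out.

```python
def split_leaf(keys):
    """Split an overfull sorted leaf into (left, right, promoted_key)."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # The promoted key is copied up into the parent and also stays in the
    # right-hand leaf, so every key still appears at leaf level.
    return left, right, right[0]

# 67 and 89 fill the first leaf; inserting 123 forces a split.
first_split = split_leaf(sorted([67, 89, 123]))

# 18 then joins 67; inserting 34 overfills that leaf as well.
second_split = split_leaf(sorted([18, 67, 34]))
```

Keeping the promoted key in the right-hand leaf is the reason some key values occur twice in a B+ tree.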


A Review of Trees

Can permit rapid retrieval of data for both random and sequential processing.

Can be used based on primary or secondary keys.

Trees are special cases of networks; in networks records from different files are joined without a strict hierarchy being observed.


Indexes in Oracle (1)

CREATE [UNIQUE | BITMAP] INDEX index ON table (column [, column] ...);

An index is a schema object that contains an entry for each value that appears in the indexed column(s) of the table or cluster and provides direct, fast access to rows.

Indexes may be created on one or more (up to 32) columns of a table, a partitioned table, or a cluster, or on one or more scalar typed object attributes of a table or a cluster.

It is preferable to declare a primary key when creating the table, because CREATE UNIQUE INDEX will fail if the column already contains duplicates.


Indexes in Oracle (2)

An index is an ordered list of all the values that reside in a group of one or more columns at a given time. Such a list makes queries that test the values in those columns vastly more efficient. Indexes also take up storage space, and must be changed whenever the data is, so a cost-benefit analysis must be made in each case to determine whether and how indexes should be used.

Oracle can use indexes to improve performance when:

– searching for rows with specified index column values

– accessing tables in index column order

When you initially insert rows into a new table, it is generally faster to create the table, insert the rows, and then create the index. If you create the index before inserting the rows, Oracle must update the index for every row inserted.


Indexes in Oracle (3) - Multiple Indexes Per Table

Any number of indexes can be created for a table, provided that the combination of columns differs for each index. You can create more than one index using the same columns provided that you specify distinctly different combinations of the columns. For example, the following statements specify valid combinations:

CREATE INDEX emp_idx1 ON emp (ename, job);

CREATE INDEX emp_idx2 ON emp (job, ename);

Note that each index adds overhead, because it must be maintained whenever indexed data in the table is updated. Thus, updating a table with a single index will take less time than if the table had five indexes.


Indexes in Oracle (4) - Nulls

Table rows in which all key columns are NULL are not indexed.

Consider the following statement:

SELECT ename

FROM emp

WHERE comm IS NULL;

The above query does not use an index created on the COMM column.


Indexes in Oracle (5) - Bitmap Index

Bitmap indexes store the rowids associated with a key value as a bitmap. Each bit in the bitmap corresponds to a possible ROWID, and if the bit is set, it means that the row with the corresponding ROWID contains the key value. The internal representation of bitmaps is best suited for applications with low levels of concurrent transactions, such as data warehousing.

Bitmap indexes are appropriate when there are few distinct values for a column that the index is created on. An example would be a flag column that held either Y or N.

CREATE BITMAP INDEX masterflagbitmap_ix ON film_copy(masterflag);

The index holds one bitmap for each possible value, with one bit for every row in the table:

Y < 1 1 0 1 1 0 0 1 . . . . . . . . . . . . >

N < 0 0 1 0 0 1 1 0 . . . . . . . . . . . . >
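The idea behind the two bitmaps can be sketched outside the database. This is a conceptual illustration only, not Oracle's internal representation; the function name and the sample flag values (chosen to reproduce the first eight bits shown above) are assumptions.

```python
def build_bitmaps(values):
    """Return one list of 0/1 bits per distinct value, with one bit per row."""
    return {v: [1 if x == v else 0 for x in values] for v in sorted(set(values))}

# Assumed masterflag values for rows 0..7 of film_copy.
flags = ["Y", "Y", "N", "Y", "Y", "N", "N", "Y"]

bitmaps = build_bitmaps(flags)

# Finding the rows with masterflag = 'Y' means reading off the set bits.
rows_with_y = [row for row, bit in enumerate(bitmaps["Y"]) if bit]
```

With only two distinct values the index needs just two bitmaps, which is why low-cardinality columns suit this structure.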


Clusters (1)

A cluster is a schema object that contains one or more tables that all have one or more columns in common. Rows of one or more tables that share the same value in these common columns are physically stored together within the database.

Clustering provides more control over the physical storage of rows within the database. Clustering can reduce both the time it takes to access clustered tables and the space needed to store the table. After you create a cluster and add tables to it, the cluster is transparent. You can access clustered tables with SQL statements just as you can non-clustered tables.

While clustering multiple tables improves the performance of joins, it is likely to reduce the performance of full table scans, INSERT statements, and UPDATE statements that modify cluster key values.


Clusters (2) - creating an Indexed Cluster

The rows of two related tables are interleaved in a single area called a cluster. The cluster key is the column or columns by which the tables are usually joined in a query.

CREATE CLUSTER cluster (column datatype [,column datatype] . . . );

e.g.

CREATE CLUSTER workerandskill (tempname varchar2(25) );

This sets aside a space. The column name is irrelevant but the datatype must match Name in the table worker.

Next tables are created to be included in the cluster.

CREATE TABLE worker (Name Varchar2(25) not null,

Age Number,

Lodging Varchar2(15) )

CLUSTER workerandskill (Name);


Clusters (3) - creating an Indexed Cluster

Now a second table is added to the cluster

CREATE TABLE workerskill ( Name Varchar2(25) not null,

Skill Varchar2(25) not null,

Ability Varchar2(15) )

CLUSTER workerandskill (Name);

Prior to inserting rows into worker and workerskill you must create a cluster index.

CREATE INDEX workerandskill_ix ON CLUSTER workerandskill;

Note that no index columns are specified since the index is automatically built on all the columns of the cluster key. For cluster indexes, all rows are indexed.


Example of a Cluster: Name is the Cluster Key

Age  Lodging      Name                     Skill           Ability
23   PAPA KING    ADAH TALBOT              WORK            GOOD
29   ROSE HILL    ANDREW DYE
22   CRAMNER      BART SARJEANT
18   ROSE HILL    DICK JONES               SMITHY          EXCELLENT
16   MATTS        DONALD ROLLO
43   WEITBROCHT   ELBERT TALBOT            DISCUS          SLOW
27   ROSE HILL    JOHN PEARSON             COMBINE DRIVER
                                           WOODCUTTER      GOOD
                                           SMITHY          AVERAGE
     ROSE HILL    KAY AND PALMER WALLBOM

Age and Lodging come from the WORKER table; Skill and Ability come from the WORKERSKILL table; Name, the cluster key, is shared.


Clusters (4) - creating an Indexed Cluster

Each cluster key value is stored only once. It is as if the cluster were a big table containing data drawn from both of the tables that make it up.

You may want to use indexed clusters in the following cases:

– Your queries retrieve rows over a range of cluster key values.

– Your clustered tables may grow unpredictably.

You cannot specify integrity constraints as part of the definition of a cluster key column. Instead, you can associate integrity constraints with the tables that belong to the cluster.


Clusters (5) - creating a Hash Cluster

In a hash cluster, Oracle stores together rows that have the same hash key value. The hash value for a row is the value returned by the cluster's hash function.

When you create a hash cluster, you can either specify a hash function or use the Oracle internal hash function. Hash values are not actually stored in the cluster, although cluster key values are stored for every row in the cluster.

You may want to use hash clusters in the following cases:

– Your queries retrieve rows based on equality conditions involving all cluster key columns.

– Your clustered tables are static, or you can determine the maximum number of rows and the maximum amount of space required by the cluster when you create the cluster.


Clusters (6) - creating a Hash Cluster

The following statement creates a hash cluster named PERSONNEL with the cluster key column DEPARTMENT_NUMBER.

CREATE CLUSTER personnel

( department_number NUMBER )

HASHKEYS 500;

The HASHKEYS clause creates the hash cluster using an internal hash function, and specifies the number of hash values, which Oracle rounds up to the nearest prime number (503 in this case).
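The figure 503 can be checked by hand: it is the first prime at or above 500. The sketch below only illustrates that rounding step; the function names are hypothetical and it is not a description of how Oracle computes the value internally.

```python
def is_prime(n):
    """Trial division; adequate for small values such as a HASHKEYS setting."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def next_prime(n):
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n

hash_values = next_prime(500)
```

Note that 499 is also prime and is closer to 500, so the rounding has to be upward to arrive at 503.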

Now create the tables indicating the cluster in the cluster clause
