Physical DB Design 10. 1
CSE2132 Database Systems
Week 10 Lecture
Physical Database Design - File Structures
Physical DB Design 10. 2
Data Structures - What will we cover?
Underlying data structures
– File organizations
– Access modes
– Binary trees
– B+ trees
Oracle data structures
Physical DB Design 10. 3
Underlying Data Structures
Data structures are the bricks and mortar that hold databases together.
Data structures (for the ANSI/SPARC standard) are defined in the internal model level and implemented in the physical data organization.
Data structures are often hidden from the application programmer, since they are primarily used by the DBMS and Operating Systems.
A good understanding and choice of data structures matters for machine performance, improves program design, and makes communication with DBMS specialists easier.
Physical DB Design 10. 4
File Organization
A file organization is a technique for physically arranging the records of a file on a secondary storage device.
File organizations
– Sequential
– Indexed
  – Sequential (block index)
    – Hardware-dependent (ISAM)
    – Hardware-independent (VSAM)
  – Non-sequential (full index)
– Direct
  – Relative-Addressed
  – Hash-Addressed
Physical DB Design 10. 5
Record Access Modes
Sequential Access
In sequential access, record storage starts at a designated point, usually the beginning, and proceeds in a linear sequence through the file. Each record can only be retrieved by accessing all the records that physically precede it.
Random Access
In random access, a given record is accessed "out of the blue" without referencing other records in the file.
Physical DB Design 10. 6
File Organization and Access Mode
A File organization is established when the file is created, and is rarely changed. However, record access mode can change each time the file is used.
File organization    Record access mode
                     Sequential          Random
Sequential           Yes                 No (impractical)
Indexed Seq.         Yes                 Yes
Direct-Relative      Yes                 Yes
Direct-Hashed        No (impractical)    Yes
Physical DB Design 10. 7
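Indexed Sequential Architecture (Partial Index)
[Figure: an index set (many levels) with the entry 747 above the entries 363, 575 and 683; below it a sequence set with the entries 153, 252 and 363. Each sequence set entry is the highest key in a control interval (100 125 153 and 207 221 252) that holds the actual data records. Control intervals are grouped into a control area.]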
Physical DB Design 10. 8
Direct - Relative Files
Each record can be retrieved by specifying its relative record number.
The relative record number is a number 0 to n that gives the position of the record relative to the beginning of the file.
This provides a method of direct file organization.
Both sequential and direct access are supported, but it is not always easy or possible to allocate keys in a way that suits this method.
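The idea of addressing a record by its relative record number can be sketched in a few lines. This is only an illustration under assumptions: the fixed-length record layout (`RECORD`), the helper names and the in-memory `BytesIO` standing in for a disk file are all hypothetical and not part of the lecture.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte product number + 12-byte name.
RECORD = struct.Struct("<i12s")

def write_record(f, rrn, product_no, name):
    """Store a record in the slot given by its relative record number (rrn)."""
    f.seek(rrn * RECORD.size)
    f.write(RECORD.pack(product_no, name.encode("ascii")))

def read_record(f, rrn):
    """Random access: jump straight to slot rrn without reading earlier slots."""
    f.seek(rrn * RECORD.size)
    product_no, raw = RECORD.unpack(f.read(RECORD.size))
    return product_no, raw.rstrip(b"\x00").decode("ascii")

def scan(f, count):
    """Sequential access: visit slots 0..count-1 in physical order."""
    return [read_record(f, rrn) for rrn in range(count)]

f = io.BytesIO()  # stands in for a file on secondary storage
for rrn, name in enumerate(["CHESS", "COMBAT", "DEFENDER", "ZAXXON"]):
    write_record(f, rrn, 100 + rrn, name)

print(read_record(f, 2))   # direct access by relative record number
print(scan(f, 4))          # sequential pass over the same file
```

Because every record has the same length, the byte offset is simply the record number times the record size, which is what makes both access modes cheap.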
Physical DB Design 10. 9
Direct - Hashed Files
Hashed file organization suits applications that do updates and retrievals in random mode and rarely need sequential access to the data records (e.g. reservation systems). It provides rapid access to individual records based on a key.
The major disadvantage of hash organization is that sequential access is not convenient because the records are not stored in primary key sequence. But highly concurrent environments doing random access are suitable for using hash organization.
The basis of a hash file is an addressing algorithm which transforms the record identifier into a relative address.
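A minimal sketch of such an addressing algorithm follows. The lecture does not name a particular transformation, so the division-remainder method used here, the helper names and the bucket count of 7 are assumptions chosen for illustration.

```python
def key_to_int(key: str) -> int:
    """Turn a character key into an integer by reading its bytes as one number."""
    return int.from_bytes(key.encode("ascii"), "big")

def relative_address(key: str, buckets: int) -> int:
    """Division-remainder hashing: the remainder is the home bucket number."""
    return key_to_int(key) % buckets

BUCKETS = 7  # hypothetical file size, chosen only for the example

for key in ["PITFALL", "BERSERK", "ODYSSEY", "DONKEYKONG"]:
    print(key, "->", relative_address(key, BUCKETS))
```

The same key always yields the same address, which is what allows a record to be found again without an index; different keys may still collide on one address, which is why overflow handling is needed.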
Physical DB Design 10. 10
Components of a Hashed File
[Figure: an identifier passes through a transformation that addresses the primary storage area, drawn as buckets 1 to b, each with slots 1 to s. A bucket overflow technique connects the primary storage area to an overflow storage area, also drawn with slots 1 to s.]
Physical DB Design 10. 11
Hashed File Design
Load Factor (Fill Factor): The load factor is the percentage of space allocated to the file that is taken up by the records in the file. A low load factor reduces the number of records that overflow their home addresses. It is common to use 50% to 80%, using a lower load factor for files that will grow.
Bucket Capacity: Increasing the bucket capacity will also reduce the number of overflows and hence the average search length.
[Figure: average search length (axis from 1.0 to 1.5) plotted against load factor (20% to 100%), with one curve for each bucket capacity b = 1, 2, 3, 4, where b = records per bucket.]
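The effect of load factor and bucket capacity can be explored with a small simulation. This sketch is an assumption-laden stand-in for the slide's chart: it measures the fraction of records that overflow their home bucket rather than the average search length, and it replaces the hash function with uniformly random home addresses. The function name and parameters are hypothetical.

```python
import random

def overflow_fraction(load_factor, bucket_capacity, total_slots=12000, seed=1):
    """Fraction of records that do not fit in their home bucket."""
    rng = random.Random(seed)
    buckets = total_slots // bucket_capacity
    records = int(load_factor * total_slots)
    used = [0] * buckets
    overflowed = 0
    for _ in range(records):
        home = rng.randrange(buckets)  # stands in for a uniform hash function
        if used[home] < bucket_capacity:
            used[home] += 1            # record stored at its home address
        else:
            overflowed += 1            # record must go to the overflow area
    return overflowed / records

for b in (1, 2, 4):
    row = [round(overflow_fraction(lf, b), 3) for lf in (0.5, 0.8)]
    print("b =", b, "overflow at 50% and 80% load:", row)
```

The total number of slots is held constant, so a larger bucket capacity means fewer but roomier buckets. Running it shows the two trends the slide describes: overflows rise with the load factor and fall as the bucket capacity grows.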
Physical DB Design 10. 12
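Comparison of Organizations
[Figure, Sequential: the records ASTEROIDS, BREAKOUT, COMBAT, ..., ZAXXON are stored one after another and are reached by reading from the start of the file.]
[Figure, Indexed Sequential: a key is looked up through an index (entries such as H, P, Z and A, D, K, M) that leads to the records ASTEROIDS, ..., MEGAMANIA, ..., ZAXXON.]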
Physical DB Design 10. 13
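Comparison of Organizations (2)
[Figure, Direct - Relative: the records CHESS, COMBAT, DEFENDER, ..., ZAXXON occupy positions 1, 2, 3, ..., n and are addressed by relative record number.]
[Figure, Direct - Hashed: a key passes through a hashing routine that produces a relative record number; the records PITFALL, BERSERK, ODYSSEY, DONKEYKONG occupy positions 1, 2, 3, ..., n in no key sequence.]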
Physical DB Design 10. 14
Binary Trees
A tree is a non-linear data structure in which each element can have several "next" elements (branching).
A binary tree has a maximum of two branches per element or node.
A node consists of some data and a maximum of two pointers, a left pointer to the left branch and a right pointer to the right branch. If there is no left or right branch then a nil pointer is used.
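The node layout described above can be sketched as follows. The class and function names are hypothetical, Python's `None` plays the role of the nil pointer, and the keys are product numbers of the kind used in the lecture's PRODUCT example.

```python
class Node:
    def __init__(self, key, data=None):
        self.key = key
        self.data = data
        self.left = None    # nil pointer until a smaller key is inserted
        self.right = None   # nil pointer until a larger key is inserted

def insert(root, key, data=None):
    """Insert key below root and return the (possibly new) root."""
    if root is None:
        return Node(key, data)
    if key < root.key:
        root.left = insert(root.left, key, data)
    elif key > root.key:
        root.right = insert(root.right, key, data)
    return root             # duplicate keys are ignored

def search(root, key):
    """Follow left/right pointers until the key is found or a nil pointer is hit."""
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node

root = None
for product_no in [1000, 1600, 350, 2000, 975, 625]:
    root = insert(root, product_no)

print(search(root, 625) is not None)   # key present
print(search(root, 700) is None)       # key absent
```

Each comparison discards one branch, so a lookup touches at most one node per level of the tree.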
Physical DB Design 10. 15
A Diagram of a Binary Tree
[Figure: basic binary tree record layout for PRODUCT, consisting of the primary key (PRODUCT#), the data, a less-than pointer and a greater-than pointer (labelled LINK and RLINK on the slide). The tree is built in steps: (1) initial tree, (2) insert 1000, (3) insert 1600, (4) insert 0350, (5) insert 2000, (6) insert 0975, (7) insert 0625. Smaller keys follow the < pointer and larger keys follow the > pointer.]
Physical DB Design 10. 16
An Example of a Binary Tree
[Figure: a binary tree with 1000 at the root, containing the keys 1000, 0350, 1600, 2000, 0975, 0625, 1250, 1425, 1775 and 0100, linked by < and > pointers.]
Task: Indicate the different traversals on this diagram.
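As a starting point for the task, the three standard depth-first traversals can be sketched as below. The exact shape of the slide's tree did not survive extraction, so the nested-tuple tree here is an assumption: it is the binary search tree that these ten keys form with 1000 at the root, 0350 and 1600 as its children, and each remaining key placed by the less-than / greater-than rule.

```python
# Each node is (key, left, right); None is the nil pointer.
# Assumed shape, reconstructed from the keys on the slide.
TREE = (1000,
        (350,
         (100, None, None),
         (975, (625, None, None), None)),
        (1600,
         (1250, None, (1425, None, None)),
         (2000, (1775, None, None), None)))

def preorder(node):
    """Visit the node, then its left subtree, then its right subtree."""
    if node is None:
        return []
    key, left, right = node
    return [key] + preorder(left) + preorder(right)

def inorder(node):
    """Visit the left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    key, left, right = node
    return inorder(left) + [key] + inorder(right)

def postorder(node):
    """Visit the left subtree, then the right subtree, then the node."""
    if node is None:
        return []
    key, left, right = node
    return postorder(left) + postorder(right) + [key]

print("pre: ", preorder(TREE))
print("in:  ", inorder(TREE))
print("post:", postorder(TREE))
```

For a binary search tree the in-order traversal returns the keys in ascending order, which is how such a tree supports sequential processing.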
Physical DB Design 10. 17
B Trees
The problem with binary trees is balance: the tree can easily deteriorate into a linked list, and the reduced search times are then lost. This problem is overcome in B trees.
B is usually read as Balanced: all the leaves are the same distance from the root. B trees guarantee a predictable efficiency.
There are several varieties of B trees; most applications use the B+ tree.
A B+tree of degree m has the following properties:
1. All leaves are at the same level, that is the same depth from the root.
2. A non-leaf node that has n branches will contain n-1 keys.
Physical DB Design 10. 18
Example of a B Tree
[Figure: a B tree with 1250 at the root, containing the keys 1250, 0625, 1000, 1277, 1282, 1600, 0350, 1291, 2107, 1425 and 2000.]
A B tree provides balance and quick direct access, but sequential processing can be slow. Because of this the B+ tree was introduced.
In a B+ tree all key values occur in a leaf node so that sequential processing can be supported. This means that the leaf nodes have a different structure from high level nodes, and some key values will occur twice in the tree.
Physical DB Design 10. 19
B+ Tree Node Structure
A high level node: P1 K1 P2 K2 ... Pn-1 Kn-1 Pn
– P1: pointer to subtree for keys < K1
– P2: pointer to subtree for keys >= K1 & < K2
– Pn-1: pointer to subtree for keys >= Kn-2 & < Kn-1
– Pn: pointer to subtree for keys >= Kn-1
A leaf node (every key value appears in a leaf node): P1 K1 P2 K2 ... Pn-1 Kn-1 Pn
– P1: pointer to record (block) with key K1
– P2: pointer to record (block) with key K2
– Pn-1: pointer to record (block) with key Kn-1
– Pn: pointer to leaf with smallest key greater than Kn-1
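The pointer rules above translate directly into a search routine. The following is a sketch, not the lecture's own code: the class names, the record labels and the small hand-built tree are hypothetical. `bisect_right` picks the child pointer so that keys equal to a separator go to the right, matching the ">=" rule.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys, records):
        self.keys = keys
        self.records = records   # records[i] is the record for keys[i]
        self.next = None         # pointer to the leaf with the next larger keys

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # n-1 separator keys
        self.children = children  # n subtree pointers

def find_leaf(node, key):
    """Descend from the root to the leaf that would hold key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node

def lookup(root, key):
    """Random access: return the record for key, or None if absent."""
    leaf = find_leaf(root, key)
    if key in leaf.keys:
        return leaf.records[leaf.keys.index(key)]
    return None

def range_scan(root, low, high):
    """Sequential access: walk the leaf chain collecting keys in [low, high]."""
    leaf, found = find_leaf(root, low), []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return found
            if k >= low:
                found.append(k)
        leaf = leaf.next
    return found

# Hypothetical two-level tree built by hand for the example.
l1 = Leaf([350, 625], ["r350", "r625"])
l2 = Leaf([1000], ["r1000"])
l3 = Leaf([1250, 1300], ["r1250", "r1300"])
l4 = Leaf([1425, 1600, 2000], ["r1425", "r1600", "r2000"])
l1.next, l2.next, l3.next = l2, l3, l4
root = Internal([1250], [Internal([1000], [l1, l2]), Internal([1425], [l3, l4])])

print(lookup(root, 1300))
print(range_scan(root, 600, 1300))
```

A single lookup touches one node per level, while the range scan uses the leaf-to-leaf pointers and never returns to the upper levels.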
Physical DB Design 10. 20
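Example of a B+ Tree
[Figure: a B+ tree with 1250 in the root. The leaf nodes hold the keys 0350, 0625, 1000, 1250, 1300, 1425, 1600 and 2000 and point to the actual data records. Keys smaller than a node's key branch left (<); keys greater than or equal to it branch right (>=).]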
Physical DB Design 10. 21
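Building a B+ Tree
Keys inserted in order: 67, 89, 123, 18, 34, 87, 99, 104, 36, 55, 78, 9
[Figure: the tree after the first insertions. 67 and 89 share one leaf node. Inserting 123 gives three keys, which do not fit, so the node splits and the middle value 89 is promoted to a new root node, with keys < 89 to the left and keys >= 89 to the right. Inserting 18 and then 34 splits the left leaf in the same way and promotes 34. Leaf nodes point to the data records.]
(Node split: 3 keys do not fit, so split and promote the middle value.)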
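The split-and-promote rule can be sketched as follows, assuming (as the slide's example suggests) that a node holds at most two keys. The class and function names are hypothetical, and pointers to data records are omitted so that only the key structure is shown.

```python
from bisect import bisect_right, insort

MAX_KEYS = 2  # assumed node capacity: a third key forces a split

class BPlusNode:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # subtree pointers (high level nodes only)
        self.next = None    # next leaf in key order (leaf nodes only)

def _insert(node, key):
    """Insert below node; return (promoted key, new right node) if node split."""
    if node.leaf:
        insort(node.keys, key)
        if len(node.keys) <= MAX_KEYS:
            return None
        mid = len(node.keys) // 2
        right = BPlusNode(leaf=True)
        right.keys, node.keys = node.keys[mid:], node.keys[:mid]
        right.next, node.next = node.next, right
        return right.keys[0], right        # middle key is copied up
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    node.keys.insert(i, split[0])
    node.children.insert(i + 1, split[1])
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2
    promoted = node.keys[mid]              # middle key moves up
    right = BPlusNode(leaf=False)
    right.keys, right.children = node.keys[mid + 1:], node.children[mid + 1:]
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return promoted, right

def insert(root, key):
    """Insert key and return the root, which changes when the old root splits."""
    split = _insert(root, key)
    if split is None:
        return root
    new_root = BPlusNode(leaf=False)
    new_root.keys, new_root.children = [split[0]], [root, split[1]]
    return new_root

def leaf_keys(root):
    """Read all keys in order by following the leaf chain."""
    node = root
    while not node.leaf:
        node = node.children[0]
    keys = []
    while node is not None:
        keys.extend(node.keys)
        node = node.next
    return keys

root = BPlusNode(leaf=True)
for k in [67, 89, 123, 18, 34, 87, 99, 104, 36, 55, 78, 9]:
    root = insert(root, k)
print(leaf_keys(root))
```

After the first three keys the root holds 89 with leaves [67] and [89, 123], and after 18 and 34 the root holds 34 and 89, which matches the steps described on the slide. The tree only grows in height when the root itself splits, which is why all leaves stay at the same depth.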
Physical DB Design 10. 22
A Review of Trees
Can permit rapid retrieval of data for both random and sequential processing.
Can be used based on primary or secondary keys.
Trees are special cases of networks; in networks records from different files are joined without a strict hierarchy being observed.
Physical DB Design 10. 23
Indexes in Oracle(1)
CREATE [bitmap] [unique] INDEX index ON table(column [,column]..);
An index is a schema object that contains an entry for each value that appears in the indexed column(s) of the table or cluster and provides direct, fast access to rows.
Indexes may be created on one or more (up to 32) columns of a table, a partitioned table, or a cluster, or on one or more scalar typed object attributes of a table or a cluster.
It is preferable to declare the primary key when creating the table, as CREATE UNIQUE INDEX will fail if the indexed columns already contain duplicates.
Physical DB Design 10. 24
Indexes in Oracle(2)
An index is an ordered list of all the values that reside in a group of one or more columns at a given time. Such a list makes queries that test the values in those columns vastly more efficient. Indexes also take up storage space, and must be changed whenever the data is, so a cost-benefit analysis must be made in each case to determine whether and how indexes should be used.
Oracle can use indexes to improve performance when:
– searching for rows with specified index column values
– accessing tables in index column order
When you initially insert rows into a new table, it is generally faster to create the table, insert the rows, and then create the index. If you create the index before inserting the rows, Oracle must update the index for every row inserted.
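The "ordered list" view of an index can be illustrated with a sorted list and binary search. This is a conceptual sketch only, not how Oracle stores an index: the sample rows, the integer row identifiers and the function names are made up for the example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical emp rows: (rowid, ename, job). Rows are stored in arrival order.
rows = [(0, "SMITH", "CLERK"), (1, "ALLEN", "SALESMAN"),
        (2, "KING", "PRESIDENT"), (3, "BLAKE", "MANAGER"),
        (4, "ALLEN", "CLERK")]

# The index: (key value, rowid) pairs kept in key order.
index = sorted((ename, rowid) for rowid, ename, _ in rows)
keys = [ename for ename, _ in index]

def rowids_for(ename):
    """Searching for rows with a specified index column value."""
    lo, hi = bisect_left(keys, ename), bisect_right(keys, ename)
    return [rowid for _, rowid in index[lo:hi]]

def rows_in_index_order():
    """Accessing the table in index column order, without sorting the table."""
    return [rows[rowid][1] for _, rowid in index]

print(rowids_for("ALLEN"))
print(rows_in_index_order())
```

The cost side is visible too: every insert into `rows` would also require an insert at the right position in `index`, which is the maintenance overhead the slide refers to.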
Physical DB Design 10. 25
Indexes in Oracle(3) - Multiple Indexes Per Table
Any number of indexes can be created for a table, provided that the combination of columns differs for each index. You can create more than one index using the same columns provided that you specify distinctly different combinations of the columns. For example, the following statements specify valid combinations:
CREATE INDEX emp_idx1 ON emp (ename, job);
CREATE INDEX emp_idx2 ON emp (job, ename);
Note that each index increases the processing time needed to maintain the table during updates to indexed data, because every index must be maintained when the table is updated. Thus, updating a table with a single index will take less time than if the table had five indexes.
Physical DB Design 10. 26
Indexes in Oracle(4) - Nulls
Table rows in which all key columns are NULL are not indexed.
Consider the following statement:
SELECT ename
FROM emp
WHERE comm IS NULL;
The above query does not use an index created on the COMM column, because rows where COMM is NULL have no entry in that index.
Physical DB Design 10. 27
Indexes in Oracle(5) - Bitmap Index
Bitmap indexes store the rowids associated with a key value as a bitmap. Each bit in the bitmap corresponds to a possible ROWID, and if the bit is set, it means that the row with the corresponding ROWID contains the key value. The internal representation of bitmaps is best suited for applications with low levels of concurrent transactions, such as data warehousing.
Bitmap indexes are appropriate when there are few distinct values for a column that the index is created on. An example would be a flag column that held either Y or N.
CREATE BITMAP INDEX masterflagbitmap_ix ON film_copy(masterflag);
The index holds a bitmap for each possible value, with one bit for every row in the table:
Y < 1 1 0 1 1 0 0 1 . . . . . . . . . . . . >
N < 0 0 1 0 0 1 1 0 . . . . . . . . . . . . >
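The two bitmaps shown above can be reproduced with integers used as bit strings. This is a conceptual sketch, not Oracle's internal representation: row positions stand in for ROWIDs, and the function names are hypothetical.

```python
# masterflag values for the first eight rows, as in the bitmaps above.
flags = ["Y", "Y", "N", "Y", "Y", "N", "N", "Y"]

def build_bitmaps(values):
    """One bitmap per distinct value; bit i is set if row i holds that value."""
    bitmaps = {}
    for position, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps

def matching_rows(bitmap):
    """Row positions whose bit is set in the bitmap."""
    return [pos for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]

bitmaps = build_bitmaps(flags)
print(matching_rows(bitmaps["Y"]))           # rows where masterflag = 'Y'
print(bin(bitmaps["N"]).count("1"))          # COUNT(*) where masterflag = 'N'
print(matching_rows(bitmaps["Y"] & bitmaps["N"]))  # no row holds both values
```

With only two distinct values the whole index is two bit strings, and conditions can be combined with cheap bitwise operations, which is why low-cardinality columns suit this kind of index.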
Physical DB Design 10. 28
Clusters(1)
A cluster is a schema object that contains one or more tables that all have one or more columns in common. Rows of one or more tables that share the same value in these common columns are physically stored together within the database.
Clustering provides more control over the physical storage of rows within the database. Clustering can reduce both the time it takes to access clustered tables and the space needed to store the table. After you create a cluster and add tables to it, the cluster is transparent. You can access clustered tables with SQL statements just as you can non-clustered tables.
While clustering multiple tables improves the performance of joins, it is likely to reduce the performance of full table scans, INSERT statements, and UPDATE statements that modify cluster key values.
Physical DB Design 10. 29
Clusters(2) - creating an Indexed Cluster
The rows of two related tables are interleaved in a single area called a cluster. The cluster key is the column or columns by which the tables are usually joined in a query.
CREATE CLUSTER cluster (column datatype [,column datatype] . . . );
e.g.
CREATE CLUSTER workerandskill (tempname varchar2(25) );
This sets aside a space. The column name is irrelevant but the datatype must match Name in the table worker.
Next tables are created to be included in the cluster.
CREATE TABLE worker (Name Varchar2(25) not null,
Age Number,
Lodging Varchar2(15) )
CLUSTER workerandskill (Name);
Physical DB Design 10. 30
Clusters(3) - creating an Indexed Cluster
Now a second table is added to the cluster
CREATE TABLE workerskill ( Name Varchar2(25) not null,
Skill Varchar2(25) not null,
Ability Varchar2(15) )
CLUSTER workerandskill (Name);
Prior to inserting rows into worker and workerskill you must create a cluster index.
CREATE INDEX workerandskill_ix ON CLUSTER workerandskill;
Note that no index columns are specified since the index is automatically built on all the columns of the cluster key. For cluster indexes, all rows are indexed.
Physical DB Design 10. 31
Example of a Cluster: Name is the Cluster Key
From the WORKER table    Cluster key               From the WORKERSKILL table
Age   Lodging            Name                      Skill             Ability
23    PAPA KING          ADAH TALBOT               WORK              GOOD
29    ROSE HILL          ANDREW DYE
22    CRAMNER            BART SARJEANT
18    ROSE HILL          DICK JONES                SMITHY            EXCELLENT
16    MATTS              DONALD ROLLO
43    WEITBROCHT         ELBERT TALBOT             DISCUS            SLOW
27    ROSE HILL          JOHN PEARSON              COMBINE DRIVER
                                                   WOODCUTTER        GOOD
                                                   SMITHY            AVERAGE
      ROSE HILL          KAY AND PALMER WALLBOM
Physical DB Design 10. 32
Clusters(4) - creating an Indexed Cluster
Each cluster key value is stored only once. It is as if the cluster were a big table containing data drawn from both of the tables that make it up.
You may want to use indexed clusters in the following cases:
Your queries retrieve rows over a range of cluster key values.
Your clustered tables may grow unpredictably.
You cannot specify integrity constraints as part of the definition of a cluster key column. Instead, you can associate integrity constraints with the tables that belong to the cluster.
Physical DB Design 10. 33
Clusters(5) - creating a Hash Cluster
In a hash cluster, Oracle stores together rows that have the same hash key value. The hash value for a row is the value returned by the cluster's hash function.
When you create a hash cluster, you can either specify a hash function or use the Oracle internal hash function. Hash values are not actually stored in the cluster, although cluster key values are stored for every row in the cluster.
You may want to use hash clusters in the following cases:
Your queries retrieve rows based on equality conditions involving all cluster key columns.
Your clustered tables are static or you can determine the maximum number of rows and the maximum amount of space required by the cluster when you create the cluster.
Physical DB Design 10. 34
Clusters(6) - creating a Hash Cluster
The following statement creates a hash cluster named PERSONNEL with the cluster key column DEPARTMENT_NUMBER.
CREATE CLUSTER personnel
( department_number NUMBER )
HASHKEYS 500;
The HASHKEYS clause makes this a hash cluster that uses the internal hash function, and specifies the number of hash values. Oracle rounds this number up to the next prime number (503 in this case).
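The rounding can be checked with a few lines of code. This is a plain trial-division sketch with hypothetical function names; it only reproduces the arithmetic and says nothing about how Oracle computes the value internally.

```python
def is_prime(n):
    """Trial division, adequate for small values such as HASHKEYS settings."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True

def round_up_to_prime(n):
    """Smallest prime that is greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n

print(round_up_to_prime(500))  # 503
```

Note that 499 is also prime and is closer to 500, so the result 503 shows the rounding goes upward rather than to the closest prime in either direction.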
Now create the tables indicating the cluster in the cluster clause