
CG171 - Database Implementation and Development (Physical Database Design) – Lecture 7

Storage Allocation & Data Access Methods

By Dr. Akhtar Ali


1. Storage Allocation


Storage Allocation – Logical and Physical View

Database organisation
– Database: state (open, audited, down)
– Tablespace: mode (online, offline, readonly)
– Data segment: type (temporary, permanent)
– Data extent: initial size, next size, % increase
– Data block: size

OS organisation
– Computer: IP address
– Disk device: label, volume size
– Data file: directory, file size
– Memory block: address, block size

Relationships shown in the diagram: located in, allocated for


Storage Allocation - Physical Files Allocation

How will the database be physically stored? One physical file or many?
– e.g. all data in one physical file?
– or each table or record type in its own physical file?
– data definitions (metadata) or indexes in separate files?

On one disk or over several?
» or even distributed across a network?

What is the optimum block size for each file?
– a large block size allows more records to be read together in one physical read
» useful for sequential access or when related records are stored together
– a small block size is more efficient if records are accessed in a random manner
– block size should be chosen to accommodate the most frequently accessed physical groups of records
» usually operating system specific - e.g. x*512 bytes for small blocks up to 4k for large blocks on Windows NT; 2k for small and up to 32k for large blocks on UNIX
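The block size trade-off above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming fixed-length records that never span blocks; the function names and the 10,000-record, 200-byte workload are hypothetical and chosen only for illustration.

```python
import math

def records_per_block(block_size: int, record_size: int) -> int:
    """Number of whole records that fit in one block (no record spans blocks)."""
    return block_size // record_size

def reads_for_full_scan(num_records: int, block_size: int, record_size: int) -> int:
    """Physical block reads needed to read every record once, in storage order."""
    per_block = records_per_block(block_size, record_size)
    if per_block == 0:
        raise ValueError("record does not fit in a single block")
    return math.ceil(num_records / per_block)

# Hypothetical workload: 10,000 records of 200 bytes each.
for block_size in (2048, 32768):
    print(block_size, reads_for_full_scan(10_000, block_size, 200))
```

A sequential scan needs far fewer physical reads with the larger block, while a random lookup of a single record still costs one read either way and transfers more unwanted data when the block is large.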


Storage Allocation – Physical Data Distribution


Storage Allocation – Physical Memory Allocation


Example – File Allocation in Oracle SQL

CREATE DATABASE <database name> DATAFILE <filename> ...
» specifies a <database name>.CTL file to hold all control data
» also specifies several system files containing all table data unless storage areas are explicitly specified

CREATE [TEMPORARY] TABLESPACE <storage name> DATAFILE <filename> ...
» used to create separate storage for system operations or database data
» physical file <filename> will be automatically mapped by the DBMS to <storage name>
» the <filename> can include a full path, allowing network files to be used


File Allocation in Oracle SQL - continued

Databases with explicit clauses for datafile control:
» control the overall growth of the database's physical storage through a set of specified parameters

Datafile parameters
– MAXDATAFILES - limits the number of datafiles which can be opened for one database
– AUTOEXTEND (ON or OFF) - allows additional space to be allocated automatically when the file gets full
– NEXT - the size of the next increment of space allocated when the file is extended
– MAXSIZE - the upper limit to which a datafile can be extended

Example
CREATE DATABASE newtest
  MAXDATAFILES 10
  DATAFILE 'diska:dbone.dat' SIZE 2M,
           'disk1:df1.dbf' AUTOEXTEND ON,
           'disk2:df2.dbf' AUTOEXTEND ON NEXT 10M MAXSIZE 128M;


Storage Allocation - Database Tables Storage

How will the database tables be physically spread? Entirely on the disk and/or in cached memory?
» Frequent vs. infrequent data use

In one physical storage area (block) or in several?
– All data is static, no growth of tables projected
– Dynamic data, table growth predicted

What is the size of each physical and logical storage area to be used?
– Initial storage size
– Size and number of the automatic extensions
– Limits for extending the storage area


Storage Allocation – Database Tables Storage - cntd


Example - Table Storage in Oracle SQL - continued

Tables with clauses for explicit tablespace control
» control the growth of the tablespace segments used for physical storage of database tables through a set of specified parameters

Storage parameters
– INITIAL <integer>[K|M] - the size of the first extent allocated to the segment
– NEXT <integer>[K|M] - the size of the next extent allocated when the segment is extended
– MINEXTENTS <integer> - the number of extents allocated when the segment is created
– MAXEXTENTS <integer> - the maximum number of extents the segment may have
– PCTINCREASE <integer> - the percentage by which NEXT grows after each further extent is allocated
– OPTIMAL <integer>[K|M] | NULL - the size Oracle tries to maintain for a rollback segment (applies to rollback segments only)

Example
CREATE TABLE salgrade
  (grade NUMBER CONSTRAINT pk_salgrad PRIMARY KEY,
   losal NUMBER,
   hisal NUMBER)
  TABLESPACE human_resource
  STORAGE (INITIAL 64 NEXT 64 MINEXTENTS 1 MAXEXTENTS 5)
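How INITIAL, NEXT, PCTINCREASE and MAXEXTENTS interact is easiest to see by listing the extent sizes they produce. The sketch below is a simplified model, assuming the first extent is INITIAL, the second is NEXT, and each later extent grows by PCTINCREASE percent; it ignores the rounding to whole data blocks that a real DBMS applies. The function name and the PCTINCREASE value of 50 are hypothetical (the slide's example does not set PCTINCREASE).

```python
def extent_sizes(initial: int, next_size: int, pct_increase: int, max_extents: int) -> list:
    """Sizes of successive extents under a simplified INITIAL/NEXT/PCTINCREASE model."""
    sizes = []
    upcoming = float(next_size)
    for extent_no in range(max_extents):
        if extent_no == 0:
            sizes.append(initial)          # first extent uses INITIAL
        else:
            sizes.append(round(upcoming))  # later extents use the current NEXT
            upcoming *= 1 + pct_increase / 100  # NEXT grows by PCTINCREASE percent
    return sizes

# The slide's values (INITIAL 64, NEXT 64, MAXEXTENTS 5) with an assumed PCTINCREASE of 50.
sizes = extent_sizes(64, 64, 50, 5)
print(sizes, "total:", sum(sizes))
```

With PCTINCREASE 0 every extent after the first has the same size, so the segment can never exceed INITIAL + (MAXEXTENTS - 1) * NEXT.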


Storage Allocation - Records Placement

For each record type, it is necessary to specify how and where it will be stored

Each record type should be stored in a way which gives best performance for the most important functions
– the most frequent, on-line functions are likely to be most important
– infrequent or off-line (batch) functions are probably less important
– but this also depends on the business perspective

Analyse the types of access required by these functions:
– e.g. store new record?
– access an individual record directly via the primary key?
– access a range of records sequentially in primary key sequence?
– access a record or records from a related master record?
– access via a secondary key?
– access records in no particular sequence?


Storage Allocation - Records Placement - cntd


Records may be stored contiguously, but record placement will also depend on the number of records
– e.g. if there are only a few records then they can be stored in one physical block
– e.g. related records can be stored together, but not if the number is large

Records may be stored serially as they arrive
– simply add new records to the end of the file, and extend the file when full
– a good method for storing transaction data or archiving
» where the main overhead is storing new records
» but the data is infrequently accessed

Records may be stored sequentially in primary key order for fast range search and direct match
– allows sequential access for batch processing of similar data

Records may be stored randomly using an algorithm on the primary key
– allows fast access for processing of single matching data




Indexed Sequential - the most popular
– the primary key index can be a very efficient ‘limit’ index
» the index only needs to record the highest key value in each block
» the index does not need updating when records are added or deleted, unless the highest key value in a block changes
– e.g. store Order records in Order number sequence to allow efficient production of pick lists, invoices etc.

Index (highest key value in each block):
Block   Highest key
B1      R4
B2      R10
B3      R14
B4      R20

Database File - blocks B1, B2 etc., containing data records R1, R3 etc.:
B1: R1 R3 R4
B2: R6 R7 R10
B3: R11 R14
B4: R15 R16 R18 R20

where will record R12 be stored?
where will record R5 be stored?
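The two questions above can be answered mechanically from the limit index: a record belongs in the first block whose highest key is not smaller than the record's key. This is a minimal sketch using the index values from the slide; the function name is hypothetical, and sending keys beyond the last limit to the last block is an assumption (a real file might use an overflow area instead).

```python
from bisect import bisect_left

# Limit index from the slide: (highest key value in the block, block name).
LIMIT_INDEX = [(4, "B1"), (10, "B2"), (14, "B3"), (20, "B4")]

def block_for_key(key: int, index=LIMIT_INDEX) -> str:
    """Return the block whose key range covers `key`, by binary search on the limits."""
    highest_keys = [highest for highest, _ in index]
    pos = bisect_left(highest_keys, key)  # first block with highest key >= key
    if pos == len(index):
        # Assumption: keys above every limit go to the last block.
        return index[-1][1]
    return index[pos][1]

print(block_for_key(12))  # R12
print(block_for_key(5))   # R5
```

Under these rules R12 falls in B3 (between the limits R10 and R14) and R5 falls in B2, and neither insertion changes the index because neither becomes the highest key in its block.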




Records may be stored randomly using an algorithm on the primary key (hashing)
– allows direct, fast access to individual records
– no need to maintain or access an index
– but sequential access will be very inefficient
» it will require an index to be maintained, or the records sorted
– e.g. store Customer records according to an algorithm on Cust ref
– algorithm = divide key value by 1000 and use remainder as address

B1: R1 R1001 R3001
B2: R2002 R2
B3: R1003 R4003 R2003
B4: R1004 R3004 R4

no need for an index
where will record R5123 be stored?
how many blocks in file?
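The hashing rule on this slide (divide the key by 1000 and use the remainder as the address) can be sketched in a few lines. The names below are hypothetical; the divisor 1000 and the sample keys come from the slide.

```python
from collections import defaultdict

DIVISOR = 1000  # from the slide: divide the key value by 1000

def block_address(key: int) -> int:
    """Block address for a record: the remainder after dividing the key by DIVISOR."""
    return key % DIVISOR

# Place the records shown in the slide's diagram plus R5123.
keys = [1, 1001, 3001, 2002, 2, 1003, 4003, 2003, 1004, 3004, 4, 5123]
blocks = defaultdict(list)
for key in keys:
    blocks[block_address(key)].append(key)

for address in sorted(blocks):
    print(f"B{address}:", blocks[address])
```

Since a remainder after division by 1000 can take any value from 0 to 999, the file needs 1000 addressable blocks, and R5123 lands in block 123. Records that share a remainder compete for the same block, which is why hashed files need overflow handling.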




Records may be stored in physical groups of related records (clusters or partitions)
– the master record can be stored as required - serial / sequential / random
– the detail records are then stored in the same or adjacent block(s)
– e.g. store Order Header and Order Item records together in same block(s)
– related records can be read together in one physical read from disk
– but if detail records need to be accessed independently of the master then they will have to be indexed additionally

Both random and sequential storage require overflow facilities and periodic reorganisation

B1: H23 I23/1 I23/2
B2: H92 I92/1 I92/2 I92/4
B3: H16 I16/1 I16/2
B4: H74 I74/1


Clustered tables


Example - Clustered tables in Oracle SQL

CREATE CLUSTER [<schema>.]<logical cluster name> (<cluster keys>) [TABLESPACE <physical storage name>] …
» clusters store records from different tables sharing the same cluster key
» clusters can be indexed (sorted) or hashed for fast information retrieval

Example: hashed cluster containing two tables

-- each clustered table must supply one column for every cluster key column,
-- so the cluster key here is deptno, which both tables contain
CREATE CLUSTER personnel (deptno NUMBER(2)) HASHKEYS 20;

CREATE TABLE dept
  (deptno NUMBER(2), dname VARCHAR2(9), loc VARCHAR2(9))
  CLUSTER personnel (deptno);

CREATE TABLE emp
  (empno NUMBER(4), ename VARCHAR2(30), deptno NUMBER(2), phoneno INTEGER)
  CLUSTER personnel (deptno);

For physical grouping of records into a single storage area


Partitioned tables


Example - Partitioned tables in Oracle SQL

Used for both table and index data storage
Both physical (e.g. size) and logical criteria for partitioning (e.g. interval of values)
Partitions are accessible by name directly in SQL
Example: table partitioning by the date values of an attribute

CREATE TABLE xansactions
  (trade_date DATE,
   num_shares NUMBER(10),
   price NUMBER(5,2) …)
  STORAGE (INITIAL 100K NEXT 50K) LOGGING
  PARTITION BY RANGE (trade_date)
  (PARTITION sx1992 VALUES LESS THAN (TO_DATE('01-JAN-93','DD-MON-YY')) TABLESPACE ts0,
   PARTITION sx1993 VALUES LESS THAN (TO_DATE('01-JAN-94','DD-MON-YY')) TABLESPACE ts1,
   …

For logical partitioning of physical storage area into parts
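Range partitioning routes each row to the first partition whose VALUES LESS THAN bound is strictly greater than the row's partition key. The sketch below models only the two partitions visible on the slide (the remaining ones are elided there and are not guessed at here), and it assumes the two-digit years denote 1993 and 1994; the function name is hypothetical.

```python
from bisect import bisect_right
from datetime import date

# (exclusive upper bound, partition name) for the two partitions shown on the slide.
PARTITIONS = [
    (date(1993, 1, 1), "sx1992"),
    (date(1994, 1, 1), "sx1993"),
]

def partition_for(trade_date: date) -> str:
    """Return the partition whose VALUES LESS THAN bound is the first one above trade_date."""
    bounds = [bound for bound, _ in PARTITIONS]
    pos = bisect_right(bounds, trade_date)  # index of the first bound > trade_date
    if pos == len(PARTITIONS):
        raise ValueError(f"no partition accepts {trade_date}")
    return PARTITIONS[pos][1]

print(partition_for(date(1992, 6, 15)))
print(partition_for(date(1993, 1, 1)))
```

A date equal to a bound belongs to the next partition, because the bound itself is excluded by LESS THAN.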


Index-organized tables


Example - Index organized tables in Oracle SQL

The primary key of the table is ordered for fast exact match and range search

All attributes are stored together with the primary key directly in the index space, so new placements or updates do not require reordering

CREATE TABLE docindex
  (token CHAR(20),
   doc_id NUMBER,
   token_frequency NUMBER,
   token_offsets VARCHAR2(512),
   CONSTRAINT pk_idx PRIMARY KEY (token, doc_id))
  ORGANIZATION INDEX TABLESPACE ind_tbs ...

For sequential ordering of the physical location of table records


Storage Allocation - Records Placement - cntd

It may be useful to analyse the record access requirements:

Record type: CUSTOMER

                                      Type of access
Function        On-line/   Store      Primary key   Primary key    Direct by
                Off-line              (direct)      (sequential)   Cust name
New Customer    On         100/day
Place Order     On                    1000/day
Print Invoices  Off                                 5000/week
Enquiry         On                    200/day                      100/day

Add other access types and functions as required


Storage Allocation - Linking Related Records

For each relationship type, how will the physical access path, from one record to its related records, be implemented?

By physical grouping (i.e. clustering)
– i.e. by storing records together as described above
– a relationship where the master and its detail records are stored in the same physical group is called a ‘primary’ relationship in SSADM
– other relationships, where the master and detail records are physically separated, are known as ‘secondary’ relationships in SSADM

By logical separation (i.e. partitioning)
– storing records in successive partitions, e.g. splitting the year into months
– each partition can be managed separately (storing, searching, backup, etc.)
– each partition can also be indexed, and the indexes can also be partitioned


Storage Allocation - Linking Related Records - cntd


Records may be stored in physical sequences (chains) by linked lists
– the addresses of related records are stored with the data record itself
» e.g. a Customer record might hold the address of the latest Order record for that Customer
» each Order record could hold the address of the previous Order record for that Customer, and the address of the Customer record itself

Customer record
– address of latest Order record

Order record 1089
– address of previous Order
– address of Cust record

Order record 972
– address of previous Order
– address of Cust record
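The chain described above can be followed with a simple loop: start at the Customer record, jump to its latest Order, then keep following the "previous Order" address until it runs out. This is a minimal sketch in which a dictionary stands in for disk storage; the class names, field names and the addresses 10, 52 and 31 are hypothetical, while the order numbers 1089 and 972 are the ones on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

@dataclass
class CustomerRecord:
    name: str
    latest_order_addr: Optional[int]    # address of latest Order record

@dataclass
class OrderRecord:
    order_no: int
    previous_order_addr: Optional[int]  # address of previous Order, None at end of chain
    customer_addr: int                  # address of the owning Customer record

Storage = Dict[int, Union[CustomerRecord, OrderRecord]]

def orders_for_customer(storage: Storage, customer_addr: int) -> List[int]:
    """Follow the chain from a Customer record and list its order numbers, newest first."""
    order_numbers = []
    addr = storage[customer_addr].latest_order_addr
    while addr is not None:
        order = storage[addr]
        order_numbers.append(order.order_no)
        addr = order.previous_order_addr
    return order_numbers

# A dictionary keyed by address stands in for the database file.
storage: Storage = {
    10: CustomerRecord("Customer A", latest_order_addr=52),
    52: OrderRecord(1089, previous_order_addr=31, customer_addr=10),
    31: OrderRecord(972, previous_order_addr=None, customer_addr=10),
}

print(orders_for_customer(storage, 10))
```

Each step in the chain is one record access, so reaching the oldest order costs as many reads as the customer has orders.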




By primary key ordering (i.e. record sorting)
– requires an index on the foreign key in the detail record
– gives a relatively inefficient access path for more records
» the index will create an overhead whenever a new detail record is added
» finding a record from a secondary index may require several reads
– but it is easy to add or change relationship types in the database schema

By foreign key ordering (i.e. storage indexing)
– the key values and addresses of detail records can be held in a small index stored directly with the master record, so they can be found quickly
» e.g. for every Customer record create an index for their Order records
– in a relational database, this could be done by creating a link table containing only the key values of the master and detail records:

Link Table:
Master  Detail
M1      D2
M1      D9
M2      D5
M3      D1
etc.


2. Data Access Methods


Data Access - Accessing Records

How will records need to be accessed? This will have been analysed already to determine the record placement

Individual, direct access using the primary key value?
– may be provided by algorithmic random or indexed sequential record placement
– otherwise, create a hashed or sorted, unique primary key index

Via related records?
– master-detail and base-lookup relations
– see ‘Linking Related Records’ above

Sequential access in primary key order?
– may be provided by indexed sequential record placement
– otherwise, create a sorted, unique primary key index to read indirectly

By secondary keys, in a group or individually?
– create additional sorted indexes for each such key
– create additional hashed indexes for any secondary keys where only individual, direct access is ever required


Data Access - Index Types

Indexing can be applied to both the data records (logical) and their storage (physical). There are usually two types of indexes:

Hashed indexes
– the key values are stored within the index using a hashing algorithm
» allows fast direct access to data records via the hash key
» does not allow sequential access

Sorted indexes
– the key values and record addresses are sorted into a key sequence
– the index usually has a tree structure (B-tree index), but it can also be just a simple enumeration
– data records can be found fairly quickly directly
– the index can be used to read the data records sequentially
» but not as efficiently as with sequential record placement

Functional indexes
– the key values are calculated using a pre-specified function
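The practical difference between hashed and sorted indexes shows up as soon as a range of keys is requested. In this minimal sketch a Python dictionary stands in for a hashed index and a sorted list searched with `bisect` stands in for a sorted index; the keys, the record addresses and the function names are all hypothetical.

```python
from bisect import bisect_left, bisect_right

# (key value, record address) pairs in arrival order.
rows = [(17, "addr-0"), (5, "addr-1"), (42, "addr-2"), (23, "addr-3")]

hashed_index = dict(rows)    # direct access only; key order is not meaningful
sorted_index = sorted(rows)  # kept in key sequence

def direct_lookup(key):
    """Hashed index: one probe finds the record address."""
    return hashed_index.get(key)

def range_lookup(low, high):
    """Sorted index: binary search for the range ends, then read entries in key order."""
    keys = [key for key, _ in sorted_index]
    start = bisect_left(keys, low)
    stop = bisect_right(keys, high)
    return [address for _, address in sorted_index[start:stop]]

print(direct_lookup(42))
print(range_lookup(10, 30))
```

Answering the range query from the hashed index alone would mean probing every possible key value or scanning all entries, which is the sense in which a hashed index does not allow sequential access.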


Data Access - Index Types - ctd

B-tree indexes
– B-tree indexes are organized into ‘tables’ (of key values and addresses)
– i.e. a tree structure of index levels from a ‘root’ through ‘branches’ to ‘leaves’
– the leaf tables contain the key values and addresses of the data records
– the branch tables index the leaves or lower-level branches
– to find a record, the root is checked, then the appropriate branches down the tree are read to find the index table containing the record address and hence the data record itself
– as leaf tables fill up, they are split and the branch tables are updated
– indexes need periodic rebuilding to minimise table-splitting
– do not create unnecessary indexes

Diagram levels, from top to bottom: root, branches, leaves, records
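The search path described above (root, then branches, then a leaf holding the record address) can be sketched with a small hand-built tree. This is an illustration of lookup only, under the assumption that each separator is the smallest key reachable through the child to its right; it does not implement insertion or leaf splitting, and all class names, keys and addresses are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

@dataclass
class Leaf:
    keys: List[int]
    addresses: List[str]       # addresses[i] is the record address for keys[i]

@dataclass
class Branch:
    separators: List[int]      # separators[i] is the smallest key under children[i + 1]
    children: List[Union["Branch", Leaf]]

def search(root: Union[Branch, Leaf], key: int) -> Tuple[Optional[str], int]:
    """Return (record address or None, number of index tables read)."""
    node, tables_read = root, 1
    while isinstance(node, Branch):
        node = node.children[bisect_right(node.separators, key)]
        tables_read += 1
    if key in node.keys:
        return node.addresses[node.keys.index(key)], tables_read
    return None, tables_read

tree = Branch([20], [
    Branch([10], [Leaf([1, 4], ["a1", "a4"]), Leaf([10, 14], ["a10", "a14"])]),
    Branch([30], [Leaf([20, 25], ["a20", "a25"]), Leaf([30, 37], ["a30", "a37"])]),
])

print(search(tree, 14))
print(search(tree, 15))
```

Every lookup in this three-level tree reads three index tables (root, one branch, one leaf) before the data record itself is fetched, regardless of which key is requested.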


Data Access - Processing Indexed Data

Indexing the data records does not change the result of processing, but has a substantial impact on performance
– a database without indexes is only workable when the number of records is small
– data records may have more than one index for different operations
– in principle, all the attributes in a data record could be indexed separately and/or jointly using composite indexes (fully indexed tables)

Secondary indexes will degrade performance for updates
– the index must be updated every time a record is added or deleted or the key value amended
– this may involve several physical updates of the index for each record update

Indexes can be processed as normal data records
– i.e. partitioned data should have partitioned indexes as well

When loading data into a database
– remove all indexes from the schema
– load the data
– rebuild the indexes
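The three loading steps above can be tried out with Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in for a full DBMS such as Oracle, and the table, index and function names are hypothetical; the point is the order of operations, not the specific product.

```python
import sqlite3

def bulk_load(conn: sqlite3.Connection, rows) -> None:
    """Drop the index, load the data, then rebuild the index in one pass."""
    cur = conn.cursor()
    cur.execute("DROP INDEX IF EXISTS sales_item_idx")          # 1. remove the index
    cur.executemany("INSERT INTO sales (item, qty) VALUES (?, ?)", rows)  # 2. load
    cur.execute("CREATE INDEX sales_item_idx ON sales (item)")  # 3. rebuild the index
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, qty INTEGER)")
conn.execute("CREATE INDEX sales_item_idx ON sales (item)")

bulk_load(conn, [("bolt", 10), ("nut", 25), ("bolt", 5)])
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])
```

Building the index once over the loaded data avoids updating it separately for every inserted row, which is the overhead the slide attributes to secondary indexes.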


Indexing options in Oracle SQL

CREATE [UNIQUE | BITMAP] INDEX <index name> ON <table name> (<column selector>) [<indexing clause>] ...
» index the table using columns selected directly from the indexed table

CREATE [UNIQUE | BITMAP] INDEX <index name> ON CLUSTER <cluster name> [<indexing clause>] …
» index the table using columns selected from a cluster of tables with common columns, to which the indexed table belongs

For creating indexes, specifying different index clauses and options and allocating storage for them


Indexing options in Oracle SQL - continued


An index can be stored in the same physical file as the data records or in a different one (depending on the frequency of table updates)

An index can be independent of or functionally dependent on the indexed columns (index function)

Record placement is defined by the type of index
– a hashed index gives hashed record placement
– a sorted index gives logically sequential record placement
– bitmap indexes use physical storage locators for record placement

Additional clauses allow the records (rows) of the table, as well as their indexes, to be distributed over more than one physical file
– either ‘randomly’ (i.e. arbitrarily, not hashed)
– or partitioned ‘horizontally’ by key value hashing




Example: hashed index
CREATE INDEX sales_idx ON sales (item)
  STORE IN (tbs1, tbs2)

Example: bitmapped index (Oracle 8)
CREATE BITMAP INDEX partno_ix ON lineitem (partno)
  TABLESPACE ts1

Example: partitioned index (Oracle 8i)
CREATE INDEX stock_ix ON stock (stock_symbol, stock_line)
  GLOBAL PARTITION BY RANGE (stock_symbol)
  (PARTITION VALUES LESS THAN ('N') TABLESPACE ts3,
   PARTITION VALUES LESS THAN (MAXVALUE) TABLESPACE ts4)
