CG171 - Database Implementation and Development (Physical Database Design) – Lecture 7
Storage Allocation & Data Access Methods
By Dr. Akhtar Ali
1. Storage Allocation
Storage Allocation – Logical and Physical View
Diagram: logical and physical view of storage (entities and their attributes)
Database organisation:
  Database: state (open, audited, down)
  Tablespace: mode (online, offline, read-only)
  Data segment: type (temporary, permanent)
  Data extent: initial size, next size, % increase
  Data block: size
OS organisation:
  Computer: IP address
  Disk device: label, volume size
  Data file: directory, file size
  Memory block: address, block size
Relationship labels: located in, allocated for
Storage Allocation – Physical Files Allocation
How will the database be physically stored?
One physical file or many?
  – e.g. all data in one physical file?
  – or each table or record type in its own physical file?
  – data definitions (metadata) or indexes in separate files?
On one disk or over several?
    » or even distributed across a network?
What is the optimum block size for each file?
  – a large block size allows more records to be read together in one physical read
    » useful for sequential access or when related records are stored together
  – a small block size is more efficient if records are accessed in a random manner
  – block size should be chosen to accommodate the most frequently accessed physical groups of records
    » usually operating-system specific, e.g. x*512 bytes for small blocks up to 4K for large blocks on Windows NT; 2K for small and up to 32K for large blocks on UNIX
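The block-size trade-off can be illustrated with a little arithmetic. The following is a minimal sketch in Python, not part of the original slides; the record size, record count and function names are assumptions chosen for illustration only.

```python
import math

def records_per_block(block_size: int, record_size: int) -> int:
    """Blocking factor: number of whole records that fit in one block."""
    return block_size // record_size

def sequential_reads(num_records: int, block_size: int, record_size: int) -> int:
    """Physical reads needed to scan every record once, block by block."""
    return math.ceil(num_records / records_per_block(block_size, record_size))

def bytes_transferred_random(num_lookups: int, block_size: int) -> int:
    """Random access reads one whole block per lookup, wanted or not."""
    return num_lookups * block_size

# Hypothetical figures: 10,000 records of 200 bytes; 2K and 32K blocks.
for block_size in (2048, 32768):
    print(block_size,
          sequential_reads(10_000, block_size, 200),
          bytes_transferred_random(100, block_size))
```

Under these assumed figures the large block needs far fewer physical reads for a full scan, while the small block moves far less unwanted data for 100 random lookups.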
Storage Allocation – Physical Data Distribution
Storage Allocation – Physical Memory Allocation
Example – File Allocation in Oracle SQL
CREATE DATABASE <database name> DATAFILE <filename> ...
    » specifies a <database name>.CTL file to hold all control data
    » specifies also several system files containing all table data unless storage areas are explicitly specified
CREATE [TEMPORARY] TABLESPACE <storage name> DATAFILE <filename> ...
    » used to create separate storage for system operations or database data
    » physical file <filename> will be automatically mapped by the DBMS to <storage name>
    » the <filename> can include a full path, allowing network files to be used
File Allocation in Oracle SQL - continued
Databases with explicit clauses for datafile control:
    » control the overall growth of the database's physical storage through a set of specified parameters
Datafile parameters
  – MAXDATAFILES - limits the number of datafiles that can be opened for one database
  – AUTOEXTEND (ON or OFF) - allows additional disk space to be allocated automatically when the file becomes full
  – NEXT - the size of the next increment of disk space allocated when the file is extended
  – MAXSIZE - the upper limit to which a datafile can be extended
Example
  CREATE DATABASE newtest
    DATAFILE 'diska:dbone.dat' SIZE 2M
    MAXDATAFILES 10
    DATAFILE
      'disk1:df1.dbf' AUTOEXTEND ON,
      'disk2:df2.dbf' AUTOEXTEND ON NEXT 10M MAXSIZE 128M
Storage Allocation - Database Tables Storage
How will the database tables be physically spread?
Entirely on disk and/or in cached memory?
    » Frequent vs. infrequent data use
In one physical storage area (block) or in several?
  – All data is static, no growth of tables projected
  – Dynamic data, table growth predicted
What is the size of each physical and logical storage area to be used?
  – Initial storage size
  – Size and number of the automatic extensions
  – Limits for extending the storage area
Storage Allocation – Database Tables Storage - continued
Example - Table Storage in Oracle SQL - continued
Tables with clauses for explicit tablespace control
    » control the growth of the tablespace segments used for physical storage of database tables through a set of specified parameters
Storage parameters (STORAGE clause)
  – INITIAL <integer>[K|M] - the size of the first extent allocated to the table's segment
  – NEXT <integer>[K|M] - the size of the next extent allocated when the segment is extended
  – MINEXTENTS <integer> - the number of extents allocated when the segment is created
  – MAXEXTENTS <integer> - the maximum number of extents
  – PCTINCREASE <integer> - the percentage by which each extent after the second grows over the previous one
  – OPTIMAL <integer>[K|M] | NULL - the optimal size of a rollback segment (applies to rollback segments only)
Example
  CREATE TABLE salgrade (
      grade NUMBER CONSTRAINT pk_salgrad PRIMARY KEY,
      losal NUMBER,
      hisal NUMBER)
    TABLESPACE human_resource
    STORAGE (INITIAL 64 NEXT 64 MINEXTENTS 1 MAXEXTENTS 5)
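How INITIAL, NEXT, PCTINCREASE and MAXEXTENTS interact can be sketched as a short calculation. This is an illustrative Python sketch rather than Oracle's own algorithm: it assumes the second extent takes the NEXT size and each later extent grows by PCTINCREASE percent, and it ignores the rounding to whole data blocks that a real DBMS applies. The function name is hypothetical.

```python
def extent_sizes(initial: int, next_size: int, pctincrease: int, maxextents: int) -> list:
    """Sizes of successive extents, up to the MAXEXTENTS limit.

    Simplification: no rounding to block multiples is performed.
    """
    sizes = []
    upcoming = next_size
    for extent_no in range(maxextents):
        if extent_no == 0:
            sizes.append(initial)          # first extent uses INITIAL
        else:
            sizes.append(upcoming)         # later extents use the current NEXT
            upcoming = upcoming * (100 + pctincrease) // 100
    return sizes

# The salgrade example: INITIAL 64 NEXT 64 MAXEXTENTS 5, no growth factor assumed.
print(extent_sizes(64, 64, 0, 5))
# The same table with a hypothetical PCTINCREASE of 50.
print(extent_sizes(64, 64, 50, 5))
```

Summing the returned list gives the largest size the segment could reach under these assumptions before MAXEXTENTS stops further growth.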
Storage Allocation - Records Placement
For each record type, it is necessary to specify how and where it will be stored
Each record type should be stored in a way which gives best performance for the most important functions
  – the most frequent, on-line functions are likely to be most important
  – infrequent or off-line (batch) functions are probably less important
  – but also depends on the business perspective
Analyse the types of access required by these functions:
  – e.g. store new record?
  – access an individual record directly via the primary key?
  – access a range of records sequentially in primary key sequence?
  – access a record or records from a related master record?
  – access via a secondary key?
  – access records in no particular sequence?
Storage Allocation - Records Placement - continued
Records may be stored continuously, but record placement will also depend on the number of records
  – e.g. if there are only a few records then they can be stored in one physical block
  – e.g. related records can be stored together, but not if the number is large
Records may be stored serially as they arrive
  – simply add new records to the end of the file, and extend the file when full
  – a good method for storing transaction data or archiving
    » where the main overhead is storing new records
    » but the data is infrequently accessed
Records may be stored sequentially in primary key order for fast range search and direct match
  – allows sequential access for batch processing of similar data
Records may be stored randomly using an algorithm on the primary key
  – allows fast access for processing of single matching data
Storage Allocation - Records Placement - continued
Indexed Sequential - the most popular
  – the primary key index can be a very efficient ‘limit’ index
    » the index only needs to record the highest key value in each block
    » the index does not need updating when records are added or deleted
  – e.g. store Order records in Order number sequence to allow efficient production of pick lists, invoices etc.
Index (highest key in each block): B1 → R4, B2 → R10, B3 → R14, B4 → R20
Database file - blocks B1, B2 etc., containing data records R1, R3 etc.
  B1: R1 R3 R4
  B2: R6 R7 R10
  B3: R11 R14
  B4: R15 R16 R18 R20
Where will record R12 be stored?
Where will record R5 be stored?
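The two questions can be worked through with a small sketch of a limit-index lookup. This Python fragment is illustrative only; it assumes the index holds the highest key of each block as in the diagram, and that a key beyond the last limit goes to the final block (an assumption, since the slide does not cover that case).

```python
from bisect import bisect_left

# (highest key in block, block label), taken from the index on the slide.
LIMIT_INDEX = [(4, "B1"), (10, "B2"), (14, "B3"), (20, "B4")]

def block_for(key: int) -> str:
    """Return the first block whose highest key is >= the given key."""
    limits = [high for high, _ in LIMIT_INDEX]
    pos = bisect_left(limits, key)
    if pos == len(limits):
        pos = len(limits) - 1   # assumption: keys past the last limit join the final block
    return LIMIT_INDEX[pos][1]

print(block_for(12))   # R12
print(block_for(5))    # R5
```

Under these assumptions R12 belongs in B3 (between R11 and R14) and R5 in B2 (before R6); neither insert changes a block's highest key, so the index itself is untouched.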
Storage Allocation - Records Placement - continued
Records may be stored randomly using an algorithm on the primary key (hashing)
  – allows direct, fast access to individual records
  – no need to maintain or access an index
  – but sequential access will be very inefficient
    » it will require an index to be maintained, or the records sorted
  – e.g. store Customer records according to an algorithm on Cust ref
  – algorithm = divide key value by 1000 and use remainder as address
Database file - blocks B1, B2 etc. (no need for an index)
  B1: R1 R1001 R3001
  B2: R2002 R2
  B3: R1003 R4003 R2003
  B4: R1004 R3004 R4
Where will record R5123 be stored?
How many blocks in file?
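The division-remainder algorithm from the slide can be tried out directly. A minimal Python sketch follows; the function names are hypothetical, and the block address is simply the remainder, with no overflow handling.

```python
NUM_ADDRESSES = 1000  # divisor from the slide: remainders run from 0 to 999

def block_address(key: int) -> int:
    """Divide the key by 1000 and use the remainder as the block address."""
    return key % NUM_ADDRESSES

def place(keys):
    """Group keys by the block address the algorithm assigns them."""
    blocks = {}
    for key in keys:
        blocks.setdefault(block_address(key), []).append(key)
    return blocks

# Keys shown in the diagram.
print(place([1, 1001, 3001, 2002, 2, 1003, 4003, 2003, 1004, 3004, 4]))
print(block_address(5123))
```

With this algorithm R5123 hashes to address 123, and because the remainder can take 1000 distinct values the file needs 1000 addressable blocks.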
Storage Allocation - Records Placement - continued
Records may be stored in physical groups of related records (clusters or partitions)
  – the master record can be stored as required - serial / sequential / random
  – the detail records are then stored in the same or adjacent block(s)
  – e.g. store Order Header and Order Item records together in same block(s)
  – related records can be read together in one physical read from disk
  – but if detail records need to be accessed independently of the master then they will have to be indexed additionally
Both random and sequential storage require overflow facilities and periodic reorganisation
Database file - blocks holding Order Headers (H) with their Order Items (I)
  B1: H23 I23/1 I23/2
  B2: H92 I92/1 I92/2 I92/4
  B3: H16 I16/1 I16/2
  B4: H74 I74/1
Clustered tables
Example - Clustered tables in Oracle SQL
CREATE CLUSTER [<schema>.]<logical cluster name> (<cluster keys>) [TABLESPACE <physical storage name>] …
    » clusters store records from different tables sharing the same cluster key
    » clusters can be sorted or hashed for fast information retrieval
Example: hashed cluster containing two tables
  CREATE CLUSTER personnel (deptno NUMBER(2), phoneno INTEGER) HASHKEYS 20;
  CREATE TABLE dept (deptno NUMBER(2), dname VARCHAR2(9), loc VARCHAR2(9))
    CLUSTER personnel (deptno);
  CREATE TABLE emp (empno NUMBER(4), ename VARCHAR2(30), phoneno INTEGER)
    CLUSTER personnel (deptno, phoneno)
For physical grouping of records into a single storage area
Partitioned tables
Example - Partitioned tables in Oracle SQL
Used for both table and index data storage
Both physical (e.g. size) and logical (e.g. interval of values) criteria for partitioning
Partitions are accessible by name directly in SQL
Example: table partitioning by the date values of an attribute
  CREATE TABLE xansactions (trade_date DATE, num_shares NUMBER(10), price NUMBER(5,2) …)
    STORAGE (INITIAL 100K NEXT 50K) LOGGING
    PARTITION BY RANGE (trade_date)
      (PARTITION sx1992 VALUES LESS THAN (TO_DATE('01-JAN-93','DD-MON-YY')) TABLESPACE ts0,
       PARTITION sx1993 VALUES LESS THAN (TO_DATE('01-JAN-94','DD-MON-YY')) TABLESPACE ts1,
       …
For logical partitioning of a physical storage area into parts
Index-organized tables
Example - Index-organized tables in Oracle SQL
The primary key of the table is ordered for fast exact match and range search
All attributes are stored together with the primary key directly in the index space, so any new placements or updates do not require reordering
  CREATE TABLE docindex (
      token CHAR(20),
      doc_id NUMBER,
      token_frequency NUMBER,
      token_offsets VARCHAR2(512),
      CONSTRAINT pk_idx PRIMARY KEY (token, doc_id))
    ORGANIZATION INDEX TABLESPACE ind_tbs ...
For sequential ordering of the physical location of table records
Storage Allocation - Records Placement - continued
It may be useful to analyse the record access requirements:
Record type: CUSTOMER
                                      Type of access
Function        On-line/   Store      Primary key   Primary key    Cust name
                Off-line              (direct)      (sequential)   (direct)
New Customer    On         100/day
Place Order     On                    1000/day
Print Invoices  Off                                 5000/week
Enquiry         On                    200/day                      100/day
Add other access types and functions as required
Storage Allocation - Linking Related Records
For each relationship type, how will the physical access path, from one record to its related records, be implemented?
By physical grouping (i.e. clustering)
  – i.e. by storing records together as described above
  – a relationship where the master and its detail records are stored in the same physical group is called a ‘primary’ relationship in SSADM
  – other relationships, where the master and detail records are physically separated, are known as ‘secondary’ relationships in SSADM
By logical separation (i.e. partitioning)
  – storing records in successive partitions, e.g. splitting the year into months
  – each partition can be managed separately (storing, searching, backup, etc.)
  – each partition can also be indexed, and the indexes can also be partitioned
Storage Allocation - Linking Related Records - continued
Records may be stored in physical sequences (chains) by linked lists
  – the addresses of related records are stored with the data record itself
    » e.g. a Customer record might hold the address of the latest Order record for that Customer
    » each Order record could hold the address of the previous Order record for that Customer, and the address of the Customer record itself
Diagram: chain of records
  Customer record: address of latest Order record
  Order record 1089: address of previous Order, address of Cust record
  Order record 972: address of previous Order, address of Cust record
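The chain in the diagram can be sketched as a linked list. In this illustrative Python fragment, object references stand in for the disk addresses a DBMS would store; the class and function names are assumptions, not part of the lecture.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Order:
    number: int
    customer: "Customer"                  # address of the Cust record
    previous: Optional["Order"] = None    # address of the previous Order

@dataclass(eq=False)
class Customer:
    name: str
    latest_order: Optional[Order] = None  # address of the latest Order record

def add_order(customer: Customer, number: int) -> Order:
    """Store a new Order at the head of the customer's chain."""
    order = Order(number, customer, customer.latest_order)
    customer.latest_order = order
    return order

def order_history(customer: Customer) -> list:
    """Follow the chain from the latest Order back to the oldest."""
    numbers = []
    order = customer.latest_order
    while order is not None:
        numbers.append(order.number)
        order = order.previous
    return numbers

cust = Customer("Hypothetical Ltd")
add_order(cust, 972)
add_order(cust, 1089)
print(order_history(cust))
```

Adding an order only touches the new record and the Customer record, but reaching an old order means following every link in between.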
Storage Allocation - Linking Related Records - continued
By primary key ordering (i.e. record sorting)
  – requires an index on the foreign key in the detail record
  – gives a relatively inefficient access path for more records
    » the index will create an overhead whenever a new detail record is added
    » finding a record from a secondary index may require several reads
  – but it is easy to add or change relationship types in the database schema
By foreign key ordering (i.e. storage indexing)
  – the key values and addresses of detail records can be held in a small index stored directly with the master record, so they can be found quickly
    » e.g. for every Customer record create an index for their Order records
  – in a relational database, this could be done by creating a link table containing only the key values of the master and detail records:
Link Table:
  Master  Detail
  M1      D2
  M1      D9
  M2      D5
  M3      D1
  etc.
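A link table of this kind can be read into a small per-master index. The sketch below is illustrative Python using the rows shown on the slide; the function name is hypothetical.

```python
from collections import defaultdict

# (master key, detail key) pairs from the link table on the slide.
LINK_TABLE = [("M1", "D2"), ("M1", "D9"), ("M2", "D5"), ("M3", "D1")]

def build_master_index(link_rows):
    """Map each master key to the list of its detail keys."""
    index = defaultdict(list)
    for master, detail in link_rows:
        index[master].append(detail)
    return dict(index)

index = build_master_index(LINK_TABLE)
print(index["M1"])
```

Because the link table holds only key values, relationships can be added or removed without touching the master or detail records themselves.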
2. Data Access Methods
Data Access - Accessing Records
How will records need to be accessed? This will have been analysed already to determine the record placement
Individual, direct access using the primary key value?
  – may be provided by algorithmic random or indexed sequential record placement
  – otherwise, create a hashed or sorted, unique primary key index
Via related records?
  – master-detail and base-lookup relations
  – see ‘Linking Related Records’ above
Sequential access in primary key order?
  – may be provided by indexed sequential record placement
  – otherwise, create a sorted, unique primary key index to read indirectly
By secondary keys, in a group or individually?
  – create additional sorted indexes for each such key
  – create additional hashed indexes for any secondary keys where only individual, direct access is ever required
Data Access - Index Types
Indexing can be applied to both the data records (logical) and their storage (physical). The main types of index are:
Hashed indexes
  – the key values are stored within the index using a hashing algorithm
    » allows fast direct access to data records via the hash key
    » does not allow sequential access
Sorted indexes
  – the key values and record addresses are sorted into key sequence
  – the index usually has a tree structure (B-tree index), but it can also be a simple enumeration
  – data records can be found fairly quickly directly
  – the index can be used to read the data records sequentially
    » but not as efficiently as with sequential record placement
Functional indexes
  – the key values are calculated using a pre-specified function
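The difference between hashed and sorted indexes can be seen in a few lines. This is an illustrative Python sketch with made-up data: a dict plays the hashed index, a sorted list of (key, address) pairs plays the sorted index, and the record "address" is simply a position in a list.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file in arrival order; the address is the list position.
DATA_FILE = [(320, "Davis"), (101, "Adams"), (205, "Baker"), (150, "Clark")]

hashed_index = {key: addr for addr, (key, _) in enumerate(DATA_FILE)}
sorted_index = sorted((key, addr) for addr, (key, _) in enumerate(DATA_FILE))

def lookup_hashed(key):
    """Direct access: one hash probe gives the record address."""
    return DATA_FILE[hashed_index[key]]

def range_sorted(low, high):
    """Sequential access: read the records with low <= key <= high in key order."""
    keys = [k for k, _ in sorted_index]
    start, stop = bisect_left(keys, low), bisect_right(keys, high)
    return [DATA_FILE[addr] for _, addr in sorted_index[start:stop]]

print(lookup_hashed(205))
print(range_sorted(100, 210))
```

The hashed index answers the single-key lookup but carries no key order, so the range query has to go through the sorted index, which then fetches records scattered across the file.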
Data Access - Index Types - continued
B-tree indexes
  – B-tree indexes are organized into ‘tables’ (of key values and addresses)
  – i.e. a tree structure of index levels from a ‘root’ through ‘branches’ to ‘leaves’
  – the leaf tables contain the key values and addresses of the data records
  – the branch tables index the leaves or lower-level branches
  – to find a record, the root is checked, then the appropriate branches down the tree are read to find the index table containing the record address and hence the data record itself
  – as leaf tables fill up, they are split and the branch tables are updated
  – indexes need periodic rebuilding to minimise table-splitting
  – do not create unnecessary indexes
Diagram: root → branches → leaves → records
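The search path from root to leaf can be sketched as follows. This Python fragment is a simplified illustration covering lookup only (no insertion or table splitting); the tree contents, class names and block-label addresses are assumptions.

```python
from bisect import bisect_left, bisect_right

class Leaf:
    """Leaf table: sorted key values with the addresses of their data records."""
    def __init__(self, keys, addresses):
        self.keys = keys
        self.addresses = addresses

class Branch:
    """Branch table: children[i] covers keys below separators[i]; the last child covers the rest."""
    def __init__(self, separators, children):
        self.separators = separators
        self.children = children

def search(root, key):
    """Return (record address or None, number of index tables read)."""
    node, tables_read = root, 1
    while isinstance(node, Branch):
        node = node.children[bisect_right(node.separators, key)]
        tables_read += 1
    pos = bisect_left(node.keys, key)
    if pos < len(node.keys) and node.keys[pos] == key:
        return node.addresses[pos], tables_read
    return None, tables_read

# Hypothetical three-level index; addresses are made-up block labels.
root = Branch([11], [
    Branch([6], [Leaf([1, 3, 4], ["B1", "B1", "B1"]),
                 Leaf([6, 7, 10], ["B2", "B2", "B2"])]),
    Branch([15], [Leaf([11, 14], ["B3", "B3"]),
                  Leaf([15, 16, 18, 20], ["B4", "B4", "B4", "B4"])]),
])
print(search(root, 14))
print(search(root, 5))
```

Every lookup in this sketch reads one table per level, which is why the number of index levels largely determines the cost of finding a record.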
Data Access - Processing Indexed Data
Indexing the data records does not change the result of processing, but has a substantial impact on performance
  – a database without indexes is workable only with a small number of records
  – data records may have more than one index for different operations
  – in principle, all the attributes in a data record could be indexed separately and/or jointly using composite indexes (fully indexed tables)
Secondary indexes will degrade performance for updates
  – the index must be updated every time a record is added or deleted or the key value amended
  – this may involve several physical updates of the index for each record update
Indexes can be processed as normal data records
  – i.e. partitioned data should have partitioned indexes as well
When loading data into a database
  – remove all indexes from the schema
  – load the data
  – rebuild the indexes
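The load procedure can be tried out on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, so the SQL dialect differs from the lecture's examples; the table, index and function names are hypothetical.

```python
import sqlite3

def bulk_load(conn, rows):
    """Drop the secondary index, load the data, then rebuild the index."""
    cur = conn.cursor()
    cur.execute("DROP INDEX IF EXISTS customer_name_idx")            # 1. remove index
    cur.executemany(
        "INSERT INTO customer (cust_ref, cust_name) VALUES (?, ?)",  # 2. load data
        rows)
    cur.execute(
        "CREATE INDEX customer_name_idx ON customer (cust_name)")    # 3. rebuild index
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_ref INTEGER PRIMARY KEY, cust_name TEXT)")
conn.execute("CREATE INDEX customer_name_idx ON customer (cust_name)")

bulk_load(conn, [(i, f"Customer {i}") for i in range(1, 1001)])

count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'customer'")]
print(count, indexes)
```

Building the index once over the loaded data avoids one index update per inserted row, which is the overhead the slide warns about.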
Indexing options in Oracle SQL
CREATE [UNIQUE | BITMAP] INDEX <index name> ON <table name> (<column selector>) [<indexing clause>] ...
    » indexes the table using columns selected directly from the indexed table
CREATE [UNIQUE | BITMAP] INDEX <index name> ON <cluster name> [<indexing clause>] …
    » indexes the table using columns selected from a cluster of tables with common columns, to which the indexed table belongs
For creating indexes, specifying different index clauses and options, and allocating storage for them
Indexing options in Oracle SQL - continued
An index can be stored in the same physical files as the data records or in different ones (depending on the frequency of table updates)
An index can be independent of, or functionally dependent on, the indexed columns (index function)
Record placement is defined by the type of index
  – a hashed index gives hashed record placement
  – a sorted index gives logically sequential record placement
  – bitmap indexes use physical storage locators for record placement
Additional clauses allow records (rows) of the table, as well as their indexes, to be distributed over more than one physical file
  – either ‘randomly’ (i.e. arbitrarily, not hashed)
  – or partitioned ‘horizontally’ by key value hashing
Indexing options in Oracle SQL - continued
Example: hashed index
  CREATE INDEX sales_idx ON sales (item)
    STORE IN (tbs1, tbs2)
Example: bitmapped index (Oracle 8)
  CREATE BITMAP INDEX partno_ix ON lineitem (partno)
    TABLESPACE ts1
Example: partitioned index (Oracle 8i)
  CREATE INDEX stock_ix ON stock (stock_symbol, stock_line)
    GLOBAL PARTITION BY RANGE (stock_symbol)
      (PARTITION VALUES LESS THAN ('N') TABLESPACE ts3,
       PARTITION VALUES LESS THAN (MAXVALUE) TABLESPACE ts4)