Rdbms

RDBMSRelational Database Management SystemFall 2016

Muhammad Adeel RajputScientist/Instector

Contents

• Introduction to Database management systems• RDBMS• Entity Relationship Model• Normalization• Introduction to SQL• File Organization & Indexing• Database Administration• Recovery

Database Management System

Data, Database and Database Management System

Data• Data is numerical, character or other symbols which can be recorded in a

form suitable for processing by a computer. (e.g. names and addresses of students enrolling onto a university course).

Database• A Database is a collection of related data (such as an enrolling students

data) arranged for speedy search and retrieval.Database Management System• A Database Management System is a collection of programs that allows

users to specify the structure of a database, to create, query and modify the data in the database and to control access to it. (e.g. limit access to the database so that only relevant staff can access details of enrolling students).

PREVIOUS HISTORY OF DBMS

Before the concept of DBMS, they used to store the data (i.e. information) in the form of written copies and store them....

This made the retrieve process very difficult. It made wastage of paper, files, storage and precious time.

What is DBMS ?

• A set of programs to access the interrelated data.

• DBMS contains information about a particular enterprise.

• Computerized record keeping system.• Provides convenient environment to user

to perform operations: -Creation, Insertion, Deletion, Updating & Retrieval of information.

Examples of DBMS

• Some of the common used DBMSs are: -Oracle, IBM’s DB2, Microsoft’s SQL Server, MS-Access and Informix.

• Some of the desktop based DBMSs are: -Microsoft FoxPro, Borland dBase and Microsoft Access.

Advantages of DBMS• Controlling Data Redundancy: Data is recorded in only

one place in the database and it is not duplicated.• Data Consistency: Data item appears only once, and the

updated value is immediately available to all users.• Control Over Concurrency : In a computer file-based

system in updating, one may overwrite the values recorded by the other.

• Backup and Recovery Procedures: automatically create the backup of data and restore data if required.• Data Independence: Separation of data structure of

database from application program that uses the data is called data independence.

Disadvantages of DBMS

• Cost of Hardware and Software: Processor with high speed of data processing and memory of large size is required.

• Cost of Data Conversion: Very difficult and costly method to convert data of data file into database.

• Cost of Staff Training: A lot of amount for the training of staff to run the DBMS.

• Appointing Technical Staff: Trained technical persons such as database administrator, application programmers, data entry operators etc. are required to handle the DBMS.

• Database Damage: All data is integrated into a single database. If database is damaged due to electric failure or database is corrupted

on the storage media, then your valuable data may be lost forever.

Applications of DBMS....

Banking: for transactions Airlines: reservation and schedules Tele communications: for retrieving data of user Credit card: for transactions Universities: registration, retrieving marks, applications,

grades Human resources: employee records, salaries, tax deductions,

RDBMSDATABASE SYSTEM

Edger F CODD(Turing award)

RDBMS...• Most popular database system.

• Simple and sound theoretical basis.

• Developed by E F Codd in the early 1970's.

• The model is based on tables, rows and columns and the manipulation of data stored within.

• Relational database is a collection of these tables.

• First commercial system: MULTICS in 1978.

• Has overtaken Hierarchical and Network models.

• Main feature: Single database can be spread across several tables.

• Examples include: Oracle, IBM's DB2, Sybase, MySQL & Microsoft Access.

RDBMS Advantages

• Increases the sharing of data and faster development of new applications

• Support a simple data structure, namely tables or relations• Limit redundancy or replication of data• Better integrity as data inconsistencies are avoided by storing data in

one place • Provide physical data independence so users do not have to be aware

of underlying objects• Offer logical database independence - data can be viewed in different

ways by different users.• Expandability is relatively easy to achieve by adding new views of the

data as they are required.• Support one off queries using SQL or other appropriate language.• Better backup and recovery procedures• Provides multiple interfaces• Solves many problems created by other data models• The ability to handle efficiently simple data types• Multiple users can access which is not possible in DBMS

RDBMS Disadvantages

• Software is expensive• Complex software means expensive hardware • Requires skilled knowledge to implement• Certain applications are slower processing• Increased vulnerability• More difficult to recover if data is lost• Seen as a poor representation of the real world• Difficult to represent hierarchies• Difficult to represent complex data type.

Schema: - Logical structure of the database. - Doesn’t show the data in database. - Classification: 1. Physical 2. Conceptual 3. External

Cont…1. Physical Schema: -Describes the physical storage of database. -Not in terms of blocks or devices, but describes organization of files, access path

etc.2. Conceptual Schema: -Describes structure of whole database. -Describes entities their relationships and constraints.3. External Schema: -Provides a user’s view of data. -Shows relevant info particular to user, hides rest of the info. -one or more levels.

Instances: Actual data contained in database at a particular point of time.

Differences betwen DBMS and RDBMS

DBMS

• Data is stored in a single large table• Single record modification affects the whole database

RDBMS (Codd 1980)

• Database is 'broken down' into smaller pieces• The changes will not affect the entire database

The Future of RDBMS• It is very difficult to see where RDBMS’s could go because alternatives and

new versions such as XML have been introduced. From delving through forums it is plain to see that many people do not only prefer standard RDBMS’s but do not even like the alternatives.

• Academics refer to relation models “It's such a simple and elegant model that it can never become unfashionable”. - Andre Naess.

• Luke you may want to put something here about the incorporation of other software such as CAD. etc... Hope these slides are ok. Got my script written as well but you may want me to read different stuff out. X x x

ENTITY-RELATIONSHIP (E-R) MODEL

Entity-Relationship (E-R) ModelCollege Principal

College

Student C

Student A

Student BCollege 3

College 2

College 1

Course C

Course B

Course A

Student CourseAdmission

Stud_Name

Stud_Roll No Course_Id Course_Na

me

Relationships

E-R diagram

NORMALIZATION

Normalization• It is a technique for designing relational database tables to

minimize duplication of information.• Normalization is a practice to safeguard the database

against logical and structural anomalies.• Normalization is also termed as canonical synthesis by the

experts.• It is used to keep data consistent and check that no loss of

data as well as data integrity is there.• Its complexity may lead to higher degree of join operations

which sometimes lead to the degraded throughput times.• The normal forms like 1NF, 2NF, 3NF, BCNF, 4NF, 5NF,

DKNF & 6NF are in practice.

DATABASE LANGUAGESQL

SQL…The most basic Oracle Database utility

A Basic command-line interface

The first thing you work with it in Oracle DBMS

What is SQL?– When a user wants to get some

information from a database file, he can issue a query.

– A query is a user–request to retrieve data or information with a certain condition.

– SQL is a query language that allows user to specify the conditions. (instead of algorithms)

Introduction to SQL…Concept of SQL

– The user specifies a certain condition.

– The result of the query will then be stored in form of a table.

– Statistical information of the data.

– The program will go through all the records in the database file and select those records that satisfy the condition.(searching).

Structured Query Language ...

Relational Database Management Systems use the language known as SQL

SQL is a simple programming language used for accessing and managing data in

relational databases.

Developed by IBM in 1970 to support its various relational products.

FILE ORGANIZATION

29

File Organization• The physical arrangement of data in a file into records and pages on

the disk• File organization determines the set of access methods for

– Storing and retrieving records from a file• Therefore, ‘file organization’ synonymous with ‘access method’ • We study three types of file organization

– Unordered or Heap files– Ordered or sequential files– Hash files

• We examine each of them in terms of the operations we perform on the database– Insert a new record– Search for a record (or update a record)– Delete a record

30

Unordered Or Heap File• Records are stored in the same order in which they are

created• Insert operation

– Fast – because the incoming record is written at the end of the last page of the file

• Search (or update) operation– Slow – because linear search is performed on pages

• Delete Operation– Slow – because the record to be deleted is first searched for– Deleting the record creates a hole in the page– Periodic file compacting work required to reclaim the wasted

space

31

Ordered or Sequential File• Records are sorted on the values of one or more fields

– Ordering field – the field on which the records are sorted– Ordering key – the key of the file when it is used for record sorting

• Search (or update) Operation– Fast – because binary search is performed on sorted records– Update the ordering field?

• Delete Operation– Fast – because searching the record is fast– Periodic file compacting work is, of course, required

• Insert Operation– Poor – because if we insert the new record in the correct position we need to

shift all the subsequent records in the file– Alternatively an ‘overflow file’ is created which contains all the new records as a

heap– Periodically overflow file is merged with the main file– If overflow file is created search and delete operations for records in the overflow

file have to be linear!

32

Hash File• Is an array of buckets

– Given a record, r a hash function, h(r) computes the index of the bucket in which record r belongs

– h uses one or more fields in the record called hash fields– Hash key - the key of the file when it is used by the hash

function• Example hash function

– Assume that the staff last name is used as the hash field– Assume also that the hash file size is 26 buckets - each bucket

corresponding to each of the letters from the alphabet– Then a hash function can be defined which computes the bucket

address (index) based on the first letter in the last name.

33

Hash File (2)• Insert Operation

– Fast – because the hash function computes the index of the bucket to which the record belongs

• If that bucket is full you go to the next free one• Search Operation

– Fast – because the hash function computes the index of the bucket

• Performance may degrade if the record is not found in the bucket suggested by hash function

• Delete Operation– Fast – once again for the same reason of hashing

function being able to locate the record quick

34

Indexing• Can we do anything else to improve query performance other than

selecting a good file organization?• Yes, the answer lies in indexing• Index - a data structure that allows the DBMS to locate particular

records in a file more quickly– Very similar to the index at the end of a book to locate various topics

covered in the book• Types of Index

– Primary index – one primary index per file– Clustering index – one clustering index per file – data file is ordered on a

non-key field and the index file is built on that non-key field– Secondary index – many secondary indexes per file

• Sparse index – has only some of the search key values in the file• Dense index – has an index corresponding to every search key

value in the file

35

Primary Indexes• The data file is sequentially ordered on the key field• Index file stores all (dense) or some (sparse) values of

the key field and the page number of the data file in which the corresponding record is storedB002 1

B003 1

B004 2

B005 2

B007 3

Branch BranchNo Street City Postcode

B002 56 Clover Dr London NW10 6EU

B003 163 Main St Glasgow G11 9QX

B004 32 Manse Rd Bristol BS99 1NZ

B005 22 Deer Rd London SW1 4EH

B007 16 Argyll St Aberdeen AB2 3SU

Branch B002 recordBranch B003 record

Branch B004 recordBranch B005 record

Branch B007 record

1

2

3

4

36

Indexed Sequential Access Method

• ISAM – Indexed sequential access method is based on primary index

• Default access method or table type in MySQL, MyISAM is an extension of ISAM

• Insert and delete operations disturb the sorting– You need an overflow file which periodically

needs to be merged with the main file

37

Secondary Indexes• An index file that uses a non primary field as an

index e.g. City field in the branch table• They improve the performance of queries that

use attributes other than the primary key• You can use a separate index for every attribute

you wish to use in the WHERE clause of your select query

• But there is the overhead of maintaining a large number of these indexes

DATABASE ADMINISTRATION

Database administration• Database administration is the function of managing

and maintaining database management systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM DB2 and Microsoft SQL Server need ongoing management. As such, corporations that use DBMS software often hire specialized IT (Information Technology) personnel called Database Administrators or DBAs.

https://en.wikipedia.org/wiki/Database_management_system

https://en.wikipedia.org/wiki/Oracle_database

https://en.wikipedia.org/wiki/IBM_DB2

https://en.wikipedia.org/wiki/Microsoft_SQL_Server

https://en.wikipedia.org/wiki/Information_technology

https://en.wikipedia.org/wiki/Database_administrator

DBA Responsibilities• Installation, configuration and upgrading of Database server software and

related products.• Evaluate Database features and Database related products.• Establish and maintain sound backup and recovery policies and procedures.• Take care of the Database design and implementation.• Implement and maintain database security (create and maintain users and

roles, assign privileges).• Database tuning and performance monitoring.• Application tuning and performance monitoring.• Setup and maintain documentation and standards.• Plan growth and changes (capacity planning).• Work as part of a team and provide 24x7 support when required.• Do general technical troubleshooting and give cons.• Database recovery.

https://en.wikipedia.org/wiki/Database_design

https://en.wikipedia.org/wiki/Database_tuning

Types of database administrationThere are three types of DBAs:• Systems DBAs: (also referred to as Physical DBAs, Operations DBAs or Production Support

DBAs): focus on the physical aspects of database administration such as DBMS installation, configuration, patching, upgrades, backups, restores, refreshes, performance optimization, maintenance and disaster recovery.

• Development DBAs: focus on the logical and development aspects of database administration such as data model design and maintenance, DDL (data definition language) generation, SQL writing and tuning, coding stored procedures, collaborating with developers to help choose the most appropriate DBMS feature/functionality and other pre-production activities.

• Application DBAs: usually found in organizations that have purchased 3rd party application software such as ERP (enterprise resource planning) and CRM (customer relationship management) systems. Examples of such application software includes Oracle Applications, Siebel and PeopleSoft (both now part of Oracle Corp.) and SAP. Application DBAs straddle the fence between the DBMS and the application software and are responsible for ensuring that the application is fully optimized for the database and vice versa. They usually manage all the application components that interact with the database and carry out activities such as application installation and patching, application upgrades, database cloning, building and running data cleanup routines, data load process management, etc.

While individuals usually specialize in one type of database administration, in smaller organizations, it is not uncommon to find a single individual or group performing more than one type of database administration.

https://en.wikipedia.org/wiki/Data_model

https://en.wikipedia.org/wiki/Data_model

https://en.wikipedia.org/wiki/Data_Definition_Language

https://en.wikipedia.org/wiki/Stored_procedure

https://en.wikipedia.org/wiki/Third-party_developer

https://en.wikipedia.org/wiki/Application_software

https://en.wikipedia.org/wiki/Enterprise_resource_planning

https://en.wikipedia.org/wiki/Customer_relationship_management

https://en.wikipedia.org/wiki/Oracle_Applications

https://en.wikipedia.org/wiki/PeopleSoft

https://en.wikipedia.org/wiki/Software_componentry

https://en.wikipedia.org/wiki/Software_componentry

https://en.wikipedia.org/wiki/Process_management

Recovery

RecoveryWe give a short overview of how recovery might be implemented:• Requirements for recovery• A practical approach to recovery – keep a recovery log – must be write-

ahead• Example showing system components with values in DB and in-memory

cache• Checkpoint procedure: to aid processing of the very large recovery log• Transaction categories for recovery• An algorithm for the recovery manager

Requirements for Recovery• Media failure, e.g. disc-head crash. Part of persistent store is lost – need to restore it. Transactions in progress may be using this area – abort uncommitted transactions.

• System failure e.g. crash - main memory lost. Persistent store is not lost but may have been changed by uncommitted transactions. Also, committed transactions’ effects may not yet have reached persistent objects.

• Transaction abort Need to undo any changes made by the aborted transaction.

Our object model assumed all invocations are recorded with the object.It was not made clear how this was to be implemented – synchronously in persistent store? We need to optimise for performance reasons - not write-out every operation synchronously.

We consider one method – a recovery log. i.e. update data objects in place in persistent store, as and when appropriate, and make a (recovery) log of the updates.

Recovery Log1. Assume a periodic (daily?) dump of the database (e.g. Op. Sys. backup) 2. Assume that a record of every change to the database is written to a log

{transaction-ID, data-object-ID, operation (arguments), old value, new value } 3. If a failure occurs the log can be used by the Recovery manager to REDO or UNDO selected operations. UNDO and REDO must be idempotent (repeatable), e.g. contain before

and after values, not just “add 3”. Further crashes might occur at any time.Transaction abort: UNDO the operations – roll back the transactionSystem failure AIM: REDO committed transactions, UNDO uncommitted transactionsMedia failure reload the database from the last dump REDO the operations of all the transactions that committed since then

But the log is very large to search for this information so, to assist rapid recovery, take a CHECKPOINT at “small” time intervals e.g. after 5 mins or after n log items – see 15

Recovery Log must be “write-ahead”Two distinct operations:• write a change to an object in the database• write the log record of the changeA failure could occur between them – in which order should they be done?If an object is updated in the database, there is no record of the previous value, so no means of UNDOing the operation on abort.The log must be written first.

Also, a transaction is not allowed to commit until the log records for all its operations have been written out to the log.

Note: we can’t, and needn’t, take time to update in the database on every commit the (few) objects involved in a transaction.Note: a log can be written efficiently, because:• there are enough records from the many transactions in progress at any time, • the writes are to one place – the log file.

Checkpoints and the checkpoint procedureFrom 13: The log is very large to search for this information on transactions especially for abort of a single transaction, so take a CHECKPOINT at “small” time intervals e.g. After 5 mins or after n log items.

Checkpoint procedure :• Force-write any log records in main memory out to the log (OS must do this)• Force-write a checkpoint record to the log, containing: - list of all transactions active (started but not committed) at the time of the checkpoint - address within the log of each transaction’s most recent log record - note: the log records of a given transaction are chained• Force-write database buffers (database updates still in main memory) out to the database.• Write the address of the checkpoint record within the log into a restart file.

A recovery log with a checkpoint record

the data manager keepsobject updates and log records

in its cache in main memory

main memorylog records

T1: x, add(1), 2 ->3T2: a, add(2) 7->9

object valuesx = 3a = 9

persistent memorylog file

... many previous records ...

T1: x, add(1), 2 ->3

T2: a, add(2) 7->9

persistent system state

object valuesx = 2

a = 7

checkpoint recordactive Txs T1, T2

T1 most recent log locationT2 most recent log location

restart file

has the locationsof checkpoint records

in the log file

Transaction categories for recovery

Checkpoint record says T2 and T3 are activeT1: its log records were written out before commit. Any remaining DB updates were written out at checkpoint time. No action required.T2: any updates made after the checkpoint are in the log and can be re-applied (REDO)T4: log records are written on commit – can be re-applied (REDO is idempotent)T3 and T5: any changes that might have been made can be found in the log and previous state recovered (undone using UNDO operation)T3 requires log to be searched before the checkpoint – checkpoint contains pointer to previous log record.

Time checkpoint time failure time

T1: no action

T5

T4

T3

T2

T1

T2: REDO from checkpoint

T3: UNDO all

T4: REDO

T5: UNDO

Algorithm for recovery manager

Keeps: UNDO list - initially contains all transactions listed in the checkpoint record REDO list – initially empty

Searches forward through the log starting from the checkpoint record, to the end of the log• If it finds a start-transaction record it adds that transaction to the UNDO list• If it finds a commit record it moves that transaction from the UNDO list to the REDO list

Then, works backwards through the log UNDOing transactions on the UNDO list (restores state)Finally, works forward again through the log REDOing transactions on the REDO list

THANK YOU

Rdbms

Education

Transcript of Rdbms