DATABASE CONCEPTS - rvs06.files.wordpress.com · 13.07.2020

Kensri School and College, Bengaluru

Computer Science 1 Class 12/PU-II

DATABASE CONCEPTS

➢ Database Systems:

• Systems comprising databases and database management systems are simply referred to as database systems.

• A collection of data is referred to as a database, and a database (management) system is basically a computer-based record-keeping system.

• It maintains any information that may

be necessary to the decision-making

processes involved in the management

of the organization.

• The intention of a database is that the

same collection of data should serve as

many applications as possible.

• A database permits not only the retrieval of data but also the continuous modification of data needed for control of operations.

• It may be possible to search the database to obtain answers to queries or information for planning purposes.

• A typical file processing system suffers from major limitations such as data redundancy, data inconsistency, unsharable data, unstandardized data and insecure data. A database system, on the other hand, overcomes all these limitations and ensures continued efficiency.

➢ Examples of Common Database Management Systems:

✓ MySQL, INGRES, POSTGRES, ORACLE, DB2.

➢ Features/Advantages of Database system are:

✓ Reduced data redundancy:

✓ Duplication of data is data redundancy. It leads to problems like wastage of space and data inconsistency.

✓ Data is said to be redundant if the same data is copied at many places.

✓ For example, if a student wants to change a phone number, it has to be updated in various sections. Similarly, old records must be deleted from all sections representing that student.

✓ Controlled data inconsistency:

✓ When the redundancy is not controlled, there may be occasions on which two entries about the same data do not agree. At such times, the database is said to be inconsistent.

✓ If some redundancy is retained in the database for technical reasons, the database management system ensures that any change made to either of the two entries is automatically made to the other.

✓ Shared data:

✓ The database allows sharing of data by several users. This means each user may have access to the same database, table or record at the same time.



✓ Standardized data:

✓ The database management system can ensure that all the data follow the applicable

standards.

✓ There may be some industry standards, organizational standards, and national or

international standards.

✓ Standardizing stored data formats is particularly desirable as an aid to data interchange or

migration between systems.

✓ Secured data:

✓ Data is vital to any organization and some of it may be confidential. Confidential data

must not be accessed by unauthorized persons.

✓ Authentication schemes can be laid down, giving different levels of users different permissions to access data.

✓ Integrated data:

✓ This means that data is accurate and consistent. Checks can be built in to ensure correct

values are entered.

✓ For example, while placing an order, the quantity must be a number above zero. Also, if

an order is placed with a supplier, supplier must exist.
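The two checks described above (quantity above zero, supplier must exist) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names `supplier` and `orders`, the column names, and the helper `try_order` are assumptions, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id),
        qty         INTEGER NOT NULL CHECK (qty > 0)
    )
""")
conn.execute("INSERT INTO supplier VALUES (1, 'Acme')")

def try_order(supplier_id, qty):
    """Return True if the order is accepted, False if a built-in check rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (supplier_id, qty) VALUES (?, ?)",
                (supplier_id, qty),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# valid order, zero quantity, supplier that does not exist
results = [try_order(1, 5), try_order(1, 0), try_order(99, 5)]
print(results)
```

Only the first order is accepted; the other two are rejected by the database itself, not by the application program.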

➢ Applications of database.

✓ Banking: For customer information, accounts and loans, and banking transactions.

✓ Colleges: For student information, course registrations and grades.

✓ Credit card transactions: For purchases on credit cards and generation of monthly

statements.

✓ Finance: For storing information about holdings, sales and purchases of financial

instruments such as stocks and bonds.

✓ Sales: For customer, product, and purchase information.

✓ Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, and storing information about the communication networks.

✓ Aadhaar database: This is one of the largest biometric identity databases in the world, storing data about more than a billion people residing in India.

✓ Water meter billing: The RR number and all related details are stored in the database and linked to server-based applications.

✓ Rail and Airlines: For reservations and schedule information. Airlines were among the first to use databases in a geographically distributed manner: terminals situated around the world accessed the central database system through phone lines and other data networks.

✓ Manufacturing: For management of supply chain and for tracking production of items

in factories, inventories of items in warehouses/ stores, and orders for items.

✓ Human resources: For information about employees, recruitment, salaries, payroll taxes

and benefits, and for generation of paychecks.



➢ Evolution of Database

✓ Manual File systems :

✓ The file management systems were often manual, paper-and-pencil systems. The papers within these systems were organized to facilitate the expected use of the data.

✓ As long as a collection of data was relatively small and an organization’s users had few

reporting requirements, the manual system served its role well as a data repository.

✓ As organizations grew and as reporting requirements became more complex, keeping

track of data in a manual file system became more difficult.

✓ Therefore, companies looked to computer technology for help.

✓ Computerized File systems:

✓ The computer files within the file system were similar to the manual files.

✓ The description of computer files requires a specialized vocabulary. Every discipline

develops its own terminology to enable its practitioners to communicate clearly.

Manual Data Processing | Computerized Data Processing
The volume of data that can be processed is limited | The volume of data that can be processed is large
Requires a large quantity of paper | Requires less paper
Speed and accuracy are limited | Faster and more accurate
Labour cost is high | Labour cost is low
Storage medium is paper | Storage medium is hard disk, etc.

➢ Database terms :

✓ Data: Basic, raw facts about something that have not been organized, for example unorganized details of some students.

✓ File: A file is the basic unit of storage in a computer system. It is a collection of related data.

✓ Information: Well-processed data is called information. Decisions can be taken on the basis of information.

✓ Attribute or Field: A set of characters that represents a specific data element. Each column is identified by a distinct header called an attribute or field.

✓ Record: A single entry in a table is called a record or row. A record in a table represents a set of related data. For example, (1, Bharath, 28, 45000/-) is a single record of an Employee table.

✓ Tuple: A record is also called a tuple.

✓ Domain: The set of values permitted for an attribute (column).

✓ Entity relationship: Describes how the tables are linked to each other.

✓ Data Item: Each piece of information about an entity, such as the name of a person, an address, an age, the name of a product or its price, is a data item.

✓ Database: A database is a collection of logically related data organized in a way that data can be easily accessed, managed and updated.

✓ Tables: A table is a collection of data elements organized in terms of rows and columns. A table is also considered a convenient representation of a relation. A table is the simplest form of data storage.

✓ Relation: A relation (a collection of rows and columns) generally refers to an active entity on which we can perform various operations.



✓ Below is an example:

Employee Table

Table: Employee

Columns: Emp_Id, Name, Age, Salary

Rows: There are four rows

Emp_Id | Name    | Age | Salary
1      | Bharath | 28  | 45000/-
2      | Hitesh  | 27  | 47000/-
3      | Druva   | 29  | 40000/-
4      | Akash   | 30  | 50000/-
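As a rough sketch, the Employee table above can be built and inspected with Python's built-in sqlite3 module. The column types (INTEGER, TEXT) are assumptions; the notes give only the column names and values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp_Id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER)"
)
rows = [
    (1, "Bharath", 28, 45000),
    (2, "Hitesh", 27, 47000),
    (3, "Druva", 29, 40000),
    (4, "Akash", 30, 50000),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", rows)

# Attributes (fields) are the column headers of the relation.
columns = [d[0] for d in conn.execute("SELECT * FROM Employee").description]
# Each row is one record (tuple).
row_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
record = conn.execute("SELECT * FROM Employee WHERE Emp_Id = 2").fetchone()

print(columns, row_count, record)
```

Here `columns` lists the four attributes, `row_count` is the number of records, and `record` is the single tuple for Emp_Id 2.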

➢ Data types of DBMS

✓ Integer – Holds whole numbers without fractions.

✓ Single and double precision – Hold real numbers. Single precision gives about seven significant digits and double precision about fifteen.

✓ Logical data type – Stores data that has only two values, true or false.

✓ Characters – Include letters, numbers, spaces, symbols and punctuation. Character fields or variables store text information like a name or an address, with each character occupying one byte.

✓ Strings – A sequence of more than one character. Fixed-length strings range from 0 to about 63 KB, and dynamic strings range from 0 to about 2 billion characters.

✓ Memo data type – Stores more than 255 characters. A memo field can store up to 65536 characters. Long documents can store OLE objects.

✓ Index fields – Used to store relevant information along with the documents. The document input to an index field is used to find those documents when needed. The program provides up to 25 user-definable index fields in an index set. Name drop-down look-up list, Standard, auto-complete History list.

✓ Currency fields – The currency field accepts data in dollar form by default.

✓ Date fields – The date field accepts data entered in date format.

✓ Text fields – Accepts data as an alphanumeric text string.

➢ Database users.

✓ Many people are involved in designing, using and maintaining the database.

✓ The people who work with the database include: End Users, System Analysts, Application

programmers, Database Administrators (DBA)

• End Users (Database Users): Database users are those who interact with the database in

order to query and update the database, and generate reports.

• System Analysts: System analysts determine the requirements of end users (especially naïve users) in order to create a solution for their business needs, focusing on both non-technical and technical aspects.

• Application programmers: These are the computer professionals who implement the

specifications given by the system analysts and develop the application programs.

• Database Administrators (DBA): The DBA is a person who has central control over both data and applications. Some of the responsibilities of the DBA are access authorization, schema definition and modification, new software installation, and security enforcement and administration.



➢ Data processing cycle.

✓ Data Collection: It is the process of systematic gathering of data from various sources that

has been systematically observed, recorded and organized.

✓ Data Input: The raw data is put into the

computer using a keyboard, mouse or other

devices such as the scanner, microphone

and the digital camera.

✓ Data Processing: Processing is the series

of actions or operations on the input data to

generate outputs.

✓ Data storage: Data and information should

be stored in memory so that it can be

accessed later.

✓ Output: The result obtained after

processing the data must be presented to the

user in user understandable form. The output can be generated in the form of report as hard

copy or soft copy.

✓ Communication: Computers nowadays have communication ability, which increases their power. With wired or wireless communication connections, data may be input from a faraway place, processed in a remote area, stored in several different places, and then transmitted by modem as an email or posted to a website where online services are rendered.

➢ Physical Data(File) Organization:

✓ All schemas are logical; the actual data is stored in bit format on the disk.

✓ Storage media include hard disks (where all the files are stored), floppy disks, drums, tapes, SD cards, etc.

✓ System designers choose to organize, access and process records and files in different ways depending on the type of application and the needs of users.

✓ The three commonly used file organizations are Sequential, Direct and Indexed Sequential

Access Method(ISAM).

✓ The selection of a particular file organization depends upon the application used. To access a

record some key field or unique identifying value that is found in every record in a file is

used.

➢ File Organization:

✓ A method of organizing or arranging the files on a storage medium is called file

organization.

✓ This is classified into 3 types namely

1. Sequential file organization

2. Random file organization

3. Indexed sequential organization

1. Sequential file Organization:

✓ In this type the files are stored in a storage medium one after the other from beginning to

end. The files can also be accessed sequentially. The storage medium which is used for

sequential file organization is magnetic tapes.

✓ Advantages:

• Storage medium is cheaper.

• Files can be arranged or organized very easily.

• Efficient in the usage of a storage space.



✓ Disadvantages:

• Less storage capacity.

• Random search is not possible.

• Time consuming search.

2. Random File Organization:

✓ In this type, the files are stored in storage medium one after the other in random order. It is

also called relative/direct file organization. We use magnetic disk as Storage medium.

✓ Advantages

• More storage capacity.

• Random search is possible.

• Searching takes less time.

✓ Disadvantages

• Storage medium is costlier.

• Organizing the files is difficult.

• Inefficient in usage of storage space.

3. Indexed Sequential Organization:

✓ In this type, the files are stored in storage medium in a sequential order along with index.

It is combination of sequential and random file organization. Here Magnetic disk is used as

storage medium. It is also called indexed sequential access method.

✓ Advantages

• Both sequential and random search is possible.

• Fast access to a desired record with the help of the index.

• More storage capacity.

✓ Disadvantages

• Storage medium is costlier.

• Less efficient in usage of storage space.

• More memory is required to store index.
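The difference between sequential access and indexed access can be sketched in Python with fixed-length records in an in-memory file. The record layout (a 4-byte key followed by a 20-byte name) and all function names are assumptions made for this example; the index is a plain dictionary from key to byte offset, which is also the extra memory the last disadvantage refers to.

```python
import io
import struct

RECORD = struct.Struct("<i20s")  # key (4-byte int) + name (20 bytes): fixed-length record

def write_records(f, records):
    """Write records one after the other and return an index: key -> byte offset."""
    index = {}
    for key, name in records:
        index[key] = f.tell()
        f.write(RECORD.pack(key, name.encode()))
    return index

def sequential_search(f, key):
    """Read from the beginning until the key is found; return (name, records_read)."""
    f.seek(0)
    reads = 0
    while chunk := f.read(RECORD.size):
        reads += 1
        k, raw = RECORD.unpack(chunk)
        if k == key:
            return raw.rstrip(b"\0").decode(), reads
    return None, reads

def indexed_search(f, index, key):
    """Jump straight to the record using the index; only one record is read."""
    if key not in index:
        return None, 0
    f.seek(index[key])
    _, raw = RECORD.unpack(f.read(RECORD.size))
    return raw.rstrip(b"\0").decode(), 1

data_file = io.BytesIO()
index = write_records(
    data_file, [(101, "Bharath"), (102, "Hitesh"), (103, "Druva"), (104, "Akash")]
)
seq_result = sequential_search(data_file, 104)
idx_result = indexed_search(data_file, index, 104)
print(seq_result, idx_result)
```

The sequential search has to read every record before reaching key 104, while the indexed search reads exactly one.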

➢ Data abstraction:

✓ Data abstraction provides users with an abstract view of the system. It hides certain

details of how the data is stored, created and maintained.

✓ A database management system allows users to access and modify data stored in the files.

✓ Each user may have different requirements and the data must be retrieved selectively and

efficiently.

✓ The complex designs of the data structures are hidden from the users through several levels of abstraction, in order to simplify user interaction with the system.

➢ DBMS Architecture.

✓ The design of Database Management System highly depends on its architecture.

✓ It can be centralized or decentralized or hierarchical.

✓ Database architecture is logically divided into three types.

• Logical one-tier in 1-tier Architecture

• Logical two-tier Client/Server Architecture.

• Logical three-tier Client/Server Architecture.



✓ One-tier in 1-tier Architecture:

• DBMS is the only entity where user directly sits on

DBMS and uses it.

• Any changes done here will directly be on DBMS

itself.

• It does not provide handy tools for end users and

preferably database designers and programmers use

single tier architecture.

✓ Two-tier Client / Server Architecture:

• Two-tier Client / Server architecture is used for user interface programs and application programs that run on the client side.

• An interface called ODBC (Open Database

Connectivity) provides an API that allows

client side program to call the DBMS.

• Most DBMS vendors provide ODBC drivers. A client program may connect to several DBMSs. In this architecture some variation of the client is also possible: for example, in some DBMSs more functionality is transferred to the client, including the data dictionary, optimization, etc.

✓ Three-tier Client / Server Architecture:

• Three-tier Client / Server database architecture is a commonly used architecture for web applications. An intermediate layer called the application server or web server stores the web connectivity software and the business logic (constraints) part of the application, which is used to access the right amount of data from the database server.

• This layer acts as a medium for sending partially processed data between the database server and the client.

➢ Various Levels of Database Implementation:

✓ DBMS 3-level Architecture

The DBMS 3-level architecture divides the complete system into three inter-related but independent levels, as shown below:

1. Internal (or Physical) Level

2. Conceptual (or Logical) level

3. External (or View) level



1. Internal (or Physical) level

✓ The internal schema defines the

physical storage structure of the

database. The internal schema is a very

low-level representation of the entire

database. It contains multiple occurrences of multiple types of internal record. In ANSI terminology, an internal record is also called a "stored record".

✓ It describes how data are actually

stored on the storage medium. At this

level, complex low-level structures are

described in detail.

✓ At the physical level, the information

about the location of database objects

in the data store is kept. Various users

of DBMS are unaware of the locations

of these objects.

2. Conceptual (or Logical) level

✓ It describes what data are stored in the database. It also describes the relationships among

the data. It is used by database administrators who decide what data is to be kept in the

database.

✓ The conceptual schema describes the Database structure of the whole database for the

community of users. This schema hides information about the physical storage structures

and focuses on describing data types, entities, relationships, etc.

✓ For Example, STUDENT database may contain STUDENT and COURSE tables which

will be visible to users but users are unaware of their storage.

3. External (or View) level

✓ Most users access only a part of the database and the system provides views according to

the user’s requirement.

✓ An external schema describes the part of the database which specific user is interested in.

It hides the unrelated details of the database from the user. There may be "n" number of

external views for each database.

✓ Each external view is defined using an external schema, which consists of definitions of

various types of external record of that specific view.

✓ An external view is just the content of the database as it is seen by a particular user.

✓ For example, the FACULTY of a university are interested in looking at the course details of students, while STUDENTS are interested in looking at all details related to academics, accounts, courses and hostels. So, different views can be generated for different users.

✓ For example, a user from the sales department will see only sales related data.
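An external view can be sketched as an SQL view, here using Python's built-in sqlite3 module. The `student` table, its columns and the view name `faculty_view` are assumptions for illustration; the idea is only that faculty see course details while accounts data stays hidden.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT, fees_due INTEGER);
    INSERT INTO student VALUES (1, 'Asha', 'Physics', 1200), (2, 'Ravi', 'Maths', 0);
    -- External view for faculty: course details only, accounts data hidden
    CREATE VIEW faculty_view AS SELECT roll_no, name, course FROM student;
""")

view_columns = [d[0] for d in conn.execute("SELECT * FROM faculty_view").description]
view_rows = conn.execute("SELECT * FROM faculty_view ORDER BY roll_no").fetchall()
print(view_columns)
print(view_rows)
```

A user who is given access only to `faculty_view` never sees the `fees_due` column, although it is stored in the same table.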

✓ Data Independence:

Data independence is the ability to modify a schema definition at one level without affecting a schema definition at the next higher level. The two types of data independence are:

1. Physical data independence

✓ The schema followed at the physical level can be modified without affecting the schema followed at the conceptual level.

✓ Modifications at the physical level are occasionally necessary in order to improve the performance of the system.

2. Logical data independence

✓ The conceptual schema can be modified without causing any changes in the schemas followed at the view level.

✓ Modifications at the conceptual level are necessary whenever the logical structure of the database is altered for some unavoidable reason.

✓ It is more difficult to achieve, because application programs are heavily dependent on the logical structure of the database.

✓ E.g., adding or deleting attributes of a table should not affect the user’s view of the table.
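Logical data independence can be illustrated with a small sqlite3 sketch: a new attribute is added to the base table, and a user's view returns the same result before and after the change. The table `employee`, the view `staff_names` and the new column `department` are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'Bharath', 45000);
    CREATE VIEW staff_names AS SELECT emp_id, name FROM employee;
""")
before = conn.execute("SELECT * FROM staff_names").fetchall()

# Conceptual-level change: a new attribute is added to the base table.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")

after = conn.execute("SELECT * FROM staff_names").fetchall()
print(before == after)
```

The program that reads `staff_names` does not need to change, because the view names its columns explicitly.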

➢ Different Data Models

✓ Data model is a collection of conceptual tools for describing data, data relationship, data

semantics and constraints.

✓ It helps in describing the structure of data at the logical level. It is a link between user’s view

of the world and bits stored in computer.

✓ A data model generally consists of

• Data model theory, which is a formal description of how data may be structured and used.

• Data model instance, which is a practical data model designed for a particular application.

✓ The process of applying model theory to create a data model instance is known as data

modeling.

✓ In the history of database design, three models have been in use.

• Relational Data Models

• Network Data Models

• Hierarchical Data Models

✓ Relational Data Models

✓ The relational data model was developed by E. F. Codd in 1970.

✓ Unlike the hierarchical and network models, there are no physical links.

✓ All data is maintained in the form of

tables consisting of rows and columns.

Each column has a unique name and is

called an attribute.

✓ Each row (record) represents an entity

and a column (field) represents an

attribute of the entity.

✓ In this model, data is organized in two-dimensional tables called relations. The tables (relations) are related to each other.

✓ A row of the table represents a relationship among a set of values. As the table is a

collection of such rows (or

relationships), it has a close

relationship with the mathematical

concept of relation, from where this

model takes its name.

✓ A database may contain many relations

providing a better classification of data

based on its nature and use. Multiple

relations are then linked/ associated

together on some common key data

values (foreign key).
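Linking two relations through a common key value can be sketched with sqlite3 as follows. The `department` and `employee` tables and their contents are assumptions for illustration; `dept_id` in `employee` plays the role of the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
    );
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Accounts');
    INSERT INTO employee VALUES (1, 'Bharath', 10), (2, 'Hitesh', 20), (3, 'Druva', 10);
""")

# The two relations are associated through matching dept_id values, not physical links.
joined = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(joined)
```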



✓ Network Data Model

✓ In 1971, the Conference on Data

Systems Languages (CODASYL)

formally defined the network models.

✓ In this model, data is represented by a

collection of records and the

relationships are represented by links.

✓ Each record is a collection of fields, each of which contains only one data value. A link is an association between two records.

✓ In the network model, entities are

organized in a graph, in which some

entities can be accessed through several

paths.

✓ Advantages:

o It is simple and easy to implement.

o It can handle many relationships within the organization.

o It has better data independence compared to hierarchical model.

✓ Disadvantages:

o More complex system of database structure

o Lack of structural independence.
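The graph structure of the network model, where a record can be reached through several paths, can be sketched in Python with two dictionaries of links. The supplier and part names are made up for this example.

```python
from collections import defaultdict

children = defaultdict(set)  # owner record -> member records linked to it
parents = defaultdict(set)   # member record -> owner records linked to it

def link(owner, member):
    """Record a link (association) between two records."""
    children[owner].add(member)
    parents[member].add(owner)

link("Supplier A", "Bolt")
link("Supplier B", "Bolt")
link("Supplier A", "Nut")

# "Bolt" has two parents, which a strict tree would not allow.
bolt_parents = sorted(parents["Bolt"])
supplier_a_children = sorted(children["Supplier A"])
print(bolt_parents, supplier_a_children)
```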

✓ Hierarchical Data Model

✓ In this data model, data is represented by

a collection of records and the

relationships are represented by links.

✓ Each record is a collection of fields

(attributes) each of which contains only

one data value.

✓ The Hierarchical data model organizes

data in a tree structure.

✓ In this model each entity has only one

parent but can have several children. At

the top of hierarchy there is only one

entity which is called Root node.

✓ Advantages:

o Simplicity: The relationship

between the various layers is

logically simple.

o Data Security: The data security is

provided by the DBMS.

o Data Integrity: There is always link

between the parent segment and the

child segment under it.

o Efficiency: It is very efficient when the database contains a large number of one-to-many relationships and when the user requires a large number of transactions.
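The tree structure of the hierarchical model can be sketched with a small Python class in which every record has exactly one parent (none for the root) and any number of children. The class name `Node` and the college example are assumptions for illustration.

```python
class Node:
    """A record in a hierarchy: one parent (None for the root), many children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Follow the single parent link up to the root and return the path."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " / ".join(reversed(names))

root = Node("College")
science = Node("Science", root)
commerce = Node("Commerce", root)
physics = Node("Physics", science)

physics_path = physics.path()
root_children = [child.name for child in root.children]
print(physics_path, root_children)
```

Because each node has only one parent, there is exactly one path from the root to any record.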



➢ Comparison of Data Models:

Characteristic: Data structure

✓ Hierarchical model: One-to-many or one-to-one relationships. Based on the parent-child relationship.

✓ Network model: Supports many-to-many relationships. A record can have many parents as well as many children.

✓ Relational model: One-to-one, one-to-many and many-to-many relationships. Based on relational data structures.

Characteristic: Data manipulation

✓ Hierarchical model: Does not provide an independent standalone query interface. Retrieval algorithms are complex and asymmetric.

✓ Network model: Uses CODASYL (Conference on Data Systems Languages). Retrieval algorithms are complex and symmetric.

✓ Relational model: Relational databases bring many sources into a common query language (such as SQL). Retrieval algorithms are simple and symmetric.

Characteristic: Data integrity

✓ Hierarchical model: Cannot insert the information of a child that does not have a parent. Multiple occurrences of child records lead to problems of inconsistency during the update operation. Deletion of a parent results in deletion of its child records.

✓ Network model: Does not suffer from any insertion anomaly. Free from update anomalies. Free from delete anomalies.

✓ Relational model: Does not suffer from any insertion anomaly. Free from update anomalies. Free from delete anomalies.

➢ Basic Rules for the Relational Data model / Codd's Rules for RDBMS

Dr Edgar F. Codd, after extensive research on the relational model of database systems, proposed twelve rules which, according to him, a database must obey in order to be regarded as a true relational database.

✓ Rule 0: Foundation Rule A system that qualifies as a relational database management system must manage stored data using only its relational capabilities. This is the foundation rule, which acts as a base for all the other rules.

✓ Rule 1: Information Rule The data stored in a database, may it be user data or metadata,

must be a value of some table cell. Everything in a database must be stored in a table format.

✓ Rule 2: Guaranteed Access Rule Every single data element (value) is guaranteed to be logically accessible with a combination of table name, primary key (row value), and attribute name (column value). No other means, such as pointers, can be used to access data.

✓ Rule 3: Systematic Treatment of NULL Values The NULL values in a database must be given a systematic and uniform treatment. This is a very important rule because a NULL can be interpreted as one of the following: data is missing, data is not known, or data is not applicable.

✓ Rule 4: Active Online Catalog The structure description of the entire database must be stored

in an online catalog, known as data dictionary, which can be accessed by authorized users.

Users can use the same query language to access the catalog which they use to access the

database itself.

✓ Rule 5: Comprehensive Data Sub-Language Rule A database can only be accessed using a language having linear syntax that supports data definition, data manipulation, and transaction management operations. This language can be used directly or by means of some application. If the database allows access to data without the help of this language, it is considered a violation.

✓ Rule 6: View Updating Rule All the views of a database, which can theoretically be updated,

must also be updatable by the system.

✓ Rule 7: High-Level Insert, Update, and Delete Rule A database must support high-level insertion, update, and deletion. This must not be limited to a single row; that is, it must also support union, intersection and minus operations to yield sets of data records.

✓ Rule 8: Physical Data Independence The data stored in a database must be independent of

the applications that access the database. Any change in the physical structure of a database

must not have any impact on how the data is being accessed by external applications.

✓ Rule 9: Logical Data Independence The logical data in a database must be independent of its users' view (application). Any change in logical data must not affect the applications using it. For example, if two tables are merged or one is split into two different tables, there should be no impact or change on the user application. This is one of the most difficult rules to apply.

✓ Rule 10: Integrity Independence A database must be independent of the application that

uses it. All its integrity constraints can be independently modified without the need of any

change in the application. This rule makes a database independent of the front-end application

and its interface.

✓ Rule 11: Distribution Independence The end-user must not be able to see that the data is

distributed over various locations. Users should always get the impression that the data is

located at one site only. This rule has been regarded as the foundation of distributed database

systems.

✓ Rule 12: Non-Subversion Rule If a system has an interface that provides access to low-level records, then the interface must not be able to subvert the system and bypass security and integrity constraints.
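Two of these rules are easy to observe in practice. The sketch below uses Python's built-in sqlite3 module; SQLite is used here only as a convenient example and is not claimed to satisfy all of Codd's rules. The `student` table is hypothetical, while `sqlite_master` is SQLite's own catalog table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    INSERT INTO student VALUES (1, 'Asha', '98450'), (2, 'Ravi', NULL);
""")

# Rule 3: NULL is treated uniformly. It is not equal to anything, not even
# to NULL, so a missing value must be tested with IS NULL.
equals_null = conn.execute("SELECT COUNT(*) FROM student WHERE phone = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM student WHERE phone IS NULL").fetchone()[0]

# Rule 4: the catalog is itself a table, queried with the same language.
catalog = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(equals_null, is_null, catalog)
```

The comparison `phone = NULL` matches no rows, while `phone IS NULL` finds the one student whose phone number is missing.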

➢ Normalization Rule:

✓ Normalization is a step by step process of removing the different kinds of redundancy and

anomaly one step at a time from the database.

✓ Normalization is the process of organizing data in a database. This includes creating tables

and establishing relationships between those tables according to rules designed both to protect

the data and to make the database more flexible by eliminating redundancy and inconsistent

dependency.

✓ Normalization rules are divided into the following normal forms.

1. First Normal Form

2. Second Normal Form

3. Third Normal Form

1. First Normal Form:

✓ First Normal Form is defined in the definition of relations (tables) itself. This rule defines that

all the attributes in a relation must have atomic domains. The values in an atomic domain are

indivisible units.

✓ In other words : An attribute (column) of a table cannot hold multiple values. It should hold

only atomic values.

Course      | Content
Programming | C++, Python
Web         | HTML, PHP, ASP

✓ We re-arrange the relation (table) as below, to convert it to First Normal Form.

Course      | Content
Programming | C++
Programming | Python
Web         | HTML
Web         | PHP
Web         | ASP

✓ Each attribute must contain only a single value from its pre-defined domain.
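The re-arrangement into First Normal Form can be sketched in Python: each multi-valued Content cell is split so that every row holds one atomic value. The function name `to_first_normal_form` is an assumption for this example.

```python
# Un-normalized relation: the Content column holds several values per row.
unnormalized = [
    ("Programming", "C++, Python"),
    ("Web", "HTML, PHP, ASP"),
]

def to_first_normal_form(rows):
    """Emit one (course, content) row per atomic content value."""
    return [
        (course, item.strip())
        for course, content in rows
        for item in content.split(",")
    ]

normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```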

2. Second Normal Form:

✓ Note: Prime attribute: An attribute which is a part of a candidate key.

Non-prime attribute: An attribute which is not a part of any candidate key.

✓ A relation is in Second Normal Form if every non-prime attribute is fully functionally dependent on the prime key attributes. That is, if X → A holds, then there should not be any proper subset Y of X for which Y → A also holds.

✓ Students_project table

Stu-Id Proj-Id Stu-Name Proj-Name

✓ We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.

According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent

upon both and not on any of the prime key attribute individually. But we find that Stu_Name

can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This

is called partial dependency, which is not allowed in Second Normal Form.

✓ Students table

Stu-Id Stu-Name Proj-Id

✓ Project table

Proj-Id Proj-Name
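The partial dependency can be checked on sample data with a short Python sketch. The rows below are invented for illustration, and `determines` is a hypothetical helper; it can only show that the given rows are consistent with a functional dependency, not that the dependency holds in general.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always give equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

student_project = [
    {"Stu_ID": 1, "Proj_ID": "P1", "Stu_Name": "Asha", "Proj_Name": "Library"},
    {"Stu_ID": 1, "Proj_ID": "P2", "Stu_Name": "Asha", "Proj_Name": "Payroll"},
    {"Stu_ID": 2, "Proj_ID": "P1", "Stu_Name": "Ravi", "Proj_Name": "Library"},
]

# Each non-prime attribute depends on only a part of the key: partial dependency.
name_from_stu = determines(student_project, ["Stu_ID"], ["Stu_Name"])
proj_from_proj = determines(student_project, ["Proj_ID"], ["Proj_Name"])
# Stu_ID alone does not determine the project name.
proj_from_stu = determines(student_project, ["Stu_ID"], ["Proj_Name"])

# Moving Proj_Name into its own relation stores each project name only once.
projects = sorted({(r["Proj_ID"], r["Proj_Name"]) for r in student_project})
print(name_from_stu, proj_from_proj, proj_from_stu, projects)
```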

3. Third Normal Form:

✓ For a relation to be in Third Normal Form, it must be in Second Normal Form and the following must hold:

• No non-prime attribute is transitively dependent on a prime key attribute.

• For any non-trivial functional dependency X → A, either:

o X is a super-key, or

o A is a prime attribute.

✓ Student_Detail

Stu-Id Stu-Name City Zip

✓ We find that in the above Student_Detail relation, Stu_ID is the key and the only prime key attribute. We find that City can be identified by Stu_ID as well as by Zip itself. Neither is Zip a super-key nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists a transitive dependency.

✓ To bring this relation into Third Normal Form, we break the relation into two relations as follows:

✓ Student_detail

Stu-Id Stu-Name Zip

✓ Zip Codes



Zip City
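The benefit of this decomposition can be sketched with sqlite3: the city is stored once per zip code, so a single update corrects it for every student. The table names follow the two relations above, while the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zip_codes (zip TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE student_detail (
        stu_id   INTEGER PRIMARY KEY,
        stu_name TEXT,
        zip      TEXT REFERENCES zip_codes(zip)
    );
    INSERT INTO zip_codes VALUES ('560001', 'Bangalore');
    INSERT INTO student_detail VALUES (1, 'Asha', '560001'), (2, 'Ravi', '560001');
""")

# The city is stored in one place, so one UPDATE changes it for every student.
conn.execute("UPDATE zip_codes SET city = 'Bengaluru' WHERE zip = '560001'")

cities = conn.execute("""
    SELECT s.stu_name, z.city
    FROM student_detail AS s JOIN zip_codes AS z ON s.zip = z.zip
    ORDER BY s.stu_id
""").fetchall()
print(cities)
```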

➢ Entity-Relationship Diagram / ER Diagram

✓ ER-Diagram is a visual representation of data that describes how data

is related to each other.

• Entity:

o An Entity can be any object, place, person or class.

o In E-R Diagram, an entity is represented using rectangles.

• Attribute:

o An Attribute describes a property or characteristic of an entity.

o Attributes are represented by means of ellipses.

o Every ellipse represents one attribute and is directly connected to its entity (rectangle).

o For example, Roll_No, Name and Birth date can be attributes of a student

• Relationship:

o A relationship type is a meaningful association between entity

types.

o Relationship types are represented on the E-R diagram by a series

of lines.

o A Relationship describes relations between entities.

o A relationship is represented using a diamond-shaped box.

o There are three types of relationship that exist between entities.

• Binary Relationship

• Recursive Relationship

• Ternary Relationship

✓ Binary Relationship:

o It means relation between two entities. This is further divided into three types.

1. One to One:

o This type of relationship is rarely seen in real world.

o [Figure: one-to-one relationship, Model and Example diagrams]

o The example describes a case in which one student can enroll for only one course and a course has only one student. Relationships of this kind are uncommon in practice.

2. One to Many:

o It reflects a business rule in which one instance of an entity is associated with many instances of another entity.

o For example, a Student enrolls for only one Course, but a Course can have many Students.

o [Figure: one-to-many relationship, Model and Example diagrams]

o The arrows in the diagram show that one student can enroll for only one course.

3. Many to Many:

o [Figure: many-to-many relationship, Model and Example diagrams]

o The diagram shows that a student can enroll for more than one course and a course can have many students.

➢ Generalization

✓ In generalization, a number of entities are brought

together into one generalized entity based on their

similar characteristics.

✓ It is a bottom-up approach in which two or more lower-level entities combine to form a higher-level entity.

✓ In generalization, the higher-level entity can in turn combine with other lower-level entities to form an even higher-level entity.

✓ For example, pigeon, house sparrow, crow and dove can all be generalized as Birds.

➢ Specialization

✓ Specialization is the opposite of generalization.

✓ In specialization, a group of entities is divided into sub-

groups based on their characteristics.

✓ It is a top-down approach in which one higher-level entity can be broken down into two or more lower-level entities.

✓ In specialization, some higher-level entities may not have lower-level entity sets at all.

✓ Take a group ‘Person’ for example. A person has a name, date of birth, gender, etc.

✓ Similarly, in a school database, persons can be specialized as teacher, student or staff member, based on the role they play in the school.

➢ The Relational Model

✓ The Relational Model was proposed in 1970 by E. F. Codd of IBM.

✓ It is the dominant model for commercial data processing applications. Nearly all databases are based on this model.

✓ Let us explore this model in detail.

1. Terminology

2. Views



3. Structure of Relational Databases

• Keys

4. The Relational Algebra

• The Select Operation

• The Project Operation

• The Cartesian Product Operation

• The Union Operation

• The Set Difference Operation

• The Set Intersection Operation

1. Terminology:

Different terms used in relational model are being discussed here.

✓ Relation: A relation may be thought of

as a set of rows with several columns. A

relation has the following properties:

o Each row represents a real-world entity or relationship.

o All values in particular column are of

same kind.

o Order of columns is immaterial.

o Each row is distinct.

o Order of rows is immaterial.

o For a row, each column must have an atomic (indivisible) value.

o For a row, a column cannot have more than one value.

✓ Domain: A domain is a pool of values from which the actual values present in a given column are taken.

✓ Tuple: This is the horizontal part of the relation. One row represents one record of the

relation. The rows of a relation are also called tuples.

✓ Attribute: The columns of a table are also called attributes. The column is the vertical part of the relation.

✓ Degree: The number of attributes (columns) in a relation determines the degree of the relation.

✓ Cardinality: It is the number of rows (or tuples) in a table.
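
✓ As a small sketch of these terms, a relation can be modelled in Python as a tuple of attribute names plus a set of tuples. The STUDENT rows below are made-up sample data, not taken from these notes.

```python
# A relation modelled as a header (attribute names) plus a set of tuples.
# The rows are hypothetical sample data.
attributes = ("Regno", "Name", "Age")
student = {
    (101, "Anu", 17),
    (102, "Bala", 18),
    (103, "Chitra", 17),
    (101, "Anu", 17),   # repeated row: a set keeps only one copy
}

degree = len(attributes)      # number of columns
cardinality = len(student)    # number of distinct rows (tuples)

print(degree, cardinality)
```

Using a set mirrors two properties listed above: each row is distinct, and the order of rows is immaterial.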

2. Views:

✓ A view is a pseudo-table or virtual table. It displays data derived from one or more base tables.

✓ A view is a kind of table whose contents are taken from other tables according to a given query condition.

✓ No stored file is created to hold the contents of a view; only its definition is stored.

✓ Views are useful because they provide an excellent way to give people access to some, but not all, of the information in a table.

✓ Syntax:

CREATE VIEW <view name> AS SELECT <attribute list> FROM <table(s)>

WHERE <condition(s)>;
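
✓ The syntax can be tried out with Python's built-in sqlite3 module, as in the sketch below. The table student, the view student_names and the sample rows are assumptions chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute("CREATE TABLE student (regno INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Anu", "90000-00001"), (2, "Bala", "90000-00002")],
)

# The view exposes regno and name only; the phone column stays hidden.
conn.execute(
    "CREATE VIEW student_names AS SELECT regno, name FROM student WHERE regno > 0"
)

view_rows = conn.execute("SELECT * FROM student_names ORDER BY regno").fetchall()
print(view_rows)

# Only the view's definition is stored, so a new base-table row is visible through it.
conn.execute("INSERT INTO student VALUES (3, 'Chitra', '90000-00003')")
view_rows_after = conn.execute("SELECT * FROM student_names ORDER BY regno").fetchall()
print(view_rows_after)
```

A user who is given access to student_names alone can read names but not phone numbers, which is the "some but not all" access described above.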



3. Structure of Relational Database:

✓ Keys:

Keys are used to distinguish the rows of a relation from one another by means of their attribute values.

o Primary Key

▪ It is a column (or columns) in a table that uniquely identifies each row.

▪ A primary key value is unique and cannot be null.

▪ There is only one primary key for a table.

▪ Ex: In Relation STUDENT, Regno serves as a primary key.

o Candidate key

▪ It is a column (or columns) that uniquely identify rows in a table.

▪ Any of the identified candidate keys can be used as the table's primary key.

o Alternate key

▪ Any candidate key that is not chosen as the primary key is called an alternate key.

▪ OR The alternate keys of a table are those candidate keys which are not currently selected as the primary key.

▪ This is also known as secondary key.

o Foreign key

▪ It is a column (or a set of columns) that refers to the primary key in another table

i.e. it is used as a link to a matching column in another table.

▪ OR A key used to link two tables together is called a foreign key.

▪ This is sometimes called a referencing key.

▪ Foreign key is a field that matches the primary key column of another table

o To understand the concept of keys, consider an example. Suppose that in a school class there are four students who are eligible to be monitor. All these students are candidate keys. The student who is selected as monitor is treated as the primary key.

o Now suppose the student who became monitor is not available in class, so the class teacher chooses another student from the remaining three eligible students to take on the responsibility of monitor. This student is called the alternate key.
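
✓ The key types can be seen in action with a small sqlite3 sketch. The tables student and marks, the email column (standing in for a second candidate key) and the helper rejected are hypothetical names introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# regno and email are both candidate keys; regno is chosen as the primary key,
# so email (declared UNIQUE) plays the role of an alternate key.
conn.execute("""
    CREATE TABLE student (
        regno INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        name  TEXT
    )""")
# marks.regno is a foreign key that refers to student.regno.
conn.execute("""
    CREATE TABLE marks (
        regno   INTEGER REFERENCES student(regno),
        subject TEXT,
        score   INTEGER
    )""")

conn.execute("INSERT INTO student VALUES (1, 'anu@example.com', 'Anu')")
conn.execute("INSERT INTO marks VALUES (1, 'Maths', 88)")

def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_primary = rejected("INSERT INTO student VALUES (1, 'bala@example.com', 'Bala')")
dup_alternate = rejected("INSERT INTO student VALUES (2, 'anu@example.com', 'Bala')")
bad_foreign = rejected("INSERT INTO marks VALUES (99, 'Maths', 70)")

print(dup_primary, dup_alternate, bad_foreign)
```

All three inserts are refused: the first repeats a primary key value, the second repeats an alternate key value, and the third refers to a student that does not exist.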

4. The Relational Algebra:

✓ The relational algebra is a collection of operations on relations.

✓ The relational algebraic operations can be divided into:

• Basic set-oriented operations: Union, Set difference, Cartesian product

• Relation-oriented operations: Selection, Projection, Division, Joins

(A) The Select Operation:

✓ The select operation selects tuples from a relation that satisfy a given condition.

✓ The selection is denoted by lowercase Greek letter σ (sigma).

✓ Suppose we have a table named Items as shown in Fig. (a). To select those tuples from the Items relation where the price is more than 9.00, we shall write

σ price > 9.00 (Items)

✓ That means: from table Items, select the tuples satisfying the condition price > 9.00. The relation that results from the above query is shown in Fig. (b).

Fig. (a): Items

Item#   Item-Name    Price
11      Milk         15.00
12      Cake         5.00
13      Bread        9.00
14      Ice Cream    14.00
15      Cold Drink   8.00

Fig. (b): Result of σ price > 9.00 (Items)

Item#   Item-Name    Price
11      Milk         15.00
14      Ice Cream    14.00

✓ In a selection condition, all the relational operators (=, ≠, <, ≤, >, ≥) may be used.

✓ More than one condition may be combined using the connectives and (denoted by ∧) and or (denoted by ∨).

✓ To find those tuples with prices between 5.00 and 9.00 (inclusive) from relation Items, we shall write σ price ≥ 5.00 ∧ price ≤ 9.00 (Items)

✓ The result of this query is as shown in this table:

Item#   Item-Name    Price
12      Cake         5.00
13      Bread        9.00
15      Cold Drink   8.00

✓ NOTE: Here, ‘σ’ is used only to show how the select operation works in principle. In practice, the SQL SELECT command (with a WHERE clause) is used to perform the select operation instead of ‘σ’, which we will see in later slides.
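
✓ As a rough sketch, the select operation can be imitated in Python by filtering a list of tuples. The function name select is an assumption for this example; the rows are those of the Items relation.

```python
# Items relation: (Item#, Item-Name, Price).
items = [
    (11, "Milk", 15.00),
    (12, "Cake", 5.00),
    (13, "Bread", 9.00),
    (14, "Ice Cream", 14.00),
    (15, "Cold Drink", 8.00),
]

def select(relation, predicate):
    """Relational select: keep the tuples for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]

# sigma price > 9.00 (Items)
costly = select(items, lambda row: row[2] > 9.00)

# Two conditions joined with "or": price < 6.00 or price > 14.00
extremes = select(items, lambda row: row[2] < 6.00 or row[2] > 14.00)

print(costly)
print(extremes)
```

The result keeps whole rows and only reduces their number, which is why select is described as returning a "horizontal" subset.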

(B) The Project Operation:

✓ The Project Operation yields a “vertical” subset of a given relation in contrast to

the “horizontal” subset returned by select operation.

✓ The projection lets you select specified attributes in a specified order, and duplicate tuples are automatically removed.

✓ Projection is denoted by the Greek letter pi (π).

✓ Suppose we have the table Suppliers as in Fig.-(a). To project supplier names and their cities from the relation Suppliers, we shall write

π Supp-Name, City (Suppliers)

✓ The relation resulting from this query is as shown in Fig.-(b).

Fig.-(a): Suppliers

Supp#   Supp-Name      Status   City
S1      Britannia      10       Delhi
S2      New Bakers     30       Mumbai
S3      Mother Dairy   10       Delhi
S4      Cookz          50       Bangalore
S5      Haldiram       40       Jaipur

Fig.-(b): Result of π Supp-Name, City (Suppliers)

Supp-Name      City
Britannia      Delhi
New Bakers     Mumbai
Mother Dairy   Delhi
Cookz          Bangalore
Haldiram       Jaipur

✓ Duplicate tuples are automatically removed in the resulting relation. For instance, if you write

π City (Suppliers)

✓ The resulting relation will be as follows (Delhi appears only once):

City
Delhi
Mumbai
Bangalore
Jaipur

✓ Project operation can also be applied on a resulting relation of a query.

✓ Consider the table Items given in the previous slide. If we want only the names of those items that are costlier than Rs. 9.00, we may write

π Item-Name (σ Price > 9.00 (Items))

✓ First the inner query is evaluated, and then the outer query is evaluated.

✓ The result of the above query is:

Item-Name
Milk
Ice Cream

✓ NOTE: Here, ‘π’ is used only to show how the project operation works in principle. In practice, the SQL SELECT command (with a column list) is used to perform the project operation instead of ‘π’, which we will see in later slides.
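
✓ A minimal Python sketch of the project operation is given below. The function name project and the constant SUPPLIER_ATTRS are assumptions for this example; the rows are those of the Suppliers relation.

```python
# Suppliers relation and its attribute names.
SUPPLIER_ATTRS = ("Supp#", "Supp-Name", "Status", "City")
suppliers = [
    ("S1", "Britannia", 10, "Delhi"),
    ("S2", "New Bakers", 30, "Mumbai"),
    ("S3", "Mother Dairy", 10, "Delhi"),
    ("S4", "Cookz", 50, "Bangalore"),
    ("S5", "Haldiram", 40, "Jaipur"),
]

def project(relation, attrs, wanted):
    """Relational project: keep the wanted columns, in the given order, without duplicates."""
    positions = [attrs.index(name) for name in wanted]
    result = []
    for row in relation:
        projected = tuple(row[i] for i in positions)
        if projected not in result:   # drop duplicate tuples
            result.append(projected)
    return result

names_and_cities = project(suppliers, SUPPLIER_ATTRS, ["Supp-Name", "City"])
cities = project(suppliers, SUPPLIER_ATTRS, ["City"])

print(names_and_cities)
print(cities)
```

Projecting on City alone yields four tuples rather than five, because the second Delhi is a duplicate and is dropped.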

(C) The Cartesian Product Operation:

✓ The Cartesian product is a binary operation and is denoted by a cross (×).

✓ The Cartesian product of two relations A and B is written as A × B.

✓ The Cartesian product of two relations yields a relation with all possible combinations of the tuples of the two relations operated upon.

✓ Every tuple of the first relation is concatenated with every tuple of the second relation to form the tuples of the new relation.

✓ Suppose we have two relations Student and Instructor as follows:

Student

Stud#   Stud-Name   Hosteler
S001    Meenakshi   Y
S002    Radhika     N
S003    Abhinav     N

Instructor

Inst#   Inst-Name    Subject
101     K. Lal       English
102     R.L. Arora   Maths

✓ The Cartesian product of these two relations, Student × Instructor, will yield a relation that has a degree of 6 (3 + 3: sum of the degrees of Student and Instructor) and a cardinality of 6 (3 × 2: product of the cardinalities of the two relations).

✓ The resulting relation (Student × Instructor) is as follows:

Stud#   Stud-Name   Hosteler   Inst#   Inst-Name    Subject
S001    Meenakshi   Y          101     K. Lal       English
S001    Meenakshi   Y          102     R.L. Arora   Maths
S002    Radhika     N          101     K. Lal       English
S002    Radhika     N          102     R.L. Arora   Maths
S003    Abhinav     N          101     K. Lal       English
S003    Abhinav     N          102     R.L. Arora   Maths

✓ Note that the resulting relation contains all possible combinations of tuples of the two relations.
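
✓ The same product can be sketched in Python with itertools.product from the standard library. The variable names are assumptions for this example; the rows are those of the Student and Instructor relations.

```python
from itertools import product

student = [
    ("S001", "Meenakshi", "Y"),
    ("S002", "Radhika", "N"),
    ("S003", "Abhinav", "N"),
]
instructor = [
    (101, "K. Lal", "English"),
    (102, "R.L. Arora", "Maths"),
]

# Concatenate every Student tuple with every Instructor tuple.
cartesian = [s + i for s, i in product(student, instructor)]

degree = len(cartesian[0])     # 3 + 3 attributes
cardinality = len(cartesian)   # 3 x 2 tuples

print(degree, cardinality)
for row in cartesian:
    print(row)
```

With three Student tuples and two Instructor tuples, the product has 3 × 2 = 6 tuples, each with 3 + 3 = 6 attributes.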

(D) The Union Operation:

✓ The union operation requires two relations and produces a third relation that contains the tuples from both operand relations.

✓ The union operation is denoted by ∪. Thus, to denote the union of two relations X and Y, we write X ∪ Y.

✓ For a union operation A ∪ B to be valid, the following two conditions must be satisfied by the two operands A and B:

• The relations A and B must be of the same degree. That is, they must have the same number of attributes.

• The domains of the ith attribute of A and the ith attribute of B must be the same.

✓ Suppose we have two relations, Drama and Song, as follows:

Drama

Rollno   Name      Age
13       Kush      15
17       Swati     14

Song

Rollno   Name      Age
2        Manya     15
10       Rishabh   15
13       Kush      15

✓ The result of Song ∪ Drama is as follows:

Rollno Name Age

2 Manya 15

10 Rishabh 15

13 Kush 15

17 Swati 14

✓ Notice that one duplicate tuple (13, Kush, 15) has been automatically removed.
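
✓ Since relations are sets of tuples, Python's built-in set type gives a quick sketch of union. The helper union_compatible is a hypothetical name, and it only approximates the "same domain" rule by comparing Python types column by column.

```python
drama = {(13, "Kush", 15), (17, "Swati", 14)}
song = {(2, "Manya", 15), (10, "Rishabh", 15), (13, "Kush", 15)}

def union_compatible(a, b):
    """Rough check: every row has the same degree and the same type in each column."""
    shapes = {tuple(type(value) for value in row) for row in a | b}
    return len(shapes) == 1

compatible = union_compatible(song, drama)

# Set union keeps a single copy of the shared tuple (13, "Kush", 15).
union = sorted(song | drama)

print(compatible)
print(union)
```

Sorting is used here only to print the rows in Rollno order; the order of rows in a relation is immaterial.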

(E) The Set Difference Operation:

✓ The set difference operation denoted by – (minus) allows us to find tuples that are in one

relation but not in another.

✓ The expression A – B results in a relation containing those tuples in A but not in B.

✓ Suppose we have the two relations Drama and Song as given in the previous slide.

✓ The result of Song – Drama is as follows:

Rollno Name Age

2 Manya 15

10 Rishabh 15

(F) The Set Intersection Operation:

✓ The set intersection operation finds the tuples that are common to the two operand relations.

✓ The set intersection operation is denoted by ∩. That means A ∩ B will yield a relation having the tuples common to A and B.

✓ Suppose we have the two relations Drama and Song as given in the previous slide.

✓ The result of Song ∩ Drama is as follows:

Rollno Name Age

13 Kush 15

✓ NOTE: Any relational algebra expression using set intersection can be rewritten by replacing the intersection operation with a pair of set difference operations as:

A ∩ B = A – (A – B)
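
✓ Set difference, set intersection and the identity above can be checked with Python sets, as in this small sketch. The variable names are assumptions; the rows are those of the Drama and Song relations.

```python
drama = {(13, "Kush", 15), (17, "Swati", 14)}
song = {(2, "Manya", 15), (10, "Rishabh", 15), (13, "Kush", 15)}

difference = song - drama                # tuples in Song but not in Drama
intersection = song & drama              # tuples common to Song and Drama
via_difference = song - (song - drama)   # A ∩ B rewritten as A – (A – B)

print(sorted(difference))
print(intersection)
print(intersection == via_difference)
```

Note that set difference is not symmetric: Drama – Song would contain only (17, Swati, 14).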



➢ Comparing Relational Algebra and Structured Query Language

RA (Relational Algebra)                                     SQL (Structured Query Language)
Is closed (the result of every expression is a relation)    Is a superset of relational algebra
Simple semantics                                            Complicated semantics
Is used for reasoning, query optimization, etc.             Is an end-user language

➢ Data warehouse

✓ A data warehouse is a repository of an organization's electronically stored data.

✓ Data warehouses are designed to facilitate reporting and to support data analysis.

✓ The concept of data warehouses was introduced in the late 1980s.

✓ The components of a data warehouse are:

• Data Source

• Data Transformation

• Reporting

• Metadata

✓ Additional components are dependent data marts, logical data marts and the operational data store.

✓ Advantages of data warehouses:

• Enhance end-user access to reports and analysis of information.

• Increases data consistency.

• Increases productivity and decreases computing costs.

• Able to combine data from different sources, in one place.

• Data warehouses provide an infrastructure that could support changes to data and

replication of the changed data back into the operational systems.

✓ Disadvantages:

• Extracting, cleaning and loading data could be time consuming.

• Data warehouses can get outdated relatively quickly.

• Problems with compatibility with systems already in place.

• Providing training to end-users.

• Security could develop into a serious issue, especially if the data warehouse is internet accessible.

• A data warehouse is usually not static, and maintenance costs are high.

➢ Data Mining

✓ Data mining is concerned with analysing data and picking out relevant information.

✓ The computer is responsible for finding the patterns by identifying the underlying rules of the features in the data.
