DATABASE CONCEPTS · rvs06.files.wordpress.com… · 13.07.2020
Kensri School and College, Bengaluru
Computer Science 1 Class 12/PU-II
DATABASE CONCEPTS
➢ Database Systems:
• Systems comprising databases and
Database Management Systems are
simply referred to as database systems.
• A collection of data is referred to as
database and a database (management)
system is basically a computer-based
record keeping system.
• It maintains any information that may
be necessary to the decision-making
processes involved in the management
of the organization.
• The intention of a database is that the
same collection of data should serve as
many applications as possible.
• Database would permit not only the
retrieval of data but also continuous
modification of data needed for control
of operations.
• It may be possible to search the
database to obtain answers to queries
or information for planning purpose.
• A typical file processing system suffers from some major limitations such as data redundancy, data
inconsistency, unsharable data, unstandardized data and insecure data. A database system, on the
other hand, overcomes all these limitations and ensures continued efficiency.
➢ Examples of Common Database Management Systems:
✓ MySQL, INGRES, POSTGRES, ORACLE, DB2.
➢ Features/Advantages of Database system are:
✓ Reduced data redundancy:
✓ Duplication of data is data redundancy. It leads to problems such as wasted space and
data inconsistency.
✓ Data is said to be redundant if the same data is copied to many places.
✓ For example: If a student wants to change a phone number, it has to be updated in
every section that keeps a copy. Similarly, old records must be deleted from all sections
that hold data about that student.
✓ Controlled data inconsistency:
✓ When redundancy is not controlled, there may be occasions on which two entries
about the same data do not agree. At such times, the database is said to be inconsistent.
✓ If there is some redundancy retained in the database due to some technical reasons, the
database management system ensures that any change made to either of the two entries is
automatically made to the other.
✓ Shared data:
✓ The database allows sharing of data by several users. This means each user may have
access to the same database/table/record at the same time.
✓ Standardized data:
✓ The database management system can ensure that all the data follow the applicable
standards.
✓ There may be some industry standards, organizational standards, and national or
international standards.
✓ Standardizing stored data formats is particularly desirable as an aid to data interchange or
migration between systems.
✓ Secured data:
✓ Data is vital to any organization and some of it may be confidential. Confidential data
must not be accessed by unauthorized persons.
✓ Authentication schemes can be laid down, giving different levels of users, different
permissions to access data.
✓ Integrated data (data integrity):
✓ This means that the data is accurate and consistent. Checks can be built in to ensure that
correct values are entered.
✓ For example, while placing an order, the quantity must be a number above zero. Also, if
an order is placed with a supplier, that supplier must exist.
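The two integrity checks in the example above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names and sample supplier are assumptions made for the sketch, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables for the "order placed with a supplier" example.
conn.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE purchase_order (
        order_id    INTEGER PRIMARY KEY,
        supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id),
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    )
""")
conn.execute("INSERT INTO supplier VALUES (1, 'Sample Supplier')")

def try_insert(supplier_id, quantity):
    """Return True if the order is accepted, False if an integrity check rejects it."""
    try:
        conn.execute(
            "INSERT INTO purchase_order (supplier_id, quantity) VALUES (?, ?)",
            (supplier_id, quantity),
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(1, 5))    # existing supplier, quantity above zero
print(try_insert(1, 0))    # rejected by the CHECK on quantity
print(try_insert(99, 5))   # rejected because supplier 99 does not exist
```

The checks live in the database itself, so every application that uses these tables gets the same protection.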
➢ Applications of database.
✓ Banking: For customer information, accounts and loans, and banking transactions.
✓ Colleges: For student information, course registrations and grades.
✓ Credit card transactions: For purchases on credit cards and generation of monthly
statements.
✓ Finance: For storing information about holdings, sales and purchases of financial
instruments such as stocks and bonds.
✓ Sales: For customer, product, and purchase information.
✓ Telecommunication: For keeping records of calls made, generating monthly bills,
maintaining balances on prepaid calling cards, and storing information about the
communication networks.
✓ Aadhaar database: One of the largest identity databases in the world, storing data about
more than a billion residents of India.
✓ Water meter billing: The RR number and all related details are stored in the database and
accessed by server-based billing applications.
✓ Rail and Airlines: For reservations and schedule information. Airlines were among the
first to use databases in a geographically distributed manner: terminals situated around the
world accessed the central database system through phone lines and other data networks.
✓ Manufacturing: For management of supply chain and for tracking production of items
in factories, inventories of items in warehouses/ stores, and orders for items.
✓ Human resources: For information about employees, recruitment, salaries, payroll taxes
and benefits, and for generation of paychecks.
➢ Evolution of Database
✓ Manual File systems :
✓ The file management systems were often manual, paper-and-pencil systems. The papers
within these systems were organized to facilitate the expected use of the data.
✓ As long as a collection of data was relatively small and an organization’s users had few
reporting requirements, the manual system served its role well as a data repository.
✓ As organizations grew and as reporting requirements became more complex, keeping
track of data in a manual file system became more difficult.
✓ Therefore, companies looked to computer technology for help.
✓ Computerized File systems:
✓ The computer files within the file system were similar to the manual files.
✓ The description of computer files requires a specialized vocabulary. Every discipline
develops its own terminology to enable its practitioners to communicate clearly.
Manual Data Processing | Computerized Data Processing
The volume of data that can be processed is limited. | The volume of data that can be processed is large.
Requires a large quantity of paper. | Requires less paper.
Speed and accuracy are limited. | Faster and more accurate.
Labour cost is high. | Labour cost is low.
Storage medium is paper. | Storage medium is hard disk, etc.
➢ Database terms :
✓ Data: Basic/raw facts about something that are not yet organized; for example, unorganized
details of some students.
✓ File: A file is the basic unit of storage in a computer system. A file is a large collection of
related data.
✓ Information: Well-processed data is called information. We can take decisions on the basis
of information.
✓ Attribute or Field: A set of characters that represents a specific data element. Each column is
identified by a distinct header called an attribute or field.
✓ Record: A single entry in a table is called a record or row. A record in a table represents a
set of related data.
✓ Tuple: A record is also called a tuple.
✓ Domain: The set of permitted values for an attribute (column).
✓ Entity relationship: Describes how tables are linked to each other.
✓ Data Item: Each piece of information about an entity, such as the name of a person, an address,
an age, the name of a product or its price, is a data item.
✓ Database: A database is a collection of logically related data organized in such a way that the data
can be easily accessed, managed and updated.
✓ Tables: A table is a collection of data elements organized in terms of rows and columns. A
table is also considered a convenient representation of a relation. A table is the simplest
form of data storage.
✓ Relation: A relation (a collection of rows and columns) generally refers to an active entity on
which we can perform various operations.
✓ Below is an example :
Employee Table
Table :Employee,
Columns :Emp_Id, NAME, AGE, SALARY
Rows :There are four rows
Emp_Id Name Age Salary
1 Bharath 28 45000/-
2 Hitesh 27 47000/-
3 Druva 29 40000/-
4 Akash 30 50000/-
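As a minimal sketch, the Employee table above can be created and queried with Python's built-in sqlite3 module. The column types and the salary query are assumptions chosen for illustration; the notes do not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp_Id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER)"
)

# The four rows (records/tuples) from the table above.
rows = [
    (1, "Bharath", 28, 45000),
    (2, "Hitesh", 27, 47000),
    (3, "Druva", 29, 40000),
    (4, "Akash", 30, 50000),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", rows)

# The attribute (column) names are available from the cursor description.
cursor = conn.execute("SELECT * FROM Employee ORDER BY Emp_Id")
columns = [d[0] for d in cursor.description]
records = cursor.fetchall()

# An example query: names of employees earning more than 45000.
high_paid = conn.execute(
    "SELECT Name FROM Employee WHERE Salary > 45000 ORDER BY Emp_Id"
).fetchall()

print(columns)
print(len(records), "records")
print(high_paid)
```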
➢ Data types of DBMS
✓ Integer – Holds whole numbers without fractions.
✓ Single and double precision – Hold numbers with fractional parts. Single precision keeps
about seven significant digits; double precision keeps about fifteen.
✓ Logical data type – Stores data that has only two values, true or false.
✓ Characters – Include letters, numbers, spaces, symbols and punctuation. Character fields or
variables store text information like name or address; each character occupies one byte.
✓ Strings – A sequence of more than one character. Fixed-length strings range from 0 to 63 KB and
dynamic strings range from 0 to 2 billion characters.
✓ Memo data type – Stores more than 255 characters. A memo field can store up to 65,536
characters. Long documents can be stored as OLE objects.
✓ Index fields – Used to store relevant information along with the documents. The document
input to an index field is used to find those documents when needed. The program provides
up to 25 user-definable index fields in an index set. Name drop-down look-up list, Standard,
auto-complete History list.
✓ Currency fields – The currency field accepts data in dollar form by default.
✓ Date fields – The date field accepts data entered in date format.
✓ Text fields – Accepts data as an alphanumeric text string.
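The difference between single and double precision can be seen with a small sketch in Python. It uses the standard struct module to squeeze a number into 4 bytes (single precision) and read it back; the helper name to_single and the sample value are assumptions made for this illustration.

```python
import struct

def to_single(x: float) -> float:
    """Store x in 4-byte single precision, then read it back as a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]

value = 123456789.0            # nine significant digits

print(struct.calcsize("f"))    # bytes used by single precision
print(struct.calcsize("d"))    # bytes used by double precision
print(to_single(value))        # single precision cannot keep all nine digits
print(value)                   # double precision stores this value exactly
```

Only about the first seven digits survive the round trip through single precision, which is why double precision is preferred when more significant digits are needed.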
➢ Database users.
✓ Many people are involved in designing, using and maintaining the database.
✓ The people who work with the database include: End Users, System Analysts, Application
Programmers and Database Administrators (DBA).
• End Users (Database Users): Database users are those who interact with the database in
order to query and update the database, and generate reports.
• System Analysts: System analysts determine the requirements of end users (especially
naïve users) in order to create a solution for their business needs, and focus on both
non-technical and technical aspects.
• Application programmers: These are the computer professionals who implement the
specifications given by the system analysts and develop the application programs.
• Database Administrators (DBA): The DBA is a person who has central control over both data
and applications. Some of the responsibilities of the DBA are authorizing access, schema
definition and modification, new software installation, and security enforcement and
administration.
➢ Data processing cycle.
✓ Data Collection: It is the process of systematic gathering of data from various sources that
has been systematically observed, recorded and organized.
✓ Data Input: The raw data is put into the
computer using a keyboard, mouse or other
devices such as the scanner, microphone
and the digital camera.
✓ Data Processing: Processing is the series
of actions or operations on the input data to
generate outputs.
✓ Data storage: Data and information should
be stored in memory so that it can be
accessed later.
✓ Output: The result obtained after
processing the data must be presented to the
user in user understandable form. The output can be generated in the form of report as hard
copy or soft copy.
✓ Communication: Computers nowadays have communication ability, which increases
their power. With wired or wireless communication connections, data may be input from a
distant place, processed in a remote area, stored in several different places and then
transmitted by modem as an email or posted to a website where online services are
rendered.
➢ Physical Data(File) Organization:
✓ All schemas are logical; the actual data is stored in bit format on the disk.
✓ Storage media include hard disks (where all the files are stored), floppy disks, drums, tapes,
SD cards, etc.
✓ System designers choose to organize, access and process records and files in different ways
depending on the type of application and the needs of users.
✓ The three commonly used file organizations are Sequential, Direct and Indexed Sequential
Access Method(ISAM).
✓ The selection of a particular file organization depends upon the application used. To access a
record some key field or unique identifying value that is found in every record in a file is
used.
➢ File Organization:
✓ A method of organizing or arranging the files on a storage medium is called file
organization.
✓ This is classified into 3 types namely
1. Sequential file organization
2. Random file organization
3. Indexed sequential organization
1. Sequential file Organization:
✓ In this type the files are stored in a storage medium one after the other from beginning to
end. The files can also be accessed sequentially. The storage medium which is used for
sequential file organization is magnetic tapes.
✓ Advantages:
• Storage medium is cheaper.
• Files can be arranged or organized very easily.
• Efficient in the usage of a storage space.
✓ Disadvantages:
• Less storage capacity.
• Random search is not possible.
• Time consuming search.
2. Random File Organization:
✓ In this type, the files are stored in storage medium one after the other in random order. It is
also called relative/direct file organization. We use magnetic disk as Storage medium.
✓ Advantages
• More storage capacity.
• Random search is possible.
• Searching takes less time than in sequential file organization.
✓ Disadvantages
• Storage medium is costlier.
• Organizing the files is difficult.
• Inefficient in usage of storage space.
3. Indexed Sequential Organization:
✓ In this type, the files are stored in storage medium in a sequential order along with index.
It is combination of sequential and random file organization. Here Magnetic disk is used as
storage medium. It is also called indexed sequential access method.
✓ Advantages
• Both sequential and random search is possible.
• Fast access to a desired record with the help of the index.
• More storage capacity.
✓ Disadvantages
• Storage medium is costlier.
• Less efficient in usage of storage space.
• More memory is required to store index.
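The three access methods can be compared with a small sketch in Python. An in-memory byte buffer stands in for the storage medium, and each record has a fixed length of 20 bytes. The record layout, function names and sample data are assumptions made for this illustration only.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte integer key + 16-byte name = 20 bytes.
RECORD = struct.Struct("<i16s")

def write_records(rows):
    """Write (key, name) rows one after the other, sorted by key."""
    f = io.BytesIO()
    for key, name in sorted(rows):
        f.write(RECORD.pack(key, name.encode("ascii")))
    return f

def unpack(raw):
    key, name = RECORD.unpack(raw)
    return key, name.rstrip(b"\x00").decode("ascii")

def sequential_search(f, wanted):
    """Sequential: read from the beginning until the key is found."""
    f.seek(0)
    reads = 0
    while (raw := f.read(RECORD.size)):
        reads += 1
        key, name = unpack(raw)
        if key == wanted:
            return name, reads
    return None, reads

def direct_read(f, position):
    """Random/direct: jump straight to record number `position` (0-based)."""
    f.seek(position * RECORD.size)
    return unpack(f.read(RECORD.size))

def build_index(f):
    """Indexed sequential: keep a key -> byte offset index beside the sorted file."""
    index, offset = {}, 0
    f.seek(0)
    while (raw := f.read(RECORD.size)):
        index[unpack(raw)[0]] = offset
        offset += RECORD.size
    return index

def indexed_search(f, index, wanted):
    f.seek(index[wanted])
    return unpack(f.read(RECORD.size))[1]

f = write_records([(30, "Druva"), (10, "Bharath"), (40, "Akash"), (20, "Hitesh")])
index = build_index(f)

print(sequential_search(f, 40))   # the name plus the number of records read
print(direct_read(f, 2))          # third record, reached with a single seek
print(indexed_search(f, index, 20))
```

The index costs extra memory, which matches the disadvantage listed above, but it lets a record be found by key without scanning the whole file.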
➢ Data abstraction:
✓ Data abstraction provides users with an abstract view of the system. It hides certain
details of how the data is stored, created and maintained.
✓ A database management system allows users to access and modify data stored in files.
✓ Each user may have different requirements and the data must be retrieved selectively and
efficiently.
✓ The complex designs of the data structures are hidden from the users through several
levels of abstraction, in order to simplify user interaction with the system.
➢ DBMS Architecture.
✓ The design of Database Management System highly depends on its architecture.
✓ It can be centralized or decentralized or hierarchical.
✓ Database architecture is logically divided into three types.
• Logical one-tier in 1-tier Architecture
• Logical two-tier Client/Server Architecture.
• Logical three-tier Client/Server Architecture.
✓ One-tier in 1-tier Architecture:
• The DBMS is the only entity: the user works directly
on the DBMS and uses it.
• Any changes made here are applied directly to the DBMS
itself.
• It does not provide handy tools for end users, so it is
mainly database designers and programmers who use
single-tier architecture.
✓ Two-tier Client / Server Architecture:
• In two-tier Client / Server architecture, the
User Interface programs and Application
Programs run on the client side.
• An interface called ODBC (Open Database
Connectivity) provides an API that allows
client side program to call the DBMS.
• Most DBMS vendors provide ODBC drivers. A
client program may connect to several
DBMS's. In this architecture some variation of
client is also possible for example in some
DBMS's more functionality is transferred to the
client including data dictionary, optimization etc.
✓ Three-tier Client / Server Architecture:
• Three-tier Client / Server database architecture is
commonly used architecture for web applications.
Intermediate layer called Application server or Web
Server stores the web connectivity software and the
business logic (constraints) part of application used
to access the right amount of data from the database
server.
• This layer acts like medium for sending partially
processed data between the database server and the
client.
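The separation into three tiers can be sketched in a few lines of Python. An in-memory SQLite database stands in for the database server, one function plays the application server holding the business logic, and another plays the client. The account table, the withdrawal rule and all names are assumptions made for this sketch; a real system would run the tiers on separate machines.

```python
import sqlite3

# Tier 3: database server (an in-memory SQLite database stands in for it).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO account VALUES (101, 500)")

# Tier 2: application server holding the business logic (constraints).
def withdraw(acc_no, amount):
    """Check the business rule first; update the database only if it passes."""
    row = db.execute("SELECT balance FROM account WHERE acc_no = ?", (acc_no,)).fetchone()
    if row is None:
        return "unknown account"
    if amount <= 0 or amount > row[0]:
        return "rejected"
    db.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, acc_no))
    return "ok"

# Tier 1: client. It talks only to the application tier and never issues SQL itself.
def client_request(acc_no, amount):
    return f"Withdrawal of {amount}: {withdraw(acc_no, amount)}"

print(client_request(101, 200))
print(client_request(101, 1000))
```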
➢ Various Levels of Database Implementation:
✓ DBMS 3-tier Architecture
DBMS 3-tier architecture divides the complete system into three inter-related but
independent modules as shown below:
1. Internal (or Physical) Level
2. Conceptual (or Logical) level
3. External (or View) level
1. Internal (or Physical) level
✓ The internal schema defines the
physical storage structure of the
database. The internal schema is a very
low-level representation of the entire
database. It contains multiple
occurrences of multiple types of
internal record. In ANSI terminology, it is
also called a "stored record".
✓ It describes how data are actually
stored on the storage medium. At this
level, complex low-level structures are
described in detail.
✓ At the physical level, the information
about the location of database objects
in the data store is kept. Various users
of DBMS are unaware of the locations
of these objects.
2. Conceptual (or Logical) level
✓ It describes what data are stored in the database. It also describes the relationships among
the data. It is used by database administrators who decide what data is to be kept in the
database.
✓ The conceptual schema describes the Database structure of the whole database for the
community of users. This schema hides information about the physical storage structures
and focuses on describing data types, entities, relationships, etc.
✓ For Example, STUDENT database may contain STUDENT and COURSE tables which
will be visible to users but users are unaware of their storage.
3. External (or View) level
✓ Most users access only a part of the database and the system provides views according to
the user’s requirement.
✓ An external schema describes the part of the database which specific user is interested in.
It hides the unrelated details of the database from the user. There may be "n" number of
external views for each database.
✓ Each external view is defined using an external schema, which consists of definitions of
various types of external record of that specific view.
✓ An external view is just the content of the database as it is seen by a particular
user.
✓ For example, the FACULTY of a university are interested in looking at the course details of
students, while STUDENTS are interested in looking at all details related to academics,
accounts, courses and hostels. So, different views can be generated for different users.
✓ For example, a user from the sales department will see only sales related data.
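An external view can be sketched with an SQL view, here created through Python's built-in sqlite3 module. The employee table, its columns and the view name sales_view are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (1, "Bharath", "Sales", 45000),
    (2, "Hitesh", "Accounts", 47000),
    (3, "Druva", "Sales", 40000),
])

# External view for a sales-department user: only Sales rows, and no salary column.
conn.execute(
    "CREATE VIEW sales_view AS SELECT emp_id, name FROM employee WHERE department = 'Sales'"
)

sales_rows = conn.execute("SELECT * FROM sales_view ORDER BY emp_id").fetchall()
print(sales_rows)
```

The user of sales_view sees neither the Accounts rows nor the salary column, although both are still stored at the conceptual level.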
✓ Data Independence:
Data independence is the ability to modify a scheme definition in one level without affecting a
scheme definition in a higher level. Two types of Data Independence are:
1. Physical data independence
✓ Modifies the scheme followed at the physical level without affecting the scheme followed
at the conceptual level.
✓ Modifications at the physical level are occasionally necessary in order to improve
performance of the system.
2. Logical data independence
✓ Modifies the conceptual scheme without causing any changes in the schemes followed at
view levels.
✓ Modifications at the conceptual level are necessary whenever the logical structure of the
database is altered.
✓ It is more difficult to achieve because the application programs are heavily dependent on the
logical structure of the database.
✓ E.g., adding or deleting attributes of a table should not affect the user’s view of the table.
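Both kinds of data independence can be illustrated with a short sketch using Python's built-in sqlite3 module. Adding an index stands in for a physical-level change and adding a column for a conceptual-level change; the table, view and column names are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.execute("CREATE VIEW student_view AS SELECT roll_no, name FROM student")

def view_rows():
    """What a user of the external view sees."""
    return conn.execute("SELECT * FROM student_view ORDER BY roll_no").fetchall()

before = view_rows()

# Physical-level change: add an index to speed up searches by name.
conn.execute("CREATE INDEX idx_student_name ON student(name)")
after_index = view_rows()

# Conceptual-level change: add a new attribute to the table.
conn.execute("ALTER TABLE student ADD COLUMN phone TEXT")
after_column = view_rows()

print(before == after_index == after_column)
```

The view returns the same rows after each change, so a program written against student_view does not need to be modified.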
➢ Different Data Models
✓ Data model is a collection of conceptual tools for describing data, data relationship, data
semantics and constraints.
✓ It helps in describing the structure of data at the logical level. It is a link between user’s view
of the world and bits stored in computer.
✓ A data model generally consists of
• Data model theory, which is a formal description of how data may be structured and used.
• Data model instance, which is a practical data model designed for a particular application.
✓ The process of applying model theory to create a data model instance is known as data
modeling.
✓ In history of database design, three models have been in use.
• Relational Data Models
• Network Data Models
• Hierarchical Data Models
✓ Relational Data Models
✓ The relational data model was
developed by E. F. Codd in 1970.
✓ Unlike the hierarchical and network
models, there are no physical links.
✓ All data is maintained in the form of
tables consisting of rows and columns.
Each column has a unique name and is
called an attribute.
✓ Each row (record) represents an entity
and a column (field) represents an
attribute of the entity.
✓ In this model, data is organized in two-dimensional tables called relations. The tables or
relation are related to each other.
✓ A row of the table represents a relationship among a set of values. As the table is a
collection of such rows (or
relationships), it has a close
relationship with the mathematical
concept of relation, from where this
model takes its name.
✓ A database may contain many relations
providing a better classification of data
based on its nature and use. Multiple
relations are then linked/ associated
together on some common key data
values (foreign key).
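Linking two relations through a common key value can be sketched with Python's built-in sqlite3 module. The student and course tables and their sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("""
    CREATE TABLE student (
        roll_no   INTEGER PRIMARY KEY,
        name      TEXT,
        course_id INTEGER REFERENCES course(course_id)  -- foreign key
    )
""")
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(1, "Computer Science"), (2, "Commerce")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", 1), (2, "Ravi", 2), (3, "Meena", 1)])

# The two relations are linked only by matching course_id values; no physical links exist.
joined = conn.execute("""
    SELECT student.name, course.title
    FROM student JOIN course ON student.course_id = course.course_id
    ORDER BY student.roll_no
""").fetchall()

print(joined)
```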
✓ Network Data Model
✓ In 1971, the Conference on Data
Systems Languages (CODASYL)
formally defined the network models.
✓ In this model, data is represented by a
collection of records and the
relationships are represented by links.
✓ Each record is a collection of fields,
each of which contains only one data
value. A link is an association between
two records.
✓ In the network model, entities are
organized in a graph, in which some
entities can be accessed through several
paths.
✓ Advantages:
o It is simple and easy to implement.
o It can handle many relationships within the organization.
o It has better data independence compared to hierarchical model.
✓ Disadvantages:
o The database structure is more complex.
o Lack of structural independence.
✓ Hierarchical Data Model
✓ In this data model, data is represented by
a collection of records and the
relationships are represented by links.
✓ Each record is a collection of fields
(attributes) each of which contains only
one data value.
✓ The Hierarchical data model organizes
data in a tree structure.
✓ In this model each entity has only one
parent but can have several children. At
the top of hierarchy there is only one
entity which is called Root node.
✓ Advantages:
o Simplicity: The relationship
between the various layers is
logically simple.
o Data Security: The data security is
provided by the DBMS.
o Data Integrity: There is always link
between the parent segment and the
child segment under it.
o Efficiency: It is very efficient
when the database contains a large number of one-to-many relationships and
when the user requires a large number of transactions.
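The tree structure of the hierarchical model can be sketched with a small Python class in which every node stores exactly one parent and any number of children. The class name Node and the sample hierarchy are assumptions made for this illustration.

```python
class Node:
    """One record in a hierarchy: a single parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # None only for the root node
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Walk up the parent links to show the single access path from the root."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " / ".join(reversed(names))

root = Node("College")                 # root node at the top of the hierarchy
science = Node("Science", root)
commerce = Node("Commerce", root)
cs = Node("Computer Science", science)

print(cs.path())
print([child.name for child in root.children])
```

Because each node has only one parent, there is exactly one path from the root to any record. In the network model a record may be reached through several paths.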
➢ Comparison of Data Models:
Characteristic: Data structure
✓ Hierarchical model:
• One-to-many or one-to-one relationships.
• Based on the parent-child relationship.
✓ Network model:
• Supports many-to-many relationships.
• A record can have many parents as well as many children.
✓ Relational model:
• One-to-one, one-to-many and many-to-many relationships.
• Based on relational data structures.
Characteristic: Data manipulation
✓ Hierarchical model:
• Does not provide an independent standalone query interface.
• Retrieval algorithms are complex and asymmetric.
✓ Network model:
• Uses CODASYL (Conference on Data Systems Languages).
• Retrieval algorithms are complex and symmetric.
✓ Relational model:
• Relational databases bring many sources into a common query language (such as SQL).
• Retrieval algorithms are simple and symmetric.
Characteristic: Data integrity
✓ Hierarchical model:
• Cannot insert the information of a child that does not have any parent.
• Multiple occurrences of child records lead to problems of inconsistency during the
update operation.
• Deletion of a parent results in deletion of its child records.
✓ Network model:
• Does not suffer from any insertion anomaly.
• Free from update anomalies.
• Free from delete anomalies.
✓ Relational model:
• Does not suffer from any insertion anomaly.
• Free from update anomalies.
• Free from delete anomalies.
➢ Basic Rules for the Relational Data model / Codd's Rules for RDBMS
Dr Edgar F. Codd, after his extensive research on the Relational Model of database systems, came
up with twelve rules of his own, which according to him, a database must obey in order to be regarded
as a true relational database.
Rule 0 (Foundation Rule): A relational database system must manage its stored data using only
its relational capabilities. This is the foundation rule, which acts as a base for all the other rules.
✓ Rule 1: Information Rule The data stored in a database, may it be user data or metadata,
must be a value of some table cell. Everything in a database must be stored in a table format.
✓ Rule 2: Guaranteed Access Rule Every single data element (value) is guaranteed to be
accessible logically with a combination of table name, primary key (row value), and attribute
name (column value). No other means, such as pointers, can be used to access data.
✓ Rule 3: Systematic Treatment of NULL Values The NULL values in a database must be
given a systematic and uniform treatment. This is a very important rule because a NULL can
be interpreted as one of the following: data is missing, data is not known, or data is not
applicable.
✓ Rule 4: Active Online Catalog The structure description of the entire database must be stored
in an online catalog, known as data dictionary, which can be accessed by authorized users.
Users can use the same query language to access the catalog which they use to access the
database itself.
✓ Rule 5: Comprehensive Data Sub-Language Rule A database can only be accessed using
a language having linear syntax that supports data definition, data manipulation, and
transaction management operations. This language can be used directly or by means of some
application. If the database allows access to data without any help of this language, then it is
considered as a violation.
✓ Rule 6: View Updating Rule All the views of a database, which can theoretically be updated,
must also be updatable by the system.
✓ Rule 7: High-Level Insert, Update, and Delete Rule A database must support high-level
insert, update, and delete operations. These must not be limited to a single row; that is, the
database must also support union, intersection and minus operations to yield sets of data records.
✓ Rule 8: Physical Data Independence The data stored in a database must be independent of
the applications that access the database. Any change in the physical structure of a database
must not have any impact on how the data is being accessed by external applications.
✓ Rule 9: Logical Data Independence The logical data in a database must be independent of
the user’s view (application). Any change in the logical data must not affect the applications
using it. For example, if two tables are merged or one is split into two different tables, there
should be no impact or change on the user application. This is one of the most difficult rules to apply.
✓ Rule 10: Integrity Independence A database must be independent of the application that
uses it. All its integrity constraints can be independently modified without the need of any
change in the application. This rule makes a database independent of the front-end application
and its interface.
✓ Rule 11: Distribution Independence The end-user must not be able to see that the data is
distributed over various locations. Users should always get the impression that the data is
located at one site only. This rule has been regarded as the foundation of distributed database
systems.
✓ Rule 12: Non-Subversion Rule If a system has an interface that provides access to low-level
records, then the interface must not be able to subvert the system and bypass security and
integrity constraints.
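Two of the rules above, Rule 3 (NULL values) and Rule 4 (active online catalog), can be observed with a short sketch using Python's built-in sqlite3 module. SQLite is used here only as a convenient example system; the student table and its sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", "98450"), (2, "Ravi", None)])   # Ravi's phone is unknown

# Rule 4: the catalog (sqlite_master in SQLite) is queried with the same language as the data.
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# Rule 3: NULL is treated uniformly. Comparing NULL with NULL gives NULL, not true,
# so missing values are found with IS NULL.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
missing_phone = conn.execute("SELECT name FROM student WHERE phone IS NULL").fetchall()

print(tables)
print(null_equals_null)   # Python shows SQL NULL as None
print(missing_phone)
```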
➢ Normalization Rule:
✓ Normalization is a step by step process of removing the different kinds of redundancy and
anomaly one step at a time from the database.
✓ Normalization is the process of organizing data in a database. This includes creating tables
and establishing relationships between those tables according to rules designed both to protect
the data and to make the database more flexible by eliminating redundancy and inconsistent
dependency.
✓ Normalization rules are divided into the following normal forms.
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
1. First Normal Form:
✓ First Normal Form is defined in the definition of relations (tables) itself. This rule defines that
all the attributes in a relation must have atomic domains. The values in an atomic domain are
indivisible units.
✓ In other words : An attribute (column) of a table cannot hold multiple values. It should hold
only atomic values.
Course Content
Programming C++, Python
Web HTML, PHP, ASP
✓ We re-arrange the relation (table) as below, to convert it to First Normal Form.
Course Content
Programming C++
Programming Python
Web HTML
Web PHP
Web ASP
✓ Each attribute must contain only a single value from its pre-defined domain.
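The conversion to First Normal Form shown above can be sketched in Python: each multi-valued Content cell is split so that every row holds one atomic value. The function name to_first_normal_form is an assumption made for this illustration.

```python
# The un-normalized relation from the notes: Content holds several values per row.
unnormalized = [
    ("Programming", "C++, Python"),
    ("Web", "HTML, PHP, ASP"),
]

def to_first_normal_form(rows):
    """Emit one (course, content) row per atomic value."""
    return [
        (course, item.strip())
        for course, content in rows
        for item in content.split(",")
    ]

first_nf = to_first_normal_form(unnormalized)
for row in first_nf:
    print(row)
```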
2. Second Normal Form:
✓ Note: Prime attribute − an attribute that is part of a candidate key.
Non-prime attribute − an attribute that is not part of any candidate key.
✓ A relation is in Second Normal Form if it is in First Normal Form and every non-prime
attribute is fully functionally dependent on the prime key attributes. That is, if X → A holds,
then there should not be any proper subset Y of X for which Y → A also holds.
✓ Students_project table
Stu-Id Proj-Id Stu-Name Proj-Name
✓ We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.
According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent
upon both and not on any of the prime key attribute individually. But we find that Stu_Name
can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This
is called partial dependency, which is not allowed in Second Normal Form.
✓ Students table
Stu-Id Stu-Name Proj-Id
✓ Project table
Proj-Id Proj-Name
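A functional dependency X → A can be tested on sample data with a few lines of Python: the dependency holds if rows that agree on X always agree on A. The function name holds and the sample rows below are assumptions made for this sketch. A test on sample data can only show that a dependency is violated or is consistent with the data; it cannot prove a design rule.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first value seen for this key; any later mismatch breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical rows for the Student_Project relation, with key (Stu_ID, Proj_ID).
student_project = [
    {"Stu_ID": 1, "Proj_ID": "P1", "Stu_Name": "Asha", "Proj_Name": "Library"},
    {"Stu_ID": 1, "Proj_ID": "P2", "Stu_Name": "Asha", "Proj_Name": "Payroll"},
    {"Stu_ID": 2, "Proj_ID": "P1", "Stu_Name": "Ravi", "Proj_Name": "Library"},
]

# Partial dependencies: each depends on only one part of the key.
partial_name = holds(student_project, ["Stu_ID"], ["Stu_Name"])
partial_project = holds(student_project, ["Proj_ID"], ["Proj_Name"])
# Not a dependency: one student works on projects with different names.
unrelated = holds(student_project, ["Stu_ID"], ["Proj_Name"])

print(partial_name, partial_project, unrelated)
```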
3. Third Normal Form:
✓ For a relation to be in Third Normal Form, it must be in Second Normal Form and must
satisfy the following −
• No non-prime attribute is transitively dependent on prime key attribute.
• For any non-trivial functional dependency, X → A, then either −
o X is a super-key or,
o A is prime attribute
✓ Student_Detail
Stu-Id Stu-Name City Zip
✓ We find that in the above Student_detail relation, Stu_ID is the key and the only prime
attribute. We find that City can be identified by Stu_ID as well as by Zip itself. Zip is not a
super-key, nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists a
transitive dependency.
✓ To bring this relation into Third Normal Form, we break the relation into two relations as follows:
✓ Student_detail
Stu-Id Stu-Name Zip
✓ Zip Codes
Zip City
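The split into Student_detail and Zip Codes can be sketched in Python. The sample rows are assumptions made for this illustration; the point is that City is stored once per Zip and the original relation can still be rebuilt.

```python
# Hypothetical rows of the original relation: (Stu_ID, Stu_Name, City, Zip).
student_detail = [
    (1, "Asha", "Bengaluru", "560001"),
    (2, "Ravi", "Mysuru", "570001"),
    (3, "Meena", "Bengaluru", "560001"),
]

# Student_detail after the split: City is removed, Zip stays.
students = [(stu_id, name, zip_code) for stu_id, name, _city, zip_code in student_detail]

# Zip Codes: one row per Zip, so each City is stored only once per Zip.
zip_codes = sorted({(zip_code, city) for _id, _name, city, zip_code in student_detail})

# Joining the two relations on Zip rebuilds the original rows, so no information is lost.
city_of = dict(zip_codes)
rejoined = [(stu_id, name, city_of[z], z) for stu_id, name, z in students]

print(students)
print(zip_codes)
print(rejoined == student_detail)
```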
➢ Entity-Relationship Diagram / ER Diagram
✓ ER-Diagram is a visual representation of data that describes how data
is related to each other.
• Entity:
o An Entity can be any object, place, person or class.
o In E-R Diagram, an entity is represented using rectangles.
• Attribute:
o An Attribute describes a property or characteristic of an entity.
o Attributes are represented by means of ellipses.
o Every ellipse represents one attribute and is directly connected to its entity (rectangle).
o For example, Roll_No, Name and Birth date can be attributes of a student.
• Relationship:
o A relationship type is a meaningful association between entity
types; it describes how entities are related.
o A relationship is represented on the E-R diagram by a diamond-shaped box,
connected to its entities by lines.
o There are three types of relationships that exist between entities.
• Binary Relationship
• Recursive Relationship
• Ternary Relationship
✓ Binary Relationship:
o A binary relationship is a relationship between two entities. It is further divided into three types.
1. One to One:
o This type of relationship is rarely seen in the real world.
o Model
o Example
o The above example describes that one student can enroll for only one course and a course
will also have only one student. This is not what you will usually see in a relationship.
2. One to Many:
o It reflects a business rule in which one instance of an entity is associated with many instances of another entity.
o For example, a Student enrolls for only one Course but a Course can have many Students.
o Model
o Example
o The arrows in the diagram describe that one student can enroll for only one course.
3. Many to Many:
o Model
o Example
o The above diagram represents that a student can enroll for more than one course and a course can have many students.
➢ Generalization
✓ In generalization, a number of entities are brought
together into one generalized entity based on their
similar characteristics.
✓ It is a bottom-up approach in which two or more lower-level
entities combine to form a higher-level entity.
✓ In generalization, the higher-level entity can also
combine with other lower-level entities to form a still
higher-level entity.
✓ For example, pigeon, house sparrow, crow and dove can all be generalized as Birds.
➢ Specialization
✓ Specialization is the opposite of generalization.
✓ In specialization, a group of entities is divided into sub-
groups based on their characteristics.
✓ It is a top-down approach in which one higher-level entity
can be broken down into two or more lower-level entities.
✓ In specialization, some higher-level entities may not have
lower-level entity sets at all.
✓ Take a group ‘Person’ for example. A person has a name,
date of birth, gender, etc.
✓ Similarly, in a school database, persons can be specialized
as teacher, student, or staff, based on the role they play in the school.
➢ The Relational Model
✓ The Relational Model was proposed in 1970 by E. F. Codd of IBM.
✓ It is the dominant model for commercial data processing applications. Nearly all databases
are based on this model.
✓ Let us explore this model in detail.
1. Terminology
2. Views
3. Structure of Relational Databases
• Keys
4. The Relational Algebra
• The Select Operation
• The Project Operation
• The Cartesian Product Operation
• The Union Operation
• The Set Difference Operation
• The Set Intersection Operation
1. Terminology:
The different terms used in the relational model are discussed here.
✓ Relation: A relation may be thought of as a set of rows with several columns. A
relation has the following properties:
o Each row represents a real-world entity or relationship.
o All values in a particular column are of the same kind.
o Order of columns is immaterial.
o Each row is distinct.
o Order of rows is immaterial.
o For a row, each column must have an atomic (indivisible) value.
o For a row, a column cannot have more than one value.
✓ Domain: A domain is a pool of values from which the actual values present in a given
column are taken.
✓ Tuple: This is the horizontal part of the relation. One row represents one record of the
relation. The rows of a relation are also called tuples.
✓ Attribute: The columns of a table are also called attributes. A column is the vertical
part of the relation.
✓ Degree: The number of attributes (columns) in a relation determines the degree of the relation.
✓ Cardinality: It is the number of rows (or tuples) in a table.
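✓ The terms above can be tried out on a small sample relation in Python. The sketch below is illustrative only; the attribute names and rows are sample values chosen for the example.

```python
# Sample relation: attribute names and tuples (illustrative values)
attributes = ("Rollno", "Name", "Age")
rows = [
    (2, "Manya", 15),
    (10, "Rishabh", 15),
    (13, "Kush", 15),
    (13, "Kush", 15),   # repeated row; a relation keeps each row only once
]

relation = set(rows)          # a set stores each tuple once, in no fixed order
degree = len(attributes)      # number of attributes (columns)
cardinality = len(relation)   # number of distinct tuples (rows)

# Domain check: every Age value must come from the pool of whole numbers
ages_ok = all(isinstance(age, int) for _, _, age in relation)

print(degree, cardinality, ages_ok)   # 3 3 True
```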
2. Views:
✓ A view is a pseudo-table or virtual table. It displays data that is derived from one
or more base tables.
✓ The contents of a view are taken from other tables depending upon a
given query condition.
✓ No stored file is created to hold the contents of a view; only its definition is stored.
✓ The usefulness of views lies in the fact that they provide an excellent way to give people
access to some but not all of the information in a table.
✓ Syntax:
CREATE VIEW <view name> AS SELECT <attribute list> FROM <table(s)>
WHERE <condition(s)>;
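✓ As a hedged illustration of this syntax, the following Python sketch uses the standard sqlite3 module with an in-memory database. The table Items, its column names and the view name CheapItems are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary database, nothing is saved to disk
conn.execute("CREATE TABLE Items (ItemNo INTEGER PRIMARY KEY, ItemName TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [(11, "Milk", 15.0), (12, "Cake", 5.0), (13, "Bread", 9.0)],
)

# The view stores only this definition, not a copy of the rows
conn.execute(
    "CREATE VIEW CheapItems AS SELECT ItemName, Price FROM Items WHERE Price < 10"
)

query = "SELECT ItemName FROM CheapItems ORDER BY ItemName"
before = conn.execute(query).fetchall()

# A new row in the base table shows up in the view automatically
conn.execute("INSERT INTO Items VALUES (15, 'Cold Drink', 8.0)")
after = conn.execute(query).fetchall()

print(before)   # [('Bread',), ('Cake',)]
print(after)    # [('Bread',), ('Cake',), ('Cold Drink',)]
```

✓ The view hides the ItemNo column and the costlier items, so a user who is given only the view sees some but not all of the information in the table.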
3. Structure of Relational Database:
✓ Keys:
Keys are used to distinguish rows from one another in terms of their attributes.
o Primary Key
▪ It is a column (or columns) in a table that uniquely identifies each row.
▪ A primary key value is unique and cannot be null.
▪ There is only one primary key for a table.
▪ Ex: In Relation STUDENT, Regno serves as a primary key.
o Candidate key
▪ It is a column (or columns) that uniquely identifies rows in a table.
▪ Any of the identified candidate keys can be used as the table's primary key.
o Alternate key
▪ Any candidate key that is not currently selected as the primary key is called an
alternate key.
▪ This is also known as a secondary key.
o Foreign key
▪ It is a column (or a set of columns) that refers to the primary key in another table
i.e. it is used as a link to a matching column in another table.
▪ OR A key used to link two tables together is called a foreign key.
▪ This is sometimes called a referencing key.
▪ A foreign key is a field that matches the primary key column of another table.
o To understand the concept of keys, let’s take an example. Suppose that in a school class there
are four students who are eligible to be monitor. All these students are like
candidate keys. The student who is selected as monitor is like the primary key.
o Now suppose the student who became monitor is not available in class, so the class
teacher chooses another student from the remaining three eligible students to take on the
responsibility of monitor. Each of these remaining students is like an alternate key.
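✓ The behaviour of these keys can be observed with Python's standard sqlite3 module. The sketch below is illustrative: STUDENT and Regno follow the example above, while the Email column (an alternate key) and the MARKS table (holding a foreign key) are hypothetical additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled

conn.execute("""CREATE TABLE STUDENT (
    Regno INTEGER PRIMARY KEY,       -- candidate key chosen as primary key
    Email TEXT NOT NULL UNIQUE,      -- candidate key left over: alternate key
    Name  TEXT)""")
conn.execute("""CREATE TABLE MARKS (
    Regno   INTEGER REFERENCES STUDENT(Regno),   -- foreign key
    Subject TEXT,
    Score   INTEGER)""")
conn.execute("INSERT INTO STUDENT VALUES (1, 'asha@example.com', 'Asha')")

def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

dup_primary = rejected("INSERT INTO STUDENT VALUES (1, 'ravi@example.com', 'Ravi')")
dup_alternate = rejected("INSERT INTO STUDENT VALUES (2, 'asha@example.com', 'Ravi')")
bad_foreign = rejected("INSERT INTO MARKS VALUES (99, 'Maths', 80)")
good_foreign = not rejected("INSERT INTO MARKS VALUES (1, 'Maths', 80)")

print(dup_primary, dup_alternate, bad_foreign, good_foreign)   # True True True True
```

✓ A repeated Regno or Email is refused because both columns identify rows uniquely, and a MARKS row is accepted only when its Regno matches a row in STUDENT.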
4. The Relational Algebra:
✓ The relational algebra is a collection of operations on relations.
✓ The relational algebraic operations can be divided into:
• Basic set-oriented operations: Union, Set difference, Cartesian product
• Relational-oriented operations: Selection, Projection, Division, Joins
(A) The Select Operation:
✓ The select operation selects tuples from a relation that satisfy a given condition.
✓ The selection is denoted by lowercase Greek letter σ (sigma).
✓ Suppose we have a table named Items as shown in Fig. (a). To select those
tuples from the Items relation where the price is more than 9.00, we shall write
σ price > 9.00 (Items)
✓ That means: from table Items, select the tuples satisfying the condition price > 9.00. The
relation that results from the above query is shown in Fig. (b).
Fig. (a): Items
Item#   Item-Name    Price
11      Milk         15.00
12      Cake          5.00
13      Bread         9.00
14      Ice Cream    14.00
15      Cold Drink    8.00
Fig. (b): Result of σ price > 9.00 (Items)
Item#   Item-Name    Price
11      Milk         15.00
14      Ice Cream    14.00
✓ In a selection condition, all the relational operators (=, ≠, <, ≤, >, ≥) may be used.
✓ More than one condition may be combined using the connectives and (denoted by ∧)
and or (denoted by ∨).
✓ To find those tuples pertaining to prices between 5.00 and 9.00 (both inclusive) from relation
Items, we shall write σ price ≥ 5.00 ∧ price ≤ 9.00 (Items)
✓ The result of this query is as shown in this table:
Item#   Item-Name    Price
12      Cake          5.00
13      Bread         9.00
15      Cold Drink    8.00
✓ NOTE: Here, in the select operation, ‘σ’ is used only to show how the operation is technically
done. In practice, the SELECT command is used to perform the select operation instead of ‘σ’,
which we will see in later slides.
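✓ The select operation can be imitated in a few lines of Python. This is a sketch under assumptions: the helper name select is hypothetical, and the second query uses a combined condition with inclusive bounds (5.00 ≤ price ≤ 9.00) chosen for this example.

```python
# Items relation as (Item#, Item-Name, Price) tuples
items = [
    (11, "Milk", 15.00),
    (12, "Cake", 5.00),
    (13, "Bread", 9.00),
    (14, "Ice Cream", 14.00),
    (15, "Cold Drink", 8.00),
]

def select(relation, condition):
    """Keep only the tuples for which the condition is true (a horizontal subset)."""
    return [row for row in relation if condition(row)]

# sigma price > 9.00 (Items)
costly = select(items, lambda row: row[2] > 9.00)

# Two conditions joined with "and": 5.00 <= price <= 9.00
mid_range = select(items, lambda row: row[2] >= 5.00 and row[2] <= 9.00)

print([name for _, name, _ in costly])      # ['Milk', 'Ice Cream']
print([name for _, name, _ in mid_range])   # ['Cake', 'Bread', 'Cold Drink']
```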
(B) The Project Operation:
✓ The Project Operation yields a “vertical” subset of a given relation in contrast to
the “horizontal” subset returned by select operation.
✓ The projection lets you select specified attributes in a specified order, and
duplicate tuples are automatically removed.
✓ Projection is denoted by Greek letter pi(π).
✓ Suppose we have table Suppliers as in Fig. (a). To project supplier names and
their cities from the relation Suppliers, we shall write
π Supp-Name, City (Suppliers)
✓ The relation resulting from this query is as shown in Fig.-(b).
Fig. (a): Suppliers
Supp#  Supp-Name     Status  City
S1     Britannia     10      Delhi
S2     New Bakers    30      Mumbai
S3     Mother Dairy  10      Delhi
S4     Cookz         50      Bangalore
S5     Haldiram      40      Jaipur
Fig. (b): Result of π Supp-Name, City (Suppliers)
Supp-Name     City
Britannia     Delhi
New Bakers    Mumbai
Mother Dairy  Delhi
Cookz         Bangalore
Haldiram      Jaipur
✓ Duplicate tuples are automatically removed in the resulting relation.
For instance, if you write
π City (Suppliers)
✓ The resulting relation will be as follows:
City
Delhi
Mumbai
Bangalore
Jaipur
✓ Project operation can also be applied on a resulting relation of a query.
✓ Consider the table Items given in the previous slide. If we want only the
names of those items that are costlier than Rs. 9.00, we may write it as
π Item-Name (σ Price > 9.00 (Items))
✓ First the inner query is evaluated and then the outer query is evaluated.
✓ The result of the above query is as follows:
Item-Name
Milk
Ice Cream
✓ NOTE: Here, in the project operation, ‘π’ is used only to show how the operation is technically
done. In practice, the SELECT command is used to perform the project operation instead of ‘π’,
which we will see in later slides.
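✓ The project operation can be sketched in Python in the same spirit. The helper name project is hypothetical; it picks the wanted columns and drops duplicate tuples while keeping the order in which they first appear.

```python
columns = ("Supp#", "Supp-Name", "Status", "City")
suppliers = [
    ("S1", "Britannia", 10, "Delhi"),
    ("S2", "New Bakers", 30, "Mumbai"),
    ("S3", "Mother Dairy", 10, "Delhi"),
    ("S4", "Cookz", 50, "Bangalore"),
    ("S5", "Haldiram", 40, "Jaipur"),
]

def project(relation, columns, wanted):
    """Keep only the wanted columns (a vertical subset) and remove duplicates."""
    positions = [columns.index(name) for name in wanted]
    result = []
    for row in relation:
        picked = tuple(row[i] for i in positions)
        if picked not in result:      # duplicate tuples are dropped
            result.append(picked)
    return result

# pi Supp-Name, City (Suppliers)
names_and_cities = project(suppliers, columns, ["Supp-Name", "City"])

# pi City (Suppliers): Delhi occurs twice in Suppliers but once in the result
cities = project(suppliers, columns, ["City"])

print(len(names_and_cities))   # 5
print(cities)                  # [('Delhi',), ('Mumbai',), ('Bangalore',), ('Jaipur',)]
```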
(C) The Cartesian Product Operation:
✓ The Cartesian product is a binary operation and is denoted by a cross (×).
✓ The Cartesian product of two relations A and B is written as A × B.
✓ The Cartesian product of two relations yields a relation with all possible combinations of the
tuples of the two relations operated upon.
✓ All tuples of the first relation are concatenated with all the tuples of the second relation to form the
tuples of the new relation.
✓ Suppose we have two relations, Student and Instructor, as follows:
Student
Stud#  Stud-Name  Hosteler
S001   Meenakshi  Y
S002   Radhika    N
S003   Abhinav    N
Instructor
Inst#  Inst-Name   Subject
101    K. Lal      English
102    R.L. Arora  Maths
✓ The Cartesian product of these two relations, Student × Instructor, will yield a relation that
will have a degree of 6 (3 + 3: sum of the degrees of Student and Instructor) and a cardinality of 6
(3 × 2: product of the cardinalities of the two relations).
✓ The resulting relation (Student × Instructor) is as follows:
Stud# Stud-Name Hosteler Inst# Inst-Name Subject
S001 Meenakshi Y 101 K. Lal English
S001 Meenakshi Y 102 R.L. Arora Maths
S002 Radhika N 101 K. Lal English
S002 Radhika N 102 R.L. Arora Maths
S003 Abhinav N 101 K. Lal English
S003 Abhinav N 102 R.L. Arora Maths
✓ Notice that the resulting relation contains all possible combinations of tuples of the two relations.
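✓ The same product can be computed in Python with itertools.product from the standard library. This is a small sketch; the variable names are chosen for the example. The degree of the result is the sum of the two degrees and its cardinality is the product of the two cardinalities.

```python
from itertools import product

student = [
    ("S001", "Meenakshi", "Y"),
    ("S002", "Radhika", "N"),
    ("S003", "Abhinav", "N"),
]
instructor = [
    (101, "K. Lal", "English"),
    (102, "R.L. Arora", "Maths"),
]

# Every Student tuple is concatenated with every Instructor tuple
cross = [s + i for s, i in product(student, instructor)]

degree = len(cross[0])     # 3 + 3 columns
cardinality = len(cross)   # 3 x 2 rows

print(degree, cardinality)   # 6 6
print(cross[0])              # ('S001', 'Meenakshi', 'Y', 101, 'K. Lal', 'English')
```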
(D) The Union Operation:
✓ The union operation requires two relations and produces a third relation that contains tuples
from both the operand relations.
✓ The union operation is denoted by ∪. Thus, to denote the union of two relations X and Y,
we will write X ∪ Y.
✓ For a union operation A ∪ B to be valid, the following two conditions must be satisfied by
the two operands A and B:
• The relations A and B must be of the same degree. That is, they must have the same
number of attributes.
• The domains of the ith attribute of A and the ith attribute of B must be the same.
✓ Suppose we have two relations, Drama and Song, as follows:
Drama
Rollno  Name     Age
13      Kush     15
17      Swati    14
Song
Rollno  Name     Age
2       Manya    15
10      Rishabh  15
13      Kush     15
✓ The result of Song ∪ Drama will be as follows:
Rollno Name Age
2 Manya 15
10 Rishabh 15
13 Kush 15
17 Swati 14
✓ Notice that one duplicate tuple (13, Kush, 15) has been automatically removed.
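✓ Python sets give a quick way to try out the union. In the sketch below the helpers signature and union are hypothetical names; the check compares the number and the types of the columns, which is only a rough stand-in for comparing degrees and domains.

```python
drama = {(13, "Kush", 15), (17, "Swati", 14)}
song = {(2, "Manya", 15), (10, "Rishabh", 15), (13, "Kush", 15)}

def signature(relation):
    """Return the column-type patterns found in a relation, e.g. {(int, str, int)}."""
    return {tuple(type(value) for value in row) for row in relation}

def union(a, b):
    """Union of two relations that have matching degrees and column types."""
    if signature(a) != signature(b):
        raise ValueError("relations are not union-compatible")
    return a | b   # the set union keeps each tuple only once

result = sorted(union(song, drama))
print(result)
# [(2, 'Manya', 15), (10, 'Rishabh', 15), (13, 'Kush', 15), (17, 'Swati', 14)]
```

✓ Calling union with a relation of a different degree, for example {(1, "x")}, raises ValueError instead of returning a mixed result.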
(E) The Set Difference Operation:
✓ The set difference operation denoted by – (minus) allows us to find tuples that are in one
relation but not in another.
✓ The expression A – B results in a relation containing those tuples in A but not in B.
✓ Suppose we have two relations, Drama and Song, as given in the previous slide.
✓ The result of Song – Drama will be as follows:
Rollno Name Age
2 Manya 15
10 Rishabh 15
(F) The Set Intersection Operation:
✓ The set intersection operation finds tuples that are common to the two operands relations.
✓ The set intersection operation is denoted by ∩. That means A ∩ B will yield a relation
having tuples common to A and B.
✓ Suppose we have two relations, Drama and Song, as given in the previous slide.
✓ The result of Song ∩ Drama will be as follows:
Rollno Name Age
13 Kush 15
✓ NOTE: Any relational algebra expression using set
intersection can be rewritten by replacing the intersection operation
with a pair of set difference operations as:
A ∩ B = A – (A – B)
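✓ Set difference, set intersection and the identity in the note can be checked with Python sets. This is a short sketch using the Drama and Song tuples; the variable names are chosen for the example.

```python
drama = {(13, "Kush", 15), (17, "Swati", 14)}
song = {(2, "Manya", 15), (10, "Rishabh", 15), (13, "Kush", 15)}

difference = song - drama              # tuples in Song but not in Drama
intersection = song & drama            # tuples common to Song and Drama
via_difference = song - (song - drama) # A - (A - B), using two set differences

print(sorted(difference))              # [(2, 'Manya', 15), (10, 'Rishabh', 15)]
print(intersection)                    # {(13, 'Kush', 15)}
print(via_difference == intersection)  # True
```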
➢ Comparing Relational Algebra and Structured Query Language
RA (Relational Algebra)                                      SQL (Structured Query Language)
Is closed (the result of every expression is a relation)     Is a superset of relational algebra
Simple semantics                                             Complicated semantics
Is used for reasoning, query optimization, etc.              Is an end-user language
➢ Data warehouse
✓ A data warehouse is a repository of an organization's electronically stored data.
✓ Data warehouses are designed to facilitate reporting and support data analysis.
✓ The concept of data warehouses was introduced in the late 1980s.
✓ The components of a data warehouse are:
• Data Source
• Data Transformation
• Reporting
• Metadata
✓ Additional components are dependent data marts, logical data marts and the operational data
store.
✓ Advantages of data warehouses:
• Enhances end-user access to reports and analysis of information.
• Increases data consistency.
• Increases productivity and decreases computing costs.
• Combines data from different sources in one place.
• Data warehouses provide an infrastructure that could support changes to data and
replication of the changed data back into the operational systems.
✓ Disadvantages:
• Extracting, cleaning and loading data can be time consuming.
• Data warehouses can get outdated relatively quickly.
• There can be problems of compatibility with systems already in place.
• End-users need to be trained.
• Security could develop into a serious issue, especially if the data warehouse is internet
accessible.
• A data warehouse is usually not static and maintenance costs are high.
➢ Data Mining
✓ Data mining is concerned with the analysis of data and picking out relevant information.
✓ It is the computer, which is responsible for finding the patterns by identifying the underlying
rules of the features in the data