Syllabus 1.Introduction 2.ER Model 3.Relational Model 4.SQL 5.Integrity and Security 6.Relational...

Post on 02-Jan-2016



Syllabus

1. Introduction

2. ER Model

3. Relational Model

4. SQL

5. Integrity and Security

6. Relational Database Design

7. File Structure, Indexing and Hashing

8. Transaction

9. Concurrency control

10. Recovery System

Chapter 1: Introduction

• What is a Database?
• Purpose of Database Systems
• View of Data
• Data Models
• Data Definition Language
• Data Manipulation Language
• Transaction Management
• Storage Management
• Database Administrator
• Database Users
• Overall System Structure

Definition of Database

• A shared collection of logically related data, designed to meet the information needs of multiple users in an organization.

• The term database is often erroneously used as a synonym for “database management system (DBMS)”.

Contd…

• A collection of data: part numbers, product codes, customer information, etc.

It usually refers to data organized and stored on a computer that can be searched and retrieved by a computer program.

 

• A data structure that stores data together with metadata, i.e., data about data. More generally, an organized collection of information.

 

• A collection of information organized and presented to serve a specific purpose (a telephone book is a common example). A computerized database is an up-to-date, organized file of machine-readable information that can be rapidly searched and retrieved by computer.

 

• An organized collection of information in computerized format.

 

• A collection of related information about a subject organized in a useful manner that provides a base or foundation for procedures such as retrieving information, drawing conclusions, and making decisions.

Example

Thing | Data (facts/figures)
Cricket player | country, name, date of birth, specialty, matches played, runs, etc.
Scholars | name, date of birth, age, country, field, books published, etc.
Food | name, ingredients, taste, preferred time, origin, etc.
Vehicle | registration number, make, owner, type, price, etc.

Purpose / Need of Database Systems

Let us discuss an example

• We might start by building a file with the following structure:

Example: Personal Calendar

What | Day | When | Who | Where
Lunch | 10/24 | 1pm | Rick | Joe’s Diner
CS123 | 10/25 | 9am | Dr. Egghead | Morris234
Biking | 10/26 | 9am | Jane | Jane’s house
Dinner | 10/26 | 6PM | Jane | Café Le Boeuf

• This text file is easy to deal with, so there's no need for a DBMS!

Problem 1: Data Organization

• Consider the all-important “who” field. Do we also want to keep e-mail addresses, telephone numbers, etc.?

• Expand our file to look like:

What | When | Who-name | Who-email | Who-tel | … | Where | …

• Now we are keeping our address book in our calendar, and doing so redundantly.

“Link” Calendar with Address Book?

• Two conceptual “entities” -- contact information and calendar -- with a relationship between them, linking people in the calendar to their contact information.

• This link could be based on something as simple as the person's name.
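Here is a minimal sketch of that link. It uses Python's built-in sqlite3 module as a stand-in DBMS. The table and column names (contacts, appointments, who, email, phone) and the sample addresses are hypothetical. Each contact's details are stored once, and a join follows the name link only when a query needs it.

```python
import sqlite3

# In-memory database; the schema is a hypothetical illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (
    name  TEXT PRIMARY KEY,   -- the person's name is the link
    email TEXT,
    phone TEXT
);
CREATE TABLE appointments (
    what  TEXT,
    day   TEXT,
    time  TEXT,
    who   TEXT REFERENCES contacts(name),
    place TEXT
);
""")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", [
    ("Rick", "rick@example.com", "555-0101"),
    ("Jane", "jane@example.com", "555-0102"),
])
conn.executemany("INSERT INTO appointments VALUES (?, ?, ?, ?, ?)", [
    ("Lunch", "10/24", "1pm", "Rick", "Joe's Diner"),
    ("Biking", "10/26", "9am", "Jane", "Jane's house"),
    ("Dinner", "10/26", "6PM", "Jane", "Café Le Boeuf"),
])

# Jane's e-mail is stored once; the join looks it up for each appointment.
rows = conn.execute("""
    SELECT a.what, a.day, c.email
    FROM appointments AS a JOIN contacts AS c ON a.who = c.name
    ORDER BY a.day, a.what
""").fetchall()
for row in rows:
    print(row)
```

If Jane changes her e-mail address, only one row in contacts needs updating.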

Problem 2: Efficiency

• The size of a personal address book is probably less than one hundred entries, but there are things we'd like to do quickly and efficiently:
  – “Give me all appointments on 10/28”
  – “When am I next meeting Jim?”

• “Program” these as quickly as possible.

• Have these programs executed efficiently.

• What would happen if you were using a “corporate” calendar with hundreds of thousands of entries?
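The two requests above can be written as declarative queries. This sketch again assumes Python's sqlite3 module. The schema, the index name idx_appt_day, the sample rows and the choice of 10/25 as "today" are all illustrative assumptions. Dates are kept as MM/DD strings to match the calendar, so they sort correctly only within a single year. The final EXPLAIN QUERY PLAN call asks SQLite whether it will use the index instead of scanning every row. That difference matters once the calendar holds hundreds of thousands of entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointments (what TEXT, day TEXT, time TEXT, who TEXT)")
conn.executemany("INSERT INTO appointments VALUES (?, ?, ?, ?)", [
    ("Lunch", "10/24", "1pm", "Rick"),
    ("Review", "10/28", "10am", "Jim"),
    ("Coffee", "10/28", "3pm", "Jane"),
    ("Squash", "11/02", "5pm", "Jim"),
])
# An index on day lets the DBMS find one day's rows without reading them all.
conn.execute("CREATE INDEX idx_appt_day ON appointments(day)")

# "Give me all appointments on 10/28"
on_day = conn.execute(
    "SELECT what, time, who FROM appointments WHERE day = ? ORDER BY what",
    ("10/28",)).fetchall()

# "When am I next meeting Jim?" (assumes today is 10/25)
next_jim = conn.execute(
    "SELECT day, time FROM appointments WHERE who = ? AND day >= ? "
    "ORDER BY day LIMIT 1", ("Jim", "10/25")).fetchone()

# Ask SQLite how it plans to answer the first query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM appointments WHERE day = ?",
    ("10/28",)).fetchall()

print(on_day)    # [('Coffee', '3pm', 'Jane'), ('Review', '10am', 'Jim')]
print(next_jim)  # ('10/28', '10am')
print(plan)
```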

Problem 3. Concurrency and Reliability

• Suppose other people are allowed to access your calendar and to modify it. How do we stop two people from changing the file at the same time and leaving it in a physical (or logical) mess?

• Suppose the system crashes while we are changing the calendar. How do we recover our work?

Example

Suppose a manager schedules a meeting with his staff today at 3:00pm and, at the same time, his secretary schedules him to meet with the Chairman. They both see that the slot is open, but presumably only one of the two meetings will show on the calendar later.
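A DBMS can refuse the second booking, rather than letting it silently replace the first, by enforcing a uniqueness constraint on the time slot. This sketch assumes Python's sqlite3 module; the calendar table and the book() helper are hypothetical. A constraint alone is not full concurrency control (that comes later, under transactions). It does show the DBMS, rather than each user, enforcing the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per booked slot in the manager's calendar.
# UNIQUE (day, time) tells the DBMS that a slot holds at most one meeting.
conn.execute("""
    CREATE TABLE calendar (
        day  TEXT,
        time TEXT,
        what TEXT,
        UNIQUE (day, time)
    )""")

def book(conn, day, time, what):
    """Try to book a slot; return True on success, False if it is taken."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute("INSERT INTO calendar VALUES (?, ?, ?)", (day, time, what))
        return True
    except sqlite3.IntegrityError:
        return False

by_manager = book(conn, "today", "3:00pm", "Staff meeting")
by_secretary = book(conn, "today", "3:00pm", "Meet the Chairman")
print(by_manager, by_secretary)                          # True False
print(conn.execute("SELECT what FROM calendar").fetchall())  # [('Staff meeting',)]
```

The secretary gets an explicit failure she can act on, instead of an overwritten entry nobody notices.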

What is a DBMS?

• A database (DB) is a large, integrated collection of data.

• A DB models a real-world enterprise; it contains information about a particular enterprise.

• A database management system (DBMS) is a software package designed to store and manage databases / set of programs to access the data.

• DBMS provides an environment that is simultaneously convenient, secure and efficient to use.

• A DBMS is the software or tool that is used to manage the database and its users.

• A DBMS consists of different components or subsystems.

• Each subsystem or component of the DBMS performs different function(s).

• So a DBMS is a collection of different programs, but they all work jointly to manage the data stored in the database and its users.

• A database is a collection of data, a DBMS is the tool that manages this data, and together they are called a database system.

What the DBMS is about

• Organization of data
• Efficient retrieval of data
• Reliable storage of data
• Maintaining consistent data
• All these topics are interrelated.

Drawbacks of file systems

• In the early days, database applications were built directly on top of file systems.

• Drawbacks of using file systems to store data:
  – Data redundancy and inconsistency
    • Multiple file formats, duplication of information in different files
  – Difficulty in accessing data
    • Need to write a new program to carry out each new task
  – Data isolation: multiple files and formats

Drawbacks of file systems (Cont.)

– Integrity problems
  • Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly
  • Hard to add new constraints or change existing ones

– Atomicity of updates
  • Failures may leave the database in an inconsistent state, with partial updates carried out
  • Example: a transfer of funds from one account to another should either complete or not happen at all
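A short sketch shows how a DBMS addresses both drawbacks. It assumes Python's sqlite3 module and a hypothetical account table. The balance rule from the slide is declared once, as a CHECK constraint. The transfer() helper runs both updates inside one transaction, so a failure in the second update undoes the first. transfer() and balances() are illustrative helpers, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity constraint lives in the schema, not in application code.
conn.execute("""
    CREATE TABLE account (
        id      TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance > 0)
    )""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction: both updates or neither."""
    try:
        with conn:  # commit if both statements succeed, otherwise roll back
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))

ok = transfer(conn, "A", "B", 30)
after_ok = balances(conn)          # {'A': 70, 'B': 80}
failed = transfer(conn, "A", "B", 500)
after_failed = balances(conn)      # still {'A': 70, 'B': 80}: the credit to B was undone
print(ok, after_ok, failed, after_failed)
```

The credit is applied before the debit on purpose. When the debit violates the constraint, the rollback removes a partial update that was already made.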

Drawbacks of file systems (Cont.)

• Concurrent access by multiple users
  • Concurrent access is needed for performance
  • Uncontrolled concurrent accesses can lead to inconsistencies
    – Example: two people reading a balance and updating it at the same time

• Security problems
  • Hard to provide user access to some, but not all, data

• Database systems offer solutions to all the above problems
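The problem of two people updating a balance at the same time is usually called a lost update. This plain-Python sketch first replays the bad interleaving step by step, so the result is deterministic. It then applies the same two updates through a lock that serializes each read-modify-write. The lock is only a minimal stand-in for a DBMS's concurrency control, not how any particular DBMS implements it. The Account class and the amounts are hypothetical.

```python
import threading

def unsafe_interleaving(start):
    """Replay the bad schedule by hand: both users read, then both write."""
    balance = start
    alice_reads = balance          # Alice reads the balance
    bob_reads = balance            # Bob reads the same balance
    balance = alice_reads + 50     # Alice writes back her deposit
    balance = bob_reads - 30       # Bob overwrites it: Alice's update is lost
    return balance

class Account:
    """Serializes read-modify-write with a lock (a stand-in for concurrency control)."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def apply(self, delta):
        with self._lock:           # only one read-modify-write at a time
            current = self.balance
            self.balance = current + delta

acct = Account(100)
workers = [threading.Thread(target=acct.apply, args=(d,)) for d in (50, -30)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(unsafe_interleaving(100))  # 70: the +50 deposit vanished
print(acct.balance)              # 120: both updates applied
```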

Database Applications

– Banking: all business transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Manufacturing: production, inventory, orders, supply chain
– Human resources: employee records, salaries, tax deductions

Data and Information

• Data is the collection of raw facts collected from a specific environment for a specific purpose.

• Data by itself does not reveal anything about its environment.

• So, to get the desired results from the data, we transform it into information by processing it.

• Once data has been processed using different methods, it is in a meaningful form; that form of the data is called information.
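Here is a small illustration of turning data into information. It uses a made-up list of one cricket player's runs per innings. The raw numbers say little on their own. A few summary figures computed from them can support a decision such as team selection.

```python
from statistics import mean

# Raw data: runs scored by one player in each innings (made-up figures).
runs_per_innings = [45, 12, 103, 0, 67]

# Processing turns the raw facts into information someone can act on.
information = {
    "innings": len(runs_per_innings),
    "total_runs": sum(runs_per_innings),
    "mean_runs_per_innings": mean(runs_per_innings),
    "centuries": sum(1 for r in runs_per_innings if r >= 100),
}
print(information)
```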

Levels of Abstraction

Since many DBMS users are not computer-trained, developers hide the complexity from users through levels of abstraction, to simplify users' interactions with the system.

• Physical level: the lowest level; describes how a record (e.g., customer) is stored.

• Logical level: the next higher level; describes what data is stored in the database, and the relationships among the data.

    type customer = record
        customer_id : string;
        customer_name : string;
        customer_street : string;
        customer_city : string;
    end;

• View level: the highest level; describes only part of the entire database. The DBMS may provide many views for the same database.
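The logical and view levels can be sketched with Python's sqlite3 module. The sketch assumes SQL as the data-definition notation in place of the record type above. The customer table plays the logical level, and a hypothetical mailing_list view exposes only part of it. The physical level (how SQLite lays rows out in pages on disk) stays hidden from this code entirely. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: what data is stored (mirrors the customer record type).
conn.execute("""
    CREATE TABLE customer (
        customer_id     TEXT PRIMARY KEY,
        customer_name   TEXT,
        customer_street TEXT,
        customer_city   TEXT
    )""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    ("C1", "Ann", "MG Road", "Pune"),
    ("C2", "Bob", "Park Street", "Kolkata"),
    ("C3", "Chen", "Ring Road", "Delhi"),
])

# View level: a hypothetical view showing only names and cities.
conn.execute("""
    CREATE VIEW mailing_list AS
    SELECT customer_name, customer_city FROM customer""")

cur = conn.execute("SELECT * FROM mailing_list ORDER BY customer_name")
cols = [d[0] for d in cur.description]   # column names seen through the view
rows = cur.fetchall()
print(cols)  # ['customer_name', 'customer_city']
print(rows)  # [('Ann', 'Pune'), ('Bob', 'Kolkata'), ('Chen', 'Delhi')]
```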

View of Data

An architecture for a database system

[Figure: Data abstraction. The view level (View 1, View 2, …, View n) sits above the logical level, which sits above the physical level.]

• View level: What data do users and application programs see?
• Logical level: What data is stored? Describes data properties such as data semantics and data relationships.
• Physical level: How is data actually stored? e.g., are we using disks? Which file system?

Instances and Schemas

Similar to types and variables in programming languages

• Schema: the logical structure / overall design of the database
  – Example: the database consists of information about a set of customers and accounts and the relationship between them
  – Analogous to the type information of a variable in a program
  – Physical schema: database design at the physical level
  – Logical schema: database design at the logical level

• Instance: the actual content of the database at a particular point in time
  – Analogous to the value of a variable
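This sketch shows the schema/instance distinction. It assumes Python's sqlite3 module and a hypothetical account table. The schema text stored by SQLite stays the same, while the instance (the set of rows) changes between two points in time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")

def schema(conn):
    """The schema: the stored table definitions."""
    return [row[0] for row in
            conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")]

def instance(conn):
    """The instance: the rows stored at this moment."""
    return conn.execute("SELECT * FROM account ORDER BY account_number").fetchall()

schema_before = schema(conn)

conn.execute("INSERT INTO account VALUES ('A-101', 500)")
instance_t1 = instance(conn)

conn.execute("UPDATE account SET balance = 400 WHERE account_number = 'A-101'")
conn.execute("INSERT INTO account VALUES ('A-102', 700)")
instance_t2 = instance(conn)

print(schema_before == schema(conn))  # True: the schema did not change
print(instance_t1)                    # [('A-101', 500)]
print(instance_t2)                    # [('A-101', 400), ('A-102', 700)]
```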

Data Independence

• Physical data independence: the ability to modify the physical schema without changing the logical schema
  – Applications depend on the logical schema
  – In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.

• Logical data independence: the ability to modify the logical schema without causing application programs to be rewritten. It is more difficult to achieve, since application programs depend heavily on the logical structure of the data they access.
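Both kinds of independence can be sketched with Python's sqlite3 module; the table, column and index names are hypothetical. Adding an index changes only access paths, so the application query is untouched. Adding a column changes the logical schema, yet a query that names its columns explicitly still works. Adding a column is the easy case. Changes such as splitting a table are where logical data independence typically becomes hard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer "
             "(customer_id TEXT PRIMARY KEY, customer_name TEXT, customer_city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [("C1", "Ann", "Pune"), ("C2", "Bob", "Delhi")])

def app_query(conn):
    """A hypothetical application query written against the logical schema."""
    return conn.execute(
        "SELECT customer_name FROM customer WHERE customer_city = ?",
        ("Pune",)).fetchall()

before = app_query(conn)

# Physical change: a new index alters access paths, not the query.
conn.execute("CREATE INDEX idx_city ON customer(customer_city)")
after_physical = app_query(conn)

# Logical change: a new column; the query names its columns, so it still runs.
conn.execute("ALTER TABLE customer ADD COLUMN customer_phone TEXT")
after_logical = app_query(conn)

print(before == after_physical == after_logical)  # True
```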

Database Language

A database system provides two kinds of language:
• one to specify the database schema, storage structure and access methods (DDL), and
• another to express database queries and updates (DML).

Data Definition Language (DDL)

• Specification notation for defining the database schema

• The DDL compiler generates a set of tables stored in a data dictionary, a file that contains metadata, i.e., data about data. The data dictionary is consulted before the actual data is read or modified.
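In SQLite the data dictionary is the built-in sqlite_master table, and PRAGMA table_info reports per-column metadata. This sketch assumes Python's sqlite3 module and a hypothetical instructor table. It executes a DDL statement and then reads the table's definition back as metadata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: define a table; SQLite records the definition in its catalog.
conn.execute("""
    CREATE TABLE instructor (
        id     TEXT PRIMARY KEY,
        name   TEXT NOT NULL,
        salary INTEGER CHECK (salary > 0)
    )""")

# Metadata read back from the data dictionary.
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
# table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2], bool(row[3]))
           for row in conn.execute("PRAGMA table_info(instructor)")]

print(tables)   # [('instructor',)]
print(columns)  # [('id', 'TEXT', False), ('name', 'TEXT', True), ('salary', 'INTEGER', False)]
```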

Data Manipulation Language (DML)

• Language for accessing and manipulating the data organized by the appropriate data model
  – DML is also known as a query language

• Two classes of languages:
  – Procedural: the user specifies what data is required and how to get those data
  – Declarative (nonprocedural): the user specifies what data is required without specifying how to get those data

• SQL is the most widely used query language
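This sketch contrasts the two classes of language, using a small made-up list of accounts. The Python function spells out how to scan, filter and sort. The SQL query, run through Python's sqlite3 module, states only what is wanted and leaves the access strategy to the DBMS.

```python
import sqlite3

# Made-up sample data: (account number, branch, balance).
accounts = [("A-101", "North", 500), ("A-215", "South", 700), ("A-102", "East", 400)]

# Procedural: we spell out HOW to find the answer, step by step.
def rich_accounts_procedural(rows, threshold):
    result = []
    for number, branch, balance in rows:
        if balance > threshold:
            result.append(number)
    result.sort()
    return result

# Declarative: we state WHAT we want; the DBMS decides how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (number TEXT, branch TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", accounts)
declarative = [r[0] for r in conn.execute(
    "SELECT number FROM account WHERE balance > ? ORDER BY number", (450,))]

procedural = rich_accounts_procedural(accounts, 450)
print(procedural)   # ['A-101', 'A-215']
print(declarative)  # ['A-101', 'A-215']
```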

Data Modeling

• A data model is a collection of concepts for describing data properties and domain knowledge:
  – Data relationships
  – Data semantics
  – Data constraints

• Data models include:
  – Entity-Relationship data model (mainly for database design)
  – Relational model
  – Object-based data models (object-oriented and object-relational)
  – Semistructured data model (XML)
  – Other older models: network model, hierarchical model

• Relational model:
  – Only one abstract concept
  – Closer to the physical representation on disk
  – Normalization

• Entity-Relationship model:
  – Diagrammatic representation
  – Easier to work with
  – Syntax not important, but remember the “meaning”
  – Remember what you can model

Entity-Relationship Model

• Models an enterprise as a collection of entities and relationships
  – Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects
    • Described by a set of attributes
  – Relationship: an association among several entities
  – e.g., each employee is an entity described by empno, empname, designation, etc.

• Entity set – set of all entities of the same type

• Relationship set – set of all relationships of the same type

• Mapping cardinality – number of entities to which another entity can be associated via a relationship set

• The overall logical design of a database can be expressed graphically by an E-R diagram, which has the following components:
  – Rectangles: entity sets
  – Ellipses: attributes of an entity
  – Diamonds: relationships among entity sets
  – Lines: link attributes to entity sets and entity sets to relationship sets

Each component of an E-R diagram is labeled with the entity or relationship that it represents.
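One common way to carry an E-R design into tables is sketched here with Python's sqlite3 module. The names employee, department and works_in are hypothetical. Each entity set becomes a table, and the relationship set becomes a table whose key encodes the mapping cardinality. Making empno the key of works_in gives a many-to-one mapping: each employee works in at most one department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
-- Entity sets: one row per entity, one column per attribute.
CREATE TABLE employee (empno INTEGER PRIMARY KEY, empname TEXT, designation TEXT);
CREATE TABLE department (dept_id TEXT PRIMARY KEY, dept_name TEXT);

-- Relationship set works_in: empno as the key means each employee appears at
-- most once (many-to-one), while a department may appear many times.
CREATE TABLE works_in (
    empno   INTEGER PRIMARY KEY REFERENCES employee(empno),
    dept_id TEXT NOT NULL REFERENCES department(dept_id)
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", "Engineer"), (2, "Ravi", "Analyst")])
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [("D1", "Research"), ("D2", "Sales")])
conn.executemany("INSERT INTO works_in VALUES (?, ?)", [(1, "D1"), (2, "D1")])

try:
    # A second department for employee 1 would break the many-to-one mapping.
    conn.execute("INSERT INTO works_in VALUES (1, 'D2')")
    second_department_rejected = False
except sqlite3.IntegrityError:
    second_department_rejected = True

d1_staff = conn.execute("""
    SELECT e.empname FROM employee AS e JOIN works_in AS w ON e.empno = w.empno
    WHERE w.dept_id = 'D1' ORDER BY e.empname""").fetchall()
print(second_department_rejected)  # True
print(d1_staff)                    # [('Asha',), ('Ravi',)]
```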

Example of schema in the entity-relationship model

Relational Model

• It uses a collection of tables to represent both data and the relationships among those data.

• Each table has multiple columns, and each column has a unique name.

Other Models: Network Model

• Data are represented by a collection of records, and relationships among those data are represented by links, which are viewed as pointers.

• Records are organized as a collection of arbitrary graphs.

Other Models (Cont.): Hierarchical Model

• Data are represented by a collection of records, and relationships among those data are represented by links, which are viewed as pointers.

• Records are organized as a collection of trees rather than arbitrary graphs.
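The key structural difference between the two models above is whether a record may have more than one parent. This plain-Python sketch stores links as hypothetical (owner, member) pointer pairs. It checks only that one-parent property and does not test for cycles or a single root.

```python
from collections import defaultdict

def parents_of(links):
    """Map each record to the set of records that point to it."""
    parents = defaultdict(set)
    for owner, member in links:
        parents[member].add(owner)
    return parents

def is_hierarchical(links):
    """Tree-shaped in the one-parent sense: no record has two owners."""
    return all(len(p) <= 1 for p in parents_of(links).values())

# Hypothetical records linked by (owner, member) pointers.
hierarchical_links = [("Dept:Sales", "Emp:Asha"), ("Dept:Sales", "Emp:Ravi")]
# In the network model, Emp:Ravi may also belong to a project record.
network_links = hierarchical_links + [("Project:X", "Emp:Ravi")]

print(is_hierarchical(hierarchical_links))  # True
print(is_hierarchical(network_links))       # False
```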

Database Users

Users are differentiated by the way they expect to interact with the system.

• End users: access the database for querying, updating, and generating reports
  – Casual end users: occasionally access the database, but may need different information each time; learn only a few facilities that they may use repeatedly; use a sophisticated database query language to specify their requests; typically middle- or high-level managers or other occasional browsers

• Application programmers – interact with the system through DML calls

• Sophisticated users – form requests in a database query language

Database Users (Cont.)

• Specialized users – write specialized database applications that do not fit into the traditional data-processing framework

• Naïve users – invoke one of the permanent application programs that have been written previously
  – E.g., people accessing the database over the web, bank tellers, clerical staff

Database Users (Cont.)

• System Analysts and Application Programmers

– Determine the requirements of end users, especially naive and parametric end users, and develop specifications for canned transactions that meet these requirements

– Application programmers implement these specifications as programs; then they test, debug, document, and maintain these canned transactions

• Workers behind the scenes

– Typically do not use the database for their own purposes

– DBMS system designers and implementers

– Design and implement the DBMS modules (for implementing the catalog, query language, interface processors, data access, concurrency control, recovery, and security) and interfaces as a software package

Database Users (Cont.)

• Tool developers

– Tools are optional packages that are often purchased separately

– include packages for database design, performance monitoring, natural language or graphical interfaces, prototyping, simulation, and test data generation.

• Operators and maintenance personnel – system administration personnel who are responsible for the actual running and maintenance of the hardware and software environment for the database system

Database Administrator

• Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs.

• Database administrator's duties include:
  – Schema definition
  – Storage structure and access method definition
  – Schema and physical organization modification
  – Granting user authority to access the database
  – Specifying integrity constraints
  – Monitoring performance and responding to changes in requirements

Database Actors

• Database Administrators
  – In a database environment, the primary resource is the database itself and the secondary resource is the DBMS and related software
  – Authorizing access to the database
  – Coordinating and monitoring its use
  – Acquiring software and hardware resources as needed

• Database Designers

– identifying the data to be stored in the database

– choosing appropriate structures to represent and store this data; these tasks are undertaken before the database is actually implemented and populated with data

– communicate with all prospective database users, in order to understand their requirements

– develop a view of the database that meets the data and processing requirements for each group of users

– These views are then analyzed and integrated with the views of other user groups. The final database design must be capable of supporting the requirements of all user groups

Database Actors

Overall Database System Structure

Query Processor Components

• DML compiler – translates DML statements in a query language into low-level instructions

• DDL interpreter – interprets DDL statements and records them in a set of tables containing metadata.

• Query evaluation engine – executes low-level instructions generated by DML compiler
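SQLite offers a convenient way to see a DML compiler's output. It translates each SQL statement into a bytecode program, which its virtual machine then executes. Prefixing a statement with EXPLAIN returns that program instead of running it. This sketch assumes Python's sqlite3 module and a hypothetical account table. The exact opcode list varies between SQLite versions, so no full listing is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (number TEXT, balance INTEGER)")

# EXPLAIN compiles the statement and returns the low-level program
# instead of executing the query.
program = conn.execute(
    "EXPLAIN SELECT number FROM account WHERE balance > 100").fetchall()

# Each row starts with the instruction address and the opcode name.
opcodes = [row[1] for row in program]
for row in program:
    print(row[0], row[1])
```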

Storage Manager Components

• Authorization & integrity manager – tests for satisfaction of integrity constraints and checks the authority of users to access data

• Transaction manager – ensures that the database remains in a consistent state despite system failures, and that concurrent transaction executions proceed without conflicting

• File manager – manages the allocation of space on disk storage and the data structures used to represent information on disk

• Buffer manager – responsible for fetching data from disk storage into main memory, and deciding what data to cache in memory
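Deciding what data to keep cached is a page-replacement policy. This sketch assumes least-recently-used (LRU) replacement, which is one common choice rather than a policy the slides prescribe. A hypothetical read_page_from_disk callback stands in for the file manager. Real buffer managers also track dirty pages that must be written back and pinned pages that cannot be evicted; both are omitted here.

```python
from collections import OrderedDict

class BufferManager:
    """A toy buffer pool using LRU replacement (illustrative only)."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk  # hypothetical file-manager hook
        self.frames = OrderedDict()   # page_id -> contents, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # hit: mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)    # miss with a full pool: evict LRU page
        self.disk_reads += 1
        page = self.read_page_from_disk(page_id)
        self.frames[page_id] = page
        return page

buf = BufferManager(capacity=2,
                    read_page_from_disk=lambda pid: f"<contents of page {pid}>")
for pid in [1, 2, 1, 3, 1, 2]:
    buf.get_page(pid)

print(buf.disk_reads)    # 4
print(list(buf.frames))  # [1, 2]
```

Tracing the requests shows the policy at work. The second and fourth requests for page 1 are hits. Page 2 is evicted when page 3 arrives, and page 3 is evicted when page 2 is requested again.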

Data structures required as part of the physical implementation:

• Data files – store the database itself
• Data dictionary – stores metadata about the structure of the database
• Indices – provide fast access to data items that hold particular values
• Statistical data – stores statistical information about the data in the database, used by the strategy selector

DATABASE SYSTEM ARCHITECTURE

Storage Management

• The storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.

• The storage manager is responsible for the following tasks:
  – Interaction with the file manager
  – Efficient storing, retrieving, and updating of data

• Issues:
  – Storage access
  – File organization
  – Indexing and hashing

Transaction Management

• A transaction is a collection of operations that performs a single logical function in a database application

• Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.

• Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.