RDBMS TOTAL


Relational database management: covers almost all RDBMS topics, including data design and structure, and data languages.


Prepared by Kailash Dhirwani (

WHAT IS DBMS?

- To be able to carry out operations like insertion, deletion and retrieval, the database needs to be managed by a substantial piece of software; this software is usually called a Database Management System (DBMS).
- A DBMS is usually a very large software package that supports many different tasks, including facilities that let the user access and modify information in the database.
- Data Description Languages (DDL) and Data Manipulation Languages (DML) are needed, respectively, for describing the data stored in the DBMS and for manipulating and retrieving it.

An architecture for database systems, called the three-schema architecture, was proposed to help achieve and visualize the important characteristics of the database approach.

What is a Database?

A collection of related pieces of data:
- Representing/capturing the information about a real-world enterprise or part of an enterprise.
- Collected and maintained to serve specific data management needs of the enterprise.
- Activities of the enterprise are supported by the database and continually update the database.

University Database:
- Data about students, faculty, courses, research laboratories, course registration/enrollment etc.
- Reflects the state of affairs of the academic aspects of the university.
- Purpose: To keep an accurate track of the academic activities of the university.

RDBMS

A Relational Database Management System (RDBMS) is a program that lets you create, update and administer a relational database. The primary rule for an RDBMS is that the data should be stored in the form of tables. Most RDBMSs use the Structured Query Language (SQL) to access the database. Normalisation is a design process applied to the tables of a relational database to reduce redundancy; it does not by itself turn a database into an RDBMS.

THE THREE-SCHEMA ARCHITECTURE:

The goal of the three-schema architecture is to separate the user applications and the physical database. In this architecture, schemas can be defined at 3 levels:

1. Internal level or Internal schema: Describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.
2. Conceptual level or Conceptual schema: Describes the structure of the whole database for a community of users. It hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. An implementation data model can be used at this level.
3. External level or External schema: Includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user is interested in and hides the rest of the database from that user. An implementation data model can be used at this level.
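The external level can be illustrated with a small sketch. It uses Python's standard sqlite3 module rather than any system named in these notes, and the table, column, and view names (employee, employee_directory) are assumptions made for illustration only. The table plays the role of the conceptual schema, the view plays the role of one external schema, and the storage details (the internal level) are left entirely to SQLite.

```python
import sqlite3

# Conceptual level: one table describing the whole (hypothetical) employee entity.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_no INTEGER PRIMARY KEY, name TEXT, address TEXT, salary REAL)"
)
conn.execute(
    "INSERT INTO employee VALUES (1, 'Joe Smith', '83 First Street', 1300.0)"
)

# External level: a view that shows one user group only what it needs
# and hides the address and salary columns.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT emp_no, name FROM employee"
)

cursor = conn.execute("SELECT * FROM employee_directory")
directory_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()
print(directory_columns, directory_rows)
```

A user who is given access only to employee_directory never sees the salary column, which is the "hides the rest of the database" behaviour described in point 3.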


What is the purpose of the mappings in the Three Schema Architecture? Is the user or the DBMS responsible for using the mappings?

Ans: The purpose of the mappings in the Three Schema Architecture is to describe how a schema at a higher level is derived from a schema at a lower level. The DBMS, not the user, is responsible for using the mappings.

IMPORTANT TO REMEMBER:

Data and meta-data
- The three schemas are only meta-data (descriptions of data).
- Data actually exists only at the physical level.

Mapping
- The DBMS must transform a request specified on an external schema into a request against the conceptual schema, and then into a request on the internal schema.
- This requires information in the meta-data on how to accomplish the mapping among the various levels.
- The mappings add overhead (they are time-consuming), leading to inefficiencies.
- Few DBMSs have implemented the full three-schema architecture.


DATA INDEPENDENCE

The separation of data descriptions from the application programs (or user interfaces) that use the data is called data independence. Data independence is one of the main advantages of a DBMS. The three-schema architecture provides the concept of data independence, which means that upper levels are unaffected by changes to lower levels. The three-schema architecture makes it easier to achieve true data independence. There are two kinds of data independence.

- Physical data independence
  * The ability to modify the physical schema without causing application programs to be rewritten.
  * Modifications at this level are usually made to improve performance.
- Logical data independence
  * The ability to modify the conceptual schema without causing application programs to be rewritten.
  * Usually done when the logical structure of the database is altered.

Logical data independence is harder to achieve, as application programs are usually heavily dependent on the logical structure of the data. An analogy can be made to abstract data types in programming languages.

What is a DBMS?

Ans: A database management system (DBMS) is a collection of software that supports the creation, use, and maintenance of databases. Initially, DBMSs provided efficient storage and retrieval of data. Due to marketplace demands and product innovation, DBMSs have evolved to provide a broad range of features for data acquisition, storage, dissemination, maintenance, retrieval, and formatting. The evolution of these features has made DBMSs rather complex.

What is SQL?

Ans: The Structured Query Language (SQL) is an industry standard language supported by most DBMSs. SQL contains statements for data definition, data manipulation, and data control.

The data in a DBMS has to be persistent, that is, it should remain accessible after the program that created it ceases to exist, or after the application that created it is restarted. A DBMS also has to provide uniform methods, independent of any specific application, for accessing the information that is stored.

An RDBMS is a Relational Database Management System (relational DBMS). This adds the condition that the system supports a tabular structure for the data, with enforced relationships between the tables. This excludes databases that don't support a tabular structure or don't enforce relationships between tables.


Many DBAs think that an RDBMS is a client-server database system, but that is not what the term means. What can be said is that a plain DBMS does not impose constraints or security on data manipulation; it is the user's or the programmer's responsibility to ensure the ACID properties of the database. An RDBMS does more in this regard, because it defines integrity constraints for the purpose of upholding the ACID properties. I have found many answers on many websites saying that DBMSs are for smaller organizations with small amounts of data, where the security of the data is not a major concern, and that RDBMSs are designed to take care of large amounts of data and of the security of that data. This is completely wrong by the definitions of RDBMS and DBMS.

Different abstract levels
- a widely accepted general architecture for a database
- database described by three abstract levels
  - internal schema (physical database)
  - conceptual schema (conceptual database)
  - external schema (view)

Objectives
- insulation of application programs and data
- support of multiple user views
- use of schema to store the DB description (meta-data)

The Three Schema Architecture

External schema
- describes a subset of the database that a particular user group is interested in, according to the format the user wants, and hides the rest
- may contain virtual data that is derived from the files, but is not explicitly stored

Conceptual schema
- hides the details of physical storage structures and concentrates on describing entities, data types, relationships, operations, and constraints

Internal schema
- describes the physical storage structure of the DB
- uses a low-level (physical) data model to describe the complete details of data storage and access paths

Benefits of Three Schema Architecture

Logical data independence
- the capacity to change the conceptual schema without having to change external schemas or application programs
- ex: Employee (E#, Name, Address, Salary). A view including only E# and Name is not affected by changes in any other attributes.

Physical data independence
- the capacity to change the internal schema without having to change the conceptual (or external) schema
- the internal schema may change to improve performance (e.g., creating an additional access structure)
- easier to achieve than logical data independence, because application programs depend on logical structures rather than on physical ones

Data Models

Data abstraction
- one fundamental characteristic of the database approach
- hides details of data storage that are not needed by most database users and applications

Data model
- a set of data structures and conceptual tools used to describe the structure of a database (data types, relationships, and constraints)
- used in the definition of the conceptual, external, and internal schema
- must provide means for DB designers to represent the real-world information completely and naturally
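The two kinds of data independence can be tried out with a minimal sketch based on the Employee (E#, Name, Address, Salary) example. It assumes Python's standard sqlite3 module; the identifiers emp_no, emp_names and idx_employee_name and the sample rows are hypothetical. An application that reads only the view keeps getting the same answer after a logical change (a new attribute) and after a physical change (a new access path).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_no INTEGER PRIMARY KEY, name TEXT, address TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Jones", "Howard", 2000.0), (2, "Blake", "Paris", 2000.0)],
)
# The view from the example: only E# and Name.
conn.execute("CREATE VIEW emp_names AS SELECT emp_no, name FROM employee")


def read_view():
    """The 'application program': it only knows about the view."""
    return conn.execute(
        "SELECT emp_no, name FROM emp_names ORDER BY emp_no"
    ).fetchall()


before = read_view()

# Logical change: the conceptual schema gains an attribute.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")
after_logical_change = read_view()

# Physical change: an additional access structure is created.
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")
after_physical_change = read_view()

print(before == after_logical_change == after_physical_change)
```

Dropping or renaming the name column, by contrast, would break the view, which is one way to see why logical data independence is the harder of the two to achieve.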

Data Models

High-level (conceptual) data models
- use concepts such as entities, attributes, relationships
- object-based models: ER model, OO model

Representational (implementation) data models
- most frequently used in commercial DBMSs
- record-based models: relational, hierarchical, network

Low-level (physical) data models
- describe the details of how data is stored
- capture aspects of database system implementation: record structures (fixed/variable length) and ordering, access paths (key indexing), etc.

Schemas and Instances

In any data model, it is important to distinguish between the description of the database and the database itself.

Data Models

ER model
- popular high-level conceptual model used in DB design
- proposed by P. Chen in 1976 (ACM TODS)
- perceives the real world as a collection of entities and relationships among them

OO model
- DB is defined in terms of objects, their properties, and their operations (methods)

Relational model
- represents a DB as a collection of tables

Network model
- represents a DB as record types and 1:N relationships

Hierarchical model
- represents data as hierarchical tree structures

Logical and Physical Data Organization

Logical organization
- conceptual or logical format of the data (e.g., employee record has E#, Name, Address)

Physical organization
- actual structure of the data and all supporting access structures (e.g., index)
  (e.g., employee: E# 32 bits, Name 30 bytes, Address 50 bytes)

Benefit
- application programs must know the logical organization, but the physical organization is an implementation detail they need not know
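The difference between the two organizations can be sketched with Python's standard struct module. The field widths come from the example above (E# 32 bits, Name 30 bytes, Address 50 bytes); the little-endian unpadded layout, the ASCII encoding, and the function names are assumptions made for this sketch only.

```python
import struct

# Physical organization: a fixed-length record of 4 + 30 + 50 bytes.
# "<" selects little-endian with no padding (an assumption of this sketch).
RECORD_FORMAT = "<I30s50s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def pack_employee(emp_no, name, address):
    """Turn the logical record (E#, Name, Address) into its stored bytes."""
    return struct.pack(
        RECORD_FORMAT, emp_no, name.encode("ascii"), address.encode("ascii")
    )


def unpack_employee(record):
    """Recover the logical record, stripping the zero-byte padding."""
    emp_no, name, address = struct.unpack(RECORD_FORMAT, record)
    return (
        emp_no,
        name.rstrip(b"\x00").decode("ascii"),
        address.rstrip(b"\x00").decode("ascii"),
    )


record = pack_employee(7566, "JONES", "83 First Street")
employee = unpack_employee(record)
print(RECORD_SIZE, employee)
```

Code that calls only pack_employee and unpack_employee deals with the logical organization; the format string could change (for example to a wider Name field) without those callers being rewritten.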

A Data Manipulation Language (DML) statement is executed when you
o Add new rows to a table
o Modify existing rows in a table
o Remove existing rows from a table

A transaction consists of a collection of DML statements that form a logical unit of work.

Adding a new row to a table is accomplished using the INSERT statement:

    INSERT INTO table [(column, column, column)]
    VALUES (value, value, value);

Because you can insert a new row that contains values for each column, the column list is not required in the INSERT clause. However, if you do not use the column list, the values must be listed according to the default order of the columns in the table. You can insert NULL values by simply omitting the column from the column list, or by specifying either '' or NULL as the item to be inserted.

Here's an example of an insert:

    SQL> insert into emp
         values (2296, 'AROMANO', 'SALESMAN', 7782,
         TO_DATE('FEB 3, 1997', 'MON DD, YYYY'),
         1300, NULL, 10);

    1 row created.

Note that the TO_DATE() function converts the string into a value of the DATE datatype.

Creating a Script with Customized Prompts
o ACCEPT stores the value in the variable
o PROMPT displays your customized text

Let's look at the following example. First, we create the following script and save it with the name scriptsWithCustomizedPrompts.sql:

    ACCEPT department_id PROMPT 'Please enter the department number:'
    ACCEPT department_name PROMPT 'Please enter the department name:'
    ACCEPT location PROMPT 'Please enter the location:'
    INSERT INTO dept (deptno, dname, loc)
    VALUES (&department_id, '&department_name', '&location');
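The two forms of INSERT described above can be compared in a short sketch. It assumes Python's standard sqlite3 module instead of the Oracle SQL*Plus session used in these notes, so TO_DATE() is not available and the hire date is stored as plain text. The column names mgr and comm and the second row (2297, 'EXAMPLE') are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (empno INTEGER, ename TEXT, job TEXT, mgr INTEGER, "
    "hiredate TEXT, sal REAL, comm REAL, deptno INTEGER)"
)

# Without a column list: one value per column, in the table's column order.
# NULL is written explicitly for the comm column.
conn.execute(
    "INSERT INTO emp VALUES "
    "(2296, 'AROMANO', 'SALESMAN', 7782, '1997-02-03', 1300, NULL, 10)"
)

# With a column list: the columns left out (mgr, hiredate, sal, comm)
# are filled with NULL. The values are passed as bound parameters.
conn.execute(
    "INSERT INTO emp (empno, ename, job, deptno) VALUES (?, ?, ?, ?)",
    (2297, "EXAMPLE", "CLERK", 10),
)

rows = conn.execute(
    "SELECT empno, ename, sal, comm FROM emp ORDER BY empno"
).fetchall()
print(rows)
```

Python's None corresponds to SQL NULL in the fetched rows. Note that treating the empty string '' as NULL is Oracle behaviour; SQLite stores '' as an empty string.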


We then run the script using the START keyword:

    SQL> START scriptsWithCustomizedPrompts
    Please enter the department number:90
    Please enter the department name:PAYROLL
    Please enter the location:HOUSTON
    old   2: VALUES (&department_id, '&department_name', '&location')
    new   2: VALUES (90, 'PAYROLL', 'HOUSTON')

    1 row created.

Notice that the ACCEPT keyword accepts the value entered by the user and stores it in the variable whose name follows it; that name, by the way, does not take the substitution prefix (&). However, when we use the variable's contents in the INSERT statement, we must include the ampersand! When the script is run, the user is asked to provide values for all three variables defined here: department_id, department_name, and location.

Copying Rows from another Table:
o Write your INSERT statement with a subquery.
o Do not use the VALUES clause.
o Match the number of columns in the INSERT clause to those in the subquery.

    SQL> create table managers(id number(4), name varchar2(10), salary number(7,2), hiredate date)
         /

    Table created.

    SQL> INSERT INTO managers(id, name, salary, hiredate)
         SELECT empno, ename, sal, hiredate
         FROM emp
         WHERE job = 'MANAGER';

    3 rows created.

    SQL> select *
         from managers
         /

           ID NAME          SALARY HIREDATE
    --------- ---------- --------- ---------
         7566 JONES           2000 02-APR-81
         7698 BLAKE           2000 01-MAY-81
         7782 CLARK           2000 09-JUN-81

For changing data in a table, we make use of the UPDATE statement:

    SQL> update managers
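The subquery-based copy under "Copying Rows from another Table" can be reproduced in a small sketch. It assumes Python's standard sqlite3 module rather than Oracle, a cut-down emp table with only the columns the copy needs, and dates stored as text; the three manager rows are the ones shown in the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (empno INTEGER, ename TEXT, job TEXT, sal REAL, hiredate TEXT)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?, ?)",
    [
        (7566, "JONES", "MANAGER", 2000, "02-APR-81"),
        (7698, "BLAKE", "MANAGER", 2000, "01-MAY-81"),
        (7782, "CLARK", "MANAGER", 2000, "09-JUN-81"),
        (2296, "AROMANO", "SALESMAN", 1300, "03-FEB-97"),
    ],
)
conn.execute(
    "CREATE TABLE managers (id INTEGER, name TEXT, salary REAL, hiredate TEXT)"
)

# INSERT with a subquery: no VALUES clause, and the four columns named in
# the INSERT clause line up with the four columns the SELECT returns.
conn.execute(
    "INSERT INTO managers (id, name, salary, hiredate) "
    "SELECT empno, ename, sal, hiredate FROM emp WHERE job = 'MANAGER'"
)

rows_copied = conn.execute("SELECT COUNT(*) FROM managers").fetchone()[0]
manager_names = [
    row[0] for row in conn.execute("SELECT name FROM managers ORDER BY id")
]
print(rows_copied, manager_names)
```

The salesman row stays behind in emp because the WHERE clause of the subquery filters it out.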

Introduction to Structured Query Language (SQL)

SQL allows users to access data in relational database management systems, such as Oracle, Sybase, Informix, Microsoft SQL Server, Access, and others, by allowing users to describe the data they wish to see. SQL also allows users to define the data in a database, and manipulate that data. The SQL used in this document is "ANSI", or standard SQL,

Table of Contents
- Basics of the SELECT Statement
- Conditional Selection
- Relational Operators
- Compound Conditions
- IN & BETWEEN
- Using LIKE
- Joins
- Keys
- Performing a Join
- Eliminating Duplicates
- Aliases & In/Subqueries
- Aggregate Functions
- Views
- Creating New Tables
- Altering Tables
- Adding Data
- Deleting Data
- Updating Data
- Indexes
- GROUP BY & HAVING
- More Subqueries
- EXISTS & ALL
- UNION & Outer Joins
- Embedded SQL
- Common SQL Questions
- Nonstandard SQL
- Syntax Summary
- Exercises
- Important Links

Basics of the SELECT Statement

In a relational database, data is stored in tables. An example table would relate Social Security Number, Name, and Address:

EmployeeAddressTable

SSN        FirstName  LastName  Address          City          State
512687458  Joe        Smith     83 First Street  Howard        Ohio
758420012  Mary       Scott     842 Vine Ave.    Losantiville  Ohio
102254896  Sam        Jones     33 Elm St.       Paris         New York
876512563  Sarah      Ackerman  440 U.S. 110     Upton         Michigan

Now, let's say you want to see the address of each employee. Use the SELECT statement, like so:

    SELECT FirstName, LastName, Address, City, State
    FROM EmployeeAddressTable;

The following are the results of your query of the database:

First Name  Last Name  Address          City          State
Joe         Smith      83 First Street  Howard        Ohio
Mary        Scott      842 Vine Ave.    Losantiville  Ohio
Sam         Jones      33 Elm St.       Paris         New York
Sarah       Ackerman   440 U.S. 110     Upton         Michigan

To explain what you just did: you asked for all of the data in the EmployeeAddressTable, and specifically, you asked for the columns called FirstName, LastName, Address, City, and State. Note that column names and table names do not have spaces; they must be typed as one word. Note also that the statement ends with a semicolon (;). The general form for a SELECT statement, retrieving all of the rows in the table, is:

    SELECT ColumnName, ColumnName, ...
    FROM TableName;

To get all columns of a table without typing all column names, use:

    SELECT * FROM TableName;

Each database management system (DBMS) and database software has different methods for logging in to the database and entering SQL commands; see the local computer "guru" to help you get onto the system, so that you can use SQL.
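For readers without a database system at hand, both SELECT forms can be tried with a minimal sketch. It assumes Python's standard sqlite3 module and an in-memory database; the table and its four rows are the EmployeeAddressTable example from this section, while storing SSN as text is a choice made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeAddressTable ("
    "SSN TEXT, FirstName TEXT, LastName TEXT, Address TEXT, City TEXT, State TEXT)"
)
conn.executemany(
    "INSERT INTO EmployeeAddressTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("512687458", "Joe", "Smith", "83 First Street", "Howard", "Ohio"),
        ("758420012", "Mary", "Scott", "842 Vine Ave.", "Losantiville", "Ohio"),
        ("102254896", "Sam", "Jones", "33 Elm St.", "Paris", "New York"),
        ("876512563", "Sarah", "Ackerman", "440 U.S. 110", "Upton", "Michigan"),
    ],
)

# Named columns: every row, but only the five listed columns.
named_columns = conn.execute(
    "SELECT FirstName, LastName, Address, City, State FROM EmployeeAddressTable"
).fetchall()

# SELECT *: every row and every column, including SSN.
all_columns = conn.execute("SELECT * FROM EmployeeAddressTable").fetchall()

for row in named_columns:
    print(row)
```

Neither query has an ordering clause, so the order in which the rows come back should not be relied on.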

Data model

A data model provides the details of information to be stored, and is of primary use when the final product is the generation of computer software code for an application or the preparation of a functional specification to aid a computer software make-or-buy decision. The figure is an example of the interaction between process and data models. According to Hoberman (2009), "A data model is a wayfinding tool for both business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment."

Database model

A database model is a theory or specification describing how a database is structured and used. Several such models have been suggested. Common models include:

- Flat model: This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another.
- Hierarchical model: In this model data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.
- Network model: This model organizes data using two fundamental constructs, called records and sets. Records contain fields, and sets define one-to-many relationships between records: one owner, many members.
- Relational model: A database model based on first-order predicate logic. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values.
- Object-relational model: Similar to a relational database model, but objects, classes and inheritance are directly supported in database schemas and in the query language.
- Star schema: The simplest style of data warehouse schema. The star schema consists of a few "fact tables" (possibly only one, justifying the name) referencing any number of "dimension tables". The star schema is considered an important special case of the snowflake schema.

[Figures: hierarchical model, network model, relational model]

Data structure

[Figure: a binary tree, a simple type of branching linked data structure.]

A data structure is a way of storing data in a computer so that it can be used efficiently. It is an organization of mathematical and logical concepts of data. Often a carefully chosen data structure will allow the most efficient algorithm to be used. The choice of the data structure often begins from the choice of an abstract data type.

A data model describes the structure of the data within a given domain and, by implication, the underlying structure of that domain itself. This means that a data model in fact specifies a dedicated grammar for a dedicated artificial language for that domain. A data model represents classes of entities (kinds of things) about which a company wishes to hold information, the attributes of that information, and relationships among those entities and (often implicit) relationships among those attributes. The model describes the organization of the data to some extent irrespective of how data might be represented in a computer system.

The entities represented by a data model can be tangible entities, but models that include such concrete entity classes tend to change over time. Robust data models often identify abstractions of such entities. For example, a data model might include an entity class called "Person", representing all the people who interact with an organization. Such an abstract entity class is typically more appropriate than ones called "Vendor" or "Employee", which identify specific roles played by those people.
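The "Person" abstraction can be sketched as follows. This is only one possible reading of the paragraph above: the class layout, the method names, and the sample person are hypothetical. The point it illustrates is that "Vendor" and "Employee" become roles attached to one entity class instead of separate entity classes.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Abstract entity class: anyone who interacts with the organization."""

    name: str
    roles: set = field(default_factory=set)  # e.g. {"Employee", "Vendor"}

    def add_role(self, role):
        self.roles.add(role)

    def has_role(self, role):
        return role in self.roles


# One person can play several roles without being stored twice.
alice = Person("Alice")  # hypothetical sample data
alice.add_role("Employee")
alice.add_role("Vendor")
print(sorted(alice.roles))
```

If the organization later needs a new role, it is added as data rather than as a new entity class, which is why such a model tends to change less over time.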

Data flow diagram

A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information system. It differs from the flowchart as it shows the data flow instead of the control flow of the program. A data flow diagram can also be used for the visualization of data processing (structured design). Data flow diagrams were invented by Larry Constantine, the original developer of structured design,[19] based on Martin and Estrin's "data flow graph" model of computation.

It is common practice to draw a context-level data flow diagram first, which shows the interaction between the system and outside entities. The DFD is designed to show how a system is divided into smaller portions and to highlight the flow of data between those parts. This context-level data flow diagram is then "exploded" to show more detail of the system being modeled.

Object model

An object model in computer science is a collection of objects or classes through which a program can examine and manipulate some specific parts of its world. In other words, it is the object-oriented interface to some service or system. Such an interface is said to be the object model of the represented service or system. For example, the Document Object Model (DOM) [3] is a collection of objects that represent a page in a web browser, used by script programs to examine and dynamically change the page. There is a Microsoft Excel object model[21] for controlling Microsoft Excel from another program, and the ASCOM Telescope Driver[22] is an object model for controlling an astronomical telescope.

In computing the term object model has a distinct second meaning: the general properties of objects in a specific computer programming language, technology, notation or methodology that uses them. Examples are the Java object model, the COM object model, and the object model of OMT. Such object models are usually defined using concepts such as class, message, inheritance, polymorphism, and encapsulation. There is an extensive literature on formalized object models as a subset of the formal semantics of programming languages.

Data properties

Some important properties of data for which requirements need to be met are:

definition-related properties
o relevance: the usefulness of the data in the context of your business.
o clarity: the availability of a clear and shared definition for the data.
o consistency: the compatibility of the same type of data from different sources.

Another kind of data model describes how to organize data using a database management system or other data management technology. It describes, for example, relational tables and columns or object-oriented classes and attributes. Such a data model is sometimes referred to as the physical data model, but in the original ANSI three schema architecture, it is called "logical". In that architecture, the physical model describes the storage media (cylinders, tracks, and tablespaces). Ideally, this model is derived from the more conceptual data model described above. It may differ, however, to account for constraints like processing capacity and usage patterns.

While data analysis is a common term for data modeling, the activity actually has more in common with the ideas and methods of synthesis (inferring general concepts from particular instances) than it does with analysis (identifying component concepts from more general ones). {Presumably we call ourselves systems analysts because no one can say systems synthesists.} Data modeling strives to bring the data structures of interest together into a cohesive, inseparable whole by eliminating unnecessary data redundancies and by relating data structures with relationships.

Concurrency Control & Recovery

In computer science, concurrency is a property of systems in which several computations are executing simultaneously and potentially interacting with each other. This occurs in programs like SharePoint, where two users may edit the same document at the same time. Conflicting edits can be avoided by using the check-in and check-out feature in SharePoint. Versioning must be turned on at the site level for this to work.

Concurrency Control
- Provides correct and highly available access to data in the presence of concurrent access by large and diverse user populations

Recovery
- Ensures the database is fault tolerant, and not corrupted by software, system or media failure
- 7x24 access to mission critical data

The existence of concurrency control and recovery allows applications to be written without explicit concern for concurrency and fault tolerance.


Database transaction and the ACID rules

The concept of a database transaction (or atomic transaction) has evolved in order to enable both a well understood database system behavior in a faulty environment where crashes can happen any time, and recovery from a crash to a well understood database state. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring lock, etc.), an abstraction supported in database and also other systems. Each transaction has well defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands). Every database transaction obeys the following rules (by support in the database system; i.e., a database system is designed to guarantee them for the transactions it runs):

- Atomicity - Either the effects of all or none of its operations remain ("all or nothing" semantics) when a transaction is completed (committed or aborted respectively). In other words, to the outside world a committed transaction appears (by its effects) to be indivisible, atomic, and an aborted transaction does not leave effects at all, as if it never existed.
- Consistency - Every transaction must leave the database in a consistent (correct) state, i.e., maintain the predetermined integrity rules of the database (constraints upon and among the database's objects). A transaction must transform a database from one consistent state to another consistent state (it is the responsibility of the transaction's programmer to make sure that the transaction itself is correct, i.e., performs correctly what it intends to perform while maintaining the integrity rules). Thus, since a database can normally be changed only by transactions, all the database's states are consistent. An aborted transaction does not change the state.
- Isolation - Transactions cannot interfere with each other. Moreover, usually the effects of an incomplete transaction are not visible to another transaction. Providing isolation is the main goal of concurrency control.
- Durability - Effects of successful (committed) transactions must persist through crashes (typically by recording the transaction's effects and its commit event in a non-volatile memory).

Thus concurrency control is an essential element for correctness in any system where two or more database transactions, executed with time overlap, can access the same data, e.g., in virtually any general-purpose database system. Consequently a vast body of related research has accumulated since database systems emerged in the early 1970s. A well established concurrency control theory exists for database systems: serializability theory, which makes it possible to effectively design and analyze concurrency control methods and mechanisms.
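Atomicity and consistency can be observed in a small sketch. It assumes Python's standard sqlite3 module; the account table, the CHECK constraint, and the transfer function are hypothetical and stand in for "the predetermined integrity rules of the database". A transfer that would violate the rule is aborted, and the update it had already made is undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integrity rule (assumed for this sketch): balances may not go negative.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(amount, src, dst):
    """Move amount from src to dst as one transaction; True if committed."""
    try:
        # "with conn" commits if the block succeeds and rolls back
        # if an exception escapes from it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT id, balance FROM account"))


committed = transfer(30, 1, 2)   # both updates are kept
aborted = transfer(500, 1, 2)    # the debit violates the CHECK constraint
final_balances = balances()
print(committed, aborted, final_balances)
```

In the aborted transfer the credit to account 2 is executed first and then rolled back together with the failed debit, so no partial effect remains.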


If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction concurrency exists. However, if concurrent transactions with interleaving operations are allowed in an uncontrolled manner, some unexpected, undesirable result may occur. Here are some typical examples:

- The lost update problem: A second transaction writes a second value of a data-item (datum) on top of a first value written by a first concurrent transaction, and the first value is lost to other transactions running concurrently which need, by their precedence, to read the first value. The transactions that have read the wrong value end with incorrect results.
- The dirty read problem: Transactions read a value written by a transaction that has been later aborted. This value disappears from the database upon abort, and should not have been read by any transaction ("dirty read"). The reading transactions end with incorrect results.
- The incorrect summary problem: While one transaction takes a summary over the values of all the instances of a repeated data-item, a second transaction updates some instances of that data-item. The resulting summary does not reflect a correct result for any (usually needed for correctness) precedence order between the two transactions (if one is executed before the other), but rather some random result, depending on the timing of the updates, and whether certain update results have been included in the summary or not.
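The lost update problem can be reproduced with a toy simulation. This is a sketch only: the schedule format, the run_schedule function, and the amounts are hypothetical, and it shows the common read-modify-write variant of the problem on a single data item rather than a real DBMS.

```python
def run_schedule(schedule, initial):
    """Run (transaction, operation, amount) steps against one data item."""
    stored_value = initial
    private_copy = {}  # each transaction's local copy of the item
    for txn, operation, amount in schedule:
        if operation == "read":
            private_copy[txn] = stored_value
        elif operation == "add":
            private_copy[txn] += amount
        elif operation == "write":
            stored_value = private_copy[txn]
    return stored_value


# Serial execution: T1 finishes before T2 starts.
serial = [
    ("T1", "read", 0), ("T1", "add", 10), ("T1", "write", 0),
    ("T2", "read", 0), ("T2", "add", 20), ("T2", "write", 0),
]

# Uncontrolled interleaving: T2 reads before T1 has written.
interleaved = [
    ("T1", "read", 0), ("T2", "read", 0),
    ("T1", "add", 10), ("T1", "write", 0),
    ("T2", "add", 20), ("T2", "write", 0),
]

serial_result = run_schedule(serial, 100)
interleaved_result = run_schedule(interleaved, 100)
print(serial_result, interleaved_result)
```

In the interleaved schedule T2 overwrites T1's write with a value computed from stale data, so T1's update of 10 is lost.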

Many methods for concurrency control exist. Most of them can be implemented within either main category above. The major methods, which each have many variants, and in some cases may overlap or be combined, are:

- Locking (e.g., Two-phase locking - 2PL) - Controlling access to data by locks assigned to the data. Access of a transaction to a data item (database object) locked by another transaction may be blocked (depending on lock type and access operation type) until lock release.
- Serialization graph checking (also called Serializability, or Conflict, or Precedence graph checking) - Checking for cycles in the schedule's graph and breaking them by aborts.
- Timestamp ordering (TO) - Assigning timestamps to transactions, and controlling or checking access to data by timestamp order.
- Commitment ordering (or Commit ordering; CO) - Controlling or checking transactions' order of commit events to be compatible with their respective precedence order.

Two-Phase Locking (2PL) Protocol
- Each Xact (transaction) must obtain an S (shared) lock on an object before reading, and an X (exclusive) lock on an object before writing.
- A transaction cannot request additional locks once it releases any lock.
- If an Xact holds an X lock on an object, no other Xact can get a lock (S or X) on that object.
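The first two rules of the Two-Phase Locking protocol can be expressed as checks inside a transaction object. The class and exception names below are hypothetical, and the sketch covers a single transaction only; conflicts between different transactions (the third rule) are the lock manager's job and are not modelled here.

```python
class TwoPhaseViolation(Exception):
    """Raised when a transaction breaks a 2PL rule."""


class Transaction:
    def __init__(self, name):
        self.name = name
        self.held = {}          # object -> "S" or "X"
        self.shrinking = False  # becomes True at the first unlock

    def lock(self, obj, mode):
        # Rule 2: no new locks once any lock has been released.
        if self.shrinking:
            raise TwoPhaseViolation(f"{self.name} locks {obj} after unlocking")
        self.held[obj] = mode

    def unlock(self, obj):
        del self.held[obj]
        self.shrinking = True

    def read(self, obj):
        # Rule 1: reading needs an S (or stronger, X) lock.
        if obj not in self.held:
            raise TwoPhaseViolation(f"{self.name} reads {obj} without a lock")

    def write(self, obj):
        # Rule 1: writing needs an X lock.
        if self.held.get(obj) != "X":
            raise TwoPhaseViolation(f"{self.name} writes {obj} without an X lock")


t1 = Transaction("T1")
t1.lock("A", "S")
t1.lock("B", "X")
t1.read("A")
t1.write("B")
t1.unlock("A")            # the shrinking phase starts here
try:
    t1.lock("C", "S")     # not allowed any more
    violation_detected = False
except TwoPhaseViolation:
    violation_detected = True
print(violation_detected)
```

The growing phase (only lock calls) followed by the shrinking phase (only unlock calls) is what gives the protocol its name.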


Lock Management
- Lock and unlock requests are handled by the lock manager
- Lock table entry:
  - Number of transactions currently holding a lock
  - Type of lock held (shared or exclusive)
  - Pointer to queue of lock requests
- Locking and unlocking have to be atomic operations
- Lock upgrade: a transaction that holds a shared lock can be upgraded to hold an exclusive lock

(Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke)
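A lock table with the three fields listed above can be sketched in a few lines. The class and method names are hypothetical, the set of holders stands in for the "number of transactions" field, and the code is single-threaded, so the requirement that locking and unlocking be atomic is assumed rather than implemented.

```python
from collections import deque


class LockEntry:
    def __init__(self):
        self.holders = set()  # transactions holding the lock (len = the count)
        self.mode = None      # "S" or "X" while the lock is held
        self.queue = deque()  # waiting (transaction, mode) requests


class LockManager:
    def __init__(self):
        self.table = {}       # object -> LockEntry

    def request(self, txn, obj, mode):
        """Return True if the lock is granted, False if the request is queued."""
        entry = self.table.setdefault(obj, LockEntry())
        if not entry.holders:
            entry.holders.add(txn)
            entry.mode = mode
            return True
        if entry.holders == {txn}:
            if mode == "X":
                entry.mode = "X"  # lock upgrade by the sole holder
            return True
        if mode == "S" and entry.mode == "S" and not entry.queue:
            entry.holders.add(txn)  # shared locks are compatible
            return True
        entry.queue.append((txn, mode))
        return False

    def release(self, txn, obj):
        entry = self.table[obj]
        entry.holders.discard(txn)
        if not entry.holders:
            entry.mode = None
        # Grant queued requests in order for as long as they are compatible.
        while entry.queue:
            next_txn, next_mode = entry.queue[0]
            others = entry.holders - {next_txn}
            if others and (next_mode == "X" or entry.mode == "X"):
                break
            entry.queue.popleft()
            entry.holders.add(next_txn)
            entry.mode = "X" if next_mode == "X" else (entry.mode or "S")


manager = LockManager()
first_shared = manager.request("T1", "A", "S")
second_shared = manager.request("T2", "A", "S")
exclusive = manager.request("T3", "A", "X")   # must wait for T1 and T2
manager.release("T1", "A")
manager.release("T2", "A")                    # now T3 gets the X lock
print(first_shared, second_shared, exclusive, manager.table["A"].holders)
```

Queuing a new shared request behind a waiting exclusive request, as done here, is one possible policy for keeping writers from waiting indefinitely; the notes do not prescribe it.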

Deadlocks

Deadlock: Cycle of transactions waiting for locks to be released by each other.

Two ways of dealing with deadlocks:
- Deadlock prevention
- Deadlock detection

Deadlock Prevention

Assign priorities based on timestamps. Assume Ti wants a lock that Tj holds. Two policies are possible:
- Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise Ti aborts
- Wound-wait: If Ti has higher priority, Tj aborts; otherwise Ti waits

If a transaction re-starts, make sure it has its original timestamp.
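The two prevention policies reduce to a comparison of timestamps. The sketch below assumes the usual convention that a smaller (older) timestamp means higher priority; the function names and the returned strings are hypothetical.

```python
def wait_die(ts_requester, ts_holder):
    """Ti requests a lock held by Tj; decide what happens under Wait-Die."""
    # Assumption: a smaller timestamp means an older, higher-priority transaction.
    if ts_requester < ts_holder:
        return "Ti waits"
    return "Ti aborts"


def wound_wait(ts_requester, ts_holder):
    """Ti requests a lock held by Tj; decide what happens under Wound-Wait."""
    if ts_requester < ts_holder:
        return "Tj aborts"
    return "Ti waits"


# Ti started at time 5, Tj at time 9: Ti is older and has higher priority.
older_requests = (wait_die(5, 9), wound_wait(5, 9))
# Ti started at time 9, Tj at time 5: Ti is younger and has lower priority.
younger_requests = (wait_die(9, 5), wound_wait(9, 5))
print(older_requests, younger_requests)
```

In both policies only the younger transaction is ever aborted, and because it keeps its original timestamp on restart it eventually becomes the oldest, which is why restarts do not go on forever.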

Deadlock Detection
- Create a waits-for graph:
  - Nodes are transactions.
  - There is an edge from Ti to Tj if Ti is waiting for Tj to release a lock.
- Periodically check for cycles in the waits-for graph.
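Checking the waits-for graph for cycles can be sketched with a depth-first search. The graph representation (a dict mapping each transaction to the transactions it waits for) and the name `find_deadlock` are assumptions made for this example.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(t):
        color[t] = GREY
        path.append(t)
        for u in waits_for.get(t, ()):
            if color.get(u, WHITE) == GREY:
                return path[path.index(u):]   # back edge: cycle found
            if color.get(u, WHITE) == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        color[t] = BLACK
        path.pop()
        return None

    for t in list(waits_for):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None
```

Once a cycle is found, the system would typically abort one of the transactions in it to break the deadlock; which one to pick is a policy choice not covered here.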

Database Systems Introduction
Dr P Sreenivasa Kumar, Professor, CS&E Department, I I T Madras

Database Management System (DBMS)
A general purpose software system enabling:
- Creation of large disk-resident databases.
- Posing of data retrieval queries in a standard manner.
- Retrieval of query results efficiently.
- Concurrent use of the system by a large number of users in a consistent manner.
- Guaranteed availability of data irrespective of system failures.

OS File System Storage Based Approach
- Files of records used for data storage
  - data redundancy, wastage of space
  - maintaining consistency becomes difficult
- Record structures hard coded into the programs
  - structure modifications hard to perform
- Each different data access request (a query) performed by a separate program
  - difficult to anticipate all such requests
- Creating the system requires a lot of effort
- Managing concurrent access and failure recovery are difficult

DBMS Approach
- DBMS
  - separation of data and metadata
  - flexibility of changing metadata
  - program-data independence
- Data access language
  - standardized: SQL
  - ad-hoc query formulation easy
- System development
  - less effort required
  - concentration on logical level design is enough
  - components to organize data storage, process queries, manage concurrent access, recover from failures, and manage access control are all available

Data Model
- Collection of conceptual tools to describe the database at a certain level of abstraction.
- Conceptual Data Model: a high level description, useful for requirements understanding.
- Representational Data Model: describes the logical representation of data without giving details of physical representation.
- Physical Data Model: description giving details about record formats, file structures etc.

E/R (Entity/Relationship) Model
- A conceptual level data model.
- Provides the concepts of entities, relationships and attributes.

The University Database Context
- Entities: student, faculty member, course, departments etc.
- Relationships: enrollment relationship between student and course, employment relationship between faculty member and department etc.
- Attributes: name, rollNumber, address etc. of the student entity; name, empNumber, phoneNumber etc. of the faculty entity.

Three-schema Architecture (1/2)
Logical Level Schema
- Describes the logical structure of the entire database. No physical level details are given.
Physical Level Schema
- Describes the physical structure of data in terms of record formats, file structures, indexes etc.
Remarks
- Views are optional. They can be set up if the DB system is very large and if easily identifiable user-groups exist.
- The logical scheme is essential.
- Modern RDBMSs hide details of the physical layer.

Three-schema Architecture (2/2)
Physical Data Independence
- The ability to modify the physical level schema without affecting the logical or view level schema.
- Performance tuning: modification at the physical level, creating a new index etc.
- Modification is localized: achieved by suitably modifying the PL-LL mapping.
- A very important feature of modern DBMS.

Logical Data Independence
- The ability to change the logical level scheme without affecting the view level schemes or application programs.
- Adding a new attribute to some relation: no need to change the programs or views that don't require the new attribute.
- Deleting an attribute: no need to change the programs or views that use the remaining data; view definitions in the VL-LL mapping need to be changed only for views that use the deleted attribute.
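Both kinds of data independence can be observed in a small experiment. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table, the view name `student_names`, the index name and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (rollNumber INTEGER PRIMARY KEY, name TEXT, address TEXT);
    INSERT INTO student VALUES (1, 'Asha', 'Chennai'), (2, 'Ravi', 'Mumbai');
    -- view level: this application sees only roll numbers and names
    CREATE VIEW student_names AS SELECT rollNumber, name FROM student;
""")

query = "SELECT name FROM student_names ORDER BY rollNumber"
before = conn.execute(query).fetchall()

# Physical level change: a new index for performance tuning.
conn.execute("CREATE INDEX idx_student_name ON student(name)")
# Logical level change: a new attribute that the view does not use.
conn.execute("ALTER TABLE student ADD COLUMN phoneNumber TEXT")

after = conn.execute(query).fetchall()
print(before == after)
```

The query text issued by the application is identical before and after both changes, which is the point of the two independence properties.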

Functional dependency

Functional dependencies are represented, in association with a particular schema, by a set of elements in the antecedent and a set of elements in the consequent. These functional dependencies can be manipulated by applying Armstrong's axioms. This manipulation, as well as the automatic generation of candidate keys, is handled by a Solver. The Solver is invoked when a user selects an axiom to apply.


A functional dependency (FD) is a constraint between two sets of attributes in a relation from a database. Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, (written X -> Y) if and only if each X value is associated with precisely one Y value. Customarily we call X the determinant set and Y the dependent attribute. Thus, given a tuple and the values of the attributes in X, one can determine the corresponding value of the Y attribute. For the purposes of simplicity, given that X and Y are sets of attributes in R, X -> Y denotes that X functionally determines each of the members of Y; in this case Y is known as the dependent set. Thus, a candidate key is a minimal set of attributes that functionally determine all of the attributes in a relation. (Note: the "function" being discussed in "functional dependency" is the function of identification.)

Constraint between two sets of attributes
- Formal method for grouping attributes
- DB as one single universal relation R = {A1, A2, ..., An}
- Two sets of attributes, X subset of R, Y subset of R
- Functional dependency (FD or f.d.) X -> Y: if t1[X] = t2[X], then t1[Y] = t2[Y]
- Values of the Y attributes depend on the value of X
- X functionally determines Y, not necessarily the reverse
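The tuple-level definition (if t1[X] = t2[X], then t1[Y] = t2[Y]) translates directly into a check over the rows of a relation. This is a minimal sketch; the function name `holds` and the representation of rows as dictionaries are assumptions for the example.

```python
def holds(rows, X, Y):
    """True if the functional dependency X -> Y holds in rows (a list of dicts)."""
    seen = {}   # X value -> the Y value first seen with it
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Two tuples agreeing on X must also agree on Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True
```

Note that such a check can only show that an FD is violated by the current data. An FD is a constraint on all legal states of the relation, so data that happens to satisfy it does not prove that the FD is intended.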

A functional dependency FD: X -> Y is called trivial if Y is a subset of X. The determination of functional dependencies is an important part of designing databases in the relational model, and in database normalization and denormalization. The functional dependencies, along with the attribute domains, are selected so as to generate constraints that would exclude as much data inappropriate to the user domain from the system as possible.

Irreducible set of functional dependencies
A set S of functional dependencies is irreducible if it has the following three properties:
1. The right-hand side of every functional dependency in S contains only one attribute.
2. The left-hand side of every functional dependency in S is irreducible: removing any attribute from the left-hand side changes the content of S (S loses some information).
3. Removing any functional dependency from S changes the content of S.
Sets of functional dependencies (FDs) with these properties are also called canonical or minimal.


Properties of functional dependencies

Given that X, Y, and Z are sets of attributes in a relation R, one can derive several properties of functional dependencies. Among the most important are Armstrong's axioms, which are used in database normalization:

- Subset Property (Axiom of Reflexivity): If Y is a subset of X, then X -> Y
- Augmentation (Axiom of Augmentation): If X -> Y, then XZ -> YZ
- Transitivity (Axiom of Transitivity): If X -> Y and Y -> Z, then X -> Z

From these rules, we can derive these secondary rules:

- Union: If X -> Y and X -> Z, then X -> YZ
- Decomposition: If X -> YZ, then X -> Y and X -> Z
- Pseudotransitivity: If X -> Y and WY -> Z, then WX -> Z

Equivalent sets of functional dependencies are called covers of each other. Every set of functional dependencies has a canonical cover.
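In practice the axioms are usually applied through the attribute closure: the set of all attributes determined by a given set of attributes. A minimal sketch follows, with functional dependencies given as (left side, right side) pairs of sets; the function names and the brute-force key search are illustrative assumptions, not a production algorithm.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)          # reflexivity: attrs determine themselves
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # transitivity: if lhs is already determined, so is rhs
            if lhs.issubset(result) and not rhs.issubset(result):
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            s = set(combo)
            if any(k.issubset(s) for k in keys):
                continue         # a superset of a known key is not minimal
            if closure(s, fds) == set(relation):
                keys.append(s)
    return keys
```

For example, with R = {A, B, C, D} and the dependencies A -> B and B -> C, the closure of {A} is {A, B, C}, and the only candidate key is {A, D}. The key search tries every subset, so it is only suitable for small schemas.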

Inclusion dependencies

INDs (which can say, for example, that every manager is an employee) are studied, including their interaction with functional dependencies, or FDs. A simple complete axiomatization for INDs is presented, and the decision problem for INDs is shown to be PSPACE-complete. (The decision problem for INDs is the problem of determining whether or not Σ logically implies σ, given a set Σ of INDs and a single IND σ.)

As an example, an inclusion dependency can say that every MANAGER entry of the R relation appears as an EMPLOYEE entry of the S relation. In general, an inclusion dependency is of the form R[A1, ..., Am] ⊆ S[B1, ..., Bm]. We note that INDs differ from other commonly studied database dependencies in two important respects. First, INDs may be interrelational, whereas the others deal with a single relation at a time. Second, INDs are not typed [Fa4]; they are special cases of extended embedded implicational dependencies [Fa4], for which the existence of Armstrong-like databases has been proven. We show that INDs have a simple complete axiomatization. However, we also show the rather surprising fact that the decision problem for INDs is PSPACE-complete.

DATA MANIPULATION LANGUAGE (DML)

These commands are used to append, change or remove the data in a table. A COMMIT or ROLLBACK statement should be given to make the changes permanent or to revert them.

2.2.1 INSERT
Using this command we can append data into tables.

Syntax
INSERT INTO table_name (column1, column2, ...) VALUES (column1_value, column2_value, ...);
INSERT INTO table_name (column2, column1, ...) VALUES (column2_value, column1_value, ...);
INSERT INTO table_name VALUES (value1, value2, ...);

Example
INSERT INTO Employee (empno, ename, salary, hire_date, gender, email) VALUES (1234, 'JOHN', 8000, '18-AUG-80', 'M', '[email protected]');
INSERT INTO Employee (email, ename, empno, hire_date, gender, salary) VALUES ('[email protected]', 'RHONDA', 1235, '24-JUL-81', 'F', 7500);
INSERT INTO Employee VALUES (1236, 'JACK', 15000, '23-SEP-79', 'm', '[email protected]');

2.2.2 UPDATE
This command is used to modify the data existing in the tables. Without a WHERE clause, the change is applied to every row of the table.

Syntax
UPDATE table_name SET column = value;
UPDATE table_name SET column = value WHERE condition;

Example
UPDATE Employee SET salary = 15000;
UPDATE Employee SET salary = 15000 WHERE empno = 1235;

2.2.3 DELETE
This command is used to remove the data from the tables. Without a WHERE clause, every row of the table is removed.

Syntax
DELETE FROM table_name;
DELETE FROM table_name WHERE condition;

Example
DELETE FROM Employee;
DELETE FROM Employee WHERE empno = 1236;
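The interplay of the DML commands with COMMIT and ROLLBACK can be tried out with Python's built-in `sqlite3` module. This is a sketch, not Oracle: the column types are a guess at the Employee table used in the examples, and the e-mail addresses are placeholders because the originals are not legible in the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employee (
    empno INTEGER PRIMARY KEY, ename TEXT, salary REAL,
    hire_date TEXT, gender TEXT, email TEXT)""")

# INSERT with an explicit column list, then with values in table order.
conn.execute(
    "INSERT INTO Employee (empno, ename, salary, hire_date, gender, email) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (1234, "JOHN", 8000, "18-AUG-80", "M", "john@example.com"))
conn.execute(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?, ?)",
    (1236, "JACK", 15000, "23-SEP-79", "M", "jack@example.com"))
conn.commit()      # make the inserts permanent

conn.execute("UPDATE Employee SET salary = 15000 WHERE empno = 1234")
conn.execute("DELETE FROM Employee WHERE empno = 1236")
conn.rollback()    # revert the uncommitted UPDATE and DELETE

rows = conn.execute(
    "SELECT empno, salary FROM Employee ORDER BY empno").fetchall()
print(rows)
```

After the rollback both employees are still present with their original salaries, because only the two inserts were committed.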

Oracle Forms

Oracle Forms is a software product for creating screens that interact with an Oracle database. It has an IDE including an object navigator, property sheet and code editor that uses PL/SQL. It was originally developed to run server-side in character mode terminal sessions. It was ported to other platforms, including Windows, to function in a client-server environment. Later versions were ported to Java, where it runs in a Java EE container and can integrate with Java and web services. The primary focus of Forms is to create data entry systems that access an Oracle database.

Oracle Forms accesses the Oracle database and generates a screen that presents the data. The source form (*.fmb) is compiled into an "executable" (*.fmx), that is run (interpreted) by the forms runtime module. The form is used to view and edit data in database-driven applications. Various GUI elements, such as buttons, menus, scrollbars, and graphics can be placed on the form. The environment supplies built-in record creation, query, and update modes, each with its own default data manipulations. This minimizes the need to program common and tedious operations, such as creating dynamic SQL, sensing changed fields, and locking rows.

As is normal with event driven interfaces, the software implements event-handling functions called triggers which are automatically invoked at critical steps in the processing of records, the receipt of keyboard strokes, and the receipt of mouse movements. Different triggers may be called before, during, and after each critical step. Each trigger function is initially a stub, containing a default action or nothing. Programming Oracle Forms therefore generally consists of modifying the contents of these triggers in order to alter the default behavior. Some triggers, if provided by the programmer, replace the default action while others augment it. As a result of this strategy, it is possible to create a number of default form layouts which possess complete database functionality yet contain no programmer-written code at all.

History

Oracle Forms is sold and released separately from the Oracle database. However, major releases of an Oracle database usually result in a new major version of Oracle Forms to support new features in the database.

The first version of Oracle Forms was named Interactive Application Facility (IAF). It had two main components, the compiler (Interactive Application Generator - IAG) and the runtime interpreter (Interactive Application Processor - IAP). It provided a character mode interface to allow users to enter and query data from an Oracle database. IAF was released with Oracle Database Version 2, the first commercial version of Oracle.

It was renamed to FastForms with Oracle Database version 4, which added an additional tool to help generate a default form to edit with the standard tool (IAG).

It was renamed to SQL*Forms version 2 with the Oracle 5 database.

Oracle Forms 2.3 was character based, and did not use PL/SQL. The source file was an *.INP ASCII file. It was common for developers to edit the INP file directly although that was not supported by Oracle. This version used its own primitive and unfriendly built-in language, augmented by user exits: compiled language code linked to the binary of the Oracle-provided run-time.

Oracle Forms 3 was character based, and was the first real version of Forms, using PL/SQL. All subsequent versions are a development of this version. It could run under X but did not support any X interface specific features such as checkboxes. The source file was an *.INP ASCII file. The IDE was vastly improved from 2.3, which dramatically decreased the need to edit the INP file directly, although this was still a common practice. Forms 3 automatically generated triggers and code to support some database constraints. Constraints could be defined, but not enforced, in the Oracle 6 database at this time, so Oracle used Forms 3 to claim support for enforcing constraints. There was a "GUI" version of Forms 3 which could be run in environments such as X Window, but not Microsoft Windows. This had no new trigger types, which made it difficult to attach PL/SQL to GUI events such as mouse movements.

Oracle Forms version 4.0 was the first "true" GUI based version. A character based runtime was still available for certain customers on request. The arrival of Microsoft Windows 3 forced Oracle to release this GUI version of Forms for commercial reasons. Forms 4.0 accompanied Oracle version 6 with support for Microsoft Windows and X Window. This version was notoriously buggy and introduced an IDE that was unpopular with developers. This version was not used by the Oracle Financials software suite. The 4.0 source files were named *.FMB and were binary.

Oracle Forms version 4.5 was really a major release rather than a "point release" of 4.0 despite its ".5" version number. It contained significant functional changes and a brand new IDE, replacing the unpopular IDE introduced in 4.0. It is believed to be named 4.5 in order to meet contractual obligations to support Forms 4 for a period of time for certain clients. It added GUI-based triggers, and provided a modern IDE with an object navigator, property sheets and code editor.

Due to conflicting operational paradigms, Oracle Forms version 5, which accompanied Oracle version 7, featured custom graphical modes tuned especially for each of the major systems. However, its internal programmatic interface remained system-independent. It was quickly superseded by Forms 6.

Forms 6 was released with the Oracle 8.0 database; it was rereleased as Forms 6i with Oracle 8i. This was basically Forms 4.5 with some extra wizards and bug-fixes. But it also included the facility to run inside a web server. A Forms Server was supplied which solved the problem of adapting Oracle Forms to a three-tier, browser-based delivery, without incurring major changes in its programmatic interface. The complex, highly interactive form interface was provided by a Java applet which communicated directly with the Forms server. However, the web version did not work very well over HTTP. A fix from Forms 9i was retrofitted to later versions of 6i to address this.

The naming and numbering system applied to Oracle Forms underwent several changes due to marketing factors, without altering the essential nature of the product. The ability to code in Java, as well as PL/SQL, was added in this period.

Forms 9i included many bug fixes to 6i and was a stable version. But it did not include either client-server or character-based interfaces, and three-tier, browser-based delivery is the only deployment choice from here on. The ability to import Java classes means that it can act as a web service client.

Forms 10g is actually Forms version 9.0.4, so is merely a rebadged Forms 9i.

Forms 11 will include some new features, relying on Oracle AQ to allow it to interact with JM

Oracle Forms 10g is Oracle's award-winning Web Rapid Application Development tool, part of the Oracle Developer Suite 10g. It is a highly productive, end-to-end, PL/SQL based development environment for building enterprise-class, database centric Internet applications. Oracle Application Server 10g provides an out-of-the-box optimized Web deployment platform for Oracle Forms 10g. Oracle itself is using Oracle Forms for Oracle Applications.

DBMS Function
1. Data Dictionary Management
2. Data Storage Management
3. Data Transformation and Presentation
4. Security Management
5. Multi-User Access Control
6. Backup and Recovery Management
7. Data Integrity Management
8. Database Access Languages and Application Programming Interfaces
9. Database Communication Interfaces

Database Model

Copyright 2004 R.M. Laurie

A database model is a collection of logical constructs used to represent the data structure and the data relationships found within the database.

Conceptual models focus on what is represented rather than how it is represented.
- Entity Relationship Diagram
- Object Oriented Model

Implementation models place emphasis on how the data is represented in the database or on how the data structures are implemented.
- Hierarchical Database Model
- Relational Database Model
- Object Oriented Database Model


Database Conceptual Model
Three Types of Relationships

- One-to-many relationships (1:M): A painter paints many different paintings, but each one of them is painted by only that painter.
  PAINTER (1) paints PAINTING (M)
- Many-to-many relationships (M:N): An employee might learn many job skills, and each job skill might be learned by many employees.
  EMPLOYEE (M) learns SKILL (N)
- One-to-one relationships (1:1): Each store is managed by a single employee and each store manager (employee) only manages a single store.
  EMPLOYEE (1) manages STORE (1)

Implementation Model: Hierarchical Database
- Logically represented by an upside down tree
- Each parent can have many children
- Each child has only one parent
(Figure 1.8)

Hierarchical Database
Advantages
- Conceptual simplicity: relationships defined
- Database security: uniform throughout system
- Data independence: data type cascaded
- Database integrity: child referenced to parent
- Efficiency: parent to child (one to many)
Disadvantages
- Complex implementation
- Difficult to manage
- Lacks structural independence
- Applications programming and use is complex
- Implementation limitations (many to many)
- Lack of standards

Implementation Model: Relational Database
Basic Structure
- Relational DataBase Management Systems (RDBMS) allow operations in a human logical environment.
- The relational database is perceived as a collection of tables.
- Each table consists of a series of row/column intersections.
- Tables (or relations) are related to each other by sharing a common entity characteristic.
- The relationship type is shown in a relational schema.
- A table yields data and structural independence.
- Microsoft Access is an RDBMS.

Relational Database Model
Advantages
- Structural independence
- Improved conceptual simplicity
- Easier database design, implementation, management, and use
- Ad hoc query capability (SQL)
- Powerful database management system
- Most common DBMS used today
Disadvantages
- Substantial hardware and system software overhead
- Possibility of poor design and implementation
- Potential islands of information = local DB

Figure 1.11 (Relational Database Model)

Conceptual Model: Entity Relationship
- E-R models are normally represented in an Entity Relationship Diagram (ERD).
- An entity is represented by a rectangle. Usually a noun or object of the sentence.
- A relationship is represented by a diamond connected to the related entities. Usually a verb.
- An attribute is a characteristic of the entity. Represented by ellipses connected to the entity. Usually nouns.
(Figure 1.13)
Note: Preferred over crow's feet because PowerPoint can be used to draw it.

Entity Relationship Model
Advantages
- Exceptional conceptual simplicity
- Visual representation
- Effective communication tool
- Integrated with the relational database model
Disadvantages
- Limited constraint representation
- Limited relationship representation
- No data manipulation language
- Loss of information content

Implementation Model: Object-Oriented DB
Basic Structure
- Objects are abstractions of actual entities.
- Attributes are properties of an object.
- A Class is a collection of similar objects with shared structure (attributes) and behavior (methods).
- Classes are organized in a class hierarchy.
- An object can inherit the attributes and methods of the classes above it.

Figure 1.15: A Comparison: The OO Data Model and the ER Model

Object-Oriented Database Model
Advantages
- Visual presentation
- Database integrity
- Both structural and data independence
- Object Oriented Method with Class Inheritance
Disadvantages
- Lack of Object Oriented Data Model standards
- Complex navigational data access
- Steep learning curve
- High system overhead slows transactions

Normal forms

The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF. The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that all of its tables are in normal form n.

Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms.

The main normal forms are summarized below.

First normal form (1NF)
  Defined by: Two versions: E.F. Codd (1970), C.J. Date (2003)[11]
  Brief definition: Table faithfully represents a relation and has no repeating groups

Second normal form (2NF)
  Defined by: E.F. Codd (1971)[12]
  Brief definition: No non-prime attribute in the table is functionally dependent on a proper subset of a candidate key

Third normal form (3NF)
  Defined by: E.F. Codd (1971)[13]; see also Carlo Zaniolo's equivalent but differently-expressed definition (1982)[14]
  Brief definition: Every non-prime attribute is non-transitively dependent on every candidate key in the table

Boyce-Codd normal form (BCNF)
  Defined by: Raymond F. Boyce and E.F. Codd (1974)[15]
  Brief definition: Every non-trivial functional dependency in the table is a dependency on a superkey

Fourth normal form (4NF)
  Defined by: Ronald Fagin (1977)[16]
  Brief definition: Every non-trivial multivalued dependency in the table is a dependency on a superkey

Fifth normal form (5NF)
  Defined by: Ronald Fagin (1979)[17]
  Brief definition: Every non-trivial join dependency in the table is implied by the superkeys of the table

Domain/key normal form (DKNF)
  Defined by: Ronald Fagin (1981)[18]
  Brief definition: Every constraint on the table is a logical consequence of the table's domain constraints and key constraints

Sixth normal form (6NF)
  Defined by: C.J. Date, Hugh Darwen, and Nikos Lorentzos (2002)[4]
  Brief definition: Table features no non-trivial join dependencies at all (with reference to generalized join operator)

Anomaly

When an attempt is made to modify (update, insert into, or delete from) a table, undesired side-effects may follow. Not all tables can suffer from these side-effects; rather, the side-effects can only arise in tables that have not been sufficiently normalized. An insufficiently normalized table might have one or more of the following characteristics:

- The same information can be expressed on multiple rows; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully (if, that is, the employee's address is updated on some records but not others) then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.
- There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code; thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly-hired faculty member who has not yet been assigned to teach any courses except by setting the Course Code to null. This phenomenon is known as an insertion anomaly.
- There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears, effectively also deleting the faculty member. This phenomenon is known as a deletion anomaly.
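The anomalies can be reproduced with a few rows of data. The sketch below models the "Employees' Skills" table as a list of dictionaries and then shows a decomposed design; the employee numbers, addresses and skills are invented sample values.

```python
# Unnormalized "Employees' Skills" rows: the address repeats once per skill.
employee_skills = [
    {"emp_id": 426, "address": "87 Sycamore Grove", "skill": "Typing"},
    {"emp_id": 426, "address": "87 Sycamore Grove", "skill": "Shorthand"},
]

# Update anomaly: a partial update leaves two addresses for one employee.
employee_skills[0]["address"] = "12 Oak Lane"
addresses = {r["address"] for r in employee_skills if r["emp_id"] == 426}

# Normalized design: the address is stored once, skills in their own table.
employees = {426: {"address": "87 Sycamore Grove"}}
skills = [(426, "Typing"), (426, "Shorthand")]

employees[426]["address"] = "12 Oak Lane"     # a single place to update
employees[519] = {"address": "5 Elm Road"}    # no skill needed: no insertion anomaly
skills = [s for s in skills if s[0] != 426]   # removing skills keeps the employee
```

After the partial update, `addresses` holds two different values for the same employee, which is the conflicting answer described above. In the decomposed design each fact is stored once, so the three anomalies do not arise for these tables.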


Mapping Constraints

An E-R scheme may define certain constraints to which the contents of a database must conform.

Mapping Cardinalities: express the number of entities to which another entity can be associated via a relationship. For binary relationship sets between entity sets A and B, the mapping cardinality must be one of:
1. One-to-one: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A. (Figure 2.3)
2. One-to-many: An entity in A is associated with any number in B. An entity in B is associated with at most one entity in A. (Figure 2.4)
3. Many-to-one: An entity in A is associated with at most one entity in B. An entity in B is associated with any number in A. (Figure 2.5)
4. Many-to-many: Entities in A and B are associated with any number from each other. (Figure 2.6)
The appropriate mapping cardinality for a particular relationship set depends on the real world being modeled. (Think about the CustAcct relationship...)
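The four cases can be told apart by counting, for a given set of (a, b) pairs, how many partners each entity has. The function below is a sketch with a hypothetical name. It classifies one instance of a relationship set, whereas a mapping cardinality is a constraint on all instances, so the result is only the tightest cardinality consistent with the sample.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship given as (a, b) pairs."""
    pairs = set(pairs)
    a_counts = Counter(a for a, _ in pairs)   # B entities per A entity
    b_counts = Counter(b for _, b in pairs)   # A entities per B entity
    a_many = any(n > 1 for n in a_counts.values())
    b_many = any(n > 1 for n in b_counts.values())
    if a_many and b_many:
        return "many-to-many"
    if a_many:
        return "one-to-many"     # some A has several B, every B has one A
    if b_many:
        return "many-to-one"     # some B has several A, every A has one B
    return "one-to-one"
```

For instance, one painter paired with two paintings is classified as one-to-many, matching the PAINTER paints PAINTING example.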

Existence Dependencies: if the existence of entity X depends on the existence of entity Y, then X is said to be existence dependent on Y. (Or we say that Y is the dominant entity and X is the subordinate entity.) For example:
- Consider account and transaction entity sets, and a relationship log between them.
- This is one-to-many from account to transaction.
- If an account entity is deleted, its associated transaction entities must also be deleted.
- Thus account is dominant and transaction is subordinate.
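In a relational implementation, an existence dependency of this kind is commonly expressed as a foreign key with ON DELETE CASCADE. The sketch below uses Python's built-in `sqlite3` module; the table and column names are made up, and the subordinate table is called `txn` because TRANSACTION is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE account (acct_no INTEGER PRIMARY KEY);
    CREATE TABLE txn (
        txn_no  INTEGER PRIMARY KEY,
        acct_no INTEGER NOT NULL REFERENCES account(acct_no) ON DELETE CASCADE
    );
    INSERT INTO account VALUES (1), (2);
    INSERT INTO txn VALUES (10, 1), (11, 1), (12, 2);
""")

# Deleting the dominant entity removes its subordinate entities as well.
conn.execute("DELETE FROM account WHERE acct_no = 1")
remaining = conn.execute("SELECT txn_no FROM txn ORDER BY txn_no").fetchall()
print(remaining)
```

The NOT NULL foreign key also prevents a transaction row from existing without an account, which is the other half of the dependency.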

ER diagram

An entity relationship (ER) diagram is a graphical representation of a data model of an application. It acts as the basis for mapping the application to the relational database.


The Entity-Relationship (ER) Diagram. One of the key techniques in ER modeling is to document the entity and relationship types in a graphical form called an Entity-Relationship (ER) diagram. Figure 2 is a typical ER diagram. The entity types such as EMP and PROJ are depicted as rectangular boxes, and the relationship types such as WORK-FOR are depicted as a diamond-shaped box. The value sets (domains) such as EMP#, NAME, and PHONE are depicted as circles, while attributes are the mappings from entity and relationship types to the value sets. The cardinality information of a relationship is also expressed. For example, the 1 or N on the lines between the entity types and relationship types indicates the upper limit of the entities of that entity type participating in that relationship.

Fig. 2. An Entity-Relationship (ER) Diagram

ER Model is based on Strong Mathematical Foundations. The ER model is based on (1) Set Theory, (2) Mathematical Relations, (3) Modern Algebra, (4) Logic, and (5) Lattice Theory. A formal definition of the entity and relationship concepts can be found in Fig. 3.

Fig. 3. Formal Definitions of Entity and Relationship Concepts

Significant Differences between the ER Model and the Relational Model. There are several differences between the ER model and the relational model:

The ER Model Uses the Mathematical Relation Construct to Express the Relationships between Entities. The relational model and the ER model both use the mathematical structure called the Cartesian product, so in some ways the two models look alike. As can be seen in Figure 3, a relationship in the ER model is defined as an ordered tuple of entities. In the relational model, a relation is a subset of the Cartesian product of data domains, while in the ER model a relationship set is a subset of the Cartesian product of entity sets. In other words, in the relational model the mathematical relation construct is used to express the structure of data values, while in the ER model the same construct is used to express the structure of entities.

The ER Model Contains More Semantic Information than the Relational Model. By Codd's original definition of a relation, any table is a relation; the definition says very little about what a relation is or should be. The ER model adds the semantics of data to a data structure. Several years later, Codd developed a data model called RM/T, which incorporated some of the concepts of the ER model.

The ER Model has Explicit Linkage between Entities. As can be seen in Figures 2 and 4, the linkage between entities is explicit in the ER model, while in the relational model it is implicit. In addition, the cardinality information is explicit in the ER model, and some of it is not captured in the relational model.

Object-Oriented (OO) Analysis Techniques are Partially Based on the ER Concepts. It is commonly acknowledged that one major component of object-oriented (OO) analysis techniques is based on the ER concepts. However, the relationship concept in OO analysis techniques is still hierarchy-oriented and not yet equal to the general relationship concept advocated in the ER model. In the past few years, OO analysis techniques have noticeably been moving toward a more general relationship concept.

Data Mining is a Way to Discover Hidden Relationships. Many of you have heard about data mining. If you think carefully about what data mining actually does, you will see the linkage between data mining and the ER model. What is data mining? What is it really doing? In our view, it is the discovery of hidden relationships between data entities. The relationships already exist; we need to discover them and then take advantage of them. This is different from conventional database design, in which the database designers identify the relationships. In data mining, algorithms instead of humans are used to discover the hidden relationships.

An ERD is a model that identifies the concepts or entities that exist in a system and the relationships between those entities. An ERD is often used as a way to visualize a relational database: each entity represents a database table, and the relationship lines represent the keys in one table that point to specific records in related tables. ERDs may also be more abstract, not necessarily capturing every table needed within a database, but serving to diagram the major concepts and relationships. This ERD is of the latter type, intended to present an abstract, theoretical view of the major entities and relationships needed for management of e-resources. It may assist the database design process for an ERM system, but does not identify every table that would be necessary for an e-resource management database.

This ERD should be examined in close consultation with other components of the Report of the DLF Electronic Resource Management Initiative, especially Appendix D (Data Element Dictionary) and Appendix E (Data Structure). The ERD presents a visual representation of e-resource management concepts and the relationships between them. The Data Element Dictionary identifies and defines the individual data elements that an e-resource management system must contain and manage, but leaves the relationship between the elements to be inferred by the reader. The Data Structure associates each data element with the entities and relationships defined in the ERD. Together, these three documents form a complete conceptual data model for e-resource management.

Understanding the Model

There are several different modeling systems for entity relationship diagramming. This ERD is presented in the Information Engineering style. Those unfamiliar with entity relationship diagramming or with this style of notation may wish to consult the following section to clarify the diagramming symbology.

Relational Algebra

Steps in Building and Using a Database
1. Design schema
2. Create schema in DBMS
3. Load initial data
4. Repeat: execute queries and updates on the database
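The four steps above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration rather than a prescribed workflow: SQLite stands in for the DBMS, and the Student and Take tables and their rows are made-up sample data.

```python
import sqlite3

# A throwaway in-memory database stands in for the DBMS.
conn = sqlite3.connect(":memory:")

# Steps 1 and 2: design the schema and create it in the DBMS.
conn.executescript("""
    CREATE TABLE Student (SID INTEGER PRIMARY KEY, name TEXT, age INTEGER, GPA REAL);
    CREATE TABLE Take (SID INTEGER, CID TEXT);
""")

# Step 3: load initial data (hypothetical rows).
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [(123, "Ann", 19, 3.6), (456, "Bob", 17, 2.9)],
)
conn.executemany(
    "INSERT INTO Take VALUES (?, ?)",
    [(123, "CS145"), (456, "MATH51")],
)
conn.commit()

# Step 4: execute queries and updates, as often as needed.
taking_cs145 = conn.execute(
    "SELECT s.name FROM Student AS s JOIN Take AS t ON s.SID = t.SID "
    "WHERE t.CID = ? ORDER BY s.name",
    ("CS145",),
).fetchall()

conn.execute("UPDATE Student SET GPA = 3.1 WHERE SID = 456")
conn.commit()
bob_gpa = conn.execute("SELECT GPA FROM Student WHERE SID = 456").fetchone()[0]
```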

Database Query Languages

What is a query? Given a database, ask questions, get answers.
Example: get all students who are now taking CS145.
Example (from the TPC-D benchmark): The Volume Shipping Query finds, for two given nations, the gross discounted revenues derived from lineitems in which parts were shipped from a supplier in either nation to a customer in the other nation during 1995 and 1996. The query lists the supplier nation, the customer nation, the year, and the revenue from shipments that took place in that year. The query orders the answer by supplier nation, customer nation, and year (all ascending).

Some queries are easy to pose, some are not.
Some queries are easy for the DBMS to answer, some are not.

Relational Query Languages

Formal: Relational Algebra, Relational Calculus, Datalog
Practical: SQL, Quel, QBE (Query-by-Example)


What is a relational query?
Input: a number of relations in your database
Output: one relation as the answer

Relational Algebra

Basic operators: selection, projection, cross product, union, difference, and renaming
Additional operators (can be defined using basic ones): theta-join, natural join, intersection, etc.

Operands: relations
Input relation(s) → operator → output relation

(Jun Yang, CS145, Spring 1999)

Example schema:
Student(SID, name, age, GPA)
Take(SID, CID)
Course(CID, title)

Selection

Notation: $\sigma_p(R)$
Purpose: pick rows according to some criteria
Input: a table $R$
Output: has the same columns as $R$, but only the rows of $R$ that satisfy $p$
Example: the student with SID 123: $\sigma_{SID = 123}(Student)$
Example: students with GPA higher than 3.0: $\sigma_{GPA > 3.0}(Student)$
Example: straight-A students under 18 or over 21: $\sigma_{GPA \ge 4.0 \wedge (age < 18 \vee age > 21)}(Student)$
The selection predicate in general can include any columns of $R$, constants, comparisons such as $=$, $\le$, etc., and Boolean connectives $\wedge$ (and), $\vee$ (or), $\neg$ (not).
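Selection can be sketched in a few lines of Python. The sketch assumes a relation is a list of dictionaries, one per row; the helper name `select`, the sample rows, and the reading of "straight-A" as a GPA of at least 4.0 are assumptions made for illustration.

```python
def select(relation, predicate):
    """Selection: keep only the rows for which the predicate holds."""
    return [row for row in relation if predicate(row)]


# Hypothetical Student(SID, name, age, GPA) rows.
student = [
    {"SID": 123, "name": "Ann", "age": 17, "GPA": 4.0},
    {"SID": 456, "name": "Bob", "age": 20, "GPA": 4.0},
    {"SID": 789, "name": "Cal", "age": 22, "GPA": 3.2},
]

# The student with SID 123.
sid_123 = select(student, lambda r: r["SID"] == 123)

# Students with GPA higher than 3.0.
high_gpa = select(student, lambda r: r["GPA"] > 3.0)

# Straight-A students under 18 or over 21.
straight_a = select(
    student,
    lambda r: r["GPA"] >= 4.0 and (r["age"] < 18 or r["age"] > 21),
)
```

The output rows keep every column of the input; only the number of rows changes.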

Projection

Notation: $\pi_L(R)$
Purpose: pick columns to output
Input: a table $R$
Output: has only the columns of $R$ listed in $L$
Example: SIDs and names of all students: $\pi_{SID, name}(Student)$
Example: SIDs of students taking classes: $\pi_{SID}(Take)$
Notice the elimination of duplicate rows.
Example of composing $\sigma$ and $\pi$: names of students under 18: $\pi_{name}(\sigma_{age < 18}(Student))$
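A possible Python sketch of projection follows, again treating a relation as a list of dictionaries. The helper name `project` and the sample rows are hypothetical; the point of interest is the explicit removal of duplicate rows.

```python
def project(relation, columns):
    """Projection: keep only the listed columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[c] for c in columns)
        if key not in seen:  # set semantics: each output row appears once
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


# Hypothetical sample rows.
student = [
    {"SID": 123, "name": "Ann", "age": 17, "GPA": 3.9},
    {"SID": 456, "name": "Bob", "age": 20, "GPA": 3.1},
]
take = [
    {"SID": 123, "CID": "CS145"},
    {"SID": 123, "CID": "MATH51"},
    {"SID": 456, "CID": "CS145"},
]

# SIDs and names of all students.
sid_and_name = project(student, ["SID", "name"])

# SIDs of students taking classes: SID 123 occurs twice in Take but once here.
sids_taking = project(take, ["SID"])

# Composing selection and projection: names of students under 18.
names_under_18 = project([r for r in student if r["age"] < 18], ["name"])
```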

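Product and Joins

Cross Product
Notation: $R \times S$
Purpose: pair rows from two tables
Input: two tables $R$ and $S$
Output: for each row $r$ in $R$ and each row $s$ in $S$, output a row $rs$; the output table has the columns of $R$ and the columns of $S$
Example: $Student \times Take$
If column names conflict, prefix the names with the table name and a dot.
Looks odd to glue unrelated tuples together; why use $\times$ then?
Example: names of students and CIDs of the courses they are taking: $\pi_{name, CID}(\sigma_{Student.SID = Take.SID}(Student \times Take))$

Theta-Join
Notation: $R \bowtie_p S$
Purpose: relate rows from two tables according to some criteria
Shorthand for: $\sigma_p(R \times S)$
Example: names of students and CIDs of the courses they are taking: $\pi_{name, CID}(Student \bowtie_{Student.SID = Take.SID} Take)$

Natural Join
Notation: $R \bowtie S$
Purpose: relate rows from two tables, enforce equality on all common attributes, and eliminate one copy of common attributes
Shorthand for: $\pi_L(R \bowtie_p S)$, where $p$ equates all attributes common to $R$ and $S$, and $L$ is the union of the attributes of $R$ and $S$ with duplicates removed
Example: $Student \bowtie Take$
Example: names of students taking calculus: $\pi_{name}(\sigma_{title = \text{'Calculus'}}(Student \bowtie Take \bowtie Course))$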
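The three operators above can be sketched in Python as follows. Relations are lists of dictionaries, the helper names and sample rows are hypothetical, and for simplicity `cross` always prefixes column names with the table name rather than only when names conflict.

```python
def cross(r, s, r_name, s_name):
    """Cross product: pair every row of r with every row of s.

    Simplification: every column is prefixed with its table name and a dot.
    """
    return [
        {**{f"{r_name}.{k}": v for k, v in a.items()},
         **{f"{s_name}.{k}": v for k, v in b.items()}}
        for a in r
        for b in s
    ]


def theta_join(r, s, r_name, s_name, predicate):
    """Theta-join: a selection applied to the cross product."""
    return [row for row in cross(r, s, r_name, s_name) if predicate(row)]


def natural_join(r, s):
    """Natural join: equate all common columns and keep one copy of each."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]


# Hypothetical sample rows (age and GPA omitted for brevity).
student = [{"SID": 123, "name": "Ann"}, {"SID": 456, "name": "Bob"}]
take = [{"SID": 123, "CID": "MATH51"}, {"SID": 456, "CID": "CS145"}]
course = [{"CID": "MATH51", "title": "Calculus"}, {"CID": "CS145", "title": "Databases"}]

pairs = cross(student, take, "Student", "Take")

# Names of students and CIDs of the courses they are taking.
names_and_cids = [
    (row["Student.name"], row["Take.CID"])
    for row in theta_join(
        student, take, "Student", "Take",
        lambda row: row["Student.SID"] == row["Take.SID"],
    )
]

# Names of students taking calculus, via two natural joins and a selection.
joined = natural_join(natural_join(student, take), course)
calculus_names = [row["name"] for row in joined if row["title"] == "Calculus"]
```

The cross product alone glues unrelated rows together (four rows here); it becomes useful once a predicate filters the pairs, which is exactly what the two joins do.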

Set Operators

Union: $R \cup S$
Difference: $R - S$
Intersection: $R \cap S$
Input: two tables $R$ and $S$ with identical schema
Output: has the same schema as $R$ and $S$
Duplicate rows are eliminated (as usual) in union.
$R \cap S$ is just a shorthand for $R - (R - S)$.
Example of union: given Student(SID, name, age, GPA) and GradStudent(SID, name, age, GPA, advisor), find all student SIDs: $\pi_{SID}(Student) \cup \pi_{SID}(GradStudent)$
Example of difference: CIDs of the courses that nobody is taking: $\pi_{CID}(Course) - \pi_{CID}(Take)$
What if we also want course titles?
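Python's built-in `set` type gives a compact way to try these operators. In this sketch a relation is a set of tuples, the rows are made up, and some columns are omitted for brevity. Joining the difference back to the course rows is one possible answer to the question about titles, not necessarily the one the slides intended.

```python
# Hypothetical rows; columns not needed for the examples are left out.
student = {(123, "Ann"), (456, "Bob")}        # (SID, name)
grad_student = {(789, "Cal", "Dr. Kim")}      # (SID, name, advisor)
course = {("CS145", "Databases"), ("MATH51", "Calculus"), ("HIST1", "History")}
take = {(123, "CS145"), (456, "MATH51")}      # (SID, CID)

# Union needs identical schemas, so project both inputs onto SID first.
all_sids = {row[0] for row in student} | {row[0] for row in grad_student}

# Difference: CIDs of the courses that nobody is taking.
untaken_cids = {cid for cid, _title in course} - {cid for _sid, cid in take}

# To also get titles, join the result back to the course rows.
untaken_courses = {row for row in course if row[0] in untaken_cids}

# Intersection expressed with difference only: R - (R - S).
r = {(1,), (2,), (3,)}
s = {(2,), (3,), (4,)}
intersection_via_difference = r - (r - s)
```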

Renaming

Notation: $\rho_S(R)$, or $\rho_{S(A_1, A_2, \ldots)}(R)$
Purpose: rename a table and/or its columns
Example: SIDs of all pairs of classmates
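The classmates example needs two copies of Take, which is where renaming earns its keep. The Python sketch below is one way to express it: the helper `rename`, the column names SID1 and SID2, the sample rows, and the condition SID1 < SID2 (used to skip self-pairs and mirrored pairs) are all assumptions made for illustration.

```python
def rename(relation, mapping):
    """Renaming: return a copy of the relation with columns renamed per mapping."""
    return [{mapping.get(col, col): val for col, val in row.items()} for row in relation]


# Hypothetical Take(SID, CID) rows.
take = [
    {"SID": 123, "CID": "CS145"},
    {"SID": 456, "CID": "CS145"},
    {"SID": 789, "CID": "MATH51"},
]

# Two renamed copies of Take so the SID columns no longer clash.
t1 = rename(take, {"SID": "SID1"})
t2 = rename(take, {"SID": "SID2"})

# Pair rows that share a CID, then keep only the two SID columns.
classmates = sorted({
    (a["SID1"], b["SID2"])
    for a in t1
    for b in t2
    if a["CID"] == b["CID"] and a["SID1"] < b["SID2"]
})
```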


Atomicity

In database systems, atomicity (or atomicness) is one of the ACID transaction properties. In an atomic transaction, a series of database operations either all occur, or none occur. ...

All database modifications must follow an all-or-nothing rule in which each transaction is atomic. That means that if one part of the transaction fails, the entire transaction fails. No splitting of atoms allowed! It is critical that the database management system maintain the atomic nature of transactions in spite of any DBMS, operating system, or hardware failure.

A.C.I.D. stands for Atomicity, Consistency, Isolation and Durability.
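The all-or-nothing rule can be observed with Python's built-in `sqlite3` module. The sketch below is illustrative only: the `account` table, the `transfer` helper, and the balances are hypothetical, and SQLite stands in for the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "a", "b", 30)       # both updates succeed and are committed
failed = transfer(conn, "a", "b", 500)  # the debit violates the CHECK constraint
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the second call the credit to `b` has already been executed when the debit fails, yet the rollback undoes it, so the balances are the same as after the first transfer.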
