ANNA UNIVERSITY

ANNA UNIVERSITY - CHENNAI - JUNE 2010
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SUB CODE/SUB NAME: CS9221 / DATABASE TECHNOLOGY
ANSWER KEY

Part A – (10*2=20 Marks)

1. What is fragmentation?
Fragmentation is a database server feature that lets you control where data is stored at the table level. It lets you define groups of rows or index keys within a table according to some algorithm or scheme. You use SQL statements to create the fragments and assign them to dbspaces.

2. What is Concurrency control?
Concurrency control is the activity of coordinating concurrent accesses to a database in a multiuser system. It allows users to access a database in a multi-programmed fashion while preserving the consistency of the data.

3. What is Persistence?
Persistence is the property of an object through which its existence transcends time (i.e. the object continues to exist after its creator ceases to exist) and/or space (i.e. the object's location moves from the address space in which it was created).

4. What is Transaction Processing?
A transaction processing system (TPS) is an information system that processes data transactions in a database system and monitors transaction programs (a special kind of program). For example, when an electronic payment is made, the amount must be both withdrawn from one account and added to the other; the system cannot complete only one of those steps. Either both must occur, or neither. If a failure prevents the transaction from completing, the partially executed transaction must be 'rolled back' by the TPS.

5. What is Client/Server model?

The server in a client/server model is simply the DBMS, whereas the client is the database application serviced by the DBMS.

Client/server database systems are classified into basic and distributed client/server models.

6. What is the difference between data warehousing and data mining?
Data warehousing: the process of integrating and combining data from multiple sources and formats into a single unified schema. It provides the enterprise with a storage mechanism for its huge amounts of data.
Data mining: the process of extracting interesting patterns and knowledge from huge amounts of data. Data mining techniques can be applied to the data warehouse of an enterprise to discover useful patterns.

7. Why do we need Normalization?
Normalization is the process of eliminating redundant data and establishing meaningful relationships among tables, according to a set of rules, in order to maintain the integrity of the data. It is done to save storage space and also for performance tuning.

8. What is Integrity?
Integrity refers to the process of ensuring that a database remains an accurate reflection of the universe of discourse it is modeling or representing. In other words, there is a close correspondence between the facts stored in the database and the real world it models.

9. Give two features of Multimedia Databases
Multimedia database systems are used when huge amounts of multimedia data objects on different types of data media (optical storage, video, tapes, audio records, etc.) must be administered so that they can be used (that is, efficiently accessed and searched) by as many applications as needed.

The objects of multimedia data are text, images, graphics, sound recordings, video recordings, signals, etc., that are digitized and stored.

10. What are Deductive Databases?
A deductive database is the combination of a conventional database containing facts, a knowledge base containing rules, and an inference engine which allows the derivation of information implied by the facts and rules.
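The three parts named in this answer (facts, rules, inference engine) can be illustrated with a minimal sketch. The parent/ancestor relations, the sample names, and the function derive_ancestors are hypothetical and chosen only for illustration; the inference engine here is plain naive forward chaining, not the method of any particular deductive DBMS.

```python
def derive_ancestors(parent_facts):
    """Naive forward chaining: apply the rules until no new fact appears.

    Facts:  parent(X, Y) pairs stored in the conventional database.
    Rule 1: ancestor(X, Y) if parent(X, Y)
    Rule 2: ancestor(X, Z) if parent(X, Y) and ancestor(Y, Z)
    """
    ancestor = set(parent_facts)          # rule 1 applied to every fact
    changed = True
    while changed:                        # repeat until a fixpoint is reached
        changed = False
        for (x, y) in parent_facts:
            for (y2, z) in list(ancestor):
                if y == y2 and (x, z) not in ancestor:
                    ancestor.add((x, z))  # rule 2 derived a new fact
                    changed = True
    return ancestor


# Hypothetical stored facts: ann is a parent of bob, bob is a parent of cid.
facts = {("ann", "bob"), ("bob", "cid")}
derived = derive_ancestors(facts)
print(("ann", "cid") in derived)  # the pair is implied by the rules, not stored
```

The pair ("ann", "cid") is never stored as a fact; it exists only because the rules imply it, which is the kind of derived information the definition refers to.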

Part B – (5*16=80 Marks)

11. (a) Explain the architecture of Distributed Databases. (16)

(b) Write notes on the following:
(i) Query processing. (8)

(ii) Transaction processing. (8)
A transaction is a collection of actions that make consistent transformations of system states while preserving system consistency. A transaction provides:
• concurrency transparency
• failure transparency

Example Transaction – SQL Version

    Begin_transaction Reservation
    begin
        input(flight_no, date, customer_name);
        EXEC SQL UPDATE FLIGHT
                 SET    STSOLD = STSOLD + 1
                 WHERE  FNO = flight_no AND DATE = date;
        EXEC SQL INSERT
                 INTO   FC(FNO, DATE, CNAME, SPECIAL)
                 VALUES (flight_no, date, customer_name, null);
        output("reservation completed")
    end. {Reservation}

Properties of Transactions
• ATOMICITY: all or nothing
• CONSISTENCY: no violation of integrity constraints
• ISOLATION: concurrent changes are invisible, hence serializable
• DURABILITY: committed updates persist
These are the ACID properties of a transaction.

Atomicity
Either all or none of the transaction's operations are performed. Atomicity requires that if a transaction is interrupted by a failure, its partial results must be undone. The activity of preserving the transaction's atomicity in the presence of transaction aborts due to input errors, system overloads, or deadlocks is called transaction recovery. The activity of ensuring atomicity in the presence of system crashes is called crash recovery.

Consistency
Internal consistency:
• A transaction which executes alone against a consistent database leaves it in a consistent state.
• Transactions do not violate database integrity constraints.
Transactions are correct programs.

Isolation (degrees of consistency)
Degree 0
• Transaction T does not overwrite dirty data of other transactions.
• Dirty data refers to data values that have been updated by a transaction prior to its commitment.
Degree 1
• T does not overwrite dirty data of other transactions.
• T does not commit any writes before EOT.
Degree 2
• T does not overwrite dirty data of other transactions.
• T does not commit any writes before EOT.
• T does not read dirty data from other transactions.
Degree 3
• T does not overwrite dirty data of other transactions.
• T does not commit any writes before EOT.
• T does not read dirty data from other transactions.
• Other transactions do not dirty any data read by T before T completes.

Isolation
Serializability
• If several transactions are executed concurrently, the results must be the same as if they were executed serially in some order.
Incomplete results
• An incomplete transaction cannot reveal its results to other transactions before its commitment.
• This is necessary to avoid cascading aborts.

Durability
Once a transaction commits, the system must guarantee that the results of its operations will never be lost, in spite of subsequent failures.
• Database recovery
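The Reservation transaction and its all-or-nothing behaviour can be tried out with a small runnable sketch. It uses Python's standard sqlite3 module rather than embedded SQL. The table and column names FLIGHT, FC, FNO, DATE, STSOLD, CNAME and SPECIAL come from the example above; the column types, the NOT NULL constraint on CNAME, the sample flight row, and the helper names make_db and reserve are assumptions made only for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with an assumed schema and one flight."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE FLIGHT (FNO TEXT, DATE TEXT, STSOLD INTEGER)")
    # Assumption: CNAME is NOT NULL, so a missing name makes the INSERT fail.
    conn.execute(
        "CREATE TABLE FC (FNO TEXT, DATE TEXT, CNAME TEXT NOT NULL, SPECIAL TEXT)"
    )
    conn.execute("INSERT INTO FLIGHT VALUES ('F1', '2010-06-01', 0)")
    conn.commit()
    return conn


def reserve(conn, flight_no, date, customer_name):
    """Sell one seat and record the customer as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "UPDATE FLIGHT SET STSOLD = STSOLD + 1 "
                "WHERE FNO = ? AND DATE = ?",
                (flight_no, date),
            )
            if cur.rowcount != 1:
                raise LookupError("no such flight")
            conn.execute(
                "INSERT INTO FC (FNO, DATE, CNAME, SPECIAL) "
                "VALUES (?, ?, ?, NULL)",
                (flight_no, date, customer_name),
            )
        return "reservation completed"
    except (LookupError, sqlite3.Error):
        # The UPDATE has been undone along with the failed INSERT.
        return "reservation failed"


conn = make_db()
print(reserve(conn, "F1", "2010-06-01", None))     # INSERT violates NOT NULL
print(reserve(conn, "F1", "2010-06-01", "Alice"))  # both statements succeed
```

In the first call the UPDATE succeeds but the INSERT fails, so the seat count is rolled back to its previous value; this is atomicity in the sense defined above. Durability and isolation are not exercised by this single-connection, in-memory sketch.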

Transaction transparency: Ensures that all distributed transactions (Ts) maintain the distributed database's integrity and consistency.
• A distributed T accesses data stored at more than one location.
• Each T is divided into a number of subTs, one for each site that has to be accessed.
• The DDBMS must ensure the indivisibility of both the global T and each of the subTs.

Concurrency transparency: All Ts must execute independently and be logically consistent with the results obtained if the Ts were executed one at a time in some arbitrary serial order.
• Replication makes concurrency more complex.

Failure transparency: The DDBMS must ensure atomicity and durability of the global T.
• This means ensuring that the subTs of the global T either all commit or all abort.

Classification of transactions: In IBM's Distributed Relational Database Architecture (DRDA), there are four types of Ts:
• Remote request
• Remote unit of work
• Distributed unit of work
• Distributed request
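The failure-transparency requirement that the subTs of a global T either all commit or all abort is commonly met with a two-phase commit protocol. The answer above does not name a protocol, so the following is only a sketch of that general idea under stated assumptions: the Site class, its can_commit flag, and run_global_transaction are hypothetical names, and real protocols also handle logging, timeouts and coordinator failure, which are omitted here.

```python
class Site:
    """One participating site that executes a subT of the global T."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical: whether the subT succeeded
        self.state = "active"

    def prepare(self):
        # Phase 1: the site votes on whether its subT is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def run_global_transaction(sites):
    """Coordinator: commit everywhere only if every site votes yes."""
    votes = [site.prepare() for site in sites]  # phase 1: collect votes
    if all(votes):
        for site in sites:                      # phase 2: global commit
            site.commit()
        return "committed"
    for site in sites:                          # phase 2: global abort
        site.abort()
    return "aborted"


print(run_global_transaction([Site("A"), Site("B")]))
print(run_global_transaction([Site("A"), Site("B", can_commit=False)]))
```

A single "no" vote aborts every subT, including those at sites that were ready to commit, which is what keeps the global T indivisible.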

12. (a) Discuss the Modeling and design approaches for Object Oriented Databases
MODELING AND DESIGN
Basically, an OODBMS is an object database that provides DBMS capabilities to objects that have been created using an object-oriented programming language (OOPL). The basic principle is to add persistence to objects. Consequently, application programmers who use OODBMSs typically write programs in a native OOPL such as Java, C++ or Smalltalk, and the language has some kind of Persistent class, Database class, Database Interface, or Database API that provides DBMS functionality as, effectively, an extension of the OOPL.

Object-oriented DBMSs, however, go well beyond simply adding persistence to any one object-oriented programming language. Historically, many object-oriented DBMSs were built to serve the market for computer-aided design/computer-aided manufacturing (CAD/CAM) applications, in which features like fast navigational access, versions, and long transactions are extremely important. Object-oriented DBMSs therefore support advanced object-oriented database applications with features like support for persistent objects from more than one programming language, distribution of data, advanced transaction models, versions, schema evolution, and dynamic generation of new types.

Object data modeling
An object consists of three parts: structure (attributes, and relationships to other objects such as aggregation and association), behavior (a set of operations) and characteristics of types (generalization/specialization). An object is similar to an entity in the ER model; therefore we begin with an example to demonstrate the structure and relationships.

Attributes are like the fields in a relational model. However, in the Book example the attributes publishedBy and writtenBy have the complex types Publisher and Author, which are themselves objects. In an RDBMS, attributes with complex types are usually held in other tables linked by keys to the main table.

Relationships: publish and writtenBy are associations with 1:N and 1:1 relationships; composed-of is an aggregation (a Book is composed of chapters). The 1:N relationship is usually realized as attributes through complex types and at the behavioral level.

Generalization/Specialization is the "is-a" relationship, which is supported in an OODB through the class hierarchy. An ArtBook is a Book, therefore the ArtBook class is a subclass of the Book class. A subclass inherits all the attributes and methods of its superclass.
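The original Book diagram is not reproduced in this text. As a stand-in, here is a minimal sketch of how the structure described above might look as classes. Book, ArtBook, Publisher, Author and the attributes publishedBy and writtenBy (written here as published_by and written_by) come from the text; every other field, the Chapter class, and the chapter_count method are assumptions added for illustration, and the sketch shows only the object model, not persistence.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    name: str  # assumed attribute


@dataclass
class Author:
    name: str  # assumed attribute


@dataclass
class Chapter:
    title: str  # assumed attribute


@dataclass
class Book:
    title: str                # simple attribute, like a relational field
    published_by: Publisher   # association: attribute with a complex type
    written_by: Author        # association: attribute with a complex type
    chapters: list = field(default_factory=list)  # aggregation: composed of chapters

    def chapter_count(self):
        """Behavior: an operation defined on the object itself."""
        return len(self.chapters)


@dataclass
class ArtBook(Book):
    """Specialization: an ArtBook is a Book and inherits its attributes and methods."""
    art_style: str = "unspecified"  # assumed extra attribute


book = ArtBook(
    "Colour Studies",
    Publisher("Rose"),
    Author("A. Writer"),
    [Chapter("Red"), Chapter("Blue")],
)
print(isinstance(book, Book), book.chapter_count(), book.published_by.name)
```

Here the Publisher and Author objects are reached by direct reference from the Book object, where a relational design would use separate tables joined through keys.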

Message: the means by which objects communicate; it is a request from one object to another to execute one of its methods. For example, Publisher_object.insert("Rose", 123…) is a request to execute the insert method on a Publisher object.

Method: defines the behavior of an object. Methods can be used to change an object's state by modifying its attribute values, or to query the values of selected attributes. The method that responds to the message in the example is the insert method defined in the Publisher class.

The main differences between relational database design and object-oriented database design include:
• Many-to-many relationships must be removed before entities can be translated into relations. Many-to-many relationships can be implemented directly in an object-oriented database.
• Operations are not represented in the relational data model. Operations are one of the main components in an object-oriented database.
• In the relational data model, relationships are implemented by primary and foreign keys. In the object model, objects communicate through their interfaces. The interface describes the data (attributes) and operations (methods) that are visible to other objects.

(b) Explain the Multi-Version Locks and Recovery in Query Languages. (16)
Multi-Version Locks
Multiversion concurrency control (abbreviated MCC or MVCC), in the database field of computer science, is a concurrency control method commonly used by database management systems to provide concurrent access to the database, and in programming languages to implement transactional memory.

For instance, a database will implement updates not by deleting an old piece of data and overwriting it with a new one, but instead by marking the old data as obsolete and adding the newer "version". Thus there are multiple versions stored, but only one is the latest. This allows the database to avoid the overhead of filling in holes in memory or disk structures, but generally requires the system to periodically sweep through and delete the old, obsolete data objects. For a document-oriented database such as CouchDB, Riak or MarkLogic Server, it also allows the system to optimize documents by writing entire documents onto contiguous sections of disk: when updated, the entire document can be rewritten, rather than bits and pieces being cut out or maintained in a linked, non-contiguous database structure.


MVCC also provides potential "point in time" consistent views. Read transactions under MVCC typically use a timestamp or transaction ID to determine what state of the database to read, and read these "versions" of the data. This avoids managing locks for read transactions, because writes are isolated by virtue of the old versions being maintained rather than through locks or mutexes. Writes affect a future "version", but at the transaction ID that the read is working at, everything is guaranteed to be consistent because the writes occur at a later transaction ID.

In other words, MVCC provides each user connected to the database with a "snapshot" of the database to work with. Any changes made will not be seen by other users of the database until the transaction has been committed.

MVCC uses timestamps or increasing transaction IDs to achieve transactional consistency. It ensures that a transaction never has to wait for a database object by maintaining several versions of the object. Each version has a write timestamp, and a transaction Ti reads the most recent version of an object that precedes the transaction's timestamp TS(Ti).

If a transaction Ti wants to write to an object, and there is another transaction Tk operating on the same object, the timestamp of Ti must precede the timestamp of Tk (i.e., TS(Ti) < TS(Tk)) for the write to succeed; that is, a write cannot complete if there are outstanding transactions with an earlier timestamp.

Every object also has a read timestamp. If a transaction Ti wants to write to object P and the timestamp of that transaction is earlier than the object's read timestamp (TS(Ti) < RTS(P)), Ti is aborted and restarted. Otherwise, Ti creates a new version of P and sets the read/write timestamps of P to the timestamp of the transaction, TS(Ti).

The obvious drawback to this system is the cost of storing multiple versions of objects in the database.
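The read and write timestamp rules described above can be made concrete with a small in-memory sketch. It is a simplified illustration, not how any particular DBMS implements MVCC: the names MVStore, Version and Abort are hypothetical, timestamps are plain integers, a read is assumed to see the newest version whose write timestamp is less than or equal to the reader's timestamp, and commit handling and garbage collection of old versions are left out.

```python
from dataclasses import dataclass


class Abort(Exception):
    """Raised when a write arrives too late and the transaction must restart."""


@dataclass
class Version:
    value: object
    wts: int  # timestamp of the transaction that wrote this version
    rts: int  # largest timestamp of any transaction that has read it


class MVStore:
    def __init__(self):
        self.versions = {}  # key -> list of Version, kept sorted by wts

    def _visible(self, key, ts):
        # Newest version whose write timestamp does not exceed ts.
        candidates = [v for v in self.versions.get(key, []) if v.wts <= ts]
        return candidates[-1] if candidates else None

    def read(self, key, ts):
        version = self._visible(key, ts)
        if version is None:
            return None
        version.rts = max(version.rts, ts)  # remember the latest reader
        return version.value                # reads never block

    def write(self, key, value, ts):
        version = self._visible(key, ts)
        if version is not None and ts < version.rts:
            # A later transaction already read the version this write would
            # supersede: TS(Ti) < RTS(P), so Ti is aborted and restarted.
            raise Abort(f"write to {key!r} at ts={ts} is too late")
        if version is not None and version.wts == ts:
            version.value = value           # same transaction rewrites its version
            return
        chain = self.versions.setdefault(key, [])
        chain.append(Version(value, wts=ts, rts=ts))  # new version, old one kept
        chain.sort(key=lambda v: v.wts)


store = MVStore()
store.write("Object1", "Foo", ts=0)
store.write("Object1", "Hello", ts=1)
store.write("Object1", "Goodbye", ts=2)
print(store.read("Object1", ts=1))  # a reader at t1 still sees the t1 version
```

A reader working at timestamp 1 keeps seeing the version written at 1 even after the write at 2, because the older version is retained rather than overwritten.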
Against this storage cost, reads are never blocked, which can be important for workloads that mostly read values from the database. MVCC is particularly adept at implementing true snapshot isolation, something which other methods of concurrency control frequently do either incompletely or with high performance costs.

At t1 the state of a DB could be:

Time   Object1   Object2
t1     "Hello"   "Bar"
t0     "Foo"     "Bar"

This indicates that the current state of this database (perhaps a key-value store database) is Object1 = "Hello", Object2 = "Bar". Previously, Object1 was "Foo", but that value has been superseded. It is not deleted immediately because the database holds multiple versions, but it will be deleted later.

If a long-running transaction starts a read operation, it will operate at transaction "t1" and see this state. If there is a concurrent update (during that long-running read transaction) which deletes Object2 and adds Object3 = "Foo-Bar", the database will look like:

Time   Object1   Object2     Object3
t2     "Hello"   (deleted)   "Foo-Bar"
t1     "Hello"   "Bar"
t0     "Foo"     "Bar"

Now there is a new version as of transaction ID t2. Note critically that the long-running read transaction still has access to a coherent snapshot of the system at t1, even though the write transaction added data as of t2, so the read transaction is able to run in isolation from the update transaction that created the t2 values. This is how MVCC allows isolated, ACID reads without any locks.

Recovery

13. (a) Discuss in detail Data Warehousing and Data Mining.
