Bt0066

August 2010

Bachelor of Science in Information Technology (BScIT) – Semester 1/

Diploma in Information Technology (DIT) – Semester 1

BT0066 – Database Management Systems – 3 Credits

(Book ID: B0950)

Assignment Set – 1 (60 Marks)

Answer all questions 10 x 6 = 60

1. List out the database implicit properties.

Answer:- A database has the following implicit properties:

a) A database represents some aspect of the real world, sometimes called the miniworld or universe of discourse (UOD). Changes to the miniworld are reflected in the database.

b) A database is a logically coherent collection of data with some inherent meaning.

c) A database is designed, built, and populated with data for a specific purpose.

2. What are the applications of SQL Server 2000?

Answer:- SQL Server™ 2000 relational database management software is widely deployed in enterprises worldwide for online transaction processing (OLTP), online analytical processing (OLAP), and data mining. Highly scalable, reliable, easy to deploy, and self-tuning, SQL Server is used for demanding, mission-critical applications. Whether used as the database engine behind Microsoft products such as Commerce Server and Content Management Server, or accessed by custom applications created with the Microsoft Visual Studio® .NET development system, SQL Server can provide a robust environment for managing corporate data.

Dell best practices advocate building enterprise data centers by sharing the workload among farms of small servers that have four or fewer CPUs, rather than concentrating the workload on larger servers with eight or more CPUs. Scaling out the data center using smaller servers can offer cost, fault-tolerance, and ease-of-expansion advantages over larger, scaled-up servers. SQL Server applications can be designed to run on multiple servers through the use of high-speed data replication technology.

3. Distinguish between three major types of architectural data models.

Answer:-

1) Primitive data models: In this approach, objects are represented by record structures grouped in file structures. The main operations available are read and write operations over records.

2) Classic data models: These are the hierarchical, network, and relational data models. The hierarchical data model is an extension of the primitive data model discussed above. The network model is an extension of the hierarchical approach. The relational data model is a fundamental departure from the hierarchical and network approaches.

3) Semantic data models: The main problem with classic data models, such as the relational data model, is that they maintain a fundamental record orientation. Semantic data models try to overcome this by capturing more of the meaning of the data, for example by modeling entities, their attributes, and the relationships among them directly.

4. Explain various file organizations in detail.

Answer:- A file is organized logically as a sequence of records. These records are mapped onto disk blocks. Files are provided as a basic construct in operating systems, so we shall assume the existence of an underlying file system. We need to consider ways of representing logical data models in terms of files. Although blocks are of a fixed size determined by the physical properties of the disk and by the operating system, record sizes vary. In a relational database, tuples of distinct relations are generally of different sizes.

Record     Account number   Branch name   Balance
Record 0   A-102            Perryridge    400
Record 1   A-305            Round Hill    350
Record 2   A-215            Mianus        700
Record 3   A-101            Downtown      500
Record 4   A-222            Redwood       700
Record 5   A-201            Perryridge    900
Record 6   A-217            Brighton      750
Record 7   A-110            Downtown      600
Record 8   A-218            Perryridge    700

One approach to mapping the database to files is to use several files and to store records of only one fixed length in any given file.

As an example, let us consider a file of account records for our bank database. Each record of this file is defined as:

    type deposit = record
        account-number: char(10);
        branch-name: char(22);
        balance: real;
    end

If we assume that each character occupies 1 byte and that a real occupies 8 bytes, our account record is 40 bytes long (10 + 22 + 8). A simple approach is to use the first 40 bytes for the first record, the next 40 bytes for the second record, and so on. However, there are two problems with this simple approach:

1) It is difficult to delete a record from this structure. The space occupied by the record to be deleted must be filled with some other record of the file, or we must have a way of marking deleted records so that they can be ignored.

2) Unless the block size happens to be a multiple of 40, some records will cross block boundaries. That is, part of the record will be stored in one block and part in another. It would thus require two block accesses to read or write such a record.
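The fixed-length layout and both of its problems can be sketched in Python. This is a minimal illustration that assumes the 40-byte `deposit` layout above (1-byte characters, an 8-byte real) and a hypothetical 4096-byte block size. The helper names are invented for the example. Deletion uses the "fill the hole with another record" option by moving the last record into the freed slot.

```python
import struct

# deposit record: char(10) + char(22) + real (8 bytes) = 40 bytes.
# "<" selects standard sizes with no alignment padding between fields.
DEPOSIT = struct.Struct("<10s22sd")
RECORD_SIZE = DEPOSIT.size  # 40


def pack_deposit(account_number: str, branch_name: str, balance: float) -> bytes:
    # struct pads short byte strings with NUL bytes up to the field width
    return DEPOSIT.pack(account_number.encode(), branch_name.encode(), balance)


def unpack_deposit(raw: bytes) -> tuple[str, str, float]:
    acc, branch, balance = DEPOSIT.unpack(raw)
    return acc.rstrip(b"\0").decode(), branch.rstrip(b"\0").decode(), balance


def read_record(file: bytearray, i: int) -> tuple[str, str, float]:
    # Record i always starts at byte offset i * 40.
    start = i * RECORD_SIZE
    return unpack_deposit(bytes(file[start:start + RECORD_SIZE]))


def delete_record(file: bytearray, i: int) -> None:
    # Problem 1: fill the hole by moving the last record into slot i.
    last = len(file) // RECORD_SIZE - 1
    if i != last:
        src = last * RECORD_SIZE
        file[i * RECORD_SIZE:(i + 1) * RECORD_SIZE] = file[src:src + RECORD_SIZE]
    del file[last * RECORD_SIZE:]


def crosses_block(i: int, block_size: int) -> bool:
    # Problem 2: record i spans two blocks if its first and last bytes
    # fall into different blocks.
    start = i * RECORD_SIZE
    end = start + RECORD_SIZE - 1
    return start // block_size != end // block_size


rows = [("A-102", "Perryridge", 400), ("A-305", "Round Hill", 350), ("A-215", "Mianus", 700)]
f = bytearray(b"".join(pack_deposit(*r) for r in rows))
delete_record(f, 0)
print([read_record(f, k) for k in range(len(f) // RECORD_SIZE)])
print([i for i in range(110) if crosses_block(i, 4096)])
```

After deleting record 0, the former last record (A-215) occupies slot 0. With a 4096-byte block, record 102 occupies bytes 4080 to 4119. It is the first record split across two blocks, so reading it needs two block accesses.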

5. What is the goal of query optimization? Why is it important?

Answer:- A database query can be fast or slow, depending on many factors such as the size of the table and the amount of data the query requests.

One way a DBA can help query optimization is by "updating statistics" on a table. Table statistics allow the optimizer to find the most efficient way to gather the data from the table.

The two types of optimization goals for query performance are as follows:

• Optimizing total query time

• Optimizing user-response time

Total query time is the time it takes to return all rows to the application. Total query time is most important for batch processing or for queries that require all rows to be processed before a result is returned to the user.

User-response time is the time that it takes for the database server to return a screen full of rows back to an interactive application. In interactive applications, only a screen full of data can be requested at one time; for example, the user application might display only 10 rows at a time.

Which optimization goal is more important can affect the query path that the optimizer chooses. For example, the optimizer might choose a nested-loop join instead of a hash join to execute a query if user-response time is most important, even though a hash join might reduce total query time.

The default behavior is for the optimizer to choose query plans that optimize total query time. You can specify optimization of user-response time at several different levels.
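To see why a nested-loop join can favor user-response time, the sketch below counts how many inner-table rows each strategy touches before it can hand back the first screen of 10 rows. It is a simplified model, not how any particular server works. It assumes that the inner table already has an index (a Python dict here), and the table sizes are hypothetical. It also ignores the I/O costs that usually make a hash join cheaper for total query time.

```python
from itertools import islice


def index_nested_loop_join(outer, inner_index, key, stats):
    # For each outer row, probe an existing index on the inner table.
    # Matches stream out immediately, one probe at a time.
    for o in outer:
        for i in inner_index.get(o[key], []):
            stats["inner_rows_touched"] += 1
            yield o, i


def hash_join(outer, inner, key, stats):
    # Build phase: the whole inner table is read into a hash table
    # before the first result row can be produced.
    table = {}
    for i in inner:
        stats["inner_rows_touched"] += 1
        table.setdefault(i[key], []).append(i)
    # Probe phase: one hash lookup per outer row.
    for o in outer:
        for i in table.get(o[key], []):
            yield o, i


outer = [{"id": k} for k in range(1000)]
inner = [{"id": k, "balance": 10 * k} for k in range(1000)]
inner_index = {row["id"]: [row] for row in inner}  # assumed pre-built index

nl_stats = {"inner_rows_touched": 0}
hj_stats = {"inner_rows_touched": 0}
first_screen_nl = list(islice(index_nested_loop_join(outer, inner_index, "id", nl_stats), 10))
first_screen_hj = list(islice(hash_join(outer, inner, "id", hj_stats), 10))
print(nl_stats, hj_stats)
```

Both strategies return the same first 10 rows. However, the nested-loop join touched only 10 inner rows to produce them, while the hash join had to read all 1000 rows during its build phase.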

6. Explain the statement that relational algebra operators can be composed. Why is the ability to compose operators important?

Answer:- Relational algebra received little attention until the publication of E. F. Codd's relational model of data in 1970. Codd proposed such an algebra as a basis for database query languages. Even the query language SQL is loosely based on relational algebra, though the operands in SQL are not exactly relations and several useful theorems about relational algebra do not hold in the SQL counterpart.

Relational algebra is a procedural language. It specifies the operations to be performed on existing relations to derive result relations. Furthermore, it defines the complete schema for each of the result relations. Because every operator takes one or two relations as operands and produces a relation as its result, the operators can be composed: the result of one operation can serve as an operand of another. This closure property makes it possible to build complex queries by nesting a small set of simple operators, and to rewrite one expression into an equivalent, cheaper one during query optimization. The relational algebraic operations can be divided into basic set-oriented operations and relation-oriented operations. The former are the traditional set operations; the latter are those for performing joins, selection, projection, and division.

By the end of Unit 7, learners are able to understand:

a) relational algebra and its history;

b) basic operations such as union, difference, and intersection;

c) additional relational algebra operations, including selection, projection, join, and division.

The basic operations are the traditional set operations: union, difference, intersection, and Cartesian product. Three of these four basic operations (union, intersection, and difference) require that the operand relations be union compatible. Two relations are union compatible if they have the same arity and there is a one-to-one correspondence between their attributes, with corresponding attributes defined over the same domain. The Cartesian product can be defined on any two relations.

Two relations P and Q are said to be union compatible if both P and Q are of the same degree n and the domains of the corresponding n attributes are identical, i.e., if $P = \{P_1, \ldots, P_n\}$ and $Q = \{Q_1, \ldots, Q_n\}$, then

$$\mathrm{Dom}(P_i) = \mathrm{Dom}(Q_i) \quad \text{for } i = 1, 2, \ldots, n,$$

where $\mathrm{Dom}(P_i)$ represents the domain of the attribute $P_i$.
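Composition and union compatibility can be made concrete with a small Python model in which a relation is a schema plus a set of tuples. This is an illustrative sketch only. The `Relation` class, the operator functions, and the use of Python types to stand in for attribute domains are assumptions made for the example. Because every operator returns a `Relation`, the output of `select` can be fed straight into `project`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Relation:
    attrs: tuple[str, ...]     # attribute names
    domains: tuple[type, ...]  # Dom(A_i), modelled as Python types
    rows: frozenset            # a set of tuples, so there are no duplicates

    def __post_init__(self):
        object.__setattr__(self, "rows", frozenset(self.rows))


def union_compatible(p: Relation, q: Relation) -> bool:
    # Same degree n and Dom(P_i) == Dom(Q_i) for i = 1..n.
    return len(p.attrs) == len(q.attrs) and p.domains == q.domains


def union(p: Relation, q: Relation) -> Relation:
    if not union_compatible(p, q):
        raise ValueError("operands are not union compatible")
    return Relation(p.attrs, p.domains, p.rows | q.rows)


def difference(p: Relation, q: Relation) -> Relation:
    if not union_compatible(p, q):
        raise ValueError("operands are not union compatible")
    return Relation(p.attrs, p.domains, p.rows - q.rows)


def select(r: Relation, pred: Callable[[dict], bool]) -> Relation:
    # sigma: keep the tuples that satisfy the predicate
    return Relation(r.attrs, r.domains,
                    (t for t in r.rows if pred(dict(zip(r.attrs, t)))))


def project(r: Relation, names: list[str]) -> Relation:
    # pi: keep only the named attributes (duplicates collapse in the set)
    idx = [r.attrs.index(n) for n in names]
    return Relation(tuple(names), tuple(r.domains[i] for i in idx),
                    (tuple(t[i] for i in idx) for t in r.rows))


account = Relation(("account_number", "branch_name", "balance"), (str, str, int), {
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 900),
    ("A-217", "Brighton", 750),
})

# Composition: pi_account_number(sigma_balance>500(account))
rich = project(select(account, lambda t: t["balance"] > 500), ["account_number"])
print(sorted(rich.rows))
```

Projecting `balance` and `branch_name` and then taking their union raises `ValueError`, because the domains (`int` versus `str`) do not match. This is exactly the union-compatibility condition above.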

7. What is an unsafe query? Give an example and explain why it is important to disallow such queries.

Answer:- Consider the query $\{ S \mid \neg (S \in \text{Sailors}) \}$. This query is syntactically correct. However, it asks for all tuples S such that S is not in (the given instance of) Sailors. The set of such tuples S is obviously infinite in the context of infinite domains such as the set of all integers. This simple example illustrates an unsafe query. The answer to such a query is infinite and depends on the domain rather than on the database instance, so it cannot be computed or even finitely listed. It is therefore desirable to restrict relational calculus to disallow unsafe queries.
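The difference between an unsafe query and a safe one can be sketched in Python. The sketch assumes a hypothetical Sailors instance of sailor ids and a hypothetical finite People relation; both names and their values are invented for the example. Evaluating "all S not in Sailors" over the integers produces an endless stream, so only a prefix can ever be inspected. Restricting S to range over a finite relation gives a finite answer.

```python
from itertools import count, islice

sailors = {22, 31, 58}          # hypothetical Sailors instance (sailor ids)
people = {22, 31, 58, 64, 71}   # hypothetical finite relation to range over


def unsafe_not_in_sailors():
    # { S | not (S in Sailors) } over the infinite domain of integers:
    # this generator never terminates, so the full answer cannot be computed.
    for s in count():
        if s not in sailors:
            yield s


def safe_not_in_sailors():
    # { S | S in People and not (S in Sailors) }: S ranges over a finite
    # relation, so the answer is finite and depends only on the database.
    return {s for s in people if s not in sailors}


print(list(islice(unsafe_not_in_sailors(), 5)))  # only a prefix can be shown
print(sorted(safe_not_in_sailors()))
```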

8. Define the term functional dependency.

Answer:- A functional dependency (FD) is a constraint between two sets of attributes in a relation from a database.

Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R (written X → Y), if and only if each X value is associated with precisely one Y value. Customarily, we call X the determinant set and Y the dependent attribute. Thus, given a tuple and the values of the attributes in X, one can determine the corresponding value of the Y attribute. For simplicity, given that X and Y are sets of attributes in R, X → Y denotes that X functionally determines each of the members of Y; in this case Y is known as the dependent set. Thus, a candidate key is a minimal set of attributes that functionally determines all of the attributes in a relation.

(Note: the "function" being discussed in "functional dependency" is the function of identification.)

A functional dependency X → Y is called trivial if Y is a subset of X.

The determination of functional dependencies is an important part of designing databases in the relational model, and in database normalization and denormalization. The functional dependencies, along with the attribute domains, are selected so as to generate constraints that exclude as much data inappropriate to the user domain from the system as possible.

For example, suppose one is designing a system to track vehicles and the capacity of their engines. Each vehicle has a unique vehicle identification number (VIN). One would write VIN → Engine Capacity because it would be inappropriate for a vehicle's engine to have more than one capacity. (Assuming, in this case, that vehicles only have one engine.) However, Engine Capacity → VIN is incorrect because there could be many vehicles with the same engine capacity.

This functional dependency may suggest that the attribute Engine Capacity be placed in a relation with candidate key VIN. However, that may not always be appropriate. For example, if that functional dependency occurs as a result of the transitive functional dependencies VIN → Vehicle Model and Vehicle Model → Engine Capacity, then that would not result in a normalized relation.
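Two checks from this discussion can be sketched in Python. The first tests whether X → Y holds in a given relation instance. The second computes the attribute closure X+ under a set of FDs, which is the usual way to test whether X is a superkey. The vehicle rows and attribute names are hypothetical. Note that an instance can only refute an FD; it can never prove that the FD holds for every future state of the database.

```python
def fd_holds(rows: list[dict], x: list[str], y: list[str]) -> bool:
    # X -> Y holds in this instance if no two rows agree on X but differ on Y.
    seen = {}
    for r in rows:
        xv = tuple(r[a] for a in x)
        yv = tuple(r[a] for a in y)
        if seen.setdefault(xv, yv) != yv:
            return False
    return True


def closure(attrs: set[str], fds: list[tuple[set[str], set[str]]]) -> set[str]:
    # Repeatedly apply every FD whose left side is already in the closure.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


vehicles = [
    {"VIN": "V1", "Model": "M-A", "EngineCapacity": 1.6},
    {"VIN": "V2", "Model": "M-A", "EngineCapacity": 1.6},
    {"VIN": "V3", "Model": "M-B", "EngineCapacity": 2.0},
]
print(fd_holds(vehicles, ["VIN"], ["EngineCapacity"]))   # VIN -> Engine Capacity
print(fd_holds(vehicles, ["EngineCapacity"], ["VIN"]))   # Engine Capacity -> VIN

fds = [({"VIN"}, {"Model"}), ({"Model"}, {"EngineCapacity"})]
print(sorted(closure({"VIN"}, fds)))
```

The closure of {VIN} reaches Engine Capacity only through Model, which is the transitive dependency described above. Because {VIN} determines every attribute, it is a candidate key of this small schema.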

9. Discuss the relative advantages of centralized and distributed databases.

Answer:- Advantages of distributed database systems are:

o local autonomy (in enterprises that are already distributed)

o improved performance (since data is stored close to where it is needed, and a query may be split over several sites and executed in parallel)

o improved reliability and availability (should one site go down)

o economics

o expandability

o shareability
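The point that a query may be split over several sites and executed in parallel can be sketched with Python's `concurrent.futures`. This is a simulation under stated assumptions. Each hypothetical site holds a horizontal fragment of an account relation in memory, the per-site work is a local SUM, and the coordinator adds up the partial results. A real distributed DBMS would also have to handle network transfer, site failures, and replication.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of an account relation, one per site.
sites = {
    "Downtown":   [("A-101", 500), ("A-110", 600)],
    "Perryridge": [("A-102", 400), ("A-201", 900), ("A-218", 700)],
    "Brighton":   [("A-217", 750)],
}


def local_sum(fragment):
    # Sub-query executed at one site, close to where the data is stored.
    return sum(balance for _, balance in fragment)


def total_balance(sites):
    # Ship the sub-query to every site in parallel, then combine the results.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_sum, sites.values()))
    return sum(partials)


print(total_balance(sites))
```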

10. List a few requirements for multimedia data management.

Answer:- The ability to store multimedia information in digital form has spurred both the demand for and the supply of new electronic appliances (e.g., DVD players, digital cameras, mobile phones connected to the Web) and new applications (e.g., interactive video, digital photo albums, electronic postcards, distance learning). The increasing production of digital multimedia data magnifies the traditional problems of multimedia data management and creates new problems such as content personalisation and access from mobile devices. The major issues are in the areas of multimedia data modeling, physical storage and indexing, and query processing with multimedia data. We have been working on the following forms of multimedia data: image, audio, video, and geo-temporal metadata possibly attached to a document. In our previous works, we showed that Galois lattices are a useful tool for accessing image databases by browsing rather than querying them. They suffered from a performance problem, because the time complexity of building a lattice is quadratic. This has been solved thanks to a pre-processing step using our summarization tool. Nonetheless, Galois lattices suffer from an additional problem: all the descriptions are orthogonal, which means that the number of incoming and outgoing edges of a node can be extremely high. This creates a visualization issue for large image databases. (This problem is somewhat alleviated thanks to the summarization process.)

In order to solve this second problem for large databases, we have proposed techniques aimed at masking parts of the Galois lattice during a browsing session [41]. In other words, we create dynamic views of the Galois lattice consisting of coherent sub-lattices obtained by removing either nodes or edges. In the case of node masking (the case investigated so far), the algorithm exhibits a linear complexity in the number of masked nodes. When using a lattice of clusters of images, the size of the whole lattice is "only" in , where n is the number of images. This means that this masking technique can be used at run time even for large databases.

August 2010

Bachelor of Science in Information Technology (BScIT) – Semester 1/

Diploma in Information Technology (DIT) – Semester 1

BT0066 – Database Management Systems – 3 Credits

(Book ID: B0950)

Assignment Set – 2 (60 Marks)

Answer all questions 10 x 6 = 60

1. Differentiate between physical data independence and logical data independence.

Answer:-

o Physical data independence is the ability to modify the physical schema without making it necessary to rewrite application programs, e.g., changing from unblocked to blocked record storage, or from sequential to random-access files.

o Logical data independence is the ability to modify the conceptual schema without making it necessary to rewrite application programs, e.g., adding a new field to a record. An application program's view hides this change from the program.
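Logical data independence can be demonstrated with Python's built-in `sqlite3` module. In this sketch, the table, view, and column names are hypothetical. The application reads only through a view. Adding a new column to the underlying table changes the conceptual schema, yet the application's query and its result stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO staff VALUES (1, 'Asha'), (2, 'Ravi')")
# External view used by the application program.
conn.execute("CREATE VIEW staff_view AS SELECT staff_no, name FROM staff")


def application_query(conn):
    # The "application program": it only knows about staff_view.
    return conn.execute("SELECT staff_no, name FROM staff_view ORDER BY staff_no").fetchall()


before = application_query(conn)
# Change the conceptual schema: add a new field to the record.
conn.execute("ALTER TABLE staff ADD COLUMN email TEXT")
after = application_query(conn)
print(before == after, after)
```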

2. Explain the three level architecture of DBMS.

Answer:-

We now discuss a conceptual framework for a DBMS. Several different frameworks have been suggested over the last several years. For example, a framework may be developed based on the functions that the various components of a DBMS must provide to its users. It may also be based on different views of data that are possible within a DBMS. We consider the latter approach.

A commonly used view of data approach is the three-level architecture suggested by ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee). ANSI/SPARC produced an interim report in 1972, followed by a final report in 1977. The reports proposed an architectural framework for databases. Under this approach, a database is considered as containing data about an enterprise. The three levels of the architecture are three different views of the data:

1. External - individual user view

2. Conceptual - community user view

3. Internal - physical or storage view

The three-level database architecture allows a clear separation of the information meaning (conceptual view) from the external data representation and from the physical data structure layout. A database system that is able to separate the three different views of data is likely to be flexible and adaptable. This flexibility and adaptability is the data independence discussed earlier.

We now briefly discuss the three different views.

The external level is the view that the individual user of the database has. This view is often a restricted view of the database, and the same database may provide a number of different views for different classes of users. In general, the end users and even the application programmers are only interested in a subset of the database. For example, a department head may only be interested in the departmental finances and student enrolments but not the library information. The librarian would not be expected to have any interest in the information about academic staff. The payroll office would have no interest in student enrolments.

The conceptual view is the information model of the enterprise and contains the view of the whole enterprise without any concern for the physical implementation. This view is normally more stable than the other two views. In a database, it may be desirable to change the internal view to improve performance while there has been no change in the conceptual view of the database. The conceptual view is the overall community view of the database, and it includes all the information that is going to be represented in the database. The conceptual view is defined by the conceptual schema, which includes definitions of each of the various types of data.

The internal view is the view about the actual physical storage of data. It tells us what data is stored in the database and how. At least the following aspects are considered at this level:

1. Storage allocation, e.g., B-trees, hashing, etc.

2. Access paths, e.g., specification of primary and secondary keys, indexes, pointers, and sequencing.

3. Miscellaneous, e.g., data compression and encryption techniques, optimization of the internal structures.

Efficiency considerations are the most important at this level, and the data structures are chosen to provide an efficient database. The internal view does not deal with the physical devices directly. Instead, it views a physical device as a collection of physical pages and allocates space in terms of logical pages.

The separation of the conceptual view from the internal view enables us to provide a logical description of the database without the need to specify physical structures. This is often called physical data independence. Separating the external views from the conceptual view enables us to change the conceptual view without affecting the external views. This separation is sometimes called logical data independence.

Assuming the three-level view of the database, a number of mappings are needed to enable users to work with one of the external views. For example, the payroll office may have an external view of the database that consists of the following information only:

1. Staff number, name, and address.

2. Staff tax information, e.g., number of dependents.

3. Staff bank information where salary is deposited.

4. Staff employment status, salary level, leave information, etc.

The conceptual view of the database may contain academic staff, general staff, casual staff, etc. A mapping will need to be created in which all the staff in the different categories are combined into one category for the payroll office. The conceptual view would include information about each staff member's position, the date employment started, full-time or part-time status, etc. This will need to be mapped to the salary level for the salary office. Also, if there is some change in the conceptual view, the external view can stay the same if the mapping is changed.
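The payroll mapping described above can be sketched as an SQL view, again using Python's `sqlite3`. The conceptual schema here, with separate academic and casual staff tables and these particular columns, is a hypothetical simplification. The view combines the staff categories into the single category the payroll office sees. When the conceptual schema later changes, only the view definition is rewritten, and the payroll office's query is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level (simplified): staff categories stored in separate tables.
conn.executescript("""
CREATE TABLE academic_staff (staff_no INTEGER, name TEXT, salary_level INTEGER);
CREATE TABLE casual_staff   (staff_no INTEGER, name TEXT, hourly_rate REAL);
INSERT INTO academic_staff VALUES (1, 'Asha', 5);
INSERT INTO casual_staff   VALUES (7, 'Ravi', 30.0);

-- External level for the payroll office: one combined staff category.
CREATE VIEW payroll AS
    SELECT staff_no, name, 'academic' AS category FROM academic_staff
    UNION ALL
    SELECT staff_no, name, 'casual' AS category FROM casual_staff;
""")


def payroll_query(conn):
    # What the payroll office runs; it never mentions the base tables.
    return conn.execute(
        "SELECT staff_no, name, category FROM payroll ORDER BY staff_no"
    ).fetchall()


before = payroll_query(conn)

# Conceptual change: merge the categories into a single staff table.
# Only the mapping (the view definition) is rewritten.
conn.executescript("""
DROP VIEW payroll;
CREATE TABLE staff (staff_no INTEGER, name TEXT, category TEXT);
INSERT INTO staff SELECT staff_no, name, 'academic' FROM academic_staff;
INSERT INTO staff SELECT staff_no, name, 'casual' FROM casual_staff;
DROP TABLE academic_staff;
DROP TABLE casual_staff;
CREATE VIEW payroll AS SELECT staff_no, name, category FROM staff;
""")

after = payroll_query(conn)
print(before == after, after)
```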

3. Explain the distinctions among the terms primary key, candidate key, and super key.

Answer:-

• A super key is any set of attributes such that the values of the attributes (taken together) uniquely identify one entity in the entity set.

• A candidate key is a minimal super key, i.e., a super key with no redundant attributes. In other words, if any one of the attributes is removed, the set of attributes that remain no longer forms a super key.

• A primary key is one of the candidate keys, designated by the database designer.

• Every primary key is also a candidate key; every candidate key is also a super key, but not vice versa.
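The definitions above can be checked against a sample relation instance with a short Python sketch. The student rows and attribute names are hypothetical. An instance can only suggest keys; real keys are constraints declared on the schema. Checking single-attribute removals is enough to test minimality, because every superset of a super key is itself a super key.

```python
def is_superkey(rows: list[dict], attrs: set[str]) -> bool:
    # No two rows may agree on all of attrs.
    values = [tuple(r[a] for a in sorted(attrs)) for r in rows]
    return len(values) == len(set(values))


def is_candidate_key(rows: list[dict], attrs: set[str]) -> bool:
    # Minimal super key: removing any one attribute must break uniqueness.
    return is_superkey(rows, attrs) and all(
        not is_superkey(rows, attrs - {a}) for a in attrs
    )


students = [
    {"roll_no": 1, "email": "a@example.edu", "name": "Asha"},
    {"roll_no": 2, "email": "b@example.edu", "name": "Ravi"},
    {"roll_no": 3, "email": "c@example.edu", "name": "Asha"},
]
print(is_superkey(students, {"roll_no", "name"}))        # True
print(is_candidate_key(students, {"roll_no", "name"}))   # False: name is redundant
print(is_candidate_key(students, {"roll_no"}))           # True
print(is_candidate_key(students, {"email"}))             # True
print(is_superkey(students, {"name"}))                   # False
```

In this instance, both `roll_no` and `email` behave as candidate keys. The designer would pick one of them, for example `roll_no`, as the primary key.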

4. Explain various storage devices and their characteristics.

Answer:- Many different forms of storage, based on various natural phenomena, have been invented. So far, no practical universal storage medium exists, and all forms of storage have some drawbacks. Therefore a computer system usually contains several kinds of storage, each with an individual purpose.

A digital computer represents data using the binary numeral system. Text, numbers, pictures, audio, and nearly any other form of information can be converted into a string of bits, or binary digits, each of which has a value of 1 or 0. The most common unit of storage is the byte, equal to 8 bits. A piece of information can be handled by any computer whose storage space is large enough to accommodate the binary representation of the piece of information, or simply data. For example, using eight million bits, or about one megabyte, a typical computer could store a short novel.

Traditionally the most important part of every computer is the central processing unit (CPU, or simply a processor), because it actually operates on data, performs any calculations, and controls all the other components.

Without a significant amount of memory, a computer would merely be able to perform fixed operations and immediately output the result. It would have to be reconfigured to change its behavior. This is acceptable for devices such as desk calculators or simple digital signal processors. Von Neumann machines differ in that they have a memory in which they store their operating instructions and data. Such computers are more versatile in that they do not need to have their hardware reconfigured for each new program, but can simply be reprogrammed with new in-memory instructions; they also tend to be simpler to design, in that a relatively simple processor may keep state between successive computations to build up complex procedural results. Most modern computers are von Neumann machines.

In practice, almost all computers use a variety of memory types, organized in a storage hierarchy around the CPU, as a trade-off between performance and cost. Generally, the lower a storage is in the hierarchy, the lower its bandwidth and the greater its access latency from the CPU. This traditional division of storage into primary, secondary, tertiary, and off-line storage is also guided by cost per bit.

5. What are the benefits of making the system catalogs relations?

Answer:- There are several advantages to storing the system catalogs as relations. Relational system catalogs take advantage of all of the implementation and management benefits of relational tables: effective information storage and rich querying capabilities. The choice of which system catalogs to maintain is left to the DBMS implementer.
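SQLite is one concrete example of this design. Its catalog, `sqlite_master`, is itself a table, so the ordinary query language can be used to inspect the schema. The sketch below uses Python's `sqlite3` module, and the table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance REAL)")
conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY, city TEXT)")

# The catalog is a relation, so plain SQL can query it like any other table.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
print(tables)
```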

6. What is relational completeness? If a query language is relationally complete, can you

write any desired query using that language?

Answer:- Questions 6 and 7 are the same; see the answer to question 7.


7. What is relational completeness? If a query language is relationally complete, can you

write any desired query in that language?

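Answer:- Codd defined the term relational completeness to refer to a language that is complete with respect to first-order predicate calculus, apart from the restrictions he proposed. In practice, the restrictions have no adverse effect on the applicability of his relational algebra for database purposes.

Relational completeness does not, however, mean that any desired query can be written in the language. It only guarantees that every query expressible in relational algebra (equivalently, in safe relational calculus) can be expressed. Queries such as the transitive closure of a relation, or aggregates such as counts and sums, cannot be expressed in relational algebra. A language that is merely relationally complete therefore need not support them.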
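Transitive closure shows why relational completeness is not the same as being able to write any query. In this sketch, the course-prerequisite pairs are hypothetical. Each round is one join, one projection, and one union, all of which are relational algebra operations. However, the number of rounds depends on the length of the longest chain in the data, so no single fixed algebra expression works for every instance.

```python
def compose(r: set[tuple[str, str]], s: set[tuple[str, str]]) -> set[tuple[str, str]]:
    # One relational-algebra step: join r.b = s.a, then project (r.a, s.b).
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def transitive_closure(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    # Repeat join + project + union until nothing new appears.
    # The number of rounds depends on the data.
    closure = set(edges)
    while True:
        new = closure | compose(closure, edges)
        if new == closure:
            return closure
        closure = new


prereq = {("DBMS", "DataStructures"), ("DataStructures", "Programming"),
          ("Programming", "Maths")}
print(sorted(transitive_closure(prereq)))
```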

8. What is the basic purpose of 4NF?

Answer:- The purpose of 4NF (fourth normal form) is to remove multivalued dependencies: a relation is in 4NF if, for every nontrivial multivalued dependency X →→ Y, X is a superkey. As a result, adding a new value to the database does not require adding multiple other tuples just to maintain consistency.
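The insertion problem that 4NF removes can be sketched in Python with a hypothetical relation EMP(emp, skill, language). Here an employee's skills and languages are independent, which gives the multivalued dependencies emp →→ skill and emp →→ language. The relation, names, and values are invented for the example. In the unnormalized relation, recording one new language forces one new tuple per skill. After decomposition into EMP_SKILL and EMP_LANG, the same fact is a single tuple, and the natural join still reconstructs the original relation.

```python
from itertools import product

# Skills and languages are independent, so every combination must be stored.
skills = {"Asha": {"SQL", "Java"}}
languages = {"Asha": {"English", "Hindi"}}
emp = {(e, s, l) for e in skills for s, l in product(skills[e], languages[e])}


def add_language_unnormalized(emp, e, lang):
    # One new fact forces one new tuple per skill of the employee.
    new = {(e, s, lang) for (e2, s, _) in emp if e2 == e}
    return emp | new, len(new)


# 4NF decomposition: EMP_SKILL(emp, skill) and EMP_LANG(emp, language).
emp_skill = {(e, s) for (e, s, _) in emp}
emp_lang = {(e, l) for (e, _, l) in emp}


def add_language_4nf(emp_lang, e, lang):
    # The same fact is now a single tuple.
    return emp_lang | {(e, lang)}, 1


emp2, n_unnorm = add_language_unnormalized(emp, "Asha", "Tamil")
emp_lang2, n_4nf = add_language_4nf(emp_lang, "Asha", "Tamil")

# The natural join of the two 4NF relations reconstructs the full relation.
rejoined = {(e, s, l) for (e, s) in emp_skill for (e2, l) in emp_lang2 if e == e2}
print(n_unnorm, n_4nf, rejoined == emp2)
```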

9. How might a distributed database designed for a local area network differ from one

designed for a wide area network?

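Answer:- The main difference is the range of coverage. A WAN covers the largest area: it can cover multiple cities and states, and it can span different continents. An example of a WAN is the Internet.

A LAN is a system that links together electronic office equipment, such as computers and word processors, and forms a network within an office or building. LANs cover a smaller area; office networks in the same building or area and home networks are examples of LANs.

For a distributed database, the difference matters mainly through communication cost. On a WAN, links are slower and more expensive, so the design should minimize data transfer between sites, for example by fragmenting and replicating data so that it is stored close to where it is used. On a LAN, communication is comparatively fast and cheap, so the design can rely more on accessing data at remote sites.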

10. What are the drawbacks of current commercial databases?

Answer: Existing commercial DBMSs, both small and large, have proven inadequate for applications that involve complex data. The traditional database notion of storing data in two-dimensional tables, or in flat files, breaks down quickly in the face of the complex data structures and data types used in today's applications. Research into modeling and processing complex data has gone in two directions:

a. Extending the functionality of RDBMSs.

b. Developing and implementing OODBMSs based on the object-oriented programming paradigm.
