August 2010
Bachelor of Science in Information Technology (BScIT) – Semester 1/
Diploma in Information Technology (DIT) – Semester 1
BT0066 – Database Management Systems – 3 Credits
(Book ID: B0950)
Assignment Set – 1 (60 Marks)
Answer all questions 10 x 6 = 60
1. List out the database implicit properties.
Answer:- A database has the following implicit properties:
a) A database represents some aspect of the real world, sometimes called the
miniworld or universe of discourse (UoD). Changes to the miniworld are
reflected in the database.
b) A database is a logically coherent collection of data with some inherent
meaning.
c) A database is designed, built, and populated with data for a specific purpose.
2. What are the applications of SQL Server 2000?
Answer:- SQL Server™ 2000 relational database management software is widely deployed in
enterprises worldwide for online transaction processing (OLTP), online analytical processing,
and data mining. Highly scalable, reliable, easy to deploy, and self-tuning, SQL Server is used
for demanding, mission-critical applications. Whether used as the database engine behind
Microsoft products such as Commerce Server and Content Management Server, or accessed
by custom applications created with the Microsoft Visual Studio® .NET development system,
SQL Server can provide a robust environment for managing corporate data. Dell best practices
advocate building enterprise data centers by sharing the workload among farms of small
servers that have four or fewer CPUs, rather than concentrating the workload on larger servers
with eight or more CPUs. Scaling out the data center using smaller servers can offer cost,
fault-tolerance, and ease-of-expansion advantages over larger, scaled-up servers. SQL Server
applications can be designed to run on multiple servers through the use of high-speed data
replication technology.
3. Distinguish between three major types of architectural data models.
Answer:-
1) Primitive data models: in this approach, objects are represented by record structures
grouped into file structures. The main operations available are read and write operations
over records.
2) Classic data models: these are the hierarchical, network, and relational data models.
The hierarchical data model is an extension of the primitive data model discussed
above, and the network model is an extension of the hierarchical approach. The relational data
model is a fundamental departure from the hierarchical and network approaches.
3) Semantic data models: the main problem with classic data models, such as the
relational data model, is that they maintain a fundamental record orientation; semantic
data models attempt to capture more of the meaning of the data.
4. Explain various file organizations in detail.
Answer:- A file is organized logically as a sequence of records. These records are mapped
onto disk blocks. Files are provided as a basic construct in operating systems, so we shall
assume the existence of an underlying file system. We need to consider ways of representing
logical data models in terms of files. Although blocks are of a fixed size determined by the
physical properties of the disk and by the operating system, record sizes vary. In a relational
database, tuples of distinct relations are generally of different sizes.
Record 0   A-102   Perryridge   400
Record 1   A-305   Round Hill   350
Record 2   A-215   Mianus       700
Record 3   A-101   Downtown     500
Record 4   A-222   Redwood      700
Record 5   A-201   Perryridge   900
Record 6   A-217   Brighton     750
Record 7   A-110   Downtown     600
Record 8   A-218   Perryridge   700
One approach to mapping the database to files is to use several files, and to store records of
only one fixed length in any given file.
As an example, let us consider a file of account records for our bank database. Each record of
this file is defined as:

type deposit = record
    account-number: char(10);
    branch-name: char(22);
    balance: real;
end
If we assume that each character occupies 1 byte and that a real occupies 8 bytes, our
account record is 40 bytes long. A simple approach is to use the first 40 bytes for the first
record, the next 40 bytes for the second record, and so on. However, there are two problems
with this simple approach:
1) It is difficult to delete a record from this structure. The space occupied by the record to
be deleted must be filled with some other record of the file, or we must have a way of
marking deleted records so that they can be ignored.
2) Unless the block size happens to be a multiple of 40, some records will cross block
boundaries; that is, part of a record will be stored in one block and part in another. It
would thus require two block accesses to read or write such a record.
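The fixed-length layout described above can be sketched in Python (an illustrative example, not part of the original material; the helper names and the tombstone convention are invented):

```python
import struct

# 40-byte fixed-length record: account-number char(10), branch-name char(22),
# balance real (8-byte double). "<" disables padding so the size is exactly 40.
REC = struct.Struct("<10s22sd")

def pack(acct, branch, balance):
    # struct pads short strings with NUL bytes up to the declared width
    return REC.pack(acct.encode(), branch.encode(), balance)

def unpack(buf):
    a, b, bal = REC.unpack(buf)
    return a.rstrip(b"\0").decode(), b.rstrip(b"\0").decode(), bal

file = bytearray()                      # stands in for one disk block
file += pack("A-102", "Perryridge", 400.0)
file += pack("A-305", "Round Hill", 350.0)

# Record i lives at byte offset 40*i, so locating it is pure arithmetic:
print(unpack(file[40:80]))              # → ('A-305', 'Round Hill', 350.0)

# Problem 1 above: deletion is handled by marking the slot with a tombstone
# byte rather than shifting every later record:
file[0:1] = b"\xff"
```

Fixed offsets make record access trivial, which is exactly why the two problems above (deletion and block-boundary crossing) are the price paid for this simplicity.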
5. What is the goal of query optimization? Why it is important?
Answer:- Queries of a database can be fast or slow, depending on many factors: the size of
the table, the amount of data requested by the query, and so on.
One of the ways a DBA can help query optimization is by "updating statistics" on a table. The
statistics of a table allow the optimizer to find the most efficient way to gather the data from the
table.
The two types of optimization goals for query performance are as follows:
• Optimizing total query time
• Optimizing user-response time
Total query time is the time it takes to return all rows to the application. Total query time is
most important for batch processing, or for queries that require all rows to be processed before
a result can be returned to the user (for example, a query that computes an aggregate over an
entire table).
User-response time is the time that it takes for the database server to return a screenful of
rows back to an interactive application. In interactive applications, only a screenful of data
can be requested at one time; for example, the user application might display only 10 rows at
a time.
Which optimization goal is more important can have an effect on the query path that the
optimizer chooses. For example, the optimizer might choose a nested-loop join instead of a
hash join to execute a query if user-response time is most important, even though a hash join
might result in a reduction in total query time.
The default behavior is for the optimizer to choose query plans that optimize the total query
time. You can instead specify optimization of user-response time, at several different levels.
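The difference between the two goals can be seen from the client side. Below is a minimal sketch using SQLite from Python; the table, row count, and "screenful" size are invented for illustration:

```python
import sqlite3

# Build a small in-memory table of 1000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

cur = conn.execute("SELECT n FROM t ORDER BY n")

# User-response time: only the first "screenful" (say, 10 rows) must arrive
# quickly for the interactive user.
first_screen = cur.fetchmany(10)

# Total query time: the batch application cares about when ALL rows are in.
rest = cur.fetchall()
print(len(first_screen), len(rest))     # → 10 990
```

A plan that streams the first rows early (e.g. a nested-loop join) favors the first measurement; a plan that is fastest overall (e.g. a hash join) favors the second, which is exactly the trade-off described above.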
6. Explain the statement that relational algebra operators can be composed. Why the ability to
compose operators is important?
Answer:- Relational algebra received little attention until the publication of E. F. Codd's
relational model of data in 1970. Codd proposed such an algebra as a basis for database
query languages. Even the query language SQL is loosely based on relational algebra,
though the operands in SQL are not exactly relations, and several useful theorems about
relational algebra do not hold in the SQL counterpart.
Relational algebra is a procedural language. It specifies the operations to be performed on
existing relations to derive result relations. Furthermore, it defines the complete scheme for
each of the result relations. The relational algebraic operations can be divided into basic set-
oriented operations and relational-oriented operations. The former are the traditional set
operations; the latter are those for performing joins, selection, projection, and division.
The basic operations are the traditional set operations: union, difference, intersection, and
Cartesian product. Three of these four basic operations (union, intersection, and difference)
require that the operand relations be union compatible. Two relations are union compatible if
they have the same arity and a one-to-one correspondence of the attributes, with corresponding
attributes defined over the same domain. The Cartesian product can be defined on any two
relations.
Two relations P and Q are said to be union compatible if both P and Q are of the same degree
n and the domains of the corresponding n attributes are identical, i.e., if
P = {P1, ..., Pn} and Q = {Q1, ..., Qn}, then
Dom(Pi) = Dom(Qi) for i = 1, 2, ..., n
where Dom(Pi) represents the domain of the attribute Pi.
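The composability of these operators can be illustrated with a small sketch, modeling relations as Python sets of tuples (the data is invented, and union compatibility is approximated here by checking arity only, not full domains):

```python
# Relations as sets of tuples; every operator returns a relation, so the
# output of one operator can be fed directly into another (composition).
def union_compatible(P, Q):
    degrees = {len(t) for t in P | Q}
    return len(degrees) <= 1            # every tuple has the same arity n

def union(P, Q):
    assert union_compatible(P, Q)
    return P | Q

def difference(P, Q):
    assert union_compatible(P, Q)
    return P - Q

def intersection(P, Q):
    assert union_compatible(P, Q)
    return P & Q

P = {("A-101", "Downtown"), ("A-102", "Perryridge")}
Q = {("A-102", "Perryridge"), ("A-215", "Mianus")}

print(intersection(P, Q))               # → {('A-102', 'Perryridge')}
# Composition: the result of one operation is the operand of the next.
print(difference(union(P, Q), intersection(P, Q)))
```

Because each operator takes relations in and produces a relation out (closure), arbitrarily complex queries can be built by nesting operators, which is why composability matters.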
7. What is an unsafe query? Give an example and explain why it is important to disallow such
queries.
Answer:- Consider the query { S | ¬(S ∈ Sailors) }. This query is syntactically correct. However, it
asks for all tuples S such that S is not in (the given instance of) Sailors. The set of such
tuples is obviously infinite in the context of infinite domains, such as the set of all integers.
This simple example illustrates an unsafe query. It is desirable to restrict relational calculus to
disallow unsafe queries.
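One standard remedy is to evaluate such queries only over the active domain, i.e. the values actually present in the database, which makes the answer finite. A sketch with invented data:

```python
# Two toy relations; the data is invented for illustration.
Sailors = {("Dustin",), ("Lubber",)}
Boats   = {("Interlake",), ("Clipper",)}

# The active domain: every constant that appears anywhere in the database.
active_domain = {v for rel in (Sailors, Boats) for t in rel for v in t}

# Domain-restricted evaluation of { S | not (S in Sailors) }: instead of
# ranging over an infinite domain, S ranges only over the active domain,
# so the complement is finite.
answer = {(v,) for v in active_domain} - Sailors
print(answer)                           # → {('Interlake',), ('Clipper',)}
```

Without the restriction, the complement of Sailors over, say, all integers or all strings would be infinite, which is precisely why unsafe queries must be disallowed.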
8. Define the term functional dependency.
Answer:- A functional dependency (FD) is a constraint between two sets of attributes in a
relation from a database.
Given a relation R, a set of attributes X in R is said to functionally determine another
attribute Y, also in R, (written X → Y) if and only if each X value is associated with precisely
one Y value. Customarily we call X the determinant set and Y the dependent attribute. Thus,
given a tuple and the values of the attributes in X, one can determine the corresponding value
of the Y attribute. For the purposes of simplicity, given that X and Y are sets of attributes in R,
X → Y denotes that X functionally determines each of the members of Y - in this case Y is
known as the dependent set. Thus, a candidate key is a minimal set of attributes that
functionally determines all of the attributes in a relation.
(Note: the "function" being discussed in "functional dependency" is the function of
identification.)
A functional dependency FD: X → Y is called trivial if Y is a subset of X.
The determination of functional dependencies is an important part of designing databases in
the relational model, and in database normalization and denormalization. The functional
dependencies, along with the attribute domains, are selected so as to generate constraints
that would exclude as much data inappropriate to the user domain from the system as
possible.
For example, suppose one is designing a system to track vehicles and the capacity of their
engines. Each vehicle has a unique vehicle identification number (VIN). One would write VIN
→ Engine Capacity because it would be inappropriate for a vehicle's engine to have more
than one capacity. (Assuming, in this case, that vehicles only have one engine.) However,
Engine Capacity → VIN is incorrect because there could be many vehicles with the same
engine capacity.
This functional dependency may suggest that the attribute Engine Capacity be placed in a
relation with candidate key VIN. However, that may not always be appropriate. For example, if
that functional dependency occurs as a result of the transitive functional dependencies VIN →
Vehicle Model and Vehicle Model → Engine Capacity then that would not result in a
normalized relation.
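The definition above can be checked mechanically against a relation instance. Below is an illustrative sketch (the function name and the vehicle data are invented):

```python
# Does X -> Y hold in this instance? Per the definition: each X-value must be
# associated with exactly one Y-value.
def holds(rows, X, Y):
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False                # same X-value, two different Y-values
    return True

vehicles = [
    {"VIN": "V1", "Model": "Astra", "EngineCapacity": 1.6},
    {"VIN": "V2", "Model": "Astra", "EngineCapacity": 1.6},
    {"VIN": "V3", "Model": "Corsa", "EngineCapacity": 1.2},
]
print(holds(vehicles, ["VIN"], ["EngineCapacity"]))    # True
print(holds(vehicles, ["EngineCapacity"], ["VIN"]))    # False: 1.6 -> V1 and V2
```

Note that a check like this only refutes a dependency from a given instance; VIN → EngineCapacity holding in one instance does not prove it holds as a design constraint.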
9. Discuss the relative advantages of centralized and distributed databases.
Answer:- Advantages of distributed database systems are:
o local autonomy (in enterprises that are already distributed)
o improved performance (since data is stored close to where it is needed, and a query
may be split over several sites and executed in parallel)
o improved reliability/availability (should one site go down)
o economics
o expandability
o shareability
Advantages of centralized database systems, by contrast, include simpler administration,
easier enforcement of security and integrity constraints, and no overhead for distributed
query processing or distributed transactions.
10. List a few requirements for multimedia data management.
Answer:- The ability to store multimedia information in digital form has spurred both the
demand for and the supply of new electronic appliances (e.g., DVD players, digital cameras,
mobile phones connected to the Web) and new applications (e.g., interactive video, digital
photo albums, electronic postcards, distance learning). The increasing production of digital
multimedia data magnifies the traditional problems of multimedia data management and
creates new problems such as content personalisation and access from mobile devices. The
major issues are in the areas of multimedia data modeling, physical storage and indexing, as
well as query processing with multimedia data. We have been working on the following forms
of multimedia data: image, audio, video, and geo-temporal metadata possibly attached to a
document. In our previous work, we showed that Galois lattices are a useful tool for accessing
image databases by browsing rather than querying them. They suffered from a performance
problem, the time complexity of building a lattice being quadratic. This has been solved thanks
to a pre-processing step using our summarization tool. Nonetheless, Galois lattices suffer from
an additional problem: all the descriptions are orthogonal, which means that the number of
incoming and outgoing edges of a node can be extremely high, hence a visualization issue for
large image databases. (This problem is somewhat alleviated by the summarization
process.)
In order to solve this second problem for large databases, we have proposed techniques
aimed at masking parts of the Galois lattice during a browsing session [41]. In other words,
we create dynamic views of the Galois lattice consisting of coherent sub-lattices obtained
either by removing nodes or by removing edges. In the case of node masking (the case
investigated so far), the algorithm exhibits a linear complexity in the number of masked nodes.
When using a lattice of clusters of images, the size of the whole lattice is "only" in , where n is
the number of images. This means that this masking technique can be used at run-time even
for large databases.
August 2010
Bachelor of Science in Information Technology (BScIT) – Semester 1/
Diploma in Information Technology (DIT) – Semester 1
BT0066 – Database Management Systems – 3 Credits
(Book ID: B0950)
Assignment Set – 2 (60 Marks)
Answer all questions 10 x 6 = 60
1. Differentiate between physical data independence and logical data independence.
Answer:-
o Physical data independence is the ability to modify the physical schema without
making it necessary to rewrite application programs, e.g., changing from unblocked to
blocked record storage, or from sequential to random-access files.
o Logical data independence is the ability to modify the conceptual schema without
making it necessary to rewrite application programs, e.g., adding a new field to a
record. An application program's view hides this change from the program.
2. Explain the three level architecture of DBMS.
Answer:-
We now discuss a conceptual framework for a DBMS. Several different frameworks
have been suggested over the last several years. For example, a framework may be
developed based on the functions that the various components of a DBMS must
provide to its users. It may also be based on different views of data that are possible
within a DBMS. We consider the latter approach.
A commonly used view of data approach is the three-level architecture suggested by
ANSI/SPARC (American National Standards Institute/Standards Planning and
Requirements Committee). ANSI/SPARC produced an interim report in 1972 followed
by a final report in 1977. The reports proposed an architectural framework for
databases. Under this approach, a database is considered as containing data about an
enterprise. The three levels of the architecture are three different views of the data:
1. External - individual user view
2. Conceptual - community user view
3. Internal - physical or storage view
The three level database architecture allows a clear separation of the information
meaning (conceptual view) from the external data representation and from the physical
data structure layout. A database system that is able to separate the three different
views of data is likely to be flexible and adaptable. This flexibility and adaptability is
data independence that we have discussed earlier.
We now briefly discuss the three different views.
The external level is the view that the individual user of the database has. This view is
often a restricted view of the database and the same database may provide a number
of different views for different classes of users. In general, the end users and even the
applications programmers are only interested in a subset of the database. For
example, a department head may only be interested in the departmental finances and
student enrolments but not the library information. The librarian would not be expected
to have any interest in the information about academic staff. The payroll office would
have no interest in student enrolments.
The conceptual view is the information model of the enterprise and contains the view of
the whole enterprise without any concern for the physical implementation. This view is
normally more stable than the other two views. In a database, it may be desirable to
change the internal view to improve performance while there has been no change in
the conceptual view of the database. The conceptual view is the overall community
view of the database and it includes all the information that is going to be represented
in the database. The conceptual view is defined by the conceptual schema which
includes definitions of each of the various types of data.
The internal view is the view about the actual physical storage of data. It tells us what
data is stored in the database and how. At least the following aspects are considered
at this level:
1. Storage allocation e.g. B-trees, hashing etc.
2. Access paths e.g. specification of primary and secondary keys, indexes and
pointers and sequencing.
3. Miscellaneous e.g. data compression and encryption techniques, optimization
of the internal structures.
Efficiency considerations are the most important at this level and the data structures
are chosen to provide an efficient database. The internal view does not deal with the
physical devices directly. Instead it views a physical device as a collection of physical
pages and allocates space in terms of logical pages.
The separation of the conceptual view from the internal view enables us to provide a
logical description of the database without the need to specify physical structures. This
is often called physical data independence. Separating the external views from the
conceptual view enables us to change the conceptual view without affecting the
external views. This separation is sometimes called logical data independence.
Assuming the three level view of the database, a number of mappings are needed to
enable the users working with one of the external views. For example, the payroll office
may have an external view of the database that consists of the following information
only:
1. Staff number, name and address.
2. Staff tax information e.g. number of dependents.
3. Staff bank information where salary is deposited.
4. Staff employment status, salary level, leave information, etc.
The conceptual view of the database may contain academic staff, general staff, casual
staff etc. A mapping will need to be created where all the staff in the different
categories are combined into one category for the payroll office. The conceptual view
would include information about each staff member's position, the date employment started,
full-time or part-time status, etc. This will need to be mapped to the salary level for the payroll
office. Also, if there is some change in the conceptual view, the external view can stay
the same if the mapping is changed.
3. Explain the distinctions among the terms primary key, candidate key, and super key.
Answer:-
• A super key is any set of attributes such that the values of the attributes (taken
together) uniquely identify one entity in the entity set.
• A candidate key is a minimal super key -- a super key with no redundant attributes. In other
words, if any one of the attributes is removed, the set of attributes that remain no longer
form a super key.
• A primary key is one of the candidate keys, designated by the database designer.
• Every primary key is also a candidate key; every candidate key is also a super key, but not
vice versa.
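These definitions can be demonstrated by brute force on a small relation instance (the student data and function names below are invented for illustration):

```python
from itertools import combinations

def is_superkey(rows, attrs):
    # A superkey's values, taken together, uniquely identify each tuple.
    keys = [tuple(r[a] for a in attrs) for r in rows]
    return len(set(keys)) == len(rows)

def candidate_keys(rows):
    attrs = list(rows[0])
    found = []
    # Try smaller attribute sets first so every key we keep is minimal.
    for n in range(1, len(attrs) + 1):
        for combo in combinations(attrs, n):
            if is_superkey(rows, combo) and \
               not any(set(k) <= set(combo) for k in found):
                found.append(combo)     # no proper subset is already a key
    return found

students = [
    {"id": 1, "email": "a@x.edu", "name": "Ann"},
    {"id": 2, "email": "b@x.edu", "name": "Ann"},
]
print(candidate_keys(students))         # → [('id',), ('email',)]
```

Here ('id', 'email', 'name') is also a superkey but not a candidate key, because attributes can be removed from it; the designer would then pick one candidate key, say ('id',), as the primary key.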
4. Explain various storage devices and their characteristics.
Answer:- Many different forms of storage, based on various natural phenomena, have been invented. So far, no practical universal storage medium exists, and all forms of storage have some drawbacks. Therefore a computer system usually contains several kinds of storage, each with an individual purpose.
A digital computer represents data using the binary numeral system. Text, numbers, pictures, audio, and nearly any other form of information can be converted into a string of bits, or binary digits, each of which has a value of 1 or 0. The most common unit of storage is the byte, equal to 8 bits. A piece of information can be handled by any computer whose storage space is large enough to accommodate the binary representation of the piece of information, or simply data. For example, using eight million bits, or about one megabyte, a typical computer could store a short novel.
Traditionally the most important part of every computer is the central processing unit (CPU, or simply a processor), because it actually operates on data, performs any calculations, and controls all the other components.
Without a significant amount of memory, a computer would merely be able to perform fixed operations and immediately output the result. It would have to be reconfigured to change its behavior. This is acceptable for devices such as desk calculators or simple digital signal processors. Von Neumann machines differ in that they have a memory in which they store their operating instructions and data. Such computers are more versatile in that they do not need to have their hardware reconfigured for each new program, but can simply be reprogrammed with new in-memory instructions; they also tend to be simpler to design, in that a relatively simple processor may keep state between successive computations to build up complex procedural results. Most modern computers are von Neumann machines.
In practice, almost all computers use a variety of memory types, organized in a storage hierarchy around the CPU, as a trade-off between performance and cost. Generally, the lower a storage is in the hierarchy, the lesser its bandwidth and the greater its access latency from the CPU. This traditional division of storage into primary, secondary, tertiary, and off-line storage is also guided by cost per bit.
5. What are the benefits of making the system catalogs relations?
Answer:- There are several advantages to storing the system catalogs as relations. Relational
system catalogs take advantage of all of the implementation and management benefits of
relational tables: efficient information storage and rich querying capabilities. The choice of
which system catalogs to maintain is left to the DBMS implementer.
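Because the catalogs are themselves relations, they can be queried with ordinary SQL. For example, SQLite (used here purely as a convenient illustration) exposes its catalog as the sqlite_master table:

```python
import sqlite3

# Create two ordinary tables in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account(number TEXT, branch TEXT, balance REAL)")
conn.execute("CREATE TABLE branch(name TEXT, city TEXT)")

# The catalog is a relation, so the same query machinery lists the schema:
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)                           # → ['account', 'branch']
```

This is the "rich querying capability" in action: no special API is needed to inspect the schema, just SQL over catalog relations.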
6. What is relational completeness? If a query language is relationally complete, can you
write any desired query using that language?
Answer:- Questions 6 and 7 are of the same type; see the answer to question 7.
7. What is relational completeness? If a query language is relationally complete, can you
write any desired query in that language?
Answer:- Codd defined the term relational completeness to refer to a language that is
complete with respect to first-order predicate calculus, apart from the restrictions he proposed.
In practice the restrictions have no adverse effect on the applicability of his relational algebra
for database purposes. Relational completeness does not, however, mean that any desired
query can be written in that language: queries involving, for example, aggregation or transitive
closure (recursion) are not expressible in a language that is merely relationally complete.
8. What is the basic purpose of 4NF?
Answer:- The purpose of 4NF, or the fourth normal form, is to remove multivalued dependencies.
This means that when a new value is added to the database, multiple other rows are not
required to maintain consistency.
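A classic example, sketched below with invented data: if a course independently determines its set of teachers and its set of books, storing all three attributes together forces one row per (teacher, book) pair, while the 4NF decomposition stores each multivalued dependency separately and loses nothing:

```python
from itertools import product

# Violates 4NF: course ->> teacher and course ->> book are independent,
# so every (teacher, book) combination must appear as its own row.
unnormalized = {
    ("DB", "Smith", "Korth"), ("DB", "Smith", "Date"),
    ("DB", "Jones", "Korth"), ("DB", "Jones", "Date"),
}

# 4NF decomposition: one relation per multivalued dependency.
course_teacher = {(c, t) for c, t, _ in unnormalized}
course_book    = {(c, b) for c, _, b in unnormalized}

# The natural join on course reconstructs the original, so the
# decomposition is lossless.
rejoined = {(c, t, b)
            for (c, t), (c2, b) in product(course_teacher, course_book)
            if c == c2}
print(rejoined == unnormalized)         # → True
```

Adding a new book to the course now means inserting one row into course_book, instead of one row per teacher in the unnormalized relation, which is exactly the consistency burden 4NF removes.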
9. How might a distributed database designed for a local area network differ from one
designed for a wide area network?
Answer:- The main difference is the range of coverage. A WAN covers the largest area: it can
span multiple cities, states, and even continents; the Internet is an example of a WAN. A LAN
is a system that links together electronic office equipment, such as computers and word
processors, and forms a network within an office or building; office networks in the same
building and home networks are examples of LANs.
A distributed database designed for a LAN can therefore assume high bandwidth, low latency,
and reliable links between sites, while one designed for a WAN must minimize communication
between sites, tolerate higher latency and link failures, and rely more heavily on replication.
10. What are the drawbacks of current commercial databases?
Answer: Existing commercial DBMSs, both small and large, have proven inadequate for these
applications. The traditional database notion of storing data in two-dimensional tables, or in flat
files, breaks down quickly in the face of the complex data structures and data types used in
today's applications. Research in modeling and processing complex data has gone in two
directions:
a. Extending the functionality of RDBMSs.
b. Developing and implementing OODBMSs based on the object-oriented programming
paradigm.