
A Methodology for Creating User Views in Database Design

VEDA C. STOREY
University of Rochester
and
ROBERT C. GOLDSTEIN
University of British Columbia

The View Creation System (VCS) is an expert system that engages a user in a dialogue about the information requirements for some application, develops an Entity-Relationship model for the user’s database view, and then converts the E-R model to a set of Fourth Normal Form relations. This paper describes the knowledge base of VCS. That is, it presents a formal methodology, capable of mechanization as a computer program, for accepting requirements from a user, identifying and resolving inconsistencies, redundancies, and ambiguities, and ultimately producing a normalized relational representation. Key aspects of the methodology are illustrated by applying VCS’s knowledge base to an actual database design task.

Categories and Subject Descriptors: H.2.1 [Database Management]: Logical Design; H.2.7 [Database Management]: Database Administration; I.2.1 [Artificial Intelligence]: Applications and Expert Systems

General Terms: Design

Additional Key Words and Phrases: View Creation System

1. INTRODUCTION

Logical database design is concerned with determining the contents of a database independent of implementation considerations. The design process usually takes as its starting point a statement of requirements in the form of a set of user views. Each view describes the database content and structure that are appropriate for a particular function that a user (or a group of users) performs. The process of designing user views relies heavily on judgment rather than mechanistic algorithms. Traditionally, it has involved an experienced database designer collecting information from users or systems analysts and then producing a view specification that is refined through an iterative process. Good database designers are both scarce and costly. Consequently, the number of users consulted and the number of design iterations are both usually fewer than would ideally be desired.

This research was supported by grants from The Imperial Order of the Daughters of the Empire, The Natural Sciences and Engineering Council of Canada, Suncor, Inc., The University of British Columbia, the IBM Program of Support for Education and Research in the Management of Information Systems, and the William E. Simon Graduate School of Business Administration, University of Rochester. Portions of this paper are adapted from V. Storey’s View Creation: An Expert System for Database Design, published by ICIT Press in 1988. © by International Center for Information Technologies, 1988. All rights reserved.

Authors’ addresses: V. C. Storey, William E. Simon Graduate School of Business Administration, University of Rochester, Rochester, NY 14627; R. C. Goldstein, Faculty of Commerce and Business Administration, University of British Columbia, 2053 Main Mall, Vancouver, BC, Canada V6T 1Y8.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1988 ACM 0362-5915/88/0900-0305 $01.50

ACM Transactions on Database Systems, Vol. 13, No. 3, September 1988, Pages 305-338.

This paper discusses a methodology that formalizes the view specification task. The objective of the formalism is to provide a procedure for developing user views that minimizes the need for scarce human expertise. The methodology has been implemented as the knowledge base of an expert system, called the View Creation System (VCS).

The following section defines user views and view modeling. Section 3 outlines the view creation methodology. The View Creation System is then discussed in Section 4. Section 5 contains concluding remarks and a discussion of future work. A partial transcript of a design session using the expert VCS is included in the Appendix.

2. VIEW MODELING

A user view can be defined as “the perception of users about what a proposed database (or an ideal database) should contain” [11]. In essence, a user view is a representation of reality relevant to a particular user or group of users for a specific purpose. The set of all views used in an organization can be taken as a specification of the required contents of that organization’s database. Currently, most methodologies for database design assume the existence of a set of view definitions and are concerned primarily with integrating these into a unified whole (e.g., [10, 15]).

The process of eliciting a user’s view of the database is called view modeling and is defined formally by Navathe and Schkolnick as “the modeling of the usage and information structure of the real world from the point of view of different users and/or applications” [13]. Navathe and Schkolnick describe the two major components of view modeling as

(1) extracting from the user or from a person in charge of application development the relevant parts of real-world information, and

(2) abstracting this information into a form that completely represents the user view so that it can be subsequently used in the design.

View representation has been addressed mainly as a by-product of data model development [2, 13]. According to Navathe and Schkolnick [13], the most pertinent work done in this area has been the Entity-Relationship (E-R) data model of Chen [4] and the Data Abstraction methodology of Smith and Smith [16]. Navathe and Schkolnick also propose their own data model, the Navathe and Schkolnick (N-S) model, as a vehicle for modeling user views.

In addition, there are two methodologies that have been developed explicitly for constructing user views. These are Bubble Charting [10] and the Interactive Specification methodology of Baldissera et al. [1]. A detailed summary of these approaches may be found in [17]. The methodology described in this paper employs ideas from the E-R model, the Data Abstraction methodology, and the Interactive Specification methodology.

2.1 View Modeling in Logical Database Design

Database design is a complex and lengthy process beginning with the determination of users’ information requirements and concluding with a physical design. User requirements are usually specified in the form of a set of views, each of which is relevant to a particular task or group of tasks. In an actual application, it is possible that more than one user might be asked to provide a view for a single task. If, as is likely, these views do not exactly coincide, the differences among them must be examined and reconciled. This is followed by a view integration process that is concerned with producing an overall database design compatible with the complete set of user views. At that point, the logical database design task is complete.

The methodology described in this paper provides a formal approach to the elicitation of user views and their representation as a set of Fourth Normal Form relations. At the end of the paper, we discuss prospects for extending the methodology to cover both the view reconciliation and view integration tasks.

3. VIEW CREATION METHODOLOGY

3.1 E-R Model

This methodology for generating user views is based on the E-R model [4], which is widely accepted as an effective approach to database design. The model employs two basic constructs: entities and relationships. An entity is a “thing” of interest in a database, for example, student. A relationship is an association among entities; for example, students take courses is an association between the entities student and course. Attributes are properties or characteristics that can be identified for both entities and relationships. For example, student-number could be an attribute of the entity student, and grade an attribute of the relationship students take courses.

3.2 Rule Set

The view creation methodology is represented as a set of rules that forms the knowledge base of the View Creation System. There are 130 major rules found in the knowledge base, many of which contain a number of subrules. Altogether, there are approximately 500 rules, with the exact number changing slowly but continuously as the methodology is used and refined. The rule set is a mixture of both procedural and production rules.

3.2.1 Procedural Rules. Procedural rules dictate the order in which various tasks are performed. The first such rule controls the overall procedure for the creation of a user view:

First: Identify entities, their attributes, and candidate keys.
Then: Determine relationships, relationship attributes, and mapping ratios.
Then: Detect and resolve ambiguities, redundancies, and inconsistencies.
Then: Select primary keys for entities.
Then: Represent entities and relationships as relations.
Then: Identify and resolve partial and transitive functional dependencies.

Other procedural rules are used to sequence functions within each of these major sections. For example, when an entity is obtained, the procedural rule governing how it should be treated is as follows:

First: Verify that the entity name is unique.
Then: Elicit entity attributes.
Then: Convert repeating attributes to entities.
Then: Convert multivalued attributes to entities.
Then: Obtain candidate keys.
Then: Add the entity to the database specification.

3.2.2 Production Rules. Production rules are of the form IF-THEN. They are interpreted as IF a certain condition holds, THEN carry out a particular action. These rules indicate what should be done for each condition that could arise in attempting to achieve the subgoals specified in the procedural rules. As an example, the following production rule deals with how a certain type of binary relationship should be represented:

IF: a relationship is of the form A is-a B
THEN: represent the relationship by adding the key of entity B as a foreign key of A.
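The production rule above can be sketched in code. The following is only an illustration under stated assumptions: the `Entity` and `Relationship` classes and the function `is_a_rule` are hypothetical names introduced here, not part of VCS, and the paper does not say how VCS represents rules internally.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity record: a name, a key, and any foreign keys added so far."""
    name: str
    key: list[str]
    foreign_keys: list[str] = field(default_factory=list)


@dataclass
class Relationship:
    """A binary relationship of the form A VP B."""
    a: str
    verb_phrase: str
    b: str


def is_a_rule(rel: Relationship, entities: dict[str, Entity]) -> bool:
    """IF the relationship is 'A is-a B' THEN add the key of B as a foreign key of A.

    Returns True when the rule fired and False when its condition did not hold.
    """
    if rel.verb_phrase != "is-a":
        return False
    a, b = entities[rel.a], entities[rel.b]
    for attribute in b.key:
        if attribute not in a.foreign_keys:
            a.foreign_keys.append(attribute)
    return True
```

A rule of this shape keeps the condition (the IF part) and the action (the THEN part) in one place, so a driver loop could try each rule against each relationship in turn.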

3.3 Sources of Knowledge

The knowledge incorporated in the methodology was obtained from various sources. These are listed below along with examples of the types of knowledge they provided:

(1) Database design theory:

-procedures for converting an E-R model into a relational one,

-properties of is-a relationships,

-alternative ways of obtaining mapping ratios and their function in a design, and

-candidate keys and their use as primary or foreign keys in a database design.

(2) E-R model:

-a set of constructs (entities, relationships, and attributes) for modeling a user’s view, and

-a top-down approach to view modeling.

(3) Normalization theory:

-a means of determining whether or not a set of relations avoids certain anomalies, and

-rules for identifying and resolving violations of normalization principles.

(4) Expert database designers:

-heuristics, and

-suggestions for improvements to the user interface.

(5) Colleagues and other people knowledgeable in database design:

-suggestions for improvements to the user interface,

-rules for distinguishing entities and attributes, and

-rules for ascertaining whether an entity is a subset or superset entity in a relationship.

(6) General knowledge:

-attribute names that are often used to identify entities.

(7) Experience using and testing the system:

-rules for identifying missing information,

-rules for detecting inconsistencies,

-system default values,

-rules for allowing the key of one entity to be used in the identification of another, and

-improvements to the user interface.

The remainder of this section describes the step-by-step procedure of the view creation methodology. Refer to the Appendix for a partial transcript illustrating this methodology as implemented in VCS.

3.4 Entities

Since entities are the fundamental units in the E-R model, the first step in the procedure is to obtain a list of entities. The example used throughout this paper is a library circulation function where the objective is to keep track of where books are at any given point in time. For this example an initial set of entities might be borrower, book, volume,¹ etc.

3.4.1 Entity Attributes. The attributes appropriate for each entity are identified as each entity is obtained. Although it would be possible to postpone the determination of attributes until a later point, there are advantages to doing it as soon as each entity is known. First, it forces one to think carefully about the application. Knowledge of the attributes may also aid in the detection of consistency problems. The occurrence of particular attributes might imply the need for certain relationships. Examples are given below:

(1) Multivalued attributes. An attribute, Att, that can have more than one value for a given instance of an entity, E1, indicates the existence of a relationship between E1 and the entity E2 identified by Att.

E.g.: book: [catalog-no, title, volume, . . .]

¹ In all the examples in this paper, volume is considered to be not one part of a multipart work, as a volume of an encyclopedia, but rather a physical instance of a book. Libraries often have many copies of popular books, and it is essential in this application to distinguish between the conceptual “book” and the physical “volume.”



Since each book can have more than one volume, volume is a multivalued attribute. This indicates the existence of a relationship between book and volume. If the volume entity is not already known, the need for it is hereby established.

(2) Attribute name is entity name. Any attribute that is the name of another entity deserves special attention. In this situation, the name of one entity, E2, appears as an attribute of another entity, E1. If E2 is needed as (part of) a unique identifier of E1, then the attribute should be retained; otherwise, this attribute implies the existence of a relationship between E1 and E2.

E.g.: branch: [branch-name, library, address]

If library is needed to uniquely identify branch, then the attribute library should later be replaced by its primary key. If library is not needed in the identification of branch, a relationship between the two entities is implied.

(3) Repeating attributes. If an entity, E, has attributes of the form Att1, Att2, Att3, . . . , Attn, there is a presumption that these attributes represent instances of some entity, Att, rather than properties or characteristics of E. A relationship between Att and E is also implied.

E.g.: borrower: [name, address, book1, book2, book3]

Having book1, book2, and book3 as attributes of borrower suggests the need for a book entity and a relationship between book and borrower.
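As an illustration of the repeating-attribute case, the sketch below groups attribute names that share a stem and differ only by a trailing number. The function name `find_repeating_attributes` and the regular-expression test are assumptions made for this example; the paper does not specify how VCS recognizes such names.

```python
import re
from collections import defaultdict


def find_repeating_attributes(attributes: list[str]) -> dict[str, list[str]]:
    """Return {stem: [attributes]} for stems that occur with two or more numeric suffixes.

    An attribute such as 'book2' is split into the stem 'book' and the number '2'.
    Stems that occur only once are not reported, since a single numbered
    attribute is weak evidence of repetition (an assumption of this sketch).
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for attribute in attributes:
        match = re.fullmatch(r"(.*?\D)(\d+)", attribute)
        if match:
            stem = match.group(1).rstrip("-_")
            groups[stem].append(attribute)
    return {stem: names for stem, names in groups.items() if len(names) > 1}
```

Applied to the borrower example, the stem book would be reported as a candidate entity, after which a relationship between book and borrower would be requested from the user.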

3.4.2 Candidate Keys. Each entity occurrence in a database must be uniquely identifiable. A candidate key is an attribute or a combination of attributes that uniquely identifies instances of an entity. As each entity is identified, a set of candidate keys should be obtained. Eventually, one of these will be selected as a primary key for the entity.

E.g. (Key attributes are in UPPER CASE):

borrower: [NUMBER, name, address, phone]
borrower: [NAME, ADDRESS, number, phone]

Borrower has two candidate keys: (1) [NUMBER] and (2) [NAME, ADDRESS].

Key indicator attributes. Certain attributes are commonly used in the identification of entities. These are attributes such as name, number, id, and code. Whenever such attributes occur, they should be considered as possible candidate keys.

Generated identifiers. If an entity, E, is identified that does not have any attributes, then a unique identifier must be generated. This is done by concatenating the entity name, E, with the suffix id to obtain a key, E-id.

3.4.3 Missing Entities. Once an initial set of entities and attributes has been identified, the attributes should be scanned for indications of “missing entities.” For example, suppose the entity book has the following attributes:

book: [CATALOG-NO, title, author-id, . . .]


The attribute author-id is of the form X-suffix, where the suffix (here, id) is a key indicator attribute. This suggests that author might also be an entity in the database. If so, a relationship is needed between book and author.
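The checks for key indicator attributes, generated identifiers, and missing entities can be sketched as follows. The list of suffixes comes from the text (name, number, id, code); the function names and the hyphen-splitting convention are assumptions for illustration only.

```python
# Key indicator attributes named in the text.
KEY_INDICATORS = ("name", "number", "id", "code")


def generated_key(entity: str) -> str:
    """Build an identifier of the form E-id for an entity that has no attributes."""
    return f"{entity}-id"


def suggest_missing_entities(entity: str, attributes: list[str], known_entities: set[str]) -> list[str]:
    """Return prefixes X of attributes shaped like X-suffix that are not yet known entities.

    An attribute qualifies when the part after its last hyphen is a key
    indicator attribute and the part before it names neither the entity
    itself nor an entity already in the design.
    """
    suggestions = []
    for attribute in attributes:
        prefix, separator, suffix = attribute.lower().rpartition("-")
        if not separator or suffix not in KEY_INDICATORS:
            continue
        if prefix != entity and prefix not in known_entities and prefix not in suggestions:
            suggestions.append(prefix)
    return suggestions
```

Each suggestion would still have to be confirmed by the user, since a name such as author-id only indicates that author might be an entity.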

3.5 Relationships

Following Baldissera et al. [1], relationships are restricted to binary ones of the form A VP B, where VP stands for verb phrase.² Examples are

-borrowers borrow volumes,

-libraries have branches, and

-library-director is-a person.

The procedural rule for dealing with each relationship A VP B requires that (1) A and B values be appropriately identified as entities or attributes; (2) mapping ratios be determined; and (3) if appropriate, corresponding relationship attributes be obtained. Before discussing each of these steps, it is important to note that the use of certain verb phrases enables one to make inferences about the semantics of the application.

3.5.1 Semantics. Two special verb phrases that imply specific semantic relationships are is-a and have/has. The verb phrase have/has is subject to multiple interpretations. Two of these interpretations, instance-of and component-of, as well as the is-a verb phrase, are important for determining primary keys, “inheriting” attributes, and detecting inconsistencies, as will be discussed later.

Is-a relationships. The is-a verb phrase corresponds directly to Smith and Smith’s [16] concept of generalization. A relationship A is-a B implies that one should be able to attribute to A all the properties of B, but not vice versa. For each occurrence of B, there may or may not be a corresponding occurrence of A; for each occurrence of A, there is precisely one occurrence of B.

E.g.: Relationship: librarian is-a person

A person may or may not be a librarian, but every librarian is a person. All the attributes of person should be attributable to librarian.

Instance-of verb phrases. The instance-of verb phrase is similar to the is-a verb phrase in the sense that, for A instance-of B, one should be able to attribute properties of B to A (but not vice versa). The instance-of verb phrase differs from the is-a verb phrase, however, in that it allows for many occurrences of A for each B. For each occurrence of A, there is precisely one occurrence of B.

E.g.: Relationship: volume instance-of book

For each book there can be one to many volumes; for each volume there is one and only one book. All the attributes of book should be attributable to volume.

² With some dexterity on the part of the user, the methodology also permits the modeling of nonbinary relationships. For example, the relationship students take courses in a given semester and receive a corresponding grade could be modeled as the entity: student-grade: [STUDENT, COURSE, SEMESTER, grade], where the entities student, course, and semester would eventually be replaced by their key attributes.



3.5.2 Unidentified As and Bs. The A and B in a relationship, A VP B, are normally assumed to be entities. For certain verb phrases, however, it is possible that either or both can be attributes. If a relationship is specified for which A and/or B is unknown, it is necessary to classify them appropriately. A series of rules is provided for dealing with such situations based on

-the semantics of the verb phrase, and

-existing information about A and B.

For example, suppose a relationship A is-a B occurs in which B is known to be an entity but A is unidentified. Since B is a generalization of A (by definition of is-a [3]), A must also be an entity.

3.5.3 Mapping Ratios. Mapping ratios describe the minimum and maximum number of A values that can occur for each B value in a relationship, A VP B, and vice versa. Tsichritzis and Lochovsky [18] refer to this type of mapping ratio as the minimum and maximum cardinality of the mapping. For example, if each value of A can have from 0 to many corresponding values of B, the min/max cardinalities of A are (0, N). Similarly, if each value of B has one and only one corresponding value of A, the min/max cardinalities of B are (1, 1).

3.5.4 Infer Min/Max Cardinalities. In some cases, it might be possible to infer some or all of the min/max cardinalities by examining (1) the verb phrase and (2) the form of the entities (singular or plural) as they appear in a relationship.

(1) Is-a verb phrases. A relationship A is-a B is interpreted as an association between a specific A and a generic B; that is, A is a subset of the superset B [3]. Each value of A, therefore, can have one and only one corresponding value of B, so the min/max cardinalities of A are (1, 1). For each value of B, there may or may not be a corresponding value of A. Thus, the min/max cardinalities of B are (0, 1).

E.g.: Relationship: librarian is-a person

The min/max cardinalities for librarian are (1, 1) because each librarian corresponds to one and only one person. The min/max cardinalities for person are (0, 1) because a person may or may not be a librarian.

(2) Entities in singular or plural form. Inferences about mapping ratios can also be made by examining the form (singular or plural) of the entities appearing in a relationship. For example, if, in the relationship A VP B, A and B are both singular, then there is one and only one B for each A. The min/max cardinalities of A, therefore, must be (1, 1). The inverse, however, is not necessarily true.

E.g.: Relationship: book has publisher

Using the singular form for both book and publisher implies that a book has one and only one publisher, so the min/max cardinalities for book are (1, 1). As can be seen from this example, the inverse is not implied because, obviously, a publisher is not restricted to publishing only one book.


As another example, in the relationship book has authors, the singular book and plural authors imply there are multiple authors for a single book. Therefore, the min/max cardinalities for book are (1, N).
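The two inference rules of this section can be sketched as a single function. This is a minimal illustration, not the VCS implementation: the function name, the use of None for "cannot be inferred", and especially the naive plural test (a trailing "s") are assumptions of the example. The text gives no rule for a plural A, so that case is left uninferred here.

```python
N = "N"  # stands for "many" in a (min, max) cardinality pair


def infer_cardinalities(a, verb_phrase, b, is_plural=lambda word: word.endswith("s")):
    """Return (cardinalities of A, cardinalities of B) for a relationship A VP B.

    Either element is None when nothing can be inferred and the user must be asked.
    """
    # Rule (1): A is-a B gives A (1, 1) and B (0, 1).
    if verb_phrase in ("is-a", "is-an"):
        return (1, 1), (0, 1)
    # Rule (2): a singular A implies something about A only; the inverse is not implied.
    if not is_plural(a):
        if is_plural(b):
            return (1, N), None   # e.g., book has authors
        return (1, 1), None       # e.g., book has publisher
    return None, None
```

A real system would need a better plural test than a trailing "s", since a singular name such as address would be misclassified by this sketch.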

3.5.5 Relationship Attributes. As is the case with entities, relationships can have attributes: properties or characteristics of the relationship as a whole that are of interest to the user. Unlike entities, which usually have corresponding attributes, however, only some types of relationships have attributes. These relationships are identified by examining the min/max cardinalities of A and B. It will be shown that relationship attributes only exist when the min/max cardinalities of A are (0, N)³ or (1, N), and the min/max cardinalities of B are not (1, 1) or vice versa.⁴

Consider a relationship A VP B where the min/max cardinalities of A are (0, N) or (1, N) and the min/max cardinalities of B are (1, 1). It must be shown that a relationship attribute cannot exist in such a situation. Suppose such a relationship attribute, $R_{att}$, does exist. $R_{att}$ is a function of the relationship A VP B and hence a function, $f$, of the entities A and B. Formally this is represented as

$$R_{att} = f(A, B).$$

A is a function of B, however, since there is one and only one value of A for each value of B; write $A = f_1(B)$. The above equation can therefore be rewritten as

$$R_{att} = f(f_1(B), B)$$

or

$$R_{att} = g(B).$$

Thus, if a relationship attribute did exist, for A(0, N) or A(1, N) and B(1, 1), it would be a function of the entity B only and, hence, appear as an attribute of B. Analogously, it can be shown that, for A(1, 1) and B(0, N) or B(1, N), a relationship attribute would be a function of the entity A and would thus appear as an attribute of A.

Now consider a relationship A VP B where A (1, 1) and B (1, 1). This case is easily shown to be an extension of the above. An apparent relationship attribute can, in this case, be expressed as an attribute of either the entity A or the entity B.
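The condition derived in this section reduces to a small test on the two cardinality pairs. The sketch below is an illustration only; the function name and the encoding of "many" as the string "N" are assumptions.

```python
def may_have_relationship_attributes(cards_a, cards_b):
    """Apply the rule of Section 3.5.5 to two (min, max) cardinality pairs.

    Relationship attributes can exist only when one side is (0, N) or (1, N)
    and the other side is not (1, 1).
    """
    def is_many(cards):
        return cards[1] == "N"

    def is_exactly_one(cards):
        return cards == (1, 1)

    return (is_many(cards_a) and not is_exactly_one(cards_b)) or (
        is_many(cards_b) and not is_exactly_one(cards_a)
    )
```

When the test fails, any apparent relationship attribute would be recorded as an attribute of one of the two entities instead, as the argument above shows.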

3.5.6 Missing Relationships. There are various ways to identify missing relationships; among them are the following:

(1) It would be unlikely to have an entity that does not participate in any relationship. The appearance of such an entity suggests a missing relationship.

³ N can mean one or many, depending on the situation. Therefore, (0, 1) may sometimes be represented as (0, N).

⁴ The following proof was suggested by Yair Wand.



(2) If an entity, E1, had a multivalued attribute that was converted to another entity, E2, then a relationship should exist between E1 and E2.

(3) If the name of one entity appears as an attribute of another, the existence of a relationship between the two entities is implied.

(4) If an entity, E, originally had an attribute of the form X-suffix and X became a new entity, then a relationship should exist between E and X.

(5) If an entity, E, originally had repeating attributes of the form X1, X2, X3, . . . , Xn and X became a new entity, then a relationship should exist between E and X.

The last four cases involve the appearance of an attribute of one entity that refers to some other entity. Such an attribute implicitly indicates the existence of a relationship between the two entities.
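The first case, an entity that takes part in no relationship, is the simplest to check mechanically. The sketch below assumes relationships are held as (A, verb phrase, B) triples; that representation and the function name are hypothetical.

```python
def entities_without_relationships(entities, relationships):
    """Return the entities that appear in no relationship, in their original order.

    relationships is a list of (A, verb_phrase, B) triples.
    """
    used = {name for a, _verb_phrase, b in relationships for name in (a, b)}
    return [entity for entity in entities if entity not in used]
```

Each entity returned would prompt a question to the user about a possibly missing relationship.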

3.6 Ambiguities, Redundancies, and Inconsistencies

The previous steps concentrate on eliciting an application’s information requirements and modeling them using the E-R formalism. The model must now be examined for undesirable properties. The following sections on have/has relationships, inherited attributes, and synonyms indicate how ambiguities, inconsistencies, and redundancies can be detected.

3.6.1 Have/Has Relationships. Relationships employing a verb phrase that is some form of have/has are inherently ambiguous. At least four interpretations are possible:

(1) A possesses B; for example, library has books;

(2) B component-of A; for example, book has chapters;

(3) B instance-of/example-of A; for example, book has volumes; and

(4) B associated-with A in some other way; for example, books have authors.

Have/has relationships with the instance-of interpretation are of particular interest because they assist in selecting primary keys and detecting inconsistencies in the user’s input (see Sections 3.6.2 and 3.6.3). Component-of is useful in ensuring that primary keys are complete (see Section 3.7). The other two interpretations (possession and association) are employed simply to reflect more of the semantics of the application than the verbs have and has.

3.6.2 Hierarchical Relationships. Both is-a and instance-of verb phrases indicate the existence of hierarchical relationships. Other relationships involving entities that appear in a hierarchical relationship must be examined to ensure they are specified at the most appropriate hierarchical level. To illustrate, given the hierarchical relationship volume instance-of book, it is necessary to examine other relationships in which either volume or book appears. For example, the relationship borrowers borrow books would need to be changed to borrowers borrow volumes because it is physical “volumes” that can be borrowed, not conceptual “books” (refer to Footnote 1). On the other hand, the relationship authors write books, when examined, would be determined to be at the correct level.

3.6.3 Inherited Attributes. The analysis described in the previous section is concerned with ensuring that relationships involving entities that appear in hierarchical relationships are specified at the correct level. This section discusses a similar analysis for attributes of such entities. In a relationship A is-a/instance-of B, all attributes of the superset entity B should be attributable to the subset entity A. An attribute of the superset entity that cannot validly be applied to the subset entity indicates an inconsistency: Either

-the hierarchical relationship is incorrectly specified, or

-the attribute in question should not appear in the definition of the superset entity.

To illustrate, consider the hierarchical relationship librarian is-an employee, and suppose that union is an attribute of employee, but that librarians do not belong to unions. An inconsistency exists that can be corrected by (1) creating two new entities, manager and worker; (2) making union an attribute of worker; and (3) replacing the original relationship, librarian is-an employee, with librarian is-a manager, manager is-an employee, and worker is-an employee. The effect is to interpose an additional hierarchical level to distinguish the two categories of employee.

Note that any attribute that appears in both the subset and superset entities should be deleted from the subset entity to avoid redundancy.

E.g.: Relationship: employee is-a person

Person: [PERSON-NAME, ADDRESS, person-birthday, . . .]
Employee: [EMPLOYEE-NUMBER, employee-birthday, person-name, address, . . .]

Becomes (“employee-birthday” and “address” are deleted from “employee”):

Person: [PERSON-NAME, ADDRESS, person-birthday, . . .]
Employee: [EMPLOYEE-NUMBER, person-name]

3.6.4 Synonyms. Synonyms in either entities or relationships represent redundant information that should be removed from the design. One way to detect synonyms is to examine the format of the relationships. Consider, for example, relationships of the form A1 VP B, A2 VP B, . . . , An VP B. A1, A2, . . . , An are candidates to be either synonyms or related in some way that is not already known.

The Ai are considered pairwise to determine whether they are synonyms or if one is a subset of the other. If synonyms are found, one term is selected for further use. If one is a subset of the other, an is-a relationship is implied.

E.g.: Relationships:

student borrows volumes
borrower borrows volumes

There are four possibilities for the appropriate relationship between student and borrower:

(1) They are synonyms, in which case one of the terms is selected to replace the other throughout the design and any resulting redundancies are eliminated.

(2) Students are a subset of borrowers, implying that the relationship student is-a borrower should be added and the relationship student borrows volumes deleted.



(3) Borrowers are a subset of students, implying the relationship borrower is-a student should be added and the relationship borrower borrows volumes deleted.

(4) None of the above are correct, so no modification is necessary.
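The detection step for synonym candidates can be sketched as follows: relationships are grouped by their verb phrase and B, and every pair of distinct A values within a group is reported. The triple representation and the function name are assumptions for illustration. Deciding which of the four possibilities applies to a pair still requires asking the user.

```python
from collections import defaultdict
from itertools import combinations


def synonym_candidates(relationships):
    """Return pairs (Ai, Aj) that appear in relationships Ai VP B and Aj VP B.

    relationships is a list of (A, verb_phrase, B) triples. Pairs are returned
    in the order in which the relationships were supplied.
    """
    sources_by_target = defaultdict(list)
    for a, verb_phrase, b in relationships:
        sources = sources_by_target[(verb_phrase, b)]
        if a not in sources:
            sources.append(a)
    pairs = []
    for sources in sources_by_target.values():
        pairs.extend(combinations(sources, 2))
    return pairs
```

This sketch matches verb phrases literally, so borrow and borrows would land in different groups unless the verb phrases are normalized first.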

3.7 Primary Keys

Each entity occurrence in a database must be uniquely identifiable. Any of the candidate keys is, by definition, an acceptable identifier. If there is more than one candidate key, one of them must be selected as the primary key. The rules for selecting entity primary keys take into account the semantics of relationships in which the entity appears and attempt to maximize the efficiency of join operations.

3.7.1 Unique Attribute Names. Before primary keys are chosen, any attribute names that either (1) are “key indicator attributes” (e.g., name, number, code, and id) or (2) exist for more than one entity are prefixed by their entity names in order to produce a set of unique attribute names. This ensures that the resulting primary keys will all be unique, which is especially important for entities involved in hierarchical relationships.

3.7.2 Rules for the Selection of Primary Keys. The rules for choosing primary keys are heuristic. They concentrate first on obtaining the simplest possible primary key, that is, the alternative that consists of the smallest number of attributes. When this criterion does not result in a unique choice, the candidate key that appears most often as a candidate or primary key for other entities is selected. The latter criterion aims at enhancing retrieval performance by increasing the efficiency of join operations that might be required during use of the database. Finally, if neither criterion produces a unique choice, the candidate key that was provided first is chosen, as it is probably the most natural one for the user.
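The three criteria can be combined into a single ordering, as in the sketch below. It is an illustration under assumptions: candidate keys are represented as tuples of attribute names, `other_entity_keys` is a hypothetical flat list of the keys of the other entities, and two keys count as the same when they contain the same attributes.

```python
def select_primary_key(candidate_keys, other_entity_keys):
    """Pick a primary key from candidate_keys, which is listed in the order provided.

    Criteria, in order: fewest attributes; most frequent use as a key of
    other entities; earliest position in candidate_keys.
    """
    def usage(key):
        return sum(1 for other in other_entity_keys if set(other) == set(key))

    # min() returns the first minimal element, which gives the
    # "provided first" tie-break without an explicit third criterion.
    return min(candidate_keys, key=lambda key: (len(key), -usage(key)))
```

For the borrower example of Section 3.4.2, the single-attribute key [NUMBER] would be preferred over [NAME, ADDRESS] by the first criterion alone.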

Three classes of entities must be considered: (1) those that occur in is-a hierarchies, (2) those that occur in instance-of hierarchies, and (3) all others.

Is-a relationships. An is-a hierarchy occurs when there are relationships of the form . . . A is-a B, B is-a C, and so forth. Since C is the generic term for B, the key of C must be a suitable candidate key for the entity B (and, for that matter, for the entity A as well). (E.g., if manager is-an employee and employee is-a person, the key of person can serve as an identifier of both employee and manager.) Therefore, the primary key of the highest entity in the hierarchy is chosen first. This key is then “inherited” as a candidate key by the entities at the next lower level in the hierarchy. This process is applied recursively until primary keys have been selected for all entities in the hierarchy.

E.g.: 1) employee is-a person 2) librarian is-an employee.

Original set of candidate keys:
Person: [PERSON-NAME, ADDRESS, . . . ]
Employee: [EMPLOYEE-NUMBER, . . . ]
Librarian: [JOB-TITLE, BRANCH, librarian-name, . . . ]



The primary key [PERSON-NAME, ADDRESS] is adopted for the entity ‘person’ since it is the only alternative. It is then added as a candidate key for ‘employee’:

Person: [PERSON-NAME, ADDRESS, . . . ]
Employee: [EMPLOYEE-NUMBER, person-name, address, . . . ]
Employee: [PERSON-NAME, ADDRESS, employee-number, . . . ]
Librarian: [JOB-TITLE, BRANCH, librarian-name, . . . ]

The primary key [EMPLOYEE-NUMBER] is chosen for ‘employee’, and added as a candidate key for ‘librarian’:

Person: [PERSON-NAME, ADDRESS, . . . ]
Employee: [EMPLOYEE-NUMBER, person-name, address, . . . ]
Librarian: [JOB-TITLE, BRANCH, librarian-name, employee-number, . . . ]
Librarian: [EMPLOYEE-NUMBER, librarian-name, branch, job-title, . . . ]

The primary key for ‘librarian’ is determined:

Person: [PERSON-NAME, ADDRESS, . . . ]
Employee: [EMPLOYEE-NUMBER, person-name, address, . . . ]
Librarian: [EMPLOYEE-NUMBER, librarian-name, branch, job-title, . . . ]

At this point, for a subset entity that adopts the primary key of its superset entity, the subset key is prefixed by its entity name. This is done to preserve primary key uniqueness, which facilitates the representation of relationships between subset and superset entities.

E.g.: Relationship: librarian is-an employee

Employee: [EMPLOYEE-NUMBER, person-name, address]
Librarian: [EMPLOYEE-NUMBER, branch, job-title, . . . ]

Becomes:
Employee: [EMPLOYEE-NUMBER, person-name, address]
Librarian: [LIBRARIAN-EMPLOYEE-NUMBER, branch, job-title, . . . ]
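The top-down treatment of an is-a hierarchy can be sketched in Python as follows. This is an illustration under stated assumptions, not the VCS rule set: the function name `assign_isa_keys` is hypothetical, the hierarchy is assumed to be a single chain listed from the most general entity downwards, and key selection is reduced to "fewest attributes, then first provided".

```python
def assign_isa_keys(hierarchy, candidate_keys):
    """Select primary keys along a single is-a chain.

    `hierarchy` lists entity names from the highest entity downwards;
    `candidate_keys` maps each entity to its own candidate keys (tuples).
    """
    primary = {}
    inherited = None  # primary key chosen one level up, before any prefixing
    for entity in hierarchy:
        candidates = list(candidate_keys[entity])
        # The superset's primary key becomes an extra candidate key.
        if inherited is not None and inherited not in candidates:
            candidates.append(inherited)
        # Simplified selection: fewest attributes; min() keeps the first on ties.
        chosen = min(candidates, key=len)
        if chosen == inherited:
            # The subset adopted the superset key: prefix it with the entity name.
            primary[entity] = tuple(f"{entity}-{attr}" for attr in chosen)
        else:
            primary[entity] = chosen
        inherited = chosen
    return primary


keys = assign_isa_keys(
    ["person", "employee", "librarian"],
    {
        "person": [("person-name", "address")],
        "employee": [("employee-number",)],
        "librarian": [("job-title", "branch")],
    },
)
```

With the library example, ‘employee’ keeps its own shorter key, while ‘librarian’ adopts the inherited key and receives the prefixed form.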

Instance-of relationships. Instance-of hierarchies are similar to is-a hierarchies, for example, . . . A instance-of B, B instance-of C, and so forth. In this case, however, the key of the superset entity does not uniquely identify occurrences of the subset entity because there can be many subset entity occurrences for each superset entity occurrence, for example, volume instance-of book. Rather, the key of the subset entity (e.g., volume) must include the key of the superset entity (e.g., book). This is because the key of the subset entity might only be unique within a particular occurrence of the superset entity. Therefore, the primary key of the superset entity is concatenated to each candidate key of the subset entity, if it is not already there, before the latter’s primary key is selected.5

As in the case of is-a hierarchies, instance-of hierarchies are processed from the entity at the highest level downwards.

E.g.: Relationship: volume instance-of book

Candidate Keys:
book: [CATALOG-NO, book-title, author, publisher]
volume: [COPY-NO, volume-title]

5 One could conceive of a situation where the subset entity is given a key that is unique in itself without reference to the superset entity. In such a case, however, either the subset entity key must include the superset entity key or something that is functionally related to it. Such a disguised representation of the relationship seems likely to give rise to normalization-related difficulties.



There is only one candidate key for “book” so it becomes the primary key.

book: [CATALOG-NO, book-title, author, publisher]

Note that “COPY-NO” only uniquely identifies a volume for a particular book. Therefore, the candidate key, and hence the primary key, for “volume” becomes:

volume: [CATALOG-NO, COPY-NO, volume-title]

The attribute “volume-title” is deleted from “volume” because it can be inherited from “book”.

volume: [CATALOG-NO, COPY-NO]
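The concatenation rule for instance-of relationships can be sketched as a small Python function. The name `extend_instance_keys` and the tuple representation of keys are assumptions made for this illustration.

```python
def extend_instance_keys(superset_primary, subset_candidates):
    """Prepend the superset's primary key to each subset candidate key.

    Attributes already present in a candidate key are not added again.
    """
    extended = []
    for key in subset_candidates:
        missing = tuple(attr for attr in superset_primary if attr not in key)
        extended.append(missing + tuple(key))
    return extended


# volume instance-of book: COPY-NO is unique only within one book.
volume_keys = extend_instance_keys(("catalog-no",), [("copy-no",)])

# A candidate key that already contains the superset key is left unchanged
# (the attribute "shelf" is hypothetical).
unchanged = extend_instance_keys(("catalog-no",), [("catalog-no", "shelf")])
```

The primary key of the subset entity would then be selected from the extended candidate keys.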

3.7.3 Entities Requiring Other Entities for Identification. In some cases, the key of one entity must include the key of another entity in order to guarantee uniqueness. For example, if branches of a library are allowed to assign card numbers independently of each other, then the key of library-card must include the key of the branch that issued it.

3.7.4 Component-of Relationships. In a relationship A component-of B, the key of B might be needed in order to identify uniquely an instance of A. For example, if branch names are unique only within a library, then the key of branch must include the key of library.

E.g.: Relationship: Branch component-of Library

Primary Keys:
Library: [LIBRARY-ID, library-name, library-address]
Branch: [BRANCH-NAME, branch-address]

The key of LIBRARY is concatenated to the key of branch:

Branch: [LIBRARY-ID, BRANCH-NAME, branch-address]

3.8 Entity Representation

Each entity is represented by a separate entity relation with the key and nonkey attributes of the relation corresponding directly to those of the entity. The relation thus constructed may not be in its final form. Modifications discussed below might be required to ensure adherence to normalization principles.

3.9 Relationship Representation

There are two alternative representations for each relationship A VP B:

(1) A relation can be constructed with relation name A- VP-B, and relation key equal to the concatenation of the keys of the A and B entities.

(2) The key attributes of one entity can be added as nonkey attributes of the other (the foreign key approach).

The choice of representation depends on the mapping ratios and possibly the anticipated usage.

As will become apparent in the discussion that follows, the only cardinalities that are relevant to this decision are 0, 1, and N. The distinguishing factor in determining how a relationship should be represented is whether or not one (or both) of the involved entities has min/max cardinalities of (1, 1).

3.9.1 Relationships Involving (1, 1) Cardinalities. There are two cases to consider:

(1) Only one of the involved entities has (1, 1) cardinalities.

(2) Both entities have (1, 1) cardinalities.

ACM Transactions on Database Systems, Vol. 13, No. 3, September 1988.


Case 1. In a relationship A VP B, suppose the min/max cardinalities are (1, 1) for A, but not for B. Therefore, there is precisely one occurrence of the relationship for each occurrence of entity A.

If we adopt the foreign key approach to representing such a relationship, the storage used would be that required to add a foreign key attribute (i.e., the key of B) to each occurrence of entity A. Database processing that requires locating the B corresponding to a given A (i.e., the B of A type of query) would be efficiently processed through the A relation. The inverse query (i.e., the A of B) would also be efficiently handled if the A relation has an index on the B foreign key field.

The alternative is to construct a new relation to represent this relationship. The length of the new relation would be equal to that of the A relation because of the (1, 1) cardinality of A. The width of the new relation would be equal to the sum of the sizes of the A and B keys. Compared to the foreign key approach, this solution requires additional storage equal to the size of the A key multiplied by the length of the A relation. Retrievals of either type (the B of A or the A of B) should be equally efficient because both entity keys occur in the key of the new relation. Thus, the indexes needed to avoid exhaustive searching should exist.

The two alternatives are equally appealing in terms of their retrieval performance, but the foreign key option is preferred because it requires significantly less storage.
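The storage comparison for Case 1 can be checked with a short calculation. In the Python sketch below, the function names and the figures (10,000 occurrences of A, an 8-byte A key, a 4-byte B key) are hypothetical values chosen only to make the arithmetic concrete.

```python
def foreign_key_storage(n_a, size_b_key):
    """Extra storage when the key of B is added to every occurrence of A."""
    return n_a * size_b_key


def separate_relation_storage(n_a, size_a_key, size_b_key):
    """Storage of a new relation with one tuple per occurrence of A."""
    return n_a * (size_a_key + size_b_key)


# Hypothetical figures, not taken from the paper.
n_a, size_a_key, size_b_key = 10_000, 8, 4

fk_cost = foreign_key_storage(n_a, size_b_key)                     # 40,000 bytes
rel_cost = separate_relation_storage(n_a, size_a_key, size_b_key)  # 120,000 bytes
extra = rel_cost - fk_cost                                         # 80,000 bytes
```

The difference equals the size of the A key multiplied by the length of the A relation, which is the additional storage attributed to the separate-relation alternative.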

Case 2. When both A and B have (1, 1) cardinalities, the above analysis suggests that the relationship could be represented by adding the key of either entity as a foreign key of the other. Because of the (1, 1) cardinalities of both entities, the lengths of the A and B relations must be equal. Assuming the existence of the necessary indexes, retrieval performance will also be equivalent for the two alternatives. Therefore, the only basis for selection is the size and complexity of the two keys. If one of the keys involves fewer attributes than the other, then it should become the foreign key in the other relation. If both keys have the same number of attributes, then there is some saving of storage by using the shorter key as the foreign key.

3.9.2 Relationships Not Involving (1, 1) Cardinalities. A relationship may have attributes only when neither of the entity cardinalities is (1, 1), as previously discussed. Any relationship that does have attributes must be represented by a separate relation since use of the foreign key approach would unavoidably result in normalization violations.

For relationships that do not have attributes, there are two cases to consider:

(1) relationships in which the cardinalities of both entities are (0, 1), and (2) all others.

Case 1. (0, 1)/(0, 1) and no relationship attributes. Consider a relationship A VP B where the min/max values of both A and B are (0, 1). The first thing to examine is participation rates (i.e., the percentage of occurrences of each entity that participates in the relationship). If the participation rate of one entity is significantly higher than for the other, the relationship should be represented using the foreign key approach in the entity relation with the higher participation rate.

E.g. : Relationship: applicant fills position

Initially assume that the number of applicants greatly exceeds the number of positions. Then, the participation rate for positions is much higher than that for applicants. Therefore, the relationship should be represented by making the key of applicant a foreign key in the position relation. This choice would require less storage for the relationship and less use of null foreign key attribute values than the alternative of making position a foreign key in the applicant relation.

If both entities have high participation rates, then the choice of which representation to use should be based on anticipated query frequencies. By query frequencies we mean whether the user is most often interested in the A of B or the B of A type of queries. We will use the term input entity for the one about which the user knows something and output entity for the one about which information is required. In such cases, the key of the input relation should appear as a foreign key in the output relation.

If the user knows the key of the input entity, this solution allows such queries to be processed with only an indexed access to the output relation. If the user does not know the key of the input entity, the input relation must be searched, and in this situation the performance is not affected by which entity key is used as the foreign key.

Finally, if neither entity has a high participation rate, the most efficient representation would be a new relation.

Case 2. Not (0, 1)/(0, 1) and no relationship attributes. The only remaining cases are those in which the entities have either (0, N) or (1, N) cardinalities. There is no foreign key approach for these cases that would not violate normalization principles. Therefore, there is no alternative but to represent these relationships by relations.
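The decision rules of Section 3.9 can be summarized as a single procedure. The Python sketch below is one possible reading, not the VCS implementation: the function name, the result labels, and in particular the numeric thresholds `high` and `gap` (standing in for "high" and "significantly higher" participation rates, which the text does not quantify) are assumptions.

```python
ONE_ONE = (1, 1)
ZERO_ONE = (0, 1)

A_KEY_IN_B = "key of A as foreign key in B"
B_KEY_IN_A = "key of B as foreign key in A"
SEPARATE = "separate relation A-VP-B"


def choose_representation(card_a, card_b, has_attributes=False,
                          key_len_a=1, key_len_b=1,
                          rate_a=0.0, rate_b=0.0, input_entity="A",
                          high=0.5, gap=0.25):
    """Choose a representation for a relationship A VP B.

    Cardinalities are (min, max) pairs with max equal to 1 or "N".
    `high` and `gap` are assumed thresholds, not values from the paper.
    """
    # 3.9.1, Case 2: both (1, 1); the key with fewer attributes migrates.
    # (Ties on attribute count are resolved in favor of A here for brevity.)
    if card_a == ONE_ONE and card_b == ONE_ONE:
        return A_KEY_IN_B if key_len_a <= key_len_b else B_KEY_IN_A
    # 3.9.1, Case 1: the (1, 1) entity's relation receives the foreign key.
    if card_a == ONE_ONE:
        return B_KEY_IN_A
    if card_b == ONE_ONE:
        return A_KEY_IN_B
    # 3.9.2: relationships with attributes always get their own relation.
    if has_attributes:
        return SEPARATE
    if card_a == ZERO_ONE and card_b == ZERO_ONE:
        # The relation of the entity with the clearly higher rate holds the key.
        if rate_a - rate_b >= gap:
            return B_KEY_IN_A
        if rate_b - rate_a >= gap:
            return A_KEY_IN_B
        # Both rates high: key of the input entity goes into the output relation.
        if rate_a >= high and rate_b >= high:
            return A_KEY_IN_B if input_entity == "A" else B_KEY_IN_A
        # Neither rate high (mixed cases are treated the same; an assumption).
        return SEPARATE
    # Remaining (0, N) and (1, N) cases.
    return SEPARATE


# applicant (A) fills position (B): positions participate far more often.
applicant_fills_position = choose_representation(
    ZERO_ONE, ZERO_ONE, rate_a=0.05, rate_b=0.9)
```

For the applicant example the sketch places the key of applicant in the position relation, matching the choice described above.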

3.10 Normalization

The last step deals with two types of undesirable functional dependencies: partial and transitive. In normalization theory, partial functional dependencies violate Second Normal Form, while transitive functional dependencies violate Third Normal Form.

A partial functional dependency exists when a nonkey attribute in a relation depends on only part, as opposed to the complete, relation key. A transitive functional dependency exists when a nonkey attribute depends on other nonkey attributes instead of directly on the key. Following normal database design practice, these normalization violations are removed by splitting the original relation into two or more relations.
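The two definitions can be turned into a simple check over a list of declared functional dependencies. The Python sketch below is an illustration only: the function name `classify_dependencies`, the sample relation, and its dependencies are hypothetical, and the check inspects the declared dependencies as given rather than computing their closure.

```python
def classify_dependencies(key, attributes, fds):
    """Split declared functional dependencies into partial and transitive ones.

    `fds` is a list of (determinant attributes, dependent attribute) pairs.
    """
    key = frozenset(key)
    nonkey = set(attributes) - key
    partial, transitive = [], []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        if rhs not in nonkey:
            continue
        if lhs < key:
            # Determinant is a proper subset of the key: partial dependency.
            partial.append((tuple(sorted(lhs)), rhs))
        elif lhs <= nonkey:
            # Determinant consists only of nonkey attributes: transitive dependency.
            transitive.append((tuple(sorted(lhs)), rhs))
    return partial, transitive


# Hypothetical relation with key [CATALOG-NO, COPY-NO].
partial, transitive = classify_dependencies(
    key=("catalog-no", "copy-no"),
    attributes=["catalog-no", "copy-no", "title", "publisher", "publisher-city"],
    fds=[
        ({"catalog-no"}, "title"),              # depends on part of the key
        ({"catalog-no", "copy-no"}, "publisher"),
        ({"publisher"}, "publisher-city"),      # depends on a nonkey attribute
    ],
)
```

Each dependency reported this way indicates a split: the determinant and its dependent attribute would move into a relation of their own.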

The procedure outlined here cannot produce relations that violate 4NF.6 Therefore, the result will be a set of 4NF relations that represent a user’s database view.

6 This result is demonstrated in Storey [17].



4. VIEW CREATION SYSTEM

The methodology presented in this paper has been implemented as the knowledge base of an expert system, called the View Creation System (VCS). The implementation serves as a precise specification of the methodology as well as providing an extremely useful tool for evaluating and refining it. A partial transcript of a VCS session is included in the Appendix in order to illustrate the view creation methodology. The system engages the user in a dialogue designed to elicit the entities, relationships, and attributes of a view. At appropriate points during a session, VCS explains the concepts of entities, attributes, and relationships using brief tutorials. Thus, the user is not expected to know anything about database design techniques or terminology. The user is led to describe his or her application using the constructs of the E-R model, while the system attempts to detect and resolve inconsistencies, ambiguities, and redundancies.

4.1 System Development

The development of the knowledge base was a three-step process. A prototype was built using general knowledge of the database design process (i.e., from standard textbooks). The knowledge base was then expanded to include expertise from database design experts and further refined through testing the system in a number of different organizations.

4.1.1 Expertise. Consultation sessions were arranged with a number of expert database designers. During these sessions each designer was asked to create a database design for a hypothetical problem with one of the researchers playing the role of the user. At the conclusion of each session, the designer was asked to describe how and why certain decisions were made. Finally, the designer was invited to experiment with and critique the then-current version of the system. The rules and heuristics obtained from each of these sessions were added to the formalization and, hence, to the knowledge base of the expert system.

4.1.2 Testing Sessions. The system was tested for seven different database design problems using real users in real organizations. These sessions identified some missing rules and also produced suggestions for improvement of VCS’s user interface. Use of the system in a number of organizations subsequent to the end of the formal testing phase continues to identify occasional refinements. This is discussed further in Section 4.3.

4.2 VCS Implementation

4.2.1 Use of PROLOG. The system was implemented in PROLOG. This language was chosen for several reasons:

-Many researchers (e.g., [5, 6, 8, 14]) have characterized PROLOG as an appropriate language for defining and implementing expert systems. In particular, the constructs used in E-R modeling can be easily represented in PROLOG [14].

-PROLOG easily accommodates incremental additions or deletions [6], which facilitated development of the system in an iterative manner.



-Updating of acquired knowledge is easily and quickly accomplished [S].

-It is easy to implement the capability for the system to explain its reasoning.

-PROLOG is well suited for processing input in a restricted form of natural language. For example, it is easy to perform string searches needed to identify related terms, as well as to properly recognize singular and plural forms of words.

On the negative side, the version of PROLOG used in the development of the system did not include facilities for menu-oriented input or for graphical input or output. Menu selection capabilities, which were considered to be absolutely essential at certain points in the procedure, had to be specifically written into the program. At other points, less satisfactory dialogue sequences were used because of the amount of programming required to support a menu interface. The lack of graphical input/output facilities was also unfortunate. Human database designers routinely use diagrams for communicating with users, and there can be little doubt that such a capability would enhance the effectiveness of VCS as well. Altogether, encoding of the knowledge base and the dialogue management facilities required approximately 21,000 lines of PROLOG.

4.2.2 Original System. The original version of the View Creation System was implemented on a 48-Mbyte Amdahl 5850 running the Michigan Terminal System. The interpreter was C-PROLOG. Under normal system loads, the performance was quite satisfactory with the system usually waiting for response from the user, rather than vice versa. A typical design session for a user view consisting of about six entities and the same number of relationships took approximately 1.5 hours to complete.

4.2.3 Current System. The View Creation System has since been transferred to a microcomputer environment using Arity PROLOG. Among the reasons for this change were a desire to make the system as portable as possible and a wish to take advantage of a number of additional capabilities that are present in Arity PROLOG. The Arity PROLOG system includes a compiler that permits a significant performance improvement over the interpreted version. Currently it is only possible to compile about 60 percent of the code because of memory limitations. This restriction should disappear, however, when Version 5 of Arity PROLOG becomes available. Performance of the compiled portion of the system running on an IBM PC/AT-class microcomputer compares favorably with that of the interpreted version on the mainframe.

Arity PROLOG also contains a collection of screen management predicates that facilitate implementation of a “Macintosh-like” menu-oriented user interface in place of the current one, which relies primarily on a question-and-answer dialogue. Finally, Arity PROLOG has facilities for interfacing to other languages and systems that will make it possible to eventually add graphical input and output capabilities.

4.3 System Testing

In the testing phase of the View Creation System, the system was used by real users in real organizations to create views for real tasks. The resulting output (sets of relations representing database views) was examined by the researchers and by database designers from cooperating organizations. When the system failed to perform as expected, the responsible error or omission in the knowledge base was identified and corrected.

Some of the users who participated in the testing had no prior exposure to database concepts, whereas others had varying amounts of training and/or experience. The test sessions are summarized in Table I.

Table I. Test Sessions

User 1
Type: Systems analyst
Application: Training database
Evaluation: The view contains one redundant relation and one relation where the information requirements are not represented at the correct level of detail
System modification: Two rules were added: (1) A new relation should not be constructed to represent instance-of relationships; and (2) for relationships A instance-of B, determine whether any other entity appearing in a relationship with B should be associated with A instead

User 2
Type: Some familiarity with database concepts
Application: Student-advisory database
Evaluation: The output is a normalized set of relations, but does not totally represent the user’s application because of the difficulty the user had when identifying entities and attributes
System modification: The system’s instructions were modified to highlight some of the more subtle points

User 3
Type: Knowledgeable in data modeling
Evaluation: The output is correct and free of any undesirable properties
System modification: Two minor modifications to the user interface were made based on User 3’s suggestions

User 4
Type: Naive user
Application: Origin-destination database for movement of traffic
Evaluation: The view produced is small, but correct and free of any undesirable properties
System modification: No modification

User 5
Type: Naive user
Application: Equipment database
Evaluation: The view is correct and free of undesirable properties, but does not reflect all the user’s requirements because the user failed to model one dimension of the application
System modification: A rule was added that allows one to distinguish what role subset and superset entities of is-a relationships play in other relationships when they both have the same primary keys

Users 6 and 7
Type: Naive users who designed a single view
Application: Database for insurance claims
Evaluation: The view does not represent the users’ information requirements because of the difficulty the users had in identifying entities in their applications
System modification: No modification

User 8
Type: Learning database design
Application: Database for software maintenance control
Evaluation: The view appropriately reflects User 8’s information requirements and is free of undesirable properties
System modification: One rule was added: When a relationship A have/has B (with attributes) is converted to an entity A-have/has-B and a shorter name is not provided, the entity name should later be modified to reflect the appropriate interpretation of have/has



4.3.1 Results. The system performed best for users who knew something about database design. For all these users, the design produced either was an accurate representation of the user’s information requirements, or highlighted a flaw in the system’s knowledge base, which was subsequently corrected. The system modifications were done immediately after each testing session so that the version given to the succeeding user included that knowledge.

One potential criticism of this approach to testing is that it cannot be proved that the system, and hence the methodology, has reached a correct “steady state” where no further improvements are necessary. It may be noted, however, that no major modifications were required after the testing session with User 5. All the changes made after that point involved minor refinements to the knowledge base. In general, these later changes were associated with capturing more of the semantics of the application rather than correcting errors. The system, therefore, did reach a reasonable degree of stability.

5. CONCLUSION

5.1 Summary

A methodology for generating user views has been formalized and expressed as a set of rules that comprise the knowledge base of an expert View Creation System. The methodology is based on the E-R model. Using this approach, a user’s information requirements for a database are initially expressed in terms of entities, attributes, and relationships, and later transformed into a set of normalized relations that represents the user’s database view.

The significance of this research lies in the insight it provides into the process of database design through formalization of part of the logical database design task. In addition to providing a means for precisely expressing this formalization, implementation of the expert system has made it possible to experimentally validate its adequacy and completeness. The primary contribution of this research, however, lies in the rules and procedures that comprise the system’s knowledge base.

5.2 Future Work

The original objective of this research was to develop a methodology for formalizing the creation of the user views that are essential input to most database design procedures. This methodology, and the expert system implementing it, could, in principle, be used to design a complete database if a single user, or group of users working together, could supply all the necessary information. This is not, however, the way database design is done in real organizations. Rather, different individuals or groups specify requirements in their own areas. Conflicting or inconsistent requirements must be identified and resolved, and then a comprehensive design produced. This problem is usually referred to as view integration. It appears that the appropriate point to introduce view integration into the view creation methodology is after views have been expressed in E-R form, but before they have been converted to sets of normalized relations. A considerable amount of information about the meaning of each view is acquired during the view creation process that is lost in the conversion to relational form. The extension of VCS to include view integration is currently under way.

An interesting, related problem concerns the reconciliation of alternative versions of the same user view. The availability of an expert database designer in economical software form makes it feasible to obtain database views from several individuals concerned with a single application. Ideally, these views should be identical, but as a practical matter, each individual is likely to have a slightly different understanding of the task from those of his or her colleagues. Thus, obtaining several versions of a view might result in a more accurate or complete representation of the task than any one of them individually. Detecting and resolving differences among alternative versions of a view are special cases of the view integration problem, so a solution to that problem will also solve this one.

Finally, it is believed that the effectiveness of VCS could be enhanced by improving its user interface. One such improvement would involve the addition of graphical input and output capabilities. Human database designers employ various types of diagrams, both for communication with users and to record the evolving status of the design. A computerized expert system has its own techniques for the second of these tasks, but there is probably no effective substitute for diagrams for aiding the user in understanding what is happening. In particular, it seems likely that allowing the user to see an evolving E-R diagram would assist him or her greatly in identifying missing entities and relationships.

We also propose to examine the feasibility of increasing the system’s “commonsense” knowledge. Although this is not, strictly speaking, a user interface issue, it would appear to the user primarily as an enhancement in that area. The current VCS has a very limited amount of general knowledge that permits it to recognize that certain attributes are potential keys or that certain verbs imply particular forms of relationships. Much more could be done in this area to produce a system that is less demanding of the user. This is a topic of current research that should be applicable to the view creation problem.

Appendix A. Design Session

This section contains excerpts from a transcript of a design session. To make the transcript more readable, it has been divided into sections as indicated by the headlines that have been inserted. (These headlines do not actually appear during a session.) Comments have also been added and are identified as text between the symbols /* and */.

A.1 Initialization

User? Please enter your name followed by a dot (.) and return.

I have to ask you to always respond in this manner, i.e. give your answer followed by a dot and return.

Now enter your name please.
I: Veda.

Hello veda. For what do you wish to design a database? (If you need more than one word to respond, connect the words with underscore characters.)
I: library-circulation.



A.2 Entities

Tell me what things are of interest to you in a “library-circulation”. I will refer to these “things” as “entities.” I also want you to give me the “attributes” of these “entities,” i.e. the properties or characteristics that are of interest to you.

- - - - - - - - - E.g. AIRLINE-RESERVATION - - - - - - - - -

ENTITIES     ATTRIBUTES
flight       number city-from city-to no-of-passengers
passenger    name frequent-flyer-status
aircraft     id type max-passengers
person       name address phone

Note that you may use only one “word” for each attribute but the “word” may consist of several words connected by underscore characters.

If you enter something and then decide that it’s wrong, you can change it later. Now please enter the name of an entity in its SINGULAR form.7

I: borrower.

A.2.1 Entity Attributes

What are the attributes for “borrower”?

Enter them separated only by a blank and with a period after the last one. Else enter “end.” if you really do not know.

I: number name address phone book1 book2 book3.

A.2.2 Repeating Attributes

Are “book1” “book2” “book3” different occurrences of “book”? “yes.” / “no.” / “why?”
I: why?

It sounds like “book1” “book2” “book3” all refer to the same “thing”, i.e. “book”. A “thing” should be an entity. Now, are “book1” “book2” “book3” all occurrences of “book”? “yes.” / “no.” / “why?”
I: yes.

Then we shall make “book” an entity when we have finished with this one.

/* The system “remembers” that a relationship will probably be required between “Borrower” and “book”. */
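The recognition of repeating attributes shown in this exchange can be approximated by grouping attribute names that differ only in a trailing number. The Python sketch below is an assumption about how such a check might work and is not drawn from the VCS code; the function name `find_repeating_attributes` is hypothetical.

```python
import re
from collections import defaultdict


def find_repeating_attributes(attributes):
    """Group attribute names that share a stem and end in a number."""
    groups = defaultdict(list)
    for attr in attributes:
        # A stem ending in a non-digit, followed by digits up to the end.
        match = re.fullmatch(r"(.*\D)(\d+)", attr)
        if match:
            stem = match.group(1).rstrip("_-")
            groups[stem].append(attr)
    # Only stems that occur more than once suggest a repeating attribute.
    return {stem: names for stem, names in groups.items() if len(names) > 1}


repeating = find_repeating_attributes(
    ["number", "name", "address", "phone", "book1", "book2", "book3"]
)
```

Each stem found this way is a candidate entity, which the dialogue then confirms with the user.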

A.2.3 Candidate Keys

/* “Borrower” has two “key indicator attributes”, namely “number” and “name”. The system suggests that the user consider these when identifying candidate keys. */

Here are the attributes for:
borrower: [number, name, address, phone]
Enter an attribute or combination of attributes which could be used to identify UNIQUELY an instance of the entity “borrower”. I will call this attribute (or combination of attributes) a “key”. (If you really do not know, enter “end.”)

In particular, you may wish to consider: 1) number 2) name.

I: number.

7 Entity names are requested in their singular form because the system will later use the form of an entity (singular or plural) as it appears in a relationship to infer mapping ratios.



Is there any other way that an instance of the entity “borrower” could be identified? “see.”, to see how I can identify it so far / “no.” / “yes.” / identifying attribute(s)
I: name address.

A.2.4 Multivalued Attributes

/* Since “book” was a repeating attribute, the system converts it to an entity. */

Is the term “book” in its SINGULAR form? (“yes.” / “no.”)
I: yes.

What are the attributes for “book”? / “end.” Enter them separated only by a blank and with a period after the last one. Else enter “end.” if you really do not know.
I: catalog-no title author-id volume publisher publisher-city.

Any more attributes for “book”? “see.” / “yes.” / “no.” / new attributes
I: n.

Good. Here are the attributes for:
book: [catalog-no, title, author-id, volume, publisher, publisher-city]
Can a single “book” have more than one value for any of these attribute(s)? “yes.” / “no.” / “why?” / attribute(s) which can have more than one value / “end.” if you do not know.
I: why?

If an attribute can have more than one value, then these values may be instances of the same “thing”, i.e. an entity. Now consider
book: [catalog-no, title, author-id, volume, publisher, publisher-city]
Can any of these attribute(s) have more than one value for each occurrence of “book”? “yes.” / “no.” / attributes which can have more than one value / “end.” if you do not know.
I: volume.

I will delete the attribute “volume” and we will make it an entity later. . . .

/’ The ueer identifies two candidate AUTHOR-ID]. The system %emembers’ %olume”. */

keys f07 abook=: I) [CATALOG-NO] and f) [TITLE, that a relationship will be required between abook’ and

Now let’s deal with ‘volume”. . .

/* The entity “volume’ is added with attributes: [COPY-NO, title, author-name]. */
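The conversion of a repeating attribute into an entity, as in the exchange above, can be sketched in Python. This is only an illustration under assumed data structures, not the VCS implementation; the function name `promote_multivalued` and the dictionary layout are hypothetical.

```python
def promote_multivalued(entities, owner, attribute):
    """Remove a repeating attribute from its owner entity, register it as a
    new entity, and return the pair of entities that must be related later."""
    # Drop the attribute from the owning entity.
    entities[owner] = [a for a in entities[owner] if a != attribute]
    # Register the new entity; its own attributes are requested later.
    entities.setdefault(attribute, [])
    # The caller "remembers" this pair and prompts for a relationship later.
    return (owner, attribute)


entities = {"book": ["catalog-no", "title", "author-id", "volume",
                     "publisher", "publisher-city"]}
pending = promote_multivalued(entities, "book", "volume")
```

The returned pair corresponds to the later prompt for a relationship between "book" and "volume" in A.3.4.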

A.2.5 Attribute Name Is Entity Name

/* In the following example, one entity is NOT needed in the identification of another. The existence of a relationship is implied. */

Enter next entity (in its SINGULAR form) / "see." / "end." I: library.
Enter the attributes for "library". / "end." I: name address books.
I already know that "book" is an entity. Does this mean that you will need to know "book" before you will be able to identify UNIQUELY an instance of "library"? I: no.
Then you are really telling me that a relationship exists between "library" and "book". We'll get to that in a minute. . . .



/* One entity IS needed to identify another. */

Enter next entity (in its SINGULAR form) / "see." / "end." I: branch.
Enter the attributes for "branch" / "end." I: name address library.
I already know that "library" is an entity. Does this mean that you will need to know "library" before you will be able to identify UNIQUELY an instance of "branch"? I: yes.
Fine. . . .

/* The entity "branch" is added with attributes: [NAME, LIBRARY, address]. */

A.2.6 Generated Identifiers

Enter next entity / "see." / "end." I: librarian.
Enter the attributes for "librarian". / "end." I: end.
Then should the entity "librarian" exist? ("yes." / "no.") I: yes.
Then I shall make an attribute "librarian-id."

Enter next entity (in its SINGULAR form) / “see.” / “end.” I: employee . . .

A.2.7 Entity Modification

Here are the entities with their attributes and keys.
borrower: [NUMBER, name, address, phone]
borrower: [NAME, ADDRESS, number, phone]
book: [CATALOG-NO, author-id, title, publisher, publisher-city]
book: [TITLE, AUTHOR-ID, catalog-no, publisher, publisher-city]

Do you wish to add, delete or change (rename, modify attributes / keys) any of the entities? ("no." / "add." / "delete." / "change.") I: add.
Enter the name of the entity to be added: I: person.
Enter the attributes for "person" / "end."

Any more modifications? ("see." / "no." / "add." / "delete." / "change.") I: see.

borrower: [NUMBER, name, address, phone]
borrower: [NAME, ADDRESS, number, phone]
book: [CATALOG-NO, title, author-id, publisher, publisher-city]
book: [TITLE, AUTHOR-ID, catalog-no, publisher, publisher-city]
volume: [COPY-NO, title, author-name]
library: [NAME, address]
branch: [NAME, LIBRARY, address]
librarian: [LIBRARIAN-ID]
employee: [NUMBER, name, address, years-of-service]
employee: [NAME, ADDRESS, number, years-of-service]
person: [NAME, ADDRESS, phone, date-of-birth]

Do you wish to add, delete or change (rename, modify attributes / keys) any of the entities? ("yes." / "no." / "delete." / "change.") I: no.
Fine.

A.2.8 Missing Entities

For the entity "book" I have an attribute "author-id". It sounds like "author" should also be an entity in your database. Should I make it an entity? ("yes." / "no." / "why?") I: why?
An attribute that ends in "id" is often used to identify something. If that is the case here, then that something has not been explicitly defined. Should I make "author" an entity? ("yes." / "no.") I: y.
Enter the attributes for "author". / "end." I: id.
Any more attributes for "author"? ("see." / "yes." / "no." / new attributes) I: n.

Would you like to include the attribute "author-name"? ("yes." / "no." / "why?") I: w.
I know "author-name" is an attribute of at least one other entity, e.g. "volume". Since it starts with "author", I thought that it could be an attribute of "author".

Do you wish to add "author-name" as an attribute of "author"? ("yes." / "no.") I: yes. . . .
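The "ends in id" heuristic explained in the dialogue above can be sketched as follows. The function name `suggest_missing_entities`, the handling of hyphens, and the data layout are assumptions made for illustration; the paper does not give the rule in this form.

```python
def suggest_missing_entities(entities):
    """Return (owner, attribute, candidate) triples for attributes whose names
    end in "id" and whose stem is not yet a known entity."""
    suggestions = []
    for owner, attributes in entities.items():
        for attribute in attributes:
            if not attribute.endswith("id"):
                continue
            # Strip the "id" suffix and any separator, e.g. "author-id" -> "author".
            stem = attribute[:-2].rstrip("-_")
            # Skip bare "id", an entity's own identifier, and known entities.
            if stem and stem != owner and stem not in entities:
                suggestions.append((owner, attribute, stem))
    return suggestions


entities = {
    "book": ["catalog-no", "title", "author-id", "publisher"],
    "librarian": ["librarian-id"],
}
suggestions = suggest_missing_entities(entities)
```

Each suggestion would still be confirmed with the user, as the system does above before creating "author".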

A.3 Relationships

You have a "library-circulation" with the things you told me about. I need to know what happens with all of these things and how they interact with each other.

flights have aircrafts
passenger is-a person
passengers reserve-for flights

Please enter your information with three words (may use underscores) on each line followed by a dot and return. From now on I will refer to the information that you are going to enter as "relationships".

Note that you should use both SINGULAR and PLURAL forms of the entity names. This is important, so think carefully about how many instances of one entity can occur in relationship to another. If you make a mistake, you can change it later.

First, would you like to see your entities again? ("yes." / "no.") I: n.



Fine. Enter a relationship of the form ‘A verb B”.

. . . .

A.3.1 Unidentified A and B Values

/* In the first of the following two examples the verb phrase is "has" and "B" is the name of an attribute. In the second example, "A" is unknown to the system. */

Enter next relationship / "see." / "end." / "see-ent" (to see entities) I: book has title.

Should "title" be an entity? (i.e. a "thing" of interest in your database.) ("yes." / "no.") I: n.

Fine, then I do not need the relationship because I know "title" is an attribute of "book".

Enter next relationship / "see." / "end." / "see-ent" (to see entities) I: students borrow books.

Is "students" a new entity? You haven't mentioned it before. ("yes." / "no.") I: y.
Please give me the singular form for "students" / "end." I: student.
Enter the attributes for "student" / "end." . . .

A.3.2 Mapping Ratios

/* 1. System infers all mapping ratios for relationships of the form "A is-a B". Mapping ratios are A (1,1) and B (0,1). */

Enter next relationship / "see." / "end." / "see-ent" (to see entities) I: borrower is-a person.

Enter next relationship . . .

/* 2. System makes an inference about the mapping ratios when the relationship is of the form "A (singular) verb phrase B (singular)". */

Enter next relationship / "see." / "end." I: librarian directs library.

"librarian directs library"
Is there at least one "librarian" for every "library"? ("yes." / "no.") I: y.
Is there at most one "librarian" for every "library"? ("yes." / "no.") I: y.

/* Since both "librarian" and "library" are in their singular forms, this implies that a librarian directs one and only one library. Therefore, the system inferred that the min/max values for librarian are (1,1). It then queried the user for the min/max values of library. Based on this information, the relationship will be represented by adding the key of "librarian" as a foreign key of "library". */

/* 3. System queries user for all the mapping ratios. */

Enter next relationship / "see." / "end." / "see-ent" (to see entities) I: libraries have branches.

"libraries have branches"



Is there at least one "branch" for every "library"? ("yes." / "no.") I: y.
Is there at most one "branch" for every "library"? ("yes." / "no.") I: n.

Is there at least one "library" for every "branch"? ("yes." / "no.") I: y.
Is there at most one "library" for every "branch"? ("yes." / "no.") I: y.

/* Here, the min/max values are A (1,N) and B (1,1), i.e. each library can have one to many branches; each branch has one and only one library. This relationship will be represented by adding the key of library as a foreign key of branch. */
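The translation of the yes/no answers into min/max values, and the resulting choice of where a foreign key goes, can be sketched as below. The function names `ratio` and `foreign_key_holder` are hypothetical, and the tie-breaking rule for the case where both maxima are 1 is an assumption chosen to agree with the "librarian directs library" example; the paper's full rule set may differ.

```python
def ratio(at_least_one, at_most_one):
    """Translate two yes/no answers into a (min, max) pair; "N" means many."""
    return (1 if at_least_one else 0, 1 if at_most_one else "N")


def foreign_key_holder(a, a_ratio, b, b_ratio):
    """Return the entity that receives the other entity's key, or None when a
    separate relation is needed because both maxima are N."""
    if b_ratio[1] == 1:
        return b  # each B relates to at most one A, so B can hold A's key
    if a_ratio[1] == 1:
        return a  # each A relates to at most one B, so A can hold B's key
    return None


# "libraries have branches": answers taken from the dialogue above.
library = ratio(at_least_one=True, at_most_one=False)   # branches per library
branch = ratio(at_least_one=True, at_most_one=True)     # libraries per branch
holder = foreign_key_holder("library", library, "branch", branch)
```

With both maxima equal to N, as for "authors write books" in A.3.3, the sketch returns `None`, which corresponds to constructing a separate relation.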

A.3.3 Relationship Attributes

I: authors write books.
Is there at least one "book" for every "author"? ("yes." / "no.") I: y.
Is there at most one "book" for every "author"? ("yes." / "no.") I: n.

Is there at least one "author" for every "book"? ("yes." / "no.") I: y.
Is there at most one "author" for every "book"? ("yes." / "no.") I: n.

/* Here, the min/max values are: A (1,N) and B (1,N). */

An attribute of a relationship is a property or characteristic of the relationship which is of interest to you.

E.g. Relationship: "passengers reserve-for flights" Acceptable response: "class".

Note that "class" is not an attribute of "passenger" or "flight" but is specifically an attribute of the relationship: "passengers reserve-for flights".

When I add certain relationships I will ask for corresponding attributes. Relationship attributes may not always exist.

Enter attributes for "authors write books" / "end." I: title date-completed.

/* The system checks that a relationship attribute has not previously been identified as an attribute of one of the entities. If this situation occurs, the system informs the user of the inconsistency and asks whether the attribute should belong to the entity or the relationship. */

I already know that "title" is an attribute of "book". Think carefully. Is "title" an attribute (i.e. a property or a characteristic) of:

a) the entity "book"
b) the relationship "authors write books"?

("a." / "b." / "no." if you do not know.) I: a.
Fine. Relationship attribute "date-completed" added.
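The consistency check on relationship attributes can be sketched as a simple lookup. The name `conflicting_attributes` and the data layout are assumptions for illustration only.

```python
def conflicting_attributes(entities, participants, relationship_attributes):
    """Return (attribute, entity) pairs for relationship attributes that were
    already recorded as attributes of a participating entity."""
    conflicts = []
    for attribute in relationship_attributes:
        for entity in participants:
            if attribute in entities.get(entity, []):
                conflicts.append((attribute, entity))
    return conflicts


entities = {
    "author": ["id", "author-name"],
    "book": ["catalog-no", "title", "publisher", "publisher-city"],
}
conflicts = conflicting_attributes(
    entities, ["author", "book"], ["title", "date-completed"])
```

Each reported pair would lead to a question like the one above, asking whether the attribute belongs to the entity or to the relationship.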



/* System checks for multivalued relationship attributes. */

Can a single occurrence of "authors write books" have more than one value for the following attribute(s)?

[date-completed] I: n.

A.3.4 Prompts for Missing Relationships

/* The system first checks for "solo entities", i.e. entities that do not appear in any relationship. */

I have the entity "employee" but it does not occur in any relationship. Would you like to add a relationship(s) which includes the entity "employee"? ("yes." / "no." / new relationship of the form "A Verb B" / "end.") I: librarian is-an employee.
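Detecting "solo entities" amounts to a set difference between the known entities and those named in relationships. In this sketch the name `solo_entities` is hypothetical, and relationships are assumed to be stored as triples that already use the singular entity names.

```python
def solo_entities(entities, relationships):
    """Return the entities that do not take part in any relationship."""
    used = set()
    for first, _verb, second in relationships:
        used.update((first, second))
    # Preserve the order in which the entities were entered.
    return [name for name in entities if name not in used]


entities = ["borrower", "person", "librarian", "library", "employee"]
relationships = [
    ("borrower", "is-a", "person"),
    ("librarian", "directs", "library"),
]
solo = solo_entities(entities, relationships)
```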

/* The user is prompted for a relationship between "book" and "volume" because "volume" was originally a multivalued attribute of "book" that was converted to an entity. */

Is there a relationship between "book" and "volume"? ("yes." / "no." / "why?" / new relationship of the form "A Verb B" / "end.") I: why?
When you gave me the entity "book" you said that "volume" was one of its attributes. That tells me that "book" and "volume" should be related. Is there a relationship between "book" and "volume"? . . .

I: books have volumes. . . .

/* The user originally tried to add "book" as an attribute of "library" which implies that there is a relationship between "library" and "book". */

Is there a relationship between "library" and "book"? ("yes." / "no." / "why?" / new relationship of the form "A Verb B" / "end.") I: why?
Earlier you wanted to add a variation of "book" as an attribute of "library". Since "book" was also an entity, you were really implying the existence of a relationship between "library" and "book". Now, is there a relationship between "library" and "book"? . . .

I: libraries have books. . . .

/* The user has provided a relationship between "book" and "author" so the system does not have to prompt for it. "Author-id" was originally an attribute of "book". Since "author" became an entity, the original association between "book" and "author" through this attribute suggests that there is a relationship between the two. The system informs the user that "author-id" is no longer needed as an attribute of "book" and deletes it. Candidate keys for "book" are reconsidered. */

Since I know that "book" and "author" are somehow related, I no longer need "author-id" as an attribute of "book" so I shall delete it.

Earlier you said that "author-id" was (part of) an identifier for "book" so we will now have to go back and reconsider the possible identifiers for "book". . .

/* The entity "book" ends up with only one candidate key: [CATALOG-NO]. */

/* "Author-name" is an attribute of the entity "volume" but there is no relationship between "author" and "volume". The system prompts the user for one but, since relationships exist between "book" and "author" and "book" and "volume", a new relationship is not necessary. */

Do you wish to provide a relationship between "volume" and "author"? ("yes." / "no." / "why?") I: why?

You have an attribute of the entity "volume" which contains some form of the term "author". Since "author" is also an entity, this suggests there could be at least one relationship between "volume" and "author". Do you wish to provide a relationship between "volume" and "author"? . . .

I: no.

Then should "author-name" be an attribute of "volume"? ("yes." / "no." / "why?") I: why?
Having "author-name" as an attribute of "volume" implies that a relationship could exist between "volume" and "author". Since you did not provide a relationship I was wondering if "author-name" should really be an attribute of "volume". Now, should "author-name" be an attribute of "volume"? ("yes." / "no.") I: no.

Then I shall delete "author-name" as an attribute of "volume".

A.3.5 Relationship Modification

RELATIONSHIPS
borrower is-a person
employees are people
students borrow books [date-due]
students are people
librarian directs library
libraries have branches
authors write books [date-completed]
librarian is-an employee
books have volumes
borrowers borrow books [date-due]
libraries have books [date-acquired]
End relationships.
Do you wish to add or delete any of the relationships? . . . I: no.

A.3.6 Have/Has Relationships

/* "Have/has" verb phrases are modified to reflect the appropriate interpretation of "have/has", i.e. possession, component-of, instance-of or associated-with. */

"books have volumes."

Which of the following is true?
a) "book" owns / possesses "volume"
b) "volume" is a component of "book"
c) "volume" is an instance / example of "book"
d) "volume" is associated with "book" (in some other way)
("a." / "b." / "c." / "no." if you do not know.) I: c.

/* The relationship "books have volumes" is changed to "volume instance-of book". */

/* The system checks that entities involved in "instance-of" relationships are represented at the correct level of detail. */

You have previously indicated that "volume instance-of book" and "borrowers borrow books".



Which relationship(s) should I have:
a) borrowers borrow volume
b) borrowers borrow books
c) both
("a." / "b." / "c." / "no." if you do not know.) I: a.
Fine, then I shall delete "borrowers borrow books" and add "borrowers borrow volume." . . .

/* A similar procedure is carried out for the relationship "students borrow books". The relationship becomes "students borrow volume". */

/* "Libraries have books" becomes "book component-of library". */

A.4 Modifications: First

/* The user is given an opportunity to modify the current set of entities, attributes and relationships. */

A.5 Synonyms

/* Relationships: 1) students borrow volume and 2) borrowers borrow volume. */

Which of the following is true about a "student" and a "borrower"?

a) They refer to exactly the same thing.
b) "student" is a subset of "borrower" (i.e. every "student" is a "borrower" but not every "borrower" is a "student")
c) "borrower" is a subset of "student" (i.e. every "borrower" is a "student" but not every "student" is a "borrower")
d) None of the above or you do not know.
("a." / "b." / "c." / "d.")

/* If the response was "a", the user would have been asked which entity should be used; if the response was "b", the relationship "student is-a borrower" would have been added; if the response was "c", the relationship "borrower is-a student" would have been added. */
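The way each response to the synonym question maps to an action can be written as a small dispatch function. This is a sketch only; the name `synonym_action` and the tuple encoding of actions are assumptions, not part of VCS.

```python
def synonym_action(first, second, response):
    """Map the user's answer about two possibly synonymous entities to the
    action described in the comment above."""
    if response == "a":
        # Same thing: the user is then asked which entity name to keep.
        return ("ask-which-to-keep", first, second)
    if response == "b":
        return ("add-relationship", f"{first} is-a {second}")
    if response == "c":
        return ("add-relationship", f"{second} is-a {first}")
    # "d": none of the above, or the user does not know.
    return ("no-change",)


action = synonym_action("student", "borrower", "b")
```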

A.6 Inherited Attributes

Since "librarian is-an employee", we should be able to attribute to "librarian" all of the attributes of "employee" (whether or not you are interested in them). Would it be wrong to attribute any of the following attribute(s) to "librarian"?

1) number 2) name 3) address 4) years-of-service ("yes." / "no.") I: no.
Fine. . . .

A.7 Modifications: Last

/* The user is given a final opportunity to make modifications to the entities, attributes and relationships. */

A.8 Primary Keys

/* Before the selection of primary keys is made, some attribute names are prefixed by their entity names in order to make the attribute names unique. */

I have adjusted some attribute names to make them unique. Here is your new set of entities.

borrower: [BORROWER-NUMBER, borrower-name, borrower-address, borrower-phone]
borrower: [BORROWER-NAME, BORROWER-ADDRESS, borrower-number, borrower-phone]
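One way to make attribute names unique is to prefix only those names that occur in more than one entity. That selection rule is an assumption: it is consistent with the listings in this session (for example, "catalog-no" stays unprefixed while "title" becomes "book-title"), but the paper does not state it explicitly. The name `make_unique` and the sample data are hypothetical.

```python
from collections import Counter


def make_unique(entities):
    """Prefix an attribute with its entity name when the same attribute name
    is used by more than one entity."""
    counts = Counter(a for attrs in entities.values() for a in attrs)
    return {
        entity: [f"{entity}-{a}" if counts[a] > 1 else a for a in attrs]
        for entity, attrs in entities.items()
    }


entities = {
    "borrower": ["number", "name", "address", "phone"],
    "employee": ["number", "name", "address", "years-of-service"],
    "person": ["name", "address", "phone", "date-of-birth"],
    "book": ["catalog-no", "title", "publisher"],
    "volume": ["copy-no", "title"],
}
renamed = make_unique(entities)
```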

A.8.1 Component-of Relationships

/* The relationship "libraries have books" became "book component-of library". */

Here is what I have for the entity "book"
book: [CATALOG-NO, publisher, publisher-city, book-title]
I was wondering if:
[CATALOG-NO]
is sufficient to identify UNIQUELY an instance of "book" or do I need to add:
[LIBRARY-NAME]
If you respond with "yes." I will add the attribute(s). ("yes." / "no." / "why?") I: w.
Since "book component-of library" I thought that I might need to know "library" before I could UNIQUELY identify "book". Now, consider the following for "book"
book: [CATALOG-NO, publisher, publisher-city, book-title]
Does:
[CATALOG-NO]
UNIQUELY identify "book" or do I need to add:
[LIBRARY-NAME]
If you respond with "yes." I will add the attribute(s). ("yes." / "no.") I: no.
Fine. . . .

/* The system has selected primary keys without interaction with the user. Note: 1) The key of "volume" has been augmented by the key of "book" as a result of the "instance-of" relationship that exists between the two entities. The attribute "title" has been deleted from "volume" because "volume" can inherit this attribute from "book". 2) The primary key of "branch" now contains the key of "library" because the attribute "library" has been replaced by its primary key. 3) The attribute "phone" of "borrower" has been deleted because it can be inherited from "person" through the "is-a" relationship "borrower is-a person". 4) The key of "person" appears as a foreign key of "borrower" due to the way the system considers "is-a" relationships when selecting primary keys. */

borrower: [BORROWER-NUMBER, person-address, person-name]
book: [CATALOG-NO, publisher, publisher-city, book-title]
volume: [CATALOG-NO, COPY-NO]
library: [LIBRARY-NAME, library-address]
branch: [BRANCH-NAME, LIBRARY-NAME, branch-address]
author: [AUTHOR-ID, author-name]

A.9 Relations

/* The system constructs relations. Each entity is represented by constructing a relation where the key attributes of the entity serve as the relation's key and the non-key attributes of the entity as the relation's non-key attributes. A relationship is represented by either: 1) adding the key attributes of one entity as non-key attributes of another or 2) constructing a new relation whose key is the concatenation of the keys of the entities involved. */



authors-write-books: [AUTHOR-ID, CATALOG-NO, date-completed]
students-borrow-volume: [STUDENT-NUMBER, CATALOG-NO, COPY-NO, date-due]
borrower: [BORROWER-NUMBER, person-name, person-address]
book: [CATALOG-NO, publisher, publisher-city, book-title]
volume: [CATALOG-NO, COPY-NO]
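The second way of representing a relationship, a new relation whose key concatenates the keys of the participating entities, can be sketched as follows. The function name `relationship_relation` and the dictionary layout are assumptions made for the example.

```python
def relationship_relation(name, key_a, key_b, attributes=()):
    """Build a relation for a relationship: its key is the concatenation of
    the two entity keys and its non-key part holds the relationship attributes."""
    # Avoid repeating an attribute that appears in both entity keys.
    key = list(key_a) + [k for k in key_b if k not in key_a]
    return {"name": name, "key": key, "non_key": list(attributes)}


relation = relationship_relation(
    "students-borrow-volume",
    key_a=["student-number"],
    key_b=["catalog-no", "copy-no"],
    attributes=["date-due"],
)
```

The result corresponds to the relation students-borrow-volume: [STUDENT-NUMBER, CATALOG-NO, COPY-NO, date-due] listed above.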

A.10 Functional Dependencies

A.10.1 Partial Functional Dependencies

branch: [BRANCH-NAME, LIBRARY-NAME, branch-address]
Which of the following do I need before I know the "branch-address"?
a) branch-name
b) library-name
c) branch-name & library-name
("a." / "b." / "c." / "no." if you do not know.) I: c. . . .
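How the answer to this question could be used is sketched below: if a non-key attribute needs only part of the key, it is moved to a new relation keyed by that part. This is the standard removal of a partial dependency, written here under assumed names (`split_partial` and the dictionary layout); it is not taken from the VCS knowledge base.

```python
def split_partial(key, non_key, attribute, needed):
    """Return (relation, new_relation). new_relation is None when the
    attribute depends on the whole key, i.e. there is no partial dependency."""
    if set(needed) == set(key):
        return {"key": list(key), "non_key": list(non_key)}, None
    # Move the attribute to a relation keyed by the part of the key it needs.
    remaining = [a for a in non_key if a != attribute]
    return ({"key": list(key), "non_key": remaining},
            {"key": list(needed), "non_key": [attribute]})


# Response "c" above: branch-address needs both key attributes.
branch, extra = split_partial(
    ["branch-name", "library-name"], ["branch-address"],
    "branch-address", ["branch-name", "library-name"])
```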

A.10.2 Transitive Functional Dependencies

For each of the following, identify any non-key attributes (written in lower case) which depend directly upon other non-key attributes rather than the key attributes.

Respond with: "non-key(s) depends-on non-key(s)". Separate the non-key names by only a blank. E.g. flight: [FLIGHT-NO, city-from, city-to, departure-airport] Acceptable response: "departure-airport depends-on city-from city-to." Note that this means that knowing the values for "city-from" and "city-to" implies that one will know what the departure-airport is.

Consider the following: Book: [CATALOG-NO, publisher, publisher-city, book-title]

From this list enter any non-key attribute(s) which depend on other non-key attribute(s) / "end."

I: end.

What about "publisher" and "publisher-city"? Enter any non-key attribute(s) which depend on other non-key attribute(s) / "end." Respond with "non-key(s) depends-on non-key(s)." I: publisher-city depends-on publisher.

/* Relations produced are: Book: [CATALOG-NO, publisher, book-title] Publisher: [PUBLISHER, publisher-city]. */
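The decomposition that follows from a "depends-on" response can be sketched as below. The function name `split_transitive` and the dictionary layout are assumptions; the sketch shows only the standard removal of a transitive dependency, using the response given above.

```python
def split_transitive(key, non_key, dependent, determinant):
    """Move the dependent non-key attributes into a new relation keyed by the
    non-key attributes that determine them; the determinant stays behind."""
    remaining = [a for a in non_key if a not in dependent]
    original = {"key": list(key), "non_key": remaining}
    new_relation = {"key": list(determinant), "non_key": list(dependent)}
    return original, new_relation


# "publisher-city depends-on publisher."
book, publisher = split_transitive(
    key=["catalog-no"],
    non_key=["publisher", "publisher-city", "book-title"],
    dependent=["publisher-city"],
    determinant=["publisher"],
)
```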

A.11 End Session

That’s all I have to ask you about. Here is what I have:

student: [STUDENT-NUMBER, department, person-name, person-address]
author: [AUTHOR-ID, author-name]
person: [PERSON-NAME, PERSON-ADDRESS, person-phone, date-of-birth]
employee: [EMPLOYEE-NUMBER, years-of-service, person-name, person-address]
librarian: [LIBRARIAN-EMPLOYEE-NUMBER]
branch: [BRANCH-NAME, LIBRARY-NAME, branch-address]
library: [LIBRARY-NAME, library-address, librarian-employee-number]
volume: [CATALOG-NO, COPY-NO]
book: [CATALOG-NO, publisher, book-title]
borrower: [BORROWER-NUMBER, person-name, person-address]
authors-write-books: [AUTHOR-ID, CATALOG-NO, date-completed]
students-borrow-volume: [STUDENT-NUMBER, CATALOG-NO, COPY-NO, date-due]
borrowers-borrow-volume: [BORROWER-NUMBER, CATALOG-NO, COPY-NO, date-due]
publisher: [PUBLISHER, publisher-city]
End Relations

/* "Librarian" no longer needs its generated key attribute since it has adopted the primary key of its superset entity, "employee". */

ACKNOWLEDGMENTS

We would like to thank our anonymous referees for their helpful comments on an earlier version of this paper.

BIBLIOGRAPHY

1. BALDISSERA, C., CERI, S., PELAGATTI, G., AND BRACCHI, G. Interactive and formal specification of user's views in data base design. In Proceedings of the International Conference on Very Large Data Bases (Rio de Janeiro, 1979), pp. 262-272.

2. BRACCHI, G. Methodologies and tools for logical database design. In Database Management: Theory and Applications, C. W. Holsapple and A. B. Whinston, Eds. Reidel, Hingham, Mass., 1981, pp. 59-86.

3. BRACHMAN, R. J. What is-a is and isn't: An analysis of taxonomic links in semantic networks. Computer (Oct. 1983), 30-36.

4. CHEN, P. P.-S. The Entity-Relationship model: Toward a unified view of data. ACM Trans. Database Syst. 1, 1 (Mar. 1976), 9-36.

5. CLARK, K., AND MCCABE, F. PROLOG: A language for implementing expert systems. Tech. Rep. DOC 80/21, Imperial College, Univ. of London, 1980.

6. COELHO, H. The art of knowledge engineering with PROLOG. INFOLOG PROJ, Fac. Ciencias, Univ. Lisboa, Portugal, 1983.

7. DATE, C. J. An Introduction to Database Systems. Vol. 1, 4th ed. Addison-Wesley, Reading, Mass., 1986.

8. HAMMOND, P. Logic programming for expert systems. Tech. Rep. DOC 82/4, Dept. of Computing, Imperial College of Science and Technology, Univ. of London, 1982.

9. HOWE, D. R. Data Analysis for Data Base Design. Arnold, London, 1983.

10. MARTIN, J. An End User's Guide to Data Bases. Prentice-Hall, Englewood Cliffs, N.J., 1981.

11. NAVATHE, S. B., AND ELMASRI, R. Integrating user views in database design. Computer (Jan. 1986), 50-62.

12. NAVATHE, S. B., AND GADGIL, S. G. A methodology for view integration in logical database design. In Proceedings of the 8th International Conference on Very Large Data Bases (Mexico City). 1982, pp. 142-164.

13. NAVATHE, S. B., AND SCHKOLNICK, M. View representation in logical database design. In Proceedings of the ACM-SIGMOD International Conference (Austin, Tex., May 31-June 2, 1978). ACM, New York, 1978, pp. 144-156.

14. PARSAYE, K. Database management, knowledge base management and expert systems development in Prolog. ACM-SIGMOD Database Week for Business and Office Applications (San Jose, Calif., May). ACM, New York, 1983, pp. 159-178.

15. RAVER, N., AND HUBBARD, G. U. Automated logical data base design: Concepts and applications. IBM Syst. J. 16, 3 (1977).

16. SMITH, J. M., AND SMITH, D. C. P. Database abstractions: Aggregation and generalization. ACM Trans. Database Syst. 2, 2 (June 1977), 105-133.



17. STOREY, V. C. View Creation: An Expert System for Database Design. Ph.D. dissertation, Faculty of Commerce and Business Administration, Univ. of British Columbia, Vancouver, B.C., Canada, Oct. 1986, ICIT Press, 1988.

18. TSICHRITZIS, D., AND LOCHOVSKY, F. Data Models. Prentice-Hall, Englewood Cliffs, N.J., 1982.

Received December 1986; revised October 1987; accepted November 1987

