APPENDIX A

Data Modeling


A data model describes the data that flows through the business processes in an organization. During the analysis phase, the data model presents the logical organization of data without indicating how the data are stored, created, or manipulated so that analysts can focus on the business without being distracted by technical details. Later, during the design phase, the data model is changed to reflect exactly how the data will be stored in databases and files. This chapter describes entity relationship diagramming, one of the most common data modeling techniques used in industry.

OBJECTIVES

■ Understand the rules and style guidelines for creating entity relationship diagrams.
■ Be able to create an entity relationship diagram.
■ Become familiar with the data dictionary and metadata.
■ Become familiar with the process of normalization.
■ Understand how to balance between entity relationship diagrams and data flow diagrams.

CHAPTER OUTLINE

Introduction
The Entity Relationship Diagram
    Reading an Entity Relationship Diagram
    Elements of an Entity Relationship Diagram
    The Data Dictionary and Metadata
Creating an Entity Relationship Diagram
    Building Entity Relationship Diagrams
    Applying the Concepts at CD Selections
Validating an ERD
    Design Guidelines
    Normalization
    Balancing Entity Relationship Diagrams with Data Flow Diagrams
Summary

INTRODUCTION

During the analysis phase, analysts create process models to represent how the business system will operate. At the same time, analysts need to understand the information that is used and created by the business system (e.g., customer information, order information). In this chapter, we discuss how the data that flow through the processes are organized and presented.

A data model is a formal way of representing the data that are used and created by a business system; it illustrates people, places, or things about which information is captured and how they are related to each other. The data model is drawn using an iterative process in which the model becomes more detailed and less conceptual over time. During analysis, analysts draw a logical data model, which shows the logical organization of data without indicating how data are stored, created, or manipulated. Because this model is free of any implementation or technical details, the analysts can focus more easily on matching the diagram to the real business requirements of the system.

In the design phase, analysts draw a physical data model to reflect how the data will physically be stored in databases and files. At this point, the analysts investigate ways to store the data efficiently and to make the data easy to retrieve. The physical data model (Chapter 8) and performance tuning (Chapter 9) are discussed later in the textbook.

Project teams usually use CASE tools to draw data models. Some of the CASE tools are data modeling packages, such as ERwin by Platinum Technology, that help analysts create and maintain logical and physical data models; they have a wide array of capabilities to aid modelers, and they can automatically generate many different kinds of databases from the models that are created. Other CASE tools (e.g., Oracle Designer) come bundled with database management systems (e.g., Oracle), and they are particularly good for modeling databases that will be built in their companion database products. A final option is to use a full-service CASE tool, such as Visible Analyst Workbench, in which data modeling is one of many capabilities and which can be used with many different databases. These tools have decent data modeling capabilities, although they do not have as much functionality and support for data modeling as the two other kinds of CASE software. A benefit of the full-service CASE tool is that it integrates the data model information well with other relevant parts of the project.

In this chapter, we focus on creating a logical data model. Although there are several ways to model data, we will present one of the most commonly used techniques: entity relationship diagramming, a graphic drawing technique developed by Peter Chen¹ that shows all the data components of a business system. We will first describe how to create an entity relationship diagram (ERD) and discuss some style guidelines. Then, we will present a technique called normalization that helps analysts validate the data models that they draw. The chapter ends with a discussion of how data models balance, or interrelate, with the process models that you learned about in Chapter 6.

THE ENTITY RELATIONSHIP DIAGRAM

An entity relationship diagram (ERD) is a picture that shows the information that is created, stored, and used by a business system. An analyst can read an ERD to find out the individual pieces of information in a system and how they are organized and related to each other. On an ERD, similar kinds of information are listed together and placed inside boxes called entities. Lines are drawn between entities to represent relationships among the data, and special symbols are added to the diagram to communicate high-level business rules that need to be supported by the system. The ERD implies no order, although entities that are related to each other are usually placed close together.

For example, consider the doctor’s office system that was described in Chapter 6. Although we understand how the system will work from studying the data flow diagram (DFD), we have very little detailed understanding about the information itself that flows through the system. What exactly is “patient information”? What pieces of information are captured when making an “appointment”? How is a patient related to the appointment that he or she makes?

¹ P. Chen, “The Entity-Relationship Model—Toward a Unified View of Data,” ACM Transactions on Database Systems, 1976, 1:9–36.

Reading an Entity Relationship Diagram

The analyst can answer these questions and many more by examining the ERD that is presented in Figure A-1. First, the analyst knows that the data to support the appointment system can be organized into six main categories (patient, appointment, doctor, bill, payment, and insurance company) and that patient information includes the patient’s ID number, last name, first name, address, phone number, and birthdate. The analyst understands what information is used to uniquely identify a patient, an appointment, a doctor, and so on by looking for the information in the top box of each category. For example, the patient ID number is used to identify a particular patient.

The lines connecting the six categories of information communicate the relationships that the categories share. By reading the relationship lines, the analyst understands that a doctor can be scheduled for an appointment that is scheduled by a patient.

The ERD also communicates high-level business rules. Business rules are constraints or guidelines that are followed during the operation of the system; they are rules like “only one person can be seen by a doctor at a time” or “payments must be made in the form of cash or check” or “a bill can be paid by insurance, a patient, or a combination of both.” Over the course of a workday, people are constantly applying business rules to do their jobs, and they know what the rules are because they either have been taught them (the office manager will know from experience that credit card payments are not accepted) or they can look them up somewhere. If a situation arises and people don’t know how to handle it, they may have to refer to a policy guide or written procedure to determine what the business rules are.

On a data model, business rules are communicated by the kinds of relationships that the entities share. From the ERD, for example, we know that a patient can schedule many appointments from the round dot placed on the line closest to appointment, and we know that a person does not have to be insured to become a patient because of the diamond on the line nearest the insurance company. A doctor can be scheduled for many appointments, and each appointment generates one bill (communicated by the number one on the line closest to bill). Both the patient and insurance company make payments, and this rule makes sense given that a bill can have many payments associated with it. Ultimately, the new system should support the business rules we just described, and it should ensure that users don’t violate the rules when performing the processes of the system.

Now that you’ve seen an ERD, let’s step back and learn the ERD basics. In the following sections, we will first describe the syntax of the ERD using the diagram in Figure A-1. Then we will teach you how to create an ERD using an example from CD Selections.

Elements of an Entity Relationship Diagram

There are three basic elements in the data modeling language (entities, attributes, and relationships), each of which is represented by a different graphic symbol. There are many different sets of symbols that can be used on an ERD. No one set of symbols dominates industry use, and none is necessarily better than another. We will use IDEF1X in this book. Figure A-2 summarizes the three basic elements of ERDs and the symbols we will use.

Entity

The entity is the basic building block for a data model. It is a person, place, event, or thing about which data is collected (for example, an employee, an order, or a product). An entity is depicted by a rectangle, and it is described by a singular noun spelled in capital letters. All entities have a name, a short description that explains what they are, and an identifier that is the way to locate information in the entity (which is discussed later). In Figure A-1, the entities are patient, doctor, appointment, payment, insurance company, and bill.

Entities represent something for which there exist multiple instances, or occurrences. For example, John Smith and Susan Jones could be instances of the entity patient (Figure A-3). We would expect the patient entity to stand for all of the people who have scheduled an appointment, and each of them would be an instance in the patient entity.

Attribute

An attribute is some type of information that is captured about an entity. For example, date of birth, home address, and last name are all attributes of a patient. It is easy to come up with hundreds of attributes for an entity (e.g., a patient has an eye color, a favorite hobby, a religious affiliation), but only those that will actually be used by a business process should be included in the model.

Attributes are nouns that are listed within an entity. Usually, some form of the entity name is appended to the beginning of each attribute to make it clear to which entity it belongs (e.g., PAT_lastname, PAT_address). Without doing this, you can get confused by multiple entities that have the same attributes; for example, a doctor and a patient both can have an attribute called “lastname.” DOC_lastname and PAT_lastname are much clearer attributes to place on the data model.
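The naming convention just described can be sketched in a few lines of Python. This is only an illustration, not part of any CASE tool: the `Entity` class, its `prefix` field, and the `add_attribute` helper are hypothetical names introduced here, and the two descriptions are paraphrased from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A logical-model entity: a singular noun, a short description, and attributes."""
    name: str          # singular noun, spelled in capitals on the ERD
    prefix: str        # hypothetical short form of the entity name, e.g. "PAT"
    description: str = ""
    attributes: list[str] = field(default_factory=list)

    def add_attribute(self, raw_name: str) -> str:
        """Store an attribute under its prefixed name and return that name."""
        qualified = f"{self.prefix}_{raw_name}"
        if qualified not in self.attributes:
            self.attributes.append(qualified)
        return qualified


patient = Entity("PATIENT", "PAT", "A person who has scheduled an appointment")
doctor = Entity("DOCTOR", "DOC", "A medical professional who sees patients")

# Both entities have a "lastname", but the prefixes keep the two apart.
for raw in ("lastname", "address"):
    patient.add_attribute(raw)
doctor.add_attribute("lastname")

print(patient.attributes)  # ['PAT_lastname', 'PAT_address']
print(doctor.attributes)   # ['DOC_lastname']
```

Because the prefix is applied in one place, two entities can share a raw attribute name without producing the same name on the model.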


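FIGURE A-1 Example Entity Relationship Diagram

[Diagram not reproduced. It shows the entities, attributes, and relationship labels listed here.]

Entities and attributes:
PATIENT: PAT_idnumber (identifier), PAT_firstname, PAT_lastname, PAT_address, PAT_city, PAT_state, PAT_zipcode, PAT_homephone, PAT_birthdate
DOCTOR: DOC_physicianidnumber (identifier), DOC_firstname, DOC_lastname, DOC_address, DOC_city, DOC_state, DOC_zipcode, DOC_homephone, DOC_pagenumber, DOC_primaryspecialty
APPOINTMENT: APP_date, APP_time, APP_reason, APP_status, APP_duration
BILL: BIL_number (identifier), BIL_amountinsured, BIL_amountnotinsured, BIL_datesent, BIL_status
PAYMENT: PAY_receiptnumber (identifier), PAY_amount, PAY_date, PAY_method
INSURANCE COMPANY: INS_name (identifier), INS_benefitscontact, INS_phonenumber, INS_claimsaddress, INS_claimssummaryinformation

Relationship labels (parent to child / child to parent):
DOCTOR and APPOINTMENT: is scheduled for / includes
PATIENT and APPOINTMENT: schedules / is scheduled by
APPOINTMENT and BILL: generates / is generated by (the number 1 appears on the line nearest BILL)
BILL and PAYMENT: is paid by / pays
PATIENT and PAYMENT: makes / is made by
INSURANCE COMPANY and PAYMENT: makes / is made by
INSURANCE COMPANY and PATIENT: insures / is insured by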

One or more attributes can serve as the identifier, the attribute(s) that can uniquely identify one instance of an entity, and the attributes that serve as the identifier are placed in the top box of the entity rectangle. If there are no patients with the same last name, then last name can be used as the identifier of the patient entity. In this case, if we need to locate John Brown, the name Brown would be sufficient to identify the one instance of the Brown last name.

Suppose we add a patient named Sarah Brown. Now we have a problem: using the name Brown would not uniquely lead to one instance; it would lead to two (i.e., John Brown and Sarah Brown). You have three choices at this point, and all are acceptable solutions. First, you can use a combination of multiple fields to serve as the identifier (last name and first name). This is called a concatenated identifier because several fields are combined, or concatenated, to uniquely identify an instance. Second, you can find a field that is unique for each instance, like the patient ID number. Third, you can wait to assign an identifier (like a randomly generated number that the system will create) until the design phase of the SDLC (Figure A-4). Many data modelers don’t believe that randomly generated identifiers belong on a logical data model because they do not logically exist in the business process.
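The John Brown and Sarah Brown example can be checked mechanically. The sketch below assumes the instances are available as simple dictionaries; the helper name `is_unique_identifier` and the ID values `P001` and `P002` are made up for illustration.

```python
from collections import Counter


def is_unique_identifier(instances, fields):
    """Return True if the combination of `fields` is distinct for every instance."""
    keys = [tuple(row[f] for f in fields) for row in instances]
    return all(count == 1 for count in Counter(keys).values())


# Two patient instances; the ID numbers are invented sample values.
patients = [
    {"PAT_idnumber": "P001", "PAT_lastname": "Brown", "PAT_firstname": "John"},
    {"PAT_idnumber": "P002", "PAT_lastname": "Brown", "PAT_firstname": "Sarah"},
]

print(is_unique_identifier(patients, ["PAT_lastname"]))                   # False
print(is_unique_identifier(patients, ["PAT_lastname", "PAT_firstname"]))  # True
print(is_unique_identifier(patients, ["PAT_idnumber"]))                   # True
```

A check like this only shows that a candidate identifier is unique in the data seen so far. Whether it will stay unique (a second John Brown may arrive tomorrow) is still a judgment the analyst has to make.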

Relationship

Relationships are associations between entities, and they are lines that connect the entities together. Every relationship has a parent entity and a child entity, the parent being the first entity in the relationship, and the child being the second.

Relationships should be clearly labeled with active verbs so that the connections between entities can be understood. If one verb is given to each relationship, it is read in two directions. For example, we could write the verb schedules alongside the relationship for the patient and appointment entities, and this would be read as “a patient schedules an appointment” and “an appointment is scheduled by a patient.” In Figure A-1, we have included words for both directions of the relationship line; the top words are read from parent to child, and the bottom words are read from child to parent. Notice that the doctor entity is the parent entity in the doctor – appointment relationship.

FIGURE A-2 Data Modeling Symbol Sets

[Symbols not reproduced. The figure shows how each element is drawn in three notations: IDEF1X, Chen, and Crow’s Foot(a).]

An ENTITY:
■ Is a person, place, or thing
■ Has a singular name spelled in all capital letters
■ Has an identifier
■ Should contain more than one instance of data

An ATTRIBUTE:
■ Is a property of an entity
■ Should be used by at least one business process
■ Is broken down to its most useful level of detail

A RELATIONSHIP:
■ Shows the association between two entities
■ Has modality (0,1)
■ Has cardinality (1,M)
■ Is described with a verb phrase

(a) This is the notation that will be used throughout the textbook.

Cardinality

Relationships have two properties. First, a relationship has cardinality, which is the ratio of parent instances to child instances. To determine the cardinality for a relationship, we ask ourselves: “How many instances of one entity are associated with an instance of the other?” (Remember that an instance is one occurrence of an entity, such as patient John Smith, Dr. Dave Brousseau, or Aetna Insurance Company.) For example, how many appointments can a patient schedule? How many patients are associated with an appointment? How many insurance companies can a patient have? The cardinality for binary relationships (i.e., relationships between two entities) is 1 : 1, 1 : N, or M : N, and we will discuss each in turn.

The 1 : 1 relationship (read as “one to one”) means that one instance of the parent entity is associated with one instance of the child entity. Notice that there is a 1 : 1 relationship between appointment and bill in Figure A-1. Each appointment (the parent entity) always results in a single bill that is generated, and a bill (the child entity) is associated with only one appointment. The model contains a solid dot and a number “1” on the relationship line nearest the child entity to designate the 1 : 1 relationship.

More often, relationships are 1 : N relationships (read as “one to many”). In this kind of relationship, a single instance in a parent entity is associated with many instances of a child entity; however, the child-entity instance is only related to one instance in the parent. For example, a patient (parent entity) can schedule many appointments (child entity), but a particular appointment is scheduled by only one patient, suggesting a 1 : N relationship between patient and appointment. Another 1 : N relationship is between insurance company and payment (an insurance company can make many payments, but a specific payment is only associated with one insurance company). A character resembling a solid dot is placed closest to appointment to show the “many” end of the relationship.


FIGURE A-3 Entities and Instances

Entity: Patient
Example instances: John Smith, Susan Jones, Peter Todd, Dale Turner, Pat Turner

FIGURE A-4 Choices for Identifiers

Concatenated identifier: PATIENT, identified by PAT_lastname and PAT_firstname together
Single identifier: PATIENT, identified by PAT_Idnumber; other attributes PAT_lastname, PAT_firstname
Identifier to be added later: PATIENT, no identifier yet; attributes PAT_lastname, PAT_firstname

The parent entity is always on the “1” side of the 1 : N relationship. Can you identify several other 1 : N relationships in Figure A-1? Identify the parent and child entities for each relationship.

A third kind of relationship is the M : N relationship (read as “many to many”). In this case, many instances of a parent entity can be related to many instances of a child entity. There are no M : N relationships in Figure A-1, but take a look at Figure A-5, which provides two M : N relationship examples. The first example demonstrates that one patient (parent entity) can display many different symptoms at a time (e.g., sore throat, fever, runny nose), and a symptom (child entity) can be associated with many different patients. The second M : N in Figure A-5 communicates that a doctor can have many different specialties (e.g., infant contagious disease, cardiology, neurology), and a specialty can be associated with many different doctors. M : N relationships are depicted on an ERD by having solid dots at both ends of the relationship line.
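The question “How many instances of one entity are associated with an instance of the other?” can be asked of sample data. The following sketch classifies a list of (parent instance, child instance) pairs; the function name and the sample pairs are hypothetical, and the result only describes the sample. The true cardinality is a business rule that the analyst confirms with users.

```python
def classify_cardinality(pairs):
    """Classify observed (parent, child) instance pairs as 1 : 1, 1 : N, or M : N."""
    children_per_parent = {}
    parents_per_child = {}
    for parent, child in pairs:
        children_per_parent.setdefault(parent, set()).add(child)
        parents_per_child.setdefault(child, set()).add(parent)

    parent_has_many = any(len(c) > 1 for c in children_per_parent.values())
    child_has_many = any(len(p) > 1 for p in parents_per_child.values())

    if parent_has_many and child_has_many:
        return "M : N"
    if parent_has_many:
        return "1 : N"
    if child_has_many:
        return "N : 1"  # the "many" side is the entity that was passed as parent
    return "1 : 1"


appointment_bill = [("appt-1", "bill-1"), ("appt-2", "bill-2")]
patient_appointment = [
    ("John Smith", "appt-1"),
    ("John Smith", "appt-2"),
    ("Susan Jones", "appt-3"),
]
patient_symptom = [
    ("John Smith", "fever"),
    ("John Smith", "sore throat"),
    ("Susan Jones", "fever"),
]

print(classify_cardinality(appointment_bill))     # 1 : 1
print(classify_cardinality(patient_appointment))  # 1 : N
print(classify_cardinality(patient_symptom))      # M : N
```

An "N : 1" result suggests that the parent and child roles were assigned the wrong way around, since the parent entity belongs on the “1” side.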

Modality

Second, relationships have a modality of null or not null, which refers to whether or not an instance of a child entity can exist without a related instance in the parent entity. Basically, the modality of a relationship indicates whether the child-entity instance must participate in the relationship. It forces you to ask questions like: Can you have an appointment without a doctor? Can you have a bill without an appointment? Modality is depicted by placing a diamond on the relationship line next to the parent entity if nulls are allowed.


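FIGURE A-5 M : N Relationships

[Diagram not reproduced. It shows the entities, attributes, and relationship labels listed here.]

Entities and attributes:
PATIENT: PAT_idnumber, PAT_firstname, PAT_lastname, PAT_address, PAT_city, PAT_state, PAT_zipcode, PAT_homephone, PAT_birthdate, INS_name
SYMPTOM: SYM_name, SYM_description
DOCTOR: DOC_physicianidnumber, DOC_firstname, DOC_lastname, DOC_address, DOC_city, DOC_state, DOC_zipcode, DOC_homephone, DOC_pagenumber
SPECIALTY: SPE_name, SPE_description

Relationship labels:
PATIENT and SYMPTOM: comes down with / is shown by
DOCTOR and SPECIALTY: can help patients regarding / is assigned to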

In the first two questions we asked, the answer is no: of course you need a doctor to have an appointment, and of course you can’t send a bill without an associated appointment. Thus, the relationships are left alone, and the analyst understands that the modality is “not null” for both of these relationships. Notice, however, that a diamond has been placed on the relationship line next to insurance company. This means that we can have a patient in our system who does not have an insurance company, and the modality is “null.”
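One way to picture the two modalities is as optional versus required links from a child instance to its parent. The sketch below is an assumption-laden illustration: storing the parent's identifier inside the child record is a design-phase choice, and the class names, field names, and sample dates are hypothetical (the doctor ID reuses the sample value from Figure A-9).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    pat_idnumber: str
    ins_name: Optional[str] = None  # "null" modality: a patient may be uninsured


@dataclass
class Appointment:
    app_date: str
    app_time: str
    doc_physicianidnumber: Optional[str]  # "not null" modality, checked below


def modality_violations(appointment: Appointment) -> list[str]:
    """List the required ("not null") parent links that are missing."""
    problems = []
    if not appointment.doc_physicianidnumber:
        problems.append("appointment has no doctor")
    return problems


uninsured = Patient("P001")  # allowed: nulls are permitted on this relationship
ok = Appointment("2024-03-01", "09:30", "334997300")
bad = Appointment("2024-03-01", "10:00", None)

print(uninsured.ins_name)        # None
print(modality_violations(ok))   # []
print(modality_violations(bad))  # ['appointment has no doctor']
```

The uninsured patient passes without complaint, while the appointment without a doctor is reported, which mirrors the diamond (nulls allowed) and its absence (not null) on the ERD.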

The Data Dictionary and Metadata

As we described earlier, a CASE tool is used to help build ERDs. Every CASE tool has something called a data dictionary, which quite literally is where the analyst goes to define or look up information about the entities, attributes, and relationships on the ERD. Figures A-6, A-7, and A-8 illustrate common data dictionary entries for an entity, an attribute, and a relationship; notice the kinds of information the data dictionary captures about each element.

FIGURE A-6 Data Dictionary Entry for the Patient Entity (Shown Using ERwin)

YOUR TURN A-1 Understanding the Elements of an ERD

A wealthy businessman owns a large number of paintings that he loans to museums all over the world. He is interested in setting up a system that records what he loans to whom so that he doesn’t lose track of his investments. He would like to keep information about the paintings that he owns as well as the artists who painted them. He also wants to track the various museums that reserve his art along with the actual reservations. Obviously artists are associated with paintings, paintings are associated with reservations, and reservations are associated with museums.

Questions:

1. Draw the four entities that belong on this data model.
2. Provide some basic attributes for each entity, and select an identifier, if possible.
3. Draw the appropriate relationships between the entities and label them.
4. What is the cardinality for each relationship? Depict this on your drawing.
5. What is the modality for each relationship? Depict this on your drawing.
6. List two business rules that are communicated by your ERD.

The information you see in the data dictionary is called metadata, which, quite simply, is data about data. Metadata is anything that describes an entity, attribute, or relationship, such as entity names, attribute descriptions, and relationship cardinality, and it is captured to help designers better understand the system that they are building and to help users better understand the system that they will use. Figure A-9 lists typical metadata that are found in the data dictionary. Notice that the metadata can describe an ERD element (like entity name) and also information that is helpful to the project team (like the user contact, the analyst contact, and special notes).

Metadata is stored in the data dictionary so that it can be shared and accessed by developers and users throughout the SDLC. The data dictionary allows you to record the standard pieces of information about your elements in one place, and it makes that information accessible to many parts of a project. For example, the data elements in a data model also appear on the process models as data stores and data flows, and on the user interface as fields on an input screen. When you make a change in the data dictionary, the change ripples to the relevant parts of the project that are affected.

FIGURE A-7 CASE Entry for Patient

Attribute Name: PAT_idnumber
Entity: PATIENT
Aliases: Patient ID Number, Social Security Number, SSN
Description: The ID number assigned by the doctor’s office that uniquely identifies a patient.
Notes: Usually, the patient’s social security number is used. When a patient does not have a social security number, the number assigned by the patient’s insurance provider is used. If the patient does not have an insurance provider, a driver’s license number is used.
Notes: If the patient’s insurance provider number is used and it is greater than 9 characters, the first 9 characters are used to create this attribute. Zeros are added to the number if it is less than 9 characters.
Data Type: Alpha-Numeric
Identifier: Yes

FIGURE A-8 CASE Entry for the Insurance Company and Patient Relationship

Parent Entity: INSURANCE COMPANY
Relationship: Insures/Is insured by
Child Entity: PATIENT
Description: An insurance company insures patients, usually through work-related benefits. A patient can be insured by an insurance company, although patients exist without insurance.
Notes: An insurance company can be affiliated with any number of patients.
Cardinality: 1 : N
Modality: Nulls allowed
Notes: A patient can exist without insurance.
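The note recorded in Figure A-7 about insurance provider numbers is precise enough to sketch as code. The function name below is hypothetical, and the figure does not say on which side the zeros are added; left-padding is assumed here.

```python
def normalize_patient_id(raw: str, length: int = 9) -> str:
    """Keep the first `length` characters, or pad with zeros if the value is shorter."""
    cleaned = raw.strip()
    if len(cleaned) > length:
        return cleaned[:length]
    return cleaned.rjust(length, "0")  # assumption: zeros are added on the left


print(normalize_patient_id("ABC1234567890"))  # ABC123456
print(normalize_patient_id("12345"))          # 000012345
print(normalize_patient_id("123456789"))      # 123456789
```

Writing the rule down this way also exposes the open question about padding, which is the kind of detail worth recording in the entry's notes.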

When metadata is complete, clear, and shareable, the information can be used to integrate the different pieces of the analysis phase and ultimately lead to a much better design. The metadata becomes much more detailed as the project evolves through the SDLC.


YOUR TURN A-2 Evaluate Your CASE Tool

Examine the CASE tool that you will be using for your project, or find a CASE tool on the Web that you are interested in learning about. What kind of metadata does its data dictionary capture? Does the CASE tool integrate data model information with other parts of a project? How?

FIGURE A-9 Types of Metadata Captured by the Data Dictionary

Each ERD element is listed with its kinds of metadata and an example of each.

ERD Element: Entity
    Name: Doctor
    Definition: A doctor is any medical professional who has been hired to see patients during scheduled appointments in the doctor’s office.
    Special notes: A nurse practitioner is considered a doctor within this system.
    User contact: Virginia Baker is the office manager (x4335), and she can provide information about the doctor entity.
    Analyst contact: Barbara Wixom is the analyst assigned to this entity.

ERD Element: Attribute
    Name: DOC_physicianidnumber
    Definition: A physicianidnumber is a number assigned to a doctor by the medical industry. The number was put in place to track physicians and the prescriptions they make.
    Alias: Physician identification number, PID
    Sample values: 334997300
    Acceptable values: Any 9-digit number
    Format: 9 digits, no spaces or dashes
    Type: Numeric
    Special notes: If a nurse practitioner does not have a PID, then his or her social security number will be used.

ERD Element: Relationship
    Verb phrase: Schedules
    Parent entity: Doctor
    Child entity: Appointment
    Definition: One doctor is scheduled to participate in an appointment, and he or she can be scheduled for many appointments over time. A single appointment can only have one doctor affiliated with it.
    Cardinality: 1 : N
    Modality: Not null
    Special notes: Investigate how walk-in patients are handled! Are they placed on the schedule like regularly scheduled appointments?
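To make the idea of metadata concrete, the attribute entry from Figure A-9 can be held in a small data structure and used to check candidate values. This is a rough sketch rather than how any particular CASE tool stores its dictionary: the `AttributeEntry` class and its `format_pattern` field (a regular expression standing in for the “Format” line) are assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class AttributeEntry:
    """One attribute's metadata, loosely following the fields in Figure A-9."""
    name: str
    entity: str
    definition: str
    aliases: tuple[str, ...]
    data_type: str
    format_pattern: str  # hypothetical: a regex standing in for the "Format" field

    def accepts(self, value: str) -> bool:
        """Check a candidate value against the recorded format."""
        return re.fullmatch(self.format_pattern, value) is not None


data_dictionary = {
    "DOC_physicianidnumber": AttributeEntry(
        name="DOC_physicianidnumber",
        entity="DOCTOR",
        definition="A number assigned to a doctor by the medical industry.",
        aliases=("Physician identification number", "PID"),
        data_type="Numeric",
        format_pattern=r"[0-9]{9}",  # 9 digits, no spaces or dashes
    )
}

entry = data_dictionary["DOC_physicianidnumber"]
print(entry.accepts("334997300"))    # True
print(entry.accepts("334-99-7300"))  # False
```

Because the format lives in one place, an input screen and a database constraint could both be derived from the same entry, which is the kind of sharing the data dictionary is meant to support.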

CREATING AN ENTITY RELATIONSHIP DIAGRAM

Drawing an ERD is an iterative process of trial and revision. It usually takes considerable practice. ERDs can become quite complex; in fact, there are systems that have ERDs containing hundreds or thousands of entities. The basic steps in building an ERD are these: (1) identify the entities, (2) add the appropriate attributes to each entity, and then (3) draw relationships among entities to show how they are associated with one another. First, we will describe the three steps in creating ERDs using the data model example from Figure A-1. Then, we will present an ERD for CD Selections.

Building Entity Relationship Diagrams

Step 1: Identify the Entities

As we explained, the most popular way to start an ERD is to first identify the entities for the model and their attributes. The entities should represent the major categories of information that you need to store in your system. If the process models (e.g., DFDs) have been prepared, the easiest way to start is with them: the data stores on the DFDs, the external entities, and the data flows indicate the kinds of information that are captured and flow through the system. If you begin your data model using a use case, look at the major inputs to the use case, the major outputs, and the information used for the use-case steps.

Examine the process model (Figure 6.12) and the use cases (Figure 5.1) for the doctor’s office system. As you look at the doctor’s office system’s level 0 DFD, you see that there are three data stores: patient, appointment, and billing. Each of these unique types of data probably will be represented using entities on our data model.

As a next step, you should examine the external entities and ask, “Will the system need to capture information about any of these entities?” We are already capturing patient information, but will we also need doctor and insurance company data? The answer is yes: we need information about doctors and insurance companies. The make appointments process will certainly need to include what doctor is affiliated with the appointment, and the perform billing process likely cannot be performed without information about the patient’s insurer.

We should note that if the doctor’s office only contained one doctor, then there would be no need for a doctor entity. A rule of thumb is to include only entities that contain more than one instance. We will assume that this doctor’s office has multiple doctors who can see patients, so we include doctor on the ERD.

Finally, it is good to look at the data flows and see if there is information that flows through the system that has not yet been captured by the ERD. Notice that there is information called a bill that flows from the perform billing process. This seems logically different from the other entities. Remember that creating ERDs is an iterative process, so we can go ahead and add an entity called bill for now to store bill information; we can always make changes down the road if this turns out to be a bad assumption.

Step 2: Add Attributes and Assign Identifiers

The information that describes each entity becomes its attributes. It is likely that you identified a few attributes if you read the doctor’s office system use cases and paid attention to the information flows on its DFDs. For example, a patient has a name and information, and an appointment has a date. Unfortunately, much of the information from the process models and use cases is at too high a level to identify the exact attributes that should exist in each of our six entities.

On a real project, there are a number of places where you can go to figure out what attributes belong in your entity. For one, you can check in the CASE tool: often an analyst will describe a process model data flow in detail when he or she enters the data flow into the CASE repository. For example, an analyst may create an entry for the patient information data flow like the one shown in Figure A-10 that lists nine data elements that comprise patient information. The elements of the data flow should be added to the ERD as attributes in your entities. A second approach is to check the requirements definition. Often there is a section under functional requirements called data requirements. This section describes the data needs for the system that were identified while gathering requirements. A final approach to identifying attributes is to use requirements-gathering techniques. The most effective techniques would be interviews (e.g., asking people who create and use reports about their data needs) or document analysis (e.g., examining existing reports or input screens).

Once the attributes are identified, one or more of them will become the entity’s identifier. The identifier must be an attribute (or a combination of attributes) that uniquely identifies a single instance of the entity. Look at Figure A-1 and notice the identifiers that were selected for each entity. Five of the entities had some attribute that could identify an instance by itself (e.g., physician ID number, bill number, payment receipt number, insurance company name, and patient ID).
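One informal way to test a candidate identifier is to check that no two instances share the same value for it. The sketch below is a minimal illustration in Python, not part of the doctor’s office system: the helper name `is_unique_identifier` and the sample patient records are assumptions made up for this example. Sample data can only rule a candidate out; confirming an identifier still requires checking the business rules with users.

```python
from typing import Iterable, Mapping, Sequence


def is_unique_identifier(instances: Iterable[Mapping[str, object]],
                         attributes: Sequence[str]) -> bool:
    """Return True if the attribute(s) take a distinct value for every instance."""
    seen = set()
    for instance in instances:
        key = tuple(instance[name] for name in attributes)
        if key in seen:
            return False  # two instances share this value, so it cannot identify one
        seen.add(key)
    return True


# Hypothetical patient instances, invented for illustration only.
patients = [
    {"PAT_idnumber": 101, "PAT_lastname": "Smith", "PAT_birthdate": "1980-02-14"},
    {"PAT_idnumber": 102, "PAT_lastname": "Smith", "PAT_birthdate": "1975-07-30"},
    {"PAT_idnumber": 103, "PAT_lastname": "Jones", "PAT_birthdate": "1980-02-14"},
]

print(is_unique_identifier(patients, ["PAT_lastname"]))   # False
print(is_unique_identifier(patients, ["PAT_idnumber"]))   # True
```

Passing more than one attribute name checks a combined identifier in the same way.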

Step 3: Identify Relationships

The last step in creating ERDs is to determine how the entities are related to each other. Lines are drawn between entities that have relationships, and each relationship is labeled and assigned both a cardinality and a modality. The easiest approach is to begin with one entity and determine all the entities with which it shares relationships. For example, there likely are relationships between a patient and appointment, payment, and insurance company. We know that a patient schedules appointments, makes payments, and is insured by an insurance company. The same exercise is done with each entity until all of the relationships have been identified. Each relationship should be labeled appropriately as it is added.

When you find a relationship to include on the model, you need to determine its cardinality and modality. For cardinality, ask how many instances of each entity participate in the relationship. You know that an insurance company can have many patients, but that a patient is affiliated with only one insurer. This suggests that there is a 1 : N relationship in which the insurance company is the parent entity (the “1”) and the patient is the child entity (the “many”). Next we examine the relationship’s modality. Can a patient exist without an insurance company? We already determined that the answer is “yes,” so the modality for the relationship is “null.” This exercise is done for every relationship. Notice the cardinality and modality for each relationship in Figure A-1.
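To make the two ideas concrete, here is a rough sketch of how the insurance company and patient relationship might eventually look in a relational database, using Python’s built-in `sqlite3` module. This jumps ahead to physical design, and the table names, column names, and sample rows are assumptions chosen for illustration. The single insurer column on the child table expresses the 1 : N cardinality, and allowing that column to be empty expresses the “null” modality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE insurance_company (
    ins_name TEXT PRIMARY KEY
);
CREATE TABLE patient (
    pat_idnumber INTEGER PRIMARY KEY,
    pat_lastname TEXT NOT NULL,
    -- One column: at most one insurer per patient (the "1" side).
    -- NULL allowed: a patient may exist without an insurer ("null" modality).
    ins_name TEXT REFERENCES insurance_company(ins_name)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO insurance_company VALUES ('Acme Health')")
conn.executemany("INSERT INTO patient VALUES (?, ?, ?)", [
    (1, "Smith", "Acme Health"),
    (2, "Jones", "Acme Health"),  # many patients can share one insurer
    (3, "Lee", None),             # a patient with no insurer is still valid
])

per_insurer = conn.execute(
    "SELECT ins_name, COUNT(*) FROM patient GROUP BY ins_name"
).fetchall()
print(dict(per_insurer))
```

If the business rule were that every patient must be insured, the insurer column would be declared `NOT NULL` instead.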

Again, remember that data modeling is an iterative process. Often the assumptions you make and the decisions you make change as you learn more about the business requirements and as changes are made to the use cases and process models. But, you have to start somewhere, so do the best you can with the three steps we just described and keep iterating until you have a model that works. Later in this chapter, we will show you a few ways to validate the ERDs that you draw.

Applying the Concepts at CD Selections

Let’s go through one more example of creating a data model using the context of CD Selections. For now, review the example use case that was presented in Figure 5-6 and the final level 0 process model presented in Figure 6-19.


Data flow name: Patient information

Data elements: identification number + first name + last name + address + city + state + zip code + home phone number + birth date

FIGURE A-10 Elements of the Patient Information Data Flow

Identify the Entities

When you examine the CD Selections level 0 DFD, you see that there are four data stores: CD information, inventory, marketing, and in-store holds. Each of these unique types of data likely will be represented using entities on a data model.

As a next step, you should examine the external entities and ask, “Will the system need to capture information about any of these entities?” You may be tempted to include marketing manager and in-store staff, but there really is no need to track information about either of these in our system. Later, we may want to track system users, passwords, and data access privileges, but this information has to do with the use of the new system and would not be added until the physical data model (which is created in the design phase).

Customer and vendor are different matters. We need to capture information about both of these external entities. First consider customer. The take-request process requires that a customer provide a name and some basic contact information that is used for holding and special ordering CDs. A customer entity is needed on the data model to store this information. Likewise, the maintain marketing materials process accepts basic vendor information when a vendor sends in marketing material. This information should be stored in a vendor entity.

The last external entity, special order system, would not be added to the data model because we don’t need to store or capture any information about it. Additionally, the special order system entity refers to only one system. A rule of thumb is to exclude from your data model any entity that has only one instance. If, however, our Internet order system interfaced with multiple systems and we tracked information about each of them for some business reason, then it would be wise to create an entity called system to represent the multiple systems involved (the special order system would be one instance of this entity).

It is good practice to also look at the data flows on your process model and make sure that all of the information that flows through the system has been covered by your ERD. Unlike the doctor’s office example (which found an additional entity by looking at the data flows), it appears that the main entities for CD Selections have been identified after examining the data stores and external entities.

To recap, by examining our process models, we have determined that our data model will contain six entities: CD, inventory, marketing material, in-store hold, customer, and vendor. See Figure A-11 for the beginning of our data model.

Identify the Attributes

The next step is to select which attributes should be used to describe each entity. It is likely that you identified a handful of attributes if you read the CD Selections use cases and examined the DFDs. For example, a CD has an artist, title, category, and “sale” status, and some attributes of customer are name and contact information, which likely includes address, phone number, and e-mail address.

As a second example, the inventory external entity suggests that a data store is needed to hold inventory information, and the take-request use case provides some indication of the kind of information that it will have. The use case says that the new system will find a store close to the customer and display the availability of the CD in that store. To do this, we need to capture the CD availability, or inventory information, such as the retail store that has the CD in stock. We should also include the store’s zip code because the system will find the nearest store to the customer based on this information.

FIGURE A-11 Entities for CD Selections ERD
Entities shown: MARKETING MATERIAL, CD, INVENTORY, VENDOR, CUSTOMER, HOLD

See if you can think about possible attributes to include in some of the other entities like Marketing Material and CD. Remember that on a real project, you would have the data dictionary, the requirements definition, and a variety of requirements-gathering techniques to use as sources for identifying attributes.

Figure A-12 shows the data model with some of the attributes that likely would be found on the CD Selections data model. The model shows that we would capture the type of marketing material (e.g., video clip, audio clip), a short description, the e-mail address of the person who provided the material, and the actual contents in the marketing material entity. The CD entity would have the following attributes: SKU, title, artist, category, and sale status (i.e., whether or not the CD is on sale).

One or more of the attributes will be used to uniquely identify an instance of each entity. E-mail address may be a good choice for the customer entity in case two customers have the same name. You know that every CD has a unique SKU (stock keeping unit), so this can be used as the identifier in the CD entity. In case there are two vendors with the same name, we may want to use a concatenated identifier that combines vendor name and address for the vendor entity. Notice that there do not seem to be obvious identifiers to use for the marketing material, inventory, and in-store hold entities. We will leave the identifier off for the time being until we learn about some special types of entities later in this chapter.
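A concatenated identifier can be pictured as a key made of two attributes taken together. The following sketch uses Python’s built-in `sqlite3` module with a hypothetical vendor table; the column names loosely follow the VEN_ attributes in Figure A-12, and the vendor rows are invented for illustration. Two vendors may share a name as long as their addresses differ, but the same name and address pair cannot appear twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE vendor (
    ven_name    TEXT NOT NULL,
    ven_address TEXT NOT NULL,
    ven_phone   TEXT,
    PRIMARY KEY (ven_name, ven_address)  -- concatenated identifier
)
""")

# Hypothetical vendors: same name, different addresses.
conn.execute("INSERT INTO vendor VALUES ('Sound Supply', '12 Oak St', '555-0100')")
conn.execute("INSERT INTO vendor VALUES ('Sound Supply', '98 Elm Ave', '555-0199')")

try:
    # Same name AND same address as the first row: the identifier is not unique.
    conn.execute("INSERT INTO vendor VALUES ('Sound Supply', '12 Oak St', '555-0000')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

vendor_count = conn.execute("SELECT COUNT(*) FROM vendor").fetchone()[0]
print(vendor_count, duplicate_rejected)  # 2 True
```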

Identify the Relationships

The last step in creating ERDs is to determine how the entities are related to each other. Lines are drawn between entities that have relationships, and each relationship is labeled and assigned both a cardinality and a modality. For example, there likely is some kind of relationship between a CD and marketing material used to promote it. When you probe into the cardinality of the relationship, you discover that a CD can have many different kinds of marketing material (e.g., audio clip, video clip, jacket cover photo), but a specific piece of marketing material is only associated with one CD. This suggests that there is a 1 : N relationship between the CD and marketing material entities.

FIGURE A-12 Attributes and Identifiers for CD Selections ERD
MARKETING MATERIAL: MAR_type, MAR_description, MAR_email, MAR_content
CD: CD_sku, CD_title, CD_artist, CD_category, CD_salestatus
INVENTORY: INV_store, INV_zipcode
VENDOR: VEN_name, VEN_address, VEN_city, VEN_state, VEN_zip, VEN_contact, VEN_phone
CUSTOMER: CUS_email, CUS_firstname, CUS_lastname, CUST_city, CUST_address, CUST_state, CUST_zip, CUST_daytimephone
HOLD: HOL_date

Next we examine the modality. Can marketing material exist without a CD? Well, maybe. There is a chance that a vendor will send us information before a CD is available for sale. In this case, we may want to store the marketing material until we are ready to use it. Therefore, we would make the modality of the relationship “null.” Look at Figure A-13. Notice the labels, cardinality, and modality that have been included on the diagram.

Advanced Syntax

Now that we have created a data model according to the basic syntax that was presented earlier, we can move to some more advanced concepts. There are three special types of entities that need to be explained and added to the diagram to make it clearer.

Independent Entity

The first type of entity is an independent entity, an entity that can exist without the help of another entity, such as patient, doctor, and insurance company. These three entities all have identifiers that were created using their own attributes (i.e., patient ID number, physician ID number, and insurer name). In other words, attributes from other entities were not needed to uniquely identify an instance. For example, using the patient ID number is sufficient for uniquely identifying a patient in the system shown in Figure A-1. Information from the insurance company, appointment, or payment is not needed to identify a patient, even though these entities share relationships with the patient entity. Independent entities are depicted using rectangles with straight corners.

FIGURE A-13 Relationships for CD Selections ERD
Entities and attributes: as listed for Figure A-12, with the hold entity labeled IN-STORE HOLD
Relationship labels: provides / is provided by; is described by / describes; is stored by / stores; places / is placed by; reserves / is reserved by

When relationships have an independent child entity, they are called nonidentifying relationships, and they are denoted using a dotted relationship line. This name originated because the attributes from the parent entity were not used as part of the child’s identifier. The nonidentifying relationships in Figure A-1 include the relationships between appointment and bill, appointment and patient, patient and payment, bill and payment, insurance company and payment, and insurance company and patient.


I have two very different stories regarding data models. First, when I worked with First American Corporation, the head of Marketing kept a data model for the marketing systems hanging on a wall in her office. I thought this was a little unusual for a high-level executive, but she explained to me that data was critical for most of the initiatives that she puts in place. Before she can approve a marketing campaign or new strategy, she likes to confirm that the data exists in the systems and that it’s accessible to her analysts. She has become very good at understanding ERDs over the years because they had been such an important communications tool for her to use with her own people and with IT.

On a very different note, here is a story I received from a friend of mine who heads up an IT department:

“We were working on a business critical, time dependent development effort, and VERY senior management decided that the way to ensure success was to have the various teams do technical design walkthroughs to senior management on a weekly basis. My team was responsible for the data architecture and database design. How could senior management, none of whom probably had ever designed an Oracle architecture, evaluate the soundness of our work?

So, I had my staff prepare the following for the one (and only) design walkthrough our group was asked to do. First, we merged several existing data models and then duplicated each one … that is, every entity and relationship printed twice (imitating, if asked, the redundant architecture). Then we intricately color coded the model and printed the model out on a plotter and printed one copy of every inch of model documentation we had. On the day of the review, I simply wheeled in the documentation and stretched the plotted model across the executive boardroom table. ‘Any questions,’ I asked? ‘Very impressive,’ they replied. That was it! My designs were never questioned again.”

—Barbara Wixom

Questions:

1. Based on these two stories, what do you think is the user’s role in data modeling?

2. When is it appropriate to involve users in the ERD creation process?

3. How can users help analysts create better ERDs?

CONCEPTS IN ACTION A-A The User’s Role in Data Modeling

Consider the following system that was described in Chapter 5. Use the use cases and process models that you created in Chapters 5 and 6 to help you answer the questions below.

The Campus Housing Service helps students find apartments. Owners of apartments fill in information forms about the rental units they have available (e.g., location, number of bedrooms, monthly rent). Students who register with the service can search the rental information to find apartments that meet their needs (e.g., a two-bedroom apartment for $400 or less per month within 1/2 mile of campus). They then contact the apartment owners directly to see the apartment and possibly rent it. Apartment owners call the service to delete their listing when they have rented their apartment(s).

Questions:

1. What entities would you include on a data model?

2. What attributes would you list for each entity? Select an identifier for each entity, if possible.

3. What relationships exist between the entities that you identified? Label the relationships appropriately, and denote the cardinality and modality of each relationship.

YOUR TURN A-3 Campus Housing System

Dependent Entity

You may have guessed that there are situations when a child entity does rely on attributes from the parent entity to uniquely identify an instance. In these cases, the child entity is called a dependent entity (or associative entity), and it needs external attributes to use for identification purposes.

Look at Figure A-14, which shows the appointment entity that originally appeared in Figure A-1. Alone, an appointment cannot be identified using the appointment’s date and time because likely there are many appointments that are scheduled concurrently. (Several doctors may be seeing patients at 10 AM on March 26.) In this case, you actually would need to add the identifier from the parent entity (doctor) to help identify a unique appointment. Appointment is considered a dependent entity, and the fact that it is a dependent entity is depicted using a rectangle with rounded corners.

When relationships have a dependent child entity, they are called identifying relationships, and they are denoted using a solid relationship line. This name originated because attributes from the parent entity were needed as part of the child’s identifier. The only identifying relationship in Figure A-1 is the one between doctor and appointment.
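The appointment example can be checked with a few lines of Python. This is only a sketch under stated assumptions: the `Appointment` class, the `find_identifier_clashes` helper, the physician ID numbers, and the year in the dates are all made up for illustration. It shows that date and time alone collide when two doctors see patients at the same moment, while adding the parent’s identifier removes the collision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appointment:
    doc_physicianidnumber: int  # identifier borrowed from the parent entity (doctor)
    app_date: str
    app_time: str
    app_reason: str = ""

    @property
    def identifier(self) -> tuple:
        """Full identifier of the dependent entity: parent key plus own attributes."""
        return (self.doc_physicianidnumber, self.app_date, self.app_time)


def find_identifier_clashes(appointments, key):
    """Return the key values shared by more than one appointment."""
    seen, clashes = set(), set()
    for appointment in appointments:
        value = key(appointment)
        if value in seen:
            clashes.add(value)
        seen.add(value)
    return clashes


# Hypothetical schedule: doctors 7 and 9 both see a patient at 10:00.
appointments = [
    Appointment(7, "2024-03-26", "10:00", "checkup"),
    Appointment(9, "2024-03-26", "10:00", "flu"),
    Appointment(7, "2024-03-26", "11:00", "follow-up"),
]

by_date_time = find_identifier_clashes(appointments, lambda a: (a.app_date, a.app_time))
by_full_id = find_identifier_clashes(appointments, lambda a: a.identifier)
print(by_date_time)  # {('2024-03-26', '10:00')}
print(by_full_id)    # set()
```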


The CD Selections ERD has been updated in Figure A-15. Which entities are dependent entities on this version? Locate the identifying relationships. How did you find them? Can you create a rule that describes the association between dependent entities and identifying relationships?

YOUR TURN A-5 Dependent Entities

FIGURE A-14 Dependent Entity Examples
DOCTOR: DOC_physicianidnumber, DOC_firstname, DOC_lastname, DOC_address, DOC_city, DOC_state, DOC_zipcode, DOC_homephone, DOC_pagenumber, DOC_primaryspecialty
APPOINTMENT: APP_date, APP_time, DOC_physicianidnumber, APP_reason, APP_status, APP_duration
Relationship label: is scheduled for / includes

Locate the independent entities on Figure A-13. How do you know which of the entities are independent? Locate the nonidentifying relationships. How did you find them? Can you create a rule that describes the association between independent entities and nonidentifying relationships?

YOUR TURN A-4 Independent Entities

Intersection Entity

A third kind of entity is called an intersection entity, and it exists on the basis of a relationship between two other entities. Typically, intersection entities are created to store information about two entities that share an M : N relationship. Think back to the M : N relationship between patient and symptom that was described in Figure A-5. Currently, one instance of a symptom (e.g., sore throat, fever, runny nose) could occur with many patients, and a patient could have many symptoms at one time. A difficulty arises if we want to capture the date upon which a particular patient demonstrates one or more symptoms. For example, we can’t place a date in the patient entity because a patient gets sick many times. Alternatively, the date doesn’t belong in the symptom entity because it contains symptoms that are shared by all kinds of patients. In this case, an intersection entity is needed to capture attributes about the relationship itself, that is, attributes that describe when a particular patient actually has a particular symptom.

There are three steps to adding an intersection entity, and this process usually is called “resolving an M : N relationship” because it eliminates the M : N relationship and problems associated with it from the data model. Step 1: Remove the M : N relationship line between the two entities and add a third entity in between the two existing ones. Step 2: Add two 1 : N relationships to the model. The two original entities should serve as the parent entities for each 1 : N, and the new intersection entity becomes the child entity in both relationships. Step 3: Name the intersection entity. Many times intersection entities are named using a concatenation of the two entities that created it (e.g., patient symptom), making its purpose clear, or the entity can be given another appropriate name (e.g., illness). Figure A-16 shows the M : N relationship and how it was resolved using an intersection entity.
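The three steps above can be sketched with Python’s built-in `sqlite3` module. The table and column names loosely follow Figure A-16, but the sample rows are invented, and making the date part of the intersection entity’s identifier is an assumption of this sketch (it lets a patient show the same symptom on different dates), not something the figure states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE patient (pat_idnumber INTEGER PRIMARY KEY, pat_lastname TEXT);
CREATE TABLE symptom (sym_name TEXT PRIMARY KEY, sym_description TEXT);
-- Intersection entity: the child in two 1 : N relationships.
CREATE TABLE patient_symptom (
    pat_idnumber INTEGER NOT NULL REFERENCES patient(pat_idnumber),
    sym_name     TEXT    NOT NULL REFERENCES symptom(sym_name),
    ps_date      TEXT    NOT NULL,  -- attribute of the relationship itself
    PRIMARY KEY (pat_idnumber, sym_name, ps_date)  -- assumption: date is part of the key
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO patient VALUES (?, ?)", [(1, "Smith"), (2, "Jones")])
conn.executemany("INSERT INTO symptom VALUES (?, ?)", [
    ("fever", "raised temperature"),
    ("sore throat", "pain when swallowing"),
])
conn.executemany("INSERT INTO patient_symptom VALUES (?, ?, ?)", [
    (1, "fever", "2024-01-10"),
    (1, "sore throat", "2024-01-10"),
    (2, "fever", "2024-02-03"),
    (1, "fever", "2024-03-15"),  # same patient, same symptom, later date
])

# The original M : N relationship can still be read in both directions.
fever_patients = [row[0] for row in conn.execute(
    "SELECT DISTINCT pat_idnumber FROM patient_symptom "
    "WHERE sym_name = 'fever' ORDER BY pat_idnumber")]
smith_symptoms = [row[0] for row in conn.execute(
    "SELECT DISTINCT sym_name FROM patient_symptom "
    "WHERE pat_idnumber = 1 ORDER BY sym_name")]
print(fever_patients)  # [1, 2]
print(smith_symptoms)  # ['fever', 'sore throat']
```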

Are intersection entities dependent or independent? Actually, it depends. Sometimes an intersection entity has a logical identifier that can uniquely identify instances within it. For example, an intersection entity between student and course (a student can take many courses and a course can be taken by many students) may be called transcript. If transcripts have unique transcript numbers, then the entity would be considered independent. In contrast, the illness intersection entity in Figure A-16 requires the identifiers from both symptom and patient for an instance to be uniquely identified. Thus, illness is a dependent entity.

FIGURE A-15 Dependent Entities for CD Selections
Entities, attributes, and relationship labels: as listed for Figure A-13, except that INVENTORY now lists INV_store, INV_zipcode, and INV_itemnumber

VALIDATING AN ERD

As you probably guessed from the previous section, creating ERDs is pretty tough. It takes a lot of experience to draw ERDs well, and there are not many black-and-white rules to help guide you. Luckily, there are some general design guidelines that you can keep in mind as you build ERDs, and once the ERDs are drawn, you can use a technique called normalization to validate that your models are well formed. Another technique is to check your ERD against your process models to make sure that both models balance with each other.


FIGURE A-16 Resolving an M : N Relationship
Before: PATIENT and SYMPTOM joined by one M : N relationship (comes down with / is shown by)
After: PATIENT (comes down with) PATIENT_SYMPTOM (is shown by) SYMPTOM
PATIENT: PAT_idnumber, INS_firstname, PAT_lastname, PAT_address, PAT_city, PAT_state, PAT_zipcode, PAT_homephone, PAT_birthdate
SYMPTOM: SYM_name, SYM_description
PATIENT_SYMPTOM: PS_date

Resolve the M : N relationship between the doctor and specialty that is shown in Figure A-5. What kinds of information could you capture about this relationship? What would the new ERD look like? Would the intersection entity be considered dependent or independent?

Can you think of other kinds of M : N relationships that exist in the real world? How would you resolve these M : N relationships if you were to include them on an ERD?

YOUR TURN A-6 Intersection Entities

Design Guidelines

Design guidelines are not rules that must be followed; rather, they are “best practices” that often lead to better-quality diagrams. For example, labels and naming conventions are important for creating clear ERDs. Names should not be ambiguous (e.g., name, number); instead, they should clearly communicate what the model component represents. These names should be consistent across the model and reflect the terminology used by the business. If CD Selections refers to people who order products as customers, the data model should include an entity called customer, not client or stakeholder.

There are no rules covering the layout of ERD components. They can be placed anywhere you like on the page, although most systems analysts try to put the entities together that are related to each other. If the model becomes too complex or busy (some companies have hundreds of entities on a data model), the model can be broken down into subject areas. Each subject area would contain related entities and relationships, and the analyst can work with one group of entities at a time to make the modeling process less confusing.

In general, data modeling can be quite tricky, mainly because the data model is heavily based on interpretation; therefore, when business rules change, likely the relationships or other data model components will have to be altered. Assumptions are an important part of data modeling. For example, we assumed that only one doctor can be scheduled for an appointment. What if a patient needs to see two different specialists at once? The data model would have to change to allow one appointment to have multiple doctors.

Therefore, when you data model, don’t panic or become overwhelmed by details. Rather, you should slowly add components to the diagram knowing that they will be changed and rearranged many times. Make assumptions along the way and then confirm these assumptions with the business users. Work iteratively and constantly challenge the data model with business rules and exceptions to see if the diagram is communicating the business system appropriately. Figure A-17 summarizes the guidelines presented in this chapter to help you evaluate your data model.

Normalization

Once you have created your ERD, there is a technique called normalization that can help analysts validate the models that they have drawn. It is a process whereby a series of rules is applied to a logical data model or a file to determine how well formed it is (Figure A-18). Normalization rules help analysts identify entities that are not represented correctly in a logical data model, or entities that can be broken out from a file. We describe here three normalization rules that are applied regularly in practice.

First Normal Form

A logical data model is in first normal form (1NF) if it does not contain attributes that have repeating values for a single instance of an entity. Often this problem is called repeating attributes, or repeating groups. Every attribute in an entity should have only one value per instance for the model to “pass” 1NF.

Let’s pretend that the CD Selections project team was given the layout for the special order file that is used by the existing special order system. The team is anxious to incorporate the data from this file into their own system, and they decide to put the file into third normal form to make the information easier to understand and ultimately easier for them to add to the data model for the new Internet order system. See Figure A-19 for the file layout that the project team received.

If you examine the file carefully, you should notice that there are two cases in which multiple values are captured for the same attribute for a single special order instance. The first violation is the customer’s book preferences, which are the kinds of books that the customer enjoys reading (e.g., horror, nonfiction, sci-fi). The fact that the attribute is plural leads us to believe that many different preferences are captured for each instance of a special order, and preference is a repeating attribute. This needs to be resolved by creating a new entity that contains preference information, and a relationship is added between special order and preference. See Figure A-20 for the data model in 1NF. The new relationship is M : N because a special order can be associated with many preferences, and a preference can be found on many special orders.

Additionally, notice that the special order file in Figure A-19 captures information about many books for a single special order, which is another violation of 1NF. This time, an entire group of attributes repeats (in this case up to three times) information about books each time an order is placed. This is called a repeating group, and it can be removed by creating a book entity and placing all of the book attributes into it. The relationship between the special order and book entities also is M : N; a book can be found on many special orders, and a special order can include many books.
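As a rough illustration of the 1NF step, the Python sketch below takes one flat special order record and moves the repeating attribute (preferences) and the repeating group (books) into their own collections. The field names, the sample values, and the function `to_first_normal_form` are assumptions made for this sketch; the actual file layout is the one in Figure A-19.

```python
# Hypothetical flat special order record; field names are assumed for illustration.
flat_order = {
    "order_date": "2024-05-01",
    "cust_lastname": "Smith",
    "cust_firstname": "Pat",
    "preferences": "horror, nonfiction, sci-fi",    # repeating attribute
    "book1_isbn": "111", "book1_title": "Title A",  # repeating group, slot 1
    "book2_isbn": "222", "book2_title": "Title B",  # slot 2
    "book3_isbn": "", "book3_title": "",            # slot 3 is unused
}


def to_first_normal_form(record, max_books=3):
    """Move repeating values out of the order so every attribute holds one value."""
    order_id = (record["order_date"], record["cust_lastname"], record["cust_firstname"])
    order = {
        "order_date": record["order_date"],
        "cust_lastname": record["cust_lastname"],
        "cust_firstname": record["cust_firstname"],
    }
    preferences = [p.strip() for p in record["preferences"].split(",") if p.strip()]
    books = []
    for slot in range(1, max_books + 1):
        isbn = record[f"book{slot}_isbn"]
        if isbn:  # skip empty slots in the repeating group
            books.append({"isbn": isbn, "title": record[f"book{slot}_title"]})
    return {
        "special_order": order,
        "preference": preferences,
        "book": books,
        # Link rows stand in for the two M : N relationships.
        "order_preference": [(order_id, p) for p in preferences],
        "order_book": [(order_id, b["isbn"]) for b in books],
    }


model = to_first_normal_form(flat_order)
print(model["preference"])       # ['horror', 'nonfiction', 'sci-fi']
print(len(model["order_book"]))  # 2
```

The link rows are kept as simple pairs here; on an ERD they correspond to the M : N relationship lines between special order and the two new entities.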

Second Normal Form

Second normal form (2NF) requires first that the data model is in 1NF and second that the data model leads to entities containing attributes that are dependent on the whole identifier. This means that the values of the identifier attributes, taken together, determine the value of every other attribute for an instance in an entity. Sometimes nonidentifier attributes are only dependent on part of the identifier (i.e., partial dependency), and these attributes belong in another entity.

Figure A-21 shows the special order data model placed in 2NF. Notice that originally, the special order entity had three attributes that were used as identifiers: special order date, customer last name, and customer first name. The problem was that some of the attributes were dependent on the customer last name and first name, but had no dependency on special order date. These attributes were all of the attributes that describe a customer: phone, address, and birth date. To resolve this problem, a new entity called customer was created, and the customer attributes were moved into the new entity. A 1 : N relationship exists between customer and special order because a customer can place many special orders, but a special order is only associated with one customer.

FIGURE A-17 A Few Data Modeling Guidelines
COUNTRY: COU_name
UNIVERSITY: UNI_name, UNI_datefounded, UNI_enrollment, UNI_firstpresident, UNI_founder
TEACHER: TEA_idnumber, TEA_startdate, TEA_name
SUBJECT: name, area, description
Relationship labels: contains / is located within; hires / works for; specializes / is studied by

1. Country only has one instance (i.e., Mexico). This entity is not needed.

2. If teachers are called “Professors”, then the ERD should contain an entity called “Professor” to remain consistent.

3. Why are all of these attributes being captured about university? Will it be necessary to store the founder and first president of each university? If not, these attributes should be removed from the ERD.

4. The attributes in the subject entity are poorly labeled. For one, we have no way of knowing with which entity they belonged if they stood alone; it would be helpful to begin each attribute with SUB_. Also, what is area? A word like department or field of research may be more descriptive.

5. The name attribute really should be broken down into last name and first name; otherwise, there would be no way to manipulate names in the system. For example, there would be no way to sort by last name if it were combined with someone’s first name.

6. This model assumes that a teacher can only work for one university. What about those with joint appointments? An assumption should be stated on the model or in the documentation so that this business rule can be confirmed.

Remember that the customer last name and first name are still used in the special order entity; we know this because of the identifying 1 : N relationship between customer and special order. The identifying relationship implies that the customer identifier (i.e., last name and first name) is used in special order as a part of its identifier.

Notice that we moved the relationship with preference to the new customer entity. Logically, a preference should be associated with a customer, not a particular special order.
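A partial dependency can sometimes be spotted in sample data by asking whether part of the identifier already determines an attribute’s value. The Python sketch below is illustrative only: the helper `determines`, the field names, and the three special order rows are assumptions. Keep in mind that sample data can disprove a dependency but cannot prove one; the business rules have the final say.

```python
def determines(rows, determinant, dependent):
    """True if, in these rows, each determinant value maps to exactly one dependent value."""
    mapping = {}
    for row in rows:
        key = tuple(row[name] for name in determinant)
        value = row[dependent]
        # setdefault stores the first value seen and returns the stored one afterward.
        if mapping.setdefault(key, value) != value:
            return False
    return True


# Hypothetical special orders; the identifier is date + last name + first name.
orders = [
    {"order_date": "2024-05-01", "cust_lastname": "Smith", "cust_firstname": "Pat",
     "cust_phone": "555-0101", "store_name": "Downtown"},
    {"order_date": "2024-06-12", "cust_lastname": "Smith", "cust_firstname": "Pat",
     "cust_phone": "555-0101", "store_name": "Uptown"},
    {"order_date": "2024-05-01", "cust_lastname": "Jones", "cust_firstname": "Lee",
     "cust_phone": "555-0177", "store_name": "Downtown"},
]

customer_part = ["cust_lastname", "cust_firstname"]  # only part of the identifier

# Phone follows the customer alone: a partial dependency, so it belongs in customer.
print(determines(orders, customer_part, "cust_phone"))  # True
# Store varies from order to order for the same customer, so it stays with the order.
print(determines(orders, customer_part, "store_name"))  # False
```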

Third Normal Form

Third normal form (3NF) occurs when a model is in both 1NF and 2NF and when in the resulting entities none of the attributes are dependent on a nonidentifier attribute (i.e., transitive dependency). Violations of 3NF can be found in both the book and special order entities in Figure A-21.

The problem with the book entity is that author university is dependent on the author, not the ISBN. In other words, by knowing a book’s ISBN, you do not automatically know the author’s current university, but you would be able to find that value using the author name, a nonidentifier attribute. Therefore, we create a separate entity called author and move the author attributes to the new entity. The 1 : N relationship assumes that an author can write many books, but our system only captures information about one author (the first author) for each book.

0 Normal Form

Do any attributes have multiple values for a single instance of an entity?
Yes: Remove the repeating attributes and repeating groups. Create an entity that describes the attributes. Usually you will need to add a relationship to connect the old and new entities.
No: The data model is in 1NF.

1 Normal Form

Is the identifier comprised of more than one attribute? If so, are any attribute values dependent on just part of the identifier?
Yes: Remove the partial dependency. Move the attributes to an entity in which their values are dependent on the entire identifier. Usually you will need to create a new entity and add a relationship to connect the old and new entities.
No: The data model is in 2NF.

2 Normal Form

Do any attribute values depend on an attribute that is not the entity’s identifier?
Yes: Remove the transitive dependency or derived attribute. Move the attributes to an entity in which their values are dependent on the identifier. Usually you will need to create a new entity and add a relationship to connect the old and new entities.
No: The data model is in 3NF.

3 Normal Form

FIGURE A-18 Normalization Steps
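The three questions in Figure A-18 can be read as a small decision procedure. The sketch below is one possible way to express them in Python; the function name `highest_normal_form`, its parameters, and the attribute names in the examples are assumptions made for illustration. It also assumes the analyst has already written down which attributes determine each nonidentifier attribute, and it does not attempt to detect derived attributes.

```python
from typing import Dict, Set


def highest_normal_form(identifier: Set[str],
                        depends_on: Dict[str, Set[str]],
                        repeating: Set[str]) -> str:
    """Walk the three normalization questions and report the form reached.

    identifier -- attributes that make up the entity's identifier
    depends_on -- for each nonidentifier attribute, the attributes that
                  determine its value (assumed to be non-empty)
    repeating  -- attributes that hold multiple values per instance
    """
    # Question 1: any repeating attributes or repeating groups?
    if repeating:
        return "0NF"
    # Question 2: concatenated identifier with a partial dependency?
    if len(identifier) > 1 and any(d < identifier for d in depends_on.values()):
        return "1NF"
    # Question 3: does any value depend on a nonidentifier attribute?
    if any(not d <= identifier for d in depends_on.values()):
        return "2NF"
    return "3NF"


# Hypothetical dependency lists loosely modeled on the special order entity.
order_id = {"order_date", "cus_lastname", "cus_firstname"}

with_customer_attrs = {
    "cus_phone": {"cus_lastname", "cus_firstname"},  # partial dependency
    "store_name": order_id,
    "store_manager": {"store_name"},                 # transitive dependency
    "status": order_id,
}
with_store_attrs = {
    "store_name": order_id,
    "store_manager": {"store_name"},
    "status": order_id,
}
fully_split = {"status": order_id}
```

Calling the function on each dictionary in turn mirrors the progression from first to third normal form: each time a violating group of attributes is moved to its own entity, the remaining entity passes one more question.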

In a similar vein, notice how store manager and location depend on store name, which is not an identifier for special order. This transitive dependency can be resolved by creating a store entity with store attributes.

Third normal form also addresses problems caused by derived, or calculated, attributes. The values of derived attributes don’t need to be stored in a database (and thus, can be eliminated from the data model) because they can be derived from other attributes. For example, a system would never capture someone’s age if it also contains birthdate; age can be derived by knowing the birthdate and comparing that date to today. In Figure A-21 the number of days since a special order had been placed is removed from the data model. There is no need to capture this attribute since we can determine it using the attribute that captures the value for the date an order was placed. Figure A-22 shows the final data model in 3NF.
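To see why derived attributes need not be stored, consider the following minimal Python sketch. The function names `days_on_order` and `age_in_years` are hypothetical; the point is only that both values can be computed on demand from a stored date.

```python
from datetime import date


def days_on_order(order_date: date, today: date) -> int:
    """Derive the number of days since the special order was placed."""
    return (today - order_date).days


def age_in_years(birthdate: date, today: date) -> int:
    """Derive age from birthdate by comparing it to today's date."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)
```

Because the results change every day, storing them would also mean updating every instance daily, which is another reason to leave them out of the data model.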

Balancing Entity Relationship Diagrams with Data Flow Diagrams

All the analysis activities of the systems analyst are interrelated. For example, the requirements analysis techniques are used to determine how to draw both the process models and data models, and the CASE repository is used to collect information that is stored and updated throughout the entire analysis phase. Now we will see how the process models and data models are interrelated.

Although the process model focuses on the processes of the business system, it contains two data components: the data flow (which is composed of data elements) and the data store. The purposes of these are to illustrate what data are used and created by the processes and where those data are kept. These components of the DFD need to balance with the ERD. In other words, the DFD’s data stores need to correspond with the ERD’s entities, and the data elements that comprise the data flows need to correspond with the attributes depicted on the data model.

SPECIAL ORDER
Identifier: Special Order Date, Customer Last Name, Customer First Name
Attributes: Customer Phone, Customer Address, Customer Birthdate, Customer Book Preferences, Book ISBN1, Book Name1, Book Author1, Book Publication Year1, Book Author University1, Book ISBN2, Book Name2, Book Author2, Book Publication Year2, Book Author University2, Book ISBN3, Book Name3, Book Author3, Book Publication Year3, Book Author University3, Store Name, Store Manager, Store Location, Special Order Status, Special Order Days on Order

FIGURE A-19 0 Normal Form

SPECIAL ORDER
Identifier: Special Order Date, Customer Last Name, Customer First Name
Attributes: Customer Phone, Customer Address, Customer Birthdate, Store Name, Store Manager, Store Location, Special Order Status, Special Order Days on Order

PREFERENCE
Identifier: PRE_type

BOOK
Identifier: BOO_isbn
Attributes: BOO_name, BOO_author, BOO_publicationyear, BOO_authoruniversity

Relationships:
SPECIAL ORDER and PREFERENCE: lists / is listed on
SPECIAL ORDER and BOOK: contains / is contained on

FIGURE A-20 First Normal Form

Many CASE tools offer the feature of identifying problems with balance among DFDs and ERDs; however, it is a good idea to understand how to identify problems on your own. For example, examine the process model that was created in Chapter 6 (Figure 6-19). Notice that two of the entities on the data model, vendor and customer, do not exist on the process model as data stores (only as external entities). Because we will need to collect and use both customer and vendor information (and therefore will need places to store that information), we will need to add customer and vendor data stores on the process model. The vendor store will be added to the maintain marketing material process because that process is the one in which the vendor information is manipulated, and the customer data store will be used by both take requests and process in-store holds.

YOUR TURN A-7 Normalizing a Student Activity File

Pretend that you have been asked to build a system that tracks student involvement in activities around campus. You have been given a file with information that needs to be imported into the system, and the file contains the following fields:

■ Student Social Security number (identifier)
■ Activity 1 code (identifier)
■ Activity 1 description
■ Activity 1 start date
■ Activity 1 years with activity
■ Activity 2 code
■ Activity 2 description
■ Activity 2 start date
■ Activity 2 years with activity
■ Activity 3 code
■ Activity 3 description
■ Activity 3 start date
■ Activity 3 years with activity
■ Student last name
■ Student first name
■ Student birthdate
■ Student age
■ Student advisor name
■ Student advisor phone

Normalize the file. Show how the logical data model would change as you move from 1NF to 2NF to 3NF.

SPECIAL ORDER
Identifier: Special Order Date
Attributes: Store Name, Store Manager, Store Location, Special Order Status, Special Order Days on Order

CUSTOMER
Identifier: CUS_lastname, CUS_firstname
Attributes: CUS_phone, CUS_address, CUS_birthdate

PREFERENCE
Identifier: PRE_type

BOOK
Identifier: BOO_isbn
Attributes: BOO_name, BOO_author, BOO_publicationyear, BOO_authoruniversity

Relationships:
CUSTOMER and SPECIAL ORDER: places / is placed by
CUSTOMER and PREFERENCE: states / is stated by
SPECIAL ORDER and BOOK: contains / is contained on

FIGURE A-21 Second Normal Form

Why don’t you try to see how well balanced the models are for the doctor’s office system? Take a look at the DFD in Figure 6-11 and the ERD in Figure A-1. Can you identify any data stores that are missing from the DFD? (Hint: what entities on the data model do not have a corresponding data store?)


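SPECIAL ORDER
Identifier: Special Order Date
Attributes: Special Order Status

CUSTOMER
Identifier: CUS_lastname, CUS_firstname
Attributes: CUS_phone, CUS_address, CUS_birthdate

PREFERENCE
Identifier: PRE_type

AUTHOR
Identifier: AUT_name
Attributes: AUT_university

BOOK
Identifier: BOO_isbn
Attributes: BOO_name, BOO_publicationyear

STORE
Identifier: STO_name
Attributes: STO_location, STO_manager

Relationships:
CUSTOMER and SPECIAL ORDER: places / is placed by
CUSTOMER and PREFERENCE: states / is stated by
STORE and SPECIAL ORDER: receives / is placed in
SPECIAL ORDER and BOOK: contains / is contained in
AUTHOR and BOOK: writes / is written by

FIGURE A-22 Third Normal Form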

YOUR TURN A-8 Boat Charter Company

A charter company owns boats that are used to charter trips to islands. The company has created a computer system to track the boats it owns, including each boat’s ID number, name, and seating capacity. The company also tracks information about the various islands, such as their names and populations. Every time a boat is chartered, it is important to know the date that the trip is to take place and the number of people on the trip. The company also keeps information about each captain, such as Social Security number, name, birthdate, and contact information for next of kin. Boats travel to only one island per visit.

Questions:
1. Create a data model. Include entities, attributes, identifiers, and relationships.
2. Which entities are dependent? Which are independent?
3. Use the steps of normalization to put your data model in 3NF. Describe how you know that it is in 3NF.

We hope you noticed that doctor, payment, and insurance company do not appear on the process model as data stores, and they should be added to the DFD. Where would you put them?

Similarly, the bits of information that are contained in the data flows (these are usually defined in the CASE entry for the data flow) should match up to the attributes found in entities in the data models. For example, if the customer information data flow that goes from the customer entity to the take-requests process were defined as having customer name, e-mail address, and home address, then each of these pieces of information should be recorded as attributes in the customer entity on the data model. Take a moment now to determine the contents of the search request data flow. Given what you know about the kind of information belonging in a search request, do the contents balance with the data model?

In general, balance occurs when all the data stores can be equated to entities and when all entities are referred to by data stores. Likewise, all data elements should be captured by data attributes on the data model. If the DFD and ERD are not balanced, then information critical to the business process will be missing, or the system will contain unnecessary data.
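The balance rules above amount to comparing sets of names, which the following Python sketch illustrates. It is not how any particular CASE tool works; the function names are hypothetical, the entity names "patient" and "appointment" are placeholders, and the customer attribute list is invented so that one data element is deliberately missing.

```python
from typing import Dict, Set


def unbalanced(erd_entities: Set[str],
               dfd_data_stores: Set[str]) -> Dict[str, Set[str]]:
    """Report entities with no data store and data stores with no entity."""
    return {
        "entities_without_store": erd_entities - dfd_data_stores,
        "stores_without_entity": dfd_data_stores - erd_entities,
    }


def missing_attributes(flow_elements: Set[str],
                       entity_attributes: Set[str]) -> Set[str]:
    """Data elements on a data flow that no entity attribute captures."""
    return flow_elements - entity_attributes


# Doctor's office example: "patient" and "appointment" are placeholder names.
entities = {"patient", "appointment", "doctor", "payment", "insurance company"}
stores = {"patient", "appointment"}
report = unbalanced(entities, stores)

# Customer information data flow as described in the text; the attribute
# list for the customer entity is hypothetical.
flow = {"customer name", "e-mail address", "home address"}
customer_attrs = {"customer name", "home address", "phone"}
gaps = missing_attributes(flow, customer_attrs)
```

An empty result from both checks would indicate that the two models are balanced with respect to the names that were compared; any name that appears in a result points to a data store or attribute that still needs to be added or removed.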

SUMMARY

Basic Entity Relationship Diagram Syntax

The entity relationship diagram (ERD) is the most common technique for drawing a data model, a formal way of representing the data that are used and created by a business system. There are three basic elements in the data modeling language, each of which is represented by a different graphic symbol. The entity is the basic building block for a data model. It is a person, place, or thing about which data is collected. An attribute is some type of information that is captured about an entity. The attribute that can uniquely identify one instance of an entity is called the identifier. The third data model component is the relationship, which conveys the associations between entities. Relationships have cardinality (the ratio of parent instances to child instances) and modality (whether a parent needs to exist if a child exists). Information about all of the components is captured by metadata in the data dictionary.

Creating an Entity Relationship Diagram

The basic steps in building an ERD are: (1) identify the entities, (2) add the appropriate attributes to each entity, and then (3) draw relationships among entities to show how they are associated with one another. There are three special types of entities that ERDs contain. Most entities are independent, because one or more attributes can be used to uniquely identify an instance. Entities that rely on attributes from other entities to identify an instance are dependent. An intersection entity is placed between two entities to capture information about their relationship. In general, data models are based on interpretation; therefore, it is important to clearly state assumptions that reflect business rules.

Validating an Entity Relationship Diagram

Normalization is the process whereby a series of rules are applied to the logical data model to determine how well formed it is. A logical data model is in first normal form (1NF) if it does not contain repeating attributes, which are attributes that capture multiple values for a single instance. Second normal form (2NF) requires that all entities are in 1NF and that all attribute values are dependent on the whole identifier (i.e., no partial dependency). Third normal form (3NF) occurs when a model is in both 1NF and 2NF and none of the resulting attributes are dependent on nonidentifier attributes (i.e., no transitive dependency). With each violation, additional entities should be created to remove the repeating attributes or improper dependencies from the existing entities. Finally, ERDs should be balanced with the data flow diagrams (DFDs), which were presented in Chapter 6, by making sure that data model entities and attributes correspond to data stores and data flows on the process model.


QUESTIONS

1. If you must select an identifier for an employee entity, what three options do you have? What are the pros and cons of each choice?
2. Why do identifiers need to contain unique values?
3. Describe to a businessperson the cardinality and modality of a relationship between two entities.
4. Describe the metadata that can be collected about an entity, an attribute, and a relationship.
5. Why is metadata important?
6. What is the difference between a dependent and independent entity? How is each depicted on a data model? What kind of relationships are associated with these entities?
7. What is a business rule? Provide three examples of business rules. How would the business rules be incorporated into an ERD?
8. What is an intersection entity? When would it be a dependent entity? When would it be an independent entity?
9. What would need to occur for a data flow diagram (DFD) and an entity relationship diagram (ERD) to be balanced?
10. Why is it important to balance DFDs and ERDs?
11. How can you make an ERD easier to understand? (It might help if you first think about how to make one difficult to understand.)
12. What do you think are three common mistakes novice analysts make in creating ERDs?
13. What is the purpose of normalization?
14. How does a model meet the requirements of third normal form?

KEY TERMS

1 : 1 relationship
1 : N relationship
Assumption
Attribute
Balance
Business rule
Cardinality
Child entity
Concatenated identifier
Data dictionary
Data model
Dependent
Dependent entity
Derived attribute
Entity
Entity relationship diagram (ERD)
First normal form (1NF)
IDEF1X
Identifier
Identifying relationship
Independent entity
Instance
Intersection entity
Logical data model
M : N relationship
Metadata
Modality
Nonidentifying relationship
Normalization
Parent entity
Partial dependency
Physical data model
Relationship
Repeating attributes
Repeating groups
Second normal form (2NF)
Subject area
Third normal form (3NF)
Transcript
Transitive dependency


EXERCISES

A. Draw data models for the following entities:
• Movie (title, producer, length, director, genre)
• Ticket (price, adult or child, showtime, movie)
• Patron (name, adult or child, age)

B. Draw a data model for the following entities, considering the entities as representing a system for a patient billing system and including only the attributes that would be appropriate for this context:
• Patient (age, name, hobbies, blood type, occupation, insurance carrier, address, phone)
• Insurance carrier (name, number of patients on plan, address, contact name, phone)
• Doctor (specialty, provider identification number, golf handicap, age, phone, name)

C. Draw the following relationships. Would the relationships be identifying or nonidentifying? Why?
• A patient must be assigned to only one doctor, and a doctor can have many patients.
• An employee has one phone extension, and a unique phone extension is assigned to an employee.
• A movie theater shows many different movies, and the same movie can be shown at different movie theaters around town.

D. Draw an entity relationship diagram (ERD) for the following situations:
1. Whenever new patients are seen for the first time, they complete a patient information form that asks their name, address, phone number, and insurance carrier, all of which is stored in the patient information file. Patients can be signed up with only one carrier, but they must be signed up to be seen by the doctor. Each time a patient visits the doctor, an insurance claim is sent to the carrier for payment. The claim must contain information about the visit, such as the date, purpose, and cost. It would be possible for a patient to submit two claims on the same day.
2. The state of Georgia is interested in designing a database that will track its researchers. Information of interest includes researcher name, title, position; university name, location, enrollment; and research interests. Each researcher is associated with only one institution, and each researcher has several research interests.
3. A department store has a bridal registry. This registry keeps information about the customer (usually the bride), the products that the store carries, and the products for which each customer registers. Customers typically register for a large number of products, and many customers register for the same products.
4. Jim Smith’s dealership sells Fords, Hondas, and Toyotas. The dealership keeps information about each car manufacturer with whom it deals so that employees can get in touch with manufacturers easily. The dealership also keeps information about the models of cars that it carries from each manufacturer. It keeps such information as list price, the price the dealership paid to obtain the model, and the model name and series (e.g., Honda Civic LX). The dealership also keeps information about all sales that it has made (for instance, employees will record the buyer’s name, the car the buyer bought, and the amount the buyer paid for the car). To allow employees to contact the buyers in the future, contact information is also kept (e.g., address, phone number).

E. Examine the data models that you created for Exercise D. How would the respective models change (if at all) on the basis of these corresponding new assumptions?
• Two patients have the same first and last names.
• Researchers can be associated with more than one institution.
• The store would like to keep track of purchased items.
• Many buyers have purchased multiple cars from Jim over time because he is such a good dealer.

F. Visit a Web site that allows customers to order a product over the Web (e.g., Amazon.com). Create a data model that the site needs to support its business process. Include entities to show what types of information the site needs. Include attributes to represent the type of information the site uses and creates. Finally, draw relationships, making assumptions about how the entities are related.

G. Create metadata entries for the following data model components and, if possible, input the entries into a computer-aided software engineering (CASE) tool of your choosing:
• Entity: product
• Attribute: product number
• Attribute: product type
• Relationship: company makes many products, and any one product is made by only one company


CINEMA
Identifier: cin_name
Attributes: cin_address, cin_city, cin_state, cin_zip, cin_phone

THEATER
Identifier: the_number
Attributes: the_capacity

SHOWING
Identifier: sho_time, sho_date
Attributes: sho_total_attendance

MOVIE
Identifier: mov_id
Attributes: mov_title, mov_director, mov_rating

Relationship labels: contains / is a part of; shows / is shown in; is shown at / shows

H. Describe the assumptions that are implied from the data model shown above.

I. Create a data model for one of the processes in the end-of-chapter Exercises for Chapter 6. Explain how you would balance the data model and process model.

J. Apply the steps of normalization to validate the models you drew in Exercise D.

K. You have been given a file that contains the following fields relating to CD information. Using the steps of normalization, create a logical data model that represents this file in third normal form. The fields include the following:
• Musical group name
• Musicians in group
• Date group was formed
• Group’s agent
• CD title 1
• CD title 2
• CD title 3
• CD 1 length
• CD 2 length
• CD 3 length

The assumptions are as follows:
• Musicians in group contains a list of the members of the musical group.
• Musical groups can have more than one CD, so both group name and CD title are needed to uniquely identify a particular CD.

MINICASES

1. West Star Marinas is a chain of 12 marinas that offer lakeside service to boaters; service and repair of boats, motors, and marine equipment; and sales of boats, motors, and other marine accessories. The systems development project team at West Star Marinas has been hard at work on a project that eventually will link all the marina’s facilities into one unified, networked system.

The project team has developed a logical process model of the current system. This model has been carefully checked for syntax errors. Last week, the team invited a number of system users to role-play the various data flow diagrams, and the diagrams were refined to the users’ satisfaction. Right now, the project manager feels confident that the as-is system has been adequately represented in the process model.

The Director of Operations for West Star is the sponsor of this project. He sat in on the role-playing of the process model and was very pleased by the thorough job the team had done in developing the model. He made it clear to you, the project manager, that he was anxious to see your team begin work on the process model for the to-be system. He was a little skeptical that it was necessary for your team to spend any time modeling the current system in the first place but grudgingly admitted that the team really seemed to understand the business after going through that work.

The methodology you are following, however, specifies that the team should now turn its attention to developing the logical data model for the as-is system. When you stated this to the project sponsor, he seemed confused and a little irritated. “You are going to spend even more time looking at the current system? I thought you were done with that! Why is this necessary? I want to see some progress on the way things will work in the future!”

a. What is your response to the Director of Operations?
b. Why do we perform data modeling?
c. Is there any benefit to developing a data model of the current system at all?
d. How does the process model help us develop the data model?

2. Holiday Travel Vehicles sells new recreational vehicles and travel trailers. When new vehicles arrive at Holiday Travel Vehicles, a new vehicle record is created. Included in the new vehicle record are a vehicle serial number, name, model, year, manufacturer, and base cost.

When a customer arrives at Holiday Travel Vehicles, he or she works with a salesperson to negotiate a vehicle purchase. When a purchase has been agreed to, a sales invoice is completed by the salesperson. The invoice summarizes the purchase, including full customer information, information on the trade-in vehicle (if any), the trade-in allowance, and information on the purchased vehicle. If the customer requests dealer-installed options, they will be listed on the invoice as well. The invoice also summarizes the final negotiated price, plus any applicable taxes and license fees. The transaction concludes with a customer signature on the sales invoice.

a. Identify the data entities described in this scenario (you should find six).

Customers are assigned a customer ID when they make their first purchase from Holiday Travel Vehicles. Name, address, and phone number are recorded for the customer. The trade-in vehicle is described by a serial number, make, model, and year. Dealer-installed options are described by an option code, description, and price.

b. Develop a list of attributes for each of the entities.

Each invoice will list just one customer. A person does not become a customer until he or she purchases a vehicle. Over time, a customer may purchase a number of vehicles from Holiday Travel Vehicles.

Every invoice must be filled out by only one salesperson. A new salesperson may not have sold any vehicles, but experienced salespeople have probably sold many vehicles.

Each invoice only lists one new vehicle. If a new vehicle in inventory has not been sold, there will be no invoice for it. Once the vehicle sells, there will be just one invoice for it. A customer may decide to have no options added to the vehicle, or may choose to add many options. An option may be listed on no invoices, or it may be listed on many invoices.

A customer may trade in no more than one vehicle on a purchase of a new vehicle. The trade-in vehicle may be sold to another customer, who later trades it in on another Holiday Travel vehicle.

c. Based on these business rules in force at Holiday Travel Vehicles, draw an ERD and document the relationships with the appropriate cardinality and modality.

3. The system development team at the Wilcon Company is working on developing a new customer order entry system. In the process of designing the new system, the team has identified the following data entity attributes:

Inventory Order
Order Number (identifier)
Order Date
Customer Name
Street Address
City
State
Zip
Customer Type
Initials
District Number
Region Number
1 to 22 occurrences of:
    Item Name
    Quantity Ordered
    Item Unit
    Quantity Shipped
    Item Out
    Quantity Received

a. State the rule that is applied to place an entity in first normal form. Revise this data model so that it is in first normal form.
b. State the rule that is applied to place an entity into second normal form. Revise the data model (if necessary) to place it in second normal form.
c. State the rule that is applied to place an entity into third normal form. Revise the data model to place it in third normal form.
d. What other guidelines and rules can you follow to validate that your data model is in good form?