2012 Chap 06 Entity Relationship Diagrams.pptx

Chapter 6 Entity Relationship Diagrams

description

ERD, entity relationships and diagrams, chapter 6

Transcript of 2012 Chap 06 Entity Relationship Diagrams.pptx

Page 1: 2012 Chap 06 Entity Relationship Diagrams.pptx

Chapter 6 Entity Relationship Diagrams

Page 2: 2012 Chap 06 Entity Relationship Diagrams.pptx

Outline
• The ERD
  – Reading an ERD
  – Elements of an ERD
  – The data dictionary and metadata
• Creating an ERD
  – Building ERD
  – Advanced syntax
• Validating ERD
  – Design guidelines
  – Normalization
  – Balancing ERD with DFD

Page 3: 2012 Chap 06 Entity Relationship Diagrams.pptx

Introduction

• Data model
  – A formal way of representing the data that are used and created by a business system
  – Shows the nouns (people, places, and things) about which data is captured and the relationships among them
  – Can be used as a logical data model in analysis and as a physical data model in design

PowerPoint Presentation for Dennis, Wixom, & Roth, Systems Analysis and Design, 4th Edition. Copyright 2009 © John Wiley & Sons, Inc. All rights reserved.


Page 4: 2012 Chap 06 Entity Relationship Diagrams.pptx

Introduction
• Data model
  – Software packages that provide data modeling capabilities – ERwin, Oracle Designer, Visible Analyst Workbench, Visio
  – ERwin by Platinum Technology – creates and maintains logical and physical data models, has a wide array of capabilities, and generates databases
  – Oracle Designer – bundled with database management systems
  – Visible Analyst Workbench
    • can be used with many different databases
    • integrates the data model with other parts of the project
    • is a full-service CASE tool; data modeling is one of many capabilities
  – Visio – a Microsoft product

Page 5: 2012 Chap 06 Entity Relationship Diagrams.pptx

Introduction

• Analysis phase – logical data model
  – Presents the logical organization of data without indicating how the data are stored, created, or manipulated.
  – Presents how the business system will operate.
• Design phase – physical data model
  – Reflects exactly how the data will be stored in databases and files.
  – Shows how the data that flow through the processes are organized and presented.
  – Shows how the data will actually be stored in databases or files.

Page 6: 2012 Chap 06 Entity Relationship Diagrams.pptx

Introduction

• ERD – one of the most common logical data modeling techniques
  – Shows all the data components of a business system
  – Developed by Peter Chen
• Normalization is the process analysts use to validate data models.
• Data models should balance with process models.

Page 7: 2012 Chap 06 Entity Relationship Diagrams.pptx

The Entity Relationship Diagram

• Elements of an ERD
• Reading an ERD
• The data dictionary and metadata

Page 8: 2012 Chap 06 Entity Relationship Diagrams.pptx

What Is an ERD?
• A picture showing the information created, stored, and used by a business system.
• Entities - generally represent similar kinds of information
  – Data elements are listed together and placed inside boxes called entities.
• Lines - drawn between entities show relationships among the data
• Special symbols – are added to communicate high-level business rules
• An analyst can read an ERD to discover
  – the individual pieces of information in a system and
  – how they are organized and related to each other

Page 9: 2012 Chap 06 Entity Relationship Diagrams.pptx

Elements of an ERD

• There are many different sets of symbols that can be used on an ERD, but we use crow’s foot.

• The elements are
  – Entity
  – Attribute
  – Relationships

Page 10: 2012 Chap 06 Entity Relationship Diagrams.pptx

ERD Elements


The top words are read from parent to child, and the bottom words are read from child to parent.

Page 11: 2012 Chap 06 Entity Relationship Diagrams.pptx

Entity

• The basic building block for a data model.
• A person, place, event, or thing about which data is collected
• There must be multiple occurrences for something to be an entity
  – Example: If a firm has only one warehouse, the warehouse is not an entity. However, if the firm has several warehouses, the warehouse could be an entity if the firm wants to store data about each warehouse instance.

Page 12: 2012 Chap 06 Entity Relationship Diagrams.pptx

Attributes

• Information that is captured about an entity.
• Attribute names are nouns.
• Sometimes the entity name is added at the beginning of the attribute name for clarity
• One or more attributes can serve as the entity identifier, uniquely identifying each entity instance; identifier attributes are marked with an asterisk
• A concatenated identifier consists of several attributes
• An identifier may be ‘artificial,’ such as a created ID number
• Identifiers may not be developed until the Design Phase

Page 13: 2012 Chap 06 Entity Relationship Diagrams.pptx

Relationships

• Relationships – are shown by lines that connect the entities together.
  – Every relationship has a parent entity and a child entity; the parent being the first entity and the child being the second.
• Relationships have two properties:
  – Cardinality and modality
    • These are the indicators of the business rules around a relationship.

Page 14: 2012 Chap 06 Entity Relationship Diagrams.pptx

Relationships

• Associations between entities
  – The first entity in the relationship is the parent entity;
  – the second entity in the relationship is the child entity
• Relationships should have active verb names
• Relationships go in both directions

Page 15: 2012 Chap 06 Entity Relationship Diagrams.pptx

Cardinality

• Cardinality – refers to the number of times instances in one entity can be related to instances in another entity
• One instance in an entity refers to one and only one instance in the related entity (1:1)
• One instance in an entity refers to one or more instances in the related entity (1:N)
• One or more instances in an entity refer to one or more instances in the related entity (M:N)

Page 16: 2012 Chap 06 Entity Relationship Diagrams.pptx

Cardinality
• Cardinality - refers to the maximum number of times an instance in one entity can be associated with instances in the related entity.
  – Can be 1 or Many, and the symbol is placed on the outside ends of the relationship line, closest to the entity.
  – For a cardinality of 1, a straight line is drawn.
  – For a cardinality of Many, a foot with three toes is drawn.

Page 17: 2012 Chap 06 Entity Relationship Diagrams.pptx

Cardinality

• Cardinality - refers to the number of entity instances involved in the relationship.
  – 1:1 “One to One”
    • one EMPLOYEE receives one PAYCHECK
    • one SALESPERSON is assigned one COMPANY_CAR
  – 1:N “One to Many”
    • one CUSTOMER may place many CUSTOMER ORDERS
  – N:M “Many to Many”
    • many STUDENTS may sign up for many CLASSES

Page 18: 2012 Chap 06 Entity Relationship Diagrams.pptx

Modality
• Modality
  – Refers to the minimum number of times an instance in one entity can be associated with an instance in the related entity.
  – Can be 1 or 0, and the symbol is placed on the inside, next to the cardinality symbol.
  – Indicates whether a child entity can exist with or without a related instance in the parent entity
  – For a modality of 1, a straight line is drawn
    • means required
    • Not null – must exist to be valid
  – For a modality of 0, a circle is drawn
    • means not required, or optional
    • Null – not necessary to be valid
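The two properties can be combined into the four crow's foot readings (zero or one, one and only one, zero or many, one or many). The following is a minimal Python sketch of that idea; the class name `RelationshipEnd` and its methods are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipEnd:
    """One end of a relationship line (hypothetical helper, not from the slides)."""
    modality: int      # minimum: 0 (optional) or 1 (required)
    cardinality: str   # maximum: "1" or "many"

    def describe(self) -> str:
        # Modality is read first (minimum), then cardinality (maximum).
        names = {
            (0, "1"): "zero or one",
            (1, "1"): "one and only one",
            (0, "many"): "zero or many",
            (1, "many"): "one or many",
        }
        return names[(self.modality, self.cardinality)]

    def allows(self, count: int) -> bool:
        """Check whether `count` related instances satisfies this end."""
        if count < self.modality:
            return False
        return self.cardinality == "many" or count <= 1
```

For example, `RelationshipEnd(1, "many").allows(0)` is false because a modality of 1 means at least one related instance is required.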

Page 19: 2012 Chap 06 Entity Relationship Diagrams.pptx

ERD Elements


The top words are read from parent to child, and the bottom words are read from child to parent.

Page 20: 2012 Chap 06 Entity Relationship Diagrams.pptx

Database Terminology:
Before introducing these models formally, we need to review some important database terms and concepts:
• Data Attribute/Field: a single item of data
• Entity: database representation of an individual resource, event, or agent about which we choose to collect data; may be
  – physical (inventories, customers, and employees) or
  – conceptual (sales, accounts receivable, and depreciation expense)
• Record Type: table or file
  – groups together the data attributes that logically define an entity
  – group of fields
  – Table or file - multiple occurrences (more than one) of a particular type of record; group of records
• Database: the set of record types that an organization needs to support its business processes
  – Group of files

Hall, 3e

Page 21: 2012 Chap 06 Entity Relationship Diagrams.pptx

Database Terminology
• Association - relationships
  – Record types that constitute a database exist in relation to other record types.
  – Three basic record associations are: one-to-one, one-to-many, and many-to-many.
  – Represented by a line connecting two entities
  – Described by a verb, such as ships, requests, or receives
• Cardinality and modality are the indicators of the business rules around a relationship.
  – Cardinality – the degree of association between two entities
    • refers to the maximum number of times an instance in one entity can be associated with instances in the related entity.
    • can be 1 or Many
  – Modality refers to the minimum number of times an instance in one entity can be associated with an instance in the related entity.
    • can be 1 or 0

Page 22: 2012 Chap 06 Entity Relationship Diagrams.pptx

Association: One-to-one association

This means that for every occurrence in Record Type X, there is one (or possibly zero) occurrence in Record Type Y.

For every occurrence (employee) in the employee table, there is only one (or zero for new employees) occurrence in the year-to-date earnings table.

Page 23: 2012 Chap 06 Entity Relationship Diagrams.pptx

Association: One-to-many association

For every occurrence in Record Type X, there are zero, one, or many occurrences in Record Type Y.

For every occurrence (customer) in the customer table, there are zero, one, or many sales orders in the sales order table. This means that a particular customer may have purchased goods from the company zero, one, or many times during the period under review.
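As a rough illustration of the customer and sales order example, the sketch below builds the two tables in an in-memory SQLite database using Python's standard `sqlite3` module. The table names, column names, and sample rows are assumptions made for the example, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Parent table: one row per customer.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# Child table: each sales order must point at exactly one customer.
conn.execute(
    """CREATE TABLE sales_order (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(customer_id))"""
)

# Hypothetical sample data: zero, one, and many orders per customer.
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.executemany("INSERT INTO sales_order VALUES (?, ?)", [(10, 2), (11, 3), (12, 3)])

# LEFT JOIN keeps customers that have no orders at all.
orders_per_customer = dict(
    conn.execute(
        """SELECT c.name, COUNT(o.order_id)
           FROM customer c LEFT JOIN sales_order o ON o.customer_id = c.customer_id
           GROUP BY c.customer_id"""
    )
)
```

The foreign key on `sales_order.customer_id` is what makes the customer the parent: an order for a customer that does not exist is rejected, while a customer with no orders is allowed.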

Page 24: 2012 Chap 06 Entity Relationship Diagrams.pptx

Association: Many-to-many association

For each occurrence of Record Types X and Y, there are zero, one, or many occurrences of Record Types Y and X, respectively.

The business relationship between an organization’s inventory and its suppliers illustrates the M:M association. A particular supplier provides the company with zero (the supplier is in the database, but the firm does not buy from the supplier), one, or many inventory items. Similarly, the company may buy a particular inventory item from zero (e.g., the firm makes the item in-house), one, or many different suppliers.

Page 25: 2012 Chap 06 Entity Relationship Diagrams.pptx

Reading an ERD

[Figure 6-1: ERD for the Supermarket Checkout Scenario]

The data to support the checkout process can be organized into four main categories:

An item is described by its UPC (Universal Product Code), price, description, category, and tax status.

The item UPC is used to uniquely identify every item sold.

Page 26: 2012 Chap 06 Entity Relationship Diagrams.pptx

Reading an ERD

• Using the ERD to Show Business Rules:
  – Business rules are constraints that are followed when the system is in operation.
  – Rules such as:
    • A payment can be cash, check, debit card, credit card, coupon, or food stamps.
    • A sale is paid by one or more payments.
    • A payment pays for only one sale.
  – Refer to a policy guide or written procedure to determine the proper business rules

Page 27: 2012 Chap 06 Entity Relationship Diagrams.pptx

Reading an ERD
• Using the ERD to Show Business Rules:
  – On a data model, business rules are communicated by the kinds of relationships that the entities share.
  – On the ERD, the crow’s foot symbols tell us the business rule.

A sale may include many sold items.

A payment pays for exactly one sale.

Business rules: The system should not permit:
- A sale with no sold items.
- A payment to pay for more than one sale.
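The two prohibited situations can be checked mechanically once the data are available. The following is a small Python sketch under assumed data shapes: `find_violations` and its parameter layout are hypothetical, invented here only to illustrate how a relationship's modality and cardinality translate into checks.

```python
def find_violations(sale_ids, sold_items, payments):
    """Return a list of business-rule violations.

    sale_ids:   iterable of sale identifiers
    sold_items: list of (sale_id, upc) pairs
    payments:   list of (payment_id, [sale_ids paid for]) pairs
    """
    sales_with_items = {sale_id for sale_id, _upc in sold_items}
    violations = []

    # Modality of 1 on the sold-item end: every sale needs at least one sold item.
    for sale_id in sale_ids:
        if sale_id not in sales_with_items:
            violations.append(f"sale {sale_id} has no sold items")

    # Cardinality of 1 on the sale end: a payment pays for exactly one sale.
    for payment_id, paid_sales in payments:
        if len(paid_sales) != 1:
            violations.append(f"payment {payment_id} pays for {len(paid_sales)} sales")

    return violations
```

In a physical design these rules would more likely be enforced by database constraints; the function only makes the reading of the diagram explicit.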

Page 28: 2012 Chap 06 Entity Relationship Diagrams.pptx

Reading an ERD

• ERD symbols can show when one instance of an entity must exist for an instance of another to exist (1:1)
  – A doctor must exist before appointments for the doctor can be made
• ERD symbols can show when one instance of an entity can be related to only one or many instances of another entity (1:N)
  – One doctor can have many patients; each patient may have only one primary doctor
• ERD symbols show when the existence of an entity instance is optional for a related entity instance
  – A patient may or may not have insurance coverage

Page 29: 2012 Chap 06 Entity Relationship Diagrams.pptx

Relationships: Cardinality and Modality

Zero or Many

One or Many

One and Only One

Zero or One

Page 30: 2012 Chap 06 Entity Relationship Diagrams.pptx

Relationships: Cardinality and Modality

A single student may register for several courses. A single course can have many students enrolled in it.

Each student fills one seat in a class. Each seat is filled by one student.

A single instructor may teach several courses. Each course has only one instructor.

Each professor may teach several course sections but may not teach at all if on sabbatical. Assume there is no team teaching, therefore each section must have a single professor.

Each course must have at least one section but often has several sections.

Page 31: 2012 Chap 06 Entity Relationship Diagrams.pptx

Relationships: Cardinality and Modality

Page 32: 2012 Chap 06 Entity Relationship Diagrams.pptx

The data dictionary and metadata
• CASE tools are used to help build ERDs.
  – Every CASE tool has something called a data dictionary.
• Data dictionary – where the analyst goes to define or look up information about entities, attributes, and relationships on the ERD.
• Metadata – is the information in the data dictionary.
  – Is data about data.
  – Is anything that describes an entity, attribute, or relationship, such as:
    • Entity names
    • Attribute descriptions
    • Relationship cardinality
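To make the idea of metadata concrete, here is a minimal Python sketch of a data dictionary held as a nested dictionary. The layout, the `look_up` helper, and the attribute descriptions are assumptions for illustration; the entity and attribute names are borrowed from the supermarket checkout example, and real CASE tools store this information in their own formats.

```python
# Hypothetical in-memory data dictionary: data about the data model itself.
data_dictionary = {
    "entities": {
        "Item": {
            "identifier": ["upc"],
            "attributes": {
                "upc": "Universal Product Code printed on the item",
                "price": "Current selling price",
                "description": "Short text describing the item",
                "category": "Product category",
                "tax_status": "Whether sales tax applies",
            },
        },
    },
    "relationships": [
        # "A sale is paid by one or more payments."
        {"parent": "Sale", "child": "Payment", "verb": "is paid by",
         "cardinality": "many", "modality": 1},
    ],
}


def look_up(entity, attribute):
    """Return the stored description of one attribute."""
    return data_dictionary["entities"][entity]["attributes"][attribute]
```

Calling `look_up("Item", "upc")` returns the description text, which is the kind of lookup an analyst performs in the data dictionary.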

Page 33: 2012 Chap 06 Entity Relationship Diagrams.pptx

The Data Dictionary and Metadata


• Metadata is information stored about components of the data model

• Metadata is stored in the data dictionary so it can be shared by developers and users throughout the SDLC

• A complete, shareable data dictionary helps improve the quality of the system under development

Page 34: 2012 Chap 06 Entity Relationship Diagrams.pptx

Quiz

Page 35: 2012 Chap 06 Entity Relationship Diagrams.pptx

Outline
• The ERD
  – Reading an ERD
  – Elements of an ERD
  – The data dictionary and metadata
• Creating an ERD
  – Building ERD
  – Advanced syntax
• Validating ERD
  – Design guidelines
  – Normalization
  – Balancing ERD with DFD

Page 36: 2012 Chap 06 Entity Relationship Diagrams.pptx

Basic ERD Syntax
• ERD – is the most common technique for drawing a data model.
  – A formal way of representing the data that are used and created by a business system.
• The 3 basic elements in the data modeling language, each of which is represented by a different graphic symbol:
  – Entity – is the basic building block for a data model.
    • A person, place, or thing about which data are collected.
  – Attribute – is some type of information that is captured about an entity.
    • Identifier - can uniquely identify one instance of an entity.
  – Relationship – conveys the associations between entities.
    • Have cardinality (the ratio of parent instances to child instances).
    • Have modality (whether a parent needs to exist if a child exists).
• Information about all of the components is captured by metadata in the data dictionary.

Page 37: 2012 Chap 06 Entity Relationship Diagrams.pptx

Creating an ERD

• Building ERD
• Advanced syntax

Page 38: 2012 Chap 06 Entity Relationship Diagrams.pptx

Building ERD

• Step 1 Identify the entities
• Step 2 Add attributes and assign identifiers
• Step 3 Identify relationships

Page 39: 2012 Chap 06 Entity Relationship Diagrams.pptx

Step 1: Identify the Entities

• Identify major categories of information
  – If available, check the process models for data stores, external entities, and data flows
    • Data stores – potential to become an entity
  – Check the major inputs and outputs from the use cases
• Verify that there is more than one instance of the entity that occurs in the system

Page 40: 2012 Chap 06 Entity Relationship Diagrams.pptx

Step 2: Add Attributes and Assign Identifiers

• Attributes – the information that describes each entity.
• Identify attributes of the entity that are relevant to the system under development
  – Check the process model repository entries for details on data flows and data stores
    • The elements of a data flow become attributes of the entity
  – Check the data requirements of the requirements definition
  – Interview knowledgeable users
  – Perform document analysis on existing forms and reports
• Select the entity’s identifier

Page 41: 2012 Chap 06 Entity Relationship Diagrams.pptx

Step 3: Identify Relationships

• Start with an entity and identify all entities with which it shares relationships

• Describe the relationship with the appropriate verb phrase

• Determine the cardinality and modality by discussing the business rules with knowledgeable users


Page 42: 2012 Chap 06 Entity Relationship Diagrams.pptx

ERD Building Tips

• Data stores of the DFD should correspond to entities

• Only include entities with more than one instance of information

• Don’t include entities associated with implementation of the system (they will be added later)


Page 43: 2012 Chap 06 Entity Relationship Diagrams.pptx

Advanced Syntax

• There are 3 special types of entities that ERDs contain:
  – Independent entity
  – Dependent entity
  – Intersection entity

Page 44: 2012 Chap 06 Entity Relationship Diagrams.pptx

Advanced Syntax

• Independent Entity
  – Can exist without the help of another entity
  – Identifiers are created from the entity’s own attributes
  – Attributes from other entities are not needed to uniquely identify instances of these entities
  – An entity at the “1” end of a relationship or an entity with an identifier that describes only the entity.
  – Drawn as rectangles with a single border line
  – Non-identifying relationship – when a relationship includes an independent child
    • Parent entity attributes are not needed as part of the child entity’s identifier.

Page 45: 2012 Chap 06 Entity Relationship Diagrams.pptx

[Figure 6-11]

Independent entities - Manufacturer, vehicle, salesperson, and customer.

The vehicle number is sufficient to uniquely identify vehicles.

Information from order, offer, or sold vehicle entities is not needed to identify a vehicle.

The customer – offer relationship is an example of a non-identifying relationship.

Page 46: 2012 Chap 06 Entity Relationship Diagrams.pptx

Advanced Syntax

• Dependent Entity
  – A child entity that requires attributes from the parent entity to uniquely identify an instance
  – Cannot exist without the presence of another entity and is normally on the “many” end of a relationship or has an identifier that is based on another entity’s attribute
  – Relies on attributes from other entities to identify an instance.
  – Drawn as a rectangle with a double border line
  – Identifying relationships – when relationships have a dependent child
    • Parent entity attributes are needed as part of the child entity’s identifier

Page 47: 2012 Chap 06 Entity Relationship Diagrams.pptx

[Figure 6-11]

A sold vehicle is a specific vehicle that has been sold to a specific customer.

To fully identify the sold vehicle, we use the vehicle number from the Vehicle entity and the customer number from the Customer entity as the sold vehicle identifiers.

Sold vehicle entity is an example of dependent entity.
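One way to picture a dependent entity is as a table whose primary key is built from its parents' identifiers. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table and column names and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE vehicle  (vehicle_number  TEXT PRIMARY KEY);
    CREATE TABLE customer (customer_number TEXT PRIMARY KEY);
    -- Dependent entity: its identifier is concatenated from both parents.
    CREATE TABLE sold_vehicle (
        vehicle_number  TEXT NOT NULL REFERENCES vehicle(vehicle_number),
        customer_number TEXT NOT NULL REFERENCES customer(customer_number),
        PRIMARY KEY (vehicle_number, customer_number)
    );
    """
)

# Hypothetical sample data.
conn.execute("INSERT INTO vehicle VALUES ('V1')")
conn.execute("INSERT INTO customer VALUES ('C1')")
conn.execute("INSERT INTO sold_vehicle VALUES ('V1', 'C1')")


def try_insert(vehicle_number, customer_number):
    """Attempt to add a sold vehicle; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO sold_vehicle VALUES (?, ?)", (vehicle_number, customer_number)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A sold vehicle row cannot exist without both parent rows, and the same vehicle and customer pair cannot be recorded twice, which mirrors the identifying relationships described above.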

Page 48: 2012 Chap 06 Entity Relationship Diagrams.pptx

Advanced Syntax

• Intersection Entity
  – Also called associative entity.
  – Is placed between two entities to capture information about their relationship.
  – Added to a data model to store information about two entities sharing an M:N relationship.

Page 49: 2012 Chap 06 Entity Relationship Diagrams.pptx

Advanced Syntax – Resolving an M:N Relationship

There are 3 steps involved in adding an intersection entity:

Step 1 - Remove the M:N relationship line and insert a new entity in between the two existing ones.

Step 2 - Add two 1:N relationships to the model. The 2 original entities should serve as the parent entities for each 1:N, and the new intersection entity becomes the child entity in both relationships.

Step 3 - Name the intersection entity.
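The three steps above can be sketched as a small Python function that takes the two original entities and returns the two replacement 1:N relationships. The function name, the dictionary layout, and the example entity name `Enrollment` are hypothetical.

```python
def resolve_many_to_many(first_entity, second_entity, intersection_name):
    """Replace one M:N relationship with two 1:N relationships.

    Step 1: the M:N line is dropped and a new entity goes in between.
    Step 2: each original entity becomes the parent of the new entity.
    Step 3: the new entity gets a name (supplied by the caller).
    """
    return [
        {"parent": first_entity, "child": intersection_name, "cardinality": "1:N"},
        {"parent": second_entity, "child": intersection_name, "cardinality": "1:N"},
    ]


# Students sign up for many classes; classes have many students.
relationships = resolve_many_to_many("Student", "Class", "Enrollment")
```

The intersection entity is the child in both relationships, so it is the natural place to store information about the pairing itself.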

Page 50: 2012 Chap 06 Entity Relationship Diagrams.pptx

Advanced Syntax

• Intersection Entity
• Are intersection entities dependent or independent?
• It depends.
  – Independent entity - sometimes, it has a logical identifier that can uniquely identify its instances.
  – Dependent entity – requires the identifiers from both parent entities.

Page 51: 2012 Chap 06 Entity Relationship Diagrams.pptx

• In general, data models are based on interpretation; therefore, it is important to clearly state assumptions that reflect business rules.

Page 52: 2012 Chap 06 Entity Relationship Diagrams.pptx

Validating ERD

• Design guidelines
• Normalization
• Balancing ERD with DFD

Page 53: 2012 Chap 06 Entity Relationship Diagrams.pptx

Design Guidelines

• Design guidelines – are not rules that must be followed; rather, they are “best practices” that often lead to better quality diagrams.

Page 54: 2012 Chap 06 Entity Relationship Diagrams.pptx

Design Modeling Guidelines Summary


Page 55: 2012 Chap 06 Entity Relationship Diagrams.pptx

Outline
• The ERD
  – Reading an ERD
  – Elements of an ERD
  – The data dictionary and metadata
• Creating an ERD
  – Building ERD
  – Advanced syntax
• Validating ERD
  – Design guidelines
  – Normalization
  – Balancing ERD with DFD

Page 56: 2012 Chap 06 Entity Relationship Diagrams.pptx

The two methods to validate an ERD

a. Normalization
b. Balancing with process models (balancing ERD with DFD)

Page 57: 2012 Chap 06 Entity Relationship Diagrams.pptx

Normalization

• Normalization - is the process whereby a series of rules is applied to the logical data model to determine how well formed it is.
• Technique used to validate data models
• Series of rules applied to the logical data model to improve its organization
  – Identify entities that are not represented correctly
  – Identify entities that can be broken out from a file
• Result of normalization: the data attributes are arranged to form stable, yet flexible relations for the data model.
• Three normalization rules are common – 1NF, 2NF, 3NF

Page 58: 2012 Chap 06 Entity Relationship Diagrams.pptx

Normalization
• A logical data model is in:
  – 1NF (first normal form) – if it does not contain repeating attributes, which are attributes that capture multiple values for a single instance.
    • Also called repeating attributes or repeating groups
    • For the model to pass 1NF, every attribute in an entity should have only one value per instance
  – 2NF (second normal form) – requires that all entities are in 1NF and contain only attributes whose values are dependent on the whole identifier (i.e., no partial dependency).
    • Violated if attribute values depend on an attribute that is only part of the identifier
    • The analyst checks that all fields in a record depend fully on the entire primary key
    • The model does not lead to repeating fields, and it leads to tables containing only fields that are dependent on the whole identifier
  – 3NF (third normal form) – occurs when a model is in both 1NF and 2NF and none of the resulting attributes is dependent on nonidentifier attributes (i.e., no transitive dependency).
• With each violation, additional entities should be created to remove the repeating attributes or improper dependencies from the existing entities.

Page 59: 2012 Chap 06 Entity Relationship Diagrams.pptx

Normalization Steps


Page 60: 2012 Chap 06 Entity Relationship Diagrams.pptx

Normalization
• A logical data model is in:
  – 1NF (first normal form) – if it does not contain repeating attributes (attributes that capture multiple values for a single instance)
    • Also called repeating attributes or repeating groups
    • For the model to pass 1NF, every attribute in an entity should have only one value per instance
• With each violation, additional entities should be created to remove the repeating attributes or improper dependencies from the existing entities.

Page 61: 2012 Chap 06 Entity Relationship Diagrams.pptx

Unnormalized Entity

Moving from 0 normal form to 1st normal form (1NF): when normalizing data models, you take attributes that have multiple values for a single instance of an entity and create separate entities for those attributes.

Page 62: 2012 Chap 06 Entity Relationship Diagrams.pptx

Unnormalized Entity

There are 2 cases in which multiple values are captured for one or more attributes:
1) Multiple occurrences of CD – upc, title, artist, label, category, price
- create a new entity called CD
2) Music preferences – jazz, rock, etc.
- create a new entity called preference

Look for repeating groups of attributes and remove them into separate entities.
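The move to 1NF can be sketched in Python as splitting the repeating groups out of a single unnormalized record. The record layout, key names, and sample values below are assumptions made for illustration; only the attribute names come from the slide.

```python
def to_first_normal_form(purchase):
    """Split repeating groups out of one unnormalized purchase record."""
    customer_key = (purchase["last_name"], purchase["first_name"])
    purchase_key = customer_key + (purchase["purchase_date"],)

    # One row per CD instead of a list of CDs inside the purchase.
    cd_rows = [{"purchase_key": purchase_key, **cd} for cd in purchase["cds"]]

    # One row per preference instead of a list inside the purchase.
    preference_rows = [
        {"customer_key": customer_key, "preference": name}
        for name in purchase["music_preferences"]
    ]

    # What remains has exactly one value per attribute.
    flat_purchase = {
        key: value
        for key, value in purchase.items()
        if key not in ("cds", "music_preferences")
    }
    return flat_purchase, cd_rows, preference_rows


# Hypothetical unnormalized record with two repeating groups.
unnormalized = {
    "last_name": "Doe",
    "first_name": "Pat",
    "purchase_date": "2012-03-01",
    "cds": [
        {"upc": "0001", "title": "First Album", "price": 9.99},
        {"upc": "0002", "title": "Second Album", "price": 12.50},
    ],
    "music_preferences": ["jazz", "rock"],
}

flat_purchase, cd_rows, preference_rows = to_first_normal_form(unnormalized)
```

After the split, no attribute in any of the three results holds more than one value per instance, which is the 1NF test.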

Page 63: 2012 Chap 06 Entity Relationship Diagrams.pptx

First Normal Form (1NF)

[Figure panels: Unnormalized Entity; First Normal Form (1NF)]

Page 64: 2012 Chap 06 Entity Relationship Diagrams.pptx

First Normal Form (1NF)

Resolve M:N relationships by creating a new intersection entity.

Page 65: 2012 Chap 06 Entity Relationship Diagrams.pptx

Normalization
• A logical data model is in:
  – 2NF (second normal form) – requires that all entities are in 1NF and contain only attributes whose values are dependent on the whole identifier (i.e., no partial dependency).
    • Means that the values of all attributes that serve as the identifier can determine the value of all of the other attributes for an instance in an entity.
    • Violated if attribute values depend on an attribute that is only part of the identifier
    • The analyst checks that all fields in a record depend fully on the entire primary key
    • The model does not lead to repeating fields, and it leads to tables containing only fields that are dependent on the whole identifier
• With each violation, additional entities should be created to remove the repeating attributes or improper dependencies from the existing entities.

Page 66: 2012 Chap 06 Entity Relationship Diagrams.pptx

Second Normal Form (2NF)

Is the identifier comprised of more than one attribute?
If so, are any attribute values dependent on just part of the identifier?
Yes: Remove the partial dependency.
Move the attributes to an entity in which their values are dependent on the entire identifier.
Usually you will need to create a new entity and add a relationship to connect the old and new entities.

Page 67: 2012 Chap 06 Entity Relationship Diagrams.pptx

Second Normal Form (2NF)

Is the identifier comprised of more than one attribute? Yes.
If so, are any attribute values dependent on just part of the identifier? Yes.
1) Remove the partial dependency.
- The CD Purchase identifier had 3 attributes.
- Some of the attributes were dependent on the customer last and first name, but had no dependency on purchase date.

If an entity has a concatenated identifier, look for attributes that depend only on part of the identifier. If found, remove to new entity.

Page 68: 2012 Chap 06 Entity Relationship Diagrams.pptx

Second Normal Form (2NF)

7 - 68

2) Move the attributes to an entity in which their values are dependent on the entire identifier.

- These attributes were those that describe a customer: phone, address, e-mail, and birth date.

- To resolve this problem, a new entity called customer was created and customer attributes were also moved.

If an entity has a concatenated identifier, look for attributes that depend only on part of the identifier. If found, remove to new entity.

Page 69: 2012 Chap 06 Entity Relationship Diagrams.pptx

Second Normal Form (2NF)

3) Usually you will need to create a new entity and add a relationship to connect the old and new entities.

- The identifying 1:N relationship between customer and CD purchase implies that the customer identifier (last name and first name) is used in CD Purchase as a part of its identifier.

If an entity has a concatenated identifier, look for attributes that depend only on part of the identifier. If found, remove to new entity.
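The three 2NF steps above can be sketched in Python. This is only an illustration under stated assumptions: the sample rows, the attribute names (`phone`, `email`, `total_due`), and the function `to_second_normal_form` are hypothetical and do not come from the slides.

```python
# Hypothetical CD Purchase rows before 2NF. The concatenated identifier is
# (last_name, first_name, purchase_date); phone and email depend only on the
# customer-name part of that identifier, which is a partial dependency.
cd_purchase_rows = [
    {"last_name": "Lee", "first_name": "Ann", "purchase_date": "2012-03-01",
     "phone": "555-0100", "email": "ann@example.com", "total_due": 25.98},
    {"last_name": "Lee", "first_name": "Ann", "purchase_date": "2012-03-09",
     "phone": "555-0100", "email": "ann@example.com", "total_due": 12.99},
    {"last_name": "Diaz", "first_name": "Raul", "purchase_date": "2012-03-09",
     "phone": "555-0199", "email": "raul@example.com", "total_due": 9.99},
]

CUSTOMER_KEY = ("last_name", "first_name")
CUSTOMER_ATTRS = ("phone", "email")


def to_second_normal_form(rows):
    """Split rows into a Customer entity and a slimmer CD Purchase entity."""
    customers = {}
    purchases = []
    for row in rows:
        key = tuple(row[k] for k in CUSTOMER_KEY)
        # Customer attributes are now stored once per customer.
        customers[key] = {attr: row[attr] for attr in CUSTOMER_ATTRS}
        # The purchase keeps the whole identifier (so it still carries the
        # customer name, as in an identifying relationship) plus the
        # attributes that depend on all of it.
        purchases.append(
            {k: v for k, v in row.items() if k not in CUSTOMER_ATTRS}
        )
    return customers, purchases


customers, purchases = to_second_normal_form(cd_purchase_rows)
```

After the split, Ann Lee's phone and e-mail appear once in `customers` rather than once per purchase, and each row in `purchases` still carries the customer name as part of its identifier.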

Page 70: 2012 Chap 06 Entity Relationship Diagrams.pptx

Normalization

• A logical data model is in:

– 3NF (third normal form) – occurs when a model is in both 1NF and 2NF and none of the resulting attributes is dependent on nonidentifier attributes (i.e., no transitive dependency).

• With each violation, additional entities should be created to remove the repeating attributes or improper dependencies from the existing entities.

Page 71: 2012 Chap 06 Entity Relationship Diagrams.pptx

Third Normal Form (3NF)

Page 72: 2012 Chap 06 Entity Relationship Diagrams.pptx

Third Normal Form (3NF)

Look for attributes that depend only on another non-identifying attribute. If found, remove to new entity. Also remove any calculated attributes.

The problem with the CD Purchase is that there are attributes that depend on the payment number, not the CD purchase date and customer first and last names.

The payment type, account number, authorization, and amount depend on the payment number, a nonidentifying attribute.

Create a separate Payment entity and move the payment attributes to it.

Page 73: 2012 Chap 06 Entity Relationship Diagrams.pptx

Third Normal Form (3NF)

Look for attributes that depend only on another non-identifying attribute. If found, remove to new entity. Also remove any calculated attributes.

Create a separate Payment entity and move the payment attributes to it.

The 1:1 relationship assumes that there is one payment for every CD purchase, and every CD purchase has one payment.

Also, a payment is required for every CD purchase, and every CD purchase requires a payment.
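The move to a separate Payment entity can be sketched as follows. The sample values, the dictionary layout, and the function name `to_third_normal_form` are assumptions made for illustration; only the attribute names (payment number, type, account number, authorization, amount) come from the slides.

```python
# Hypothetical CD Purchase rows in 2NF. The payment attributes depend on
# payment_number, a non-identifying attribute (a transitive dependency).
cd_purchase_rows = [
    {"last_name": "Lee", "first_name": "Ann", "purchase_date": "2012-03-01",
     "payment_number": 501, "payment_type": "credit card",
     "account_number": "XXXX-1111", "authorization": "A91", "amount": 25.98},
    {"last_name": "Diaz", "first_name": "Raul", "purchase_date": "2012-03-09",
     "payment_number": 502, "payment_type": "debit card",
     "account_number": "XXXX-2222", "authorization": "B17", "amount": 9.99},
]

PAYMENT_ATTRS = ("payment_type", "account_number", "authorization", "amount")


def to_third_normal_form(rows):
    """Move attributes that depend on payment_number into a Payment entity."""
    payments = {}
    purchases = []
    for row in rows:
        # Payment is identified by its own payment number.
        payments[row["payment_number"]] = {a: row[a] for a in PAYMENT_ATTRS}
        # The purchase keeps payment_number only as the link to Payment.
        purchases.append(
            {k: v for k, v in row.items() if k not in PAYMENT_ATTRS}
        )
    return payments, purchases


payments, purchases = to_third_normal_form(cd_purchase_rows)

# With a 1:1 relationship, there are as many payments as purchases.
one_to_one = len(payments) == len(purchases)
```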

Page 74: 2012 Chap 06 Entity Relationship Diagrams.pptx

Third Normal Form (3NF)

• 3NF also addresses issues of derived, or calculated, attributes.

• Derived attributes can be calculated from other attributes and do not need to be stored in the data model.

• Example: a person’s age would not be stored as an attribute if birthdate were stored.

• How about the total due?

– Option 1: the total due is not stored as an attribute. Its value can be calculated by summing the prices of all the CDs.

– Option 2: the total due is stored as an attribute. It then serves as a control value to verify that no purchased CDs are omitted from the entire purchase.
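Both derived-attribute cases can be illustrated with a short Python sketch. The function names and the use of integer cents for prices are assumptions chosen for the example, not something the slides prescribe.

```python
from datetime import date


def age_on(birthdate, today):
    """Derive age in whole years instead of storing it as an attribute."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


def total_due_cents(cd_prices_cents):
    """Derive the total due by summing the prices of the purchased CDs."""
    return sum(cd_prices_cents)


def control_total_matches(stored_total_cents, cd_prices_cents):
    """Use a stored total as a control value to detect omitted CDs."""
    return stored_total_cents == total_due_cents(cd_prices_cents)


# A stored total of 22.98 no longer matches if one CD line is missing.
complete = control_total_matches(2298, [1299, 999])
missing_line = control_total_matches(2298, [1299])
```

Here `complete` is true and `missing_line` is false, which is the control-value use of a stored total described above.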

Page 75: 2012 Chap 06 Entity Relationship Diagrams.pptx

Balancing ERD with DFD

• ERDs should be balanced with the DFD – by making sure that data model entities and attributes correspond to data stores and data flows on the process model.

• The CRUD matrix – is a valuable tool to use when balancing process and data models.

Page 76: 2012 Chap 06 Entity Relationship Diagrams.pptx

Balancing ERDs with DFDs

• All analysis activities are interrelated

• Process models contain two data components

– Data flows and data stores

• The DFD data components need to balance the ERD’s data stores (entities) and data elements (attributes)

• Many CASE tools provide features to check for imbalance

• Check that all data stores and elements correspond between models

– Data that is not used is unnecessary

– Data that has been omitted results in an incomplete system

• Do not follow thoughtlessly -- check that the models make sense!
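The correspondence check between models can be sketched with set differences. The entity and data store names below are hypothetical, and `balance_report` is an illustrative helper rather than a feature of any particular CASE tool.

```python
# Hypothetical names taken from an ERD and from the data stores on a DFD.
erd_entities = {"Customer", "CD", "CD Purchase", "Payment"}
dfd_data_stores = {"Customer", "CD", "CD Purchase", "Promotion"}


def balance_report(entities, data_stores):
    """List the items that appear in one model but not in the other."""
    return {
        # Entities no process stores or reads: possibly unnecessary data.
        "entities_without_data_store": sorted(entities - data_stores),
        # Data stores with no entity: the data model may be incomplete.
        "data_stores_without_entity": sorted(data_stores - entities),
    }


report = balance_report(erd_entities, dfd_data_stores)
```

A report like this only flags candidates. Each mismatch still needs the judgment called for in the last bullet above.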

Page 77: 2012 Chap 06 Entity Relationship Diagrams.pptx

Partial Process Model and CRUD Matrix

CRUD Matrix – a useful tool to clearly depict the interrelationship between process and data models.

CRUD stands for create, read, update, delete. The matrix is a table that depicts how the system’s processes use the data within the system.

If a process reads information from a data store, but does not update it, there should be a data flow coming out of the data store only.

When a process updates a data store in some way, there should be a data flow going from the process to the data store.

Page 78: 2012 Chap 06 Entity Relationship Diagrams.pptx

Partial Process Model and CRUD Matrix

CRUD Matrix – a useful tool to clearly depict the interrelationship between process and data models.

If the attribute is not read by some process, then the attribute is probably not needed.

If the attribute is not created or updated, the attribute probably needs to be added to a data flow in the process model.
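The two rules of thumb above can be checked mechanically once the matrix is written down. In this sketch the process names, attribute names, and the dictionary layout of the matrix are all assumptions made for illustration.

```python
# Hypothetical CRUD matrix: process -> {attribute: letters from "CRUD"}.
crud_matrix = {
    "Take Request": {"customer.name": "CR", "cd.price": "R"},
    "Process Payment": {"payment.amount": "C", "customer.name": "R"},
}

# Every attribute in the (hypothetical) data model.
attributes = ["customer.name", "cd.price", "payment.amount", "customer.fax"]


def never_read(matrix, attrs):
    """Attributes no process reads; they are probably not needed."""
    return [
        a for a in attrs
        if not any("R" in uses.get(a, "") for uses in matrix.values())
    ]


def never_created_or_updated(matrix, attrs):
    """Attributes no process creates or updates; a data flow may be missing."""
    return [
        a for a in attrs
        if not any(set(uses.get(a, "")) & {"C", "U"} for uses in matrix.values())
    ]


unread = never_read(crud_matrix, attributes)
unwritten = never_created_or_updated(crud_matrix, attributes)
```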

Page 79: 2012 Chap 06 Entity Relationship Diagrams.pptx

Summary

• The ERD is the most common technique for drawing data models. The building blocks of the ERD are:

– Entities describe people, places, or things

– Attributes capture information about the entity

– Relationships associate data across entities

• Intersection, dependent, and independent entities must be recognized.

• The ERD must be balanced with the DFD.

Page 80: 2012 Chap 06 Entity Relationship Diagrams.pptx
Page 81: 2012 Chap 06 Entity Relationship Diagrams.pptx

• Provide three different options that are available when selecting an identifier for a student entity. What are the pros and cons of each choice?

• The three options available for selecting an entity identifier: (1) Put together a combination of attributes to serve as the identifier (such as first_name and last_name). (2) Sometimes, a single attribute is available that can serve as the identifier (such as social security number, though there may be legal constraints on use of this data in this way). (3) A new attribute can be created to serve as the identifier (such as student_id). Creating a new attribute is normally done in the design phase.

• Any of these ways of selecting an identifier is acceptable. The important point is to ensure that the chosen identifier uniquely identifies each instance of the entity.

Page 82: 2012 Chap 06 Entity Relationship Diagrams.pptx

• What is the purpose of developing an identifier for an entity?

• One of the aspects of the definition of an entity is the fact that there are multiple occurrences of the entity. If there are not multiple instances of something that is a potential entity, that something is not an entity in the system. Consequently, there must be a way of identifying each individual occurrence of an entity so that it can be picked out from amongst all the other instances of the entity. That is the purpose of having identifiers with unique values.

Page 83: 2012 Chap 06 Entity Relationship Diagrams.pptx

• What type of high-level business rule can be stated by an ERD? Give two examples.

• A business rule is a constraint or guideline to follow during operation of the system. Examples of business rules are: an order belongs to just one customer; a customer cannot cancel an order that has been shipped; a backorder can be created for an out of stock product. Business rules are expressed on ERDs by the kinds of relationships that the entities share.

Page 84: 2012 Chap 06 Entity Relationship Diagrams.pptx

• Define what is meant by an entity in a data model. How should an entity be named? What information about an entity should be stored in the CASE repository?

• An entity is a person, place, thing, or event about which data is collected and stored. Entity names are nouns. Information stored in a CASE repository regarding an entity includes:

– Name

– Definition

– Special Notes

Page 85: 2012 Chap 06 Entity Relationship Diagrams.pptx

• Define what is meant by an attribute in a data model. How should an attribute be named? What information about an attribute should be stored in the CASE repository?

• An attribute is a characteristic that describes an entity. Attribute names are nouns. Information stored in a CASE repository regarding an attribute includes:

– Name

– Definition

– Alias

– Sample Values

– Acceptable Values

– Format

– Type

– Special Notes

Page 86: 2012 Chap 06 Entity Relationship Diagrams.pptx

• Define what is meant by a relationship in a data model. How should a relationship be named? What information about a relationship should be stored in the CASE repository?

• A relationship describes the association between entities. Relationships are labeled with active verbs. Information stored in a CASE repository regarding a relationship includes:

– Verb Phrase

– Parent Entity

– Child Entity

– Definition

– Cardinality

– Modality

– Special Notes

Page 87: 2012 Chap 06 Entity Relationship Diagrams.pptx

• A team of developers is considering including ‘warehouse’ as an entity in their data model. The company for whom they are developing the system has just one warehouse location. Should warehouse be included? Why or why not?

• Entities represent something for which there exist multiple instances or occurrences. If there is only one instance of a warehouse, then it would not be best represented by an entity. However, if multiple warehouses were planned in the future, then a warehouse entity should be included.

Page 88: 2012 Chap 06 Entity Relationship Diagrams.pptx

• What is meant by a concatenated identifier?

• A concatenated identifier is one in which a combination of attributes serves to uniquely identify an entity. For instance, an appointment entity may have multiple instances for a single date identifier, but the combination of date and time will uniquely identify each instance.
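The appointment example can be checked with a small uniqueness test. The sample rows and the helper `is_unique_identifier` are hypothetical and serve only to illustrate the idea.

```python
def is_unique_identifier(rows, attributes):
    """Return True if the attribute combination is distinct for every row."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical appointment instances: two share a date, two share a time.
appointments = [
    {"date": "2012-04-02", "time": "09:00", "client": "Ann Lee"},
    {"date": "2012-04-02", "time": "10:30", "client": "Raul Diaz"},
    {"date": "2012-04-03", "time": "09:00", "client": "Ann Lee"},
]

date_alone = is_unique_identifier(appointments, ["date"])
date_and_time = is_unique_identifier(appointments, ["date", "time"])
```

With this sample data, `date_alone` is false while `date_and_time` is true, so only the concatenated identifier picks out each instance. A check on sample data can rule an identifier out, but it cannot prove that the identifier will stay unique for future instances.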

Page 89: 2012 Chap 06 Entity Relationship Diagrams.pptx

• Describe to a businessperson the cardinality and modality of a relationship between two entities.

• These two terms are used to refer to the ‘numerical’ relationship between two entities in a data model. The term, cardinality, refers to the maximum number of times an instance of one entity can be related to instances of the other entity. If the cardinality is one-to-one, then we can infer that one instance of the parent entity can be related to just one instance of the child entity. If the cardinality is more than one, then we know that one instance of the parent entity can be related to more than one instance of the child entity. The determination of cardinality is based upon whatever is appropriate for the business situation being described. The term, modality, refers to whether or not an instance of a child entity can exist without a related instance in the parent entity. Modality values are either null or not null. If the modality is null, then we can infer that no instances of the child entity are required for an instance of the parent entity. If the modality value is not null, then there must be one instance of the child entity for an instance of the parent entity. Just as in the case of cardinality, the determination of modality is based upon whatever is appropriate for the business situation being described.
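As a rough illustration of the two terms, the sketch below summarizes what a set of sample instances shows. The customer and order data and the function `describe_relationship` are hypothetical. Sample data can only suggest the answer, since cardinality and modality are decided by the business situation, as noted above.

```python
from collections import Counter

# Hypothetical parent instances and child links (order_id, customer_id).
customers = ["C1", "C2", "C3"]
orders = [("O1", "C1"), ("O2", "C1"), ("O3", "C2")]


def describe_relationship(parents, child_links):
    """Summarize the cardinality and modality seen in sample data."""
    children_per_parent = Counter(parent_id for _, parent_id in child_links)
    max_children = max((children_per_parent[p] for p in parents), default=0)
    # Cardinality: the maximum number of children related to one parent.
    cardinality = "1:N" if max_children > 1 else "1:1"
    # Modality: "null" if some parent exists with no child instance.
    every_parent_has_child = all(children_per_parent[p] >= 1 for p in parents)
    modality = "not null" if every_parent_has_child else "null"
    return cardinality, modality


cardinality, modality = describe_relationship(customers, orders)
```

In this sample, customer C1 has two orders and customer C3 has none, so the data is consistent with a 1:N relationship whose modality is null.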

Page 90: 2012 Chap 06 Entity Relationship Diagrams.pptx

• What is metadata? Why is it important to system developers?

• Metadata is information we want to collect and document regarding the components of the data model. Metadata helps us more fully understand the meaning and use of the data model components. Since there are typically several members of the project team, specifying metadata helps ensure that each team member has a consistent understanding of the data model components. Metadata is usually stored in the project repository; CASE tools have their own structures for the entry of metadata.

• Metadata is captured to help designers better understand the system that they are building and to help users better understand the system they will use. The metadata information can be used to integrate the different pieces of the analysis phase and can lead to a much better design.

Page 91: 2012 Chap 06 Entity Relationship Diagrams.pptx

• What is an independent entity? What is a dependent entity? How are the two types of entities differentiated on the data model?

• Independent entities are entities that can exist without the presence of another entity. The independent entity does not rely on any other entity in order to exist. On the other hand, dependent entities require the presence of another entity in order to exist. These entities rely on attributes from the parent entity to uniquely identify an instance, and therefore depend on another entity. For example, when an order is placed for a product, an entity that represents a specific product on a specific order is usually created. This entity would not exist without the order and product entities, and in fact gets its identifiers from those entities. So, this ordered_product entity is a dependent entity. Independent entities are represented by rectangles, while dependent entities are usually represented as rectangles with double-border lines.

Page 92: 2012 Chap 06 Entity Relationship Diagrams.pptx

• Explain the distinction between identifying and non-identifying relationships.

• When relationships have an independent child entity, they are called non-identifying relationships. When relationships have a dependent child entity, they are called identifying relationships.

Page 93: 2012 Chap 06 Entity Relationship Diagrams.pptx

• What is the purpose of an intersection entity? How do you know one is needed in an ERD?

• An intersection entity is created when we need to capture more information about the relationship between two entities. This often occurs when two entities have a many-to-many relationship. One instance of entity A may be related to many instances of entity B, and one instance of entity B can be related to many instances of entity A. The intersection entity is inserted between entities A and B, and is used to capture information about a specific instance of entity A related to a specific instance of entity B.

Page 94: 2012 Chap 06 Entity Relationship Diagrams.pptx

• Describe the three-step process of creating an intersection entity.

• Remove the M:N relationship line and insert a new entity between the two existing ones.

• Add two 1:N relationships to the model.

• Name the intersection entity.
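The three steps above can be sketched with plain Python structures. The order and product identifiers, the `quantity` attribute, and the variable names are assumptions for illustration. The entity name `ordered_product` follows the example used elsewhere in these answers.

```python
from collections import defaultdict

# Hypothetical M:N facts between Order and Product: (order, product, quantity).
order_product_facts = [
    ("O1", "P1", 2),
    ("O1", "P2", 1),
    ("O2", "P1", 5),
]

# Steps 1 and 3: a new, named intersection entity sits between Order and
# Product. Each instance describes one product on one order.
ordered_product = {
    (order_id, product_id): {"quantity": quantity}
    for order_id, product_id, quantity in order_product_facts
}

# Step 2: two 1:N relationships replace the single M:N relationship.
lines_by_order = defaultdict(list)    # Order 1:N ordered_product
lines_by_product = defaultdict(list)  # Product 1:N ordered_product
for order_id, product_id in ordered_product:
    lines_by_order[order_id].append((order_id, product_id))
    lines_by_product[product_id].append((order_id, product_id))
```

Because each `ordered_product` instance is identified by the identifiers of its two parents, this particular sketch models a dependent intersection entity.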

Page 95: 2012 Chap 06 Entity Relationship Diagrams.pptx

• Is an intersection entity dependent or independent? Explain your answer.

• If the intersection entity has a logical identifier that can uniquely identify instances within, then it would be considered an independent entity. If, however, the intersection entity requires the identifiers from its parent entities to be uniquely identified, then it is a dependent entity.

Page 96: 2012 Chap 06 Entity Relationship Diagrams.pptx

• What is the purpose of normalization?

• Normalization is a process that optimizes relational data storage for efficiency. Normalization helps analysts identify entities that are not represented correctly in a logical data model, or entities that can be broken out from a file. The rules of normalization help assure that the data is stored as efficiently as possible.

Page 97: 2012 Chap 06 Entity Relationship Diagrams.pptx

• Describe the analysis that is applied to a data model in order to place it in first normal form (1NF).

• First normal form requires that all repeating fields or groups of fields have been removed to separate tables.

• Describe the analysis that is applied to a data model in order to place it in second normal form (2NF).

• Second normal form requires that the data model be in 1NF, and that all partial dependencies have been removed.

• Describe the analysis that is applied to a data model in order to place it in third normal form (3NF).

• Third normal form requires that the data model be in 2NF, and that all transitive dependencies are removed (i.e., that no fields are dependent on other, non-primary key fields).
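The 1NF step, removing a repeating group to a separate table, can be sketched as follows. The purchase record layout, the `cds` field, and the function `to_first_normal_form` are hypothetical and chosen only to show the idea.

```python
# Hypothetical purchase records holding a repeating group of CDs.
purchases = [
    {"purchase_id": 1, "customer": "Ann Lee",
     "cds": [{"title": "Kind of Blue", "price_cents": 1299},
             {"title": "Blue Train", "price_cents": 999}]},
    {"purchase_id": 2, "customer": "Raul Diaz",
     "cds": [{"title": "Kind of Blue", "price_cents": 1299}]},
]


def to_first_normal_form(records, repeating_field="cds"):
    """Move the repeating group into its own table of child rows."""
    parent_rows = []
    child_rows = []
    for record in records:
        parent_rows.append(
            {k: v for k, v in record.items() if k != repeating_field}
        )
        for item in record[repeating_field]:
            # Each child row carries the parent identifier as its link.
            child_rows.append({"purchase_id": record["purchase_id"], **item})
    return parent_rows, child_rows


purchase_rows, purchased_cd_rows = to_first_normal_form(purchases)
```

The result is one row per purchase and one row per purchased CD, with `purchase_id` connecting the two tables.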

Page 98: 2012 Chap 06 Entity Relationship Diagrams.pptx

• Describe how the data model and process model should be balanced against each other.

• The key to balancing DFDs and ERDs is to recognize that all system data must be accounted for on each type of diagram. The ERD shows the system data ‘at rest,’ while the DFD shows the flow and use of data in the system. Generally, all of the data entities shown on the ERDs will correspond to data stores on the DFDs. That is one aspect of balancing. In addition, the attributes that are a part of the data model should be used somewhere in the flows and stores of the process models.

Page 99: 2012 Chap 06 Entity Relationship Diagrams.pptx

• What is a CRUD matrix? How does it relate to process models and data models?

• The CRUD (create, read, update, delete) matrix shows how data is used by the processes within the system. In the design phase, it helps analysts ensure that all of the data stores used by the processes have been created. This is a tangible way to link the processes from the process models and the data stores from the data model, ensuring that no data required by the processes has been omitted from the data model. It will also depict data from the data model that is not used by any processes, and should therefore be considered for elimination from the data model.

Page 100: 2012 Chap 06 Entity Relationship Diagrams.pptx

Page 101: 2012 Chap 06 Entity Relationship Diagrams.pptx
Page 102: 2012 Chap 06 Entity Relationship Diagrams.pptx