1 A U Information & Data Analysis Professor J. Alberto Espinosa Business Analysis ITEC-455 Spring 2010

Page 1

AU
Information & Data Analysis

Professor J. Alberto Espinosa

Business Analysis
ITEC-455 Spring 2010

Page 2

Agenda

• Introduction to database concepts
• Data modeling & relational database design
• Transitional artifacts: the CRUD matrix – linking requirements to data design
• Normalization
• Database queries

Page 3

Data Modeling Concepts

Page 4

How Most Business Applications are Implemented:

[Diagram: Business Application 1, 2, 3, etc. sit on top of a Database Management System (i.e., Database Platform; e.g., Oracle, Access, SQL Server, etc.), which manages Database 1, 2, 3, 4, etc.]

Page 5

Stand-alone DBMS

DBMS and database work in the same computer: the user’s computer. OK for personal productivity.

[Diagram: Stand-alone DBMS (e.g., MS Access) connected to its Database]

Page 6

DBMS in a Client/Server Environment:
Better for corporate use; the DBMS has two components

DBMS Client: runs “front-end” part of the DBMS that provides the user interface (e.g., data entry, screen displays or presentation, report formatting, query building tools)

DBMS Server: runs the “back-end” part of the DBMS and performs most of the data management functions – e.g., queries, updates, etc.

[Diagram: the DBMS Client sends a Data Request (e.g., query) to the DBMS Server; the DBMS Server retrieves, adds, deletes and/or updates data in the Database and returns a Response (e.g., query result)]

Page 7

DBMS in a Web Server Environment:
Very common when there are large numbers of users and it would be impractical to deploy and install a DBMS client; access to the database is done through a browser (e.g., on-line purchases)

Request (e.g., get a price quote, place an order)

Response (e.g., query results with HTML-formatted product price or order confirmation notice)

Page 8

Business to Business E-Commerce Example using XML

[Diagram: on one side (e.g., buyer), a SELECT query against a DBMS (e.g., Oracle) feeds an XML Processor that produces an XML Document (e.g., Purchase Order); the document crosses the Internet to the other side (e.g., supplier), where an XML Processor turns the XML Document (e.g., Purchase Order) into an INSERT query against a DBMS (e.g., MS SQL Server)]

Page 9

Most Common Database Models

• Hierarchical (of historical interest only)

• Network (of historical interest only)

• Relational

• Object Oriented (new)

Page 10

Relational Database

• For a database to be truly relational, it must comply with 12 rules defined by its inventor (Dr. E. F. Codd).
• No commercially available database complies with the full set of rules, but the 12 rules are used as guidelines for sound database design.
• Rule 1 states that data should be presented in tables
• Rule 2 states that data must be accessible without ambiguity
• We will talk more about other rules later (i.e., about entity integrity and referential integrity – stay tuned).

Page 11

Implications about Rule 1

A relational database must have:
• Tables: or “entities”
  Every table has a unique name
  Ex. Students, Courses
• Fields: or “columns”, “attributes”
  Every field has a unique name within the table
  Ex. Students (StudentID, StudentName, Major, Address)
  Ex. Courses (CourseNo, CourseName, CreditPoints, Description)
• Records: or “rows”, “tuples”, “instances”
  Every record is unique (has a unique field that identifies it)
  Ex. {“jdoe”, “John Doe”, “CS”, “5000 Forbes Ave.”}
  Ex. {“MGMT-352-001”, “MIS”, “Fall 2002”, “A great course”}
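Sketch: the table / field / record vocabulary above can be tried out with Python's built-in sqlite3 module. This is only an illustration, not part of the original slides; the table and field names come from the Students example, while the data types (all TEXT) are assumptions.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table (entity) with four fields (columns, attributes).
# Column types are assumptions; the slide only names the fields.
conn.execute(
    """
    CREATE TABLE Students (
        StudentID   TEXT NOT NULL PRIMARY KEY,  -- the field that identifies a record
        StudentName TEXT,
        Major       TEXT,
        Address     TEXT
    )
    """
)

# One record (row, tuple, instance), taken from the slide's example.
conn.execute(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    ("jdoe", "John Doe", "CS", "5000 Forbes Ave."),
)

# PRAGMA table_info returns one row per field; index 1 is the field name.
field_names = [row[1] for row in conn.execute("PRAGMA table_info(Students)")]
records = conn.execute("SELECT * FROM Students").fetchall()

print(field_names)
print(records)
```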

Page 12

Object Oriented (OO) Databases

• OO languages + added database functionality, or
• Database products + added OO programming facilities
• Similar to relational databases
• “Classes” (a grouping of similar objects -- like tables)
• “Objects” (an instance of a class -- like records)
• “Object properties” (object attributes -- like fields)
• Plus:
  – Methods (i.e., procedures or programs): programs embedded in classes and objects
  – Other OO Properties (inheritance, encapsulation, etc.)

Page 13

Terminology Equivalence

ERD or Data Model | OO Database | Relational Database | Other Terms Used
Entity | Class | Table |
Instances | Objects | Records | Rows, Tuples
Relationship | Relationship | Relationship |
Attributes | Properties | Fields | Columns

Page 14

Important Data Modeling Concepts

Page 15

Data Modeling Goals

• Data integrity
  Avoid anomalies in the data
• No data redundancy
  Record the data in one place only
• Efficient data entry
  Duplicate data means having to enter the same data more than once
• Consistency
  Duplicate data can lead to inconsistencies when the data changes (e.g., 2 different addresses for same client)
• Flexibility and easy evolution
  Easy to maintain, update and add new tables

Page 16

Data Integrity Issue #1:

Enforcing Entity Integrity

Inspect Each Table

Page 17

Entity Integrity

• Is ensuring that every record in each table in the database can be addressed (i.e., found) – this means that each record has to have a unique identifier that is not duplicate or null (i.e., not blank)
• Examples: every student has an AU ID; every purchase order has a unique number; every customer has an ID

Primary key (PK) helps enforce Entity Integrity:
• Field(s) that uniquely identifies a record in a table (e.g., AU user ID)
• Entity integrity = PK is not duplicate & not blank
• PK can be:
  – A single field (e.g., UserID), or
  – More than one field (e.g., OrderNo, LineItem)
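Sketch: how a DBMS might enforce entity integrity, using Python's built-in sqlite3 module. The table names (Users, LineItems) and the FullName and Qty fields are hypothetical; UserID, OrderNo and LineItem come from the slide. NOT NULL is spelled out because SQLite, for historical reasons, otherwise accepts NULL in non-integer primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single-field PK.
conn.execute("CREATE TABLE Users (UserID TEXT NOT NULL PRIMARY KEY, FullName TEXT)")

# Composite PK: the pair (OrderNo, LineItem) must be unique, not each field alone.
conn.execute(
    """
    CREATE TABLE LineItems (
        OrderNo  INTEGER NOT NULL,
        LineItem INTEGER NOT NULL,
        Qty      INTEGER,
        PRIMARY KEY (OrderNo, LineItem)
    )
    """
)


def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "first jdoe": try_insert("INSERT INTO Users VALUES (?, ?)", ("jdoe", "John Doe")),
    "duplicate jdoe": try_insert("INSERT INTO Users VALUES (?, ?)", ("jdoe", "Jane Doe")),
    "blank PK": try_insert("INSERT INTO Users VALUES (?, ?)", (None, "No ID")),
    "order 1, line 1": try_insert("INSERT INTO LineItems VALUES (?, ?, ?)", (1, 1, 5)),
    "order 1, line 2": try_insert("INSERT INTO LineItems VALUES (?, ?, ?)", (1, 2, 3)),
    "order 1, line 1 again": try_insert("INSERT INTO LineItems VALUES (?, ?, ?)", (1, 1, 9)),
}

for label, accepted in results.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```

The duplicate and blank keys are rejected, while a repeated OrderNo is fine as long as the LineItem differs.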

Page 18

Data Integrity Issue #2:

Enforce Referential Integrity

Inspect each relationship between any two tables

Page 19

Referential Integrity

• Is ensuring that the data that is entered in one table is consistent with data in other tables
• Examples: purchase orders can only be placed by valid customers; accounting transactions can only be posted to valid company accounts

Foreign key (FK) helps enforce referential integrity:
• A field in a table that is a PK in another table
• That is, a field whose value “must” exist in another table
• This is how referential integrity is maintained
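Sketch: the "purchase orders can only be placed by valid customers" example, using Python's built-in sqlite3 module. Table and field names (Customers, PurchaseOrders, CustomerID, OrderNo) are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other DBMS products enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for FK enforcement

conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    """
    CREATE TABLE PurchaseOrders (
        OrderNo    INTEGER PRIMARY KEY,
        -- FK: every value here must exist as a PK value in Customers
        CustomerID INTEGER NOT NULL REFERENCES Customers (CustomerID)
    )
    """
)
conn.execute("INSERT INTO Customers VALUES (1, 'Acme')")


def place_order(order_no, customer_id):
    """Return True if the order is accepted, False if the FK check rejects it."""
    try:
        conn.execute("INSERT INTO PurchaseOrders VALUES (?, ?)", (order_no, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False


valid_order = place_order(100, 1)    # customer 1 exists
orphan_order = place_order(101, 99)  # customer 99 does not exist

print(valid_order, orphan_order)
```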

Page 20

Illustration: Primary and Foreign Keys

[Figure: example tables with their PK and FK fields labeled]

Page 21

Entity, Referential Integrity

[Figure: example tables with PK, FK, and combined PK, FK fields labeled]

Database Schema: The structure of the database, which contains tables, views, constraints, relations, etc. – just about everything, except the data itself

Page 22

Other Important Keys

• Candidate Keys:
  – Often there is more than one key that could serve as a primary key
  – Example: Order, LineItem vs. Order, ProdID
  – Example: AU ID, SSN, AU Login ID
  – These are called candidate keys
  – Any candidate key can be selected as the primary key
• Alternative Keys:
  – Once a primary key has been selected from the choice of candidate keys, the other keys (not used as PKs) are referred to as “alternative keys” (also known as “alternate keys”)
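Sketch: one way to look for candidate keys in sample data, in plain Python. The field names loosely follow the AU ID / SSN / AU Login ID example; the rows are made up. Sample data can only suggest candidate keys: whether a field is truly unique is a business rule, not something the data can prove.

```python
from itertools import combinations


def is_unique(rows, fields):
    """True if no two rows share the same values for the given fields."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    return len(keys) == len(set(keys))


def candidate_keys(rows, fields):
    """Return the minimal field combinations that are unique in the sample rows."""
    found = []
    for size in range(1, len(fields) + 1):
        for combo in combinations(fields, size):
            # A superset of a key already found is unique but not minimal: skip it.
            if any(set(key) <= set(combo) for key in found):
                continue
            if is_unique(rows, combo):
                found.append(combo)
    return found


# Hypothetical sample records.
students = [
    {"AUID": 1001, "SSN": "111-11-1111", "LoginID": "jdoe", "Major": "CS"},
    {"AUID": 1002, "SSN": "222-22-2222", "LoginID": "asmith", "Major": "CS"},
    {"AUID": 1003, "SSN": "333-33-3333", "LoginID": "bwong", "Major": "MIS"},
]

keys = candidate_keys(students, ["AUID", "SSN", "LoginID", "Major"])
primary_key = keys[0]        # any candidate may be chosen as the PK
alternative_keys = keys[1:]  # the candidates that were not chosen

print(primary_key, alternative_keys)
```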

Page 23

Developing Data Models
also called Entity-Relationship Diagram (ERD)

Page 24

Data Model Example
Course Registration System

Entities:
Instructors (InstructorID (PK), LastName, FirstName, Telephone, EMailAddr)
Courses (CourseNo (PK), CourseDescription, InstructorID, CreditPoints, PreRequisites, ClassroomNo)
Enrollments (StudentID (PK), CourseNo (PK), Comments)
Students (StudentID (PK), LastName, FirstName, SSN, Department, College, Major, EMailAddr)

Relationships: Teach, Enrolls, Includes (each 1 to Many)

Page 25

Data Model Example (MS Access equivalent)
Course Registration System

[Screenshot: MS Access relationships view, with callouts for the entities, the relationships (Teaches, Enrolls, Includes) and their cardinality (1 to Many)]

Page 26

The Textbook’s ERD Notation

[Diagram: entities Instructors (InstructorID, LastName, FirstName, Telephone, EMail) and Courses (CourseNo, InstructorID (FK), CourseDescr, CreditPoints, PreReqs) joined by the relationship Teach]

Page 27

Peter Chen’s ERD Notation

[Diagram: Instructors (PK InstructorID; LastName, FirstName, Telephone, EMail) and Course (PK CourseNo; CourseDescription, FK1 InstructorID, CreditPoints, PreRequisites) joined by the relationship Teaches]

Page 28

Conceptual Data Modeling

• Data-oriented modeling method that describes the data and relationships among data entities
• Goal: capture meaning of the data
• 2 main ERD or data model constructs:
  Entities and their attributes
  Relationships between entities

Page 29

Entity

“An object, person, place, event or thing for which we want to record data”

• Equivalent to a table in a database
• Examples: instructors, students, classrooms, invoices, registration, machines, countries, states, etc.
• Entity instance: a single occurrence of an entity
  Example: Espinosa, KSB T58, ITEC 455
• Entities can be identified in a requirements analysis description by following the use of NOUNS

Page 30

Relationships

• Relationships describe how two entities relate to each other
• Relationships in a database application can be identified following the VERBS that describe how entities are associated with one another
• Examples: students enroll in courses, countries have cities, etc.

Page 31

Cardinality

• Cardinality is an important database concept to describe how two entities are related
• The Cardinality of a relationship describes how many instances of one entity can be associated with another entity
• The cardinality of a relationship between two entities has two components:
  – Maximum Cardinality: is the maximum number of instances that can be associated with the other entity – usually either 1 or many (the exact number is rarely used)
  – Minimum Cardinality: is the minimum number of instances that must be associated with the other entity – usually either 0 or 1
  – Symbols: [graphic symbols for 0, 1 and Many]

Page 32

Cardinality (cont’d.)

• A relationship is fully described by describing the cardinality in both directions of the relationship: e.g., a client places zero (i.e., optional) or many orders and each order must relate to only one (i.e., mandatory) client.
• Examples:
  1 student can only park 1 (or 0) cars → 1 to (0 or) 1
  1 client can place (0 or) many orders → 1 to (0 or) many
  1 student can enroll in (at least 1 or) many courses and a course can have (0 or) many students → (0 or) many to (1 or) many

Page 33

Example: 2 Entities, 1 Relationship

[Diagram, Peter Chen’s notation & MS Visio software: Instructors (PK InstructorID; LastName, FirstName, Telephone, EMail) and Course (PK CourseNo; CourseDescription, FK1 InstructorID, CreditPoints, PreRequisites) joined by the relationship Teaches, with the cardinality symbols labeled “One and only one” and “Zero or many”]

Page 34

ERD SYMBOLS (cont’d.)
Note: high level conceptual models don’t show attributes, just entities

1 to 1
Maximum Cardinality (outer symbol)
Minimum Cardinality (inner symbol): Mandatory or Optional

[Diagram, Peter Chen’s notation using Systems Architect software: Employee Has BioData; Employee Has FamilyData]

Page 35

ERD SYMBOLS (cont’d.)

1 to Many
1 to Many (or None)
Maximum Cardinality
Minimum Cardinality: Mandatory or Optional

[Diagram, Peter Chen’s (“crow’s feet”) notation using Systems Architect software: Advisor and Student (→ Advises, ← Have); Faculty Teaches Course]

Page 36

Many to Many Relationships?

Convert a Many-to-Many into 2 One-to-Many’s

[Diagram: Orders and Products joined directly by a Many to Many relationship, then redrawn as two relationships (1 to Many and 1 to Many (or None)) that pass through the Intersection Table LineItems]
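Sketch: the Orders / Products / LineItems conversion, using Python's built-in sqlite3 module. The three table names come from the slide; the field names (OrderDate, ProdName, Qty), the choice of (OrderNo, ProdID) as the intersection table's PK, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: turn on FK enforcement
conn.executescript(
    """
    CREATE TABLE Orders   (OrderNo INTEGER PRIMARY KEY, OrderDate TEXT);
    CREATE TABLE Products (ProdID  INTEGER PRIMARY KEY, ProdName  TEXT);

    -- Intersection table: one row per (order, product) pair.
    -- Each FK is the "many" end of a one-to-many relationship.
    CREATE TABLE LineItems (
        OrderNo INTEGER NOT NULL REFERENCES Orders (OrderNo),
        ProdID  INTEGER NOT NULL REFERENCES Products (ProdID),
        Qty     INTEGER,
        PRIMARY KEY (OrderNo, ProdID)
    );

    INSERT INTO Orders    VALUES (1, '2010-01-15'), (2, '2010-01-16');
    INSERT INTO Products  VALUES (10, 'Widget'), (20, 'Gadget');
    INSERT INTO LineItems VALUES (1, 10, 2), (1, 20, 1), (2, 10, 5);
    """
)

# One order includes many products...
products_in_order_1 = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.ProdName
        FROM LineItems li JOIN Products p ON p.ProdID = li.ProdID
        WHERE li.OrderNo = 1
        ORDER BY p.ProdName
        """
    )
]

# ...and one product appears in many orders.
orders_with_widget = [
    row[0]
    for row in conn.execute(
        "SELECT OrderNo FROM LineItems WHERE ProdID = 10 ORDER BY OrderNo"
    )
]

print(products_in_order_1, orders_with_widget)
```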

Page 37

Cardinality: 1 to 1 (MS Access notation)

Page 38

Cardinality: 1 to many (MS Access notation)

Page 39

Steps in Data Modeling

1. Identify and diagram all ENTITIES
2. Add PK attributes – i.e., implement entity integrity
   Ensure PK’s are non-null & non-duplicates
3. Identify and diagram all RELATIONSHIPS
   Note CARDINALITIES (1 to 1, 1 to n, n to n)
4. Add FK attributes – i.e., implement referential integrity (this is automatic in some tools, e.g., MS Access)
5. Add remaining attributes

Page 40

ERD Example: Course Registration System

Courses (CourseNo (PK), CourseDescription, InstructorID, CreditPoints, ClassroomNo)

PreRequisites (CourseNo (PK), PreRequisiteNo (PK), Comments)

Students (StudentID (PK), LastName, FirstName, SSN, Department, College, Major, EMail)

Enrollment (StudentID (PK), CourseNo (PK), Comments)

Instructors (InstructorID (PK), LastName, FirstName, Telephone, EMail)

Classrooms (ClassroomNo (PK), ClassroomName, Building, BuildingRoomNo, Equipment, Capacity)

Note: PK denotes a primary key
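Sketch: the six relations above written as table definitions, using Python's built-in sqlite3 module. Table names, field names and PKs are taken from the list; the FK columns follow the PK/FK labels in this example's step-by-step diagrams. The data types are assumptions, since the slides do not specify them.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Instructors (
    InstructorID TEXT NOT NULL PRIMARY KEY,
    LastName TEXT, FirstName TEXT, Telephone TEXT, EMail TEXT
);
CREATE TABLE Classrooms (
    ClassroomNo TEXT NOT NULL PRIMARY KEY,
    ClassroomName TEXT, Building TEXT, BuildingRoomNo TEXT,
    Equipment TEXT, Capacity INTEGER
);
CREATE TABLE Courses (
    CourseNo TEXT NOT NULL PRIMARY KEY,
    CourseDescription TEXT,
    InstructorID TEXT REFERENCES Instructors (InstructorID),
    CreditPoints INTEGER,
    ClassroomNo TEXT REFERENCES Classrooms (ClassroomNo)
);
CREATE TABLE PreRequisites (
    CourseNo TEXT NOT NULL REFERENCES Courses (CourseNo),
    PreRequisiteNo TEXT NOT NULL,
    Comments TEXT,
    PRIMARY KEY (CourseNo, PreRequisiteNo)
);
CREATE TABLE Students (
    StudentID TEXT NOT NULL PRIMARY KEY,
    LastName TEXT, FirstName TEXT, SSN TEXT,
    Department TEXT, College TEXT, Major TEXT, EMail TEXT
);
CREATE TABLE Enrollment (
    StudentID TEXT NOT NULL REFERENCES Students (StudentID),
    CourseNo TEXT NOT NULL REFERENCES Courses (CourseNo),
    Comments TEXT,
    PRIMARY KEY (StudentID, CourseNo)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: turn on FK enforcement
conn.executescript(SCHEMA)


def primary_key(table):
    """Return the PK field names of a table, in key order."""
    # table_info rows are (cid, name, type, notnull, default, pk);
    # pk is 0 for non-key fields, else the field's 1-based position in the PK.
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [col[1] for col in sorted(cols, key=lambda c: c[5]) if col[5] > 0]


tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
for table in tables:
    print(table, primary_key(table))
```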

Page 41

Example: Course Registration System
Step 1. Draw Entities

[Diagram: entity boxes Instructors, Course, PreRequisites, ClassRooms, Enrollment, Students]

Page 42

Example: Course Registration System
Step 2. Add PK’s (underline/separate with a line)

Instructors (InstructorID (PK))
Course (CourseNo (PK))
PreRequisites (CourseNo (PK), PreRequisiteNo (PK))
ClassRooms (ClassroomNo (PK))
Enrollment (StudentID (PK), CourseNo (PK))
Students (StudentID (PK))

Page 43

Example: Course Registration System
Step 3. Add Relationships (w/Cardinalities)

Instructors (PK InstructorID)
Course (PK CourseNo)
PreRequisites (PK,FK1 CourseNo; PK PreRequisiteNo)
ClassRooms (PK ClassroomNo)
Enrollment (PK,FK1 StudentID; PK,FK2 CourseNo)
Students (PK StudentID)

Relationships: Teaches, has, Enrolls, Includes, Assigned

Page 44

Example: Course Registration System
Step 4. Add FK’s

Instructors (PK InstructorID)
Course (PK CourseNo; FK1 InstructorID; FK2 ClassroomNo)
PreRequisites (PK,FK1 CourseNo; PK PreRequisiteNo)
ClassRooms (PK ClassroomNo)
Enrollment (PK,FK1 StudentID; PK,FK2 CourseNo)
Students (PK StudentID)

Relationships: Teaches, has, Enrolls, Includes, Assigned

Page 45

Example: Course Registration System
Step 5. Add Remaining Attributes

Instructors (PK InstructorID; LastName, FirstName, Telephone, EMail)
Course (PK CourseNo; CourseDescription, FK1 InstructorID, CreditPoints, FK2 ClassroomNo)
PreRequisites (PK,FK1 CourseNo; PK PreRequisiteNo; Comments)
ClassRooms (PK ClassroomNo; ClassroomName, Building, BuildingRoomNo, Equipment, Capacity)
Enrollment (PK,FK1 StudentID; PK,FK2 CourseNo; Comments)
Students (PK StudentID; LastName, FirstName, SSN, Department, College, Major, EMail)

Relationships: Teaches, Has, Enrolls, Includes, Assigned

Page 46

Example: Course Registration System
(in MS Access)

Page 47

EXAMPLE: Package Delivery Tracking System

Entities: Clients, Packages, Trucks, Drivers, Deliveries

With PK’s:
Clients (PK ClientID)
Packages (PK PackageNo)
Trucks (PK TruckNo)
Drivers (PK DriverNo)
Deliveries (PK DeliveryNo)

With FK’s:
Clients (PK ClientID)
Packages (PK PackageNo; FK4 DeliveryNo)
Trucks (PK TruckNo)
Drivers (PK DriverNo; FK1 TruckNo)
Deliveries (PK DeliveryNo; FK4 ClientID; FK5 DriverNo)

With remaining attributes:
Clients (PK ClientID; LastName, FirstName, Address, Telephone)
Packages (PK PackageNo; FK4 DeliveryNo; Size, Charge)
Trucks (PK TruckNo; Make, Model, Year)
Drivers (PK DriverNo; FK1 TruckNo; DriverName, LicenseNo)
Deliveries (PK DeliveryNo; FK4 ClientID; FK5 DriverNo; Date, Status)

Page 48

Example: Package Delivery Tracking System

Page 49

EXAMPLE: Airline Reservation System

Page 50

Example: Airline Reservation System

Page 51

Final Data Modeling Step:

“Normalize” Your Design
(we will discuss this later)

Page 52

AU
Transitional Artifact: The CRUD Matrix

Connecting Data Objects to Use Cases

Page 53

Identifying Data Entities from Use Cases

• Identify and highlight (or bold face) all nouns in the use cases
• Inspect these nouns to see if they represent possible data entities (i.e., database tables)
• But be careful, a noun may not refer to an entity, but simply to an attribute of an entity
  A data entity is something you want to collect data about (e.g., Students)
  An attribute is the data you want to collect about that entity (StudentID, Name, SSN, EmailAddress)

Page 54

The CRUD Matrix

• A “transitional artifact” is one that helps establish a relationship or cross reference between artifacts

• A CRUD matrix is a transitional artifact between Use Cases and Data Entities

• Helps ensure that the Use Cases specified have all the necessary Data Entities to handle the data needs of the application and, conversely, that the set of Data Entities identified covers the entire functionality specified in the requirements.

• The Use Cases, if properly specified, must describe all the actions necessary to maintain all the application’s database tables

• A CRUD matrix is a table that cross references which Use Cases (C)reate, (R)ead, (U)pdate and/or (D)elete data in which Data Entities

Page 55

Developing a CRUD Matrix

• The CRUD matrix has one row for every data entity identified and one column for every Use Case specified (or the other way around)
• So, first create a column (or row) for every Use Case in your model
• Every noun highlighted in the Use Cases will suggest the need for a data entity to store the respective data, so you need to create a row (or column) for each of these data entities
• Then go through every cell for the first Use Case and enter a C, R, U and/or D in the cell depending on whether the Use Case is creating, reading, updating or deleting records in the respective data entity (i.e., database table); repeat for the remaining Use Cases.
• The C’s, R’s, U’s and D’s should give you an idea of the SQL queries that you will need to develop for your application

Page 56

Illustration

Data Entity | UC-101 | UC-102 | UC-103
Entity 1 | C | R |
Entity 2 |  | U |
Entity 3 |  |  | D

• UC-102 Reads data from Table 1 → It will require an SQL SELECT query
• UC-101 Creates a record in Table 1 → It will require an SQL INSERT query
• UC-103 Deletes records from Table 3 → It will require an SQL DELETE query
• UC-102 Updates data in Table 2 → It will require an SQL UPDATE query
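Sketch: the illustration matrix held as a small Python data structure, with the implied SQL statement types derived from it. The matrix contents and the letter-to-statement mapping come from the slide; the function and variable names are hypothetical.

```python
# CRUD letter -> kind of SQL statement, as stated in the bullets above.
SQL_FOR = {"C": "INSERT", "R": "SELECT", "U": "UPDATE", "D": "DELETE"}

# One entry per data entity; each maps a Use Case to its cell contents.
# Blank cells are simply left out.
crud_matrix = {
    "Entity 1": {"UC-101": "C", "UC-102": "R"},
    "Entity 2": {"UC-102": "U"},
    "Entity 3": {"UC-103": "D"},
}


def queries_needed(matrix):
    """List (use case, SQL statement, entity) triples implied by the matrix."""
    needed = []
    for entity, cells in matrix.items():
        for use_case, letters in cells.items():
            # A cell may hold several letters, e.g. "R,U".
            for letter in letters.split(","):
                needed.append((use_case, SQL_FOR[letter.strip()], entity))
    return sorted(needed)


for use_case, statement, entity in queries_needed(crud_matrix):
    print(f"{use_case}: {statement} on {entity}")
```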

Page 57

CRUD Matrix Example for a Loan Processing Application

Use Cases (columns): Submit a Loan Request | Evaluate a Loan Request | Book a Loan

Data Entities (rows), each followed by its non-blank cells in column order:

Applicant C

Loan Application C R

Credit Score C R

Credit Report C R

Account History C R

Loan Request C R,U R

Loan Officer R

Evaluation C R

Loan Agreement R

Loan Account C

Loan Clerk R

In a database application, these are tables and these are queries

Page 58

ATM Application Example

Page 59

ATM Use Case

Use Case ID: UC-100
Use Case: Withdraw Funds
Actors: (P) Customer
Description: The customer inserts card in the ATM, logs in with a pass code, and makes a selection from the available choices to withdraw funds. Once in the funds withdrawal screen, the customer is prompted to enter the amount to withdraw. After the amount is entered, the system will check for availability of funds for that customer. Provided that funds are available, the system will dispense the amount requested in cash and then debit that amount from the customer’s bank account. The system will record the last withdrawal date in the customer’s file and record the transaction in the ATM transaction log.
Priority:
Non-Functional Requirements:
Assumptions:
Source:

Page 60

ATM Use Case

Use Case ID: UC-101
Use Case: Deposit Funds
Actors: (P) Customer
Description: The customer inserts card in the ATM, logs in with a pass code, and makes a selection from the available choices to deposit funds. Once in the funds deposit screen, the customer is prompted to enter the amount to deposit. After the amount is entered, the deposit slot door opens, the customer places the deposit envelope in the slot, and the deposit slot door closes. The system credits the customer’s account accordingly, records the last deposit date in the customer’s file and records the transaction in the ATM transaction log.
Priority:
Non-Functional Requirements:
Assumptions:
Source:

Page 61

ATM Use Case

Use Case ID: UC-102
Use Case: Transfer Funds
Actors: (P) Customer
Description: The customer inserts card in the ATM, logs in with a pass code, and makes a selection from the available choices to transfer funds. Once in the funds transfer screen, the customer is prompted to enter the amount to transfer, the from account and the to account. After the information is entered, the system checks for availability of funds. If funds are available, it displays the transaction and asks for confirmation. The customer confirms the transaction and the customer’s accounts get adjusted accordingly. The system records the last funds transfer date in the customer’s file and records the transaction in the ATM transaction log.
Priority:
Non-Functional Requirements:
Assumptions:
Source:

Page 62

ATM Use Case

Use Case ID: UC-103
Use Case: Balance Inquiry
Actors: (P) Customer
Description: The customer inserts card in the ATM, logs in with a pass code, and makes a selection from the available choices to inquire balances. The machine prints balances, records the last balance inquiry date in the customer’s file and records the transaction in the ATM transaction log.
Priority:
Non-Functional Requirements:
Assumptions:
Source:

Page 63

ATM System’s CRUD Matrix

Use Cases (columns): Withdraw Funds | Deposit Funds | Transfer Funds | Inquire Balances

Data Entities (rows), each followed by its non-blank cells in column order:

ATM R,U

ATM Transaction Log C U U U

Customer File R,U R,U R,U R,U

Customer Account R,U U R,U R

Customer Transactions C U U

Page 64

Database Design Issue #5:

“Normalize” Your Design

Page 65

Database Design Goals

• Data integrity (Entity and Referential Integrity – ERD’s)
  Avoid anomalies in the data
• No data redundancy
  Record the data in one place only
• Efficient data entry
  Duplicate data means having to enter the same data more than once
• Consistency
  Duplicate data can lead to inconsistencies when the data changes (e.g., 2 different addresses for same client)
• Flexibility and easy evolution
  Easy to maintain, update and add new tables

Normalization

Page 66

Why Normalization?

• Question: if a data model/ERD is sound and all entity integrity, referential integrity, update/delete and business rules have been well implemented, does this guarantee a good database design?
  Answer: not necessarily. If your design is not “normalized”, you could have redundant data, and that would be a BAD thing (design)
• Normalization should yield the most efficient way to organize and record the data internally: not necessarily how users want to see the data, but what makes more sense for non-redundant data storage
• We can later build user table views (i.e., what the user wants or needs to see) by querying these normalized tables.
• Redundancy: only PK and FK (e.g., client ID’s) values should appear in multiple tables (because they are needed to link tables)
  Non-key data (e.g., client last name) that appears in multiple tables is “redundant”

Page 67

Example

You gather requirements from users and one user gives you this table and tells you that she would like the system to collect this data.

How would you organize this data internally in the database?

Page 68

Normalization

• Normalization = The systematic process of “decomposing” a set of unorganized tables with redundant data into smaller, simpler, and more organized tables with only minimal data redundancy in key fields and no data redundancy in non-key fields (i.e., from chaos to order)

Decomposition: decompose to most efficient internal organization

Query: you can always recover the original data format with a query
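Sketch: decomposition and recovery, using Python's built-in sqlite3 module. The tables (Clients, Orders), their fields and the sample rows are hypothetical. The client's last name is stored once, and a JOIN query rebuilds the flat, user-facing view in which it repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Decomposed (normalized) storage: non-key data lives in one place only.
    CREATE TABLE Clients (ClientID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Orders (
        OrderNo   INTEGER PRIMARY KEY,
        ClientID  INTEGER NOT NULL REFERENCES Clients (ClientID),
        OrderDate TEXT
    );
    INSERT INTO Clients VALUES (1, 'Doe'), (2, 'Smith');
    INSERT INTO Orders  VALUES (100, 1, '2010-02-01'),
                               (101, 1, '2010-02-03'),
                               (102, 2, '2010-02-03');
    """
)

# The query recovers the original "one wide table" format for the user.
flat_view = conn.execute(
    """
    SELECT o.OrderNo, c.LastName, o.OrderDate
    FROM Orders o JOIN Clients c ON c.ClientID = o.ClientID
    ORDER BY o.OrderNo
    """
).fetchall()

# 'Doe' shows up twice in the view but is stored only once.
times_doe_is_stored = conn.execute(
    "SELECT COUNT(*) FROM Clients WHERE LastName = 'Doe'"
).fetchone()[0]

print(flat_view, times_doe_is_stored)
```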

Page 69

Degree of Normalization

• Normalization is a matter of degree -- the more normalized your design is, the lower the chances of having redundant data
• Normal Forms (NF) (higher NF designs are more normalized): 1NF → 2NF → 3NF → BCNF → 4NF → 5NF (PJNF) → DKNF
• The process of normalizing a design to 3NF may seem complex, but the concept is very simple:
  (1) Minimize data redundancy in key attributes -- i.e., data in key fields can be entered in more than one table
  (2) Eliminate data redundancy in non-key attributes -- i.e., data in non-key fields should be entered only in one table
  (3) Ensure that every piece of data (each non-key attribute) can be unambiguously located by its PK
  (4) Each incremental NF gets us a step closer in this direction

Page 70

Normal Forms

To what extent is a database normalized?

• Normalization is a matter of degree
• Measured in what is called “normal forms” (NF)
• 1NF, 2NF, 3NF, etc.; higher NF = more normalized
• 3NF: Good enough for most applications
• BCNF: Boyce-Codd NF (more robust version of 3NF)

Mostly of academic interest (and complex applications):
• 4NF, 5NF or PJNF (Project-Join), DKNF (Domain-Key)
  More advanced theoretically, little practical use
  Useful for research and formal methods only

Page 71

Q: What’s wrong with this table?

A: Data in PayDate & Amount fields are not single-valued, i.e., they have repeating values

Page 72

Similar Table, Same Problem

A: repeating values for a PK value → PK is duplicated

Page 73: 1 A U Information & Data Analysis Professor J. Alberto Espinosa Business Analysis ITEC-455 Spring 2010.

73

First Normal Form (1NF)

• A “TABLE” is in 1NF if there are no multi-valued attributes and no PK is duplicated

• i.e., attributes are “atomic”

• A “DATABASE” is in 1NF if ALL its tables are in 1NF


74

Decomposition to 1NF:

Create a separate table where the repeating values can be recorded as rows
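The decomposition to 1NF can be sketched in a few lines of Python. This is only an illustration: the slide's table is not reproduced in this transcript, so the ClientID and ClientName fields and all sample values are assumptions; only PayDate and Amount come from the slides.

```python
# Hypothetical un-normalized rows: PayDate and Amount hold several values per client.
unnormalized = [
    {"ClientID": 1, "ClientName": "Acme",
     "PayDate": ["2010-01-05", "2010-02-05"], "Amount": [100, 150]},
    {"ClientID": 2, "ClientName": "Bolt",
     "PayDate": ["2010-01-20"], "Amount": [75]},
]

def to_1nf(rows):
    """Move the repeating PayDate/Amount values to a separate table, one row per payment."""
    clients, payments = [], []
    for row in rows:
        clients.append({"ClientID": row["ClientID"], "ClientName": row["ClientName"]})
        for pay_date, amount in zip(row["PayDate"], row["Amount"]):
            # ClientID is repeated here only to link each payment to its client.
            payments.append({"ClientID": row["ClientID"],
                             "PayDate": pay_date, "Amount": amount})
    return clients, payments

clients, payments = to_1nf(unnormalized)
```

Every field in both resulting tables is now atomic; in the payments table, a composite key such as (ClientID, PayDate) would identify each row, assuming at most one payment per client per date.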


75

Decomposition


76

Q: What’s wrong with this table?

A: Some data in the Client and OrderDate fields are entered twice, i.e., some non-key data are redundant

i.e., there are “partial dependencies” in the table (see next slide)


77

Functional Dependencies

• An attribute B is functionally dependent on attribute A if the value of a valid instance of attribute A uniquely determines the value of attribute B

• Represented as:

A → B
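One way to make the definition concrete is to test it against sample rows. The sketch below is illustrative only: the helper name `holds` and the sample data are assumptions, and sample rows can refute a functional dependency but never prove that it holds in general.

```python
def holds(rows, a, b):
    """Return True if, in these rows, each value of attribute a maps to one value of b."""
    seen = {}
    for row in rows:
        if row[a] in seen and seen[row[a]] != row[b]:
            return False  # same a value, two different b values: a does not determine b
        seen[row[a]] = row[b]
    return True

# Hypothetical sample data
students = [
    {"StudentID": 1, "StudentName": "Ana", "StudentMajor": "IT"},
    {"StudentID": 2, "StudentName": "Ben", "StudentMajor": "IT"},
    {"StudentID": 1, "StudentName": "Ana", "StudentMajor": "IT"},
]

id_determines_name = holds(students, "StudentID", "StudentName")
major_determines_id = holds(students, "StudentMajor", "StudentID")
```

Here `id_determines_name` is True, while `major_determines_id` is False because the major "IT" appears with two different StudentID values.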


78

Functional Dependency Examples

StudentID → StudentName

StudentID → StudentMajor

What are the functional dependencies in these relations?

Clients (ClientID, ClientName, City, State, Zip)

LineItems (OrderNo, LineItem, ClientID, ProdID, Qty)


79

Second Normal Form (2NF)

• Applies to tables with “composite” PKs (i.e., PK has more than one attribute)

• A “TABLE” is in 2NF if (1) it is in 1NF, and (2) non-key attributes are functionally dependent on the whole PK, not on just part of it (i.e., no partial dependencies)

• Note: we only need to worry about 2NF when PK contains more than one attribute (i.e., “composite”)

• That is: if a table is in 1NF and has a single-attribute PK, it is automatically in 2NF

• A “DATABASE” is in 2NF if ALL its tables are in 2NF


80

Decomposition to 2NF:

Move the partial key (e.g., OrderNo) and the fields that are functionally dependent on only that part of the key (e.g., ClientID, OrderDate) to a separate table and make that partial key the PK in that new table
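The move described above can be sketched in Python. The slide's table is not reproduced in this transcript, so the sketch assumes a table with the composite PK (OrderNo, LineItem) and the fields ClientID, OrderDate, ProdID and Qty; all sample values are made up.

```python
# Hypothetical 1NF table with PK (OrderNo, LineItem).
# ClientID and OrderDate depend on OrderNo alone: a partial dependency.
line_items_raw = [
    {"OrderNo": 10, "LineItem": 1, "ClientID": "C1", "OrderDate": "2010-03-01", "ProdID": "P7", "Qty": 2},
    {"OrderNo": 10, "LineItem": 2, "ClientID": "C1", "OrderDate": "2010-03-01", "ProdID": "P9", "Qty": 1},
    {"OrderNo": 11, "LineItem": 1, "ClientID": "C2", "OrderDate": "2010-03-02", "ProdID": "P7", "Qty": 5},
]

def to_2nf(rows):
    """Move OrderNo and the fields that depend only on it to a new Orders table."""
    orders = {}       # keyed by OrderNo, which becomes the PK of the new table
    line_items = []
    for row in rows:
        orders[row["OrderNo"]] = {"OrderNo": row["OrderNo"],
                                  "ClientID": row["ClientID"],
                                  "OrderDate": row["OrderDate"]}
        # OrderNo stays in LineItems as part of the composite PK
        line_items.append({"OrderNo": row["OrderNo"], "LineItem": row["LineItem"],
                           "ProdID": row["ProdID"], "Qty": row["Qty"]})
    return list(orders.values()), line_items

orders, line_items = to_2nf(line_items_raw)
```

After the split, each order's ClientID and OrderDate are stored once, no matter how many line items the order has.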


81

Decomposition


82

Q: What’s wrong with this table?

A: Some of the data in the ClientCity field is redundant, because once we know who the ClientID is, we know the city where they live

i.e., there are “transitive dependencies” in the table


83

Transitive Dependencies

• If a non-key attribute C is functionally dependent on another non-key attribute B (B → C), and B is in turn dependent on the PK attribute A (A → B), this implies C is transitively dependent on A (A → C, through B, or A → B → C), which will cause redundancies

• In 2NF, all non-key attributes are functionally dependent on the PK

• Thus, in a 2NF table, a transitive dependency will occur every time there is a functional dependency between any two non-key attributes.


84

Transitive Dependency Examples

OrderNo → ClientID → ClientName

CourseNo → InstructorID → InstructorName

Are there transitive dependencies in these relations?

LineItems (OrderNo, LineItem, ProdID, Qty)

LineItems (OrderNo, LineItem, ProdID, ProdName, Qty)


85

Third Normal Form (3NF)

• A “TABLE” is in 3NF if (1) it is in 2NF and (2) non-key attributes depend on the PK and nothing else

• That is, non-key attributes are NOT functionally dependent on other non-key attributes (just on the PK)

• In other words, there are no transitive dependencies

• A “DATABASE” is in 3NF if ALL its tables are in 3NF


86

Decomposition to 3NF:

Move the fields with transitive dependencies to a separate table
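As a rough illustration, the same move can be expressed in SQL, run here through Python's built-in sqlite3 module. The table names (OrdersRaw, Orders, Clients) and the sample rows are assumptions; the dependency OrderNo → ClientID → ClientCity follows the ClientCity example a few slides back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical 2NF table with a transitive dependency:
    -- OrderNo -> ClientID -> ClientCity
    CREATE TABLE OrdersRaw (OrderNo INTEGER PRIMARY KEY, ClientID TEXT, ClientCity TEXT);
    INSERT INTO OrdersRaw VALUES
        (10, 'C1', 'Washington'), (11, 'C1', 'Washington'), (12, 'C2', 'Baltimore');

    -- Move the transitively dependent field to its own table, keyed by ClientID
    CREATE TABLE Clients (ClientID TEXT PRIMARY KEY, ClientCity TEXT);
    INSERT INTO Clients SELECT DISTINCT ClientID, ClientCity FROM OrdersRaw;

    -- ClientID stays in Orders as a foreign key
    CREATE TABLE Orders (OrderNo INTEGER PRIMARY KEY,
                         ClientID TEXT REFERENCES Clients(ClientID));
    INSERT INTO Orders SELECT OrderNo, ClientID FROM OrdersRaw;
""")

clients = conn.execute("SELECT ClientID, ClientCity FROM Clients ORDER BY ClientID").fetchall()
orders = conn.execute("SELECT OrderNo, ClientID FROM Orders ORDER BY OrderNo").fetchall()
```

Each client's city is now stored once in Clients, however many orders that client has placed.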


87

Decomposition


88

In Summary

• 1NF = no multi-valued attributes (and no PK duplicates)

• 2NF = 1NF + the “whole” PK, not just part of it

• 3NF = 2NF + the PK and “nothing but” the PK

• Important! It is OK to have non-normalized designs, and some database applications may actually require a non-normalized design, but you must understand which normal form you are violating and have a good reason for doing it


89

Exercises

Indicate the normal form (PK underlined) and decompose to 3NF

Class (CourseNo, SectionNo, RoomNo)

Class (CourseNo, SectionNo, RoomNo, Capacity)

Class (CourseNo, SectionNo, CourseName, RoomNo, Capacity)


90

Exercises

POS System:

Indicate the normal form (PK underlined) and decompose to 3NF

Sales (SaleNo, ClientID, ClientName, SaleDate, SaleAmount)

SalesDetails (SaleNo, LineItem, SaleDate, ProdID, ProdName, Qty)

Other Systems:

VideoRental (VideoNo, Date, MovieID, MovieName, ClientID)

VideoRental (VideoNo, Date, ClientID, RentalDays)

Videos (VideoNo, MovieID, MovieName, MovieType)

Videos (VideoNo, MovieID, VideoCondition)

Movies (MovieID, MovieName, MovieType, Producer, ReleaseDate)


91

Exercise

Indicate the normal form and decompose to 3NF


92

Decomposition Queries

Conceptually, normalization can be thought of as the opposite of a SELECT SQL query. When you normalize, you decompose a large table into simpler, smaller tables without redundancies. In contrast, when you query several small tables, the result is a larger table in which redundancies don’t matter.

For example, the original table of the exercise on the prior page can be reconstructed by querying the normalized tables as follows:

SELECT Companies.CompanyID, CompanyName,
       Employees.EmployeeID, EmployeeName,
       Departments.DeptID, DeptName
FROM Departments, Companies, Employees
WHERE Companies.CompanyID = Employees.CompanyID
  AND Departments.DeptID = Employees.DeptID
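To try this query, a minimal sketch with Python's built-in sqlite3 module follows. The exercise's table is not reproduced in this transcript, so the column layout of the three normalized tables (in particular, CompanyID and DeptID as foreign keys in Employees) and the sample rows are assumptions inferred from the query's WHERE clause. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed normalized tables; the exercise itself is not shown in the transcript
    CREATE TABLE Companies (CompanyID INTEGER PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Departments (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        EmployeeName TEXT,
        CompanyID INTEGER REFERENCES Companies(CompanyID),
        DeptID INTEGER REFERENCES Departments(DeptID)
    );
    INSERT INTO Companies VALUES (1, 'Acme');
    INSERT INTO Departments VALUES (10, 'Sales'), (20, 'IT');
    INSERT INTO Employees VALUES (100, 'Ana', 1, 10), (101, 'Ben', 1, 20);
""")

# The query from the slide, joining the three tables back into one wide table
rows = conn.execute("""
    SELECT Companies.CompanyID, CompanyName,
           Employees.EmployeeID, EmployeeName,
           Departments.DeptID, DeptName
    FROM Departments, Companies, Employees
    WHERE Companies.CompanyID = Employees.CompanyID
      AND Departments.DeptID = Employees.DeptID
    ORDER BY Employees.EmployeeID
""").fetchall()
```

The result has one row per employee, with the company name repeated on each row. That repetition is the redundancy that normalization removed from the stored tables, and it is harmless in a query result.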


93

Exercise

Indicate the normal form and decompose to 3NF

(and then try to write an SQL query to re-construct the original table)


94

Back to Basics: Enterprise Architecture

Organization’s Goals

Business Application

Enterprise Model

Enterprise Process Model

Enterprise Technology Model

Enterprise Application Model

Enterprise Data Model

Business Domain

ITEC 630:

• Business Process

• Business Data Model

• Business Application Model

• Technology Infrastructure