ICT Data Modelling

Institutional and Sector Modernisation Facility

ICT Standards

Project Funded by the European Union

ICT Data Modelling
Document number: ISMF-ICT/3.07
Version: 1.00

1 Document control

1.1 List of Abbreviations

Abbreviation   Description
ERD            Entity Relationship Diagram
ISMF           Institutional and Sector Modernisation Facility
LDM            Logical Data Model
MoCT           Ministry of Communications and Technology
PDM            Physical Data Model

1.2 Purpose of this Document

The purpose of this document is to provide an overview of the basic notions (methodology and notations) needed to perform Data Modeling. It is used as reference material for other ICT Project Life Cycle documents, such as:
• Feasibility Study
• System Analysis
• System Design

2 Introduction

This document provides an overview of data modeling notions and usage, including notations, normalization rules and examples of usage, as well as a sample methodology for developing Entity-Relationship diagrams.

2.1 Objectives

The objectives of the document are:
• To give an overview of applying Data Modeling techniques in:
  • Logical Data Models,
  • Physical Data Models,
  • Entity-Relationship Models,
  as well as an example methodology to elaborate them.
• To describe their use in the ICT Project Life Cycle.

It is out of this document's scope to suggest a specific notation or methodology for Data Modeling.

Notions related to Data Modeling are:
• Function and Process Modeling
• Common Codification Schemes development

Function and Process Modeling are described in the "ICT Business Function Analysis" document. Please refer to it for detailed information.

Common Codification Schemes are essential to reduce duplication of data within a given Organization and across groups of Organizations, as well as to enforce methodologies and standards for the maintenance and usage of data collections. Please refer to the "ICT Common Codification Schemes" document for a detailed description of the subject.

2.2 Audience

The primary audience consists of the technical staff of the Syrian Public Sector who are involved in the Planning, Analysis and Design phases of an ICT Project. The secondary audience consists of administrative staff who wish to check the completeness and consistency of the ICT contractor's deliverables.

2.3 Assumptions

It is assumed that the readers of the document are familiar with the notions of ICT System Analysis and Design.

2.4 Other Standards

• Feasibility Study
• System Analysis
• System Design

3 Overview

Data modeling is the act of exploring data-oriented structures. Data models can be used effectively at both the enterprise level and on projects. Enterprise architects will often create one or more high-level LDMs that depict the data structures that support the enterprise; these models are typically referred to as enterprise data models or enterprise information models. Enterprise data models provide information that a project team can use both as a set of constraints and as a source of important insights into the structure of their system.

Like other modeling artefacts, data models can be used for a variety of purposes, from high-level conceptual models to physical data models. From the point of view of an object-oriented developer data modeling is conceptually similar to class modeling. With data modeling you identify entity types whereas with class modeling you identify classes. Data attributes are assigned to entity types just as you would assign attributes and operations to classes. There are associations between entities, similar to the associations between classes – relationships, inheritance, composition, and aggregation are all applicable concepts in data modeling.

Traditional data modeling differs from class modeling in that it focuses solely on data: class models allow you to explore both the behavior and the data aspects of your domain, whereas with a data model you can only explore data issues.

Some examples of the basic notation syntaxes used in Data Modeling are given in Appendix I.

3.1 Types of Data Models

There are two basic styles of data models:

• Logical data models (LDMs): what data are needed. LDMs are used to explore the domain concepts, and their relationships, of your problem domain. This could be done for the scope of a single project or for your entire enterprise. LDMs depict the logical entity types, typically referred to simply as entity types, the data attributes describing those entities, and the relationships between the entities.

• Physical data models (PDMs): how data are structured, validated and maintained. PDMs are used to design the internal schema of a database, depicting the data tables, the data columns of those tables, and the relationships between the tables. The focus of this document is on physical modeling.

Although LDMs and PDMs sound very similar, and in fact they are, the level of detail that they model can be significantly different. This is because the goals for each diagram are different: you can use an LDM to explore domain concepts with your stakeholders and the PDM to define your database design. Figure 1 presents a simple LDM and Figure 2 a simple PDM, both modeling the concept of citizens and bank accounts as well as the relationship between them. Notice how the PDM shows greater detail, including an associative table required to implement the association as well as the keys needed to maintain the relationships. PDMs should also reflect the organization's database naming standards; in this case an abbreviation of the entity name is prefixed to each column name. A PDM should also indicate the data types for the columns, such as integer and char(nn). Although Figure 2 does not show them, lookup tables (also called reference tables or description tables) for how the address is used as well as for states and countries are implied by the attribute Account_Branch.

Figure 1: A simple Logical Data Model

Figure 2: A simple Physical Data Model

The main difference between the entities and relationships presented in Figures 1 and 2 lies in the fact that the many-to-many relationship in Figure 1 is replaced in Figure 2 by two one-to-many relationships, with the addition of an intermediate table, according to the normalization rules.
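As a rough illustration, the PDM of Figure 2 could be written as SQL DDL. The sketch below uses Python's built-in sqlite3 module; the table names, column names and data types are taken from Figure 2, while the choice of SQLite, the key and NOT NULL constraints, and the sample rows are assumptions made only for this example.

```python
import sqlite3

# Names and types follow Figure 2; constraints and sample data are
# illustrative assumptions, not part of the standard.
DDL = """
CREATE TABLE T_CITIZEN (
    Cit_ID      INTEGER PRIMARY KEY,
    Cit_SSN     CHAR(10),
    Cit_Surname CHAR(30),
    Cit_Name    CHAR(20)
);
CREATE TABLE T_BANK_ACCOUNT (
    Account_ID          INTEGER PRIMARY KEY,
    Account_Branch      INTEGER,
    Account_Date_Open   DATE,
    Account_Date_Closed DATE
);
-- Associative table: replaces the many-to-many "has" relationship of
-- Figure 1 with two one-to-many relationships.
CREATE TABLE T_ACCOUNT_CITIZEN (
    Cit_ID     INTEGER NOT NULL REFERENCES T_CITIZEN (Cit_ID),
    Account_ID INTEGER NOT NULL REFERENCES T_BANK_ACCOUNT (Account_ID),
    PRIMARY KEY (Cit_ID, Account_ID)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only on request
conn.executescript(DDL)

# Hypothetical sample data: two citizens sharing one (joint) account.
conn.execute("INSERT INTO T_CITIZEN VALUES (1, '0000000001', 'Surname A', 'Name A')")
conn.execute("INSERT INTO T_CITIZEN VALUES (2, '0000000002', 'Surname B', 'Name B')")
conn.execute("INSERT INTO T_BANK_ACCOUNT VALUES (10, 1, '2020-01-01', NULL)")
conn.executemany("INSERT INTO T_ACCOUNT_CITIZEN VALUES (?, ?)", [(1, 10), (2, 10)])

# All holders of account 10, found through the associative table.
holders = conn.execute(
    "SELECT c.Cit_Surname FROM T_CITIZEN c "
    "JOIN T_ACCOUNT_CITIZEN ac ON ac.Cit_ID = c.Cit_ID "
    "WHERE ac.Account_ID = ? ORDER BY c.Cit_ID",
    (10,),
).fetchall()
print(holders)
```

The composite primary key on T_ACCOUNT_CITIZEN prevents the same citizen from being linked to the same account twice, which is one common way to implement an associative table.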

3.2 Entity Relationship Diagrams

The notions of Entity Relationship Diagrams (ERD) derive from those of the Entity-Relationship Model (Chen 1976). In this concept, an entity-type (an abstract and generalized representation of physical entities, such as Citizens, Departments, Materials, etc.) is related to another entity-type (e.g. an employee belongs to a department, a request belongs to a citizen, a request is submitted to a CSC, etc.). Cardinality defines the number of occurrences of one entity-type for a single occurrence (instance) of the related entity-type, e.g. an employee may process many requests. Entity-types have characteristics, whose abstract representation in the model is known as an attribute-type.

Note: From here on we will use the terms entity, relationship and attribute instead of entity-type, relationship-type and attribute-type. Single occurrences of them will be referred to as instances.

Basic notations are as follows:

[Content of Figure 2] T_CITIZEN (#Cit_ID : Integer, Cit_SSN : char(10), Cit_Surname : char(30), Cit_Name : char(20), …); T_BANK_ACCOUNT (#Account_ID : Integer, Account_Branch : Integer, Account_Date_Open : Date, Account_Date_Closed : Date); T_ACCOUNT_CITIZEN (#Cit_ID : Integer, #Account_ID : Integer)

[Content of Figure 1] CITIZEN (Citizen Code, Citizen SSN, Citizen Surname, Citizen Name, …) has BANK ACCOUNT (Account ID, Branch ID, Account Date Open, Account Date Closed, …)

Name           Examples
Entity         Citizens; Departments; Employees; Requests
Attribute      Surname, Name, Date of Birth etc. (Citizens); Name, Ministry etc. (Departments); Surname, Name, Educational level etc. (Employees); Description, Date submitted etc. (Requests)
Relationship   Submits; Process; Belongs

Figure 3: Basic notations of an Entity-Relationship diagram

The figure below depicts a simple ERD using the notations above.

Figure 4: A simple Entity-Relationship Diagram

[Content of Figure 4] Citizen (Surname, Name) Submits Request (Description, Date); Employee (Name, Surname, DoB) Process Request; Employee Belongs Department (Name, Ministry)

Note: The diagram above is an extremely simplistic one. In fact, E-R diagrams are often very close to LDMs (and also use the same notations). Furthermore, they often depict a normalized view of tables, primary key definitions, etc.

4 Boyce-Codd Normal Forms

As mentioned above, many-to-many relationships are not permitted in a Physical Data Model. Furthermore, to strengthen data integrity, the data schema must be normalized as follows:

Level                      Rule
First normal form (1NF)    An entity is in 1NF when it contains no repeating groups of data.
Second normal form (2NF)   An entity is in 2NF when it is in 1NF and when all of its non-key attributes are fully dependent on its primary key.
Third normal form (3NF)    An entity is in 3NF when it is in 2NF and when all of its attributes are directly dependent on the primary key.

Table 1: The Boyce-Codd Normal Forms.

The advantage of having a highly normalized data schema is that information is stored in one place and one place only, reducing the possibility of inconsistent data. Unfortunately, normalization usually comes at a performance cost. To mitigate this issue, some data redundancy is permitted when designing tables which are expected to be used frequently in conjunction with others (e.g. for reporting purposes).
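The effect of the third normal form can be sketched with a small example. The flat rows, the names and the helper function below are hypothetical and serve only to illustrate the rule; they are not prescribed by this document. In the flat layout the ministry depends on the department rather than directly on the employee key, so it is repeated for every employee.

```python
# Hypothetical flat rows: (employee_no, employee_name, department, ministry).
# "ministry" depends on "department", not directly on the key employee_no,
# so this layout is not in 3NF and repeats the ministry for each employee.
flat_rows = [
    (1, "Employee A", "Licensing", "Ministry X"),
    (2, "Employee B", "Licensing", "Ministry X"),
    (3, "Employee C", "Archives", "Ministry Y"),
]


def to_third_normal_form(rows):
    """Split flat rows into an employee table and a department table."""
    employees = {}    # employee_no -> (employee_name, department)
    departments = {}  # department  -> ministry, stored once per department
    for emp_no, emp_name, dept, ministry in rows:
        # setdefault stores the ministry on first sight and returns the
        # stored value afterwards, which exposes inconsistent duplicates.
        if departments.setdefault(dept, ministry) != ministry:
            raise ValueError(f"inconsistent ministry for department {dept!r}")
        employees[emp_no] = (emp_name, dept)
    return employees, departments


employees, departments = to_third_normal_form(flat_rows)
print(departments)
```

After the split, changing the ministry of a department touches a single entry, which is the "one place and one place only" property described above.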

5 How to carry out

Project teams will typically create LDMs as a primary analysis artefact. However, LDMs are often a poor choice when a project team is using object-oriented or component-based technologies, because the developers would rather work with UML diagrams, or when the project is not data-oriented in nature. When a relational database is used for data storage, project teams are best advised to create a PDM to model its internal schema. A PDM is often one of the critical design artefacts for business application development projects.

The development of both LDMs and ERDs generally follows the steps below:

No   Step                  Description
1.   Identify Entities     Identify the roles, events, locations, tangible things or concepts about which the end-users want to store data.
2.   Find Relationships    Find the natural associations between pairs of entities using a relationship matrix.
3.   Draw Rough ERD        Put entities in rectangles and relationships on line segments connecting the entities.
4.   Fill in Cardinality   Determine the number of occurrences of one entity for a single occurrence of the related entity.
5.   Define Primary Keys   Identify the data attribute(s) that uniquely identify one and only one occurrence of each entity.

Table 2: The first five steps in implementing an Entity-Relationship model of a System.

The table below lists the steps needed to produce a more accurate E-R diagram, close to the Physical Data Model (PDM). This is done by applying the normalization rules (i.e. bringing entities to 3rd Normal Form):

No    Step                        Description
6.    Draw Key-Based ERD          Eliminate many-to-many relationships and include primary and foreign keys in each entity.
7.    Identify Attributes         Name the information details (fields) which are essential to the system under development.
8.    Map Attributes              For each attribute, match it with exactly one entity that it describes.
9.    Draw Fully Attributed ERD   Adjust the ERD from step 6 to account for entities or relationships discovered in step 8.
10.   Check Results               Does the final Entity Relationship Diagram accurately depict the system data?
11.   Define Secondary Keys       Anticipate the most probable ways (attributes and criteria) in which data will be accessed. Define suitable secondary keys to speed up data access and sorting.

Table 3: The last six steps in implementing an Entity-Relationship model of a System.

A more detailed description of the above sample methodology is given in Appendix II.

6 When to carry out

Information systems deal with data, whose structure is presented by Data Models. Preliminary Data Modeling must be performed in the early stages of an ICT Project, even during the elaboration of the pre-feasibility study, for two purposes:
• Presenting the system's notions to the stakeholders (perhaps by using a simplified E-R diagram, like the one depicted in Figure 4).
• Performing a rough estimation of the system's needs in terms of capacity and costs.

Further modeling is required during the Analysis Phase of the Project's Life Cycle, where a complete Logical Data Model (LDM) is required. Finally, the Physical Data Model (probably generated by the design tools of the development environment) must be included in the ICT system's documentation.

7 CONCLUSION

Data Modeling must depict the structure of the system's data, serving purposes such as:
• Identifying some of the system's basic aspects regarding the entities and their roles in the business functions (at the early stages).
• Guiding the development team during the build phase in implementing the system's static structure.
• Documenting the static structure as reference material for further use (report design, data migration and data warehouse implementation).

There are different notations (and thus ways of modeling) for performing Data Modeling. In any case, the notations and models used must be adequate for the specific purpose. Please refer to Appendix III for a checklist of the review items of a complete PDM.

APPENDIX I

8 Appendix I - Notations for Logical and Physical Data Models

8.1 Presentation

The figure below presents a summary of the syntax of three common data modeling notations: Information Engineering (IE), Barker and the Unified Modeling Language (UML). This diagram isn't meant to be comprehensive; instead its goal is to provide a basic overview.

Multiplicity     Information Engineering   Barker Notation   UML
Zero or one      (graphic)                 (graphic)         0..1
One only         (graphic)                 (graphic)         1
Zero or more     (graphic)                 (graphic)         0..*
One or more      (graphic)                 (graphic)         1..*
Specific range   N/A                       N/A               3..7

8.2 Comments

The table below discusses the basic data modeling notations.

Notation   Comments
IE         The IE notation (Finkelstein 1989) is simple and easy to read, and is well suited for high-level logical and enterprise data modeling. The only drawback of this notation, arguably an advantage, is that it does not support the identification of attributes of an entity. The assumption is that the attributes will be modeled with another diagram or simply described in the supporting documentation.
Barker     The Barker notation is one of the more popular ones; it is supported by Oracle's toolset, and is well suited for all types of data models.
UML        This is not an official data modeling notation (yet). Although several suggestions for a data modeling profile for the UML exist, none are complete and, more importantly, none are "official" UML yet. However, the Object Management Group (OMG) announced an RFP for data-oriented models in December 2005.

APPENDIX II

9 Appendix II - Developing Entity Relationship Diagrams (ERDs)

9.1 Purpose

Entity Relationship Diagrams are a major data modeling tool and will help organize the data in your project into entities and define the relationships between the entities. This process has proved effective in helping the analyst produce a good database structure, so that the data can be stored and retrieved efficiently.

9.2 INFORMATION:

9.2.1 Entity

A data entity is anything real or abstract about which we want to store data. Entity types fall into five classes: roles, events, locations, tangible things or concepts. E.g. employee, payment, campus, book. Specific examples of an entity are called instances. E.g. the employee John Jones, Mary Smith's payment, etc.

9.2.2 Relationship

A data relationship is a natural association that exists between one or more entities. E.g. Employees process payments. Cardinality defines the number of occurrences of one entity for a single occurrence of the related entity. E.g. an employee may process many payments, but might also not process any payments depending on the nature of her job.

9.2.3 Attribute

A data attribute is a characteristic common to all or most instances of a particular entity. Synonyms include property, data element and field. E.g. Name, Address, Employee Number and Pay Rate are all attributes of the entity Employee. An attribute or combination of attributes that uniquely identifies one and only one instance of an entity is called a primary key or identifier. E.g. Employee Number is a primary key for Employee.

9.3 An Entity Relationship Diagram Methodology

No    Step                        Description
1.    Identify Entities           Identify the roles, events, locations, tangible things or concepts about which the end-users want to store data.
2.    Find Relationships          Find the natural associations between pairs of entities using a relationship matrix.
3.    Draw Rough ERD              Put entities in rectangles and relationships on line segments connecting the entities.
4.    Fill in Cardinality         Determine the number of occurrences of one entity for a single occurrence of the related entity.
5.    Define Primary Keys         Identify the data attribute(s) that uniquely identify one and only one occurrence of each entity.
6.    Draw Key-Based ERD          Eliminate many-to-many relationships and include primary and foreign keys in each entity.
7.    Identify Attributes         Name the information details (fields) which are essential to the system under development.
8.    Map Attributes              For each attribute, match it with exactly one entity that it describes.
9.    Draw Fully Attributed ERD   Adjust the ERD from step 6 to account for entities or relationships discovered in step 8.
10.   Check Results               Does the final Entity Relationship Diagram accurately depict the system data?
11.   Define Secondary Keys       Secondary keys (which will contain duplicate values if the PDM has been fully normalized) may speed up SQL queries when reading or sorting.

9.4 A Simple Example

A Ministry has several departments. Each department has a supervisor and at least one employee. Employees must be assigned to at least one, but possibly more departments. At least one employee is assigned to a project, but an employee may be on vacation and not assigned to any projects. The important data fields are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee number and a unique project number.

9.4.1 Identify Entities

The entities in this system are Department, Employee, Supervisor and Project. One is tempted to make Ministry an entity, but it is a false entity because it has only one instance in this problem. True entities must have more than one instance.

9.4.2 Find Relationships

We construct the following Entity Relationship Matrix:

             Department   Employee      Supervisor   Project
Department                is assigned   run by
Employee     belongs to                              works on
Supervisor   runs
Project                   uses

9.4.3 Draw Rough ERD

We connect two entities whenever a relationship between them is shown in the Entity Relationship Matrix.

9.4.4 Fill in Cardinality

From the description of the problem we see that:

• Each department has exactly one supervisor.
• A supervisor is in charge of one and only one department.
• Each department is assigned at least one employee.
• Each employee works for at least one department.
• Each project has at least one employee working on it.
• An employee is assigned to 0 or more projects.
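The cardinalities listed above can be recorded as (minimum, maximum) pairs, which makes it easy to spot the many-to-many relationships that have to be resolved later. The sketch below is only one possible representation; the class, the dictionary and the function names are assumptions introduced for this example.

```python
from typing import NamedTuple, Optional


class Cardinality(NamedTuple):
    minimum: int
    maximum: Optional[int]  # None stands for "many" (no upper bound)

    def allows(self, count: int) -> bool:
        """True if `count` related instances satisfy this cardinality."""
        return count >= self.minimum and (
            self.maximum is None or count <= self.maximum
        )


# (entity, related entity) -> how many instances of the related entity
# one instance of the first entity has, as read from the list above.
CARDINALITIES = {
    ("Department", "Supervisor"): Cardinality(1, 1),
    ("Supervisor", "Department"): Cardinality(1, 1),
    ("Department", "Employee"): Cardinality(1, None),
    ("Employee", "Department"): Cardinality(1, None),
    ("Project", "Employee"): Cardinality(1, None),
    ("Employee", "Project"): Cardinality(0, None),
}


def is_many_to_many(a: str, b: str) -> bool:
    """A relationship is many-to-many when both directions allow 'many'."""
    return (
        CARDINALITIES[(a, b)].maximum is None
        and CARDINALITIES[(b, a)].maximum is None
    )


for pair in [("Department", "Employee"), ("Employee", "Project"),
             ("Department", "Supervisor")]:
    print(pair, is_many_to_many(*pair))
```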

9.4.5 Define Primary Keys

The primary keys are Department Name, Supervisor Number, Employee Number, Project Number.

9.4.6 Draw Key-Based ERD

There are two many-to-many relationships in the rough ERD above, between Department and Employee and between Employee and Project. Thus we need the associative entities Department-Employee and Employee-Project. The primary key for Department-Employee is the concatenated key Department Name and Employee Number. The primary key for Employee-Project is the concatenated key Employee Number and Project Number.

9.4.7 Identify Attributes

The only attributes indicated are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee number and a unique project number.

9.4.8 Map Attributes

Attribute         Entity       Attribute           Entity
Department Name   Department   Supervisor Number   Supervisor
Employee Number   Employee     Supervisor Name     Supervisor
Employee Name     Employee     Project Name        Project
Project Number    Project

9.4.9 Draw Fully Attributed ERD

9.4.10 Check Results

The final ERD appears to model the data in this system well.

9.4.11 Define Secondary Keys

Anticipate the ways (attributes and criteria) in which data will be accessed. Define secondary keys to speed up data access.

9.5 Further Discussion

9.5.1 Step 1. Identify Entities

A data entity is anything real or abstract about which we want to store data. Entity types fall into five classes: roles, events, locations, tangible things, or concepts. The best way to identify entities is to ask the system owners and users to identify things about which they would like to capture, store and produce information. Another source for identifying entities is to study the forms, files, and reports generated by the current system. E.g. a student registration form would refer to Student (a role), but also Course (an event), Instructor (a role), Advisor (a role), Room (a location), etc.

9.5.2 Step 2. Find Relationships

There are natural associations between pairs of entities. Listing the entities down the left column and across the top of a table, we can form a relationship matrix by filling in an active verb at the intersection of two entities which are related. Each row and column should have at least one relationship listed; otherwise the entity associated with that row or column does not interact with the rest of the system. In this case, you should question whether it makes sense to include that entity in the system.
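The row-and-column check described above can be sketched in a few lines. The matrix below reproduces the example of section 9.4.2 (the placement of the verbs in the cells is reconstructed from the flattened table, so treat it as an assumption); the function name and the extra "Ministry" entity are hypothetical.

```python
ENTITIES = ["Department", "Employee", "Supervisor", "Project"]

# Row entity -> {column entity: active verb}.
MATRIX = {
    "Department": {"Employee": "is assigned", "Supervisor": "run by"},
    "Employee": {"Department": "belongs to", "Project": "works on"},
    "Supervisor": {"Department": "runs"},
    "Project": {"Employee": "uses"},
}


def unconnected(entities, matrix):
    """Return entities whose row or column in the matrix is empty."""
    has_row = {e for e in entities if matrix.get(e)}
    has_column = {col for row in matrix.values() for col in row}
    return [e for e in entities if e not in has_row or e not in has_column]


print(unconnected(ENTITIES, MATRIX))                 # every entity interacts
print(unconnected(ENTITIES + ["Ministry"], MATRIX))  # "Ministry" is flagged
```

An entity reported by such a check is a candidate for removal from the model, or a sign that a relationship has been overlooked.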

9.5.3 Step 3. Draw Rough ERD

Using rectangles for entities and lines for relationships, we can draw an Entity Relationship Diagram (ERD).

9.5.4 Step 4. Fill in Cardinality

At each end of each connector joining rectangles, we need to place a symbol indicating the minimum and maximum number of instances of the adjacent rectangle there are for one instance of the rectangle at the other end of the relationship line. The placement of these numbers is often confusing. The first symbol is either 0 to indicate that it is possible for no instances of the entity joining the connector to be related to a given instance of the entity on the other side of the relationship, 1 if at least one instance is necessary or it is omitted if more than one instance is required. For example, more than one student must be enrolled in a course for it to run, but it is possible for no students to have a particular instructor (if they are on leave).

The second symbol gives the maximum number of instances of the entity joining the connector for each instance of the entity on the other side of the relationship. If there is only one such instance, this symbol is 1. If there is more than one, the symbol is a crow's foot opening towards the rectangle.

If you read it like a sentence, the first entity is the subject, the relationship is the verb, and the cardinality after the relationship tells how many direct objects (second entity) there are. For example: "A student (subject) is enrolled in (verb) one or more courses (objects)."

9.5.5 Step 5. Define Primary Keys

For each entity we must find a unique primary key so that instances of that entity can be distinguished from one another. Often a single field or property is a primary key (e.g. a Student ID). Other times the identifier is a set of fields or attributes (e.g. a course needs a department identifier, a course number, and often a section number; a Room needs a Building Name and a Room Number). When the entity is written with all its attributes, the primary key is underlined.
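A quick way to test a candidate primary key is to check that no two instances share the same key value. The sketch below uses the Room example from the text, with a key made of Building Name and Room Number; the class layout, the non-key attribute and the sample instances are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Room:
    building_name: str  # part of the primary key
    room_number: str    # part of the primary key
    capacity: int       # non-key attribute (hypothetical)

    @property
    def primary_key(self):
        """The identifier is the combination of both key attributes."""
        return (self.building_name, self.room_number)


def find_duplicate_keys(instances):
    """Return the primary-key values that occur more than once."""
    seen, duplicates = set(), set()
    for item in instances:
        key = item.primary_key
        if key in seen:
            duplicates.add(key)
        else:
            seen.add(key)
    return duplicates


rooms = [
    Room("Main", "101", 30),
    Room("Annex", "101", 12),  # same room number, different building: fine
    Room("Main", "101", 45),   # clashes with the first instance
]
print(find_duplicate_keys(rooms))
```

Room Number alone would not distinguish the first two rooms, which is why the identifier has to be the set of both attributes.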

9.5.6 Step 6. Draw Key-Based ERD

Looking at the Rough Draft ERD, we may see some relationships which are non-specific or many-to-many, i.e. there are crow's feet on both ends of the relationship line. Such relationships spell trouble later when we try to implement the related entities as data stores or data files, since each record will need an indefinite number of fields to maintain the many-to-many relationship.

Fortunately, by introducing an extra entity, called an associative entity, for each many-to-many relationship, we can solve this problem. The new associative entity's name will be the hyphenation of the names of the two originating entities. It will have a concatenated key consisting of the keys of these two entities. Each instance of the associative entity relates to exactly one instance of each of its parent entities, and each parent will have the same relationship with the associative entity that the parents had with each other before we introduced the associative entity. The original relationship between the parents will be deleted from the diagram.

The key-based ERD has no many-to-many relationships and each entity has its primary and foreign keys listed below the entity name in its rectangle.
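The naming and key rules for associative entities can be summarised in a small helper. This is a minimal sketch; the function name and the dictionary layout are assumptions, while the entity and key names come from the example in section 9.4.6.

```python
def resolve_many_to_many(entity_a, key_a, entity_b, key_b):
    """Describe the associative entity replacing an A-to-B many-to-many link."""
    return {
        # Name: hyphenation of the two originating entity names.
        "name": f"{entity_a}-{entity_b}",
        # Primary key: concatenation of the parents' keys.
        "primary_key": (key_a, key_b),
        # Each key part is also a foreign key to its parent entity.
        "foreign_keys": {key_a: entity_a, key_b: entity_b},
    }


dept_emp = resolve_many_to_many(
    "Department", "Department Name", "Employee", "Employee Number"
)
emp_proj = resolve_many_to_many(
    "Employee", "Employee Number", "Project", "Project Number"
)
print(dept_emp["name"], dept_emp["primary_key"])
print(emp_proj["name"], emp_proj["primary_key"])
```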

9.5.7 Step 7. Identify Attributes

A data attribute is a characteristic common to all or most instances of a particular entity. In this step we try to identify and name all the attributes essential to the system we are studying without trying to match them to particular entities. The best way to do this is to study the forms, files and reports currently kept by the users of the system and circle each data item on the paper copy. Cross out those which will not be transferred to the new system, extraneous items such as signatures, and constant information which is the same for all instances of the form (e.g. Ministry’s name and address). The remaining circled items should represent the attributes you need. You should always verify these with your system users. (Sometimes forms or reports are out of date.)

9.5.8 Step 8. Map Attributes

Each attribute must be matched with exactly one entity. Often it seems that an attribute should go with more than one entity (e.g. Name). In this case you need to add a modifier to the attribute name to make it unique (e.g. Customer Name, Employee Name, etc.) or determine which entity the attribute "best" describes. If you have attributes left over without corresponding entities, you may have missed an entity and its corresponding relationships. Identify these missed entities and add them to the relationship matrix now.
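The search for left-over attributes can be sketched as follows. The mapping reproduces the table of section 9.4.8; the function name and the extra attribute "Room Number" are hypothetical and only show what a missed entity would look like.

```python
ENTITIES = {"Department", "Employee", "Supervisor", "Project"}

# Attribute -> the one entity it describes (from section 9.4.8).
ATTRIBUTE_TO_ENTITY = {
    "Department Name": "Department",
    "Employee Number": "Employee",
    "Employee Name": "Employee",
    "Project Number": "Project",
    "Project Name": "Project",
    "Supervisor Number": "Supervisor",
    "Supervisor Name": "Supervisor",
}


def leftover_attributes(attributes, mapping, entities):
    """Attributes with no known entity; they may point to a missed entity."""
    return [a for a in attributes if mapping.get(a) not in entities]


# "Room Number" is a hypothetical attribute found on a form in step 7.
identified = list(ATTRIBUTE_TO_ENTITY) + ["Room Number"]
print(leftover_attributes(identified, ATTRIBUTE_TO_ENTITY, ENTITIES))
```

Using a dictionary keyed by attribute name enforces the "exactly one entity" rule by construction, since a key can map to only one value.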

9.5.9 Step 9. Draw Fully-Attributed ERD

If you introduced new entities and attributes in step 8, you need to redraw the entity relationship diagram. When you do so, try to rearrange it so no lines cross by putting the entities with the most relationships in the middle. If you use a tool like Systems Architect, redrawing the diagram is relatively easy.

Even if you have no new entities to add to the Key-Based ERD, you still need to add the attributes to the Non-Key Data section of each rectangle. Adding these attributes automatically puts them in the repository, so when we use the entity to design the new system, all its attributes will be available.

9.5.10 Step 10. Check Results

Look at your diagram from the point of view of a system owner or user. Is everything clear? Check through the Cardinality pairs. Also, look over the list of attributes associated with each entity to see if anything has been omitted.

9.5.11 Step 11. Define Secondary Keys

In a fully normalized database schema (PDM), secondary keys will contain duplicate values, since all attributes of an entity are functionally dependent only on the primary key. In practice, however, this is rarely the case, especially where non-semantic items (e.g. auto-incremented numbers) are used as primary keys.
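In a relational database, a secondary key is commonly implemented as a non-unique index. The sketch below uses Python's built-in sqlite3 module; the table, the column names, the index name and the sample rows are assumptions chosen for this example, and other database products expose query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_number INTEGER PRIMARY KEY,"  # primary key: unique
    " employee_name TEXT NOT NULL)"
)
# Secondary key: a non-unique index on an attribute expected in searches.
conn.execute("CREATE INDEX idx_employee_name ON employee (employee_name)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, "Employee A"), (2, "Employee B"), (3, "Employee A")],  # duplicates allowed
)

query = "SELECT employee_number FROM employee WHERE employee_name = ?"

# Ask SQLite how it would run the query; the last column of each plan
# row is a human-readable description of the step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("Employee A",)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)

matches = conn.execute(
    query + " ORDER BY employee_number", ("Employee A",)
).fetchall()
print(matches)
```

The query plan is a convenient way to confirm that a secondary key defined in step 11 is actually used by the access paths it was meant to speed up.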

APPENDIX III

10 Appendix III - Checklist

The following table contains a checklist of the review items of a complete Physical Data Model (PDM).

No   Item                                                                                   Yes/No   Remarks
1.   Are all needed data described within the proper entities?
2.   Are the types and sizes of the fields defined as appropriate?
3.   Are any existing data structure(s) taken into consideration?
4.   Are the relationships between entities clearly described?
5.   Are the cardinalities clearly defined?
6.   Are primary and secondary keys defined?
7.   Are the entities normalized up to the 3rd normal form? Are any deviations justified?
