
Property and Casualty Insurance Working Group

Logical Relational Data Modeling

Standards

Version 1.0

Property and Casualty Insurance Working Group

June 16, 2008

Table of Contents

Introduction

Purpose

Document Maintenance

Scope

Logical Relational Data Model Definition

ER Diagramming Conventions

Logical Relational Data Modeling Standard Page 2


Modeling Syntax

Diagramming Layout Guidelines

Normal Forms

Writing Definitions of Logical Objects

Logical Object Definition Guidelines:

Entity Definition Guidelines:

Attribute Definition Guidelines:

Naming Logical Objects

Logical Object Naming Guidelines

Entity Naming Guidelines

Attribute Naming Guidelines

Relationship Naming Guidelines

Relationship Standards

Super-types and Sub-types

Entity Keys

Dimensional Data Modeling

Appendix

Class Words


Introduction

Purpose

This document provides standards and guidance for the naming and use of objects in logical relational data models. Logical objects are created and maintained to meet business requirements. Accurate naming clarifies the specific nature of each logical object, and consistent naming gives logical names lasting value in differentiating data items. Name formation and the use of logical modeling objects are independent of any particular data modeling tool, CASE tool, or relational database management system (RDBMS) platform.

The intention of this standard is to establish an agreed-upon basis for developing logical relational data models in order to promote greater quality and consistency across data models and to enable objective model reviews.

Document Maintenance

To suggest improvements, changes or additions to this standard, contact:

Gail Austin or Harsh Sharma

[email protected] [email protected]

Scope

These standards apply to all logical relational data models that are developed by OMG submission teams.

Logical Relational Data Model Definition

The relational model for database management is a database model based on predicate logic and set theory. It was first formulated and proposed in 1969 by Edgar Codd, with aims that included avoiding, without loss of completeness, the need to write computer programs to express database queries and enforce database integrity constraints. “Relation” is a mathematical term for “table”, and thus “relational” roughly means “based on tables”. Contrary to popular belief, it does not refer to the links or “keys” between tables.¹

A logical relational data model defines what an organization knows about things of interest to the business and graphically shows how they relate to each other in an entity relationship (ER) diagram. An ER diagram is an abstract, conceptual representation of structured data. It uses standard symbols to denote the things of interest to the business (entities), the relationships between entities, and the cardinality and optionality of those relationships. In contrast to the more abstract Conceptual Relational Data Model, the Logical Relational Data Model contains the detailed characteristics of the entities (attributes) and their definitions. It is the result of detailed analysis of the business requirements. Following Model Driven Architecture principles, it generates the structure of a physical data model, which in turn generates a database.

The following illustration shows how the logical model fits into the overall data modeling process:

1 Wikipedia – relational model


Ultimately, the logical relational data model helps to solidify and validate business requirements and delivers stable, flexible data structures that are easily navigated and can answer unanticipated questions.

ER Diagramming Conventions

Modeling Syntax

The recommended notation for models is Information Engineering (IE), also known as “Crow’s Feet” notation, because it is easier for users to interpret than the Integration Definition for Information Modeling (IDEF1X) notation.²

2 The choice of IE notation will be revisited when the Barker notation becomes more widely available in the modeling tools.


Diagramming Layout Guidelines

Orient entities so that the “toes” of a relationship’s crow’s foot always point down. This places fundamental entities in the top area of the diagram and associative and subtype entities in the lower area.

[Figure: two panels, “Recommended crow’s feet down convention” and “Avoid dead crows!”, each showing the entities PERSON, CONTACT POINT and CONTACT PROFILE, where CONTACT PROFILE is keyed by Person Identifier (FK) and Contact Point Identifier (FK).]

Keep relationship lines as straight as possible and avoid unnecessary bends. Too many symbols clutter the diagram and confuse the viewer.

Avoid crossing relationship lines. Crossed lines make it difficult to understand which entities are related.

Relationship names should be placed on the diagram so that the verbs or verb phrases are read in a clockwise direction from one entity to the related entity.

Example:


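[Figure: POLICY (Policy Identifier) related to EXPOSURE (Policy Identifier (FK), Insured Object Identifier (FK), Coverage Type Identifier (FK)); the relationship is labeled “covers” from POLICY to EXPOSURE and “is covered by” from EXPOSURE to POLICY.]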

Normal Forms

Normal forms provide a way to structure data to eliminate undesirable redundancies, inconsistencies and dependencies. Normalization is a formalized technique for creating the most desirable logical model for the given data and business rules. Completed logical models should be in at least Boyce/Codd Normal Form (BCNF).³ For a model to be in BCNF, every entity in the model must be in BCNF. The normal forms are summarized below:

First Normal Form (1NF) identifies and eliminates repeating groups and establishes a primary key.

Second Normal Form (2NF) identifies and removes partial-key dependencies, in which a non-key attribute depends on only part of a composite key. This applies only to entities with composite keys.

Third Normal Form (3NF) identifies and eliminates non-key attributes that are dependent on other non-key attributes.

Boyce/Codd Normal Form (BCNF) identifies and eliminates dependencies in which an attribute, typically part of a composite key, is determined by attributes that do not form a candidate key. In BCNF, every determinant is a candidate key.
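One way to make the BCNF rule concrete is to test each functional dependency of an entity: the entity is in BCNF when the determinant (left-hand side) of every non-trivial dependency is a superkey. The Python sketch below assumes the dependencies have already been gathered from the business rules; the entity POLICY COVERAGE and its attributes are hypothetical examples, not part of this standard.

```python
# Minimal BCNF check: an entity is in BCNF when the determinant of every
# non-trivial functional dependency is a superkey of the entity.
FD = tuple[frozenset[str], frozenset[str]]  # (determinant, dependent attributes)


def closure(attrs: frozenset[str], fds: list[FD]) -> set[str]:
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes: set[str], fds: list[FD]) -> list[FD]:
    """List the non-trivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attributes
    ]


# Hypothetical entity POLICY COVERAGE keyed by Policy Identifier + Coverage Type Code.
attributes = {"Policy Identifier", "Coverage Type Code",
              "Coverage Limit Amount", "Policy Effective Date"}
fds = [
    (frozenset({"Policy Identifier", "Coverage Type Code"}),
     frozenset({"Coverage Limit Amount"})),
    # Partial-key dependency: the effective date depends on the policy alone.
    (frozenset({"Policy Identifier"}), frozenset({"Policy Effective Date"})),
]
print(bcnf_violations(attributes, fds))
```

The check reports the partial-key dependency of Policy Effective Date on Policy Identifier; moving that attribute to an entity keyed by Policy Identifier alone leaves POLICY COVERAGE with no violations.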

3 See Wikipedia Database Normalization: http://en.wikipedia.org/wiki/Database_normalization


Writing Definitions of Logical Objects

Good logical object names are important because they provide a persistent record of the unique nature of each object. Good names cannot be developed unless the object first has a good business definition.

Logical Object Definition Guidelines:

Use industry definitions where possible and appropriate.

Describe what the entity or attribute is, not where, when or by whom it is used.

Be clear and concise.

Write as if the reader is unfamiliar with the business area.

Use business terms rather than technical terms to express the meaning and importance to the business.

Use mixed case according to standard business English conventions.

Do not use jargon, abbreviations or acronyms.

Do not include information that should be documented elsewhere, such as process descriptions.

Entity Definition Guidelines:

Entity definitions should be robust and communicate the essential and unique business nature of the entity.

Do not depend on or refer to the definition of another object in the model.

Express one concept or idea; each entity should have a unique meaning.

Attribute Definition Guidelines:

Attribute definitions should communicate the essential business nature and purpose of the attribute.

Do not depend on or refer to the definition of another object in the model, except for derived attributes.

Include the domain of allowed values and the default value where appropriate.


Naming Logical Objects

Logical Object Naming Guidelines

Use one or more words formed from the 26 letters (A-Z) and the 10 digits (0-9), with no special characters.

Separate words in the name with one space.

Spell out words completely, using no abbreviations.

Use the minimum set of words that completely and uniquely captures the concept expressed in the business definition.

Reflect the business nature of the object in its name.

Review names and corresponding definitions with business subject matter experts and get their approval.

Express a single idea or concept in the name that is clear and self-explanatory.

Write in plain English, spelling out all terms in full using business terms as defined by the business client or in a business or industry dictionary.

Do not use the possessive form; the articles “a”, “an”, or “the”; conjunctions; verbs; or prepositions in the name.

Do not use the names of organizations, departments, computer applications, reports, publications, forms or computer screens in the name.
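The character and word rules above lend themselves to an automated check. The Python sketch below is one possible approach, not a tool defined by this standard: it accepts letters and digits separated by single spaces in either case (entity names use UPPER CASE and attribute names Title Case) and flags a small list of articles, conjunctions and prepositions. That word list is an illustrative assumption, and the check does not implement the exceptions that follow (hyphen, slash, acronyms, camel case).

```python
import re

# One or more words of letters (A-Z, either case) or digits (0-9),
# separated by exactly one space; any other character fails the match.
NAME_PATTERN = re.compile(r"[A-Za-z0-9]+(?: [A-Za-z0-9]+)*")

# Illustrative, non-exhaustive list (an assumption): articles plus a few
# common conjunctions and prepositions.
DISALLOWED_WORDS = {"a", "an", "the", "and", "or", "of", "for", "by", "with", "to"}


def naming_problems(name: str) -> list[str]:
    """Return the base-rule problems found in a logical object name."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("use only A-Z, 0-9 and single spaces between words")
    for word in name.split(" "):
        if word.lower() in DISALLOWED_WORDS:
            problems.append(f"remove the word '{word}'")
    return problems


for candidate in ["Birth Date", "Date of Birth", "Driver's License Number"]:
    print(candidate, "->", naming_problems(candidate) or "OK")
```

For example, “Date of Birth” is flagged for the preposition “of”, the possessive in “Driver's License Number” fails the character rule, and “Birth Date” passes.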

Exceptions

Acronyms: An acronym is a word formed from the initial letters of a name, as WAC for Women’s Army Corps, or by combining initial letters or parts of a series of words, as radar for radio detecting and ranging. A widely known acronym may be an exception to the no-abbreviation rule. A list of approved exceptions should be maintained as an appendix to this standard, subject to an approval and governance process.

Abbreviations: If the object name is too long to fit in the space allotted by the data modeling tool and all non-essential words have been eliminated from the name, abbreviate the class word. If the name is still too long, find text in the name that can form acronyms. Starting with the right-most text, apply an acronym, then repeat, moving left through the name, until the name fits.

Hyphen: Use a hyphen if the correct spelling of the word contains one (e.g. off-premises).

Slash: Allowed if used in a business term (e.g. Actual/Expected).

Camel Case: Allowed if the business term has an uppercase letter beyond the first letter. Though rarely found in formal written English, camel case is sometimes found in product or company names (e.g. NetQuote, SmartBrief).
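The abbreviation exception is essentially an algorithm: abbreviate the class word first, then replace text with acronyms from right to left until the name fits. A minimal Python sketch follows. It assumes that non-essential words have already been removed, that the class word is the last word of the name, and that the caller supplies the tool's maximum length, the class-word abbreviations (from the Appendix) and the approved acronym list; the acronym VIN below is a hypothetical entry in such a list.

```python
def shorten_name(name: str, max_length: int,
                 class_word_abbreviations: dict[str, str],
                 acronyms: dict[str, str]) -> str:
    """Shorten a logical name following the Abbreviations exception.

    Step 1 abbreviates the class word (assumed to be the last word).
    Step 2 replaces phrases with approved acronyms, starting with the
    right-most text and moving left, until the name fits.
    """
    words = name.split(" ")
    if len(name) > max_length:
        words[-1] = class_word_abbreviations.get(words[-1], words[-1])
    end = len(words)  # only words[:end] are still candidates for an acronym
    while len(" ".join(words)) > max_length and end > 0:
        # Try the longest phrase that ends at position `end` first.
        for start in range(end):
            phrase = " ".join(words[start:end])
            if phrase in acronyms:
                words[start:end] = [acronyms[phrase]]
                end = start  # continue to the left of the new acronym
                break
        else:
            end -= 1  # no acronym ends here; move one word to the left
    return " ".join(words)  # may still exceed max_length if nothing applies


print(shorten_name("Vehicle Identification Number Verification Indicator", 30,
                   {"Indicator": "IND"},
                   {"Vehicle Identification Number": "VIN"}))
# prints: VIN Verification IND
```

The function returns the shortest name the rules allow; if it is still longer than the limit, the name needs to be reworded rather than abbreviated further.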

Entity Naming Guidelines

Form a meaningful, concise, descriptive business name for the entity by extracting the important concepts from its business definition. The name should avoid confusion with similarly named but differently defined entities in other business areas.

Use business terms as defined by a business subject matter expert or by a business dictionary.

Make the entity name a singular noun or noun phrase with qualifying adjectives, because each instance of the object represented by the entity is a single thing.

Use UPPER CASE.

Consider appending “LOOKUP” to reference entity names to make them easier to distinguish from fundamental entities.

Do not use the words “Entity” or “Table” in the entity name unless they are part of common business terminology.

Combine the names of the parent entities to form the name of the associative entity if that forms a meaningful business name. For example, PERSON SKILL describes the association between the PERSON and SKILL entities. In other cases, the noun form of the relationship verb may form the associative entity name; for example, POLICY describes an association between one PARTY and another PARTY.
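To illustrate how an associative entity such as PERSON SKILL might carry through to a physical schema generated from the logical model, the Python sketch below uses the standard-library sqlite3 module. The physical names (upper case with underscores) and the integer identifiers are assumptions for illustration only, since this standard governs logical names. The associative table's primary key is made up of the two identifying foreign keys, which is also how the many-to-many relationship between PERSON and SKILL is resolved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE PERSON (
    PERSON_IDENTIFIER INTEGER PRIMARY KEY
);
CREATE TABLE SKILL (
    SKILL_IDENTIFIER INTEGER PRIMARY KEY
);
-- Associative entity PERSON SKILL: its primary key consists of the two
-- identifying foreign keys, so each person/skill pair is recorded once.
CREATE TABLE PERSON_SKILL (
    PERSON_IDENTIFIER INTEGER NOT NULL REFERENCES PERSON (PERSON_IDENTIFIER),
    SKILL_IDENTIFIER  INTEGER NOT NULL REFERENCES SKILL (SKILL_IDENTIFIER),
    PRIMARY KEY (PERSON_IDENTIFIER, SKILL_IDENTIFIER)
);
""")

conn.execute("INSERT INTO PERSON VALUES (1)")
conn.execute("INSERT INTO SKILL VALUES (10)")
conn.execute("INSERT INTO PERSON_SKILL VALUES (1, 10)")

for bad_row in [(1, 10), (2, 10)]:  # duplicate pair, then an unknown person
    try:
        conn.execute("INSERT INTO PERSON_SKILL VALUES (?, ?)", bad_row)
    except sqlite3.IntegrityError as error:
        print(bad_row, "rejected:", error)
```

Both inserts in the loop are rejected: the first by the composite primary key, the second by the foreign key to PERSON.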

Attribute Naming Guidelines

Form a meaningful, concise, descriptive business name for the attribute by extracting the important concepts from its business definition. An attribute that appears in more than one model should have the same name and definition in all models.

Use a singular noun or singular noun phrase with qualifying adjectives that are meaningful to the business.

Use Title Case.

Do not use a class word or its abbreviation by itself as an attribute name.

Do not use the word “Attribute” in the attribute name unless it is part of common business terminology.

Attribute Name Structure

o An attribute name begins with at least one Qualifier followed by a Class Word. Conjunctions, verbs and other parts of speech are eliminated when they do not affect the meaning of the name.

o Class words describe the type of data identified by the attribute name. Examples include: amount, code, date, indicator, name and number.

o End the name with the approved class word that best categorizes the attribute. Class words may also indicate the data type and possible values of the attribute; e.g. an indicator is always a single alphanumeric character with only two possible values other than Null: ‘Y’ or ‘N’.⁴

o Units of Measure describe the unit in which a quantity, such as height or volume, was measured.

o Objects are used for program objects, images, sounds and videos.

4 See Appendix for details on Class Words.


Examples of logical attribute names and their components:

Qualifiers (Modifier / Prime Word) | Class Word | Class Word Type
Automobile Acquisition | Date | Key Word
Insurance Company | Name | Key Word
Payment Status | Code | Key Word
Valid Driver License | Indicator | Key Word
Vehicle Engine Capacity | Cubic Centimeters | Unit of Measure
Accident Photograph | Image Jpeg | Object
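The attribute name structure (one or more qualifiers followed by a class word) can be checked mechanically. The Python sketch below is an illustration only: it uses a subset of the class words from the Appendix, tries multi-word class words such as Cubic Centimeters before shorter ones, and requires at least one qualifier, because a class word may not be used by itself as an attribute name.

```python
# Subset of approved class words from the Appendix (key words and units of measure).
CLASS_WORDS = {
    "Amount", "Code", "Count", "Date", "Description", "Identifier", "Indicator",
    "Line", "Name", "Number", "Percentage", "Text", "Time", "Timestamp",
    "Cubic Centimeters", "Miles", "Months", "Years",
}


def split_attribute_name(name: str) -> tuple[list[str], str]:
    """Split an attribute name into its qualifiers and its class word."""
    words = name.split(" ")
    # Longest suffix first, always leaving at least one word as a qualifier.
    for size in range(len(words) - 1, 0, -1):
        candidate = " ".join(words[-size:])
        if candidate in CLASS_WORDS:
            return words[:-size], candidate
    raise ValueError(f"{name!r} does not end with an approved class word "
                     "preceded by at least one qualifier")


print(split_attribute_name("Vehicle Engine Capacity Cubic Centimeters"))
# (['Vehicle', 'Engine', 'Capacity'], 'Cubic Centimeters')
print(split_attribute_name("Payment Status Code"))
# (['Payment', 'Status'], 'Code')
```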

Relationship Naming Guidelines

The relationship name should be a verb or verb phrase in the third person singular form, i.e. a verb form that is appropriate for a single occurrence of the entity. The verb or verb phrase should be active in the parent-to-child direction and passive in the child-to-parent direction. Combined with the cardinality and optionality information, the verb or verb phrase allows the relationship to be read as bi-directional English sentences. For example: A POLICY covers zero, one or many EXPOSURE(S). An EXPOSURE is covered by one and only one POLICY.


Do not include words that convey cardinality or optionality in the verb phrase; words such as ‘may’, ‘must’, ‘one and only one’ or ‘one or many’ are derived from the relationship symbols.

Avoid using generic or vague words and phrases such as ‘is’, ‘has’, ‘consists of’, ‘relates to’, ‘associated with’, etc.
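Because the verb phrases carry no cardinality or optionality words, the two reading sentences can be produced by combining the names with the relationship's symbols. The Python sketch below shows one way to do this for a one-to-many relationship; the Relationship fields, the phrase wording and the list of forbidden words are assumptions that mirror the POLICY and EXPOSURE example, and choosing “A” or “An” from the first letter is a simplification.

```python
from dataclasses import dataclass

# Words that belong to the relationship symbols, not the verb phrase.
CARDINALITY_WORDS = {"may", "must", "one", "many", "zero"}


@dataclass
class Relationship:
    parent: str
    child: str
    verb: str              # active, parent to child, e.g. "covers"
    inverse_verb: str      # passive, child to parent, e.g. "is covered by"
    child_optional: bool   # can a parent exist with no children?
    child_many: bool       # can a parent have more than one child?
    parent_optional: bool  # can a child exist with no parent?

    def __post_init__(self):
        for phrase in (self.verb, self.inverse_verb):
            if CARDINALITY_WORDS & set(phrase.lower().split()):
                raise ValueError(f"verb phrase conveys cardinality: {phrase!r}")


def article(noun: str) -> str:
    return "An" if noun[0] in "AEIOU" else "A"  # simplification by first letter


def sentences(rel: Relationship) -> tuple[str, str]:
    """Read a one-to-many relationship in both directions."""
    if rel.child_many:
        quantity = "zero, one or many" if rel.child_optional else "one or many"
        child = f"{rel.child}(S)"
    else:
        quantity = "zero or one" if rel.child_optional else "one and only one"
        child = rel.child
    parent_quantity = "zero or one" if rel.parent_optional else "one and only one"
    return (
        f"{article(rel.parent)} {rel.parent} {rel.verb} {quantity} {child}.",
        f"{article(rel.child)} {rel.child} {rel.inverse_verb} {parent_quantity} {rel.parent}.",
    )


covers = Relationship("POLICY", "EXPOSURE", "covers", "is covered by",
                      child_optional=True, child_many=True, parent_optional=False)
for sentence in sentences(covers):
    print(sentence)
```

The two printed sentences match the POLICY and EXPOSURE reading given in the Relationship Naming Guidelines, while a verb phrase such as “may cover” is rejected.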

Relationship Standards

A relationship describes the precise business rules governing the association between two entities and facilitates the identification of the foreign keys and referential integrity rules that may be required in the database design.

The minimum components that must be specified for each relationship are:

o Name: a verb or verb phrase from parent to child

o Optionality rules

o Cardinality rules

o Qualification as an identifying or non-identifying relationship

Many-to-many relationships are desirable in Conceptual Data Models but should always be resolved with an associative entity in a Logical Data Model, even if the associative entity has no attributes other than the keys.

Investigate all mandatory one-to-one relationships, because the two entities are usually one entity.

Eliminate circular relationships, because they cause problems establishing proper data dependency sequences. They usually result from an incorrect or misunderstood business rule.


Eliminate redundant relationships that consist of two dependency paths between the same two entities: one path is a direct relationship between the entities, and the other is an indirect path through other entities. Such redundant relationships may lead to problems with database consistency.

Carefully review multiple relationships between the same two entities, as they tend to represent process logic and may introduce conflicting cardinalities. If the multiple relationships are created to document roles, a better solution may be to create a role entity with appropriate subtypes.
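Circular and redundant relationships can both be found by treating the model as a directed graph of parent-to-child relationships. The Python sketch below is an illustration rather than a prescribed tool. It uses the standard-library graphlib module (Python 3.9 or later): a successful topological sort yields a valid data dependency sequence, a CycleError exposes a circular relationship, and a direct relationship that duplicates an indirect path is flagged for review as potentially redundant. The POLICY COVERAGE entity and the PARTY/POLICY cycle are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def dependency_sequence(relationships: list[tuple[str, str]]) -> list[str]:
    """Order entities so every parent precedes its children.

    Raises graphlib.CycleError when the relationships are circular."""
    graph: dict[str, set[str]] = {}
    for parent, child in relationships:
        graph.setdefault(parent, set())
        graph.setdefault(child, set()).add(parent)  # child depends on parent
    return list(TopologicalSorter(graph).static_order())


def redundant_relationships(relationships: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Flag direct relationships that duplicate an indirect path."""
    children: dict[str, set[str]] = {}
    for parent, child in relationships:
        children.setdefault(parent, set()).add(child)

    def reachable_without(start: str, target: str) -> bool:
        # Depth-first search from start to target, skipping the direct edge.
        stack = [c for c in children.get(start, ()) if c != target]
        seen = set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(children.get(node, ()))
        return False

    return [(p, c) for p, c in relationships if reachable_without(p, c)]


model = [("POLICY", "EXPOSURE"), ("INSURED OBJECT", "EXPOSURE"),
         ("POLICY", "POLICY COVERAGE"), ("POLICY COVERAGE", "EXPOSURE")]
print(dependency_sequence(model))
print(redundant_relationships(model))
try:
    dependency_sequence([("PARTY", "POLICY"), ("POLICY", "PARTY")])
except CycleError as error:
    print("circular relationship:", error.args[1])
```

In the example, the direct POLICY to EXPOSURE relationship is flagged because EXPOSURE is also reachable through POLICY COVERAGE; whether it is truly redundant remains a business decision.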

Super-types and Sub-types

Super-types and sub-types can be the result of either a bottom-up generalization process or a top-down specialization process. The result is a super-type (parent) that contains the attributes shared by all of its sub-types, and sub-types (children) that inherit all the shared attributes from the super-type but also have unique attributes of their own.

A sub-type has an ‘is a’ relationship to its super-type. Sub-type relationships are not ‘composed of’ relationships.

Super-types and sub-types clarify complex business rules and constraints between entities.

The super-type and sub-type have an exclusive OR relationship. An instance of the super-type can be an instance of only one of the sub-type entities.


An example of super-types and sub-types:

[Figure: super-type INSURED OBJECT (Insured Object Identifier; Geographic Location ID (FK)) with the sub-type entities HOME, VEHICLE (Registration State Code (FK)), MOTORCYCLE, RECREATIONAL VEHICLE and AUTOMOBILE, each keyed by Insured Object Identifier (FK).]
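The exclusive OR rule can also be enforced in a physical schema generated from the model. The Python/sqlite3 sketch below uses one common technique that this standard does not prescribe: a discriminator attribute on the super-type (here a hypothetical INSURED_OBJECT_TYPE_CODE), a unique constraint on the identifier plus the discriminator, and a sub-type foreign key that includes the discriminator fixed to the sub-type's own value. Only HOME and VEHICLE are shown, and the physical names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE INSURED_OBJECT (
    INSURED_OBJECT_IDENTIFIER INTEGER PRIMARY KEY,
    INSURED_OBJECT_TYPE_CODE  TEXT NOT NULL
        CHECK (INSURED_OBJECT_TYPE_CODE IN ('HOME', 'VEHICLE')),
    UNIQUE (INSURED_OBJECT_IDENTIFIER, INSURED_OBJECT_TYPE_CODE)
);
-- Each sub-type repeats the discriminator, pinned to its own value, so a
-- super-type instance can match a row in only one sub-type table.
CREATE TABLE HOME (
    INSURED_OBJECT_IDENTIFIER INTEGER PRIMARY KEY,
    INSURED_OBJECT_TYPE_CODE  TEXT NOT NULL DEFAULT 'HOME'
        CHECK (INSURED_OBJECT_TYPE_CODE = 'HOME'),
    FOREIGN KEY (INSURED_OBJECT_IDENTIFIER, INSURED_OBJECT_TYPE_CODE)
        REFERENCES INSURED_OBJECT (INSURED_OBJECT_IDENTIFIER, INSURED_OBJECT_TYPE_CODE)
);
CREATE TABLE VEHICLE (
    INSURED_OBJECT_IDENTIFIER INTEGER PRIMARY KEY,
    INSURED_OBJECT_TYPE_CODE  TEXT NOT NULL DEFAULT 'VEHICLE'
        CHECK (INSURED_OBJECT_TYPE_CODE = 'VEHICLE'),
    FOREIGN KEY (INSURED_OBJECT_IDENTIFIER, INSURED_OBJECT_TYPE_CODE)
        REFERENCES INSURED_OBJECT (INSURED_OBJECT_IDENTIFIER, INSURED_OBJECT_TYPE_CODE)
);
""")

conn.execute("INSERT INTO INSURED_OBJECT VALUES (1, 'HOME')")
conn.execute("INSERT INTO HOME (INSURED_OBJECT_IDENTIFIER) VALUES (1)")
try:
    # Insured object 1 is a HOME, so it cannot also be a VEHICLE.
    conn.execute("INSERT INTO VEHICLE (INSURED_OBJECT_IDENTIFIER) VALUES (1)")
except sqlite3.IntegrityError as error:
    print("rejected:", error)
```

This enforces exclusivity only; it does not require every INSURED OBJECT to have a sub-type row, which would need an additional rule.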


Entity Keys

A key identifies specific occurrences of an entity. Keys can be simple, consisting of a single attribute, or composite, consisting of two or more attributes.

A Candidate Key uniquely identifies occurrences of an entity. There may be more than one candidate key for an entity. Candidate keys are not usually recorded in the logical data model because each one becomes either the primary key or an alternate key.

A Primary Key is the single candidate key selected as the ‘primary’ unique identifier for the entity.

o The primary key must be stable for a relational data model. If the value were to change over time, the result could be either a non-unique key value or multiple key values for one instance of an entity. Either situation could cause ambiguous or lost data, system crashes or difficult update processes.

o The primary key should be definitive. Because it uniquely identifies an instance of the entity, no instance can be added to the entity until its identity is fully known. The primary key cannot be nullable or contain nullable components.

o The primary key should use the minimal number of attributes required to define a unique instance of the entity. A concise key has advantages in the physical database, such as smaller indexes and foreign keys.

An Alternate Key is any candidate key not selected as the primary key of an entity. Alternate keys are not usually recorded in the logical model but may become indexes in the physical model. These indexes are usually unique but are not required to be.

A Surrogate Key consists of a single attribute created for the sole purpose of uniquely identifying an instance of an entity. By contrast, natural keys consist of attributes that ‘naturally’ belong to each occurrence of the entity. Surrogate keys are identifiers that contain no inherent, embedded data about the entity; that is, they are always non-intelligent keys. A surrogate key is usually a numeric attribute whose value can be generated automatically, either as a sequential number or a random number. Synonyms for a surrogate key include: artificial key, synthetic key, arbitrary key, and system-generated key.

A Foreign Key is the primary key of one entity (the ‘parent’ or independent entity) that is duplicated in a separate, related entity (the ‘child’ or dependent entity). A foreign key is not required to be unique within the child entity. A foreign key that is part of a composite primary key in the child entity is known as an identifying or primary foreign key. Attributes in a non-identifying foreign key become non-key attributes in the child entity.
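The properties required of a key (unique, not nullable, and minimal) can be checked against sample data. The Python sketch below is illustrative only: sample rows can disprove a proposed key but never prove one, so candidate keys must still come from the business rules. The rows reuse the EXPOSURE key attributes from the relationship example; the values are hypothetical.

```python
from itertools import combinations

Row = dict[str, object]


def is_unique(rows: list[Row], attrs: tuple[str, ...]) -> bool:
    """True if attrs are never null and no two rows share the same values."""
    seen = set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if None in values or values in seen:
            return False
        seen.add(values)
    return True


def could_be_candidate_key(rows: list[Row], attrs: tuple[str, ...]) -> bool:
    """Check a proposed key against sample rows: non-null, unique and minimal."""
    if not is_unique(rows, attrs):
        return False
    # Minimal: no proper subset of the attributes is already unique.
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


rows = [
    {"Policy Identifier": 1, "Insured Object Identifier": 7, "Coverage Type Identifier": "COLL"},
    {"Policy Identifier": 1, "Insured Object Identifier": 7, "Coverage Type Identifier": "LIAB"},
    {"Policy Identifier": 1, "Insured Object Identifier": 8, "Coverage Type Identifier": "COLL"},
    {"Policy Identifier": 2, "Insured Object Identifier": 7, "Coverage Type Identifier": "COLL"},
]
key = ("Policy Identifier", "Insured Object Identifier", "Coverage Type Identifier")
print(could_be_candidate_key(rows, key))      # True
print(could_be_candidate_key(rows, key[:2]))  # False: duplicate values
```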

Dimensional Data Modeling

Dimensional data modeling concepts such as the grain of the model, conformed dimensions, and diagramming layouts deserve coverage in a standards document dedicated to dimensional modeling. The next few paragraphs describe which parts of this Relational Data Modeling Standard apply to dimensional modeling and which do not.

Relational Data Models are designed to support operational databases that capture complex information accurately. They deliver stable, flexible data structures that are easily navigated and can answer unanticipated questions. Dimensional Data Models are designed to support reporting and business analytics databases. They deliver simple, high-performance queries that answer a set of anticipated questions.

Although Relational and Dimensional Data Models serve different purposes, they share many of the same standards. Most importantly, they both use the Model Driven Architecture approach. Also, the Logical Object, Entity, and Attribute Definition and Naming Guidelines apply to both styles of modeling. Both are entity relationship diagrams, and both use the same IE modeling syntax. The Relationship Standards also apply to both, though in practice relationship names are used less often in dimensional models than in relational models.


Dimensional models use a denormalized design, so super-types and sub-types would be merged. Their diagrams often use a star schema layout and occasionally a snowflake layout, so their diagramming layout guidelines differ from those of the relational model.

Appendix

Class Words

The three tables below enumerate the approved class words, which come in three flavors: key words, units of measure and objects. Each class word has a standard abbreviation. Key words also have a definition and an associated logical data type, and their Example column shows a typical column name and data value.

Key Word | Abbreviation | Definition | Logical Datatype | Example
Amount | AMT | A quantity of money. | NUMERIC | Policy Face Amount = 1,200.0
Code | CD | Letters and numbers used for brevity to identify something. | STRING | Sales Office Code = AR11
Count | CNT | A numeric count or calculated quantity of anything other than money, used when no unit of measure applies. | NUMERIC | Active Employee Count = 41,256
Date | DT | Time stated in terms of year, month and day. | DATE | Disability Date = 2002/4/5
Description | DSCR | A statement that represents something in words. | STRING | Policy Change Reason Description = “Match coverage to changed income”
Identifier, ID, Identification, Identity | ID | Data that serves to uniquely identify one item in a group. | STRING or NUMERIC | Employee ID = 0123456
Indicator | IND | Data that can have only one of two values other than NULL: Y(es) or N(o). | STRING (1 character) | Auditing Approval Indicator = Y
Line | LN | A set of characters normally printed or displayed as one horizontal row. | STRING | First Address Line = “451 MAIN ST”
Name | NM | A word or words by which a thing is designated and distinguished from others. | STRING | Person Full Name = “Sammy Somerset”
Number | NUM | Normally numeric data used to identify ordinal position or to distinguish between items in a set. When numeric, it must always be a whole number. | STRING or NUMERIC | Arrival Sequence Number = 5
Objects | See Object list below | Binary objects, such as program objects, images, sounds, or videos. | STRING |
Percent, Percentage | PCT | Numeric data specifying a portion or share out of each 100 units. (75 units out of 100 is 75 percent (%). Percent values are multiplied by 0.01 in order to facilitate customary processing. In the example, 75 percent would be stored as 0.7500 but displayed as 75.00 %.) | NUMERIC | Sales Closure Percentage = .7500
Text | TXT | Data having relatively undefined content and arrangement, such as a note, comments or an explanation. | STRING | Audience Comment Text = “Enthusiastic and attentive”
Time | TM | Time stated in terms of hours, minutes and seconds. | TIME | Check-In Time = 8:45 AM
Timestamp | TS | Time stated in terms of year, month, day, hours, minutes, seconds and fractions of seconds. Identifies an instant in time. | TIMESTAMP | Transaction Timestamp = 20021203134516.872
Units of Measure | See Unit of Measure list below | All units of measure, e.g. Feet, Months, Miles, Centimeters. | NUMERIC |
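The Indicator and Percentage definitions in the table above translate directly into value rules. The Python sketch below illustrates them under stated assumptions: the function names are hypothetical, Python's None stands in for NULL, and the decimal module is used so that 75 percent is stored exactly as 0.7500 (multiplied by 0.01) and displayed as 75.00 %, as in the table.

```python
from decimal import Decimal
from typing import Optional


def is_valid_indicator(value: Optional[str]) -> bool:
    """An Indicator holds one character, 'Y' or 'N', or is NULL (None)."""
    return value is None or value in ("Y", "N")


def store_percentage(displayed_percent: str) -> Decimal:
    """Multiply a displayed percent by 0.01 and keep four decimal places."""
    return (Decimal(displayed_percent) * Decimal("0.01")).quantize(Decimal("0.0001"))


def display_percentage(stored: Decimal) -> str:
    """Show a stored fraction as a percent with two decimal places."""
    return f"{(stored * 100).quantize(Decimal('0.01'))} %"


stored = store_percentage("75")
print(stored, display_percentage(stored))  # 0.7500 75.00 %
print(is_valid_indicator("Y"), is_valid_indicator("Yes"))  # True False
```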


Unit of Measure | Abbreviation
Beats per Minute | BPM
Centimeters (Centimetres) | CM
Cubic Centimeters (Centimetres) | CC
Days | DAY
Degrees | DEG
Feet | FT
Grams | G
Horsepower | HP
Hours | HR
Inches | IN
Kilograms | KG
Kilometers (Kilometres) | KM
Kilometers (Kilometres) per Hour | KMH
Liters (Litres) | L
Meters (Metres) | M
Miles | MILE
Miles per Hour | MPH
Millimeters (Millimetres) | MM
Minutes | MIN
Months | MO
Ounces | OZ
Pounds | LB
Units: a generic unit of measure (UOM) used when data with different UOMs will be stored in a common column. In this case there must be a companion code column containing a UOM abbreviation that indicates the UOM of the Units value. | UNIT
Weeks | WK
Years | YR
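The generic “Units” entry depends on a companion code column that names the unit of measure of each value. A minimal Python sketch of that pairing follows; the class and field names are hypothetical, and only a subset of the abbreviations from the table is included.

```python
from dataclasses import dataclass

# Subset of the approved unit-of-measure abbreviations from the table.
APPROVED_UOM_CODES = {"CM", "DAY", "FT", "KG", "LB", "MILE", "MO", "WK", "YR"}


@dataclass(frozen=True)
class UnitsValue:
    """A generic Units value paired with its companion UOM code column."""
    units: float
    uom_code: str  # companion column holding a UOM abbreviation

    def __post_init__(self):
        if self.uom_code not in APPROVED_UOM_CODES:
            raise ValueError(f"unapproved unit of measure code: {self.uom_code!r}")


# Values with different units of measure stored in one common column.
column = [UnitsValue(12, "MO"), UnitsValue(2, "YR"), UnitsValue(30, "DAY")]
for value in column:
    print(value.units, value.uom_code)
```

Without the companion code, a value such as 12 in the common column would be ambiguous, since it could mean 12 months or 12 years.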

Object Type | Object Class | Abbreviation
C++ | Program Object | OBJ_C
PowerBuilder | Program Object | OBJ_PB
SmallTalk | Program Object | OBJ_ST
Bitmap | Image | IMG_BMP
Gif | Image | IMG_GIF
Jpeg | Image | IMG_JPG
Rav | Sound | SND_RAV
