GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA -...

30
Page 1 CvLvP - May 31, 2016 © Gordon C. Everest, All rights reserved. Presentation to DAMA, Minnesota, 2016/06 Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE © Gordon C. Everest Professor Emeritus of MIS and Database Carlson School of Management University of Minnesota [email protected] http://geverest.umn.edu Conceptual vs. Logical vs. Physical Stages of Data Modeling 2 Outline Goals of this presentation Levels of Data Models - Conceptual vs. Logical vs. Physical Data Models - Role of Abstraction in Conceptual Models; examples Data Modeling Data Modeling Schemes Data Models – focus and name for Stages of Data Models/Modeling - A continuum of introducing modeling constructs - Starts with a user narrative => elementary fact sentences - Objects (nouns) – instances, types, populations, sub/supertypes - Relationships (verbs) => Characteristics and Constraints - Attributes – where do they fit? - Identifiers, Keys, Foreign Keys CvLvP [slide#] [4] [10] [23] [30] [35] [40] N

Transcript of GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA -...

Page 1: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 1

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

1

DAMA - Minnesota, 2016 JuneGETITLE

©Gordon C. EverestProfessor Emeritus of MIS and Database

Carlson School of ManagementUniversity of Minnesota

[email protected] http://geverest.umn.edu

Conceptualvs. Logicalvs. Physical

Stages of DataModeling

2

Outline

Goals of this presentation

• Levels of Data Models- Conceptual vs. Logical vs. Physical Data Models- Role of Abstraction in Conceptual Models; examples

• Data Modeling • Data Modeling Schemes• Data Models – focus and name for• Stages of Data Models/Modeling

- A continuum of introducing modeling constructs- Starts with a user narrative => elementary fact sentences- Objects (nouns) – instances, types, populations, sub/supertypes- Relationships (verbs) => Characteristics and Constraints- Attributes – where do they fit?- Identifiers, Keys, Foreign Keys

CvLvP

[slide#]

[4]

[10]

[23]

[30]

[35]

[40]

N

Page 2: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 2

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

3

SCOPE

User Narratives

SENTENCES“FACTS”

NOUNS

VERBS

OBJECTInstances

OBJECTTYPES

OBJECTNAMES

OBJECTPopulation

CONSTRAINTS

CONSTRAINTSdefined after

element introduced

ObjectIDENTIFIERS

DATA TYPES

SUB/SUPERTYPES

S/STYPECONSTRAINTS

RELATIONSHIPTYPES

RelationshipNAMES

MULTIPLICITY

DEPENDENCY

CLUSTER

ENTITYRecords

ATTRIBUTES

KEYS FOREIGNKEYS

DENORMALIZE

INDEXES

COLUMN NAMES

RESOLVEto Tables

RelationalTABLES

Relationship& Role Set

CONSTRAINTS

( 1NF )

BINARYonly

Fact ModelingER/RelationalModeling

PhysicalModeling

Stages of Data Modeling: Introducing Data Elements or Constraints

A B C

ROLENAMES

PARTITIONING

DISTRIBUTION

CvLvP

?

4

Common Understanding - Levels

• Conceptual – high-level, enterprise-wide, abstract model

• Physical – How data is stored in some database system

• Logical – adding detail to the conceptual model, … free of physical implementation details which do not contribute to the logical understanding of the data model.

– Often considered the ER or Relational Model.

CvLvP

Let’s look at the generic meaning of these terms, but first…

Generally depictedas a pyramid,

implying levelsof models:

Conceptual

Logical

Physical

See David HAY video

Page 3: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 3

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

5

Conceptual Data Model? Logical?

MAINTDIST Maintenance

District

COMMISSIONCommissioner

COMMWORKCommissionerHours Worked

COMREPORT Commissioners

Report

PETITIONPetition &

Lis Pendens

TRIALSETL Trial and

Settlement

IMPROVEMENT Improvementson R/W Parcel

OTHERBIDS Other Bids

SALESACTSales Action

EMDOMACTEm Domain Action: St vs.

COMASSIGN Commissioner Assignment

LEASE Lease

EDPARCTRK EmDom Parcel

Tracking

LESSEE Lessee

CONTRACTORContractor

REMOVCONT Removal Contract

FINALCERT Final

Certificate

COUNTYCounty

Num | Code... AGREEMENT Agreement

AUTHMAP Authorization

Map

CHARGEIDCharge

Identifier

RWPROJR/W PROJECT

900's or Dash # FEDPROJ Federal Project

PMSSPROJPMSS Project

PROJECTSProject Actions

COMORDACT CommissionersOrders Action

COMMORDER Commissioners

Order

PARTY NAD Party Name & Address

PARTY INT Party toInterest

INTHOLDER InterestHolder

APPRAISER Appraiser

APPRAISAL Appraisal

APPACTION Appraisal

Action & Cert

OCCATTRNY Occupant

Attorney NAD

MEMBERS Household Members

OCCUPANTOccupant

Relocation

RELOCPMTS Relocation

Payments & Appls

DIRPURCH Direct

Purchase

SUPHOUSINGSupplemental

Housing

3-5 1-4

5/yr

rare

rare10%

10%

rare

20%

rare

usually 1

rare <99

3%

0-2

?

ROADSECTRoad Section

Cty# |RS#

<.01

<- last

latest

V

<3

3%

2 if EG m if 88

Minnesota DOT Right of Way

Database Structure

Gordon C. Everest

LEGENDOne )----------E

Dependent -- --D -- --Orphan -- -- -- -- F -- --Foreign ID -- -- -- -- -->

DMODPRE

( many

N

PARCEL

Interest in aLand Parcel

6

PARCELInterest in a

Parcel of Land

Conceptual Data Model?DMODPRE

N

Page 4: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 4

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

7

Conceptual - Definition

CONCEPTUAL:

• Consisting of, relating to, concerned with…Concepts*; abstract.

• Concerned with the definitions or relations of concepts, rather than the facts.

Synonyms: theoretical, visual, imaginary.

Antonyms: real; facts.

*CONCEPT: • an idea of what something is or how it works;

something formed in the mind; a mental image.

-- Mirriam-Webster, Dictionary.com

If mental, how do we document, communicate?

If entity/object, relationship, identifier, domains – already logical?

If add attributes, foreign keys – now Relational.

CvLvP

8

Logical - Definition

LOGICAL• Of or according to the rules of logic or formal argument;

characterized by or capable of clear, sound reasoning.

Synonyms: natural, reasonable, sensible, understandable*

Logical Data Model – a model of some user domain**complete and understandable in the detail needed to represent that domain,

built according to and consistent with some formal modeling scheme,

within a defined scope.

**area of the business being modeled - real world, user world, domain of discourse, subject area, …

*Understandable - defined, documented, communicated.

CvLvP

Page 5: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 5

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

9

Physical Data Model

• How data will be encoded and stored

• Implemented in some data system (DBMS, NoSQL…)

• Dealing with storage & processing performance, volumetrics (time & space), partitioning, distribution.

Physical Data Model – a stored representation of a Logical data model

Physical vs. Logical separation• Historically, to better understand physically stored data existing

on punched cards, tape, etc. the notion of a logical representation was introduced to strip away storage considerations and focus on documenting just the logical aspects of the data.

• Logical derived from, a representation of… the Physical

CvLvP

10

AbstractionABSTRACTION* = “leaving something out”; Hiding

*Webster Dictionary: abstract. (n) summary; shortened version. abstraction. (n) the act of taking away

(v) to take out, remove something.(adj) as in abstract object (vs. concrete) - a different meaning, not useful here.

In Designing/Developing a Data Model: (can’t do it all at once)

• Start with high-level preliminary sketches (top down)Details are still presumed to be present, yet to be added

• Work on one part or subject area at a time• Could also start with some details (bottom up)• Once built, (how) do you maintain the Conceptual Model? Useful?

In Presenting a Data Model: (already completed in all its detail)

• Start with a high-level view, then successively add detail–> VERTICAL ABSTRACTION

• One part at a time –> HORIZONTAL ABSTRACTION

CvLvP

N

Page 6: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 6

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

11

Sample Data Model

MAINTDIST Maintenance

District

COMMISSIONCommissioner

COMMWORKCommissionerHours Worked

COMREPORT Commissioners

Report

PETITIONPetition &

Lis Pendens

TRIALSETL Trial and

Settlement

IMPROVEMENT Improvementson R/W Parcel

OTHERBIDS Other Bids

SALESACTSales Action

EMDOMACTEm Domain Action: St vs.

COMASSIGN Commissioner Assignment

LEASE Lease

EDPARCTRK EmDom Parcel

Tracking

LESSEE Lessee

CONTRACTORContractor

REMOVCONT Removal Contract

FINALCERT Final

Certificate

COUNTYCounty

Num | Code... AGREEMENT Agreement

AUTHMAP Authorization

Map

CHARGEIDCharge

Identifier

RWPROJR/W PROJECT

900's or Dash # FEDPROJ Federal Project

PMSSPROJPMSS Project

PROJECTSProject Actions

COMORDACT CommissionersOrders Action

COMMORDER Commissioners

Order

PARTY NAD Party Name & Address

PARTY INT Party toInterest

INTHOLDER InterestHolder

APPRAISER Appraiser

APPRAISAL Appraisal

APPACTION Appraisal

Action & Cert

OCCATTRNY Occupant

Attorney NAD

MEMBERS Household Members

OCCUPANTOccupant

Relocation

RELOCPMTS Relocation

Payments & Appls

DIRPURCH Direct

Purchase

SUPHOUSINGSupplemental

Housing

3-5 1-4

5/yr

rare

rare10%

10%

rare

20%

rare

usually 1

rare <99

3%

0-2

?

ROADSECTRoad Section

Cty# |RS#

<.01

<- last

latest

V

<3

3%

2 if EG m if 88

Minnesota DOT Right of Way

Database Structure

Gordon C. Everest

LEGENDOne )----------E

Dependent -- --D -- --Orphan -- -- -- -- F -- --Foreign ID -- -- -- -- -->

DMODPRE

PARCEL

Interest in aLand Parcel

( many

FOCUS

What could improve this presentation?N

CvLvP

12

2.a HORIZONTAL ABSTRACTION

- Partitioning

• Fencing off a part of the Diagram:– Often helpful to have some overlap of the partitions

APPRAISER Appraiser

APPRAISAL Appraisal

APPACTION Appraisal

Action & Cert

DIRPURCH Direct

Purchase

3%

0-2

DMODPRECvLvP

PARCEL

Interest in aLand Parcel

Page 7: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 7

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

13 DISTRICT

COMMISSIONERSORDERS

CONSTRUCTIONPROJECT

LEGALAGREEMENT

CONTRACTOR

AUTHORIZATIONMAP (GRAPHIC)

2.b DEPTH – Vertical Levels of Abstraction B

INTERESTHOLDER"OWNER"

APPRAISAL

INTEREST IN APARCEL OF LAND

DIRECTPURCHASE

OFFER

SUPPLEMENTALHOUSING PAYMENT

RELOCATIONOCCUPANT

CONDEMNATIONACTION

IMPROVEMENTS ONLAND PARCEL

IMPROVEMENT

REMOVAL CONTRACT

CONTRACTOR

SALES ACTION

OTHER BIDSAPPRAISER

APPRAISAL ACTION

APPRAISAL

CERTIFIED APPRAISAL

APPRAISER

NAME

ADDRESS

RATINGS

FEE RATES

APPRAISER:

ID NUMNAME, PERSONADDRESS, MAILINGPHONEALTPHONENAME-COMPANY (OPT)DATE OF LAST APPRAISAL (der)QUALIFICATION RATINGEVALUATION RATINGTESTIMONY RATINGHOURLY FEEWORK AGREEMENT NAMEXPIRATION DATE

DMODPRE

Drilling downon parts forincreasinglevels ofdetail.

How manyobjectshere?

N

CvLvP

Adding Attributes:

PARCEL OF LAND

14

HECB Student DatabaseDMODPRE

Is this how you would first present the data modelto your users?

What entity or entities are the most important?

FIRST:

What is missingfrom this diagram-- two things?

Page 8: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 8

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

15

HECB Student Database

Unfolding detail from the most important:

DMODPRE

PAST, PRESENTor PROSPECTIVE

STUDENT

POST-SECONDARYEDUCATIONAL

INSTITUTION

ENROLLMENT

PROGRAM(Degree/Diploma/Certif)

COMPLETION

FINANCIALAID

?

16

Student-Course High-Level Data Model

Start with major:

• Entities, and

• Relationships

COURSE INSTRUCTOR

STUDENT

DMOD

Page 9: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 9

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

17

Student-Course Data Model

Adding Intersection Entities:

• to resolve M:N Relationships

• to store additional attributes

REGISTRATIONCOURSEOFFERING

in >

DMOD

COURSE

STUDENT

INSTRUCTOR

18

Student-Course Data Model

• Adding AttributesSTUDENT

REGISTRATIONCOURSEOFFERING

COURSE INSTRUCTOR

StudentID

Name

Address

Major

Course#

Title

Credits

Grade

Year

Term

Section

Building

Room

Days-of-week

TimeStart

TimeEnd

SSN

Name

Address

Phone

Dept

in >

DMODPREDMOD

Page 10: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 10

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

19

Extended Student-Course Data Model B5

STUDENT

REGISTRATIONCOURSEOFFERING

COURSE INSTRUCTOR

StudentID

Name

Address

Major

Course#

Title

Credits

Year

Term

Section

Building

Room

Days-of-week

TimeStart

TimeEnd

SSN

Name

Address

Phone

Dept

in >

DMODPREDMOD

DEPTDeptNo

Name

TEXTBOOK

?AUTHOR

ISBN

Title

Author(s)

Grade

Description

Office Number

Course# (FK)

StudentID (FK)

CourseOffID (FK)

(City, State, Zipcode)

(FK)

20

Student-Course Database - Table Diagram

COURSECourse# TitleDescriptionCredits

INSTRUCTORSSNLastNameFirstNameAddressPhoneDept

STUDENTStudent IDNameAddressMajorGPA

COURSEOFFERINGCourse#YearTermSectionBuildingRoomDaysTime StartControlEnrollmentInstructor SSN

REGISTRATIONCourse IDStudent IDGrade

LEGEND:

ENTITY NAME (upper case)

Identifier (bold face)

Attributes (not bold face)

Foreign Key IdentifierM:1 relationship

Adding Attributes & FKeys. Diagram of the Schema:

What if you move the arrow head to the other end of the arc?

DMOD

Page 11: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 11

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

21

ORM Data Model - Presentation

BOSS

LIMITLIMIT

SKILL

RATING

EMPLOYEEDEP

Tworks in employs

supervises is headed by

reports to superior to

may spend up to of spending for

with proficiency of assigned to

possesses possessed by

"EmployeeSkill!"

1 .. 10

1000 .. 9999

ac

SALARY(dollars)

earns paid to

DESCRIPTION(name)

has is of <=5

DMODPRE

(number) (number)

(code)

A major criticism of NIAM / ORM, both by protagonists and proponents, is that it is too detailed, a bottom-up design,

BUT… ER Diagrams usually hide the details of attributes and mostconstraints.

So, present the ORM model using a series of top-down unfolding … abstractions.

22

Abstractions of ORM Data Model

BOSS

LIMITLIMIT

SKILL

RATING

EMPLOYEEDEPT

works in employs

supervises is headed by

reports to superior to

may spend up to of spending for

with proficiency of assigned to

possesses possessed by

"EmployeeSkill!"

1 .. 10

1000 .. 9999

ac

SALARY(dollars)

earns paid to

DESCRIPTION(name)

has is of

<=5

DMODPRE

1. Hide "Terminal" (M:1) Objects (=> Attributes)

(number)(number)

(code)

2. Hide Reference Modes

3. Hide Constraints

4. Hide Less Important Objects & Predicates- Subtypes- Objectified Predicates- Reflexive Relationships

5. Hide all Predicates

Leaving BASE Entities!

6. Add back Multiplicitychar. on relationships

=> A High-level Abstract “Conceptual” Data Model...

an ER Diagram ?!!!

SKILL

EMPLOYEE DEPT

Is this the same data modelwe started with?

Page 12: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 12

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

23

Levels (or Stages?) of Data Models

• Reality - the real world User Domain, infinitely complex

• Mental Model - in our minds– must be formally documented so we can communicate it to others

• Conceptual Model - "natural", unconstrained, initial.– independent of physical storage and implementation

• Logical Model - according to a modeling scheme– e.g., the E-R or Relational Model (most popular today)

• Physical Model - defining storage characteristics– Encoding, storage structure and access methods (indexes, etc.)

• Implementation Model - for a given DataStore Manager– memory organization (blocking, buffering, partitioning,

distribution, etc.)

DMODCvLvP

24

Objective of Data Modeling

(WHAT we are trying to do)

TO ACCURATELY AND COMPLETELY MODEL

SOME PORTION OF THE REAL WORLDUNIVERSE OF DISCOURSE (UoD)

(the USER DOMAIN)

OF INTEREST TO SOME ORGANIZATIONOR COMMUNITY OF USERS.

DMOD

Page 13: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 13

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

25

Modeling: is Choosing...

• SCOPE / Boundary- where to look

• FOCUS- what to look for

• DEPTH / Resolution- how much detail to look for

... based upon our PURPOSE

DMOD

REALITY is Infinite, Complex, Multidimensional, Detailed.

- so we must CHOOSE:

26

A Model is an Abstraction DMOD

Both realities are infinitely complex.

NEED some constructs to look for and use in modeling.

Sometimes we have the data, and try to find what it means.

Abstract

“Conceptual” View

of the Real World

Concrete Symbols

Stored on some Medium

“Logical”

“DATA” MODEL

MentalModel

Physical (Storage)

Model

REALIZATION

TWO PERSPECTIVESof Data:

CvLvP

Page 14: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 14

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

27

Modeling Process – the HOW

MODEL = Abstract (Re).present.(ation)

Knowledgein the world(infinitely complex)

Knowledgeexternalized,formalized, shared.

Knowledgein the head(mental models)

Reality MODELMODELINGPROCESS

DMOD

pres

ent

Re .

pres

ent

What drives or guides the process?

28

The Modeling Process

Real WorldUniverse of Discourse

MODELINGPROCESS

MODELING SCHEMEContext

ConstructsCompositionConstraints

MODEL

perception

selection/filtering

DMOD

METHODOLOGY:

Steps/Tasks + Milestones + Deliverables +

REPRESENTATIONAL FORMS:Narrative, Graphical Diagram,Formal Language Statements

(the Syntax)The Semantics are most important

The SEMANTICS of a data modelcan only be seen through the presentation, the SYNTAX.N

Page 15: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 15

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

29

Data Model to Database Realization

DATAMODEL

DATABASE"Schema"

DEFINITION

DatabaseDefinitionLanguage

DATABASE

DataBaseManagementSystem

DataBaseManagementSystem

DATABASEDEFINER

describesdata

Input&

Query

DDL

stmts

DMOD

30

Data Modeling Schemes

• ALL data modeling activity and data management tools are driven or guided by some Data Modeling Scheme

• Think of it as a Meta Model (or Meta-Meta-Data)

• Logical Rules however formal or informal

• May be developed independently of any implementation Not based on any particular implementation

• Many variants within families

• Tells you what to look for, what constructs to use, how to put them together (compose) with what constraints, and how to represent that all syntactically.

CvLvP

Since all data modeling is driven by some modeling scheme,i.e., by some logical rules for building a model,

All data models are logical models!

Page 16: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 16

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

31

The Many Faces of Databases B

Logical Database Structures

Flat File(FORTRAN)

HierarchicalFile (COBOL)

CODASYL

Network

Object-Oriented(UML)

Multi-Dimensional“Star” schema

Fact Modeling(ORM)

Relational

ANSI SQL

DMOD

1

2

3 4

6

78

Snowflake

5

What do all these

have in common?----->

5

7s

SingleFile

(1)

DatabaseMulti-File

(M)

NoFile

(0)

© Gordon C. Everest

32

Data Modeling Schemes – Types & Examples

• Developed along generation linesSCHEME Examples

Flat file Fortran (1956?), spread sheet (VisiCalc, Multiplan>Excel, Quattro)

Hierarchy COBOL*(1960), System 2000, HQL*

NetworkO-O ext.

CODASYL*(1971), IDMS, ANSI-NDL*, IMS (DL/1), AdabasOO-COBOL*, ANSI-SQL:1999*, UML*

E-R Chen*(1976), IE*(Finkelstein), Barker, IDEF1X*(ERwin), ER Studio,

Relational(SQL)

Codd*(1970), SEQUEL*(1976), Oracle (1979), DB2, ANSI-SQL*(1986), Sybase, SQL Server, Dbase II

(Inverted) Fully indexed - not really a logical scheme, Model 204, CASE 360(IBM)

Dimensional As a “Cube”: EXPRESS(6) (MDS>IRI>Oracle), Multiplan(3), MicroStrategyAs a Relational Model = Star Schema*(R. Kimball), Red Brick

Fact-Based NIAM*(1976), ORM*(1989), NORMA, FCO-IM

NoSQL A family of tools to overcome the limitations of SQL tools.Each tool has its own modeling scheme – key-value pair, columnar, document (XML, Hierarchical), graph (nodes & edges).

*initially not an implementation but a concept paper or language specification

CvLvP

Page 17: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 17

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

33

Modeling Schemes in NoSQL Tools

• NoSQL refers to a family of tools designed to handle Big Data, “Unstructured” Data, Fast(er than SQL tools)

• Based on particular data storage schemes

• Vendors have augmented their Relational/SQL tools with some of these storage schemes, and other models

• One driver is OO programming languages which handle objects of varying structure and complexity.Mapping OO to a relational structure is inadequate.

PhysicalScheme Examples

Key-Value pairValue - any complex structure

Dynamo, Redis, Riak, LevelDB- an index

Graph (O1 O2 R – triplets) Neo4j, OrientDB, Infinite Graph, Mark Logic

“Document” (XML, JSON) MongoDB, CouchBase, Mark Logic

(Wide) Column storesInverse (“dual”) of Tables

Cassandra, HBase

CvLvP

34

Criteria for a Data Modeling Scheme

• Simple, understandable – for human communication

• Comprehensive – can model every phenomenon in theuser domain, e.g., overlapping populations ==> generalization

• Direct – visually intuitive, unambiguouse.g., “Fork” for manyness <

– without spurious, artificial, intermediate constructse.g, intersection entity (for M:N), foreign key (redundant with an arc)

• Minimal – at most one way to model a given phenomenon

• Consistent – uses same syntax for similar phenomenone.g., for dependency within a record, between records, and S/Stypes

• Universal – independent of language

DMOD

Page 18: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 18

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

35

Outward Facing vs. Inward FacingOutward Facing – to the business user domain

Inward Facing – to existing, stored data

CvLvP

• Historically we had data on punched cards or paper tape, and needed a representation which transcended its physical storage, hence, logical data models. Still inward facing.

• Next we found logical models too complex and needed to simplify, particularly at the beginning stages of development, hence conceptual data models, even before logical models.

• Then we realized that these models were really representations of things in the business user domain, hence outward facing.

• The modern approach to data modeling is to begin by modeling user domains independent of any physical storage or implementation considerations.

• More recently, we collect massive amounts of data (BIG data), it exists. Now the challenge is to process it efficiently (hence NoSQL tools), and apply analytics to make sense of it.

NOTE: Someone designed the stored data, so where are the definitions?

36

Data Model – Outward Facing

Initially a data model is outward facing, to the businessCvLvP

• Whether modeling big data, fast data, thick data, NoSQL data, Relational data (SQL) … (these are all representations for physical implementation) you still need to know, understand and document the business.

• The “first stage” data model is a fully detailed model of the business independent of any physical implementation

BUT… capturing rich, detailed semantics which describe the user domain in the model.

Page 19: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 19

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

37

“Data” Modeling is NOT about:

• Scope – what do we mean by “enterprise-wide”?

• Simplified, high-level, abstract, “conceptual”– Whether in data model development or– A matter of presentation, choosing to hide detail.

• Syntax – chosen notation to represent a Data Model– Data Modeling is about Semantics – meaning– The same semantic can have several notations

- e.g., multiplicity in a relationship: --<(fork), ‘M’, *, -->>

• Storage, Physical Implementation, Performance– Only exogenous information to represent the user domain– No unnecessary, artificial, spurious constructs introduced

on the path to implementation, e.g., FKey, 1NF, entity records!

– User-facing, NOT database/datastore-facing

Though these are all important aspects of Data Modeling.

CvLvP

38

What to Call our model?

• Conceptual is a scoping and presentation issue

• Physical is not part of the Data Model

• Logical is what we are left with. But all models are logical!

• Data Model - but not always a model of data,particularly when the database has not yet been built

None of these adjectives are helpful when referring to data models so…

How do we distinguish types of data models?

What do we call the initial complete data model?

CvLvP

Page 20: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 20

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

39

A Business Data Model

For our initial but complete, detailed data model• Capturing all exogenous* information about the user

domain which is of interest

• Capturing only exogenous information about the user domain

• Our mental models need to be externalized, and formally documented to be communicated.

• Hence, we need a modeling scheme with a rich Syntax to represent the Semantics of the user domain in the model

• Devoid of anything relating to physical storage, technology, encoding, implementation, etc.

CvLvP

* relating to, developed or derived from external factors; originating from outside

Let’s call it a “Business Data Model” (G. Witt)Halpin calls it the Conceptual Data Model.

40

Introducing Design Elements

As we move through the continuum of:

Conceptual ==> Logical ==> Physical

CvLvP

• How to rationalize the many differences and alternatives in logical data models?

• Logical data models differ based on which modeling elements are included in the modeli.e. the modeling scheme

SO

• Let’s lay out the various design elements in a precedence graph reflecting order of introduction

Page 21: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 21

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

41

Data Modeling Constructs

ENTITY(OBJECT)

ATTRIBUTE(Data Item)

RELATIONSHIP

IDENTIFIER [ FOREIGN KEY ]

characteristics

characteristics

What to look for: Relative emphasis differentiates Data Modeling Schemes

• ER modeling focuses on Entities and Relationships,de-emphasizing, even hiding Attributes.

• Relational (restricted ER, 1NF) focuses on Entities and Attributes,relegating Relationships to Foreign Keys.

• Object Role Modeling (ORM) folds Attribute and Entity into Object

DMOD

VALUEWhat about ?

N

42

Traditional “Levels” of Data Models

MentalModel Database

RealityUser Domain

N

| | | |

Managed

Datastore

CvLvP

Let’s forget levels, and focus on the ordering of design elements

Are intersection/associative entities or foreign keyspart of the logical model or the physical model?

Conceptual

Logical

Physical

Page 22: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 22

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

43

Introducing Elements of a Data Model

MentalModel Database

Point at which we have all the essential exogenous semantic

information from theuser domain to build a complete data model

=====>

=====> Point from which we begin to introduce additional data elements to physically implementthe data model (build a database).

Continuum for introducing

Elements of a Data Model

What to call it?_________ Data Model

RealityUser Domain

All of these are logicalbecoming physical.

N

CvLvP

Business

44

Data Modeling Constructs

• “Things” – objects, entities, attributes– Names of things– Populations (or types) of things, Domains (of values)– Subtypes/Supertypes to model overlapping populations

• Relationships– Names of relationships, Names of Object Roles– Characteristics/constraints (dependency, multiplicity)– Ternary+++ relationships

• Entity records– Clustered attributes (based on relationships)

• Identifiers– Encoded representation of instances of things (IDs)– Keys, Foreign keys

CvLvP

________________________________Business Data Model.moving to implementation

Page 23: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 23

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

45

SCOPE

User Narratives

SENTENCES“FACTS”

NOUNS

VERBS

OBJECTInstances

OBJECTTYPES

OBJECTNAMES

OBJECTPopulation

CONSTRAINTS

CONSTRAINTSdefined after

element introduced

ObjectIDENTIFIERS

SUB/SUPERTYPES

S/STYPECONSTRAINTS

RELATIONSHIPTYPES

RelationshipNAMES

MULTIPLICITY

DEPENDENCY

Relationship& Role Set

CONSTRAINTS

Fact Modeling

Stages of Data Modeling: Introducing Data Elements or Constraints

OBJECTROLE

NAMES

CvLvP

?

CLUSTERING

46

SCOPE

User Narratives

SENTENCES“FACTS”

NOUNS

VERBS

OBJECTInstances

OBJECTTYPES

OBJECTNAMES

OBJECTPopulation

CONSTRAINTS

CONSTRAINTSdefined after

element introduced

ObjectIDENTIFIERS

DATA TYPES

SUB/SUPERTYPES

S/STYPECONSTRAINTS

RELATIONSHIPTYPES

RelationshipNAMES

MULTIPLICITY

DEPENDENCY

CLUSTER

ENTITYRecords

ATTRIBUTES

KEYS FOREIGNKEYS

DENORMALIZE

INDEXES

COLUMN NAMES

RESOLVEto Tables

RelationalTABLES

Relationship& Role Set

CONSTRAINTS

( 1NF )

BINARYonly

Fact ModelingER/RelationalModeling

PhysicalModeling

Stages of Data Modeling: Introducing Data Elements or Constraints

A B C

ROLENAMES

PARTITIONING

DISTRIBUTION

CvLvP

?

Page 24: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 24

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

47

Observations on the Diagram

• Showing the precedence ordering of the introduction of data modeling elements

• At some point we have all the exogenous semantic information needed to complete the model. Up to that point our model is outward or user-facing

• Anything introduced later is physical realization or implementation, i.e., inward facing.

Stages:

• Fact Modeling

• >> controversial elements in between

• ER / Relational Modeling

• Physical Modeling

• Implementation Modeling

CvLvP

48

Start with User Narratives

Begin with Statements from User Domain Experts (SMEs) within a defined, agreed upon scope; they are the primary source of knowledge about the world being modeled.Then find the Model Elements

• Analyze the Vocabulary in the User Narratives – develop agreed upon definitions

• Breakdown user narratives => into elementary fact sentences

• Extract the nouns => become Things or Objects

• Extract the verbs => become Relationships

• Extract other words/phrases => become Constraints

CvLvP

Page 25: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 25

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

49

constraint

verbVerbalize User Descriptions

GIVEN A DESCRIPTION FROM THE USER(S):ORMODLG

Famous Foods, a small, specialty food wholesaler, fills orders for restaurants. Customers have names, addresses, etc. An order can include several products. Products have unique SKU numbers, descriptions, manufacturer, etc. The company has one big warehouse with many rooms on several floors. Each product is stored in only one bin location in the warehouse, but it can change frequently. Multiple products may be stored in the same bin. Bin numbers are only unique within a room, hence the same number can be used in different rooms. Since the bin locations can be hard to find in a room (could be on a shelf, on the floor, in a cabinet or cooler, hanging from the ceiling, etc.), and the rooms can be hard to find in the warehouse (with many hallways, doors, tunnels, split levels, mezzanines, etc.), explicit location directions must be recorded for each room and for each bin in the room. Location information is a textual narrative and is used by the pickers who run around gathering the items to fill an order. Each product has its own standard price but it may be modified by applying a discount (a fraction) on any individual order. The discount can be different for each of the products on an order, and for the same product on different orders. The quantity of each product on an order is recorded ( it is not the quantity on hand or in inventory). Terms indicates the number of days during which a standard discount can be taken on the payment. The terms can vary from one customer to the next, and from one order to the next for the same customer.

noun

50

Establishing the Vocabulary

Before we can develop a data model we must first carefully define our terms so we can talk about it• From a business perspective

• By the user domain or subject matter experts– Listen to what they say, talking about the domain

• Initially will be fuzzy, with areas of disagreement– Requiring some discussion and negotiation to come to a common

understanding; and documenting that– The most difficult and important aspect of data modeling

Call it a business glossary or [data] dictionary?

However, a glossary is usually only for nouns (objects).

We also need to define the relationships - the mortar that holds the bricks (nouns) together… and constraints.

CvLvP

N

Page 26: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 26

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

51

Objects

• Encompass Entities,and Attributes… independent of entities described

• Derived from the nouns in the user narratives

• A single instance (of what population?)… or a population of individual instances

– e.g., given the noun ‘George’: until it is associated with a particular, defined population, it is just a string of characters

• Each Object Population – uniquely named

• Define the population, criteria for inclusion of members, how we know we have one, what’s not included, etc.e.g., does Employee include retired, suspended, laid off, contract, visitor, temp

• Not concerned (yet) with how the members are represented – identifiers - surrogate lexical encoding. e.g. Days of the Week, 7 members, one represented by – Tuesday, Tues, Tue, Tu, Mardi, Martes, …

N

CvLvP

52

Objects – 2

• Grouping individual instances into populations does not occur naturally in the real world.The designer chooses to include members based on some common characteristics for some purpose(s)

• By convention we name object type populations with a singular noun, makes it easier to build sentences

• NOTE: in general, individual object populations could be overlapping, i.e., an individual could be a member of multiple populationse.g., an Employee could also be a Customer or a ShareholderThis is handled using Subtype/Supertype constructs

USER NARRATIVES in the Domain of Discourse=> NOUNS => OBJECT instances

=> OBJECT TYPES => NAMES

CvLvP

Page 27: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 27

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

53

Subtypes/Supertypes

• Data modeling schemes assume Object Populations are strictly disjointi.e., an individual member of an Object Population cannot be a member of any other Object Population

• We know that is not always truee.g., Person can be Employee, Customer, and ShareholderIf these are modeled as separate populations, redundancy results which can lead to inconsistent data.Maintaining consistent data becomes a user responsibility

• S/Stype construct is used to formally represent overlapping populations. It only depends upon the nature of defined Object populations.

• Supertype is a generalization of its Subtypes.Several constraints can be defined on S/Stypes.

See Dataversity Webinar

OBJECT TYPES => SUBTYPE/SUPERTYPES=> S/Stype CONSTRAINTS

CvLvP

54

Relationships

• “Connection” between or among members of one or more object populations.Arity = number of Roles played by Objects participating in the Relationship, e.g., Unary, Binary, Ternary, etc.

All valid X-A pairs(in the R/W)

X A(Binary) RELATIONSHIP TYPE:

RELATIONSHIP Instances:

X

A

What are the characteristics of the relationship ‘X-A’ ?

CvLvP

Page 28: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 28

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

55

Relationship Names and Object Roles

• Naming all Relationships and Roles not necessary,can use the Object Names as a default: Relationship “X-Y”

– But user narratives will include verb phrases to reference relationshipsmaking it easier to form sentences when talking about the user domain.

− Except if there are multiple relationships on X-Ye.g., “Employee works in Dept” and “Employee heads Dept” in which case the Employee plays the role of Boss in the “heads” relationship.NOTE: role order matters, e.g.,binary has two readings

− Except if the same Object type plays multiple rolese.g., “Person is parent of Person” then must name the relationship or distinguish the roles as Parent and Child

• Object Role names are nouns within context of a relationship

USER NARRATIVES in the Domain of Discourse => OBJECT TYPES => => VERBS => RELATIONSHIP TYPES

=> NAMES => ROLE NAMES

CvLvP

56

Constraints on a Relationship

• The defaults are the least constrained– Multiple

- every object instance may participate more than oncee.g., many-to-many (M:N) for a binary relationship

– Optional- every object instance need not participate in the relationship

• The Constraints would be: (the opposites)– Exclusive – at most one

– Dependent (Mandatory, Required …) – at least one

Many different notations, sometimes confusing.Combination called ‘Cardinality’ [min:max], a notational convenience

RELATIONSHIP TYPES => EXCLUSIVITY Constraint=> DEPENDENCY Constraint

CvLvP

Page 29: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 29

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

57

What is an Attribute?

An ATTRIBUTE is …ORMvER

What comes first?

Ω

N

playing a ROLE

with some (other) OBJECT.

in a RELATIONSHIP

an OBJECT...

of what?

RELATIONSHIPS => MULTIPLICITY => CLUSTERING => => ENTITY records => ATTRIBUTES

CvLvP

58

Data Modeling Constructs

ENTITY(Object)

ATTRIBUTE(Data Item)

RELATIONSHIP

IDENTIFIER [ FOREIGN KEY ]

characteristics

characteristics:

What to look for:DMOD

DOMAIN

What’s the difference?N

A Day of the Week:Tuesday, Tues, Tu, Mardi, Martes...

Page 30: GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA - Minnesota, 2016 June GETITLE ... Conceptual vs. Logical vs. Physical Stages of Data

Page 30

CvLvP - May 31, 2016

© Gordon C. Everest, All rights reserved.

Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling

59

SCOPE

User Narratives

SENTENCES“FACTS”

NOUNS

VERBS

OBJECTInstances

OBJECTTYPES

OBJECTNAMES

OBJECTPopulation

CONSTRAINTS

CONSTRAINTSdefined after

element introduced

ObjectIDENTIFIERS

DATA TYPES

SUB/SUPERTYPES

S/STYPECONSTRAINTS

RELATIONSHIPTYPES

RelationshipNAMES

MULTIPLICITY

DEPENDENCY

CLUSTER

ENTITYRecords

ATTRIBUTES

KEYS FOREIGNKEYS

DENORMALIZE

INDEXES

COLUMN NAMES

RESOLVEto Tables

RelationalTABLES

Relationship& Role Set

CONSTRAINTS

( 1NF )

BINARYonly

Fact ModelingER/RelationalModeling

PhysicalModeling

Stages of Data Modeling: Introducing Data Elements or Constraints

A B C

ROLENAMES

PARTITIONING

DISTRIBUTION

CvLvP

?

60

Conceptual vs. Logical vs. Physical

Data Models

Questions?©Gordon C. Everest

Professor EmeritusCarlson School of Management

University of [email protected] http://geverest.umn.edu

GETITLE