GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA -...
Transcript of GETITLE Conceptual vs. Logical vs. Physical - DAMA-MN 2016... · Stages of Data Modeling 1 DAMA -...
Page 1
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
1
DAMA - Minnesota, 2016 JuneGETITLE
©Gordon C. EverestProfessor Emeritus of MIS and Database
Carlson School of ManagementUniversity of Minnesota
[email protected] http://geverest.umn.edu
Conceptualvs. Logicalvs. Physical
Stages of DataModeling
2
Outline
Goals of this presentation
• Levels of Data Models- Conceptual vs. Logical vs. Physical Data Models- Role of Abstraction in Conceptual Models; examples
• Data Modeling • Data Modeling Schemes• Data Models – focus and name for• Stages of Data Models/Modeling
- A continuum of introducing modeling constructs- Starts with a user narrative => elementary fact sentences- Objects (nouns) – instances, types, populations, sub/supertypes- Relationships (verbs) => Characteristics and Constraints- Attributes – where do they fit?- Identifiers, Keys, Foreign Keys
CvLvP
[slide#]
[4]
[10]
[23]
[30]
[35]
[40]
N
Page 2
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
3
SCOPE
User Narratives
SENTENCES“FACTS”
NOUNS
VERBS
OBJECTInstances
OBJECTTYPES
OBJECTNAMES
OBJECTPopulation
CONSTRAINTS
CONSTRAINTSdefined after
element introduced
ObjectIDENTIFIERS
DATA TYPES
SUB/SUPERTYPES
S/STYPECONSTRAINTS
RELATIONSHIPTYPES
RelationshipNAMES
MULTIPLICITY
DEPENDENCY
CLUSTER
ENTITYRecords
ATTRIBUTES
KEYS FOREIGNKEYS
DENORMALIZE
INDEXES
COLUMN NAMES
RESOLVEto Tables
RelationalTABLES
Relationship& Role Set
CONSTRAINTS
( 1NF )
BINARYonly
Fact ModelingER/RelationalModeling
PhysicalModeling
Stages of Data Modeling: Introducing Data Elements or Constraints
A B C
ROLENAMES
PARTITIONING
DISTRIBUTION
CvLvP
?
4
Common Understanding - Levels
• Conceptual – high-level, enterprise-wide, abstract model
• Physical – How data is stored in some database system
• Logical – adding detail to the conceptual model, … free of physical implementation details which do not contribute to the logical understanding of the data model.
– Often considered the ER or Relational Model.
CvLvP
Let’s look at the generic meaning of these terms, but first…
Generally depictedas a pyramid,
implying levelsof models:
Conceptual
Logical
Physical
See David HAY video
Page 3
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
5
Conceptual Data Model? Logical?
MAINTDIST Maintenance
District
COMMISSIONCommissioner
COMMWORKCommissionerHours Worked
COMREPORT Commissioners
Report
PETITIONPetition &
Lis Pendens
TRIALSETL Trial and
Settlement
IMPROVEMENT Improvementson R/W Parcel
OTHERBIDS Other Bids
SALESACTSales Action
EMDOMACTEm Domain Action: St vs.
COMASSIGN Commissioner Assignment
LEASE Lease
EDPARCTRK EmDom Parcel
Tracking
LESSEE Lessee
CONTRACTORContractor
REMOVCONT Removal Contract
FINALCERT Final
Certificate
COUNTYCounty
Num | Code... AGREEMENT Agreement
AUTHMAP Authorization
Map
CHARGEIDCharge
Identifier
RWPROJR/W PROJECT
900's or Dash # FEDPROJ Federal Project
PMSSPROJPMSS Project
PROJECTSProject Actions
COMORDACT CommissionersOrders Action
COMMORDER Commissioners
Order
PARTY NAD Party Name & Address
PARTY INT Party toInterest
INTHOLDER InterestHolder
APPRAISER Appraiser
APPRAISAL Appraisal
APPACTION Appraisal
Action & Cert
OCCATTRNY Occupant
Attorney NAD
MEMBERS Household Members
OCCUPANTOccupant
Relocation
RELOCPMTS Relocation
Payments & Appls
DIRPURCH Direct
Purchase
SUPHOUSINGSupplemental
Housing
3-5 1-4
5/yr
rare
rare10%
10%
rare
20%
rare
usually 1
rare <99
3%
0-2
?
ROADSECTRoad Section
Cty# |RS#
<.01
<- last
latest
V
<3
3%
2 if EG m if 88
Minnesota DOT Right of Way
Database Structure
Gordon C. Everest
LEGENDOne )----------E
Dependent -- --D -- --Orphan -- -- -- -- F -- --Foreign ID -- -- -- -- -->
DMODPRE
( many
N
PARCEL
Interest in aLand Parcel
6
PARCELInterest in a
Parcel of Land
Conceptual Data Model?DMODPRE
N
Page 4
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
7
Conceptual - Definition
CONCEPTUAL:
• Consisting of, relating to, concerned with…Concepts*; abstract.
• Concerned with the definitions or relations of concepts, rather than the facts.
Synonyms: theoretical, visual, imaginary.
Antonyms: real; facts.
*CONCEPT: • an idea of what something is or how it works;
something formed in the mind; a mental image.
-- Mirriam-Webster, Dictionary.com
If mental, how do we document, communicate?
If entity/object, relationship, identifier, domains – already logical?
If add attributes, foreign keys – now Relational.
CvLvP
8
Logical - Definition
LOGICAL• Of or according to the rules of logic or formal argument;
characterized by or capable of clear, sound reasoning.
Synonyms: natural, reasonable, sensible, understandable*
Logical Data Model – a model of some user domain**complete and understandable in the detail needed to represent that domain,
built according to and consistent with some formal modeling scheme,
within a defined scope.
**area of the business being modeled - real world, user world, domain of discourse, subject area, …
*Understandable - defined, documented, communicated.
CvLvP
Page 5
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
9
Physical Data Model
• How data will be encoded and stored
• Implemented in some data system (DBMS, NoSQL…)
• Dealing with storage & processing performance, volumetrics (time & space), partitioning, distribution.
Physical Data Model – a stored representation of a Logical data model
Physical vs. Logical separation• Historically, to better understand physically stored data existing
on punched cards, tape, etc. the notion of a logical representation was introduced to strip away storage considerations and focus on documenting just the logical aspects of the data.
• Logical derived from, a representation of… the Physical
CvLvP
10
AbstractionABSTRACTION* = “leaving something out”; Hiding
*Webster Dictionary: abstract. (n) summary; shortened version. abstraction. (n) the act of taking away
(v) to take out, remove something.(adj) as in abstract object (vs. concrete) - a different meaning, not useful here.
In Designing/Developing a Data Model: (can’t do it all at once)
• Start with high-level preliminary sketches (top down)Details are still presumed to be present, yet to be added
• Work on one part or subject area at a time• Could also start with some details (bottom up)• Once built, (how) do you maintain the Conceptual Model? Useful?
In Presenting a Data Model: (already completed in all its detail)
• Start with a high-level view, then successively add detail–> VERTICAL ABSTRACTION
• One part at a time –> HORIZONTAL ABSTRACTION
CvLvP
N
Page 6
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
11
Sample Data Model
MAINTDIST Maintenance
District
COMMISSIONCommissioner
COMMWORKCommissionerHours Worked
COMREPORT Commissioners
Report
PETITIONPetition &
Lis Pendens
TRIALSETL Trial and
Settlement
IMPROVEMENT Improvementson R/W Parcel
OTHERBIDS Other Bids
SALESACTSales Action
EMDOMACTEm Domain Action: St vs.
COMASSIGN Commissioner Assignment
LEASE Lease
EDPARCTRK EmDom Parcel
Tracking
LESSEE Lessee
CONTRACTORContractor
REMOVCONT Removal Contract
FINALCERT Final
Certificate
COUNTYCounty
Num | Code... AGREEMENT Agreement
AUTHMAP Authorization
Map
CHARGEIDCharge
Identifier
RWPROJR/W PROJECT
900's or Dash # FEDPROJ Federal Project
PMSSPROJPMSS Project
PROJECTSProject Actions
COMORDACT CommissionersOrders Action
COMMORDER Commissioners
Order
PARTY NAD Party Name & Address
PARTY INT Party toInterest
INTHOLDER InterestHolder
APPRAISER Appraiser
APPRAISAL Appraisal
APPACTION Appraisal
Action & Cert
OCCATTRNY Occupant
Attorney NAD
MEMBERS Household Members
OCCUPANTOccupant
Relocation
RELOCPMTS Relocation
Payments & Appls
DIRPURCH Direct
Purchase
SUPHOUSINGSupplemental
Housing
3-5 1-4
5/yr
rare
rare10%
10%
rare
20%
rare
usually 1
rare <99
3%
0-2
?
ROADSECTRoad Section
Cty# |RS#
<.01
<- last
latest
V
<3
3%
2 if EG m if 88
Minnesota DOT Right of Way
Database Structure
Gordon C. Everest
LEGENDOne )----------E
Dependent -- --D -- --Orphan -- -- -- -- F -- --Foreign ID -- -- -- -- -->
DMODPRE
PARCEL
Interest in aLand Parcel
( many
FOCUS
What could improve this presentation?N
CvLvP
12
2.a HORIZONTAL ABSTRACTION
- Partitioning
• Fencing off a part of the Diagram:– Often helpful to have some overlap of the partitions
APPRAISER Appraiser
APPRAISAL Appraisal
APPACTION Appraisal
Action & Cert
DIRPURCH Direct
Purchase
3%
0-2
DMODPRECvLvP
PARCEL
Interest in aLand Parcel
Page 7
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
13 DISTRICT
COMMISSIONERSORDERS
CONSTRUCTIONPROJECT
LEGALAGREEMENT
CONTRACTOR
AUTHORIZATIONMAP (GRAPHIC)
2.b DEPTH – Vertical Levels of Abstraction B
INTERESTHOLDER"OWNER"
APPRAISAL
INTEREST IN APARCEL OF LAND
DIRECTPURCHASE
OFFER
SUPPLEMENTALHOUSING PAYMENT
RELOCATIONOCCUPANT
CONDEMNATIONACTION
IMPROVEMENTS ONLAND PARCEL
IMPROVEMENT
REMOVAL CONTRACT
CONTRACTOR
SALES ACTION
OTHER BIDSAPPRAISER
APPRAISAL ACTION
APPRAISAL
CERTIFIED APPRAISAL
APPRAISER
NAME
ADDRESS
RATINGS
FEE RATES
APPRAISER:
ID NUMNAME, PERSONADDRESS, MAILINGPHONEALTPHONENAME-COMPANY (OPT)DATE OF LAST APPRAISAL (der)QUALIFICATION RATINGEVALUATION RATINGTESTIMONY RATINGHOURLY FEEWORK AGREEMENT NAMEXPIRATION DATE
DMODPRE
Drilling downon parts forincreasinglevels ofdetail.
How manyobjectshere?
N
CvLvP
Adding Attributes:
PARCEL OF LAND
14
HECB Student DatabaseDMODPRE
Is this how you would first present the data modelto your users?
What entity or entities are the most important?
FIRST:
What is missingfrom this diagram-- two things?
Page 8
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
15
HECB Student Database
Unfolding detail from the most important:
DMODPRE
PAST, PRESENTor PROSPECTIVE
STUDENT
POST-SECONDARYEDUCATIONAL
INSTITUTION
ENROLLMENT
PROGRAM(Degree/Diploma/Certif)
COMPLETION
FINANCIALAID
?
16
Student-Course High-Level Data Model
Start with major:
• Entities, and
• Relationships
COURSE INSTRUCTOR
STUDENT
DMOD
Page 9
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
17
Student-Course Data Model
Adding Intersection Entities:
• to resolve M:N Relationships
• to store additional attributes
REGISTRATIONCOURSEOFFERING
in >
DMOD
COURSE
STUDENT
INSTRUCTOR
18
Student-Course Data Model
• Adding AttributesSTUDENT
REGISTRATIONCOURSEOFFERING
COURSE INSTRUCTOR
StudentID
Name
Address
Major
Course#
Title
Credits
Grade
Year
Term
Section
Building
Room
Days-of-week
TimeStart
TimeEnd
SSN
Name
Address
Phone
Dept
in >
DMODPREDMOD
Page 10
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
19
Extended Student-Course Data Model B5
STUDENT
REGISTRATIONCOURSEOFFERING
COURSE INSTRUCTOR
StudentID
Name
Address
Major
Course#
Title
Credits
Year
Term
Section
Building
Room
Days-of-week
TimeStart
TimeEnd
SSN
Name
Address
Phone
Dept
in >
DMODPREDMOD
DEPTDeptNo
Name
TEXTBOOK
?AUTHOR
ISBN
Title
Author(s)
Grade
Description
Office Number
Course# (FK)
StudentID (FK)
CourseOffID (FK)
(City, State, Zipcode)
(FK)
20
Student-Course Database - Table Diagram
COURSECourse# TitleDescriptionCredits
INSTRUCTORSSNLastNameFirstNameAddressPhoneDept
STUDENTStudent IDNameAddressMajorGPA
COURSEOFFERINGCourse#YearTermSectionBuildingRoomDaysTime StartControlEnrollmentInstructor SSN
REGISTRATIONCourse IDStudent IDGrade
LEGEND:
ENTITY NAME (upper case)
Identifier (bold face)
Attributes (not bold face)
Foreign Key IdentifierM:1 relationship
Adding Attributes & FKeys. Diagram of the Schema:
What if you move the arrow head to the other end of the arc?
DMOD
Page 11
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
21
ORM Data Model - Presentation
BOSS
LIMITLIMIT
SKILL
RATING
EMPLOYEEDEP
Tworks in employs
supervises is headed by
reports to superior to
may spend up to of spending for
with proficiency of assigned to
possesses possessed by
"EmployeeSkill!"
1 .. 10
1000 .. 9999
ac
SALARY(dollars)
earns paid to
DESCRIPTION(name)
has is of <=5
DMODPRE
(number) (number)
(code)
A major criticism of NIAM / ORM, both by protagonists and proponents, is that it is too detailed, a bottom-up design,
BUT… ER Diagrams usually hide the details of attributes and mostconstraints.
So, present the ORM model using a series of top-down unfolding … abstractions.
22
Abstractions of ORM Data Model
BOSS
LIMITLIMIT
SKILL
RATING
EMPLOYEEDEPT
works in employs
supervises is headed by
reports to superior to
may spend up to of spending for
with proficiency of assigned to
possesses possessed by
"EmployeeSkill!"
1 .. 10
1000 .. 9999
ac
SALARY(dollars)
earns paid to
DESCRIPTION(name)
has is of
<=5
DMODPRE
1. Hide "Terminal" (M:1) Objects (=> Attributes)
(number)(number)
(code)
2. Hide Reference Modes
3. Hide Constraints
4. Hide Less Important Objects & Predicates- Subtypes- Objectified Predicates- Reflexive Relationships
5. Hide all Predicates
Leaving BASE Entities!
6. Add back Multiplicitychar. on relationships
=> A High-level Abstract “Conceptual” Data Model...
an ER Diagram ?!!!
SKILL
EMPLOYEE DEPT
Is this the same data modelwe started with?
Page 12
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
23
Levels (or Stages?) of Data Models
• Reality - the real world User Domain, infinitely complex
• Mental Model - in our minds– must be formally documented so we can communicate it to others
• Conceptual Model - "natural", unconstrained, initial.– independent of physical storage and implementation
• Logical Model - according to a modeling scheme– e.g., the E-R or Relational Model (most popular today)
• Physical Model - defining storage characteristics– Encoding, storage structure and access methods (indexes, etc.)
• Implementation Model - for a given DataStore Manager– memory organization (blocking, buffering, partitioning,
distribution, etc.)
DMODCvLvP
24
Objective of Data Modeling
(WHAT we are trying to do)
TO ACCURATELY AND COMPLETELY MODEL
SOME PORTION OF THE REAL WORLDUNIVERSE OF DISCOURSE (UoD)
(the USER DOMAIN)
OF INTEREST TO SOME ORGANIZATIONOR COMMUNITY OF USERS.
DMOD
Page 13
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
25
Modeling: is Choosing...
• SCOPE / Boundary- where to look
• FOCUS- what to look for
• DEPTH / Resolution- how much detail to look for
... based upon our PURPOSE
DMOD
REALITY is Infinite, Complex, Multidimensional, Detailed.
- so we must CHOOSE:
26
A Model is an Abstraction DMOD
Both realities are infinitely complex.
NEED some constructs to look for and use in modeling.
Sometimes we have the data, and try to find what it means.
Abstract
“Conceptual” View
of the Real World
Concrete Symbols
Stored on some Medium
“Logical”
“DATA” MODEL
MentalModel
Physical (Storage)
Model
REALIZATION
TWO PERSPECTIVESof Data:
CvLvP
Page 14
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
27
Modeling Process – the HOW
MODEL = Abstract (Re).present.(ation)
Knowledgein the world(infinitely complex)
Knowledgeexternalized,formalized, shared.
Knowledgein the head(mental models)
Reality MODELMODELINGPROCESS
DMOD
pres
ent
Re .
pres
ent
What drives or guides the process?
28
The Modeling Process
Real WorldUniverse of Discourse
MODELINGPROCESS
MODELING SCHEMEContext
ConstructsCompositionConstraints
MODEL
perception
selection/filtering
DMOD
METHODOLOGY:
Steps/Tasks + Milestones + Deliverables +
REPRESENTATIONAL FORMS:Narrative, Graphical Diagram,Formal Language Statements
(the Syntax)The Semantics are most important
The SEMANTICS of a data modelcan only be seen through the presentation, the SYNTAX.N
Page 15
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
29
Data Model to Database Realization
DATAMODEL
DATABASE"Schema"
DEFINITION
DatabaseDefinitionLanguage
DATABASE
DataBaseManagementSystem
DataBaseManagementSystem
DATABASEDEFINER
describesdata
Input&
Query
DDL
stmts
DMOD
30
Data Modeling Schemes
• ALL data modeling activity and data management tools are driven or guided by some Data Modeling Scheme
• Think of it as a Meta Model (or Meta-Meta-Data)
• Logical Rules however formal or informal
• May be developed independently of any implementation Not based on any particular implementation
• Many variants within families
• Tells you what to look for, what constructs to use, how to put them together (compose) with what constraints, and how to represent that all syntactically.
CvLvP
Since all data modeling is driven by some modeling scheme,i.e., by some logical rules for building a model,
All data models are logical models!
Page 16
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
31
The Many Faces of Databases B
Logical Database Structures
Flat File(FORTRAN)
HierarchicalFile (COBOL)
CODASYL
Network
Object-Oriented(UML)
Multi-Dimensional“Star” schema
Fact Modeling(ORM)
Relational
ANSI SQL
DMOD
1
2
3 4
6
78
Snowflake
5
What do all these
have in common?----->
5
7s
SingleFile
(1)
DatabaseMulti-File
(M)
NoFile
(0)
© Gordon C. Everest
32
Data Modeling Schemes – Types & Examples
• Developed along generation linesSCHEME Examples
Flat file Fortran (1956?), spread sheet (VisiCalc, Multiplan>Excel, Quattro)
Hierarchy COBOL*(1960), System 2000, HQL*
NetworkO-O ext.
CODASYL*(1971), IDMS, ANSI-NDL*, IMS (DL/1), AdabasOO-COBOL*, ANSI-SQL:1999*, UML*
E-R Chen*(1976), IE*(Finkelstein), Barker, IDEF1X*(ERwin), ER Studio,
Relational(SQL)
Codd*(1970), SEQUEL*(1976), Oracle (1979), DB2, ANSI-SQL*(1986), Sybase, SQL Server, Dbase II
(Inverted) Fully indexed - not really a logical scheme, Model 204, CASE 360(IBM)
Dimensional As a “Cube”: EXPRESS(6) (MDS>IRI>Oracle), Multiplan(3), MicroStrategyAs a Relational Model = Star Schema*(R. Kimball), Red Brick
Fact-Based NIAM*(1976), ORM*(1989), NORMA, FCO-IM
NoSQL A family of tools to overcome the limitations of SQL tools.Each tool has its own modeling scheme – key-value pair, columnar, document (XML, Hierarchical), graph (nodes & edges).
*initially not an implementation but a concept paper or language specification
CvLvP
Page 17
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
33
Modeling Schemes in NoSQL Tools
• NoSQL refers to a family of tools designed to handle Big Data, “Unstructured” Data, Fast(er than SQL tools)
• Based on particular data storage schemes
• Vendors have augmented their Relational/SQL tools with some of these storage schemes, and other models
• One driver is OO programming languages which handle objects of varying structure and complexity.Mapping OO to a relational structure is inadequate.
PhysicalScheme Examples
Key-Value pairValue - any complex structure
Dynamo, Redis, Riak, LevelDB- an index
Graph (O1 O2 R – triplets) Neo4j, OrientDB, Infinite Graph, Mark Logic
“Document” (XML, JSON) MongoDB, CouchBase, Mark Logic
(Wide) Column storesInverse (“dual”) of Tables
Cassandra, HBase
CvLvP
34
Criteria for a Data Modeling Scheme
• Simple, understandable – for human communication
• Comprehensive – can model every phenomenon in theuser domain, e.g., overlapping populations ==> generalization
• Direct – visually intuitive, unambiguouse.g., “Fork” for manyness <
– without spurious, artificial, intermediate constructse.g, intersection entity (for M:N), foreign key (redundant with an arc)
• Minimal – at most one way to model a given phenomenon
• Consistent – uses same syntax for similar phenomenone.g., for dependency within a record, between records, and S/Stypes
• Universal – independent of language
DMOD
Page 18
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
35
Outward Facing vs. Inward FacingOutward Facing – to the business user domain
Inward Facing – to existing, stored data
CvLvP
• Historically we had data on punched cards or paper tape, and needed a representation which transcended its physical storage, hence, logical data models. Still inward facing.
• Next we found logical models too complex and needed to simplify, particularly at the beginning stages of development, hence conceptual data models, even before logical models.
• Then we realized that these models were really representations of things in the business user domain, hence outward facing.
• The modern approach to data modeling is to begin by modeling user domains independent of any physical storage or implementation considerations.
• More recently, we collect massive amounts of data (BIG data), it exists. Now the challenge is to process it efficiently (hence NoSQL tools), and apply analytics to make sense of it.
NOTE: Someone designed the stored data, so where are the definitions?
36
Data Model – Outward Facing
Initially a data model is outward facing, to the businessCvLvP
• Whether modeling big data, fast data, thick data, NoSQL data, Relational data (SQL) … (these are all representations for physical implementation) you still need to know, understand and document the business.
• The “first stage” data model is a fully detailed model of the business independent of any physical implementation
BUT… capturing rich, detailed semantics which describe the user domain in the model.
Page 19
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
37
“Data” Modeling is NOT about:
• Scope – what do we mean by “enterprise-wide”?
• Simplified, high-level, abstract, “conceptual”– Whether in data model development or– A matter of presentation, choosing to hide detail.
• Syntax – chosen notation to represent a Data Model– Data Modeling is about Semantics – meaning– The same semantic can have several notations
- e.g., multiplicity in a relationship: --<(fork), ‘M’, *, -->>
• Storage, Physical Implementation, Performance– Only exogenous information to represent the user domain– No unnecessary, artificial, spurious constructs introduced
on the path to implementation, e.g., FKey, 1NF, entity records!
– User-facing, NOT database/datastore-facing
Though these are all important aspects of Data Modeling.
CvLvP
38
What to Call our model?
• Conceptual is a scoping and presentation issue
• Physical is not part of the Data Model
• Logical is what we are left with. But all models are logical!
• Data Model - but not always a model of data,particularly when the database has not yet been built
None of these adjectives are helpful when referring to data models so…
How do we distinguish types of data models?
What do we call the initial complete data model?
CvLvP
Page 20
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
39
A Business Data Model
For our initial but complete, detailed data model• Capturing all exogenous* information about the user
domain which is of interest
• Capturing only exogenous information about the user domain
• Our mental models need to be externalized, and formally documented to be communicated.
• Hence, we need a modeling scheme with a rich Syntax to represent the Semantics of the user domain in the model
• Devoid of anything relating to physical storage, technology, encoding, implementation, etc.
CvLvP
* relating to, developed or derived from external factors; originating from outside
Let’s call it a “Business Data Model” (G. Witt)Halpin calls it the Conceptual Data Model.
40
Introducing Design Elements
As we move through the continuum of:
Conceptual ==> Logical ==> Physical
CvLvP
• How to rationalize the many differences and alternatives in logical data models?
• Logical data models differ based on which modeling elements are included in the modeli.e. the modeling scheme
SO
• Let’s lay out the various design elements in a precedence graph reflecting order of introduction
Page 21
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
41
Data Modeling Constructs
ENTITY(OBJECT)
ATTRIBUTE(Data Item)
RELATIONSHIP
IDENTIFIER [ FOREIGN KEY ]
characteristics
characteristics
What to look for: Relative emphasis differentiates Data Modeling Schemes
• ER modeling focuses on Entities and Relationships,de-emphasizing, even hiding Attributes.
• Relational (restricted ER, 1NF) focuses on Entities and Attributes,relegating Relationships to Foreign Keys.
• Object Role Modeling (ORM) folds Attribute and Entity into Object
DMOD
VALUEWhat about ?
N
42
Traditional “Levels” of Data Models
MentalModel Database
RealityUser Domain
N
| | | |
Managed
Datastore
CvLvP
Let’s forget levels, and focus on the ordering of design elements
Are intersection/associative entities or foreign keyspart of the logical model or the physical model?
Conceptual
Logical
Physical
Page 22
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
43
Introducing Elements of a Data Model
MentalModel Database
Point at which we have all the essential exogenous semantic
information from theuser domain to build a complete data model
=====>
=====> Point from which we begin to introduce additional data elements to physically implementthe data model (build a database).
Continuum for introducing
Elements of a Data Model
What to call it?_________ Data Model
RealityUser Domain
All of these are logicalbecoming physical.
N
CvLvP
Business
44
Data Modeling Constructs
• “Things” – objects, entities, attributes– Names of things– Populations (or types) of things, Domains (of values)– Subtypes/Supertypes to model overlapping populations
• Relationships– Names of relationships, Names of Object Roles– Characteristics/constraints (dependency, multiplicity)– Ternary+++ relationships
• Entity records– Clustered attributes (based on relationships)
• Identifiers– Encoded representation of instances of things (IDs)– Keys, Foreign keys
CvLvP
________________________________Business Data Model.moving to implementation
Page 23
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
45
SCOPE
User Narratives
SENTENCES“FACTS”
NOUNS
VERBS
OBJECTInstances
OBJECTTYPES
OBJECTNAMES
OBJECTPopulation
CONSTRAINTS
CONSTRAINTSdefined after
element introduced
ObjectIDENTIFIERS
SUB/SUPERTYPES
S/STYPECONSTRAINTS
RELATIONSHIPTYPES
RelationshipNAMES
MULTIPLICITY
DEPENDENCY
Relationship& Role Set
CONSTRAINTS
Fact Modeling
Stages of Data Modeling: Introducing Data Elements or Constraints
OBJECTROLE
NAMES
CvLvP
?
CLUSTERING
46
SCOPE
User Narratives
SENTENCES“FACTS”
NOUNS
VERBS
OBJECTInstances
OBJECTTYPES
OBJECTNAMES
OBJECTPopulation
CONSTRAINTS
CONSTRAINTSdefined after
element introduced
ObjectIDENTIFIERS
DATA TYPES
SUB/SUPERTYPES
S/STYPECONSTRAINTS
RELATIONSHIPTYPES
RelationshipNAMES
MULTIPLICITY
DEPENDENCY
CLUSTER
ENTITYRecords
ATTRIBUTES
KEYS FOREIGNKEYS
DENORMALIZE
INDEXES
COLUMN NAMES
RESOLVEto Tables
RelationalTABLES
Relationship& Role Set
CONSTRAINTS
( 1NF )
BINARYonly
Fact ModelingER/RelationalModeling
PhysicalModeling
Stages of Data Modeling: Introducing Data Elements or Constraints
A B C
ROLENAMES
PARTITIONING
DISTRIBUTION
CvLvP
?
Page 24
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
47
Observations on the Diagram
• Showing the precedence ordering of the introduction of data modeling elements
• At some point we have all the exogenous semantic information needed to complete the model. Up to that point our model is outward or user-facing
• Anything introduced later is physical realization or implementation, i.e., inward facing.
Stages:
• Fact Modeling
• >> controversial elements in between
• ER / Relational Modeling
• Physical Modeling
• Implementation Modeling
CvLvP
48
Start with User Narratives
Begin with Statements from User Domain Experts (SMEs) within a defined, agreed upon scope; they are the primary source of knowledge about the world being modeled.Then find the Model Elements
• Analyze the Vocabulary in the User Narratives – develop agreed upon definitions
• Breakdown user narratives => into elementary fact sentences
• Extract the nouns => become Things or Objects
• Extract the verbs => become Relationships
• Extract other words/phrases => become Constraints
CvLvP
Page 25
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
49
constraint
verbVerbalize User Descriptions
GIVEN A DESCRIPTION FROM THE USER(S):ORMODLG
Famous Foods, a small, specialty food wholesaler, fills orders for restaurants. Customers have names, addresses, etc. An order can include several products. Products have unique SKU numbers, descriptions, manufacturer, etc. The company has one big warehouse with many rooms on several floors. Each product is stored in only one bin location in the warehouse, but it can change frequently. Multiple products may be stored in the same bin. Bin numbers are only unique within a room, hence the same number can be used in different rooms. Since the bin locations can be hard to find in a room (could be on a shelf, on the floor, in a cabinet or cooler, hanging from the ceiling, etc.), and the rooms can be hard to find in the warehouse (with many hallways, doors, tunnels, split levels, mezzanines, etc.), explicit location directions must be recorded for each room and for each bin in the room. Location information is a textual narrative and is used by the pickers who run around gathering the items to fill an order. Each product has its own standard price but it may be modified by applying a discount (a fraction) on any individual order. The discount can be different for each of the products on an order, and for the same product on different orders. The quantity of each product on an order is recorded ( it is not the quantity on hand or in inventory). Terms indicates the number of days during which a standard discount can be taken on the payment. The terms can vary from one customer to the next, and from one order to the next for the same customer.
noun
50
Establishing the Vocabulary
Before we can develop a data model we must first carefully define our terms so we can talk about it• From a business perspective
• By the user domain or subject matter experts– Listen to what they say, talking about the domain
• Initially will be fuzzy, with areas of disagreement– Requiring some discussion and negotiation to come to a common
understanding; and documenting that– The most difficult and important aspect of data modeling
Call it a business glossary or [data] dictionary?
However, a glossary is usually only for nouns (objects).
We also need to define the relationships - the mortar that holds the bricks (nouns) together… and constraints.
CvLvP
N
Page 26
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
51
Objects
• Encompass Entities,and Attributes… independent of entities described
• Derived from the nouns in the user narratives
• A single instance (of what population?)… or a population of individual instances
– e.g., given the noun ‘George’: until it is associated with a particular, defined population, it is just a string of characters
• Each Object Population – uniquely named
• Define the population, criteria for inclusion of members, how we know we have one, what’s not included, etc.e.g., does Employee include retired, suspended, laid off, contract, visitor, temp
• Not concerned (yet) with how the members are represented – identifiers - surrogate lexical encoding. e.g. Days of the Week, 7 members, one represented by – Tuesday, Tues, Tue, Tu, Mardi, Martes, …
N
CvLvP
52
Objects – 2
• Grouping individual instances into populations does not occur naturally in the real world.The designer chooses to include members based on some common characteristics for some purpose(s)
• By convention we name object type populations with a singular noun, makes it easier to build sentences
• NOTE: in general, individual object populations could be overlapping, i.e., an individual could be a member of multiple populationse.g., an Employee could also be a Customer or a ShareholderThis is handled using Subtype/Supertype constructs
USER NARRATIVES in the Domain of Discourse=> NOUNS => OBJECT instances
=> OBJECT TYPES => NAMES
CvLvP
Page 27
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
53
Subtypes/Supertypes
• Data modeling schemes assume Object Populations are strictly disjointi.e., an individual member of an Object Population cannot be a member of any other Object Population
• We know that is not always truee.g., Person can be Employee, Customer, and ShareholderIf these are modeled as separate populations, redundancy results which can lead to inconsistent data.Maintaining consistent data becomes a user responsibility
• S/Stype construct is used to formally represent overlapping populations. It only depends upon the nature of defined Object populations.
• Supertype is a generalization of its Subtypes.Several constraints can be defined on S/Stypes.
See Dataversity Webinar
OBJECT TYPES => SUBTYPE/SUPERTYPES=> S/Stype CONSTRAINTS
CvLvP
54
Relationships
• “Connection” between or among members of one or more object populations.Arity = number of Roles played by Objects participating in the Relationship, e.g., Unary, Binary, Ternary, etc.
All valid X-A pairs(in the R/W)
X A(Binary) RELATIONSHIP TYPE:
RELATIONSHIP Instances:
X
A
What are the characteristics of the relationship ‘X-A’ ?
CvLvP
Page 28
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
55
Relationship Names and Object Roles
• Naming all Relationships and Roles not necessary,can use the Object Names as a default: Relationship “X-Y”
– But user narratives will include verb phrases to reference relationshipsmaking it easier to form sentences when talking about the user domain.
− Except if there are multiple relationships on X-Ye.g., “Employee works in Dept” and “Employee heads Dept” in which case the Employee plays the role of Boss in the “heads” relationship.NOTE: role order matters, e.g.,binary has two readings
− Except if the same Object type plays multiple rolese.g., “Person is parent of Person” then must name the relationship or distinguish the roles as Parent and Child
• Object Role names are nouns within context of a relationship
USER NARRATIVES in the Domain of Discourse => OBJECT TYPES => => VERBS => RELATIONSHIP TYPES
=> NAMES => ROLE NAMES
CvLvP
56
Constraints on a Relationship
• The defaults are the least constrained– Multiple
- every object instance may participate more than oncee.g., many-to-many (M:N) for a binary relationship
– Optional- every object instance need not participate in the relationship
• The Constraints would be: (the opposites)– Exclusive – at most one
– Dependent (Mandatory, Required …) – at least one
Many different notations, sometimes confusing.Combination called ‘Cardinality’ [min:max], a notational convenience
RELATIONSHIP TYPES => EXCLUSIVITY Constraint=> DEPENDENCY Constraint
CvLvP
Page 29
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
57
What is an Attribute?
An ATTRIBUTE is …ORMvER
What comes first?
Ω
N
playing a ROLE
with some (other) OBJECT.
in a RELATIONSHIP
an OBJECT...
of what?
RELATIONSHIPS => MULTIPLICITY => CLUSTERING => => ENTITY records => ATTRIBUTES
CvLvP
58
Data Modeling Constructs
ENTITY(Object)
ATTRIBUTE(Data Item)
RELATIONSHIP
IDENTIFIER [ FOREIGN KEY ]
characteristics
characteristics:
What to look for:DMOD
DOMAIN
What’s the difference?N
A Day of the Week:Tuesday, Tues, Tu, Mardi, Martes...
Page 30
CvLvP - May 31, 2016
© Gordon C. Everest, All rights reserved.
Presentation to DAMA, Minnesota, 2016/06Stages of Data Modeling
59
SCOPE
User Narratives
SENTENCES“FACTS”
NOUNS
VERBS
OBJECTInstances
OBJECTTYPES
OBJECTNAMES
OBJECTPopulation
CONSTRAINTS
CONSTRAINTSdefined after
element introduced
ObjectIDENTIFIERS
DATA TYPES
SUB/SUPERTYPES
S/STYPECONSTRAINTS
RELATIONSHIPTYPES
RelationshipNAMES
MULTIPLICITY
DEPENDENCY
CLUSTER
ENTITYRecords
ATTRIBUTES
KEYS FOREIGNKEYS
DENORMALIZE
INDEXES
COLUMN NAMES
RESOLVEto Tables
RelationalTABLES
Relationship& Role Set
CONSTRAINTS
( 1NF )
BINARYonly
Fact ModelingER/RelationalModeling
PhysicalModeling
Stages of Data Modeling: Introducing Data Elements or Constraints
A B C
ROLENAMES
PARTITIONING
DISTRIBUTION
CvLvP
?
60
Conceptual vs. Logical vs. Physical
Data Models
Questions?©Gordon C. Everest
Professor EmeritusCarlson School of Management
University of [email protected] http://geverest.umn.edu
GETITLE