Post on 09-Jul-2015
Relationships
The [Missing] Link
Michael P. Meier
WhiteLake Data Management
What I’m Going to Tell You
Data Architects ignore relationships
This includes all types
– Inter-personal
– Logical
– Physical (data)
We’re suffering without relationships
Here’s what we need to do.
Modeling Formalisms
ERA (entity-relationship-attribute and variants)
UML (unified modeling language)
ORM (object-role modeling)
None today give relationships more than lip service
It wasn’t always so
Peter Chen (1976)* represented relationships as diamond-shaped symbols
Two-dimensional symbols gave relationships equal weight with entities
3NF relational schemas were the goal
*1976 article published in ACM’s Transactions on Database Systems
Chen Diagram
[Figure: Chen diagram with two doctor entities connected by three relationship diamonds: consultation, Dr-pt, residency]
Chen Diagram Suggests
Three different relationships exist between two doctors
An implementation consisting of three tables to represent the relationships
The relationships may have additional properties that are different
– Dates (start, end)
– Relationships to other entities
– Rotation, Diagnosis, Recommendation, …
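A minimal sketch of the three-table implementation the Chen diagram suggests, using Python's built-in sqlite3 module. The table names, column names, role names, and sample rows are assumptions made for illustration; in particular, "Dr-pt" is read here as one doctor being the patient of another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE doctor (
    doctor_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- One table per relationship, each carrying its own properties.
CREATE TABLE consultation (
    requesting_doctor_id INTEGER NOT NULL REFERENCES doctor(doctor_id),
    consulting_doctor_id INTEGER NOT NULL REFERENCES doctor(doctor_id),
    start_date           TEXT NOT NULL,
    recommendation       TEXT,
    PRIMARY KEY (requesting_doctor_id, consulting_doctor_id, start_date)
);
CREATE TABLE doctor_patient (
    treating_doctor_id INTEGER NOT NULL REFERENCES doctor(doctor_id),
    patient_doctor_id  INTEGER NOT NULL REFERENCES doctor(doctor_id),
    start_date         TEXT NOT NULL,
    end_date           TEXT,
    diagnosis          TEXT,
    PRIMARY KEY (treating_doctor_id, patient_doctor_id, start_date)
);
CREATE TABLE residency (
    supervising_doctor_id INTEGER NOT NULL REFERENCES doctor(doctor_id),
    resident_doctor_id    INTEGER NOT NULL REFERENCES doctor(doctor_id),
    rotation              TEXT NOT NULL,
    start_date            TEXT NOT NULL,
    end_date              TEXT,
    PRIMARY KEY (supervising_doctor_id, resident_doctor_id, rotation, start_date)
);
""")

# Hypothetical sample data: the same two doctors take part in two relationships.
conn.executemany("INSERT INTO doctor VALUES (?, ?)", [(1, "Dr. A"), (2, "Dr. B")])
conn.execute("INSERT INTO residency VALUES (1, 2, 'cardiology', '2015-01-01', NULL)")
conn.execute("INSERT INTO consultation VALUES (2, 1, '2015-03-01', 'order an echo')")

relationship_tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name <> 'doctor' ORDER BY name"
    )
]
print(relationship_tables)
```

Each relationship keeps its own dates and its own descriptive columns (rotation, diagnosis, recommendation), which is the point of giving it a table of its own.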
“Modern” Interpretation
[Figure: two Doctor entities, or a single Doctor entity, joined by relationship lines]
We usually don’t even get names on the “relationships.”
Which Suggests
Three foreign key references to Doctor within the Doctor table
Meaning that
– Additional relationships and attributes are lost.
– Or, 3NF is lost (by including information about the relationship in the Doctor entity)
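The trade-off above can be sketched with a single self-referencing table, again using sqlite3. The column names and rows are hypothetical. The sketch shows a relationship attribute stored on the Doctor row and what happens to it when the relationship itself changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctor (
    doctor_id             INTEGER PRIMARY KEY,
    name                  TEXT NOT NULL,
    -- Three foreign key references to doctor within the doctor table.
    consulting_doctor_id  INTEGER REFERENCES doctor(doctor_id),
    treating_doctor_id    INTEGER REFERENCES doctor(doctor_id),
    supervising_doctor_id INTEGER REFERENCES doctor(doctor_id),
    -- An attribute of the residency relationship, stored on the entity.
    residency_start_date  TEXT
);
""")
conn.execute("INSERT INTO doctor (doctor_id, name) VALUES (1, 'Dr. A')")
conn.execute("INSERT INTO doctor VALUES (2, 'Dr. B', NULL, NULL, 1, '2015-01-01')")

# Each column holds one value, so a second supervisor or a residency history
# for Dr. B has nowhere to go. Ending the residency also leaves a stray date.
conn.execute("UPDATE doctor SET supervising_doctor_id = NULL WHERE doctor_id = 2")
supervisor, stale_start = conn.execute(
    "SELECT supervising_doctor_id, residency_start_date "
    "FROM doctor WHERE doctor_id = 2"
).fetchone()
print(supervisor, stale_start)
```

After the update the row still carries a residency start date with no supervisor attached, the kind of anomaly that appears when information about a relationship lives in the entity.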
More Importantly
The relationship is treated as a detail
– A connector on a diagram
When it should be a core piece of intelligence about the business
The majority of business rules are about relationships.
Models and applications suffer from trivializing relationships
Even Worse
[Figure: a single Doctor entity]
This is what we usually see.
This is the barest hint about the possible existence of relationships
Effect on Data Quality
Inattention to relationships leads to the worst sort of data quality problem.
The “I meant” problem
– A foreign key column in a table contains values that share
  • Data type
  • Domain
  • Range
– But were placed there for different reasons
  • Therefore have different semantic values
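A small illustration of the “I meant” problem, assuming a hypothetical `related_doctor_id` column that stands in for an unnamed relationship. Both values below are valid keys of the same type, domain, and range, yet they were entered for different reasons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctor (
    doctor_id         INTEGER PRIMARY KEY,
    name              TEXT NOT NULL,
    related_doctor_id INTEGER REFERENCES doctor(doctor_id)  -- unnamed relationship
);
""")
conn.executemany(
    "INSERT INTO doctor VALUES (?, ?, ?)",
    [
        (1, "Dr. A", None),
        (2, "Dr. B", 1),  # entered meaning "Dr. A supervises my residency"
        (3, "Dr. C", 1),  # entered meaning "Dr. A consulted on my case"
    ],
)

# The query cannot tell the two meanings apart: the intent exists only in
# the comments above, not in the data.
linked_to_a = conn.execute(
    "SELECT COUNT(*) FROM doctor WHERE related_doctor_id = 1"
).fetchone()[0]
print(linked_to_a)
```

The count is 2, but a report on "doctors supervised by Dr. A" and a report on "doctors who consulted Dr. A" would both return these same two rows.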
Undefined Relationships
Result in
– Poorly designed user interfaces
– Undocumented rules
– Reporting nightmares
– Data degradation
– Unattributed or improperly attributed costs to the business
Suggestions
Always name relationships, including role names.
Supply a description for the relationship.
Always model relationships as entities.
Investigate them. They probably have attributes and possibly relationships.
– Events are really relationships
– Often n-ary relationships (more than two participants)
Think, really think, before you “denormalize” your physical model
– At least start with a normalized model
– Many use “denormalize” as an excuse not to do the work.
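The suggestions above can be sketched as a consultation modeled as an entity: an event with three participants, each under a role name, plus its own attributes. The patient table, the column names, and the rule that a doctor cannot consult themselves are assumptions made for illustration, not rules taken from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE doctor  (doctor_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- The relationship as an entity: n-ary (two doctors and a patient),
-- with role names on the participants and attributes of its own.
CREATE TABLE consultation (
    consultation_id      INTEGER PRIMARY KEY,
    requesting_doctor_id INTEGER NOT NULL REFERENCES doctor(doctor_id),
    consulting_doctor_id INTEGER NOT NULL REFERENCES doctor(doctor_id),
    patient_id           INTEGER NOT NULL REFERENCES patient(patient_id),
    requested_on         TEXT NOT NULL,
    recommendation       TEXT,
    -- Hypothetical business rule, stated on the relationship itself.
    CHECK (requesting_doctor_id <> consulting_doctor_id)
);
""")
conn.executemany("INSERT INTO doctor VALUES (?, ?)", [(1, "Dr. A"), (2, "Dr. B")])
conn.execute("INSERT INTO patient VALUES (10, 'Pat')")
conn.execute("INSERT INTO consultation VALUES (100, 1, 2, 10, '2015-07-09', NULL)")

# A doctor consulting themselves violates the rule and is rejected.
try:
    conn.execute("INSERT INTO consultation VALUES (101, 1, 1, 10, '2015-07-10', NULL)")
    self_consult_rejected = False
except sqlite3.IntegrityError:
    self_consult_rejected = True
print(self_consult_rejected)
```

Because the relationship has its own table, the rule has an obvious home, and further attributes or relationships can be added without touching the doctor or patient tables.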
There is nothing simple about a relationship.
Ask your spouse, significant other, friends, neighbors…
Relationships are continuously being renegotiated (changing).
Give them attention up front or pay dearly later.