201407 MIT CDO IQ conceptual data modeling, big data, and information quality
A Start-up CDO Perspective on the Pivotal Role of Conceptual Data Modeling in Maintaining Information Quality in Big Data Domains
Peter O’Kelly, ShopAdvisor Chief Data Officer
2014/07/24
2
Agenda: Modeling and IQ
• Conceptual modeling overview
• Conceptual modeling and IQ
• Conceptual modeling reality checks
• Recommendations
3
Conceptual Modeling Overview
• Core concepts
  – Entity
    • A type of real-world thing of interest
  – Attribute
    • An entity descriptor (characteristic)
  – Relationship
    • A bidirectional connection between two entities
  – Identifier
    • One or more descriptors (attributes and/or relationship links) that together identify entity instances
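A minimal Python sketch of the four core concepts, under stated assumptions: the Customer/Order example, the class names, and the `check_identifier` helper are hypothetical illustrations, not the model fragment shown in the deck. Each relationship carries a reading in both directions, and the check confirms that every identifier part is either one of the entity's attributes or a relationship link attached to that entity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Relationship:
    """A bidirectional connection between two entities, readable both ways."""
    link: str            # hypothetical link name, usable as an identifier part
    left: str            # name of one entity
    right: str           # name of the other entity
    left_to_right: str   # reading in one direction
    right_to_left: str   # reading in the other direction


@dataclass(frozen=True)
class Entity:
    """A type of real-world thing of interest."""
    name: str
    attributes: Tuple[str, ...]   # entity descriptors (characteristics)
    identifier: Tuple[str, ...]   # attribute names and/or relationship links


def check_identifier(entity: Entity, relationships: List[Relationship]) -> List[str]:
    """Return identifier parts that are neither attributes nor links of the entity."""
    links = {r.link for r in relationships if entity.name in (r.left, r.right)}
    return [part for part in entity.identifier
            if part not in entity.attributes and part not in links]


# Hypothetical example: an order number is unique only within one customer,
# so an Order is identified by its customer link plus its order number.
placed_by = Relationship("placed_by", "Customer", "Order",
                         "Customer places Order", "Order is placed by Customer")
customer = Entity("Customer", ("customer_number", "name", "email"), ("customer_number",))
order = Entity("Order", ("order_number", "order_date"), ("placed_by", "order_number"))

print(check_identifier(order, [placed_by]))     # []
print(check_identifier(customer, [placed_by]))  # []
```

An identifier that mixes a relationship link with an attribute, as Order does here, is exactly the case the "attributes and/or relationship links" wording covers.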
4
An Example Model Fragment
5
Conceptual Modeling Overview
“Everything should be made as simple as possible, but no simpler.”
6
Conceptual Modeling Overview
• Modeling levels of abstraction
  – Conceptual, which is technology-neutral and used primarily to help establish contextual consensus among modeling domain stakeholders
  – Logical, which renders conceptual models in a specific technology
    • Relational and (Web-centric) hypertext are the two most widely used logical models today
  – Physical, which includes implementation-level details such as indexing and federation/sharding
• Bonus: well-formed conceptual data models are easily transformed into logical data models
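To illustrate the bonus point, a minimal Python sketch that renders one conceptual entity as a relational CREATE TABLE statement. The `to_create_table` function, the table and column names, the SQL types, and the blanket NOT NULL choice are assumptions introduced here; picking types and keys is a logical-level decision that the conceptual model deliberately leaves open. The entity's identifier becomes the primary key, and an identifying relationship link becomes a foreign key column.

```python
from typing import Dict, List, Optional


def to_create_table(
    table: str,
    columns: Dict[str, str],
    identifier: List[str],
    foreign_keys: Optional[Dict[str, str]] = None,
) -> str:
    """Render one entity as a CREATE TABLE statement.

    columns maps attribute names to SQL types (assumed, logical-level choices);
    identifier becomes the PRIMARY KEY; foreign_keys maps a column to the table
    it references, assuming the referenced key column has the same name.
    """
    # Every attribute is marked NOT NULL here purely for simplicity.
    parts = [f"  {name} {sql_type} NOT NULL" for name, sql_type in columns.items()]
    parts.append(f"  PRIMARY KEY ({', '.join(identifier)})")
    for column, referenced in (foreign_keys or {}).items():
        parts.append(f"  FOREIGN KEY ({column}) REFERENCES {referenced} ({column})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(parts) + "\n);"


# Hypothetical Order entity identified by its customer link plus order number;
# "customer_order" avoids the SQL reserved word ORDER.
print(to_create_table(
    "customer_order",
    {"customer_number": "INTEGER", "order_number": "INTEGER", "order_date": "DATE"},
    ["customer_number", "order_number"],
    {"customer_number": "customer"},
))
```

The printed statement has the primary key (customer_number, order_number) and a foreign key to customer; the same conceptual entity could just as well be rendered in a hypertext logical model instead.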
7
Conceptual Modeling Overview
• Modeling artifact types
  – Documents (a.k.a. resources)
    • Digital artifacts optimized to impart narrative flows (e.g., to share stories)
    • Usually organized in terms of narrative, hierarchy, and sequence
  – Databases (a.k.a. relations)
    • Application-independent descriptions of real-world things and the relationships between them
    • Examples include popular database domains such as customer, sales, and human resources models
    • Databases are designed primarily for use by applications and tools (such as query/reporting tools)
8
The Bigger Modeling Picture: Documents vs. Databases
• Conceptual
  – Documents: documents and links; documents focused primarily on narrative, hierarchy, and sequence
  – Databases: entities, attributes, relationships, and identifiers
• Logical
  – Documents: model: hypertext; language: XQuery (ideally…)
  – Databases: model: extended relational; language: SQL
• Physical (documents and databases)
  – Indexing (e.g., scalar data types, XML, and full-text), locking and isolation levels (for transactions), federation, replication/synchronization, in-memory databases, columnar storage, table spaces, caching, and more
9
Agenda: Modeling and IQ
• Conceptual modeling overview
• Conceptual modeling and IQ
• Conceptual modeling reality checks
• Recommendations
10
Conceptual Modeling and IQ
• Pretty straightforward
  – If a team is not confident it has established consensus about entities, attributes, relationships, and identifiers, there’s a good chance the people on the team collectively don’t know what they’re talking about
  – If you don’t know what you’re talking about, it’s unlikely your data is going to be of high quality
11
Conceptual Modeling and IQ
• Big data and conceptual data modeling
  – Without sufficiently detailed conceptual data modeling, big data market dynamics essentially represent new opportunities to cause more significant damage faster and for less money
  – Common problems include
    • Homonyms and synonyms
    • Inconsistent data
    • Duplicated data
    • Inadequate access control and usage tracking
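A minimal Python sketch of how an agreed conceptual model helps surface three of these problems: it maps synonymous source field names onto one canonical attribute, groups records by the agreed identifier to find duplicates, and reports attributes whose values disagree for the same instance. The `SYNONYMS` map, field names, and sample records are hypothetical. The sketch does not handle homonyms (one name, two meanings), which generally need human judgment, nor access control and usage tracking.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical synonym map agreed during conceptual modeling:
# source field name -> canonical attribute name.
SYNONYMS = {
    "cust_id": "customer_number",
    "custno": "customer_number",
    "e_mail": "email",
    "email_address": "email",
}

Record = Dict[str, str]


def canonicalize(record: Record) -> Record:
    """Rename synonymous fields to their canonical attribute names."""
    return {SYNONYMS.get(field, field): value for field, value in record.items()}


def find_problems(records: List[Record], identifier: str = "customer_number"
                  ) -> Tuple[Dict[str, int], Dict[str, Dict[str, List[str]]]]:
    """Return (duplicate counts per identifier, conflicting values per attribute)."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in map(canonicalize, records):
        groups[record[identifier]].append(record)

    duplicates = {key: len(group) for key, group in groups.items() if len(group) > 1}
    conflicts: Dict[str, Dict[str, List[str]]] = {}
    for key, group in groups.items():
        attributes = {attr for record in group for attr in record} - {identifier}
        for attr in sorted(attributes):
            values = {record[attr] for record in group if attr in record}
            if len(values) > 1:  # same instance, inconsistent descriptions
                conflicts.setdefault(key, {})[attr] = sorted(values)
    return duplicates, conflicts


records = [
    {"cust_id": "42", "email": "pat@example.com"},
    {"customer_number": "42", "e_mail": "pat@example.org"},
    {"custno": "7", "email_address": "lee@example.com"},
]
duplicates, conflicts = find_problems(records)
print(duplicates)  # {'42': 2}
print(conflicts)   # {'42': {'email': ['pat@example.com', 'pat@example.org']}}
```

Note that none of this works unless the team has first agreed on what "customer_number" and "email" mean; the code only mechanizes decisions made during conceptual modeling.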
12
Conceptual Modeling and IQ
• This all probably seems pretty obvious, yet…
  – I am familiar with several big data projects that failed because of insufficient conceptual data modeling
  – Some people appear to believe that the advent of NoSQL and big data tools means it’s okay to deemphasize models and simply make copies of data to simplify sharing
    • In many respects, that means reverting to the c. 1969 programs/apps-have-files modus operandi
13
Agenda: Modeling and IQ
• Conceptual modeling overview
• Conceptual modeling and IQ
• Conceptual modeling reality checks
• Recommendations
14
Reality Checks
• Some conceptual data modeling fallacies
  – Conceptual data modeling is easy
  – Conceptual data modeling is just for data nerds
  – Conceptual data modeling is expensive and time-consuming
  – NoSQL and big data make data modeling unnecessary
15
Reality Checks
• Fallacy: conceptual data modeling is easy
  – Creating conceptual data models is not easy
    • Conceptual data modeling complexity is a function of the complexity inherent in the parts of the real world you seek to model
  – Useful conceptual data modeling techniques can be easy to learn, however
  – Reading conceptual data models is relatively easy
    • As long as the model diagrams are well-formed and adequately documented
  – Tangent: super-detailed logical and physical data models are rarely easy for non-geeks to understand
16
Reality Checks
• Fallacy: conceptual data modeling is just for data nerds
  – In reality, many data nerds race ahead to logical and physical data modeling without first creating sufficiently precise conceptual models
    • And many application developers have reverted to a pre-DBMS programs-have-files approach
    • “With great power comes great responsibility”…
  – Business domain experts working with modeling experts can collaboratively create detailed conceptual data models without using complex and costly tools
17
Reality Checks
• Fallacy: conceptual data modeling is expensive and time-consuming
  – Limitations in and costs of some earlier database design and data modeling tools often led to costly and protracted data model analysis and design cycles
  – Open source advances and other market dynamics have produced several options for inexpensive (or free) modeling tools, and you don’t need to master all of the logical/physical features to work effectively with conceptual data models
18
Reality Checks
• Fallacy: NoSQL and big data technologies and techniques make data modeling unnecessary
  – Common (and incorrect) assertions include
    • Schemas are too rigid
    • Model-based development can’t be “agile”
    • Traditional database models are incompatible with “Web scale” needs
  – In many cases, these are symptoms of developers who simply dislike SQL (and/or XML)
19
Reality Checks
• Clearly still market demand for data modelers
20
Agenda: Modeling and IQ
• Conceptual modeling overview
• Conceptual modeling and IQ
• Conceptual modeling reality checks
• Recommendations
21
Recommendations
• Develop conceptual data modeling skills
• Don’t go on a quest for the perfect modeling tool/framework
• Build and share model collections
22
Recommendations
• Develop conceptual data modeling skills
  – Establish a core group of modeling experts
    • Have them read “Mastering Data Modeling”
  – Have other stakeholders learn how to read and critique model diagrams
23
Recommendations
• Don’t go on a quest for the perfect modeling tool/framework
  – Many data modeling tools go deep on logical and physical data modeling features that overcomplicate conceptual modeling
  – A whiteboard is often effective
    • Especially for collaborative and interactive modeling sessions
    • Smartphone cameras may be the single most useful modeling tool introduced during the last 15 years
24
Recommendations
• Build and share model collections
  – Application-specific models essentially mean reverting to the programs-have-files modus operandi
  – The industry has seen several waves of failed uber-repository offerings over the last 20 years
    • As with “CASE” tools, many represented attempts to overreach
  – Wikis can be effective for model sharing
    • Even for simply capturing, annotating, and sharing photos of whiteboard diagrams
25
Discussion