201407 MIT CDO IQ conceptual data modeling, big data, and information quality

A Start-up CDO Perspective on the Pivotal Role of Conceptual Data Modeling in Maintaining Information Quality in Big Data Domains
Peter O’Kelly, ShopAdvisor Chief Data Officer
2014/07/24



Page 2

Agenda: Modeling and IQ

• Conceptual modeling overview
• Conceptual modeling and IQ
• Conceptual modeling reality checks
• Recommendations

Page 3

Conceptual Modeling Overview

• Core concepts
  – Entity
    • A type of real-world thing of interest
  – Attribute
    • An entity descriptor (characteristic)
  – Relationship
    • A bidirectional connection between two entities
  – Identifier
    • One or more descriptors (attributes and/or relationship links) that together identify entity instances
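The four core concepts can be made concrete with a small sketch. The Python below is an illustrative, hypothetical encoding (not a notation from this talk): entities carry attributes and an identifier, relationships are named in both directions, and a helper checks whether a proposed identifier actually distinguishes a set of instances. The Customer/Order/OrderLine names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """An entity descriptor (characteristic)."""
    name: str


@dataclass(frozen=True)
class Relationship:
    """A bidirectional connection between two entities, named from each side."""
    entity_a: str
    a_to_b: str  # reading from entity_a, e.g. "places"
    entity_b: str
    b_to_a: str  # reading from entity_b, e.g. "is placed by"


@dataclass
class Entity:
    """A type of real-world thing of interest."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)
    # Identifier: attribute names and/or relationship link names that
    # together identify instances of this entity.
    identifier: tuple[str, ...] = ()


def identifies(entity: Entity, instances: list[dict]) -> bool:
    """True if no two instances share the same identifier values."""
    seen = set()
    for instance in instances:
        key = tuple(instance[part] for part in entity.identifier)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical model fragment (names are illustrative only).
customer = Entity("Customer", [Attribute("email"), Attribute("name")], ("email",))
order = Entity("Order", [Attribute("order_number"), Attribute("placed_on")], ("order_number",))
places = Relationship("Customer", "places", "Order", "is placed by")

# An order line is identified partly by its link to an order.
order_line = Entity("OrderLine", [Attribute("line_number"), Attribute("quantity")],
                    ("order", "line_number"))
contains = Relationship("Order", "contains", "OrderLine", "is part of")
```

The OrderLine identifier mixes a relationship link (order) with an attribute (line_number), which is the “attributes and/or relationship links” case in the Identifier definition.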

Page 4

An Example Model Fragment

Page 5

Conceptual Modeling Overview

“Everything should be made as simple as possible, but no simpler.”

Page 6

Conceptual Modeling Overview

• Modeling levels of abstraction
  – Conceptual, which is technology-neutral and used primarily to help establish contextual consensus among modeling domain stakeholders
  – Logical, which renders conceptual models in a specific technology
    • Relational and (Web-centric) hypertext are the two most widely used logical models today
  – Physical, which includes implementation-level details such as indexing and federation/sharding
• Bonus: well-formed conceptual data models are easily transformed into logical data models
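As a rough illustration of how a well-formed conceptual model can map to a relational logical model, the sketch below assumes a hypothetical dictionary-based model description and a simple rule set: each entity becomes a table, its identifier becomes the primary key, and each one-to-many relationship adds foreign-key columns on the “many” side. It uses Python’s standard sqlite3 module only to confirm the generated DDL is valid; real transformations also have to handle many-to-many relationships, data types, and naming conventions.

```python
import sqlite3

# Hypothetical conceptual model: entity -> attributes and identifier.
ENTITIES = {
    "customer": {"attributes": ["email", "name"], "identifier": ["email"]},
    "sales_order": {"attributes": ["order_number", "placed_on"], "identifier": ["order_number"]},
}
# One-to-many relationships as (parent, child): one customer places many orders.
ONE_TO_MANY = [("customer", "sales_order")]


def to_ddl(entities, one_to_many):
    """Render the conceptual model as relational CREATE TABLE statements."""
    statements = []
    for name, spec in entities.items():
        columns = [f"{attr} TEXT NOT NULL" for attr in spec["attributes"]]
        foreign_keys = []
        # On the "many" side, carry the parent's identifier as a foreign key.
        for parent, child in one_to_many:
            if child == name:
                parent_id = entities[parent]["identifier"]
                fk_cols = [f"{parent}_{col}" for col in parent_id]
                columns += [f"{col} TEXT NOT NULL" for col in fk_cols]
                foreign_keys.append(
                    f"FOREIGN KEY ({', '.join(fk_cols)}) "
                    f"REFERENCES {parent} ({', '.join(parent_id)})"
                )
        primary_key = f"PRIMARY KEY ({', '.join(spec['identifier'])})"
        body = ",\n  ".join(columns + [primary_key] + foreign_keys)
        statements.append(f"CREATE TABLE {name} (\n  {body}\n);")
    return statements


# Check that the generated logical model is valid SQL.
conn = sqlite3.connect(":memory:")
for statement in to_ddl(ENTITIES, ONE_TO_MANY):
    conn.execute(statement)
```

The entity is named sales_order rather than order because ORDER is a reserved word in SQL, which is one small example of logical-level concerns that the conceptual model does not need to carry.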

Page 7

Conceptual Modeling Overview

• Modeling artifact types
  – Documents (a.k.a. resources)
    • Digital artifacts optimized to impart narrative flows (e.g., to share stories)
    • Usually organized in terms of narrative, hierarchy, and sequence
  – Databases (a.k.a. relations)
    • Application-independent descriptions of real-world things and relationships between things
    • Examples include popular database domains such as customer, sales, and human resources models
    • Databases are designed primarily to be used by applications and tools (such as query/reporting tools)

Page 8

The Bigger Modeling Picture: Documents and Databases

• Conceptual
  – Documents: documents and links; documents focused primarily on narrative, hierarchy, and sequence
  – Databases: entities, attributes, relationships, and identifiers
• Logical
  – Documents: model: hypertext; language: XQuery (ideally…)
  – Databases: model: extended relational; language: SQL
• Physical
  – Indexing (e.g., scalar data types, XML, and full-text), locking and isolation levels (for transactions), federation, replication/synchronization, in-memory databases, columnar storage, table spaces, caching, and more
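To contrast the two logical models, the sketch below asks the same question of a small hypothetical product catalog held as an XML document and as a relational table. Python’s standard library has no XQuery engine, so xml.etree.ElementTree’s limited XPath support stands in for XQuery here; the SQL side uses sqlite3. The catalog contents are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical hypertext-style document: hierarchy and sequence are explicit.
DOC = """
<catalog>
  <product sku="A1"><name>Kettle</name><price>30</price></product>
  <product sku="B2"><name>Toaster</name><price>45</price></product>
</catalog>
"""


def names_over_xml(doc, max_price):
    """Product names at or below max_price, in document order."""
    root = ET.fromstring(doc.strip())
    return [
        product.findtext("name")
        for product in root.findall("product")
        if float(product.findtext("price")) <= max_price
    ]


def names_over_sql(max_price):
    """The same question over a relational table, using SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [("A1", "Kettle", 30), ("B2", "Toaster", 45)],
    )
    # Relations are unordered, so any required order must be stated explicitly.
    rows = conn.execute(
        "SELECT name FROM product WHERE price <= ? ORDER BY sku", (max_price,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

The document preserves sequence intrinsically (results come back in document order), while the relational query needs an explicit ORDER BY to guarantee any order, which mirrors the narrative/hierarchy/sequence versus entities/relationships distinction in the table.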

Page 9

Agenda: Modeling and IQ

• Conceptual modeling overview
• Conceptual modeling and IQ
• Conceptual modeling reality checks
• Recommendations

Page 10

Conceptual Modeling and IQ

• Pretty straightforward
  – If a team is not confident it has established consensus about entities, attributes, relationships, and identifiers, there’s a good chance the people on the team collectively don’t know what they’re talking about
  – If you don’t know what you’re talking about, it’s unlikely your data is going to be of high quality

Page 11

Conceptual Modeling and IQ

• Big data and conceptual data modeling
  – Without sufficiently detailed conceptual data modeling, big data market dynamics essentially represent new opportunities to cause more significant damage faster and for less money
  – Common problems include
    • Homonyms and synonyms
    • Inconsistent data
    • Duplicated data
    • Inadequate access control and usage tracking
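The homonym, synonym, and duplicate problems become tangible once source fields are mapped onto a shared conceptual model. The sketch below assumes a hypothetical glossary keyed by (source system, field name); the source names, field names, and the trim/lowercase normalization of email identifiers are illustrative assumptions, not a recommended matching rule.

```python
from collections import defaultdict

# Hypothetical glossary: (source system, field name) -> conceptual attribute.
GLOSSARY = {
    ("web", "email"): "Customer.email",
    ("crm", "email_addr"): "Customer.email",  # synonym: different name, same meaning
    ("web", "id"): "Customer.web_account_id",
    ("crm", "id"): "Customer.crm_record_id",  # homonym: same name, different meaning
}


def synonyms(glossary):
    """Conceptual attributes that appear under more than one field name."""
    names_by_attr = defaultdict(set)
    for (_source, field_name), attr in glossary.items():
        names_by_attr[attr].add(field_name)
    return {a: n for a, n in names_by_attr.items() if len(n) > 1}


def homonyms(glossary):
    """Field names that stand for more than one conceptual attribute."""
    attrs_by_name = defaultdict(set)
    for (_source, field_name), attr in glossary.items():
        attrs_by_name[field_name].add(attr)
    return {n: a for n, a in attrs_by_name.items() if len(a) > 1}


def duplicates(glossary, records, identifier_attr):
    """Group (source, row) records that share a normalized identifier value."""
    groups = defaultdict(list)
    for source, row in records:
        for field_name, value in row.items():
            if glossary.get((source, field_name)) == identifier_attr:
                # Assumed normalization for email-like identifiers.
                groups[value.strip().lower()].append((source, row))
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# Hypothetical records describing the same person in two systems.
records = [
    ("web", {"id": "w1", "email": "Pat@Example.com"}),
    ("crm", {"id": "c9", "email_addr": "pat@example.com "}),
]
```

Without the glossary (i.e., without a conceptual model), the two id fields would look like the same thing and the two email fields would look unrelated, which is exactly backwards.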

Page 12

Conceptual Modeling and IQ

• This all probably seems pretty obvious, yet…
  – I am familiar with several big data projects that failed because of insufficient conceptual data modeling
  – Some people appear to believe that the advent of NoSQL and big data tools means it’s okay to deemphasize models and simply make copies of data to simplify sharing
    • Which in many respects means reverting to the c. 1969 programs/apps-have-files modus operandi
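A minimal sketch of why copying data “to simplify sharing” erodes quality: two hypothetical applications each take a private copy of the same customer record, one updates its copy, and the copies silently disagree. The store names and keys are invented for illustration; the point is only that copies need reconciliation that a single shared, modeled store avoids.

```python
import copy

# Hypothetical shared data about one customer.
shared = {"customer:42": {"email": "pat@example.com"}}

# "Programs-have-files": each application takes its own private copy.
billing_copy = copy.deepcopy(shared)
marketing_copy = copy.deepcopy(shared)

# An update made by one application lands in that application's copy only.
billing_copy["customer:42"]["email"] = "pat@new.example.com"


def inconsistent_keys(*stores):
    """Return keys whose values are not identical across all stores."""
    keys = set().union(*stores)
    result = []
    for key in sorted(keys):
        values = [store.get(key) for store in stores]
        if any(value != values[0] for value in values[1:]):
            result.append(key)
    return result


# Alternative: both applications read the one shared store.
billing_view = shared
marketing_view = shared
billing_view["customer:42"]["email"] = "pat@new.example.com"
```

Here inconsistent_keys(billing_copy, marketing_copy) reports the customer key, while the two views over the shared store cannot disagree; real systems also need access control and usage tracking on that shared store, as noted on the previous slide.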

Page 13

Agenda: Modeling and IQ

• Conceptual modeling overview
• Conceptual modeling and IQ
• Conceptual modeling reality checks
• Recommendations

Page 14

Reality Checks

• Some conceptual data modeling fallacies
  – Conceptual data modeling is easy
  – Conceptual data modeling is just for data nerds
  – Conceptual data modeling is expensive and time-consuming
  – NoSQL and big data make data modeling unnecessary

Page 15

Reality Checks

• Fallacy: conceptual data modeling is easy
  – Creating conceptual data models is not easy
    • Conceptual data modeling complexity is a function of the complexity inherent in the parts of the real world you seek to model
  – Useful conceptual data modeling techniques can be easy to learn, however
  – Reading conceptual data models is relatively easy
    • As long as the model diagrams are well-formed and adequately documented
  – Tangent: super-detailed logical and physical data models are rarely easy for non-geeks to understand

Page 16

Reality Checks

• Fallacy: conceptual data modeling is just for data nerds
  – In reality, many data nerds race ahead to logical and physical data modeling without first creating sufficiently precise conceptual models
    • And many application developers have reverted to a pre-DBMS programs-have-files approach
    • “With great power comes great responsibility”…
  – Business domain experts working with modeling experts can collaboratively create detailed conceptual data models without using complex and costly tools

Page 17

Reality Checks

• Fallacy: conceptual data modeling is expensive and time-consuming
  – Limitations in, and costs of, some earlier database design and data modeling tools often led to costly and protracted data model analysis and design cycles
  – Open-source advances and other market dynamics have produced several options for inexpensive (or free) modeling tools, and you don’t need to master all of the logical/physical features to work effectively with conceptual data models

Page 18

Reality Checks

• Fallacy: NoSQL and big data technologies and techniques make data modeling unnecessary
  – Common – and incorrect – assertions include
    • Schemas are too rigid
    • Model-based development can’t be “agile”
    • Traditional database models are incompatible with “Web scale” needs
  – In many cases, these are symptoms of developers who simply dislike SQL (and/or XML)
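As a small counterexample to “schemas are too rigid,” the sketch below uses SQLite (via Python’s sqlite3 module) to add an attribute to an existing table in place, with a default for existing rows, while the schema continues to reject data that violates its constraints. The table and column names are hypothetical, and other DBMSs differ in which ALTER TABLE operations they support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO product VALUES ('A1', 'Kettle')")

# Requirements change: add an attribute without rebuilding the table.
# SQLite requires a non-NULL default when adding a NOT NULL column.
conn.execute("ALTER TABLE product ADD COLUMN brand TEXT NOT NULL DEFAULT 'unknown'")
conn.execute("INSERT INTO product VALUES ('B2', 'Toaster', 'Acme')")

# Existing rows pick up the default; new rows supply a value.
rows = conn.execute("SELECT sku, brand FROM product ORDER BY sku").fetchall()

# The schema still guards quality: a missing (NULL) name is rejected.
try:
    conn.execute("INSERT INTO product (sku, name) VALUES ('C3', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The same evolution would ideally start as a change to the conceptual model (a new Product attribute), with the ALTER TABLE as its logical-level consequence.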

Page 19

Reality Checks

• Clearly still market demand for data modelers

Page 20

Agenda: Modeling and IQ

• Conceptual modeling overview
• Conceptual modeling and IQ
• Conceptual modeling reality checks
• Recommendations

Page 21

Recommendations

• Develop conceptual data modeling skills
• Don’t go on a quest for the perfect modeling tool/framework
• Build and share model collections

Page 22

Recommendations

• Develop conceptual data modeling skills
  – Establish a core group of modeling experts
    • Have them read “Mastering Data Modeling”
  – Have other stakeholders learn how to read and critique model diagrams

Page 23

Recommendations

• Don’t go on a quest for the perfect modeling tool/framework
  – Many data modeling tools go deep on logical and physical data modeling features that over-complicate conceptual modeling
  – A whiteboard is often effective
    • Especially for collaborative and interactive modeling sessions
    • Smartphone cameras may be the single most useful modeling tool introduced during the last 15 years

Page 24

Recommendations

• Build and share model collections
  – Application-specific models essentially mean reverting to the programs-have-files modus operandi
  – The industry has seen several waves of failed uber-repository offerings over the last 20 years
    • As with “CASE” tools, many represented attempts to overreach
  – Wikis can be effective for model sharing
    • Even for simply capturing, annotating, and sharing photos of whiteboard diagrams

Page 25

Discussion