Big Data and Bad Analogies

69
Big Data, Bad Analogies GOTO Copenhagen September, 2014 Mark Madsen www.ThirdNature.net @markmadsen

description

Data lakes, data exhaust, web scale, data is the new oil. Vendors are throwing new terms and analogies at us to convince us to buy their products as the market around data technologies grows. We change data persistence and transaction layers because "databases don't scale" or because data is "unstructured". If data had no structure then it wouldn't be data, it would be noise. Schema on read, schema on write, schemaless databases; they imply structure underlying the data. All data has schema, but that word may not mean what you think it means. This presentation will describe concepts of data storage and retrieval from technology prehistory (i.e. before the 1980s) and examine the design principles behind both old and new technology for managing data because sometimes post-relational is actually pre-relational. It is important to separate what is identical to things that were tried in the past from new twists on old topics that deliver new capabilities. Directly related to these topics are performance, scalability and the realities of what organizations do with data over time. All of these topics should guide architecture decisions to avoid the trap of creating technical debts that must be paid later, after systems are in place and change is difficult.

Transcript of Big Data and Bad Analogies

Page 1: Big Data and Bad Analogies

Big Data, Bad Analogies

GOTO Copenhagen September, 2014 Mark Madsen www.ThirdNature.net @markmadsen

Page 2: Big Data and Bad Analogies

Copyright Third Nature, Inc.

The problem with bad framing

s

Leads to bad assumptions about use, inappropriate features,

poor understanding of substitutability and the impacts it will have.

Page 3: Big Data and Bad Analogies

Copyright Third Nature, Inc.

The data lake

Page 4: Big Data and Bad Analogies

Copyright Third Nature, Inc.

The data lake after a little while

Page 5: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Data Exhaust

Page 6: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Data is the new oil

Page 7: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Reality: data is a choice

Page 8: Big Data and Bad Analogies

Copyright Third Nature, Inc.

“There is nothing new under the sun but there are lots of old things we don't know.”

Ambrose Bierce

Page 9: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Looking at past ways of organizing data

Page 10: Big Data and Bad Analogies

Copyright Third Nature, Inc.

The Elizabethan Era

Commercial printing presses

Data management tech:

▪ Perfect copies

▪ Topical catalogs

▪ Font standardization

▪ Taxonomy ascends

Information explosion:

▪ 8M books in 1500

▪ 200M by 1600

▪ Commoditization

▪ Overload

Page 11: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Elizabethan Era Storage and Retrieval

Page 12: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Elizabethan Era Storage and Retrieval

Page 13: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Elizabethan Era Storage and Retrieval

Page 14: Big Data and Bad Analogies

Copyright Third Nature, Inc.

The Georgian Era: The Explosion of Natural Philosophy

Page 15: Big Data and Bad Analogies

Copyright Third Nature, Inc.

The Victorian Era

The powered printing information explosion:

▪ Card catalogs, cross-referencing, random access metadata

▪ Universal classification

▪ Extended information management debates

▪ Trading effort and flexibility for storage and retrieval

▪ Stereotyping

Page 16: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Melvil Dewey

Dewey Decimal System

Top down orientation

Static structure

Descriptive rather than explanatory

Taxonomic classification

Page 17: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Cutter Expansive Classification System (~1882)

Bottom up orientation

More flexible structure

Explanatory, descriptive

Faceted classification

Charles Ammi Cutter

Page 18: Big Data and Bad Analogies

Copyright Third Nature, Inc.

So why did Dewey beat Cutter?

Pragmatism

Good enough wins the day

It wasn’t solving the problem you thought it was.

X In every choice, something is lost when something is gained.

Page 19: Big Data and Bad Analogies

Copyright Third Nature, Inc.

What lessons does this history teach us?

1. Information requires organizing principles at multiple levels from items to collections.

2. Differences in scale require different principles.

3. At a key point in the adoption cycle, emphasis shifts from management of information to its dissemination and consumption.

First we record, then we use and share.

Like transaction processing and analysis.

Page 20: Big Data and Bad Analogies

Copyright Third Nature, Inc.

What has this to do with data and persistence?

“schema” is a broad term, a way of organizing and making something relatable and findable.

“Data” (or object) is to “Database”

as

“Books” are to “Library”

Page 21: Big Data and Bad Analogies

Copyright Third Nature, Inc.

The printed became more important than the printer.

The book outlived generations of presses.

Just like data is now.

Which means we should pay attention to the broader organization and use of data, and persistence layers.

Page 22: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Order Entry

Order Database

Customer

Service

Interface Program

Inventory Database

Distribution

Interface Program

Receivables Database

Accounts

Receivable

Data Warehouse

Analysts &

users

Someone else always wants to use your data

Page 23: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Context (one company)

"In an infinite universe, the one thing sentient life cannot afford to have is

a sense of proportion.” – Douglas Adams

Page 24: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Monthly

Production plans

Weekly pre-

orders for

bulk cheese

Availability

confirmation

and location

In store system

Store

Stock

Management

Store EPOS

dataCategory

Supervis

or

Stock

adjustments/

order

interventions

Order

adjustment

Stock/order

interventions

*

*

Orders

(based on 6

day

forecast)

Dallas

Distrib Centre

WMS

Picking/load

teams

Pos/Pick

lists/Load

sheetsConfirmed

Deliveries/

Confirmed

picks +

loads

FarmersMilk intake/

silos Cheese plant

Plant

Processor

In-house Cheese

store

Contract Cheese

store

Processor

Packing plant

Processor

National

Distribution

Centre

Retailer

RDC

Retailer Stores

(550)

Retailer HQ

Consolidated

Demand

Ordering

Processor NDC

Customer

Services

Daily order -

SKU/Depot/

Vol

Sent @ 12.30-13:00

Delivery

orders

Processor HQ

Sales

Team/

Account

Manager

Processor HQ

Forecasting

Team

Processor HQ

Bulk Planning

Team

Cheese plant

Planner/Stock

office

Processor HQ

Milk

Purchasing

Team

Cheese plant

Transport

Manager

Actual

daily

delivery

figures

Daily

collection

planning

Weekly order for delivery to

Packing plant

Daily &

weekly Call-

off

Daily Call-off

15/day

22 pallet loads 15/day

A80

Shortages/

Allocation

instructions

Annual

Buying plan

Milk Availability

Forecast

Annual

prediction

of milk

production

Shortages/

Allocation

instructions

Daily milk

intake

Weekly milk

shortages

shortages

Spot mkt or

Processor

ingredients

Packing plant

Planning

Team

Processor HQ

JBA Invoicing

and

Sales Monitor

FGI and Last 5

weeks sales

Expedite

Changes

to existing

forecast -

exceptions

Retailer HQ

Retailer Buyer

Meeting

every 6

weeks

Packing plant

Cheese

ordering

10 day stock

plan

On line

stock info

7 day order

plan for bulk

cheese

Arrange

daily

delivery

schedule

Emergency

call-off

Daily

optimisation

of loads

Service

Monitor

Despatch and

delivery

confirmations

Processor NDC

Transport

Planning

Transport

Plan

Processor NDC

Inventory

MonitoringStock and

delivery

monitoring

Processor NDC

Warehouse

management

syatem

Operation

Instructions

Key

Shaded Boxes = Product flow system

Un-shaded boxes = Information flow system

Retailer Cheese ProcessorFarms

Schedule

weekly &

Daily

10 Day

plan(wed) and

daily plan

15/day

Changes

to existing

forecast -

exceptions

Stock

availabilityMonthly

review

Annual

f/cast

Source: IGD Food Chain Centre, February 2008

Context (multiple company supply chain) A value chain diagram, showing the data supply chain for cheese.

The side effects of a single bug can be massive.

Page 25: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Aside: technical debt: what those diagrams show

tek-ni-kuh l det: the cost that accrues due to decisions made in software design and coding.

Look at the choices and mistakes in development:

Intentional Unintentional

Purposeful

choices to

optimize

schedule,

budget,

satisfaction

Missed

requirements,

poor code

quality, poor

design

Page 26: Big Data and Bad Analogies

Copyright Third Nature, Inc.

It is a poor carpenter who blames his tools*

*but sometimes it is the tools

Page 27: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Technical Debt

The cost of some choices can be dealt with in the short term (e.g. the next sprint) and some only in the long term (redesign, start over)

Short term

Long term

Mostly about the

application code

Mostly about architecture,

design, and infrastructure

Page 28: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Code flaws

(i.e. bugs)

Design flaws

If you enter into decisions knowing the true nature of your coding alternatives, you will be better off

Green: these are deliberate, the tradeoffs known

Yellow : these are minor defects

Red: these are the things that kill a system

Short term

Long term

Intentional Unintentional

Code choices

Design

choices

Page 29: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Technical Debt can’t be avoided

Development

methods

Experience,

education

(but it can be managed)

Sometimes you think it’s intentional: incremental design

Long term debts can only be dealt with through planning

Short term

Long term

Intentional Unintentional

Agile

methods

Redesign

Page 30: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Technical Debt can’t be avoided

Let’s move this

to the left

What you believe about the technology underlying your system has a big influence on design choices, so the focus of this part is on architecture and design with the hope it will help reduce or avoid long term debt.

Short term

Long term

Intentional Unintentional

Page 31: Big Data and Bad Analogies

Copyright Third Nature, Inc.

How did we get here?

There’s a difference between having no past and actively rejecting it.

Page 32: Big Data and Bad Analogies

Copyright Third Nature, Inc.

MultiValue, Hierarchical

PICK

IMS

IDS

ADABAS

Relational

CODASYL

System R (SEQUEL)

SQL/DS

INGRES (QUEL)

Mimer

Oracle

RDBMS, SQL standard

DB2

Teradata

Informix

Sybase

Postgres

OODBMS, ORDBMS

Versant

Objectivity

Gemstone

Informix*

Oracle*

MPP Query, NoSQL

Netezza

Paraccel

Vertica

MongoDB

CouchBase

Riak

Cassandra

NewSQL

SciDB

MonetDB

NuoDB

CitusDB

1960s 1980s 2000s

1970s 1990s 2010s

Spanner

F1

Page 33: Big Data and Bad Analogies

Copyright Third Nature, Inc.

In the beginning: RMSs and pre-relational DBs

At first, common code libraries so there was reusability for file ops.

Problems:

▪ Portability across languages, OSs

▪ Queries of more than one file

▪ No metadata, what’s in there? Who wrote it?

▪ Concurrency

The databases brought things like recoverability, durability, ACID transactions. But they were rigid, prone to breakage.

Operations:

First

Next

Prev

Last

Page 34: Big Data and Bad Analogies

Copyright Third Nature, Inc.

What is

best in life?

Page 35: Big Data and Bad Analogies

Copyright Third Nature, Inc.

To crush the vendors,

see them driven

before you, and hear

the lamentation of

their salespeople.

Page 36: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Wrong!

Loose coupling.

Scalability.

Reusability.

Page 37: Big Data and Bad Analogies

Copyright Third Nature, Inc.

The miracle of pre-relational DB: schema

Loose coupling – the physical model of data structures and physical placement are no longer a program’s responsibility

Reusability – More than one program can access the same data, and no more custom coding for each application or OS

Scalability – Constraints of schema and typing reduce resource usage, have finer granularity for concurrent access, multiple online users.

Page 38: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Key schema flexibility tradeoff for data management

Global validation vs

contextual validation

=

Strict rules vs

lenient rules =

Write rules vs

read rules

Page 39: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Schema on write vs schema on read

Match the shape to the hole

or

Match the hole to the shape

Predicate schemas for write flexibility (agility) and speed

Page 40: Big Data and Bad Analogies

Copyright Third Nature, Inc.

“Flexibility” – a recent experience in query

The problem with many of these pr-relational databases is tight coupling between a program and data structures.

The physical model leaks into the logical with potentially career-ending effects if the DB is used for the wrong thing.

And it’s happening again.

It’s a poor carpenter who blames his tools. Or the users.

Page 41: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Trading for consistency for performance: CAP theorem

http://blog.nahurst.com/visual-guide-to-nosql-systems

Page 42: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Services provided

Standard API/query layer*

Transaction / consistency

Query optimization

Data navigation, joins

Data access

Storage management Database

Database

Tradeoffs: In NoSQL the DBMS is in your code

SQL database NoSQL database

Application Application

Anything not done by the DB becomes a developer’s task.

Welcome to 1985!

Page 43: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Simplifying ACID vs BASE

Remember: it’s a poor carpenter who blames his tools.

Page 44: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Google on eventual consistency:

“F1: A Distributed SQL Database That Scales”, Proceedings of the VLDB Endowment, Vol. 6, No. 11, 2013

Page 45: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Party like it’s 1985

Fastest TPC-B benchmark in 1985 was IMS running on an IBM 370, 100 TPS, 400 iops, 30 iops/disk

The best relational vendors could muster was 10 TPS

25 years later, SQLServer on an Intel box ran the TPC-B at 25,000 TPS, 100,000 iops, 300 iops/disk

It looks like you’re

running a benchmark...

Page 46: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Joins. Wait, there’s more than one?

Wait, there’s more

than one?

1986: SQL, joins!

Page 47: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Hipster bullshit

I can’t get MySQL to scale

therefore

Relational databases don’t scale

therefore

We must use NoSQL* for everything *including Hadoop and related

Page 48: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Enumerate logically equivalent plans by applying

equivalence rules

For each logically equivalent plan, enumerate all

alternative physical query plans

Estimate the cost of each of the alternative physical

query plans

Run the plan with lowest estimated overall cost

2

1

3

4

Logical vs physical is an important thing. The CBO turns a SQL query into an optimal* execution plan for a parallel pipelined dataflow engine.

Diagram: David J. DeWitt

The relational gift: declarative language + CBO

Page 49: Big Data and Bad Analogies

Copyright Third Nature, Inc.

A simple 3 table join: a programmer’s job?

SELECT C.name, O.num

FROM Orders O, Lines L, Customers C

WHERE C.City = “Copenhagen” AND L.status = “X”

AND O.num = L.num AND C.cid = O.cid

Number of logical plans: 9

Ways to join (hash, merge, nested): 3

For each plan, there are multiple physical plans: 36

That makes a total of 324 physical plans, the efficiency of which changes based on cardinality.

Page 50: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Scalability? ZOMG just add nodes!

"The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry.“ – Henry Peteroski

After all, your database is web scale, isn’t it?

Page 51: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Just add hardware?

No amount of hardware will make incorrectly coded software run in parallel.

Declarative languages make this easier by turning the problem over to the computer to resolve.

Guess which runs in parallel:

Open cursor C

Loop (Fetch row C)

Open cursor O

Loop (Fetch row O)

Open cursor L

Output (Do-things)

End loop

End loop ...

SELECT C.name, O.num

FROM Orders O, Lines L,

Customers C

WHERE C.City = “Aarhus” AND

C.cid = O.cid AND O.num =

L.num AND L.status = “X”

Page 52: Big Data and Bad Analogies

Copyright Third Nature, Inc.

A more realistic example: TPC-H query #8 Select o_year,

sum(case

when nation = 'BRAZIL' then volume

else 0

end) / sum(volume)

from

(

select YEAR(O_ORDERDATE) as o_year,

L_EXTENDEDPRICE * (1 - L_DISCOUNT) as volume,

n2.N_NAME as nation

from PART, SUPPLIER, LINEITEM, ORDERS, CUSTOMER, NATION n1,

NATION n2, REGION

where

P_PARTKEY = L_PARTKEY and S_SUPPKEY = L_SUPPKEY

and L_ORDERKEY = O_ORDERKEY and O_CUSTKEY = C_CUSTKEY

and C_NATIONKEY = n1.N_NATIONKEY and n1.N_REGIONKEY = R_REGIONKEY

and R_NAME = 'AMERICA‘ and S_NATIONKEY = n2.N_NATIONKEY

and O_ORDERDATE between '1995-01-01' and '1996-12-31'

and P_TYPE = 'ECONOMY ANODIZED STEEL'

and S_ACCTBAL <= constant-1

and L_EXTENDEDPRICE <= constant-2

) as all_nations

group by o_year order by o_year

22 million possible execution plans, please find the

best one in 4 milliseconds

Page 53: Big Data and Bad Analogies

Copyright Third Nature, Inc.

0%

100%

200%

300%

400%

500%

600%

700%

800%

900%

1000%

1 2 3 4 5 6 7 8 9 10

10% overhead Linear

Just add hardware?

Number of processors

Speedup

(amdahl’s law)

Page 54: Big Data and Bad Analogies

Data Warehouse +

Behavioral

Singularity

Data Warehouse

Semi-Structured

SQL++

Structured

SQL

Low End Enterprise-class System

Contextual-Complex Analytics

Deep, Seasonal, Consumable Data Sets

Production Data Warehousing

Large Concurrent User-base

+ + 150+

concurrent users

500+ concurrent users

Enterprise-class System

5-10 concurrent users

Unstructured

Java/C Structure the Unstructured

Detect Patterns

Commodity Hardware System

6+PB 40+PB 20+PB

Hadoop

Analyze & Report Discover & Explore

Parallel Efficiency and Platform Costs

Page 55: Big Data and Bad Analogies

Platform Metrics for Table Scan and Sum, Hadoop vs Teradata

Page 56: Big Data and Bad Analogies

Copyright Third Nature, Inc.

There are really three workload classes to consider

1. Operational: OLTP systems

2. Analytic: Query systems

3. Scientific: Computational systems

Unit of focus:

1. Transaction

2. Query

3. Computation

Different problems require different platforms

Page 57: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Workloads

OLTP BI Analytics

Access Read-Write Read-only Read-mostly

Predictability Fixed path Unpredictable All data

Selectivity High Low Low

Retrieval Low Low High

Latency Milliseconds <seconds msecs to days

Concurrency Huge Moderate 1 to huge

Model 3NF, nested object Dim, denorm BWT

Task sizw Small Large Small to huge

Page 58: Big Data and Bad Analogies

Copyright Third Nature, Inc.

A key point worth remembering:

Performance over size <> performance over complexity

OLTP performance is mostly related to transaction coordination and writing under high concurrency.

BI performance is mostly related to data volume and query complexity.

Analytics performance is about the intersection of these with computational complexity.

Page 59: Big Data and Bad Analogies

Copyright Third Nature, Inc.

A history of databases in No notation

1970s: NoSQL = We have no SQL

1980s: NoSQL = Know SQL

2000s: NoSQL = No SQL!

2005s: NoSQL = Not only SQL

2010s: NoSQL = No, SQL!

(R)DB(MS)

Page 60: Big Data and Bad Analogies

Copyright Third Nature, Inc.

TANSTAAFL

Technologies are not perfect replacements for one another. Often not better, only different.

When replacing the old with the new (or ignoring the new over the old) you always make tradeoffs, and usually you won’t see them for a long time.

Page 61: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Unintended consequences

Unintended consequences

Page 62: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Away from “one throat to choke”, back to best of breed

Tight coupling leads to slow changes. The market is not in the tight coupling phase

In a rapidly evolving market, componentized architectures, modularity and loose coupling are favorable over monolithic stacks, single-vendor architectures and tight coupling.

Page 63: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Don’t follow the market

Some people can’t resist getting the next new thing because it’s new and new is always better.

Many IT organizations are like this, promoting a solution and hunting for the problem that matches it.

Better to ask “What is the problem for which this technology is the answer?”

Copyright Third Nature, Inc.

Page 64: Big Data and Bad Analogies

Copyright Third Nature, Inc.

How we develop best practices: survival bias

We don’t need best practices, we need worst failures. Copyright Third Nature, Inc.

Page 65: Big Data and Bad Analogies

Copyright Third Nature, Inc.

Summarizing some key points

1. Data lives longer than code. It pays to focus on data when choosing and using persistence layers.

2. Different persistence layer approaches have different tradeoffs (like performance vs app complexity in BASE models).

3. Don’t throw away 30 years of engineering unless know why it was put there.

4. Decoupling, reusability (of data) and scalability are nice things to have.

Page 66: Big Data and Bad Analogies

Copyright Third Nature, Inc.

References (things worth reading on the way home) A relational model for large shared data banks, Communications of the ACM, June, 1970, http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

Column-Oriented Database Systems, Stavros Harizopoulos, Daniel Abadi, Peter Boncz, VLDB 2009 Tutorial http://cs-www.cs.yale.edu/homes/dna/talks/Column_Store_Tutorial_VLDB09.pdf

Nobody ever got fired for using Hadoop on a cluster, 1st International Workshop on Hot Topics in Cloud Data ProcessingApril 10, 2012, Bern, Switzerland.

A co-Relational Model of Data for Large Shared Data Banks, ACM Queue, 2012, http://queue.acm.org/detail.cfm?id=1961297

A query language for multidimensional arrays: design, implementation and optimization techniques, SIGMOD, 1996

Probabilistically Bounded Staleness for Practical Partial Quorums, Proceedings of the VLDB Endowment, Vol. 5, No. 8, http://vldb.org/pvldb/vol5/p776_peterbailis_vldb2012.pdf

“Amorphous Data-parallelism in Irregular Algorithms”, Keshav Pingali et al

MapReduce: Simplified Data Processing on Large Clusters, http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/mapreduce-osdi04.pdf

Dremel: Interactive Analysis of Web-Scale Datasets, Proceedings of the VLDB Endowment, Vol. 3, No. 1, 2010 http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36632.pdf

Spanner: Google’s Globally-Distributed Database, SIGMOD, May, 2012, http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/es//archive/spanner-osdi2012.pdf

F1: A Distributed SQL Database That Scales, Proceedings of the VLDB Endowment, Vol. 6, No. 11, 2013, http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/41344.pdf

Page 67: Big Data and Bad Analogies

Copyright Third Nature, Inc.

About the Presenter

Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business use of data and analytics. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributor to Forbes Online and on the O’Reilly Strata program committee. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net

Page 68: Big Data and Bad Analogies

Copyright Third Nature, Inc.

About Third Nature

Third Nature is a research and consulting firm focused on new and

emerging technology and practices in business intelligence, analytics and

performance management. If your question is related to BI, analytics,

information strategy and data then you‘re at the right place.

Our goal is to help companies take advantage of information-driven

management practices and applications. We offer education, consulting

and research services to support business and IT organizations as well as

technology vendors.

We fill the gap between what the industry analyst firms cover and what IT

needs. We specialize in product and technology analysis, so we look at

emerging technologies and markets, evaluating technology and hw it is

applied rather than vendor market positions.

Page 69: Big Data and Bad Analogies

Copyright Third Nature, Inc.

CC Image Attributions

Thanks to the people who supplied the creative commons licensed images used in this presentation:

bookshelf by spectrum.jpg - http://flickr.com/photos/santos/1704875109/

Vatican library - http://www.flickr.com/photos/paullew/1550844955

round hole square peg - https://www.flickr.com/photos/epublicist/3546059144

text composition - http://flickr.com/photos/candiedwomanire/60224567/

twitter_network_bw.jpg - http://www.flickr.com/photos/dr/2048034334/

round hole square peg - https://www.flickr.com/photos/epublicist/3546059144

glass_buildings.jpg - http://www.flickr.com/photos/erikvanhannen/547701721 CAP diagram - http://blog.nahurst.com/visual-guide-to-nosql-systems

refinery-hdr.jpg - http://www.flickr.com/photos/vermininc/2477872191/

refinery-night.jpg - http://www.flickr.com/photos/vermininc/2485448766/