A brief history of data processing


Transcript of A brief history of data processing

Page 1: A brief history of data processing

A Brief History of Data Processing

@garyorenstein

Deckset Theme - Next White

(c) Gary Orenstein 1

Page 2: A brief history of data processing

In The Beginning


Page 3: A brief history of data processing

Computers and Data


Page 4: A brief history of data processing

Accounting Transactions


Page 5: A brief history of data processing

Financial Transactions


Page 6: A brief history of data processing

Inventory Management


Page 7: A brief history of data processing

Human Resources


Page 8: A brief history of data processing

Enter the Database


Page 9: A brief history of data processing

Enter the Database

Put stuff in. Take stuff out.

Reliably. Quickly.


Page 10: A brief history of data processing

Now Let Me Ask A Question


Page 11: A brief history of data processing

Just A Moment Please


Page 12: A brief history of data processing

Let's Build A Bigger, Faster Database


Page 13: A brief history of data processing

Maybe this is not as easy as we thought


Page 14: A brief history of data processing

We Need A Data Warehouse


Page 15: A brief history of data processing

Database, Meet Data Warehouse


Page 16: A brief history of data processing

Welcome to the ETL Gap


Page 17: A brief history of data processing

And Never The Two Shall Meet


Page 18: A brief history of data processing

Four Ways Your DBMS is Holding You Back [1]

• ETL (Extract, Transform, Load)

• Analytic Latency

• Synchronization

• Copies of data

[1] Source: Gartner, "Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation," published 28 January 2014


Page 19: A brief history of data processing

Why Did We Separate?

• Performance

• Performance

• Performance

• Governance


Page 20: A brief history of data processing

Primary Performance Impediment

Disk Drives


Page 21: A brief history of data processing

Scale-up for Databases and Data Warehouses


Page 22: A brief history of data processing

Complex and Costly


Page 23: A brief history of data processing

Quest for Scale-out


Page 24: A brief history of data processing

Paths Across Databases and Data Warehouses


Page 25: A brief history of data processing

NoSQL Wave And Hadoop Ecosystem


Page 26: A brief history of data processing

NoSQL Theory

• Scale

• Performance

• Eventual Consistency

• No need for SQL


Page 27: A brief history of data processing

NoSQL Reality

• Scale and performance?

    • Stick to one thing at a time

• Consistency?

    • Just wait

• Analytics?

    • Thank goodness for SQL on NoSQL


Page 28: A brief history of data processing

Hadoop Theory

• Just store it

• Who needs a schema

• Let's learn MapReduce

• Compute on disk, no problem


Page 29: A brief history of data processing

Hadoop Reality

• Data lakes are deep and dark

• Unclear what is going on

• Hard to fill shoes

• MapReduce

• Hadoop ecosystem engineering

• Occasionally feels like the data strategy is upside down


Page 30: A brief history of data processing

What is the one thing never intended for NoSQL and Hadoop?

SQL.


Page 31: A brief history of data processing

Hadoop (HDFS) is a filesystem, not a database


Page 32: A brief history of data processing

NoSQL is, well...

Just part of a complete solution

Page 33: A brief history of data processing

Why did we pursue a split data warehouse, NoSQL, HDFS?

Performance, performance, performance, governance


Page 34: A brief history of data processing

Idea


Page 35: A brief history of data processing

Let's Use Memory


Page 36: A brief history of data processing

Let's Use Memory

And understandably architect for persistence


Page 37: A brief history of data processing

What About Flash


Page 38: A brief history of data processing

The Right Solution Spans Memory, Flash, and Disk


Page 39: A brief history of data processing

New Tech: Distributed Systems


Page 40: A brief history of data processing

Old Tech: Relational Databases

Proudly serving SQL since 1970


Page 41: A brief history of data processing

Do we really have to split databases and data warehouses?


Page 42: A brief history of data processing

Merge

with in-memory solutions


Page 43: A brief history of data processing

Do I need to worry about high costs?


Page 44: A brief history of data processing

Distribute

scale across low-cost machines or cloud instances


Page 45: A brief history of data processing

Do I need to give up SQL?


Page 46: A brief history of data processing

Orchestrate

A Multi-Model Solution

• Full transactional SQL

• Inserts, updates, deletes

• JSON

• Geospatial

• Spark


Page 47: A brief history of data processing

But not all of my data needs to be in-memory

Exactly

• Combine with a disk/flash based columnstore

• Keep real-time data in memory

• Keep historical data on disk

• Query both datastores through a single interface
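The idea of keeping real-time data in memory and historical data on disk, while querying both through one interface, can be sketched with Python's standard `sqlite3` module. This is only a rough illustration under stated assumptions: SQLite is a row-oriented engine, not a columnstore, and the table name `events`, the schema alias `hot`, and the helper functions are hypothetical names chosen for this example rather than anything from the deck.

```python
import os
import sqlite3
import tempfile


def build_tiered_store(disk_path):
    """Open a disk-backed database and attach an in-memory one as 'hot'."""
    conn = sqlite3.connect(disk_path)                  # historical tier on disk
    conn.execute("ATTACH DATABASE ':memory:' AS hot")  # real-time tier in memory
    conn.execute("CREATE TABLE IF NOT EXISTS main.events (ts INTEGER, amount REAL)")
    conn.execute("CREATE TABLE hot.events (ts INTEGER, amount REAL)")
    return conn


def total_amount(conn):
    """Run one SQL query that spans both tiers; returns (row_count, sum)."""
    return conn.execute(
        "SELECT COUNT(*), SUM(amount) FROM ("
        " SELECT amount FROM main.events"
        " UNION ALL"
        " SELECT amount FROM hot.events)"
    ).fetchone()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        conn = build_tiered_store(os.path.join(tmp, "history.db"))
        # Older rows live on disk, the newest row lives in memory.
        conn.executemany(
            "INSERT INTO main.events VALUES (?, ?)", [(1, 10.0), (2, 20.0)]
        )
        conn.execute("INSERT INTO hot.events VALUES (?, ?)", (3, 5.0))
        conn.commit()
        result = total_amount(conn)
        conn.close()
        return result


if __name__ == "__main__":
    print(demo())
```

The caller issues a single SQL statement and never needs to know which tier holds a given row, which is the point the slide makes.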


Page 48: A brief history of data processing

What happens if a node goes down?

Replicate for availability


Page 49: A brief history of data processing

What happens if I need to recover?

Persist logs to disk, take snapshots, make backups
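One way an in-memory store might persist logs to disk and take snapshots is sketched below. It is a minimal illustration, not any particular product's recovery design: the class `DurableKV`, the file names `wal.log` and `snapshot.json`, and the JSON-lines log format are all assumptions made for this example.

```python
import json
import os
import tempfile


class DurableKV:
    """In-memory key-value store with an append-only log and snapshots."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "wal.log")
        self.snap_path = os.path.join(directory, "snapshot.json")
        self.data = {}
        self._recover()
        self._log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self):
        # Load the latest snapshot, then replay any writes logged after it.
        if os.path.exists(self.snap_path):
            with open(self.snap_path, encoding="utf-8") as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        # Log first and force it to disk, then apply the write in memory.
        self._log.write(json.dumps([key, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.data[key] = value

    def snapshot(self):
        # Write the full state to a temp file, swap it in atomically,
        # then truncate the log because the snapshot now covers it.
        tmp = self.snap_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snap_path)
        self._log.close()
        self._log = open(self.log_path, "w", encoding="utf-8")

    def close(self):
        self._log.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store = DurableKV(tmp)
        store.put("a", 1)
        store.snapshot()
        store.put("b", 2)      # logged after the snapshot
        store.close()          # stand-in for a process restart
        recovered = DurableKV(tmp)
        result = dict(recovered.data)
        recovered.close()
        return result


if __name__ == "__main__":
    print(demo())
```

Because replaying a `put` simply overwrites the key, replaying log entries that the snapshot already covers is harmless. Backups, the third item on the slide, would amount to copying the snapshot and log files elsewhere and are left out of the sketch.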


Page 50: A brief history of data processing

Explore the possibilities

• In-memory, distributed database

• Relational and multi-model

• Software for your data center or the cloud

• Real-time data pipelines and analytics

• New world of modern applications


Page 51: A brief history of data processing

Find Your Inner SQL

for more: @garyorenstein
