A Brief History of Data Processing
@garyorenstein
(c) Gary Orenstein 1
In The Beginning
(c) Gary Orenstein 2
Computers and Data
(c) Gary Orenstein 3
Accounting Transactions
(c) Gary Orenstein 4
Financial Transactions
(c) Gary Orenstein 5
Inventory Management
(c) Gary Orenstein 6
Human Resources
(c) Gary Orenstein 7
Enter the Database
(c) Gary Orenstein 8
Enter the Database
Put stuff in. Take stuff out.
Reliably. Quickly.
(c) Gary Orenstein 9
Now Let Me Ask A Question
(c) Gary Orenstein 10
Just A Moment Please
(c) Gary Orenstein 11
Let's Build A Bigger, Faster Database
(c) Gary Orenstein 12
Maybe this is not as easy as we thought
(c) Gary Orenstein 13
We Need A Data Warehouse
(c) Gary Orenstein 14
Database, Meet Data Warehouse
(c) Gary Orenstein 15
Welcome to the ETL Gap
(c) Gary Orenstein 16
And Never The Two Shall Meet
(c) Gary Orenstein 17
Four Ways Your DBMS is Holding You Back [1]
• ETL (Extract, Transform, Load)
• Analytic Latency
• Synchronization
• Copies of data
[1] Source: Gartner, "Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation," published 28 January 2014
(c) Gary Orenstein 18
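To make the ETL gap concrete, here is a minimal sketch using two in-memory SQLite databases. The table names (`orders`, `sales_by_region`), the sample rows, and the `run_etl` helper are all assumptions for illustration, not anything from the deck. It shows a second copy of the data that stays stale until the next ETL run.

```python
import sqlite3

# Hypothetical operational (transactional) database.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, cents INTEGER)")
oltp.executemany(
    "INSERT INTO orders (region, cents) VALUES (?, ?)",
    [("east", 1000), ("west", 2500), ("east", 500)],
)

# Hypothetical data warehouse holding a second, aggregated copy of the data.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total_cents INTEGER)"
)


def run_etl(source, target):
    """Extract orders, transform them into per-region totals, load the totals."""
    rows = source.execute("SELECT region, cents FROM orders").fetchall()  # extract
    totals = {}
    for region, cents in rows:  # transform
        totals[region] = totals.get(region, 0) + cents
    with target:  # load: replace the previous copy in one transaction
        target.execute("DELETE FROM sales_by_region")
        target.executemany(
            "INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items())
        )


def warehouse_totals(target):
    """Read the warehouse copy as a {region: total_cents} dictionary."""
    return dict(target.execute("SELECT region, total_cents FROM sales_by_region"))


run_etl(oltp, warehouse)
before = warehouse_totals(warehouse)

# A new transaction lands in the operational database...
oltp.execute("INSERT INTO orders (region, cents) VALUES ('west', 700)")
stale = warehouse_totals(warehouse)  # ...but the warehouse copy has not changed

run_etl(oltp, warehouse)
fresh = warehouse_totals(warehouse)  # the new order shows up only after the next run
```

The time between the insert and the second `run_etl` call is the analytic latency; the two tables are the copies that have to be kept in sync.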
Why Did We Separate?
• Performance
• Performance
• Performance
• Governance
(c) Gary Orenstein 19
Primary Performance Impediment
Disk Drives
(c) Gary Orenstein 20
Scale-up for Databases and Data Warehouses
(c) Gary Orenstein 21
Complex and Costly
(c) Gary Orenstein 22
Quest for Scale-out
(c) Gary Orenstein 23
Paths Across Databases and Data Warehouses
(c) Gary Orenstein 24
NoSQL Wave And Hadoop Ecosystem
(c) Gary Orenstein 25
NoSQL Theory
• Scale
• Performance
• Eventual Consistency
• No need for SQL
(c) Gary Orenstein 26
NoSQL Reality
• Scale and performance?
  • Stick to one thing at a time
• Consistency?
  • Just wait
• Analytics?
  • Thank goodness for SQL on NoSQL
(c) Gary Orenstein 27
Hadoop Theory
• Just store it
• Who needs a schema
• Let's learn MapReduce
• Compute on disk, no problem
(c) Gary Orenstein 28
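For readers who have not met MapReduce, a minimal single-process sketch of the programming model follows. The function names and sample records are assumptions for illustration; a real Hadoop job distributes these phases across machines and writes intermediate results to disk, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = map_reduce(["just store it", "store it on disk"])
```

Even a word count needs three cooperating pieces, which hints at why simple analytic questions feel heavier here than in SQL.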
Hadoop Reality
• Data lakes are deep and dark
  • Unclear what is going on
• Hard to fill shoes
  • MapReduce
  • Hadoop ecosystem engineering
• Occasionally feels like the data strategy is upside down
(c) Gary Orenstein 29
What is the one thing never intended for NoSQL and Hadoop?
SQL.
(c) Gary Orenstein 30
Hadoop (HDFS) is a filesystem, not a database
(c) Gary Orenstein 31
NoSQL is, well...
Just part of a complete solution
(c) Gary Orenstein 32
Why did we pursue a split into data warehouse, NoSQL, and HDFS?
Performance, performance, performance, governance
(c) Gary Orenstein 33
Idea
(c) Gary Orenstein 34
Let's Use Memory
(c) Gary Orenstein 35
Let's Use Memory
And understandably architect for persistence
(c) Gary Orenstein 36
What About Flash
(c) Gary Orenstein 37
The Right Solution Spans Memory, Flash, and Disk
(c) Gary Orenstein 38
New Tech: Distributed Systems
(c) Gary Orenstein 39
Old Tech: Relational Databases
Proudly serving SQL since 1970
(c) Gary Orenstein 40
Do we really have to split databases and data warehouses?
(c) Gary Orenstein 41
Merge
with in-memory solutions
(c) Gary Orenstein 42
Do I need to worry about high costs?
(c) Gary Orenstein 43
Distribute
scale across low-cost machines or cloud instances
(c) Gary Orenstein 44
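One common way to spread data across many small machines is hash partitioning. The sketch below is an assumption about how that could look, with plain dictionaries standing in for nodes and a hypothetical `Cluster` class; it is not a description of any particular product.

```python
import zlib


class Cluster:
    """Toy scale-out store: each 'node' is a dict holding one partition."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # crc32 gives the same result on every run, unlike built-in hash() for str.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        """Route the write to the single node that owns this key."""
        self._node_for(key)[key] = value

    def get(self, key):
        """Route the read to the same node; returns None if the key is absent."""
        return self._node_for(key).get(key)

    def total(self):
        """Scatter-gather: every node sums its own partition, then combine."""
        return sum(sum(node.values()) for node in self.nodes)


cluster = Cluster(node_count=4)
for i in range(100):
    cluster.put(f"account-{i}", i)

row_count = sum(len(node) for node in cluster.nodes)
```

Point lookups touch one node, while aggregates such as `total()` fan out to all of them. Simple modulo placement also means that changing `node_count` moves most keys, which real systems work to avoid.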
Do I need to give up SQL?
(c) Gary Orenstein 45
Orchestrate
A Multi-Model Solution
• Full transactional SQL
• Inserts, updates, deletes
• JSON
• Geospatial
• Spark
(c) Gary Orenstein 46
But not all of my data needs to be in-memory
Exactly
• Combine with a disk/flash based columnstore
• Keep real-time data in memory
• Keep historical data on disk
• Query both datastores through a single interface
(c) Gary Orenstein 47
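A rough sketch of "query both datastores through a single interface", using SQLite only as a stand-in. The assumptions: an in-memory database plays the real-time tier, a file-backed database attached as `history` plays the disk tier (it is a row store, not a columnstore), and the `events` tables and sample rows are made up for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical layout: "main" lives in memory, "history" lives in a file on disk.
history_path = os.path.join(tempfile.mkdtemp(), "history.db")
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ? AS history", (history_path,))

conn.execute("CREATE TABLE main.events (label TEXT, clicks INTEGER)")
conn.execute("CREATE TABLE history.events (label TEXT, clicks INTEGER)")

# Historical rows go to the disk tier, the newest row stays in memory.
conn.executemany(
    "INSERT INTO history.events VALUES (?, ?)", [("old-1", 40), ("old-2", 60)]
)
conn.execute("INSERT INTO main.events VALUES (?, ?)", ("live-1", 5))
conn.commit()


def total_clicks(connection):
    """One SQL statement that reads the in-memory and on-disk tiers together."""
    row = connection.execute(
        "SELECT SUM(clicks) FROM ("
        " SELECT clicks FROM main.events"
        " UNION ALL"
        " SELECT clicks FROM history.events)"
    ).fetchone()
    return row[0]


total = total_clicks(conn)
```

The caller issues one query on one connection and never needs to know which tier a row came from.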
What happens if a node goes down?
Replicate for availability
(c) Gary Orenstein 48
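A minimal sketch of replication for availability, assuming synchronous writes to every live copy. The `Node` and `ReplicatedStore` classes are hypothetical, and the sketch skips what real systems must also handle, such as catching up a node that returns after an outage.

```python
class Node:
    """One copy of the data, with a flag to simulate an outage."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedStore:
    """Writes go to every live node; reads are served by the first live node."""

    def __init__(self, names):
        self.nodes = [Node(name) for name in names]

    def _live_nodes(self):
        live = [node for node in self.nodes if node.alive]
        if not live:
            raise RuntimeError("no live nodes")
        return live

    def put(self, key, value):
        for node in self._live_nodes():
            node.data[key] = value

    def get(self, key):
        return self._live_nodes()[0].data.get(key)


store = ReplicatedStore(["node-a", "node-b"])
store.put("balance", 100)

store.nodes[0].alive = False          # node-a goes down
survivor_value = store.get("balance")  # node-b still answers
```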
What happens if I need to recover?
Persist logs to disk, take snapshots, make backups
(c) Gary Orenstein 49
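The log-plus-snapshot idea can be sketched in a few lines. Everything here is an assumption for illustration: the `DurableStore` class, the file names, and the JSON encoding. Backups (copying the files elsewhere) are left out.

```python
import json
import os
import tempfile


class DurableStore:
    """In-memory key-value store that survives restarts via a log and snapshots."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "changes.log")
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.data = {}
        self._recover()

    def _recover(self):
        # Load the latest snapshot, then replay every change logged after it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        # Persist the change to disk before applying it in memory.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def snapshot(self):
        # Write the full state to a temp file, swap it in, then clear the log.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)
        open(self.log_path, "w").close()


directory = tempfile.mkdtemp()
store = DurableStore(directory)
store.put("a", 1)
store.snapshot()
store.put("b", 2)

recovered = DurableStore(directory)  # simulates a restart after a crash
```

Snapshots keep recovery time bounded, because only the changes logged since the last snapshot need to be replayed.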
Explore the possibilities
• In-memory, distributed database
• Relational and multi-model
• Software for your data center or the cloud
• Real-time data pipelines and analytics
• New world of modern applications
(c) Gary Orenstein 50
Find Your Inner SQL
for more
@garyorenstein
(c) Gary Orenstein 51