A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for High Performance Data Access
Ben Stopford : RBS
How fast is a HashMap lookup?
~20 ns
That’s how long it takes light to cross a room
How fast is a database lookup?
~20 ms
That’s how long it takes light to travel about 6,000 km
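The light-travel comparisons can be checked with a little arithmetic. This is a minimal sketch, assuming the vacuum speed of light and the two lookup times quoted above; the function name is illustrative only.

```python
# Speed of light in a vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def light_distance_m(seconds: float) -> float:
    """Distance light travels in a vacuum in the given time."""
    return SPEED_OF_LIGHT_M_PER_S * seconds


hashmap_lookup_s = 20e-9   # ~20 ns, the HashMap figure quoted above
database_lookup_s = 20e-3  # ~20 ms, the database figure quoted above

print(f"20 ns: {light_distance_m(hashmap_lookup_s):.1f} m")
print(f"20 ms: {light_distance_m(database_lookup_s) / 1000:,.0f} km")
print(f"ratio: {database_lookup_s / hashmap_lookup_s:,.0f}x")
```

The two lookups differ by six orders of magnitude: roughly 6 m of light travel against roughly 6,000 km.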
Computers really are very fast!
The problem is we’re quite good at writing software that slows them down
Desktop Virtualization
We love abstraction
There are many reasons why abstraction is a good idea… performance just isn’t one of them
Question: is it fair to compare a Database with a HashMap?
Not really…
Key Point
On one end of the scale sits the HashMap…
…on the other sits the database…
…but it’s a very, very long scale that sits between them.
Times are changing
Database Architecture is Aging
The Traditional Architecture
[Diagram: Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory, Simpler Contract]
Simplifying the Contract
How big is the internet?
5 exabytes
(which is 5,000 petabytes or 5,000,000 terabytes)
How big is an average enterprise database?
80% < 1TB (in 2009)
Simplifying the Contract
Databases have huge operational overheads
Taken from “OLTP Through the Looking Glass, and What We Found There”, Harizopoulos et al.
Avoid that overhead by simplifying the contract and avoiding IO
Improving Database Performance (1): Shared Disk Architecture
Shared Disk
Improving Database Performance (2): Shared Nothing Architecture
Each machine is responsible for a subset of the records. Each record exists on only one machine.
[Diagram: a Client connected to machines that each hold a different set of keys: 1, 2, 3…; 97, 98, 99…; 169, 170…; 244, 245…; 333, 334…; 765, 769…]
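The routing rule behind shared nothing (each key maps to exactly one machine) can be sketched as follows. This is a toy illustration, not any particular product's design: range partitioning is an assumption, the split points are hypothetical, and plain dicts stand in for machines.

```python
from bisect import bisect_right


class RangePartitionedStore:
    """Toy shared-nothing store: every key is owned by exactly one node."""

    def __init__(self, boundaries):
        # Sorted split points, e.g. [100, 200, 300] gives four key ranges.
        self.boundaries = list(boundaries)
        # Each dict stands in for one machine's private storage.
        self.nodes = [dict() for _ in range(len(self.boundaries) + 1)]

    def node_for(self, key: int) -> int:
        # The index of the first boundary greater than the key picks the owner.
        return bisect_right(self.boundaries, key)

    def put(self, key: int, value) -> None:
        self.nodes[self.node_for(key)][key] = value

    def get(self, key: int):
        # Only the owning node is consulted; no other node holds the record.
        return self.nodes[self.node_for(key)].get(key)


store = RangePartitionedStore([100, 200, 300])
for key in (1, 97, 169, 244, 333, 765):
    store.put(key, f"record-{key}")

print(store.node_for(97), store.node_for(169), store.node_for(765))
```

The client (or a routing layer) only needs the split points to find the owner of a key, which is what lets each machine work independently.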
Improving Database Performance (3): In Memory Databases (single address-space)
Databases must cache subsets of the data in memory
Cache
Not knowing what you don’t know
Data on Disk
90% in Cache
If you can fit it ALL in memory you know everything!!
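A rough way to see why a 90% cache still hurts is to average the two costs. The sketch below assumes a cache hit costs about as much as the HashMap lookup quoted earlier (~20 ns) and a miss about as much as the database lookup (~20 ms); both are stand-in figures, not measurements.

```python
def expected_lookup_s(hit_rate: float,
                      memory_s: float = 20e-9,
                      disk_s: float = 20e-3) -> float:
    """Average lookup time when a fraction hit_rate is served from memory."""
    return hit_rate * memory_s + (1.0 - hit_rate) * disk_s


for rate in (0.90, 0.99, 1.00):
    print(f"{rate:.0%} in cache: {expected_lookup_s(rate) * 1e6:,.2f} microseconds")
```

Under these assumptions, 90% in cache averages about 2 ms per lookup because the misses dominate. Only when everything is in memory does the average fall to the memory figure.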
The architecture of an in memory database
Memory is at least 100x faster than disk
[Chart: latency scale in ps, ns, μs and ms, showing L1 Cache Ref, L2 Cache Ref, Main Memory Ref, 1MB Main Memory, Cross Network Round Trip, Cross Continental Round Trip and 1MB Disk/Network]
* L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm
Memory allows random access. Disk only works well for sequential reads
This makes in memory databases very fast!!
The proof is in the stats. TPC-H Benchmarks on a 1TB data set
So why haven’t in memory databases taken off?
Address-Spaces are relatively small and of a finite, fixed size
Durability
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM.
[Diagram: a Client connected to machines that each hold a different set of keys in RAM: 1, 2, 3…; 97, 98, 99…; 169, 170…; 244, 245…; 333, 334…; 765, 769…]
Distribution solves our two problems
We get massive amounts of parallel processing
But at the cost of losing the single address space
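The parallelism that distribution buys can be sketched as scatter-gather: each node works on its own data and a coordinator combines the partial results. In this sketch threads stand in for separate machines and lists stand in for each node's RAM, so it shows the shape of the idea rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Each list stands in for the records held in one node's RAM.
partitions = [
    [1, 2, 3],
    [97, 98, 99],
    [169, 170],
    [244, 245],
]


def local_sum(records):
    """Work done independently on one node, touching only its own data."""
    return sum(records)


def distributed_sum(parts):
    # Scatter: one task per partition. Gather: combine the partial results.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(local_sum, parts))
    return sum(partials)


total = distributed_sum(partitions)
print(total)
```

Anything that needs records from two partitions at once, such as a join, has to cross the network. That is the price of giving up the single address space.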
[Diagram: Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory, Simpler Contract]
There are three key themes here:
Distribution
Gain scalability through a distributed architecture
Simplify the contract
Improve scalability by picking appropriate ACID properties.
No Disk
All data is held in RAM
ODC
ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised, Graph DB
450 processes
Messaging (Topic Based) as a system of record (persistence)
2TB of RAM
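Using messaging as the system of record means the in-memory state can be rebuilt by replaying the messages. The sketch below is a hypothetical illustration of that idea, not ODC's implementation: a Python list stands in for the durable topic, and the class and key names are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Event:
    key: str
    value: Any  # None marks a delete in this sketch


class ReplayableStore:
    """In-memory store whose state is derived entirely from an event log."""

    def __init__(self, log: List[Event]) -> None:
        self.log = log                   # stands in for the durable topic
        self.data: Dict[str, Any] = {}   # volatile state held in RAM
        for event in self.log:           # recovery: replay everything so far
            self._apply(event)

    def put(self, key: str, value: Any) -> None:
        event = Event(key, value)
        self.log.append(event)           # record the change first
        self._apply(event)               # then update memory

    def delete(self, key: str) -> None:
        self.put(key, None)

    def _apply(self, event: Event) -> None:
        if event.value is None:
            self.data.pop(event.key, None)
        else:
            self.data[event.key] = event.value


topic: List[Event] = []
store = ReplayableStore(topic)
store.put("trade-1", 100)
store.put("trade-2", 250)
store.put("trade-1", 120)
store.delete("trade-2")

# Simulate a restart: a fresh process rebuilds its state from the topic.
recovered = ReplayableStore(topic)
print(recovered.data)
```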
ODC represents a balance between throughput and latency
What is Latency?
What is Throughput?
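Latency is the time one operation takes from start to finish; throughput is the number of operations completed per unit of time. A minimal sketch of how the two relate, assuming independent workers that each handle one operation at a time (the figures are made up for illustration):

```python
def throughput_per_s(latency_s: float, concurrent_workers: int) -> float:
    """Operations completed per second by workers that each run one at a time."""
    return concurrent_workers / latency_s


one_worker = throughput_per_s(latency_s=0.001, concurrent_workers=1)
ten_workers = throughput_per_s(latency_s=0.001, concurrent_workers=10)

print(f"1 worker:   {one_worker:,.0f} ops/s")
print(f"10 workers: {ten_workers:,.0f} ops/s")
```

Adding workers raises throughput without making any single operation faster. Lowering latency improves both.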
Which is best for latency?
[Diagram: Traditional Database, Shared Nothing (Distributed) and In-Memory Database compared on latency]
Which is best for throughput?
[Diagram: Traditional Database, Shared Nothing (Distributed) and In-Memory Database compared on throughput]
So why do we use distributed in memory?
In Memory: Latency
Plentiful hardware: Throughput
This is the technology of the now. So what is the technology of the future?
Terabyte Memory Architectures
Fast Persistent Storage
New Innovations on the Horizon
These factors are remolding the hardware landscape into one where memory is both vast and durable
This is changing the way we write software
Huge servers in the commodity space are driving us towards single process architectures that utilise many cores and large address spaces
We can attain hundreds of thousands of executions per second from a single process if it is well optimised.
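To get a feel for what "hundreds of thousands of executions per second" implies, it helps to turn the rate into a time budget. The sketch below assumes 200,000 executions per second and a main memory reference of roughly 100 ns; the latter is a common rule of thumb, not a figure from these slides.

```python
def budget_per_execution_s(executions_per_s: float) -> float:
    """Time available to each execution in a single sequential process."""
    return 1.0 / executions_per_s


executions_per_s = 200_000      # assumed target rate
main_memory_ref_s = 100e-9      # assumed cost of one uncached memory reference

budget_s = budget_per_execution_s(executions_per_s)
refs_in_budget = round(budget_s / main_memory_ref_s)

print(f"budget per execution: {budget_s * 1e6:.1f} microseconds")
print(f"uncached memory references that fit: {refs_in_budget}")
```

With a budget of a few microseconds there is room for only a few dozen trips to main memory, so keeping data in the CPU caches matters.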
“All computers wait at the same speed”
We need to optimise for our CPU architecture
[Chart: latency scale in ps, ns, μs and ms, showing L1 Cache Ref, L2 Cache Ref, Main Memory Ref, 1MB Main Memory, Cross Network Round Trip, Cross Continental Round Trip and 1MB Disk/Network]
* L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm
Tools like VTune allow us to optimise software to truly leverage our hardware
So what does this all mean?
Further Reading