BlowFish: Dynamic Storage-Performance
Tradeoff in Data Stores
Anurag Khandelwal, Rachit Agarwal, Ion Stoica
High-Throughput Data Stores
High-Throughput Data Stores
Key-Value Stores
BigTable
High-Throughput Data Stores
Key-Value Stores
BigTable
NoSQL Stores
High-Throughput Data Stores
Key-Value Stores
BigTable
NoSQL Stores
High Throughput: Queries/Second
Existing Data Stores
Existing Data Stores
Storage
Throughput
Existing Data Stores
Storage
Throughput
Uncompressedmore cache
High Throughput
Existing Data Stores
Storage
Throughput
Uncompressed
Compressed
Low Throughput
less cache
Existing Data Stores
Storage
Throughput
Uncompressed
Compressed
10x
10x
Existing Data Stores
Storage
Throughput
Uncompressed
Compressed
Unachievable points
Existing Data Stores
Storage
Throughput
Uncompressed
Compressed
Unachievable points
Switching between the two incurs high latency & CPU
Compressio
nDECompression
Leads to degraded performance when underlying workload or infrastructure changes
Existing Data Stores
Storage
Throughput
Uncompressed
Compressed
Unachievable points
Switching between the two incurs high latency & CPU
Compressio
nDECompression
A Motivating Example
A Motivating Example
Object
Load
Load across items heavily skewed
1Compressed
Uncompressed10
A Motivating Example
Object
Load
Load across items heavily skewed
1Compressed
Uncompressed10
A Motivating Example
Object
Load
Load across items heavily skewed
Ideal
1Compressed
Uncompressed10
A Motivating Example
Object
Load
Load across items heavily skewed
Ideal
Not Cached
1Compressed
Uncompressed10
A Motivating Example
Object
Load
Load across items heavily skewed
IdealCompressed + Uncompressed
Not Cached
1Compressed
Uncompressed10
A Motivating Example
Object
Load
Load across items heavily skewed
IdealCompressed + Uncompressed
Object
Load
1Compressed
Uncompressed10
Not Cached
1Compressed
Uncompressed10
A Motivating Example
Object
Load
Load across items heavily skewed
Wasted Cache!
IdealCompressed + Uncompressed
Object
Load
1Compressed
Uncompressed10
Not Cached
Not Cached
1Compressed
Uncompressed10
A Motivating Example
Object
Load
Load across items heavily skewed
Wasted Cache!
IdealCompressed + Uncompressed
Selective Replication: #Replicas α Load
Object
Load
1Compressed
Uncompressed10
Not Cached
Not Cached
1Compressed
Uncompressed10
A Motivating Example
Object
Load
Load across items heavily skewed
Wasted Cache!
Ideal
Object
Load
1Compressed
Compressed + Uncompressed
Selective Replication: #Replicas α Load
Object
Load
1Compressed
Uncompressed10
Not Cached
Not Cached
1Compressed
Uncompressed10
A Motivating Example
Object
Load
Load across items heavily skewed
Wasted Cache!
Ideal
Object
Load
1Compressed
Wasted Cache!
Compressed + Uncompressed
Selective Replication: #Replicas α Load
Object
Load
1Compressed
Uncompressed10
Not Cached
Not Cached Not Cached
1Compressed
Uncompressed10
A Motivating Example
Object
Load
Load across items heavily skewed
Wasted Cache!
Ideal
Object
Load
1Compressed
Wasted Cache!
Compressed + Uncompressed
Selective Replication: #Replicas α Load
Object
Load
1Compressed
Uncompressed10
Load changes over time
Not Cached
Not Cached Not Cached
1Compressed
Uncompressed10
A Motivating Example
Object
Load
Load across items heavily skewed
Wasted Cache!
Ideal
Object
Load
1Compressed
Wasted Cache!
Compressed + Uncompressed
Selective Replication: #Replicas α Load
Object
Load
1Compressed
Uncompressed10
Load changes over time Degraded performance→
Not Cached
Not Cached Not Cached
BlowFish
Storage
Throughput
Compressed
Uncompressed
BlowFish
Smooth Tradeoff Curve
Storage
Throughput BlowFish
Compressed
Uncompressed
BlowFish
Dynamic Navigation
Smooth Tradeoff Curve
Storage
Throughput BlowFish
Compressed
Uncompressed
BlowFish
Dynamic Navigation
Applications in Several Classical Systems Problems
Smooth Tradeoff Curve
Storage
Throughput BlowFish
Compressed
Uncompressed
Storage
Throughput
Storage-Performance Tradeoff
Storage
Throughput
Background
Background
Builds on Succinct [NSDI’15]
Background
Builds on Succinct [NSDI’15]
Succinct stores:
Background
Builds on Succinct [NSDI’15]
Succinct stores:
Background
Builds on Succinct [NSDI’15]
Succinct stores:
Sampled Array
Background
Builds on Succinct [NSDI’15]
Succinct stores: Sampled Values
Sampled Array
UNSampled Values
Background
Builds on Succinct [NSDI’15]
Succinct stores:
Sampled Array
Background
Builds on Succinct [NSDI’15]
Succinct stores:
Sampled Array
Auxiliary Arrays
Background
Builds on Succinct [NSDI’15]
Succinct stores:
Sampled Array
Auxiliary Arrays
‣ Small
Background
Builds on Succinct [NSDI’15]
Succinct stores:
Sampled Array
Auxiliary Arrays
‣ Small‣ Compute unsampled
values on the fly
Background
Builds on Succinct [NSDI’15]
Succinct stores:
Sampled Array
Auxiliary Arrays
Sampling Rate (α)
‣ Small‣ Compute unsampled
values on the fly
Sampling Rate proxy for Storage & Performance
Background
Builds on Succinct [NSDI’15]
Succinct stores:
Sampled Array
Auxiliary Arrays
Sampling Rate (α)
‣ Small‣ Compute unsampled
values on the fly
Storage ≈ OriginalSize/α
Latency ≈ α
Sampling Rate proxy for Storage & Performance
OriginalSampled Array 9 15 3 0 12 8 14 5
Inspired by multi-layered video encoding techniques
Layered Sampled Array
Rate = 2
OriginalSampled Array 9 15 3 0 12 8 14 5
Inspired by multi-layered video encoding techniques
Layered Sampled Array
Rate = 2
OriginalSampled Array 9 15 3 0 12 8 14 5
9 12RATE = 8
Inspired by multi-layered video encoding techniques
Layered Sampled Array
Rate = 2
OriginalSampled Array 9 15 3 0 12 8 14 5
9 12RATE = 83 14RATE = 4
Inspired by multi-layered video encoding techniques
Layered Sampled Array
Rate = 2
OriginalSampled Array 9 15 3 0 12 8 14 5
9 12RATE = 83 14RATE = 4
15 0 8 5RATE = 2
Inspired by multi-layered video encoding techniques
Layered Sampled Array
Rate = 2
OriginalSampled Array 9 15 3 0 12 8 14 5
9 12RATE = 83 14RATE = 4
15 0 8 5RATE = 2
Different combination of layers
Inspired by multi-layered video encoding techniques
Layered Sampled Array
Rate = 2
OriginalSampled Array 9 15 3 0 12 8 14 5
9 12RATE = 83 14RATE = 4
15 0 8 5RATE = 2
Different combination of layers Different points on tradeoff curve
Inspired by multi-layered video encoding techniques
Layered Sampled Array
→
Rate = 2
Technical Details
Technical Details
Technical Details
Technical Details
‣ How should partitions share cache on a server?
Technical Details
‣ How should partitions share cache on a server?
‣ How should partitions share cache across servers?
Technical Details
‣ How should partitions share cache on a server?
‣ How should partitions share cache across servers?
‣ How should requests be scheduled across replicas?
Technical Details
‣ How should partitions share cache on a server?
‣ How should partitions share cache across servers?
‣ How should requests be scheduled across replicas?
Unified Solution: Back-pressure style scheduling
Technical Details
‣ How should partitions share cache on a server?
‣ How should partitions share cache across servers?
Cache proportional to load,
‣ How should requests be scheduled across replicas?
Unified Solution: Back-pressure style scheduling
Technical Details
‣ How should partitions share cache on a server?
‣ How should partitions share cache across servers?
Cache proportional to load,
‣ How should requests be scheduled across replicas?
Unified Solution: Back-pressure style scheduling
without explicit coordination
Storage
Throughput
Dynamic Navigationof tradeoff curve
Storage
Throughput
Layer Additions & Deletions
9 12RATE = 83 14RATE = 4
15 0 8 5RATE = 2
Layer Additions & Deletions
9 12RATE = 83 14RATE = 4
Layer Additions & Deletions
Layer Deletions: simple
RATE = 2
9 12RATE = 83 14RATE = 4
Layer Additions & Deletions
Layer Addition:
RATE = 2
9 12RATE = 83 14RATE = 4
Unsampled values already computed during query execution
Layer Additions & Deletions
Layer Addition:
RATE = 2
9 12RATE = 83 14RATE = 4
815
Unsampled values already computed during query execution
Layer Additions & Deletions
Layer Addition:
Layers in LSA populated opportunistically!!
Assumptions & Limitations
Assumptions & Limitations
‣ Functionality close to state-of-the-art NoSQL stores
Assumptions & Limitations
‣ Functionality close to state-of-the-art NoSQL stores
get() search()put() delete() regex()
Assumptions & Limitations
‣ Functionality close to state-of-the-art NoSQL stores
get() search()put() delete()
‣ Queries do not touch all servers
regex()
Assumptions & Limitations
‣ Functionality close to state-of-the-art NoSQL stores
get() search()put() delete()
‣ Queries do not touch all servers
regex()
Many systems employ sharding schemes to avoid touching all servers, e.g., [Schism, VLDB’10]
Assumptions & Limitations
‣ Functionality close to state-of-the-art NoSQL stores
get() search()put() delete()
‣ Queries do not touch all servers
‣ System is not network-bottlenecked
regex()
Many systems employ sharding schemes to avoid touching all servers, e.g., [Schism, VLDB’10]
Assumptions & Limitations
‣ Functionality close to state-of-the-art NoSQL stores
get() search()put() delete()
‣ Queries do not touch all servers
‣ System is not network-bottlenecked
regex()
[MICA, NSDI’14] → True for most data stores today
Many systems employ sharding schemes to avoid touching all servers, e.g., [Schism, VLDB’10]
Applications
ApplicationsLook at classical systems problems through a new “lens”
Spatial Skew
Spatial SkewLoad distribution across partitions is heavily skewed
Object
Load
1Compressed
Wasted Cache!
Spatial SkewLoad distribution across partitions is heavily skewed
#Replicas α Load
Selective Replication
Spatial SkewLoad distribution across partitions is heavily skewed
#Replicas α Load
Selective Replication
BlowFish
Fractionally change storage just enough to meet load
1Compressed
Uncompressed10
Object
Load
Spatial SkewLoad distribution across partitions is heavily skewed
#Replicas α Load
Selective Replication
BlowFish
Fractionally change storage just enough to meet load
1.5x higher throughput than Selective Replication,
1Compressed
Uncompressed10
Object
Load
Spatial SkewLoad distribution across partitions is heavily skewed
#Replicas α Load
Selective Replication
BlowFish
Fractionally change storage just enough to meet load
1.5x higher throughput than Selective Replication,
within 10% of optimal
1Compressed
Uncompressed10
Object
Load
Changes in Spatial Skew
Changes in Spatial Skew
Study on Facebook Warehouse Cluster
[HotStorage’13]
Changes in Spatial Skew
Transient failures → 90% of failuresStudy on Facebook Warehouse Cluster
[HotStorage’13]
Changes in Spatial Skew
Transient failures → 90% of failures
Replica creation delayed by 15 mins
Study on Facebook Warehouse Cluster
[HotStorage’13]
Changes in Spatial Skew
Transient failures → 90% of failures
Replica creation delayed by 15 mins
Study on Facebook Warehouse Cluster
[HotStorage’13]
Leads to variation in load over time
Changes in Spatial Skew
Transient failures → 90% of failures
Replica creation delayed by 15 mins
Replica#1
Replica#2
Replica#3
Data Partitions Request Queues
Study on Facebook Warehouse Cluster
[HotStorage’13]
Leads to variation in load over time
Changes in Spatial Skew
Transient failures → 90% of failures
Replica creation delayed by 15 mins
Replica#1
Replica#2
Replica#3
Data Partitions Request Queues
Study on Facebook Warehouse Cluster
[HotStorage’13]
Leads to variation in load over time
Changes in Spatial Skew
Transient failures → 90% of failures
Replica creation delayed by 15 mins
Replica#1
Replica#2
Replica#3
Data Partitions Request Queues
Study on Facebook Warehouse Cluster
[HotStorage’13]
Leads to variation in load over time
Changes in Spatial Skew
Replica#1
Replica#2
Replica#3
Changes in Spatial Skew
Replica#1
Replica#2
Replica#3
Changes in Spatial SkewOperation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Replica#1
Replica#2
Replica#3
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Changes in Spatial Skew
Load
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Replica#1
Replica#2
Replica#3
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Changes in Spatial Skew
Load Throughput
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Replica#1
Replica#2
Replica#3
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Changes in Spatial Skew
Load Throughput
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Replica#1
Replica#2
Replica#3
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Changes in Spatial Skew
Load Throughput
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Replica#1
Replica#2
Replica#3
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Changes in Spatial Skew
Load Throughput
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Replica#1
Replica#2
Replica#3
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Changes in Spatial Skew
Load Throughput
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Replica#1
Replica#2
Replica#3
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Changes in Spatial Skew
Load Throughput
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Replica#1
Replica#2
Replica#3
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Changes in Spatial Skew
Load Throughput
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Replica#1
Replica#2
Replica#3
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Changes in Spatial Skew
Load Throughput
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Replica#1
Replica#2
Replica#3
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Changes in Spatial Skew
Load Throughput
Operation
s / second
0
600
1200
1800
2400
3000
Time (mins)
0 30 60 90 120
Request Queue Siz
e0K
10K
20K
30K
40K
50K
Time (mins)
0 30 60 90 120
Adapts to 3x higher load in < 5 mins
Replica#1
Replica#2
Replica#3
Summary
Storage
Throughput
Summary
Storage
Throughput
Smooth Tradeoff Curve
Summary
Storage
Throughput
Dynamic Navigation
Smooth Tradeoff Curve
Summary
Storage
Throughput
Dynamic Navigation
Applications in Several Classical Systems Problems
Smooth Tradeoff Curve
Summary
Thank You! Questions?
Storage
Throughput
Dynamic Navigation
Applications in Several Classical Systems Problems
Smooth Tradeoff Curve
Backup Slides
Top Related