Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge...

54
Low Latency for Cloud Data Management Felix Gessert December 18, 2018, Universität Hamburg, DBIS Group

Transcript of Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge...

Page 1: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Low Latency for Cloud Data Management

Felix GessertDecember 18, 2018, Universität Hamburg, DBIS Group

Page 2: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Presentationis loading

Page 3: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

+1% Revenue

Web PerformanceAn Open Challenge With a Huge Impact

100 ms faster → = $1.7 Billion

+20% Ad Sales→ = $19 Billion500 ms faster

3Greg Linden. Make Data Useful. 2006

Tammy Everts. Time Is Money: The Business Value of Web Performance. O’Reilly Media, 2016.

Page 4: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

2. Network Delays

1. Backend Processing

4

3. Frontend Rendering

Cloud-Based Web ApplicationsThree Sources of Page Load Time

Page 5: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Latency is the ProblemThroughput vs. Latency

5

0

500

1000

1500

2000

2500

3000

3500

1 2 3 4 5 6 7 8 9 10

Pa

ge

Lo

ad

Tim

e (

ms)

Bandwidth in MBit/s (at 60ms latency)

0

500

1000

1500

2000

2500

3000

3500

4000

4500

240 220 200 180 160 140 120 100 80 60 40 20 0

Pa

ge

Lo

ad

Tim

e (

ms)

Latency in ms (at 5MBit/s bandwidth)

Mike Belshe. More Bandwidth Doesn’t Matter (much). Technical report, Google Inc., 2010.

Throughput Latency

VS

Page 6: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Latency is the ProblemThroughput vs. Latency

6Mike Belshe. More Bandwidth Doesn’t Matter (much). Technical report, Google Inc., 2010.

VS

2× Throughput=

Same Load Time

½ Latency≈

½ Load Time

Page 7: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Problem StatementFour Challenges

7

Latency ofDynamic Data

PolyglotPersistence

Transaction Abort Rates

Direct ClientAccess

1 2

3 4

Page 8: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Problem StatementResearch Question

8

How can the latency of retrieving dynamic datafrom cloud services be minimized in an

application- and database-independent way while maintaining strict consistency guarantees?

Page 9: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

1. Latency

9

Problem StatementFour Challenges

2. Direct Access 3. Transactions

4. Polyglot Persistence

Page 10: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Cloud Data Manage-ment Middleware

Caching Dynamic Data

End-to-End Latency in Cloud-based Architectures

Background &Motivation

Outlook and Summary

10

Outline

Providing Modern NoSQL Systems as a Low-Latency DBaaS

1 2

Cache Sketches: Solving Staleness of Reads and Queries

3

The Future of Polyglot Data Management in the Cloud

4

Page 11: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Why is end-to-end latency an open problem?

Page 12: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Background & Motivation

Network Cloud Data ManagementFrontend

12

Page 13: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Background & Motivation

Network Cloud Data ManagementFrontend

State of the Art: Web interface based on HTML, files,

APIs, and application logic Performance defined by critical

rendering path

Problems: No direct access or integration

to data management Storage maintained manually

13

Page 14: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Background & Motivation

Network Cloud Data ManagementFrontend

State of the Art: Service interfaces use REST & HTTP Web caching can reduce

end-to-end latency

Problem: Web caching not compatible with

consistent dynamic data

13

Page 15: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Background & Motivation

Network Cloud Data ManagementFrontend

State of the Art: SaaS, PaaS, and IaaS models Scalability & multi-tenancy

Problems: Combination of cloud services

entails high latency Common application building blocks

often re-implemented13

Page 16: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Background & Motivation

Network Cloud Data ManagementFrontend

State of the Art: Scalability & high availability through

NoSQL systems Sharding & replication Database-as-a-Service (DBaaS) model

Problems: Lack of common data management

abstractions DBaaS model not supported Polyglot persistence manual & error-prone

13

Page 17: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Data Management

13

Background & Motivation

RDBMS

DocumentStore

Key-ValueStore

Wide-ColumnStore

DistributedFile System

Volatile Data

Large Data Sets

Critical Data

Static Files

Nested Data

Challenge:

mapping problem?

data database

How to tackle the

Page 18: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Data Management Techniques

18

LoggingUpdate-in-Place

CachingIn-Memory Storage

Append-Only Storage

Global Secondary IndexingLocal Secondary Indexing

Query PlanningAnalytics FrameworkMaterialized Views

Commit/Consensus ProtocolSynchronous

AsynchronousPrimary Copy

Update Anywhere

Range-ShardingHash-Sharding

Entity-Group ShardingConsistent Hashing

Shared-Disk

Query Processing

Sharding

Replication

Storage Management

Elasticity

Consistency

Read Latency

Write Throughput

Read Availability

Write Availability

Durability

Write Latency

Write Scalability

Read Scalability

Data ScalabilityScan Queries

ACID Transactions

Conditional or Atomic Writes

Joins

Sorting

Filter Queries

Full-text Search

Aggregation and Analytics

FunctionalRequirements

Non-FunctionalRequirements

[GWFR16]

Page 19: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

19

LoggingUpdate-in-Place

CachingIn-Memory Storage

Append-Only Storage

Global Secondary IndexingLocal Secondary Indexing

Query PlanningAnalytics FrameworkMaterialized Views

Commit/Consensus ProtocolSynchronous

AsynchronousPrimary Copy

Update Anywhere

Range-ShardingHash-Sharding

Entity-Group ShardingConsistent Hashing

Shared-Disk

Query Processing

Sharding

Replication

Storage Management

Elasticity

Consistency

Read Latency

Write Throughput

Read Availability

Write Availability

Durability

Write Latency

Write Scalability

Read Scalability

Data ScalabilityScan Queries

ACID Transactions

Conditional or Atomic Writes

Joins

Sorting

Filter Queries

Full-text Search

Aggregation and Analytics

NoSQLToolbox

Access

Fast Lookups

RAM

RedisMemcache

Unbounded

AP CP

CassandraRiak

VoldemortAerospike

HBaseMongoDBCouchBaseDynamoDB

Complex Queries

HDD-Size Unbounded

Analytics

MongoDBRethinkDB

HBase, Accumulo

ElasticSearch, Solr

Hadoop, SparkParallel DWH

Cassandra, HBaseRiak, MongoDB

ACID Availability

RDBMSNeo4j

RavenDBMarkLogic

CouchDBMongoDBSimpleDB

Ad-hoc

CacheShopping-

basketOrder

HistoryOLTP Website

SocialNetwork

Big Data

VolumeVolume

CAP Query PatternConsistency

Example Applications

DecisionTree

[GWFR16]

Page 20: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Backend-as-a-Service

Database-as-a-Service

20

NoSQLToolbox

DecisionTree

Unified DataManagement API

QuerySupport

TransactionProcessing

Data Validation

Partial Updates

Schema Management

CodeExecution

AccessControl

Indexing & Configuration

ObjectPersistence

[GFW+14]

Page 21: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

How can cloud data management be unified & combined with low latency?

Page 22: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Orestes: GoalsA Data Management Middleware for Low Latency

22

DatabaseIndependence

Low Latency withTunable Consistency

Scalable, Available, Multi-Tenant

DBaaS & BaaSFunctionality

Page 23: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Orestes ConceptOverview

23

Heterogeneous Data Stores

UnmodifiedDatabase Systems

Web and Mobile Applications

Scalable Data Management Platform(Multi-Tenancy, Scaling, Caching, Failover, …)

Data and Default Modules

Web Cachingfor Low Latency

DBaaS/BaaSMiddleware

Unified REST API

[GBR14, GB13]

Page 24: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

How can dynamic data be accelerated through web caching?

Page 25: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

The Web‘s Caching Model

25

Expiration-Based Caches:

An object x is consideredfresh for TTLx seconds

Server assigns TTLs for eachobject

Invalidation-Based Caches:

Expose object evictionoperation to the server

Client

Expiration-based Caches

Invalidation-based Caches

RequestPath

Server/DB

Invalidations,Objects

CacheHits

Browser Caches, Forward Proxies, ISP Caches

Content Delivery Networks, Reverse Proxies

[GSW+15]

Page 26: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

ExpirationCache

InvalidationCache

Web Caching for Data ManagementOverview of Cache Sketch Method

26

0 4 020

invalidate

Add to Server Cache Sketch

30 1 110Compact Cache Sketch

ValidateFreshness

1

Data Cachedfor Fixed TTLWithout Cache Sketch:

Stale Cached Data

[GSW+15, GSW+17]

Page 27: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Client

Expiration-based Caches

Invalidation-based Caches

RequestPath

Server/DB

Invalidations,Objects

CacheHits

Needs Invalidation? Server Cache Sketch

10201040

10101010

Counting Bloom Filter

Non-expiredObject Keys

Report Expirations and Writes

The Cache Sketch Approach

27

Needs Revalidation?

Client Cache Sketch

10101010Bloom filter

atconnect

Periodicevery Δ

seconds

attransaction

begin

MinimizeStaleness

MinimizeInvalidations

1

4

Initializationfrom Cache

1

2Δ-AtomicConsistency

3Cache-Aware Transactions

4InvalidationMinimization

32

[GSW+15]

Page 28: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Cache SketchMain Properties

28

To ensure Δ-atomicity the Cache Sketch at time t contains

key(x) of every object x that was written before it expired in

all caches.

t1

r(x)TTL

t1 + TTLt2

w(x)

t

ct

Retrieved Cache Sketch

t3

r(x) Δ

StalenessBound

timespan for which x c[GSW+15, GSW+17]

Page 29: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Cache SketchConstruction

29

To ensure compactness the Cache Sketch stores n keys

in a Bloom filter with m bits, k hash functions and a false

positive rate of 𝑓 ≈ 1 − exp𝑘⋅𝑛

𝑚

𝑘.

k hash functions m Bloom filter bits

1 0 0 1 1 0 1 1h1

hk

...keyfind(key)

Client Cache Sketch

Bits = 1

no

yes

GET request

Revalidation

Cache

Hit

Miss

key

key

Example

11 KB in size

20 000 entries & 5% false positives

[GSW+15, GSW+17]

Page 30: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Writes Follow

Reads

Read Your

Writes

Monotonic

Reads

Monotonic

Writes

Δ-Atomicity

Linearizability

PRAM

Causal

Consistency

Sequential

Consistency

(Δ,p)-

Atomicity

Δ-Atomicity

(Δ,p)-

Atomicity

Read Your

Writes

Monotonic

Reads

Monotonic

Writes

PRAM

Writes Follow

Reads

Causal

Consistency

Linearizability

Sequential

Consistency

Controllable Consistency LevelsCache Sketch Guarantees

30

ControllableStaleness

DefaultGuarantees

Opt-in Guarantees

With CacheBypassing

P. Viotti and M. Vukoli´c. Consistency in Non-Transactional Distributed Storage Systems. ACM Computing Surveys, 2016.[WGW+18, GWR17, GWFR16, GR16, FWGR14, FWGR14]

Page 31: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Determining TTLsTrade-Off

31

Longer TTLsShorter TTLs

⇩ Cache Misses⇩ Invalidations⇩ False Positive Rate

VS

[GSW+17, GSW+15]

Page 32: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

TTL EstimationOptimizing Expiration & Cacheability

32

1. Collect workload statistics for reads and writes

2. Estimate time to next write 𝐸[𝑇𝑤] or mark uncacheable

Constrained Adaptive TTL Estimator

C-LMModel

LWMAEstimator

EWMAEstimator

Ideal for PoissonProcesses

Quick Adaption to Changes

Converges forStatic Workloads

Highly Space-Efficient

[GSW+15]

Page 33: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Average throughput for YCSB workloadsA and B (YCSB benchmark):Average latency for YCSB workloadsA and B (YCSB benchmark):

Evaluation ResultsSimulation & Benchmarking

33

CDNClient MongoDBOrestesSetup:

Page load times with cachedinitialization (Simulation):

Ireland California

[GSW+15]

Page 34: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

247 ms3837 ms

2763 ms1456 ms

4442 ms1576 ms

California

260 ms4226 ms

2645 ms2122 ms

753 ms1836 ms

266 ms5263 ms

3214 ms1622 ms

6573 ms1944 ms

277 ms9963 ms

6505 ms6325 ms

9321 ms4697 ms

BaqendAzureParse

FirebaseKinvey

Apiomat

Baqend

Baqend

Baqend

Frankfurt

Tokyo

Sydney

Evaluation ResultsIndustry Evaluation of Commercial Implementation

34[GSW+17, Ges17]

Page 35: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

How can object caching be extended to query results?

Page 36: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Query CachingChallenges

36

Invalidation Detection

CacheCoherence

Query ResultRepresentation

When do queryresults change?

How to apply Cache Sketches to queries?

What is the best resultstructure for caching?

Q

[GSW+17]

Page 37: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Invalidation DetectionCache Coherence for Query Results

37

Update Orestes Cached Query Result

Cache Invalidation

0 1 110

Updated Cache Sketch

Real-TimeQueries

Add

Change

Remove

Query Events

Product A

Product B

Scalable Streaming System (InvaliDB)

Query Expression

↓Normalized

String

[WRG18, WGF+17, GSW+17, WGFR16]

Page 38: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Solution: Cost-based decision model weighsexpected round-trips vs. invalidations

[𝑢𝑟𝑙1, 𝑢𝑟𝑙2, 𝑢𝑟𝑙3]

Object ListsID Lists

[{𝑖𝑑: 𝑜𝑏𝑗1, 𝑛𝑎𝑚𝑒: "𝑎𝑙𝑖𝑐𝑒"},

{𝑖𝑑: 𝑜𝑏𝑗2, 𝑛𝑎𝑚𝑒: "𝑏𝑜𝑏"},{𝑖𝑑: 𝑜𝑏𝑗3, 𝑛𝑎𝑚𝑒: "𝑒𝑣𝑒"}]

⇩ Invalidations ⇩ Round-Trips

VS

38

Learning Result RepresenationsHandling Changes to Query Results

[GSW+17]

Page 39: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Evaluation ResultsQuery Caching for YCSB-Based Workloads

39

Throughput with growingrequest parallelsim:

Average end-to-end query andread latency:

11×

ThroughputImprovement

47×

Lower Query Latency

Lower Read Latency

[GSW+17]

Page 40: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Can Cache Sketches improve transaction performance?

Page 41: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Problem of Optimistic TransactionsAbort Rates Depend on Latency

41

10 ms

50 ms100 ms150 ms

Transaction Abort Rates Increase

Exponentially withLatency

[GBR14]

Page 42: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Distributed Cache-Aware TransactionsDCAT Solves Latency Problem

42

Orestes Server

Orestes Server

Orestes Server

DB

Coordinator

Client Cache

Cache

Cache

Begin Transaction

Cache Sketch

Reads

WritesBuffer

Commit: read-set and updates

Committed OR aborted + stale objects

Mutual Exclusion

Writes

Read all

1. Cache Sketch: staleness barrier at transaction begin

2. Shorter duration through cached reads

[GBR14]

Page 43: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

ResultsSimulation-Based Abort Analysis

43

15×

Faster Transactions

More Objects BeforeExceeding 2 Seconds

Page 44: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Can polyglot data management be automated in the future?

Page 45: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

VisionAutomated Choice of Databases

45

Latency < 20ms

AnnotatedSchema

Polyglot Persistence Mediator

Application

DB1 DB2 DB3

[SGR15]

Page 46: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Towards Automated Polyglot PersistenceThree-Step Process

46

Requirements specifiedas SLA annotations forschemas (based on NoSQL Toolbox)

Find or provision a suitable combination ofdatabases throughranking algorithm

Mediate data allocationand database operationsbetween applications and databases

1. Requirements 2. Resolution 3. Mediation

CounterTop-k Query

20 ms Write LatencyCounter

Redis

MongoDBCounterUpdate Redis

[SGR15]

Page 47: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Evaluation ResultsCase Study

47

Article

IDTitle…

Imp.

Imp.ID

Document Sorted Set

2.1×

Higher Throughput

10 ms

Predictable Write Latency

Scenario:News Articles With Impression Counts

[SGR15]

Page 48: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Future WorkThree Promising Areas

48

Proactive SLA Enforcement

Monitor & predict database behavior

Action: change routing, live migration, polyglot scaling

Reinforcement Learning of Caching Decisions

Learn best TTLs for any workload

Applications define goals

Polyglot Transaction Processing

Optimal choice of concurrency & commit protocol – across DBs

[SKE+18, SG16]

Page 49: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

What are themain contributions?

Page 50: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Publications (1/4)

50

[SKE+18]Michael Schaarschmidt, Alexander Kuhnle, Ben Ellis, Kai Fricke, Felix Gessert, and Eiko Yoneki. LIFT: Reinforcement Learning in Computer Systems by Learning FromDemonstrations. arXiv preprint arXiv:1808.07903 (under submission), 2018.

[WRG18]Wolfram Wingerath, Norbert Ritter, and Felix Gessert. Real-Time & Stream Data Management: Push-Based Data in Research & Practice. Springer, book to be published in late 2018.

[WGW+18]

Wolfram Wingerath, Felix Gessert, Erik Witt, Steffen Friedrich, and Norbert Ritter. Real-time Data Management for Big Data. In Proceedings of the 21th International Conference on Extending Database Technology, EDBT 2018, Vienna, Austria, March 26-29, 2018. OpenProceedings.org, 2018.

[GSW+17]Felix Gessert, Michael Schaarschmidt, Wolfram Wingerath, Erik Witt, Eiko Yoneki, and Norbert Ritter. Quaestor: Query Web Caching for Database- as-a-Service Providers. Proceedings of the VLDB Endowment, 2017.

[GWR17]Felix Gessert, Wolfram Wingerath, and Norbert Ritter. Scalable Data Management: An In-Depth Tutorial on Nosql Data Stores. In BTW (Workshops), volume P-266 of LNI, pages399–402. GI, 2017.

Page 51: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Publications (2/4)

51

[WGF+17]

Wolfram Wingerath, Felix Gessert, Steffen Friedrich, Erik Witt, and Norbert Ritter. The Case for Change Notifications in Pull-Based Databases. In Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2.-3. März 2017, Stuttgart, Germany, 2017.

[GR17]Felix Gessert and Norbert Ritter. SCDM 2017 - Vorwort. In BTW (Workshops), volume P-266 of LNI, pages 211–213. GI, 2017.

[Ges17]Felix Gessert. Lessons Learned Building a Backend-as-a-Service. Baqend Tech Blog, May 2017. (Accessed on 08/11/2017).

[GWFR16]Felix Gessert, Wolfram Wingerath, Steffen Friedrich, and Norbert Ritter. NoSQL Database Systems: A Survey and Decision Guidance. Computer Science - Research and Development, November 2016.

[GR16]Felix Gessert and Norbert Ritter. Scalable Data Management: NoSQL Data Stores in Research and Practice. In 32nd IEEE International Conference on Data Engineering, ICDE, 2016.

[SG16]Michael Schaarschmidt and Felix Gessert. Learning Runtime Parameters in Computer Systems with Delayed Experience Injection. In Deep Reinforcement Learning Workshop, NIPS, 2016.

Page 52: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Publications (3/4)

52

[WGFR16]Wolfram Wingerath, Felix Gessert, Steffen Friedrich, and Norbert Ritter. Real- Time Stream Processing for Big Data. it - Information Technology, 58(4), January 2016.

[GSW+15]

Felix Gessert, Michael Schaarschmidt, Wolfram Wingerath, Steffen Friedrich, and Norbert Ritter. The Cache Sketch: Revisiting Expiration-based Caching in the Age of Cloud Data Management. In Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme". GI, 2015.

[GR15a]Felix Gessert and Norbert Ritter. Polyglot Persistence. Datenbank-Spektrum, 15(3):229–233, November 2015.

[GR15b]Felix Gessert and Norbert Ritter. Skalierbare NoSQL- und Cloud-Datenbanken in Forschung und Praxis. In Datenbanksysteme für Business, Technologie und Web (BTW2015) - Workshopband, 2.-3. März 2015, Hamburg, Germany, pages 271–274, 2015.

[Ges15]Felix Gessert. Low Latency Cloud Data Management through Consistent Caching and Polyglot Persistence. In Proceedings of the 9th Advanced Summer School on Service Oriented Computing, SummerSOC, 2015.

[SGR15]Michael Schaarschmidt, Felix Gessert, and Norbert Ritter. Towards Automated PolyglotPersistence. In Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme", 2015.

Page 53: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Publications (4/4)

53

[WFGR15]

Wolfram Wingerath, Steffen Friedrich, Felix Gessert, and Norbert Ritter. Who Watches theWatchmen? On the Lack of Validation in NoSQL Benchmarking. In Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme", 2015.

[GBR14]Felix Gessert, Florian Bücklers, and Norbert Ritter. ORESTES: a Scalable Database-as-a-Service Architecture for Low Latency. In CloudDB, Data Engineering Workshops (ICDEW), pages 215–222. IEEE, 2014.

[GFW+14]

Felix Gessert, Steffen Friedrich, Wolfram Wingerath, Michael Schaarschmidt, and Norbert Ritter. Towards a Scalable and Unified REST API for Cloud Data Stores. In 44. Jahrestagung der Gesellschaft für Informatik, Informatik 2014, Big Data - Komplexität meistern, 22.-26. September 2014 in Stuttgart, Deutschland, volume 232 of LNI, pages 723–734. GI, 2014.

[FWGR14]

Steffen Friedrich, Wolfram Wingerath, Felix Gessert, and Norbert Ritter. NoSQL OLTP Benchmarking: A Survey. In 44. Jahrestagung der Gesellschaft für Informatik, Informatik 2014, Big Data - Komplexität meistern, 22.-26. September 2014 in Stuttgart, Deutschland, volume 232 of LNI, pages 693–704. GI, 2014.

[GB13]Felix Gessert and Florian Bücklers. ORESTES: ein System für horizontal skalierbaren Zugriff auf Cloud-Datenbanken. In Informatiktage. GI, March 2013.

Page 54: Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge Impact 100 ms faster → = $1.7 Billion 500 ms faster → +20% Ad Sales = $19 Billion

Client (Browser)

Expiration-based Caches

Invalidation-based Caches

Cloud Backend(DBaaS/BaaS)

DatabaseSystems

Expiration (TTL)

Best CacheableStructure

Cached Data

Files

Records, Documents

Query Results

{}

Main ContributionsSummary

54

Cache Coherence for Files, Records & Queries

2

1

TTL Estimation & Result Structure

3

Unified Data Management Interface

4

Database-IndependentDBaaS and BaaS

Scalable Cache-AwareTransactions

Polylgot PersistenceMediation

5

6