
Towards Latency: An Online Learning Mechanism for Caching Dynamic Query Content

Michael Schaarschmidt
Sidney Sussex College

A dissertation submitted to the University of Cambridge in partial fulfilment of the requirements for the degree of Master of Philosophy in Advanced Computer Science (Research Project - Option B)

University of Cambridge
Computer Laboratory
William Gates Building
15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom

Email: [email protected]

June 11, 2015


Declaration

I Michael Schaarschmidt of Sidney Sussex College, being a candidate for the M.Phil in Advanced Computer Science, hereby declare that this report and the work described in it are my own work, unaided except as may be specified below, and that the report does not contain material that has already been used to any substantial extent for a comparable purpose.

Total word count: 14,975 (excluding appendices A and B)

Signed:

Date:

This dissertation is copyright © 2015 Michael Schaarschmidt.

All trademarks used in this dissertation are hereby acknowledged.


Acknowledgements

I would like to express gratitude to my supervisor Dr. Eiko Yoneki for her comments, advice and encouragement throughout this project. I would further like to especially thank Felix Gessert for his advice and our discussions on practical caching issues. Additionally, I would like to thank Valentin Dalibard for his insights into Bayesian optimisation. Finally, I want to thank Dr. Damien Fay for his comments on online learning.


Abstract

This study investigates caching models of dynamic query content in distributed web infrastructures. Web performance is largely governed by latency and the number of round-trips required to retrieve content. It has also been established that latency is directly linked to user behaviour and satisfaction [1]. Recently, access latency has gained importance together with service abstraction in the data management space. Instead of having to manage a dedicated cluster of database servers on premises, applications can use highly-available and scalable database-as-a-service (DBaaS) platforms. These services typically provide a REST interface to a set of basic database operations [2]. A REST-ful approach enables the use of HTTP caching through browser caches, content delivery networks (CDNs), proxy caches and reverse proxy caches [3, 4, 5]. Such methods are used extensively to cache static content like JavaScript libraries or background images. However, caching result sets of database queries over an arbitrary number of dynamic objects in distributed infrastructures poses multiple challenges. First, any query-caching scheme needs to maintain consistency from the client's perspective, i.e. a cache should not return stale content. From the server's perspective, it is hard to predict an optimal expiration for a collection of objects that form a query result, since each individual object is read and updated with arbitrary frequency. DBaaS providers thus generally do not cache their interactive content, resulting in noticeable loading times when interacting with dynamic applications.

This project introduces a comprehensive scheme for caching dynamic query results. The first component of this model is based upon the idea that there are multiple ways to represent and cache query results. Further, the model relies on a stochastic method to estimate optimal expiration times for dynamically changing content. Finally, an online learning model enables real-time decisions on the different cache representations. As a result, the model is able to provide imperceptible request latency and consistent reads for clients.


Contents

1 Introduction

2 Background and Related Work
   2.1 Web Caching
      2.1.1 Introduction to Web Caching
      2.1.2 Previous Work
   2.2 Bloom Filters
   2.3 Monte Carlo Methods
   2.4 Machine Learning
      2.4.1 Reinforcement Learning
      2.4.2 Machine Learning in Data Management

3 Caching Queries
   3.1 Introduction
      3.1.1 The Latency Problem
      3.1.2 The Staleness Problem
      3.1.3 Model Assumptions and Terminology
   3.2 Caching Models for Queries
      3.2.1 Caching Object-Lists
      3.2.2 Caching Id-Lists
      3.2.3 Matching Queries to Updates
      3.2.4 When Not to Cache
   3.3 Estimating Expirations
      3.3.1 Approximating Poisson Processes
      3.3.2 Write-Only Estimation
      3.3.3 Dynamic Quantile Estimation

4 Online Learning
   4.1 Introduction
   4.2 Representation as an MDP
      4.2.1 State and Action Spaces
      4.2.2 Decision Granularity
      4.2.3 Reward Signals
   4.3 Belief State Approximation
      4.3.1 Convergence and Exploration
      4.3.2 Sampling Techniques
      4.3.3 Hyperparameter Optimisation

5 Evaluation
   5.1 Aims
   5.2 Simulation Framework
      5.2.1 Design and Implementation
      5.2.2 Benchmark Configuration
   5.3 Comparing Execution Models
      5.3.1 Read-Dominant Workload
      5.3.2 Write-Dominant Workload
   5.4 Consistency and Invalidations
      5.4.1 Adjusting Quantiles
      5.4.2 Reducing Invalidation Load
   5.5 Online Learning
      5.5.1 Learning Decisions
      5.5.2 Evaluating Trade-offs
      5.5.3 Convergence and Stability

6 Outlook and Conclusion
   6.1 Summary and Conclusion
   6.2 Future Work
      6.2.1 Parsing Query Predicates
      6.2.2 Unified Learning Model

A Proofs
   A.1 Minimum of Exponential Random Variables

B Additional Analysis
   B.1 Impact of Invalidation Latency
   B.2 Monte Carlo Optimisation


List of Figures

1.1 Simplified caching architecture with clients in Europe bound by access latency to a backend server in the USA.

2.1 Empty Bloom filter of length m.
2.2 Insertion of new element e into Bloom filter.
2.3 Reinforcement learning: an agent takes actions and observes new states and rewards through its environment.

3.1 Query matching architecture overview. A load balancer distributes requests from caches. An invalidation engine determines which query results are stale. Bloom filters can then be used to determine whether they are still cached at some CDN edge.
3.2 Topology of an Apache Storm invalidation pipeline. After-images of update operations are published to Storm spouts (data stream endpoints). They determine which bolt (stream processing node) holds the cached queries related to that update. Bolts evaluate the queries on the after-image to find which result sets are invalid and notify the DBaaS, which sends invalidations to the cache.

4.1 Utility function example for response times.

5.1 Overview of the simulation architecture.
5.2 Cache hit rates as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes.
5.3 Average query response times as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes.
5.4 Cache hit rates as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes under a uniform access distribution.
5.5 Average query response times as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes under a uniform access distribution.
5.6 Average query response times as a function of average query selectivity on a mixture of 25% reads, 25% queries and 50% writes. Estimation quantiles have been adjusted to p = 0.4 to account for the write-dominant workload.
5.7 Cache hit rates as a function of average query selectivity on a mixture of 25% reads, 25% queries and 50% writes.
5.8 Absolute number of stale reads on the write-dominant workload as a function of the quantile of the next expected write.
5.9 Cache hit rates on the write-dominant workload as a function of the quantile of the next expected write.
5.10 Cache hit rates on the write-dominant workload as a function of the quantile of the next expected write and compared to a static caching method.
5.11 Invalidation loads for using the naive id-list approach versus dynamically marking frequently written objects as uncachable.
5.12 Global utility as a function of operations performed.
5.13 Behaviour of learning model versus random guessing under a change of workload mixture.

B.1 Stale reads as a function of mean invalidation latency on 100,000 operations. Higher invalidation latency gives rise to more stale reads, as there is a bigger time window to retrieve stale content from the cache. Marking frequently written objects as uncachable reduces this effect.


List of Tables

3.1 Employee table.
3.2 CDN after caching Q1 as an object-list.
3.3 CDN after caching Q1, Q2 as object-lists.
3.4 CDN after caching Q1 as an id-list, before client has requested individual resources.
3.5 CDN after client has requested all individual resources.
3.6 CDN after invalidation of id 1, id-list still matches query predicate.
3.7 CDN after invalidation of id 1, id-list does not match query predicate any more.

5.1 Average overall request response times (ms) for learning model compared to random guessing and static decisions on a read-dominant workload.
5.2 Cache hit rates for learning model compared to random guessing and static decisions on a read-dominant workload.
5.3 Average query response times (ms) for learning model compared to random guessing on execution model on read-dominant workload.
5.4 Average request response times (ms) for learning model compared to random guessing on execution model on write-dominant workload.
5.5 Invalidation loads for learning model compared to random guessing on execution model on write-dominant workload.

B.1 Bayesian optimisation of optimal quantile p and maximum allowed ttl.


Chapter 1

Introduction

In recent years, cloud computing has allowed users to delegate the task of storing and managing data for web-based services. In particular, companies or individual developers can now completely withdraw from the costly task of setting up and maintaining dedicated database servers. Instead, they can utilise database-as-a-service (DBaaS) platforms that offer service level agreements on availability and performance, flexible pricing models and elastic configurations that can quickly allocate virtualised resources [2].

This project assesses how providers of such services can cache content that is updated by users interacting with dynamic applications, e.g. mobile applications or web sites. For instance, a social network application constantly refreshes in order to show the latest content from a user's network. While interacting with such applications, users need to wait on the DBaaS server to deliver new data to their end devices. For an interactive application, imperceptible loading times (ideally below 100 milliseconds) are desirable so the user's experience is not interrupted. However, this often proves problematic if clients and DBaaS servers are located in different geographic regions.¹ Hence, service providers aim to deliver as much content as possible through local cache servers. Good examples of effective local cache servers are content delivery networks (CDNs), which are often used to deliver static background images and style sheets.

¹A single round-trip between Europe and the United States takes around 170 milliseconds [6].

Caching dynamic content can nevertheless prove difficult because some content may be updated very frequently, e.g. every few seconds. This is problematic because on every update to any part of the content, the DBaaS provider needs to determine which entries to delete from caches, which is a computationally expensive task for large databases. Further, in order to prevent clients from reading stale content, the DBaaS would have to permanently send out requests to delete old cached content. These issues of maintaining a consistent view of the data for the client while not blocking performance of the backend server generally prevent DBaaS providers from caching volatile content. Consequently, many applications suffer from long loading times.

This study proposes a comprehensive caching scheme for dynamic query content. Figure 1.1 provides an abstract view of the suggested caching infrastructure. There are clients in one geographic region and a DBaaS in another. In a drastically simplified network topology, clients access the geographically closest CDN server to retrieve content, thus minimising latency. By interacting with their applications, clients also continuously update content, e.g. by posting a comment in a social feed. This project thus deals with mechanisms that allow the DBaaS to cache the content of queries that change on the scale of seconds while ensuring high consistency at the client. On a high level, this is achieved by monitoring access metrics like the frequency of incoming updates, thus allowing for a stochastic view on optimal cache expirations. Further insights with regard to the semantics of caching lead to different caching models. Acting on these results, a machine learning module provides an effective model for online decisions on incoming queries. Through the combination of semantic insights about caching, stochastic analysis and machine learning, I will therefore present the first comprehensive web caching scheme for highly volatile query content.


[Figure 1.1 (diagram): Clients in Europe issue queries and request resources from a CDN edge in Europe; the CDN forwards requests to the DBaaS (backed by MongoDB and a learner component) in the Amazon EC2 US region. The DBaaS returns resources, requests invalidations, requests decisions and ttls from the learner, and feeds access metrics, estimates and rewards back to it.]

Figure 1.1: Simplified caching architecture with clients in Europe bound by access latency to a backend server in the USA.


In summary, this work makes the following contributions:

• A comprehensive scheme for caching dynamic query results, thus enabling interactive applications with drastically reduced response times.

• A stochastic method to estimate optimal expiration times for dynamically changing query results.

• An online learning mechanism that can adapt caching models to changing request loads.

• A dedicated Monte Carlo simulation framework which can be used to analyse various properties of query processing.

The structure of my dissertation is as follows: Chapter 2 provides a brief overview of REST-based web caching, Bloom filters, Monte Carlo methods and reinforcement learning, as well as of related work on these topics. Chapter 3 introduces the concept of query caching and the implications of different cache representations. Chapter 4 proposes a machine learning model that provides online decisions on these representations. In chapter 5, I first explain the implementation of my simulation framework before evaluating different cache representations and the learning model. Finally, chapter 6 summarises my findings and concludes with an outlook on future work.


Chapter 2

Background and Related Work

2.1 Web Caching

2.1.1 Introduction to Web Caching

In this chapter, I provide an overview of some essential concepts concerning web caching, Bloom filters and machine learning. I also supply recent examples of work related to these concepts. In doing so, I assume the reader to be familiar with the basic ideas of web protocols, database management and probability theory. This section begins with an introduction to web (HTTP) caching.

The fundamental challenge of web caching is consistency, i.e. the requirement that content read from a cache be up to date. For consistency purposes, there are essentially two types of caches. Expiration-based caches like browser caches, forward proxy caches or ISP caches control consistency through freshness and validation. Freshness is the duration for which a cached copy is considered fresh and can be controlled through "max-age" or "expires" HTTP headers. For instance, if a cached object expires after one minute, then there is a clear one-minute upper limit on how long a client may see old content if the original content is modified. Expiration-based caches can also validate their content by using an "If-Modified-Since" header in a refresh request. On the other hand, invalidation-based caches like content delivery networks or reverse proxy caches are server-controlled caches. That is, the origin server of the content can actively control consistency by deleting (invalidating) content from the cache through a specific HTTP request [7, 8].

A reverse proxy is usually located at the network of the origin server and can be used to hide the structure of the internal network, reduce and distribute load from incoming requests and cache content from the origin server. Note that reverse proxies, due to their location at the origin server, do not aid in mitigating latency caused by access from a geographically distant client. Thus, this project deals primarily with invalidation-based mechanics using the example of CDNs.
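To make the two consistency mechanisms concrete, the following sketch shows the HTTP headers involved (the values are hypothetical and for illustration only, not taken from this work):

# Expiration-based caching: headers an origin server might set
# (hypothetical values).
origin_response_headers = {
    "Cache-Control": "max-age=60",  # copy is considered fresh for 60 seconds
    "Last-Modified": "Thu, 11 Jun 2015 10:00:00 GMT",
}

# A client that does not want to risk a stale read revalidates explicitly:
revalidation_request_headers = {
    "If-Modified-Since": "Thu, 11 Jun 2015 10:00:00 GMT",
}
# The cache forwards such a request to the origin, which answers
# "304 Not Modified" or sends the new version of the content.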

Content delivery networks distribute content through a globally distributed network of cache servers (or edge servers). Requests from clients are usually routed to the closest edge server to minimise access latency. There are various types of CDN architectures, network topologies and use cases. CDNs can be used to cache complete websites in the function of a proxy cache, to synchronise and deliver streaming content, or to cache the embedded static parts (e.g. stylesheets) of dynamic websites. For this project, the most relevant feature of CDNs is their invalidation mechanism. The origin server generally sets an expiration for the cached content. However, the origin server can also ask the CDN to remove the content through an invalidation request, which means the origin server can actively mitigate reads of stale content from the cache. Clients can also add revalidation headers to their request if they do not want to risk reading stale cache content. This instructs the CDN to request the latest content version from the origin server.

The key point here is that this does not enforce strong consistency, because the cache does not learn about updates at the DBaaS immediately. Instead, there is the notion of eventual consistency, i.e. consistency requirements are relaxed for higher performance (more cache hits, fewer requests sent to the origin server) [9]. Even if the origin server sends an invalidation to the CDN directly after an update, there is a small time window until the invalidation is completed in which clients can read stale content. A read is only guaranteed to be consistent if it adds a revalidation header, thus excluding cached content and increasing load at the origin server. A large part of this work concerns the mechanisms of invalidation for dynamic query content and their implications for overall system performance.

2.1.2 Previous Work

In this section, I briefly survey previous and related work on caching. First, this project relies upon my own previous work on expiration-based caching [6] and on the architecture of scalable cloud databases [10, 11]. Gessert and I have proposed a comprehensive scheme for leveraging global HTTP caching infrastructures. More specifically, we have introduced the Cache Sketch, a Bloom filter-based representation of database records that is used to enable tunable consistency and performance in web caching. Throughout this dissertation, I will repeatedly point towards specific aspects of this work (and other related work) in order to clarify my analysis. The primary focus of our previous work was to introduce a proof of concept for dynamic caching of single database records. The contribution of this project is to advance these ideas into a model of caching full query results, as well as adding an online learning component for decision making on query execution. To this end, Monte Carlo methods are employed to analyse the performance of various configurations. Monte Carlo simulations have been used previously to help quantify eventual consistency measures [12, 13, 14].

Recently, Huang et al. have provided an in-depth analysis of a large-scale production caching infrastructure by looking at Facebook's photo cache [15, 16]. Even though this example involves some photo-specific problems (resizing), it still contains relevant insights. Pictures are essentially read-only content, and the challenge in an infrastructure at the scale of Facebook's photo cache lies in the huge data volume. Nevertheless, their work provides an understanding of typical workloads and achievable cache hit rates. Apart from this recent work, there is an extensive body of research on the nature of internet content delivery systems [17, 18] and their workloads [19]. Recent research has also looked into content delivery networks (CDNs) and their role in dealing with sudden popularity ("flash crowds") of social media content as well as with geographically distributed workloads [20, 21, 22].

This work aims to provide low latency through exploiting existing HTTP caching infrastructures. Another popular approach, which however requires additional infrastructure, is geo-replication. Instead of caching data on geographically distributed edges of a CDN infrastructure, the database system itself is globally distributed [23, 24]. A primary example of this is Google's Spanner [25]. Spanner replicates data across datacenters and manages serialisation of distributed transactions through globally meaningful commit timestamps. This enables globally-consistent reads across the database for a given timestamp. The main performance issue stems from the fact that synchronisation between data centers is costly, as it is bound by the round-trip delay time between geographic regions. Finally, there have been some previous efforts towards scalable query caching. Garrod et al. have achieved high cache hit rates by using proxy servers with a distributed consistency management model based on a publish/subscribe invalidation architecture [26]. There have also been efforts towards adaptive time-to-live (ttl) estimation of web-search results [27]. This work separates itself from previous work in multiple aspects. First, it uses existing HTTP infrastructure and does not require additional dedicated servers for caching. Second, employing stochastic models, this work provides a record-level analysis of query results to produce much more fine-grained ttl estimates. Furthermore, the online learning model can achieve tunable trade-offs between average query response time, consistency and server load by changing execution models for queries at runtime.


2.2 Bloom Filters

Bloom filters are space-efficient probabilistic data structures that allow membership queries on sets with a certain false positive rate [28]. A Bloom filter represents a set S of n elements through a bit array of length m. It also requires k independent hash functions h1, . . . , hk with range 1, . . . , m that map each element uniformly to a random index of the bit array. To save an element s ∈ S to the Bloom filter, all k hash functions are computed independently and the appropriate indices in the bit array are set to 1 (and stay 1 if they were already set by another insert), as seen in figures 2.1 and 2.2.

[Figure 2.1: Empty Bloom filter of length m (all m bits set to 0).]

[Figure 2.2: Insertion of a new element e into the Bloom filter: the hash functions h1(e), h2(e), h3(e) each set one bit of the array to 1.]

A membership query can then be performed by again computing the hash functions and looking up whether all k result indices are set to 1. This means that a false positive occurs through hash collisions if inserts of other elements have already set the relevant bits. An extension of this concept is the counting Bloom filter, which has counters instead of single bits, thus also enabling the deletion of elements by decreasing the counters (which could cause a false negative with single bits). It can then be shown that the false positive rate can be approximated as follows [29]:

f = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^k \approx \left(1 - e^{-kn/m}\right)^k.    (2.1)

The implication of being able to determine the false positive rate as a function of the number of expected objects, the length m and the number of hash functions k is that Bloom filters are precisely tunable, i.e. the size can be controlled according to the desired false positive rate. Bloom filters have found particular use in networking applications, as they can be transferred quickly due to their compact representation [30, 31, 32].
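As an illustration, the following is a minimal Python sketch of such a Bloom filter (my own illustrative code, not an implementation from this work); the k hash functions h1, . . . , hk are simulated by salting a single cryptographic hash:

import hashlib

class BloomFilter:
    """Bit array of length m with k salted hash functions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _indices(self, element):
        # Simulate k independent hash functions by salting SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{element}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, element):
        for idx in self._indices(element):
            self.bits[idx] = 1  # stays 1 if already set by another insert

    def might_contain(self, element):
        # False is definitive; True may be a false positive (equation 2.1).
        return all(self.bits[idx] for idx in self._indices(element))

For example, with m = 8n bits and k = 6 hash functions, equation 2.1 yields a false positive rate of (1 − e^(−6/8))^6 ≈ 2%, illustrating how the filter size can be traded against accuracy.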

2.3 Monte Carlo Methods

Monte Carlo methods are a set of computational techniques that are used to approximate distributions in experiments through repeated random sampling. Monte Carlo simulations are widely employed in the physical sciences to model and understand the behaviour of probabilistic systems. They essentially rely on the law of large numbers, i.e. the expectation that the sample mean over a sufficient number of inputs will approximate the actual mean of the target distribution [33]. There are three central components to Monte Carlo simulations [34]:

(1) A known input distribution for the system.

(2) Random sampling from the input distribution and simulation of the system and its conditions of interest under the sampled inputs.

(3) Numerical evaluation of the aggregated results.

A generic approach to Monte Carlo simulation is the construction of a Markov chain that converges to a target density equal to the distribution of interest. This is particularly relevant to the simulation of complex multivariate distributions. Consequently, there is an extensive body of research on sampling methods, notably Gibbs sampling and the Metropolis-Hastings algorithm [35, 36]. Monte Carlo simulation is also useful in the analysis of distributed systems and caching infrastructures. In particular, Monte Carlo simulation of access and latency distributions enables detailed analysis of caching behaviour, as it can quantify the impact of small changes in latency and workload on performance. Fortunately, simulation of database workloads can be achieved by drawing the key of a database entry to access from a univariate discrete distribution. An easy way to do this is the inverse integral transform method, which will be introduced briefly [37]. Consider a discrete random variable X to sample from, its probability mass function

f_X(x_i) = \Pr(X = x_i) = p_i, \quad i = 1, 2, \ldots, \quad \sum_i p_i = 1,    (2.2)

and its cumulative mass function

\Pr(X \le x_i) \equiv F(x_i) = p_1 + p_2 + \ldots + p_i.    (2.3)

The inverse then takes the form

F^{-1}(u) = x_i \quad \text{if} \quad p_1 + p_2 + \ldots + p_{i-1} < u \le p_1 + p_2 + \ldots + p_i.    (2.4)

Hence, the discrete distribution can be sampled by drawing a sample U from a distribution uniform on (0, 1) and then computing the inverse F^{-1}(U), as described by Chib [38]. It thus follows that one can sample x_i with its probability p_i because

\Pr(F^{-1}(U) = x_i) = \Pr(p_1 + \ldots + p_{i-1} < U \le p_1 + \ldots + p_i) = p_i.    (2.5)

In the Monte Carlo simulation framework, the inverse integral transform is used because it is computationally inexpensive and provides good accuracy.
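A minimal Python sketch of this sampling procedure (illustrative code, not the simulation framework itself):

import bisect
import itertools
import random

def make_sampler(probabilities):
    """Inverse integral transform sampler for a discrete distribution.

    Precomputes the cumulative mass function F (equation 2.3); each draw
    takes U ~ Uniform(0, 1) and returns the smallest index i with
    F(x_i) >= U, i.e. F^{-1}(U) from equation 2.4.
    """
    cmf = list(itertools.accumulate(probabilities))
    def sample():
        u = random.random()
        # Binary search for F^{-1}(u); the min() guards against
        # floating-point rounding in the last cumulative value.
        return min(bisect.bisect_left(cmf, u), len(cmf) - 1)
    return sample

# Example: draw database keys from a skewed access distribution.
sampler = make_sampler([0.5, 0.3, 0.15, 0.05])
keys = [sampler() for _ in range(100000)]
# By the law of large numbers, the empirical key frequencies
# approximate the input probabilities p_i.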


2.4 Machine Learning

2.4.1 Reinforcement Learning

Reinforcement learning (RL) is a machine learning technique characterised by software agents that interact with an environment and learn optimal behaviour through rewards on the actions they take [39]. Initially, the agent does not know how its actions change its environment and thus has to explore the space of available actions (as schematically depicted in figure 2.3). Hence, RL does not require an explicit analytical model of the environment.

[Figure 2.3: Reinforcement learning: an agent takes actions and observes new states and rewards through its environment.]

More precisely, RL is a form of sequential decision making. The goal of the agent is to select actions that maximise the sum of all future rewards. A reward is a scalar feedback value the agent receives after taking an action. Rewards can be stochastic and delayed, thus making it harder for the agent to reason about the consequences of its actions. For instance, a single move at the beginning of a board game might have consequences that only become apparent after one player wins. Variations of RL have been used in various applications, notably navigation in robotics [40, 41] and complex board games [42, 43, 44, 45]. Formally, RL problems can be understood as Markov decision processes (MDPs). A finite MDP has four elements [46, 39]:

(1) A set of states S.

(2) A set of actions A.

(3) For a given pair of state and action (s, a) at some point in time t, a transition probability over possible next states s':

P^a_{ss'} = \Pr\{s_{t+1} = s' \mid s_t = s, a_t = a\}.

(4) The associated expected reward for a transition from s to s' through a:

R^a_{ss'} = E\{r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s'\}.

A policy then maps states to actions that presumably maximise rewards. In general, RL techniques aim to find optimal policies for a given MDP by iteratively improving upon their current estimates for state and action pairs as they observe rewards. A popular RL method is Q-learning, which can learn optimal policies by comparing expected cumulative rewards (Q-values) in environments with stochastic transitions and rewards [47]. This is achieved by updating a function Q : S \times A \to \mathbb{R}:

Q_{t+1}(s_t, a_t) \leftarrow Q_t(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \right]    (2.6)

Intuitively, the initially fixed Q-values are adjusted by combining the observed reward r_{t+1} after taking a transition with the value of the action that is estimated to maximise future rewards. Updates are parametrised through a learning rate α and a discount factor γ that prohibits infinite rewards in state-action loops. A central component of RL is the trade-off between exploitation and exploration. During learning, the agent needs to explore its environment by trying out actions that are non-optimal under its current policy, to find out whether these state-action sequences lead to higher overall rewards than following its current policy. Typically, the exploration rate is decreased over time so the agent eventually primarily exploits the policy it has found.
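The update rule in equation 2.6 and the exploration trade-off can be sketched in a few lines of Python (an illustrative tabular implementation; it is not the learner developed later in this work, and all parameter values are placeholders):

import random
from collections import defaultdict

class QLearner:
    """Tabular Q-learning with epsilon-greedy exploration."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.2):
        self.q = defaultdict(float)  # (state, action) -> Q-value, initially 0
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        if random.random() < self.epsilon:           # explore
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])  # exploit

    def update(self, state, action, reward, next_state):
        # Equation (2.6): move Q(s_t, a_t) towards the observed reward
        # plus the discounted best estimate of the next state's value.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error

Decaying epsilon over time shifts the agent from exploration towards exploitation of the policy it has learned.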


Reinforcement learning in small (finite) state and action spaces is a well-understood application of finite Markov decision processes [39, 48, 47, 49]. For large state and action spaces, policies cannot be expressed as simple lookup tables of actions and associated rewards on transitions. Hence, function approximators are frequently employed to estimate rewards [50, 42]. Specifically, the rise of deep neural networks has recently inspired novel research, as neural networks have been successfully applied to approximate delayed rewards in complex noisy environments [51, 52]. Other RL approaches from Bayesian learning employ Gaussian processes to estimate rewards [53, 54].

2.4.2 Machine Learning in Data Management

In recent years, many scientific disciplines have begun to investigate machine learning methods as a new tool for research in their domains. In database management and cloud computing, a particularly interesting problem is the question of how to adapt behaviour to changing workloads. Traditional rule-based or threshold-based approaches to resource allocation in compute clusters (e.g. provision a new server if load is over a given percentage of capacity) can be replaced by online learning strategies [55, 56, 57]. Such improvements can be practically achieved by implementing a middleware control layer that tracks data flow in real time on top of web-based services. For instance, Angel et al. have demonstrated how to provide throughput guarantees for multi-tenancy clusters by network request header inspection [58]. Alici et al. have explored machine learning based expiration estimation strategies for search engine results, which however rely on offline training data to build a set of features and cannot adapt to highly dynamic workloads [27]. This is because their approach works on a much larger time-scale of months, whereas this work provides a learning model that recognises workload changes in a matter of minutes.


The advantage of the middleware approach is that it allows for a more generic and transferable learning process, as opposed to interfering with the underlying application to achieve more control over specific configurations. In this project, I also opt for a middleware approach and treat the database as a query execution service to the learning model. This way, the concept is not limited to specific query languages or database paradigms but relies only on properties of request distributions that are interpreted as stochastic processes.


Chapter 3

Caching Queries

3.1 Introduction

This chapter contains an in-depth description of the query caching scheme. First, the challenges of caching dynamic content are discussed in detail. Next, different representations for caching queries are suggested. Further, the problem of determining which queries are invalidated by an update is considered. Finally, a stochastic method to determine optimal expirations for query results is introduced.

3.1.1 The Latency Problem

The framework of assumptions is a database-as-a-service provider exposing its API through an HTTP/REST interface, e.g. Facebook's popular Parse platform [59]. Understanding the structure of modern web or mobile applications helps to see the importance of latency. In general, there are two aspects to the performance of a web-based application. First, there is an initial loading period when the browser has to request all external resources and build the Document Object Model of the application. The duration of this so-called critical rendering path depends on the number of critical external resources, their size and the time it takes to fetch them from a DBaaS, a CDN or a proxy cache. Loading times hence depend on the number of round-trips and on the round-trip latency. Static resources like JavaScript libraries or background images are thus cached on all levels of the web-caching hierarchy. Second, there is dynamic and interactive content that has to be requested by the end device while the user is interacting with the application. Single-page applications are a typical form of this interaction: all navigation happens within a single website that is never left but dynamically changed depending on user actions [60]. Considering mobile applications running against DBaaS platforms, application logic is often executed on the client side (smart client), whereas the server is primarily a data management service. Consequently, user experience critically depends on low latencies for all interactions with the DBaaS, which includes minimising the number of geographically-bound latency round-trips as well as maximising cache hits in the caching hierarchy.

3.1.2 The Staleness Problem

The latency problem cannot be solved by simply pushing as much dynamic content as possible into the various layers of the caching hierarchy. Without further measures, writes would continuously flush the caches. This creates problems for both clients and servers. First, determining which objects are potentially alive in which layer of the caching hierarchy and sending invalidation requests creates load on the DBaaS. Further, every invalidation creates a potential stale read for the client. A stale read occurs in the following situation:

• A client sends an update on some object and gets an acknowledgement at some point t0 for version vw.

• At some later point in time t1, a client requests the same object.

• The cache returns the object with some version vr.

• If the write-acknowledged version vw is newer than the version vr read from the cache, the read is stale.
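This definition translates directly into a comparison of versions; a minimal sketch (the pair representation is a hypothetical choice for illustration):

def is_stale(acked_write, cache_read):
    """Stale read per the definition above: the cache returns a version
    vr older than a version vw whose write was acknowledged before the
    read started. Each argument is a (time, version) pair."""
    t0, vw = acked_write   # acknowledgement time and version of the write
    t1, vr = cache_read    # read time and version returned by the cache
    return t1 > t0 and vr < vw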

How can the cache return an older version if the server already acknowledged the write of a newer version? This is because invalidation is generally an asynchronous operation. The DBaaS executes an update and waits for a write-acknowledgement from the database. It then sends an acknowledgement of the write back to the client and an invalidation to the appropriate caching layers. Blocking on invalidations is not feasible because an invalidation could get lost (e.g. in a network partition) and the DBaaS would lose availability. Further, a single cached object might have to be invalidated in multiple geographic locations, thus potentially incurring multiple expensive round-trips. This work thus aims for a best-effort view on eventual consistency. First, if the cached content expires before a write, there can be no stale read. Second, even if a cached item has not expired, there cannot be a stale read if the invalidation is fast enough and there is enough time between an update and the subsequent read. There is an inherent trade-off between providing results both in a consistent and in a timely manner. This notion of eventual consistency has been thoroughly investigated by Bailis et al. in their seminal work on probabilistically bounded staleness [12, 61, 62, 13]. In particular, they were able to provide expected bounds on staleness for certain request distributions.

In summary, the problem of staleness and invalidation load makes it prohibitively expensive to cache dynamic query content. To the best of my knowledge, DBaaS providers and other web services thus refrain from caching their interactive content.

3.1.3 Model Assumptions and Terminology

Previous work has proposed a cache-coherent scheme to cache volatile objects [6], as summarised in chapter 2. That work dealt exclusively with simple create, read and update operations on individual objects. However, at least regarding the content of the requests, this was rather a proof of concept, as actual queries usually do not just request single database objects. This project thus turns to investigate query-specific problems. The first insight of the query-caching scheme is to acknowledge that there are multiple ways to execute queries and represent query results. Before introducing these different representations, it is worth discussing some model assumptions. As explained in the introduction on web caching, there is generally a whole hierarchy of expiration- and invalidation-based caches. This work concentrates on the specific interaction of clients and servers with an invalidation-based cache, i.e. a CDN. When I use the term "the cache" in the remainder of this work, I am referring to an instance of an invalidation-based cache. Note that a CDN has multiple points of presence, ideally one in every major geographic region.

It is a valid reduction to investigate caching behaviour on a single edge server. First, for a given client, requests will usually be routed to the same edge server in a CDN infrastructure, i.e. the one that minimises (geographically bounded) round-trip latency. Second, consider the hypothetical case of a client whose HTTP requests are randomly routed to one of multiple edges. A write operation still only causes a single invalidation request to be sent out by the DBaaS. This is because the DBaaS does not have to send invalidation requests to every edge server of a CDN. Instead, an invalidation request is only sent to the closest CDN edge; the invalidation can then be distributed through a bimodal multicasting protocol [63, 64]. The important insight here is that the invalidation load for the DBaaS does not depend on the number of CDN edges (whereas it grows linearly for reverse proxies). Similarly, routing queries to different edge servers will lead to a lower cache hit rate on the individual edges, which can easily be simulated on a single cache by adjusting query parameters. I thus consider the abstraction of a single invalidation-based cache to be feasible for the analysis of query-caching behaviour.

Furthermore, the term "database object" needs to be clarified. An object refers to a single entry in the database, which can be a single row in a relational model, a JSON-style document in a document database or a serialised string in a key-value store. The usage of the term "object" is primarily motivated by the fact that the DBaaS server represents database entries as REST-ful resources after retrieving them from the database. This illustrates the point that the proposed caching scheme is independent of the database employed by the DBaaS. Naturally, the performance of the system will vary depending on whether the chosen database matches the requirements of the workload. MongoDB, a widespread document database that is based on JSON-style documents as the basic record [65], is used in the evaluation. MongoDB organises documents in collections (roughly equivalent to a relational table) and is popular for its scalability and its flexibility to store schema-free data. Finally, this work assumes a large cache, so performance does not depend on cache size or eviction algorithms. It is clear that a smaller cache leads to systematically lower cache hit rates for certain request distributions; incorporating this additional degree of freedom is hence not particularly interesting to this study. Nevertheless, the impact of a limited cache size will be factored into the discussion of uncachable objects.

3.2 Caching Models for Queries

3.2.1 Caching Object-Lists

The first query-caching model is the naive approach of caching query results as complete collections of result objects, i.e. a single entry in the cache maps a query to its result. The processing flow of this model is fairly straightforward. A client issues a query that initially reaches the closest CDN edge. In the case of a cache miss, the CDN forwards the request to the DBaaS, which evaluates the query. It then estimates a time-to-live for the query result, the mechanics of which will be discussed later. The result is then returned to the CDN, added to the cache and finally returned to the client. For all subsequent requests with the same query, a single round-trip to the CDN is sufficient to retrieve the whole query result, as long as the result has not expired or been invalidated: the CDN simply checks the hashes of incoming queries for a match in its table of cache entries. In terms of minimising response time, this is optimal from the client's perspective. Note that the CDN is agnostic towards the content of its entries. It cannot recognise that some query's result set is a subset of another cached query result. That would require both semantic insights into the nature of the queries and knowledge about the structure of the database's values. This would essentially require a geo-replicated database to locally validate the similarity of queries, which is a different caching paradigm. However, this project specifically aims to exploit readily available HTTP caching infrastructure that does not require multiple dedicated DBaaS server locations.

This model can be illustrated with a simple example. For ease of reading, I use a relational table and a SQL query, which I assume the reader to be familiar with. As pointed out above, even though DBaaS queries are typically abstracted to short method calls and often query NoSQL databases, the mechanics of my caching scheme do not rely on a specific database or query paradigm. Consider a drastically simplified employee table that only contains an id as its primary key and a salary, as seen in table 3.1.

Id   Salary
1    20,000
2    25,000
3    30,000
4    50,000

Table 3.1: Employee table.

A query Q1 now selects all employees with salaries under a certain limit:

SELECT * FROM employee WHERE salary < 30000

Consequently, the CDN will store a mapping from Q1 to the result set, as seen in table 3.2. The cache is conceptualised as a simple hash table, i.e. the key Q1 refers to its hash.

If another similar query Q2 is evaluated, the CDN blindly caches intersecting results separately:

SELECT * FROM employee WHERE salary > 22000


Key   Value
Q1    {{id: 1, salary: 20,000}, {id: 2, salary: 25,000}}

Table 3.2: CDN after caching Q1 as an object-list.

The query Q2 will now leave the cache in the state seen in table 3.3.

Key   Value
Q1    {{id: 1, salary: 20,000}, {id: 2, salary: 25,000}}
Q2    {{id: 2, salary: 25,000}, {id: 3, salary: 30,000}, {id: 4, salary: 50,000}}

Table 3.3: CDN after caching Q1, Q2 as object-lists.

Now consider a write on some object that is part of a cached query result. Since the query result was cached as one big object (i.e. a single list of database objects), the whole result is invalidated. In the example, both entries for Q1 and Q2 are removed from the cache if the object with id 2 is updated. Depending on the workload, this can lead to drastically reduced cache hit rates, as a single update could empty the whole cache. However, if result sets of queries have mostly empty intersections, writes invalidate fewer results and cache performance increases. Note that determining the result sets that need to be invalidated is a potentially expensive task in its own right, which will be discussed later in this chapter.
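The object-list semantics can be summarised in a short sketch (illustrative Python; the dictionary-based cache representation is my own assumption, not the simulation framework's code):

class ObjectListCache:
    """Object-list model: each cache entry maps a query (by its hash)
    to the complete list of result objects."""

    def __init__(self):
        self.entries = {}  # query_hash -> list of result objects

    def put(self, query_hash, result_objects):
        self.entries[query_hash] = result_objects

    def get(self, query_hash):
        return self.entries.get(query_hash)  # None signals a cache miss

    def invalidate_object(self, object_id):
        # An update to any member object invalidates every cached result
        # containing it, e.g. a write to id 2 removes both Q1 and Q2.
        stale = [q for q, objs in self.entries.items()
                 if any(o["id"] == object_id for o in objs)]
        for q in stale:
            del self.entries[q]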

3.2.2 Caching Id-Lists

An alternative option for caching query results is the id-list model. Assuming an empty cache, the first difference of this model from the object-list approach lies in the actual query execution. Instead of executing the query in full and retrieving complete database objects, the query is intentionally executed to return only the ids (or keys) of matching objects. This can improve query cost, as the query can potentially be executed as a so-called covered query. A query is covered if an index covers it: if all fields requested in the query are part of an index and all result fields are also part of that index, the query can be answered by querying the index alone. The index is typically located in the RAM of the database server and is thus significantly faster than disk reads. This is an established technique for query optimisation and routinely offered by databases [66]. The DBaaS then returns this list of ids to the CDN, which creates an entry for it. Reusing the previous example with an initially empty cache, the cache is now in the state seen in table 3.4 after executing Q1.

Key Value

Q1 {{id : 1}, {id : 2}}

Table 3.4: CDN after caching Q1 as an id-list, before the client has requested individual resources.
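For illustration, such an id-only execution against MongoDB could look like the sketch below, using a current version of the MongoDB Java driver (3.7+). The connection string, collection name and the compound index on (salary, _id) are assumptions made so that the example query is covered; they are not prescribed by the caching scheme itself.

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.lt;
    import static com.mongodb.client.model.Indexes.ascending;
    import static com.mongodb.client.model.Projections.include;

    public class CoveredQueryExample {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> employees =
                        client.getDatabase("test").getCollection("employee");
                // Predicate field and returned field are both in the index,
                // so the id-only query can be answered from the index alone.
                employees.createIndex(ascending("salary", "_id"));
                // Id-list execution: return only the _id of matching documents.
                for (Document d : employees.find(lt("salary", 30000))
                                           .projection(include("_id"))) {
                    System.out.println(d.get("_id"));
                }
            }
        }
    }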

Finally, the id-list is passed back to the client. Note that this already incurred

a full round-trip to the DBaaS without delivering any actual result-objects.

The client then starts requesting the individual REST resources identified by

their ids in the list of results, leaving the CDN in the state shown in table

3.5.

Key Value

Q1 {{id : 1}, {id : 2}}
1  {id : 1, salary : 20,000}
2  {id : 2, salary : 25,000}

Table 3.5: CDN after client has requested all individual resources.

In the worst case, this incurs another full round-trip to the DBaaS for ev-

ery individual resource. How is this model useful if it can involve so many

expensive round-trips? There are two potential sources of cache hits. First,

every time the client requests one of the resources from the id-list from the

CDN, there is a potential cache hit on that resource. This is because the

cache is potentially “prewarmed” by other queries with intersecting result

sets. Second, if the client issues the same query again, the CDN can re-

turn the id-list (which is a separate cache entry), saving a round-trip to the

DBaaS. Furthermore, the client does not necessarily request the individual

resources sequentially, but will usually do so in parallel. I will later explore


the impact of parallel connections as part of the cost of caching queries in

the online learning model. In a best case scenario, the client thus needs one

round-trip to fetch the id-list from the CDN and one round-trip to fetch the

(also cached) individual resources in parallel from the CDN. This seems an

unintuitive choice, since the lower bound on latency is clearly higher than for the

object-list model, which only needs one round-trip to the CDN to look up

the query result in its best case.
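The client side of the id-list model can be sketched as follows; fetchResource is a hypothetical stand-in for an HTTP GET against the CDN edge, and the thread pool models the parallel connections discussed above.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Future;
    import java.util.function.Function;

    class IdListClient {
        // Fetch the (possibly cached) resources for an id-list in parallel.
        static List<Object> fetchByIdList(List<String> ids, ExecutorService pool,
                                          Function<String, Object> fetchResource)
                throws Exception {
            List<Future<Object>> futures = new ArrayList<>();
            for (String id : ids) {
                futures.add(pool.submit(() -> fetchResource.apply(id)));
            }
            List<Object> results = new ArrayList<>();
            for (Future<Object> f : futures) {
                results.add(f.get()); // block until all resources have arrived
            }
            return results;
        }
    }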

The advantage of the id-list model becomes more apparent upon consider-

ation of its invalidation mechanics. In the framework of the example, Q1

selected for employees with salaries below 30,000. Now consider an update

that changes the salary of employee 1 from 20,000 to 21,000. The DBaaS

now needs to invalidate resource 1 in the CDN but it does not need to inval-

idate the id-list, as the same objects still match the query-predicate. In the

example, the invalidation of id 1 would leave the CDN in the state of table

3.6.

Key Value

Q1 {{id : 1}, {id : 2}}
2  {id : 2, salary : 25,000}

Table 3.6: CDN after invalidation of id 1, id-list still matches query predicate.

The point of this model is that the id-list contains the information which

objects match the query-predicate, whereas the concrete objects are cached

separately. The key advantage compared to the object-list model is thus that

a single update only invalidates entries from the cache that have actually

changed, as opposed to invalidating a whole list of objects. If object 1 had

been updated to a salary over 30,000, this would have invalidated both the

id-list and the resource, as seen in table 3.7.

Key Value

2  {id : 2, salary : 25,000}

Table 3.7: CDN after invalidation of id 1, id-list does not match the query predicate any more.


Even after invalidating both id-list and individual resource, the cached re-

source 2 can still cause cache hits for other overlapping queries. It is not

hard to see how highly intersecting result sets can increase cache hit rates for

the overall system in this model. Note that in the new HTTP/2 standard,

multiplexing and server push can make the id-list an optimal choice for all

workloads, since round-trips would be the same as for the object-list model

[67].

So far, I have not explained how the DBaaS server detects if an update

invalidates a query-predicate. In the following sections, I will outline how the

task of query invalidation is a key factor in the performance of the caching

scheme.

3.2.3 Matching Queries to Updates

Matching queries to updates is necessary to determine which result sets are

not valid any more. I begin by describing the invalidation mechanism for

caching individual volatile objects, as described by Gessert et al. [6]. A

key point to understanding the invalidation process is remembering the dis-

tributed nature of a DBaaS-infrastructure. In principle, there are both mul-

tiple cache edges as well as arbitrarily many instances of the DBaaS middle-

ware server, interfaced for instance through an elastic load balancer. This

has the following implication to invalidation: In a system with more than one

DBaaS server, each individual server does not have sufficient information for

invalidation. An invalidation is only necessary if the object is cached, i.e.

if it was read from the DBaaS previously. The problem is that reads and

writes may be processed by different server instances. That means a

server receiving a write request cannot know on its own whether the object

might be cached from a read to another server. Hence, there needs to be a

central lookup-service that keeps track of cached objects and their expira-

tions. Any centralised service is a potential performance bottleneck. Gessert

et al. found an efficient solution by using Redis-backed Bloom filters [10, 68].

Every time an object is read and the DBaaS decides to cache with a certain


time-to-live estimation, it reports the key of the object and the ttl to the

Bloom filter. The Bloom filter is implemented to always keep track of the

longest expiration. If different DBaaS servers have different local estimates

of an optimal expiration, the Bloom filter keeps the ttl of the longest abso-

lute expiration in the future. Thus, whenever an update is processed, the

DBaaS can query the Bloom filter. If it has an entry for the key, the object is

potentially cached at some edge-server of the CDN. The DBaaS then deletes

the object from the Bloom filter and requests an invalidation.
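The keep-the-longest-expiration logic can be sketched as follows. For readability, this illustrative Java version uses an exact concurrent map; the production system would use a Redis-backed Bloom filter with expiring entries instead, trading exactness for constant space.

    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative sketch: tracks the latest absolute expiration reported per key.
    class ExpirationRegistry {
        private final ConcurrentHashMap<String, Long> expirations = new ConcurrentHashMap<>();

        // Called whenever a DBaaS instance caches an object with some ttl.
        void reportRead(String key, long ttlMillis) {
            long expiresAt = System.currentTimeMillis() + ttlMillis;
            // Keep the longest absolute expiration across all DBaaS instances.
            expirations.merge(key, expiresAt, Math::max);
        }

        // Called on every update: true if an invalidation must be sent.
        boolean needsInvalidation(String key) {
            Long expiresAt = expirations.remove(key);
            return expiresAt != null && expiresAt > System.currentTimeMillis();
        }
    }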

If the object has already expired from the cache, it also has expired from

the Bloom filter, since all estimated expirations are reported. This way,

invalidations are only requested when they are actually necessary, with some

small false positive rate through the Bloom filter. This approach cannot be

used for matching updates to queries because the relation between updates

and affected queries is one-to-many. The DBaaS has no way of knowing

which entries to query from the Bloom filter on an update, so it has to try

to match updated objects to result sets. In principle, the DBaaS can hold

the id-lists of all cached query results in memory, which is suitable for Monte

Carlo simulations.

For practical purposes, a distributed stream processing engine like Apache

Storm [69] might be appropriate for query matching. For every update, after

images of the write operation can be streamed into Storm, which evaluates

them against queries and result sets. If the after image of a write does not

match result sets containing the changed object, the queries belonging to the

respective result sets need to be invalidated from the cache, as illustrated in

figure 3.2. Practically, a load balancer routes requests to various instances

of the DBaaS server. Each instance communicates with the database cluster

to execute queries and updates. On every update, an invalidation engine is

consulted to determine which cached query results have become stale. Fi-

nally, a central Bloom filter service is consulted to look up if the stale result

is still potentially cached before sending out an invalidation. An overview

of this architecture can be found in figure 3.1. The implementation of the

matching algorithm will depend on the database-paradigm. For instance, a


Figure 3.1: Query matching architecture overview. A load balancer distributes requests from caches. An invalidation engine determines which query results are stale. Bloom filters can then be used to determine whether they are still cached at some CDN edge.


document database like MongoDB represents objects as JSON-documents.

There are specific libraries to evaluate MongoDB queries on JSON-documents

[70], thus enabling the matching of after-images to queries. Instead of going

into more detail on how to achieve query-matching for specific databases,

some high-level comments on the role of invalidation in the caching-scheme

are necessary. Generally, any matching system will only be able to handle

a certain throughput. A possible perspective on this limit would be to

consider the matching throughput a resource that needs to be leveraged op-

timally for overall performance. This naturally leads to the question of when

it is not feasible to cache an object or query, which I will briefly discuss in

the following section.
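For the id-list representation, where only membership changes matter, the matching step itself can be sketched in plain Java (instead of a full Storm topology; all names are illustrative):

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.function.Predicate;

    class QueryMatcher {
        // Evaluate the after-image of a write against all cached query predicates
        // and collect the queries whose result-set membership has changed.
        static Set<String> staleQueries(
                Map<String, Predicate<Map<String, Object>>> predicates,
                Map<String, Set<Object>> cachedIdLists,
                Object updatedId,
                Map<String, Object> afterImage) {
            Set<String> stale = new HashSet<>();
            for (Map.Entry<String, Set<Object>> e : cachedIdLists.entrySet()) {
                boolean wasInResult = e.getValue().contains(updatedId);
                boolean stillMatches = predicates.get(e.getKey()).test(afterImage);
                // Membership flipping in either direction invalidates the id-list.
                if (wasInResult != stillMatches) {
                    stale.add(e.getKey());
                }
            }
            return stale;
        }
    }

The linear scan over all cached queries illustrates why matching throughput is a bounded resource: the cost grows with the number of cached result sets, not with the size of the update.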

3.2.4 When Not to Cache

From the client’s perspective, reading a cached copy is naturally desirable.

Nevertheless, there are situations when it is impractical for the DBaaS to

cache objects. Entries that are (almost) exclusively written should not be

cached. This would increase the risk of stale reads and importantly cause a

high invalidation load. This notion of observing write and read frequencies

is employed in the estimation of expiration for query results in the next

section. However, there is another relevant aspect to the cost of caching.

The invalidation of a single resource comes at predictable computational

cost, i.e. a (constant time) Bloom filter lookup to determine whether the

resource is cached. In contrast, the matching cost of determining which

queries need invalidation is practically unbounded, as an object might be

part of arbitrarily many cached query results. This creates another decision

problem for the DBaaS. It does not only need to decide which caching model

to use for each query, it also needs to make economical decisions not to cache

some queries depending on the matching cost.


Figure 3.2: Topology of an Apache Storm invalidation pipeline. After-images of update operations are published to Storm spouts (data stream endpoints). They determine which bolt (stream processing node) holds the cached queries related to that update. Bolts evaluate the queries on the after-image to find which result sets are invalid and notify the DBaaS, which sends invalidations to the cache.


3.3 Estimating Expirations

3.3.1 Approximating Poisson Processes

I now turn to discussing the estimation model for cache expirations. For

now, the problem of estimating an optimal ttl for a query result is treated

separately from the question of whether to represent the query result as an

object-list or an id-list. Remember, the goal of estimating expirations for

queries is to find an optimal trade-off between invalidation load and cache

hits while also minimising stale reads. Ideally, a cached item will expire right

before an update at the DBaaS so there is no matching cost. My approach

to estimate durations for result sets of queries tries to approximate query

behaviour through Poisson processes.

Poisson processes count the occurrences of events in time intervals and are

characterised by an arrival rate λ and a time interval t. For a Poisson process,

the interarrival times of events have an exponential cumulative distribution

function (CDF), i.e. each of the independent and identically distributed

random variables Xi has the CDF

F(x; λ) = 1 − e^(−λx) for x ≥ 0    (3.1)

and mean 1/λ. The probability for a number of arrivals n in some interval

(0, t] is then given by the Poisson probability mass function (PMF) [71]:

p_{N(t)}(n) = (λt)^n e^(−λt) / n!    (3.2)

The DBaaS can only approximate the λ of the write-process. For each

database entry, the DBaaS can track the rate of incoming writes λw in some

time window t. The expected time of the next write is then 1/λw. However,

the Poisson process of reads and queries is only partially observable, as the

DBaaS only receives cache misses on queries and reads. In previous work,

expirations for single records were estimated by comparing miss rates and


write rates to compute quantiles on write probabilities [6]. How can expi-

rations for complete result sets be estimated? The result set of a query Q

of cardinality n can be conceptualised as a set of independent exponentially

distributed random variables X1, . . . , Xn with different write rate parame-

ters λw1, . . . , λwn. Estimating the expected time-to-live before one of the

objects is written requires a distribution that models the minimum time to the

next write, i.e. min{X1, . . . , Xn}, which is again exponentially distributed

(proof in appendix A):

min{X1, . . . , Xn} ∼ Exponential(Σ_{i=1}^{n} λi)    (3.3)
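In short (the full proof is in appendix A), the survival functions of independent exponentials multiply:

P(min{X1, . . . , Xn} > x) = Π_{i=1}^{n} P(Xi > x) = Π_{i=1}^{n} e^(−λi x) = e^(−(λ1 + . . . + λn) x),

which is the survival function of an exponential distribution with rate λ1 + . . . + λn.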

Hence, the DBaaS can simply compute λmin as the rate-parameter for Q by

summing up write rates on individual records:

λmin = λ1 + . . .+ λn (3.4)

It is questionable whether cache miss rates should be tracked and compared to

write rates, as proposed in previous work. Ultimately, DBaaS providers

are interested in the workload mixture of reads/queries and writes on a given

table or collection. For instance, if the workload is dominated by write oper-

ations, ttls should be estimated rather conservatively to reduce invalidations

and stale reads. However, if the read process cannot be directly observed,

there are two options. First, the model can simply rely on writes. Second,

the model can try to approximate the workload mixture of reads and writes

through various measures. In the remainder of this section, I will outline both

alternatives. Further, I will comment on some practical issues of real-time

monitoring at the end of this chapter.


3.3.2 Write-Only Estimation

From a perspective of scalability, tracking miss rates on every database record

can be too expensive. However, one could also take a position of ignoring the

read proportion of the workload. Instead, one could base the ttl estimation

simply on the probability of the next write. This requires the inverse CDF

(or quantile function) of the exponential distribution parametrised by λmin

to estimate expirations. The quantile function then provides time-to-lives

that have a probability of p of seeing a write before expiration:

F^(−1)(p, λmin) = −ln(1 − p) / λmin    (3.5)

Using the median inter-arrival time of writes (p = 0.5) then gives a straight-

forward ttl estimate for the result set of a query:

F^(−1)(0.5, λmin) = ln(2) / λmin    (3.6)
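A minimal Java sketch of this write-only estimator, combining equations (3.4) and (3.5); write rates are assumed to be tracked per record in the same time unit as the returned ttl:

    class WriteOnlyTtlEstimator {
        // Equation (3.4): the rate of the minimum is the sum of the rates.
        // Equation (3.5): invert the exponential CDF at quantile p.
        static double estimateTtl(double[] writeRates, double p) {
            double lambdaMin = 0.0;
            for (double lambda : writeRates) {
                lambdaMin += lambda;
            }
            // A rate sum of zero (no observed writes) yields an unbounded ttl.
            return -Math.log(1.0 - p) / lambdaMin; // ln(2)/lambdaMin for p = 0.5
        }
    }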

The problem with this approach is that it does not provide a good intuition

about the trade-off between cache hit rate and latency. It completely ignores

whether a workload mixture consists primarily of reads or if it is dominated

by writes. Fundamentally, service providers need to determine how many

potential cache hits they are willing to trade for one invalidation that carries

the risk of a stale read with it. The expected reduction in invalidations is

(1 − p) · writes: for p = 1, every write is expected to cause an invalidation,

for p = 0 (no caching), no object is invalidated.

If the model completely ignores reads, it might not be flexible enough to

deal with changing workloads. For instance, to instantly increase cache hits

and thus reduce database load, p could be increased to e.g. 0.75. Developers

could also specify a p for a given table or collection of documents by choosing


from predefined options. This is somewhat unsatisfying, as developers cannot

be realistically expected to be aware of the detailed tuning mechanisms in

the caching infrastructure. Another possible issue with this model is that it

performs differently depending on the chosen cache model. For an object-

list, anticipating the next write on the result set is sensible as it invalidates

the whole result. For id-lists, the next write is only relevant if the changed

object does not match the query predicate any more. It is thus possible that

optimal quantiles differ for the different representations.

3.3.3 Dynamic Quantile Estimation

The goal of dynamic quantile estimation is to determine a p in the inverse

CDF that reflects workload mixture as well as the tolerance on eventual

consistency (higher consistency requirements lead to fewer cache hits). Instead

of comparing cache misses on records and using the miss-to-write ratio as

a proxy, one can directly estimate the workload mixture in the first step.

Later, this estimate can be used to adjust quantiles of the next expected

write. Again, one can argue that using the miss rate at the database is

not informative enough. Since the true workload is hidden behind caches,

the DBaaS cannot use the miss rate to infer if the workload even warrants

caching. There are multiple possible models for estimating the workload

mixture. First, the developer can specify the expected workload mixture.

Note that this is different from the write-only model, where it was suggested

the developer could directly choose a quantile. Providing a workload mixture

is much more intuitive, as the developer can be expected to know whether a

schema is primarily read or written.

Another option is based on the insight that some objects will not be cached

at all for various reasons. First, every cached object requires an entry in

the server-side expiring Bloom filter, thus increasing probability of a false

positive lookup, which in turn causes unnecessary invalidations. Second, the

limited cache size can force the DBaaS to mark some objects as uncachable.

This issue is related not only to the workload mixture, but also to the request


distribution. The workload mixture is the proportion of reads, writes and

queries, whereas the request distribution describes how often individual keys

are accessed by operations. In a typical Zipfian request distribution, some

objects will be written extremely frequently, even though the workload mix-

ture is dominated by reads. Furthermore, the expected bounds on stale reads

depend on the latency distribution of the invalidation request: the longer it

takes for an invalidation to complete, the higher the cumulative probability

of a stale read. In summary, the overall workload mixture for a table can

be estimated by marking some objects as uncachable for various reasons and

then measuring their read/write mixture.

Finally, one could track other query metrics through CDN log analysis, as

proposed by Ozcan et al. [72, 73]. Query Shareness (QS) quantifies how

many clients request a certain query, which is also interesting to the object-

list versus id-list decision, as a query that is shared by multiple users can

particularly benefit from a pre-warmed cache. Query frequency stability

(QFS) models the popularity change of query frequency over time.

After obtaining an estimate of the workload mixture, there are multiple op-

tions to map estimates to quantiles. Using offline optimisation, a provider

can obtain optimal values for typical workload mixtures. Quantiles can then

simply be looked up from a configuration file. It is however questionable if

such a model can reflect the nature of drastically changing workloads, e.g.

applications suddenly going viral. Alternatively, an online learning model

could use a budgeting approach. If there is a limited number of invalidations

the system can perform, quantiles can be adjusted according to the number

of invalidations performed. For instance, if the invalidation load is too high,

the probability of seeing a write within the time-to-live of a cached object

needs to be lowered.

In summary, this chapter has introduced different query-caching execution

models that are based on record-level access frequencies. I have also discussed

strategies to invalidate result sets of queries, ttl estimation based on Poisson

processes as well as various practical limitations. The baseline of all these

considerations is a very long static ttl, which causes a maximum of cache


hits, invalidation cost and stale reads. In the next chapter, these insights

are combined into an online decision model that can achieve fine-grained

performance trade-offs.


Chapter 4

Online Learning

4.1 Introduction

In the previous chapter, a theory of different execution models and their

constraints was introduced. Specifically, trade-offs between execution models

and parameters of ttl estimation were discussed. However, these insights are

only actionable if the DBaaS has a decision model that can adapt to changing

request loads. In this chapter, I first describe the decision problem in a formal

framework and then derive a solution. Further, I introduce a generic method

to find optimal parametrisations through utility functions.

In order to construct a learning model, one first needs to consider what

information is available at what point in the decision process. The learning

process will also need to consider the granularity of decision making both for

the execution model as well as for ttl estimation. I begin by considering what

is available to the DBaaS. The DBaaS can monitor reads and writes, and it

issues invalidation requests after updates. However, the DBaaS knows neither the

exact status of the various caching layers nor the latencies a client sees

for specific requests. Next, the processing flow of a potential decision model

needs to be considered. The base case is a system that has not processed

any queries yet and all caches are empty. At some point, the server receives


an initial query. The challenge from the server’s perspective is that it does

not know anything about the result of this query yet but still has to make a

decision on how to execute the query.

As explained previously, the DBaaS can either order the database to execute

a covered query on the index that only returns ids or a full query that returns

all entries matching the query predicate. The query result is more informa-

tive than the query itself. Since the result contains the specific database

objects or at least their ids, any available metrics on these objects can be

used to improve future decisions. In principle, the model aims to improve

decision making by considering how the previous decision impacted system

performance, i.e. average response times for clients and load at the back-

end. In the following sections, I express this problem in a formal framework,

present my solution and reason about the issues related to the scalability and

performance of real-time learning.

4.2 Representation as an MDP

4.2.1 State and Action Spaces

Finding a closed-form solution might be impractical due to the complex and

stochastic nature of the problem. Many of the relevant variables like write

rates, workload mixture and response times can only be approximated at

the DBaaS. This lack of an analytical model suggests that reinforcement

learning could be a sensible approach. This requires the task to be framed as

a Markov decision process. The learning model is hence constructed by first

considering each component of an MDP with regard to the problem. I then

derive a model that I believe best captures the constraints of the problem.

For now, the decision not to cache an object is ignored and deferred to the ttl

estimation. This means the learning model only makes a decision between

execution models and the ttl estimation model can then estimate a ttl of 0

if it decides the object should not be cached. Clearly, the space of possible


actions in this simplified scenario is A = {object-list, id-list}. One could then

argue that queries should constitute the space of states, as the decision model

must map queries to actions. Each action would then lead to a new query

as the next state, i.e. Queries × A → Queries. There are multiple problems

with this representation. It is questionable whether using queries as states

even satisfies the Markov property since the effects of a decision taken in

a state do not only depend on that single query. An incoming query does

not capture all information relevant to the DBaaS. As I argued in chapter

3, various realtime metrics need to be taken into account. Further, an RL

agent assumes that its actions determine its next state. Even if there is a

probabilistic transition model that assumes a distribution of possible states

for a decision, this is not a valid assumption. Queries from different clients

are not in any causal relationship. Assuming that a decision on one query

leads to another query as a new state is thus not a useful intuition.

4.2.2 Decision Granularity

The observation that a decision model should depend on access patterns and

workload metrics leads to multiple insights into the model structure. First,

this suggests that states could be conceptualised as a set of load metrics

instead of single queries. This implies a large state space that cannot be

represented as a lookup table and must be approximated either through a

linear sum of weighted features or a non-linear approximator like a neural

network.

Second, using the global system performance as a state has consequences

on the granularity at which decision making is sensible. Measuring the im-

pact of the decision on a single query on the system is infeasible. While the

model aims to make ttl estimates on the level of individual query result sets,

the execution model might be captured on the level of tables or document

collections. As shown in the examples in chapter 3, the choice of execu-

tion model should in part depend on how much query predicates overlap.

Consequently, a sensible model might assess access patterns on the level of


collections and use a single execution model for all queries on that collection

or for all parametrisations of prepared queries.

4.2.3 Reward Signals

Before discussing how to map states to actions in a formal manner, a reward

signal needs to be specified. A fundamental problem in online learning is

the definition of a good reward function. In data management, users and

providers are often interested in learning how to achieve very specific trade-

offs on various performance metrics, which are then expressed through service

level agreements. Naturally, one can only achieve trade-offs on features that

are modelled into the reward function. For many examples in reinforcement

learning, this is a straightforward measure such as the score in a game or

making it to a certain height in the mountain car problem [74]. The difficulty

is then rather to learn an approximation of the cost for actions in continuous

state spaces from noisy and delayed rewards. For the decision model, the

structure of the reward signal itself is a challenge.

I begin by recapitulating features that are relevant to the execution model.

For a given query, the database returns a result comprised of objects or

keys (ignoring the trivial case of an empty result). In general, the goal is to

minimise invalidations on these keys, to maximise cache hits, and to minimise

overall query latency. However, only invalidations are directly visible to the

DBaaS through the server-side expiring Bloom filter. In turn, cache misses

registered at the DBaaS might be used as a proxy for cache hits. Earlier,

I argued that cache misses cannot be used to infer the workload mixture.

Nevertheless, a learner can still extract a reward from just comparing the

total amount of cache misses in a given time period for different decisions.

Further, while request latency for clients is unknown, a learning model could

instead use the expected relative cost between execution models as a reward.

Using the id-list model, the relative latency cost is a factor of result set

cardinality and parallel connections.


Requesting all resources from a list of ids is more expensive by a factor of

⌈card(result set) / connections⌉.    (4.1)

The key point is that the model lacks an absolute notion of the quality of

an action. The exact number of invalidations or cache misses following a

sequence of decisions is not meaningful. While a certain absolute number of

invalidations can be seen as an indicator for uncachable objects, cache misses

are only meaningful when compared between execution models under the

assumption that the workload is constant during the period of observation.

It is also notable that the same metrics that I suggested to represent a state

are used in the reward signal. Specifically, the state is comprised by the over-

all load, whereas the reward consists of the specific metrics for keys that are

part of a query result. The structure of the reward suggests that the model

needs to continuously compare choices for the same queries to see which de-

cision yields the higher relative reward. So far, this approach has not dealt

with the question of how to map the continuous state space to a binary set

of actions. While substantial research efforts have gone into approximation of

continuous state and action spaces [75, 51, 52], it is questionable whether this

effort is necessary here. If reward features are a subset of state features and

states need to be mapped to actions according to relative rewards, the model

can simply represent its policy as a probability distribution over actions to

sample from, as actions need to be constantly compared for relative rewards.

In the following section, I hence combine the previous observations into a

model that directly updates the belief state about the optimal distribution

of actions.


4.3 Belief State Approximation

I propose the following model: each collection or table begins with a prior on

execution models, e.g. without further assumptions one might use a uniform

prior with p(object-list) = 0.5 and p(id-list) = 0.5. A learning period is

defined by the number of samples n that the model collects before updating

its belief state. Further, the model is parametrised through the interval

length of the moving window at which writes, invalidations and cache misses

can be tracked. Every time the DBaaS receives a query, a decision on the

execution model is made by drawing from the distribution, e.g. the model

optimises on the overall distribution for a collection of entries. One could

also imagine the model learning a distribution for all parametrisations of

a prepared query, e.g. a query that always requests the same content but

allows for user-defined filters. After query execution, a reward r on a list of

k result ids id1, . . . , idk for a sample is computed through

r = (1/c) · Σ_{j=1}^{k} ( ω1/invalidations(idj) + ω2/cache misses(idj) ),    (4.2)

with c being the relative cost of execution

c = ⌈k/connections⌉ if id-list,  ω3 if object-list    (4.3)

and invalidations and cache misses representing their approximated frequen-

cies. An inverse sum is used because the goal is to minimise these values.

Scalable sampling methods to approximate these frequencies will be discussed

at the end of this chapter. The reward also needs to include weights ω1, ω2, ω3

to be able to express a preference towards lowering invalidations, cache misses

or response times at the client. For instance, increasing ω3 would increase the

reward for using object-lists, thus generally lowering client latency. At the

end of a learning period (n samples and rewards), the belief state is updated

batch-wise. First, the model computes the normalised total reward for each


execution model by averaging over the number of samples out of n for which

the decision object-list (n1) or id-list (n2) was made:

r̄object-list = (Σ_{i=1}^{n1} ri(object-list)) / n1 ,

r̄id-list = (Σ_{i=1}^{n2} ri(id-list)) / n2    (4.4)

Finally, the current belief state is batch-updated through

p(object-list)_{t+1} = p(object-list)_t + α_t · (r̄object-list − r̄id-list) / (r̄object-list + r̄id-list)    (4.5)

and

p(id-list)_{t+1} = 1 − p(object-list)_{t+1},    (4.6)

where αt ∈ [0, 1] is the learning rate at time point t. Again, the reason

updates are performed through batch-wise comparison is that rewards on

single queries are deemed to be too noisy. Intuitively, the model simply

samples rewards for decisions, compares rewards and shifts its belief state

according to the difference in rewards in the observation period normalised

by the total reward obtained. The learning rate can either be held constant

or tuned proactively. An apparent disadvantage of the model is that, as

it converges towards one execution model, fewer and fewer samples will be

drawn from the model that is deemed to be less relevant. Hence, special

consideration needs to be taken with regard to convergence strategies.

4.3.1 Convergence and Exploration

There are multiple convergence scenarios. First, the model could converge

to a mixture that does not put a clear preference on one execution model,

which could also be caused by an unfavourable parametrisation that leads

to too much noise or too little data, e.g. an observation window for reward


measurements is too short. This could be defined as a case where

0.4 ≤ p(object-list) ≤ 0.6 and thus also 0.4 ≤ p(id-list) ≤ 0.6. This is an

unfavourable outcome, as it implies that random decisions on a uniform prior

are sufficient (hence no learning necessary). If the model converges strongly

towards one model, e.g. a probability of 90% or more for a single decision, it

might not be able to adapt to changing workloads later. A typical solution

to this problem is to introduce a small probability ε where a non-dominant

action is taken, a so-called epsilon-greedy approach [76]. The model greedily

chooses the presumed best action with a probability of 1 − ε and otherwise

a random action. This can be practically achieved by bounding probabilities

for one decision by (1−ε). In the experimental evaluation, I will demonstrate

how this enables the model to detect and adapt to changing workloads.
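To summarise, the following Java sketch combines the batch update of equations (4.4) and (4.5) with the ε-bounded probabilities described above; the epsilon value and the class shape are illustrative assumptions, and both execution models are assumed to have been sampled at least once per learning period.

    import java.util.Random;

    class BeliefState {
        double pObjectList = 0.5;            // uniform prior
        static final double EPSILON = 0.05;  // assumed exploration floor

        // Sample an execution model from the current belief state.
        boolean decideObjectList(Random rng) {
            return rng.nextDouble() < pObjectList;
        }

        // Batch update at the end of a learning period.
        void update(double[] objectListRewards, double[] idListRewards, double alpha) {
            double rObj = mean(objectListRewards);  // equation (4.4)
            double rId = mean(idListRewards);
            double shift = alpha * (rObj - rId) / (rObj + rId);  // equation (4.5)
            // Equation (4.6) is implicit: p(id-list) = 1 - pObjectList.
            pObjectList = Math.min(1 - EPSILON, Math.max(EPSILON, pObjectList + shift));
        }

        private static double mean(double[] xs) {
            double sum = 0.0;
            for (double x : xs) sum += x;
            return sum / xs.length;
        }
    }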

4.3.2 Sampling Techniques

Various methods exist to approximate streams of incoming data [77]. Good

examples for approximations are cache miss and invalidation frequencies

through a moving window of arrival times. However, more sophisticated

methods exist: (biased) reservoir sampling can be used to summarise streams

by keeping a fixed-size reservoir of representative values and updating the

reservoir through a bias function [78, 79]. Initially, all incoming values are

used to fill the reservoir. A bias function (often exponential) f(r, t) is then

used to define a relative probability of an r-th point still belonging to the

reservoir at the arrival of a later arriving t-th point. Alternatively, one can

simply replace elements in the reservoir with a certain rate uniformly at ran-

dom. The advantage of the reservoir sampling approach is that it does not

completely ignore values after a certain period (like a moving window) [80].
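A minimal Java sketch of uniform reservoir sampling (the classic algorithm R); a biased variant would replace the uniform acceptance test with a bias function f(r, t):

    import java.util.Random;

    class Reservoir {
        private final double[] sample;
        private long seen = 0;
        private final Random rng = new Random();

        Reservoir(int size) {
            sample = new double[size];
        }

        void offer(double value) {
            seen++;
            if (seen <= sample.length) {
                sample[(int) (seen - 1)] = value;  // fill phase
            } else {
                long j = (long) (rng.nextDouble() * seen);  // uniform in [0, seen)
                if (j < sample.length) {
                    sample[(int) j] = value;  // replace with probability size/seen
                }
            }
        }
    }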

Another aspect of sampling and approximation is extrapolation. For a large

database, it is infeasible to hold moving windows for all database records in

memory. Instead, one should expect to extrapolate from a set of representa-

tive records. I expect the learning model to be computationally inexpensive,

as it primarily consists of in-memory summations. From a practical perspec-


tive, this makes the model very favourable, as many sophisticated prediction

techniques require costly matrix factorisations that are problematic for scal-

able realtime learning. For instance, inference on Gaussian processes runs

with O(n3) runtime and O(n2) space complexity [81]. Sparse matrix approx-

imation techniques exist, but are rather targeted at offline processing of large

datasets and do not operate on a timescale of milliseconds [82, 83].

4.3.3 Hyperparameter Optimisation

At various points in this work, I have pointed towards trade-offs in consistency,

latency, server load and cache efficiency. For instance, the reward function

is parametrised through weights ω1, ω2, ω3 that characterise a preference

between cache misses, invalidations and response times. It is however not straightforward

to define parameters that express a specific performance level. For instance,

a DBaaS provider might desire to analyse the required performance at var-

ious components to achieve a certain average latency for a specific caching

topology. This section briefly explains how to optimise the parameters of the

learning model. First, one needs to define a global utility of an instance of

the Monte Carlo simulation. An instance means running a certain workload

with specific request and latency distributions and monitoring all perfor-

mance measures of interest. The global utility u of an instance is a linear

combination of n utility functions f that map concrete values to a normalised

utility

u = Σ_{i=1}^{n} ωi fi.    (4.7)

For illustration, consider a possible utility function for average query latency

at the client, as seen in figure 4.1. Here, an average latency below 50 millisec-

onds is desired. Latencies of about 100 milliseconds are already considered

to be of much less utility and latencies close to 200 milliseconds are of no

value, e.g. due to a service level agreement.


Figure 4.1: Utility function example for response times.
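For illustration, the sketch below approximates the curve in figure 4.1 with a piecewise-linear utility (the exact shape of the plotted function is not specified, so the linear segment is an assumption) and combines normalised utilities as in equation (4.7):

    class UtilityModel {
        // Piecewise-linear stand-in for figure 4.1: full utility below 50 ms,
        // no utility above 200 ms, linear decay in between.
        static double latencyUtility(double avgResponseMs) {
            if (avgResponseMs <= 50) return 1.0;
            if (avgResponseMs >= 200) return 0.0;
            return 1.0 - (avgResponseMs - 50) / 150.0;
        }

        // Equation (4.7): global utility as a weighted sum of utilities.
        static double globalUtility(double[] weights, double[] utilities) {
            double u = 0.0;
            for (int i = 0; i < weights.length; i++) {
                u += weights[i] * utilities[i];
            }
            return u;
        }
    }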

In general, a configuration can then be found according to the following steps:

(1) Definition of a hyperparameter space, e.g. parameters ω1, ω2 ∈ (0, 1)

define a two-dimensional grid.

(2) Definition of a linear combination of utility functions on the perfor-

mance metrics of interest, thus mapping concrete desired values to a

normalised score.

(3) By repeatedly drawing samples from the hyperparameter space, a local

optimum is determined.

The key insight is that this method does not draw at random or by using

a stochastic gradient descent, which would only take into account local im-

provements. Traditional approaches include random search, grid search and

manual search of optimal parameters [84, 85]. However, these approaches

can be inefficient if the reward function is expensive to evaluate. In con-

trast, Bayesian approaches using Gaussian processes construct a probabilis-

tic model of the reward function and make educated estimates on where in

the parameter space to next evaluate the function. This is done by utilising

all available information from previous evaluations instead of just making a


local estimate [86]. I use the Spearmint framework described by Snoek et al.

to perform Monte Carlo optimisation with Gaussian processes [87, 88].


Chapter 5

Evaluation

5.1 Aims

This chapter describes the implementation, the experimental set-up and the

experimental evaluation. First, however, the evaluation goals need to be

defined. The experiments should

(1) confirm the trade-offs of the different execution models suggested by

my theory,

(2) investigate the relationship between the stochastic ttl estimation model

and consistency,

(3) demonstrate that the estimation method is superior to a static model

with regard to invalidation load, and

(4) validate the learning model as a method to achieve the desired trade-

offs.

It is also necessary to understand the baseline of the evaluation. Section

5.3 compares cache hit rates and response times between different execution

models. DBaaS providers usually do not cache their dynamic content be-

cause of consistency and invalidation issues. An appropriate baseline is thus

a DBaaS that does not cache its dynamic content. Section 5.4 investigates


consistency and invalidation load for the proposed model. Specifically, it is il-

lustrated how naive caching techniques for dynamic content are bottlenecked

by invalidation cost (and hence not used in practice). Finally, section 5.5

analyses the performance of the online learning scheme.

5.2 Simulation Framework

5.2.1 Design and Implementation

All experiments were carried out in a Java 8 simulation framework. I chose

Java for its concurrency utilities and for the availability of some required

libraries. The implementation is based on previous work on dynamic caching

as well as the Yahoo Cloud Serving Benchmark (YCSB) [89, 6]. YCSB is an

established framework to benchmark cloud databases by providing a set of

typical workloads and a common interface for standard database operations.

It thus enables a comparison of database performance. In order to compare

two databases, users can provision a certain computing power (often Ama-

zon EC2 instances [90]) and then deploy the benchmark by implementing

the interface and specifying a workload. A workload is defined as a set of

parameters like read rate and write rate (e.g. 50/50), a request-distribution,

a desired throughput, the number of objects in the database, the number

of fields per entry and the length of these fields as well as the number of

concurrent clients.

In previous work, YCSB was extended to analyse caching behaviour of in-

dividual database entries [6]. In particular, no actual database was used in

my previous work with Gessert et al. Instead, a database was simulated as

a hash table. In this work, I abandoned the YCSB framework in favour of a

dedicated query simulation framework. I reused and extended some classes,

namely modules for treating individual resources. Specifically, I adapted

and modified the simulated cache class, the staleness detection mechanism

through time stamping, the moving window mechanism used to collect fre-


quencies, as well as the expiring Bloom filter. In the code, I have commented

each individual class to indicate whether it was reused, modified existing code

or completely independent. The main difference of my framework compared

to our previous work is the ability to generate, execute and evaluate queries.

Queries were constructed by drawing projections from specified ranges (e.g.

field 1 > 10) and then parsed and executed on a MongoDB server. The

advantage of using MongoDB is that users do not have to specify a schema.

Thus, the benchmark can simply insert and overwrite documents with ar-

bitrary specified contents instead of having to declare typed attributes. In

further benchmarks, one could also set up a schema to test other database

paradigms (e.g. relational), e.g. by following the specifications of the estab-

lished TPC-C benchmark [91].
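A hypothetical sketch of such a query generator (field names, ranges and the use of $gt are illustrative):

    import java.util.Random;
    import org.bson.Document;

    class QueryGenerator {
        // Draw a field and a bound at random and build a MongoDB range
        // predicate such as {field1: {$gt: 10}}.
        static Document randomRangeQuery(Random rng, int numFields, int maxValue) {
            String field = "field" + rng.nextInt(numFields);
            int bound = rng.nextInt(maxValue);
            return new Document(field, new Document("$gt", bound));
        }
    }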

Figure 5.1 provides an overview of my implementation. The main compo-

nents are clients (each associated to a thread generating requests), a cache

instance and the DBaaS endpoint that is managing database access, ttl es-

timation, invalidations, and learning. After specifying the workload param-

eters, the simulation populates MongoDB and ensures indices. All Mon-

goDB requests are executed with the write concern “acknowledged”. Write

concerns are guarantees on consistency after updates which directly affect

performance. For instance, “acknowledged” as the default concern means

that changes have been applied to the in-memory view of the data. Clients

then continuously generate requests that are routed to the CDN edge server,

which forwards them to the DBaaS. The DBaaS server parses requests and

executes queries on MongoDB while consulting the learning module for de-

cisions on execution models and the ttl estimator for expirations. On every

update, the query matching engine is consulted to decide which query results

have to be invalidated. The specific control flow of query caching and the

decision model have been extensively covered in chapters 3 and 4. A detailed

explanation of the individual modules can be found on the project website

[92].


Figure 5.1: Overview of the simulation architecture.


5.2.2 Benchmark Configuration

All experiments were carried out on a machine with 16 GB RAM and a

quad-core i5 CPU (2.8 GHz). Further, normal distributions were used for

the latencies between client and cache, cache and DBaaS (using known Ama-

zon EC2 region latencies) and for invalidation latency (using data from the

Fastly CDN [64]). It should be noted that a distributed benchmark was

not performed. A requirement of this project was a cost-neutral evaluation.

While Amazon Web Services provides free micro tier instances to students,

these are not very useful here. Matching updates to query results is a compu-

tationally expensive task. Executing the benchmark on micro tier instances

would skew the results.

5.3 Comparing Execution Models

5.3.1 Read-Dominant Workload

The evaluation begins by comparing the object-list and the id-list model for

typical workloads. The model suggests that caching whole query results leads

to much lower latency due to fewer round-trips. In turn, I expect

higher cache hit rates when caching results as id-lists because intersecting

query predicates benefit from sharing cached entries. First, I examine a

typical read-heavy workload that consists of 95% reads and queries and 5%

updates on a Zipfian access distribution, e.g. photo tagging. To clarify, a

read is equivalent to a GET request on a single resource identified by its key,

whereas a query consists of at least one projection and requires evaluation by

the database’s query engine. The workload initially inserts 1000 documents

with 10 fields of random data each and then performs 100,000 requests by

10 parallel threads (each one connection), beginning with a mixture of 40%

reads, 55% queries and 5% updates to demonstrate the principal difference

in execution models.


Figure 5.2: Cache hit rates as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes.

Figure 5.3: Average query response times as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes.

Figures 5.2 and 5.3 show how the performance of execution models relates

to average query selectivity. Further, response times for a DBaaS without


dynamic query caching are shown. A query selectivity of 1 means that the

query predicate matches all objects in the database and a query selectivity of

0.1 means that the predicate matches 10% of all keys, i.e. selectivity indicates

how much result sets of different queries intersect. First, the result matches

the expectations as the object-list execution model achieves better response

times than the id-list model but has a worse cache hit rate. For an uncached

DBaaS, every request requires a full round-trip to the backend, resulting in

noticeable response times for the client particularly if the application requires

more than one round-trip.

There are two artifacts in figure 5.3 worth discussing. For a query selectivity

of 1, i.e. the query predicate matching all documents in a collection, both

models have slow response times. This is due to collapsed forwarding in the

cache. If many clients request the same content from a cache edge server, the

cache will collapse the requests to a single database query, thus blocking mul-

tiple clients. This decrease in request parallelism causes longer response times

for clients. Additionally, write locks also block incoming reads. Further, re-

sponse times for very selective queries (average selectivity of 0.0001) are very

similar because the predicate matches only one object in the simulation (or

none). This means that most queries will be uncached and thus require a full

round-trip to the DBaaS. Note that a round-trip between Europe and USA

EC2 regions is around 170 milliseconds [6], matching the result. Cache hit

rates on resources are still drastically different, because in the id-list model,

normal reads still pre-warm the cache for query results. The reason it still

takes a full round-trip is that the client first needs to fetch the id-list.

The results shown in this section were averaged over 5 runs. Considering the

probabilistic nature of the experiments, the simulation is very consistent.

For the plot in figure 5.2, the average cache hit rate for the id-list model

for an average query selectivity of 1 is 98.73% with a standard deviation

(sd) of 0.00015 and 95% confidence intervals (CI) of (0.9871, 0.9875). The

average cache hit rate of the object-list model is 74.34% (sd = 0.0035, CI =

0.7190, 0.7278). Similarly, the average response time for the id-list model is

158.88 ms (sd = 0.7 ms, CI = 158.01, 159.74). For the object-list model, an


average of 65.75 ms (sd = 0.55 ms, CI = 65.07, 66.44) is observed. In sum-

mary, errors were negligible: the simulation converges to the desired target

distributions after a few thousand requests. Since each workload executes

100,000 requests, small fluctuations (e.g. garbage collection) are averaged

out. Errors are hence omitted in further experiments.

The same experiment was repeated for a workload with a uniform access

distribution on the keys, as seen in figures 5.4 and 5.5. Notably, there is no

spike in latency at a selectivity of 1, as was observed in figure 5.3. Since

individual reads are now uniformly distributed over the key space, there is

less lock contention due to writes and thus more query parallelism at the

database. However, one can observe the same general trend and I will thus

in the remainder of the experiments use a Zipfian access distribution, which

is a more typical case [93, 94].


Figure 5.4: Cache hit rates as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes under a uniform access distribution.

In the experiments above, all expirations are estimated by only tracking

incoming writes, as described in chapter 3. The cumulative probability of

a write within the expiration is adjusted to p = 0.75 to enable high cache

hit rates on a read-heavy workload. While the impact of write-quantiles on



Figure 5.5: Average query response times as a function of average query selectivity on a mixture of 40% reads, 55% queries and 5% writes under a uniform access distribution.

invalidation load and eventual consistency will be analysed in the upcoming

sections, the execution models will first be compared under another workload.
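To make the quantile-based estimation concrete, the following sketch derives a TTL from an object's write history, assuming (consistently with the exponential model in appendix A) Poisson write arrivals; the function and parameter names are illustrative, and p < 1 is assumed.

    import math

    def estimate_ttl(write_timestamps, p=0.75, max_ttl=60.0):
        # Choose the TTL so that the cumulative probability of a write within
        # it equals p, with the write rate estimated from the observed history.
        if len(write_timestamps) < 2:
            return max_ttl                             # no rate estimate yet
        span = write_timestamps[-1] - write_timestamps[0]
        rate = (len(write_timestamps) - 1) / span      # writes per second
        # Exponential inter-arrival times: F(t) = 1 - exp(-rate * t) = p.
        return min(-math.log(1.0 - p) / rate, max_ttl)

    # An object written roughly every 10 seconds, with p = 0.75:
    estimate_ttl([0.0, 9.0, 21.0, 30.0])               # ~13.9 s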

5.3.2 Write-Dominant Workload

Figures 5.6 and 5.7 show the same metrics for a write-heavy workload that

consists of 50% writes, 25% queries and 25% reads.

Figure 5.7 shows a similar overall trade-off between execution models as the

read-heavy workload. However, there is a clear difference in cache hit rates,

as the object-list model provides very weak cache performance. This was

not the case in the read-heavy workload, where cache hit rates began high

but degraded with increasing selectivity. In turn, average response times are

rather high (above 400 milliseconds) for the id-list model. Notably, they are

even worse than not caching at all. Since objects are written very frequently,

clients typically fail to get a cache hit on the id-list in the CDN in the first step.



Figure 5.6: Average query response times as a function of average query selectivity on a mixture of 25% reads, 25% queries and 50% writes. Estimation quantiles have been adjusted to p = 0.4 to account for the write-dominant workload.


Figure 5.7: Cache hit rates as a function of average query selectivity on a mixture of 25% reads, 25% queries and 50% writes.


After retrieving the id-list, clients have to iterate over the individual re-

sources, which also might not be available in the cache. Hence, in this case

it is more economical to use the object-list model. This experiment has il-

lustrated how demanding write-dominant workloads are for the DBaaS. In

order to maintain reasonable latencies at clients, one has to accept low cache

efficiency. In the following section, I investigate how ttl estimations affect

invalidation load and client consistency.

5.4 Consistency and Invalidations

5.4.1 Adjusting Quantiles

This section deals with the effect of write quantiles on client consistency and

cache hit rates. Specifically, the experiments should quantify how invalida-

tion load and stale reads are connected to the cumulative write probability

within a cache expiration duration. I again consider the write-dominant

workload that causes expensive trade-offs on cache efficiency for acceptable

client latency. The first experiment investigates staleness using the data on

invalidation latencies provided by the Fastly CDN [64]. Figure 5.8 shows

the absolute number of stale reads. As expected, stale reads increase with

increasing quantiles because every write on a still cached object triggers the

possibility of a stale read, depending on how fast the invalidation is executed.

The highest number of stale reads observed accounted for 1% of all reads

and queries (500 out of 50,000), depending on the execution model (see

appendix B for impact of invalidation latency on stale reads). For workload-

adjusted quantiles (i.e. lower quantiles on write-dominant workloads) the

average number of stale reads is about 0.1%, which seems acceptable for

most applications without strong transactional semantics. Figure 5.9 shows

how cache hit rates depend on the quantile of the next expected write on

the same workload. As noted above, the object-list model provides weak

cache performance on a write-dominant workload.



Figure 5.8: Absolute number of stale reads on the write-dominant workload as a function of the quantile of the next expected write.

Specifically, most cache hits in the object-list model in this scenario stem

from GET requests on individual resources, not from cached queries.


Figure 5.9: Cache hit rates on the write-dominant workload as a function of the quantile of the next expected write.

In these experiments, all objects were cachable, leading to an invalidation

load of 45,000 to 50,000, i.e. almost every write triggering an invalidation under


a Zipfian access distribution. This illustrates the necessity of marking certain

objects uncachable, as they will otherwise bottleneck the query matching

engine. The following section investigates which trade-offs can be achieved

with regard to invalidation load.

5.4.2 Reducing Invalidation Load

As discussed in chapter 3, the DBaaS needs to be able to reduce invalidation

load depending on the achievable throughput of matching updates to query

results. I have thus implemented the proposed model of marking certain
objects uncachable, based on the insight that some objects are updated
so frequently that they cannot reasonably be cached (e.g. they would be stale
by the time they arrive at the CDN). Figure 5.10 compares the cache hit
rates of this approach with the previous approach
of caching all objects according to their write frequency and the chosen
quantile. For comparison, I also show a static caching method that caches

all objects with the same expiration. First, one can note that for a write-

dominant workload, the cache hit rate is capped at 86.6%. Second, marking

certain objects as uncachable results in an average cache hit rate of 72.6%

(excluding quantile 0, which means no caching). The interesting question is

now how this relates to invalidation load. Figure 5.11 compares invalidation

loads from the same experiments.

Notably, there is a drastic decrease in invalidation load by dynamically mark-

ing objects as uncachable. Average invalidation load is reduced by about

50%, while giving up only 14 percentage points of cache hit rate (response times did not differ

significantly). The same effect can be observed for the object-list model.
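The marking heuristic itself can be sketched as a per-object choice of HTTP caching headers; the threshold and names below are illustrative assumptions rather than the exact rule of the implementation.

    INVALIDATION_LATENCY = 0.2  # seconds; order of the CDN purge latency [64]

    def cache_headers(estimated_ttl):
        # If the estimated expiration is shorter than the time an invalidation
        # needs to reach the edge, caching the object only adds invalidation
        # load, so it is served uncachable instead.
        if estimated_ttl <= INVALIDATION_LATENCY:
            return {"Cache-Control": "no-cache"}
        return {"Cache-Control": "max-age=" + str(int(estimated_ttl))}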



Figure 5.10: Cache hit rates on the write-dominant workload as a function of the quantile of the next expected write and compared to a static caching method.


Figure 5.11: Invalidation loads for using the naive id-list approach versus dynamically marking frequently written objects as uncachable.


5.5 Online Learning

5.5.1 Learning Decisions

In the previous sections, I have demonstrated the trade-offs related to execu-

tion models and their parametrisations. I begin the evaluation of the learning

model by applying the decision model to the read-dominant workload from

above while also using the optimisation of marking some objects uncachable.

With only 5% writes, the primary focus is not on limiting invalidation load but

rather on client latency.

Query selectivity   Learner   Random guess   Object-list   Id-list
1                   211.2     348.2          188.9         552.7
0.1                 151.5     195.5          142.3         251.1
0.01                107       124.4          104.5         167
0.001               130.5     135.9          129.1         147.8
0.0001              146.8     147            148.7         148.6

Table 5.1: Average overall request response times (ms) for the learning model compared to random guessing and static decisions on a read-dominant workload.

Query selectivity   Learner   Random guess   Object-list   Id-list
1                   0.3       0.61422        0.22          0.87
0.1                 0.42      0.7115         0.28          0.889
0.01                0.6       0.74           0.46          0.93
0.001               0.37      0.441          0.29          0.72
0.0001              0.19      0.21           0.18          0.46

Table 5.2: Cache hit rates for the learning model compared to random guessing and static decisions on a read-dominant workload.

Table 5.1 compares average request response times for the learning model

with those of a uniform random guess and static decisions. The differences

in response times between learner and random guessing are small because a

random mixture already provides relatively low latencies, as id-cached results

pre-warm the cache for individual reads. By having a bias towards low laten-

cies, the learning model has traded in cache efficiency (as seen in table 5.2).


One can also see that the learner converges towards the performance of the

object-list model. For the evaluation of the learning model, the comparison

to a static decision is not as useful because it is already known that either

object-list or id-list is optimal depending on the desired trade-offs. Having

established that the model can converge to the performance of a static model,

the question is thus rather whether its decisions are better than random deci-

sions, which I will focus on in the following (hence omitting static decisions,

as I have covered them extensively above).

For a more detailed analysis, isolated query response times of learning and

guessing can be considered, i.e. ignoring response times for individual GET

requests, as seen in table 5.3. Previously, I already suggested that the ex-

treme ends of selectivity can largely be ignored because of the lack of parallelism

for selectivity of 1 and no difference between decisions for highly selective

queries. Instead, one might consider more typical cases, e.g. for an average
query selectivity of 1% the average query response time could be reduced
from 104.8 to 67.4 milliseconds (a 35.7% decrease).

Query selectivity   Belief state approximation   Random guess
1                   295.7                        528.6
0.1                 176.8                        229.8
0.01                67.4                         104.8
0.001               120.4                        139.5
0.0001              171.8                        173.7

Table 5.3: Average query response times (ms) for the learning model compared to random guessing on the execution model on a read-dominant workload.

I have repeated the same experiment for the write-dominant workload. This

time, learning was focussed on invalidation load and response times, as pre-

vious experiments have already shown that high cache performance is not

possible without very high latencies. Again, latency can be drastically reduced
(table 5.4) while maintaining approximately the same invalidation levels, as seen
in table 5.5. This is achieved by trading in cache performance. In this par-
ticular experiment, the average cache hit rate of the learner drops to 6%, compared to

21% for random guessing. Both cache hit rates are very low, as hotspot


Query selectivity   Belief state approximation   Random guess
1                   230.3                        582
0.1                 215.9                        422.8
0.01                189.6                        260.6
0.001               168.3                        187.4
0.0001              168.5                        171

Table 5.4: Average request response times (ms) for the learning model compared to random guessing on the execution model on a write-dominant workload.

objects are marked uncachable to reduce invalidation loads. In principle,

these experiments have established that the learning model can indeed learn

towards certain metrics. In the following sections, I will analyse the quality

of these trade-offs and convergence properties in more detail.

Query selectivity   Belief state approximation   Random guess
1                   28142                        24725
0.1                 26493                        26988
0.01                18919                        25028
0.001               18184                        16768
0.0001              12066                        11650

Table 5.5: Invalidation loads for the learning model compared to random guessing on the execution model on a write-dominant workload.

5.5.2 Evaluating Trade-offs

The tables above show that the learning model can achieve improvements on

various metrics. However, it is hard to quantify the quality of a trade-off by

simply comparing for instance cache hit rates and invalidations, as was done

above. To this end, the approach of defining a linear combination of utilities

from chapter 4 is used. By defining utility functions for response times,

invalidations and cache hit rates, the global system utility of a workload

instance is defined. Consequently, it can be assessed if and how the learning

model increases utility over time.


I use the utility function from section 4.3.3 for latency and a linear function of

invalidation utility, i.e. u(invalidation) = 1 − invalidations/writes. Further,

the cache hit rate itself can be used as a utility function because it is already

normalised. Figure 5.12 shows how system utility changes over the num-

ber of operations performed during a benchmark instance (read-dominant

workload).
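A minimal sketch of this composition follows. The latency utility is only a placeholder shape, since the concrete function of section 4.3.3 is not reproduced in this section; the invalidation utility follows the formula above.

    def u_latency(ms, target=100.0):
        # Placeholder shape: utility 1 at zero latency, 0 at twice the target.
        return max(0.0, 1.0 - ms / (2.0 * target))

    def u_invalidation(invalidations, writes):
        return 1.0 - invalidations / writes if writes else 1.0

    def global_utility(ms, invalidations, writes, hit_rate, w=(1.0, 1.0, 1.0)):
        # Linear combination of normalised per-metric utilities.
        score = (w[0] * u_latency(ms)
                 + w[1] * u_invalidation(invalidations, writes)
                 + w[2] * hit_rate)
        return score / sum(w)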

After 2,000 operations, there is an initially high utility for random decisions

and a rather low utility for learning. I attribute this to warmup effects. As

initial response times are longer, less utility comes from latency. Within a

few thousand operations, the learning model (learning rate α = 0.1) achieves

much higher utility and the utility of random guessing degrades.
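The shape of the learning curve follows from the incremental update sketched below, the standard exponential-recency rule from reinforcement learning; the full state representation of the decision model is defined in chapter 4.

    def update_estimate(estimate, observed_utility, alpha=0.1):
        # Move the utility estimate of a decision towards each new observation;
        # alpha = 0.1 matches the learning rate quoted above.
        return estimate + alpha * (observed_utility - estimate)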


Figure 5.12: Global utility as a function of operations performed.

The key point here is that there is a formal framework for mapping concrete

values of metrics (e.g. latency of 100 milliseconds) to a normalised score. The

learning model is consequently able to learn trade-offs that optimise towards

whatever preference is expressed by the service provider.


5.5.3 Convergence and Stability

Finally, the evaluation needs to consider situations when the learning model

has converged and is then confronted with a changing workload. To this

end, I start with the write-dominant workload for 100,000 operations and
then introduce a higher proportion of reads (a mixture of 40% reads, 20%
writes and 40% queries). Figure 5.13 demonstrates the associated behaviour. At first glance,
the model quickly achieves a higher utility, which then steadily increases.

However, one can also note that random guessing seems to achieve higher

utility over time. This can be explained through higher cache utilisation. As

the cache fills up again after a change in request load, latency and cache hit

rate improve and total utility increases for all models.

In summary, the evaluation has validated the caching scheme. Using dy-

namic query caching and the learning scheme, average response times for

clients can be lowered to an imperceptible range (below 100 ms). Further,

I have investigated consistency and invalidation load as manageable practi-

cal constraints. Finally, the experiments have demonstrated how an online

learning model can achieve these trade-offs dynamically through a method

motivated by reinforcement learning.



Figure 5.13: Behaviour of learning model versus random guessing under a change of workload mixture.


Chapter 6

Outlook and Conclusion

6.1 Summary and Conclusion

In this project, I have identified remote access latency as a key performance

problem in interactive applications. What is more, I have pointed out the

constraints of naive caching schemes with regard to consistency and invalida-

tion load. Considering these limitations, I have introduced a caching scheme

which I believe can achieve low latency for the client while maintaining tun-

able invalidation load and consistency. The first component of the caching

mechanism is based on the idea that different representations and execution

models can be used for varying workloads. The second contribution is an

online learning model that uses various approximations to make decisions

based upon these representations. Through the Monte Carlo simulation of

typical workloads, various trade-offs for client performance, cache efficiency

and server load were shown.

To the best of my knowledge, this study has introduced the first model for

caching highly dynamic query content. In principle, any REST-ful web ser-

vice can implement the proposed architecture, thereby achieving drastically
improved response times for clients without incurring too great an invalida-

tion load at the backend. On a more general note, this project has provided


an example of how the intersection of distributed systems, databases and

machine learning enables more flexible and adaptive infrastructures.

6.2 Future Work

6.2.1 Parsing Query Predicates

This work did not extensively cover query semantics during invalidation. I

suggested comparing before and after images of documents affected by an

update. In future work, query predicates could be parsed by a schema-

aware middleware that could enable more efficient invalidation mechanisms.

For instance, on a numeric predicate, deciding whether an invalidation is

necessary is a simple range comparison between update value and predicate

range. On a similar note, knowledge about the schema would also enable

mixed decisions on cache representations. A typical use case of this is a

schema containing a counter, which is an essential data type for today’s

application economy (counting impressions, click-streams). A query would

usually select the counter to display its value. A sensible model might select

to cache counters as id-lists and other parts of a result as an object-list, since

updates on the counter would always invalidate the whole object-list.
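For the id-list representation, where only result membership matters, the suggested range comparison reduces to the following check; the predicate representation is an illustrative assumption.

    def needs_invalidation(lo, hi, old_value, new_value):
        # Invalidate the cached id-list only if membership in [lo, hi] changes.
        return (lo <= old_value <= hi) != (lo <= new_value <= hi)

    # Predicate "10 <= price <= 20":
    needs_invalidation(10, 20, old_value=15, new_value=25)  # True: left result
    needs_invalidation(10, 20, old_value=25, new_value=30)  # False: never matched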

6.2.2 Unified Learning Model

In the learning model, estimating expirations and making decisions on the

execution models were treated as two distinct tasks. This is because the
model lacked a good function approximation mapping the state space
of load metrics to a pair of an expiration time and a decision on the execution

model. In future work, a unified reinforcement learning model that supports

more proactive decisions at the server could be explored. Instead of just

reacting to individual queries, an advanced model could maintain lists of

cachable and uncachable objects. It could then independently decide to


push and remove objects to and from caches. This prefetching of data to

edge servers is particularly relevant for initial load times. Further, there are

various other decisions and tunable runtime parameters related to latency. In

particular, my model examines eventual consistency in the context of stale

reads from the cache. Equally, performance could be tuned by adjusting

write concerns at the database cluster itself. That is to say, a tenant might

have a default setting of blocking request responses until an update has been

persisted to all replica sets. This could be relaxed during flash crowds.
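As a sketch of this idea against MongoDB [65], the document store used in this work, the per-request write concern could be selected with pymongo's with_options; the deployment URL and collection names are illustrative. The "all replica sets" default mentioned above would correspond to a numeric w equal to the replica count; "majority" is shown as a common durable setting.

    from pymongo import MongoClient, WriteConcern

    client = MongoClient("mongodb://localhost:27017")  # hypothetical deployment
    coll = client.appdata.objects

    durable = coll.with_options(write_concern=WriteConcern(w="majority"))
    relaxed = coll.with_options(write_concern=WriteConcern(w=1))

    def write(doc, flash_crowd=False):
        # Trade durability for latency while a flash crowd lasts.
        (relaxed if flash_crowd else durable).insert_one(doc)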


Appendix A

Proofs

A.1 Minimum of Exponential Random Vari-

ables

The following theorem is straightforward, but I have not been able to locate

a proof in print [95].

Theorem A.1.1. Let $X_1, \ldots, X_n$ be mutually independent exponentially distributed random variables with rate parameters $\lambda_1, \ldots, \lambda_n$. Then the minimum is again exponentially distributed:

$$\min\{X_1, \ldots, X_n\} \sim \mathrm{Exponential}\left(\sum_{i=1}^{n} \lambda_i\right)$$

Proof. Each $X_i$ has the cumulative distribution function

$$F(x; \lambda_i) = 1 - \exp(-\lambda_i x) \quad \text{for } x \geq 0.$$

The random variable $X_{\min} = \min\{X_1, \ldots, X_n\}$ has the CDF

$$\begin{aligned}
F(x; \lambda_{\min}) &= P(X_{\min} \leq x) \\
&= 1 - P(\min\{X_1, \ldots, X_n\} > x) \\
&= 1 - P(X_1 > x, \ldots, X_n > x) \\
&= 1 - \prod_{i=1}^{n} P(X_i > x) \\
&= 1 - \prod_{i=1}^{n} \exp(-\lambda_i x) \\
&= 1 - \exp\left(-x \sum_{i=1}^{n} \lambda_i\right) = 1 - \exp(-\lambda_{\min} x),
\end{aligned}$$

where $\lambda_{\min} = \sum_{i=1}^{n} \lambda_i$.
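As a quick numerical sanity check of the theorem (not part of the proof), the empirical mean of the minimum should approach one over the sum of the rates:

    import random

    rates = [0.5, 1.0, 2.0]  # lambda_1, lambda_2, lambda_3
    mins = [min(random.expovariate(r) for r in rates) for _ in range(100000)]
    print(sum(mins) / len(mins))  # ~ 1 / (0.5 + 1.0 + 2.0) = 0.2857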


Appendix B

Additional Analysis

B.1 Impact of Invalidation Latency


Figure B.1: Stale reads as a function of mean invalidation latency on 100,000 operations. Higher invalidation latency gives rise to more stale reads, as there is a bigger time window to retrieve stale content from the cache. Marking frequently written objects as uncachable reduces this effect.


B.2 Monte Carlo Optimisation

Table B.1 demonstrates an example of hyperparameter optimisation through

Bayesian inference using the Spearmint framework [88]. Consider the write-

dominant workload from chapter 5. The target parameters are the write-

quantile and the maximum time-to-live (values between 0 and 60 seconds

allowed) the model can estimate. In this simple example, the utility of re-

sponse time is set to three times the utility of the cache hit rate.

Experiment   Quantile p   Maximum ttl (s)   Utility
1            0.5          30                0.187
2            0.75         15                0.217
3            1            2                 0.326
4            1            0                 0.337
5            1            60                0.364

Table B.1: Bayesian optimisation of the optimal quantile p and maximum allowed ttl.

The Gaussian process quickly predicts that the highest utility is achieved by

setting a high write quantile and a high maximum ttl. Further experiments

did not show any improvement, as the inference model tried to improve the

utility by making tiny adjustments in the allowed range (e.g. a ttl of 59).

Since every run of the simulation takes a few minutes, Bayesian optimisation

is a convenient tool for quickly finding parametrisations. This is simply

done by defining the utility of the performance metrics of interest and then

sampling the Gaussian process for suggestions on the parameters repeatedly.

Every suggestion takes into account the utility of previous suggestions to

quickly find a maximum.
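The objective being optimised can be sketched as follows; run_simulation is a hypothetical stand-in for a full run of the framework, u_latency is only a placeholder utility shape, and Spearmint's actual entry-point conventions are documented with the package [88]. The 3:1 weighting mirrors the setting above.

    def run_simulation(quantile, max_ttl):
        # Hypothetical stand-in: a real evaluation executes the full workload
        # and measures average response time and cache hit rate.
        return {"response_ms": 180.0 - 50.0 * quantile,
                "hit_rate": 0.5 * quantile}

    def u_latency(ms, target=100.0):
        return max(0.0, 1.0 - ms / (2.0 * target))

    def objective(quantile, max_ttl):
        # Response-time utility weighted three times the cache hit rate.
        m = run_simulation(quantile, max_ttl)
        return (3.0 * u_latency(m["response_ms"]) + m["hit_rate"]) / 4.0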


Bibliography

[1] Ioannis Arapakis, Xiao Bai, and B. Barla Cambazoglu. Impact of response latency on user behavior in web search. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR '14, pages 103–112, New York, NY, USA, 2014. ACM.

[2] Wolfgang Lehner and Kai-Uwe Sattler. Web-Scale Data Managementfor the Cloud. Springer, New York, 2013 edition, April 2013.

[3] Guoqiang Zhang, Yang Li, and Tao Lin. Caching in information centricnetworking: A survey. Comput. Netw., 57(16):3128–3141, November2013.

[4] R. T. Hurley and B. Y. Li. A performance investigation of web cachingarchitectures. In Proceedings of the 2008 C3S2E Conference, C3S2E ’08,pages 205–213, New York, NY, USA, 2008. ACM.

[5] Taekook Kim and Eui-Jik Kim. Hybrid storage-based caching strat-egy for content delivery network services. Multimedia Tools Appl.,74(5):1697–1709, March 2015.

[6] Felix Gessert, Michael Schaarschmidt, Wolfram Wingerath, Steffen Friedrich, and Norbert Ritter. The cache sketch: Revisiting expiration-based caching in the age of cloud data management. BTW '15, Hamburg, Germany, March 2015.

[7] Mukaddim Pathan and Rajkumar Buyya. A taxonomy of cdns. In Ra-jkumar Buyya, Mukaddim Pathan, and Athena Vakali, editors, ContentDelivery Networks, volume 9 of Lecture Notes Electrical Engineering,pages 33–77. Springer Berlin Heidelberg, 2008.

[8] Jia Wang. A survey of web caching schemes for the internet. SIGCOMMComput. Commun. Rev., 29(5):36–46, October 1999.


[9] Werner Vogels. Eventually consistent. Commun. ACM, 52(1):40–44,January 2009.

[10] Felix Gessert, Steffen Friedrich, Wolfram Wingerath, MichaelSchaarschmidt, and Norbert Ritter. Towards a scalable and unifiedREST API for cloud data stores. In 44th annual conference of the soci-ety for informatics, Informatik 2014, Big Data - Mastering Complexity,22.-26. September 2014 in Stuttgart, Deutschland, pages 723–734, 2014.

[11] F. Gessert, F. Bucklers, and N. Ritter. Orestes: A scalable database-as-a-service architecture for low latency. In Data Engineering Workshops(ICDEW), 2014 IEEE 30th International Conference on, pages 215–222,March 2014.

[12] Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M.Hellerstein, and Ion Stoica. Probabilistically bounded staleness for prac-tical partial quorums. Proceedings of the VLDB Endowment (PVLDB2012), 5(8):776–787, 2012.

[13] Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M.Hellerstein, and Ion Stoica. Quantifying eventual consistency with pbs.Commun. ACM, 57(8):93–102, August 2014.

[14] Wojciech Golab, Xiaozhou Li, and Mehul A. Shah. Analyzing consis-tency properties for fun and profit. In Proceedings of the 30th AnnualACM SIGACT-SIGOPS Symposium on Principles of Distributed Com-puting, PODC ’11, pages 197–206, New York, NY, USA, 2011. ACM.

[15] Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, SanjeevKumar, and Harry C. Li. An analysis of facebook photo caching. InProceedings of the Twenty-Fourth ACM Symposium on Operating Sys-tems Principles, SOSP ’13, pages 167–181, New York, NY, USA, 2013.ACM.

[16] Linpeng Tang, Qi Huang, Wyatt Lloyd, Sanjeev Kumar, and Kai Li.Ripq: Advanced photo caching on flash for facebook. In Proceed-ings of the 13th USENIX Conference on File and Storage Technologies,FAST’15, pages 373–386, Berkeley, CA, USA, 2015. USENIX Associa-tion.

[17] Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Grib-ble, and Henry M. Levy. An analysis of internet content delivery systems.SIGOPS Oper. Syst. Rev., 36(SI):315–327, December 2002.


[18] Michael J. Freedman. Experiences with coralcdn: A five-year opera-tional view. In In Proc NSDI, 2010.

[19] Lei Guo, Enhua Tan, Songqing Chen, Zhen Xiao, and Xiaodong Zhang.The stretched exponential distribution of internet media access patterns.In Proceedings of the Twenty-seventh ACM Symposium on Principles ofDistributed Computing, PODC ’08, pages 283–294, New York, NY, USA,2008. ACM.

[20] Patrick Wendell and Michael J. Freedman. Going viral: Flash crowds inan open cdn. In Proceedings of the 2011 ACM SIGCOMM Conference onInternet Measurement Conference, IMC ’11, pages 549–558, New York,NY, USA, 2011. ACM.

[21] Salvatore Scellato, Cecilia Mascolo, Mirco Musolesi, and Jon Crowcroft.Track globally, deliver locally: Improving content delivery networks bytracking geographic social cascades. In Proceedings of the 20th Inter-national Conference on World Wide Web, WWW ’11, pages 457–466,New York, NY, USA, 2011. ACM.

[22] Mike P. Wittie, Veljko Pejovic, Lara Deek, Kevin C. Almeroth, andBen Y. Zhao. Exploiting locality of interest in online social networks. InProceedings of the 6th International COnference, Co-NEXT ’10, pages25:1–25:12, New York, NY, USA, 2010. ACM.

[23] Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khor-lin, James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, andVadim Yushprakh. Megastore: Providing scalable, highly available stor-age for interactive services. In Proceedings of the Conference on Inno-vative Data system Research (CIDR), pages 223–234, 2011.

[24] Jeff Shute, Mircea Oancea, Stephan Ellner, Ben Handy, Eric Rollins,Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, BeatJegerlehner, Kyle Littlefield, and Phoenix Tong. F1: The fault-tolerantdistributed rdbms supporting google’s ad business. In Proceedings ofthe 2012 ACM SIGMOD International Conference on Management ofData, SIGMOD ’12, pages 777–778, New York, NY, USA, 2012. ACM.

[25] James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes,Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev,Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kan-thak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, DavidMwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Ya-sushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and


Dale Woodford. Spanner: Google’s globally distributed database. ACMTrans. Comput. Syst., 31(3):8:1–8:22, August 2013.

[26] Charles Garrod, Amit Manjhi, Anastasia Ailamaki, Bruce Maggs, ToddMowry, Christopher Olston, and Anthony Tomasic. Scalable query re-sult caching for web applications. Proc. VLDB Endow., 1(1):550–561,August 2008.

[27] Sadiye Alici, Ismail Sengor Altingovde, Rifat Ozcan, Berkant BarlaCambazoglu, and Ozgur Ulusoy. Timestamp-based result cache invali-dation for web search engines. In Proceedings of the 34th InternationalACM SIGIR Conference on Research and Development in InformationRetrieval, SIGIR ’11, pages 973–982, New York, NY, USA, 2011. ACM.

[28] Burton H. Bloom. Space/time trade-offs in hash coding with allowableerrors. Commun. ACM, 13(7):422–426, July 1970.

[29] Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh,and George Varghese. An improved construction for counting bloomfilters. In Algorithms–ESA 2006, pages 684–695. Springer, 2006.

[30] Andrei Broder and Michael Mitzenmacher. Network applications ofbloom filters: A survey. Internet Math., 1(4):485–509, 2003.

[31] Sarang Dharmapurikar, Praveen Krishnamurthy, Todd Sproull, andJohn Lockwood. Deep packet inspection using parallel bloom filters.In High performance interconnects, 2003. proceedings. 11th symposiumon, pages 44–51. IEEE, 2003.

[32] Li Fan, Pei Cao, Jussara Almeida, and Andrei Z. Broder. Summarycache: A scalable wide-area web cache sharing protocol. IEEE/ACMTrans. Netw., 8(3):281–293, June 2000.

[33] W. R. Gilks. Markov Chain Monte Carlo. John Wiley & Sons, Ltd,2005.

[34] Reuven Y. Rubinstein and Dirk P. Kroese. Markov Chain Monte Carlo,pages 167–200. John Wiley & Sons, Inc., 2007.

[35] Siddhartha Chib and Edward Greenberg. Understanding the metropolis-hastings algorithm. THE AMERICAN STATISTICIAN, 1995.

[36] Stuart Geman and Donald Geman. Stochastic relaxation, gibbs distri-butions, and the bayesian restoration of images. IEEE Trans. PatternAnal. Mach. Intell., 6(6):721–741, November 1984.

[37] Luc Devroye. Non-uniform random variate generation, 1986.


[38] Siddhartha Chib. Chapter 57 - markov chain monte carlo methods:Computation and inference. volume 5 of Handbook of Econometrics,pages 3569 – 3649. Elsevier, 2001.

[39] Richard S. Sutton and Andrew G. Barto. Introduction to ReinforcementLearning. MIT Press, Cambridge, MA, USA, 1st edition, 1998.

[40] G. Yen and T. Hickey. Reinforcement learning algorithms for roboticnavigation in dynamic environments. In Neural Networks, 2002. IJCNN’02. Proceedings of the 2002 International Joint Conference on, vol-ume 2, pages 1444–1449, 2002.

[41] Andrey V. Gavrilov and Artem Lenskiy. Mobile robot navigation using reinforcement learning based on neural network with short term memory. In De-Shuang Huang, Yong Gan, Vitoantonio Bevilacqua, and Juan Carlos Figueroa, editors, Advanced Intelligent Computing, volume 6838 of Lecture Notes in Computer Science, pages 210–217. Springer Berlin Heidelberg, 2012.

[42] Gerald Tesauro. Temporal difference learning and td-gammon. Com-mun. ACM, 38(3):58–68, March 1995.

[43] Johannes Fürnkranz. Recent advances in machine learning and game playing. ÖGAI Journal, 26(2):19–28, 2007.

[44] M. Wiering and M. van Otterlo. Reinforcement Learning: State-of-the-Art. Adaptation, Learning, and Optimization. Springer Berlin Heidel-berg, 2012.

[45] Glenn F Matthews and Khaled Rasheed. Temporal difference learningfor nondeterministic board games. In IC-AI, pages 800–806, 2008.

[46] Peter Dayan and Bernard W Balleine. Reward, motivation, and rein-forcement learning. Neuron, 36(2):285–298, 2002.

[47] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learn-ing, 8(3):279–292, 1992.

[48] Andrew G. Barto, R. S. Sutton, and C. J. C. H. Watkins. Learning andsequential decision making. In LEARNING AND COMPUTATIONALNEUROSCIENCE, pages 539–602. MIT Press, 1989.

[49] Andrew W. Moore and Christopher G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13(1):103–130, 1993.


[50] Kenneth O. Stanley and Risto Miikkulainen. Efficient reinforcementlearning through evolving neural network topologies. In Proceedings ofthe Genetic and Evolutionary Computation Conference, GECCO ’02,pages 569–577, San Francisco, CA, USA, 2002. Morgan Kaufmann Pub-lishers Inc.

[51] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioan-nis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atariwith deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

[52] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu,Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, An-dreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie,Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran,Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level con-trol through deep reinforcement learning. Nature, 518(7540):529–533,02 2015.

[53] Carl Edward Rasmussen and Malte Kuss. Gaussian processes in re-inforcement learning. In Advances in Neural Information ProcessingSystems 16, pages 751–759. MIT Press, 2004.

[54] Yaakov Engel, Shie Mannor, and Ron Meir. Reinforcement learning withgaussian processes. In Proceedings of the 22Nd International Conferenceon Machine Learning, ICML ’05, pages 201–208, New York, NY, USA,2005. ACM.

[55] G. Tesauro, R. Das, W.E. Walsh, and J.O. Kephart. Utility-function-driven resource allocation in autonomic systems. In Autonomic Com-puting, 2005. ICAC 2005. Proceedings. Second International Conferenceon, pages 342–343, June 2005.

[56] G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani. A hybrid reinforce-ment learning approach to autonomic resource allocation. In Proceedingsof the 2006 IEEE International Conference on Autonomic Computing,ICAC ’06, pages 65–73, Washington, DC, USA, 2006. IEEE ComputerSociety.

[57] Jianxin Yao, Chen-Khong Tham, and Kah-Yong Ng. Decentralized dy-namic workflow scheduling for grid computing using reinforcement learn-ing. In Networks, 2006. ICON ’06. 14th IEEE International Conferenceon, volume 1, pages 1–6, Sept 2006.

[58] Sebastian Angel, Hitesh Ballani, Thomas Karagiannis, Greg O’Shea,and Eno Thereska. End-to-end performance isolation through virtual


datacenters. In Proceedings of the 11th USENIX Conference on Op-erating Systems Design and Implementation, OSDI’14, pages 233–248,Berkeley, CA, USA, 2014. USENIX Association.

[59] Facebook. The parse backend platform. https://parse.com/.

[60] David Flanagan. JavaScript: The Definitive Guide. O’Reilly Media,Inc., 2006.

[61] Peter Bailis and Ali Ghodsi. Eventual consistency today: limitations,extensions, and beyond. Communications of the ACM, 56(5):55–63,2013.

[62] Peter Bailis, Ali Ghodsi, Joseph M Hellerstein, and Ion Stoica. Bolt-oncausal consistency. In SIGMOD 2013, pages 761–772. ACM, 2013.

[63] Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, MihaiBudiu, and Yaron Minsky. Bimodal multicast. ACM Trans. Comput.Syst., 17(2):41–88, May 1999.

[64] Fastly. Blog post on multicast implementation in Fastly. http://www.fastly.com/blog/building-fast-and-reliable-purging-system/, February 2014.

[65] MongoDB, Inc. MongoDB. http://www.mongodb.org/.

[66] MongoDB, Inc. Tutorial on query optimization for mongodb.http://docs.mongodb.org/manual/core/query-optimization/, 2015.

[67] Ilya Grigorik. Presentation on HTTP/2 mechanics. goo.gl/8yczyz.

[68] Saar Cohen and Yossi Matias. Spectral bloom filters. In Proceedings ofthe 2003 ACM SIGMOD International Conference on Management ofData, SIGMOD ’03, pages 241–252, New York, NY, USA, 2003. ACM.

[69] The Apache Software Foundation. Apache Storm.https://storm.apache.org/.

[70] Craig Jefferds. Sift.js library for evaluating mongodb-queries.https://github.com/crcn/sift.js/tree/master.

[71] R.G. Gallager. Discrete Stochastic Processes. The Springer InternationalSeries in Engineering and Computer Science. Springer US, 1995.

[72] Amine Abou-Rjeili and George Karypis. Multilevel algorithms for parti-tioning power-law graphs. In Proceedings of the 20th International Con-ference on Parallel and Distributed Processing, IPDPS’06, pages 124–124, Washington, DC, USA, 2006. IEEE Computer Society.

83

Page 96: Towards Latency: An Online Learning Mechanism for Caching ...ey204/pubs/MPHIL/2015_MICHAEL.pdfrelies on a stochastic method to estimate optimal expiration times for dynam-ically changing

[73] Yinglian Xie and D. O’Hallaron. Locality in search engine queries andits implications for caching. In INFOCOM 2002. Twenty-First AnnualJoint Conference of the IEEE Computer and Communications Societies.Proceedings. IEEE, volume 3, pages 1238–1247 vol.3, 2002.

[74] Satinder Singh and Richard S. Sutton. Reinforcement learning withreplacing eligibility traces. In MACHINE LEARNING, pages 123–158,1996.

[75] Hesam Montazeri, Sajjad Moradi, and Reza Safabakhsh. Continuousstate/action reinforcement learning: A growing self-organizing map ap-proach. Neurocomputing, 74(7):1069–1082, 2011.

[76] Djallel Bouneffouf, Amel Bouzeghoub, and Alda Lopes Gancarski. Acontextual-bandit algorithm for mobile context-aware recommender sys-tem. In Neural Information Processing, pages 324–331. Springer, 2012.

[77] Graham Cormode, Minos Garofalakis, Peter J. Haas, and Chris Jermaine. Synopses for massive data: Samples, histograms, wavelets, sketches. Found. Trends databases, 4(1–3):1–294, January 2012.

[78] Charu C. Aggarwal. On biased reservoir sampling in the presence ofstream evolution. In Proceedings of the 32Nd International Conferenceon Very Large Data Bases, VLDB ’06, pages 607–618. VLDB Endow-ment, 2006.

[79] Jeffrey S Vitter. Random sampling with a reservoir. ACM Transactionson Mathematical Software (TOMS), 11(1):37–57, 1985.

[80] Graham Cormode, Vladislav Shkapenyuk, Divesh Srivastava, and Bo-jian Xu. Forward decay: A practical time decay model for streamingsystems. In Data Engineering, 2009. ICDE’09. IEEE 25th InternationalConference on, pages 138–149. IEEE, 2009.

[81] James Hensman, Nicolo Fusi, and Neil D Lawrence. Gaussian processesfor big data. arXiv preprint arXiv:1309.6835, 2013.

[82] Joaquin Quinonero-Candela and Carl Edward Rasmussen. A unifyingview of sparse approximate gaussian process regression. The Journal ofMachine Learning Research, 6:1939–1959, 2005.

[83] Edward Snelson and Zoubin Ghahramani. Local and global sparse gaus-sian process approximations. In International Conference on ArtificialIntelligence and Statistics, pages 524–531, 2007.


[84] James S. Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In J. Shawe-Taylor, R.S. Zemel, P.L. Bartlett, F. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 2546–2554. Curran Associates, Inc., 2011.

[85] James Bergstra and Yoshua Bengio. Random search for hyper-parameteroptimization. J. Mach. Learn. Res., 13(1):281–305, February 2012.

[86] Nimalan Mahendran, Ziyu Wang, Firas Hamze, and Nando de Freitas.Adaptive mcmc with bayesian optimization. In Neil D. Lawrence andMark Girolami, editors, AISTATS, volume 22 of JMLR Proceedings,pages 751–760. JMLR.org, 2012.

[87] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesianoptimization of machine learning algorithms. In F. Pereira, C.J.C.Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural In-formation Processing Systems 25, pages 2951–2959. Curran Associates,Inc., 2012.

[88] Jasper Snoek. Spearmint package for Bayesian optimisation. https://github.com/HIPS/Spearmint.

[89] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan,and Russell Sears. Benchmarking cloud serving systems with ycsb. InProceedings of the 1st ACM Symposium on Cloud Computing, SoCC ’10,pages 143–154, New York, NY, USA, 2010. ACM.

[90] Amazon Web Services. Amazon Elastic Compute Cloud (amazon ec2).http://aws.amazon.com/de/ec2/.

[91] Scott T. Leutenegger and Daniel Dias. A modeling study of the tpc-cbenchmark. In Proceedings of the 1993 ACM SIGMOD InternationalConference on Management of Data, SIGMOD ’93, pages 22–31, NewYork, NY, USA, 1993. ACM.

[92] Michael Schaarschmidt. Github project page of the Monte Carlo sim-ulation framework. https://github.com/mschaars/Query-Simulation-Framework.

[93] Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, IMC '07, pages 29–42, New York, NY, USA, 2007. ACM.


[94] L. Breslau, Pei Cao, Li Fan, G. Phillips, and S. Shenker. Web caching and zipf-like distributions: evidence and implications. In INFOCOM '99. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, volume 1, pages 126–134 vol.1, Mar 1999.

[95] Daniel S. Myers. Lecture notes on exponential distributions. http://pages.cs.wisc.edu/~dsmyers/cs547/lecture_9_memoryless_property.pdf.
