
Description

Transaction processing systems are generally considered easier to scale than data warehouses. Relational databases were designed for this type of workload, and there are no esoteric hardware requirements. Mostly, it is just a matter of normalizing to the right degree and getting the indexes right. The major challenge in these systems is their extreme concurrency, which means that small temporary slowdowns can escalate into major issues very quickly. In this presentation, Gwen Shapira will explain how application developers and DBAs can work together to build a scalable and stable OLTP system - using application queues, connection pools and strategic use of caches in the different layers of the system.


Queues, Pools and Caches -

Everything a DBA should know about scaling modern OLTP Gwen (Chen) Shapira, Senior Consultant

The Pythian Group

[email protected]

Scalability Problems in Highly Concurrent Systems

When we drive through a particularly painful traffic jam, we tend to assume that the jam has a cause: that road maintenance or an accident blocked traffic and created the slowdown. However, we often reach the end of the traffic jam without seeing any visible cause.

Traffic researcher Prof. Sugiyama and his team showed that with sufficient traffic density, traffic jams will occur with no discernible root cause. Traffic jams will even form when cars drive at constant speed on a circular one-lane track [i].

“When a large number of vehicles, beyond the road capacity, are successively injected into the road, the density exceeds the critical value and the free flow state becomes unstable.” [ii]

OLTP systems are built to handle a large number of small transactions. In those systems the main requirement is servicing a large number of concurrent requests with low and predictable latency. Good scalability for an OLTP system can be defined as “achieving maximum useful concurrency from a shared system” [iii].

OLTP systems often behave exactly like the traffic in Prof. Sugiyama’s experiments: more and more traffic is loaded onto the database, until inevitably a traffic jam occurs, and we may not be able to find any visible root cause for it. In a wonderful video, Andrew Holdsworth of Oracle’s Real World Performance group shows how increasing traffic on a database server can dramatically increase latency without any improvement in throughput, and how reducing the number of connections to the database can improve performance [iv].

In this presentation, I’ll discuss several design patterns and frameworks that are used to improve scalability by controlling concurrency in modern OLTP systems and web-based architectures.

All the patterns and frameworks I’ll discuss are considered part of the software architecture. DBAs often take little interest in the design and architecture of the applications that use the database. But databases never operate in a vacuum: DBAs who understand application design can have a better dialog with the software team when it comes to scalability, and progress beyond finger pointing and “the database is slow” blaming. These frameworks require sizing, capacity planning and monitoring – tasks that DBAs are better qualified for than software developers. I’ll go into detail on how DBAs can help size and monitor these systems with database performance in mind.


Connection Pools

The Problem:

Scaling application servers is a well-understood problem. Through the use of horizontal scaling and stateless interactions, it is relatively easy to deploy enough application capacity to support even thousands of simultaneous user requests. This scalability, however, does not extend to the database layer.

Opening and closing a database connection is a high-latency operation, due to the network protocol used between the application server and the database and the significant overhead of allocating database resources. Web applications and OLTP systems can't afford this latency on every user request.

The Solution:

Instead of opening a new connection for each application request, the application engine prepares a

certain number of open database connections and caches them in a connection pool.

In Java, the DataSource class is a factory for creating database connections and the preferred way of getting a connection. Java defines a generic DataSource interface, and many vendors provide their own DataSource implementations. Many, but not all, of these implementations also include connection pooling [v].

Using the generic DataSource interface, developers call getConnection(), and the DataSource implementation provides the connection. Since developers write the same code whether or not the DataSource class they are using implements pooling, asking a developer whether he is using connection pooling is not a reliable way to determine whether connection pooling is in use.

To make things more complicated, the developer is often unaware of which DataSource class he is using. The DataSource implementation is registered with the Java Naming and Directory Interface (JNDI), and can be deployed and managed separately from the application that uses it. Finding out which DataSource is used and how the connection pool is configured can take some digging and creativity. Most application servers contain a configuration file called "server.xml" or "context.xml" that holds the various resource descriptions; searching for a resource of type "javax.sql.DataSource" will find the configuration of the DataSource class and the connection pool's minimum and maximum sizes.
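As a minimal sketch of what the data layer typically does (the JNDI name "jdbc/OrdersDB" is hypothetical; substitute whatever name is registered in your context.xml), note that the call is identical for pooling and non-pooling implementations:

import java.sql.Connection;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class ConnectionLookup {
    public static Connection getPooledConnection() throws Exception {
        // Look up whatever DataSource the container registered under this
        // JNDI name; the name itself is illustrative.
        InitialContext ctx = new InitialContext();
        DataSource ds = (DataSource) ctx.lookup("java:comp/env/jdbc/OrdersDB");

        // Same code whether or not the implementation pools connections,
        // which is exactly why asking the developer is not a reliable test.
        return ds.getConnection();
    }
}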


The Architecture:

[Figure: the application business layer calls the application data layer, which obtains connections through the generic DataSource interface; the DataSource implementation, registered in JNDI, manages a connection pool on top of the JDBC driver.]

New problems:

1. When connection pools are used, all users share the same schema and the same sessions, so tracing can be difficult. We advise developers to use DBMS_APPLICATION_INFO to set extra information such as the username (typically in the client_info field), module and action to assist in future troubleshooting. A sketch of how this can be done from the application follows.
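A minimal sketch, assuming a JDBC connection obtained from the pool (the module, action and user values are illustrative):

import java.sql.CallableStatement;
import java.sql.Connection;

public class SessionTagger {
    // Tag the pooled session before doing work on behalf of a user, so the
    // activity shows up under a meaningful module/action/client_info in
    // V$SESSION and ASH.
    public static void tagSession(Connection conn, String endUser) throws Exception {
        try (CallableStatement cs = conn.prepareCall(
                "begin dbms_application_info.set_module(?, ?); " +
                "dbms_application_info.set_client_info(?); end;")) {
            cs.setString(1, "order-service"); // module (illustrative)
            cs.setString(2, "checkout");      // action (illustrative)
            cs.setString(3, endUser);         // the end user behind the shared session
            cs.execute();
        }
    }
}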

2. Deciding on the size of the connection pool is the biggest challenge in using connection pools to increase scalability. As always, the thing that gets us into trouble is the thing we don’t know that we don’t know.

Most developers are well aware that if the connection pool is too small, the database will sit idle while users either wait for connections or are turned away. Since the scalability limitations of small connection pools are well known, developers tend to avoid them by creating large connection pools and increasing their size at the first hint of performance problems.



However, an oversized connection pool is a much greater risk to application scalability. Here is what the scalability of an OLTP system typically looks like [vi]:

[Figure: throughput as a function of concurrent users – near-linear growth at first, then a plateau, then retrograde decline.]

Amdahl’s law says that the scalability of a system is constrained by its serial component, as users wait for shared resources such as IO and CPU (this is the contention delay). According to the Universal Scalability Law there is a second delay, “coherency delay” – the cost of maintaining data consistency in the system, which models waits on latches and mutexes. After a certain point, adding more users to the system will decrease throughput.
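For reference, the Universal Scalability Law [vi] models relative capacity at N concurrent users as

C(N) = N / (1 + α(N − 1) + βN(N − 1))

where α is the contention (Amdahl) coefficient and β is the coherency coefficient. Any β > 0 guarantees a peak, after which adding users actively reduces throughput.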

Even before throughput decreases, at the point where throughput stops growing linearly, requests start to queue and response times suffer proportionally:

[Figure: response time as a function of concurrent users – flat at first, climbing steeply past the saturation point.]

If you check the wait events on a system that is past the point of saturation, you will see very high CPU utilization, high “log file sync” waits as a result of the CPU contention, and high waits on concurrency events such as “buffer busy waits” and “library cache latch”.


3. Even when the negative effects of too many concurrent users on the system are made clear, developers still argue for oversized connection pools, with the excuse that most of the connections will be idle most of the time. There are two significant problems with this approach:

a. While we believe that most of the connections will be idle most of the time, we can’t be certain that this will be the case. In fact, the worst performance issues I’ve seen were caused by the application actually using the entire connection pool it was allocated. This often happens when response times at the database already suffer for some reason and the application does not receive a response in a timely manner. At this point the application or its users rerun the operation, using another connection to run the exact same query. Soon there are hundreds of connections to the database, all attempting to run the same queries and waiting for the same latches.

b. Oversized connection pools have to be re-established during failover events or database restarts. The larger the connection pool, the longer the application will take to recover from a failover event, decreasing the availability of the application.

4. Connection pools typically allow setting minimum and maximum sizes for the pool. When the application starts, it opens connections until the minimum number is met. Whenever it runs out of connections, it opens new ones until it reaches the maximum. If connections are idle for too long they are closed, but never below the minimum level. This sounds fairly reasonable, until you ask yourself: if we set the minimum to the number of connections usually needed, when will the pool run out of connections?

A connection pool can be seen as a queue: users arrive and are serviced by the database while holding a connection. According to Little’s Law, the average number of connections in use is (avg. DB response time) × (avg. user arrival rate). For example, at 200 requests per second and an average database response time of 25 ms, about 200 × 0.025 = 5 connections will be in use on average. It is easy to see that you will run out of connections if the rate at which users hit your site increases, or if database performance degrades and response times increase.

If your connection pool can grow at these times, it will open new connections – a resource-intensive operation, as we previously noted – to a database that is already abnormally busy. This will further slow things down, which can lead to a vicious cycle known as a "connection storm". It is much safer to configure the connection pool to a fixed size: the maximum number of concurrent users that can run queries on the database with acceptable performance. We’ll discuss later how to determine this size. This ensures that during peak times you will have enough connections to maximize throughput at acceptable latency, and no more.

5. Unfortunately, even if you decide on a proper number of database connections, there is the problem of multiple application servers. In most web architectures there are multiple application servers, each with a separate connection pool, all connecting to the same database server. In this case it seems appropriate to divide the number of connections the database can sustain by the number of servers, and size the individual pools accordingly. The problem with this approach is that load balancing is never perfect, so some application servers will run out of connections while others still have spare ones. In some cases the number of application servers is so large that dividing the number of connections leaves less than one connection per server.

Solutions to new problems:

As we discussed in the previous section, the key to scaling OLTP systems is limiting the number of concurrent connections to a number the database can reasonably support even when all of them are active. The challenge is in determining this number.

Keeping in mind that OLTP workloads are typically CPU-bound, the number of concurrent users the system can support is limited by the number of cores on the database server. A database server with 12 cores can typically run only 12 concurrent CPU-bound sessions.

The best way to size the connection pool is by simulating the load generated by the application. Running a load test against the database is a great way of figuring out the maximum number of concurrently active sessions the database can sustain. This should usually be done with assistance from the QA department, as they have probably already determined the mix of transactions that simulates the normal operating load.

It is important to test the number of concurrently active connections the database can support at its peak. While testing, it is therefore critical to make sure that the database is indeed at full capacity, and is the bottleneck, at the point where we decide the number of connections is maximal. This can be reasonably validated by checking the CPU and IO queues on the database server and correlating them with the response times of the virtual users.

In usual performance tests, you try to determine the maximum number of users the application can support, so you run the test with an increasing number of virtual users until response times become unacceptable. When attempting to determine the maximum size of the connection pool, however, you should run the test with a fixed number of users and keep increasing the number of connections in the pool until database CPU utilization goes above 60%, the wait events shift from CPU to concurrency events, and response times become unacceptable. Typically all three of these symptoms will start occurring at approximately the same time.

If a QA department and load-testing tools are not available, it is possible to use the methodology described by James Morle in his paper "Brewing Benchmarks" and generate load-testing scripts from trace files, which can later be replayed by Swingbench.

When running a load test is impractical, you will need to estimate the number of connections based on available data. The factors to consider are:

1. How many cores are available on the database server?

2. How many concurrent users or threads does the application need to support?

3. When an application thread takes a connection from the pool, how much of that time is spent holding the connection without actually running database queries? The more time the application spends “just holding” the connection, the larger the pool will need to be to support the application workload.

4. How much of the database workload is IO-bound? You can check IOWAIT on the database server to determine this. The more IO-bound your workload is, the more concurrent users you can run without running into concurrency contention (you will see a lot of IO contention, though).

“Number of cores” × 4 is a good starting point for the connection pool size: less if the connections are heavily utilized by the application and there is little IO activity, more if the opposite is true. A worked example follows.
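As an illustration (the numbers are hypothetical, not a recommendation): a database server with 16 cores gives a starting point of 16 × 4 = 64 connections. If the application holds connections while doing CPU work of its own and the workload is almost entirely cached (little IO), you might end up closer to 16 × 2 = 32; if the workload spends most of its time waiting on disk, the pool can safely be larger than 64. The load test, not the formula, should have the final word.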

The remaining problem is what to do if the number of application servers is so large that it is inefficient to divide the connection pool limit among them. Well-architected systems usually have a separate data layer that can be deployed on a separate set of servers. This data layer should be the only component of the application allowed to open connections to the database, and it provides data objects to the various application server components. In this architecture, the connections are divided between the data-layer servers, of which there are typically far fewer.

This design has three great advantages. First, the data layer usually grows much more slowly than the application and rarely requires new servers, which means the pools rarely require resizing. Second, application requests can be balanced between the data servers based on remaining pool capacity. Third, if there is a need to add application-side caching to the system (such as Memcached), only the data layer needs modification.


Application Message Queues

The Problem:

By limiting the number of connections from the application servers to the database, we prevent a large number of queries from queuing up at the database server. If the total number of connections allowed from the application servers to the database is limited to 400, the run queue on the database will not exceed 400 (at least not by much).

We discussed in the previous section why preventing excessive concurrency in the database layer is

critical for database scalability and latency. However, we still need to discuss how the application can

deal with the user requests that arrive when there is no free database connection to handle them.

Let’s assume that we limited the connection pool to 50 connections, and due to a slow-down in the

database, all 50 connections are currently busy servicing user requests. However, new user requests are

still arriving into the system at their usual rate. What shall we do with these requests?

1. Throw away the database request and return an error or static content to the user. Some requests have to be serviced immediately: if the front page of your website can't load within a few seconds, it is not worth servicing at all. Hopefully, the database is not a critical component in displaying these pages (we'll discuss the options when we discuss caches). If a page does depend on the database and your connection pool is currently busy, you will want to display a static page and hope the customer tries again later.

2. Place the request in a queue for later processing. Some requests can be put aside for later processing, giving the user the impression of an immediate response. For example, if your system allows the user to request reports by email, the request can certainly be acknowledged and queued for offline processing. This option can be mixed with the first one: limit the size of the queue to N requests and display error messages for the rest.

3. Give the request extra-high priority. The application can recognize that the request arrived from the CIO and make sure it gets to the database ahead of any other user, perhaps cancelling several other user requests to get this done.

4. Give the request extra-low priority. Some requests are so non-critical that there is no reason to even attempt serving them with low latency. If a user uses your application to send a message to another user, and there is no guarantee on how soon the message will arrive, it makes sense to tell the user the message was sent while in effect waiting for an idle connection in the pool before attempting to deliver it. Recurring events are almost always lower priority than one-time events: a user signing up for the service is a one-time event and, if lost, will have immediate business impact. Auditing user activity, on the other hand, is a recurring event, and a delay in it will have lower business impact.

5. Some requests are actually a mix of requests from different sources, such as a dashboard. In these cases it is best to display the different dashboard components as the data arrives, with some components taking longer than others to show up.


In all these cases, the application is able to prioritise requests and decide on a course of action based on information the database does not have. It makes sense to shift the queuing to the application when the database is highly loaded, because the application is better equipped to deal with the excess load.

Databases are not the only constrained resource; application servers have their own limitations when dealing with excess load. Typically, application servers have a limited number of threads, for the same reason we limit the number of connections to the database server: the server has a limited number of cores, and an excessive number of threads will overload it without improving throughput. Since database requests are usually the highest-latency action performed by an application thread, when the database is slow to respond all the application server threads can end up busy waiting for the database. The CPU on the application server will sit idle while the application cannot respond to additional user requests.

All this leads to the conclusion that, from both the database perspective and the application perspective, it is preferable to decouple application requests from database requests. This allows the application to prioritise requests, hide latency, and keep the application server and the database server busy but not overloaded.

The Solution:

Message queues provide an asynchronous communications protocol, meaning that the sender and receiver of a message do not need to interact with the message queue at the same time. Queues can be used by web applications and OLTP systems as a way to hide latency, or variance in latency.

Java defines a common messaging API, JMS, and there are multiple implementations of this API, both open source and commercial. Oracle Advanced Queuing is bundled with the Oracle RDBMS, both SE and EE, at no extra cost. The implementations differ in their feature sets, supported operations, reliability and stability. The API supports queues for point-to-point messaging, with a single publisher and a single consumer. It also supports topics for a publish-subscribe model, where multiple consumers subscribe to various topics and receive the messages broadcast to each topic.

Message queues are typically installed by system administrators as a separate server or component, just as databases are installed and maintained. The message queue server is called a "broker", and is usually backed by a database to ensure that messages survive even when the broker fails. The application server connects to the broker by a URL, and can publish to and consume from queues by queue name. A minimal producer sketch follows.
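A minimal sketch of the producer side, using the standard JMS API with ActiveMQ as the broker (the broker URL and queue name are illustrative):

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ReportRequestProducer {
    public static void main(String[] args) throws Exception {
        // Connect to the broker by URL, then address the queue by name.
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://broker-host:61616");
        Connection conn = factory.createConnection();
        conn.start();
        Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("report.requests");

        // Acknowledge the user immediately; a consumer process will pick the
        // message up when a database connection is free.
        MessageProducer producer = session.createProducer(queue);
        producer.send(session.createTextMessage("report-request:user=1234"));
        conn.close();
    }
}

The consumer side is symmetric: session.createConsumer(queue) plus a receive() loop, typically running in the data layer next to the connection pool.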


The Architecture:

[Figure: the connection pool architecture from the previous section, with a message queue added between the application business layer and the application data layer.]

New Problems:

There are some common myths related to queue management, which may make developers reluctant to use queues when necessary [vii]:

1. It is impossible to reliably monitor queues.

2. Queues are not necessary if you do proper capacity planning.

3. Message queues are unnecessarily complicated; there must be a simpler way to achieve the same goals.

Solutions to New Problems:

While queues are undeniably useful for improving throughput at both the database and the application server layers, they do complicate the architecture. Let’s tackle the myths one by one:

1. If it were indeed impossible to monitor queues, you would not be able to monitor CPU, load average, average active sessions, blocking sessions, disk IO waits, or latches. All systems have many queues; the only questions are where each queue is managed and how easy that particular queue is to monitor.

If you use Oracle Advanced Queues, V$AQ will show you the number of messages in each queue and the average wait for messages in the queue, which is usually all you need to determine the status of the queue. For the more paranoid, I'd recommend adding a heartbeat monitor: insert a monitoring message into the queue at regular intervals, and check that your process can read it from the queue and how long it took to arrive.
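A sketch of such a heartbeat, reusing a JMS Session as in the earlier producer example; the queue name, thresholds and the alert() helper are all illustrative:

import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

public class QueueHeartbeat {
    public static void check(Session session) throws Exception {
        // Dedicated probe queue flowing through the same broker as real traffic.
        Queue probeQueue = session.createQueue("monitor.heartbeat");
        MessageProducer out = session.createProducer(probeQueue);
        MessageConsumer in = session.createConsumer(probeQueue);

        long sent = System.currentTimeMillis();
        out.send(session.createTextMessage(Long.toString(sent)));

        TextMessage probe = (TextMessage) in.receive(10_000); // wait up to 10s
        if (probe == null) {
            alert("heartbeat lost - broker or consumer may be down");
        } else {
            long latencyMs = System.currentTimeMillis() - Long.parseLong(probe.getText());
            if (latencyMs > 5_000) {
                alert("heartbeat took " + latencyMs + " ms - queue is backing up");
            }
        }
    }

    private static void alert(String msg) {
        System.err.println("ALERT: " + msg); // stand-in for your paging system
    }
}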

The more interesting question is what you do with the monitoring information: at what point will you send an alert to the on-call SA, and what will you want her to do when she receives it?



Any queuing system will have high variance in service times and in the arrival rate of work; if service times and arrival rates were constant, there would be no need for queues. The high variance is expected to lead to spikes in system utilization, which can cause false alarms: the system is behaving as it should, but messages are accumulating in the queue. Our goal is to give as early a warning as possible when there is a genuine issue that should be resolved, and not to send warnings when the system is behaving as expected.

To this end, I recommend monitoring the following parameters:

• Service time – monitored at the consumer thread. The thread should track (i.e. instrument) and log, at regular intervals, the average time it takes to process a message from the queue. If service time increases significantly (compared to a known baseline, taking into account the known variance in response times), it can indicate a slowdown in processing and should be investigated.

• Arrival rate – monitored at the processes that write to the queue: how many messages are inserted into the queue every second? This should be tracked for long-term capacity planning and to determine peak usage periods.

• Queue size – the number of messages in the queue. Using Little's Law (queue size = arrival rate × wait time), we can measure the amount of time a message spends in the queue (the wait time) instead. If queue size or wait time increases significantly, this can indicate a "business issue", i.e. an impending breach of SLA. If the wait time frequently climbs to the point where SLAs are breached, the system does not have enough capacity to serve the current workload; either service times should be reduced (i.e. tuning) or more processing servers should be added. Note that queue size can and should go up for short periods of time, and recovering from bursts can take a while (depending on the service utilization), so this is only an issue if the queue size is high and does not start declining within a few minutes, which indicates that the system is not recovering.


• Service utilization – the percentage of time the consumer threads are busy, calculated as (arrival rate × service time) / (number of consumers). The more utilized the service is, the higher the probability that a newly arrived message will have other messages ahead of it in the queue; since R = S + W (response time = service time + wait time), response times will suffer. Since we already measure queue size directly, the main use of service utilization is capacity planning, and in particular the detection of over-provisioned systems. For a known utilization and fixed service times, if we know the arrival rate will grow by 50% tomorrow, we can calculate the expected effect on response times [viii].
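As a rough sketch, assuming a single queue with random (exponential) arrivals and service – the M/M/1 approximation; the referenced article develops this more carefully – response time is

R = S / (1 − ρ)

where R is response time, S is service time and ρ is utilization. With S = 2 ms at ρ = 0.4, R = 2 / 0.6 ≈ 3.3 ms. If arrivals grow by 50%, ρ rises to 0.6 and R = 2 / 0.4 = 5 ms; at ρ = 0.9 the same 2 ms of service takes 20 ms. This is why utilization matters long before the server looks "full".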

Note that by replacing many small queues on the database server with one (or a few) centralized queues in the application, you are in a much better position to calculate utilization and predict the effect on response times.

2. Queues are inevitable. Capacity planning or not, the fact that arrival rates and service times are random ensures there will be times when requests queue, unless you plan to turn away a large percentage of your business.

I suspect that what is really meant by "capacity planning will eliminate the need for queues" is that it is possible to over-provision a system so that the queue servers (consumers) have very low utilization. In that case queuing will be exceedingly rare, so it may make sense to throw the queue away and have the application threads communicate with the consumers directly. The application will then have to throw away any request that arrives while the consumers are busy, but in such a system this will almost never happen. This is "capacity planning by overprovisioning" – I've worked on many databases that rarely exceeded 5% CPU. You'll still need to closely monitor service utilization to make sure you keep increasing capacity and keeping utilization low. I would not call this type of capacity planning "proper", though.

On the other hand, the introduction of a few well-defined and well-understood queues will help capacity planning. If we assume fixed server utilization, the size of the queue is proportional to the number of servers, so on some systems it is possible to do capacity planning just by examining the queue sizes.


3. Message queues are indeed complicated, and not always stable, beasts. Queues are a simple concept; how did we get to the point where we need all those servers, protocols and applications simply to create a queue?

Depending on your problem definition, it is possible that message queues are excessive overhead. Sometimes all you need is a memory structure and a few pointers. My colleague Marc Fielding created a high-performance queuing system with a database table and two jobs. Some developers consider the database a worse overhead and prefer to implement their queues with a file, split and xargs. If this satisfies your requirements, then by all means use those solutions.

In other cases, I've attempted to implement a simple queuing solution, but the requirements kept piling up: what if we want to add more consumers? What if the consumer crashed and processed only some of the messages it retrieved? By the time I finished tweaking my system to address all the new requirements, it would have been far easier to use an existing solution. So I advise using home-grown solutions only if you are reasonably certain the requirements will remain simple. If you suspect that you'll have to deal with multiple subscribers, which may or may not need to retrieve the same message multiple times, may or may not want to acknowledge messages, and may or may not want to filter specific message types, then I recommend using an existing solution.

ActiveMQ and RabbitMQ (acquired by SpringSource) are popular open source implementations, and Oracle Advanced Queuing is free if you already have an Oracle RDBMS license. When choosing an off-the-shelf message queue, it is important to understand how the system can be monitored, and to make sure that the queue size, wait times and availability of the queue can be tracked by your favorite monitoring tool. If high availability is a requirement, this should also be taken into account when choosing a message queue provider, since different queue systems support different HA options.


Application Caching:

The Problem:

The database is a sophisticated and well optimized caching machine, but as we saw when we discussed

connection pools, it has its limitations when it comes to scaling. One of those limitations is that a single

database machine is limited in the amount of RAM it has, so if your data working set is larger than the

amount of memory available, your application would have to access the disk occasionally. Disk access is

10,000 times slower than memory access. Even a slight increase in the amount of disk access your

queries have to perform, the type that happens naturally as your system grows, can have devastating

impact on the database performance.

With Oracle RAC, more cache memory is made available by pooling memory from multiple machines into a global cache. However, the performance improvement from the additional servers is not proportional to what you'd see if you added more memory to a single machine: Oracle has to maintain cache consistency between the servers, and this introduces significant overhead. RAC can scale, but not in every case, and it requires careful application design to make it happen.

The Solution:

Memcached is a distributed, memory-only, key-value store. It can be used by the application server to cache the results of database queries that will be used multiple times. The great benefit of Memcached is that it is distributed and can use free memory on any server, allowing caching to be done outside Oracle’s scarce buffer cache. If you have 5 application servers and you allocate 1G of RAM to Memcached on each, you have 5G of additional cache.

The Memcached cache is an LRU, just like the buffer cache: if the application tries to store a new key and there is no free memory, the oldest item in the cache is evicted and its memory is reused for the new key.

According to the documentation, Memcached scales very well when servers are added, because the servers do not communicate with each other at all. Each client has a list of the available servers and a hash function that tells it which server holds the value for which key. When the application requests data from the cache, it connects to a single server and accesses exactly one key. When a single cache node crashes, there will be more cache misses and therefore more database requests, but the rest of the nodes will continue operating as usual.
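A minimal sketch of the idea (real clients such as spymemcached use consistent hashing rather than the plain modulo shown here, so that adding a node remaps only a fraction of the keys; the server list is illustrative):

public class ServerSelector {
    private final String[] servers = {"cache1:11211", "cache2:11211", "cache3:11211"};

    // Every client computes the same server for the same key, with no
    // server-to-server communication at all.
    public String serverFor(String key) {
        return servers[Math.floorMod(key.hashCode(), servers.length)];
    }
}

Because the mapping is computed on the client side, adding cache capacity never requires the cache nodes themselves to coordinate.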

I was unable to find any published benchmarks that confirm this claim, so I ran my own unofficial benchmark, using Amazon’s ElastiCache, a service that allows one to create a Memcached cluster and add nodes to it.

A few comments regarding the use of Amazon’s ElastiCache and how I ran the tests:

1. Amazon’s ElastiCache is only usable from servers on Amazon’s EC2 cloud. To run the test, I created an ElastiCache cluster with two small nodes (1.3G RAM, 1 virtual core each) and one EC2 micro node (613 MB, up to two virtual cores for short bursts) running Amazon’s Linux distribution.

2. I ran the test using Brutis [ix], a Memcached load-testing framework written in PHP. The test is fairly configurable, and I ran it as follows:

• A 7-gets-to-3-sets read/write mix; all reads and writes were random. Values were limited to 256 bit.

• The first test ran with a key space of 10K keys, which fits easily in the memory of one Memcached node. The node was pre-warmed with the keys.

• The second test ran with the same key space and two nodes, both pre-warmed.

• The third test was one node again, with 1M keys, which do not fit in the memory of one or two nodes, and no pre-warming of the cache.

• The fourth test used two nodes and 1M keys, with the second node added after the first node was already active.

• The first three tests ran for 5 minutes each; the fourth ran for 15 minutes.

• The single-node tests ran with 2 threads, and the two-node tests ran with four.

3. Amazon’s cloud monitoring framework was used to monitor Memcached’s statistics. It had two annoying properties: it did not refresh automatically, and the values it showed were always 5 minutes old. In the future, it would be worth the time to install my own monitoring software on an EC2 node to track Memcached performance.

Here is a chart of the total number of gets we could run on each node:

[Chart: total number of gets per node for each test.]

Number of hits and misses per node:

[Chart: number of hits and misses per node for each test.]

A few conclusions from the tests I ran:

1. In the tests I ran, get latency was 2ms on the AWS cluster and 0.0068ms on my desktop. It appears that the only latency you’ll experience with Memcached is the network latency.

2. The ratio of hits to misses did not affect the total throughput of the cluster. The throughput was somewhat better with a larger key space, possibly due to fewer get collisions.

3. Throughput dropped when I added the second server, and total throughput never exceeded 60K gets per minute. It is likely that, in the configuration I ran, the client could not sustain more than 60K gets per minute.

4. 60K random reads per minute at 2ms latency is pretty impressive for two very small servers rented at 20 cents an hour. You would need a fairly high-end configuration to get the same performance from your database.

By using Memcached (or other application-side caching), the load on the database is reduced, since there are fewer connections and fewer reads. Database slowdowns have less impact on application responsiveness: since on many pages most of the data arrives from the cache, the page can display gradually without users feeling that they are waiting forever for results. Even better, if the database is unavailable, you can still maintain partial availability of the application by displaying cached results – in the best cases, only write operations will be unavailable while the database is down.

The Architecture:

[Figure: the connection pool and message queue architecture from the previous sections, with Memcached added alongside the application data layer.]


New Problems:

Unlike Oracle's buffer cache, which is used by queries automatically, use of the application cache does not happen automatically; it requires code changes to the application. In this sense it is somewhat similar to Oracle's result cache: it stores results on request, rather than caching data blocks automatically. The changes required to use Memcached are usually made in the data layer: the code that queries the database is replaced by code that queries the database only if the result is not found in the cache first.

This places the burden of properly using the cache on the developers. It is said that the only difficult problems in computer science are naming things and cache invalidation. The purpose of this paper is not to solve the most difficult problems in computer science, but we will offer some advice on the proper use of Memcached.

In addition, Memcached presents the usual operational questions: how big should it be, and how can it be monitored? We will discuss capacity planning and monitoring of Memcached as well.

Solutions to new problems:

The first step in integrating Memcached into your application is to rewrite the functions in your data layer so that they look for data in the cache before querying the database.

For example, the following:

function get_username(int userid) {
    username = db_select("SELECT username FROM users WHERE userid = ?",
                         userid);
    return username;
}

will be replaced by:

function get_username(int userid) {
    /* first try the cache */
    name = memcached_fetch("username:" + userid);
    if (!name) {
        /* not found: query the database */
        name = db_select("SELECT username FROM users WHERE userid = ?",
                         userid);
        /* then store in the cache until the next get */
        memcached_add("username:" + userid, name);
    }
    return name;
}


We will also need to change the code that updates the database so that it updates the cache as well; otherwise we risk serving stale data:

function update_username(int userid, string username) {
    /* first update the database */
    result = db_execute("UPDATE users SET username = ? WHERE userid = ?",
                        username, userid);
    if (result) {
        /* database update successful: update the cache */
        memcached_set("username:" + userid, username);
    }
}

Of course, not every function should be cached. The cache has a limited size, and there is an overhead to checking the cache for data that is not actually there. The main benefit comes from caching the results of large or highly redundant queries.

To use the cache effectively without risking data corruption, keep the following in mind:

1. Use ASH data to find the queries that use the most database time. Queries that take significant

amount of time to execute and short queries that execute very often are good candidates for

caching. Of course many of these queries use bind variables and return different results for each

user. As we showed in the example, the bind variables can be used as part of the cache key to

store and retrieve results for each group of binds separately. Due to the LRU nature of the

cache, commonly used binds will remain and cache and get reused while infrequently used

combinations will get evicted.

2. Memcached can take large amounts of memory (the more the merrier!), but there is evidence [x] that it does not scale well across a large number of cores. This makes Memcached a good candidate to share a server with an application that makes intensive use of the CPU and doesn't require as much memory. Another option is to create multiple virtual machines on a single multi-core server and install Memcached on each of them; however, this configuration means that you will lose most of your caching capacity with the crash of a single physical server.

3. Memcached is not durable. If you can't afford to lose specific information, store it in the database before you store it in Memcached. This seems to imply that you can't use Memcached to scale a system that primarily performs a large number of writes; in effect, it depends on the exact bottlenecks. If your top wait event is "log file sync", you can use Memcached to reduce the total amount of work the database does, reducing the CPU load and therefore potentially reducing the "log file sync" waits.

4. Some data should be stored eventually, but can be lost without critical impact to the system. Instrumentation and logging information is definitely in this category. Such information can be accumulated in Memcached and written to the database infrequently, in batches.


5. Consider pre-populating the cache. If you rely on Memcached to keep your performance predictable, a crash of a Memcached server will send significant amounts of traffic to the database, and the effect on performance will be noticeable. When the server comes back, it can take a while until all the data is loaded into the cache again, prolonging the period of reduced performance. To improve performance in the first minutes after a restart, consider a script that pre-loads data into the cache when the Memcached server starts (a minimal sketch appears after this list).

6. Consider very carefully what to do when the data is updated. Sometimes it is easy to update the cache simultaneously: if a user changes his address and the address is stored in the cache, update the cache immediately after updating the database. This is the best-case scenario, as the cache remains useful through the update. The Memcached API contains functions that allow changing data atomically and avoiding race conditions.

When the data in the cache is aggregated data, it may not be possible to update it in place, but it is possible to evict the current information as irrelevant and reload it into the cache when it is next needed. This can make the cache useless when the data is updated and reloaded very frequently.

Sometimes it isn't even possible to figure out which keys should be evicted from the cache when a specific field is updated, especially if the cache contains the results of complex queries. This situation is best avoided, but it can be dealt with by setting an expiration time on the data and preparing to serve possibly-stale data for that period of time.
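Returning to point 5, a minimal pre-warm sketch, assuming the spymemcached client; the host names, credentials, query and key format are all illustrative:

import java.net.InetSocketAddress;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import net.spy.memcached.MemcachedClient;

public class CacheWarmer {
    public static void main(String[] args) throws Exception {
        MemcachedClient cache =
                new MemcachedClient(new InetSocketAddress("cache-host", 11211));
        try (Connection db = DriverManager.getConnection(
                     "jdbc:oracle:thin:@db-host:1521/ORCL", "app", "secret");
             Statement st = db.createStatement();
             // Pre-load only the keys most likely to be requested soon.
             ResultSet rs = st.executeQuery(
                     "SELECT userid, username FROM users WHERE last_login > SYSDATE - 7")) {
            while (rs.next()) {
                // Expiry 0 = no explicit TTL; rely on LRU eviction as described above.
                cache.set("username:" + rs.getLong(1), 0, rs.getString(2));
            }
        }
        cache.shutdown();
    }
}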

How big should the cache be?

• It is better to have many servers with less memory each than a few servers with a lot of memory; this minimises the impact of one crashed Memcached server. Remember that there is no performance penalty for a large number of nodes.

• Losing a Memcached instance will always send additional traffic to the database. You need to

have enough Memcached servers to make sure the extra traffic will not cause unacceptable

latency to the application.

• There are no downsides to a cache that is too large, so in general allocate to Memcached all the

memory you can afford.

• If the average number of gets per item is very low, you can safely reduce the amount of memory

allocated.

• There is no "cache size advisor" for Memcached, and it is impossible to predict the effect of growing or shrinking the cache based on the monitoring data available from Memcached alone. SimCache is a tool that, given detailed hit/miss logs from an existing Memcached, can simulate an LRU cache and predict the hit/miss ratio at various cache sizes. In many environments keeping such a detailed log is impractical, but tracking a sample of the requests may be possible and can still be used to predict cache effects.

• Knowing the average latency of database reads under various loads and the latency of Memcached reads should allow you to predict changes in response time as Memcached's size and hit ratio change. For example: you use SimCache to see that with a cache size of 10G you will have a hit ratio of 95% in Memcached. Memcached has a latency of 1ms in your system. With 5% of the queries hitting the database, you expect database CPU utilization to be around 20%, almost 100% of DB Time on CPU, and almost no wait time in the queue between the business and data layers (you tested this separately when sizing your connection pool). In this case the database latency will be 5ms, so we expect the average latency for the data layer to be 0.95 × 1 + 0.05 × 5 = 1.2ms.

How do I monitor Memcached?

• Monitor the number of items, gets, sets and misses. An increase in the number of cache misses means the database load is increasing at the same time, and can indicate that more memory is needed. Make sure the number of gets is higher than the number of sets: if you are setting more than you are getting, the cache is a waste of space. If the number of gets per item is very low, the cache may be oversized. There is no downside to an oversized cache, but you may want to use the memory for another purpose.

• Monitor the number of evictions. Data is evicted when the application attempts to store a new item and there is no memory left, so an increase in the number of evictions can also indicate that more memory is needed. Eviction time shows the time between the last get of an item and its eviction; if this period is short, it is a good indication that memory shortage is making the cache less effective.

• It is important to note that a low hit rate and a high number of evictions do not immediately mean you should buy more memory. It is possible that your application is misusing the cache:

o Maybe the application sets large numbers of keys, most of which are never used again. In this case you should reconsider the way you use the cache.

o Maybe the TTL for the keys is too short. In this case you will see a low hit rate, but not many evictions.

o Maybe the application frequently attempts to get items that don't exist, perhaps due to data purging of some sort. Consider setting such keys with a "null" value, to make sure the invalid searches do not hit the database over and over.

• Monitor for swapping. Memcached is intended to improve performance by caching data in memory; if the data spills to disk, it is doing more harm than good.

• Monitor the average response time. You should see very few requests that take over 1-2ms; longer wait times can indicate that you are hitting the server's maximum connection limit, or that CPU utilization on the server is too high.

• Monitor that the number of connections to the server does not approach Memcached's max connections setting (which is configurable).

• Do not monitor "stats sizes" for statistics about the size of the items in the cache: this command locks up the entire cache.


All the values mentioned above can be read from Memcached using the "stats" command in its API. You can run this command and get the results directly by telnetting to port 11211. Many monitoring systems, including Cacti and Ganglia, include monitoring templates for Memcached.
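For example (the stat names are real Memcached statistics; the values shown are illustrative):

$ telnet cache-host 11211
stats
STAT uptime 602123
STAT curr_connections 12
STAT cmd_get 1052731
STAT cmd_set 221015
STAT get_hits 987611
STAT get_misses 65120
STAT curr_items 48210
STAT evictions 1203
END

Hit rate is get_hits / cmd_get (about 94% here), and gets per item is roughly cmd_get / curr_items; both feed directly into the sizing discussion above.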

[i] Yuki Sugiyama, Minoru Fukui, Macoto Kikuchi, Katsuya Hasebe, Akihiro Nakayama, Katsuhiro Nishinari, Shin-ichi Tadaki and Satoshi Yukawa, "Traffic jams without bottlenecks - experimental evidence for the physical mechanism of the formation of a jam", New Journal of Physics, Vol. 10 (2008), 033001

[ii] http://www.telegraph.co.uk/science/science-news/3334754/Too-many-cars-cause-traffic-jams.html

[iii] James Morle, Scaling Oracle8i: Building Highly Scalable OLTP System Architectures

[iv] http://www.youtube.com/watch?v=xNDnVOCdvQ0

[v] http://docs.oracle.com/javase/1.4.2/docs/guide/jdbc/getstart/datasource.html

[vi] http://www.perfdynamics.com/Manifesto/USLscalability.html

[vii] http://teddziuba.com/2011/02/the-case-against-queues.html

[viii] http://www.cmg.org/measureit/issues/mit62/m_62_15.html

[ix] http://code.google.com/p/brutis/

[x] http://assets.en.oreilly.com/1/event/44/Hidden%20Scalability%20Gotchas%20in%20Memcached%20and%20Friends%20Presentation.pdf