Memcached vs DB Cache Comparison

Today I’m going to talk about some performance considerations for web applications and the advantages of having a memory cache.

Introduction

As noted in my previous memcached post, most applications have some data that can be kept in a local cache so you don’t have to query it every time a user enters your site, such as the exchange rates example from that post.
For the exchange rates example, you could store the values locally in several ways: a temporary file in the local file system, a database, a memory cache, or whatever comes to mind, so you don’t need to query the external service again for the rest of the day.
Sometimes, if your application already uses a running DB, you may want to reuse it to store this temporary data, so you don’t need to install any additional software such as memcached. As a developer, it can be a hard task to convince the sysadmin to install new software in the environment.
In this post I’ll compare the performance of using a local DB as a cache (PostgreSQL) against a memory cache (Memcached). You may use this comparison as an argument to move to a memory cache (I mention memcached, but there are others; I listed some in my previous post, and you can find more using your preferred search engine 😉 ).

Test Environment

For the test environment I installed a local PostgreSQL server and a Memcached server; both were running simultaneously during the tests.
For every test I inserted a total of 10000 keys and then read them sequentially. After that, I compared the inserts/second and reads/second and made a graph for each test case. The variables considered were: length of the key (10 to 250 characters), size of the value object (10kB to 1MB), and the number of threads (1 to 40). In each graph, the horizontal axis represents the variable and the vertical axis shows the insertions or reads per second, depending on the case.

You can find the source code of the tests here. I didn’t write the code with the intention of sharing it afterwards, so it may be incomplete (the DB cache only implements the put and get methods, not delete or flush) and a bit hard to read if you don’t speak Spanish. If you read it and have questions, just ask in the comments.
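
For reference, the overall shape of each test is roughly the following. This is a minimal sketch rather than the actual test code: the SimpleCache interface and the class, method, and variable names are hypothetical, standing in for the Memcached and PostgreSQL implementations used in the real tests.

import java.util.Random;

//Both back ends are driven through the same minimal interface and timed over 10000 keys
interface SimpleCache {
    void put(String key, byte[] value);
    byte[] get(String key);
}

public class CacheBenchmark {

    static final int KEY_COUNT = 10000;

    static void run(SimpleCache cache, int keyLength, int valueSize) {
        byte[] value = new byte[valueSize];
        new Random().nextBytes(value);

        //Insert phase: 10000 keys of the requested length
        long start = System.nanoTime();
        for (int i = 0; i < KEY_COUNT; i++) {
            cache.put(buildKey(i, keyLength), value);
        }
        double insertsPerSec = KEY_COUNT / ((System.nanoTime() - start) / 1e9);

        //Read phase: the same keys, read back sequentially
        start = System.nanoTime();
        for (int i = 0; i < KEY_COUNT; i++) {
            cache.get(buildKey(i, keyLength));
        }
        double readsPerSec = KEY_COUNT / ((System.nanoTime() - start) / 1e9);

        System.out.printf("inserts/s: %.0f, reads/s: %.0f%n", insertsPerSec, readsPerSec);
    }

    //Pads the numeric index with filler characters up to the desired key length
    static String buildKey(int i, int keyLength) {
        StringBuilder sb = new StringBuilder("key").append(i);
        while (sb.length() < keyLength) {
            sb.append('x');
        }
        return sb.toString();
    }
}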

Test Results

Variable length of key

In this test, the variable is the length of the key, and the object size is fixed at 6kB. Memcached limits keys to 250 characters, so I wondered how approaching this limit affects performance. Here you can see that when adding keys longer than 200 characters, the number of insertions/sec decreases significantly for memcached (red line).

The same tests running against the DB (blue line) show an almost constant number of insertions and reads per second, but far fewer than memcached. The peak insertions/second for Memcached is 14700, whereas for the DB it is 350.

The reads per second for memcached are more than 3 times those of the DB.

Variable object size

In this case the variable is the object size, from 10kB to 1MB, and the key length is fixed at 100 characters.
It is very clear how performance drops in both cases as the size of the inserted object increases. The largest object inserted is 1MB, since this is the limit for Memcached.

Variable number of threads

This last case fixes the key length at 100 characters and the object size at 6kB. The variable is the number of threads used to read the keys; each thread reads all 10000 keys previously stored in the cache. Only the read part matters here, since the objects are inserted into the cache just once.

Clearly, the reads/second limit for the DB is reached very quickly: with more than 4 threads, the reads/sec remain constant between 1500 and 2000.
For memcached this limit is not reached, and adding more threads keeps increasing the reads/sec. I didn’t try more than 40 threads, but this shows that, from a performance point of view, a single memcached server can be shared by 2 or more applications.
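
For reference, the threaded read phase of this test looks roughly like the following sketch (again hypothetical code, reusing the SimpleCache interface and the buildKey helper from the earlier sketch): each thread reads all 10000 keys, and the reads/second figure is computed over the combined total.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadedReadTest {

    static void run(SimpleCache cache, int threads, int keyLength) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                //Each thread reads the full set of 10000 previously inserted keys
                for (int i = 0; i < 10000; i++) {
                    cache.get(CacheBenchmark.buildKey(i, keyLength));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);

        double totalReads = threads * 10000;
        double readsPerSec = totalReads / ((System.nanoTime() - start) / 1e9);
        System.out.printf("%d threads -> %.0f reads/s%n", threads, readsPerSec);
    }
}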

Conclusion

In every case, as expected, the memory cache performs better than the DB, no matter which variable is changed.
Even though memcached loses performance with big objects, in a normal range (e.g. objects smaller than 500kB) it performs very well. More importantly, memcached scales very well and very easily: if the performance isn’t good enough, you can simply start another memcached server on any machine with some spare memory, and that’s it, the number of reads and writes per second will increase.

Please note that this comparison is not a formal benchmark of these caches, because the tests add some overhead: they are written in Java, and the Java VM has some time and memory overhead of its own. A more formal benchmark would use a client written in a language with a smaller footprint, such as C or Lua. I used Java because my main goal wasn’t to get an exact number of inserts per second each cache can handle, but an approximate improvement percentage when using a memory cache instead of a DB.

I hope you enjoyed reading the post as much as I enjoyed writing it!

Please let me know if you have any comments or suggestions. I would also like you to share your experiences with other memory caches; I am open to alternatives 😉 .

Memcached and SpyMemcached

Hi all, today I’m going to write about Memcached, which I’ve been using for a while now.
Memcached ( http://memcached.org/ ) is a free and open source memory object caching system; there is also a version with an integrated DB called Couchbase (previously called Membase). The main idea of a memory cache is to hold objects in memory, avoiding unneeded DB queries. When using a cache you need to make sure the data can be recovered from somewhere else if the cache fails, since cached data can be erased when it expires, when the server needs to free up some memory, or when the cache server goes down. An example of this is daily currency rates: since these values don’t change for several hours, you can query them once through a web service and store them in a local cache, so you don’t query the web service each time a user navigates to your site.
When storing objects in a Memcached server there are some limits: an object must not exceed 1MB in size, and a key has a maximum length of 250 characters.
Memcached can be compiled for any Linux distribution (download here) or run on Windows after downloading the ported binaries, which can be found here (http://code.jellycan.com/memcached/).
Either way, running and configuring the server is extremely easy. I will be running these examples on Windows, but the Linux way is analogous. The simplest way to run the server is with the command:

memcached

This command starts a memcached server that by default can hold up to 64MB of objects and listens on port 11211. You can change these parameters with the -m and -p options respectively. There are other parameters you can set at startup, but these two are enough to have a functional server. You can look up the other options in the wiki. Here is an example:

#Runs a memcached server that can hold up to 512MB of objects and listens on port 11230
memcached -m 512 -p 11230

Memcached has clients for several languages (C, C++, Java, PHP and so on), all the possible clients are listed here. You can even use it from a telnet client.
Since I am mostly into Java, I’ll talk a bit more about the Java clients. According to the clients page, there are several clients for Java. I didn’t get to try them all, but I went with Spymemcached, since it is the one maintained by Couchbase, it has support for Couchbase, and it is one of the most recently updated. There is another recently updated project called xmemcached, which has nice documentation.

Jumping right into Spymemcached, it is quite straightforward to use. Just download the jar file, create a client, and start setting keys:

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

//Create a memcached client (this will also start a thread that monitors server availability and communication)
MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));

//Just start putting objects in the cache.
//Here, "someKey" is the key under which the object will be stored,
// 3600 is the number of seconds the object will be kept, and
// someObject is any object that implements Serializable.
client.set("someKey", 3600, someObject);

//This will synchronously retrieve the object from the cache
Object obj = client.get("someKey");
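
Putting this together with the daily currency rates example from the beginning of the post, a cache-aside lookup could look roughly like the sketch below. Note that this is only a sketch: RatesCache and fetchRatesFromWebService() are hypothetical names standing in for your own code, the one-hour expiration is arbitrary, and error handling is left out.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.HashMap;
import net.spy.memcached.MemcachedClient;

//Cache-aside sketch: look in the cache first, fall back to the web service on a miss,
//and store the result with an expiration so it refreshes itself periodically.
public class RatesCache {

    private final MemcachedClient cache;

    public RatesCache() throws IOException {
        cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
    }

    @SuppressWarnings("unchecked")
    public HashMap<String, Double> getRates() {
        HashMap<String, Double> rates = (HashMap<String, Double>) cache.get("rates");
        if (rates == null) {
            //Cache miss (expired, evicted or server restarted): rebuild from the source of truth
            rates = fetchRatesFromWebService();
            cache.set("rates", 3600, rates); //keep for one hour
        }
        return rates;
    }

    private HashMap<String, Double> fetchRatesFromWebService() {
        //Placeholder: call the real exchange rate web service here
        return new HashMap<String, Double>();
    }
}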

Even having a cluster of Memcached servers is easy: you just create one MemcachedClient with several addresses. The client decides where to store and retrieve each key using an internal hashing method.

//Create the Memcached client with several addresses and use it, simple as that
//(AddrUtil comes from the same net.spy.memcached package)
MemcachedClient c = new MemcachedClient(AddrUtil.getAddresses("localhost:11211 localhost:11212"));

There is also a CacheMap class, which allows you to access the cache as if it were a Map:

import net.spy.memcached.AddrUtil;
import net.spy.memcached.CacheMap;
import net.spy.memcached.MemcachedClient;

MemcachedClient c = new MemcachedClient(AddrUtil.getAddresses("localhost:11211"));
//Here, 100 is the expiration time (in seconds) used when adding elements to the cache,
//and "prefix" is the string that will be prepended to all the keys in this map
CacheMap map = new CacheMap(c, 100, "prefix");

//...
//use the map
map.put("k1", someObject);

//Retrieve the element using the map
Object obj1 = map.get("k1");
//or even using the MemcachedClient directly (prepending the prefix we used for this map):
Object obj2 = c.get("prefixk1");

Finally, I want to mention that there are several alternatives for a caching system (these examples are designed to work with the Java platform):

  1. Apache JCS: It has several caching levels (memory, disk, database, etc.). It also seems a bit more complex to install and configure. link
  2. OSCache: Works inside the same VM as the application, but it works. It hasn’t been updated in a long time.
  3. JCache: It’s an old API that is being refreshed as part of Java EE 7, due at the end of 2012. link

That’s it for now; as simple as that, you can start using Memcached 🙂
I hope you enjoyed this post, let me know your comments!
In my next post I’ll add some comparisons between using a database as a cache and Memcached.