MySQL Conf - Memcached Internals

Brian Aker and Alan Kasindorf gave a great talk on Memcached - every developer's favorite caching layer - at the MySQL User Conference last week. The slides are available online, and they're well worth a bookmark. Brian and Alan promised to keep updating the deck as they refine their presentation for OSCON and other events. In fact, their next iteration will include the Nginx integration trick and a new and improved section on consistent hashing. In other words, check back frequently.

Memcached Best Practices

At its core, Memcached is a high-performance, distributed caching system. It is application neutral and is currently used on many large-scale web sites such as Facebook (2TB of cache, circa Q1 2008), LiveJournal, Mixi, Hi5, etc. However, it is also an extremely simple piece of software: all of the logic is client-side, and there is no security model, failover, backup mechanism, or persistence (although the last one is on the roadmap). That hasn't stopped developers from deploying it in all kinds of environments, and here are a few best practices suggested by Brian:

  • Don't think row-level (database) caching, think complex objects
  • Don't run memcached on your database server, give your database all the memory it can get
  • Don't obsess about TCP latency - localhost TCP/IP is optimized down to an in-memory copy
  • Think multi-get - run things in parallel whenever you can (see the sketch after this list)
  • Not all memcached client libraries are created equal, so do some research on yours - hint: use Brian's.
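
To make the multi-get advice concrete, here is a minimal sketch using the python-memcached client (the client choice is mine, not the talk's; the "use Brian's" hint refers to his libmemcached C library): cache whole objects and fetch several of them in a single round trip.

    # Minimal multi-get sketch with python-memcached (client choice is an
    # assumption; any decent client exposes an equivalent call).
    import memcache

    mc = memcache.Client(['127.0.0.1:11211'])

    # Cache complex objects rather than individual database rows...
    mc.set('user:42', {'id': 42, 'name': 'alice', 'friends': [7, 9]}, time=300)
    mc.set('user:43', {'id': 43, 'name': 'bob', 'friends': [42]}, time=300)

    # ...and fetch several of them in one round trip with a multi-get.
    users = mc.get_multi(['user:42', 'user:43', 'user:44'])

    # Keys that missed the cache ('user:44' here) are simply absent from the dict.
    for key, value in users.items():
        print(key, value)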

Slab Allocator - Heart of Memcached

The heart of Memcached is in its memory slab allocator. A little daunting at first sight, it is actually a very elegant solution once you understand the motivation and the tradeoffs of its architecture:

  • The maximum amount of memory allocated to Memcached is limited only by the architecture (32/64-bit)
  • The key size is limited to 250 bytes; the data (value) size is limited to 1MB
  • On startup, Memcached grabs the memory and holds on to it - some memory is wasted in exchange for performance
  • The slab allocator splits the allocated memory into buckets of different sizes
  • The default slab allocator will create 32-39 buckets for objects up to 1MB in size
  • You can customize the page size at compile time, and slab sizes on startup
  • Each object gets stored in the closest-size bucket - yes, some memory is wasted (see the sketch after this list)
  • Fragmentation can be a problem - either customize slab sizes, or evict/flush your cache every so often
  • If there is unused memory in a different slab class, that memory will be reallocated when required
  • To guarantee no paging, disable swap on your OS - in practice, just keep it tiny, so as to avoid disasters
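
As a back-of-the-envelope illustration of the bucket ("slab class") layout, here is a small sketch that derives chunk sizes the way the default allocator does: growing by a fixed factor up to the 1MB page size. The 1.25 growth factor, 80-byte starting chunk, and 8-byte alignment are assumptions based on memcached's defaults of the era, so treat the exact numbers as illustrative.

    # Rough sketch of how the default slab classes are derived: chunk sizes
    # grow by a fixed factor (-f on the command line) up to the 1MB page size.
    # The constants below are assumptions; exact values vary by version.
    PAGE_SIZE = 1024 * 1024   # 1MB pages (customizable at compile time)
    GROWTH_FACTOR = 1.25      # default chunk growth factor
    MIN_CHUNK = 80            # smallest chunk: item overhead + minimum payload

    def slab_classes():
        sizes, chunk = [], float(MIN_CHUNK)
        while chunk < PAGE_SIZE / GROWTH_FACTOR:
            aligned = int(chunk + 7) // 8 * 8   # round up to 8-byte alignment
            sizes.append(aligned)
            chunk *= GROWTH_FACTOR
        sizes.append(PAGE_SIZE)                 # the last class holds one item per page
        return sizes

    classes = slab_classes()
    print(len(classes), 'slab classes:', classes[:4], '...', classes[-2:])

    # Each object lands in the smallest chunk that fits it; anything that
    # falls between two chunk sizes wastes the difference.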

Memory Management

The memory is managed via a Least Recently Used (LRU) algorithm:

  • Each slab class has its own LRU - the eviction target depends on the size of the incoming object (illustrated below)
  • Expiration timestamps are checked once every second - minimum lifespan is 1 second
  • Objects marked for deletion are handled asynchronously - checked and evicted once every 5 seconds
  • Inconsistency between the two timers listed above can result in a sub-optimal eviction policy
  • LRU can be completely disabled - do it at your own risk
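
To make the per-slab-class LRU point concrete, here is a toy sketch (my own illustration, not memcached's actual code): each size bucket keeps its own LRU list, so an incoming object can only evict items from its own class.

    # Toy per-slab-class LRU: an incoming object only evicts items from the
    # size bucket it belongs to. Illustration only - not memcached's code.
    from collections import OrderedDict

    SLAB_SIZES = [64, 128, 256, 512]   # simplified chunk sizes
    CHUNKS_PER_CLASS = 3               # tiny capacity so evictions are visible

    lru = {size: OrderedDict() for size in SLAB_SIZES}   # one LRU per class
    key_to_slab = {}                                     # where each key lives

    def slab_for(nbytes):
        return next(size for size in SLAB_SIZES if nbytes <= size)

    def cache_set(key, value):
        size = slab_for(len(value))
        bucket = lru[size]
        if key not in bucket and len(bucket) >= CHUNKS_PER_CLASS:
            evicted, _ = bucket.popitem(last=False)      # evict LRU of THIS class only
            del key_to_slab[evicted]
        bucket[key] = value
        bucket.move_to_end(key)                          # most recently used at the end
        key_to_slab[key] = size

    def cache_get(key):
        size = key_to_slab.get(key)
        if size is None:
            return None
        lru[size].move_to_end(key)                       # touch: now most recently used
        return lru[size][key]

    # A flood of large values never evicts the small one - it lives in a
    # different slab class with its own LRU.
    cache_set('small', b'x' * 10)
    for i in range(10):
        cache_set(f'big:{i}', b'y' * 500)
    print(cache_get('small') is not None)                # True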

Best Practices: Invalidations and Expiry

Memcached does not provide any mechanism for deleting a set of associated keys (by object, name, etc.). For better or worse, you could implement this functionality yourself with the help of the prepend and append commands; however, be careful with the 1MB limit! A much cleaner way to handle this situation is to forgo the invalidation process altogether:

  • Instead of invalidating your data, expire it whenever you can - memcached will do all the work
  • Generate smart keys - e.g. on update, increment a version number, which will become part of the key
  • For bonus points, store the version number in memcached - call it the generation (sketched after this list)
  • The latter will be added to Memcached soon - as soon as Brian gets around to it
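
Here is a minimal sketch of that generation trick, again assuming the python-memcached client: keep one counter per group of keys, fold it into every key, and bump it to implicitly invalidate the whole group at once (the orphaned entries simply age out via their expiry times).

    # Minimal 'generation' (versioned key) sketch - the client choice and
    # helper names are my assumptions, not part of the talk.
    import memcache

    mc = memcache.Client(['127.0.0.1:11211'])

    def generation(namespace):
        gen = mc.get(f'gen:{namespace}')
        if gen is None:
            mc.add(f'gen:{namespace}', 1)    # add() is a no-op if the key already exists
            gen = mc.get(f'gen:{namespace}') or 1
        return gen

    def versioned_key(namespace, key):
        return f'{namespace}:{generation(namespace)}:{key}'

    def invalidate(namespace):
        generation(namespace)                # make sure the counter exists
        mc.incr(f'gen:{namespace}')          # old keys are orphaned and simply expire

    # Cache several pieces of data for user 42...
    mc.set(versioned_key('user:42', 'profile'), {'name': 'alice'}, time=3600)
    mc.set(versioned_key('user:42', 'friends'), [7, 9], time=3600)

    # ...then 'delete' all of them with a single counter bump.
    invalidate('user:42')
    print(mc.get(versioned_key('user:42', 'profile')))   # None - new generation, fresh keys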

Roadmap and Future Ahead

Memcached is already a force to be reckoned with, and the roadmap will only solidify that position. Amongst the big objectives:

  • Binary protocol is in the works
  • Support for generations - as described in the section above
  • Multi-engine support: char-based, durable (persistence!), queue
  • Facebook has overhauled the core to be highly-threaded, and is expected to contribute the changes

Kudos to Brian and Alan for a great presentation! Don't forget to bookmark the slides and check back every so often, as they will be updated with time to describe best practices, common mistakes, and the roadmap ahead.
