Caching

Summary

This subject is much larger than the blog title suggests.  I’m going to discuss some basics of using a caching system to take the load off your database and speed up your website.  I’ll also discuss how the cache should be handled by your software and what the pitfalls of caching are.

Caching Data Calls

First of all, you can speed up your database routines by using a cache system configured as a helper.  Here’s a diagram of what I mean by configuring a cache system as a helper:

Basically, what this abstract diagram shows is that the website front-end retrieves its data from the database by going through the cache server.  Technically, that is not how the system is set up.  The cache and database are both accessed through the back-end, but the cache reads are set up to follow this flow chart:

You’ll notice that the first place to check is the cache system.  Only if the data is not in the cache, does the program connect to the database and read the data from there.  If the data is read from the database, then it must be saved in the cache first, then returned to the calling method.
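Here is a minimal sketch of that cache-aside flow in C#; the _cache and _database objects (and the Product type) are placeholders for whatever cache client and data access layer you actually use:

// Hypothetical cache-aside read: check the cache, fall back to the database
// on a miss, then save the result so the next call is served from the cache.
public Product GetProduct(int productId)
{
    string key = "Product:" + productId;

    Product product = _cache.Get<Product>(key);     // returns null on a cache miss
    if (product == null)
    {
        product = _database.ReadProduct(productId); // only hit the database on a miss
        _cache.Set(key, product);                   // save for subsequent requests
    }
    return product;
}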

One other thing to note here is that it is vitally important to design and test your cache system so that it keeps working, without crashing, if the cache server is off or not responding.  If your cache server crashes, you don’t want your website to crash; you want it to continue operating directly from the database.  Your website will operate more slowly, but it will operate.  This prevents your cache system from becoming a single point of failure.

Why is Caching Faster?

Programmers who have never dealt with caching systems before might ask this question.  Throwing an extra server in the middle and adding more code to every data read doesn’t sound like a way to speed up a system.  The reason for the speed increase is that the cache system is typically RAM based, whereas databases are mostly hard-drive based (though all modern databases do caching of their own).  Also, a Redis server costs pennies compared to a SQL Server instance in licensing costs alone.  By reducing the load on your SQL Server you can reduce the number of processors needed for your SQL instances and reduce your licensing fees.

This sounds so magical, but caching requires great care and can cause a lot of problems if not implemented correctly.

Caching Pitfalls to Avoid

One pitfall of caching is handling stale data.  If you are using an interactive interface with CRUD operations, you’ll want to make sure you flush your cached objects on edits and deletes.  You only need to delete the cache keys that relate to the data changed in the database.  This becomes complicated if the data being changed shows up in more than one cache key.  An example is where you cache result sets instead of the raw data: one cache object might contain a list of products at a particular store, including the store name, and another might contain product coupons offered by that store, also including the store name.  Caching is not a normalized structure; in this case each cached object is matched to the web page that needs the data.  Now think about what happens when the store name is changed for one reason or another.  If your software is responsible for clearing the cache when data is changed, then the store name administration page must be aware of all cache keys that could contain the store name and delete/expire those keys.  The quick and dirty method is to flush all the cache keys, but that could cause a slowdown of the system and is not recommended.
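As a rough illustration, an administration page that renames a store would have to expire every key that embeds that store’s data, not just one; the key names and helper methods here are invented for the example:

// Hypothetical invalidation after a store-name change: every cached result
// set that includes the store name must be deleted or expired.
public void RenameStore(int storeId, string newName)
{
    _database.UpdateStoreName(storeId, newName);

    _cache.Delete("Store:" + storeId);           // the store record itself
    _cache.Delete("StoreProducts:" + storeId);   // product list that shows the store name
    _cache.Delete("StoreCoupons:" + storeId);    // coupon list that shows the store name
}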

It’s also tempting to cache every result in your website.  While this could theoretically be done, there is a point where caching becomes more of a headache than a help.  One instance is caching large result sets that constantly change.  For example, suppose you have a list of stores with sales figures that you refer to often.  The underlying query to compute this data might be SQL intensive, so it’s tempting to cache the results.  However, if the data is constantly changing, then each change must clear the cache, and the underlying churn of caching and expiring cache keys can slow down your system.  Another example is caching a list of employees with a different cache key per sort or filter operation.  This can lead to a large number of cached sets that fill up your cache server and cause it to expire cached items early in order to make room for the constant stream of new data.  If you need to cache a list, then cache the whole list and do your filtering and sorting in your website after reading it from the cache, as in the sketch below.  If there is too much data to cache, you can limit the cache time.
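For example, instead of one key per sort/filter combination, you might cache the raw employee list once and do the sorting and filtering in memory after the read (again, the cache and data access calls are stand-ins):

// Hypothetical: one cached key for the raw employee list, with filtering and
// sorting done in the website after the cache read.
public List<Employee> EmployeesForDepartment(string department)
{
    List<Employee> employees = _cache.Get<List<Employee>>("EmployeeList");
    if (employees == null)
    {
        employees = _database.ReadAllEmployees();
        _cache.Set("EmployeeList", employees);
    }

    return employees
        .Where(e => e.Department == department)
        .OrderBy(e => e.LastName)
        .ToList();
}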

Sticky Cache

You can adjust the cache expire time to improve system performance.  If your cached data doesn’t change very often, like a lookup table that is used everywhere in your system, then make the expire time infinite.  Handle all cache expiration through your interface by only expiring the data when it is changed.  This can be one of your greatest speed increases, especially if you have lookup tables that are hit by all kinds of pages, like your system settings or user rights matrix.  These lookup tables might get hit several times for each web page access.  If you can cache that lookup, then the cache system will take the hit instead of your database.  This type of caching is sometimes referred to as “sticky” because the cache keys stick in the system and never expire.
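A sticky entry is just a cache write with no expiration, paired with an explicit delete in the one place the data can change; a sketch with placeholder names:

// Hypothetical sticky cache for a settings lookup: no expire time is set,
// so the key lives until the admin screen explicitly removes it.
public SystemSettings GetSettings()
{
    var settings = _cache.Get<SystemSettings>("SystemSettings");
    if (settings == null)
    {
        settings = _database.ReadSettings();
        _cache.Set("SystemSettings", settings);   // no expire time: sticky
    }
    return settings;
}

public void SaveSettings(SystemSettings settings)
{
    _database.SaveSettings(settings);
    _cache.Delete("SystemSettings");   // the next read reloads and re-caches
}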

Short Cache Cycles

The opposite extreme is to assign a short expire time.  This can be used when you cache the results of a list page.  The cached page might have an expire time of 5 minutes with a sliding expiration.  That allows a user to view the list and click the next-page button until they find what they are looking for.  After the user has found the piece of data to view or edit, the list is no longer needed.  The expire time can kick in and expire the cached data, or the view/edit screen can expire the cache when the user clicks the link to go to that page.  Caching just the raw list can also reduce database calls: sorting and searching can be done from the front-end.  When the user clicks the header of a column to sort by that column, the data can be re-read from the cache and sorted, instead of being read from the database.
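With StackExchange.Redis, for instance, a short sliding window can be approximated by setting a TTL on the write and renewing it on every read; this is only a sketch, and the Serialize/Deserialize and query helpers are assumed:

// Hypothetical 5-minute sliding cache for a list page.
public List<Customer> GetCustomerListPage()
{
    IDatabase db = _connection.GetDatabase();       // _connection: a shared ConnectionMultiplexer
    TimeSpan window = TimeSpan.FromMinutes(5);

    RedisValue cached = db.StringGet("CustomerListPage");
    if (cached.HasValue)
    {
        db.KeyExpire("CustomerListPage", window);   // renew the TTL on each read (sliding)
        return Deserialize<List<Customer>>(cached);
    }

    List<Customer> list = ReadCustomerListFromDatabase();        // hypothetical query helper
    db.StringSet("CustomerListPage", Serialize(list), window);   // expires in 5 minutes if untouched
    return list;
}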

This particular use of caching should only be used after careful analysis of your data usage.  You’ll need the following data points:

  • Total amount of data per cache key.  This will be the raw query of records that will be read by the user.
  • Total number of times the database is accessed on first arrival at the web page in question, per user.  This can be used to compute the memory used by the cache system.
  • Average number of round trips to the database the website uses when a user is performing a typical task.  This can be obtained by totaling the number of accesses to the database by each distinct user.

Multiple Cache Servers

If you get to a point where you need more than one cache server there are many ways to divide up your cached items.  If you’re using multiple database instances, then it would make sense to use one cache server per instance (or one per two database instances, depending on usage).  Either way, you’ll need a way to know which cache instance a given key is stored on.  If you have a load-balanced web farm set up with a round-robin scheme, you don’t want to perform your caching per web server.  Such a setup would cause your users to get cache misses more often than hits and you would duplicate your caching for most items.  It’s best to think of this type of caching as being married to your database.  Each database should be matched up with one cache server.

If you have multiple database instances that you maintain for your customer data and a common database for system-related lookup information, it would be advisable to set up a caching system for the common database first.  You’ll get more bang for the buck by doing this and you’ll reduce the load on your common system.  Your common data is usually where your sticky caching will be used.  If you’re fortunate, you’ll be able to cache all of your common data and only use your common database for loading the system when there are changes or outages.

Result Size to Cache

Let’s say you have a lookup table containing all the stores in your company.  This data doesn’t change very often and there is one administrative screen that edits this data.  So sticky caching is your choice.  Analysis shows that each website call causes a read from this table to lookup the store name from the id or some such information.  Do you cache the entire table as one cache item or do you cache each store as one cache key per store?

If your front-end only looks up a store by its id, then you can name your cache keys with the store id and it will be more efficient to store each key separately.  Loading the cache will take multiple reads to the database, but each time your front-end hits the cache, the minimum amount of cached data is sent over the wire.
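In that scenario the key name simply embeds the id; for example (LoadAndCacheStore is a hypothetical helper that reads the database and saves the result):

// Hypothetical key naming: one small cached entry per store id.
string key = "Store:" + storeId;
Store store = _cache.Get<Store>(key) ?? LoadAndCacheStore(storeId);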

If your front-end searches or filters by store name, zip code, state, and so on, then it’s wiser to cache the whole table as one key.  Your front-end can then pull all the cached data and perform filtering and sorting as needed.  This will also depend on data size.

If your data size is very large, then you might need to create duplicated cached data for each store by id, each zip code, each state, etc.  This would seem wasteful at first, but remember the data is not stored permanently.  It’s OK to store duplicate results in cache.  The object of cache is not to reduce wasted space but to reduce website latency and reduce the load on your database.

 

.Net MVC Project with AutoFac, SQL and Redis Cache

Summary

In this blog post I’m going to demonstrate a simple .Net MVC project that uses MS SQL server to access data.  Then I’m going to show how to use Redis caching to cache your results to reduce the amount of traffic hitting your database.  Finally, I’m going to show how to use the AutoFac IOC container to tie it all together and how you can leverage inversion of control to break dependencies and unit test your code.

AutoFac

The AutoFac IOC container can be added to any .Net project using the NuGet manager.  For this project I created an empty MVC project and added a class called AutofacBootstrapper to the App_Start directory.  The class contains one static method called Run() just to keep it simple.  This class contains the container builder setup described in the AutoFac Quick Start instructions.
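A minimal version of that bootstrapper might look roughly like this, assuming the Autofac MVC integration package is installed and that MvcApplication is the Global.asax application class of the MVC project; the component registrations shown later in this post would go where indicated:

using System.Web.Mvc;
using Autofac;
using Autofac.Integration.Mvc;

public static class AutofacBootstrapper
{
    public static void Run()
    {
        var builder = new ContainerBuilder();

        // Let Autofac resolve the MVC controllers.
        builder.RegisterControllers(typeof(MvcApplication).Assembly);

        // ... component registrations (shown later in this post) go here ...

        IContainer container = builder.Build();
        DependencyResolver.SetResolver(new AutofacDependencyResolver(container));
    }
}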

Next, I added .Net library projects to my solution for the following purposes:

BusinessLogic – This will contain the business classes that will be unit tested.  All other projects will be nothing more than wire-up logic.

DAC – Data-tier Application.

RedisCaching – Redis backed caching service.

StoreTests – Unit testing library

I’m going to intentionally keep this solution simple and not make an attempt to break dependencies between dlls.  If you want to break dependencies between modules or dlls, you should create another project to contain your interfaces.  For this blog post, I’m just going to use the IOC container to ensure that I don’t have any dependencies between objects so I can create unit tests.  I’m also going to make this simple by only providing one controller, one business logic method and one unit test.

Each .Net project will contain one or more objects and each object that will be referenced in the IOC container must use an interface.  So there will be the following interfaces:

IDatabaseContext – The Entity Framework database context object.

IRedisConnectionManager – The Redis connection manager provides a pooled connection to a Redis server.  I’ll describe how to install Redis for Windows so you can use this.

IRedisCache – This is the cache object that will allow the program to perform caching without getting into the ugly details of reading and writing to Redis.

ISalesProducts – This is the business class that will contain one method for our controller to call.
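To give a feel for the shape of these interfaces, here is roughly what IRedisCache could look like, inferred from how it is used below; the exact signatures in the sample solution may differ:

using System;

public interface IRedisCache
{
    // Return the cached value for keyName, or run queryFunction, cache its
    // result for expireMinutes, and return it.
    T Get<T>(string keyName, int expireMinutes, Func<T> queryFunction);

    // Remove a key, e.g. after an add/edit/delete invalidates the cached data.
    void Delete(string keyName);
}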

Redis Cache

In the sample solution there is a project called RedisCaching.  This contains two classes: RedisConnectionManager and RedisCache.  The connection manager object needs to be set up in the IOC container first.  It needs the Redis server IP address, which would normally be read from a config file.  In the sample code, I fed the IP address into the constructor at the IOC container registration stage.  The second part of the Redis caching is the actual cache object.  This uses the connection manager object and is set up in the IOC container next, using the previously registered connection manager as a parameter like this:

builder.Register(c => new RedisConnectionManager("127.0.0.1"))
    .As<IRedisConnectionManager>()
    .PropertiesAutowired()
    .SingleInstance();

builder.Register(c => new RedisCache(c.Resolve<IRedisConnectionManager>()))
    .As<IRedisCache>()
    .PropertiesAutowired()
    .SingleInstance();

In order to use the cache, just wrap your query with syntax like this:

return _cache.Get("ProductList", 60, () =>
{
  return (from p in _db.Products select p.Name);
});

The code between the { and } is a normal EF LINQ query.  Its result must be returned from the anonymous function: () =>

The cache key name in the example above is “ProductList” and it will stay in the cache for 60 minutes.  The _cache.Get() method checks the cache first; if the data is there, it returns the data and moves on.  If the data is not in the cache, then it calls the inner function, causing the EF query to be executed.  The result of the query is saved to the cache server and then returned.  This guarantees that any identical request within the next 60 minutes will be served directly from the cache.  If you dig into the Get() method code you’ll notice that there are multiple try/catch blocks that swallow errors if the Redis server is down.  In that situation the inner query is executed and the result is returned.  In production your system would run a bit slower and you’d notice your database working harder, but the system keeps running.
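The sample’s Get() is a little longer, but its general shape is roughly the following; this is a sketch that assumes StackExchange.Redis, Newtonsoft.Json for serialization, and a GetDatabase() helper on the connection manager, so treat the downloadable code as the authoritative version:

public T Get<T>(string keyName, int expireMinutes, Func<T> queryFunction)
{
    try
    {
        // Try the cache first; if the Redis server is down this throws and we fall through.
        string cached = _connectionManager.GetDatabase().StringGet(keyName);
        if (cached != null)
        {
            return JsonConvert.DeserializeObject<T>(cached);
        }
    }
    catch (Exception)
    {
        // Ignore cache errors so the site keeps running against the database.
    }

    T result = queryFunction();   // cache miss (or Redis down): run the EF query

    try
    {
        _connectionManager.GetDatabase().StringSet(
            keyName,
            JsonConvert.SerializeObject(result),
            TimeSpan.FromMinutes(expireMinutes));
    }
    catch (Exception)
    {
        // If the save fails we still return the data; it just is not cached this time.
    }

    return result;
}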

A precompiled version of Redis for Windows can be downloaded from here: Service-Stack Redis.  Download the files into a directory on your computer (I used C:\redis), then open a command window, navigate into your directory and use the following command to set up a Windows service:

redis-server --service-install

Please notice that there are two “-” in front of the “service-install” instruction.  Once this is setup, then Redis will start every time you start your PC.

The Data-tier

The DAC project contains the POCOs, the fluent configurations and the context object.  There is one interface for the context object and that’s for AutoFac’s use:

builder.Register(c => new DatabaseContext("Server=SQL_INSTANCE_NAME;Initial Catalog=DemoData;Integrated Security=True"))
    .As<IDatabaseContext>()
    .PropertiesAutowired()
    .InstancePerLifetimeScope();

The connection string should be read from the configuration file before being injected into the constructor shown above, but I’m going to keep this simple and leave out the configuration pieces.

Business Logic

The business logic library is just one project that contains all the complex classes and methods that will be called by the API.  In a large application you might have two or more business logic projects.  Typically, though, you’ll divide your application into independent APIs that each have their own business logic project as well as all the other wire-up projects shown in this example.  By dividing your application by function you’ll be able to scale your services according to which function uses the most resources.  In summary, you’ll put all the complicated code inside this project, and your goal is to apply unit tests to cover all combinations of features that this business logic project contains.

This project will be wired up by AutoFac as well and it needs the caching and the data tier to be established first:

builder.Register(c => new SalesProducts(c.Resolve<IDatabaseContext>(), c.Resolve<IRedisCache>()))
    .As<ISalesProducts>()
    .PropertiesAutowired()
    .InstancePerLifetimeScope();

As you can see, the database context and the Redis caching are injected into the constructor of the SalesProducts class.  Typically, each class in your business logic project will be registered with AutoFac.  That ensures that you can treat each object independently of the others for unit testing purposes.

Unit Tests

There is one sample unit test that performs a test on the SalesProducts.Top10ProductNames() method.  This test only covers the case where there are more than 10 products and the expected count is 10.  For effective testing, you should also test fewer than 10, zero, and exactly 10.  The database context is mocked using Moq.  The Redis caching system is faked using the interfaces supplied by StackExchange.  I chose to set up a dictionary inside the fake object to simulate a cached data point.  There is no check for cache expiration; this is only used to fake out the caching.  Technically, I could have mocked the caching and just made it return whatever went into it.  The fake cache can be effective in testing edit scenarios to ensure that the cache is cleared when someone adds, deletes or edits a value.  The business logic should handle cache clearing, and a unit test should check for this case.
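As a rough illustration, a test along these lines exercises the more-than-ten case; the Product type, a Products property exposed as IQueryable on IDatabaseContext, and the FakeRedisCache class are assumptions about the sample rather than its exact code:

[TestMethod]
public void Top10ProductNames_returns_ten_when_more_products_exist()
{
    // Build 25 fake products and expose them through a mocked context.
    var products = Enumerable.Range(1, 25)
        .Select(i => new Product { Name = "Product " + i })
        .ToList();

    var contextMock = new Mock<IDatabaseContext>();
    contextMock.Setup(c => c.Products).Returns(products.AsQueryable());

    // FakeRedisCache stands in for the dictionary-backed fake described above.
    var salesProducts = new SalesProducts(contextMock.Object, new FakeRedisCache());

    var result = salesProducts.Top10ProductNames();

    Assert.AreEqual(10, result.Count());
}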

Other Tests

You can test whether the real Redis cache is working by starting up SQL Server Management Studio and running the SQL Server Profiler.  Clear the profiler and start the MVC application.  You should see some query activity:

Then stop the MVC program and start it again.  There should be no change to the profiler because the data is coming out of the cache.

One thing to note: you cannot use IQueryable as a return type for your query.  It must be a list, because the data read from Redis is in JSON format and it’s de-serialized all at once.  You can serialize and de-serialize a List<T> object.  I would recommend adding a logger to the cache object to catch errors like this (since there are try/catch blocks).
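In other words, materialize the query inside the delegate:

// Works: the list is serialized as a whole and stored in Redis.
return _cache.Get("ProductList", 60, () =>
    (from p in _db.Products select p.Name).ToList());

// Problematic: an IQueryable cannot round-trip through the JSON stored in Redis.
// return _cache.Get("ProductList", 60, () => from p in _db.Products select p.Name);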

Another aspect of using an IOC container that you need to be conscious of is scope.  This comes into play when you are deploying your application to a production environment.  Typically developers do not have the ability to easily test multi-user situations, so an object with a scope that is too long can cause cross-over data.  If, for instance, you set your business logic to have a scope of SingleInstance() and your list needs to be specific to each user accessing your system, then you’ll end up with the data of the first person who accessed the API.  This can also happen if your API receives an ID for your data on each call: if the object only reads the data when the API first starts up, then you’ll have a problem.  This sample is so simple that it only contains one segment of data (top 10 products).  It doesn’t matter who calls the API, they are all requesting the same data.

Other Considerations

This project is very minimalist, therefore, the solution does not cover a lot of real-world scenarios.

  • You should isolate your interfaces by creating a project just for all the interface classes.  This will break dependencies between modules or dlls in your system.
  • As I mentioned earlier, you will need to move all your configuration settings into the web.config file (or a corresponding config.json file).
  • You should think in terms of two or more instances of this API running at once (behind a load-balancer).  Will there be data contention?
  • Make sure you check for any memory leaks.  IOC containers can make your code logic less obvious.
  • Be careful of initialization code in an object that is started by an IOC container.  Your initialization might occur when you least expect it to.

Where to Get The Code

You can download the entire solution from my GitHub account by clicking here.  You’ll need to change the database instance in the code and you’ll need to set up a Redis server in order to use the caching feature.  A SQL script is provided so you can create a blank test database for this project.

 

Data Caching with Redis and C#

Summary

Caching is a very large subject, so I’m only going to dive into a small piece of it: using Redis, an abstract caching class, and a dummy caching class for unit testing, and showing how to put them together into a simple but powerful caching system.

Caching Basics

The type of caching I’m talking about in this blog post involves the caching of data from a database that is queried repetitively.  The idea is that you would write a query to read your data and return the results to a web site, or an application that is under heavy load.  The data being queried might be something that is used as a look-up, or maybe it’s a sales advertisement that everybody visiting your website is going to see.  

The data request should check to see if the data results are already in the cache first.  If they are not, then read the data from the database, copy it to the cache and then return the results.  After the first time this data is queried, the results will be in the cache and all subsequent queries will retrieve the results from the cache.  One trick to note is that the cache key name needs to be unique to the data set being cached, otherwise you’ll get a conflict.

Redis

Redis is free, powerful and there is a lot of information available about this caching system.  Normally, you’ll install Redis on a Linux machine and then connect to that machine from your website software.  For testing purposes, you can use the Windows version of Redis by downloading this package at GitHub (click here).   Once you download the Visual Studio project, you can build the project and there should be a directory named “x64”.  You can also download the MSI file from here.  Then you can install and run it directly.

Once the Redis server is up and running you can download the StackExchange.Redis client library for C#.  You’ll need to use “localhost:6379” for your connection string (assuming you left the default port of 6379 when you installed Redis).
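Connecting with the StackExchange.Redis client takes only a couple of lines; a minimal sketch:

using StackExchange.Redis;

// Connect once and reuse the multiplexer; it is designed to be shared.
ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("localhost:6379");
IDatabase db = redis.GetDatabase();

db.StringSet("greeting", "hello");           // write a test value
string greeting = db.StringGet("greeting");  // read it back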


Caching System Considerations

First, we want to be able to unit test our code without the unit tests connecting to Redis.  So we’ll need to be able to run a dummy Redis cache when we’re unit testing any method that includes a call to caching.

Second, we’ll need to make sure that if the Redis server fails, we can still run our program.  The program will hit the database every time and everything will run slower than with Redis running (otherwise, what’s the point), but it should run.

Third, we should abstract our caching class so that we can design another class that uses a different caching system besides Redis.  One alternative caching system we could swap in is Memcached.

Last, we should use delegates to feed the query or method call to the cache get method, then we can use our get method like it’s a wrapper around our existing query code.  This is really convenient if we are adding caching to a legacy system, since we can leave the queries in place and just add the cache get wrappers.


CacheProvider Class

The CacheProvider class will be an abstract class set up with a singleton pattern whose instance points to the default caching system, in this case the RedisCache class (which I haven’t talked about yet).  The reason for this convoluted setup is that we will use the CacheProvider class inside our program and ignore the instance creation.  This causes the CacheProvider to use the RedisCache implementation.  For unit tests, we’ll override the CacheProvider instance in the unit test with the BlankCache class (which I also haven’t talked about yet).

Here’s the CacheProvider code:

public abstract class CacheProvider
{
    // The default instance is created lazily on first access; unit tests can
    // assign a different implementation (e.g. BlankCache) through the setter.
    public static CacheProvider _instance;
    public static CacheProvider Instance
    {
        get
        {
            if (_instance == null)
            {
                _instance = new RedisCache();
            }
            return _instance;
        }
        set { _instance = value; }
    }

    public abstract T Get<T>(string keyName);
    public abstract T Get<T>(string keyName, Func<T> queryFunction);
    public abstract void Set(string keyName, object data);
    public abstract void Delete(string keyName);
}

I’ve provided methods to save data to the cache (Set), read data directly from the cache (Get) and a delete method to remove an item from the cache (Delete).  I’m only going to talk about the Get method that involves the delegate called “queryFunction”.


RedisCache Class

There is a link to download the full sample at the end of this blog post, but I’m going to show some sample snippets here.  The first is the Redis implementation of Get.  First, you’ll need to add the StackExchange.Redis client using NuGet.  Then you can connect to the Redis server and read/write values.

The Get method looks like this:

public override T Get<T>(string keyName, Func<T> queryFunction)
{
    byte[] data = null;

    // "redis" is the connection created when this instance was constructed;
    // if that connection failed it is null and all Redis calls are skipped.
    if (redis != null)
    {
        data = db.StringGet(keyName);
    }

    // Cache miss (or Redis unavailable): run the query and try to cache the result.
    if (data == null)
    {
        var result = queryFunction();

        if (redis != null)
        {
            db.StringSet(keyName, Serialize(result));
        }

        return result;
    }

    // Cache hit: deserialize the stored bytes and return them.
    return Deserialize<T>(data);
}

The first thing that happens is that StringGet(), the Redis client read method, is called.  This only occurs if the value of redis is not null.  The redis value is the connection multiplexer connection that is established when the instance is first created.  If that connection fails, then all calls to Redis are skipped.

After the attempt to read from Redis, the variable named data is checked for null.  If the read from Redis was successful, there will be something in “data” that needs to be deserialized and returned.  If it is null, then the data is not cached and we need to call the delegate function to get the results from the database and save them in the cache.

The call to StringSet() is where the results of the delegate are saved to the Redis server.  In this instance, the delegate returns the results we want (already deserialized), so we need to serialize them when we send them to Redis, but we can return the delegate’s result directly.

The last return is the return that will occur if we were able to get the results from Redis in the first place.  If both the Redis and the database servers are down, then this method will fail, but the app will probably fail anyway.  You could include try/catch blocks to handle instances where the delegate fails, assuming you can recover in your application if your data doesn’t come back from the database server and it’s not cached already.

You can look at the serialize and deserialize methods in the sample code.  In this instance I serialized the data into a binary format.  You can also serialize to JSON if you prefer.  Just replace the serialize and deserialize methods with your own code.
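For reference, a binary serializer pair along these lines would do the job; the sample’s actual versions may differ, the cached types must be marked [Serializable], and you’ll need System.IO and System.Runtime.Serialization.Formatters.Binary:

private static byte[] Serialize(object data)
{
    if (data == null) return null;

    var formatter = new BinaryFormatter();
    using (var stream = new MemoryStream())
    {
        formatter.Serialize(stream, data);   // object -> byte[]
        return stream.ToArray();
    }
}

private static T Deserialize<T>(byte[] data)
{
    if (data == null) return default(T);

    var formatter = new BinaryFormatter();
    using (var stream = new MemoryStream(data))
    {
        return (T)formatter.Deserialize(stream);   // byte[] -> object
    }
}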


Using the RedisCache Get Method

There are two general ways to use the Get method: Generic or Strict.  Here’s the Generic method:

var tempData = CacheProvider.Instance.Get("SavedQuery", () =>
{
    using (var db = new SampleDataContext())
    {
        return (from r in db.Rooms select r).ToList();
    }
});



For strict:

for (int i = 0; i < iterations; i++)
{
    List<Room> tempData = CacheProvider.Instance.Get<List<Room>>("SavedQuery2", () =>
    {
        using (var db = new SampleDataContext())
        {
            return (from r in db.Rooms select r).ToList();
        }
    });
}


In these examples you can see the LINQ query wrapped in a generic database using statement.  This sample was coded in Entity Framework 6 using Code-First.  The query is wrapped in a function using the “() => {}” syntax.  You can do this with any queries that you already have in place; just make sure the result set is returned from the wrapper function.  The tempData variable will contain the results of your query.


Using the BlankCache Class

There are two different ways you could implement a dummy cache class.  In one approach, you would provide a Get() method that skips the caching part and always returns the result of the delegate; in other words, an always-miss cache class.
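Something like this, for instance (the class name is invented for the example, and all four abstract members must be overridden):

// Hypothetical always-miss cache: every Get() just runs the delegate.
public class AlwaysMissCache : CacheProvider
{
    public override T Get<T>(string keyName) { return default(T); }

    public override T Get<T>(string keyName, Func<T> queryFunction)
    {
        return queryFunction();   // never cached, never served from cache
    }

    public override void Set(string keyName, object data) { }
    public override void Delete(string keyName) { }
}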

The other method is to simulate a caching system by using a dictionary object to store the cached data, implementing the BlankCache class to mimic the Redis server cache without a connection.  In this implementation we’re making sure our code under test behaves properly when a cache system exists; we’re not concerned about speed per se.  This method could have a negative side-effect if your results are rather large, but for unit testing purposes you should not be accessing large results.

In either BlankCache implementation, we are not testing the caching system.  The purpose is to use this class for unit testing other objects in the system.

A snippet of the BlankCache class is shown here:

public class BlankCache : CacheProvider
{
    // This is a fake caching system used to fake out unit tests
    private Dictionary<string, byte[]> _localStore = new Dictionary<string, byte[]>();

    public override T Get<T>(string keyName, Func<T> queryFunction)
    {
        if (_localStore.ContainsKey(keyName))
        {
            return Deserialize<T>(_localStore[keyName]);
        }
        else
        {
            var result = queryFunction();
            _localStore[keyName] = Serialize(result);
            return result;
        }
    }
}

As you can see, I used a dictionary to store byte[] data and used the same serialize and deserialize methods that I used with Redis.  I also simplified the Get method, since I know that I will always get a connection to the fake cache system (aka the Dictionary).

When using the CacheProvider from a unit test you can use this syntax:

CacheProvider.Instance = new BlankCache();

That will cause the singleton instance to point to the BlankCache class instead of Redis.


Getting the Sample Code

You can download the sample code from my GitHub account by clicking here.  Make sure you search for “<your data server name here>” and replace it with the name of your SQL Server (this is in the TestRedis project, under the DAL folder, inside the SampleDataContext.cs file).

If you didn’t create the ApiUniversity demo database from any previous blog posts, you can create an empty database and then run the SQL code from the CreateApiUniversityDatabaseObjects.sql file included with the Visual Studio solution.  The sample code was written in VS2013, but it should work in VS2012 as well.