Caching in Microservices: A highly effective way to maximize performance

Caching improves availability, scalability, and performance of Microservices by reducing roundtrips to dependencies.

Querying the database on every request is often unnecessary. We can save our applications some compute by serving data from a cache instead. This pays off when the data is largely static and doesn’t change very often.

Frequently calling dependent Microservices is also wasteful. We can buffer their responses to improve speed and availability. If a particular service is down, we can serve out of the cache.

If not managed properly, caching becomes useless or, worse, disastrous. Refreshing cached data too often hurts response time and defeats the purpose. Refreshing it too rarely leads to inconsistent data and undermines reliability.
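As a rough illustration of the availability point only, the sketch below keeps the last good response per key and serves it when the dependency cannot be reached. The names `FallbackCache` and `FlakyPricingService` are hypothetical and stand in for a real service client. Treating `ConnectionError` as "service is down" is an assumption of the sketch.

```python
from typing import Callable, Dict


class FallbackCache:
    """Keeps the last good response per key and serves it if the dependency fails."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # hypothetical call to the dependent service
        self._last_good: Dict[str, str] = {}

    def get(self, key: str) -> str:
        try:
            value = self._fetch(key)
        except ConnectionError:
            # Dependency is down: serve the buffered response if we have one.
            if key in self._last_good:
                return self._last_good[key]
            raise
        self._last_good[key] = value
        return value


class FlakyPricingService:
    """Hypothetical stand-in for a dependent Microservice that can go down."""

    def __init__(self) -> None:
        self.up = True

    def fetch(self, product_id: str) -> str:
        if not self.up:
            raise ConnectionError("pricing service unavailable")
        return f"price-of-{product_id}"
```

A caller that has already fetched a key keeps getting an answer during an outage. A key that was never fetched still fails, because there is nothing buffered to fall back on.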

Below are the 4 things to consider when deciding to use the cache.

  1. What to cache.
  2. When to cache.
  3. Where to cache.
  4. How long to cache.

What to cache.

Objects and configurations are the most common things we can store in the cache.


Objects are the business entities our applications constantly operate on. Much of the time this data is persisted in databases, and we can avoid frequently querying these heavy components by buffering it temporarily in the cache.

For example, static data like products and prices does not change very often. We can safely pre-load this data into the cache on application startup and then serve it on demand.


Configurations are the application-wide settings like images and static global values. Static global values usually don’t change much over the lifetime of the application and can be safely cached.


Identifying and storing meta-data can keep cache cost low. Since the cache is relatively expensive storage, choose eligible data with its volume and frequency of use in mind.

When to cache.

Caching can be performed once during application startup, or on demand.


Pre-fetching cache data during app initialization speeds up request processing from the very first request. However, it can also cause peaks in database access. Peaks can be handled by queueing up the requests or by taking the application down for maintenance. Pre-fetching the complete data set from the database could also lead to excessive cache storage.


Alternatively, we can cache data on demand: we cache a piece of data the first time it is accessed, and all subsequent requests are served from the cache. This saves our application from peaks during startup. However, the first request for each item still has to go to the database.
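A minimal sketch of pre-fetching at startup might look like the following. It uses an in-memory SQLite database so that it runs on its own. The `products` table, its columns, and the helper names are assumptions made for illustration, not part of any real schema.

```python
import sqlite3
from typing import Dict


def preload_prices(conn: sqlite3.Connection) -> Dict[str, float]:
    """Load every product price with a single query, intended to run once at startup."""
    rows = conn.execute("SELECT product_id, price FROM products").fetchall()
    return {product_id: price for product_id, price in rows}


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway database so the sketch is self-contained (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (product_id TEXT PRIMARY KEY, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("p1", 9.99), ("p2", 24.5)],
    )
    return conn


# At application startup: one bulk read, then requests are served from memory.
PRICE_CACHE = preload_prices(make_demo_db())
```

The single bulk query is what produces the database peak mentioned above, and the size of `PRICE_CACHE` grows with the table, which is the storage cost of loading everything up front.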
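On-demand caching can be sketched as below. `LazyCache` and its `load` callback are hypothetical names; in a real service `load` would be the database query.

```python
from typing import Callable, Dict


class LazyCache:
    """Loads a value the first time it is requested and reuses it afterwards."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load  # hypothetical database lookup
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key not in self._data:
            # First access: pay the cost of going to the database once.
            self._data[key] = self._load(key)
        return self._data[key]
```

Nothing is loaded at startup, so there is no initial peak. The trade-off is that the first request for each key is as slow as an uncached one.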

Where to cache.

In-memory and distributed caching are the 2 most common places we can cache data.


In-memory caching is faster than distributed caching because it sits closest to the application server. However, storing data in memory can lead to data inconsistency when multiple instances of the application server are running, since each instance holds its own copy.

Since RAM is shared between the cache and the application, the server can slow down as traffic increases.

I have seen a server crash right after deployment. It quickly ran out of RAM, and the application was dead within 5 minutes of deployment.
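One common way to limit this risk, offered here as an assumption rather than something prescribed above, is to cap the number of entries an in-memory cache may hold and evict the least recently used one when the cap is reached. The class name `BoundedCache` is hypothetical.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class BoundedCache:
    """In-memory cache that never holds more than max_entries items (LRU eviction)."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max = max_entries
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Optional[Any] = None) -> Any:
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __len__(self) -> int:
        return len(self._data)
```

Capping the entry count bounds memory only approximately, since entries can differ a lot in size, but it keeps the cache from growing without limit as traffic increases.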

Distributed caching:

In distributed caching, a series of primary-secondary servers are dedicated to buffering and serving data. Here data is consistent because it is stored as a single copy shared by all application instances. It is less confusing to refresh cache data in one location. Maintenance is also easier as your cache size grows.

However, this separation between the application server and the cache server adds a slight delay for network calls. Faults and failures can also happen at any time.


Multi-level caching could lead to data inconsistencies if the consumer and the creator of the data both cache separately. Some companies even cache between the presentation, business and database layers.

In my experience such multi-layer caching is a smell. It can easily be avoided by using distributed caching. However, distributed caching needs to be considered right from the application design phase.

It is not easy to change existing apps to move to distributed caching. Such an effort requires a lot of code rework.

How long to cache.

The freshness of the data is important. Choose your cache refresh mechanism carefully.

Periodic and workflow-based refresh mechanisms are the 2 ways to refresh your cache data.


A lot of the time, financial transactions are not reflected immediately in our accounts. Banks cache transaction data and refresh it only at the end of the day. This periodic cache-busting mechanism is helpful when time is not a critical factor. It is the simplest and most commonly used cache-busting mechanism.


In time-sensitive applications, we can choose to bust the cache based on the business workflow. For example, as soon as a merchant updates product prices, we can bust the cache. Depending on the scenario, we can bust the cache partially by deleting specific product data, or bust the entire cache.

To realize workflow-based cache busting, every Microservice has to expose endpoints for tenants. These endpoints are used to bust the cache on business demand.
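One simple way to approximate a periodic refresh is time-based expiry: each entry is reloaded once it is older than a fixed interval. The sketch below assumes that approach. `TtlCache`, `load`, and the injectable `clock` parameter are hypothetical names, and the clock is injectable only so the behaviour can be checked without waiting.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """Reloads an entry once it is older than ttl_seconds."""

    def __init__(
        self,
        load: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load  # hypothetical lookup against the source of truth
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh enough
        value = self._load(key)  # missing or expired: refresh from the source
        self._data[key] = (now, value)
        return value
```

With an end-of-day policy the interval would be on the order of 24 hours. The shorter the interval, the fresher the data and the more load reaches the source.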
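The partial and full busting described above could be sketched like this. The URL paths, `ProductCache`, and `handle_bust_request` are hypothetical; a real service would wire the handler into whatever web framework it already uses and protect the endpoint appropriately.

```python
from typing import Dict, Optional


class ProductCache:
    """Cache of product prices that supports partial and full busting."""

    def __init__(self) -> None:
        self._data: Dict[str, float] = {}

    def put(self, product_id: str, price: float) -> None:
        self._data[product_id] = price

    def get(self, product_id: str) -> Optional[float]:
        return self._data.get(product_id)

    def bust(self, product_id: Optional[str] = None) -> int:
        """Remove one product, or everything if no id is given. Returns entries removed."""
        if product_id is None:
            removed = len(self._data)
            self._data.clear()
            return removed
        if product_id in self._data:
            del self._data[product_id]
            return 1
        return 0


def handle_bust_request(cache: ProductCache, path: str) -> int:
    """Hypothetical handler for DELETE /cache/products and DELETE /cache/products/<id>."""
    prefix = "/cache/products"
    if path == prefix:
        return cache.bust()  # bust the entire cache
    if path.startswith(prefix + "/"):
        return cache.bust(path[len(prefix) + 1:])  # bust a single product
    raise ValueError(f"unsupported path: {path}")
```

When a merchant updates one price, the workflow would call the single-product path. A bulk price import would more likely call the path that clears everything.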

