By Nicolas Frankel

Developer Advocate, Hazelcast

Nicolas Fränkel is a Developer Advocate with 15+ years of experience consulting for many different customers in a wide range of contexts (such as telecoms, banking, insurance, large retail, and the public sector). He usually works on Java/Java EE and Spring technologies, with focused interests in Rich Internet Applications, Testing, CI/CD, and DevOps. He currently works for Hazelcast. He also doubles as a teacher in universities and higher education schools and as a trainer, and triples as a book author.

Dec 7, 2020

A Hitchhiker’s Guide to Caching Patterns

When your application starts slowing down, the reason is probably a bottleneck somewhere in the execution chain. Sometimes, this bottleneck is due to a bug. Sometimes, somebody didn’t set up the optimal configuration. And sometimes, the process of fetching the data is the bottleneck.

One option would be to change your whole architecture. Before resorting to such a drastic and probably expensive measure, consider a trade-off: instead of fetching remote data every time, you can store the data locally after the first read. This is the trade-off that caching offers: speed in exchange for possibly stale data.

Deciding to use caching is just the first step in a long journey. The next step is to think about how your application and the cache will interact. This post focuses on your options regarding those interactions.

Cache-Aside

Cache-Aside is probably the most widespread caching pattern. With this approach, your code handles the responsibility of orchestrating the flow between the cache and the source of truth.

Regarding reads, it translates as the following:

[Diagram: Cache-Aside read flow between the code, the cache, and the datastore]
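The read flow can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class name CacheAsideReader is hypothetical, a ConcurrentHashMap stands in for the cache provider, and a Function stands in for the datastore lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Cache-Aside read: the application code, not the cache, talks to the datastore.
// All names here are hypothetical; the map stands in for a real cache provider.
class CacheAsideReader<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> datastore;                  // assumed datastore lookup
    private final AtomicInteger datastoreReads = new AtomicInteger();

    CacheAsideReader(Function<K, V> datastore) {
        this.datastore = datastore;
    }

    V get(K key) {
        V cached = cache.get(key);            // 1. ask the cache first
        if (cached != null) {
            return cached;                    // 2. hit: no trip to the datastore
        }
        datastoreReads.incrementAndGet();
        V loaded = datastore.apply(key);      // 3. miss: read the source of truth
        if (loaded != null) {
            cache.put(key, loaded);           // 4. keep it for the next read
        }
        return loaded;
    }

    // Number of times the datastore was actually queried.
    int datastoreReads() {
        return datastoreReads.get();
    }
}
```

Reading the same key twice should hit the datastore only once; the second read is served from the cache.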

For writes, it’s even simpler:

[Diagram: how Cache-Aside handles writes]
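A matching sketch for writes, under the same assumptions as before: CacheAsideWriter is a hypothetical name, the map stands in for the cache provider, and a BiConsumer stands in for the datastore write. The cache is updated first, then the datastore; that order is an assumption of this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Cache-Aside write: the application code updates both the cache and the datastore.
// All names are hypothetical; the map stands in for a real cache provider.
class CacheAsideWriter<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final BiConsumer<K, V> datastore;    // assumed datastore write operation

    CacheAsideWriter(BiConsumer<K, V> datastore) {
        this.datastore = datastore;
    }

    void set(K key, V value) {
        cache.put(key, value);           // 1. update the cache
        datastore.accept(key, value);    // 2. update the source of truth
    }

    // Exposes the cached value so the effect of set() can be observed.
    V cached(K key) {
        return cache.get(key);
    }
}
```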

The biggest advantage of using Cache-Aside is that anybody can read the code and understand its execution flow. Moreover, the requirements toward the cache provider are at their lowest: it just needs to be able to get and set values. That allows for pretty straightforward migrations from one cache provider to another (e.g. Hazelcast).

The biggest issue with Cache-Aside is that your code needs to handle the inconsistency gap between the cache and the datastore. Imagine that you’ve successfully updated the cache but the datastore fails to update. The code needs to implement retries. Worse, while the retries keep failing, the cache contains a value that the datastore doesn’t.

Switching the logic to update the datastore first doesn’t change the problem. What if the datastore updates successfully but the cache doesn’t?
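To make the gap concrete, here is a hedged sketch of a Cache-Aside write with a bounded retry loop. The class name, the maxAttempts parameter, and the convention that the datastore signals failure with a RuntimeException are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Cache-Aside write with retries. Hypothetical names; the datastore is assumed
// to signal failure by throwing a RuntimeException.
class RetryingCacheAsideWriter<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final BiConsumer<K, V> datastore;
    private final int maxAttempts;

    RetryingCacheAsideWriter(BiConsumer<K, V> datastore, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.datastore = datastore;
        this.maxAttempts = maxAttempts;
    }

    // Returns true if the datastore accepted the write within maxAttempts tries.
    boolean set(K key, V value) {
        cache.put(key, value);                    // the cache is updated first
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                datastore.accept(key, value);
                return true;                      // both sides now agree
            } catch (RuntimeException e) {
                // a real implementation would log and back off before retrying
            }
        }
        return false;   // the gap: the cache holds a value the datastore does not
    }

    V cached(K key) {
        return cache.get(key);
    }
}
```

When set returns false, the caller still has to decide what to do with the cached value, for example evicting it. That decision is exactly the responsibility Cache-Aside leaves to your code.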

Read-Through

Compared to Cache-Aside, Read-Through moves the responsibility of getting the value from the datastore to the cache provider.

Read-Through implements the Separation of Concerns principle. Now, the code interacts with the cache only. It’s up to the cache to manage the synchronization between itself and the datastore. This requires a more advanced cache provider than Cache-Aside does, since the provider itself must be able to load values from the datastore.

Hazelcast provides the MapLoader interface for this usage.
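As a provider-agnostic illustration, the sketch below shows Read-Through in plain Java. It does not use the Hazelcast API: ReadThroughCache is a hypothetical class, and the loader function plays roughly the role that a MapLoader implementation would play.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Read-Through: the caller only talks to the cache; the cache owns the loader.
// Hypothetical names, plain Java, no Hazelcast dependency.
class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;    // assumed datastore lookup, owned by the cache

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent invokes the loader only on a miss;
        // if the loader returns null, nothing is cached and null is returned.
        return entries.computeIfAbsent(key, loader);
    }
}
```

Compared to the Cache-Aside version, the miss handling has moved out of the calling code: a caller just invokes get and never sees the datastore.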

Write-Through

Similar to Read-Through but for writes, Write-Through moves the writing responsibility to the cache provider.

The main benefit of Write-Through is that the code is now free of failure handling and retry logic. Of course, it’s now up to the cache to manage them.

Hazelcast provides the MapStore interface for this usage. Because in most cases Write-Through also implies Read-Through, MapStore is a child interface of MapLoader, so that interactions with the datastore are co-located in the same implementation class.
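Here is a comparable plain-Java sketch of Write-Through. Again, nothing here is Hazelcast API: WriteThroughCache and its writer function are hypothetical. Writing to the datastore before updating the in-memory entry is a choice of this sketch, made so that a failed write leaves the cache unchanged.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Write-Through: the caller only talks to the cache; the cache writes to the
// datastore synchronously. Hypothetical names, no Hazelcast dependency.
class WriteThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final BiConsumer<K, V> writer;    // assumed datastore write, owned by the cache

    WriteThroughCache(BiConsumer<K, V> writer) {
        this.writer = writer;
    }

    void put(K key, V value) {
        writer.accept(key, value);    // synchronous: an exception here aborts the put
        entries.put(key, value);      // reached only once the datastore accepted the write
    }

    V get(K key) {
        return entries.get(key);
    }
}
```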

Write-Behind

Write-Behind looks pretty similar to Write-Through.

I believe some of you, dear readers, didn’t even see the difference. And if you did, you might be wondering what it means.

To make it clear, the difference lies in the last arrow’s arrowhead: it changed from a solid arrowhead to an open, line-style one. If your UML days are past (I had to look up how to represent it), it means that the cache sends an asynchronous message to the datastore.

Up to this point, all messages exchanged between actors were synchronous: the caller needs to wait until the callee has finished processing and returned before continuing its flow. With Write-Behind, the cache sends the value to the datastore and doesn’t wait for confirmation.

On the plus side, this approach speeds up the whole process, since the datastore is the slowest component: it sits somewhere over the network and writes to disk. On the other hand, it runs the risk of introducing inconsistencies between the cache and the datastore. In Write-Through, you could retry to your heart’s content until the value was set. In Write-Behind, you don’t know if the set was even successful.
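The asynchronous hand-off can be sketched with a single background thread. This is a simplified illustration, not how any particular provider implements it: WriteBehindCache, drain, and failedWrites are hypothetical names, and the datastore is assumed to signal failure with a RuntimeException.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

// Write-Behind: put() returns as soon as the cache is updated; the datastore
// write happens later on a background thread. Hypothetical names throughout.
class WriteBehindCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final BiConsumer<K, V> writer;    // assumed datastore write operation
    private final ExecutorService background = Executors.newSingleThreadExecutor();
    private final AtomicInteger failedWrites = new AtomicInteger();

    WriteBehindCache(BiConsumer<K, V> writer) {
        this.writer = writer;
    }

    void put(K key, V value) {
        entries.put(key, value);              // the caller sees the new value immediately
        background.execute(() -> {
            try {
                writer.accept(key, value);    // asynchronous write to the datastore
            } catch (RuntimeException e) {
                failedWrites.incrementAndGet();   // the caller of put() never learns of this
            }
        });
    }

    V get(K key) {
        return entries.get(key);
    }

    int failedWrites() {
        return failedWrites.get();
    }

    // Stops accepting new writes and waits for pending ones.
    // Returns false if the timeout elapses or the thread is interrupted.
    boolean drain(long timeoutMillis) {
        background.shutdown();
        try {
            return background.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Note how a failed datastore write only increments a counter: by the time it fails, put has long since returned, which is the consistency risk described above.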

With Hazelcast, changing from a Write-Through approach to a Write-Behind one is just a matter of configuring the write-delay-seconds property to a value higher than 0.

Refresh-Ahead

The old saying goes that there are two hard things in computer science: naming things and cache invalidation. Cache invalidation is about planning how long an item should be stored in the cache before it expires. When it does or when the cache is still empty, you need to fetch the item from the datastore using one of the patterns above – Cache-Aside or Read-Through.
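As a small illustration of expiry, the sketch below attaches a time-to-live to each entry and evicts lazily on read. The names ExpiringCache and ttlMillis are hypothetical, and the clock is injected as a LongSupplier purely so that the behavior can be checked without waiting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Time-to-live expiry with lazy eviction on read. Hypothetical names; real
// cache providers usually offer expiry as a configuration option instead.
class ExpiringCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;    // returns the current time in milliseconds

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns null when the key is absent or its entry has expired.
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key, entry);    // evict only if it is still the same entry
            return null;
        }
        return entry.value;
    }
}
```

In production code the clock would typically be System::currentTimeMillis. A null result from get is the signal to fetch the item again from the datastore.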

Both patterns implement a flow that involves the code, the cache, and the datastore. As mentioned above, reading from the datastore is an expensive operation: you need to first cross the network and then request data from the datastore. What if you could prefetch the data, making it available before you even request it, thus saving you from incurring the performance hit on the critical path? That’s exactly what Refresh-Ahead does.

Implementations of Refresh-Ahead are cache provider-dependent. A safe bet that is agnostic to the provider is to use Hazelcast Jet. With its Change-Data-Capture capability, Jet lets you connect to any cache provider with a public API and update cached entities as soon as the datastore is updated. Here’s a bird’s eye view of CDC in action:

For more details on Refresh-Ahead, please check my previous post Designing an Evergreen Cache with Change-Data-Capture.
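Stripped of any specific tooling, the CDC-driven idea is that the cache is updated by change events rather than by reads. The plain-Java sketch below does not use the Jet API: RefreshAheadCache and onChange are hypothetical names, and the convention that a null value means the row was deleted is an assumption of this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Refresh-Ahead driven by change events: a pipeline watching the datastore
// pushes every change into the cache, so reads never have to load on a miss.
// Hypothetical names; this is not the Hazelcast Jet API.
class RefreshAheadCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();

    // Invoked by the (assumed) change-data-capture pipeline for each datastore change.
    // A null newValue is taken to mean that the row was deleted.
    void onChange(K key, V newValue) {
        if (newValue == null) {
            entries.remove(key);
        } else {
            entries.put(key, newValue);
        }
    }

    // Reads are served from memory only; the datastore is off the critical path.
    V get(K key) {
        return entries.get(key);
    }
}
```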

Summary

Here’s a quick summary of the patterns and the contexts they best fit:

Pattern	Consider	Cons
Cache-Aside	When you’re limited by the capabilities of your cache provider	The application is responsible for the cache orchestration flow
Read-Through	Solid default
Write-Through	Solid default
Write-Behind	When performance considerations outweigh short-term consistency	Asynchronous systems are harder to reason about
Refresh-Ahead	When fetching data from the datastore impairs throughput	Additional component to develop, deploy, and maintain
