Some Background on Tiered Storage

Hazelcast Platform 5.1 introduces a new feature, “Tiered Storage”.

You can read about it here.

But in this blog post, we’ll take a look at the benefits and the drawbacks.

Some of it is counter-intuitive at first, so to understand how best to use tiered storage, it’s important to really understand how it works.

Traditional fast/slow architectures

In many installations, Hazelcast contains some of the data.

For example, a bank may hold this month’s and last month’s transactions in Hazelcast and keep a few years of transactions somewhere else, let’s say an Oracle database.

Looking at this more abstractly, data is hosted at a speed appropriate to how it’s used. Data you need quickly, you hold in memory. Data you don’t need quickly, you could hold in memory, but you will likely use disk instead because it’s cheaper.

This seems right. As a bank customer, when looking at transactions on the web or mobile, it’s most likely to be the recent ones and you’d expect this to be highly responsive. If you, as a customer, want to see the previous year’s transactions then a spinning hourglass while these are retrieved isn’t (currently) perceived as a bad user experience. Immediate retrieval of transactions from months long past would be impressive, but probably not worth the cost.

What doesn’t seem right is the technical debt. You need Hazelcast and something else, use a map store to connect them, and perhaps have licenses and staffing costs to run the “something else.” In many cases, the something else is a relational database when all you really want is disk storage.

Capacity planning

Thinking more on the above, there is a decision to be made on which data to keep in fast memory.

If you were a bank, you might go with memory capacity for two months’ worth of transactions. Customers are most likely to look at this month and last month.

There are a couple of options for implementing this.

You can go for calendar months. So in mid-April you have a full month’s worth for March and a partial month for April. As April continues, data is added each day. As April becomes May, you have to unload all of March to make room for May.

Or you may go for a rolling method. You keep the last 9 weeks of data to be sure the current and previous calendar months are in memory, and don’t care that some data from the month before the previous is also present.

Either way, it’s not exactly difficult, but a bit of work you’d rather not bother about.
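For the rolling option, a little arithmetic shows why 9 weeks is enough. This is a throwaway sketch, nothing Hazelcast-specific:

```python
import calendar

def days_needed(year, month):
    """Worst case to cover: every day of the current month plus the whole previous month."""
    prev_year, prev_month = (year - 1, 12) if month == 1 else (year, month - 1)
    return calendar.monthrange(year, month)[1] + calendar.monthrange(prev_year, prev_month)[1]

worst = max(days_needed(2024, m) for m in range(1, 13))
print(worst)           # 62: two consecutive 31-day months, e.g. December plus January
print(9 * 7 >= worst)  # True: a 63-day rolling window always suffices
```

The worst case is two consecutive 31-day months, so a 9-week (63-day) window always covers the current and previous calendar months.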

Tiered Storage basics

The idea of tiered storage is to simplify this problem for you; making your life simpler is, after all, the Hazelcast tradition.

So, you can configure an IMap for tiered storage.

You select a memory capacity for that IMap. When you write more data to the IMap than the configured capacity allows, the excess is written, on a least-recently-used (LRU) basis, to a disk that you nominate.

You can access all that data via SQL and the IMap API, and behind the scenes the Hazelcast Platform will deal with paging data in and out for you.

For example:

  map:
'BankAccount':
      tiered-store:
        enabled: true
        memory-tier:
          capacity:
            unit: MEGABYTES
            value: 256

The map BankAccount will be capped at 256MB in memory per node, anything else goes to disk. It’s that easy.
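To make the behaviour concrete, here is a toy model of what was just described. This is a sketch only, not Hazelcast’s actual implementation; the `TieredMap` class and its two plain dicts are invented for illustration:

```python
from collections import OrderedDict

class TieredMap:
    """Toy model: a capped in-memory tier; writes beyond the capacity
    spill the least-recently-used entries to a stand-in 'disk' dict."""

    def __init__(self, memory_capacity):
        self.capacity = memory_capacity
        self.memory = OrderedDict()   # LRU order: oldest first
        self.disk = {}

    def put(self, key, value):
        self.disk.pop(key, None)          # avoid a stale copy on disk
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.capacity:
            lru_key, lru_value = self.memory.popitem(last=False)
            self.disk[lru_key] = lru_value   # excess data goes to disk

    def get(self, key):
        # The caller sees one map; which tier an entry lives in
        # is an implementation detail.
        return self.memory[key] if key in self.memory else self.disk[key]

tm = TieredMap(memory_capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    tm.put(k, v)
print(sorted(tm.memory))   # ['b', 'c'] — the newest two stay in memory
print(tm.get("a"))         # 1 — "a" is still readable, now from "disk"
```

The point is the last two lines: the caller writes and reads one map, and the spilling to disk happens behind the scenes.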

Replace the database

Tiered storage is a commercial feature, but it could end up saving you money.

The question here is, what actually does your other store deliver for you?

If it’s for querying, you can run SQL against Hazelcast.

If it’s for storage, well, that’s not really provided by the external store but instead by the disk upon which it sits.

Typically, some ORM mechanism, such as Hibernate, will also be involved to transform the object into a different format. And the database will likely have some integrity checks it wishes to perform; even if these don’t fail, they will delay the save.

So perhaps what the external store adds is cost, slowness and complexity. Rip it out and enjoy the benefit.

The Tiered Storage Roadmap

In 5.1, Tiered Storage is a BETA feature. This means you shouldn’t use it in production, but instead consider it a feature preview so you can start to experiment with it in dev and test.

Our hope is that it will become a production-ready feature from 5.2.

Also, at the moment, only two tiers are supported – for memory and disk – and this is only for IMap.

Later releases may support other tiers – for example, if you have both SSDs and ordinary spinning disks. Similarly, other data structures that hold large volumes of data may benefit from tiered storage.

Two surprises

It’s a great feature and will bring you a lot of benefits. You can read about the details here.

But there are two behaviors that are confusing at first, so better to know about them. These aren’t faults, just ways in which the implementation must do something that seems counter-intuitive.

Intermittent write speed

The first is that write speeds will vary.

If you update an entry that is currently in the memory tier and the updated entry isn’t larger than the previous one, you can just replace it in the memory tier.

If you update an entry that is currently on the disk tier, it must first be brought into memory, because the tiering is LRU-based.

So the performance of a write will depend on whether the entry being written is currently on disk or memory.

Intermittent read speed

Unsurprisingly, the same variation affects reads.

If you read something, it will take longer to retrieve if it is currently on disk than if it is currently in memory.

Less obviously, a read can trigger a write. If a read brings an entry from disk up into the memory tier, something else may need to be moved down to disk to make space.
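This read-triggers-write effect can be sketched with a toy two-entry memory tier. Again, this is an invented model for illustration, not the real implementation:

```python
from collections import OrderedDict

# Toy model: memory tier capped at 2 entries, LRU first; 'disk' holds the rest.
CAP = 2
memory = OrderedDict([("b", 2), ("c", 3)])
disk = {"a": 1}

def read(key):
    """Reading an on-disk entry promotes it, demoting the LRU memory entry."""
    if key in memory:
        memory.move_to_end(key)
        return memory[key]
    value = disk.pop(key)       # slow path: fetch from disk
    memory[key] = value         # promote into memory...
    if len(memory) > CAP:
        lru_key, lru_value = memory.popitem(last=False)
        disk[lru_key] = lru_value   # ...which forces a write back to disk
    return value

print(read("a"))      # 1
print(sorted(disk))   # ['b'] — "b" was demoted to make room for "a"
```

So a single `read("a")` here caused both a disk read (fetching "a") and a disk write (demoting "b").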

MapStore

If you’re familiar with the old approach, a MapStore, you’ll spot similar issues.

When you do a read, it may trigger the load(K) method and run at a different pace than if the data is found in memory.

A write to memory always triggers the store(K, V) method, and hence a write that ultimately ends up on disk.
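The read-through/write-through pattern can be sketched like this; `MapStoreBackedMap` and its backing dict are invented stand-ins for illustration, not the Hazelcast API:

```python
class MapStoreBackedMap:
    """Minimal sketch of the MapStore pattern: a read miss calls load(k),
    every write calls store(k, v). 'backing' stands in for the database."""

    def __init__(self, backing):
        self.backing = backing   # e.g. rows in an RDBMS
        self.memory = {}

    def load(self, key):
        return self.backing[key]     # slow path: hit the external store

    def store(self, key, value):
        self.backing[key] = value    # every write reaches the store

    def get(self, key):
        if key not in self.memory:   # miss: pace depends on the store
            self.memory[key] = self.load(key)
        return self.memory[key]

    def put(self, key, value):
        self.memory[key] = value
        self.store(key, value)       # write-through

db = {"acc-1": 100}
m = MapStoreBackedMap(db)
print(m.get("acc-1"))   # 100 — loaded from the backing store on first read
m.put("acc-2", 250)
print(db["acc-2"])      # 250 — the write went through to the store
```

The variable pacing is the same shape as tiered storage’s: a cache miss is slower than a hit, and writes pay for the trip to the external store.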

Summary

Use of MapStore is still valid, but now you have another option.

While the MapStore isn’t necessarily complicated, the thing used remotely for storage may be. Removing an RDBMS or similar will simplify the tech stack.

Hazelcast can be responsible for all of the data in the map, even though only some of it will be in memory.

SQL queries and other operations work on the entire dataset, not just the portion in memory.

You don’t need to care about what’s in memory and what’s not, but be aware that what’s on disk takes longer.