Hazelcast 2.0 Released: Big Data In-Memory

Miko Matsumura | Mar 8, 2012

With the 2.0 release, Hazelcast is ready for caching and sharing terabytes of data in-memory. Storing terabytes of data in-memory is not the problem; avoiding GC pauses and being resilient to crashes are the big challenges. Among several other additions, two major features tackle these challenges: Elastic Memory (off-heap storage) and Distributed Backups.

1. Elastic Memory

By default, Hazelcast stores your distributed data (map entries, queue items) on the Java heap, which is subject to garbage collection. As your heap gets bigger, garbage collection might cause your application to pause for tens of seconds, badly affecting your application’s performance and response times.

Elastic Memory is Hazelcast’s off-heap memory storage, designed to avoid GC pauses. Even if you have terabytes of cache in-memory with lots of updates, GC will have almost no effect, resulting in more predictable latency and throughput.

The storage type (off-heap or on-heap) can be configured for each map. If a map’s storage type is off-heap, then primary, backup and even near-cached (if available) entries are all stored off the heap, and your off-heap storage space will dynamically scale as you add more nodes.

The Elastic Memory implementation doesn’t require any defragmentation. Here is how things work: the user defines the number of GB of storage to have off the heap per JVM; let’s say it is 40GB. Hazelcast will create 40 DirectBuffers, each with 1GB capacity. If you have, say, 50 nodes, then you have a total of 2TB of off-heap storage capacity. Each buffer is divided into chunks (blocks) of a configurable size (the default chunk size is 1KB). Hazelcast uses a queue of available (writable) blocks. A 3KB value, for example, will be stored in 3 blocks. When the value is removed, these blocks are returned to the queue of available blocks so that they can be reused to store another value.
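The block scheme described above can be illustrated with a minimal sketch. This is not Hazelcast’s actual code: the class `BlockStore`, its method names, and the choice of a single small buffer are all assumptions made for illustration. It only shows the general idea of carving a direct buffer into fixed-size blocks and recycling them through a queue of free block indexes, which is why no defragmentation step is needed.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of a fixed-size block store; not Hazelcast's implementation.
class BlockStore {
    private final ByteBuffer buffer;   // one off-heap (direct) buffer
    private final int blockSize;       // chunk size in bytes, e.g. 1024
    private final Deque<Integer> freeBlocks = new ArrayDeque<>(); // available block indexes

    BlockStore(int capacityBytes, int blockSize) {
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
        this.blockSize = blockSize;
        for (int i = 0; i < capacityBytes / blockSize; i++) {
            freeBlocks.add(i);
        }
    }

    int freeBlockCount() {
        return freeBlocks.size();
    }

    // Writes the value into as many free blocks as it needs and returns their indexes.
    int[] put(byte[] value) {
        int needed = Math.max(1, (value.length + blockSize - 1) / blockSize);
        if (needed > freeBlocks.size()) {
            throw new IllegalStateException("off-heap store is full");
        }
        int[] blocks = new int[needed];
        for (int i = 0; i < needed; i++) {
            int block = freeBlocks.poll();
            blocks[i] = block;
            int offset = i * blockSize;
            int length = Math.min(blockSize, value.length - offset);
            ByteBuffer view = buffer.duplicate(); // independent position, shared memory
            view.position(block * blockSize);
            view.put(value, offset, length);
        }
        return blocks;
    }

    // Reassembles a value of the given length from the blocks it was written to.
    byte[] get(int[] blocks, int length) {
        byte[] value = new byte[length];
        for (int i = 0; i < blocks.length; i++) {
            int offset = i * blockSize;
            int n = Math.min(blockSize, length - offset);
            ByteBuffer view = buffer.duplicate();
            view.position(blocks[i] * blockSize);
            view.get(value, offset, n);
        }
        return value;
    }

    // Returns the blocks to the free queue so that another value can reuse them.
    void remove(int[] blocks) {
        for (int block : blocks) {
            freeBlocks.add(block);
        }
    }
}
```

With a 1KB block size, calling `put` with a 3KB value takes three indexes from the queue, and `remove` hands the same three back. Because every block has the same size, any freed block can serve any later value, at the cost of some wasted space in the last block of each value.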

2. Distributed Backups

In dynamically scalable partitioned storage systems, whether a NoSQL database, a filesystem or an in-memory data grid, changes in the cluster (adding or removing a node) can lead to big data moves over the network to re-balance the cluster. Re-balancing is needed for both the primary and the backup data on those nodes. If a node crashes, for example, the dead node’s data has to be re-owned (become primary) by other node(s), and a new backup of it has to be taken immediately for the cluster to be fail-safe again. Shuffling megabytes of data around has a negative effect on the cluster, as it consumes valuable resources such as network, CPU and RAM. It might also lead to higher latency for your operations during that period.

Imagine a cluster of 50 machines, each holding 40GB of data: 20GB primary and 20GB backup. When a node, say node5, crashes, we must make sure that the primary (owned) data of the dead node5 is taken over by at least one other node in the cluster. So 20GB will be re-owned and a new backup of it will be taken. But where was that 20GB backed up before? Let’s say it was backed up on node7; then node7 will be the new owner of that 20GB. This means node7 now owns twice as much data as the other nodes, which also means node7 will burn more CPU than the others to handle twice as many requests. In addition, node7 will have to back up this 20GB on another node, say node9, so node7 will send 20GB to node9. This is really bad for the network and for the CPUs of these two nodes. Node9 will also need enough memory to store 20GB of backup data, which is a huge waste of memory. Such limitations force you to allocate (buy) more memory so that you don’t run out of memory upon crashes. Even if you have enough memory, node7 and node9 will have to migrate their data onto other nodes over time to balance the load.

2.0 is designed to tackle all these big-data challenges. With this version, data owned by a node will be evenly backed up by all the other nodes.

In other words, every node takes equal responsibility for backing up every other node. This leads to better memory usage and less impact on the cluster when you add or remove nodes.

Say you have 2TB of data on a 50-node cluster, with each node storing 20GB of primary and 20GB of backup data. Let’s focus on one of the nodes, say node3. The 20GB of primary data that node3 holds will be backed up by all 49 other nodes, each backing up 1/49th of the 20GB. If node3 dies, each surviving node will own 1/49th of its data; notice that no migration is needed and the cluster is still well-balanced!
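The arithmetic behind this example can be spelled out in a few lines. The sketch below is only a back-of-the-envelope calculation, not part of Hazelcast; the class `BackupMath` and its method names are hypothetical. It compares the primary data held by the busiest surviving node after a crash under the single-backup-node scenario described earlier with the evenly distributed backups of 2.0.

```java
// Hypothetical helper for the sizing arithmetic in the text; not a Hazelcast API.
class BackupMath {

    // Slice of one node's primary data that each of the other nodes backs up, in GB.
    static double backupSliceGb(int nodes, double primaryPerNodeGb) {
        return primaryPerNodeGb / (nodes - 1);
    }

    // Primary data on the new owner when one node held the whole backup
    // of the dead node: it keeps its own data and takes over all of the dead node's.
    static double singleBackupOwnerLoadGb(double primaryPerNodeGb) {
        return 2 * primaryPerNodeGb;
    }

    // Primary data on each survivor when the dead node's data was
    // backed up evenly by all other nodes: own data plus one slice.
    static double distributedOwnerLoadGb(int nodes, double primaryPerNodeGb) {
        return primaryPerNodeGb + backupSliceGb(nodes, primaryPerNodeGb);
    }
}
```

For 50 nodes with 20GB of primary data each, `backupSliceGb(50, 20)` is 20/49, or roughly 0.41GB. After a crash, each survivor therefore holds about 20.41GB of primary data, whereas in the single-backup-node scenario one node ends up with 40GB.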

So the backup mechanism is designed in such a way that there will be no need to rebalance after crashes. Say you add 5 more nodes. There is no immediate action for the cluster to take, because the existing nodes are already in a proper state. The only thing Hazelcast will do is slowly migrate some of the data to the new nodes so that all nodes eventually become equally loaded.

The Hazelcast team is working on a demo to show this. The plan is to store 1 billion entries, a total of 4TB of data, on 100 nodes at Amazon EC2. A recorded video of the demo will be posted online so that we can all see it in action.