Introducing Hazelcast 3.7 EA: A Better, Faster Hazelcast

I am excited to introduce Hazelcast 3.7 EA. It is publicly available as an Early Access release in the Maven repositories and on our downloads page. We are introducing exciting new features, but this time we also invested heavily in the deep internals. Following on from 3.6, we improved performance considerably.

The client protocol we introduced in 3.6 is yielding results: we just released supported versions of the Python and Node.js clients. What’s even more exciting: we see our user community is starting to experiment with the protocol and they are building independent clients. This is Open Source at its best and we love it!

What does EA mean?

An Early Access release is a teaser to give you a taste of the upcoming features and also an opportunity to tell us what you think about them! Although we have a very solid test suite and strong Continuous Integration, we now go on to our QA phase, where we spend 4 to 6 weeks ensuring quality. In this period, besides fixing the known bugs, we will challenge our product with stress and performance tests using our own tool, Hazelcast Simulator. Enough talking; let’s have a look at some of the major new features and improvements:

New Features

First Modularized Release: 3.7 is the first fully modularized version of Hazelcast. We now have separate repositories, Maven modules and release cycles for many aspects of Hazelcast. Each client/language and plugin is now a module. When you download a 3.7 distribution, it contains the latest released version of each module. But we can release module updates, new features, and bug fixes much faster than the Hazelcast core. When we say in this blog that we will release something parallel to 3.7, we mean we are releasing a module. This speeds up development, and of course it makes it easier to contribute as an open source contributor. A win-win all round.

Custom eviction policies: In Hazelcast you could always choose LRU or LFU as the eviction policy. But what if you need more flexibility to suit the custom requirements of your app? A custom eviction policy helps with exactly that. We implemented custom eviction for both our Map and JCache implementations. Here is an example of an odd-based evictor. It works with our O(1) probabilistic evictors: you simply provide a comparator and we choose the best eviction candidate.

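/**
 * Odd evictor tries to evict odd keys first.
 */
private static class OddEvictor extends MapEvictionPolicy {

    @Override
    public int compare(EntryView o1, EntryView o2) {
        // A negative result marks o1 as the better eviction candidate.
        boolean odd1 = ((Integer) o1.getKey()) % 2 != 0;
        boolean odd2 = ((Integer) o2.getKey()) % 2 != 0;
        if (odd1 == odd2) {
            // Both odd or both even: no preference between them.
            return 0;
        }
        return odd1 ? -1 : 1;
    }
}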

See the complete code sample here.
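To make the idea of a comparator-driven probabilistic evictor more concrete, here is a minimal sketch in plain Java. It is not Hazelcast's internal implementation: the class `SampledEviction`, the method `pickCandidate`, and the sample size are hypothetical names and values chosen for illustration. The sketch assumes eviction works by drawing a fixed-size random sample and letting the comparator pick the preferred victim, so the cost does not depend on the number of entries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Illustrative only: a hypothetical sampling evictor, not Hazelcast code.
class SampledEviction {

    /** Odd keys rank before even keys, mirroring the odd-based evictor above. */
    static final Comparator<Integer> ODD_FIRST =
            (a, b) -> Boolean.compare(b % 2 != 0, a % 2 != 0);

    /**
     * Draws sampleSize random keys (with replacement) and returns the one
     * the policy ranks lowest, i.e. the preferred eviction candidate.
     * The work is bounded by sampleSize, not by the number of keys.
     */
    static <K> K pickCandidate(List<K> keys, int sampleSize,
                               Comparator<? super K> policy, Random random) {
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("no keys to evict");
        }
        K best = keys.get(random.nextInt(keys.size()));
        for (int i = 1; i < sampleSize; i++) {
            K candidate = keys.get(random.nextInt(keys.size()));
            if (policy.compare(candidate, best) < 0) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add(i);
        }
        Integer victim = pickCandidate(keys, 15, ODD_FIRST, new Random());
        System.out.println("evict key " + victim);
    }
}
```

Because the sample is random, an even key can still be chosen when no odd key happens to be sampled; the comparator only expresses a preference among the sampled candidates.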

Fault-Tolerant ExecutorService: Imagine you send executables to Hazelcast nodes and they take hours to complete. What if one of the nodes crashes and you do not know whether the task completed or not? In 3.7, we introduce DurableExecutorService. It guarantees ‘execute at least once’ semantics. Its API is a narrowing of IExecutorService: unlike IExecutorService, users cannot submit/execute tasks to selected members. (Note: This module has not been released with EA1. It will be available in EA2 in a few weeks.)
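The practical consequence of ‘execute at least once’ is that a task may run more than once, so it should tolerate repetition. The sketch below illustrates that in plain Java with a simple retry loop; it does not use or reproduce the DurableExecutorService API, and the class `AtLeastOnce` and method `runAtLeastOnce` are hypothetical. A thrown exception stands in for a member crashing before it could report completion.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustrative only: a generic at-least-once retry loop, not Hazelcast code.
class AtLeastOnce {

    /** Re-runs the task until one attempt completes, up to maxAttempts. */
    static <T> T runAtLeastOnce(Supplier<T> task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be positive");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                // Stands in for a crash: we cannot tell how far the task got.
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicInteger sideEffects = new AtomicInteger();
        String result = runAtLeastOnce(() -> {
            // The side effect happens before the simulated crash on attempt 1.
            if (sideEffects.incrementAndGet() == 1) {
                throw new IllegalStateException("simulated crash");
            }
            return "done";
        }, 3);
        System.out.println(result + " after " + sideEffects.get() + " executions");
    }
}
```

In this run the side effect is applied twice even though the caller sees a single result, which is why tasks submitted under at-least-once semantics are best written to be idempotent.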

New Cloud Integrations: We are releasing the CloudFoundry and OpenShift plugins parallel to the 3.7 release. The Hazelcast members deployed to CloudFoundry and OpenShift will discover each other automatically. Also, you will have the option to connect and use Hazelcast as a service inside CloudFoundry and OpenShift. You also have the option of using this with Docker – https://hub.docker.com/r/hazelcast/openshift/. See Rahul’s following blog to learn more about using the CloudFoundry Integration.

We also released our Azure Cloud Discovery Plugin for running Hazelcast on Azure. Hazelcast will also be up in Microsoft Azure Marketplace before the end of June. Look out for the forthcoming blog on that.

Apache Spark Connector: Parallel to the 3.7 release, we are releasing our new Hazelcast-Spark connector plugin. It allows Hazelcast Maps and Caches to be used as shared RDD caches by Spark through the Spark RDD API, as in the following cache example.

// read from cache (hsc is assumed to be created earlier; see the module repo)
HazelcastJavaRDD rddFromCache = hsc.fromHazelcastCache("cache-name-to-be-loaded");

JavaPairRDD<Object, Long> rdd = hsc.parallelize(new ArrayList<Object>() {{
    add(1);
    add(2);
    add(3);
}}).zipWithIndex();

// write to cache (name holds the target cache name)
javaPairRddFunctions(rdd).saveToHazelcastCache(name);

Both Java and Scala Spark APIs are supported. See the module repo for details.

Reactive: We did the hard work of adding back pressure to Hazelcast over the 3.5 and 3.6 releases. Hazelcast internally is fully asynchronous. Now we are beginning to expose reactive methods. In 3.7, most of the IMap async methods plus AsyncAtomicLong have been added. Vert.x-Hazelcast is a very popular combination, and Vert.x 3.3.0-CR2 has the new reactive Hazelcast integration for much higher performance. The reactive methods return an ICompletableFuture, which can accept a callback, allowing a reactive style.
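The callback style can be sketched without a running cluster. The example below uses the JDK's CompletableFuture as a stand-in for ICompletableFuture, and a local ConcurrentHashMap as a stand-in for a distributed map; `AsyncMapSketch`, `putAsync` and `getAsync` are hypothetical names here and are not the Hazelcast API. The point is only to show the shape of the code: no thread blocks waiting for the value, and the callback runs when the result arrives.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: JDK futures standing in for Hazelcast's async methods.
class AsyncMapSketch {

    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    /** Stores the value on another thread and completes when it is done. */
    CompletableFuture<Void> putAsync(String key, String value) {
        return CompletableFuture.runAsync(() -> store.put(key, value));
    }

    /** Looks the key up on another thread; completes with null if absent. */
    CompletableFuture<String> getAsync(String key) {
        return CompletableFuture.supplyAsync(() -> store.get(key));
    }

    public static void main(String[] args) {
        AsyncMapSketch map = new AsyncMapSketch();
        map.putAsync("greeting", "hello")
           .thenCompose(ignored -> map.getAsync("greeting"))
           .whenComplete((value, error) -> {
               // Callback: invoked once the read has finished or failed.
               if (error != null) {
                   System.err.println("failed: " + error);
               } else {
                   System.out.println("got " + value);
               }
           })
           .join(); // only so the demo does not exit before the callback runs
    }
}
```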

WAN Replication Enhancements (Enterprise Only): With 3.7, we implemented the ability to resynchronize a remote WAN cluster. This was implemented for IMap (the Cache implementation is left for 3.8). It is very useful when you initiate a WAN connection or need to reinitiate one after maintenance on a remote cluster.

WAN Replication via Solace (Enterprise Only): We also added WAN replication with Solace, a high-performance enterprise-grade messaging solution.

Notable Improvements in Hazelcast internals

This release, in particular, is focused on some deep plumbing to fix rare edge cases in Hazelcast and improve correctness. For those of you who like to read about gory details, I cherry-picked a few notable changes:

  • Improvements on partitioning system: Our community detected the following issue: during a migration, there can be a moment when data is held by fewer nodes than the configured backup count, even though there are enough nodes in the cluster. If any node crashed at that unfortunate moment, we lost data. Although some may call this an edge case, it still conflicted with the guarantee we give to our users. So we designed and implemented major improvements in our partitioning and migration system. You can find a detailed explanation of the solution here: https://hazelcast.atlassian.net/wiki/display/COM/Avoid+Data+Loss+on+Migration+-+Solution+Design
  • Graceful shutdown improvements: We need to ensure data safety while shutting down multiple nodes concurrently. But there was a counterexample: when a node shuts down, it checks whether all 1st backups of its partitions are synced, without checking whether the backup node is also shutting down. This is a race condition. If both the owner and backup nodes shut down at the same time, we lose data, since the owner is not aware that the 1st backup is also shutting down. See the following PR to get an idea of the solution: https://github.com/hazelcast/hazelcast/pull/7989
  • Improvement to the threading model: In 3.7, there is now at least 1 generic priority thread that only processes generic priority operations such as member/client heartbeats. This means that under load the cluster remains more stable, since these important operations still get processed. See the following PR for details of the problem and solution: https://github.com/hazelcast/hazelcast/pull/7857
  • Improvements to the invocation system: The invocation service is one of the parts that we wanted to improve and simplify. Because it is complex, bugs were hard to fix and even minor changes were prone to regressions. We simplified the invocation logic and fixed some ambiguities. Although it is a completely internal development, it has made Hazelcast more stable by preventing many problems in the invocation system. Related enhancements (e.g. moving IsExecutingOperation into its own thread) fixed several issues like the following: https://github.com/hazelcast/hazelcast/issues/6248
  • Performance improvement on map.putAll(): We introduced grouping and batching of remote invocations and also reduced some internal litter for higher performance. Our efforts resulted in a performance gain of up to 15%, especially when the argument map is large. If you want to read some code, here it is: https://github.com/hazelcast/hazelcast/pull/8023
  • Prevent blocking reads in transactions: To provide atomicity for transactions, we used to block reads on entries that were locked transactionally. This was not optimal. We changed the architecture so that reads are blocked only just before the commit.
  • Improvements on Hot Restart and HD (Enterprise HD Only): We introduced batching of hot restart operations (when fsync is enabled), which improves performance notably. Moreover, we optimized the memory usage of Hot Restart by persisting values to disk and reducing metadata. In high-density memory, we created an abstraction layer for safer memory access, which also handles unaligned memory access to enable HD on the Oracle SPARC CPUs commonly found in Solaris systems.
  • .NET Client Enhancements: Besides working on new clients, our team is enhancing existing clients. We added predicate and SSL support to our .NET client.
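The grouping and batching idea behind the map.putAll() improvement in the list above can be sketched as follows. This is a simplified illustration, not the code from the linked PR: the class `PutAllBatching`, the method `groupByPartition`, and the use of `hashCode()` to pick a partition are assumptions made for the example (Hazelcast derives the partition from the serialized key). The sketch groups the entries of one putAll() call by partition so that each group can travel in a single remote invocation instead of one invocation per entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: hypothetical grouping of putAll() entries by partition.
class PutAllBatching {

    /** Returns one batch of entries per partition that has at least one key. */
    static <K, V> Map<Integer, Map<K, V>> groupByPartition(Map<K, V> entries,
                                                           int partitionCount) {
        Map<Integer, Map<K, V>> batches = new TreeMap<>();
        for (Map.Entry<K, V> entry : entries.entrySet()) {
            // Assumption: partition chosen from hashCode(); floorMod keeps it non-negative.
            int partitionId = Math.floorMod(entry.getKey().hashCode(), partitionCount);
            batches.computeIfAbsent(partitionId, id -> new HashMap<>())
                   .put(entry.getKey(), entry.getValue());
        }
        return batches;
    }

    public static void main(String[] args) {
        Map<Integer, String> entries = new HashMap<>();
        for (int i = 0; i < 10; i++) {
            entries.put(i, "value-" + i);
        }
        Map<Integer, Map<Integer, String>> batches = groupByPartition(entries, 4);
        System.out.println(entries.size() + " entries sent as "
                + batches.size() + " batched invocations");
        batches.forEach((partitionId, batch) ->
                System.out.println("partition " + partitionId + " -> " + batch.keySet()));
    }
}
```

The saving grows with the size of the argument map, because the number of invocations is bounded by the number of partitions (or members) touched rather than by the number of entries.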

New Clients

After publishing the client protocol in 3.6, and with the support of our community, we started working on new native clients. Happily, supported versions of the Python and Node.js clients have now been released. See our new clients page for the list of clients and the features implemented in each.

Closing Words

After releasing EA, we start working hard to make Hazelcast production quality. We run long-running tests and stress and performance tests, improve documentation, and so on. We will also strive to fix the bugs reported by you and by our tests. We always need your help: you can test the EA and create issues for the problems you encounter. You can use our forum, Twitter (@hazelcast) or Stack Overflow for your questions, and GitHub for your bug reports and contributions.

Thank you all for your support.