Hazelcast is a powerful real-time stream processing platform that can scale, enrich and process large volumes of fresh and historical data. Every enterprise in this day and age has built numerous “haystacks” of data and is seeking the needle hidden in each. Thankfully, Hazelcast has great query abilities to help you find the needle, at the right time, with SQL. To simplify the search for our users, we have implemented a LINQ to SQL provider for the Hazelcast .NET Client to make SQL easier on the .NET platform.

I will explain the LINQ usage for the Hazelcast .NET Client, but before that, we need a cluster to connect to. You can download the Hazelcast binary or build it yourself; however, I will use the Hazelcast Viridian Serverless solution, which is ready for action with no infrastructure to worry about. You can sign up and create a free cluster in a few minutes.

For Viridian users, after you have signed up and created a cluster, you will see a “Connect Client” button.

Viridian Connect Client

You can download an example client ready to run with your cluster or retrieve the credentials to connect to the cluster. I chose the second option because I want to create my own project. To reach your credentials, click “Advanced setup” at the top.

This screen has everything you need to connect to the cluster. Keep the window open; we will need it later.

Okay, we launched the cluster with almost no effort. Now, it is time to connect to it.

First, let’s create a new .NET project named HzLinqUsage.

dotnet new console -n HzLinqUsage

Then, go to the project folder and add dependencies on the Hazelcast.Net NuGet package and on the new Hazelcast.Net.Linq.Async NuGet package, which provides LINQ.

cd HzLinqUsage
dotnet add package Hazelcast.Net
dotnet add package Hazelcast.Net.Linq.Async
dotnet add package Microsoft.Extensions.Logging.Console

Note that we also add a dependency on the Microsoft.Extensions.Logging.Console package, as we want to use the Hazelcast .NET client logging abilities to view the SQL queries emitted by LINQ.

Now, open the project with your favorite IDE. I could demonstrate the new features with a simple data type, but real life is not always simple. So, let’s do something fancier. I will create a map which holds Academy Awards ceremonies as keys and the best picture of each ceremony as values.

You can create the data models as classes or records. It doesn’t matter because, with Hazelcast 5.2 Compact serialization, I do not need to handle the serialization and can let the Hazelcast .NET Client do it with zero configuration. If you are curious about Compact, I strongly encourage you to look into the details.

With the data models decided, here is the start of my Program.cs file, with the data records.

using Hazelcast;
using Microsoft.Extensions.Logging;

Console.WriteLine("Hello, World! Let's use LINQ over Hazelcast.");

public record AcademyAward(int Year, string Title)
{
    public AcademyAward() : this(0, "") {} // parameter-less ctor
}

public record Film(string FilmName, string Director, int Budget, bool Liked)
{
    public Film() : this("", "", 0, false) {} // parameter-less ctor
}

Note that the LINQ provider only works with public properties, and requires that the class (or record) exposes a parameter-less constructor (this constraint may be lifted in the future).

Next, configure and create the client. It’s time to revisit the Viridian credentials page.

using Hazelcast;
using Microsoft.Extensions.Logging;

Console.WriteLine("Hello, World! Let's use LINQ over Hazelcast.");

var options = new HazelcastOptionsBuilder()
    .With("Logging:LogLevel:Hazelcast", "Debug")
    .With(opt =>
    {
        opt.ClusterName = "CLUSTER_ID";
        opt.Networking.Cloud.DiscoveryToken = "DISCOVERY_TOKEN";
        opt.Networking.Ssl.Enabled = true;
        opt.Networking.Ssl.CertificatePassword = "KEY_STORE_PASSWORD";
        opt.Networking.Ssl.CertificatePath = "PATH_TO_CLIENT.PFX"; // You'll get it by "Download keystore file"
    })
    .With((configuration, opt) =>
    {
        // Send the client logs to the console, honoring the log level set above.
        opt.LoggerFactory.Creator = () => LoggerFactory.Create(loggingBuilder =>
            loggingBuilder
                .AddConfiguration(configuration.GetSection("logging"))
                .AddConsole());
    })
    .Build();

await using var client = await HazelcastClientFactory.StartNewClientAsync(options);

public record AcademyAward(int Year, string Title)
{
    public AcademyAward() : this(0, "") {} // default ctor
}

public record Film(string FilmName, string Director, int Budget, bool Liked)
{
    public Film() : this("", "", 0, false) {} // default ctor
}

The client is ready. Now, let’s put some data in to query.

var map = await client.GetMapAsync<AcademyAward, Film>("bestPicture");

// The Liked flags below are arbitrary sample values.
var a93 = new AcademyAward(2021, "93rd Academy Awards");
var a93_film = new Film("Nomadland", "Chloé Zhao", 15_000_000, true);
await map.PutAsync(a93, a93_film);

var a92 = new AcademyAward(2020, "92nd Academy Awards");
var a92_film = new Film("Parasite", "Bong Joon-ho", 5_000_000, true);
await map.PutAsync(a92, a92_film);

var a91 = new AcademyAward(2019, "91st Academy Awards");
var a91_film = new Film("Green Book", "Peter Farrelly", 23_000_000, false);
await map.PutAsync(a91, a91_film);

Console.WriteLine("Put the data!");

The cluster, the data and the client are ready. Now, let’s run queries. Before moving on, we need to create a mapping for SQL, so that it can understand the data and its fields.

await client.Sql.ExecuteCommandAsync(@"
CREATE OR REPLACE MAPPING bestPicture (
    Year INT EXTERNAL NAME ""__key.Year"",
    Title VARCHAR EXTERNAL NAME ""__key.Title"",
    FilmName VARCHAR,
    Director VARCHAR,
    Budget INT,
    Liked BOOLEAN
)
TYPE IMap OPTIONS (
    'keyFormat' = 'compact', 'keyCompactTypeName' = 'award',
    'valueFormat' = 'compact', 'valueCompactTypeName' = 'film'
)");

While mapping the fields of the object, be careful about naming and casing. The column names must match the class/record property names.

Now, we can use LINQ.

var likedFilmNames = map.AsAsyncQueryable()
    .Where(p => p.Value.Liked)
    .Select(p => p.Value.FilmName);

await foreach (var name in likedFilmNames)
    Console.WriteLine($"-Liked {name}");

Here, we filtered the entries in the map with Where, using the Liked property. Note that, since we are working on a map, the Key and Value fields appear as expected. We reached the Film object through the Value of the entry. Then, with Select, we projected the FilmName into an enumerable of strings.

Since the Hazelcast .NET Client and the LINQ provider are both async, we fetch the entries by awaiting. When await foreach calls MoveNextAsync on likedFilmNames, the SQL query is prepared and executed on the server. Currently, the provider uses the default SQL options for execution and fetching.
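
The deferred execution described above can be illustrated with a small, language-neutral sketch. The Python below is not the Hazelcast LINQ provider: LazyQuery, where, select and the executor callback are hypothetical names, and the generated SQL text is only an approximation of what a provider might emit. It shows that chaining operators merely records the query, and that the SQL is built and run only when iteration starts.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LazyQuery:
    """Records filters and projections; builds SQL only when iterated."""
    table: str
    executor: Callable[[str], List[tuple]]  # hypothetical stand-in for the cluster
    columns: tuple = ("*",)
    predicates: tuple = ()

    def where(self, predicate: str) -> "LazyQuery":
        # Returns a new query; nothing is executed here.
        return LazyQuery(self.table, self.executor, self.columns,
                         self.predicates + (predicate,))

    def select(self, *columns: str) -> "LazyQuery":
        return LazyQuery(self.table, self.executor, tuple(columns),
                         self.predicates)

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        return sql

    async def __aiter__(self):
        # The query is only "sent" when iteration begins.
        for row in self.executor(self.to_sql()):
            yield row


async def main() -> None:
    def fake_executor(sql: str) -> List[tuple]:
        print("executing:", sql)
        return [("Nomadland",), ("Parasite",)]

    query = (LazyQuery("bestPicture", fake_executor)
             .where("Liked = TRUE")
             .select("FilmName"))
    print("query built, nothing executed yet")
    async for (name,) in query:
        print("-Liked", name)


if __name__ == "__main__":
    asyncio.run(main())
```

The executor is called exactly once, at the first step of the async for loop, which mirrors the role MoveNextAsync plays in the C# example.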

What did this look like before the LINQ provider?

Let’s look at it:

var sqlFilmNameResult = await client.Sql.ExecuteQueryAsync("SELECT FilmName FROM bestPicture WHERE Liked = TRUE");

await foreach (var row in sqlFilmNameResult)
    Console.WriteLine($"-Liked {row.GetColumn("FilmName")}");

It’s still okay, but more error-prone because the query is a plain string. Also, we need to deal with reading each column, unless we fetch the whole row as an object, deserialize it and cast it.

Here are some other examples.

Entries whose film budget is more than 10,000,000:

var bigBudgetOnes = map.AsAsyncQueryable()
    .Where(p => p.Value.Budget > 10_000_000);

await foreach (var entry in bigBudgetOnes)
    Console.WriteLine($"-Budget of {entry.Value.FilmName} is ${entry.Value.Budget:N1}.");

Custom projection of the entry whose title is “93rd Academy Awards”:

var awards93 = map.AsAsyncQueryable()
    .Where(p => p.Key.Title == "93rd Academy Awards")
    .Select(p => new { Film = p.Value.FilmName, Director = p.Value.Director });

await foreach (var entry in awards93)
    Console.WriteLine($"93rd Best Picture: {entry.Film} is directed by {entry.Director}.");

Currently, the LINQ provider is in beta and has limited functionality compared to the full SQL interface. However, we are planning to extend its abilities. Your feedback is an important guide for us, so please contact us via GitHub or the Slack Community channel if you have any questions.

If you are curious about the Hazelcast .NET Client, please don’t forget to check out the repo and examples.

We’ve been hard at work and are excited for this moment: CLC v5.2.0 is released!

What is Hazelcast CLC?

Hazelcast CLC is a command line tool that empowers developers to do various Hazelcast-related tasks, such as running SQL queries or sending commands to a Hazelcast cluster. Since CLC is a single binary without dependencies, it is small, fast and easy to install.

Before carrying on, let’s address one thing: what does “C-L-C” actually stand for? With so many acronyms floating around, it’s easy to get terms mixed up. Here, CLC is an acronym for “Command Line Client.” CLC indeed acts like a client to a Hazelcast cluster for many use cases, yet in the future it will gain features that are outside the scope of being a client.

CLC provides three modes of operation that we explain in the Operating Modes section: Command-line (or non-interactive) mode, shell (or interactive) mode, and batch mode.

CLC can connect to Hazelcast Viridian clusters, as well as Hazelcast IMDG 4.x and Hazelcast Platform clusters (5.x and up) running locally or on a remote server. To run SQL queries, the cluster must be version 5.0 or up. JSON support in SQL was introduced at Hazelcast Platform 5.1.

CLC was developed using the Go programming language, which allowed us to make CLC a single self-contained native binary without dependencies. That has several advantages:

  • You do not need JRE or the Hazelcast distribution to run CLC. Just downloading the appropriate binary is sufficient.
  • Being a native binary, CLC starts immediately. That makes it very suitable to use within shell scripts.
  • Since it is a static binary, it doesn’t use or require dynamic libraries. That makes it very convenient to use in Docker scratch images.
  • As it runs in a terminal, CLC can be executed within your favourite IDE.

CLC uses the client/server mode when communicating with a Hazelcast cluster, so its access can be restricted on the server side by clusters whose license supports ACL permissions.

Having talked about the benefits, here are the current limitations of CLC:

  • CLC cannot connect to all Hazelcast Platform clusters. For instance, it does not have Kerberos support at the moment.
  • It currently does not support distributed data structures besides the Map.
  • CLC currently cannot decode all values, particularly ones encoded using language-specific serializers, such as Java serialization. It always, helpfully, shows which keys/values it cannot decode and their types though.
  • The types that CLC can write are limited at the moment. They are as follows for both Map keys and values: boolean, 32 and 64-bit floats, 8, 16, 32 and 64-bit integers, strings and JSON.

In most cases there is not a fundamental limitation. We will be introducing some of those features gradually.

Use Cases

Let us check out two use cases where CLC is the perfect match. We will have more articles in the future about more complex use cases.

Running SQL Queries

Running SQL is one of the most important use cases for CLC, so it is not a surprise that there’s good support for that.

When you run CLC in the shell mode, you can directly run SQL queries or use shortcut commands like dm which helps with data exploration.

The SQL can span multiple lines and is syntax highlighted. You can use the up/down arrow keys to navigate the shell history.

In the shell mode, the results are displayed as tables.

CLC> select __key, name from dessert order by name limit 2;
    __key | name                            
        4 | Apple cake                      
        0 | Baklava                         
OK (19 ms)

Of course you can run SQL queries against streaming data sources as well. The simplest example would be:

CLC> select * from table(generate_stream(1));
OK (4370 ms)

Running SQL queries in the command-line mode enables using more output formats, such as JSON. You can feed the output to a specialized tool, like Visidata for easier exploration:

$ clc sql "select * from dessert" -f json -q | visidata -f json


Finally, you can write a bunch of SQL statements to a file and run them in batch. That’s a nice and easy way of creating mappings and populating maps. We discuss that more in the Operating Modes section, but here’s a sample for Linux/MacOS:

$ cat my-script.sql | clc

Diagnosing and Fixing Map Entry Problems

Hazelcast Maps do not have a schema; both the keys and values can be of any of the supported types. The keys and values can also be serialized by any serializer, such as Compact Serialization or Avro. This is both a blessing and a curse: it is trivial to write to a Map, but when you want to read the data, you usually need to know the type of the object stored there.

That sometimes becomes a challenge when different programs, or different versions of the same program, write keys and values using a different type or serializer. For instance, Program A written in Java sets the key “person1.age” to a 32-bit integer, but Program B written in JavaScript sets the same key to a float64. If Program A then reads the value back, it raises an exception, since the type it received is not the type it expects.

A similar problem happens when you have a SQL mapping on a Map. If the Map is updated with set/put calls and the key or value doesn’t match the mapping, you’ll get an error similar to the one below when you want to read, update or even delete the data using SQL:

com.hazelcast.jet.JetException: Execution on a member failed: com.hazelcast.jet.JetException: Exception in ProcessorTasklet{09ae-c83a-f582-0001/Project(IMap[public.dessert])#3}: com.hazelcast.sql.impl.QueryException: Failed to extract map entry key because of type mismatch [expectedClass=java.lang.Integer, actualClass=java.lang.String]

It can be challenging to find out which entries in the Map cause the problem. But CLC makes it trivial to spot the problematic entries with its --show-type flag which can be used with map entry-set and map get commands:

CLC> \map entry-set -n dessert --show-type
    __key | __key_type | this | this_type
        2 | INT32      | >    | JSON
        0 | INT32      | >    | JSON
       14 | INT32      | >    | JSON
  Baklava | STRING     | YUM  | STRING
        6 | INT32      | >    | JSON

In the output above, it is immediately obvious that the key/value types of the entry with the key Baklava don’t match the others. You may opt to remove that entry altogether:

CLC> \map -n dessert remove -k string Baklava

Or replace it if you know the correct value:

CLC> \map -n dessert set -k i32 0 \
    -v json '{"name":"Baklava", "theme":9, "crossteam":0, "jit":1}'
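
The diagnosis step can also be scripted. The sketch below is illustrative Python, not part of CLC: it assumes you have already captured the entries as (key, key type, value type) tuples, for example by parsing the --show-type output, and the function name find_type_outliers is hypothetical. It flags the entries whose key/value type pair differs from the most common pair.

```python
from collections import Counter
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (key, key_type, value_type)


def find_type_outliers(entries: List[Entry]) -> List[Entry]:
    """Return entries whose (key_type, value_type) differs from the majority."""
    if not entries:
        return []
    pair_counts = Counter((key_type, value_type)
                          for _, key_type, value_type in entries)
    expected_pair, _ = pair_counts.most_common(1)[0]
    return [e for e in entries if (e[1], e[2]) != expected_pair]


if __name__ == "__main__":
    # Same data as the --show-type output shown earlier.
    entries = [
        ("2", "INT32", "JSON"),
        ("0", "INT32", "JSON"),
        ("14", "INT32", "JSON"),
        ("Baklava", "STRING", "STRING"),
        ("6", "INT32", "JSON"),
    ]
    print(find_type_outliers(entries))  # [('Baklava', 'STRING', 'STRING')]
```

Treating the majority type pair as the expected one is a heuristic; if you know the types declared in the SQL mapping, comparing against those is more reliable.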

A Tour of CLC

Here’s a short tour of CLC. We will have more articles detailing some of the features mentioned in this section. In the meantime, you can check out our documentation.


We provide binaries for the popular platforms at our Releases page. Since CLC is a single binary, you can just download the release package for your platform, extract it and optionally move it to somewhere in your PATH.

Currently we provide precompiled binaries of CLC for the following platforms and architectures:

  • Linux/amd64
  • Windows/amd64
  • MacOS/amd64
  • MacOS/arm64

If your platform is not one of the above, you may want to compile CLC yourself. Our build process is very simple and doesn’t have many dependencies. In most cases just running make is sufficient to build CLC if you have the latest Go compiler installed.

Additionally, we provide an installer for Windows 10 and up. The installer can install CLC either system-wide or just for the current user.

Home Directory

CLC keeps all of its files in a well-known location, which we will call $CLC_HOME.
$CLC_HOME includes known configurations, logs and other directories and files.
You can find out $CLC_HOME by running clc home.


The configuration contains information about how to connect to a cluster, and other bits of settings.
The directories in $CLC_HOME/configs are named (or known) configurations. To use one of them, it is enough to pass its name to CLC using the --config (or -c for short) flag.

You can pass the full absolute/relative path of the configuration to CLC using --config even if that configuration is not a named configuration.

If the configuration was specified, or there’s the default named configuration, CLC tries to load it. Otherwise:

  • In the command-line mode it displays an error
  • In the shell mode, it displays a list of configurations you can choose from. If there is no configuration, it shows you the config wizard to add a Viridian cluster configuration.

You can check out clc config --help for help on the configuration commands.

Getting Help

You can use the help command or the --help flag to display the help in the command-line mode:

$ clc --help
$ clc map --help
$ clc help map

In the shell mode, try the help and \help commands.

Operating Modes

As mentioned before, CLC can operate in one of three modes: command-line, shell or batch.

The command-line mode is suitable to run one-off commands in the terminal, or to use CLC in shell scripts, such as Bash, Powershell, etc. In this mode, CLC exits after completing a single operation.

$ clc map -n dessert size

You can use the -q flag to suppress unnecessary output:

$ clc map -n dessert size -q

Auto-completion files for CLC commands can be generated for many shells. To see your options, check out the output of the clc completion --help command.

In the shell mode, CLC starts a command shell, similar to Python, Bash or Powershell. It displays the CLC> prompt and waits for you to input a SQL statement or a command.

You can enter SQL statements directly, but should add a semicolon so CLC knows the statement ended:

CLC> select * from dessert limit 1;
    __key | name        |      theme |  crossteam |        jit
        2 | Carrot cake |          9 |          9 |          1
OK (13 ms)

CLC commands can be entered by prefixing them with a backslash (\) character. These commands should fit on a single line:

CLC> \version --verbose
Name                   | Version                                 
Hazelcast CLC          | v5.2.0                                  
Latest Git Commit Hash | XXXXX
Hazelcast Go Client    | 1.4.0                                   
Go                     | go1.20.2 linux/amd64                    

You can cancel long running operations by pressing Ctrl+C, and exit the shell by pressing Ctrl+D or typing exit.

CLC connects to the cluster once, if necessary, and stays connected until you end the session by exiting the shell. Because of that, interactive mode commands that require a connection to the cluster take less time to complete.

It is desirable to run a batch of SQL statements and commands from a file in some cases, such as creating mappings and pre-defined data in a cluster.

In order to run a batch file with CLC, just save the SQL statements and commands in a file and pipe it to CLC.

On Linux/MacOS:

$ cat examples/sql/dessert.sql | clc

On Windows:

$ type examples\sql\dessert.sql | clc

Just like with the shell mode, the batch file can consist of SQL statements ending with semicolons and commands prefixed with a backslash. You can use SQL comments (--) as well. Here are a few lines from dessert.sql, which can be found in the CLC repository:

-- (c) 2023, Hazelcast, Inc. All Rights Reserved.

CREATE OR REPLACE MAPPING dessert (
    __key int,
    name varchar,
    theme int,
    crossteam int,
    jit int
) TYPE IMap OPTIONS (
    'keyFormat' = 'int',
    'valueFormat' = 'json-flat'
);
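
To make the batch-file structure concrete, here is a minimal Python sketch that splits a script into SQL statements and backslash commands. It is not CLC's actual parser; split_batch is a hypothetical helper, and it makes simplifying assumptions: comments occupy whole lines starting with --, commands occupy a single line, and a statement ends at a line that finishes with a semicolon.

```python
from typing import List


def split_batch(text: str) -> List[str]:
    """Split a batch script into SQL statements and backslash commands."""
    items: List[str] = []
    pending: List[str] = []  # lines of the SQL statement being collected
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("--"):
            continue  # skip blank lines and SQL comments
        if line.startswith("\\") and not pending:
            items.append(line)  # a command fits on one line
            continue
        pending.append(line)
        if line.endswith(";"):
            items.append(" ".join(pending))
            pending = []
    if pending:
        items.append(" ".join(pending))  # unterminated trailing statement
    return items


if __name__ == "__main__":
    script = """-- sample batch
select *
from dessert;
\\map -n dessert size
"""
    for item in split_batch(script):
        print(item)
```

A production parser would also need to handle semicolons and comment markers inside string literals, which this sketch deliberately ignores.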

Output Formats

When it comes to output, one size doesn’t fit all, so CLC supports a few formats. You can specify the output format using the --format (or -f for short) flag for all commands.

When using the delimited format, the fields in the output are separated by tab characters. This is useful when the output is fed to another command which expects simple text. The output will be trimmed if it doesn’t fit on a single line. This is the default in command-line mode.

$ clc sql "select name, jit from dessert order by name limit 2" -f delimited
Apple cake  0
Baklava 1

With the json format, each row of output is converted to a JSON document. This format is preferred if the output is fed to another command which can read JSON input.

$ clc sql "select name, jit from dessert order by name limit 2" -f json
{"jit":0,"name":"Apple cake"}
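
Because each row is a standalone JSON document on its own line, downstream scripts can process the output line by line. The following Python sketch assumes that one-document-per-line layout; the helper read_rows is hypothetical and the sample lines are modeled on the query above.

```python
import json
from typing import Dict, Iterable, Iterator


def read_rows(lines: Iterable[str]) -> Iterator[Dict]:
    """Parse one JSON document per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    # In a real pipeline, pass sys.stdin instead of this sample list.
    sample = ['{"jit":0,"name":"Apple cake"}', '{"jit":1,"name":"Baklava"}']
    for row in read_rows(sample):
        print(row["name"])
```

Reading lazily with a generator keeps memory use flat even when the query returns many rows.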

CSV is the universal data exchange format, so there are many tools which can read this format, including Microsoft Excel and LibreOffice. Setting the format to csv makes the output a CSV table.

$ clc sql "select name, jit from dessert order by name limit 2" -f csv
Apple cake,0

For reading the output yourself, table is the best format. This is the default in the shell mode.

$ clc sql -c local "select name, jit from dessert order by name limit 2" -f table
name                      |        jit
Apple cake                |          0
Baklava                   |          1
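
The relationship between the text-oriented formats can be sketched in a few lines of Python. This is an illustration only, not CLC's implementation: render is a hypothetical helper, and details such as the key ordering in JSON are assumptions modeled on the sample outputs in this section.

```python
import csv
import io
import json
from typing import Dict, List


def render(rows: List[Dict], columns: List[str], fmt: str) -> str:
    """Render rows as delimited (tab-separated), json or csv text."""
    if fmt == "delimited":
        return "\n".join("\t".join(str(r[c]) for c in columns) for r in rows)
    if fmt == "json":
        # One compact JSON document per row, keys sorted alphabetically.
        return "\n".join(
            json.dumps(r, sort_keys=True, separators=(",", ":")) for r in rows)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerows([r[c] for c in columns] for r in rows)
        return buf.getvalue().rstrip("\n")
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    rows = [{"name": "Apple cake", "jit": 0}, {"name": "Baklava", "jit": 1}]
    for fmt in ("delimited", "json", "csv"):
        print(f"--- {fmt}")
        print(render(rows, ["name", "jit"], fmt))
```

Using the csv module rather than joining with commas matters as soon as a value contains a comma or a quote, since the writer then quotes the field correctly.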

We’ll end the tour here, but there’s more to CLC which we are going to explore in future articles.

What’s Next?

We have just started and we plan lots of new features and functionality for CLC. One of the most important new features is Jet job submission.

Currently, you can create a Jet streaming pipeline by either executing some SQL (which CLC handles very well) or coding a pipeline with Java and submitting it with the hz-cli tool. The latter way of creating a pipeline is currently not possible with CLC, but I have good news: We are working on it and the new feature will be released with CLC v5.3.0.

Just to be clear, the pipeline still should be written in Java but CLC will be able to submit it as a Jet job. This feature will work for Viridian and Hazelcast Platform v5.3.x and above clusters.


Hazelcast CLC is a command line tool primarily for developers. It starts fast, runs fast and plays well with other command-line tools.

We are always looking for more feedback! Consider joining the #clc channel at Hazelcast Community Slack. You can get an invite at: https://slack.hazelcast.com and participate in our survey.

We recently had a chat about CLC; here’s the recording of the chat.


Earlier this month, Hazelcast hosted the Real-Time Stream Processing Unconference, a peer-to-peer event focused on trends in real-time analytics, the future of stream processing and getting started with “real-time”. The hybrid event was hosted at London’s famous Code Node and streamed via LinkedIn, YouTube and Twitch.

The Real-Time Stream Processing Workshop

One benefit of attending in person was the pre-Unconference workshop on real-time stream processing led by Randy May. During this session, participants learned to build solutions that react to events in real time and how to deploy their solutions to Hazelcast Viridian Serverless. More specifically, participants walked away with a number of new skills, including:

  • Conceptual foundations needed to successfully implement stream processing solutions, such as the basic elements of a Pipeline.
  • How to build a real-time stream processing solution.
  • Deploying and scaling the solution using Hazelcast Viridian Serverless.

Everyone at the Stream Processing Fundamentals Workshop earned a digital badge highlighting their new stream-processing skills on LinkedIn and during job interviews. Based on the early feedback, it’s safe to say we’re looking forward to more of these workshops in the future!

The Real-Time Community

Moving on to the actual Unconference, Hazelcast presented the Real-Time Community, a community driven by its members to create and share use cases and solutions for real-time stream processing applications. The session covered what’s slowing down the real-time trend, the DNA of real-time stream processing and how to simplify the software architecture. The takeaways were to make the community feel at home, to build a community-based portal and to listen to community feedback and act accordingly.

Since I joined Hazelcast, I’ve spent a lot of time speaking with members of the community at meetups, industry events and through our Slack Community. As this was my first time broadly addressing the community, I wanted to open the event by sharing what I learned through those conversations and the vision for the Real-Time Community, a new initiative within the company. The TL;DR version:

  1. Community feels at home
  2. Build a community-based portal
  3. Listen to the community and act accordingly

Finally, I wrapped up the opening session with an open, honest conversation on “real-time” and Hazelcast’s perspective on the rapidly growing topic. The section covered the confusion surrounding the trend, what’s the DNA for real-time stream processing and simplifying the software architecture. 

The Real Fun of Real-Time

From there the real fun got started with a look at the future of real-time stream processing. From where data is processed, to taking action in real-time to understanding where and when to enrich data with the historical context, this section had a lot of information in it, so here is your TL;DR summary:

  1. The new horizon for the data-driven enterprise of 2025
  2. This horizon impacts all apps
  3. Getting started begins with deploying a high-value, real-time stream processing engine that enriches new data with the historical context of stored data.

The Real-time Stream Processing Roundtable Discussion

The Unconference wrapped up with a roundtable discussion between real-time experts from all walks of life: daily practitioners, industry partners, academia and, of course, the open source community, including both open-source and paid users. The topics discussed between the panelists and the audience included current trends in real-time stream processing, the challenges of real-time stream processing and benchmarking real-time stream processing applications.

The inaugural Unconference was an amazing success and I’d like to thank everyone that attended in person and virtually. I’d also like to congratulate the newest Hazelcast Hero Martin W. Kirst for his valuable contribution to our community. During the day, Martin works for Oliver Wyman, and at night he patrols the community offering bits of wisdom and amazing contributions.

A special thank you, from all of Hazelcast, to all the presenters for being gracious with their time and making it a success:

Tim Spann 🥑 : Tim is a Principal Developer Advocate in Data In Motion for Cloudera.

Dr. Guenter Hesse: Dr. Hesse drives the group data integration platform (GRIP) at Volkswagen AG as Business Partner Manager and Cloud Architect. He created a new benchmark for comparing data stream processing architectures. He proved the function of his benchmark by comparing three different stream processing systems, Hazelcast Jet being one of them.

Mateusz Kotlarz: Mateusz is a software developer at BNP Paribas Bank Polska with around 9 years of experience. For the last 6 years he’s focused primarily on integration projects using Hazelcast technology for one of the biggest banks in Poland.

Vishesh Ruparelia: Vishesh is a software engineer, currently at AppDynamics, with a keen interest in data processing and storage. With a background in computer science, he actively seeks new opportunities to expand his knowledge and experience in the field. Vishesh has been exploring stream processing capabilities lately to gain a deeper understanding of their potential applications.

– From Hazelcast: Manish Devgan, Chief Product Officer; Karthic Thope, Head of Cloud Engineering; Avtar Raikmo, Head of Platform Engineering; and Randy May, Industrial Solutions Advocate.

Finally, make sure to follow and join the community to stay up-to-date on the latest, including future community events:

– Slack: https://slack.hazelcast.com/

– Twitch: https://www.twitch.tv/thehazelcast

– Watch the event: https://www.youtube.com/watch?v=GncQz8HZr68

Reacting to data is one thing; reacting to data as quickly as possible — as soon as it’s generated — is quite another. Today, it’s the difference between slaying the competition or being slayed by the competition. 

I saw the power of streaming when I was with Meta. In fact, I joined Hazelcast because I was blown away by how it enables companies that aren’t Meta-size or that don’t have Meta-dollars to take full advantage of real-time stream processing. Just a few years ago, companies didn’t understand what real-time stream processing could do for them, especially if they were in an industry that didn’t historically use real-time data. But now they see the benefits streaming has across an organization, regardless of size or industry, and in consumers’ everyday lives.

Just look at mobile phones today. They have crash detection. They can read from a number of different sensors at the same time and make an assessment that they believe you have been in an accident. Based on anomalous behavior detected by the sensors, your phone dials emergency services. Simultaneously, the phone directs them to your precise location using real-time information (e.g. GPS coordinates) combined with historical context and reference information.

It’s really about combining and comparing what’s known to be “normal” with what’s happening now and using the resulting insight to react. More specifically, the phone reacts by taking action, not in a few minutes or even a few seconds, but instantaneously. That’s what consumers have come to expect and what companies must give them. 

We’ve seen the significant benefits of real-time stream processing and decision-making in the financial arena. For example, institutions have generated and saved many millions of dollars through the ability to offer loans and prevent fraud, respectively.

But finance is just the tip of a very big iceberg. 

From a supply chain perspective, combining transactional, operational and historical data into a single platform benefits every organization.

The pandemic was a really big (OK, gigantic) example of how changes to the supply chain affect organizations and their customers. Still, even the smallest anomaly can affect whether organizations can get what they need so that customers can get what they need (and want). In fact, companies that react immediately and automatically to even the smallest changes in the supply chain not only rise above the rest, but do so in a lean and mean manner. They need fewer people focusing on logistics, which puts them in a totally different competitive category than organizations that require hand-holding when links in the supply chain begin to weaken and — often, without immediate intervention — eventually break.

You can really see the flexibility and potential of real-time stream processing when you start to apply the technology to specific industries. The finance industry has been on the real-time streaming journey for a long time — it was really predisposed to streaming. Now, we’re seeing how the technology can filter into more and more spaces, including (but certainly not limited to):

Energy: Real-time streaming technology can help optimize clean energy generation, such as wind and solar. If you know there is an environmental event somewhere, real-time stream processing can change the profiles of how energy is captured, stored and/or delivered to make it as cost-effective as possible and ensure the energy is delivered where it is needed in the moment — or even ahead of time. 

Healthcare: Healthcare is already changing due to real-time data and wearable technologies. Real-time stream processing can help healthcare providers and consumers be more proactive about health because they can predict that something will happen. On an individual-patient basis, this could mean preventing an hours-long wait in the ER or even a potentially fatal event. On a macro level, healthcare organizations – from hospitals to universities to non-profits – can optimize services to ensure they have the right providers and supplies for a dynamically changing landscape, where emergent healthcare needs can be promptly addressed.

Government: If you think about how governments divert cash into things that matter today, it’s a very manual, very slow process. The use of real-time streaming data could help drive that decision-making. Governments are really good at collecting information, but not so good at reacting to it. It will be a slower uptake, but I predict real-time stream processing capabilities will filter through to the public sector in the next decade.

No matter the industry or application, it’s all about personalizing and enriching reference data with streaming data – and vice versa.

But while all organizations can benefit from real-time stream processing, few today have the resources to enable true real-time decision-making. Not only must relevant data sources be connected, but it also takes a significant amount of compute power and storage space to support the intelligent combining of transactional, operational and historical data. Whether on-premises or in the cloud, Hazelcast provides the technology and support organizations need to deploy fast, highly scalable applications for running large calculations, simulations, and other data- and compute-intensive workloads.

We’re still quite early on the journey. Still, mobile phones, smartwatches and social media are making organizations and consumers more aware of real-time data and decision-making benefits. Soon, they’ll expect nothing less if it hasn’t happened already.

Have you ever completed a major purchase – where you felt like you had to provide way too much personal information about yourself to complete the sale – only to have to keep reminding the store who you are and what you purchased from them, as though the purchase never happened? This blog post contemplates this very scenario and asks whether we, the consumers, should expect a better customer experience from the store, the manufacturer, the service center and the products themselves.

Honey, I want a new washer and dryer for Christmas

Years ago, my wife and I needed to purchase a new washer and dryer. We went to a local appliance store and purchased the appliances. They were not top-of-the-line; however, like all the models, they did have digital sensors that would return diagnostic codes should any issues arise. We also purchased the extended warranty since these appliances were a significant investment.

As luck would have it, the washer needed maintenance after about a year (thankfully we had the extended warranty!). We called to schedule service, and this is where things get interesting. Of course, the service center wants to troubleshoot the washer over the phone. We push various buttons on the washer, and it returns the status codes that indicate the issues. The service center schedules a service call.

Now, having a washer out of order in my house is awful. There always seems to be something being cleaned. Plus, the soonest we could get a service call is two weeks out. This means we are going to have to make a trip to a laundromat at some point. Not a happy house.

Day of the service call, the service technician arrives, runs the same diagnostics, and then concludes he needs to order the parts. How long will that take? At least another two weeks! We waited two weeks already for the service technician to come to our house to tell us what we already knew, just so he could order parts. The service center already knew what was wrong with the washer – it should have sent the parts with the service technician.

Eventually, the parts showed up and the service technician arrived for a second time to install them. The washer was fixed and operating again. However, this experience demonstrated to me that despite sensors that can run diagnostics, and service centers that can retrieve those trouble codes, there are still major gaps in the downstream systems needed for a seamless customer experience.

First, why didn’t the service technician have the parts during the first service call? The troubleshooting session via phone with the service center was quite thorough and extensive. Why perform this exercise if it “doesn’t matter”? Perhaps there was conflicting information that needed to be sorted out. Or the extent of the repair needed to be assessed. I would still argue in favor of being prepared for each contingency rather than having to return a second time.

Second, why have the service technician come back twice? We only paid for one service call. In fact, we didn’t pay for anything since it was all under warranty, so from an operational and financial standpoint you could conclude that completing the first service call as efficiently as possible would be desirable.

Third, the parts were shipped to us, not the service technician and then we were supposed to call the service center when they arrived. This process seemed very strange to us, especially since Amazon takes a picture when they deliver a package and sends it to you. After all, “they” (we really don’t know who “they” are – is the service center part of the manufacturer or the appliance store or a completely different third-party?) shipped us the parts, with a tracking number, to our home address, linked to this service call and faulty appliance – you would think they would know the parts arrived as soon as we did. Why are we calling them to tell them the parts they shipped us have arrived?

Clearly there are operational and financial inefficiencies with this process.

I am connected but I cannot go outside

Last year, the oven on our range stopped working. After some light troubleshooting by your author, it was determined that the range was 38 years old and that we should just buy a new one. I wondered where we are on the Internet-of-Things? What is the killer app for appliances?

I keep remembering those IBM Watson commercials where the elevator repair person shows up to fix the elevator before it breaks. I am not expecting this to have trickled down to ranges yet but maybe they have closed some of the maintenance and repair gaps from years earlier. After all, appliances are not getting cheaper, and they are adding more “connected” features.

The term “IoT in a Box” comes to mind as I perform my research. These are features that appear to be “connected” but that, upon further inspection, are limited to the appliance itself and maybe the owner’s mobile phone and home Wi-Fi network. I know, some will argue that this makes it connected. Yes, in spirit, but only in the way that a larger antenna increases your broadcast signal range, or a megaphone makes your voice louder.

The “killer app” can be summed up by this example from one manufacturer that showed a person cutting vegetables on their kitchen counter without having to stop cutting to preheat the oven by using speech commands. They touted the convenience of not having to stop what they were doing to preheat the oven. Oh, the burden! This range cost another $1,000 for this convenience. No mention regarding the maintenance and repair.

Funny thing about appliances is that you don’t realize how much you use them until they are not working. In the end, we purchased a relatively low-end model range. Fortunately, the science behind making heat to cook food has not changed too much over the years. Making fire with a push of a button or turn of a knob met our needs. It did have sensors and would return trouble codes, if needed.

The trouble with trouble codes

With a new range installed and household harmony restored, I was still bothered by what appeared to be a lack of progress in appliance maintenance and repair. Seemed to me like a big missed opportunity by manufacturers and retailers. I should note that everyone wants to sell you protection, extended warranties, but that is just insurance and depending on the item could be a waste of money or a lifesaver. Whether you buy the extended warranty or not is usually determined by your life experiences. How the warranty works, repair or replacement, is also usually determined by the item and extent of repairs required.

As luck would have it, the oven on our year-old range suddenly stopped working. Funny thing about trouble codes: I ran the diagnostics, and the range did not know the oven was not working. Here we are with a range that does not have a working oven. Diagnostics that indicate everything is good. The regular warranty expired. The retailer is happy to repair it for free if we purchase the extended warranty – which is about 40% of the initial purchase price.

We found a local repair service that could make a service call in two days. Fortunately, it was a defective ignitor, which the service technician had in his truck, and he was able to fix it without issue. The service technician was surprised it failed – they usually last 5-8 years!

Demand “real” real-time solutions

As I reflect on these experiences, I see ample opportunity for real change in the future. My experiences above illustrate that real change in the Internet of Things space has yet to begin. For the Internet of Things to become a reality it will require not just sensors in appliances to communicate back to the manufacturer – but for manufacturers to use this data to improve the entire customer experience.

As much as companies tout the benefits of customer journeys and customer retention they should be focused on delivering real value and minimizing real pain points. We may like our washer and dryer, but we will always grumble about our visits to the laundromat.

Using data to benefit both customers and the business is quite mainstream. For example, financial services companies are leveraging Hazelcast real-time stream processing capabilities in fraud detection. Another way of looking at fraud detection is looking for anomalies in financial processes, such as credit card transactions or cash withdrawals. 

The same logic can be applied when performing diagnostics in appliances – looking for anomalies. Manufacturing companies that build their next-generation of Internet-of-Things products using Hazelcast real-time stream processing can deliver features and functionality beyond just “IoT in a Box.” They can deliver customer experiences that dazzle and delight while differentiating themselves in the marketplace.
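To make the "looking for anomalies" idea concrete, here is a minimal sketch in plain Python. It is not Hazelcast code, and the window size, threshold and sample temperatures are assumptions chosen only for illustration. It flags any reading that sits far outside the spread of the readings just before it.

```python
from collections import deque
from statistics import mean, pstdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return (index, value) pairs that deviate strongly from the recent window."""
    recent = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(readings):
        # Only judge a reading once a full window of history is available.
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Guard against a zero spread when recent readings are identical.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append((index, value))
        recent.append(value)
    return anomalies


# Hypothetical water temperatures (degrees Celsius) reported by a washer.
water_temps = [40.1, 39.8, 40.3, 40.0, 39.9, 40.2, 55.0, 40.1]
print(find_anomalies(water_temps))
```

In a real deployment the same check would run continuously over the incoming stream rather than over a list, but the comparison against recent history is the same idea that underlies many fraud-detection rules.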

Hazelcast can ingest multiple streams of real-time data from the Internet and enrich it with on-premises data sources while in motion. This creates the potential to merge all of these data streams:

  1. Diagnostics
    1. Washing machine metrics: water temperature, spin cycles, load sizes, maintenance cycles
    2. Sensor data: trouble codes, etc. Once the data is captured, further diagnostics can be performed to determine the root cause, and machine-learning algorithms can be applied to predict future failures and recommend preventative maintenance
  2. Parts inventory – with incoming sensor data, manufacturers have insight into the locations of their products and can stage spare parts with great precision
  3. Shipping – next-day or same-day shipping and delivery of parts could become commonplace with the visibility that sensor data brings
  4. Service technician scheduling – with more sensor data comes improved reliability, so service technicians can be more confident that the appliance, issue, parts and resolution are all correctly aligned. This would lead to efficient and timely scheduling of service calls, resulting in less downtime for the product

IoT would provide manufacturers with the ability to determine which products need maintenance, stage and track parts inventory, quickly ship parts to customers, track shipments in real-time, and schedule service technicians efficiently.
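As a rough illustration of how the streams in the list above could come together, the sketch below enriches a single trouble-code event with parts and inventory reference data and decides what to do next. Everything in it is hypothetical: the trouble codes, part names, regions and the `plan_service_call` function are invented for the example and do not come from any manufacturer or from the Hazelcast API.

```python
# Hypothetical reference data: which part fixes which trouble code,
# and how many units each regional warehouse currently holds.
PART_FOR_CODE = {"E21": "drain-pump", "E42": "door-lock"}
INVENTORY = {
    "drain-pump": {"north": 0, "south": 4},
    "door-lock": {"north": 2, "south": 1},
}


def plan_service_call(event):
    """Enrich one trouble-code event with the part and a warehouse that stocks it."""
    part = PART_FOR_CODE.get(event["code"])
    if part is None:
        # Unknown code: a person still has to look at the appliance.
        return {**event, "action": "manual-diagnosis"}
    stock = INVENTORY[part]
    # Prefer the customer's own region, then any other region with stock.
    candidates = [event["region"]] + sorted(r for r in stock if r != event["region"])
    for region in candidates:
        if stock.get(region, 0) > 0:
            return {**event, "part": part, "ship_from": region,
                    "action": "ship-to-technician"}
    return {**event, "part": part, "action": "backorder"}


event = {"appliance_id": "washer-17", "code": "E21", "region": "north"}
print(plan_service_call(event))
```

With this kind of enrichment in place, the technician in my washer story could have arrived with the part on the first visit, because the decision is made when the trouble code arrives rather than after a site inspection.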

Products break and require maintenance. However, as anyone who has spent an evening at the laundromat while their broken washing machine sits at home waiting for repair can testify, that is exactly when customers talk about your product. What story do you want them to tell? In today’s connected world, a seamless and near-effortless repair experience would be a “killer app.”

Deliver real-time solutions with Hazelcast today!

If the idea of leveraging stream processing to handle your real-time data sounds like a good idea, take a look at the Hazelcast Platform, a real-time stream processing platform that offers the simplicity and performance (among other advantages) you seek to gain a true real-time advantage. “How simple” and “how fast,” you ask? With our free offerings, you can see for yourself. Check out our software, and our cloud-managed service, or even just contact us so we can discuss the implementation challenges you face and how we can help you overcome them.


We’re excited to announce our new Hazelcast Hero, Martin W. Kirst!

Martin is a leading software architect for Oliver Wyman, a global strategic consultancy company. He has been crafting software and building large cloud platforms for three decades and has gained plenty of experience in various roles such as Software Engineering, Software Architecture, and Product Ownership across the Energy Retail, Logistic/Transport, Finance/Accounting, and Meteorology industries. He’s super passionate about technologies, tools and automation to drive the business. While developing the infrastructure for a large cloud-native workflow automation and orchestration platform, he contacted Hazelcast. It’s our pleasure to welcome Martin’s contribution of Ringbuffer support in Hazelcast’s Go Client SDK. You can connect with Martin on LinkedIn: https://www.linkedin.com/in/martin-w-kirst/

We had a quick chat with Martin about his experience with Hazelcast:

Fawaz: How did you get to know Hazelcast?

Martin: I got to know Hazelcast because of my engagement with Camunda‘s BPMN suite. They are using Hazelcast ringbuffers to decouple the core BPMN service from any kind of visualization or archiving service via a pub-sub event architecture.

F: What was your contribution to Hazelcast?

M: I contributed ringbuffer support (code) to the Hazelcast Go client SDK.

F: What advice would you give to developers who contribute to Hazelcast?

M: The Hazelcast Open Source Community is very open and that’s what I enjoy. E.g. chatting via Slack, or reviewing PRs live, and being honest to each other and sharing constructive feedback. It’s not about my personal ego, but rather the collaborative way of creating something truly awesome.


You can watch our live stream about Level-Up Go Applications with Ringbuffers on YouTube: https://www.youtube.com/watch?v=yYCCGc2rzDg&ab_channel=Hazelcast

You can also speak to Martin live at the upcoming Unconference event. Sign up free: https://hazelcast.com/lp/unconference/

More about Martin

Martin works for Oliver Wyman, a global strategic consultancy company, as a leading software architect who consults and engineers for clients. Martin has been crafting software and building large cloud platforms for three decades. He has gained plenty of experience in various roles such as Software Engineering, Software Architecture, and Product Ownership in the Energy Retail, Logistic/Transport, Finance/Accounting, and Meteorology industries. He’s super passionate about technologies, tools and automation to drive business. While developing the infrastructure for a large cloud-native workflow automation and orchestration platform, he got in touch with Hazelcast. With Hazelcast’s engagement in Open Source as well as the very inviting community, it was a pleasure for him to contribute the Ringbuffer support in Hazelcast’s Go Client SDK.


What if there was a delay between a racing driver’s steering wheel and the desired change in direction of the vehicle travelling at 350 km/h? The results would be devastating. What about a delay between a pilot’s stick input and the response from the aircraft’s flight control system? The ramifications could be fatal for all onboard the flight and, potentially, for those on the ground as well. Or a delay in the quality detection of expensive, raw materials before they embark on the next stage for irreversible processing? This would be extremely costly and disruptive to everyday operations and production.

The impact of “lag” (or maybe latency or delay) in real-time, mission-critical systems is clear to see in these three examples, and there are many other cases where response lag could have an immeasurably disastrous and costly impact, for example:

  • Nuclear reactor control systems
  • Air traffic control systems
  • Healthcare systems
  • Transport signalling systems
  • Space exploration

But the above industries have billion-dollar budgets and huge amounts of resources supporting them around the clock to ensure that lag never happens.

What about examples where the outcome or impact may not be as severe, but nonetheless critical to those affected, like in modern business environments?

Consider an e-commerce business. If you had lag in your website response time, that would create frustration with your customer base and cause sales to drop. In fact, one major North American fashion retailer experienced this very situation – their online sales fell by 11 percent when the response to their website slowed by just half a second. That resulted in tens of millions of dollars being lost in that financial year. If you are a financial services company and your fraud detection system couldn’t accurately detect fraud while a payment was being made, what would happen? Denying customer purchases due to inadequate fraud systems could result in customer churn as well as a negative brand association. And failing to detect fraudulent payments (false negatives) could result in significant financial loss and penalties. Neither is a very good outcome.

Now, to avoid any confusion, let’s be clear about what real-time means. 

Real-time means:

“Acting on information in the moment. While, when, during and before an event, not after”.

Let’s look at some examples:

  • When a customer is shopping online – offering real-time recommendations and personalisation
  • During a fraudulent event – making sure you make the right call.
  • Before a process breaks something in an industrial machine
  • When a patient needs immediate attention

Nowadays, the term “real-time” has been adopted by leading businesses. They know they can gain competitive advantage, dominate the market, and become much more agile and efficient if they build applications that respond within their own definition of real-time, in other words, their SLA. Examples include automatically upselling to a customer while they’re purchasing online, or identifying a fraudulent transaction as it begins processing.

So, it’s clear that you need real-time action in an ever-changing, competitive landscape. But you need applications that can be built quickly and easily to provide continuous and consistent real-time capabilities and always be available. Many platforms claim to deliver real-time, but the reality is that they’re only retrieving the data very quickly. But data retrieval happens well after the event, along with the data extraction, transformation, loading, aggregation, and analytics. And even after that there may be a dependency on human action, which is a significant source of delay. So, the value of that data diminishes since the opportunity is missed. Collecting data in real-time is one thing, while acting on it is another. Knowing what needs to be done is a good step but automated action (i.e., real-time responsiveness) is needed to deliver the desired positive outcome. True real-time applications involve continuously ingesting, analysing, processing, and then acting on the data “in-flow” outside of a database or any other service so that the maximum value can be gained – in the moment, while the event is taking place.

One additional, crucial component needed to support effective real-time action is reference data. This is external data that, when merged in, enriches the freshly created, in-flow data to add context, meaning, truth and credibility. We all know that the best way to make important decisions is to have all the supporting information to hand. Context is everything. Being able to answer the who, what, where, why and when, as well as other questions, offers the best chance of making the best decision. 

So, the ideal platform will allow you to build applications that can ingest multiple streams of continuous, fast flowing data, analyse it, merge it with reference data and give you the ability to act on data automatically – all in real-time.

The Hazelcast Effect

The Hazelcast Platform delivers true real-time capabilities that drive greater innovation. We combine a high-performance stream processing engine with low-latency storage and machine learning inference capabilities. This enables you to swiftly and easily build high-performance, intelligent business applications that can ingest huge volumes of perpetually flowing data and deliver valuable real-time insights and action for any use case. Being able to process more quality data can yield more accurate, context-rich insights, removing the need for guesswork and assumptions when acting on data – all within your business SLAs, in real-time. With vast quantities of data from various data sources, you can build logic to determine patterns by correlating multiple streams of changing event data based on conditions common to both streams of events.
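To show what "correlating multiple streams based on conditions common to both" can look like, here is a small sketch in plain Python. It works on in-memory lists rather than live streams and does not use the Hazelcast API; the event shapes, the `correlate` function and the five-second gap are assumptions made for the example.

```python
def correlate(orders, payments, max_gap_seconds=5):
    """Pair events from two streams that share a key and occur close in time.

    Each event is a (timestamp_seconds, key, payload) tuple.
    """
    # Index one stream by key so the other stream can look up candidates.
    payments_by_key = {}
    for ts, key, payload in payments:
        payments_by_key.setdefault(key, []).append((ts, payload))

    matches = []
    for ts, key, payload in orders:
        for pay_ts, pay_payload in payments_by_key.get(key, []):
            # The common condition: same key, timestamps within the allowed gap.
            if abs(pay_ts - ts) <= max_gap_seconds:
                matches.append((key, payload, pay_payload))
    return matches


# Hypothetical events keyed by customer id.
orders = [(100, "cust-1", "order-A"), (130, "cust-2", "order-B")]
payments = [(102, "cust-1", "pay-X"), (190, "cust-2", "pay-Y")]
print(correlate(orders, payments))
```

A stream processing engine performs the same kind of keyed, time-bounded matching continuously and discards state once the window has passed, which is what keeps memory bounded on an endless stream.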

You can deploy anywhere – in any cloud, on-premises, or hybrid – and deliver a fully-managed service for those not wanting to manage infrastructure operations. You also get connectivity to any data source and destination, benefiting omnichannel partnerships by providing a seamless experience across the divide.

Real-Time Use Cases 

Here’s a list of just a few use cases that benefit from real-time responses:

  • Personalized recommendations in e-commerce
  • Payment processing
  • Fraud detection
  • Industrial operations optimisation
  • Distracted driver analysis
  • Predictive maintenance
  • Back-office trade monitoring
  • Counterparty risk calculations
  • Credit value adjustments

Real-Time Case Studies

Real-time is about meeting SLAs with narrow time windows. That means you can get:

Real-time is here. Now. In fact, very soon, real-time action will be such a standard expectation in our lives that, without it, our lives will be dramatically inconvenienced.


Sophia Chang was a Hazelcast intern in the summer of 2022 and is a Data Science major at the University of California, Berkeley.

I was in my junior year of college and sending applications out to companies for internships for work experience and seeing what I could do outside of the classroom. I chose Hazelcast because they worked in a field I found unique while also having room to work on ideas I found interesting prior to my internship.

The project I decided to work on was a movie recommendation system. The user could input a movie title, run a pipeline runner that funnels the title through a Python script and then write out the results. Having a pipeline runner allows processing large amounts of data across all the cores of a CPU, meaning lower latency and shorter running time. For the recommender, the IMDB dataset I trained on had a decently large number of films, and the model outputs several movies in response to a single liked film. This program has a request-response access pattern with a one-to-many relationship. In response to a request containing the title of one example movie, the recommendation system will generate a response of several movies similar to the example.
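The one-to-many request-response pattern can be sketched in a few lines of Python. To be clear, this is not my actual model: the genre flags, the tiny catalogue and the choice of cosine similarity are assumptions made purely to illustrate how one liked title can map to several ranked suggestions.

```python
from math import sqrt

# Hypothetical genre flags per film: (animation, family, horror, drama).
FEATURES = {
    "toy story": (1, 1, 0, 0),
    "frozen": (1, 1, 0, 0),
    "the lion king": (1, 1, 0, 1),
    "the shining": (0, 0, 1, 1),
}


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(title, top_n=2):
    """One request (a liked title) produces many responses (similar titles)."""
    key = title.strip().lower()  # normalise case and whitespace before lookup
    if key not in FEATURES:
        return []
    liked = FEATURES[key]
    scored = [(cosine(liked, vec), other)
              for other, vec in FEATURES.items() if other != key]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


print(recommend("Toy Story"))
```

Normalising the title before the lookup is a deliberate choice in this sketch, since user-typed titles rarely match the stored spelling exactly.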

The rationale behind such a project was that I wanted to implement a demonstration that was not only related to what I studied in class, but also was approachable to many people. Because, let’s face it, people love movies: film suggestions are easy to grasp. This project also lent itself to visually appealing, interactive experiences where clicking on a single movie poster in a UI could return a webpage full of related movies, each displayed as its original cinematic release poster. I knew this was possible because the IMDB database includes cinematic poster artwork. To attract more people, interactivity and visually interesting interfaces would need to be present. Furthermore, movie recommendation algorithms are of interest to entertainment enterprises, so this system can be used to advertise to them.

There was a lot to learn about the Hazelcast Platform in a short amount of time. Luckily, there were video tutorials and “lab exercises” I made use of that gave me a grasp of the technology. Most of my work was done with stream processing pipelines rather than with in-memory storage. That said, Hazelcast’s low-latency, in-memory data store would make an ideal feature store for things like the IMDB data used by my ML model, which is something I want to work with (and integrate into the project) in the future.

I had a little experience working in a company before during high school, but what I did there had more of an IT bent than a computer or data science bent, so I was still a bit unprepared for how working in the industry differed from working on projects in academia. In general, working on projects in a company (rather than in college) exposed me to how different things can work out (or, in my case, not work at all) in Unix systems vs. Windows. A VirtualBox running Ubuntu was a godsend for running and testing my project during the later stages when working on the pipeline app. This was necessary since running Python apps through the Hazelcast pipeline is, as of this writing, impossible on Windows. Other than that, it was the first time I worked on a single project that utilized multiple programming languages (Java and Python) and where I had an immense amount of freedom to define its scope and direction.

Tuning the model took a large amount of time, including the time I spent trying to find one pesky error that turned out to be much simpler to solve than expected, so sadly there was much I could not get to with regards to my initial plan, like creating a visual interface. In its current state the program is mainly a command line interface that spits out a file, though with further work a website would be great. Initially I thought it was a good idea to use IMDB tags, not realizing that since anyone could make and add tags, they would actually be almost useless for recommendations. I realized this when, in the initial stages, having Toy Story as a liked film recommended The Shining. Even reducing the weight that IMDB tags held in the overall algorithm did nothing, so eventually I decided to take them out completely, and now inputting Toy Story returns a completely age-appropriate list (mostly other Disney/Pixar animated films like Frozen, The Lion King, Monsters, Inc., etc.) with no unexpected picks.

My mentor/manager, Lucas Beeler, was extraordinarily supportive in helping me plan the project and get used to the Hazelcast Platform. Daily standups about my progress were common, soon leading to Zoom collaborations and in-person meetings for further work on the project. In addition, I asked in the company Slack server about how to integrate the Python scripts and how to troubleshoot a problem (that turned out to be a simple issue of case sensitivity in the string input), and received many helpful responses from Frantisek, who built the Python runner and was based in the Czech Republic. Coordinating times was a hassle due to the time difference, but it went well and was much appreciated.

My time working at Hazelcast exposed me to a great degree of new technical skills and helpful people to work with and ask questions of. As an aspiring data scientist, it also gave me new ideas of workplace problems beyond the ‘things that happen in real life but we also have to structure it for a classroom format’ projects on campus. In the classroom, we try to model problems that can happen in real life, but many of the exercises are contrived. Working in real-life is something decidedly different. For example, most of my technical classes don’t talk about proper logging and debugging practices. The logging system that you select can be the difference between knowing why your project is failing and being left in the dark, which is part of why the issue with case sensitivity persisted for so long. The movie recommender application was a great way to synthesize my personal interest in data-related topics, what I have learned, and new knowledge from running data through pipelines. I am definitely glad about working here and would do so again; I would wholeheartedly recommend interning at Hazelcast to anyone.

Author’s note: This blog post updates the original post written in 2019.

Data is valuable. Or, I should write, some data is valuable. You may think that if the data is important to you, you must store it in a persistent volume, like a database or filesystem. That is obviously true. However, there are many use cases where you don’t want to sacrifice the benefits of in-memory data stores. After all, no persistent database provides such fast data access or allows us to combine data entries with such high flexibility. Then, how do you keep your in-memory data safe? That is what I’m going to present in this blog post.

High Availability

Hazelcast is distributed and highly available by nature. It’s achieved by keeping the data partition backup always on another Hazelcast member. For example, let’s look at the diagram below.

Imagine you put some data into the Hazelcast cluster, for example, a key-value entry (“foo”, “bar”). It is placed into data partition 1, and this partition is situated in member 1. Now, Hazelcast guarantees that the backup of any partition is kept in a different member. So, in our example, the backup of partition 1 could be placed in member 2 or 3 (but never in member 1). Backups are also propagated synchronously, so strict consistency is preserved.

Imagine that member 1 crashes. What happens next is that the Hazelcast cluster detects it, promotes the backup data, and creates a new backup. This way, you can always be sure that if any of the Hazelcast members crashes, you’ll never lose any data. That is what I call “high availability by nature.”
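The crash-and-promote sequence described above can be simulated in a few lines of Python. This is only a toy model: the round-robin placement, the `assign` and `handle_crash` helpers and the member names are assumptions for illustration, and Hazelcast manages its real partition table internally with its own algorithm.

```python
def assign(partition_count, members):
    """Place each partition's primary and its single backup on different members."""
    table = {}
    for p in range(partition_count):
        primary = members[p % len(members)]
        backup = members[(p + 1) % len(members)]  # always the next member over
        table[p] = {"primary": primary, "backup": backup}
    return table


def handle_crash(table, crashed, survivors):
    """Promote backups owned by survivors and re-create any missing backups."""
    for owners in table.values():
        if owners["primary"] == crashed:
            # The backup copy becomes the new primary: no data is lost.
            owners["primary"] = owners["backup"]
            owners["backup"] = None
        elif owners["backup"] == crashed:
            owners["backup"] = None
        if owners["backup"] is None:
            # Create a fresh backup on a survivor other than the primary.
            owners["backup"] = next(m for m in survivors if m != owners["primary"])
    return table


members = ["member-1", "member-2", "member-3"]
table = assign(6, members)
table = handle_crash(table, "member-1", ["member-2", "member-3"])
for partition, owners in table.items():
    print(partition, owners)
```

After the simulated crash, every partition still has a primary and a backup, and they never share a member, which is the property that makes a single member failure survivable.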

We can increase the backup-count property and propagate the backup data synchronously to multiple members simultaneously. However, performance would suffer. In the corner case scenario, we could have backup-count equal to the number of members, and even if all members except for one crash, the data is not lost. Such an approach, however, would not only be slow (because we have to propagate all data to all members synchronously) but also use a lot of memory. That is why it’s not very common to increase the backup-count. For the simplicity of this post, let’s say that we’ll always keep its value as 1.
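The memory cost of extra backups is simple arithmetic, shown here as a short Python sketch. The 100 GB figure is an assumption for the example, and the calculation ignores per-entry overhead, so treat the output as a lower bound rather than a sizing guide.

```python
def cluster_memory_gb(data_gb, backup_count):
    """Memory needed across the cluster for the primary copy plus every backup."""
    if backup_count < 0:
        raise ValueError("backup_count must be non-negative")
    return data_gb * (1 + backup_count)


# Hypothetical 100 GB data set with zero, one and two backups.
for count in (0, 1, 2):
    print(f"backup-count={count}: {cluster_memory_gb(100, count)} GB")
```

Each additional backup adds one more full copy of the data, so going from one backup to two raises the footprint from twice to three times the data size.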

High Availability on Kubernetes

Let’s move the terminology from the previous section to Kubernetes. We’re confident that if one Hazelcast pod fails, we don’t experience any data loss. So far, so good. It sounds like we are highly available, right? Well… yes and no. Let’s look at the diagram below.

Kubernetes may schedule two of your Hazelcast member pods to the same node, as presented in the diagram. Now, if node 1 crashes, we experience data loss. That’s because both the data partition and the data partition backup are effectively stored on the same machine. How could we solve this problem?

Luckily, Kubernetes is quite flexible, so we may ask it to schedule each pod on a different node. Starting from Kubernetes 1.16, you can achieve it by defining Pod Topology Spread Constraints.

Let’s assume you want to run a 6-member Hazelcast cluster on a 3-node Kubernetes cluster.

$ kubectl get nodes
NAME                                    STATUS  ROLES    AGE   VERSION
my-cluster-default-pool-17b544bc-4467   Ready   <none>   31m   v1.24.8-gke.2000
my-cluster-default-pool-17b544bc-cb58   Ready   <none>   31m   v1.24.8-gke.2000
my-cluster-default-pool-17b544bc-38js   Ready   <none>   31m   v1.24.8-gke.2000

Now you can start the installation of the cluster using Helm.

$ helm install hz-hazelcast hazelcast/hazelcast -f - <<EOF
hazelcast:
  yaml:
    hazelcast:
      partition-group:
        enabled: true
        group-type: NODE_AWARE
cluster:
  memberCount: 6
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      "app.kubernetes.io/instance": hz-hazelcast
EOF

Let’s comment on the parameters we just used. They contain two interesting parts:
– enabling NODE_AWARE for the Hazelcast partition-group, which keeps each partition’s backup on a different node than its primary
– setting topologySpreadConstraints, which spreads all the Hazelcast pods among the nodes

Now you can see that the six pods are equally spread among the nodes.

$ kubectl get po -owide
NAME             READY STATUS    RESTARTS AGE   IP           NODE                                    NOMINATED NODE READINESS GATES
hz-hazelcast-0   1/1   Running   0        15m   my-cluster-default-pool-17b544bc-4467   <none>         <none>
hz-hazelcast-1   1/1   Running   0        15m   my-cluster-default-pool-17b544bc-cb58   <none>         <none>
hz-hazelcast-2   1/1   Running   0        14m   my-cluster-default-pool-17b544bc-38js   <none>         <none>
hz-hazelcast-3   1/1   Running   0        13m   my-cluster-default-pool-17b544bc-4467   <none>         <none>
hz-hazelcast-4   1/1   Running   0        12m   my-cluster-default-pool-17b544bc-cb58   <none>         <none>
hz-hazelcast-5   1/1   Running   0        12m   my-cluster-default-pool-17b544bc-38js   <none>         <none>
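If you would rather not eyeball the table, a small helper can tally the pods per node. This is only a sketch: count_by_node is a hypothetical function name, and it assumes it receives one node name per line on standard input.

```shell
# Hypothetical helper: tally how many pods landed on each node.
# Reads one node name per line from stdin and prints "<node> <count>",
# sorted by node name.
count_by_node() {
  awk 'NF { count[$1]++ } END { for (n in count) print n, count[n] }' | sort
}

# Assumed usage against a live cluster (not executed here):
#   kubectl get pods -l app.kubernetes.io/instance=hz-hazelcast \
#     -o custom-columns=NODE:.spec.nodeName --no-headers | count_by_node
```

For the six pods listed above, each of the three nodes should show a count of 2.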

Note that with such a configuration, your Hazelcast member count must be a multiple of the node count; otherwise, it won’t be possible to distribute the backups equally between your cluster members.
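The rule above can be turned into a quick pre-flight check before installing. The sketch below is illustrative only; evenly_spread is a hypothetical helper, and the two numbers are whatever member count and node count you plan to use.

```shell
# Hypothetical pre-flight check (not part of Helm or kubectl).
# $1 = planned Hazelcast member count, $2 = Kubernetes node count.
# Succeeds (exit status 0) only if the members divide evenly across the nodes.
evenly_spread() {
  [ "$2" -gt 0 ] && [ $(( $1 % $2 )) -eq 0 ]
}
```

With the values used in this post, `evenly_spread 6 3` succeeds, whereas a 7-member cluster on the same 3 nodes would fail the check.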

All in all, with some additional effort, we achieved high availability on Kubernetes. So, let’s see what happens next.

Multi-zone High Availability on Kubernetes

We’re sure that if any of the Kubernetes nodes fail, we don’t lose any data. However, what happens if the whole availability zone fails? First, let’s look at the diagram below.

A Kubernetes cluster can be deployed in one or many availability zones. Usually, for production environments, we should avoid having just a single availability zone because any zone failure would result in downtime of our system. If you use the Google Cloud Platform, you can start a multi-zone Kubernetes cluster with one click (or one command). On AWS, you can easily install it with kops, and Azure offers a multi-zone Kubernetes service as part of AKS (Azure Kubernetes Service). Now, when you look at the diagram above, what happens if availability zone 1 is down? We experience data loss because both the data partition and the data partition backup are effectively stored inside the same zone.

Luckily, Hazelcast offers the ZONE_AWARE functionality, which forces Hazelcast members to store the given data partition backup inside a member located in a different availability zone. Having the ZONE_AWARE feature enabled, we end up with the following diagram.

Let me stress it again. Hazelcast guarantees that the data partition backup is stored in a different availability zone. So, even if the whole Kubernetes availability zone is down (and all related Hazelcast members are terminated), we won’t experience any data loss. That is what should be called the real high availability on Kubernetes! And you should always go ahead and configure Hazelcast in that manner. How to do it? Let’s now look into the configuration details.

Hazelcast ZONE_AWARE Kubernetes Configuration

One of the requirements for the Hazelcast ZONE_AWARE feature is to have an equal number of members in each availability zone. You can achieve this, too, by defining Pod Topology Spread Constraints.

Let’s assume a cluster of 6 nodes with the availability zones named us-central1-a and us-central1-b.

$ kubectl get nodes --show-labels
NAME                                    STATUS  ROLES    AGE   VERSION            LABELS
my-cluster-default-pool-28c6d4c5-3559   Ready   <none>   31m   v1.24.8-gke.2000   <...>,topology.kubernetes.io/zone=us-central1-a
my-cluster-default-pool-28c6d4c5-ks35   Ready   <none>   31m   v1.24.8-gke.2000   <...>,topology.kubernetes.io/zone=us-central1-a
my-cluster-default-pool-28c6d4c5-ljsr   Ready   <none>   31m   v1.24.8-gke.2000   <...>,topology.kubernetes.io/zone=us-central1-a
my-cluster-default-pool-654dbc0c-9k3r   Ready   <none>   31m   v1.24.8-gke.2000   <...>,topology.kubernetes.io/zone=us-central1-b
my-cluster-default-pool-654dbc0c-g809   Ready   <none>   31m   v1.24.8-gke.2000   <...>,topology.kubernetes.io/zone=us-central1-b
my-cluster-default-pool-654dbc0c-s9s2   Ready   <none>   31m   v1.24.8-gke.2000   <...>,topology.kubernetes.io/zone=us-central1-b

Now you can start the installation of the cluster using Helm.

$ helm install hz-hazelcast hazelcast/hazelcast -f - <<EOF
hazelcast:
  yaml:
    hazelcast:
      partition-group:
        enabled: true
        group-type: ZONE_AWARE
cluster:
  memberCount: 6
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      "app.kubernetes.io/instance": hz-hazelcast
EOF

The configuration is similar to the one seen before, with two small differences:
– enabling ZONE_AWARE for the Hazelcast partition-group
– setting topologySpreadConstraints to spread all the Hazelcast pods among the availability zones.

Now you can see that the six pods are equally spread among the nodes in two availability zones, and they all form a cluster.

$ kubectl get po -owide
NAME             READY STATUS    RESTARTS AGE   IP           NODE                                    NOMINATED NODE READINESS GATES
hz-hazelcast-0   1/1   Running   0        20m   my-cluster-default-pool-28c6d4c5-3559   <none>         <none>
hz-hazelcast-1   1/1   Running   0        19m   my-cluster-default-pool-654dbc0c-9k3r   <none>         <none>
hz-hazelcast-2   1/1   Running   0        19m   my-cluster-default-pool-28c6d4c5-ks35   <none>         <none>
hz-hazelcast-3   1/1   Running   0        18m   my-cluster-default-pool-654dbc0c-g809   <none>         <none>
hz-hazelcast-4   1/1   Running   0        17m   my-cluster-default-pool-28c6d4c5-ljsr   <none>         <none>
hz-hazelcast-5   1/1   Running   0        16m   my-cluster-default-pool-654dbc0c-s9s2   <none>         <none>
$ kubectl logs hz-hazelcast-0
Members {size:6, ver:6} [
 Member []:5701 - a78c6b6b-122d-4cd6-8026-a0ff0ee97d0b this
 Member []:5701 - 560548cf-eea5-4f07-82aa-1df2d63a4a47
 Member []:5701 - fa5f89a4-ee84-4b4e-993a-3b0d88284826
 Member []:5701 - 3ecb97bd-b1ea-4f46-b7f0-d649577c1a92
 Member []:5701 - d2620d61-bba6-4865-b6a6-9b7a417d7c49
 Member []:5701 - 1cbef695-6b5d-466b-93c4-5ec36c69ec9b
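To confirm the per-zone balance without reading node names by hand, you can join each pod’s node with that node’s zone label. The helper below is a minimal sketch under assumptions: count_by_zone is a hypothetical function name, its first argument is a file with "node zone" pairs, and it reads one node name per pod on standard input.

```shell
# Hypothetical helper: tally pods per availability zone.
# $1    = file with "<node> <zone>" on each line
# stdin = one node name per pod
# Pods on nodes missing from the mapping file are ignored.
count_by_zone() {
  awk 'NR == FNR              { zone[$1] = $2; next }
       NF && ($1 in zone)     { count[zone[$1]]++ }
       END                    { for (z in count) print z, count[z] }' "$1" - | sort
}

# Assumed usage against a live cluster (not executed here):
#   kubectl get nodes -L topology.kubernetes.io/zone --no-headers \
#     | awk '{ print $1, $NF }' > zones.txt
#   kubectl get pods -l app.kubernetes.io/instance=hz-hazelcast \
#     -o custom-columns=NODE:.spec.nodeName --no-headers | count_by_zone zones.txt
```

For the placement shown above, both us-central1-a and us-central1-b should report 3 pods.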

What we just deployed is a Hazelcast cluster resilient to Kubernetes zone failures. Just to add, if you want to deploy your cluster across more zones with the same Helm installation, please don’t hesitate to let me know. Note, however, that more zones do not protect you against 2 zones failing simultaneously. With a backup-count of 1, Hazelcast only guarantees that each data partition backup is stored on a member located in a different availability zone from the partition itself.

Hazelcast Platform Operator

With the Hazelcast Platform Operator, you can achieve the same effect much more easily. All you need is to apply the Hazelcast CR with the highAvailabilityMode parameter set to NODE to achieve resilience against node failures.

$ kubectl apply -f - <<EOF
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hz-hazelcast
spec:
  clusterSize: 6
  highAvailabilityMode: NODE
EOF

Or, if you have a multi-zone cluster and want it to be resilient against zone failures, you can set highAvailabilityMode to ZONE.

$ kubectl apply -f - <<EOF
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hz-hazelcast
spec:
  clusterSize: 6
  highAvailabilityMode: ZONE
EOF

The Operator then configures both the partition-group and the topologySpreadConstraints to guarantee the needed level of high availability.

Hazelcast Cloud

Last but not least, Hazelcast multi-zone deployments will soon be available in the managed version of Hazelcast. You can check it at cloud.hazelcast.com. By ticking multiple zones in the web console, you can enable the multi-zone high availability level for the Hazelcast deployment. It’s no great secret that while implementing Hazelcast Cloud internally, we used the same strategy described above.


In the cloud era, multi-zone high availability usually becomes a must. Zone failures happen, and we’re no longer safe just by having our services on different machines. That is why any production-ready deployment of Kubernetes should be regional in scope and not only zonal. The same applies to Hazelcast. Enabling ZONE_AWARE is highly recommended, especially because Hazelcast is often used as a stateful backbone for stateless services. If your Kubernetes cluster is deployed only in one availability zone, please at least make sure Hazelcast partition backups are always effectively placed on a different Kubernetes node. Otherwise, your system is not highly available, and any machine failure may result in data loss.

Everyone is jumping on the real-time bandwagon, especially by deploying streaming data technologies like Apache Kafka. But many of us are not taking advantage of real-time data in the fullest ways available. It’s worth assessing what you currently have in place for your real-time initiatives to explore what else can be done to create more value from your data.

Many of us use Kafka (or a similar message bus) to capture data on business events that ultimately can be stored in an analytics database. Information updates on business artifacts such as sales orders, product inventory levels, website clickstreams, etc., all can be transferred to Kafka and then transformed into a format optimized for analytics. This lets analysts quickly and easily find actionable insights from the data that they can pursue. Example actions include creating new promotions, replenishing stock, and adding webpage calls-to-action.

The typical real-time data journey is depicted in the diagram below, which represents a pattern that has long been known as the Kappa Architecture. While that is not a term that’s used as much these days as when it was first defined, it is still an important concept for designing systems that leverage real-time data (i.e., data that was just created).

[Diagram: The Kappa Architecture for Real-Time Analytics]

The Real-Time Analytics Data Flow – A Quick Review

There are 3 data stages between the various components in this pattern, labeled by the circled numerals, that contribute to the “real-timeness” of the data usage. Arrow #1 represents the transmission of data from the source to the message bus (e.g., Apache Kafka). In many cases, the data in this leg of the journey is not significantly modified, so the data loading process typically occurs very quickly. This stage usually involves a simple form of serialization, in which the data is formatted as comma-separated values (CSV) or JSON prior to insertion into the message bus, to simplify or standardize downstream processing. Sometimes the data is filtered or aggregated prior to loading it into the message bus to reduce storage requirements. But in general, keeping all data points intact is a common practice that lets subsequent data processing steps work on the raw data at a granular level.

Arrow #2 represents the most significant step in this pattern, where data must be processed to turn it into a queryable format to store in a database (or any analytical store) and thus make it more usable by human analysts. In this stage, data can be filtered, transformed, enriched, and aggregated, and then inserted into an analytics platform like a database or data warehouse. We know this step to be the extract/transform/load (ETL) step, and many “streaming ETL” technologies designed for data-in-motion can be used to perform this step. Traditional batch-oriented tools can also be used for this step, with the caveat that delays are introduced due to the time intervals of periodic ETL processing, versus continuous ETL processing. Once the queryable data is in the analytics platform, additional processing could be done, including creating indexes and materialized views to speed up queries.

One great thing about step #2 is that you aren’t limited to a single flow. In other words, you can run as many processing jobs as you want on the raw data. This is why the use of Kafka as intermediary storage is important here. Each of your processing jobs can independently read data from Kafka and format the data in its own way. This allows not only the ability to support many distinct types of analytical patterns by the analysts, but also the opportunity to easily fix any data errors caused by bugs in the processing code (also referred to as “human fault tolerance”).

Arrow #3 represents the last mile that entails quickly delivering query results to analysts. The delivery speed is mostly a function of the analytics database, so many technologies in the market today tout their speed of delivering query results.

The faster each of these steps runs, the sooner you get data to the analysts, and thus the sooner you can take action. Certainly, the speed of the 3 steps combined is a limiting factor in achieving real-time action, but the main bottleneck is the speed of the human analysts. Fortunately, in most real-time analytics use cases, this is not a major problem, as the relatively slow responsiveness of human analysts is well within any service-level agreement (SLA) that the business has defined.

Adding Real Real-Time Action

At this point, you likely know where we’re going in this blog—that there is a complementary pattern that can add new uses for your real-time data that cannot wait for human responses. This pattern involves the use of stream processing engines to automate a real-time response to your real-time data stored in Kafka.

[Diagram: Real-Time Stream Processing Complements Your Existing Infrastructure]

In addition to arrows #2 and #3, you have arrows #2A and #3A, in which real-time applications take care of the things that should be completed quickly and automatically without waiting for human intervention. Consider real-time offers while your customers are interacting with you (via shopping, browsing, banking), versus delivering an offer later in the day after they’ve left. Or how about sending your customer an immediate SMS about a potentially fraudulent transaction on their credit card? If the customer did legitimately initiate the transaction, you won’t inadvertently deny the card swipe and push them to a different credit card, which would cost you the transaction fee. Or what about any process that involves machine learning (e.g., predictive maintenance, real-time route optimization, logistics tracking, etc.) that needs to deliver a prediction immediately so that quick corrective action can avoid a bigger disaster?

You Already Have the Data, Now Make More Use of It

If you already have a real-time data analytics system that leverages real-time data in Kafka, think about all the other real-time use cases that can help give your business additional advantage. This is where Hazelcast can help. You don’t have to throw out what you’ve already built. You simply tap into the data streams you already have, and automate the actions to create real-time responses.

If you’d like to explore what other real-time opportunities you might have that you are not currently handling, please contact us and we’d be happy to discuss these with you.