Overview: Multi-Member Routing in Hazelcast Platform

Introducing MULTI_MEMBER Routing

One of the new features in Hazelcast Platform 5.5 for enterprise customers is “MULTI_MEMBER” routing for clients. Here, we’ll take a look at what it is, where it can be useful, and how it may reduce cloud costs.

Background – routing options

In all of the following cases, the cluster members can connect to each other. The connectivity under discussion is from clients to cluster members, which may have different needs depending on where the clients are located.

ALL_MEMBERS

This is the default routing option, originally called “smart routing.” In this option, the client maintains a connection to each cluster member and selects the best connection for every request. For operations involving a primary key, the client knows which cluster member is responsible for storing that data record. The best connection for such a request is always direct to that cluster member.

So, if the client calls “IMap.get(123)” twice and key 123 is stored on member 5, both requests are sent to member 5. For operations that don’t involve a primary key, the client takes a round-robin approach: the best connection for such a request is to the next server in sequence, to even out the load placed on the cluster. Here, if the client runs “SELECT * FROM m” twice, the first request may go to member 3 and the second to member 4.
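
For reference, this default can also be set explicitly. Here is a minimal client YAML sketch, assuming the Hazelcast Platform 5.5 “cluster-routing” syntax shown later in this article; the cluster name and address are hypothetical:

    hazelcast-client:
      cluster-name: dev                # hypothetical cluster name
      network:
        cluster-members:
          - 10.0.0.10                  # hypothetical seed address; the other members are discovered
        cluster-routing:
          mode: ALL_MEMBERS            # one connection per member; best connection chosen per request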

SINGLE_MEMBER

In this option, the client holds an open connection to only one cluster member, which must be used for every request. If the client calls “IMap.get(123)” and is connected to member 1, the request is sent to member 1. Member 1 then forwards the request to member 5, acting as a gateway. This gateway behavior is important to note: any operation sent to a member that can’t execute it directly is passed on to one that can.

This extra network hop reduces performance, and because all of the client’s requests go through member 1, the load on the cluster is not evenly spread. As a general rule, it’s not a good routing option to choose. However, there are occasions when it’s required, usually because of a networking restriction. For example, in a Kubernetes deployment, the client may be outside Kubernetes while the cluster members are inside, and the networking may allow only one ingress point; this is why this single-socket (uni-socket) mode is needed.
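
As a sketch, the client YAML for this mode under the same assumptions, with a hypothetical ingress address:

    hazelcast-client:
      network:
        cluster-members:
          - ingress.example.com:5701   # the single ingress point (hypothetical address)
        cluster-routing:
          mode: SINGLE_MEMBER          # one connection; that member acts as a gateway for the rest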

MULTI_MEMBER

The new option sits between the two previous ones. Here, the client connects to a strict subset of the cluster members: more than one but fewer than all. For example, with 6 cluster members, the client might be configured to connect to 3 of them. In this example, the keys for primary key-based requests are spread across all 6 members, but the client has a direct connection to only 3 of them. So, 50% of primary key requests can be sent directly to the correct target member. The remaining 50% cannot, and instead follow a round-robin approach: two requests for keys that are not directly reachable are sent to different reachable members.

Round-robin is also used for requests without a primary key. Requests cycle through the list of servers with open connections: the first request goes to the first reachable server, the second to the second reachable server, and so on. Note that this continues to follow the “best” connection principle. If the optimal target is reachable, that is used. If the optimal target is not reachable or unknown, load is balanced across the reachable members.

Partition Groups

Partition groups in Hazelcast Platform allow you to dictate how data and backups are kept separate to increase data safety. Imagine a Hazelcast cluster of 6 machines, each running one member, numbered 0 to 5. If a data record is placed on member 0, the backup goes on one of the others. Hazelcast Platform has no information besides IP addresses, so members 1, 2, 3, 4, and 5 are considered equally good for hosting the data backup.

However, you may know something that Hazelcast Platform cannot: members 0, 2, and 4 use one power supply, and members 1, 3, and 5 use a different power supply. If a data record is placed on member 0 and its backup is placed on member 2, and that shared power supply fails, both members go offline and access to the data is lost.

The solution is partition groups (see Partition Group Configuration). Members could be grouped into an “EVEN” group for members 0, 2, and 4 and, unsurprisingly, an “ODD” group for members 1, 3, and 5.

Now, when Hazelcast Platform places a data record onto a member in one group, the other members of that group are not used to host the backup. If a data record is placed on member 0 in the “EVEN” group, the data backup will not be placed on member 2 or 4; it will go on member 1, 3, or 5. If one power supply fails, at most one copy of each record, either the primary or its backup, goes offline.
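
As a sketch, the “EVEN”/“ODD” grouping could be expressed as a CUSTOM partition group in the member configuration. The interface addresses below are hypothetical stand-ins for members 0 to 5:

    hazelcast:
      partition-group:
        enabled: true
        group-type: CUSTOM
        member-group:
          # "EVEN" group: members 0, 2, and 4 (hypothetical addresses)
          - - 10.0.0.10
            - 10.0.0.12
            - 10.0.0.14
          # "ODD" group: members 1, 3, and 5
          - - 10.0.0.11
            - 10.0.0.13
            - 10.0.0.15

For cloud deployments, the ZONE_AWARE group type can build the groups from availability-zone metadata instead of hand-listed addresses.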

Which members does the client connect to?

The initial configuration option for MULTI_MEMBER uses partition groups.

    hazelcast-client:
      network:
        cluster-routing:
          mode: MULTI_MEMBER
          routing-strategy: PARTITION_GROUPS

The client will use the partition group of the first member to which it connects. Continuing our example, if the first member found is member 3 (in the “ODD” group), the client will open connections to members 1, 3, and 5, but not to 0, 2, or 4.
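
Putting this together with the earlier “ODD”/“EVEN” example, a client configuration sketch (address hypothetical) might look like:

    hazelcast-client:
      network:
        cluster-members:
          - 10.0.0.13   # member 3, in the "ODD" group (hypothetical address)
        cluster-routing:
          mode: MULTI_MEMBER
          routing-strategy: PARTITION_GROUPS
          # joining via member 3 means the client opens connections to members 1, 3, and 5 only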

How is MULTI_MEMBER useful?

There are at least two ways that “MULTI_MEMBER” can be useful: in cases of limited connectivity and for cost control.

Limited Connectivity

First, a client may not be able to connect to all members. The cluster members may be spread across two or more data centers for high availability. The client may be located in one of these data centers, and networking restrictions may block it from connecting to members in the other data centers. Connecting to a subset of members allows the client to load-balance across that subset, which gives the best load distribution available in that scenario.

Data Plane Cost Control

Second, it may not be desirable for a client to connect to all members, perhaps for cost reasons, as cloud providers frequently charge different rates for retrieving data from different locations. In cloud deployments, an availability zone is essentially a data center. A more robust deployment spans availability zones to insulate against loss of service should one zone go offline.

Members here would be located in different availability zones. Member-to-member communication uses networking that spans availability zones, with one cost model for data transfer between zones. Client-to-member communication may be localized to the client’s own availability zone, which has a different cost model.

For requests without a primary key, such as an SQL query, having the majority of the execution performed in the same availability zone as the client reduces data transfer between availability zones, which can reduce cloud costs.

Summary

In situations where it is not desirable or possible for a client to connect to all cluster members, there are now two options, “SINGLE_MEMBER” and “MULTI_MEMBER”.

  • “MULTI_MEMBER” gives better performance than “SINGLE_MEMBER” for primary key-based requests.
  • “MULTI_MEMBER” gives better load balancing than “SINGLE_MEMBER” for requests where a primary key isn’t applicable.

The more members the client can connect to, the better the performance and load balancing. But “MULTI_MEMBER” might be more cost-effective in the cloud than “ALL_MEMBERS”.

If you want to test the new “Multi-Member Routing” functionality, you can request an enterprise trial license for Hazelcast Platform: https://hazelcast.com/get-started/