Overview: Multi-Member Routing in Hazelcast Platform

Introducing MULTI_MEMBER Routing

One of the new features in Hazelcast Platform 5.5 for enterprise customers is “MULTI_MEMBER” routing for clients. Here, we’ll take a look at what it is, where it can be useful, and how it may reduce cloud costs.

Background – routing options

In all of the following cases, the cluster members can connect to each other. The connectivity under discussion is from clients to cluster members, which may have different needs depending on where the clients are located.

ALL_MEMBERS

This is the default routing option, originally called “smart routing.” In this option, the client maintains a connection to each cluster member and selects the best connection for every request. For operations involving a primary key, the client knows which cluster member is responsible for storing that data record. The best connection for such a request is always direct to that cluster member.

So, if the client calls “IMap.get(123)” twice and key 123 is stored on member 5, both requests are sent to member 5. For operations that don’t involve a primary key, the client takes a round-robin approach: the best connection for such a request is to the next server in sequence, to even out the load placed on the cluster. Here, if the client runs “SELECT * FROM m” twice, the first request may go to member 3 and the second to member 4.
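
For reference, this default can also be set explicitly. Here is a minimal client YAML sketch, assuming the Hazelcast Platform 5.5 “cluster-routing” syntax shown later in this article; the cluster name and address are hypothetical:

    hazelcast-client:
      cluster-name: dev                # hypothetical cluster name
      network:
        cluster-members:
          - 10.0.0.10                  # hypothetical seed address; the other members are discovered
        cluster-routing:
          mode: ALL_MEMBERS            # one connection per member; best connection chosen per request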

SINGLE_MEMBER

In this option, the client holds an open connection to only one cluster member, which must be used for every request. If the client calls “IMap.get(123)” and is connected to member 1, the request is sent to member 1. Member 1 then forwards the request to member 5, acting as a gateway. This gateway behavior is important to note: any operation sent to a member that can’t execute it directly is passed on to one that can.

This extra network hop reduces performance, and because all of the client’s requests go through member 1, the load on the cluster is not evenly spread. As a general rule, it’s not a good routing option to choose. However, there are occasions when it’s required, usually because of a networking restriction. For example, in a Kubernetes deployment, the client may be outside Kubernetes while the cluster members are inside, and the networking may allow only one ingress point; this is why this single-socket (uni-socket) mode is needed.
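
As a sketch, the client YAML for this mode under the same assumptions, with a hypothetical ingress address:

    hazelcast-client:
      network:
        cluster-members:
          - ingress.example.com:5701   # the single ingress point (hypothetical address)
        cluster-routing:
          mode: SINGLE_MEMBER          # one connection; that member acts as a gateway for the rest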

MULTI_MEMBER

The new option sits between the two previous ones. Here, the client connects to a strict subset of the cluster members: more than one but fewer than all. For example, with 6 cluster members, the client might be configured to connect to 3 of them. In this example, the keys for primary key-based requests are spread across all 6 members, but the client has a direct connection to only 3 of them. So, 50% of primary key requests can be sent directly to the correct target member. The remaining 50% cannot, and instead follow a round-robin approach: two requests for keys that are not directly reachable are sent to different reachable members.

Round-robin is also used for requests without a primary key. Requests cycle through the list of servers with open connections: the first request goes to the first reachable server, the second to the second reachable server, and so on. Note that this continues to follow the “best” connection principle. If the optimal target is reachable, that is used. If the optimal target is not reachable or unknown, load is balanced across the reachable members.

Partition Groups

Partition groups in Hazelcast Platform allow you to dictate how data and backups are kept separate to increase data safety. Imagine a Hazelcast cluster of 6 machines, each running one member, numbered 0 to 5. If a data record is placed on member 0, the backup goes on one of the others. Hazelcast Platform has no information besides IP addresses, so members 1, 2, 3, 4, and 5 are considered equally good for hosting the data backup.

However, you may know something that Hazelcast Platform cannot: members 0, 2, and 4 use one power supply, and members 1, 3, and 5 use a different power supply. If a data record is placed on member 0 and its backup is placed on member 2, and that shared power supply fails, both members go offline and access to the data is lost.

The solution is partition groups (see Partition Group Configuration). Members could be grouped into an “EVEN” group for members 0, 2, and 4 and, unsurprisingly, an “ODD” group for members 1, 3, and 5.

Now, when Hazelcast Platform places a data record onto a member in one group, the other members of that group are not used to host the backup. If a data record is placed on member 0 in the “EVEN” group, the data backup will not be placed on member 2 or 4; it will go on member 1, 3, or 5. If one power supply fails, at most one copy of each record, either the primary or its backup, goes offline.
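
As a sketch, the “EVEN”/“ODD” grouping could be expressed as a CUSTOM partition group in the member configuration. The interface addresses below are hypothetical stand-ins for members 0 to 5:

    hazelcast:
      partition-group:
        enabled: true
        group-type: CUSTOM
        member-group:
          # "EVEN" group: members 0, 2, and 4 (hypothetical addresses)
          - - 10.0.0.10
            - 10.0.0.12
            - 10.0.0.14
          # "ODD" group: members 1, 3, and 5
          - - 10.0.0.11
            - 10.0.0.13
            - 10.0.0.15

For cloud deployments, the ZONE_AWARE group type can build the groups from availability-zone metadata instead of hand-listed addresses.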

Which members does the client connect to?

The initial configuration option for MULTI_MEMBER uses partition groups.

    hazelcast-client:
      network:
        cluster-routing:
          mode: MULTI_MEMBER
          routing-strategy: PARTITION_GROUPS

The client will use the partition group of the first member to which it connects. Continuing our example, if the first member found is member 3 (in the “ODD” group), the client will open connections to members 1, 3, and 5, but not to 0, 2, or 4.
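
Putting this together with the earlier “ODD”/“EVEN” example, a client configuration sketch (address hypothetical) might look like:

    hazelcast-client:
      network:
        cluster-members:
          - 10.0.0.13   # member 3, in the "ODD" group (hypothetical address)
        cluster-routing:
          mode: MULTI_MEMBER
          routing-strategy: PARTITION_GROUPS
          # joining via member 3 means the client opens connections to members 1, 3, and 5 only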

How is MULTI_MEMBER useful?

There are at least two ways that “MULTI_MEMBER” can be useful: in cases of limited connectivity and for cost control.

Limited Connectivity

First, a client may not be able to connect to all members. The cluster members may be spread across two or more data centers for high availability. The client may be located in one of these data centers, and networking restrictions may block it from connecting to members in the other data centers. Connecting to a subset of members allows the client to load-balance across that subset, which gives the best load distribution available in that scenario.

Data Plane Cost Control

Second, it may not be desirable for a client to connect to all members, perhaps for cost reasons, as cloud providers frequently charge different rates for retrieving data from different locations. In cloud deployments, an availability zone is essentially a data center. A more robust deployment spans availability zones to insulate against loss of service should one zone go offline.

Members here would be located in different availability zones. Member-to-member communication uses networking that spans availability zones, with one cost model for data transfer between zones. Client-to-member communication may be localized to the client’s own availability zone, which has a different cost model.

For requests without a primary key, such as an SQL query, having the majority of the execution performed in the same availability zone as the client reduces data transfer between availability zones, which can reduce cloud costs.

Summary

In situations where it is not desirable or possible for a client to connect to all cluster members, there are now two options, “SINGLE_MEMBER” and “MULTI_MEMBER”.

  • “MULTI_MEMBER” gives better performance than “SINGLE_MEMBER” for primary key-based requests.
  • “MULTI_MEMBER” gives better load balancing than “SINGLE_MEMBER” for requests where a primary key isn’t applicable.

The more members the client can connect to, the better the performance and load balancing. But “MULTI_MEMBER” might be more cost-effective in the cloud than “ALL_MEMBERS”.

If you want to test the new “Multi-Member Routing” functionality, you can request an enterprise trial license for Hazelcast Platform: https://hazelcast.com/get-started/