Compact Serialization for the Hazelcast .NET client

Serialization is inevitable in every client-server architecture whenever client-side application objects must be transferred to or from the server. Hazelcast has always supported various serialization solutions, each with pros and cons. To begin with, there are the built-in solutions:

  • Java Serializable and Java Externalizable – standard Java-only interfaces;
  • .NET Serializable – a .NET-only BinaryFormatter based solution.

On the pros side, they are quite simple and aim at totally transparent serialization. On the cons side, they give no control over how fields are (de)serialized. They come with security caveats (to the point that Microsoft strongly discourages BinaryFormatter usage and has disabled it by default in recent versions of .NET). And they do not play well in polyglot scenarios: the (Java) Hazelcast cluster cannot understand .NET Serializable any more than the .NET client can understand Java Serializable.

To overcome those limitations, Hazelcast introduced more custom solutions:

  • DataSerializable – improves the built-in solutions by controlling how fields are (de)serialized (Java-only).
  • IdentifiedDataSerializable – improves DataSerializable by avoiding reflection (all languages).
  • Portable – improves these two by adding support for multiple versions of the same object type and for partial deserialization of individual fields.

These solutions are safe, and IdentifiedDataSerializable and Portable work well in polyglot scenarios. They have served us well, including in demanding applications.

However, they require the user to write explicit serialization code: a serializer, maybe a serializer factory, and some configuration code. When Hazelcast is used as a plain key/value store, this means writing a few additional client-side .NET classes, and the server treats keys and values as binary blobs. But as soon as one wants to use more advanced features such as map filtering via predicates, SQL, or LINQ, which require that the server can (de)serialize keys and values, one must also write the same additional classes in Java for the server.

Introducing Compact Serialization

For all these reasons, Hazelcast has recently introduced a new serialization solution: Compact serialization. By default, Compact is a safe, efficient, schema-based, no-code serialization solution. Like Portable, it supports multiple versions of the same object type and partial deserialization of individual fields, but it does not require any code.

In practice, this means that the following code runs and adds a Product object to the map, with the Hazelcast client managing the serialization of that object entirely by itself:

var map = await client.GetMapAsync<string, Product>("products"); // the map name is illustrative
var product = new Product { Name = "swibble", Price = 1234 };
await map.SetAsync(product.Name, product);

Under the hood, the client uses reflection to discover the public properties of the Product type, and constructs a schema describing that type, along with a serializer that will be used to handle all further instances of the type. In addition, the schema is sent to the cluster, allowing each member to (partially) deserialize the objects and run optimized queries. For instance, using LINQ, we can retrieve cheap products:

var result = map.AsAsyncQueryable().Where(entry => entry.Value.Price < 100); // the query runs when enumerated
await foreach (var (name, product) in result)
    Console.WriteLine($"Product {name} costs {product.Price}");

And… that is it. Simple things are simple. For this reason, we consider that Compact serialization is now the default, recommended solution.
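
To make the reflection step described above more concrete, here is a minimal sketch of how public properties can be discovered and listed as candidate schema fields. It is not the client's actual implementation: the Product definition, the SchemaSketch class, and the DescribeFields helper are all assumptions made for illustration, and the real schema format is richer than a list of names and types.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical Product type: the article uses it but does not show its definition.
public class Product
{
    public string Name { get; set; }
    public float Price { get; set; }
}

// Illustrative helper, not part of the Hazelcast client.
public static class SchemaSketch
{
    // Returns one "name: type" entry per public read/write instance property,
    // sorted by name so that the result does not depend on declaration order.
    public static string[] DescribeFields(Type type) =>
        type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.CanWrite)
            .OrderBy(p => p.Name, StringComparer.Ordinal)
            .Select(p => $"{p.Name}: {p.PropertyType.Name}")
            .ToArray();

    public static void Main()
    {
        // The full CLR name is what would identify the type by default.
        Console.WriteLine(typeof(Product).FullName);
        foreach (var field in DescribeFields(typeof(Product)))
            Console.WriteLine("  " + field);
    }
}
```

Running the sketch prints the type name followed by one line per discovered property. Doing this once per type and caching the result is what keeps the reflection cost off the per-object path.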

Beyond Simple Things

Relying purely on no-code has two drawbacks:

  • Relying on reflection can have an impact on highly demanding applications;
  • Names (type name, property names) are generated automatically.

As far as performance is concerned, we highly recommend benchmarking first. Although avoiding reflection is always faster, the gain may be small compared to the cost of the network round trip between the client and the member. Names can be more of a problem, though: Java and .NET do not use the same naming conventions.

For instance, .NET Compact identifies a type using its full CLR name. Here, we can assume it would be My.Example.Product. However, should a Java client connect to the same cluster and receive such an object, it would not be able to deserialize it, having no way to map it to the corresponding org.example.Product Java class. Well, complex things are possible: it is possible to write your own serializer and take control of everything. For example:

public class ProductCompactSerializer : CompactSerializerBase<Product>
{
    public override string TypeName => "product";

    public override Product Read(ICompactReader reader)
    {
        return new Product
        {
            Name = reader.ReadString("name"),
            Price = reader.ReadFloat32("price")
        };
    }

    public override void Write(ICompactWriter writer, Product product)
    {
        writer.WriteString("name", product.Name);
        writer.WriteFloat32("price", product.Price);
    }
}

The serializer is registered via the serialization options:

options.Serialization.Compact.AddSerializer(new ProductCompactSerializer());

And now the type is formally identified as “product”. If a similar Java class is created, a Java client connecting to the same cluster would be able to (de)serialize the object, thus providing a fully polyglot experience.
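
For context, the registration line above needs an options object to work on. The following is a sketch of one way to obtain it, assuming a reachable cluster with default settings, the ProductCompactSerializer class shown earlier, and a Product type with Name and Price properties. The map name "products" is illustrative.

```csharp
using Hazelcast;

// Build the client options and register the custom serializer.
var options = new HazelcastOptionsBuilder()
    .With(o => o.Serialization.Compact.AddSerializer(new ProductCompactSerializer()))
    .Build();

// Assumption: a cluster is running and reachable with the default configuration.
await using var client = await HazelcastClientFactory.StartNewClientAsync(options);
await using var map = await client.GetMapAsync<string, Product>("products");

// This object is now written through ProductCompactSerializer, under the type name "product".
await map.SetAsync("swibble", new Product { Name = "swibble", Price = 1234 });
```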

When such a serializer is registered, every object coming from the server and identified as “product” will be routed to the serializer. This is where support for multiple versions can be implemented. Of course, one can implement a version field, but the ICompactReader provides built-in schema inspection capabilities, allowing for instance:

public override Product Read(ICompactReader reader)
{
    var product = new Product
    {
        Name = reader.ReadString("name"),
        Price = reader.ReadFloat32("price")
    };

    // for a while category has been an integer
    if (reader.GetFieldKind("category") == FieldKind.Int32)
        product.Category = reader.ReadInt32("category").ToString();
    // but really, now, it is and should be a string
    else if (reader.GetFieldKind("category") == FieldKind.String)
        product.Category = reader.ReadString("category");
    // of course for old data, category may be entirely missing
    else
        product.Category = "Unknown";
    return product;
}
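
The matching Write method is not shown here. A plausible sketch, assuming Product has a string Category property as the Read method above implies, always writes the current representation so that only the reader needs to know about older versions:

```csharp
public override void Write(ICompactWriter writer, Product product)
{
    writer.WriteString("name", product.Name);
    writer.WriteFloat32("price", product.Price);

    // New data always stores category as a string; older shapes are handled in Read only.
    writer.WriteString("category", product.Category);
}
```

With this split, objects written by an older client remain readable, and every object written by the current client uses the current shape.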

And… that is about all.

Compact serialization is not a highly complex topic; it just works and simplifies things.