Comparing Hazelcast 3 Serialization methods with Kryo Serialization and Jackson Smile

Important note: This blog post have been updated to reflect recent changes in Hazelcast API.

Hazelcast relies heavily on serialization. It is designed to be a distributed cache and honestly it is not very suitable for a local cache.

Sometimes people compare it with other caching solutions and notice that Hazelcast is very slow. This is because a local cache doesn’t need to serialize its entries, but Hazelcast does. And serialization is very costly.

Hazelcast 3 comes with a big update on serialization. Not only it provides 2 new way of serialization, now you can plug your own serialization and it works like a charm. Here I tried Kryo for custom serialization and did a quick comparison to shade some light on different choices of serialization and their performance.

My Test Environment

The purpose of this test is to compare different serialization methods so I tested everything on my laptop in a single node. I like Lambda expressions that come with Java 8, so I am using the following Java 8 JDK for Mac OS X.

I did a map.put and map.get, each with 1M times with 1M different Customer objects. Notice that PUT serializes the object and GET deserializes it.

My Customer class has the following fields:

public enum Sex {
        MALE, FEMALE
    }

    String name;
    Date birthday;
    Sex gender;
    String emailAddress;
    long[] longArray; // length is 100

I create a Customer object using this method:

import java.util.Date;
 import java.util.Random;

 public class CustomerFactory {

     private static Random random = 
             new Random(new Date().getTime());

     private static long[] longArray;

     static {
         longArray = new long[100];
         for (int j = 0; j < longArray.length; j++) {
             longArray[j] = random.nextLong();
         }
     }

     public static Customer newCustomer(int i) {
         Customer customer = new Customer();
         customer.setCustomerId(i);
         customer.setName("Name" + i);
         Customer.Sex sex = i % 2 == 0 ? Customer.Sex.MALE :
                 Customer.Sex.FEMALE;
         customer.setGender(sex);
         customer.setBirthday(new Date(System.currentTimeMillis() -
                 random.nextInt(100 * 365 * 24 * 60 * 60 * 1000)));
         customer.setEmailAddress("email." + customer.getName() +
                 "@gmail.com");
         customer.setLongArray(longArray);
         return customer;
     }
 }

Populating Map 1M times is easy. Here count was set as 1,000,000:

Config config = new Config();
 config.setProperty(GroupProperties.PROP_WAIT_SECONDS_BEFORE_JOIN, "1");
 HazelcastInstance h = Hazelcast.newHazelcastInstance(config);
 IMap customerMap = h.getMap("customers");
for (int i = 0; i < count; i++) {
     Customer customer = CustomerFactory.newCustomer(i);
     customerMap.put(i, customer);
 }

and similarly to read 1M times I use the following code:

for (int i = 0; i < count; i++) {
     customerMap.get(i);
 }

I also measured the size of the binary output. I used the Serialization Service within Hazelcast. After the benchmark the following code gave me the size of the serialized Customer object.

SerializationService ss = new SerializationServiceBuilder()
             .setConfig(config.getSerializationConfig()).build();
  System.out.println("Binary size of value is " + 
             ss.toData(customerMap.get(1)).bufferSize() + " bytes");

Default Java Serialization

By default you can always use Java Serialization. Simply make your class implement java.io.Serializable and you are good to go. With Java serialization I get the following results:

1000000 PUT took 23780 ms
1000000 GET took 40890 ms
Binary size of value is 1230 bytes

Hazelcast DataSerializable

DataSerializable is a faster alternative to Java serialization. Unfortunately it is not as simple as marking a class. You need to implement the actual serialization. DataSerializable looks alike Externalizable.

Here is how I implement DataSerializable:

public void writeData(ObjectDataOutput out) 
                            throws IOException {
         out.writeInt(customerId);
         out.writeUTF(name);
         out.writeLong(birthday.getTime());
         out.writeUTF(gender.toString());
         out.writeUTF(emailAddress);
         out.writeLongArray(longArray);
     }

     public void readData(ObjectDataInput in) 
                         throws IOException {
         customerId = in.readInt();
         name = in.readUTF();
         birthday = new Date(in.readLong());
         gender = Sex.valueOf(in.readUTF());
         emailAddress = in.readUTF();
         longArray = in.readLongArray();
     }

Unlike to Java serialization, DataSerializable doesn’t have any extra overhead and is much faster and has a smaller footprint.

1000000 PUT took 17913 ms
1000000 GET took 20813 ms
Binary size of value is 918 bytes

Notice deserialization(GET) is slower than serialization. This is because DataSerializable writes and reads class name at the beginning and uses reflection to create an instance of the class. To avoid reflection move to the next chapter.

Hazelcast IdentifiedDataSerializable

IdentifiedDataSerializable is the extended version of DataSerializable to avoid reflection that is introduced with Hazelcast 3. To avoid reflection, Hazelcast requires you to register a factory. That’s why a class implementing IdentifiedDataSerializable needs to implement the getId() and getFactoryID() methods. This id is passed to factory to identify the class.

public int getFactoryId() {
        return 1;
    }

    public int getId() {
        return 1;
    }

In this case I am just returning 1. The last step is to implement and register the factory to Config. Lambda expressions come very handy here.

config.getSerializationConfig().addDataSerializableFactory
        (1, (int id) -> (id == 1) ? new Customer() : null);

And the result is:

1000000 PUT took 18164 ms
1000000 GET took 15937 ms
Binary size of value is 878 bytes

Notice that serialization is almost same, but deserialization is much faster with IdentifiedDataSerializable because it doesn’t use reflection to create the Customer object.

Hazelcast Portable

Portable is an advanced serialization that supports the following features:

  • Support multiversion for the same object type.
  • Fetching individual fields without having to rely on reflection
  • Querying and indexing support without de-serialization and/or reflection

In order to support these features, a serialized Portable object contains meta information like the version, field names and location of the each field in the binary data. This way Hazelcast is able to navigate in the binary data and deserialize only the required field without actually deserializing the whole object. This improves the query performance a lot. By query I mean Hazelcast Predicate API, where you can write a Predicate on the value and Hazelcast will distribute it to the cluster and return all the matched entries. Similar to IdentifiedDataSerializable Portable also needs a factory to instantiate the objets.

Now my class implementing Portable looks like the following:

public int getFactoryId() {
         return 1;
     }

     public int getClassId() {
         return 1;
     }

     public void writePortable(PortableWriter writer) 
                                 throws IOException {
         writer.writeInt("id", customerId);
         writer.writeUTF("name", name);
         writer.writeLong("date", birthday.getTime());
         writer.writeUTF("gender", gender.toString());
         writer.writeUTF("e-mail", emailAddress);
         writer.writeLongArray("longArray", longArray);
     }

     public void readPortable(PortableReader reader) 
                                throws IOException {
         customerId = reader.readInt("id");
         name = reader.readUTF("name");
         birthday = new Date(reader.readLong("date"));
         gender = Sex.valueOf(reader.readUTF("gender"));
         emailAddress = reader.readUTF("e-mail");
         longArray = reader.readLongArray("longArray");
     }

factoryId is the id of the factory that we will implement in a minute. classId is the id of this Customer class. It will be used within factory method to identify the Customer class. Finally here is our factory, added to the Config.

config.getSerializationConfig().addPortableFactory
       (1, (int id) -> (id == 1) ? new Customer() : null);

And the results are:

1000000 PUT took 19064 ms
1000000 GET took 16695 ms
Binary size of value is 901 bytes

When the object is serialized, Portable zips and sends the Meta information along with entry. Because of this extra work it is slightly slower than IdentifiedDataSerializable

Custom Serialization using Kryo

Finally Hazelcast 3 lets you to implement and register your own serialization. As part of my comparison I tried Kryo. The beauty of Kryo is that, you don’t need to make your domain classes implement anything. We just need to implement a StreamSerializer or StreamSerializer. As you can guess first one serializes into and deserializes from a stream, whereas the later one serializes into byte[] and deserializes from byte[]. I implemented a Kryo based StreamSerializer for Customer object and register it to Hazelcast config. Note that Kryo is not thread-safe. That is why each thread needs to have its own Kryo. Here is the implementation details.

import com.esotericsoftware.kryo.Kryo;
 import com.esotericsoftware.kryo.io.Input;
 import com.esotericsoftware.kryo.io.Output;
 import com.hazelcast.nio.ObjectDataInput;
 import com.hazelcast.nio.ObjectDataOutput;
 import com.hazelcast.nio.serialization.StreamSerializer;

 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;

 public class CustomerKryoSerializer 
                    implements StreamSerializer<Customer> {

     private static final ThreadLocal<Kryo> kryoThreadLocal
             = new ThreadLocal() {

         @Override
         protected Kryo initialValue() {
             Kryo kryo = new Kryo();
             kryo.register(Customer.class);
             return kryo;
         }
     };

     public CustomerKryoSerializer() {
     }

     public int getTypeId() {
         return 2;
     }

     public void write(ObjectDataOutput objectDataOutput, 
                                         Customer customer)
             throws IOException {
         Kryo kryo = kryoThreadLocal.get();
         Output output = new Output((OutputStream) objectDataOutput);
         kryo.writeObject(output, customer);
         output.flush();
     }

     public Customer read(ObjectDataInput objectDataInput)
             throws IOException {
         InputStream in = (InputStream) objectDataInput;
         Input input = new Input(in);
         Kryo kryo = kryoThreadLocal.get();
         return kryo.readObject(input, Customer.class);
     }

     public void destroy() {
     }
 }

And finally here is how we plug this StreamSerializer

config.getSerializationConfig().getSerializerConfigs().add(
                new SerializerConfig().
                setTypeClass(Customer.class).
                setImplementation(new CustomerKryoSerializer()));

We basically tell Hazelcast to use CustomerKryoSerializer whenever it sees Customer.class. And the result is:

1000000 PUT took 22204 ms
1000000 GET took 19072 ms
Binary size of value is 957 bytes

The results are not bad given that we didn’t have to implement the serialization for each field manually. It can be used whenever you are not able to implement IdentifiedDataSerializable or Portable.

Custom Serialization using Jackson Smile

Tim suggested to try Jackson Smile and I did. I implemented it as a ByteArrraySerializer. Note that we used StreamSerializer for Kryo. Here is the code. I should say that it is the simplest serialization code so far:) Note that Jackson is already thread-safe.

import com.hazelcast.nio.ObjectDataInput;
 import com.hazelcast.nio.ObjectDataOutput;
 import com.hazelcast.nio.serialization.ByteArraySerializer;
 import org.codehaus.jackson.map.ObjectMapper;
 import org.codehaus.jackson.smile.SmileFactory;

 import java.io.IOException;
 import java.io.InputStream;

 public class CustomerSmileSerializer 
                     implements ByteArraySerializer<Customer> {

     ObjectMapper mapper = new ObjectMapper(new SmileFactory());

     public int getTypeId() {
         return 5;
     }

     public void write(ObjectDataOutput out, Customer object)
             throws IOException {
         byte[] data = mapper.writeValueAsBytes(object);
         System.out.println("Size is " + data.length);
         out.write(data);
     }

     public Customer read(ObjectDataInput in) throws IOException {
         return mapper.readValue((InputStream) in,
                 Customer.class);
     }

     public void destroy() {
     }

     @Override
     public byte[] write(Customer customer) throws IOException {
         return mapper.writeValueAsBytes(customer);
     }

     @Override
     public Customer read(byte[] bytes) throws IOException {
         return mapper.readValue(bytes, Customer.class);
     }
 }

Registering is similar to Kryo:

config.getSerializationConfig().getSerializerConfigs().add(
          new SerializerConfig().
          setTypeClass(Customer.class).
          setImplementation(new CustomerSmileSerializer()));

And the result is:

1000000 PUT took 22055 ms
1000000 GET took 23294 ms
Binary size of value is 1184 bytes

Kryo wins in terms of both time and size compared to Jackson Smile.

Summary

To summarize; Java serialization is the worst. With Custom Serialization you can easily implement and plug Kryo or Jackson Smile serializers. Hazelcast supports Stream based or ByteArray based serializers. They are way better than Java Serialization and doesn’t require to change your Classes. Can be easily used for third party objects. They don’t even have to be Serializable.

However Hazelcast DataSerializable, IdentifiedDataSerializable and Portable are better for serialization and deserialization. They are very optimized and produces less binary. However they have an implementation overhead, you need to implement additional serialization methods for all your Classes. But if you need better query performance without having to add index to all your fields, then it should be worth of trying Portable.

Serialization Type (bytes) 1M PUT (ms) 1M GET (ms) Value Size (bytes)
Java Serialization 23780 40890 1230
DataSerializable 17913 20813 918
IdentifiedDataSerializable 18164 15937 878
Portable 19064 16695 901
Kryo 22204 19072 957
Jackson Smile 22055 23294 1184