Comparing Hazelcast 3 Serialization methods with Kryo Serialization and Jackson Smile
Important note: This blog post has been updated to reflect recent changes in the Hazelcast API.
Hazelcast relies heavily on serialization. It is designed to be a distributed cache and honestly it is not very suitable for a local cache.
Sometimes people compare it with other caching solutions and notice that Hazelcast is very slow. This is because a local cache doesn’t need to serialize its entries, but Hazelcast does. And serialization is very costly.
Hazelcast 3 comes with a big update on serialization. Not only does it provide two new ways of serialization, you can now also plug in your own serialization, and it works like a charm. Here I tried Kryo for custom serialization and did a quick comparison to shed some light on the different serialization choices and their performance.
My Test Environment
The purpose of this test is to compare different serialization methods, so I tested everything on my laptop on a single node. I like the lambda expressions that come with Java 8, so I am using a Java 8 JDK for Mac OS X.
I ran map.put and map.get 1M times each, with 1M different Customer objects. Notice that PUT serializes the object and GET deserializes it.
My Customer class has the following fields:
public enum Sex { MALE, FEMALE }

int customerId;
String name;
Date birthday;
Sex gender;
String emailAddress;
long[] longArray; // length is 100
I create a Customer object using this method:
import java.util.Date;
import java.util.Random;

public class CustomerFactory {

    private static final Random random = new Random(new Date().getTime());

    // Roughly 100 years in milliseconds; the long literal avoids int overflow.
    private static final long HUNDRED_YEARS_MS = 100L * 365 * 24 * 60 * 60 * 1000;

    // One shared array of 100 random longs, reused by every Customer.
    private static final long[] longArray = new long[100];

    static {
        for (int j = 0; j < longArray.length; j++) {
            longArray[j] = random.nextLong();
        }
    }

    public static Customer newCustomer(int i) {
        Customer customer = new Customer();
        customer.setCustomerId(i);
        customer.setName("Name" + i);
        Customer.Sex sex = i % 2 == 0 ? Customer.Sex.MALE : Customer.Sex.FEMALE;
        customer.setGender(sex);
        // Random birthday within the last 100 years.
        long ageMs = (long) (random.nextDouble() * HUNDRED_YEARS_MS);
        customer.setBirthday(new Date(System.currentTimeMillis() - ageMs));
        customer.setEmailAddress("email." + customer.getName() + "@gmail.com");
        customer.setLongArray(longArray);
        return customer;
    }
}
Populating the map with 1M entries is easy. Here count is set to 1,000,000:
Config config = new Config();
config.setProperty(GroupProperties.PROP_WAIT_SECONDS_BEFORE_JOIN, "1");
HazelcastInstance h = Hazelcast.newHazelcastInstance(config);
IMap<Integer, Customer> customerMap = h.getMap("customers");

for (int i = 0; i < count; i++) {
    Customer customer = CustomerFactory.newCustomer(i);
    customerMap.put(i, customer);
}

Similarly, to read 1M times I use the following code:

for (int i = 0; i < count; i++) {
    customerMap.get(i);
}
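The "PUT took ... ms" and "GET took ... ms" lines reported for each method could be produced by a small harness like the one below. This is only a sketch under assumptions: the helper name timeMillis is hypothetical, and a plain HashMap stands in for the Hazelcast IMap so that it runs without a cluster.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

class BenchmarkSketch {

    // Runs op for i = 0..count-1 and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(int count, IntConsumer op) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            op.accept(i);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int count = 1_000_000;
        // Assumption: a local HashMap replaces the distributed map, so nothing is serialized here.
        Map<Integer, String> map = new HashMap<>();

        long putMs = timeMillis(count, i -> map.put(i, "Name" + i));
        System.out.println(count + " PUT took " + putMs + " ms");

        long getMs = timeMillis(count, i -> map.get(i));
        System.out.println(count + " GET took " + getMs + " ms");
    }
}
```

A single timed loop like this includes JIT warm-up, so the numbers are best read as a rough comparison rather than a precise measurement.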
I also measured the size of the binary output. I used the Serialization Service within Hazelcast. After the benchmark the following code gave me the size of the serialized Customer object.
SerializationService ss = new SerializationServiceBuilder()
        .setConfig(config.getSerializationConfig())
        .build();
System.out.println("Binary size of value is "
        + ss.toData(customerMap.get(1)).bufferSize() + " bytes");
Default Java Serialization
By default you can always use Java serialization. Simply make your class implement java.io.Serializable and you are good to go. With Java serialization I get the following results:
1000000 PUT took 23780 ms
1000000 GET took 40890 ms
Binary size of value is 1230 bytes
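To get a feel for where the extra bytes of Java serialization come from, here is a minimal sketch that uses only the JDK. The Point class is a hypothetical two-field example, much smaller than the Customer class used in the benchmark.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class JavaSerializationSize {

    // Hypothetical class holding exactly 8 bytes of field data.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x = 1;
        int y = 2;
    }

    // Serializes obj with standard Java serialization and returns the number of bytes produced.
    static int serializedSize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int size = serializedSize(new Point());
        // Everything beyond the 8 bytes of field data is stream header and class descriptor.
        System.out.println("Serialized size is " + size + " bytes for 8 bytes of field data");
    }
}
```

The class descriptor (class name, serialVersionUID, field names and types) is written into the stream alongside the data, which is one reason the Java-serialized value is the largest in this comparison.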
Hazelcast DataSerializable
DataSerializable is a faster alternative to Java serialization. Unfortunately it is not as simple as marking a class: you need to implement the actual serialization yourself. DataSerializable resembles Externalizable.
Here is how I implement DataSerializable:
public void writeData(ObjectDataOutput out) throws IOException {
    out.writeInt(customerId);
    out.writeUTF(name);
    out.writeLong(birthday.getTime());
    out.writeUTF(gender.toString());
    out.writeUTF(emailAddress);
    out.writeLongArray(longArray);
}

public void readData(ObjectDataInput in) throws IOException {
    // Fields are read back in exactly the order they were written.
    customerId = in.readInt();
    name = in.readUTF();
    birthday = new Date(in.readLong());
    gender = Sex.valueOf(in.readUTF());
    emailAddress = in.readUTF();
    longArray = in.readLongArray();
}
Unlike Java serialization, DataSerializable avoids most of the extra overhead, so it is much faster and has a smaller footprint.
1000000 PUT took 17913 ms
1000000 GET took 20813 ms
Binary size of value is 918 bytes
Notice that deserialization (GET) is slower than serialization (PUT). This is because DataSerializable writes and reads the class name at the beginning and uses reflection to create an instance of the class. To avoid reflection, move on to the next section.
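The field-by-field style of writeData and readData can be tried without Hazelcast. In the sketch below, the JDK's DataOutputStream and DataInputStream stand in for ObjectDataOutput and ObjectDataInput, and Person is a hypothetical two-field miniature of Customer; none of this is Hazelcast's own code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Date;

class ManualSerializationSketch {

    // Hypothetical miniature of Customer with just two fields.
    static class Person {
        String name;
        Date birthday;

        void writeData(DataOutputStream out) throws IOException {
            out.writeUTF(name);
            out.writeLong(birthday.getTime());
        }

        void readData(DataInputStream in) throws IOException {
            // Fields must be read in exactly the order they were written.
            name = in.readUTF();
            birthday = new Date(in.readLong());
        }
    }

    static byte[] toBytes(Person p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            p.writeData(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Person fromBytes(byte[] data) {
        Person p = new Person();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            p.readData(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        Person p = new Person();
        p.name = "Name1";
        p.birthday = new Date(0L);
        byte[] data = toBytes(p);
        // 2-byte length prefix + 5 bytes for "Name1" + 8 bytes for the long = 15 bytes.
        System.out.println("Binary size is " + data.length + " bytes");
        System.out.println("Round trip name: " + fromBytes(data).name);
    }
}
```

Only the field values reach the stream here, with no field names or type descriptors, which is why hand-written serialization tends to be compact.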
Hazelcast IdentifiedDataSerializable
IdentifiedDataSerializable is an extension of DataSerializable, introduced in Hazelcast 3, that avoids reflection. To do so, Hazelcast requires you to register a factory. That's why a class implementing IdentifiedDataSerializable needs to implement the getId() and getFactoryId() methods. The id is passed to the factory to identify the class.
public int getFactoryId() {
    return 1;
}

public int getId() {
    return 1;
}
In this case I am just returning 1. The last step is to implement the factory and register it with the Config. Lambda expressions come in very handy here.
config.getSerializationConfig().addDataSerializableFactory(
        1, (int id) -> (id == 1) ? new Customer() : null);
And the result is:
1000000 PUT took 18164 ms
1000000 GET took 15937 ms
Binary size of value is 878 bytes
Notice that serialization time is almost the same, but deserialization is much faster with IdentifiedDataSerializable because it doesn't use reflection to create the Customer object.
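The difference between the two instantiation strategies can be sketched with the JDK alone. The code below is an illustration under assumptions, not Hazelcast's implementation: createByName mimics creating an object from a class name read off the wire, createById mimics a registered factory keyed by id, and the empty Customer class is a hypothetical stand-in.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class FactoryVsReflectionSketch {

    // Hypothetical empty stand-in for the real Customer class.
    static class Customer { }

    // Reflection path: resolve the class by name, then invoke its no-arg constructor.
    static Object createByName(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Factory path: an id maps directly to a constructor reference.
    private static final Map<Integer, Supplier<Object>> FACTORY = new HashMap<>();

    static {
        FACTORY.put(1, Customer::new);
    }

    // Returns a new instance for a known id, or null for an unknown one.
    static Object createById(int id) {
        Supplier<Object> supplier = FACTORY.get(id);
        return supplier == null ? null : supplier.get();
    }

    public static void main(String[] args) {
        Object viaReflection = createByName(Customer.class.getName());
        Object viaFactory = createById(1);
        System.out.println(viaReflection.getClass() == viaFactory.getClass());
    }
}
```

With the factory, only a small integer id has to travel with the data instead of a full class name, and object creation is an ordinary constructor call.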
Hazelcast Portable
Portable is an advanced serialization format that supports the following features:
- Multiple versions of the same object type
- Fetching individual fields without having to rely on reflection
- Querying and indexing without deserialization and/or reflection
In order to support these features, a serialized Portable object contains meta information such as the version, the field names and the location of each field in the binary data. This way Hazelcast can navigate the binary data and deserialize only the required field without deserializing the whole object, which improves query performance a lot. By query I mean the Hazelcast Predicate API, where you write a Predicate on the value and Hazelcast distributes it to the cluster and returns all matching entries. Like IdentifiedDataSerializable, Portable also needs a factory to instantiate the objects.
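The idea of jumping straight to one field can be illustrated with a toy layout. The sketch below is not Hazelcast's actual binary format: the fixed OFFSETS table, the three-field record and the method names are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FieldOffsetSketch {

    // Field name -> byte offset inside a record; plays the role of the meta information.
    static final Map<String, Integer> OFFSETS = new LinkedHashMap<>();

    static {
        OFFSETS.put("id", 0);    // int, 4 bytes
        OFFSETS.put("date", 4);  // long, 8 bytes
        OFFSETS.put("name", 12); // int length followed by UTF-8 bytes
    }

    // Writes the three fields back to back in the order given by OFFSETS.
    static byte[] write(int id, long date, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 4 + nameBytes.length);
        buf.putInt(id).putLong(date).putInt(nameBytes.length).put(nameBytes);
        return buf.array();
    }

    // Reads only the "date" field by jumping to its offset; no other field is decoded.
    static long readDate(byte[] record) {
        return ByteBuffer.wrap(record).getLong(OFFSETS.get("date"));
    }

    // Reads only the "name" field: its length prefix first, then the bytes that follow.
    static String readName(byte[] record) {
        int offset = OFFSETS.get("name");
        int length = ByteBuffer.wrap(record).getInt(offset);
        return new String(record, offset + 4, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] record = write(7, 123456789L, "Name7");
        System.out.println(readDate(record));
        System.out.println(readName(record));
    }
}
```

A query that filters on a single field only needs a lookup like readDate per entry, rather than rebuilding a whole object each time.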
Now my class implementing Portable looks like the following:
public int getFactoryId() {
    return 1;
}

public int getClassId() {
    return 1;
}

public void writePortable(PortableWriter writer) throws IOException {
    writer.writeInt("id", customerId);
    writer.writeUTF("name", name);
    writer.writeLong("date", birthday.getTime());
    writer.writeUTF("gender", gender.toString());
    writer.writeUTF("e-mail", emailAddress);
    writer.writeLongArray("longArray", longArray);
}

public void readPortable(PortableReader reader) throws IOException {
    // Fields are looked up by the same names used in writePortable.
    customerId = reader.readInt("id");
    name = reader.readUTF("name");
    birthday = new Date(reader.readLong("date"));
    gender = Sex.valueOf(reader.readUTF("gender"));
    emailAddress = reader.readUTF("e-mail");
    longArray = reader.readLongArray("longArray");
}
factoryId is the id of the factory that we will implement in a minute. classId is the id of this Customer class. It will be used within the factory method to identify the Customer class. Finally, here is our factory, added to the Config.
config.getSerializationConfig().addPortableFactory(
        1, (int id) -> (id == 1) ? new Customer() : null);
And the results are:
1000000 PUT took 19064 ms
1000000 GET took 16695 ms
Binary size of value is 901 bytes
When the object is serialized, Portable zips the meta information and sends it along with the entry. Because of this extra work it is slightly slower than IdentifiedDataSerializable.
Custom Serialization using Kryo
Finally, Hazelcast 3 lets you implement and register your own serialization. As part of my comparison I tried Kryo. The beauty of Kryo is that you don't need to make your domain classes implement anything. You just need to implement a StreamSerializer or a ByteArraySerializer. As you can guess, the first one serializes into and deserializes from a stream, whereas the latter serializes into and deserializes from a byte[]. I implemented a Kryo-based StreamSerializer for the Customer object and registered it in the Hazelcast config. Note that Kryo is not thread-safe, which is why each thread needs its own Kryo instance. Here are the implementation details.
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import com.hazelcast.nio.ObjectDataInput;
import com.hazelcast.nio.ObjectDataOutput;
import com.hazelcast.nio.serialization.StreamSerializer;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CustomerKryoSerializer implements StreamSerializer<Customer> {

    // Kryo is not thread-safe, so each thread gets its own instance.
    private static final ThreadLocal<Kryo> kryoThreadLocal = new ThreadLocal<Kryo>() {
        @Override
        protected Kryo initialValue() {
            Kryo kryo = new Kryo();
            kryo.register(Customer.class);
            return kryo;
        }
    };

    public CustomerKryoSerializer() {
    }

    public int getTypeId() {
        return 2;
    }

    public void write(ObjectDataOutput objectDataOutput, Customer customer) throws IOException {
        Kryo kryo = kryoThreadLocal.get();
        Output output = new Output((OutputStream) objectDataOutput);
        kryo.writeObject(output, customer);
        output.flush();
    }

    public Customer read(ObjectDataInput objectDataInput) throws IOException {
        InputStream in = (InputStream) objectDataInput;
        Input input = new Input(in);
        Kryo kryo = kryoThreadLocal.get();
        return kryo.readObject(input, Customer.class);
    }

    public void destroy() {
    }
}
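The per-thread pattern used above can be checked in isolation with the JDK alone. In this sketch Codec is a hypothetical stand-in for any object that is not thread-safe, such as a Kryo instance, and ThreadLocal.withInitial is the Java 8 shorthand for overriding initialValue.

```java
class ThreadLocalSketch {

    // Hypothetical stand-in for a non-thread-safe object such as a Kryo instance.
    static class Codec { }

    // Each thread lazily creates its own Codec on first access.
    private static final ThreadLocal<Codec> CODEC = ThreadLocal.withInitial(Codec::new);

    static Codec current() {
        return CODEC.get();
    }

    // Starts a new thread, waits for it, and returns the Codec that thread saw.
    static Codec codecOfNewThread() {
        Codec[] seen = new Codec[1];
        Thread t = new Thread(() -> seen[0] = current());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        // Repeated calls on one thread return the same instance.
        System.out.println(current() == current());
        // A different thread gets a different instance.
        System.out.println(current() == codecOfNewThread());
    }
}
```

Because no instance is ever shared between threads, no locking is needed around the serializer calls.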
And finally, here is how we plug in this StreamSerializer:
config.getSerializationConfig().getSerializerConfigs().add(
        new SerializerConfig()
                .setTypeClass(Customer.class)
                .setImplementation(new CustomerKryoSerializer()));
We basically tell Hazelcast to use CustomerKryoSerializer whenever it sees Customer.class. And the result is:
1000000 PUT took 22204 ms
1000000 GET took 19072 ms
Binary size of value is 957 bytes
The results are not bad given that we didn’t have to implement the serialization for each field manually. It can be used whenever you are not able to implement IdentifiedDataSerializable or Portable.
Custom Serialization using Jackson Smile
Tim suggested trying Jackson Smile, and I did. I implemented it as a ByteArraySerializer. Note that we used a StreamSerializer for Kryo. Here is the code. I should say that it is the simplest serialization code so far :) Note that Jackson is already thread-safe.
import com.hazelcast.nio.ObjectDataInput;
import com.hazelcast.nio.ObjectDataOutput;
import com.hazelcast.nio.serialization.ByteArraySerializer;
import org.codehaus.jackson.map.ObjectMapper;
import org.codehaus.jackson.smile.SmileFactory;

import java.io.IOException;
import java.io.InputStream;

public class CustomerSmileSerializer implements ByteArraySerializer<Customer> {

    ObjectMapper mapper = new ObjectMapper(new SmileFactory());

    public int getTypeId() {
        return 5;
    }

    // Stream-style variants; these two are not part of the ByteArraySerializer interface.
    public void write(ObjectDataOutput out, Customer object) throws IOException {
        byte[] data = mapper.writeValueAsBytes(object);
        System.out.println("Size is " + data.length);
        out.write(data);
    }

    public Customer read(ObjectDataInput in) throws IOException {
        return mapper.readValue((InputStream) in, Customer.class);
    }

    public void destroy() {
    }

    // The byte[] based methods required by ByteArraySerializer.
    @Override
    public byte[] write(Customer customer) throws IOException {
        return mapper.writeValueAsBytes(customer);
    }

    @Override
    public Customer read(byte[] bytes) throws IOException {
        return mapper.readValue(bytes, Customer.class);
    }
}
Registering is similar to Kryo:
config.getSerializationConfig().getSerializerConfigs().add(
        new SerializerConfig()
                .setTypeClass(Customer.class)
                .setImplementation(new CustomerSmileSerializer()));
And the result is:
1000000 PUT took 22055 ms
1000000 GET took 23294 ms
Binary size of value is 1184 bytes
Compared to Jackson Smile, Kryo wins clearly on GET time and size, while PUT times are about the same.
Summary
To summarize: Java serialization is the worst. With custom serialization you can easily implement and plug in Kryo or Jackson Smile serializers; Hazelcast supports both stream-based and byte-array-based serializers. They are better than Java serialization and don't require any changes to your classes. They can easily be used for third-party objects, which don't even have to be Serializable.
However, Hazelcast's DataSerializable, IdentifiedDataSerializable and Portable are better for serialization and deserialization. They are highly optimized and produce smaller binaries. The cost is implementation overhead: you need to implement additional serialization methods for all your classes. But if you need better query performance without having to add an index to all your fields, then Portable is worth trying.
Serialization Type | 1M PUT (ms) | 1M GET (ms) | Value Size (bytes) |
---|---|---|---|
Java Serialization | 23780 | 40890 | 1230 |
DataSerializable | 17913 | 20813 | 918 |
IdentifiedDataSerializable | 18164 | 15937 | 878 |
Portable | 19064 | 16695 | 901 |
Kryo | 22204 | 19072 | 957 |
Jackson Smile | 22055 | 23294 | 1184 |
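The totals in the table can be turned into a per-operation cost with simple arithmetic. The helper below is a hypothetical convenience for reading the table; the input numbers are the GET times measured above.

```java
class PerOperationCost {

    // Converts a total time in milliseconds into microseconds per single operation.
    static double microsPerOp(long totalMillis, int operations) {
        return totalMillis * 1000.0 / operations;
    }

    public static void main(String[] args) {
        int operations = 1_000_000;
        double javaGet = microsPerOp(40890, operations);       // Java Serialization, 1M GET
        double identifiedGet = microsPerOp(15937, operations); // IdentifiedDataSerializable, 1M GET

        System.out.println(String.format("Java serialization GET: %.2f us/op", javaGet));
        System.out.println(String.format("IdentifiedDataSerializable GET: %.2f us/op", identifiedGet));
        System.out.println(String.format("Ratio: %.2f", javaGet / identifiedGet));
    }
}
```

Read this way, a GET costs roughly 41 microseconds with Java serialization and roughly 16 microseconds with IdentifiedDataSerializable, a factor of about 2.6 on this single-node setup.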