What’s new in Hazelcast 3?
Hazelcast 3 is the biggest change that we have introduced to Hazelcast, ever. As Hazelcast grew with more and more features, we often felt that the inner architecture is no longer good enough to support those additions. It was almost impossible to introduce some of new features that community really wanted. So a decision made to take action and as a result Hazelcast 3 was born with a new design and lots of new features.
Core Changes
We almost re-implemented the entire architecture from scratch. Now, the core framework handles anything from invocation to blocking operations to backup handling to events. These concepts used to be part of the Data Structure implementation. Those who know Hazelcast 2 internals are aware that there is a single thread called Service Thread that handles all the operations on the Node level. Service Thread handles almost everything that is received by the socket. With the new design, there are many threads that handle the operations. We call them Operation Threads. A Node can have multiple Operation Threads where each Thread is responsible from multiple Partitions. This way it is easier to scale up on multicore machines.
Hazelcast 3 introduces some changes to the API. We removed most of the static method calls like Hazelcast.getMap()
and left the non static ones. The only way of getting/creating a distributed Map is to get it from HazelcastInstance
.
Config config = new Config(); HazelcastInstance hazelcast = Hazelcast.newHazelcastInstance(config); Map map = hazelcast.getMap(“myMap”);
You need to write a little more code, but it is less confusing. Hazelcast supports creation of more than one Hazelcast Instance (Node) in one JVM. That’s why static methods were wrong and misleading.
The Transaction API is completely changed. However, a general tip, stay away from transactions. They are evil in the scalability world. But if you really need them, from now on we support local and distributed transactions, namely 2PC.
In general 70%-80% of the existing code is rewritten. But 95% of the API is same and doesn’t need any change. As a result we have clear layers from low to high: Networking, Clustering, Partitioning, Node Engine and SPI. The well defined layers lets us to try new ideas and different implementations for each layer and optimize them. On the other hand, SPI is the Service Provider Interface, an extension point to Hazelcast. You can use it to implement a new Distributed Data Structure of your own. We have re-implemented all of the existing distributed objects like; map, queue, executor service using SPI.
For the migration guide from Hazelcast 2 to Hazelcast 3, please see “Upgrading from 2.x versions”.
Serialization
Hazelcast optimizes serialization for certain data types and by default uses Java Serialization. There is also DataSerializable that will improve serialization in terms of time and space if a class implements it. Hazelcast 3 adds two new Serialization interfaces that are.
IdentifiedDataSerializable
Portable
IdentifiedDataSerializable
is a slightly optimized version of DataSerializable
that doesn’t use class name, thus reflection for deserializing the objects. This way deserialization is faster and it is actually language independent.
However, Portable
is an advanced serialization framework that has all goodies of IdentifiedDataSerializable
such as no-reflection and being language independent. It also supports multi version and fetching, querying and indexing individual fields without relying on reflection and de-serialization. Portable
object contains Meta information like version and serialized field definitions. This way Hazelcast is able to navigate in the byte[]
and de-serialize only the required field without actually de-serializing the whole object which improves the Query performance a lot.
Both IdentifiedDataSerializable
and Portable
requires an additional Factory
class to be used instead of reflection.
Our last addition to the serialization is ability to customize it. With Custom Serialization one can plug his choice of serialization instead of default Java Serialization and tell Hazelcast how to serialize/deserialize. Here is a sample implementation that serializes/deserializes Java Objects to/from XML using XMLEncoder
and XMLDecoder
.
public static class XmlSerializer implements TypeSerializer { @Override public int getTypeId() { return 10; } @Override public void write(ObjectDataOutput out, Object object) throws IOException { ByteArrayOutputStream bos = new ByteArrayOutputStream(); XMLEncoder encoder = new XMLEncoder(bos); encoder.writeObject(object); encoder.close(); out.write(bos.toByteArray()); } @Override public Object read(ObjectDataInput in) throws IOException { final InputStream inputStream = (InputStream) in; XMLDecoder decoder = new XMLDecoder(inputStream); return decoder.readObject(); } @Override public void destroy() { } }
And the last step is to register our XMLSerializer
into Hazelcast Config.
SerializationConfig config = new SerializationConfig(); TypeSerializerConfig tsc = new TypeSerializerConfig(). setImplementation(new XmlSerializer()). setTypeClass(Object.class); config.addTypeSerializer(tsc);
With this configuration, for every Object other than DataSerializable
and Portable
, Hazelcast will use XMLSerializer
to serialize and deserialize.
Map
Hazelcast distributed Map received the following new features:
- Storage Data Type: Binary, Object and Cached
- Execute On Key with an
EntryProcessor
. - Continuous Query
- Interceptors
- Lazy Indexing
By default Hazelcast stores entries in its serialized format (Binary). However with Hazelcast 3, you can store entries in Object format. It is useful when you have a very heavy query usage and you want to get rid of deserialization. And, if you are using EntryProcessor
, then Object format becomes much more important.
EntryProcessor
allows you to execute a code on a key, which may mutate the state without a need for explicit lock. When you execute an EntryProcessor
on key, Hazelcast implicitly locks the key. This allows performing atomic operations on the key. Hazelcast also guarantees the locality by delaying the migration until the end of the processing.
If the “in-memory-format” of the Map is Object, then there will not be any serialization/deserialization while EntryProcessor
mutates the state of the entry. To set the “in-memory-format”, the following Config API can be used.
Config config = new Config(); config.getMapConfig("myMap"). setInMemoryFormat(MapConfig.InMemoryFormat.OBJECT);
Continuous Query is a special EntryListener
, which takes a Predicate
as a parameter. This way, the listener will be invoked whenever there is an addition/update/remove/evict event on entries that match the Predicate
.
An Interceptor lets you to intercept operations and execute your own business logic synchronously blocking the operation. You can change the returned value from a get operation, change the value to be put or cancel operations by throwing exception.
And last but not least, there is no need to add indexes at the very beginning any more. You can add indexes to the entries at any point and Hazelcast can lazily index them.
Apart from Map, there also some changes to Queue, Topic and MultiMap. Queues are no longer dependent on Map and they have their own QueueStore
for persistence and they scale much better when there is multiple Queues. MultiMap values could be Set or List. Now, they can also be a Queue. With Hazelcast 3, topics support global ordering. With global ordering enabled, all receivers will receive all messages from all sources with the same order. Previously only the messages from one source were received in an order. Two different receivers could receive messages from source A and B in different orders. To enable the global ordering:
Config config = new Config(); config.getTopicConfig("global-ordered"). setGlobalOrderingEnabled(true);
Of course global ordering is not as scalable as the default behavior.
Final major update is the Native Clients. We have removed the LiteMember and added a smart routing capability to the clients. Now a client can be both smart and dummy. A dummy client will have connection only to one member, however a smart client can route operations to the owner of the data. And a new binary protocol is created using Portable Serialization. The same protocol is used to implement C++ and C# clients. Note that C# and C++ clients are not ready yet.
When we started to build Hazelcast, our goal was to simplify distributed programming. Hazelcast 3 is another big step towards our mission. We build it with passion and by listening to community. The current release is 3.0.3. Some bugs, undocumented parts, mismatching documentations are inevitable. Grab it, while it is hot, and share your story with us. You can check out the source (branch 3.0) and open issues at Github.