Hazelcast Jet 0.4 – New Windowing and Event-Time Processing Functionality Enhances Real-Time Stream Processing Engine

Hazelcast Jet outperforms Flink and Spark with 40ms average latency benchmark

Palo Alto, Calif., June 13, 2017 – Hazelcast®, the leading open source in-memory data grid (IMDG) with hundreds of thousands of installed clusters and over 26 million server starts per month, today announced the 0.4 release of Hazelcast Jet®, an application-embeddable, distributed processing engine for stream and batch processing of big data. Major new functionality in Hazelcast Jet 0.4 includes event-time processing with tumbling, sliding and session windowing. With these new capabilities, users get a feature-rich stream processing architecture that provides a flexible mechanism to build and evaluate windows over continuous data streams. Easy to use, deploy and program, Hazelcast Jet is appropriate for applications such as sensor updates in IoT architectures (house thermostats, lighting systems), in-store e-commerce systems and social media platforms.

Stream processing has overtaken batch processing as a preferred method of processing big data sets for companies that require immediate insight into data. However, to get value from a stream, it must be partitioned: a fragment of the stream is taken and analyzed. To assign data to windows during processing, each data element in the stream needs to be associated with a timestamp. In Hazelcast Jet 0.4 this is achieved via event-time processing, which uses a logical, data-dependent timestamp embedded in the event itself. A major drawback of event-time processing is that events may arrive out of order or late, so you can never be sure that you have seen all events in a given time window.

To alleviate this issue, the latest release of Hazelcast Jet also includes windowing functionality which enables users to evaluate stream processing jobs at regular time intervals, regardless of how many incoming messages the job is processing. Hazelcast Jet offers three types of windows:

  • Fixed/tumbling – time is partitioned into same-length, non-overlapping chunks. Each event belongs to exactly one window.
  • Sliding – windows have fixed length, but are separated by a time interval (step) which can be smaller than the window length. Typically the window length is a multiple of the step.
  • Session – windows vary in size and are defined based on the data, which should carry session identifiers.
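The three window types can be illustrated with a minimal sketch in plain Java. This is not the Hazelcast Jet API: the class and method names (WindowSketch, tumblingStart, slidingStarts, sessions) are hypothetical, timestamps are event times in milliseconds, and the session rule shown (a new session starts when the gap between consecutive events of one session identifier exceeds a timeout) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical helpers, not part of Hazelcast Jet.
final class WindowSketch {

    // Tumbling: every timestamp falls into exactly one window [start, start + size).
    static long tumblingStart(long ts, long size) {
        return Math.floorDiv(ts, size) * size;
    }

    // Sliding: window starts are multiples of step; a timestamp belongs to every
    // window [start, start + length) that contains it, so windows may overlap.
    static List<Long> slidingStarts(long ts, long length, long step) {
        List<Long> starts = new ArrayList<>();
        long latest = Math.floorDiv(ts, step) * step; // latest start that is <= ts
        for (long start = latest; start > ts - length; start -= step) {
            starts.add(start);
        }
        return starts;
    }

    // Session (assumed gap rule): timestamps of ONE session identifier, sorted
    // ascending, are split wherever two neighbours are more than maxGap apart.
    // Each result is {firstTimestamp, lastTimestamp} of a session.
    static List<long[]> sessions(long[] sortedTs, long maxGap) {
        List<long[]> result = new ArrayList<>();
        if (sortedTs.length == 0) {
            return result;
        }
        long start = sortedTs[0];
        long prev = start;
        for (int i = 1; i < sortedTs.length; i++) {
            if (sortedTs[i] - prev > maxGap) {
                result.add(new long[] {start, prev});
                start = sortedTs[i];
            }
            prev = sortedTs[i];
        }
        result.add(new long[] {start, prev});
        return result;
    }

    public static void main(String[] args) {
        // 1 second tumbling window
        System.out.println(tumblingStart(2500, 1000));        // prints 2000
        // 3 seconds by 1 second sliding window
        System.out.println(slidingStarts(2500, 3000, 1000));  // prints [2000, 1000, 0]
        // sessions with a 1 second gap timeout
        for (long[] s : sessions(new long[] {100, 200, 250, 5000, 5100}, 1000)) {
            System.out.println(s[0] + ".." + s[1]);
        }
    }
}
```

In this sketch an event at 2,500 ms lands in one tumbling window but in three overlapping sliding windows, which is why the window length is usually chosen as a multiple of the step.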

Additional enhancements in Hazelcast Jet 0.4 also include:

  • Users are now able to use the ICache/Hazelcast integration as a source and sink of data.
  • java.util.stream can be used on top of ICache to enable basic data processing.
  • Streaming File Connector – improved connector allows users to watch files and directories for changes.
  • Numerous Hazelcast Jet code samples are now available which can be used as building blocks for Jet applications, providing a gradual learning experience.

In a new latency benchmark study published today, Hazelcast Jet outperformed its competitors with a 40ms average latency for stream processing computations, which remained flat as message rates increased. Flink and Spark’s execution latencies were in the hundreds of milliseconds, rising to seconds at the higher message throughputs.

The study compares the average latencies of Hazelcast Jet, Flink and Spark Streaming across different message rates and window sizes. The full benchmark is available here. Results are shown in the tables below (all results are given in milliseconds).

1 second tumbling window

Messages / sec   10,000   100,000   1,000,000   5,000,000

Jet                  57        48          37          20
Flink               201       204         195        1371
Spark               189       273         307         600

10 seconds by 1 second sliding window

Messages / sec   10,000   100,000   1,000,000   5,000,000

Jet                  75        37          37          21
Flink               208       210         174       5700*
Spark               188       240         307         615

60 seconds by 1 second sliding window

Messages / sec   10,000   100,000   1,000,000   5,000,000

Jet                  41        37          37          20
Flink               210       235         241      27088*
Spark               196       246         382         623

*Latency kept increasing without bound because Flink wasn’t able to keep up.

With Hazelcast IMDG® providing storage functionality, Hazelcast Jet is an Apache 2 licensed open source project that performs parallel execution to enable data-intensive applications to operate in near real-time. Built on a one-record-at-a-time architecture (sometimes known as continuous operators), Hazelcast Jet processes incoming records as soon as possible, as opposed to accumulating records into micro-batches, and so lowers latency for applications.

Greg Luck, CEO of Hazelcast, said: “The Jet project is progressing faster than we could have hoped. The new functionality in 0.4 brings stream processing for the first time. As with batch, we are achieving a new performance level, giving us a real edge over alternative market solutions. Jet’s architecture is performance and low latency driven, which is why there are no real surprises in the results of our latest benchmark. Driven by the community, Jet is an easy to deploy fast data solution for programmers built on the premise of simplicity.”

Hazelcast will be providing 24×7 enterprise support subscriptions for Hazelcast Jet. For more information about Hazelcast Jet visit http://jet.hazelcast.org/