Open Source Projects:
Back to top

Jet 3.0 Streaming Benchmark


The streaming benchmark is intended to measure the latency overhead for a streaming system under different conditions such as message rate and window size. It compares Hazelcast Jet, Apache Flink, and Apache Spark Streaming.

Summary of Findings

  1. Jet is the only system with a latency below one second across all benchmarks.
  2. Spark’s max latency exceeds five seconds in all benchmarks.
  3. Flink and Spark were unable to keep up for the 10-second window sliding by 100 milliseconds. Their reported max latency is therefore just an artifact of the duration of the benchmark. Actually, their max and average latency is unbounded.

The streaming benchmark is based on a stock exchange aggregation. Each message representing a trade is published to Kafka and then a simple windowed aggregation which calculates the number of traders per ticker symbol is done using various data processing frameworks.

The latency is measured as the delay between the earliest time we could have received results for the window the event is in, and the actual time the result was received.

For example, if we want the result for events happening between 12:00:00.000 and 12:00:01.000, theoretically, the earliest time we can get an aggregated result is at 12:00:01.000. However, this does not take into account that events happening at 12:00:00.999 might not reach the system immediately and there also can be some out of orderness due to partitioning (Kafka only guarantees ordering by partition).

As a solution, we need to allow for some delay to allow all the events to reach the system. If the delay is configured as one second, we should wait until receiving an event with timestamp 12:00:02 before we can compute the window for events that happened between 12:00:00-12:00:01. Based on this, the earliest time we can get a result will be the time when we received an event with a timestamp 12:00:02.000. If the system’s actual output happens at 12:00.02.100, then we define the latency as 100ms. This latency includes all of the following:

  • TT Time for message to be published to the broker
  • TT Time for message to be consumed by the framework
  • TT Time for message to be processed

Each framework is expected to output tuples to one or more files in the following format:


WINDOW_TIME is defined as the end time of a window. For example, for a window of events between 12:00:00 and 12:00:01, the WINDOW value would be 12:00:01. Latency can then be calculated based on the difference between the WINDOW and the CALCULATION_TIME.

A sample output could look as follows:


The first value is the window close timestamp, which indicates what time period this value is for (WINDOW_TIME). The second value is the stock ticker, and the third the count for that ticker within that window. The next value represents when the processing for that window was completed (CALCULATION_TIME), and the last value (LATENCY) is simply CALCULATION_TIME – WINDOW_TIME.

If the allowed latency was 1000 ms, then this number should also be subtracted from LATENCY to find the real latency of the processing framework.

The following windowing combinations are tested:

  • 1-second tumbling window
  • TT 10 seconds by 100 milliseconds sliding window
  • TT 60 seconds by 1-second sliding window

Allowed out of orderness is 1 sec

The output files as above are parsed by a simple log parser written in Python to calculate the average latencies.

All source is available here: big-data-benchmark

Benchmark Tool Configuration

  • TT Clock drift was about 5-6 ms so the final values can be +-6ms off.
  • TT Used OpenJDK version 1.8.
  • TT All tests performed on AWS using 10Gbit network.


Parameter Value
Tool Version 2.0.0 – Scala 2.11
Nodes 4 nodes (type c4.8xlarge)
Partitions 72 Partitions (16 per node)
Replication replication-factor=1
Retention (1 minute)
Number of network threads


Parameter Value
Tool Version 3.0
Nodes 2 nodes (type c5.9xlarge)
Snapshot Snapshot interval was 10 seconds, Jet JVM heap was 32G (60G available on
Configuration Kafka source vertex local parallelism was increased from default 1 to 36 to be
able to handle the load


Parameter Value
Tool Version 1.6.1
Nodes 2 nodes (type c5.9xlarge)
Heap 32G heap for taskmanagers, 36 task slots
Instances jobmanager running on one of the taskmanager instances
Latency Measured latencies were very stable, snapshotting is very fast and does not
affect latency
Snapshot Used 10 second snapshot interval, snapshotting to local file system


Parameter Value
Tool Version 2.3.2
Nodes 2 nodes (type c5.9xlarge)
  • TT master: 1g
  • TT worker: 1g
  • TT driver: 1g
  • TT executor: 16g
Instances jobmanager running on one of the taskmanager instances
Latency Measured latencies were very stable, snapshotting is very fast and does not
affect latency
Snapshot Used 10 second snapshot interval, snapshotting to local file system

All latency results are given in milliseconds, to three significant digits.

Due to clock skew between machines, there is a ±20 ms uncertainty in the results. This is especially relevant to Jet’s results, where we measured about 22 ms minimum latency. The real minimum latency may be closer 0-2 ms and average latency about 20 ms.

Duration of a benchmark run: 140 seconds.

The tests used 2 million messages per second and 10,000 distinct keys.

Window size Sliding by Jet Flink Spark
1 sec 1 sec average: 43.7,
min: 23, max: 223
average: 243,
min: 128, max: 476
average: 483,
min: 243, max: 5620
10 sec 0.1 sec average: 47.2,
min: 22, max: 612
average: 40700,
min: 234, max: 95100
average: 13700,
min: 5760, max: 22700
60 sec 1 sec average: 46.3,
min: 22, max: 281
average: 6990,
min: 234, max: 36900
average: 439,
min: 226, max: 5220

Latency Distribution Charts

The 1-second tumbling window is the only benchmark for which it makes sense to compare all three of Jet, Flink and Spark on the same chart.

One-second tumbling window:

Jet 3.0 Benchmark: One Second Tumbling Window

10-second window sliding by 1 second:

Jet 3.0 Benchmark: 10-second window sliding by 1 second

10-second window sliding by 0.1 seconds:

Jet 3.0 Benchmark: 10-second window sliding by 0.1 seconds