Jet 0.4 Streaming




The streaming benchmark is based on a simple stock exchange aggregation. Each message, representing a single trade, is published to Kafka; each framework under test then performs a simple windowed aggregation that counts the number of trades per ticker per window.

A message looks as follows:

public class Trade implements Serializable {

    private final long time;
    private final String ticker;
    private final int quantity;
    private final int price; // in cents

    Trade(long time, String ticker, int quantity, int price) {
        this.time = time;
        this.ticker = ticker;
        this.quantity = quantity;
        this.price = price;
    }

    public long getTime() {
        return time;
    }

    public String getTicker() {
        return ticker;
    }

    public int getQuantity() {
        return quantity;
    }

    public int getPrice() {
        return price;
    }

    @Override
    public String toString() {
        return "Trade{time=" + time + ", ticker='" + ticker + '\'' +
                ", quantity=" + quantity + ", price=" + price + '}';
    }
}
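The aggregation each framework performs, a per-ticker trade count per window, can be sketched in plain Java. This is an illustrative sketch, not the benchmark code: it runs batch-style over a list, while the real frameworks compute the counts incrementally over the Kafka stream, and the class and method names here are made up for the example.

```java
import java.util.*;

public class WindowedCount {

    // Minimal stand-in for the Trade class above: event time in ms and ticker.
    record Trade(long time, String ticker) {}

    // Maps each window's end timestamp to per-ticker trade counts.
    static Map<Long, Map<String, Long>> count(List<Trade> trades, long windowMs) {
        Map<Long, Map<String, Long>> out = new TreeMap<>();
        for (Trade t : trades) {
            // A trade at time T falls in the tumbling window ending at
            // the next multiple of windowMs after T.
            long windowEnd = (t.time() / windowMs + 1) * windowMs;
            out.computeIfAbsent(windowEnd, k -> new HashMap<>())
               .merge(t.ticker(), 1L, Long::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Trade> trades = List.of(
                new Trade(100, "AAPL"), new Trade(900, "AAPL"),
                new Trade(950, "GOOG"), new Trade(1200, "AAPL"));
        System.out.println(count(trades, 1_000));
    }
}
```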

The streaming benchmark is intended to measure the latency for a streaming system under different conditions such as message rate and window size. The latency is measured as the delay between the earliest time we could have received results for the window the event is in, and the actual time the result was received.

For example, if we want the result for events happening between 12:00:00.000 and 12:00:01.000, the earliest time we could theoretically get an aggregated result is at 12:00:01.000. However, this does not account for the fact that an event happening at 12:00:00.999 might not reach the system immediately, and events can also arrive out of order due to partitioning (Kafka only guarantees ordering within a partition).

As a solution, we allow for some delay so that all the events can reach the system. If the allowed delay is configured as one second, we wait until receiving an event with timestamp 12:00:02.000 before computing the window for events that happened between 12:00:00 and 12:00:01. The earliest time we can get a result is therefore the time when we receive an event with timestamp 12:00:02.000. If the system's actual output happens at 12:00:02.100, we define the latency as 100 ms. This latency includes all of the following:

  • Time for the message to be published to the broker
  • Time for the message to be consumed by the framework
  • Time for the message to be processed

Each framework is expected to output tuples to one or more files in the following format:

WINDOW_TIME, TICKER, COUNT, CALCULATION_TIME, LATENCY
WINDOW_TIME is defined as the end time of a window. For example, for a window of events between 12:00:00 and 12:00:01, the WINDOW_TIME value would be 12:00:01. Latency can then be calculated as the difference between CALCULATION_TIME and WINDOW_TIME.
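A minimal sketch of how an event timestamp maps to its WINDOW_TIME, assuming millisecond timestamps (the class and method names are illustrative, not from the benchmark code):

```java
public class WindowTime {
    // End timestamp of the tumbling window containing eventTime.
    // For an event at 12:00:00.999 and a 1-second window this yields
    // 12:00:01.000, the WINDOW_TIME written to the output file.
    static long windowTime(long eventTime, long windowSizeMs) {
        return (eventTime / windowSizeMs + 1) * windowSizeMs;
    }

    public static void main(String[] args) {
        // Event at 12:00:00.999, expressed as ms since midnight.
        System.out.println(windowTime(43_200_999L, 1_000)); // prints 43201000
    }
}
```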

A sample output could look as follows:

1496930401000,AAPL,1704,1496930402260,1260
The first value is the window close timestamp, which indicates what time period this value is for (WINDOW_TIME). The second value is the stock ticker, and the third the count for that ticker within that window. The next value represents when the processing for that window was completed (CALCULATION_TIME), and the last value (LATENCY) is simply CALCULATION_TIME – WINDOW_TIME.

If the allowed delay was 1,000 ms, then this number should also be subtracted from LATENCY to find the real latency of the processing framework.
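The correction described above can be sketched as follows (the names are hypothetical; the benchmark's own parser is the Python script mentioned below):

```java
public class LatencyCalc {
    // LATENCY as written to the output file is CALCULATION_TIME - WINDOW_TIME;
    // subtracting the configured allowed delay gives the framework's own latency.
    static long frameworkLatency(long calculationTime, long windowTime, long allowedDelayMs) {
        return calculationTime - windowTime - allowedDelayMs;
    }

    public static void main(String[] args) {
        // Window ends at 12:00:01.000, output at 12:00:02.100 (ms since
        // midnight), with a 1-second allowed delay:
        System.out.println(frameworkLatency(43_202_100L, 43_201_000L, 1_000)); // prints 100
    }
}
```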

The following windowing combinations are tested:

  • 1 second tumbling window
  • 10 seconds by 1 second sliding window
  • 60 seconds by 1 second sliding window
  • Allowed out-of-orderness of 1 second in all cases
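In the sliding cases, each event contributes to size/slide windows rather than just one. A sketch of which window end times an event falls into, with illustrative names and millisecond units (not the benchmark code):

```java
public class SlidingWindows {
    // End timestamps of all sliding windows that contain eventTime,
    // for a window of sizeMs sliding by slideMs (assumes sizeMs % slideMs == 0).
    static long[] windowEnds(long eventTime, long sizeMs, long slideMs) {
        int n = (int) (sizeMs / slideMs);   // e.g. 10 windows for 10 s by 1 s
        long[] ends = new long[n];
        long firstEnd = (eventTime / slideMs + 1) * slideMs;
        for (int i = 0; i < n; i++) {
            ends[i] = firstEnd + i * slideMs;
        }
        return ends;
    }

    public static void main(String[] args) {
        // An event at t=1500 ms with a 10 s by 1 s sliding window falls into
        // the 10 windows ending at 2000, 3000, ..., 11000.
        System.out.println(java.util.Arrays.toString(windowEnds(1_500, 10_000, 1_000)));
    }
}
```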

The output files described above are parsed by a simple log parser written in Python to calculate the average latencies.

All source is available here: big-data-benchmark

Test Environment

All tests were performed on AWS over a 10 Gbit network, using compute-optimized c4.8xlarge EC2 instances.

Two-node clusters were used for Jet, Flink, and Spark.


Kafka setup:

  • Version 0.10.2 – Scala 2.12
  • 4 nodes
  • 72 partitions (18 per node)

The limit of messages that can be pushed to one broker was observed to be around 1.2M messages/sec.

All system clocks are synchronized before running the tests.

Notes about Spark Streaming

Spark Streaming, unlike Jet or Flink, does not support dealing with event time. It's not clear which timestamp is used for the events, but it is likely the time at which Spark ingests the event. Spark Streaming also has a very different programming model: you set a micro-batch interval, in essence for the whole JVM, and a batch job is run repeatedly at that interval. As such, the latency results for Spark are not directly comparable to Flink or Jet. There is a new Spark API called "Structured Streaming" which supports event time; it is still alpha in Spark 2.1.

It also outputs a new folder per window (so it creates a new folder every second for 1-second windows), since each window is handled by a different batch job.


Results

All results are aggregation latencies, given in milliseconds.

In summary, Jet’s aggregation latency stayed flat at 20-75 ms independent of the message rate per second. The other products saw latencies increase with message throughput.

1 second tumbling window

Messages / sec   10,000   100,000   1,000,000   5,000,000
Jet                  57        48          37          20
Flink               201       204         195        1371
Spark               189       273         307         600

10 seconds by 1 second sliding window

Messages / sec   10,000   100,000   1,000,000   5,000,000
Jet                  75        37          37          21
Flink               208       210         174       5700*
Spark               188       240         307         615

60 seconds by 1 second sliding window

Messages / sec   10,000   100,000   1,000,000   5,000,000
Jet                  41        37          37          20
Flink               210       235         241      27088*
Spark               196       246         382         623

*: Latency kept increasing without bound because Flink wasn’t able to keep up.