Jet 0.3 vs Flink & Spark Benchmark

Comparison

Benchmark

Word Count: the total size of the input files is given in parentheses.

  • 1 million distinct words (64GB)
  • 1 million distinct words (640GB)
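For reference, the job each engine runs is a classic word count: split every line into words and count how many times each distinct word occurs. The following is a minimal single-process Python sketch of that logic only. It is not the distributed benchmark code, and the function name `word_count` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def word_count(lines: Iterable[str]) -> Counter:
    """Count occurrences of each whitespace-separated word across all lines."""
    counts: Counter = Counter()
    for line in lines:
        # Counter.update with an iterable adds 1 for every element it sees.
        counts.update(line.split())
    return counts


if __name__ == "__main__":
    sample = ["0 1 2 3", "2 3 4", "3"]
    counts = word_count(sample)
    print(counts["3"])  # 3
```

Jet, Flink and Spark each distribute the same two steps: a tokenizing stage that emits words, and a grouping stage that sums the counts per word.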

All data sets are distributed evenly across all 10 nodes. Each file contains multiple lines of 20 words each. All words are numeric, ranging from 0 up to the maximum distinct number, so a typical file looks like this:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
...
........................... 999998 999999 1000000
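Input in this shape can be produced with a short generator. The sketch below assumes that words are emitted in ascending order and wrap back to 0 after the last distinct value, which matches the sample lines but is only an assumption. The generator actually used for the benchmark data sets may differ, and `generate_lines` is a hypothetical name.

```python
from typing import Iterator, List


def generate_lines(distinct_words: int, total_words: int,
                   words_per_line: int = 20) -> Iterator[str]:
    """Yield space-separated lines of numeric words.

    Assumption: words are emitted as 0, 1, 2, ... and wrap back to 0 after
    distinct_words - 1, so every value in the range recurs evenly.
    """
    line: List[str] = []
    for i in range(total_words):
        line.append(str(i % distinct_words))
        if len(line) == words_per_line:
            yield " ".join(line)
            line = []
    if line:  # emit a final, shorter line if total_words is not a multiple
        yield " ".join(line)


if __name__ == "__main__":
    # Prints the first two lines of the sample file shown above.
    for text in generate_lines(distinct_words=1_000_000, total_words=40):
        print(text)
```

Scaling `total_words` up until the files reach the target size (64GB or 640GB) would give a data set with the same structure.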

All source code is available here: https://github.com/hazelcast/big-data-benchmark

Test Environment

10 Servers running:

Hardware: HP DL380p Gen9
CPU: 2x Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz (20 cores in total, 40 with Hyper-Threading)
CPU: 2.5GHz Intel Core i7-4870HQ, 4 cores
RAM: 768 GB
Storage: 240 GB SSD
Network: 1 Gb network, used for all tests
OS: Red Hat Enterprise Linux Server release 7.1
Java:
Hadoop: 2.7.2

Results Summary

Jet Benchmark 0.3 Summary

Hazelcast Jet 0.3

10 Nodes – 20 GB JVM Heap per Node

Data Set              Duration (secs)   Throughput (MB/s)
1M distinct (64GB)    37.12             1,769.87
1M distinct (640GB)   338.15            1,942.85
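The throughput column appears to be the input size divided by the job duration. Multiplying the two published columns back together gives the same input size for every engine in this benchmark: about 65,700 MB for the 64GB data set and about 657,000 MB for the 640GB one. This is an inference from the tables, not a formula stated by the authors, and the helper names below are hypothetical.

```python
def throughput_mb_s(size_mb: float, duration_s: float) -> float:
    """Throughput in MB/s: input size divided by wall-clock duration."""
    return size_mb / duration_s


def implied_size_mb(throughput: float, duration_s: float) -> float:
    """Invert the formula to recover the input size behind a table row."""
    return throughput * duration_s


if __name__ == "__main__":
    # Jet 0.3 rows: (throughput in MB/s, duration in secs)
    print(round(implied_size_mb(1769.87, 37.12)))    # 65698
    print(round(implied_size_mb(1942.85, 338.15)))   # 656975
```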

Apache Flink 1.2.0

On Heap

10 Nodes – 20 GB JVM Heap per Node – No Off Heap

Configuration:

taskmanager.heap.mb: 20480
taskmanager.numberOfTaskSlots: 40
taskmanager.network.numberOfBuffers: 64000
taskmanager.memory.preallocate: true
taskmanager.debug.memory.startLogThread: true
env.java.opts.taskmanager: -XX:+PrintGC -XX:+PrintGCTimeStamps -Xloggc:/home/can/appdisk/flink.gc.log

Data Set              Duration (secs)   Throughput (MB/s)
1M distinct (64GB)    95.35             689.01
1M distinct (640GB)   767.59            855.89

Off Heap

10 Nodes – 20 GB JVM Heap per Node – 20GB Off Heap

Configuration:

taskmanager.heap.mb: 20480
taskmanager.memory.off-heap: true
taskmanager.numberOfTaskSlots: 40
taskmanager.network.numberOfBuffers: 64000
taskmanager.memory.preallocate: true
taskmanager.debug.memory.startLogThread: true
env.java.opts.taskmanager: -XX:+PrintGC -XX:+PrintGCTimeStamps -Xloggc:/home/can/appdisk/flink.gc.log

Data Set              Duration (secs)   Throughput (MB/s)   Notes
1M distinct (64GB)    89.7              732.41
1M distinct (640GB)   N/A               N/A                 Not run, since no significant difference from the on-heap result was anticipated

Apache Spark 2.1.0

On Heap

10 Nodes – 20 GB JVM Heap per Node – No Off Heap

Configuration:

spark.driver.memory 16g
spark.executor.memory 20g
spark.executor.cores 40
spark.executor.extraJavaOptions -XX:+PrintGC -XX:+PrintGCTimeStamps -Xloggc:/home/can/appdisk/spark.gc.log

Data Set              Duration (secs)   Throughput (MB/s)
1M distinct (64GB)    85.54             768.03
1M distinct (640GB)   796.14            825.20

Off Heap

10 Nodes – 20 GB JVM Heap per Node – 20GB Off Heap

Configuration:

spark.driver.memory 16g
spark.executor.memory 20g
spark.executor.cores 40
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 20g
spark.executor.extraJavaOptions -XX:+PrintGC -XX:+PrintGCTimeStamps -Xloggc:/home/can/appdisk/spark.gc.log

Data Set              Duration (secs)   Throughput (MB/s)   Notes
1M distinct (64GB)    85.17             771.37
1M distinct (640GB)   N/A               N/A                 Not run, since no significant difference from the on-heap result was anticipated
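To compare the engines directly, the durations from the result tables above can be turned into ratios. The sketch below uses only the published numbers. It reports how many times faster Jet finished than each on-heap configuration, and by what percentage enabling off-heap memory shortened the 64GB runs. The engine labels and helper names are hypothetical.

```python
# Durations in seconds, copied from the result tables above.
DURATIONS = {
    "Jet 0.3": {"64GB": 37.12, "640GB": 338.15},
    "Flink 1.2.0 (on-heap)": {"64GB": 95.35, "640GB": 767.59},
    "Spark 2.1.0 (on-heap)": {"64GB": 85.54, "640GB": 796.14},
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate finished than the baseline."""
    return baseline_s / candidate_s


def relative_speedups(durations, reference="Jet 0.3"):
    """Speedup of the reference engine over every other engine, per data set."""
    ref = durations[reference]
    return {
        engine: {ds: round(speedup(times[ds], ref[ds]), 2) for ds in times}
        for engine, times in durations.items()
        if engine != reference
    }


def pct_faster(before_s: float, after_s: float) -> float:
    """Percentage reduction in duration going from before_s to after_s."""
    return 100.0 * (before_s - after_s) / before_s


if __name__ == "__main__":
    for engine, ratios in relative_speedups(DURATIONS).items():
        print(engine, ratios)
    # Flink 1.2.0 (on-heap) {'64GB': 2.57, '640GB': 2.27}
    # Spark 2.1.0 (on-heap) {'64GB': 2.3, '640GB': 2.35}
    print(f"Flink off-heap, 64GB: {pct_faster(95.35, 89.7):.1f}% faster")   # 5.9%
    print(f"Spark off-heap, 64GB: {pct_faster(85.54, 85.17):.1f}% faster")  # 0.4%
```

The small off-heap gains on the 64GB data set are consistent with the decision not to repeat the off-heap runs on the 640GB data set.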

Hazelcast.com
