Hazelcast on AWS – Kernel Bug

The other day we discovered a bug in older Linux kernel versions and I thought you would like to know about it.

Summary

Use Linux kernel 3.19+ when running Hazelcast on AWS. With older kernel versions, TCP connections can get stuck: they appear to be healthy, but no data flows through them. This can result in hard-to-explain timeouts.

The Gory Details

Read this: http://johnjianfang.blogspot.bg/2015/08/xen-driver-bug-xennetfront-xennet-skb.html

tl;dr: There is a bug in the Xen network driver, and AWS happens to use Xen for virtualization. A workaround is to disable the buggy feature (sudo ethtool -K eth0 sg off, which turns off scatter-gather), but that comes at a performance cost, so it is better to run a kernel version that contains the fix, i.e. 3.19 or later. I assume Linux distribution vendors have back-ported the fix to older kernel versions, but I have not checked that.
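To see quickly whether a given instance might be affected, you can compare its kernel release against 3.19. The following is a minimal sketch in Python, not an official Hazelcast tool: the function names and the FIXED_IN constant are made up for illustration, and the check only looks at the version number. A distribution kernel older than 3.19 may already carry a back-ported fix, so treat a warning as a hint to check with your vendor, not as proof.

```python
import platform
import re

# Assumption taken from the post: upstream kernels 3.19 and later contain the fix.
FIXED_IN = (3, 19)


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '3.13.0-48-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def may_be_affected(release):
    """True if the version number alone is older than the fixed release.

    This is a heuristic: it cannot detect fixes back-ported by a distribution.
    """
    return parse_kernel_release(release) < FIXED_IN


if __name__ == "__main__":
    if platform.system() == "Linux":
        release = platform.release()
        if may_be_affected(release):
            print("Kernel %s is older than 3.19; check for a back-ported fix." % release)
        else:
            print("Kernel %s is 3.19 or newer." % release)
    else:
        print("Not running on Linux; nothing to check.")
```

Run it on the instance itself. The same comparison can be done by eye with uname -r.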

Further Sources:

Kernel Patches: