Zero Downtime, Real Pain: Schema Evolution in Cached, Live Systems
Zero-downtime upgrades in distributed Java systems are table stakes if you care about uptime.
The real problem? During a rolling upgrade, multiple versions of your service run at the same time and operate on the same cached or shared state. Everything looks fine. Dashboards are green. No alerts.
Until subtle inconsistencies start showing up under load. Sometimes hours later.
The compiler won’t catch it. Deployment won’t fail. Different versions of your code make different assumptions about the same in-memory data. That mismatch, not serialization, is what causes production incidents.
Avro and Protobuf handle boundaries well. But once data is cached, mutated, and executed against inside a distributed runtime, schema evolution becomes a runtime behavior problem.
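To make that failure mode concrete, here is a minimal sketch in plain Java. It does not use Hazelcast or any serialization library: the `CACHE` map, the `AccountV1` and `AccountV2` classes, and the `currency` field are all hypothetical stand-ins chosen for illustration. It shows one way a mismatch can arise: an old version does a read-modify-write through its own class and silently drops a field that a newer version wrote.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation: a shared cache where each record is a field-name -> value map.
class LostFieldDemo {
    static final Map<String, Map<String, Object>> CACHE = new HashMap<>();

    // Version 1 of the class knows only "id" and "balance".
    static final class AccountV1 {
        final String id;
        long balance;

        AccountV1(String id, long balance) {
            this.id = id;
            this.balance = balance;
        }

        static AccountV1 fromRecord(Map<String, Object> r) {
            return new AccountV1((String) r.get("id"), (Long) r.get("balance"));
        }

        Map<String, Object> toRecord() {
            Map<String, Object> r = new HashMap<>();
            r.put("id", id);
            r.put("balance", balance);
            return r;
        }
    }

    // Version 2 adds a "currency" field.
    static final class AccountV2 {
        final String id;
        final long balance;
        final String currency;

        AccountV2(String id, long balance, String currency) {
            this.id = id;
            this.balance = balance;
            this.currency = currency;
        }

        Map<String, Object> toRecord() {
            Map<String, Object> r = new HashMap<>();
            r.put("id", id);
            r.put("balance", balance);
            r.put("currency", currency);
            return r;
        }
    }

    // A v2 node creates the entry, including the new field.
    static void v2Create(String id, long balance, String currency) {
        CACHE.put(id, new AccountV2(id, balance, currency).toRecord());
    }

    // A v1 node updates the same entry through its own class.
    static void v1Deposit(String id, long amount) {
        AccountV1 account = AccountV1.fromRecord(CACHE.get(id));
        account.balance += amount;
        // Writes back only the fields v1 knows about, so "currency" is dropped.
        CACHE.put(id, account.toRecord());
    }

    static String v2ReadCurrency(String id) {
        return (String) CACHE.get(id).get("currency");
    }

    public static void main(String[] args) {
        v2Create("acc-1", 100L, "EUR");
        v1Deposit("acc-1", 50L);
        System.out.println("balance  = " + CACHE.get("acc-1").get("balance"));
        System.out.println("currency = " + v2ReadCurrency("acc-1"));
    }
}
```

Nothing throws and nothing fails to deserialize. The balance is correct, the currency is simply gone, and the v2 code that reads it next has to cope with a value it assumed was always present.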
In this session, we’ll break down application vs. platform rolling upgrades using Hazelcast and Compact Serialization as a concrete example. You’ll see what really happens when two versions of the same class operate on shared live data, how compatible and incompatible changes behave, and what forward compatibility actually requires in real code, with Java examples and mixed-version tests.
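As a rough idea of what a mixed-version test can look like, the sketch below again uses plain Java maps rather than Hazelcast's actual API. The names `ForwardCompatDemo`, `readCurrency`, `v1Deposit`, and the `"USD"` default are assumptions made for illustration, not material from the session. It exercises two habits that tend to make additive changes safe: the newer reader supplies a default when a field is absent, and the older updater touches only the fields it knows and leaves the rest of the record alone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of records shared between two code versions.
class ForwardCompatDemo {
    // Assumed default for records written before "currency" existed.
    static final String DEFAULT_CURRENCY = "USD";

    // v2 reader: tolerates records written by v1, which have no "currency" field.
    static String readCurrency(Map<String, Object> record) {
        Object value = record.get("currency");
        return (value instanceof String) ? (String) value : DEFAULT_CURRENCY;
    }

    // v1 updater: changes only the field it knows, preserving unknown fields.
    static void v1Deposit(Map<String, Object> record, long amount) {
        long balance = (Long) record.get("balance");
        record.put("balance", balance + amount);
    }

    // Helper that builds a record as v1 would write it.
    static Map<String, Object> v1Record(String id, long balance) {
        Map<String, Object> r = new HashMap<>();
        r.put("id", id);
        r.put("balance", balance);
        return r;
    }

    // Helper that builds a record as v2 would write it.
    static Map<String, Object> v2Record(String id, long balance, String currency) {
        Map<String, Object> r = v1Record(id, balance);
        r.put("currency", currency);
        return r;
    }

    public static void main(String[] args) {
        // Old data read by new code: the missing field falls back to the default.
        Map<String, Object> oldRecord = v1Record("acc-1", 100L);
        System.out.println("v2 reads v1 record: " + readCurrency(oldRecord));

        // New data updated by old code: the unknown field survives the update.
        Map<String, Object> newRecord = v2Record("acc-2", 100L, "EUR");
        v1Deposit(newRecord, 50L);
        System.out.println("after v1 update:    " + newRecord.get("balance")
                + " " + readCurrency(newRecord));
    }
}
```

A real mixed-version test would run both versions of the class against a shared cluster and go through the platform's serialization layer. The point of the sketch is only the shape of the assertions: old data read by new code, and new data updated by old code.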
You’ll leave with a practical mental model for upgrading stateful distributed systems safely, and designing changes that behave predictably in production.