Schema Evolution in Hazelcast: Evolve your data without downtime
Applications change continuously: business rules evolve, new requirements appear, and data models must adapt to keep up. In systems that serve critical workloads, those changes must happen while the system is live.
With a shared data layer like Hazelcast, you often end up with multiple application versions running at once, all reading and writing the same data but with different object layouts. Without explicit support for schema evolution, old code can’t read new data, and upgrades may force downtime.
Hazelcast solves this with Compact Serialization and its support for schema evolution. In this post, I’ll walk through how this works in practice, what changes are safe, which ones are not, and how to evolve live data resiliently.
The internals of Compact Serialization (binary layout, schema fingerprint generation, replication, and partial deserialization) are covered in this blog post, so I won’t repeat them here.
Many thanks to Krzysztof Jamróz for helping clarify edge cases around schema evolution and Compact Serialization and for reviewing this post.
Hazelcast support for schema evolution
With Compact Serialization, Hazelcast separates an object’s schema (its field names and types) from the binary data stored in memory or on disk.
Each distinct schema is identified by a 64-bit fingerprint derived from its structure, and each record references the fingerprint of the schema used when it was written. This allows multiple schema versions of the same logical type to coexist in the cluster. Schemas are registered when first encountered and exchanged between members and clients on demand, and when persistence is enabled, they are stored alongside the data to allow exact recovery after a restart.
Key Terms
- Schema: the structural definition of a serialized object, including its field names and types
- Type name (typeName): a unique and stable identifier associated with the Compact type. It binds all schema versions for a given logical class
- Fingerprint: a 64-bit identifier derived from the schema definition. Hazelcast uses a Rabin Fingerprint algorithm to determine the identifier, and every time the schema changes, a new fingerprint is generated.
Hazelcast uses the typeName to decide what an object represents, and the fingerprint to determine which version of the schema was used.
Compatible vs incompatible changes
When a schema changes, compatibility is crucial because, during rolling upgrades of the application, both old and new versions may run concurrently, reading and writing the same data through a shared Hazelcast cluster.
From Hazelcast’s point of view, a change is compatible if existing readers can still deserialize data written with the new schema. From the application’s point of view, it is only compatible if the data still makes sense to the business logic. In practice, compatible changes allow old and new schema versions to coexist during application rolling upgrades against a stable Hazelcast cluster, while incompatible changes require explicit migration to transform existing data.
As a rule of thumb, compatible changes include adding optional fields, removing fields that readers no longer depend on, widening numeric types when readers handle both representations, and extending nested objects in an additive way. These changes can usually be introduced without rewriting existing data.
Incompatible changes include narrowing types, renaming fields, changing field meaning, restructuring nested types, switching between unrelated type families, changing date or timestamp semantics (such as time zones or epoch units), and altering partitioning or key structure. These changes require explicit migration, because schema evolution alone does not address semantic correctness.
The key distinction is that compatibility here refers to serialization, not meaning. Date and time changes are a common case where deserialization succeeds but semantics change, so migrations should always be explicit and verified.
More details on compatible vs incompatible changes are available in Hazelcast’s official documentation.
A simple Compact example
Full code in support of this blog is available on GitHub in the schema-migration-samples repo.
To show how to handle changes in practice, we’ll work through a concrete example. Let’s start with a basic Order class:
public record Order(long id, long customerId, BigDecimal amount, String status) {}
and the respective compact OrderSerializer:
public final class OrderSerializer implements CompactSerializer<Order> {
@Override
public String getTypeName() {
return "com.acme.Order";
}
@Override
public Class<Order> getCompactClass() {
return Order.class;
}
@Override
public void write(CompactWriter w, Order o) {
w.writeInt64("id", o.id());
w.writeInt64("customerId", o.customerId());
w.writeDecimal("amount", o.amount());
w.writeString("status", o.status());
}
@Override
public Order read(CompactReader r) {
return new Order(
r.readInt64("id"),
r.readInt64("customerId"),
r.readDecimal("amount"),
r.readString("status")
);
}
}
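Before the serializer can be used, it has to be registered with the member or client configuration. A minimal member-side sketch (the map name orders and the sample values are assumptions used for illustration):

Config config = new Config();
config.getSerializationConfig()
      .getCompactSerializationConfig()
      .addSerializer(new OrderSerializer()); // registers typeName "com.acme.Order"

HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

IMap<Long, Order> orders = hz.getMap("orders");
orders.put(1L, new Order(1L, 42L, new BigDecimal("19.99"), "NEW"));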
Strategies to handle compatible schema evolution
Using the previous example as a starting point, we can now look at practical strategies for handling schema evolution. For cached data, applying these strategies means storing objects written with different but compatible schema versions in the same map.
Keeping typeName stable for compatible changes
In Compact Serialization, the typeName acts as the logical identity of a type. Hazelcast uses it to decide whether two schemas describe the same concept or represent entirely different data.
As long as schema changes are compatible, different schema versions can safely share the same typeName. In that case, Hazelcast treats them as evolutions of the same type, allowing old and new application code to exchange data transparently. When the typeName changes, that continuity is deliberately broken: Hazelcast no longer attempts to relate the new schema to the old one, and the data is treated as belonging to a different type.
This distinction becomes important during rolling application upgrades, where multiple schema versions may coexist in the cluster at the same time.
Deterministic writes
Compact schema fingerprints are derived from the fields written by the serializer. If the set of written fields changes depending on runtime conditions or object state, Hazelcast may observe multiple schemas for what is logically the same type.
This is where problems start to appear. Multiple fingerprints for the same logical structure make schema evolution unpredictable and can lead to unnecessary schema registrations or deserialization failures. In practice, serializers that always write the same fields in the same order produce stable fingerprints and predictable evolution behavior.
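To make the anti-pattern concrete, here is a hypothetical write method that only writes a field when it has a value, producing two different fingerprints for what is logically the same type:

// Anti-pattern: the set of written fields depends on object state,
// so Hazelcast registers two schemas (and fingerprints) for the same typeName.
@Override
public void write(CompactWriter w, Order o) {
    w.writeInt64("id", o.id());
    w.writeInt64("customerId", o.customerId());
    w.writeDecimal("amount", o.amount());
    if (o.status() != null) {           // conditional write: avoid this
        w.writeString("status", o.status());
    }
}

Since Compact string fields are nullable, the deterministic alternative is to always call writeString and pass null when the value is absent, as the original OrderSerializer does.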
Safely evolve Order
Let’s see how to apply the strategies described above.
Let’s introduce a new OrderV2 class that adds a new field, currency, to the schema. This is a compatible change that allows Order and OrderV2 objects to be serialized into the same map (e.g. orders).
public record OrderV2(long id, long customerId, BigDecimal amount, String status, String currency) {}
public final class OrderV2Serializer implements CompactSerializer<OrderV2> {
@Override
public String getTypeName() {
return "com.acme.Order"; // same typeName: same logical type
}
@Override
public Class<OrderV2> getCompactClass() {
return OrderV2.class;
}
@Override
public void write(CompactWriter w, OrderV2 o) {
w.writeInt64("id", o.id());
w.writeInt64("customerId", o.customerId());
w.writeDecimal("amount", o.amount());
w.writeString("status", o.status());
w.writeString("currency", o.currency());
}
@Override
public OrderV2 read(CompactReader r) {
String currency = "GBP";
if (r.getFieldKind("currency") == FieldKind.STRING) {
currency = Optional.ofNullable(r.readString("currency")).orElse("GBP");
}
return new OrderV2(
r.readInt64("id"),
r.readInt64("customerId"),
r.readDecimal("amount"),
r.readString("status"),
currency
);
}
}
Both serializers share the same typeName and deterministic field order. OrderV2Serializer remains backward compatible by providing a default for currency when the field is missing.
As a result:
- When OrderSerializer reads an OrderV2 object, currency is ignored. This applies to full Compact deserialization paths; SQL and index evaluation operate directly on stored fields.
- When OrderV2Serializer reads an Order object, currency is defaulted.
Backward and forward compatibility are maintained.
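Concretely, during a rolling upgrade the two application versions simply register different serializers for the same typeName and keep using the same map. A client-side sketch (the deployment details are assumptions):

// Old application instances register the original serializer...
ClientConfig oldCfg = new ClientConfig();
oldCfg.getSerializationConfig()
      .getCompactSerializationConfig()
      .addSerializer(new OrderSerializer());

// ...while upgraded instances register the V2 serializer.
ClientConfig newCfg = new ClientConfig();
newCfg.getSerializationConfig()
      .getCompactSerializationConfig()
      .addSerializer(new OrderV2Serializer());

// Both read and write the same "orders" map; the shared typeName
// "com.acme.Order" lets the two schema versions coexist.
HazelcastInstance newClient = HazelcastClient.newHazelcastClient(newCfg);
IMap<Long, OrderV2> orders = newClient.getMap("orders");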
When a change breaks compatibility
As discussed above, some changes break old readers that cannot interpret the data correctly.
Once this compatibility is broken, coexistence is no longer sufficient. Schema evolution alone cannot bridge the gap, because the structure or semantics of the data no longer match what the application expects. At that point, existing data must be transformed to align with the new schema and its meaning.
This is why incompatible changes require explicit migration. Migration makes the transition intentional by converting data written under the old schema into a form that the new version can consume safely, restoring correctness before normal operation resumes.
There are several ways to implement migration once compatibility is broken.
In some cases, applications embed an explicit version field and handle multiple representations in the reader logic. In others, data is migrated explicitly, and old and new representations are kept separate during the transition. Here, we focus on a simple and explicit approach based on using multiple map versions, which makes schema boundaries clear and keeps migration logic isolated from normal read and write paths.
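For reference, the version-field alternative usually looks something like this in the reader. This is a hypothetical sketch; the schemaVersion field and the branching are illustrative and not part of the sample repo:

@Override
public Order read(CompactReader r) {
    // Records written before the version field existed default to version 1.
    int version = r.getFieldKind("schemaVersion") == FieldKind.INT32
            ? r.readInt32("schemaVersion")
            : 1;
    // Pick the field that exists in the version the record was written with.
    long customerId = version >= 2
            ? r.readInt64("accountId")
            : r.readInt64("customerId");
    return new Order(r.readInt64("id"), customerId,
            r.readDecimal("amount"), r.readString("status"));
}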
Let’s create a new schema for Order that introduces a breaking change by renaming the field customerId to accountId.
public record OrderV3(long id, long accountId, BigDecimal amount, String status, String currency) {}
The new serializer is:
public final class OrderV3Serializer implements CompactSerializer<OrderV3> {
@Override
public String getTypeName() {
return "com.acme.OrderV3"; // new typeName: new schema
}
@Override
public Class<OrderV3> getCompactClass() {
return OrderV3.class;
}
@Override
public void write(CompactWriter w, OrderV3 o) {
w.writeInt64("id", o.id());
w.writeInt64("accountId", o.accountId());
w.writeDecimal("amount", o.amount());
w.writeString("status", o.status());
w.writeString("currency", o.currency());
}
@Override
public OrderV3 read(CompactReader r) {
return new OrderV3(
r.readInt64("id"),
r.readInt64("accountId"),
r.readDecimal("amount"),
r.readString("status"),
r.readString("currency")
);
}
}
By introducing a new typeName (com.acme.OrderV3), the new schema is explicitly separated from earlier versions. This makes it clear that the data represents a different structure and avoids ambiguity during reads and writes.
Data written under the old schema can no longer be used directly and must be transformed to match the new representation. Keeping old and new data separate during this transition makes the change easier to reason about and validate.
Because each map maintains its own Compact schemas, the two representations remain isolated and recoverable throughout the migration. In the following sections, we use Hazelcast Jet to show a simple way to perform this transformation.
Migration pipeline example
For simplicity, assume that the orders map currently contains only OrderV2 records. Migrating existing data then becomes a batch operation: entries are read from the old representation, transformed, and written using the new schema.
A Jet pipeline provides a concise way to express this transformation and runs to completion once all existing records have been rewritten.
Pipeline bulk = Pipeline.create();
bulk.readFrom(Sources.<Long, OrderV2>map("orders"))
.map(e -> Map.entry(
e.getValue().id(),
new OrderV3(/* transformed value */)
))
.writeTo(Sinks.map("orders_v3"));
hz.getJet().newJob(bulk).join();
In practice, migrations often run alongside a rolling application upgrade. While the batch job is processing existing entries, older versions of the application may still be writing to the original map.
To keep the new representation consistent during this transition, a second pipeline can consume changes from the source map’s event journal and apply them as they occur.
Pipeline tail = Pipeline.create();
tail.readFrom(
Sources.<Long, OrderV2>mapJournal(
"orders",
JournalInitialPosition.START_FROM_CURRENT
)
)
.withoutTimestamps() // a streaming stage needs a timestamp policy before further transforms
.map(e -> {
if (e.getValue() != null) {
return Map.entry(
e.getValue().id(),
new OrderV3(/* transformed value */)
);
} else {
return null; // mapped-to-null items are dropped; delete handling is covered in the full pipeline
}
})
.writeTo(Sinks.map("orders_v3"));
hz.getJet().newJob(tail);
The batch pipeline handles the existing dataset, while the journal-driven pipeline keeps the new schema up to date by propagating inserts, updates, and deletes as they happen. Together, these pipelines allow data to be migrated incrementally without stopping the application until all clients have transitioned to the new schema.
The tail pipeline in the above code snippet is logically correct but incomplete. The full pipeline is available on GitHub (com.fcannizzohz.samples.schemaevolution.migration.V2toV3PipelineFactory#createTailPipeline).
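Note that the mapJournal source requires the event journal to be enabled on the source map, otherwise the streaming job cannot read changes. A minimal member-side config sketch (the capacity value is an assumption; size it for your migration window):

Config config = new Config();
config.getMapConfig("orders")
      .getEventJournalConfig()
      .setEnabled(true)
      .setCapacity(100_000); // must cover the events produced during the migration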
Persistence and MapStore
If you use Persistence or MapStore, Compact schemas are saved alongside map data.
Each record stores its schema fingerprint, so Hazelcast can restore it exactly as it was after a restart.
So the strategies described above continue to work as expected.
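For completeness, enabling Persistence for the orders map looks roughly like this (Persistence is an Enterprise feature; the base directory is an assumption):

Config config = new Config();
config.getPersistenceConfig()
      .setEnabled(true)
      .setBaseDir(new File("/var/lib/hazelcast/persistence")); // assumed path
config.getMapConfig("orders")
      .getDataPersistenceConfig()
      .setEnabled(true);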
Extra care is needed when a backward-incompatible change requires migration and both the MapStore and the Jet pipeline operate on the same target map.
Since both the MapStore and the job may write the same key, the order of operations determines which value persists.
If a key is not in memory, Hazelcast may call the MapStore’s load or loadAll methods to fetch it from the external system. If such a load executes concurrently with the pipeline on the same key, the order matters and the last value wins.
The Jet pipeline can use Sinks.map, Sinks.mapWithMerging, Sinks.mapWithUpdating, or Sinks.mapWithEntryProcessor to decide what to do when it attempts to sink a key that already exists.
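For example, replacing the plain Sinks.map sink in the pipelines above with a merging sink makes the conflict policy explicit; this sketch keeps whatever the MapStore has already loaded (the keep-existing policy is an assumption, your rules may differ):

// Keep the value already present in orders_v3 (e.g. loaded by the MapStore)
// and only write the migrated value when the key is absent.
.writeTo(Sinks.mapWithMerging(
        "orders_v3",
        Map.Entry::getKey,
        Map.Entry::getValue,
        (existing, incoming) -> existing));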
Near cache behaviour
When Near Cache is enabled, it remains unaffected by Compact schema evolution.
Each map’s Near Cache stores its own serialized entries. Old clients use the old schema and cache; new clients use the new one. No invalidation is required.
SQL and Indexes
When Compact-serialized data is queried through Hazelcast SQL, fields are read directly from the stored binary using the schema associated with each record. SQL does not invoke the application’s CompactSerializer.read() logic, so any defaulting or migration implemented there is not applied at query time.
This means that both old and new data can be queried using either the original mapping or an extended one, once the cluster is running a backward-compatible serializer. With the original mapping, queries continue to see only the fields defined in the old schema. With an extended mapping, newly added fields become queryable, but records written under earlier schemas simply return NULL for those columns, while records written with the new schema return actual values.
This behaviour reflects SQL’s schema-on-read model: data is not rewritten when schemas evolve, and each row is interpreted using the schema version it was written with.
In our orders example, with the following mapping, orders written under the old schema (without the currency field) that match a SQL WHERE clause are still returned, but with a NULL currency.
CREATE OR REPLACE MAPPING orders (
id BIGINT,
customerId BIGINT,
amount DECIMAL,
status VARCHAR,
currency VARCHAR
)
TYPE IMap
OPTIONS (
'keyFormat' = 'bigint',
'valueFormat' = 'compact',
'valueCompactTypeName' = 'com.acme.Order' -- matches Compact typeName in serializer
);
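Running a query against this mapping shows the behaviour directly; a sketch using the Java SQL API (the status value is an assumption):

try (SqlResult result = hz.getSql().execute(
        "SELECT id, currency FROM orders WHERE status = 'NEW'")) {
    for (SqlRow row : result) {
        // Records written with the old schema return NULL here;
        // records written with the new schema return the stored value.
        String currency = row.getObject("currency");
        System.out.println(row.getObject("id") + " -> " + currency);
    }
}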
Indexes follow the same boundaries as the data they describe. They are defined per map and remain valid across compatible schema changes.
New indexes can be added for newly introduced fields without affecting existing queries. When a schema change is incompatible and data is migrated explicitly, corresponding indexes must be recreated to match the new representation before queries rely on them.
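For instance, once the currency field is in use, an index for it can be added through the map API; a sketch (whether this index is worthwhile depends on your query patterns):

IMap<Long, OrderV2> orders = hz.getMap("orders");
// Hash index on the newly introduced field; existing indexes and queries are unaffected.
orders.addIndex(new IndexConfig(IndexType.HASH, "currency"));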
Closing thoughts
Compact Serialization turns schema evolution into a manageable part of running Hazelcast-powered systems, rather than something to work around.
Different schema versions can coexist in the same cluster, and compatibility rules are explicit instead of implicit. That’s what makes it possible to evolve applications without synchronized upgrades or planned downtime.
Most changes in a live system are additive: new fields, optional data, or reshaped objects that old code can safely ignore. If readers tolerate missing or additional fields, these changes can be rolled out while the system is running. When a change breaks compatibility, that break is visible and deliberate. You know a migration is required, instead of discovering it later through failed deserialization or corrupted state.
Handled this way, schema evolution stops being a special event. Upgrades carry less risk and become part of normal operations rather than something that disrupts them.
More details are available in the Schema Evolution section of the Hazelcast docs.
Join our community and get started
Stay engaged with Hazelcast—join our Community Slack channel to connect with peers, share insights, and influence future features. Sign up for a free Enterprise Trial License to try all our enterprise features. Build scalable, resilient systems today!