Hazelcast Jet

Introducing Hazelcast Jet

In-Memory Stream and Batch Processing—Lightweight, Embeddable, Powerful

Hazelcast Jet is an application-embeddable, distributed computing platform for fast processing of big data sets. The Hazelcast Jet architecture is driven by high performance and low latency, based on a parallel, streaming core engine that enables data-intensive applications to operate at near real-time speeds.

Hazelcast Jet is built on top of Hazelcast IMDG, the leading open source in-memory data grid with tens of thousands of installed clusters. Hazelcast Jet processing jobs take full advantage of the distributed in-memory data structures provided by Hazelcast IMDG to:

  • Write operational results of a computation job to an in-memory NoSQL key-value store
  • Cache data for record enrichment, pre-processing and data cleaning
  • Enable in-memory messaging and notifications
  • Process the data stored in the IMDG, making use of data locality
  • Achieve breakthrough application speed by keeping both the computation and data in-memory

Typical use cases for Hazelcast Jet include:

  • Complex event processing
  • Data processing microservice architectures
  • Fast batch processing
  • Implementing Event Sourcing and CQRS
  • In-store e-commerce systems
  • Internet of Things (IoT) data ingestion, processing and storage
  • Online trading
  • Real-time big data processing
  • Social media platforms
  • Streaming analytics
  • System log events

Hazelcast Jet Performance


See Jet Performance for the architectural choices behind Jet's breakthrough performance.

Hazelcast Jet Features

Hazelcast Jet Architecture Diagram
Distributed Computation

Pipeline API

The Pipeline API is the primary API of Hazelcast Jet. It follows the general shape of any data processing pipeline: draw from source -> transform -> drain to sink.

The Jet library contains a set of Transforms covering standard data operations (map, filter, group, join). Source and Sink adapters are also available. The library can be extended with custom Transforms, Sources or Sinks.
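
For illustration, a minimal batch pipeline in this shape might look like the sketch below; the map names "lines" and "upper-lines" are placeholders, and the factory methods assume the drawFrom/drainTo form of the Pipeline API described above (exact names differ slightly between Jet versions).

    import com.hazelcast.jet.Jet;
    import com.hazelcast.jet.JetInstance;
    import com.hazelcast.jet.Util;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.Sources;

    public class PipelineSketch {
        public static void main(String[] args) {
            // Draw from source -> transform -> drain to sink.
            Pipeline p = Pipeline.create();
            p.drawFrom(Sources.<String, String>map("lines"))                       // source: IMap "lines"
             .map(e -> Util.entry(e.getKey(), e.getValue().toUpperCase()))         // transform each entry
             .drainTo(Sinks.map("upper-lines"));                                   // sink: IMap "upper-lines"

            // Start an embedded Jet member and run the job to completion.
            JetInstance jet = Jet.newJetInstance();
            jet.newJob(p).join();
            Jet.shutdownAll();
        }
    }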

See the Docs for this feature

Distributed Java 8 Streams

The java.util.stream API is well known and popular in the Java community. It supports functional-style operations on streams of elements.

Hazelcast Jet brings java.util.stream to the distributed world: the processing is distributed across the Jet cluster and parallelized. When java.util.stream is used on top of Hazelcast distributed data structures, data locality is exploited.
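
A rough sketch of the idea, assuming Jet's java.util.stream support in which jet.getMap() returns a stream-capable IStreamMap and results are collected with DistributedCollectors; the class names and the "ages"/"adults" maps are illustrative and version-dependent.

    import com.hazelcast.core.IMap;
    import com.hazelcast.jet.Jet;
    import com.hazelcast.jet.JetInstance;
    import com.hazelcast.jet.stream.DistributedCollectors;
    import com.hazelcast.jet.stream.IStreamMap;

    public class DistributedStreamSketch {
        public static void main(String[] args) {
            JetInstance jet = Jet.newJetInstance();

            // A distributed map of user ages (contents are illustrative).
            IStreamMap<String, Integer> ages = jet.getMap("ages");
            ages.put("alice", 34);
            ages.put("bob", 15);

            // A familiar java.util.stream chain, executed distributed and in parallel
            // across the Jet cluster, with data locality on the IMDG data.
            IMap<String, Integer> adults = ages.stream()
                    .filter(e -> e.getValue() >= 18)
                    .collect(DistributedCollectors.toIMap("adults"));

            System.out.println(adults.entrySet());
            Jet.shutdownAll();
        }
    }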

See the Docs for this feature
Stream Processing


Windowing

Because data streams are unbounded and can contain infinite sequences of records, a tool is needed to group individual records into finite frames so that the computation can run. Hazelcast Jet windows define this grouping.

Types of windows supported by Jet (see the sketch after the list):

  • Fixed/tumbling – The stream is divided into same-length, non-overlapping chunks. Each event belongs to exactly one window.
  • Sliding – Windows have fixed length, but are separated by a sliding interval, so they can overlap.
  • Session – Windows have various sizes and are defined based on session identifiers contained in the records. Sessions are closed after a period of inactivity (timeout).
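
The sketch below shows how these three window types might be declared with the Pipeline API's WindowDefinition factory; the time values are arbitrary examples and the method names are assumptions that can differ between Jet versions.

    import com.hazelcast.jet.pipeline.WindowDefinition;

    public class WindowSketch {
        // Illustrative window definitions (all values in milliseconds).
        static final WindowDefinition TUMBLING = WindowDefinition.tumbling(60_000);        // non-overlapping 1-minute windows
        static final WindowDefinition SLIDING  = WindowDefinition.sliding(60_000, 10_000); // 1-minute windows sliding every 10 s
        static final WindowDefinition SESSION  = WindowDefinition.session(30_000);         // closed after 30 s of inactivity
    }
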
See the Docs for this feature

Event Time Processing

Hazelcast Jet allows you to classify records in a data stream based on the timestamp embedded in each record, known as the event time. Event time processing is a natural requirement, since users are mostly interested in handling data based on the time the event originated. Event time processing is a first-class citizen in Jet.

For handling late events, a set of policies determines whether an event is still on time or late; late events are discarded.
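
As a sketch of the idea, the snippet below reads lines from a socket, takes the event time from a timestamp embedded at the start of each line, and tolerates up to 2 seconds of lateness; the addTimestamps call, the line format and Sinks.logger are assumptions about the API version.

    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.Sources;

    import java.nio.charset.StandardCharsets;

    public class EventTimeSketch {
        public static void main(String[] args) {
            Pipeline p = Pipeline.create();
            // Each line is assumed to start with an epoch-millis event timestamp,
            // e.g. "1510000000000 click" (format is illustrative).
            p.drawFrom(Sources.socket("localhost", 9999, StandardCharsets.UTF_8))
             // Use the embedded timestamp as event time; events arriving more than
             // 2 seconds behind the latest observed timestamp are treated as late.
             .addTimestamps(line -> Long.parseLong(line.split(" ")[0]), 2_000)
             .drainTo(Sinks.logger());
        }
    }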

See the Docs for this feature

Handling Back Pressure

In a streaming system, it is necessary to control the flow of messages: a consumer cannot be flooded with more messages than it can process in a fixed amount of time, while processors should not be left idle, wasting resources.

Hazelcast Jet comes with a mechanism to handle back pressure. Every part of a Jet job keeps signaling to all of its upstream producers how much free capacity it has. This information is naturally propagated upstream to keep the system balanced.

See the Docs for this feature

Distributed Snapshots

Hazelcast Jet is able to tolerate faults such as network failure, network split or node failure.

Jet supports distributed state snapshots and automatic restarts. Snapshots are created periodically and backed up. When a failure occurs (network failure or split, node failure), Jet automatically restarts all jobs in which the failed member was a participant, resuming them from the latest state snapshot.

See the Docs for this feature

Resilient Snapshot Storage

Hazelcast Jet uses distributed in-memory storage to back up the computation regularly. This storage is an integral component of the Jet cluster; no further infrastructure is necessary. Data is stored in multiple replicas distributed across the cluster to increase resiliency.

See the Docs for this feature

Exactly-Once or At-Least-Once

For snapshot creation, exactly-once or at-least-once semantics can be used. This is a trade-off between correctness and performance. It is configured per job.
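
For example, the guarantee and the snapshot interval could be set per job roughly as in this sketch; the 10-second interval is an arbitrary example and the configuration methods assume a Jet version with the JobConfig snapshotting options.

    import com.hazelcast.jet.Jet;
    import com.hazelcast.jet.JetInstance;
    import com.hazelcast.jet.config.JobConfig;
    import com.hazelcast.jet.config.ProcessingGuarantee;
    import com.hazelcast.jet.pipeline.Pipeline;

    public class GuaranteeSketch {
        public static void main(String[] args) {
            Pipeline pipeline = Pipeline.create();
            // ... define the streaming pipeline here ...

            JobConfig config = new JobConfig();
            // Per-job trade-off: EXACTLY_ONCE for correctness, AT_LEAST_ONCE for lower latency.
            config.setProcessingGuarantee(ProcessingGuarantee.EXACTLY_ONCE);
            // How often the distributed state snapshot is taken.
            config.setSnapshotIntervalMillis(10_000);

            JetInstance jet = Jet.newJetInstance();
            jet.newJob(pipeline, config);
        }
    }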

See the Docs for this feature


Elasticity

Hazelcast Jet supports the scenario where a new member joins the cluster while a job is running. Currently, the ongoing job will not be re-planned to start using the new member, although this is on the roadmap for future versions. The new member can also leave the cluster while the job is running, and this won't affect the job's progress.

See the Docs for this feature
In-Memory Data Grid

Hazelcast IMDG Integration

Hazelcast Jet is integrated with Hazelcast IMDG, an elastic in-memory data store, to provide highly optimized reads and writes to distributed, in-memory implementations of java.util.Map, javax.cache.Cache and java.util.List.

IMDG can be used for:

  • Data ingestion prior to processing.
  • Connecting multiple Jet jobs using IMDG as an intermediate buffer to keep jobs loosely coupled.
  • Enriching processed events by caching remote data (e.g., fact tables from a database) on the Jet nodes, making use of data locality.
  • Distributing Jet-processed data with Hazelcast IMDG.
  • Running advanced data processing tasks on top of Hazelcast data structures.
See the Docs for this feature

Embedded IMDG

The complete Hazelcast IMDG is embedded in Jet, so all IMDG services are available to your Jet jobs without any additional deployment effort. As Hazelcast IMDG is the embedded support structure for Jet, it is fully controlled by Jet (start, shutdown, scaling, etc.).

To isolate processing from storage, Jet can also read from or write to remote Hazelcast IMDG clusters.
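
A minimal illustration: starting an embedded Jet member also starts the embedded IMDG, so its data structures are immediately available (the "word-counts" map name is a placeholder).

    import com.hazelcast.core.IMap;
    import com.hazelcast.jet.Jet;
    import com.hazelcast.jet.JetInstance;

    public class EmbeddedImdgSketch {
        public static void main(String[] args) {
            // Starting a Jet member starts the embedded Hazelcast IMDG as well;
            // no separate IMDG deployment is needed.
            JetInstance jet = Jet.newJetInstance();

            // IMDG data structures are directly available to Jet jobs and to callers.
            IMap<String, Long> wordCounts = jet.getMap("word-counts");
            wordCounts.put("jet", 1L);

            Jet.shutdownAll();
        }
    }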

See the Docs for this feature

Connectors

Hazelcast IMDG

Jet provides high-performance readers and writers for Hazelcast IMap, ICache and IList. IMap and ICache are partitioned and distributed; Jet makes use of data locality, reading the data from the same node to avoid the network transit penalty.

The streaming connectors for IMap and ICache allow the user to treat the Hazelcast distributed map itself as a streaming source, where an event is created for every change that happens on the map. This allows the map to be used as a source of events during a streaming job.
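
A hedged sketch of the streaming source: with the map's event journal enabled in the IMDG configuration, every change to the map becomes a stream item (the mapJournal source, START_FROM_OLDEST and the "trades" map are assumptions about the connector API).

    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.Sources;

    import static com.hazelcast.jet.pipeline.JournalInitialPosition.START_FROM_OLDEST;

    public class MapJournalSketch {
        public static void main(String[] args) {
            Pipeline p = Pipeline.create();
            // Consume the distributed map "trades" as a stream of change events,
            // starting from the oldest entry retained in its event journal.
            p.drawFrom(Sources.<String, String>mapJournal("trades", START_FROM_OLDEST))
             .drainTo(Sinks.logger());
        }
    }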

See the Docs for this feature

Kafka Connector

Hazelcast Jet uses message brokers to ingest data streams and can act as a data processor connected to a message broker in the data pipeline.

Jet comes with a Kafka connector for reading from and writing to Kafka topics.
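
As an illustration, consuming a Kafka topic might look like the sketch below; the broker address, deserializers and the "trades" topic are placeholders, and KafkaSources is assumed to come from the Jet Kafka connector module.

    import com.hazelcast.jet.kafka.KafkaSources;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;

    import java.util.Properties;

    public class KafkaSketch {
        public static void main(String[] args) {
            // Kafka consumer properties (values are illustrative).
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.setProperty("auto.offset.reset", "earliest");

            Pipeline p = Pipeline.create();
            // Read the "trades" topic as a stream of key-value records and log them.
            p.drawFrom(KafkaSources.kafka(props, "trades"))
             .drainTo(Sinks.logger());
        }
    }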

See the Docs for this feature


HDFS

Hadoop Distributed File System (HDFS) is a common file system used for building large, low-cost data warehouses and data lakes. Hazelcast Jet can use HDFS as either a data source or a data sink. If the Jet and HDFS clusters are co-located, Jet benefits from data locality and processes data on the same node, avoiding the network transit latency penalty.
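
A rough sketch of reading text files from HDFS, assuming the HdfsSources adapter from the Jet Hadoop module and Hadoop's mapred TextInputFormat; the class names, module and path are assumptions and version-dependent.

    import com.hazelcast.jet.HdfsSources;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class HdfsSketch {
        public static void main(String[] args) {
            // Hadoop job configuration pointing at an input directory (path is illustrative).
            JobConf jobConf = new JobConf();
            jobConf.setInputFormat(TextInputFormat.class);
            FileInputFormat.addInputPath(jobConf, new Path("hdfs://namenode:8020/data/logs"));

            Pipeline p = Pipeline.create();
            // Each item is a (key, value) pair produced by the input format;
            // for text files the value is one line of text.
            p.drawFrom(HdfsSources.hdfs(jobConf))
             .drainTo(Sinks.logger());
        }
    }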

See the Docs for this feature

Local data files

Jet comes with batch and streaming file readers to process local data (e.g., CSVs or logs). The batch reader processes the lines of a file or directory. The streaming reader watches a file or directory for changes, streaming the new lines to Jet.
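
An illustrative sketch of both readers, assuming the Sources.files and Sources.fileWatcher adapters; the /var/log/app directory is a placeholder.

    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.Sources;

    public class FileSketch {
        public static void main(String[] args) {
            // Batch: read every line of every file currently in the directory.
            Pipeline batch = Pipeline.create();
            batch.drawFrom(Sources.files("/var/log/app"))
                 .drainTo(Sinks.logger());

            // Streaming: watch the directory and emit lines as they are appended.
            Pipeline streaming = Pipeline.create();
            streaming.drawFrom(Sources.fileWatcher("/var/log/app"))
                     .drainTo(Sinks.logger());
        }
    }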

See the Docs for this feature


Socket

The socket connector allows Jet jobs to read a stream of text data from a socket. Every line is processed as one record.
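
For example (host, port and charset are placeholders):

    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.Sources;

    import java.nio.charset.StandardCharsets;

    public class SocketSketch {
        public static void main(String[] args) {
            Pipeline p = Pipeline.create();
            // Each line of text arriving on the socket becomes one record.
            p.drawFrom(Sources.socket("localhost", 9999, StandardCharsets.UTF_8))
             .drainTo(Sinks.logger());
        }
    }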

See the Docs for this feature

Custom Sources and Sinks

Hazelcast Jet provides a flexible API that makes it easy to implement your own custom sources and sinks. Both sources and sinks are implemented using the same API as the rest of the Processors.

See the Docs for this feature
Cloud and Virtualization Support

Amazon Web Services

The Hazelcast IMDG AWS cloud module helps Hazelcast cluster members discover each other and form a cluster on AWS. It also supports tagging, IAM roles, and connecting to clusters from clients outside the cloud.
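
As an illustration, AWS member discovery could be enabled programmatically roughly as below; the region, IAM role and tag values are placeholders, and the exact AwsConfig setters depend on the Hazelcast IMDG version in use.

    import com.hazelcast.config.Config;
    import com.hazelcast.config.JoinConfig;
    import com.hazelcast.jet.Jet;
    import com.hazelcast.jet.config.JetConfig;

    public class AwsDiscoverySketch {
        public static void main(String[] args) {
            JetConfig jetConfig = new JetConfig();
            Config hzConfig = jetConfig.getHazelcastConfig();

            JoinConfig join = hzConfig.getNetworkConfig().getJoin();
            join.getMulticastConfig().setEnabled(false);   // multicast is not available on AWS
            join.getAwsConfig()
                .setEnabled(true)
                .setRegion("us-east-1")                    // placeholder region
                .setIamRole("jet-cluster-role")            // placeholder IAM role
                .setTagKey("purpose")                      // only VMs tagged purpose=jet join the cluster
                .setTagValue("jet");

            Jet.newJetInstance(jetConfig);
        }
    }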

Get this Plugin
See the Docs for this feature

Apache jclouds

Hazelcast IMDG supports the Apache jclouds API, allowing applications to be deployed in multiple different cloud infrastructure ecosystems in an infrastructure-agnostic way.

Get this Plugin

Apache Zookeeper Discovery

The Hazelcast IMDG ZooKeeper Discovery Plugin provides a service-based discovery strategy that uses Apache Curator to communicate with your ZooKeeper server, for applications built on the Hazelcast IMDG 3.6.1+ Discovery SPI.

Get this Plugin
See the Docs for this feature

Azure Cloud Discovery

The Azure DiscoveryStrategy is available for Hazelcast IMDG 3.6.1 and above. It discovers all Hazelcast instances in a cluster by returning the VMs within your Azure resource group that are tagged with a specified value.

Get this Plugin
See the Docs for this feature

Pivotal Cloud Foundry

Hazelcast IMDG Enterprise for Pivotal Cloud Foundry is an on-demand service broker that dynamically creates and scales Hazelcast IMDG clusters without pre-provisioning blocks of Virtual Machines.

Get this Plugin
See this blog for more details

OpenShift Container Platform

The Hazelcast IMDG Docker image is an extension of the official Hazelcast Docker image, packed with the Kubernetes discovery plugin, which enables deployment of Hazelcast on your OpenShift platform as a managed cache service.

Get this Plugin
See the Docs for this feature


Consul

Consul is a tool for discovering and configuring services in your infrastructure. It provides Service Discovery, Health Checking, a Key/Value Store, and Multi-Datacenter support out of the box.

Get this Plugin
See the Docs for this feature
Visit the Open Source Home of Hazelcast Jet