Hazelcast Jet

Hazelcast Jet

Ultra Fast Stream and Batch Processing — Lightweight, Embeddable, Powerful

Hazelcast Jet® is the 3rd Generation Big Data Processing Engine. It is an application embeddable, distributed computing platform for fast processing of big data sets. The Hazelcast Jet architecture is high performance and low latency driven, based on a parallel, streaming core engine which enables data-intensive applications to operate at near real-time speeds.

Hazelcast Jet is built on top of Hazelcast IMDG®, the leading open source in-memory data grid with tens of thousands of installed clusters. Hazelcast Jet processing jobs take full advantage of the distributed in-memory data structures provided by Hazelcast IMDG to:

  • Write operational results of a computation job to an in-memory NoSQL key-value store
  • Cache data for record enrichment, pre-processing and data cleaning
  • Enable in-memory messaging and notifications
  • Process the data stored in the IMDG making use of a data locality
  • Achieve breakthrough application speed by keeping both the computation and data in-memory
Jet Internet of Things Diagram

Typical use cases for Hazelcast Jet include:

  • Complex event processing
  • Data processing microservice architectures
  • Fast batch processing
  • Implementing Event Sourcing and CQRS
  • In-store e-commerce systems
  • Internet of Things (IoT) data ingestion, processing and storage
  • Online trading
  • Real-time big data processing
  • Social media platforms
  • Streaming analytics
  • System log events

Hazelcast Jet Performance

Hazelcast Jet Streaming Processing PlatformSee Jet Performance for the architectural choices in Jet behind our breakthrough performance.

Hazelcast Jet Features

Hazelcast Jet Architecture Diagram
Distributed Computation

Hazelcast Jet
Professional

Hazelcast Jet
Open Source

Pipeline API

Pipeline API is the primary API of Hazelcast Jet. It follows the general shape pattern of any data processing pipeline: drawFromSource -> transform -> drainToSink.

The Jet library contains a set of Transforms covering standard data operations (map, filter, group, join). Also there are Source and Sink adapters available. The library can be extended by custom Transforms, Sources or Sinks.

See the Docs for this feature

Distributed Java 8 Streams

java.util.stream API is a well known and popular API in the Java community. It supports functional style operations on streams of elements.

Hazelcast Jet shifts java.util.stream to a distributed world – the processing is distributed across the Jet cluster and parallelized. If j.u.s is used on top of Hazelcast® distributed data structures, the data locality is utilized.

See the Docs for this feature

Stream Processing

Streaming Core

Hazelcast Jet is built on top of a low latency streaming core. This refers to processing the incoming records as soon as possible as opposed to accumulating the records into micro-batches before processing.

Windowing

As data streams are unbounded and there is the possibility of infinite sequences of records, a tool is required to group individual records to finite frames in order to run the computation. Hazelcast Jet windows provides a tool to define the grouping.

Types of windows supported by Jet:

  • Fixed/tumbling – The stream is divided into same-length, non-overlapping chunks. Each event belongs to exactly one window.
  • Sliding – Windows have fixed length, but are separated by a sliding interval, so they can overlap.
  • Session – Windows have various sizes and are defined based on session identifiers contained in the records. Sessions are closed after a period of inactivity (timeout).

See the Docs for this feature

Event Time Processing

Hazelcast Jet allows you to classify records in a data stream based on the timestamp embedded in each record — the event time. Event time processing is a natural requirement as users are mostly interested in handling the data based on the time that the event originated (the event time). Event time processing is a first-class citizen in Jet.

For handling late events, there is a set of policies to determine whether the event is “still on time” or “late”, which results in the discarding of the latter.

See the Docs for this feature

Handling Back Pressure

In the streaming system, it is necessary to control the flow of messages. The consumer cannot be flooded by more messages than it can process in a fixed amount of time. On the other hand, the processors should not be left idle wasting resources.

Hazelcast Jet comes with a mechanism to handle back pressure. Every part of the Jet Job keeps signaling to all the upstream producers how much free capacity it has. This information is naturally propagated upstream to keep the system balanced.

See the Docs for this feature

Elasticity

Distributed Snapshots

Hazelcast Jet is able to tolerate faults such as network failure, split or node failure.

Jet supports distributed state snapshots and automatic restarts. Snapshots are periodically created and backed up. When there is a failure (network failure or split, node failure), Jet uses the latest state snapshot and automatically restarts all jobs that contain the failed member as a job participant from this snapshot.

See the Docs for this feature

Resilient Snapshot Storage

Hazelcast Jet uses the distributed in-memory storage to back-up the computation regularly. This storage is integral component of the Jet cluster, no further infrastructure is necessary. Data are stored in multiple replicas distributed across the cluster to increase the resiliency.

See the Docs for this feature

Exactly-Once or At-Least Once

For snapshot creation, exactly-once or at-least-once semantics can be used. This is a trade-off between correctness and performance. It is configured per job.

See the Docs for this feature

Elasticity

Hazelcast Jet supports the scenario where a new member joins the cluster while a job is running. Running jobs can be restarted from the last snapshot to make use of added resources.

See the Docs for this feature

In-Memory Data Grid

Hazelcast IMDG Integration

Hazelcast Jet is integrated with Hazelcast IMDG, an elastic in-memory storage, to provide a highly optimized read and write to distributed, in-memory implementations of java.util.Map, java.util.Cache and java.util.List.

IMDG is to be used for:

  • Data ingestion prior to processing.
  • Connecting multiple Jet jobs using IMDG as an intermediate buffer to keep jobs loosely coupled.
  • Enriching processed events, cache remote data, e.g., fact tables from a database on Jet nodes making use of data locality.
  • Distributing Jet processed data with Hazelcast IMDG.
  • Running advanced data processing tasks on top of Hazelcast data structures.

See the Docs for this feature

Embedded IMDG

The complete Hazelcast IMDG is embedded in Jet. So, all the services of IMDG are available to your Jet jobs without any additional deployment effort. As Hazelcast IMDG is the embedded, support structure for Jet, IMDG is fully controlled by Jet (start, shut down, scaling etc.)

To isolate the processing from the storage, you can still make use of Jet reading from or writing to remote Hazelcast IMDG clusters.

See the Docs for this feature

Streaming from IMDG

In Hazelcast Jet a connector is included which allows the user to process streams of changes (Event Journal) of an IMap and ICache, enabling developers to stream process IMap/ICache data or to use Hazelcast IMDG as storage for data ingestion.

Connectors

Hazelcast IMDG

High-performance readers and writers for Hazelcast IMap, ICache and IList. The IMap and ICache are partitioned and distributed. Jet makes use of data locality reading the data from the same node to prevent network transit penalty.

The streaming connectors for IMap and ICache allow the user to treat the Hazelcast distributed map itself as a streaming source, where an event is created for every change that happens on the map. This allows the map to be used as a source of events during a streaming job.

See the Docs for this feature

Kafka Connector

Hazelcast Jet utilizes message brokers for ingesting data streams and it is able to work as a data processor connected to a message broker in the data pipeline.

Jet comes with a Kafka connector for reading from and writing to the Kafka topics.

See the Docs for this feature

HDFS

Hadoop Distributed File System (HDFS) is a common file system used for building large, low cost data warehouses and data lakes. Hazelcast Jet can use HDFS as either data source or data sink. If Jet and HDFS clusters are co-located, then Jet benefits from the data locality and processes the data from the same node without incurring network transit latency penalty.

See the Docs for this feature

Local data files

Hazelcast Jet comes with batch and streaming filer readers to process local data (e.g. CSVs or logs). The batch reader processes lines from a file or directory. The streamer watches the file or directory for changes, streaming the new lines to Hazelcast Jet.

See the Docs for this feature

Sockets

The socket connector allows Jet Jobs to read text data stream from the socket. Every line is processed as one record.

See the Docs for this feature

Custom Sources and Sinks

Hazelcast Jet provides a flexible API that makes it easy to implement your own custom sources and sinks. Both sources and sinks are implemented using the same API as the rest of the processors.

See the Docs for this feature

Cloud and Virtualization Support

Cloud Provider

Hazelcast Jet can be extended by cloud plugins, allowing applications to be deployed in different cloud infrastructure ecosystems.

See Hazelcast Jet Cloud Plugins: Amazon Web Services, Microsoft Azure, Pivotal Cloud Foundry, OpenShift Container Platform, Docker, Apache JClouds, Consul Discovery, Apache ZooKeeper Discovery

Visit the Open Source home of Hazelcast Jet

Hazelcast.com

Menu