Hazelcast Jet

Ultra Fast Event Stream Processing Engine — Lightweight, Embeddable, Powerful

Companies that know how to leverage event streaming data make smarter, and sometimes faster, business decisions. Event-driven microservices architectures also give digital businesses maximum agility and the freedom to experiment frictionlessly as they pursue innovation. Hazelcast Jet® is a third-generation event stream processing engine: an application-embeddable, distributed computing platform for building IoT and microservices-based applications. The Hazelcast Jet architecture is driven by high performance and low latency, based on a parallel, distributed core engine that enables data-intensive applications to operate at near-real-time speeds. Hazelcast Jet is built on the foundation of Hazelcast IMDG, the leading in-memory data grid and one of the top data stores for microservices deployments. Hazelcast Jet processing jobs take full advantage of the distributed in-memory data structures provided by Hazelcast IMDG to:

  • Write operational results of a computation job to an in-memory NoSQL key-value store
  • Cache data for record enrichment, pre-processing and data cleaning
  • Enable in-memory messaging and notifications
  • Process the data stored in the IMDG, making use of data locality
  • Achieve breakthrough application speed by keeping both the computation and data in-memory
[Figure: Hazelcast Jet architecture diagram]

Typical use cases for Hazelcast Jet include:

  • Complex event processing
  • Data processing microservice architectures
  • Fast batch processing
  • Implementing Event Sourcing and CQRS
  • In-store e-commerce systems
  • Internet of Things (IoT) data ingestion, processing and storage
  • Online trading
  • Real-time ETL
  • Social media platforms
  • Streaming analytics

Hazelcast Jet Performance

[Figure: Hazelcast Jet performance benchmark]

Hazelcast Jet Features

Distributed Computation

All of the features below are available in both Hazelcast Jet Enterprise and Hazelcast Jet Professional.

Pipeline API

The Pipeline API is the primary Hazelcast Jet API for processing both bounded and unbounded data. Use it to declare data processing pipelines by composing high-level operations (such as map, filter, group, join, and window) on a stream of records. The Pipeline API is a Java 8 API with static type safety.
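As an illustration, a word count pipeline composed in the style of the Pipeline API might look like the sketch below. The method and class names shown (drawFrom, drainTo, Sources, Sinks, wholeItem, counting) are assumptions based on typical Jet examples and have varied across Jet versions, so treat this as a sketch rather than copy-paste code.

```java
// Sketch only: assumes the Pipeline API style of early Jet releases.
Pipeline p = Pipeline.create();
p.drawFrom(Sources.<String>list("lines"))                      // bounded source: a Hazelcast IList
 .flatMap(line -> traverseArray(line.toLowerCase().split("\\W+")))
 .filter(word -> !word.isEmpty())
 .groupBy(wholeItem(), counting())                             // group by word and count
 .drainTo(Sinks.map("counts"));                                // sink: a Hazelcast IMap
```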

Core API

The Core API is Jet's low-level API. It directly exposes the computation engine's raw features: DAGs, partitioning schemes, vertex parallelism, distributed vs. local edges, and so on. Its purpose is to serve as the infrastructure on top of which high-level DSLs and APIs that describe computation jobs can be built. Use the Core API for low-level control over the data flow, for fine-tuning performance, or for building DSLs.

Stream Processing

Streaming Core

Hazelcast Jet is built on top of a low-latency streaming core. Incoming records are processed as soon as possible, as opposed to being accumulated into micro-batches before processing.


Windowing

Because data streams are unbounded and can contain infinite sequences of records, a tool is required to group individual records into finite frames so the computation can run. Hazelcast Jet windows provide this grouping.

Types of windows supported by Jet:

  • Fixed/tumbling – The stream is divided into same-length, non-overlapping chunks. Each event belongs to exactly one window.
  • Sliding – Windows have fixed length, but are separated by a sliding interval, so they can overlap.
  • Session – Windows have various sizes and are defined based on session identifiers contained in the records. Sessions are closed after a period of inactivity (timeout).
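The window assignment rules above can be sketched with plain arithmetic. This is an illustrative model, not Jet's actual implementation: a tumbling window of size s containing timestamp t starts at t minus (t mod s), while a sliding window of size s and slide p places t into s/p overlapping windows.

```java
import java.util.ArrayList;
import java.util.List;

public class Windows {
    // Start of the tumbling window of the given size that contains the timestamp.
    // Each event belongs to exactly one such window.
    static long tumblingWindowStart(long ts, long size) {
        return ts - Math.floorMod(ts, size);
    }

    // Starts of all sliding windows (length `size`, separated by `slide`)
    // that contain the timestamp; windows overlap when slide < size.
    static List<Long> slidingWindowStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts, slide);
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }
}
```

For example, with size 10 and slide 5, timestamp 25 falls into the tumbling window starting at 20 and into the two sliding windows starting at 25 and 20.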

Event Time Processing

Hazelcast Jet allows you to classify records in a data stream by the timestamp embedded in each record (the event time). Event time processing is a natural requirement, as users are mostly interested in handling data based on the time the event originated. Event time processing is a first-class citizen in Jet.

For handling late events, a set of policies determines whether an event is still on time or late; late events are discarded.

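The on-time/late decision can be modeled with a watermark that trails the highest event timestamp seen so far by an allowed lag; events whose timestamps fall below the watermark are late. This is a simplified sketch of the general watermark idea, not Jet's exact policy implementation.

```java
public class EventTimePolicy {
    private long watermark = Long.MIN_VALUE;
    private final long allowedLag;

    public EventTimePolicy(long allowedLag) {
        this.allowedLag = allowedLag;
    }

    // Advance the watermark as events arrive; the watermark trails the
    // maximum timestamp seen so far by allowedLag. An event behind the
    // watermark is classified as late (and would be discarded).
    public boolean isOnTime(long eventTs) {
        watermark = Math.max(watermark, eventTs - allowedLag);
        return eventTs >= watermark;
    }
}
```

With an allowed lag of 5, an event arriving more than 5 time units behind the newest event seen so far is classified as late.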

Handling Back Pressure

In a streaming system, it is necessary to control the flow of messages: a consumer cannot be flooded with more messages than it can process in a fixed amount of time, while on the other hand processors should not be left idle, wasting resources.

Hazelcast Jet comes with a mechanism to handle back pressure. Every part of the Jet job keeps signaling to all the upstream producers how much free capacity it has. This information is naturally propagated upstream to keep the system balanced.

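The capacity-signaling idea can be sketched with a bounded queue between two stages: the downstream stage's free capacity tells the upstream producer how much it may emit. This is a conceptual model, not Jet's internal implementation.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

public class Backpressure {
    // Each stage keeps a bounded inbox; its free capacity is the signal
    // that propagates upstream ("I can take N more items").
    private final ArrayBlockingQueue<String> inbox = new ArrayBlockingQueue<>(4);

    public int freeCapacity() {
        return inbox.remainingCapacity();
    }

    // The upstream producer emits only as much as the inbox can absorb.
    public int offerUpTo(List<String> batch) {
        int accepted = 0;
        for (String item : batch) {
            if (!inbox.offer(item)) {
                break; // inbox full: back pressure, stop producing for now
            }
            accepted++;
        }
        return accepted;
    }

    // Draining the inbox frees capacity, relieving the pressure upstream.
    public String poll() {
        return inbox.poll();
    }
}
```

Here a producer offering six items to a stage with capacity four gets only four accepted; once the consumer drains an item, one unit of capacity is signaled free again.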
Elasticity

Hazelcast Jet is elastic — it is able to dynamically re-scale to adapt to workload changes.

When the cluster grows or shrinks, running jobs can be automatically replanned to make use of all available resources.

  • Hazelcast Jet Enterprise
  • Hazelcast Jet Professional

Fault-Tolerance

Hazelcast Jet tolerates faults such as network failures, network splits, and node failures.

When a fault occurs, Jet takes the latest state snapshot and automatically restarts, from that snapshot, all jobs in which the failed member was a participant.

Exactly-Once or At-Least-Once

Hazelcast Jet supports distributed state snapshots. Snapshots are periodically created to back up the running state. Periodic snapshots are used as a consistent point of recovery for failures. Snapshots are also taken and used for up-scaling.

For snapshot creation, exactly-once or at-least-once semantics can be used. This is a trade-off between correctness and performance. It is configured per job.
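As a sketch, the per-job guarantee would be set on the job configuration. The names below (JobConfig, ProcessingGuarantee, setSnapshotIntervalMillis) reflect the Jet API as commonly documented, but should be checked against your Jet version.

```java
import com.hazelcast.jet.config.JobConfig;
import com.hazelcast.jet.config.ProcessingGuarantee;

// Sketch: configure the correctness/performance trade-off per job.
JobConfig config = new JobConfig()
        .setProcessingGuarantee(ProcessingGuarantee.EXACTLY_ONCE) // or AT_LEAST_ONCE
        .setSnapshotIntervalMillis(10_000);                       // snapshot every 10 s
// jet.newJob(pipeline, config);
```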

Resilient Snapshot Storage

Hazelcast Jet uses distributed in-memory storage to store snapshots. This storage is an integral component of the Jet cluster; no further infrastructure is necessary. Data is stored in multiple replicas distributed across the cluster to increase resiliency.

In-Memory Data Grid

Hazelcast IMDG Integration

Hazelcast Jet is integrated with Hazelcast IMDG, an elastic in-memory storage layer, to provide highly optimized reads and writes to distributed, in-memory implementations of java.util.Map, javax.cache.Cache and java.util.List. Use IMDG to:

  • Ingest data prior to processing.
  • Connect multiple Jet jobs, using IMDG as an intermediate buffer to keep jobs loosely coupled.
  • Enrich processed events by caching remote data (e.g., fact tables from a database) on Jet nodes, making use of data locality.
  • Distribute Jet-processed data with Hazelcast IMDG.
  • Run advanced data processing tasks on top of Hazelcast data structures.

Embedded IMDG

Hazelcast IMDG is embedded in Jet, so all IMDG services are available to your Jet jobs without any additional deployment effort. As the embedded support structure for Jet, IMDG is fully controlled by Jet (start, shutdown, scaling, etc.).

To isolate processing from storage, Jet can also read from and write to remote Hazelcast IMDG clusters.

Streaming from IMDG

Hazelcast Jet includes a connector that lets users process the stream of changes (the Event Journal) of an IMap or ICache, enabling developers to stream-process IMap/ICache data or to use Hazelcast IMDG as storage for data ingestion.

Connectors

Hazelcast IMDG

High-performance readers and writers for Hazelcast IMap, ICache and IList. IMap and ICache are partitioned and distributed; Jet makes use of data locality, reading data from the local node to avoid a network transit penalty.

The streaming connectors for IMap and ICache allow the user to treat the Hazelcast distributed map itself as a streaming source, where an event is created for every change that happens on the map. This allows the map to be used as a source of events during a streaming job.
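A streaming map source might be declared roughly as follows. The names `Sources.mapJournal`, the journal start position, and `Sinks.logger` are assumptions based on the Jet connector API and may differ by version; the map's event journal must also be enabled in the IMDG configuration.

```java
// Sketch only: assumes the map-journal source of the Jet connector API.
Pipeline p = Pipeline.create();
p.drawFrom(Sources.mapJournal("trades", START_FROM_CURRENT)) // stream of IMap change events
 .drainTo(Sinks.logger());
```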

Kafka Connector

Hazelcast Jet uses message brokers for ingesting data streams and can act as a data processor connected to a message broker in the data pipeline.

Jet comes with a Kafka connector for reading from and writing to Kafka topics.

JMS Connector

Java Message Service (JMS) is a traditional means of enterprise integration. The Jet JMS connector allows you to stream messages to and from a JMS queue or topic using a JMS client on the classpath (such as ActiveMQ or RabbitMQ). Reading from a queue can be parallelized for higher throughput.

JDBC

The Hazelcast Jet JDBC connector can read data from, or write data to, relational databases or any other source that supports the standard JDBC API. It is a batch connector that executes a SQL query and sends the results to the Jet pipeline. It supports parallel reading for partitioned sources.

HDFS

Hadoop Distributed File System (HDFS) is a common file system used for building large, low-cost data warehouses and data lakes. Hazelcast Jet can use HDFS as either a data source or a data sink. If the Jet and HDFS clusters are co-located, Jet benefits from data locality and processes data on the same node, without incurring a network transit penalty.

Avro

Jet can read and write Avro-serialized data from self-contained files (the Avro Object Container format), HDFS and Kafka. The Kafka connector can be configured to use a schema registry.

Local Data Files

Hazelcast Jet comes with batch and streaming file readers to process local data (e.g., CSVs or logs). The batch reader processes the lines of a file or directory; the streaming reader watches a file or directory for changes, streaming new lines to Hazelcast Jet.

Sockets

The socket connector allows Jet jobs to read text data streams from a socket. Every line is processed as one record.

Custom Sources and Sinks

Hazelcast Jet provides a flexible API that makes it easy to implement your own custom sources and sinks. Code samples are available to use as templates.

Cluster Management

Management Center

Management Center enables you to monitor and manage your Hazelcast Jet cluster. In addition to monitoring the overall health of your cluster, you can also analyze the data flow of the distributed pipelines. Management Center provides visual tools to inspect running jobs and detect potential bottlenecks.

Crucially, developers can observe clusters in real time and gain far more insight into what is occurring under the hood.

Continuous Operations

Lossless Recovery (coming with Jet version 1.0)

Hazelcast Jet regularly persists backups of its state snapshots. Jobs, job state, and job configuration can be made persistent with Hazelcast's Hot Restart capability. Once the cluster is back online, computations restart from where they left off.

Rolling Job Upgrades (coming with Jet version 1.0)

Rolling job upgrades allow jobs to be upgraded without data loss or interruption, making use of Jet state snapshots to switch to the new job version in milliseconds.

Security Suite

SSL/TLS 1.2 Asymmetric Encryption

Provides encryption based on TLS certificates between members, between clients and members, and between members and Management Center.

SSL/TLS 1.2 Asymmetric Encryption with OpenSSL

Integrates OpenSSL for TLS, providing significant performance enhancements over the SSLEngine built into the JDK.

Secure Connectors

Connectors are used to connect the Jet job with data sources and sinks. Secure connections to external systems combined with security within the Jet cluster make the data pipeline secure end-to-end.

The following Jet connectors have security features: Hazelcast IMDG, Kafka, JDBC, and JMS.

Authentication

The authentication mechanism for Hazelcast client security works the same way as cluster member authentication. To implement client authentication, configure a Credentials object and one or more LoginModules. The client side does not have, and does not need, a factory object such as ICredentialsFactory to create Credentials objects; Credentials are created on the client side and sent to the connected node during the connection process.

Authorization

Hazelcast client authorization is configured by a client permission policy. Hazelcast has a default permission policy implementation that uses the permission configurations defined in the Hazelcast security configuration. Default policy permission checks are done against instance types (map, queue, etc.), instance names, instance actions (put, read, remove, add, etc.), client endpoint addresses, and the client principal defined by the Credentials object. Instance names, principal names and endpoint addresses can be defined as wildcards (*).

Symmetric Encryption

In symmetric encryption, each node uses the same shared key.
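As a sketch, symmetric encryption would be enabled in the member's network configuration. The class and setter names below (SymmetricEncryptionConfig and its fluent setters) are assumptions based on the Hazelcast IMDG configuration API, and the algorithm and credentials are placeholders.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.SymmetricEncryptionConfig;

// Sketch: enable symmetric encryption on a member (placeholder secrets).
Config config = new Config();
config.getNetworkConfig().setSymmetricEncryptionConfig(
        new SymmetricEncryptionConfig()
                .setEnabled(true)
                .setAlgorithm("PBEWithMD5AndDES") // placeholder algorithm
                .setSalt("thesalt")               // placeholders: use real secrets
                .setPassword("thepass")
                .setIterationCount(19));
```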

JAAS Module

Hazelcast has an extensible, JAAS-based security feature you can use to authenticate both cluster members and clients, and to perform access control checks on client operations. Access control can be done according to endpoint principal and/or endpoint address.

Pluggable Socket Interceptor

Hazelcast allows you to intercept socket connections before a node joins a cluster or a client connects to a node. This provides the ability to add custom hooks to the join and connection procedures (such as identity checking using Kerberos).

Security Interceptor

Hazelcast IMDG allows you to intercept every remote operation executed by the client. This lets you add flexible custom security logic.

Cloud and Virtualization Support

Cloud Provider

Hazelcast Jet can be extended by cloud plugins, allowing applications to be deployed in different cloud infrastructure ecosystems.

See Hazelcast Jet Cloud Plugins: Amazon Web Services, Microsoft Azure, Pivotal Cloud Foundry, OpenShift Container Platform, Docker, Apache JClouds, Consul Discovery, Apache ZooKeeper Discovery

Pivotal Cloud Foundry

Hazelcast for Pivotal Cloud Foundry is an on-demand service broker that dynamically creates and scales Hazelcast Jet clusters without pre-provisioning blocks of virtual machines.

OpenShift Container Platform (Coming with Jet version 1.0)

The Hazelcast Jet Docker image extends the official Hazelcast Docker image with a Kubernetes discovery plugin, enabling deployment of Hazelcast Jet on your OpenShift platform as a data processing service.