Hazelcast Jet

Introducing Hazelcast Jet

In-Memory Stream and Batch Processing—Light-weight, Embeddable, Powerful

Hazelcast Jet is a distributed computing platform for fast processing of big data sets. With Hazelcast In-Memory Data Grid (IMDG) providing storage, Hazelcast Jet executes work in parallel so that data-intensive applications can operate in real time. It uses directed acyclic graphs (DAGs) to model the relationships between individual steps in the data processing pipeline. Hazelcast Jet is simple to deploy and, because it operates continuously, can execute stream and batch data processing applications at the same time. Hazelcast Jet is an Apache 2 licensed open source project.

Hazelcast Jet is built on a one-record-at-a-time architecture (sometimes known as continuous operators). It processes each incoming record as soon as possible instead of accumulating records into micro-batches, which lowers latency for applications. Jet ingests data at high velocity (via socket, file, HDFS or Kafka interfaces) and runs the business logic or complex computation on the incoming data. Because it takes a pure in-memory approach, Jet is 20x faster than Hadoop, enabling users to meet service-level requirements.
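The difference between the two models can be sketched in plain Java. This is an illustration using only the JDK, not the Jet API; the class and method names are hypothetical. For each output, it records how many input records had arrived by the time that output was emitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Hazelcast Jet.
final class RecordAtATimeSketch {

    // Continuous-operator style: each record is transformed and emitted as soon as it arrives.
    static List<Integer> perRecord(List<String> input, Function<String, String> fn, List<String> out) {
        List<Integer> arrivedAtEmit = new ArrayList<>();
        int arrived = 0;
        for (String record : input) {
            arrived++;
            out.add(fn.apply(record));
            arrivedAtEmit.add(arrived);
        }
        return arrivedAtEmit;
    }

    // Micro-batch style: records wait in a buffer until the batch is full or the input ends.
    static List<Integer> microBatch(List<String> input, int batchSize,
                                    Function<String, String> fn, List<String> out) {
        List<Integer> arrivedAtEmit = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        int arrived = 0;
        for (String record : input) {
            arrived++;
            buffer.add(record);
            if (buffer.size() == batchSize) {
                flush(buffer, arrived, fn, out, arrivedAtEmit);
            }
        }
        if (!buffer.isEmpty()) {
            flush(buffer, arrived, fn, out, arrivedAtEmit);
        }
        return arrivedAtEmit;
    }

    // Emits every buffered record, tagging each with the number of records seen so far.
    private static void flush(List<String> buffer, int arrived, Function<String, String> fn,
                              List<String> out, List<Integer> arrivedAtEmit) {
        for (String record : buffer) {
            out.add(fn.apply(record));
            arrivedAtEmit.add(arrived);
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(perRecord(input, String::toUpperCase, new ArrayList<>()));
        System.out.println(microBatch(input, 2, String::toUpperCase, new ArrayList<>()));
    }
}
```

Both methods produce the same outputs; only the point at which each output becomes available differs, which is where the latency gap comes from.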

Breakthrough application speed is achieved by keeping both the computation and the data storage in memory. The embedded Hazelcast IMDG provides elastic in-memory storage and is a great tool for distributing the results of the computation as well as for caching data sets used during the computation. Extremely low end-to-end latencies can be achieved with Jet.

Jet is extremely simple to program and deploy. In particular, it can be embedded in IoT and microservices architectures, making it easier for development teams and ISVs to build and maintain next-generation systems.

Jet Technical Architecture

Typical use cases for Hazelcast Jet include:

  • Real-time (low-latency) stream processing
  • Implementing Change Data Capture (CDC)
  • Moving from batch to stream processing
  • Fast batch processing
  • Internet-of-things (IoT) data ingestion, processing and storage
  • Data processing microservice architectures

Hazelcast Jet Performance

See Jet Performance for the architectural choices in Jet behind our breakthrough performance.

Hazelcast Jet Features

Distributed Computation

Hazelcast Jet Professional

Hazelcast Jet Open Source

Distributed DAG Execution

Hazelcast Jet uses DAGs to model your data processing tasks, also known as Jet Jobs. A Jet Job is composed of processors: units of parallel processing such as data source readers, joiners, sorters, aggregators, filters, mappers and output writers. The processors are the nodes of the DAG, and the edges that connect them represent the data flow. Hazelcast Jet provides low-latency, high-throughput distributed DAG execution.
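As a rough illustration of the DAG model, the following plain-Java sketch represents a job as named vertices connected by edges and derives an order in which the vertices could be run. It uses only the JDK and is not the Jet Core API; `JobGraphSketch`, `vertex`, `edge` and `topologicalOrder` are hypothetical names, and Kahn's algorithm is used here purely for demonstration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a job graph; this is not the Jet Core API.
final class JobGraphSketch {
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    // Registers a processing step (for example a reader, mapper or writer).
    JobGraphSketch vertex(String name) {
        downstream.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
        return this;
    }

    // Declares that data flows from one step to another.
    JobGraphSketch edge(String from, String to) {
        vertex(from).vertex(to);
        downstream.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
        return this;
    }

    // Kahn's algorithm: repeatedly take a vertex whose inputs have all been handled.
    List<String> topologicalOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((name, degree) -> {
            if (degree == 0) {
                ready.add(name);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String current = ready.remove();
            order.add(current);
            for (String next : downstream.get(current)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != downstream.size()) {
            throw new IllegalStateException("graph contains a cycle, so it is not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        JobGraphSketch job = new JobGraphSketch()
                .edge("source", "tokenize")
                .edge("tokenize", "count")
                .edge("count", "sink");
        System.out.println(job.topologicalOrder());
    }
}
```

A real engine runs the vertices in parallel and streams data along the edges; the ordering above only shows why the graph must be acyclic.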

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Jet Core API

Hazelcast Jet provides a convenience API to implement a DAG representing the Jet Job.

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Distributed java.util.stream

The java.util.stream (j.u.s.) API is well known and popular in the Java community. It supports functional-style operations on streams of elements. Jet brings java.util.stream to the distributed world: processing is distributed across the Jet cluster and parallelized. When j.u.s. is used on top of Hazelcast’s distributed data structures, Jet takes advantage of data locality.

Jet adds support for java.util.stream API to Hazelcast IMDG collections.
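For readers less familiar with the functional style, here is a small word count written with the standard JDK java.util.stream API on an ordinary local list. It does not use Jet; the class name `WordCountSketch` is hypothetical, and the Jet-specific way of obtaining a distributed stream from an IMDG collection is not shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Local, single-JVM example of the java.util.stream style; no Jet classes are used.
final class WordCountSketch {

    // Splits each line into lowercase words and counts the occurrences of each word.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Jet is fast", "Jet is embeddable");
        System.out.println(wordCount(lines));
    }
}
```

The same chain of operations (flatMap, filter, grouping) is the kind of pipeline that the text describes Jet as distributing and parallelizing across the cluster.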

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Distributed Caching

Embedded Hazelcast IMDG

Hazelcast Jet is integrated with Hazelcast IMDG to provide highly optimized reads from and writes to distributed, in-memory implementations of java.util.Map and java.util.List.

Users can take advantage of either the embedded IMDG instance or a remote IMDG cluster.

The Hazelcast IMDG instance is embedded in Jet. Jet can use the embedded IMDG as source and sink for data and make use of data locality. The embedded IMDG is fully controlled by Jet (start, shutdown, scaling etc.).

The embedded in-memory data grid works well for:

  • Sharing the processing state among Jet Jobs
  • Caching intermediate processing results
  • Enriching processed events, and caching remote data (e.g. fact tables from database) on Jet nodes
  • Running advanced data processing tasks on top of Hazelcast data structures
  • Development, since starting a Jet cluster is simple and fast
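The enrichment use case in the list above can be sketched against the java.util.Map interface. In this JDK-only example a ConcurrentHashMap stands in for a distributed map; the class name, the "deviceId,reading" event format and the "unknown" fallback are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical enrichment step; a local map stands in for a distributed cache.
final class EnrichmentSketch {

    // Appends the cached location of each event's device, or "unknown" if it is not cached.
    // Events are assumed to look like "deviceId,reading".
    static List<String> enrich(List<String> events, Map<String, String> deviceLocations) {
        List<String> enriched = new ArrayList<>();
        for (String event : events) {
            String deviceId = event.split(",", 2)[0];
            String location = deviceLocations.getOrDefault(deviceId, "unknown");
            enriched.add(event + "," + location);
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, String> deviceLocations = new ConcurrentHashMap<>();
        deviceLocations.put("d1", "Berlin");
        System.out.println(enrich(Arrays.asList("d1,21.5", "d2,19.0"), deviceLocations));
    }
}
```

Because the method depends only on the Map interface, the lookup logic stays the same whichever Map implementation backs the cache.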

The Hazelcast IMDG connector is used to read records from and write output to a remote Hazelcast IMDG instance.

Use a remote Hazelcast IMDG cluster for:

  • Sharing state or intermediate results among multiple Jet clusters
  • Isolating the Jet processing cluster from the IMDG operational data storage cluster
  • Gaining more control over your Hazelcast IMDG cluster than with the embedded IMDG, which is managed by Jet

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Connectors

Custom Sources and Sinks

Hazelcast Jet provides a flexible API that makes it easy to implement your own custom sources and sinks. Both sources and sinks are implemented using the same API as the rest of the Processors.

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Kafka Connector

Hazelcast Jet can ingest data streams from message brokers, so it can work as a data processor connected to a message broker in the data pipeline.

Jet comes with a Kafka connector for reading from and writing to Kafka topics.

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

HDFS

Hadoop Distributed File System (HDFS) is a common file system used for building large, low-cost data warehouses and data lakes. Hazelcast Jet can use HDFS as either a data source or a data sink. If the Jet and HDFS clusters are co-located, Jet benefits from data locality: it processes data on the node where it is stored, without incurring a network transit latency penalty.

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Hazelcast IMDG

The Hazelcast IMDG connector lets Jet Jobs read records from and write output to Hazelcast IMDG, using either the embedded IMDG instance or a remote IMDG cluster.

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Local data files

The local data reader watches a specified directory and feeds a Jet Job with data from local files (e.g. CSVs or logs).
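A simplified, JDK-only sketch of the idea follows. It performs a single scan of a directory and returns every line of every regular file as one record; continuous watching for new files is deliberately left out, and the class and method names are hypothetical rather than part of Jet.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical one-shot directory reader; it does not watch for changes.
final class LocalFilesSketch {

    // Reads all regular files in the directory (sorted by path) and returns their lines.
    static List<String> readRecords(Path directory) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(directory)) {
            files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> records = new ArrayList<>();
        for (Path file : files) {
            // Files are assumed to be UTF-8 text, such as CSVs or logs.
            records.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: LocalFilesSketch <directory>");
            return;
        }
        readRecords(Paths.get(args[0])).forEach(System.out::println);
    }
}
```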

  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Sockets

The socket connector allows Jet Jobs to read a text data stream from a socket. Every line is processed as one record.
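The line-per-record behaviour can be illustrated with the JDK alone. In this sketch the hypothetical method `forEachRecord` takes any InputStream, so it can be fed from `java.net.Socket.getInputStream()`; the UTF-8 encoding is an assumption, and none of this is the Jet connector itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical line-oriented reader; each text line is handed over as one record.
final class SocketLinesSketch {

    // Reads lines until the stream ends and returns the number of records processed.
    static int forEachRecord(InputStream in, Consumer<String> processor) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            processor.accept(line);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: SocketLinesSketch <host> <port>");
            return;
        }
        try (Socket socket = new Socket(args[0], Integer.parseInt(args[1]))) {
            int count = forEachRecord(socket.getInputStream(), System.out::println);
            System.out.println(count + " records read");
        }
    }
}
```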

  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Cloud and Virtualization Support

Amazon Web Services

The Hazelcast AWS cloud module helps Hazelcast cluster members discover each other and form a cluster on AWS. It also supports tagging, IAM roles, and connections to the cluster from clients outside the cloud.

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Azure Cloud Discovery

The Azure DiscoveryStrategy discovers the Hazelcast instances in a cluster by returning the VMs within your Azure resource group that are tagged with a specified value.
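Tag-based discovery boils down to a filter over the VMs in a group. The sketch below shows that filter in plain Java; it does not call Azure or Hazelcast, and the class name, the map of addresses to tags and the tag key/value parameters are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tag filter; a real discovery strategy would query the cloud provider.
final class TagDiscoverySketch {

    // Returns the addresses of all VMs whose tag with the given key has the given value.
    static List<String> discover(Map<String, Map<String, String>> tagsByAddress,
                                 String tagKey, String tagValue) {
        List<String> members = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> vm : tagsByAddress.entrySet()) {
            if (tagValue.equals(vm.getValue().get(tagKey))) {
                members.add(vm.getKey());
            }
        }
        return members;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> vms = new LinkedHashMap<>();
        vms.put("10.0.0.4", Collections.singletonMap("cluster", "jet-prod"));
        vms.put("10.0.0.5", Collections.singletonMap("cluster", "other"));
        vms.put("10.0.0.6", Collections.singletonMap("cluster", "jet-prod"));
        System.out.println(discover(vms, "cluster", "jet-prod"));
    }
}
```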

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Discovery Service Provider Interface (SPI)

Hazelcast Discovery is an extension SPI for attaching external cloud discovery mechanisms. Discovery finds other Hazelcast instances based on filters and provides their corresponding IP addresses.

The SPI ships with support for Apache jclouds and Google’s Kubernetes as reference implementations.

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Docker

Docker containers wrap up Hazelcast in a complete filesystem that contains everything it needs to run – code, runtime, system tools, system libraries – guaranteeing that it will always run the same, regardless of the environment it is running in.

You can deploy your Hazelcast projects using Docker containers. Hazelcast provides the following images on Docker:

  • Hazelcast
  • Hazelcast Enterprise
  • Hazelcast Management Center
  • Hazelcast OpenShift

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Kubernetes

Kubernetes is an open source orchestration system for Docker containers. It schedules containers onto nodes in a compute cluster and actively manages workloads to ensure that their state matches the user’s declared intentions.

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Apache jclouds

Hazelcast supports the Apache jclouds API, allowing applications to be deployed across multiple cloud infrastructure ecosystems in an infrastructure-agnostic way.

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

ZooKeeper Discovery

The Hazelcast ZooKeeper Discovery plugin provides a service-based discovery strategy that uses Apache Curator to communicate with your ZooKeeper server. It is intended for applications that use the Discovery SPI in Hazelcast 3.6.1 or later.

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Deployment & Resource Management

Standalone

  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source

Mesos

The Hazelcast Mesos Integration module lets you deploy Hazelcast Jet on a Mesos cluster. Because it depends on the Hazelcast ZooKeeper module for discovery, the version of Hazelcast deployed on the Mesos cluster must be 3.6 or later.

See the Docs for this feature
  • Hazelcast Jet Professional
  • Hazelcast Jet Open Source
