Compare features across the Hazelcast Jet product family.
HAZELCAST JET ENTERPRISE
HAZELCAST JET PRO
HAZELCAST JET OPEN SOURCE
Pipeline API is the primary API of Hazelcast Jet for processing both bounded and unbounded data. Use it to declare the data processing pipelines by composing high-level operations (such as map, filter, group, join, window) on a stream of records. The Pipeline API is a Java 8 API with static type safety.
Core API is Jet’s low-level API that directly exposes the computation engine’s raw features (DAGs, partitioning schemes, vertex parallelism, distributed vs. local edges, etc.).
The purpose of Core API is to serve as the infrastructure on top of which to build high-level DSLs and APIs that describe computation jobs.
Use Core API for low-level control over the data flow, for fine-tuning performance or for building DSLs.
Hazelcast Jet is built on top of a low latency streaming core. This refers to processing the incoming records as soon as possible as opposed to accumulating the records into micro-batches before processing.
Because data streams are unbounded and may contain infinite sequences of records, a tool is required to group individual records into finite frames so that a computation can run on them. Hazelcast Jet windows provide this grouping.
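The grouping idea can be illustrated with a minimal sketch of a fixed-size (tumbling) window. This is plain JDK code, not the Hazelcast Jet API; the class name `TumblingWindowSketch`, the method `countPerWindow`, and the sample timestamps are hypothetical and exist only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch in plain JDK code; this is not the Hazelcast Jet API.
final class TumblingWindowSketch {

    /**
     * Counts records per tumbling window. Every timestamp falls into exactly
     * one window, identified here by the window's start time.
     */
    static Map<Long, Integer> countPerWindow(long[] timestampsMillis, long windowSizeMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMillis) {
            // Round the timestamp down to the start of its window.
            long windowStart = Math.floorDiv(ts, windowSizeMillis) * windowSizeMillis;
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timestamps = {100, 900, 1_000, 2_500};
        System.out.println(countPerWindow(timestamps, 1_000)); // {0=2, 1000=1, 2000=1}
    }
}
```

A real streaming engine emits a window's result once it knows the window is complete; this sketch skips that step and simply groups a finite array.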
Types of windows supported by Jet: tumbling (fixed), sliding, and session.
Hazelcast Jet allows you to classify records in a data stream based on the timestamp embedded in each record — the event time. Event time processing is a natural requirement as users are mostly interested in handling the data based on the time that the event originated (the event time). Event time processing is a first-class citizen in Jet.
To handle late events, a set of policies determines whether an event is “still on time” or “late”. Late events are discarded.
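One common way to separate on-time from late events is a bounded-lag rule: an event is late if its timestamp trails the highest timestamp seen so far by more than an allowed lag. The sketch below assumes that rule purely for illustration. It is plain JDK code and does not reproduce Jet's actual policies; `LateEventSketch` and `allowedLagMillis` are hypothetical names.

```java
// Illustrative sketch in plain JDK code; this is not the Hazelcast Jet API.
final class LateEventSketch {
    private final long allowedLagMillis;
    private long maxTimestampSeen = Long.MIN_VALUE;

    LateEventSketch(long allowedLagMillis) {
        this.allowedLagMillis = allowedLagMillis;
    }

    /** Returns true if the event is still on time, false if it is late and should be discarded. */
    boolean accept(long eventTimestampMillis) {
        boolean seenAny = maxTimestampSeen != Long.MIN_VALUE;
        if (seenAny && eventTimestampMillis < maxTimestampSeen - allowedLagMillis) {
            return false; // too far behind the newest event seen so far
        }
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMillis);
        return true;
    }

    public static void main(String[] args) {
        LateEventSketch filter = new LateEventSketch(100);
        long[] timestamps = {1_000, 950, 1_200, 1_050};
        for (long ts : timestamps) {
            System.out.println(ts + (filter.accept(ts) ? " on time" : " late, discarded"));
        }
    }
}
```

With an allowed lag of 100 ms, the event at 950 is accepted because it is only 50 ms behind, while the event at 1,050 arrives after 1,200 has been seen and is dropped.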
In a streaming system, the flow of messages must be controlled. A consumer must not be flooded with more messages than it can process in a fixed amount of time. On the other hand, processors should not be left idle, wasting resources.
Hazelcast Jet comes with a mechanism to handle back pressure. Every part of the Jet job keeps signaling to all the upstream producers how much free capacity it has. This information is naturally propagated upstream to keep the system balanced.
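The signaling idea can be sketched with a bounded queue: the consumer exposes its free capacity, and the producer holds back when the queue is full. This is a single-threaded illustration in plain JDK code, not Jet's actual flow-control mechanism; `BackPressureSketch`, `trySend`, and `processOne` are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Illustrative sketch in plain JDK code; this is not the Hazelcast Jet API.
final class BackPressureSketch {
    private final ArrayBlockingQueue<String> inbox;

    BackPressureSketch(int capacity) {
        inbox = new ArrayBlockingQueue<>(capacity);
    }

    /** Free capacity the consumer would report to its upstream producers. */
    int freeCapacity() {
        return inbox.remainingCapacity();
    }

    /** Producer side: non-blocking send; false means "back off and retry later". */
    boolean trySend(String item) {
        return inbox.offer(item);
    }

    /** Consumer side: takes one item, or returns null if nothing is waiting. */
    String processOne() {
        return inbox.poll();
    }

    public static void main(String[] args) {
        BackPressureSketch consumer = new BackPressureSketch(2);
        System.out.println(consumer.trySend("a")); // true
        System.out.println(consumer.trySend("b")); // true
        System.out.println(consumer.trySend("c")); // false: inbox is full
        consumer.processOne();                     // frees one slot
        System.out.println(consumer.trySend("c")); // true
    }
}
```

Because the producer never blocks, it stays free to do other work while it waits for capacity, which matches the goal of keeping processors neither flooded nor idle.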
Hazelcast Jet supports distributed state snapshots. Snapshots are periodically created to back up the running state. Periodic snapshots are used as a consistent point of recovery for failures. Snapshots are also taken and used for up-scaling.
For snapshot creation, exactly-once or at-least-once semantics can be used. This is a trade-off between correctness and performance. It is configured per job.
Hazelcast Jet ensures exactly-once semantics when a replayable source (e.g., Kafka) is used with an idempotent sink (e.g., any store with upsert functionality).
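Why a replayable source combined with an idempotent sink yields exactly-once results can be shown with a small sketch. After a failure, processing restarts from the last snapshot offset, so some records are written twice, but an upsert keyed by record ID leaves a single row for each. This is plain JDK code, not the Jet API; the `key=value` record format and all class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch in plain JDK code; this is not the Hazelcast Jet API.
final class ReplayUpsertSketch {

    /** Idempotent sink: writing the same key twice still leaves exactly one row. */
    static final class UpsertSink {
        final Map<String, String> rows = new HashMap<>();
        int writes = 0;

        void upsert(String key, String value) {
            rows.put(key, value);
            writes++;
        }
    }

    /** Replays records in [fromOffset, toOffset) into the sink. Records are "key=value" strings. */
    static void process(List<String> records, int fromOffset, int toOffset, UpsertSink sink) {
        for (int i = fromOffset; i < toOffset; i++) {
            String[] kv = records.get(i).split("=", 2);
            sink.upsert(kv[0], kv[1]);
        }
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("k1=a", "k2=b", "k3=c", "k4=d");
        UpsertSink sink = new UpsertSink();

        // First run: a snapshot records offset 1, then the job fails after offset 2.
        process(source, 0, 3, sink);
        // Recovery: replay from the snapshot offset; k2 and k3 are written again.
        process(source, 1, 4, sink);

        System.out.println(sink.writes);      // 6 writes in total
        System.out.println(sink.rows.size()); // 4 distinct rows, no duplicates
    }
}
```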
Hazelcast Jet supports the two-phase commit protocol, which extends exactly-once guarantees to more types of sources and sinks by letting them participate in transaction-based streaming.
This capability currently adds JMS as a source, and both Kafka and files as sinks, with more planned.
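For readers unfamiliar with two-phase commit, the generic protocol can be sketched as follows: every participant is first asked to prepare, and the coordinator commits only if all of them succeed; otherwise the ones that prepared are rolled back. This is a textbook illustration in plain JDK code. It does not show how Jet wires the protocol into its sources, sinks, or snapshots, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch in plain JDK code; this is not the Hazelcast Jet API.
final class TwoPhaseCommitSketch {

    interface Participant {
        boolean prepare();   // phase 1: vote yes (true) or no (false)
        void commit();       // phase 2, if everyone voted yes
        void rollback();     // phase 2, if anyone voted no
    }

    /** Returns true if the transaction committed on all participants. */
    static boolean run(List<? extends Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        for (Participant p : participants) {
            if (p.prepare()) {
                prepared.add(p);
            } else {
                // One "no" vote aborts the transaction; undo those already prepared.
                for (Participant q : prepared) {
                    q.rollback();
                }
                return false;
            }
        }
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }

    /** Test double that records which state it ended in. */
    static final class RecordingParticipant implements Participant {
        private final boolean canPrepare;
        String state = "INIT";

        RecordingParticipant(boolean canPrepare) {
            this.canPrepare = canPrepare;
        }

        @Override public boolean prepare() {
            state = canPrepare ? "PREPARED" : "ABORTED";
            return canPrepare;
        }

        @Override public void commit() { state = "COMMITTED"; }

        @Override public void rollback() { state = "ROLLED_BACK"; }
    }

    public static void main(String[] args) {
        RecordingParticipant sink = new RecordingParticipant(true);
        RecordingParticipant failing = new RecordingParticipant(false);
        System.out.println(run(Arrays.asList(sink, failing))); // false
        System.out.println(sink.state);                        // ROLLED_BACK
    }
}
```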
Hazelcast Jet is able to tolerate faults such as network failure, network split, or node failure.
When there is a fault, Jet uses the latest state snapshot and automatically restarts all jobs that contain the failed member as a job participant from this snapshot.
Hazelcast Jet uses distributed in-memory storage to store the snapshots. This storage is an integral component of the Jet cluster, so no further infrastructure is necessary. Data is stored in multiple replicas distributed across the cluster to increase resiliency.
IN-MEMORY DATA GRID
Hazelcast Jet is integrated with Hazelcast IMDG, an elastic in-memory storage, to provide highly optimized reads and writes to distributed, in-memory implementations of java.util.Map, javax.cache.Cache and java.util.List. IMDG is to be used for:
Hazelcast IMDG is embedded in Jet, so all IMDG services are available to your Jet jobs without any additional deployment effort. Because IMDG is the embedded support structure for Jet, Jet fully controls it (start, shutdown, scaling, etc.).
To isolate the processing from the storage, you can still make use of Jet reading from or writing to remote Hazelcast IMDG clusters.
Hazelcast Jet includes a connector that lets you process the stream of changes (the Event Journal) of an IMap or ICache, so developers can stream-process IMap/ICache data or use Hazelcast IMDG as storage for data ingestion.
Jet provides high-performance readers and writers for Hazelcast IMap, ICache and IList. The IMap and ICache are partitioned and distributed. Jet exploits data locality by reading data from the same node, which avoids the network transit penalty.
The streaming connectors for IMap and ICache allow the user to treat the Hazelcast distributed map itself as a streaming source, where an event is created for every change that happens on the map. This allows the map to be used as a source of events during a streaming job.
Hazelcast Jet utilizes message brokers for ingesting data streams and it is able to work as a data processor connected to a message broker in the data pipeline.
Jet comes with a Kafka connector for reading from and writing to Kafka topics.
Java Message Service (JMS) is a traditional means of implementing enterprise integration. The Jet JMS connector allows you to stream messages from/to a JMS queue or a JMS topic using a JMS client on the classpath (such as ActiveMQ or RabbitMQ). Reading from the queue can be parallelized for higher throughput.
The Hazelcast Jet JDBC connector can be used to read data from or write data to relational databases or any other source that supports the standard JDBC API. It’s a batch connector that executes a SQL query and sends the result to the Jet pipeline. It supports parallel reading for partitioned sources.
Hadoop Distributed File System (HDFS) is a common file system used for building large, low-cost data warehouses and data lakes. Hazelcast Jet can use HDFS as either a data source or a data sink. If the Jet and HDFS clusters are co-located, Jet benefits from data locality and processes the data on the same node without incurring a network transit penalty.
Jet can read and write Avro-serialized data from self-contained files (Avro Object Container format), HDFS and Kafka. The Kafka connector can be configured to use the schema registry.
Hazelcast Jet comes with batch and streaming file readers to process local data (e.g. CSVs or logs). The batch reader processes lines from a file or directory. The streamer watches the file or directory for changes, streaming the new lines to Hazelcast Jet.
The socket connector allows Jet jobs to read text data streams from a socket. Every line is processed as one record.
Hazelcast Jet provides a flexible API that makes it easy to implement your own custom sources and sinks. Code samples are available to use as templates.
Hazelcast Jet supports the use of any Kafka Connect module without the presence of a Kafka cluster. This adds more sources and sinks to the Jet ecosystem. This feature includes full support for fault-tolerance and replaying.
Hazelcast Jet supports change data capture through integration with Debezium and Striim. Debezium provides CDC integration with SQL Server, MySQL, PostgreSQL, MongoDB, and Cassandra. Striim provides CDC integration with Oracle.
Management Center enables you to monitor and manage your Hazelcast Jet cluster. In addition to monitoring the overall health of your cluster, you can also analyze the data flow of the distributed pipelines. Management Center provides visual tools to inspect running jobs and detect potential bottlenecks.
Crucially, developers can observe clusters in real-time and gain far more insight into what is occurring “under the hood”.
Hazelcast Jet is elastic — it is able to dynamically re-scale to adapt to workload changes.
When the cluster grows or shrinks, running jobs can be automatically replanned to make use of all available resources.
Hazelcast Jet uses persistence to back up its state snapshots regularly. Jobs, job state, and job configuration are configured to be persistent with Hazelcast’s Hot Restart capability. Computations are restarted from where they left off after the cluster is back online.
Allows jobs to be upgraded without data loss or interruption, using Jet state snapshots to switch to the new job version in milliseconds.
Provides encryption based on TLS Certificates between members, between clients and members, and between members and Management Center.
Uses the SSLEngine built into the JDK with some performance enhancements.
Connectors are used to connect the Jet job with data sources and sinks. Secure connections to external systems combined with security within the Jet cluster make the data pipeline secure end-to-end.
The following Jet connectors have security features: Hazelcast IMDG, Kafka, JDBC, JMS.
Allowed Connection IP Ranges
The authentication mechanism for Hazelcast client security works the same as cluster member authentication. To implement client authentication, configure a Credential and one or more LoginModules. The client side does not have and does not need a factory object to create Credentials objects like ICredentialsFactory. Credentials must be created at the client side and sent to the connected node during the connection process.
Hazelcast client authorization is configured by a client permission policy. Hazelcast has a default permission policy implementation that uses permission configurations defined in the Hazelcast security configuration. Default policy permission checks are done against instance types (map, queue, etc.), instance names (map, queue, name, etc.), instance actions (put, read, remove, add, etc.), client endpoint addresses, and the client principal defined by the Credentials object. Instance names, principal names, and endpoint addresses can be defined with wildcards (*).
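The kind of check described above can be sketched as a permission that matches on instance type, instance name, principal, and action, with `*` accepted in the name and principal. This is a plain JDK illustration, not Hazelcast's permission policy implementation. It assumes that `*` matches any run of characters, and `PermissionSketch` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Illustrative sketch in plain JDK code; this is not the Hazelcast security API.
final class PermissionSketch {
    private final String instanceType;
    private final Pattern namePattern;
    private final Pattern principalPattern;
    private final Set<String> actions;

    PermissionSketch(String instanceType, String nameWildcard,
                     String principalWildcard, String... actions) {
        this.instanceType = instanceType;
        this.namePattern = wildcardToPattern(nameWildcard);
        this.principalPattern = wildcardToPattern(principalWildcard);
        this.actions = new HashSet<>(Arrays.asList(actions));
    }

    /** Assumption: '*' matches any run of characters; everything else is literal. */
    static Pattern wildcardToPattern(String wildcard) {
        String[] parts = wildcard.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    /** True only if type, name, principal, and action all match this permission. */
    boolean allows(String type, String name, String principal, String action) {
        return instanceType.equals(type)
                && namePattern.matcher(name).matches()
                && principalPattern.matcher(principal).matches()
                && actions.contains(action);
    }

    public static void main(String[] args) {
        PermissionSketch p = new PermissionSketch("map", "orders-*", "*", "read", "put");
        System.out.println(p.allows("map", "orders-eu", "alice", "read"));   // true
        System.out.println(p.allows("map", "orders-eu", "alice", "remove")); // false
        System.out.println(p.allows("map", "customers", "alice", "read"));   // false
    }
}
```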
In symmetric encryption, each node uses the same key, so the key is shared.
Hazelcast has an extensible, JAAS-based security feature you can use to authenticate both cluster members and clients, and to perform access control checks on client operations. Access control can be done according to endpoint principal and/or endpoint address.
Hazelcast allows you to intercept socket connections before a node joins a cluster or a client connects to a node. This provides the ability to add custom hooks to the join and connection procedures (such as identity checking using Kerberos).
Hazelcast IMDG allows you to intercept every remote operation executed by the client. This lets you add flexible custom security logic.
CLOUD AND VIRTUALIZATION SUPPORT
Hazelcast Jet can be extended by cloud plugins, allowing applications to be deployed in different cloud infrastructure ecosystems.
See Hazelcast Jet Cloud Plugins: Amazon Web Services, Microsoft Azure, Docker, Apache JClouds, Consul Discovery, Apache ZooKeeper Discovery
Hazelcast Jet Docker image is an extension of the official Hazelcast Docker image with a Kubernetes discovery plugin which enables deployment of Hazelcast on your OpenShift platform as a data processing service.