Overview
In batch processing, a person or application regularly launches a processing job against a bounded input data set. Batch processing is often used for tasks such as ETL (extract-transform-load) to populate data warehouses, data mining, and analytics. The most common batch operations are filtering, joining, sorting, grouping, and aggregating data.
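The operations listed above can be sketched on a small in-memory data set with plain `java.util.stream`. This is not Hazelcast Jet code: the `Order` record, the customer-to-region lookup table, and the `revenueByRegion` method are hypothetical, chosen only to show filtering, a lookup-style inner join, grouping, aggregation, and sorted output over a bounded input in a single pass.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class BatchEtlSketch {
    // Hypothetical input record: one order placed by a customer.
    record Order(String customerId, double amount) {}

    // Filter, join, group, aggregate, and sort a bounded input data set.
    static Map<String, Double> revenueByRegion(List<Order> orders,
                                               Map<String, String> regionByCustomer) {
        return orders.stream()
                .filter(o -> o.amount() > 0)                               // filter: drop refunds and bad rows
                .filter(o -> regionByCustomer.containsKey(o.customerId())) // inner join: keep matched customers
                .collect(Collectors.groupingBy(
                        o -> regionByCustomer.get(o.customerId()),         // join + group: look up the region
                        TreeMap::new,                                      // sort: regions in natural order
                        Collectors.summingDouble(Order::amount)));         // aggregate: total per region
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("c1", 120.0), new Order("c2", 80.0),
                new Order("c1", -20.0), new Order("c3", 50.0), new Order("c9", 10.0));
        Map<String, String> regions = Map.of("c1", "EU", "c2", "US", "c3", "EU");
        System.out.println(revenueByRegion(orders, regions)); // {EU=170.0, US=80.0}
    }
}
```

Because the input is bounded, the complete result is known once the job finishes, which is what allows a batch job to emit a single final answer.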
Traditionally, developers used specialized ETL tools operating against relational databases. Now, however, it is quite common to see generic open source tools such as Hadoop and Spark used for ETL. Such tools leverage parallel computation against distributed storage, which can offer very high performance for batch processing jobs such as ETL workloads.
In batch processing, the complete data set is assembled and available before a job is submitted for processing. Hazelcast Jet treats batch processing as a specific type of stream processing with a finite source and no windows. As a result, developers can use the same programming interface for both batch and stream processing, making the transition to streaming straightforward.
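The idea that a batch job is simply a stream job with a finite source can be illustrated with the JDK's own `Stream` type standing in for a Jet pipeline. This is an assumption for illustration only; Jet's Pipeline API is a separate, distributed API. The hypothetical `STAGE` function is written once and applied unchanged to a finite list and to an unbounded generated stream.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnifiedPipelineSketch {
    // One processing stage, written once, applied unchanged to any source.
    static final Function<Stream<Integer>, Stream<Integer>> STAGE =
            s -> s.filter(x -> x % 2 == 0).map(x -> x * x);

    // Batch: a finite source; the job completes when the input is exhausted.
    static List<Integer> runBatch(List<Integer> input) {
        return STAGE.apply(input.stream()).collect(Collectors.toList());
    }

    // Streaming: an unbounded source; we observe only the first n results,
    // because an infinite stream never completes on its own.
    static List<Integer> runStreaming(int n) {
        Stream<Integer> unbounded = Stream.iterate(1, x -> x + 1);
        return STAGE.apply(unbounded).limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(runBatch(List.of(1, 2, 3, 4, 5, 6))); // [4, 16, 36]
        System.out.println(runStreaming(3));                    // [4, 16, 36]
    }
}
```

The streaming run is cut off with `limit` only so that the example terminates. An aggregation over a truly unbounded source would need windows, while a batch run effectively aggregates over its entire input.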
Hazelcast Jet is a single 15 MB Java library with no external dependencies. It runs fast, scales automatically, and recovers from failures on its own, without requiring any additional infrastructure. You can fully embed Hazelcast Jet in applications such as data processing microservices, which makes next-generation systems easier to build and maintain. Alternatively, you can launch each Hazelcast Jet processing job in its own cluster to maximize service isolation.
Contrast this with other popular processing technologies. Hadoop and Spark, for example, consist of many components that require heavyweight installation and ongoing maintenance, which makes them complex to deploy and manage. Developers must select the right modules and maintain their dependencies, which creates both development and operational challenges.
Solutions
According to Hazelcast's published benchmark, Hazelcast Jet runs batch processing jobs up to 15x faster than Spark or Flink and outperforms Hadoop by orders of magnitude. Hazelcast Jet achieves this performance through a combination of a directed acyclic graph (DAG) computation model, in-memory processing, data locality, partition mapping affinity, single-producer/single-consumer (SPSC) queues, and green threads.
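SPSC queues are one of the listed ingredients. Because each such queue has exactly one writing thread and one reading thread, it needs no locks or compare-and-swap operations. The following is a minimal sketch of a bounded SPSC ring buffer, not Hazelcast Jet's internal implementation. The `SpscQueue` and `SpscDemo` classes are hypothetical, and the code uses plain volatile writes for clarity, where tuned implementations often use cheaper ordered (release) stores.

```java
import java.util.concurrent.atomic.AtomicLong;

class SpscDemo {
    // Moves 0..n-1 from a producer thread to this (consumer) thread and returns their sum.
    static long transfer(int n) {
        SpscQueue<Integer> queue = new SpscQueue<>(1024);
        Thread producer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                while (!queue.offer(i)) Thread.onSpinWait(); // full: spin until space frees up
            }
        });
        producer.setDaemon(true); // never keeps the JVM alive if the consumer fails
        producer.start();
        long sum = 0;
        long expected = 0;
        while (expected < n) {
            Integer v = queue.poll();
            if (v == null) { Thread.onSpinWait(); continue; } // empty: spin
            if (v != expected) throw new IllegalStateException("FIFO order violated");
            sum += v;
            expected++;
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(transfer(100_000)); // 4999950000
    }
}

// Bounded lock-free ring buffer: safe for exactly one producer and one consumer thread.
class SpscQueue<E> {
    private final Object[] buffer;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next index to read; written only by the consumer
    private final AtomicLong tail = new AtomicLong(); // next index to write; written only by the producer

    SpscQueue(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        buffer = new Object[capacity];
        mask = capacity - 1;
    }

    // Producer thread only. Returns false if the queue is full.
    boolean offer(E e) {
        long t = tail.get();
        if (t - head.get() == buffer.length) return false;
        buffer[(int) (t & mask)] = e;
        tail.set(t + 1); // volatile write publishes the filled slot to the consumer
        return true;
    }

    // Consumer thread only. Returns null if the queue is empty.
    @SuppressWarnings("unchecked")
    E poll() {
        long h = head.get();
        if (h == tail.get()) return null;
        int index = (int) (h & mask);
        E e = (E) buffer[index];
        buffer[index] = null; // drop the reference so it can be garbage collected
        head.set(h + 1);      // volatile write hands the slot back to the producer
        return e;
    }
}
```

The capacity is a power of two so that `index & mask` replaces a modulo operation on the hot path.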
Hazelcast Jet source and sink adapters make it easy to insert Hazelcast Jet into the data processing pipeline. Hazelcast Jet includes pre-built connectors for Hazelcast IMDG (specifically for the Map, Cache, and List objects), Hadoop Distributed File System, JDBC, and local data files (e.g., CSV, logs, or Avro files).
When a Hazelcast Jet cluster is co-located with Hazelcast IMDG or HDFS, it takes advantage of data locality: each Hazelcast Jet node reads only from its own local partitions, so the data is read efficiently without being pulled across the network. You can also create your own connectors to integrate with databases or enterprise applications.
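Data locality can be sketched as a simulation in plain Java. The sketch assumes 271 partitions (Hazelcast IMDG's default partition count) and uses `hashCode()` as a stand-in for Hazelcast's own partitioning hash, which is computed over the serialized key. The round-robin ownership table and the `DataLocalitySketch` class are hypothetical, and the members are simulated by a loop rather than running in parallel on separate machines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DataLocalitySketch {
    static final int PARTITION_COUNT = 271; // Hazelcast IMDG's default partition count

    // Simplified: Hazelcast hashes the serialized key; hashCode() stands in for that here.
    static int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Hypothetical ownership table: partitions spread round-robin across members.
    static int ownerOf(int partitionId, int memberCount) {
        return partitionId % memberCount;
    }

    // Splits the data set into the entries each member stores locally.
    static List<Map<String, Integer>> localViews(Map<String, Integer> data, int memberCount) {
        List<Map<String, Integer>> views = new ArrayList<>();
        for (int m = 0; m < memberCount; m++) views.add(new HashMap<>());
        for (Map.Entry<String, Integer> e : data.entrySet()) {
            int owner = ownerOf(partitionOf(e.getKey()), memberCount);
            views.get(owner).put(e.getKey(), e.getValue());
        }
        return views;
    }

    // Each member sums only its local partitions; only the partial sums are combined.
    static long distributedSum(Map<String, Integer> data, int memberCount) {
        long total = 0;
        for (Map<String, Integer> local : localViews(data, memberCount)) {
            long partial = 0;
            for (int v : local.values()) partial += v; // local scan, no remote reads
            total += partial;                          // combine step
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> data = new HashMap<>();
        for (int i = 0; i < 1000; i++) data.put("key-" + i, i);
        List<Map<String, Integer>> views = localViews(data, 3);
        for (int m = 0; m < views.size(); m++) {
            System.out.println("member " + m + " reads " + views.get(m).size() + " entries");
        }
        System.out.println(distributedSum(data, 3)); // 499500
    }
}
```

The key design point is in `distributedSum`: every entry is scanned by exactly one member, the one that already stores it, and only one partial sum per member has to be combined.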
The Hazelcast Jet Pipeline API is a general-purpose, declarative API that lets developers compose fast, distributed, concurrent batch processing jobs from building blocks such as mappers, reducers, filters, aggregators, and joiners. It is easy to learn yet powerful. For expert users, Hazelcast Jet also provides the Core API, a lower-level API that works directly with the vertices and edges of the DAG for fine-grained control over data pipelines.
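The Core API's vertex-and-edge model can be illustrated with a toy DAG executor. `MiniDag` and its methods are hypothetical and are not Jet's Core API classes. The sketch runs vertices one at a time in topological order (Kahn's algorithm) on a single thread, whereas Jet executes vertices concurrently across the cluster. A word-count job is expressed as three vertices connected by two edges.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class MiniDag {
    // Each vertex transforms the combined items received on its inbound edges.
    private final Map<String, Function<List<Object>, List<Object>>> vertices = new LinkedHashMap<>();
    private final Map<String, List<String>> outEdges = new HashMap<>();
    private final Map<String, Integer> inDegree = new HashMap<>();

    MiniDag vertex(String name, Function<List<Object>, List<Object>> fn) {
        vertices.put(name, fn);
        outEdges.put(name, new ArrayList<>());
        inDegree.put(name, 0);
        return this;
    }

    MiniDag edge(String from, String to) {
        if (!vertices.containsKey(from) || !vertices.containsKey(to)) {
            throw new IllegalArgumentException("declare both vertices before the edge");
        }
        outEdges.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
        return this;
    }

    // Runs vertices in topological order (Kahn's algorithm); returns each vertex's output.
    Map<String, List<Object>> run() {
        Map<String, Integer> remaining = new HashMap<>(inDegree);
        Map<String, List<Object>> inbox = new HashMap<>();
        Map<String, List<Object>> outputs = new LinkedHashMap<>();
        Deque<String> ready = new ArrayDeque<>();
        for (String v : vertices.keySet()) {
            if (remaining.get(v) == 0) ready.add(v); // sources have no inbound edges
        }
        while (!ready.isEmpty()) {
            String v = ready.poll();
            List<Object> out = vertices.get(v).apply(inbox.getOrDefault(v, List.of()));
            outputs.put(v, out);
            for (String next : outEdges.get(v)) {
                inbox.computeIfAbsent(next, k -> new ArrayList<>()).addAll(out); // edge delivers items
                if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (outputs.size() != vertices.size()) throw new IllegalStateException("graph contains a cycle");
        return outputs;
    }

    // Word count as source -> tokenize -> count.
    static Map<String, Long> wordCount(List<String> lines) {
        MiniDag dag = new MiniDag()
                .vertex("source", in -> new ArrayList<>(lines)) // emits the raw lines
                .vertex("tokenize", in -> {                     // lines -> lower-case words
                    List<Object> words = new ArrayList<>();
                    for (Object line : in) {
                        for (String w : ((String) line).toLowerCase().split("\\W+")) {
                            if (!w.isEmpty()) words.add(w);
                        }
                    }
                    return words;
                })
                .vertex("count", in -> {                        // words -> one sorted count map
                    Map<String, Long> counts = new TreeMap<>();
                    for (Object w : in) counts.merge((String) w, 1L, Long::sum);
                    return List.of(counts);
                })
                .edge("source", "tokenize")
                .edge("tokenize", "count");
        @SuppressWarnings("unchecked")
        Map<String, Long> result = (Map<String, Long>) dag.run().get("count").get(0);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("the cat sat", "The dog sat"))); // {cat=1, dog=1, sat=2, the=2}
    }
}
```

Declaring the graph separately from running it lets an engine inspect the whole DAG before execution, for example to reject cycles, as `run()` does here.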
Resources
Java Champion Ben Evans introduces stream processing, explains its core techniques, and shows how to get started building a stream processing application, using real-world use cases and live demos.
Learn about in-memory distributed processing for big data with Hazelcast Jet®, a new open source project, licensed under Apache 2.0, that uses parallel execution to let data-intensive applications operate in near real time. Hazelcast Jet uses directed acyclic graphs (DAGs) to model the relationships between the individual steps of a data processing pipeline. It is simple to deploy and can run both batch and stream processing applications.
The streaming benchmark is intended to measure the latency overhead for a streaming system under different conditions such as message rate and window size. It compares Hazelcast Jet, Apache Flink, and Apache Spark Streaming.
Whether you're interested in learning the basics of in-memory systems, or you're looking for advanced, real-world production examples and best practices, we've got you covered.