In batch processing, a person or application periodically launches a processing job against a bounded input data set. Batch processing is often used for tasks such as ETL (extract-transform-load) to populate data warehouses, data mining, and analytics. Common batch operations include filtering, joining, sorting, grouping, and aggregating data.
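To make these operations concrete, here is a minimal sketch in plain Java (`java.util.stream`, not the Hazelcast API) of a small batch job over a bounded, in-memory data set. The `Order` record, the customer-to-region lookup table, and the `revenueByRegion` method are hypothetical, chosen only for illustration. The job filters out small orders, joins each order to its customer's region, then groups, aggregates, and sorts the totals by region.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class BatchEtlSketch {

    // Hypothetical input record; not part of any Hazelcast API.
    record Order(String customerId, String product, double amount) {}

    // Runs over a bounded data set: every order is available before the job starts.
    static Map<String, Double> revenueByRegion(List<Order> orders,
                                               Map<String, String> regionByCustomer,
                                               double minAmount) {
        return orders.stream()
                // filter: drop orders below the threshold
                .filter(o -> o.amount() >= minAmount)
                // inner join: keep only orders whose customer has a known region
                .filter(o -> regionByCustomer.containsKey(o.customerId()))
                .collect(Collectors.groupingBy(
                        // join + group: look up each order's region and group by it
                        o -> regionByCustomer.get(o.customerId()),
                        // sort: TreeMap keeps the regions in alphabetical order
                        TreeMap::new,
                        // aggregate: total order amount per region
                        Collectors.summingDouble(Order::amount)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("c1", "disk", 120.0),
                new Order("c2", "cpu", 300.0),
                new Order("c1", "ram", 5.0),
                new Order("c3", "gpu", 450.0));
        Map<String, String> regions = Map.of("c1", "EU", "c2", "US", "c3", "EU");
        System.out.println(revenueByRegion(orders, regions, 10.0)); // {EU=570.0, US=300.0}
    }
}
```

Because the whole data set is present before the job starts, the collector can compute its final result in a single pass and then stop.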
Traditionally, developers used specialized ETL tools operating against relational databases. Today, however, it is common to use generic open source tools such as Hadoop and Spark for ETL. These tools run parallel computations over distributed storage, which can deliver very high performance for batch workloads such as ETL.
In batch processing, the complete data set is assembled and available before a job is submitted for processing. Hazelcast treats batch processing as a specific type of stream processing with a finite source and no windows. As a result, developers can use the same programming interface for both batch and stream processing, making the transition to streaming straightforward.
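The idea that a batch job is simply a stream job with a finite source and no windows can be illustrated with a small plain-Java model. This is not Hazelcast code: the class and method names are hypothetical, and a single filter stage (keep even numbers, then count) stands in for the shared processing logic. The bounded source produces one result when its input ends; the unbounded source never ends, so it needs a tumbling window before any result can be emitted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class FiniteStreamSketch {

    // One processing stage, shared by the batch job and the streaming job.
    static final Predicate<Integer> IS_EVEN = n -> n % 2 == 0;

    // Batch: the source is finite, so a single aggregate is emitted when the input ends. No window needed.
    static long countBatch(Iterator<Integer> source) {
        long count = 0;
        while (source.hasNext()) {
            if (IS_EVEN.test(source.next())) count++;
        }
        return count;
    }

    // Streaming: the source may never end, so a count is emitted per tumbling window of windowSize items.
    // maxWindows only bounds this demo; a real streaming job would keep running.
    // A trailing partial window is not emitted in this sketch.
    static List<Long> countStreaming(Iterator<Integer> source, int windowSize, int maxWindows) {
        List<Long> results = new ArrayList<>();
        long count = 0;
        int seen = 0;
        while (source.hasNext() && results.size() < maxWindows) {
            if (IS_EVEN.test(source.next())) count++;
            if (++seen == windowSize) { // window closes: emit its result and reset
                results.add(count);
                count = 0;
                seen = 0;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Iterator<Integer> bounded = IntStream.rangeClosed(1, 10).boxed().iterator();
        Iterator<Integer> unbounded = Stream.iterate(1, n -> n + 1).iterator();
        System.out.println(countBatch(bounded));             // 5
        System.out.println(countStreaming(unbounded, 4, 3)); // [2, 2, 2]
    }
}
```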
Hazelcast is a single 15MB Java library with no external dependencies. It runs fast, scales automatically, and handles failures itself without requiring any additional infrastructure. You can embed Hazelcast directly in an application, such as a data processing microservice, which makes systems easier to build and maintain. Alternatively, you can launch each Hazelcast processing job in its own cluster to maximize service isolation.
Contrast Hazelcast with other popular processing technologies. Hadoop and Spark, for example, consist of many components that require heavyweight installation and maintenance, which makes them complex to deploy and manage. Developers must select the right modules and maintain their dependencies, creating both development and operational challenges.
Hazelcast runs batch processing up to 15x faster than Spark or Flink, and it outperforms Hadoop by orders of magnitude (see the complete benchmark). Hazelcast achieves this performance through a combination of a directed acyclic graph (DAG) computation model, in-memory processing, data locality, partition mapping affinity, single-producer/single-consumer (SPSC) queues, and green threads.
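Of the techniques listed, SPSC queues are the easiest to show in isolation. Below is a minimal bounded SPSC ring buffer in plain Java, connecting an upstream "vertex" thread to a downstream one. It illustrates the general technique only and is not Hazelcast's implementation; `SpscQueue`, `SpscDemo`, and `pipeSum` are hypothetical names.

```java
// Minimal bounded single-producer/single-consumer ring buffer (illustrative, not Hazelcast's code).
final class SpscQueue<E> {
    private final Object[] buffer;
    private final int mask;
    private volatile long head; // next slot to read; written only by the consumer
    private volatile long tail; // next slot to write; written only by the producer

    SpscQueue(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        buffer = new Object[capacity];
        mask = capacity - 1;
    }

    // Producer thread only. Returns false when the queue is full.
    boolean offer(E item) {
        long t = tail;
        if (t - head == buffer.length) return false;
        buffer[(int) (t & mask)] = item;
        tail = t + 1; // volatile write publishes the element to the consumer
        return true;
    }

    // Consumer thread only. Returns null when the queue is empty.
    @SuppressWarnings("unchecked")
    E poll() {
        long h = head;
        if (h == tail) return null;
        int slot = (int) (h & mask);
        E item = (E) buffer[slot];
        buffer[slot] = null;
        head = h + 1; // volatile write hands the slot back to the producer
        return item;
    }
}

class SpscDemo {
    // Upstream thread emits 1..n; downstream thread sums everything it receives.
    static long pipeSum(int n) {
        SpscQueue<Integer> queue = new SpscQueue<>(1024);
        long[] sum = new long[1]; // written only by the consumer thread
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= n; i++) {
                while (!queue.offer(i)) Thread.onSpinWait(); // queue full: spin briefly
            }
        });
        Thread consumer = new Thread(() -> {
            int received = 0;
            while (received < n) {
                Integer value = queue.poll();
                if (value == null) { Thread.onSpinWait(); continue; } // queue empty
                sum[0] += value;
                received++;
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // join() makes the consumer's writes to sum visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(pipeSum(100_000)); // 5000050000
    }
}
```

Because exactly one thread writes `tail` and exactly one writes `head`, the queue needs no locks or compare-and-swap operations; volatile reads and writes are enough to publish elements safely between the two threads.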
Hazelcast source and sink adapters make it easy to insert Hazelcast into an existing data processing pipeline. Hazelcast includes pre-built connectors for Hazelcast IMDG (specifically for the Map, Cache, and List objects), the Hadoop Distributed File System (HDFS), JDBC, and local data files (e.g., CSV, log, or Avro files). You can also create your own connectors to integrate with databases or enterprise applications.
When a Hazelcast cluster processes data held in its own memory, or is colocated with a data store such as HDFS, it benefits from data locality: each node reads only from its own local partitions, which avoids moving data across the network.
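The locality principle can be sketched with a simplified model in plain Java. Two assumptions to flag: keys are mapped to partitions with `Math.floorMod(key.hashCode(), partitionCount)`, and partitions are assigned to members round-robin. Hazelcast's real partitioning hashes the serialized key and consults its own partition table, so the exact mapping differs; only the principle carries over. `LocalityModel` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of partition-local reads (illustrative only; not Hazelcast's partitioning code).
class LocalityModel {

    // Assumption: partition chosen from the key's hashCode (Hazelcast hashes the serialized key instead).
    static int partitionOf(Object key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Assumption: partitions are spread over members round-robin.
    static int ownerOf(int partitionId, int memberCount) {
        return partitionId % memberCount;
    }

    // Simulates which keys member memberId reads: only those in partitions it owns.
    static List<String> localKeys(List<String> allKeys, int memberId, int memberCount, int partitionCount) {
        List<String> local = new ArrayList<>();
        for (String key : allKeys) {
            if (ownerOf(partitionOf(key, partitionCount), memberCount) == memberId) {
                local.add(key);
            }
        }
        return local;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("alpha", "beta", "gamma", "delta", "epsilon", "zeta");
        int members = 3;
        int partitions = 271; // Hazelcast's default partition count
        int total = 0;
        for (int m = 0; m < members; m++) {
            List<String> local = localKeys(keys, m, members, partitions);
            System.out.println("member " + m + " reads " + local);
            total += local.size();
        }
        System.out.println("total keys read: " + total); // every key is read by exactly one member
    }
}
```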
The Hazelcast Pipeline API is a general-purpose, declarative API for composing fast, distributed, concurrent batch processing jobs from building blocks such as mappers, reducers, filters, aggregators, and joiners. It is simple to learn yet powerful enough for complex jobs. For expert users, Hazelcast also provides the Core API, an edge- and vertex-level API that gives fine-grained control over your data pipelines.
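To illustrate what a vertex- and edge-level model means, here is a toy DAG runner in plain Java. It is not the Hazelcast Core API: `ToyDag`, `vertex`, `edge`, and `run` are hypothetical names, and vertices run one at a time in topological order (Kahn's algorithm) rather than concurrently across a cluster. Each vertex receives the concatenated output of all of its upstream vertices, so a fan-out/fan-in shape (source feeding "double" and "square", both feeding "sum") can be expressed directly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy vertex-and-edge model of a DAG job (illustrative only; not Hazelcast's Core API).
class ToyDag {
    private final Map<String, Function<List<Integer>, List<Integer>>> vertices = new LinkedHashMap<>();
    private final Map<String, List<String>> downstream = new HashMap<>();

    // A vertex transforms all items it receives into a list of output items.
    ToyDag vertex(String name, Function<List<Integer>, List<Integer>> logic) {
        vertices.put(name, logic);
        downstream.put(name, new ArrayList<>());
        return this;
    }

    // An edge routes every output item of 'from' to the input of 'to'.
    ToyDag edge(String from, String to) {
        if (!vertices.containsKey(from) || !vertices.containsKey(to)) {
            throw new IllegalArgumentException("unknown vertex in edge " + from + " -> " + to);
        }
        downstream.get(from).add(to);
        return this;
    }

    // Executes vertices one at a time in topological order (Kahn's algorithm).
    Map<String, List<Integer>> run() {
        Map<String, Integer> inDegree = new HashMap<>();
        vertices.keySet().forEach(v -> inDegree.put(v, 0));
        downstream.values().forEach(targets -> targets.forEach(t -> inDegree.merge(t, 1, Integer::sum)));

        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((v, degree) -> { if (degree == 0) ready.add(v); });

        Map<String, List<Integer>> inbox = new HashMap<>();
        Map<String, List<Integer>> outputs = new LinkedHashMap<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            List<Integer> out = vertices.get(v).apply(inbox.getOrDefault(v, List.of()));
            outputs.put(v, out);
            for (String next : downstream.get(v)) {
                inbox.computeIfAbsent(next, k -> new ArrayList<>()).addAll(out);
                // A vertex becomes ready once all of its upstream vertices have run.
                if (inDegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (outputs.size() != vertices.size()) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return outputs;
    }

    public static void main(String[] args) {
        ToyDag dag = new ToyDag()
                .vertex("source", in -> List.of(1, 2, 3))
                .vertex("double", in -> in.stream().map(x -> x * 2).toList())
                .vertex("square", in -> in.stream().map(x -> x * x).toList())
                .vertex("sum", in -> List.of(in.stream().mapToInt(Integer::intValue).sum()))
                .edge("source", "double").edge("source", "square")
                .edge("double", "sum").edge("square", "sum");
        System.out.println(dag.run().get("sum")); // [26]
    }
}
```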
Hazelcast Jet® performs in-memory distributed processing for big data. It is an Apache 2 licensed open source project that uses parallel execution to let data-intensive applications operate in near real time. Hazelcast Jet models the relationships between individual steps in the data processing pipeline as a directed acyclic graph (DAG); it is simple to deploy and can execute both batch and stream processing applications.