Fast Batch Processing

Distributed, in-memory, parallel batch processing for speed

In batch processing, a person or application regularly launches a processing job against a bounded input data set. Batch processing is often used for tasks such as ETL (extract-transform-load) for populating data warehouses, data mining, and analytics. Some of the most common functions of batch processing are filtering, joining, sorting, grouping, and aggregating data.
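To make these operations concrete, here is a minimal sketch of a batch job that filters, groups, aggregates, and sorts a bounded data set. It uses only the standard JDK streams library, not the Hazelcast API, and the `Sale` type, its fields, and the `totalByRegion` method are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class BatchSketch {

    // Hypothetical input record, for illustration only.
    public static final class Sale {
        final String region;
        final long amountCents;

        public Sale(String region, long amountCents) {
            this.region = region;
            this.amountCents = amountCents;
        }
    }

    // Filter out non-positive amounts, group by region, and sum each group.
    // The TreeMap keeps the result sorted by region name.
    public static Map<String, Long> totalByRegion(List<Sale> input) {
        return input.stream()
                .filter((Sale s) -> s.amountCents > 0)
                .collect(Collectors.groupingBy(
                        (Sale s) -> s.region,
                        TreeMap::new,
                        Collectors.summingLong((Sale s) -> s.amountCents)));
    }

    public static void main(String[] args) {
        // A bounded input: the job ends once every element has been read.
        List<Sale> batch = List.of(
                new Sale("EU", 1000),
                new Sale("US", 1500),
                new Sale("EU", 2500),
                new Sale("US", -200)); // dropped by the filter

        System.out.println(totalByRegion(batch)); // prints {EU=3500, US=1500}
    }
}
```

This runs on a single thread in one JVM. A distributed engine would apply the same logical steps in parallel across partitions of the data.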

Traditionally, developers used specialized ETL tools operating against relational databases. Open source tools such as Hadoop and Spark made large-scale batch processing more mainstream. Such tools leverage parallel computation on distributed storage to efficiently process data. Hadoop uses an older processing paradigm called MapReduce, while Spark uses directed acyclic graphs (DAGs) to coordinate the processing. Hazelcast also uses DAGs, but adds in-memory speeds to complete the work much more quickly.

Hazelcast treats batch processing as a specific type of stream processing with a finite source and no time windows. Because Hazelcast was built for high-speed stream processing, the same performance advantage applies to batch data. Developers can also use the same programming interface for both batch and stream processing, making the transition to streaming straightforward.
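The idea that a batch is simply a stream with a finite source can be sketched in plain Java. The example below is not the Hazelcast API; it uses JDK streams, and the names `UnifiedPipelineSketch`, `pipeline`, `runBatch`, and `sampleStream` are hypothetical. It defines one transformation and applies it unchanged to a bounded list and to an unbounded sequence.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class UnifiedPipelineSketch {

    // One transformation definition; it does not care whether the source ever ends.
    public static Stream<String> pipeline(Stream<Integer> source) {
        return source
                .filter(n -> n % 2 == 0)
                .map(n -> "even:" + n);
    }

    // Batch case: the source is finite, so the job completes on its own.
    public static List<String> runBatch(List<Integer> bounded) {
        return pipeline(bounded.stream()).collect(Collectors.toList());
    }

    // Streaming case: Stream.iterate yields 1, 2, 3, ... without end,
    // so we only take the first `count` results to terminate the demo.
    public static List<String> sampleStream(int count) {
        return pipeline(Stream.iterate(1, n -> n + 1))
                .limit(count)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(runBatch(List.of(1, 2, 3, 4))); // prints [even:2, even:4]
        System.out.println(sampleStream(3));               // prints [even:2, even:4, even:6]
    }
}
```

The only difference between the two cases is the source and how the run ends, which is why moving from batch to streaming requires little change to the processing logic itself.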

Hazelcast runs fast, scales automatically, and handles failures itself without requiring any additional infrastructure. You can fully embed Hazelcast into applications such as data processing microservices, making it easier to build and maintain next-generation systems. Or you can launch each Hazelcast processing job within its own cluster to maximize service isolation.

Contrast Hazelcast with other popular processing technologies. For example, Hadoop and Spark have many components that require a heavyweight installation and maintenance effort, which makes them complex to deploy and manage. Developers must select the right modules and maintain their dependencies, creating both development and operational challenges.

Hazelcast significantly accelerates batch processing compared to other processing frameworks. Our benchmarks show that Hazelcast delivers extreme speed with extreme efficiency, processing 1 billion events per second on far fewer hardware resources than other technologies. Hazelcast achieves this performance through the combination of a highly optimized directed acyclic graph (DAG) computation model, in-memory processing, data locality, partition mapping affinity, queues, and green threads.
