Stream Processing

Stream processing capabilities let you act instantly on data that represents the latest activities in your business. Instead of relying on batch processing, which inserts delays into your data-driven initiatives, you can use Hazelcast stream processing to act the moment data arrives.

Introduction

Stream processing involves performing operations on data in motion. It typically handles one data element at a time, or small groups of elements ("windows"), in contrast to batch processing, which operates on an entire dataset as a single task.

A common business goal is to put data to use as quickly as possible. That goal is hard to reach with the wrong technologies: legacy tools designed for batch processing get in the way. A genuine stream processing engine, by contrast, extracts value from real-time data efficiently and delivers the business advantages companies are looking for.

Components

On a technology level, a basic stream processing architecture consists of:

  • A set of data sources that provide the real-time data to be processed.
  • A message bus, like Apache Kafka or Apache Pulsar, that stores the data from the data sources to be consumed by the stream processing application.
  • A stream processing engine that reads (“ingests”) data from the message bus to perform several operations on it like filtering, aggregations, enrichment, transformations, scoring, delivery, and loading.
  • One or more reference databases that provide contextual information that helps to enrich the streaming data to make it more meaningful.
  • A set of data destinations (“sinks”) to which the streaming data is ultimately stored, typically in a queryable format.
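The flow through these components can be sketched in a few lines of plain Python. This is an illustrative toy, not a Hazelcast API; the source, reference table, and event fields are all hypothetical stand-ins for a message bus, a reference database, and a sink.

```python
# Minimal sketch of a stream pipeline: source -> filter -> enrich -> sink.
# All names and data here are hypothetical, not a Hazelcast API.

def source():
    # Stand-in for a message bus: yields events one at a time.
    yield from [
        {"user": "alice", "amount": 250},
        {"user": "bob", "amount": 40},
        {"user": "carol", "amount": 900},
    ]

# Stand-in for a reference database used for enrichment.
REFERENCE_DB = {"alice": "gold", "bob": "silver", "carol": "gold"}

def pipeline(events, sink):
    for event in events:                              # ingest one element at a time
        if event["amount"] < 100:                     # filter
            continue
        event["tier"] = REFERENCE_DB[event["user"]]   # enrich with contextual data
        sink.append(event)                            # load into the data destination

sink = []
pipeline(source(), sink)
```

After running, the sink holds only the enriched, high-value events; the low-value event was filtered out in flight rather than in a later batch pass.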

When more capabilities are required in the stream processing architecture, the following components are also included:

  • A stateful store, which is a fast data store that holds information as an artifact of the stream processing workflow that is used for subsequent processing of data in the stream. This component is necessary for what’s known as “stateful processing” or “stateful computations.”
  • A machine learning inference mechanism, which allows the system to run streaming data through a machine learning algorithm in an operational/production environment.
  • A digital integration hub (also known as a hot data layer) to which streaming data is delivered, and which acts as a central access point for recent data, especially for end user-facing, interactive applications that rely on real-time data.
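The role of the stateful store can be illustrated with a per-key running total. This is a hedged sketch under simplified assumptions (a plain dictionary stands in for a fast, replicated state store; the card/amount events and threshold are invented for the example).

```python
# Sketch of stateful processing: a per-key running total kept as an
# artifact of the stream, consulted when processing later events.
# Illustrative only; a real stateful store would be distributed and replicated.

state = {}  # the "stateful store": card id -> running total

def process(event):
    key = event["card"]
    state[key] = state.get(key, 0) + event["amount"]
    # Later events in the stream see the accumulated state, e.g. to flag
    # a card whose running total crosses a threshold.
    return state[key] > 1000

alerts = [process(e) for e in [
    {"card": "A", "amount": 600},
    {"card": "B", "amount": 200},
    {"card": "A", "amount": 700},  # card A's running total reaches 1300
]]
```

Only the third event triggers an alert, because the decision depends on state accumulated from earlier events, which is exactly what stateless, one-event-at-a-time processing cannot express.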

Hazelcast Platform

Hazelcast Platform is a unified real-time data platform that combines stream processing and a fast data store in a single platform, run as a single cluster. This unified architecture simplifies the development and deployment of real-time applications and provides the performance required by applications that demand real-time responsiveness.

Hazelcast Platform provides the stream processing engine, the reference database, the stateful store, the machine learning inference components, and the digital integration hub to help you quickly and easily build out your stream processing architecture. Its fast data store can also be used for messaging, so in some use cases, you can pass messages between applications using the Hazelcast fast data store (instead of an external message bus), especially when extremely low latency is required.

Business Requirements

Stream processing requirements primarily revolve around delivering business value that batch processing cannot. Recognizing the competitive advantage of using data in real time, businesses find stream processing a compelling choice. Key requirements include the ability to:

  • Automate certain business operations that are driven by real-time data and cannot wait for human intervention
  • Leverage real-time machine learning to get more accuracy in predicting future actions, especially when needing to find non-obvious actionable insights
  • Respond to customer interactions instantly to continue positively engaging with the customer while their attention is still focused on your business
  • Analyze or act instantly on data throughout the day on an ongoing basis, versus waiting for end-of-day analytics which preclude the chance to take advantage of time-sensitive opportunities or needs

Technical Challenges

Stream processing deployments can be difficult without the right technologies. Many deployments today suffer from extreme complexity caused by stitching together many different technology components. Hazelcast unifies the key components of a stream processing deployment to significantly simplify developing and deploying streaming applications that deliver a real-time advantage. The main technical challenges are summarized below.

Complexity

Stream processing deployments often require many different technology components. Hazelcast simplifies a stream processing deployment with a unified architecture that handles several critical capabilities, eliminating the need for multiple distinct components.

Running separate technologies in separate clusters adds cost. Hazelcast Platform consolidates several functional components into a single cluster, reducing the hardware footprint and cost that a non-Hazelcast deployment would otherwise require.

Traditional deployments often perform suboptimally because their many components are not optimized to work together, so significant effort must be spent tuning each component and building optimal integrations. Our platform runs the same functionality as multiple components in the same cluster, eliminating the bottlenecks that arise from integrating distinct technologies.

Scaling out streaming applications can be a challenge with basic stream processing engines that follow a client/server pattern. Load balancing, parallelism, and distributed concerns must be coded by the application developer, adding to the time-to-market as well as the complexity of the application code. Hazelcast provides a framework for submitting job-oriented applications that are automatically allocated across a Hazelcast cluster, handling the orchestration and coordination of the subtasks in an application for you. This lets you write fully distributed, parallelized streaming applications that optimally run in your Hazelcast cluster without having to write plumbing code.

Streaming data is fundamentally different from batch or micro-batch processing because both inputs and outputs are continuous. In many cases, streaming computations look at how values change over time. Typically, we look at streaming data in terms of “windows,” a specific slice of the data stream constrained to a time period. Hazelcast stream processing supports tumbling, sliding, and session windows. For more information on windows, please read the documentation page on time windowing.
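A tumbling window, the simplest of the three, slices the stream into fixed, non-overlapping time periods. The sketch below illustrates the idea in plain Python under assumed inputs (events as timestamp/value pairs, a 10-second window); Hazelcast's actual windowing API is richer than this.

```python
# Sketch of a tumbling window: fixed, non-overlapping time slices.
# Events are (timestamp_seconds, value) pairs; window size is 10 seconds.
# Illustrative only, not Hazelcast's windowing API.
from collections import defaultdict

WINDOW = 10  # window length in seconds

def tumbling_sums(events):
    sums = defaultdict(int)
    for ts, value in events:
        window_start = (ts // WINDOW) * WINDOW  # the slice this event falls in
        sums[window_start] += value             # aggregate within the window
    return dict(sums)

events = [(1, 5), (4, 3), (11, 7), (19, 2), (23, 1)]
window_sums = tumbling_sums(events)
```

Events at t=1 and t=4 land in the [0, 10) window, t=11 and t=19 in [10, 20), and t=23 in [20, 30); a sliding window would instead let adjacent slices overlap, and a session window would close only after a gap of inactivity.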

Hazelcast Platform supports the notion of “event time” in which events can have their own timestamp and may arrive out of order. To handle out-of-order events and especially late-arriving events, stream processors must keep calculations (i.e., aggregations) open until all events have arrived. However, stream processors cannot know if all events have arrived, so they need to discard any extremely late events. To define what constitutes an “extremely late” event, Hazelcast sets a “watermark” that marks a time window in which late-arriving events can still be processed in the appropriate aggregation window. Events arriving from beyond the watermarked time window are discarded.
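The watermark logic can be sketched as follows. This is a simplified model under stated assumptions (the watermark trails the newest observed event time by a fixed allowed lag; the events and lag value are invented), not Hazelcast's implementation.

```python
# Sketch of watermark handling for out-of-order events. The watermark
# trails the newest event time by an allowed lag; events with timestamps
# older than the watermark are discarded as "extremely late".
# Illustrative only, not Hazelcast's implementation.

ALLOWED_LAG = 5  # seconds of lateness tolerated

def run(events):
    watermark = float("-inf")
    accepted, dropped = [], []
    for ts, value in events:
        watermark = max(watermark, ts - ALLOWED_LAG)  # watermark only advances
        if ts < watermark:
            dropped.append((ts, value))   # beyond the watermark: discard
        else:
            accepted.append((ts, value))  # on time, or late but within the lag
    return accepted, dropped

# The event at t=9 arrives out of order but within the lag, so it is still
# processed; the event at t=3 arrives after t=12 has advanced the
# watermark to 7, so it is dropped.
accepted, dropped = run([(10, "a"), (12, "b"), (9, "c"), (3, "d")])
```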

Fault tolerance in stream processing systems must deal with preserving data that is not necessarily stored in any permanent medium. Stream processors therefore need to handle failures of data-in-motion, or data can be lost. Hazelcast Platform is fault-tolerant against issues such as network failures, network splits, and node failures. When a fault occurs, the platform automatically restarts all jobs in which the failed member participated, resuming them from the latest state snapshot. With in-memory snapshots saved to distributed in-memory storage, Hazelcast Platform resumes processing where it left off. Distributed in-memory storage is an integral component of the cluster: multiple replicas of the data are stored across the cluster to increase the cluster's resiliency.
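The recovery principle can be sketched as follows: periodically snapshot the job's state, and after a failure, restore the snapshot and reprocess only the events that came after it. This is a single-process toy under assumed parameters (snapshot interval, injected failure point), not Hazelcast's distributed implementation.

```python
# Sketch of snapshot-based fault tolerance: snapshot state periodically;
# on failure, restore the latest snapshot and replay subsequent events.
# Illustrative single-process toy, not Hazelcast's distributed mechanism.
import copy

def process_with_snapshots(events, snapshot_every=2, fail_at=None):
    state = {"count": 0}
    snapshot = (copy.deepcopy(state), 0)  # (saved state, next event index)
    i = 0
    while i < len(events):
        if i == fail_at:
            fail_at = None  # simulate one crash...
            state, i = copy.deepcopy(snapshot[0]), snapshot[1]  # ...and recover
            continue
        state["count"] += events[i]
        i += 1
        if i % snapshot_every == 0:
            snapshot = (copy.deepcopy(state), i)  # checkpoint progress
    return state

# A crash while handling event index 3 resumes from the snapshot taken
# after event 2, so the final result matches a failure-free run.
result = process_with_snapshots([1, 2, 3, 4, 5], fail_at=3)
```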

Event processing systems often struggle to reconcile performance with correctness, and some systems lack firm processing guarantees, which complicates programming.

Hazelcast Platform simplifies this by letting you choose a processing guarantee when you start a job. Stronger guarantees entail some performance trade-off, but the platform maintains high processing speed while honoring your chosen guarantee. The options include exactly-once processing (the most accurate but slowest), at-least-once processing, and no strict guarantee (the fastest).
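The difference between the guarantees shows up when an event is redelivered after a retry. The sketch below illustrates the general idea in plain Python (the event ids and deduplication-at-the-sink approach are invented for illustration, not a description of Hazelcast's internals): under at-least-once, a replayed event is counted twice, while deduplicating on a unique event id recovers an exactly-once result.

```python
# Sketch of processing guarantees. A retry after a failure can replay an
# event; at-least-once counts it twice, while deduplicating on a unique
# event id yields an exactly-once result. Illustrative only.

def at_least_once(events):
    # Every delivery is processed, including redeliveries.
    return list(events)

def exactly_once(events):
    # Deduplicate by event id so each event affects the result only once.
    seen, out = set(), []
    for e in events:
        if e["id"] in seen:
            continue
        seen.add(e["id"])
        out.append(e)
    return out

# Event id=2 is redelivered after a simulated retry.
stream = [{"id": 1, "v": 10}, {"id": 2, "v": 20}, {"id": 2, "v": 20}]
totals = (sum(e["v"] for e in at_least_once(stream)),
          sum(e["v"] for e in exactly_once(stream)))
```

The at-least-once total overcounts the replayed event; the exactly-once total does not, which is why exactly-once is the most accurate but carries extra bookkeeping cost.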

Why Hazelcast

Hazelcast provides an extremely fast stream processing engine, engineered for the most demanding workloads. Its optimized computing model, combined with the integrated fast data store, eliminates the bottlenecks that cause latency in other stream processing engines. Beyond speed, Hazelcast Platform is versatile across a wide range of use cases and offers the reliability needed for mission-critical systems. Its unified architecture streamlines deployments, letting you build powerful applications with lower costs, faster time-to-market, and better ROI.

Easy to Develop and Deploy

Hazelcast Platform was designed to simplify the application development process by providing a familiar API that abstracts away the complexity of running a distributed application across multiple nodes in a cluster. This allows developers to spend more time on business logic and not on writing custom integration and orchestration code. Our platform can seamlessly integrate with your IT architecture to add new capabilities without having to rip and replace your existing stack. The Hazelcast cloud-native architecture requires no special coding expertise to get the elasticity to scale up or down to meet highly fluctuating workload demands.

Performance at Scale

Whether you process a large volume of transactions, enhance online experiences with faster responsiveness, run large-scale transformations on data, or cut costs with a mainframe integration deployment, Hazelcast Platform is designed for the ultra-performance that today’s banking workloads require. The proven performance advantage is especially valuable for data-focused experimentation that enables ongoing business optimization, especially in data science initiatives including machine learning inference for fraud detection.

Mission-Critical Reliability

With built-in redundancy to protect against node failures, and efficient WAN Replication to support disaster recovery strategies that safeguard against total site failures, Hazelcast Platform was built to provide the resilience to run mission-critical systems. The extensive built-in security framework protects data from unauthorized viewers, and security APIs allow custom security controls for sensitive environments.

Customer Success Story

A global bank needed to boost its loan business, and saw an opportunity to offer loans to customers who showed signs of needing additional cash.

However, their otherwise sophisticated core banking architecture lacked provisions for real-time offers. The time it took to calculate an appropriate loan offer (based on income history, credit score, and related factors) led to delays of a few days, often resulting in customers seeking alternative funding sources.

To address this, the bank's core IT team used Hazelcast stream processing to access data in motion. When customers requested money they lacked at an ATM, a Hazelcast-driven application instantly computed a suitable loan offer and sent it to the customer via SMS. This real-time offer initiative increased loan offer conversions by 400%, and the program became profitable within one year, pointing to further opportunities for revenue growth.

Use Cases

Hazelcast supports a wide variety of stream processing use cases. With its simplicity, performance, and reliability advantages, it is well suited to use cases such as:

  • Personalized, real-time recommendations
  • Real-time offers
  • Behavior analysis
  • Connected vehicle analysis
  • Internet of things analytics
  • Payment processing
  • Real-time machine learning
  • Streaming data integration
  • Patient monitoring