What Is the Kappa Architecture?

The Kappa Architecture is a software architecture used for processing streaming data. Its main premise is that you can perform both real-time and batch processing, especially for analytics, with a single technology stack. It is based on a streaming architecture in which an incoming series of data is first stored in a messaging engine like Apache Kafka. From there, a stream processing engine reads the data, transforms it into an analyzable format, and stores it in an analytics database for end users to query.

The Kappa Architecture supports (near) real-time analytics when the data is read and transformed immediately after it is inserted into the messaging engine. This makes recent data quickly available for end user queries. It also supports historical analytics by reading the stored streaming data from the messaging engine at a later time in a batch manner, to create additional analyzable outputs for more types of analysis.
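The two read patterns described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `EventLog` is a hypothetical stand-in for a messaging engine topic, `analytics_rows` stands in for the analytics database, and the trade-like record fields are invented for the example. The point is that one `transform` function serves both the immediate path and the later replay.

```python
from collections import defaultdict


class EventLog:
    """Hypothetical in-memory stand-in for an append-only messaging topic."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read_from(self, offset):
        """Return every record stored at or after the given offset."""
        return self._records[offset:]


def transform(record):
    """Single transformation shared by the real-time and the replay path."""
    return {"symbol": record["symbol"], "notional": record["qty"] * record["price"]}


log = EventLog()
analytics_rows = []  # stand-in for the analytics database

# Near real-time path: transform each record right after it is appended.
for raw in [
    {"symbol": "ABC", "qty": 10, "price": 2.0},
    {"symbol": "XYZ", "qty": 5, "price": 4.0},
]:
    offset = log.append(raw)
    analytics_rows.extend(transform(r) for r in log.read_from(offset))

# Historical path: later, replay the whole log from offset 0 in a batch
# manner to build a different analyzable output (totals per symbol).
totals = defaultdict(float)
for stored in log.read_from(0):
    row = transform(stored)
    totals[row["symbol"]] += row["notional"]
```

In a real deployment the log would be a durable topic with a retention period long enough to cover the history you want to replay; the sketch ignores retention, durability, and delivery guarantees.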

The Kappa Architecture is useful for on-demand analytics.

The Kappa Architecture is considered a simpler alternative to the Lambda Architecture as it uses the same technology stack to handle both real-time stream processing and historical batch processing. Both architectures entail the storage of historical data to enable large-scale analytics. Both architectures are also useful for addressing “human fault tolerance,” in which problems with the processing code (either bugs or just known limitations) can be overcome by updating the code and running it again on the historical data. The main difference with the Kappa Architecture is that all data is treated as if it were a stream, so the stream processing engine acts as the sole data transformation engine.
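As a rough illustration of "human fault tolerance," the sketch below replays a stored stream through a corrected transformation to rebuild the output. Everything in it is assumed for the example: the sensor readings, the unit bug in `transform_v1`, and the idea of switching a `serving` reference once the replay finishes.

```python
# Hypothetical stored stream: temperature readings in tenths of a degree Celsius.
stored_stream = [
    {"sensor": "s1", "raw": 215},
    {"sensor": "s2", "raw": 198},
    {"sensor": "s1", "raw": 223},
]


def transform_v1(event):
    # Buggy version: treats the raw value as whole degrees.
    return {"sensor": event["sensor"], "celsius": float(event["raw"])}


def transform_v2(event):
    # Corrected version: converts tenths of a degree to degrees.
    return {"sensor": event["sensor"], "celsius": event["raw"] / 10}


def replay(stream, transform):
    """Run a transform over the full stored stream and return a fresh output table."""
    return [transform(event) for event in stream]


output_v1 = replay(stored_stream, transform_v1)  # what the buggy code produced
output_v2 = replay(stored_stream, transform_v2)  # rebuilt from the same history

# Point readers at the corrected table once the replay has completed.
serving = output_v2
```

Because the raw events are kept unchanged, the fix only touches the processing code; the old output can be retained until the new one has been checked.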

Kappa Architecture Diagram: The Kappa Architecture is typically built around Apache Kafka® along with a high-speed stream processing engine.

What Is a Streaming Architecture?

A streaming architecture is a defined set of technologies that work together to handle stream processing, which is the practice of taking action on a series of data at the time the data is created. In many modern deployments, Apache Kafka acts as the store for the streaming data, and then multiple stream processors can act on the data stored in Kafka to produce multiple outputs. Some streaming architectures include workflows for both stream processing and batch processing, which entails either adding other technologies to handle large-scale batch processing or using Kafka as the central store, as specified in the Kappa Architecture.
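The idea of several stream processors reading one shared store can be sketched as follows. This is a simplified model, not Kafka's client API: the `Consumer` class, its `poll` method, and the payment-like records are hypothetical, and a plain Python list plays the role of the stored stream. Each consumer keeps its own offset, so one can keep up in real time while another starts later and catches up.

```python
class Consumer:
    """Hypothetical stream processor that tracks its own read position."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.offset = 0   # position in the shared log, private to this consumer
        self.output = []

    def poll(self, log):
        """Process every record this consumer has not seen yet."""
        new_records = log[self.offset:]
        for record in new_records:
            result = self.handler(record)
            if result is not None:
                self.output.append(result)
        self.offset += len(new_records)


log = []  # stand-in for the stored stream

# Two processors produce two different outputs from the same data.
alerts = Consumer("alerts", lambda r: r if r["amount"] > 1000 else None)
amounts = Consumer("amounts", lambda r: r["amount"])

log.append({"id": 1, "amount": 250})
alerts.poll(log)    # the alerts processor follows the stream as it arrives

log.append({"id": 2, "amount": 5000})
alerts.poll(log)
amounts.poll(log)   # the second processor starts later and reads from offset 0
```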

How Do the Kappa and Lambda Architectures Compare?

Both architectures handle real-time and historical analytics in a single environment. However, one major benefit of the Kappa Architecture over the Lambda Architecture is that it enables you to build your streaming and batch processing system on a single technology. This means you can build a stream processing application to handle real-time data, and if you need to modify your output, you update your code and then run it again over the data in the messaging engine in a batch manner. There is no separate technology to handle the batch processing, as is suggested by the Lambda Architecture.

With a sufficiently fast stream processing engine (like Hazelcast Platform), you may not need a separate technology that is optimized for batch processing. You simply read the stored streaming data in parallel (assuming the data in Kafka is appropriately split into separate channels, or “partitions”) and transform it as if it were from a streaming source. For some environments, you can potentially create the analyzable output on demand, so when a new query is submitted from an end user, the data can be transformed ad hoc to optimally answer that query. Again, this requires a high-speed stream processing engine to keep processing latency low.
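A minimal sketch of replaying partitions concurrently is shown below. It assumes a hypothetical topic modeled as a dictionary of partition number to ordered records, and it uses Python's standard `ThreadPoolExecutor` only to show the shape of the approach; a real stream processing engine would distribute this work across cores or nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitioned topic: each partition is an ordered list of records.
partitions = {
    0: [{"key": "a", "value": 1}, {"key": "a", "value": 2}],
    1: [{"key": "b", "value": 10}],
    2: [{"key": "c", "value": 7}, {"key": "c", "value": 3}],
}


def replay_partition(records):
    """Process one partition in order and return per-key sums."""
    sums = {}
    for record in records:
        sums[record["key"]] = sums.get(record["key"], 0) + record["value"]
    return sums


def replay_all(parts):
    """Replay every partition concurrently and merge the partial results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        for partial in pool.map(replay_partition, parts.values()):
            for key, total in partial.items():
                merged[key] = merged.get(key, 0) + total
    return merged


result = replay_all(partitions)
```

Order is preserved within each partition but not across partitions, so this pattern suits computations that can be done per partition and merged afterward, such as the per-key sums here.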

While the Lambda Architecture does not specify the technologies that must be used, the batch processing component is often done on a large-scale data platform like Apache Hadoop. The Hadoop Distributed File System (HDFS) can economically store the raw data that can then be transformed via Hadoop tools into an analyzable format. While Hadoop is used for the batch processing component of the system, a separate engine designed for stream processing is used for the real-time analytics component. One advantage of the Lambda Architecture, however, is that much larger data sets (in the petabyte range) can be stored and processed more efficiently in Hadoop for large-scale historical analysis.
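For contrast, the two-path shape of the Lambda Architecture can be sketched as a batch view plus a separate real-time view that are combined at query time. The names `batch_view`, `SpeedView`, and `query`, and the page-view events, are assumptions made for this illustration; they do not correspond to any particular Hadoop or stream processing API.

```python
def batch_view(historical_events):
    """Batch path: recompute page counts over the full historical data set."""
    counts = {}
    for event in historical_events:
        counts[event["page"]] = counts.get(event["page"], 0) + 1
    return counts


class SpeedView:
    """Real-time path: incrementally counts events not yet in the batch view."""

    def __init__(self):
        self.counts = {}

    def on_event(self, event):
        self.counts[event["page"]] = self.counts.get(event["page"], 0) + 1


def query(page, batch_counts, speed):
    """Answer a query by combining the batch view and the real-time view."""
    return batch_counts.get(page, 0) + speed.counts.get(page, 0)


historical = [{"page": "/home"}, {"page": "/home"}, {"page": "/docs"}]
recent = [{"page": "/home"}, {"page": "/pricing"}]

batch_counts = batch_view(historical)
speed = SpeedView()
for event in recent:
    speed.on_event(event)

home_total = query("/home", batch_counts, speed)
```

Note that the counting logic appears twice, once per path. Keeping those two implementations consistent is the maintenance cost that the Kappa Architecture avoids by using one processing engine for both.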
