A Reference Guide to Stream Processing

16 pages

The goal of streaming systems is to process large volumes of data and provide useful insights into that data before it is saved to long-term storage. The traditional approach to processing data at scale is batching, which rests on the premise that all the data is available in the system of record before processing starts. If a failure occurs, the whole job can simply be restarted.

While simple and robust, the batching approach introduces significant latency between gathering the data and being ready to act on it. The goal of stream processing is to overcome this latency: it processes live, raw data as soon as it arrives, and in doing so it must meet the challenges of incremental processing, scalability, and fault tolerance.
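The contrast between the two approaches can be illustrated with a minimal sketch in plain Python. It is not based on any particular engine; the event shape, the function names `batch_totals` and `streaming_totals`, and the sample data are all assumptions made for illustration. The batch function needs the complete data set before it returns anything, while the streaming function keeps running state and makes an updated result available after every event.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape: (key, value). Not taken from the white paper.
Event = Tuple[str, int]


def batch_totals(events: List[Event]) -> Dict[str, int]:
    """Batch style: the full data set must exist before any result is produced."""
    totals: Counter = Counter()
    for key, value in events:
        totals[key] += value
    return dict(totals)


def streaming_totals(events: Iterable[Event]) -> Iterator[Dict[str, int]]:
    """Streaming style: update running state and emit a snapshot per event."""
    totals: Counter = Counter()
    for key, value in events:
        totals[key] += value
        # A current result is available as soon as each event has been seen.
        yield dict(totals)


# Illustrative sample data only.
events = [("sensor-a", 3), ("sensor-b", 5), ("sensor-a", 4)]

# Consume the stream one event at a time and keep every intermediate result.
snapshots = list(streaming_totals(iter(events)))
```

Once the input has been fully consumed, the last streaming snapshot matches the batch result. The difference is that intermediate results were already usable along the way. A real streaming engine additionally has to distribute and checkpoint this running state, which the sketch does not attempt.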

This white paper introduces you to the domain of stream processing, covering these topics:

  • Use cases that benefit from stream processing
  • Building blocks of a stream processing solution
  • Key concepts used when building a streaming pipeline: definition of the dataflow, keyed aggregation, windowing
  • Runtime aspects and tradeoffs between performance and correctness
  • Overview of distributed stream processing engines
  • Hands-on examples based on the Hazelcast Platform

Who Should Read It?

This paper is intended for software architects and developers who are planning or building systems utilizing stream processing, fast batch processing, data processing microservices, or distributed

What’s In This White Paper?

  • Fast Processing of Infinite and Big Data
  • What is Stream Processing
  • When to Use Stream Processing
  • The Building Blocks of Stream Processing
  • Transformations
  • Windowing
  • Running Jobs
  • Fault Tolerance
  • Sources and Sinks
  • Overview of Stream Processing Platforms