Glossary Terms›Apache Flink

What Is Apache Flink?

Apache Flink is an open source software technology for running stateful computations on both streaming data and batch data in a distributed system. It typically reads data from a source data repository, performs some action on the data, and then writes the outputs to a destination data repository (i.e., a “sink”). It provides APIs, including SQL, for building stream processing and batch processing jobs. It takes data one point at a time and performs operations on the individual data points or on groups of data points known as “windows.” It is similar to technologies like Hazelcast, Apache Storm, Apache Samza, Apache Apex, and AWS Kinesis Data Analytics.

Flink started as a fork of a research project called Stratosphere (which was a collaboration of the Technical University of Berlin, the Humboldt University of Berlin, and the Hasso Plattner Institute) and became an Apache Incubator project in March 2014. It was later accepted as an Apache top-level project in December 2014.

Why Use Apache Flink?

Flink is used as a framework for building data pipeline jobs that process large amounts of data. Rather than having the software developer worry about task coordination and allocation, Flink takes care of those low-level concerns. The data can be in the form of data streams (also known as data in motion) or in the form of batch data (also known as data at rest).

When used to process data streams (i.e., stream processing), it typically connects to a message queue (e.g., Apache Kafka, Apache Pulsar) and reads data from the queue to process it in an ordered manner. As a stream processing engine, it complements the message queue because it can efficiently process one data point at a time, which is ideal for use cases that require that level of granularity for identifying trends and patterns.

When used to process batch data, it can read data from a large data store in a parallelized way to boost throughput. It behaves similarly to processing streaming data, except that the batch data is bounded so there is an expected end to the processing. The processing is done on the entire collection of batch data to produce an output such as an analytics-ready data set.

Flink Use Cases

Flink can handle many types of use cases, and according to the official Flink website, the ideal use cases can be grouped into 3 categories:

Event-driven applications: These types of applications take “event data” (i.e., data points that reflect a change, such as updates in a database table) and react to those changes per the business logic in the application. Oftentimes the event data is used to propagate a data update from a source to many sinks.
Data analytics applications: These applications uncover insights from data in a more proactive way, versus the traditional way of preparing data for downstream use by human analysts using business intelligence (BI) tools. Examples of analytics include capturing max/min values, averages, and standard deviations, typically in aggregations defined by a sequence of data points in a specified window.
Data pipeline applications: Data pipelines are a series of processing steps that prepare the data for downstream use, typically for human-driven analytics. These applications typically take data from a source and transform it in a way to create more analytical value, including by enriching it with external data. Data pipelines are the construct for practices like Streaming ETL.

Who Uses Apache Flink?

Flink is used by a wide variety of companies across many industries. Some well-known companies that leverage Flink include Alibaba, Amazon, eBay, Netflix, and Uber. Here’s a comprehensive list of businesses that use Flink.

Back to Glossary Terms

Keep Reading

Webinar

/ Video

/ 43 min

Accelerate Your Stream Processing Development

The webinar compares two popular real-time distributed data processing platforms – Hazelcast and Apache Flink. Both solutions offer distributed computing capabilities, but the webinar will highlight the unique benefits of Hazelcast over Apache Flink, including: Simplified Architecture, Performance at Scale, and Mission-Critical Reliability

Why Hazelcast?

Forrester names Hazelcast as a Strong Performer

Key Capabilities

Products

Tool Kit

Quick Links

Unlock your perfect plan with our flexible pricing

Key Solutions

By Industry

By Use Case

By Architecture

Solutions

By Industry

By Use Case

By Architecture

Join us for a deep dive into Hazelcast Platform's capabilities

Resource Center

Content Types

Learn

33% Reduction in Operational Costs

Developers

Community

Learn

Toolbox

What Is Apache Flink?

Why Use Apache Flink?

Flink Use Cases

Who Uses Apache Flink?

Keep Reading

Accelerate Your Stream Processing Development

Why Hazelcast

About Us

Platform

Solutions

Developers

Learn

Connect

Why Hazelcast?

Forrester names Hazelcast as a Strong Performer

Key Capabilities

Products

Tool Kit

Quick Links

Unlock your perfect plan with our flexible pricing

Key Solutions

By Industry

By Use Case

By Architecture

Solutions

By Industry

By Use Case

By Architecture

Join us for a deep dive into Hazelcast Platform's capabilities

Resource Center

Content Types

Learn

33% Reduction in Operational Costs

Developers

Community

Learn

Toolbox

What Is Apache Flink?

Why Use Apache Flink?

Flink Use Cases

Who Uses Apache Flink?

Related Topics

Further Reading

Spread the Word

Keep Reading

Accelerate Your Stream Processing Development

Why Hazelcast

About Us

Platform

Solutions

Developers

Learn

Connect