Directed Acyclic Graph (DAG)

A directed acyclic graph (DAG) is a conceptual representation of a series of activities. The order of the activities is depicted by a graph, which is typically drawn as a set of circles, some of which are connected by lines. Each circle represents an activity and is known as a “vertex”; each line represents the flow from one activity to another and is known as an “edge.” “Directed” means that each edge has a defined direction, so each edge represents a one-way flow from one vertex to another. “Acyclic” means that there are no loops (i.e., “cycles”) in the graph: if you leave any vertex by following one of its edges, no path in the graph leads back to that vertex.
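To make the definition concrete, here is a minimal Python sketch, not tied to any particular library. It assumes a graph is stored as a dictionary that maps each vertex to the list of vertices its edges point to. The function name `is_acyclic` and the sample graphs are illustrative choices. The check repeatedly removes vertices that have no incoming edges (an approach commonly known as Kahn's algorithm); if every vertex can be removed this way, the graph has no cycles.

```python
from collections import deque


def is_acyclic(edges: dict[str, list[str]]) -> bool:
    """Return True if the directed graph described by `edges` has no cycles."""
    # Collect every vertex, including ones that only appear as edge targets.
    vertices = set(edges)
    for targets in edges.values():
        vertices.update(targets)

    # Count the incoming edges of each vertex.
    in_degree = {v: 0 for v in vertices}
    for targets in edges.values():
        for target in targets:
            in_degree[target] += 1

    # Repeatedly remove vertices that have no remaining incoming edges.
    ready = deque(v for v, degree in in_degree.items() if degree == 0)
    removed = 0
    while ready:
        vertex = ready.popleft()
        removed += 1
        for target in edges.get(vertex, []):
            in_degree[target] -= 1
            if in_degree[target] == 0:
                ready.append(target)

    # Vertices that were never removed lie on a cycle or downstream of one.
    return removed == len(vertices)


# Hypothetical example graphs: the first is a DAG, the second loops back to A.
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}

print(is_acyclic(dag))
print(is_acyclic(cyclic))
```

The first graph passes the check because following edges from any vertex never returns to it; the second fails because A leads to B, B to C, and C back to A.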

Directed Acyclic Graph (DAG) Diagram

Why Are Directed Acyclic Graphs Useful?

DAGs help represent many different types of flows, including data processing flows. Thinking about large-scale processing flows in terms of DAGs makes it easier to organize the steps of a job and the order in which they run. In many data processing environments, a series of computations is run on the data to prepare it for one or more destinations. This type of data processing flow is often called a data pipeline. For example, sales transaction data may be processed immediately to prepare it for making real-time recommendations to consumers. As part of the processing lifecycle, data can undergo many steps, including cleansing (correcting incorrect or invalid data), aggregation (calculating summaries), enrichment (identifying relationships with other relevant data), and transformation (writing the data into a new format).
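The four steps named above can be sketched as a small linear pipeline in Python. Everything here is an assumption made for illustration: the record layout, the `REGION_BY_STORE` lookup table, and the function names are hypothetical, and cleansing is simplified to dropping invalid records rather than correcting them.

```python
from collections import defaultdict

# Hypothetical sales records; the second one has an invalid (negative) amount.
raw = [
    {"store": "north", "amount": 20.0},
    {"store": "north", "amount": -5.0},
    {"store": "south", "amount": 12.5},
    {"store": "north", "amount": 7.5},
]

# Hypothetical reference data used by the enrichment step.
REGION_BY_STORE = {"north": "EMEA", "south": "APAC"}


def cleanse(records):
    """Cleansing (simplified): drop records whose amount is invalid."""
    return [r for r in records if r["amount"] >= 0]


def aggregate(records):
    """Aggregation: sum the amounts per store."""
    totals = defaultdict(float)
    for r in records:
        totals[r["store"]] += r["amount"]
    return dict(totals)


def enrich(totals):
    """Enrichment: relate each store total to its region."""
    return [
        {"store": store, "total": total, "region": REGION_BY_STORE.get(store, "unknown")}
        for store, total in totals.items()
    ]


def transform(rows):
    """Transformation: write each row in a new format (CSV-style lines)."""
    ordered = sorted(rows, key=lambda r: r["store"])
    return [f"{r['store']},{r['region']},{r['total']:.2f}" for r in ordered]


def run_pipeline(records):
    # Each step consumes the output of the previous one, so the steps form
    # a simple chain: cleanse -> aggregate -> enrich -> transform.
    return transform(enrich(aggregate(cleanse(records))))


print(run_pipeline(raw))
```

Each function corresponds to one vertex, and each hand-off from one function to the next corresponds to one edge. A chain like this is the simplest possible DAG.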

Characteristics of DAGs in Stream Processing

One key characteristic of streaming DAGs and the data processing flows they model is that there can be multiple paths in the flow. This matters because the same data often has to be processed in several ways to serve different outputs and needs. In the example flow below, a stream of sensor data is processed. The streaming data is first loaded from the sensors and then separated by sensor type. Sensor X data is summarized per second and then analyzed in real time. If any critical status is observed, an alert is sent. The data is also saved to long-term storage for possible further analysis. Data from sensor Y is summarized per minute and then stored in the same long-term store as the data from sensor X.

A stream of sensor data represented as a directed acyclic graph.
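The sensor flow described above can also be written down as data. The Python sketch below encodes it as a dictionary of edges and lists every path from the source to an end point. The vertex names are invented for illustration, and the sketch assumes that sensor X data is sent to long-term storage from the per-second summary, which the description leaves open.

```python
# Each key is a vertex; each value lists the vertices its edges point to.
# Vertex names are hypothetical labels for the steps in the description.
flow = {
    "load_sensor_data": ["split_by_sensor_type"],
    "split_by_sensor_type": ["summarize_x_per_second", "summarize_y_per_minute"],
    # Assumption: X data goes to storage from the per-second summary.
    "summarize_x_per_second": ["analyze_x_real_time", "long_term_store"],
    "analyze_x_real_time": ["send_alert"],
    "summarize_y_per_minute": ["long_term_store"],
    "send_alert": [],
    "long_term_store": [],
}


def all_paths(graph, start):
    """Yield every path from `start` to a vertex with no outgoing edges.

    The recursion always terminates because the graph is acyclic.
    """
    successors = graph.get(start, [])
    if not successors:
        yield [start]
        return
    for successor in successors:
        for rest in all_paths(graph, successor):
            yield [start, *rest]


paths = list(all_paths(flow, "load_sensor_data"))
for path in paths:
    print(" -> ".join(path))
```

Under these assumptions the flow has three distinct paths: one ends in an alert and two end in the shared long-term store, which shows how a single stream can feed several outputs.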
