ETL and Data Ingestion
Introduction
ETL is an acronym for "extract, transform, load." Extract refers to collecting data from a source. Transform refers to any processing performed on that data. Load refers to delivering the processed data to a destination, such as a database. ETL is a data processing concept dating back to the 1970s, and it remains important today as one of the dominant frameworks for supplying people and applications with data. Engineering and product teams use ETL techniques and software to load and preprocess data from a variety of sources into a number of destinations.
Solution
The Hazelcast Approach to ETL and Data Ingestion
Hazelcast Platform provides all the necessary infrastructure to build and run real-time ETL applications so you can focus on the business logic of your data pipelines. Key components of Hazelcast Platform include:
- Pipeline API to declaratively define your data pipelines
- Connectors for extracting data from sources and loading it into sinks
- Runtime for executing data pipelines with fault tolerance and parallel execution at scale (a minimal sketch combining all three follows this list)
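The following is a minimal sketch of how these three pieces fit together, assuming a Hazelcast 5.x cluster and illustrative map and list names (`raw-events`, `clean-events`):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class MinimalEtl {
    public static void main(String[] args) {
        // Pipeline API: declare what to extract, transform, and load.
        Pipeline pipeline = Pipeline.create();
        pipeline.readFrom(Sources.<String, String>map("raw-events"))  // extract (IMap source connector)
                .map(entry -> entry.getValue().toUpperCase())         // transform
                .writeTo(Sinks.list("clean-events"));                 // load (IList sink connector)

        // Runtime: submit the pipeline to the cluster for execution.
        HazelcastInstance hz = Hazelcast.bootstrappedInstance();
        hz.getJet().newJob(pipeline).join();
    }
}
```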
Hazelcast Platform can move data between a wide variety of systems and is often used as operational storage or as a distributed cache, which makes it a convenient tool for keeping in-memory caches hot through real-time ETL.
One popular data ingestion use case is loading event streams from Kafka into Hazelcast Platform, essentially creating a materialized view on top of the stream for real-time querying. Learn more about loading data into Hazelcast IMDG using Jet.
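A sketch of this pattern, assuming the `hazelcast-jet-kafka` connector is on the classpath, a Kafka broker runs at `localhost:9092`, and the topic name `trades` is illustrative:

```java
import java.util.Properties;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.kafka.KafkaSources;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;

public class KafkaToMap {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("auto.offset.reset", "earliest");

        Pipeline pipeline = Pipeline.create();
        pipeline.readFrom(KafkaSources.<String, String>kafka(props, "trades"))  // assumed topic
                .withNativeTimestamps(0)
                .writeTo(Sinks.map("trades"));  // latest value per key, queryable in real time

        HazelcastInstance hz = Hazelcast.bootstrappedInstance();
        hz.getJet().newJob(pipeline);
    }
}
```

The `trades` map then serves as a continuously updated, queryable view of the stream.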
How it Works
Hazelcast Platform was built by developers, for developers, so its primary programming interface is a Java-based DSL called the Pipeline API, which lets you declaratively define a data processing pipeline by composing operations on a stream of records. Common operations include filtering, transforming, aggregating, joining, and enriching data. The Pipeline API is similar to java.util.stream, but it was designed with distributed stream processing as a first-class citizen.
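To illustrate the declarative, stream-first style, here is a sketch of a windowed aggregation using Hazelcast's built-in test source (`TestSources.itemStream`), counting events over one-second tumbling windows:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.aggregate.AggregateOperations;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.WindowDefinition;
import com.hazelcast.jet.pipeline.test.TestSources;

public class WindowedCount {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();
        pipeline.readFrom(TestSources.itemStream(100))      // demo source: ~100 events/second
                .withNativeTimestamps(0)                    // use the events' own timestamps
                .window(WindowDefinition.tumbling(1_000))   // one-second tumbling windows
                .aggregate(AggregateOperations.counting())  // count events per window
                .writeTo(Sinks.logger());                   // log the running counts

        HazelcastInstance hz = Hazelcast.bootstrappedInstance();
        hz.getJet().newJob(pipeline).join();
    }
}
```

Unlike java.util.stream, the same pipeline runs distributed across the cluster, with windowing and event time handled by the engine.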
Connectors for Extracting and Loading Data
Hazelcast Platform provides a variety of connectors for streaming data into pipelines and writing the results to sinks such as Hazelcast, Java Message Service, JDBC systems, Apache Kafka®, Hadoop Distributed File System, and TCP sockets. Hazelcast also provides a convenience API so you can easily build custom connectors.
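As a sketch of that convenience API, the following uses `SinkBuilder` to define a custom sink that appends each item as a line to a local file (the file naming scheme and the String item type are illustrative):

```java
import java.io.PrintWriter;

import com.hazelcast.jet.pipeline.Sink;
import com.hazelcast.jet.pipeline.SinkBuilder;

public class CustomFileSink {
    // Builds a custom sink that appends each incoming item as a line to a
    // local file, one file per parallel processor instance.
    public static Sink<String> fileSink() {
        return SinkBuilder
                .sinkBuilder("file-sink",
                        ctx -> new PrintWriter("out-" + ctx.globalProcessorIndex() + ".txt"))
                .<String>receiveFn((writer, item) -> writer.println(item))  // called per item
                .flushFn(PrintWriter::flush)                                // called after each batch
                .destroyFn(PrintWriter::close)                              // called on job completion
                .build();
    }
    // Usage in a pipeline: stage.writeTo(CustomFileSink.fileSink());
}
```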
Running Data Pipelines
The heart of Hazelcast Platform is a high-performance execution engine. Once a job is deployed, Hazelcast Platform performs the steps of the data pipeline concurrently, making use of all available CPU cores and processing partitioned data in parallel. Data is processed continuously, with millisecond latencies: a single node can process hundreds of thousands of records per second.
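Parallelism is managed by the engine, but it can also be tuned per stage. A brief sketch, assuming an illustrative map named `events`:

```java
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

// Each stage runs as multiple parallel processor instances per member;
// setLocalParallelism overrides the default derived from available cores.
Pipeline pipeline = Pipeline.create();
pipeline.readFrom(Sources.<String, String>map("events"))
        .map(e -> e.getValue().trim())
        .setLocalParallelism(4)  // four parallel instances of this stage per member
        .writeTo(Sinks.logger());
```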
Fault Tolerance
ETL jobs must meet strict SLAs. If there is a failure in the system, a job cannot simply restart from scratch and still meet business deadlines.
Hazelcast Platform uses checkpointing to provide continuity. Checkpoints are taken at regular intervals and stored in multiple replicas for resilience. In the event of a failure, an ETL job is rewound to the most recent checkpoint, delaying the job by only a few seconds rather than starting over.
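Checkpointing is configured when the job is submitted. A sketch, assuming an illustrative job name, a placeholder test source, and a 10-second snapshot interval:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.config.JobConfig;
import com.hazelcast.jet.config.ProcessingGuarantee;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.test.TestSources;

public class ResilientJob {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create();
        pipeline.readFrom(TestSources.itemStream(10))  // placeholder source for the sketch
                .withoutTimestamps()
                .writeTo(Sinks.logger());

        // Take a checkpoint every 10 seconds; on failure the job rewinds
        // to the latest checkpoint instead of starting from scratch.
        JobConfig config = new JobConfig()
                .setName("etl-job")  // illustrative job name
                .setProcessingGuarantee(ProcessingGuarantee.EXACTLY_ONCE)
                .setSnapshotIntervalMillis(10_000);

        HazelcastInstance hz = Hazelcast.bootstrappedInstance();
        hz.getJet().newJob(pipeline, config);
    }
}
```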
Hazelcast Platform clusters are elastic, allowing dynamic scaling to handle load spikes. You can add new nodes to the cluster with zero downtime to linearly increase processing throughput. Learn more about how Hazelcast makes your computation elastic.