ETL is an acronym for “extract, transform, load.” Extract refers to collecting data from some source. Transform refers to any processing performed on that data. Load refers to sending the processed data to a destination, such as a database. ETL is a data processing concept dating back to the 1970s, but it remains important today because it is one of the most widely used approaches for providing people and applications with data. Engineering and product teams use ETL techniques and software to extract data from a variety of sources, preprocess it, and load it into a number of destinations.
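The three stages can be illustrated with a minimal, single-process sketch in plain Java. Everything here is an assumption made for illustration: the class name `EtlSketch`, the sample CSV lines, and the in-memory map that stands in for a destination database. It does not use Hazelcast Jet.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, illustration-only ETL job; names and data are assumptions.
class EtlSketch {

    // Extract: collect raw records from a source (here, hard-coded CSV lines).
    static List<String> extract() {
        return List.of("alice,42", "bob,not-a-number", "carol,17");
    }

    // Transform: parse each line into (name, value), skipping malformed records.
    static Map<String, Integer> transform(List<String> lines) {
        Map<String, Integer> parsed = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length != 2) {
                continue; // wrong number of fields
            }
            try {
                parsed.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
            } catch (NumberFormatException e) {
                // value is not an integer: drop the record
            }
        }
        return parsed;
    }

    // Load: send the processed rows to a destination (a map stands in for a database table).
    static void load(Map<String, Integer> rows, Map<String, Integer> destination) {
        destination.putAll(rows);
    }

    public static void main(String[] args) {
        Map<String, Integer> destination = new LinkedHashMap<>();
        load(transform(extract()), destination);
        System.out.println(destination); // prints {alice=42, carol=17}
    }
}
```

Keeping each stage as a separate function mirrors how ETL tools let you swap sources and destinations without touching the transformation logic.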
Solutions
Hazelcast Jet provides all the necessary infrastructure to build and run real-time ETL applications, so you can focus on the business logic of your data pipelines. Key components of Hazelcast Jet include:
Hazelcast Jet can move data between a variety of systems, including Hazelcast IMDG, which is often used for operational storage or as a distributed cache. Because Jet processes data as it arrives, it is a convenient tool for keeping in-memory caches hot through real-time ETL.
One popular data ingestion use case is loading event streams from Kafka into Hazelcast IMDG, essentially creating a materialized view on top of the stream for real-time querying. Learn more about loading data into Hazelcast IMDG using Jet.
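The idea of a materialized view over an event stream can be sketched in plain Java: each incoming event overwrites the entry for its key, so the map always holds the latest value and can be queried at any time. This is only a conceptual, single-process sketch. The `PriceEvent` type and the `ConcurrentHashMap` are hypothetical stand-ins for records read from a Kafka topic and for a distributed Hazelcast IMDG map; no Kafka or Hazelcast API is used.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a map kept up to date from a stream of events.
class MaterializedViewSketch {

    // Hypothetical event type; in practice this would come from a Kafka topic.
    static final class PriceEvent {
        final String symbol;
        final double price;

        PriceEvent(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    // Stand-in for a distributed in-memory map.
    private final Map<String, Double> latestPriceBySymbol = new ConcurrentHashMap<>();

    // Apply one event: the newest value for a key replaces the previous one.
    void onEvent(PriceEvent event) {
        latestPriceBySymbol.put(event.symbol, event.price);
    }

    // Query the view; returns null if no event has been seen for the symbol.
    Double latestPrice(String symbol) {
        return latestPriceBySymbol.get(symbol);
    }

    public static void main(String[] args) {
        MaterializedViewSketch view = new MaterializedViewSketch();
        List<PriceEvent> stream = List.of(
                new PriceEvent("ACME", 10.0),
                new PriceEvent("INIT", 5.5),
                new PriceEvent("ACME", 10.5));
        stream.forEach(view::onEvent);
        System.out.println(view.latestPrice("ACME")); // prints 10.5
    }
}
```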
Hazelcast Jet was built for developers by developers. Therefore, its primary programming interface is a Java-based DSL called the Pipeline API, which allows you to declaratively define the data processing pipeline by composing operations against a stream of records. Common operations include filtering, transforming, aggregating, joining, and data enrichment. The Pipeline API is similar to java.util.stream. However, it has been designed to support distributed stream processing as a first-class citizen.
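Since the Pipeline API is described as similar to java.util.stream, the declarative style can be previewed with the standard library alone. The sketch below uses only java.util.stream, not the Jet Pipeline API itself; the class name `StreamStyleSketch` and the sample input are assumptions. It composes a transform (split lines into words), a filter (keep words longer than three characters), and an aggregation (count per word).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustration with java.util.stream only; this is not the Jet Pipeline API.
class StreamStyleSketch {

    // Counts occurrences of each word longer than three characters, sorted by word.
    static Map<String, Long> countLongWords(List<String> lines) {
        return lines.stream()
                // transform: split each line into lowercase words
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                // filter: drop short (and empty) tokens
                .filter(word -> word.length() > 3)
                // aggregate: group identical words and count them
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Jet processes streams", "Streams of records");
        System.out.println(countLongWords(lines)); // prints {processes=1, records=1, streams=2}
    }
}
```

The main difference to keep in mind is that a java.util.stream pipeline runs once over a finite collection in one JVM, whereas the text describes the Pipeline API as designed for distributed processing of continuous streams.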
Hazelcast Jet provides a variety of connectors for streaming data into Hazelcast Jet pipelines and storing the results in sinks such as Hazelcast IMDG, Java Message Service, JDBC systems, Apache Kafka®, Hadoop Distributed File System, and TCP sockets. Hazelcast also provides a convenience API so you can easily build custom connectors.
The heart of Hazelcast Jet is a high-performance execution engine. Once deployed, Hazelcast Jet performs the steps of the data pipeline concurrently, making use of all available CPU cores. It processes partitioned data in parallel and continuously rather than in batches. The Hazelcast Jet architecture enables you to process hundreds of thousands of records per second with millisecond latencies using a single Jet node.
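Processing partitioned data in parallel can be illustrated, in a much simplified form, with plain Java: records are assigned to partitions by hash, each partition is processed independently on the available cores, and the partial results are combined. This is a conceptual sketch that assumes a simple sum as the computation; the class name `PartitionedSketch` is hypothetical and the code says nothing about how Jet's engine is actually implemented.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Conceptual sketch of partition-then-process-in-parallel; not Jet's engine.
class PartitionedSketch {

    static long partitionedSum(List<Integer> records, int partitionCount) {
        // Assign every record to a partition based on its hash code.
        Map<Integer, List<Integer>> partitions = records.stream()
                .collect(Collectors.groupingBy(r -> Math.floorMod(r.hashCode(), partitionCount)));

        // Process the partitions in parallel, then combine the partial sums.
        return partitions.values().parallelStream()
                .mapToLong(part -> part.stream().mapToLong(Integer::longValue).sum())
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> records = IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());
        System.out.println(partitionedSum(records, 4)); // prints 5050
    }
}
```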
ETL jobs have to meet strict SLAs. If there is a failure in the system, a job that simply restarts from scratch may no longer meet its business deadlines. Hazelcast Jet uses checkpointing to enable continuity. Checkpoints are taken regularly and saved in multiple replicas for resilience. In the event of a failure, an ETL job is rewound to the most recent checkpoint, delaying the job by only a few seconds rather than starting over.

Hazelcast Jet clusters are elastic, allowing dynamic scaling to handle load spikes. You can add new nodes to the cluster with zero downtime to linearly increase the processing throughput. Learn more about how Jet makes your computation elastic.
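The checkpoint-and-rewind behavior can be sketched in a few lines of plain Java. The example assumes a replayable source (a list), a running sum as the job state, and a single injected failure; `CheckpointSketch`, `Checkpoint`, and the parameters are hypothetical names. Real Jet checkpoints are stored in multiple replicas across a cluster, which this single-process sketch does not attempt to model.

```java
import java.util.List;

// Conceptual sketch of checkpointing; not Hazelcast Jet's implementation.
class CheckpointSketch {

    // Snapshot of job progress: position in the source plus the state so far.
    static final class Checkpoint {
        final int offset;
        final long sum;

        Checkpoint(int offset, long sum) {
            this.offset = offset;
            this.sum = sum;
        }
    }

    // Sums the source, checkpointing every `checkpointEvery` records.
    // A failure is simulated once when `failAtOffset` is reached (use -1 for none).
    static long run(List<Integer> source, int checkpointEvery, int failAtOffset) {
        Checkpoint last = new Checkpoint(0, 0L);
        boolean failureInjected = false;
        while (true) {
            // (Re)start from the most recent checkpoint, not from scratch.
            int offset = last.offset;
            long sum = last.sum;
            try {
                while (offset < source.size()) {
                    if (!failureInjected && offset == failAtOffset) {
                        failureInjected = true;
                        throw new IllegalStateException("simulated failure");
                    }
                    sum += source.get(offset);
                    offset++;
                    if (offset % checkpointEvery == 0) {
                        last = new Checkpoint(offset, sum); // save progress
                    }
                }
                return sum;
            } catch (IllegalStateException e) {
                // Work done since the last checkpoint is discarded and replayed.
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(run(source, 3, 7)); // prints 55
    }
}
```

In this run the failure hits at offset 7, the job resumes from the checkpoint taken at offset 6, and only one record is reprocessed; the final result is the same as a run without failures.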
Resources
The streaming benchmark is intended to measure the latency overhead for a streaming system under different conditions such as message rate and window size. It compares Hazelcast Jet, Apache Flink, and Apache Spark Streaming.
Java Champion Ben Evans will provide an introduction to stream processing, teach core techniques, and show how to get started building a stream processing application using real-world use cases and live demos.
Hazelcast Jet® is a third-generation stream processing engine that adds advanced data processing capabilities to Hazelcast IMDG®. Jet makes it simple to build distributed, fault-tolerant data processing pipelines on top of Hazelcast IMDG and provides a 500% performance increase over similar processing done with Apache Spark. Just like Hazelcast®, it can be embedded into your application or run standalone.
Whether you're interested in learning the basics of in-memory systems, or you're looking for advanced, real-world production examples and best practices, we've got you covered.