Companies need a data-processing solution that accelerates business agility rather than one complicated by excessive technology requirements. This calls for a system that delivers continuous, real-time data-processing capabilities for the new business reality.
Stream processing is a hot topic right now, especially for any organization looking to provide insights faster. But what does it mean for users of Java applications, microservices, and in-memory computing?
In this webinar, we will cover the evolution of stream processing and in-memory computing in relation to big data technologies, and why stream processing is the logical next step for in-memory processing projects.
Setting up servers and configuring software can get in the way of the problems you are trying to solve. With Hazelcast Cloud we take all of those pain points away.
Watch this webinar to learn how you can instantly fire up and then work with Hazelcast Cloud from anywhere in the world. With our auto-generated client stubs for Java, Go, Node.js, Python and .NET, we can have you connected and coding in less than a minute!
Lightweight, embeddable, powerful.
Hazelcast Jet is an application-embeddable stream processing engine designed for fast processing of big data sets. The Hazelcast Jet architecture is high-performance and low-latency-driven, based on a parallel streaming core engine that enables data-intensive applications to operate at near-real-time speeds. Jet is used to develop stream or batch processing applications with a directed acyclic graph (DAG) at its core.
Jet supports Apache Beam, a portable API layer for building sophisticated parallel processing pipelines. Use Jet as an execution engine (“runner”) to get the highest throughput and lowest latency Beam pipelines. Read the Beam documentation for Jet, and the Hazelcast blog on running Jet with Beam.
Why are DAGs relevant to Jet processing? Jet uses DAGs to represent the execution plan. When running jobs, Jet identifies the available resources (i.e., CPU cores) and deploys tasks across the cluster to achieve the highest levels of performance. Jet simplifies this process by determining the optimal configuration of tasks and deploying them for you, freeing you from the otherwise complicated effort of deciding where each task should run and with how much parallelism.
To accomplish this, Jet models its stream processing pipelines as DAGs, which lay out the tasks of a job and how they interact. It then optimizes the DAG to leverage parallelism for performance and efficiency of jobs. In the DAG representation, vertices represent the computational steps, and edges represent the data flows. Each vertex receives data from its inbound edges, performs a step in the computation, then emits data to its outbound edges. The breakdown of a job into separate vertices is possible thanks to data partitioning, so that subtasks in the overall job can be processed in parallel, independently of each other.
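The partitioning idea described above can be sketched in a few lines of plain Java. This is an illustration only, with hypothetical class and method names, not Hazelcast Jet's internal code: each item is assigned to a partition by hashing its key, so identical keys always land in the same partition and each partition can be processed independently, in parallel.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: a simplified model of key-based data partitioning.
// These are made-up names, not Hazelcast Jet's internals.
public class PartitionDemo {

    // Assign an item to one of `partitionCount` partitions by hashing its key.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Group a batch of keyed items so each partition can be processed
    // independently of the others (and therefore in parallel).
    static Map<Integer, List<String>> partition(List<String> keys, int partitionCount) {
        return keys.stream()
                   .collect(Collectors.groupingBy(k -> partitionFor(k, partitionCount)));
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> parts =
                partition(List.of("trade-1", "trade-2", "trade-1"), 4);
        // Identical keys always land in the same partition.
        System.out.println(parts);
    }
}
```

Because the assignment depends only on the key, a distributed engine can route the same key to the same worker without any central coordination.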
The key point here is that you do not have to worry about DAG planning. Jet does it all for you: simply write your processing code using the Jet Pipeline API, and the Jet engine will take care of the runtime.
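The source → transform → sink shape that a Jet pipeline expresses can be sketched with plain java.util.stream, as below. This is a single-JVM stand-in for illustration, not actual Jet code: a real Jet job describes the same steps with the Pipeline API (Pipeline.create(), readFrom(...), map(...), writeTo(...)), which requires the hazelcast-jet dependency, and the engine then turns those steps into DAG vertices it distributes across the cluster for you.

```java
import java.util.List;
import java.util.stream.Collectors;

// The logical source -> transform -> sink shape of a stream processing
// pipeline, shown with plain java.util.stream. In Jet, each step would
// become a vertex in the DAG the engine plans and deploys for you.
public class PipelineShapeDemo {

    // Parse "name:value" lines into integers -- two transform steps
    // between a source and a sink.
    static List<Integer> run(List<String> lines) {
        return lines.stream()
                    .map(line -> line.split(":")[1])  // transform: extract the value
                    .map(Integer::parseInt)           // transform: convert to int
                    .collect(Collectors.toList());    // sink: collect the results
    }

    public static void main(String[] args) {
        List<String> source = List.of("sensor-a:12", "sensor-b:7"); // source
        System.out.println(run(source)); // [12, 7]
    }
}
```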
In the diagram below, you can see how tasks can be distributed across cores and nodes to run in parallel. The use of cooperative threads (i.e., application-level threads) lets Jet run that parallelism more efficiently, without incurring the context-switching overhead of OS-level threads. Jet coordinates the threads to deliver superior performance with no planning required of the application developer.
Below is a visual representation of a DAG for a Jet job, as displayed in Management Center. You can see how a job is broken down into distinct tasks, and the processed data is delivered to separate sinks (i.e., destination repositories).
Architecture & Features
Built on a distributed computing platform, Hazelcast Jet offers a parallel, low-latency core engine that lets data-intensive applications operate at real-time processing speeds, while its cooperative multithreading architecture enables thousands of jobs to run simultaneously.
Hazelcast Jet operates as shared application infrastructure or embedded directly in applications. Its lightweight footprint makes it ideal for microservices, and it makes data manipulation easy for developers and DevOps. It is cloud- and container-ready.
One solution provides stream, batch, and RPC processing, with a variety of connectors to enable easy integration into data processing pipelines. Scaling, failure handling and recovery are all automated for ease of operational management.
Embed Hazelcast IMDG as operational storage for enterprise-grade reliability and resilience, with high-performance integration that eliminates network transit latency and integration issues between processing and storage.
Big data processing at millisecond speed
Highly available and fault tolerant
Distributed, in-memory computation
Supports common data sources (Kafka, HDFS, sockets, JMS, JDBC, etc.)
Enables easy creation of custom sources
Provides scalable data storage with clients for Java, .NET, C++, Python, Node.js, and Go
Embeddable for isolated and fully self-contained data-centric services
Elasticity for fault-tolerance and automatic scaling
Supports network discovery
Supports in-memory messaging
Enables low-latency analytics and decision making
Saves bandwidth and enhances privacy by processing data locally
Fully embeddable for simple packaging
Turnkey solution for using trained models in a production environment
Low-latency model execution environment
Apply real-time analytics to high-volume event data to make faster and better decisions
Integrate multiple applications to support multiple business functions
Build continuous analytics processing for enhanced revenue generation, smart resource allocation, improved customer service and other metrics
SigmaStream is deploying Hazelcast as a processing backbone to monitor and analyze high-frequency (50,000+ events per second) data streaming in from oil well sensors on drilling rigs operating thousands of feet beneath the North Sea (an extreme edge use case). This is an industry-leading application of streaming technology in the context of oil and gas drilling platforms and has both broad and deep applicability across the entire domain. Initial estimates indicate a savings of 10% or more in the time required to achieve true vertical depth.
Hazelcast Jet uploads a stream of stock market pricing data from a Kafka topic into an IMDG map. The data is analyzed as part of the upload process, computing moving averages to detect buy/sell indicators.
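The moving-average logic in this use case can be sketched in plain Java. This is an assumption-laden illustration, not the actual Hazelcast job: the window sizes (3 and 10) and the "short average crosses above long average means buy" rule are hypothetical choices to show the shape of the computation a Jet pipeline would apply to each batch of prices.

```java
// Illustrative sketch of a moving-average buy/sell indicator -- not the
// actual Hazelcast job. Window sizes and the signal rule are assumptions.
public class MovingAverageDemo {

    // Average the most recent `window` prices (or all of them, if fewer).
    static double movingAverage(double[] prices, int window) {
        int start = Math.max(0, prices.length - window);
        double sum = 0;
        for (int i = start; i < prices.length; i++) {
            sum += prices[i];
        }
        return sum / (prices.length - start);
    }

    // Simple crossover rule: buy when the short-term average (3 samples)
    // rises above the long-term average (10 samples).
    static String signal(double[] prices) {
        return movingAverage(prices, 3) > movingAverage(prices, 10) ? "BUY" : "SELL";
    }

    public static void main(String[] args) {
        double[] rising = {10, 10, 10, 10, 11, 12, 13};
        System.out.println(signal(rising)); // recent prices pull the short average up
    }
}
```

In a Jet pipeline, this calculation would run per key (per ticker symbol) inside a windowed aggregation stage as prices stream in from Kafka.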
Hazelcast Jet reads a stream of telemetry data from ADS-B on all commercial aircraft flying anywhere in the world, typically 5,000 to 6,000 aircraft at any point in time. This data is filtered and aggregated, and then certain features are enriched and displayed in Grafana.
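The filter-and-aggregate step in the flight telemetry use case can be sketched as below. The field names and the altitude threshold are made up for illustration; this is not the actual ADS-B schema or the real Jet job, just the kind of per-flight aggregate a dashboard such as Grafana would chart.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of filtering and aggregating telemetry -- hypothetical
// fields and threshold, not the real ADS-B schema or Hazelcast Jet job.
public class TelemetryDemo {

    // A minimal position report: flight identifier plus altitude in feet.
    static class Position {
        final String flight;
        final double altitudeFt;
        Position(String flight, double altitudeFt) {
            this.flight = flight;
            this.altitudeFt = altitudeFt;
        }
    }

    // Keep only reports above 10,000 ft, then count reports per flight.
    static Map<String, Long> countHighAltitude(List<Position> reports) {
        return reports.stream()
                .filter(p -> p.altitudeFt > 10_000)
                .collect(Collectors.groupingBy(p -> p.flight, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countHighAltitude(List.of(
                new Position("BA117", 36_000),
                new Position("BA117", 35_000),
                new Position("LH400", 8_000)));
        System.out.println(counts); // low-altitude report is filtered out
    }
}
```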
The world's most advanced stream processing framework.
The basics of stream processing using Hazelcast Jet
Hazelcast Jet® is an application-embeddable, distributed computing platform for fast processing of big data sets. The Hazelcast Jet architecture is high-performance and low-latency-driven, based on a parallel streaming core engine that enables data-intensive applications to operate at near-real-time speeds.
Hazelcast Jet is built on top of Hazelcast IMDG®, the leading open source in-memory data grid with tens of thousands of installed clusters. Hazelcast Jet processing jobs take full advantage of the distributed in-memory data structures provided by Hazelcast IMDG.
Performant, elastically scalable, and resilient, Hazelcast Jet has been used extensively in financial services for a wide range of real-time use cases. In this webinar, Hazelcast Jet product manager Vladimir Schreiner will provide an overview of Hazelcast Jet, and how it is used at some of the world’s leading banks and credit card providers.
Event stream processing continues to play an increasingly important role in today’s data architectures. This is no surprise, considering that companies are striving to respond faster to ongoing changes in their business environments. However, these companies are still not taking full advantage of the value of their data, typically because they have not planned for the right approaches and architectures for stream processing. Read this Gartner report to learn more.
Whether you're interested in learning the basics of in-memory systems, or you're looking for advanced, real-world production examples and best practices, we've got you covered.