Hazelcast Jet 3.2 is Released

We are excited to introduce Hazelcast Jet 3.2, the latest version of Hazelcast’s batch and stream processing platform. We’ve added a number of powerful capabilities that not only expands the addressable use cases, but also further simplifies your application development efforts.

Stateful Mapping Operator

Jet now has support for more general fault-tolerant, stateful operations that extend its capabilities beyond the types of windowing already supported (tumbling, sliding, session) and enable new types of operations such as pattern matching and event correlation. The simple example below illustrates how this new operator can be used to match the start and end events of a transaction:

Pipeline p = Pipeline.create();
StreamStage<Entry<String, Long>> transactionOutcomes = p.drawFrom(eventStream())
    .groupingKey(event -> event.getTransactionId())
    .mapStateful(
        () -> new TransactionEvent[2], 
        (startEnd, transactionId, event) -> { 
            switch (event.type()) {
                case TRANSACTION_START:
                    startEnd[0] = transactionEvent;
                    break;
                case TRANSACTION_END:
                    startEnd[1] = transactionEvent;
                    break;
                default:
            }
            return (startEnd[0] != null && startEnd[1] != null)
                    ? entry(transactionId, startEnd[1].timestamp() - startEnd[0].timestamp())
                    : null;
        });

For more information, see the stateful mapping section in the Hazelcast Jet reference manual.

Amazon S3 Connector

Jet now includes a native connector for reading and writing files to Amazon AWS S3. The connector is currently limited to reading/writing text files.

Extended Support for Job-related Metrics

In previous versions of Jet, you could only retrieve the metrics via JMX or Jet Management Center, and even then, it was not possible to see the final metrics for completed jobs. With Jet 3.2, it is now possible to query metrics directly through the Job API both for running and completed jobs:

job.getMetrics()
   .filter(MetricTags.VERTEX, "vertexName")
   .get(MetricNames.EMITTED_COUNT)

Furthermore, metrics can be turned on and off for each job through JobConfig, and we have extended the reference manual to include a list of all available metrics. To continue with the metrics story, we are currently working on introducing user-defined metrics in the next versions of Jet.

Test Sources and Assertion Sinks

When prototyping or unit testing pipelines, it’s usually quite convenient to have some mock sources that can be used to quickly provide some data for processing. With this version, we have added both batch and streaming sources which can be used for development and testing:

BatchStage<Integer> stage = pipeline.drawFrom(TestSources.items(1, 2, 3, 4, 5));
// emits 10 items per second
StreamSourceStage<Integer> stage = pipeline.drawFrom(TestSources.itemStream(10)); 

For testing pipelines, it’s also useful to have convenient sinks that can assert the incoming items. This is also part of Jet 3.2 and can be used as follows:

p.drawFrom(TestSources.items(1, 2, 3, 4))
 .apply(Assertions.assertOrderedArrays.asList(1, 2, 3, 4)))
 .drainTo(Sinks.logger());

We are also using these test sources and sinks as part of testing Jet connectors and features. For more information, see the test sources section in the reference manual.

Performance Improvements

There are many performance improvements as part of this release:

  • IMap and ICache sources make greater use of parallelism and have been rewritten to use cooperative processors, giving higher throughput.
  • Various IMap sinks (plain, merging, and updating) also make use of cooperative processing, giving up to 5x throughput compared to previous versions.
  • Optimizations in job submission pipelines mean Jet can now execute small jobs with lower latency, so a job can now be executed to completion in 10 milliseconds or less.

These improvements significantly speed up the “cache and query” use case, where Jet processes the data stored in the Hazelcast cluster. This enables environments that are sometimes referred to as “self-service analytics portals,” where a large number of end users can each submit a series of data processing or querying requests and get immediate responses.

Hazelcast IMDG 3.12.3

Hazelcast Jet is now updated to include the latest release of Hazelcast IMDG, 3.12.3.

More to Come

We’ve delivered quite a bit on Hazelcast technology this year, but we’re not done yet for 2019! Stay tuned for additional updates coming soon that will let you do even more with Hazelcast.