Announcing Jet 0.6

We are happy to announce the release of Hazelcast Jet 0.6! This version brings many new improvements which I will try to squeeze into this blog post.

Improved streaming support for Pipeline API

With 0.5, Jet introduced a new high-level API called Pipelines which offered a convenient way to do distributed data processing using a powerful and expressive DSL. With 0.6, this API is now overhauled and extended to support a more complete set of features, including full streaming support with windowing. Previously, stateful streaming operations such as windowing was only possible to do using the core API. An example of a streaming job which uses windowing and uses a map event journal as the source is below:

Pipeline p = Pipeline.create();
    
        p.drawFrom(Sources.<Trade, Integer, Trade>mapJournal(TRADES_MAP_NAME,
                alwaysTrue(), EventJournalMapEvent::getNewValue, START_FROM_CURRENT))
         .addTimestamps(Trade::getTime, 3000)
         .groupingKey(Trade::getTicker)
         .window(WindowDefinition.sliding(SLIDING_WINDOW_LENGTH_MILLIS, SLIDE_STEP_MILLIS))
         .aggregate(counting(),
                 (winStart, winEnd, key, result) -> String.format("%s %5s %4d", toLocalTime(winEnd), key, result))
         .drainTo(Sinks.logger());
    
        jet.newJob(p).join();

Stateful transforms with Pipeline API

Another improvement to the Pipeline API is the ability to stateful map, flat map and filter operations. This is useful when the stage involves some external library or object that needs to be initialized and then re-used for each item. One example of this is using a Hazelcast Replicated Map for enriching a stream:

p.drawFrom(tradesSource)
         .mapUsingContext(ContextFactories.replicatedMapContext(TICKER_INFO_MAP_NAME),
                 (map, trade) -> tuple2(trade, map.get(trade.getTicker())))
         .drainTo(Sinks.logger());

Replicated Map can be updated in runtime and acts as a realtime lookup table, giving the user ability to update static data while the job is still running.

Custom sink builder

Typically to write new sinks in Jet required using the Core API to write a new Processor, but with 0.6 we are introducing an easy to use sink builder which can be used to rapidly define new sinks.

An example sink which writes the output to a Hazelcast Topic can be created as follows with a few lines of code:

private static Sink<String> buildTopicSink() {
        return Sinks.<ITopic<String>, String>builder((jet) -> jet.getHazelcastInstance().getTopic(TOPIC_NAME))
                .onReceiveFn(ITopic::publish)
                .build();
    }

Merging and updating map sinks

Hazelcast Maps have a feature called entry processor which allows entries in the map to be updated in place in a thread-safe manner. Previously the map sink in Jet would always overwrite the existing entry. With 0.6, we are introducing merging and updating sinks, which allow the sink to update the entry in place instead.

Job management improvements

In Jet 0.5 there were big changes to how job submission works and it was the first release to have fault-tolerance with jobs restarting automatically upon a node failure. With 0.6, we have improved how job cancellation works and added the ability to name jobs.

Manual job scale up

It has always been possible to add new Jet members to a running cluster without effecting the running jobs, however it was not possible to make use of the new computational capacity added by the new members. Now jobs can be restarted manually using job.restart() method. Once a job is restarted, it will run on the whole cluster including the new members, and if processing guarantee for the job is set to at-least-once or exactly-once, it will use the latest available snapshot to restore the job state.

Improved Java 9 support

The main Jet code base now builds on Java 9 without any problems. We have also added Automatic Module Names to all the modules so that Jet can be used as a module in Java 9 projects.

Jet Demo Repository

We have created a new repository with eight demo applications that are designed to showcase just some of the use cases which are possible using Hazelcast Jet. The demos available are:

  • Real-Time Trade Processing – Processes immutable events from an event bus (Kafka) to update storage optimized for querying and reading (IMDG).
  • Markov Chain Generator – Generates a Markov Chain with probabilities based on supplied classical books.
  • Market Data Ingest – Uploads a stream of stock market data (prices) from a Kafka topic into an IMDG map.
  • Flight Telemetry – Reads a stream of telemetry data from ADB-S on all commercial aircraft flying anywhere in the world.
  • Real-Time Sports Betting Engine – This is a simple example of a sports book and is a good introduction to the Pipeline API. It also uses Hazelcast IMDG as an in-memory data store.
  • Real-Time Road Traffic Analysis and Prediction – Continuously computes linear regression models from current traffic. Uses the trend from week ago to predict traffic now.
  • Twitter Cryptocurrency Sentiment Analysis – Twitter and Reddit content is analyzed with sentiment analysis in real time to calculate a cryptocurrency aggregated sentiment score.
  • Real-time Image Recognition – Recognizes images present in the webcam video input with a model trained with CIFAR-10 dataset.

Hazelcast Version Updated to 3.10

Jet 0.6 has been updated to use Hazelcast 3.10, which has several new features. Currently 0.6 is using 3.10-BETA-2, but Jet 0.6.1 will be updated to the final version as soon as it is released.