Tech Talk: Machine Learning at Scale Using Distributed Stream Processing

Webinar

The capabilities of machine learning are now well understood, and there are great tools for doing data science and building models that answer nontrivial questions about your data. Most of these tools are used from Python.

The key new challenge is making the trained prediction model usable in real time, while the user is interacting with your software. Getting answers from an ML model (called inference) is CPU-intensive and must be done at scale. ML tools are optimized mainly for batch-processing a lot of data at once, and their implementations often aren’t parallelized.

In this talk, I will show one approach that lets you write a low-latency, auto-parallelized, distributed stream processing pipeline in Java that seamlessly integrates a data scientist’s work, taken almost unchanged from their Python development environment.

The talk includes a live demo using the command line and going through some Python and Java code snippets.

Presented By:

Marko Topolnik
Senior Software Engineer
Hazelcast

Marko Topolnik is a senior engineer in the Jet Core team. He has been with Hazelcast® since 2015, holds a PhD in computer science and has a six-figure score on Stack Overflow.

Vladimír Schreiner
Product Manager, Hazelcast Jet
Hazelcast

Vladimir is a product manager with an engineering background and deep expertise in stream processing and real-time data pipelines. Ten years of building internal software platforms and development infrastructure have made him passionate about new technologies and finding ways to simplify data processing. Vladimir co-authored two white papers on the topic: Understanding Stream Processing: Fast Processing of Infinite and Big Data, and A Reference Guide to Stream Processing. His tutorial video on stream processing and real-time data pipelines discusses the building blocks of a stream processing pipeline and demonstrates how developers can write a full-blown streaming pipeline in less than a hundred lines of Java code for a variety of applications. Vladimir is also a lecturer with the Czechitas Foundation, whose mission is to inspire women and girls to explore the world of information technology. Czechitas Foundation teaches coding in various programming languages, software testing, and data analysis.