Tech Talk Series

Vladimir Schreiner | Apr 1, 2020

Tech conferences and meetups have been canceled or postponed across the world. To make the situation a little easier for everyone who misses them, Hazelcast has started a series of virtual tech meetups.

Please join us on Thursdays, starting April 2nd, always at 3:30pm CET / 7:30am PDT / 10:30am EDT / 2:30pm GMT.

The list of topics:

Streaming in the world of legacy applications (Vladimir Schreiner)

Date: Thursday, April 2, 2020


A practical introduction to CDC (Change Data Capture). Architecture, trade-offs, tooling, and demos.

Common themes emerge when people describe their reasons for rearchitecting legacy business applications. At a technical level: speed and scalability. At a business level: the need to gain new real-time insights. These legacy applications commonly center around a central datastore, such as a relational database. Moving away from this architecture requires a massive migration effort. The costs and risks associated with such an effort can sometimes be prohibitive for business owners; you can’t just rip out your relational database.

A lower-risk, gradual transition to a target architecture often wins the day. Streaming, caching, and CDC technologies are vital tools for this journey. CDC (Change Data Capture) can turn your legacy data stores into streaming sources. Modern caching technologies can host data in a way that provides speed and scalability. Finally, streaming acts as the glue that can drive new use cases as well as bridge the old ones.
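
To make the idea of a data store becoming a streaming source more concrete, here is a minimal sketch in plain Java. It assumes a simplified change-event shape (`Op`, `ChangeEvent`, and `replay` are hypothetical names invented for this illustration, not the format of any particular CDC tool) and shows how replaying an ordered stream of changes can keep an in-memory view in sync with the source table.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CdcSketch {

    // Hypothetical operation types; real CDC tools emit richer records.
    enum Op { INSERT, UPDATE, DELETE }

    // Hypothetical change event: the operation, the row's primary key, and its new value.
    static final class ChangeEvent {
        final Op op;
        final String key;
        final String value;

        ChangeEvent(Op op, String key, String value) {
            this.op = op;
            this.key = key;
            this.value = value;
        }
    }

    // Applies a single change event to an in-memory view keyed by primary key.
    static void apply(Map<String, String> view, ChangeEvent e) {
        switch (e.op) {
            case INSERT:
            case UPDATE:
                view.put(e.key, e.value);
                break;
            case DELETE:
                view.remove(e.key);
                break;
        }
    }

    // Replays an ordered list of change events to build the view.
    static Map<String, String> replay(List<ChangeEvent> events) {
        Map<String, String> view = new ConcurrentHashMap<>();
        for (ChangeEvent e : events) {
            apply(view, e);
        }
        return view;
    }

    public static void main(String[] args) {
        List<ChangeEvent> log = Arrays.asList(
                new ChangeEvent(Op.INSERT, "42", "Alice"),
                new ChangeEvent(Op.UPDATE, "42", "Alice B."),
                new ChangeEvent(Op.INSERT, "43", "Bob"),
                new ChangeEvent(Op.DELETE, "43", null));
        System.out.println(replay(log)); // prints {42=Alice B.}
    }
}
```

The sketch relies on events being applied in commit order; handling reordering, schema changes, and initial snapshots is left to the CDC tooling.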

Machine Learning at Scale using distributed stream processing (Marko Topolnik)

Date: Thursday, April 9, 2020


The capabilities of machine learning are now pretty well understood, and there are great tools to do data science and construct models that answer nontrivial questions about your data. These tools are mostly used in Python.

The key new challenge is making the trained prediction model usable in real time, while the user is interacting with your software. Getting answers from an ML model (this is called inference) takes a lot of CPU and must be done at serious scale. The ML tools are optimized mainly for batch-processing a lot of data at once, and often the implementations aren’t parallelized.

In this talk, I will show an approach that allows you to write a low-latency, auto-parallelized, and distributed stream processing pipeline in Java that seamlessly integrates with a data scientist’s work taken in almost unchanged form from their Python development environment.

The talk includes a live demo using the command line and going through some Python and Java code snippets.

3 Easy Improvements in Your Microservices Architecture (Nicolas Frankel)

Date: Thursday, April 16, 2020


While a microservices architecture is more scalable than a monolith, it takes a direct hit on performance.

To cope with that, one performance improvement is to set up a cache. It can be configured for database access, for REST calls or just to store session state across a cluster of server nodes. In this demo-based talk, I’ll show how Hazelcast In-Memory Data Grid can help you in each one of those areas and how to configure it. Hint: it’s much easier than one would expect.
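
As a rough illustration of caching a slow call, the sketch below wraps a lookup function with an in-process map. `CachingLookup` and the fake backend are hypothetical names made up for this example; the talk itself uses Hazelcast IMDG, whose configuration is not shown here. In a clustered setup, the local map would presumably be replaced by a distributed one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CacheAsideSketch {

    // Wraps a slow lookup (database query, REST call) with a local cache.
    static final class CachingLookup<K, V> {
        private final Map<K, V> cache = new ConcurrentHashMap<>();
        private final Function<K, V> slowLookup;

        CachingLookup(Function<K, V> slowLookup) {
            this.slowLookup = slowLookup;
        }

        // Returns the cached value, calling the slow lookup only on a miss.
        V get(K key) {
            return cache.computeIfAbsent(key, slowLookup);
        }

        // Drops an entry so that the next read goes to the backend again.
        void evict(K key) {
            cache.remove(key);
        }
    }

    public static void main(String[] args) {
        AtomicInteger backendCalls = new AtomicInteger();
        // Stand-in for a slow backend; it only counts how often it is hit.
        CachingLookup<String, String> users = new CachingLookup<>(id -> {
            backendCalls.incrementAndGet();
            return "user-" + id;
        });

        users.get("7");
        users.get("7"); // served from the cache
        System.out.println(backendCalls.get()); // prints 1
    }
}
```

The hard part in practice is invalidation: `evict` has to be called whenever the underlying data changes, otherwise readers see stale values.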

Distributed Snapshots (Viliam Ďurina)

Date: Thursday, April 23, 2020


Fault tolerance can be a reason to choose a distributed system even if a single machine can handle the expected load: a distributed system can tolerate failures of its parts, while a system running on a single machine cannot. How can a stream-processing engine guarantee exactly-once semantics?

I’ll describe the Chandy-Lamport algorithm, which can be used to take a consistent snapshot of the global state of a distributed system. I’ll also describe the simplified special case of it that’s used in Jet.
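
One commonly described simplification of marker-based snapshots is barrier alignment: an operator stops consuming from an input channel once that channel's snapshot barrier has arrived, and saves its state only after the barrier has arrived on every channel. The toy sketch below illustrates that idea for an operator that sums integers. All names are hypothetical, it assumes only one snapshot is in flight at a time, and it is not Jet's actual implementation; whether it matches the variant presented in the talk is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

public class BarrierAlignmentSketch {

    // Toy operator that sums integers arriving on several input channels.
    static final class SummingOperator {
        private final int channelCount;
        // Channels whose barrier has already arrived for the current snapshot.
        private final Set<Integer> blocked = new HashSet<>();
        // Items that arrived on blocked channels; they belong after the snapshot.
        private final Queue<Integer> buffered = new ArrayDeque<>();
        private final List<Long> snapshots = new ArrayList<>();
        private long sum;

        SummingOperator(int channelCount) {
            this.channelCount = channelCount;
        }

        void onItem(int channel, int value) {
            if (blocked.contains(channel)) {
                buffered.add(value); // hold back until all barriers have arrived
            } else {
                sum += value;
            }
        }

        void onBarrier(int channel) {
            blocked.add(channel);
            if (blocked.size() == channelCount) {
                // State now reflects exactly the items sent before the barriers.
                snapshots.add(sum);
                blocked.clear();
                while (!buffered.isEmpty()) {
                    sum += buffered.poll();
                }
            }
        }

        List<Long> snapshots() {
            return snapshots;
        }

        long sum() {
            return sum;
        }
    }

    public static void main(String[] args) {
        SummingOperator op = new SummingOperator(2);
        op.onItem(0, 1);
        op.onBarrier(0);
        op.onItem(0, 10); // arrives after channel 0's barrier, so it is buffered
        op.onItem(1, 2);
        op.onItem(1, 3);
        op.onBarrier(1);  // all barriers have arrived: snapshot, then drain the buffer
        System.out.println(op.snapshots()); // prints [6]
        System.out.println(op.sum());       // prints 16
    }
}
```

Because the snapshot contains only the effect of pre-barrier items, restarting from it and replaying the input from the barrier position counts each item exactly once. The price is latency: the faster channel is stalled until the slower one catches up.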

Advanced Kubernetes: Lesson Learned From Building a Managed Service (Hüseyin BABAL)

Date: Thursday, May 7, 2020


In this session, I will explain how to create a multi-tenant environment on Kubernetes to build a managed service.
I will share golden rules for building a managed service on top of Kubernetes, with real-life examples drawn from my experience developing Hazelcast Cloud:

  • Environment isolation
  • Microservice Architecture
  • Monitoring
  • Logging
  • Tracing

Embedded Time Series Storage: A Cookbook (Andrey Pechkurov)

Date: Thursday, May 21, 2020


Recently, the Hazelcast Management Center team had to build an embedded Java time series storage on top of existing well-known components. In this (very) practical talk, we are going to discuss the technical challenges and design decisions made during the process. The talk should be helpful for those who want to learn more about time series storage and databases.

About the Author

Vladimir Schreiner

Vladimir is a product manager with an engineering background and deep expertise in stream processing and real-time data pipelines. Ten years of building internal software platforms and development infrastructure have made him passionate about new technologies and finding ways to simplify data processing. Vladimir co-authored two white papers on the topic: Understanding Stream Processing: Fast Processing of Infinite and Big Data, and A Reference Guide to Stream Processing. His tutorial video on stream processing and real-time data pipelines discusses the building blocks of a stream processing pipeline and demonstrates how developers can write a full-blown streaming pipeline in less than a hundred lines of Java code for a variety of applications. Vladimir is also a lecturer with the Czechitas Foundation, whose mission is to inspire women and girls to explore the world of information technology. Czechitas Foundation teaches coding in various programming languages, software testing, and data analysis.