What Is a Distributed Hash Table?
Tech Talk: Machine Learning at Scale Using Distributed Stream Processing
In this talk, Marko shows one approach to writing a low-latency, auto-parallelized, distributed stream processing pipeline in Java that integrates a data scientist’s work, taken almost unchanged from their Python development environment. The talk includes a live demo on the command line and a walk through some Python and Java code snippets.
Tech Talk: Distributed Snapshots
Fault tolerance can be a reason to choose a distributed system even when a single machine could handle the expected load: a distributed system can tolerate the failure of some of its parts, while a system running on a single machine cannot. How can a stream-processing engine guarantee exactly-once semantics? Viliam will describe the Chandy-Lamport algorithm, which can be used to take a consistent snapshot of the global state of a distributed system. He will also describe the simplified special case of the algorithm that is used in Jet.
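To give a feel for what the Chandy-Lamport algorithm does, here is a minimal single-threaded simulation in Python. It is an illustrative sketch only, not code from the talk and not how Jet implements snapshots. The `Process` and `Network` classes, the account-balance state, and the transfer messages are all hypothetical names chosen for this example. It assumes reliable FIFO channels between every pair of processes, which the algorithm requires.

```python
from collections import deque

MARKER = "MARKER"  # control message; every other message is a transfer amount


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.recorded_balance = None  # local state captured by the snapshot
        self.recording = {}           # incoming channel -> messages seen while recording
        self.channel_state = {}       # incoming channel -> final recorded in-flight messages


class Network:
    def __init__(self, balances):
        self.procs = {n: Process(n, b) for n, b in balances.items()}
        # One FIFO channel per ordered pair of distinct processes.
        self.channels = {(s, d): deque() for s in balances for d in balances if s != d}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.channels[(src, dst)].append(amount)

    def _record_and_broadcast(self, p, skip=None):
        # Record local state, send a marker on every outgoing channel, and
        # start recording every incoming channel except the one that
        # delivered the first marker (its recorded state is empty).
        p.recorded_balance = p.balance
        for (s, d), ch in self.channels.items():
            if s == p.name:
                ch.append(MARKER)
            elif d == p.name and (s, d) != skip:
                p.recording[(s, d)] = []

    def start_snapshot(self, initiator):
        self._record_and_broadcast(self.procs[initiator])

    def deliver(self, src, dst):
        msg = self.channels[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded_balance is None:
                # First marker seen: this channel had nothing in flight.
                p.channel_state[(src, dst)] = []
                self._record_and_broadcast(p, skip=(src, dst))
            else:
                # Later marker: stop recording this channel.
                p.channel_state[(src, dst)] = p.recording.pop((src, dst))
        else:
            p.balance += msg
            if (src, dst) in p.recording:
                p.recording[(src, dst)].append(msg)

    def snapshot_total(self):
        recorded = sum(p.recorded_balance for p in self.procs.values())
        in_flight = sum(sum(m) for p in self.procs.values() for m in p.channel_state.values())
        return recorded + in_flight


net = Network({"A": 100, "B": 50})
net.send("A", "B", 30)   # 30 is in flight when the snapshot starts
net.start_snapshot("B")  # B records 50 and sends a marker to A
net.deliver("A", "B")    # B receives 30 and records it as channel state
net.send("B", "A", 5)    # sent after B's marker, so not part of the snapshot
net.deliver("B", "A")    # A receives the marker, records 70, replies with a marker
net.deliver("A", "B")    # B receives A's marker and stops recording
net.deliver("B", "A")    # A receives the 5
live_total = sum(p.balance for p in net.procs.values())
print(net.snapshot_total(), live_total)
```

In this run the snapshot records A at 70, B at 50, and the 30 still in flight from A to B, so the recorded total equals the total money in the system even though no process ever paused. That consistency is the property a stream-processing engine can build on when restoring state after a failure.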