Tech Talk: Distributed Snapshots

Webinar
/ Video
/ 60 min

Having fault-tolerance can be a factor to choose a distributed system even if the expected load can be handled by a single machine – a distributed system can tolerate failures of its parts while a system running on a single machine cannot. How can a stream-processing engine guarantee an exactly-once semantics?

I’ll describe the Chandy-Lamport algorithm that can be used to consistently snapshot the global state of a distributed system. I’ll also describe its special simplified case that’s used in Jet.

Presented By