What is the CAP Theorem?

The CAP theorem is a theorem about distributed computing systems; it has been stated in various forms over the years. The original statement of the theorem by Eric Brewer (also known as Brewer’s theorem) states that a computer system can at best provide two of the three properties from the following list:

  • Consistency:  the view of the data is up-to-date on all members of a distributed computing system.
  • Availability:  the data is always accessible for reading and updating
  • Partition tolerance: the distributed system continues to operate in the presence of a failure of the network where not all members can reach other members.
CAP theorem
CAP theorem states that a computer system can at best provide two of these three properties: Consistency, Availability, or Partition Tolerance.

CAP Theorem System Design

In practical terms, a distributed system cannot be made immune to network failures, thus designers of such systems must accept the possibility of such failures as a given, and then decide how to respond. If the system chooses to favor Availability (such systems are designated AP systems), then it will continue to service requests, even though the data could be in an inconsistent state (for example, a failing member may have received and acknowledged an update to the data, but then failed before that update was synchronized with other members of the system). If the system chooses to favor Consistency (known as a CP system), then it will choose to stop serving requests (become unavailable) if the consistency of data cannot be guaranteed. For a CP system, this is achieved by requiring a certain number of nodes to confirm that a data update has been made before acknowledging the update; this certain number will be greater than half of the nodes that comprise the distributed system. 

CP systems are obnoxiously hard to implement correctly, as there are many subtle ways in which distributed systems can defy expectations. A system believed to be down may just be experiencing network congestion, so updates can arrive out-of-sequence from a node that was believed to be offline.

 

To correctly implement a CP system, a way of achieving distributed consensus is needed – that is, a majority of the nodes present in the system must all agree on the correct shared state of the system. The two most common protocols for implementing such a consensus system are Raft and Paxos. 

Some systems can be configured to run in either an AP or a CP mode; for example, Hazelcast IMDG has a CP subsystem that provides strong consistency guarantees, while the non-CP portions of IMDG favor availability. The CP subsystem is based on Hazelcast’s implementation of the Raft consensus protocol.