The CAP theorem is one of those terms that is badly and vaguely explained on the internet. I always struggled with it until I came across it again while reading the book “Designing Data Intensive Applications” by Martin Kleppmann. In this post I will try to explain the idea in simple terms, the way I understood it.
The CAP theorem is a rule of thumb or a guide for the designers and operators of distributed database systems, to help them decide on certain properties and characteristics of the DB system itself. These properties are Consistency, Availability and Partition Tolerance, hence the name CAP. So what do these properties mean?
Consistency: in a distributed DB system the data is copied across several replicas. So when two clients read the same data from two different replicas, we want both responses to be the same, i.e. consistent, and to reflect the most recent write. Consistency in this context is often referred to as linearizability or strong consistency.
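To make the problem concrete, here is a minimal sketch in Python of two replicas that briefly disagree. The Replica class, the replica names and the "balance" key are all made up for illustration, and the asynchronous copy is simulated by hand. It is not how any particular database does replication.

```python
class Replica:
    """A hypothetical in-memory replica holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def read(self, key):
        # Returns None when this replica has not received the key yet.
        return self.data.get(key)


leader = Replica("leader")
follower = Replica("follower")

# A write lands on the leader first ...
leader.data["balance"] = 100

# ... and two clients read the same key from two different replicas
# before the write has been copied over.
fresh_read = leader.read("balance")     # sees the new value
stale_read = follower.read("balance")   # follower has not caught up yet

# Replication finally catches up (simulated by copying the data by hand).
follower.data.update(leader.data)
caught_up_read = follower.read("balance")
```

Between the write and the catch-up, the two clients get different answers for the same key. A consistent (linearizable) system is one that never lets clients observe that gap.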
Availability: in case one replica is down, is it acceptable to drive traffic to the other replicas to serve clients? This is what availability means here, i.e. always being able to serve clients.
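As a rough sketch of what "driving traffic to other replicas" could look like, the snippet below picks the first replica that is marked as up. The replica names and the health flags are assumptions for the example. A real system would get this information from health checks or a load balancer.

```python
def first_healthy(replicas):
    """Return the name of the first replica marked as up, or None if all are down."""
    for name, is_up in replicas.items():
        if is_up:
            return name
    return None


# Hypothetical cluster state: replica-a is down, the other two are up.
replicas = {"replica-a": False, "replica-b": True, "replica-c": True}
target = first_healthy(replicas)
```

Here the request would go to replica-b, so the client is still served even though replica-a is down.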
Partition Tolerance: this is the misleading and confusing part (see why below). A network partition is a network fault that cuts some replicas off from the others, and this property is pretty much about what to do when that happens.
The CAP theorem is actually mainly concerned with Consistency and Availability, because Partition Tolerance must always be assumed. In other words, we should always consider what to do when a network issue happens, because network issues will happen.
So how is the CAP theorem used, then? It’s used to make a decision on the following question:
In case of a network issue, is it okay for our DB to be inconsistent? If the answer is yes, then you can simply keep driving traffic to whichever replicas are reachable, and thus always serve your clients and stay available. If the answer is no, then some requests have to be refused or delayed until the network issue is resolved, so you can’t provide 100% availability.
So the CAP theorem is really a guide that makes you decide what you value more in case of a network issue between replicas: being always available and serving the clients, or always returning consistent values. When a network issue happens, you can’t guarantee both.
If you want to guarantee 100% availability, then you can’t guarantee 100% consistent reads. And if you want to guarantee 100% consistent reads, then you can’t guarantee being always online and available.
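The trade-off can be sketched in a few lines of Python. Everything here is an assumption made for the example: the Replica class, the "CP" and "AP" mode labels, and the partitioned flag that stands in for a real network fault. A replica in "CP" mode refuses to answer when it is cut off, while a replica in "AP" mode answers with whatever it has.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk a stale value."""


class Replica:
    def __init__(self, mode):
        self.mode = mode           # "CP" or "AP" (labels assumed for this sketch)
        self.value = "v1"          # last value this replica received
        self.partitioned = False   # True when cut off from the other replicas

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse, because a newer write may exist elsewhere.
            raise Unavailable("cannot confirm the latest value during a partition")
        # Availability first: answer with the local copy, even if it is stale.
        return self.value


def read_or_error(replica):
    try:
        return replica.read()
    except Unavailable:
        return "error"


cp = Replica("CP")
ap = Replica("AP")

# Both replicas get cut off, and meanwhile the rest of the cluster accepts "v2".
cp.partitioned = ap.partitioned = True
latest = "v2"

cp_result = read_or_error(cp)   # the client gets an error, never a wrong value
ap_result = read_or_error(ap)   # the client gets an answer, but it is the old "v1"
```

Neither replica can return "v2" while it is cut off. The only choice left is which kind of failure the client sees: an error or a stale value.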
“Designing Data Intensive Applications” – This book is great!