It is impossible for a distributed computer system to simultaneously
provide all three of the following guarantees:
- Consistency: describes how and whether a system is
left in a consistent state after an operation. In a distributed data
system, this usually means that once a writer has written, all readers
will see that write. A distributed data system is either strongly
consistent or has some form of weak consistency. The most well known
example of strong consistency in databases is ACID (Atomicity
Consistency Isolation Durability), used in most relational databases.
On the other end of the spectrum is BASE (Basically Available
Soft-state Eventual consistency). Most often, weak consistency comes in
the form of eventual consistency which means the database eventually
reaches a consistent state. Weak consistency systems are usually ones
where data is replicated; the latest version of something is sitting on
some node in the cluster, but older versions are still out there on
other nodes, but eventually all nodes will see the latest version.
- Availability: means how a system that is tolerant of
node failures and can also remain available during software and
hardware upgrades. This is perhaps the simplest to understand and most
commonly desired property but can be quite difficult to achieve to any
level of certainty.
- Partition Tolerance: refers to the ability for a
system to continue to operate in the presence of a network partitions.
For example, if I have a database running on 80 nodes across 2 racks
and the interconnect between the racks is lost, my database is now
partitioned. If the system is tolerant of it, then the database will
still be able to perform read and write operations while partitioned.
If not, often times the cluster is completely unusable or is
read-only.