Understanding the intricacies of the CAP theorem simply, in a modernization context
A Simple Story That Explains the CAP Theorem
I wanted to design a traditional system for keeping a record of my clients' appointments. The requirement is simple and there is only one use case: the system has to maintain up-to-date client appointments. A client can contact me to create, update, delete, or read their appointments.
I started keeping my clients' appointment records in my diary. The clients were happy with the service, as their appointments were managed perfectly and communicated to them on time. However, as the number of clients grew, many of them started complaining that my line was busy whenever they called. This was an AVAILABILITY problem.
To solve this, I asked my roommate to serve my clients alongside me. He started keeping appointment records in his own diary. Soon the clients faced a different issue: when a client created an appointment with me and later asked my roommate about it, or vice versa, the other person had no record of it and the client got frustrated. This was a CONSISTENCY problem.
My roommate and I resolved this by informing each other of every transaction either of us recorded. In other words, we synchronized our records after each transaction. Service became smooth again.
After a few months my roommate had to move out, and he relocated to another city. From then on we could reach each other only by phone, a link that was sometimes busy or down: the network PARTITION problem. We still managed to synchronize the appointment records in our diaries, with a little delay, by calling each other during each transaction. But the client had to wait on the line until both of us had acknowledged the transaction, which caused LATENCY. And at any given moment, each of us could do only one of two things:
- Call each other to stay CONSISTENT on the transactions, or
- Be AVAILABLE to serve other clients' calls at the same time.
Eventually, we had to compromise on either Consistency or Availability at any given moment.
Now it is easier to understand the CAP theorem
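The CAP theorem states that a distributed system can guarantee at most two of the following three properties at the same time: Consistency (every read sees the most recent write), Availability (every request to a working node gets a response), and Partition tolerance (the system keeps operating when messages between nodes are lost or delayed). The CAP theorem is an attempt to describe what happens when you try to build a distributed system. It says that it is impossible to build one system that provides all three things at once. Since network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.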
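The diary story maps onto a few lines of Python. This is only a sketch under my own assumptions: the class `DiaryKeeper`, the helper names, and the mode labels "CP" and "AP" are hypothetical, and the phone line between the two keepers stands in for the network.

```python
class Unavailable(Exception):
    """Raised when a keeper refuses a request in order to protect consistency."""


class DiaryKeeper:
    """One person with a diary; two of them form a tiny replicated system."""

    def __init__(self, name, mode):
        self.name = name
        self.mode = mode        # "CP" or "AP" (hypothetical labels for this sketch)
        self.diary = {}         # client -> appointment slot
        self.peer = None        # the other keeper, set by make_pair()
        self.line_up = True     # False models the network partition

    def book(self, client, slot):
        if self.line_up:
            # Normal case: record locally and synchronize with the peer.
            self.diary[client] = slot
            self.peer.diary[client] = slot
        elif self.mode == "CP":
            # Peer unreachable: refuse rather than let the diaries diverge.
            raise Unavailable(f"{self.name} cannot confirm the booking right now")
        else:
            # AP: stay available, accept the booking locally, and diverge.
            self.diary[client] = slot

    def lookup(self, client):
        return self.diary.get(client)


def make_pair(mode):
    """Create two keepers that synchronize with each other."""
    me, partner = DiaryKeeper("me", mode), DiaryKeeper("partner", mode)
    me.peer, partner.peer = partner, me
    return me, partner


def cut_phone_line(a, b):
    """Simulate the partition: neither keeper can reach the other."""
    a.line_up = b.line_up = False


if __name__ == "__main__":
    me, partner = make_pair("AP")
    me.book("alice", "Mon 10:00")
    cut_phone_line(me, partner)
    me.book("alice", "Tue 09:00")   # accepted, but the partner never hears of it
    print(me.lookup("alice"), "|", partner.lookup("alice"))
```

In "AP" mode the two diaries disagree after the line is cut, which is the consistency problem from the story. In "CP" mode the same booking raises `Unavailable`, which is the availability problem. Neither mode avoids both.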
Why the CAP Theorem is still relevant and useful
The CAP theorem has become a staple of any discussion about distributed systems. It has also been the source of much confusion. For many, understanding the proof behind CAP is like trying to read quantum mechanics. Part of the confusion may come from the variety of modern architectures: architects have explored strongly consistent solutions with best-effort availability, weakly consistent solutions with high availability, and systems that mix weaker availability and weaker consistency in varying ways.
In the era of digital transformation, we often deal with microservices or similar distributed system designs. The CAP theorem helps us understand the limitations and design systems judiciously. Microservices are loosely coupled services that can be developed, deployed, and maintained independently. Each has its own stack, database, and data model, and they communicate with each other over a network. When we build a microservices application, we can use the CAP theorem to choose the database that best fits our needs. But we have to be very clear, for our use case, about which property is our priority when the network partitions: consistency or availability.
The Proof for the CAP Theorem
As we learned in school, one way to prove a theorem is by contradiction: assume the theorem is false and show that the assumption leads to an impossibility. So suppose we could build a distributed database that provides both consistency and availability even during a network partition. For the proof, let's assume that we have a computer network with nodes in different data centers across the world. Now, let's say that a network partition occurs, which means that some of the nodes are no longer able to communicate with each other. A client writes a new value to a node on one side of the partition. To keep the database consistent, that node would have to pass the write to the nodes on the other side before any of them serves a read, and the partition makes this impossible. To keep the database available, every node must keep answering requests and cannot wait for the partition to end. So when a second client reads from a node on the far side, that node has only two options: answer with the data it has, which is now stale, or refuse to answer until the partition heals. The first option breaks consistency and the second breaks availability, which contradicts our assumption. During a partition, then, you must choose to sacrifice either consistency or availability.
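The case analysis in the proof can be checked mechanically on the smallest possible example. The sketch below is an illustration, not a formal proof: the two dictionaries `node_a` and `node_b`, the values "v0" and "v1", and the function names are all assumptions made for this example.

```python
OLD, NEW = "v0", "v1"   # hypothetical values before and after the write


def far_node_options(local_value):
    """Everything a cut-off node can do with a read, knowing only its own state."""
    return [("reply", local_value), ("refuse", None)]


def classify(action, value, latest_value):
    """Return (available, consistent) for one possible response."""
    available = action == "reply"
    # A refusal returns no data, so it cannot return stale data.
    consistent = not (action == "reply" and value != latest_value)
    return available, consistent


def run_partition_scenario():
    node_a = {"x": OLD}
    node_b = {"x": OLD}     # both replicas start in sync
    node_a["x"] = NEW       # the write lands on A; the partition blocks replication to B
    results = {}
    for action, value in far_node_options(node_b["x"]):
        results[action] = classify(action, value, latest_value=node_a["x"])
    return results


if __name__ == "__main__":
    for action, (available, consistent) in run_partition_scenario().items():
        print(f"{action}: available={available}, consistent={consistent}")
```

Replying with the local value is available but not consistent, and refusing is consistent but not available. No option in the list gives both, which is the contradiction the proof relies on.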
PACELC Theorem – an extension of CAP Theorem
It is often argued that, in the modern cloud world, the CAP theorem has been disproven by PACELC, but that is not the case.
The PACELC theorem is an extension of the CAP theorem, not a refutation of it. It makes us think beyond consistency and availability by also placing an emphasis on the trade-off between consistency and latency.
The PACELC theorem builds on the CAP theorem (the 'PAC') and adds an else (the 'E'). If there is a Partition, meaning communication between parts of a distributed system has failed, you need to choose between Availability and Consistency. Else, even when things are running properly and there are no network issues, there is still a trade-off between Latency and Consistency (the 'LC').
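The 'else' branch can be made concrete with a little arithmetic. The sketch below assumes a simple model of my own: a write is applied locally, sent to all replicas in parallel, and acknowledged to the client once a chosen number of replicas have replied. The function name `write_latency_ms` and the sample timings are hypothetical.

```python
def write_latency_ms(local_ms, replica_rtts_ms, acks_required):
    """Time until the client gets its acknowledgement, in milliseconds.

    acks_required = 0 models asynchronous replication (reply after the local write).
    acks_required = len(replica_rtts_ms) models fully synchronous replication.
    """
    if not 0 <= acks_required <= len(replica_rtts_ms):
        raise ValueError("acks_required must be between 0 and the number of replicas")
    if acks_required == 0:
        return local_ms
    # Replicas are contacted in parallel, so we wait for the k-th fastest reply.
    return local_ms + sorted(replica_rtts_ms)[acks_required - 1]


if __name__ == "__main__":
    rtts = [5, 40, 120]   # assumed round-trip times to three replicas, in ms
    for acks in range(len(rtts) + 1):
        print(f"wait for {acks} replica ack(s): {write_latency_ms(2, rtts, acks)} ms")
```

With these assumed numbers, waiting for no replica costs 2 ms but a read elsewhere may be stale, while waiting for all three costs 122 ms and keeps every replica current. That gap exists with a perfectly healthy network, which is the point PACELC adds to CAP.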
Key Takeaway for Modern System Design
Basically, the CAP theorem tells us that designing distributed databases is complex, which is one reason companies such as Google and Amazon built their own custom distributed databases. Meanwhile, distributed databases designed for general-purpose use have to pick a side of the trade-off the CAP theorem describes. For example, Apache Cassandra gives up some consistency in favor of availability, while Google's Spanner gives up some availability in favor of consistency. In both cases, the architects behind the database made a conscious decision about which guarantee to relax. Neither design escapes the theorem.
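Cassandra makes this dial visible through per-request consistency levels such as ONE, QUORUM, and ALL. A minimal sketch of the underlying quorum arithmetic follows. It is not Cassandra's API: the function names are hypothetical, and the model assumes N replicas where a write waits for W acknowledgements and a read consults R replicas.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def read_overlaps_write(n, w, r):
    """True when every read set must share at least one replica with every write set."""
    return r + w > n


def replicas_that_can_be_down(n, level):
    """How many replicas may be unreachable while an operation at this level still succeeds."""
    return n - level


if __name__ == "__main__":
    n = 3
    for name, level in [("ONE", 1), ("QUORUM", quorum(n)), ("ALL", n)]:
        print(
            f"{name}: overlap={read_overlaps_write(n, level, level)}, "
            f"tolerates {replicas_that_can_be_down(n, level)} replica(s) down"
        )
```

Under this model, lower levels keep operations succeeding with more replicas down but allow a read to miss the latest write, and higher levels do the reverse. The trade-off is chosen per use case rather than removed.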
Despite modernization and rethought approaches such as partially synchronous models, the CAP theorem remains relevant today. We are, however, in better shape to deal with the problem than we used to be, thanks to mechanisms such as partially synchronous system models and weak or eventual consistency models, which let architects decide how much consistency and how much availability to trade for each use case.
About Dhiraj Kumar Gupta
Dhiraj has over 15 years of versatile experience in IT and currently works with Capgemini Technology Solutions as a Principal Solution Architect. As an application, cloud, and IoT solution architect, he has architected and designed enterprise-grade solutions for customers across the globe. He has deep, hands-on expertise in AWS services for applications, IoT, data, integration, analytics, and security, and has played roles in solutioning, architecture, system design, microservices, and data analytics. He has worked for various MNCs at Fortune 500 client locations, including the US, Europe, and the UK, in domains such as banking, telecom, hospitality, industrial IoT, pharma, and insurance. He has experience in serverless solutions, IaaC, and DevSecOps methodologies, and has developed solutions for real-time and batch analytics, server migration factories, and anomaly detection. He currently leads the Liberty Mutual account (60+ team members) as senior architect, and led OneDeliver's OneMigrate AWS accelerator as solution architect.