Skip to main content

System design fundamentals: What is the CAP theorem?

👋 CAP Theorem is one of the important concepts used in Distributed Systems. 
Its applicability to various systems states that any distributed system or data store can simultaneously provide only two of three guarantees: consistency, availability, and partition tolerance (CAP).The theorem formalizes the tradeoff between consistency and availability when there’s a partition.

A distributed system is a network that stores data on more than one node (physical or virtual machines) at the same time. Because all cloud applications are distributed systems, it’s essential to understand the CAP theorem when designing a cloud app so that you can choose a data management system that delivers the characteristics your application needs most.

CAP Theorem Proof

Let’s look at a simple proof of the CAP theorem. Imagine a distributed system consisting of two nodes:


The distributed system acts as a plain register with the value of variable X. There’s a network failure that results in a network partition between the two nodes in the system. An end-user performs a write request, and then a read request. Let’s examine a case where a different node of the system processes each request. In this case, our system has two options:
  • It can fail at one of the requests, breaking the system’s availability
  • It can execute both requests, returning a stale value from the read request and breaking the system’s consistency
The system can’t process both requests successfully while also ensuring that the read returns the latest value written by the write. This is because the results of the write operation can’t be propagated from node A to node B because of the network partition.

Consistency, Availability, and Partition Tolerance explained

Let’s take a detailed look at the three distributed system characteristics to which the CAP theorem refers.

Consistency

Consistency means that all clients see the same data at the same time, no matter which node they connect to. For this to happen, whenever data is written to one node, it must be instantly forwarded or replicated to all the other nodes in the system before the write is deemed ‘successful.’

Availability

Availability means that any client making a request for data gets a response, even if one or more nodes are down. Another way to state this—all working nodes in the distributed system return a valid response for any request, without exception.

Partition Tolerance

A partition is a communications break within a distributed system—a lost or temporarily delayed connection between two nodes. Partition tolerance means that the cluster must continue to work despite any number of communication breakdowns between nodes in the system.
The CAP theorem states that it is impossible for a distributed system to simultaneously achieve consistency, availability, and partition tolerance. Therefore, it is important to understand the requirements of the system and choose a data management approach that meets its critical needs. It is also important to be aware of the CAP theorem when designing any cloud application or networked system.

CAP Theorem Database Architecture

Distributed networks heavily depend on NoSQL databases as they offer horizontal scalability, and they are highly distributed. Hence, they can easily and rapidly scale across a growing network of multiple interconnected nodes. But as discussed above, one can only have two of the three available functionalities. The different combinations and their use cases are discussed below:

CP System:

This system focuses more on consistency and partition tolerance. So these systems are not available most of the time. When any issue occurs in the system, it has to shut down the non-consistent node until the partition is resolved, and during that time, it is not available.

AP System: 

This type of database focuses more on availability and partition tolerance rather than consistency. When any issue occurs in the system, then it will no longer remain in a consistent state. However, all the nodes remain available, and affected nodes might return a previous version of data, and the system will take some time to become consistent.

CA System: 

This type of database focuses more on consistency and availability across all nodes than partition tolerance. Fault-Tolerance is the basic necessity of any distributed system, and hence it is almost rare to use a CA type of architecture for any practical purpose.

CAP Theorem and Microservices

Micro-services are loosely coupled, independently deployable application components that incorporate their own stack—including their own database and database model—and communicate with each other over a network. as have become especially popular in hybrid cloud and multi-cloud environments, and they are also widely used in on-premises data centers. If you want to create a micro-services application, you can use the CAP theorem to help you determine a database that will best fit your needs.

Conclusion

We are able to reach a comparatively greater degree of computing power thanks to the distributed systems architecture. Systems must be designed by taking into account the practical implications in everyday life and selecting the best design for the application. Now we understand why the CAP theorem is crucial for data management strategies. Given the complexity of the architecture, efficient network management is also necessary. When designed appropriately, these systems may eliminate problems like human error to maintain data relevancy and cut down on extra components.

Comments

Popular posts from this blog

Java Currency Formatter Solution

Given a  double-precision  number,  , denoting an amount of money, use the  NumberFormat  class'  getCurrencyInstance  method to convert   into the US, Indian, Chinese, and French currency formats. Then print the formatted values as follows: US: formattedPayment India: formattedPayment China: formattedPayment France: formattedPayment where   is   formatted according to the appropriate  Locale 's currency. Note:  India does not have a built-in Locale, so you must  construct one  where the language is  en  (i.e., English). Input Format A single double-precision number denoting  . Constraints Output Format On the first line, print  US: u  where   is   formatted for US currency.  On the second line, print  India: i  where   is   formatted for Indian currency.  On the third line...

Java Loops II print each element of our series as a single line of space-separated values.

We use the integers  ,  , and   to create the following series: You are given   queries in the form of  ,  , and  . For each query, print the series corresponding to the given  ,  , and   values as a single line of   space-separated integers. Input Format The first line contains an integer,  , denoting the number of queries.  Each line   of the   subsequent lines contains three space-separated integers describing the respective  ,  , and   values for that query. Constraints Output Format For each query, print the corresponding series on a new line. Each series must be printed in order as a single line of   space-separated integers. Sample Input 2 0 2 10 5 3 5 Sample Output 2 6 14 30 62 126 254 510 1022 2046 8 14 26 50 98 Explanation We have two queries: We use  ...

Java Static Initializer Block

Static initialization blocks are executed when the class is loaded, and you can initialize static variables in those blocks. It's time to test your knowledge of  Static initialization blocks . You can read about it  here. You are given a class  Solution  with a  main  method. Complete the given code so that it outputs the area of a parallelogram with breadth   and height  . You should read the variables from the standard input. If   or    , the output should be  "java.lang.Exception: Breadth and height must be positive"  without quotes. Input Format There are two lines of input. The first line contains  : the breadth of the parallelogram. The next line contains  : the height of the parallelogram. Constraints Output Format If both values are greater than zero, then the  main  method must output the area of the  parallelogram . Otherwise, pri...