Cassandra is a distributed NoSQL database, which means that the data is spread across the nodes of a cluster. In Cassandra, data distribution and replication go together.
Distribution and replication depend on the partition key, the key's value, and the token range.
A row is an ordered collection of columns fetched from a table. A table consists of columns and has a primary key.
In the diagram above, column1 is the primary key.
Here the primary key value acts as the partition key. A partitioner applies a hash function to the partition key to produce a token, and that token falls within one node's token range.
A Partitioner determines which node will receive the first replica of a piece of data, and how to distribute other replicas across other nodes in the cluster.
Refer to the link below for more information.
A partitioner uses consistent hashing, which allows data to be distributed across a cluster.
Consistent hashing minimizes the amount of data that has to be reorganized when nodes are added or removed.
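To make the "minimal reorganization" point concrete, here is a small Python sketch that compares a hash ring with plain modulo placement when a fifth node joins. It rests on several assumptions: MD5 stands in for Murmur3 (which is not in Python's standard library), the node names are hypothetical, and each node is given several tokens to even out the ranges. It is an illustration of the idea, not Cassandra's implementation.

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Stand-in hash (MD5, NOT Murmur3): map a key to a signed 64-bit token."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def build_ring(nodes, tokens_per_node=64):
    """Return a sorted list of (token, node); several tokens per node is an assumption."""
    return sorted((token(f"{node}#{i}"), node)
                  for node in nodes for i in range(tokens_per_node))

def owner(ring, key):
    """The owner is the node holding the first ring token >= the key's token (wrapping)."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token(key)) % len(ring)
    return ring[i][1]

keys = [f"user{i}" for i in range(2000)]          # hypothetical partition keys
before = build_ring(["node1", "node2", "node3", "node4"])
after = build_ring(["node1", "node2", "node3", "node4", "node5"])

# Keys whose owner changes on the ring when node5 joins.
moved_ring = [k for k in keys if owner(before, k) != owner(after, k)]
# Keys whose owner changes with naive "hash mod node count" placement.
moved_mod = [k for k in keys if token(k) % 4 != token(k) % 5]

print(f"ring:   {len(moved_ring) / len(keys):.0%} of keys moved")
print(f"modulo: {len(moved_mod) / len(keys):.0%} of keys moved")
```

On the ring, a key changes owner only when one of the new node's tokens lands between the key and its old owner, so every moved key ends up on the new node and the rest stay put. With modulo placement most keys change owner.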
By default, Cassandra uses the Murmur3 hash function (the Murmur3Partitioner) for consistent hashing. Each node in the cluster is responsible for a range of data based on the hash value.
In the diagram below, you can see the token ranges distributed across a 4-node cluster.
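As a rough sketch of how such ranges can be laid out, the Python below splits the Murmur3Partitioner token space (-2^63 to 2^63 - 1) evenly among four nodes. The even split and the function names are assumptions for illustration; the values in the original diagram may differ.

```python
MIN_TOKEN = -2**63       # lowest Murmur3Partitioner token
MAX_TOKEN = 2**63 - 1    # highest Murmur3Partitioner token

def initial_tokens(node_count):
    """Evenly spaced tokens, one per node (assumes a single token per node)."""
    step = 2**64 // node_count
    return [MIN_TOKEN + i * step for i in range(node_count)]

def token_ranges(tokens):
    """Each node owns (previous_token, own_token]; the first node's range wraps around."""
    return [(tokens[i - 1], t) for i, t in enumerate(tokens)]

tokens = initial_tokens(4)
for node_number, (start, end) in enumerate(token_ranges(tokens), start=1):
    print(f"node{node_number}: ({start}, {end}]")
```

The first node's range starts at the last node's token and wraps past the maximum token back to the minimum, which is what closes the ring.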
Example: A table with user details.
In the diagram above, USERID is the primary key. The hash values below are for a four-node cluster.
The Murmur3 partitioner generates a hash value for each partition key.
|Partition key|Murmur3 hash value|
Cassandra places each row on a node according to the hash value of its partition key and the token range that the node is responsible for.
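The placement rule can be sketched in Python as a lookup on a sorted list of node tokens. The node names, the evenly spaced node tokens, and the row tokens used here are all hypothetical values chosen for illustration; they are not real Murmur3 hashes of any USERID.

```python
import bisect

# Hypothetical 4-node ring: each node owns (previous node's token, its own token].
NODE_TOKENS = [
    (-2**63, "node1"),
    (-2**62, "node2"),
    (0, "node3"),
    (2**62, "node4"),
]

def owning_node(row_token):
    """Find the first node token >= row_token; past the last token, wrap to the first node."""
    tokens = [t for t, _ in NODE_TOKENS]
    i = bisect.bisect_left(tokens, row_token)
    return NODE_TOKENS[i % len(NODE_TOKENS)][1]

# Made-up row tokens, one falling in each node's range.
for row_token in (-7_000_000_000_000_000_000, -2_000_000_000_000_000_000,
                  1_000_000_000_000_000_000, 7_000_000_000_000_000_000):
    print(row_token, "->", owning_node(row_token))
```

A binary search keeps the lookup cheap even with many tokens, and the modulo handles tokens above the highest node token by wrapping around to the start of the ring.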
The replication factor indicates the total number of replicas of each row across the cluster.
Let's take a replication factor of 2 for this 4-node cluster.
A replication factor of 2 means that there are two copies of each row in the cluster, each on a different node. In Cassandra, all replicas are equally important; there is no primary or master replica.
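One way to picture replica placement is the sketch below. It assumes a SimpleStrategy-style rule, where the first replica goes to the node that owns the token and further replicas go to the next nodes clockwise on the ring. The replication strategy, node names, node tokens, and row tokens are assumptions for illustration and are not taken from the text above.

```python
import bisect

# Hypothetical 4-node ring with evenly spaced tokens.
RING = [
    (-2**63, "node1"),
    (-2**62, "node2"),
    (0, "node3"),
    (2**62, "node4"),
]

def replicas(row_token, replication_factor=2):
    """Return the nodes holding a row: the token's owner, then the next nodes clockwise."""
    tokens = [t for t, _ in RING]
    first = bisect.bisect_left(tokens, row_token) % len(RING)
    return [RING[(first + k) % len(RING)][1] for k in range(replication_factor)]

# Made-up row tokens; each row is stored on two different nodes.
print(replicas(1_000_000_000_000_000_000))
print(replicas(7_000_000_000_000_000_000))
```

The returned list has no special first element in terms of importance: the order only reflects how the nodes were found while walking the ring.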
Continued in part 2…