Cassandra Data distribution and replication Part-2


Refer to part 1, on Cassandra consistent hashing, using the link below.

Cassandra : Cassandra Data distribution and replication

In part 1 we covered data distribution and replication.

In part 2 you will learn about the different data distribution architectures available in Cassandra.

We must ensure that the data is evenly divided across all nodes in the cluster.

This keeps the load balanced and the cluster healthy.

In Cassandra, no two nodes can share the same token value: tokens must be unique within a datacenter and also across different datacenters in the same cluster.

What is a Token?

Tokens are hash values that the partitioner computes from each row's partition key. The token of a row determines which node in the ring stores that row.
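To make the idea concrete, here is a minimal Python sketch of how a token maps to a node in a single-token ring. It assumes the row's token has already been computed (Cassandra's Murmur3 hashing of the partition key is not reproduced here) and that a node with token T owns the range from the previous node's token (exclusive) up to T (inclusive). The function name and node names are hypothetical; the tokens are the six-node example values used later in this post.

```python
import bisect


def owner_of(token: int, ring: dict[int, str]) -> str:
    """Return the node that owns `token` in a single-token ring.

    A node with token T owns (previous node's token, T], so the owner is
    the first node token >= `token`. Tokens larger than every node token
    wrap around to the node with the lowest token.
    """
    node_tokens = sorted(ring)
    idx = bisect.bisect_left(node_tokens, token)
    if idx == len(node_tokens):
        idx = 0  # wrap around the ring
    return ring[node_tokens[idx]]


# Six single-token nodes using the example tokens from this post
# (node names are hypothetical).
RING = {
    -9223372036854775808: "node1",
    -6148914691236517206: "node2",
    -3074457345618258604: "node3",
    -2: "node4",
    3074457345618258600: "node5",
    6148914691236517202: "node6",
}

print(owner_of(-5_000_000_000_000_000_000, RING))  # node3
print(owner_of(7_000_000_000_000_000_000, RING))   # node1 (wraps around)
```

Because the ring wraps around, a token above the highest node token belongs to the node with the lowest token.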

Cassandra has two types of data distribution architecture:

  • Single-token architecture (manually assigned tokens)
  • Virtual node architecture (vnodes)

Single Token Architecture Nodes.

In the single-token architecture, each node is assigned exactly one token and stores the data for the token range between the previous node's token and its own token.

Single-token architecture nodes are configured with the initial_token parameter in the cassandra.yaml configuration file.

The value of this parameter must be different for every node in the datacenter and in the entire cluster.

We can calculate the token values for a Cassandra cluster using Python or the Token Generator.

Token Calculation Using Python.

Partitioner type: Murmur3Partitioner

Generate tokens for 6 nodes


i1: -9223372036854775808

i2: -6148914691236517206

i3: -3074457345618258604

i4: -2

i5: 3074457345618258600

i6: 6148914691236517202
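The Python script itself is not shown above. A minimal sketch that reproduces these six values, assuming evenly spaced tokens over the Murmur3Partitioner range (-2^63 to 2^63 - 1) computed with integer division, could look like this; the function name is hypothetical and this is not necessarily the exact script the author used.

```python
def single_token_ring(num_nodes: int) -> list[int]:
    """Evenly spaced initial_token values for Murmur3Partitioner.

    Murmur3 tokens range from -2**63 to 2**63 - 1, so the ring spans
    2**64 values. Node i (0-based) gets -2**63 + i * (2**64 // num_nodes).
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    step = 2**64 // num_nodes
    return [i * step - 2**63 for i in range(num_nodes)]


for i, token in enumerate(single_token_ring(6), start=1):
    print(f"i{i}: {token}")
```

With num_nodes=3 the same formula gives -9223372036854775808, -3074457345618258603 and 3074457345618258602, which match the DC #1 values in the multi-datacenter Token Generator example.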

Token Calculation Using Token Generator.

Partitioner type: Murmur3Partitioner

Generate tokens for 6 nodes

DC #1:

Node #1: -9223372036854775808

Node #2: -6148914691236517206

Node #3: -3074457345618258604

Node #4: -2

Node #5: 3074457345618258600

Node #6: 6148914691236517202

Token Calculation Using Token Generator for Multiple Datacenters.

Partitioner type: Murmur3Partitioner

Generate tokens for DC1=3 Nodes and DC2=2 Nodes

DC #1:

Node #1: -9223372036854775808

Node #2: -3074457345618258603

Node #3: 3074457345618258602

DC #2:

Node #1: -78496783292381071

Node #2: 9144875253562394737
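The Token Generator picks its own offsets for the second datacenter, and how it derives the DC #2 values is not shown here, so the sketch below does not reproduce them. It shows one simple alternative, assuming each datacenter is laid out as its own evenly spaced ring and then shifted by a small per-datacenter offset (100 here, an arbitrary choice) so that no token is reused anywhere in the cluster. Without the offset, node 1 in both datacenters would get -9223372036854775808. The function name and offset value are hypothetical.

```python
def multi_dc_tokens(nodes_per_dc: dict[str, int], offset: int = 100) -> dict[str, list[int]]:
    """Evenly spaced Murmur3 tokens per datacenter, shifted per datacenter.

    Datacenter k (0-based, in dict order) gets its own evenly spaced ring
    plus k * offset. Raises ValueError if any token is reused in the cluster
    or falls above the Murmur3 maximum token.
    """
    tokens: dict[str, list[int]] = {}
    for dc_index, (dc, count) in enumerate(nodes_per_dc.items()):
        step = 2**64 // count
        tokens[dc] = [i * step - 2**63 + dc_index * offset for i in range(count)]

    all_tokens = [t for dc_tokens in tokens.values() for t in dc_tokens]
    if len(set(all_tokens)) != len(all_tokens):
        raise ValueError("duplicate token in cluster; use a different offset")
    if max(all_tokens) > 2**63 - 1:
        raise ValueError("token above the Murmur3 maximum; use a smaller offset")
    return tokens


for dc, dc_tokens in multi_dc_tokens({"DC1": 3, "DC2": 2}).items():
    print(f"{dc}:")
    for n, token in enumerate(dc_tokens, start=1):
        print(f"  Node #{n}: {token}")
```

The uniqueness check mirrors the rule that no two nodes in the cluster may share a token, even across datacenters.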

Single token architecture:

When you add nodes to, or remove nodes from, an existing datacenter or cluster, you need to recalculate the token values.

Based on the new token values, you then set initial_token in the cassandra.yaml file.
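A short Python sketch shows why this is laborious. It assumes evenly spaced tokens computed as -2^63 + i * (2^64 // N), which matches the six-node values shown earlier, and counts how many existing tokens are no longer part of the layout when a seventh node joins a six-node ring. The function names are hypothetical.

```python
def single_token_ring(num_nodes: int) -> list[int]:
    """Evenly spaced Murmur3 tokens: -2**63 + i * (2**64 // num_nodes)."""
    step = 2**64 // num_nodes
    return [i * step - 2**63 for i in range(num_nodes)]


def tokens_needing_change(old_count: int, new_count: int) -> list[int]:
    """Existing tokens that are not part of the new evenly spaced layout."""
    new_tokens = set(single_token_ring(new_count))
    return [t for t in single_token_ring(old_count) if t not in new_tokens]


moved = tokens_needing_change(6, 7)
print(f"{len(moved)} of 6 existing nodes need a new token")
```

Only the node at -9223372036854775808 keeps its token; the other five existing nodes all need new token values.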

Adding, removing, and rebalancing nodes is more difficult than with vnodes.

A lot of administration is required to manage the cluster.

Virtual Nodes.

Virtual nodes are also known as vnodes. Cassandra introduced vnodes in version 1.2.

In the vnode architecture, each node is assigned many tokens, so it owns many small token ranges spread around the ring instead of one large range.

Vnode architecture nodes are configured with the num_tokens parameter in the cassandra.yaml configuration file.

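The value is normally the same for all nodes in the datacenter and in the entire cluster. A node with a larger num_tokens value owns a proportionally larger share of the data, which can be used to give more data to more powerful machines.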
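A rough Python sketch of how num_tokens translates into data ownership is below. It assumes random token selection (how vnode tokens were originally assigned; newer Cassandra versions also offer a token allocation algorithm that this sketch does not model), the Murmur3 token range, and hypothetical node names and num_tokens values. It measures each node's share of the ring; a node with twice as many tokens ends up with roughly twice the data.

```python
import random
from collections import defaultdict

MIN_TOKEN = -2**63   # Murmur3Partitioner minimum token
RING_SIZE = 2**64    # number of possible Murmur3 tokens


def random_vnode_ring(num_tokens: dict[str, int], seed: int = 42) -> list[tuple[int, str]]:
    """Give each node `num_tokens[node]` unique random tokens; return the sorted ring."""
    rng = random.Random(seed)
    used: set[int] = set()
    ring: list[tuple[int, str]] = []
    for node, count in num_tokens.items():
        for _ in range(count):
            token = rng.randrange(MIN_TOKEN, MIN_TOKEN + RING_SIZE)
            while token in used:  # tokens must be unique across the cluster
                token = rng.randrange(MIN_TOKEN, MIN_TOKEN + RING_SIZE)
            used.add(token)
            ring.append((token, node))
    ring.sort()
    return ring


def ownership(ring: list[tuple[int, str]]) -> dict[str, float]:
    """Fraction of the token ring owned by each node.

    Each token owns (previous token, token]; for the first token the
    previous one is the last token in the ring, hence the modulo.
    """
    owned: dict[str, int] = defaultdict(int)
    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]
        owned[node] += (token - prev_token) % RING_SIZE
    return {node: size / RING_SIZE for node, size in owned.items()}


shares = ownership(random_vnode_ring({"node1": 256, "node2": 256, "node3": 512}))
for node, share in sorted(shares.items()):
    print(f"{node}: {share:.1%}")
```

With hundreds of random tokens per node, the shares stay close to each node's fraction of the total token count.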

Vnodes simplify many tasks in Cassandra.

You do not need to calculate and assign tokens to each node in a datacenter or cluster.

Adding, removing, and rebalancing nodes is easier and faster with vnodes.

When nodes are added to or removed from a datacenter or cluster, Cassandra itself calculates the token values and redistributes the data across the cluster, so manual rebalancing is not required.
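To illustrate why the load spreads out on its own, the sketch below adds a node to a six-node cluster and records which existing nodes give up part of a token range to it. It assumes random token selection, the Murmur3 token range, and hypothetical node names; num_tokens=256 is only an example value. With vnodes the new node takes small ranges from every existing node, while with a single token it takes part of just one node's range.

```python
import bisect
import random

MIN_TOKEN = -2**63   # Murmur3Partitioner minimum token
RING_SIZE = 2**64    # number of possible Murmur3 tokens


def donors_for_new_node(num_existing: int, num_tokens: int, seed: int = 7) -> set[str]:
    """Existing nodes that hand part of a token range to a newly added node.

    Every node, including the new one, gets `num_tokens` unique random tokens.
    A new token t falls inside the range owned by the first existing token >= t,
    so that token's node gives up the part of its range up to t.
    """
    rng = random.Random(seed)
    used: set[int] = set()

    def unique_token() -> int:
        token = rng.randrange(MIN_TOKEN, MIN_TOKEN + RING_SIZE)
        while token in used:  # tokens must be unique across the cluster
            token = rng.randrange(MIN_TOKEN, MIN_TOKEN + RING_SIZE)
        used.add(token)
        return token

    ring = sorted(
        (unique_token(), f"node{n}")
        for n in range(1, num_existing + 1)
        for _ in range(num_tokens)
    )
    ring_tokens = [token for token, _ in ring]

    donors = set()
    for _ in range(num_tokens):
        idx = bisect.bisect_left(ring_tokens, unique_token())
        donors.add(ring[idx % len(ring)][1])  # wrap past the highest token
    return donors


print(sorted(donors_for_new_node(6, num_tokens=256)))  # vnodes
print(sorted(donors_for_new_node(6, num_tokens=1)))    # single token
```

Spreading the new node's ranges across all existing nodes is what lets the data move in small pieces from many nodes at once instead of from a single neighbor.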

 

Note: Please test scripts in a non-production environment before trying them in production.