In this article we will look at the key components of Cassandra.
Gossip
Cassandra uses the gossip protocol for internal communication between nodes in a cluster. Gossip is a peer-to-peer communication protocol that nodes use to discover and share location and state information about the other nodes in a Cassandra cluster. Each node also persists gossip information locally so that it can use it immediately when it restarts.
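The idea of spreading state by pairwise exchange can be illustrated with a toy simulation. This is only a minimal sketch and not Cassandra's actual gossip implementation: the Node class, the (version, status) tuples, and the helper names are hypothetical, and real gossip also tracks generations and heartbeats.

```python
import random


class Node:
    """Toy cluster member holding a versioned view of every known node."""

    def __init__(self, name):
        self.name = name
        # Maps node name -> (version, status); the higher version wins.
        self.view = {name: (1, "UP")}

    def merge(self, other_view):
        """Keep the entry with the highest version for each node."""
        for peer, (version, status) in other_view.items():
            if peer not in self.view or self.view[peer][0] < version:
                self.view[peer] = (version, status)

    def gossip_with(self, peer):
        """Exchange views in both directions, as two gossiping peers would."""
        self.merge(peer.view)
        peer.merge(self.view)


def gossip_round(nodes, rng):
    """Each node gossips with one randomly chosen other node."""
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        node.gossip_with(peer)


def run_until_converged(nodes, rng, max_rounds=50):
    """Run rounds until every node knows every other node, or give up."""
    for round_number in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        if all(len(n.view) == len(nodes) for n in nodes):
            return round_number
    return None
```

Calling run_until_converged([Node("n1"), Node("n2"), Node("n3")], random.Random(1)) returns the number of rounds it took for every node to learn about all the others. No node needs a central registry, which is the point of a peer-to-peer protocol.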
Partitioner
A partitioner determines which node receives the first replica of a piece of data and how the other replicas are distributed across the other nodes in the cluster.
Each row of data is uniquely identified by a primary key, which may be the same as its partition key but may also include other clustering columns.
A partitioner is a hash function that derives a token from the partition key of a row.
The partitioner uses the token value to determine which nodes in the cluster receive the replicas of that row. The Murmur3Partitioner is the default partitioning strategy in Cassandra 1.2 and later.
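The mapping from key to token to node can be sketched in a few lines. This is an illustration under stated assumptions: MD5 truncated to 64 bits stands in for MurmurHash3 (which is not in the Python standard library), and the TokenRing class, the node names, and the token values are hypothetical.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Derive a signed 64-bit token from a partition key.

    Stand-in hash: MD5 truncated to 8 bytes. Cassandra's default
    Murmur3Partitioner uses MurmurHash3 instead.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    """Maps a token to the node that owns the enclosing token range."""

    def __init__(self, node_tokens):
        # node_tokens: dict of node name -> token assigned to that node.
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def owner(self, token: int) -> str:
        """Return the first node whose token is >= the given token,
        wrapping around to the start of the ring if there is none."""
        index = bisect.bisect_left(self._tokens, token) % len(self._ring)
        return self._ring[index][1]
```

For example, with ring = TokenRing({"node1": -100, "node2": 0, "node3": 100}), ring.owner(token_for("user:42")) names the node that would receive the first replica of that row in this toy model.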
Replication factor
The replication factor is the total number of replicas across the cluster.
A replication factor of 1 means that there is only one copy of each row. A replication factor of 2 means that there are two copies of each row in the cluster, with each copy on a different node.
In Cassandra, all replicas are equally important; there is no primary or master replica. You can define the replication factor for each datacenter, for example:
replication_factor: DC1:3, DC2:2
Generally, you should set the replication factor to a value greater than one.
Replica placement strategy
Cassandra stores replicas of data on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines which nodes to place replicas on.
The first replica of data is simply the first copy.
Cassandra provides the following replica placement strategies:
- Simple Strategy (SimpleStrategy).
- Network Topology Strategy (NetworkTopologyStrategy).
- Old Network Topology Strategy (OldNetworkTopologyStrategy).
The Network Topology Strategy is recommended for most deployments because it makes it much easier to expand to multiple datacenters later.
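The difference between the first two strategies can be sketched as follows. This is a simplified model and not Cassandra's code: the ring is a plain list of hypothetical node names in token order, start_index is the position of the node that owns the row's token, and rack awareness is deliberately left out.

```python
def simple_strategy(ring, start_index, replication_factor):
    """Walk the ring clockwise from the first replica and take the next
    distinct nodes, ignoring datacenters and racks."""
    replicas = []
    for offset in range(len(ring)):
        node = ring[(start_index + offset) % len(ring)]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


def network_topology_strategy(ring, datacenter_of, start_index, replicas_per_dc):
    """Walk the ring clockwise and fill a separate quota per datacenter.

    Simplification: racks are not considered in this sketch.
    """
    placed = {dc: [] for dc in replicas_per_dc}
    for offset in range(len(ring)):
        node = ring[(start_index + offset) % len(ring)]
        dc = datacenter_of[node]
        if dc in placed and node not in placed[dc]:
            if len(placed[dc]) < replicas_per_dc[dc]:
                placed[dc].append(node)
    return placed
```

With a six-node ring that alternates between two datacenters, simple_strategy may put most replicas in one datacenter by chance of ring order, while network_topology_strategy guarantees the requested count in each datacenter (for example 3 in DC1 and 2 in DC2).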
When creating a keyspace, you must define the replica placement strategy and the number of replicas you want.
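As an illustration, a keyspace definition in CQL names the strategy class and a replica count per datacenter. The small Python helper below only builds that statement as a string; the helper name create_keyspace_cql, the keyspace name demo, and the datacenter names are assumptions for this sketch, and the helper does no validation or escaping of its inputs.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a datacenter name to its replication factor.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': {count}" for dc, count in replicas_per_dc.items()]
    joined = ", ".join(options)
    return f"CREATE KEYSPACE {keyspace} WITH replication = {{{joined}}};"


print(create_keyspace_cql("demo", {"DC1": 3, "DC2": 2}))
# CREATE KEYSPACE demo WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2};
```

The datacenter names used in the statement need to match the names that the snitch reports for your nodes.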
Snitch
A snitch groups machines into datacenters and racks, and the replication strategy uses this topology to place replicas.
You must configure a snitch when you create a cluster. Cassandra provides several kinds of snitches; select the one that fits your deployment.
All snitches use a dynamic snitch layer, which monitors performance and chooses the best replica for reading. It is enabled by default and recommended for use in most deployments. Configure the snitch for each node in the cassandra.yaml configuration file.
Cassandra recommends the Gossiping Property File Snitch (GossipingPropertyFileSnitch) for production environments. With this snitch you define each node’s datacenter and rack locally, and the snitch uses gossip to propagate this information to other nodes.
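With the Gossiping Property File Snitch, each node's datacenter and rack are read from a cassandra-rackdc.properties file that contains dc and rack entries. The sketch below shows what such a file looks like and how its two values could be extracted; the parse_rackdc helper and the values DC1 and RAC1 are assumptions for illustration only.

```python
EXAMPLE_RACKDC = """
# cassandra-rackdc.properties (example values)
dc=DC1
rack=RAC1
"""


def parse_rackdc(text):
    """Return the (datacenter, rack) pair from properties-style text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings["dc"], settings["rack"]
```

Here parse_rackdc(EXAMPLE_RACKDC) yields the pair that this node would announce to the rest of the cluster through gossip.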
The cassandra.yaml configuration file
The cassandra.yaml file is the main configuration file and contains all the initialization properties for a cluster. It is used to set caching parameters for tables, properties for tuning and resource utilization, timeout settings, client connections, backups, and security.
The cassandra.yaml file is also where you set the cluster name. By default, a node is configured to store the data it manages in a set of directories, for example those for commit logs, SSTables, and server logs.
The location of the cassandra.yaml file depends on the type of installation:
Package installations: /etc/cassandra/cassandra.yaml
Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml