Apache Cassandra stores data in immutable SSTable files. Backups in Apache Cassandra are copies of the database data, kept as SSTable files. Backups serve several purposes, including:
To keep a durable copy of the data.
To restore a table if a node, partition, or network failure causes table data to be lost.
To move the SSTable files to another machine, for portability.


A snapshot is a copy of a table's SSTable files at a given point in time, created with hard links. The DDL needed to recreate the table is stored with it. Snapshots can be created by a user or automatically. The snapshot_before_compaction setting in cassandra.yaml determines whether a snapshot is taken before each compaction; it defaults to false. The auto_snapshot setting in cassandra.yaml, which defaults to true, makes Cassandra take a snapshot automatically before a keyspace is truncated or a table is dropped. Because auto snapshots can delay truncates, a separate cassandra.yaml setting (truncate_request_timeout_in_ms in Cassandra 4.0) controls how long the coordinator waits for truncates to complete. By default Cassandra waits 60 seconds for auto snapshots to finish.
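To make the hard-link idea concrete, here is a minimal Python sketch, not Cassandra's own code. It assumes a filesystem that supports hard links; the helper name snapshot_directory, the file name demo-Data.db, and the tag before-change are made up for illustration.

```python
import os
import tempfile


def snapshot_directory(table_dir: str, tag: str) -> str:
    """Hard-link every regular file in table_dir into snapshots/<tag>/."""
    snapshot_dir = os.path.join(table_dir, "snapshots", tag)
    os.makedirs(snapshot_dir)
    for name in sorted(os.listdir(table_dir)):
        source = os.path.join(table_dir, name)
        if os.path.isfile(source):
            # A hard link is a second directory entry for the same inode,
            # so no file contents are copied.
            os.link(source, os.path.join(snapshot_dir, name))
    return snapshot_dir


def demo() -> bool:
    """Show that a hard-linked file survives deletion of the original."""
    with tempfile.TemporaryDirectory() as table_dir:
        data_file = os.path.join(table_dir, "demo-Data.db")  # hypothetical name
        with open(data_file, "wb") as handle:
            handle.write(b"immutable sstable bytes")
        snap = snapshot_directory(table_dir, "before-change")
        linked = os.path.join(snap, "demo-Data.db")
        same_inode = os.path.samefile(data_file, linked)
        os.remove(data_file)  # stands in for the live file being removed
        with open(linked, "rb") as handle:
            survived = handle.read() == b"immutable sstable bytes"
        return same_inode and survived
```

Because SSTable files are immutable, sharing an inode between the live file and the snapshot is safe, and the snapshot takes extra disk space only once the live file has been removed.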
An incremental backup is a hard-linked copy of a table's SSTable files, created when memtables are flushed to disk as SSTables. Incremental backups are typically paired with snapshots to reduce backup time and disk usage. They are not enabled by default and must be enabled explicitly, either with nodetool or with the incremental_backups setting in cassandra.yaml. Once enabled, Cassandra creates a hard link to each SSTable flushed or streamed locally in a backups/ subdirectory of the keyspace data. Incremental backups of system tables are also created.
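The three cassandra.yaml settings named above can be checked with a small script. The following is a minimal sketch in Python, assuming each setting appears as a simple `key: value` line; it is not a full YAML parser, and the function name read_backup_settings is hypothetical. The defaults are the ones stated in the text.

```python
# Defaults as described in the text above.
DEFAULTS = {
    "snapshot_before_compaction": False,
    "auto_snapshot": True,
    "incremental_backups": False,
}


def read_backup_settings(yaml_text: str) -> dict:
    """Return the backup-related booleans found in cassandra.yaml text."""
    settings = dict(DEFAULTS)
    for raw_line in yaml_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key in settings and value.lower() in ("true", "false"):
            settings[key] = value.lower() == "true"
    return settings
```

For example, read_backup_settings("incremental_backups: true") reports incremental backups as enabled and leaves the other two settings at their defaults. On a running node, nodetool enablebackup turns incremental backups on without editing the file.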
In this section we create some sample data to illustrate incremental backups and snapshots. A three-node Cassandra cluster is used. The keyspaces are created first. Tables are then created within each keyspace and data is inserted into them. Two keyspaces, cqlkeyspace and catalogkeyspace, with two tables each, are used.
Create the keyspace cqlkeyspace:

Create a second keyspace catalogkeyspace:
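As a rough sketch of what the two keyspace statements might look like, the Python helper below builds a CREATE KEYSPACE statement as a string. The replication settings are an assumption: SimpleStrategy with a replication factor of 3 is chosen only because the cluster has three nodes. The helper name create_keyspace_cql is hypothetical.

```python
import re


def create_keyspace_cql(name: str, replication_factor: int = 3) -> str:
    """Build a CREATE KEYSPACE statement (replication settings are assumed)."""
    # Accept only simple unquoted identifiers to keep the string safe.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        raise ValueError(f"not a simple keyspace name: {name!r}")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


# The two keyspaces used in this article.
statements = [create_keyspace_cql(ks) for ks in ("cqlkeyspace", "catalogkeyspace")]
```

Each resulting string can be pasted into cqlsh or passed to a driver session.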

In this section, we demonstrate creating snapshots. The command used to create a snapshot is nodetool snapshot with the usage:
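As a rough sketch of how the command is typically assembled, the Python helper below builds an argument list for nodetool snapshot. The --tag, --table, and --skip-flush options are the ones I believe nodetool accepts; confirm them with nodetool help snapshot on your version. The helper name snapshot_command and the tag value in the usage note are hypothetical.

```python
def snapshot_command(keyspace, tag=None, table=None, skip_flush=False):
    """Return the argv list for snapshotting one keyspace (or one table)."""
    command = ["nodetool", "snapshot"]
    if tag:
        command += ["--tag", tag]      # name of the snapshot directory
    if table:
        command += ["--table", table]  # limit the snapshot to one table
    if skip_flush:
        command.append("--skip-flush")  # do not flush memtables first
    command += ["--", keyspace]
    return command
```

On a node where nodetool is on the PATH, the list can be passed to subprocess.run, for example subprocess.run(snapshot_command("cqlkeyspace", tag="cqlkeyspace-ks"), check=True).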

Data Directory Structure:
Cassandra's data directory is organized into keyspace directories, each containing table directories that hold the data files. Each table directory also contains a backups directory and a snapshots directory, which store the incremental backups and the snapshots for that table.
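The layout described above can be walked with a few lines of Python. This is a minimal sketch, assuming the structure data_dir/keyspace/table/ with snapshots and backups subdirectories inside each table directory. The function name list_backups is hypothetical, and the path of data_dir depends on the installation.

```python
import os
import tempfile


def list_backups(data_dir):
    """Map (keyspace, table_dir) to its snapshot tags and backup files."""
    found = {}
    for keyspace in sorted(os.listdir(data_dir)):
        keyspace_dir = os.path.join(data_dir, keyspace)
        if not os.path.isdir(keyspace_dir):
            continue
        for table in sorted(os.listdir(keyspace_dir)):
            table_dir = os.path.join(keyspace_dir, table)
            if not os.path.isdir(table_dir):
                continue
            snapshots_dir = os.path.join(table_dir, "snapshots")
            backups_dir = os.path.join(table_dir, "backups")
            found[(keyspace, table)] = {
                # Each subdirectory of snapshots/ is one snapshot tag.
                "snapshots": sorted(os.listdir(snapshots_dir))
                if os.path.isdir(snapshots_dir) else [],
                # backups/ holds hard links to flushed SSTable files.
                "backups": sorted(os.listdir(backups_dir))
                if os.path.isdir(backups_dir) else [],
            }
    return found
```

Running list_backups on a node's data directory gives a quick overview of which tables have snapshots or incremental backups on disk.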
A write to a Cassandra node first goes to the commit log, which is written sequentially. Cassandra then stores the values in an in-memory data structure called a memtable, one per column family. A memtable is flushed to disk when any one of three thresholds is exceeded: (1) the size of the data in the memtable, (2) the number of items in the memtable, or (3) the lifetime of the memtable.
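The three flush conditions can be illustrated with a toy model. The Python sketch below is not how Cassandra implements memtables. The class name ToyMemtable, its parameters, and the byte accounting (length of key plus length of value) are simplifications assumed for illustration, and the commit log is not modeled.

```python
import time


class ToyMemtable:
    """In-memory write buffer that reports when it should be flushed."""

    def __init__(self, max_bytes, max_items, max_age_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_items = max_items
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        self._reset()

    def _reset(self):
        self._rows = {}
        self._bytes = 0
        self._created = self._clock()

    def put(self, key, value):
        old = self._rows.get(key)
        if old is not None:
            # Overwrite: remove the old entry's size before adding the new one.
            self._bytes -= len(key) + len(old)
        self._rows[key] = value
        self._bytes += len(key) + len(value)

    def should_flush(self):
        return (
            self._bytes >= self.max_bytes                 # (1) data size
            or len(self._rows) >= self.max_items          # (2) item count
            or self._clock() - self._created >= self.max_age_seconds  # (3) lifetime
        )

    def flush(self):
        """Return rows sorted by key, as an SSTable stores them, and start over."""
        rows = sorted(self._rows.items())
        self._reset()
        return rows
```

The clock parameter exists only so the lifetime threshold can be exercised without waiting in real time.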
Each keyspace has its own subdirectory in the data folder. Each subdirectory contains three kinds of files:
Data file: an SSTable (a term borrowed from Google), which is a file of key-value string pairs sorted by key.
Index file: (key, offset) pairs that point into the data file.
Bloom filter: a filter built from all the keys in the data file.
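How the three files work together can be sketched with a toy example in Python. This is not Cassandra's on-disk format. The tab-separated line layout, the SHA-256 based Bloom filter, and all names here are assumptions made for illustration, and keys and values are assumed to contain no tabs or newlines.

```python
import hashlib
import io


class ToyBloomFilter:
    """Bit array that can say 'definitely absent' or 'possibly present'."""

    def __init__(self, size_bits=1024, hash_count=3):
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def write_toy_sstable(rows):
    """Return (data bytes, index of (key, offset) pairs, Bloom filter)."""
    data = io.BytesIO()
    index = []
    bloom = ToyBloomFilter()
    for key in sorted(rows):  # data file is sorted by key
        index.append((key, data.tell()))
        bloom.add(key)
        data.write(f"{key}\t{rows[key]}\n".encode("utf-8"))
    return data.getvalue(), index, bloom


def read_value(data, index, bloom, key):
    if not bloom.might_contain(key):
        return None  # definitely not in this file, skip the lookup
    offsets = dict(index)
    if key not in offsets:
        return None  # Bloom filter false positive
    start = offsets[key]
    line = data[start:data.index(b"\n", start)].decode("utf-8")
    return line.split("\t", 1)[1]
```

The Bloom filter lets a read skip a file that cannot contain the key, and the index lets it seek straight to the right offset instead of scanning the data file.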



Author    : Neha Kasanagottu