In this article, we will learn about COMPRESSION IN TERADATA.
We all know that now-a-days how many transactions are happening per day in this digital world. To store those data in data warehouse we need more storage disk. So everyone is looking for reducing the cost of storing data. One way is to compress the data in the media. By compressing data we will get some benefits like-
- Reduces the storing cost as we can store more data into the disk.
- Reduces the I/O to read those data.
- Memory will hold more data for processing.
Teradata offers some Technique to compress the data like–
- Multi Value Compression(MVC) – Present since V2R5.
- Algorithmic Compression(ALC) – Present since 13.10.
- Block Level Compression(BLC) – Present since 13.10.
MVC ( Multi Value Compression ):
Multi value compression is a technique to apply compression on column level rather than row or block level. It is mainly developed for the column which has a large number of repeated values. MVC is dictionary based compression. The main idea behind this is that repeated values will be replaced by the bit pattern which will be saved in the table definition of a table. During the data access, this bit pattern as a part of dictionary will come to memory and used to look up the original value.
We can define the values to be replaced with bit pattern in the CREATE TABLE or can use an ALTER table to modify the columns of an existing table to define multi value compression.
How it works:
So how this MVC work? Is there any overhead to compress and decompress the data?
As we got to know that MVC is meant for the repetitive values only. So we need to analyze the column values first before applying MVC. Find the below example for better understanding-
From the above example, we can see that the column “Gender” has been defined as CHAR (1). That means it will occupy 1 byte of space. If we analyze, we will find that for the column “Gender” values will either ‘M’ or ‘F’. So in MVC technique, Teradata will replace these values like- 01 for ‘M’ and 11 for ‘F’ that means only 2 bits instead of 1 byte.
This information will be stored in the header. As the table header becomes memory resident during access or any operation, there will be no overhead to compress and de-compress the data. Here’s how table header and data will look like after applying MVC-
Like this we can apply MVC on column having repeated values and data type like BYTE, VARCHAR, INTEGER, DATE, DECIMAL, etc. We can apply MVC on 255 values per column and due to MVC, table header size will increase depending on number of values we are going to compress.
Algorithmic Compression (ALC) :
Algorithmic Compression (ALC), a new compression technique introduced in Teradata 13.10, is mainly designed for compressing data with well-know columns. Teradata provides a few built in ALC algorithms. For example, to compress Unicode column data Teradata provided TransUnicodeToUTF8 algorithm. Apart from that you can also develop your own customize algorithm to compress data.
The compression algorithm is invoked during creation of the data block from the table’s data and the same algorithm is used to decompress the data when any user wants to access any value from that particular column. The decompression algorithm will not use on those columns which are not accessed.
Unlike multi value compression (MVC), ALC requires CPU resources to execute the compression and decompression algorithm.
Defining ALC in Teradata Tables:
Same as MVC, ALC is also defined on column level. We can define ALC in Teradata column using CREATE TABLE statement.
CREATE TABLE Student
Student_AddressCHAR(200) CHARACTER SET UNICODE
COMPRESS USING TransUnicodeToUTF8
DECOMPRESS USING TransUTF8ToUnicode)
UNIQUE PRIMARY INDEX(Roll_No);
In the above example, column Student_Address is compressed using the packaged TransUnicodeToUTF8 algorithm.
Both ALC and MVC can be defined on the same column. In this case, ALC will be applied only to those values that are not specified by MVC. ALC will be applicable only on column with CHAR, VARCHAR and BYTE data type.
Similarities between ALC and MVC
- Applied on column level.
- Cannot be defined on primary index column.
- Can be specified on secondary index although will be not applied to index subtable.
- Supported on permanent as well as global temporary tables.
- Not supported on volatile tables
- All NULLS are automatically compressed.
Differences from MVC
Key differences in functionality between ALC and MVC are as below-
- MVC can be applied on max 255 values per column wherein ALC can be applied in all the values irrespective of what their particular value is.
- ALC is limited to the CHAR, VARCHAR and BYTE data type while MVC can incorporate all the numeric as well as date data types.
- There is overhead involved to compress and decompress data in ALC, but in case of MVC there is no such kind of overhead.
Recommendation for using ALC
Algorithmic Compression is very useful for large character column which carries mainly descriptive information and accessed less frequently.
Unicode columns are another good candidate for ALC as Unicode takes more space to represent same value compare to Latin. For example, in above Student table, Student_Address is a good candidate for ALC.