In Teradata, Primary index determines the data distribution for a table’s data. Lets recall that if the table has a unique primary index, then the row hashes will be unique and data will be evenly distributed across the amps.However, if the data contains duplicates, then higher the degree of duplication, higher would be the non-uniform distribution of data.
Suppose we have a table who primary index is a column whose data is highly non-unique. This high non unique data will cause rows having same primary index column value being stored on the same amp. This causes a lot of issues as far as the space and performance is concerned.
This uneven distribution of data can cause one of the amps to get full even though other amps have space. Now as one amp gets full, no more data can be inserted and even though on a whole the database has available space, any table creation will fail because of one amp being out of space.
Also this one amp which has more data when compared to other amps, will take more time to perform tasks as it has more data and hence this amp will be called as hot amp. Recall that any query is as fast as the slowest amp in the system. So, a hot amp will cause slow performance and degrade performance. So, it is very important that we choose the PI wisely.
If a bad PI is chosen, we will have no other option other than taking a backup of the data and inserting it into a new table with correct PI defined as once a table is created, its PI cannot be changed.
This degree of skewness (non uniqueness) of data in percentage is known as SKEW FACTOR. The skew factor of a table determines what percent of the table’s data is skewed. Below is the query to find the skew factor:
12345678910111213141516171819 SELECTTABLENAME,SUM(CURRENTPERM) /(1024*1024) AS CURRENTPERM,(100-(AVG(CURRENTPERM)/MAX(CURRENTPERM)*100)) AS SKEWFACTORFROMDBC.TABLESIZEWHERE DATABASENAME= 'DATABASENAME'ANDTABLENAME ='TABLENAME'GROUP BY tablename;
It is important that we choose the PI wisely and continuously monitor the skew factors of bigger tables as they can impact the database performance if not taken care of.