Secondary Indexes are another way of improving performance of queries hitting a particular table on Teradata. It is highly recommended that you first read about Primary indexes here
Secondary index is a way of giving the optimizer an alternate path to data i.e. you can have another column apart from primary index that you can employ for your queries to choose an optimized path in case primary index column is not always the column on which most of your queries run. However, there is a trade off when using secondary indexes as there is an overhead in the maintenance of secondary indexes. Each amp on the table on which secondary index is defined will hold a Secondary index sub table which will take some space depending on the size of the base table. Let us see the same with an example. Consider the image below:
The image above has a table employee which has a secondary index defined on the field PAN. Now, we know that data of the table is distributed across Teradata based on the primary index value which in this case is empno. So, the data is distributed across 4 amps and each amp has its own copy of the secondary index subtable.
The secondary index sub table stores the following information:
- Secondary Index value
- Secondary Index row id
- Base table row id
SI Access Mechanism:
Now suppose the following SQL is submitted :
Select * from employee where PAN = ‘nn456a’;
Now, when the optimizer gets this SQL, it knows that it’s a secondary index , so first a hash of the value nn456a is calculated and the optimizer knows that the sub table containing this value resides on amp 0 (please refer image above). So, now it goes to amp 0 and checks the sub table for this value. The sub table has a column called base table row id, which in this case is 2,6.
So, then optimizer goes on to the amp containing this row id and fetches the result.
You can see that in the absence of a secondary index, we would have resulted in a full table scan and an all amp operation. But, because of a secondary index we could finish this operation by accessing only 2 amps. In the best case, you may find the sub table holding the base table row id residing in the same amp which will make it a 1 amp operation. So, having this SI has dramatically improved the performance of our query.
Please be wary that this has a overhead as stated before and SI should be implemented only for those tables in which there is a column other than the PI which is accessed a lot in a number of queries.
There are two kinds of SI similar to PI
- Unique Secondary Index (USI)
- Non Unique Secondary Index (NUSI)
The access example of SI above is of a USI as the SI values in PAN are all unique. In case of NUSI, the base table row id in the sub table can have multiple values because of the SI being unique . So it can either be a 2 amp operation or an all amp operation depending on how much uniqueness the SI has.
Secondary index can be created as follows at any time.
create index(column_name) on tablename;
It is very good idea to collect stats on the secondary index columns. There can be a maximum of 32 SI’s that can be defined on a table in Teradata
This is it for Secondary Indexes. Do post questions in comments if you have any.