In this article , we will see JOIN INDEXES IN TERADATA.
A join index is an indexing structure containing columns from multiple tables, specifically the resulting columns from one or more tables. Rather than having to join an individual table each time the join operation is needed, the query can be resolved via a join index and, in most cases, dramatically improve performance.
- It is used to define a pre-join table on frequently join column.
- You can define Join Index in Teradata on Single or multiple tables.
- Can be defined to create a partial or full replication of base table with a different primary index.
- Join Index is stored in the permanent space and maintained by the system automatically.
- You cannot fire any query on the join index directly.
Keep in mind that the defining join index in Teradata does not imply that parsing engine will use join index always. It is up to Teradata parsing engine whether to use join index or access from the base table.
You can create a join index to perform any of the following operations:
- Join multiple tables, optionally with aggregation, in a prejoin table.
- Replicate all or a vertical subset of a single base table and distribute its rows by a primary index on a foreign key column to facilitate joins of very large tables by hashing them to the same AMP.
- Aggregate one or more columns of a single table or the join results of multiple tables in a summary table.
- Support querying only those rows that satisfy the conditions specified by its WHERE clause. This is known as a sparse join index.
- If the index has a unique primary index, and a request specifies an equality condition on the columns that define the primary index for the index, then the index can be used for the access path in two-AMP join plans similarly to how USIs are used.
- The guidelines for creating a join index are the same as those for defining any regular join query that is frequently executed or whose performance is critical. The only difference is that for a join index the join result is stored as a subtable and automatically maintained by Teradata Database.
Performance and Join Indexes
Requests that can use join indexes can run many times faster than queries that do not use them. Performance improves whenever the Optimizer can rewrite a request to use a join index instead of the base tables specified by the query.
A join index is most useful when its columns can cover most or all of the requirements in a request. For example, the Optimizer might consider using a covering index instead of performing a merge join.
Covering indexes improve the speed of join queries. The extent of improvement can be dramatic, especially for requests involving complex, large-table, and multiple-table joins. The extent of the improvement depends on how often an index can be used to rewrite a query.
In-place join indexes, where the columns of the covering index and the columns of the table to which it is to be joined both reside on the same AMP, outperform indexes that require row redistribution. An in-place, covering, aggregate join index that replaces 2 or more large tables in requests with complex joins, aggregations, and redistributions can enable a request to run hundreds of times faster than it would otherwise.
There are different types of join index in Teradata:
Single Table Join Index(STJI):
A single table join index is used to create indexing structure for a single table but with different primary index. This improves the performance of joins as no distribution or duplication is needed. The user will query on the base table, but PE will decide whether to access base table or single table join index.
Single Table Join Index syntax:
CREATE JOIN INDEX STUDENT_IX
SELECT ROLL_NO,STUDENT_NAME, DEPT_NO, PH_NO
Here DEPT_NO column defined as primary index of STJI. So during join processing with DEPARTMENT table on DEPT_NO, there is no need to re-distribute STUDENT_BASE_TABLE table using DEPT_NO column as we already defined STUDENT_IX with the primary as of DEPT_NO
Multi Table Join Index(MTJI):
A multi table join index is used to hold pre-join result set from the two or more columns. So during join processing PE may decide to access data from the MTJI rather than joining again underlying base tables. We need to remember that we should define MTJI after lots of analysis based on frequency and cost of joining.
Multi Table Join Index syntax:
CREATE JOIN INDEX STUDENT_DEPT
SELECT ROLL_NO, STUDENT_NAME ,DEPT_NO, ADDRESS, PH_NO
FROM STUDENT_BASE_TABLE A
ON A. DEPT_NO = B. DEPT_NO
UNIQUE PRIMARY INDEX(ROLL_NO);
Aggregate Join Index:
Aggregate join indexes offer an extremely efficient method of resolving queries that frequently specify the same aggregate operations on the same column or columns. When aggregate join indexes are available, the system does not have to repeat aggregate calculations for every query. An aggregate join index can be defined on two or more tables, or on a single table.
An aggregate join index can be created using:
GROUP BY clause
Aggregate Join Index example: CREATE JOIN INDEX SALES_INX STORE_NO, SUM(QTY_SOLD) FROM SALES_BASE_TABLE GROUP BY 1, ORDER BY 2;
Sparse Join Index:
Sparse Join index a portion of the table using WHERE clause predicates to limit the rows indexed. When base tables are large, this feature can be used to reduce the content of the join index to only the portion of the table that is frequently used if the typical query only references a portion of the rows. This capability:
Reduce the space requirement for a Join Index.
Reduces cost to maintain join index.
CREATE JOIN INDEX SPARSE_BILL_INX
SEL ACC_NO, CUST_NAME, BILL_DATE, BILL_AMT
UNIQUE PRIMARY INDEX(ACC_NO);
Refreshing Join Indexes
The ALTER TABLE TO CURRENT statement enables you to refresh the content of a join index without having to drop it and recreate it.
The efficiency of the ALTER TABLE TO CURRENT alternative compared with dropping and recreating a join index depends on how often an ALTER TABLE TO CURRENT request is executed and the type of current date condition defined in the join index.
If the join index is refreshed infrequently and the current date condition requires a large volume of old rows to be removed and a large volume of new rows to be inserted, it might be more efficient to drop and recreate the join index.
The Join indexes are not supported by Fastload and Multiload utilities, They must be dropped and recreated after the table is loaded.
During restore of a base table or database join index is marked as an invalid.
Max 64 columns per table per Join Index.
Join Index Subtables cannot be fall back protected.