SAMPLE FUNCTION IN TERADATA

Share via:
SAMPLE FUNCTION

Dear Readers,

In this article, we will discuss SAMPLE FUNCTION

As the name suggests, SAMPLE function will an idea on the data stored in the table/view by returning few numbers of rows specified either as a list of fractions of the total number of rows or as a list of numbers of rows from the SELECT query.

The function can be specified in two ways  :

  • SAMPLE n – Will return a sample of n rows.
  • If the number n is greater than the number of rows in the table, the sample will consist of the number of rows in the        table.

Sample Syntax

Here n can either of the below two :

  • An actual number of rows.
  • A percentage of tables. (n can be any value between 1.00 and .00 )

If the number n is greater than the number of rows in the table, sample n will return all rows from the table.

Example :

Consider the following table.

STUDENT_ID FIRST_NAME LAST_NAME DEPARTMENT
1 Alex Green IT
2 Monica Adams Sales
3 Peter Williams IT
4 Chandler Bing HR
5 Ross Buffay Finance


Select a sample of2 records from the student table :

Explain plan :

Explanation

——————————————————————–

1) First, we lock KTExp.student in TD_MAP1 for read on a reserved RowHash to prevent global deadlock.

2) Next, we lock KTExp.student in TD_MAP1 for read.

3) We do an all-AMPs SAMPLING step in TD_MAP1 from KTExp.student by way of an all-rows scan with no residual conditions into Spool1(group_amps), which is built locally on the AMPs.  Samples arespecified as a number of rows.  The size of Spool 1 isestimatedwith high confidence to be 2 rows (184 bytes).

4) Finally, we send out an END TRANSACTION step to all AMPs Involvedin processing the request.

-> The contents of Spool 1 are sent back to the user as the result ofstatement 1.

  1. Select 30% sample records from the student table.

We have a total of 5 rows in the Student table. 50% percent of total rows is 2.5 rows. So, the output is 3 rows.

Fractional result less than .4999 will not add any row as shown below.

SAMPLE using SAMPLEID

SAMPLEID is an extension for SAMPLE to get multiple set of samples in a single query.

Example

In the example above , the table has a total of 5 rows whereas we have requested to provide sample of 7, since there are no enough rows to return , we get a WARNING message in BTEQ stating “7473 Requested sample is larger than table rows. All rows returned“.

Note:This warning is not seen in ODBC.

Explain Plan:

Explanation

1) First, we lock KTexp.STUDENT in TD_MAP1 for read on a reservedRowHash to prevent global deadlock.

2) Next, we lock KTExp.STUDENT in TD_MAP1 for read.

3) We do an all-AMPs SAMPLING step in TD_MAP1 from TEST.STUDENT by way of an all-rows scan with no residual conditions into Spool 1(group_amps), which is built locally on the AMPs.Then we do ASORT to order Spool 1 by the sort key in spool field1 ( KTExp.STUDENT.Field_1, KTExp.STUDENT.STUDENT_ID).  Samples are specified as a number of rows.  The size of Spool 1 is estimated with high confidence to be 5 rows (255 bytes).

4) Finally, we send out an END TRANSACTION step to all AMPs Involvedin processing the request.

-> The contents of Spool 1 are sent back to the user as the result ofstatement 1.

We can also use percentage with SAMPLEID as shown below :

 

Thank you for giving your valuable time to read the above information.
Follow us on 
Website  www.ktexperts.com
Facebook Page KTExperts Facebook
Linkedin Page : KT EXPERTS Linkedin

Share via:
Note: Please test scripts in Non Prod before trying in Production.
1 Star2 Stars3 Stars4 Stars5 Stars (No Ratings Yet)
Loading...

Add Comment