Teradata DBA: How to check table distribution.

Question:

How can you tell if a table is well distributed?

Answer:

We hope (by making good PI choices and hashing on the PI columns) to get data distributed across the amps as evenly possible. Ideally, each amp would have the same number of rows. Rarely do we achieve perfection, but we often get close.

The following SQL will count the number of rows on each amp and report the (positive or negative) deviation from the average.

Select dt1.a (title'AMP') ,dt1.b (title'Rows') ,((dt1.b/dt2.x (float)) - 1.0) * 100 (format'+++9%',title'Deviation') from (select hashamp(hashbucket(hashrow(<primary index>))) ,count(*) from <databasename>.<tablename> group by 1 )dt1 (a,b) ,(sel (count(*) / (hashamp()+1)(float)) FROM <databasename>.<tablename> )dt2(x) order by 2 desc,1;

Ex :

Select dt1.a (title'AMP') ,dt1.b (title'Rows') ,((dt1.b/dt2.x (float)) - 1.0) * 100 (format'+++9%',title'Deviation') from (select hashamp(hashbucket(hashrow(RetailstoreID))) ,count(*) from teraqa.copy_src group by 1 )dt1 (a,b) ,(sel (count(*) / (hashamp()+1)(float)) FROM teraqa.copy_src )dt2(x) order by 2 desc,1;

Running this will show:

AMP Rows Deviation

----------- ----------- ---------

9 63 +15%

2 62 +13%

4 61 +11%

5 60 +9%

11 59 +7%

12 59 +7%

13 58 +5%

15 57 +4%

14 55 +0%

0 53 -4%

8 52 -5%

10 52 -5%

3 51 -7%

6 51 -7%

7 51 -7%

1 49 -11%

You need not worry until the max or min vary from the average by at least 15%.

Hash synonyms (two different values hashing to the same rowhash) are sometimes an issue with distribution questions. Here is a query to check:

.retlimit 10

Select h (title'Hash') ,c (title'Frequency') from (select hashrow(<primary index>) ,count(*) from <database>.<tablename> group by 1 )dt (h,c) qualify rank(c desc) <= 10 order by 2 desc, 1;

Ex :

Select h (title'Hash') ,c (title'Frequency') from (select hashrow(RetailstoreID) ,count(*) from teraqa.copy_src group by 1 )dt (h,c) qualify rank(c desc) <= 10 order by 2 desc, 1;

For example:

BTEQ -- Enter your DBC/SQL request or BTEQ command:

show table temp.table1;

*** Text of DDL statement returned.

*** Total elapsed time was 1 second.

---------------------------------------------------------------------------

CREATE SET TABLE temp.table1 ,NO FALLBACK ,

NO BEFORE JOURNAL,

NO AFTER JOURNAL

(

c1 CHAR(44) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL DEFAULT '

c2 INTEGER NOT NULL,

c3 SMALLINT NOT NULL,

c4 CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL DEFAULT '

PRIMARY INDEX I1 ( c1 )

UNIQUE INDEX I2 ( c1 ,c2 )

INDEX I3 ( c3 );

References to Procedures:

Additional Info/Comments:

A lump is a bunch of rows where some column has the same value. Lumpiness is the opposite of uniqueness. The ultimate lump would be all rows in a table having the same value in some column.

Current Attachments:

Teradata DBA

Monday, February 13, 2017

How to check table distribution.

No comments:

Post a Comment