Teradata DBA: Some IMP points related to Performance Tuning in Teradata

Teradata Performance Tuning - Basic Tips Performance tuning thumb rules.

Here are very basic steps which are used to PT any given query in given environment . As a prerequiste , make sure

- user has proper select rights and actual profile settings

- Enough space available to run and test the queries

1. Run explain plan (pressing F6 or “EXPLAIN sel * …”,)

Then see for potential information like

- No or low confidence

- Product joins conditions

- By way of an all row scan - FTS

- Translate

Also check for

- Distinct or group by keywords in SQL query

- In/ not in keywords and check for the list of values generated for the same

APPROACHES

A. In case of product join scenarios,check for

- Proper usage of alias

- joining on matching columns

- Usage of join keywords - like specifying type of joins (ex. inner or outer )

- use union in case of "OR” scenarios

- Ensure statistics are collected on join columns and this is especially important if the columns you are joining on are not unique.

B. collects stats

- Run command "diagnostic help stats on for the session"

- Gather information on columns on which stats has to be collected

- Collect stats on suggestions columns

- Also check for stats missing on PI, SI or columns used in joins - "help stats <databasename>.<tablename>

- Make sure stats are re-collected when at-least 10% of data changes

- remove unwanted stats or stat which hardly improves performance of the queries

- Collect stats on columns instead of indexes since index dropped will drop stats as well!!

- collect stats on index having multiple columns, this might be helpful when these columns are used in join conditions
- Check if stats are re-created for tables whose structures have some changes

c. Full table scan scenarios

- Try to avoid FTS scenarios as, it might take very long time to access all the data in every amp in the system

- Make sure SI is defined on the columns which are used as part of joins or Alternate access path.

- Collect stats on SI columns else there are chances where optimizer might go for FTS even when SI is defined on that particular column

2. If intermediate tables are used to store results, make sure that

- It has same PI of source and destination table

3. Tune to get the optimizer to join on the Primary Index of the largest table, when possible, to ensure that the large table is not redistributed on AMPS

4. For large list of values, avoid using IN /NOT IN in SQLs. Write large list values to a temporary table and use this table in the query

5. Make sure when to use exists/not exists condition since they ignore unknown comparisons (ex. - NULL value in the column results in unknown) . Hence this leads to inconsistent results

6. Inner Vs Outer Joins

Check which join works efficiently in given scenarios.Some examples are

- Outer joins can be used in case of large table joining with small tables (like fact table joining with Dimension table based on reference column)

- Inner joins can be used when we get actual data and no extra data is loaded into spool for processing

Please note for outer join conditions:

1. Filter condition for inner table should be present in "ON" condition
2. Filter condition for outer table should be present in "WHERE" condition

COLLECTION OF STATISTICS USING SAMPLE

Here is the step by step process on how to collect statistics using SAMPLE

SAMPLE statistics is collected using Random amp sampling and it is recommended to use when we dont have stats collected on index or set of columns.

As a preparation step we check whether table is suitable for SAMPLE STATISTICS collection using following query

/* suggested to use when data skew is less and also when more rows are there in table than number of amps*/

SEL

TAB1.A AS TABLECOUNT,

TAB2.B AS AMPCOUNT,

CASE WHEN TABLECOUNT > AMPCOUNT THEN ' RANDOM AMP SAMPLING CAN BE SUGGESTED'

ELSE 'RANDOM AMP SAMPLING NOT NEEDED'

END

FROM

(SEL COUNT (*) AS A FROM TABLENAME) TAB1,

(SEL HASHAMP () +1 AS B) TAB2;

Below is step by step process on Collecting statistics using SAMPLING

1. Check if Stats are already collected on the column of table for which Random AMP sampling is to be considered using

HELP STATISTICS ON <YOUR_DB>.<YOUR_TB>;

If YES, then this situation is tricky and do you still want to try out SAMPLING or look for other recommendations is up-to you..

2. If NO then, check if column is highly skewed using following Query.

SELECT HASHAMP (HASHBUCKET (HASHROW (<YOUR_COLUMN>))) ,

COUNT (*)
FROM <YOUR_DB>.<YOUR_TB>

GROUP BY 1;

If you see that Data is equally distributed among all the amps (Variance of +-5 % is accepted),

If there is large amount of DATASKEW in one AMP, then SAMPLING is not a good option

3. If you don’t find data skew on any particular AMP then,

Run sample statistics on column of particular table as follows.

COLLECT STATICSTICS ON <YOUR_DB>.<YOUR_TB> COLUMN (<YOUR_COLUMN>) USING SAMPLE;

4. Check the performance of query after running sample STATS, also note the time taken for collecting sample stats.

5. If not satisfied with performance, try to run full statistics on columns and measure performance and time taken to collect full stats

6. Decide which is the best option “FULL STATS or SAMPLE“considering factors like

Performance,
Time taken for statistics collection on scenarios,
Table size,
Data skew,
Frequency of table being loaded
How many times this table would be used in your environment.

Advantages of Collecting Stats using Sample

1. Only a sample of table rows is scanned. Default being 2%.

It is based on random amp sampling estimate of total rows. If you want to override the default value for particular session then use,

DIAGNOSTIC "COLLECTSTATS, SAMPLESIZE=n" ON FOR SESSION;

2. It uses less CPU and I/O resources compared to full statistics hence saving considerable amount of time and resources.

It is not recommended to used for

1. Columns which are not indexed

2. Indexes which has lot of duplicates or non unique combinations

3. for small tables like dimension /key tables

4. for tables that have greater data skew.

Please note that Sample statistics cannot be collected on

1. Global temporary tables

2. Join indexes

VIEWPOINT

Active: running queries present

Block: if two sessions working on same resource at a time.one session should be in waiting State.

Delay: id defined sessions on workload limit crosses, upto limit all sessions should be execute.remaining sessions should be delayed up to completing the old sessions.

Abort: after delete the query it will go to abort state time limit is 5 to 10 seconds.

Response: query already in active state,but it is taking time to display the output.that period is called responding time.due this sometimes sessions will be getting Hanged and some time if table contains huge amount of rows it will not execute.(more than 3 hrs).

Idle: without work.

Parse : the query is starting in that planning for the query .

Not Idle: other than Idle

Parse---àActive-----à Response

Teradata DBA

Thursday, February 9, 2017

Some IMP points related to Performance Tuning in Teradata

1 comment: