Teradata Performance Tuning - Basic Tips Performance tuning
thumb rules.
Here are very basic
steps which are used to PT any given query in given environment . As a
prerequiste , make sure
- user has proper
select rights and actual profile settings
- Enough space
available to run and test the queries
1. Run explain plan
(pressing F6 or “EXPLAIN sel * …”,)
Then see for potential
information like
- No or low confidence
- Product joins
conditions
- By way of an all row
scan - FTS
- Translate
Also check for
- Distinct or
group by keywords in SQL query
- In/ not in
keywords and check for the list of values generated for the same
APPROACHES
A. In case of product
join scenarios,check for
- Proper usage of
alias
- joining on matching
columns
- Usage of join
keywords - like specifying type of joins (ex. inner or outer )
- use union in case of
"OR” scenarios
- Ensure statistics
are collected on join columns and this is especially important if the columns
you are joining on are not unique.
B. collects stats
- Run command
"diagnostic help stats on for the session"
- Gather information
on columns on which stats has to be collected
- Collect stats on
suggestions columns
- Also check for stats
missing on PI, SI or columns used in joins - "help stats <databasename>.<tablename>
- Make sure stats are
re-collected when at-least 10% of data changes
- remove unwanted
stats or stat which hardly improves performance of the queries
- Collect stats on
columns instead of indexes since index dropped will drop stats as well!!
- collect stats on
index having multiple columns, this might be helpful when these columns are
used in join conditions
- Check if stats are re-created for tables whose structures have some changes
- Check if stats are re-created for tables whose structures have some changes
c. Full table scan
scenarios
- Try to avoid FTS scenarios
as, it might take very long time to access all the data in every amp in the
system
- Make sure SI is
defined on the columns which are used as part of joins or Alternate access
path.
- Collect stats on SI
columns else there are chances where optimizer might go for FTS even when SI is
defined on that particular column
2. If intermediate
tables are used to store results, make sure that
- It has same PI of
source and destination table
3. Tune to get the
optimizer to join on the Primary Index of the largest table, when possible, to
ensure that the large table is not redistributed on AMPS
4. For large list of
values, avoid using IN /NOT IN in SQLs. Write large list values to a
temporary table and use this table in the query
5. Make sure when to
use exists/not exists condition since they ignore unknown comparisons
(ex. - NULL value in the column results in unknown) . Hence this leads to
inconsistent results
6. Inner Vs Outer
Joins
Check which join works
efficiently in given scenarios.Some examples are
- Outer joins can be
used in case of large table joining with small tables (like fact table joining
with Dimension table based on reference column)
- Inner joins can be used when we get actual
data and no extra data is loaded into spool for processing
Please note for outer
join conditions:
1. Filter condition for inner table should be
present in "ON" condition
2. Filter condition for outer table should be present in "WHERE" condition
2. Filter condition for outer table should be present in "WHERE" condition
Here is the step by
step process on how to collect statistics using SAMPLE
SAMPLE statistics is
collected using Random amp sampling and it is recommended to use when we dont
have stats collected on index or set of columns.
As a preparation step
we check whether table is suitable for SAMPLE STATISTICS collection using
following query
/* suggested to use
when data skew is less and also when more rows are there in table than number
of amps*/
SEL
TAB1.A AS TABLECOUNT,
TAB2.B AS AMPCOUNT,
CASE WHEN TABLECOUNT > AMPCOUNT
THEN ' RANDOM AMP SAMPLING CAN BE
SUGGESTED'
ELSE 'RANDOM AMP SAMPLING NOT
NEEDED'
END
FROM
(SEL COUNT (*) AS A FROM TABLENAME)
TAB1,
(SEL HASHAMP () +1 AS B) TAB2;
Below is step by step
process on Collecting statistics using SAMPLING
1. Check if Stats are already collected on
the column of table for which Random AMP sampling is to be considered using
HELP STATISTICS ON
<YOUR_DB>.<YOUR_TB>;
If
YES, then this situation is tricky and do you still want to try out SAMPLING or
look for other recommendations is up-to you..
2. If NO then, check if column is highly
skewed using following Query.
SELECT HASHAMP (HASHBUCKET (HASHROW
(<YOUR_COLUMN>))) ,
COUNT (*)
FROM <YOUR_DB>.<YOUR_TB>
FROM <YOUR_DB>.<YOUR_TB>
GROUP BY 1;
If you see that Data is equally
distributed among all the amps (Variance of +-5 % is accepted),
If
there is large amount of DATASKEW in one AMP, then SAMPLING is not a good
option
3. If you don’t find data skew on any
particular AMP then,
Run sample statistics on column of
particular table as follows.
COLLECT STATICSTICS ON
<YOUR_DB>.<YOUR_TB> COLUMN (<YOUR_COLUMN>) USING SAMPLE;
4. Check the performance of query after
running sample STATS, also note the time taken for
collecting sample stats.
5. If not satisfied with performance, try
to run full statistics on columns and measure performance and time taken to
collect full stats
6. Decide which is the best option “FULL
STATS or SAMPLE“considering factors like
- Performance,
- Time taken for statistics
collection on scenarios,
- Table size,
- Data skew,
- Frequency of table being loaded
- How many times this table would be used in your environment.
Advantages of Collecting Stats using Sample
1. Only a sample of table rows is scanned.
Default being 2%.
It is based on random amp sampling
estimate of total rows. If you want to override the default value for
particular session then use,
DIAGNOSTIC "COLLECTSTATS,
SAMPLESIZE=n" ON FOR SESSION;
2. It uses less CPU and I/O resources
compared to full statistics hence saving considerable amount of time and
resources.
It is not recommended to used for
1. Columns which are not indexed
2. Indexes which has lot of duplicates or
non unique combinations
3. for small tables like dimension /key
tables
4. for tables that have greater data skew.
Please note that Sample statistics cannot be
collected on
1. Global temporary tables
2. Join indexes
VIEWPOINT
Active: running queries present
Block: if two sessions working on same resource at
a time.one session should be in waiting State.
Delay:
id defined sessions on workload limit crosses, upto limit all sessions should
be execute.remaining sessions should be delayed up to completing the old
sessions.
Abort:
after delete the query it will go to abort state time limit is 5 to 10 seconds.
Response: query already in active state,but it is taking time to display
the output.that period is called responding time.due this sometimes sessions
will be getting Hanged and some time if table contains huge amount of rows it
will not execute.(more than 3 hrs).
Idle: without work.
Parse :
the query is starting in that planning for the query .
Not Idle: other than Idle
Parse---àActive-----à Response
Hi, I really loved reading this article. By this article i have learnt many things about OBIEE QAs, please keep me updating if there is any update.
ReplyDeleteTeradata Online Training
Teradata Training
Teradata Online Course keep updating.........