=============================================================
When and how should statistics be collected?
=============================================================
Procedure:

Statistics are data demographics "hints" used by the Teradata optimizer. There are many ways to generate a query plan for a given SQL statement, and collecting statistics ensures that the optimizer has the most accurate information available to create the best access and join plans. Without collected statistics, the optimizer assumes that any column or non-unique index is highly non-unique and builds join plans accordingly. Collect statistics on any columns or non-unique indexes that are fairly or highly unique so that the optimizer can make proper use of them in join plans and thereby enhance performance.
===============================================================
Statistics should be collected for the following tables / columns
===============================================================
- For small tables, where the number of rows is less than 5 times the number of AMPs, collect statistics on the primary index.
- For large tables, collect statistics on all NUPIs.
- For all tables, collect statistics on columns used in your JOIN conditions (i.e., those columns appearing in any DML WHERE clause); see the sketch below.
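As a concrete illustration, here is roughly what those statements look like, using the same COLLECT STATISTICS syntax shown later in this note. The table and column names (sales_db.orders, order_id, cust_id, order_date) are hypothetical, invented only for this sketch; substitute your own objects:

   COLLECT STATISTICS ON sales_db.orders INDEX (order_id);    /* primary index of a small table  */
   COLLECT STATISTICS ON sales_db.orders INDEX (cust_id);     /* a NUPI on a large table         */
   COLLECT STATISTICS ON sales_db.orders COLUMN order_date;   /* a frequent join/WHERE column    */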
Once collected, statistics should be maintained to accurately reflect the data demographics of the underlying table. It is better to have no statistics than stale or otherwise incorrect statistics. A rule of thumb is to recollect statistics once the data has changed by 10% (that is, 10% more rows inserted, 10% of the rows deleted, 10% of the rows changed, or some combination).
=========================================================================
TTIP 1062 - Why table statistics should be kept fresh with COLLECT STATISTICS
=========================================================================
Tech Note:

STATISTICS AND PLANS:

The optimizing phase of the DBC/SQL language processor makes decisions on how to access table data. These decisions can be very important when table joins (especially those involving multiple joins) are required by a query. By default, the Optimizer uses approximations of the number of rows in each table (known as the cardinality of the table) and of the number of unique values in indexes in making its decisions.
The Optimizer gets its approximation of the cardinality of a table by picking a random Access Module Processor (AMP) and asking that AMP how many rows it has for the table. The chosen AMP does not actually count all of the rows it holds for the table, but generates an estimate based on the average row size and the number of sectors occupied by the table on that AMP; the Optimizer then multiplies that estimate by the number of AMPs in the system (making an allowance for uneven hash bucket distribution) to estimate the table cardinality. The number of unique index values is similarly estimated. Given that most of the values involved in these estimates, other than the number of AMPs in the system, are approximations, it is possible (although unusual) for the estimate to be significantly off. This can lead to poor choices of join plans, and associated increases in the response times of the queries involved.
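To make the arithmetic concrete, here is a purely illustrative calculation (the sector size, row size, and AMP count below are invented for this example, not taken from any particular system):

   rows on sampled AMP   ~ (10,000 sectors * 512 bytes/sector) / 100 bytes/row = 51,200
   estimated cardinality ~ 51,200 rows/AMP * 20 AMPs = 1,024,000 rows

If the table is unevenly distributed across AMPs, the sampled AMP may be far from average, and the scaled-up estimate inherits that error.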
One way to help the Optimizer make better decisions is to give it more accurate information as to the content of the table. This can be done using the COLLECT STATISTICS statement (see the Teradata DBS Reference Manual for details). When the Optimizer finds that there are statistics available for a referenced table, it will use those statistics instead of using estimated table cardinality or estimated unique index value counts (that is, it will trust the statistics).
======================================================
STALE STATISTICS
======================================================
Under normal circumstances, this is the right thing to do; statistics provide more detailed information, and include an exact row count as of the time that the statistics were gathered. However, if the statistics are "stale" -- that is, the table's characteristics (distribution of data values for a column or index for which statistics have been collected, number of rows in the table, etc.) have changed significantly since the statistics were last gathered -- the Optimizer can be misled into making poor, or even horrible, join plans, with associated poor performance of the queries that use the stale statistics.
To take an extreme case:

   Table A - statistics gathered when the table had 1,000 rows, but the table
             now has 1,000,000 rows (perhaps statistics were gathered during
             the prototyping phase)

   Table B - no statistics; table has 75,000 rows

If a product join between Table A and Table B is necessary for a given query, and one of the tables must be duplicated on all AMPs, then the Optimizer will choose Table A to be duplicated, since 1,000 rows (from the stale statistics) is much less than 75,000 rows. Since in reality Table A now has 1,000,000 rows, the Optimizer will be making a very bad decision (duplicating 1,000,000 rows instead of 75,000), and the query will run much longer than necessary.
Therefore, it is important that statistics be kept fresh. The Reference Manual recommends that statistics be recollected if there is as little as a 10% change -- rows added or deleted -- in the table. There are two general circumstances under which statistics can be considered to be stale:
1. The number of rows in the table has changed significantly.

   The number of unique values for each statistic on a table, as well as the
   date and time the statistics were last gathered, can be obtained by:

      HELP STATISTICS tablename;

   For statistics on unique indexes, this can be cross-checked by comparing
   the row count returned by:

      SELECT COUNT(*) FROM tablename;

   For statistics on non-unique columns, the HELP STATISTICS result can be
   cross-checked by comparing the count returned by:

      SELECT COUNT(DISTINCT columnname) FROM tablename;
2. The range of values for an index or column of a table for which statistics
   have been collected has changed significantly. Sometimes one can infer this
   from the date and time the statistics were last collected, or from the very
   nature of the column (for instance, if the column in question holds a
   transaction date, and statistics on that column were last gathered a year
   ago, it is almost certain that the statistics for that column are stale).
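A quick way to test this second circumstance is to compare the column's current value range against the date the statistics were gathered. A minimal sketch, assuming a hypothetical transaction-date column named txn_date:

   SELECT MIN(txn_date), MAX(txn_date) FROM tablename;

If MAX(txn_date) is later than the collection timestamp reported by HELP STATISTICS, the table contains values the statistics have never seen, and the statistics for that column are stale.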
==========================================================
REFRESHING STALE STATISTICS
==========================================================
If the statistics for a table may be stale, they can easily be recollected:

   COLLECT STATISTICS ON tablename;

will recollect statistics on all indexes and columns for which previous COLLECT STATISTICS statements were done (and for which DROP STATISTICS statements have not been done). Because collecting statistics involves a full-table scan, it may take a significant amount of time, and should therefore be done off-hours for large tables. It may be revealing to execute:

   HELP STATISTICS tablename;

before and after recollecting statistics, to see what, if any, difference the recollect makes. Also, for frequently executed queries, requesting an EXPLAIN before and after recollecting statistics may show differences in join plans and/or spool row count/processing time estimates.
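Put together, the check-and-refresh cycle looks roughly like this (the query inside the EXPLAIN is a placeholder; use one of your own frequently executed queries):

   HELP STATISTICS tablename;                /* note dates and unique-value counts     */
   EXPLAIN SELECT ... FROM tablename ... ;   /* note join plan and row estimates       */
   COLLECT STATISTICS ON tablename;          /* refresh all previously collected stats */
   HELP STATISTICS tablename;                /* counts should now match reality        */
   EXPLAIN SELECT ... FROM tablename ... ;   /* compare plan and estimates             */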
In addition to keeping statistics current, the Reference Manual also recommends that statistics be collected for the primary index of small tables involved in joins.
STALE STATISTICS EXAMPLE:

Here is an example of stale statistics. The table definition is:

   CREATE TABLE bpaefer.taemg22, FALLBACK,
      ( m22_m_fer_ff       DECIMAL(10,2) NOT NULL
      , m22_d_fin_validite DATE FORMAT 'YYYY-MM-DD'
      , m22_m_fer_dev      DECIMAL(9,2) NOT NULL
      , m08_c_uic_m08      CHAR(2) NOT NULL
      , m22_d_deb_v_id     DATE FORMAT 'YYYY-MM-DD' NOT NULL
      , p66_c_code_id      CHAR(8) NOT NULL
      )
   UNIQUE PRIMARY INDEX ( m22_d_deb_v_id, p66_c_code_id ) ;
Statistics were collected on the unique primary index and individually on each of the two columns comprising that index:

   COLLECT STATISTICS ON bpaefer.taemg22
      INDEX ( m22_d_deb_v_id, p66_c_code_id ) ;

   COLLECT STATISTICS ON bpaefer.taemg22
      COLUMN m22_d_deb_v_id ;

   COLLECT STATISTICS ON bpaefer.taemg22
      COLUMN p66_c_code_id ;
A recent check of this table and its statistics revealed the following:

   SELECT COUNT(*) FROM bpaefer.taemg22;

resulted in:

      Count(*)
   -----------
          9203

and

   HELP STATISTICS bpaefer.taemg22;

resulted in:

   Date     Time            Unique Values Column Names
   -------- -------- -------------------- ----------------------------
   93/11/03 12:39:55                   17 M22_D_DEB_V_ID
   93/11/03 12:40:09                2,095 P66_C_CODE_ID
   93/11/03 13:57:17                2,275 M22_D_DEB_V_ID,P66_C_CODE_ID
Note that statistics were most recently collected almost three years ago (as of the date of this TTIP). Also note that the last line from the HELP STATISTICS is for the UNIQUE primary index, and therefore reflects the number of rows in the table when the statistics were last gathered. At that time, there were 2,275 rows in the table; there are currently 9,203 rows in the table, a difference of more than a factor of four. Finally, note that the M22_D_DEB_V_ID column is a DATE, with relatively few unique values. With the four-fold increase in the number of rows, it is almost certain that additional values for this column are present.

It is clearly time to recollect this table's statistics.
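Since all three statistics on this table were gathered with explicit COLLECT STATISTICS statements, a single recollection statement (as described above) refreshes all of them at once:

   COLLECT STATISTICS ON bpaefer.taemg22;

Running HELP STATISTICS bpaefer.taemg22; afterwards should then show a current date and a unique-value count of 9,203 for the unique primary index.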
================================================================================
Script to generate COLLECT STATISTICS statements for all non-unique indexes that have not had statistics collected
================================================================================
Tech Note:

The SQL below will generate COLLECT STATISTICS statements for all non-unique indexes that have not had statistics collected, except in the databases crashdumps and DBC. NOTE: You must log on as DBC to generate the batch job and execute it. After executing the script there is a file named 'collect' that contains the COLLECT STATISTICS statements.
/*=================================================================*/

/* Create a working database. */
create database stats as perm = 10000000;

/* Grant a right needed for the macro. */
grant select on DBC to stats with grant option;

/* Create the work table. */
ct stats.idx_info
   (tvmid       byte(6)
   ,indexnumber smallint
   ,NameText    varchar(512))
unique primary index (tvmid, indexnumber);

/* Create the macro. */
replace macro stats.build_idx_info (pass smallint)
as
(
   update stats.idx_info
   set NameText = trim(NameText) || ',' || trim(DBC.TVFields.fieldname)
   where idx_info.tvmid = DBC.Indexes.tableid
     and idx_info.indexnumber = Indexes.indexnumber
     and TVFields.tableid = Indexes.tableid
     and TVFields.fieldid = Indexes.fieldid
     and DBC.TVM.tvmid = Indexes.tableid
     and DBC.Dbase.databaseid = TVM.databaseid
     and Indexes.fieldposition = :pass;
);

/* Populate the work table. */
ins stats.idx_info
sel  i.tableid
    ,i.indexnumber
    ,trim(f.fieldname)
from DBC.indexes  i
    ,DBC.tvfields f
    ,DBC.dbase    d
    ,DBC.tvm      t
where f.tableid = i.tableid
  and f.fieldid = i.fieldid
  and t.tvmid = i.tableid
  and d.databaseid = t.databaseid
  and i.UniqueFlag <> 'Y'
  and i.fieldposition = 1
  and i.indexstatistics is null
  and d.databasenamei NOT IN ('dbc','crashdumps');

/* Determine N. */
sel max(i.fieldposition)
from DBC.indexes i
    ,DBC.dbase   d
    ,DBC.tvm     t
where t.tvmid = i.tableid
  and d.databaseid = t.databaseid
  and d.databasenamei NOT IN ('dbc','crashdumps')
  and i.UniqueFlag <> 'Y';

/* Run macro to add fields. */
/* Run multiple times. */
/* Use arguments 2 to N. */
exec stats.build_idx_info(2);
/* ADD ADDITIONAL MACRO EXECUTIONS HERE!!! */
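/* (Illustration only, not part of the original script: if the widest  */
/* non-unique index found by the "Determine N" query above has, say,   */
/* four columns, i.e. N = 4, the additional executions would be:)      */
exec stats.build_idx_info(3);
exec stats.build_idx_info(4);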
/* Generate grant statements. */
.width 254
.export report file=grant
sel 'grant drop table on ' ||
    trim(d.databasename)   ||
    ' to DBC;' (title '')
from stats.idx_info idx
    ,DBC.tvm   t
    ,DBC.dbase d
where idx.tvmid = t.tvmid
  and t.databaseid = d.databaseid
group by 1
order by 1;
.export reset

/* Generate collect statistics statements. */
.export report file=collect
sel 'collect statistics ' ||
    trim(d.databasename)  ||
    '.'                   ||
    trim(t.tvmname)       ||
    ' index('             ||
    trim(idx.NameText)    ||
    ');' (title '')
from stats.idx_info idx
    ,DBC.tvm   t
    ,DBC.dbase d
where idx.tvmid = t.tvmid
  and t.databaseid = d.databaseid
order by d.databasename
        ,t.tvmname
        ,idx.indexnumber;
.export reset

/* Your work is done except for running the files. */
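The generated 'collect' file ends up containing one statement per previously unanalyzed non-unique index, along these (hypothetical) lines:

   collect statistics dss_tables.orders index(cust_id);
   collect statistics dss_tables.orders index(order_date,region_cd);

From BTEQ, the generated files can then be executed with the .RUN command, e.g. .RUN FILE=collect; (run the 'grant' file first, since the grants it produces may be required before the collects will succeed).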
============================================================
GSC Canary Scripts for Teradata Performance Update
============================================================
Condition/Symptom:

The Canary script has been replaced, starting with V2R4.1, by a built-in tool.

Here is some important information regarding canary scripts:

We were asked why a canary trigger for one customer site didn't fire when they were constantly busy or were having problems.

There can be multiple reasons, but one thing you need to check is that the canary script is logging on to the right performance group.

When you (or the customer) install the canary script, you want the user to log on in the same performance group as where the problem is occurring. The default is to use SYSTEMFE as the userid; the systemfe user usually runs under $M in the default partition.

If the customer has a lot of user partitions, most likely the perflook will not be triggered, since the default partition usually has the highest weights.

For this site, we found that the canary was logging in as $H.

You can find this in canary.sh: if you go to the bottom of the script or search for "logon", you will find the userid. In this case the userid was CanaryUser.
# Part of canary.sh
#---------------------------------------------------------
bteq > $MONITOR_DIR/bteq.o 2> $MONITOR_DIR/bteq.e <<- [end]
.logon sysc/CanaryUser,hm53a2
.export file=/dev/null
SELECT * FROM DBC.DBCINFO;
.quit
[end]
Once you find the userid, you will need to find the associated account string. Log on to the database, then run the following select statement (replace canaryuser with the userid you found in your script):

   select accountname from DBC.DATABASES where databasename = 'canaryuser';

BTEQ -- Enter your DBC/SQL request or BTEQ command:
select accountname from dbc.databases where databasename = 'canaryuser';

*** Query completed. One row found. One column returned.
*** Total elapsed time was 1 second.

AccountName
------------------------------
$H$|&I

For this userid, canaryuser, the user is logging on as $H. Therefore the user is getting a very high priority and probably will not trigger the canary unless the system really is hanging.
This information came from the view dbc.databases. If you run HELP VIEW dbc.databases, you will see quite a bit of information about each database or user. Following is some of the information from the help:

   help view dbc.databases;

*** Help information returned. 16 rows.
*** Total elapsed time was 1 second.

Column Name                    Type Comment
------------------------------ ---- ---------------------------------------
DatabaseName                   ?    The Databases.DatabaseName field identi
CreatorName                    ?    The Databases.CreatorName field identif
OwnerName                      ?    The Databases.OwnerName field identifie
AccountName                    ?    The Databases.AccountName field identif
ProtectionType                 ?    The Databases.ProtectionType field spec
JournalFlag                    ?    The Databases.JournalFlag field specifi
PermSpace                      ?    The Databases.PermSpace field specifies
SpoolSpace                     ?    The Databases.SpoolSpace field specifie
TempSpace                      ?    The Databases.TempSpace field specifies
CommentString                  ?    The Databases.CommentString field conta
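To audit this across the whole system -- for instance, to see which users would compete in the same performance group as the canary -- a simple variation on the query above can be used. This is a sketch, not from the original tip; it assumes the performance group is encoded in the leading characters of the account string, as in the $H example above:

   select databasename, accountname
   from   dbc.databases
   where  accountname like '$H%'
   order  by databasename;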
==============================================================
Teradata Query to Summarize Access Logging Data
==============================================================
Tech Note:

The following query will provide a summary report on the Access Logging data stored in Teradata:

Select DT.ld  (format 'mm/dd/yyyy', title 'Date')
     , DT.st  (title '//Statement//Type')
     , DT.db  (format 'X(13)', title 'DatabaseName')
     , DT.cnt (title 'Number//of//Statements')
FROM (Select LogDate
           , StatementType
           , DatabaseName
           , count(*)
      FROM  dbc.accesslog
      WHERE StatementType not in ('Select', 'Begin/End Logging', 'Show', 'Exec')
      GROUP BY 1,2,3) DT(ld, st, db, cnt)
GROUP BY 1,2,3,4
WITH sum(DT.cnt) (title 'Total') BY DT.ld
ORDER BY 1,3,2;
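Note that dbc.accesslog only contains rows for statements you have asked Teradata to log. If the report comes back empty, access logging probably has not been enabled; as a rough example (exact syntax may vary by release, and 'someuser' is a placeholder), logging every statement type for a particular user can be started with:

   BEGIN LOGGING ON EACH ALL BY someuser;

and stopped again with the corresponding END LOGGING statement.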
Here is a sample of the output:

                                              Number
           Statement                              of
Date       Type                 DatabaseName  Statements
---------- -------------------- ------------- -----------
10/29/2001 Create User          DBC                     1
                                              -----------
                                Total                   1

10/30/2001 Grant                dbc                     1
10/30/2001 Replace Macro        DBC                     6
10/30/2001 Replace View         DBC                     2
10/30/2001 Collect Statistics   dss_tables             97
10/30/2001 Create Index         dss_tables             41
10/30/2001 Create Table         dss_tables             12
10/30/2001 Delete               dss_tables             26
10/30/2001 Drop Index           dss_tables             15
10/30/2001 Drop Statistics      dss_tables              3
10/30/2001 Drop Table           dss_tables             12
10/30/2001 Help                 dss_tables             72
10/30/2001 Insert               dss_tables            114
10/30/2001 Rename Table         dss_tables              6
10/30/2001 Update               dss_tables            542
                                              -----------
                                Total                 949