What is the meaning of skewness in Teradata? When do we use the NULLIF function in Teradata?
Skewness:
Skewness is a statistical term that refers to the row distribution across AMPs. If the data is highly skewed, some AMPs hold many rows and others very few, i.e. the data is not evenly distributed.
This hurts performance, because it defeats Teradata's parallelism. Data distribution, and therefore skewness, is controlled by the choice of indexes.
When we choose a poor primary index, the data is distributed unevenly across the AMPs: some AMPs hold many more records than others. This is called skewness.
The percentage of skewness is called the skew factor; a skew factor of up to about 30% is generally considered acceptable.
We use NULLIF when we want to return a NULL value: NULLIF(expression1, expression2) returns NULL when the two expressions are equal, otherwise it returns the first expression.
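For example, a common use is to avoid division-by-zero errors (a minimal sketch; the table and column names are illustrative only):
SELECT order_amt / NULLIF(order_qty, 0) AS unit_price
FROM orders_sample;
-- NULLIF(order_qty, 0) returns NULL when order_qty = 0, so the division yields NULL instead of an error.
Teradata also provides NULLIFZERO(x), a shorthand for NULLIF(x, 0), which appears in several of the queries below.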
In Teradata Administrator you can simply right-click the table and select "Space Summary" to see how its space is distributed.
How will you avoid skewness?
Data or AMP skew occurs in Teradata due to uneven distribution of data across the AMPs, and it often leads to spool space errors as well. To avoid skewness, select a Primary Index that has as many unique values as possible.
PI columns like month, day, etc. have very few unique values, so during data distribution only a few AMPs hold all the data, resulting in skew. If a column (or a combination of columns) that enforces uniqueness on the table is chosen as the PI, the data distribution will be even and the table will not be skewed, as sketched below.
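A minimal sketch of the difference (the table and column names are illustrative only, not from the original post):
-- Skew-prone: the PI column has very few distinct values,
-- so only a handful of AMPs receive rows
CREATE TABLE sales_by_month
( sale_month  INTEGER
, sale_amt    DECIMAL(12,2)
) PRIMARY INDEX (sale_month);

-- Evenly distributed: a unique PI spreads rows across all AMPs
CREATE TABLE sales_by_id
( sale_id     INTEGER NOT NULL
, sale_month  INTEGER
, sale_amt    DECIMAL(12,2)
) UNIQUE PRIMARY INDEX (sale_id);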
Typical reasons for skewness:
(1) Skewed tables: bad choice of PI, skewed data
(2) Bad execution plans (typically skewed redistributions)
(3) Bad data model (normalization, data types, PI, etc.)
(4) Missing or stale statistics
(5) Too many joins (break up the query!)
(6) Hash collisions (a load-time problem)
How to find the skew factor for a Teradata table (query to find the SKEW FACTOR of a particular table):
When Teradata tables are not distributed evenly, this can cause significant performance problems, and storage space is not utilized properly either. For that reason, identifying and monitoring tables that are not distributed evenly across AMPs is very important when working with Teradata databases.
The following query reveals the skew factor for the SALES and CUSTOMERS tables in the current database:
SELECT DatabaseName, TableName, SUM(CurrentPerm) AS CurrentPerm, SUM(PeakPerm) AS PeakPerm,
       (100 - (AVG(CurrentPerm) / MAX(CurrentPerm) * 100)) AS SkewFactor
FROM DBC.TableSize
WHERE DatabaseName = DATABASE
  AND TableName IN ('SALES', 'CUSTOMERS')
GROUP BY 1, 2
ORDER BY SkewFactor DESC;
Query to find the SKEW FACTOR of a particular table:
SELECT TableName, SUM(CurrentPerm) / (1024*1024) AS CurrentPerm,
       (100 - (AVG(CurrentPerm) / MAX(CurrentPerm) * 100)) AS SkewFactor
FROM DBC.TableSize
WHERE DatabaseName = <DATABASENAME>
  AND TableName = <TABLENAME>
GROUP BY 1;
(or)
SELECT DatabaseName, TableName,
       CAST((100 - (AVG(a.CurrentPerm) / MAX(a.CurrentPerm) * 100)) AS INTEGER) AS SkewFactor
FROM DBC.TableSize a
WHERE DatabaseName = 'DatabaseName'
  AND TableName = 'TableName'
GROUP BY 1, 2
ORDER BY 1;
Finding the skew factor at table level:
SELECT DatabaseName, TableName,
       100 * (1 - (AVG(CurrentPerm) / MAX(CurrentPerm))) AS skew_factor
FROM DBC.TableSize
WHERE DatabaseName = 'TEMP_DB'
GROUP BY 1, 2
ORDER BY 1, 2;
To find available and used space in Teradata (GB):
SELECT DB.OwnerName, DS.DatabaseName,
       CAST(SUM(MaxPerm)     / 1024/1024/1024 AS DECIMAL(7,1)) AS MaxPerm_g,
       CAST(SUM(CurrentPerm) / 1024/1024/1024 AS DECIMAL(7,1)) AS CurrPerm_g,
       CAST(SUM(MaxPerm)/1024/1024/1024 - SUM(CurrentPerm)/1024/1024/1024 AS DECIMAL(7,1)) AS FreePerm_g,
       CAST(SUM(CurrentPerm) * 100 / NULLIFZERO(SUM(MaxPerm)) AS DECIMAL(5,2)) AS PctUsed,
       SUM(MaxPerm) AS MaxPerm,
       SUM(CurrentPerm) AS CurrPerm,
       SUM(MaxPerm) - SUM(CurrentPerm) AS FreePerm,
       DB.CommentString
FROM DBC.DiskSpace DS, DBC.Databases DB
WHERE DB.DatabaseName LIKE '%IDR%'
  AND DS.DatabaseName = DB.DatabaseName
GROUP BY 1, 2, DB.CommentString
ORDER BY 1, SUM(MaxPerm) - SUM(CurrentPerm) DESC, SUM(MaxPerm) DESC, DS.DatabaseName;
Note: To fix skew problems, you need to change the table's primary index.
For an empty table, you can use ALTER TABLE:
ALTER TABLE orders MODIFY PRIMARY INDEX (ORDER_NUM);
For an already populated table you might need to recreate the table. You can use the following sequence:
Step 1. Create an empty copy of the table:
CREATE TABLE orders_tmp AS (SELECT * FROM orders) WITH NO DATA;
Step 2. Alter the primary index of the new empty table:
ALTER TABLE orders_tmp MODIFY PRIMARY INDEX (ORDER_NUM);
Step 3. Copy the data from the skewed table:
INSERT INTO orders_tmp SELECT * FROM orders;
Step 4. Move the old table out of the way and rename the new table:
RENAME TABLE orders TO orders_skewed;
RENAME TABLE orders_tmp TO orders;
You might want to keep the old table for a while; once you are sure you no longer need it, drop it:
DROP TABLE orders_skewed;
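As a variation (not from the original post, so verify the syntax on your Teradata release), the new primary index can also be specified directly in the CREATE TABLE ... AS statement, which removes the separate ALTER step:
-- assumed CTAS syntax with an explicit PI; check support on your release
CREATE TABLE orders_tmp AS (SELECT * FROM orders) WITH NO DATA
PRIMARY INDEX (ORDER_NUM);
INSERT INTO orders_tmp SELECT * FROM orders;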
The skew factor describes the row distribution: if the rows are uniformly distributed, the skew factor is zero. The skew factor is the complement of parallel efficiency (Parallel Efficiency = 100 - Skew factor, as derived later in this post).
Finding Skewed Tables in Teradata
Skewed Tables in Teradata:
Teradata distributes the data of a table based on its Primary Index, and the Primary Index alone. If the Primary Index is not selected appropriately, it can cause performance bottlenecks.
The following are the general parameters we should consider while creating Primary Indexes:
1: Access path
2: Volatility
3: Data distribution
Volatility is usually not an issue in a well-designed database; that is to say, we do not expect UPDATE statements that change the primary index columns themselves. This usually leaves us with data distribution and access path.
Access path means that a particular column (or set of columns) is always used in join conditions. The advantage of using these columns as the PI is that it avoids redistribution of data during joins (one of the most expensive operations in Teradata) and can therefore reduce the usage of spool files.
Data distribution means the data is spread evenly among all AMPs of the Teradata system, which ensures that all AMPs do the same amount of work. With a skewed distribution one AMP ends up doing most of the work, which can increase execution time considerably.
The following queries can be used to determine table skewness.
SELECT DatabaseName, TableName, SUM(CurrentPerm) AS CurrentPerm, SUM(PeakPerm) AS PeakPerm,
       (100 - (AVG(CurrentPerm) / MAX(CurrentPerm) * 100)) AS SkewFactor
FROM DBC.TableSize
WHERE TableName = :TABLENAME
GROUP BY 1, 2;
However, tables that have very few rows (especially fewer rows than the number of AMPs) are inherently skewed and cannot be changed, so we should also take the size of the table into account when considering skewness.
The query can be customized further to give better detail:
SELECT DATABASENAME, TABLENAME, SKEW_CATEGORY FROM (
  SELECT DATABASENAME, TABLENAME,
         (MAX(CURRENTPERM) - AVG(CURRENTPERM)) / MAX(CURRENTPERM) * 100 AS SKEW_FACTOR,
         SUM(CURRENTPERM) / (1024*1024*1024) AS ACTUAL_SPACE,
         CASE WHEN SKEW_FACTOR > 90 THEN 'EXTREMELY SKEWED, TO BE RECTIFIED IMMEDIATELY'
              WHEN SKEW_FACTOR > 70 THEN 'HIGHLY SKEWED'
              WHEN SKEW_FACTOR > 50 THEN 'SKEWED TABLE'
              WHEN SKEW_FACTOR > 40 THEN 'SLIGHTLY SKEWED'
              WHEN SKEW_FACTOR > 30 AND ACTUAL_SPACE > .5 THEN 'SLIGHTLY SKEWED'
              ELSE 'ACCEPTABLE'
         END AS SKEW_CATEGORY
  FROM DBC.TABLESIZE
  WHERE DATABASENAME IN ('DATABASENAME')
  GROUP BY 1, 2
) x
ORDER BY SKEW_FACTOR DESC;
To find the distribution of data among the AMPs, we can use the hash functions as follows:
SELECT HASHAMP(HASHBUCKET(HASHROW( Column ))), COUNT(*)
FROM DATABASENAME.TABLENAME
GROUP BY 1;
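For instance, to see how a candidate PI column would spread rows across the AMPs (the database, table and column names here are illustrative only):
SELECT HASHAMP(HASHBUCKET(HASHROW(Customer_id))) AS Amp_no
     , COUNT(*) AS Row_cnt
FROM Retail_db.Sales
GROUP BY 1
ORDER BY 2 DESC;   -- a large gap between the top and bottom counts indicates skew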
This technique can further be exploited to find the distribution for a candidate primary index as follows:
SEL 'ColumnName', (MAX(CN)-AVG(CN))/MAX(CN)*100
FROM ( SEL VPROC, COUNT(*) CN
       FROM DATABASENAME.TABLENAME
       RIGHT OUTER JOIN (SEL VPROC FROM DBC.TABLESIZE GROUP BY 1) X
         ON VPROC = HASHAMP(HASHBUCKET(HASHROW( ColumnName )))
       GROUP BY 1 ) C
GROUP BY 1;
This query can also be used to check which single column would fit best as the primary index on the sole criterion of data distribution.
The following generic query generates the SQL statements that check the distribution of each column of a table among the AMPs:
SEL 'SEL ' || '27'xc || COLUMNNAME || '27'xc || ' , (MAX(CN)-AVG(CN))/MAX(CN)*100
     FROM ( SEL VPROC, COUNT(*) CN FROM ' || DATABASENAME || '.' || TABLENAME || '
     RIGHT OUTER JOIN (SEL VPROC FROM DBC.TABLESIZE GROUP BY 1) X
     ON VPROC = HASHAMP(HASHBUCKET(HASHROW( ' || COLUMNNAME || ' )))
     GROUP BY 1) C GROUP BY 1
     UNION ALL ' AS Title
FROM DBC.COLUMNS
WHERE TABLENAME = '<TABLENAME>' AND DATABASENAME = '<DATABASENAME>';
What does the skew metric mean?
You can see the words "skewness" or "skew factor" in a lot of places regarding Teradata: documents, applications, skewed tables, skewed CPU. It indicates that something is wrong, but what does it explicitly mean, and how should we interpret it?
Teradata is a massively parallel system, where uniform units (AMPs) do the same tasks on the data parcel they are responsible for. In an ideal world all AMPs share the work equally and no AMP has to work more than the average. The reality is far colder: it is a rare situation when this equality (called "even distribution") exists.
It is obvious that uneven distribution leads to poor use of the parallel infrastructure. But how bad is the situation? That is exactly what skewness characterizes.
Let "RESOURCE" mean the amount of a resource (CPU, I/O, PERM space) consumed by an AMP, and let AMPno be the number of AMPs in the Teradata system.
Skew factor := 100 - ( AVG("RESOURCE") / NULLIFZERO( MAX("RESOURCE") ) * 100 )
Total[Resource] := SUM("RESOURCE")
Impact[Resource] := MAX("RESOURCE") * AMPno
Parallel Efficiency := Total[Resource] / Impact[Resource] * 100
or, with some transformation:
Parallel Efficiency := 100 - Skew factor
Analysis
Codomain:
0 <= "Skew factor" < 100
"Total[Resource]" <= "Impact[Resource]"
0 < "Parallel Efficiency" <= 100
Meaning
Skew factor: this percentage of the consumed real resources is wasted. E.g. a 1 Gbyte table with a skew factor of 75 will allocate 4 Gbytes.*
Total[Resource]: virtual resource consumption, the simple sum of the individual resource consumptions measured on the AMPs as if they were independent systems.
Impact[Resource]: real resource consumption impacted on the parallel infrastructure.
Parallel Efficiency: as the name says. E.g. a skew factor of 80 means 20% parallel efficiency.
* Note: theoretically, a complementary resource allocation may exist (something that consumes less on exactly the AMP where my load has its excess) and compensate for the parallel inefficiency from the system's point of view, but the probability of this tends to zero.
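A worked example of the formulas above (the numbers are hypothetical): suppose a 10-AMP system where one AMP consumed 100 CPU seconds and the other nine consumed 25 CPU seconds each, so AVG = 32.5, MAX = 100 and SUM = 325.
SELECT 100 - (32.5 / NULLIFZERO(100.0) * 100) AS Skew_factor          -- 67.5
     , 325.0                                  AS Total_Resource       -- SUM of per-AMP CPU
     , 100.0 * 10                             AS Impact_Resource      -- MAX * AMPno = 1000
     , 325.0 / (100.0 * 10) * 100             AS Parallel_Efficiency; -- 32.5 = 100 - 67.5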
How many types of skew exist?
If TD resources (CPU, AMP, I/O, disk, etc.) are utilized unequally, skew exists. There are three major types of skew: CPU skew, AMP/data skew and I/O skew.
- Data skew: the data is not distributed equally across all the AMPs.
- CPU skew: one AMP consumes much more CPU than the others.
- I/O skew: one AMP performs many more I/O operations than the others.
Explain about Skew Factor?
The (un)evenness of a table's data distribution among the AMPs is called the skew factor. With a non-unique PI we get duplicate values; the more duplicate values there are, the more rows share the same row hash, so all of those rows land on the same AMP. This makes the data distribution unequal: one AMP stores much more data than the others.
When we access the full table, the AMP that holds more data takes longer and keeps the other AMPs waiting, which wastes processing capacity. In this situation (unequal distribution of data) the skew factor is high, and for such tables we should avoid full table scans.
Ex:
AMP0: 10000 (10%)
AMP1: 9000000 (90%)
In this situation the skew factor is very high, around 90%.
How to "re-parallelize" skewed joins
Case description:
Assume that we have 1M customers and 4M transactions, and that our top customer produces 2.5% of all transactions, while the others produce the remaining 97.5% approximately evenly. (The original post provided the sample table and data generator SQL at its bottom; a minimal sketch of the tables follows below.)
Our task is to join the "Customer" and "Transaction" tables on Customer_id.
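A minimal sketch of the two tables (the column lists and data types are assumptions for illustration; the unique primary index on Customer is consistent with the execution plans quoted below):
CREATE TABLE Customer
( Customer_id    INTEGER NOT NULL
, Customer_name  VARCHAR(200)
) UNIQUE PRIMARY INDEX (Customer_id);

CREATE TABLE Transact
( Transact_id    INTEGER NOT NULL
, Customer_id    INTEGER
, Amount         DECIMAL(12,2)
) UNIQUE PRIMARY INDEX (Transact_id);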
The join:
SELECT Customer_name, count(*)
FROM Customer c
JOIN Transact t ON c.Customer_id = t.Customer_id
GROUP BY 1;
We experience a pretty slow execution, and on ViewPoint we see that only one AMP is working while the others are not.
What is the problem?
There are two separate subsets of the Transact table from a "joinability" point of view:
· The "peak" part (the records of the top customer(s)): very few customers have very many Transact records, so a product join would be cost effective here.
· The "even" part (the records of the other customers): many customers each have evenly few Transact records, so a merge join would be ideal here.
Unfortunately the Optimizer has to decide on a single plan; only one operation type can be chosen, and it chooses the merge join, which consumes far less CPU time overall.
Execution plan looks like this:
This query is optimized using type 2 profile T2_Linux64, profileid 21.
1) First, we lock a distinct D_DB_TMP."pseudo table" for read on a RowHash to prevent global deadlock for D_DB_TMP.t.
2) Next, we lock a distinct D_DB_TMP."pseudo table" for read on a RowHash to prevent global deadlock for D_DB_TMP.c.
3) We lock D_DB_TMP.t for read, and we lock D_DB_TMP.c for read.
4) We do an all-AMPs RETRIEVE step from D_DB_TMP.t by way of an all-rows scan with a condition of ("NOT (D_DB_TMP.t.Customer_ID IS NULL)") into Spool 4 (all_amps), which is redistributed by the hash code of (D_DB_TMP.t.Customer_ID) to all AMPs. Then we do a SORT to order Spool 4 by row hash. The size of Spool 4 is estimated with low confidence to be 125 rows (2,125 bytes). The estimated time for this step is 0.01 seconds.
5) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of a RowHash match scan, which is joined to D_DB_TMP.c by way of a RowHash match scan. Spool 4 and D_DB_TMP.c are joined using a merge join, with a join condition of ("D_DB_TMP.c.Customer_ID = Customer_ID"). The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The size of Spool 3 is estimated with index join confidence to be 125 rows (10,375 bytes). The estimated time for this step is 0.02 seconds.
6) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan, grouping by field1 (D_DB_TMP.c.Customer_name). Aggregate Intermediate Results are computed globally, then placed in Spool 5. The size of Spool 5 is estimated with no confidence to be 94 rows (14,758 bytes). The estimated time for this step is 0.02 seconds.
7) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of an all-rows scan into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 94 rows (8,742 bytes). The estimated time for this step is 0.02 seconds.
8) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.07 seconds.
How to identify it
If you experience an extremely asymmetric AMP load, you can suspect this case.
Find highly skewed JOIN steps in the DBQL (set all logging options on):
select top 50
  a.MaxAMPCPUTime * (hashamp()+1) / nullifzero(a.CPUTime) Skw
, a.CPUTime
, a.MaxAMPCPUTime * (hashamp()+1) CoveringCPUTime
, b.*
from dbc.dbqlsteptbl a
join dbc.dbqlogtbl b on a.procid=b.procid and a.queryid=b.queryid
where StepName='JIN' and CPUtime > 100 and Skw > 2
order by CoveringCPUTime desc;
(Note: "covering CPU time" is <No-of-AMPs> * <Max AMP's CPU time>. Virtually this amount of CPU is consumed because of the asymmetric load on the system.)
Or, if you suspect a specific query, check the demography of the join field(s) in the "big" table:
SELECT TOP 100 <Join_field>, count(*) Nbr
FROM <Big_table> GROUP BY 1 ORDER BY 2 DESC;
If the top occurrences are spectacularly larger than the others (or than the average), this is likely the situation described here.
Solution:
Break the query into two parts: join the top customer(s) separately, then all others, and finally UNION the results. (Sometimes additional modification is also required if the embedding operation(s), the GROUP BY here, are not decomposable on the same parameter.)
First we have to identify the top customer(s):
SELECT TOP 5 Customer_id, count(*) Nbr
FROM Transact GROUP BY 1 ORDER BY 2 DESC;
Customer_id      Nbr
-----------   ------
        345   100004
     499873        4
     677423        4
     187236        4
      23482        4
Replace the original query with this one:
SELECT Customer_name, count(*)
FROM Customer c JOIN Transact t ON c.Customer_id = t.Customer_id
WHERE t.Customer_id IN (345)  /* ID of the top customer(s); if more customers are salient, list them, but max ~5 */
GROUP BY 1
UNION ALL
SELECT Customer_name, count(*)
FROM Customer c JOIN Transact t ON c.Customer_id = t.Customer_id
WHERE t.Customer_id NOT IN (345)  -- same customer(s)
GROUP BY 1;
Be sure that Customer.Customer_id, Transact.Transact_id and Transact.Customer_id have statistics!
This query is more complex, has more steps and scans the Transact table twice, but it runs much faster; you can check it.
But why? And how do we determine which "top" customers are worth handling separately?
Explanation / calculation:
Assume that we are on a 125-AMP system.
The Customer table contains 1M records with unique IDs.
We have ~4.1M records in the Transact table: 100k for the top customer (ID=345) and 4 for each of the other customers. This matches the 2.5% we assumed above.
If the Transact table is redistributed on hash(Customer_id), we get ~33k records on each AMP, except AMP(hash(345)), where we get ~133k (33k + 100k).
That means this AMP has to process ~4x more data than the others and therefore runs 4x longer. In other words, for 75% of this JOIN step's time, 124 of the AMPs will do nothing with the query.
Moreover, the preparation and subsequent steps are also problematic: the JOIN is prepared by a redistribution which produces a strongly skewed spool, and the JOIN's result, built locally on the AMPs, is skewed as well.
Optimized version:
The rewritten query consumes moderately more CPU, but the work is distributed evenly across the AMPs, utilizing Teradata's full parallel capability.
It contains a product join as well, but that is no problem: it joins 1 record to the selected ~100k Transact records, which is lightning fast.
Look at the execution plan of the broken-up query:
This query is optimized using type 2 profile T2_Linux64, profileid 21.
1) First, we lock a distinct D_DB_TMP."pseudo table" for read on a RowHash to prevent global deadlock for D_DB_TMP.t.
2) Next, we lock a distinct D_DB_TMP."pseudo table" for read on a RowHash to prevent global deadlock for D_DB_TMP.c.
3) We lock D_DB_TMP.t for read, and we lock D_DB_TMP.c for read.
4) We do a single-AMP RETRIEVE step from D_DB_TMP.c by way of the unique primary index "D_DB_TMP.c.Customer_ID = 345" with no residual conditions into Spool 4 (all_amps), which is duplicated on all AMPs. The size of Spool 4 is estimated with high confidence to be 125 rows (10,625 bytes). The estimated time for this step is 0.01 seconds.
5) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an all-rows scan, which is joined to D_DB_TMP.t by way of an all-rows scan with a condition of ("D_DB_TMP.t.Customer_ID = 345"). Spool 4 and D_DB_TMP.t are joined using a product join, with a join condition of ("Customer_ID = D_DB_TMP.t.Customer_ID"). The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The size of Spool 3 is estimated with low confidence to be 99,670 rows (8,272,610 bytes). The estimated time for this step is 0.09 seconds.
6) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan, grouping by field1 (D_DB_TMP.c.Customer_name). Aggregate Intermediate Results are computed globally, then placed in Spool 5. The size of Spool 5 is estimated with no confidence to be 74,753 rows (11,736,221 bytes). The estimated time for this step is 0.20 seconds.
7) We execute the following steps in parallel.
  1) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of an all-rows scan into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 74,753 rows (22,052,135 bytes). The estimated time for this step is 0.02 seconds.
  2) We do an all-AMPs RETRIEVE step from D_DB_TMP.t by way of an all-rows scan with a condition of ("D_DB_TMP.t.Customer_ID <> 3454") into Spool 9 (all_amps), which is redistributed by the hash code of (D_DB_TMP.t.Customer_ID) to all AMPs. The size of Spool 9 is estimated with high confidence to be 4,294,230 rows (73,001,910 bytes). The estimated time for this step is 1.80 seconds.
8) We do an all-AMPs JOIN step from D_DB_TMP.c by way of an all-rows scan with a condition of ("D_DB_TMP.c.Customer_ID <> 3454"), which is joined to Spool 9 (Last Use) by way of an all-rows scan. D_DB_TMP.c and Spool 9 are joined using a single partition hash join, with a join condition of ("D_DB_TMP.c.Customer_ID = Customer_ID"). The result goes into Spool 8 (all_amps), which is built locally on the AMPs. The size of Spool 8 is estimated with low confidence to be 4,294,230 rows (356,421,090 bytes). The estimated time for this step is 0.72 seconds.
9) We do an all-AMPs SUM step to aggregate from Spool 8 (Last Use) by way of an all-rows scan, grouping by field1 (D_DB_TMP.c.Customer_name). Aggregate Intermediate Results are computed globally, then placed in Spool 10. The size of Spool 10 is estimated with no confidence to be 3,220,673 rows (505,645,661 bytes). The estimated time for this step is 8.46 seconds.
10) We do an all-AMPs RETRIEVE step from Spool 10 (Last Use) by way of an all-rows scan into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 3,295,426 rows (972,150,670 bytes). The estimated time for this step is 0.32 seconds.
11) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 11.60 seconds.
Find Skewed CPU and I/O in DBQL
Please note that the solutions in this article work on the DBQL logs, which cover only the activity of those users for whom logging is switched on.
About skew
If you are not comfortably familiar with skewness or DBQL, please read the corresponding posts first:
Teradata Tuning - Skewness
Teradata Tuning - DBQL I
As you have seen, the difference between Impact[Resource] and Total[Resource] is a net technological loss, which should be minimized.
Skewness is a good indicator for highlighting the workload that wastes lots of valuable CPU and I/O resources because of inefficient parallelism.
Find bad queries in DBQL
With this SQL you can filter the top impacting queries (replace the date value, or adjust the date filtering according to your local settings):
select top 50
  ProcID
, QueryID
, AMPCPUTime
, MaxAMPCPUTime * (hashamp() + 1) CPUImpact
, CAST(100 - ((AmpCPUTime / (hashamp() + 1)) * 100 / NULLIFZERO(MaxAMPCPUTime)) AS INTEGER) "CPUSkew%"
, TotalIOCount
, MaxAMPIO * (hashamp() + 1) IOImpact
, CAST(100 - ((TotalIOCount / (hashamp() + 1)) * 100 / NULLIFZERO(MaxAMPIO)) AS INTEGER) "IOSkew%"
, AMPCPUTime * 1000 / nullifzero(TotalIOCount) LHR
, TotalIOCount / nullifzero(AMPCPUTime * 1000) RHL
, ParserCPUTime
, Queryband
, Substr(QueryText,1,2000) QueryText
from
/* For archived DBQL:
   dbql_arch.dbqlogtbl_hst where logdate=1131201 and ampcputime>0
*/
/* For online DBQL: */
dbc.dbqlogtbl where
  cast(cast(starttime as char(10)) as date) = '2013-12-18' (date)
  and ampcputime>0
order by CPUImpact desc;
Explanation of the extra fields:
ParserCPUTime: the time the parser spent producing the execution plan. This can be high if the SQL is too complex or too much random-AMP sampling has to be done.
LHR/RHL: the Larry Higa (and inverse Larry Higa) index, an empirical index that shows the CPU vs I/O rate. By experience it should usually be around one (the typical value can differ depending on your system configuration, but it is a constant for a given system). If it is far from 1, that indicates CPU or I/O dominance, which means unbalanced resource consumption; note that this is a different dimension from skew.
QueryBand: labels that sessions use to identify themselves within the DBQL logs.
QueryText: the first 200 characters of the query (depending on the DBQL log settings).
OK, we've listed the terrible top consumers, but what's next?
We have to identify those queries. If your ETL and analytics software is configured to use QueryBand properly (this area deserves a separate post...), you can find which job or report issued the SQL; in any case, you can see the QueryText field.
If you want the full SQL text, select it from DBQLSQLTbl (SQL logging needs to be switched on), replacing the appropriate <procid> and <queryid> values:
select SQLTextInfo from dbc.dbqlsqltbl where
procid=<procid> and queryid=<queryid>
order by SQLRowNo asc
You will get the SQL in several records, broken up into 30K blocks; simply concatenate them. Unfortunately the SQL will come back with very ugly formatting; you can use PRISE Tuning Assistant to beautify and highlight it for easy reading.
System-level skewness:
We've found those bad queries, nice. But what can we say about the whole system? What is the total parallel efficiency? Can we report how much resource was wasted due to bad parallel efficiency?
The answer is: yes, we can estimate it quite closely. We cannot calculate the exact value because DBQL does not log per-AMP information for the query execution, only the most important metrics.
We cannot account for the situation when several skewed queries run at the same time but have their peaks on different AMPs. This reduces the system-level resource waste, but it is hard to calculate; in any case, its probability and effect are negligible here.
select
  sum(AMPCPUTime) AMPCPUTimeSum
, sum(MaxAMPCPUTime * (hashamp() + 1)) CPUImpactSum
, sum(TotalIOCount) TotalIOCountSum
, sum(MaxAMPIO * (hashamp() + 1)) IOImpactSum
, cast(100 - (AMPCPUTimeSum / CPUImpactSum) * 100 as integer) "CPUSkew%"
, cast(100 - (TotalIOCountSum / IOImpactSum) * 100 as integer) "IOSkew%"
from
/* For archived DBQL:
   dbql_arch.dbqlogtbl_hst where logdate = '2013-12-18' (date)
   and (ampcputime>0 or TotalIOCount > 0)
*/
/* For online DBQL: */
dbc.dbqlogtbl where
  cast(cast(starttime as char(10)) as date) = '2013-12-18' (date)
  and (ampcputime>0 or TotalIOCount > 0);
Look at the last two columns: that percentage of your CPU and I/O goes down the sink...
OK, let's check how many queries accumulate 5%, 10%, 25%, 50%, 75% and 90% of this loss.
Here you are (CPU version; transform it for I/O analogously):
select 'How many queries?' as "_"
, min(limit5)  "TOP5%Loss"
, min(limit10) "TOP10%Loss"
, min(limit25) "TOP25%Loss"
, min(limit50) "TOP50%Loss"
, min(limit75) "TOP75%Loss"
, min(limit90) "TOP90%Loss"
, max(rnk) TotalQueries
, sum(ResourceTotal) "TotalResource"
, sum(ResourceImpact) "ImpactResource"
from
(
select
  case when ResRatio <  5.00 then null else rnk end limit5
, case when ResRatio < 10.00 then null else rnk end limit10
, case when ResRatio < 25.00 then null else rnk end limit25
, case when ResRatio < 50.00 then null else rnk end limit50
, case when ResRatio < 75.00 then null else rnk end limit75
, case when ResRatio < 90.00 then null else rnk end limit90
, rnk
, ResourceTotal
, ResourceImpact
from
(
select
  sum(ResourceLoss) over (order by ResourceLoss desc) totalRes
, sum(ResourceLoss) over (order by ResourceLoss desc rows unbounded preceding) subtotalRes
, subtotalRes * 100.00 / totalRes ResRatio
, sum(1) over (order by ResourceLoss desc rows unbounded preceding) rnk
, ResourceTotal
, ResourceImpact
from
(
select
  AMPCPUTime ResourceTotal
, (MaxAMPCPUTime * (hashamp() + 1)) ResourceImpact
, ResourceImpact - ResourceTotal ResourceLoss
/* For archived DBQL:
   from dbql_arch.dbqlogtbl_hst where logdate=1131207 and ampcputime>0
*/
/* For online DBQL: */
from dbc.dbqlogtbl where
  cast(cast(starttime as char(10)) as date) = '2013-12-18' (date)
  and ampcputime>0
) x
) y
) z
group by 1;
I expect you are a bit shocked now at how few queries waste how much golden resource.
I think we will agree that it is worth tuning those dozen queries; you can save on the order of 100K to 1M USD annually for your company, am I right?
https://www.prisetools.com/blog/accelerate-skewed-joins