Wednesday, February 8, 2017

Skew Stuff

What is the meaning of skewness in Teradata? When do we use the NULLIF function in Teradata?

Skewness:

Skewness is a statistical term that refers to the row distribution across AMPs. If the data is highly skewed, some AMPs hold many rows and others very few, i.e. the data is not evenly distributed.

This hurts performance and Teradata's parallelism. Data distribution, and therefore skewness, is controlled mainly by the choice of indexes, above all the Primary Index.

When we choose the wrong Primary Index, the data is distributed unevenly across the AMPs: some AMPs hold more records and some fewer. This is called skewness.
The percentage of skewness is called the skew factor; a skew factor of up to about 30% is generally considered acceptable.

We use NULLIF when we want a NULL returned instead of a value: NULLIF(expr1, expr2) returns NULL when the two expressions are equal, and expr1 otherwise. It is most often used to avoid dividing by zero.
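For example, NULLIF (and Teradata's NULLIFZERO shorthand) typically guards the division in the skew-factor formula used later in this post. A minimal sketch:

SELECT DatabaseName, TableName,
       100 - (AVG(CurrentPerm) / NULLIF(MAX(CurrentPerm), 0) * 100) AS SkewFactor
FROM DBC.TableSize
GROUP BY 1, 2;

If MAX(CurrentPerm) is 0 (an empty table), NULLIF returns NULL and the whole expression evaluates to NULL instead of raising a divide-by-zero error.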

In Teradata Administrator you can simply right-click the table and select "Space Summary".

How will you avoid skewness?

Data or AMP skew occurs in Teradata due to uneven distribution of data across the AMPs. Often this leads to spool space errors too. To avoid skewness, try to select a Primary Index that has as many unique values as possible.

PI columns like month, day, etc. have very few unique values, so during data distribution only a few AMPs end up holding all the data, resulting in skew. If a column (or a combination of columns) that enforces uniqueness on the table is chosen as the PI, the data distribution will be even and the table will not be skewed.
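As an illustration (the table and column names here are hypothetical), compare a low-cardinality PI with a nearly unique one:

/* Skew-prone: sale_month has at most 12 distinct values,
   so at most 12 AMPs ever receive rows */
CREATE TABLE sales_by_month (
  sale_month  INTEGER,
  order_num   INTEGER,
  amount      DECIMAL(12,2)
) PRIMARY INDEX (sale_month);

/* Better: order_num is unique, so rows hash evenly across all AMPs */
CREATE TABLE sales (
  order_num   INTEGER NOT NULL,
  sale_month  INTEGER,
  amount      DECIMAL(12,2)
) UNIQUE PRIMARY INDEX (order_num);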

Typical reasons for skewness:

(1)  Skewed tables: bad choice of PI, skewed data
(2)  Bad execution plans (typically skewed redistributions)
(3)  Bad data model (normalization, data types, PI, etc.)
(4)  Missing or stale statistics (see the example below)
(5)  Too many joins (break up the query!)
(6)  Hash collisions (a load-time problem)
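For reason (4), refreshing statistics on the PI and join columns is straightforward. A minimal sketch (the table and column names are illustrative):

COLLECT STATISTICS COLUMN (Customer_id) ON Transact;
COLLECT STATISTICS COLUMN (Customer_id) ON Customer;

Stale statistics can mislead the Optimizer into plans that redistribute rows onto only a few AMPs.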

How to find the skew factor for a Teradata table (or) query to find the SKEW FACTOR of a particular table:

When Teradata tables are not distributed evenly, this can cause significant performance problems, and storage space is not utilized properly.

For that reason, identifying and monitoring tables that are not distributed evenly across AMPs is very important when working with Teradata databases.

The following query reveals the skew factor for the SALES and CUSTOMERS tables in the current database.

SELECT DatabaseName, TableName,
       SUM(CurrentPerm) AS CurrentPerm,
       SUM(PeakPerm) AS PeakPerm,
       (100 - (AVG(CurrentPerm)/MAX(CurrentPerm)*100)) AS SkewFactor
FROM DBC.TableSize
WHERE DatabaseName = DATABASE
  AND TableName IN ('SALES', 'CUSTOMERS')
GROUP BY 1,2
ORDER BY SkewFactor DESC;

Query to find SKEW FACTOR of a particular table :

SELECT TableName,
       SUM(CurrentPerm)/(1024*1024) AS CurrentPerm,
       (100 - (AVG(CurrentPerm)/MAX(CurrentPerm)*100)) AS SkewFactor
FROM DBC.TableSize
WHERE DatabaseName = <DATABASENAME>
  AND TableName = <TABLENAME>
GROUP BY 1;

(or)

SELECT DatabaseName, TableName,
       CAST((100 - (AVG(a.CurrentPerm)/MAX(a.CurrentPerm)*100)) AS INTEGER) AS SkewFactor
FROM DBC.TableSize a
WHERE DatabaseName = 'DatabaseName'
  AND TableName = 'TableName'
GROUP BY 1,2
ORDER BY 1;

FINDING SKEW FACTOR FOR TABLE LEVEL :

SELECT DatabaseName, TableName,
       100 * (1 - (AVG(CurrentPerm) / MAX(CurrentPerm))) AS skew_factor
FROM DBC.TableSize
WHERE DatabaseName = 'TEMP_DB'
GROUP BY 1,2
ORDER BY 1,2;

To Find available and used space in Teradata (GB) :

SELECT DB.OwnerName, DS.DatabaseName
     , CAST(SUM(MaxPerm)/1024/1024/1024 AS DECIMAL(7,1)) AS MaxPerm_g
     , CAST(SUM(CurrentPerm)/1024/1024/1024 AS DECIMAL(7,1)) AS CurrPerm_g
     , CAST(SUM(MaxPerm)/1024/1024/1024 - SUM(CurrentPerm)/1024/1024/1024 AS DECIMAL(7,1)) AS FreePerm_g
     , CAST(SUM(CurrentPerm)*100/NULLIFZERO(SUM(MaxPerm)) AS DECIMAL(5,2)) AS PctUsed
     , SUM(MaxPerm) AS MaxPerm
     , SUM(CurrentPerm) AS CurrPerm
     , SUM(MaxPerm) - SUM(CurrentPerm) AS FreePerm
     , DB.CommentString
FROM DBC.DiskSpace DS, DBC.Databases DB
WHERE DB.DatabaseName LIKE '%IDR%'
  AND DS.DatabaseName = DB.DatabaseName
GROUP BY 1, 2, DB.CommentString
ORDER BY 1, SUM(MaxPerm) - SUM(CurrentPerm) DESC, SUM(MaxPerm) DESC, DS.DatabaseName;

Note:      To fix skew problems, you need to change the table’s primary index.

For an empty table, you can use ALTER TABLE:

ALTER TABLE orders MODIFY PRIMARY INDEX (ORDER_NUM);

For an already populated table you might need to recreate it. You can use the following sequence:

Step 1. Create an empty table

CREATE TABLE orders_tmp AS (SELECT * FROM orders) WITH NO DATA;

Step 2. Alter the primary index for the new empty table

ALTER TABLE orders_tmp MODIFY PRIMARY INDEX (ORDER_NUM);

 Step 3. Copy data from the skewed table

INSERT INTO orders_tmp SELECT * FROM orders;

Step 4. Remove the old table and rename the new table

RENAME TABLE orders TO orders_skewed;
RENAME TABLE orders_tmp TO orders;

You might want to keep the old table for a while and after you are sure you are not going to need it, drop it:

DROP TABLE orders_skewed;
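Once the swap is done, you might re-run the skew-factor check from above to confirm that the new primary index distributes well:

SELECT TableName,
       SUM(CurrentPerm)/(1024*1024) AS CurrentPerm_MB,
       (100 - (AVG(CurrentPerm)/MAX(CurrentPerm)*100)) AS SkewFactor
FROM DBC.TableSize
WHERE DatabaseName = DATABASE
  AND TableName = 'orders'
GROUP BY 1;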

The skew factor describes the distribution of rows: with a perfectly uniform distribution the skew factor is zero. The skew factor is the inverse of parallel efficiency (Parallel Efficiency = 100 - Skew factor).

Finding Skewed Tables in Teradata

Skewed Tables in Teradata  :

Teradata distributes the data of a table based on its Primary Index, and the Primary Index alone. If the Primary Index is not selected appropriately, it can cause performance bottlenecks.

Following are general parameters we should consider while creating Primary Indexes
      1:Access Path
      2:Volatility
      3: Data Distribution
     
Volatility is usually not an issue in a well designed database. That is to say, we do not expect UPDATE statements that modify the primary index itself. This usually leaves us with data distribution and access path.

Access path implies that a particular column (or set of columns) is always used in join conditions. The advantage of using these columns as the PI is that it avoids redistribution of data during joins (one of the most expensive operations for Teradata) and can therefore reduce spool usage.

Data distribution implies that data is spread evenly among all AMPs of the Teradata system. This ensures that all AMPs do the same amount of work. With a skewed distribution one AMP ends up doing most of the work, which can increase execution time considerably.
The following queries can be used to determine table skewness.

SELECT DatabaseName, TableName,
       SUM(CurrentPerm) AS CurrentPerm,
       SUM(PeakPerm) AS PeakPerm,
       (100 - (AVG(CurrentPerm)/MAX(CurrentPerm)*100)) AS SkewFactor
FROM DBC.TableSize
WHERE TableName = :TABLENAME
GROUP BY 1,2;

However, tables which have very few rows (especially fewer rows than there are AMPs) are inherently skewed, and this cannot be changed. So we should also take the size of the table into account when considering skewness.

The query can be further customized to give better details:

SELECT DATABASENAME, TABLENAME, SKEW_FACTOR, SKEW_CATEGORY FROM (
SELECT DATABASENAME, TABLENAME,
       (MAX(CURRENTPERM)-AVG(CURRENTPERM))/MAX(CURRENTPERM)*100 AS SKEW_FACTOR,
       SUM(CURRENTPERM)/(1024*1024*1024) AS ACTUAL_SPACE,
       CASE WHEN SKEW_FACTOR > 90 THEN 'EXTREMELY SKEWED, TO BE RECTIFIED IMMEDIATELY'
            WHEN SKEW_FACTOR > 70 THEN 'HIGHLY SKEWED'
            WHEN SKEW_FACTOR > 50 THEN 'SKEWED TABLE'
            WHEN SKEW_FACTOR > 40 THEN 'SLIGHTLY SKEWED'
            WHEN SKEW_FACTOR > 30 AND ACTUAL_SPACE > 0.5 THEN 'SLIGHTLY SKEWED'
            ELSE 'ACCEPTABLE'
       END AS SKEW_CATEGORY
FROM DBC.TABLESIZE
WHERE DATABASENAME IN ('DATABASENAME')
GROUP BY 1,2 ) x
ORDER BY SKEW_FACTOR DESC;

To find the distribution of data among the AMPs, we can use the hash functions as follows:

SELECT HASHAMP(HASHBUCKET(HASHROW( Column ))) AS AMP_No, COUNT(*)
FROM DATABASENAME.TABLENAME
GROUP BY 1;

This can further be used to find the distribution for a candidate primary index as follows:

SEL 'ColumnName', (MAX(CN)-AVG(CN))/MAX(CN)*100
FROM ( SEL VPROC, COUNT(*) CN
       FROM DATABASENAME.TABLENAME
       RIGHT OUTER JOIN (SEL VPROC FROM DBC.TABLESIZE GROUP BY 1) X
         ON VPROC = HASHAMP(HASHBUCKET(HASHROW( ColumnName )))
       GROUP BY 1 ) C
GROUP BY 1;

This query can also be used to check which single column would best fit as the primary index on the sole criterion of data distribution.

The following is a generic query which generates SQL statements to check the distribution of each column among the AMPs.

SEL 'SEL ' || '27'XC || COLUMNNAME || '27'XC || ', (MAX(CN)-AVG(CN))/MAX(CN)*100
FROM ( SEL VPROC, COUNT(*) CN FROM ' || DATABASENAME || '.' || TABLENAME || '
RIGHT OUTER JOIN (SEL VPROC FROM DBC.TABLESIZE GROUP BY 1) X
ON VPROC = HASHAMP(HASHBUCKET(HASHROW( ' || COLUMNNAME || ' )))
GROUP BY 1) C GROUP BY 1
UNION ALL ' Title
FROM DBC.COLUMNS WHERE TABLENAME='' AND DATABASENAME='';

What does skew metric mean?

You can see the words "skewness" or "skew factor" in a lot of places regarding Teradata: documents, applications, etc. Skewed table, skewed CPU. It means something is wrong, but what does it explicitly mean? How should it be interpreted?

Teradata is a massively parallel system, where uniform units (AMPs) do the same tasks on the data parcel they are responsible for. In an ideal world all AMPs share the work equally: no one has to work more than the average. The reality is colder; it is a rare situation when this equality (called "even distribution") exists.

It is obvious that uneven distribution causes poor efficiency in using the parallel infrastructure.

But how bad is the situation? That is exactly what skewness characterizes.

Let "RESOURCE" mean the amount of resource (CPU, I/O, PERM space) consumed by an AMP.

Let AMPno be the number of AMPs in the Teradata system.

Skew factor := 100 - ( AVG ( "RESOURCE" ) / NULLIFZERO ( MAX ("RESOURCE") ) * 100 )

Total[Resource] := SUM("RESOURCE")

Impact[Resource] := MAX("RESOURCE") * AMPno

Parallel Efficiency := Total[Resource] / Impact[Resource] * 100

or with some transformation:

Parallel Efficiency := 100 - Skew factor
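As a minimal sketch, these definitions can be computed directly for PERM space from DBC.DiskSpace, which holds one row per AMP (Vproc) and database:

SELECT DatabaseName,
       SUM(CurrentPerm)                                               AS TotalPerm,
       MAX(CurrentPerm) * (HASHAMP() + 1)                             AS ImpactPerm,
       100 - (AVG(CurrentPerm) / NULLIFZERO(MAX(CurrentPerm)) * 100)  AS SkewFactor,
       TotalPerm * 100.00 / NULLIFZERO(ImpactPerm)                    AS ParallelEfficiency
FROM DBC.DiskSpace
GROUP BY 1;

(HASHAMP() + 1 returns the number of AMPs, i.e. the AMPno above.)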

Analysis

Codomain
0 <= "Skew factor" < 100

"Total[Resource]" <= "Impact[Resource]"

0<"Parallel Efficiency"<=100

Meaning

Skew factor : this percentage of the consumed real resources is wasted.
E.g. a 1 Gbyte table with a skew factor of 75 effectively allocates 4 Gbytes (see the worked example after this list).

Total[Resource] : virtual resource consumption, the simple sum of the individual resource consumptions, measured on the AMPs as independent systems.

Impact[Resource] : real resource consumption imposed on the parallel infrastructure.

Parallel Efficiency : as the name says. E.g. a skew factor of 80 means 20% parallel efficiency.
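A quick check of the 1 Gbyte example above: a skew factor of 75 means Parallel Efficiency = 100 - 75 = 25%, so Impact[PERM] = Total[PERM] / 0.25 = 1 Gbyte / 0.25 = 4 Gbytes. The table effectively ties up four times its logical size in terms of parallel capacity.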

Note : theoretically, a complementary resource allocation (one that consumes less resource on exactly the AMP where my load has its excess) could compensate for the parallel inefficiency from a system point of view, but the probability of this tends to zero.

How many types of Skew exist?

If Teradata resources (CPU, AMP, I/O, disk, etc.) are utilized unequally, we say that skew exists.

There are three major types of skew: CPU skew, AMP/data skew, and I/O skew.

- Data skew?
When data is not distributed equally on all the AMPs.

- CPU skew?
When some AMPs are consuming more CPU than others.

- IO skew?
When some AMPs are performing more I/O operations than others.

Explain about Skew Factor?

The distribution of a table's data among the AMPs is described by the skew factor. With a non-unique PI we can get duplicate values; the more duplicates there are, the more rows share the same row hash, so all of that data lands on the same AMP. This makes the distribution unequal: one AMP stores much more data while the others store less. When the full table is accessed, the AMP holding more data takes longer and keeps the other AMPs waiting, which wastes processing capacity. In this situation (unequal distribution of data) the skew factor is high, and full table scans on such tables should be avoided.

Ex:

AMP0               AMP1
10,000 rows        9,000,000 rows

Here AMP1 holds nearly all of the rows, so the table is heavily skewed.
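Plugging these figures into the skew-factor formula above: AVG = (10,000 + 9,000,000) / 2 = 4,505,000 rows and MAX = 9,000,000 rows, so Skew factor = 100 - (4,505,000 / 9,000,000 * 100) ≈ 50, far above the ~30% that is usually considered acceptable.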

How to "re-parallelize" skewed joins ?

 

Case description

Assume that we have 1M customers and 4M transactions, and our top customer produces 2.5% of all transactions. The others produce the remaining 97.5% of transactions approximately evenly.
Scroll down to the bottom of the post for sample table and data generator SQL. 

Our task is to join the "Customer" and "Transact" tables on Customer_id.
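The original post's sample DDL and data generator are not reproduced here, so here is a minimal sketch of the two tables with columns inferred from the queries below (the data types and the Transact primary index are assumptions):

CREATE TABLE Customer (
  Customer_id   INTEGER NOT NULL,
  Customer_name VARCHAR(100)
) UNIQUE PRIMARY INDEX (Customer_id);

CREATE TABLE Transact (
  Transact_id  INTEGER NOT NULL,
  Customer_id  INTEGER
) UNIQUE PRIMARY INDEX (Transact_id);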

The join

SELECT Customer_name, count(*)
FROM Customer c
JOIN Transact t ON c.Customer_id = t.Customer_id
GROUP BY 1;

We experience pretty slow execution. In ViewPoint we see that only one AMP is working, while the others are not.

What is the problem?
There are two separate subsets of the Transact table from a "joinability" point of view:

·                     The "peak" part (records of the top customer(s)):
very few customers have very many Transact records. A product join would be cost effective here.

·                     The "even" part (records of the other customers):
many customers each have evenly few Transact records. A merge join would be ideal here.

Unfortunately the Optimizer has to decide; only one operation type can be chosen. It will choose the merge join, which consumes far less CPU time.

Execution plan looks like this:


 This query is optimized using type 2 profile T2_Linux64, profileid 21.
  1) First, we lock a distinct D_DB_TMP."pseudo table" for read on a
     RowHash to prevent global deadlock for D_DB_TMP.t.
  2) Next, we lock a distinct D_DB_TMP."pseudo table" for read on a
     RowHash to prevent global deadlock for D_DB_TMP.c.
  3) We lock D_DB_TMP.t for read, and we lock D_DB_TMP.c for read.
  4) We do an all-AMPs RETRIEVE step from D_DB_TMP.t by way of an
     all-rows scan with a condition of ("NOT (D_DB_TMP.t.Customer_ID IS
     NULL)") into Spool 4 (all_amps), which is redistributed by the
     hash code of (D_DB_TMP.t.Customer_ID) to all AMPs.  Then we do a
     SORT to order Spool 4 by row hash.  The size of Spool 4 is
     estimated with low confidence to be 125 rows (2,125 bytes).  The
     estimated time for this step is 0.01 seconds.
  5) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of a
     RowHash match scan, which is joined to D_DB_TMP.c by way of a
     RowHash match scan.  Spool 4 and D_DB_TMP.c are joined using a
     merge join, with a join condition of ("D_DB_TMP.c.Customer_ID =
     Customer_ID").  The result goes into Spool 3 (all_amps), which is
     built locally on the AMPs.  The size of Spool 3 is estimated with
     index join confidence to be 125 rows (10,375 bytes).  The
     estimated time for this step is 0.02 seconds.
  6) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
     way of an all-rows scan , grouping by field1 (
     D_DB_TMP.c.Customer_name).  Aggregate Intermediate Results are
     computed globally, then placed in Spool 5.  The size of Spool 5 is
     estimated with no confidence to be 94 rows (14,758 bytes).  The
     estimated time for this step is 0.02 seconds.
  7) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
     an all-rows scan into Spool 1 (all_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with no confidence
     to be 94 rows (8,742 bytes).  The estimated time for this step is
     0.02 seconds.
  8) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.07 seconds.


How to identify

If you experience an extremely asymmetric AMP load, you can suspect this case.
Find highly skewed JOIN steps in DBQL (with all logging options switched on):

select top 50
  a.MaxAMPCPUTime * (hashamp()+1) / nullifzero(a.CPUTime) Skw
, a.CPUTime
, a.MaxAMPCPUTime * (hashamp()+1) CoveringCPUTime
, b.*
from dbc.dbqlsteptbl a
join dbc.dbqlogtbl b on a.procid=b.procid and a.queryid=b.queryid
where StepName='JIN'
  and CPUtime > 100
  and Skw > 2
order by CoveringCPUTime desc;

(Note: covering CPU time is <No-of-AMPs> * <Max AMP's CPU time>. Virtually this amount of CPU is consumed because of the asymmetric load on the system.)

Or if you suspect a specific query, check the demography of the join field(s) in the "big" table:

SELECT TOP 100 <Join_field>, count(*) Nbr
FROM <Big_table> GROUP BY 1 ORDER BY 2 DESC;


If the top occurrences are spectacularly larger than the others (or than the average), this case likely applies.

Solution :


Break the query into two parts: join the top customer(s) separately, then all the others, and finally UNION the results. (Sometimes additional modification is also required if the embedding operation(s), the GROUP BY here, are not decomposable on the same parameter.)
First we have to identify the top customer(s):

SELECT TOP 5 Customer_id, count(*) Nbr
FROM Transact GROUP BY 1 ORDER BY 2 DESC;

Customer_id          Nbr
------------------------------
          345       100004
     499873                4
     677423                4
     187236                4
       23482                4
      
Replace the original query with this one:


SELECT Customer_name, count(*)
FROM Customer c
JOIN Transact t ON c.Customer_id = t.Customer_id
WHERE t.Customer_id IN (345)
/*
   ID of the top Customer(s).
   If more customers are salient, list them, but max ~5
*/
GROUP BY 1
UNION ALL
SELECT Customer_name, count(*)
FROM Customer c
JOIN Transact t ON c.Customer_id = t.Customer_id
WHERE t.Customer_id NOT IN (345)  -- Same customer(s)
GROUP BY 1;


Be sure that Customer.Customer_id, Transact.Transact_id and Transact.Customer_id have statistics!
This query is more complex, has more steps and scans the Transact table twice, but it runs much faster; you can check it.


But why? And how do we determine which "top" customers are worth handling separately?


Explanation / Calculation :

Assume that we are on a 125 AMP system.


Customer table contains 1M records with unique ID.


We have ~4.1M records in the Transact table: 100k for the top customer (ID=345) and 4 for each of the other customers. This matches the ~2.5% we assumed above.

If the Transact table is redistributed on hash(Customer_id), we get ~33k records on each AMP, except AMP(hash(345)), which gets ~133k (33k + 100k).
That means this AMP has to process ~4x more data than the others and therefore runs 4x longer.
In other words, for 75% of this JOIN step's time, 124 AMPs will DO NOTHING for this query.

Moreover, the preparation and the subsequent steps are also problematic: the JOIN is prepared by a redistribution which produces a strongly skewed spool, and the JOIN's result, which stays local on the AMPs, is skewed as well.


Optimized version :

This query consumes moderately more CPU, but the work is distributed evenly across the AMPs, utilizing Teradata's full parallel capability.
It also contains a product join, but that is no problem: it joins 1 record to the selected ~100k Transact records, which is lightning fast.


Look at the execution plan of the broken-up query:


 This query is optimized using type 2 profile T2_Linux64, profileid 21.
  1) First, we lock a distinct D_DB_TMP."pseudo table" for read on a
     RowHash to prevent global deadlock for D_DB_TMP.t.
  2) Next, we lock a distinct D_DB_TMP."pseudo table" for read on a
     RowHash to prevent global deadlock for D_DB_TMP.c.
  3) We lock D_DB_TMP.t for read, and we lock D_DB_TMP.c for read.
  4) We do a single-AMP RETRIEVE step from D_DB_TMP.c by way of the
     unique primary index "D_DB_TMP.c.Customer_ID = 345" with no
     residual conditions into Spool 4 (all_amps), which is duplicated
     on all AMPs.  The size of Spool 4 is estimated with high
     confidence to be 125 rows (10,625 bytes).  The estimated time for
     this step is 0.01 seconds.
  5) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
     all-rows scan, which is joined to D_DB_TMP.t by way of an all-rows
     scan with a condition of ("D_DB_TMP.t.Customer_ID = 345").  Spool
     4 and D_DB_TMP.t are joined using a product join, with a join
     condition of ("Customer_ID = D_DB_TMP.t.Customer_ID").  The result
     goes into Spool 3 (all_amps), which is built locally on the AMPs.
     The size of Spool 3 is estimated with low confidence to be 99,670
     rows (8,272,610 bytes).  The estimated time for this step is 0.09
     seconds.
  6) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
     way of an all-rows scan , grouping by field1 (
     D_DB_TMP.c.Customer_name).  Aggregate Intermediate Results are
     computed globally, then placed in Spool 5.  The size of Spool 5 is
     estimated with no confidence to be 74,753 rows (11,736,221 bytes).
     The estimated time for this step is 0.20 seconds.
  7) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by
          way of an all-rows scan into Spool 1 (all_amps), which is
          built locally on the AMPs.  The size of Spool 1 is estimated
          with no confidence to be 74,753 rows (22,052,135 bytes).  The
          estimated time for this step is 0.02 seconds.
       2) We do an all-AMPs RETRIEVE step from D_DB_TMP.t by way of an
          all-rows scan with a condition of ("D_DB_TMP.t.Customer_ID <>
          3454") into Spool 9 (all_amps), which is redistributed by the
          hash code of (D_DB_TMP.t.Customer_ID) to all AMPs.  The size
          of Spool 9 is estimated with high confidence to be 4,294,230
          rows (73,001,910 bytes).  The estimated time for this step is
          1.80 seconds.
  8) We do an all-AMPs JOIN step from D_DB_TMP.c by way of an all-rows
     scan with a condition of ("D_DB_TMP.c.Customer_ID <> 3454"), which
     is joined to Spool 9 (Last Use) by way of an all-rows scan.
     D_DB_TMP.c and Spool 9 are joined using a single partition hash
     join, with a join condition of ("D_DB_TMP.c.Customer_ID =
     Customer_ID").  The result goes into Spool 8 (all_amps), which is
     built locally on the AMPs.  The size of Spool 8 is estimated with
     low confidence to be 4,294,230 rows (356,421,090 bytes).  The
     estimated time for this step is 0.72 seconds.
  9) We do an all-AMPs SUM step to aggregate from Spool 8 (Last Use) by
     way of an all-rows scan , grouping by field1 (
     D_DB_TMP.c.Customer_name).  Aggregate Intermediate Results are
     computed globally, then placed in Spool 10.  The size of Spool 10
     is estimated with no confidence to be 3,220,673 rows (505,645,661
     bytes).  The estimated time for this step is 8.46 seconds.
 10) We do an all-AMPs RETRIEVE step from Spool 10 (Last Use) by way of
     an all-rows scan into Spool 1 (all_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with no confidence
     to be 3,295,426 rows (972,150,670 bytes).  The estimated time for
     this step is 0.32 seconds.
 11) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 11.60 seconds.



Find Skewed CPU and I/O in DBQL

Please note that the solutions in this article work on the DBQL logs, which cover only the activity of those users for whom logging is switched on.

About Skew

If you are not comfortably familiar with Skewness or DBQL, please read the corresponding posts before:

Teradata Tuning - Skewness
Teradata Tuning - DBQL I
As you can see, the difference between Impact[Resource] and Total[Resource] is a pure technological loss, which should be minimized.

Skewness is a good indicator for highlighting workload that wastes lots of valuable CPU and I/O resources because of inefficient parallelism.

Find bad queries in DBQL

With this SQL you can filter the top impacting queries (replace the date value, or adjust the date filtering according to your local settings):

select  top 50
  ProcID
, QueryID
, AMPCPUTime
, MaxAMPCPUTime * (hashamp () + 1) CPUImpact
, CAST (100 - ((AmpCPUTime / (hashamp () + 1)) * 100 / NULLIFZERO (MaxAMPCPUTime)) AS INTEGER) "CPUSkew%"
, TotalIOCount
, MaxAMPIO * (hashamp () + 1) IOImpact
, CAST (100 - ((TotalIOCount / (hashamp () + 1)) * 100 / NULLIFZERO (MaxAMPIO) ) AS INTEGER) "IOSkew%"
, AMPCPUTime * 1000 / nullifzero (TotalIOCount) LHR
, TotalIOCount / nullifzero (AMPCPUTime * 1000) RHL
, ParserCPUTime
, Queryband
, Substr(QueryText,1,2000) QueryText
from
/* For archived DBQL
    dbql_arch.dbqlogtbl_hst where logdate=1131201
 and ampcputime>0
*/
/* For online DBQL*/
    dbc.dbqlogtbl where
    cast(cast(starttime as char(10)) as date) = '2013-12-18' (date)
    and ampcputime>0
order by CPUImpact desc

Explanation of extra fields:

ParserCPUTime: time the parser spent on producing the execution plan. This can be high if the SQL is too complex or too much random AMP sampling has to be done.

LHR/RHL: Larry Higa ratio (and its inverse). An empirical index that shows the CPU vs. I/O rate. By experience it should usually be around one (the exact value can differ depending on your system configuration, but it is roughly constant). If it is far from 1, that indicates CPU or I/O dominance, which means unbalanced resource consumption, but that is a different dimension than skew.

QueryBand: Labels that sessions use to identify themselves within the DBQL logs

QueryText: First 200 characters of the query (depending on DBQL log settings)

OK, we've listed the terrible top consumers, but what's next?
We have to identify those queries. If your ETL and analytics software is configured to use QueryBand properly (this area deserves a separate post...), you can find which job or report issued that SQL; in any case, you can see the QueryText field.
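If query banding is in place, a session typically tags itself like this before running its SQL (the band names and values here are just examples):

SET QUERY_BAND = 'ApplicationName=DailyLoad;Step=stg_customer;' FOR SESSION;

The name-value pairs then show up in the QueryBand column of DBQLogTbl, so the offending job can be traced back directly.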

If you want to get the full SQL text, select it from the DBQLSQLTbl (SQL logging needs to be switched on), replace the appropriate <procid> and <queryid> values:

select SQLTextInfo from dbc.dbqlsqltbl where procid=<procid> and queryid=<queryid>
order by SQLRowNo asc

You will get the SQL in several records, broken up into ~30K blocks; simply concatenate them. Unfortunately the SQL will be poorly formatted; you can use PRISE Tuning Assistant to beautify and highlight it for easier reading.

System level Skewness :

We've found those bad queries, nice. But what can we say about the whole system? What is the total parallel efficiency? Can we report how many resources were wasted due to bad parallel efficiency?

The answer is: yes, we can estimate it quite closely. We cannot calculate the exact value, because DBQL does not log per-AMP information for query execution, only the most important metrics.
We also cannot account for the situation when several skewed queries run at the same time but peak on different AMPs; this reduces the system-level resource waste, but it is hard to calculate, and its probability and effect are negligible.

select
  sum(AMPCPUTime) AMPCPUTimeSum
, sum(MaxAMPCPUTime * (hashamp () + 1)) CPUImpactSum
, sum(TotalIOCount) TotalIOCountSum
, sum(MaxAMPIO * (hashamp () + 1)) IOImpactSum
, cast(100 - (AMPCPUTimeSum / CPUImpactSum) * 100 as integer) "CPUSkew%"
, cast(100 - (TotalIOCountSum / IOImpactSum) * 100 as integer) "IOSkew%"
from
/* For archived DBQL
    dbql_arch.dbqlogtbl_hst where logdate = '2013-12-18' (date)    
    and (ampcputime>0 or TotalIOCount > 0)
*/
/* For online DBQL*/
    dbc.dbqlogtbl where
    cast(cast(starttime as char(10)) as date) = '2013-12-18' (date)
    and (ampcputime>0 or TotalIOCount > 0)


Look at the last two columns. That percentage of your CPU and I/O goes down the drain...

OK, let's check how many queries accumulate 5%,10%,25%,50%,75%,90% of this loss?

Here you are (CPU version; transform it for I/O analogously):

select 'How many queries?' as "_"
, min(limit5)  "TOP5%Loss"
, min(limit10) "TOP10%Loss"
, min(limit25) "TOP25%Loss"
, min(limit50) "TOP50%Loss"
, min(limit75) "TOP75%Loss"
, min(limit90) "TOP90%Loss"
, max(rnk) TotalQueries
, sum(ResourceTotal) "TotalResource"
, sum(ResourceImpact) "ImpactResource"
from
(
select
 case when ResRatio < 5.00 then null else rnk end limit5
,case when ResRatio < 10.00 then null else rnk end limit10
,case when ResRatio < 25.00 then null else rnk end limit25
,case when ResRatio < 50.00 then null else rnk end limit50
,case when ResRatio < 75.00 then null else rnk end limit75
,case when ResRatio < 90.00 then null else rnk end limit90
,rnk
, ResourceTotal
, ResourceImpact
from
(
select
  sum(ResourceLoss) over (order by ResourceLoss desc ) totalRes
, sum(ResourceLoss) over (order by ResourceLoss desc  rows unbounded preceding) subtotalRes
, subtotalRes *100.00 / totalRes Resratio
, sum(1) over (order by ResourceLoss desc  rows unbounded preceding) rnk
, ResourceTotal
, ResourceImpact
from
(
select
  AMPCPUTime ResourceTotal
, (MaxAMPCPUTime * (hashamp () + 1)) ResourceImpact
,  ResourceImpact - ResourceTotal ResourceLoss
/* For archived DBQL
from dbql_arch.dbqlogtbl_hst where logdate=1131207
and ampcputime>0
*/
/* For online DBQL*/
from dbc.dbqlogtbl where
cast(cast(starttime as char(10)) as date) = '2013-12-18' (date)
and ampcputime>0
) x
) y
) z
group by 1

I expect you are a bit shocked now at how few queries waste how much golden resource.
I think we will agree that it is worth tuning those dozen queries, and you can save on the order of 100K USD to millions of USD for your company annually, am I right?


