Teradata
joins
When we
join two or more tables on a column or set of columns, Joining takes place.
This will result in data resulting from matching records in both the tables.
This Universal concept remains the same for all the databases.
In
Teradata, we have Optimizer (a very smart Interpreter), which determines type
of join strategy to be used based on user input taking performance factor in
mind.
In
Teradata, some of common join types are used like
- Inner
join (can also be "self join" in some cases)
- Outer
Join (Left, Right, Full)
- Cross
join (Cartesian product join)
When User
provides join query, optimizer will come up with join plans to perform joins.
These Join strategies include
- Merge
Join
- Nested
Join
- Hash
Join
- Product
join
-
Exclusion Join
Merge
Join
--------------------
Merge join
is a concept in which rows to be joined must be present in same AMP. If the
rows to be joined are not on the same AMP, Teradata will either redistribute
the data or duplicate the data in spool to make that happen based on row hash
of the columns involved in the joins WHERE Clause.
If two tables to be joined have same primary Index, then the records will be
present in Same AMP and Re-Distribution of records is not required.
There are
four scenarios in which redistribution can happen for Merge Join
Case 1: If joining columns are on UPI =
UPI, the records to be joined are present in Same AMP and redistribution
is not required. This is most efficient and fastest join strategy
Case 2: If joining columns are on UPI
= Non Index column, the records in 2nd table has to be redistributed on
AMP's based on data corresponding to first table.
Case 3: If joining columns are on Non
Index column = Non Index column , the both the tables are to be
redistributed so that matching data lies on same amp , so the join can happen
on redistributed data. This strategy is time consuming since complete
redistribution of both the tables takes across all the amps
Case 4: For join happening on Primary Index,
If the Referenced table (second table in the join) is very small, then this
table is duplicated /copied on to every AMP.
Nested
Join
-------------------
Nested
Join is one of the most precise join plans suggested by Optimizer
.Nested Join works on UPI/USI used in Join statement and is used to retrieve
the single row from first table . It then checks for one more matching rows in
second table based on being used in the join using an index (primary or
secondary) and returns the matching results.
Example:
Select EMP.Ename , DEP.Deptno, EMP.salary
from
EMPLOYEE EMP ,
DEPARTMENT DEP
Where EMP.Enum = DEP.Enum
and EMp.Enum= 2345; -- this
results in nested join
Hash
join
----------------
Hash join
is one of the plans suggested by Optimizer based on joining conditions. We can
say Hash Join to be close relative of Merge based on its functionality. In case
of merge join, joining would happen in same amp. In Hash Join, one
or both tables which are on same amp are fit completely inside the AMP's Memory
. Amp chooses to hold small tables in its memory for joins
happening on ROW hash.
Advantages
of Hash joins are
1. They
are faster than Merge joins since the large table doesn’t need to be sorted.
2. Since
the join happening b/w table in AMP memory and table in unsorted spool, it
happens so quickly.
Exclusion
Join
-------------------------
These type
of joins are suggested by optimizer when following are used in the queries
- NOT IN
- EXCEPT
- MINUS
- SET
subtraction operations
Select EMP.Ename , DEP.Deptno, EMP.salary
from
EMPLOYEE EMP
WHERE EMP.Enum NOT IN
( Select Enum from
DEPARTMENT DEP
where Enum is NOT NULL );
Please
make sure to add an additional WHERE filter “with <column> IS NOT NULL”
since usage of NULL in a NOT IN <column> list will return no results.
Exclusion
join for following NOT In query has 3 scenarios
Case 1: matched data in "NOT IN"
sub Query will disqualify that row
Case 2: Non-matched data in "NOT
IN" sub Query will qualify that row
Case 3: Any Unknown result in "NOT
IN" will disqualify that row - ('NULL' is a typical example of this
scenario).
Explain types of re-distribution of data happening for joining of
columns from two tables
Answer:
Case 1 - P.I = P.I joins
Case 2 - P.I = N.U.P.I joins
Case 3 - N.U.P.I = N.U.P.I joins
Case1 - there is no redistribution of data over amp's. Since amp local joins happen as data are present in same AMP and need not be re-distributed. These types of joins on unique primary index are very fast.
Case 1 - P.I = P.I joins
Case 2 - P.I = N.U.P.I joins
Case 3 - N.U.P.I = N.U.P.I joins
Case1 - there is no redistribution of data over amp's. Since amp local joins happen as data are present in same AMP and need not be re-distributed. These types of joins on unique primary index are very fast.
Case2 - data from second table will be re-distributed on all amps since joins
are happening on PI vs. NUPI column. Ideal scenario is when small table
is redistributed to be joined with large table records on same amp
Case3 - data from both the tables are redistributed on all AMPs. This is
one of the longest processing queries , Care should be taken to see that stats
are collected on these columns
No comments:
Post a Comment